Parsing XML in a C# Application? - c#

Right now, I am fetching the XML for a Google search. However, the XML document is so big that I can't find anything in it. I am wondering how I can find the answer that Google shows. By that, I mean that when you Google "Capital of Florida", the box at the top says Tallahassee. I want to access that information, but I am unsure how.
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
var response = request.GetResponse();
var rstream = response.GetResponseStream();
var sr = new StreamReader(rstream);
var xml = sr.ReadToEnd();
Console.WriteLine(xml);
The last Console.WriteLine obviously just shoots out a huge monster of an XML doc.

See this; it uses LINQ to extract a piece of information from XML documents: https://coderwall.com/p/qghcqw
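As a minimal sketch of the LINQ to XML approach, the helper below parses an XML string and returns the text of the first element with a given name. The class name, method name, and the element name you pass in are hypothetical; the real element that holds the answer depends on the XML the search actually returns, which is not shown here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlAnswerFinder
{
    // Returns the text of the first element whose local name matches
    // elementName, or null when no such element exists.
    // The element name is an assumption: inspect the real document to find it.
    public static string FirstElementValue(string xml, string elementName)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == elementName)
                  .Select(e => e.Value)
                  .FirstOrDefault();
    }
}
```

Matching on Name.LocalName ignores XML namespaces, which keeps the lookup simple when the document declares a default namespace.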

If you are requesting HTML, a good way to parse the data is to use HtmlAgilityPack:
http://htmlagilitypack.codeplex.com/

Related

The correct way to create an HTMLDocument using HTMLAgilityPack?

Consider the code below:
string url="http://badoo.com";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
Now I have the htmlstring inside the result variable, let's try something:
// save normally
File.WriteAllText("1.html",result);
// save using HTMLAgilityPack
HtmlAgilityPack.HtmlDocument hdoc = new HtmlAgilityPack.HtmlDocument();
hdoc.LoadHtml(result);
hdoc.Save("2.html");
Can someone please tell me why 1.html and 2.html don't look the same, although they have the same file size?
Link to the correct one (saved with File.WriteAllText()): http://woman2.com/1.html
Link to the wrong one (saved with HtmlAgilityPack): http://woman2.com/2.html
Update:
I have also tried to save the file on local disk and then
hdoc.Load("path/to/local",true);
I have also tried:
hdoc.LoadHtml(result);
And tried:
hdoc.Save("2.html",Encoding.UTF8);
but none of these attempts seems to work for me. I've been struggling with this for 3 days now.
Andrew Morton is correct. The file '1.html' is formed in a way that makes agility pack angry/scared/confused. In all seriousness, I ran your code and diffed the resulting files and here are some of the differences:
Innocuous:
removes whitespace redundancy
adds closing tags where tags were previously self-closing
Possibly affects the site:
adds attribute values where none previously existed
changes language-specific characters
Likely affects the site:
"fixes" unmatched quotations (I would put my money here if I had to guess)
Again, as Andrew mentioned, fix that up first before banging your head against the wall any further.

Pass through XML from another website

I am trying to pass through some XML from an external website.
What is the best way of doing this: through a C# web page or ASP.NET MVC?
I tend to use something like this for working with external XML documents / RSS feeds etc:
string sURL = ".....";
// Create a request for the URL.
WebRequest oRequest = WebRequest.Create(sURL);
// Get the response.
WebResponse oResponse = oRequest.GetResponse();
// Get the stream containing content returned by the server.
Stream oDataStream = oResponse.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader oReader = new StreamReader(oDataStream, System.Text.Encoding.Default);
// Read the content.
string sXML = oReader.ReadToEnd();
// Convert string to XML
XDocument oFeed = XDocument.Parse(sXML);
Either should be fine. MVC is probably easiest (in terms of getting a raw response), but you could do the same in regular ASP.NET just by using a handler (possibly .ashx), or just by clearing the response.

Under high load, XDocument.Parse creates errors

I am trying to access this web service. The problem is that sometimes XDocument.Parse is not able to process the input and throws System.Xml.XmlException: Root element is missing. on the line:
XDocument xmlDoc = XDocument.Parse(xmlData);
Even though the XML sent is correct according to my logs.
I was wondering: is it possible that the StreamReader is not working properly?
using (StreamReader reader = new StreamReader(context.Request.InputStream))
{
xmlData = reader.ReadToEnd();
}
XDocument xmlDoc = XDocument.Parse(xmlData);
By the way, this is all inside a custom HttpHandler.
Can someone please guide me in the right direction on this?
Thanks
Does it work any more consistently if you use
XDocument.Load(new StreamReader(context.Request.InputStream))
instead of XDocument.Parse?
Your code sample doesn't include logging of the read inputstream. The problem is prior to this point.
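One way to narrow this down is to check what was actually read before parsing. The sketch below rests on an assumption the thread does not confirm: that the body sometimes arrives empty, or that the stream was already read once and is positioned at its end. In both cases XDocument.Parse receives an empty string and throws an XmlException. RequestXmlReader and TryLoad are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class RequestXmlReader
{
    // Reads the whole body and parses it as XML.
    // Returns null when nothing could be read, so the caller can log that case
    // separately from a genuine parse failure.
    public static XDocument TryLoad(Stream body)
    {
        // Rewind if possible, in case something read the stream earlier.
        if (body.CanSeek) body.Position = 0;

        // leaveOpen: true keeps the caller's stream usable after the reader is disposed.
        using (var reader = new StreamReader(body, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            string xmlData = reader.ReadToEnd();
            if (string.IsNullOrWhiteSpace(xmlData)) return null;
            return XDocument.Parse(xmlData);
        }
    }
}
```

In the handler this could be called as RequestXmlReader.TryLoad(context.Request.InputStream), logging the request whenever it returns null. If the null case never shows up in the logs, the cause lies elsewhere.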

Need help for parsing HTML in C#

For personal use, I am trying to parse a small HTML page that shows the results of the French soccer championship in a simple grid.
var Url = "http://www.lfp.fr/mobile/ligue1/resultat.asp?code_jr_tr=J01";
WebResponse result = null;
WebRequest req = WebRequest.Create(Url);
result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding(0);
StreamReader sr = new StreamReader(ReceiveStream, encode);
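string Line;
// ReadLine returns null at the end of the stream; calling sr.Read() first would swallow one character per line
while ((Line = sr.ReadLine()) != null)
{
// Replace every HTML tag with a space
Line = Regex.Replace(Line, @"<(.|\n)*?>", " ");
Line = Line.Replace(" ", "");
Line = Line.TrimEnd();
Line = Line.TrimStart();
}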
After that, I really don't have a clue whether to go line by line or take the whole stream at once, or how to retrieve only the team names with the number that follows, which would be the score.
In the end, I want to put both teams with their scores in a list or in XML to use with a phone application.
If anyone has an idea, it would be great. Thanks!
Take a look at Html Agility Pack
You could put the stream into an XmlDocument, allowing you to query via something like XPath. Or you could use LINQ to XML with an XDocument.
It's not perfect though, because HTML files aren't always well-formed XML (don't we know it!), but it's a simple solution using stuff already available in the framework.
You'll need an SgmlReader, which provides an XML-like API over any SGML document (which an HTML document really is).
You could use the Regex.Match method to pull out the team name and score. Examine the html to see how each row is built up. This is a common technique in screen scraping.
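A rough sketch of the regular-expression approach follows. The row markup it expects (three table cells per row: home team, a score such as "2 - 1", away team) is an assumption made for illustration and is not taken from the actual page, so the pattern would need adjusting after examining the real HTML. Fixture and ScoreScraper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class Fixture
{
    public string Home;
    public int HomeScore;
    public string Away;
    public int AwayScore;
}

public static class ScoreScraper
{
    // Assumed cell layout: <td>Home</td><td>2 - 1</td><td>Away</td>
    private static readonly Regex RowPattern = new Regex(
        @"<td>\s*(?<home>[^<]+?)\s*</td>\s*" +
        @"<td>\s*(?<homeScore>\d+)\s*-\s*(?<awayScore>\d+)\s*</td>\s*" +
        @"<td>\s*(?<away>[^<]+?)\s*</td>",
        RegexOptions.IgnoreCase);

    // Returns one Fixture per table row that matches the assumed layout.
    public static List<Fixture> Parse(string html)
    {
        var fixtures = new List<Fixture>();
        foreach (Match m in RowPattern.Matches(html))
        {
            fixtures.Add(new Fixture
            {
                Home = m.Groups["home"].Value,
                HomeScore = int.Parse(m.Groups["homeScore"].Value),
                Away = m.Groups["away"].Value,
                AwayScore = int.Parse(m.Groups["awayScore"].Value)
            });
        }
        return fixtures;
    }
}
```

Running the pattern over the whole page text (read with ReadToEnd) rather than line by line avoids missing rows that span several lines. Regular expressions are brittle against markup changes, so an HTML parser is the safer choice if the page layout is not stable.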

How can I write an XML on my hard drive to GetRequestStream

I need to post raw xml to a site and read the response. With the following code I keep getting an "Unknown File Format" error and I'm not sure why.
XmlDocument sampleRequest = new XmlDocument();
sampleRequest.Load(@"C:\SampleRequest.xml");
byte[] bytes = Encoding.UTF8.GetBytes(sampleRequest.ToString());
string uri = "https://www.sample-gateway.com/gw.aspx";
WebRequest req = WebRequest.Create(uri);
req.Method = "POST";
req.ContentLength = bytes.Length;
req.ContentType = "text/xml";
using (var requestStream = req.GetRequestStream())
{
requestStream.Write(bytes, 0, bytes.Length);
}
// Send the data to the webserver
WebResponse rsp = req.GetResponse();
XmlDocument responseXML = new XmlDocument();
using (var responseStream = rsp.GetResponseStream())
{
responseXML.Load(responseStream);
}
I am fairly certain my issue is what/how I am writing to the requestStream so..
How can I modify that code so that I may write an xml located on the hard drive to the request stream?
OK, instead of calling sampleRequest.ToString(), you should use sampleRequest.OuterXml, and that will do the magic. You were sending the text "System.Xml.XmlDocument" instead of the XML.
XmlDocument sampleRequest = new XmlDocument();
sampleRequest.Load(@"C:\SampleRequest.xml");
//byte[] bytes = Encoding.UTF8.GetBytes(sampleRequest.ToString());
byte[] bytes = Encoding.UTF8.GetBytes(sampleRequest.OuterXml);
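The difference is easy to verify in isolation. XmlDocument does not override ToString(), so the call falls back to the default implementation and returns the type name, while OuterXml returns the serialized markup. The small helper below is only an illustration; XmlPayload and ToUtf8Bytes are hypothetical names.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlPayload
{
    // Encodes the document's markup (not its type name) as UTF-8 bytes.
    public static byte[] ToUtf8Bytes(XmlDocument doc)
    {
        return Encoding.UTF8.GetBytes(doc.OuterXml);
    }
}
```

For a document loaded from "<request><id>42</id></request>", ToString() yields "System.Xml.XmlDocument", whereas decoding the bytes from ToUtf8Bytes gives back the original markup.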
Two things:
First, whenever you're trying to diagnose a problem with an HTTP request, you should always examine what the request stream actually contains. If you had in this case, you would have seen that it contains System.Xml.XmlDocument, which would have told you what was wrong pretty much immediately.
Second, in an application with any kind of transaction volume, you're not going to want to load a static XML file into an XmlDocument before putting it in the request stream; your program is spending time and memory building something that you don't need. (It's even worse than that in your case: your approach not only parses the XML into a DOM object, it then makes an in-memory copy of its OuterXml property when you encode it as UTF-8. Also, do you really need to be doing that?) Instead, you should create a FileStream object and use one of the techniques in this answer to copy it to the request stream.
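A minimal sketch of the FileStream idea, assuming a framework version where Stream.CopyTo is available (.NET 4.0 or later): the file's bytes are copied straight to the destination stream without being parsed. RawFileUploader and CopyFileTo are hypothetical names, and this is only one of several possible copy techniques.

```csharp
using System;
using System.IO;

public static class RawFileUploader
{
    // Copies the file's bytes to the destination stream as-is and
    // returns the number of bytes in the file.
    public static long CopyFileTo(string path, Stream destination)
    {
        using (FileStream source = File.OpenRead(path))
        {
            source.CopyTo(destination);
            return source.Length;
        }
    }
}
```

With the request from the question, the destination would be the stream returned by req.GetRequestStream(), and req.ContentLength can be set beforehand from new FileInfo(path).Length. Because the bytes are sent untouched, the file's own encoding must match what the ContentType header announces.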
