After much ado, I managed to create a RESTful service in ASP.NET MVC following Omar's brilliant RESTful ASP.NET article.
Just one little thing remains.
My ASP.NET MVC controller returns an XML file, which contains this tag:
<FileCode>24233224</FileCode>
This is the console application I use to send a GET request, which gives me the whole XML file:
//Generate get request
string url = "http://localhost:1193/Home/index?File=343456789012286";
HttpWebRequest GETRequest = (HttpWebRequest)WebRequest.Create(url);
GETRequest.Method = "GET";
GETRequest.ContentType = "text/xml";
GETRequest.Accept = "text/xml";
Console.WriteLine("Sending GET Request");
HttpWebResponse GETResponse = (HttpWebResponse)GETRequest.GetResponse();
Stream GETResponseStream = GETResponse.GetResponseStream();
StreamReader sr = new StreamReader(GETResponseStream);
Console.WriteLine("Response from Server");
// This writes the whole file to the screen
Console.WriteLine(sr.ReadToEnd());
I could perhaps save this file and then use LINQ to parse it, but can't I just get the value of my tag without saving it? I simply need the FileCode.
Thank you :)
You could employ the XPathReader (source download).
It comes with source and a test suite.
What it gives you is the ability to work with high-level query constructs (XPath) in streaming mode.
There is also a similar article on CodeProject: Fast screen scraping with XPath over a modified XmlTextReader and SgmlReader
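If you'd rather not pull in an extra library, here is a minimal sketch that uses the built-in XmlReader directly over the response stream, assuming the element is named FileCode as in your sample; nothing is saved to disk:

// Minimal sketch: stream the response and pull out <FileCode> with the built-in XmlReader.
// Assumes the element is named FileCode, as in the sample XML above.
using System;
using System.Net;
using System.Xml;

class FileCodeClient
{
    static void Main()
    {
        string url = "http://localhost:1193/Home/index?File=343456789012286";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Accept = "text/xml";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (XmlReader reader = XmlReader.Create(response.GetResponseStream()))
        {
            // ReadToFollowing streams forward to the first <FileCode> element.
            if (reader.ReadToFollowing("FileCode"))
            {
                string fileCode = reader.ReadElementContentAsString();
                Console.WriteLine("FileCode: " + fileCode);
            }
        }
    }
}

ReadToFollowing works in a forward-only, streaming fashion, so the whole document never has to be held in memory or written to disk.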
I am trying to store data from an XML file located at this link: Link to xml file
The XML file basically contains questions as well as options relating to English grammar. The main reason for saving this file is so that I can parse the information later.
From the browser (Chrome or IE), the XML file loads normally, yet saving it programmatically did not work. The issue is that the data retrieved from that URL appears to be something else.
This is my code for getting data from the URL:
using (WebClient client = new WebClient())
{
client.Encoding = Encoding.UTF8;
// the data retrieved is something meaning : "Link does not exist"
string data = client.DownloadString("http://farm04.gox.vn/edu/IOE_Exam/IOE/l5/v1/g1/exam1.xml?v=");
}
The above approach gives nothing but a string which indicates there isn't such a link (though I still think the server itself sends it rather than a 404 error).
My second attempt used WebRequest and WebResponse, but no luck. I have read that WebClient is a wrapper around these classes, so both give the same result:
WebRequest request = WebRequest.Create(Url);
WebResponse response = request.GetResponse();
using (Stream stream = response.GetResponseStream())
{
return new StreamReader(stream).ReadToEnd();
}
I suppose the server has a mechanism to prevent clients from downloading it. However, Chrome does the job successfully: load the page, then "Ctrl + S" (Save as...) saves the XML file without any problems.
What are the differences between the browser and the WebClient API? Is there an additional action that the browser performs?
Any suggestion would be much appreciated.
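One difference worth checking is the set of request headers the browser sends (User-Agent, Referer, cookies and so on). A minimal sketch of mimicking some of them with WebClient, assuming the server filters on headers rather than on cookies or a session:

// Minimal sketch: send browser-like headers with WebClient.
// Assumption: the server rejects "non-browser" requests based on headers;
// if it relies on cookies or a session instead, this alone won't be enough.
using System;
using System.Net;
using System.Text;

class XmlDownloader
{
    static void Main()
    {
        using (WebClient client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            client.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0 Safari/537.36";
            client.Headers[HttpRequestHeader.Referer] = "http://farm04.gox.vn/";

            string data = client.DownloadString(
                "http://farm04.gox.vn/edu/IOE_Exam/IOE/l5/v1/g1/exam1.xml?v=");
            Console.WriteLine(data);
        }
    }
}

Comparing the request Chrome actually sends (via its developer tools) with the one WebClient sends is usually the quickest way to find out which part matters.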
I am trying to scrape data from this URL: http://icecat.biz/en/p/Coby/DP102/desc.htm
I want to scrape the specs table from that URL.
But I checked the source code of the page and the spec table does not appear there; I think the table is loaded using Ajax.
How can I get that table? What needs to be done?
I used the following code:
string Strproducturl = "http://icecat.biz/en/p/Coby/DP102/desc.htm";
System.Net.ServicePointManager.Expect100Continue = false;
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(Strproducturl);
httpWebRequest.KeepAlive = true;
ASCIIEncoding encoding = new ASCIIEncoding();
HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();
Stream responseStream = httpWebResponse.GetResponseStream();
StreamReader streamReader = new StreamReader(responseStream);
string response = streamReader.ReadToEnd();
As IanNorton mentioned, you'll need to make your request to the URL that Icecat uses to load the specs via AJAX. For the example link you provided, the specs details URL you'll need to request is:
http://icecat.biz/index.cgi?ajax=productPage;product_id=1091664;language=en;request=feature
You can then work your way through the HTML response to get the spec details you require.
You mentioned in your comment that the scraping process is automated. The specs URL follows a basic format; you just need the product ID. However, if you don't have the IDs, just a series of URLs like the example in your original question, you'll need to get the product ID from the URL you have.
For example, the URL you gave redirects to a different URL:
http://icecat.biz/p/coby/dp102/digital-photo-frames-0716829961025-dp-102-digital-photo-frame-1091664.html
This URL contains the product ID, right at the end.
You could make an HttpWebRequest to your original URL, stop it before it follows the redirect, and catch the redirect URL:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://icecat.biz/en/p/Coby/DP102/desc.htm");
request.AllowAutoRedirect = false;
request.KeepAlive = true;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string redirectUrl = null;
if (response.StatusCode == HttpStatusCode.Redirect)
{
    redirectUrl = response.GetResponseHeader("Location");
}
Once you've got the redirectUrl variable, you can use a Regex to extract the ID and then make another HttpWebRequest to the specs detail URL, as in the sketch below.
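A minimal sketch of that last step, assuming the product ID is always the trailing run of digits before .html (as in the redirect URL above):

// Minimal sketch: pull the product ID out of the redirect URL, then request the specs page.
// Assumption: the ID is the trailing run of digits before ".html", as in the URL above.
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class SpecsScraper
{
    static void Main()
    {
        string redirectUrl = "http://icecat.biz/p/coby/dp102/digital-photo-frames-0716829961025-dp-102-digital-photo-frame-1091664.html";

        Match match = Regex.Match(redirectUrl, @"-(\d+)\.html$");
        if (match.Success)
        {
            string productId = match.Groups[1].Value;
            string specsUrl = "http://icecat.biz/index.cgi?ajax=productPage;product_id=" + productId + ";language=en;request=feature";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(specsUrl);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string specsHtml = reader.ReadToEnd();
                Console.WriteLine(specsHtml);
            }
        }
    }
}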
I would suggest that you use a library like HtmlAgilityPack to select various elements from the HTML document.
I took a quick look at the link and noticed that the data is actually loaded using an additional Ajax request. You can use the following URL to get the Ajax data:
http://icecat.biz/index.cgi?ajax=productPage;product_id=1091664;language=en;request=feature
Then use HtmlAgilityPack to parse that data, for example as sketched below.
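A minimal sketch, assuming the Ajax response renders the specs as an HTML table (the XPath here is a guess; adjust it to the real markup):

// Minimal sketch: parse the Ajax response with HtmlAgilityPack.
// The XPath below assumes the specs come back as table rows; adjust it to the real markup.
using System;
using HtmlAgilityPack;

class SpecsParser
{
    static void Main()
    {
        string specsUrl = "http://icecat.biz/index.cgi?ajax=productPage;product_id=1091664;language=en;request=feature";

        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load(specsUrl);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows != null)
        {
            foreach (HtmlNode row in rows)
            {
                Console.WriteLine(row.InnerText.Trim());
            }
        }
    }
}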
I know this is very old but you could more easily just retrieve the XML from
https://openIcecat-xml:freeaccess#data.icecat.biz/export/freexml.int/EN/1091664.xml
You will get all the images and descriptions as well :-)
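A minimal sketch of fetching that feed, with the credentials from the URL above supplied explicitly and the result loaded into LINQ to XML (which elements you then read depends on the actual feed structure):

// Minimal sketch: fetch the free Open Icecat XML using the credentials from the URL above,
// then load it into LINQ to XML for further querying.
using System;
using System.Net;
using System.Xml.Linq;

class IcecatXmlClient
{
    static void Main()
    {
        string url = "https://data.icecat.biz/export/freexml.int/EN/1091664.xml";

        using (WebClient client = new WebClient())
        {
            client.Credentials = new NetworkCredential("openIcecat-xml", "freeaccess");
            string xml = client.DownloadString(url);

            XDocument doc = XDocument.Parse(xml);
            Console.WriteLine(doc.Root.Name);
        }
    }
}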
I'm having a lot of problems calling a Java-based Google App Engine server from a C# client.
This is what my client code looks like:
// C# Client
static void Main(string[] args)
{
const string URL = "http://localhost:8888/googlewebapptest7/greet";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
request.Method = "POST";
request.ContentType = "text/x-gwt-rpc; charset=utf-8";
string content = "<?xml version='1.0'?><methodCall><methodName>greetServlet.GetName</methodName><params></params></methodCall>";
byte[] contentBytes = UTF8Encoding.UTF8.GetBytes(content);
request.ContentLength = contentBytes.Length;
using (Stream stream = request.GetRequestStream())
{
stream.Write(contentBytes, 0, contentBytes.Length);
}
// get response
WebResponse response = request.GetResponse();
using (Stream responseStream = response.GetResponseStream())
{
string res = new StreamReader(responseStream).ReadToEnd();
Console.WriteLine("response from server:");
Console.WriteLine(res);
Console.ReadKey();
}
}
The server is basically the Google default web project with an additional method:
public String GetName()
{
return "HI!";
}
added to GreetingServiceImpl.
Every time I run my client, I get the following exception from the server:
An IncompatibleRemoteServiceException was thrown while processing this call.
com.google.gwt.user.client.rpc.IncompatibleRemoteServiceException: This application is out of date, please click the refresh button on your browser. ( Malformed or old RPC message received - expecting version 5 )
I would like to keep it in plain HTTP requests.
Any idea what's going on?
As Nick pointed out, you're trying to emulate GWT's RPC format. I tried that too :)
Then I took a different approach: I used Google Protocol Buffers as the encoder/decoder over HTTP(S).
One of my projects is a desktop application written in C#. The server side is also C#/.NET. Naturally, we use WCF as the transport.
You can plug Protocol Buffers into the WCF transport, and you'll have the same interface definition for both the C# client and the Java server. It's very convenient.
I'll update this answer with code samples when I'm less busy with work.
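In the meantime, a minimal sketch of the protobuf-net side only (not the WCF wiring); the message type and field here are placeholders for illustration, not the real contract:

// Minimal sketch: define and serialize a message with protobuf-net.
// GreetRequest and its field are hypothetical, for illustration only.
using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class GreetRequest
{
    [ProtoMember(1)]
    public string Name { get; set; }
}

class ProtoSketch
{
    static void Main()
    {
        var request = new GreetRequest { Name = "world" };

        // Serialize to a compact binary payload; a Java server using the same
        // .proto definition can decode it on the other end.
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, request);
            byte[] payload = stream.ToArray();
            Console.WriteLine("Encoded {0} bytes", payload.Length);
        }
    }
}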
I never found a good way to use XML-based RPC on Google App Engine. Instead, I found this excellent JSON library with a tutorial:
http://werxltd.com/wp/portfolio/json-rpc/simple-java-json-rpc/
It works very well!
I have a file at a URI that I would like to read using StreamReader. Obviously, this causes a problem since File.OpenText does not support URI paths. The file is a .txt file with a bunch of HTML in it. I have multiple web pages that use this same piece of HTML, so I have put it in a .txt file and am reading it into the page when the page loads (I can get it to work when I put the file on the file system, but I need to put it in a document repository online so that a business user can get to it). I am trying to avoid using an iframe. Is there a way to use StreamReader with URI formats? If not, what other options are there in C# to read in the .txt file of HTML? If this is not optimal, can someone suggest a better approach?
Is there a specific requirement to use StreamReader? Unless there is, you can use the WebClient class:
var webClient = new WebClient();
string readHtml = webClient.DownloadString("your_file_path_url");
You could try using the HttpWebRequest class, or WebClient. Here's the slightly more involved web request example; its advantage over WebClient is that it gives you more control over how the request is made:
// lcUrl holds the URL of the file you want to fetch
HttpWebRequest httpRequest = (HttpWebRequest) WebRequest.Create(lcUrl);
httpRequest.Timeout = 10000; // 10 secs
httpRequest.UserAgent = "Code Sample Web Client";
HttpWebResponse webResponse = (HttpWebResponse) httpRequest.GetResponse();
StreamReader responseStream = new StreamReader(webResponse.GetResponseStream());
string content = responseStream.ReadToEnd();
If you are behind a proxy, don't forget to set your credentials:
WebRequest request=WebRequest.Create(url);
request.Timeout=30*60*1000;
request.UseDefaultCredentials=true;
request.Proxy.Credentials=request.Credentials;
WebResponse response=(WebResponse)request.GetResponse();
using (Stream s=response.GetResponseStream())
...
In a project I'm involved in, there is a requirement that the prices of certain stocks be queried from some web interface and displayed in some way.
I know the "query" part of the requirement could easily be implemented using a Perl module like LWP::UserAgent. But for some reason, C# has been chosen as the language to implement the display part. I don't want to add any IPC (like sockets, or indirectly via a database) to this tiny project, so my question is: is there a C# equivalent to Perl's LWP::UserAgent?
You can use the System.Net.HttpWebRequest object.
It looks something like this:
// Setup the HTTP request.
HttpWebRequest httpWebRequest = (HttpWebRequest)HttpWebRequest.Create("http://www.google.com");
// This is optional; I'm just demoing this because of the comments received.
httpWebRequest.UserAgent = "My Web Crawler";
// Send the HTTP request and get the response.
HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();
if (httpWebResponse.StatusCode == HttpStatusCode.OK)
{
// Get the HTML from the httpWebResponse...
Stream responseStream = httpWebResponse.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);
string html = reader.ReadToEnd();
}
I'm not sure, but are you simply trying to make an HTTP request? If so, you can use the HttpWebRequest class. Here's an example: http://www.csharp-station.com/HowTo/HttpWebFetch.aspx
If you simply want to fetch data from the web, you could use the WebClient class. It seems to be quite good for quick requests, for example:
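A minimal sketch, with a placeholder URL standing in for the real stock-quote service:

// Minimal sketch: fetch a page with WebClient.
// The URL is a placeholder, not a real quote service.
using System;
using System.Net;

class QuoteFetcher
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString("http://example.com/quote?symbol=MSFT");
            Console.WriteLine(html);
        }
    }
}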