I would like to make a console application that can query a website for data, process it, and then display it. For example: access http://data.mtgox.com/ and display the exchange rate from currency X to currency Y.
I am able to get a large string of text via WebClient and StreamReader (though I don't really understand them), and I imagine I could trim the string down to what I want, then loop the query with a delay so the data updates without running the program again. However, I'm only speculating, and it seems like there would be a more efficient way of accessing the data than this. Am I missing something?
EDIT - The general consensus seems to be to use JSON to do this, which is exactly what I was looking for! Thanks guys!
It looks like that site provides an API serving up JSON data. If you don't know what JSON is, you need to look into that, but basically you could create object models representing this JSON. If you have the latest update of VS2012, you can copy the JSON, then in a code file use Edit > Paste Special > Paste JSON As Classes. This will automatically generate the models for you. You then contact the API, retrieve the JSON, deserialize it into your models, and do whatever you want from there.
A better way:
// Requires: using System.IO; using System.Net;
string url = "http://data.mtgox.com/";

HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(url);
myWebRequest.Method = "GET"; // This can also be "POST"

string myPageSource;
using (HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse())
using (StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream()))
{
    myPageSource = myWebSource.ReadToEnd();
}
// Do something with the data in myPageSource
Once you have the data, look at JSON.NET to parse it.
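For example, deserializing into a model is one line with JSON.NET once you have the string. A minimal sketch, where Ticker is a hypothetical model; the real JSON shape may differ, so match the properties to the actual response:

// Requires the Newtonsoft.Json NuGet package: using Newtonsoft.Json;
public class Ticker
{
    public string Last { get; set; } // hypothetical field names
    public string Buy { get; set; }
    public string Sell { get; set; }
}

// myPageSource is the JSON string retrieved above
Ticker ticker = JsonConvert.DeserializeObject<Ticker>(myPageSource);
Console.WriteLine(ticker.Last);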
I am having problems getting the X-Hub-Signature sent to me by Facebook to match the one I am generating in C#. For a while I thought I was running the function wrong, but I have now used multiple code sources on Stack Overflow and a website (http://billatnapier.com/security01.aspx) to confirm I am indeed creating the SHA-1 correctly.
So something is clearly wrong with the content. I am using ASP.NET Web API, and the "Payload" that I am feeding into the SHA-1 algorithm is the JSON object I receive from Facebook, converted to a string. I assume this is what they want me to use when they say "Payload" - is that correct?
It is a string that begins with {"entry":[ and ends with "object":"page"}
I feel like I've tried everything and have hit a brick wall, so I'm hoping someone can help me. Web API makes this a bit awkward - even grabbing the X-Hub-Signature was a challenge, as you can't just use Request.Headers["X-Hub-Signature"]. I am almost wondering if I should switch back to pure MVC.
OK so I am answering my own question! The problem with the "Payload" is that you can't simply grab the JSON object. You have to find a way to access the Request object from Web API and then read in the payload like this:
var context = Request.Properties["MS_HttpContext"] as HttpContextWrapper;
using (StreamReader reader = new StreamReader(context.Request.InputStream))
{
payload = reader.ReadToEnd();
}
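For the record, the header can be read from Web API's HttpRequestMessage in a similar fashion. A minimal sketch (Request here is the controller's HttpRequestMessage):

// Requires: using System.Collections.Generic; using System.Linq;
IEnumerable<string> values;
string xHubSignature = null;
if (Request.Headers.TryGetValues("X-Hub-Signature", out values))
{
    xHubSignature = values.FirstOrDefault();
}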
It looks like binning Web API and just doing this in MVC would be easier, as there you can simply do this:
using (StreamReader reader = new StreamReader(HttpContext.Request.InputStream))
{
PayLoad = reader.ReadToEnd();
}
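For completeness, Facebook's X-Hub-Signature is an HMAC-SHA1 of the raw payload keyed with your app secret, prefixed with "sha1=". A minimal sketch of the comparison, assuming appSecret holds your app secret and payload and xHubSignature come from the snippets above:

// Requires: using System; using System.Security.Cryptography; using System.Text;
string expected;
using (var hmac = new HMACSHA1(Encoding.UTF8.GetBytes(appSecret)))
{
    byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(payload));
    // BitConverter gives "AB-CD-..."; normalize to lowercase hex
    expected = "sha1=" + BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
}
bool valid = expected == xHubSignature;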
My problem is that I can't get a div's InnerText from a table. I have successfully extracted different kinds of data, but I don't know how to read a div from a table.
In following picture I've highlighted div, and I need to get InnerText from it, in this case - number 3.
[Screenshot: the highlighted div in the table]
I'm trying to accomplish this using following path:
"//div[#class='kal']//table//tr[2]/td[1]/div[#class='cipars']"
But I'm getting following Error:
[Screenshot: the error message]
Assuming that the rest of the code is written correctly, could anyone point me in the right direction? I have been trying to figure this one out, but I can't get any results.
So your problem is that you are relying on positions within your XPath. Whilst this can be OK in some cases, it is not here, because you are expecting the first td in a given tr to contain the div with that class.
Looking at the source in Chrome, it shows this is not always the case. You can see this by comparing the "1" element in the calendar, to "2" and "3". You'll notice the "1" element has a number of elements around it, which the others don't.
Your original XPath query does not return an element, which is why you are getting the error. When the XPath query you give HtmlAgilityPack does not match a DOM element, it returns null.
Now, because you've not shown your entire code, I don't know how this code is being run. However, I am guessing you are trying to loop through all of the calendar items. Regardless, you have multiple ways of doing this, but I will show you that with the descendant XPath selector, you can just grab the whole lot in one go:
//div[@class='kal']//table//descendant::div[@class='cipars']
This will return all of the calendar items (i.e. 1 through 30).
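With HtmlAgilityPack, that query can be run via SelectNodes. A minimal sketch, assuming document is an HtmlDocument you have already loaded with the page source:

// SelectNodes returns null when nothing matches, so guard against that
var nodes = document.DocumentNode.SelectNodes("//div[@class='kal']//table//descendant::div[@class='cipars']");
if (nodes != null)
{
    foreach (var node in nodes)
        Console.WriteLine(node.InnerText); // prints each calendar number
}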
However, to get all the items in a particular row, you can just stick that tr into the query:
//div[@class='kal']//table//tr[3]/descendant::div[@class='cipars']
This would return 2 to 8 (the second row of calendar items).
To target a specific one, well, you'll have to make an assumption about the source code of the website. It looks like every "cipars" div has an ancestor td with the class datums... so to get the "3" value from your question:
//div[@class='kal']//table//tr[3]//td[@class='datums'][2]/div[@class='cipars']
Hopefully this is enough to show the issue at least.
Edit
Although you do have an XPath problem, you also have another issue.
The site is built very strangely: the calendar is loaded by some JavaScript calling an XML web service (written in PHP) that then calculates the full table to be used for the calendar.
Because this is JavaScript (client-side code), HtmlAgilityPack won't execute it. Therefore, HtmlAgilityPack doesn't even "see" the table, and hence the queries against it come back as "not found" (null).
There are two ways around this:
1) Use a tool that will execute the scripts. By this, I mean loading up a browser. A great tool for this is Selenium. This will probably be the better overall solution, because it means all the scripting used by the site will actually be run. You can still use XPath with it, so your queries will not change.
2) Send a request to the same web service the page does. This basically gets you the same HTML the page is getting, which you can then use with HtmlAgilityPack. How do we do that?
Well, you can easily POST data to a web service using C#. Just for ease of use, I've stolen the code from this SO question. With this, we can send the same request the page does, and get the same HTML back.
So to send some POST data, we generate a method like so.....
// Requires: using System.IO; using System.Net; using System.Text;
public static string SendPost(string url, string postData)
{
    string webpageContent = string.Empty;
    byte[] byteArray = Encoding.UTF8.GetBytes(postData);

    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.ContentLength = byteArray.Length;

    // write the form data into the request body
    using (Stream webpageStream = webRequest.GetRequestStream())
    {
        webpageStream.Write(byteArray, 0, byteArray.Length);
    }

    // read the response back as a string
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (StreamReader reader = new StreamReader(webResponse.GetResponseStream()))
    {
        webpageContent = reader.ReadToEnd();
    }

    return webpageContent;
}
We can call it like so:
string responseBody = SendPost("http://lekcijas.va.lv/lekcijas_request.php", "nodala=IT&kurss=1&gads=2013&menesis=9&c_dala=");
How did I get this? Well, the PHP file we are calling is the web service the page itself uses, and so is the POST data. The way I found out what data it sends to the service was by debugging the JavaScript (using Chrome's developer console), but you may notice it's pretty much the same thing that is in the URL. That seems to be intentional.
The responseBody that is returned is the physical HTML of just the table for the calendar.
What do we do with it now? We load that up into HtmlAgilityPack, because it is able to accept pure HTML.
var document = new HtmlDocument();
document.LoadHtml(responseBody);
Now, we stick that original XPath in:
var node = document.DocumentNode.SelectSingleNode("//div[@class='kal']//table//tr[3]//td[@class='datums'][2]/div[@class='cipars']");
Now, we print out what should hopefully be "3":
Console.WriteLine(node.InnerText);
My output, running it locally, is indeed: 3.
However, although this would get you past the problem you are having, I am assuming the rest of the site is like this. If that is the case, you may still be able to work around it using the technique above, but tools like Selenium were created for this very reason.
I am trying to get specific information from a website. Right now I have the HTML as a string: as you can see in my code, the page's source ends up in responseText. I know I could pick through it with if statements, but that would be really tedious. I'm a newbie, so I have no idea what I'm doing here, and I'm sure there must be an easier way to retrieve information from a website. This is C# for Windows Store, so I can't use WebClient.
This code gets the string, but isn't there a way to strip out the HTML markup and keep only the values? I only need this for one web page, and I know which values I want because I've looked at the page's HTML source. Is there a way to request a list of values with their data from the website? I'm just kind of lost here. So basically, I want to get specific information from a website in C#; I'm making an app for the Windows Store.
// Requires: using System.Net.Http;
// prepare the client that will request the page
HttpClient searchClient = new HttpClient();
searchClient.MaxResponseContentBufferSize = 256000;

// request the page and read the whole response body as a string
HttpResponseMessage response = await searchClient.GetAsync(url);
response.EnsureSuccessStatusCode();
responseText = await response.Content.ReadAsStringAsync();
This code gets the string, but isn't there a way to strip out the HTML markup and keep only the values?
What "variables"? You get the HTML - that's the response from the web server. If you want to strip that HTML, that's up to you. You might want to use HTML Tidy to make it more pleasant to work with, but the business of extracting relevant information from HTML is up to you. HTML isn't designed to be machine-readable as a raw information source - it's meant to be mark-up to present to humans.
You should investigate whether the information is available in a more machine-friendly source, with no presentation information etc. For example, there may be some way of getting the data as JSON or XML.
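If such an endpoint exists, the HttpClient from your snippet works for it, and JSON.NET can handle the parsing. A minimal sketch, where the URL and the RateResult shape are assumptions you would match to the real API:

// Requires the Newtonsoft.Json NuGet package: using Newtonsoft.Json;
public class RateResult
{
    public string Rate { get; set; } // hypothetical field
}

HttpResponseMessage response = await searchClient.GetAsync("http://example.com/api/rates"); // hypothetical endpoint
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
RateResult result = JsonConvert.DeserializeObject<RateResult>(json);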
I need to call a .php page from my aspx.cs page, but I don't want to load the page in a browser. I just want to call it, and the page will give me an XML response that I need to store in the DB. I tried this with Ajax, but according to this link, we are not able to call a cross-domain page from Ajax.
In short, I want to read the data from a PHP page using ASP.NET code.
Can anybody please help me out?
Update: Will a P3P policy be useful in the case of cross-domain page calls?
I got the solution, thanks for your help.
// create a new WebClient object
WebClient client = new WebClient();
string url = "http://testurl.com/test.php";

// create a byte array for holding the returned data
byte[] html = client.DownloadData(url);

// use the UTF8Encoding object to convert the byte array into a string
UTF8Encoding utf = new UTF8Encoding();

// get the converted string
string mystring = utf.GetString(html);
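Since the page returns XML, you can then parse mystring before writing it to the DB. A minimal sketch with System.Xml.Linq, where the "item" element name is a placeholder for whatever the PHP page actually returns:

// Requires: using System.Xml.Linq;
XDocument doc = XDocument.Parse(mystring);
foreach (XElement item in doc.Descendants("item")) // placeholder element name
{
    // read the values you need here and save them to the database
}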
If I understand you correctly, your problem is that you want to do a cross-domain Ajax call, which is not possible. The way to get around this is to make a call to your own backend, which then fetches the data from the other site and sends it back to the browser. Remember to do any needed safety checks in the backend, depending on how much you trust the other domain, of course... (but even if you trust it 100%, it may be hacked or have some other problem that makes it return something other than what you think it returns).
I want to take existing data and put it into RavenDB.
My existing data was in an XML format, so I converted it to JSON.
What should my next step be?
Can I store it in RavenDB as is?
Do I need to create new objects to store it?
Thanks in advance!
It is not mandatory to submit content to RavenDB using the RavenDB Client, nor is it necessary to populate a domain model first. This is unnecessary effort and only complicates the process of data submission/insertion/migration/import.
You can submit JSON formatted documents directly to RavenDB using the HTTP API, specifically you may wish to review the "Single Document Operations" topic for simple examples which (currently) show examples using 'curl'.
Consider the following .NET code:
var url = string.Format("http://ravendb-server:8080/databases/{0}/docs/{1}", databaseName, docId);

var webRequest = System.Net.HttpWebRequest.CreateHttp(url);
webRequest.Method = "PUT";
webRequest.ContentType = "application/json";
webRequest.Headers["Raven-Entity-Name"] = entityName;

// write the JSON document into the request body
using (var writer = new System.IO.StreamWriter(webRequest.GetRequestStream()))
{
    writer.Write(json);
}

var webResponse = webRequest.GetResponse();
webResponse.Close();
The above snippet allows you to submit a valid JSON document into a specific database and a specific document collection with the specified ID. Database selection and ID designation are performed through URL paths, and the document collection is specified with the Raven-Entity-Name metadata header.
There are additional metadata headers you may want to send up, such as Raven-Clr-Type or Last-Modified, but they are not required.
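For illustration, those optional headers are set the same way as Raven-Entity-Name; the type and assembly names below are made-up examples:

// optional: lets RavenDB record the CLR type backing the document
webRequest.Headers["Raven-Clr-Type"] = "MyApp.Models.Product, MyApp";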
I suppose that your JSON data represents the data of your application's domain, and you want to have classes with properties to work with that data in your application, right?
If that is the case, you need to write a simple import application that populates your domain model once and then stores all your objects as regular RavenDB documents, just the way you would store any other object with RavenDB.
Does that make sense?
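For illustration, a minimal sketch of such a one-time import with the RavenDB .NET client, assuming you have deserialized your JSON into a hypothetical Product class and hold the results in products:

// Requires the RavenDB.Client NuGet package: using Raven.Client.Document;
using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
using (var session = store.OpenSession())
{
    foreach (Product product in products)
    {
        session.Store(product); // queued in the session
    }
    session.SaveChanges(); // persisted in one round-trip
}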