I need to call a .php page from my aspx.cs page, but I don't want to load the page. I just want to request it; the page returns an XML response that I need to store in the database. I tried this with Ajax, but according to this link, we are not able to call a cross-domain page from Ajax.
In short, I want to read the data from a PHP page using ASP.NET code.
Can anybody please help me out?
Update: Would a P3P policy be useful for cross-domain page calls?
I found the solution. Thanks for your help.
// requires: using System.Net; using System.Text;
// create a new WebClient object
WebClient client = new WebClient();
string url = "http://testurl.com/test.php";
// create a byte array for holding the returned data
byte[] html = client.DownloadData(url);
// use the UTF8Encoding object to convert the byte
// array into a string
UTF8Encoding utf = new UTF8Encoding();
// get the converted string
string mystring = utf.GetString(html);
If I understand you correctly, your problem is that you want to make a cross-domain Ajax call, which the browser will not allow. The way around this is to call your own backend, which then fetches the data from the other site and sends it back to the browser. Remember to do any needed safety checks in the backend, depending on how much you trust the other domain of course. (Even if you trust it 100%, it may be hacked or have some other problem that makes it return something other than what you think it returns.)
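As a rough illustration of the "safety check in the backend" idea, here is a minimal sketch that only accepts the downloaded text if it is well-formed XML with the root element you expect. The class name, method name, and the "result" root element are made up for the example, and rejecting DTDs is my own assumption about what a reasonable check might include, not something from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class RemoteXmlCheck
{
    // Returns true (and the parsed document) only if the text is well-formed XML
    // whose root element has the expected name. Documents containing a DTD are
    // rejected, since the remote site is not fully trusted.
    public static bool TryParseExpectedXml(string text, string expectedRoot, out XDocument doc)
    {
        doc = null;
        if (string.IsNullOrWhiteSpace(text)) return false;

        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
            {
                XDocument parsed = XDocument.Load(reader);
                if (parsed.Root == null || parsed.Root.Name.LocalName != expectedRoot)
                    return false;
                doc = parsed;
                return true;
            }
        }
        catch (XmlException)
        {
            // Malformed XML, or a DTD was present.
            return false;
        }
    }
}
```

You would call it on the string returned by WebClient and only write to the database when it returns true.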
Related
Is there a way to get the fully rendered html of a web page using WebClient instead of the page source? I'm trying to scrape some data from the page's html. My current code is like this:
WebClient client = new WebClient();
var result = client.DownloadString("https://somepageoutthere.com/");
//using CsQuery
CQ dom = result;
var someElementHtml = dom["body > main"];
WebClient will only return the response for the URL you requested. It will not run any JavaScript on the page (which runs on the client), so if JavaScript changes the page DOM in any way, you will not get those changes through WebClient.
You are better off using some other tools. Look for those that will render the HTML and javascript in the page.
I don't know what you mean by "fully rendered", but if you mean "with all data loaded by ajax calls", the answer is: no, you can't.
Data that is not present in the initial HTML page is loaded by JavaScript in the browser. WebClient has no idea what JavaScript is and cannot interpret it; only browsers do.
To get this kind of data, you need to identify these calls (if you don't know the URL of the data web service, you can use a tool like Fiddler), simulate or replay them from your application, and then, if successful, extract the data from the response. This is easy if the data comes as JSON and trickier if it comes as HTML.
Better to use the HTML Agility Pack: http://html-agility-pack.net
It has all the functionality needed to scrape web data, and the site has good documentation.
I would like to access content of a web-page using C#. The content is inside an i-Frame of the Body of the website, underlying an #document object. I am using this to read the page:
WebClient wbClient = new WebClient();
wbClient.UseDefaultCredentials = true;
byte[] raw = wbClient.DownloadData(stWebPage);
stWebPageContent = System.Text.Encoding.UTF8.GetString(raw);
However, the relevant information inside the #document is ignored.
Can anybody explain what I have to do to access the needed info? It is nested under body/div/iframe/#document/html/body/div/..... Thanks!
Note: I am assuming stWebPage is pointing to a http url.
The iframe content will not be downloaded in this one call. You need to look for the iframe in stWebPageContent using a Regex, pull the value of its 'src' attribute, and make another call to that URL to download the content. More details can be found at this link.
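The two steps described above (find the 'src', then request it) could look roughly like the sketch below. The class and method names are made up, and the regular expression is a simplification that assumes a quoted src attribute on the first iframe tag; a real HTML parser would be more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class IframeHelper
{
    // Captures the quoted src attribute of the first <iframe> tag.
    private static readonly Regex IframeSrc = new Regex(
        "<iframe\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of the first iframe, or null if none is found.
    public static Uri FindFirstIframeUrl(string pageUrl, string html)
    {
        Match m = IframeSrc.Match(html);
        if (!m.Success) return null;

        // Attribute values may contain HTML entities such as &amp;.
        string src = WebUtility.HtmlDecode(m.Groups[1].Value);

        // Resolve a relative src against the URL of the containing page.
        return new Uri(new Uri(pageUrl), src);
    }
}
```

With the variables from the question, you would pass stWebPage and stWebPageContent, and then hand the resulting URL to a second wbClient.DownloadData call. If the iframe content is itself built by JavaScript, this will not be enough.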
I'm trying to do some web scraping from a simple form in C#.
My issue is trying to figure out the action to post to and how to work out the post params.
The form I am trying to submit has:
<form method="post" action="./"
As the page sits at www.foobar.com I am creating a WebRequest object in my C# code and posting to this address.
The other issue with this is that I am not sure of the post values as the inputs only have ids not names:
<input name="ctl00$MainContent$txtSearchName" type="text" maxlength="8" id="MainContent_txtSearchName" class="input-large input-upper">
So I read this: c# - programmatically form fill and submit login, amongst others and my code looks like this:
var httpRequest = WebRequest.Create("https://www.foobar.com/");
var values = "SearchName=Foo&SearchLastName=Bar";
byte[] send = Encoding.Default.GetBytes(values);
httpRequest.Method = "POST";
httpRequest.ContentType = "application/x-www-form-urlencoded";
httpRequest.ContentLength = send.Length;
Stream sout = httpRequest.GetRequestStream();
sout.Write(send, 0, send.Length);
sout.Flush();
sout.Close();
WebResponse res = httpRequest.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string returnvalue = sr.ReadToEnd();
File.WriteAllText(@"C:\src\test.html", returnvalue);
However, the resulting html page that is created does not show the search results, it shows the initial search form.
I am assuming the post is failing. My questions are about the post I am making.
Does action="./" mean it posts back to the same page?
Do I need to submit all the form values (or can I get away with only submitting one or two)?
Is there any way to infer what the correct post parameter names are from the form?
Or am I missing something completely about web scraping and submitting forms in server side code?
What I would suggest is not doing all of this work manually, but letting your computer take a bit of the workload. You can use a tool such as Fiddler and the Fiddler Request To Code Plugin in order to programmatically generate the C# code for duplicating the web request. You can then modify it to take whatever dynamic input you may need.
If this isn't the route you'd like to take, you should make sure that you are requesting this data with the correct cookies (if applicable) and that you are supplying ALL POST data, no matter how trivial it may seem.
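On the parameter names: a browser submits each input under its name attribute, not its id, so for the input shown in the question the key would be ctl00$MainContent$txtSearchName. Below is a minimal sketch of building the POST body from name/value pairs. The class name is made up, and which other fields the page expects (hidden inputs, for instance) is something you would have to read from the actual form or from a Fiddler capture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs,
    // percent-encoding both names and values.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }
}
```

The resulting string would replace the hand-written "SearchName=Foo&SearchLastName=Bar" value before it is converted to bytes and written to the request stream.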
I am trying to get specific information from a website. Right now I have the HTML as a string: as you can see in my code, the HTML source of the website is placed in "responseText". I know I could do this with if statements, but that would be really tedious. I'm a newbie, so I have no idea what I'm doing here. I'm sure there must be an easier way to retrieve information from a website. This is C# for Windows Store, so I can't use WebClient. This code gets the string, but isn't there a way I can remove the HTML code and only leave the variables or something? I only need this for one web page, and I know which variables I want because I looked at the page's HTML. Isn't there a way to request a list of variables with their information from the website? I'm kind of lost here. Basically, I just want to get specific information from a website in C#; I'm making an app for the Windows Store.
StringBuilder sb = new StringBuilder();
// used on each read operation
byte[] buf = new byte[8192];
// prepare the web page we will be asking for
HttpClient searchClient;
searchClient = new HttpClient();
searchClient.MaxResponseContentBufferSize = 256000;
HttpResponseMessage response = await searchClient.GetAsync(url);
response.EnsureSuccessStatusCode();
responseText = await response.Content.ReadAsStringAsync();
This code gets the string, but isn't there a way I can remove the HTML code and only leave the variables or something?
What "variables"? You get the HTML - that's the response from the web server. If you want to strip that HTML, that's up to you. You might want to use HTML Tidy to make it more pleasant to work with, but the business of extracting relevant information from HTML is up to you. HTML isn't designed to be machine-readable as a raw information source - it's meant to be mark-up to present to humans.
You should investigate whether the information is available in a more machine-friendly source, with no presentation information etc. For example, there may be some way of getting the data as JSON or XML.
I would like to make a console application that can query a website for data, process it, and then display it. For example: access http://data.mtgox.com/ and then display the rate for currency X to currency Y.
I am able to get a large string of text via WebClient and StreamReader (though I don't really understand them), and I imagine that I could trim down the string to what I want and then loop the query with a delay to enable updating of the data without running the program again. However I'm only speculating and it seems like there would be a more efficient way of accessing data than this. Am I missing something?
EDIT - The general consensus seems to be to use JSON to do this, which is exactly what I was looking for! Thanks guys!
It looks like that site provides an API serving up JSON data. If you don't know what JSON is, you need to look into that, but basically you can create object models representing this JSON. If you have the latest version of VS2012, you can copy the JSON and use Edit > Paste Special > Paste JSON As Classes. This will automatically generate models for you. You then contact the API, retrieve the JSON, deserialize it into your models, and do whatever you want from there.
A better way:
string url = "http://data.mtgox.com/";
HttpWebRequest myWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
myWebRequest.Method = "GET"; // This can also be "POST"
HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse();
StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream());
string myPageSource = string.Empty;
myPageSource = myWebSource.ReadToEnd();
myWebResponse.Close();
// Do something with the data in myPageSource
Once you have the data, look at JSON.NET to parse it.
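To give an idea of the parsing step, here is a minimal sketch. Note that it uses the built-in System.Text.Json (part of .NET Core 3.0 and later) rather than JSON.NET, so that it has no external dependency. The JSON shape and the "rate" property name are invented for illustration; the real API's response will look different.

```csharp
using System;
using System.Text.Json;

public static class RateParser
{
    // Reads a numeric "rate" property from a JSON object such as
    // {"pair":"BTCUSD","rate":123.45}. The property names are hypothetical.
    public static decimal ReadRate(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("rate").GetDecimal();
        }
    }
}
```

You would pass it the string held in myPageSource, then repeat the request on a timer if you want the displayed rate to update.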