How can I pull data from a website using C#?

How can I get data from a web page into my application?

You can replicate the request the website makes to get a list of relevant numbers. The following code might be a good start.
using System;
using System.IO;
using System.Net;

var httpRequest = (HttpWebRequest) WebRequest.Create("<url>");
httpRequest.Method = "POST";
httpRequest.Accept = "application/json";
httpRequest.ContentType = "application/json"; // tell the server the body is JSON
string postData = "{<json payload>}";
using (var streamWriter = new StreamWriter(httpRequest.GetRequestStream())) {
    streamWriter.Write(postData);
}
string result;
using (var httpResponse = (HttpWebResponse) httpRequest.GetResponse())
using (var streamReader = new StreamReader(httpResponse.GetResponseStream())) {
    result = streamReader.ReadToEnd();
}
Console.WriteLine(result);
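On current .NET, HttpWebRequest is marked obsolete in favor of HttpClient. Below is a minimal sketch of the same JSON POST with HttpClient, assuming <url> and <json payload> are filled in as described in the steps that follow. The class name JsonPost is only illustrative. Building the request message in a separate method makes it easy to check the headers and body before anything is sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class JsonPost
{
    // Builds the POST request without sending it, so it can be inspected first.
    public static HttpRequestMessage Build(string url, string json)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            // StringContent sets "Content-Type: application/json; charset=utf-8".
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }

    // Sends the request and returns the response body as a string.
    // EnsureSuccessStatusCode throws HttpRequestException on 4xx/5xx responses.
    public static async Task<string> SendAsync(HttpClient client, string url, string json)
    {
        using var request = Build(url, json);
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Create one HttpClient and reuse it for all requests rather than creating a new one per call. For example: var result = await JsonPost.SendAsync(client, "<url>", "{<json payload>}");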
Now, for the <url> and <json payload> values:
Open the web inspector in your browser.
Go to the Network tab.
Set it so Fetch/XHR/AJAX requests are shown.
Refresh the page.
Look for a request that you want to replicate.
Copy the request URL.
Copy the payload (the JSON data). To use it in a regular C# string literal, put a \ before every " (or use a verbatim string @"...", in which every " is written as "").
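The escaping in the last step can be done in several equivalent ways. This sketch writes the same payload as a regular literal with \", as a verbatim literal with doubled quotes, and by serializing an object with System.Text.Json (built into .NET Core 3.0 and later). The field names query and page are placeholders, not the real payload of any particular site.

```csharp
using System.Text.Json;

public static class PayloadExamples
{
    // Regular string literal: every " inside the JSON becomes \"
    public const string Escaped = "{\"query\":\"example\",\"page\":1}";

    // Verbatim string literal: every " inside the JSON is doubled ("")
    public const string Verbatim = @"{""query"":""example"",""page"":1}";

    // Let the serializer produce the JSON instead of hand-escaping it.
    public static string Serialized() =>
        JsonSerializer.Serialize(new { query = "example", page = 1 });
}
```

The serializer option is the least error-prone when the payload contains values that come from user input.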
Side note: the owner of the website you are sending automated requests to might not be happy about your tool, and it might get blocked if it makes too many requests in a short time.

Related

Scrape data from web page with HtmlAgilityPack c#

I had a problem scraping data from a web page, and I got a solution here:
Scrape data from web page that using iframe c#
My problem is that they have changed the web page, which is now https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes, and I cannot get any data back.
Can someone tell me whether I can use the same method to get data from this page, or should I use a different approach?
I don't know how @coder_b found that I should use https://portal.thpa.gr/fnet5/track/index.php as the web page, and that I should use
var reqUrlContent =
hc.PostAsync(url,
new StringContent($"d=1&containerCode={reference}&go=1", Encoding.UTF8,
"application/x-www-form-urlencoded"))
.Result;
to pass the variables
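Calling .Result on PostAsync blocks the calling thread and can deadlock in UI or classic ASP.NET code. Below is a minimal sketch of the same form POST using await and FormUrlEncodedContent, which URL-encodes each value for you. The field names d, containerCode and go are copied from the snippet above. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContainerTrack
{
    // Produces the same body as $"d=1&containerCode={reference}&go=1",
    // but with the container code URL-encoded.
    public static FormUrlEncodedContent BuildForm(string reference) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("d", "1"),
            new KeyValuePair<string, string>("containerCode", reference),
            new KeyValuePair<string, string>("go", "1"),
        });

    // Posts the form and returns the HTML without blocking the calling thread.
    public static async Task<string> PostAsync(HttpClient client, string url, string reference)
    {
        using var content = BuildForm(reference);
        using var response = await client.PostAsync(url, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```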
EDIT: When I check the web page, there is an input that contains the number:
<input type="text" id="report_container_containerno"
       name="report_container[containerno]" required="required"
       class="form-control" minlength="11" maxlength="11"
       placeholder="E/K για αναζήτηση" value="ARKU2215462">
Can I use HtmlAgilityPack to submit this value? Then it should be easy to read the result.
Also, when I check the DocumentNode, it seems to show me the cookie-consent page that I have to accept.
Can I bypass it or accept the cookies automatically?
Try this:
// Requires: using System; using System.IO; using System.Net; using System.Text;
public static string Download(string search)
{
    var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");
    // URL-encoded form fields report_container[containerno] and report_container[search]
    var postData = string.Format("report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=",
        Uri.EscapeDataString(search));
    var data = Encoding.ASCII.GetBytes(postData);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
Usage:
var html = Download("ARKU2215462");
UPDATE
To find which POST parameters to use, press F12 in the browser to open the developer tools, then select the Network tab. Now fill in the search input with ARKU2215462 and press the button.
That sends a request to the server and gets the response, and in the Network tab you can inspect both the request and the response. There are lots of requests (styles, scripts, images...), but you want the HTML page, which here is the POST request to the track URL.
Its Form Data section lists the fields that were sent. If you click "view source", you get the data encoded as "report_container%5Bcontainerno%5D=ARKU2215462&report_container%5Bsearch%5D=", which is what you need in your code.
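The encoded string above consists of the form field names report_container[containerno] and report_container[search], with [ and ] percent-encoded as %5B and %5D. The sketch below lets FormUrlEncodedContent produce that string from the readable names, so you don't have to hand-encode the keys. It assumes a modern .NET runtime (.NET Core or .NET 5+), where the square brackets in keys are percent-encoded. The class name ThpaForm is hypothetical.

```csharp
using System.Collections.Generic;
using System.Net.Http;

public static class ThpaForm
{
    // Field names exactly as they appear in the HTML form's name attributes.
    // The brackets are percent-encoded, giving the same body as
    // "report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=".
    public static FormUrlEncodedContent Build(string containerNo) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("report_container[containerno]", containerNo),
            new KeyValuePair<string, string>("report_container[search]", ""),
        });
}
```

You can then pass the content to HttpClient.PostAsync together with the track URL.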

C#: HttpWebRequest POST data not working

I am developing a C# WPF application that logs into my website and downloads a file. The website has an [Authorize] attribute on the download action. I need two cookies to download the file: the first one is for logging in, and the second one (provided after a successful login) is for downloading the file. So I came up with a flow that keeps my cookies across my HttpWebRequest/HttpWebResponse calls. I suspect my POST flow may be the problem. Here is my code.
void externalloginanddownload()
{
    string pageSource = string.Empty;
    CookieContainer cookies = new CookieContainer();

    HttpWebRequest getrequest = (HttpWebRequest)WebRequest.Create("login uri");
    getrequest.CookieContainer = cookies;
    getrequest.Method = "GET";
    getrequest.AllowAutoRedirect = false;
    HttpWebResponse getresponse = (HttpWebResponse)getrequest.GetResponse();
    using (StreamReader sr = new StreamReader(getresponse.GetResponseStream()))
    {
        pageSource = sr.ReadToEnd();
    }

    var values = new NameValueCollection
    {
        {"Username", "username"},
        {"Password", "password"},
        {"Remember me?", "False"},
    };
    var parameters = new StringBuilder();
    foreach (string key in values.Keys)
    {
        parameters.AppendFormat("{0}={1}&",
            HttpUtility.UrlEncode(key),
            HttpUtility.UrlEncode(values[key]));
    }
    parameters.Length -= 1; // drop the trailing '&'

    HttpWebRequest postrequest = (HttpWebRequest)WebRequest.Create("login uri");
    postrequest.CookieContainer = cookies;
    postrequest.Method = "POST";
    using (var writer = new StreamWriter(postrequest.GetRequestStream()))
    {
        writer.Write(parameters.ToString());
    }
    using (WebResponse response = postrequest.GetResponse()) // the error 500 occurs here
    {
        using (var streamReader = new StreamReader(response.GetResponseStream()))
        {
            string html = streamReader.ReadToEnd();
        }
    }
}
When you get the WebResponse, the returned cookies are in the response, not in the request (oddly enough, even though you need to set the CookieContainer on the request).
You will need to add the cookies from the response object to your CookieContainer so that they get sent on the next request.
One simple way:
foreach (Cookie cookie in getresponse.Cookies)
    cookies.Add(cookie);
Since getresponse.Cookies is already a CookieCollection, you can also add it in one call (checking for null first does no harm):
if (getresponse.Cookies != null) cookies.Add(getresponse.Cookies);
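With HttpClient, the usual approach is to give a single HttpClientHandler a shared CookieContainer. Cookies set by the login response are then stored in that container and sent back automatically on later requests to the same site. The sketch below shows that round trip offline with CookieContainer.SetCookies and GetCookieHeader. The URL and the cookie name are made up.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class CookieDemo
{
    // One handler and one container for the whole login/download session.
    public static HttpClient CreateClient(CookieContainer cookies) =>
        new HttpClient(new HttpClientHandler
        {
            CookieContainer = cookies, // Set-Cookie headers from responses are stored here
            UseCookies = true,         // and sent back on later requests
            AllowAutoRedirect = false  // matches the question's GET request
        });

    // Simulates what the handler does: store a Set-Cookie header value,
    // then build the Cookie header for the next request to the same site.
    public static string RoundTrip(Uri site, string setCookieHeader)
    {
        var cookies = new CookieContainer();
        cookies.SetCookies(site, setCookieHeader);
        return cookies.GetCookieHeader(site);
    }
}
```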
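You may also have trouble with your POST, because you need to set the ContentType and the content length (myWebRequest here stands for your postrequest):
myWebRequest.ContentType = "application/x-www-form-urlencoded";
myWebRequest.ContentLength = parameters.Length;
myWebRequest.AllowWriteStreamBuffering = true;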
If the body can contain multibyte characters, you will have to handle that as well: encode the string as UTF-8, convert it to bytes, and use the byte count as the length.
Another tip: some web server code chokes if there is no user agent. Try:
myWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
For the multibyte case, it is safer to do this:
var databytes = System.Text.Encoding.UTF8.GetBytes(parameters.ToString());
myWebRequest.ContentLength = databytes.Length;
myWebRequest.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
using (var stream = myWebRequest.GetRequestStream())
{
    stream.Write(databytes, 0, databytes.Length);
}
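To see why the length must come from the encoded bytes rather than from parameters.Length, compare the two counts for a value that contains a non-ASCII character. Note that the question already passes each value through HttpUtility.UrlEncode, which turns such characters into ASCII %xx escapes. The two counts therefore only diverge for a body that is not fully encoded. The sample string is arbitrary.

```csharp
using System;
using System.Text;

public static class BodyLength
{
    // Number of UTF-16 chars: what string.Length and StringBuilder.Length report.
    public static int CharCount(string body) => body.Length;

    // Number of bytes actually written as UTF-8: what ContentLength must match.
    public static int Utf8ByteCount(string body) => Encoding.UTF8.GetByteCount(body);

    // Percent-encoding a value first leaves only ASCII, so both counts agree.
    public static string EncodeValue(string value) => Uri.EscapeDataString(value);
}
```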
In the server-side application (the Web API), open Visual Studio's Exception Settings (Ctrl+Alt+E) and enable C++ Exceptions and Common Language Runtime Exceptions, so you can see which exception the server actually throws.
First check that the posted data binds properly; then you can see the exact exception. A 500 Internal Server Error is most often caused by data in the wrong format or by an exception that is not handled properly.

access site and get a search result

I need to access www.skyscanner.com from a console application, run a search, and get the result.
I tried:
var url = @"https://www.skyscanner.com";
var webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.Method = "POST";
using (var streamWriter = new StreamWriter(webRequest.GetRequestStream()))
{
    var httpResponsee = (HttpWebResponse)webRequest.GetResponse();
    using (var streamReader = new StreamReader(httpResponsee.GetResponseStream()))
    {
        var response = streamReader.ReadToEnd();
    }
}
But I get a 500 error. How can I access the site, run a search, and get the result?
Thanks
It doesn't look like your POST request has any payload. Skyscanner's server is probably expecting the itinerary details and since it is not getting them it is throwing the 500 error.
I have to add that what you are doing is not the proper way to interact with Skyscanner's service. They have an official API available for which you will need to register and get an API key here: http://business.skyscanner.net/portal/en-GB/AffiliateNetwork
You will then be able to make your application send requests to http://partners.api.skyscanner.net/apiservices/pricing/v1.0, as documented here:
http://business.skyscanner.net/portal/en-GB/Documentation/FlightsLivePricingList

Can't access Web of Trust (WoT) API w/ JSON.Net

I'm new to JSON & am using VS 2013/C#. Here's the code for the request & response. Pretty straightforward, no?
Request request = new Request();
//request.hosts = ListOfURLs();
request.hosts = "www.cnn.com/www.cisco.com/www.microsoft.com/";
request.callback = "process";
request.key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
string output = JsonConvert.SerializeObject(request);
//string test = "hosts=www.cnn.com/www.cisco.com/www.microsoft.com/&callback=process&key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
try
{
    var httpWebRequest = (HttpWebRequest) WebRequest.Create("http://api.mywot.com/0.4/public_link_json2?");
    httpWebRequest.ContentType = "application/json";
    httpWebRequest.Method = "POST";
    using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
    {
        string json = output;
        streamWriter.Write(json);
    }
    var httpResponse = (HttpWebResponse) httpWebRequest.GetResponse();
    using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
    {
        var responseText = streamReader.ReadToEnd();
    }
}
catch (WebException e)
{
    MessageBox.Show(e.ToString());
}
//response = true.
//no response = false
return true;
} // end of the enclosing method (its signature is not shown)
When I run this, I get a 405 error indicating method not allowed.
It seems to me that there are at least two possible problems here: (1) The WoT API (www.mywot.com/wiki/API) requires a GET request w/ a body, & httpWebRequest doesn't allow a GET in the httpWebRequest.Method; or (2) the serialized string isn't serialized properly.
NOTE: In the following I've had to remove the leading "http://" since I don't have enough rep to post more than 2 links.
It should look like:
api.mywot.com/0.4/public_link_json2?hosts=www.cnn.com/www.cisco.com/www.microsoft.com/&callback=process&key=xxxxxxxxxxxxxx
but instead looks like:
api.mywot.com/0.4/public_link_json2?{"hosts":"www.cnn.com/www.cisco.com/www.microsoft.com/","callback":"process","key":"xxxxxxxxxxxxxxxxxxx"}.
If I browse to:api.mywot.com/0.4/public_link_json2?hosts=www.cnn.com/www.cisco.com/www.microsoft.com/&callback=process&key=xxxxxxxxxxxxxx; I get the expected response.
If I browse to: api.mywot.com/0.4/public_link_json2?{"hosts":"www.cnn.com/www.cisco.com/www.microsoft.com/","callback":"process","key":"xxxxxxxxxxxxxxxxxxx"}; I get a 403 denied error.
If I hardcode the request and send it as a GET like below, it also works as expected:
var httpWebRequest = (HttpWebRequest) WebRequest.Create("http://api.mywot.com/0.4/public_link_json2?" + test);
I'd appreciate any help w/ this & hope I've made the problem clear. Thx.
Looks to me like the problem is that you are sending JSON in the URL. According to the API doc that you referenced, the API is expecting regular URL encoded parameters (not JSON), and it will return JSON to you in the body of the response:
Requests
The API consists of a number of interfaces, all of which are called using normal HTTP GET requests to api.mywot.com and return a response in XML or JSON format if successful. HTTP status codes are used for returning error information and parameters are passed using standard URL conventions. The request format is as follows:
http://api.mywot.com/version/interface?param1=value1&param2=value2
You should not be serializing your request; you should be deserializing the response. All of your tests above bear this out.
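Here is a minimal sketch of that advice. It builds the query string from name/value pairs, sends a plain GET with no body, and returns the JSON text for deserializing. The parameter names come from the question. Dropping callback is an assumption: it appears to request a JSONP wrapper such as process(...), which is not plain JSON. The class name QueryUrl is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class QueryUrl
{
    // Appends name=value pairs to baseUrl, percent-encoding each name and value.
    public static string Build(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return baseUrl + "?" + query;
    }

    // Plain GET with no request body; the JSON comes back in the response body.
    public static Task<string> GetJsonAsync(HttpClient client, string url) =>
        client.GetStringAsync(url);
}
```

The returned string can then be passed to JsonConvert.DeserializeObject (JSON.NET, as in the question). Each / inside hosts is sent as %2F, which a standard server decodes back to /. If the API rejects it, leave that value unescaped.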

Problems consuming WebService in .Net (ReCaptcha)

I am having difficulty consuming the reCAPTCHA web service using C#/.NET 3.5, although I think the problem lies with consuming web services in general.
String validate = String.Format("http://api-verify.recaptcha.net/verify?privatekey={0}&remoteip={1}&challenge={2}&response={3}", PrivateKey, UserIP, Challenge, Response);
WebClient serviceRequest = new WebClient();
serviceRequest.Headers.Add("ContentType", "application/x-www-form-urlencoded");
String response = serviceRequest.DownloadString(new Uri(validate));
It keeps telling me that the error is verify-params-incorrect, which means:
The parameters to /verify were incorrect, make sure you are passing all the required parameters.
But it's correct. I am using the private key, the IP address (locally) is 127.0.0.1, and the challenge and response seem fine. However the error keeps occurring.
I am pretty sure this is an issue with how I am requesting the service, as this is the first time I have actually used web services and .NET.
I also tried this as it ensures the data is posted:
String queryString = String.Format("privatekey={0}&remoteip={1}&challenge={2}&response={3}",PrivateKey, UserIP, Challenge, Response);
String Validate = "http://api-verify.recaptcha.net/verify" + queryString;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(Validate));
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = Validate.Length;
HttpWebResponse captchaResponse = (HttpWebResponse)request.GetResponse(); // stalls here
String response;
using (StreamReader reader = new StreamReader(captchaResponse.GetResponseStream()))
response = reader.ReadToEnd();
It seems to stall at the point where I get the response.
Any advice?
Thanks in advance
Haven't worked with the recaptcha service previously, but I have two troubleshooting recommendations:
Use Fiddler or Firebug and watch what you're sending outbound. Verifying your parameters would help you with basic troubleshooting, e.g. spotting invalid characters.
The reCAPTCHA wiki has an entry about dealing with development on Vista. It doesn't have to be limited to Vista, though: if your system can handle IPv6, your browser could be communicating over IPv6 by default, whereas reCAPTCHA appears to deal with IPv4. Having Fiddler/Firebug working would also tell you about other parameters that could be causing you grief.
This may not help solve your problem but it might provide you with better troubleshooting info.
So I got this working. For some reason I needed to write the request data to the request stream like so:
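//Write data to request stream
// byteData: the form body as bytes (its definition is assumed here; it was not shown)
byte[] byteData = Encoding.UTF8.GetBytes(queryString);
using (Stream requestStream = request.GetRequestStream())
    requestStream.Write(byteData, 0, byteData.Length);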
Could anyone explain why this works? I didn't think I would need to do this, and I don't completely understand what's happening behind the scenes.
Damien's answer is correct of course, but just to be clear about the order of things (I was a little confused) and to have a complete code sample...
var uri = new Uri("http://api-verify.recaptcha.net/verify");
var queryString = string.Format(
    "privatekey={0}&remoteip={1}&challenge={2}&response={3}",
    Uri.EscapeDataString(privateKey),
    Uri.EscapeDataString(userIP),
    Uri.EscapeDataString(challenge),
    Uri.EscapeDataString(response));
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = WebRequestMethods.Http.Post;
request.ContentLength = queryString.Length; // ASCII after escaping, so chars == bytes
request.ContentType = "application/x-www-form-urlencoded";
using (var writer = new StreamWriter(request.GetRequestStream()))
{
    writer.Write(queryString);
}
string result;
using (var webResponse = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(webResponse.GetResponseStream()))
{
    result = reader.ReadToEnd();
}
There's a slight difference in that I'm writing the post variables to the request, but the core of it is the same.
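As for why writing to the request stream was needed: a POST carries its data in the request body. The earlier attempt set ContentLength but never wrote those bytes, so the request most likely sat waiting for a body that never arrived. With HttpClient, the body is part of the request object, so it cannot be forgotten. Below is a sketch of the same verification call using the legacy api-verify.recaptcha.net/verify endpoint and the field names from the answer above. That endpoint belongs to the old reCAPTCHA version and may no longer be available. The class name CaptchaVerify is hypothetical.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class CaptchaVerify
{
    // The four fields go in the POST body (not the URL); values are URL-encoded.
    public static FormUrlEncodedContent BuildBody(
        string privateKey, string userIp, string challenge, string response) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("privatekey", privateKey),
            new KeyValuePair<string, string>("remoteip", userIp),
            new KeyValuePair<string, string>("challenge", challenge),
            new KeyValuePair<string, string>("response", response),
        });

    public static async Task<string> VerifyAsync(
        HttpClient client, string privateKey, string userIp, string challenge, string response)
    {
        using var body = BuildBody(privateKey, userIp, challenge, response);
        // Content-Type and Content-Length are derived from the content object.
        using var result = await client.PostAsync("http://api-verify.recaptcha.net/verify", body);
        return await result.Content.ReadAsStringAsync();
    }
}
```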
