I am using WinRT and I am trying to parse an HTML page for results.
But to get the results, I have to fill out a search page and hit the submit button.
Is it possible to do that from code in WinRT?
If you find your button using a WinJS query, you can programmatically fire the click event like this:
element.fireEvent("onclick");
I guess you haven't downloaded the page yet (or displayed it in a WebView). To make a request, have a closer look at HttpClient and HttpClientHandler. Depending on whether the page uses GET or POST, you may additionally need to create an HttpRequestMessage. Check the URL of the form (usually the form's action attribute) to find your request URI.
Example:
var ClientHandler = new HttpClientHandler();
ClientHandler.UseCookies = true;
ClientHandler.AllowAutoRedirect = true;
ClientHandler.UseDefaultCredentials = true;
ClientHandler.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
var Client = new HttpClient(ClientHandler);
Client.DefaultRequestHeaders.Add("Accept", "text/html, application/xhtml+xml, */*");
Client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");
var Response = await Client.GetAsync(RequestUri);
Your RequestUri could be something like http://www.example.com/search?query=search. But if the page you want uses POST to submit your query, I think you need to create an HttpRequestMessage as below:
var RequestMessage = new HttpRequestMessage();
RequestMessage.Content = new StringContent(YourPostData, Encoding.UTF8, "application/x-www-form-urlencoded");
RequestMessage.Method = HttpMethod.Post;
RequestMessage.RequestUri = new Uri(OtherRequestUri);
Response = await Client.SendAsync(RequestMessage);
To parse the content of the response, I think the HtmlAgilityPack works best.
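As an example, here's a minimal sketch of parsing the downloaded HTML with the HtmlAgilityPack; the XPath expression is only a placeholder, since the real selector depends on the page you are scraping:
var html = await Response.Content.ReadAsStringAsync();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// Placeholder XPath: select whatever elements actually hold the search results.
var resultNodes = doc.DocumentNode.SelectNodes("//div[@class='result']");
if (resultNodes != null)
{
    foreach (var node in resultNodes)
    {
        System.Diagnostics.Debug.WriteLine(node.InnerText.Trim());
    }
}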
I know that my question looks like a duplicate, but I could not find a helpful solution for my issue.
So I am trying to scrape data from a website that provides cargo ship data: Link (it's a Korean website; the black button on the right is the search button),
but in order to obtain data from it, some radio buttons have to be set and then the search button clicked.
I thought I would be able to just pass the parameter values through FormUrlEncodedContent and then simply use PostAsync, but somehow I could not get them to go through.
Here is my code so far:
using (var client = new HttpClient())
{
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36");
client.DefaultRequestHeaders.TryAddWithoutValidation("Content-Type", "application/x-www-form-urlencoded");
var doc = new HtmlAgilityPack.HtmlDocument();
var content = new FormUrlEncodedContent(structInfo.ScriptValues);
var response = await client.PostAsync(structInfo.PageURL, content);
var responseString = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseString);
}
using (WebClient client = new WebClient())
{
var reqparm = new System.Collections.Specialized.NameValueCollection();
reqparm.Add("v_time", "month");
reqparm.Add("ROCD", "ALL");
reqparm.Add("ORDER", "item2");
reqparm.Add("v_gu", "S");
byte[] responsebytes = client.UploadValues("http://info.bptc.co.kr:9084/content/sw/frame/berth_status_text_frame_sw_kr.jsp", "POST", reqparm);
string responsebody = Encoding.UTF8.GetString(responsebytes);
Console.WriteLine(responsebody);
}
Values I put in the StructInfo class:
PageURL = "http://info.bptc.co.kr:9084/content/sw/frame/berth_status_text_frame_sw_kr.jsp",
ScriptValues = new Dictionary<string, string>
{
{"v_time", "month"},
{"ROCD", "ALL"},
{"ORDER", "item2"},
{"v_gu", "S"}
},
What I have tried so far: HttpClient, WebClient, and WebBrowser, but I had no luck.
The strange thing is that when I send the POST with Burp Suite, the data comes out just the way I wanted.
I've been searching for a solution for the last 4 hours without any luck.
Would you mind helping me?
Thanks
Generated code for C# - RestSharp by Postman
var client = new RestClient("http://info.bptc.co.kr:9084/Berth_status_text_servlet_sw_kr");
client.Timeout = -1;
var request = new RestRequest(Method.POST);
request.AddHeader("Content-Type", "application/x-www-form-urlencoded");
request.AddParameter("v_time", "3days");
request.AddParameter("ROCD", "ALL");
request.AddParameter("ORDER", "item2");
request.AddParameter("v_gu", "S");
IRestResponse response = client.Execute(request);
Console.WriteLine(response.Content);
HttpClient version
using var client = new HttpClient();
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("v_time", "3days"),
new KeyValuePair<string, string>("ROCD", "ALL"),
new KeyValuePair<string, string>("ORDER", "item2"),
new KeyValuePair<string, string>("v_gu", "S"),
});
string url = "http://info.bptc.co.kr:9084/Berth_status_text_servlet_sw_kr";
var response = await client.PostAsync(url, content);
var bytes = await response.Content.ReadAsByteArrayAsync();
string responseString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(responseString);
The issue
Regarding the HttpClient version, I am assuming you are using .NET Core.
The exception is thrown on the ReadAsStringAsync call.
More specifically, here:
https://github.com/microsoft/referencesource/blob/aaca53b025f41ab638466b1efe569df314f689ea/System/net/System/Net/Http/HttpContent.cs#L95
The response has ContentType: text/html; charset=euc-kr.
And the problem is that .NET Core does not support the Korean charset out of the box.
My workaround is to use ReadAsByteArrayAsync instead and then decode with the supported UTF-8 encoder. It mangles the Korean characters, though.
The better way would be to reference the System.Text.Encoding.CodePages package and then use Encoding.RegisterProvider.
Something like this: Encoding.GetEncoding can't work in UWP app
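For completeness, here is a minimal sketch of that better way, reusing the same request as the HttpClient version above; the only assumption is that the System.Text.Encoding.CodePages package is referenced:
// Requires the System.Text.Encoding.CodePages NuGet package.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
using var client = new HttpClient();
var content = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("v_time", "3days"),
    new KeyValuePair<string, string>("ROCD", "ALL"),
    new KeyValuePair<string, string>("ORDER", "item2"),
    new KeyValuePair<string, string>("v_gu", "S"),
});
string url = "http://info.bptc.co.kr:9084/Berth_status_text_servlet_sw_kr";
var response = await client.PostAsync(url, content);
// Decode with the charset the server actually declares (euc-kr) instead of UTF-8.
var bytes = await response.Content.ReadAsByteArrayAsync();
string responseString = Encoding.GetEncoding("euc-kr").GetString(bytes);
Console.WriteLine(responseString);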
I would like to post an HTTP request in order to log in to a website using HttpClient and FormUrlEncodedContent. I can track how the response is supposed to look in my Chrome browser, but when I recreate the POST request made by my browser, I don't get the anticipated response. The website I'm trying to log into is https://www.lectio.dk/lectio/31/login.aspx.
I guess I'm in doubt as to what I should supply the FormUrlEncodedContent with (using the network tab in Chrome I can see the request headers and form data, but how do I decide what to supply?). Currently my code looks like this:
CookieContainer container = new CookieContainer();
HttpClientHandler handler = new HttpClientHandler();
handler.CookieContainer = container;
HttpClient client = new HttpClient(handler);
Console.WriteLine(client.DefaultRequestHeaders + "\n" + "--------------------------");
Dictionary<string, string> vals = new Dictionary<string, string>
{
{"user-agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"},
{"accept-language","en-GB,en-AS;q=0.9,en-DK;q=0.8,en;q=0.7,da-DK;q=0.6,da;q=0.5,en-US;q=0.4"},
{"accept-encoding","gzip, deflate, br"},
{"accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3"},
{"m$Content$username2","username"},
{"m$Content$passwordHidden","password"},
{"__EVENTTARGET","m$Content$submitbtn2"},
};
FormUrlEncodedContent content = new FormUrlEncodedContent(vals);
var response = client.PostAsync("https://www.lectio.dk/lectio/31/login.aspx", content);
var responseString = response.Result;
Console.WriteLine(responseString);
handler.Dispose();
client.Dispose();
The idea is to be logged in to the website (I guess my CookieContainer will take care of that?) so that I can scrape some data.
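One approach I'm considering is a minimal sketch like the following, where the browser headers go on the client and only the form fields go into the FormUrlEncodedContent. The __VIEWSTATE/__EVENTVALIDATION part is an assumption about what an ASP.NET WebForms page expects; I have not confirmed the exact hidden fields for this site:
var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
using var client = new HttpClient(handler);
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept",
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
var formFields = new Dictionary<string, string>
{
    // ASP.NET hidden fields (__VIEWSTATE, __EVENTVALIDATION) would normally be
    // scraped from an initial GET of login.aspx and added here as well.
    { "m$Content$username2", "username" },
    { "m$Content$passwordHidden", "password" },
    { "__EVENTTARGET", "m$Content$submitbtn2" },
};
var response = await client.PostAsync("https://www.lectio.dk/lectio/31/login.aspx",
    new FormUrlEncodedContent(formFields));
string responseString = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseString);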
I want to download a string from a website; I made this PHP file to show an example.
(This doesn't work anywhere on my website.)
The link http://swageh.co/information.php can't be downloaded using a WebClient from any PC.
I prefer using a WebClient.
No matter what I try, DownloadString won't work.
It works fine in a browser.
It returns an error 500:
An unhandled exception of type 'System.Net.WebException' occurred in System.dll. Additional information: The underlying connection was closed: An unexpected error occurred on a send.
Did you change something on the server-side?
All of the following options are working just fine for me as of right now (all return just "false" with StatusCode of 200):
var client = new WebClient();
var stringResult = client.DownloadString("http://swageh.co/information.php");
Also HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.GetResponse().GetResponseStream();
Newer HttpClient:
var client = new HttpClient();
var req = new HttpRequestMessage(HttpMethod.Get, "http://swageh.co/information.php");
var res = client.SendAsync(req);
var stringResult = res.Result.Content.ReadAsStringAsync().Result;
It's because your website is responding with 301 Moved Permanently.
See Get where a 301 URL redirects to.
This shows how to automatically follow the redirect: Using WebClient in C# is there a way to get the URL of a site after being redirected?
Look at Christophe Debove's answer rather than the accepted answer.
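In the spirit of the linked answer (a minimal sketch, not a confirmed fix for this particular site), a WebClient subclass can record the final URI after redirects, so you can see where the 301 actually lands:
// WebClient follows redirects by default; this subclass just records where it ended up.
public class RedirectAwareWebClient : WebClient
{
    public Uri ResponseUri { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        var response = base.GetWebResponse(request);
        ResponseUri = response?.ResponseUri;
        return response;
    }
}
// Usage:
using (var client = new RedirectAwareWebClient())
{
    var body = client.DownloadString("http://swageh.co/information.php");
    Console.WriteLine(client.ResponseUri); // final URL after the redirect
    Console.WriteLine(body);
}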
Interestingly, this doesn't work for me. I tried making the headers the same as Chrome's, as below; perhaps use Telerik Fiddler to see what is happening.
var strUrl = "http://theurl_inhere";
var headers = new WebHeaderCollection();
headers.Add("Accept-Language", "en-US,en;q=0.9");
headers.Add("Cache-Control", "no-cache");
headers.Add("Pragma", "no-cache");
headers.Add("Upgrade-Insecure-Requests", "1");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add( headers );
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
var strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();
There's an ASP.NET website from a third party that requires one to log on. I need to get some data from the website and parse it, so I figured I'd use HttpClient to post the necessary credentials to the website, same as the browser would do it. Then, after that POST request, I figured I'd be able to use the cookie values I received to make further request to the (authorization-only) urls.
I'm down to the point where I can successfully POST the credentials to the login url and receive three cookies: ASP.NET_SessionId, .ASPXAUTH, and a custom value used by the website itself, each with their own values. I figured that since the HttpClient I set up is using an HttpHandler that is using a CookieContainer, the cookies would be sent along with each further request, and I'd remain logged in.
However, this does not appear to be working. If I use the same HttpClient instance to then request one of the secured areas of the website, I'm just getting the login form again.
The code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler() { CookieContainer = cookieContainer, AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate };
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
When using the proper credentials, cookies contains three items, including a .ASPXAUTH cookie and a session id, which suggests that the login succeeded. However:
var text = await client.GetStringAsync("https://some.website/secureaction");
...this just returns the login form again, and not the content I get when I log in using the browser and navigate to /secureaction.
What am I missing?
EDIT: here's the complete request my application is making and the request Chrome is making. They are identical, save for the cookie values. I ran them through WinDiff: the lines marked <! are the lines sent by my application, the ones marked !> are sent by Chrome.
GET https://some.website/secureaction
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept-Encoding: gzip, deflate, sdch, br
Upgrade-Insecure-Requests: 1
Host: some.website
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cookie:
<! customCookie=7CF190C0;
<! .ASPXAUTH=37D61E47(shortened for readability);
<! ASP.NET_SessionId=oqwmfwahpvf0qzpiextx0wtb
!> ASP.NET_SessionId=kn4t4rmeu2lfrgozjjga0z2j;
!> customCookie=8D43E263;
!> .ASPXAUTH=C2477BA1(shortened for readability)
The HttpClient application gets a 302 redirect to /login; Chrome gets a 200 response containing the requested page.
As requested, here's how I eventually made it work. I had to do a simple GET request to /login first, and then do a POST with the login credentials. I don't recall what value exactly is being set by that GET (I assume a cookie with some encoded value the server wants), but the HttpClient takes care of the cookies anyway, so it just works. Here's the final, working code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
CookieContainer = cookieContainer,
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
// First do a GET to the login page, allowing the server to set certain
// required cookie values.
var initialGetRequest = new HttpRequestMessage(HttpMethod.Get, loginUri);
await client.SendAsync(initialGetRequest);
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
I am trying to make a request to an API called Pacer.gov. I'm expecting a file to be returned, but I'm not getting it. Can someone help me with what I'm missing?
So my C# REST call looks like this:
(The variable PacerSession is the authentication cookie I got (with help from @jonathon-reinhart); read more about that here: How do I use RestSharp to POST a login and password to an API?)
var client = new RestClient("https://pcl.uscourts.gov/dquery");
client.CookieContainer = new System.Net.CookieContainer();
//var request = new RestRequest("/dquery", Method.POST);
var request = new RestRequest(Method.POST);
request.AddParameter("download", "1");
request.AddParameter("dl_fmt", "xml");
request.AddParameter("party", "Moncrief");
request.AddHeader("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36");
request.AddHeader("content-type", "text/plain; charset=utf-8");
request.AddHeader("accept", "*/*");
request.AddHeader("accept-encoding", "gzip, deflate, sdch");
request.AddHeader("accept-language", "en-US,en;q=0.8");
request.AddHeader("cookie", "PacerSession=" + PacerSession);
IRestResponse response = client.Execute(request);
If I just type the URL https://pcl.uscourts.gov/dquery?download=1&dl_fmt=xml&party=Moncrief into Chrome, I get back an XML file. When I look at the IRestResponse, I don't see anything that looks like a file. Is there something wrong with my request or am I getting the file back and just need to know how to retrieve it?
Here's part of the file I get back if I use the URL directly in the browser:
Here's what I see in VS when I debug it and look at the IRestResponse variable:
UPDATE - 6/3/16
Received this response from Pacer tech support:
In the Advanced REST Client, you will see a HTTP 302 response (a redirect to another page). In a normal browser, the redirect is automatically followed without the user seeing anything (even on the URL in the browser).
The ARC does not automatically follow that redirect to the target page.
You can see in the header of the response the target URL that has the results.
If you manually cut and paste this URL to the ARC as a HTTP GET request, you will get the XML results. I have never used C#, but there is usually a property associated with web clients that will force the client to follow the redirect.
I tried adding this:
client.FollowRedirects = true;
but I'm still not seeing an XML file when I debug this code:
IRestResponse response = client.Execute(request);
How do I get the file? Is there something I have to do to get the file from the URL it's being redirected to?
There's one major problem with your code. You're only carrying one of the three cookies that check-pacer-passwd.pl returns. You need to preserve all three. The following code is a possible implementation of this, with some notes afterwards.
public class PacerClient
{
private CookieContainer m_Cookies = new CookieContainer();
public string Username { get; set; }
public string Password { get; set; }
public PacerClient(string username, string password)
{
this.Username = username;
this.Password = password;
}
public void Connect()
{
var client = new RestClient("https://pacer.login.uscourts.gov");
client.CookieContainer = this.m_Cookies;
RestRequest request = new RestRequest("/cgi-bin/check-pacer-passwd.pl", Method.POST);
request.AddParameter("loginid", this.Username);
request.AddParameter("passwd", this.Password);
IRestResponse response = client.Execute(request);
if (response.Cookies.Count < 1)
{
throw new WebException("No cookies returned.");
}
}
public XmlDocument SearchParty(string partyName)
{
string requestUri = $"/dquery?download=1&dl_fmt=xml&party={partyName}";
var client = new RestClient("https://pcl.uscourts.gov");
client.CookieContainer = this.m_Cookies;
var request = new RestRequest(requestUri);
IRestResponse response = client.Execute(request);
if (!String.IsNullOrEmpty(response.Content))
{
XmlDocument result = new XmlDocument();
result.LoadXml(response.Content);
return result;
}
else return null;
}
}
It's easiest to just keep a hold of the CookieContainer throughout the entire time you're working with Pacer. I wrapped the functionality into a class, just to make it a little easier to package up with this answer, but you can implement it however you want. I didn't put in any real error checking, so you probably want to check that response.ResponseUri is actually the search page and not the logon page, and that the content is actually well-formed XML.
I've tested this using my own Pacer account, like so:
PacerClient client = new PacerClient(Username, Password);
client.Connect();
var document = client.SearchParty("Moncrief");
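To illustrate the error checking mentioned above, here is a minimal sketch of how the body of SearchParty could guard against being bounced to the logon page or receiving malformed XML; the "login" path check is only an assumption about what Pacer's logon URL looks like:
IRestResponse response = client.Execute(request);
// Guard 1: if we were redirected back to a logon page, the session cookies are missing or expired.
if (response.ResponseUri != null &&
    response.ResponseUri.AbsolutePath.IndexOf("login", StringComparison.OrdinalIgnoreCase) >= 0)
{
    throw new WebException("Redirected to the logon page; the session cookies are probably missing or expired.");
}
// Guard 2: make sure the content is actually well-formed XML before returning it.
if (String.IsNullOrEmpty(response.Content)) return null;
XmlDocument result = new XmlDocument();
try
{
    result.LoadXml(response.Content);
}
catch (XmlException)
{
    return null; // Not XML, e.g. an HTML error page.
}
return result;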