Scrape data from web page with HtmlAgilityPack C#

I previously had a problem scraping data from a web page, and I got a solution here:
Scrape data from web page that using iframe c#
My problem is that they have changed the web page, which is now https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes anymore, and I cannot get any data back.
Can someone tell me whether I can use the same method to get data from this web page, or whether I should use a different approach?
I don't know how #coder_b found out that I should use https://portal.thpa.gr/fnet5/track/index.php as the web page, and that I should use
var reqUrlContent =
hc.PostAsync(url,
new StringContent($"d=1&containerCode={reference}&go=1", Encoding.UTF8,
"application/x-www-form-urlencoded"))
.Result;
to pass the variables.
EDIT: When I inspect the web page, there is an input that contains the number (the Greek placeholder reads "E/K for search"):
<input type="text" id="report_container_containerno" name="report_container[containerno]" required="required" class="form-control" minlength="11" maxlength="11" placeholder="E/K για αναζήτηση" value="ARKU2215462">
Can I use something in HtmlAgilityPack to pass this value? Then it should be easy to read the result.
Also, when I check the DocumentNode, it seems to show me the cookie consent page that I have to accept.
Can I bypass it, or accept the cookies automatically?

Try this:
using System;
using System.IO;
using System.Net;
using System.Text;
public static string Download(string search)
{
    var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");
    // Same form body the browser sends; the container number is escaped in case it has special characters.
    var postData = string.Format("report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=", Uri.EscapeDataString(search));
    var data = Encoding.ASCII.GetBytes(postData);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var stream = new StreamReader(response.GetResponseStream()))
    {
        return stream.ReadToEnd();
    }
}
Usage:
var html = Download("ARKU2215462");
UPDATE
To find the POST parameters to use, press F12 in the browser to open the developer tools, then select the Network tab. Now fill the search input with ARKU2215462 and press the button.
That sends a request to the server, and in the Network tab you can inspect both the request and the response. There are lots of requests (styles, scripts, images...), but you want the HTML pages. In this case, look at the POST request for the track page.
Its form data is what was posted. If you click "view source", you get the data encoded like "report_container%5Bcontainerno%5D=ARKU2215462&report_container%5Bsearch%5D=", which is exactly what your code needs.
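A minimal sketch of the same POST with HttpClient, which the question's older snippet already used. It assumes the field names shown in the UPDATE are still the ones the site expects, and it keeps one CookieContainer for the whole session so that cookies set by the server are sent back on later requests. Whether that gets past the cookie consent page is an assumption: consent banners are often drawn by client-side script, in which case the banner markup may simply be present in the HTML without blocking the data. ThpaTracker is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ThpaTracker
{
    // URL and field names are taken from the answer; the site may change them again.
    private const string TrackUrl = "https://webportal.thpa.gr/ctreport/container/track";

    // One handler with a CookieContainer, so cookies the server sets are replayed on later requests.
    private static readonly HttpClient Client = new HttpClient(
        new HttpClientHandler { CookieContainer = new CookieContainer() });

    // Builds the same form body the browser sends; FormUrlEncodedContent escapes [ and ] as %5B and %5D.
    public static FormUrlEncodedContent BuildTrackForm(string containerNo) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("report_container[containerno]", containerNo),
            new KeyValuePair<string, string>("report_container[search]", "")
        });

    // Posts the form and returns the raw HTML of the result page.
    public static async Task<string> DownloadAsync(string containerNo)
    {
        using (var content = BuildTrackForm(containerNo))
        using (var response = await Client.PostAsync(TrackUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Usage: var html = await ThpaTracker.DownloadAsync("ARKU2215462"); the returned string can then be loaded into an HtmlAgilityPack HtmlDocument with LoadHtml(html).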

Related

How can I pull data from website using C#

Web-page data into the application
You can replicate the request the website makes to get a list of relevant numbers. The following code might be a good start.
var httpRequest = (HttpWebRequest) WebRequest.Create("<url>");
httpRequest.Method = "POST";
httpRequest.Accept = "application/json";
httpRequest.ContentType = "application/json"; // the body written below is JSON, so declare it
string postData = "{<json payload>}";
using (var streamWriter = new StreamWriter(httpRequest.GetRequestStream())) {
streamWriter.Write(postData);
}
var httpResponse = (HttpWebResponse) httpRequest.GetResponse();
string result;
using (var streamReader = new StreamReader(httpResponse.GetResponseStream())) {
result = streamReader.ReadToEnd();
}
Console.WriteLine(result);
Now, for the <url> and <json payload> values:
Open the web inspector in your browser.
Go to the Network tab.
Set it so Fetch/XHR/AJAX requests are shown.
Refresh the page.
Look for a request that you want to replicate.
Copy the request URL.
Copy the payload (the JSON data; to use it in a C# string literal, you'll have to put a \ before every ").
Side note: The owner of the website you are making automated requests to might not be very happy about your tool, and you/it might be blocked if it makes too many requests in a short time.
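The same request can be sketched with HttpClient: serializing the payload with System.Text.Json removes the need to escape quotes by hand, and StringContent sends the Content-Type: application/json header that JSON endpoints usually check. JsonReplay is a hypothetical helper, and the query/page property names in the usage line are placeholders for whatever you copy from the Network tab.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class JsonReplay
{
    private static readonly HttpClient Client = new HttpClient();

    // Serializes the payload using its runtime type and labels it as UTF-8 JSON.
    public static StringContent ToJsonContent(object payload) =>
        new StringContent(JsonSerializer.Serialize(payload, payload.GetType()),
            Encoding.UTF8, "application/json");

    // Replays the request found in the Network tab and returns the response body.
    public static async Task<string> PostJsonAsync(string url, object payload)
    {
        using (var content = ToJsonContent(payload))
        using (var response = await Client.PostAsync(url, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Usage: var result = await JsonReplay.PostJsonAsync("<url>", new { query = "...", page = 1 }); an anonymous object (or a small class) mirrors the copied payload without any manual escaping.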

Retrieving website content and returning it in ASP.NET MVC 4

I have two servers. One is a private server that I don't want users to access directly, and the other is the server that the public does have access to.
I can access my private server by URL like: http://xxx.xx.xxx.xxx/
What I want to do is create some kind of "proxy" that works only with my private server. My idea is to go to: http://www.domain.com/server/path/here/something
This page should show me the content of http://xxx.xx.xxx.xxx/path/here/something
I have this working, but the only way I could make it work was to return the content as a string, and then the browser would interpret the HTML.
This works fine for pages that return HTML content, but it doesn't work (of course) if I want to access a .gif or any kind of file directly.
Here's the code I currently have:
public string Index(string url)
{
string uri = "http://xxx.xx.xxx.xxx/" + url;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "GET";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader responseStream = new StreamReader(response.GetResponseStream());
string resultado = responseStream.ReadToEnd();
return resultado;
}
How can I change my code so that it works for any file?
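You can check the response content type and act on it.
You'll need to change your action to return ActionResult instead of string, and do the check before calling ReadToEnd(), because the response body can only be read once. Note that ContentType usually carries a charset (e.g. "text/html; charset=utf-8"), so compare the start of the string rather than the whole value:
if (response.ContentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
{
    // HTML: read the body as text and return it as a page
    string resultado = responseStream.ReadToEnd();
    return Content(resultado, "text/html");
}
else if (response.ContentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
{
    // Images: copy the raw bytes and return them with the original content type
    var ms = new MemoryStream();
    responseStream.BaseStream.CopyTo(ms);
    return File(ms.ToArray(), response.ContentType);
}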
You have to write logic that handles the HTML or image content according to its type, and you also need to validate the incoming URL.
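A sketch of an "any file" version, assuming the action can be made asynchronous: download the raw bytes together with the Content-Type header and pass both through unchanged, so a .gif comes back as an image and an HTML page as a page. PrivateProxy, BuildTarget, FetchAsync and the http://private.example/ base address are hypothetical stand-ins for the question's private server; BuildTarget is one way to validate the incoming URL, by refusing anything that resolves outside that server.

```csharp
using System;
using System.Net.Http;
using System.Net.Mime;
using System.Threading.Tasks;

public static class PrivateProxy
{
    // Hypothetical address standing in for http://xxx.xx.xxx.xxx/ in the question.
    private static readonly Uri PrivateBase = new Uri("http://private.example/");
    private static readonly HttpClient Client = new HttpClient();

    // Resolves the requested path against the private server and rejects absolute
    // or protocol-relative inputs that would point at another host.
    public static Uri BuildTarget(string path)
    {
        var target = new Uri(PrivateBase, path ?? string.Empty);
        if (!PrivateBase.IsBaseOf(target))
            throw new ArgumentException("The path must stay on the private server.", nameof(path));
        return target;
    }

    // True when a Content-Type header describes HTML, ignoring parameters such as charset.
    public static bool IsHtml(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return false;
        return string.Equals(new ContentType(contentType).MediaType, "text/html",
            StringComparison.OrdinalIgnoreCase);
    }

    // Downloads the raw body and its content type; nothing is decoded as text,
    // so binary files survive unchanged.
    public static async Task<(byte[] Body, string Type)> FetchAsync(string path)
    {
        using (var response = await Client.GetAsync(BuildTarget(path)))
        {
            response.EnsureSuccessStatusCode();
            byte[] body = await response.Content.ReadAsByteArrayAsync();
            string type = response.Content.Headers.ContentType?.ToString() ?? "application/octet-stream";
            return (body, type);
        }
    }
}
```

In the controller this could look like public async Task<ActionResult> Index(string url) { var (body, type) = await PrivateProxy.FetchAsync(url); return File(body, type); }. IsHtml is only needed if you still want to treat HTML pages differently, as the answer's if/else does.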

How to post to asp.net validation required page with C# and read response

I am writing my own specific product crawler. There is a product-selling website that uses POST data for its pages. I really need to be able to post data and read the response, but they are using ASP.NET validation and it is very confusing. I really could not figure out how to properly post data and read the response. I am using HtmlAgilityPack. If it is possible to post data with HtmlAgilityPack and read the response, that would be really awesome.
Now this is the example page : http://www.hizlial.com/HizliListele.aspx?CatID=482643
When you open the page, look at the element with the class "urun_listele".
You will see these options there:
20 Ürün Listele
40 Ürün Listele
60 Ürün Listele
Tümünü Listele
Those numbers are the product counts to be displayed; "Tümünü listele" means list all products. I really need to post data and get all of the products in that product category. I used Firebug to debug and tried the code below, but I still get the default number of products.
private void button11_Click(object sender, RoutedEventArgs e)
{
StringBuilder srBuilder = new StringBuilder();
AppendPostParameter(srBuilder, "ctl00$ContentPlaceHolder1$cmbUrunSayi", "full");
srBuilder = srBuilder.Replace("&", "", srBuilder.Length - 1, 1);
byte[] byteArray = Encoding.UTF8.GetBytes(srBuilder.ToString());
HttpWebRequest hWebReq = (HttpWebRequest)WebRequest.Create("http://www.hizlial.com/HizliListele.aspx?CatID=482643");
hWebReq.Method = "POST";
hWebReq.ContentType = "application/x-www-form-urlencoded";
using (Stream requestStream = hWebReq.GetRequestStream())
{
requestStream.Write(byteArray, 0, byteArray.Length);
}
HtmlDocument hd = new HtmlDocument();
using (HttpWebResponse response = (HttpWebResponse)hWebReq.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
var htmlstring = sr.ReadToEnd();
}
}
}
static private void AppendPostParameter(StringBuilder sb, string name, string value)
{
sb.AppendFormat("{0}={1}&", name, HttpUtility.UrlEncode(value));
}
After I get the data, I will load it into the HtmlAgilityPack HtmlDocument.
Any help is appreciated.
C# 4.0, WPF application, HtmlAgilityPack
ASP.NET uses the __EVENTTARGET and __EVENTARGUMENT fields to simulate Windows Forms behavior. To simulate the Change event of the combo box on the server, you need to append two form fields to the request: __EVENTTARGET set to 'ctl00$ContentPlaceHolder1$cmbUrunSayi' and __EVENTARGUMENT set to ''.
If you look at the onchange code of the combo box and at the __doPostBack method, you will understand what I mean. Insert the code below after your declaration of srBuilder, so that the server treats the request as a postback from the combo box:
AppendPostParameter(srBuilder, "__EVENTTARGET", "ctl00$ContentPlaceHolder1$cmbUrunSayi");
AppendPostParameter(srBuilder, "__EVENTARGUMENT", string.Empty);
You will also need to extract the __VIEWSTATE and __EVENTVALIDATION values. To get them, first send a dummy request, extract those values and the cookies from its response, and then append them to the new request...
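A sketch of the two-step postback, assuming the first GET response contains the usual WebForms hidden inputs. It swaps in a regular expression instead of HtmlAgilityPack only so the example has no dependencies; with HtmlAgilityPack the same lookup would be doc.DocumentNode.SelectSingleNode("//input[@name='__VIEWSTATE']").GetAttributeValue("value", ""). The regex assumes name appears before value inside the tag, which is how WebForms normally renders these fields. WebFormsPostback is a hypothetical name, and the value "full" comes from the question's code and may not be what the site expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class WebFormsPostback
{
    // Returns the value attribute of a hidden input such as __VIEWSTATE, or "" if not found.
    public static string GetHiddenField(string html, string name)
    {
        var match = Regex.Match(html,
            "<input[^>]*\\bname=\"" + Regex.Escape(name) + "\"[^>]*\\bvalue=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : "";
    }

    // Builds a postback body that simulates the combo box change event, carrying over
    // __VIEWSTATE and __EVENTVALIDATION from the first page.
    public static string BuildPostback(string firstPageHtml, string controlName, string value)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("__EVENTTARGET", controlName),
            new KeyValuePair<string, string>("__EVENTARGUMENT", ""),
            new KeyValuePair<string, string>("__VIEWSTATE", GetHiddenField(firstPageHtml, "__VIEWSTATE")),
            new KeyValuePair<string, string>("__EVENTVALIDATION", GetHiddenField(firstPageHtml, "__EVENTVALIDATION")),
            new KeyValuePair<string, string>(controlName, value)
        };
        // Escape every key and value ($ becomes %24, + becomes %2B) and join with '&'.
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Send the first GET and the POST through the same CookieContainer (hWebReq.CookieContainer = cookies;) so the session cookie is kept, and post the returned string with the application/x-www-form-urlencoded content type, as in the question.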

Implementing OnSubmit with httpwebrequest

I am new to C# and just experimenting with it. I have been trying to create a WinForms application that posts some parameters to a web page and does something with the resulting page. I have accomplished this on a page that uses the POST method, but I am not able to do it with a web page whose HTML looks like this:
<form method="post" action="test.asp" name=FrontPage_Form1 onsubmit="return FrontPage_Form1_Validator(this)">
<div align="center"><center><p>
<input name="name" size="8" maxlength=8><font color="#faebd7">---
Now I don't know how to implement this "onsubmit" with HttpWebRequest.
This is my current Code :
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://cbseresults.nic.in/aieee/cbseaieee.asp");
request.Method = "POST";
string r = "regno=" + rno.ToString();
Bytes = Encoding.UTF8.GetBytes(r);
request.ContentLength = Bytes.Length;
request.ContentType = "application/x-www-form-urlencoded";
RequestStream = request.GetRequestStream();
RequestStream.Write(Bytes, 0, Bytes.Length);
RequestStream.Close();
Response = (HttpWebResponse)request.GetResponse();
StreamReader ResponseStream = new StreamReader(Response.GetResponseStream(), Encoding.ASCII);
string Result = ResponseStream.ReadToEnd();
ResponseStream.Close();
But it's not working. Any help is greatly appreciated.
Try using Fiddler to see what the page sends to and receives from the server.
Then make the request exactly as it appears in Fiddler.
You can also use WebClient to open pages or to send and receive data from the server.
There are some ways to click on buttons or links:
Use a WebBrowser object in your app, iterate through the objects on the page with the SelectNextControl method of the WebBrowser object, and then send the Enter key, like so: SendKeys.Send("{Enter}");
Use JavaScript to invoke functions, and read elements with getElementById and other methods.
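The onsubmit handler only runs in the browser (it calls FrontPage_Form1_Validator, presumably to validate the fields before submitting), so HttpWebRequest does not need to implement it; it only has to post the same fields to the URL in the form's action attribute. If the real form looks like the snippet, that means posting a field called name to test.asp, resolved against the page URL, whereas the question's code posts regno to cbseaieee.asp. That mismatch is an inference from the partial HTML and may or may not be why it fails; Fiddler will show the actual fields. FormReplay is a hypothetical helper using HttpClient.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormReplay
{
    // A relative action such as "test.asp" is resolved against the URL of the page holding the form.
    public static Uri ResolveAction(string pageUrl, string action) =>
        new Uri(new Uri(pageUrl), action);

    // Posts the fields the way the browser does once onsubmit has returned true.
    public static async Task<string> SubmitAsync(HttpClient client, string pageUrl, string action,
        IDictionary<string, string> fields)
    {
        using (var content = new FormUrlEncodedContent(fields))
        using (var response = await client.PostAsync(ResolveAction(pageUrl, action), content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

For example: var html = await FormReplay.SubmitAsync(new HttpClient(), "http://cbseresults.nic.in/aieee/cbseaieee.asp", "test.asp", new Dictionary<string, string> { ["name"] = rno.ToString() });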

Using MVC2 ActionResult, how can I redirect (Post) to another site

I have a page that collects data and POSTs it to another site. I could just put the site URL in the action of the form tag, but I would like to record the information in my database before switching sites. In the ActionResult so far I have:
[HttpPost]
public ActionResult MyPage(MyPageModel model)
{
if (ModelState.IsValid)
{
StoreDate(model.fld1, model.fld2);
var encoding = new ASCIIEncoding();
var postData = "";
foreach (String postKey in Request.Form)
{
var postValue = Encode(Request.Form[postKey]);
postData += string.Format("&{0}={1}", postKey, postValue);
}
var data = encoding.GetBytes(postData);
// Prepare web request...
var myRequest = (HttpWebRequest)WebRequest.Create("https://www.site2.com");
myRequest.Method = "POST";
myRequest.ContentType = "application/x-www-form-urlencoded";
myRequest.ContentLength = data.Length;
// Send the data.
Stream newStream = myRequest.GetRequestStream();
newStream.Write(data, 0, data.Length);
newStream.Flush();
newStream.Close();
Does anyone know how to finish this, and which 'return' variant to use so that this posts the data to the other site?
I have edited the snippet based on a response below.
The POST has already happened, so there's not going to be a magic bullet (i.e. a simple ActionResult) that will work for you. Since you're handling the POST on your server, you'll need to recreate the POST request to the target server yourself. To do that, you'll need to use an HttpWebRequest, as in this answer. After getting the response back from the HttpWebRequest, you'll need to pass that response back, probably via a ContentResult. All in all, it will be non-trivial, but it is possible.
Update:
Based on your snippet, I'd try adding the following:
using (WebResponse res = myRequest.GetResponse())
using (StreamReader sr = new StreamReader(res.GetResponseStream()))
{
    string returnValue = sr.ReadToEnd();
    return Content(returnValue);
}
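The loop in the question builds the body with a leading "&" and, through Request.Form[postKey], merges repeated keys into one comma-separated value. A small helper, sketched here as a hypothetical FormForwarding.BuildFormBody, rebuilds the body the way a browser would: each key and value escaped, one pair per value, joined with "&". It assumes Request.Form is a NameValueCollection, as it is in ASP.NET MVC.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;

public static class FormForwarding
{
    // Rebuilds an application/x-www-form-urlencoded body from posted form values.
    // Keyless entries are skipped; repeated keys (e.g. checkbox groups) yield one pair per value.
    public static string BuildFormBody(NameValueCollection form) =>
        string.Join("&", form.AllKeys
            .Where(key => key != null)
            .SelectMany(key => (form.GetValues(key) ?? new string[0])
                .Select(value => Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(value))));
}
```

In the action, var data = Encoding.UTF8.GetBytes(FormForwarding.BuildFormBody(Request.Form)); would replace the manual loop; the escaped output is plain ASCII either way.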
Another option would be to point the form action at the other site and do an ajax post to your server before submitting the form. That would be much easier than playing man-in-the-middle with HttpWebRequest.
