Submit an HTML form using C#

I'm working on a school project: a universal app that shows our timetable on our Windows 8.1 devices, so that I don't have to log in every time I want to check it. I need a method that logs me in to our school's website and gives me the page source so I can read the lessons from it.
The website I need to log in to from C# is here.
The source code looks like this:
<form method="post" action="/adfs/ls/?SAMLRequest=nZJPb9swDMW/iqF7bMmt11SIU2QZihZosSBxd9hloG1m1WZJriglwT79lD9uM2DrYUeBj3yPP2py%0As9NdskFHypqSiZSzBE1jW2W%2Bl%2Bypuh2N2c10QqC7vJez4J/NEl8Ckk9ioyF5rJQsOCMtkCJpQCNJ%0A38jV7PFB5imXvbPeNrZjyYwInY9Wc2soaHQrdBvV4NPyoWTP3vcks0yrH4Z8aBWmv6D9qQympsug%0A77Mt1kSWJbfWNXgIU7I1dIQsuf9Usm8C4epacH6Zg/iQQ12PEQpx2dQXdcFbhCijBRCpDb41EgW8%0Aj4ZgfMlyLoqR4CN%2BUYmxFFzmeVoU119Zsjgt8VGZI5z3Nq6PIpJ3VbUYLT6vKpZ8GSBHATshlQd3%0Ad87y/cEwAGTTAdfACNvQBPoT1SQ7t3m94%2BmE2B4Yxlt43PlkbnUPTtE%2Bo4ad0kG/5jwXzruYYonr%0A/0q9l62xRQf7t4Q4F41XzfG5jdzslobYf3Odnor/2OKtfP5Zp78B&RelayState=Zadkine" id="MainForm">
<input name="ctl00$ContentPlaceHolder1$UsernameTextBox" type="text" id="ContentPlaceHolder1_UsernameTextBox" />
<input name="ctl00$ContentPlaceHolder1$PasswordTextBox" type="password" id="ContentPlaceHolder1_PasswordTextBox" />
<input type="submit" name="ctl00$ContentPlaceHolder1$SubmitButton" value="Aanmelden" id="ContentPlaceHolder1_SubmitButton" class="Resizable" />
</form>
That's basically it. There are also some __VIEWSTATE fields, but I don't know whether they matter.
I found two types of solutions.
WebRequest examples here on Stack Overflow, which didn't work:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://sts.zadkine.nl/adfs/ls/?SAMLRequest=nZJPb9swDMW/iqF7bMmt11SIU2QZihZosSBxd9hloG1m1WZJriglwT79lD9uM2DrYUeBj3yPP2py%0As9NdskFHypqSiZSzBE1jW2W%2Bl%2Bypuh2N2c10QqC7vJez4J/NEl8Ckk9ioyF5rJQsOCMtkCJpQCNJ%0A38jV7PFB5imXvbPeNrZjyYwInY9Wc2soaHQrdBvV4NPyoWTP3vcks0yrH4Z8aBWmv6D9qQympsug%0A77Mt1kSWJbfWNXgIU7I1dIQsuf9Usm8C4epacH6Zg/iQQ12PEQpx2dQXdcFbhCijBRCpDb41EgW8%0Aj4ZgfMlyLoqR4CN%2BUYmxFFzmeVoU119Zsjgt8VGZI5z3Nq6PIpJ3VbUYLT6vKpZ8GSBHATshlQd3%0Ad87y/cEwAGTTAdfACNvQBPoT1SQ7t3m94%2BmE2B4Yxlt43PlkbnUPTtE%2Bo4ad0kG/5jwXzruYYonr%0A/0q9l62xRQf7t4Q4F41XzfG5jdzslobYf3Odnor/2OKtfP5Zp78B&RelayState=Zadkine");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
using (var requestStream = request.GetRequestStream())
using (var writer = new StreamWriter(requestStream))
{
writer.Write("ctl00$ContentPlaceHolder1$UsernameTextBox=" + yourusername + "&ctl00$ContentPlaceHolder1$PasswordTextBox=" + yourpassword);
}
using (var responseStream = request.GetResponse().GetResponseStream())
using (var reader = new StreamReader(responseStream))
{
var result = reader.ReadToEnd();
richTextBox1.Text = result;
}
On other websites, this code gets me an error like "You have to allow cookies to login.", but on my school's website I get no error at all, not even "Wrong password". (If I type a wrong password in a browser, I do get the wrong-password error.)
Duplicating the form into an .html file and using a WebView to log in with JavaScript. If I try this, I get redirected to another page and get a very odd error like "User null couldn't recognized". So neither of these two solutions worked for me.
So the question is: how can I log in to the website with C#?

Example code for the WebBrowser DocumentCompleted event:
HtmlElement element;
// Filling the username
element = webBrowser.Document.GetElementById("ContentPlaceHolder1_UsernameTextBox");
if (element != null)
{
element.InnerText = "username";
}
// If the input field has no id, you can get it by name
HtmlElementCollection elements = null;
elements = webBrowser.Document.All.GetElementsByName("pass");
element = elements[0];
element.InnerText = "password";
//login (click)
elements = webBrowser.Document.All.GetElementsByName("submit");
element = elements[0];
element.InvokeMember("CLICK");
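If you stay with the WebRequest approach from the question instead, note that both the ASP.NET field names (which contain `$`) and the credentials should be URL-encoded before they go into the request body; otherwise a password containing `&` or `=` would corrupt the form data. Below is a minimal sketch of such an encoder. The `FormEncoding` class and `BuildFormBody` name are made up for illustration, and encoding alone may not be enough for this particular login page (hidden fields such as __VIEWSTATE and cookies probably matter as well).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoding
{
    // Hypothetical helper: joins name/value pairs into an
    // application/x-www-form-urlencoded body, escaping both sides.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? string.Empty)));
    }
}
```

The resulting string could be written to the request stream in place of the hand-concatenated one in the question.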

Related

Scrape data from web page with HtmlAgilityPack c#

I had a problem scraping data from a web page, for which I got a solution here:
Scrape data from web page that using iframe c#
My problem is that they changed the web page, which is now https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes any more, and I cannot get any data back.
Can someone tell me whether I can use the same method to get data from this web page, or should I use a different approach?
I don't know how @coder_b found that I should use https://portal.thpa.gr/fnet5/track/index.php as the web page and that I should use
var reqUrlContent =
hc.PostAsync(url,
new StringContent($"d=1&containerCode={reference}&go=1", Encoding.UTF8,
"application/x-www-form-urlencoded"))
.Result;
to pass the variables
EDIT: When I inspect the web page, there is an input that contains the number:
input type="text" id="report_container_containerno"
name="report_container[containerno]" required="required"
class="form-control" minlength="11" maxlength="11" placeholder="E/K
για αναζήτηση" value="ARKU2215462"
Can I pass something with HtmlAgilityPack, so that it is then easy to read the result?
Also, when I check the DocumentNode, it seems to show me the cookie consent page that I have to agree to.
Can I bypass or automatically accept the cookies?
Try this:
public static string Download(string search)
{
var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");
var postData = string.Format("report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=", search);
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = new StreamReader(response.GetResponseStream()))
{
return stream.ReadToEnd();
}
}
Usage:
var html = Download("ARKU2215462");
UPDATE
To find the POST parameters to use, press F12 in the browser to open the dev tools, then select the Network tab. Now fill the search input with your ARKU2215462 and press the button.
That sends a request to the server. You can inspect both the request and its response. There are lots of requests (styles, scripts, images...), but you want the one for the HTML page.
Look at its Form Data section, which holds the posted form fields. If you click "view source", you get the data encoded as "report_container%5Bcontainerno%5D=ARKU2215462&report_container%5Bsearch%5D=", which is what you need in your code.
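As an alternative to typing the percent-encoded names by hand, the encoding can be left to `FormUrlEncodedContent` from `System.Net.Http`. This is only a sketch: the `TrackForm` class name is invented here, the field names are the ones seen in the dev tools above, and it does not address the cookie consent page mentioned in the question.

```csharp
using System.Collections.Generic;
using System.Net.Http;

public static class TrackForm
{
    // Builds the same form fields the browser sends; FormUrlEncodedContent
    // percent-encodes the square brackets in the field names.
    public static FormUrlEncodedContent Build(string containerNo)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("report_container[containerno]", containerNo),
            new KeyValuePair<string, string>("report_container[search]", string.Empty),
        });
    }
}
```

With an `HttpClient` instance named `client`, the content could then be sent with `client.PostAsync("https://webportal.thpa.gr/ctreport/container/track", TrackForm.Build("ARKU2215462"))`.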

How can I scrape a table that is created with JavaScript in c#

I am trying to get a table from the web page https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/ using HtmlAgilityPack.
My code so far is
WebClient webClient = new WebClient();
string page = webClient.DownloadString("https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='list_result Result']")
.Descendants("tr")
.Skip(1)
.Where(tr => tr.Elements("td").Count() > 1)
.Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
.ToList();
My problem is that the web page creates the table with JavaScript, so when I try to read it, a null reference exception is thrown: the page I receive only says that I must enable JavaScript.
I also tried the "GET" method
string Url = "https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
with the same results.
I already enabled JavaScript in Internet Explorer and changed the registry as well:
if (Environment.Is64BitOperatingSystem)
Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION", true);
else //For 32 bit machine
Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION", true);
If I use a WebBrowser component I can see the web page without a problem, but I still can't get the table into a list.
F12 is your friend in any browser.
Select the Network tab and you'll notice that all of the info is in this file:
https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd201806.xml
(I suppose that the data for July 2018 will be at a URL ending in *.dd201807.xml.)
Using C#, you need to do a GET for that URL and parse the response as XML; there is no need for HtmlAgilityPack. Build the URL from the current year concatenated with the current month to pick the right file.
I can't make it any more fun than that!
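A small sketch of the URL construction described above. The `ExchangeRateUrl` class is a made-up name, and the `ddyyyyMM` pattern is inferred from the single June 2018 file, so it is an assumption that other months follow it.

```csharp
using System;
using System.Globalization;

public static class ExchangeRateUrl
{
    // Appends year and month (yyyyMM) to the observed file name prefix.
    public static string ForMonth(DateTime month)
    {
        return "https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd"
            + month.ToString("yyyyMM", CultureInfo.InvariantCulture)
            + ".xml";
    }
}
```

The result can be handed to `System.Xml.Linq.XDocument.Load(ExchangeRateUrl.ForMonth(DateTime.Today))` to fetch and parse the file in one step.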
WebClient is an HTTP client, not a web browser, so it won't execute JavaScript. What you need is a headless web browser. See this page for a list of headless web browsers; I have not tried any of them, though, so I cannot give you a recommendation:
Headless browser for C# (.NET)?

Can't obtain Source Code of embedded ISSUU flash

First of all, what I want to do is legal (since they let you download the PDF).
I just want a faster, automatic way of downloading the PDF.
For example: http://www.lasirena.es/article/&path=10_17&ID=782
It has an embedded Flash PDF viewer, and when I download that page's source code, the link to the PDF:
http://issuu.com/lasirena/docs/af_fulleto_setembre_andorra_sense_c?e=3360093/9079351
doesn't show up. The only thing I have in the source code is this: 3360093/9079351
I tried to find a way to build the PDF link from it, but I can't find the name "af_fulleto_setembre_andorra_sense_c" anywhere...
I've made plenty of automatic downloads like this, but this is the first time I can't build or get the PDF link, and I can't seem to find a way. Is it even possible?
I also tried to find links to the JPGs, but without success. Either way (JPG or PDF) is fine...
PS: the document ID doesn't show up in the downloaded source code either.
Thank you.
I thought of a workaround for this. Some might not consider it a solution, but in my case it works fine; it depends on the ISSUU publisher account.
The solution is to make a request to the ISSUU API for the publisher account I'm interested in.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://api.issuu.com/query?action=issuu.documents.list" +
"&apiKey=Insert Your API Key" +
"&format=json" +
"&documentUsername=User of the account you want to make a request" +
"&pageSize=100&resultOrder=asc" +
"&responseParams=name,documentId,pageCount" +
"&username=Insert your ISSUU username" +
"&token=Insert Your Token here");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.Accept = "application/json";
try
{
using (WebResponse response = request.GetResponse())
{
var responseValue = string.Empty;
// grab the response
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream))
{
responseValue = reader.ReadToEnd();
}
}
if (responseValue != "")
{
List<string> lista_linkss = new List<string>();
JObject ApiRequest = JObject.Parse(responseValue);
//// get JSON result objects into a list
IList<JToken> results = ApiRequest["rsp"]["_content"]["result"]["_content"].Children()["document"].ToList();
for (int i = 0; i < results.Count(); i++)
{
Folheto folheto = new Folheto();
folheto.name = results[i]["name"].ToString();
folheto.documentId = results[i]["documentId"].ToString();
folheto.pageCount = Int32.Parse(results[i]["pageCount"].ToString());
string _date = Newtonsoft.Json.JsonConvert.SerializeObject(results[i]["uploadTimestamp"], Formatting.None, new IsoDateTimeConverter() { DateTimeFormat = "yyyy-MM-dd hh:mm:ss" }).Replace(@"""", string.Empty);
folheto.uploadTimestamp = Convert.ToDateTime(_date);
if (!lista_nomes_Sirena.Contains(folheto.name))
{
list.Add(folheto);
}
}
}
}
}
catch (WebException ex)
{
// Handle error
}
Pay attention to the "pageSize" parameter: the maximum permitted by the API is 100, so you get at most 100 results per request. Since the account I'm following has around 240 PDFs, I ran this request once with "resultOrder=asc" and once with "resultOrder=desc".
This gave me the first 100 PDFs and the latest 100 PDFs uploaded.
Since I didn't need the history, just the PDFs they publish from now on, it didn't make a difference.
To finish, my code stores all the document IDs in a SQL database I made. When the program starts, it checks whether each ID was already downloaded; if not, it downloads the PDF, and if so, it skips it.
I hope someone finds this workaround useful.
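The bookkeeping described above (combine the ascending and descending result sets, then skip anything already downloaded) might look roughly like this. It is a sketch with an in-memory set standing in for the SQL table; `DocumentIds` and `PendingDownloads` are illustrative names, not part of the original program.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DocumentIds
{
    // Combines the IDs from the ascending and descending requests, drops
    // duplicates, and keeps only IDs that have not been downloaded yet.
    public static List<string> PendingDownloads(
        IEnumerable<string> ascendingIds,
        IEnumerable<string> descendingIds,
        ISet<string> alreadyDownloaded)
    {
        return ascendingIds
            .Concat(descendingIds)
            .Distinct()
            .Where(id => !alreadyDownloaded.Contains(id))
            .ToList();
    }
}
```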

How to post to asp.net validation required page with C# and read response

I am writing my own product crawler. There is a shopping website that uses POST data for its pages, and I really need to be able to post data and read the response. But the site uses ASP.NET validation, and I could not figure out how to post the data properly and read the result. I am using HtmlAgilityPack. If it is possible to post data with HtmlAgilityPack and read the response, that would be great.
This is the example page: http://www.hizlial.com/HizliListele.aspx?CatID=482643
When you open the page, look at the class "urun_listele".
You will see these options there:
20 Ürün Listele
40 Ürün Listele
60 Ürün Listele
Tümünü Listele
Those numbers are the product counts to display. "Tümünü Listele" means "list all products". I need to post data and get all of the products under that product category. I used Firebug to debug and tried the code below, but I still get the default number of products:
private void button11_Click(object sender, RoutedEventArgs e)
{
StringBuilder srBuilder = new StringBuilder();
AppendPostParameter(srBuilder, "ctl00$ContentPlaceHolder1$cmbUrunSayi", "full");
srBuilder = srBuilder.Replace("&", "", srBuilder.Length - 1, 1);
byte[] byteArray = Encoding.UTF8.GetBytes(srBuilder.ToString());
HttpWebRequest hWebReq = (HttpWebRequest)WebRequest.Create("http://www.hizlial.com/HizliListele.aspx?CatID=482643");
hWebReq.Method = "POST";
hWebReq.ContentType = "application/x-www-form-urlencoded";
using (Stream requestStream = hWebReq.GetRequestStream())
{
requestStream.Write(byteArray, 0, byteArray.Length);
}
HtmlDocument hd = new HtmlDocument();
using (HttpWebResponse response = (HttpWebResponse)hWebReq.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
var htmlstring = sr.ReadToEnd();
}
}
}
static private void AppendPostParameter(StringBuilder sb, string name, string value)
{
sb.AppendFormat("{0}={1}&", name, HttpUtility.UrlEncode(value));
}
After I get the data, I will load it into the HtmlAgilityPack HtmlDocument.
Any help is appreciated.
C# 4.0, WPF application, HtmlAgilityPack
ASP.NET uses the __EVENTTARGET and __EVENTARGUMENT fields to simulate Windows Forms behavior. To simulate the combobox's Change event on the server, you need to add two form fields to the request: __EVENTTARGET as 'ctl00$ContentPlaceHolder1$cmbUrunSayi' and __EVENTARGUMENT as ''.
If you look at the combo's onchange code and the __doPostBack method, you will understand what I mean. You can insert the code below after your declaration of srBuilder; that way the code will work.
AppendPostParameter(srBuilder, "__EVENTTARGET", "ctl00$ContentPlaceHolder1$cmbUrunSayi");
AppendPostParameter(srBuilder, "__EVENTARGUMENT", string.Empty);
You will also need to extract the __VIEWSTATE and __EVENTVALIDATION values. To get them, send a dummy request first, extract those values and the cookies from its response, and then append them to the new request.
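One way to pull those hidden values out of the first response is sketched below. It uses a regular expression to stay dependency-free and assumes double-quoted attributes, which is how ASP.NET normally renders them; HtmlAgilityPack, which the question already uses, would be the more robust choice. `HiddenFields` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HiddenFields
{
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one double-quoted attribute from a single tag, or null if absent.
    private static string Attr(string tag, string name)
    {
        var m = Regex.Match(tag, @"\b" + name + @"\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Collects name/value pairs of all <input type="hidden"> elements,
    // e.g. __VIEWSTATE and __EVENTVALIDATION.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            var name = Attr(m.Value, "name");
            var type = Attr(m.Value, "type");
            if (name != null && string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
                fields[name] = Attr(m.Value, "value") ?? string.Empty;
        }
        return fields;
    }
}
```

Each extracted pair can then be passed to `AppendPostParameter` alongside the __EVENTTARGET and __EVENTARGUMENT fields.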

Logging into a website using HttpWebRequest/Response in C#?

First off, I want to understand whether it's better to use HttpWebRequest and HttpWebResponse or simply a WebBrowser control. Most people seem to prefer the web browser, but whenever I ask people about it, they tell me that HttpWebRequest and HttpWebResponse are better. So if this question could be avoided by switching to a web browser (and there's a good reason why it's better), please let me know!
Basically, I set up a test site, written in PHP, running on localhost. It consists of three files.
The first is index.php, which just contains a simple login form. The session code is just me testing how sessions work, so it's not very well written; it's for testing purposes only:
<?php
session_start();
$_SESSION['id'] = 2233;
?>
<form method="post" action="login.php">
U: <input type="text" name="username" />
<br />
P: <input type="password" name="password" />
<br />
<input type="submit" value="Log In" />
</form>
Then, I have login.php (the action of the form), which looks like:
<?php
session_start();
$username = $_POST['username'];
$password = $_POST['password'];
if ($username == "username" && $password == "password" && $_SESSION['id'] == 2233)
{
header('Location: loggedin.php');
die();
}
else
{
die('Incorrect login details');
}
?>
And lastly, loggedin.php just displays "Success!" (using the element).
As you can see, a very simple test, and many of the things I have there are just for testing purposes.
So, then I go to my C# code. I created a method called "HttpPost". It looks like:
private static string HttpPost(string url)
{
request = HttpWebRequest.Create(url) as HttpWebRequest;
request.CookieContainer = cookies;
request.UserAgent = userAgent;
request.KeepAlive = keepAlive;
request.Method = "POST";
response = request.GetResponse() as HttpWebResponse;
if (response.StatusCode != HttpStatusCode.Found)
throw new Exception("Website not found");
StreamReader sr = new StreamReader(response.GetResponseStream());
return sr.ReadToEnd();
}
I built a Windows Form application, so in the button Click event, I want to add the code to call the HttpPost method with the appropriate URL. However, I'm not really sure what I'm supposed to put there to cause it to log in.
Can anyone help me out? I'd also appreciate some general pointers on programmatically logging into websites!
Have you considered using WebClient?
It provides a set of abstract methods for use with web pages, including UploadValues, but I'm not sure if that would work for your purposes.
Also, it's probably better not to use WebBrowser, as that's a full-blown web browser that can execute scripts and such; HttpWebRequest and WebClient are much more lightweight.
Edit : Login to website, via C#
Check this answer out, I think this is exactly what you're looking for.
Relevant code snippet from above link :
var client = new WebClient();
client.BaseAddress = @"https://www.site.com/any/base/url/";
var loginData = new NameValueCollection();
loginData.Add("login", "YourLogin");
loginData.Add("password", "YourPassword");
client.UploadValues("login.php", "POST", loginData);
You should use something like the WCF Web API HttpClient. It makes this much easier to achieve.
The following code is written off the top of my head, but it should give you the idea.
using (var client = new HttpClient())
{
var data = new Dictionary<string, string>(){{"username", "username_value"}, {"password", "the_password"}};
var content = new FormUrlEncodedContent(data);
var response = client.Post("yourdomain/login.php", content);
if (response.StatusCode == HttpStatusCode.OK)
{
//
}
}
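For the PHP test site in the question, the login only succeeds if the session cookie set by index.php is sent back with the POST to login.php. A hedged sketch with the `HttpClient` from `System.Net.Http` follows. `LoginSketch` and its method names are invented for illustration, and `baseUrl` (for example `http://localhost/`) is an assumption about where the three files are served.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginSketch
{
    // Form fields matching the <input name="..."> attributes in index.php.
    public static FormUrlEncodedContent BuildLoginForm(string username, string password)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", username),
            new KeyValuePair<string, string>("password", password),
        });
    }

    public static async Task<string> LoginAsync(string baseUrl, string username, string password)
    {
        // The CookieContainer keeps the PHP session cookie between requests.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler))
        {
            // First request: index.php starts the session and sets the cookie.
            await client.GetAsync(baseUrl + "index.php");

            // Second request: same client, so the cookie travels with the form.
            // Redirects (the Location header in login.php) are followed by default.
            var response = await client.PostAsync(
                baseUrl + "login.php", BuildLoginForm(username, password));
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Because redirects are followed automatically, the returned string should be the body of loggedin.php on success rather than a 302 response.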
