Get html data over Get Request C#

I made a little API in PHP that returns some user information after a successful login. The information is returned as HTML, with each value in a paragraph that has its own ID. Here's an example of the returned HTML:
<body>
<p id="msg">Successful login</p>
<p id="uid">1</p>
<p id="username">Joey</p>
<p id="email">Test@gmail.com</p>
<p id="hwid"></p>
<p id="funds">0</p>
</body>
So I want to send the login data to the API and read each value by its HTML id.
The API:
api.php?set=login&username={USER}&password={PASS}

First, I'd suggest using JSON instead of HTML for this. PHP has json_encode and json_decode, and on your end you can add the Json.NET NuGet package (Newtonsoft.Json) to deserialize the response very easily.
echo json_encode($resultObject);
and then in C#:
JsonConvert.DeserializeObject<ResultType>(downloadedString);
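A sketch of the C# side, assuming the PHP script echoes json_encode of an array with the same keys as the HTML paragraphs (msg, uid, username, email, hwid, funds) and that uid and funds are sent as numbers. To keep it dependency-free it swaps in System.Text.Json (built into .NET Core 3.0 and later) instead of Json.NET; with Json.NET you would call JsonConvert.DeserializeObject<LoginResult>(json) and use [JsonProperty] instead of [JsonPropertyName]. LoginResult and LoginResultParser are hypothetical names:

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical result type mirroring the fields of the HTML example.
// The JSON key names are an assumption about what the PHP side would emit.
public class LoginResult
{
    [JsonPropertyName("msg")] public string Msg { get; set; } = "";
    [JsonPropertyName("uid")] public int Uid { get; set; }
    [JsonPropertyName("username")] public string Username { get; set; } = "";
    [JsonPropertyName("email")] public string Email { get; set; } = "";
    [JsonPropertyName("hwid")] public string Hwid { get; set; } = "";
    [JsonPropertyName("funds")] public decimal Funds { get; set; }
}

public static class LoginResultParser
{
    // Deserializes the downloaded string; a literal "null" response is treated as an error.
    public static LoginResult Parse(string json) =>
        JsonSerializer.Deserialize<LoginResult>(json)
        ?? throw new JsonException("The API returned null.");
}
```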
Then all you need to do is look into HttpWebRequest (created with WebRequest.Create) to download that string from your API.
That would look something like this:
// requires: using System.IO; using System.Net;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://url/api");
// Dispose the response as well as the reader so the connection is released
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string downloadedString = reader.ReadToEnd();
}
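If the login values go into the query string as in the question (api.php?set=login&username={USER}&password={PASS}), they should be URL-escaped before building the request URL, otherwise a password containing & or a space breaks the query. A minimal sketch, assuming you pass in the base URL of the PHP script; LoginUrl and Build are hypothetical names:

```csharp
using System;

public static class LoginUrl
{
    // Builds api.php?set=login&username=...&password=... with escaped values.
    // baseUrl (e.g. the address of api.php) is supplied by the caller.
    public static string Build(string baseUrl, string user, string pass)
    {
        // EscapeDataString turns '&', '=', '#', spaces, etc. into %XX sequences
        return baseUrl
            + "?set=login"
            + "&username=" + Uri.EscapeDataString(user)
            + "&password=" + Uri.EscapeDataString(pass);
    }
}
```

The resulting string can then be passed to WebRequest.Create in place of "http://url/api".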
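Alternatively, if the output is well-formed XML (as in the example above), you can process the HTML as XML. Note that XmlDocument.GetElementById only recognizes attributes that a DTD declares as IDs, so it returns null here; an XPath query works instead:
XmlDocument doc = new XmlDocument();
doc.Load(response.GetResponseStream());
// Select the <p> whose id attribute is "msg"; InnerText holds its text (an element's .Value is always null)
string msg = doc.SelectSingleNode("//p[@id='msg']").InnerText;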
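To read all of the id'd paragraphs at once, LINQ to XML can collect them into a dictionary. A sketch, assuming the response is well-formed XML like the example in the question (real-world HTML often is not, in which case an HTML parser such as HtmlAgilityPack is the safer choice); ParagraphReader is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class ParagraphReader
{
    // Maps each <p id="...">text</p> to id -> text.
    public static Dictionary<string, string> Read(string html)
    {
        var values = new Dictionary<string, string>();
        XDocument doc = XDocument.Parse(html);
        foreach (XElement p in doc.Descendants("p"))
        {
            var id = p.Attribute("id");
            if (id != null)
                values[id.Value] = p.Value; // a repeated id overwrites the earlier one
        }
        return values;
    }
}
```

Usage: ParagraphReader.Read(downloadedString)["username"] gives "Joey" for the example response.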

Related

Scrape data from web page with HtmlAgilityPack c#

I previously had a problem scraping data from a web page, for which I got a solution:
Scrape data from web page that using iframe c#
The problem is that they have changed the web page, which is now https://webportal.thpa.gr/ctreport/container/track. I don't think it uses iframes anymore, and I cannot get any data back.
Can someone tell me whether I can use the same method to get data from this page, or should I use a different approach?
I don't know how @coder_b figured out that I should use https://portal.thpa.gr/fnet5/track/index.php as the page URL, and that I should use
var reqUrlContent =
hc.PostAsync(url,
new StringContent($"d=1&containerCode={reference}&go=1", Encoding.UTF8,
"application/x-www-form-urlencoded"))
.Result;
to pass the variables.
EDIT: When I inspect the web page, there is an input that contains the number:
<input type="text" id="report_container_containerno" name="report_container[containerno]" required="required" class="form-control" minlength="11" maxlength="11" placeholder="E/K για αναζήτηση" value="ARKU2215462">
Can I use HtmlAgilityPack to submit this value? Then it should be easy to read the result.
Also, when I check the DocumentNode, it seems to show me the cookie-consent page that I'm supposed to accept.
Can I bypass it or accept the cookies automatically?

Try this:
public static string Download(string search)
{
    // requires: using System; using System.IO; using System.Net; using System.Text;
    var request = (HttpWebRequest)WebRequest.Create("https://webportal.thpa.gr/ctreport/container/track");
    // Field names as captured in the browser's Network tab; escape the user-supplied value
    var postData = string.Format("report_container%5Bcontainerno%5D={0}&report_container%5Bsearch%5D=", Uri.EscapeDataString(search));
    var data = Encoding.ASCII.GetBytes(postData);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    // Write the form body
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }
    // Read back the returned HTML
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
Usage:
var html = Download("ARKU2215462");
UPDATE
To find the POST parameters to use, press F12 in the browser to open the developer tools, then select the Network tab. Now fill the search input with ARKU2215462 and press the button.
That sends a request to the server, and you can inspect both the request and the response. There are lots of requests (styles, scripts, images...), but you want the HTML page. In this case, look at the Form Data of the POST request to https://webportal.thpa.gr/ctreport/container/track.
If you click "view source", you get the data encoded as "report_container%5Bcontainerno%5D=ARKU2215462&report_container%5Bsearch%5D=", which is what you need in your code.
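The question's own code uses HttpClient.PostAsync, so here is the same request with HttpClient instead of HttpWebRequest. This is a sketch: the field names are the ones captured in the Network tab, and it assumes the endpoint accepts the form without extra cookies or hidden fields (it does not deal with the cookie-consent page). ContainerTracker is a hypothetical name. FormUrlEncodedContent produces the same encoded body you see under "view source":

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContainerTracker
{
    // Builds the form body the browser sends; FormUrlEncodedContent escapes '[' and ']'.
    public static FormUrlEncodedContent BuildForm(string containerNo) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("report_container[containerno]", containerNo),
            new KeyValuePair<string, string>("report_container[search]", "")
        });

    // Posts the form and returns the raw HTML of the result page.
    public static async Task<string> DownloadAsync(HttpClient client, string containerNo)
    {
        using FormUrlEncodedContent form = BuildForm(containerNo);
        using HttpResponseMessage response = await client.PostAsync(
            "https://webportal.thpa.gr/ctreport/container/track", form);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Usage, from an async method: string html = await ContainerTracker.DownloadAsync(client, "ARKU2215462"); where client is a long-lived HttpClient.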

How can I scrape a table that is created with JavaScript in c#

I am trying to get a table from the web page https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/ using HtmlAgilityPack.
My code so far is:
WebClient webClient = new WebClient();
string page = webClient.DownloadString("https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='list_result Result']")
.Descendants("tr")
.Skip(1)
.Where(tr => tr.Elements("td").Count() > 1)
.Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
.ToList();
My problem is that the web page builds the table with JavaScript, so when I try to read it I get a null reference exception: the downloaded page only shows a message saying that I must enable JavaScript.
I also tried using the "GET" method explicitly:
string Url = "https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
with the same results.
I have already enabled JavaScript in Internet Explorer and changed the registry as well:
if (Environment.Is64BitOperatingSystem)
    Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION", true);
else // 32-bit machine
    Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION", true);
If I use a WebBrowser control, I can see the web page without problems, but I still can't get the table into a list.

F12 is your friend in any browser.
Select the Network tab and you'll notice that all of the info is in this file:
https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd201806.xml
(I suppose that the data for July 2018 will be in a URL ending in *.dd201807.xml.)
In C#, you just need to do a GET for that URL and parse the response as XML; there is no need for HtmlAgilityPack. To pick the right URL, append the current year followed by the current month (yyyyMM) to the file name.
Leuker kan ik het niet maken! ("I can't make it any more fun than that!")
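A sketch of that approach, assuming (as the answer does) that every month follows the wks.douane.wisselkoersen.ddYYYYMM.xml naming of the June 2018 file. The layout of the XML itself isn't shown here, so the sketch stops at returning an XDocument; ExchangeRates is a hypothetical name:

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class ExchangeRates
{
    // URL pattern taken from the June 2018 file; that other months use the same
    // naming is the answer's assumption, not a documented API.
    public static string BuildUrl(DateTime month) =>
        "https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd"
        + month.ToString("yyyyMM", CultureInfo.InvariantCulture) + ".xml";

    // Plain GET plus XML parsing; no JavaScript execution or HTML scraping involved.
    public static async Task<XDocument> LoadAsync(HttpClient client, DateTime month)
    {
        string xml = await client.GetStringAsync(BuildUrl(month));
        return XDocument.Parse(xml);
    }
}
```

Usage: ExchangeRates.BuildUrl(DateTime.Today) gives the URL for the current month.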

WebClient is an HTTP client, not a web browser, so it won't execute JavaScript. What you need is a headless web browser. See this page for a list of headless web browsers; I have not tried any of them, though, so I can't give you a recommendation:
Headless browser for C# (.NET)?

How Can I Read The XML

I'm getting geographic info from a web service.
I've been trying to parse the returned data for hours, but I'm getting nowhere.
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    StreamReader reader = new StreamReader(response.GetResponseStream());
    string result = reader.ReadToEnd();
    XDocument document = XDocument.Parse(result, LoadOptions.None);
}
I get this (in the debugger, document is a System.Xml.Linq.XDocument):
<html>
<body>
<state>Apure</state>
<municipality>RÓMULO GALLEGOS</municipality>
<parish>URBANA ELORZA</parish>
<street>La Trinidad De Arauca</street>
</body>
</html>
I tried:
document.Elements("state")
document.Descendants("body")
document.GetElementsByTagName("state");
But I get nothing.
I'm sure there is a simple way to do something this basic.
I'm seriously considering converting it to a string and doing the parsing myself.
Additional consideration:
The fields included in the result vary, because some records don't have all the fields.

OK, I made a change.
I read an XElement instead of an XDocument:
XElement sitemap = XElement.Parse(result, LoadOptions.None);
foreach (var bodyElement in sitemap.Elements("body"))
{
foreach (var fieldElement in bodyElement.Elements())
{
Console.WriteLine(fieldElement.Name);
Console.WriteLine(fieldElement.Value);
}
}
There is probably a way to skip the first foreach, but I'm still looking for it.
@Jonesy's line works, but that means I have to know the field names. This way I just pick up whatever values I get.
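The earlier document.Elements("state") attempt found nothing because Elements() only looks at direct children, and the only top-level element of the document is <html>; document.Descendants("state") searches the whole tree. The outer foreach can be replaced with Element("body"). A sketch that collects whatever fields are present into a dictionary, so missing fields simply don't appear; GeoResult is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class GeoResult
{
    // Turns <html><body><state>...</state>...</body></html> into name/value pairs.
    public static Dictionary<string, string> Read(string xml)
    {
        var fields = new Dictionary<string, string>();
        XElement root = XElement.Parse(xml); // root is the <html> element
        // Element("body") replaces the outer foreach; ?. guards against a missing <body>
        IEnumerable<XElement> children = root.Element("body")?.Elements() ?? Array.Empty<XElement>();
        foreach (XElement field in children)
            fields[field.Name.LocalName] = field.Value;
        return fields;
    }
}
```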

How to integrate HTML markup from another URL in C#

I have an aspx file that loads HTML markup. This markup contains a div element that is basically a container for more HTML markup retrieved from another URL. A code snippet would look like this:
<div id="container">
<%= RetrieveIntegrationMarkup() %>
</div>
What is the best way to retrieve the markup in RetrieveIntegrationMarkup()? Currently, we use a workaround to accept self-signed SSL certificates, but it only works in our test environments, not in production.
I don't know if this will help, but here's a snippet of that method:
HttpWebRequest.DefaultCachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.Revalidate);
// Note: CertificatePolicy is obsolete; ServerCertificateValidationCallback replaces it
ServicePointManager.CertificatePolicy = new MyPolicy();
Uri serviceUri = new Uri(integrationUrl, UriKind.Absolute);
HttpWebRequest webRequest = (HttpWebRequest)System.Net.WebRequest.Create(serviceUri);
// Dispose the response so the connection is returned to the pool
using (HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse())
using (var sr = new StreamReader(response.GetResponseStream()))
{
    markup = sr.ReadToEnd();
}
Thanks!
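The snippet relies on ServicePointManager.CertificatePolicy, which is obsolete; its replacement is ServicePointManager.ServerCertificateValidationCallback. A sketch of a narrower workaround, assuming you know the SHA-1 thumbprint of the one self-signed certificate you want to trust: certificates that validate normally still pass, the pinned one passes, and everything else is rejected. The thumbprint value and the IntegrationCertificates name are hypothetical. This does not diagnose why the current policy fails in production; there, a certificate from a trusted CA is usually the cleaner fix.

```csharp
using System;
using System.Net;
using System.Net.Security;

public static class IntegrationCertificates
{
    // Hypothetical value: SHA-1 thumbprint (hex) of the one self-signed certificate to trust.
    public const string PinnedThumbprint = "0123456789ABCDEF0123456789ABCDEF01234567";

    // Trust certificates that pass normal validation, plus the single pinned one.
    // Unlike an "accept everything" policy, any other invalid certificate is rejected.
    public static bool IsTrusted(SslPolicyErrors errors, string thumbprint, string pinned) =>
        errors == SslPolicyErrors.None
        || (thumbprint != null
            && string.Equals(thumbprint, pinned, StringComparison.OrdinalIgnoreCase));

    // Replacement for the obsolete CertificatePolicy hook; call once at startup.
    // The callback is global, so it affects every HttpWebRequest in the AppDomain.
    public static void Install()
    {
        ServicePointManager.ServerCertificateValidationCallback =
            (sender, certificate, chain, errors) =>
                IsTrusted(errors, certificate?.GetCertHashString(), PinnedThumbprint);
    }
}
```

GetCertHashString returns the SHA-1 hash as a hex string, the same value Windows shows as the certificate thumbprint (without spaces).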

WebRequest using Mozilla Firefox

I need access to the HTML of a Facebook page to extract some data from it, so I need to create a WebRequest.
My code works well for other sites, but for Facebook I must be logged in to access the HTML.
How can I use Firefox's data to create a WebRequest for a Facebook page?
I tried this:
List<string> HTML_code = new List<string>();
WebRequest request = WebRequest.Create(URL);
using (WebResponse response = request.GetResponse())
using (StreamReader stream = new StreamReader(response.GetResponseStream()))
{
string line;
while ((line = stream.ReadLine()) != null)
{
HTML_code.Add(line);
}
}
...but the resulting HTML is the Facebook home page as seen when I'm not logged in.

If what you are trying to do is retrieve the number of likes of a Facebook page, you can use Facebook's Graph API. To keep it simple, this is basically what I did in the code:
1. Retrieve the Facebook page's data. In this case I used the Coke page's data, since it was an example Facebook had listed.
2. Parse the returned JSON using Json.NET. There are other ways to do this, but this keeps it simple; you can get Json.NET from NuGet (package Newtonsoft.Json). I based my code on the Json.NET documentation, which will also help you with parsing and serializing more JSON if you need to.
That translates into the code below. Note that I left out exception handling to keep it simple, even though networking is not always reliable! Also, don't forget to add the Json.NET library to your project!
Usings:
using System.IO;
using System.Net;
using Newtonsoft.Json.Linq;
Code:
string url = "https://graph.facebook.com/cocacola";
WebClient client = new WebClient();
string jsonData = string.Empty;
// Load the Facebook page info
Console.WriteLine("Connecting to Facebook...");
using (Stream data = client.OpenRead(url))
{
using (StreamReader reader = new StreamReader(data))
{
jsonData = reader.ReadToEnd();
}
}
// Get number of likes from Json data
JObject jsonParsed = JObject.Parse(jsonData);
int likes = (int)jsonParsed.SelectToken("likes");
// Write out the result
Console.WriteLine("Number of Likes: " + likes);
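The cast (int)jsonParsed.SelectToken("likes") throws if the response has no likes field, which may happen depending on the Graph API version and the permissions of your access token. A defensive sketch that returns null instead; it swaps in System.Text.Json for Json.NET so it needs no package (with Json.NET you would check SelectToken("likes") for null before casting). LikesReader is a hypothetical name, and the input is assumed to be a JSON object:

```csharp
using System.Text.Json;

public static class LikesReader
{
    // Returns the numeric "likes" value, or null when the field is absent or not a number.
    public static int? ReadLikes(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (doc.RootElement.TryGetProperty("likes", out JsonElement likes)
            && likes.ValueKind == JsonValueKind.Number)
        {
            return likes.GetInt32(); // throws if the number does not fit in an int
        }
        return null;
    }
}
```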
