Web scraping from an ASPX page using WebClient in C#

I'm trying to crawl data from an ASPX page that has three dropdowns: State, District, and City. They are implemented as dependent dropdowns with a server-side postback.
I have all the ids for the State, District, and City. I am writing a console application that uses WebClient to post all three dropdown ids as form data to the page, but every time it redirects to an error page. Can anyone help me set all the dropdown values at once with a single POST call?
Code Snippet:
var formValues = new NameValueCollection();
formValues["__VIEWSTATE"] = Extract("__VIEWSTATE", responseString);
formValues["__EVENTVALIDATION"] = Extract("__EVENTVALIDATION", responseString);
formValues["ddlSelectLanguage"] = "en-US";
formValues["ddlState"] = "19";
formValues["DDLDistrict"] = "237";
formValues["DDLVillage"] = "bcab59fd-35d2-e111-882d-001517f1d35c";
client.Headers.Set(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36");
var responseData = client.UploadValues(firstPage, formValues);
responseString = Encoding.ASCII.GetString(responseData);
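
One thing that may be worth checking: with dependent dropdowns that post back to the server, the __VIEWSTATE and __EVENTVALIDATION of each page normally cover only the options rendered so far, so a single POST carrying all three ids can fail event validation. Below is a minimal sketch of building one postback per dropdown instead. The Extract helper is not shown in the question, so the regex version here is a hypothetical stand-in, BuildPostback is an illustrative name, and the sketch assumes the form field names match the control names used in the question.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's Extract helper: returns the value of
// the hidden input with the given name, or "" if it is absent. Assumes the name
// attribute comes before the value attribute, as ASP.NET usually renders it.
static string Extract(string fieldName, string html)
{
    var pattern = "<input[^>]*name=\"" + Regex.Escape(fieldName) + "\"[^>]*value=\"([^\"]*)\"";
    var match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
    return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : string.Empty;
}

// Builds the form for ONE postback step: hidden fields taken from the most
// recent page, the dropdown that "changed" as __EVENTTARGET, and every
// dropdown value chosen so far.
static NameValueCollection BuildPostback(string latestHtml, string changedControl, NameValueCollection selections)
{
    var form = new NameValueCollection
    {
        ["__EVENTTARGET"] = changedControl,
        ["__EVENTARGUMENT"] = string.Empty,
        ["__VIEWSTATE"] = Extract("__VIEWSTATE", latestHtml),
        ["__EVENTVALIDATION"] = Extract("__EVENTVALIDATION", latestHtml),
    };
    form.Add(selections);
    return form;
}

// Intended flow (needs the live page, so shown as comments only):
//   1. GET the page, then POST BuildPostback(page, "ddlState", {ddlState}).
//   2. From that response, POST BuildPostback(page, "DDLDistrict", {ddlState, DDLDistrict}).
//   3. From that response, POST with DDLVillage added in the same way.
```

Each step re-reads the hidden fields from the response to the previous step, which is the part a single combined POST skips.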

Related

Scraping Website with C# using HTML Request Not Giving Table Data

I'm making a simple website scraper in C# to retrieve the party names of Supreme Court cases (this is public information), as in this sample link: https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html
C# Code:
private static async void GetHtmlAsync(String docket)
{
    var url = "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html";
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.234");
    var html = await httpClient.GetStringAsync(url);
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);
    Console.WriteLine();
}
The problem is that whenever I run this, it successfully returns the whole HTML file, but without the data I need, which is enclosed in the element.
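I don't know why, but you should be getting a proper response.
Try the following; it might give you the answer:
// Note: GetAsync returns an HttpResponseMessage; read the body with Content.ReadAsStringAsync().
var html = httpClient.GetAsync(url).GetAwaiter().GetResult();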

C# WebClient receives 403 when getting html from a site

I am trying to download the HTML from a site and parse it. I am actually interested only in the OpenGraph data in the head section. For most sites, WebClient, HttpClient, or HtmlAgilityPack works, but for some domains I get a 403, for example westelm.com.
I have tried setting the headers to be exactly the same as the ones my browser sends, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using (WebClient client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. I don't know if you've already fixed this, but here is how it worked for me.
A page was giving me a 403 for the same reasons. The thing is: you need to emulate a web browser from the code by sending a lot of headers.
I added one of your headers that I wasn't sending before (Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the web page:
// Requires: System.IO, System.IO.Compression, System.Net.Http, System.Threading.Tasks
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    // Advertise only gzip: the body is decompressed with GZipStream below, which cannot read deflate or br.
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    if (response == null)
        return string.Empty;
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    // Decompress only if the server actually gzipped the body; otherwise read it as-is.
    var isGzip = response.Content.Headers.ContentEncoding.Contains("gzip");
    using Stream bodyStream = isGzip ? new GZipStream(responseStream, CompressionMode.Decompress) : responseStream;
    using var streamReader = new StreamReader(bodyStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
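
As a possible alternative to decompressing the stream by hand, the handler behind HttpClient can be asked to do it. The sketch below is an assumption about how one might set that up, not part of the answer above; the header values are copied from the question, and whether any given site then stops returning 403 is not something this guarantees.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Let the handler send Accept-Encoding and undo the compression itself,
// so the response body can be read as plain text.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
};
var httpClient = new HttpClient(handler);

// Browser-like headers, applied to every request made through this client.
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

// Usage (network call, so left as a comment):
//   string html = await httpClient.GetStringAsync(url);
```

With this setup no GZipStream wrapper is needed around the response stream.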

C# RestSharp Cookies

I'm learning RestSharp, but I'm stuck on this problem: getting the cookie string for the client. Here is my code:
var cookieJar = new CookieContainer();
var client = new RestClient("https://server.com")
{
    UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
};
client.CookieContainer = cookieJar;
var request = new RestRequest(Method.GET);
var cookie = client.CookieContainer.GetCookieHeader(new Uri("https://server.com"));
MessageBox.Show("" + cookie);
The cookie is always empty. Can anyone help me?
This will set the cookie for your client. After that, all you need to do is call client.Execute. The code is in C#, but I'm pretty sure you can make it work in anything else.
string myUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36";
client.CookieContainer.Add(new Cookie("UserAgent", myUserAgent) { Domain = myUri.Host });
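
The empty result in the question is consistent with the request never being executed: a CookieContainer only holds cookies that were added to it or received in a response. The sketch below illustrates that with plain System.Net types rather than RestSharp, whose API differs between versions. The "session" cookie and its value are made up, and the Add call merely stands in for a server's Set-Cookie header.

```csharp
using System;
using System.Net;

var uri = new Uri("https://server.com");
var cookieJar = new CookieContainer();

// Nothing has been received yet, so the header is empty.
// This is the situation in the question, where no request was executed.
string before = cookieJar.GetCookieHeader(uri);

// Stand-in for processing a response that carries a Set-Cookie header.
// The cookie name and value are invented for illustration.
cookieJar.Add(uri, new Cookie("session", "abc123"));

// Now the container has something to report for this URI.
string after = cookieJar.GetCookieHeader(uri);

Console.WriteLine($"before: '{before}', after: '{after}'");
```

In the RestSharp code, reading the cookie header after executing the request, rather than before, would be the equivalent step.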

Call a Web Service reference Request in C#

I am currently trying to figure out how to set up some testing for web services in C#.
I have referenced the web services in my project and have populated the request; I am just wondering how I can call the request method.
Below is the existing code. I am trying to simulate a call to the AddNewResponder web service. All of the items that the web service asks for are populated below; I just can't figure out how to execute the web service call.
static void Main(string[] args)
{
    int testID = 0;
    //populate the test user with user data
    TestUser tUser = GetUserData(testID);
    //Create Request Body
    RCWS.AddNewResponderRequestBody respRequestBody = new RCWS.AddNewResponderRequestBody();
    respRequestBody.PriorityCode = tUser.PriCode;
    respRequestBody.ClientCode = "TestData";
    respRequestBody.Domain = "TestDomain";
    respRequestBody.IPAddress = "192.168.2.1";
    respRequestBody.Source = "web";
    respRequestBody.OS = "WinNT";
    respRequestBody.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    respRequestBody.Browser = "Chrome";
    //Create Request
    RCWS.AddNewResponderRequest addNewResp = new RCWS.AddNewResponderRequest(respRequestBody);
}

fake http post request - viewstate

I am trying to fake a POST request to a site programmed in C#.
I used Wireshark to sniff the communication between my computer and the server.
I noticed that the client sends view state data (encoded in Base64), and I would like to know how to fake it in my request.
My POST code:
public static void sendPostRequest(string responseUri, CookieCollection responseCookies)
{
    HttpWebRequest mPostRequest =
        (HttpWebRequest)WebRequest.Create("http://tickets.cinema-city.co.il/webtixsnetglilot/SelectSeatPage2.aspx?dtticks=" + responseUri + "&hideBackButton=1");
    mPostRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    mPostRequest.KeepAlive = false;
    // HTTP method names are conventionally upper case.
    mPostRequest.Method = "POST";
    mPostRequest.ContentType = "application/x-www-form-urlencoded";
    // Carry over the cookies from the earlier response.
    CookieContainer mCookies = new CookieContainer();
    foreach (Cookie cookie in responseCookies)
    {
        mCookies.Add(cookie);
    }
    mPostRequest.CookieContainer = mCookies;
    HttpWebResponse myHttpWebResponse2 = (HttpWebResponse)mPostRequest.GetResponse();
}
If you could "fake" signed/encrypted data, you wouldn't really need to deal with fake posts; you could just steal all SSL traffic :).
The view state arrives, already signed/encrypted, in the original response for the page, so you simply need to parse that response (use Html Agility Pack) and send the same view state back in the POST request.
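
The "send it back" half of that advice could look roughly like the sketch below, which URL-encodes the fields and writes them as the request body. BuildFormBody and WriteFormBody are hypothetical helper names, the view state value is a placeholder rather than real data, and extracting the real value from the GET response (with Html Agility Pack or otherwise) is assumed to have happened already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Encodes name/value pairs as application/x-www-form-urlencoded. Escaping matters
// for __VIEWSTATE because Base64 text can contain '+', '/' and '=' characters.
static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
{
    return string.Join("&", fields.Select(f =>
        WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
}

// Writes the encoded body to a request created elsewhere (for example the
// mPostRequest from the question). Defined here but not called, since it
// opens a network connection.
static void WriteFormBody(HttpWebRequest request, string body)
{
    byte[] bytes = Encoding.UTF8.GetBytes(body);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = bytes.Length;
    using (var stream = request.GetRequestStream())
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}

var fields = new Dictionary<string, string>
{
    // Placeholder: in practice this is copied from the hidden input in the GET response.
    ["__VIEWSTATE"] = "dDwt+Mz/k=",
};
Console.WriteLine(BuildFormBody(fields));
```

The body has to be written before GetResponse is called; the code in the question sets the content type but never writes a body.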
