I'm making a simple website scraper in C# to retrieve party names of Supreme Court cases (this is public information), such as this sample link: https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html
C# Code:
private static async Task GetHtmlAsync(String docket)
{
    var url = "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html";
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.234");
    var html = await httpClient.GetStringAsync(url);
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);
    Console.WriteLine(html);
}
The problem is that whenever I run this, it successfully returns the whole HTML file, but without the data I need, which is enclosed in the element.
(Screenshots were attached here comparing the page as rendered in the browser with the HTML received at runtime.)
I don't know why, but you should get the proper response if you try the following. Note that GetAsync returns an HttpResponseMessage, so the body has to be read out of it separately:

var response = httpClient.GetAsync(url).GetAwaiter().GetResult();
var html = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
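Spelled out as a complete method, that approach looks roughly like this. This is a sketch, assuming the docket page is static HTML; the names DocketFetcher, BuildDocketUrl, and GetHtmlAsync are my own, not from the original code:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class DocketFetcher
{
    // Builds the search URL for a given docket number, e.g. "19-8334".
    public static string BuildDocketUrl(string docket) =>
        "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/" + docket + ".html";

    // GetAsync hands back an HttpResponseMessage, so the HTML body still has
    // to be read from response.Content in a second step.
    public static async Task<string> GetHtmlAsync(string docket)
    {
        using var httpClient = new HttpClient();
        using var response = await httpClient.GetAsync(BuildDocketUrl(docket));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

EnsureSuccessStatusCode makes a non-200 answer throw instead of silently parsing an error page, which helps when diagnosing missing elements.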
I am trying to download the HTML from a site and parse it. I am actually only interested in the OpenGraph data in the head section. For most sites, WebClient, HttpClient, or HtmlAgilityPack works, but for some domains I get a 403, for example: westelm.com
I have tried setting the headers to be exactly the same as the ones my browser sends, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using (WebClient client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. I don't know if you've already fixed this, but I'll tell you how it worked for me.
A page was giving me a 403 for the same reason. The thing is: you need to emulate a web browser from code, sending a lot of headers.
I used one of your headers that I wasn't sending (Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the page:
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    if (response == null)
        return string.Empty;

    // Note: this assumes the server answered with gzip. Since Accept-Encoding
    // also advertises deflate and br, check response.Content.Headers.ContentEncoding
    // before picking a decompression stream in production code.
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
    using var streamReader = new StreamReader(decompressedStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
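One caveat with the answer above: it always unwraps the body with GZipStream, but the request advertises gzip, deflate, and br, so a server that responds with brotli or no encoding would break it. A small helper of my own (not part of the original answer) that switches on the actual Content-Encoding value; the "br" case uses BrotliStream, available since .NET Core 2.1:

```csharp
using System.IO;
using System.IO.Compression;

static class HttpDecompression
{
    // Picks the decompressor matching the response's Content-Encoding header.
    public static Stream Decompress(Stream raw, string contentEncoding) =>
        contentEncoding switch
        {
            "gzip" => new GZipStream(raw, CompressionMode.Decompress),
            "deflate" => new DeflateStream(raw, CompressionMode.Decompress),
            "br" => new BrotliStream(raw, CompressionMode.Decompress),
            _ => raw // no/unknown encoding: pass the stream through untouched
        };
}
```

In practice the manual wrapping can usually be avoided entirely by constructing the HttpClient with `new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }`, which decompresses transparently based on Content-Encoding.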
So I'm learning RestSharp, but I'm stuck on a problem: getting the cookie header string for the client's cookies. Here is my code:
var cookieJar = new CookieContainer();
var client = new RestClient("https://server.com")
{
    UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
};
client.CookieContainer = cookieJar;
var request = new RestRequest(Method.GET);
var cookie = client.CookieContainer.GetCookieHeader(new Uri("https://server.com"));
MessageBox.Show("" + cookie);
and the cookie always comes back empty. Can anyone help me?
This will set the cookie for your client. After that, all you need to do is call client.Execute. The code is C#, but I'm pretty sure you can adapt it to other languages.
Uri myUri = new Uri("https://server.com");
string myUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36";
client.CookieContainer.Add(new Cookie("UserAgent", myUserAgent) { Domain = myUri.Host });
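The underlying behavior can be reproduced with plain System.Net, no RestSharp needed: a CookieContainer only reports cookies that have actually been placed in it, either manually as above or by a response from an executed request, which is why reading it before any Execute call yields an empty string. A self-contained sketch (CookieJarDemo and HeaderFor are illustrative names of my own):

```csharp
using System;
using System.Net;

static class CookieJarDemo
{
    // Returns the Cookie header a container would send for the given host.
    public static string HeaderFor(string name, string value, string host)
    {
        var jar = new CookieContainer();
        // A fresh container knows no cookies, so GetCookieHeader yields ""
        // until something is added, manually or by a server response.
        jar.Add(new Cookie(name, value) { Domain = host });
        return jar.GetCookieHeader(new Uri("https://" + host + "/"));
    }
}
```

Setting Domain scopes the cookie so the container considers it a match for the target URI; without a matching domain, GetCookieHeader stays empty even after Add.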
I am currently trying to figure out how to set up some testing for web services in C#.
I have referenced the web services in my project and have populated the request; I am just wondering how I can call the request method.
Below is the existing code. I am trying to simulate using the AddNewResponder web service. All of the items that the web service asks for are populated below, but I can't figure out how to actually execute the web service call.
static void Main(string[] args)
{
    int testID = 0;

    // Populate the test user with user data.
    TestUser tUser = GetUserData(testID);

    // Create the request body.
    RCWS.AddNewResponderRequestBody respRequestBody = new RCWS.AddNewResponderRequestBody();
    respRequestBody.PriorityCode = tUser.PriCode;
    respRequestBody.ClientCode = "TestData";
    respRequestBody.Domain = "TestDomain";
    respRequestBody.IPAddress = "192.168.2.1";
    respRequestBody.Source = "web";
    respRequestBody.OS = "WinNT";
    respRequestBody.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    respRequestBody.Browser = "Chrome";

    // Create the request.
    RCWS.AddNewResponderRequest addNewResp = new RCWS.AddNewResponderRequest(respRequestBody);
}
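The missing step is usually to construct the generated proxy client and call the operation method on it; building the request object alone sends nothing. Since the real RCWS proxy isn't shown, the sketch below stubs minimal stand-in types, so every name here is hypothetical except the calling pattern itself:

```csharp
// Stand-ins for the generated RCWS proxy types; the real ones come from the
// service reference added to the project.
class AddNewResponderRequestBody
{
    public string ClientCode;
    public string Domain;
}

class AddNewResponderRequest
{
    public readonly AddNewResponderRequestBody Body;
    public AddNewResponderRequest(AddNewResponderRequestBody body) { Body = body; }
}

// A generated WCF proxy exposes one method per service operation; calling
// that method is what actually sends the request.
class ResponderServiceClient
{
    public string AddNewResponder(AddNewResponderRequest request)
    {
        // The real proxy serializes the request into a SOAP call here;
        // this stub just echoes so the pattern is runnable.
        return "added:" + request.Body.ClientCode;
    }
}
```

With the real service reference, the call would look like `var client = new RCWS.SomeGeneratedClient(); var response = client.AddNewResponder(addNewResp);` where the exact client class name depends on what the reference generated.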
I am trying to fake a POST request to a site built with C# (ASP.NET).
I used Wireshark to sniff the communication between my computer and the server.
I noticed that the client sends view state data (encoded in Base64), and I would like to know how to fake it in my request.
My POST code:
public static void sendPostRequest(string responseUri, CookieCollection responseCookies)
{
    HttpWebRequest mPostRequest =
        (HttpWebRequest)WebRequest.Create("http://tickets.cinema-city.co.il/webtixsnetglilot/SelectSeatPage2.aspx?dtticks=" + responseUri + "&hideBackButton=1");
    mPostRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    mPostRequest.KeepAlive = false;
    mPostRequest.Method = "POST";
    mPostRequest.ContentType = "application/x-www-form-urlencoded";

    CookieContainer mCookies = new CookieContainer();
    foreach (Cookie cookie in responseCookies)
    {
        mCookies.Add(cookie);
    }
    mPostRequest.CookieContainer = mCookies;

    // Note: the form body (including the view state) still has to be written
    // to the request stream before calling GetResponse.
    using HttpWebResponse myHttpWebResponse2 = (HttpWebResponse)mPostRequest.GetResponse();
}
If you could "fake" signed/encrypted data, you wouldn't need to bother with fake POSTs; you could just steal all SSL traffic :).
The view state comes back, encrypted, in the original response for the page, so you simply need to parse that response (use Html Agility Pack) and send the view state back in your POST request.
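To make "parse and resend" concrete: ASP.NET pages carry the view state in a hidden `__VIEWSTATE` input, and its value must be included in the POST form data. Html Agility Pack is the robust way to read it; as a minimal self-contained illustration of the idea, here is a regex sketch of my own (fragile compared to a real parser, fine only for experiments):

```csharp
using System.Text.RegularExpressions;

static class ViewStateExtractor
{
    // Pulls the value of the hidden __VIEWSTATE input out of an ASP.NET page.
    // Returns null if no such field is present.
    public static string Extract(string html)
    {
        var m = Regex.Match(html, "id=\"__VIEWSTATE\"[^>]*value=\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The extracted value then goes into the form body as `__VIEWSTATE=<url-encoded value>` alongside the other fields; `__EVENTVALIDATION` usually has to be round-tripped the same way.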