I am trying to make a request to an API called Pacer.gov. I'm expecting a file to be returned, but I'm not getting it. Can someone help me with what I'm missing?
So my C# Rest call looks like this:
(The variable PacerSession is the authentication cookie I got with help from jonathon-reinhart; read more about that here: How do I use RestSharp to POST a login and password to an API?)
var client = new RestClient("https://pcl.uscourts.gov/dquery");
client.CookieContainer = new System.Net.CookieContainer();
//var request = new RestRequest("/dquery", Method.POST);
var request = new RestRequest(Method.POST);
request.AddParameter("download", "1");
request.AddParameter("dl_fmt", "xml");
request.AddParameter("party", "Moncrief");
request.AddHeader("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36");
request.AddHeader("content-type", "text/plain; charset=utf-8");
request.AddHeader("accept", "*/*");
request.AddHeader("accept-encoding", "gzip, deflate, sdch");
request.AddHeader("accept-language", "en-US,en;q=0.8");
request.AddHeader("cookie", "PacerSession=" + PacerSession);
IRestResponse response = client.Execute(request);
If I just type the URL https://pcl.uscourts.gov/dquery?download=1&dl_fmt=xml&party=Moncrief into Chrome, I get back an XML file. When I look at the IRestResponse, I don't see anything that looks like a file. Is there something wrong with my request or am I getting the file back and just need to know how to retrieve it?
Here's part of the file I get back if I use the URL directly in the browser:
Here's what I see in VS when I debug it and look at the IRestResponse variable:
UPDATE - 6/3/16
Received this response from Pacer tech support:
In the Advanced REST Client, you will see a HTTP 302 response (a redirect to another page). In a normal browser, the redirect is automatically followed without the user seeing anything (even on the URL in the browser).
The ARC does not automatically follow that redirect to the target page.
You can see in the header of the response the target URL that has the results.
If you manually cut and paste this URL to the ARC as a HTTP GET request, you will get the XML results. I have never used C#, but there is usually a property associated with web clients that will force the client to follow the redirect.
I tried adding this:
client.FollowRedirects = true;
but I'm still not seeing an xml file when I debug this code:
IRestResponse response = client.Execute(request);
How do I get the file? Is there something I have to do to get the file from the URL it's being redirected to?
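For reference, the manual equivalent of what Pacer support describes would be to turn off automatic redirects and issue a second GET against the 302's Location header. A rough, untested sketch building on the code above (it assumes the Location header is exposed on the response, and it sidesteps the cookie question the answer below deals with):

client.FollowRedirects = false;
IRestResponse first = client.Execute(request);
// find the redirect target in the 302's Location header (requires using System.Linq)
var location = first.Headers.FirstOrDefault(h => h.Name.Equals("Location", StringComparison.OrdinalIgnoreCase));
if (location != null)
{
    var followUp = new RestRequest(location.Value.ToString(), Method.GET);
    IRestResponse second = client.Execute(followUp); // should hold the XML if the session carries over
}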
There's one major problem with your code: you're only carrying one of the three cookies that check-pacer-passwd.pl returns. You need to preserve all three. The following code is a possible implementation, with some notes afterwards.
public class PacerClient
{
    private CookieContainer m_Cookies = new CookieContainer();

    public string Username { get; set; }
    public string Password { get; set; }

    public PacerClient(string username, string password)
    {
        this.Username = username;
        this.Password = password;
    }

    public void Connect()
    {
        var client = new RestClient("https://pacer.login.uscourts.gov");
        client.CookieContainer = this.m_Cookies;

        RestRequest request = new RestRequest("/cgi-bin/check-pacer-passwd.pl", Method.POST);
        request.AddParameter("loginid", this.Username);
        request.AddParameter("passwd", this.Password);

        IRestResponse response = client.Execute(request);
        if (response.Cookies.Count < 1)
        {
            throw new WebException("No cookies returned.");
        }
    }

    public XmlDocument SearchParty(string partyName)
    {
        string requestUri = $"/dquery?download=1&dl_fmt=xml&party={partyName}";

        var client = new RestClient("https://pcl.uscourts.gov");
        client.CookieContainer = this.m_Cookies;

        var request = new RestRequest(requestUri);
        IRestResponse response = client.Execute(request);

        if (!String.IsNullOrEmpty(response.Content))
        {
            XmlDocument result = new XmlDocument();
            result.LoadXml(response.Content);
            return result;
        }
        else return null;
    }
}
It's easiest to just keep a hold of the CookieContainer throughout the entire time you're working with Pacer. I wrapped the functionality into a class, just to make it a little easier to package up with this answer, but you can implement it however you want. I didn't put in any real error checking, so you probably want to check that response.ResponseUri is actually the search page and not the logon page, and that the content is actually well-formed XML.
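If you want those checks, a minimal sketch of the end of SearchParty might look like this (the assumption that a missing session bounces you to a host containing "login" is mine, not something Pacer documents):

// hypothetical sanity checks before returning the XmlDocument
if (response.ResponseUri != null && response.ResponseUri.Host.Contains("login"))
{
    throw new WebException("Redirected to the login page; the session cookies are probably missing or expired.");
}

var result = new XmlDocument();
try
{
    result.LoadXml(response.Content);
}
catch (XmlException)
{
    return null; // not well-formed XML; inspect response.Content to see what actually came back
}
return result;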
I've tested this using my own Pacer account, like so:
PacerClient client = new PacerClient(Username, Password);
client.Connect();
var document = client.SearchParty("Moncrief");
Related
With every approach I've found on the internet for sending a request in C# .NET, the custom header / authorization I set is only visible in the VS request; when I check in Fiddler it's not there. I don't believe it's a code problem; I think it has to do with something else.
Latest way tried:
var form = new MultipartFormDataContent();
form.Add(new ByteArrayContent(fileContent, 0, fileContent.Length), "image", filename);

HttpRequestMessage message = new HttpRequestMessage(HttpMethod.Post, URL)
{
    Content = form
};
message.Headers.Authorization = new AuthenticationHeaderValue("Basic", AuthCode);

var response = httpClient.SendAsync(message).Result;
string apiResponse = response.Content.ReadAsStringAsync().Result;
I am trying to make some Bitbucket API requests using the csharp.bitbucket library. I have some code which fetches a request token and then builds up an authenticate URL. The authenticate URL looks something like:
https://bitbucket.org/api/1.0/oauth/authenticate/?oauth_token=xxxxxx
Where xxxxx is my token that I have already retrieved via bitbucket api.
The issue I am having is that when I try to download the URL using WebClient, I always get the Bitbucket login page, even though I am passing an authorisation header. When I hit the authenticate URL using Postman and pass through the same token and authorisation header, it all works. My code looks like this:
using (var wc = new CookieWebClient(_username, _password))
{
    pageText = wc.DownloadString(url);
}
The CookieWebClient class looks like this:
public class CookieWebClient : WebClient
{
    public CookieContainer m_container = new CookieContainer();
    public WebProxy proxy = null;

    public CookieWebClient(string authenticationUser, string authenticationPassword)
    {
        string credentials = Convert.ToBase64String(Encoding.ASCII.GetBytes(authenticationUser + ":" + authenticationPassword));
        Headers[HttpRequestHeader.Authorization] = "Basic " + credentials;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        try
        {
            ServicePointManager.DefaultConnectionLimit = 1000000;
            WebRequest request = base.GetWebRequest(address);
            request.Proxy = proxy;

            var webRequest = request as HttpWebRequest;
            webRequest.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";
            webRequest.PreAuthenticate = true;
            webRequest.AllowAutoRedirect = true;
            webRequest.Pipelined = true;
            webRequest.KeepAlive = true;
            if (webRequest != null)
            {
                webRequest.CookieContainer = m_container;
            }
            return webRequest;
        }
        catch
        {
            return null;
        }
    }
}
It looks like the authentication part via WebClient is not working, because when I make the DownloadString call I get the Bitbucket login page.
Anyone seen this before?
Thanks in advance
Ismail
So, in answer to my own question: after looking at Fiddler and Postman I could see that when calling authenticate it was doing a 301 redirect and losing the authorisation header, so I updated my code to hit the URL it was trying to 301 to.
So instead of authenticate I go to authorise directly, while passing my token and authorisation header, and now it all works. This all used to work, but I think they have changed something at Bitbucket's end, hence the breakage.
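In code, the only change was to request the redirect target directly so the Authorization header set on the WebClient is still present when the request lands. A rough sketch (the /authorize/ substitution is my guess at the mapping; take the real target from the Location header Fiddler shows on the 301):

// hit the URL the 301 pointed at instead of the /authenticate/ URL
string authorizeUrl = url.Replace("oauth/authenticate", "oauth/authorize"); // hypothetical mapping
using (var wc = new CookieWebClient(_username, _password))
{
    pageText = wc.DownloadString(authorizeUrl);
}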
So the issue is the 301 redirect losing the authorisation header that has been set. Hope this helps someone.
Ismail
I have tried just about all the related solutions found on the Web, but they all refused to work for some reason. This does not work either: C# - HttpWebRequest POST (Login to Facebook), since we are using different methods.
I am not using the POST method but the GET method, which is what the request uses. The site I am using does not need any login credentials to get the image. (Most of the other root domains the site has do not require a cookie.)
The below code is a part of what I figured out to make the program get the image like the web-based versions do, but with a few problems.
Before, I was trying to use a normal WebClient to download the image since it refused to show up in any way that the PictureBox control would accept. But then I switched to HttpWebRequest.
The particular root domain of the site I am trying to get the image from requires a cookie, though.
Below is a code snippet which basically tries to get an image from a site. The only trouble is, it is almost impossible to get the image from the site unless you pass a few things in the HttpWebRequest, along with a cookie.
For now, I am using a static cookie as a temporary workaround.
HttpWebRequest _request = (HttpWebRequest)HttpWebRequest.Create(_URL);
_request.Method = WebRequestMethods.Http.Get;
_request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
_request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
_request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
_request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
_request.Host = "www.habbo" + _Country;
_request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
using (WebResponse _response = _request.GetResponse())
using (Stream _stream = _response.GetResponseStream())
{
    Image _image = Image.FromStream(_stream);
    _bitmap = new Bitmap(_image);
    string contentType = _response.ContentType;
    _PictureBox.Image = _bitmap;
}
Assume the following variable values:
_URL = "http://www.habbo.com/habbo-imaging/avatarimage?hb=img&user=aa&direction=2&head_direction=2&size=m&img_format=gif";
_Country = ".com";
Most of the things I am passing into the HttpWebRequest are obtained from looking at the Network tab of Google Chrome's Developer Tools.
The web-based versions of the Habbo Imager seem to just direct people to the page where they can find the image, and their browsers seem to somehow add the cookie. What I am doing is different: all they do is display the site where the image is located, but I want to locate the image's true location and then read it into an Image.
Apparently the site needs the user to "visit" it, according to what I read in this thread: Click here
What I would like to know is, is there a better way to get a valid cookie that the server will happily accept every time?
Or do I need to somehow trick the site into thinking the user has visited the page and seen it, thereby making them maybe return the cookie we might need, even though the user doesn't ever see the page?
Not too sure if this would mean that I need to somehow dynamically generate the cookies though.
I also do not understand how to truly create or get the cookies (and set stored cookies) using C#, so if it is possible, please use some examples.
I would prefer not to use any third-party libraries or to change the code I am using too much. Nor should the program send two GET requests just to get what it could get with one GET request. Thus, this wouldn't work: Passing cookie with HttpWebRequest in winforms?
I am using .NET 4.0.
It is a little more complicated than it appears at first sight. The browser actually makes two calls. The first one returns an HTML page with a small piece of JavaScript that, when executed, sets a cookie and reloads the page. In your C# code you have to mimic that.
In your form class, add an instance variable to hold all the cookies across multiple HttpWebRequest calls:
readonly CookieContainer cookiecontainer = new CookieContainer();
I have created a Builder method that creates the HttpWebRequest and returns an HttpWebResponse. It takes a NameValueCollection to add any cookies to the CookieContainer.
private HttpWebResponse Builder(string url, string host, NameValueCollection cookies)
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
    request.Method = WebRequestMethods.Http.Get;
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    // _request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
    request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
    request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
    request.Host = host;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
    request.CookieContainer = cookiecontainer;

    if (cookies != null)
    {
        foreach (var cookiekey in cookies.AllKeys)
        {
            request.CookieContainer.Add(
                new Cookie(
                    cookiekey,
                    cookies[cookiekey],
                    @"/",
                    host));
        }
    }
    return (HttpWebResponse) request.GetResponse();
}
If the incoming stream turns out to have a text/html content type, we need to parse its content and return the cookie name and value. The Parse method does just that:
// find in the html and return the three parameters in a string array
// setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '127.0.0.1', 10);
private static string[] Parse(Stream _stream, string encoding)
{
    const string setCookieCall = "setCookie('";

    // copy html as string
    var ms = new MemoryStream();
    _stream.CopyTo(ms);
    var html = Encoding.GetEncoding(encoding).GetString(ms.ToArray());

    // find setCookie call
    var findFirst = html.IndexOf(
        setCookieCall,
        StringComparison.InvariantCultureIgnoreCase) + setCookieCall.Length;
    var last = html.IndexOf(");", findFirst, StringComparison.InvariantCulture);
    var setCookieStatementCall = html.Substring(findFirst, last - findFirst);

    // take the parameters
    var parameters = setCookieStatementCall.Split(new[] {','});
    for (int x = 0; x < parameters.Length; x++)
    {
        // cleanup
        parameters[x] = parameters[x].Replace("'", "").Trim();
    }
    return parameters;
}
Now that our building blocks are complete, we can start calling our methods from the Click method. We use a loop to call Builder up to twice to obtain a result from the given URL. Based on the received content type, we either Parse the HTML or create the Image from the stream.
private void button1_Click(object sender, EventArgs e)
{
    var cookies = new NameValueCollection();
    for (int tries = 0; tries < 2; tries++)
    {
        using (var response = Builder(_URL, "www.habbo" + _Country, cookies))
        {
            using (var stream = response.GetResponseStream())
            {
                string contentType = response.ContentType.ToLowerInvariant();
                if (contentType.StartsWith("text/html"))
                {
                    var parameters = Parse(stream, response.CharacterSet);
                    cookies.Add(parameters[0], parameters[1]);
                }
                if (contentType.StartsWith("image"))
                {
                    pictureBox1.Image = Image.FromStream(stream);
                    break; // we're done, get out
                }
            }
        }
    }
}
Words of caution
This code works for the URL in your question. I didn't take any measures to handle other patterns and/or exceptions; it is up to you to add that. Also, when doing this kind of scraping, make sure the owner of the website allows it.
I have written a method to post messages to a URI.
public string RestClientPost(string uri, string message = null)
{
    var client = new RestClient(uri);
    var request = new RestRequest(Method.POST);
    request.AddHeader("Accept", "text/xml");
    if (!string.IsNullOrEmpty(message))
        request.AddParameter(message, ParameterType.RequestBody);

    var result = "";
    var response = client.Execute(request);
    if (response.StatusCode == HttpStatusCode.OK)
    {
        result = response.Content;
        Console.WriteLine(result);
    }
    else
    {
        result = response.StatusCode.ToString();
    }
    return result;
}
And the code below uses the above method to post:
public void test123()
{
    string uri = "myuri"; // private uri, cannot expose.
    var file = System.IO.File.ReadAllText(Path.Combine(Settings.EnvValPath, "RestClientXML", "test.XML"));
    var content = new RestClientServices().RestClientPost(uri, file);
}
However, it returns "Unsupported Media Type".
My test.XML's content is:
<customer>
    <customerName>test</customerName>
    <customerStatus>OK</customerStatus>
</customer>
Using the Advanced REST Client plugin for Google Chrome, I'm able to post it and get back the string I wanted. Is there something wrong? I set "content-type" to "text/xml" in Advanced REST Client.
The return message is the id of the customer, e.g. 2132.
I'm using Postman. If you can call the XML web service with this tool, you can click on Code, select RestSharp, and copy-paste the generated snippet into your code.
This happens because the "Accept" header specifies the type of the response you want back, not the type of content you are sending. Specify the type of content you send with a Content-Type header, e.g. "Content-Type: application/xml".
If the return type of a POST request is a media file, you can use 'image/png' or 'image/jpeg' as the Accept value. You can also send multiple Accept values, like "application/xml, application/xhtml+xml, image/png".
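Applied to the RestClientPost method above, a minimal sketch of the fix could look like this (assuming the service expects text/xml; note that in the older RestSharp API the body parameter's name is what sets its content type):

var request = new RestRequest(Method.POST);
request.AddHeader("Accept", "text/xml");       // the type we want back
request.AddHeader("Content-Type", "text/xml"); // the type we are sending
if (!string.IsNullOrEmpty(message))
    request.AddParameter("text/xml", message, ParameterType.RequestBody); // the name doubles as the content type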
You can use Fiddler to debug HTTP(S) traffic; it's a good tool for web developers.
Is there a way to spoof a web request from C# code so it doesn't look like a bot or spam hitting the site? I am trying to web scrape my own website, but keep getting blocked after a certain number of calls. I want to act like a real browser. I am using this code, from HTML Agility Pack:
var web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
I do way too much web scraping, but here are the options:
I have a default list of headers I add as all of these are expected from a browser:
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
wc.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
wc.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
wc.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
wc.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB,en-US;q=0.8,en;q=0.6";
wc.Headers[HttpRequestHeader.AcceptCharset] = "ISO-8859-1,utf-8;q=0.7,*;q=0.3";
(wc is my WebClient.)
As further help, here is my WebClient class that keeps cookies stored, which is also a massive help:
public class CookieWebClient : WebClient
{
    public CookieContainer m_container = new CookieContainer();
    public WebProxy proxy = null;

    protected override WebRequest GetWebRequest(Uri address)
    {
        try
        {
            ServicePointManager.DefaultConnectionLimit = 1000000;
            WebRequest request = base.GetWebRequest(address);
            request.Proxy = proxy;

            HttpWebRequest webRequest = request as HttpWebRequest;
            webRequest.Pipelined = true;
            webRequest.KeepAlive = true;
            if (webRequest != null)
            {
                webRequest.CookieContainer = m_container;
            }
            return request;
        }
        catch
        {
            return null;
        }
    }
}
Here is my usual use for it. Add a static copy to the base site class that holds your parsing functions:
protected static CookieWebClient wc = new CookieWebClient();
And call it as such:
public HtmlDocument Download(string url)
{
    HtmlDocument hdoc = new HtmlDocument();
    HtmlNode.ElementsFlags.Remove("option");
    HtmlNode.ElementsFlags.Remove("select");

    Stream read = null;
    try
    {
        read = wc.OpenRead(url);
    }
    catch (ArgumentException)
    {
        read = wc.OpenRead(HttpHelper.HTTPEncode(url));
    }

    hdoc.Load(read, true);
    return hdoc;
}
The other main reason you may be crashing out is that the server is closing the connection because you have had it open for too long. You can prove this by adding a try/catch around the download part as above; if it fails, reset the WebClient and try to download again:
HtmlDocument d = new HtmlDocument();
try
{
    d = this.Download(prp.PropertyUrl);
}
catch (WebException e)
{
    this.Msg(Site.ErrorSeverity.Severe, "Error connecting to " + this.URL + " : Resubmitting..");
    wc = new CookieWebClient();
    d = this.Download(prp.PropertyUrl);
}
This saves my ass all the time; even if it was the server rejecting you, this can re-jig the lot. Cookies are cleared and you're free to roam again. If worst truly comes to worst, add proxy support and apply a new proxy every 50 or so requests.
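A rough sketch of that per-50-requests rotation might look like this (the proxy addresses and the threshold are placeholders; the proxy field is the one already exposed on CookieWebClient above):

static readonly string[] proxyAddresses = { "http://proxy1:8080", "http://proxy2:8080" }; // hypothetical list
static int requestCount = 0;

static void RotateProxyIfNeeded()
{
    requestCount++;
    if (requestCount % 50 == 0)
    {
        // hand the shared WebClient the next proxy in the list
        var next = proxyAddresses[(requestCount / 50) % proxyAddresses.Length];
        wc.proxy = new WebProxy(next);
    }
}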
That should be more than enough for you to kick your own and any other site's arse.
Use a regular browser and fiddler (if the developer tools are not up to scratch) and take a look at the request and response headers.
Build up your requests and request headers to match what the browser sends (you can use a couple of different browsers to assess whether this makes a difference).
Regarding "getting blocked after a certain number of calls": throttle your calls. Only make one call every x seconds. Behave nicely to the site and it will behave nicely to you.
Chances are good that they simply look at the number of calls from your IP address per second and if it passes a threshold, the IP address gets blocked.
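A minimal throttling sketch (the five-second delay and the urlsToScrape list are arbitrary placeholders; Thread.Sleep needs using System.Threading):

// one call every few seconds keeps the request rate below most simple per-IP thresholds
TimeSpan delay = TimeSpan.FromSeconds(5);
foreach (string url in urlsToScrape)
{
    HtmlDocument doc = this.Download(url); // the Download helper from the answer above
    // ... parse doc here ...
    Thread.Sleep(delay);
}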