I open a website using webbrowser control and then save cookies in cookieContainer , and later use HTTPwebrequest to process forward browsing pages etc.
The issue arises, when i make a search and it returns 100 pages,on the first page ,it saves a cookie named : ABC ,which i add to the cookiecontainer and move to the next page , on the second page again same Cookie named: ABC is given some value, but now i have two same cookies in cookiecontainer and when i move to the next page it does not work , as its taking the first cookie which messes everything.
How to solve this?
HttpWEBREQUEST FUNCTION:
public string getHtmlCookies(string url)
{
string responseData = "";
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Accept = "*/*";
request.AllowAutoRedirect = true;
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
request.Timeout = 30000;
request.Method = "GET";
request.CookieContainer = yummycookies;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
foreach (Cookie cookie in response.Cookies)
{
string name = string.Empty;
name = cookie.Name;
string value = cookie.Value;
string path = "/";
string domain = "www.example.com";
yummycookies.Add(new Cookie(name.Trim(), value.Trim(), path, domain));
}
Stream responseStream = response.GetResponseStream();
StreamReader myStreamReader = new StreamReader(responseStream);
responseData = myStreamReader.ReadToEnd();
}
response.Close();
}
catch (Exception e)
{
responseData = "An error occurred: " + e.Message;
}
return responseData;
}
You can use SetCookies method.
var container = new System.Net.CookieContainer();
var uri = new Uri("http://www.example.com");
container.SetCookies(uri,"name=value");
container.SetCookies(uri,"name=value1");
Calling GetCookies(uri) will give a single cookie with Value=value1.
And in your case, the code would be something like
var uri = new Uri("http://www.example.com");
yummycookies.SetCookies(uri, response.Headers[HttpResponseHeader.SetCookie]);
RePierre answer, in my case, duplicates cookies if they are present in container. I have used this instead:
cookieContainer.GetAllCookies().FirstOrDefault(x => x.Name == "myCookie").Value = "MyValue";
Related
I have a little problem with cookie handling in C#
So on my web site, I have a login page, once logged in, I am redirected to the home page. I get with HttpWebRequest to connect and follow the redirection, I created a class, here it is :
class webReq
{
private string urlConnection;
private string login;
private string password;
private CookieCollection cookieContainer;
private long executionTime = 0;
public webReq(string urlCo, string login, string pass)
{
this.urlConnection = urlCo;
this.login = login;
this.password = pass;
this.cookieContainer = null;
}
public void StartConnection()
{
string WriteHTML = "D:/REM/Connection.html";
List<string> datas = new List<string>();
datas.Add("Username=" + this.login);
datas.Add("Password=" + this.password);
datas.Add("func=ll.login");
datas.Add("NextURL=/admin/livelink.exe");
datas.Add("loginbutton=Sign in");
string postData = "";
postData = string.Join("&", datas);
var buffer = Encoding.ASCII.GetBytes(postData);
try
{
var watch = System.Diagnostics.Stopwatch.StartNew();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(this.urlConnection);
request.AllowAutoRedirect = true;
request.Method = "POST";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19";
request.Accept = "text/html, application/xhtml+xml, */*";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = buffer.Length;
request.CookieContainer = new CookieContainer();
Stream stream = request.GetRequestStream();
stream.Write(buffer, 0, buffer.Length);
stream.Close();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
stream = response.GetResponseStream();
watch.Stop();
this.executionTime = watch.ElapsedMilliseconds;
StreamReader reader = new StreamReader(stream);
System.IO.File.WriteAllText(WriteHTML, reader.ReadToEnd());
this.cookieContainer = new CookieCollection();
foreach (Cookie cookie in response.Cookies)
{
this.cookieContainer.Add(cookie);
}
}
catch (WebException ex)
{
Console.WriteLine(ex.GetBaseException().ToString());
}
}
}
I load the home page well, and I manage to get a cookie.
So I developed a function to use my cookie to browse the website :
public void connectUrl(string url, int numeroTest)
{
string WriteHTML = "D:/REM/Page"+numeroTest+".html";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
//Add cookie to request.CookieContainer
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(this.cookieContainer);
var watch = System.Diagnostics.Stopwatch.StartNew();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream();
watch.Stop();
this.executionTime = watch.ElapsedMilliseconds;
StreamReader reader = new StreamReader(stream);
System.IO.File.WriteAllText(WriteHTML, reader.ReadToEnd());
}
Normally, I have to retrieve three cookies, like on the website :
Only, I can't navigate on the website, I end up on the login page, the cookies are not good, and that I'm in debug, I only loaded one cookie(BrowseSettings) out of the three(LLCookie & LLTZCookie) :
I don't understand why I can't retrieve all the cookies on the website.... If anyone has a solution!
I found the reason why I can't get all the cookies, even if I can't find exactly why it works by disabling redirection, in my StartConnection() method :
request.AllowAutoRedirect = true;
my question is, how can I call multiple requests via HttpWebRequest with same authenticate cookie in C#? I tried a lot of times but for now I dunno how to do it :/
My code is below:
var postData = "method=loginFormAccount&args[0][email]=###&args[0][pass]=###&args[0][cache]=37317&args[]=1";
var data = Encoding.ASCII.GetBytes(postData);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("###");
request.CookieContainer = new CookieContainer();
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.AllowAutoRedirect = true;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
var cookies = new CookieContainer();
cookies.Add(response.Cookies);
System.IO.File.WriteAllText(#desktop + "\\post.html", new StreamReader(response.GetResponseStream()).ReadToEnd());
// =================================== END LOGIN ==================================== \\
System.IO.File.WriteAllText(#desktop + "\\cookie.html","");
foreach (Cookie cook in response.Cookies)
{
using (System.IO.StreamWriter file = new System.IO.StreamWriter(#desktop + "\\cookie.html", true))
{
file.WriteLine(cook.ToString());
}
// Show the string representation of the cookie.
}
HttpWebRequest requestNext = (HttpWebRequest)WebRequest.Create("####");
requestNext.CookieContainer = cookies;
requestNext.Method = "GET";
HttpWebResponse responseNext = (HttpWebResponse)requestNext.GetResponse();
//var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
System.IO.File.WriteAllText(#desktop + "\\get.html", new StreamReader(responseNext.GetResponseStream()).ReadToEnd());
My main problem is that, cookie which I'm getting is the cookie BEFORE authenticate so I must to do something to get cookie AFTER authenticate.
Try this :
HttpWebRequest requestNext = (HttpWebRequest)WebRequest.Create("####");
requestNext.CookieContainer.Add(cookies);
I use this code to login:
CookieCollection cookies = new CookieCollection();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("example.com");
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(cookies);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
cookies = response.Cookies;
string getUrl = "example.com";
string postData = String.Format("my parameters");
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
getRequest.CookieContainer = new CookieContainer();
getRequest.CookieContainer.Add(cookies);
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version11;
getRequest.AllowAutoRedirect = true;
getRequest.ContentType = "application/x-www-form-urlencoded";
byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
Stream newStream = getRequest.GetRequestStream();
newStream.Write(byteArray, 0, byteArray.Length);
newStream.Close();
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
{
doc.LoadHtml(sr.ReadToEnd());
webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
}
then I want to use HtmlWeb (HtmlAgilityPack) or Webclient to parse the HTML to HtmlDocument(HtmlAgilityPack).
My problem is that when I use:
WebClient wc = new WebClient();
webBrowser1.DocumentText = wc.DownloadString(site);
or
doc = web.Load(site);
webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
The login disappear so i think I must somehow pass the cookies.. Any suggestions?
Check HtmlAgilityPack.HtmlDocument Cookies
Here is an example of what you're looking for (syntax not 100% tested, I just modified some class I usually use):
public class MyWebClient
{
//The cookies will be here.
private CookieContainer _cookies = new CookieContainer();
//In case you need to clear the cookies
public void ClearCookies() {
_cookies = new CookieContainer();
}
public HtmlDocument GetPage(string url) {
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
//Set more parameters here...
//...
//This is the important part.
request.CookieContainer = _cookies;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
var stream = response.GetResponseStream();
//When you get the response from the website, the cookies will be stored
//automatically in "_cookies".
using (var reader = new StreamReader(stream)) {
string html = reader.ReadToEnd();
var doc = new HtmlDocument();
doc.LoadHtml(html);
return doc;
}
}
}
Here is how you use it:
var client = new MyWebClient();
HtmlDocument doc = client.GetPage("http://somepage.com");
//This request will be sent with the cookies obtained from the page
doc = client.GetPage("http://somepage.com/another-page");
Note: If you also want to use POST method, just create a method similar to GetPage with the POST logic, refactor the class, etc.
There are some recommendations here: Using CookieContainer with WebClient class
However, it's probably just easier to keep using the HttpWebRequest and set the cookie in the CookieContainer:
HTTPWebRequest and CookieContainer
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
The code looks something like this:
// Create a HttpWebRequest
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(getUrl);
// Create the cookie container and add a cookie
request.CookieContainer = new CookieContainer();
// Add all the cookies
foreach (Cookie cookie in response.Cookies)
{
request.CookieContainer.Add(cookie);
}
The second thing is that you don't need to download the site again, since you already have it from your web response and you're saving it here:
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
{
webBrowser1.DocumentText = doc.DocumentNode.OuterHtml;
}
You should be able to just take the HTML and parse it with the HTML Agility Pack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);
And that should do it... :)
Try caching cookies from previous response locally and resend them each web request as follows:
private CookieCollection cookieCollection;
...
parserObject = new HtmlWeb
{
AutoDetectEncoding = true,
PreRequest = request =>
{
if (cookieCollection != null)
cookieCollection.Cast<Cookie>()
.ForEach(cookie => request.CookieContainer.Add(cookie));
return true;
},
PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};
I am trying to scrape content from this page: https://www.google.com/search?hl=en&biw=1920&bih=956&tbm=shop&q=Xenon+12640&oq=Xenon+12640&aq=f&gs_l=serp.3...3743.3743.0.3905.1.1.0.0.0.0.0.0..0.0.ekh..0.0.Hq3XS7AxFDU&sei=Dr_MT_WOM6nO2AWE25mTCA&gbv=2
The problem I am experiencing is that opening that url in a browser I get everything I need to scrape but scraping the same link in the code, two (important) pieces are missing, the reviews number and the ratings, below the price and the seller info.
Here is the screenshot from the internal web client in c#: http://gyazo.com/908a37c7f70712fba1f82ec90a604d4d.png?1338822369
Here is the code with which I am trying to get the content:
public string navGet(string inURL, CookieContainer inCookieContainer, bool GZip, string proxyAddress, int proxyPort,string proxyUserName, string proxyPassword)
{
try
{
this.currentUrl = inURL;
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(inURL);
webRequest.Timeout = this.TimeOutSetting;
webRequest.CookieContainer = inCookieContainer;
if (proxyAddress == "0" || proxyPort == 0)
{ }
else
{
webRequest.Proxy = new WebProxy(proxyAddress, proxyPort);
// Use login credentials to access proxy
NetworkCredential networkCredential = new NetworkCredential(proxyUserName, proxyPassword);
webRequest.Proxy.Credentials = networkCredential;
}
Uri destination = webRequest.Address;
webRequest.KeepAlive = true;
webRequest.Method = "GET";
webRequest.Accept = "*/*";
webRequest.Headers.Add("Accept-Language", "en-us");
if (GZip)
{
webRequest.Headers.Add("Accept-Encoding", "gzip, deflate");
}
webRequest.AllowAutoRedirect = true;
webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
webRequest.ContentType = "text/xml";
//webRequest.CookieContainer.Add(inCookieContainer.GetCookies(destination));
try
{
string strSessionID = inCookieContainer.GetCookies(destination)["PHPSESSID"].Value;
webRequest.Headers.Add("Cookie", "USER_OK=1;PHPSESSID=" + strSessionID);
}
catch (Exception ex2)
{
}
HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
if (webRequest.HaveResponse)
{
// First handle cookies
foreach(Cookie retCookie in webResponse.Cookies)
{
bool cookieFound = false;
foreach(Cookie oldCookie in inCookieContainer.GetCookies(destination))
{
if (retCookie.Name.Equals(oldCookie.Name))
{
oldCookie.Value = retCookie.Value;
cookieFound = true;
}
}
if (!cookieFound)
inCookieContainer.Add(retCookie);
}
// Read response
Stream responseStream = responseStream = webResponse.GetResponseStream();
if (webResponse.ContentEncoding.ToLower().Contains("gzip"))
{
responseStream = new GZipStream(responseStream, CompressionMode.Decompress);
}
else if (webResponse.ContentEncoding.ToLower().Contains("deflate"))
{
responseStream = new DeflateStream(responseStream, CompressionMode.Decompress);
}
StreamReader stream = new StreamReader(responseStream, System.Text.Encoding.Default);
string responseString = stream.ReadToEnd();
stream.Close();
this.currentUrl = webResponse.ResponseUri.ToString();
this.currentAddress = webRequest.Address.ToString();
setViewState(responseString);
return responseString;
}
throw new Exception("No response received from host.");
return "An error was encountered";
}
catch(Exception ex)
{
//MessageBox.Show("NavGet:" + ex.Message);
return ex.Message;
}
}
Looks like it happens because the reviews number and the ratings are generated dynamically using Java Script (probably AJAX or something else). In this case you need to analyze additional traffic that takes place when the page is loaded in the browser and find where this data is transfered or analize JavaScript code to see how it's generated.
I'm trying to write a bit of code to login to a website. But it's not working. Please can you give me some advice. This is my a bit of code:
static void Main(string[] args)
{
CookieContainer container = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://pagehype.com/login.php");
request.Method = "POST";
request.Timeout = 10000;
request.ReadWriteTimeout = 30000;
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729) (Prevx 3.0.5)";
request.CookieContainer = container;
ASCIIEncoding encoding = new ASCIIEncoding();
string postData = "username=user&password=password&processlogin=1&return=";
byte[] data = encoding.GetBytes(postData);
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
Stream newStream = request.GetRequestStream();
newStream.Write(data, 0, data.Length);
newStream.Close();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
string htmldoc = reader.ReadToEnd();
response.Close();
Console.Write(htmldoc);
}
Many thanks,
Use http://www.fiddler2.com/fiddler2/ to view the http request sent went you login in using a browser and ensure that the request you build in code is the same.
PHP logins use PHPSESSID cookie. You'll need to capture this and pass it back in the CookieContainer. This is how the server will recognise you as an authenticated user.
The cookie is set in the Set-Cookie header in the initial response. You'll need to parse it to recreate the cookie in your container (don't forget the path (and domain?)
var setCookie = response.GetResponseHeader("Set-Cookie");
response.Close();
container = new CookieContainer();
foreach (var cookie in setCookie.Split(','))
{
var split = cookie.Split(';');
string name = split[0].Split('=')[0];
string value = split[0].Split('=')[1];
var c = new Cookie(name, value);
if (cookie.Contains(" Domain="))
c.Domain = split.Where(x => x.StartsWith(" Domain")).First().Split('=')[1];
else
{
c.Domain = ".pagehype.com";
}
if (cookie.Contains(" Path="))
c.Path = split.Where(x => x.StartsWith(" Path")).First().Split('=')[1];
container.Add(c);
}
Then add this container to your request.