I'm writing a crawler to fetch the content of a website, but I have some doubts about the following:
There is one URL, which I debugged with Fiddler, where I need to set some values (set/get the session ID, supply the dates, ...) through GET parameters.
Then there is a second URL, requested with POST, which uses the cookie set by the first URL to produce the page content for the dates given above.
In C#, what I did was: first, I request the first URL and parse the ID; second, I send the ID to get the session ID (PHPSESSID); third, I pass the parameters with the dates; fourth, I request the final URL to get the content. In the last step, however, it warns me that the date input format might not be correct. I tried many date formats but still got no results.
Is there any relation between those requests that I am missing, given that I made them separately, in order to get the content of the page? I use the same PHPSESSID for each HttpWebRequest.
I need to redirect to a URL that is sent to my application as a query string, and that URL itself contains another URL as a query string. This can be nested three or four levels deep.
Suppose my application is accessible using example.com, and I have a request like:
http://example.com/?landing={http://example2.com/?landing={http://example3.com/?landing={http://google.com}&param=4}&param=3}&param=1
'{' and '}' are used to increase readability and do not exist in the urls actually.
I have to redirect to
http://example2.com/?landing={http://example3.com/?landing={http://google.com}&param=4}&param=3
example2 has to redirect to
http://example3.com/?landing={http://google.com}&param=4
and example3 has to redirect to http://google.com
I don't know how to send the 'landing' query strings so that each one is passed to the corresponding address to be consumed, or how to read them correctly in my application or on example2.com and example3.com.
I need your help, thanks.
I solved the problem by encoding (URL-encoding or Base64-encoding) the query strings at the source and sending the user to the corresponding page. On that page the query string is decoded and the user is redirected to the URL it contains; this is repeated until the last landing page.
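The encode-then-decode approach described above could look roughly like the following sketch. The class and method names (LandingChain, Wrap, ExtractLanding) are made up for illustration, and URL-encoding via Uri.EscapeDataString is assumed rather than Base64. Each nesting level is escaped once more, so the inner '&' and '?' characters never collide with the outer query string.

```csharp
using System;

public static class LandingChain
{
    // Hypothetical helper: puts `inner` (already a complete URL) into the
    // landing parameter of `baseUrl`, escaped so its own '?' and '&' are hidden,
    // then appends the remaining parameters of this level.
    public static string Wrap(string baseUrl, string inner, string extraQuery)
    {
        return baseUrl + "?landing=" + Uri.EscapeDataString(inner) + "&" + extraQuery;
    }

    // Hypothetical helper: reads the landing parameter and decodes exactly one level.
    public static string ExtractLanding(string url)
    {
        int q = url.IndexOf('?');
        if (q < 0) return null;
        foreach (string pair in url.Substring(q + 1).Split('&'))
        {
            if (pair.StartsWith("landing=", StringComparison.Ordinal))
                return Uri.UnescapeDataString(pair.Substring("landing=".Length));
        }
        return null;
    }
}
```

Inside an ASP.NET page, Request.QueryString["landing"] already returns the value decoded by one level, so ExtractLanding is only needed when working with a raw URL string.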
I have an issue. I want to crawl links from a site (sample: www.x.com/date/counter of news).
Now, my approach is:
1- I have the latest link stored in my database, like below:
www.x.com/2015/01/12/99901
2- I get the newest link from the site, like below:
www.x.com/2015/01/13/99905
3- I want to loop over 99901 ~ 99905 to generate the links between the two links above, like below:
www.x.com/2015/01/12/99901
www.x.com/2015/01/_( I don't know this day is /12 or /13 )_/99902
www.x.com/2015/01/_( I don't know this day is /12 or /13 )_/99903
www.x.com/2015/01/_( I don't know this day is /12 or /13 )_/99904
www.x.com/2015/01/13/99905
Now, how can I know when the day in the date changes?
You should first check what the response is for a non-existent page (e.g. 01/12/99999). Then loop starting with the "first" day and check the response; if you get the same response as for the non-existent page, add 1 to the day and repeat until you receive the expected response.
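A minimal sketch of that probing loop follows. The names LinkResolver and Resolve are invented for this example, and pageExists is a hypothetical callback that should return false when the site answers with its "non-existent page" response; how that response is detected depends on the site and is left to the caller.

```csharp
using System;
using System.Globalization;

public static class LinkResolver
{
    // Tries each day from firstDay to lastDay (inclusive) for the given counter
    // and returns the first URL for which pageExists reports a real page.
    // Returns null if no day in the range matches.
    public static string Resolve(string baseUrl, DateTime firstDay, DateTime lastDay,
                                 int counter, Func<string, bool> pageExists)
    {
        for (DateTime day = firstDay.Date; day <= lastDay.Date; day = day.AddDays(1))
        {
            string url = string.Format(CultureInfo.InvariantCulture,
                "{0}/{1:yyyy}/{1:MM}/{1:dd}/{2}", baseUrl, day, counter);
            if (pageExists(url)) return url;
        }
        return null;
    }
}
```

Because the counters only increase, the day found for one counter can be passed as firstDay for the next one, which keeps the number of probe requests small.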
I am trying to use slugs in an MVC web application but can't seem to work out the best way to implement them.
I have found the recommendation on how to create a URL-friendly slug in the Stack Overflow slug post.
I still want to be able to query the DB with the ID, but I don't want it to be in the URL the way it is in most Stack Overflow URLs, for example
http://website/home/list/outdoor-products
How can a slug be displayed in the URL while still passing and using an ID to be used to query with?
It doesn't really depend on the technology or framework you are using; the main thing is that you have to have distinct URLs to unambiguously select page content.
If you do have unique titles/slugs for pages, then you may use them as the identity for content selection. Otherwise, you need to put some sort of ID (it could be an int or a GUID, whatever) into your URLs. There isn't anything that will hide your int ID behind the slug.
Talking about Stack Overflow's URLs, you'll find the ID just before the friendly title. Another option is to put the actual ID at the end of the friendly title (friendly-title-1559063).
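The "ID at the end of the friendly title" option can be sketched as below. SlugUrl, Build and TryParseId are hypothetical names, and the sketch assumes the ID is a non-negative int separated from the slug by a hyphen.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugUrl
{
    // Builds a URL segment such as "friendly-title-1559063".
    public static string Build(string slug, int id)
    {
        return slug + "-" + id;
    }

    // Reads the trailing numeric ID back out of the segment.
    // Returns false when the segment does not end in "-<digits>".
    public static bool TryParseId(string segment, out int id)
    {
        id = 0;
        Match m = Regex.Match(segment, "-([0-9]+)$");
        return m.Success && int.TryParse(m.Groups[1].Value, out id);
    }
}
```

The controller action would then query by the parsed ID and treat the rest of the segment as decoration, so a renamed title does not break old links.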
The question basically boils down to these two C# 2.0, ASP.NET 2.0 web pages:
viewtemplate.aspx
generatetemplate.aspx
Purpose of these:
viewtemplate.aspx - Displays the email template defined in 'generatetemplate.aspx', with client-assigned data pulled from the database.
generatetemplate.aspx - Is the actual page that contains placeholders for the client's data.
[I named it so because that's the file from which I will generate the email to be sent]
Requirement:
I will request generatetemplate.aspx from viewtemplate.aspx, get the rendered output of generatetemplate.aspx, and then send that output as an email to the recipients.
It is the rendering part that I don't know how to do.
Note:
I will call generatetemplate.aspx from viewtemplate.aspx with a query string, so that generatetemplate.aspx pulls the values from the database and then renders, rather than rendering with default values.
You wish to get the rendered HTML output of running the page? You can download it with an HTTP request, like a browser would, using the WebClient class. The address has to be absolute, so resolve it against the current request URL:
string generated = new WebClient().DownloadString(new Uri(Request.Url, "generatetemplate.aspx?myparams=params"));
"generated" will then contain the rendered output, which you can do whatever you like with.
If I got the question right, this looks a bit dodgy.
I've used XSL + XML for such cases. You just prepare the data in XML format, then apply the XSL layout, and that's it.
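As a rough illustration of the XSL + XML idea, the sketch below applies a stylesheet to an XML string with XslCompiledTransform from System.Xml.Xsl (available since .NET 2.0). The class name TemplateRenderer and the method Render are invented for this example; the XML element names used when calling it are up to you.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TemplateRenderer
{
    // Applies the XSLT stylesheet text to the XML data text and returns
    // the transformed output (for example, the HTML body of an email).
    public static string Render(string xml, string xslt)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml)))
        using (StringWriter output = new StringWriter())
        {
            // No XSLT arguments are passed in this sketch, hence the null.
            transform.Transform(xmlReader, null, output);
            return output.ToString();
        }
    }
}
```

With this approach the email body is produced without requesting a second page: the client data is serialized to XML and the stylesheet plays the role of the template.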
I'm stuck in a situation where I need to share values between pages. I want to share a value from the code-behind with little or no JavaScript. I already have a question here on SO, but using JS. I still didn't get any result, so I am asking about another approach.
So I want to know whether I can pass any .NET object in the query string, so that I can unbox it on the other end conveniently.
Update
Or is there any JavaScript approach, such as passing it to a window's modal dialog or something like that?
What I am doing
What I was doing is this: on my parent page load, I extract the properties from my class, which holds values fetched from the DB, and put them in Session["mySession"]. Something like this:
Session["mySession"] = myClass.myStatus; // myStatus is a List<int>
Now, on one of my events, a checkbox click event on the client side, I open a popup, and on its page load I extract the list and fill the checkbox list on the child page.
From here the user can modify the selection and close this page. Closing is done via a button called Save, on which I iterate through the checked items and store them in Session["mySession"] again.
But the problem is that whenever I click the radio button again to view the updated values, it displays the previous ones. That is, if the total count of the list from the DB is 3 and after modification it is 1, after reopening it still displays 3 instead of 1.
Yes, you could but you would have to serialize that value so that it could be encoded as a string. I think a much better approach would be to put the object in session rather than on the URL.
I would do something like this.
var stringNumbers = intNumbers.Select(i => i.ToString()).ToArray();
var qsValue = string.Join(",", stringNumbers);
Response.Redirect("Page.aspx?numbers=" + qsValue);
Keep in mind that if there are too many numbers the query string is not the best option. Also remember that anyone can see the query string so if this data needs to be secure do not use the query string. Keep in mind the suggestions of other posters.
Note
If you are using .NET 4 you can simplify the above code:
var qsValue = string.Join(",", intNumbers);
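For completeness, the receiving page has to turn the comma-separated value back into a list. The sketch below shows one possible round trip; NumberListQuery, ToQueryValue and FromQueryValue are hypothetical names, and parts that are not valid integers are simply skipped, which is an assumption about how tampered input should be handled.

```csharp
using System;
using System.Collections.Generic;

public static class NumberListQuery
{
    // Sending side (.NET 4 or later): 3, 1, 2 becomes "3,1,2".
    public static string ToQueryValue(IEnumerable<int> numbers)
    {
        return string.Join(",", numbers);
    }

    // Receiving side: parses the value read from the query string.
    // An empty or missing value gives an empty list.
    public static List<int> FromQueryValue(string value)
    {
        List<int> result = new List<int>();
        if (string.IsNullOrEmpty(value)) return result;
        foreach (string part in value.Split(','))
        {
            int n;
            if (int.TryParse(part, out n)) result.Add(n);
        }
        return result;
    }
}
```

On the target page this would be called as NumberListQuery.FromQueryValue(Request.QueryString["numbers"]).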
Make the object serializable and store it in an out-of-process session.
All pages on your web application will then be able to access the object.
You could serialize it and make it printable, but you shouldn't.
Really, you shouldn't.
The specification does not dictate a minimum or maximum URL length, but implementation varies by browser and version. For example, Internet Explorer does not support URLs that have more than 2083 characters.[6][7] There is no limit on the number of parameters in a URL; only the raw (as opposed to URL encoded) character length of the URL matters. Web servers may also impose limits on the length of the query string, depending on how the URL and query string is stored. If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.
I would probably use a cookie to store the object.