Crawl links from a site - C#

I have one issue. I want to crawl links from a site whose URLs have the form www.x.com/date/news-counter.
My solution so far:
1. I have the latest link stored in my database, like this:
www.x.com/2015/01/12/99901
2. I get the newest link from the site, like this:
www.x.com/2015/01/13/99905
3. I want to loop from 99901 to 99905 to generate the links in between, like this:
www.x.com/2015/01/12/99901
www.x.com/2015/01/(12 or 13? I don't know which)/99902
www.x.com/2015/01/(12 or 13? I don't know which)/99903
www.x.com/2015/01/(12 or 13? I don't know which)/99904
www.x.com/2015/01/13/99905
How can I know when the day part of the date changes?

You should first check what the response is for a non-existing page (e.g. 01/12/99999). Then loop over the counters starting with the "first" day and check the response for each one: if you get that same "non-existing" response, add 1 to the day and repeat until you receive the expected response.
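A minimal sketch of that probing approach, assuming the site answers non-existing articles with a non-success status such as 404 (the URL pattern and counter range come from the question; everything else is an assumption):

using System;
using System.Globalization;
using System.Net.Http;

class LinkProbe
{
    static readonly HttpClient client = new HttpClient();

    // True if an article exists for this day/counter combination.
    // Assumes missing articles return a non-success status code.
    static bool Exists(DateTime day, int counter)
    {
        string url = "http://www.x.com/" + day.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture) + "/" + counter;
        using (var response = client.GetAsync(url).Result)
            return response.IsSuccessStatusCode;
    }

    static void Main()
    {
        var day = new DateTime(2015, 1, 12); // day of the last stored link
        for (int counter = 99901; counter <= 99905; counter++)
        {
            // Counter not found on the current day: the day must have rolled over.
            while (!Exists(day, counter))
                day = day.AddDays(1);
            Console.WriteLine("http://www.x.com/" + day.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture) + "/" + counter);
        }
    }
}

Note there is no guard here against a counter that was never published; a real crawler would cap how many days it probes forward.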

Related

Get entire URL from Request.Headers["Referer"]?

I'm using the TableFilter javascript library in a .NET 6.0 project (C#). When I've filtered out some data I can click on an item to edit it. After I've edited it I would like to return to the previous page, with the same filters applied.
When I come to the Edit page I can use Request.Headers["Referer"].ToString() to catch the referring URL. But it always only contains the URL up until the last bit, which tells TableFilter which filters to apply.
An example. If this is the URL when I've filtered:
https://localhost:7290/foo?a=2361ded1-b9bc-4007-840b-e5ddb864002b#%7B%22col_1%22%3A%7B%22flt%22%3A%22Monitor%22%7D%7D
Request.Headers["Referer"].ToString() just contains everyting up until the #:
https://localhost:7290/foo?a=2361ded1-b9bc-4007-840b-e5ddb864002b
Anyone know of the way for me to catch the last part as well?
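(For context: browsers never send the fragment, the part after #, to the server at all, so it cannot appear in the Referer header. The usual workaround is to have client script copy location.hash into a query parameter before navigating. A hedged sketch of the receiving side in ASP.NET Core, where the returnHash parameter name is an assumption:)

using System;
using Microsoft.AspNetCore.Mvc.RazorPages;

public class EditModel : PageModel
{
    public string BackUrl { get; private set; } = "";

    // "returnHash" is assumed to be appended by client script, e.g.
    // ?returnHash=<encodeURIComponent(location.hash)>; the fragment itself
    // never reaches the server in any header.
    public void OnGet(string? returnHash)
    {
        var referer = Request.Headers["Referer"].ToString(); // stops before '#'
        BackUrl = string.IsNullOrEmpty(returnHash)
            ? referer
            : referer + Uri.UnescapeDataString(returnHash);
    }
}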

Building a crawler to get the content of page

I'm writing a crawler to get the content of a website, but I have some doubts, as follows:
There is one URL (found by debugging with Fiddler) in which I need to set some values (set/get the session ID, put in the dates, ...) as GET parameters.
Then there is another URL, requested with POST, which uses the cookie set by the URL above in order to produce the content of the page for the dates given above.
What I did in C# was: first, request the first URL and parse the ID; second, set the ID and get the session ID (PHPSESSID); third, supply the parameters with the dates; fourth, request the final URL to get the content. But in the last step it warns me that the date input format might not be correct, and I tried many date formats but still got no results.
Is there any relation between those URLs, given that I requested them separately in order to get the content of the page? I use the same PHPSESSID for each HttpWebRequest.
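One common pitfall with this kind of flow is carrying PHPSESSID by hand instead of letting a shared CookieContainer pass it between requests. A rough sketch of that pattern; the URLs, field names and date format are placeholders, since the real site isn't shown, and the exact POST body should be copied from what Fiddler records:

using System;
using System.IO;
using System.Net;
using System.Text;

class SessionCrawler
{
    static void Main()
    {
        // One container shared by every request, so PHPSESSID (and any other
        // cookies the server sets) travel along automatically.
        var cookies = new CookieContainer();

        // Step 1: GET the first URL; the server sets PHPSESSID here.
        var first = (HttpWebRequest)WebRequest.Create("http://example.com/search?date=2015-01-12"); // placeholder URL
        first.CookieContainer = cookies;
        using (var response = (HttpWebResponse)first.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd(); // parse any IDs you need from this
        }

        // Step 2: POST to the second URL with the same container. The date
        // format must match exactly what the site's own form sends.
        var second = (HttpWebRequest)WebRequest.Create("http://example.com/results"); // placeholder URL
        second.Method = "POST";
        second.ContentType = "application/x-www-form-urlencoded";
        second.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes("dateFrom=12.01.2015&dateTo=13.01.2015"); // placeholder fields
        using (var stream = second.GetRequestStream())
            stream.Write(body, 0, body.Length);
        using (var response = (HttpWebResponse)second.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}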

Add unique ID to a session

Hello, and thanks for reading this; I could really use some help ;)
On my site you play a puzzle after you have entered your information, and whether or not you complete the game before the time runs out, your score is posted in the GridView below.
I can't post an image, but you can check the site if you want a good view of what I'm talking about: Website
Everything is working right now: you can play it, and as you can see, the time is shown. In the database the information is stored in, every row has a unique ID.
Here is my question: when someone hits "start spillet" and the information is added to the database, how can I get the unique ID that has just been created and store it in a session, so I can use it later?
(Here is a row from my database:)
ID   NAME     EMAIL                      COLLEGE     CLASS/TEAM   TIME
114  Carsten  TESTUSER#mediacollege.dk   Technology  h0dt100413   54
public string generateID()
{
    return Guid.NewGuid().ToString("N"); // "N" = 32 hex digits, no hyphens
}

Using this function you will get a unique ID, and you can use it in a session.
It will be different every time.
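If what you need is the database-generated row ID rather than a GUID, another common pattern (for SQL Server) is to ask for the identity value in the INSERT itself and put it straight into the session. A hedged sketch for a WebForms code-behind; the table, columns and connection string are made up:

using System.Data.SqlClient;

// Call this from the "start spillet" click handler.
// Table/column names are assumptions; replace with your own schema.
public int InsertPlayer(string name, string email)
{
    using (var conn = new SqlConnection("<your connection string>"))
    using (var cmd = new SqlCommand(
        "INSERT INTO Players (Name, Email) VALUES (@name, @email); " +
        "SELECT CAST(SCOPE_IDENTITY() AS int);", conn))
    {
        cmd.Parameters.AddWithValue("@name", name);
        cmd.Parameters.AddWithValue("@email", email);
        conn.Open();
        int newId = (int)cmd.ExecuteScalar();
        Session["PlayerId"] = newId; // retrieve later with (int)Session["PlayerId"]
        return newId;
    }
}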

Query a website and retrieve public data from it

I am really new to C# programming and would like some help from you guys, if possible. I have a (shopping) website with data: products, prices, descriptions, etc. Since the website has a search capability, I would like to get the data from it by querying the search link and extracting only the important data (product ID, name, price and description). When I perform a search I get many pages, and every time I press Next I get a new page with an extra list of products. How can I simply automate these tasks?
I searched a lot over the internet and found that I need to use WebClient() with regular expressions, and I thought that maybe a loop over the page content and over the search result pages would be necessary.
What do you think, guys?
Website Example.
I'll appreciate any effort from your side.
What you're describing is called scraping.
What you'll want is to use something like HtmlAgilityPack to get the website. Then you find the nodes you're interested in by using the DOM, and reading their inner text.
The whole process is rather complicated, but at least this sends you off in the right direction. For the most part, search URLs tend to have the same format.
In your link for instance
http://cdon.se/hemelektronik/advanced-search?manufacturer-id=&title=.&title-matchtype=1&genre-id=&page-size=15&sort-order=142&page=2
You can change 'page' to something else, and you can go through all the pages that way.
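A rough sketch of that loop with HtmlAgilityPack; the XPath and the page limit are assumptions, since they depend entirely on the real page's HTML:

using System;
using HtmlAgilityPack;

class Scraper
{
    static void Main()
    {
        var web = new HtmlWeb();
        for (int page = 1; page <= 10; page++) // or loop until a page comes back empty
        {
            string url = "http://cdon.se/hemelektronik/advanced-search?page-size=15&page=" + page;
            HtmlDocument doc = web.Load(url);

            // Placeholder XPath; inspect the real page to find the element
            // that wraps each product, then read the fields you need from it.
            var products = doc.DocumentNode.SelectNodes("//div[@class='product']");
            if (products == null) break; // no matches: assume we ran out of pages

            foreach (HtmlNode product in products)
                Console.WriteLine(product.InnerText.Trim());
        }
    }
}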
Added:
Also don't TRY to use regex to parse html. It drove one particular person mad...
RegEx match open tags except XHTML self-contained tags

Can I pass a .net Object via querystring?

I'm stuck in a situation where I need to share values between pages. I want to share a value from code-behind with little or no JavaScript. I already have a question about this here on SO using JS; since I still haven't got any result, I'm asking about another approach.
So I want to know: can I pass any .NET object in the query string, so that I can unbox it conveniently on the other end?
Update
Or is there any JavaScript approach, e.g. passing it to a modal dialog window, or something like that?
What I am doing
On my parent page's load, I extract the properties from my class that holds values fetched from the DB and put them in Session["mySession"]. Something like this:
Session["mySession"] = myClass.myStatus; // which is a List<int>
Now, on a checkbox click event from the client side, I open a popup, and on its page load I extract the list and fill the checkbox list on the child page.
From there the user can modify the selection and close the page. Closing is done via a button called Save, in which I iterate through the checked items and put them back into Session["mySession"].
But here is the problem: whenever I click the radio button again to view the updated values, it displays the previous ones. That is, if the total count of the list is 3 from the DB and after modification it is 1, reopening it still displays 3 instead of 1.
Yes, you could, but you would have to serialize that value so that it can be encoded as a string. I think a much better approach would be to put the object in session rather than on the URL.
I would do something like this.
var stringNumbers = intNumbers.Select(i => i.ToString()).ToArray();
var qsValue = string.Join(",", stringNumbers);
Response.Redirect("Page.aspx?numbers=" + qsValue);
Keep in mind that if there are too many numbers, the query string is not the best option. Also remember that anyone can see the query string, so if this data needs to be secure, do not use it. Keep in mind the suggestions of the other posters.
Note
If you are using .NET 4 you can simplify the above code:
var qsValue = string.Join(",", intNumbers);
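On the receiving page you can rebuild the list from the query string (the numbers parameter name matches the redirect above; requires System.Linq):

var numbers = Request.QueryString["numbers"]
    .Split(',')
    .Select(int.Parse)
    .ToList(); // back to a List<int>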
Make the object serializable and store it in an out-of-process session.
All pages on your web application will then be able to access the object.
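A minimal sketch of what "serializable" means here (the class name and contents are made up); out-of-process session modes such as StateServer or SQLServer binary-serialize whatever you store, so the type must be marked accordingly:

[Serializable]
public class FilterState
{
    public List<int> Status { get; set; }
}

// Stored exactly as before; works across pages and worker processes.
Session["mySession"] = new FilterState { Status = new List<int> { 1, 2, 3 } };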
you could serialize it and make it printable but you shouldn't
really, you shouldn't
The specification does not dictate a minimum or maximum URL length, but implementation varies by browser and version. For example, Internet Explorer does not support URLs that have more than 2083 characters.[6][7] There is no limit on the number of parameters in a URL; only the raw (as opposed to URL encoded) character length of the URL matters. Web servers may also impose limits on the length of the query string, depending on how the URL and query string is stored. If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.
I would probably use a cookie to store the object.
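If you go the cookie route, the object still has to become a string first; a hedged sketch using JSON via Newtonsoft.Json's JsonConvert (cookies are limited to roughly 4 KB and are visible to the client, so this only suits small, non-sensitive data):

using System.Collections.Generic;
using System.Web;
using Newtonsoft.Json;

// Write (e.g. in the parent page's code-behind):
var cookie = new HttpCookie("mySelection", JsonConvert.SerializeObject(myStatus));
Response.Cookies.Add(cookie);

// Read on the next page:
var raw = Request.Cookies["mySelection"];
var status = raw == null
    ? new List<int>()
    : JsonConvert.DeserializeObject<List<int>>(raw.Value);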
