How to detect that URLs navigate to the same webpage - C#

I'm working with the WebBrowser control, trying to build my own browser.
Something I'm having trouble with is the history part.
When the document finishes navigating, I search my database: if its URL doesn't exist I add it to the history, otherwise I just increase the "counter" for this page in the database.
The problem is that some pages give me a different URL each time even though it's the same page, such as google.com. The first time I navigate to it, it gives me (for example): https://www.google.co.il/?gws_rd=cr&ei=eBP-UtPCOMi84ASukoCAAw
The second time I navigate:
https://www.google.co.il/?gws_rd=cr&ei=rhP-UpW6CYG54ATAqIHIDg
Is there a way to identify that both of these URLs lead to the same page?
I'm trying to do this because when I load the history into my application, many URLs are loaded that lead to the same page.
Any help is appreciated, thanks in advance.

You can use the Uri class and ask for its AbsolutePath property, which ignores the query string.
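A minimal sketch of that comparison, using the two Google URLs from the question:

var a = new Uri("https://www.google.co.il/?gws_rd=cr&ei=eBP-UtPCOMi84ASukoCAAw");
var b = new Uri("https://www.google.co.il/?gws_rd=cr&ei=rhP-UpW6CYG54ATAqIHIDg");

// Host and AbsolutePath ignore the query string, so both URLs compare equal here.
bool samePage = a.Host == b.Host && a.AbsolutePath == b.AbsolutePath; // true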

I personally would expect my browser to keep history by URL and not by content (which is what you're actually trying to do, as far as I understand). But if you want to avoid these multiple entries, you could calculate a hash code for the content received from each page and increase your counter when the hash already exists.
The problem is that you cannot know what the server will do with a URL. It might serve the same page today and a different one tomorrow. I also wouldn't just use the URL without its parameters, because on other pages a parameter might make a really important difference.
Another note: in case you hash the content, you might want to exclude things like 404 pages (which can occur under different URLs and shouldn't be grouped under the same hash).
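A minimal sketch of the hashing idea, assuming you already have the page's HTML as a string (for example from the WebBrowser control's DocumentText property):

using System;
using System.Security.Cryptography;
using System.Text;

// Hash the received content so identical pages map to the same history key,
// no matter which URL produced them.
static string HashContent(string html)
{
    using (var sha = SHA256.Create())
    {
        byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
        return BitConverter.ToString(digest).Replace("-", "");
    }
}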

Hiding Querystring in ASP.NET 2.0

Our site consists of 3 main pages we call "Start.aspx" and then a content iframe inside of that where the user does nearly all of the site interactions.
Recently though, I've had to implement functionality that will jump between Start.aspx pages in different products and automatically change the content iframe to a specified page.
The actual functionality works just fine, but the issue we're having is that the full querystring is exposed. Because we load all pages in the content iframe, the page URL remains at "Product/Start.aspx" during regular site usage.
However, this new functionality is passing a querystring to Start.aspx (which has appropriate parsers to load the requested page in the content iframe), and we need that URL to remain as "Start.aspx".
So far, I've looked into URL Rewriting, which was throwing errors because the landing page for each product is "[Product]/Start.aspx". I've looked at a different URL Rewriting solution, as well as ScottGu's blog post on routing.
The issue is that these solutions seem to be aimed at simplifying navigation, such as taking "Blogpost.aspx?Year=2013&Month=07&Day=15" and turning it into "Blogpost.aspx/2013/07/15", which really isn't what we're going for. We're not trying to simplify navigation via the URL; we're really just trying to completely hide our querystrings.
What we're going for is turning "[Product]/Start.aspx?frame=Company.aspx?id=1570" into "[Product]/Start.aspx" once the content iframe has what it needs from the initial querystring. We don't need to account for every single page; we just need that to be the overarching rule. 90% of the time it won't be an issue, as most of the work being done doesn't jump from product to product without the user just switching products (which is done in a fashion that specifically uses Response.Redirect("[Product]/Start.aspx")).
Once the content iframe has loaded from the querystring parameters, we don't need them anymore for anything. The rest of the functionality runs through the iframe without any issue.
Am I overthinking this, or am I asking for something that's not really feasible?
As far as literally "removing all of the query string characters" and still being able to pass the querystring values to another page, I do not think that is possible, unless you do it in a Session variable or something like that.
If you're simply worried about sensitive data being displayed in plain text in the query string, there is the option of "encrypting" the query string:
http://www.codeproject.com/Articles/33350/Encrypting-Query-Strings
The query string will still show, but it will be "Product/Start.aspx?e0ayfefae0y0someencryptedmess108yfe0ayf0a". The page that receives the query string would decrypt it, so the functionality of the query string is there, but the values are not known to the end user.
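A minimal sketch of the idea (this is not the implementation from the linked article; the hard-coded key and IV are for illustration only, and Aes.Create() needs .NET 3.5+, so on older frameworks you'd use RijndaelManaged):

using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;

static class QueryStringProtector
{
    // Illustrative key material; real code would load a secret key from configuration.
    static readonly byte[] Key = Encoding.UTF8.GetBytes("0123456789ABCDEF0123456789ABCDEF"); // 32 bytes
    static readonly byte[] IV  = Encoding.UTF8.GetBytes("0123456789ABCDEF");                 // 16 bytes

    public static string Encrypt(string plainQuery)
    {
        byte[] data = Encoding.UTF8.GetBytes(plainQuery);
        using (var aes = Aes.Create())
        {
            aes.Key = Key;
            aes.IV = IV;
            byte[] cipher = aes.CreateEncryptor().TransformFinalBlock(data, 0, data.Length);
            return HttpUtility.UrlEncode(Convert.ToBase64String(cipher)); // URL-safe
        }
    }

    public static string Decrypt(string encryptedQuery)
    {
        byte[] cipher = Convert.FromBase64String(HttpUtility.UrlDecode(encryptedQuery));
        using (var aes = Aes.Create())
        {
            aes.Key = Key;
            aes.IV = IV;
            byte[] plain = aes.CreateDecryptor().TransformFinalBlock(cipher, 0, cipher.Length);
            return Encoding.UTF8.GetString(plain);
        }
    }
}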
Since you've tagged this as an ASP.NET question, I'd say the way to go is to keep navigation data in your Session variables.
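A minimal sketch of the Session approach (the "framePage" key and the iframe ID are illustrative):

// On the page that jumps between products: stash the iframe target in
// the Session instead of the query string, then redirect cleanly.
Session["framePage"] = "Company.aspx?id=1570";
Response.Redirect("~/Product/Start.aspx");

// In Start.aspx's code-behind: consume the value on load and clear it,
// so a plain visit to Start.aspx behaves normally.
protected void Page_Load(object sender, EventArgs e)
{
    var framePage = Session["framePage"] as string;
    if (framePage != null)
    {
        Session.Remove("framePage");
        contentFrame.Attributes["src"] = framePage; // contentFrame: the iframe, marked runat="server"
    }
}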
Can you use a POST instead of a GET? That way, the data is in the form, rather than the Query String.
As a side note, hiding the parameters as a way of making the URL look nicer and be bookmarkable is fine. If you're doing it for any kind of security reason, it's very shallow security. It's trivial for a user to see what's being passed both in the form and on the query string, and to change and repost those. Security needs to be handled primarily on the server side.

Caching issue with Firefox

I am working on an ASP.NET MVC 4 app that fetches data continuously, and my problem is related to caching.
The problem is that when I click on a particular link in my application it works fine, but sometimes it automatically redirects to the INDEX page, which is the default page.
I searched around and found that Firefox caches every link. Sometimes something weird happens: it redirects a particular link to the INDEX page (301 Moved Permanently) and also stores that in the cache, so that now every time I click on that link it always redirects me to the INDEX page that's been cached.
So now I have to clear the cache in my browser every time I face this problem.
How can I make it not automatically redirect to the cached INDEX page?
You should really expand on what exactly is happening at that particular link you mention, because it should not 301 redirect unless you're telling it to.
Also, you say "I fetch data continuously." What does this mean to us? Why is this important to know? Does it change the link or the data? Are you 404ing the older data or something? That could possibly explain why you 301 back to your index.
Now, with the limited information you have given us: if you want to prevent Firefox from caching your URLs/redirects, simply give your URL a query string that updates with each request, like a timestamp.
For example: http://example.com/return-data.asp?timestamp=1350668920
Then, each time you fetch data, update the page's link accordingly.
For example: http://example.com/return-data.asp?timestamp=1350669084
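A minimal sketch of generating such a cache-busting URL on the server (the endpoint is the hypothetical one from the examples above):

// A Unix timestamp makes each generated URL unique, so Firefox cannot
// reuse a cached response (or a cached 301) for it.
long timestamp = (long)(DateTime.UtcNow - new DateTime(1970, 1, 1)).TotalSeconds;
string url = "http://example.com/return-data.asp?timestamp=" + timestamp;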

Dynamic CSS - caching problem?

I am using an HttpHandler to modify some CSS (only simple colours) on the fly, based on a technique I read about on SO.
Everything works just fine except on the page where I give the user the option to specify the colours they want. Ideally, as soon as the user saves his new colours and the page refreshes, I want the new colours to be displayed. However, they only come through when I explicitly press the browser reload or F5 key.
I appreciate that something somewhere (IIS or the browser) is doing some helpful caching of my stylesheet, which 999 times in 1000 is exactly what I want; however, on this particular page I want to be able to force a reload and cause the HttpHandler to fire.
Anyone understand how this works and what I can do?
Things I have tried:
Response.Clear();
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.Expires = -1;
Response.Cache.SetExpires(DateTime.Now.AddDays(-1));
Because I am also using ASP.NET themes, adding a querystring to the stylesheet link isn't really a simple option.
Thoughts anyone?
This can be solved with a technique that I use on my sites to cause reloads of assets once they have changed, such as after a deploy.
Append ?value to the end of your CSS URL, where value is the version or some unique value the browser hasn't seen yet. In my case I use the file modification time; however, since your CSS is dynamic on almost every page load, I suggest generating some unique value.
Since the URL is always different, the browser will always reload it and never serve it from its cache.
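A minimal sketch, assuming the handler is mapped to something like colours.ashx (the handler name is illustrative):

// A fresh GUID per page load guarantees a URL the browser has never
// cached, so the HttpHandler fires every time.
string cssUrl = "colours.ashx?v=" + Guid.NewGuid().ToString("N");
// Rendered in the page head, e.g.:
// <link rel="stylesheet" type="text/css" href="<%= cssUrl %>" />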

WebCrawling Dynamic Links

Does anybody have any idea about crawling websites that have dynamic pages/queries? I mean, if I click a certain link, it has different values every time I reload it in a web browser. My webcrawler cannot download the contents of these pages. Please advise.
It works the same way whether the page is dynamic or not. A crawler is really only a matter of three things:
The URL
The data it sends to the server, if it is a POST request
The cookie, if authentication is required
That's all.
The common problems when writing a crawler:
Mis-guessing the default page [index.html, index.php, default.aspx, etc.]; actually it will work without it for both methods [POST/GET]
A field name that is not written exactly as the form expects
The ASP.NET form ViewState hidden field (__VIEWSTATE), which can be handled easily
Pages generated dynamically by JavaScript; this is the hardest part, and in most cases even Google still has problems with it
Hope that helps; see the sketch below.
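A minimal sketch of a crawler built from those three things, using HttpWebRequest (the URL and form fields are hypothetical; a real ASP.NET target would also need its __VIEWSTATE value included in the POST body):

using System;
using System.IO;
using System.Net;
using System.Text;

class CrawlerSketch
{
    static void Main()
    {
        // 1. The URL.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/search.aspx");

        // 2. The cookie, in case the site requires an authenticated session.
        request.CookieContainer = new CookieContainer();

        // 3. The data sent to the server, since this example uses POST.
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes("query=test&page=1"); // field names must match the form exactly
        request.ContentLength = body.Length;
        using (Stream s = request.GetRequestStream())
            s.Write(body, 0, body.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());
    }
}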
You might want to look at this question which details how to write a crawler or look at the source code for http://searcharoo.net/ which contains a good crawler (see here).

Bookmarkable AJAX calls with MVC routing

I have a page with a menu that uses jQuery AJAX calls to populate the page. To reflect any changes, I update the URL with a #... instead of ?... or /..., so a URL that originally reads http://localhost/pages/index/id=1 would look like http://localhost/#pages/index/id=1. If a user bookmarks this and later comes back to the page, I wonder if it's possible to use the second URL in my route decoding, or if I have to load the page blank and then use the same JS/AJAX to populate it?
In my mind it is problematic to use AJAX in these cases if a user copies the link and mails it to a friend who has JavaScript disabled.
edit#1: Fixed some spelling.
edit#2: To clarify the question a bit: I want a site where I can do the following:
(a): with JavaScript turned on, use AJAX calls to replace the content of a div (without reloading the page)
(b): with JavaScript turned on, bookmark the page as it is after the AJAX call in (a)
(c): take the URL, send it to a person with JavaScript turned off, and have them see the same page as after the AJAX call was made.
(a) and (b) work just fine on my page, but (c) is seemingly impossible.
Currently, the only portion of a URL you can update without causing the browser to redirect is the hash. This portion of the URL is not sent to the server in a request and is only available for client-side processing, so it cannot be used to provide a JavaScript-free link.
The issue you are facing is a common one amongst those using AJAX. The best solution I've encountered is to provide a way to view any AJAX-loaded state of every page through a "true" URL, one that will be passed to the server.
This means you have one URL which provides a "snapshot" of a page's state:
http://localhost/pages/index/1/someaction
And an AJAX-specific URL which provides the local state of the page in the client's browser:
http://localhost/pages/index/1#someaction
What you then have to do is provide some means of generating the "snapshot" link to the page from the AJAX version. A "Link to this Page" or "Permanent Link" button is a reasonable option.
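A minimal sketch of a route for the "snapshot" URL in ASP.NET MVC (the route, controller, and parameter names are illustrative):

// Global.asax.cs — map the snapshot URL so the server can rebuild the
// state that the AJAX version keeps in the hash fragment.
routes.MapRoute(
    "PageSnapshot",
    "pages/{action}/{id}/{state}",   // e.g. /pages/index/1/someaction
    new { controller = "Pages", action = "Index", state = UrlParameter.Optional }
);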
This is not possible, simply because everything that comes after the # sign (the fragment identifier) is never sent to the server; there's no way for the server to ever capture this value, so there is no routing with it.
You could try replacing the '#' with a '?'. This will send the rest of it as a GET variable, so you may need to do some tweaks, such as changing the format to http://localhost/?pages=index&id=1
There are some fancy things you can set up with the web server so that localhost/article/fancystuff is redirected to localhost/article.php?title=fancystuff
There are a lot of ways of allowing an AJAX site to work with bookmarks and the back button. But you should ask yourself whether you want people to do certain things. Generally, AJAX is used for more advanced web applications that do not map well to the traditional back-and-forth model.
EDIT
Given your additions to the question: since you want to fully support users who are scared of JavaScript, you will need to make your site work perfectly without any AJAX at all. But you should design it in such a way that the content of pages is included from separate files. This means that when you add the JavaScript back in, it can load the file and place it more or less directly into the content holder on your page.
You do need to remember that you can't force someone to accept a bookmark or force a change to a bookmark. What you are after may be best served using cookies. Luckily, even fewer people are scared of cookies; hardly anyone disables them unless they are either paranoid or up to something.
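If you go the cookie route, a minimal sketch (the cookie name and value format are illustrative):

// Remember the last AJAX-loaded state in a cookie on the client.
Response.Cookies.Add(new HttpCookie("lastPageState", "pages/index/1"));

// On a later visit, restore it if present.
HttpCookie saved = Request.Cookies["lastPageState"];
if (saved != null)
{
    // Rebuild the page content from saved.Value.
}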
