do you think it would be difficult to write a framework where mvc compares the last html it output to the current html we want to output, and instead of sending the entire html, figures out what has changed and generates js code that will do the updating relative to the previous html? (presuming nothing was manually changed on the client using js)... maybe an idea for a codeplex project? or maybe something like this already exists? if so, do tell. thanks.
I think it's an interesting question, but one without a practical solution.
If I understand correctly, you want to generate a diff from the current DOM to a new one, and you want to generate this change script (JavaScript executed client side) on the server.
The issue is that in order for the server to generate a diff, it needs to know what the previous DOM structure was so it can compare it with the new one (i.e. the new HTML page).
The only ways I can think of are:
1. The client sends back the full current page or some representation of it.
2. The server stores a copy of the previous page.
The problem with #1 is that you've already negated any performance benefit you'd get from it. Sending a full page back to the server is just as bad or worse than sending it from the server to the client. You can achieve the same effect by requesting the full page body via AJAX and replacing it, and it would be just as efficient and simpler to implement.
The problem with #2 is that now the server needs x copies of each page, where x is the number of users. That's a lot of memory, unless you're persisting it to disk, in which case that's a moderate-sized disk write for every request. Then there's the problem of figuring out how long to keep these around, because if someone visits the site once, you don't want to keep their copy around forever.
The performance of either situation is most likely going to be worse than just getting the full page and will only get worse with more users.
That's not including the complexity of actually getting it right. I think hypothetically it could be done, but other than as a fun experiment there aren't any practical benefits that would outweigh the cost of such a solution, which is why I doubt you'll find one.
Have you considered caching? E.g. Caching in asp.net-mvc
It would be more straightforward, and it makes more sense to me too.
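For example, a minimal sketch of output caching on an MVC action; the controller, action and parameter names here are invented for illustration:

    using System.Web.Mvc;

    public class ProductsController : Controller
    {
        // Cache the rendered HTML for 60 seconds, with one cache entry per id,
        // so repeat requests don't pay for rendering the page again.
        [OutputCache(Duration = 60, VaryByParam = "id")]
        public ActionResult Details(int id)
        {
            var model = LoadProduct(id); // hypothetical data-access helper
            return View(model);
        }

        private object LoadProduct(int id)
        {
            // placeholder for whatever actually loads the data
            return new { Id = id };
        }
    }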
You would need to save the state of every client on your server, and no response could be cached anywhere because every client needs a different response.
Even if this is possible, it would make no sense in the "HTTP world" imho.
You are suggesting a solution to a problem that has already been solved. AJAX addresses this: you can use AJAX requests to load only the HTML you know will change, saving round trips.
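Here is a minimal sketch of that pattern in ASP.NET MVC, where an action returns only the fragment that changes and the client swaps it in via an AJAX call; the controller, view name and data below are invented for illustration:

    using System.Web.Mvc;

    public class NewsController : Controller
    {
        // Called via AJAX; only this fragment goes over the wire, not the whole page.
        public ActionResult LatestHeadlines()
        {
            var headlines = new[] { "headline 1", "headline 2" }; // placeholder data
            return PartialView("_HeadlineList", headlines);
        }
    }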
On one of the projects I was delegated to, I saw some C# code like this:
    string.Format("someJavaScriptFunction() {{ {0}.setValue(...); /* and do magic */ }}", param1, param2, ...);
The generated function was then set as the "index changed" handler of some element on the page.
It's not the first time I've seen this, but it's the first time it struck me so hard. The actual code was enormous and there was a large number of parameters passed in.
I was wondering if there is a better way (probably plenty?), because this seemed like a poor one to me. Writing complex JavaScript logic can be painful, and writing it via string.Format is semi-insane to me. Can someone explain to me what the alternatives and best practices are?
Thanks for the help.
Regards.
Unless you really have a need to generate conditional JavaScript code on the server for the client, because that is what the presented code is doing, I would strongly advise keeping the JS on the client and delivering it via the standard mechanisms available in the browser.
In short:
Use client-side JS delivered via JS files (or delivered as text and then evaluated as JS, though that is usually for advanced scenarios); see the sketch after the list below.
Use an MVC framework on the client side to avoid continual callbacks to the server when you don't really need them, and to manage state and the appearance of content on your site more easily. For example:
a) knockout
b) angular.js
... many others...
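As a rough sketch of the first point: instead of building JavaScript with string.Format, serialize only the data on the server and let a static .js file hold the logic. This assumes an ASP.NET MVC controller is available (in WebForms a generic handler returning JSON would play the same role); the action and property names are invented:

    using System.Web.Mvc;

    public class SettingsController : Controller
    {
        // The client-side code lives in a normal .js file and just fetches this JSON;
        // no JavaScript is generated on the server at all.
        public ActionResult ClientConfig()
        {
            var config = new { minValue = 0, maxValue = 100, label = "Magic" }; // made-up values
            return Json(config, JsonRequestBehavior.AllowGet);
        }
    }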
I am currently rewriting a large website with the goal of replacing a large number of page/form submittals with AJAX calls. The goal is to reduce the number of server roundtrips, and all the state handling on pages with a lot of client-side logic.
Having spent some time considering the best way forward with regard to performance, my question is now the following.
Will it lead to better performance to have just one single aspx page that is used for all AJAX calls, or will it be better to have an aspx page for every use of AJAX on a given webpage?
Thank you very much for any insights
Lars Kjeldsen
Performance-wise, either approach can be made to work within a similar order of magnitude.
Maintenance-wise, I prefer to have separate pages for each logical part of your site. Again, either can work, but I've seen more people make a mess of things with "monolithic" approaches. With a single page you'll need a good amount of skill in structuring your scripts and client-side logic. Done well, there isn't a problem; however, I just see more people getting it right when they use separate pages for separate parts of the site.
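As one example of the "separate pages" flavour, a static page method on the page that owns a feature keeps the AJAX endpoint next to the UI it serves. This is only a sketch, assuming WebForms with page methods enabled via a ScriptManager; the class and method names are made up:

    using System.Web.Services;
    using System.Web.UI;

    public partial class OrderDetails : Page
    {
        // Callable from client script via PageMethods (requires a ScriptManager
        // with EnablePageMethods="true"); returns just the data, not a full page render.
        [WebMethod]
        public static string GetOrderSummary(int orderId)
        {
            return "summary for order " + orderId; // placeholder for real data access
        }
    }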
If you take a look at the site http://battlelog.battlefield.com/ (you'll have to create an account) you'll notice a few things about it:
It never refreshes the page as you navigate the website. (Using JSON to transmit new data)
It updates the URL and keeps track of where you are.
You can use the updated URL and immediately navigate to that portion of the web-application. (In this case it returns the HTML page)
Here's a full write up on the website.
Personally, I like this approach from a technology/performance perspective, but I don't know what impact it will have on SEO, since this design relies on the HTML5 history state mechanism in JavaScript.
Here's an article on SEO and JavaScript, but you'll have to do more research.
NOTE: History.js provides graceful degradation for Browsers that do not support History state.
This might be a pretty strange question in the eyes of some of you out here, but I really wonder if comments in my code will slow down the execution time of the pages I make.
I have some Classes / WebControls that required a lot of comments to make everything clear and quickly readable to other people who will have to deal with my code, and I now wonder how ASP.NET deals with my comments. Will comments be stripped from my code at compile time, or how is this handled?
I should be more specific: I mean comments in my code-behind in C#.
Server-side comments in C# won't do anything except slightly increase compile time.
Comments in JavaScript of course increase the download size, but since you usually minify JavaScript on production systems, which strips out the comments and whitespace, it doesn't matter in practice.
Since HTML minification of dynamically generated pages isn't that common, comments in HTML do slow you down a bit, but there are typically so few of them that it doesn't matter in practice either.
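For instance, if you're on a version of ASP.NET where the System.Web.Optimization bundling bits are available, a ScriptBundle minifies your scripts (stripping comments and whitespace) when the site runs with debug compilation turned off; the paths below are placeholders:

    using System.Web.Optimization;

    public class BundleConfig
    {
        public static void RegisterBundles(BundleCollection bundles)
        {
            // Comments and whitespace are stripped from the bundled output
            // when compilation debug="false".
            bundles.Add(new ScriptBundle("~/bundles/site")
                .Include("~/Scripts/site.js", "~/Scripts/helpers.js"));
        }
    }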
Comments on the aspx pages (as in JavaScript etc.) do slow down the page, because they are content that needs to be downloaded. For JavaScript you might use a minifier and serve a minified version of the JavaScript on the production system.
For C# code... it makes no difference, since the comments are not compiled into the assembly.
No. The only exception is when you have an exorbitant number of HTML (<!--) comments, because these require extra time to transfer your HTML over the internet. All C# comments are stripped when compiled.
If the comments are in the .aspx page, it will depend on whether they're HTML comments or server-side comments. As Pieter points out, HTML (<!--) comments have an impact because they get transferred over the network.
Generally speaking, the more that gets sent to the browser, the longer it will take your pages to load. (It also puts additional load on your server - increased bandwidth usage, and most likely a small increase in CPU load simply because the server has to work harder to send more data.)
That's why ASP.NET supports server-side comments. If you use the <%-- ... --%> syntax instead, the contents of the comment will not be sent to the client. The best way to know for certain what's actually being transferred is to View Source in the browser and see what came across.
Scott Guthrie posted about this back in 2006: http://weblogs.asp.net/scottgu/archive/2006/07/09/Tip_2F00_Trick_3A00_-Using-Server-Side-Comments-with-ASP.NET-2.0-.aspx
I've been entrusted with an idiotic and retarded task by my boss.
The task is: given a web application that returns a table with pagination, write software that "reads and parses it", since there is nothing like a web service that provides the raw data. It's like a "spider" or "crawler" application to steal data that is not meant to be accessed programmatically.
Now the thing: the application is made with the standard aspx WebForms engine, so there's nothing like standard URLs or posts, just the dreadful postback engine crowded with JavaScript and inaccessible HTML. The pagination links call the infamous javascript:__doPostBack(param, param), so I think it wouldn't even work if I tried to simulate clicks on those links.
There are also inputs to filter the results, and they are also part of the postback mechanism, so I can't simulate a regular POST to get the results.
I was forced to do something like this in the past, but it was on a standard-like website with parameters in the querystring like pagesize and pagenumber so I was able to sort it out.
Does anyone have a vague idea whether this is doable, or should I tell my boss to quit asking me to do this retarded stuff?
EDIT: maybe I was a bit unclear about what I have to achieve. I have to parse, extract and convert that data into another format, let's say Excel, not just read it. And this has to be automated without user input. I don't think Selenium would cut it.
EDIT: I just blogged about this situation. If anyone is interested can check my post at http://matteomosca.com/archive/2010/09/14/unethical-programming.aspx and comment about that.
Stop disregarding the tools suggested.
No, the parser you can write isn't WatiN or Selenium; both of those will work in that scenario.
P.S. Had you mentioned anything about needing to extract the data from Flash/Flex/Silverlight or similar, this would be a different answer.
BTW, the reason to proceed or not is definitely not technical, but ethical and maybe even legal. See my comment on the question for my opinion on this.
WatiN will help you navigate the site from the perspective of the UI and grab the HTML for you, and you can find information on .NET DOM parsers here.
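Roughly, driving the site with WatiN might look like the sketch below; the URL and the link text are placeholders, and WatiN's IE automation generally needs to run on an STA thread:

    using System;
    using WatiN.Core;

    class Scraper
    {
        [STAThread]
        static void Main()
        {
            using (var browser = new IE("http://example.com/report.aspx"))
            {
                // Let ASP.NET run its own __doPostBack by actually clicking the
                // pagination link, then wait for the postback to finish.
                browser.Link(Find.ByText("2")).Click();
                browser.WaitForComplete();

                string html = browser.Html; // rendered HTML of page 2
                // hand html to an HTML parser (e.g. Html Agility Pack) from here
                Console.WriteLine(html.Length);
            }
        }
    }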
I already commented, but I think this is actually an answer.
You need a tool which can click client-side links and wait while the page reloads.
Tools like Selenium can do that.
Also (from the comments): WatiN, Watir.
#Insane, the CDC's website has this exact problem, and the data is public (we taxpayers have paid for it). I'm trying to get the survey and question data from http://wwwn.cdc.gov/qbank/Survey.aspx and it's absurdly difficult. It's not illegal or unethical, just a terrible implementation that appears to intentionally make it difficult to get the data (it's also inaccessible to search engines).
I think Selenium is going to work for us, thanks for the suggestion.
I'm building a small specialized search engine for price info. The engine will only collect specific segments of data from each site. My plan is to split the process into two steps:
Simple screen scraping based on a URL that points to the page where the segment I need exists. Is the easiest way to do this just to use a WebClient object and get the full HTML?
Once the HTML is pulled and saved, analyse it via some script and pull out just the segment and values I need (for example, the price of a product). My problem is that this script somehow has to be unique for each site I pull, it has to be able to handle really ugly HTML (so I don't think XSLT will do...), and I need to be able to change it on the fly as the target sites update and change. Finally, I will take the specific values and write them to a database to make them searchable.
Could you please give me some hints on how best to architect this? Would you do it differently than described above?
Well, I would go with the approach you describe.
1. How much data is it going to handle? Fetching the full HTML via WebClient / HttpWebRequest should not be a problem.
2. I would go for HtmlAgilityPack for HTML parsing. It's very forgiving and can handle pretty ugly markup. Since HtmlAgilityPack supports XPath, it's pretty easy to have specific XPath selections for individual sites (a rough sketch follows below).
I'm on the run and going to expand on this answer asap.
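In the meantime, here is a rough sketch of both steps together; the URL and the XPath are placeholders, and each target site would get its own selector (stored somewhere editable so it can change without a redeploy):

    using System;
    using System.Net;
    using HtmlAgilityPack;

    class PriceScraper
    {
        static void Main()
        {
            // Step 1: fetch the full HTML.
            var client = new WebClient();
            string html = client.DownloadString("http://example.com/product/123");

            // Step 2: parse it; Html Agility Pack tolerates fairly ugly markup.
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var priceNode = doc.DocumentNode.SelectSingleNode("//span[@class='price']");
            if (priceNode != null)
                Console.WriteLine(priceNode.InnerText.Trim());
        }
    }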
Yes, a WebClient can work well for this. The WebBrowser control will work as well, depending on your requirements. If you are going to load the document into an HtmlDocument (the IE HTML DOM), then it might be easier to use the WebBrowser control.
The HtmlDocument object that is now built into .NET can be used to parse the HTML. It is designed to be used with the WebBrowser control, but you can use the implementation from the mshtml dll as well. I have not used the HtmlAgilityPack, but I hear that it can do a similar job.
The HTML DOM objects will typically handle, and fix up, most of the ugly HTML that you throw at them, as well as allowing a nicer way to parse the HTML; document.GetElementsByTagName to get a collection of tag objects, for example.
As for handling the changing requirements of the site, it sounds like a good candidate for the strategy pattern. You could load the strategies for each site using reflection or something of that sort.
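A minimal sketch of that idea; the interface and class names are invented, and the lookup here is by host name, though reflection or configuration would work just as well:

    using System;
    using System.Collections.Generic;

    public interface IPriceExtractor
    {
        decimal ExtractPrice(string html);
    }

    public class ExampleShopExtractor : IPriceExtractor
    {
        public decimal ExtractPrice(string html)
        {
            // site-specific parsing (XPath, regex, ...) goes here
            throw new NotImplementedException();
        }
    }

    public static class ExtractorRegistry
    {
        private static readonly Dictionary<string, IPriceExtractor> Extractors =
            new Dictionary<string, IPriceExtractor>
            {
                { "www.exampleshop.com", new ExampleShopExtractor() }
            };

        public static IPriceExtractor For(Uri pageUrl)
        {
            // Swap or add strategies as target sites change.
            return Extractors[pageUrl.Host];
        }
    }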
I have worked on a system that uses XML to define a generic set of parameters for extracting text from HTML pages. Basically it would define start and end elements to begin and end extraction. I have found this technique to work well enough for a small sample, but it gets rather cumbersome and difficult to customize as the collection of sites gets larger and larger. Keeping the XML up to date and trying to keep a generic set of XML and code to handle any type of site is difficult. But if the type and number of sites is small, then this might work.
One last thing to mention is that you might want to add a cleaning step to your approach. A flexible way to clean up HTML as it comes into the process was invaluable in the code I have worked on in the past. Perhaps implementing a kind of pipeline would be a good approach if you think the domain is complex enough to warrant it. But even just a method that runs some regexes over the HTML before you parse it would be valuable: getting rid of images, replacing particular misused tags with nicer HTML, etc. The amount of really dodgy HTML that is out there continues to amaze me...
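A sketch of what such a cleanup step could look like, as a simple chain of regex passes run before parsing; the patterns are purely illustrative:

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class HtmlCleaner
    {
        // Each pass is applied in order to the raw HTML before it reaches the parser.
        private static readonly List<KeyValuePair<Regex, string>> Passes =
            new List<KeyValuePair<Regex, string>>
            {
                // drop images entirely
                new KeyValuePair<Regex, string>(
                    new Regex("<img[^>]*>", RegexOptions.IgnoreCase), string.Empty),
                // strip deprecated <font> tags
                new KeyValuePair<Regex, string>(
                    new Regex("</?font[^>]*>", RegexOptions.IgnoreCase), string.Empty),
            };

        public static string Clean(string html)
        {
            foreach (var pass in Passes)
                html = pass.Key.Replace(html, pass.Value);
            return html;
        }
    }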