So I'm making a program which is kind of a web crawler. It downloads the HTML of a page, parses it for a specific piece of text using regex, and then adds it to a list.
To achieve this, I used async HTTP requests. The GET request is sent asynchronously and the parsing operation is performed on the returned HTML.
My issue, and I'm not sure if it's a simple one, is that the program doesn't run smoothly. It sends a bunch of requests, pauses for a couple of seconds, then increments the parsed-items count all at once (even though the counter is programmed to increment once every time an item is added), so that, for example, it jumps from 53 to 69 instead of showing 54, 55, 56, ...
Sorry for being a newb, but I taught myself all this stuff and some experienced advice would go a long way.
Thanks
That sounds correct.
The slowest part of your task is downloading the pages over the network.
Your program starts downloading a bunch of pages at once, waits for them to arrive, then parses them all almost instantly, which is why the counter jumps in bursts instead of ticking up smoothly.
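If you want the counter to tick up one at a time, you can still start all the downloads together but handle each page the moment its download finishes. A minimal sketch, assuming HttpClient; the URL list and the parsing step are placeholders for your own code:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class Crawler
{
    static async Task Main()
    {
        var urls = new[] { "http://example.com/a", "http://example.com/b" };
        using var client = new HttpClient();

        // Start every download at once.
        var pending = urls.Select(u => client.GetStringAsync(u)).ToList();
        int itemsParsed = 0;

        // Handle each page as soon as it arrives, instead of
        // waiting for the whole batch to finish.
        while (pending.Count > 0)
        {
            Task<string> finished = await Task.WhenAny(pending);
            pending.Remove(finished);

            string html = await finished;
            // Run your regex extraction on html here, then:
            itemsParsed++; // the counter now increments once per page
            Console.WriteLine(itemsParsed);
        }
    }
}
```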
I am experiencing a strange issue where some sections of my ASP.NET site (.aspx pages) take much longer to load than others. After the request goes through all the stages of the page life cycle, and the page and the master page have been processed, it takes about 20 seconds after my code exits for the application to reach the Application_EndRequest event. I am not sure what it is doing for those 20 seconds. I know this isn't very specific, so I am just asking for suggestions on how to debug the issue, or any helpful tips for finding the holdup.
Thanks
You can try a code profiler. I used Stackify's Prefix in a similar situation; it's a free download.
I am fairly new to web development and am currently building a website for a client with an Angular 5 front end and a C# back end using ASP.NET Core. The issue I'm having is that I can pass the file and upload it, but I want some way of tracking the process: before I upload the file I run a whole bunch of formatting checks, which can take anywhere between 10 and 15 minutes due to the size of the file.
Is there a way to have two HTTP requests: one which starts the process and returns an indicator that it has begun, and another which can be called periodically from the front end to provide a status update on the validation taking place?
Thanks in advance!
First of all, there is something wrong if a request has to wait 10-15 minutes; you probably need to run this on a separate worker (search for background work in ASP.NET Core) and broadcast a message to the clients when it finishes, using something like SignalR.
About progress: jQuery Deferred supports a progress method, and I believe there are similar implementations for promises. SignalR supports progress reporting as well.
Next, you need to implement it on the server side, something like this:
https://blogs.msdn.microsoft.com/dotnet/2012/06/06/async-in-4-5-enabling-progress-and-cancellation-in-async-apis/
Progress example:
https://www.codeproject.com/Articles/1124691/SignalR-Progress-Bar-Simple-Example-Sending-Live-D
Hope it's enough to get started.
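To make that concrete, here is a rough sketch of the two-request pattern from the question, assuming ASP.NET Core. JobStore, the routes, and the fake work loop are all invented for illustration; in production the checks should run in a hosted background service (IHostedService) rather than a bare Task.Run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public static class JobStore
{
    // In-memory progress store; a real app might use a database or cache.
    public static readonly ConcurrentDictionary<Guid, int> Progress =
        new ConcurrentDictionary<Guid, int>();
}

[ApiController]
[Route("api/validation")]
public class ValidationController : ControllerBase
{
    // Request 1: start the long-running checks and return a job id right away.
    [HttpPost("start")]
    public IActionResult Start()
    {
        var jobId = Guid.NewGuid();
        JobStore.Progress[jobId] = 0;

        // Fire-and-forget for illustration only.
        _ = Task.Run(async () =>
        {
            for (int pct = 0; pct <= 100; pct += 10)
            {
                await Task.Delay(TimeSpan.FromSeconds(5)); // stand-in for real work
                JobStore.Progress[jobId] = pct;
            }
        });

        return Ok(new { jobId });
    }

    // Request 2: the front end polls this periodically for a status update.
    [HttpGet("status/{jobId}")]
    public IActionResult Status(Guid jobId) =>
        JobStore.Progress.TryGetValue(jobId, out var pct)
            ? Ok(new { percentComplete = pct })
            : (IActionResult)NotFound();
}
```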
I'm building an OCR scanning module in my ASP.NET Web Forms application. As you may know, an operation like that can take some time. Therefore I'm using a background service application that responds to a message queue and runs the code, so the user doesn't even have to stay on the same webpage.
What I would like to do is inform the user what is going on while the service is running. If, let's say, the user uploaded 5 documents, I would like something like this to appear in a Literal, Label or Repeater control. These items do not have to be saved in a database, and I don't want them to be.
Processing document 1 of 5
Document 1 processed with code 6732842
Processing document 2 of 5
Document 2 processed with code 8732457
Processing document 3 of 5
Document 3 processed with code 8725347
Processing document 4 of 5
Document 4 could not be processed “no OCR string recognized”
Processing document 5 of 5
Document 5 processed with code 4372537
Completed: Processed 4 of 5 documents received
If an error occurs, I would like to see something like this:
An error occurred. The scanning process has been stopped.
I have some ideas but I don't know what the best practice is.
Option 1:
I could save the above items in a static class and have JavaScript post every 5 seconds to get that value via a web method.
Option 2:
I could save the above items in the session and return an updated session object when JavaScript posts every 5 seconds. I don't know if the session is even available when using a service application.
If you have any other options (preferably better ones), that would be greatly appreciated.
Thanks in advance.
Save the progress items in a database, give each user's upload a different id, and publish this data with a web service. In the browser, use JavaScript to retrieve the progress from the web service.
Of course, the database records need to be deleted periodically, e.g., every minute.
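For the polling half of that, a minimal sketch assuming ASP.NET Web Forms page methods; the OcrProgress table, the Main connection string name, and the column names are all invented for illustration:

```csharp
using System.Collections.Generic;
using System.Configuration;
using System.Data.SqlClient;
using System.Web.Services;

public partial class OcrStatus : System.Web.UI.Page
{
    // JavaScript on the page calls this every few seconds
    // (via PageMethods or $.ajax) and renders the returned lines.
    [WebMethod]
    public static List<string> GetProgress(string uploadId)
    {
        var lines = new List<string>();
        string cs = ConfigurationManager
            .ConnectionStrings["Main"].ConnectionString;

        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand(
            "SELECT Message FROM OcrProgress " +
            "WHERE UploadId = @id ORDER BY CreatedAt", conn))
        {
            cmd.Parameters.AddWithValue("@id", uploadId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    lines.Add(reader.GetString(0));
        }
        return lines;
    }
}
```

The background service simply inserts a row into the same table each time it starts or finishes a document, and the cleanup job deletes rows older than a few minutes.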
I wish I could give you a working example, but what you want to do is not easy, and it is made even more difficult by the fact that you're not using ASP.NET MVC, which makes async work easier. You're going to need to write a series of asynchronous tasks to do some work for you. This article will give you a good start: http://blogs.msdn.com/b/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx
I need to translate a group of words using a free online dictionary, so I wrote a simple program in C# that sends HTTP requests and then parses the returned HTML to extract the meanings.
However, the free website stops working after 130 requests, asking for manual entry of the words shown in an image (a captcha) in order to continue. How can I overcome this problem?
Thanks,
Samer
This isn't a problem with your code; it is their website protecting itself from being spammed with hits from a single user. The easiest thing to do would be to have a dictionary of your own, so there would be no captcha to get around.
I am working on an ASP.NET application written in C# with a SQL Server 2000 database. We have several PDF reports which clients use for their business needs. The problem is that these reports take a while to generate (> 3 minutes). What usually happens is that when the user requests the report, the request timeout kills the request before the web server has time to finish generating the report, so the user never gets a chance to download the file. Then the user refreshes the page and tries again, which starts the entire report generation process over and still ends up timing out. (No, we aren't caching reports right now; that is something I am pushing hard for...)
How do you handle these scenarios? I have an idea in my head which involves making an asynchronous request to start the report generation and then having some JavaScript periodically check the status. Once the status indicates the report is finished, make a separate request for the actual file.
Is there a simpler way that I am not seeing?
Using the filesystem here is probably a good bet. Have a request that immediately returns a URL for the report PDF's eventual location. Your server can then either kick off an external process or send a request to itself to perform the reporting. The client can poll the server (using HTTP HEAD) for the PDF at the supplied URL. If you derive the filename of the PDF from the report parameters, either by using a hash or by putting the parameters directly into the name, you get instant server-side caching too.
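Deriving the filename from the parameters might look something like this; the parameter set here is made up for illustration:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class ReportCache
{
    // The same report name and date range always yield the same file name,
    // so a previously generated PDF is found instantly.
    public static string PdfFileName(string reportName, DateTime from, DateTime to)
    {
        string key = string.Format("{0}|{1:yyyyMMdd}|{2:yyyyMMdd}",
                                   reportName, from, to);
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            // Hex-encode the hash so it is safe in a file name and a URL.
            return BitConverter.ToString(hash).Replace("-", "") + ".pdf";
        }
    }
}
```

The client then keeps issuing HEAD requests against that file name until the server returns 200, at which point it downloads the PDF.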
I would consider taking this report generation a little more offline from a processing point of view.
For example, create a queue to put report requests into, process the reports from there, and send a message to the user when each one finishes.
Maybe I would even create a separate Windows Service to handle the queue.
Update: notifying the user can be done by email, or they can have a 'reports' page where they can check their reports' status and download them once they are ready.
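A rough sketch of the queue idea; BlockingCollection here is just an in-process stand-in, since in the setup described the queue would live out of process (a database table or MSMQ) between the web app and the Windows Service:

```csharp
using System;
using System.Collections.Concurrent;

class ReportRequest
{
    public Guid Id { get; private set; }
    public string Parameters { get; set; }
    public ReportRequest() { Id = Guid.NewGuid(); }
}

static class ReportQueue
{
    static readonly BlockingCollection<ReportRequest> Queue =
        new BlockingCollection<ReportRequest>();

    // Web app side: enqueue the request and return immediately.
    public static Guid Enqueue(string parameters)
    {
        var request = new ReportRequest { Parameters = parameters };
        Queue.Add(request);
        return request.Id;
    }

    // Service side: drain the queue, one report at a time.
    public static void RunWorker()
    {
        foreach (var request in Queue.GetConsumingEnumerable())
        {
            // GenerateReport(request) would go here; then notify the user
            // by email or flag the report as ready on their 'reports' page.
            Console.WriteLine("Report " + request.Id + " ready.");
        }
    }
}
```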
What about emailing the report to the user? All the ASP page would have to do is send the request to generate the report and return a message saying that the report will be emailed after it has finished running.
Your users may not accept this approach, but:
When they request a report (by clicking a button or a link or whatever), you could start the report generation process on a separate thread and redirect the user to a page that says "thank you, your report will be emailed to you in a few minutes".
When the thread is done generating the report, you could email the PDF directly (which probably won't work because of its size), or save the report on the server and email a link to the user.
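A minimal sketch of that flow; the addresses, SMTP host, and the placeholder path stand in for your real report generation:

```csharp
using System.Net.Mail;
using System.Threading;

static class ReportMailer
{
    public static void StartReport(string userEmail, string parameters)
    {
        // Run the slow generation off the request thread.
        ThreadPool.QueueUserWorkItem(delegate
        {
            // string path = GenerateReportPdf(parameters); // the slow part
            string path = "/reports/12345.pdf";             // placeholder

            using (var mail = new MailMessage("reports@example.com", userEmail))
            {
                mail.Subject = "Your report is ready";
                mail.Body = "Download it here: https://example.com" + path;
                new SmtpClient("localhost").Send(mail);
            }
        });
    }
}
```

Keep in mind that a bare thread-pool work item can be lost if the application pool recycles, which is part of why the queue-plus-service answers above are more robust.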
Alternatively, you could raise the request timeout to more than 3 minutes (the executionTimeout attribute on httpRuntime in web.config, or the equivalent setting in IIS).
Here are some of the things I would do if I were presented with this problem:
1- Stop those timeouts! They are a total waste of resources (raise the timeout value of the ASP pages).
2- Centralize all the DB access in one single point, then gather stats about which reports ran when, by whom, and how long they took. Investigate why they take so long: is it report complexity? The data range? Server load? (You could write all of that to a .csv file on the server and import the file into SQL Server periodically for later analysis.)
Eventually, it will be easier for you to "cache" reports if you go through this single access point (for example, the same query for the same date can return the same previously generated PDF).
3- I know this wasn't really the question, but have you tried diving into those queries to see why they take so long to run? Query tuning, maybe?
4- An email/SMS/on-screen message when the report is ready seems great. If your users generally send a batch of reports to be generated, maybe a little dashboard indicating the progress of "their" queue could be built into the app, with a little AJAX control periodically refreshing the status.
Hint: if you use that central DB access point and you have sufficient information about what runs when, why, and for how long, you will eventually be able to roughly estimate the time it will take for a report to run.
If response time is mission critical, should certain users be limited in the data range (date range, for example) during some hours of the day?
Good luck, and please post more details about your scenario if you want more accurate hints.
Query tuning is probably the best place to start. Though I don't know how you are generating the report, that step shouldn't really take all that long; a poorly performing query, on the other hand, could absolutely kill your performance.
Depending on what you find when looking at the query, you may need to add some indexes, or possibly even set up a table that stores the information for your report in a denormalized form to make it available faster. This denormalized table could then be refreshed (through a SQL Server job) every hour, or at whatever frequency your requirements dictate (within reason).
If it's a relatively static report, without varying user input parameters, then caching a report run earlier in the day would be a good idea as well, but it's hard to say more without knowing your situation.
For a problem like this you really need to start at the database, unless you have reason to suspect your report-generating code is the culprit. There are various band-aids you could use that might help for a while, but if your DB is the root cause, those solutions will not scale well and you'll likely run into similar problems (or worse) some time in the future.