In my project, I have many simple HTML text snippets in a database, e.g.
<ul>this is apple</ul>
<ul>this is orange</ul>
<p>I want to <u>eat</u> apple</p>
hello <i>mr</i> orange
In my project, I have a server and a client. The server reads the HTML text from the database; the client is a web client, and it must receive the HTML text in the form of an Image object in order to display it (I have my own reasons for not rendering the HTML directly in the DOM...). In any case, the client must display everything as an image.
I can think of two approaches to solve this problem.
The first: the server converts the HTML text into an image (e.g. a base64 string), then sends it to the client.
Or...
The second: the client receives the HTML text and converts it with JavaScript into an Image object in the browser.
My server program is a .NET program.
My client is a browser that supports HTML5, e.g. Chrome or Firefox.
How can I do this?
I have done this before using the JS plugin html2canvas.
It's not perfect, but it works pretty well, and it may have improved since I last used it.
If you want to do it on the server side, you can load the page with a headless browser like PhantomJS and have it create a screenshot for you, but this will probably be more complicated to set up.
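For the server-side route, the PhantomJS binary can be driven from .NET as an external process. A minimal sketch, assuming phantomjs and its bundled rasterize.js example script are available in the working directory; the file names here are placeholders:

```csharp
using System.Diagnostics;

static class HtmlToImage
{
    // Builds the argument string for PhantomJS's bundled rasterize.js
    // example script: phantomjs rasterize.js <input> <output>
    public static string BuildArgs(string inputUrl, string outputPng) =>
        $"rasterize.js \"{inputUrl}\" \"{outputPng}\"";

    // Launches PhantomJS and blocks until the screenshot has been written.
    public static void Capture(string inputUrl, string outputPng)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "phantomjs",        // assumes phantomjs is on PATH
            Arguments = BuildArgs(inputUrl, outputPng),
            UseShellExecute = false
        };
        using (var p = Process.Start(psi))
            p.WaitForExit();
    }
}
```

The HTML snippets from the database could be written to a temporary .html file and passed as the input; the resulting PNG can then be base64-encoded with Convert.ToBase64String(File.ReadAllBytes(path)) before being sent to the client.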
Related
I have a scenario where I would like to automate the following process programmatically:
Currently, I have to manually
Navigate to a webpage
Enter some text (an email) in a certain field on the webpage
Press the 'Search' button, which generates a new page containing a Table with the results on it.
Manually scroll through the generated results table and extract 4 pieces of information.
Is there a way for me to do this from a desktop WPF app using C#?
I am aware there is a WebClient type that can download a string, presumably the content of the web page, but I don't see how that would help me.
My knowledge of web-based stuff is pretty much non-existent, so I am quite lost as to how to go about this, or even whether it is possible.
I think a web driver is what you're looking for. I would suggest using Selenium: you can navigate to sites and send input or clicks to specific elements in them.
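A minimal sketch of the Selenium route in C#, assuming the Selenium.WebDriver NuGet package and Chrome; the element ids (email, search-button) and the results-table layout are assumptions you'd replace with the real page's selectors:

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System.Collections.Generic;
using System.Linq;

static class SearchAutomation
{
    // Pure helper: pick the 4 cells of interest out of a row's cell texts.
    // Which columns matter is an assumption; adjust for the real table.
    public static string[] ExtractFields(IList<string> cells) =>
        cells.Take(4).ToArray();

    public static List<string[]> Run(string url, string email)
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(url);
            driver.FindElement(By.Id("email")).SendKeys(email);   // field id is assumed
            driver.FindElement(By.Id("search-button")).Click();   // button id is assumed

            var results = new List<string[]>();
            foreach (var row in driver.FindElements(By.CssSelector("table tr")))
            {
                var cells = row.FindElements(By.TagName("td"))
                               .Select(c => c.Text).ToList();
                if (cells.Count >= 4)
                    results.Add(ExtractFields(cells));
            }
            return results;
        }
    }
}
```

This drives a real browser, so it also works when the results table is built by JavaScript after the button click.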
Well, I'll write the algorithm for you, but you'll also need to do some homework.
Use WebClient to get the HTML page with the form you want to auto-fill and submit.
Use a regex to extract the action attribute of the form you want to auto-submit. That gets you the URL to send your next request to.
Since you know the fields in that form, create a class corresponding to those fields; let's call it AutoClass.
Create a new instance of AutoClass and assign the values you want to auto-fill.
Use WebClient to send your new request to the URL you extracted from the form, attaching the object you want to send to the server (through serialization or any other method).
Send the request, wait for the response, then take further action.
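The steps above can be sketched like this. The regex assumes a simple page with a single form (an HTML parser such as HtmlAgilityPack would be more robust), and the field names you pass in are whatever the real form uses:

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class FormAutoSubmit
{
    // Pull the action attribute out of the first <form> tag.
    public static string ExtractAction(string html)
    {
        var m = Regex.Match(html, @"<form[^>]*\baction\s*=\s*[""']([^""']+)[""']",
                            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string Submit(string pageUrl, NameValueCollection fields)
    {
        using (var wc = new WebClient())
        {
            string html = wc.DownloadString(pageUrl);     // fetch the page with the form
            string action = ExtractAction(html);          // where the form posts to
            // Resolve a relative action (e.g. "/register.php") against the page URL.
            string target = new Uri(new Uri(pageUrl), action).ToString();
            byte[] response = wc.UploadValues(target, "POST", fields); // fill and submit
            return Encoding.UTF8.GetString(response);
        }
    }
}
```

UploadValues URL-encodes the name/value pairs and sends them as a regular form POST, which stands in for the "attach your object" step.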
Either use a web driver like Puppeteer (Selenium is kinda dead) or use HTTP(S) requests directly (if you don't get stopped by bot checks). I feel like you're looking for the latter, because there is no reason to use a web driver in this case when a lighter method like HTTP requests can be used.
You can use RestSharp or the built-in libraries if you want. Here is a popular thread on the ways to send requests with the libraries built into C#.
To figure out what you need to send, use a tool like Fiddler or Chrome Dev Tools (specifically the Network tab) to see what you'd have to send to achieve your goal as you would in a browser.
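A minimal sketch with the built-in HttpClient; the URL and the form fields stand in for whatever the Network tab shows the browser actually sending:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static class RequestSketch
{
    public static async Task<string> PostFormAsync(
        string url, Dictionary<string, string> fields)
    {
        using (var client = new HttpClient())
        {
            // FormUrlEncodedContent produces the same
            // application/x-www-form-urlencoded body a browser form submit sends.
            var content = new FormUrlEncodedContent(fields);
            HttpResponseMessage response = await client.PostAsync(url, content);
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

For a long-running app, reuse a single HttpClient instance rather than creating one per request.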
I am working on a web scraper that will scrape an Angular website.
I am using the HttpClient class for this purpose, but instead of getting the HTML tags inside the body of the page, I am getting <ng-view> </ng-view> tags.
Can anyone explain what is going on, and how can I get the HTML code instead of the ng-view tags?
As far as I know, this won't be possible. All you can "scrape" is the initial markup that is served to the browser. All other content will be obtained by running the JavaScript, which makes calls back to the server for additional data.
Unless you're prepared to write a complete, fully-functional JavaScript engine, I'd say that initial page is all you'll get.
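To illustrate the point: a plain HTTP fetch returns only that initial markup, so a quick check for the empty Angular shell tells you the JavaScript never ran. The URL here is hypothetical:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

static class AngularScrape
{
    // True when the fetched markup is just the unrendered Angular template,
    // i.e. the placeholder the JavaScript would later fill in.
    public static bool IsUnrenderedShell(string html) =>
        html.Contains("<ng-view>") || html.Contains("ng-app");

    public static async Task<string> FetchRawAsync(string url)
    {
        using (var client = new HttpClient())
            return await client.GetStringAsync(url); // no JavaScript is executed
    }
}
```

The practical alternatives are driving a real (headless) browser that executes the JavaScript, or calling the JSON endpoints the Angular app itself calls back to for its data.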
I want to grab a set of data from a site into my C# application. I've referred to some sites and articles about using the WebClient class.
But the problem is that the data I want is in a news bar built with Flash. Is it possible to grab the data from it? The data in it also keeps updating.
Have you tried the Yahoo approach? The project below does just that.
It is easy to download stock data from Yahoo!. For example, copy and paste this URL into your browser address bar: http://download.finance.yahoo.com/d/quotes.csv?s=YHOO+GOOG+MSFT&f=sl1d1t1c1hgvbap2. Depending on your Internet browser settings, you may be asked to save the results into a file called "quotes.csv", or the following will appear in your browser:
http://www.codeproject.com/KB/aspnet/StockQuote.aspx?display=Normal
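That Yahoo endpoint has since been retired, but as a sketch of consuming such a CSV (the old format returned one comma-separated line per symbol, with string fields quoted):

```csharp
using System;
using System.Net;

static class StockQuotes
{
    // Pure helper: split one CSV line and strip surrounding quotes.
    // Naive: assumes no commas inside quoted fields.
    public static string[] ParseQuoteLine(string line)
    {
        string[] parts = line.Split(',');
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim('"');
        return parts;
    }

    public static string[][] Download(string url)
    {
        using (var wc = new WebClient())
        {
            string csv = wc.DownloadString(url);
            string[] lines = csv.Split(new[] { '\n' },
                                       StringSplitOptions.RemoveEmptyEntries);
            var rows = new string[lines.Length][];
            for (int i = 0; i < lines.Length; i++)
                rows[i] = ParseQuoteLine(lines[i].TrimEnd('\r'));
            return rows;
        }
    }
}
```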
You can't grab data from Flash directly.
One possible solution: if you dig into the embed tag of the Flash object, or find some URL or RSS feed that the Flash appears to consume, you can read that with WebClient or (hopefully) XmlReader.
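If you do find an RSS feed behind the Flash object, reading it is straightforward. A small sketch; the feed URL is whatever you dig out of the embed tag:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NewsFeed
{
    // Pulls item titles out of a standard RSS 2.0 document.
    public static List<string> Titles(string rssXml) =>
        XDocument.Parse(rssXml)
                 .Descendants("item")
                 .Select(i => (string)i.Element("title"))
                 .ToList();
}
```

You'd feed it with something like new WebClient().DownloadString(feedUrl), polling periodically since the news bar keeps updating.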
I have a div, and inside it there is a GridView and a few other divs. Now I want to convert its contents to an image and store it on the server. I don't want to send it to the client.
How can I do it?
The page content is processed in the browser, and the browser is on the client side,
so you don't know how it looks from the server side; on the server you just have HTML and scripts...
So the link ddrace gave may be useful, or you can develop a Windows app, put a WebBrowser control on it, load your page in it, and save an image of the browser to the server.
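A sketch of that WebBrowser approach. Note the control wraps COM, so it must run on an STA thread, and on a server it also needs a desktop session, which is one reason dedicated converters are often preferred. The 4000 px cap is an arbitrary safety limit:

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

static class PageSnapshot
{
    // Pure helper: cap the capture size so a runaway page can't
    // force an enormous bitmap (4000 px is an arbitrary choice).
    public static int Cap(int dimension, int max = 4000) => Math.Min(dimension, max);

    // Call from an STA thread (e.g. a [STAThread] Main).
    public static void Save(string url, string pngPath)
    {
        using (var wb = new WebBrowser())
        {
            wb.ScrollBarsEnabled = false;
            wb.Navigate(url);
            while (wb.ReadyState != WebBrowserReadyState.Complete)
                Application.DoEvents();   // pump messages until the page loads

            // Size the control to the full page, then render it to a bitmap.
            wb.Width  = Cap(wb.Document.Body.ScrollRectangle.Width);
            wb.Height = Cap(wb.Document.Body.ScrollRectangle.Height);
            using (var bmp = new Bitmap(wb.Width, wb.Height))
            {
                wb.DrawToBitmap(bmp, new Rectangle(0, 0, wb.Width, wb.Height));
                bmp.Save(pngPath);
            }
        }
    }
}
```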
You're going to have to use a piece of middleware to do this. I have used this one in the past: http://www.winnovative-software.com/Html-To-Pdf-Converter.aspx. I know it says "to PDF", but one of its features is that it can render the result to an image and save it in various image formats.
In your web application, have the middleware component fetch the web page that has your control etc. on it, and save it as an image on the server.
I'm letting my users register an email account. The user just fills in all the information in my program, and my program fills in the fields. Well, not really: it makes a POST request with the correct post data to the correct form/post URL.
However, the website requires a captcha. I simply want to show the captcha to my user; he enters the value and then it gets sent along with the post data.
The register page is here: http://register.rediff.com/register/register.php?FormName=user_details
I can just get all the image URLs from the HTML, but when I copy the URL of the captcha image and go to it, it's a different image than the one I copied the URL from:
http://register.rediff.com/register/tb135/tb_getimage.php?uid=1312830635&start=JTNG
How do I do this using HttpWebRequest?
I can just grab the html first:
string html = new WebClient().DownloadString("http://register.rediff.com/register/register.php?FormName=user_details");
Then I can get the image URL, but how do I show the same captcha to the user?
Btw, it's not for a bot... it's nothing automated... it's just that I don't want to show the user the web interface...
Not really an answer, some advice instead:
If you're writing an app client to work with the website, a better approach would be to write a WCF/Web Service for the app to interact with directly; this can refer directly to your BL layer.
If you want the whole app to work via screen scraping, then there's a lot of work ahead, and your app will be dependent on the site not changing.
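If the screen-scraping route is taken anyway, the captcha problem in the question above is usually a session issue: the page and the captcha image must be fetched with the same cookies, or the server generates a fresh captcha each time. A sketch with HttpWebRequest and a shared CookieContainer; the final POST would reuse the same container:

```csharp
using System.IO;
using System.Net;

static class CaptchaSession
{
    // One cookie jar shared across all requests = one server-side session.
    static readonly CookieContainer Jar = new CookieContainer();

    public static HttpWebRequest Make(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = Jar;   // reuse the same session cookies
        return req;
    }

    public static string GetHtml(string url)
    {
        using (var resp = Make(url).GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
            return reader.ReadToEnd();
    }

    public static byte[] GetImage(string imageUrl)
    {
        using (var resp = Make(imageUrl).GetResponse())
        using (var ms = new MemoryStream())
        {
            resp.GetResponseStream().CopyTo(ms);
            return ms.ToArray();     // show these bytes to the user in your UI
        }
    }
}
```

Fetch the register page with GetHtml, pull the captcha URL out of it, fetch the bytes with GetImage, display them, then POST the user's answer through a request made with the same Make helper.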