Recursive HTML Parsing using C#

I'm trying to export HTML content (tables) to CSV files using C#, and based on my research here, one of the best ways to do this is with the HTML Agility Pack.
I haven't started coding and testing yet because I first need to be sure it's doable. The HTML table on the website receives push messages from the server, so its contents update in real time and can change at any moment. What I would like to do is export the table to CSV every time the table changes (e.g. row added, row deleted, cell contents modified, etc.).
I am not sure whether this can be done with the HTML Agility Pack, or with C# at all.
Please advise, and thank you in advance.

Since this is dynamically updating data, it sounds like a headless browser would be a better fit for what you're looking to do: something like espion.io or PhantomJS. A headless browser would allow you to respond to these data pushes and capture the HTML for further processing.
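Once the HTML has been captured, the "export on every change" part can stay in plain C#. A minimal sketch, assuming the table cells have already been extracted into rows of strings (by the HTML Agility Pack or by the headless browser); the names TableCsvExporter and ChangeDetector are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TableCsvExporter
{
    // Quote a field only when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes (the usual CSV convention).
    public static string EscapeField(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Turn rows of cell texts into one CSV string, one line per table row.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        return string.Join("\r\n",
            rows.Select(row => string.Join(",", row.Select(cell => EscapeField(cell)))));
    }
}

public sealed class ChangeDetector
{
    private string _lastSnapshot;

    // Returns true when the CSV differs from the previous snapshot,
    // i.e. when a row was added, removed, or a cell was modified.
    public bool HasChanged(string csv)
    {
        if (csv == _lastSnapshot)
            return false;
        _lastSnapshot = csv;
        return true;
    }
}
```

Each time a push arrives you would rebuild the CSV and call File.WriteAllText only when HasChanged returns true, so unchanged snapshots do not produce new files.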

Related

How can I create a dynamic pdf with c#

I have a design problem and would like your help before I start coding.
My goal is to create an app that calls an API, gathers the data, analyzes it, creates a report from it, and finally sends it by email.
I managed to get the data (as JSON) with C#, deserialize it, and do my analysis.
I would like to know the best way to create a nice PDF from the data. My idea is to create, for example, an HTML template, so that every time I call the API the app modifies the content of the template and generates a PDF from the HTML.
I'm thinking of HTML because I know I can lay out the HTML exactly as I please, but the problem is that the conversion to PDF sometimes breaks the styling a bit.
I know that with visual code I can create HTML pages, and I would like to know whether I can modify their content from a C# page if it is in the same project.
If not, can you recommend anything that would let me produce a PDF exactly as I please with the dynamic content?
Thank you

Document search and add engine web application

I want to develop an ASP.NET web application that does the following:
a) The user should be able to add content to the document. The content can include text as well as images, screenshots, etc.
b) The user should be able to search by keywords. When searching with a keyword, the matching content, along with its images (if any), should be shown to the user.
I am not sure what the proper approach is. One way I can think of is to store the text content in an XML file and later search for keywords by going through each node of the XML and displaying the matches, but I am not sure how to attach image content to XML. This method also doesn't seem efficient if the document grows a lot over time.
Can anyone suggest a proper way to meet the requirements above? Any hint would be appreciated.

Split it into two tasks: editing and search.
Full-text search is a solved problem. Simply use Sphinx Search and you are done. Sphinx is simple to use and can do everything you will need. It has a MySQL interface (your app connects to Sphinx the same way it would connect to a second MySQL database).
Editing is a bit more complicated. If I understand correctly, you want multiple users to edit a single document concurrently.
I recommend using WebSockets to notify other clients about changes in the document. Long polling and Server-Sent Events have ugly side effects, such as holding a connection open and keeping the browser from making other requests to the server. To implement the client side in JavaScript, I would use React, Angular, or a similar framework to make updates as easy as possible.
The server side requires a modification-friendly representation of the document, so that if one user changes one part and another user changes another part, your app is able to merge the changes. Changing completely different parts is easy, but it may be tricky when both change the same paragraph or document node. The exact representation of each change depends on the format of your document.
I do not see much benefit in using XML rather than any other format. It may be practical for representing the document, but it will not help with merging colliding modifications. I would start with a plain array of strings, each representing a single paragraph. Extending it to a full XML document is the easy part once two users can edit the same paragraph.
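To illustrate the array-of-paragraphs idea, here is a rough sketch of a three-way merge in C#. It assumes both editors started from the same original and that nobody added or removed paragraphs; ParagraphMerger is a made-up name, and keeping the first editor's text on a conflict is just one possible policy:

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphMerger
{
    // Merge two edited copies of a document against their common original.
    // Paragraphs changed by only one editor are taken from that editor.
    // Paragraphs changed differently by both are reported in 'conflicts'
    // (by index) and the first editor's version is kept for now.
    public static string[] Merge(string[] original, string[] first, string[] second, List<int> conflicts)
    {
        if (original.Length != first.Length || original.Length != second.Length)
            throw new ArgumentException("This sketch only handles edits that keep the paragraph count unchanged.");

        var merged = new string[original.Length];
        for (int i = 0; i < original.Length; i++)
        {
            bool firstChanged = first[i] != original[i];
            bool secondChanged = second[i] != original[i];

            if (firstChanged && secondChanged && first[i] != second[i])
            {
                conflicts.Add(i);          // same paragraph edited in two different ways
                merged[i] = first[i];
            }
            else
            {
                merged[i] = secondChanged ? second[i] : first[i];
            }
        }
        return merged;
    }
}
```

The hard part mentioned above, two users editing the same paragraph, is exactly the case that ends up in the conflicts list; resolving it would need a finer-grained diff inside the paragraph.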
To store images in XML, simply save each file using its hash as the file name and then use that name to link the file from the XML. Git does the same thing and it works nicely. You may want to count references so you can identify unused files.
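A minimal sketch of the hash-as-file-name idea in C#. The class name ContentAddressedStore and the choice of SHA-256 are assumptions made for this example; any stable hash would do:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentAddressedStore
{
    // Derive the file name from the content: lowercase hex SHA-256 plus an extension.
    public static string HashName(byte[] content, string extension)
    {
        using (var sha = SHA256.Create())
        {
            string hex = BitConverter.ToString(sha.ComputeHash(content))
                .Replace("-", "")
                .ToLowerInvariant();
            return hex + extension;
        }
    }

    // Write the file only if it is not already stored, and return the name
    // that the XML document should reference.
    public static string Save(string directory, byte[] content, string extension)
    {
        Directory.CreateDirectory(directory);
        string name = HashName(content, extension);
        string path = Path.Combine(directory, name);
        if (!File.Exists(path))
            File.WriteAllBytes(path, content);
        return name;
    }
}
```

Uploading the same image twice produces the same name, so it is stored once; the XML node then only needs something like an attribute holding that name.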

How to convert an HTML Table in an MVC View to a PDF file, iTextSharp

I've been pointed in the direction of iTextSharp. When I went to download the package from NuGet I noticed something called RazorToPDF, only to run into unsolvable formatting issues because the project is no longer supported.
After more research I was surprised to find there wasn't a similarly worded question on SO.
So guys, what's the best way to convert an HTML page/table in an MVC project to a PDF file?

What's the best way to convert an HTML page/table in an MVC project to a PDF file?
Generally, print it to a PDF from the web browser on the client.
The thing is, by relying on the end-user perspective of the view in this case, you're also relying on the end-user rendering of that view. It's a step that should be removed from this particular equation entirely.
Keep in mind that there are fundamental differences between how an HTML page renders and how a PDF renders. The two aren't 100% interchangeable. A PDF has a static page size and elements are placed absolutely, whereas HTML has dynamic sizes and elements are placed in a flow layout. There are additional considerations such as client-side DOM manipulation that may take place in that view. "Rendering" it quickly becomes a browser-based activity, which is something you shouldn't really need to do server-side.
Instead of thinking of the PDF as an extra step following the rendering of the view, think of it as a view in and of itself, parallel to the other view. One requested action results in the HTML view, another requested action results in the PDF "view". As such, you design the PDF template how you want it to look and populate it with data (using something like iTextSharp) before returning the file contents to the client.

How to create a multiple page invoice in asp.net c#?

I am thoroughly confused about something I want to do and am looking for some advice.
One of my clients has to produce a monthly invoice detailing all of the company's expenditure, plus two other such invoices. The client is sure that he only needs these invoices, and they are simple enough to produce as far as the logic is concerned.
Now, to make the actual invoice, I don't really want to use reporting solutions like Telerik, SSRS, etc., as I think they are overkill for my purpose. At the same time, I am not sure how I can get the printer to print the invoices on neat pages without cutting anything off.
I am very tempted to just output the invoices to a web page and ask my client to print them from there.
Am I looking at this the wrong way? Is this possible?
I could use iTextSharp or something similar to produce PDFs. In fact, I think I will go ahead with that if it isn't possible to just output an HTML page and get the printer to recognize the page breaks somehow.
Because this is a very small job, I don't want to spend too much time on it, as the fee for this freelance project is minimal too.
Printing to a new page is important because my client deals with a few shops and wants to print a separate invoice for each of his customers. I could have him produce and print each customer's invoice separately, but that is not an ideal way to deal with it.
Thanks

There is a CSS property that tells the browser to start a new page when printing: page-break-before.
But if you have a wide range of browsers to support, it would be better to get an HTML-to-PDF conversion library or really use iTextSharp (as far as I know there is even a module/class that converts HTML to PDF with iTextSharp), as printing web pages has many issues.
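If you go the plain HTML route, the page-break-before idea could look something like this in C#. This is only a sketch: InvoicePageBuilder is a made-up name, and each invoice is reduced to a customer name plus a body text just to keep the example short:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class InvoicePageBuilder
{
    // Build one HTML page holding every customer's invoice.
    // Each invoice after the first starts on a new printed page.
    public static string Build(IEnumerable<KeyValuePair<string, string>> invoices)
    {
        var html = new StringBuilder();
        html.Append("<!DOCTYPE html><html><body>");

        bool first = true;
        foreach (var invoice in invoices)
        {
            // No break before the first invoice, otherwise the printout starts with a blank page.
            string style = first ? "" : " style=\"page-break-before: always\"";
            html.Append("<div class=\"invoice\"" + style + ">");
            html.Append("<h1>" + WebUtility.HtmlEncode(invoice.Key) + "</h1>");
            html.Append("<p>" + WebUtility.HtmlEncode(invoice.Value) + "</p>");
            html.Append("</div>");
            first = false;
        }

        html.Append("</body></html>");
        return html.ToString();
    }
}
```

The client can then open the page and print once; how faithfully the breaks are honored still depends on the browser, which is the caveat raised above.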

In the past, when I wanted to create a reusable document, I used Word or Excel XML formats.
See: http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
They are easy to create and tweak; then you only have to regenerate the dynamic parts in your code. Save the document in Office XML format, then open it in WordPad to see where to make your changes.

SSRS has a drag-and-drop interface for designing reports and has a PDF output option. If the data is in a SQL Server database, then even with the learning curve it should be easier to do SSRS reports.

html parsing in c#

How can I parse values from the scoreboard at http://www.cricinfo.com/nzvaus2010/engine/current/match/423789.html
How could this be done? I am stuck on how to fetch the data and store it in a database.

I suggest you start reading; this looks like a good place to start:
Screen Scraping Tutorial using C#
.NET

It won't be easy. Looking at the source of the page, it's all dynamic. You're going to have to pull the JavaScript apart to figure out where it's getting its data from and use that. Conveniently, it's written in jQuery.

Not sure how much data you are trying to get, but there is an RSS feed on the site: http://www.cricinfo.com/rss/livescores.xml
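Reading a feed like that is much simpler than scraping the page. A small sketch with LINQ to XML, assuming the feed follows the usual RSS 2.0 layout (item elements with title and link children); I have not checked the actual structure of this feed, and RssReader is a made-up name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssReader
{
    // Return (title, link) pairs for every <item> in an RSS 2.0 document.
    // Missing elements come back as empty strings instead of null.
    public static List<KeyValuePair<string, string>> ReadItems(string rssXml)
    {
        return XDocument.Parse(rssXml)
            .Descendants("item")
            .Select(item => new KeyValuePair<string, string>(
                (string)item.Element("title") ?? "",
                (string)item.Element("link") ?? ""))
            .ToList();
    }
}
```

You would download the XML first (for example with HttpClient.GetStringAsync), pass the string to ReadItems, and then insert each pair into your database table.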
