C# HTML file encoding ISO-8859-2

I have already downloaded this HTML file: http://mek.oszk.hu/17700/17789/17789.htm
The file uses the ISO-8859-2 charset.
I want to convert it to a PDF file with the IronPdf NuGet package.
I tried this, but it doesn't work:
using (StreamReader stream = new StreamReader(book.Source,Encoding.GetEncoding("ISO-8859-2")))
{
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.Load(stream);
var Renderer = new IronPdf.HtmlToPdf();
var PDF = Renderer.RenderHtmlAsPdf(htmlDocument.DocumentNode.OuterHtml);
var OutputPath = "HtmlToPDF.pdf";
PDF.SaveAs(OutputPath);
System.Diagnostics.Process.Start(OutputPath);
}
My output result:
UPDATE 1: This is the output I want:

For me it's Magyar :) but I obtained a better result with this piece of code:
var Renderer = new IronPdf.HtmlToPdf();
var PDF = Renderer.StaticRenderHTMLFileAsPdf("17789.htm", new IronPdf.PdfPrintOptions() { InputEncoding = Encoding.GetEncoding("ISO-8859-2") });
var OutputPath = "HtmlToPDF.pdf";
PDF.SaveAs(OutputPath);
System.Diagnostics.Process.Start(OutputPath);
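To illustrate why the input encoding matters here, the following minimal sketch decodes the same two bytes once as ISO-8859-2 and once as ISO-8859-1. It is independent of IronPdf and uses only the base class library. The class name and sample bytes are made up for illustration, and the provider registration assumes .NET Core or .NET 5+ (on .NET Framework the encoding is built in, so that line can be removed).

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as ISO-8859-2 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // 0xF5 and 0xFB are the Hungarian letters ő and ű in ISO-8859-2.
        byte[] raw = { 0xF5, 0xFB };

        string asLatin2 = Encoding.GetEncoding("ISO-8859-2").GetString(raw);
        string asLatin1 = Encoding.GetEncoding("ISO-8859-1").GetString(raw);

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(asLatin2); // ő and ű, the intended letters
        Console.WriteLine(asLatin1); // õ and û, the typical mis-decoded result
    }
}
```

If the PDF shows õ or û where ő or ű is expected, the bytes were most likely decoded with the wrong charset somewhere along the way.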

Related

System.InvalidOperationException: ''DocumentRenderer' must be set before calling 'PrepareDocumentRenderer'.'

I'm trying to convert HTML to PDF with PDFsharp and MigraDoc. I call the RenderDocument() function to handle Turkish characters, but after that call I get this error:
System.InvalidOperationException: ''DocumentRenderer' must be set before calling 'PrepareDocumentRenderer'.'
I wrote the code below based on the example at this link:
http://www.pdfsharp.net/wiki/HelloMigraDoc-sample.ashx
protected void btnGeneratePdf_Click(object sender, EventArgs e)
{
string html = "";
using (var client = new WebClient())
{
html = client.DownloadString("http://localhost:14670/WebForm6");
}
PdfGenerateConfig config = new PdfGenerateConfig();
config.PageSize = PageSize.A4;
config.SetMargins(20);
var doc = PdfGenerator.GeneratePdf(html, config);
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
renderer.PdfDocument = doc;
renderer.RenderDocument();
var tmpFile = "C://Users//mutlu.ozkurt//Desktop//Files/tmp372A.pdf";
renderer.PdfDocument.Save(tmpFile);
Process.Start(tmpFile);
}
You are using the HTML Renderer for PDFsharp, which creates a PDF file, not a MigraDoc document. You are mixing this with sample code from MigraDoc, and the two do not work together.
Take the doc variable you get back and save it as a PDF directly, without calling any MigraDoc code.
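A minimal sketch of that suggestion, assuming the HtmlRenderer.PdfSharp NuGet package that the question's PdfGenerator and PdfGenerateConfig come from. The HTML string and the output path are placeholders, not taken from the original code.

```csharp
using PdfSharp;
using PdfSharp.Pdf;
using TheArtOfDev.HtmlRenderer.PdfSharp;

class HtmlToPdfSketch
{
    static void Main()
    {
        // Placeholder HTML; the question downloads it with WebClient instead.
        string html = "<html><body><h1>Hello</h1><p>Sample text</p></body></html>";

        var config = new PdfGenerateConfig();
        config.PageSize = PageSize.A4;
        config.SetMargins(20);

        // GeneratePdf already returns a finished PDFsharp PdfDocument,
        // so no MigraDoc PdfDocumentRenderer is involved.
        PdfDocument doc = PdfGenerator.GeneratePdf(html, config);

        // Hypothetical output path, chosen for illustration.
        doc.Save("output.pdf");
    }
}
```

This only removes the MigraDoc calls that trigger the exception. Whether the Turkish characters render correctly still depends on the HTML's encoding and on the fonts available.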

Ignore warning when creating excel from Memory Stream in C# MVC

I am creating an Excel file by rendering a .cshtml MVC Razor page to a string, converting it to bytes, and writing those bytes to an .xls file. When I open that file in Excel I get a warning message, which I want to suppress or remove. How can I do that? (I don't want to use an external library like NPOI or EPPlus.)
Here is my C# code:
var html = FakeController.RenderViewToString2(ControllerContext, "~/Views/Report/DistirbuterPriceListExcel.cshtml", distributerListModel, true);
var bytes = Encoding.ASCII.GetBytes(html);
MemoryStream ms = new MemoryStream(bytes);
var root = Server.MapPath("~/Uploads/ReportPDFFiles/");
var pathtosave = "../../Uploads/ReportPDFFiles/";
var CompURL = "";
var FileName = "";
var pdfname = String.Format("{0}.xls", Guid.NewGuid().ToString());
var pdfCompleteName = "Distributer-List.xls";
var path = Path.Combine(root, pdfname);
var path2 = Path.Combine(root, pdfCompleteName);
path = Path.GetFullPath(path);
path2 = Path.GetFullPath(path2);
var fileStream2 = new FileStream(path2, FileMode.Create, FileAccess.Write);
fileStream2.Write(bytes, 0, bytes.Length);
fileStream2.Close();
CompURL = pathtosave + pdfCompleteName;
FileName = pdfCompleteName;
return Json(new { CompURL, FileName }, JsonRequestBehavior.AllowGet);
Here distributerListModel is the model passed to the view, and I want to return the URL in the JSON response.
The Excel file is created, and it opens and shows the results, but Excel displays the warning "the file you are trying to open is in different format". Here is the image:
I want to ignore or remove the above warning. How can I do that?
Because the HTML layout of the table is quite complicated, I don't want to use EPPlus. I want to convert the HTML/CSS directly into Excel.

add html content in existing docx file using openxml in C#

How do I add or append HTML content to an existing .docx file using OpenXML in ASP.NET C#?
I want to append the HTML content to an existing Word file.
For example:
In this example, I want to place "This is a Heading" inside an H1 tag.
Here is my code:
protected void Button1_Click(object sender, EventArgs e)
{
try
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"C:\Users\admin\Downloads\WordGenerator\WordGenerator\FTANJS.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(new Justification() { Val = JustificationValues.Center }), run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
}
catch (Exception ex)
{
ex.ToString ();
}
}
Adding the HTML content as an altChunk should work, and you are almost there.
If I understand the question properly, this code should work:
//insert html content to H1 tag
using(WordprocessingDocument fDocx = WordprocessingDocument.Open(sDocxFile,true))
{
string sChunkID = "myhtmlID";
AlternativeFormatImportPart oChunk = fDocx.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, sChunkID);
using(FileStream fs = File.Open(sHtml,FileMode.OpenOrCreate))
{
oChunk.FeedData(fs);
}
AltChunk oAltChunk = new AltChunk();
oAltChunk.Id =sChunkID ;
//insert html to the tag of 'H1' and remove H1.
Body body = fDocx.MainDocumentPart.Document.Body;
Paragraph theParagraph = body.Descendants<Paragraph>().Where(p => p.InnerText == "H1").FirstOrDefault();
theParagraph.InsertAfterSelf<AltChunk>(oAltChunk);
theParagraph.Remove();
fDocx.MainDocumentPart.Document.Save();
}
The short answer is "You can't add HTML to a docx file".
Docx is an open format defined here. If you're using the Microsoft version, it has a number of extensions.
In any case, the file contains XML, not HTML, and you can't simply add HTML to a docx file. There are styles, formatting objects, and pointers that all need to be updated.
If you need to modify a docx file and don't want to do a lot of research and a lot of coding, you'll need to find an existing library to work with.

Is there a way to read a text file from resources without typing the file name?

using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("xyz.project.Folder1.Folder2.SomeFile.Txt"))
{
TextReader tr = new StreamReader(stream);
string fileContents = tr.ReadToEnd();
}
var html = File.ReadAllText(Properties.Resources.FilesTypes);
var html = File.ReadAllText(@"E:\New folder (44)\New Text Document.txt");
In the original, working version I used the last line:
var html = File.ReadAllText(@"E:\New folder (44)\New Text Document.txt");
But then I added the text file to my project resources, since I want the file to be part of the program at all times instead of being read from the hard disk.
Then I added the code that uses the assembly:
using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("xyz.project.Folder1.Folder2.SomeFile.Txt"))
{
TextReader tr = new StreamReader(stream);
string fileContents = tr.ReadToEnd();
}
But in this case I need to type the name of the file manually.
So I tried this line:
var html = File.ReadAllText(Properties.Resources.FilesTypes);
But I get an exception:
Illegal characters in path
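One possible way to avoid typing the full resource name is to enumerate the names the assembly exposes and pick one by suffix. This is only a sketch: the helper name ReadBySuffix is hypothetical, and it assumes the file was added with the "Embedded Resource" build action. A file added through Resources.resx is a different case. There the generated property (here Properties.Resources.FilesTypes) typically already returns the file's text as a string, which would explain the exception, since File.ReadAllText then receives the contents instead of a path.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

static class EmbeddedText
{
    // Hypothetical helper: returns the text of the first embedded resource
    // whose name ends with the given suffix, or null if none matches.
    public static string ReadBySuffix(Assembly assembly, string suffix)
    {
        string name = assembly.GetManifestResourceNames()
            .FirstOrDefault(n => n.EndsWith(suffix, StringComparison.OrdinalIgnoreCase));
        if (name == null)
        {
            return null;
        }

        using (Stream stream = assembly.GetManifestResourceStream(name))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}

class Program
{
    static void Main()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();

        // List every embedded resource name, which shows the exact names in use.
        foreach (string resourceName in assembly.GetManifestResourceNames())
        {
            Console.WriteLine(resourceName);
        }

        string text = EmbeddedText.ReadBySuffix(assembly, ".txt");
        Console.WriteLine(text ?? "(no embedded .txt resource found)");
    }
}
```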

html scrape in .net

In .NET, what is the best way to scrape HTML web pages?
Is there something open source that runs on .NET Framework 2 and puts all the HTML into objects? I have read about "HTML Agility Pack", but is there anything else?
I think HtmlAgilityPack is the best, but you can also use:
Fizzler : CSS selector engine for C#
SgmlReader : converts HTML to valid XML
SharpQuery : alternative to Fizzler
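For reference, a minimal HtmlAgilityPack sketch that loads an HTML string into objects and lists its links. It assumes the HtmlAgilityPack NuGet package is referenced, and the sample HTML is made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class ScrapeSketch
{
    static void Main()
    {
        // Made-up sample markup; in practice this would be the downloaded page.
        string html = "<html><body><a href=\"/a\">First</a><a href=\"/b\">Second</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
            }
        }
    }
}
```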
You might use Tidy.net, a C# wrapper for the Tidy library that converts HTML into XHTML, available here: http://sourceforge.net/projects/tidynet/. That way you get valid XML and can process it as such.
I'd do it this way:
// don't forget to import TidyNet and System.Xml.Linq
var t = new Tidy();
TidyMessageCollection messages = new TidyMessageCollection();
t.Options.Xhtml = true;
//extra options if you plan to edit the result by hand
t.Options.IndentContent = true;
t.Options.SmartIndent = true;
t.Options.DropEmptyParas = true;
t.Options.DropFontTags = true;
t.Options.BreakBeforeBR = true;
string sInput = "your html code goes here";
var bytes = System.Text.Encoding.UTF8.GetBytes(sInput);
StringBuilder sbOutput = new StringBuilder();
var msIn = new MemoryStream(bytes);
var msOut = new MemoryStream();
t.Parse(msIn, msOut, messages);
var bytesOut = msOut.ToArray();
string sOut = System.Text.Encoding.UTF8.GetString(bytesOut);
XDocument doc = XDocument.Parse(sOut);
//process XML as you like
Otherwise, HTML Agility pack is ok.
