C# download HTML page to string asp.net

I'm trying to download an aspx page's HTML from another page, using the following code:
WebClient webClient = new WebClient();
String CompleteReport = webClient.DownloadString(new System.Uri(reportURL));
However, the HTML that is returned contains markup similar to the following:
"\r\n\r\n<!DOCTYPE html>\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n<head><meta charset=\"utf-8\"
What should I do to download the string without these escape sequences?
Thank You!

The string doesn't actually contain those sequences. It contains the characters that they represent (actual carriage return and line feed characters).
You are probably viewing the string in a debugger and the debugger is adding those sequences for you. If you dump it to a file and read it in notepad they won't be there.
See also this answer. If you add ,nq to the variable name in the watch window, the escape sequences will go away.
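A quick way to confirm this without a debugger is to count the literal backslashes in the string. The following is a minimal sketch; EscapeDemo and CountLiteralBackslashes are hypothetical names, and the sample text in the usage note stands in for the downloaded page (an assumption, since no request is made here).

```csharp
using System;

public static class EscapeDemo
{
    // Counts literal backslash characters in the text.
    // A string that really contained the two-character sequence \r
    // (backslash followed by 'r') would give a non-zero count.
    public static int CountLiteralBackslashes(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '\\')
            {
                count++;
            }
        }
        return count;
    }
}
```

For a string such as "\r\n<!DOCTYPE html>\r\n" written as a normal C# literal, CountLiteralBackslashes returns 0, and "\r\n".Length is 2 (one carriage return plus one line feed), which is what DownloadString hands back as well.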

Related

C# .NET: Problems with query strings with character references

I'm having problems creating a query string and sending it to another webpage.
The text I'm trying to send is long and has special characters. Here is an example:
Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.
I can send this just fine if I hand code it:
<a href="Default.cshtml?standardText=Represent a fraction 1/𝘣 on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into 𝘣 equal parts. Recognize that each part has size 1/𝘣 and that the endpoint of the part based at 0 locates the number 1/𝘣 on the number line.">
Link Text
</a>
This goes through without any problems, and I can read the entire Query String on the other side.
But if I am creating the link programmatically, my query string gets cut off right before the first character reference. I am using the following setup in a helper function:
string url = "Default.cshtml";
url += "?standardText=" + standard.text;
Link Text
When I use this, I only get "Understand a Fraction as 1/" and then it stops.
When I look at the page source, the only difference between the links is that the first has actual ampersands and the second has them turned into &amp;
<a href="Default.cshtml?standardText=Understand a fraction 1/&#120355; as the quantity formed by 1 part when a whole is partitioned into &#120355; equal parts; understand a fraction &#120354;/&#119887; as the quantity formed by &#120354; parts of size 1/&#120355;."
So the problem is not really the spaces, but the fact that the & is being interpreted as starting a new query string parameter.
I have tried various things [using HttpUtility.UrlEncode, HttpUtility.UrlEncodeUnicode, Html.Raw, trying to replace spaces with "+"], but the problem isn't with the spaces, it's with how the character references are being handled. When I tried HttpUtility.UrlEncode I got a double-encoding security error.
On the advice of OmG I tried replacing all the &s, #s, and /s using:
url = url.Replace("&","%26");
url = url.Replace("#","%23");
url = url.Replace("/","%2F");
This led to the following link:
All Items
And now when I click on the link I get a different security warning/error:
A potentially dangerous Request.QueryString value was detected from the client (standardText="...raction 1/𝘣 as the qua...").
I don't see why it is so hard to send character references through a QueryString. Is there a way to prevent Razor from converting all my &s to &amp;? The address works fine when it is just plain "&"s.
Update: using URLDecode() on the string does not affect its character entity references, so when I try to decode the string then re-encode it, I still get the double-escape security warning.
Update: on the suggestion of @MikeMcCaughan, I tried using JS, but I am not very knowledgeable about mixing JS and Razor. I tried creating a link by dropping a script into the body like so:
<script type="text/javascript">
var a = document.createElement('a');
var linkText = document.createTextNode("my title text");
a.appendChild(linkText);
a.title = "my title text";
a.href = encodeURIComponent(@url);
document.body.appendChild(a);
</script>
But no link showed up, so I'm obviously doing it wrong.
For reference, when I try to use @Html.Raw(url),
Link Text
The &s are still turned into &amp;s. The link renders as:
Link text

One simple solution is to replace the special characters with their percent-encoded forms, which can be accessed from here.
In the string, replace & with %26 using the string's Replace method. Also replace / with %2F, # with %23, ; with %3B, and space with %20.
You can also do this in C# with the following function:
Server.UrlEncode("<The Url>")
and in JavaScript with the following function (encodeURIComponent rather than encodeURI, because encodeURI leaves &, #, / and ; unescaped):
encodeURIComponent("<The Url>")
Double encoding means encoding a string that has already been encoded. To prevent it, make sure no part of the string is already encoded before you pass it to Server.UrlEncode.
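As an illustration of encoding only the parameter value, here is a minimal sketch that uses Uri.EscapeDataString instead of manual Replace calls. QueryLinkBuilder and Build are hypothetical names, and the sample text in the usage note is a shortened stand-in for standard.text.

```csharp
using System;

public static class QueryLinkBuilder
{
    // Builds "page?name=value" and percent-encodes the name and the value,
    // so characters such as &, #, / and ; cannot be read as URL syntax.
    // The page part is left untouched.
    public static string Build(string page, string name, string value)
    {
        return page + "?" + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }
}
```

Build("Default.cshtml", "standardText", "fraction 1/&#120355; as") returns "Default.cshtml?standardText=fraction%201%2F%26%23120355%3B%20as". Because the result no longer contains a raw &, Razor's HTML encoding of the href has nothing left to turn into &amp;. Encoding alone may not satisfy ASP.NET request validation, which is what raised the "potentially dangerous Request.QueryString" error in the question.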

How to replace a string within a larger string but exclude anything within img tags

I have created a system where I load content from a database field into a literal as the content for an article. I have added the ability to pass a search string via the URL so that it is highlighted on the page. This is done with a replace like the one below:
articleTitle = articleTitle.Replace(searchString, "<span title=\"Searched Term Match\" class=\"SearchedTextTitle\">" + searchString + "</span>");
The issue I have encountered is that my content is all HTML, so it includes the HTML for images and so on. If the alt attributes or image URLs contain the search term, they are also changed by the Replace call above. How can I exclude any content that is inside HTML tags?
Thanks in advance for your help.

You can either use the IndexOf method to build substrings and perform the replacements only on the parts of the HTML you want to affect, or you can use a regex replace, which lets you build more logic into your search.
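The regex route can be sketched as follows: split the HTML into tags and text, then run the replacement only on the text pieces. Highlighter and HighlightOutsideTags are hypothetical names, and the pattern <[^>]*> is a simplification that assumes tags contain no ">" inside attribute values; it is not a full HTML parser.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Wraps every occurrence of searchString that appears outside HTML tags.
    public static string HighlightOutsideTags(string html, string searchString)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(searchString))
        {
            return html;
        }

        string wrapped = "<span title=\"Searched Term Match\" class=\"SearchedTextTitle\">"
            + searchString + "</span>";

        // The capturing group makes Regex.Split keep the tags in the result,
        // so the array alternates: text, tag, text, tag, ...
        string[] pieces = Regex.Split(html, "(<[^>]*>)");

        var result = new StringBuilder();
        for (int i = 0; i < pieces.Length; i++)
        {
            bool isTag = (i % 2 == 1);
            result.Append(isTag ? pieces[i] : pieces[i].Replace(searchString, wrapped));
        }
        return result.ToString();
    }
}
```

With the input <img alt="cat"> a cat and the search term cat, only the trailing word is wrapped in the span; the alt attribute is left alone.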

How do I change a '\' to a '/'

My web program is getting an error when trying to access a file in a code behind C# program that has a backward slash between the directory name and the file name. The address for the file comes into my web page with a query value of 'deaths\bakerd.htm'. The browser, however, converts it to 'deaths%08akerd.htm'.
The url in the webpage reads
'http://localhost:57602/obitm.aspx?url=deaths%08akerd.htm'
and the browser says the web page cannot be found. The page obitm.aspx does exist, so why would it say it doesn't?
If I manually change the value of the query value in Windows Explorer to 'deaths/bakerd.htm' it doesn't do any conversion when coming in as a query value in the browser and I am able to access the file in my C# program.
I tried to change the query value in javascript using
thisurl = url.replace("\\", "/")
but that didn't change anything.
I haven't tried any conversion in my C# program. So how do I programmatically change the '\' to a '/'? I have no idea why this is happening, and it is very confusing. Any help is appreciated.

Just converting \ to / in the URL string won't work for you, because in this case the "\b" has already been turned into the backspace character, which gets encoded as %08 (the hex value of the ASCII backspace character).
To fix this one occurrence, you could convert the "%08" into the string "/b", but there are so many escape sequences and character codes that handling each one this way would be neither productive nor fun.
Where are you getting the original string containing the file name from?
If it is something that you have control over, then convert the "\" to "/" at the point where you read the path and name of the file, before you pass it in a URL to the web app.
You could also URL-encode the path before sending it, so that the backslash becomes %5C and the URL becomes
http://localhost:57602/obitm.aspx?url=deaths%5Cbakerd.htm

Try using a verbatim string by prefixing it with the @ symbol:
string url = @"http://localhost:57602/obitm.aspx?url=deaths\bakerd.htm".Replace("\\", "/");
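To see why the literal matters, here is a minimal sketch; SlashDemo and its two methods are hypothetical names, and Uri.EscapeDataString stands in for whatever encoding the browser applies to the query value.

```csharp
using System;

public static class SlashDemo
{
    // Replaces every backslash in the path with a forward slash.
    public static string ToForwardSlashes(string path)
    {
        return path.Replace('\\', '/');
    }

    // Percent-encodes a value for use in a query string.
    public static string EncodeForQuery(string value)
    {
        return Uri.EscapeDataString(value);
    }
}
```

In a regular literal, "deaths\bakerd.htm" contains a backspace character (U+0008) instead of a backslash and a 'b', so EncodeForQuery returns "deaths%08akerd.htm" and there is no backslash left to replace. With the verbatim literal @"deaths\bakerd.htm", the backslash survives and ToForwardSlashes returns "deaths/bakerd.htm".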

Try thisurl = url.Replace("\\", "/");
It works just like in JavaScript.

To parse a query string parameter, you can use:
NameValueCollection qscoll = HttpUtility.ParseQueryString(querystring);
Here is the MSDN help.
or you can:
HttpUtility.UrlEncode(Request.QueryString["url"]);

How to escape url encoding?

I am creating a link that creates URL parameters that contains links with URL parameters.
The issue is that I have a link like this
http://mydomain/_layouts/test/MyLinksEdit.aspx?auto=true&source=
http://vtss-sp2010hh:8088/AdminReports/helloworld.aspx?pdfid=193
&url=http://vtss-sp2010hh:8088/AdminReports/helloworld.aspx?pdfid=193%26pdfname=5.6%20Upgrade
&title=5.6 Upgrade
This link goes to a bookmark adding page where it reads these parameters.
auto is whether to read the following parameters or not
source is where to go after you finish adding or cancelling
url is the bookmark link
title is the name of the bookmark
The values of url and title get entered into 2 fields. Then the user has to click save or cancel.
The problem is when the bookmark page enters the values into the field, it will decode them.
Then if you try to save, it won't let you, because the pdfname value in the url value has a space in it. The link must not contain any spaces. So basically, I want the value to still contain %20 instead of a space after it has been entered in the field.
There isn't a problem with source, auto, or title, just the url...
Is there a way to solve this? Like maybe a special escape character I can use for the %20?
Note: I cannot modify the bookmark page.
I am using c#/asp.net to create the link and go to it.
Thanks

Since .NET Framework 4.5 you can use WebUtility.UrlEncode.
It resides in System.dll, so it does not require any additional references.
It properly escapes characters for URLs, unlike Uri.EscapeUriString.
It does not have any limit on the length of the string, unlike Uri.EscapeDataString, so it can be used for POST requests.
System.Net.WebUtility.UrlEncode(urlText)
Another option is
System.Uri.EscapeDataString()

Uri.EscapeDataString() and Uri.UnescapeDataString() are safer than the UrlEncode/UrlDecode methods, and they do not convert plus characters into spaces when decoding.
Some details from another user: http://geekswithblogs.net/mikehuguet/archive/2009/08/16/134123.aspx
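The difference in how the two families treat spaces and plus signs can be shown with a minimal sketch. EncoderComparison and its method names are hypothetical; the calls inside are the framework methods named above.

```csharp
using System;
using System.Net;

public static class EncoderComparison
{
    // Form-style encoding: a space becomes '+'.
    public static string FormStyleEncode(string value)
    {
        return WebUtility.UrlEncode(value);
    }

    // URI-style encoding: a space becomes "%20".
    public static string UriStyleEncode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Form-style decoding: '+' is turned back into a space.
    public static string FormStyleDecode(string value)
    {
        return WebUtility.UrlDecode(value);
    }

    // URI-style decoding: '+' is left as it is.
    public static string UriStyleDecode(string value)
    {
        return Uri.UnescapeDataString(value);
    }
}
```

For the input "a b+c", FormStyleEncode returns "a+b%2Bc" and UriStyleEncode returns "a%20b%2Bc". For the input "a+b", FormStyleDecode returns "a b" while UriStyleDecode returns "a+b".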

Just use HttpUtility's UrlEncode method right before you hand off the url:
string encoded = HttpUtility.UrlEncode(url);
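For the question's specific goal (keeping %20 after the bookmark page has decoded the value), one option is to encode the already-encoded value once more, so that each % becomes %25. This is a sketch under the assumption that the page decodes the value exactly one extra time, which the question does not confirm; BookmarkLink and ProtectEscapes are hypothetical names, and Uri.EscapeDataString is used here in place of HttpUtility.UrlEncode.

```csharp
using System;

public static class BookmarkLink
{
    // Encodes a value that already contains percent escapes, so that a
    // single decode on the receiving side yields the original escapes
    // ("%20") instead of the raw characters (" ").
    public static string ProtectEscapes(string alreadyEncoded)
    {
        return Uri.EscapeDataString(alreadyEncoded);
    }
}
```

ProtectEscapes("5.6%20Upgrade") returns "5.6%2520Upgrade", and decoding that once with Uri.UnescapeDataString gives back "5.6%20Upgrade". If the page decodes more than once, the same call would have to be applied once per extra decode.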

Format HtmlEncoded text to ASP

I am taking a string from the database, which will then be HTML-encoded. How do I format newlines and tabs?
I don't think I will be able to use CSS because it is only one string (unless CSS can be used to replace the substring).
One way I've tried is putting <br> and &nbsp; inside the text in the database and then using HttpUtility.HtmlDecode to format it, but I am not sure it is the right way.
Any suggestions and feedback are welcome.

If you are getting an HTML-encoded string from the database, then you just have to use HtmlDecode to decode it, and it will place the tabs and new lines.
Before that, check whether the string is HTML-encoded or whether some other encoding has been used.
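A different approach, not the one described in the answer above, is to keep plain newlines and tabs in the database, encode the text first, and only then insert the markup. The sketch below assumes that four non-breaking spaces are an acceptable rendering of a tab; TextFormatter and ToHtml are hypothetical names.

```csharp
using System;
using System.Net;

public static class TextFormatter
{
    // HTML-encodes the text, then turns newlines into <br> and tabs into
    // four non-breaking spaces. Encoding first means the inserted markup
    // is not escaped, while any markup in the stored text is.
    public static string ToHtml(string text)
    {
        string encoded = WebUtility.HtmlEncode(text ?? string.Empty);
        return encoded
            .Replace("\r\n", "\n")   // normalize Windows line endings
            .Replace("\n", "<br>")
            .Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;");
    }
}
```

ToHtml("a<b\n\tc") returns "a&lt;b<br>&nbsp;&nbsp;&nbsp;&nbsp;c", so the stored text needs no HTML of its own.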
