How to read a URL starting with view-source: in C#

I want to read the following URL and save the content of the page to a text file.
I use the code below to read the page source:
string address = "view-source:http://stackoverflow.com/"; //any web site url
using (WebClient wc = new WebClient())
{
    var Text = wc.DownloadString(address);
}
But it throws the exception "The URI prefix is not recognized."
Any help would be appreciated.
Thanks in advance!

You're using a feature of Chrome by prepending "view-source:" to that URL. The WebClient class doesn't know anything about that feature: it only handles the URI schemes registered with WebRequest (such as http, https, ftp and file), so it complains that the "URI prefix" is not recognized. That unrecognized prefix is the "view-source:" portion of your string.
So remove that part of the URL and you will have a valid URL.
string userInput = "view-source:http://stackoverflow.com/";
string address = userInput.Replace("view-source:", "");
Note: WebClient downloads the HTML without running any scripts, so for web apps that add content with JavaScript after the page loads, the result may differ from what you see in the browser. You might not ultimately get what you want.
Edit: after your comment, it sounds like you want to strip "view-source:" when the URL starts with it, which I have reflected in the answer.
In case you're looking for the "post-JavaScript" source, there's a project on GitHub that offers this feature, but I've never used it. I only know about it because it's maintained by a guy I work with.
You can also find a working example in this repl
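A slightly more defensive version of the same idea, as a sketch: it strips "view-source:" only when it appears at the start (ignoring letter case), checks that what remains is an absolute http or https URL, and then saves the downloaded source to a text file as the question asks. The class and method names (ViewSourcePage, TryNormalize, SaveSource) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;

public static class ViewSourcePage
{
    private const string Prefix = "view-source:";

    // Removes a leading "view-source:" (any letter case) and accepts the rest
    // only if it is an absolute http or https URL.
    public static bool TryNormalize(string userInput, out string address)
    {
        string candidate = userInput.Trim();
        if (candidate.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
        {
            candidate = candidate.Substring(Prefix.Length);
        }

        if (Uri.TryCreate(candidate, UriKind.Absolute, out var uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            address = uri.AbsoluteUri;
            return true;
        }

        address = string.Empty;
        return false;
    }

    // Downloads the page source as served (no JavaScript is run)
    // and writes it to a text file.
    public static void SaveSource(string userInput, string path)
    {
        if (!TryNormalize(userInput, out string address))
        {
            throw new ArgumentException("Not an http(s) URL: " + userInput, nameof(userInput));
        }

        using (var wc = new WebClient())
        {
            File.WriteAllText(path, wc.DownloadString(address));
        }
    }
}
```

Usage would be something like ViewSourcePage.SaveSource("view-source:http://stackoverflow.com/", "page.txt"). On .NET 6 and later WebClient is marked obsolete (it still compiles, with a warning); HttpClient is the suggested replacement there.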

Related

WebClient DownloadString not getting all the information

I'm trying to do the following:
string url = @"https://picasaweb.google.com/data/feed/api/user/bladibla?thumbsize=206c";
WebClient myClient = new WebClient();
string picasaxml = myClient.DownloadString(url);
When I go to the URL (which is not the real URL, of course; "bladibla" is not the actual username), I see all the information.
When I look at the picasaxml, I'm missing parts of the information. Some xml-sections are missing in the document.
Can anyone help me?
-- Update --
The real URL is: https://picasaweb.google.com/data/feed/api/user/tim@boerenbond.be?thumbsize=206c
So if you go to that URL, you should get a lot of information, including the names of some photo albums that were created there.
But when I run the code above, I'm not getting all that information.
OK, I just noticed that when I go to the page on another machine, I'm not getting all the info either.

Retrieve the Original (Client) Url Without the Default Document [duplicate]

I would like to get the exact URL that the user typed into the browser. Of course I could always use something like Request.Url.ToString(), but this does not give me what I want in the following situation:
http://www.mysite.com/rss
With the url above what Request.Url.ToString() would give me is:
http://www.mysite.com/rss/Default.aspx
Does anyone know how to accomplish this?
I have already tried:
Request.Url
Request.RawUrl
this.Request.ServerVariables["CACHE_URL"]
this.Request.ServerVariables["HTTP_URL"]
((HttpWorkerRequest)((IServiceProvider)HttpContext.Current).GetService(typeof(HttpWorkerRequest))).GetServerVariable( "CACHE_URL")
((HttpWorkerRequest)((IServiceProvider)HttpContext.Current).GetService(typeof(HttpWorkerRequest))).GetServerVariable( "HTTP_URL")
Edit: You want the HttpWorkerRequest.GetServerVariable() with the key HTTP_URL or CACHE_URL. Note that the behavior differs between IIS 5 and IIS 6 (see documentation of the keys).
In order to be able to access all server variables (in case you get null), directly access the HttpWorkerRequest:
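HttpWorkerRequest workerRequest =
    (HttpWorkerRequest)((IServiceProvider)HttpContext.Current)
        .GetService(typeof(HttpWorkerRequest));
// Read the server variable directly; its value may differ between IIS versions.
string httpUrl = workerRequest.GetServerVariable("HTTP_URL");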
Remember too that the "exact URL that the user entered" may never be available at the server. Each link in the chain from fingers to server can slightly modify the request.
For example, if I type xheo.com into my browser window, IE will convert it to http://www.xheo.com automatically. Then, when the request gets to IIS, IIS tells the browser that it really wants the default page at http://www.xheo.com/Default.aspx, so the browser responds by asking for the default page.
The same thing happens with HTTP 3xx redirects: the server will likely only ever see the final request made by the browser.
Try using Request.Url.OriginalString.
It might give you what you are looking for.
It is possible; you just need to combine a few of the values from the Request object to rebuild the exact URL entered:
Dim pageUrl As String = String.Format("{0}://{1}{2}",
Request.Url.Scheme,
Request.Url.Host,
Request.RawUrl)
Response.Write(pageUrl)
Entering the address http://yousite.com/?hello returns exactly:
http://yousite.com/?hello
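A C# sketch of the same combination, written as a plain function so it can be tried outside a page. One deliberate difference from the VB snippet: it uses Uri.Authority instead of Host, because Host drops a non-default port such as :8080. It assumes Request.RawUrl holds the path and query as the client sent them; the class and method names are hypothetical.

```csharp
using System;

public static class ClientUrl
{
    // Combines the scheme and authority of the parsed request URL with the
    // raw request target (path and query as the client sent them).
    // Authority, unlike Host, keeps a non-default port.
    public static string Rebuild(Uri requestUrl, string rawUrl)
    {
        return requestUrl.Scheme + "://" + requestUrl.Authority + rawUrl;
    }
}
```

Inside a page you would call it as ClientUrl.Rebuild(Request.Url, Request.RawUrl).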
I think Request.RawUrl is the monkey you are after...
The easiest way to do this is to use client-side script to read the exact URL:
<script language="javascript" type="text/javascript">
document.write (document.location.href);
</script>

Submitting a Google Docs form in C#

I previously found this article about how to submit Google Docs forms via C#.
However, it seems the method has broken since that was written.
If I use the actual online form, it updates immediately (even without refreshing). But if I use this method, it returns success, but no data ever shows up.
Does anyone know how to do this now?
Change the formkey string and try this:
string formkey = "<your form key>"; // placeholder: replace with your form's key
WebClient wc = new WebClient();
var keyval = new NameValueCollection();
keyval.Add("entry.0.single", "aaa");
keyval.Add("entry.1.single", "bbb");
Uri uri = new Uri("https://spreadsheets.google.com/spreadsheet/formResponse?formkey="+formkey);
wc.UploadValuesAsync(uri, "POST", keyval, Guid.NewGuid().ToString());
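UploadValues/UploadValuesAsync sends the NameValueCollection as an application/x-www-form-urlencoded body, the same format a browser uses when it submits the form. As a sketch of what that body looks like (useful when comparing against what the real form posts), the hypothetical helper below builds one by hand. It uses Uri.EscapeDataString, which writes spaces as %20; form decoders treat that as a space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Encodes fields as name=value pairs joined by '&', with each name and
    // value percent-encoded, preserving the order in which they are given.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

For the two fields above this produces entry.0.single=aaa&entry.1.single=bbb.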
Google also provides an API for Google Docs. Using it is a little heavier than this, but it is not a hack the way your linked (broken?) solution is.
The provided link has been updated.
However, as Google is moving from Google Docs to Google Drive, you need to make some modifications to adapt to the changes.

Google Charts over SSL

I need to get the free Google Charts working over SSL without any security errors. I am using C# and ASP.NET.
As Google Charts does not support SSL by default, I am looking for a robust way of using their charts while ensuring my users don't get any security warnings in their browsers.
One thought was to use a handler to call the Charts API and then generate the output my site needs.
This is similar to the approach in the "Pants are optional" blog post, but I haven't been able to get that example working yet.
Any suggestions, or samples are welcome.
Thanks
The Google Charts API is now available over HTTPS at chart.googleapis.com.
Source: http://www.imperialviolet.org/2010/11/29/charthttps.html
We do this automatically in the NetQuarry Platform. It's pretty simple, although it does force the image to come through your site instead of charts.google.com, so the browser runs the request through a single connection.
Since a chart is just a link to an image, what we do is build the link to the chart (a much more complex process, obviously), then append the whole link to the query string of an internal handler (handler.ashx?act=chrt&req=chart&). So the new link looks like this:
handler.ashx?act=chrt&req=chart&cht=p3&chs=450x170&chd=s:HAR9GBA&chl=New|In%20Progress|Responded|Won't%20Respond|On%20Hold|Future|Review|&chg=20,20,1,5&chg=10,25,1,5&chco=0A477D
Then, we simply download the image data and write it back as the response.
Here's the code:
private void GoogleChart(HttpContext cxt)
{
    // The handler URL carries the original chart query after this prefix.
    const string csPrefix = "?act=chrt&req=chart&";
    HttpResponse rsp = cxt.Response;
    string sUrl = cxt.Request.RawUrl;
    int nStart = sUrl.IndexOf(csPrefix, StringComparison.OrdinalIgnoreCase);
    rsp.Clear();
    if (nStart > 0)
    {
        // Rebuild the chart URL and fetch the image on the server side.
        sUrl = "http://chart.apis.google.com/chart?" + sUrl.Substring(nStart + csPrefix.Length);
        byte[] buffer;
        string contentType;
        using (System.Net.WebClient wc = new System.Net.WebClient())
        {
            buffer = wc.DownloadData(sUrl);
            // Pass through the image type reported by the chart server.
            contentType = wc.ResponseHeaders["Content-Type"] ?? "image/png";
        }
        // Write the image bytes back to the browser through this site.
        rsp.ClearContent();
        rsp.ClearHeaders();
        rsp.ContentType = contentType;
        rsp.AppendHeader("Content-Length", buffer.Length.ToString());
        rsp.BinaryWrite(buffer);
    }
}
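The prefix-extraction step in GoogleChart can be pulled out into a small function that is easy to test on its own. This sketch assumes the handler prefix shown above and, unlike the handler, targets the HTTPS endpoint chart.googleapis.com/chart mentioned in the earlier answer; the class and method names are hypothetical.

```csharp
using System;

public static class ChartProxy
{
    // Prefix the handler link uses before the original chart parameters.
    private const string Prefix = "?act=chrt&req=chart&";

    // Assumption: the HTTPS chart endpoint from the earlier answer.
    private const string ChartEndpoint = "https://chart.googleapis.com/chart?";

    // Copies everything after the prefix onto the chart endpoint.
    // Returns false when the raw URL does not contain the prefix.
    public static bool TryBuildChartUrl(string rawUrl, out string chartUrl)
    {
        int start = rawUrl.IndexOf(Prefix, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            chartUrl = string.Empty;
            return false;
        }

        chartUrl = ChartEndpoint + rawUrl.Substring(start + Prefix.Length);
        return true;
    }
}
```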
I have a partial solution that has one issue.
Here is the link to my new post asking for help with a specific problem in my solution:
My Attempt at a SSL handler

Truncating Query String & Returning Clean URL C# ASP.net

I would like to take the original URL, truncate the query string parameters, and return a cleaned-up version of the URL. I would like this to apply across the whole application, so doing it in global.asax would be ideal. I think a 301 redirect would be in order as well.
For example:
in: www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media
out: www.website.com/default.aspx
What would be the best way to achieve this?
System.Uri is your friend here. This has many helpful utilities on it, but the one you want is GetLeftPart:
string url = "http://www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media";
Uri uri = new Uri(url);
Console.WriteLine(uri.GetLeftPart(UriPartial.Path));
This gives the output: http://www.website.com/default.aspx
[The Uri class does require the protocol, http://, to be specified]
GetLeftPart basically says "get the left part of the URI up to and including the part I specify". This can be Scheme (just the http:// bit), Authority (the www.website.com part), Path (the /default.aspx part) or Query (the query string).
Assuming you are on an aspx web page, you can then use Response.Redirect(newUrl) to redirect the caller.
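Since the question asks for this application-wide with a 301, here is a sketch that only reports a clean URL when there is actually a query string to remove, so a clean page is never redirected to itself. The Global.asax wiring is shown as a comment because it needs ASP.NET (HttpResponse.RedirectPermanent exists from ASP.NET 4.0); the names QueryStripper and TryStripQuery are hypothetical.

```csharp
using System;

public static class QueryStripper
{
    // Puts the URL up to and including its path in cleanUrl, and returns
    // true only when the URL actually has a query string. Returning false
    // for an already-clean URL prevents a redirect loop.
    public static bool TryStripQuery(Uri url, out string cleanUrl)
    {
        cleanUrl = url.GetLeftPart(UriPartial.Path);
        return url.Query.Length > 0;
    }
}

// Hypothetical Global.asax wiring:
// protected void Application_BeginRequest(object sender, EventArgs e)
// {
//     if (QueryStripper.TryStripQuery(Request.Url, out string clean))
//     {
//         Response.RedirectPermanent(clean);
//     }
// }
```

As a later answer points out, this throws away any query parameters your pages actually need, so you may want to restrict it to specific paths or parameters.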
Here is a simple trick:
Dim uri = New Uri(Request.Url.AbsoluteUri)
Dim reqURL = uri.GetLeftPart(UriPartial.Path)
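Here is a quick way of getting the root URL without the path and query:
string path = Request.Url.AbsoluteUri.Replace(Request.Url.PathAndQuery,"");
Be aware that when the path is just "/", this Replace removes every slash in the URL (http://site.com/ becomes http:site.com); Request.Url.GetLeftPart(UriPartial.Authority) returns the root without that problem.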
This may look a little better.
string rawUrl = String.Concat(this.GetApplicationUrl(), Request.RawUrl);
if (rawUrl.Contains("/post/"))
{
    bool hasQueryStrings = Request.QueryString.Keys.Count > 1;
    if (hasQueryStrings)
    {
        Uri uri = new Uri(rawUrl);
        rawUrl = uri.GetLeftPart(UriPartial.Path);
        HtmlLink canonical = new HtmlLink();
        canonical.Href = rawUrl;
        canonical.Attributes["rel"] = "canonical";
        Page.Header.Controls.Add(canonical);
    }
}
This is followed by a function (GetApplicationUrl, not shown) that properly fetches the application URL.
Works perfectly.
I'm guessing that you want to do this because you want your users to see pretty looking URLs. The only way to get the client to "change" the URL in its address bar is to send it to a new location - i.e. you need to redirect them.
Are the query string parameters going to affect the output of your page? If so, you'll have to look at how to maintain state between requests (session variables, cookies, etc.) because your query string parameters will be lost as soon as you redirect to a page without them.
There are a few ways you can do this globally (in order of preference):
1. If you have direct control over your server environment, then a configurable server module like ISAPI_ReWrite or the IIS 7.0 URL Rewrite Module is a great approach.
2. A custom IHttpModule is a nice, reusable, roll-your-own approach.
3. You can also do this in global.asax, as you suggest.
You should only use the 301 response code if the resource has indeed moved permanently. Again, this depends on whether your application needs to use the query string parameters. If you use a permanent redirect, a browser that respects the 301 response code will skip loading a URL like .../default.aspx?utm_source=twitter&utm_medium=social-media and load .../default.aspx directly, so your application will never even see the query string parameters.
Finally, you can use POST method requests. This gives you clean URLs and lets you pass parameters in, but will only work with <form> elements or requests you create using JavaScript.
Take a look at the UriBuilder class. You can create one with a url string, and the object will then parse this url and let you access just the elements you desire.
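A minimal sketch of the UriBuilder approach: parse the URL, clear the query (and fragment), and read back the rebuilt absolute URL. The class and method names are made up for illustration.

```csharp
using System;

public static class UriBuilderExample
{
    // Parses the URL with UriBuilder, empties the query and fragment,
    // and returns the rebuilt absolute URL (a non-default port is kept).
    public static string WithoutQuery(string url)
    {
        var builder = new UriBuilder(url)
        {
            Query = string.Empty,
            Fragment = string.Empty
        };
        return builder.Uri.AbsoluteUri;
    }
}
```

Because UriBuilder exposes each component separately, the same pattern works for changing the host or path instead of the query.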
After completing whatever processing you need to do on the query string, just split the url on the question mark:
Dim _CleanUrl As String = Request.Url.AbsoluteUri.Split("?"c)(0)
Response.Redirect(_CleanUrl)
Granted, my solution is in VB.NET, but I'd imagine that it could be ported over pretty easily. And since we are only looking for the first element of the split, it even "fails" gracefully when there is no querystring.
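Ported to C#, the split looks like this (a sketch; the class and method names are hypothetical). When there is no '?', Split returns a single element containing the whole string, which is why the approach "fails" gracefully.

```csharp
public static class SplitExample
{
    // Returns everything before the first '?'; a URL without a query
    // string comes back unchanged.
    public static string BeforeQuery(string absoluteUri)
    {
        return absoluteUri.Split('?')[0];
    }
}
```

In a page this would be Response.Redirect(SplitExample.BeforeQuery(Request.Url.AbsoluteUri)).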
