URL Formats are not supported - c#

I am reading File with File.OpenRead method, I am giving this path
http://localhost:10001/MyFiles/folder/abc.png
I have tried this as well but no luck
http://localhost:10001//MyFiles//abc.png
but its giving
URL Formats are not supported
When I give physical path of my Drive like this,It works fine
d:\MyFolder\MyProject\MyFiles\folder\abc.png
How can I give file path to an Http path?
this is my code
public FileStream GetFile(string filename)
{
FileStream file = File.OpenRead(filename);
return file;
}

Have a look at WebClient (MSDN docs), it has many utility methods for downloading data from the web.
If you want the resource as a Stream, try:
using(WebClient webClient = new WebClient())
{
using(Stream stream = webClient.OpenRead(uriString))
{
using( StreamReader sr = new StreamReader(stream) )
{
Console.WriteLine(sr.ReadToEnd());
}
}
}

You could either use a WebClient as suggested in other answers or fetch the relative path like this:
var url = "http://localhost:10001/MyFiles/folder/abc.png";
var uri = new Uri(url);
var path = Path.GetFileName(uri.AbsolutePath);
var file = GetFile(path);
// ...
In general you should get rid of the absolute URLs.

The best way to download the HTML is by using the WebClient class. You do this like:
private string GetWebsiteHtml(string url)
{
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream);
string result = reader.ReadToEnd();
stream.Dispose();
reader.Dispose();
return result;
}
Then, If you want to further process the HTML to ex. extract images or links, you will want to use technique known as HTML scrapping.
It's currently best achieved by using the HTML Agility Pack.
Also, documentation on WebClient class: MSDN

Here I found this snippet. Might do exactly what you need:
using(WebClient client = new WebClient()) {
string s = client.DownloadFile(new Uri("http://.../abc.png"), filename);
}
It uses the WebClient class.

To convert a file:// URL to a UNC file name, you should use the Uri.LocalPath property, as documented.
In other words, you can do this:
public FileStream GetFile(string url)
{
var filename = new Uri(url).LocalPath;
FileStream file = File.OpenRead(filename);
return file;
}

Related

C# encoding Shift-JIS vs. utf8 html agility pack

i have a problem. My goal is to save some Text from a (Japanese Shift-JS encoded)html into a utf8 encoded text file.
But i don't really know how to encode the text.. The HtmlNode object is encoded in Shift-JS. But after i used the ToString() Method, the content is corrupted.
My method so far looks like this:
public String getPage(String url)
{
String content = "";
HtmlDocument page = new HtmlWeb(){AutoDetectEncoding = true}.Load(url);
HtmlNode anchor = page.DocumentNode.SelectSingleNode("//div[contains(#class, 'article-def')]");
if (anchor != null)
{
content = anchor.InnerHtml.ToString();
}
return content;
}
I tried
Console.WriteLine(page.Encoding.EncodingName.ToString());
and got: Japanese Shift-JIS
But converting the html into a String produces the error. I thought there should be a way, but since documentation for html-agility-pack is sparse and i couldn't really find a solution via google, i'm here too get some hints.
Well, AutoDetectEncoding doesn't really work like you'd expect it to. From what i found from looking at the source code of the AgilityPack, the property is only used when loading a local file from disk, not from an url.
So there's three options. One would be to just set the Encoding
OverrideEncoding = Encoding.GetEncoding("shift-jis")
If you know the encoding will always be the same that's the easiest fix.
Or you could download the file locally and load it the same way you do now but instead of the url you'd pass the file path.
using (var client=new WebClient())
{
client.DownloadFile(url, "20130519-OYT1T00606.htm");
}
var htmlWeb = new HtmlWeb(){AutoDetectEncoding = true};
var file = new FileInfo("20130519-OYT1T00606.htm");
HtmlDocument page = htmlWeb.Load(file.FullName);
Or you can detect the encoding from your content like this:
byte[] pageBytes;
using (var client = new WebClient())
{
pageBytes = client.DownloadData(url);
}
HtmlDocument page = new HtmlDocument();
using (var ms = new MemoryStream(pageBytes))
{
page.Load(ms);
var metaContentType = page.DocumentNode.SelectSingleNode("//meta[#http-equiv='Content-Type']").GetAttributeValue("content", "");
var contentType = new System.Net.Mime.ContentType(metaContentType);
ms.Position = 0;
page.Load(ms, Encoding.GetEncoding(contentType.CharSet));
}
And finally, if the page you are querying returns the content-Type in the response you can look here for how to get the encoding.
Your code would of course need a few more null checks than mine does. ;)

Copy a PDF Stream to File

I'm calling a routine in PHP (TCPDF) from C# via WebRequest using StreamReader. The PDF file is returned as a stream and stored in a string (obv). I know the data being returned to the string is actually a PDF file, as I've tested it in PHP. I'm having a hard time writing the string to a file and actually getting a valid PDF in C#. I know it has something to do with the way I'm trying to encode the file, but the several things I've tried have resulted in 'Not today, Padre' (i.e. they didn't work)
Here's the class I'm using to perform the request (thanks to user 'Paramiliar' for the example I'm using/borrowed/stole):
public class httpPostData
{
WebRequest request;
WebResponse response;
public string senddata(string url, string postdata)
{
// create the request to the url passed in the paramaters
request = (WebRequest)WebRequest.Create(url);
// set the method to POST
request.Method = "POST";
// set the content type and the content length
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postdata.Length;
// convert the post data into a byte array
byte[] byteData = Encoding.UTF8.GetBytes(postdata);
// get the request stream and write the data to it
Stream dataStream = request.GetRequestStream();
dataStream.Write(byteData, 0, byteData.Length);
dataStream.Close();
// get the response
response = request.GetResponse();
dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
// read the response
string serverresponse = reader.ReadToEnd();
//Console.WriteLine(serverresponse);
reader.Close();
dataStream.Close();
response.Close();
return serverresponse;
}
} // end class httpPostData
...and my call to it
httpPostData myPost = new httpPostData();
// postData defined (not shown)
string response = myPost.senddata("http://www.example.com/pdf.php", postData);
In case it isn't clear, I'm stuck writing string response to a valid .pdf file. I've tried this (Thanks to user Adrian):
static public void SaveStreamToFile(string fileFullPath, Stream stream)
{
if (stream.Length == 0) return;
// Create a FileStream object to write a stream to a file
using (FileStream fileStream = System.IO.File.Create(fileFullPath, (int)stream.Length))
{
// Fill the bytes[] array with the stream data
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
// Use FileStream object to write to the specified file
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
}
..and the call to it:
string location = "C:\\myLocation\\";
SaveStreamToFile(location, response); // <<-- this throws an error b/c 'response' is a string, not a stream. New to C# and having some basic issues with things like this
I think I'm close...a nudge in the right direction would be greatly appreciated.
You can use WebClient. Use the method DownloadFile, or the async ones.
Have fun!
Fernando.-
Sorry, I haven't read your comments till now.
I guess you have already done this...
But this may help you (just replace urls and paths) (from: http://msdn.microsoft.com/en-us/library/ez801hhe.aspx )
string remoteUri = "http://www.contoso.com/library/homepage/images/";
string fileName = "ms-banner.gif", myStringWebResource = null;
// Create a new WebClient instance.
WebClient myWebClient = new WebClient();
// Concatenate the domain with the Web resource filename.
myStringWebResource = remoteUri + fileName;
Console.WriteLine("Downloading File \"{0}\" from \"{1}\" .......\n\n", fileName, myStringWebResource);
// Download the Web resource and save it into the current filesystem folder.
myWebClient.DownloadFile(myStringWebResource,fileName);
Console.WriteLine("Successfully Downloaded File \"{0}\" from \"{1}\"", fileName, myStringWebResource);
Console.WriteLine("\nDownloaded file saved in the following file system folder:\n\t" + Application.StartupPath);

Howto: download a file keeping the original name in C#?

I have files on a server that can be accessed from a URL formatted like this:
http:// address/Attachments.aspx?id=GUID
I have access to the GUID and need to be able to download multiple files to the same folder.
if you take that URL and throw it in a browser, you will download the file and it will have the original file name.
I want to replicate that behavior in C#. I have tried using the WebClient class's DownloadFile method, but with that you have to specify a new file name. And even worse, DownloadFile will overwrite an existing file. I know I could generate a unique name for every file, but i'd really like the original.
Is it possible to download a file preserving the original file name?
Update:
Using the fantastic answer below to use the WebReqest class I came up with the following which works perfectly:
public override void OnAttachmentSaved(string filePath)
{
var webClient = new WebClient();
//get file name
var request = WebRequest.Create(filePath);
var response = request.GetResponse();
var contentDisposition = response.Headers["Content-Disposition"];
const string contentFileNamePortion = "filename=";
var fileNameStartIndex = contentDisposition.IndexOf(contentFileNamePortion, StringComparison.InvariantCulture) + contentFileNamePortion.Length;
var originalFileNameLength = contentDisposition.Length - fileNameStartIndex;
var originalFileName = contentDisposition.Substring(fileNameStartIndex, originalFileNameLength);
//download file
webClient.UseDefaultCredentials = true;
webClient.DownloadFile(filePath, String.Format(#"C:\inetpub\Attachments Test\{0}", originalFileName));
}
Just had to do a little string manipulation to get the actual filename. I'm so excited. Thanks everyone!
As hinted in comments, the filename will be available in Content-Disposition header. Not sure about how to get its value when using WebClient, but it's fairly simple with WebRequest:
WebRequest request = WebRequest.Create("http://address/Attachments.aspx?id=GUID");
WebResponse response = request.GetResponse();
string originalFileName = response.Headers["Content-Disposition"];
Stream streamWithFileBody = response.GetResponseStream();

Saving a GIF from a URL

I want to save a GIF driven from an internet address.
When I request the file from the browser, I receive these characters:
GIF87a½€ÿÿÿ,½þ„©Ëí£œ´Ú‹³Þ¼û†âH–æ‰Ádk
ÇòòµuÏú>æùôãEX¯–Š¨²µ‚‰"<}O)Ò¨¬:kÜ'Õª]þ¢`ìc
­Ú”ìvöhe;Ï]¸;oßÂyþ|Þ‡µe§'hÄÔdhW÷§H#&gö¨ð…G™Gy(‰ x÷j†øyÙˆI8uJZZiª
™ªÚº¹i {ç‰Zj™ˆwšµ7
©;é›9Æy¤VÄ$ú;œË›ÙÛI¬,œ=}|,l¼Û˧¸ìæ%]h~n»8»ÌíIv—Fe6¿‹¸®0YÉ
J7k8”•pÕá C¬4Èráâ;zü2¤È‘$Kš
Can I retrieve GIF format with these characters?
A GIF is a binary file. You shouldn't be reading it as a string. Read the GIF into a byte[], and then save those bytes to a file.
Just do this:
string url = "http://address.com/getFile?s=1234";
string filePath = #"c:\MyFolder\file.gif";
using (WebClient client = new WebClient())
{
client.DownloadFile(url, filePath);
}
and do not forget to add this at the beginning:
using System.Net;
thanks for your answers.
I've tried a way and I found the Issue
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Accept = "text/html, text/plain";
request.AllowAutoRedirect = true;
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
Bitmap bmp = new Bitmap(Image.FromStream(stream, true, false));
bmp.Save("image.gif");
Judging from what I know about file formats and the web, the characters are representations ASCII or another encoding of binary. You may still be able to save it using the BinaryWriter.Write() Method.

How to get plaintext from the response of a WebRequest class in C#

I want to get plain text using WebRequest class, just like what we get when we use webbrowser1.Document.Body.InnerText . I have tried the following code
public string request_Resource()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
Stream stream = request.GetResponse().GetResponseStream();
StreamReader sr = new StreamReader(stream);
WebBrowser wb = new WebBrowser();
wb.DocumentText = sr.ReadToEnd();
return wb.Document.Body.InnerText;
}
when i execute this is get a NullReferenceException.
Is there a better way to get a plain text.
Note: I cannot use webbrowser control directly to load the webpage, because, i don't want to deal with all those events that fire up multiple times when ever a page is loaded.
UPDATE: I have changed my code to use WebClient Class instead of WebRequest upon suggestion
My code looks something like this now
public string request_Resource()
{
WebClient wc = new WebClient();
wc.Proxy = null;
//The user agent header is added to avoid any possible errors
wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C)");
return wc.DownloadString(myurl);
}
I am considering using HTML Utility Pack, can anyone suggest any better alternative.
You're looking for the HTML Agility Pack, which can parse the HTML without IE.
It has an InnerText property.
To answer your question, you need to wait for the browser to parse the text.
By the way, you should use the WebClient class instead of WebRequest.
Use webclient:
public string request_Resource()
{
WebClient wc = new WebClient();
byte[] data = wc.DownloadData(myuri);
return Encoding.UTF8.GetString(data);
}
This will give you the content of the website. Then you can use HtmlAgilityPack to parse the result.
If you need just plain HTML text, then you have already wrote that code.
public string request_Resource()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
Stream stream = request.GetResponse().GetResponseStream();
StreamReader sr = new StreamReader(stream);
return sr.ReadToEnd();
}

Categories

Resources