When downloading an XML response from a REST API, I cannot get .NET to download the full XML document on many requests. In each case, I'm missing the last several characters of the XML file, which means I can't parse it. The requests work fine in a browser.
I have tried WebResponse.GetResponseStream() with a StreamReader. Within the StreamReader I have tried Read(...) with a buffer, ReadLine(), and ReadToEnd() to build a string for the response. Wondering if there was a bug in my code, I also tried WebClient.DownloadString(url), with the same result, and XmlDocument.Load(url), which just throws an exception (unexpected end of file while parsing ____).
I know for a fact that this API has had some encoding issues in the past, so I've tried specifying multiple different encodings (e.g., UTF-8, iso-8859-1) for the StreamReader as well as letting .NET detect the encoding. Changing the encoding seems to result in a different number of characters that get left off the end.
Is there any way I can detect the proper encoding myself? How does a browser do it? Is there somewhere in any browser to see the actual encoding the response is using (not what the HTTP headers say it's returning)? Any other methods of getting a string response from a web site with an unknown encoding?
StreamReader sample code
StringBuilder sb = new StringBuilder();
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    using (Stream stream = resp.GetResponseStream())
    {
        using (StreamReader sr = new StreamReader(stream))
        {
            char[] buffer = new char[4096];
            int charsRead;
            while ((charsRead = sr.Read(buffer, 0, buffer.Length)) > 0)
            {
                sb.Append(buffer, 0, charsRead);
            }
        }
    }
}
WebClient sample code
WebClient wc = new WebClient();
string text = wc.DownloadString(url);
XmlDocument sample code
XmlDocument doc = new XmlDocument();
doc.Load(url);
Related
I am creating a voice authentication system. For that I am using a third-party API which stores my wav file, and when I call GET it returns the RIFF-encoded content as a string in the response.
I am not able to figure out a way to convert this RIFF data into a wav file.
I tried the code below; it creates a wav file, but the wav is corrupted:
using (var response = await httpClient.GetAsync(""))
{
    string responseData = await response.Content.ReadAsStringAsync();
    using (BinaryWriter writer = new BinaryWriter(System.IO.File.Open(@"C:\wavFile.wav", FileMode.Create)))
    {
        byte[] data = System.Text.Encoding.UTF8.GetBytes(responseData);
        writer.Write(data);
    }
}
I tried ASCII as well as UTF8, but got the same result. Can anyone help?
You should not read the response as a string. A WAV file is binary data, which may contain byte sequences that are not valid in strings.
The WebClient (MSDN) can download binary data directly, with no conversion needed.
using (var webClient = new WebClient())
{
    byte[] wave = webClient.DownloadData("...");
    // e.g. save the raw bytes straight to disk:
    System.IO.File.WriteAllBytes(@"C:\wavFile.wav", wave);
}
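Since the question already uses HttpClient, the equivalent there (a sketch; the URL stays elided as in the question) is to read the body as bytes instead of as a string:
using (var response = await httpClient.GetAsync(""))
{
    // Read the body as raw bytes - no string decode, so nothing gets corrupted.
    byte[] wave = await response.Content.ReadAsByteArrayAsync();
    System.IO.File.WriteAllBytes(@"C:\wavFile.wav", wave);
}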
I downloaded a webpage, and it contains a paragraph with this type of quotation mark:
“I simply extracted this line from the html page”
but when I write the lines to a file, the “ character is not shown properly.
WebClient wc = new WebClient();
Stream strm = wc.OpenRead("http://images.thenews.com.pk/21-08-2013/ethenews/t-24895.htm");
StreamReader sr = new StreamReader(strm);
StreamWriter sw = new StreamWriter(@"D:\testsharp.txt");
String line;
Console.WriteLine(sr.CurrentEncoding);
while ((line = sr.ReadLine()) != null) {
    sw.WriteLine(line);
}
sw.Close();
strm.Close();
If all you want to do is write the file to disk, then either use the Stream API directly, or (even easier) just use:
wc.DownloadFile("http://images.thenews.com.pk/21-08-2013/ethenews/t-24895.htm",
    @"D:\testsharp.txt");
If you don't treat it as binary, then you need to worry about encodings - and it isn't enough just to look at sr.CurrentEncoding, because we can't be sure that it detected it correctly. It could be that the encoding was reported in the HTTP headers, which would be nice. It could also be that the encoding is specified in a BOM at the start of the payload. However, in the case of HTML the encoding could also be specified inside the HTML. In all three cases, treating the file as binary will improve things (for the BOM and inside-the-html cases, it will fix it entirely).
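For example, a rough sketch of the binary-first approach (reusing the URL from the question, and assuming the usual System.IO / System.Net / System.Text usings):
using (var wc = new WebClient())
{
    // Download the raw bytes; no text decoding happens here.
    byte[] raw = wc.DownloadData("http://images.thenews.com.pk/21-08-2013/ethenews/t-24895.htm");
    // Byte-for-byte copy to disk; encoding is irrelevant for this step.
    File.WriteAllBytes(@"D:\testsharp.txt", raw);
    // Decode separately; detectEncodingFromByteOrderMarks lets a BOM win over
    // the UTF-8 fallback. A <meta charset> inside the HTML would still need a
    // manual check against the decoded text.
    using (var reader = new StreamReader(new MemoryStream(raw), Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true))
    {
        string html = reader.ReadToEnd();
    }
}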
I am using C# on a Windows Mobile 6.1 device, with Compact Framework 3.5.
I am getting an OutOfMemoryException when loading in a large string of XML.
The handset has limited memory, but it should still have more than enough to handle the size of the XML string. The string contains the base64 contents of a 2 MB file; the code works when the XML string contains files of up to 1.8 MB.
I am completely puzzled as to what to do. Not sure how to change any memory settings.
I have included a condensed copy of the code below. Any help is appreciated.
// (condensed - myRequest and the outgoing data byte[] are set up earlier)
string responseFromServer = null;
bool comm_error = false;
try {
    Stream newStream = myRequest.GetRequestStream();
    // Send the data.
    newStream.Write(data, 0, data.Length);
    // Close the write stream.
    newStream.Close();
    // Get the response.
    HttpWebResponse response = (HttpWebResponse)myRequest.GetResponse();
    // Get the stream containing content returned by the server.
    Stream dataStream = response.GetResponseStream();
    // Process the return: set up the buffer.
    byte[] server_response_buffer = new byte[8192];
    int response_count = -1;
    string tempString = null;
    StringBuilder response_sb = new StringBuilder();
    // Loop through the stream until it has all been read.
    do
    {
        // Read content into the buffer.
        response_count = dataStream.Read(server_response_buffer, 0, server_response_buffer.Length);
        // Translate from bytes to ASCII text.
        tempString = Encoding.ASCII.GetString(server_response_buffer, 0, response_count);
        // Append the text to the string builder.
        response_sb.Append(tempString);
    }
    while (response_count != 0);
    responseFromServer = response_sb.ToString();
    // Clean up the streams and the response.
    dataStream.Close();
    response.Close();
}
catch {
    MessageBox.Show("There was an error with the communication.");
    comm_error = true;
}
if (comm_error == false) {
    // Load the xml into an XML document object.
    XmlDocument xdoc = new XmlDocument();
    xdoc.LoadXml(responseFromServer);
}
The error occurs on the xdoc.LoadXml line. I have tried writing the stream to a file and then loading the file directly into the XmlDocument, but it was no better.
Completely stumped at this point.
I would recommend that you use the XmlTextReader class instead of the XmlDocument class. I am not sure what your requirements are for reading the XML, but XmlDocument is very memory-intensive, as it creates numerous objects and attempts to load the entire XML string. The XmlTextReader class, on the other hand, simply scans through the XML as you read from it.
Assuming you have the string, this means you would do something like the following
String xml = "<someXml />";
using (StringReader textReader = new StringReader(xml)) {
    using (XmlTextReader xmlReader = new XmlTextReader(textReader)) {
        xmlReader.MoveToContent();
        xmlReader.Read();
        // the reader is now pointed at the first element in the document...
    }
}
Have you tried loading from a stream instead of from a string? (This is different from writing to a stream, because in your example you are still trying to load it all at once into memory with the XmlDocument.)
There are other .NET components for XML files that work with the XML as a stream instead of loading it all at once. The problem is that .LoadXml probably tries to process the entire document at once, loading it into memory. Not only that, but you've already loaded it into a string, so it exists in two different forms in memory, further increasing the chance that you do not have enough free contiguous memory available.
What you want is some way to read it piecemeal into a stream through an XmlReader, so that you can begin reading the XML document piecewise without loading the entire thing into memory. Of course there are limitations to this approach, because an XmlReader is forward-only and read-only, and whether it will work depends on what you are wanting to do with the XML once it is loaded.
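A minimal sketch of that approach, assuming the HttpWebResponse from the question is still in scope (XmlReader.Create should also exist on Compact Framework 3.5, but verify on the device; the "file" element name is just a hypothetical placeholder):
using (Stream dataStream = response.GetResponseStream())
using (XmlReader reader = XmlReader.Create(dataStream))
{
    while (reader.Read())
    {
        // Handle each node as it streams past instead of materializing the
        // whole document, e.g. pull out the base64 payload:
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "file")
        {
            string base64 = reader.ReadElementContentAsString();
        }
    }
}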
I'm unsure why you are reading the xml in the way you are, but it could be very memory inefficient. If the garbage collector hasn't kicked in you could have 3+ copies of the document in memory: in the string builder, in the string and in the XmlDocument.
Much better to do something like:
XmlDocument xDoc = new XmlDocument();
Stream dataStream = null;
try {
    dataStream = response.GetResponseStream();
    xDoc.Load(dataStream);
} catch {
    MessageBox.Show("There was an error with the communication.");
} finally {
    if (dataStream != null)
        dataStream.Dispose();
}
I need to download a large file (2 GB) over HTTP in a C# console application. Problem is, after about 1.2 GB, the application runs out of memory.
Here's the code I'm using:
WebClient request = new WebClient();
request.Credentials = new NetworkCredential(username, password);
byte[] fileData = request.DownloadData(baseURL + fName);
As you can see... I'm reading the file directly into memory. I'm pretty sure I could solve this if I were to read the data back from HTTP in chunks and write it to a file on disk.
How could I do this?
If you use WebClient.DownloadFile you could save it directly into a file.
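For example, reusing the variables from the question (the output path here is just a placeholder):
var client = new WebClient();
client.Credentials = new NetworkCredential(username, password);
// DownloadFile streams to disk as it reads, so the 2 GB never sits in memory.
client.DownloadFile(baseURL + fName, @"C:\downloads\file.bin");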
The WebClient class is the one for simplified scenarios. Once you get past simple scenarios (and you have), you'll have to fall back a bit and use WebRequest.
With WebRequest, you'll have access to the response stream, and you'll be able to loop over it, reading a bit and writing a bit, until you're done.
From the Microsoft documentation:
We don't recommend that you use WebRequest or its derived classes for
new development. Instead, use the System.Net.Http.HttpClient class.
Source: learn.microsoft.com/WebRequest
Example:
public void MyDownloadFile(Uri url, string outputFilePath)
{
    const int BUFFER_SIZE = 16 * 1024;
    using (var outputFileStream = File.Create(outputFilePath, BUFFER_SIZE))
    {
        var req = WebRequest.Create(url);
        using (var response = req.GetResponse())
        {
            using (var responseStream = response.GetResponseStream())
            {
                var buffer = new byte[BUFFER_SIZE];
                int bytesRead;
                do
                {
                    bytesRead = responseStream.Read(buffer, 0, BUFFER_SIZE);
                    outputFileStream.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);
            }
        }
    }
}
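For new development, per the docs quoted above, a rough HttpClient equivalent of the same loop might look like this (a sketch; ResponseHeadersRead is what stops HttpClient from buffering the whole body first):
public async Task MyDownloadFileAsync(Uri url, string outputFilePath)
{
    using (var client = new HttpClient())
    using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
    using (var responseStream = await response.Content.ReadAsStreamAsync())
    using (var outputFileStream = File.Create(outputFilePath))
    {
        // CopyToAsync does the read-a-bit/write-a-bit loop internally.
        await responseStream.CopyToAsync(outputFileStream);
    }
}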
Note that if WebClient.DownloadFile works, then I'd call it the best solution. I wrote the above before the "DownloadFile" answer was posted. I also wrote it way too early in the morning, so a grain of salt (and testing) may be required.
You need to get the response stream and then read in blocks, writing each block to a file to allow memory to be reused.
As you have written it, the whole response, all 2 GB, needs to be in memory. Even on a 64-bit system, that will hit the 2 GB limit for a single .NET object.
Update: an easier option is to get WebClient to do the work for you: its DownloadFile method will put the data directly into a file.
WebClient.OpenRead returns a Stream; just use Read to loop over the contents, so the data is not buffered in memory but can be written to a file in blocks.
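For example (a sketch; on .NET 4.0+, Stream.CopyTo does the block-by-block loop for you, and the output path is a placeholder):
using (var client = new WebClient())
{
    client.Credentials = new NetworkCredential(username, password);
    using (Stream source = client.OpenRead(baseURL + fName))
    using (FileStream target = File.Create(@"C:\downloads\file.bin"))
    {
        source.CopyTo(target); // buffered copy in chunks; memory use stays constant
    }
}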
I would use something like this: the connection can be interrupted, so it is better to download the file in small chunks.
Akka Streams can help download a file in small chunks from a System.IO.Stream using multithreading. https://getakka.net/articles/intro/what-is-akka.html
The Download method will append the bytes to the file, starting at fileStart. If the file does not exist, fileStart must be 0.
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.IO;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.IO;

private static Sink<ByteString, Task<IOResult>> FileSink(string filename)
{
    return Flow.Create<ByteString>()
        .ToMaterialized(FileIO.ToFile(new FileInfo(filename), FileMode.Append), Keep.Right);
}

private async Task Download(string path, Uri uri, long fileStart)
{
    using (var system = ActorSystem.Create("system"))
    using (var materializer = system.Materializer())
    {
        HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
        request.AddRange(fileStart);
        using (WebResponse response = request.GetResponse())
        {
            Stream stream = response.GetResponseStream();
            await StreamConverters.FromInputStream(() => stream, chunkSize: 1024)
                .RunWith(FileSink(path), materializer);
        }
    }
}
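Usage from an async method might then look like this (URL and path are placeholders):
// Resume from the current file length, or start at 0 if nothing is on disk yet.
string path = @"C:\downloads\file.bin";
long fileStart = File.Exists(path) ? new FileInfo(path).Length : 0;
await Download(path, new Uri("http://example.com/bigfile"), fileStart);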
I'm having trouble reading a "chunked" response when using a StreamReader to read the stream returned by GetResponseStream() of an HttpWebResponse:
// response is an HttpWebResponse
StreamReader reader = new StreamReader(response.GetResponseStream());
string output = reader.ReadToEnd(); // throws exception...
When the reader.ReadToEnd() method is called I'm getting the following System.IO.IOException: Unable to read data from the transport connection: The connection was closed.
The above code works just fine when server returns a "non-chunked" response.
The only way I've been able to get it to work is to use HTTP/1.0 for the initial request (instead of HTTP/1.1, the default) but this seems like a lame work-around.
Any ideas?
@Chuck
Your solution works pretty well. It still throws the same IOException on the last Read(), but after inspecting the contents of the StringBuilder it looks like all the data has been received. So perhaps I just need to wrap the Read() in a try-catch and swallow the "error".
Haven't tried this with a "chunked" response, but would something like this work?
StringBuilder sb = new StringBuilder();
Byte[] buf = new byte[8192];
Stream resStream = response.GetResponseStream();
string tmpString = null;
int count = 0;
do
{
    count = resStream.Read(buf, 0, buf.Length);
    if (count != 0)
    {
        tmpString = Encoding.ASCII.GetString(buf, 0, count);
        sb.Append(tmpString);
    }
} while (count > 0);
I am working on a similar problem. The .NET HttpWebRequest and HttpWebResponse handle cookies and redirects automatically, but they do not automatically handle chunked content in the response body.
This is perhaps because chunked content may contain more than simple data (i.e., chunk extensions and trailing headers).
Simply reading the stream and ignoring the EOF exception will not work, as the stream contains more than the desired content. The stream will contain chunks, and each chunk begins by declaring its size. If the stream is simply read from beginning to end, the final data will contain the chunk metadata (and in the case of gzipped content, it will fail the CRC check when decompressing).
To solve the problem it is necessary to manually parse the stream, removing the chunk size from each chunk (as well as the CR LF delimiters), detecting the final chunk and keeping only the chunk data; a rough sketch follows the links below. There is likely a library out there somewhere that does this, but I have not found it yet.
Useful resources:
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
https://www.rfc-editor.org/rfc/rfc2616#section-3.6.1
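A rough sketch of the manual parsing described above (the names and structure here are mine and untested; real code would also need to handle any trailing headers after the final chunk):
using System;
using System.IO;
using System.Text;

static class Dechunker
{
    // Decode a chunked body from a raw stream: read the hex size line,
    // then that many bytes of data, then the CRLF, until a zero-size chunk.
    public static byte[] ReadChunked(Stream raw)
    {
        using (var result = new MemoryStream())
        {
            while (true)
            {
                string sizeLine = ReadLine(raw); // e.g. "1a3f" or "1a3f;some-extension"
                int semi = sizeLine.IndexOf(';');
                if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);
                int size = Convert.ToInt32(sizeLine.Trim(), 16);
                if (size == 0) break; // final zero-length chunk marks the end
                byte[] chunk = new byte[size];
                int read = 0;
                while (read < size)
                {
                    int n = raw.Read(chunk, read, size - read);
                    if (n == 0) throw new IOException("Unexpected end of stream inside a chunk.");
                    read += n;
                }
                result.Write(chunk, 0, size);
                ReadLine(raw); // consume the CRLF that follows the chunk data
            }
            return result.ToArray();
        }
    }

    private static string ReadLine(Stream s)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n')
            if (b != '\r') sb.Append((char)b);
        return sb.ToString();
    }
}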
I've had the same problem (which is how I ended up here :-). I eventually tracked it down to the fact that the chunked stream wasn't valid: the final zero-length chunk was missing. I came up with the following code, which handles both valid and invalid chunked streams.
using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    StringBuilder sb = new StringBuilder();
    try
    {
        while (!sr.EndOfStream)
        {
            sb.Append((char)sr.Read());
        }
    }
    catch (System.IO.IOException)
    { }
    string content = sb.ToString();
}
After trying a lot of snippets from StackOverflow and Google, I ultimately found this to work best (assuming you know the data is a UTF-8 string; if not, you can just keep the byte array and process it appropriately):
var responseStream = response.GetResponseStream();
var reader = new StreamReader(responseStream, Encoding.UTF8);
byte[] data = Encoding.UTF8.GetBytes(reader.ReadToEnd());
return Encoding.Default.GetString(data);
I found other variations work most of the time, but occasionally truncate the data. I got this snippet from:
https://social.msdn.microsoft.com/Forums/en-US/4f28d99d-9794-434b-8b78-7f9245c099c4/problems-with-httpwebrequest-and-transferencoding-chunked?forum=ncl
Funnily enough, while playing with the request headers, I removed "Accept-Encoding: gzip,deflate", and the server in my use case then answered in plain ASCII, no longer with chunked, encoded snippets. Maybe you should give it a try and leave "Accept-Encoding: gzip,deflate" out. The idea came while reading the Wikipedia article mentioned above, in the section about compression.
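If it helps anyone reproduce this: as far as I know, HttpWebRequest only adds that header itself when automatic decompression is turned on, so keeping it away can be as simple as (a sketch):
var req = (HttpWebRequest)WebRequest.Create(url);
// The default is None; with it, no Accept-Encoding header is added automatically,
// so the server has no reason to pick gzip/deflate on its own.
req.AutomaticDecompression = DecompressionMethods.None;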