I'm having trouble reading a "chunked" response when using a StreamReader to read the stream returned by GetResponseStream() of an HttpWebResponse:
// response is an HttpWebResponse
StreamReader reader = new StreamReader(response.GetResponseStream());
string output = reader.ReadToEnd(); // throws exception...
When the reader.ReadToEnd() method is called I'm getting the following System.IO.IOException: Unable to read data from the transport connection: The connection was closed.
The above code works just fine when the server returns a "non-chunked" response.
The only way I've been able to get it to work is to use HTTP/1.0 for the initial request (instead of HTTP/1.1, the default) but this seems like a lame work-around.
Any ideas?
@Chuck
Your solution works pretty well. It still throws the same IOException on the last Read(), but after inspecting the contents of the StringBuilder it looks like all the data has been received. So perhaps I just need to wrap the Read() in a try-catch and swallow the "error".
Haven't tried this with a "chunked" response, but would something like this work?
StringBuilder sb = new StringBuilder();
byte[] buf = new byte[8192];
Stream resStream = response.GetResponseStream();
string tmpString = null;
int count = 0;
do
{
    // Fill the buffer with data from the stream
    count = resStream.Read(buf, 0, buf.Length);
    if (count != 0)
    {
        // Translate the bytes to ASCII text and accumulate
        tmpString = Encoding.ASCII.GetString(buf, 0, count);
        sb.Append(tmpString);
    }
} while (count > 0);
I am working on a similar problem. The .NET HttpWebRequest and HttpWebResponse handle cookies and redirects automatically, but they do not handle chunked content on the response body automatically.
This is perhaps because chunked content may contain more than simple data (e.g., chunk sizes, chunk extensions, trailing headers).
Simply reading the stream and ignoring the EOF exception will not work, as the stream contains more than the desired content. The stream contains chunks, and each chunk begins by declaring its size. If the stream is simply read from beginning to end, the final data will contain the chunk metadata (and in cases where it is gzipped content, it will fail the CRC check when decompressing).
To solve the problem it is necessary to manually parse the stream, removing the chunk size from each chunk (as well as the CR LF delimiters), detecting the final chunk and keeping only the chunk data. There is likely a library out there somewhere that does this, but I have not found it yet.
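A minimal sketch of that manual parsing, assuming the raw (still-chunked) bytes are available as a Stream. It ignores chunk extensions and trailing headers, which production code would also need to handle, and the helper names are my own:
using System;
using System.IO;
using System.Text;

static byte[] DecodeChunkedBody(Stream raw)
{
    using (MemoryStream output = new MemoryStream())
    {
        while (true)
        {
            // Each chunk starts with its size in hex on a CRLF-terminated line
            string sizeLine = ReadChunkLine(raw);
            int chunkSize = Convert.ToInt32(sizeLine.Split(';')[0].Trim(), 16);
            if (chunkSize == 0)
                break; // the final zero-length chunk marks the end of the body

            byte[] chunk = new byte[chunkSize];
            int read = 0;
            while (read < chunkSize) // Read() may return fewer bytes than requested
            {
                int n = raw.Read(chunk, read, chunkSize - read);
                if (n == 0) throw new IOException("Unexpected end of chunked stream.");
                read += n;
            }
            output.Write(chunk, 0, chunkSize);
            ReadChunkLine(raw); // consume the CRLF that follows the chunk data
        }
        return output.ToArray();
    }
}

// Reads a single CRLF-terminated line as ASCII
static string ReadChunkLine(Stream stream)
{
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = stream.ReadByte()) != -1 && b != '\n')
        if (b != '\r') sb.Append((char)b);
    return sb.ToString();
}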
Useful resources:
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
https://www.rfc-editor.org/rfc/rfc2616#section-3.6.1
I've had the same problem (which is how I ended up here :-). I eventually tracked it down to the fact that the chunked stream wasn't valid: the final zero-length chunk was missing. I came up with the following code, which handles both valid and invalid chunked streams.
using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    StringBuilder sb = new StringBuilder();
    try
    {
        // Read one character at a time until the stream ends
        while (!sr.EndOfStream)
        {
            sb.Append((char)sr.Read());
        }
    }
    catch (System.IO.IOException)
    {
        // An invalid chunked stream (missing final zero-length chunk) throws
        // here; by this point all of the data has already been read
    }
    string content = sb.ToString();
}
After trying a lot of snippets from Stack Overflow and Google, I ultimately found this to work the best (assuming you know the data is a UTF-8 string; if not, you can just keep the byte array and process it appropriately):
byte[] data;
var responseStream = response.GetResponseStream();
var reader = new StreamReader(responseStream, Encoding.UTF8);
data = Encoding.UTF8.GetBytes(reader.ReadToEnd());
// Decode with the same encoding that produced the bytes
return Encoding.UTF8.GetString(data);
I found other variations work most of the time, but occasionally truncate the data. I got this snippet from:
https://social.msdn.microsoft.com/Forums/en-US/4f28d99d-9794-434b-8b78-7f9245c099c4/problems-with-httpwebrequest-and-transferencoding-chunked?forum=ncl
Funnily enough, while playing with the request headers in my use case, I removed "Accept-Encoding: gzip,deflate", and the server answered in a plain ASCII manner and no longer with chunked, encoded snippets. Maybe you should give it a try and leave "Accept-Encoding: gzip,deflate" out. The idea came while reading the above-mentioned wiki article, in the section about compression.
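For what it's worth, a minimal sketch of making the request without advertising compression support (HttpWebRequest only sends an Accept-Encoding header when you opt in):
// Requires System.Net
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// Leaving AutomaticDecompression at None means no Accept-Encoding header
// is sent, so the server may answer with a plain, identity-encoded body
request.AutomaticDecompression = DecompressionMethods.None;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();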
Related
When downloading an XML response from a REST API, I cannot get .NET to download the full XML document on many requests. In each case, I'm missing the last several characters of the XML file which means I can't parse it. The requests work fine in a browser.
I have tried WebResponse.GetResponseStream() using a StreamReader. Within the StreamReader I have tried Read(...) with a buffer, ReadLine(), and ReadToEnd() to build a string for the response. Wondering if there was a bug in my code, I also tried WebClient.DownloadString(url), with the same result, and XmlDocument.Load(url), which just throws an exception (unexpected end of file while parsing ____).
I know for a fact that this API has had some encoding issues in the past, so I've tried specifying multiple different encodings (e.g., UTF-8, iso-8859-1) for the StreamReader as well as letting .NET detect the encoding. Changing the encoding seems to result in a different number of characters that get left off the end.
Is there any way I can detect the proper encoding myself? How does a browser do it? Is there somewhere in any browser to see the actual encoding the response is using (not what the HTTP headers say it's returning)? Any other methods of getting a string response from a web site with an unknown encoding?
StreamReader sample code
StringBuilder sb = new StringBuilder();
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    using (Stream stream = resp.GetResponseStream())
    {
        using (StreamReader sr = new StreamReader(stream))
        {
            int charsRead = 1;
            char[] buffer = new char[4096];
            while (charsRead > 0)
            {
                charsRead = sr.Read(buffer, 0, buffer.Length);
                sb.Append(buffer, 0, charsRead);
            }
        }
    }
}
WebClient sample code
WebClient wc = new WebClient();
string text = wc.DownloadString(url);
XmlDocument sample code
XmlDocument doc = new XmlDocument();
doc.Load(url);
I'm connecting (using HttpWebRequest and HttpWebResponse) to a server that is continuously pushing XML data. What I need is to read the data and store it in a DataSet as it arrives.
Currently this is what I do:
HttpWebRequest httpRequest;
string url = "something";
System.Data.DataSet dataSet = new System.Data.DataSet();
httpRequest = (HttpWebRequest)WebRequest.Create(url);
httpRequest.Credentials = new NetworkCredential("username", "pass");
ServicePointManager.ServerCertificateValidationCallback = ((sender1, certificate, chain, sslPolicyErrors) => true);
byte[] buf = new byte[1024];
HttpWebResponse response;
int count = -1;
string read = string.Empty;
response = (HttpWebResponse)httpRequest.GetResponse();
do
{
    count = response.GetResponseStream().Read(buf, 0, buf.Length);
    read += Encoding.UTF8.GetString(buf, 0, count);
    dataSet.ReadXml(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(read)));
} while (response.GetResponseStream().CanRead && count != 0);
but since each loop iteration receives only part of the XML data, the ReadXml call throws an exception.
How can I solve this problem?
If by "continuously" you mean without ever ending, then you should not use a DataSet, since this is not designed to be used like this.
That being said, the problem you're having is that the network data you're reading can be arbitrarily cut to pieces, so that you don't even get valid XML fragments back. Instead, you should use an XmlReader on the response stream and read from that, since this will put the pieces back together.
However, directly passing this into ReadXml() will not do the trick because it would never stop. So what you'll probably end up with is some simple code reading the parsed fragments from the XmlReader, and insert these manually into whatever data storage you want.
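A minimal sketch of that approach, assuming the server pushes a stream of top-level <item> elements (the element name and the storage step are placeholders):
using System.IO;
using System.Xml;

XmlReaderSettings settings = new XmlReaderSettings();
// Accept a stream of top-level fragments rather than a single document
settings.ConformanceLevel = ConformanceLevel.Fragment;

using (Stream stream = response.GetResponseStream())
using (XmlReader reader = XmlReader.Create(stream, settings))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
        {
            // ReadOuterXml consumes one complete fragment and advances the reader
            string fragment = reader.ReadOuterXml();
            // ... insert the fragment into whatever storage you use ...
        }
        else
        {
            reader.Read();
        }
    }
}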
Have you tried putting the line
dataSet.ReadXml(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(read)));
outside the do-while loop, and then having the while-clause only evaluate count:
while (count > 0);
It could also be helpful to get the response stream instance outside the loop; there might be unforeseen consequences of calling GetResponseStream() on every iteration.
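Putting both suggestions together, a sketch of the rearranged loop, reusing the variables from the question and assuming the response does eventually end so that the XML is complete when the loop exits:
using (Stream responseStream = response.GetResponseStream())
{
    byte[] buf = new byte[1024];
    StringBuilder sb = new StringBuilder();
    int count;
    do
    {
        // Accumulate the raw text; parse only once the stream is done
        count = responseStream.Read(buf, 0, buf.Length);
        sb.Append(Encoding.UTF8.GetString(buf, 0, count));
    } while (count > 0);

    dataSet.ReadXml(new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString())));
}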
I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Skip the zlib prefix, which conflicts with the deflate specification
    compressed.ReadByte(); compressed.ReadByte();

    // Reports reading 3,000-odd bytes, followed by random characters
    /*byte[] buffer = new byte[4096];
    int bytesRead = compressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/

    using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
        // Reports reading 0 bytes, and no output
        /*byte[] buffer = new byte[4096];
        int bytesRead = decompressed.Read(buffer, 0, 4096);
        Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
        string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
        Console.WriteLine(content);*/

        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
As you can probably guess from the last line, the unzipped content should be tab-delimited text.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with it's acceptable, so I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
    // Save the content to a temporary location
    string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
    using (StreamWriter file = new StreamWriter(zipFilePath))
    {
        compressed.CopyTo(file.BaseStream);
        file.Flush();
    }

    // Get the first file from the temporary zip
    ZipFile zipFile = ZipFile.Read(zipFilePath);
    if (zipFile.Entries.Count > 1)
        throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry report = zipFile[0];

    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
    {
        report.Extract(decompressed);
        decompressed.Position = 0; // Note that the stream does NOT start at the beginning
        using (StreamReader reader = new StreamReader(decompressed))
            while (reader.EndOfStream == false)
                _results.Add(reader.ReadLine().Split('\t'));
    }
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire files, it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
The best way is likely to use a dedicated library for reading ZIP files/streams, like DotNetZip or SharpZipLib (somewhat unmaintained).
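For example, a minimal sketch using DotNetZip (the Ionic.Zip namespace) directly against the response stream. Since a ZIP archive keeps its directory at the end, the non-seekable network stream is buffered into memory first, which is also why the save-to-file approach above works:
using System;
using System.IO;
using Ionic.Zip;

using (Stream network = response.GetResponseStream())
using (MemoryStream buffered = new MemoryStream())
{
    network.CopyTo(buffered); // ZipFile.Read needs a seekable stream
    buffered.Position = 0;
    using (ZipFile zip = ZipFile.Read(buffered))
    using (MemoryStream decompressed = new MemoryStream())
    {
        zip[0].Extract(decompressed); // extract the first (and only) entry
        decompressed.Position = 0;
        using (StreamReader reader = new StreamReader(decompressed))
            Console.WriteLine(reader.ReadToEnd());
    }
}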
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.
I am trying to retrieve HTML code from a webpage using HttpWebRequest and HttpWebResponse.
response = (HttpWebResponse)request.GetResponse();
...
Stream stream = response.GetResponseStream();
The response object has a ContentLength value of 106142. When I look at the stream object, it has a length of 65536. When reading the stream with a StreamReader using ReadToEnd(), only the first 65536 characters are returned.
How can I get the whole code?
Edit:
Using the following code segment:
catch (WebException ex)
{
    errorMessage = errorMessage + ex.Message;
    if (ex.Response != null)
    {
        if (ex.Response.ContentLength > 0)
        {
            using (Stream stream = ex.Response.GetResponseStream())
            {
                using (StreamReader reader = new StreamReader(stream))
                {
                    string pageOutput = reader.ReadToEnd().Trim();
                }
            }
        }
    }
}
I observe the following values:
ex.Response.ContentLength = 106142
ex.Response.GetResponseStream().Length = 65536
stream.Length = 65536
pageOutput.Length = 65534 (because of the Trim)
And yes, the content is actually truncated.
You can find an answer in this topic: System.Net.HttpWebResponse.GetResponseStream() returns truncated body in WebException.
You have to change the static HttpWebRequest.DefaultMaximumErrorResponseLength property (it appears to be measured in units of 1024 bytes; its default of 64 matches the 65,536-character truncation you're seeing).
For example:
HttpWebRequest.DefaultMaximumErrorResponseLength = 1048576;
ReadToEnd does specifically just that, it reads to the end of the stream. I would check to make sure that you were actually being sent the entire expected response.
There seems to be a problem when calling the GetResponseStream() method on the HttpWebResponse returned by the exception. Everything works as expected when there is no exception.
I wanted to get the HTML code from the error returned by the server.
I guess I'll have to hope the error doesn't exceed 65536 characters...
I am using C# on a Windows Mobile 6.1 device with Compact Framework 3.5.
I am getting an OutOfMemoryException when loading in a large string of XML.
The handset has limited memory, but it should be more than enough to handle the size of the XML string. The string of XML contains the base64 contents of a 2 MB file. The code works when the XML string contains files of up to 1.8 MB.
I am completely puzzled as to what to do, and I'm not sure how to change any memory settings.
I have included a condensed copy of the code below. Any help is appreciated.
try
{
    Stream newStream = myRequest.GetRequestStream();
    // Send the data.
    newStream.Write(data, 0, data.Length);
    // Close the write stream
    newStream.Close();

    // Get the response.
    HttpWebResponse response = (HttpWebResponse)myRequest.GetResponse();
    // Get the stream containing content returned by the server.
    Stream dataStream = response.GetResponseStream();

    // Process the return
    // Set the buffer
    byte[] server_response_buffer = new byte[8192];
    int response_count = -1;
    string tempString = null;
    StringBuilder response_sb = new StringBuilder();

    // Loop through the stream until it is all written
    do
    {
        // Read content into a buffer
        response_count = dataStream.Read(server_response_buffer, 0, server_response_buffer.Length);
        // Translate from bytes to ASCII text
        tempString = Encoding.ASCII.GetString(server_response_buffer, 0, response_count);
        // Append the content to the builder
        response_sb.Append(tempString);
    }
    while (response_count != 0);
    responseFromServer = response_sb.ToString();

    // Cleanup the streams and the response.
    dataStream.Close();
    response.Close();
}
catch
{
    MessageBox.Show("There was an error with the communication.");
    comm_error = true;
}

if (comm_error == false)
{
    // Load the xml string into an XML document object
    XmlDocument xdoc = new XmlDocument();
    xdoc.LoadXml(responseFromServer);
}
The error occurs on the xdoc.LoadXml line. I have tried writing the stream to a file and then loading the file directly into the XmlDocument, but it was no better.
Completely stumped at this point.
I would recommend that you use the XmlTextReader class instead of the XmlDocument class. I am not sure what your requirements are for reading the XML, but XmlDocument is very memory intensive, as it creates numerous objects and attempts to load the entire XML string. The XmlTextReader class, on the other hand, simply scans through the XML as you read from it.
Assuming you have the string, this means you would do something like the following:
string xml = "<someXml />";
using (StringReader textReader = new StringReader(xml))
{
    using (XmlTextReader xmlReader = new XmlTextReader(textReader))
    {
        xmlReader.MoveToContent();
        xmlReader.Read();
        // The reader is now pointed at the first element in the document...
    }
}
Have you tried loading from a stream instead of from a string? (This is different from writing to a stream: in your example you are still loading it all at once into memory with the XmlDocument.)
There are other .NET components for XML that work with the XML as a stream instead of loading it all at once. The problem is that LoadXml probably tries to process the entire document at once, loading it into memory. Not only that, but you've already loaded it into a string, so it exists in two different forms in memory, further increasing the chance that you do not have enough free contiguous memory available.
What you want is some way to read it piecemeal through an XmlReader, so that you can process the XML document piecewise without loading the entire thing into memory. Of course there are limitations to this approach, because an XmlReader is forward-only and read-only, and whether it will work depends on what you want to do with the XML once it is loaded.
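As a concrete illustration: since the bulk of the document is the base64 file content, an XmlReader can decode it in small chunks straight to a file without ever holding the whole string in memory. This is a hedged sketch: the <FileContents> element name and the output path are placeholders, and I haven't verified ReadElementContentAsBase64 on the Compact Framework:
using System.IO;
using System.Xml;

using (Stream dataStream = response.GetResponseStream())
using (XmlReader reader = XmlReader.Create(dataStream))
using (FileStream output = File.Create(@"\Temp\payload.bin"))
{
    byte[] buffer = new byte[4096];
    if (reader.ReadToFollowing("FileContents")) // hypothetical element name
    {
        int read;
        // Decodes the base64 text incrementally as it is read from the stream
        while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
    }
}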
I'm unsure why you are reading the XML the way you are, but it could be very memory-inefficient. If the garbage collector hasn't kicked in, you could have 3+ copies of the document in memory: in the StringBuilder, in the string, and in the XmlDocument.
Much better to do something like:
XmlDocument xDoc = new XmlDocument();
Stream dataStream = null;
try
{
    dataStream = response.GetResponseStream();
    // Load directly from the stream: no intermediate string copy
    xDoc.Load(dataStream);
}
catch
{
    MessageBox.Show("There was an error with the communication.");
}
finally
{
    if (dataStream != null)
        dataStream.Dispose();
}