I am using C# on a Windows Mobile 6.1 device, Compact Framework 3.5.
I am getting an OutOfMemoryException when loading in a large string of XML.
The handset has limited memory, but it should be more than enough to handle the size of the XML string. The string contains the base64 contents of a 2 MB file; the code works when the XML string contains files of up to 1.8 MB.
I am completely puzzled as to what to do, and I'm not sure how to change any memory settings.
I have included a condensed copy of the code below. Any help is appreciated.
try {
Stream newStream = myRequest.GetRequestStream();
// Send the data.
newStream.Write(data, 0, data.Length);
//close the write stream
newStream.Close();
// Get the response.
HttpWebResponse response = (HttpWebResponse)myRequest.GetResponse();
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
//Process the return
//Set the buffer
byte[] server_response_buffer = new byte[8192];
int response_count = -1;
string tempString = null;
StringBuilder response_sb = new StringBuilder();
//Loop through the stream until it is all written
do
{
// Read content into a buffer
response_count = dataStream.Read(server_response_buffer, 0, server_response_buffer.Length);
// Translate from bytes to ASCII text
tempString = Encoding.ASCII.GetString(server_response_buffer, 0, response_count);
// Write content to a file from buffer
response_sb.Append(tempString);
}
while (response_count != 0);
responseFromServer = response_sb.ToString();
// Cleanup the streams and the response.
dataStream.Close();
response.Close();
}
catch {
MessageBox.Show("There was an error with the communication.");
comm_error = true;
}
if(comm_error == false){
//Load the xml file into an XML object
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(responseFromServer);
}
The error occurs on the xdoc.LoadXml line. I have tried writing the stream to a file and then loading the file directly into the XmlDocument, but it was no better.
I'm completely stumped at this point.
I would recommend that you use the XmlTextReader class instead of the XmlDocument class. I'm not sure what your requirements are for reading the XML, but XmlDocument is very memory intensive: it creates numerous objects and attempts to load the entire XML string. The XmlTextReader class, on the other hand, simply scans through the XML as you read from it.
Assuming you have the string, this means you would do something like the following:
String xml = "<someXml />";
using(StringReader textReader = new StringReader(xml)) {
using(XmlTextReader xmlReader = new XmlTextReader(textReader)) {
xmlReader.MoveToContent();
xmlReader.Read();
// the reader is now pointed at the first element in the document...
}
}
Have you tried loading from a stream instead of from a string? (This is different from writing to a stream: in your example you are still trying to load it all at once into memory with the XmlDocument.)
There are other .NET components for XML that work with the XML as a stream instead of loading it all at once. The problem is that LoadXml probably tries to process the entire document at once, loading it into memory. Not only that, but you've already loaded it into a string, so it exists in two different forms in memory, further increasing the chance that you do not have enough free contiguous memory available.
What you want is some way to feed it piecemeal into an XmlReader, so that you can begin reading the XML document piecewise without loading the entire thing into memory. Of course there are limitations to this approach, because an XmlReader is forward-only and read-only, and whether it will work depends on what you want to do with the XML once it is loaded.
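For example, since the payload here is a base64-encoded file, you could decode it straight off the response stream. This is just a sketch, not tested on a device: the element name "FileContents" and the output path are made up, and it assumes XmlTextReader.ReadBase64 is available (the Compact Framework documentation lists it as supported).
// Requires System.IO and System.Xml.
using (Stream dataStream = response.GetResponseStream())
using (XmlTextReader xmlReader = new XmlTextReader(dataStream))
using (FileStream output = File.Create(@"\My Documents\payload.bin"))
{
    byte[] buffer = new byte[4096];
    while (xmlReader.Read())
    {
        // "FileContents" is a placeholder for whatever element
        // actually holds the base64 data in the real document.
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "FileContents")
        {
            int read;
            // ReadBase64 decodes the element content incrementally, so only
            // one 4 KB buffer is in memory at a time instead of the whole file.
            while ((read = xmlReader.ReadBase64(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
        }
    }
}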
I'm unsure why you are reading the XML the way you are, but it could be very memory inefficient. If the garbage collector hasn't kicked in, you could have three or more copies of the document in memory: in the StringBuilder, in the string, and in the XmlDocument.
Much better to do something like:
XmlDocument xDoc = new XmlDocument();
Stream dataStream = null; // must be initialized so the finally block compiles
try {
dataStream = response.GetResponseStream();
xDoc.Load(dataStream);
} catch {
MessageBox.Show("There was an error with the communication.");
} finally {
if(dataStream != null)
dataStream.Dispose();
}
I am attempting to modify a simple MS Word template's XML. I realize there are SDKs available that could make this process easier, but what I am tasked with maintaining uses packages, and I was told to do the same.
I have a basic test document with two placeholders, mapped to the following XML:
<root>
<element>
Fubar
</element>
<second>
This is the second placeholder
</second>
</root>
What I am doing is creating a stream using the Word doc, removing the existing XML, getting some hard-coded test XML, and trying to write that to the stream.
Here is the code I am using:
string strRelRoot = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
byte[] buffer = File.ReadAllBytes("dev.docx");
//stream with the template
MemoryStream stream = new MemoryStream(buffer, true);
//create a package using the stream
Package package = Package.Open(stream, FileMode.Open, FileAccess.ReadWrite);
PackageRelationshipCollection pkgrcOfficeDocument = package.GetRelationshipsByType(strRelRoot);
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
if (pkgr.SourceUri.OriginalString == "/")
{
Uri uriData = new Uri("/customXML/item1.xml", UriKind.Relative);
//remove the existing part
if (package.PartExists(uriData))
{
// Delete template "/customXML/item1.xml" part
package.DeletePart(uriData);
}
//create a new part
PackagePart pkgprtData = package.CreatePart(uriData, "application/xml");
//test data
string xml = @"<root>
<element>
Changed
</element>
<second>
The second placeholder changed
</second>
</root>";
//stream created from the xml string
MemoryStream fromStream = new MemoryStream();
UnicodeEncoding uniEncoding = new UnicodeEncoding();
byte[] fromBuffer = uniEncoding.GetBytes(xml);
fromStream.Write(fromBuffer, 0, fromBuffer.Length);
fromStream.Seek(0L, SeekOrigin.Begin);
Stream toStream = pkgprtData.GetStream();
//copy the xml to the part stream
fromStream.CopyTo(toStream);
//copy part stream to the byte stream
toStream.CopyTo(stream);
}
}
This is currently not modifying the document, although I feel like I am close to a solution. Any advice would be very much appreciated. Thanks!
Edit: To clarify, the result I am getting is that the document is unchanged. I get no exceptions or the like, but the document's XML is not modified.
OK, so not quite the timely response I promised, but here goes!
There are several aspects to the problem. Sample code is from memory and documentation, not necessarily compiled and tested.
Read the template XML
Before you delete the package part containing the template XML, you need to open its stream and read the XML. How you get the XML if the part doesn't exist to begin with is up to you.
My example code uses classes from the LINQ to XML API, though you could use whichever set of XML APIs you prefer.
XElement templateXml = null;
using (Stream stream = package.GetPart(uriData).GetStream()) // GetPart returns a PackagePart, not a Stream
templateXml = XElement.Load(stream);
// Now you can delete the part.
At this point you have an in-memory representation of the template XML in templateXml.
Substitute values into the placeholders
templateXml.SetElementValue("element", "Replacement value of first placeholder");
templateXml.SetElementValue("second", "Replacement value of second placeholder");
Check out the methods on XElement if you need to do anything more advanced than this, e.g. read the original content in order to determine the replacement value.
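For instance, a hypothetical check that reads the current placeholder text before deciding what to write back:
// Hypothetical: inspect the existing text of <element> first.
string current = (string)templateXml.Element("element");
if (current != null && current.Trim() == "Fubar")
    templateXml.SetElementValue("element", "Replacement value of first placeholder");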
Save the document
This is your original code, modified and annotated.
// The very first thing to do is create the Package in a using statement.
// This makes sure it's saved and closed when you're done.
using (Package package = Package.Open(...))
{
// XML reading, substituting etc. goes here.
// Eventually...
//create a new part
PackagePart pkgprtData = package.CreatePart(uriData, "application/xml");
// Don't need the test data anymore.
// Assuming you need UnicodeEncoding, set it up like this.
var writerSettings = new XmlWriterSettings
{
Encoding = Encoding.Unicode,
};
// Shouldn't need a MemoryStream at all; write straight to the part stream.
// Note using statements to ensure streams are flushed and closed.
using (Stream toStream = pkgprtData.GetStream())
using (XmlWriter writer = XmlWriter.Create(toStream, writerSettings))
templateXml.Save(writer);
// No other copying should be necessary.
// In particular, your toStream.CopyTo(stream) appeared
// to be appending the part's data to the package's stream
// (the physical file), which is a bug.
} // This closes the using statement for the package, which saves the file.
I have a TextReader that, in a specific instance, I want to be able to advance to the end of the file quickly, so that other classes that might hold a reference to this object will get null if they call tr.ReadLine().
This is a large file, so I cannot use TextReader.ReadToEnd(), as it will often lead to an OutOfMemoryException.
I thought I would ask the community if there was a way to SEEK to the end of the stream without using TextReader.ReadToEnd(), which returns a string of all the data in the file.
Current method, which is inefficient:
The following example code is a mock-up. Obviously I am not opening a file with an if statement directly following it asking if I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
while(tr.ReadLine() != null) { }
}
Desired solution (note: this code block contains fake 'concept' methods, or methods that cannot be used due to the risk of an OutOfMemoryException):
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
// tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
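For example, a sketch of that idea: it still reads through the whole file, but never holds more than one buffer of it in memory.
// Drain the reader in fixed-size chunks; nothing accumulates in memory.
char[] chunk = new char[64 * 1024];
while (tr.ReadBlock(chunk, 0, chunk.Length) > 0)
{
    // discard the data
}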
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.
You have to discard any buffered data if you have already read any file content: since the data is buffered, you might get content even if you seek the underlying stream to the end. A working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem, as mentioned in the documentation, is:
The StreamReader class buffers input from the underlying stream when you call one of the Read methods. If you manipulate the position of the underlying stream after reading data into the buffer, the position of the underlying stream might not match the position of the internal buffer. To reset the internal buffer, call the DiscardBufferedData method.
Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(#"Your Large File"))
{
reader.BaseStream.Seek(0, SeekOrigin.End);
int read = reader.Read();//read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt"))//test.txt is a file that has data and lines
{
((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
string foo = tr.ReadLine();
Debug.WriteLine(foo ?? "foo is null");//foo is null
int read = tr.Read();
Debug.WriteLine(read);//-1
}
I am receiving a stream of data from a web service and trying to save the contents of the stream to a file. The stream contains standard lines of text alongside large chunks of XML data (each on a single line). The size of the file is about 800 MB.
Problem: I receive an out-of-memory exception when I process the XML section of each line.
==start file
line 1
line 2
<?xml version=.....huge line etc</xml>
line 3
line4
<?xml version=.....huge line etc</xml>
==end file
Here is the current code; as you can see, when it reads in the huge XML line, the memory spikes.
string readLine;
using (StreamReader reader = new StreamReader(downloadStream))
{
while ((readLine = reader.ReadLine()) != null)
{
streamWriter.WriteLine(readLine); //writes to file
}
}
I was trying to think of a solution where I use both a TextReader/StreamReader and an XmlTextReader in combination to process each section. As I get to the XML section I could switch to the XmlTextReader and use its Read() method to read each node, thus stopping the memory spike.
Any suggestions on how I could do this? Alternatively, I could create a custom XmlTextReader that is able to read in these lines. Any pointers for this?
Updated
A further problem is that I need to read this file back in and split the two XML sections out into separate XML files! I converted the solution to write the file using a binary writer, and then started to read the file back in using a binary reader. I have text processing that detects the start of an XML section, and specifically which XML section, so I can map it to the correct file. However, this causes problems reading in the binary file and doing the detection...
using (BinaryReader reader = new BinaryReader(savedFileStream))
{
    string streamLine;
    while ((streamLine = reader.ReadString()) != null)
    {
        if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag1"))
        {
            //xml file 1
        }
        else if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag2"))
        {
            //xml file 2
        }
    }
}
XML may contain all content as one single line, so you'd probably be better off using a binary reader/writer, where you can decide on the read/write size.
An example below; here we read BUFFER_SIZE bytes on each iteration:
Stream s = new MemoryStream();
Stream outputStream = new MemoryStream();
int BUFFER_SIZE = 1024;
using (BinaryReader reader = new BinaryReader(s))
{
BinaryWriter writer = new BinaryWriter(outputStream);
byte[] buffer = new byte[BUFFER_SIZE];
int read = buffer.Length;
while(read != 0)
{
read = reader.Read(buffer, 0, BUFFER_SIZE);
writer.Write(buffer, 0, read);
}
writer.Flush();
writer.Close();
}
I don't know if this causes you problems with encodings etc., but I think you will have to read the file as binary.
If all you want to do is copy one stream to another without modifying the data, you don't need the Stream text or binary helpers (StreamReader, StreamWriter, BinaryReader, BinaryWriter, etc.); simply copy the stream.
internal static class StreamExtensions
{
public static void CopyTo(this Stream readStream, Stream writeStream)
{
byte[] buffer = new byte[4096];
int read;
while ((read = readStream.Read(buffer, 0, buffer.Length)) > 0)
writeStream.Write(buffer, 0, read);
}
}
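With that in place, the copy is a single call (the response and file path here are hypothetical):
// Copy the web response straight to disk; no readers or writers involved.
using (Stream input = response.GetResponseStream())
using (Stream output = File.Create(@"C:\temp\download.dat"))
{
    input.CopyTo(output);
}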
I think there is a memory leak.
Are you getting the out-of-memory exception after processing a few lines, or on the first line itself?
Also, there is no streamWriter.Flush() inside the while loop. Don't you think there should be one?
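That is, something like this (using the asker's variable names):
while ((readLine = reader.ReadLine()) != null)
{
    streamWriter.WriteLine(readLine); // writes to file
    streamWriter.Flush();             // push the writer's buffer out on every line
}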
I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Skip the zlib prefix, which conflicts with the deflate specification
compressed.ReadByte(); compressed.ReadByte();
// Reports reading 3,000-odd bytes, followed by random characters
/*byte[] buffer = new byte[4096];
int bytesRead = compressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
{
// Reports reading 0 bytes, and no output
/*byte[] buffer = new byte[4096];
int bytesRead = decompressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
As you can probably guess from the last line, the unzipped content should be TDT.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with it's acceptable; I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Save the content to a temporary location
string zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
using (StreamWriter file = new StreamWriter(zipFilePath))
{
compressed.CopyTo(file.BaseStream);
file.Flush();
}
// Get the first file from the temporary zip
ZipFile zipFile = ZipFile.Read(zipFilePath);
if (zipFile.Entries.Count > 1) throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
ZipEntry report = zipFile[0];
// Extract the data
using (MemoryStream decompressed = new MemoryStream())
{
report.Extract(decompressed);
decompressed.Position = 0; // Note that the stream does NOT start at the beginning
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact, if you are expecting entire files, it will be of no use at all.
There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
The best way is likely to use a dedicated library for reading ZIP files/streams, like DotNetZip or SharpZipLib (somewhat unmaintained).
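For example, with SharpZipLib's ZipInputStream you can read the entries straight off a forward-only stream. This is a sketch based on the library's documented API, not tested against the adCenter report:
// Requires ICSharpCode.SharpZipLib.Zip.
using (ZipInputStream zip = new ZipInputStream(response.GetResponseStream()))
{
    ZipEntry entry;
    byte[] buffer = new byte[4096];
    while ((entry = zip.GetNextEntry()) != null)
    {
        // The ZipInputStream now reads as the decompressed
        // bytes of the current entry.
        int read;
        while ((read = zip.Read(buffer, 0, buffer.Length)) > 0)
        {
            // process the decompressed data here
        }
    }
}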
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gzip stream(s) inside the file will be detected and some verbose information will be reported (the position and length of the stream). Additionally, if they can be decompressed and recompressed bit-for-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gzip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try adding -slow, which detects deflate streams even if they don't have a ZIP/gzip header. If this fails, you can try -brute, which even detects deflate streams that lack the two-byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file, and if so, the additional information should help you decompress other reports correctly using zlib decompression routines or similar.
I'm having trouble reading a "chunked" response when using a StreamReader to read the stream returned by GetResponseStream() of a HttpWebResponse:
// response is an HttpWebResponse
StreamReader reader = new StreamReader(response.GetResponseStream());
string output = reader.ReadToEnd(); // throws exception...
When the reader.ReadToEnd() method is called I'm getting the following System.IO.IOException: Unable to read data from the transport connection: The connection was closed.
The above code works just fine when server returns a "non-chunked" response.
The only way I've been able to get it to work is to use HTTP/1.0 for the initial request (instead of HTTP/1.1, the default) but this seems like a lame work-around.
Any ideas?
@Chuck
Your solution works pretty well. It still throws the same IOException on the last Read(), but after inspecting the contents of the StringBuilder it looks like all the data has been received. So perhaps I just need to wrap the Read() in a try-catch and swallow the "error".
I haven't tried this with a "chunked" response, but would something like this work?
StringBuilder sb = new StringBuilder();
Byte[] buf = new byte[8192];
Stream resStream = response.GetResponseStream();
string tmpString = null;
int count = 0;
do
{
count = resStream.Read(buf, 0, buf.Length);
if(count != 0)
{
tmpString = Encoding.ASCII.GetString(buf, 0, count);
sb.Append(tmpString);
}
}while (count > 0);
I am working on a similar problem. The .NET HttpWebRequest and HttpWebResponse classes handle cookies and redirects automatically, but they do not automatically handle chunked content on the response body.
This is perhaps because chunked content may contain more than simple data (i.e., chunk names and trailing headers).
Simply reading the stream and ignoring the EOF exception will not work, as the stream contains more than the desired content. The stream will contain chunks, and each chunk begins by declaring its size. If the stream is simply read from beginning to end, the final data will contain the chunk metadata (and in cases where it is gzipped content, it will fail the CRC check when decompressing).
To solve the problem it is necessary to manually parse the stream, removing the chunk size from each chunk (as well as the CR LF delimiters), detecting the final chunk, and keeping only the chunk data. There is likely a library out there somewhere that does this; I have not found it yet. A rough sketch follows the links below.
Useful resources:
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
https://www.rfc-editor.org/rfc/rfc2616#section-3.6.1
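Here is the rough sketch promised above. It is my own illustration, not a library: it assumes plain chunks (ignoring anything after a ";" in the chunk-size line) and no trailing headers, and it needs System, System.IO, and System.Text.
// Reads "size CRLF data CRLF" chunks until the zero-size final chunk.
static byte[] DecodeChunked(Stream input)
{
    MemoryStream output = new MemoryStream();
    while (true)
    {
        // The chunk-size line is hex digits, optionally followed by
        // a ";extension" part, terminated by CRLF.
        string sizeLine = ReadAsciiLine(input);
        int chunkSize = Convert.ToInt32(sizeLine.Split(';')[0].Trim(), 16);
        if (chunkSize == 0)
            break; // final chunk reached (any trailing headers are ignored)
        byte[] chunk = new byte[chunkSize];
        int offset = 0;
        while (offset < chunkSize) // fill the chunk exactly
        {
            int read = input.Read(chunk, offset, chunkSize - offset);
            if (read == 0) throw new IOException("Unexpected end of stream.");
            offset += read;
        }
        output.Write(chunk, 0, chunkSize);
        ReadAsciiLine(input); // consume the CRLF after the chunk data
    }
    return output.ToArray();
}

static string ReadAsciiLine(Stream input)
{
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = input.ReadByte()) != -1 && b != '\n')
        if (b != '\r') sb.Append((char)b);
    return sb.ToString();
}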
I've had the same problem (which is how I ended up here :-). I eventually tracked it down to the fact that the chunked stream wasn't valid: the final zero-length chunk was missing. I came up with the following code, which handles both valid and invalid chunked streams.
using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
StringBuilder sb = new StringBuilder();
try
{
while (!sr.EndOfStream)
{
sb.Append((char)sr.Read());
}
}
catch (System.IO.IOException)
{ }
string content = sb.ToString();
}
After trying a lot of snippets from StackOverflow and Google, I ultimately found this to work best (assuming you know the data is a UTF-8 string; if not, you can just keep the byte array and process it appropriately):
byte[] data;
var responseStream = response.GetResponseStream();
var reader = new StreamReader(responseStream, Encoding.UTF8);
data = Encoding.UTF8.GetBytes(reader.ReadToEnd());
return Encoding.Default.GetString(data); // data is already a byte array, no ToArray() needed
I found other variations work most of the time, but occasionally truncate the data. I got this snippet from:
https://social.msdn.microsoft.com/Forums/en-US/4f28d99d-9794-434b-8b78-7f9245c099c4/problems-with-httpwebrequest-and-transferencoding-chunked?forum=ncl
It is funny: while playing with the request headers, I removed "Accept-Encoding: gzip,deflate", and in my use case the server then answered in a plain ASCII manner, no longer with chunked, encoded snippets. Maybe you should give it a try and leave "Accept-Encoding: gzip,deflate" out. The idea came while reading the Wikipedia article mentioned above, in the section about compression.
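In code, that just means never setting the header (a trivial sketch; url is whatever you are requesting):
// No Accept-Encoding header is set here, so the server may fall back
// to a plain, unchunked, uncompressed response.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string body = reader.ReadToEnd();
}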