XmlMtomReader reading strategy - c#

Consider the following code:
Stream stream = GetStreamFromSomewhere();
XmlDictionaryReader mtomReader =XmlDictionaryReader.CreateMtomReader
(
stream,
Encoding.UTF8,
XmlDictionaryReaderQuoatas.Max
);
/// ...
/// is there best way to read binary data from mtomReader's element??
string elementString = mtomReader.XmlReader.ReadElementString();
byte[] elementBytes = Covert.FromBase64String(elementString);
Stream elementFileStream = new FileStream(tempFileLocation);
elementFileStream.Write(elementBytes,0,elementBytes.Length);
elementFileStream.Close();
/// ...
mtomReader.Close();
The problem is that the size of the binary attachment supposed to be over 100Mb sometimes. Is there a way to read element's binary attachment block by block and then write it to the temporary file stream so i can escape from allocating memory for the hole stuff?
The second - even more specific issue - does mtomReader create any internal cache of the mime binary attachment before i read element's content, i.e. allocate memory for binary data? Or does it read bytes from the input stream directly?

For those who may be interested in the solution:
using (Stream stream = GetStreamFromSomewhere())
{
using (
XmlDictionaryReader mtomReader = XmlDictionaryReader.CreateMtomReader(
stream, Encoding.UTF8, XmlDictionaryReaderQuotas.Max))
{
string elementString = mtomReader.ReadElementString();
byte[] buffer = new byte[1024];
using (
Stream elementFileStream =
new FileStream(tempFileLocation, FileMode.Create))
{
while(mtomReader.XmlReader.ReadElementContentAsBase64(buffer,0,buffer.Length)
{
elementFileStream.Write(buffer, 0, buffer.Length);
}
}
/// ...
mtomReader.Close();
}
}
ReadElementContentAsBase64(...) helps read binary parts block by block. The second issue of my post was covered perfectly here: Does XmlMtomReader cache binary data from the input stream internally?

For an attachment of that size it would be better to use streaming.
Streamed transfers can improve the
scalability of a service by
eliminating the requirement for large
memory buffers. Whether changing the
transfer mode improves scalability
depends on the size of the messages
being transferred. Large message sizes
favor using streamed transfers.
See: http://msdn.microsoft.com/en-us/library/ms731913.aspx

To begin with, your code should be more like this:
using (Stream stream = GetStreamFromSomewhere())
{
using (
XmlDictionaryReader mtomReader = XmlDictionaryReader.CreateMtomReader(
stream, Encoding.UTF8, XmlDictionaryReaderQuotas.Max))
{
string elementString = mtomReader.ReadElementString();
byte[] elementBytes = Convert.FromBase64String(elementString);
using (
Stream elementFileStream =
new FileStream(tempFileLocation, FileMode.Create))
{
elementFileStream.Write(
elementBytes, 0, elementBytes.Length);
}
/// ...
mtomReader.Close();
}
}
Without the using blocks, you're at risk of resource leaks.

Related

C# increase the size of reading binary data

I am using the below code from Jon Skeet's article. Of late, the binary data that needs to be processed has grown multi-fold. The binary data size file size that I am trying to import is ~ 900 mb almost 1 gb. How do I increase the memory stream size.
public static byte[] ReadFully (Stream stream)
{
byte[] buffer = new byte[32768];
using (MemoryStream ms = new MemoryStream())
{
while (true)
{
int read = stream.Read (buffer, 0, buffer.Length);
if (read <= 0)
return ms.ToArray();
ms.Write (buffer, 0, read);
}
}
}
Your method returns a byte array, which means it will return all of the data in the file. Your entire file will be loaded into memory.
If that is what you want to do, then simply use the built in File methods:
byte[] bytes = System.IO.File.ReadAllBytes(string path);
string text = System.IO.File.ReadAllText(string path);
If you don't want to load the entire file into memory, take advantage of your Stream
using (var fs = new FileStream("path", FileMode.Open))
using (var reader = new StreamReader(fs))
{
var line = reader.ReadLine();
// do stuff with 'line' here, or use one of the other
// StreamReader methods.
}
You don't have to increase the size of MemoryStream - by default it expands to fit the contents.
Apparently there can be problems with memory fragmentation, but you can pre-allocate memory to avoid them:
using (MemoryStream ms = new MemoryStream(1024 * 1024 * 1024)) // initial capacity 1GB
{
}
In my opinion 1GB should be no big deal these days, but it's probably better to process the data in chunks if possible. That is what Streams are designed for.

Inconsistent file size change while writing bytes from stream to a file

I have a file with size 10124, I am adding a byte array, which has length 4 in the beginning of the file.
After that the file size should become 10128, but as I write it to file, the size decreased to 22 bytes. I don't know where is the problem
public void AppendAllBytes(string path, byte[] bytes)
{
var encryptedFile = new FileStream(path, FileMode.Open, FileAccess.Read);
////argument-checking here.
Stream header = new MemoryStream(bytes);
var result = new MemoryStream();
header.CopyTo(result);
encryptedFile.CopyTo(result);
using (var writer = new StreamWriter(#"C:\\Users\\life.monkey\\Desktop\\B\\New folder (2)\\aaaaaaaaaaaaaaaaaaaaaaaaaaa.docx.aef"))
{
writer.Write(result);
}
}
How can I write bytes to the file?
The issue seems to be caused by:
using a StreamWriter to write binary formatted data. The name does not inthuitively suggest this, but the StreamWriter class is suited for writing textual data.
passing an entire stream instead of the actual binary data. To obtain the bytes stored in a MemoryStream, use its convenient ToArray() method.
I suggest you the following code:
public void AppendAllBytes(string path, byte[] bytes)
{
var fileName = #"C:\\Users\\life.monkey\\Desktop\\B\\New folder (2)\\aaaaaaaaaaaaaaaaaaaaaaaaaaa.docx.aef";
using (var encryptedFile = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var writer = new BinaryWriter(File.Open(fileName, FileMode.Append)))
using (var result = new MemoryStream())
{
encryptedFile.CopyTo(result);
result.Flush(); // ensure header is entirely written.
// write header directly, no need to put it in a memory stream
writer.Write(bytes);
writer.Flush(); // ensure the header is written to the result stream.
writer.Write(result.ToArray());
writer.Flush(); // ensure the encryptdFile is written to the result stream.
}
}
The code above uses the BinaryWriter class which is better suited for binary data. It has a Write(byte[] bytes) method overload that is used above to write an entire array to the file. The code uses regular calls to the Flush() method that some may consider not needed, but these guarantee in general, that all the data written prior the call of the Flush() method is persisted within the stream.

Base64 - CryptoStream with StreamWriter vs Convert.ToBase64String()

Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.
You probably should implement custom Stream, not a TextWriter. It is much easier to compose streams than writers (like pass your stream to compressed stream).
To create custom stream - derive from Stream and implement at least Write and Flush (and Read if you need R/W stream). The rest is more or less optional and depends on you additional needs, regular copy to other stream does not need anything else.
In constructor get inner stream passed to you for writing to. Base64 is always producing ASCII characters, so it should be easy to write output as UTF-8 with or without BOM directly to a stream, but if you want to specify encoding you can wrap inner stream with StreamWriter internally.
In your Write implementation buffer data till you get enough bytes to have block of multiple of 3 bytes (i.e. 300) and call Convert.ToBase64String on that portion. Make sure not to loose not-yet-converted portion. Since Base64 converts 3 bytes to 4 characters converting in blocks of multiple of 3 size will never have =/== padding at the end and can be concatenated with next block. So write that converted portion into inner stream/writer. Note that you want to limit block size to something relatively small like 3*10000 to avoid allocation of your blocks on large objects heap.
In Flush make sure to convert the last unwritten bytes (this will be the only one with = padding at the end) and write it to the stream too.
For reading you may need to be more careful as in Base64 white spaces are allowed, so you can't read fixed number of characters and convert to bytes. The easiest approach would be to read by character from StreamReader and convert each 4 non-space ones to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
Please try using following to encrypt. I am using fileName/filePath as input. You can adjust it as per your requirement. Using this I have encrypted over 1 gb file successfully without any out of memory exception.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
bool success = false;
// here assuming that you already have key
byte[] key = new byte[128];
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
algorithm.Key = key;
using (ICryptoTransform transform = algorithm.CreateEncryptor())
{
CryptoStream cs = null;
FileStream fsEncrypted = null;
try
{
using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
{
//First write IV
fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);
//then write using stream
cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
int bytesRead;
int _bufferSize = 1048576; //buggersize = 1mb;
byte[] buffer = new byte[_bufferSize];
do
{
bytesRead = fsInput.Read(buffer, 0, _bufferSize);
cs.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
success = true;
}
}
catch (Exception ex)
{
//handle exception or throw.
}
finally
{
if (cs != null)
{
cs.Close();
((IDisposable)cs).Dispose();
if (fsEncrypted != null)
{
fsEncrypted.Close();
}
}
}
}
return success;
}

Raw Stream Has Data, Deflate Returns Zero Bytes

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.
First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?
Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Skip the zlib prefix, which conflicts with the deflate specification
compressed.ReadByte(); compressed.ReadByte();
// Reports reading 3,000-odd bytes, followed by random characters
/*byte[] buffer = new byte[4096];
int bytesRead = compressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
{
// Reports reading 0 bytes, and no output
/*byte[] buffer = new byte[4096];
int bytesRead = decompressed.Read(buffer, 0, 4096);
Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
string content = Encoding.ASCII.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);*/
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
As you can probably guess from the last line, the unzipped content should be TDT.
Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. MS' docs just say "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with, it's acceptable, I'll take it!
WebRequest request = HttpWebRequest.Create(reportURL);
WebResponse response = request.GetResponse();
_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
{
// Save the content to a temporary location
string zipFilePath = #"\\Server\Folder\adCenter\Temp.zip";
using (StreamWriter file = new StreamWriter(zipFilePath))
{
compressed.CopyTo(file.BaseStream);
file.Flush();
}
// Get the first file from the temporary zip
ZipFile zipFile = ZipFile.Read(zipFilePath);
if (zipFile.Entries.Count > 1) throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
ZipEntry report = zipFile[0];
// Extract the data
using (MemoryStream decompressed = new MemoryStream())
{
report.Extract(decompressed);
decompressed.Position = 0; // Note that the stream does NOT start at the beginning
using (StreamReader reader = new StreamReader(decompressed))
while (reader.EndOfStream == false)
_results.Add(reader.ReadLine().Split('\t'));
}
}
You will find that DeflateStream is hugely limited in what data it will decompress. In fact if you are expecting entire files it will be of no use at all.
There are hundereds of (mostly small) variations of ZIP files and DeflateStream will get along only with two or three of them.
Best way is likely to use a dedicated library for reading Zip files/streams like DotNetZip or SharpZipLib (somewhat unmaintained).
You could write the stream to a file and try my tool Precomp on it. If you use it like this:
precomp -c- -v [name of input file]
any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).
Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
If it doesn't detect a stream like this, try to add -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If this fails, you can try -brute which even detects deflate streams that lack the two byte header, but this will be extremely slow and can cause false positives.
After that, you'll know if there is a (valid) deflate stream in the file and if so, the additional information should help you to decompress other reports correctly using zLib decompression routines or similar.

Convert a Stream to a FileStream in C#

What is the best method to convert a Stream to a FileStream using C#.
The function I am working on has a Stream passed to it containing uploaded data, and I need to be able to perform stream.Read(), stream.Seek() methods which are methods of the FileStream type.
A simple cast does not work, so I'm asking here for help.
Read and Seek are methods on the Stream type, not just FileStream. It's just that not every stream supports them. (Personally I prefer using the Position property over calling Seek, but they boil down to the same thing.)
If you would prefer having the data in memory over dumping it to a file, why not just read it all into a MemoryStream? That supports seeking. For example:
public static MemoryStream CopyToMemory(Stream input)
{
// It won't matter if we throw an exception during this method;
// we don't *really* need to dispose of the MemoryStream, and the
// caller should dispose of the input stream
MemoryStream ret = new MemoryStream();
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
ret.Write(buffer, 0, bytesRead);
}
// Rewind ready for reading (typical scenario)
ret.Position = 0;
return ret;
}
Use:
using (Stream input = ...)
{
using (Stream memory = CopyToMemory(input))
{
// Seek around in memory to your heart's content
}
}
This is similar to using the Stream.CopyTo method introduced in .NET 4.
If you actually want to write to the file system, you could do something similar that first writes to the file then rewinds the stream... but then you'll need to take care of deleting it afterwards, to avoid littering your disk with files.

Categories

Resources