Can GC copy byte arrays? - c#

I have a function which generates a random key, like this:
internal static char[] GenerateKey(int size)
{
    byte[] bytes = null;
    try
    {
        bytes = new byte[size];
        char[] result = new char[size];
        FillByteArrayWithStuff(bytes);         // Fills bytes with some data, nothing more
        FillCharArrayWithBytes(result, bytes); // Fills result with data from bytes, nothing more
        return result;
    }
    finally
    {
        if (bytes != null)
        {
            Array.Clear(bytes, 0, bytes.Length);
        }
    }
}
All FillByteArrayWithStuff does is fill bytes with randomly generated bytes. FillCharArrayWithBytes converts the data in bytes and writes it into result.
At the end of the function I clear bytes because it's no longer needed.
When I finish using result from this function, I clear that char[] as well. The data is used for a Win32 function call.
My question:
Is it possible for bytes, or any other byte[] created with new, to be copied by the GC?
If byte[]s can be copied by the GC, the array might not be properly zeroed out, and copies of it could still be sitting in plain memory.
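For context: the .NET GC is compacting, so a small byte[] can indeed be relocated during a collection, leaving the stale copy in memory until that region is reused. A common mitigation, sketched below as an illustration rather than a definitive fix (GenerateKeyPinned is a hypothetical variant, not from the original code), is to pin the buffer for its whole lifetime so the GC cannot move it before it is zeroed. The same concern applies to result, which holds the key as characters.
using System;
using System.Runtime.InteropServices;

internal static char[] GenerateKeyPinned(int size)
{
    byte[] bytes = new byte[size];
    // Pin the array: the GC may no longer relocate it during compaction
    GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    try
    {
        char[] result = new char[size];
        FillByteArrayWithStuff(bytes);
        FillCharArrayWithBytes(result, bytes);
        return result;
    }
    finally
    {
        Array.Clear(bytes, 0, bytes.Length); // zeroes the single, unmoved buffer
        handle.Free();
    }
}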

Related

.net: efficient way to read a binary file into memory then access

I'm a novice programmer. I'm creating a library to process binary files of a certain type -- like a codec (though without the need to process a progressive stream coming over a wire). I'm looking for an efficient way to read the file into memory and then parse portions of the data as needed. In particular, I'd like to avoid large memory copies, hopefully without adding a lot of complexity.
In some situations, I want to do sequential reading of values in the data. For this, a MemoryStream works well.
FileStream fs = new FileStream(_fileName, FileMode.Open, FileAccess.Read);
byte[] bytes = new byte[fs.Length];
// Stream.Read may return fewer bytes than requested, so loop until the buffer is full
int bytesRead = 0, n;
while (bytesRead < bytes.Length &&
       (n = fs.Read(bytes, bytesRead, bytes.Length - bytesRead)) > 0)
    bytesRead += n;
_ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
fs.Close();
(That involved a copy from the bytes array into the memory stream; that's one time, and I don't know of a way to avoid it.)
With the memory stream, it's easy to seek to arbitrary positions and then start reading structure members. E.g.,
_ms.Seek(_tableRecord.Offset, SeekOrigin.Begin);
byte[] ab32 = new byte[4];
_ms.Read(ab32, 0, ab32.Length); // Read returns a count, so read first, then convert
_version = ConvertToUint(ab32);
_ms.Read(ab32, 0, ab32.Length);
_numRecords = ConvertToUint(ab32);
// etc.
But there may also be times when I want to take a slice out of the memory corresponding to some large structure and then pass it into a method for certain processing. MemoryStream doesn't support that. I could always pass the MemoryStream plus an offset and length, though that might not always be the most convenient.
Instead of MemoryStream, I could store the data in memory using Memory<byte>. That supports slicing, but not sequential reading.
If for some situation I want to get a slice (rather than pass stream & offset/length), I could construct an ArraySegment from MemoryStream.GetBuffer.
ArraySegment<byte> segment = new ArraySegment<byte>(ms.GetBuffer(), offset, length);
It's not clear to me, though, if that will result in a (potentially large) copy, or if that uses a reference into the same memory held by the MemoryStream. I gather that GetBuffer exposes the underlying memory rather than providing a copy; and that ArraySegment will point into the same memory?
There will be times when I need to get a slice that is a copy as I'll need to modify some elements and then process that, but without changing the original. If ArraySegment gets a reference rather than a copy, I gather I could use ArraySegment<byte>.ToArray()?
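A quick way to check that assumption is to mutate through the segment and observe the stream's buffer. A minimal sketch with hypothetical offsets; it requires the stream to have been created with publiclyVisible: true, as above:
byte[] raw = ms.GetBuffer();                       // exposes the underlying array, no copy
var segment = new ArraySegment<byte>(raw, 16, 64); // wraps the same array, still no copy
segment.Array[segment.Offset] = 0xFF;              // also visible as raw[16]
byte[] independentCopy = segment.ToArray();        // only this line copies the 64 bytes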
So, my questions are:
Is MemoryStream the best approach? Is there any other type that allows sequential reading like MemoryStream but also allows slicing like Memory<byte>?
If I want a slice without copying memory, will ArraySegment<byte>(ms.GetBuffer(), offset, length) do that?
Then if I need a copy that can be modified without affecting the original, use ArraySegment<byte>.ToArray()?
Is there a way to read the data from a file directly into a new MemoryStream without creating a temporary byte array that gets copied?
Am I approaching all this the best way?
To get the initial MemoryStream from reading the file, the following works:
byte[] bytes;
try
{
    // File.ReadAllBytes opens a FileStream and then ensures it is closed
    bytes = File.ReadAllBytes(_fi.FullName);
    _ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
}
catch (IOException)
{
    throw; // rethrow without resetting the stack trace (unlike 'throw e')
}
File.ReadAllBytes() copies the file content into memory. Internally it wraps the file stream in a using block, which ensures the file gets closed, so no finally clause is needed here.
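As an aside on question 4: if the concern is only the named temporary array, the file stream can be copied straight into an expandable MemoryStream. A sketch under that assumption (the bytes are still copied once, into the stream's internal buffer):
var ms = new MemoryStream();
using (FileStream fs = File.OpenRead(_fi.FullName))
{
    fs.CopyTo(ms); // requires .NET 4+
}
ms.Position = 0; // rewind before sequential reads; note GetBuffer() may now return a buffer longer than the data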
I can read individual values from the MemoryStream using MemoryStream.Read. These calls involve copies of those values, which is fine.
In one situation, I needed to read a table out of the file, change a value, and then calculate a checksum of the entire file with that change in place. Instead of copying the entire file so that I could edit one part, I was able to calculate the checksum in progressive steps: first on the initial, unchanged segment of the file, then continue with the middle segment that was changed, then continue with the remainder.
For this I could process the first and final segments using the MemoryStream. This involved lots of reads, with each read copying; but those copies were transient variables, so no significant working set increase.
For the middle segment, that needed to be copied since it had to be changed (but the original version needed to be kept intact). The following worked:
// get ref (not copy!) to the byte array underlying the MemoryStream
byte[] fileData = _ms.GetBuffer();
// determine the required length
int length = _tableRecord.Length;
// create array to hold the copy
byte[] segmentCopy = new byte[length];
// get the copy
Array.ConstrainedCopy(fileData, _tableRecord.Offset, segmentCopy, 0, length);
After modifying values in segmentCopy, I then needed to pass this to my static method for calculating checksums, which expected a MemoryStream (for sequential reading). This worked:
// new MemoryStream will hold a ref to the segmentCopy array (no new copy!)
MemoryStream ms = new MemoryStream(segmentCopy, 0, segmentCopy.Length);
What I haven't needed to do yet, but will want to do, is to get a slice of the MemoryStream that doesn't involve copying. This works:
MemoryStream sliceFromMS = new MemoryStream(fileData, offset, length);
From above, fileData was a ref to the array underlying the original MemoryStream. Now sliceFromMS will have a ref to a segment within that same array.
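A one-line sanity check of that claim (offsets hypothetical): a write through fileData is immediately observable through sliceFromMS, since no bytes were copied.
fileData[offset] = 0x42;        // mutate the shared array
int b = sliceFromMS.ReadByte(); // reads 0x42: both views share the same storage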
You can use FileStream.Seek directly; as I understand it, there is no need to load the data into memory just to use that method of MemoryStream.
In the following example, str1 and str2 are equal:
using (var fs = new FileStream(@"C:\Users\bar_v\OneDrive\Desktop\js_balancer.txt", FileMode.Open))
{
    var buffer = new byte[20];
    fs.Read(buffer, 0, 20);
    var str1 = Encoding.ASCII.GetString(buffer);
    fs.Seek(0, SeekOrigin.Begin); // rewind and read the same bytes again
    fs.Read(buffer, 0, 20);
    var str2 = Encoding.ASCII.GetString(buffer);
}
By the way, when you create a new MemoryStream object you don't copy the byte array; you just keep a reference to it:
public MemoryStream(byte[] buffer, bool writable)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
    _buffer = buffer;
    _length = _capacity = buffer.Length;
    _writable = writable;
    _exposable = false;
    _origin = 0;
    _isOpen = true;
}
But when reading, as we can see, copying occurs:
public override int Read(byte[] buffer, int offset, int count)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
    if (offset < 0)
        throw new ArgumentOutOfRangeException(nameof(offset), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (count < 0)
        throw new ArgumentOutOfRangeException(nameof(count), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (buffer.Length - offset < count)
        throw new ArgumentException(SR.Argument_InvalidOffLen);

    EnsureNotClosed();

    int n = _length - _position;
    if (n > count)
        n = count;
    if (n <= 0)
        return 0;

    Debug.Assert(_position + n >= 0, "_position + n >= 0"); // len is less than 2^31 - 1.

    if (n <= 8)
    {
        int byteCount = n;
        while (--byteCount >= 0)
            buffer[offset + byteCount] = _buffer[_position + byteCount];
    }
    else
        Buffer.BlockCopy(_buffer, _position, buffer, offset, n);

    _position += n;
    return n;
}

C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes

I'm having a problem writing out an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes' worth of data, with the rest of the buffer not being written to and left zeroed. The compressed stream (compressedSection) has been checked, and all the data is there (1500+ bytes). The snippet of the decompression process is below:
int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    // Gets the little-endian value of the uncompressed size as an integer
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0);
    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // stream is decompressed and read
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();
    }
}
So in this snippet:
1) The compressed section is passed in, ready to be decompressed.
2) The expected size of the uncompressed output (stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to an integer. The header is removed earlier, as it is not part of the compressed GZIP file.
3) SharpZipLib's GZipInputStream is declared with the compressed file stream (msi) and a buffer equal to uncompressedIntSize (I have tested with a static value of 4096 as well).
4) I set up a MemoryStream to handle writing the output to a file, since GZipInputStream itself won't write to a file; it takes the expected decompressed file size as its argument (capacity).
5) The Read/Write calls on the stream need a byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).
6) GZipInputStream decompressStream uses .Read with the buffer as the first argument, from offset 0, with uncompressedIntSize as the count. Checking the arguments in here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.
It looks like the output of .Read is being throttled to 256 bytes, but I'm not sure where. Is there something I've missed with the streams, or is this a limitation of .Read?
You need to loop when reading from a stream; the lazy way is probably:
decompressStream.CopyTo(outputStream);
(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)
A more manual version (that respects an imposed length limit) would be:
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
int remaining = uncompressedIntSize, bytesRead;
while (remaining > 0 && // more to do, and making progress
(bytesRead = decompressStream.Read(
buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
{
outputStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
if (remaining != 0) throw new EndOfStreamException();
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}
The issue turned out to be an oversight I'd made earlier in the posted code:
The file I'm working with has 27 sections which are GZipped, but each has a header which will break the GZip decompression if the GZipInputStream hits any of them. When opening the base file, I was seeking from the beginning (adjusted by 6 to skip the first header) each time instead of going to the next post-header offset:
brg.BaseStream.Seek(6, SeekOrigin.Begin);
Instead of:
brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
This meant that the extracted compressed data was an amalgam of the first headerless section plus part of the second section along with its header. As the first section is 256 bytes long without its header, that part was decompressed correctly by the GZipInputStream. But after that come 6 bytes of header, which breaks the decompression, resulting in the rest of the output being 00s.
No explicit error was thrown by the GZipInputStream when this happened, so I'd incorrectly assumed the cause was the .Read call or something in the stream retaining data from the previous pass. Sorry for the hassle.

C++/zlib/gzip compression and C# GZipStream decompression fails

I know there's a ton of questions about zlib/gzip etc but none of them quite match what I'm trying to do (or at least I haven't found it). As a quick overview, I have a C# server that decompresses incoming strings using a GZipStream. My task is to write a C++ client that will compress a string compatible with GZipStream decompression.
When I use the code below I get an error that says "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream." I understand what the magic number is and everything, I just don't know how to magically set it properly.
Finally, I'm using the C++ zlib nuget package but have also used the source files directly from zlib with the same bad luck.
Here's a more in depth view:
The server's function for decompression
public static string ReadMessage(NetworkStream stream)
{
    byte[] buffer = new byte[512];
    StringBuilder messageData = new StringBuilder();
    GZipStream gzStream = new GZipStream(stream, CompressionMode.Decompress, true);
    int bytes = 0;
    while (true)
    {
        try
        {
            bytes = gzStream.Read(buffer, 0, buffer.Length);
        }
        catch (InvalidDataException ex)
        {
            Console.WriteLine($"Busted: {ex.Message}");
            return "";
        }

        // Use Decoder class to convert from bytes to Default
        // in case a character spans two buffers.
        Decoder decoder = Encoding.Default.GetDecoder();
        char[] chars = new char[decoder.GetCharCount(buffer, 0, bytes)];
        decoder.GetChars(buffer, 0, bytes, chars, 0);
        messageData.Append(chars);
        Console.WriteLine(messageData);

        // Check for EOF or an empty message.
        if (messageData.ToString().IndexOf("<EOF>", StringComparison.Ordinal) != -1)
            break;
    }
    int eof = messageData.ToString().IndexOf("<EOF>", StringComparison.Ordinal);
    string message = messageData.ToString().Substring(0, eof).Trim();
    // Returns message without ending EOF
    return message;
}
To sum it up: it accepts a NetworkStream, gets the compressed string, decompresses it, appends it to a string, and loops until it finds <EOF>, which is removed before the final decompressed string is returned. This is almost an exact match of the example off MSDN.
Here's the C++ client side code:
char* CompressString(char* message)
{
    int messageSize = sizeof(message);

    // Compress string
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    zs.zalloc = Z_NULL;
    zs.zfree = Z_NULL;
    zs.opaque = Z_NULL;
    zs.next_in = reinterpret_cast<Bytef*>(message);
    zs.avail_in = messageSize;

    int iResult = deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED, (MAX_WBITS + 16), 8, Z_DEFAULT_STRATEGY);
    if (iResult != Z_OK) zerr(iResult);

    int ret;
    char* outbuffer = new char[messageSize];
    std::string outstring;

    // retrieve the compressed bytes blockwise
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);
        ret = deflate(&zs, Z_FINISH);
        if (outstring.size() < zs.total_out) {
            // append the block to the output string
            outstring.append(outbuffer, zs.total_out - outstring.size());
        }
    } while (ret == Z_OK);

    deflateEnd(&zs);
    if (ret != Z_STREAM_END) { // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib compression: (" << ret << ") " << zs.msg;
        throw std::runtime_error(oss.str());
    }
    return &outstring[0u];
}
Long story short here, it accepts a string and goes through a pretty standard zlib compression with the WBITS being set to wrap it in a gzip header/footer. It then returns a char* of the compressed input. This is what is sent to the server above to be decompressed.
Thanks for any help you can give me! Also, let me know if you need any more information.
In your CompressString function you return a char* obtained from a locally declared std::string. The string will be destroyed when the function returns, which releases the memory behind the pointer you've returned.
It's likely that something is being allocated in this memory region and overwriting your compressed data before it gets sent.
You need to ensure the memory containing the compressed data remains allocated until it has been sent, perhaps by passing a std::string& into the function and storing the result in that.
An unrelated bug: you do char* outbuffer = new char[messageSize]; but there is no call to delete[] for that buffer, which results in a memory leak. Since you're also throwing exceptions from this function, I would recommend using std::unique_ptr<char[]> instead of trying to sort this out manually with your own delete[] calls. In fact, I would always recommend std::unique_ptr instead of explicit calls to delete where possible.

How memory management works between bytes and string?

In C#, if I have 4-5 GB of data currently held as bytes and I convert it into a string, what will the impact on memory be, and how can I manage memory better when using a variable for such a large string?
Code
public byte[] ExtractMessage(int start, int end)
{
    if (end <= start)
        return null;

    byte[] message = new byte[end - start];
    int remaining = Size - end;
    Array.Copy(Frame, start, message, 0, message.Length);

    // Shift any remaining bytes to the front of the buffer
    if (remaining > 0)
        Array.Copy(Frame, end, Frame, 0, remaining);

    Size = remaining;
    ScanPosition = 0;
    return message;
}

byte[] rawMessage = Buffer.ExtractMessage(messageStart, messageEnd);
// Once bytes are received, I want to create an XML file which will be used for further work
string msg = Encoding.UTF8.GetString(rawMessage);
CreateXMLFile(msg);

public void CreateXMLFile(string msg)
{
    string fileName = "msg.xml";
    if (File.Exists(fileName))
    {
        File.Delete(fileName);
    }
    // Create the file and write the message; using ensures the writer is closed
    using (TextWriter tw = new StreamWriter(fileName, false))
    {
        tw.Write(msg);
    }
}
.NET strings are stored in memory as UTF-16, which means two bytes per character. As your data is UTF-8 (one byte per character for ASCII text), you'll roughly double the memory usage when converting it to a string.
Once you've converted the text to a string, nothing more happens unless you try to modify it. string objects are immutable, which means a new copy of the string is created each time you modify it using one of the methods such as Remove().
You can read more here: How are strings passed in .NET?
A byte array, however, is always passed by reference, and each change will be visible through all variables holding it. Thus changes will not hurt performance or memory consumption.
You can get a byte[] from a string by using var buffer = yourEncoding.GetBytes(yourString);. Common encodings can be accessed through static properties: var buffer = Encoding.UTF8.GetBytes(yourString);
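If the bytes are already UTF-8 XML, as in the question, one way to sidestep the giant string entirely, sketched here as a suggestion rather than a drop-in replacement, is to write the bytes straight to disk:
// Sketch: the byte[] -> string -> file round trip re-encodes multi-GB data twice;
// writing the received UTF-8 bytes directly avoids materializing the string at all.
byte[] rawMessage = Buffer.ExtractMessage(messageStart, messageEnd);
File.WriteAllBytes("msg.xml", rawMessage);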

How to use ICSharpCode.ZipLib with stream?

I'm very sorry for the vague title and my question itself, but I'm lost.
The samples provided with ICSharpCode.ZipLib don't include what I'm searching for.
I want to decompress a byte[] by putting it through InflaterInputStream (ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream).
I found a decompress function, but it doesn't work.
public static byte[] Decompress(byte[] Bytes)
{
    ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream stream =
        new ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream(new MemoryStream(Bytes));
    MemoryStream memory = new MemoryStream();
    byte[] writeData = new byte[4096];
    int size;
    while (true)
    {
        size = stream.Read(writeData, 0, writeData.Length);
        if (size > 0)
        {
            memory.Write(writeData, 0, size);
        }
        else break;
    }
    stream.Close();
    return memory.ToArray();
}
It throws an exception at the line size = stream.Read(writeData, 0, writeData.Length); saying the stream has an invalid header.
My question is not how to fix the function; the function is not provided with the library, I just found it while googling. My question is: how do I decompress the same way the function does, with InflaterInputStream, but without exceptions?
Thanks, and again, sorry for the vague question.
The code in Lucene is very nice:
public static byte[] Compress(byte[] input)
{
    // Create the compressor with the highest level of compression
    Deflater compressor = new Deflater();
    compressor.SetLevel(Deflater.BEST_COMPRESSION);

    // Give the compressor the data to compress
    compressor.SetInput(input);
    compressor.Finish();

    /*
     * Create an expandable byte array to hold the compressed data.
     * You cannot use an array that's the same size as the original because
     * there is no guarantee that the compressed data will be smaller than
     * the uncompressed data.
     */
    MemoryStream bos = new MemoryStream(input.Length);

    // Compress the data
    byte[] buf = new byte[1024];
    while (!compressor.IsFinished)
    {
        int count = compressor.Deflate(buf);
        bos.Write(buf, 0, count);
    }

    // Get the compressed data
    return bos.ToArray();
}

public static byte[] Uncompress(byte[] input)
{
    Inflater decompressor = new Inflater();
    decompressor.SetInput(input);

    // Create an expandable byte array to hold the decompressed data
    MemoryStream bos = new MemoryStream(input.Length);

    // Decompress the data
    byte[] buf = new byte[1024];
    while (!decompressor.IsFinished)
    {
        int count = decompressor.Inflate(buf);
        bos.Write(buf, 0, count);
    }

    // Get the decompressed data
    return bos.ToArray();
}
Well, it sounds like the data is just inappropriate and that otherwise the code would work okay. (Admittedly, I'd use a using statement for the streams instead of calling Close explicitly.)
Where did you get your data from?
Why don't you use the System.IO.Compression.DeflateStream class (available since .NET 2.0)? It uses the same compression/decompression method but doesn't require an extra library dependency.
Since .NET 2.0 you only need ICSharpCode.ZipLib if you need the file container support.
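For reference, a minimal sketch of that suggestion. One caveat, stated here as an assumption about the data: DeflateStream reads raw deflate data, while InflaterInputStream by default expects a zlib wrapper, so zlib-wrapped input would need its 2-byte header skipped first.
using System.IO;
using System.IO.Compression;

public static byte[] Decompress(byte[] bytes)
{
    // Assumes raw deflate input; for zlib-wrapped data, skip the 2-byte header first.
    using (MemoryStream input = new MemoryStream(bytes))
    using (DeflateStream inflater = new DeflateStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] chunk = new byte[4096];
        int size;
        while ((size = inflater.Read(chunk, 0, chunk.Length)) > 0)
        {
            output.Write(chunk, 0, size);
        }
        return output.ToArray();
    }
}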
