How does memory management work between bytes and strings? - C#

In C#, if I have 4-5 GB of data that is currently in the form of bytes and I convert it to a string, what will the impact on memory be, and how can I manage memory better when working with a variable holding such a large string?
Code
public byte[] ExtractMessage(int start, int end)
{
    if (end <= start)
        return null;

    byte[] message = new byte[end - start];
    int remaining = Size - end;
    Array.Copy(Frame, start, message, 0, message.Length);
    // Shift any remaining bytes to front of the buffer
    if (remaining > 0)
        Array.Copy(Frame, end, Frame, 0, remaining);
    Size = remaining;
    ScanPosition = 0;
    return message;
}
byte[] rawMessage = Buffer.ExtractMessage(messageStart, messageEnd);
// Once bytes are received, I want to create an XML file that will be used for further work
string msg = Encoding.UTF8.GetString(rawMessage);
CreateXMLFile(msg);
public void CreateXMLFile(string msg)
{
    string fileName = "msg.xml";
    if (File.Exists(fileName))
    {
        File.Delete(fileName);
    }
    using (File.Create(fileName)) { };

    TextWriter tw = new StreamWriter(fileName, true);
    tw.Write(msg);
    tw.Close();
}

.NET strings are stored as UTF-16, which means two bytes per character (for the characters you are likely dealing with). Since your source bytes are UTF-8, converting to a string will roughly double the memory usage.
Once you've converted the text to a string, nothing more will happen unless you try to modify it. string objects are immutable, which means that a new copy of the string is created each time you modify it using one of the methods such as Remove().
You can read more here: How are strings passed in .NET?
A byte array, however, is always passed by reference, and each change is visible through every variable holding it, so changes will not hurt performance or memory consumption.
You can get a byte[] from a string by using var buffer = yourEncoding.GetBytes(yourString);. Common encodings are available as static properties: var buffer = Encoding.UTF8.GetBytes(yourString);
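In your case you may not need the large string at all: the bytes are already UTF-8, and CreateXMLFile simply writes them back out as text. A minimal sketch (assuming the payload really is UTF-8 XML that can be written verbatim) that never materializes the doubled-size string:
byte[] rawMessage = Buffer.ExtractMessage(messageStart, messageEnd);
// FileMode.Create truncates or creates the file, replacing the Delete/Create dance,
// and the UTF-8 bytes reach the disk without ever becoming a UTF-16 string.
using (var fs = new FileStream("msg.xml", FileMode.Create, FileAccess.Write))
{
    fs.Write(rawMessage, 0, rawMessage.Length);
}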

.NET: efficient way to read a binary file into memory, then access it

I'm a novice programmer. I'm creating a library to process binary files of a certain type -- like a codec (though without a need to process a progressive stream coming over a wire). I'm looking for an efficient way to read the file into memory and then parse portions of the data as needed. In particular, I'd like to avoid large memory copies, hopefully without a lot of added complexity to avoid that.
In some situations, I want to do sequential reading of values in the data. For this, a MemoryStream works well.
FileStream fs = new FileStream(_fileName, FileMode.Open, FileAccess.Read);
byte[] bytes = new byte[fs.Length];
fs.Read(bytes, 0, bytes.Length);
_ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
fs.Close();
(That involved a copy from the bytes array into the memory stream; that's one time, and I don't know of a way to avoid it.)
With the memory stream, it's easy to seek to arbitrary positions and then start reading structure members. E.g.,
_ms.Seek(_tableRecord.Offset, SeekOrigin.Begin);
byte[] ab32 = new byte[4];
_ms.Read(ab32, 0, ab32.Length);
_version = ConvertToUint(ab32);
_ms.Read(ab32, 0, ab32.Length);
_numRecords = ConvertToUint(ab32);
// etc.
But there may also be times when I want to take a slice out of the memory corresponding to some large structure and then pass into a method for certain processing. MemoryStream doesn't support that. I could always pass the MemoryStream plus offset and length, though that might not always be the most convenient.
Instead of MemoryStream, I could store the data in memory using Memory<byte>. That supports slicing, but not sequential reading.
If for some situation I want to get a slice (rather than pass stream & offset/length), I could construct an ArraySegment from MemoryStream.GetBuffer.
ArraySegment<byte> segment = new ArraySegment<byte>(ms.GetBuffer(), offset, length); // ('as' is a reserved word, so a different variable name is needed)
It's not clear to me, though, if that will result in a (potentially large) copy, or if that uses a reference into the same memory held by the MemoryStream. I gather that GetBuffer exposes the underlying memory rather than providing a copy; and that ArraySegment will point into the same memory?
There will be times when I need to get a slice that is a copy as I'll need to modify some elements and then process that, but without changing the original. If ArraySegment gets a reference rather than a copy, I gather I could use ArraySegment<byte>.ToArray()?
So, my questions are:
Is MemoryStream the best approach? Is there any other type that allows sequential reading like MemoryStream but also allows slicing like Memory?
If I want a slice without copying memory, will ArraySegment<byte>(ms.GetBuffer(), offset, length) do that?
Then if I need a copy that can be modified without affecting the original, use ArraySegment<byte>.ToArray()?
Is there a way to read the data from a file directly into a new MemoryStream without creating a temporary byte array that gets copied?
Am I approaching all this the best way?
To get the initial MemoryStream from reading the file, the following works:
byte[] bytes;
try
{
    // File.ReadAllBytes opens a FileStream and then ensures it is closed
    bytes = File.ReadAllBytes(_fi.FullName);
    _ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
}
catch (IOException)
{
    // rethrow without resetting the stack trace ('throw e;' would reset it)
    throw;
}
File.ReadAllBytes() copies the file content into memory. Internally it wraps the stream in a using block, which guarantees the file gets closed, so no finally block is needed.
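As a partial answer to the question about reading the file without a temporary array: you can at least avoid the second full-size byte[] by streaming the file into a pre-sized MemoryStream; the copy still happens, but through a small transfer buffer. A sketch, reusing _fi and _ms from the code above:
using (var fs = File.OpenRead(_fi.FullName))
{
    // Pre-size the stream so its internal array is allocated exactly once.
    _ms = new MemoryStream(capacity: checked((int)fs.Length));
    // CopyTo moves the data through a small internal buffer (80 KB by default).
    fs.CopyTo(_ms);
    _ms.Position = 0;
}
A MemoryStream created this way still permits GetBuffer(), so the slicing techniques discussed below apply unchanged.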
I can read individual values from the MemoryStream using MemoryStream.Read. These calls involve copies of those values, which is fine.
In one situation, I needed to read a table out of the file, change a value, and then calculate a checksum of the entire file with that change in place. Instead of copying the entire file so that I could edit one part, I was able to calculate the checksum in progressive steps: first on the initial, unchanged segment of the file, then continue with the middle segment that was changed, then continue with the remainder.
For this I could process the first and final segments using the MemoryStream. This involved lots of reads, with each read copying; but those copies were transient variables, so no significant working set increase.
For the middle segment, that needed to be copied since it had to be changed (but the original version needed to be kept intact). The following worked:
// get ref (not copy!) to the byte array underlying the MemoryStream
byte[] fileData = _ms.GetBuffer();
// determine the required length
int length = _tableRecord.Length;
// create array to hold the copy
byte[] segmentCopy = new byte[length];
// get the copy
Array.ConstrainedCopy(fileData, _tableRecord.Offset, segmentCopy, 0, length);
After modifying values in segmentCopy, I then needed to pass this to my static method for calculating checksums, which expected a MemoryStream (for sequential reading). This worked:
// new MemoryStream will hold a ref to the segmentCopy array (no new copy!)
MemoryStream ms = new MemoryStream(segmentCopy, 0, segmentCopy.Length);
What I haven't needed to do yet, but will want to do, is to get a slice of the MemoryStream that doesn't involve copying. This works:
MemoryStream sliceFromMS = new MemoryStream(fileData, offset, length);
From above, fileData was a ref to the array underlying the original MemoryStream. Now sliceFromMS will have a ref to a segment within that same array.
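For what it's worth, Memory<byte> can play the same slicing role over the same underlying array; a sketch (assuming a runtime where Memory<T> is available):
byte[] fileData = _ms.GetBuffer();
// A copy-free, bounds-checked view into the same array; slicing a
// Memory<byte> never copies the underlying bytes.
Memory<byte> slice = fileData.AsMemory(_tableRecord.Offset, _tableRecord.Length);
// Only ToArray() materializes an independent copy that is safe to modify.
byte[] segmentCopy = slice.ToArray();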
You can use FileStream.Seek directly; as I understand it, there is no need to load the data into memory just to get this seeking behaviour from a MemoryStream.
In the following example, str1 and str2 are equal:
using (var fs = new FileStream(@"C:\Users\bar_v\OneDrive\Desktop\js_balancer.txt", FileMode.Open))
{
    var buffer = new byte[20];
    fs.Read(buffer, 0, 20);
    var str1 = Encoding.ASCII.GetString(buffer);

    fs.Seek(0, SeekOrigin.Begin);
    fs.Read(buffer, 0, 20);
    var str2 = Encoding.ASCII.GetString(buffer);
}
By the way, when you create a new MemoryStream object, you don’t copy the byte array, you just keep a reference to it:
public MemoryStream(byte[] buffer, bool writable)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);

    _buffer = buffer;
    _length = _capacity = buffer.Length;
    _writable = writable;
    _exposable = false;
    _origin = 0;
    _isOpen = true;
}
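You can observe this aliasing directly; a small sketch:
var data = Encoding.ASCII.GetBytes("hello");
var ms = new MemoryStream(data, writable: false);

data[0] = (byte)'H';        // mutate the original array
int first = ms.ReadByte();  // 72 ('H'): the stream sees the change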
But when reading, as we can see, copying occurs:
public override int Read(byte[] buffer, int offset, int count)
{
    if (buffer == null)
        throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
    if (offset < 0)
        throw new ArgumentOutOfRangeException(nameof(offset), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (count < 0)
        throw new ArgumentOutOfRangeException(nameof(count), SR.ArgumentOutOfRange_NeedNonNegNum);
    if (buffer.Length - offset < count)
        throw new ArgumentException(SR.Argument_InvalidOffLen);

    EnsureNotClosed();

    int n = _length - _position;
    if (n > count)
        n = count;
    if (n <= 0)
        return 0;

    Debug.Assert(_position + n >= 0, "_position + n >= 0"); // len is less than 2^31 - 1.

    if (n <= 8)
    {
        int byteCount = n;
        while (--byteCount >= 0)
            buffer[offset + byteCount] = _buffer[_position + byteCount];
    }
    else
        Buffer.BlockCopy(_buffer, _position, buffer, offset, n);

    _position += n;
    return n;
}

Read large file into byte array and encode it to ToBase64String

I have implemented a POC to read entire file content into a byte[] array. It works for files below 100MB, but when I load a file of more than 100MB it throws:
Convert.ToBase64String(mybytearray) Cannot obtain value of the local variable or argument because there is not enough memory available.
Below is the code I have tried for reading content from a file into a byte array:
var sFile = fileName;
var mybytearray = File.ReadAllBytes(sFile);
var binaryModel = new BinaryModel
{
    fileName = binaryFile.FileName,
    binaryData = Convert.ToBase64String(mybytearray),
    filePath = string.Empty
};
My model class is as below
public class BinaryModel
{
    public string fileName { get; set; }
    public string binaryData { get; set; }
    public string filePath { get; set; }
}
I am getting "Convert.ToBase64String(mybytearray) Cannot obtain value of the local variable or argument because there is not enough memory available." this error at Convert.ToBase64String(mybytearray).
Is there anything which I need to take care to prevent this error?
Note: I do not want to add line breaks to my file content
To save memory you can convert the stream of bytes in 3-byte packs: every three input bytes produce four Base64 characters, so you never need the whole file in memory at once.
Here is pseudocode:
Repeat
1. Try to read max 3 bytes from stream
2. Convert to base64, write to output stream
And simple implementation:
using (var inStream = File.OpenRead("E:\\Temp\\File.xml"))
using (var outStream = File.CreateText("E:\\Temp\\File.base64"))
{
    var buffer = new byte[3];
    int read;
    while ((read = inStream.Read(buffer, 0, 3)) > 0)
    {
        var base64 = Convert.ToBase64String(buffer, 0, read);
        outStream.Write(base64);
    }
}
Hint: every multiple of 3 is a valid buffer size. Larger means more memory but better performance; smaller means less memory but worse performance.
Additional info:
The file stream here is just an example. For a web response, use [HttpContext].Response.OutputStream as the output stream and write directly to it. Processing hundreds of megabytes in one chunk will kill you and your server.
Think about total memory requirements: 100 MB of file data leads to roughly 133 MB of Base64 data, and since you wrote about a model, I expect a copy of that 133 MB in the response as well. And remember, that is just a single request; a few such requests could drain your memory.
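As an aside, the framework can do the 3-byte blocking for you: a ToBase64Transform wrapped in a CryptoStream (both from System.Security.Cryptography) streams Base64 without ever holding the whole file. A sketch of the same idea:
using (var inStream = File.OpenRead("E:\\Temp\\File.xml"))
using (var outStream = File.Create("E:\\Temp\\File.base64"))
using (var base64Stream = new CryptoStream(outStream, new ToBase64Transform(), CryptoStreamMode.Write))
{
    // Bytes written into the CryptoStream come out as Base64 ASCII in outStream.
    inStream.CopyTo(base64Stream);
}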
I would use two FileStreams - one to read the large file, and one to write the result back out.
So, in chunks, you would convert to Base64 ... then convert the resulting string to bytes ... and write.
private static void ConvertLargeFileToBase64()
{
    // Buffer length must be a multiple of 3 so that no Base64 padding ('=')
    // is produced in the middle of the output, only at the very end.
    var buffer = new byte[3 * 4096];
    using (var fsIn = new FileStream("D:\\in.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        using (var fsOut = new FileStream("D:\\out.txt", FileMode.CreateNew, FileAccess.Write))
        {
            int read;
            while ((read = fsIn.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Convert only the bytes actually read, then to ASCII bytes for writing
                var b64 = Encoding.ASCII.GetBytes(Convert.ToBase64String(buffer, 0, read));
                // Write the whole Base64 chunk (it is longer than 'read' bytes)
                fsOut.Write(b64, 0, b64.Length);
            }
        }
    }
}

Read memory mapped file to string

I am trying to read data from a memory mapped file, which is written to the memory file from a C++ program. I am able to use the debug method and write the data as a string from a loop. However, I want to convert the byte array to a usable string that I can then manipulate.
using (MemoryMappedFile mmf = MemoryMappedFile.OpenExisting("DataFile"))
{
    using (MemoryMappedViewAccessor reader = mmf.CreateViewAccessor())
    {
        var bytes = new byte[reader.Capacity];
        reader.ReadArray<byte>(0, bytes, 0, bytes.Length);
        for (int i = 0; i < bytes.Length; i++)
        {
            System.Diagnostics.Debug.Write((char)bytes[i]);
        }
    }
}
I've tried removing the for loop and replacing it with a GetString() decoder, but it only returns a question mark character instead of the full data string.
So long as you know the encoding the bytes were written with, this is easy. You would use, for example, the System.Text.UTF8Encoding class (most conveniently through the System.Text.Encoding.UTF8 instance) and its GetString method:
string str = System.Text.Encoding.UTF8.GetString(bytes, 0, bytes.Length);
If you don't know the encoding this becomes much harder, you would have to try and use a heuristic method and guess what the encoding actually is.
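One caveat specific to the memory-mapped case: the view's Capacity is rounded up to a page boundary, so the byte array is usually longer than the real data and padded with trailing zero bytes, which can decode as garbage. A sketch that trims at the first NUL (assuming the C++ writer NUL-terminates and the text itself contains no embedded NULs):
// Capacity is page-aligned, so the tail of 'bytes' is typically zero padding;
// decode only up to the first NUL byte.
int dataLength = Array.IndexOf(bytes, (byte)0);
if (dataLength < 0)
    dataLength = bytes.Length;

string str = Encoding.UTF8.GetString(bytes, 0, dataLength);
And if the C++ program wrote wide characters (wchar_t on Windows), Encoding.Unicode is the decoder to try instead of UTF8; decoding UTF-16 data with a one-byte encoding is a classic source of the question-mark output described above.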

C++/zlib/gzip compression and C# GZipStream decompression fails

I know there's a ton of questions about zlib/gzip etc but none of them quite match what I'm trying to do (or at least I haven't found it). As a quick overview, I have a C# server that decompresses incoming strings using a GZipStream. My task is to write a C++ client that will compress a string compatible with GZipStream decompression.
When I use the code below I get an error that says "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream." I understand what the magic number is and everything, I just don't know how to magically set it properly.
Finally, I'm using the C++ zlib nuget package but have also used the source files directly from zlib with the same bad luck.
Here's a more in depth view:
The server's function for decompression
public static string ReadMessage(NetworkStream stream)
{
    byte[] buffer = new byte[512];
    StringBuilder messageData = new StringBuilder();
    GZipStream gzStream = new GZipStream(stream, CompressionMode.Decompress, true);
    int bytes = 0;

    while (true)
    {
        try
        {
            bytes = gzStream.Read(buffer, 0, buffer.Length);
        }
        catch (InvalidDataException ex)
        {
            Console.WriteLine($"Busted: {ex.Message}");
            return "";
        }

        // Use Decoder class to convert from bytes to Default
        // in case a character spans two buffers.
        Decoder decoder = Encoding.Default.GetDecoder();
        char[] chars = new char[decoder.GetCharCount(buffer, 0, bytes)];
        decoder.GetChars(buffer, 0, bytes, chars, 0);
        messageData.Append(chars);
        Console.WriteLine(messageData);

        // Check for EOF or an empty message.
        if (messageData.ToString().IndexOf("<EOF>", StringComparison.Ordinal) != -1)
            break;
    }

    int eof = messageData.ToString().IndexOf("<EOF>", StringComparison.Ordinal);
    string message = messageData.ToString().Substring(0, eof).Trim();
    // Returns message without ending EOF
    return message;
}
To sum it up, it accepts a NetworkStream in, gets the compressed string, decompresses it, adds it to a string, and loops until it finds <EOF> which is removed then returns the final decompressed string. This is almost a match from the example off of MSDN.
Here's the C++ client side code:
char* CompressString(char* message)
{
    int messageSize = sizeof(message);

    // Compress string
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    zs.zalloc = Z_NULL;
    zs.zfree = Z_NULL;
    zs.opaque = Z_NULL;
    zs.next_in = reinterpret_cast<Bytef*>(message);
    zs.avail_in = messageSize;

    int iResult = deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED, (MAX_WBITS + 16), 8, Z_DEFAULT_STRATEGY);
    if (iResult != Z_OK) zerr(iResult);

    int ret;
    char* outbuffer = new char[messageSize];
    std::string outstring;

    // retrieve the compressed bytes blockwise
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);
        ret = deflate(&zs, Z_FINISH);
        if (outstring.size() < zs.total_out) {
            // append the block to the output string
            outstring.append(outbuffer, zs.total_out - outstring.size());
        }
    } while (ret == Z_OK);

    deflateEnd(&zs);
    if (ret != Z_STREAM_END) { // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib compression: (" << ret << ") " << zs.msg;
        throw std::runtime_error(oss.str());
    }
    return &outstring[0u];
}
Long story short here, it accepts a string and goes through a pretty standard zlib compression with the WBITS being set to wrap it in a gzip header/footer. It then returns a char* of the compressed input. This is what is sent to the server above to be decompressed.
Thanks for any help you can give me! Also, let me know if you need any more information.
In your CompressString function you return a char* obtained from a locally declared std::string. The string is destroyed when the function returns, which releases the memory at the pointer you've returned.
It's likely that something is then allocated in that memory region and writes over your compressed data before it gets sent.
You need to ensure the memory containing the compressed data remains allocated until it has been sent, perhaps by passing a std::string& into the function and storing the result in that.
An unrelated bug: you do char* outbuffer = new char[messageSize]; but there is no call to delete[] for that buffer. This will result in a memory leak. As you're throwing exceptions from this function too, I would recommend using std::unique_ptr<char[]> instead of trying to manage this manually with your own delete[] calls. In fact I would always recommend std::unique_ptr instead of explicit calls to delete where possible.
Also note that int messageSize = sizeof(message); yields the size of a pointer (typically 4 or 8 bytes), not the length of the string, so only the first few bytes are ever compressed; strlen(message) is what you want. The same applies to zs.avail_out = sizeof(outbuffer);, which should be the allocated buffer size, not the size of the pointer.

StreamReader.Read text out of order

I'm trying to send the Base64 string of a screenshot to the server via NetworkStream, and it appears I'm receiving the full string; the problem is that it's scrambled...
I assume this has something to do with it being fragmented and put back together? What would be the appropriate way to go about this...
Client Code
byte[] ImageBytes = Generics.Imaging.ImageToByte(Generics.Imaging.Get_ScreenShot_In_Bitmap());
string StreamData = "REMOTEDATA|***|" + Convert.ToBase64String(ImageBytes);
SW.WriteLine(StreamData);
SW.Flush();
Server Code
char[] ByteData = new char[350208];
SR.Read(ByteData, 0, 350208);
string Data = new string(ByteData);
File.WriteAllText("C:\\RecievedText", Data);
Also, the size of the sent message and the char array are exactly the same.
EDIT:
After messing around with it some more, I realized the text isn't scrambled; the proper text is trailing the previous stream. How can I ensure the stream is clear, or that I get the entire text?
It's likely that you're not reading all of the previous response. You have to read in a loop until you get no data, like this:
char[] ByteData = new char[350208];
int totalChars = 0;
int charsRead;
while ((charsRead = SR.Read(ByteData, totalChars, ByteData.Length - totalChars)) != 0)
{
    totalChars += charsRead;
}
string Data = new string(ByteData, 0, totalChars);
File.WriteAllText("C:\\RecievedText", Data);
The key here is that StreamReader.Read reads up to the maximum number of characters you told it to. If there aren't that many characters immediately available, it will read what's available and return those. The return value tells you how many it read. You have to keep reading until you get the number of characters you want, or until Read returns 0.
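A more robust alternative (not part of the answer above, just a common pattern) is to frame each message with a length prefix, so the reader knows exactly where one message ends and the next begins. A sketch over the raw stream (assuming both machines are little-endian, which BitConverter depends on):
// Writer: send a 4-byte length, then the payload.
byte[] payload = Encoding.UTF8.GetBytes(StreamData);
netStream.Write(BitConverter.GetBytes(payload.Length), 0, 4);
netStream.Write(payload, 0, payload.Length);

// Reader: read exactly 4 bytes of length, then exactly that many payload bytes.
byte[] ReadExactly(Stream stream, int count)
{
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException("Connection closed mid-message.");
        offset += read;
    }
    return buffer;
}

int length = BitConverter.ToInt32(ReadExactly(netStream, 4), 0);
string message = Encoding.UTF8.GetString(ReadExactly(netStream, length));
Here netStream is a hypothetical NetworkStream variable standing in for the SW/SR pair above; the point is to do the framing on raw bytes rather than through a StreamReader.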
