Memory Issue in string C# - c#

I have little test program
public class Test
{
public string Response { get; set; }
}
My console simply call Test class
class Program
{
static void Main(string[] args)
{
Test t = new Test();
using (StreamReader reader = new StreamReader("C:\\Test.txt"))
{
t.Response = reader.ReadToEnd();
}
t.Response = t.Response.Substring(0, 5);
Console.WriteLine(t.Response);
Console.Read();
}
}
I have appox 60 MB data in my Test.txt file. When the program get executes, it is taking lot of memory because string is immutable. What is the better way handle this kind of scenario using string.
I know that i can use string builder. but i have created this program to replicate a scenario in one of my production application which uses string.
when i tried with GC.Collect(), memory is released immediately. I am not sure whether i can call GC in code.
Please help. Thanks.
UPDATE:
I think i did not explain it clearly. sorry for the confusion.
I am just reading data from file to get huge data as don't want create 60MB of data in code.
My pain point is below line of code where i have huge data in Response field.
t.Response = t.Response.Substring(0, 5);

You could limit your reads to a block of bytes (buffer). Loop through and read the next block into your buffer and write that buffer out. This will prevent a large chunk of data being stored in memory.
using (StreamReader reader = new StreamReader(#"C:\Test.txt", true))
{
char[] buffer = new char[1024];
int idx = 0;
while (reader.ReadBlock(buffer, idx, buffer.Length) > 0)
{
idx += buffer.Length;
Console.Write(buffer);
}
}

Can you read your file line by line? If so, I would recommend calling:
IEnumerable<string> lines = File.ReadLines(path)
When you iterate this collection using
foreach(string line in lines)
{
// do something with line
}
the collection will be iterated using lazy evaluation. That means the entire contents of the file won't need to be kept in memory while you do something with each line.

StreamReader provides just version of Read that you looking for - Read(Char[], Int32, Int32) - which lets you pick out first characters of the stream. Alternatively you can read char-by-char with regular StreamReader.Read till you decided that you have enough.
var textBuffer = new char[5];
reader.ReadToEnd(textBuffer, 0, 5); // TODO: check if it actually read engough
t.Response = new string(textBuffer);
Note that if you know encoding of the stream you may use lower level reading as byte array and use System.Text.Encoding classes to construct strings with encoding yourself instead of relaying on StreamReader.

Related

Convert.ToBase64String throws 'System.OutOfMemoryException' for byte [] (file: large size)

I am trying to convert byte[] to base64 string format so that i can send that information to third party. My code as below:
byte[] ByteArray = System.IO.File.ReadAllBytes(path);
string base64Encoded = System.Convert.ToBase64String(ByteArray);
I am getting below error:
Exception of type 'System.OutOfMemoryException' was thrown. Can you
help me please ?
Update
I just spotted #PanagiotisKanavos' comment pointing to Is there a Base64Stream for .NET?. This does essentially the same thing as my code below attempts to achieve (i.e. allows you to process the file without having to hold the whole thing in memory in one go), but without the overhead/risk of self-rolled code / rather using a standard .Net library method for the job.
Original
The below code will create a new temporary file containing the Base64 encoded version of your input file.
This should have a lower memory footprint, since rather than doing all data at once, we handle it several bytes at a time.
To avoid holding the output in memory, I've pushed that back to a temp file, which is returned. When you later need to use that data for some other process, you'd need to stream it (i.e. so that again you're not consuming all of this data at once).
You'll also notice that I've used WriteLine instead of Write; which will introduce non base64 encoded characters (i.e. the line breaks). That's deliberate, so that if you consume the temp file with a text reader you can easily process it line by line.
However, you can amend per your needs.
void Main()
{
var inputFilePath = #"c:\temp\bigfile.zip";
var convertedDataPath = ConvertToBase64TempFile(inputFilePath);
Console.WriteLine($"Take a look in {convertedDataPath} for your converted data");
}
//inputFilePath = where your source file can be found. This is not impacted by the below code
//bufferSizeInBytesDiv3 = how many bytes to read at a time (divided by 3); the larger this value the more memory is required, but the better you'll find performance. The Div3 part is because we later multiple this by 3 / this ensures we never have to deal with remainders (i.e. since 3 bytes = 4 base64 chars)
public string ConvertToBase64TempFile(string inputFilePath, int bufferSizeInBytesDiv3 = 1024)
{
var tempFilePath = System.IO.Path.GetTempFileName();
using (var fileStream = File.Open(inputFilePath,FileMode.Open))
{
using (var reader = new BinaryReader(fileStream))
{
using (var writer = new StreamWriter(tempFilePath))
{
byte[] data;
while ((data = reader.ReadBytes(bufferSizeInBytesDiv3 * 3)).Length > 0)
{
writer.WriteLine(System.Convert.ToBase64String(data)); //NB: using WriteLine rather than Write; so when consuming this content consider removing line breaks (I've used this instead of write so you can easily stream the data in chunks later)
}
}
}
}
return tempFilePath;
}

Read memory mapped file to string

I am trying to read data from a memory mapped file, which is written to the memory file from a C++ program. I am able to use the debug method and write the data as a string from a loop. However, I want to convert the byte array to a usable string that I can then manipulate.
using (MemoryMappedFile mmf = MemoryMappedFile.OpenExisting("DataFile"))
{
using (MemoryMappedViewAccessor reader = mmf.CreateViewAccessor())
{
var bytes = new byte[reader.Capacity];
reader.ReadArray<byte>(0, bytes, 0, bytes.Length);
for(int i = 0; i<bytes.Length; i++)
{
System.Diagnostics.Debug.Write((char) bytes[i]);
}
}
}
I've tried removing the for loop and replacing it with a GetString() encoder, but it only returns a question mark character instead of the full data string.
So long as you know the encoding the bytes were written with this is easy, you would use for example the System.Text.UTF8Encoding class (probably the System.Text.Encoding.UTF8 instance) and the GetString method:
string str = System.Text.Encoding.UTF8.GetString(bytes, 0, bytes.Length);
If you don't know the encoding this becomes much harder, you would have to try and use a heuristic method and guess what the encoding actually is.

Advanced TextReader to EndOfFile

I have a textReader that in a specific instance I want to be able to advance to the end of file quickly so other classes that might hold a reference to this object will not be able to call tr.ReadLine() without getting a null.
This is a large file. I cannot use TextReader.ReadToEnd() as it will often lead to an OutOfMemoryException
I thought I would ask the community if there was a way SEEK the stream without using TextReader.ReadToEnd() which returns a string of all data in the file.
Current method, inefficient.
The following example code is a mock up. Obviously I am not opening a file with an if statement directly following it asking if I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
while(tr.ReadLine() != null) { }
}
Desired solution (Note this code block contains fake 'concept' methods or methods that cannot be used due to risk of outofmemoryexception)
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
// tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.
You have to discard any buffered data if you have read any file content - since data is buffered you might get content even if you seek the underlying stream to the end - working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem as mentioned in the documentation is
The StreamReader class buffers input from the underlying stream when
you call one of the Read methods. If you manipulate the position of
the underlying stream after reading data into the buffer, the position
of the underlying stream might not match the position of the internal
buffer. To reset the internal buffer, call the DiscardBufferedData
method
Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(#"Your Large File"))
{
reader.BaseStream.Seek(0, SeekOrigin.End);
int read = reader.Read();//read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt"))//test.txt is a file that has data and lines
{
((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
string foo = tr.ReadLine();
Debug.WriteLine(foo ?? "foo is null");//foo is null
int read = tr.Read();
Debug.WriteLine(read);//-1
}

How to merge efficiently gigantic files with C#

I have over 125 TSV files of ~100Mb each that I want to merge. The merge operation is allowed destroy the 125 files, but not the data. What matter is that a the end, I end up with a big file of the content of all the files one after the other (no specific order).
Is there an efficient way to do that? I was wondering if Windows provides an API to simply make a big "Union" of all those files? Otherwise, I will have to read all the files and write a big one.
Thanks!
So "merging" is really just writing the files one after the other? That's pretty straightforward - just open one output stream, and then repeatedly open an input stream, copy the data, close. For example:
static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (Stream output = File.OpenWrite(outputFile))
{
foreach (string inputFile in inputFiles)
{
using (Stream input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
That's using the Stream.CopyTo method which is new in .NET 4. If you're not using .NET 4, another helper method would come in handy:
private static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
There's nothing that I'm aware of that is more efficient than this... but importantly, this won't take up much memory on your system at all. It's not like it's repeatedly reading the whole file into memory then writing it all out again.
EDIT: As pointed out in the comments, there are ways you can fiddle with file options to potentially make it slightly more efficient in terms of what the file system does with the data. But fundamentally you're going to be reading the data and writing it, a buffer at a time, either way.
Do it from the command line:
copy 1.txt+2.txt+3.txt combined.txt
or
copy *.txt combined.txt
Do you mean with merge that you want to decide with some custom logic what lines go where? Or do you mean that you mainly want to concatenate the files into one big one?
In the case of the latter, it is possible that you don't need to do this programmatically at all, just generate one batch file with this (/b is for binary, remove if not needed):
copy /b "file 1.tsv" + "file 2.tsv" "destination file.tsv"
Using C#, I'd take the following approach. Write a simple function that copies two streams:
void CopyStreamToStream(Stream dest, Stream src)
{
int bytesRead;
// experiment with the best buffer size, often 65536 is very performant
byte[] buffer = new byte[GOOD_BUFFER_SIZE];
// copy everything
while((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
{
dest.Write(buffer, 0, bytesRead);
}
}
// then use as follows (do in a loop, don't forget to use using-blocks)
CopStreamtoStream(yourOutputStream, yourInputStream);
Using a folder of 100MB text files totalling ~12GB, I found that a small time saving could be made over the accepted answer by using File.ReadAllBytes and then writing that out to the stream.
[Test]
public void RaceFileMerges()
{
var inputFilesPath = #"D:\InputFiles";
var inputFiles = Directory.EnumerateFiles(inputFilesPath).ToArray();
var sw = new Stopwatch();
sw.Start();
ConcatenateFilesUsingReadAllBytes(#"D:\ReadAllBytesResult", inputFiles);
Console.WriteLine($"ReadAllBytes method in {sw.Elapsed}");
sw.Reset();
sw.Start();
ConcatenateFiles(#"D:\CopyToResult", inputFiles);
Console.WriteLine($"CopyTo method in {sw.Elapsed}");
}
private static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (var output = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
using (var input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
private static void ConcatenateFilesUsingReadAllBytes(string outputFile, params string[] inputFiles)
{
using (var stream = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
var currentBytes = File.ReadAllBytes(inputFile);
stream.Write(currentBytes, 0, currentBytes.Length);
}
}
}
ReadAllBytes method in 00:01:22.2753300
CopyTo method in 00:01:30.3122215
I repeated this a number of times with similar results.

Read data in FileStream into a generic Stream

What's the most efficient way to read a stream into another stream? In this case, I'm trying to read data in a Filestream into a generic stream. I know I could do the following:
1. read line by line and write the data to the stream
2. read chunks of bytes and write to the stream
3. etc
I'm just trying to find the most efficient way.
Thanks
Stephen Toub discusses a stream pipeline in his MSDN .NET matters column here. In the article he describes a CopyStream() method that copies from one input stream to another stream. This sounds quite similar to what you're trying to do.
I rolled together a quick extension method (so VS 2008 w/ 3.5 only):
public static class StreamCopier
{
private const long DefaultStreamChunkSize = 0x1000;
public static void CopyTo(this Stream from, Stream to)
{
if (!from.CanRead || !to.CanWrite)
{
return;
}
var buffer = from.CanSeek
? new byte[from.Length]
: new byte[DefaultStreamChunkSize];
int read;
while ((read = from.Read(buffer, 0, buffer.Length)) > 0)
{
to.Write(buffer, 0, read);
}
}
}
It can be used thus:
using (var input = File.OpenRead(#"C:\wrnpc12.txt"))
using (var output = File.OpenWrite(#"C:\wrnpc12.bak"))
{
input.CopyTo(output);
}
You can also swap the logic around slightly and write a CopyFrom() method as well.
Reading a buffer of bytes and then writing it is fastest. Methods like ReadLine() need to look for line delimiters, which takes more time than just filling a buffer.
I assume by generic stream, you mean any other kind of stream, like a Memory Stream, etc.
If so, the most efficient way is to read chunks of bytes and write them to the recipient stream. The chunk size can be something like 512 bytes.

Categories

Resources