Write large data into MemoryStream with C# [duplicate]

This question already has answers here:
Failed to write large amount of data to stream
(2 answers)
Closed 1 year ago.
I have a big stream (4 GB). I need to replace some characters in it (one specific character with 2 or 3 others); I get the stream from a service and I have to return a stream back.
This is what I'm doing
private static Stream UpdateStream(Stream stream, string oldCharacters, string newCharacters, int size = 2048)
{
    stream.Position = 0;
    StreamReader reader = new StreamReader(stream);
    MemoryStream outputStream = new MemoryStream();
    StreamWriter writer = new StreamWriter(outputStream);
    writer.AutoFlush = true;
    char[] buffer = new char[size];
    while (!reader.EndOfStream)
    {
        // Read returns how many characters were actually read; on the last
        // chunk the tail of the buffer still holds data from the previous pass.
        int read = reader.Read(buffer, 0, buffer.Length);
        if (read > 0)
        {
            string chunk = new string(buffer, 0, read);
            // Note: a replacement target spanning two chunks would be missed.
            string newChunk = chunk.Replace(oldCharacters, newCharacters);
            writer.Write(newChunk);
        }
    }
    return outputStream;
}
But I'm getting an OutOfMemoryException at some point on the line below, even though, looking at the computer's memory, I still have plenty available.
writer.Write(newChunk);
Any advice?

This is not an answer, but I couldn't possibly fit it into a comment.
Currently your problem is not solvable without making some assumptions. The problem, as I hear it, is that you want to replace some parts of a large body of text saved in a file and save the modified text in the file again.
Some unknown variables:
How long are those strings you are replacing?
How long are the strings you are replacing them with? The same length as the replaced strings?
What kinds of strings are you looking to replace? A single word? A whole sentence? A whole paragraph?
A solution to your problem would be to read the file into memory in chunks, replace the necessary text, save the "updated" text in a new file, and then finally rename the "new file" to the name of the old file. However, without knowing the answers to the above points, you could in principle need to replace a string as long as the entire text of the file (unlikely, yes). That would mean reading the whole file into memory before any of the text can be replaced, which is what causes an OutOfMemoryException. (Yes, you could do some clever scanning to replace such large strings without reading it all into memory at once, but I doubt such a solution is necessary here.)
Please edit your question to address the above points.
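For what it's worth, here is a minimal sketch of that chunked approach, assuming the text being replaced is a single character (as in the question), so a match can never straddle a chunk boundary:
private static Stream UpdateStreamViaTempFile(Stream input, string oldCharacters, string newCharacters)
{
    string tempPath = Path.GetTempFileName();
    input.Position = 0;
    using (var reader = new StreamReader(input))
    using (var writer = new StreamWriter(File.Create(tempPath)))
    {
        char[] buffer = new char[8192];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            writer.Write(new string(buffer, 0, read).Replace(oldCharacters, newCharacters));
    }
    // DeleteOnClose removes the temp file once the caller disposes the returned stream.
    return new FileStream(tempPath, FileMode.Open, FileAccess.Read,
                          FileShare.Read, 4096, FileOptions.DeleteOnClose);
}
This keeps memory use flat regardless of the input size, at the cost of temporary disk space.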

So to make it work I had to:
use the HugeMemoryStream class from this post: Failed to write large amount of data to stream
set the gcAllowVeryLargeObjects parameter to true
and set the build to 64-bit (with "Prefer 32-bit" unchecked)
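For reference, the gcAllowVeryLargeObjects element lives under runtime in App.config (a sketch of the documented .NET Framework setting):
<configuration>
  <runtime>
    <!-- allows arrays, and hence MemoryStream backing buffers, over 2 GB on 64-bit -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>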

Related

How to read a very huge file from the end in C#? [duplicate]

This question already has answers here:
Read from a file starting at the end, similar to tail
(4 answers)
Closed 8 years ago.
I have a text file of size 4.5 GB, which was created by one of the experimental applications. On my system with 8 GB of RAM, this file cannot be opened (tried with Sublime Text). I just want to read the content of this file from the end, say 45 characters from the end, without loading the entire file into memory. How can I do this?
Built-in methods like
StreamReader.ReadBlock(char[] buffer, int index, int count)
take an int for index, but in my case the index is out of the range of an int value. If only there were an overload with a long. How do I overcome this? Thanks.
A Stream has long properties named Position and Length. You can set Position to any value in the range [0, Length], for example fileStream.Position = fileStream.Length - 45;.
Or, you could use the Stream.Seek function: fileStream.Seek(-45, SeekOrigin.End).
Use the more readable one depending on the surrounding code and situation, the one that best conveys your intention.
Here's some code:
using (var fileStream = File.OpenRead("myfile.txt"))
using (var streamReader = new StreamReader(fileStream, Encoding.ASCII))
{
    if (fileStream.Length >= 45)
        fileStream.Seek(-45, SeekOrigin.End);
    var value = streamReader.ReadToEnd(); // last 45 chars
}
This is easier since you know your file is encoded in ASCII. Otherwise, you'd have to read a bit of text from the start of the file with the reader to let it detect the encoding (if a BOM is present), and only then seek to the position you want to read from.
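A sketch of that dance, assuming a BOM is present (and note that with a variable-width encoding, 45 bytes is not necessarily 45 characters):
using (var fileStream = File.OpenRead("myfile.txt"))
using (var streamReader = new StreamReader(fileStream, detectEncodingFromByteOrderMarks: true))
{
    streamReader.Peek(); // forces the reader to sniff the BOM and pick an encoding
    if (fileStream.Length >= 45)
        fileStream.Seek(-45, SeekOrigin.End);
    streamReader.DiscardBufferedData(); // drop anything buffered before the seek
    var tail = streamReader.ReadToEnd();
}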

Read Extremely large file efficiently in C#. Currently using StreamReader

I have a Json file that is sized 50GB and beyond.
Following is what I have written to read a very small chunk of the Json. I now need to modify this to read the large file.
internal static IEnumerable<T> ReadJson<T>(string filePath)
{
    DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(T));
    using (StreamReader sr = new StreamReader(filePath))
    {
        string line;
        // Read lines from the file until the end of the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            byte[] jsonBytes = Encoding.UTF8.GetBytes(line);
            XmlDictionaryReader jsonReader = JsonReaderWriterFactory.CreateJsonReader(jsonBytes, XmlDictionaryReaderQuotas.Max);
            var myPerson = ser.ReadObject(jsonReader);
            jsonReader.Close();
            yield return (T)myPerson;
        }
    }
}
Would it suffice to specify the buffer size while constructing the StreamReader in the current code?
Please correct me if I am wrong here... The buffer size basically specifies how much data is read from disk into memory at a time. So if a file is 100 MB in size with a buffer size of 5 MB, it reads 5 MB at a time into memory, until the entire file is read.
Assuming my understanding of point 3 is right, what would be the ideal buffer size for such a large text file? Would int.MaxValue be a bad idea? On a 64-bit PC, int.MaxValue is 2,147,483,647. I presume the buffer size is in bytes, so that evaluates to about 2 GB. Allocating that alone could take time. I was looking at something like 100 MB - 300 MB as the buffer size.
It is going to read a line at a time (of the input file), which could be 10 bytes or could be all 50 GB. So it comes down to: how is the input file structured? And if the input JSON has newlines other than cleanly at the breaks between objects, this could get really ugly.
The buffer size might impact how much it reads while looking for the end of each line, but ultimately it needs to find a newline each time (at least, as the code is currently written).
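For reference, the buffer being discussed is the one you can pass to a StreamReader constructor overload; a sketch (values in the tens of KB to a few MB are typical, and the 100-300 MB range buys nothing):
// bufferSize is in bytes; 1 << 20 gives a 1 MB internal buffer.
using (var sr = new StreamReader(filePath, Encoding.UTF8,
                                 detectEncodingFromByteOrderMarks: true,
                                 bufferSize: 1 << 20))
{
    // ... read lines as before
}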
I think you should first compare different parsers before worrying about details such as the buffer size.
The differences between DataContractJsonSerializer, Raven JSON, and Newtonsoft JSON will be quite significant.
So your main issue with this is where your boundaries are. Given that your doc is a JSON doc, it seems likely to me that your boundaries are classes; I assume (or hope) that you don't have one honking great class that is 50 GB large. I also assume that you don't really need all those classes in memory, but you may need to search the whole thing for your subset... does that sound roughly right? If so, I think your pseudo code is something like (see the sketch after this list):
using a Json parser that accepts a streamreader (newtonsoft?)
read and parse until eof
yield return your parsed class that matches criteria
read and parse next class
end
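A sketch of that pseudo code using Json.NET's streaming JsonTextReader (assuming Newtonsoft is acceptable and the document is one large JSON array of objects; adjust to your actual structure):
internal static IEnumerable<T> ReadJsonStreaming<T>(string filePath)
{
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(filePath))
    using (var reader = new JsonTextReader(sr))
    {
        while (reader.Read())
        {
            // Each '{' token starts one object; only that object is
            // materialized in memory at a time.
            if (reader.TokenType == JsonToken.StartObject)
                yield return serializer.Deserialize<T>(reader);
        }
    }
}
Filtering for the classes that match your criteria can then happen on the consuming side of the enumerable.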

Using StreamReader and StreamWriter to Modify Files

I am trying to use StreamReader and StreamWriter to Open a text file (fixed width) and to modify a few specific columns of data. I have dates with the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates from a file using StreamReader, then convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to write to a specific position, and I am beginning to wonder if it is possible.
I have the following code snippet.
System.IO.StreamWriter writer;
this.fileName = @"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); // substring containing the date
reader.Close();
...
// Convert currentLine to a packed field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following. After running:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = ASCII characters SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is a picture of the date "20120123" packed in COMP-3. This is my end result; I have included it because I wasn't sure whether it would affect possible answers.
My question is: how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?
I have always found it better to read the input file, filter/process the data, and write the output to a temporary file. When finished, delete the original file (or make a backup) and copy the temporary file over. This way you haven't lost half your input file in case something goes wrong in the middle of processing.
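A minimal sketch of that pattern; TransformLine is a hypothetical stand-in for your conversion (for genuine COMP-3 output, which is binary, you would write bytes via a FileStream rather than a text writer):
string path = @"C:\Test9.txt";
string tempPath = path + ".tmp";

using (var reader = new StreamReader(File.OpenRead(path)))
using (var writer = new StreamWriter(File.Create(tempPath)))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        writer.WriteLine(TransformLine(line)); // hypothetical per-record conversion
}

File.Replace(tempPath, path, path + ".bak"); // swap in the new file, keeping a backup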
You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.
However, I haven't found a way to use StreamWriter to write to a specific position, and I am beginning to wonder if it is possible.
You can use the StreamWriter.BaseStream.Seek method:
using (StreamWriter wr = new StreamWriter(File.Create(@"c:\Temp\aaa.txt")))
{
    wr.Write("ABC");
    wr.Flush();
    wr.BaseStream.Seek(0, SeekOrigin.Begin);
    wr.Write("Z");
}

FileStream only reading the first 4 characters of the file

Hey there! I'm trying to read a 150 MB file with a FileStream, but every time I do, all I get is |zl instead of the whole stream. Note that it has some special characters in it.
Does anybody know what the problem could be? Here is my code:
using (FileStream fs = File.OpenRead(path))
{
    byte[] buffer = new byte[fs.Length];
    fs.Read(buffer, 0, buffer.Length);
    extract = Encoding.Default.GetString(buffer);
}
Edit:
I tried to read all the text, but it still returned the same four characters. It works fine on any other file except for these few. When I use ReadAllLines it only gets the first line.
fs.Read() does not read the whole smash of bytes all at once; it reads some number of bytes and returns the number of bytes read. MSDN has an excellent example of how to use it to get the whole file:
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
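The shape of that loop, in a sketch (keeping your variable names, and assuming you really do want the whole file in memory):
using (FileStream fs = File.OpenRead(path))
{
    byte[] buffer = new byte[fs.Length];
    int offset = 0;
    while (offset < buffer.Length)
    {
        // Read returns how many bytes it actually delivered on this call.
        int read = fs.Read(buffer, offset, buffer.Length - offset);
        if (read == 0)
            break; // end of stream reached early
        offset += read;
    }
    extract = Encoding.Default.GetString(buffer, 0, offset);
}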
For what it's worth, reading the entire 150 MB of data into memory is really going to put a drain on your client's system; the preferred option would be to optimize it so that you don't need the whole file all at once.
If you want to read text this way, File.ReadAllLines (or ReadAllText) - http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx - is a better option.
My guess is that the file is not a text file to start with, and that the way you display the resulting string stops at the first 0 (NUL) character.
As debracey pointed out, Read returns the number of bytes read - check that out. Also, for file operations it is unlikely to stop at 4 characters...

FileStream position is off after calling ReadLine() from C#

I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.
The problem is, after the very first call to
streamReader.ReadLine();
the streamReader.BaseStream.Position property is set to the end of the file! Now I assume some caching is done behind the scenes, but I was expecting this property to reflect the number of bytes I had consumed from the file. And yes, the file has more than one line :-)
For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.
How can I find the actual position where the 1st line ends, so I can return there later?
I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:
ReadLine() strips the newline character(s), which may have a variable length (is it '\n'? Is it "\r\n"? Etc.)
I'm not sure this would work correctly with variable-length characters
...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.
If it helps, I open my file like this:
using (var reader = new StreamReader(
    new FileStream(
        m_path,
        FileMode.Open,
        FileAccess.Read,
        FileShare.ReadWrite)))
{ ... }
Any suggestions?
If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.
You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.
The reason streamReader.BaseStream.Position points to the end of the file is that the reader has a built-in buffer, as you expected.
Bookkeeping by counting the number of bytes from each ReadLine() call will work for most plain text files. However, I have encountered cases where control characters, the unprintable ones, were mixed into the text file. The calculated byte count was then wrong, and my program was not able to seek to the correct location afterwards.
My final solution was to implement the line reader on my own. It has worked well so far. This should give some idea of what it looks like:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    int ch;
    int currentLine = 1, offset = 0;
    while ((ch = fs.ReadByte()) >= 0)
    {
        offset++;
        // This covers both cases: \r\n and plain \n (for UNIX files)
        if (ch == 10)
        {
            currentLine++;
            // ... do something such as log the current offset with the line number
        }
    }
}
And to go back to logged offset:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    fs.Seek(yourOffset, SeekOrigin.Begin);
    TextReader tr = new StreamReader(fs);
    string line = tr.ReadLine();
}
Also note there is already buffering mechanism built into FileStream.
StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.
A problem with the accepted answer is that if ReadLine() throws an exception - say, because a logging framework locks the file temporarily right when you call ReadLine() - then you will not have that line "saved" into a list, because it was never returned. If you catch the exception you cannot retry the ReadLine(), because StreamReader's internal state and buffer were corrupted by the failed call: you will only get part of a line back, and you cannot ignore that broken line and seek back to the beginning of it, as the OP found out.
If you want the true seekable location, you need to use reflection to reach StreamReader's private fields and calculate its position inside its own buffer. Granger's solution, seen here: StreamReader and seeking, should work. Or do what answers to other related questions have done: create your own StreamReader that exposes the true seekable location (this answer in this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking, which for some reason was designed to rule out seeking in nearly every situation.
Edit: I used Granger's solution and it works. Just be sure you go in this order: call GetActualPosition(), then set BaseStream.Position to that position, then call DiscardBufferedData(), and finally call ReadLine() - you will get the full line starting from the position given by the method.
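In code, that order looks like this (GetActualPosition being the reflection helper from the linked answer, not a framework method):
long actualPosition = GetActualPosition(reader); // reflection helper from the linked answer
reader.BaseStream.Position = actualPosition;     // rewind the underlying stream
reader.DiscardBufferedData();                    // throw away the reader's stale buffer
string line = reader.ReadLine();                 // full line from that exact position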
