Using StreamReader and StreamWriter to Modify Files - c#

I am trying to use StreamReader and StreamWriter to open a text file (fixed width) and modify a few specific columns of data. I have dates in the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates from a file using StreamReader, then convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to write to a specific position, and I am beginning to wonder if it is possible.
I have the following code snippet.
System.IO.StreamReader reader;
System.IO.StreamWriter writer;
this.fileName = @"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); // Substring containing the date
reader.Close();
...
// Convert currentLine to Packed Field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following:
After:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = ASCII characters that SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is a picture of the date "20120123" packed in COMP-3. This is my end result; I have included it because I wasn't sure whether it would affect possible answers.
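For reference, COMP-3 (packed decimal) stores two digits per byte, with the sign in the final nibble (0xF for unsigned). A minimal sketch of packing a digit string this way follows; the `Comp3.Pack` name is my own, not from any library:

```csharp
using System;

static class Comp3
{
    // Packs a string of decimal digits into COMP-3 (packed decimal):
    // two digits per byte, with the sign in the final nibble (0xF = unsigned).
    public static byte[] Pack(string digits)
    {
        // Pad with a leading zero so the digits plus the sign nibble
        // fill whole bytes.
        if (digits.Length % 2 == 0)
            digits = "0" + digits;

        byte[] packed = new byte[(digits.Length + 1) / 2];
        for (int i = 0; i < digits.Length; i++)
        {
            int nibble = digits[i] - '0';
            if (i % 2 == 0)
                packed[i / 2] = (byte)(nibble << 4); // high nibble
            else
                packed[i / 2] |= (byte)nibble;       // low nibble
        }
        packed[packed.Length - 1] |= 0x0F;           // sign nibble: F = unsigned
        return packed;
    }
}

// Comp3.Pack("20120123") → 0x02 0x01 0x20 0x12 0x3F (5 bytes)
```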
My question is how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?

I have always found it better to read the input file, filter/process the data, and write the output to a temporary file. When finished, delete the original file (or make a backup) and copy the temporary file over. This way you haven't lost half your input file if something goes wrong in the middle of processing.
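A minimal sketch of that pattern (the path and the per-record transform are placeholders):

```csharp
using System.IO;

class TempFileRewrite
{
    static void Main()
    {
        string inputPath = @"C:\Test9.txt";     // placeholder path
        string tempPath = inputPath + ".tmp";

        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Transform each record here, e.g. swap the date
                // columns for the packed value, then write it out.
                writer.WriteLine(line);
            }
        }

        File.Delete(inputPath);                 // or move it aside as a backup
        File.Move(tempPath, inputPath);
    }
}
```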

You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.

However, I haven't found a way to use StreamWriter to write to a specific position, and
I am beginning to wonder if it is possible.
You can use the StreamWriter.BaseStream.Seek method:
using (StreamWriter wr = new StreamWriter(File.Create(@"c:\Temp\aaa.txt")))
{
wr.Write("ABC");
wr.Flush();
wr.BaseStream.Seek(0, SeekOrigin.Begin);
wr.Write("Z");
}
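Applied to the fixed-width file from the question, the same idea lets you seek straight to the date column of a record and overwrite it in place. Note this only works while the replacement has the same byte length as the original field; the record length below is an assumption:

```csharp
using System.IO;
using System.Text;

class OverwriteColumn
{
    static void Main()
    {
        const int recordLength = 42;  // bytes per record incl. line ending (assumed)
        const int dateOffset = 30;    // date starts at column 30, per the question
        int recordIndex = 0;          // which record to patch

        // Must be the same length as the field it replaces.
        byte[] replacement = Encoding.ASCII.GetBytes("9999999999");

        using (var fs = new FileStream(@"C:\Test9.txt", FileMode.Open,
                                       FileAccess.ReadWrite))
        {
            fs.Seek((long)recordIndex * recordLength + dateOffset, SeekOrigin.Begin);
            fs.Write(replacement, 0, replacement.Length);
        }
    }
}
```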

Related

Write large data into MemoryStream with C# [duplicate]

This question already has answers here:
Failed to write large amount of data to stream
(2 answers)
Closed 1 year ago.
I have a big stream (4 GB). I need to replace some characters in it (one specific character with 2 or 3 others). I get the stream from a service and I have to return a stream back.
This is what I'm doing
private static Stream UpdateStream(Stream stream, string oldCharacters, string newCharacters, int size = 2048)
{
stream.Position = 0;
StreamReader reader = new StreamReader(stream);
MemoryStream outputStream = new MemoryStream();
StreamWriter writer = new StreamWriter(outputStream);
writer.AutoFlush = true;
char[] buffer = new char[size];
while (!reader.EndOfStream)
{
// Only use the characters actually read; the buffer may be partially filled.
int charsRead = reader.Read(buffer, 0, buffer.Length);
if (charsRead > 0)
{
string line = new string(buffer, 0, charsRead);
string newLine = line.Replace(oldCharacters, newCharacters);
writer.Write(newLine);
}
}
return outputStream;
}
But I'm getting an OutOfMemoryException at some point on this line, even though, looking at the computer's memory, I still have plenty available:
writer.Write(newLine);
Any advice?
This is not an answer, but I couldn't possibly fit it into a comment.
Currently your problem is not solvable without making some assumptions. The problem, as I hear it, is that you want to replace some parts of a large body of text saved in a file and save the modified text in the file again.
Some unknown variables:
How long are those strings you are replacing?
How long are those strings you are replacing it with? The same length as the replaced strings?
What kinds of strings are you looking to replace? A single word? A whole sentence? A whole paragraph?
A solution to your problem would be to read the file into memory in chunks, replace the necessary text and save the "updated" text in a new file and then finally rename the "new file" to the name of the old file. However, without knowing the answers to the above points, you could potentially be wanting to replace a string as long as all text in the file (unlikely, yes). This means in order to do the "replacing" I would have to read the whole file into memory before I can replace any of the text, which causes an OutOfMemoryException. (Yes, you could do some clever scanning to replace such large strings without reading it all into memory at once, but I doubt such a solution is necessary here).
Please edit your question to address the above points.
So to make it work I had to:
use the HugeMemoryStream class from this post: Failed to write large amount of data to stream
set the gcAllowVeryLargeObjects config setting to true
and set the build to 64-bit ("Prefer 32-bit" unchecked)
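For reference, on .NET Framework (4.5 and later, 64-bit processes only) gcAllowVeryLargeObjects is a runtime element in app.config that lifts the 2 GB limit on individual arrays:

```xml
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```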

Best approach for in-memory manipulation of a text file: read as byte[] first? read with File.ReadAllText() then save as binary?

I need to change a file in memory. Currently I read the file into a byte[] using a FileStream and a BinaryReader.
I was wondering what's the best approach to changing that file in memory: convert the byte[] to a string, make the changes, and call Encoding.GetBytes()? Or read the file as a string with File.ReadAllText() first and then Encoding.GetBytes()? Or will any approach work without caveats?
Any special approaches? I need to replace specific text inside the files with additional characters or replacement strings, across several hundred thousand files. Reliability is preferred over efficiency. The files are text such as HTML, not binary files.
Read the files using File.ReadAllText(), modify them, then do byte[] byteData = Encoding.UTF8.GetBytes(your_modified_string_from_file). Use the encoding with which the files were saved. This will give you a byte[]. You can convert the byte[] to a stream like this:
MemoryStream stream = new MemoryStream();
stream.Write(byteData, 0, byteData.Length);
Edit:
It looks like one of the Add methods in the API can take a byte array, so you don't have to use a stream.
You're definitely making things harder on yourself by reading into bytes first. Just use a StreamReader. You can probably get away with using ReadLine() and processing a line at a time. This can seriously reduce your app's memory usage, especially if you're working with that many files.
using (var reader = File.OpenText(originalFile))
using (var writer = File.CreateText(tempFile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var temp = DoMyStuff(line);
writer.WriteLine(temp);
}
}
File.Delete(originalFile);
File.Move(tempFile, originalFile);
Based on the size of the files, I would use File.ReadAllText to read them and File.WriteAllText to write them. This frees you from the responsibility of having to call Close or Dispose for either the read or the write.
You generally don't want to read a text file at the binary level - just use File.ReadAllText() and supply it with the correct encoding used in the file (there's an overload for that). If the file encoding is UTF-8 or UTF-32, the method can usually detect and use the correct encoding automatically. The same applies to writing it back - if it's not UTF-8, specify which encoding you want.

C# equivalent to VB6's 'Open' & 'Put' Functions

I will try to make this as straightforward as possible. This question does not simply involve reading and writing bytes. I am looking for an exact translation between this VB6 code and C# code. I know this is not always a possibility, but I'm sure someone out there has some ideas!
VB6 Code & explanation:
The below code writes data into a specific part of the file.
[ Put [#]filenumber, [byte position], varname ].
It is the *byte position* that I am having trouble figuring out - any help with this would be very much appreciated!
Dim file, stringA as string
Open file for Binary As #1
lPos = 10000
stringA = "ThisIsMyData"
Put #1, lPos, stringA
Close #1
So, I am looking for some help with the byte position, once again. In this example the byte position was represented by lPos.
EDIT FOR HENK -
I will be reading binary data. There are some characters in this binary data that I will need to replace. For this reason, I will be using VB6's Instr function to get the position of this data (their lengths are known in advance). I will then use VB6's Put function to write the new data at the newfound position, overwriting the old data. I hope this helps!
If it helps anyone, here is some further information regarding the Put function.
Thanks so much,
Evan
Can you not use a BinaryWriter?
For example:
FileStream fs = new FileStream(file, FileMode.Open);
BinaryWriter w = new BinaryWriter(fs);
w.Seek(10000, SeekOrigin.Begin);
w.Write(Encoding.ASCII.GetBytes("ThisIsMyData"));
w.Flush();
w.Close();
fs.Close();
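For the Instr part (finding where the known bytes sit before overwriting them), a rough C# equivalent is a byte scan followed by a seek. The path, search value, and encoding below are placeholders; also note that VB6's Put position is 1-based while Seek is 0-based:

```csharp
using System.IO;
using System.Text;

class FindAndOverwrite
{
    static void Main()
    {
        string file = @"C:\data.bin";                          // placeholder path
        byte[] search = Encoding.ASCII.GetBytes("OldValue");   // placeholder
        byte[] replace = Encoding.ASCII.GetBytes("NewValue");  // same length!

        byte[] data = File.ReadAllBytes(file);

        // Naive scan for the first occurrence (VB6's Instr).
        int pos = -1;
        for (int i = 0; i <= data.Length - search.Length && pos < 0; i++)
        {
            bool match = true;
            for (int j = 0; j < search.Length && match; j++)
                match = data[i + j] == search[j];
            if (match) pos = i;
        }

        if (pos >= 0)
        {
            using (var fs = new FileStream(file, FileMode.Open, FileAccess.Write))
            {
                fs.Seek(pos, SeekOrigin.Begin);   // the VB6 "Put #1, lPos, ..." step
                fs.Write(replace, 0, replace.Length);
            }
        }
    }
}
```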
You can do this using StreamReader and StreamWriter.
I would try something like this:
Read the first n bytes and write them into a new stream using StreamWriter.
Using the same StreamWriter, write the new bytes that you want to insert.
Finally, write the rest of the bytes from your StreamReader.
This question is not a perfect fit, however it shows a similar technique using text (not binary data): Insert data into text file
Take a look at the StreamWriter Class specially at this overload of the Write method, which allows you to start writing to a specific place within the stream.

Filestream only reading the first 4 characters of the file

Hey there! I'm trying to read a 150 MB file with a FileStream, but every time I do, all I get is |zl instead of the whole stream. Note that it has some special characters in it.
Does anybody know what the problem could be? here is my code:
using (FileStream fs = File.OpenRead(path))
{
byte[] buffer = new byte[fs.Length];
fs.Read(buffer, 0, buffer.Length);
extract = Encoding.Default.GetString(buffer);
}
Edit:
I tried to read all text but it still returned the same four characters. It works fine on any other file except for these few. When I use read all lines it only gets the first line.
fs.Read() does not read the whole smash of bytes all at once, it reads some number of bytes and returns the number of bytes read. MSDN has an excellent example of how to use it to get the whole file:
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
For what it's worth, reading the entire 150MB of data into memory is really going to put a drain on your client's system -- the preferred option would be to optimize it so that you don't need the whole file all at once.
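The read loop from that MSDN example boils down to retrying until the whole buffer is filled, since a single Read call makes no guarantee about how many bytes it returns:

```csharp
using System.IO;

class ReadWholeFile
{
    static byte[] ReadAll(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[fs.Length];
            int offset = 0;
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends early.
            while (offset < buffer.Length)
            {
                int read = fs.Read(buffer, offset, buffer.Length - offset);
                if (read == 0)
                    break;   // unexpected end of stream
                offset += read;
            }
            return buffer;
        }
    }
}
```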
If you want to read text this way, File.ReadAllLines (or ReadAllText) - http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx - is a better option.
My guess is that the file is not a text file to start with, and that the way you display the resulting string stops at the first NUL (0) character.
As debracey pointed out, Read returns the number of bytes read - check that. Also, for file operations it is unlikely to stop at exactly 4 characters...

FileStream position is off after calling ReadLine() from C#

I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.
The problem is, after the very first call to
streamReader.ReadLine();
the streamReader.BaseStream.Position property is set to the end of the file! Now I assume some caching is done behind the scenes, but I was expecting this property to reflect the number of bytes that I had consumed from the file. And yes, the file has more than one line :-)
For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.
How can I find the actual position where the 1st line ends, so I can return there later?
I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:
ReadLine() strips the newline character(s), which may have variable length (is it '\n'? Is it "\r\n"? Etc.)
I'm not sure if this would work OK with variable-length characters
...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.
If it helps, I open my file like this:
using (var reader = new StreamReader(
new FileStream(
m_path,
FileMode.Open,
FileAccess.Read,
FileShare.ReadWrite)))
{...}
Any suggestions?
If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.
You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.
The reason streamReader.BaseStream.Position pointed to the end of the file is that the reader has a built-in buffer, as you expected.
Bookkeeping by counting the number of bytes read from each ReadLine() call will work for most plain text files. However, I have encountered cases where control characters, unprintable ones, were mixed into the text file. The calculated number of bytes was then wrong, and my program was not able to seek to the correct location afterwards.
My final solution was to implement the line reader on my own. It has worked well so far. This should give you an idea of what it looks like:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
int ch;
int currentLine = 1, offset = 0;
while ((ch = fs.ReadByte()) >= 0)
{
offset++;
// This covers all cases: \r\n and only \n (for UNIX files)
if (ch == 10)
{
currentLine++;
// ... do sth such as log current offset with line number
}
}
}
And to go back to logged offset:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
fs.Seek(yourOffset, SeekOrigin.Begin);
TextReader tr = new StreamReader(fs);
string line = tr.ReadLine();
}
Also note there is already buffering mechanism built into FileStream.
StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.
A problem with the accepted answer is that if ReadLine() throws an exception, say because a logging framework locked the file temporarily right when you called ReadLine(), then you will not have that line "saved" into a list, because it never returned a line. If you catch this exception, you cannot retry the ReadLine() a second time, because the StreamReader's internal state and buffer are corrupted by the failed ReadLine(): you will only get part of a line returned, and you cannot ignore that broken line and seek back to the beginning of it, as the OP found out.
If you want to get the true seekable location, you need to use reflection to reach StreamReader's private fields, which let you calculate its position inside its own buffer. Granger's solution, seen here: StreamReader and seeking, should work. Or do what answers to other related questions have done: create your own StreamReader that exposes the true seekable location (this answer in this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking, which for some reason was designed to rule out seeking in nearly every situation.
Edit: I used Granger's solution and it works. Just be sure you go in this order: call GetActualPosition(), then set BaseStream.Position to that position, then make sure you call DiscardBufferedData(), and finally call ReadLine(); you will get the full line starting from the position given by the method.
