Filestream only reading the first 4 characters of the file - c#

Hey there! I'm trying to read a 150mb file with a file stream but every time I do it all I get is: |zl instead of the whole stream. Note that it has some special characters in it.
Does anybody know what the problem could be? here is my code:
using (FileStream fs = File.OpenRead(path))
{
byte[] buffer = new byte[fs.Length];
fs.Read(buffer, 0, buffer.Length);
extract = Encoding.Default.GetString(buffer);
}
Edit:
I tried to read all text but it still returned the same four characters. It works fine on any other file except for these few. When I use read all lines it only gets the first line.

fs.Read() does not read the whole smash of bytes all at once, it reads some number of bytes and returns the number of bytes read. MSDN has an excellent example of how to use it to get the whole file:
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
For what it's worth, reading the entire 150MB of data into memory is really going to put a drain on your client's system -- the preferred option would be to optimize it so that you don't need the whole file all at once.

If you want to read text this way File.ReadAllLine (or ReadAllText) - http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx is better option.
My guess the file is not text file to start with and the way you display resulting string does stop at 0 characters.
As debracey pointed out Read returns number of bytes read - check that out. Also for file operations it is unlikely to stop at 4 characters...

Related

Read text file block by block

I have text file which contain 200000 rows. I want to read first 50000 rows and then process it and then read second part say 50001 to 100000 etc. When I read second block I don't write to loop on first 1 to 50000. I want that reader pointer directly goes to row number 50001 and start reading.
How it can be possible? Which reader is used for that?
You need the StreamReader class.
With this you can do line by line reading with the ReadLine() method. You will need to keep track of the line count yourself and call a method to process your data every 50000 lines, but so long as you keep the reader open you should not need to restart the reading.
No unfortunately there is no way you can skip counting the Lines. At the raw level files do not work on a line number basis. Instead they work at a position / offset basis. The root file system has no concept of lines. It's a concept added by higher level components.
So there is no way to tell the operating system, please open file at line specified. Instead you have to open the file and skip around counting new lines until you've passed the specified number. Then store the next set of bytes into an array until you hit the next new line.
Though If each line has equal number of bytes present then you can try the following.
using( Stream stream = File.Open(fileName, FileMode.Open) )
{
stream.Seek(bytesPerLine * (myLine - 1), SeekOrigin.Begin);
using( StreamReader reader = new StreamReader(stream) )
{
string line = reader.ReadLine();
}
}
I believe the best way would be to use stream reader,
Here are two related questions to yours, in which you can get answers from there. But ultimately if you want to get blocks of text it is very hard to do unless it is a set amount.
However I believe these would be a good read for you to use:
Reading Block of text file
This one shows you how to separate blocks of code to read. The answer for this one would be best suited, you can just set the conditions to read how many lines you have read, and set the conditions to check if the line count == 50000 or so on then do something.
As you can see
This answer makes use of the keyword continue which I believe will be useful for what you are intending to do.
Reading text file block by block
This one shows you a more readable answer but doesn't really answer what you are looking for in reading blocks.
For your question I believe that what you want to do has confused you a little, it seems like you want to highlight 50000 lines and then read it as one, that is not the way streamreader works, and yes reading line by line makes the process longer but unfortunately that's the case.
Unless the rows are exactly the same length, you can't start directly at row 50001.
What you can do, however, is when reading the first 50000 rows, remember where the last row ends. You can then seek directly to that offset and continue reading from there.
Where the row length is fixed, you do something like this:
myfile.Seek(50000 * (rowCharacters + 2), SeekOrigin.Begin);
Seek goes to a specific offset in bytes, so you just need to tell it how many bytes 50000 rows occupy. Given an ASCII encoding, that's the number of characters in the line, plus 2 for the newline sequence.

Best approach for in memory manipulation of text file in memory: read as byte[] first? read as File.ReadAllText() then save as binary?

I need to change a file in memory, and currently I read the file to memory into a byte[] using a filestream and a binaryreader.
I was wondering whats the best approach to change that file in memory, convert the byte[] to string, make changes and do an Encoding.GetBytes()? or Read the file first as string using File.ReadAllText() and then Encoding.GetBytes()? or any approach will work without caveats?
Any special approaches? I need to replace specific text inside files with additional chars or replacement strings, several 100,000 of files. Reliability is preferred over efficiency. Files are text like HTML, not binary files.
Read the files using File.ReadAllText(), modify them, then do byte[] byteData = Encoding.UTF8.GetBytes(your_modified_string_from_file). Use the encoding with which the files were saved. This will give you an array of byte[]. You can convert the byte[] to a stream like this:
MemoryStream stream = new MemoryStream();
stream.Write(byteData, 0, byteData.Length);
Edit:
It looks like one of the Add methods in the API can take a byte array, so you don't have to use a stream.
You're definitely making things harder on yourself by reading into bytes first. Just use a StreamReader. You can probably get away with using ReadLine() and processing a line at a time. This can seriously reduce your app's memory usage, especially if you're working with that many files.
using (var reader = File.OpenText(originalFile))
using (var writer = File.CreateText(tempFile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var temp = DoMyStuff(line);
writer.WriteLine(temp);
}
}
File.Delete(originalFile);
File.Move(tempFile, originalFile);
Based on the size of the files, I would use File.ReadAllText to read them and File.WriteAllText to wirte them. This frees you up from the responsibility of having to call Close or Dispose on either read or write.
You generally don't want to read a text file on a binary level - just use File.ReadAllText() and supply it with the correct encoding used in the file (there's an overload for that). If the file encoding is UTF8 or UTF32 usually the method can automatically detect and use the correct endcoding. Same applies to writing it back - if it's not UTF8 specify which encoding you want.

Using StreamReader and StreamWriter to Modify Files

I am trying to use StreamReader and StreamWriter to Open a text file (fixed width) and to modify a few specific columns of data. I have dates with the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates form a file using StreamReader, then convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to right to a specific position, and beginning to wonder if is possible.
I have the following code snip-it.
System.IO.StreamWriter writer;
this.fileName = #"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); //Substring Containing the Date
reader.Close();
...
// Convert currentLine to Packed Field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following:
After:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = Ascii Characters SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is Picture of the following date "20120123" packed in COMP-3. This is my end result and I have included because I wasn't sure if it would effect possible answers.
My question is how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?
I have always found it better to to read the input file, filter/process the data and write the output to a temporary file. After finished, delete the original file (or make a backup) and copy the temporary file over. This way you haven't lost half your input file in case something goes wrong in the middle of processing.
You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.
However, I haven't found a way to use StreamWriter to right to a specific position, and
beginning to wonder if is possible.
You can use StreamWriter.BaseStream.Seek method
using (StreamWriter wr = new StreamWriter(File.Create(#"c:\Temp\aaa.txt")))
{
wr.Write("ABC");
wr.Flush();
wr.BaseStream.Seek(0, SeekOrigin.Begin);
wr.Write("Z");
}

Remove item from binary file

What's the best and the fastest method to remove an item from a binary file?
I have a binary file and I know that I need to remove B number of bytes from a position A, how to do it?
Thanks
You might want to consider working in batches to prevent allocation on the LOH but that depends on the size of your file and the frequency in which you call this logic.
long skipIndex = 100;
int skipLength = 40;
using (FileStream fileStream = File.Open("file.dat", FileMode.Open))
{
int bufferSize;
checked
{
bufferSize = (int)(fileStream.Length - (skipLength + skipIndex));
}
byte[] buffer = new byte[bufferSize];
// read all data after
fileStream.Position = skipIndex + skipLength;
fileStream.Read(buffer, 0, bufferSize);
// write to displacement
fileStream.Position = skipIndex;
fileStream.Write(buffer, 0, bufferSize);
fileStream.SetLength(fileStream.Position); // trim the file
}
Depends... There are a few ways to do this, depending on your requirements.
The basic solution is to read chunks of data from the source file into a target file, skipping over the bits that must be removed (is it always only one segment to remove, or multiple segments?). After you're done, delete the original file and rename the temp file to the original's name.
Things to keep in mind here are that you should tend towards larger chunks rather than smaller. The size of your files will determine a suitable value. 1MB is a good 'default'.
The simple approach assumes that deleting and renaming a new file is not a problem. If you have specific permissions attached to the file, or used NTFS streams or some-such, this approach won't work.
In that case, make a copy of the original file. Then, skip to the first byte after the segment to ignore in the copied file, skip to the start of the segment in the source file, and transfer bytes from copy to original. If you're using Streams, you'll want to call Stream.SetLength to truncate the original to the correct size
If you want to just rewrite the original file, and remove a sequence from it the best way is to "rearrange" the file.
The idea is:
for i = A+1 to file.length - B
file[i] = file[i+B]
For better performance it's best to read and write in chunks and not single bytes. Test with different chunk sizes to see what best for your target system.

FileStream position is off after calling ReadLine() from C#

I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.
The problem is, after the very first call to
streamReader.ReadLine();
the streamReader.BaseStream.Position property is set to the end of the file! Now I assume some caching is done in the backstage, but I was expecting this property to reflect the number of bytes that I used from that file. And yes, the file has more than one line :-)
For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.
How can I find the actual position where the 1st line ends, so I can return there later?
I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:
ReadLine() strips the new-line character(s) which may have a variable length (is is '\n'? Is it "\r\n"? Etc.)
I'm not sure if this would work OK with variable-length characters
...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.
If it helps, I open my file like this:
using (var reader = new StreamReader(
new FileStream(
m_path,
FileMode.Open,
FileAccess.Read,
FileShare.ReadWrite)))
{...}
Any suggestions?
If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.
You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.
The reason streamReader.BaseStream.Position had pointed to the end of file is that it has a built-in buffer, as you expected.
Bookkeeping by counting number of bytes read from each ReadLine() call will work for most plain text files. However, I have encounter cases where there control character, the unprintable one, mixed in the text file. The number of bytes calculated is wrong and caused my program not beeing able to seek to the correct location thereafter.
My final solution was to go with implementing the line reader on my own. It worked well so far. This should give some ideas what it looks like:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
int ch;
int currentLine = 1, offset = 0;
while ((ch = fs.ReadByte()) >= 0)
{
offset++;
// This covers all cases: \r\n and only \n (for UNIX files)
if (ch == 10)
{
currentLine++;
// ... do sth such as log current offset with line number
}
}
}
And to go back to logged offset:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
fs.Seek(yourOffset, SeekOrigin.Begin);
TextReader tr = new StreamReader(fs);
string line = tr.ReadLine();
}
Also note there is already buffering mechanism built into FileStream.
StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.
A problem with the accepted answer is that if ReadLine() encounters an exception, say due to the logging framework locking the file temporarily right when you ReadLine(), then you will not have that line "saved" into a list because it never returned a line. If you catch this exception you cannot retry the ReadLine() a second time because StreamReaders internal state and buffer are screwed up from the last ReadLine() and you will only get part of a line returned, and you cannot ignore that broken line and seek back to the beginning of it as OP found out.
If you want to get to the true seekable location then you need to use reflection to get to StreamReaders private variables that allow you calculate its position inside it's own buffer. Granger's solution seen here: StreamReader and seeking, should work. Or do what other answers in other related questions have done: create your own StreamReader that exposes the true seekable location (this answer in this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking, which for some reason decided to completely remove the possibility of seeking in nearly every situation.
edit: I used Granger's solution and it works. Just be sure you go in this order: GetActualPosition(), then set BaseStream.Position to that position, then make sure you call DiscardBufferedData(), and finally you can call ReadLine() and you will get the full line starting from the position given in the method.

Categories

Resources