Problems with Streams in C#

Problems with Streams in C# - c#

I have this code for c# to read all the lines in TestFile.txt but when i finish reading i want to read it again and then put it in a string array (not a List) but when i try do that again it says that the file is already in use. I want to reset the stream or do something like sr.Close() because first time i read it i want to count how many lines are there in the Testfile.txt.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
string line;
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
I already tried to put after the while loop if(line == null) sr.Close() but it doesn't work.

Why not just read it into a List<string> and then build an array from that? Or more simply still, just call File.ReadAllLines:
string[] lines = File.ReadAllLines("TestFile.txt");
While you could reset the underlying stream and flush the buffer in the reader, I wouldn't do so - I'd just read it all once in a way that doesn't require you to know the size up-front.
(In fact, I'd try to use a List<string> instead of a string[] anyway - they're generally more pleasant to use. Read Eric Lippert's blog post on the subject for more information.)

You can do it by setting the BaseStream Position property to 0.
If you cannot (example would be a HttpWebResponse stream) then a good option would be to copy the stream to a MemoryStream...there you can set Position to 0 and restart the Stream as much as you want.
Stream s = new MemoryStream();
StreamReader sr = new StreamReader(s);
// later... after we read stuff
s.Position = 0;

Related

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

I am dealing with files in many formats, including Shift-JIS and UTF8 NoBOM. Using a bit of language knowledge, I can detect if the files are being interepeted correctly as UTF8 or ShiftJIS, but if I detect that the file is not of the type I read in, I was wondering if there is a way to just reinterperet my in-memory array without having to re-read the file with a new encoding specified.
Right now, I read in the file assuming Shift-JIS as such:
using (StreamReader sr = new StreamReader(path, Encoding.GetEncoding("shift-jis"), true))
{
String line = sr.ReadToEnd();
// Detection must be done AFTER you read from the file. Silly rabbit.
fileFormatCertain = !sr.CurrentEncoding.Equals(Encoding.GetEncoding("shift-jis"));
codingFromBOM = sr.CurrentEncoding;
}
and after I do my magic to determine if it is either a known format (has a BOM) or that the data makes sense as Shift-JIS, all is well. If the data is garbage though, then I am re-reading the file via:
using (StreamReader sr = new StreamReader(path, Encoding.UTF8))
{
String line = sr.ReadToEnd();
}
I am trying to avoid this re-read step and reinterperet the data in memory if possible.
Or is magic already happening and I am needlessly worrying about double I/O access?

var buf = File.ReadAllBytes(path);
var text = Encoding.UTF8.GetString(buf);
if (text.Contains("\uFFFD")) // Unicode replacement character
{
text = Encoding.GetEncoding(932).GetString(buf);
}

StreamReader is too greedy

I'm trying to process part of a text file, and write the remainder of the text file to a cloud blob using UploadFromStream. The problem is that the StreamReader appears to be grabbing too much content from the underlying stream, and so the subsequent write does nothing.
Text file:
3
Col1,String
Col2,Integer
Col3,Boolean
abc,123,True
def,3456,False
ghijkl,532,True
mnop,1211,False
Code:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
using (var reader = new StreamReader(stream))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
// Write remaining contents to another file, for testing
using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
{
stream.CopyTo(destination);
destination.Flush();
}
// Actual intended usage:
// CloudBlockBlob blob = ...;
// blob.UploadFromStream(stream);
}
When debugging, I observe that stream.Position jumps to the end of the file on the first call to reader.ReadLine(), which I don't expect. I expected the stream to be advanced only as many positions as the reader needed to read some content.
I imagine that the stream reader is doing some buffering for performance reasons, but there doesn't seem to be a way to ask the reader where in the underlying stream it "really" is. (If there was, I could manually Seek the stream to that position before CopyingTo).
I know that I could keep taking lines using the same reader and sequentially append them to the text file I'm writing, but I'm wondering if there's a cleaner way?
EDIT:
I found a StreamReader constructor which leaves the underlying stream open when it is disposed, so I tried this, hoping that the reader would set the stream's position as it's being disposed:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
{
using (var reader = new StreamReader(stream, Encoding.UTF8,
detectEncodingFromByteOrderMarks: true,
bufferSize: 1 << 12,
leaveOpen: true))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
}
// Write remaining contents to another file
using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
{
stream.CopyTo(destination);
destination.Flush();
}
}
But it doesn't. Why would this constructor be exposed if it doesn't leave the stream in an intuitive state/position?

Sure, there's a cleaner way. Use ReadToEnd to read the remaining data, and then write it to a new file. For example:
using (var reader = new StreamReader("c:\\test\\testinput.txt"))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
// write everything else to another file.
File.WriteAllText("c:\\test\\testoutput.txt", reader.ReadToEnd());
}
Edit after comment
If you want to read the text and upload it to a stream, you could replace the File.WriteAllText with code that reads the remaining text, writes it to a StreamWriter backed by a MemoryStream, and then sends the contents of that MemoryStream. Something like:
using (var memStream = new MemoryStream())
{
using (var writer = new StreamWriter(memStream))
{
writer.Write(reader.ReadToEnd());
writer.Flush();
memStream.Position = 0;
blob.UploadFromStream(memStream);
}
}

You should never access the underlying stream of a StreamReader. Trying to use both is going to have an undefined behavior.
What's going on here is that the reader is buffering the data from the underlying stream. It doesn't read each byte exactly when you request it, because that's often going to be very inefficient. Instead it will grab chunks, put them in a buffer, and then provide you with data from that buffer, grabbing a new chunk when it needs to.
You should continue to use the StreamReader throughout the remainder of that block, instead of using stream. To minimize the memory footprint of the program, the most effective way of doing this would be to read the next line from the reader in a loop until it his the end of the file, writing each line to the output stream as you go.
Also note that you don't need to be disposing of both the stream reader and the underlying stream. The stream reader will dispose of the underlying stream itself, so you can simply adjust your header to:
using (var reader = new StreamReader(
File.OpenRead("c:\\test\\testinput.txt")))

StreamReader cannot read from disposed object

I'm using StreamReader, but if try to read from the same stream using two StreamReader-objects I get an error saying I can't read from dispose object (reader3.ReadLine).
Since I'm not disposing any object, what am I doing wrong?
Stream responseStream2;
FtpWebResponse ftpResponse2;
string casefile = CNCElement.ID_CASE_TEST_FILE;
string casepath;
if (FileManager.PathCombine(result, lock_root_folder, casefile, out casepath) == false)
return false;
if (fm.DownloadFtp(result, casepath, out responseStream2, out ftpResponse2) == false)
return false;
StreamReader reader2 = new StreamReader(responseStream2);
StreamReader reader3 = new StreamReader(responseStream2);
byte[] contents=null;
//if cycle is not present update case file
//if cycle is present, case file is already correct
if (reader2.ReadToEnd().Contains(cycle) == false)
{
byte seekcase = CNCElement.ID_CASE.Value;
int casecount = 1;
string line;
using (MemoryStream ms = new MemoryStream())
{
while ((line = reader3.ReadLine()) != null
|| casecount <= seekcase)
{
if (line.Contains("\"\"") == true)
{
if (casecount == seekcase)
line = line.Replace("\"\"", "\"" + cycle + "\"");
}
byte[] app = StrToByteArray(line);
ms.Write(app, 0, line.Length);
contents = ms.ToArray();
}
}
}
if (reader2 != null)
reader2.Close();
if (ftpResponse2 != null)
ftpResponse2.Close();

When you read to the end of reader2 you are really reading to the end of the underlying stream (responseStream2). At this point another read from that stream will fail.
While the specific exception is slightly expected, wrapping the same stream in different StreamReaders is going to do weird things because it is a weird thing to do.
If you need to read a stream twice you need to either use a stream that supports resetting its position to the beginning (ie. random access) and then create a new reader for the second read; or (as would seem likely in this case: I doubt any network stream will support random access) buffer the stream content yourself.

When you call ReadToEnd() the underlying steam is all read into memory and you have reached the end.
Each time you call the function ReadLine() then underlying stream is moved to the next line.
This means that when your application reaches Reader3.ReadLine() loop, as you have already reached the end of the file, the reader fails.
If the expected file stream is not too large, I would suggest you assign the result of the ReadToEnd() call to a variable and perform subsequent operations on this variable.
If the stream is large, then try resetting the Position property (provided it is supported - See the docs).

Using StreamReader after the underlying stream has beed disposed?

Using a StreamReader, if you dispose the underlying stream, I thought you shouldn't be able to read anymore.
That this is true suggests this question, where it's suggested that you don't have to dispose the StreamWriter (in their case) if the life of the underlying stream is handled elsewhere.
But that's not the case. What I did was the following:
I have a file called delme.txt containing the following
abc
def
ghi
The I run this:
Stream s = File.OpenRead(#"C:\delme.txt");
StreamReader sr = new StreamReader(s, Encoding.ASCII);
Console.WriteLine(sr.ReadLine());
s.Dispose();
Console.WriteLine(sr.ReadLine());
And the result is:
abc
def
How is this possible?

Your StreamReader already read the next line into its buffer.
It won't go back to the source Stream until it runs out of data in its buffer.
In fact, it would be impossible for it to throw an exception in that case, since there is no idempotent way to find out whether a Stream has been disposed. (There is no IsDisposed property)

To add to #SLaks answer, here will demonstrate (using a file with a couple thousand lines of text):
Stream s = File.OpenRead(path);
StreamReader sr = new StreamReader(s, Encoding.ASCII);
Console.WriteLine(sr.ReadLine());
s.Dispose();
int i = 1;
try
{
while (!sr.EndOfStream)
{
Console.WriteLine(sr.ReadLine());
i++;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Console.WriteLine(i + " lines total");
Console.ReadLine();
It will print out lots and lots of lines, like a couple hundred, then will throw an Exception. My output ended like this:
qrs
tuv
wxy
zab
cde
fgh
ijk
lmn
Cannot access a closed file.
204 lines total
In fact we see that there is a constructor for StreamReader that takes a parameter bufferSize as the fourth parameter:
StreamReader sr = new StreamReader(s, Encoding.ASCII, false, 10000);
Using 10000, it actually prints out a total of 1248 lines for me before crashing. Also, the smallest possible value you can use is 1, and for that case, it still pre-fetches 25 lines.

What you need to understand here is what dispose is trying to do.
http://msdn.microsoft.com/en-us/library/ms227563.aspx
It says the TextReader will be an unusable state if the TextReader is finished. Perhaps, since it hasn't read everything, it is not considered finished; therefore, you can continue to use it. That is my guess.

Advanced TextReader to EndOfFile

I have a textReader that in a specific instance I want to be able to advance to the end of file quickly so other classes that might hold a reference to this object will not be able to call tr.ReadLine() without getting a null.
This is a large file. I cannot use TextReader.ReadToEnd() as it will often lead to an OutOfMemoryException
I thought I would ask the community if there was a way SEEK the stream without using TextReader.ReadToEnd() which returns a string of all data in the file.
Current method, inefficient.
The following example code is a mock up. Obviously I am not opening a file with an if statement directly following it asking if I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
while(tr.ReadLine() != null) { }
}
Desired solution (Note this code block contains fake 'concept' methods or methods that cannot be used due to risk of outofmemoryexception)
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
// tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.

You have to discard any buffered data if you have read any file content - since data is buffered you might get content even if you seek the underlying stream to the end - working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem as mentioned in the documentation is
The StreamReader class buffers input from the underlying stream when
you call one of the Read methods. If you manipulate the position of
the underlying stream after reading data into the buffer, the position
of the underlying stream might not match the position of the internal
buffer. To reset the internal buffer, call the DiscardBufferedData
method

Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(#"Your Large File"))
{
reader.BaseStream.Seek(0, SeekOrigin.End);
int read = reader.Read();//read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt"))//test.txt is a file that has data and lines
{
((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
string foo = tr.ReadLine();
Debug.WriteLine(foo ?? "foo is null");//foo is null
int read = tr.Read();
Debug.WriteLine(read);//-1
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Problems with Streams in C# - c#

Related

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

StreamReader is too greedy

StreamReader cannot read from disposed object

Using StreamReader after the underlying stream has beed disposed?

Advanced TextReader to EndOfFile

Categories

Resources