C# Text File Reading ObjectDisposedException: The object was used after being disposed

I want to read the last line of a text file. I'm using a solution that's suggested here:
How to efficiently read only last line of the text file
Using that library, I'm getting an error saying the stream is disposed. But I'm confused, because I create the stream anew every frame.
FileStream fileStream = new FileStream("C:\\Users\\LukasRoper\\Desktop\\Test.log", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
ReverseLineReader reverseLineReader = new ReverseLineReader(() => fileStream, Encoding.UTF8);
List<string> stringParts = new List<string>();
do
{
IEnumerable<string> line = reverseLineReader.Take(1);
string data = line.First();
stringParts = data.Split(',').ToList();
} while (stringParts.Count != 9);
I should explain that I'm trying to read from a file that another program is writing to at the same time, and I can't amend that program because it's third-party software. Can anybody explain why my FileStream becomes disposed?
The Reverse File Reader is here:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace MiscUtil.IO
{
/// <summary>
/// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
/// (or a filename for convenience) and yields lines from the end of the stream backwards.
/// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
/// returned by the function must be seekable.
/// </summary>
public sealed class ReverseLineReader : IEnumerable<string>
{
/// <summary>
/// Buffer size to use by default. Classes with internal access can specify
/// a different buffer size - this is useful for testing.
/// </summary>
private const int DefaultBufferSize = 4096;
/// <summary>
/// Means of creating a Stream to read from.
/// </summary>
private readonly Func<Stream> streamSource;
/// <summary>
/// Encoding to use when converting bytes to text
/// </summary>
private readonly Encoding encoding;
/// <summary>
/// Size of buffer (in bytes) to read each time we read from the
/// stream. This must be at least as big as the maximum number of
/// bytes for a single character.
/// </summary>
private readonly int bufferSize;
/// <summary>
/// Function which, when given a position within a file and a byte, states whether
/// or not the byte represents the start of a character.
/// </summary>
private Func<long,byte,bool> characterStartDetector;
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched. UTF-8 is used to decode
/// the stream into text.
/// </summary>
/// <param name="streamSource">Data source</param>
public ReverseLineReader(Func<Stream> streamSource)
: this(streamSource, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// UTF8 is used to decode the file into text.
/// </summary>
/// <param name="filename">File to read from</param>
public ReverseLineReader(string filename)
: this(filename, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// </summary>
/// <param name="filename">File to read from</param>
/// <param name="encoding">Encoding to use to decode the file into text</param>
public ReverseLineReader(string filename, Encoding encoding)
: this(() => File.OpenRead(filename), encoding)
{
}
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched.
/// </summary>
/// <param name="streamSource">Data source</param>
/// <param name="encoding">Encoding to use to decode the stream into text</param>
public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
: this(streamSource, encoding, DefaultBufferSize)
{
}
internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
{
this.streamSource = streamSource;
this.encoding = encoding;
this.bufferSize = bufferSize;
if (encoding.IsSingleByte)
{
// For a single byte encoding, every byte is the start (and end) of a character
characterStartDetector = (pos, data) => true;
}
else if (encoding is UnicodeEncoding)
{
// For UTF-16, even-numbered positions are the start of a character
characterStartDetector = (pos, data) => (pos & 1) == 0;
}
else if (encoding is UTF8Encoding)
{
// For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
// See http://www.cl.cam.ac.uk/~mgk25/unicode.html
characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
}
else
{
throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
}
}
/// <summary>
/// Returns the enumerator reading strings backwards. If this method discovers that
/// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
/// </summary>
public IEnumerator<string> GetEnumerator()
{
Stream stream = streamSource();
if (!stream.CanSeek)
{
stream.Dispose();
throw new NotSupportedException("Unable to seek within stream");
}
if (!stream.CanRead)
{
stream.Dispose();
throw new NotSupportedException("Unable to read within stream");
}
return GetEnumeratorImpl(stream);
}
private IEnumerator<string> GetEnumeratorImpl(Stream stream)
{
try
{
long position = stream.Length;
if (encoding is UnicodeEncoding && (position & 1) != 0)
{
throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
}
// Allow up to two bytes for data from the start of the previous
// read which didn't quite make it as full characters
byte[] buffer = new byte[bufferSize + 2];
char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
int leftOverData = 0;
String previousEnd = null;
// TextReader doesn't return an empty string if there's line break at the end
// of the data. Therefore we don't return an empty string if it's our *first*
// return.
bool firstYield = true;
// A line-feed at the start of the previous buffer means we need to swallow
// the carriage-return at the end of this buffer - hence this needs declaring
// way up here!
bool swallowCarriageReturn = false;
while (position > 0)
{
int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);
position -= bytesToRead;
stream.Position = position;
StreamUtil.ReadExactly(stream, buffer, bytesToRead);
// If we haven't read a full buffer, but we had bytes left
// over from before, copy them to the end of the buffer
if (leftOverData > 0 && bytesToRead != bufferSize)
{
// Buffer.BlockCopy doesn't document its behaviour with respect
// to overlapping data: we *might* just have read 7 bytes instead of
// 8, and have two bytes to copy...
Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
}
// We've now *effectively* read this much data.
bytesToRead += leftOverData;
int firstCharPosition = 0;
while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
{
firstCharPosition++;
// Bad UTF-8 sequences could trigger this. For UTF-8 we should always
// see a valid character start in every 3 bytes, and if this is the start of the file
// so we've done a short read, we should have the character start
// somewhere in the usable buffer.
if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
{
throw new InvalidDataException("Invalid UTF-8 data");
}
}
leftOverData = firstCharPosition;
int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
int endExclusive = charsRead;
for (int i = charsRead - 1; i >= 0; i--)
{
char lookingAt = charBuffer[i];
if (swallowCarriageReturn)
{
swallowCarriageReturn = false;
if (lookingAt == '\r')
{
endExclusive--;
continue;
}
}
// Anything non-line-breaking, just keep looking backwards
if (lookingAt != '\n' && lookingAt != '\r')
{
continue;
}
// End of CRLF? Swallow the preceding CR
if (lookingAt == '\n')
{
swallowCarriageReturn = true;
}
int start = i + 1;
string bufferContents = new string(charBuffer, start, endExclusive - start);
endExclusive = i;
string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
if (!firstYield || stringToYield.Length != 0)
{
yield return stringToYield;
}
firstYield = false;
previousEnd = null;
}
previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);
// If we didn't decode the start of the array, put it at the end for next time
if (leftOverData != 0)
{
Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
}
}
if (leftOverData != 0)
{
// At the start of the final buffer, we had the end of another character.
throw new InvalidDataException("Invalid UTF-8 data at start of stream");
}
if (firstYield && string.IsNullOrEmpty(previousEnd))
{
yield break;
}
yield return previousEnd ?? "";
}
finally
{
stream.Dispose();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
I do see that dispose is called on the stream, but doesn't redeclaring it fix that? That class was copied from here: How to read a text file reversely with iterator in C#
Thanks,

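Your finally clause in private IEnumerator<string> GetEnumeratorImpl(Stream stream) disposes your stream after you read from it. Your lambda () => fileStream hands back that same FileStream instance every time, and every Take(1).First() call creates a new enumerator that calls the lambda again, so on the second pass through the do/while loop the reader gets a stream that has already been disposed. Generally you should dispose an object in the same scope or class you create it in.
In this case, remove all the Dispose calls in ReverseLineReader and wrap your original code in a using block: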
using (FileStream fileStream = new FileStream(...))
{
...
do
{
...
} while(...);
}
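An alternative that leaves ReverseLineReader untouched is to pass a factory that opens a new stream on every call, so each enumeration gets (and then disposes) its own stream. The self-contained sketch below reproduces the failure with a hypothetical ReadAllText helper that mirrors the library's pattern (get a stream from the factory, read it, dispose it in finally), using MemoryStream in place of the log file. In the question's code the factory would presumably be () => new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite), where path is the log file's path.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamSourceDemo
{
    // Hypothetical helper mirroring ReverseLineReader's pattern: ask the factory
    // for a stream, read it to the end, and dispose it in a finally block.
    public static string ReadAllText(Func<Stream> streamSource)
    {
        Stream stream = streamSource();
        try
        {
            var collected = new MemoryStream();
            var buffer = new byte[4096];
            int read;
            // Reading from a MemoryStream that an earlier call already disposed
            // throws ObjectDisposedException.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                collected.Write(buffer, 0, read);
            return Encoding.UTF8.GetString(collected.ToArray());
        }
        finally
        {
            stream.Dispose();
        }
    }
}
```

Calling ReadAllText twice with () => shared (a single MemoryStream captured by the lambda) throws ObjectDisposedException on the second call, while () => new MemoryStream(data) works every time because each call receives a fresh stream.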

Related

How to decompress a big file (more than 100 MB) without using any external libraries

I've tried using NuGet packages to extract the .tgz file, but the archive contains files whose names have characters that are not allowed in file names, e.g. 1111-11-1111:11:11.111.AA.
I verified this issue using the SharpCompress library,
so I had to follow the gist linked below:
https://gist.github.com/ForeverZer0/a2cd292bd2f3b5e114956c00bb6e872b
This is the code I've followed to extract the .tgz file. It is a really nice piece of code and works well, but when I try to extract .tgz files larger than 100 MB I get an error saying the stream is too long.
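The error means that you are trying to feed too many bytes into a MemoryStream, which has a maximum capacity of int.MaxValue bytes (about 2 GB). Note that it is the decompressed size that counts, and that can be much larger than the 100 MB .tgz file.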
If you cannot find a suitable library and want to work with provided code, then it can be modified as follows.
Note that the entire GZipStream is first copied to a MemoryStream. Why? As the comment in the code states:
// A GZipStream is not seekable, so copy it first to a MemoryStream
However, the subsequent code uses only two operations that require the stream to be seekable: stream.Seek(x, SeekOrigin.Current) (where x is always positive) and stream.Position. Both of these operations can be emulated by reading the stream, without seeking. For example, to seek forward you can read that number of bytes and discard them:
private static void FakeSeekForward(Stream stream, int offset) {
if (stream.CanSeek)
stream.Seek(offset, SeekOrigin.Current);
else {
int bytesRead = 0;
var buffer = new byte[offset];
while (bytesRead < offset)
{
int read = stream.Read(buffer, bytesRead, offset - bytesRead);
if (read == 0)
throw new EndOfStreamException();
bytesRead += read;
}
}
}
And to track the current stream position you can just store the number of bytes read. Then we can remove the conversion to MemoryStream, and the code from the link becomes:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
public class Tar
{
/// <summary>
/// Extracts a <i>.tar.gz</i> archive to the specified directory.
/// </summary>
/// <param name="filename">The <i>.tar.gz</i> to decompress and extract.</param>
/// <param name="outputDir">Output directory to write the files.</param>
public static void ExtractTarGz(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTarGz(stream, outputDir);
}
/// <summary>
/// Extracts a <i>.tar.gz</i> archive stream to the specified directory.
/// </summary>
/// <param name="stream">The <i>.tar.gz</i> to decompress and extract.</param>
/// <param name="outputDir">Output directory to write the files.</param>
public static void ExtractTarGz(Stream stream, string outputDir)
{
using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
{
// removed the conversion to MemoryStream; the GZipStream is read directly
ExtractTar(gzip, outputDir);
}
}
/// <summary>
/// Extracts a <c>tar</c> archive to the specified directory.
/// </summary>
/// <param name="filename">The <i>.tar</i> to extract.</param>
/// <param name="outputDir">Output directory to write the files.</param>
public static void ExtractTar(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTar(stream, outputDir);
}
/// <summary>
/// Extracts a <c>tar</c> archive to the specified directory.
/// </summary>
/// <param name="stream">The <i>.tar</i> to extract.</param>
/// <param name="outputDir">Output directory to write the files.</param>
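public static void ExtractTar(Stream stream, string outputDir) {
var buffer = new byte[100];
// store current position here (a GZipStream cannot report its position)
long pos = 0;
while (true) {
// Read() on a GZipStream may return fewer bytes than requested, so use ReadFully
int headerRead = ReadFully(stream, buffer, 100);
if (headerRead < 100)
break; // stream ended without the usual zero-filled end blocks
pos += headerRead;
var name = Encoding.ASCII.GetString(buffer).Trim('\0');
if (String.IsNullOrWhiteSpace(name))
break;
FakeSeekForward(stream, 24);
pos += 24;
pos += ReadFully(stream, buffer, 12);
var size = Convert.ToInt64(Encoding.UTF8.GetString(buffer, 0, 12).Trim('\0').Trim(), 8);
FakeSeekForward(stream, 376);
pos += 376;
var output = Path.Combine(outputDir, name);
if (!Directory.Exists(Path.GetDirectoryName(output)))
Directory.CreateDirectory(Path.GetDirectoryName(output));
if (!name.Equals("./", StringComparison.InvariantCulture)) {
// FileMode.Create truncates an existing file; copy in chunks instead of allocating byte[size]
using (var str = File.Open(output, FileMode.Create, FileAccess.Write)) {
var chunk = new byte[81920];
long remaining = size;
while (remaining > 0) {
int toRead = (int)Math.Min(chunk.Length, remaining);
if (ReadFully(stream, chunk, toRead) < toRead)
throw new EndOfStreamException();
str.Write(chunk, 0, toRead);
remaining -= toRead;
}
pos += size;
}
}
var offset = (int) (512 - (pos % 512));
if (offset == 512)
offset = 0;
FakeSeekForward(stream, offset);
pos += offset;
}
}
// Reads until 'count' bytes are in 'buffer' or the stream ends; returns the number read.
private static int ReadFully(Stream stream, byte[] buffer, int count) {
int total = 0;
while (total < count) {
int read = stream.Read(buffer, total, count - total);
if (read == 0)
break;
total += read;
}
return total;
}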
private static void FakeSeekForward(Stream stream, int offset) {
if (stream.CanSeek)
stream.Seek(offset, SeekOrigin.Current);
else {
int bytesRead = 0;
var buffer = new byte[offset];
while (bytesRead < offset)
{
int read = stream.Read(buffer, bytesRead, offset - bytesRead);
if (read == 0)
throw new EndOfStreamException();
bytesRead += read;
}
}
}
}
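The FakeSeekForward helper above allocates a buffer as large as the skip on every call. A variant that reuses a fixed-size scratch buffer and accepts a long count might look like the sketch below; SkipForward and the Gzip test helper are hypothetical names. The example relies on GZipStream reporting CanSeek == false, so the read-and-discard path is the one exercised.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class StreamSkip
{
    // Skips 'count' bytes forward. Seeks when the stream allows it; otherwise
    // reads and discards through a small fixed-size scratch buffer.
    public static void SkipForward(Stream stream, long count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (stream.CanSeek)
        {
            stream.Seek(count, SeekOrigin.Current);
            return;
        }
        var scratch = new byte[81920];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, (int)Math.Min(scratch.Length, count));
            if (read == 0)
                throw new EndOfStreamException();
            count -= read;
        }
    }

    // Helper for the example: gzip-compress an ASCII string into a byte array.
    public static byte[] Gzip(string text)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.ASCII.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }
}
```

Skipping 4 bytes of a decompressing GZipStream over "0123456789" leaves the character '4' as the next byte, and skipping past the end raises EndOfStreamException, as FakeSeekForward does.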

"Must be already floating point" when converting an audio file into wav file

Here I have code for converting an audio file into WAV format for better quality and a smaller file size. I am using NAudio file compression source code, and I get an exception when I try to convert the file:
Must be already floating point
public string ConvertToWAV(string tempFilePath, string tempFileName, string audioType)
{
//Try to transform the file, if it fails use the original file
FileInfo fileInfo = new FileInfo(tempFilePath + tempFileName);
byte[] fileData = new byte[fileInfo.Length];
fileData = File.ReadAllBytes(tempFilePath + tempFileName);
ISampleProvider sampleProvider;
try
{
if (audioType.ToLower().Contains("wav"))
{
try
{
using (MemoryStream wav = new MemoryStream(fileData))
{
WaveStream stream = new WaveFileReader(wav);
WaveFormat target = new WaveFormat();
var s = new RawSourceWaveStream(new MemoryStream(), new WaveFormat(8000, 16, 1));
var c = new WaveFormatConversionStream(WaveFormat.CreateALawFormat(8000, 1), s);
sampleProvider = new WaveToSampleProvider(c);
WaveFileWriter.CreateWaveFile16(tempFilePath + tempFileName, sampleProvider);
wav.Close();
}
}
catch (Exception ex)
{
//We couldn't convert the file, continue with the original file.
}
}
}
catch (Exception ex)
{
throw ex;
}
return Convert.ToBase64String(fileData);
}
There are a couple problems with the code and the concept in general.
First, you're ignoring the WaveFormat of the input file. I'm guessing that you're assuming it's 8K, 16-bit, 1 channel based on the line where you create var s, but this is not a guarantee.
Second, you don't need a MemoryStream or RawSourceWaveStream. WaveFileReader is a WaveStream, and is suitable for any "next-stage" NAudio wave processor.
Third (and this is most likely your exception): The NAudio Wave processors and converters don't like A-Law (or u-Law) in the WaveFormat. A-Law (and u-Law) is technically not PCM data. As such, they're not the "wave" data that NAudio likes to play with.
Ok, with all that said, here are some suggestions. There are very particular A-Law and u-Law encoders in the NAudio.Codecs namespace. Oddly enough, they're named ALawEncoder and MuLawEncoder. These things are not stream-compatible, so we want to make them compatible.
I've added a class at the end here that does just that: it creates an IWaveProvider that actually outputs an A-Law or u-Law stream. Here's the test code that makes use of the new class. The test code does the following:
Reads an input file using MediaFoundationReader (I like this one)
Converts whatever the input format is into 16-bit PCM (while keeping the channel count) using MediaFoundationResampler. Note that this means your input file does not have to have the same format as the A-Law output, so it'll convert pretty much anything.
Feeds that new 16-bit PCM stream to the custom "ALaw-to-IWaveProvider" converter class.
Writes the IWaveProvider-compatible A-Law output to a wave file.
I use the MediaFoundation classes here because they don't seem to be as particular about wave formats as the ACM-based ones.
static void ConversionTest( string _outfilename, string _infilename )
{
try
{
using( var reader = new MediaFoundationReader(_infilename) )
{
// Create a wave format for 16-bit pcm at 8000 samples per second.
int channels = reader.WaveFormat.Channels;
int rate = 8000;
int rawsize = 2;
int blockalign = rawsize * channels; // this is the size of one sample.
int bytespersecond = rate * blockalign;
var midformat =
WaveFormat.CreateCustomFormat( WaveFormatEncoding.Pcm,
rate,
channels,
bytespersecond,
blockalign,
rawsize * 8 );
// And a conversion stream to turn input into 16-bit PCM.
var midstream = new MediaFoundationResampler(reader, midformat);
//var midstream = new WaveFormatConversionStream(midformat, reader);
// The output stream is our custom stream.
var outstream = new PcmToALawConversionStream(midstream);
WaveFileWriter.CreateWaveFile(_outfilename, outstream);
}
}
catch( Exception _ex )
{
// Swallowed here for brevity; log or rethrow _ex in real code.
}
}
And here's the class that converts 16-bit PCM into A-Law or u-Law. At the end are specializations for A-Law or u-Law:
/// <summary>
/// Encodes 16-bit PCM input into A- or u-Law, presenting the output
/// as an IWaveProvider.
/// </summary>
public class PcmToG711ConversionStream : IWaveProvider
{
/// <summary>Gets the local a-law or u-law format.</summary>
public WaveFormat WaveFormat { get { return waveFormat; } }
/// <summary>Returns <paramref name="count"/> encoded bytes.</summary>
/// <remarks>
/// Note that <paramref name="count"/> is raw bytes. It doesn't consider
/// channel counts, etc.
/// </remarks>
/// <param name="buffer">The output buffer.</param>
/// <param name="offset">The starting position in the output buffer.</param>
/// <param name="count">The number of bytes to read.</param>
/// <returns>The total number of bytes encoded into <paramref name="buffer"/>.</returns>
public int Read(byte[] buffer, int offset, int count)
{
// We'll need a source buffer, twice the size of 'count'.
int shortcount = count*2;
byte [] rawsource = new byte [shortcount];
int sourcecount = Provider.Read(rawsource, 0, shortcount);
int bytecount = sourcecount / 2;
for( int index = 0; index < bytecount; ++index )
{
short source = BitConverter.ToInt16(rawsource, index*2);
buffer[offset+index] = Encode(source);
}
return bytecount;
}
/// <summary>
/// Initializes an A-Law or u-Law "WaveStream". The source stream
/// must be 16-bit PCM!
/// </summary>
/// <param name="_encoding">ALaw or MuLaw only.</param>
/// <param name="_provider">The input PCM stream.</param>
public PcmToG711ConversionStream( WaveFormatEncoding _encoding,
IWaveProvider _provider )
{
Provider = _provider;
WaveFormat sourceformat = Provider.WaveFormat;
// Reject anything that is not exactly 16-bit PCM (either condition failing is an error)
if( (sourceformat.Encoding != WaveFormatEncoding.Pcm) ||
(sourceformat.BitsPerSample != 16) )
{
throw new NotSupportedException("Input must be 16-bit PCM. Try using a conversion stream.");
}
if( _encoding == WaveFormatEncoding.ALaw )
{
Encode = this.EncodeALaw;
waveFormat = WaveFormat.CreateALawFormat( _provider.WaveFormat.SampleRate,
_provider.WaveFormat.Channels) ;
}
else if( _encoding == WaveFormatEncoding.MuLaw )
{
Encode = this.EncodeMuLaw;
waveFormat = WaveFormat.CreateMuLawFormat( _provider.WaveFormat.SampleRate,
_provider.WaveFormat.Channels) ;
}
else
{
throw new NotSupportedException("Encoding must be A-Law or u-Law");
}
}
/// <summary>The a-law or u-law encoder delegate.</summary>
EncodeHandler Encode;
/// <summary>a-law or u-law wave format.</summary>
WaveFormat waveFormat;
/// <summary>The input stream.</summary>
IWaveProvider Provider;
/// <summary>A-Law or u-Law encoder delegate.</summary>
/// <param name="_sample">The 16-bit PCM sample to encode.</param>
/// <returns>The encoded value.</returns>
delegate byte EncodeHandler( short _sample );
byte EncodeALaw( short _sample )
{
return ALawEncoder.LinearToALawSample(_sample);
}
byte EncodeMuLaw( short _sample )
{
return MuLawEncoder.LinearToMuLawSample(_sample);
}
}
public class PcmToALawConversionStream : PcmToG711ConversionStream
{
public PcmToALawConversionStream( IWaveProvider _provider )
: base(WaveFormatEncoding.ALaw, _provider)
{
}
}
public class PcmToMuLawConversionStream : PcmToG711ConversionStream
{
public PcmToMuLawConversionStream( IWaveProvider _provider )
: base(WaveFormatEncoding.MuLaw, _provider)
{
}
}
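To make the "one byte per 16-bit sample" step in Read() concrete, here is a textbook G.711 u-Law (μ-law) encoder based on the classic Sun reference algorithm. It is an illustrative sketch, not NAudio's MuLawEncoder source; G711Sketch and its methods are hypothetical names, and Encode assumes little-endian PCM on a little-endian machine, as the BitConverter call in the class above does.

```csharp
using System;

static class G711Sketch
{
    const int Bias = 0x84;   // 132, added before segment lookup
    const int Clip = 32635;  // largest magnitude that still fits after biasing

    public static byte LinearToMuLaw(short sample)
    {
        int value = sample;                 // int avoids overflow when negating -32768
        int sign = (value >> 8) & 0x80;     // 0x80 for negative samples
        if (sign != 0)
            value = -value;
        if (value > Clip)
            value = Clip;
        value += Bias;

        // Segment (exponent): highest set bit between bit 14 (exponent 7) and bit 7 (exponent 0).
        int exponent = 7;
        for (int mask = 0x4000; (value & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (value >> (exponent + 3)) & 0x0F;
        // u-Law bytes are stored inverted.
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Same shape as Read() above: two PCM bytes in, one encoded byte out.
    public static byte[] Encode(byte[] pcm16)
    {
        var encoded = new byte[pcm16.Length / 2];
        for (int i = 0; i < encoded.Length; i++)
            encoded[i] = LinearToMuLaw(BitConverter.ToInt16(pcm16, i * 2));
        return encoded;
    }
}
```

Silence (0) encodes to 0xFF, full-scale positive to 0x80 and full-scale negative to 0x00, which is why a G.711 file is half the size of the 16-bit PCM it came from but is no longer PCM data that the NAudio PCM processors accept.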
At last I found a solution for this issue: on Windows Server 2012 the Media Foundation feature has to be installed for the conversion to work.
Use the Add Roles and Features wizard in Server Manager, skip through to Features, and select Media Foundation.

Do NetworkStreams have race conditions? [duplicate]

So, it would seem that a blocking Read() can return before it is done receiving all of the data being sent to it. In turn we wrap the Read() with a loop that is controlled by the DataAvailable value from the stream in question. The problem is that you can receive more data while in this while loop, but there is no behind the scenes processing going on to let the system know this. Most of the solutions I have found to this on the net have not been applicable in one way or another to me.
What I have ended up doing is, as the last step in my loop, a simple Thread.Sleep(1) after reading each block from the stream. This appears to give the system time to update, and I am now getting accurate results, but this seems a bit hacky and quite 'circumstantial' for a solution.
Here is a list of the circumstances I am dealing with: Single TCP Connection between an IIS Application and a standalone application, both written in C# for send/receive communication. It sends a request and then waits for a response. This request is initiated by an HTTP request, but I am not having this issue reading data from the HTTP Request, it is after the fact.
Here is the basic code for handling an incoming connection
protected void OnClientCommunication(TcpClient oClient)
{
NetworkStream stream = oClient.GetStream();
MemoryStream msIn = new MemoryStream();
byte[] aMessage = new byte[4096];
int iBytesRead = 0;
while ( stream.DataAvailable )
{
int iRead = stream.Read(aMessage, 0, aMessage.Length);
iBytesRead += iRead;
msIn.Write(aMessage, 0, iRead);
Thread.Sleep(1);
}
MemoryStream msOut = new MemoryStream();
// .. Do some processing adding data to the msOut stream
msOut.WriteTo(stream);
stream.Flush();
oClient.Close();
}
All feedback welcome for a better solution or just a thumbs up on needing to give that Sleep(1) a go to allow things to update properly before we check the DataAvailable value.
Guess I am hoping after 2 years that the answer to this question isn't how things still are :)
You have to know how much data you need to read; you cannot simply loop reading data until there is no more data, because you can never be sure that no more is going to come.
This is why HTTP GET results have a byte count in the HTTP headers: so the client side will know when it has received all the data.
Here are two solutions for you depending on whether you have control over what the other side is sending:
Use "framing" characters: (SB)data(EB), where SB and EB are start-block and end-block characters (of your choosing) but which CANNOT occur inside the data. When you "see" EB, you know you are done.
Implement a length field in front of each message to indicate how much data follows: (len)data. Read (len), then read (len) bytes; repeat as necessary.
This isn't like reading from a file where a zero-length read means end-of-data (that DOES mean the other side has disconnected, but that's another story).
A third (not recommended) solution is that you can implement a timer. Once you start getting data, set the timer. If the receive loop is idle for some period of time (say a few seconds, if data doesn't come often), you can probably assume no more data is coming. This last method is a last resort... it's not very reliable, hard to tune, and it's fragile.
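The first option (framing characters) could be sketched over any Stream as below. Choosing STX (0x02) and ETX (0x03) as the start- and end-block bytes is an assumption for illustration, as are the Framing, WriteFrame and ReadFrame names. The point is that the reader keeps blocking on reads until the end-block byte arrives, instead of consulting DataAvailable.

```csharp
using System;
using System.IO;

static class Framing
{
    // Hypothetical delimiter choice; neither byte may appear inside a payload.
    const byte StartBlock = 0x02; // STX
    const byte EndBlock = 0x03;   // ETX

    public static void WriteFrame(Stream stream, byte[] payload)
    {
        if (Array.IndexOf(payload, StartBlock) >= 0 || Array.IndexOf(payload, EndBlock) >= 0)
            throw new ArgumentException("Payload contains a framing byte.", nameof(payload));
        stream.WriteByte(StartBlock);
        stream.Write(payload, 0, payload.Length);
        stream.WriteByte(EndBlock);
    }

    // Blocks until a complete frame has arrived; -1 from ReadByte means the peer closed.
    public static byte[] ReadFrame(Stream stream)
    {
        int b;
        // Discard anything before the start-block byte.
        while ((b = stream.ReadByte()) != StartBlock)
        {
            if (b == -1)
                throw new EndOfStreamException();
        }
        var payload = new MemoryStream();
        while ((b = stream.ReadByte()) != EndBlock)
        {
            if (b == -1)
                throw new EndOfStreamException("Connection closed mid-frame.");
            payload.WriteByte((byte)b);
        }
        return payload.ToArray();
    }
}
```

ReadByte on a raw NetworkStream costs one call per byte; wrapping the stream in a BufferedStream reduces that overhead without changing the framing logic.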
I'm seeing a problem with this.
You're expecting that the communication will be faster than the while() loop, which is very unlikely.
The while() loop will finish as soon as there is no more data, which may not be the case a few milliseconds just after it exits.
Are you expecting a certain amount of bytes?
How often is OnClientCommunication() fired? Who triggers it?
What do you do with the data after the while() loop? Do you keep appending to previous data?
DataAvailable WILL return false because you're reading faster than the communication, so that's fine only if you keep coming back to this code block to process more data coming in.
I was trying to check DataAvailable before reading data from a network stream and it would return false, although after reading a single byte it would return true. So I checked the MSDN documentation and they also read before checking. I would re-arrange the while loop to a do while loop to follow this pattern.
http://msdn.microsoft.com/en-us/library/system.net.sockets.networkstream.dataavailable.aspx
// Check to see if this NetworkStream is readable.
if(myNetworkStream.CanRead){
byte[] myReadBuffer = new byte[1024];
StringBuilder myCompleteMessage = new StringBuilder();
int numberOfBytesRead = 0;
// Incoming message may be larger than the buffer size.
do{
numberOfBytesRead = myNetworkStream.Read(myReadBuffer, 0, myReadBuffer.Length);
myCompleteMessage.AppendFormat("{0}", Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead));
}
while(myNetworkStream.DataAvailable);
// Print out the received message to the console.
Console.WriteLine("You received the following message : " +
myCompleteMessage);
}
else{
Console.WriteLine("Sorry. You cannot read from this NetworkStream.");
}
When I have this code:
var readBuffer = new byte[1024];
using (var memoryStream = new MemoryStream())
{
do
{
int numberOfBytesRead = networkStream.Read(readBuffer, 0, readBuffer.Length);
memoryStream.Write(readBuffer, 0, numberOfBytesRead);
}
while (networkStream.DataAvailable);
}
From what I can observe:
When the sender sends 1000 bytes and the reader wants to read them, I suspect that NetworkStream somehow "knows" that it should receive 1000 bytes.
When I call .Read before any data has arrived, .Read should block until it gets more than 0 bytes (or more if .NoDelay is false on networkStream).
Then, when I read the first batch of data, I suspect that .Read somehow updates NetworkStream's counter of those 1000 bytes from its result, and that while this happens .DataAvailable is false; once the counter is updated, .DataAvailable is set to the correct value if fewer than 1000 bytes have been read. It makes sense when you think about it, because otherwise the loop would go to the next iteration before checking that 1000 bytes arrived, and .Read would block indefinitely, since the reader could already have read all 1000 bytes and no more data would arrive.
This, I think, is the point of failure here, as James already said:
Yes, this is just the way these libraries work. They need to be given time to run to fully validate the data incoming. – James Apr 20 '16 at 5:24
I suspect that updating the internal counter between the end of .Read and the access to .DataAvailable is not an atomic operation (transaction), so the TcpClient needs more time to properly set DataAvailable.
When I have this code:
var readBuffer = new byte[1024];
using (var memoryStream = new MemoryStream())
{
do
{
int numberOfBytesRead = networkStream.Read(readBuffer, 0, readBuffer.Length);
memoryStream.Write(readBuffer, 0, numberOfBytesRead);
if (!networkStream.DataAvailable)
System.Threading.Thread.Sleep(1); //Or 50 for non-believers ;)
}
while (networkStream.DataAvailable);
}
Then the NetworkStream has enough time to properly set .DataAvailable, and this method should function correctly.
Fun fact: this seems to be somehow OS-version dependent, because the first function without the sleep worked for me on Win XP and Win 10, but failed to receive the whole 1000 bytes on Win 7. Don't ask me why, but I tested it quite thoroughly and it was easily reproducible.
Using TcpClient.Available will allow this code to read exactly what is available each time. TcpClient.Available is automatically set to TcpClient.ReceiveBufferSize when the amount of data remaining to be read is greater than or equal to TcpClient.ReceiveBufferSize. Otherwise it is set to the size of the remaining data.
Hence, you can indicate the maximum amount of data that is available for each read by setting TcpClient.ReceiveBufferSize (e.g., oClient.ReceiveBufferSize = 4096;).
protected void OnClientCommunication(TcpClient oClient)
{
NetworkStream stream = oClient.GetStream();
MemoryStream msIn = new MemoryStream();
byte[] aMessage;
oClient.ReceiveBufferSize = 4096;
int iBytesRead = 0;
while (stream.DataAvailable)
{
int myBufferSize = (oClient.Available < 1) ? 1 : oClient.Available;
aMessage = new byte[myBufferSize]; // use the computed size so the buffer is never zero-length
int iRead = stream.Read(aMessage, 0, aMessage.Length);
iBytesRead += iRead;
msIn.Write(aMessage, 0, iRead);
}
MemoryStream msOut = new MemoryStream();
// .. Do some processing adding data to the msOut stream
msOut.WriteTo(stream);
stream.Flush();
oClient.Close();
}
public class NetworkStream
{
private readonly Socket m_Socket;
public NetworkStream(Socket socket)
{
m_Socket = socket ?? throw new ArgumentNullException(nameof(socket));
}
public void Send(string message)
{
if (message is null)
{
throw new ArgumentNullException(nameof(message));
}
byte[] data = Encoding.UTF8.GetBytes(message);
SendInternal(data);
}
public string Receive()
{
byte[] buffer = ReceiveInternal();
string message = Encoding.UTF8.GetString(buffer);
return message;
}
private void SendInternal(byte[] message)
{
int size = message.Length;
if (size == 0)
{
m_Socket.Send(BitConverter.GetBytes(size), 0, sizeof(int), SocketFlags.None);
}
else
{
m_Socket.Send(BitConverter.GetBytes(size), 0, sizeof(int), SocketFlags.None);
m_Socket.Send(message, 0, size, SocketFlags.None);
}
}
private byte[] ReceiveInternal()
{
byte[] sizeData = CommonReceiveMessage(sizeof(int));
int size = BitConverter.ToInt32(sizeData);
if (size == 0)
{
return Array.Empty<byte>();
}
return CommonReceiveMessage(size);
}
private byte[] CommonReceiveMessage(int messageLength)
{
if (messageLength < 0)
{
throw new ArgumentOutOfRangeException(nameof(messageLength), messageLength, "The message size cannot be less than zero.");
}
if (messageLength == 0)
{
return Array.Empty<byte>();
}
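byte[] buffer = new byte[m_Socket.ReceiveBufferSize];
int currentLength = 0;
int receivedDataLength;
using (MemoryStream memoryStream = new())
{
do
{
// Never ask for more than the rest of this message; otherwise bytes that
// belong to the next message (or its size prefix) would be consumed here.
int bytesToReceive = Math.Min(buffer.Length, messageLength - currentLength);
receivedDataLength = m_Socket.Receive(buffer, 0, bytesToReceive, SocketFlags.None);
if (receivedDataLength == 0)
{
// 0 bytes means the remote side closed the connection.
throw new EndOfStreamException("The connection was closed before the whole message arrived.");
}
currentLength += receivedDataLength;
memoryStream.Write(buffer, 0, receivedDataLength);
}
while (currentLength < messageLength);
return memoryStream.ToArray();
}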
}
}
This example presents an algorithm for sending and receiving data, namely text messages. You can also send files.
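As a rough, self-contained illustration of the same length-prefix idea (a 4-byte size followed by the UTF-8 payload), the sketch below works on any Stream rather than a Socket, so it can be exercised with a MemoryStream. The names Framing, WriteMessage, ReadMessage and ReadBytes are hypothetical, and it assumes both peers share the same byte order, since BitConverter uses the machine's endianness.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: length-prefixed UTF-8 messages over any Stream.
public static class Framing
{
    public static void WriteMessage(Stream stream, string message)
    {
        byte[] payload = Encoding.UTF8.GetBytes(message);
        byte[] prefix = BitConverter.GetBytes(payload.Length); // always 4 bytes
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    public static string ReadMessage(Stream stream)
    {
        int length = BitConverter.ToInt32(ReadBytes(stream, sizeof(int)), 0);
        if (length < 0)
            throw new InvalidDataException("Negative message length.");
        return Encoding.UTF8.GetString(ReadBytes(stream, length));
    }

    // Loops because a single Read may return fewer bytes than requested.
    private static byte[] ReadBytes(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the full message arrived.");
            offset += read;
        }
        return buffer;
    }
}
```

Asking Read for exactly the missing byte count is what keeps one message from swallowing the start of the next, the same role the counter plays in the Socket-based version.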
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;
namespace Network
{
/// <summary>
/// Represents a network stream for transferring data.
/// </summary>
public class NetworkStream
{
#region Fields
private static readonly byte[] EmptyArray = Array.Empty<byte>();
private readonly Socket m_Socket;
#endregion
#region Constructors
/// <summary>
/// Initializes a new instance of the class <seealso cref="NetworkStream"/>.
/// </summary>
/// <param name="socket">
/// Berkeley socket interface.
/// </param>
public NetworkStream(Socket socket)
{
m_Socket = socket ?? throw new ArgumentNullException(nameof(socket));
}
#endregion
#region Properties
#endregion
#region Methods
/// <summary>
/// Sends a message.
/// </summary>
/// <param name="message">
/// Message text.
/// </param>
/// <exception cref="ArgumentNullException"/>
public void Send(string message)
{
if (message is null)
{
throw new ArgumentNullException(nameof(message));
}
byte[] data = Encoding.UTF8.GetBytes(message);
Write(data);
}
/// <summary>
/// Receives the sent message.
/// </summary>
/// <returns>
/// Sent message.
/// </returns>
public string Receive()
{
byte[] data = Read();
return Encoding.UTF8.GetString(data);
}
/// <summary>
/// Receives the specified number of bytes from a bound <seealso cref="Socket"/>.
/// </summary>
/// <param name="size">
/// The size of the received data.
/// </param>
/// <returns>
/// Returns an array of received data.
/// </returns>
private byte[] Read(int size)
{
if (size < 0)
{
// You can throw an exception.
return null;
}
if (size == 0)
{
// Don't throw an exception here, just return an empty data array.
return EmptyArray;
}
// There are many examples on the Internet that rely on the
// Socket.Available property; this is WRONG!
// Socket.Available only reports bytes that have already arrived in the receive buffer:
// packets may still be in transit while it returns 0 or a value smaller than the message.
// Therefore, we use a counter that lets us receive exactly the expected number of bytes, no more and no less.
// The loop continues until all the data has been received or a receive timeout (Socket.ReceiveTimeout) is triggered.
// Note: this algorithm is not designed for very large payloads (everything is buffered in memory).
SimpleCounter counter = new(size, m_Socket.ReceiveBufferSize);
byte[] buffer = new byte[counter.BufferSize];
int received;
using MemoryStream storage = new();
// The cycle will run until we get all the data.
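while (counter.IsExpected)
{
received = m_Socket.Receive(buffer, 0, counter.Available, SocketFlags.None);
// Receive returns 0 when the remote side has closed the connection;
// without this check the loop would spin forever.
if (received == 0)
{
throw new SocketException((int)SocketError.ConnectionReset);
}
// Pass the size of the received data to the counter.
counter.Count(received);
// Write data to memory.
storage.Write(buffer, 0, received);
}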
return storage.ToArray();
}
/// <summary>
/// Receives a length-prefixed message from the bound <seealso cref="Socket"/>:
/// first a 4-byte size, then that many bytes of payload.
/// </summary>
/// <returns>
/// Returns an array of received data.
/// </returns>
private byte[] Read()
{
byte[] sizeData;
// First, we get the size of the master data.
sizeData = Read(sizeof(int));
// We convert the received data into a number.
int size = BitConverter.ToInt32(sizeData);
// If the data size is less than 0, throw an exception
// to inform the caller that an error occurred while reading the data.
if (size < 0)
{
// Or return the value null.
throw new SocketException();
}
// If the data size is 0, then we will return an empty array.
// Do not allow an exception here.
if (size == 0)
{
return EmptyArray;
}
// Here we read the master data.
byte[] data = Read(size);
return data;
}
/// <summary>
/// Writes data to the stream.
/// </summary>
/// <param name="data">The data to send.</param>
private void Write(byte[] data)
{
if (data is null)
{
// Throw an exception.
// Or send a negative number that will represent the value null.
throw new ArgumentNullException(nameof(data));
}
byte[] sizeData = BitConverter.GetBytes(data.Length);
// In any case, we inform the recipient about the size of the data.
m_Socket.Send(sizeData, 0, sizeof(int), SocketFlags.None);
if (data.Length != 0)
{
// We send data whose size is greater than zero.
m_Socket.Send(data, 0, data.Length, SocketFlags.None);
}
}
#endregion
#region Classes
/// <summary>
/// Represents a simple counter of received data over the network.
/// </summary>
private class SimpleCounter
{
#region Fields
private int m_Received;
private int m_Available;
private bool m_IsExpected;
#endregion
#region Constructors
/// <summary>
/// Initializes a new instance of the class <seealso cref="SimpleCounter"/>.
/// </summary>
/// <param name="dataSize">
/// Data size.
/// </param>
/// <param name="bufferSize">
/// Buffer size.
/// </param>
/// <exception cref="ArgumentOutOfRangeException"/>
public SimpleCounter(int dataSize, int bufferSize)
{
if (dataSize < 0)
{
throw new ArgumentOutOfRangeException(nameof(dataSize), dataSize, "Data size cannot be less than 0");
}
if (bufferSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(bufferSize), bufferSize, "Buffer size must be greater than 0");
}
DataSize = dataSize;
BufferSize = bufferSize;
// Update the counter data.
UpdateCounter();
}
#endregion
#region Properties
/// <summary>
/// Returns the size of the expected data.
/// </summary>
/// <value>
/// Size of expected data.
/// </value>
public int DataSize { get; }
/// <summary>
/// Returns the size of the buffer.
/// </summary>
/// <value>
/// Buffer size.
/// </value>
public int BufferSize { get; }
/// <summary>
/// Returns the available buffer size for receiving data.
/// </summary>
/// <value>
/// Available buffer size.
/// </value>
public int Available
{
get
{
return m_Available;
}
}
/// <summary>
/// Returns a value indicating whether the thread should wait for data.
/// </summary>
/// <value>
/// <see langword="true"/> if the stream is waiting for data; otherwise, <see langword="false"/>.
/// </value>
public bool IsExpected
{
get
{
return m_IsExpected;
}
}
#endregion
#region Methods
// Updates the counter.
private void UpdateCounter()
{
int unreadDataSize = DataSize - m_Received;
m_Available = unreadDataSize < BufferSize ? unreadDataSize : BufferSize;
m_IsExpected = m_Available > 0;
}
/// <summary>
/// Specifies the size of the received data.
/// </summary>
/// <param name="bytes">
/// The size of the received data.
/// </param>
public void Count(int bytes)
{
// NOTE: Counter cannot decrease.
if (bytes > 0)
{
int received = m_Received + bytes;
// NOTE: The value of the received data cannot exceed the size of the expected data.
m_Received = (received < DataSize) ? received : DataSize;
// Update the counter data.
UpdateCounter();
}
}
/// <summary>
/// Resets counter data.
/// </summary>
public void Reset()
{
m_Received = 0;
UpdateCounter();
}
#endregion
}
#endregion
}
}
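Use a do-while loop. This guarantees that at least one Read (or ReadAsync) call is made before DataAvailable is checked: Read blocks until some data has arrived, and after that the DataAvailable property returns true as long as more received data is already waiting to be read. Note that DataAvailable becoming false does not by itself mean the whole message has arrived; more packets may still be in transit.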
An example from the Microsoft docs:
// Check to see if this NetworkStream is readable.
if (myNetworkStream.CanRead)
{
    byte[] myReadBuffer = new byte[1024];
    StringBuilder myCompleteMessage = new StringBuilder();
    int numberOfBytesRead = 0;
    // Incoming message may be larger than the buffer size.
    do
    {
        numberOfBytesRead = myNetworkStream.Read(myReadBuffer, 0, myReadBuffer.Length);
        myCompleteMessage.AppendFormat("{0}", Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead));
    }
    while (myNetworkStream.DataAvailable);
    // Print out the received message to the console.
    Console.WriteLine("You received the following message : " + myCompleteMessage);
}
else
{
    Console.WriteLine("Sorry. You cannot read from this NetworkStream.");
}
Original Microsoft Docs

Code Contracts Static Analysis: Prover Limitations?

I've been playing with Code Contracts and I really like what I've seen so far. They encourage me to evaluate and explicitly declare my assumptions, which has already helped me to identify a few corner cases I hadn't considered in the code to which I'm adding contracts. Right now I'm playing with trying to enforce more sophisticated invariants. I have one case that currently fails proving and I'm curious if there is a way I can fix this besides simply adding Contract.Assume calls. Here is the class in question, stripped down for ease of reading:
public abstract class MemoryEncoder
{
private const int CapacityDelta = 16;
private int _currentByte;
/// <summary>
/// The current byte index in the encoding stream.
/// This should not need to be modified, under typical usage,
/// but can be used to randomly access the encoding region.
/// </summary>
public int CurrentByte
{
get
{
Contract.Ensures(Contract.Result<int>() >= 0);
Contract.Ensures(Contract.Result<int>() <= Length);
return _currentByte;
}
set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= Length);
_currentByte = value;
}
}
/// <summary>
/// Current number of bytes encoded in the buffer.
/// This may be less than the size of the buffer (capacity).
/// </summary>
public int Length { get; private set; }
/// <summary>
/// The raw buffer encapsulated by the encoder.
/// </summary>
protected internal Byte[] Buffer { get; private set; }
/// <summary>
/// Reserve space in the encoder buffer for the specified number of new bytes
/// </summary>
/// <param name="bytesRequired">The number of bytes required</param>
protected void ReserveSpace(int bytesRequired)
{
Contract.Requires(bytesRequired > 0);
Contract.Ensures((Length - CurrentByte) >= bytesRequired);
//Check if these bytes would overflow the current buffer
if ((CurrentByte + bytesRequired) > Buffer.Length)
{
//Create a new buffer with at least enough space for the additional bytes required
var newBuffer = new Byte[Buffer.Length + Math.Max(bytesRequired, CapacityDelta)];
//Copy the contents of the previous buffer and replace the original buffer reference
Buffer.CopyTo(newBuffer, 0);
Buffer = newBuffer;
}
//Check if the total length of written bytes has increased
if ((CurrentByte + bytesRequired) > Length)
{
Length = CurrentByte + bytesRequired;
}
}
[ContractInvariantMethod]
private void GlobalRules()
{
Contract.Invariant(Buffer != null);
Contract.Invariant(Length <= Buffer.Length);
Contract.Invariant(CurrentByte >= 0);
Contract.Invariant(CurrentByte <= Length);
}
}
I'm interested in how I can structure the Contract calls in ReserveSpace so that the class invariants are provable. In particular, it complains about (Length <= Buffer.Length) and (CurrentByte <= Length). It's reasonable to me that it can't see that (Length <= Buffer.Length) is satisfied, since it's creating a new buffer and reassigning the reference. Is my only option to add an Assume that the invariants are satisfied?
After fighting with this for a while, I came up with this provable solution (constructor is a dummy to allow for isolated testing):
public abstract class MemoryEncoder
{
private const int CapacityDelta = 16;
private byte[] _buffer;
private int _currentByte;
private int _length;
protected MemoryEncoder()
{
Buffer = new byte[500];
Length = 0;
CurrentByte = 0;
}
/// <summary>
/// The current byte index in the encoding stream.
/// This should not need to be modified, under typical usage,
/// but can be used to randomly access the encoding region.
/// </summary>
public int CurrentByte
{
get
{
return _currentByte;
}
set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= Length);
_currentByte = value;
}
}
/// <summary>
/// Current number of bytes encoded in the buffer.
/// This may be less than the size of the buffer (capacity).
/// </summary>
public int Length
{
get { return _length; }
private set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= _buffer.Length);
Contract.Requires(value >= CurrentByte);
Contract.Ensures(_length <= _buffer.Length);
_length = value;
}
}
/// <summary>
/// The raw buffer encapsulated by the encoder.
/// </summary>
protected internal Byte[] Buffer
{
get { return _buffer; }
private set
{
Contract.Requires(value != null);
Contract.Requires(value.Length >= _length);
_buffer = value;
}
}
/// <summary>
/// Reserve space in the encoder buffer for the specified number of new bytes
/// </summary>
/// <param name="bytesRequired">The number of bytes required</param>
protected void ReserveSpace(int bytesRequired)
{
Contract.Requires(bytesRequired > 0);
Contract.Ensures((Length - CurrentByte) >= bytesRequired);
//Check if these bytes would overflow the current buffer
if ((CurrentByte + bytesRequired) > Buffer.Length)
{
//Create a new buffer with at least enough space for the additional bytes required
var newBuffer = new Byte[Buffer.Length + Math.Max(bytesRequired, CapacityDelta)];
//Copy the contents of the previous buffer and replace the original buffer reference
Buffer.CopyTo(newBuffer, 0);
Buffer = newBuffer;
}
//Check if the total length of written bytes has increased
if ((CurrentByte + bytesRequired) > Length)
{
Contract.Assume(CurrentByte + bytesRequired <= _buffer.Length);
Length = CurrentByte + bytesRequired;
}
}
[ContractInvariantMethod]
private void GlobalRules()
{
Contract.Invariant(_buffer != null);
Contract.Invariant(_length <= _buffer.Length);
Contract.Invariant(_currentByte >= 0);
Contract.Invariant(_currentByte <= _length);
}
}
The main thing I noticed is that placing invariants on properties gets messy, whereas invariants on the backing fields seem easier for the prover to solve. It was also important to place appropriate contractual obligations in the property accessors. I'll have to keep experimenting to see what works and what doesn't. It's an interesting system, but I'd definitely like to know more if anybody has a good 'cheat sheet' on how the prover works.
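The Contract.Assume left in the revised ReserveSpace can at least be checked by hand. Write c for CurrentByte, L for Length and B for Buffer.Length before the call, primes for the values after it, r for bytesRequired and Δ for CapacityDelta. The invariants give 0 ≤ c ≤ L ≤ B and the precondition gives r > 0; the following sketch ignores integer overflow:

```latex
\begin{aligned}
&\text{Case } c + r > B:\quad B' = B + \max(r, \Delta) \ge B + r \ge c + r,\\
&\text{Case } c + r \le B:\quad B' = B \ge c + r,\\
&\text{so in both cases}\quad c + r \le B' \quad\text{(the assumed fact)}.\\
&L' = \max(L,\ c + r) \le B' \quad\text{since } L \le B \le B',\\
&0 \le c \le L' \quad\text{and}\quad L' - c \ge (c + r) - c = r.
\end{aligned}
```

So both class invariants and the Ensures clause follow from the pre-state invariants, which suggests the Assume states a true fact that the static checker simply does not derive across the reassignment of the buffer.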

Reading text files line by line, with exact offset/position reporting

My simple requirement: reading a huge (more than a million lines) text file (for this example, assume it's a CSV of some sort) and keeping a reference to the beginning of each line for faster lookup in the future (read a line, starting at X).
I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately, that doesn't work as I intended:
Given a file containing the following
Foo
Bar
Baz
Bla
Fasel
and this very simple code
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = sr.BaseStream.Position;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos = sr.BaseStream.Position;
}
}
the output is:
000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel
I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is necessary. For me this is bad.
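That guess matches how StreamReader behaves: it reads the underlying stream in blocks into an internal buffer and serves ReadLine from that buffer, so BaseStream.Position reports how far the buffer has been filled, not where the current line ends. A small in-memory sketch (the class and method names are hypothetical) reproduces the output above, assuming the file uses CRLF line endings, in which case the five lines occupy exactly 25 bytes:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Records BaseStream.Position after each ReadLine() on an in-memory copy of the text.
    public static long[] PositionsAfterEachLine(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text); // GetBytes never writes a BOM
        using var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8);
        var positions = new List<long>();
        while (reader.ReadLine() != null)
        {
            // For a small input, the first ReadLine already pulled everything into StreamReader's buffer.
            positions.Add(reader.BaseStream.Position);
        }
        return positions.ToArray();
    }
}
```

Calling PositionsAfterEachLine("Foo\r\nBar\r\nBaz\r\nBla\r\nFasel") yields 25 for every line, the same value the question's loop printed.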
The question, finally: is there any way to get the (byte, char) offset while reading a file line by line without using a basic Stream and messing with \r, \n, \r\n and string encoding etc. manually? Not a big deal, really; I just don't like to build things that might already exist.
You could create a TextReader wrapper that tracks the current position in the base TextReader:
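public class TrackingTextReader : TextReader
{
    private readonly TextReader _baseReader;
    private int _position;
    public TrackingTextReader(TextReader baseReader)
    {
        _baseReader = baseReader;
    }
    // TextReader's default ReadLine() is built on Read(), so counting here covers ReadLine too.
    public override int Read()
    {
        int c = _baseReader.Read();
        // Don't advance past the end of the input (Read returns -1 there).
        if (c != -1)
            _position++;
        return c;
    }
    public override int Peek()
    {
        return _baseReader.Peek();
    }
    // Number of characters (not bytes) consumed so far.
    public int Position
    {
        get { return _position; }
    }
    protected override void Dispose(bool disposing)
    {
        if (disposing)
            _baseReader.Dispose();
        base.Dispose(disposing);
    }
}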
You could then use it as follows:
string text = @"Foo
Bar
Baz
Bla
Fasel";
using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
    string line;
    // Capture the position before reading, so it is the offset where the line starts.
    int start = trackingReader.Position;
    while ((line = trackingReader.ReadLine()) != null)
    {
        Console.WriteLine("{0:d3} {1}", start, line);
        start = trackingReader.Position;
    }
}
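Position here counts characters, not bytes. If the offsets are meant for seeking a FileStream later, they have to be converted, and for UTF-8 text with non-ASCII characters the two differ. A minimal sketch of the conversion, assuming UTF-8 without a BOM and that the text before the offset is available as a string (OffsetConversion and CharToUtf8ByteOffset are hypothetical names):

```csharp
using System;
using System.Text;

public static class OffsetConversion
{
    // Converts a character offset into text to the byte offset of the same point in its UTF-8 encoding.
    // Assumes UTF-8 without a BOM; add 3 (the UTF-8 preamble length) if the file starts with a BOM.
    public static long CharToUtf8ByteOffset(string text, int charOffset)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));
        if (charOffset < 0 || charOffset > text.Length)
            throw new ArgumentOutOfRangeException(nameof(charOffset));
        // Non-ASCII code points take 2 to 4 bytes in UTF-8, so count bytes rather than chars.
        return Encoding.UTF8.GetByteCount(text.Substring(0, charOffset));
    }
}
```

For example, in "h\u00e9llo\nworld" the line "world" starts at character 6 but at byte 7, because "\u00e9" takes two bytes in UTF-8.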
After searching, testing, and trying some crazy things, here is my solution (I'm currently using this code in my product).
public sealed class TextFileReader : IDisposable
{
FileStream _fileStream = null;
BinaryReader _binReader = null;
StreamReader _streamReader = null;
List<string> _lines = null;
long _length = -1;
/// <summary>
/// Initializes a new instance of the <see cref="TextFileReader"/> class with default encoding (UTF8).
/// </summary>
/// <param name="filePath">The path to text file.</param>
public TextFileReader(string filePath) : this(filePath, Encoding.UTF8) { }
/// <summary>
/// Initializes a new instance of the <see cref="TextFileReader"/> class.
/// </summary>
/// <param name="filePath">The path to text file.</param>
/// <param name="encoding">The encoding of text file.</param>
public TextFileReader(string filePath, Encoding encoding)
{
if (!File.Exists(filePath))
throw new FileNotFoundException("File (" + filePath + ") is not found.");
_fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
_length = _fileStream.Length;
_binReader = new BinaryReader(_fileStream, encoding);
}
/// <summary>
/// Reads a line of characters from the current stream at the current position and returns the data as a string.
/// </summary>
/// <returns>The next line from the input stream, or null if the end of the input stream is reached</returns>
public string ReadLine()
{
if (_binReader.PeekChar() == -1)
return null;
string line = "";
int nextChar = _binReader.Read();
while (nextChar != -1)
{
char current = (char)nextChar;
if (current.Equals('\n'))
break;
else if (current.Equals('\r'))
{
int pickChar = _binReader.PeekChar();
if (pickChar != -1 && ((char)pickChar).Equals('\n'))
nextChar = _binReader.Read();
break;
}
else
line += current;
nextChar = _binReader.Read();
}
return line;
}
/// <summary>
/// Reads some lines of characters from the current stream at the current position and returns the data as a collection of string.
/// </summary>
/// <param name="totalLines">The total number of lines to read (set as 0 to read from current position to end of file).</param>
/// <returns>The next lines from the input stream, or an empty collection if the end of the input stream is reached</returns>
public List<string> ReadLines(int totalLines)
{
if (totalLines < 1 && this.Position == 0)
return this.ReadAllLines();
_lines = new List<string>();
int counter = 0;
string line = this.ReadLine();
while (line != null)
{
_lines.Add(line);
counter++;
if (totalLines > 0 && counter >= totalLines)
break;
line = this.ReadLine();
}
return _lines;
}
/// <summary>
/// Reads all lines of characters from the current stream (from the begin to end) and returns the data as a collection of string.
/// </summary>
/// <returns>All lines of the file (read from the beginning), or an empty collection if the file is empty</returns>
public List<string> ReadAllLines()
{
if (_streamReader == null)
_streamReader = new StreamReader(_fileStream);
_streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
_lines = new List<string>();
string line = _streamReader.ReadLine();
while (line != null)
{
_lines.Add(line);
line = _streamReader.ReadLine();
}
return _lines;
}
/// <summary>
/// Gets the length of text file (in bytes).
/// </summary>
public long Length
{
get { return _length; }
}
/// <summary>
/// Gets or sets the current reading position.
/// </summary>
public long Position
{
get
{
if (_binReader == null)
return -1;
else
return _binReader.BaseStream.Position;
}
set
{
if (_binReader == null)
return;
else if (value >= this.Length)
this.SetPosition(this.Length);
else
this.SetPosition(value);
}
}
void SetPosition(long position)
{
_binReader.BaseStream.Seek(position, SeekOrigin.Begin);
}
/// <summary>
/// Gets the lines after reading.
/// </summary>
public List<string> Lines
{
get
{
return _lines;
}
}
/// <summary>
/// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
/// </summary>
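public void Dispose()
{
// BinaryReader and StreamReader both wrap _fileStream; disposing a stream more than once is harmless.
_binReader?.Dispose();
_streamReader?.Dispose();
_fileStream?.Dispose();
// No finalizer is needed: this class holds no unmanaged handles directly (FileStream releases its own),
// and touching other managed objects from a finalizer is unsafe because they may already be finalized.
}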
}
This is a really tough issue.
After a long and exhausting survey of different solutions on the internet (including the solutions in this thread, thank you!), I had to reinvent the wheel.
I had the following requirements:
Performance - reading must be very fast, so reading one char at a time or using reflection is not acceptable; buffering is required
Streaming - the file can be huge, so reading it into memory entirely is not acceptable
Tailing - file tailing should be available
Long lines - lines can be very long, so the buffer size can't be fixed
Stable - even a single-byte error was immediately visible during usage. Unfortunately, several implementations I found had stability problems
public class OffsetStreamReader
{
private const int InitialBufferSize = 4096;
private readonly char _bom;
private readonly byte _end;
private readonly Encoding _encoding;
private readonly Stream _stream;
private readonly bool _tail;
private byte[] _buffer;
private int _processedInBuffer;
private int _informationInBuffer;
public OffsetStreamReader(Stream stream, bool tail)
{
_buffer = new byte[InitialBufferSize];
_processedInBuffer = InitialBufferSize;
if (stream == null || !stream.CanRead)
throw new ArgumentException("Stream must be non-null and readable.", nameof(stream));
_stream = stream;
_tail = tail;
_encoding = Encoding.UTF8;
_bom = '\uFEFF';
_end = _encoding.GetBytes(new [] {'\n'})[0];
}
public long Offset { get; private set; }
public string ReadLine()
{
// Underlying stream closed
if (!_stream.CanRead)
return null;
// EOF
if (_processedInBuffer == _informationInBuffer)
{
if (_tail)
{
_processedInBuffer = _buffer.Length;
_informationInBuffer = 0;
ReadBuffer();
}
return null;
}
var lineEnd = Search(_buffer, _end, _processedInBuffer);
var haveEnd = true;
// File ended but no finalizing newline character
if (lineEnd.HasValue == false && _informationInBuffer + _processedInBuffer < _buffer.Length)
{
if (_tail)
return null;
else
{
lineEnd = _informationInBuffer;
haveEnd = false;
}
}
// No end in current buffer
if (!lineEnd.HasValue)
{
ReadBuffer();
if (_informationInBuffer != 0)
return ReadLine();
return null;
}
var arr = new byte[lineEnd.Value - _processedInBuffer];
Array.Copy(_buffer, _processedInBuffer, arr, 0, arr.Length);
Offset = Offset + lineEnd.Value - _processedInBuffer + (haveEnd ? 1 : 0);
_processedInBuffer = lineEnd.Value + (haveEnd ? 1 : 0);
return _encoding.GetString(arr).TrimStart(_bom).TrimEnd('\r', '\n');
}
private void ReadBuffer()
{
var notProcessedPartLength = _buffer.Length - _processedInBuffer;
// Extend buffer to be able to fit whole line to the buffer
// Was [NOT_PROCESSED]
// Become [NOT_PROCESSED ]
if (notProcessedPartLength == _buffer.Length)
{
var extendedBuffer = new byte[_buffer.Length + _buffer.Length/2];
Array.Copy(_buffer, extendedBuffer, _buffer.Length);
_buffer = extendedBuffer;
}
// Copy the unprocessed data to the beginning
// Was [PROCESSED NOT_PROCESSED]
// Become [NOT_PROCESSED ]
Array.Copy(_buffer, (long) _processedInBuffer, _buffer, 0, notProcessedPartLength);
// Read more information to the empty part of buffer
// Was [ NOT_PROCESSED ]
// Become [ NOT_PROCESSED NEW_NOT_PROCESSED ]
_informationInBuffer = notProcessedPartLength + _stream.Read(_buffer, notProcessedPartLength, _buffer.Length - notProcessedPartLength);
_processedInBuffer = 0;
}
private int? Search(byte[] buffer, byte byteToSearch, int bufferOffset)
{
for (int i = bufferOffset; i < buffer.Length - 1; i++)
{
if (buffer[i] == byteToSearch)
return i;
}
return null;
}
}
Though Thomas Levesque's solution works well, here's mine. It uses reflection, so it will be slower, but it's encoding-independent. I also added a Seek extension.
/// <summary>Useful <see cref="StreamReader"/> extensions.</summary>
public static class StreamReaderExtentions
{
/// <summary>Gets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
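/// <remarks><para>This method is quite slow. It uses reflection to access private <see cref="StreamReader"/> fields. Don't use it too often.</para><para>The private field names are implementation details of the .NET Framework <see cref="StreamReader"/> and may differ on other runtimes.</para></remarks>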
/// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
/// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
/// <returns>The current position of this stream.</returns>
public static long GetPosition(this StreamReader streamReader)
{
if (streamReader == null)
throw new ArgumentNullException("streamReader");
var charBuffer = (char[])streamReader.GetType().InvokeMember("charBuffer", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var charPos = (int)streamReader.GetType().InvokeMember("charPos", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var charLen = (int)streamReader.GetType().InvokeMember("charLen", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var offsetLength = streamReader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);
return streamReader.BaseStream.Position - offsetLength;
}
/// <summary>Sets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
/// <remarks>
/// <para><see cref="StreamReader.BaseStream"/> should be seekable.</para>
/// <para>This method is quite slow. It uses reflection and discards the buffered data of the <see cref="StreamReader"/>. Don't use it too often.</para>
/// </remarks>
/// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
/// <param name="position">The point relative to origin from which to begin seeking.</param>
/// <param name="origin">Specifies the beginning, the end, or the current position as a reference point for origin, using a value of type <see cref="SeekOrigin"/>. </param>
/// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
/// <exception cref="ArgumentException">Occurs when <see cref="StreamReader.BaseStream"/> is not seekable.</exception>
/// <returns>The new position in the stream. This position can differ from <paramref name="position"/> because of the preamble.</returns>
public static long Seek(this StreamReader streamReader, long position, SeekOrigin origin)
{
if (streamReader == null)
throw new ArgumentNullException("streamReader");
if (!streamReader.BaseStream.CanSeek)
throw new ArgumentException("Underlying stream should be seekable.", "streamReader");
var preamble = (byte[])streamReader.GetType().InvokeMember("_preamble", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
if (preamble.Length > 0 && position < preamble.Length) // preamble or BOM must be skipped
position += preamble.Length;
var newPosition = streamReader.BaseStream.Seek(position, origin); // seek
streamReader.DiscardBufferedData(); // this updates the buffer
return newPosition;
}
}
Would this work?
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = 0;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos += line.Length;
}
}
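As written, this undercounts: ReadLine strips the line terminator, so line.Length misses the 1 or 2 bytes of \n or \r\n, and it counts characters rather than bytes, which diverges for non-ASCII UTF-8 text. If the sample file uses CRLF endings, it would report 0, 3, 6, 9, 12 instead of 0, 5, 10, 15, 20. A minimal sketch that avoids both problems by scanning raw bytes for '\n', assuming UTF-8 (or ASCII) input; ByteOffsetLines is a hypothetical name, and a UTF-8 BOM, if present, is not stripped from the first line:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ByteOffsetLines
{
    // Yields (Offset, Line) pairs, where Offset is the byte position of the line's first byte.
    // Assumes UTF-8 or ASCII input; a trailing '\r' is removed so "\r\n" endings work too.
    public static IEnumerable<(long Offset, string Line)> Read(Stream stream, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        var current = new MemoryStream(); // bytes of the line being assembled (may span chunks)
        long chunkStart = 0;              // absolute offset of buffer[0]
        long lineStart = 0;               // absolute offset where the current line began
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int segmentStart = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != (byte)'\n')
                    continue;
                current.Write(buffer, segmentStart, i - segmentStart);
                yield return (lineStart, Decode(current));
                current.SetLength(0);            // also moves the write position back to 0
                segmentStart = i + 1;
                lineStart = chunkStart + i + 1;  // next line starts right after '\n'
            }
            current.Write(buffer, segmentStart, read - segmentStart);
            chunkStart += read;
        }
        if (current.Length > 0)
            yield return (lineStart, Decode(current)); // last line had no terminator
    }

    private static string Decode(MemoryStream bytes)
    {
        string s = Encoding.UTF8.GetString(bytes.GetBuffer(), 0, (int)bytes.Length);
        return s.EndsWith("\r", StringComparison.Ordinal) ? s.Substring(0, s.Length - 1) : s;
    }
}
```

Splitting on the byte 0x0A is safe for UTF-8 because that byte never occurs inside a multi-byte sequence; the same shortcut would not be valid for UTF-16. Bytes are decoded only once a whole line has been collected, so a character split across two reads is not corrupted.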
