Reading text files line by line, with exact offset/position reporting - c#

My simple requirement: read a huge (more than a million lines) text file (for this example, assume it's a CSV of some sort) and keep a reference to the beginning of each line for faster lookup in the future (read a line, starting at X).
I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately that doesn't work as I intended:
Given a file containing the following
Foo
Bar
Baz
Bla
Fasel
and this very simple code
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = sr.BaseStream.Position;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos = sr.BaseStream.Position;
}
}
the output is:
000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel
I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is necessary. For my purpose that is a problem, because BaseStream.Position then reports how far the file has been read into the buffer, not where the current line starts.
The question, finally: is there any way to get the (byte, char) offset while reading a file line by line without using a basic Stream and messing with \r, \n, \r\n, string encoding etc. manually? Not a big deal, really, I just don't like to build things that might exist already.

You could create a TextReader wrapper, which would track the current position in the base TextReader :
public class TrackingTextReader : TextReader
{
private TextReader _baseReader;
private int _position;
public TrackingTextReader(TextReader baseReader)
{
_baseReader = baseReader;
}
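public override int Read()
{
int c = _baseReader.Read();
// Count only characters actually consumed, not the -1 returned at end of input
if (c != -1)
_position++;
return c;
}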
public override int Peek()
{
return _baseReader.Peek();
}
public int Position
{
get { return _position; }
}
}
You could then use it as follows :
string text = @"Foo
Bar
Baz
Bla
Fasel";
using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
string line;
while ((line = trackingReader.ReadLine()) != null)
{
Console.WriteLine("{0:d3} {1}", trackingReader.Position, line);
}
}
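Note that Position here counts characters handed out by the reader, not bytes in the underlying file. The two agree only while every character encodes to a single byte. As a minimal sketch (the names OffsetMath and ByteOffsetOf are made up for illustration), a character index can be converted to a byte offset with Encoding.GetByteCount, assuming the text before the index is available and the index does not split a surrogate pair:

```csharp
using System;
using System.Text;

public static class OffsetMath
{
    // Hypothetical helper: number of bytes that the first charIndex
    // characters of text occupy in the given encoding (no BOM included).
    public static long ByteOffsetOf(string text, int charIndex, Encoding encoding)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (encoding == null) throw new ArgumentNullException("encoding");
        if (charIndex < 0 || charIndex > text.Length)
            throw new ArgumentOutOfRangeException("charIndex");

        // Encode only the prefix and count the resulting bytes.
        return encoding.GetByteCount(text.ToCharArray(0, charIndex));
    }
}
```

For "Foo\r\nBar" the second line starts at character 5 and at byte 5 in UTF-8, while for "é\nBar" it starts at character 2 but at byte 3, because "é" takes two bytes.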

After a lot of searching, testing and some crazy experiments, here is the code I came up with (I'm currently using it in my product).
public sealed class TextFileReader : IDisposable
{
FileStream _fileStream = null;
BinaryReader _binReader = null;
StreamReader _streamReader = null;
List<string> _lines = null;
long _length = -1;
/// <summary>
/// Initializes a new instance of the <see cref="TextFileReader"/> class with default encoding (UTF8).
/// </summary>
/// <param name="filePath">The path to text file.</param>
public TextFileReader(string filePath) : this(filePath, Encoding.UTF8) { }
/// <summary>
/// Initializes a new instance of the <see cref="TextFileReader"/> class.
/// </summary>
/// <param name="filePath">The path to text file.</param>
/// <param name="encoding">The encoding of text file.</param>
public TextFileReader(string filePath, Encoding encoding)
{
if (!File.Exists(filePath))
throw new FileNotFoundException("File (" + filePath + ") is not found.");
_fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
_length = _fileStream.Length;
_binReader = new BinaryReader(_fileStream, encoding);
}
/// <summary>
/// Reads a line of characters from the current stream at the current position and returns the data as a string.
/// </summary>
/// <returns>The next line from the input stream, or null if the end of the input stream is reached</returns>
public string ReadLine()
{
if (_binReader.PeekChar() == -1)
return null;
string line = "";
int nextChar = _binReader.Read();
while (nextChar != -1)
{
char current = (char)nextChar;
if (current.Equals('\n'))
break;
else if (current.Equals('\r'))
{
int pickChar = _binReader.PeekChar();
if (pickChar != -1 && ((char)pickChar).Equals('\n'))
nextChar = _binReader.Read();
break;
}
else
line += current;
nextChar = _binReader.Read();
}
return line;
}
/// <summary>
/// Reads some lines of characters from the current stream at the current position and returns the data as a collection of string.
/// </summary>
/// <param name="totalLines">The total number of lines to read (set as 0 to read from current position to end of file).</param>
/// <returns>The next lines from the input stream, or an empty collection if the end of the input stream is reached</returns>
public List<string> ReadLines(int totalLines)
{
if (totalLines < 1 && this.Position == 0)
return this.ReadAllLines();
_lines = new List<string>();
int counter = 0;
string line = this.ReadLine();
while (line != null)
{
_lines.Add(line);
counter++;
if (totalLines > 0 && counter >= totalLines)
break;
line = this.ReadLine();
}
return _lines;
}
/// <summary>
/// Reads all lines of characters from the current stream (from the begin to end) and returns the data as a collection of string.
/// </summary>
/// <returns>All lines from the input stream, or an empty collection if the stream is empty</returns>
public List<string> ReadAllLines()
{
if (_streamReader == null)
_streamReader = new StreamReader(_fileStream);
_streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
_lines = new List<string>();
string line = _streamReader.ReadLine();
while (line != null)
{
_lines.Add(line);
line = _streamReader.ReadLine();
}
return _lines;
}
/// <summary>
/// Gets the length of text file (in bytes).
/// </summary>
public long Length
{
get { return _length; }
}
/// <summary>
/// Gets or sets the current reading position.
/// </summary>
public long Position
{
get
{
if (_binReader == null)
return -1;
else
return _binReader.BaseStream.Position;
}
set
{
if (_binReader == null)
return;
else if (value >= this.Length)
this.SetPosition(this.Length);
else
this.SetPosition(value);
}
}
void SetPosition(long position)
{
_binReader.BaseStream.Seek(position, SeekOrigin.Begin);
}
/// <summary>
/// Gets the lines after reading.
/// </summary>
public List<string> Lines
{
get
{
return _lines;
}
}
/// <summary>
/// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
/// </summary>
public void Dispose()
{
if (_binReader != null)
_binReader.Close();
if (_streamReader != null)
{
_streamReader.Close();
_streamReader.Dispose();
}
if (_fileStream != null)
{
_fileStream.Close();
_fileStream.Dispose();
}
}
~TextFileReader()
{
this.Dispose();
}
}

This is a really tough issue.
After a long and exhausting survey of different solutions on the internet (including solutions from this thread, thank you!) I had to reinvent the wheel.
I had the following requirements:
Performance - reading must be very fast, so reading one char at a time or using reflection is not acceptable; buffering is required
Streaming - the file can be huge, so reading it into memory entirely is not acceptable
Tailing - file tailing should be available
Long lines - lines can be very long, so the buffer size can't be fixed
Stable - a single-byte error was immediately visible during usage. Unfortunately, several implementations I found had stability problems
public class OffsetStreamReader
{
private const int InitialBufferSize = 4096;
private readonly char _bom;
private readonly byte _end;
private readonly Encoding _encoding;
private readonly Stream _stream;
private readonly bool _tail;
private byte[] _buffer;
private int _processedInBuffer;
private int _informationInBuffer;
public OffsetStreamReader(Stream stream, bool tail)
{
_buffer = new byte[InitialBufferSize];
_processedInBuffer = InitialBufferSize;
if (stream == null || !stream.CanRead)
throw new ArgumentException("stream");
_stream = stream;
_tail = tail;
_encoding = Encoding.UTF8;
_bom = '\uFEFF';
_end = _encoding.GetBytes(new [] {'\n'})[0];
}
public long Offset { get; private set; }
public string ReadLine()
{
// Underlying stream closed
if (!_stream.CanRead)
return null;
// EOF
if (_processedInBuffer == _informationInBuffer)
{
if (_tail)
{
_processedInBuffer = _buffer.Length;
_informationInBuffer = 0;
ReadBuffer();
}
return null;
}
var lineEnd = Search(_buffer, _end, _processedInBuffer);
var haveEnd = true;
// File ended but no finalizing newline character
if (lineEnd.HasValue == false && _informationInBuffer + _processedInBuffer < _buffer.Length)
{
if (_tail)
return null;
else
{
lineEnd = _informationInBuffer;
haveEnd = false;
}
}
// No end in current buffer
if (!lineEnd.HasValue)
{
ReadBuffer();
if (_informationInBuffer != 0)
return ReadLine();
return null;
}
var arr = new byte[lineEnd.Value - _processedInBuffer];
Array.Copy(_buffer, _processedInBuffer, arr, 0, arr.Length);
Offset = Offset + lineEnd.Value - _processedInBuffer + (haveEnd ? 1 : 0);
_processedInBuffer = lineEnd.Value + (haveEnd ? 1 : 0);
return _encoding.GetString(arr).TrimStart(_bom).TrimEnd('\r', '\n');
}
private void ReadBuffer()
{
var notProcessedPartLength = _buffer.Length - _processedInBuffer;
// Extend buffer to be able to fit whole line to the buffer
// Was [NOT_PROCESSED]
// Become [NOT_PROCESSED ]
if (notProcessedPartLength == _buffer.Length)
{
var extendedBuffer = new byte[_buffer.Length + _buffer.Length/2];
Array.Copy(_buffer, extendedBuffer, _buffer.Length);
_buffer = extendedBuffer;
}
// Copy not processed information to the beginning
// Was [PROCESSED NOT_PROCESSED]
// Become [NOT_PROCESSED ]
Array.Copy(_buffer, (long) _processedInBuffer, _buffer, 0, notProcessedPartLength);
// Read more information to the empty part of buffer
// Was [ NOT_PROCESSED ]
// Become [ NOT_PROCESSED NEW_NOT_PROCESSED ]
_informationInBuffer = notProcessedPartLength + _stream.Read(_buffer, notProcessedPartLength, _buffer.Length - notProcessedPartLength);
_processedInBuffer = 0;
}
private int? Search(byte[] buffer, byte byteToSearch, int bufferOffset)
{
for (int i = bufferOffset; i < buffer.Length - 1; i++)
{
if (buffer[i] == byteToSearch)
return i;
}
return null;
}
}

Though Thomas Levesque's solution works well, here's mine. It uses reflection so it will be slower, but it's encoding-independent. Plus I added Seek extension too.
/// <summary>Useful <see cref="StreamReader"/> extensions.</summary>
public static class StreamReaderExtentions
{
/// <summary>Gets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
/// <remarks><para>This method is quite slow. It uses reflection to access private <see cref="StreamReader"/> fields. Don't use it too often.</para></remarks>
/// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
/// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
/// <returns>The current position of this stream.</returns>
public static long GetPosition(this StreamReader streamReader)
{
if (streamReader == null)
throw new ArgumentNullException("streamReader");
var charBuffer = (char[])streamReader.GetType().InvokeMember("charBuffer", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var charPos = (int)streamReader.GetType().InvokeMember("charPos", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var charLen = (int)streamReader.GetType().InvokeMember("charLen", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
var offsetLength = streamReader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);
return streamReader.BaseStream.Position - offsetLength;
}
/// <summary>Sets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
/// <remarks>
/// <para><see cref="StreamReader.BaseStream"/> should be seekable.</para>
/// <para>This method is quite slow. It uses reflection and flushes the charBuffer of the <see cref="StreamReader.BaseStream"/>. Don't use it too often.</para>
/// </remarks>
/// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
/// <param name="position">The point relative to origin from which to begin seeking.</param>
/// <param name="origin">Specifies the beginning, the end, or the current position as a reference point for origin, using a value of type <see cref="SeekOrigin"/>. </param>
/// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
/// <exception cref="ArgumentException">Occurs when <see cref="StreamReader.BaseStream"/> is not seekable.</exception>
/// <returns>The new position in the stream. This position can be different to the <see cref="position"/> because of the preamble.</returns>
public static long Seek(this StreamReader streamReader, long position, SeekOrigin origin)
{
if (streamReader == null)
throw new ArgumentNullException("streamReader");
if (!streamReader.BaseStream.CanSeek)
throw new ArgumentException("Underlying stream should be seekable.", "streamReader");
var preamble = (byte[])streamReader.GetType().InvokeMember("_preamble", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
if (preamble.Length > 0 && position < preamble.Length) // preamble or BOM must be skipped
position += preamble.Length;
var newPosition = streamReader.BaseStream.Seek(position, origin); // seek
streamReader.DiscardBufferedData(); // this updates the buffer
return newPosition;
}
}

Would this work:
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = 0;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos += line.Length;
}
}
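Adding line.Length counts characters and leaves out the line terminator, so the running total drifts away from the real file position. One possible way to get exact offsets, sketched below under a few assumptions, is to scan the raw bytes for '\n' and record where each line starts. The names LineIndex, Build and ReadLineAt are made up for illustration, and the sketch assumes an encoding in which the byte 0x0A only ever means a line feed (ASCII, UTF-8, most single-byte code pages) and lines that end in "\n" or "\r\n":

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Scans the stream once, from its current position to the end, and
    // returns the byte offset at which each line starts.
    public static List<long> Build(Stream stream, int bufferSize = 64 * 1024)
    {
        var offsets = new List<long>();
        var buffer = new byte[bufferSize];
        long position = 0;        // bytes consumed before the current buffer
        bool atLineStart = true;  // the next byte seen begins a new line
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (atLineStart)
                {
                    offsets.Add(position + i);
                    atLineStart = false;
                }
                if (buffer[i] == (byte)'\n')
                    atLineStart = true;
            }
            position += read;
        }
        return offsets;
    }

    // Seeks to a recorded offset and decodes exactly one line from there.
    public static string ReadLineAt(Stream stream, long offset, Encoding encoding)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        // leaveOpen keeps the caller's stream usable for further lookups.
        using (var reader = new StreamReader(stream, encoding, false, 1024, leaveOpen: true))
            return reader.ReadLine();
    }
}
```

For the five-line sample file from the question (CRLF line endings, no trailing newline), Build yields 0, 5, 10, 15 and 20, and ReadLineAt with offset 10 returns "Baz". The index costs one long per line, so a file with a few million lines stays in the tens of megabytes.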

Related

MultipartFormData File Uploading out of memory exception

I am using this code to upload a file:
https://gist.github.com/bgrins/1789787
But when I try to use this code to upload a 2 GB file, I get an out of memory exception, caused by this line:
https://gist.github.com/bgrins/1789787#file-gistfile1-cs-L75
So how can I fix this issue?
Read the giant file piece by piece, and upload the pieces one by one. You could also provide a progress bar.
To read the file piece by piece: How to read a big file piece by piece in C#
On the server side, append each new piece to a file: C# Append byte array to existing file
You can flesh out the code from this idea. I did it once last year, but cannot share the code.
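The piece-by-piece reading can be sketched as follows. This is only an illustration of the idea, not the code the answer refers to: ChunkReader and ReadChunks are hypothetical names, chunkSize is assumed to be positive, and the actual upload call for each piece is left out because it depends on the server's API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkReader
{
    // Yields the stream's content in pieces of at most chunkSize bytes,
    // so only one piece has to be held in memory at a time.
    public static IEnumerable<byte[]> ReadChunks(Stream source, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        while (true)
        {
            // Stream.Read may return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int n = source.Read(buffer, filled, chunkSize - filled);
                if (n == 0) break;
                filled += n;
            }
            if (filled == 0) yield break;

            var chunk = new byte[filled];
            Array.Copy(buffer, chunk, filled);
            yield return chunk;

            // A short chunk means the end of the stream was reached.
            if (filled < chunkSize) yield break;
        }
    }
}
```

A caller would loop over ReadChunks(fileStream, someChunkSize), send each piece, and advance a progress bar by chunk.Length after every successful request.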
There is more than one solution:
1- Writing to RequestStream directly instead of writing to MemoryStream :
https://blogs.msdn.microsoft.com/johan/2006/11/15/are-you-getting-outofmemoryexceptions-when-uploading-large-files/
public static string MyUploader(string strFileToUpload, string strUrl)
{
string strFileFormName = "file";
Uri oUri = new Uri(strUrl);
string strBoundary = "----------" + DateTime.Now.Ticks.ToString("x");
// The trailing boundary string. The closing delimiter of a multipart body
// ends with two extra dashes after the boundary.
byte[] boundaryBytes = Encoding.ASCII.GetBytes("\r\n--" + strBoundary + "--\r\n");
// The post message header
StringBuilder sb = new StringBuilder();
sb.Append("--");
sb.Append(strBoundary);
sb.Append("\r\n");
sb.Append("Content-Disposition: form-data; name=\"");
sb.Append(strFileFormName);
sb.Append("\"; filename=\"");
sb.Append(Path.GetFileName(strFileToUpload));
sb.Append("\"");
sb.Append("\r\n");
sb.Append("Content-Type: ");
sb.Append("application/octet-stream");
sb.Append("\r\n");
sb.Append("\r\n");
string strPostHeader = sb.ToString();
byte[] postHeaderBytes = Encoding.UTF8.GetBytes(strPostHeader);
// The WebRequest
HttpWebRequest oWebrequest = (HttpWebRequest)WebRequest.Create(oUri);
oWebrequest.ContentType = "multipart/form-data; boundary=" + strBoundary;
oWebrequest.Method = "POST";
// This is important, otherwise the whole file will be read to memory anyway...
oWebrequest.AllowWriteStreamBuffering = false;
// Get a FileStream and set the final properties of the WebRequest
FileStream oFileStream = new FileStream(strFileToUpload, FileMode.Open, FileAccess.Read);
long length = postHeaderBytes.Length + oFileStream.Length + boundaryBytes.Length;
oWebrequest.ContentLength = length;
Stream oRequestStream = oWebrequest.GetRequestStream();
// Write the post header
oRequestStream.Write(postHeaderBytes, 0, postHeaderBytes.Length);
// Stream the file contents in small pieces (4096 bytes, max).
// Compare as long first: casting a length of 2 GB or more straight to int would overflow.
byte[] buffer = new byte[(int)Math.Min(4096L, oFileStream.Length)];
int bytesRead = 0;
while ((bytesRead = oFileStream.Read(buffer, 0, buffer.Length)) != 0)
oRequestStream.Write(buffer, 0, bytesRead);
oFileStream.Close();
// Add the trailing boundary
oRequestStream.Write(boundaryBytes, 0, boundaryBytes.Length);
WebResponse oWResponse = oWebrequest.GetResponse();
Stream s = oWResponse.GetResponseStream();
StreamReader sr = new StreamReader(s);
String sReturnString = sr.ReadToEnd();
// Clean up
oFileStream.Close();
oRequestStream.Close();
s.Close();
sr.Close();
return sReturnString;
}
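The framing used above can also be isolated into a small helper that writes to any Stream, which makes it possible to check the byte count that goes into ContentLength without a network connection. This is a sketch under assumptions: MultipartWriter and WriteFilePart are hypothetical names, a single file part is sent, and the body ends with the closing delimiter "--boundary--" that RFC 2046 defines for multipart bodies.

```csharp
using System;
using System.IO;
using System.Text;

public static class MultipartWriter
{
    // Writes one file part followed by the closing delimiter and returns
    // the total number of bytes written (the value for Content-Length).
    public static long WriteFilePart(Stream output, Stream file, string boundary,
                                     string fieldName, string fileName)
    {
        byte[] header = Encoding.UTF8.GetBytes(
            "--" + boundary + "\r\n" +
            "Content-Disposition: form-data; name=\"" + fieldName +
            "\"; filename=\"" + fileName + "\"\r\n" +
            "Content-Type: application/octet-stream\r\n\r\n");
        byte[] trailer = Encoding.ASCII.GetBytes("\r\n--" + boundary + "--\r\n");

        long total = 0;
        output.Write(header, 0, header.Length);
        total += header.Length;

        // Copy the file through a small fixed buffer instead of loading it whole.
        var buffer = new byte[4096];
        int read;
        while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }

        output.Write(trailer, 0, trailer.Length);
        total += trailer.Length;
        return total;
    }
}
```

Because the helper only sees Stream, the same code can target a MemoryStream in a test and the request stream in production.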
2- Using RecyclableMemoryStream instead of MemoryStream solution
You can read more about RecyclableMemoryStream here :
http://www.philosophicalgeek.com/2015/02/06/announcing-microsoft-io-recycablememorystream/
https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream
3- Using MemoryTributary instead of MemoryStream
You can read more about MemoryTributary here :
https://www.codeproject.com/Articles/348590/A-replacement-for-MemoryStream?msg=5257615#xx5257615xx
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
namespace LiquidEngine.Tools
{
/// <summary>
/// MemoryTributary is a re-implementation of MemoryStream that uses a dynamic list of byte arrays as a backing store, instead of a single byte array, the allocation
/// of which will fail for relatively small streams as it requires contiguous memory.
/// </summary>
public class MemoryTributary : Stream /* http://msdn.microsoft.com/en-us/library/system.io.stream.aspx */
{
#region Constructors
public MemoryTributary()
{
Position = 0;
}
public MemoryTributary(byte[] source)
{
this.Write(source, 0, source.Length);
Position = 0;
}
/* length is ignored because capacity has no meaning unless we implement an artificial limit */
public MemoryTributary(int length)
{
SetLength(length);
Position = length;
byte[] d = block; //access block to prompt the allocation of memory
Position = 0;
}
#endregion
#region Status Properties
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return true; }
}
#endregion
#region Public Properties
public override long Length
{
get { return length; }
}
public override long Position { get; set; }
#endregion
#region Members
protected long length = 0;
protected long blockSize = 65536;
protected List<byte[]> blocks = new List<byte[]>();
#endregion
#region Internal Properties
/* Use these properties to gain access to the appropriate block of memory for the current Position */
/// <summary>
/// The block of memory currently addressed by Position
/// </summary>
protected byte[] block
{
get
{
while (blocks.Count <= blockId)
blocks.Add(new byte[blockSize]);
return blocks[(int)blockId];
}
}
/// <summary>
/// The id of the block currently addressed by Position
/// </summary>
protected long blockId
{
get { return Position / blockSize; }
}
/// <summary>
/// The offset of the byte currently addressed by Position, into the block that contains it
/// </summary>
protected long blockOffset
{
get { return Position % blockSize; }
}
#endregion
#region Public Stream Methods
public override void Flush()
{
}
public override int Read(byte[] buffer, int offset, int count)
{
long lcount = (long)count;
if (lcount < 0)
{
throw new ArgumentOutOfRangeException("count", lcount, "Number of bytes to copy cannot be negative.");
}
long remaining = (length - Position);
if (lcount > remaining)
lcount = remaining;
if (buffer == null)
{
throw new ArgumentNullException("buffer", "Buffer cannot be null.");
}
if (offset < 0)
{
throw new ArgumentOutOfRangeException("offset",offset,"Destination offset cannot be negative.");
}
int read = 0;
long copysize = 0;
do
{
copysize = Math.Min(lcount, (blockSize - blockOffset));
Buffer.BlockCopy(block, (int)blockOffset, buffer, offset, (int)copysize);
lcount -= copysize;
offset += (int)copysize;
read += (int)copysize;
Position += copysize;
} while (lcount > 0);
return read;
}
public override long Seek(long offset, SeekOrigin origin)
{
switch (origin)
{
case SeekOrigin.Begin:
Position = offset;
break;
case SeekOrigin.Current:
Position += offset;
break;
case SeekOrigin.End:
// Offsets are relative to the origin, so seeking back from the end uses a negative offset
Position = Length + offset;
break;
}
return Position;
}
public override void SetLength(long value)
{
length = value;
}
public override void Write(byte[] buffer, int offset, int count)
{
long initialPosition = Position;
int copysize;
try
{
do
{
copysize = Math.Min(count, (int)(blockSize - blockOffset));
EnsureCapacity(Position + copysize);
Buffer.BlockCopy(buffer, (int)offset, block, (int)blockOffset, copysize);
count -= copysize;
offset += copysize;
Position += copysize;
} while (count > 0);
}
catch (Exception)
{
Position = initialPosition;
// Rethrow with "throw;" so the original stack trace is preserved
throw;
}
}
public override int ReadByte()
{
if (Position >= length)
return -1;
byte b = block[blockOffset];
Position++;
return b;
}
public override void WriteByte(byte value)
{
EnsureCapacity(Position + 1);
block[blockOffset] = value;
Position++;
}
protected void EnsureCapacity(long intended_length)
{
if (intended_length > length)
length = (intended_length);
}
#endregion
#region IDispose
/* http://msdn.microsoft.com/en-us/library/fs2xkftw.aspx */
protected override void Dispose(bool disposing)
{
/* We do not currently use unmanaged resources */
base.Dispose(disposing);
}
#endregion
#region Public Additional Helper Methods
/// <summary>
/// Returns the entire content of the stream as a byte array. This is not safe because the call to new byte[] may
/// fail if the stream is large enough. Where possible use methods which operate on streams directly instead.
/// </summary>
/// <returns>A byte[] containing the current data in the stream</returns>
public byte[] ToArray()
{
long firstposition = Position;
Position = 0;
byte[] destination = new byte[Length];
Read(destination, 0, (int)Length);
Position = firstposition;
return destination;
}
/// <summary>
/// Reads length bytes from source into this instance at the current position.
/// </summary>
/// <param name="source">The stream containing the data to copy</param>
/// <param name="length">The number of bytes to copy</param>
public void ReadFrom(Stream source, long length)
{
byte[] buffer = new byte[4096];
int read;
do
{
read = source.Read(buffer, 0, (int)Math.Min(4096, length));
length -= read;
this.Write(buffer, 0, read);
} while (length > 0);
}
/// <summary>
/// Writes the entire stream into destination, regardless of Position, which remains unchanged.
/// </summary>
/// <param name="destination">The stream to write the content of this stream to</param>
public void WriteTo(Stream destination)
{
long initialpos = Position;
Position = 0;
this.CopyTo(destination);
Position = initialpos;
}
#endregion
}
}

"Must be already floating point" when converting an audio file into wav file

Here I have code for converting an audio file into WAV format for better quality and reduced file size. I am using the NAudio file compression source code, and I get an exception when I try to convert the file:
Must be already floating point
public string ConvertToWAV(string tempFilePath, string tempFileName, string audioType)
{
//Try to transform the file, if it fails use the original file
FileInfo fileInfo = new FileInfo(tempFilePath + tempFileName);
byte[] fileData = new byte[fileInfo.Length];
fileData = File.ReadAllBytes(tempFilePath + tempFileName);
ISampleProvider sampleProvider;
try
{
if (audioType.ToLower().Contains("wav"))
{
try
{
using (MemoryStream wav = new MemoryStream(fileData))
{
WaveStream stream = new WaveFileReader(wav);
WaveFormat target = new WaveFormat();
var s = new RawSourceWaveStream(new MemoryStream(), new WaveFormat(8000, 16, 1));
var c = new WaveFormatConversionStream(WaveFormat.CreateALawFormat(8000, 1), s);
sampleProvider = new WaveToSampleProvider(c);
WaveFileWriter.CreateWaveFile16(tempFilePath + tempFileName, sampleProvider);
wav.Close();
}
}
catch (Exception ex)
{
//We couldn't convert the file, continue with the original file.
}
}
}
catch (Exception)
{
// "throw;" keeps the original stack trace, unlike "throw ex;"
throw;
}
return Convert.ToBase64String(fileData);
}
There are a couple of problems with the code and the concept in general.
First, you're ignoring the WaveFormat of the input file. I'm guessing that you're assuming it's 8K, 16-bit, 1 channel based on the line where you create var s, but this is not a guarantee.
Second, you don't need a MemoryStream or RawSourceWaveStream. WaveFileReader is a WaveStream, and is suitable for any "next-stage" NAudio wave processor.
Third (and this is most likely your exception): The NAudio Wave processors and converters don't like A-Law (or u-Law) in the WaveFormat. A-Law (and u-Law) is technically not PCM data. As such, they're not the "wave" data that NAudio likes to play with.
Ok, with all that said, here are some suggestions. There are very particular A-Law and u-Law encoders in the NAudio.Codecs namespace. Oddly enough, they're named ALawEncoder and MuLawEncoder. These things are not stream-compatible, so we want to make them compatible.
I've added a class at the end here that does just that: it creates an IWaveProvider that actually produces a stream of A-Law or u-Law data. Here's the test code that makes use of the new class. The test code does the following:
Reads an input file using MediaFoundationReader (I like this one)
Converts whatever the input format is into 16-bit PCM (while keeping the channel count) using MediaFoundationResampler. Note that this means that your input file does not have to have same format as the A-law output, so it'll convert pretty much anything.
Feeds that new 16-bit PCM stream to the custom "ALaw-to-IWaveProvider" converter class.
Writes the IWaveProvider-compatible A-Law output to a wave file.
I use the MediaFoundation classes here because they don't seem to be as particular about wave formats as the ACM-based ones.
static void ConversionTest( string _outfilename, string _infilename )
{
try
{
using( var reader = new MediaFoundationReader(_infilename) )
{
// Create a wave format for 16-bit pcm at 8000 samples per second.
int channels = reader.WaveFormat.Channels;
int rate = 8000;
int rawsize = 2;
int blockalign = rawsize * channels; // this is the size of one sample.
int bytespersecond = rate * blockalign;
var midformat =
WaveFormat.CreateCustomFormat( WaveFormatEncoding.Pcm,
rate,
channels,
bytespersecond,
blockalign,
rawsize * 8 );
// And a conversion stream to turn input into 16-bit PCM.
var midstream = new MediaFoundationResampler(reader, midformat);
//var midstream = new WaveFormatConversionStream(midformat, reader);
// The output stream is our custom stream.
var outstream = new PcmToALawConversionStream(midstream);
WaveFileWriter.CreateWaveFile(_outfilename, outstream);
}
}
catch( Exception _ex )
{
}
}
And here's the class that converts 16-bit PCM into A-Law or u-Law. At the end are specializations for A-Law or u-Law:
/// <summary>
/// Encodes 16-bit PCM input into A- or u-Law, presenting the output
/// as an IWaveProvider.
/// </summary>
public class PcmToG711ConversionStream : IWaveProvider
{
/// <summary>Gets the local a-law or u-law format.</summary>
public WaveFormat WaveFormat { get { return waveFormat; } }
/// <summary>Returns <paramref name="count"/> encoded bytes.</summary>
/// <remarks>
/// Note that <paramref name="count"/> is raw bytes. It doesn't consider
/// channel counts, etc.
/// </remarks>
/// <param name="buffer">The output buffer.</param>
/// <param name="offset">The starting position in the output buffer.</param>
/// <param name="count">The number of bytes to read.</param>
/// <returns>The total number of bytes encoded into <paramref name="buffer"/>.</returns>
public int Read(byte[] buffer, int offset, int count)
{
// We'll need a source buffer, twice the size of 'count'.
int shortcount = count*2;
byte [] rawsource = new byte [shortcount];
int sourcecount = Provider.Read(rawsource, 0, shortcount);
int bytecount = sourcecount / 2;
for( int index = 0; index < bytecount; ++index )
{
short source = BitConverter.ToInt16(rawsource, index*2);
buffer[offset+index] = Encode(source);
}
return bytecount;
}
/// <summary>
/// Initializes an A-Law or u-Law "WaveStream". The source stream
/// must be 16-bit PCM!
/// </summary>
/// <param name="_encoding">ALaw or MuLaw only.</param>
/// <param name="_provider">The input PCM stream.</param>
public PcmToG711ConversionStream( WaveFormatEncoding _encoding,
IWaveProvider _provider )
{
Provider = _provider;
WaveFormat sourceformat = Provider.WaveFormat;
// Reject the input if either condition fails: it must be PCM and it must be 16-bit.
if( (sourceformat.Encoding != WaveFormatEncoding.Pcm) ||
(sourceformat.BitsPerSample != 16) )
{
throw new NotSupportedException("Input must be 16-bit PCM. Try using a conversion stream.");
}
if( _encoding == WaveFormatEncoding.ALaw )
{
Encode = this.EncodeALaw;
waveFormat = WaveFormat.CreateALawFormat( _provider.WaveFormat.SampleRate,
_provider.WaveFormat.Channels) ;
}
else if( _encoding == WaveFormatEncoding.MuLaw )
{
Encode = this.EncodeMuLaw;
waveFormat = WaveFormat.CreateMuLawFormat( _provider.WaveFormat.SampleRate,
_provider.WaveFormat.Channels) ;
}
else
{
throw new NotSupportedException("Encoding must be A-Law or u-Law");
}
}
/// <summary>The a-law or u-law encoder delegate.</summary>
EncodeHandler Encode;
/// <summary>a-law or u-law wave format.</summary>
WaveFormat waveFormat;
/// <summary>The input stream.</summary>
IWaveProvider Provider;
/// <summary>A-Law or u-Law encoder delegate.</summary>
/// <param name="_sample">The 16-bit PCM sample to encode.</param>
/// <returns>The encoded value.</returns>
delegate byte EncodeHandler( short _sample );
byte EncodeALaw( short _sample )
{
return ALawEncoder.LinearToALawSample(_sample);
}
byte EncodeMuLaw( short _sample )
{
return MuLawEncoder.LinearToMuLawSample(_sample);
}
}
public class PcmToALawConversionStream : PcmToG711ConversionStream
{
public PcmToALawConversionStream( IWaveProvider _provider )
: base(WaveFormatEncoding.ALaw, _provider)
{
}
}
public class PcmToMuLawConversionStream : PcmToG711ConversionStream
{
public PcmToMuLawConversionStream( IWaveProvider _provider )
: base(WaveFormatEncoding.MuLaw, _provider)
{
}
}
At last I found a solution for this issue: I needed to add an additional feature named Media Foundation for this to work on Windows Server 2012.
Use the Add Roles and Features wizard from the Server Manager. Skip through to Features and select Media Foundation.

C# Text File Reading ObjectDisposedException: The object was used after being disposed

I want to read the last line of a text file. I'm using a solution that's suggested here:
How to efficiently read only last line of the text file
Using that library, I'm getting an error saying the stream is disposed. But I'm confused, as I'm declaring the stream anew every frame.
FileStream fileStream = new FileStream("C:\\Users\\LukasRoper\\Desktop\\Test.log", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
ReverseLineReader reverseLineReader = new ReverseLineReader(() => fileStream, Encoding.UTF8);
List<string> stringParts = new List<string>();
do
{
IEnumerable<string> line = reverseLineReader.Take(1);
string data = line.First();
stringParts = data.Split(',').ToList();
} while (stringParts.Count != 9);
I should explain that I'm trying to read from a file that another program is writing to at the same time, and I can't amend that program as it's third-party software. Can anybody explain why my FileStream becomes disposed?
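One thing that may be worth checking: the constructor takes a Func<Stream> rather than a Stream, and its documentation says the delegate is only called when the enumerator is fetched. If the reader also disposes the stream it obtained once an enumeration finishes (an assumption here, since that part of the source is not shown), then a delegate that keeps returning the same FileStream would hand back an already-disposed stream on the second pass through the loop. The sketch below reproduces that pattern with a hypothetical stand-in class, DisposingLineSource, which reads forwards and is not the MiscUtil implementation:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical stand-in: asks the delegate for a stream on every
// enumeration and disposes that stream when the enumeration ends.
public sealed class DisposingLineSource : IEnumerable<string>
{
    private readonly Func<Stream> _streamSource;

    public DisposingLineSource(Func<Stream> streamSource)
    {
        _streamSource = streamSource;
    }

    public IEnumerator<string> GetEnumerator()
    {
        // Disposing the reader also disposes the underlying stream.
        using (var reader = new StreamReader(_streamSource(), Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class Demo
{
    // Enumerates twice, the way the do/while loop in the question does.
    public static string FirstLineTwice(Func<Stream> source)
    {
        var lines = new DisposingLineSource(source);
        lines.Take(1).First();         // first enumeration disposes its stream
        return lines.Take(1).First();  // second enumeration asks the delegate again
    }
}
```

With a delegate such as () => sharedStream the second call fails because the stream has already been disposed. A delegate that opens a new stream on each call, for example () => new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite), gives every enumeration its own stream.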
The Reverse File Reader is here:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace MiscUtil.IO
{
/// <summary>
/// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
/// (or a filename for convenience) and yields lines from the end of the stream backwards.
/// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
/// returned by the function must be seekable.
/// </summary>
public sealed class ReverseLineReader : IEnumerable<string>
{
/// <summary>
/// Buffer size to use by default. Classes with internal access can specify
/// a different buffer size - this is useful for testing.
/// </summary>
private const int DefaultBufferSize = 4096;
/// <summary>
/// Means of creating a Stream to read from.
/// </summary>
private readonly Func<Stream> streamSource;
/// <summary>
/// Encoding to use when converting bytes to text
/// </summary>
private readonly Encoding encoding;
/// <summary>
/// Size of buffer (in bytes) to read each time we read from the
/// stream. This must be at least as big as the maximum number of
/// bytes for a single character.
/// </summary>
private readonly int bufferSize;
/// <summary>
/// Function which, when given a position within a file and a byte, states whether
/// or not the byte represents the start of a character.
/// </summary>
private Func<long,byte,bool> characterStartDetector;
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched. UTF-8 is used to decode
/// the stream into text.
/// </summary>
/// <param name="streamSource">Data source</param>
public ReverseLineReader(Func<Stream> streamSource)
: this(streamSource, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// UTF8 is used to decode the file into text.
/// </summary>
/// <param name="filename">File to read from</param>
public ReverseLineReader(string filename)
: this(filename, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// </summary>
/// <param name="filename">File to read from</param>
/// <param name="encoding">Encoding to use to decode the file into text</param>
public ReverseLineReader(string filename, Encoding encoding)
: this(() => File.OpenRead(filename), encoding)
{
}
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched.
/// </summary>
/// <param name="streamSource">Data source</param>
/// <param name="encoding">Encoding to use to decode the stream into text</param>
public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
: this(streamSource, encoding, DefaultBufferSize)
{
}
internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
{
this.streamSource = streamSource;
this.encoding = encoding;
this.bufferSize = bufferSize;
if (encoding.IsSingleByte)
{
// For a single byte encoding, every byte is the start (and end) of a character
characterStartDetector = (pos, data) => true;
}
else if (encoding is UnicodeEncoding)
{
// For UTF-16, even-numbered positions are the start of a character
characterStartDetector = (pos, data) => (pos & 1) == 0;
}
else if (encoding is UTF8Encoding)
{
// For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
// See http://www.cl.cam.ac.uk/~mgk25/unicode.html
characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
}
else
{
throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
}
}
/// <summary>
/// Returns the enumerator reading strings backwards. If this method discovers that
/// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
/// </summary>
public IEnumerator<string> GetEnumerator()
{
Stream stream = streamSource();
if (!stream.CanSeek)
{
stream.Dispose();
throw new NotSupportedException("Unable to seek within stream");
}
if (!stream.CanRead)
{
stream.Dispose();
throw new NotSupportedException("Unable to read within stream");
}
return GetEnumeratorImpl(stream);
}
private IEnumerator<string> GetEnumeratorImpl(Stream stream)
{
try
{
long position = stream.Length;
if (encoding is UnicodeEncoding && (position & 1) != 0)
{
throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
}
// Allow up to two bytes for data from the start of the previous
// read which didn't quite make it as full characters
byte[] buffer = new byte[bufferSize + 2];
char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
int leftOverData = 0;
String previousEnd = null;
// TextReader doesn't return an empty string if there's line break at the end
// of the data. Therefore we don't return an empty string if it's our *first*
// return.
bool firstYield = true;
// A line-feed at the start of the previous buffer means we need to swallow
// the carriage-return at the end of this buffer - hence this needs declaring
// way up here!
bool swallowCarriageReturn = false;
while (position > 0)
{
int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);
position -= bytesToRead;
stream.Position = position;
StreamUtil.ReadExactly(stream, buffer, bytesToRead);
// If we haven't read a full buffer, but we had bytes left
// over from before, copy them to the end of the buffer
if (leftOverData > 0 && bytesToRead != bufferSize)
{
// Buffer.BlockCopy doesn't document its behaviour with respect
// to overlapping data: we *might* just have read 7 bytes instead of
// 8, and have two bytes to copy...
Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
}
// We've now *effectively* read this much data.
bytesToRead += leftOverData;
int firstCharPosition = 0;
while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
{
firstCharPosition++;
// Bad UTF-8 sequences could trigger this. For UTF-8 we should always
// see a valid character start in every 3 bytes, and if this is the start of the file
// so we've done a short read, we should have the character start
// somewhere in the usable buffer.
if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
{
throw new InvalidDataException("Invalid UTF-8 data");
}
}
leftOverData = firstCharPosition;
int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
int endExclusive = charsRead;
for (int i = charsRead - 1; i >= 0; i--)
{
char lookingAt = charBuffer[i];
if (swallowCarriageReturn)
{
swallowCarriageReturn = false;
if (lookingAt == '\r')
{
endExclusive--;
continue;
}
}
// Anything non-line-breaking, just keep looking backwards
if (lookingAt != '\n' && lookingAt != '\r')
{
continue;
}
// End of CRLF? Swallow the preceding CR
if (lookingAt == '\n')
{
swallowCarriageReturn = true;
}
int start = i + 1;
string bufferContents = new string(charBuffer, start, endExclusive - start);
endExclusive = i;
string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
if (!firstYield || stringToYield.Length != 0)
{
yield return stringToYield;
}
firstYield = false;
previousEnd = null;
}
previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);
// If we didn't decode the start of the array, put it at the end for next time
if (leftOverData != 0)
{
Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
}
}
if (leftOverData != 0)
{
// At the start of the final buffer, we had the end of another character.
throw new InvalidDataException("Invalid UTF-8 data at start of stream");
}
if (firstYield && string.IsNullOrEmpty(previousEnd))
{
yield break;
}
yield return previousEnd ?? "";
}
finally
{
stream.Dispose();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
I do see that dispose is called on the stream, but doesn't redeclaring it fix that? That class was copied from here: How to read a text file reversely with iterator in C#
Thanks,
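The finally clause in private IEnumerator<string> GetEnumeratorImpl(Stream stream) disposes the stream as soon as the enumeration ends. Take(1) followed by First() ends the enumeration after a single line, so the stream is already disposed at the end of the first pass through your loop. Because your delegate () => fileStream hands back the same FileStream object every time, a second pass through the loop receives a stream that has already been disposed. Generally you should dispose an object in the same scope or class that created it.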
In this case, remove all disposes in ReverseLineReader and wrap your original code in a using:
using (FileStream fileStream = new FileStream(...))
{
...
do
{
...
} while(...);
}
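An alternative that leaves ReverseLineReader untouched is to let the delegate open a new stream on every call, since the reader disposes whatever it was handed. The sketch below illustrates the difference with a hypothetical stand-in (DisposeDemo.ReadBytes is not part of MiscUtil; it only mimics the try/finally shape of GetEnumeratorImpl), using a MemoryStream so that it runs without a file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DisposeDemo
{
    // Hypothetical stand-in for ReverseLineReader: like GetEnumeratorImpl,
    // it disposes the stream it obtained once enumeration ends.
    public static IEnumerable<int> ReadBytes(Func<Stream> streamSource)
    {
        Stream stream = streamSource();
        try
        {
            int b;
            while ((b = stream.ReadByte()) != -1)
                yield return b;
        }
        finally
        {
            stream.Dispose();
        }
    }

    // Reads the first element twice, the way a do/while loop around
    // Take(1).First() would, and reports whether the second read worked.
    public static bool SecondReadSucceeds(Func<Stream> streamSource)
    {
        // First() disposes the enumerator, which runs the finally block above.
        ReadBytes(streamSource).First();
        try
        {
            ReadBytes(streamSource).First();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
    }
}
```

Passing () => shared, where shared is a single MemoryStream created up front, should make the second read fail, while passing () => new MemoryStream(data) should succeed both times. For the code in the question the equivalent change would be to construct the FileStream inside the lambda, so that each enumeration gets its own stream.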

Code Contracts Static Analysis: Prover Limitations?

I've been playing with Code Contracts and I really like what I've seen so far. They encourage me to evaluate and explicitly declare my assumptions, which has already helped me to identify a few corner cases I hadn't considered in the code to which I'm adding contracts. Right now I'm playing with trying to enforce more sophisticated invariants. I have one case that currently fails proving and I'm curious if there is a way I can fix this besides simply adding Contract.Assume calls. Here is the class in question, stripped down for ease of reading:
public abstract class MemoryEncoder
{
private const int CapacityDelta = 16;
private int _currentByte;
/// <summary>
/// The current byte index in the encoding stream.
/// This should not need to be modified, under typical usage,
/// but can be used to randomly access the encoding region.
/// </summary>
public int CurrentByte
{
get
{
Contract.Ensures(Contract.Result<int>() >= 0);
Contract.Ensures(Contract.Result<int>() <= Length);
return _currentByte;
}
set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= Length);
_currentByte = value;
}
}
/// <summary>
/// Current number of bytes encoded in the buffer.
/// This may be less than the size of the buffer (capacity).
/// </summary>
public int Length { get; private set; }
/// <summary>
/// The raw buffer encapsulated by the encoder.
/// </summary>
protected internal Byte[] Buffer { get; private set; }
/// <summary>
/// Reserve space in the encoder buffer for the specified number of new bytes
/// </summary>
/// <param name="bytesRequired">The number of bytes required</param>
protected void ReserveSpace(int bytesRequired)
{
Contract.Requires(bytesRequired > 0);
Contract.Ensures((Length - CurrentByte) >= bytesRequired);
//Check if these bytes would overflow the current buffer
if ((CurrentByte + bytesRequired) > Buffer.Length)
{
//Create a new buffer with at least enough space for the additional bytes required
var newBuffer = new Byte[Buffer.Length + Math.Max(bytesRequired, CapacityDelta)];
//Copy the contents of the previous buffer and replace the original buffer reference
Buffer.CopyTo(newBuffer, 0);
Buffer = newBuffer;
}
//Check if the total length of written bytes has increased
if ((CurrentByte + bytesRequired) > Length)
{
Length = CurrentByte + bytesRequired;
}
}
[ContractInvariantMethod]
private void GlobalRules()
{
Contract.Invariant(Buffer != null);
Contract.Invariant(Length <= Buffer.Length);
Contract.Invariant(CurrentByte >= 0);
Contract.Invariant(CurrentByte <= Length);
}
}
I'm interested in how I can structure the Contract calls in ReserveSpace so that the class invariants are provable. In particular, it complains about (Length <= Buffer.Length) and (CurrentByte <= Length). It's reasonable to me that it can't see that (Length <= Buffer.Length) is satisfied, since it's creating a new buffer and reassigning the reference. Is my only option to add an Assume that the invariants are satisfied?
After fighting with this for a while, I came up with this provable solution (constructor is a dummy to allow for isolated testing):
public abstract class MemoryEncoder
{
private const int CapacityDelta = 16;
private byte[] _buffer;
private int _currentByte;
private int _length;
protected MemoryEncoder()
{
Buffer = new byte[500];
Length = 0;
CurrentByte = 0;
}
/// <summary>
/// The current byte index in the encoding stream.
/// This should not need to be modified, under typical usage,
/// but can be used to randomly access the encoding region.
/// </summary>
public int CurrentByte
{
get
{
return _currentByte;
}
set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= Length);
_currentByte = value;
}
}
/// <summary>
/// Current number of bytes encoded in the buffer.
/// This may be less than the size of the buffer (capacity).
/// </summary>
public int Length
{
get { return _length; }
private set
{
Contract.Requires(value >= 0);
Contract.Requires(value <= _buffer.Length);
Contract.Requires(value >= CurrentByte);
Contract.Ensures(_length <= _buffer.Length);
_length = value;
}
}
/// <summary>
/// The raw buffer encapsulated by the encoder.
/// </summary>
protected internal Byte[] Buffer
{
get { return _buffer; }
private set
{
Contract.Requires(value != null);
Contract.Requires(value.Length >= _length);
_buffer = value;
}
}
/// <summary>
/// Reserve space in the encoder buffer for the specified number of new bytes
/// </summary>
/// <param name="bytesRequired">The number of bytes required</param>
protected void ReserveSpace(int bytesRequired)
{
Contract.Requires(bytesRequired > 0);
Contract.Ensures((Length - CurrentByte) >= bytesRequired);
//Check if these bytes would overflow the current buffer
if ((CurrentByte + bytesRequired) > Buffer.Length)
{
//Create a new buffer with at least enough space for the additional bytes required
var newBuffer = new Byte[Buffer.Length + Math.Max(bytesRequired, CapacityDelta)];
//Copy the contents of the previous buffer and replace the original buffer reference
Buffer.CopyTo(newBuffer, 0);
Buffer = newBuffer;
}
//Check if the total length of written bytes has increased
if ((CurrentByte + bytesRequired) > Length)
{
Contract.Assume(CurrentByte + bytesRequired <= _buffer.Length);
Length = CurrentByte + bytesRequired;
}
}
[ContractInvariantMethod]
private void GlobalRules()
{
Contract.Invariant(_buffer != null);
Contract.Invariant(_length <= _buffer.Length);
Contract.Invariant(_currentByte >= 0);
Contract.Invariant(_currentByte <= _length);
}
}
The main thing I noticed is that placing invariants on properties gets messy, but seems to solve more easily with invariants on fields. It was also important to place appropriate contractual obligations in the property accessors. I'll have to keep experimenting and see what works and what doesn't. It's an interesting system, but I'd definitely like to know more if anybody has a good 'cheat sheet' on how the prover works.

Best way to find position in the Stream where given byte sequence starts

What do you think is the best way to find the position in a System.Stream where a given byte sequence starts (first occurrence)?
public static long FindPosition(Stream stream, byte[] byteSequence)
{
long position = -1;
/// ???
return position;
}
P.S. The simplest yet fastest solution is preferred. :)
I've reached this solution.
I did some benchmarks with an ASCII file that was 3.050 KB and 38803 lines.
With a search byte array of 22 bytes located in the last line of the file, I got the result in about 2.28 seconds (on a slow, old machine).
public static long FindPosition(Stream stream, byte[] byteSequence)
{
if (byteSequence.Length > stream.Length)
return -1;
byte[] buffer = new byte[byteSequence.Length];
using (BufferedStream bufStream = new BufferedStream(stream, byteSequence.Length))
{
int i;
while ((i = bufStream.Read(buffer, 0, byteSequence.Length)) == byteSequence.Length)
{
if (byteSequence.SequenceEqual(buffer))
return bufStream.Position - byteSequence.Length;
else
bufStream.Position -= byteSequence.Length - PadLeftSequence(buffer, byteSequence);
}
}
return -1;
}
private static int PadLeftSequence(byte[] bytes, byte[] seqBytes)
{
int i = 1;
while (i < bytes.Length)
{
int n = bytes.Length - i;
byte[] aux1 = new byte[n];
byte[] aux2 = new byte[n];
Array.Copy(bytes, i, aux1, 0, n);
Array.Copy(seqBytes, aux2, n);
if (aux1.SequenceEqual(aux2))
return i;
i++;
}
return i;
}
If you treat the stream like another sequence of bytes, you can just search it like you were doing a string search. Wikipedia has a great article on that. Boyer-Moore is a good and simple algorithm for this.
Here's a quick hack I put together in Java. It works and it's pretty close if not Boyer-Moore. Hope it helps ;)
public static final int BUFFER_SIZE = 32;
public static int [] buildShiftArray(byte [] byteSequence){
int [] shifts = new int[byteSequence.length];
int [] ret;
int shiftCount = 0;
byte end = byteSequence[byteSequence.length-1];
int index = byteSequence.length-1;
int shift = 1;
while(--index >= 0){
if(byteSequence[index] == end){
shifts[shiftCount++] = shift;
shift = 1;
} else {
shift++;
}
}
ret = new int[shiftCount];
for(int i = 0;i < shiftCount;i++){
ret[i] = shifts[i];
}
return ret;
}
public static byte [] flushBuffer(byte [] buffer, int keepSize){
byte [] newBuffer = new byte[buffer.length];
for(int i = 0;i < keepSize;i++){
newBuffer[i] = buffer[buffer.length - keepSize + i];
}
return newBuffer;
}
public static int findBytes(byte [] haystack, int haystackSize, byte [] needle, int [] shiftArray){
int index = needle.length;
int searchIndex, needleIndex, currentShiftIndex = 0, shift;
boolean shiftFlag = false;
index = needle.length;
while(true){
needleIndex = needle.length-1;
while(true){
if(index >= haystackSize)
return -1;
if(haystack[index] == needle[needleIndex])
break;
index++;
}
searchIndex = index;
needleIndex = needle.length-1;
while(needleIndex >= 0 && haystack[searchIndex] == needle[needleIndex]){
searchIndex--;
needleIndex--;
}
if(needleIndex < 0)
return index-needle.length+1;
if(shiftFlag){
shiftFlag = false;
index += shiftArray[0];
currentShiftIndex = 1;
} else if(currentShiftIndex >= shiftArray.length){
shiftFlag = true;
index++;
} else{
index += shiftArray[currentShiftIndex++];
}
}
}
public static int findBytes(InputStream stream, byte [] needle){
byte [] buffer = new byte[BUFFER_SIZE];
int [] shiftArray = buildShiftArray(needle);
int bufferSize, initBufferSize;
int offset = 0, init = needle.length;
int val;
try{
while(true){
bufferSize = stream.read(buffer, needle.length-init, buffer.length-needle.length+init);
if(bufferSize == -1)
return -1;
if((val = findBytes(buffer, bufferSize+needle.length-init, needle, shiftArray)) != -1)
return val+offset;
buffer = flushBuffer(buffer, needle.length);
offset += bufferSize-init;
init = 0;
}
} catch (IOException e){
e.printStackTrace();
}
return -1;
}
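For comparison, here is a minimal C# sketch of the Boyer-Moore-Horspool variant (a simplified Boyer-Moore that uses only the bad-character table). It is not a port of the Java code above, and it searches an in-memory byte array rather than a stream; the class and method names are made up for illustration. To apply it to a stream you would read chunks and carry the last needle.Length - 1 bytes over into the next chunk, so that matches spanning a chunk boundary are not lost.

```csharp
using System;

public static class HorspoolSearch
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;
        if (needle.Length > haystack.Length) return -1;

        // Bad-character table: how far the window may move forward,
        // given the haystack byte aligned with the last needle position.
        int[] shift = new int[256];
        for (int i = 0; i < shift.Length; i++)
            shift[i] = needle.Length;
        for (int i = 0; i < needle.Length - 1; i++)
            shift[needle[i]] = needle.Length - 1 - i;

        int pos = 0;
        while (pos <= haystack.Length - needle.Length)
        {
            // Compare the window against the needle from right to left.
            int j = needle.Length - 1;
            while (j >= 0 && haystack[pos + j] == needle[j])
                j--;
            if (j < 0)
                return pos;

            pos += shift[haystack[pos + needle.Length - 1]];
        }
        return -1;
    }
}
```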
You'll basically need to keep a buffer the same size as byteSequence so that once you've found that the "next byte" in the stream matches, you can check the rest but then still go back to the "next but one" byte if it's not an actual match.
It's likely to be a bit fiddly whatever you do, to be honest :(
I needed to do this myself, had already started, and didn't like the solutions above. I specifically needed to find where the search-byte-sequence ends. In my situation, I need to fast-forward the stream until after that byte sequence. But you can use my solution for this question too:
var afterSequence = stream.ScanUntilFound(byteSequence);
var beforeSequence = afterSequence - byteSequence.Length;
Here is StreamExtensions.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace System
{
static class StreamExtensions
{
/// <summary>
/// Advances the supplied stream until the given searchBytes are found, without advancing too far (consuming any bytes from the stream after the searchBytes are found).
/// Regarding efficiency, if the stream is network or file, then MEMORY/CPU optimisations will be of little consequence here.
/// </summary>
/// <param name="stream">The stream to search in</param>
/// <param name="searchBytes">The byte sequence to search for</param>
/// <returns></returns>
public static int ScanUntilFound(this Stream stream, byte[] searchBytes)
{
// For this class code comments, a common example is assumed:
// searchBytes are {1,2,3,4} or 1234 for short
// # means value that is outside of search byte sequence
byte[] streamBuffer = new byte[searchBytes.Length];
int nextRead = searchBytes.Length;
int totalScannedBytes = 0;
while (true)
{
FillBuffer(stream, streamBuffer, nextRead);
totalScannedBytes += nextRead; //this is only used for final reporting of where it was found in the stream
if (ArraysMatch(searchBytes, streamBuffer, 0))
return totalScannedBytes; //found it
nextRead = FindPartialMatch(searchBytes, streamBuffer);
}
}
/// <summary>
/// Check all offsets, for partial match.
/// </summary>
/// <param name="searchBytes"></param>
/// <param name="streamBuffer"></param>
/// <returns>The amount of bytes which need to be read in, next round</returns>
static int FindPartialMatch(byte[] searchBytes, byte[] streamBuffer)
{
// 1234 = 0 - found it. this special case is already catered directly in ScanUntilFound
// #123 = 1 - partially matched, only missing 1 value
// ##12 = 2 - partially matched, only missing 2 values
// ###1 = 3 - partially matched, only missing 3 values
// #### = 4 - not matched at all
for (int i = 1; i < searchBytes.Length; i++)
{
if (ArraysMatch(searchBytes, streamBuffer, i))
{
// EG. Searching for 1234, have #123 in the streamBuffer, and [i] is 1
// Output: 123#, where # will be read using FillBuffer next.
Array.Copy(streamBuffer, i, streamBuffer, 0, searchBytes.Length - i);
return i; //if an offset of [i], makes a match then only [i] bytes need to be read from the stream to check if there's a match
}
}
return searchBytes.Length; // no partial match: refill the whole buffer (4 in the 1234 example)
}
/// <summary>
/// Reads bytes from the stream, making sure the requested amount of bytes are read (streams don't always fulfill the full request first time)
/// </summary>
/// <param name="stream">The stream to read from</param>
/// <param name="streamBuffer">The buffer to read into</param>
/// <param name="bytesNeeded">How many bytes are needed. If less than the full size of the buffer, it fills the tail end of the streamBuffer</param>
static void FillBuffer(Stream stream, byte[] streamBuffer, int bytesNeeded)
{
// EG1. [123#] - bytesNeeded is 1, when the streamBuffer contains first three matching values, but now we need to read in the next value at the end
// EG2. [####] - bytesNeeded is 4
var bytesAlreadyRead = streamBuffer.Length - bytesNeeded; //invert
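while (bytesAlreadyRead < streamBuffer.Length)
{
int read = stream.Read(streamBuffer, bytesAlreadyRead, streamBuffer.Length - bytesAlreadyRead);
// Read returns 0 at the end of the stream; without this check the loop would never terminate
if (read == 0)
throw new EndOfStreamException("Reached the end of the stream before the search bytes were found");
bytesAlreadyRead += read;
}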
}
/// <summary>
/// Checks if arrays match exactly, or with offset.
/// </summary>
/// <param name="searchBytes">Bytes to search for. Eg. [1234]</param>
/// <param name="streamBuffer">Buffer to match in. Eg. [#123] </param>
/// <param name="startAt">When this is zero, all bytes are checked. Eg. If this value 1, and it matches, this means the next byte in the stream to read may mean a match</param>
/// <returns></returns>
static bool ArraysMatch(byte[] searchBytes, byte[] streamBuffer, int startAt)
{
for (int i = 0; i < searchBytes.Length - startAt; i++)
{
if (searchBytes[i] != streamBuffer[i + startAt])
return false;
}
return true;
}
}
}
This is a bit of an old question, but here's my answer. I've found that reading blocks and then searching within them is extremely inefficient compared to just reading one byte at a time and going from there.
Also, IIRC, the accepted answer would fail if part of the sequence was in one block read and the rest in another. For example, given 12345 and searching for 23, it would read 12, not match, then read 34, not match, and so on. I haven't tried it, though, seeing as it requires .NET 4.0. At any rate, this is way simpler, and likely much faster.
static long ReadOneSrch(Stream haystack, byte[] needle)
{
int b;
long i = 0;
while ((b = haystack.ReadByte()) != -1)
{
if (b == needle[i++])
{
if (i == needle.Length)
return haystack.Position - needle.Length;
}
else
i = b == needle[0] ? 1 : 0;
}
return -1;
}
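One caveat with the byte-at-a-time loop above: on a mismatch it falls back to either 0 or 1 matched bytes, so a needle whose prefix repeats inside itself can be missed (for example, searching for "aab" in "aaab"). A minimal sketch of the same one-byte-at-a-time idea using the Knuth-Morris-Pratt failure table, which handles such overlaps, might look like this. The names are illustrative, and the returned offset is counted from where reading began, so the stream does not need to be seekable.

```csharp
using System;
using System.IO;

public static class KmpStreamSearch
{
    // Returns the offset of the first match, counted from the position
    // at which reading started, or -1 if the needle does not occur.
    public static long IndexOf(Stream haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;

        // failure[i] = length of the longest proper prefix of needle[0..i]
        // that is also a suffix of needle[0..i].
        int[] failure = new int[needle.Length];
        for (int i = 1, k = 0; i < needle.Length; i++)
        {
            while (k > 0 && needle[i] != needle[k])
                k = failure[k - 1];
            if (needle[i] == needle[k])
                k++;
            failure[i] = k;
        }

        long bytesRead = 0;
        int matched = 0; // number of needle bytes currently matched
        int b;
        while ((b = haystack.ReadByte()) != -1)
        {
            bytesRead++;
            // On a mismatch, fall back to the longest prefix that still fits.
            while (matched > 0 && b != needle[matched])
                matched = failure[matched - 1];
            if (b == needle[matched])
                matched++;
            if (matched == needle.Length)
                return bytesRead - needle.Length;
        }
        return -1;
    }
}
```

Since every byte is read exactly once and never re-read, this also works for network streams, at the cost of a small table proportional to the needle length.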
static long Search(Stream stream, byte[] pattern)
{
long start = -1;
stream.Seek(0, SeekOrigin.Begin);
while(stream.Position < stream.Length)
{
if (stream.ReadByte() != pattern[0])
continue;
start = stream.Position - 1;
for (int idx = 1; idx < pattern.Length; idx++)
{
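if (stream.ReadByte() != pattern[idx])
{
// Rewind to just after the candidate start so overlapping matches are not skipped
stream.Position = start + 1;
start = -1;
break;
}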
}
if (start > -1)
{
return start;
}
}
return start;
}
