I'm reading string data from a file, but when I search the data I read, the value I'm looking for doesn't seem to exist. Can you help with this?
The word I'm searching for is: GTA:SA:MP
The code I use is:
static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
static void Main(string[] args)
{
byte[] data = ReadFile(@"FILE.exe");
string result = Encoding.ASCII.GetString(data);
if (result.Contains("GTA:SA:MP"))
{
Console.WriteLine("Found");
}
else
{
Console.WriteLine("Not found");
}
Console.ReadLine();
}
The output I get: Not found
You've got a couple of problems. As others have pointed out, if your source is bytes then you should compare bytes, not strings; otherwise you have encoding issues. The second issue is that when you read through a buffer you aren't checking any boundary conditions - the case where the pattern you're searching for is split across the buffer boundary. One simple way to handle this is to treat the source as a stream and just check byte by byte. I'll include an example using a simple state machine made from local functions.
I used the local functions just because it seemed fun; you can do this in a myriad of ways.
static void Main(string[] _)
{
byte[] target = Encoding.UTF8.GetBytes("2:30pm");
long offsetInSource = 0;
int indexOfTarget = 0;
long current = 0;
bool found = false;
Func<byte, byte, bool> match = CheckStart;
using (BinaryReader reader = new BinaryReader(File.Open("foo.txt", FileMode.Open)))
{
while (current < reader.BaseStream.Length)
{
var b = reader.ReadByte();
var t = target[indexOfTarget];
if (match(t, b))
{
found = true;
break;
}
++current;
}
}
if (found)
{
Console.WriteLine($"Found matching pattern at: {offsetInSource}");
}
else
{
Console.WriteLine("Did not find pattern");
}
bool CheckStart(byte t, byte b)
{
if (t == b)
{
offsetInSource = current;
if (++indexOfTarget == target.Length)
return true;
match = CheckRest;
}
return false;
}
bool CheckRest(byte t, byte b)
{
if (t == b)
{
if (++indexOfTarget == target.Length)
return true;
}
else
{
indexOfTarget = 0;
match = CheckStart;
}
return false;
}
}
If your file is huge, you can read it as text in chunks of, say, 500 characters, store each chunk in a string variable and search for your phrase in it. If the phrase is not found, read the next 500 characters starting at an offset of 450 (500 - 50) so the chunks overlap, and search again. Repeat this loop until the phrase is found or EOF is reached.
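As a rough illustration of that idea (file path and chunk size are placeholders; the overlap must be at least one character shorter than the phrase so a match split across a chunk boundary is not missed):
// Sketch of the overlapping-chunk search described above.
// Requires: using System; using System.IO;
static bool ContainsPhrase(string path, string phrase)
{
    const int chunkSize = 500;           // illustrative chunk size
    int overlap = phrase.Length - 1;     // characters carried over between chunks
    var buffer = new char[chunkSize];
    string carry = string.Empty;
    using (var reader = new StreamReader(path))
    {
        int read;
        while ((read = reader.Read(buffer, 0, chunkSize)) > 0)
        {
            string window = carry + new string(buffer, 0, read);
            if (window.Contains(phrase))
                return true;
            // keep the tail so a phrase spanning the chunk boundary is still found
            carry = window.Length > overlap ? window.Substring(window.Length - overlap) : window;
        }
    }
    return false;
}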
SUMMARY
I'm reading bytes from a file in chunks (no specific size yet, somewhere between 128 and 1024), and I want to search the buffer to see if it contains a signature (pattern) of another byte array. If it finds part of the pattern at the very end of the buffer, it should read the next few bytes from the file to see if it has found a match.
What I've Tried
public static bool Contains(byte[] buffer, byte[] signiture, FileStream file)
{
for (var i = buffer.Length - 1; i >= signiture.Length - 1; i--) //move backwards through array stop if < signature
{
var found = true; //set found to true at start
for (var j = signiture.Length - 1; j >= 0 && found; j--) //loop backwards through signature
{
found = buffer[i - (signiture.Length - 1 - j)] == signiture[j];// compare signature's element with corresponding element of buffer
}
if (found)
return true; //if signature is found return true
}
//checking end of buffer for partial signiture
for (var x = signiture.Length - 1; x >= 1; x--)
{
if (buffer.Skip(buffer.Length - x).Take(x).SequenceEqual(signiture.Skip(0).Take(x))) //check if partial is equal to partial signiture
{
byte[] nextBytes = new byte[signiture.Length - x];
file.Read(nextBytes, 0, signiture.Length - x); //read next needed bytes from file
if (!signiture.Skip(0).Take(x).ToArray().Concat(nextBytes).SequenceEqual(signiture))
return false; //return false if not a match
return true; //return true if a match
}
}
return false; //if not found return false
}
This works, but I've been told LINQ is slow and that I should use Array.IndexOf(). I've tried that but can't figure out how to implement it.
You can make use of Span<T>, AsSpan and MemoryExtensions.SequenceEqual. The latter is not LINQ; it is optimized, especially for byte arrays. It unrolls the loop and uses unsafe code to essentially do a memcmp.
If you aren't using a framework that includes these types/methods by default (.NET Core 2.1+, .NET Standard 2.1), you can add the System.Memory NuGet package. The implementation of SequenceEqual is a bit different (the so-called "slow version") but it is still faster than using LINQ's SequenceEqual.
Note that you also need to check the return value of FileStream.Read.
public static bool Contains(byte[] buffer, byte[] signiture, FileStream file)
{
var sigSpan = signiture.AsSpan();
//move backwards through buffer and check if signature found
for (var i = buffer.Length - signiture.Length; i >= 0; i--)
{
if (buffer.AsSpan(i, signiture.Length).SequenceEqual(sigSpan))
return true;
}
for (var x = signiture.Length - 1; x >= 1; x--)
{
var sig = sigSpan.Slice(0, x);
if (buffer.AsSpan(buffer.Length - x).SequenceEqual(sig)) //check if partial is equal to partial signiture
{
var sigLen = signiture.Length;
byte[] nextBytes = ArrayPool<byte>.Shared.Rent(sigLen - x);
// need to store number of bytes read
var read = file.Read(nextBytes, 0, sigLen - x); //read next needed bytes from file
var next = nextBytes.AsSpan(0, read);
// don't need to concat with signature, because obviously signature is going to
// start with signature.Skip(0).Take(...)
// just test that the number of bytes we read, plus the number we will skip equals
// the actual length, then check the remainder
var result = (read + x == signiture.Length
&& signiture.AsSpan(x).SequenceEqual(next));
ArrayPool<byte>.Shared.Return(nextBytes);
return result;
}
}
return false; //if not found return false
}
I have two files: the first is the source file and the second is the destination file.
Below is my code to Intersect and Union the two files using byte arrays.
FileStream frsrc = new FileStream("Src.bin", FileMode.Open);
FileStream frdes = new FileStream("Des.bin", FileMode.Open);
int length = 24; // get file length
byte[] src = new byte[length];
byte[] des = new byte[length]; // create buffer
int Counter = 0; // actual number of bytes read
int subcount = 0;
while (frsrc.Read(src, 0, length) > 0)
{
try
{
Counter = 0;
frdes.Position = subcount * length;
while (frdes.Read(des, 0, length) > 0)
{
var data = src.Intersect(des);
var data1 = src.Union(des);
Counter++;
}
subcount++;
Console.WriteLine(subcount.ToString());
}
catch (Exception ex)
{
}
}
It works fine and is fast.
But now the problem is that I want the count of the results, and when I use the code below it becomes very slow.
var data = src.Intersect(des).Count();
var data1 = src.Union(des).Count();
So, is there any solution for that?
If yes, then please let me know as soon as possible.
Thanks
Intersect and Union are not the fastest operations. The reason you see it being fast is that you never actually enumerate the results!
Both return an enumerable, not the actual results of the operation. You're supposed to go through that and enumerate the enumerable, otherwise nothing happens - this is called "deferred execution". Now, when you do Count, you actually enumerate the enumerable, and incur the full cost of the Intersect and Union - believe me, the Count itself is relatively trivial (though still an O(n) operation!).
You'll need to make your own methods, most likely. You want to avoid the enumerable overhead, and more importantly, you'll probably want a lookup table.
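As a rough sketch of the lookup-table idea (names are illustrative, not a finished implementation): both Intersect and Union on byte arrays are set operations over distinct byte values, so a 256-entry presence table per file is enough to get the same counts without the enumerable overhead.
static void CountIntersectUnion(byte[] src, byte[] des, out int intersect, out int union)
{
    // mark which of the 256 possible byte values occur in each buffer
    var inSrc = new bool[256];
    var inDes = new bool[256];
    foreach (byte b in src) inSrc[b] = true;
    foreach (byte b in des) inDes[b] = true;

    intersect = 0;
    union = 0;
    for (int i = 0; i < 256; i++)
    {
        if (inSrc[i] && inDes[i]) intersect++;   // distinct value present in both
        if (inSrc[i] || inDes[i]) union++;       // distinct value present in either
    }
}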
A few points: the comment // get file length is misleading as it is the buffer size. Counter is not the number of bytes read, it is the number of blocks read. data and data1 will end up with the result of the last block read, ignoring any data before them. That is assuming that nothing goes wrong in the while loop - you need to remove the try structure to see if there are any errors.
What you can do is count the number of occurrences of each byte in each file: if the count of a byte is greater than zero in any of the files then it is a member of the union of the files, and if the count of a byte is greater than zero in all of the files then it is a member of the intersection of the files.
It is just as easy to write the code for more than two files as it is for two files, whereas LINQ is easy for two but a little bit more fiddly for more than two. (I put in a comparison with using LINQ in a naïve fashion for only two files at the end.)
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var file1 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\Crysis3.exe"; // 26MB
var file2 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\d3dcompiler_46.dll"; // 3MB
List<string> files = new List<string> { file1, file2 };
var sw = System.Diagnostics.Stopwatch.StartNew();
// Prepare array of counters for the bytes
var nFiles = files.Count;
int[][] count = new int[nFiles][];
for (int i = 0; i < nFiles; i++)
{
count[i] = new int[256];
}
// Get the counts of bytes in each file
int bufLen = 32768;
byte[] buffer = new byte[bufLen];
int bytesRead;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
using (var sr = new FileStream(files[fileNum], FileMode.Open, FileAccess.Read))
{
bytesRead = bufLen;
while (bytesRead > 0)
{
bytesRead = sr.Read(buffer, 0, bufLen);
for (int i = 0; i < bytesRead; i++)
{
count[fileNum][buffer[i]]++;
}
}
}
}
// Find which bytes are in any of the files or in all the files
var inAny = new List<byte>(); // union
var inAll = new List<byte>(); // intersect
for (int i = 0; i < 256; i++)
{
Boolean all = true;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
if (count[fileNum][i] > 0)
{
if (!inAny.Contains((byte)i)) // avoid adding same value more than once
{
inAny.Add((byte)i);
}
}
else
{
all = false;
}
};
if (all)
{
inAll.Add((byte)i);
};
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
// Display the results
Console.WriteLine("Union: " + string.Join(",", inAny.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + string.Join(",", inAll.Select(x => x.ToString("X2"))));
Console.WriteLine();
// Compare to using LINQ.
// N/B. Will need adjustments for more than two files.
var srcBytes1 = File.ReadAllBytes(file1);
var srcBytes2 = File.ReadAllBytes(file2);
sw.Restart();
var intersect = srcBytes1.Intersect(srcBytes2).ToArray().OrderBy(x => x);
var union = srcBytes1.Union(srcBytes2).ToArray().OrderBy(x => x);
Console.WriteLine(sw.ElapsedMilliseconds);
Console.WriteLine("Union: " + String.Join(",", union.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + String.Join(",", intersect.Select(x => x.ToString("X2"))));
Console.ReadLine();
}
}
}
The counting-the-byte-occurrences method is roughly five times faster than the LINQ method on my computer, even without counting the time the latter spends loading the files, and across a range of file sizes (a few KB to a few MB).
I start a process to retrieve a few frames from a video file with ffmpeg,
ffmpeg -i "<videofile>.mp4" -frames:v 10 -f image2pipe pipe:1
and pipe the images to stdout -
var cmd = Process.Start(p);
var stream = cmd.StandardOutput.BaseStream;
var img = Image.FromStream(stream);
Getting the first image this way works, but how do I get all of them?
OK, this was gobsmackingly easy; I'm kind of embarrassed I asked here. I'll post the answer in case it helps anyone else.
The first few bytes in the stream will be repeated every time there is a new image. I guessed the first 8 would do and voila.
static IEnumerable<Image> GetThumbnails(Stream stream)
{
byte[] allImages;
using (var ms = new MemoryStream())
{
stream.CopyTo(ms);
allImages = ms.ToArray();
}
var bof = allImages.Take(8).ToArray(); //??
var prevOffset = -1;
foreach (var offset in GetBytePatternPositions(allImages, bof))
{
if (prevOffset > -1)
yield return GetImageAt(allImages, prevOffset, offset);
prevOffset = offset;
}
if (prevOffset > -1)
yield return GetImageAt(allImages, prevOffset, allImages.Length);
}
static Image GetImageAt(byte[] data, int start, int end)
{
using (var ms = new MemoryStream(end - start))
{
ms.Write(data, start, end - start);
return Image.FromStream(ms);
}
}
static IEnumerable<int> GetBytePatternPositions(byte[] data, byte[] pattern)
{
var dataLen = data.Length;
var patternLen = pattern.Length - 1;
int scanData = 0;
int scanPattern = 0;
while (scanData < dataLen)
{
if (pattern[0] == data[scanData])
{
scanPattern = 1;
scanData++;
while (pattern[scanPattern] == data[scanData])
{
if (scanPattern == patternLen)
{
yield return scanData - patternLen;
break;
}
scanPattern++;
scanData++;
}
}
scanData++;
}
}
I'm new to C#, and I'm trying to use task async await for a WinsForm GUI. I've read so many tutorials about it, but all of them implement tasks differently. Some tasks use functions, and others just put the code in to execute. Some use Task.Run() or just await. Furthermore, all the examples I've seen are of functions that are included in the UI class. I'm trying to run functions that are in classes that are within my UI. I'm just really confused now, and don't know what's right/wrong.
What I'm trying to do is write a file to an EEPROM, using the SpringCard API / PC/SC library. I parse the file into packets and write it to the smart card. I also want to update a status label and progress bar. A lot of things can go wrong. I have flags set in the smart card, and right now I just have a while loop running until it reads a certain flag, which will obviously stall the program if it's forever waiting for a flag.
I guess I'm just confused about how to set it up. Help. I've tried using Tasks. Here is my code so far.
/* Initialize open file dialog */
OpenFileDialog ofd = new OpenFileDialog();
ofd.Multiselect = false;
ofd.Filter = "BIN Files (.bin)|*.bin|HEX Files (.hex)|*.hex";
ofd.InitialDirectory = "C:";
ofd.Title = "Select File";
//Check open file dialog result
if (ofd.ShowDialog() != DialogResult.OK)
{
if (shade != null)
{
shade.Dispose();
shade = null;
}
return;
}
//progform.Show();
Progress<string> progress = new Progress<string>();
file = new ATAC_File(ofd.FileName);
try
{
cardchannel.DisconnectReset();
Task upgrade = upgradeASYNC();
if(cardchannel.Connect())
{
await upgrade;
}
else
{
add_log_text("Connection to the card failed");
MessageBox.Show("Failed to connect to the card in the reader : please check that you don't have another application running in background that tries to work with the smartcards in the same time");
if (shade != null)
{
shade.Dispose();
shade = null;
}
cardchannel = null;
}
}
private async Task upgradeASYNC()
{
int i = 0;
int totalpackets = 0;
add_log_text("Parsing file into packets.");
totalpackets = file.parseFile();
/*progress.Report(new MyTaskProgressReport
{
CurrentProgressAmount = i,
TotalProgressAmount = totalpackets,
CurrentProgressMessage = "Sending upgrade file..."
});*/
ST_EEPROMM24LR64ER chip = new ST_EEPROMM24LR64ER(this, cardchannel, file, EEPROM.DONOTHING);
bool writefile = chip.WriteFileASYNC();
if(writefile)
{
add_log_text("WRITE FILE OK.");
}
else
{
add_log_text("WRITE FILE BAD.");
}
}
In the file class:
public int parseFile()
{
FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
FileInfo finfo = new FileInfo(filename);
int readbytecount = 0;
int packetcount = 0;
int numofbytesleft = 0;
byte[] hash = new byte[4];
byte[] packetinfo = new byte[4];
byte[] filechunk = null;
/* Read file until all file bytes read */
while (size_int > readbytecount)
{
//Initialize packet array
filechunk = new byte[MAXDATASIZE];
//read into byte array of max write size
if (packetcount < numoffullpackets)
{
//Initialize packet info array
packetinfo[0] = (byte)((size_int + 1) % 0x0100); //packetcountlo
packetinfo[1] = (byte)((size_int + 1) / 0x0100); //packetcounthi
packetinfo[2] = (byte)((packetcount + 1) / 0x0100); //packetcounthi
packetinfo[3] = (byte)((packetcount + 1) % 0x0100); //packetcountlo
//read bytes from file into packet array
bytesread = br.Read(filechunk, 0, MAXDATASIZE);
//add number of bytes read to readbytecount
readbytecount += bytesread;
}
//read EOF into byte array of size smaller than max write size
else if (packetcount == numoffullpackets)
{
//find out how many bytes left to read
numofbytesleft = size_int - (MAXDATASIZE * numoffullpackets);
//Initialize packet info array
packetinfo[0] = (byte)((size_int + 1) / 0x0100); //packetcounthi
packetinfo[1] = (byte)((size_int + 1) % 0x0100); //packetcountlo
packetinfo[2] = (byte)((packetcount + 1) / 0x0100); //packetcounthi
packetinfo[3] = (byte)((packetcount + 1) % 0x0100); //packetcountlo
//Initialize array and add byte padding, MAXWRITESIZE-4 because the other 4 bytes will be added when we append the CRC
//filechunk = new byte[numofbytesleft];
for (int j = 0; j < numofbytesleft; j++)
{
//read byte from file
filechunk[j] = br.ReadByte();
//add number of bytes read to readbytecount
readbytecount++;
}
for (int j = numofbytesleft; j < MAXDATASIZE; j++)
{
filechunk[j] = 0xFF;
}
}
else
{
MessageBox.Show("ERROR");
}
//calculate crc32 on byte array
int i = 0;
foreach (byte b in crc32.ComputeHash(filechunk))
{
hash[i++] = b;
}
//Append hash to filechunk to create new byte array named chunk
byte[] chunk = new byte[MAXWRITESIZE];
Buffer.BlockCopy(packetinfo, 0, chunk, 0, packetinfo.Length);
Buffer.BlockCopy(filechunk, 0, chunk, packetinfo.Length, filechunk.Length);
Buffer.BlockCopy(hash, 0, chunk, (packetinfo.Length + filechunk.Length), hash.Length);
//Add chunk to byte array list
packetcount++;
PacketBYTE.Add(chunk);
}
parseCMD();
return PacketBYTE.Count;
}
In the EEPROM class:
public bool WriteFileASYNC()
{
int blocknum = ATAC_CONSTANTS.RFBN_RFstartwrite;
byte[] response = null;
CAPDU[] EEPROMcmd = null;
int packetCount = 0;
log("ATTEMPT: Read response funct flag.");
do
{
StopRF();
Thread.SpinWait(100);
StartRF();
log("ATTEMPT: Write function flag.");
while (!WriteFlag(ATAC_CONSTANTS.RFBN_functflag, EEPROM.UPLOADAPP)) ;
} while (ReadFunctFlag(ATAC_CONSTANTS.RFBN_responseflag, 0) != EEPROM.UPLOADAPP);
for (int EEPROMcount = 0; EEPROMcount < file.CmdBYTE.Count; EEPROMcount++)
{
string temp = "ATTEMPT: Write EEPROM #" + EEPROMcount.ToString();
log(temp);
EEPROMcmd = file.CmdBYTE[EEPROMcount];
while (EEPROMcmd[blocknum] != null)
{
if (blocknum % 32 == 0)
{
string tempp = "ATTEMPT: Write packet #" + packetCount.ToString();
log("ATTEMPT: Write packet #");
packetCount++;
}
do
{
response = WriteBinaryASYNC(EEPROMcmd[blocknum]);
} while (response == null);
blocknum++;
}
log("ATTEMPT: Write packet flag.");
while (!WriteFlag(ATAC_CONSTANTS.RFBN_packetflag, ATAC_CONSTANTS.RFflag)) ;
log("ATTEMPT: Write packet flag.");
do
{
StopRF();
Thread.SpinWait(300);
StartRF();
} while (!ReadFlag(ATAC_CONSTANTS.RFBN_packetresponseflag, ((blocknum/32) - 1)*(EEPROMcount+1)));
blocknum = ATAC_CONSTANTS.RFBN_RFstartwrite;
}
return true;
}
Tasks are not threads by themselves.
When you write this:
Task upgrade = upgradeASYNC();
you are simply calling upgradeASYNC, and its body starts executing right there on the current thread.
When you write this:
await upgrade;
you are only waiting for the returned task to finish (before going to the next instruction).
And this method
private async Task upgradeASYNC()
returns a Task object only because you add the async keyword. But in the body of this method there is no await. So it just runs synchronously, like any thread job.
I don't have time to rewrite your code; I'll leave that to another Stack Overflow user. You should learn and work harder ;)
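For illustration only, here is a minimal sketch of one way to keep the UI responsive: push the blocking work onto the thread pool with Task.Run and await it. It reuses the method and field names from the question's own code (file, cardchannel, add_log_text, etc.) as placeholders and is not a complete rewrite.
private async Task upgradeASYNC()
{
    add_log_text("Parsing file into packets.");
    // Offload the parsing so the UI thread stays free while it runs.
    int totalpackets = await Task.Run(() => file.parseFile());

    ST_EEPROMM24LR64ER chip = new ST_EEPROMM24LR64ER(this, cardchannel, file, EEPROM.DONOTHING);
    // The blocking flag-polling loop now runs on a thread-pool thread.
    bool writefile = await Task.Run(() => chip.WriteFileASYNC());

    add_log_text(writefile ? "WRITE FILE OK." : "WRITE FILE BAD.");
}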
an example (that might not be real life, but to make my point) :
public void StreamInfo(StreamReader p)
{
string info = string.Format(
"The supplied streamreaer read : {0}\n at line {1}",
p.ReadLine(),
p.GetLinePosition()-1);
}
GetLinePosition here is an imaginary extension method of StreamReader.
Is this possible?
Of course I could keep count myself but that's not the question.
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
public static class StreamReaderExtensions
{
readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
public static long GetPosition(this StreamReader reader)
{
// shift position back from BaseStream.Position by the number of bytes read
// into internal buffer.
int byteLen = (int)byteLenField.GetValue(reader);
var position = reader.BaseStream.Position - byteLen;
// if we have consumed chars from the buffer we need to calculate how many
// bytes they represent in the current encoding and add that to the position.
int charPos = (int)charPosField.GetValue(reader);
if (charPos > 0)
{
var charBuffer = (char[])charBufferField.GetValue(reader);
var encoding = reader.CurrentEncoding;
var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
position += bytesConsumed;
}
return position;
}
public static void SetPosition(this StreamReader reader, long position)
{
reader.DiscardBufferedData();
reader.BaseStream.Seek(position, SeekOrigin.Begin);
}
}
This works quite well for me, and depending on your tolerance for using reflection, I think it is a fairly simple solution.
Caveats:
While I have done some simple testing using various Systems.Text.Encoding options, pretty much all of the data I consume with this are simple text files (ASCII).
I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read that data, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
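For example, a minimal manual counter might look like this (the file path and per-line processing are placeholders):
int lineNumber = 0;
string line;
using (var reader = new StreamReader("file.txt"))
{
    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;   // lineNumber is the 1-based number of the line just read
    }
}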
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
private TextReader _inner;
public PositioningReader(TextReader inner) {
_inner = inner;
}
public override void Close() {
_inner.Close();
}
public override int Peek() {
return _inner.Peek();
}
public override int Read() {
var c = _inner.Read();
if (c >= 0)
AdvancePosition((Char)c);
return c;
}
private int _linePos = 0;
public int LinePos { get { return _linePos; } }
private int _charPos = 0;
public int CharPos { get { return _charPos; } }
private int _matched = 0;
private void AdvancePosition(Char c) {
if (Environment.NewLine[_matched] == c) {
_matched++;
if (_matched == Environment.NewLine.Length) {
_linePos++;
_charPos = 0;
_matched = 0;
}
}
else {
_matched = 0;
_charPos++;
}
}
}
Drawbacks (for the sake of brevity):
Does not check constructor argument for null
Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. The TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by overriding those methods, routing the calls to _inner instead of base, and passing the characters read to AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
var readCount = _inner.ReadBlock(buffer, index, count);
for (int i = 0; i < readCount; i++)
AdvancePosition(buffer[index + i]);
return readCount;
}
No.
Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on?
Should it just keep a number of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy that implemented a StreamReader with ReadLine() method that registers file position.
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[@for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
_lineLength = 0;
//if (stream == null)
// __Error.ReaderClosed();
if (charPos == charLen)
{
if (ReadBuffer() == 0) return null;
}
StringBuilder sb = null;
do
{
int i = charPos;
do
{
char ch = charBuffer[i];
int EolChars = 0;
if (ch == '\r' || ch == '\n')
{
EolChars = 1;
String s;
if (sb != null)
{
sb.Append(charBuffer, charPos, i - charPos);
s = sb.ToString();
}
else
{
s = new String(charBuffer, charPos, i - charPos);
}
charPos = i + 1;
if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
{
if (charBuffer[charPos] == '\n')
{
charPos++;
EolChars = 2;
}
}
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + _lineLength;
return s;
}
i++;
} while (i < charLen);
i = charLen - charPos;
if (sb == null) sb = new StringBuilder(i + 80);
sb.Append(charBuffer, charPos, i);
} while (ReadBuffer() > 0);
string ss = sb.ToString();
_lineLength = ss.Length;
_bytesRead = _bytesRead + _lineLength;
return ss;
}
I think there is a minor bug in the code, as the length of the string is used to calculate the file position instead of the actual number of bytes read (so it lacks support for UTF-8 and UTF-16 encoded files).
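A rough sketch of what a byte-accurate version of those last few lines could look like, using the fields from the ReadLine above (this is an assumption about a possible fix, not the original author's code):
// Sketch only: measure bytes in the stream's encoding instead of counting chars.
// s and EolChars come from the surrounding ReadLine; CurrentEncoding is the
// StreamReader's encoding. A lone '\r' terminator is treated like '\n' here,
// which has the same byte length in the common encodings.
string terminator = EolChars == 2 ? "\r\n" : (EolChars == 1 ? "\n" : "");
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + CurrentEncoding.GetByteCount(s) + CurrentEncoding.GetByteCount(terminator);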
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader
class CountingReader : StreamReader {
private int _lineNumber = 0;
public int LineNumber { get { return _lineNumber; } }
public CountingReader(Stream stream) : base(stream) { }
public override string ReadLine() {
_lineNumber++;
return base.ReadLine();
}
}
and then you make it the normal way, say from a FileInfo object named file
CountingReader reader = new CountingReader(file.OpenRead())
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track position in the text regardless if you read a char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) should that be the case.
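For reference, a minimal stand-in for that Guard class could look like this (an assumption, not the author's original implementation; it covers only the two calls used by StreamReaderEx below):
// Minimal Guard substitute. Requires: using System;
static class Guard
{
    public static void NotNull(object value, string name)
    {
        if (value == null) throw new ArgumentNullException(name);
    }

    public static void Range(int value, int min, int max, string message)
    {
        if (value < min || value > max) throw new ArgumentOutOfRangeException("value", message);
    }
}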
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
// NewLine characters (magic value -1: "not used").
int newLine1, newLine2;
// The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
bool insideNewLine;
// StringBuilder used for ReadLine implementation.
StringBuilder lineBuilder = new StringBuilder();
public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
{
init(newLine);
}
public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
{
init(newLine);
}
public string NewLine
{
get { return "" + (char)newLine1 + (char)newLine2; }
private set
{
Guard.NotNull(value, "value");
Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
newLine1 = value[0];
newLine2 = (value.Length == 2 ? value[1] : -1);
}
}
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public override int Read()
{
int next = base.Read();
trackTextPosition(next);
return next;
}
public override int Read(char[] buffer, int index, int count)
{
int n = base.Read(buffer, index, count);
for (int i = 0; i