C# Roll back Streamreader 1 character - c#

For a C# project i am using streamreader, i need to go back 1 character (basicly like undoing) and i need it to change so when you get the next character it is the same one as when you rolled back
For example
Hello There
we do
H
E
L
L
O
[whitespace]
T
H
E
R <-- we undo the R
so..
R <-- undid
R
E
that is a rough idea

When you don't know if you want the value, instead of Read(), use Peek() - then you can check the value without advancing the stream. Another approach (that I use in some of my code) is to encapsulate the reader (or in my case, a Stream) in a class that has an internal buffer that lets you push values back. The buffer is always used first, making it easy to push values (or even: adjusted values) back into the stream without having to rewind it (which doesn't work for a number of streams).

A clean solution would be to derive a class from StreamReader and override the Read() function.
For your requirements a simple private int lastChar would suffice to implement a Pushback() method. A more general solution would use a Stack<char> to allow unlimited pushbacks.
//untested, incomplete
class MyReader : StreamReader
{
public MyReader(Stream strm)
: base(strm)
{
}
private int lastChar = -1;
public override int Read()
{
int ch;
if (lastChar >= 0)
{
ch = lastChar;
lastChar = -1;
}
else
{
ch = base.Read(); // could be -1
}
return ch;
}
public void PushBack(char ch) // char, don't allow Pushback(-1)
{
if (lastChar >= 0)
throw new InvalidOperation("PushBack of more than 1 char");
lastChar = ch;
}
}

Minus one from the position:
var bytes = Encoding.ASCII.GetBytes("String");
Stream stream = new MemoryStream(bytes);
Console.WriteLine((char)stream.ReadByte()); //S
Console.WriteLine((char)stream.ReadByte()); //t
stream.Position -= 1;
Console.WriteLine((char)stream.ReadByte()); //t
Console.WriteLine((char)stream.ReadByte()); //r
Console.WriteLine((char)stream.ReadByte()); //i
Console.WriteLine((char)stream.ReadByte()); //n
Console.WriteLine((char)stream.ReadByte()); //g

Related

Reading and finding texts in a file

I'm reading string data from inside a file. When I search the string data I read, the value I want does not seem to exist. Can you help with this topic?
The word I'm trying to search is: GTA:SA:MP
The code I use is:
static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
static void Main(string[] args)
{
byte[] data = ReadFile(#"FILE.exe");
string result = Encoding.ASCII.GetString(data);
if (result.Contains("GTA:SA:MP"))
{
Console.WriteLine("Found");
}
else
{
Console.WriteLine("Not found");
}
Console.ReadLine();
}
The answer to me: Not found
You've got a couple problems. As others have pointed out if your source is bytes then you should compare bytes not strings. Otherwise you have encoding issues. Second issue is you're using a buffer but you're not checking for any boundary conditions - where the pattern you're searching for is split across the buffer size boundary. One simple way to do something like this is treat the source as a stream and just check byte by byte. I'll include an example using a simple state machine made from local functions.
I used the local functions just because it seemed fun, you can do this in a myriad of ways..
static void Main(string[] _)
{
byte[] target = Encoding.UTF8.GetBytes("2:30pm");
long offsetInSource = 0;
int indexOfTarget = 0;
long current = 0;
bool found = false;
Func<byte, byte, bool> match = CheckStart;
using (BinaryReader reader = new BinaryReader(File.Open("foo.txt", FileMode.Open)))
{
while (current < reader.BaseStream.Length)
{
var b = reader.ReadByte();
var t = target[indexOfTarget];
if (match(t, b))
{
found = true;
break;
}
++current;
}
}
if (found)
{
Console.WriteLine($"Found matching pattern at: {offsetInSource}");
}
else
{
Console.WriteLine("Did not find pattern");
}
bool CheckStart(byte t, byte b)
{
if (t == b)
{
offsetInSource = current;
if (++indexOfTarget == target.Length)
return true;
match = CheckRest;
}
return false;
}
bool CheckRest(byte t, byte b)
{
if (t == b)
{
if (++indexOfTarget == target.Length)
return true;
}
else
{
indexOfTarget = 0;
match = CheckStart;
}
return false;
}
}
}
If your file is huge, you can read file as text in 500 characters (for example) and store them into a string variable and search your phrase in this variable. If your phrase not found, read another 500 characters by 450 (500-50) offset and store them into a string variable and search your phrase in this variable. Do this loop until your phrase found or EOF reached.

Is there a way to read raw content from XmlReader?

I have a very large XML file so I am using XmlReader in C#. Problem is some of the content contains XML-like markers that should not be processed by XmlReader.
<Narf name="DOH">Mark's test of <newline> like stuff</Narf>
This is legacy data, so it cannot be refactored... (of course)
I have tried ReadInnerXml but get the whole node.
I have tried ReadElementContentAsString but get an exception saying 'newline' is not closed.
// Does not deal with markup in the content (Both lines)
ms.mText = reader.ReadElementContentAsString();
XElement el = XNode.ReadFrom(reader) as XElement; ms.mText = el.ToString();
What I want is ms.mText to equal "Mark's test of <newline> like stuff" and not an exception.
System.Xml.XmlException was unhandled
HResult=-2146232000
LineNumber=56
LinePosition=63
Message=The 'newline' start tag on line 56 position 42 does not match the end tag of 'Narf'. Line 56, position 63.
Source=System.Xml
The duplicate flagged question did not solve the problem because it requires changing the input to remove the problem before using the data. As stated above, this is legacy data.
I figured it out based on responses here! Not elegant, but works...
public class TextWedge : TextReader
{
private StreamReader mSr = null;
private string mBuffer = "";
public TextWedge(string filename)
{
mSr = File.OpenText(filename);
// buffer 50
for (int i =0; i<50; i++)
{
mBuffer += (char) (mSr.Read());
}
}
public override int Peek()
{
return mSr.Peek() + mBuffer.Length;
}
public override int Read()
{
int iRet = -1;
if (mBuffer.Length > 0)
{
iRet = mBuffer[0];
int ic = mSr.Read();
char c = (char)ic;
mBuffer = mBuffer.Remove(0, 1);
if (ic != -1)
{
mBuffer += c;
// Run through the battery of non-xml tags
mBuffer = mBuffer.Replace("<newline>", "[br]");
}
}
return iRet;
}
}

C# compress a byte array

I do not know much about compression algorithms. I am looking for a simple compression algorithm (or code snippet) which can reduce the size of a byte[,,] or byte[]. I cannot make use of System.IO.Compression. Also, the data has lots of repetition.
I tried implementing the RLE algorithm (posted below for your inspection). However, it produces array's 1.2 to 1.8 times larger.
public static class RLE
{
public static byte[] Encode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 0; i < source.Length; i++)
{
runLength = 1;
while (runLength < byte.MaxValue
&& i + 1 < source.Length
&& source[i] == source[i + 1])
{
runLength++;
i++;
}
dest.Add(runLength);
dest.Add(source[i]);
}
return dest.ToArray();
}
public static byte[] Decode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 1; i < source.Length; i+=2)
{
runLength = source[i - 1];
while (runLength > 0)
{
dest.Add(source[i]);
runLength--;
}
}
return dest.ToArray();
}
}
I have also found a java, string and integer based, LZW implementation. I have converted it to C# and the results look good (code posted below). However, I am not sure how it works nor how to make it work with bytes instead of strings and integers.
public class LZW
{
/* Compress a string to a list of output symbols. */
public static int[] compress(string uncompressed)
{
// Build the dictionary.
int dictSize = 256;
Dictionary<string, int> dictionary = new Dictionary<string, int>();
for (int i = 0; i < dictSize; i++)
dictionary.Add("" + (char)i, i);
string w = "";
List<int> result = new List<int>();
for (int i = 0; i < uncompressed.Length; i++)
{
char c = uncompressed[i];
string wc = w + c;
if (dictionary.ContainsKey(wc))
w = wc;
else
{
result.Add(dictionary[w]);
// Add wc to the dictionary.
dictionary.Add(wc, dictSize++);
w = "" + c;
}
}
// Output the code for w.
if (w != "")
result.Add(dictionary[w]);
return result.ToArray();
}
/* Decompress a list of output ks to a string. */
public static string decompress(int[] compressed)
{
int dictSize = 256;
Dictionary<int, string> dictionary = new Dictionary<int, string>();
for (int i = 0; i < dictSize; i++)
dictionary.Add(i, "" + (char)i);
string w = "" + (char)compressed[0];
string result = w;
for (int i = 1; i < compressed.Length; i++)
{
int k = compressed[i];
string entry = "";
if (dictionary.ContainsKey(k))
entry = dictionary[k];
else if (k == dictSize)
entry = w + w[0];
result += entry;
// Add w+entry[0] to the dictionary.
dictionary.Add(dictSize++, w + entry[0]);
w = entry;
}
return result;
}
}
Have a look here. I used this code as a basis to compress in one of my work projects. Not sure how much of the .NET Framework is accessbile in the Xbox 360 SDK, so not sure how well this will work for you.
The problem with that RLE algorithm is that it is too simple. It prefixes every byte with how many times it is repeated, but that does mean that in long ranges of non-repeating bytes, each single byte is prefixed with a "1". On data without any repetitions this will double the file size.
This can be avoided by using Code-type RLE instead; the 'Code' (also called 'Token') will be a byte that can have two meanings; either it indicates how many times the single following byte is repeated, or it indicates how many non-repeating bytes follow that should be copied as they are. The difference between those two codes is made by enabling the highest bit, meaning there are still 7 bits available for the value, meaning the amount to copy or repeat per such code can be up to 127.
This means that even in worst-case scenarios, the final size can only be about 1/127th larger than the original file size.
A good explanation of the whole concept, plus full working (and, in fact, heavily optimised) C# code, can be found here:
http://www.shikadi.net/moddingwiki/RLE_Compression
Note that sometimes, the data will end up larger than the original anyway, simply because there are not enough repeating bytes in it for RLE to work. A good way to deal with such compression failures is by adding a header to your final data. If you simply add an extra byte at the start that's on 0 for uncompressed data and 1 for RLE compressed data, then, when RLE fails to give a smaller result, you just save it uncompressed, with the 0 in front, and your final data will be exactly one byte larger than the original. The system at the other side can then read that starting byte and use that to determine if the following data should be uncompressed or just copied.
Look into Huffman codes, it's a pretty simple algorithm. Basically, use fewer bits for patterns that show up more often, and keep a table of how it's encoded. And you have to account in your codewords that there are no separators to help you decode.

How can we prepend strings with StringBuilder?

I know we can append strings using StringBuilder. Is there a way we can prepend strings (i.e. add strings in front of a string) using StringBuilder so we can keep the performance benefits that StringBuilder offers?
Using the insert method with the position parameter set to 0 would be the same as prepending (i.e. inserting at the beginning).
C# example : varStringBuilder.Insert(0, "someThing");
Java example : varStringBuilder.insert(0, "someThing");
It works both for C# and Java
Prepending a String will usually require copying everything after the insertion point back some in the backing array, so it won't be as quick as appending to the end.
But you can do it like this in Java (in C# it's the same, but the method is called Insert):
aStringBuilder.insert(0, "newText");
If you require high performance with lots of prepends, you'll need to write your own version of StringBuilder (or use someone else's). With the standard StringBuilder (although technically it could be implemented differently) insert require copying data after the insertion point. Inserting n piece of text can take O(n^2) time.
A naive approach would be to add an offset into the backing char[] buffer as well as the length. When there is not enough room for a prepend, move the data up by more than is strictly necessary. This can bring performance back down to O(n log n) (I think). A more refined approach is to make the buffer cyclic. In that way the spare space at both ends of the array becomes contiguous.
Here's what you can do If you want to prepend using Java's StringBuilder class:
StringBuilder str = new StringBuilder();
str.Insert(0, "text");
You could try an extension method:
/// <summary>
/// kind of a dopey little one-off for StringBuffer, but
/// an example where you can get crazy with extension methods
/// </summary>
public static void Prepend(this StringBuilder sb, string s)
{
sb.Insert(0, s);
}
StringBuilder sb = new StringBuilder("World!");
sb.Prepend("Hello "); // Hello World!
You could build the string in reverse and then reverse the result.
You incur an O(n) cost instead of an O(n^2) worst case cost.
If I understand you correctly, the insert method looks like it'll do what you want. Just insert the string at offset 0.
I haven't used it but Ropes For Java Sounds intriguing. The project name is a play on words, use a Rope instead of a String for serious work. Gets around the performance penalty for prepending and other operations. Worth a look, if you're going to be doing a lot of this.
A rope is a high performance
replacement for Strings. The
datastructure, described in detail in
"Ropes: an Alternative to Strings",
provides asymptotically better
performance than both String and
StringBuffer for common string
modifications like prepend, append,
delete, and insert. Like Strings,
ropes are immutable and therefore
well-suited for use in multi-threaded
programming.
Try using Insert()
StringBuilder MyStringBuilder = new StringBuilder("World!");
MyStringBuilder.Insert(0,"Hello "); // Hello World!
You could create an extension for StringBuilder yourself with a simple class:
namespace Application.Code.Helpers
{
public static class StringBuilderExtensions
{
#region Methods
public static void Prepend(this StringBuilder sb, string value)
{
sb.Insert(0, value);
}
public static void PrependLine(this StringBuilder sb, string value)
{
sb.Insert(0, value + Environment.NewLine);
}
#endregion
}
}
Then, just add:
using Application.Code.Helpers;
To the top of any class that you want to use the StringBuilder in and any time you use intelli-sense with a StringBuilder variable, the Prepend and PrependLine methods will show up. Just remember that when you use Prepend, you will need to Prepend in reverse order than if you were Appending.
Judging from the other comments, there's no standard quick way of doing this. Using StringBuilder's .Insert(0, "text") is approximately only 1-3x as fast as using painfully slow String concatenation (based on >10000 concats), so below is a class to prepend potentially thousands of times quicker!
I've included some other basic functionality such as append(), subString() and length() etc. Both appends and prepends vary from about twice as fast to 3x slower than StringBuilder appends. Like StringBuilder, the buffer in this class will automatically increase when the text overflows the old buffer size.
The code has been tested quite a lot, but I can't guarantee it's free of bugs.
class Prepender
{
private char[] c;
private int growMultiplier;
public int bufferSize; // Make public for bug testing
public int left; // Make public for bug testing
public int right; // Make public for bug testing
public Prepender(int initialBuffer = 1000, int growMultiplier = 10)
{
c = new char[initialBuffer];
//for (int n = 0; n < initialBuffer; n++) cc[n] = '.'; // For debugging purposes (used fixed width font for testing)
left = initialBuffer / 2;
right = initialBuffer / 2;
bufferSize = initialBuffer;
this.growMultiplier = growMultiplier;
}
public void clear()
{
left = bufferSize / 2;
right = bufferSize / 2;
}
public int length()
{
return right - left;
}
private void increaseBuffer()
{
int nudge = -bufferSize / 2;
bufferSize *= growMultiplier;
nudge += bufferSize / 2;
char[] tmp = new char[bufferSize];
for (int n = left; n < right; n++) tmp[n + nudge] = c[n];
left += nudge;
right += nudge;
c = new char[bufferSize];
//for (int n = 0; n < buffer; n++) cc[n]='.'; // For debugging purposes (used fixed width font for testing)
for (int n = left; n < right; n++) c[n] = tmp[n];
}
public void append(string s)
{
// If necessary, increase buffer size by growMultiplier
while (right + s.Length > bufferSize) increaseBuffer();
// Append user input to buffer
int len = s.Length;
for (int n = 0; n < len; n++)
{
c[right] = s[n];
right++;
}
}
public void prepend(string s)
{
// If necessary, increase buffer size by growMultiplier
while (left - s.Length < 0) increaseBuffer();
// Prepend user input to buffer
int len = s.Length - 1;
for (int n = len; n > -1; n--)
{
left--;
c[left] = s[n];
}
}
public void truncate(int start, int finish)
{
if (start < 0) throw new Exception("Truncation error: Start < 0");
if (left + finish > right) throw new Exception("Truncation error: Finish > string length");
if (finish < start) throw new Exception("Truncation error: Finish < start");
//MessageBox.Show(left + " " + right);
right = left + finish;
left = left + start;
}
public string subString(int start, int finish)
{
if (start < 0) throw new Exception("Substring error: Start < 0");
if (left + finish > right) throw new Exception("Substring error: Finish > string length");
if (finish < start) throw new Exception("Substring error: Finish < start");
return toString(start,finish);
}
public override string ToString()
{
return new string(c, left, right - left);
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
private string toString(int start, int finish)
{
return new string(c, left+start, finish-start );
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
}
This should work:
aStringBuilder = "newText" + aStringBuilder;

How to know position(linenumber) of a streamreader in a textfile?

an example (that might not be real life, but to make my point) :
public void StreamInfo(StreamReader p)
{
string info = string.Format(
"The supplied streamreaer read : {0}\n at line {1}",
p.ReadLine(),
p.GetLinePosition()-1);
}
GetLinePosition here is an imaginary extension method of streamreader.
Is this possible?
Of course I could keep count myself but that's not the question.
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
public static class StreamReaderExtensions
{
readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
public static long GetPosition(this StreamReader reader)
{
// shift position back from BaseStream.Position by the number of bytes read
// into internal buffer.
int byteLen = (int)byteLenField.GetValue(reader);
var position = reader.BaseStream.Position - byteLen;
// if we have consumed chars from the buffer we need to calculate how many
// bytes they represent in the current encoding and add that to the position.
int charPos = (int)charPosField.GetValue(reader);
if (charPos > 0)
{
var charBuffer = (char[])charBufferField.GetValue(reader);
var encoding = reader.CurrentEncoding;
var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
position += bytesConsumed;
}
return position;
}
public static void SetPosition(this StreamReader reader, long position)
{
reader.DiscardBufferedData();
reader.BaseStream.Seek(position, SeekOrigin.Begin);
}
}
This works quite well for me and depending on your tolerance for using reflection It thinks it is a fairly simple solution.
Caveats:
While I have done some simple testing using various Systems.Text.Encoding options, pretty much all of the data I consume with this are simple text files (ASCII).
I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actuall going to read that data, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
private TextReader _inner;
public PositioningReader(TextReader inner) {
_inner = inner;
}
public override void Close() {
_inner.Close();
}
public override int Peek() {
return _inner.Peek();
}
public override int Read() {
var c = _inner.Read();
if (c >= 0)
AdvancePosition((Char)c);
return c;
}
private int _linePos = 0;
public int LinePos { get { return _linePos; } }
private int _charPos = 0;
public int CharPos { get { return _charPos; } }
private int _matched = 0;
private void AdvancePosition(Char c) {
if (Environment.NewLine[_matched] == c) {
_matched++;
if (_matched == Environment.NewLine.Length) {
_linePos++;
_charPos = 0;
_matched = 0;
}
}
else {
_matched = 0;
_charPos++;
}
}
}
Drawbacks (for the sake of brevity):
Does not check constructor argument for null
Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
overriding those methods via routing calls to _inner. instead of base.
passing the characters read to the AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
var readCount = _inner.ReadBlock(buffer, index, count);
for (int i = 0; i < readCount; i++)
AdvancePosition(buffer[index + i]);
return readCount;
}
No.
Consider that it's possible to seek to any poisition using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on?
Should it just keep a number of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy that implemented a StreamReader with ReadLine() method that registers file position.
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[#for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
_lineLength = 0;
//if (stream == null)
// __Error.ReaderClosed();
if (charPos == charLen)
{
if (ReadBuffer() == 0) return null;
}
StringBuilder sb = null;
do
{
int i = charPos;
do
{
char ch = charBuffer[i];
int EolChars = 0;
if (ch == '\r' || ch == '\n')
{
EolChars = 1;
String s;
if (sb != null)
{
sb.Append(charBuffer, charPos, i - charPos);
s = sb.ToString();
}
else
{
s = new String(charBuffer, charPos, i - charPos);
}
charPos = i + 1;
if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
{
if (charBuffer[charPos] == '\n')
{
charPos++;
EolChars = 2;
}
}
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + _lineLength;
return s;
}
i++;
} while (i < charLen);
i = charLen - charPos;
if (sb == null) sb = new StringBuilder(i + 80);
sb.Append(charBuffer, charPos, i);
} while (ReadBuffer() > 0);
string ss = sb.ToString();
_lineLength = ss.Length;
_bytesRead = _bytesRead + _lineLength;
return ss;
}
Think there is a minor bug in the code as the length of the string is used to calculate file position instead of using the actual bytes read (Lacking support for UTF8 and UTF16 encoded files).
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader
class CountingReader : StreamReader {
private int _lineNumber = 0;
public int LineNumber { get { return _lineNumber; } }
public CountingReader(Stream stream) : base(stream) { }
public override string ReadLine() {
_lineNumber++;
return base.ReadLine();
}
}
and then you make it the normal way, say from a FileInfo object named file
CountingReader reader = new CountingReader(file.OpenRead())
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track position in the text regardless if you read a char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class and it should be obvoius from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that is s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) should it be the case.
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
// NewLine characters (magic value -1: "not used").
int newLine1, newLine2;
// The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
bool insideNewLine;
// StringBuilder used for ReadLine implementation.
StringBuilder lineBuilder = new StringBuilder();
public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
{
init(newLine);
}
public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
{
init(newLine);
}
public string NewLine
{
get { return "" + (char)newLine1 + (char)newLine2; }
private set
{
Guard.NotNull(value, "value");
Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
newLine1 = value[0];
newLine2 = (value.Length == 2 ? value[1] : -1);
}
}
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public override int Read()
{
int next = base.Read();
trackTextPosition(next);
return next;
}
public override int Read(char[] buffer, int index, int count)
{
int n = base.Read(buffer, index, count);
for (int i = 0; i

Categories

Resources