I want to publish server-messages on Twitter, for our clients.
Unfortunately, Twitter only allows posting 140 Chars or less. This is a shame.
Now, I have to write an algorithm that concatenates the different messages from the server together, but shortens them to a max of 140 characters.
It's pretty tricky.
CODE
static string concatinateStringsWithLength(string[] strings, int length, string separator) {
// This is the maximum number of chars for the strings
// We have to subtract the separators
int maxLengthOfAllStrings = length - ((strings.Length - 1) * separator.Length);
// Here we save all shortenedStrings
string[] cutStrings = new string[strings.Length];
// This is the average length of all the strings
int averageStringLenght = maxLengthOfAllStrings / strings.Length;
// Now we check how many strings are longer than the average string
int longerStrings = 0;
foreach (string singleString in strings)
{
if (singleString.Length > averageStringLenght)
{
longerStrings++;
}
}
// If a string is shorter than the average, we can give its unused characters to the longer strings
int maxStringLength = averageStringLenght;
foreach (string singleString in strings)
{
if (averageStringLenght > singleString.Length)
{
maxStringLength += (int)((averageStringLenght - singleString.Length) * (1.0 / longerStrings));
}
}
// Finally we shorten the strings and save them to the array
int i = 0;
foreach (string singleString in strings)
{
string shortenedString = singleString;
if (singleString.Length > maxStringLength)
{
shortenedString = singleString.Remove(maxStringLength);
}
cutStrings[i] = shortenedString;
i++;
}
return String.Join(separator, cutStrings);
}
Problem with this
This algorithm works, but it's not very optimized.
It uses fewer characters than it actually could.
The main problem is that the variable longerStrings depends on maxStringLength, and vice versa.
This means that if longerStrings changes, maxStringLength changes too, which can change longerStrings again, and so on.
I'd have to make a while loop and do this until there are no changes, but I don't think that's necessary for such a simple case.
Can you give me a clue on how to continue?
Or maybe there already exists something similar?
Thanks!
EDIT
The messages I get from the server look like this:
Message
Subject
Date
Body
Message
Subject
Date
Body
And so on.
What I want is to concatenate the strings with a separator, in this case a semi-colon.
There should be a max length. The long strings should be shortened first.
Example
This is a subject
This is the body and is a bit lon...
25.02.2013
This is a s...
This is the...
25.02.2013
I think you get the idea ;)
About five times slower than yours (in our simple example), but it should use the maximum available space (there is no checking of critical values):
static string Concatenate(string[] strings, int maxLength, string separator)
{
var totalLength = strings.Sum(s => s.Length);
var requiredLength = totalLength + (strings.Length - 1)*separator.Length;
// Return early if there is enough room.
if (requiredLength <= maxLength)
return String.Join(separator, strings);
// The problem...
var helpers = new ConcatenateInternal[strings.Length];
for (var i = 0; i < helpers.Length; i++)
helpers[i] = new ConcatenateInternal(strings[i].Length);
var avaliableLength = maxLength - (strings.Length - 1)*separator.Length;
var charsInserted = 0;
var currentIndex = 0;
while (charsInserted != avaliableLength)
{
for (var i = 0; i < strings.Length; i++)
{
if (charsInserted == avaliableLength)
break;
if (currentIndex >= strings[i].Length)
{
helpers[i].Finished = true;
continue;
}
helpers[i].StringBuilder.Append(strings[i][currentIndex]);
charsInserted++;
}
currentIndex++;
}
var unified = new StringBuilder(avaliableLength);
for (var i = 0; i < strings.Length; i++)
{
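// A string was truncated if fewer characters were copied than it contains
if (helpers[i].StringBuilder.Length < strings[i].Length)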
{
unified.Append(helpers[i].StringBuilder.ToString(0, helpers[i].StringBuilder.Length - 3));
unified.Append("...");
}
else
{
unified.Append(helpers[i].StringBuilder.ToString());
}
if (i < strings.Length - 1)
{
unified.Append(separator);
}
}
return unified.ToString();
}
And ConcatenateInternal:
class ConcatenateInternal
{
public StringBuilder StringBuilder { get; private set; }
public bool Finished { get; set; }
public ConcatenateInternal(int capacity)
{
StringBuilder = new StringBuilder(capacity);
}
}
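For comparison, the "shorten the long strings first" requirement from the question can also be met without copying character by character. The following is only a minimal sketch under a few assumptions: the names TweetJoiner and JoinWithinLimit are hypothetical, the "..." marker is left out, and the strings are visited from shortest to longest so that whatever a short string does not need stays available for the longer ones.

```csharp
using System;
using System.Linq;

public static class TweetJoiner
{
    // Hypothetical helper (not from the thread): joins the strings with the separator
    // and truncates only the longest ones so that the result fits into maxLength.
    public static string JoinWithinLimit(string[] strings, int maxLength, string separator)
    {
        if (strings.Length == 0) return string.Empty;

        // Characters left for the strings themselves once the separators are paid for.
        int budget = maxLength - (strings.Length - 1) * separator.Length;
        if (budget < 0) throw new ArgumentException("maxLength is too small for the separators.");

        // Visit the strings from shortest to longest. Each one may use an equal share
        // of what is left; unused characters remain in the pool for the longer strings.
        var allowed = new int[strings.Length];
        int[] order = Enumerable.Range(0, strings.Length)
                                .OrderBy(i => strings[i].Length)
                                .ToArray();
        int remaining = budget;
        for (int k = 0; k < order.Length; k++)
        {
            int share = remaining / (order.Length - k);
            allowed[order[k]] = Math.Min(strings[order[k]].Length, share);
            remaining -= allowed[order[k]];
        }

        // Cut each string to its allowance, keeping the original order.
        return string.Join(separator, strings.Select((s, i) => s.Substring(0, allowed[i])));
    }
}
```

With the inputs "aaaaaaaaaa", "bb" and "cccccccccc", a limit of 16 and ";" as separator, the short string is kept whole and the two long ones are cut to six characters each, so the result is exactly 16 characters long.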
I'm trying to insert a certain number of indentations before a string based on an item's depth, and I'm wondering if there is a way to return a string repeated X times. Example:
string indent = "---";
Console.WriteLine(indent.Repeat(0)); //would print nothing.
Console.WriteLine(indent.Repeat(1)); //would print "---".
Console.WriteLine(indent.Repeat(2)); //would print "------".
Console.WriteLine(indent.Repeat(3)); //would print "---------".
If you only intend to repeat the same character, you can use the string constructor that accepts a char and the number of times to repeat it: new String(char c, int count).
For example, to repeat a dash five times:
string result = new String('-', 5);
Output: -----
If you're using .NET 4.0, you could use string.Concat together with Enumerable.Repeat.
int N = 5; // or whatever
Console.WriteLine(string.Concat(Enumerable.Repeat(indent, N)));
Otherwise I'd go with something like Adam's answer.
The reason I generally wouldn't advise using Andrey's answer is simply that the ToArray() call introduces superfluous overhead that is avoided with the StringBuilder approach suggested by Adam. That said, at least it works without requiring .NET 4.0; and it's quick and easy (and isn't going to kill you if efficiency isn't too much of a concern).
Most performant solution for a string:
string result = new StringBuilder().Insert(0, "---", 5).ToString();
public static class StringExtensions
{
public static string Repeat(this string input, int count)
{
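if (string.IsNullOrEmpty(input) || count == 1)
return input;
// Zero (or negative) repetitions yield an empty string rather than the input itself
if (count <= 0)
return string.Empty;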
var builder = new StringBuilder(input.Length * count);
for(var i = 0; i < count; i++) builder.Append(input);
return builder.ToString();
}
}
For many scenarios, this is probably the neatest solution:
public static class StringExtensions
{
public static string Repeat(this string s, int n)
=> new StringBuilder(s.Length * n).Insert(0, s, n).ToString();
}
Usage is then:
text = "Hello World! ".Repeat(5);
This builds on other answers (particularly @c0rd's). As well as simplicity, it has the following features, which not all the other techniques discussed share:
Repetition of a string of any length, not just a character (as requested by the OP).
Efficient use of StringBuilder through storage preallocation.
Strings and chars [version 1]
string.Join("", Enumerable.Repeat("text" , 2 ));
//result: texttext
Strings and chars [version 2]:
String.Concat(Enumerable.Repeat("text", 2));
//result: texttext
Strings and chars [version 3]
new StringBuilder().Insert(0, "text", 2).ToString();
//result: texttext
Chars only:
new string('5', 3);
//result: 555
Extension way:
(works FASTER - better for WEB)
public static class RepeatExtensions
{
public static string Repeat(this string str, int times)
{
var a = new StringBuilder();
//Append is faster than Insert
( () => a.Append(str) ).RepeatAction(times) ;
return a.ToString();
}
public static void RepeatAction(this Action action, int count)
{
for (int i = 0; i < count; i++)
{
action();
}
}
}
usage:
var a = "Hello".Repeat(3);
//result: HelloHelloHello
Use String.PadLeft, if your desired string contains only a single char.
public static string Indent(int count, char pad)
{
return String.Empty.PadLeft(count, pad);
}
Credit due here
You can repeat your string (in case it's not a single char) and concat the result, like this:
String.Concat(Enumerable.Repeat("---", 5))
I would go for Dan Tao's answer, but if you're not using .NET 4.0 you can do something like that:
public static string Repeat(this string str, int count)
{
return Enumerable.Repeat(str, count)
.Aggregate(
new StringBuilder(str.Length * count),
(sb, s) => sb.Append(s))
.ToString();
}
string indent = "---";
string n1 = string.Concat(Enumerable.Repeat(indent, 1).ToArray());
string n2 = string.Concat(Enumerable.Repeat(indent, 2).ToArray());
string n3 = string.Concat(Enumerable.Repeat(indent, 3).ToArray());
Adding the Extension Method I am using all over my projects:
public static string Repeat(this string text, int count)
{
if (!String.IsNullOrEmpty(text))
{
return String.Concat(Enumerable.Repeat(text, count));
}
return "";
}
Hope someone can make use of it...
I like the answer given. Along the same lines though is what I've used in the past:
"".PadLeft(3*Indent,'-')
This will create an indent, but technically the question was about repeating a string. If the indent string is something like >-<, then this, as well as the accepted answer, would not work. In this case, c0rd's solution using StringBuilder looks good, though the overhead of StringBuilder may in fact mean it is not the most performant. One option is to build an array of strings, fill it with indent strings, then concat that. To wit:
int Indent = 2;
string[] sarray = new string[6]; //assuming max of 6 levels of indent, 0 based
for (int iter = 0; iter < 6; iter++)
{
//using c0rd's stringbuilder concept, insert ABC as the indent characters to demonstrate any string can be used
sarray[iter] = new StringBuilder().Insert(0, "ABC", iter).ToString();
}
Console.WriteLine(sarray[Indent] +"blah"); //now pretend to output some indented line
We all love a clever solution but sometimes simple is best.
Surprised nobody went old-school.
I am not making any claims about this code, but just for fun:
public static string Repeat(this string @this, int count)
{
var dest = new char[@this.Length * count];
for (int i = 0; i < dest.Length; i += 1)
{
dest[i] = @this[i % @this.Length];
}
return new string(dest);
}
Print a line with repetition.
Console.Write(new string('=', 30) + "\n");
==============================
For general use, solutions involving the StringBuilder class are best for repeating multi-character strings. It's optimized to handle the combination of large numbers of strings in a way that simple concatenation can't and that would be difficult or impossible to do more efficiently by hand. The StringBuilder solutions shown here use O(N) iterations to complete, a flat rate proportional to the number of times it is repeated.
However, for very large number of repeats, or where high levels of efficiency must be squeezed out of it, a better approach is to do something similar to StringBuilder's basic functionality but to produce additional copies from the destination, rather than from the original string, as below.
public static string Repeat_CharArray_LogN(this string str, int times)
{
int limit = (int)Math.Log(times, 2);
char[] buffer = new char[str.Length * times];
int width = str.Length;
Array.Copy(str.ToCharArray(), buffer, width);
for (int index = 0; index < limit; index++)
{
Array.Copy(buffer, 0, buffer, width, width);
width *= 2;
}
Array.Copy(buffer, 0, buffer, width, str.Length * times - width);
return new string(buffer);
}
This doubles the length of the source/destination string with each iteration, which saves the overhead of resetting counters each time it would go through the original string, instead smoothly reading through and copying the now much longer string, something that modern processors can do much more efficiently.
It uses a base-2 logarithm to find how many times it needs to double the length of the string and then proceeds to do so that many times. Since the remainder to be copied is now less than the total length it is copying from, it can then simply copy a subset of what it has already generated.
I have used the Array.Copy() method over the use of StringBuilder, as a copying of the content of the StringBuilder into itself would have the overhead of producing a new string with that content with each iteration. Array.Copy() avoids this, while still operating with an extremely high rate of efficiency.
This solution takes O(1 + log N) iterations to complete, a rate that increases logarithmically with the number of repeats (doubling the number of repeats equals one additional iteration), a substantial savings over the other methods, which increase proportionally.
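The same doubling idea can be written without the Math.Log call by tracking how much of the buffer has been filled so far. This is only a sketch: the class name RepeatByDoubling and the guard clauses for empty input and non-positive counts are assumptions added here, not part of the method above.

```csharp
using System;

public static class RepeatByDoubling
{
    // Illustrative variant; the name and the guard clauses are assumptions.
    public static string Repeat(string str, int times)
    {
        if (string.IsNullOrEmpty(str) || times <= 0) return string.Empty;

        var buffer = new char[str.Length * times];
        str.CopyTo(0, buffer, 0, str.Length);

        // 'filled' is the number of valid characters at the start of the buffer.
        int filled = str.Length;
        while (filled < buffer.Length)
        {
            // Copy as much of the filled prefix as still fits. This doubles the
            // filled part on every pass except possibly the last one.
            int count = Math.Min(filled, buffer.Length - filled);
            Array.Copy(buffer, 0, buffer, filled, count);
            filled += count;
        }
        return new string(buffer);
    }
}
```

The number of Array.Copy calls still grows logarithmically with the number of repeats, while the total number of characters copied remains proportional to the length of the result.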
Another approach is to consider string as IEnumerable<char> and have a generic extension method which will multiply the items in a collection by the specified factor.
public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
{
source = source.ToArray();
return Enumerable.Range(0, times).SelectMany(_ => source);
}
So in your case:
string indent = "---";
var f = string.Concat(indent.Repeat(0)); //.NET 4 required
//or
var g = new string(indent.Repeat(5).ToArray());
Not sure how this would perform, but it's an easy piece of code. (I have probably made it appear more complicated than it is.)
int indentCount = 3;
string indent = "---";
string stringToBeIndented = "Blah";
// Need dummy char NOT in stringToBeIndented - vertical tab, anyone?
char dummy = '\v';
stringToBeIndented.PadLeft(stringToBeIndented.Length + indentCount, dummy).Replace(dummy.ToString(), indent);
Alternatively, if you know the maximum number of levels you can expect, you could just declare an array and index into it. You would probably want to make this array static or a constant.
string[] indents = new string[] { "", indent, indent.Replace("-", "--"), indent.Replace("-", "---"), indent.Replace("-", "----") };
output = indents[indentCount] + stringToBeIndented;
I don't have enough rep to comment on Adam's answer, but the best way to do it imo is like this:
public static string RepeatString(string content, int numTimes) {
if(!string.IsNullOrEmpty(content) && numTimes > 0) {
StringBuilder builder = new StringBuilder(content.Length * numTimes);
for(int i = 0; i < numTimes; i++) builder.Append(content);
return builder.ToString();
}
return string.Empty;
}
You must check that numTimes is greater than zero, otherwise you will get an exception.
Using the new string.Create function, we can pre-allocate the right size and copy a single string in a loop using Span<char>.
I suspect this is likely to be the fastest method, as there is no extra allocation at all: the string is precisely allocated.
public static string Repeat(this string source, int times)
{
return string.Create(source.Length * times, source, RepeatFromString);
}
private static void RepeatFromString(Span<char> result, string source)
{
ReadOnlySpan<char> sourceSpan = source.AsSpan();
for (var i = 0; i < result.Length; i += sourceSpan.Length)
sourceSpan.CopyTo(result.Slice(i, sourceSpan.Length));
}
I didn't see this solution. I find it simpler for where I currently am in software development:
public static void PrintFigure(int shapeSize)
{
string figure = "\\/";
for (int loopTwo = 1; loopTwo <= shapeSize - 1; loopTwo++)
{
Console.Write($"{figure}");
}
}
You can create an ExtensionMethod to do that!
public static class StringExtension
{
public static string Repeat(this string str, int count)
{
string ret = "";
for (var x = 0; x < count; x++)
{
ret += str;
}
return ret;
}
}
Or using @Dan Tao's solution:
public static class StringExtension
{
public static string Repeat(this string str, int count)
{
if (count == 0)
return "";
return string.Concat(Enumerable.Repeat(str, count));
}
}
In my test I created a string with 32000 characters.
After repeated execution of the test the BCL StringReader consistently executed in 350us while mine ran in 400us. What kind of secrets are they hiding?
Test:
private void SpeedTest()
{
String r = "";
for (int i = 0; i < 1000; i++)
{
r += Randomization.GenerateString();
}
Stopwatch s = new Stopwatch(); // System.Diagnostics.Stopwatch
s.Start();
using (var sr = new System.IO.StringReader(r))
{
while (sr.Peek() > -1)
{
sr.Read();
}
}
s.Stop();
_Write(s.Elapsed);
s.Reset();
s.Start();
using (var sr = new MagicSynthesis.StringReader(r))
{
while (sr.PeekNext() > Char.MinValue)
{
sr.Next();
}
}
s.Stop();
_Write(s.Elapsed);
}
Code:
public unsafe class StringReader : IDisposable
{
private Char* Base;
private Char* End;
private Char* Current;
private const Char Null = '\0';
/// <summary></summary>
public StringReader(String s)
{
if (s == null)
throw new ArgumentNullException("s");
Base = (Char*)Marshal.StringToHGlobalUni(s).ToPointer();
End = (Base + s.Length);
Current = Base;
}
/// <summary></summary>
public Char Next()
{
return (Current < End) ? *(Current++) : Null;
}
/// <summary></summary>
public String Next(Int32 length)
{
String s = String.Empty;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
return s;
}
/// <summary></summary>
public Char PeekNext()
{
return *(Current);
}
/// <summary></summary>
public String PeekNext(Int32 length)
{
String s = String.Empty;
Char* a = Current;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
Current = a;
return s;
}
/// <summary></summary>
public Char Previous()
{
return ((Current > Base) ? *(--Current) : Null);
}
/// <summary></summary>
public Char PeekPrevious()
{
return ((Current > Base) ? *(Current - 1) : Null);
}
/// <summary></summary>
public void Dispose()
{
Marshal.FreeHGlobal(new IntPtr(Base));
}
}
I would bet that Marshal.StringToHGlobalUni() and Marshal.FreeHGlobal(new IntPtr(Base)) have a lot to do with the differences. I'm not sure how StringReader manages the string, but I bet it's not copying it to unmanaged memory.
Looking at the StringReader.Read() method in Reflector shows this:
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
The constructor is also just:
public StringReader(string s)
{
if (s == null)
{
throw new ArgumentNullException("s");
}
this._s = s;
this._length = (s == null) ? 0 : s.Length;
}
So it appears that StringReader just maintains the current position and uses regular indexing to return values.
Edit
In response to your comment, your Next() method does a comparison and an unsafe pointer access, which probably isn't optimized in any way. StringReader.Read() does a simple comparison and returns the character at the _pos index in the string, which the compiler probably optimizes to some degree.
Maybe Reflector would help you find your answer?
You can always look at the source code
Couldn't tell after simply looking at your code, but here's the code for StringReader.Read():
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
They've got two simple value checks and an array access plus increment, versus your value check and pointer increment. Perhaps it would be useful to look at the IL and see how many ops each compiles down to.
Have you tried profiling your StringReader to see if there are any obvious places where you could save time? This is the most reliable way to determine what the bottlenecks in your code are.
Normally I would suggest profiling your solution against the other but I'm not sure about the viability of profiling the BCL. It's GAC'd and strongly signed which makes instrumentation difficult so you would have to rely on sampling.
I know we can append strings using StringBuilder. Is there a way we can prepend strings (i.e. add strings in front of a string) using StringBuilder so we can keep the performance benefits that StringBuilder offers?
Using the insert method with the position parameter set to 0 would be the same as prepending (i.e. inserting at the beginning).
C# example : varStringBuilder.Insert(0, "someThing");
Java example : varStringBuilder.insert(0, "someThing");
It works both for C# and Java
Prepending a String will usually require copying everything after the insertion point back some in the backing array, so it won't be as quick as appending to the end.
But you can do it like this in Java (in C# it's the same, but the method is called Insert):
aStringBuilder.insert(0, "newText");
If you require high performance with lots of prepends, you'll need to write your own version of StringBuilder (or use someone else's). With the standard StringBuilder (although technically it could be implemented differently), insert requires copying the data after the insertion point. Inserting n pieces of text can take O(n^2) time.
A naive approach would be to add an offset into the backing char[] buffer as well as the length. When there is not enough room for a prepend, move the data up by more than is strictly necessary. This can bring performance back down to O(n log n) (I think). A more refined approach is to make the buffer cyclic. In that way the spare space at both ends of the array becomes contiguous.
Here's what you can do if you want to prepend using Java's StringBuilder class:
StringBuilder str = new StringBuilder();
str.insert(0, "text");
You could try an extension method:
/// <summary>
/// kind of a dopey little one-off for StringBuilder, but
/// an example where you can get crazy with extension methods
/// </summary>
public static void Prepend(this StringBuilder sb, string s)
{
sb.Insert(0, s);
}
StringBuilder sb = new StringBuilder("World!");
sb.Prepend("Hello "); // Hello World!
You could build the string in reverse and then reverse the result.
You incur an O(n) cost instead of an O(n^2) worst case cost.
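One way to read the "build in reverse" suggestion is to collect the pieces in the order they would have been prepended and then append them back to front. The sketch below assumes exactly that reading; ReversePrepend and PrependAll are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ReversePrepend
{
    // Hypothetical helper: 'pieces' lists the strings in the order in which
    // they would have been prepended to an initially empty builder.
    public static string PrependAll(IList<string> pieces)
    {
        int total = 0;
        foreach (string piece in pieces) total += piece.Length;

        var sb = new StringBuilder(total);
        // The piece prepended last ends up first, so walk the list backwards
        // and use cheap appends instead of Insert(0, ...).
        for (int i = pieces.Count - 1; i >= 0; i--)
        {
            sb.Append(pieces[i]);
        }
        return sb.ToString();
    }
}
```

For example, PrependAll(new[] { "World!", "Hello " }) gives "Hello World!", and every character is copied into the builder only once.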
If I understand you correctly, the insert method looks like it'll do what you want. Just insert the string at offset 0.
I haven't used it, but Ropes for Java sounds intriguing. The project name is a play on words: use a Rope instead of a String for serious work. It gets around the performance penalty for prepending and other operations. Worth a look if you're going to be doing a lot of this.
A rope is a high performance replacement for Strings. The datastructure, described in detail in "Ropes: an Alternative to Strings", provides asymptotically better performance than both String and StringBuffer for common string modifications like prepend, append, delete, and insert. Like Strings, ropes are immutable and therefore well-suited for use in multi-threaded programming.
Try using Insert()
StringBuilder MyStringBuilder = new StringBuilder("World!");
MyStringBuilder.Insert(0,"Hello "); // Hello World!
You could create an extension for StringBuilder yourself with a simple class:
namespace Application.Code.Helpers
{
public static class StringBuilderExtensions
{
#region Methods
public static void Prepend(this StringBuilder sb, string value)
{
sb.Insert(0, value);
}
public static void PrependLine(this StringBuilder sb, string value)
{
sb.Insert(0, value + Environment.NewLine);
}
#endregion
}
}
Then, just add:
using Application.Code.Helpers;
To the top of any class that you want to use the StringBuilder in and any time you use intelli-sense with a StringBuilder variable, the Prepend and PrependLine methods will show up. Just remember that when you use Prepend, you will need to Prepend in reverse order than if you were Appending.
Judging from the other comments, there's no standard quick way of doing this. Using StringBuilder's .Insert(0, "text") is approximately only 1-3x as fast as using painfully slow String concatenation (based on >10000 concats), so below is a class to prepend potentially thousands of times quicker!
I've included some other basic functionality such as append(), subString() and length() etc. Both appends and prepends vary from about twice as fast to 3x slower than StringBuilder appends. Like StringBuilder, the buffer in this class will automatically increase when the text overflows the old buffer size.
The code has been tested quite a lot, but I can't guarantee it's free of bugs.
class Prepender
{
private char[] c;
private int growMultiplier;
public int bufferSize; // Make public for bug testing
public int left; // Make public for bug testing
public int right; // Make public for bug testing
public Prepender(int initialBuffer = 1000, int growMultiplier = 10)
{
c = new char[initialBuffer];
//for (int n = 0; n < initialBuffer; n++) cc[n] = '.'; // For debugging purposes (used fixed width font for testing)
left = initialBuffer / 2;
right = initialBuffer / 2;
bufferSize = initialBuffer;
this.growMultiplier = growMultiplier;
}
public void clear()
{
left = bufferSize / 2;
right = bufferSize / 2;
}
public int length()
{
return right - left;
}
private void increaseBuffer()
{
int nudge = -bufferSize / 2;
bufferSize *= growMultiplier;
nudge += bufferSize / 2;
char[] tmp = new char[bufferSize];
for (int n = left; n < right; n++) tmp[n + nudge] = c[n];
left += nudge;
right += nudge;
c = new char[bufferSize];
//for (int n = 0; n < buffer; n++) cc[n]='.'; // For debugging purposes (used fixed width font for testing)
for (int n = left; n < right; n++) c[n] = tmp[n];
}
public void append(string s)
{
// If necessary, increase buffer size by growMultiplier
while (right + s.Length > bufferSize) increaseBuffer();
// Append user input to buffer
int len = s.Length;
for (int n = 0; n < len; n++)
{
c[right] = s[n];
right++;
}
}
public void prepend(string s)
{
// If necessary, increase buffer size by growMultiplier
while (left - s.Length < 0) increaseBuffer();
// Prepend user input to buffer
int len = s.Length - 1;
for (int n = len; n > -1; n--)
{
left--;
c[left] = s[n];
}
}
public void truncate(int start, int finish)
{
if (start < 0) throw new Exception("Truncation error: Start < 0");
if (left + finish > right) throw new Exception("Truncation error: Finish > string length");
if (finish < start) throw new Exception("Truncation error: Finish < start");
//MessageBox.Show(left + " " + right);
right = left + finish;
left = left + start;
}
public string subString(int start, int finish)
{
if (start < 0) throw new Exception("Substring error: Start < 0");
if (left + finish > right) throw new Exception("Substring error: Finish > string length");
if (finish < start) throw new Exception("Substring error: Finish < start");
return toString(start,finish);
}
public override string ToString()
{
return new string(c, left, right - left);
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
private string toString(int start, int finish)
{
return new string(c, left+start, finish-start );
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
}
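This works only if aStringBuilder is actually a plain string; with a real StringBuilder it does not compile, because the concatenation produces a string: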
aStringBuilder = "newText" + aStringBuilder;
An example (it might not be realistic, but it makes my point):
public void StreamInfo(StreamReader p)
{
string info = string.Format(
"The supplied streamreader read : {0}\n at line {1}",
p.ReadLine(),
p.GetLinePosition()-1);
}
GetLinePosition here is an imaginary extension method of streamreader.
Is this possible?
Of course I could keep count myself but that's not the question.
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
public static class StreamReaderExtensions
{
readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
public static long GetPosition(this StreamReader reader)
{
// shift position back from BaseStream.Position by the number of bytes read
// into internal buffer.
int byteLen = (int)byteLenField.GetValue(reader);
var position = reader.BaseStream.Position - byteLen;
// if we have consumed chars from the buffer we need to calculate how many
// bytes they represent in the current encoding and add that to the position.
int charPos = (int)charPosField.GetValue(reader);
if (charPos > 0)
{
var charBuffer = (char[])charBufferField.GetValue(reader);
var encoding = reader.CurrentEncoding;
var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
position += bytesConsumed;
}
return position;
}
public static void SetPosition(this StreamReader reader, long position)
{
reader.DiscardBufferedData();
reader.BaseStream.Seek(position, SeekOrigin.Begin);
}
}
This works quite well for me and, depending on your tolerance for using reflection, I think it is a fairly simple solution.
Caveats:
While I have done some simple testing using various System.Text.Encoding options, pretty much all of the data I consume with this are simple text files (ASCII).
I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read that data, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
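Keeping track of it yourself can be as small as a counter next to the ReadLine() loop. A minimal sketch, assuming hypothetical names (LineCounting, ReadWithLineNumbers) and 1-based line numbers:

```csharp
using System;
using System.IO;

public static class LineCounting
{
    // Hypothetical helper: calls 'handle' with the 1-based line number and the
    // text of every line, and returns the total number of lines read.
    public static int ReadWithLineNumbers(TextReader reader, Action<int, string> handle)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            handle(lineNumber, line);
        }
        return lineNumber;
    }
}
```

Because the count is tied to ReadLine() calls, it is only meaningful as long as nothing seeks the underlying stream in between.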
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
private TextReader _inner;
public PositioningReader(TextReader inner) {
_inner = inner;
}
public override void Close() {
_inner.Close();
}
public override int Peek() {
return _inner.Peek();
}
public override int Read() {
var c = _inner.Read();
if (c >= 0)
AdvancePosition((Char)c);
return c;
}
private int _linePos = 0;
public int LinePos { get { return _linePos; } }
private int _charPos = 0;
public int CharPos { get { return _charPos; } }
private int _matched = 0;
private void AdvancePosition(Char c) {
if (Environment.NewLine[_matched] == c) {
_matched++;
if (_matched == Environment.NewLine.Length) {
_linePos++;
_charPos = 0;
_matched = 0;
}
}
else {
_matched = 0;
_charPos++;
}
}
}
Drawbacks (for the sake of brevity):
Does not check constructor argument for null
Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
overriding those methods via routing calls to _inner. instead of base.
passing the characters read to the AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
var readCount = _inner.ReadBlock(buffer, index, count);
for (int i = 0; i < readCount; i++)
AdvancePosition(buffer[index + i]);
return readCount;
}
No.
Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on?
Should it just keep a number of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy that implemented a StreamReader with ReadLine() method that registers file position.
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[#for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
_lineLength = 0;
//if (stream == null)
// __Error.ReaderClosed();
if (charPos == charLen)
{
if (ReadBuffer() == 0) return null;
}
StringBuilder sb = null;
do
{
int i = charPos;
do
{
char ch = charBuffer[i];
int EolChars = 0;
if (ch == '\r' || ch == '\n')
{
EolChars = 1;
String s;
if (sb != null)
{
sb.Append(charBuffer, charPos, i - charPos);
s = sb.ToString();
}
else
{
s = new String(charBuffer, charPos, i - charPos);
}
charPos = i + 1;
if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
{
if (charBuffer[charPos] == '\n')
{
charPos++;
EolChars = 2;
}
}
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + _lineLength;
return s;
}
i++;
} while (i < charLen);
i = charLen - charPos;
if (sb == null) sb = new StringBuilder(i + 80);
sb.Append(charBuffer, charPos, i);
} while (ReadBuffer() > 0);
string ss = sb.ToString();
_lineLength = ss.Length;
_bytesRead = _bytesRead + _lineLength;
return ss;
}
I think there is a minor bug in the code: the length of the string is used to calculate the file position instead of the actual number of bytes read, so the position will be wrong for UTF-8 and UTF-16 encoded files that contain multi-byte characters.
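One way to get closer to a byte position would be to measure each line in the file's encoding instead of in chars. The helper below is a sketch only: ByteCounting and EncodedLength are hypothetical names, the caller has to know which terminator was actually consumed, and a byte order mark at the start of the file is not accounted for.

```csharp
using System;
using System.Text;

public static class ByteCounting
{
    // Bytes a line occupies in the file: the text plus its terminator,
    // both measured in the given encoding rather than in chars.
    public static int EncodedLength(string line, string terminator, Encoding encoding)
    {
        return encoding.GetByteCount(line) + encoding.GetByteCount(terminator);
    }
}
```

For example, a line holding the single character "\u00e9" followed by "\r\n" is 3 chars long, but takes 4 bytes in UTF-8 and 6 bytes in UTF-16.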
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader:
class CountingReader : StreamReader {
private int _lineNumber = 0;
public int LineNumber { get { return _lineNumber; } }
public CountingReader(Stream stream) : base(stream) { }
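public override string ReadLine() {
string line = base.ReadLine();
// Only count lines that were actually read; ReadLine() returns null at end of stream.
if (line != null) _lineNumber++;
return line;
}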
}
and then you create it the normal way, say from a FileInfo object named file:
CountingReader reader = new CountingReader(file.OpenRead());
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track the position in the text regardless of whether you read one char at a time, one buffer at a time, or one line at a time. While I'm sure this could be made to perform rather better by abandoning this approach, it made the class much easier to implement and, I hope, the code easier to follow.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
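For reference, a comparison along these lines can be timed with Stopwatch. This is a rough sketch rather than the exact test I ran: ReadLineTiming and MegabytesPerSecond are names made up for illustration, and it does nothing to defeat disk caching, so it has the same weakness described above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadLineTiming
{
    // Reads every line through the supplied reader and returns the
    // throughput in MB/s, based on the file's size on disk.
    public static double MegabytesPerSecond(string path, Func<string, TextReader> open)
    {
        long bytes = new FileInfo(path).Length;
        Stopwatch watch = Stopwatch.StartNew();
        using (TextReader reader = open(path))
        {
            while (reader.ReadLine() != null) { }
        }
        watch.Stop();
        return bytes / (1024.0 * 1024.0) / watch.Elapsed.TotalSeconds;
    }
}
```

The same file can then be measured with both readers, for example with p => new StreamReader(p) and p => new StreamReaderEx(p) as the second argument.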
The NewLine symbol defaults to "\r\n" (the value of Environment.NewLine on Windows) but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) if it is.
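If you want something to compile against, a stand-in for Guard could look like the sketch below. It is a guess that covers only the two calls used by the reader. NotNull follows the description above; for Range, the inclusive bounds and the exception type are assumptions, not the original utility's behavior.

```csharp
using System;

// Hypothetical stand-in for the omitted Guard class.
public static class Guard
{
    // Throws ArgumentNullException naming the offending parameter.
    public static void NotNull(object value, string paramName)
    {
        if (value == null)
            throw new ArgumentNullException(paramName);
    }

    // Assumed semantics: min and max are both inclusive.
    public static void Range(int value, int min, int max, string message)
    {
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException("value", value, message);
    }
}
```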
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
// NewLine characters (magic value -1: "not used").
int newLine1, newLine2;
// The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
bool insideNewLine;
// StringBuilder used for ReadLine implementation.
StringBuilder lineBuilder = new StringBuilder();
public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
{
init(newLine);
}
public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
{
init(newLine);
}
public string NewLine
{
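// Skip newLine2 when it holds the "not used" marker (-1); casting -1 to char would yield '\uffff'.
get { return newLine2 < 0 ? ((char)newLine1).ToString() : "" + (char)newLine1 + (char)newLine2; }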
private set
{
Guard.NotNull(value, "value");
Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
newLine1 = value[0];
newLine2 = (value.Length == 2 ? value[1] : -1);
}
}
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public override int Read()
{
int next = base.Read();
trackTextPosition(next);
return next;
}
public override int Read(char[] buffer, int index, int count)
{
int n = base.Read(buffer, index, count);
for (int i = 0; i