Why is my StringReader 50us slower than the .NET StringReader? - c#

In my test I created a string with 32000 characters.
After repeated execution of the test the BCL StringReader consistently executed in 350us while mine ran in 400us. What kind of secrets are they hiding?
Test:
private void SpeedTest()
{
String r = "";
for (int i = 0; i < 1000; i++)
{
r += Randomization.GenerateString();
}
StopWatch s = new StopWatch();
s.Start();
using (var sr = new System.IO.StringReader(r))
{
while (sr.Peek() > -1)
{
sr.Read();
}
}
s.Stop();
_Write(s.Elapsed);
s.Reset();
s.Start();
using (var sr = new MagicSynthesis.StringReader(r))
{
while (sr.PeekNext() > Char.MinValue)
{
sr.Next();
}
}
s.Stop();
_Write(s.Elapsed);
}
Code:
public unsafe class StringReader : IDisposable
{
private Char* Base;
private Char* End;
private Char* Current;
private const Char Null = '\0';
/// <summary></summary>
public StringReader(String s)
{
if (s == null)
throw new ArgumentNullException("s");
Base = (Char*)Marshal.StringToHGlobalUni(s).ToPointer();
End = (Base + s.Length);
Current = Base;
}
/// <summary></summary>
public Char Next()
{
return (Current < End) ? *(Current++) : Null;
}
/// <summary></summary>
public String Next(Int32 length)
{
String s = String.Empty;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
return s;
}
/// <summary></summary>
public Char PeekNext()
{
return *(Current);
}
/// <summary></summary>
public String PeekNext(Int32 length)
{
String s = String.Empty;
Char* a = Current;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
Current = a;
return s;
}
/// <summary></summary>
public Char Previous()
{
return ((Current > Base) ? *(--Current) : Null);
}
/// <summary></summary>
public Char PeekPrevious()
{
return ((Current > Base) ? *(Current - 1) : Null);
}
/// <summary></summary>
public void Dispose()
{
Marshal.FreeHGlobal(new IntPtr(Base));
}
}

I would bet that Marshal.StringToHGlobalUni() and Marshal.FreeHGlobal(new IntPtr(Base)) have a lot to do with the differences. I'm not sure how StringReader manages the string, but I bet it's not copying it to unmanaged memory.
Looking at the StringReader.Read() method in Reflector shows this:
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
The contructor is also just:
public StringReader(string s)
{
if (s == null)
{
throw new ArgumentNullException("s");
}
this._s = s;
this._length = (s == null) ? 0 : s.Length;
}
So, it appear that StringReader just maintains the current position and uses regular indexes to return values.
Edit
In response to your comment, your Next() method does a comparison and an unsafe cast, which probably isn't optimized in any way. StringReader.Read() does simple comparison and returns the character as _pos index in the string, which probably has some optimization by the compiler.

Maybe Reflector would help you find your answer?

You can always look at the source code

Couldn't tell after simply looking at your code, but here's the code for StringReader.Read():
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
They've got two simple value checks and an array access plus increment, versus your value check and pointer increment. Perhaps it would be useful to look at the IL and see how many ops each compiles down to.

Have you tried profiling your StringReader to see if there are any obvious places where you could save time? This is the most reliable way to determine what the bottlenecks in your code are.
Normally I would suggest profiling your solution against the other but I'm not sure about the viability of profiling the BCL. It's GAC'd and strongly signed which makes instrumentation difficult so you would have to rely on sampling.

Related

Any alternative in C# similar to Java's nextInt() to extract int from string, without using Split()? [duplicate]

In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
If you had a string that say had:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#' ish way to do this?
I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
string currentWord;
public Scanner(string source) : base(source)
{
readNextWord();
}
private void readNextWord()
{
System.Text.StringBuilder sb = new StringBuilder();
char nextChar;
int next;
do
{
next = this.Read();
if (next < 0)
break;
nextChar = (char)next;
if (char.IsWhiteSpace(nextChar))
break;
sb.Append(nextChar);
} while (true);
while((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
this.Read();
if (sb.Length > 0)
currentWord = sb.ToString();
else
currentWord = null;
}
public bool hasNextInt()
{
if (currentWord == null)
return false;
int dummy;
return int.TryParse(currentWord, out dummy);
}
public int nextInt()
{
try
{
return int.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNextDouble()
{
if (currentWord == null)
return false;
double dummy;
return double.TryParse(currentWord, out dummy);
}
public double nextDouble()
{
try
{
return double.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNext()
{
return currentWord != null;
}
}
Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using(var reader = new PacketReader("1 23 ErrorOk StringValue 15.22")
{
var index = reader.ReadNext<int>();
var count = reader.ReadNext<int>();
var result = reader.ReadNext<ErrorEnum>();
var data = reader.ReadNext<string>();
var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
public PacketReader(string s)
: base(s)
{
}
public T ReadNext<T>() where T : IConvertible
{
var sb = new StringBuilder();
do
{
var current = Read();
if (current < 0)
break;
sb.Append((char)current);
var next = (char)Peek();
if (char.IsWhiteSpace(next))
break;
} while (true);
var value = sb.ToString();
var type = typeof(T);
if (type.IsEnum)
return (T)Enum.Parse(type, value);
return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
}
}
While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in this end, this splits the string, then converts each element into an int and populates an int[] with the resulting values.
To my knowledge, there are no built in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current);
}
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
var scanner = args.Select<string, Func<Type, Object>>((string s) => {
return (Type t) =>
((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
}).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current(typeof(int)));
}
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
These both require the latest versions of C# and the .NET framework.
You could use linq to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
text.Where(i => char.IsNumber(i)).Write(); // do somthing usefull here...
I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
if (validateRequired)
{
string[] split = input.Split();
List<int> result = new List<int>(split.Length);
int parsed;
for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
{
if (int.TryParse(split[inputIdx], out parsed))
result.Add(parsed);
}
return result.ToArray();
}
else
return (from i in input.Split()
select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).

Is it required to check before replacing a string in StringBuilder (using functions like "Contains" or "IndexOf")?

Is there any method IndexOf or Contains in C#. Below is the code:
var sb = new StringBuilder(mystring);
sb.Replace("abc", "a");
string dateFormatString = sb.ToString();
if (sb.ToString().Contains("def"))
{
sb.Replace("def", "aa");
}
if (sb.ToString().Contains("ghi"))
{
sb.Replace("ghi", "assd");
}
As you might have noticed I am using ToString() above again and again which I want to avoid as it is creating new string everytime. Can you help me how can I avoid it?
If the StringBuilder doesn't contain "def" then performing the replacement won't cause any problems, so just use:
var sb = new StringBuilder(mystring);
sb.Replace("abc", "a");
sb.Replace("def", "aa");
sb.Replace("ghi", "assd");
There's no such method in StringBuilder but you don't need the Contains tests. You can simply write it like this:
sb.Replace("abc", "a");
sb.Replace("def", "aa");
sb.Replace("ghi", "assd");
If the string in the first parameter to Replace is not found then the call to Replace is a null operation—exactly what you want.
The documentation states:
Replaces all occurrences of a specified string in this instance with another specified string.
The way you read this is that when there are no occurrences, nothing is done.
You can write a class that extends methods to the StringBuilder object. Here, I have added IndexOf, Substring, and other methods to the StringBuilder class. Just put this class in your project.
using System;
using System.Text;
namespace Helpers
{
/// <summary>
/// Adds IndexOf, IsStringAt, AreEqual, and Substring to all StringBuilder objects.
/// </summary>
public static class StringBuilderExtension
{
// Adds IndexOf, Substring, AreEqual to the StringBuilder class.
public static int IndexOf(this StringBuilder theStringBuilder,string value)
{
const int NOT_FOUND = -1;
if (theStringBuilder == null)
{
return NOT_FOUND;
}
if (String.IsNullOrEmpty(value))
{
return NOT_FOUND;
}
int count = theStringBuilder.Length;
int len = value.Length;
if (count < len)
{
return NOT_FOUND;
}
int loopEnd = count - len + 1;
for (int loop = 0; loop < loopEnd; loop++)
{
bool found = true;
for (int innerLoop = 0; innerLoop < len; innerLoop++)
{
if (theStringBuilder[loop + innerLoop] != value[innerLoop])
{
found = false;
break;
}
}
if (found)
{
return loop;
}
}
return NOT_FOUND;
}
public static int IndexOf(this StringBuilder theStringBuilder, string value,int startPosition)
{
const int NOT_FOUND = -1;
if (theStringBuilder == null)
{
return NOT_FOUND;
}
if (String.IsNullOrEmpty(value))
{
return NOT_FOUND;
}
int count = theStringBuilder.Length;
int len = value.Length;
if (count < len)
{
return NOT_FOUND;
}
int loopEnd = count - len + 1;
if (startPosition >= loopEnd)
{
return NOT_FOUND;
}
for (int loop = startPosition; loop < loopEnd; loop++)
{
bool found = true;
for (int innerLoop = 0; innerLoop < len; innerLoop++)
{
if (theStringBuilder[loop + innerLoop] != value[innerLoop])
{
found = false;
break;
}
}
if (found)
{
return loop;
}
}
return NOT_FOUND;
}
public static string Substring(this StringBuilder theStringBuilder, int startIndex, int length)
{
return theStringBuilder == null ? null : theStringBuilder.ToString(startIndex, length);
}
public static bool AreEqual(this StringBuilder theStringBuilder, string compareString)
{
if (theStringBuilder == null)
{
return compareString == null;
}
if (compareString == null)
{
return false;
}
int len = theStringBuilder.Length;
if (len != compareString.Length)
{
return false;
}
for (int loop = 0; loop < len; loop++)
{
if (theStringBuilder[loop] != compareString[loop])
{
return false;
}
}
return true;
}
/// <summary>
/// Compares one string to part of another string.
/// </summary>
/// <param name="haystack"></param>
/// <param name="needle">Needle to look for</param>
/// <param name="position">Looks to see if the needle is at position in haystack</param>
/// <returns>Substring(theStringBuilder,offset,compareString.Length) == compareString</returns>
public static bool IsStringAt(this StringBuilder haystack, string needle,int position)
{
if (haystack == null)
{
return needle == null;
}
if (needle == null)
{
return false;
}
int len = haystack.Length;
int compareLen = needle.Length;
if (len < compareLen + position)
{
return false;
}
for (int loop = 0; loop < compareLen; loop++)
{
if (haystack[loop+position] != needle[loop])
{
return false;
}
}
return true;
}
}
}
IMHO you don't have to use StringBuilder in this case... StringBuilder is more useful when used in a loop. Like Microsoft say in In this article
The String object is immutable. Every
time you use one of the methods in the
System.String class, you create a new
string object in memory, which
requires a new allocation of space for
that new object. In situations where
you need to perform repeated
modifications to a string, the
overhead associated with creating a
new String object can be costly. The
System.Text.StringBuilder class can be
used when you want to modify a string
without creating a new object. For
example, using the StringBuilder class
can boost performance when
concatenating many strings together in
a loop
So simply you can use String and avoid use ToString()...

Best implementation for an isNumber(string) method

In my limited experience, I've been on several projects that have had some sort of string utility class with methods to determine if a given string is a number. The idea has always been the same, however, the implementation has been different. Some surround a parse attempt with try/catch
public boolean isInteger(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {}
return false;
}
and others match with regex
public boolean isInteger(String str) {
return str.matches("^-?[0-9]+(\\.[0-9]+)?$");
}
Is one of these methods better than the other? I personally prefer using the regex approach, as it's concise, but will it perform on par if called while iterating over, say, a list of a several hundred thousand strings?
Note: As I'm kinda new to the site I don't fully understand this Community Wiki business, so if this belongs there let me know, and I'll gladly move it.
EDIT:
With all the TryParse suggestions I ported Asaph's benchmark code (thanks for a great post!) to C# and added a TryParse method. And as it seems, the TryParse wins hands down. However, the try catch approach took a crazy amount of time. To the point of me thinking I did something wrong! I also updated regex to handle negatives and decimal points.
Results for updated, C# benchmark code:
00:00:51.7390000 for isIntegerParseInt
00:00:03.9110000 for isIntegerRegex
00:00:00.3500000 for isIntegerTryParse
Using:
static bool isIntegerParseInt(string str) {
try {
int.Parse(str);
return true;
} catch (FormatException e){}
return false;
}
static bool isIntegerRegex(string str) {
return Regex.Match(str, "^-?[0-9]+(\\.[0-9]+)?$").Success;
}
static bool isIntegerTryParse(string str) {
int bob;
return Int32.TryParse(str, out bob);
}
I just ran some benchmarks on the performance of these 2 methods (On Macbook Pro OSX Leopard Java 6). ParseInt is faster. Here is the output:
This operation took 1562 ms.
This operation took 2251 ms.
And here is my benchmark code:
public class IsIntegerPerformanceTest {
public static boolean isIntegerParseInt(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {}
return false;
}
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
public static void main(String[] args) {
long starttime, endtime;
int iterations = 1000000;
starttime = System.currentTimeMillis();
for (int i=0; i<iterations; i++) {
isIntegerParseInt("123");
isIntegerParseInt("not an int");
isIntegerParseInt("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i=0; i<iterations; i++) {
isIntegerRegex("123");
isIntegerRegex("not an int");
isIntegerRegex("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took " + (endtime - starttime) + " ms.");
}
}
Also, note that your regex will reject negative numbers and the parseInt method will accept them.
Here is our way of doing this:
public boolean isNumeric(String string) throws IllegalArgumentException
{
boolean isnumeric = false;
if (string != null && !string.equals(""))
{
isnumeric = true;
char chars[] = string.toCharArray();
for(int d = 0; d < chars.length; d++)
{
isnumeric &= Character.isDigit(chars[d]);
if(!isnumeric)
break;
}
}
return isnumeric;
}
If absolute performance is key, and if you are just checking for integers (not floating point numbers) I suspect that iterating over each character in the string, returning false if you encounter something not in the range 0-9, will be fastest.
RegEx is a more general-purpose solution so will probably not perform as fast for that special case. A solution that throws an exception will have some extra overhead in that case. TryParse will be slightly slower if you don't actually care about the value of the number, just whether or not it is a number, since the conversion to a number must also take place.
For anything but an inner loop that's called many times, the differences between all of these options should be insignificant.
I needed to refactor code like yours to get rid of NumberFormatException. The refactored Code:
public static Integer parseInteger(final String str) {
if (str == null || str.isEmpty()) {
return null;
}
final Scanner sc = new Scanner(str);
return Integer.valueOf(sc.nextInt());
}
As a Java 1.4 guy, I didn't know about java.util.Scanner. I found this interesting article:
http://rosettacode.org/wiki/Determine_if_a_string_is_numeric#Java
I personaly liked the solution with the scanner, very compact and still readable.
Some languages, like C#, have a TryParse (or equivalent) that works fairly well for something like this.
public boolean IsInteger(string value)
{
int i;
return Int32.TryParse(value, i);
}
Personally I would do this if you really want to simplify it.
public boolean isInteger(string myValue)
{
int myIntValue;
return int.TryParse(myValue, myIntValue)
}
You could create an extension method for a string, and make the whole process look cleaner...
public static bool IsInt(this string str)
{
int i;
return int.TryParse(str, out i);
}
You could then do the following in your actual code...
if(myString.IsInt())....
Using .NET, you could do something like:
private bool isNumber(string str)
{
return str.Any(c => !char.IsDigit(c));
}
public static boolean CheckString(String myString) {
char[] digits;
digits = myString.toCharArray();
for (char div : digits) {// for each element div of type char in the digits collection (digits is a collection containing div elements).
try {
Double.parseDouble(myString);
System.out.println("All are numbers");
return true;
} catch (NumberFormatException e) {
if (Character.isDigit(div)) {
System.out.println("Not all are chars");
return false;
}
}
}
System.out.println("All are chars");
return true;
}
That's my implementation to check whether a string is made of digits:
public static boolean isNumeric(String string)
{
if (string == null)
{
throw new NullPointerException("The string must not be null!");
}
final int len = string.length();
if (len == 0)
{
return false;
}
for (int i = 0; i < len; ++i)
{
if (!Character.isDigit(string.charAt(i)))
{
return false;
}
}
return true;
}
I like code:
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
But it will good more when create Pattern before use it:
public static Pattern patternInteger = Pattern.compile("^[0-9]+$");
public static boolean isIntegerRegex(String str) {
return patternInteger.matcher(str).matches();
}
Apply by test we have result:
This operation isIntegerParseInt took 1313 ms.
This operation isIntegerRegex took 1178 ms.
This operation isIntegerRegexNew took 304 ms.
With:
public class IsIntegerPerformanceTest {
private static Pattern pattern = Pattern.compile("^[0-9]+$");
public static boolean isIntegerParseInt(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {
}
return false;
}
public static boolean isIntegerRegexNew(String str) {
return pattern.matcher(str).matches();
}
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
public static void main(String[] args) {
long starttime, endtime;
int iterations = 1000000;
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerParseInt("123");
isIntegerParseInt("not an int");
isIntegerParseInt("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation isIntegerParseInt took " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerRegex("123");
isIntegerRegex("not an int");
isIntegerRegex("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took isIntegerRegex " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerRegexNew("123");
isIntegerRegexNew("not an int");
isIntegerRegexNew("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took isIntegerRegexNew " + (endtime - starttime) + " ms.");
}
}
I think It could be faster than previous solutions if you do the following (Java):
public final static boolean isInteger(String in)
{
char c;
int length = in.length();
boolean ret = length > 0;
int i = ret && in.charAt(0) == '-' ? 1 : 0;
for (; ret && i < length; i++)
{
c = in.charAt(i);
ret = (c >= '0' && c <= '9');
}
return ret;
}
I ran the same code that Asaph ran and the result was:
This operation took 28 ms.
A huge difference (against 1691 ms and 2049 ms -on my computer). Take in account that this method does not validate if the string is null, so you should do that previously (including the String trimming)
I think people here is missing a point. The use of the same pattern repeatedly has a very easy optimization. Just use a singleton of the pattern. Doing it, in all my tests the try-catch approach never have a better benchmark than the pattern approach. With a success test try-catch takes twice the time, with a fail test it's 6 times slower.
public static final Pattern INT_PATTERN= Pattern.compile("^-?[0-9]+(\\.[0-9]+)?$");
public static boolean isInt(String s){
return INT_PATTERN.matcher(s).matches();
}
I use this but I liked Asaph's rigor in his post.
public static bool IsNumeric(object expression)
{
if (expression == null)
return false;
double number;
return Double.TryParse(Convert.ToString(expression, CultureInfo.InvariantCulture), NumberStyles.Any,
NumberFormatInfo.InvariantInfo, out number);
}
For long numbers use this:
(JAVA)
public static boolean isNumber(String string) {
try {
Long.parseLong(string);
} catch (Exception e) {
return false;
}
return true;
}
public static boolean isNumber(String str){
return str.matches("[0-9]*\\.[0-9]+");
}
to check whether number (including float, integer) or not
A modified version of my previous answer:
public static boolean isInteger(String in)
{
if (in != null)
{
char c;
int i = 0;
int l = in.length();
if (l > 0 && in.charAt(0) == '-')
{
i = 1;
}
if (l > i)
{
for (; i < l; i++)
{
c = in.charAt(i);
if (c < '0' || c > '9')
return false;
}
return true;
}
}
return false;
}
I just added this class to my utils:
public class TryParseLong {
private boolean isParseable;
private long value;
public TryParseLong(String toParse) {
try {
value = Long.parseLong(toParse);
isParseable = true;
} catch (NumberFormatException e) {
// Exception set to null to indicate it is deliberately
// being ignored, since the compensating action
// of clearing the parsable flag is being taken.
e = null;
isParseable = false;
}
}
public boolean isParsable() {
return isParseable;
}
public long getLong() {
return value;
}
}
To use it:
TryParseLong valueAsLong = new TryParseLong(value);
if (valueAsLong.isParsable()) {
...
// Do something with valueAsLong.getLong();
} else {
...
}
This only parses the value once.
It still makes use of the exception and control flow by exceptions, but at least it encapsulates that kind of code in a utility class, and code that uses it can work in a more normal way.
The problem with Java versus C#, is that C# has out values and pass by reference, so it can effectively return 2 pieces of information; the flag to indicate that something is parsable or not, and the actual parsed value. When we reutrn >1 value in Java, we need to create an object to hold them, so I took that approach and put the flag and the parsed value in an object.
Escape analysis is likely to handle this efficiently, and create the value and flag on the stack, and never create this object on the heap, so I think doing this will have minimal impact on performance.
To my thinking this gives about the optimal compromise between keeping control-flow-by-exception out your code, good performance, and not parsing the integer more than once.
public static boolean CheckIfNumber(String number){
for(int i = 0; i < number.length(); i++){
try{
Double.parseDouble(number.substring(i));
}catch(NumberFormatException ex){
return false;
}
}
return true;
}
I had this problem before but when I had input a number and then a character, it would still return true, I think this is the better way to do it. Just check if every char is a number. A little longer but it takes care if you have the situation of a user inputting "1abc". For some reason, when I tried to try and catch without iterating, it still thought it was a number so..

Is there an equivalent to the Scanner class in C# for strings?

In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
If you had a string that say had:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#' ish way to do this?
I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
string currentWord;
public Scanner(string source) : base(source)
{
readNextWord();
}
private void readNextWord()
{
System.Text.StringBuilder sb = new StringBuilder();
char nextChar;
int next;
do
{
next = this.Read();
if (next < 0)
break;
nextChar = (char)next;
if (char.IsWhiteSpace(nextChar))
break;
sb.Append(nextChar);
} while (true);
while((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
this.Read();
if (sb.Length > 0)
currentWord = sb.ToString();
else
currentWord = null;
}
public bool hasNextInt()
{
if (currentWord == null)
return false;
int dummy;
return int.TryParse(currentWord, out dummy);
}
public int nextInt()
{
try
{
return int.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNextDouble()
{
if (currentWord == null)
return false;
double dummy;
return double.TryParse(currentWord, out dummy);
}
public double nextDouble()
{
try
{
return double.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNext()
{
return currentWord != null;
}
}
Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using(var reader = new PacketReader("1 23 ErrorOk StringValue 15.22")
{
var index = reader.ReadNext<int>();
var count = reader.ReadNext<int>();
var result = reader.ReadNext<ErrorEnum>();
var data = reader.ReadNext<string>();
var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
public PacketReader(string s)
: base(s)
{
}
public T ReadNext<T>() where T : IConvertible
{
var sb = new StringBuilder();
do
{
var current = Read();
if (current < 0)
break;
sb.Append((char)current);
var next = (char)Peek();
if (char.IsWhiteSpace(next))
break;
} while (true);
var value = sb.ToString();
var type = typeof(T);
if (type.IsEnum)
return (T)Enum.Parse(type, value);
return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
}
}
While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in this end, this splits the string, then converts each element into an int and populates an int[] with the resulting values.
To my knowledge, there are no built in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current);
}
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
var scanner = args.Select<string, Func<Type, Object>>((string s) => {
return (Type t) =>
((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
}).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current(typeof(int)));
}
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
These both require the latest versions of C# and the .NET framework.
You could use linq to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
text.Where(i => char.IsNumber(i)).Write(); // do somthing usefull here...
I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
if (validateRequired)
{
string[] split = input.Split();
List<int> result = new List<int>(split.Length);
int parsed;
for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
{
if (int.TryParse(split[inputIdx], out parsed))
result.Add(parsed);
}
return result.ToArray();
}
else
return (from i in input.Split()
select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).

How to know position(linenumber) of a streamreader in a textfile?

an example (that might not be real life, but to make my point) :
public void StreamInfo(StreamReader p)
{
string info = string.Format(
"The supplied streamreaer read : {0}\n at line {1}",
p.ReadLine(),
p.GetLinePosition()-1);
}
GetLinePosition here is an imaginary extension method of streamreader.
Is this possible?
Of course I could keep count myself but that's not the question.
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
public static class StreamReaderExtensions
{
readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
public static long GetPosition(this StreamReader reader)
{
// shift position back from BaseStream.Position by the number of bytes read
// into internal buffer.
int byteLen = (int)byteLenField.GetValue(reader);
var position = reader.BaseStream.Position - byteLen;
// if we have consumed chars from the buffer we need to calculate how many
// bytes they represent in the current encoding and add that to the position.
int charPos = (int)charPosField.GetValue(reader);
if (charPos > 0)
{
var charBuffer = (char[])charBufferField.GetValue(reader);
var encoding = reader.CurrentEncoding;
var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
position += bytesConsumed;
}
return position;
}
public static void SetPosition(this StreamReader reader, long position)
{
reader.DiscardBufferedData();
reader.BaseStream.Seek(position, SeekOrigin.Begin);
}
}
This works quite well for me and depending on your tolerance for using reflection It thinks it is a fairly simple solution.
Caveats:
While I have done some simple testing using various Systems.Text.Encoding options, pretty much all of the data I consume with this are simple text files (ASCII).
I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actuall going to read that data, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
private TextReader _inner;
public PositioningReader(TextReader inner) {
_inner = inner;
}
public override void Close() {
_inner.Close();
}
public override int Peek() {
return _inner.Peek();
}
public override int Read() {
var c = _inner.Read();
if (c >= 0)
AdvancePosition((Char)c);
return c;
}
private int _linePos = 0;
public int LinePos { get { return _linePos; } }
private int _charPos = 0;
public int CharPos { get { return _charPos; } }
private int _matched = 0;
private void AdvancePosition(Char c) {
if (Environment.NewLine[_matched] == c) {
_matched++;
if (_matched == Environment.NewLine.Length) {
_linePos++;
_charPos = 0;
_matched = 0;
}
}
else {
_matched = 0;
_charPos++;
}
}
}
Drawbacks (for the sake of brevity):
Does not check constructor argument for null
Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
overriding those methods via routing calls to _inner. instead of base.
passing the characters read to the AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
var readCount = _inner.ReadBlock(buffer, index, count);
for (int i = 0; i < readCount; i++)
AdvancePosition(buffer[index + i]);
return readCount;
}
No.
Consider that it's possible to seek to any poisition using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on?
Should it just keep a number of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy that implemented a StreamReader with ReadLine() method that registers file position.
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[#for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
_lineLength = 0;
//if (stream == null)
// __Error.ReaderClosed();
if (charPos == charLen)
{
if (ReadBuffer() == 0) return null;
}
StringBuilder sb = null;
do
{
int i = charPos;
do
{
char ch = charBuffer[i];
int EolChars = 0;
if (ch == '\r' || ch == '\n')
{
EolChars = 1;
String s;
if (sb != null)
{
sb.Append(charBuffer, charPos, i - charPos);
s = sb.ToString();
}
else
{
s = new String(charBuffer, charPos, i - charPos);
}
charPos = i + 1;
if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
{
if (charBuffer[charPos] == '\n')
{
charPos++;
EolChars = 2;
}
}
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + _lineLength;
return s;
}
i++;
} while (i < charLen);
i = charLen - charPos;
if (sb == null) sb = new StringBuilder(i + 80);
sb.Append(charBuffer, charPos, i);
} while (ReadBuffer() > 0);
string ss = sb.ToString();
_lineLength = ss.Length;
_bytesRead = _bytesRead + _lineLength;
return ss;
}
Think there is a minor bug in the code as the length of the string is used to calculate file position instead of using the actual bytes read (Lacking support for UTF8 and UTF16 encoded files).
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader
class CountingReader : StreamReader {
private int _lineNumber = 0;
public int LineNumber { get { return _lineNumber; } }
public CountingReader(Stream stream) : base(stream) { }
public override string ReadLine() {
_lineNumber++;
return base.ReadLine();
}
}
and then you make it the normal way, say from a FileInfo object named file
CountingReader reader = new CountingReader(file.OpenRead())
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track position in the text regardless if you read a char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class and it should be obvoius from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that is s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) should it be the case.
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
// NewLine characters (magic value -1: "not used").
int newLine1, newLine2;
// The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
bool insideNewLine;
// StringBuilder used for ReadLine implementation.
StringBuilder lineBuilder = new StringBuilder();
public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
{
init(newLine);
}
public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
{
init(newLine);
}
public string NewLine
{
get { return "" + (char)newLine1 + (char)newLine2; }
private set
{
Guard.NotNull(value, "value");
Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
newLine1 = value[0];
newLine2 = (value.Length == 2 ? value[1] : -1);
}
}
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public override int Read()
{
int next = base.Read();
trackTextPosition(next);
return next;
}
public override int Read(char[] buffer, int index, int count)
{
int n = base.Read(buffer, index, count);
for (int i = 0; i

Categories

Resources