In my limited experience, I've been on several projects that have had some sort of string utility class with methods to determine if a given string is a number. The idea has always been the same, but the implementation has differed. Some surround a parse attempt with try/catch:
public boolean isInteger(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {}
return false;
}
and others match with regex:
public boolean isInteger(String str) {
return str.matches("^-?[0-9]+(\\.[0-9]+)?$");
}
Is one of these methods better than the other? I personally prefer the regex approach, as it's concise, but will it perform on par if called while iterating over, say, a list of several hundred thousand strings?
Note: As I'm kinda new to the site I don't fully understand this Community Wiki business, so if this belongs there let me know, and I'll gladly move it.
EDIT:
With all the TryParse suggestions I ported Asaph's benchmark code (thanks for a great post!) to C# and added a TryParse method. And as it seems, the TryParse wins hands down. However, the try catch approach took a crazy amount of time. To the point of me thinking I did something wrong! I also updated regex to handle negatives and decimal points.
Results for updated, C# benchmark code:
00:00:51.7390000 for isIntegerParseInt
00:00:03.9110000 for isIntegerRegex
00:00:00.3500000 for isIntegerTryParse
Using:
static bool isIntegerParseInt(string str) {
try {
int.Parse(str);
return true;
} catch (FormatException e){}
return false;
}
static bool isIntegerRegex(string str) {
return Regex.Match(str, "^-?[0-9]+(\\.[0-9]+)?$").Success;
}
static bool isIntegerTryParse(string str) {
int bob;
return Int32.TryParse(str, out bob);
}
I just ran some benchmarks on the performance of these 2 methods (On Macbook Pro OSX Leopard Java 6). ParseInt is faster. Here is the output:
This operation took 1562 ms.
This operation took 2251 ms.
And here is my benchmark code:
public class IsIntegerPerformanceTest {
public static boolean isIntegerParseInt(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {}
return false;
}
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
public static void main(String[] args) {
long starttime, endtime;
int iterations = 1000000;
starttime = System.currentTimeMillis();
for (int i=0; i<iterations; i++) {
isIntegerParseInt("123");
isIntegerParseInt("not an int");
isIntegerParseInt("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i=0; i<iterations; i++) {
isIntegerRegex("123");
isIntegerRegex("not an int");
isIntegerRegex("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation took " + (endtime - starttime) + " ms.");
}
}
Also, note that your regex will reject negative numbers and the parseInt method will accept them.
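To make the mismatch concrete, here is a small sketch that runs a few inputs through both checks. The class and method names are made up for this illustration, and it assumes Java 7 or later, where Integer.parseInt also accepts a leading '+'.

```java
final class ParseVsRegex {
    // Same check as isIntegerParseInt in the benchmark above.
    static boolean parseIntAccepts(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Same digits-only pattern as isIntegerRegex in the benchmark above.
    static boolean regexAccepts(String s) {
        return s.matches("^[0-9]+$");
    }

    public static void main(String[] args) {
        String[] samples = {"123", "-321", "+5", "99999999999", "not an int"};
        for (String s : samples) {
            System.out.println(s + ": parseInt=" + parseIntAccepts(s)
                    + ", regex=" + regexAccepts(s));
        }
    }
}
```

The two checks only agree on "123" and "not an int". parseInt takes the signed values but rejects 99999999999 because it does not fit in an int, while the digits-only regex does the opposite.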
Here is our way of doing this:
public boolean isNumeric(String string) throws IllegalArgumentException
{
boolean isnumeric = false;
if (string != null && !string.equals(""))
{
isnumeric = true;
char chars[] = string.toCharArray();
for(int d = 0; d < chars.length; d++)
{
isnumeric &= Character.isDigit(chars[d]);
if(!isnumeric)
break;
}
}
return isnumeric;
}
If absolute performance is key, and if you are just checking for integers (not floating point numbers) I suspect that iterating over each character in the string, returning false if you encounter something not in the range 0-9, will be fastest.
RegEx is a more general-purpose solution so will probably not perform as fast for that special case. A solution that throws an exception will have some extra overhead in that case. TryParse will be slightly slower if you don't actually care about the value of the number, just whether or not it is a number, since the conversion to a number must also take place.
For anything but an inner loop that's called many times, the differences between all of these options should be insignificant.
I needed to refactor code like yours to get rid of NumberFormatException. The refactored Code:
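public static Integer parseInteger(final String str) {
if (str == null || str.isEmpty()) {
return null;
}
final Scanner sc = new Scanner(str); // requires import java.util.Scanner
// nextInt() throws InputMismatchException on non-numeric input, so check first
return sc.hasNextInt() ? Integer.valueOf(sc.nextInt()) : null;
}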
As a Java 1.4 guy, I didn't know about java.util.Scanner. I found this interesting article:
http://rosettacode.org/wiki/Determine_if_a_string_is_numeric#Java
I personally liked the solution with the scanner: very compact and still readable.
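For a pure yes/no check with Scanner, a minimal sketch could look like the following. ScannerIntCheck is a hypothetical name, not something from the linked article. It assumes you want to reject trailing tokens such as "12 abc". Scanner's number parsing is locale-sensitive, so depending on the default locale it may also accept things like group separators.

```java
import java.util.Scanner;

final class ScannerIntCheck {
    // True only if the string holds exactly one token and that token is an int.
    static boolean isInteger(String str) {
        if (str == null) {
            return false;
        }
        try (Scanner sc = new Scanner(str)) {
            if (!sc.hasNextInt()) {
                return false;          // first token is missing or not an int
            }
            sc.nextInt();              // consume the int token
            return !sc.hasNext();      // reject anything left over
        }
    }
}
```

Because hasNextInt() is consulted before nextInt(), no exception is thrown for bad input. Surrounding whitespace is skipped by the scanner, so " 42 " would count as an integer here.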
Some languages, like C#, have a TryParse (or equivalent) that works fairly well for something like this.
public bool IsInteger(string value)
{
int i;
return Int32.TryParse(value, out i);
}
Personally I would do this if you really want to simplify it.
public bool isInteger(string myValue)
{
int myIntValue;
return int.TryParse(myValue, out myIntValue);
}
You could create an extension method for a string, and make the whole process look cleaner...
public static bool IsInt(this string str)
{
int i;
return int.TryParse(str, out i);
}
You could then do the following in your actual code...
if(myString.IsInt())....
Using .NET, you could do something like:
private bool isNumber(string str)
{
return str.All(c => char.IsDigit(c)); // true only if every character is a digit (requires System.Linq)
}
public static boolean CheckString(String myString) {
char[] digits;
digits = myString.toCharArray();
for (char div : digits) {// for each element div of type char in the digits collection (digits is a collection containing div elements).
try {
Double.parseDouble(myString);
System.out.println("All are numbers");
return true;
} catch (NumberFormatException e) {
if (Character.isDigit(div)) {
System.out.println("Not all are chars");
return false;
}
}
}
System.out.println("All are chars");
return true;
}
That's my implementation to check whether a string is made of digits:
public static boolean isNumeric(String string)
{
if (string == null)
{
throw new NullPointerException("The string must not be null!");
}
final int len = string.length();
if (len == 0)
{
return false;
}
for (int i = 0; i < len; ++i)
{
if (!Character.isDigit(string.charAt(i)))
{
return false;
}
}
return true;
}
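One detail worth knowing about Character.isDigit: it accepts any Unicode decimal digit, not just '0' to '9'. The sketch below (DigitKinds is a hypothetical name) contrasts it with a plain ASCII range check, in case the difference matters for your input.

```java
final class DigitKinds {
    // Accepts only the ASCII digits '0'..'9'.
    static boolean isAsciiDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Accepts any character that Character.isDigit treats as a digit,
    // which includes non-ASCII digits such as Arabic-Indic ones.
    static boolean isUnicodeDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the string "\u0661\u0662\u0663" (Arabic-Indic one, two, three) passes isUnicodeDigits but fails isAsciiDigits. Which behaviour is right depends on where the strings come from.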
I like code:
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
But it is better to compile the Pattern once, before using it:
public static Pattern patternInteger = Pattern.compile("^[0-9]+$");
public static boolean isIntegerRegex(String str) {
return patternInteger.matcher(str).matches();
}
Running the test gives these results:
This operation isIntegerParseInt took 1313 ms.
This operation isIntegerRegex took 1178 ms.
This operation isIntegerRegexNew took 304 ms.
With:
public class IsIntegerPerformanceTest {
private static Pattern pattern = Pattern.compile("^[0-9]+$");
public static boolean isIntegerParseInt(String str) {
try {
Integer.parseInt(str);
return true;
} catch (NumberFormatException nfe) {
}
return false;
}
public static boolean isIntegerRegexNew(String str) {
return pattern.matcher(str).matches();
}
public static boolean isIntegerRegex(String str) {
return str.matches("^[0-9]+$");
}
public static void main(String[] args) {
long starttime, endtime;
int iterations = 1000000;
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerParseInt("123");
isIntegerParseInt("not an int");
isIntegerParseInt("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation isIntegerParseInt took " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerRegex("123");
isIntegerRegex("not an int");
isIntegerRegex("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation isIntegerRegex took " + (endtime - starttime) + " ms.");
starttime = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
isIntegerRegexNew("123");
isIntegerRegexNew("not an int");
isIntegerRegexNew("-321");
}
endtime = System.currentTimeMillis();
System.out.println("This operation isIntegerRegexNew took " + (endtime - starttime) + " ms.");
}
}
I think it could be faster than the previous solutions if you do the following (Java):
public final static boolean isInteger(String in)
{
char c;
int length = in.length();
boolean ret = length > 0;
int i = ret && in.charAt(0) == '-' ? 1 : 0;
for (; ret && i < length; i++)
{
c = in.charAt(i);
ret = (c >= '0' && c <= '9');
}
return ret;
}
I ran the same code that Asaph ran and the result was:
This operation took 28 ms.
A huge difference (compared with 1691 ms and 2049 ms on my computer). Take into account that this method does not check whether the string is null, so do that beforehand (along with any trimming of the string).
I think people here are missing a point. Using the same pattern repeatedly has a very easy optimization: compile the pattern once and keep it in a static field. With that change, in all my tests the try-catch approach never benchmarked better than the pattern approach. On input that parses successfully, try-catch takes twice the time; on input that fails, it is 6 times slower.
public static final Pattern INT_PATTERN= Pattern.compile("^-?[0-9]+(\\.[0-9]+)?$");
public static boolean isInt(String s){
return INT_PATTERN.matcher(s).matches();
}
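If you want to reproduce the success-versus-failure comparison yourself, a rough harness might look like this. Everything here is an assumption for illustration: the class name, the integer-only pattern (a simplification of the one above), and the single warm-up pass, which is far less rigorous than a proper benchmarking tool.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class SplitTiming {
    // Integer-only pattern, compiled once (hypothetical simplification).
    static final Pattern INT_PATTERN = Pattern.compile("^-?[0-9]+$");

    static boolean viaPattern(String s) {
        return INT_PATTERN.matcher(s).matches();
    }

    static boolean viaParse(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Elapsed nanoseconds for `iterations` calls, measured after one warm-up pass.
    static long timeNanos(Predicate<String> check, String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {   // warm-up, not timed
            if (check.test(input)) hits++;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {   // timed pass
            if (check.test(input)) hits++;
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) throw new IllegalStateException(); // keeps the results in use
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("parse,   valid:   " + timeNanos(SplitTiming::viaParse, "123", n) / 1_000_000 + " ms");
        System.out.println("parse,   invalid: " + timeNanos(SplitTiming::viaParse, "not an int", n) / 1_000_000 + " ms");
        System.out.println("pattern, valid:   " + timeNanos(SplitTiming::viaPattern, "123", n) / 1_000_000 + " ms");
        System.out.println("pattern, invalid: " + timeNanos(SplitTiming::viaPattern, "not an int", n) / 1_000_000 + " ms");
    }
}
```

Timing valid and invalid input separately matters because the exception path only runs on failures, so a mixed loop hides how the cost is distributed. Actual numbers will vary by JVM and machine.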
I use this but I liked Asaph's rigor in his post.
public static bool IsNumeric(object expression)
{
if (expression == null)
return false;
double number;
return Double.TryParse(Convert.ToString(expression, CultureInfo.InvariantCulture), NumberStyles.Any,
NumberFormatInfo.InvariantInfo, out number);
}
For long numbers use this:
(JAVA)
public static boolean isNumber(String string) {
try {
Long.parseLong(string);
} catch (Exception e) {
return false;
}
return true;
}
public static boolean isNumber(String str){
// the decimal point is optional, so plain integers match too
return str.matches("[0-9]*\\.?[0-9]+");
}
This checks whether the string is a number (integer or floating point).
A modified version of my previous answer:
public static boolean isInteger(String in)
{
if (in != null)
{
char c;
int i = 0;
int l = in.length();
if (l > 0 && in.charAt(0) == '-')
{
i = 1;
}
if (l > i)
{
for (; i < l; i++)
{
c = in.charAt(i);
if (c < '0' || c > '9')
return false;
}
return true;
}
}
return false;
}
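A character loop like this accepts any run of digits, including values too large for an int. If the string will later be handed to Integer.parseInt, a variant that also checks the range may be useful. This is only a sketch; FastIntCheck and fitsInInt are hypothetical names.

```java
final class FastIntCheck {
    // True if `in` is an optional '-' followed by ASCII digits
    // and the value lies within the int range.
    static boolean fitsInInt(String in) {
        if (in == null || in.isEmpty()) {
            return false;
        }
        boolean negative = in.charAt(0) == '-';
        int i = negative ? 1 : 0;
        if (i == in.length()) {
            return false;                       // a lone "-" is not a number
        }
        long acc = 0;                           // long accumulator, so no overflow below
        for (; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            acc = acc * 10 + (c - '0');
            if (acc > (long) Integer.MAX_VALUE + 1) {
                return false;                   // already beyond |Integer.MIN_VALUE|
            }
        }
        // Negative values may reach 2147483648; positive ones stop at 2147483647.
        return negative || acc <= Integer.MAX_VALUE;
    }
}
```

The early exit inside the loop keeps the accumulator small, so arbitrarily long digit strings are rejected without overflowing the long.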
I just added this class to my utils:
public class TryParseLong {
private boolean isParseable;
private long value;
public TryParseLong(String toParse) {
try {
value = Long.parseLong(toParse);
isParseable = true;
} catch (NumberFormatException e) {
// Exception set to null to indicate it is deliberately
// being ignored, since the compensating action
// of clearing the parsable flag is being taken.
e = null;
isParseable = false;
}
}
public boolean isParsable() {
return isParseable;
}
public long getLong() {
return value;
}
}
To use it:
TryParseLong valueAsLong = new TryParseLong(value);
if (valueAsLong.isParsable()) {
...
// Do something with valueAsLong.getLong();
} else {
...
}
This only parses the value once.
It still makes use of the exception and control flow by exceptions, but at least it encapsulates that kind of code in a utility class, and code that uses it can work in a more normal way.
The difference between Java and C# here is that C# has out parameters and pass by reference, so a method can effectively return 2 pieces of information: the flag indicating whether something is parsable, and the actual parsed value. When we return more than one value in Java, we need to create an object to hold them, so I took that approach and put the flag and the parsed value in an object.
Escape analysis is likely to handle this efficiently, and create the value and flag on the stack, and never create this object on the heap, so I think doing this will have minimal impact on performance.
To my thinking this gives about the optimal compromise between keeping control-flow-by-exception out of your code, good performance, and not parsing the integer more than once.
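On Java 8 or later, the standard library already has a holder for "a long or nothing", so a variation on the same idea could use java.util.OptionalLong instead of a custom class. This is a sketch under that assumption; LongParsing and tryParseLong are hypothetical names.

```java
import java.util.OptionalLong;

final class LongParsing {
    // Returns the parsed value, or an empty OptionalLong if `s` is not a valid long.
    static OptionalLong tryParseLong(String s) {
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Deliberately ignored: "not parsable" is reported through the empty result.
            return OptionalLong.empty();
        }
    }
}
```

A caller would then write something like `OptionalLong v = LongParsing.tryParseLong(value); if (v.isPresent()) { use(v.getAsLong()); }`, where `use` stands for whatever the caller does with the number. The exception is still thrown internally, so this tidies the calling code rather than making failures cheaper.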
public static boolean CheckIfNumber(String number){
for(int i = 0; i < number.length(); i++){
try{
Double.parseDouble(number.substring(i));
}catch(NumberFormatException ex){
return false;
}
}
return true;
}
I had this problem before: when I input a number followed by a character, it would still return true. I think this is the better way to do it: just check that every char is a number. It is a little longer, but it handles the case of a user entering "1abc". For some reason, when I used try/catch without iterating, it still thought it was a number.
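One possible explanation for a "number then a character" input being accepted (this is a guess at what the poster saw, not something stated in the post) is that Double.parseDouble follows Java's floating-point literal syntax. It allows a trailing type suffix such as d or f, surrounding whitespace, hexadecimal notation, and the words NaN and Infinity. The sketch below, with the hypothetical class name ParseDoubleQuirks, shows a few such inputs.

```java
final class ParseDoubleQuirks {
    // True if Double.parseDouble accepts the (non-null) string.
    static boolean parses(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"1d", "1f", " 42 ", "NaN", "0x1p3", "1abc"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + parses(s));
        }
    }
}
```

Every sample except "1abc" is accepted. If inputs like "1d" should be rejected, a character check or a stricter regex is needed in addition to (or instead of) the parse attempt.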
Related
In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
If you had a string that say had:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#-ish way to do this?
I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
string currentWord;
public Scanner(string source) : base(source)
{
readNextWord();
}
private void readNextWord()
{
System.Text.StringBuilder sb = new StringBuilder();
char nextChar;
int next;
do
{
next = this.Read();
if (next < 0)
break;
nextChar = (char)next;
if (char.IsWhiteSpace(nextChar))
break;
sb.Append(nextChar);
} while (true);
while((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
this.Read();
if (sb.Length > 0)
currentWord = sb.ToString();
else
currentWord = null;
}
public bool hasNextInt()
{
if (currentWord == null)
return false;
int dummy;
return int.TryParse(currentWord, out dummy);
}
public int nextInt()
{
try
{
return int.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNextDouble()
{
if (currentWord == null)
return false;
double dummy;
return double.TryParse(currentWord, out dummy);
}
public double nextDouble()
{
try
{
return double.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNext()
{
return currentWord != null;
}
}
Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using(var reader = new PacketReader("1 23 ErrorOk StringValue 15.22"))
{
var index = reader.ReadNext<int>();
var count = reader.ReadNext<int>();
var result = reader.ReadNext<ErrorEnum>();
var data = reader.ReadNext<string>();
var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
public PacketReader(string s)
: base(s)
{
}
public T ReadNext<T>() where T : IConvertible
{
var sb = new StringBuilder();
do
{
var current = Read();
if (current < 0)
break;
sb.Append((char)current);
var next = (char)Peek();
if (char.IsWhiteSpace(next))
break;
} while (true);
var value = sb.ToString();
var type = typeof(T);
if (type.IsEnum)
return (T)Enum.Parse(type, value);
return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
}
}
While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in the end, this splits the string, then converts each element into an int and populates an int[] with the resulting values.
To my knowledge, there are no built in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current);
}
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
var scanner = args.Select<string, Func<Type, Object>>((string s) => {
return (Type t) =>
((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
}).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current(typeof(int)));
}
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
These both require the latest versions of C# and the .NET framework.
You could use linq to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
text.Where(i => char.IsNumber(i)).Write(); // do something useful here...
I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
if (validateRequired)
{
string[] split = input.Split();
List<int> result = new List<int>(split.Length);
int parsed;
for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
{
if (int.TryParse(split[inputIdx], out parsed))
result.Add(parsed);
}
return result.ToArray();
}
else
return (from i in input.Split()
select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).
I am currently taking a C# class, and in it we want to take the error handling out of our primary code and build all the error handling and data parsing for integers in another class. The problem is that a method can only return one value.
How can I return both a true/false (bool) and the parsed data from one class to another?
Class1.cs (primary code)
int num1;
Class2 class2Object = new Class2();
public Class1()
{
//constructor
}
public void Num1Method()
{
string tempVal = "";
bool errorFlag; //bool = true/false
do
{
errorFlag = false; //no error & initialize
Console.Write("Enter Num1: ");
tempVal = Console.ReadLine();
errorFlag = class2Object.IntErrorCheckMethod(tempVal); // keep the result so the loop can repeat on error
}//close do
while (errorFlag == true);
}//close Num1Method
Class2.cs (error and parse handling)
public bool IntErrorCheckMethod(string xTempVal)
{
int tempNum = 0;
bool errorFlag = false;
try
{
tempNum = int.Parse(xTempVal);
}
catch(FormatException)
{
errorFlag = true;
tempNum = 999;
}
return errorFlag;
}//close int error check
So Class2 will only return the true/false (if there is an error or not), how can I also return the good parsed data back to Class1 to be put into the "int num1" variable?
Our professor can only think to remove the bool and use a dummy value: if the data has an error, set the value to 999 and return it, then use an if/else if to check whether the value is 999 and show an error message, otherwise store the data in the variable.
I think it's better code to use a bool for the error, as 999 could POSSIBLY be good data entered by the user.
Any ideas are appreciated,
Thanks!
You can use an out parameter, just like the TryParse methods in .NET. By the way, instead of your method you can use:
int tempNum;
errorFlag = !Int32.TryParse(tempVal, out tempNum); // TryParse returns true on success
Or if you really want to use your own method for parsing:
public bool IntErrorCheckMethod(string xTempVal, out int tempNum)
{
tempNum = 0;
bool errorFlag = false;
try
{
tempNum = int.Parse(xTempVal);
}
catch(FormatException)
{
errorFlag = true;
tempNum = 999;
}
return errorFlag;
}
Usage:
int num1;
public void Num1Method()
{
string tempVal;
do
{
Console.Write("Enter Num1: ");
tempVal = Console.ReadLine();
}
while(class2Object.IntErrorCheckMethod(tempVal, out num1));
}
Also consider refactoring your method:
public bool TryParse(string s, out int result)
{
result = 0;
try
{
result = Int32.Parse(s);
return true; // parsing succeeded
}
catch(FormatException)
{
return false; // parsing failed; the result value is not meaningful
}
}
Why do I see people implement properties like this?
What is the point of checking if the value is equal to the current value?
public double? Price
{
get
{
return _price;
}
set
{
if (_price == value)
return;
_price = value;
}
}
In this case it would be moot; however, in the case where there is an associated side-effect (typically an event), it avoids trivial events. For example:
set
{
if (_price == value)
return;
_price = value;
OnPriceChanged(); // invokes the Price event
}
Now, if we do:
foo.Price = 16;
foo.Price = 16;
foo.Price = 16;
foo.Price = 16;
we don't get 4 events; we get at most 1 (maybe 0 if it is already 16).
In more complex examples there could be validation, pre-change actions and post-change actions. All of these can be avoided if you know that it isn't actually a change.
set
{
if (_price == value)
return;
if(value < 0 || value > MaxPrice) throw new ArgumentOutOfRangeException();
OnPriceChanging();
_price = value;
OnPriceChanged();
}
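The guard is not specific to C#. As a rough Java sketch of the same idea (PricedItem and its listener list are hypothetical names invented for this illustration, standing in for the C# event):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class PricedItem {
    private Double price; // nullable, like double? in the C# example
    private final List<Runnable> listeners = new ArrayList<>();

    void addPriceChangedListener(Runnable listener) {
        listeners.add(listener);
    }

    Double getPrice() {
        return price;
    }

    void setPrice(Double value) {
        if (Objects.equals(price, value)) {
            return;                 // not actually a change, so skip the notification
        }
        price = value;
        for (Runnable listener : listeners) {
            listener.run();         // notify once per real change
        }
    }
}
```

Setting the price to 16.0 four times in a row on a fresh PricedItem runs each listener once, because only the first call changes the stored value.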
This is not an answer, more: it is an evidence-based response to the claim (in another answer) that it is quicker to check than to assign. In short: no, it isn't. No difference whatsoever. I get (for non-nullable int):
AutoProp: 356ms
Field: 356ms
BasicProp: 357ms
CheckedProp: 356ms
(with some small variations on successive runs - but essentially they all take exactly the same time within any sensible rounding - when doing something 500 MILLION times, we can ignore 1ms difference)
In fact, if we change to int? I get:
AutoProp: 714ms
Field: 536ms
BasicProp: 714ms
CheckedProp: 2323ms
or double? (like in the question):
AutoProp: 535ms
Field: 535ms
BasicProp: 539ms
CheckedProp: 3035ms
so this is not a performance helper!
with tests
class Test
{
static void Main()
{
var obj = new Test();
Stopwatch watch;
const int LOOP = 500000000;
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.AutoProp = 17;
}
watch.Stop();
Console.WriteLine("AutoProp: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.Field = 17;
}
watch.Stop();
Console.WriteLine("Field: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.BasicProp = 17;
}
watch.Stop();
Console.WriteLine("BasicProp: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.CheckedProp = 17;
}
watch.Stop();
Console.WriteLine("CheckedProp: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
public int AutoProp { get; set; }
public int Field;
private int basicProp;
public int BasicProp
{
get { return basicProp; }
set { basicProp = value; }
}
private int checkedProp;
public int CheckedProp
{
get { return checkedProp; }
set { if (value != checkedProp) checkedProp = value; }
}
}
Let's suppose we don't handle any change related events.
I don't think comparing is faster than assignment. It depends on the data type. Say you have a string: in the worst case, comparison takes much longer than a simple assignment, where the member simply changes its reference to that of the new string.
So my guess is that it's better in that case to assign right away.
In the case of simple data types it doesn't have a real impact.
That way, you don't have to re-assign the same value. It's just faster to compare the values, AFAIK.
In my test I created a string with 32000 characters.
After repeated execution of the test the BCL StringReader consistently executed in 350us while mine ran in 400us. What kind of secrets are they hiding?
Test:
private void SpeedTest()
{
String r = "";
for (int i = 0; i < 1000; i++)
{
r += Randomization.GenerateString();
}
Stopwatch s = new Stopwatch(); // System.Diagnostics.Stopwatch
s.Start();
using (var sr = new System.IO.StringReader(r))
{
while (sr.Peek() > -1)
{
sr.Read();
}
}
s.Stop();
_Write(s.Elapsed);
s.Reset();
s.Start();
using (var sr = new MagicSynthesis.StringReader(r))
{
while (sr.PeekNext() > Char.MinValue)
{
sr.Next();
}
}
s.Stop();
_Write(s.Elapsed);
}
Code:
public unsafe class StringReader : IDisposable
{
private Char* Base;
private Char* End;
private Char* Current;
private const Char Null = '\0';
/// <summary></summary>
public StringReader(String s)
{
if (s == null)
throw new ArgumentNullException("s");
Base = (Char*)Marshal.StringToHGlobalUni(s).ToPointer();
End = (Base + s.Length);
Current = Base;
}
/// <summary></summary>
public Char Next()
{
return (Current < End) ? *(Current++) : Null;
}
/// <summary></summary>
public String Next(Int32 length)
{
String s = String.Empty;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
return s;
}
/// <summary></summary>
public Char PeekNext()
{
return *(Current);
}
/// <summary></summary>
public String PeekNext(Int32 length)
{
String s = String.Empty;
Char* a = Current;
while (Current < End && length > 0)
{
length--;
s += *(Current++);
}
Current = a;
return s;
}
/// <summary></summary>
public Char Previous()
{
return ((Current > Base) ? *(--Current) : Null);
}
/// <summary></summary>
public Char PeekPrevious()
{
return ((Current > Base) ? *(Current - 1) : Null);
}
/// <summary></summary>
public void Dispose()
{
Marshal.FreeHGlobal(new IntPtr(Base));
}
}
I would bet that Marshal.StringToHGlobalUni() and Marshal.FreeHGlobal(new IntPtr(Base)) have a lot to do with the differences. I'm not sure how StringReader manages the string, but I bet it's not copying it to unmanaged memory.
Looking at the StringReader.Read() method in Reflector shows this:
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
The constructor is also just:
public StringReader(string s)
{
if (s == null)
{
throw new ArgumentNullException("s");
}
this._s = s;
this._length = (s == null) ? 0 : s.Length;
}
So, it appears that StringReader just maintains the current position and uses regular indexing to return values.
Edit
In response to your comment, your Next() method does a comparison and an unsafe cast, which probably isn't optimized in any way. StringReader.Read() does a simple comparison and returns the character at the _pos index in the string, which probably gets some optimization from the compiler.
Maybe Reflector would help you find your answer?
You can always look at the source code
Couldn't tell after simply looking at your code, but here's the code for StringReader.Read():
public override int Read()
{
if (this._s == null)
{
__Error.ReaderClosed();
}
if (this._pos == this._length)
{
return -1;
}
return this._s[this._pos++];
}
They've got two simple value checks and an array access plus increment, versus your value check and pointer increment. Perhaps it would be useful to look at the IL and see how many ops each compiles down to.
Have you tried profiling your StringReader to see if there are any obvious places where you could save time? This is the most reliable way to determine what the bottlenecks in your code are.
Normally I would suggest profiling your solution against the other but I'm not sure about the viability of profiling the BCL. It's GAC'd and strongly signed which makes instrumentation difficult so you would have to rely on sampling.