How to make this function not prematurely split? - c#

I've written this function...
internal static IEnumerable<KeyValuePair<char?, string>> SplitUnescaped(this string input, char[] separators)
{
int index = 0;
var state = new Stack<char>();
for (int i = 0; i < input.Length; ++i)
{
char c = input[i];
char s = state.Count > 0 ? state.Peek() : default(char);
if (state.Count > 0 && (s == '\\' || (s == '[' && c == ']') || ((s == '"' || s == '\'') && c == s)))
state.Pop();
else if (c == '\\' || c == '[' || c == '"' || c == '\'')
state.Push(c);
if (state.Count == 0 && separators.Contains(c))
{
yield return new KeyValuePair<char?, string>(c, input.Substring(index, i - index));
index = i + 1;
}
}
yield return new KeyValuePair<char?, string>(null, input.Substring(index));
}
Which splits a string on the given separators, as long as they aren't escaped, in quotes, or in brackets. Seems to work pretty well, but there's one problem with it.
There characters I want to split on include a space:
{ '>', '+', '~', ' ' };
So, given the string
a > b
I want it to split on > and ignore the spaces, but given
a b
I do want it to split on the space.
How can I fix the function?

You could continue to split based on and > and then remove the strings which are empty.

I think this does it...
internal static IEnumerable<KeyValuePair<char?, string>> SplitUnescaped(this string input, char[] separators)
{
int startIndex = 0;
var state = new Stack<char>();
input = input.Trim(separators);
for (int i = 0; i < input.Length; ++i)
{
char c = input[i];
char s = state.Count > 0 ? state.Peek() : default(char);
if (state.Count > 0 && (s == '\\' || (s == '[' && c == ']') || ((s == '"' || s == '\'') && c == s)))
state.Pop();
else if (c == '\\' || c == '[' || c == '"' || c == '\'')
state.Push(c);
else if (state.Count == 0 && separators.Contains(c))
{
int endIndex = i;
while (input[i] == ' ' && separators.Contains(input[i + 1])) { ++i; }
yield return new KeyValuePair<char?, string>(input[i], input.Substring(startIndex, endIndex - startIndex));
while (input[++i] == ' ') { }
startIndex = i;
}
}
yield return new KeyValuePair<char?, string>(null, input.Substring(startIndex));
}
I was trying to push the space onto the stack too before, and then doing some checks against that...but I think this is easier.

Related

How can I get the values new.TITLE['kinds.of'].food

I am new to regex. I have this string
new.TITLE['kinds.of'].food
or
new.TITLE['deep thought'].food
I want to retrieve these tokens:
new, TITLE, kinds.of, food.
or (2nd example)
new, TITLE, deep thought, food.
I can't simply split it with '.' I need regex match to get the values.
How is it done?
When working with tokens a parser (FST - Finite State Machine in this case) should do:
private static IEnumerable<string> ParseIt(string value) {
int lastIndex = 0;
bool inApostroph = false;
for (int i = 0; i < value.Length; ++i) {
char ch = value[i];
if (ch == '\'') {
inApostroph = !inApostroph;
continue;
}
if (inApostroph)
continue;
if (ch == '.' || ch == ']' || ch == '[') {
if (i - lastIndex > 0) {
if (value[lastIndex] != '\'')
yield return value.Substring(lastIndex, i - lastIndex);
else {
string result = value.Substring(lastIndex, i - lastIndex).Replace("''", "'");
yield return result.Substring(1, result.Length - 2);
}
}
lastIndex = i + 1;
}
}
if (lastIndex < value.Length)
yield return value.Substring(lastIndex);
}
Tests:
string test1 = #"new.TITLE['kinds.of'].food";
string test2 = #"new.TITLE['deep thought'].food";
string[] result1 = ParseIt(test1).ToArray();
string[] result2 = ParseIt(test2).ToArray();
Console.WriteLine(string.Join(Environment.NewLine, result1));
Console.WriteLine(string.Join(Environment.NewLine, result2));
Outcome:
new
TITLE
kinds.of
food
new
TITLE
deep thought
food

StringToASCII from scratch failed

I actually tried to make a StringToASCII function from scratch in c#.
I get the input from _myString and this is the code :
public void convertToASCII() {
//A-Z --> 65-90
//a-z --> 97-122
//0-9 --> 48-57
//Space --> 32
int[] returnString = new int[_myString.Length];
int iTableau = 0;
char iAZ = 'A';
char iaz = 'a';
char i09 = '0';
char iSpace = ' ';
for(int i = 0; i < _myString.Length; i++)
{
if(_myString[i] >= 65 && _myString[i] <= 90 || _myString[i] >= 97 && _myString[i] <= 122 || _myString[i] >= 48 && _myString[i] <= 57 || _myString[i] == 32)
{
while(iAZ < 90 || iaz < 122 || iaz < 122 || i09 < 57 || _myString[i] == 32)
{
if(_myString[i] == iAZ && iAZ >= 'A' && iAZ <= 'Z')
{
returnString[iTableau] = iAZ;
iTableau++;
iAZ--;
}
else
{
iAZ++;
}
if(_myString[i] == iaz && iaz >= 'a' && iaz <= 'z')
{
returnString[iTableau] = iaz;
iTableau++;
iaz--;
}
else
{
iaz++;
}
if(_myString[i] == i09 && i09 >= '0' && i09 <= '9')
{
returnString[iTableau] = i09;
iTableau++;
i09--;
}
else
{
i09++;
}
if(_myString[i] == iSpace)
{
returnString[iTableau] = iSpace;
iTableau++;
}
}
}
}
_myString = "";
for (int i = 0; i < returnString.Length; i++)
{
_myString += returnString[i];
}
}
I also tried this kind of function which it works, but i would like to make one who checks only chars from A-Z and a-z and 0-9 and space.
Same thing as the first function, i take the input from a global string variable called "_myString".
public void convertToASCII()
{
string asciiChar;
string returnString = "";
foreach (char c in _myString)
{
asciiChar= ((int)(c)).ToString();
returnString += " " + asciiChar;
}
_myString = returnString;
}
This is actually relatively simple:
public string StringToLettersOrNumbersOrSpace(string input)
{
var sb = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
if (Char.IsLetterOrDigit(input[i]) || input[i] == ' ')
{
sb.Append(input[i]);
}
}
return sb.ToString();
}
First, you'll want to use StringBuilder instead of continuously appending to a string variable. Strings in C# are immutable, meaning that they can't be changed after they've been created, so doing something like string s1 = "aaa"; s1 += "bbb"; will actually create an entirely new string instead of just adding to the original. StringBuilder, on the other hand, is mutable, so you don't need to worry about reallocating a bunch of strings every time you want to concatenate strings (which gets progressively slower and slower as the string gets bigger).
Second, you can use Char.IsLetterOrDigit instead of using comparisons. The method takes a char as input and returns true if the character is a letter (uppercase or lowercase) or a number. This maps directly to your desired range a-z, A-Z, and 0-9. Since you also care about spaces, though, you will have to manually check for that.

changing data to xml

I have been tasked with finishing a demo for a client as our main developer is currently unavailable. All the programming experience I have is that I briefly crossed swords with C# 5 years ago but haven't used it since.
I need help with turning exported data into XML if that makes sense. Currently we have a build which takes a PDF form extracts the required data which is shown to work via command prompt. I need to be able to turn this data into XML, so it can be queried to a database. The idea of the program is to take required data from a PDF convert it to XML and query it to a database where the data is stored. We are using the C# language in conjunction with the iTextsharp library. I would post the code but I'm not allowed to.
So I am asking can anyone help me out? Maybe point me towards an example of how this is done and or explain as simply as possible how I would go about doing this? I wouldn't usually ask for help from others but because the fact I haven't coded in years, it has left me feeling intimidated.
This may prove usefull to you... Taken from a way to parse PDF data using iTextSharp
using iTextSharp.text.pdf;
using iTextSharp.text;
private void openPDF()
{
string str = "";
string newFile = "c:\\New Document.pdf";
Document doc = new Document();
PdfReader reader = new PdfReader("c:\\New Document.pdf");
for (int i = 1; i <= reader.NumberOfPages; i++)
{
byte[] bt = reader.GetPageContent(i);
str += ExtractTextFromPDFBytes(bt);
}
}
private string ExtractTextFromPDFBytes(byte[] input)
{
if (input == null || input.Length == 0) return "";
try
{
string resultString = "";
// Flag showing if we are we currently inside a text object
bool inTextObject = false;
// Flag showing if the next character is literal
// e.g. '\\' to get a '\' character or '\(' to get '('
bool nextLiteral = false;
// () Bracket nesting level. Text appears inside ()
int bracketDepth = 0;
// Keep previous chars to get extract numbers etc.:
char[] previousCharacters = new char[_numberOfCharsToKeep];
for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';
for (int i = 0; i < input.Length; i++)
{
char c = (char)input[i];
if (inTextObject)
{
// Position the text
if (bracketDepth == 0)
{
if (CheckToken(new string[] { "TD", "Td" }, previousCharacters))
{
resultString += "\n\r";
}
else
{
if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters)) {
resultString += "\n";
}
else
{
if (CheckToken(new string[] { "Tj" }, previousCharacters))
{
resultString += " ";
}
}
}
}
// End of a text object, also go to a new line.
if (bracketDepth == 0 && CheckToken(new string[] { "ET" }, previousCharacters))
{
inTextObject = false;
resultString += " ";
}
else
{
// Start outputting text
if ((c == '(') && (bracketDepth == 0) && (!nextLiteral))
{
bracketDepth = 1;
}
else
{
// Stop outputting text
if ((c == ')') && (bracketDepth == 1) && (!nextLiteral))
{
bracketDepth = 0;
}
else
{
// Just a normal text character:
if (bracketDepth == 1)
{
// Only print out next character no matter what.
// Do not interpret.
if (c == '\\' && !nextLiteral)
{
nextLiteral = true;
}
else
{
if (((c >= ' ') && (c <= '~')) || ((c >= 128) && (c < 255)))
{
resultString += c.ToString();
}
nextLiteral = false;
}
}
}
}
}
}
// Store the recent characters for
// when we have to go back for a checking
for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
{
previousCharacters[j] = previousCharacters[j + 1];
}
previousCharacters[_numberOfCharsToKeep - 1] = c;
// Start of a text object
if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
{
inTextObject = true;
}
}
return resultString;
}
catch
{
return string.Empty;
}
}
private bool CheckToken(string[] tokens, char[] recent)
{
foreach (string token in tokens)
{
if ((recent[_numberOfCharsToKeep - 3] == token[0]) &&
(recent[_numberOfCharsToKeep - 2] == token[1]) &&
((recent[_numberOfCharsToKeep - 1] == ' ') ||
(recent[_numberOfCharsToKeep - 1] == 0x0d) ||
(recent[_numberOfCharsToKeep - 1] == 0x0a)) &&
((recent[_numberOfCharsToKeep - 4] == ' ') ||
(recent[_numberOfCharsToKeep - 4] == 0x0d) ||
(recent[_numberOfCharsToKeep - 4] == 0x0a)))
{
return true;
}
}
return false;
}
Once the data is parsed it's then a matter of converting each line to a class (as recommended by Andrei V in your post's comment), serializing it to XML and then storing either the XML file itself to the database or the XML data to the database.

Faster parsing of numbers on .NET

I have written two functions that convert a string of whitespace-separated integers into an int array. The first function uses Substring and then applies System.Int32.Parse to convert the substring into an int value:
let intsOfString (s: string) =
let ints = ResizeArray()
let rec inside i j =
if j = s.Length then
ints.Add(s.Substring(i, j-i) |> System.Int32.Parse)
else
let c = s.[j]
if '0' <= c && c <= '9' then
inside i (j+1)
else
ints.Add(s.Substring(i, j-i) |> System.Int32.Parse)
outside (j+1)
and outside i =
if i < s.Length then
let c = s.[i]
if '0' <= c && c <= '9' then
inside i (i+1)
else
outside (i+1)
outside 0
ints.ToArray()
The second function traverses the characters of the string in-place accumulating the integer without creating a temporary substring:
let intsOfString (s: string) =
let ints = ResizeArray()
let rec inside n i =
if i = s.Length then
ints.Add n
else
let c = s.[i]
if '0' <= c && c <= '9' then
inside (10*n + int c - 48) (i+1)
else
ints.Add n
outside(i+1)
and outside i =
if i < s.Length then
let c = s.[i]
if '0' <= c && c <= '9' then
inside (int c - 48) (i+1)
else
outside (i+1)
outside 0
ints.ToArray()
Benchmarking on space-separated integers 1 to 1,000,000, the first version takes 1.5s whereas the second version takes 0.3s.
Parsing such values can be performance critical so leaving 5x performance on the table by using temporary substrings can be undesirable. Parsing integers is easy but parsing other values such as floating point numbers, decimals and dates is considerably harder.
So, are there built-in functions to parse directly from a substring within a string (i.e. using the given start and length of a string) in order to avoid generating a temporary string? If not, are there any libraries that provide efficient functions to do this?
System.Int32.Parse is slowlest, because it used CultureInfo, FormatInfo and etc; and performance reason is not in the temporary strings.
Code from reflection:
private unsafe static bool ParseNumber(ref char* str, NumberStyles options, ref Number.NumberBuffer number, NumberFormatInfo numfmt, bool parseDecimal)
{
number.scale = 0;
number.sign = false;
string text = null;
string text2 = null;
string str2 = null;
string str3 = null;
bool flag = false;
string str4;
string str5;
if ((options & NumberStyles.AllowCurrencySymbol) != NumberStyles.None)
{
text = numfmt.CurrencySymbol;
if (numfmt.ansiCurrencySymbol != null)
{
text2 = numfmt.ansiCurrencySymbol;
}
str2 = numfmt.NumberDecimalSeparator;
str3 = numfmt.NumberGroupSeparator;
str4 = numfmt.CurrencyDecimalSeparator;
str5 = numfmt.CurrencyGroupSeparator;
flag = true;
}
else
{
str4 = numfmt.NumberDecimalSeparator;
str5 = numfmt.NumberGroupSeparator;
}
int num = 0;
char* ptr = str;
char c = *ptr;
while (true)
{
if (!Number.IsWhite(c) || (options & NumberStyles.AllowLeadingWhite) == NumberStyles.None || ((num & 1) != 0 && ((num & 1) == 0 || ((num & 32) == 0 && numfmt.numberNegativePattern != 2))))
{
bool flag2;
char* ptr2;
if ((flag2 = (((options & NumberStyles.AllowLeadingSign) == NumberStyles.None) ? false : ((num & 1) == 0))) && (ptr2 = Number.MatchChars(ptr, numfmt.positiveSign)) != null)
{
num |= 1;
ptr = ptr2 - (IntPtr)2 / 2;
}
else
{
if (flag2 && (ptr2 = Number.MatchChars(ptr, numfmt.negativeSign)) != null)
{
num |= 1;
number.sign = true;
ptr = ptr2 - (IntPtr)2 / 2;
}
else
{
if (c == '(' && (options & NumberStyles.AllowParentheses) != NumberStyles.None && (num & 1) == 0)
{
num |= 3;
number.sign = true;
}
else
{
if ((text == null || (ptr2 = Number.MatchChars(ptr, text)) == null) && (text2 == null || (ptr2 = Number.MatchChars(ptr, text2)) == null))
{
break;
}
num |= 32;
text = null;
text2 = null;
ptr = ptr2 - (IntPtr)2 / 2;
}
}
}
}
c = *(ptr += (IntPtr)2 / 2);
}
int num2 = 0;
int num3 = 0;
while (true)
{
if ((c >= '0' && c <= '9') || ((options & NumberStyles.AllowHexSpecifier) != NumberStyles.None && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'))))
{
num |= 4;
if (c != '0' || (num & 8) != 0)
{
if (num2 < 50)
{
number.digits[(IntPtr)(num2++)] = c;
if (c != '0' || parseDecimal)
{
num3 = num2;
}
}
if ((num & 16) == 0)
{
number.scale++;
}
num |= 8;
}
else
{
if ((num & 16) != 0)
{
number.scale--;
}
}
}
else
{
char* ptr2;
if ((options & NumberStyles.AllowDecimalPoint) != NumberStyles.None && (num & 16) == 0 && ((ptr2 = Number.MatchChars(ptr, str4)) != null || (flag && (num & 32) == 0 && (ptr2 = Number.MatchChars(ptr, str2)) != null)))
{
num |= 16;
ptr = ptr2 - (IntPtr)2 / 2;
}
else
{
if ((options & NumberStyles.AllowThousands) == NumberStyles.None || (num & 4) == 0 || (num & 16) != 0 || ((ptr2 = Number.MatchChars(ptr, str5)) == null && (!flag || (num & 32) != 0 || (ptr2 = Number.MatchChars(ptr, str3)) == null)))
{
break;
}
ptr = ptr2 - (IntPtr)2 / 2;
}
}
c = *(ptr += (IntPtr)2 / 2);
}
bool flag3 = false;
number.precision = num3;
number.digits[(IntPtr)num3] = '\0';
if ((num & 4) != 0)
{
if ((c == 'E' || c == 'e') && (options & NumberStyles.AllowExponent) != NumberStyles.None)
{
char* ptr3 = ptr;
c = *(ptr += (IntPtr)2 / 2);
char* ptr2;
if ((ptr2 = Number.MatchChars(ptr, numfmt.positiveSign)) != null)
{
c = *(ptr = ptr2);
}
else
{
if ((ptr2 = Number.MatchChars(ptr, numfmt.negativeSign)) != null)
{
c = *(ptr = ptr2);
flag3 = true;
}
}
if (c >= '0' && c <= '9')
{
int num4 = 0;
do
{
num4 = num4 * 10 + (int)(c - '0');
c = *(ptr += (IntPtr)2 / 2);
if (num4 > 1000)
{
num4 = 9999;
while (c >= '0' && c <= '9')
{
c = *(ptr += (IntPtr)2 / 2);
}
}
}
while (c >= '0' && c <= '9');
if (flag3)
{
num4 = -num4;
}
number.scale += num4;
}
else
{
ptr = ptr3;
c = *ptr;
}
}
while (true)
{
if (!Number.IsWhite(c) || (options & NumberStyles.AllowTrailingWhite) == NumberStyles.None)
{
bool flag2;
char* ptr2;
if ((flag2 = (((options & NumberStyles.AllowTrailingSign) == NumberStyles.None) ? false : ((num & 1) == 0))) && (ptr2 = Number.MatchChars(ptr, numfmt.positiveSign)) != null)
{
num |= 1;
ptr = ptr2 - (IntPtr)2 / 2;
}
else
{
if (flag2 && (ptr2 = Number.MatchChars(ptr, numfmt.negativeSign)) != null)
{
num |= 1;
number.sign = true;
ptr = ptr2 - (IntPtr)2 / 2;
}
else
{
if (c == ')' && (num & 2) != 0)
{
num &= -3;
}
else
{
if ((text == null || (ptr2 = Number.MatchChars(ptr, text)) == null) && (text2 == null || (ptr2 = Number.MatchChars(ptr, text2)) == null))
{
break;
}
text = null;
text2 = null;
ptr = ptr2 - (IntPtr)2 / 2;
}
}
}
}
c = *(ptr += (IntPtr)2 / 2);
}
if ((num & 2) == 0)
{
if ((num & 8) == 0)
{
if (!parseDecimal)
{
number.scale = 0;
}
if ((num & 16) == 0)
{
number.sign = false;
}
}
str = ptr;
return true;
}
}
str = ptr;
return false;
}
public static int Parse(string s)
{
return Number.ParseInt32(s, NumberStyles.Integer, NumberFormatInfo.CurrentInfo);
}
internal unsafe static int ParseInt32(string s, NumberStyles style, NumberFormatInfo info)
{
byte* stackBuffer = stackalloc byte[1 * 114 / 1];
Number.NumberBuffer numberBuffer = new Number.NumberBuffer(stackBuffer);
int result = 0;
Number.StringToNumber(s, style, ref numberBuffer, info, false);
if ((style & NumberStyles.AllowHexSpecifier) != NumberStyles.None)
{
if (!Number.HexNumberToInt32(ref numberBuffer, ref result))
{
throw new OverflowException(Environment.GetResourceString("Overflow_Int32"));
}
}
else
{
if (!Number.NumberToInt32(ref numberBuffer, ref result))
{
throw new OverflowException(Environment.GetResourceString("Overflow_Int32"));
}
}
return result;
}
private unsafe static void StringToNumber(string str, NumberStyles options, ref Number.NumberBuffer number, NumberFormatInfo info, bool parseDecimal)
{
if (str == null)
{
throw new ArgumentNullException("String");
}
fixed (char* ptr = str)
{
char* ptr2 = ptr;
if (!Number.ParseNumber(ref ptr2, options, ref number, info, parseDecimal) || ((ptr2 - ptr / 2) / 2 < str.Length && !Number.TrailingZeros(str, (ptr2 - ptr / 2) / 2)))
{
throw new FormatException(Environment.GetResourceString("Format_InvalidString"));
}
}
}
I've written this one for doubles, that doesn't create a temporary substring. It's meant to be used inside a JSON parser so it limits itself to how doubles can be represented in JSON according to http://www.json.org/.
It's not optimal yet because it requires you to know where the number begins and ends (begin and end parameters), so you'll have to traverse the length of the number twice to find out where it ends. It's still around 10-15x faster than double.Parse and it could be fairly easily modified that it finds the end inside the function which is then returned as an out parameter to know where you have to resume parsing the main string.
Used like so:
Parsers.TryParseDoubleFastStream("1", 0, 1, out j);
Parsers.TryParseDoubleFastStream("2.0", 0, 3, out j);
Parsers.TryParseDoubleFastStream("3.5", 0, 3, out j);
Parsers.TryParseDoubleFastStream("-4.5", 0, 4, out j);
Parsers.TryParseDoubleFastStream("50.06", 0, 5, out j);
Parsers.TryParseDoubleFastStream("1000.65", 0, 7, out j);
Parsers.TryParseDoubleFastStream("-10000.8600", 0, 11, out j);
Code can be found here:
https://gist.github.com/3010984 (would be too lengthy to post here).
And StandardFunctions.IgnoreChar is for my purpose as simple as:
public static bool IgnoreChar(char c)
{
return c < 33;
}
Paste all this code into C# and call Test(). This is as close as you can get to operating directly on the string array to parse numbers using C#. It is built for speed, not elegance. The ParseInt and ParseFloat function were created for an OpenGL graphics engine to import vectors from text-based 3d models. Parsing floats is a significant bottleneck in that process. This was as fast as I could make it.
using System.Diagnostics;
private void Test()
{
Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
int iterations = 1000;
// Build a string of 1000000 space separated numbers
for (var n = 0; n < iterations; n++)
{
if (n > 0)
sb.Append(' ');
sb.Append(n.ToString());
}
string numberString = sb.ToString();
// Time the process
sw.Start();
StringToInts(numberString, iterations);
//StringToFloats(numberString, iterations);
sw.Stop();
long proc1 = sw.ElapsedMilliseconds;
Console.WriteLine("iterations: {0} \t {1}ms", iterations, proc1);
}
private unsafe int[] StringToInts(string s, int length)
{
int[] ints = new int[length];
int index = 0;
int startpos = 0;
fixed (char* pStringBuffer = s)
{
fixed (int* pIntBuffer = ints)
{
for (int n = 0; n < s.Length; n++)
{
if (s[n] == ' ' || n == s.Length - 1)
{
if (n == s.Length - 1)
n++;
// pIntBuffer[index++] = int.Parse(new string(pStringBuffer, startpos, n - startpos));
pIntBuffer[index++] = ParseInt((pStringBuffer + startpos), n - startpos);
startpos = n + 1;
}
}
}
}
return ints;
}
private unsafe float[] StringToFloats(string s, int length)
{
float[] floats = new float[length];
int index = 0;
int startpos = 0;
fixed (char* pStringBuffer = s)
{
fixed (float* pFloatBuffer = floats)
{
for (int n = 0; n < s.Length; n++)
{
if (s[n] == ' ' || n == s.Length - 1)
{
if (n == s.Length - 1)
n++;
pFloatBuffer[index++] = ParseFloat((pStringBuffer + startpos), n - startpos); // int.Parse(new string(pStringBuffer, startpos, n - startpos));
startpos = n + 1;
}
}
}
}
return floats;
}
public static unsafe int ParseInt(char* input, int len)
{
int pos = 0; // read pointer position
int part = 0; // the current part (int, float and sci parts of the number)
bool neg = false; // true if part is a negative number
int* ret = stackalloc int[1];
while (pos < len && (*(input + pos) > '9' || *(input + pos) < '0') && *(input + pos) != '-')
pos++;
// sign
if (*(input + pos) == '-')
{
neg = true;
pos++;
}
// integer part
while (pos < len && !(input[pos] > '9' || input[pos] < '0'))
part = part * 10 + (input[pos++] - '0');
*ret = neg ? (part * -1) : part;
return *ret;
}
public static unsafe float ParseFloat(char* input, int len)
{
//float ret = 0f; // return value
int pos = 0; // read pointer position
int part = 0; // the current part (int, float and sci parts of the number)
bool neg = false; // true if part is a negative number
float* ret = stackalloc float[1];
// find start
while (pos < len && (input[pos] < '0' || input[pos] > '9') && input[pos] != '-' && input[pos] != '.')
pos++;
// sign
if (input[pos] == '-')
{
neg = true;
pos++;
}
// integer part
while (pos < len && !(input[pos] > '9' || input[pos] < '0'))
part = part * 10 + (input[pos++] - '0');
*ret = neg ? (float)(part * -1) : (float)part;
// float part
if (pos < len && input[pos] == '.')
{
pos++;
double mul = 1;
part = 0;
while (pos < len && !(input[pos] > '9' || input[pos] < '0'))
{
part = part * 10 + (input[pos] - '0');
mul *= 10;
pos++;
}
if (neg)
*ret -= (float)part / (float)mul;
else
*ret += (float)part / (float)mul;
}
// scientific part
if (pos < len && (input[pos] == 'e' || input[pos] == 'E'))
{
pos++;
neg = (input[pos] == '-'); pos++;
part = 0;
while (pos < len && !(input[pos] > '9' || input[pos] < '0'))
{
part = part * 10 + (input[pos++] - '0');
}
if (neg)
*ret /= (float)Math.Pow(10d, (double)part);
else
*ret *= (float)Math.Pow(10d, (double)part);
}
return (float)*ret;
}
So, are there built-in functions to parse directly from a substring within a string (i.e.
using the given start and length of a string) in order to avoid generating a temporary
string? If not, are there any libraries that provide efficient functions to do this?
It seems that you want to use a lexing buffer and a lexer, similar to what OCaml can provide with ocamllex and the Lexbuf buffer. (I cannot provide references for F#.)
If your benchmark involving a huge string of integers separated by other tokens is your typical case, it will work well. But in other situations, it could be impractical.
Not sure if this is any good, but have you tried something like:
var stringValues = input.split(" ");
var intValues = Array.ConvertAll(stringValues, s => int.Parse(s));

Delete all english letters in string

I need to delete all english letters in a string.
I wrote the following code:
StringBuilder str = new StringBuilder();
foreach(var letter in test)
{
if(letter >= 'a' && letter <= 'z')
continue;
str.Append(letter); }
What is the fastest way?
use Regex replace method, and give it [a-z]|[A-Z]
Try this:
var str = test.Where(item => item < 'A' || item > 'z' || (item > 'Z' && item < 'a'));
Use this method to do such execution....
public static string RemoveSpecialCharacters(string str)
{
StringBuilder sb = new StringBuilder();
foreach (char c in str)
{
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
continue;
else
sb.Append(c);
}
return sb.ToString();
}

Categories

Resources