I have a string of length 98,975,333 and I need to remove the first 5 characters from it. Can anyone suggest the best way to do this with performance in mind?
I tried
str.Substring(5);
str.Remove(0,5);
which gives me the result in 0.29 sec,
but I want something even faster.
Problem using StringBuilder
-> I need to take a substring of part of the string, and to do this I need to write
StringBuilder2.ToString().Substring(anyvaluehere)
Here the conversion of the StringBuilder to a string by ".ToString()" takes time, so in this case I can't use StringBuilder.
If you are working with long strings, always use StringBuilder. This class lets you add and remove characters quickly, faster than String.Concat or its syntactic sugar "a" + "b". Moreover, the StringBuilder.ToString() method has a special implementation tuned for performance.
Sorry, C# strings are not arrays; they are immutable, so extracting a (possibly very long) substring involves a copy.
However, most string utilities accept start and end indices; for instance, IndexOf and CompareInfo.Compare both have overloads that take a start index.
Perhaps if you tell us what you want to do afterward we could suggest alternatives?
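As a rough illustration of the start-index overloads mentioned above, here is a minimal sketch that works on the text after a prefix without ever copying it. The class and method names are hypothetical; only String.IndexOf(char, int) and String.CompareOrdinal(string, int, string, int, int) are real framework calls.

```csharp
using System;

public static class SkipPrefixDemo
{
    // Hypothetical helper: finds the first comma at or after position `skip`
    // by passing a start index, so no substring is allocated.
    public static int IndexOfCommaAfterPrefix(string text, int skip)
    {
        return text.IndexOf(',', skip);
    }

    // Hypothetical helper: checks whether the text that follows the first
    // `skip` characters begins with `expected`, comparing in place.
    public static bool StartsWithAfterPrefix(string text, int skip, string expected)
    {
        if (skip < 0 || text.Length - skip < expected.Length)
            return false;
        return string.CompareOrdinal(text, skip, expected, 0, expected.Length) == 0;
    }
}
```

Whether this helps depends on what is done with the string afterwards; if a real string without the prefix is required, a copy is unavoidable.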
Update
Here are some ways you can write performant string parsing with immutable strings in C#. Say, for instance, that you need to deserialize XML data inside the string and need to skip the first N characters. You could do something like this:
public static T XmlDeserializeFromString<T>(this string objectData, int skip)
{
    var serializer = new XmlSerializer(typeof(T));
    using (var reader = new StringReader(objectData))
    {
        // Advance past the first `skip` characters; no substring is allocated.
        for (; skip > 0 && reader.Read() != -1; skip--)
            ;
        return (T)serializer.Deserialize(reader);
    }
}
As you can see from the source, StringReader.Read() does not make a copy of the unread portion of the string; it keeps an internal index to the remaining unread portion.
Or say you want to skip the first N characters of a string, then parse the string by splitting it at every "," character. You could write something like this:
public static IEnumerable<Pair<int>> WalkSplits(this string str, int startIndex, int count, params char[] separator)
{
if (string.IsNullOrEmpty(str))
yield break;
var length = str.Length;
int endIndex;
if (count < 0)
endIndex = length;
else
{
endIndex = startIndex + count;
if (endIndex > length)
endIndex = length;
}
while (true)
{
int nextIndex = str.IndexOfAny(separator, startIndex, endIndex - startIndex);
if (nextIndex == startIndex)
{
startIndex = nextIndex + 1;
}
else if (nextIndex == -1)
{
if (startIndex < endIndex)
yield return new Pair<int>(startIndex, endIndex - 1);
yield break;
}
else
{
yield return new Pair<int>(startIndex, nextIndex - 1);
startIndex = nextIndex + 1;
}
}
}
And then use the start and end indices of the Pair to further parse the string, or extract small substrings to feed to further parsing methods.
(Pair<T> is a small struct I created similar to KeyValuePair<TKey, TValue> but with identically typed first and second values. I can provide if needed.)
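For readers who want something to compile against, here is one possible shape for such a struct. This is my own guess at a minimal Pair<T>, not the author's actual type; the member names First and Second and the Slice helper are assumptions. Slice treats the second value as an inclusive end index, matching how WalkSplits yields (startIndex, nextIndex - 1).

```csharp
using System;

// Hypothetical stand-in for the Pair<T> mentioned above: two values of the same type.
public readonly struct Pair<T>
{
    public Pair(T first, T second)
    {
        First = first;
        Second = second;
    }

    public T First { get; }
    public T Second { get; }

    public override string ToString() => $"({First}, {Second})";
}

public static class PairExtensions
{
    // Extracts the small substring described by an inclusive (start, end) index pair.
    public static string Slice(this string text, Pair<int> range)
    {
        return text.Substring(range.First, range.Second - range.First + 1);
    }
}
```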
Using a StringBuilder to produce and manipulate the string will help you save on resources:
StringBuilder sb = new StringBuilder();
sb.Append("text"); // to add text at the end
sb.Insert(50,"text"); // to insert text
sb.Remove(50,4); // to remove text
sb.ToString(); // to produce the string
If you have a fixed length of string that you wish to store elsewhere, you can make a char array and use StringBuilder's CopyTo() method:
e.g.
char[] firstfive = new char[5];
sb.CopyTo(0,firstfive,0,5);
Edit:
Actually, the OP figured this out himself, but I'm including it on the post for reference:
To get a portion of the StringBuilder as string:
sb.ToString(intStart,intLength)
Use String.Remove(), i.e.
string newStr = str.Remove(0, 5); // deletes 5 characters starting at index 0
Note that the single-argument overload is not equivalent:
newStr = str.Remove(5); // deletes everything from index 5 to the end, keeping only the first 5 characters
Problem statement:
Using just the ElementAt, Length, and Substring string methods and the + (concatenate)
operator, write a function that accepts a string s, a start position p, and a length l, and returns s with the characters starting in position p for a length of l removed. Don’t forget that strings start at position 0. Thus (“abcdefghijk”, 2, 4) returns “abghijk”. Don’t use any “remove” or similar built-in string gadget.
I tried to do this
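static string rstring(string str, int p, int l)
{
    string end = "";
    // copy the characters before position p
    for (int i = 0; i < p; i++) {
        end += str[i];
    }
    // copy the characters after the removed range
    for (int i = p + l; i < str.Length; i++) {
        end += str[i];
    }
    return end;
}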
I tried to do this, but I couldn't figure out how to use ElementAt and Substring. Any help will be appreciated.
You're using [] which is essentially the same thing as ElementAt, and if you look at your loops they're basically doing the same thing as Substring, albeit less efficiently because you're building up and throwing away a bunch of intermediate strings.
That said, I don't see why you'd use both -- you'd just use one or the other.
If you don't want to use Substring() or Remove() or any other string manipulation, I would use a simple loop (this doesn't include error handling, but it should).
string end = "";
for (int i = 0; i < str.Length; i++)
{
    // keep every character outside the removed range [p, p + l)
    if (i < p || i >= p + l) end += str[i];
}
This accepts a string, a start position and a length, and builds a new string without the characters in that range.
This is probably the easiest way to explain Substring. It accepts a start index and a length (not an end index) and gives you back a new string.
public string ReturnSubstring(string str, int start, int length)
{
    return str.Substring(start, length);
}
// To remove l characters starting at position p, keep what comes before p and what comes after p + l.
var firstHalf = ReturnSubstring(myString, 0, p);
var secondHalf = ReturnSubstring(myString, p + l, myString.Length - (p + l));
var newString = firstHalf + secondHalf;
I have a large string received from a TCP listener, in the following format:
"1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857:2/1/2012,234234234:3,7620257787,01234343456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:34,76202343457787,012434343456789,93339,34340922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036"
You can see that this is a ':'-separated string containing records, each of which consists of comma-separated fields.
I am looking for the best (fastest) way to split the string into a given number of chunks, taking care that each chunk contains only full records (a string up to ':'),
or, put another way, there should not be any chunk that does not end with ':'.
E.g. a 20 MB string split into 4 chunks of 5 MB each with complete records (so the size of each chunk may not be exactly 5 MB but very near to it, and the total of all 4 chunks will be 20 MB).
I hope you can understand my question (sorry for the bad English).
I like the following link, but it does not take care of full records while splitting, and I also don't know whether it is the best and fastest way.
Split String into smaller Strings by length variable
I don't know how large a 'large string' is, but initially I would just try it with the String.Split method.
The idea is to divide the length of your data by the number of blocks required, then look backwards from each block boundary for the last separator in the current block.
private string[] splitToBlocks(string data, int numBlocks, char sep)
{
// We return an array of the request length
if (numBlocks <= 1 || data.Length == 0)
{
return new string [] { data };
}
string[] result = new string[numBlocks];
// The optimal size of each block
int blockLen = (data.Length / numBlocks);
int idx = 0; int pos = 0; int lastSepPos = blockLen;
while (idx < numBlocks)
{
// Search backwards for the first sep starting from the lastSepPos
char c = data[lastSepPos];
while (c != sep) { lastSepPos--; c = data[lastSepPos]; }
// Get the block data in the result array
result[idx] = data.Substring(pos, (lastSepPos + 1) - pos);
// Reposition for then next block
idx++;
pos = lastSepPos + 1;
if(idx == numBlocks-1)
lastSepPos = data.Length - 1;
else
lastSepPos = blockLen * (idx + 1);
}
return result;
}
Please test it. I have not fully tested for fringe cases.
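For the fringe cases, here is a more defensive sketch of the same backward-search idea. It is an illustration only, and it makes assumptions the code above does not: the names RecordChunker and SplitToBlocks are mine, a record longer than one block is handled by searching forward instead, data that does not end with the separator keeps its final record in the last chunk, and fewer than numBlocks chunks may be returned when there are not enough separators.

```csharp
using System;
using System.Collections.Generic;

public static class RecordChunker
{
    public static string[] SplitToBlocks(string data, int numBlocks, char sep)
    {
        if (numBlocks <= 1 || data.Length == 0)
            return new[] { data };

        var result = new List<string>(numBlocks);
        int blockLen = data.Length / numBlocks;   // ideal size of each block
        int pos = 0;                              // start of the current block

        // Cut the first numBlocks - 1 blocks; whatever remains is the last block.
        for (int idx = 0; idx < numBlocks - 1 && pos < data.Length; idx++)
        {
            int target = Math.Min(blockLen * (idx + 1), data.Length - 1);

            // Look backwards from the ideal boundary, but never before pos.
            int sepPos = target >= pos
                ? data.LastIndexOf(sep, target, target - pos + 1)
                : -1;

            // No separator inside the block: the record is longer than the
            // block, so extend forward to the next separator instead.
            if (sepPos < 0)
                sepPos = data.IndexOf(sep, Math.Max(pos, target));
            if (sepPos < 0)
                break;                            // no separators left at all

            result.Add(data.Substring(pos, sepPos + 1 - pos));
            pos = sepPos + 1;
        }

        if (pos < data.Length)
            result.Add(data.Substring(pos));      // remainder, with or without a trailing sep

        return result.ToArray();
    }
}
```

Concatenating the returned chunks always gives back the original string, which makes a convenient sanity check.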
OK, I suggest a two-step approach:
Split the string into chunks (see below)
Check the chunks for completeness
Splitting the string into chunks with the help of LINQ (extension method taken from "Split a collection into `n` parts with LINQ?"):
string tcpstring = "chunk1 : chunck2 : chunk3: chunk4 : chunck5 : chunk6";
int numOfChunks = 4;
var chunks = (from string z in (tcpstring.Split(':').AsEnumerable()) select z).Split(numOfChunks);
List<string> result = new List<string>();
foreach (IEnumerable<string> chunk in chunks)
{
result.Add(string.Join(":",chunk));
}
.......
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}
Have I understood your aim correctly?
[EDIT]
In my opinion, if performance is a concern, it is better to use the String.Split method for chunking.
It seems you want to split on ":" (you can use the Split method).
Then you have to add ":" back to the end of each chunk after splitting.
(You can then split each of those strings on ",".)
int index = yourstring.IndexOf(":");
string whatever = yourstring.Substring(0, index); // the first record, without the ':'
yourstring = yourstring.Substring(index + 1); // the rest of the string, after the ':'
This is only a general example; all you need to do is wrap it in a loop that runs while a ":" character is still found. Cheers.
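The loop described above could look roughly like the following. This is a sketch under my own assumptions: the names RecordReader and ReadRecords are hypothetical, and it advances a start index with IndexOf rather than repeatedly re-assigning the remaining string, so the large input is never copied as a whole.

```csharp
using System;
using System.Collections.Generic;

public static class RecordReader
{
    // Yields the comma-separated fields of each ':'-terminated record in turn.
    public static IEnumerable<string[]> ReadRecords(string data)
    {
        int start = 0;
        while (start < data.Length)
        {
            int end = data.IndexOf(':', start);
            if (end < 0)
                end = data.Length;   // last record has no trailing ':'

            // Only one record at a time is copied out of the large string.
            yield return data.Substring(start, end - start).Split(',');
            start = end + 1;
        }
    }
}
```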
When I have a string that I want to cut into a new string from a certain Index to a certain Index, which function do I use?
If the string was:
ABCDEFG
This would mean retrieving BCD when the two indexes specified were 1 and 3.
If endIndex points to the last character that you want to have included in the extracted substring:
int length = endIndex - startIndex + 1;
string extracted = s.Substring(startIndex, length);
If endIndex points to the first character following the desired substring (i.e. to the start of the remaining text):
int length = endIndex - startIndex;
string extracted = s.Substring(startIndex, length);
See String.Substring Method (Int32, Int32) for the official description on Microsoft Docs.
Since C# 8.0, in .NET Core and .NET 5+ only, you can use Indices and ranges
string extracted = s[startIndex..endIndex];
where the position at endIndex is excluded. This corresponds to my second example with Substring where endIndex points to the first character following the desired substring (i.e. to the start of the remaining text).
If endIndex is intended to point to the last character that you want to have included, just add one to endIndex:
string extracted = s[startIndex..(endIndex + 1)];
This becomes possible with the new Range feature of C# 8.0.
An extension method on string that uses Range to achieve this is:
public static class StringExtensions
{
    public static string SubstringByIndexes(this string value, int startIndex, int endIndex)
    {
        // int converts implicitly to Index, so the Range constructor accepts plain ints.
        var r = new Range(startIndex, endIndex + 1);
        return value[r];
        /*
        // The content of this method can be simplified down to:
        return value[startIndex..(endIndex + 1)];
        // by using a 'Range Expression' instead of constructing the Range 'long hand'
        */
    }
}
Note: 1 is added to endIndex when constructing the Range because the end of a range is exclusive rather than inclusive.
Which can be called like this:
var someText = "ABCDEFG";
var substring = someText.SubstringByIndexes(1, 3);
Giving a value of BCD in substring.
Unfortunately, C# doesn't natively have what you need. C# offers Substring(int startIndex, int length) instead. To achieve Substring(int startIndex, int endIndex), you will need a custom implementation. The following extension method makes it easier and cleaner to reuse:
public static class Extensions
{
public static string Substring2(this string value, int startIndex, int endIndex)
{
return value.Substring(startIndex, (endIndex - startIndex + 1));
}
}
There are two ways to take a substring of a string:
1 )
public string Substring(
int startIndex
)
Retrieves a substring from this instance. The substring starts at a specified character position.
2)
public string Substring(
int startIndex,
int length
)
Retrieves a substring from this instance. The substring starts at a specified character position and has a specified length.
I'm currently making a game but I seem to have problems reading values from a text file. For some reason, when I read the value, it gives me the ASCII code of the value rather than the actual value itself when I wrote it to the file. I've tried about every ASCII conversion function and string conversion function, but I just can't seem to figure it out.
I use a 2D array of integers. I use a nested for loop to write each element into the file. I've looked at the file and the values are correct, but I don't understand why it's returning the ASCII code. Here's the code I'm using to write and read to file:
Writing to file:
for (int i = 0; i < level.MaxRows(); i++)
{
for (int j = 0; j < level.MaxCols(); j++)
{
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
//Console.WriteLine(level.GetValueAtIndex(i, j));
}
//add new line
fileWrite.WriteLine();
}
And here's the code where I read the values from the file:
string str = "";
int iter = 0; //used to iterate in each column of array
for (int i = 0; i < level.MaxRows(); i++)
{
iter = 0;
//TODO: For some reason, the file is returning ASCII code, convert to int
//keep reading characters until a space is reached.
str = fileRead.ReadLine();
//take the above string and extract the values from it.
//Place each value in the level.
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)id;
level.ChangeTile(i, iter, num);
iter++;
}
}
This is the latest version of the loop that I use to read the values. Reading other values is fine; it's just when I get to the array, things go wrong. I guess my question is, why did the conversion to ASCII happen? If I can figure that out, then I might be able to solve the issue. I'm using XNA 4 to make my game.
This is where the conversion to ASCII is happening:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
The + operator implicitly converts the integer returned by GetValueAtIndex into a string, because you are adding it to a string (really, what did you expect to happen?)
Furthermore, the ReadLine method returns a String, so I am not sure why you'd expect a numeric value to magically come back here. If you want to write binary data, look into BinaryWriter
This is where you are converting the characters to character codes:
num = (int)id;
The id variable is a char, and casting that to int gives you the character code, not the numeric value.
Also, this converts a single character, not a whole number. If you for example have "12 34 56 " in your text file, it will get the codes for 1, 2, 3, 4, 5 and 6, not 12, 34 and 56.
You would want to split the line on spaces, and parse each substring:
foreach (string id in str.Split(' ')) {
if (id.Length > 0) {
num = Int32.Parse(id);
level.ChangeTile(i, iter, num);
iter++;
}
}
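To make the difference concrete, here is a small self-contained sketch contrasting the three conversions discussed in this thread. The class and method names are hypothetical; the calls themselves (the char-to-int cast, char subtraction, String.Split with StringSplitOptions.RemoveEmptyEntries, and int.Parse) are standard.

```csharp
using System;

public static class DigitParsingDemo
{
    // Casting a char to int yields its character code, e.g. '5' becomes 53.
    public static int CharCode(char c) => (int)c;

    // Subtracting '0' yields the numeric value, but only for a single digit.
    public static int DigitValue(char c) => c - '0';

    // Splitting on spaces and parsing each piece handles multi-digit numbers.
    // RemoveEmptyEntries drops the empty piece caused by a trailing space.
    public static int[] ParseLine(string line)
    {
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var values = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
            values[i] = int.Parse(parts[i]);
        return values;
    }
}
```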
Update: I've kept the old code (below) with the assumption that one record was on each line, but I've also added a different way of doing it that should work with multiple integers on a line, separated by a space.
Multiple records on one line
str = fileRead.ReadLine();
string[] values = str.Split(new Char[] {' '});
foreach (string value in values)
{
    int testNum;
    // parse each piece, not the whole line; empty pieces simply fail to parse
    if (Int32.TryParse(value, out testNum))
    {
        // again, not sure how you're using iter here
        level.ChangeTile(i, iter, testNum);
    }
}
One record per line
str = fileRead.ReadLine();
int testNum;
if (Int32.TryParse(str, out testNum))
{
    // however, I'm not sure how you're using iter here; if it's related to
    // parsing the string, you'll probably need to do something else
    level.ChangeTile(i, iter, testNum);
}
Please note that the above should work if you write out each integer line-by-line (i.e. how you were doing it via the WriteLine which you remarked out in your code above). If you switch back to using a WriteLine, this should work.
You have:
foreach (char id in str)
{
//convert id to an int
num = (int)id;
A char is an ASCII code (or can be considered as such; technically it is a UTF-16 code unit, but for digits and other ASCII characters the values are the same).
What you want is:
num = (int)(id - '0');
This:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
converts the int returned from level.GetValueAtIndex(i, j) into a string. Assuming the function returns the value 5 for a particular i and j then you write "5 " into the file.
When you then read it is being read as a string which consists of chars and you get the ASCII code of 5 when you cast it simply to an int. What you need is:
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)(id - '0'); // subtract the ASCII value for 0 from your current id
level.ChangeTile(i, iter, num);
iter++;
}
}
However this only works if you only ever are going to have single digit integers (only 0 - 9). This might be better:
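// RemoveEmptyEntries skips the empty piece left by the trailing space on each line
foreach (var cell in fileRead.ReadLine().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
{
    num = int.Parse(cell);
    level.ChangeTile(i, iter, num);
    iter++;
}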
I need to get the letters between two given letters as an array, using C#.
For example: when I pass "AE", I need to get {A,B,C,D,E} as an array, and passing "FJ" should return {F,G,H,I,J}.
The Enumerable class can create a range, which makes the looping simple:
public static char[] CharactersBetween(char start, char end) {
return Enumerable.Range(start, end - start + 1).Select(c => (char)c).ToArray();
}
Note: A char value converts implicitly into int, so there is no conversion needed in that direction. You only have to convert the integers back to char.
Edit:
If you want to send in the alphabet to use (to handle language differences), you can use Substring to get a part of that string:
public static char[] CharactersBetween(char start, char end, string alphabet) {
int idx = alphabet.IndexOf(start);
return alphabet.Substring(idx, alphabet.IndexOf(end) - idx + 1).ToCharArray();
}
Do you mean something like
char[] CharactersBetween(char start, char end)
{
List<char> result = new List<char>();
for (char i = start; i <= end; i++)
{
result.Add(i);
}
return result.ToArray();
}
This should work out well
string startandend = "AG";
string result= "";
for( char i = startandend[0]; i <= startandend[1]; i++){
result += i;
}
result will now contain ABCDEFG.
You should probably add some logic to check that startandend actually has a Length of 2 and so on, but this should be a good starting point for you.
If you want the char[] instead of the string representation, simply call result.ToCharArray() at the end.
Use a loop with integer conversion
with
System.Convert.ToInt32(Char);
and
System.Convert.ToChar(Int32);
see http://msdn.microsoft.com/en-us/library/system.convert_methods.aspx
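A minimal sketch of that loop, assuming the two letters arrive as a two-character string such as "AE" and that code-point order is the alphabet you want. The names LetterRange and LettersBetween are hypothetical; Convert.ToInt32(char) and Convert.ToChar(int) are the framework methods named above.

```csharp
using System;

public static class LetterRange
{
    public static char[] LettersBetween(string firstLast)
    {
        if (firstLast == null || firstLast.Length != 2)
            throw new ArgumentException("Expected exactly two characters.", nameof(firstLast));

        int start = Convert.ToInt32(firstLast[0]);
        int end = Convert.ToInt32(firstLast[1]);
        if (end < start)
            throw new ArgumentException("The second letter must not come before the first.", nameof(firstLast));

        // Walk the character codes from start to end and convert each back to a char.
        var letters = new char[end - start + 1];
        for (int code = start; code <= end; code++)
            letters[code - start] = Convert.ToChar(code);
        return letters;
    }
}
```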
Pretty simple if you use a fixed alphabet,
public static string ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static List<char> GetLetters(string firstLast)
{
List<char> letters = new List<char>();
int start = ALPHABET.IndexOf(firstLast[0]);
int end = ALPHABET.IndexOf(firstLast[1]);
for (int i = start; i <= end; i++)
{
letters.Add(ALPHABET[i]);
}
return letters;
}
Obviously add in your checks for various things, but it does the basic job.
You're going to have to reference whichever alphabet you want to use. English is easy enough, as the letters happen to correspond to code-point order; French sometimes treats Œ and Æ as letters in their own right and sometimes does not. Danish and Norwegian place "Æ, Ø, Å" after Z, and Swedish does the same with "Å, Ä, Ö". Irish uses "ABCDEFGHILMNOPRSTU" as the alphabet, but does also use J, K, Q, V, W, X, Y and Z in loan words.
And those are relatively easy cases. So there's no one-size-fits-all.
The easiest way to pass an alphabet is to have a string that contains it. So, e.g. the Danish alphabet would have the string "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ" while French could either include the ligatures or not as you wish (but do you need to deal with the possibility of receiving them while not using them?).
This done:
public static IEnumerable<char> AlphabetRange(string alphabet, string alphaRange)
{
if(alphaRange == null)
throw new ArgumentNullException();
if(alphaRange.Length < 2)
throw new ArgumentException();
int startIdx = alphabet.IndexOf(alphaRange[0]);
int endIdx = alphabet.IndexOf(alphaRange[1]) + 1;
if(startIdx == -1 || endIdx == 0)
throw new ArgumentException();
while(startIdx < endIdx)
yield return alphabet[startIdx++];
}