How to split two strings and swap sliced parts in C#? - c#

I have an array and each index contains a string with four characters. I need to select a random point in the string and then slice stringaArray[0] and stringaArray[1] at the same point and swap their sliced parts and add these to splicedStringArray[0] and splicedStringArray[1].
I know how to use split in C# and I have been experimenting with this, but it will only split the string into characters, not parts. I ask this question because my way of thinking is to create lots of variables to hold temporary strings then add them to the splicedStringArray[].
Here is my latest attempt to find the start, middle and end of a string and hopefully copy whatever I want to variables to make new strings and then store these in the second array:
string s = stringaArray[0];
char[] charArray = s.ToCharArray();
int amount = charArray.Length;
int findMiddle = amount / 2 + 1;
int midchar = findMiddle - 1;
int findLast = amount - 1;
char fchar = charArray[0];
char mchar = charArray[midchar];
char lchar = charArray[findLast];
I was also looking at the StringBuilder class in C# and wondering if there was something there I could use, but I think I will spend a lot of time on this and develop the worst solution, so any advice on how to do this would be appreciated.

To split at an exact position, use String.Substring: Substring(0, offset) returns everything before the offset and Substring(offset) returns everything from the offset onward. The simplest solution is similar to this:
var offset = 1;
splicedStringArray[0] = stringArray[0].Substring(0, offset) + stringArray[1].Substring(offset);
splicedStringArray[1] = stringArray[1].Substring(0, offset) + stringArray[0].Substring(offset);
Disclaimer: the code is written without testing.
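The snippet above uses a fixed offset, while the question asks for a random cut point. A minimal sketch of that, assuming both strings have at least two characters; the names Crossover, SwapTails and RandomOffset are made up for illustration:

```csharp
using System;

public static class Crossover
{
    // Swaps the tails of a and b at the given offset.
    // The offset must not exceed the length of either string.
    public static (string First, string Second) SwapTails(string a, string b, int offset)
    {
        if (offset < 0 || offset > a.Length || offset > b.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        return (a.Substring(0, offset) + b.Substring(offset),
                b.Substring(0, offset) + a.Substring(offset));
    }

    // Picks a cut point strictly inside the shorter string,
    // so that both the head and the tail are non-empty.
    public static int RandomOffset(Random rng, string a, string b)
    {
        int shortest = Math.Min(a.Length, b.Length);
        if (shortest < 2)
            throw new ArgumentException("Both strings need at least two characters.");
        return rng.Next(1, shortest); // upper bound is exclusive
    }
}
```

With the four-character strings from the question, RandomOffset returns 1, 2 or 3, and SwapTails("ABCD", "WXYZ", 1) gives "AXYZ" and "WBCD".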

Related

Incremental counting and saving all values in one string

I'm having trouble thinking of a logical way to achieve this. I have a method which sends a web request with a for loop that is counting up from 1 to x, the request counts up until it finds a specific response and then sends the URL + number to another method.
After this, saying we got the number 5, I need to create a string which displays as "1,2,3,4,5" but cannot seem to find a way to create the entire string, everything I try is simply replacing the string and only keeping the last number.
string unionMod = string.Empty;
for (int i = 1; i <= count; i++)
{
unionMod =+ count + ",";
}
I assumed I'd be able to simply add each value onto the end of the string but the output is just "5," with it being the last number. I have looked around but I can't seem to even think of what I would search in order to get the answer, I have a hard-coded solution but ideally, I'd like to not have a 30+ string with each possible value and just have it created when needed.
Any pointers?
P.S: Any coding examples are appreciated but I've probably just forgotten something obvious so any directions you can give are much appreciated, I should sleep but I'm on one of those all-night coding grinds.
Thank you!
First of all, your problem is the =+, which should be +=. Beyond that, you should avoid concatenating strings in a loop because each concatenation allocates a new string. Use a StringBuilder instead.
Your Example: https://dotnetfiddle.net/Widget/qQIqWx
My Example: https://dotnetfiddle.net/Widget/sx7cxq
public static void Main()
{
var counter = 5;
var sb = new StringBuilder();
for(var i = 1; i <= counter; ++i) {
sb.Append(i);
if (i != counter) {
sb.Append(",");
}
}
Console.WriteLine(sb);
}
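As it's been pointed out, you should use += instead of =+. The latter is parsed as an assignment followed by a unary plus (unionMod = +count + ","), so each iteration overwrites the string with count and a comma, which is the incorrect result you experienced. Inside the loop you also want to append i rather than count.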
You could also simplify your code like this:
int count = 10;
string unionMod = String.Join(",", Enumerable.Range(1, count));
Enumerable.Range(start, count) generates count consecutive integers beginning at start (here 1 through count), and String.Join joins them up with the given separator.

Package 3 chars of Array into an integer C#

So, I have this string which contains a bunch of digits. I want to always take 3 of those chars and make one integer out of them. I haven't found anything on this yet.
here is an example:
string number = "123456xyz";
The string is what I have, these integers are what I want
int goal1 = 123;
int goal2 = 456;
int goaln = xyz;
It should go through all the chars and always split them into groups of three. I think foreach() is going to help me, but I'm not quite sure how to do it.
Something like this:
var goals = new List<int>();
for (int i = 0; i + 2 < number.Length; i += 3)
{
goals.Add(int.Parse(number.Substring(i,3)));
}
This has no error checking but it shows the general outline. Foreach isn't a great option because it would go through the characters one at a time when you want to look at them three at a time.
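For the missing error checking, one option is int.TryParse. A sketch of that, where the Chunker class and ParseGroups method are illustrative names, and the choice to stop at the first non-numeric group is an assumption about what you want:

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Parses consecutive groups of `size` characters into integers.
    // Stops at the first group that is not a valid integer and
    // ignores a trailing group shorter than `size`.
    public static List<int> ParseGroups(string number, int size = 3)
    {
        var goals = new List<int>();
        for (int i = 0; i + size <= number.Length; i += size)
        {
            if (!int.TryParse(number.Substring(i, size), out int value))
                break; // e.g. "xyz" in the question's sample string
            goals.Add(value);
        }
        return goals;
    }
}
```

For the string "123456xyz" from the question this yields 123 and 456 and then stops at "xyz".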
var numbers = (from Match m in Regex.Matches(number, @"\d{3}")
               select m.Value).ToList();
var goal1 = Convert.ToInt32(numbers[0]);
var goal2 = Convert.ToInt32(numbers[1]);
...

C# Search array within provided index points

I'm not sure how best to phrase this. I have a text file of almost 80,000 words which I have converted across to a string array.
Basically I want a method where I pass it a word and it checks if it's in the word string array. To save it searching 80,000 each time I have indexed the locations where the words beginning with each letter start and end in a two dimensional array. So wordIndex[0,0] = 0 when the 'A' words start and wordIndex[1,0] = 4407 is where they end. Then wordIndex[0,1] = 4408 which is where the words beginning with 'B' start etc.
What I would like to know is how can I present this range to a method to have it search for a value. I know I can give an index and length but is this the only way? Can I say look for x within range y and z?
Look at a trie. It can store many words using little memory and supports quick lookups. Here is a good implementation.
Basically you could use a for loop to search just a part of the array:
string word = "apple";
int start = 0;
int end = 4407;
bool found = false;
for (int i = start; i <= end ; i++)
{
if (arrayOfWords[i] == word)
{
found = true;
break;
}
}
But since the description of your index implies that your array is already sorted a better way might be to go with Array.BinarySearch<T>.
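Array.BinarySearch has an overload that takes a start index and a length, which maps directly onto your per-letter ranges. A sketch, assuming the array was sorted with ordinal comparison (the search has to use the same comparer as the sort); WordLookup and ContainsInRange are made-up names:

```csharp
using System;

public static class WordLookup
{
    // Searches words[start..end] (both inclusive) for word.
    // The range must already be sorted with StringComparer.Ordinal.
    public static bool ContainsInRange(string[] words, string word, int start, int end)
    {
        int length = end - start + 1;
        // BinarySearch returns a negative number when the value is not found.
        return Array.BinarySearch(words, start, length, word, StringComparer.Ordinal) >= 0;
    }
}
```

With your index you would call it as ContainsInRange(arrayOfWords, "apple", 0, 4407). Note that on a sorted array a binary search over all 80,000 words only needs about 17 comparisons, so the per-letter index saves very little.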

Changing the order of a string based on an array of numbers in C#

Thanks for the help with my question about making an array of booleans into a string. The code is now working good. Now I have another problem. Maybe somebody would like to try. I think I could come up with a solution but if so I'm 99 percent sure that it would be not so simple as the answers I have seen from people here.
What I have is the string "ABD" from my question here. I also have an array of integers. For example [0] = 2, [1] = 3 and [2] = 1. I would like to find a way to apply this to my string to reorder the string so that the string changes to BDA.
Can anyone think of a simple way to do this?
If those integers are 1-based indices within the string (i.e. 2 = 2nd character), then you could do this:
string s = "ABD";
int[] oneBasedIndices = new [] { 2, 3, 1 };
string result = String.Join(String.Empty, oneBasedIndices.Select(i => s[i-1]));
NB: If you are using a version less than C# 4.0, you need to put a .ToArray() on the end of the select.
What this is doing is going through your int[] and with each int element picking the character in the string at that position (well -1, as the first index in an array is 0, but your example starts at 1).
Then, it has to do a String.Join() to turn that collection of characters back into a String.
As an aside, I'd recommend downloading LINQPad (no connection) - then you can just paste that code in there as a C# Program, and at any point type variable.Dump(); (e.g. result.Dump(); at the end) and see what the value is at that point.
First copy the characters of the string into a char array. Strings in C# are immutable, so you cannot assign to orig[i]; the original string never changes and serves as your reference to what the values used to be.
Then loop through the array one position at a time using a for loop. The counter in the for loop is the position of the character we are replacing next. The counter is also the index into the integer array to look up the position in the original string. Then replace the character at that position in the char array with the character from the original string.
string orig = "ABD";
int[] oneBasedIndices = new [] { 2, 3, 1 };
char[] chars = orig.ToCharArray(); // writable copy; the string itself is read-only
for ( int i = 0; i < orig.Length; i++ )
{
    chars[i] = orig[ oneBasedIndices[i] - 1 ];
}
string result = new string(chars);
There you have it. If the indices are zero based, remove the - 1.
Napkin code again...
string result = "ABD"; // from last question
var indecies = new []{ 1,2,0 };
string result2 = indecies.Aggregate(new StringBuilder(),
(sb, i)=>sb.Append(result[i]))
.ToString();
or a different version (in hopes of redeeming myself for -1)
StringBuilder sb = new StringBuilder();
for(int i = 0; i < indecies.Length; i++)
{
sb.Append(result[indecies[i]]); // use indecies[i] - 1 if indecies are 1 based.
}
string result3 = sb.ToString();

Does any one know of a faster method to do String.Split()?

I am reading each line of a CSV file and need to get the individual values in each column. So right now I am just using:
values = line.Split(delimiter);
where line is a string that holds the values that are separated by the delimiter.
Measuring the performance of my ReadNextRow method I noticed that it spends 66% on String.Split, so I was wondering if someone knows of a faster method to do this.
Thanks!
The BCL implementation of string.Split is actually quite fast. I've done some testing here trying to outperform it and it's not easy.
But there's one thing you can do and that's to implement this as a generator:
public static IEnumerable<string> GetSplit( this string s, char c )
{
int l = s.Length;
int i = 0, j = s.IndexOf( c, 0, l );
if ( j == -1 ) // No such substring
{
yield return s; // Return original and break
yield break;
}
while ( j != -1 )
{
if ( j - i > 0 ) // Non empty?
{
yield return s.Substring( i, j - i ); // Return non-empty match
}
i = j + 1;
j = s.IndexOf( c, i, l - i );
}
if ( i < l ) // Has remainder?
{
yield return s.Substring( i, l - i ); // Return remaining trail
}
}
The above method is not necessarily faster than string.Split for small strings but it returns results as it finds them, this is the power of lazy evaluation. If you have long lines or need to conserve memory, this is the way to go.
The above method is bounded by the performance of IndexOf and Substring, which do a lot of index-out-of-range checking; to be faster you need to optimize these away and implement your own helper methods. You can beat the string.Split performance but it's going to take clever int-hacking. You can read my post about that here.
It should be pointed out that split() is a questionable approach for parsing CSV files in case you come across commas in the file eg:
1,"Something, with a comma",2,3
The other thing I'll point out without knowing how you profiled is be careful about profiling this kind of low level detail. The granularity of the Windows/PC timer might come into play and you may have a significant overhead in just looping so use some sort of control value.
That being said, split() is built to handle regular expressions, which are obviously more complex than you need (and the wrong tool to deal with escaped commas anyway). Also, split() creates lots of temporary objects.
So if you want to speed it up (and I have trouble believing that performance of this part is really an issue) then you want to do it by hand and you want to reuse your buffer objects so you're not constantly creating objects and giving the garbage collector work to do in cleaning them up.
The algorithm for that is relatively simple:
Stop at every comma;
When you hit quotes continue until you hit the next set of quotes;
Handle escaped quotes (ie \") and arguably escaped commas (\,).
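The steps above can be sketched in C# roughly as follows. This is not a full CSV parser. One deliberate difference from the list: it assumes a quote inside a quoted field is escaped by doubling it (""), which is what Excel-style files do, rather than with a backslash. CsvLine is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line on commas, treating text inside double quotes as literal.
    // Assumption: an embedded quote is written as "" (doubled), not as \".
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder(); // one builder reused for every field
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote becomes one literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    current.Append(c); // commas in here are part of the field
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

For the line 1,"Something, with a comma",2,3 this returns four fields, with the comma kept inside the second one. It does not handle quoted fields that span several lines.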
Oh and to give you some idea of the cost of regex, there was a question (Java not C# but the principle was the same) where someone wanted to replace every n-th character with a string. I suggested using replaceAll() on String. Jon Skeet manually coded the loop. Out of curiosity I compared the two versions and his was an order of magnitude better.
So if you really want performance, it's time to hand parse.
Or, better yet, use someone else's optimized solution like this fast CSV reader.
By the way, while this is in relation to Java it concerns the performance of regular expressions in general (which is universal) and replaceAll() vs a hand-coded loop: Putting char into a java string for each N characters.
Here's a very basic example using ReadOnlySpan. On my machine this takes around 150ns as opposed to string.Split() which takes around 250ns. That's a nice 40% improvement right there.
string serialized = "1577836800;1000;1";
ReadOnlySpan<char> span = serialized.AsSpan();
Trade result = new Trade();
int index = span.IndexOf(';');
result.UnixTimestamp = long.Parse(span.Slice(0, index));
span = span.Slice(index + 1);
index = span.IndexOf(';');
result.Price = float.Parse(span.Slice(0, index));
span = span.Slice(index + 1);
// No separator follows the last field, so parse the remainder directly.
result.Quantity = float.Parse(span);
return result;
Note that a ReadOnlySpan.Split() will soon be part of the framework. See
https://github.com/dotnet/runtime/pull/295
Depending on use, you can speed this up by using Pattern.split instead of String.split. If you have this code in a loop (which I assume you probably do since it sounds like you are parsing lines from a file) String.split(String regex) will call Pattern.compile on your regex string every time that statement of the loop executes. To optimize this, Pattern.compile the pattern once outside the loop and then use Pattern.split, passing the line you want to split, inside the loop.
Hope this helps
I found this implementation, which is 30% faster, on Dejan Pelzel's blog. I quote from there:
The Solution
With this in mind, I set to create a string splitter that would use an internal buffer similarly to a StringBuilder. It uses very simple logic of going through the string and saving the value parts into the buffer as it goes along.
public int Split(string value, char separator)
{
int resultIndex = 0;
int startIndex = 0;
// Find the mid-parts
for (int i = 0; i < value.Length; i++)
{
if (value[i] == separator)
{
this.buffer[resultIndex] = value.Substring(startIndex, i - startIndex);
resultIndex++;
startIndex = i + 1;
}
}
// Find the last part
this.buffer[resultIndex] = value.Substring(startIndex, value.Length - startIndex);
resultIndex++;
return resultIndex;
}
How To Use
The StringSplitter class is incredibly simple to use, as you can see in the example below. Just be careful to reuse the StringSplitter object and not create a new instance of it in loops or for one-time use. In that case it would be better to just use the built-in String.Split.
var splitter = new StringSplitter(2);
splitter.Split("Hello World", ' ');
if (splitter.Results[0] == "Hello" && splitter.Results[1] == "World")
{
Console.WriteLine("It works!");
}
The Split method returns the number of items found, so you can easily iterate through the results like this:
var splitter = new StringSplitter(2);
var len = splitter.Split("Hello World", ' ');
for (int i = 0; i < len; i++)
{
Console.WriteLine(splitter.Results[i]);
}
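The quoted text only shows the Split method and the calling code, not the whole class. A possible reconstruction of the rest, for illustration only: the buffer field, the Results property and the overflow check are assumptions, not the blog's actual code.

```csharp
using System;

// Hypothetical reconstruction; only Split, Results and the int constructor
// are visible in the quoted snippets.
public sealed class StringSplitter
{
    private readonly string[] buffer;

    public StringSplitter(int bufferSize)
    {
        buffer = new string[bufferSize];
    }

    // The reused buffer. Only the first N entries are valid,
    // where N is the value returned by the most recent Split call.
    public string[] Results => buffer;

    public int Split(string value, char separator)
    {
        int resultIndex = 0;
        int startIndex = 0;

        // Find the mid-parts
        for (int i = 0; i < value.Length; i++)
        {
            if (value[i] == separator)
            {
                Store(resultIndex++, value.Substring(startIndex, i - startIndex));
                startIndex = i + 1;
            }
        }

        // Find the last part
        Store(resultIndex++, value.Substring(startIndex));
        return resultIndex;
    }

    // Assumed behaviour: fail loudly when the line has more parts than slots.
    private void Store(int index, string part)
    {
        if (index >= buffer.Length)
            throw new InvalidOperationException("More parts than buffer slots.");
        buffer[index] = part;
    }
}
```

The saving comes from not allocating a new result array on every call. Each part is still a fresh string from Substring, so the gain is limited to the array.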
This approach has advantages and disadvantages.
You might think that there are optimizations to be had, but the reality will be you'll pay for them elsewhere.
You could, for example, do the split 'yourself' and walk through all the characters and process each column as you encounter it, but you'd be copying all the parts of the string in the long run anyhow.
One of the optimizations we could do in C or C++, for example, is replace all the delimiters with '\0' characters, and keep pointers to the start of the column. Then, we wouldn't have to copy all of the string data just to get to a part of it. But this you can't do in C#, nor would you want to.
If there is a big difference between the number of columns that are in the source, and the number of columns that you need, walking the string manually may yield some benefit. But that benefit would cost you the time to develop it and maintain it.
I've been told that 90% of the CPU time is spent in 10% of the code. There are variations to this "truth". In my opinion, spending 66% of your time in Split is not that bad if processing CSV is the thing that your app needs to do.
Dave
Some very thorough analysis on String.Split() vs Regex and other methods.
We are talking ms savings over very large strings though.
The main problem(?) with String.Split is that it's general, in that it caters for many needs.
If you know more about your data than Split would, it can make an improvement to make your own.
For instance, if:
You don't care about empty strings, so you don't need to handle those any special way
You don't need to trim strings, so you don't need to do anything with or around those
You don't need to check for quoted commas or quotes
You don't need to handle quotes at all
If any of these are true, you might see an improvement by writing your own more specific version of String.Split.
Having said that, the first question you should ask is whether this actually is a problem worth solving. Is the time taken to read and import the file so long that you actually feel this is a good use of your time? If not, then I would leave it alone.
The second question is why String.Split is using that much time compared to the rest of your code. If the answer is that the code is doing very little with the data, then I would probably not bother.
However, if, say, you're stuffing the data into a database, then 66% of the time of your code spent in String.Split constitutes a big big problem.
CSV parsing is actually fiendishly complex to get right, I used classes based on wrapping the ODBC Text driver the one and only time I had to do this.
The ODBC solution recommended above looks at first glance to be basically the same approach.
I thoroughly recommend you do some research on CSV parsing before you get too far down a path that nearly-but-not-quite works (all too common). The Excel thing of only double-quoting strings that need it is one of the trickiest to deal with in my experience.
As others have said, String.Split() will not always work well with CSV files. Consider a file that looks like this:
"First Name","Last Name","Address","Town","Postcode"
David,O'Leary,"12 Acacia Avenue",London,NW5 3DF
June,Robinson,"14, Abbey Court","Putney",SW6 4FG
Greg,Hampton,"",,
Stephen,James,"""Dunroamin"" 45 Bridge Street",Bristol,BS2 6TG
(e.g. inconsistent use of speechmarks, strings including commas and speechmarks, etc)
This CSV reading framework will deal with all of that, and is also very efficient:
LumenWorks.Framework.IO.Csv by Sébastien Lorion
This is my solution:
Public Shared Function FastSplit(inputString As String, separator As String) As String()
Dim kwds(1) As String
Dim k = 0
Dim tmp As String = ""
For l = 1 To inputString.Length ' Mid is 1-based, so the last character is at position Length
tmp = Mid(inputString, l, 1)
If tmp = separator Then k += 1 : tmp = "" : ReDim Preserve kwds(k + 1)
kwds(k) &= tmp
Next
Return kwds
End Function
Here is a version with benchmarking:
Public Shared Function FastSplit(inputString As String, separator As String) As String()
Dim sw As New Stopwatch
sw.Start()
Dim kwds(1) As String
Dim k = 0
Dim tmp As String = ""
For l = 1 To inputString.Length ' Mid is 1-based, so the last character is at position Length
tmp = Mid(inputString, l, 1)
If tmp = separator Then k += 1 : tmp = "" : ReDim Preserve kwds(k + 1)
kwds(k) &= tmp
Next
sw.Stop()
Dim fsTime As Long = sw.ElapsedTicks
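sw.Restart() ' Restart rather than Start: Start resumes the stopwatch, so Split's reading would include FastSplit's ticks
Dim strings() As String = inputString.Split(separator)
sw.Stop()
Debug.Print("FastSplit took " + fsTime.ToString + " whereas split took " + sw.ElapsedTicks.ToString)
Return kwds
End Function
Here are some results on relatively small strings but with varying sizes, up to 8kb blocks. (Times are in ticks. These figures appear to come from a run where the stopwatch was resumed with sw.Start() rather than reset, so each "split" value includes the FastSplit ticks before it.)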
FastSplit took 8 whereas split took 10
FastSplit took 214 whereas split took 216
FastSplit took 10 whereas split took 12
FastSplit took 8 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 10 whereas split took 12
FastSplit took 7 whereas split took 9
FastSplit took 6 whereas split took 8
FastSplit took 5 whereas split took 7
FastSplit took 10 whereas split took 13
FastSplit took 9 whereas split took 232
FastSplit took 7 whereas split took 8
FastSplit took 8 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 215 whereas split took 217
FastSplit took 10 whereas split took 231
FastSplit took 8 whereas split took 10
FastSplit took 8 whereas split took 10
FastSplit took 7 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 10 whereas split took 1405
FastSplit took 9 whereas split took 11
FastSplit took 8 whereas split took 10
Also, I know someone will discourage my use of ReDim Preserve instead of using a list... The reason is, the list really didn't provide any speed difference in my benchmarks so I went back to the "simple" way.
public static unsafe List<string> SplitString(char separator, string input)
{
List<string> result = new List<string>();
int i = 0;
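// Caution: this pointer addresses the string's own memory, so the writes below overwrite the caller's input string in place.
fixed(char* buffer = input)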
{
for (int j = 0; j < input.Length; j++)
{
if (buffer[j] == separator)
{
buffer[i] = (char)0;
result.Add(new String(buffer));
i = 0;
}
else
{
buffer[i] = buffer[j];
i++;
}
}
buffer[i] = (char)0;
result.Add(new String(buffer));
}
return result;
}
You can assume that String.Split will be close to optimal; i.e. it could be quite hard to improve on it. By far the easier solution is to check whether you need to split the string at all. It's quite likely that you'll be using the individual strings directly. If you define a StringShim class (reference to String, begin & end index) you'll be able to split a String into a set of shims instead. These will have a small, fixed size, and will not cause string data copies.
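A rough sketch of such a shim, assuming a begin index that is inclusive and an end index that is exclusive. The StringShim type and its members are illustrative and not part of the framework:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shim: a view onto part of a string rather than a copy of it.
public readonly struct StringShim
{
    public string Source { get; }
    public int Begin { get; } // inclusive
    public int End { get; }   // exclusive

    public StringShim(string source, int begin, int end)
    {
        Source = source;
        Begin = begin;
        End = end;
    }

    public int Length => End - Begin;

    // Reads a character through the shim without copying anything.
    public char this[int i] => Source[Begin + i];

    // Copies the characters only when a real string is actually needed.
    public override string ToString() => Source.Substring(Begin, Length);

    // Splits s into shims; no substring is allocated here.
    public static List<StringShim> Split(string s, char separator)
    {
        var shims = new List<StringShim>();
        int start = 0;
        int idx;
        while ((idx = s.IndexOf(separator, start)) != -1)
        {
            shims.Add(new StringShim(s, start, idx));
            start = idx + 1;
        }
        shims.Add(new StringShim(s, start, s.Length)); // remainder after the last separator
        return shims;
    }
}
```

If you only need a few of the columns, you call ToString() on just those shims and the rest are never copied. On newer runtimes, ReadOnlyMemory<char> (via string.AsMemory()) plays a similar role.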
String.split is rather slow, if you want some faster methods, here you go. :)
However CSV is much better parsed by a rule based parser.
This guy, has made a rule based tokenizer for java. (requires some copy and pasting unfortunately)
http://www.csdgn.org/code/rule-tokenizer
private static final String[] fSplit(String src, char delim) {
ArrayList<String> output = new ArrayList<String>();
int index = 0;
int lindex = 0;
while((index = src.indexOf(delim,lindex)) != -1) {
output.add(src.substring(lindex,index));
lindex = index+1;
}
output.add(src.substring(lindex));
return output.toArray(new String[output.size()]);
}
private static final String[] fSplit(String src, String delim) {
ArrayList<String> output = new ArrayList<String>();
int index = 0;
int lindex = 0;
while((index = src.indexOf(delim,lindex)) != -1) {
output.add(src.substring(lindex,index));
lindex = index+delim.length();
}
output.add(src.substring(lindex));
return output.toArray(new String[output.size()]);
}
