Efficient way to extract just one element from a delimited string

Efficient way to extract just one element from a delimited string - c#

NOTE: I did ask the same question here but since some people have marked it as duplicate though it had some crafty, neat solutions, I had to create this extra(dupe) question to make it easier for others who are facing similar doubts. Added the question based on the suggestion of fellow stack overflow members.
What is the efficient way to parse through a large delimited string so that I can access just one element from the delimited set without having to store the other substrings involved?
I specifically am not interested in storing the rest of the element values as done when using Split() method since all of this information is irrelevant to the problem at hand. Also, I want to save memory in doing the same.
Problem Statement:
Given the exact delimited position, I need to extract the element contained in that given position in the most efficient way in terms of memory consumed and time taken.
Simple example string: "1,2,3,4,....,21,22,23,24"
Delimter: ,
Delimited Position: 22
Answer expected: 23
Another example string:
"61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;""0x8D15A2913C934DE"";Thursday, 19-Jun-14 22:58:10 GMT;"
Delimiter: ;
Delimited Position: 7
Expected Answer: 23

There are some useful remarks relevant to this problem in the documentation for String.Split, although I wrote the following before discovering that.
One way to do it is to find a delimiter with String.IndexOf method - you can specify the index to start the search from, so it is possible to skip along the items without having to examine every character. (The examination of every character happens behind the scenes, but it's a little bit faster than doing it yourself.)
I made up an extension method by adding a new class named "ExtensionMethods.cs" to the solution with this content:
namespace ExtensionMethods
{
public static class MyExtensions
{
/// <summary>
/// Get the nth item from a delimited string.
/// </summary>
/// <param name="s">The string to retrieve a delimited item from.</param>
/// <param name="delimiter">The character used as the item delimiter.</param>
/// <param name="n">Zero-based index of item to return.</param>
/// <returns>The nth item or an empty string.</returns>
public static string Split(this string s, char delimiter, int n)
{
int pos = pos = s.IndexOf(delimiter);
if (n == 0 || pos < 0)
{ return (pos >= 0) ? s.Substring(0, pos) : s; }
int nDelims = 1;
while (nDelims < n && pos >= 0)
{
pos = s.IndexOf(delimiter, pos + 1);
nDelims++;
}
string result = "";
if (pos >= 0)
{
int nextDelim = s.IndexOf(delimiter, pos + 1);
result = (nextDelim < 0) ? s.Substring(pos + 1) : s.Substring(pos + 1, nextDelim - pos - 1);
}
return result;
}
}
}
And a small program to test it:
using System;
using System.Diagnostics;
using System.Linq;
using ExtensionMethods;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
// test data...
string s = string.Join(";", Enumerable.Range(65, 26).Select(c => (char)c));
s = s.Insert(3, ";;;");
string o = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';')[21];
}
sw.Stop();
Console.WriteLine("Item from split array: " + sw.ElapsedMilliseconds + "\r\n");
Console.WriteLine(s);
Console.WriteLine(o);
Console.ReadLine();
}
}
}
Sample output:
Item directly selected: 1016
Item from split array: 1345
A;B;;;;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z
S
Reference: How to: Implement and Call a Custom Extension Method (C# Programming Guide)

try this:
public static string MyExtension(this string s, char delimiter, int n)
{
var begin = n== 0 ? 0 : Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n);
if (begin == -1)
return null;
var end = s.IndexOf(delimiter, begin + (n==0?0:1));
if (end == -1 ) end = s.Length;
//var end = Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n + 1);
var result = s.Substring(begin +1, end - begin -1 );
return result;
}
PS: Library used is Westwind.Utilities
Benchmark Code:
void Main()
{
string s = string.Join(";", Enumerable.Range(65, 26).Select(c => (char)c));
s = s.Insert(3, ";;;");
string o = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.MyExtension(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected by MyExtension: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';')[21];
}
sw.Stop();
Console.WriteLine("Item from split array: " + sw.ElapsedMilliseconds + "\r\n");
Console.WriteLine(s);
Console.WriteLine(o);
}
public static class MyExtensions
{
/// <summary>
/// Get the nth item from a delimited string.
/// </summary>
/// <param name="s">The string to retrieve a delimited item from.</param>
/// <param name="delimiter">The character used as the item delimiter.</param>
/// <param name="n">Zero-based index of item to return.</param>
/// <returns>The nth item or an empty string.</returns>
public static string Split(this string s, char delimiter, int n)
{
int pos = pos = s.IndexOf(delimiter);
if (n == 0 || pos < 0)
{ return (pos >= 0) ? s.Substring(0, pos) : s; }
int nDelims = 1;
while (nDelims < n && pos >= 0)
{
pos = s.IndexOf(delimiter, pos + 1);
nDelims++;
}
string result = "";
if (pos >= 0)
{
int nextDelim = s.IndexOf(delimiter, pos + 1);
result = (nextDelim < 0) ? s.Substring(pos + 1) : s.Substring(pos + 1, nextDelim - pos - 1);
}
return result;
}
public static string MyExtension(this string s, char delimiter, int n)
{
var begin = n== 0 ? 0 : Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n);
if (begin == -1)
return null;
var end = s.IndexOf(delimiter, begin + (n==0?0:1));
if (end == -1 ) end = s.Length;
//var end = Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n + 1);
var result = s.Substring(begin +1, end - begin -1 );
return result;
}
}
Results:
Item directly selected: 277
Item directly selected by MyExtension: 114
Item from split array: 1297
A;B;;;;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z
S
Edit: Thanks to #Kalten, I enhanced solution further. Considerable difference has been seen on benchmark results.

By using the following Regex : ^([^;]*;){21}(.*?); , with that you don't have to generate the hole split list to search for your desired position, and once you reach it, it gonna be a matter of whether exists or not.
Explanation :
^ --> start of a line.
([^;]*;){Position - 1} --> notice that the symbol ; here is the delimiter, the expression will loop Pos - 1 times
(.*?) --> Non-Greedy .*
DEMO
For more about regular expressions on C# : documentation
In the example below i did implemant the two samples to show you how it works.
Match Method : documentation (Basically it searchs only for the first occurence of the pattern)
RegexOptions.Singleline : Treats the input as a signle line.
C# Code
Console.WriteLine("First Delimiter : ");
int Position = 22;
char delimiter = ',';
string pattern = #"^([^" + delimiter + "]*" + delimiter + "){" + (Position - 1) + #"}(.*?)" + delimiter;
Regex regex = new Regex(pattern, RegexOptions.Singleline);
// First Example
string Data = #"AAV,zzz,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22ABC,23,24,24";
Match Re = regex.Match(Data);
if (Re.Groups.Count > 0)
Console.WriteLine("\tMatch found : " + Re.Groups[2]);
// Second Example
Console.WriteLine("Second Delimiter : ");
Position = 8;
delimiter = ';';
pattern = #"^([^" + delimiter + "]*" + delimiter + "){" + (Position - 1) + #"}(.*?)" + delimiter;
Data = #"61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;""0x8D15A2913C934DE"";Thursday, 19-Jun-14 22:58:10 GMT;";
regex = new Regex(pattern, RegexOptions.Singleline);
Re = regex.Match(Data);
if (Re.Groups.Count > 0)
Console.WriteLine("\tMatch found : " + Re.Groups[2]);
Output :
First Delimiter :
Match found : 22ABC
Second Delimiter :
Match found : 23

If you want to be sure the code parses the string in only one pass, and only parses what is needed, you can write the routine that iterates over the string yourself.
Since all c# strings implement IEnumerable<char> it is fairly straightforward to devise a method that requires zero string allocations:
static public IEnumerable<char> GetDelimitedField(this IEnumerable<char> source, char delimiter, int index)
{
foreach (var c in source)
{
if (c == delimiter)
{
if (--index < 0) yield break;
}
else
{
if (index == 0) yield return c;
}
}
}
This returns the result as an IEnumerable<char> but it's cheap to convert to a string. It's going to be a much shorter string at this point anyway.
static public string GetDelimitedString(this string source, char delimiter, int index)
{
var result = source.GetDelimitedField(delimiter, index);
return new string(result.ToArray());
}
And you can call it like this:
var input ="Zero,One,Two,Three,Four,Five,Six";
var output = input.GetDelimitedString(',',5);
Console.WriteLine(output);
Output:
Five
Example on DotNetFiddle

Too late for "answer" but this code gives me a run time of about 0.75 seconds with both strings processed 1,000,000 times. Difference this time is that now I'm not Marshaling an object but using pointers.
And this time I am returning a single new string (String.Substring).
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
class Program
{
static void Main(string[] args)
{
string testString1 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24";
string testString2 = "61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;\"0x8D15A2913C934DE\";Thursday, 19-Jun-14 22:58:10 GMT;";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i < 1000000; i++)
{
Delimit(testString1, ',', 22);
Delimit(testString2, ';', 6);
}
sw.Stop();
Console.WriteLine($"==>{sw.ElapsedMilliseconds}");
Console.ReadLine();
}
static string Delimit(string stringUnderTest, char delimiter, int skipCount)
{
const int SIZE_OF_UNICHAR = 2;
int i = 0;
int index = 0;
char c = Char.MinValue;
GCHandle handle = GCHandle.Alloc(stringUnderTest, GCHandleType.Pinned);
try
{
IntPtr ptr = handle.AddrOfPinnedObject();
for (i = 0; i < skipCount; i++)
while ((char)Marshal.ReadByte(ptr, index += SIZE_OF_UNICHAR) != delimiter) ;
i = index;
while ((c = (char)Marshal.ReadByte(ptr, i += SIZE_OF_UNICHAR)) != delimiter) ;
}
finally
{
if (handle.IsAllocated)
handle.Free();
}
return stringUnderTest.Substring((index + SIZE_OF_UNICHAR) >> 1, (i - index - SIZE_OF_UNICHAR) >> 1);
}
}

Related

How do you do a string split with 2 chars counts in C#?

I know how to do a string split if there's a letter, number, that I want to replace.
But how could I do a string.Split() by 2 char counts without replacing any existing letters, number, etc...?
Example:
string MAC = "00122345"
I want that string to output: 00:12:23:45

You could create a LINQ extension method to give you an IEnumerable<string> of parts:
public static class Extensions
{
public static IEnumerable<string> SplitNthParts(this string source, int partSize)
{
if (string.IsNullOrEmpty(source))
{
throw new ArgumentException("String cannot be null or empty.", nameof(source));
}
if (partSize < 1)
{
throw new ArgumentException("Part size has to be greater than zero.", nameof(partSize));
}
return Enumerable
.Range(0, (source.Length + partSize - 1) / partSize)
.Select(pos => source
.Substring(pos * partSize,
Math.Min(partSize, source.Length - pos * partSize)));
}
}
Usage:
var strings = new string[] {
"00122345",
"001223453"
};
foreach (var str in strings)
{
Console.WriteLine(string.Join(":", str.SplitNthParts(2)));
}
// 00:12:23:45
// 00:12:23:45:3
Explanation:
Use Enumerable.Range to get number of positions to slice string. In this case its the length of the string + chunk size - 1, since we need to get a big enough range to also fit leftover chunk sizes.
Enumerable.Select each position of slicing and get the startIndex using String.Substring using the position multiplied by 2 to move down the string every 2 characters. You will have to use Math.Min to calculate the smallest size leftover size if the string doesn't have enough characters to fit another chunk. You can calculate this by the length of the string - current position * chunk size.
String.Join the final result with ":".
You could also replace the LINQ query with yield here to increase performance for larger strings since all the substrings won't be stored in memory at once:
for (var pos = 0; pos < source.Length; pos += partSize)
{
yield return source.Substring(pos, Math.Min(partSize, source.Length - pos));
}

You can use something like this:
string newStr= System.Text.RegularExpressions.Regex.Replace(MAC, ".{2}", "$0:");
To trim the last colon, you can use something like this.
newStr.TrimEnd(':');
Microsoft Document

Try this way.
string MAC = "00122345";
MAC = System.Text.RegularExpressions.Regex.Replace(MAC,".{2}", "$0:");
MAC = MAC.Substring(0,MAC.Length-1);
Console.WriteLine(MAC);

A quite fast solution, 8-10x faster than the current accepted answer (regex solution) and 3-4x faster than the LINQ solution
public static string Format(this string s, string separator, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i += length)
{
sb.Append(s.Substring(i, Math.Min(s.Length - i, length)));
if (i < s.Length - length)
{
sb.Append(separator);
}
}
return sb.ToString();
}
Usage:
string result = "12345678".Format(":", 2);

Here is a one (1) line alternative using LINQ Enumerable.Aggregate.
string result = MAC.Aggregate("", (acc, c) => acc.Length % 3 == 0 ? acc += c : acc += c + ":").TrimEnd(':');

An easy to understand and simple solution.
This is a simple fast modified answer in which you can easily change the split char.
This answer also checks if the number is even or odd , to make the suitable string.Split().
input : 00122345
output : 00:12:23:45
input : 0012234
output : 00:12:23:4
//The List that keeps the pairs
List<string> MACList = new List<string>();
//Split the even number into pairs
for (int i = 1; i <= MAC.Length; i++)
{
if (i % 2 == 0)
{
MACList.Add(MAC.Substring(i - 2, 2));
}
}
//Make the preferable output
string output = "";
for (int j = 0; j < MACList.Count; j++)
{
output = output + MACList[j] + ":";
}
//Checks if the input string is even number or odd number
if (MAC.Length % 2 == 0)
{
output = output.Trim(output.Last());
}
else
{
output += MAC.Last();
}
//input : 00122345
//output : 00:12:23:45
//input : 0012234
//output : 00:12:23:4

How to parse below string in C#?

Please someone to help me to parse these sample string below? I'm having difficulty to split the data and also the data need to add carriage return at the end of every event
sample string:
L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00
batch of events
expected output:
L,030216,182748,00,FF,I,00 - 1st Event
L,030216,182749,00,FF,I,00 - 2nd Event
L,030216,182750,00,FF,I,00 - 3rd Event

Seems like an easy problem. Something as easy as this should do it:
string line = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00";
string[] array = line.Split(',');
StringBuilder sb = new StringBuilder();
for(int i=0; i<array.Length-1;i+=6)
{
sb.AppendLine(string.Format("{0},{1} - {2} event",array[0],string.Join(",",array.Skip(i+1).Take(6)), "number"));
}
output (sb.ToString()):
L,030216,182748,00,FF,I,00 - number event
L,030216,182749,00,FF,I,00 - number event
L,030216,182750,00,FF,I,00 - number event
All you have to do is work on the function that increments the ordinals (1st, 2nd, etc), but that's easy to get.

This should do the trick, given there are no more L's inside your string, and the comma place is always the sixth starting from the beginning of the batch number.
class Program
{
static void Main(string[] args)
{
String batchOfevents = "L,030216,182748,00,FF,I,00,030216,182749,00,FF,I,00,030216,182750,00,FF,I,00,030216,182751,00,FF,I,00,030216,182752,00,FF,I,00,030216,182753,00,FF,I,00";
// take out the "L," to start processing by finding the index of the correct comma to slice.
batchOfevents = batchOfevents.Substring(2);
String output = "";
int index = 0;
int counter = 0;
while (GetNthIndex(batchOfevents, ',', 6) != -1)
{
counter++;
if (counter == 1){
index = GetNthIndex(batchOfevents, ',', 6);
output += "L, " + batchOfevents.Substring(0, index) + " - 1st event\n";
batchOfevents = batchOfevents.Substring(index + 1);
} else if (counter == 2) {
index = GetNthIndex(batchOfevents, ',', 6);
output += "L, " + batchOfevents.Substring(0, index) + " - 2nd event\n";
batchOfevents = batchOfevents.Substring(index + 1);
}
else if (counter == 3)
{
index = GetNthIndex(batchOfevents, ',', 6);
output += "L, " + batchOfevents.Substring(0, index) + " - 3rd event\n";
batchOfevents = batchOfevents.Substring(index + 1);
} else {
index = GetNthIndex(batchOfevents, ',', 6);
output += "L, " + batchOfevents.Substring(0, index) + " - " + counter + "th event\n";
batchOfevents = batchOfevents.Substring(index + 1);
}
}
output += "L, " + batchOfevents + " - " + (counter+1) + "th event\n";
Console.WriteLine(output);
}
public static int GetNthIndex(string s, char t, int n)
{
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == t)
{
count++;
if (count == n)
{
return i;
}
}
}
return -1;
}
}
Now the output will be in the format you asked for, and the original string has been decomposed.
NOTE: the getNthIndex method was taken from this old post.

If you want to split the string into multiple strings, you need a set of rules,
which are implementable. In your case i would start splitting the complete
string by the given comma , and than go though the elements in a loop.
All the strings in the loop will be appended in a StringBuilder. If your ruleset
say you need a new line, just add it via yourBuilder.Append('\r\n') or use AppendLine.
EDIT
Using this method, you can also easily add new chars like L or at the end rd Event

Look for the start index of 00,FF,I,00 in the entire string.
Extract a sub string starting at 0 and index plus 10 which is the length of the characters in 1.
Loop through it again each time with a new start index where you left of in 2.
Add a new line character each time.

Have a try the following:
string stream = "L,030216,182748,00,FF,I,00, 030216,182749,00,FF,I,00, 030216,182750,00,FF,I,00";
string[] lines = SplitLines(stream, "L", "I", ",");
Here the SplitLines function is implemented to detect variable-length events within the arbitrary-formatted stream:
string stream = "A;030216;182748 ;00;FF;AA;01; 030216;182749;AA;02";
string[] lines = SplitLines(batch, "A", "AA", ";");
Split-rules are:
- all elements of input stream are separated by separator(, for example).
- each event is bounded by the special markers(L and I for example)
- end marker is previous element of event-sequence
static string[] SplitLines(string stream, string startSeq, string endLine, string separator) {
string[] elements = stream.Split(new string[] { separator }, StringSplitOptions.RemoveEmptyEntries);
int pos = 0;
List<string> line = new List<string>();
List<string> lines = new List<string>();
State state = State.SeqStart;
while(pos < elements.Length) {
string current = elements[pos].Trim();
switch(state) {
case State.SeqStart:
if(current == startSeq)
state = State.LineStart;
continue;
case State.LineStart:
if(++pos < elements.Length) {
line.Add(startSeq);
state = State.Line;
}
continue;
case State.Line:
if(current == endLine)
state = State.LineEnd;
else
line.Add(current);
pos++;
continue;
case State.LineEnd:
line.Add(endLine);
line.Add(current);
lines.Add(string.Join(separator, line));
line.Clear();
state = State.LineStart;
continue;
}
}
return lines.ToArray();
}
enum State { SeqStart, LineStart, Line, LineEnd };

f you want to split the string into multiple strings, you need a set of rules, which are implementable. In your case i would start splitting the complete string by the given comma , and than go though the elements in a loop. All the strings in the loop will be appended in a StringBuilder. If your ruleset say you need a new line, just add it via yourBuilder.Append('\r\n') or use AppendLine.

Best way to split string into lines with maximum length, without breaking words

I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}

Even when this post is 3 years old I wanted to give a better solution using Regex to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$#"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here

How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}

I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.

Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}

~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start); ;
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms

My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}

An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}

finding an string expression in an sentence

I have like a three word expression: "Shut The Door" and I want to find it in a sentence. Since They are kind of seperated by space what would be the best solution for it.

If you have the string:
string sample = "If you know what's good for you, you'll shut the door!";
And you want to find where it is in a sentence, you can use the IndexOf method.
int index = sample.IndexOf("shut the door");
// index will be 42
A non -1 answer means the string has been located. -1 means it does not exist in the string. Please note that the search string ("shut the door") is case sensitive.

Use build in Regex.Match Method for matching strings.
string text = "One car red car blue car";
string pat = #"(\w+)\s+(car)";
// Compile the regular expression.
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
// Match the regular expression pattern against a text string.
Match m = r.Match(text);
int matchCount = 0;
while (m.Success)
{
Console.WriteLine("Match"+ (++matchCount));
for (int i = 1; i <= 2; i++)
{
Group g = m.Groups[i];
Console.WriteLine("Group"+i+"='" + g + "'");
CaptureCollection cc = g.Captures;
for (int j = 0; j < cc.Count; j++)
{
Capture c = cc[j];
System.Console.WriteLine("Capture"+j+"='" + c + "', Position="+c.Index);
}
}
m = m.NextMatch();
}
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.match(v=vs.71).aspx
http://support.microsoft.com/kb/308252

if (string1.indexOf(string2) >= 0)
...

The spaces are nothing special, they are just characters, so you can find a string like this like yuo would find any other string in your sentence, for example using "indexOf" if you need the position, or just "Contains" if you need to know if it exists or not.
E.g.
string sentence = "foo bar baz";
string phrase = "bar baz";
Console.WriteLine(sentence.Contains(phrase)); // True

Here is some C# code to find a substrings using a start string and end string point but you can use as a base and modify (i.e. remove need for end string) to just find your string...
2 versions, one to just find the first instance of a substring, other returns a dictionary of all starting positions of the substring and the actual string.
public Dictionary<int, string> GetSubstringDic(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
Dictionary<int, string> result = new Dictionary<int, string>();
try
{
//if just want to find string, case insensitive
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
//does start string exist
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
//start to check for each instance of matches for the length of the source string
while (startIndex < sourceLength && startIndex > -1)
{
//does end string exist?
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
//if we want to get length of string including the start and end strings
if (includeStartEnd)
{
//make sure to include the end string
length = (endIndex + end.Length) - startIndex;
}
else
{
//change start index to not include the start string
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
//add to dictionary
result.Add(startIndex, source.Substring(startIndex, length));
//move start position up
startIndex = source.IndexOf(start, endIndex + 1);
}
else
{
//no end so break out of while;
break;
}
}
}
}
catch (Exception ex)
{
//Notify of Error
result = new Dictionary<int, string>();
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstringDic: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}
public string GetSubstring(string start, string end, string source, bool includeStartEnd, bool caseInsensitive)
{
int startIndex = -1;
int endIndex = -1;
int length = -1;
int sourceLength = source.Length;
string result = string.Empty;
try
{
if (caseInsensitive)
{
source = source.ToLower();
start = start.ToLower();
end = end.ToLower();
}
startIndex = source.IndexOf(start);
if (startIndex != -1)
{
endIndex = source.IndexOf(end, startIndex + 1);
if (endIndex != -1)
{
if (includeStartEnd)
{
length = (endIndex + end.Length) - startIndex;
}
else
{
startIndex = startIndex + start.Length;
length = endIndex - startIndex;
}
result = source.Substring(startIndex, length);
}
}
}
catch (Exception ex)
{
//Notify of Error
result = string.Empty;
StringBuilder g_Error = new StringBuilder();
g_Error.AppendLine("GetSubstring: " + ex.Message.ToString());
g_Error.AppendLine(ex.StackTrace.ToString());
}
return result;
}

You may want to make sure the check ignores the case of both phrases.
string theSentence = "I really want you to shut the door.";
string thePhrase = "Shut The Door";
bool phraseIsPresent = theSentence.ToUpper().Contains(thePhrase.ToUpper());
int phraseStartsAt = theSentence.IndexOf(
thePhrase,
StringComparison.InvariantCultureIgnoreCase);
Console.WriteLine("Is the phrase present? " + phraseIsPresent);
Console.WriteLine("The phrase starts at character: " + phraseStartsAt);
This outputs:
Is the phrase present? True
The phrase starts at character: 21

How to make next step of a string. C#

The question is complicated but I will explain it in details.
The goal is to make a function which will return next "step" of the given string.
For example
String.Step("a"); // = "b"
String.Step("b"); // = "c"
String.Step("g"); // = "h"
String.Step("z"); // = "A"
String.Step("A"); // = "B"
String.Step("B"); // = "C"
String.Step("G"); // = "H"
Until here its quite easy, But taking in mind that input IS string it can contain more than 1 characters and the function must behave like this.
String.Step("Z"); // = "aa";
String.Step("aa"); // = "ab";
String.Step("ag"); // = "ah";
String.Step("az"); // = "aA";
String.Step("aA"); // = "aB";
String.Step("aZ"); // = "ba";
String.Step("ZZ"); // = "aaa";
and so on...
This doesn't exactly need to extend the base String class.
I tried to work it out by each characters ASCII values but got stuck with strings containing 2 characters.
I would really appreciate if someone can provide full code of the function.
Thanks in advance.
EDIT
*I'm sorry I forgot to mention earlier that the function "reparse" the self generated string when its length reaches n.
continuation of this function will be smth like this. for example n = 3
String.Step("aaa"); // = "aab";
String.Step("aaZ"); // = "aba";
String.Step("aba"); // = "abb";
String.Step("abb"); // = "abc";
String.Step("abZ"); // = "aca";
.....
String.Step("zzZ"); // = "zAa";
String.Step("zAa"); // = "zAb";
........
I'm sorry I didn't mention it earlier, after reading some answers I realised that the problem was in question.
Without this the function will always produce character "a" n times after the end of the step.

NOTE: This answer is incorrect, as "aa" should follow after "Z"... (see comments below)
Here is an algorithm that might work:
each "string" represents a number to a given base (here: twice the count of letters in the alphabet).
The next step can thus be computed by parsing the "number"-string back into a int, adding 1 and then formatting it back to the base.
Example:
"a" == 1 -> step("a") == step(1) == 1 + 1 == 2 == "b"
Now your problem is reduced to parsing the string as a number to a given base and reformatting it. A quick googling suggests this page: http://everything2.com/title/convert+any+number+to+decimal
How to implement this?
a lookup table for letters to their corresponding number: a=1, b=2, c=3, ... Y = ?, Z = 0
to parse a string to number, read the characters in reverse order, looking up the numbers and adding them up:
"ab" -> 2*BASE^0 + 1*BASE^1
with BASE being the number of "digits" (2 count of letters in alphabet, is that 48?)
EDIT: This link looks even more promising: http://www.citidel.org/bitstream/10117/20/12/convexp.html

Quite collection of approaches, here is mine:-
The Function:
private static string IncrementString(string s)
{
byte[] vals = System.Text.Encoding.ASCII.GetBytes(s);
for (var i = vals.Length - 1; i >= 0; i--)
{
if (vals[i] < 90)
{
vals[i] += 1;
break;
}
if (vals[i] == 90)
{
if (i != 0)
{
vals[i] = 97;
continue;
}
else
{
return new String('a', vals.Length + 1);
}
}
if (vals[i] < 122)
{
vals[i] += 1;
break;
}
vals[i] = 65;
break;
}
return System.Text.Encoding.ASCII.GetString(vals);
}
The Tests
Console.WriteLine(IncrementString("a") == "b");
Console.WriteLine(IncrementString("z") == "A");
Console.WriteLine(IncrementString("Z") == "aa");
Console.WriteLine(IncrementString("aa") == "ab");
Console.WriteLine(IncrementString("az") == "aA");
Console.WriteLine(IncrementString("aZ") == "ba");
Console.WriteLine(IncrementString("zZ") == "Aa");
Console.WriteLine(IncrementString("Za") == "Zb");
Console.WriteLine(IncrementString("ZZ") == "aaa");

public static class StringStep
{
public static string Next(string str)
{
string result = String.Empty;
int index = str.Length - 1;
bool carry;
do
{
result = Increment(str[index--], out carry) + result;
}
while (carry && index >= 0);
if (index >= 0) result = str.Substring(0, index+1) + result;
if (carry) result = "a" + result;
return result;
}
private static char Increment(char value, out bool carry)
{
carry = false;
if (value >= 'a' && value < 'z' || value >= 'A' && value < 'Z')
{
return (char)((int)value + 1);
}
if (value == 'z') return 'A';
if (value == 'Z')
{
carry = true;
return 'a';
}
throw new Exception(String.Format("Invalid character value: {0}", value));
}
}

Split the input string into columns and process each, right-to-left, like you would if it was basic arithmetic. Apply whatever code you've got that works with a single column to each column. When you get a Z, you 'increment' the next-left column using the same algorithm. If there's no next-left column, stick in an 'a'.

I'm sorry the question is stated partly.
I edited the question so that it meets the requirements, without the edit the function would end up with a n times by step by step increasing each word from lowercase a to uppercase z without "re-parsing" it.
Please consider re-reading the question, including the edited part

This is what I came up with. I'm not relying on ASCII int conversion, and am rather using an array of characters. This should do precisely what you're looking for.
public static string Step(this string s)
{
char[] stepChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();
char[] str = s.ToCharArray();
int idx = s.Length - 1;
char lastChar = str[idx];
for (int i=0; i<stepChars.Length; i++)
{
if (stepChars[i] == lastChar)
{
if (i == stepChars.Length - 1)
{
str[idx] = stepChars[0];
if (str.Length > 1)
{
string tmp = Step(new string(str.Take(str.Length - 1).ToArray()));
str = (tmp + str[idx]).ToCharArray();
}
else
str = new char[] { stepChars[0], str[idx] };
}
else
str[idx] = stepChars[i + 1];
break;
}
}
return new string(str);
}

This is a special case of a numeral system. It has the base of 52. If you write some parser and output logic you can do any kind of arithmetics an obviously the +1 (++) here.
The digits are "a"-"z" and "A" to "Z" where "a" is zero and "Z" is 51
So you have to write a parser who takes the string and builds an int or long from it. This function is called StringToInt() and is implemented straight forward (transform char to number (0..51) multiply with 52 and take the next char)
And you need the reverse function IntToString which is also implementet straight forward (modulo the int with 52 and transform result to digit, divide the int by 52 and repeat this until int is null)
With this functions you can do stuff like this:
IntToString( StringToInt("ZZ") +1 ) // Will be "aaa"

You need to account for A) the fact that capital letters have a lower decimal value in the Ascii table than lower case ones. B) The table is not continuous A-Z-a-z - there are characters inbetween Z and a.
public static string stepChar(string str)
{
return stepChar(str, str.Length - 1);
}
public static string stepChar(string str, int charPos)
{
return stepChar(Encoding.ASCII.GetBytes(str), charPos);
}
public static string stepChar(byte[] strBytes, int charPos)
{
//Escape case
if (charPos < 0)
{
//just prepend with a and return
return "a" + Encoding.ASCII.GetString(strBytes);
}
else
{
strBytes[charPos]++;
if (strBytes[charPos] == 91)
{
//Z -> a plus increment previous char
strBytes[charPos] = 97;
return stepChar(strBytes, charPos - 1); }
else
{
if (strBytes[charPos] == 123)
{
//z -> A
strBytes[charPos] = 65;
}
return Encoding.ASCII.GetString(strBytes);
}
}
}
You'll probably want some checking in place to ensure that the input string only contains chars A-Za-z
Edit Tidied up code and added new overload to remove redundant byte[] -> string -> byte[] conversion
Proof http://geekcubed.org/random/strIncr.png

This is a lot like how Excel columns would work if they were unbounded. You could change 52 to reference chars.Length for easier modification.
static class AlphaInt {
private static string chars =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static string StepNext(string input) {
return IntToAlpha(AlphaToInt(input) + 1);
}
public static string IntToAlpha(int num) {
if(num-- <= 0) return "a";
if(num % 52 == num) return chars.Substring(num, 1);
return IntToAlpha(num / 52) + IntToAlpha(num % 52 + 1);
}
public static int AlphaToInt(string str) {
int num = 0;
for(int i = 0; i < str.Length; i++) {
num += (chars.IndexOf(str.Substring(i, 1)) + 1)
* (int)Math.Pow(52, str.Length - i - 1);
}
return num;
}
}

LetterToNum should be be a Function that maps "a" to 0 and "Z" to 51.
NumToLetter the inverse.
long x = "aazeiZa".Aggregate((x,y) => (x*52) + LetterToNum(y)) + 1;
string s = "";
do { // assertion: x > 0
var c = x % 52;
s = NumToLetter() + s;
x = (x - c) / 52;
} while (x > 0)
// s now should contain the result

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Efficient way to extract just one element from a delimited string - c#

Related

How do you do a string split with 2 chars counts in C#?

How to parse below string in C#?

Best way to split string into lines with maximum length, without breaking words

finding an string expression in an sentence

How to make next step of a string. C#

Categories

Resources