Break string into separate lines: a showdown - c#

Accommodating legacy database tables that were designed specifically for the IBM mainframe screens they represent has caused me much aggravation. I often need to break a string into multiple lines to fit the table column width, and to suit the users who still view the data with terminal emulators. Here are two functions I've written to perform that task; each accepts a string and a line width as parameters and returns some enumerable of strings. Which do you think is the better function, and why? And by all means share the super-easy-fast-efficient way that I totally overlooked.
public string[] BreakStringIntoArray(string s, int lineWidth)
{
int lineCount = ((s.Length + lineWidth) - 1) / lineWidth;
string[] strArray = new string[lineCount];
for (int i = 0; i <= lineCount - 1; i++)
{
if (((i * lineWidth) + lineWidth) >= s.Length)
strArray[i] = s.Substring(i * lineWidth);
else
strArray[i] = s.Substring(i * lineWidth, lineWidth);
}
return strArray;
}
vs.
public List<string> BreakStringIntoList(string s, int lineWidth)
{
List<string> lines = new List<string>();
if (s.Length > lineWidth)
{
lines.Add(s.Substring(0, lineWidth));
lines.AddRange(this.BreakStringIntoList(s.Substring(lineWidth), lineWidth));
}
else
{
lines.Add(s);
}
return lines;
}
For example, passing in ("Hello world", 5) would return 3 strings:
"Hello"
" worl"
"d"

The first one is way better.
The second one will produce tons of temporary objects, which is completely unnecessary when you can determine the number of target lines in advance.
As for something you may have overlooked: depending on the speed requirements, you could likely use pointers to get a considerable speedup (this depends on what you actually want to do with the result). But that will be WAY MORE complex than what you have now.
Just to quantify the "tons of temporary objects". If you have a 1MB String (and worst case line-length of 1) the first approach needs 1MB of String-contents memory allocations. The second will need 500GB of String-contents memory allocations.
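A rough derivation of those figures, as a sketch: assume a string of n = 10^6 characters and a line width of 1, and count characters rather than bytes (a .NET char is two bytes, so the byte totals are double). The array version copies each character once. In the recursive version, every call on a remainder of length m > 1 allocates a 1-character head and an (m - 1)-character tail.

```latex
\begin{align*}
\text{array version:}\quad & \sum_{i=1}^{n} 1 = n = 10^{6}\ \text{characters} \\
\text{recursive version:}\quad & \sum_{m=2}^{n} \bigl(1 + (m-1)\bigr) = \frac{n(n+1)}{2} - 1 \approx \frac{n^{2}}{2} = 5 \times 10^{11}\ \text{characters}
\end{align*}
```

This ignores the extra list copies made by AddRange, which add further overhead on top of the string allocations.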

public List<string> BreakStringIntoLines(string s, int lineWidth)
{
    string working = s;
    // Pre-size the list with ceil(s.Length / lineWidth), using integer arithmetic.
    List<string> result = new List<string>((s.Length + lineWidth - 1) / lineWidth);
    while (working.Length > lineWidth)
    {
        result.Add(working.Substring(0, lineWidth));
        // Drop the part just emitted and continue with the remainder.
        working = working.Substring(lineWidth);
    }
    result.Add(working);
    return result;
}
That's probably how I would do it. A List is more flexible than a string array IMHO.
Also, I would avoid recursion like the plague. It is far easier to create and debug something that uses a simple loop.

Regex.Replace(input, ".{5}", x => x.Value + "\n").Split(new char [] {'\n'})

EDIT: Sorry, I thought you wanted one string divided into lines ^__^"
depends on the length of the string.. but for long ones:
string BreakStringIntoLines(string s, int lineWidth)
{
StringBuilder sb = new StringBuilder(s);
for (int i = lineWidth; i < sb.Length; i += lineWidth)
{
sb.Insert(i, Environment.NewLine);
}
return sb.ToString();
}

I like the solution from Itay, but it does not produce the right output, so here's my edited version.
protected void Page_Load(object sender, EventArgs e)
{
string a = "12345678901234567890123456789012345";
TextBox1.Text = a;
TextBox2.Text = BreakStringIntoLinesVer2(a, 10);
}
string BreakStringIntoLinesVer2(string s, int lineWidth)
{
StringBuilder sb = new StringBuilder(s);
int last = (sb.Length % lineWidth == 0) ? sb.Length - lineWidth : sb.Length - (sb.Length % lineWidth);
for (int i = last; i > 0; i -= lineWidth)
{
sb.Insert(i, Environment.NewLine);
}
return sb.ToString();
}
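Each StringBuilder.Insert call shifts all the characters behind the insertion point, which is why the order of insertion matters in the two versions above. As a rough alternative sketch (the class and method names here are hypothetical, not from the answers), the chunks can be appended to an empty builder instead, so no index ever needs adjusting:

```csharp
using System;
using System.Text;

public static class LineBreaker
{
    // Hypothetical helper: joins fixed-width chunks of s with Environment.NewLine.
    public static string BreakByAppending(string s, int lineWidth)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (lineWidth < 1) throw new ArgumentOutOfRangeException(nameof(lineWidth));

        // Reserve room for the text plus an upper bound on the separators.
        var sb = new StringBuilder(s.Length + (s.Length / lineWidth) * Environment.NewLine.Length);
        for (int i = 0; i < s.Length; i += lineWidth)
        {
            if (i > 0) sb.Append(Environment.NewLine);
            // Append(string, startIndex, count) copies the chunk without a temporary substring.
            sb.Append(s, i, Math.Min(lineWidth, s.Length - i));
        }
        return sb.ToString();
    }
}
```

Calling LineBreaker.BreakByAppending(a, 10) on the 35-character test string from Page_Load should give three lines of 10 characters and one of 5.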

Related

How do you do a string split with 2 chars counts in C#?

I know how to do a string split if there's a letter, number, that I want to replace.
But how could I do a string.Split() by 2 char counts without replacing any existing letters, number, etc...?
Example:
string MAC = "00122345";
I want that string to output: 00:12:23:45
You could create a LINQ extension method to give you an IEnumerable<string> of parts:
public static class Extensions
{
public static IEnumerable<string> SplitNthParts(this string source, int partSize)
{
if (string.IsNullOrEmpty(source))
{
throw new ArgumentException("String cannot be null or empty.", nameof(source));
}
if (partSize < 1)
{
throw new ArgumentException("Part size has to be greater than zero.", nameof(partSize));
}
return Enumerable
.Range(0, (source.Length + partSize - 1) / partSize)
.Select(pos => source
.Substring(pos * partSize,
Math.Min(partSize, source.Length - pos * partSize)));
}
}
Usage:
var strings = new string[] {
"00122345",
"001223453"
};
foreach (var str in strings)
{
Console.WriteLine(string.Join(":", str.SplitNthParts(2)));
}
// 00:12:23:45
// 00:12:23:45:3
Explanation:
Use Enumerable.Range to get the number of positions at which to slice the string. In this case it's (length of the string + chunk size - 1) / chunk size, an integer ceiling division, so the range is big enough to include a leftover chunk.
Use Enumerable.Select to turn each position into a slice with String.Substring, where the start index is the position multiplied by the chunk size (2 here), moving down the string one chunk at a time. Math.Min picks the smaller leftover size when the string doesn't have enough characters to fill another chunk; the leftover is the length of the string - current position * chunk size.
String.Join the final result with ":".
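You could also replace the LINQ query with an iterator that uses yield, which avoids the Enumerable.Range and Select overhead for larger strings. As with the LINQ version, the substrings are produced one at a time as the caller enumerates, so they won't all be stored in memory at once: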
for (var pos = 0; pos < source.Length; pos += partSize)
{
yield return source.Substring(pos, Math.Min(partSize, source.Length - pos));
}
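One thing to watch with the yield loop above: once a method contains yield, none of its body runs until the caller starts enumerating, so the argument checks would be delayed too. A minimal sketch of the usual workaround, assuming a hypothetical method name SplitNthPartsLazy, keeps the checks in a plain wrapper and moves the loop into a private iterator:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkExtensions
{
    // Hypothetical name; validates immediately, then hands off to the lazy iterator.
    public static IEnumerable<string> SplitNthPartsLazy(this string source, int partSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (partSize < 1) throw new ArgumentOutOfRangeException(nameof(partSize));
        return Iterate(source, partSize);
    }

    private static IEnumerable<string> Iterate(string source, int partSize)
    {
        for (var pos = 0; pos < source.Length; pos += partSize)
        {
            // The last chunk may be shorter than partSize.
            yield return source.Substring(pos, Math.Min(partSize, source.Length - pos));
        }
    }
}
```

Usage mirrors the LINQ version: string.Join(":", "001223453".SplitNthPartsLazy(2)) gives 00:12:23:45:3.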
You can use something like this:
string newStr= System.Text.RegularExpressions.Regex.Replace(MAC, ".{2}", "$0:");
To trim the last colon, you can use something like this.
newStr = newStr.TrimEnd(':');
Microsoft Document
Try this way.
string MAC = "00122345";
MAC = System.Text.RegularExpressions.Regex.Replace(MAC,".{2}", "$0:");
MAC = MAC.Substring(0,MAC.Length-1);
Console.WriteLine(MAC);
A quite fast solution, 8-10x faster than the current accepted answer (regex solution) and 3-4x faster than the LINQ solution
public static string Format(this string s, string separator, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i += length)
{
sb.Append(s.Substring(i, Math.Min(s.Length - i, length)));
if (i < s.Length - length)
{
sb.Append(separator);
}
}
return sb.ToString();
}
Usage:
string result = "12345678".Format(":", 2);
Here is a one (1) line alternative using LINQ Enumerable.Aggregate.
string result = MAC.Aggregate("", (acc, c) => acc.Length % 3 == 0 ? acc += c : acc += c + ":").TrimEnd(':');
An easy to understand and simple solution.
This is a simple fast modified answer in which you can easily change the split char.
This answer also checks if the number is even or odd , to make the suitable string.Split().
input : 00122345
output : 00:12:23:45
input : 0012234
output : 00:12:23:4
//The List that keeps the pairs
List<string> MACList = new List<string>();
//Split the even number into pairs
for (int i = 1; i <= MAC.Length; i++)
{
if (i % 2 == 0)
{
MACList.Add(MAC.Substring(i - 2, 2));
}
}
//Make the preferable output
string output = "";
for (int j = 0; j < MACList.Count; j++)
{
output = output + MACList[j] + ":";
}
//Checks if the input string is even number or odd number
if (MAC.Length % 2 == 0)
{
output = output.Trim(output.Last());
}
else
{
output += MAC.Last();
}
//input : 00122345
//output : 00:12:23:45
//input : 0012234
//output : 00:12:23:4

Taking certain string characters and returning the string

I asked this question yesterday, but it wasn't well received, mainly due to how I asked it, so I'll try to do better this time.
I have a string variable called message. Let's say message equals "ABCDABCDABCDABCD".
Now I need to do some processing on the characters in the string, but not all at the same time. On the first pass of the function I want to access characters [0][4][8][12], put each of them in a string and return it. That is easily done if I pass an integer to my function, let's say step = 4, and within a for loop do:
if (i % step == 0)
{
result += message[i];
}
This should return "AAAA".
The next time I call the function I'll need elements [0][1], [4][5], [8][9], [12][13], and the time after that I'll need [0][1][2], [4][5][6], [8][9][10], [12][13][14].
I need the characters returned in a string in the order they were taken. I could do this by changing the integer I pass to the function, but then I'd need to call the function several times and do work on the returned strings to get them into the order they were taken. I have already tried that, and it slowed my program down when dealing with large messages (> 10k characters).
Please don't delete my question or put it on hold. I'm quite happy to give more information on my problem if it's not clear. I seldom post to this site and usually try to find a solution myself; there are too many acceptance junkies on here for my liking, but I would appreciate some help from some of them regarding this. Thanks.
Edit
I understand it's not easy to figure out, and I have to say I'm not the best at describing it. It's a Vigenère cracker in WPF. I have done the Kasiski examination on a piece of text and graphed out all the data; it finds the key length 90% of the time or gives me the best clue to what the key length might be. Now I'm calculating the frequency of bigrams, trigrams and quadgrams of the message based on the data from the Kasiski examination. Let's say the key length is 5 and the message is "ABCDABCDABCDABCD". I'm calculating probability only on the message characters affected by the part of the key I'm changing, so when I try key AAAAA I only want to calculate monograms on elements [0][5][10][15] of the message. I'll run through 26 characters up to ZAAAA and take the most probable, then I move on to element [1] of the key. Let's say FAAAA gave the best score on the first element of the key. Now I need elements [0][1], [5][6], [10][11], [15], as I'm calculating probability on 2 pieces of the key (e.g. FCAAA). So the length of the key and which key character I'm working on will determine which elements of the message I'll be taking.
One-liner with LINQ (I use Batch extension from MoreLINQ, but you can use your own) which selects all required chars from input string:
string message = "ABCDABCDABCDABCD";
int size = 4;
int charsToTake = 2;
var characters = message.Batch(size).SelectMany(b => b.Take(charsToTake));
If you need result as string, you can easily create one:
var result = new String(characters.ToArray());
// ABABABAB
More efficient way - create your own method which will split string by substrings of required length:
public static IEnumerable<string> ToSubstrings(this string s, int length)
{
int index = 0;
while (index + length < s.Length)
{
yield return s.Substring(index, length);
index += length;
}
if (index < s.Length)
yield return s.Substring(index);
}
I would also create method for safe getting substring from start of string (to avoid annoying string length check and passing zero as start index):
public static string SubstringFromStart(this string s, int length)
{
return s.Substring(0, Math.Min(s.Length, length));
}
Now its very clear what you are doing:
var substrings = message.ToSubstrings(size)
.Select(s => s.SubstringFromStart(charsToTake));
var result = String.Concat(substrings);
Here is a simple program which does what you want, if I understand correctly:
static void Main(string[] args)
{
string data = "ABCDABCDABCDABCD";
Console.WriteLine(StrangeSubstring(data,4, 1));
// "AAAA"
Console.WriteLine(StrangeSubstring(data,4, 2));
// "ABABABAB"
Console.WriteLine(StrangeSubstring(data,4, 3));
// "ABCABCABCABC"
}
static string StrangeSubstring(string input, int modulo, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < input.Length; ++i)
{
if (i % modulo == 0)
{
for (int j = 0; j<length; ++j)
{
if (i+j < input.Length)
sb.Append(input[i+j]);
}
}
}
return sb.ToString();
}
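The loop above tests every index with %. A variation, sketched here under the assumption that the wanted characters are always the first few of each block (the names TakeFromEachBlock, blockSize and take are hypothetical), steps the loop by the block size and copies each run in one call:

```csharp
using System;
using System.Text;

public static class BlockPrefix
{
    // Hypothetical helper: takes the first `take` characters of every `blockSize`-wide block.
    public static string TakeFromEachBlock(string input, int blockSize, int take)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (blockSize < 1) throw new ArgumentOutOfRangeException(nameof(blockSize));
        if (take < 0 || take > blockSize) throw new ArgumentOutOfRangeException(nameof(take));

        var sb = new StringBuilder();
        for (int start = 0; start < input.Length; start += blockSize)
        {
            // The final block may hold fewer than `take` characters.
            int count = Math.Min(take, input.Length - start);
            sb.Append(input, start, count);
        }
        return sb.ToString();
    }
}
```

With the sample message and a block size of 4, take values of 1, 2 and 3 give "AAAA", "ABABABAB" and "ABCABCABCABC", matching the output of StrangeSubstring.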
My solution will be like this
static string MethodName(int range){
StringBuilder sb = new StringBuilder();
for(int i = 0 ; i < str.Length ; i++){
if(i % 4 == 0){
sb.Append(str[i]);
for(int j = i + 1 ; j <= i + range ; j ++){
if(j >= str.Length)
break;
sb.Append(str[j]);
}
}
}
return sb.ToString();
}
You can convert your string to a char array:
string message = "ABCDABCDABCDABCD";
char[] myCharArray = message.ToCharArray();
string result = "";
for (int i = 0; i < myCharArray.Length; i++)
{
    if (i % 4 == 0)
        result += myCharArray[i];
}
EDIT 1:
public string[] myfunction(char[] charArray)
{
    List<string> result = new List<string>();
    for (int i = 0; i < charArray.Length; i += 4)
    {
        // Build a string explicitly: char + char would be integer addition in C#.
        string pair = charArray[i].ToString();
        // Guard against running past the end of the array.
        if (i + 1 < charArray.Length)
            pair += charArray[i + 1];
        result.Add(pair);
    }
    return result.ToArray();
}
This is a recursive solution. In YourFunction, PatternLength is the length of the character pattern which is repeated (so, 4 for "ABCD"), Offset is where you start in the pattern (e.g. 0 if you start with "A") and SubstringLength is the number of characters.
The function call in Main will give you all "A". If you change SubstringLength to 2, it gives you all "AB". There is no error handling, so make sure that Offset + SubstringLength <= PatternLength.
namespace Foo
{
class Bar
{
static void Main(string[] args)
{
Console.WriteLine(YourFunction("ABCABCABCABCABCABCABC", 3, 0,1));
Console.ReadKey();
}
static string YourFunction(string SubString, int PatternLength, int Offset, int SubstringLength)
{
string result;
if (SubString.Length <= PatternLength)
{
result = SubString.Substring(Offset, SubstringLength);
}
else
{
// Take the current chunk first so characters come back in the order they appear.
result = SubString.Substring(Offset, SubstringLength) + YourFunction(SubString.Substring(PatternLength, (SubString.Length - PatternLength)), PatternLength, Offset, SubstringLength);
}
return result;
}
}
}

Best way to split string into lines with maximum length, without breaking words

I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}
Even though this post is 3 years old, I wanted to give a better solution using Regex to accomplish the same:
If you want to split the string and then display the text, you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, @"(.{1," + maximumLineLength + @"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, @"(.{1," + maximumLineLength + @"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$@"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here
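To get plain strings out of the MatchCollection, the matches can be projected to an array. This is only a sketch: the class and method names are hypothetical, and it reads capture group 1 so the trailing whitespace consumed by (?:\s|$) is left out. Words longer than the limit are not split cleanly by this pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWrap
{
    // Hypothetical helper built on the pattern from the answer above.
    public static string[] WrapToArray(string text, int maximumLineLength)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maximumLineLength < 1) throw new ArgumentOutOfRangeException(nameof(maximumLineLength));

        // Doubled braces are literal braces inside an interpolated string.
        var pattern = $@"(.{{1,{maximumLineLength}}})(?:\s|$)";
        return Regex.Matches(text, pattern)
            .Cast<Match>()                  // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value) // group 1 excludes the trailing separator
            .ToArray();
    }
}
```

For example, RegexWrap.WrapToArray("The quick brown fox jumps", 10) yields "The quick", "brown fox" and "jumps".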
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this, as I prefer this version:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}
~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start);
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms
My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}
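For comparison, the same requirement (one break at the last space before a limit) can be sketched with String.LastIndexOf, which searches backwards from a given index. The names below are hypothetical, and unlike the loop above this version also accepts a space sitting exactly at index `limit` and leaves the input unchanged when no space is found:

```csharp
using System;

public static class SingleBreak
{
    // Hypothetical helper: inserts one line break at the last space at or before `limit`.
    public static string BreakAtLastSpace(string input, int limit = 30)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (limit < 1) throw new ArgumentOutOfRangeException(nameof(limit));
        if (input.Length <= limit) return input; // already short enough

        // Search backwards from index `limit` for a space.
        int breakAt = input.LastIndexOf(' ', limit);
        if (breakAt <= 0) return input; // no usable space, so leave the text as it is

        return input.Substring(0, breakAt)
            + Environment.NewLine
            + input.Substring(breakAt + 1).TrimStart();
    }
}
```

With a limit of 8, "hello world foo" becomes "hello" on the first line and "world foo" on the second.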
An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}

Remove additional spacing in string [Fastest Way]

I need to remove all additional spaces in a string.
I use regex for matching strings and matched strings i replace with some others.
For better understanding please see examples below:
3 input strings:
Hello, how are you?
Hello , how are you?
Hello , how are you ?
These are 3 strings that should be matched by one regex pattern.
It looks something like this:
Hello\s*,\s+how\s+are\s+you\s*\?
It works fine, but there is a performance problem.
If I have a lot of patterns (~20k) and try to execute each pattern, it runs very slowly (3-5 minutes).
Maybe there is a better way of doing this?
For example, using some third-party libraries?
UPD: Folks, this question is not about how to do this. It's about how to do this with the best performance. :)
Let me explain in more detail. The main goal is to tokenize text (replace some tokens with special symbols).
For example, I have a token "nice try".
Then I input the text "this is nice try".
Result: "this is #tokenizedtext#", where #tokenizedtext# stands for some special symbols. It doesn't matter which in this case.
Next I have the string "Mike said it was a nice try".
The result should be "Mike said it was a #tokenizedtext#".
I think the main idea is clear.
So I can have a lot of tokens. When I process them, I convert each token from "nice try" to the pattern "nice\s+try" and try to replace matches of this pattern in the input text.
It works fine. But if the tokens contain more spaces and also punctuation, then my regexes become bigger and work very slowly.
Do you have any suggestions (technical or logical) for solving this problem?
I can suggest a few solutions.
First of all, avoid the static Regex method. Create an instance of it (and store it, don't call the constructor for each replacement!) and, if possible, use RegexOptions.Compiled. It should improve your performance.
Second, you can try to review your pattern. I'll do some profiling, but I'm currently undecided between:
@"(?<=\s)\s+"
With replacement being an empty string or:
@"\s+"
With a space as a replacement. You can try this code, in the meanwhile:
var s = "Hello , how are you?";
var pattern = @"\s+";
var regex = new Regex(pattern, RegexOptions.Compiled);
var replaced = regex.Replace(s, " ");
EDIT: After having done some measurement, the second pattern seems to be faster. I'm editing my sample to adapt it.
EDIT 2: I've written an unsafe method. It's much faster than the other ones presented here, including the Regex ones, but, as the word itself says, it's unsafe. I don't think that there's any problem with the code I've written but I may be wrong -- So please, check it again and again in case there's a bug in the method.
static unsafe string TrimInternal(string input)
{
var length = input.Length;
var array = stackalloc char[length];
fixed (char* fix = input)
{
var ptr = fix;
var counter = 0;
var lastWasSpace = false;
while (*ptr != '\x0')
{
//Current char is a space?
var isSpace = *ptr == ' ';
//If it's a space but the last one wasn't
//Or if it's not a space
if (isSpace && !lastWasSpace || !isSpace)
//Write into the result array
array[counter++] = *ptr;
//The last character (before the next loop) was a space
lastWasSpace = isSpace;
//Increase the pointer
ptr++;
}
return new string(array, 0, counter);
}
}
Usage (compile with /unsafe):
var s = TrimInternal("Hello , how are you?");
Profiling made in Release build, optimizations on, 1000000 iterations:
My above solution with Regex: 00:00:03.2130121
The unsafe solution: 00:00:00.2063467
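Applied to the tokenizing scenario from the question, the first suggestion (build each Regex once and keep it) might look roughly like the sketch below. The Tokenizer class and its members are hypothetical. Note that RegexOptions.Compiled adds a one-time construction cost per pattern, so with around 20k patterns it is worth measuring with and without it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds one Regex per token up front and reuses it for every input.
public sealed class Tokenizer
{
    private readonly List<Regex> _patterns = new List<Regex>();
    private readonly string _replacement;

    public Tokenizer(IEnumerable<string> tokens, string replacement)
    {
        _replacement = replacement;
        foreach (var token in tokens)
        {
            // Escape each word, then allow any run of whitespace between words.
            var words = token.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                             .Select(w => Regex.Escape(w));
            _patterns.Add(new Regex(string.Join(@"\s+", words), RegexOptions.Compiled));
        }
    }

    public string Apply(string text)
    {
        // The replacement is used as a regex replacement pattern, so a literal '$' would need escaping.
        foreach (var regex in _patterns)
            text = regex.Replace(text, _replacement);
        return text;
    }
}
```

A Tokenizer built from the token "nice try" with the replacement "#tokenizedtext#" turns "Mike said it was a nice   try" into "Mike said it was a #tokenizedtext#".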
This might work for you. It should be pretty fast. Note that it also removes spaces at the end of the string; that might not be what you want...
using System;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello, how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you ?"));
}
public static string RemoveExtraSpaces(string text)
{
var buffer = new char[text.Length];
bool isSpaced = false;
int n = 0;
foreach (char c in text)
{
if (c == ' ')
{
isSpaced = true;
}
else
{
if (isSpaced)
{
if ((c != ',') && (c != '?'))
{
buffer[n++] = ' ';
}
isSpaced = false;
}
buffer[n++] = c;
}
}
return new string(buffer, 0, n);
}
}
}
Something of my own :
find all the position of WhiteSpacechar in string;
private static IEnumerable<int> GetWhiteSpacePos(string input)
{
int iPos = -1;
while ((iPos = input.IndexOf(" ", iPos + 1, StringComparison.Ordinal)) > -1)
{
yield return iPos;
}
}
Remove all whitespace characters that occur in sequence, using the positions returned from GetWhiteSpacePos:
string original_string = "Hello , how are you ?";
var poss = GetWhiteSpacePos(original_string).ToList();
int startPos;
int endPos;
StringBuilder builder = new StringBuilder(original_string);
for (int i = poss.Count -1; i > 1; i--)
{
endPos = poss[i];
while ((poss[i] == poss[i - 1] + 1) && i > 1)
{
i--;
}
startPos = poss[i];
if (endPos - startPos > 1)
{
builder.Remove(startPos, endPos - startPos);
}
}
string new_string = builder.ToString();
You are using a very complex regex. Simplify the regex and that would definitely increase the performance.
Use \s+ and replace it with a single space.
Well, these kinds of problems really trouble us. Use this code, and I'm sure you'll get the result you asked for. This command replaces any run of white space in the string with a single space.
cleanString = Regex.Replace(originalString, @"\s+", " ");
Hope that works for you. Thanks.
This is a single call, which keeps the code short. Whether it also takes less CPU time than the alternatives is something to measure rather than assume.
if its just a matter of SPACE;
try this
Source : http://www.codeproject.com/Articles/10890/Fastest-C-Case-Insenstive-String-Replace
private static string ReplaceEx(string original,
string pattern, string replacement)
{
int count, position0, position1;
count = position0 = position1 = 0;
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
int inc = (original.Length / pattern.Length) *
(replacement.Length - pattern.Length);
char[] chars = new char[original.Length + Math.Max(0, inc)];
while ((position1 = upperString.IndexOf(upperPattern,
position0)) != -1)
{
for (int i = position0; i < position1; ++i)
chars[count++] = original[i];
for (int i = 0; i < replacement.Length; ++i)
chars[count++] = replacement[i];
position0 = position1 + pattern.Length;
}
if (position0 == 0) return original;
for (int i = position0; i < original.Length; ++i)
chars[count++] = original[i];
return new string(chars, 0, count);
}
Usage:
string original_string = "Hello , how are you ?";
while (original_string.Contains("  "))
{
original_string = ReplaceEx(original_string, "  ", " ");
}
Replacing the regex way:
string resultString = null;
try {
resultString = Regex.Replace(subjectString, @"\s+", " ", RegexOptions.Compiled);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

C# line break every n characters

Suppose I have a string with the text: "THIS IS A TEST". How would I split it every n characters? So if n was 10, then it would display:
"THIS IS A "
"TEST"
..you get the idea. The reason is because I want to split a very big line into smaller lines, sort of like word wrap. I think I can use string.Split() for this, but I have no idea how and I'm confused.
Any help would be appreciated.
Let's borrow an implementation from my answer on code review. This inserts a line break every n characters:
public static string SpliceText(string text, int lineLength) {
return Regex.Replace(text, "(.{" + lineLength + "})", "$1" + Environment.NewLine);
}
Edit:
To return an array of strings instead:
public static string[] SpliceText(string text, int lineLength) {
return Regex.Matches(text, ".{1," + lineLength + "}").Cast<Match>().Select(m => m.Value).ToArray();
}
Maybe this can be used to efficiently handle extremely large files:
public static IEnumerable<string> GetChunks(this string sourceString, int chunkLength)
{
    using (var sr = new StringReader(sourceString))
    {
        var buffer = new char[chunkLength];
        int read;
        // Keep going until nothing is read, so the final, shorter chunk is returned too.
        while ((read = sr.Read(buffer, 0, chunkLength)) > 0)
        {
            yield return new string(buffer, 0, read);
        }
    }
}
Actually, this works for any TextReader. StreamReader is the most commonly used TextReader. You can handle very large text files (IIS log files, SharePoint log files, etc.) without having to load the whole file, reading it chunk by chunk instead.
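A sketch of that generalisation, assuming a hypothetical ReadChunks method that accepts any TextReader. It uses TextReader.ReadBlock, which keeps reading until the buffer is full or the input ends, so only the final chunk can come back shorter than requested:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReaderChunks
{
    // Hypothetical helper: validates eagerly, then streams fixed-size chunks from the reader.
    public static IEnumerable<string> ReadChunks(TextReader reader, int chunkLength)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (chunkLength < 1) throw new ArgumentOutOfRangeException(nameof(chunkLength));
        return Iterate(reader, chunkLength);
    }

    private static IEnumerable<string> Iterate(TextReader reader, int chunkLength)
    {
        var buffer = new char[chunkLength];
        int read;
        // ReadBlock returns 0 only once the input is exhausted.
        while ((read = reader.ReadBlock(buffer, 0, chunkLength)) > 0)
        {
            yield return new string(buffer, 0, read);
        }
    }
}
```

Passing a StringReader over "THIS IS A TEST" with a chunk length of 10 yields "THIS IS A " and "TEST". For a file, the caller would pass a StreamReader and dispose it after enumerating.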
You should be able to use a regex for this. Here is an example:
//in this case n = 10 - adjust as needed
List<string> groups = (from Match m in Regex.Matches(str, ".{1,10}")
select m.Value).ToList();
string newString = String.Join(Environment.NewLine, groups.ToArray());
Refer to this question for details:
Splitting a string into chunks of a certain size
Probably not the most optimal way, but without regex:
string test = "my awesome line of text which will be split every n characters";
int nInterval = 10;
string res = String.Concat(test.Select((c, i) => i > 0 && (i % nInterval) == 0 ? Environment.NewLine + c : c.ToString()));
Coming back to this after doing a code review, there's another way of doing the same without using Regex
public static IEnumerable<string> SplitText(string text, int length)
{
for (int i = 0; i < text.Length; i += length)
{
yield return text.Substring(i, Math.Min(length, text.Length - i));
}
}
Some code that I just wrote:
string[] SplitByLength(string line, int len, int IsB64=0) {
int i;
if (IsB64 == 1) {
// Only Allow Base64 Line Lengths without '=' padding
int mod64 = (len % 4);
if (mod64 != 0) {
len = len + (4 - mod64);
}
}
int parts = line.Length / len;
int frac = line.Length % len;
int extra = 0;
if (frac != 0) {
extra = 1;
}
string[] oline = new string[parts + extra];
for(i=0; i < parts; i++) {
oline[i] = line.Substring(0, len);
line = line.Substring(len);
}
if (extra == 1) {
oline[i] = line;
}
return oline;
}
string CRSplitByLength(string line, int len, int IsB64 = 0)
{
string[] lines = SplitByLength(line, len, IsB64);
return string.Join(System.Environment.NewLine, lines);
}
string m = "1234567890abcdefghijklmnopqrstuvwxhyz";
string[] r = SplitByLength(m, 6, 0);
foreach (string item in r) {
Console.WriteLine("{0}", item);
}
