Suppose I have a string with the text: "THIS IS A TEST". How would I split it every n characters? So if n was 10, then it would display:
"THIS IS A "
"TEST"
..you get the idea. The reason is because I want to split a very big line into smaller lines, sort of like word wrap. I think I can use string.Split() for this, but I have no idea how and I'm confused.
Any help would be appreciated.
Let's borrow an implementation from my answer on code review. This inserts a line break every n characters:
public static string SpliceText(string text, int lineLength) {
return Regex.Replace(text, "(.{" + lineLength + "})", "$1" + Environment.NewLine);
}
Edit:
To return an array of strings instead:
public static string[] SpliceText(string text, int lineLength) {
return Regex.Matches(text, ".{1," + lineLength + "}").Cast<Match>().Select(m => m.Value).ToArray();
}
Maybe this can be used to handle efficiently extreme large files :
public IEnumerable<string> GetChunks(this string sourceString, int chunkLength)
{
using(var sr = new StringReader(sourceString))
{
var buffer = new char[chunkLength];
int read;
while((read= sr.Read(buffer, 0, chunkLength)) == chunkLength)
{
yield return new string(buffer, 0, read);
}
}
}
Actually, this works for any TextReader. StreamReader is the most common used TextReader. You can handle very large text files (IIS Log files, SharePoint Log files, etc) without having to load the whole file, but reading it line by line.
You should be able to use a regex for this. Here is an example:
//in this case n = 10 - adjust as needed
List<string> groups = (from Match m in Regex.Matches(str, ".{1,10}")
select m.Value).ToList();
string newString = String.Join(Environment.NewLine, lst.ToArray());
Refer to this question for details:
Splitting a string into chunks of a certain size
Probably not the most optimal way, but without regex:
string test = "my awesome line of text which will be split every n characters";
int nInterval = 10;
string res = String.Concat(test.Select((c, i) => i > 0 && (i % nInterval) == 0 ? c.ToString() + Environment.NewLine : c.ToString()));
Coming back to this after doing a code review, there's another way of doing the same without using Regex
public static IEnumerable<string> SplitText(string text, int length)
{
for (int i = 0; i < text.Length; i += length)
{
yield return text.Substring(i, Math.Min(length, text.Length - i));
}
}
Some code that I just wrote:
string[] SplitByLength(string line, int len, int IsB64=0) {
int i;
if (IsB64 == 1) {
// Only Allow Base64 Line Lengths without '=' padding
int mod64 = (len % 4);
if (mod64 != 0) {
len = len + (4 - mod64);
}
}
int parts = line.Length / len;
int frac = line.Length % len;
int extra = 0;
if (frac != 0) {
extra = 1;
}
string[] oline = new string[parts + extra];
for(i=0; i < parts; i++) {
oline[i] = line.Substring(0, len);
line = line.Substring(len);
}
if (extra == 1) {
oline[i] = line;
}
return oline;
}
string CRSplitByLength(string line, int len, int IsB64 = 0)
{
string[] lines = SplitByLength(line, len, IsB64);
return string.Join(System.Environment.NewLine, lines);
}
string m = "1234567890abcdefghijklmnopqrstuvwxhyz";
string[] r = SplitByLength(m, 6, 0);
foreach (string item in r) {
Console.WriteLine("{0}", item);
}
Related
I am trying to read a file and split the text after every 1000 characters. But I want to keep the words intact. So it should just split at the space. If the 1000th character is not a space, then split at the first space just before or just after it. Any idea how to do that? I am also removing the extra spaces from the text.
while ((line = file.ReadLine()) != null)
{
text = text + line.Trim();
noSpaceText = Regex.Replace(text, #"\r\n?|\n/", "");
}
List<string> rowsToInsert = new List<string>();
int splitAt = 1000;
for (int i = 0; i < noSpaceText.Length; i = i + splitAt)
{
if (noSpaceText.Length - i >= splitAt)
{
rowsToInsert.Add(noSpaceText.Substring(i, splitAt));
}
else
rowsToInsert.Add(noSpaceText.Substring(i,
((noSpaceText.Length - i))));
}
foreach(var item in rowsToInsert)
{
Console.WriteLine(item);
}
Okay, just typed this non tested solution which should do the trick:
public static List<string> SplitOn(this string input, int charLength, char[] seperator)
{
List<string> splits = new List<string>();
var tokens = input.Split(seperator);
// -1 because first token adds 1 to length
int totalLength = -1;
List<string> segments = new List<string>;
foreach(var t in tokens)
{
if(totalLength + t.Length+1 > charLength)
{
splits.Add(String.Join(" ", segments));
totalLength = -1;
segments.Clear();
}
totalLength += t.Length + 1;
segments.Add(t);
}
if(segments.Count>0)
{
splits.Add(String.Join(" ", segments));
}
return splits;
}
It's an extension Method, which splits an input text in segments by whitespaces, means, i iterate over an array with just words. Then counting the length of each segment, checking for totallength and add it to result list.
An alternate solution:
public static List<string> SplitString(string stringInput, int blockLength)
{
var output = new List<string>();
var count = 0;
while(count < stringInput.Length)
{
string block = "";
if(count + blockLength > stringInput.Length)
{
block = stringInput.Substring(count, stringInput.Length - count);
}
else
{
block = stringInput.Substring(count, blockLength + 1);
}
if(block.Length < blockLength)
{
output.Add(block);
count += block.Length;
}
else if(block.EndsWith(" "))
{
output.Add(block);
count = count+blockLength + 1;
}
else
{
output.Add(block.Substring(0, block.LastIndexOf(" ")));
count = count + block.LastIndexOf(" ") +1;
}
}
return output;
}
I know how to do a string split if there's a letter, number, that I want to replace.
But how could I do a string.Split() by 2 char counts without replacing any existing letters, number, etc...?
Example:
string MAC = "00122345"
I want that string to output: 00:12:23:45
You could create a LINQ extension method to give you an IEnumerable<string> of parts:
public static class Extensions
{
public static IEnumerable<string> SplitNthParts(this string source, int partSize)
{
if (string.IsNullOrEmpty(source))
{
throw new ArgumentException("String cannot be null or empty.", nameof(source));
}
if (partSize < 1)
{
throw new ArgumentException("Part size has to be greater than zero.", nameof(partSize));
}
return Enumerable
.Range(0, (source.Length + partSize - 1) / partSize)
.Select(pos => source
.Substring(pos * partSize,
Math.Min(partSize, source.Length - pos * partSize)));
}
}
Usage:
var strings = new string[] {
"00122345",
"001223453"
};
foreach (var str in strings)
{
Console.WriteLine(string.Join(":", str.SplitNthParts(2)));
}
// 00:12:23:45
// 00:12:23:45:3
Explanation:
Use Enumerable.Range to get number of positions to slice string. In this case its the length of the string + chunk size - 1, since we need to get a big enough range to also fit leftover chunk sizes.
Enumerable.Select each position of slicing and get the startIndex using String.Substring using the position multiplied by 2 to move down the string every 2 characters. You will have to use Math.Min to calculate the smallest size leftover size if the string doesn't have enough characters to fit another chunk. You can calculate this by the length of the string - current position * chunk size.
String.Join the final result with ":".
You could also replace the LINQ query with yield here to increase performance for larger strings since all the substrings won't be stored in memory at once:
for (var pos = 0; pos < source.Length; pos += partSize)
{
yield return source.Substring(pos, Math.Min(partSize, source.Length - pos));
}
You can use something like this:
string newStr= System.Text.RegularExpressions.Regex.Replace(MAC, ".{2}", "$0:");
To trim the last colon, you can use something like this.
newStr.TrimEnd(':');
Microsoft Document
Try this way.
string MAC = "00122345";
MAC = System.Text.RegularExpressions.Regex.Replace(MAC,".{2}", "$0:");
MAC = MAC.Substring(0,MAC.Length-1);
Console.WriteLine(MAC);
A quite fast solution, 8-10x faster than the current accepted answer (regex solution) and 3-4x faster than the LINQ solution
public static string Format(this string s, string separator, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i += length)
{
sb.Append(s.Substring(i, Math.Min(s.Length - i, length)));
if (i < s.Length - length)
{
sb.Append(separator);
}
}
return sb.ToString();
}
Usage:
string result = "12345678".Format(":", 2);
Here is a one (1) line alternative using LINQ Enumerable.Aggregate.
string result = MAC.Aggregate("", (acc, c) => acc.Length % 3 == 0 ? acc += c : acc += c + ":").TrimEnd(':');
An easy to understand and simple solution.
This is a simple fast modified answer in which you can easily change the split char.
This answer also checks if the number is even or odd , to make the suitable string.Split().
input : 00122345
output : 00:12:23:45
input : 0012234
output : 00:12:23:4
//The List that keeps the pairs
List<string> MACList = new List<string>();
//Split the even number into pairs
for (int i = 1; i <= MAC.Length; i++)
{
if (i % 2 == 0)
{
MACList.Add(MAC.Substring(i - 2, 2));
}
}
//Make the preferable output
string output = "";
for (int j = 0; j < MACList.Count; j++)
{
output = output + MACList[j] + ":";
}
//Checks if the input string is even number or odd number
if (MAC.Length % 2 == 0)
{
output = output.Trim(output.Last());
}
else
{
output += MAC.Last();
}
//input : 00122345
//output : 00:12:23:45
//input : 0012234
//output : 00:12:23:4
I have a string as follow 51200000000000000000000000000000
This string is not fixed. It will be appended depends on the number of boards. If there are two boards, the string will be as follow 5120000000000000000000000000000052200000000000000000000000000000
I would like to know how to calculate the number of zeros in the string.
I'm using the following code but it is not flexible if there are more than two boards.
string str = "51200000000000000000000000000000";
string zeros = "00000000000000000000000000000";
if (str.Contains(zeros))
{
Console.WriteLine("true");
}
else
{
Console.WriteLine("false");
}
You can use the following piece of code to do this, which will give you the number of zeros(Example).
char matchChar='0';
string strInput = "51200000000000000000000000000000";
int zeroCount = strInput.Count(x => x == matchChar); // will be 29
You can do the same by iterating through each characters and check whether it is the required character(say 0) then take its count.
Use a simple foreach loop to traverse the string and count:
int CountZeroes(string str)
{
// TODO: error checking, etc.
int count = 0;
foreach (var character in str)
{
if (character == '0') count++;
}
return count;
}
a little advanced (or so) technique would be to convert the string to char array then to list of chars then using LINQ
string str = "51200000000000000000000000000000";
List<char> nums = str.ToCharArray().ToList();
Console.WriteLine(nums.Where(x => x.Equals('0')).Select(x => x.ToString()).Count());
i just placed this here in case you want to learn not just a single approach :)
It can also do with a for loop and Substring.
Code
string str = "51200000000000000000000000000000";
int n = 0;
for (int i = 0; i < str.Length; i++)
{
if (str.Substring(i, 1) == "0")
n += 1;
}
Console.WriteLine("Count : " + n.ToString());
Working fiddle demo
Code:
string st;
st = textBox1.Text;
int countch = 0, i;
for (i = 0; i < st.Length; i++)
if (st[i]=='0') countch++;
MessageBox.Show(countch.ToString());
using System.Linq
int count0s = str.Count(z => z == '0');
will return how many 0's in your str string
In each line i want to parse the string after the tag
I know it's html but i copied this part of the html to text file.
For example the first two lines in the text file are like this:
<li>602 — Text602 document</li>
<li>ABW — AbiWord Document</li>
I want to parse the 602 from the first line and the ABW from the second line.
What i tried to do is:
private void ParseFilesTypes()
{
string[] lines = File.ReadAllLines(#"E:\New folder (44)\New Text Document.txt");
foreach (string str in lines)
{
int r = str.IndexOf("<li>");
if (r >= 0)
{
int i = str.IndexOf(" -", r + 1);
if (i >= 0)
{
int c = str.IndexOf(" -", i + 1);
if (c >= 0)
{
i++;
MessageBox.Show(str.Substring(i, c - i));
}
}
}
}
}
But c is all the time -1
I think it is a case when regex would be useful (unless there will be no li attributes):
var regex = new Regex("^<li>(.+) —");
foreach (string str in lines)
{
var m = regex.Match(str);
if (m.Success)
MessageBox.Show(m.Groups[1].Value);
}
Actually, your problem is that you're reading the file with the incorrect encoding. You have a special character in your file — and not -. So you need to correct this character in your code and read the file in the correct encoding. If you debug your string read with wrong encoding, you'll see a black diamond instead of —.
Also, you need to remove the space before — or replace i + 1 with i;
private static void ParseFilesTypes()
{
string sampleFilePath = #"log.txt";
string[] lines = File.ReadAllLines(#"log.txt", Encoding.GetEncoding("windows-1252"));
foreach (string str in lines)
{
int r = str.IndexOf("<li>");
if (r >= 0)
{
int i = str.IndexOf(" —", r + 1);
if (i >= 0)
{
int c = str.IndexOf(" —", i);
if (c >= 0)
{
i++;
int startIndex = r + "<li>".Length;
int length = i - startIndex - 1;
string result = str.Substring(r + "<li>".Length, length);
MessageBox.Show(result);
}
}
}
}
}
I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}
Even when this post is 3 years old I wanted to give a better solution using Regex to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$#"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}
~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start); ;
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms
My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}
An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}