Split text into two sentences in C#

I want to divide a text into two sentences. The text contains whitespace characters.
For example:
Original sentence: 100 10 20 13
The result:
first sentence: 100 10 20
second sentence: 13
I tried Split, but the result was:
first: 100
second: 10
third: 20
fourth: 13
How can I do that?

Do you want everything before the last space, and then the rest? You can use String.LastIndexOf and Substring:
string text = "100 10 20 13";
string firstPart = text;
string lastPart = string.Empty; // stays empty when the text contains no space
int lastSpaceIndex = text.LastIndexOf(' ');
if (lastSpaceIndex >= 0)
{
    firstPart = text.Substring(0, lastSpaceIndex);
    lastPart = text.Substring(lastSpaceIndex).TrimStart();
}
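The same idea can be wrapped in a small helper. This is only a sketch: the names LastSpaceSplitter and SplitAtLastSpace are made up for illustration, and it assumes a plain space character is the only separator that matters.

```csharp
using System;

public static class LastSpaceSplitter
{
    // Hypothetical helper: returns everything before the last space and
    // everything after it. If there is no space, the whole text is the
    // first part and the last part is empty.
    public static (string First, string Last) SplitAtLastSpace(string text)
    {
        int lastSpaceIndex = text.LastIndexOf(' ');
        if (lastSpaceIndex < 0)
        {
            return (text, string.Empty);
        }
        string first = text.Substring(0, lastSpaceIndex);
        string last = text.Substring(lastSpaceIndex + 1);
        return (first, last);
    }

    public static void Main()
    {
        var (first, last) = SplitAtLastSpace("100 10 20 13");
        Console.WriteLine(first); // 100 10 20
        Console.WriteLine(last);  // 13
    }
}
```

Returning a tuple keeps both parts together, so the caller does not need to pre-declare two variables.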

You can use LINQ for this:
// This splits on spaces (original is the input string)
var split = original.Split(' ');
// This takes all split parts except for the last one and rejoins them
var first = String.Join(" ", split.Take(split.Length - 1));
// This gets the last one
var last = split.Last();
Note: This assumes that you want the first result to be every word except the last, and the second result to be only the last. If you have different requirements, please clarify your question.

Related

Taking parts out of a string, how?

So I have a server that receives a connection, with the message being converted to a string. I then have this string split by the spaces

So you have a line:
var line = "hello world my name is bob";
And you don't want "world" or "is", so you want:
"hello my name bob"
If you split to a list, remove the things you don't want and recombine to a line, you won't have extraneous spaces:
var list = line.Split().ToList();
list.Remove("world");
list.Remove("is");
var result = string.Join(" ", list);
Or, if you know the exact index positions of your list items, you can use RemoveAt. Remove them in order from highest index to lowest: if you want to remove 1 and 4, for example, removing 1 first means that the item you wanted to remove at index 4 is now at index 3. Example:
var list = line.Split().ToList();
list.RemoveAt(4); //is
list.RemoveAt(1); //world
var result = string.Join(" ", list);
If you're seeking a behavior that is like string.Replace, which removes all occurrences, you can use RemoveAll:
var line = "hello is world is my is name is bob";
var list = line.Split().ToList();
list.RemoveAll(w => w == "is"); //every occurrence of "is"
var result = string.Join(" ", list);

You could remove the empty space using TrimStart() method.
Something like this:
string text = "Hello World";
string[] textSplited = text.Split(' ');
string result = text.Replace(textSplited[0], "").TrimStart();

Assuming that you only want to remove the first word and not all repeats of it, a much more efficient way is to use the overload of split that lets you control the maximum number of splits (the argument is the maximum number of results, which is one more than the maximum number of splits):
string[] arguments = line.Split(new[] { ' ' }, 2, StringSplitOptions.RemoveEmptyEntries); // split only once
User.data = arguments.Skip(1).FirstOrDefault();
arguments[1] does the right thing when there are "more" arguments, but throws IndexOutOfRangeException if the number of words is zero or one. That could be fixed without LINQ by using (arguments.Length > 1) ? arguments[1] : string.Empty

If you're just removing the first word of a string, you don't need to use Split at all; doing a Substring after you found the space will be more efficient.
var line = ...
var idx = line.IndexOf(' ')+1;
line = line.Substring(idx);
or in recent C# versions
line = line[idx..];

C# Replace Everything before the first Space

I need to remove everything in a string before the first occurrence of a space.
Every string starts with a number and followed by a space
Replace the number and the space, thus leaving the rest of the string intact
For Example:
22 The cats of India
4 Royal Highness
562 Eating Potatoes
42 Biscuits in the 2nd fridge
2564 Niagara Falls at 2 PM
I just need:
The cats of India
Royal Highness
Eating Potatoes
Biscuits in the 2nd fridge
Niagara Falls at 2 PM
Basically remove every number before the first space, including the first space.
I tried this:
foreach (string line in lines)
{
    string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
}
This works for numbers below 10. After it hits 2 digits, it doesn't work properly.
How should I change my code?

If you want to make sure you only match digits at the beginning of the string, you can use the following regex:
^\d+\p{Zs}
Declare it like:
public static readonly Regex rx = new Regex(@"^\d+\p{Zs}", RegexOptions.Compiled);
The ^\d+\p{Zs} regex means: one or more digits at the start of the string, followed by one space separator character.
And then use it like
string newline = rx.Replace(line, string.Empty);
EDIT: To make sure the line has no leading whitespace, we can add .Trim() to strip it like:
Regex rx = new Regex(@"^\d+\p{Zs}", RegexOptions.Compiled);
string newline = rx.Replace(line.Trim(), string.Empty);

I know you already found a resolution to your issue. But I am going to explain why your code didn't work in the first place.
Your data has extra spaces, which is why you are trimming it with line.Trim(). But the real problem lies in the following statement:
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
You are making an assumption about the order of operations and overlooking the fact that the string type is immutable. When Trim() completes, it returns a whole new string, which is then used by Remove(). But IndexOf() is still called on the original, untrimmed line.
So the correct line of code would be the following:
foreach (string line in lines)
{
    // trim the line first
    var temp = line.Trim();
    // now perform all operations on the new temporary string
    string newline = temp.Remove(0, temp.IndexOf(' ') + 1);
    // debugging purpose
    Console.WriteLine(newline);
}
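To see the difference, it can help to print both indexes side by side. The sketch below assumes an input line with leading spaces, which the question does not show explicitly; the sample value and the names TrimIndexDemo and RemoveFirstToken are made up for illustration.

```csharp
using System;

public static class TrimIndexDemo
{
    // Removes everything up to and including the first space,
    // measuring the index on the trimmed copy rather than on the original.
    public static string RemoveFirstToken(string line)
    {
        string trimmed = line.Trim();
        int spaceIndex = trimmed.IndexOf(' ');
        // No space at all: IndexOf returned -1, so there is nothing to remove.
        return spaceIndex < 0 ? trimmed : trimmed.Substring(spaceIndex + 1);
    }

    public static void Main()
    {
        string line = "  42 Biscuits in the 2nd fridge"; // assumed sample with leading spaces
        Console.WriteLine(line.IndexOf(' '));        // 0: measured on the untrimmed string
        Console.WriteLine(line.Trim().IndexOf(' ')); // 2: measured on the trimmed copy
        Console.WriteLine(RemoveFirstToken(line));   // Biscuits in the 2nd fridge
    }
}
```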

Another solution:
var lines = new string[]
{
    "22 The cats of India",
    "4 Royal Highness",
    "562 Eating Potatoes",
    "42 Biscuits in the 2nd fridge",
    "2564 Niagara Falls at 2 PM"
};
foreach (var line in lines)
{
    var newLine = string.Join(" ", line.Split(' ').Skip(1));
}

Use a regex like so:
string newline = Regex.Replace(line, @"^\s*\d+\s*", "");
This will remove numbers only, not other text before the first space.

This is what you are looking for
foreach (string line in lines)
{
    string newline = line.Replace(line.Split(new Char[] { ' ' })[0] + ' ', string.Empty);
}
UPDATE
string search = line.Split(new Char[] { ' ' })[0];
int pos = line.IndexOf(search);
// TrimStart drops the space that followed the removed number
string newline = line.Substring(0, pos) + line.Substring(pos + search.Length).TrimStart();
FULL CODE
using System;
public class Program
{
    public static void Main()
    {
        var lines = new string[]
        {
            "22 The cats of India",
            "4 Royal Highness",
            "562 Eating Potatoes",
            "42 Biscuits in the 2nd fridge",
            "2 Niagara Falls at 2 PM"
        };
        foreach (string line in lines)
        {
            string search = line.Split(new Char[] { ' ' })[0];
            int pos = line.IndexOf(search);
            // TrimStart drops the space that followed the removed number
            string newline = line.Substring(0, pos) + line.Substring(pos + search.Length).TrimStart();
            Console.WriteLine(newline);
        }
    }
}

c# Substring characters in a string

I have string value below,
string value = "034 TH4493";
In first side,
var result = value.Substring(2,value.Length - 2);
In second side,
var result2 = value.Substring(0, 2);
result1 must be
"34TH4493"
result2 must be
34
However, it's not working for me and I cannot solve the problem. Do I need to use another solution, or what's missing in the above code?
Thanks.

var result = value.Substring(2, value.Length - 2);
There you're actually telling it to start at index position 2 (the 4 in "034 TH4493") and then to add as many characters as the length of "034 TH4493" (10, the space counts) minus 2, which would equal 8, thus: "4 TH4493".
What you want is to tell it to remove the space by replacing it with nothing, then start at index 1, so that the "0" at index 0 is discarded, then count for all other characters except the one you're ignoring:
var result = value.Replace(" ", "").Substring(1, value.Length - 2); // -2 because "value" holds both the space and the first 0, rather than just the 0
As you may imagine by now, var result2 = value.Substring(0, 2); is actually grabbing the "03" (index 0, two characters), when you'd actually want var result2 = value.Substring(1, 2).
Alternatively, you could split the string, then grab whatever you want:
var result = value.Replace(" ", "").Substring(1, value.Length - 2);
var values = value.Split(' '); // Split at the space character
var result2 = values[0].Substring(1); // "34": skip the leading 0
// or, in a single expression:
// var result2 = value.Split(' ')[0].Substring(1);
In cases like these, where you're unsure of what's going on, it helps to add breakpoints (F9 key with the default settings), so the application pauses when that line of code is reached, and you can explore the current values by hovering the cursor over the variables, or checking in the "Locals" tab.
EDIT: I ended mixing up the values you wanted for result and result2, should be fixed now...

string v2 = value.TrimStart('0');
var result1 = v2.Replace(" ","");
var result2 = v2.Split(' ')[0];

You can find them as;
var result1 = value.Substring(1).Replace(" ", "");
var result2 = value.Substring(1, 2);
Need to replace your white space because it doesn't magically disappear. If you wanna get "34TH4493" instead of 34TH4493 as a string, you can format it like;
result1 = string.Format("\"{0}\"", result1);

Keep in mind that the index of the first character of a string is zero, not 1.
First case
Your first result starts from the 3rd character, but you want it to start from the second character, so the starting index should be 1. The second argument is a length rather than an ending index, so it should be one less than the length of the string. The space is still part of that substring, so it has to be removed separately.
var result = value.Substring(1, value.Length - 1).Replace(" ", "");
Second case
For the second result you started from zero, but you want to start from the second character, so the starting index should be 1 instead of 0.
var result2 = value.Substring(1, 2);
See the documentation of Substring, where startIndex is described as "The zero-based starting character position of a substring in this instance"
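A short sketch of how the two arguments of Substring behave on the value from the question. The class and method names are made up for illustration; only Substring and Replace come from the answers above.

```csharp
using System;

public static class SubstringDemo
{
    // Drops the first character and every space from the input.
    public static string WithoutLeadingCharAndSpaces(string value)
    {
        return value.Substring(1).Replace(" ", "");
    }

    public static void Main()
    {
        string value = "034 TH4493";
        // Substring(startIndex, length): the start is zero-based and the
        // second argument is a character count, not an end index.
        Console.WriteLine(value.Substring(0, 2));                // 03
        Console.WriteLine(value.Substring(1, 2));                // 34
        Console.WriteLine(value.Substring(2, value.Length - 2)); // 4 TH4493
        Console.WriteLine(WithoutLeadingCharAndSpaces(value));   // 34TH4493
    }
}
```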

split a string into 2 arrays based on 2 delimiters

I want to split a string into 2 arrays: one with the text that's delimited by vbTab (I think it's \t in C#) and another with the text that's delimited by a newline (I think it's \n in C#).
By searching I found this (StackOverFlow Question: 1254577):
string input = "abc][rfd][5][,][.";
string[] parts1 = input.Split(new string[] { "][" }, StringSplitOptions.None);
string[] parts2 = Regex.Split(input, @"\]\[");
but my string would be something like this:
aaa\tbbb\tccc\tddd\teee\nAccount\tType\tCurrency\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00
so in the above code input becomes:
string input = "aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t\250\n"
but how do I create one string array with everything up to the first newline and another array that holds everything after?
Then the second one will have to be split again into several string arrays so I can write out a mini-statement with the account details, then showing the transactions for each account.
I want to be able to take the original string and produce something like this on A5 paper:

You can use a LINQ query:
var cells = from row in input.Split('\n')
select row.Split('\t');
You can get just the first row using First() and the remaining rows using Skip(). For example:
foreach (string s in cells.First())
{
    Console.WriteLine("First: " + s);
}
Or
foreach (string[] row in cells.Skip(1))
{
    Console.WriteLine(String.Join(",", row));
}
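Put together, the query above could look like the following self-contained sketch. The input is a shortened, made-up version of the question's data, the names TabbedRows and Parse are hypothetical, and dropping empty rows is an extra assumption (the question's string ends with a trailing newline).

```csharp
using System;
using System.Linq;

public static class TabbedRows
{
    // Splits the input into rows on '\n' and each row into cells on '\t'.
    // Empty rows (for example after a trailing newline) are dropped.
    public static string[][] Parse(string input)
    {
        return input
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(row => row.Split('\t'))
            .ToArray();
    }

    public static void Main()
    {
        string input = "aaa\tbbb\tccc\nAccount\tType\tBalance\n03NOV13\tTransfer\t9,999,999.00-\n";
        string[][] rows = Parse(input);
        string[] header = rows.First();           // everything up to the first newline
        string[][] rest = rows.Skip(1).ToArray(); // all remaining rows
        Console.WriteLine(string.Join(",", header)); // aaa,bbb,ccc
        Console.WriteLine(rest.Length);              // 2
    }
}
```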

The code below should do what you requested. It treats the \t and \n in the input as literal two-character sequences, which is why the input is declared as a verbatim string here. This resulted in part1 having 5 entries and part2 having 26 entries.
string input = @"aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t\250\n";
// Substring starting at 0 and ending where the first newline begins
string input1 = input.Substring(0, input.IndexOf(@"\n"));
/* Substring starting where the first newline begins
plus the length of the new line to the end */
string input2 = input.Substring(input.IndexOf(@"\n") + 2);
// If the string holds real tab and newline characters instead,
// search for '\n' (length 1) and split on '\t'.
string[] part1 = Regex.Split(input1, @"\\t");
string[] part2 = Regex.Split(input2, @"\\t");

Regular expression to split long strings in several lines

I'm not an expert in regular expressions, and today in my project I need to split a long string into several lines in order to check whether the text fits the page height.
I need a C# regular expression that splits long strings into several lines at "\n" and "\r\n", keeping a maximum of 150 characters per line. If character 150 falls in the middle of a word, the entire word should be moved to the next line.
Can anyone help me?

It's actually a quite simple problem. Look for any characters up to 150, followed by a space. Since Regex is greedy by nature it will do exactly what you want it to. Replace it by the Match plus a newline:
.{0,150}(\s+|$)
Replace with
$0\r\n
See also: http://regexhero.net/tester/?id=75645133-1de2-4d8d-a29d-90fff8b2bab5
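In C#, the replacement might be applied roughly as in the sketch below. The names RegexWrapDemo and Wrap are made up, and the quantifier is changed from {0,150} to {1,width}: that is an assumption made here so that the pattern cannot produce an empty match at the very end of the input, which would add one more line break.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWrapDemo
{
    // Inserts "\r\n" after each run of at most `width` characters that is
    // followed by whitespace or by the end of the input.
    public static string Wrap(string text, int width)
    {
        // {1,width} instead of {0,width}: avoids an empty match at the end.
        string pattern = ".{1," + width + @"}(\s+|$)";
        return Regex.Replace(text, pattern, "$0\r\n");
    }

    public static void Main()
    {
        // A small width keeps the demonstration readable; the answer uses 150.
        Console.Write(Wrap("aaa bbb ccc", 7));
        // Writes "aaa bbb " on the first line and "ccc" on the second.
    }
}
```

The whitespace that ends each match stays at the end of the line; call TrimEnd on each line if that matters for measuring the text.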

var regex = new Regex(@".{0,150}", RegexOptions.Multiline);
var strings = regex.Replace(sourceString, "$0\r\n");

Here you go:
^.{1,150}\n
This will match the longest initial string like this.

If you just want to split a long string into lines of 150 chars, then I'm not sure why you'd need a regular expression:
private string stringSplitter(string inString)
{
    int lineLength = 150;
    StringBuilder sb = new StringBuilder(); // requires using System.Text;
    // only split while the remaining text does not fit on one line
    while (inString.Length > lineLength)
    {
        // last space or newline within the first lineLength characters
        var lastGap = inString.Substring(0, lineLength).LastIndexOfAny(new char[] {' ', '\n'});
        if (lastGap == -1)
        {
            // no gap found: break the word at lineLength
            sb.AppendLine(inString.Substring(0, lineLength));
            inString = inString.Substring(lineLength);
        }
        else
        {
            sb.AppendLine(inString.Substring(0, lastGap));
            inString = inString.Substring(lastGap + 1);
        }
    }
    // whatever remains fits on one line
    if (inString.Length > 0)
    {
        sb.AppendLine(inString);
    }
    return sb.ToString();
}
edited to account for word breaks

This code should help you. It checks the length of the current string. If it is greater than maxLength (150 in this case), it starts at index maxLength and, going backwards, finds the first whitespace character (the OP describes a word as a sequence of non-space characters). It then stores the string up to that character and starts over with the remaining string, repeating until the remainder is no longer than maxLength characters. Finally, it joins the pieces back together into a final string.
string line = "This is a really long run-on sentence that should go for longer than 150 characters and will need to be split into two lines, but only at a word boundary.";
int maxLength = 150;
string delimiter = "\r\n";
List<string> lines = new List<string>();
// As long as we still have more than 'maxLength' characters, keep splitting
while (line.Length > maxLength)
{
    // Starting at this character and going backwards, look for
    // a whitespace character to split at.
    int splitAt = -1;
    for (int charIndex = maxLength; charIndex > 0; charIndex--)
    {
        if (char.IsWhiteSpace(line[charIndex]))
        {
            splitAt = charIndex;
            break;
        }
    }
    if (splitAt == -1)
    {
        // No whitespace found: hard-split so the loop always makes progress
        lines.Add(line.Substring(0, maxLength));
        line = line.Substring(maxLength);
    }
    else
    {
        // Split the line after this character
        // and continue on with the remainder
        lines.Add(line.Substring(0, splitAt + 1));
        line = line.Substring(splitAt + 1);
    }
}
lines.Add(line);
// Join the list back together with delimiter ("\r\n") between each line
string final = string.Join(delimiter, lines);
// Check the results
Console.WriteLine(final);
Note: If you run this code in a console application, you may want to change "maxLength" to a smaller number so that the console doesn't wrap on you.
Note: This code does not take tab characters into account. If tabs are also included, your situation gets a bit more complicated.
Update: I fixed a bug where new lines were starting with a space.
