I need to remove everything in a string before the first occurrence of a space.
Every string starts with a number and followed by a space
Replace the number and the space, thus leaving the rest of the string in tact
For Example:
22 The cats of India
4 Royal Highness
562 Eating Potatoes
42 Biscuits in the 2nd fridge
2564 Niagara Falls at 2 PM
I just need:
The cats of India
Royal Highness
Eating Potatoes
Biscuits in the 2nd fridge
Niagara Falls at 2 PM
Basically remove every number before the first space, including the first space.
I tried this:
foreach (string line in lines)
{
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
}
This works for numbers below 10. After it hits 2 digits, it doesn't work properly.
How should I change my code?
If you want to make sure you only match digits at the beginning of the string, you can use the following regex:
^\d+\p{Zs}
See demo
Declare it like:
public static readonly Regex rx = new Regex(#"^\d+\p{Zs}", RegexOptions.Compiled);
The ^\d+\p{Zs} regex means: one or more digits at the start of the string followed with 1 whitespace.
And then use it like
string newline = rx.Replace(line, string.Empty);
EDIT: To make sure the line has no leading whitespace, we can add .Trim() to strip it like:
Regex rx = new Regex(#"^\d+\p{Zs}", RegexOptions.Compiled);
string newline = rx.Replace(line.Trim(), string.Empty);
I know you already found a resolution to your issue. But I am going to explain why your code didn't work in the first place.
Your data has extra spaces which is why you are trimming it: line.Trim(). But the real problem lies in the the following statement:
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
You are making the assumption about the order of the operation and the fact that string data type is not immutable. When the operation of Trim() function is complete it returns a whole new string which is used in the Remove() operation. But the IndexOf() function is done on the original line of data.
So the correct line of code would be the following:
foreach (string line in lines)
{
// trim the line first
var temp = line.Trim();
// now perform all operation on the new temporary string
string newline = temp.Remove(0, temp.IndexOf(' ') + 1);
// debugging purpose
Console.WriteLine(newline);
}
Another solution:
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2564 Niagara Falls at 2 PM"
};
foreach (var line in lines)
{
var newLine = string.Join(" ", line.Split(' ').Skip(1));
}
Use a regex like so:
string newline = Regex.Replace(line, #"^\s*\d+\s*", "");
This will remove numbers only, not other text before the first space.
This is what you are looking for
foreach (string line in lines)
{
string newline = line.Replace(line.Split(new Char[]{' '})[0] + ' ',string.Empty);
}
UPDATE
string search=line.Split(new Char[]{' '})[0];
int pos=line.indexOf(search);
string newline = line.Substring(0, pos) + string.Empty + line.Substring(pos + search.Length);
FULL CODE
using System;
public class Program
{
public static void Main()
{
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2 Niagara Falls at 2 PM"
};
foreach(string line in lines){
string search=line.Split(new Char[]{' '})[0];
int pos=line.IndexOf(search);
string newline = line.Substring(0, pos) + string.Empty + line.Substring(pos + search.Length);
Console.WriteLine(newline);
}
}
}
Related
Background
I am working with a delimited string and was using String.Split to put each substring into an array when I noticed that the last spot in the array was "". It was throwing off my results since I was looking for a specific substring at the last index in the array and I eventually came across this post explaining all strings end with string.Empty.
Example
The following shows this behavior in action. When I split my sentence and write each substring to the console, we can see the last element is the empty string:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
foreach (var word in wordArray)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
Question & "Fix"
I get conceptually that there are empty strings between every character, but why does String behave like this even for the end of a string? It seems confusing that "ABC" is equivalent to "ABC" + "" or ABC + "" + "" + "" so why not treat the string literally as only "ABC"?
There is a "fix" around it to get the "true" substrings I wanted:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
var wordList = new List<string>();
wordList.AddRange(wordArray);
wordList.RemoveAt(wordList.LastIndexOf(string.Empty));
foreach (var word in wordList)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
But I still don't understand why the end of the string gets treated with the same behavior since there is not another character following it where an empty string would be needed. Does it serve some purpose for the compiler?
Empty strings are the 0 of strings. There are literally infinity of them everywhere.
It's only natural that "ABC" is equivalent to "ABC" + "" or ABC + "" + "" + "". Just like it's natural that 3 is equivalent to 3 + 0 or 3 + 0 + 0 + 0.
and the fact that you have an empty string after "Hello,this,is,my,string!".Split('!')" does mean something. It means that your string ended with a "!"
This is happening because you are using StringSplitOptions.None while one of your delimiter values occurs at the end of the string. The entire purpose of that option is to create the behavior you are observing: it splits a string containing N delimiters into exactly N + 1 pieces.
To see the behavior you want, use StringSplitOptions.RemoveEmptyEntries:
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.RemoveEmptyEntries);
As for why you are seeing what you're seeing. The behavior StringSplitOptions.None is to find all the places where the delimiters are in the input string and return an array of each piece before and after the delimiters. This could be useful, for example, if you're parsing a string that you know to have exactly N parts, but where some of them could be blank. So for example, splitting the following on a comma delimiter, they would each yield exactly 3 parts:
a,b,c
a,b,
a,,c
a,,
,b,c
,b,
,,c
,,
If you want to allow empty values between delimiters, but not at the beginning or end, you can strip off delimiters at the beginning or end of the string before splitting:
var wordArray = Regex
.Replace(mySentence, "^[,!]|[,!]$", "")
.Split(new[] {",", "!"}, StringSplitOptions.None);
"" is the gap in-between each letter of Hello,this,is,my,string! So when the string is split by , and ! the result is Hello, this, is, my, string, "". The "" being the empty character between the end of the string and !.
If you replaced "" with a visible character (say #) your string would look like this #H#e#l#l#o#,#t#h#i#s#,#i#s#,#m#y#,#s#t#r#i#n#g#!#.
I'm trying to work on this string
abc
def
--------------
efg
hij
------
xyz
pqr
--------------
Now I have to split the string with the - character.
So far I'm first spliting the string in lines and then finding the occurrence of - and the replacing the line with a single *, then combining the whole string and splitting them again.
I'm trying to get the data as
string[] set =
{
"abc
def",
"efg
hij",
"xyz
pqr"
}
Is there a better way to do this?
var spitStrings = yourString.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If i understand your question, this above code solves it.
Use of string split function using the specific char or string of chars (here -) can be used.
The output will be array of strings. Then choose whichever strings you want.
Example:
http://www.dotnetperls.com/split
I'm confused with exactly what you're asking, but won't this work?
string[] seta =
{
"abc\ndef",
"efg\nhij",
"xyz\npqr"
}
\n = CR (Carriage Return) // Used as a new line character in Unix
\r = LF (Line Feed) // Used as a new line character in Mac OS
\n\r = CR + LF // Used as a new line character in Windows
(char)13 = \n = CR // Same as \n
If I'm understanding your question about splitting -'s then the following should work.
string s = "abc-def-efg-hij-xyz-pqr"; // example?
string[] letters = s.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If this is what your array looks like at the moment, then you can loop through it as follows:
string[] seta = {
"abc-def",
"efg-hij",
"xyz-pqr"
};
foreach (var letter in seta)
{
string[] letters = letter.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
// do something with letters?
}
I'm sure this below code will help you...
string m = "adasd------asdasd---asdasdsad-------asdasd------adsadasd---asdasd---asdadadad-asdadsa-asdada-s---adadasd-adsd";
var array = m.Split('-');
List<string> myCollection = new List<string>();
if (array.Length > 0)
{
foreach (string item in array)
{
if (item != "")
{
myCollection.Add(item);
}
}
}
string[] str = myCollection.ToArray();
if it does then don't forget to mark my answer thanks....;)
string set = "abc----def----------------efg----hij--xyz-------pqr" ;
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
EDIT -
He wants to split the strings no matter how many '-' are there.
var spitStrings = set.Split(new char[]{'-'},StringSplitOptions.RemoveEmptyEntries);
This will do the work.
I want to split a string into 2 arrays, one with the text that's delimited by vbTab (I think it's \t in c#) and another string with the test thats delimited by vbtab (I think it's \n in c#).
By searching I found this (StackOverFlow Question: 1254577):
string input = "abc][rfd][5][,][.";
string[] parts1 = input.Split(new string[] { "][" }, StringSplitOptions.None);
string[] parts2 = Regex.Split(input, #"\]\[");
but my string would be something like this:
aaa\tbbb\tccc\tddd\teee\nAccount\tType\tCurrency\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00
so in the above code input becomes:
string input = "aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t\250\n"
but how do I create one string array with everthing up to the first newline and another array that holds everything after?
Then the second one will have to be split again into several string arrays so I can write out a mini-statement with the account details, then showing the transactions for each account.
I want to be able to take the original string and produce something like this on A5 paper:
You can use a LINQ query:
var cells = from row in input.Split('\n')
select row.Split('\t');
You can get just the first row using First() and the remaining rows using Skip(). For example:
foreach (string s in cells.First())
{
Console.WriteLine("First: " + s);
}
Or
foreach (string[] row in cells.Skip(1))
{
Console.WriteLine(String.Join(",", row));
}
The code below should do what you requested. This resulted in part1 having 5 entries and part2 having 26 entries
string input = "aa\tbbb\tccc\tddd\teee\nAccount\tType\tPersonal Current Account\tCurrency\tGBP\tBalance\t123,456.78\nDate\tDetails\tAmount\n03NOV13\tTransfer\t9,999,999.00-\n02NOV13\t\Cheque\t125.00\nDebit Card Cash\t200.00\n30OCT13\tLoan Repayment\t1,234.56-\n\tType\t30-Day Notice Savings Account\tCurrency\tGBP\tBalance\t983,456.78\nDate\tDetails\tAmount\n03NOV13\tRepaid\t\250\n";
// Substring starting at 0 and ending where the first newline begins
string input1 = input.Substring(0, input.IndexOf(#"\n"));
/* Substring starting where the first newline begins
plus the length of the new line to the end */
string input2 = input.Substring(input.IndexOf(#"\n") + 2);
string[] part1 = Regex.Split(input1, #"\\t");
string[] part2 = Regex.Split(input2, #"\\t");
My string is like this:
string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
Here actually I want to extract the last word, 'API', and return.
What would be the C# code to do the above extraction?
Well, the naive implementation to that would be to simply split on each space and take the last element.
Splitting is done using an instance method on the String object, and the last of the elements can either be retrieved using array indexing, or using the Last LINQ operator.
End result:
string lastWord = input.Split(' ').Last();
If you don't have LINQ, I would do it in two operations:
string[] parts = input.Split(' ');
string lastWord = parts[parts.Length - 1];
While this would work for this string, it might not work for a slightly different string, so either you'll have to figure out how to change the code accordingly, or post all the rules.
string input = ".... ,API";
Here, the comma would be part of the "word".
Also, if the first method of obtaining the word is correct, that is, everything after the last space, and your string adheres to the following rules:
Will always contain at least one space
Does not end with one or more spaces (in case of this you can trim it)
Then you can use this code that will allocate fewer objects on the heap for GC to worry about later:
string lastWord = input.Substring(input.LastIndexOf(' ') + 1);
However, if you need to consider commas, semicolons, and whatnot, the first method using splitting is the best; there are fewer things to keep track of.
First:
using System.Linq; // System.Core.dll
then
string last = input.Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault();
// or
string last = input.Trim().Split(' ').LastOrDefault().Trim();
var last = input.Substring(input.LastIndexOf(' ')).TrimStart();
This method doesn't allocate an entire array of strings as the others do.
string workingInput = input.Trim();
string last = workingInput.Substring(workingInput.LastIndexOf(' ')).Trim();
Although this may fail if you have no spaces in the string. I think splitting is unnecessarily intensive just for one word :)
static class Extensions
{
private static readonly char[] DefaultDelimeters = new char[]{' ', '.'};
public string LastWord(this string StringValue)
{
return LastWord(StringValue, DefaultDelimeters);
}
public string LastWord(this string StringValue, char[] Delimeters)
{
int index = StringValue.LastIndexOfAny(Delimeters);
if(index>-1)
return StringValue.Substring(index);
else
return null;
}
}
class Application
{
public void DoWork()
{
string sentence = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
string lastWord = sentence.LastWord();
}
}
var lastWord = input.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries).Last();
string input = "STRIP, HR 3/16 X 1 1/2 X 1 5/8 + API";
var a = input.Split(' ');
Console.WriteLine(a[a.Length-1]);
I need to parse a string so the result should output like that:
"abc,def,ghi,klm,nop"
But the string I am receiving could looks more like that:
",,,abc,,def,ghi,,,,,,,,,klm,,,nop"
The point is, I don't know in advance how many commas separates the words.
Is there a regex I could use in C# that could help me resolve this problem?
You can use the ,{2,} expression to match any occurrences of 2 or more commas, and then replace them with a single comma.
You'll probably need a Trim call in there too, to remove any leading or trailing commas left over from the Regex.Replace call. (It's possible that there's some way to do this with just a regex replace, but nothing springs immediately to mind.)
string goodString = Regex.Replace(badString, ",{2,}", ",").Trim(',');
Search for ,,+ and replace all with ,.
So in C# that could look like
resultString = Regex.Replace(subjectString, ",,+", ",");
,,+ means "match all occurrences of two commas or more", so single commas won't be touched. This can also be written as ,{2,}.
a simple solution without regular expressions :
string items = inputString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
string result = String.Join(",", items);
Actually, you can do it without any Trim calls.
text = Regex.Replace(text, "^,+|,+$|(?<=,),+", "");
should do the trick.
The idea behind the regex is to only match that, which we want to remove. The first part matches any string of consecutive commas at the start of the input string, the second matches any consecutive string of commas at the end, while the last matches any consecutive string of commas that follows a comma.
Here is my effort:
//Below is the test string
string test = "YK 002 10 23 30 5 TDP_XYZ "
private static string return_with_comma(string line)
{
line = line.TrimEnd();
line = line.Replace(" ", ",");
line = Regex.Replace(line, ",,+", ",");
string[] array;
array = line.Split(',');
for (int x = 0; x < array.Length; x++)
{
line += array[x].Trim();
}
line += "\r\n";
return line;
}
string result = return_with_comma(test);
//Output is
//YK,002,10,23,30,5,TDP_XYZ