String.Split cut separator - c#

Is it possible to use String.Split without removing the separator from the string?
For example, I have the string
convertSource = "http://www.domain.com http://www.domain1.com";
I want to build an array, so I use the code below
convertSource.Split(new[] { " http" }, StringSplitOptions.RemoveEmptyEntries)
and I get this array:
[1] http://www.domain.com
[2] ://www.domain1.com
I would like to keep http. It seems String.Split not only separates the string but also cuts off the separator.

This is screaming for Regular Expressions:
Regex regEx = new Regex(@"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)");
Match match= regEx.Match("http://www.domain.com http://www.domain1.com");
IList<string> values = new List<string>();
while (match.Success)
{
values.Add(match.Value);
match = match.NextMatch();
}

string[] array = Regex.Split(convertSource, @"(?=http://)");
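Splitting on a zero-width lookahead keeps the separator, but the pieces can still carry the space that preceded each separator, and an empty piece may appear at the start. A minimal sketch of a cleanup pass, assuming the URLs are whitespace-separated and all begin with http:// (the names UrlSplitter and SplitKeepingScheme are made up for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlSplitter
{
    // Hypothetical helper: splits before each "http://" without consuming it.
    public static string[] SplitKeepingScheme(string source)
    {
        return Regex.Split(source, @"(?=http://)")
            .Select(part => part.Trim())      // remove the space left in front of each separator
            .Where(part => part.Length > 0)   // discard any empty piece
            .ToArray();
    }

    public static void Main()
    {
        string convertSource = "http://www.domain.com http://www.domain1.com";
        foreach (string url in SplitKeepingScheme(convertSource))
            Console.WriteLine(url);
    }
}
```

Trimming and filtering after the split keeps the pattern itself simple; folding the whitespace into the pattern would also work but is harder to read.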

That's because you use " http" as the separator.
Try this:
string separator = " ";
convertSource.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
Split cuts the string at every occurrence of the separator you provide and removes the separator from the results.
There are other ways to split the string while keeping the delimiters. If you only want to remove leading or trailing spaces, I would suggest the .Trim() method: convertSource.Trim()

Related

Need to create a Regular expression to Split String after first \r\n

I have been stuck on this problem.
Here are a few input strings:
"abacuses\r\n25"
"alphabet\r\n56,\r\n57"
"animals\r\n44,\r\n45,\r\n47"
I need the output to be split like this:
"abacuses\r\n25" should be split into A) abacuses B) 25
"alphabet\r\n56,\r\n57" should be split into A) alphabet B) 56,57
"animals\r\n44,\r\n45,\r\n47" should be split into A) animals B) 44,45,47
So far I have tried these, but they don't work:
string[] ina = Regex.Split(indexname, @"\r\n\D+");
string[] ina = Regex.Split(indexname, @"\r\n\");
Please help.
No regex is needed in your example. You are basically parsing a string:
string input = "animals\r\n44,\r\n45,\r\n47";
var split = input.Split(new char[]{'\r','\n',','}, StringSplitOptions.RemoveEmptyEntries);
var name = split[0]; //animals
var args = string.Join(",", split.Skip(1)); //44,45,47
Many people use regex for parsing, but regex is not a parsing language! It is a pattern matcher, used to find substrings in a string. If you can just Split your string, just do it, really. It is much easier to understand than a regex.
If you need to split a string at the first \r\n, you may use a String.Split with a count argument:
var line = "animals\r\n44,\r\n45,\r\n47";
var res = line
.Split(new[] {"\r\n"}, 2, StringSplitOptions.RemoveEmptyEntries);
// Demo output
Console.WriteLine(res[0]);
if (res.GetLength(0) > 1)
Console.WriteLine(res[1].Replace("\r\n", "")); // In the second value, linebreaks should be removed
See the C# demo
The 2 in .Split(new[] {"\r\n"}, 2, StringSplitOptions.RemoveEmptyEntries) means that the whole string should be split into 2 parts only and since the string is processed from left to right, the split will occur on the first "\r\n" substring found.

How to replace one or more spaces with a delimiter using C#

I'm parsing some text and want to split it and add the pieces one by one.
But first things first: the best way is to replace multiple spaces with one unique delimiter.
Below is the sample target text:
Total fare 619,999.0d-
12 11 82139 09/13/2013 D 103,500.00 2/025189 PARK LA000137
09/13/2013 D 50.00 File Ticket - PS1309121018882/
Does anybody know how to handle this in C#?
"the best way is to replace multiple spaces with one unique delimiter"
Not really sure if it's the best way, but the following works without regex:
string newStr = string.Join(":",
str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
try
var strings = text.Split(' ').Where(str => str.Length > 0);
You can use a regular expression:
string delimiter = ":";
var whiteSpaceNormalised = Regex.Replace(input, @"\s+", delimiter);
Use a regular expression instead to replace runs of spaces with a single space:
string parsedText = System.Text.RegularExpressions.Regex.Replace(inputString,"[ ]+"," ");

Regular Expression to split a string

I have a string like
[123,234,321,....]
Can I use a regular expression to extract only the numbers?
This is the code I am using now:
public static string[] Split()
{
string s = "[123,234,345,456,567,678,789,890,100]";
var temp = s.Replace("[", "").Replace("]", "");
char[] separator = { ',' };
return temp.Split(separator);
}
You can use string.Split for this - no need for a regular expression, though your code can be simplified:
var nums = s.Split('[', ']', ',');
Though you may want to exclude empty entries in the returned array:
var nums = s.Split(new[] { '[', ']', ',' },
StringSplitOptions.RemoveEmptyEntries);
There's an overload to Trim() that takes a character.
You could do this.
string s = "[123,234,345,456,567,678,789,890,100]";
var nums = s.Trim('[').Trim(']').Split(',');
If you want to use a regular expression, try:
string s = "[123,234,345,456,567,678,789,890,100]";
var matches = Regex.Matches(s, @"[0-9]+", RegexOptions.Compiled);
However, regular expressions tend to make your code less readable, so you might stick with your original approach.
Try using the string.Split method:
string s = "[123,234,345,456,567,678,789,890,100]";
var numbers = s.Split('[',']', ',');
foreach(var i in numbers )
Console.WriteLine(i);
Here is a DEMO.
EDIT: As Oded mentioned, you may want to use StringSplitOptions.RemoveEmptyEntries also.
string s = "[123,234,345,456,567,678,789,890,100]";
MatchCollection matches = Regex.Matches(s, @"(\d+)[,\]]");
string[] result = matches.OfType<Match>().Select(m => m.Groups[1].Value).ToArray();
Here the @ signifies a verbatim string literal, which lets the escape character '\' be used directly in regular expression notation without escaping it as "\\".
\d is a digit, and \d+ means one or more digits. The parentheses signify a group, so (\d+) means I want a group of digits. (See how the group is used a little later.)
[,\]] Square brackets, in brief, mean "choose any one of my elements", so this matches either the comma , or a square bracket ], which I had to escape.
So the regular expression finds runs of digits followed by a , or ]. Matches returns the set of matches (which we use because there are multiple matches); then we go through each match, with some LINQ, and grab the group at index 1, which is the second group. "But we only made one group?" We only specified one group; the first group (index 0) is the entire regular expression match, which in our case includes the , or ] that we don't want.
While you can and probably should use string.Split as other answers indicate, the question specifically asks if you can do it with regex, and yes, you can:
var r = new Regex(@"\d+", RegexOptions.Compiled );
var matches = r.Matches("[123,234,345,456,567,678,789,890,100]");
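If the numbers are needed as integers rather than strings, the matches can be converted in the same pass. A minimal sketch, assuming the input contains only ASCII digits and every value fits in an int (BracketListParser and ExtractInts are illustrative names, not part of any library):

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketListParser
{
    // Hypothetical helper: finds every run of digits and parses it as an int.
    public static int[] ExtractInts(string s)
    {
        return Regex.Matches(s, @"\d+")
            .Cast<Match>()   // MatchCollection is non-generic on older frameworks
            .Select(m => int.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        int[] nums = ExtractInts("[123,234,345,456,567,678,789,890,100]");
        Console.WriteLine(string.Join(" ", nums));
    }
}
```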

Regex + Convert line of numbers separated by white space into array

I'm trying to convert a string that contains multiple numbers, where each number is separated by white space, into a double array.
For example, the original string looks like:
originalString = "50 12.2 30 48.1"
I've been using Regex.Split(originalString, @"\s*"), but it's returning an array that looks like:
[50
""
12
"."
2
""
...]
Any help is much appreciated.
Use this instead:
originalString.Split(new char[]{'\t', '\n', ' ', '\r'}, StringSplitOptions.RemoveEmptyEntries);
No need to rush to RegEx every time :)
What about string[] myArray = originalString.Split(' ');
I don't see the need for a RegEx here..
If you really want to use a RegEx, use the pattern \s+ instead of \s*.
The * means zero or more, but you want to split on one or more space character.
Working example with a RegEx:
string originalString = "50 12.2 30 48.1";
string[] arr = Regex.Split(originalString, @"\s+");
foreach (string s in arr)
Console.WriteLine(s);
Regex.Split(originalString, @"\s+").Where(s => !string.IsNullOrWhiteSpace(s))
The Where returns an IEnumerable with the null/whitespace entries filtered out. If you still want it as an array, just add .ToArray() to that chain of calls.
The + character is necessary because you need a MINIMUM of one to make this a correct match.
I would stick with String.Split, supplying all whitespace characters that you are expecting.
In regular expressions, \s is equivalent to [ \t\r\n] (plus some other characters specific to the flavour in use); we can represent these through a char[]:
string[] nums = originalString.Split(
new char[] { ' ', '\t', '\r', '\n' },
StringSplitOptions.RemoveEmptyEntries);
The default behaviour if you pass null as a separator to String.Split is to split on whitespace. That includes anything that matches the Unicode IsWhiteSpace test. Within the ASCII range that means tab, line feed, vertical tab, form feed, carriage return and space.
Also you can avoid empty fields by passing the RemoveEmptyEntries option.
originalString = "50 12.2 30 48.1";
string[] fields = originalString.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
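The question asks for a double array, so the fields still need to be parsed. A minimal sketch, assuming the numbers always use '.' as the decimal separator regardless of the machine's culture (NumberLineParser and ParseDoubles are illustrative names):

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NumberLineParser
{
    // Hypothetical helper: splits on any whitespace, then parses each field.
    public static double[] ParseDoubles(string line)
    {
        return line
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // InvariantCulture makes "12.2" parse the same way on every machine
            .Select(field => double.Parse(field, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        double[] values = ParseDoubles("50 12.2 30 48.1");
        foreach (double v in values)
            Console.WriteLine(v.ToString(CultureInfo.InvariantCulture));
    }
}
```

double.Parse throws a FormatException on a malformed field; double.TryParse is the alternative if bad input should be skipped rather than reported.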

Regex removing double/triple comma in string

I need to parse a string so that the result looks like this:
"abc,def,ghi,klm,nop"
But the string I am receiving could look more like this:
",,,abc,,def,ghi,,,,,,,,,klm,,,nop"
The point is, I don't know in advance how many commas separate the words.
Is there a regex I could use in C# that could help me resolve this problem?
You can use the ,{2,} expression to match any occurrences of 2 or more commas, and then replace them with a single comma.
You'll probably need a Trim call in there too, to remove any leading or trailing commas left over from the Regex.Replace call. (It's possible that there's some way to do this with just a regex replace, but nothing springs immediately to mind.)
string goodString = Regex.Replace(badString, ",{2,}", ",").Trim(',');
Search for ,,+ and replace all with ,.
So in C# that could look like
resultString = Regex.Replace(subjectString, ",,+", ",");
,,+ means "match all occurrences of two commas or more", so single commas won't be touched. This can also be written as ,{2,}.
A simple solution without regular expressions:
string[] items = inputString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
string result = String.Join(",", items);
Actually, you can do it without any Trim calls.
text = Regex.Replace(text, "^,+|,+$|(?<=,),+", "");
should do the trick.
The idea behind the regex is to only match that, which we want to remove. The first part matches any string of consecutive commas at the start of the input string, the second matches any consecutive string of commas at the end, while the last matches any consecutive string of commas that follows a comma.
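The three alternatives described above can be checked against the sample input from the question. A minimal sketch (CommaNormalizer and Collapse are illustrative names, not an existing API):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaNormalizer
{
    // Hypothetical helper wrapping the single-pass pattern:
    //   ^,+      commas at the start of the input
    //   ,+$      commas at the end of the input
    //   (?<=,),+ commas that directly follow another comma
    public static string Collapse(string text)
    {
        return Regex.Replace(text, "^,+|,+$|(?<=,),+", "");
    }

    public static void Main()
    {
        Console.WriteLine(Collapse(",,,abc,,def,ghi,,,,,,,,,klm,,,nop"));
        // prints abc,def,ghi,klm,nop
    }
}
```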
Here is my effort:
//Below is the test string
string test = "YK 002 10 23 30 5 TDP_XYZ ";
private static string return_with_comma(string line)
{
// Drop trailing spaces so no trailing comma is produced
line = line.TrimEnd();
// Turn every space into a comma, then collapse runs of commas into one
line = line.Replace(" ", ",");
line = Regex.Replace(line, ",,+", ",");
line += "\r\n";
return line;
}
string result = return_with_comma(test);
//Output is
//YK,002,10,23,30,5,TDP_XYZ
