Split the string with different conditions using Linq in C# - c#

I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.
Some Examples:
"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""
How can I achieve that?

An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
both cases are WITHOUT checks. The OP can add checks if he wants them.
For the second question, using LINQ is REALLY too much difficult. With a Regex it's "easily doable".
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex Regex = new Regex(#"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex Regex2 = new Regex(#"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = Regex.Replace(str, "$1$4");
var str2 = Regex.Replace(str, "$3");
The difference between the two is that the first will use A-Z as upper case characters, the second one will use other "upper case" characters, for example ÀÈÉÌÒÙ
With code sample: http://ideone.com/FqcmY

This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:
Match lastSeparator = Regex.Match(strExample,
#"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
Use RegexOptions.RightToLeft to find the last match.
Use of Match.Result for a replace.
$`$' as replacement string: http://www.regular-expressions.info/refreplace.html
\p{Lu} for upper-case letters, you can change that to [A-Z] if your more comfortable with that.
If the word shouldn't follow an upper case word, you can simplify the regex to:
#"[-/ ;(](\p{Lu}+)\b"
If you want other characters as well, you can use a character class (and maybe remove \b). For example:
#"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK

use a List of strings, set all the words to it
find the index of the / then use ElementAt() to determine the word to split which is "SHAM" in your question.
in the below sentence of yours your index of / will be 6.
string strSentence ="This TASK is assigned to ANIL/SHAM in our project";
then use ElementAt(6) at the end of
index is the index of the / in your List<string>
str = str.Select(s => strSentence.ElementAt(index+1)).ToList();
this will return you the SHAM
str = str.Delete(s => strSentence.ElementAt(index+1));
this will delete the SHAM then just print the strSentence without SHAM
if you dont want to use a list of strings you can use the " " to determinate the words in your sentence i think, but that would be a long way to go.
the idea of mine is right i think but the code may not be that flawless.

You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.

As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use linq

Related

Remove anything from string after any "a-zA-Z" char

I have this types of string:
"10a10", "10b5641", "5a1121", "438z2a5f"
and I need to remove anything after the FIRST a-zA-Z char in the string (the symbol itself should be removed as well). What could be a solution?
Examples of results I expect:
"10a10" returns "10"
"10b5641" returns "10"
"5a1121" returns "5"
"438z2a5f" returns "438"
You could use Regular Expressions along with Regex, something like:
string str = "10a10";
str = Regex.Replace(str, #"[a-zA-Z].*", "");
Console.WriteLine(str);
will output:
10
Basically it will takes everything that starts with a-zA-Z and everything after it (.* matches any characters zero or unlimited times) and remove it from the string.
An easy to understand approach would be to use the String.IndexOfAny Method to find the Index of the first a-zA-Z char, and then use the String.Substring Method to cut the string accordingly.
To do so you would create an array containing all a-zA-Z characters and use this as an argument to String.IndexOfAny. After that you use 0 and the result of String.IndexOfAny as arguments for String.Substring.
I am pretty sure there are more elegant ways to do this, but this seems the most basic approach to me, so its worth mentioning.
You could do so using Linq as follows.
var result = new string(strInput.TakeWhile(x => !char.IsLetter(x)).ToArray());
var sList = new List<string> { "10a10", "10b5641", "5a1121", "438z2a5f" };
foreach (string s in sList.ToArray())
{
string number = new string(s.TakeWhile(c => !Char.IsLetter(c)).ToArray());
Console.WriteLine(number);
}
Either Linq:
var result = string.Concat(strInput
.TakeWhile(c => !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')));
Or regular expression:
using System.Text.RegularExpressions;
...
var result = Regex.Match(strInput, "^[^A-Za-z]*").Value;
In both cases starting from strInput beginning take characters until a..z or A-Z occurred
Demo:
string[] tests = new[] {
"10a10", "10b5641", "5a1121", "438z2a5f"
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-10} returns \"{Regex.Match(test, "^[^A-Za-z]*").Value}\""));
Console.Write(demo);
Outcome:
10a10 returns "10"
10b5641 returns "10"
5a1121 returns "5"
438z2a5f returns "438"

Replace text in camelCase inside the string

I have string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change text in { } to cameCase
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that? What is the best way to do that? Please help
I have no experience with Regex. So, I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while(i < str1.Length)
{
if (str1[i] == '{')
{
str3 += "{" + char.ToLower(str1[i + 1]);
i = i + 2;
} else
{
str3 += str1[i];
i++;
}
}
You can do it with regex of course.
But you can do it also with LINQ like this:
var result = String.Join("/{",
str1.Split(new string[1] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => k = !k.StartsWith("/") ? Char.ToLowerInvariant(k[0]) + k.Substring(1) : k));
What is done here is: Splitting into 3 parts:
"/api/agencies/"
"AgencyGuid}/contactpersons"
"ContactPersonGuid}"
After that we are selecting from each element such value: "If you start with "/" it means you are the first element. If so - you should be returned without tampering. Otherwise : take first char (k[0]) change it to lowercase ( Char.ToLowerInvariant() ) and concatenate with the rest.
At the end Join those three (one unchanged and two changed) strings
With Regex you can do it as:
var regex = new Regex(#"\/{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
in regex we search for pattern "/{\w" meaning find "/{" and one letter (\w). This char will be taken into a group ( because of () surrounding) and after that run Regex and replace such group to m.ToString().ToLower()
I probably wouldn't use regex, but since you asked
Regex.Replace(
"/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
#"\{[^\}]+\}",
m =>
$"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
RegexOptions.ExplicitCapture
)
This assumes string interpolation in c# 6, but you can do the same thing by concatenating.
Explanation:
{[^}]+} - grab all letters that follow an open mustache that are not a close mustache and then the close mustache
m => ... - A lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - return a new string by taking the an open mustache, the first letter lowercased, then the rest of the string, then a close mustache.

How to remove only certain substrings from a string?

Using C#, I have a string that is a SQL script containing multiple queries. I want to remove sections of the string that are enclosed in single quotes. I can do this using Regex.Replace, in this manner:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, "'[^']*'", string.Empty);
Results in: "Only can we turn him to the of the Force"
What I want to do is remove the substrings between quotes EXCEPT for substrings containing a specific substring. For example, using the string above, I want to remove the quoted substrings except for those that contain "dark," such that the resulting string is:
Results in: "Only can we turn him to the 'dark side' of the Force"
How can this be accomplished using Regex.Replace, or perhaps by some other technique? I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().
Note: I don't care if the single quotes around "dark side" are removed or not, so the result could also be: "Only can we turn him to the dark side of the Force." I say this because a solution using Split() would remove all the single quotes.
Edit: I don't have a solution yet using Substring(), IndexOf(), etc. By "working on," I mean I'm thinking in my head how this can be done. I have no code, which is why I haven't posted any yet. Thanks.
Edit: VKS's solution below works. I wasn't escaping the \b the first attempt which is why it failed. Also, it didn't work unless I included the single quotes around the whole string as well.
test = Regex.Replace(test, "'(?![^']*\\bdark\\b)[^']*'", string.Empty);
'(?![^']*\bdark\b)[^']*'
Try this.See demo.Replace by empty string.You can use lookahead here to check if '' contains a word dark.
https://www.regex101.com/r/rG7gX4/12
While vks's solution works, I'd like to demonstrate a different approach:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, #"'[^']*'", match => {
if (match.Value.Contains("dark"))
return match.Value;
// You can add more cases here
return string.Empty;
});
Or, if your condition is simple enough:
test = Regex.Replace(test, #"'[^']*'", match => match.Value.Contains("dark")
? match.Value
: string.Empty
);
That is, use a lambda to provide a callback for the replacement. This way, you can run arbitrary logic to replace the string.
some thing like this would work. you can add all strings you want to keep into the excludedStrings array
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
var excludedString = new string[] { "dark side" };
int startIndex = 0;
while ((startIndex = test.IndexOf('\'', startIndex)) >= 0)
{
var endIndex = test.IndexOf('\'', startIndex + 1);
var subString = test.Substring(startIndex, (endIndex - startIndex) + 1);
if (!excludedString.Contains(subString.Replace("'", "")))
{
test = test.Remove(startIndex, (endIndex - startIndex) + 1);
}
else
{
startIndex = endIndex + 1;
}
}
Another method through regex alternation operator |.
#"('[^']*\bdark\b[^']*')|'[^']*'"
Then replace the matched character with $1
DEMO
string str = "Only 'together' can we turn him to the 'dark side' of the Force";
string result = Regex.Replace(str, #"('[^']*\bdark\b[^']*')|'[^']*'", "$1");
Console.WriteLine(result);
IDEONE
Explanation:
(...) called capturing group.
'[^']*\bdark\b[^']*' would match all the single quoted strings which contains the substring dark . [^']* matches any character but not of ', zero or more times.
('[^']*\bdark\b[^']*'), because the regex is within a capturing group, all the matched characters are stored inside the group index 1.
| Next comes the regex alternation operator.
'[^']*' Now this matches all the remaining (except the one contains dark) single quoted strings. Note that this won't match the single quoted string which contains the substring dark because we already matched those strings with the pattern exists before to the | alternation operator.
Finally replacing all the matched characters with the chars inside group index 1 will give you the desired output.
I made this attempt that I think you were thinking about (some solution using split, Contain, ... without regex)
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
string[] separated = test.Split('\'');
string result = "";
for (int i = 0; i < separated.Length; i++)
{
string str = separated[i];
str = str.Trim(); //trim the tailing spaces
if (i % 2 == 0 || str.Contains("dark")) // you can expand your condition
{
result += str+" "; // add space after each added string
}
}
result = result.Trim(); //trim the tailing space again

Replace Contiguous Instance of a String in C#

How can I replace contiguous substring of a string in C#?
For example, the string
"<p>The quick fox</p>"
will be converted to
"<p>The quick fox</p>"
Use the below regex
#"(.+)\1+"
(.+) captures the group of characters and matches also the following \1+ one or more same set of characters.
And then replace the match with $1
DEMO
string result = Regex.Replace(str, #"(.+)\1+", "$1");
Maybe this simple one is enough:
( ){2,}
and replace with $1 ( that's captured in first parenthesized group)
See test at regex101
To check, if a substring is followed by itself, also can use a lookahead:
(?:( )(?=\1))+
and replace with empty. See test at regex101.com
Let's call the original string s and the substring subString:
var s = "<p>The quick fox</p>";
var subString = " ";
I'd prefer this instead of a regex, much more readable:
var subStringTwice = subString + subString;
while (s.Contains(subStringTwice))
{
s = s.Replace(subStringTwice, subString);
}
Another possible solution with better performance:
var elements = s.Split(new []{subString}, StringSplitOptions.RemoveEmptyEntries);
s = string.Join(subString, elements);
// This part is only needed when subString can appear at the start or the end of s
if (result != "")
{
if (s.StartsWith(subString)) result = subString + result;
if (s.EndsWith(subString)) result = result + subString;
}

replacing characters in a single field of a comma-separated list

I have string in my c# code
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f" i.e. ',' which is inside inverted comma should be replaced by space.
I tried using string.split but it is not working for me.
OK, I can't be bothered to think of a regex approach so I am going to offer an old fashioned loop approach which will work:
string DoReplace(string input)
{
bool isInner = false;//flag to detect if we are in the inner string or not
string result = "";//result to return
foreach(char c in input)//loop each character in the input string
{
if(isInner && c == ',')//if we are in an inner string and it is a comma, append space
result += " ";
else//otherwise append the character
result += c;
if(c == '"')//if we have hit an inner quote, toggle the flag
isInner = !isInner;
}
return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern, it is matching any instance where there is a "X,X" sequence where X is any letter, and is replacing it with the very same sequence, with a space in between the letters instead of a comma.
You could easily extend this if you needed to to have it match more than one letter, etc, as needed.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This does something similar compared to the first one, but just removes any comma that is sandwiched by letters surrounded by quotes, and repeats it until all cases are removed.
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("", str3);
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for ( int i = 1; i < pieces.Length; i += 2 )
{
pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h

Categories

Resources