Substring search in string using linq - c#

I am looking in a string for operators. I need the actual operator and its index in the string.
For example: x>10&y>=10
Operators
>
&
>=
=
So I need results like
> 1
& 4
>= 6
So I wrote this code:
string str = "x>10&y>=10";
List<string> substringList = new List<string>{">", "&", ">=", "="};
var orderedOccurrences = substringList
.Where((substr) => str.IndexOf(substr, StringComparison.Ordinal) >= 0)
.Select((substr, inx) => new
{ substr, inx = str.IndexOf(substr, StringComparison.Ordinal) })
.OrderBy(x => x.inx).ToList();
However, I got results like this (obviously):
> 1
& 4
>= 6
= 7
I could use a for loop for the search and cover this error scenario, but I like the LINQ shorthand code. Is there any way I can cover the error condition using lambdas/LINQ?

Here is a more general alternative:
string str = "x>10&y>=10";
var result = Regex.Matches(str, @">=|>|&|=").Cast<Match>()
.Select(m => new { s = m.Value, i = m.Index }).ToList();
Result:
> 1
& 4
>= 6
or, a bit shorter, if the string contains no other non-word characters (\W+ matches any run of characters other than letters, digits and underscores):
var d = Regex.Matches(str, @"\W+").Cast<Match>().ToDictionary(m => m.Index, m => m.Value);
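In the first pattern above, the order of the alternatives matters. .NET tries them left to right and keeps the first one that matches at a given position, so >= has to come before >. Below is a minimal sketch that runs both orderings on the same input so the difference is visible. OperatorScan and Find are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class OperatorScan
{
    // Returns (operator, index) for every match, scanning left to right.
    public static List<(string Op, int Index)> Find(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => (Op: m.Value, Index: m.Index))
             .ToList();
}
```

With ">=|>|&|=" the result for "x>10&y>=10" is > at 1, & at 4 and >= at 6. With ">|>=|&|=" the > alternative wins at index 6, so you get > at 6 and a separate = at 7, which is the same problem as in the question.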

So basically, what you want is to scan your sequence for the characters '<', '>', '=' and '&'. If any of them is found, remember the index and the character found. If '<' or '>' is found, you want to know whether '=' follows it, and if so, the next search should start after the '='.
Note that you didn't specify what you want with &= or ==.
Whenever you have to scan strings for some syntax, it is always wise to at least consider the use of regular expressions.
According to the specification above you want a regular expression that matches if you find any of the following:
'<='
'>='
'='
'&'
'<' followed by something other than '='
'>' followed by something other than '='
Code would be simple:
using System.Text.RegularExpressions;
string expression = ...;
var regex = new Regex("&|<=|>=|[<>](?!=)|=");
var matches = regex.Matches(expression);
The object matches is a MatchCollection of Match objects. Every Match object has the properties Index, Length and Value, which are exactly the properties you want.
foreach (Match match in matches)
{
Console.WriteLine($"Match {match.Value} found"
+ $" at index {match.Index} with length {match.Length}");
}
The vertical bar | in the regular expression means OR. The [ ] means any of the characters between the brackets. (?!=) is a negative lookahead: it succeeds only if the next character is not '=', and it does not consume that character.
So a match is found for &, <=, >=, any character in <> that is not followed by =, or a lone =.
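Here is a minimal sketch of the specification above. It assumes you want Value, Index and Length for each operator. Instead of a [^=]-style class, it uses the negative lookahead (?!=), which checks the next character without consuming it. As a result, a lone > reports Length 1 and still matches at the very end of the string. Strictly speaking, the lookahead is redundant once <= and >= are listed first, but it mirrors the list above. OperatorMatcher and Describe are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class OperatorMatcher
{
    // <= and >= are tried first; [<>](?!=) matches a lone < or > only when
    // the next character is not '=', without consuming that character.
    static readonly Regex Operators = new Regex(@"<=|>=|[<>](?!=)|=|&");

    // Formats each match as Value@Index/Length.
    public static List<string> Describe(string expression) =>
        Operators.Matches(expression)
                 .Cast<Match>()
                 .Select(m => $"{m.Value}@{m.Index}/{m.Length}")
                 .ToList();
}
```

For "x>10&y>=10" this yields >@1/1, &@4/1 and >=@6/2.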
If you also want to find &= and ==, then your regular expression would be even easier:
find any of <>&= that is followed by =
or find any of <>&= that is not followed by =
In other words: any of <>&=, optionally followed by a single =.
Code:
var regex = new Regex("[<>&=]=?");
It is worth checking your regular expression in a good online regex tester. Such testers also show which matches are found and describe the syntax of regular expressions.
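For the variant that also finds &= and ==, "followed by =" and "not followed by =" collapse into one character class followed by an optional =. Here is a minimal sketch; CompoundOperators and Find are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class CompoundOperators
{
    // Any of < > & = optionally followed by one '='.
    public static string[] Find(string expression) =>
        Regex.Matches(expression, "[<>&=]=?")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "a&=b==c<d" this returns "&=", "==" and "<".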

Well, if you are bent on using LINQ you could do the following:
using System.Collections.Generic;
using System.Linq;
public static class StringExtensions
{
public static IEnumerable<(int Index, string Substring)> GetAllIndices(this string str, IEnumerable<string> substrings)
{
// Every position where 'substring' occurs in 'str'.
IEnumerable<(int Index, string Substring)> IndicesOf(string substring)
{
if (substring.Length > str.Length)
return Enumerable.Empty<(int, string)>();
return from start in Enumerable.Range(0, str.Length - substring.Length + 1)
where string.CompareOrdinal(str, start, substring, 0, substring.Length) == 0
select (start, substring);
}
// Materialize once: the filter below enumerates this list many times.
var allOperators = substrings.SelectMany(s => IndicesOf(s)).ToList();
// Drop an occurrence that lies inside another occurrence containing it (e.g. ">" inside ">=").
return allOperators.Where(o => !allOperators.Except(new[] { o })
.Any(other => o.Index >= other.Index &&
o.Index < other.Index + other.Substring.Length &&
other.Substring.Contains(o.Substring)))
.OrderBy(o => o.Index);
}
}
I'm using C# 7 syntax (value tuples and local functions) here because the code is more concise and readable, but it is easily translatable to previous versions. Because it is an extension method, it has to live in a static class.
And now if you do:
var substr = "x>10&y>=10";
var operators = new HashSet<string>(new[] { ">", "&", ">=", "=" });
var filteredOperators = substr.GetAllIndices(operators);
Console.WriteLine(string.Join(", ", filteredOperators.Select(o => $"[{o.Substring}: {o.Index}]")));
You'll get the expected result:
[>: 1], [&: 4], [>=: 6]
Is this "better" than using other tools? I'm not so sure.

Related

Remove anything from string after any "a-zA-Z" char

I have strings of this type:
"10a10", "10b5641", "5a1121", "438z2a5f"
and I need to remove everything after the FIRST a-zA-Z char in the string (the letter itself should be removed as well). What could be a solution?
Examples of results I expect:
"10a10" returns "10"
"10b5641" returns "10"
"5a1121" returns "5"
"438z2a5f" returns "438"
You could use a regular expression with the Regex class, something like:
string str = "10a10";
str = Regex.Replace(str, @"[a-zA-Z].*", "");
Console.WriteLine(str);
will output:
10
Basically, it takes the first a-zA-Z character and everything after it (.* matches any character zero or more times) and removes them from the string.
An easy-to-understand approach would be to use the String.IndexOfAny method to find the index of the first a-zA-Z char, and then use the String.Substring method to cut the string accordingly.
To do so, you would create an array containing all a-zA-Z characters and pass it as the argument to String.IndexOfAny. After that, you pass 0 and the result of String.IndexOfAny as the arguments to String.Substring.
I am pretty sure there are more elegant ways to do this, but this seems the most basic approach to me, so it's worth mentioning.
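Here is a minimal sketch of that IndexOfAny/Substring approach; LetterCut and BeforeFirstLetter are hypothetical names. It adds one thing the description above does not mention. IndexOfAny returns -1 when the string contains no letter, and Substring(0, -1) would throw. The sketch therefore returns such strings unchanged, which is an assumption about the desired behaviour.

```csharp
using System.Linq;

// Hypothetical helper for illustration.
public static class LetterCut
{
    // All ASCII letters a-z and A-Z, built once.
    static readonly char[] AsciiLetters =
        Enumerable.Range('a', 26).Select(i => (char)i)
        .Concat(Enumerable.Range('A', 26).Select(i => (char)i))
        .ToArray();

    // Part of s before the first a-z/A-Z letter; s unchanged if it has no letter (assumption).
    public static string BeforeFirstLetter(string s)
    {
        int index = s.IndexOfAny(AsciiLetters);
        return index < 0 ? s : s.Substring(0, index);
    }
}
```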
You could do so using Linq as follows.
var result = new string(strInput.TakeWhile(x => !char.IsLetter(x)).ToArray());
var sList = new List<string> { "10a10", "10b5641", "5a1121", "438z2a5f" };
foreach (string s in sList.ToArray())
{
string number = new string(s.TakeWhile(c => !Char.IsLetter(c)).ToArray());
Console.WriteLine(number);
}
Either LINQ:
var result = string.Concat(strInput
.TakeWhile(c => !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))));
Or regular expression:
using System.Text.RegularExpressions;
...
var result = Regex.Match(strInput, "^[^A-Za-z]*").Value;
In both cases, starting from the beginning of strInput, we take characters until an a-z or A-Z letter occurs.
Demo:
string[] tests = new[] {
"10a10", "10b5641", "5a1121", "438z2a5f"
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-10} returns \"{Regex.Match(test, "^[^A-Za-z]*").Value}\""));
Console.Write(demo);
Outcome:
10a10 returns "10"
10b5641 returns "10"
5a1121 returns "5"
438z2a5f returns "438"

Replace text in camelCase inside the string

I have string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change the text in { } to camelCase:
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that? What is the best way to do that? Please help.
I have no experience with regex, so this is what I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while(i < str1.Length)
{
if (str1[i] == '{')
{
str3 += "{" + char.ToLower(str1[i + 1]);
i = i + 2;
} else
{
str3 += str1[i];
i++;
}
}
You can do it with regex, of course.
But you can also do it with LINQ like this:
var result = String.Join("/{",
str1.Split(new[] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => !k.StartsWith("/") ? Char.ToLowerInvariant(k[0]) + k.Substring(1) : k));
What happens here is that the string is split on "/{" into three parts:
"/api/agencies"
"AgencyGuid}/contacts"
"ContactGuid}"
After that, each element is projected as follows. If it starts with "/", it is the first element, so it is returned unchanged. Otherwise, take the first char (k[0]), change it to lowercase (Char.ToLowerInvariant()) and concatenate it with the rest.
Finally, the three strings (one unchanged and two changed) are joined back together with "/{".
With Regex you can do it as:
var regex = new Regex(@"/\{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
The regex searches for "/{" followed by one word character (\w), and the surrounding parentheses capture that character in a group. The replacement lambda then lowercases the whole match, so "/{A" becomes "/{a". Since "/" and "{" have no case, only the letter changes.
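The pattern above captures the letter in a group but never uses that group. Here is a minimal variant that rewrites only the captured character, assuming just the first character after each { should change. RouteTemplate and CamelCasePlaceholders are hypothetical names. Unlike the /{ pattern, it also matches a { that is not preceded by /.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class RouteTemplate
{
    // Captures the first character after each '{', writes the '{' back
    // unchanged and lowercases only the captured character.
    public static string CamelCasePlaceholders(string template) =>
        Regex.Replace(template, @"\{(\w)",
            m => "{" + char.ToLowerInvariant(m.Groups[1].Value[0]));
}
```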
I probably wouldn't use regex, but since you asked:
Regex.Replace(
"/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
@"\{[^\}]+\}",
m =>
$"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
RegexOptions.ExplicitCapture
)
This assumes string interpolation from C# 6, but you can do the same thing by concatenating.
Explanation:
\{[^}]+\} - an opening brace, then one or more characters that are not a closing brace, then the closing brace
m => ... - a lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - returns a new string made of an opening brace, the first letter lowercased, the rest of the text, and a closing brace.

Using regex on a specific setup

I know a bit about regular expressions, but far from enough to figure out this one.
I have tried to find something that could help me, but I have a hard time understanding how to construct the regex in C#.
Here is what I need. Say I have a string like the following:
string s = "this is (a (string))"
What I need is to focus on the parentheses.
I want to be able to split this string up into the following List/Array "parts".
1) "this", "is", "a (string)"
or
2) "this", "is", "(a (string))".
I would like to know how to do both 1) and 2). Does anyone have an idea of how to solve this problem?
Can this be solved using regex? Does anyone know a good guide to learn about it?
Hope someone can help.
Greetings.
If you want to split with some kind of escaping (a space does not count as a separator when it is inside parentheses), you
can implement a simple loop without regular expressions:
private static IEnumerable<String> SplitWithEscape(String source) {
if (String.IsNullOrEmpty(source))
yield break;
int escapeCount = 0; // current parenthesis nesting depth
int start = 0; // start index of the current part
for (int i = 0; i < source.Length; ++i) {
char ch = source[i];
if (escapeCount > 0) {
if (ch == '(')
escapeCount += 1;
else if (ch == ')')
escapeCount -= 1;
}
else {
if (ch == ' ') {
yield return source.Substring(start, i - start);
start = i + 1; // skip the separating space
}
else if (ch == '(')
escapeCount += 1;
}
}
// Last part (dropped if the parentheses are unbalanced).
if ((start < source.Length) && (escapeCount == 0))
yield return source.Substring(start);
}
....
String source = "this is (a (string))";
String[] split = SplitWithEscape(source).ToArray();
Console.Write(String.Join("; ", split));
You can try something like this:
([^\(\s]+)\s+([^\(\s]+)\s+\((.*)\)
But this will only match a fixed number of words in your input string (in this case, two words before the parentheses). The final regex will depend on your specification.
.NET regex supports balanced constructs. Thus, you can always safely use .NET regex to match substrings between a balanced number of delimiters that may have something inside them.
So, you can use
\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+
to match parenthesized substrings (while capturing the contents in-between parentheses into Group 1) or match all non-whitespace chunks (\S+ matches 1+ non-whitespace symbols).
See Grouping Constructs in Regular Expressions, Matching Nested Constructs with Balancing Groups or What are regular expression Balancing Groups? for more details on how balancing groups work.
If you need to extract all the match values and captured values, you need to get all matched groups that are not empty or whitespace. So, use this C# code:
var line = "this is (a (string))";
var pattern = @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";
var result = Regex.Matches(line, pattern)
.Cast<Match>()
.SelectMany(x => x.Groups.Cast<Group>()
.Where(m => !string.IsNullOrWhiteSpace(m.Value))
.Select(t => t.Value))
.ToList();
foreach (var s in result) // DEMO
Console.WriteLine(s);
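The question asks for both output forms. Here is a minimal sketch that reuses the balancing-group pattern above and picks the form per match. When group 1 succeeded (a parenthesized match), form 1 takes the group's content and form 2 takes the whole match; plain words are returned as they are. ParenSplitter and its Split method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class ParenSplitter
{
    // Group 1 holds the text inside the outermost parentheses.
    const string Pattern = @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";

    // includeOuterParens = false gives form 1, true gives form 2.
    public static List<string> Split(string input, bool includeOuterParens) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => !includeOuterParens && m.Groups[1].Success ? m.Groups[1].Value : m.Value)
             .ToList();
}
```

For "this is (a (string))" this gives "this", "is", "a (string)" with false, and "this", "is", "(a (string))" with true.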
Maybe you can use ((?<=\()[^}]*(?=\)))|\W+ to split into words and then get the content of group 1...

Split the string with different conditions using Linq in C#

I need to extract and remove a word from a string. The word must be upper-case and must follow one of the delimiters /, ;, (, - or a space.
Some Examples:
"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""
How can I achieve that?
An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
Both versions are WITHOUT checks (for example, for a string that contains no delimiter at all). The OP can add checks if needed.
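Here is a minimal sketch of the "intelligent" version with one such check added; DelimiterSplit and SplitAtLastDelimiter are hypothetical names. Taking the "ATHEX" example as the guide, the sketch assumes that a string without any delimiter is returned entirely as the extracted word. Like the original, it does not check that the word is upper case, so cases such as "Euro-Stoxx-50" still need the regex versions.

```csharp
// Hypothetical helper for illustration.
public static class DelimiterSplit
{
    static readonly char[] Delimiters = { '/', ' ', ';', '(', '-' };

    // Splits at the last delimiter; with no delimiter, the whole string is the word (assumption).
    public static (string Rest, string Word) SplitAtLastDelimiter(string s)
    {
        int ix = s.LastIndexOfAny(Delimiters);
        return ix < 0 ? ("", s) : (s.Substring(0, ix), s.Substring(ix + 1));
    }
}
```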
For the second question, using LINQ is REALLY too difficult. With a Regex it's "easily doable".
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = regex.Replace(str, "$1$4");
var str2 = regex.Replace(str, "$3");
The difference between the two is that the first treats only A-Z as upper-case characters, while the second (\p{Lu}) also accepts other upper-case letters, for example ÀÈÉÌÒÙ.
With code sample: http://ideone.com/FqcmY
This should work according to the new requirements. It finds the last separator that sits between two upper-case words:
Match lastSeparator = Regex.Match(strExample,
@"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
Use RegexOptions.RightToLeft to find the last match.
Use of Match.Result for a replace.
$`$' as replacement string: http://www.regular-expressions.info/refreplace.html
\p{Lu} for upper-case letters; you can change that to [A-Z] if you're more comfortable with that.
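Here is a minimal sketch that wraps the regex above; LastUpperWord and Extract are hypothetical names. Match.Result cannot be used on a failed match, so the sketch checks Success first. When nothing matches, it returns the input unchanged with an empty word, which is an assumption.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class LastUpperWord
{
    // Separator between two upper-case words; RightToLeft returns the last such match.
    static readonly Regex LastSeparator = new Regex(
        @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b", RegexOptions.RightToLeft);

    // Text before and after the match ($` and $'), plus the word after the separator.
    public static (string Main, string Word) Extract(string input)
    {
        Match m = LastSeparator.Match(input);
        if (!m.Success)
            return (input, ""); // assumption: no match leaves the input unchanged
        return (m.Result("$`$'"), m.Groups[1].Value);
    }
}
```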
If the word shouldn't follow an upper case word, you can simplify the regex to:
@"[-/ ;(](\p{Lu}+)\b"
If you want other characters as well, you can use a character class (and maybe remove \b). For example:
@"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK
Use a List of strings and put all the words in it.
Find the index of the / in that list, then use ElementAt() to get the word to split off, which is "SHAM" in your question.
In the sentence below, the index of / will be 6 (with / stored as its own element):
string strSentence ="This TASK is assigned to ANIL/SHAM in our project";
// Put "/" into its own element: This, TASK, is, assigned, to, ANIL, /, SHAM, in, our, project
List<string> words = strSentence.Replace("/", " / ")
.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
int index = words.IndexOf("/"); // 6
string word = words.ElementAt(index + 1); // "SHAM"
This returns SHAM. Then remove it (together with the /) and rebuild the sentence without SHAM:
words.RemoveRange(index, 2);
string result = string.Join(" ", words); // "This TASK is assigned to ANIL in our project"
If you don't want to use a list of strings, I think you can use the spaces to determine the words in your sentence, but that would be a long way to go.
I think the idea is right, but the code may not be flawless.
You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use linq

Pattern matching problem in C#

I have a string like "AAA 101 B202 C 303 " and I want to get rid of the space between a letter and a number, if there is one.
So after the operation, the string should be "AAA101 B202 C303 ". But I am not sure whether a regex can do this.
Any help? Thanks in advance.
Yes, you can do this with regular expressions. Here's a short but complete example:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
string text = "A 101 B202 C 303 ";
string output = Regex.Replace(text, @"(\p{L}) (\d)", @"$1$2");
Console.WriteLine(output); // Prints A101 B202 C303
}
}
(If you're going to do this a lot, you may well want to compile a regular expression for the pattern.)
The \p{L} matches any unicode letter - you may want to be more restrictive.
You can do something like
([A-Z]+)\s?(\d+)
And replace with
$1$2
The expression can be tightened up, but the above should work for your example input string.
It declares a group containing letters (the first set of parentheses), then an optional space (\s?), and then a group of digits (\d+). The groups can be referred to by their index in the replacement, so to get rid of the space, just replace with $1$2.
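The same replacement can also be written with lookarounds, which check the neighbouring characters without consuming them, so no groups or $1$2 are needed. This is a swapped-in technique rather than part of either answer above; SpaceJoiner and JoinLetterDigit are hypothetical names. Like the first answer, it uses \p{L} for letters and removes exactly one space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class SpaceJoiner
{
    // Removes a single space only when a letter is on its left and a digit on its right.
    public static string JoinLetterDigit(string text) =>
        Regex.Replace(text, @"(?<=\p{L}) (?=\d)", "");
}
```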
While not as concise as Regex, the C# code for something like this is fairly straightforward and very fast-running:
StringBuilder sb = new StringBuilder();
for(int i=0; i<s.Length; i++)
{
// exclude spaces preceded by a letter and followed by a number
if(!(s[i] == ' '
&& i-1 >= 0 && char.IsLetter(s[i-1])
&& i+1 < s.Length && char.IsNumber(s[i+1])))
{
sb.Append(s[i]);
}
}
return sb.ToString();
Just for fun (because the act of programming is, or should be, fun sometimes) :o) here is a version using LINQ's Aggregate:
var result = text.Aggregate(
string.Empty,
(acc, c) => char.IsLetter(acc.LastOrDefault()) && Char.IsDigit(c) ?
acc + c.ToString() : acc + (char.IsWhiteSpace(c) && char.IsLetter(acc.LastOrDefault()) ?
string.Empty : c.ToString())).TrimEnd();
