regex expression last match C# - c#

I have a string like this:
test- qweqw (Barcelona - Bayer) - testestsetset
I need to capture the word Bayer. I tried this regex (between "-" and ")"):
(?<=-)(.*)(?=\))
Example: https://regex101.com/r/wI9zD0/2
As you can see, it doesn't match quite correctly. What should I fix?

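Your pattern starts matching right after the first "-" in the string (the one after "test"), and the greedy .* then runs all the way to the ")", so the match is " qweqw (Barcelona - Bayer". Here's a different regex that does what you are looking for; the word ends up in capture group 1: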
-\s([^()]+)\)
https://regex101.com/r/wI9zD0/3
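A minimal sketch of using this pattern from C#. The word you want is in capture group 1, not in the overall match (which also contains the "- " and the ")"). The variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "test- qweqw (Barcelona - Bayer) - testestsetset";

// "-" and one whitespace character, then one or more characters that are not
// parentheses (captured in group 1), then a literal ")".
Match m = Regex.Match(input, @"-\s([^()]+)\)");

// Groups[0] is the whole match ("- Bayer)"); Groups[1] is just the word.
string team = m.Success ? m.Groups[1].Value : string.Empty;
Console.WriteLine(team);
```

The first "-" (after "test") cannot start a match here, because [^()]+ stops at the "(" before any ")" is reached, so the engine moves on to the "-" inside the parentheses.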

You don't need regex for that, you can use LINQ:
string input = "test - qweqw(Barcelona - Bayer) - testestsetset";
string res = String.Join("", input.SkipWhile(c => c != '(')
.SkipWhile(c => c != '-').Skip(1)
.TakeWhile(c => c != ')'))
.Trim();
Console.WriteLine(res); // Bayer

Related

Replace text in camelCase inside the string

I have a string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change the text inside { } to camelCase:
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that, and what is the best way to do it? Please help.
I have no experience with regex. This is what I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while (i < str1.Length)
{
    if (str1[i] == '{')
    {
        str3 += "{" + char.ToLower(str1[i + 1]);
        i = i + 2;
    }
    else
    {
        str3 += str1[i];
        i++;
    }
}
You can do it with regex, of course, but you can also do it with LINQ like this:
var result = String.Join("/{",
str1.Split(new string[1] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => !k.StartsWith("/") ? Char.ToLowerInvariant(k[0]) + k.Substring(1) : k));
What is done here: the string is split into 3 parts:
"/api/agencies"
"AgencyGuid}/contacts"
"ContactGuid}"
After that, each element is mapped as follows: if it starts with "/", it is the first element and is returned unchanged. Otherwise, take the first char (k[0]), change it to lowercase (Char.ToLowerInvariant()) and concatenate it with the rest.
At the end, join those three strings (one unchanged and two changed) back together with "/{".
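A self-contained sketch of this Split/Select/Join approach for the question's path, printing the intermediate parts before they are joined again:

```csharp
using System;
using System.Linq;

string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";

// Splitting on "/{" yields "/api/agencies", "AgencyGuid}/contacts" and "ContactGuid}".
string[] parts = str1.Split(new[] { "/{" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string part in parts)
    Console.WriteLine(part);

// Keep the part that starts with "/"; lowercase the first character of the others.
var changed = parts.Select(k => k.StartsWith("/")
    ? k
    : char.ToLowerInvariant(k[0]) + k.Substring(1));

// Put back the "/{" separators that Split removed.
string result = string.Join("/{", changed);
Console.WriteLine(result);
```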
With Regex you can do it as:
var regex = new Regex(@"\/{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
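In the regex we search for the pattern "/{\w", meaning "/{" followed by one word character (\w). That character is captured in a group (because of the surrounding parentheses). The lambda passed to Replace then replaces each whole match with m.ToString().ToLower(); since "/{" is not affected by ToLower(), only the letter changes.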
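The Replace call above lowercases the whole match, which works because "/{" contains no letters. A sketch of a variant that uses the captured group directly, so only the character after "/{" is touched:

```csharp
using System;
using System.Text.RegularExpressions;

string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";

// "/{" followed by one word character, which is captured in group 1.
var regex = new Regex(@"/\{(\w)");

// Rebuild each match from the literal "/{" plus the lowercased captured character.
string result = regex.Replace(str1, m => "/{" + m.Groups[1].Value.ToLowerInvariant());
Console.WriteLine(result);
```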
I probably wouldn't use regex, but since you asked:
Regex.Replace(
    "/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
    @"\{[^\}]+\}",
    m =>
        $"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
    RegexOptions.ExplicitCapture
)
This uses string interpolation (C# 6), but you can do the same thing by concatenating.
Explanation:
\{[^\}]+\} - match an opening brace, then one or more characters that are not a closing brace, then the closing brace
m => ... - a lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - return a new string made of an opening brace, the first letter lowercased, the rest of the name, and a closing brace (inside an interpolated string, {{ and }} produce literal braces).
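The same Replace call written with plain string concatenation instead of interpolation, as a self-contained sketch (the lambda body needs no C# 6 features; the surrounding top-level program does need a recent compiler):

```csharp
using System;
using System.Text.RegularExpressions;

string path = "/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}";

string result = Regex.Replace(
    path,
    @"\{[^\}]+\}",   // "{", then one or more non-"}" characters, then "}"
    m => "{"
        + m.Value[1].ToString().ToLower()           // first letter inside the braces, lowercased
        + m.Value.Substring(2, m.Value.Length - 3)  // rest of the name, without the closing brace
        + "}",
    RegexOptions.ExplicitCapture);

Console.WriteLine(result);
```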

Substring search in string using linq

I am looking in a string for operators. I need the actual operator and its index in the string.
For example: x>10&y>=10
Operators
>
&
>=
=
So I need results like
> 1
& 4
>= 6
So I wrote code like this:
string str = "x>10&y>=10";
List<string> substringList = new List<string>{">", "&", ">=", "="};
var orderedOccurances = substringList
.Where((substr) => str.IndexOf(substr, StringComparison.Ordinal) >= 0)
.Select((substr, inx) => new
{ substr, inx = str.IndexOf(substr, StringComparison.Ordinal) })
.OrderBy(x => x.inx).ToList();
However, I got results like this (obviously):
> 1
& 4
> 6
= 7
I could use a for loop for the search and handle this case, but I like the LINQ shorthand. Is there any way I can handle this case using lambdas/LINQ?
Here is a more general alternative:
string str = "x>10&y>=10";
// ">=" is listed before ">" so that the longer operator wins at the same position
var result = Regex.Matches(str, @">=|>|&|=").Cast<Match>()
    .Select(m => new { s = m.Value, i = m.Index }).ToList();
Result:
> 1
& 4
>= 6
Or, a bit shorter, if the string contains no other non-word characters (\W+ treats every run of them as one operator):
var d = Regex.Matches(str, @"\W+").Cast<Match>().ToDictionary(m => m.Index, m => m.Value);
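A runnable version of the shorter variant for the sample input. Because \W+ turns every run of non-word characters into one entry, it would also pick up spaces or other punctuation if the string contained any:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string str = "x>10&y>=10";

// Each run of non-word characters becomes one entry: index -> operator text.
var d = Regex.Matches(str, @"\W+").Cast<Match>().ToDictionary(m => m.Index, m => m.Value);

foreach (var pair in d)
    Console.WriteLine($"{pair.Value} {pair.Key}");
```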
So basically what you want is to scan your sequence for the characters '<', '>', '=' and '&' and, if any of them is found, remember the index and the character. If '<' or '>' is found, you want to know whether '=' follows it, and if so, the next search should start after the '='.
Note that you didn't specify what you want with &= or ==.
Whenever you have to scan strings for some syntax, it is always wise to at least consider the use of regular expressions.
According to the specification above you want a regular expression that matches if you find any of the following:
'<='
'>='
'='
'&'
'<' followed by something other than '='
'>' followed by something other than '='
The code is simple:
using System.Text.RegularExpressions;
string expression = ...;
var regex = new Regex("&|<=|>=|[<>](?!=)");
var matches = regex.Matches(expression);
The result is a MatchCollection of Match objects. Every Match has the properties Index, Length and Value, exactly the properties you want.
foreach (Match match in matches)
{
    Console.WriteLine($"Match {match.Value} found"
        + $" at index {match.Index} with length {match.Length}");
}
The vertical bar | in the regular expression means OR; [ ] means any one of the characters between the brackets; (?!=) is a negative lookahead that checks that the next character is not =, without including that character in the match.
So a match is found for &, <=, >=, or a < or > that is not followed by =.
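A runnable sketch of this approach on the question's input. It uses a negative lookahead (?!=) for the lone < or > case, so the following character is checked but not included in the match (a class such as [^=] would consume it, reporting ">1" instead of ">" for x>10). The loop variable is declared as Match, which also compiles on older .NET versions where MatchCollection only implements the non-generic IEnumerable:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string expression = "x>10&y>=10";

// "&", "<=", ">=", or a single "<" or ">" that is NOT followed by "=".
// (?!=) only looks at the next character; it does not become part of the match.
var regex = new Regex("&|<=|>=|[<>](?!=)");
var found = new List<string>();

foreach (Match match in regex.Matches(expression))
{
    found.Add($"{match.Value} {match.Index}");
    Console.WriteLine($"Match {match.Value} found at index {match.Index} with length {match.Length}");
}
```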
If you also want to find &= and ==, then your regular expression becomes even easier:
find any of <>&= that is followed by =
or find any of <>&= that is not followed by =
Code:
var regex = new Regex("[<>&=]=?");
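A sketch of the "&= and ==" variant, expressed as one of <>&= optionally followed by =, i.e. the pattern [<>&=]=?, run on a made-up input that contains both == and &=:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input containing ==, &, &=, < and >=.
string expression = "a==1&b&=2<c>=3";

// One of < > & =, optionally followed by a single "=".
var ops = Regex.Matches(expression, "[<>&=]=?")
    .Cast<Match>()
    .Select(m => (m.Value, m.Index))
    .ToList();

foreach (var (value, index) in ops)
    Console.WriteLine($"{value} {index}");
```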
A good online regex tester where you can check your regular expression can be found here. This shows also which matches are found and a description of the syntax of regular expressions.
Well, if you are bent on using LINQ, you could do the following:
public static IEnumerable<(int Index, string Substring)> GetAllIndicees(this string str, IEnumerable<string> subtrings)
{
    IEnumerable<(int Index, string Substring)> GetAllIndicees(string substring)
    {
        if (substring.Length > str.Length)
            return Enumerable.Empty<(int, string)>();
        // Try every possible start position (this also covers substring.Length == str.Length).
        return from start in Enumerable.Range(0, str.Length - substring.Length + 1)
               where str.Substring(start, substring.Length).Equals(substring)
               select (start, substring);
    }
    var alloperators = subtrings.SelectMany(s => GetAllIndicees(s));
    // Drop an occurrence that lies inside another occurrence whose text contains it (e.g. the ">" of ">=").
    return alloperators.Where(o => !alloperators.Except(new[] { o })
        .Any(other => o.Index >= other.Index &&
                      o.Index < other.Index + other.Substring.Length &&
                      other.Substring.Contains(o.Substring)));
}
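I'm using C# 7 syntax (tuples and local functions) here because the code is more concise and readable, but it's easily translatable to previous versions. As an extension method, GetAllIndicees has to be declared inside a static class, with using directives for System.Linq and System.Collections.Generic.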
And now if you do:
var substr = "x>10&y>=10";
var operators = new HashSet<string>(new[] { ">", "&", ">=", "=" });
var filteredOperators = substr.GetAllIndicees(operators).OrderBy(o => o.Index);
Console.WriteLine(string.Join(", ", filteredOperators.Select(o => $"[{o.Substring}: {o.Index}]")));
You'll get the expected result:
[>: 1], [&: 4], [>=: 6]
Is this "better" than using other tools? I'm not so sure.

Using Regex, how to find repeating patterns between 2 characters?

How can I use regex to find anything between 2 ASCII codes?
The ASCII codes are STX (\u0002) and ETX (\u0003).
Example string (STX and ETX stand for the control characters): "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX"
Using regex on the above, my matches should be:
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
I did a bit of googling and tried the following pattern, but it didn't find anything:
@"^\u0002.*\u0003$"
UPDATE: Thank you all, some great answers below and all seem to work!
You could use Regex.Split.
var input = (char)2 + ",T1," + (char)3 + (char)2 + ",1,1,1,1,1,1," + (char)3 + (char)2 + ",A,1,0,B,ERR" + (char)3;
var result = Regex.Split(input, "\u0002|\u0003").Where(r => !String.IsNullOrEmpty(r));
You may use a non-regex solution, too (based on Wyatt's answer):
var result = input.Split(new[] {'\u0002', '\u0003'}) // split with the known char delimiters
.Where(p => !string.IsNullOrEmpty(p)) // Only take non-empty ones
.ToList();
A Regex solution I suggested in comments:
var res = Regex.Matches(input, "(?s)\u0002(.*?)\u0003")
.OfType<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
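A self-contained sketch of the lazy-group approach above, building the sample input from the actual control characters (STX = \u0002, ETX = \u0003) rather than the literal text "STX"/"ETX":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const char STX = '\u0002';
const char ETX = '\u0003';
string input = STX + ",T1," + ETX + STX + ",1,1,1,1,1,1," + ETX + STX + ",A,1,0,B,ERR" + ETX;

// (?s) lets "." match newlines too; ".*?" is lazy, so each match stops at the
// first ETX instead of running on to the last one in the string.
var res = Regex.Matches(input, "(?s)\u0002(.*?)\u0003")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

res.ForEach(Console.WriteLine);
```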
var s = "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX";
s = s.Replace("STX", "\u0002");
s = s.Replace("ETX", "\u0003");
var result1 = Regex.Split(s, @"[\u0002\u0003]").Where(a => a != String.Empty).ToList();
result1.ForEach(a=>Console.WriteLine(a));
Console.WriteLine("------------ OR WITHOUT REGEX ---------------");
var result2 = s.Split(new char[] { '\u0002','\u0003' }, StringSplitOptions.RemoveEmptyEntries).ToList();
result2.ForEach(a => Console.WriteLine(a));
output:
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
------------ OR WITHOUT REGEX ---------------
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR

Split the string with different conditions using Linq in C#

I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.
Some examples:
"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""
How can I achieve that?
An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
Both versions are WITHOUT checks (for example, LastIndexOfAny returns -1 when no delimiter is found, and Substring(0, -1) then throws). The OP can add checks if needed.
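A sketch of the "intelligent" version with the missing check added. SplitAtLastDelimiter is a hypothetical helper name; when no delimiter is found it returns an empty first part and the whole string as the word, which happens to match the "ATHEX" example. It does not implement the upper-case rules from the later requirements:

```csharp
using System;

// Hypothetical helper: split at the last delimiter, or return ("", input) if there is none.
static (string Rest, string Word) SplitAtLastDelimiter(string input)
{
    int ix = input.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    if (ix < 0)
        return (string.Empty, input);   // no delimiter at all
    return (input.Substring(0, ix), input.Substring(ix + 1));
}

var (rest, word) = SplitAtLastDelimiter("this is test A/ABC");
Console.WriteLine($"\"{rest}\" and \"{word}\"");
```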
For the second question, using LINQ is REALLY too difficult. With a regex it's "easily doable".
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question:
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex Regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex Regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = Regex.Replace(str, "$1$4");
var str2 = Regex.Replace(str, "$3");
The difference between the two is that the first one treats only A-Z as upper-case characters, while the second one also accepts other upper-case letters, for example ÀÈÉÌÒÙ.
With code sample: http://ideone.com/FqcmY
This should work with the new requirements: it finds the last separator that sits between two upper-case words:
Match lastSeparator = Regex.Match(strExample,
@"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
Use RegexOptions.RightToLeft to find the last match.
Use of Match.Result for a replace.
$`$' as replacement string: http://www.regular-expressions.info/refreplace.html
\p{Lu} for upper-case letters; you can change that to [A-Z] if you're more comfortable with that.
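A self-contained sketch of the regex above, run against two of the question's examples. SplitLastUpperWord is a hypothetical helper name; it returns the input unchanged and an empty word when there is no match, a check the answer leaves out:

```csharp
using System;
using System.Text.RegularExpressions;

static (string Main, string Word) SplitLastUpperWord(string strExample)
{
    // An upper-case word, a separator, then another upper-case word.
    // RightToLeft makes the engine report the LAST such separator.
    Match lastSeparator = Regex.Match(strExample,
        @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
        RegexOptions.RightToLeft);

    if (!lastSeparator.Success)
        return (strExample, string.Empty);

    // $` is the text before the match and $' the text after it.
    string before = lastSeparator.Result("$`$'");
    string extracted = lastSeparator.Groups[1].Value;
    return (before, extracted);
}

var (main, word) = SplitLastUpperWord("This TASK is assigned to ANIL/SHAM in our project");
Console.WriteLine($"\"{main}\" and \"{word}\"");
```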
If the word doesn't need to follow an upper-case word, you can simplify the regex to:
@"[-/ ;(](\p{Lu}+)\b"
If you want to allow other characters as well, you can use a character class (and maybe remove \b). For example:
@"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK
Use a List of strings and put all the words into it.
Find the index of the / and then use ElementAt() to get the word to split off, which is "SHAM" in your question.
In the sentence below, the index of / will be 6 (assuming the / is stored as its own element in the list):
string strSentence ="This TASK is assigned to ANIL/SHAM in our project";
then use ElementAt(6) at the end of
index is the index of the / in your List<string> (here called str):
string word = str.ElementAt(index + 1);
This returns SHAM.
str.RemoveAt(index + 1);
This deletes SHAM from the list; then just print the sentence without SHAM.
If you don't want to use a list of strings, I think you can use " " to determine the words in your sentence, but that would be a long way to go.
I think my idea is right, but the code may not be flawless.
You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.

C# 3.0 Remove chars from string

I have a string and want to:
remove all characters except English letters (a..z)
replace all whitespace sequences with a single space
How would you do that with C# 3.0?
Regex (edited)?
string s = "lsg #~A\tSd 2£R3 ad"; // note tab
s = Regex.Replace(s, @"\s+", " ");
s = Regex.Replace(s, @"[^a-zA-Z ]", ""); // "lsg A Sd R ad"
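A runnable version of the two Replace calls with the result checked. The order matters: whitespace runs are collapsed first (turning the tab into a space), and only then is everything that is not an ASCII letter or a space removed; the other way round, the tab would simply be deleted and "A" and "Sd" would run together.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "lsg #~A\tSd 2£R3 ad"; // note the tab between "A" and "Sd"

// 1. Collapse every run of whitespace (spaces, tabs, newlines) into one space.
s = Regex.Replace(s, @"\s+", " ");

// 2. Drop everything that is not an ASCII letter or a space.
s = Regex.Replace(s, @"[^a-zA-Z ]", "");

Console.WriteLine(s);
```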
Of course the regex solution is the best one (I think).
But someone HAS to do it in LINQ, so I had some fun. There you go:
bool inWhiteSpace = false;
string test = "lsg #~A\tSd 2£R3 ad";
var chars = test.Where(c => ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || char.IsWhiteSpace(c))
.Select(c => {
c = char.IsWhiteSpace(c) ? inWhiteSpace ? char.MinValue : ' ' : c;
inWhiteSpace = c == ' ' || c == char.MinValue;
return c;
})
.Where(c => c != char.MinValue);
string result = new string(chars.ToArray());
Using regular expressions of course!
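string myCleanString = Regex.Replace(stringToCleanUp, @"[\W]", "");        // removes every non-word character (keeps letters, digits and _)
string myCleanString = Regex.Replace(stringToCleanUp, @"[^a-zA-Z0-9]", ""); // or: keep only ASCII letters and digits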
I think you can do this with regular expressions, as Marc and boekwurm mentioned.
Also try these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
Note: [a-z] is a range of characters. It matches any character in the specified range. For example, “[a-z]” matches any lowercase alphabetic character in the range “a” through “z”.
Regular expressions also provide special characters to represent common character ranges. You could use “[0-9]” to match any numeric digit, or you can use “\d”. Similarly, “\D” matches any character that is not a digit. Use “\s” to match any white-space character, and use “\S” to match any non-white-space character.
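A small sketch of the character classes described in this note, applied to a made-up sample string by removing whatever does not belong to each class:

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "ab 12\tcd"; // made-up sample: letters, digits, a space and a tab

string lower  = Regex.Replace(sample, "[^a-z]", "");  // keep only a..z        -> "abcd"
string digits = Regex.Replace(sample, "[^0-9]", "");  // keep only 0..9        -> "12"
string viaD   = Regex.Replace(sample, @"\D", "");     // \D = not a digit      -> "12"
string noWs   = Regex.Replace(sample, @"\s", "");     // \s = white space      -> "ab12cd"
string onlyWs = Regex.Replace(sample, @"\S", "");     // \S = not white space  -> " \t"

// In .NET, \d (and therefore \D) covers all Unicode decimal digits, not only 0-9,
// unless RegexOptions.ECMAScript is used; for this ASCII sample both give "12".
Console.WriteLine($"{lower} {digits} {viaD} {noWs}");
```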
