C# Regexcollection between special characters


I am trying to use a regex to parse the values 903001, 343001, 343491 from the following input:
"contact_value":"903001" other random
"contact_value":"343001" random information
"contact_value":"343491" more random
I used the following in c# but it returns "contact_value":"903001"
MatchCollection numMatch = Regex.Matches(input, @"contact_value\"":\"".*"\""");
thanks in advance

The regex could be as simple as
@"\d+"

If you use @ with strings (e.g. @"string"), escape sequences are not processed. In those verbatim strings, you use "" instead of \" to represent a double quote. Try this regex:
var regex = @"contact_value"":""(\d+)""";

Try something like:
string input = "\"contact_value\":\"1234567890\"" ;
Regex rx = new Regex( @"^\s*""contact_value""\s*:\s*""(?<value>\d+)""\s*$" ) ;
Match m = rx.Match( input ) ;
if ( !m.Success )
{
Console.WriteLine("Invalid");
}
else
{
string value = m.Groups["value"].Value ;
int n = int.Parse(value) ;
Console.WriteLine( "The contact_value is {0}",n) ;
}
[And read up on how to use regular expressions]
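For the input in the question, where several entries are each followed by unrelated text, a sketch along these lines collects every value. It assumes each value consists only of digits; the helper name ExtractContactValues is made up for illustration, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every digit run that follows "contact_value":" in the input.
List<string> ExtractContactValues(string input)
{
    var values = new List<string>();
    // In a verbatim string, "" stands for one double quote.
    foreach (Match m in Regex.Matches(input, @"""contact_value"":""(\d+)"""))
    {
        values.Add(m.Groups[1].Value); // group 1 holds only the digits
    }
    return values;
}

string sample =
    "\"contact_value\":\"903001\" other random\n" +
    "\"contact_value\":\"343001\" random information\n" +
    "\"contact_value\":\"343491\" more random";

Console.WriteLine(string.Join(",", ExtractContactValues(sample))); // 903001,343001,343491
```

Because the pattern is not anchored with ^ and $, it finds the values wherever they occur in the text.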

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in C#?
I can't use String.Split, because there isn't really a delimiter.

You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the data between ISA and ESA contains the same pattern we are splitting on, you will have to find some way around it.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. This portion is not captured but still matched
(?=ISA) Look-ahead assertion. This portion is not captured but still matched
Using these look-around assertions you can find the correct | character for splitting

Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substring from 0 to that index, and from that index to the end.
Edit:
If ? is not known, we can use the regex Pattern and Matcher:
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
if (matcher.find()) {
int x = matcher.start();
}
Here x gives the start index of the match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each match stops at the first ESA
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}

RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
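If the number of ISA...ESA segments is not known in advance, Regex.Matches can return one match per segment. This is a minimal sketch, assuming that "ISA" and "ESA" never occur inside the data itself and that each segment ends with exactly one terminator character; SplitSegments is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one entry per "ISA ... ESA" segment plus its one-character terminator.
List<string> SplitSegments(string input)
{
    var segments = new List<string>();
    // .*? is lazy, so each match stops at the first ESA; the final . takes the terminator.
    // RegexOptions.Singleline lets . match a newline too, in case the data contains one.
    foreach (Match m in Regex.Matches(input, @"ISA.*?ESA.", RegexOptions.Singleline))
    {
        segments.Add(m.Value);
    }
    return segments;
}

foreach (string segment in SplitSegments("ISA~this is date~ESA|ISA~this is more data~ESA|"))
{
    Console.WriteLine(segment);
}
```

For the sample input this prints the two segments on separate lines, each still carrying its trailing | character.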

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"

Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
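The difference is easiest to see on a concrete input. A small sketch using the sample string from the question (top-level statements, C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// String.Split with RemoveEmptyEntries drops the empty string before the first "ISA".
string[] viaSplit = input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

// Regex.Split keeps it, so the first element is "".
string[] viaRegex = Regex.Split(input, "ISA");

Console.WriteLine(viaSplit.Length); // 2
Console.WriteLine(viaRegex.Length); // 3
```

In both cases the "ISA" delimiter itself is removed from the resulting parts, so it has to be added back if the original segments are needed.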

How can I cut out the below pattern from a string using Regex?

I have a string that contains the word "TAG" followed by an integer, an underscore and another word.
E.g.: "TAG123_Sample"
I need to cut the "TAGXXX_" pattern and get only the word Sample. That is, I have to remove the word "TAG", the integer that follows it, and the underscore.
I wrote the following code but it doesn't work. What have I done wrong? How can I do this? Please advise.
static void Main(string[] args)
{
String sentence = "TAG123_Sample";
String pattern=@"TAG[^\d]_";
String replacement = "";
Regex r = new Regex(pattern);
String res = r.Replace(sentence,replacement);
Console.WriteLine(res);
Console.ReadLine();
}

You're currently negating (matching NOT a digit), you need to modify the regex as follows:
String s = "TAG123_Sample";
String r = Regex.Replace(s, @"TAG\d+_", "");
Console.WriteLine(r); //=> "Sample"
Explanation:
TAG match 'TAG'
\d+ digits (0-9) (1 or more times)
_ '_'

You can use String.Split for this:
string[] s = "TAG123_Sample".Split('_');
Console.WriteLine(s[1]);
https://msdn.microsoft.com/en-us/library/b873y76a.aspx

Try this; it will work in this case:
resultString = Regex.Replace(sentence ,
@"^ # Match start of string
[^_]* # Match 0 or more characters except underscore
_ # Match the underscore", "", RegexOptions.IgnorePatternWhitespace);

No regex is necessary if your string contains 1 underscore and you need to get a substring after it.
Here is a Substring+IndexOf-based approach:
var res = sentence.Substring(sentence.IndexOf('_') + 1); // => Sample
See IDEONE demo

C# filter String with Regex

I'm not familiar with regex, but I think it could help me a lot with my problem.
I have two kinds of strings in a big List<string> str (with or without a description):
str[0] = "[toto]";
str[1] = "[toto] descriptionToto";
str[2] = "[titi]";
str[3] = "[titi] descriptionTiti";
str[4] = "[tata]";
str[5] = "[tata] descriptionTata";
The list isn't really ordered.
I would like to go through the whole list and reformat each entry depending on what it contains.
If I find "[toto]", I would like to set str[0]="toto",
and if I find "[toto] descriptionToto", I would like to set str[1]="descriptionToto".
Do you have any ideas on the best way to get this result, please?

There are two regex options if you ask me:
Make a regex pattern with two capturing groups, then use group 1 or group 2 depending on whether group 1 is empty. In this case you'd use named capturing groups to get a clear relationship between the pattern and the code
Make a regex that matches string type 1 or string type 2, in which case you would get your end result directly from regex
If you're going for speed, using str[0].IndexOf(']') would get most of the job done.
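The first option could be sketched as follows. This is only an illustration, assuming every entry starts with a bracketed name; the group names name and description and the helper NameOrDescription are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the description if there is one, otherwise the bracketed name.
string NameOrDescription(string item)
{
    Match m = Regex.Match(item, @"^\[(?<name>[^\]]*)\]\s*(?<description>.*)$");
    if (!m.Success)
    {
        return item; // not in the expected format: leave it unchanged
    }
    string description = m.Groups["description"].Value;
    return description.Length > 0 ? description : m.Groups["name"].Value;
}

Console.WriteLine(NameOrDescription("[toto]"));                 // toto
Console.WriteLine(NameOrDescription("[toto] descriptionToto")); // descriptionToto
```

The named groups keep the link between the pattern and the code that reads the result easy to follow.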

Rather than regex, I'd be inclined to just use String.Split, something along the lines of:
string[] tokens = str[0].Split(new char[] { '[', ']' });
// tokens[1] is the text inside the brackets; tokens[2] is whatever follows the ]
if (tokens[2] == "") {
str[0] = tokens[1];
} else {
str[0] = tokens[2].Trim();
}

You can use single regex:
string s = Regex.Match(str[0], @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value;
The idea is simple: if the text ends with ] and contains no other ], take everything between [ and ]; otherwise take everything after the first ].
Sample code:
List<string> strList = new List<string> {
"[toto]",
"[toto] descriptionToto",
"[titi]",
"[titi] descriptionTiti",
"[tata]",
"[tata] descriptionTata" };
foreach(string str in strList)
Console.WriteLine(Regex.Match(str, @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value);
Sample output:
toto
descriptionToto
titi
descriptionTiti
tata
descriptionTata

If you only want the description for the entries that have one, you can split on a space character (" ") and keep the second element of the resulting array, which is the description.
If there is no description, the string contains no space.
So loop over the list and call Split(' ') on each entry; an entry with a description splits into two elements:
for (int i = 0; i < str.Count; i++)
{
string[] words = str[i].Split(' ');
if (words.Length > 1)
{
str[i] = words[1];
}
}

If those are the actual string contents and not literal variable notation, this should work.
The replacement just concatenates capture groups 1 and 2.
Find: ^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$
Replace: "$1$2"
^
\s*
(?:
\[
( [^\[\]]* ) # (1)
\] \s*
|
\[ [^\[\]]* \]
\s*
( # (2 start)
(?: \s* \S )+
\s*
) # (2 end)
)
$
Dot-Net test case
string str1 = "[titi]";
Console.WriteLine( Regex.Replace(str1, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));
string str2 = "[titi] descriptionTiti";
Console.WriteLine( Regex.Replace(str2, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));
Output >>
titi
descriptionTiti

Parsing measurement units

I have the following string:
string value = "123.456L";
What is the best way to parse this string into a string and a double:
double number = 123.456;
string measure = "L"
Instead of the L, we could also have something else, like oz, m/s, liter, kilograms, etc

Assuming that the units of measure are always expressed as a single character at the back of the string, you can do this:
string value = "123.456L";
var pos = value.LastIndexOfAny("0123456789".ToCharArray());
double number = double.Parse(value.Substring(0, pos+1));
string measure = value.Substring(pos+1);

Based on the comment explaining the input, I'd use Regex.
double number = double.Parse(Regex.Match(value, @"[\d.]+").Value);
string measure = value.Replace(number.ToString(), "");
The regex [\d.] matches any digit or a ., and the + means one or more of them.
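Since the unit may be longer than one character (oz, m/s, liter), a sketch along these lines captures the number and the unit in separate groups. It assumes the number always uses . as the decimal separator, so it parses with CultureInfo.InvariantCulture; TryParseMeasure and the group names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "123.456L" into 123.456 and "L".
bool TryParseMeasure(string text, out double number, out string measure)
{
    number = 0;
    measure = null;
    // Digits with an optional fractional part, optional whitespace, then the rest as the unit.
    Match m = Regex.Match(text, @"^\s*(?<number>\d+(?:\.\d+)?)\s*(?<measure>.*?)\s*$");
    if (!m.Success)
    {
        return false;
    }
    measure = m.Groups["measure"].Value;
    // InvariantCulture makes "." the decimal separator regardless of the machine's locale.
    return double.TryParse(m.Groups["number"].Value, NumberStyles.Float,
                           CultureInfo.InvariantCulture, out number);
}

if (TryParseMeasure("123.456L", out double n, out string unit))
{
    Console.WriteLine(n.ToString(CultureInfo.InvariantCulture) + " " + unit); // 123.456 L
}
```

Reading the unit from its own group avoids the round trip through number.ToString(), which can format the number differently from the input under some cultures.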

I'd do it like this:
public bool TryParseUnit ( string sValue, out double fValue, out string sUnit )
{
fValue = 0;
sUnit = null;
if ( !String.IsNullOrEmpty ( sValue ) )
{
sUnit = GetUnit ( sValue );
if ( sUnit != null )
{
return ( Double.TryParse ( sValue.Substring ( 0, sValue.Length - sUnit.Length ),
out fValue ) );
}
}
return ( false );
}
private string GetUnit ( string sValue )
{
string sSuffix = sValue.Substring ( sValue.Length - 1 );
switch ( sSuffix.ToLower () )
{
case "l":
return ( "L" );
}
return ( null );
}
I know it's more complicated than the other answers but this way you can also validate the data during parsing and discard invalid input.

You could do it with a regex
using System.Text.RegularExpressions;
Regex reg = new Regex(@"([\d.]*)(\w*)");
string value = "123.4L";
MatchCollection matches = reg.Matches(value);
foreach (Match match in matches)
{
if (match.Success)
{
GroupCollection groups = match.Groups;
Console.WriteLine(groups[1].Value); // will be 123.4
Console.WriteLine(groups[2].Value); // will be L
}
}
So what this will do is look for a 0 or more digits or "." and then group them and then look for any character (0 or more). You can then get the groups from each match and get the value. This will work if you want to change the type of measurement and will work if you don't have a decimal point either.
Edit: It is important to note that you must use groups[1] for the first group and groups[2] for the second group. groups[0] holds the entire match.

You might want to take a look at Units.NET on GitHub and NuGet. It supports parsing abbreviations in different cultures, but it is still on my TODO list to add support for parsing combinations of numbers and units. I have already done this on a related project, so it should be straight-forward to add.
Update Apr 2015: You can now parse units and values by Length.Parse("5.3 m"); and similar for other units.

Simply put: take all the leading characters that are 0..9 or . into a new string, then put the remaining part in a second string. In a minute I can give code.
Edit: Yes, I meant digits 0-9, corrected it. But easier is to get index of last number and ignore stuff before for the trimming.
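The answer never followed up with code, so here is one way the described approach might look. It is a sketch only, assuming the number comes first and uses . as the decimal separator; the variable names are illustrative.

```csharp
using System;
using System.Globalization;
using System.Linq;

string value = "123.456L";

// Take the leading run of digits and dots; whatever follows is the unit.
string numberPart = new string(value.TakeWhile(c => (c >= '0' && c <= '9') || c == '.').ToArray());
string measure = value.Substring(numberPart.Length).Trim();

// Assumes "." is the decimal separator, hence InvariantCulture.
double number = double.Parse(numberPart, CultureInfo.InvariantCulture);

Console.WriteLine(number.ToString(CultureInfo.InvariantCulture)); // 123.456
Console.WriteLine(measure);                                       // L
```

double.Parse throws if the input does not start with a number, so double.TryParse is the safer choice for untrusted input.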

You can try this:
string ma = Regex.Match(name, @"((\d\s)|(\d+\s)|(\d+)|(\d+\.\d+\s))(g\s|kg\s|ml\s)").Value;
this will match:
40 g , 40g , 12.5 g , 1 kg , 2kg , 150 ml ....

Regular Expression to match numbers inside parenthesis inside square brackets with optional text

Firstly, I'm in C# here, so that's the flavor of RegEx I'm dealing with. And here are the things I need to be able to match:
[(1)]
or
[(34) Some Text - Some Other Text]
So basically I need to know if what is between the parentheses is numeric and ignore everything between the close parenthesis and close square bracket. Any RegEx gurus care to help?

This should work:
\[\(\d+\).*?\]
And if you need to catch the number, simply wrap \d+ in parentheses:
\[\((\d+)\).*?\]
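In C#, the capturing version could be used roughly as follows. This is a sketch; TryGetBracketNumber is a hypothetical helper name, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true and the number when the text looks like "[(34) anything]".
bool TryGetBracketNumber(string text, out int number)
{
    number = 0;
    Match m = Regex.Match(text, @"\[\((\d+)\).*?\]");
    // int.TryParse also guards against digit runs too long for an int.
    return m.Success && int.TryParse(m.Groups[1].Value, out number);
}

Console.WriteLine(TryGetBracketNumber("[(34) Some Text - Some Other Text]", out int first)); // True
Console.WriteLine(first);                                                                     // 34
Console.WriteLine(TryGetBracketNumber("[(abc)]", out _));                                     // False
```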

Do you have to match the []? Can you do just ...
\((\d+)\)
(The numbers themselves will be in the groups).
For example ...
var mg = Regex.Match( "[(34) Some Text - Some Other Text]", @"\((\d+)\)");
if (mg.Success)
{
var num = mg.Groups[1].Value; // num == 34
}
else
{
// No match
}

Regex seems like overkill in this situation. Here is the solution I ended up using.
var src = test.IndexOf('(') + 1;
var dst = test.IndexOf(')');
var result = test.Substring(src, dst - src);

Something like:
\[\(\d+\)[^\]]*\]
Possibly with some more escaping required?

How about "^\[\((\d+)\)" (Perl style; I'm not familiar with C#). You can safely ignore the rest of the line, I think.

Depending on what you're trying to accomplish...
List<Boolean> rslt;
String searchIn;
Regex regxObj;
MatchCollection mtchObj;
Int32 mtchGrp;
searchIn = @"[(34) Some Text - Some Other Text] [(1)]";
regxObj = new Regex(@"\[\(([^\)]+)\)[^\]]*\]");
mtchObj = regxObj.Matches(searchIn);
if (mtchObj.Count > 0)
rslt = new List<bool>(mtchObj.Count);
else
rslt = new List<bool>();
foreach (Match crntMtch in mtchObj)
{
if (Int32.TryParse(crntMtch.Groups[1].Value, out mtchGrp))
{
rslt.Add(true);
}
}

How's this? Assuming you only need to determine if the string is a match, and need not extract the numeric value...
string test = "[(34) Some Text - Some Other Text]";
Regex regex = new Regex( "\\[\\(\\d+\\).*\\]" );
Match match = regex.Match( test );
Console.WriteLine( "{0}\t{1}", test, match.Success );
