Simple regex matching issue, what's my mistake?

Simple regex matching issue, what's my mistake? - c#

I have a string:
1/45 files checked
I want to parse the numbers (1 and 45) out of it, but first, to check if a string matches this pattern at all. So I write a regex:
String line = "1/45 files checked";
Match filesProgressMatch = Regex.Match(line, #"[0-9]+/[0-9]+ files checked");
if (filesProgressMatch.Success)
{
String matched = filesProgressMatch.Groups[1].Value.Replace(" files checked", "");
string[] numbers = matched.Split('/');
filesChecked = Convert.ToInt32(numbers[0]);
totalFiles = Convert.ToInt32(numbers[1]);
}
I expected matched to contain "1/45", but it is, in fact, empty. What's my mistake?
My first thought was '/' is a special character in a regex, but that doesn't seem to be the case.
P. S. Is there a better way to parse these values from such string in C#?

Your regex is matching, but you are selecting Groups[1] where the count of groups is one. So use
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");
And you should be fine

Try this regex:
You need to escape the forward slash
([0-9]+\/[0-9]+) files checked
Demo

Use capture group:
Regex.Match(line, #"([0-9]+/[0-9]+) files checked");
# here __^ and __^
You could also use 2 groups:
Regex.Match(line, #"([0-9]+)/([0-9]+) files checked");

Applying the replace operation to the first element of filesProgressMath.Groups seems to work.
String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");

This should give you your results
string txtText = #"1\45 files matched";
int[] s = System.Text.RegularExpressions.Regex.Split(txtText, "[^\\d+]").Where(x => !string.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray();

Related

Regex - extract rest of string after specific sequence

I have a long string with random letters, numbers, and spaces.
I need a regex expression to pull out the part of the string after the sequence of characters and numbers --> AQ102.
For example :
string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";
desired output:
Lab101 section2

Why not use
string s = t.Split("AQ102").Last();

Or, a regular expression as originally asked for:
Regex regEx = new Regex(#".*(AQ102.*)");
OR
Regex regEx = new Regex(#".*(AQ102)(.*)");
And you can get the matches doing the following:
Matches matches = regEx.Matches(t);
And you can get the match by referencing the first index:
matches[1]
OR, if you're really confident:
string val = regEx.Matches(t)[1].Value;

Don't need Regex for this. A simple split should suffice:
string output = input.Split(new string[] { "AQ102" }, StringSplitOptions.None)[1];
Depend on how sure you are of your input, you may want to check that AQ102 exist first, or even to count how many times... but as I said, depends on your scenario.

Omit unnecessary parts in string array

In C#, I have a string comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
,etc. Now I want to only extract the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?

This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];

What you can do is use a very simple regular express to parse out the bits that you want, in your case you want something that looks like this and then grab the two groups that interest you:
(Type)="(\w+)"
Which would return in groups 1 and 2 the values Type and the non-space characters contained between the double-quotes.

Instead of doing many split, why don't you just use Regex :
output = Regex.Match(pair[1].Trim('"'), "\"(\w*)\"").Value;

Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // IE Data or Progam
But you will need some error handling as well.

How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explaination of regex
(?<=^Type=\") This a prefix match. Its not included in the result but will only match
if the string starts with Type="
.*? Non greedy match. Match as many characters as you can until
(?=\") This is a suffix match. It's not included in the result but will only match if the next character is "

Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the strings... then you just have to detect the end quote mark and subtract the contents. You can use LinQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Why does regex replace my feed string once + 3 times?

Interesting situation I have here. I have some files in a folder that all have a very explicit string in the first line that I always know will be there. Want I want to do is really just append |DATA_SOURCE_KEY right after AVAILABLE_IND
//regex to search for the bb_course_*.bbd files
string courseRegex = #"BB_COURSES_([C][E][Q]|[F][A]|[H][S]|[S][1]|[S][2]|[S][P])\d{1,6}.bbd";
string courseHeaderRegex = #"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND";
//get files from the directory specifed in the GetFiles parameter and returns the matches to the regex
var matches = Directory.GetFiles(#"c:\courseFolder\").Where(path => Regex.Match(path, courseRegex).Success);
//prints the files returned
foreach (string file in matches)
{
Console.WriteLine(file);
File.WriteAllText(file, Regex.Replace(File.ReadAllText(file), courseHeaderRegex, "EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY"));
}
But this code takes the original occurrence of the matching regex, replaces it with my replacement value, and then does it 3 more times.
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY|EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
And I can't figure out why with breakpoints. My loop is running only 12 times to match the # of files I have in the directory. My only guess is that File.WriteAllText is somehow recursively searching itself after replacing the text and re-replacing. If that makes sense. Any ideas? Is it because courseHeaderRegex is so explicit?
If I change courseHeaderRegex to string courseHeaderRegex = #"AVAILABLE_IND";
then I get the correct changes in my files
EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY
I'd just like to understand why the original way doesn't work.

I think your problem is that you need to escape the | character in courseHeaderRegex:
string courseHeaderRegex = #"EXTERNAL_COURSE_KEY\|COURSE_ID\|COURSE_NAME\|AVAILABLE_IND";
The character | is the Alternation Operator and it will match 'EXTERNAL_COURSE_KEY' , 'COURSE_ID' , ,'COURSE_NAME' and 'AVAILABLE_IND', replacing each of them with your substitution string.

What about
string newString = File.ReadAllText(file)
.Replace(#"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND",#"EXTERNAL_COURSE_KEY|COURSE_ID|COURSE_NAME|AVAILABLE_IND|DATA_SOURCE_KEY");
just using a simple String.Replace()

Pattern Matching c#

Lets say I have a text file with the line below within it. I want to take both values within the quotations by matching between (" and "), so that would be I retreive ABC and DEF and put them in a string list or something, what's the best way of doing this? It's so annoying
If EXAMPLEA("ABC") AND EXAMPLEB("DEF")

Assuming a case where the value between the double quotes can not contain escaped double quotes might work like this:
var text = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
Regex pattern = new Regex("\"[^\"]*\"");
foreach (Match match in pattern.Matches(text))
{
Console.WriteLine(match.Value.Trim('"'));
}
But this is only one of the many ways you could do it and maybe not the smartest way out there. Try something yourself!

Best way...
List<string> matches=Regex.Matches(File.ReadAllText(yourPath),"(?<="")[^""]*(?="")")
.Cast<Match>()
.Select(x=>x.Value)
.ToList();

This pattern should do the trick:
\"([^"]*)\"
string str = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
MatchCollection matched = Regex.Matches(str, #"\""([^\""]*)\""");
foreach (Match match in matched)
{
Console.WriteLine(match.Groups[1].Value);
}
Note that the quotation marks are doubled in the actual code in order to escape them. And the code refers to group [1] to get just the part inside the parentheses.

IEnumerable<string> matches =
from Match match
in Regex.Matches(File.ReadAllText(filepath), #"\""([^\""]*)\""")
select match.Groups[1].Value;
Others already posted some answers, but my takes into account that you just want ABC and DEF in your example, without quotation marks and save it in a IEnumerable<string>.

Regex to remove all (non numeric OR period)

I need for text like "joe ($3,004.50)" to be filtered down to 3004.50 but am terrible at regex and can't find a suitable solution. So only numbers and periods should stay - everything else filtered. I use C# and VS.net 2008 framework 3.5

This should do it:
string s = "joe ($3,004.50)";
s = Regex.Replace(s, "[^0-9.]", "");

The regex is:
[^0-9.]
You can cache the regex:
Regex not_num_period = new Regex("[^0-9.]")
then use:
string result = not_num_period.Replace("joe ($3,004.50)", "");
However, you should keep in mind that some cultures have different conventions for writing monetary amounts, such as: 3.004,50.

You are dealing with a string - string is an IEumerable<char>, so you can use LINQ:
var input = "joe ($3,004.50)";
var result = String.Join("", input.Where(c => Char.IsDigit(c) || c == '.'));
Console.WriteLine(result); // 3004.50

For the accepted answer, MatthewGunn raises a valid point in that all digits, commas, and periods in the entire string will be condensed together. This will avoid that:
string s = "joe.smith ($3,004.50)";
Regex r = new Regex(#"(?:^|[^w.,])(\d[\d,.]+)(?=\W|$)/)");
Match m = r.match(s);
string v = null;
if (m.Success) {
v = m.Groups[1].Value;
v = Regex.Replace(v, ",", "");
}

The approach of removing offending characters is potentially problematic. What if there's another . in the string somewhere? It won't be removed, though it should!
Removing non-digits or periods, the string joe.smith ($3,004.50) would transform into the unparseable .3004.50.
Imho, it is better to match a specific pattern, and extract it using a group. Something simple would be to find all contiguous commas, digits, and periods with regexp:
[\d,\.]+
Sample test run:
Pattern understood as:
[\d,\.]+
Enter string to check if matches pattern
> a2.3 fjdfadfj34 34j3424 2,300 adsfa
Group 0 match: "2.3"
Group 0 match: "34"
Group 0 match: "34"
Group 0 match: "3424"
Group 0 match: "2,300"
Then for each match, remove all commas and send that to the parser. To handle case of something like 12.323.344, you could do another check to see that a matching substring has at most one ..

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Simple regex matching issue, what's my mistake? - c#

Your regex is matching, but you are selecting Groups[1] where the count of groups is one. So use String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", ""); And you should be fine

Try this regex: You need to escape the forward slash ([0-9]+\/[0-9]+) files checked Demo

Use capture group: Regex.Match(line, #"([0-9]+/[0-9]+) files checked"); # here ^ and ^ You could also use 2 groups: Regex.Match(line, #"([0-9]+)/([0-9]+) files checked");

Applying the replace operation to the first element of filesProgressMath.Groups seems to work. String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");

This should give you your results string txtText = #"1\45 files matched"; int[] s = System.Text.RegularExpressions.Regex.Split(txtText, "[^\\d+]").Where(x => !string.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray();

Related

Regex - extract rest of string after specific sequence

Omit unnecessary parts in string array

Why does regex replace my feed string once + 3 times?

Pattern Matching c#

Regex to remove all (non numeric OR period)

Categories

Resources

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Simple regex matching issue, what's my mistake? - c#

Your regex is matching, but you are selecting Groups[1] where the count of groups is one. So use String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", ""); And you should be fine

Try this regex: You need to escape the forward slash ([0-9]+\/[0-9]+) files checked Demo

Use capture group: Regex.Match(line, #"([0-9]+/[0-9]+) files checked"); # here __^ and __^ You could also use 2 groups: Regex.Match(line, #"([0-9]+)/([0-9]+) files checked");

Applying the replace operation to the first element of filesProgressMath.Groups seems to work. String matched = filesProgressMatch.Groups[0].Value.Replace(" files checked", "");

This should give you your results string txtText = #"1\45 files matched"; int[] s = System.Text.RegularExpressions.Regex.Split(txtText, "[^\\d+]").Where(x => !string.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray();

Related

Regex - extract rest of string after specific sequence

Omit unnecessary parts in string array

Why does regex replace my feed string once + 3 times?

Pattern Matching c#

Regex to remove all (non numeric OR period)

Categories

Resources

Use capture group: Regex.Match(line, #"([0-9]+/[0-9]+) files checked"); # here ^ and ^ You could also use 2 groups: Regex.Match(line, #"([0-9]+)/([0-9]+) files checked");