How to extract specific value from a string with Regex? [closed]

I am new to Regex and I want to extract a specific value from a string. I have strings like:
"20098: Blue Quest"
"95: Internal Comp"
"33: ICE"
and so on. Every string has the same pattern: a number, followed by ":", followed by a space and arbitrary text. I want to get the numbers at the start, for example "20098", "95", "33", etc.
I tried
Regex ex = new Regex(@"[0-9]+\: [a-zA-Z]$");
This does not give me a match. Where am I going wrong?
(I am using C#.)

This is a totally silly solution. However, I decided to benchmark an unchecked pointer version against the regex and int.Parse solutions in the other answers.
You mentioned the strings are always in the same format, so I decided to see how fast we could get it.
Yehaa
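public unsafe static int? FindInt(string val)
{
    var result = 0;
    fixed (char* p = val)
    {
        for (var i = 0; i < val.Length; i++)
        {
            // Stop at the colon and return the digits accumulated so far.
            if (p[i] == ':') return result;
            // No validation: assumes every character before ':' is an ASCII digit.
            result = result * 10 + p[i] - 48;
        }
        return null; // no colon found
    }
}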
I ran each test 50 times at scales of 100,000 and 1,000,000 comparisons, covering Lee Gunn's int.Parse version, The fourth bird's regex ^\d+(?=: [A-Z]), my pointer version, and the plain regex ^\d+
Results
Test Framework : .NET Framework 4.7.1
Scale : 100000
Name | Time | Delta | Deviation | Cycles
----------------------------------------------------------------------------
Pointers | 2.597 ms | 0.144 ms | 0.19 | 8,836,015
Int.Parse | 17.111 ms | 1.009 ms | 2.91 | 57,167,918
Regex ^\d+ | 85.564 ms | 10.957 ms | 6.14 | 290,724,120
Regex ^\d+(?=: [A-Z]) | 98.912 ms | 1.508 ms | 7.16 | 336,716,453
Scale : 1000000
Name | Time | Delta | Deviation | Cycles
-------------------------------------------------------------------------------
Pointers | 25.968 ms | 1.150 ms | 1.15 | 88,395,856
Int.Parse | 143.382 ms | 2.536 ms | 2.62 | 487,929,382
Regex ^\d+ | 847.109 ms | 14.375 ms | 21.92 | 2,880,964,856
Regex ^\d+(?=: [A-Z]) | 950.591 ms | 6.281 ms | 20.38 | 3,235,489,411
Not surprisingly, regex is the slowest by a wide margin.

If they are all separate strings - you don't need to use a regex, you can simply use:
var s = "20098: Blue Quest";
var index = s.IndexOf(':');
if(index > 0){
if(int.TryParse(s.Substring(0, index), out var number))
{
// Do stuff
}
}
If they're all contained in one string, you can loop over each line and perform the Substring. This is perhaps a bit easier to read, as a lot of people aren't comfortable with regular expressions.
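For the single-string case, the loop could look roughly like the sketch below. It assumes the entries are separated by line breaks; the class and method names (LineNumbers, Extract) are made up for illustration, and lines without a colon or without a parsable number are simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class LineNumbers
{
    // Returns the leading number of every line shaped like "123: some text".
    public static List<int> Extract(string text)
    {
        var numbers = new List<int>();
        var lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var index = line.IndexOf(':');
            // index > 0 means there is at least one character before the colon.
            if (index > 0 && int.TryParse(line.Substring(0, index), out var number))
            {
                numbers.Add(number);
            }
        }
        return numbers;
    }
}
```

Calling LineNumbers.Extract("20098: Blue Quest\n95: Internal Comp\n33: ICE") yields 20098, 95 and 33.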

In your regex [0-9]+\: [a-zA-Z]$ you match one or more digits followed by a colon, a space, and then a single lowercase or uppercase character at the end of the string.
That would match a string such as 20098: B in full; it would not match 20098: Blue Quest, and it would never return the digits only.
There are better alternatives to a regex, as already suggested, but you could match from the beginning of the string ^ one or more digits \d+ and use a positive lookahead (?= to assert that what follows is a colon, a space, and an uppercase character [A-Z]):
^\d+(?=: [A-Z])
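A minimal sketch of how that pattern might be used from C#. The wrapper class and method names (LeadingNumber, Extract) are hypothetical, and returning null for non-matching input is just one possible choice.

```csharp
using System.Text.RegularExpressions;

public static class LeadingNumber
{
    // Digits at the start of the string, only when followed by ": " and an
    // uppercase letter. The lookahead checks that text without consuming it.
    private static readonly Regex Pattern = new Regex(@"^\d+(?=: [A-Z])");

    // Returns the leading digits as text, or null when the input does not fit.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

LeadingNumber.Extract("20098: Blue Quest") returns "20098", while an input whose text starts with a lowercase letter, such as "95: internal", returns null because the lookahead requires [A-Z].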

Firstly, after the colon you should use \s instead of a literal space. Also, if the text after the colon can include spaces, the character class should also allow \s and have a + after it.
[0-9]+\:\s[a-zA-Z\s]+$
Secondly, that entire regex will return the entire string. If you only want the first number, then the regex would be simply:
[0-9]+

You can use look-behind ?<= to find any number following ^" (where ^ is the beginning of line):
(?<=^")[0-9]+

Related

Split long string for each colon ":" and get index of the line by position

I'm struggling to understand how to use the Split method to get the text I want.
I receive a long registration string from the user. I'm trying to split it by colon, and for each colon found I want to get all the text up to the \n on that line.
The string I receive from the user is formatted like this example:
"Username: Jony \n
Fname: Dep\n
Address: Los Angeles\n
Age: 28\n
Date: 11/01:2001\n"
This is my approach so far. I haven't figured out how to make it work and couldn't find a similar question:
str = the long string
List<string> names = str.ToString().Split(':').ToList<string>();
names.Reverse();
var result = names[0].ToString();
var result1 = names[1].ToString();
Console.WriteLine(result.Remove('\n').Replace(" ",string.Empty));
Console.WriteLine(result1.Remove('\n').Replace(" ",string.Empty));

Benchmarks
----------------------------------------------------------------------------
Mode : Release (64Bit)
Test Framework : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version : 10.0.17134
----------------------------------------------------------------------------
CPU Name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads) : 4 (8) : Architecture : x64
Clock Speed : 3901 MHz : Bus Speed : 100 MHz
L2Cache : 1 MB : L3Cache : 8 MB
----------------------------------------------------------------------------
Results
--- Random characters -------------------------------------------------
| Value | Average | Fastest | Cycles | Garbage | Test | Gain |
--- Scale 1 -------------------------------------------- Time 1.152 ---
| split | 4.975 µs | 4.091 µs | 20.486 K | 0.000 B | N/A | 71.62 % |
| regex | 17.530 µs | 14.029 µs | 65.707 K | 0.000 B | N/A | 0.00 % |
-----------------------------------------------------------------------
Original Answer
You could use a regex, or you could simply use Split:
var input = "Username: Jony\n Fname: Dep\nAddress: Los Angeles\nAge: 28\nDate: 11/01:2001\n";
var results = input.Split(new []{'\n'}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(':')[1].Trim());
foreach (var result in results)
Console.WriteLine(result);
Output
Jony
Dep
Los Angeles
28
11/01
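Note: This has no error checking, so if a line doesn't contain a colon, it will throw. Also, Split(':')[1] keeps only the text between the first and second colon, which is why the Date line prints 11/01.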
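If the whole value after the first colon is wanted (including later colons, as in the Date line), one option is the Split overload that takes a maximum count. The sketch below is only an illustration under that assumption; the names FieldValues and Extract are invented, and lines without a colon are skipped rather than throwing.

```csharp
using System;
using System.Collections.Generic;

public static class FieldValues
{
    // Returns the trimmed text after the first colon of every line.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (var line in input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Count 2 means: split at the first colon only, keep the rest intact.
            var parts = line.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                values.Add(parts[1].Trim());
            }
        }
        return values;
    }
}
```

For the sample registration string this gives Jony, Dep, Los Angeles, 28 and 11/01:2001.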
Additional Resources
String.Split Method
Returns a string array that contains the substrings in this instance
that are delimited by elements of a specified string or Unicode
character array.
StringSplitOptions Enum
Specifies whether applicable Split method overloads include or omit
empty substrings from the return value
String.Trim Method
Returns a new string in which all leading and trailing occurrences of
a set of specified characters from the current String object are
removed.
Enumerable.Select Method
Projects each element of a sequence into a new form.

You can use a regex to find the matches after colon and up to the Newline character:
(?<=:)\s*[^\n]*
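The regex uses a lookbehind to ensure there is a colon in front of the match, then matches optional whitespace and everything up to the next newline, i.e. the rest of the line. The match includes any whitespace after the colon, so trim it if needed.
Use it like this:
string searchText = "Username: Jony\n" +
    "Fname: Dep\n" +
    "Address: Los Angeles\n" +
    "Age: 28\n" +
    "Date: 11/01:2001\n";
// Verbatim string (@) so that \s and \n reach the regex engine unchanged.
Regex myRegex = new Regex(@"(?<=:)\s*[^\n]*");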
foreach (Match match in myRegex.Matches(searchText))
{
DoSomething(match.Value);
}

Trim string in c# output last string

I'm currently making a game. I have already split the string correctly, but how can I trim the string and output only the last line, or a line in the middle?
Code:
text = "Username:King100 ID:100 Level:10";
string[] splittext = text.Split(' ');
foreach (var texs in splittext)
{
Console.WriteLine(texs);
}
Output:
Username:King100
ID:100
Level:10
I just want to display the level (10) in the console. How does that work?
Thanks for helping.
Edit: the level can change often, e.g. 10, 100 or 1000.

Regex is the more flexible solution, but if your text format is constant, you can use this simple way:
string level = text.Substring(text.LastIndexOf(':') + 1);

You can also use a Regular Expression to solve this:
var regex = new Regex(@"Level:(?<Level>\d*)");
var matches = regex.Matches("Username:King100 ID:100 Level:10");
if (matches.Count > 0 && matches[0].Success)
{
Console.WriteLine(matches[0].Groups["Level"].Value);
}

var text = "Username:King100 ID:100 Level:10";
/*
Splits the given string on spaces and then splits on ":"
and creates a Dictionary ("Dictionary<TKey, TValue>")
*/
var dict = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.Select(part => part.Split(':'))
.ToDictionary(split => split[0], split => split[1]);
//If the dictionary count is greater than Zero
if(dict.Count > 0)
{
var levelValue = dict["Level"].ToString();
}

OK, because I'm annoying and totally bored at work, I decided to benchmark everyone's solutions.
The premise was simply to make an array of 1000 (scale) lines of strings (in the given format) with a random positive int on the end.
Note: I made every solution int.Parse the result, as it seemed more useful.
Mine
This just uses fixed, unsafe pointers and no error checking:
var level = 0;
fixed (char* pitem = item)
{
var len = pitem + item.Length;
for (var p = pitem ; p < len; p++)
if (*p >= '0' && *p <= '9')
level = level * 10 + *p - 48;
else
level = 0;
}
Results
Mode : Release
Test Framework : .NET Framework 4.7.1
Benchmarks runs : 1000 times (averaged)
Scale : 1,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
--------------------------------------------------------------------------
Mine | 0.095 ms | 0.085 ms | 0.01 | 317,205 | Yes | 96.59 %
Sanan | 0.202 ms | 0.184 ms | 0.02 | 680,747 | Yes | 92.75 %
Zaza | 0.373 ms | 0.316 ms | 0.10 | 1,254,302 | Yes | 86.60 %
Kishore | 0.479 ms | 0.423 ms | 0.06 | 1,620,756 | Yes | 82.81 %
Hussein | 1.045 ms | 0.946 ms | 0.11 | 3,547,305 | Yes | 62.50 %
Maccettura | 2.787 ms | 2.476 ms | 0.39 | 9,474,133 | Base | 0.00 %
Hardkoded | 6.691 ms | 5.927 ms | 0.67 | 22,750,311 | Yes | -140.09 %
Tom | 11.561 ms | 10.635 ms | 0.78 | 39,344,419 | Yes | -314.80 %
Summary
All the solutions do different things in different ways, so comparing them is not really apples to apples.
Don't use mine; it's totally unrealistic and only for fun. Use the version that makes the most sense to you, that is, the most robust and the easiest to maintain.
As always, regex is the slowest.

If level is always the last part of the string, and all you care about is the actual number, then you can just do:
var level = text.Split(':').LastOrDefault();
This would just split on ':' and give you the last (or default) element. Given your example input, level = "10".

Try this:
string input = "Username:King100 ID:100 Level:10";
Match m = Regex.Match(input, @"\s*Level:(?<level>\d+)");
if (m.Success && m.Groups["level"].Success)
Console.WriteLine(m.Groups["level"].Value);
Also works for:
string input = "Username:King100 Level:10 ID:100";

string text = texs.Substring(texs.IndexOf("Level:")+6);
System.Console.WriteLine(text);

How to create Regex that contains Not colon char?

I have created a regular expression that seems to be working somewhat:
// look for years starting with 19 or 20 followed by two digits surrounded by spaces.
// Instead of ending space, the year may be followed by a '.' or ';'
static Regex regex = new Regex(@" 19\d{2} | 19\d{2}. | 19\d{2}; | 20\d{2} | 20\d{2}. | 20\d{2}; ");
// Trying to add 'NOT followed by a colon'
static Regex regex = new Regex(@" 19\d{2}(?!:) | 19\d{2}. | 19\d{2}; | 20\d{2}(?!:) | 20\d{2}. | 20\d{2}; ");
// Trying to optimize --
//static Regex regex = new Regex(@" (19|20)\d{2}['.',';']");
You can see where I tried to optimize a bit.
But more importantly, it is finding a match for 2002:
How do I make it not do that?
I think I am looking for some sort of NOT operator?

(?:19|20)\d{2}(?=[ ,;.])
Try this. See the demo:
https://regex101.com/r/sJ9gM7/103
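As a rough illustration of the lookahead pattern above in C#: the sketch below collects every match into a list. The class and method names (YearFinder, Find) are hypothetical. Note that the pattern as given has no leading boundary, so it would also match the last four digits of a longer number.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearFinder
{
    // A year starting with 19 or 20, only when the next character is a
    // space, comma, semicolon or dot. A following colon fails the lookahead.
    private static readonly Regex Pattern = new Regex(@"(?:19|20)\d{2}(?=[ ,;.])");

    public static List<string> Find(string text)
    {
        var years = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            years.Add(m.Value);
        }
        return years;
    }
}
```

YearFinder.Find("In 1999, at 2002: and 2015; done 2020.") returns 1999, 2015 and 2020; the 2002 followed by a colon is skipped.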

I'd rather go with \b here, this will help deal with other punctuation that may appear after/before the years:
\b(?:19|20)[0-9]{2}\b
C#:
static Regex regex = new Regex(@"\b(?:19|20)[0-9]{2}\b");

The problem in your regex is the unescaped dot, which matches any character (including a colon).
You should have something like this:
static Regex regex =
new Regex(@" 19\d{2} | 19\d{2}[.] | 19\d{2}; | 20\d{2} | 20\d{2}[.] | 20\d{2}; ");

This did it for me:
// look for years starting with 19 or 20 followed by two digits surrounded by spaces.
// Instead of ending space, the year may also be followed by a '.' or ';'
// but may not be followed by a colon, dash or any other unspecified character.
// optimized --
static Regex regex = new Regex(@"(19|20)\d{2} | (19|20)\d{2};| (19|20)\d{2}[.]");
Used Regex Tester here:
http://regexhero.net/tester/

How do I validate a decimal like this 00.00?

I am learning validation expressions and have attempted to write one to check a decimal like the example below but I am having some issues.
The number to validate is like this:
00.00 (any 2 numbers, then a ., then any 2 numbers)
This is what I have:
^[0-9]{2}[.][0-9]{2}$
This expression returns false but from a tutorial I read I was under the understanding that it should be written like this:
^ = starting character
[0-9] = any number 0-9
{2} = 2 numbers 0-9
[.] = full stop
$ = end

Use the right tool for the job. If you're parsing decimals, use decimal.TryParse instead of Regex.
string input = "00.00";
decimal d;
var parsed = Decimal.TryParse(input, out d);
If the requirement is to always have a 2 digits then a decimal point then 2 digits you could do:
var lessThan100 = d < 100m;
var twoDecimals = d % 0.01m == 0;
var allOkay = parsed && lessThan100 && twoDecimals;
So our results are
Stage | input = "" | "abc" | "00.00" | "123" | "0.1234"
-------------------------------------------------------------
parsed | false | false | true | true | true
lessThan100 | - | - | true | false | true
twoDecimals | - | - | true | - | false
Although if you really need it to be that exact format then you could do
var separator = CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator;
var exactFormat = allOkay && input.Length == 5 && input.Substring(2, 1) == separator;

If you absolutely have to use Regex then the following works as required:
Regex.IsMatch("12.34", @"^([0-9]{2}\.[0-9]{2})$")
Regex explanation:
^ - start of string
() - capturing group around the whole pattern (not required for IsMatch)
[0-9]{2} exactly 2 characters in the range 0 - 9
\. - full stop (escaped)
$ - end of string
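A small sketch wrapping that check in a helper, with the optional group dropped. The names TwoDotTwo and IsValid are made up for illustration. One caveat worth knowing: in .NET, $ also matches just before a trailing newline, so \z can be used instead if "12.34\n" must be rejected.

```csharp
using System.Text.RegularExpressions;

public static class TwoDotTwo
{
    // Exactly two ASCII digits, a literal dot, then exactly two ASCII digits.
    private static readonly Regex Pattern = new Regex(@"^[0-9]{2}\.[0-9]{2}$");

    // Treats null as invalid instead of letting IsMatch throw.
    public static bool IsValid(string input) => input != null && Pattern.IsMatch(input);
}
```

TwoDotTwo.IsValid("00.00") and TwoDotTwo.IsValid("12.34") are true; "1.23", "123.45" and "12,34" are false.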

Regex Pattern for filter out anything that doesn't Match

Using Regex.Replace(mystring, @"[^MV:0-9]", "") removes any character that is not M, V, :, or 0-9 (\d could also be used). The problem is that I want to remove anything that is not MV: followed by numbers.
I need to replace anything that is not this pattern with nothing:
Starting String | Wanted Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:2138911230989
809308ej0efj0934jf0934jf4fj84j8904jf09 | Null
123MV:1234321234mnnnio234324234njiojh3 | MV:1234321234
mdfmsdfuiovvvajio123oij213432ofjoi32mm | Null
But what I get with what I have is:
Starting String | Returned Result
---------------------------------------------------------
sdhfuiosdhusdhMV:1234567890sdfahuosdho | MV:1234567890
MV:2138911230989hdsafh89ash32893h8u098 | MV:213891123098989328938098
809308ej0efj0934jf0934jf4fj84j8904jf09 | 809308009340934484890409
123MV:1234321234mnnnio234324234njiojh3 | 123MV:12343212342343242343
mdfmsdfuiovvvajio123oij213432ofjoi32mm | mmvvv1232134232mm
And even if there is a Regex pattern for this, would I be better off using something along the lines of:
if (Regex.IsMatch(strMyString, @"MV:"))
{
string[] strarMyString = Regex.Split(strMyString, @"MV:");
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
string WhatIWant = strarNumbersAfterMV[0];
}
If I went with the latter option, would there be a way to have:
string[] strarNumbersAfterMV = Regex.Split(strarMyString[1], @"[^\d]");
make only one split, at the first change from numbers? (It will always start with a number following the MV:)

Can't you just do:
string matchedText = null;
var match = Regex.Match(myString, @"MV:[0-9]+");
if (match.Success)
{
matchedText = match.Value;
}
Console.WriteLine((matchedText == null) ? "Not found" : matchedText);
That should give you exactly what you need.
