Extracting text from the middle of a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have the following string that is captured from the DVLA when looking up a car registration details and I need to be able to extract just the numbers from the CC.
"A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 -
PRESENT"
Given that the length of the string can change, is there a way to do this with a substring, so that, for example, I always grab the numbers before the cc without the space that comes before them? Bear in mind too that this can be either a three-digit or a four-digit number.

This does the trick:
string input = "A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 - PRESENT";
string size = null;
Regex r = new Regex(@"(\d+)cc", RegexOptions.IgnoreCase);
Match m = r.Match(input);
if (m.Success)
{
    size = m.Groups[1].Value; // "1968"; Groups[0] would be the whole match, "1968cc"
}
It captures the digits that come right before cc.

If the number of commas doesn't change, you can do the following:
string s = "A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 - PRESENT";
string ccString = s.Split(',').ToList().Find(x => x.EndsWith("cc")).Trim();
int cc = Int32.Parse(ccString.Substring(0, ccString.Length - 2));

You can use Regex to match a pattern within the string and return the parts of the string that match it. This Regex pattern will attempt to match parts of the string that fit the following pattern:
\d{1,5} *[cC]{2}
Starts with 1 to 5 digits \d{1,5} (seems sensible for an engine cc value!)
Can then contain 0 or more spaces in between that and cc *
Ends with any combination of 2 C or c [cC]{2}
So you can then use this in the following manner:
string str = "A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 - PRESENT";
Match result = Regex.Match(str, @"\d{1,5} *[cC]{2}");
string cc = result.Value; // 1968cc
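If you need the number itself rather than the text 1968cc, a capture group around the digits avoids any Substring arithmetic. Here is a minimal sketch, assuming the value is always three or four digits as the question states. The helper name TryGetEngineCc is made up for illustration, and the top-level statements need C# 9 or later:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true and the engine size when the text
// contains 3 or 4 digits followed by optional spaces and "cc".
static bool TryGetEngineCc(string description, out int cc)
{
    cc = 0;
    // \b stops a longer number such as 12345cc from matching on its last digits.
    Match m = Regex.Match(description, @"\b(\d{3,4}) *cc\b", RegexOptions.IgnoreCase);
    return m.Success && int.TryParse(m.Groups[1].Value, out cc);
}

string input = "A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 - PRESENT";
if (TryGetEngineCc(input, out int size))
{
    Console.WriteLine(size); // 1968
}
```

Returning a bool with an out parameter mirrors int.TryParse, so the caller decides what to do when the description has no cc value.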

Here is another solution:
string text = "A5 S LINE BLACK EDITION PLUS TDI 190 (2 DOOR), 1968cc, 2015 - PRESENT";
string[] substrings = text.Split(',');
string numeric = new String(substrings[1].Where(Char.IsDigit).ToArray());
Console.WriteLine(numeric);
Here is a working DEMO

Related

Regex to extract time information from a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm receiving data from a third party device. I need to extract two pieces of information. I think I need to use a Regular Expression, but I don't know anything of this.
Below you can find a few example strings:
TN 12 1 17:45:19.90400 7173
TN 4 4 17:45:20.51800 7173
TN 13 1 17:45:24.03200 7173
TN 5 4 17:45:26.06300 7173
TN 6 4 17:45:29.28700 7173
TN 14 1 17:45:31.03200 7173
From each of these strings I need to extract two pieces of data:
the time
the number before the time
So the data I'm looking for is this:
1 and 17:45:19.90400
4 and 17:45:20.51800
1 and 17:45:24.03200
4 and 17:45:26.06300
4 and 17:45:29.28700
1 and 17:45:31.03200
The number will always be present and it will always be 1, 2, 3 or 4.
The time will also be the same format but I'm not sure if there will be single digit hours. So I don't know if 9 o'clock will be displayed as
9 or 09
Any suggestions on how I can extract this using a RegEx?
Thanks

My usual approach to this is to create a class that represents the data we want to capture, and give it a static Parse method that takes in an input string and returns an instance of the class populated with data from the string. Then we can just loop through the lines and populate a list of our custom class with data from each line.
For example:
class TimeData
{
    public TimeSpan Time { get; set; }
    public int Number { get; set; }

    public static TimeData Parse(string input)
    {
        var timeData = new TimeData();
        int number;
        TimeSpan time;
        if (string.IsNullOrWhiteSpace(input)) return timeData;
        // Split on any whitespace and drop empty entries
        var parts = input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 2 && int.TryParse(parts[2], out number))
        {
            timeData.Number = number;
        }
        if (parts.Length > 3 && TimeSpan.TryParseExact(parts[3], "hh\\:mm\\:ss\\.fffff",
            CultureInfo.CurrentCulture, out time))
        {
            timeData.Time = time;
        }
        return timeData;
    }
}
Now we can just loop through the list of strings, call Parse on each line, and end up with a new list of objects that contain the Time and associated Number for each line. Also note that, by using a TimeSpan to represent the time, we now have properties for all the parts, like Hours, Minutes, Seconds, Milliseconds, TotalMinutes, etc:
var fileLines = new List<string>
{
"TN 12 1 17:45:19.90400 7173",
"TN 4 4 17:45:20.51800 7173",
"TN 13 1 17:45:24.03200 7173",
"TN 5 4 17:45:26.06300 7173",
"TN 6 4 17:45:29.28700 7173",
"TN 14 1 17:45:31.03200 7173",
};
List<TimeData> allTimeData = fileLines.Select(TimeData.Parse).ToList();
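Since the question asked for a regular expression, here is a rough sketch of that route as well. It assumes the fields are separated by whitespace, that the number is a single digit from 1 to 4, and that the hour may have one or two digits. TryParseLine is a made-up name, and the top-level statements need C# 9 or later:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the 1-4 number and the time that follows it.
static bool TryParseLine(string line, out int number, out TimeSpan time)
{
    number = 0;
    time = TimeSpan.Zero;
    // A whitespace character, one digit 1-4, whitespace, then h:mm:ss.fraction
    Match m = Regex.Match(line, @"\s(?<number>[1-4])\s+(?<time>\d{1,2}:\d{2}:\d{2}\.\d+)");
    if (!m.Success) return false;
    number = int.Parse(m.Groups["number"].Value, CultureInfo.InvariantCulture);
    // TimeSpan.TryParse accepts both one- and two-digit hours.
    return TimeSpan.TryParse(m.Groups["time"].Value, CultureInfo.InvariantCulture, out time);
}

if (TryParseLine("TN 12 1 17:45:19.90400 7173", out int n, out TimeSpan t))
{
    Console.WriteLine($"{n} and {t.Hours}:{t.Minutes}:{t.Seconds}");
}
```

Named groups keep the extraction readable; the Split approach above is still the simpler choice if the column positions never change.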

get a specific string in C# and trim the value [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
Hi I would like to trim a string
IWF01 - STSD Campus | 1009432 | Posted Today
I need to get this string 1009432.
Or something like this
ROS03 - Roseville, CA R-3, More... | T_R_1624621 | Posted Today
I want to get this one T_R_1624621.
How do I get that part of string only?

string s = "ROS03 - Roseville, CA R-3, More... | T_R_1624621 | Posted Today";
var strArr = s.Split('|');
string yourValue = strArr[1].Trim();
Beware: this can throw an exception if the string is not in the expected format. For example, if it contains no | at all, strArr has only one element and strArr[1] throws an IndexOutOfRangeException.

The first thing I'd do is split the string, then I'd get the 2nd item (provided that there are enough items).
Here's a function that'll do what you need (remember that it won't raise an exception if there aren't enough items):
string GetFieldTrimmed(string input, char separator, int fieldIndex)
{
var strSplitArray = input.Split(separator);
return strSplitArray.Length >= fieldIndex + 1
? strSplitArray[fieldIndex].Trim()
: "";
}
Example usage:
var fieldVal = GetFieldTrimmed("ROS03 - Roseville, CA R-3, More... | T_R_1624621 | Posted Today", '|', 1);

RegEx to accept only numbers between 0-9 [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I'm fairly new to using regex and I've got a problem with the results of this one.
Regex @"^0[0-9]{9,10}$" is accepting spaces between the digits. How do I stop this, as I only want the digits 0-9 to be acceptable characters?
The valid result should start with zero and have either 9 or 10 digits (no spaces or any other character is permissible).
All help appreciated.
Here's my code
Regex telephoneExp = new Regex(@"^0[0-9]{9,10}$");
if (telephoneExp.Match(txtTelephoneNumber.Text).Success==false)
{
MessageBox.Show("The telephone number is not valid, it must only contain digits (0-9) and be either 10 or 11 digits in length.");
return false;
}

This works
^0\d{9,10}$
Matches 0, then 9 to 10 digits.
You could also use Regex.IsMatch instead:
Regex telephoneExp = new Regex(@"^0\d{9,10}$");
if (!telephoneExp.IsMatch(txtTelephoneNumber.Text))
{
MessageBox.Show("The telephone number is not valid, it must only contain digits (0-9) and be either 10 or 11 digits in length.");
return false;
}
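One detail worth knowing when a pattern like this seems to let stray whitespace through: in .NET, $ also matches just before a final newline, while \z matches only at the very end of the input. Here is a small sketch comparing the two. The helper names and sample inputs are made up, and the top-level statements need C# 9 or later:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the question, anchored with $ and with \z.
static bool IsValidLoose(string s) => Regex.IsMatch(s, @"^0[0-9]{9,10}$");
static bool IsValidStrict(string s) => Regex.IsMatch(s, @"^0[0-9]{9,10}\z");

Console.WriteLine(IsValidLoose("01234567890"));    // True
Console.WriteLine(IsValidLoose("01234 567890"));   // False: a space between digits never matches
Console.WriteLine(IsValidLoose("01234567890\n"));  // True: $ tolerates one trailing newline
Console.WriteLine(IsValidStrict("01234567890\n")); // False
```

If the text box can contain pasted text with a trailing line break, \z (or trimming first) gives the stricter check.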

Have you tried this to check if it's only a number:
^\d*$
UPDATE:
(^0\d{8,9})$
Checks if 9 or 10 digits starting with leading 0

You don't need regex, you can use LINQ:
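var text = txtTelephoneNumber.Text;
bool valid = false;
if (!String.IsNullOrEmpty(text))
{
    // The length bounds count the leading 0; use 10 and 11 to mirror ^0[0-9]{9,10}$.
    valid = text[0] == '0' &&
            9 <= text.Length && text.Length <= 10 &&
            text.All(c => Char.IsDigit(c));
}
if (!valid)
{
    // show the validation message and return false, as in the question
}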

Your code is correct, but strip leading and trailing whitespace with Trim() before matching:
Regex telephoneExp = new Regex(@"^0[0-9]{9,10}$");
if (telephoneExp.Match(txtTelephoneNumber.Text.Trim()).Success==false)
{
MessageBox.Show("The telephone number is not valid, it must only contain digits (0-9) and be either 10 or 11 digits in length.");
return false;
}

Need multiple regular expression matches using C#

So I have this list of flight data and I need to be able to parse through it using regular expressions (this isn't the entire list).
1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21
2 AA2421 F7 A1 Y7 B7 M7 H7 K7 DFWLAX 4 1106 1215 E0.777 7 3:09
3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48
For example, I might need from the first line 1, AA, 2401, and so on and so on. Now, I'm not asking for someone to come up with a regular expression for me because for the most part I'm getting to where I can pretty much handle that myself. My issue has more to do with being able to store the data some where and access it.
So I'm just trying to initially just "match" the first piece of data I need, which is the line number '1'. My "pattern" for just getting the first number is: ".?(\d{1,2}).*" . The reason it's {1,2} is because obviously once you get past 10 it needs to be able to take 2 numbers. The rest of the line is set up so that it will definitely be a space or a letter.
Here's the code:
var assembly = Assembly.GetExecutingAssembly();
var textStreamReader = new StreamReader(
assembly.GetManifestResourceStream("FlightParser.flightdata.txt"));
List<string> lines = new List<string>();
do
{
lines.Add(textStreamReader.ReadLine());
} while (!textStreamReader.EndOfStream);
Regex sPattern = new Regex(@".?(\d{1,2}).*");//whatever the pattern is
foreach (string line in lines)
{
System.Console.Write("{0,24}", line);
MatchCollection mc = sPattern.Matches(line);
if ( sPattern.IsMatch(line))
{
System.Console.WriteLine(" (match for '{0}' found)", sPattern);
}
else
{
System.Console.WriteLine();
}
System.Console.WriteLine(mc[0].Groups[0].Captures);
System.Console.WriteLine(line);
}//end foreach
System.Console.ReadLine();
With the code I'm writing, I'm basically just trying to get '1' into the match collection and somehow access it and write it to the console (for the sake of testing, that's not the ultimate goal).

Your regex pattern ends with ".*", which matches any number of characters, i.e. the rest of the line, so the overall match is the whole line. Remove the ".*" and the match will cover only the "1". You may find an online RegEx tester such as this useful.

Assuming your file is not actually formatted as you posted and has each of the fields separated by something, you can match the first one- or two-digit number of the line with this regex (ignoring 0 and leading zeros):
^\s*([1-9]\d?)
Since it is grouped, you can access the matched part through the Groups property of the Match object.
var line = "12 foobar blah 123 etc";
var re = new Regex(#"^\s*([1-9]\d?)");
var match = re.Match(line);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // "12"
}
else
{
Console.WriteLine("No match");
}
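To store several pieces per line and access them later, named groups plus a list work well. This is only a sketch: the assumed layout (an optional "XX:" prefix before the two-letter airline code) is guessed from the three sample lines, and ParseFlights is a made-up name. The top-level statements need C# 9 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects line number, airline code and flight number.
static List<(int Line, string Airline, string Flight)> ParseFlights(IEnumerable<string> lines)
{
    // Assumed layout: 1-2 digit line number, optional "XX:" prefix,
    // two-letter airline code, then the flight number.
    var pattern = new Regex(
        @"^\s*(?<line>\d{1,2})\s*(?:[A-Z]{2}:)?(?<airline>[A-Z]{2})\s*(?<flight>\d+)");
    var flights = new List<(int Line, string Airline, string Flight)>();
    foreach (string line in lines)
    {
        Match m = pattern.Match(line);
        if (!m.Success) continue; // skip lines that don't fit the assumed layout
        flights.Add((int.Parse(m.Groups["line"].Value),
                     m.Groups["airline"].Value,
                     m.Groups["flight"].Value));
    }
    return flights;
}

var sample = new[]
{
    "1 AA2401 F7 A4 Y7 B7 M7 H7 K7 /DFW A LAX 4 0715 0836 E0.M80 9 3:21",
    "3UA:US6352 B9 M9 H9 K0 /DFW 1 LAX 1200 1448 E0.733 1:48",
};
foreach (var f in ParseFlights(sample))
{
    Console.WriteLine($"{f.Line}: {f.Airline} {f.Flight}");
}
// 1: AA 2401
// 3: US 6352
```

Each further field (cabin classes, airports, times) can be added as another named group once its format is pinned down.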

The following expression matches the first digit, that you wanted to capture, in the group "First".
^\s*(?<First>\d{1})
I find this regular expression tool highly useful when dealing with regex. Give it a try.
Also set RegexOptions.Multiline if you match against the whole text at once, so that ^ matches at the start of each line.

How does this regex find triangular numbers?

Part of a series of educational regex articles, this is a gentle introduction to the concept of nested references.
The first few triangular numbers are:
1 = 1
3 = 1 + 2
6 = 1 + 2 + 3
10 = 1 + 2 + 3 + 4
15 = 1 + 2 + 3 + 4 + 5
There are many ways to check if a number is triangular. There's this interesting technique that uses regular expressions as follows:
Given n, we first create a string of length n filled with the same character
We then match this string against the pattern ^(\1.|^.)+$
n is triangular if and only if this pattern matches the string
Here are some snippets to show that this works in several languages:
PHP (on ideone.com)
$r = '/^(\1.|^.)+$/';
foreach (range(0,50) as $n) {
if (preg_match($r, str_repeat('o', $n))) {
print("$n ");
}
}
Java (on ideone.com)
for (int n = 0; n <= 50; n++) {
String s = new String(new char[n]);
if (s.matches("(\\1.|^.)+")) {
System.out.print(n + " ");
}
}
C# (on ideone.com)
Regex r = new Regex(@"^(\1.|^.)+$");
for (int n = 0; n <= 50; n++) {
if (r.IsMatch("".PadLeft(n))) {
Console.Write("{0} ", n);
}
}
So this regex seems to work, but can someone explain how?
Similar questions
How to determine if a number is a prime with regex?

Explanation
Here's a schematic breakdown of the pattern:
from beginning…
|         …to end
|         |
^(\1.|^.)+$
 \______/|___match
  group 1    one-or-more times
The (…) brackets define capturing group 1, and this group is matched repeatedly with +. This subpattern is anchored with ^ and $ to see if it can match the entire string.
Group 1 tries to match this|that alternates:
\1., that is, what group 1 matched (self reference!), plus one of "any" character,
or ^., that is, just "any" one character at the beginning
Note that in group 1, we have a reference to what group 1 matched! This is a nested/self reference, and is the main idea introduced in this example. Keep in mind that when a capturing group is repeated, generally it only keeps the last capture, so the self reference in this case essentially says:
"Try to match what I matched last time, plus one more. That's what I'll match this time."
Similar to a recursion, there has to be a "base case" with self references. At the first iteration of the +, group 1 had not captured anything yet (which is NOT the same as saying that it starts off with an empty string). Hence the second alternation is introduced, as a way to "initialize" group 1, which is that it's allowed to capture one character when it's at the beginning of the string.
So as it is repeated with +, group 1 first tries to match 1 character, then 2, then 3, then 4, etc. The sum of these numbers is a triangular number.
Further explorations
Note that for simplification, we used strings that consist of the same repeating character as our input. Now that we know how this pattern works, we can see that this pattern can also match strings like "1121231234", "aababc", etc.
Note also that if we find that n is a triangular number, i.e. n = 1 + 2 + … + k, the length of the string captured by group 1 at the end will be k.
Both of these points are shown in the following C# snippet (also seen on ideone.com):
Regex r = new Regex(@"^(\1.|^.)+$");
Console.WriteLine(r.IsMatch("aababc")); // True
Console.WriteLine(r.IsMatch("1121231234")); // True
Console.WriteLine(r.IsMatch("iLoveRegEx")); // False
for (int n = 0; n <= 50; n++) {
Match m = r.Match("".PadLeft(n));
if (m.Success) {
Console.WriteLine("{0} = sum(1..{1})", n, m.Groups[1].Length);
}
}
// 1 = sum(1..1)
// 3 = sum(1..2)
// 6 = sum(1..3)
// 10 = sum(1..4)
// 15 = sum(1..5)
// 21 = sum(1..6)
// 28 = sum(1..7)
// 36 = sum(1..8)
// 45 = sum(1..9)
Flavor notes
Not all flavors support nested references. Always familiarize yourself with the quirks of the flavor that you're working with (and consequently, it almost always helps to provide this information whenever you're asking regex-related questions).
In most flavors, the standard regex matching mechanism tries to see if a pattern can match any part of the input string (possibly, but not necessarily, the entire input). This means that you should remember to always anchor your pattern with ^ and $ whenever necessary.
Java is slightly different in that String.matches, Pattern.matches and Matcher.matches attempt to match a pattern against the entire input string. This is why the anchors can be omitted in the above snippet.
Note that in other contexts, you may need to use \A and \Z anchors instead. For example, in multiline mode, ^ and $ match the beginning and end of each line in the input.
One last thing is that in .NET regex, you CAN actually get all the intermediate captures made by a repeated capturing group. In most flavors, you can't: all intermediate captures are lost and you only get to keep the last.
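The point about intermediate captures can be seen directly through Group.Captures. Here is a minimal sketch; CaptureLengths is a made-up helper name, and the top-level statements need C# 9 or later:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lengths of every capture group 1 made while matching a string of n characters.
static int[] CaptureLengths(int n)
{
    Match m = Regex.Match(new string('o', n), @"^(\1.|^.)+$");
    if (!m.Success) return new int[0]; // n is not triangular
    return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Length).ToArray();
}

Console.WriteLine(string.Join(" + ", CaptureLengths(10))); // 1 + 2 + 3 + 4
```

Each iteration of + adds one capture, so the lengths spell out the sum that makes n triangular.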
Related questions
(Java) method matches not work well - with examples on how to do prefix/suffix/infix matching
Is there a regex flavor that allows me to count the number of repetitions matched by * and + (.NET!)
Bonus material: Using regex to find powers of two!!!
With a very slight modification, you can use the same techniques presented here to find powers of two.
Here's the basic mathematical property that you want to take advantage of:
1 = 1
2 = (1) + 1
4 = (1+2) + 1
8 = (1+2+4) + 1
16 = (1+2+4+8) + 1
32 = (1+2+4+8+16) + 1
The solution is given below (but do try to solve it yourself first!!!!)
(see on ideone.com in PHP, Java, and C#):
^(\1\1|^.)*.$
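A quick way to try the pattern out in C#, in the same style as the triangular-number snippets. This is just a sketch; IsPowerOfTwo is a made-up helper name, and the top-level statements need C# 9 or later:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1 doubles on every iteration (1, 2, 4, ...); the final "." adds the "+ 1".
static bool IsPowerOfTwo(int n) =>
    Regex.IsMatch(new string('o', n), @"^(\1\1|^.)*.$");

Console.WriteLine(string.Join(" ", Enumerable.Range(0, 65).Where(IsPowerOfTwo)));
// 1 2 4 8 16 32 64
```

For 8, the group matches 1, 2 and 4 characters in turn, and the trailing . consumes the eighth.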
