C# regex replace of numbers and units

I have a number of product names that include sizes.
For example "X 300g", "X 400 g", "X 250 kg", "X 25kg".
I would like to replace all instances of "NumberUnit" with "Number Unit", i.e. "300g" becomes "300 g", "25kg" becomes "25 kg", etc.
I know I can do that easily with a loop and a string replace, but there are millions of products and I worry it would take too long that way.
Instead, I guess it might be a better idea to do a Regex.Replace() on each product name.
Do you agree? And how would you write the regex?
Thanks.

Try this pattern:
(?<=\d)(?=[a-zA-Z])
This regex matches the zero-width position between a number and a unit, and only when there is no space between them. Replace that position with a space and you are done.
Demo: https://regex101.com/r/yry7Sl/3

Try this:
string pattern = @"(?<=\d)(?=[a-zA-Z])";
string substitution = @" ";
string input = @"""X 300g"", ""X 400 g"", ""X 250 kg"", ""X 25kg""";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution);
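
On the performance concern in the question, one common approach is to build the Regex once and reuse it for every product name instead of constructing it per call. The following is a minimal sketch under that assumption; the class name UnitSpacing and the method name Normalize are hypothetical and only serve the illustration, and whether RegexOptions.Compiled actually pays off should be measured on real data.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnitSpacing
{
    // Built once and shared by all calls. RegexOptions.Compiled costs more
    // at startup in exchange for faster matching afterwards.
    private static readonly Regex DigitThenLetter =
        new Regex(@"(?<=\d)(?=[a-zA-Z])", RegexOptions.Compiled);

    // Inserts a space at every position that has a digit on the left
    // and an ASCII letter on the right.
    public static string Normalize(string productName)
    {
        return DigitThenLetter.Replace(productName, " ");
    }
}
```

With this sketch, UnitSpacing.Normalize("X 25kg") returns "X 25 kg", while a name that already contains the space, such as "X 400 g", comes back unchanged.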

Related

Regex first digits occurrence

My task is extract the first digits in the following string:
GLB=VSCA|34|speed|1|
My pattern is the following:
(?x:VSCA(\|){1}(\d.))
Basically I need to extract "34", the first run of digits after "VSCA". With my pattern I get a group, but would it be possible to get only the number? This is my C# snippet:
string regex = @"(?x:VSCA(\|){1}(\d.))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
if (rx.Match(s).Success)
{
var test = rx.Match(s).Groups[1].ToString();
}

You could match 34 (the first digits after VSCA) using a positive lookbehind, (?<=VSCA\D*), which asserts that what is to the left is VSCA followed by zero or more non-digit characters (\D*), and then match one or more digits with \d+:
(?<=VSCA\D*)\d+
If you need the pipe to come directly after VSCA, then you could include it in the lookbehind:
(?<=VSCA\|)\d+

This regex pattern: (?<=VSCA\|)\d+?(?=\|) will match only the number. (If your number can be negative / have decimal places you may want to use (?<=VSCA\|).+?(?=\|) instead)
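
For reference, here is a minimal sketch of how a lookbehind pattern like the ones above could be used from C#. Because the lookbehind is not part of the match, Match.Value holds only the digits and no group indexing is needed. The class name VscaParser and the method name FirstNumber are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class VscaParser
{
    // Returns the digits that directly follow "VSCA|", or null when
    // the input contains no such sequence.
    public static string FirstNumber(string s)
    {
        Match m = Regex.Match(s, @"(?<=VSCA\|)\d+");
        return m.Success ? m.Value : null;
    }
}
```

Calling VscaParser.FirstNumber("GLB=VSCA|34|speed|1|") returns "34". Note that Regex.Match is called once and its result reused, rather than matching twice as in the question's snippet.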

You don't need Regex for this, you can simply split on the '|' character:
string s = "GLB=VSCA|34|speed|1|";
string[] parts = s.Split('|');
if(parts.Length >= 2)
{
Console.WriteLine(parts[1]); //prints 34
}
The benefit here is that you can access all parts of the original string based on the index:
[0] - "GLB=VSCA"
[1] - "34"
[2] - "speed"
[3] - "1"

While the other answers work well, if you really must use a regular expression, or want to get at the number directly, you can use a named group. Consider the following code:
string regex = @"(?x:VSCA(\|){1}(?<number>\d+))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
var match = rx.Match(s);
if(match.Success) Console.WriteLine(match.Groups["number"]);

How about this pattern?
(?<=VSCA\|)[0-9]+

Regex - extract rest of string after specific sequence

I have a long string with random letters, numbers, and spaces.
I need a regex expression to pull out the part of the string after the sequence of characters and numbers --> AQ102.
For example :
string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";
desired output:
Lab101 section2

Why not use
string s = t.Split("AQ102").Last();

Or, a regular expression as originally asked for:
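Regex regEx = new Regex(@".*(AQ102.*)");
OR
Regex regEx = new Regex(@".*(AQ102)(.*)");
And you can get the matches doing the following:
MatchCollection matches = regEx.Matches(t);
And you can get the captured text from the groups of the first match. With the first pattern, group 1 still includes AQ102; with the second pattern, group 2 holds only the text after it:
matches[0].Groups[2]
OR, if you're really confident (using the second pattern):
string val = regEx.Matches(t)[0].Groups[2].Value;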

Don't need Regex for this. A simple split should suffice:
string output = input.Split(new string[] { "AQ102" }, StringSplitOptions.None)[1];
Depending on how sure you are of your input, you may want to check that AQ102 exists first, or even count how many times it occurs... but as I said, it depends on your scenario.
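
The existence check mentioned above could look something like the following sketch, which uses IndexOf instead of Split so that a missing marker is reported rather than causing an out-of-range index. The class name AfterMarker and the method name TextAfter are hypothetical, and the sketch assumes that the text after the first occurrence of the marker is what is wanted.

```csharp
using System;

static class AfterMarker
{
    // Returns everything after the first occurrence of marker,
    // or null when marker does not occur in input.
    public static string TextAfter(string input, string marker)
    {
        int index = input.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
        {
            return null;
        }
        return input.Substring(index + marker.Length);
    }
}
```

For the string in the question, AfterMarker.TextAfter(t, "AQ102") returns "Lab101 section2".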

Add space between Number and String

I have a string like this:
"My Train Coming on Track 10B on 6A and string test with 11S"
Now I want to add a space between each number and the letter that follows it (for example between 10 and B), so that it looks like this:
"My Train Coming on Track 10 B on 6 A and string test with 11 S"
I am using C#. Is there a way to do that?
Thanks.

With a regex:
var result = Regex.Replace(str, @"(?<=\d)(?=\p{L})", " ");
This replaces the "empty space" between a digit ((?<=\d)) and a letter ((?=\p{L})) with a space.
A different method without the lookarounds would be:
var result = Regex.Replace(str, @"(\d)(\p{L})", "$1 $2");
In this case, it replaces the last digit and the first letter with the pattern $1 $2, inserting a space in the process.

The above answer is correct, but for this requirement it can also be written like this:
var result = Regex.Replace("4A", @"(?=\p{L})(?<=\d)", " ");
I hope it helps.

How to get all words of a string in c#?

I have a paragraph in a single string and I'd like to get all the words in that paragraph.
My problem is that I don't want the words to include trailing punctuation marks such as (',','.',''','"',';',':','!','?') or whitespace characters such as \n and \t.
I also don't want suffixes like 's and 'm, so world's should return only world.
In the example
he said. "My dog's bone, toy, are missing!"
the list should be: he said my dog bone toy are missing

Expanding on Shan's answer, I would consider something like this as a starting point:
MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
Why include the ' character? Because this will prevent words like "we're" from being split into two words. After capturing it, you can manually strip out the suffix yourself (whereas otherwise, you couldn't recognize that re is not a word and ignore it).
So:
static string[] GetWords(string input)
{
MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
var words = from m in matches.Cast<Match>()
where !string.IsNullOrEmpty(m.Value)
select TrimSuffix(m.Value);
return words.ToArray();
}
static string TrimSuffix(string word)
{
int apostropheLocation = word.IndexOf('\'');
if (apostropheLocation != -1)
{
word = word.Substring(0, apostropheLocation);
}
return word;
}
Example input:
he said. "My dog's bone, toy, are missing!" What're you doing tonight, by the way?
Example output:
[he, said, My, dog, bone, toy, are, missing, What, you, doing, tonight, by, the, way]
One limitation of this approach is that it will not handle acronyms well; e.g., "Y.M.C.A." would be treated as four words. I think that could also be handled by including . as a character to match in a word and then stripping it out if it's a full stop afterwards (i.e., by checking that it's the only period in the word as well as the last character).

Hope this is helpful for you:
// "'s" is listed before "'" so that the possessive suffix is removed as a whole
string[] separators = new string[] {",", ".", "!", "\'s", "\'", " "};
string text = "My dog's bone, toy, are missing!";
foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
Console.WriteLine(word);

See Regex word boundary expressions, What is the most efficient way to count all of the words in a richtextbox?. Moral of the story is that there are many ways to approach the problem, but regular expressions are probably the way to go for simplicity.

Split on whitespace, then trim anything that isn't a letter from the ends of the resulting strings.
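
A minimal sketch of that idea, assuming "trim" means stripping non-letter characters from both ends of each token only. The names WordSplitter, GetWords and TrimNonLetters are hypothetical. Note that this leaves an apostrophe inside a word alone, so "dog's" stays "dog's" and would still need the suffix handling described in the other answers.

```csharp
using System;
using System.Linq;

static class WordSplitter
{
    public static string[] GetWords(string text)
    {
        return text
            // A null separator array makes Split use whitespace as the delimiter.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(TrimNonLetters)
            // Drop tokens that contained no letters at all.
            .Where(w => w.Length > 0)
            .ToArray();
    }

    // Removes leading and trailing characters that are not letters.
    private static string TrimNonLetters(string token)
    {
        int start = 0;
        int end = token.Length;
        while (start < end && !char.IsLetter(token[start])) start++;
        while (end > start && !char.IsLetter(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }
}
```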

Here's a looping replace method... not fast, but a way to solve it...
string result = "string to cut ' stuff. ! out of";
".',!#".ToCharArray().ToList().ForEach(a => result = result.Replace(a.ToString(),""));
This assumes you want to place it back in the original string, not a new string or a list.

RegEx Problem using .NET

I have a little problem with a regex pattern in C#. Here are the rules:
input: 1234567
expected output: 123/1234567
Rules:
Get the first three digits of the input. //123
Add /
Append the original input. //123/1234567
The expected output should look like this: 123/1234567
Here's my regex pattern:
Regex rx = new Regex(@"((\w{1,3})(\w{1,7}))");
but the output is incorrect: 123/4567

I think this is what you're looking for:
string s = @"1234567";
s = Regex.Replace(s, @"(\w{3})(\w+)", @"$1/$1$2");
Instead of trying to match part of the string, then match the whole string, just match the whole thing in two capture groups and reuse the first one.

It's not clear why you need a RegEx for this. Why not just do:
string x = "1234567";
string result = x.Substring(0, 3) + "/" + x;

Another option is:
string s = Regex.Replace("1234567", @"^\w{3}", "$&/$&");
That would match 123 and replace it with 123/123, leaving the tail of 4567 in place.
^\w{3} - Matches the first 3 characters.
$& - Inserts the whole match.
You could also use @"^(\w{3})" with "$1/$1" if you are more comfortable with that; it is better known.

Use positive look-ahead assertions: they don't consume characters from the input, and capture groups placed inside them still capture:
Regex rx = new Regex(@"(?=(?'group1'\w{1,3}))(?=(?'group2'\w{1,7}))");
group1 should be 123, group2 should be 1234567.
