Remove all zeroes except in numbers - c#

I have to remove all the zeroes in a string, but I have to keep the zeroes in numbers.
The strings I receive are in a format similar to "zeroes-letter-zeroes-number", without the '-' and the numbers are always integers. A few examples:
"0A055" -> "A55"
"0A050" -> "A50"
"0A500" -> "A500"
"0A0505" -> "A505"
"0055" -> "55"
"0505" -> "505"
"0050" -> "50"
I know I can iterate through the characters in the string and set a flag when I encounter a letter or a non-zero digit, but I think using a RegEx would be nicer. A RegEx would also be more useful if I have to use this algorithm in the database.
I tried something like this but I don't get the results that I want:
Regex r = new Regex(@"[0*([a-zA-Z]*)0*([1-9]*)]");
string result = r.Replace(input, "");
I'm not very good at writing RegExes, so please help me if you can.

I'm not convinced that a regex is the best way to approach this, but this one works with all your test cases:
string clean = Regex.Replace(dirty, @"(?<!\d)0+|0+(?!\d|$)", "");

If I understand your pattern correctly, the following should work.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        List<String> samples = new List<String>(new[]{
            "0A055","0A050","0A500","0A0505","0055","0505","0050"
        });
        String re = @"^0*([A-Z]*)0*([1-9]\d*)$";
        // iterate over all results
        samples.ForEach(n => {
            Console.WriteLine("\"{0}\" -> \"{1}\"",
                n,
                Regex.Replace(n, re, "$1$2")
            );
        });
    }
}
With the following output:
"0A055" -> "A55"
"0A050" -> "A50"
"0A500" -> "A500"
"0A0505" -> "A505"
"0055" -> "55"
"0505" -> "505"
"0050" -> "50"
Basically use the pattern to negate all 0s that don't matter, and use the regex replace grouping to re-concatenate the "meaningful" numbers (and letters when present).

Like some of the others I'm not sure regex is the best idea here, but this works with the test cases:
0+(?=[0-9].)|0(?=[a-zA-Z])|(?<=[a-zA-Z])0+

Since you seem to only have one letter, you can split the string in two halves on that letter.
On the left part, trim all zeros.
On the right part, convert it to a number, this will drop all leading zeros or you could use TrimStart.
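The split-and-trim idea above can be sketched without any regex. This is only an illustration: the class name ZeroTrimmer and the method Clean are made up for the example, and it assumes the input has at most one letter, followed only by digits. Note that a number consisting only of zeroes would come out empty.

```csharp
using System;

public static class ZeroTrimmer
{
    // Hypothetical helper: assumes at most one letter, followed only by digits.
    public static string Clean(string input)
    {
        // Find the first letter, if there is one.
        int letterIndex = -1;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsLetter(input[i])) { letterIndex = i; break; }
        }

        // Left part: everything up to and including the letter, with all zeroes removed.
        string left = letterIndex >= 0
            ? input.Substring(0, letterIndex + 1).Replace("0", "")
            : "";

        // Right part: the digits after the letter, with leading zeroes removed.
        string right = input.Substring(letterIndex + 1).TrimStart('0');

        return left + right;
    }

    public static void Main()
    {
        foreach (var s in new[] { "0A055", "0A050", "0A500", "0A0505", "0055", "0505", "0050" })
        {
            Console.WriteLine("\"{0}\" -> \"{1}\"", s, Clean(s));
        }
    }
}
```

TrimStart is used instead of converting to a number so that very long digit strings cannot overflow an int.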

Doing a replace with a regex is much harder than extracting the values you want. So try matching the string with a simple regex like the one below:
0*(?<letter>[A-Z])0*(?<number>\d*)
The match result will then contain two groups, letter and number. Concatenate the values of the two groups and you get the result you want.
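A minimal sketch of this extract-and-concatenate approach follows. It departs from the pattern above in two labeled ways: the letter group is made optional so that letterless inputs such as "0055" still match, and the pattern is anchored with ^ and $. The names ZeroExtractor and Clean are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroExtractor
{
    // Assumption: the "?" after the letter group is an addition, so that
    // inputs without a letter (e.g. "0055") are matched as well.
    private static readonly Regex Pattern =
        new Regex(@"^0*(?<letter>[A-Z])?0*(?<number>\d*)$");

    public static string Clean(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return input; // leave unexpected formats untouched
        }
        // A group that did not take part in the match has an empty Value.
        return m.Groups["letter"].Value + m.Groups["number"].Value;
    }

    public static void Main()
    {
        foreach (var s in new[] { "0A055", "0A050", "0A500", "0A0505", "0055", "0505", "0050" })
        {
            Console.WriteLine("\"{0}\" -> \"{1}\"", s, Clean(s));
        }
    }
}
```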

Here's a Perl answer for what it's worth
s/0*([a-zA-Z]*)0*([1-9]+0*)/$1$2/g

I don't know how regex is implemented in .NET, so I'll let you write the proper code with the toys in System.Text.RegularExpressions.Regex (MSDN)
Either way, this pattern should work (in pseudo-code):
Replace "(0*)(.+)" by "$2"
0* means zero or more 0s
.+ means one or more of any character except a newline
$2 represents the second set of brackets (so we're simply discarding the (0*) part of the string).

Related

Using Regular Expression to match fields with an arbitrary delimiter

I suppose this is an old question; however, I didn't find a suitable solution in the forums after several hours of searching.
I'm using C# and I know the Regex.Split and String.Split methods can be used to achieve the expected results. For some reason, I need to use a regular expression to match the required fields by specifying an arbitrary delimiter. For example, here is the string:
#DIV#This#DIV#is#DIV#"A "#DIV#string#DIV#
Here, #DIV# is the delimiter, and the string should be split into:
This
is
"A "
string
How can I use a regular expression to match these values?
By the way, leading and trailing #DIV# delimiters should be ignored; for example, the source strings below should give the same result as above:
#DIV#This#DIV#is#DIV#"A "#DIV#string
This#DIV#is#DIV#"A "#DIV#string#DIV#
This#DIV#is#DIV#"A "#DIV#string
UPDATE:
I think I found a way (mind it is not efficient!) to get rid of empty values with a regex.
var splits = Regex.Matches(strIn, @"(?<=#DIV#|^)(?:(?!#DIV#).)+?(?=$|#DIV#)");
See demo on regexstorm (mind the \r? is only to demo in Multiline mode, you do not need it when using in real life)
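As a rough sketch, the same lookaround pattern can be built for any delimiter by escaping it first. The helper name Fields and the class FieldMatcher are made up for this example; it assumes fields never contain a line break, since "." does not match a newline by default.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldMatcher
{
    // Hypothetical helper: returns the non-empty fields between delimiters.
    public static string[] Fields(string input, string delimiter)
    {
        // Escape the delimiter so characters such as '#' or '|' are taken literally.
        string d = Regex.Escape(delimiter);
        string pattern = "(?<=" + d + "|^)(?:(?!" + d + ").)+?(?=$|" + d + ")";

        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] inputs =
        {
            "#DIV#This#DIV#is#DIV#\"A \"#DIV#string#DIV#",
            "This#DIV#is#DIV#\"A \"#DIV#string"
        };
        foreach (string input in inputs)
        {
            // Each input yields the four fields: This, is, "A ", string
            Console.WriteLine(string.Join("|", Fields(input, "#DIV#")));
        }
    }
}
```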
ORIGINAL ANSWER
Here is another approach using a regular Split:
var strIn = "#DIV#This#DIV#is#DIV#\"A # \"#DIV#string#DIV#";
var splitText = strIn.Split(new[] {"#DIV#"}, StringSplitOptions.RemoveEmptyEntries);
Or else, you can use a regex to match the fields you need and then remove empty items with LINQ:
var spltsTxt2 = Regex.Matches(strIn, @"(?<=#DIV#|^).*?(?=#DIV#|$)").Cast<Match>().Where(p => !string.IsNullOrEmpty(p.Value)).Select(p => p.Value).ToList();
Output:
#DIV#|(.+?)(?=#DIV#|$)
Try this. Grab the captures or groups. See demo.
https://www.regex101.com/r/fJ6cR4/21
You can use the following to match:
/#?DIV#?/g
And replace with ' ' (space)
But this will sometimes leave leading and trailing spaces, which can be removed with String.Trim().
Edit1: If you want to match the field values you can use the following:
(?<=(#?DIV#?)|^)[^#]*?(?=(#?DIV#?)|$)
Edit2: More generalized regex for matching # in fields:
(?m)(?<=(^(?!#?DIV#)|(#?DIV#)))(.*?)(?=($|(#DIV#?)))

Regex - find every occurrence of integer surrounded by space and comma

I have the following string:
"121 fd412 4151 3213, 421, 423 41241 fdsfsd"
And I need to get 3213 and 421, because they both have a space in front of them and a comma behind.
The results should be stored in a string array. How can I do that?
"\\d+" catches every integer.
"\s\\d+(,)" throws some memory errors.
EDIT.
space to the left (<-) of the number, comma to the right (->)
EDIT 2.
string mainString = "Tests run: 5816, 8346, 28364 iansufbiausbfbabsbo3 4";
MatchCollection c = Regex.Matches(mainString, @"\d+(?=\,)");
var myList = new List<String>();
foreach(Match match in c)
{
myList.Add(match.Value);
}
Console.Write(myList[1]);
Console.ReadKey();
Your regex syntax is incorrect. If you want to match both numbers as separate results, you could use:
@"\s(\d+),\s(\d+)\s"
Edit
@"\s(\d+),"
\s\\d+(,):
\s is not properly escaped, should be \\s, same as for \\d
\\d matches single digit, you need \\d+ - one or more consecutive digits
(,) captures comma, do you really need this? seems like you need to capture a number, so \\s(\\d+),
you said "because they both have space behind them, and a coma in front", so probably ,\\s(\\d+)
How about this expression:
" \d+," // expression without the quotes
It should find what you need.
You can read up on how to work with regular expressions on MSDN.
Hope it helps.
Another solution
\s(\d+), // or maybe you'll need a double slash \\
Output:
3213
421
I think you mean you're looking for something like ,<space><digit> not ,<digit><space>
If so, try this:
, (\d+) //you might need to add another backslash as the others have noted
Well, based on your new edit
\s(\d+),
It's all you need, only the numbers
\d+(?=\,)
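Combining the ideas from the answers above, here is a small sketch that requires whitespace before the number and a comma after it, while returning only the digits. It uses a lookbehind and a lookahead so that no capture group is needed; the names SpaceCommaNumbers and Find are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceCommaNumbers
{
    // Hypothetical helper: numbers preceded by whitespace and followed by a comma.
    public static string[] Find(string input)
    {
        // The verbatim string (@"...") avoids having to double the backslashes.
        return Regex.Matches(input, @"(?<=\s)\d+(?=,)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "121 fd412 4151 3213, 421, 423 41241 fdsfsd";
        Console.WriteLine(string.Join(" ", Find(input))); // prints: 3213 421
    }
}
```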

Regular expression to check if a string contains all specific symbols

The problem is that the possible strings are:
abcdefghijklmnopqrstuvwxyz
(sorted)
and I have another string containing the chars to find, like: adef
What is the regex to check whether all of the specified characters are in the string?
test cases:
string: amnosxy
find chars: osy
result: true
string: amnosxy
find chars: anz
result: false (z not found).
It works like a containsAll method.
What is the regex to check this? (It should be possible to build it dynamically from the find-chars string.)
I don't like solutions that loop over each char and check IndexOf.
No need to use regex:
bool containsAll = !"osy".Except("amnosxy").Any();
Another efficient approach is using a HashSet<char> and its IsSubsetOf method:
HashSet<char> chars = new HashSet<char>("osy");
bool containsAll = chars.IsSubsetOf("amnosxy");
I wouldn't use regular expressions for this if it is guaranteed that both arrays are sorted. Just loop input[x] through until you find toFind[y] or until toFind[y] is bigger than input[x] -> which would mean there is no such element.
Edit: alternative RegEx: .*o.*s.*y.*, so just put .* between all those chars.
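Both ideas from this answer can be sketched as follows. The method names are made up for the example, and both methods assume the two strings are sorted in ascending order and contain no duplicate characters; the regex variant in particular gives wrong answers on unsorted input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContainsAllDemo
{
    // Single pass over both strings; relies on both being sorted.
    public static bool ContainsAllSorted(string input, string toFind)
    {
        int x = 0;
        foreach (char c in toFind)
        {
            // Skip input characters that sort before the one we are looking for.
            while (x < input.Length && input[x] < c) x++;
            // Either the input is exhausted or we have passed the spot where c would be.
            if (x == input.Length || input[x] != c) return false;
        }
        return true;
    }

    // Builds ".*o.*s.*y.*" from "osy", escaping each character.
    public static string BuildPattern(string toFind)
    {
        return ".*" + string.Join(".*", toFind.Select(c => Regex.Escape(c.ToString()))) + ".*";
    }

    public static void Main()
    {
        Console.WriteLine(ContainsAllSorted("amnosxy", "osy"));            // True
        Console.WriteLine(ContainsAllSorted("amnosxy", "anz"));            // False
        Console.WriteLine(BuildPattern("osy"));                            // .*o.*s.*y.*
        Console.WriteLine(Regex.IsMatch("amnosxy", BuildPattern("osy")));  // True
        Console.WriteLine(Regex.IsMatch("amnosxy", BuildPattern("anz")));  // False
    }
}
```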

Check Formatting of a String

This has probably been answered somewhere before, but there are millions of unrelated posts about string formatting, so I couldn't find it.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one ':'. (Achieved using Split(), which I needed to use anyway to separate the two parts.)
The integer before the ':' is a 1-3 digit int. (Simple int.Parse logic.)
The () exists, and the "Something", in this case any string less than 10 characters, is there.
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ','.
How can I best do this?
EDIT: I have crossed out what I've achieved so far.
A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character alpha-numeric string,
\((true|false)(;true|;false)*\)
followed by parenthesis containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by a set of square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit: @Ian Nelson
This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.
I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,10}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want to allow anything inside the (), you can use the following regex:
@"^\d{1,3}:[a-zA-Z]{1,10}\([^:\(]*\)\[\d(,\d)*\]$"
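To see how such a pattern behaves on good and bad input, here is a small sketch. It is a variation on the answers above, not the exact pattern from any of them: it assumes the name consists of 1 to 9 letters ("less than 10 characters") and that the bracketed integers may have more than one digit (\d+ instead of \d). The names FormatChecker and IsValid are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatChecker
{
    // Assumptions: name is 1-9 letters; list items are integers of any length.
    private static readonly Regex Pattern = new Regex(
        @"^\d{1,3}:[a-zA-Z]{1,9}\((true|false)(;(true|false))*\)\[\d+(,\d+)*\]$");

    public static bool IsValid(string s)
    {
        return Pattern.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("24:Something(true;false;true)[0,1,0]")); // True
        Console.WriteLine(IsValid("24:Something(true)[10,200]"));           // True
        Console.WriteLine(IsValid("1234:Something(true)[0]"));              // False: 4 digits before ':'
        Console.WriteLine(IsValid("24:Something(true;false)[]"));           // False: empty []
        Console.WriteLine(IsValid("24:Something[0]"));                      // False: missing ()
    }
}
```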

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the numbers, including the numbers in the extension. How do I make it NOT replace the numbers in the extension? I tried numerous regex strings, but I'm beginning to think it's an all-or-nothing pattern and I can't do this. Do I need to separate the extension from the string, or can I use a regex?
This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by which characters you expect in your extension.
I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.
Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length-4)
..and then append
yourString.Substring(yourString.Length-4, 4)
Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
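If the extension is not always three characters long, one possible variation on the Substring approach is to let System.IO.Path do the splitting. This is only a sketch: the names DigitMasker and MaskDigits are made up, and it assumes the input is a bare file name without a directory part (Path.GetFileNameWithoutExtension would drop the directory).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class DigitMasker
{
    // Hypothetical helper: replaces digits in the base name only.
    public static string MaskDigits(string fileName)
    {
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName); // includes the leading "."

        return Regex.Replace(baseName, "[0-9]", "#") + extension;
    }

    public static void Main()
    {
        Console.WriteLine(MaskDigits("aa514a3a.4s5")); // prints: aa###a#a.4s5
    }
}
```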
