Remove anything from string after any "a-zA-Z" char - c#

I have strings like these:
"10a10", "10b5641", "5a1121", "438z2a5f"
and I need to remove anything after the FIRST a-zA-Z char in the string (the symbol itself should be removed as well). What could be a solution?
Examples of results I expect:
"10a10" returns "10"
"10b5641" returns "10"
"5a1121" returns "5"
"438z2a5f" returns "438"

You could use a regular expression with Regex.Replace, something like:
string str = "10a10";
str = Regex.Replace(str, @"[a-zA-Z].*", "");
Console.WriteLine(str);
will output:
10
Basically, it matches the first a-zA-Z character and everything after it (.* matches any character zero or more times) and removes that from the string.

An easy to understand approach would be to use the String.IndexOfAny Method to find the Index of the first a-zA-Z char, and then use the String.Substring Method to cut the string accordingly.
To do so you would create an array containing all a-zA-Z characters and use this as an argument to String.IndexOfAny. After that you use 0 and the result of String.IndexOfAny as arguments for String.Substring.
I am pretty sure there are more elegant ways to do this, but this seems the most basic approach to me, so it's worth mentioning.
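The IndexOfAny/Substring steps described above could be sketched roughly as follows. The class and method names (PrefixCutter, BeforeFirstLetter) are made up for illustration, and the sketch assumes that an input without any letter should be returned unchanged.

```csharp
using System;

public static class PrefixCutter
{
    // Every ASCII letter a-z and A-Z, used as the search set for IndexOfAny.
    private static readonly char[] Letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

    // Returns the part of the input before the first ASCII letter.
    // Assumption: when no letter is present, the whole input is returned.
    public static string BeforeFirstLetter(string input)
    {
        int index = input.IndexOfAny(Letters);
        return index < 0 ? input : input.Substring(0, index);
    }
}
```

For example, PrefixCutter.BeforeFirstLetter("438z2a5f") cuts at the z. The check for a negative index matters because Substring throws when given a negative length.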

You could do so using Linq as follows.
var result = new string(strInput.TakeWhile(x => !char.IsLetter(x)).ToArray());

var sList = new List<string> { "10a10", "10b5641", "5a1121", "438z2a5f" };
foreach (string s in sList.ToArray())
{
string number = new string(s.TakeWhile(c => !Char.IsLetter(c)).ToArray());
Console.WriteLine(number);
}

Either Linq:
var result = string.Concat(strInput
.TakeWhile(c => !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))));
Or regular expression:
using System.Text.RegularExpressions;
...
var result = Regex.Match(strInput, "^[^A-Za-z]*").Value;
In both cases, characters are taken from the beginning of strInput until the first a-z or A-Z character occurs.
Demo:
string[] tests = new[] {
"10a10", "10b5641", "5a1121", "438z2a5f"
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-10} returns \"{Regex.Match(test, "^[^A-Za-z]*").Value}\""));
Console.Write(demo);
Outcome:
10a10 returns "10"
10b5641 returns "10"
5a1121 returns "5"
438z2a5f returns "438"

Related

Get a number and string from string

I have a kinda simple problem, but I want to solve it in the best way possible. Basically, I have a string in this kind of format: <some letters><some numbers>, e.g. q1 or qwe12. What I want to do is get two strings from that (then I can convert the number part to an integer, or not, whatever). The first one is the "string part" of the given string, e.g. qwe, and the second one is the "number part", e.g. 12. And there won't be a situation where the numbers and letters are mixed up, like qw1e2.
Of course, I know, that I can use a StringBuilder and then go with a for loop and check every character if it is a digit or a letter. Easy. But I think it is not a really clear solution, so I am asking you is there a way, a built-in method or something like this, to do this in 1-3 lines? Or just without using a loop?
You can use a regular expression with named groups to identify the different parts of the string you are interested in.
For example:
string input = "qew123";
var match = Regex.Match(input, "(?<letters>[a-zA-Z]+)(?<numbers>[0-9]+)");
if (match.Success)
{
Console.WriteLine(match.Groups["letters"]);
Console.WriteLine(match.Groups["numbers"]);
}
You can try Linq as an alternative to regular expressions:
string source = "qwe12";
string letters = string.Concat(source.TakeWhile(c => c < '0' || c > '9'));
string digits = string.Concat(source.SkipWhile(c => c < '0' || c > '9'));
You can use the Where() extension method from the System.Linq library (https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where) to keep only the chars that are digits, then convert the resulting IEnumerable<char> to an array of chars that can be used to create a new string:
string source = "qwe12";
string stringPart = new string(source.Where(c => !Char.IsDigit(c)).ToArray());
string numberPart = new string(source.Where(Char.IsDigit).ToArray());
MessageBox.Show($"String part: '{stringPart}', Number part: '{numberPart}'");
Source:
https://stackoverflow.com/a/15669520/8133067
if possible add a space between the letters and numbers (q 3, zet 64 etc.) and use string.split
otherwise, use the for loop, it isn't that hard
You can test as part of an aggregation:
var z = "qwe12345";
var b = z.Aggregate(new []{"", ""}, (acc, s) => {
if (Char.IsDigit(s)) {
acc[1] += s;
} else {
acc[0] += s;
}
return acc;
});
Assert.Equal(new [] {"qwe", "12345"}, b);

get measurement value only from string

I have a string which gives the measurement followed by the units in either cm, m or inches.
For example :
The number could be 112cm, 1.12m, 45inches or 45in.
I would like to extract only the number part of the string. Any idea how to use the units as the delimiters to extract the number ?
While I am at it, I would like to ignore the case of the units.
Thanks
You can try:
string numberMatch = Regex.Match(measurement, @"\d+\.?\d*").Value;
EDIT
Furthermore, converting this to a double is trivial:
double result;
if (double.TryParse(numberMatch, out result))
{
// Yeiiii I've got myself a double ...
}
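One caveat with the conversion above: double.TryParse without a culture argument uses the current culture, so "1.12" may fail to parse or parse differently on a machine whose decimal separator is a comma. A minimal sketch that pins the culture, assuming the measurements always use a dot as the decimal separator (MeasurementParser and TryGetNumber are illustrative names, not part of the original answer):

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class MeasurementParser
{
    // Extracts the first number from a string such as "112cm" or "1.12m"
    // and parses it with the invariant culture (dot as decimal separator).
    // Returns false when the string contains no digits.
    public static bool TryGetNumber(string measurement, out double value)
    {
        string numberText = Regex.Match(measurement, @"\d+\.?\d*").Value;
        return double.TryParse(numberText, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value);
    }
}
```

Since the pattern only looks at the digits, the case of the unit (cm, CM, Inches) does not matter here.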
Use String.Split http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
Something like:
var units = new[] {"cm", "inches", "in", "m"};
var splitnumber = mynumberstring.Split(units, StringSplitOptions.RemoveEmptyEntries);
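var number = Convert.ToDouble(splitnumber[0], CultureInfo.InvariantCulture); // Convert.ToInt32 would throw on "1.12"; needs using System.Globalization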
Using Regex this can help you out:
(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)
Break up:
(?i) = ignores character case // specify it in C#; the live demo does not have it
\d+(\.\d+)? = supports numbers like 2, 2.25 etc
(?=c?m|in(ch(es)?)?) = positive lookahead, check units after the number if they are
m, cm,in,inch,inches, it allows otherwise it is not.
?: = specifies that the group will not capture
? = specifies the preceding character or group is optional
Demo
EDIT
Sample code:
MatchCollection mcol = Regex.Matches(sampleStr, @"(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)");
foreach(Match m in mcol)
{
Debug.Print(m.ToString()); // see output window
}
I guess I'd try to replace with "" every character that is not number or ".":
//s is the string you need to convert
string tmp=s;
foreach (char c in s.ToCharArray())
{
if (!(c >= '0' && c <= '9') && !(c =='.'))
tmp = tmp.Replace(c.ToString(), "");
}
s=tmp;
Try using regular expression \d+ to find an integer number.
resultString = Regex.Match(measurementunit, @"\d+").Value;
Is it a requirement that you use the unit as the delimiter? If not, you could extract the number using regex (see Find and extract a number from a string).

RegExp: X number of matches => X number of replacements?

Using regular expressions I'm trying to match a string, which has a substring consisting of unknown number of repeats (one or more) and then replace the repeating substring with the same number of replacement strings.
If the Regexp is "(st)[a]+(ck)", then I want to get these kind of results:
"stack" => "stOck"
"staaack" => "stOOOck" //so three times "a" to be replaced with three times "O"
"staaaaack" => "stOOOOOck"
How do I do that?
Either C# or AS3 would do.
If you use .net you can do this
find: (?<=\bsta*)a(?=a*ck\b)
replace: O
If you want to change all sta+ck that are substring of other words, only remove the \b
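In C# the find/replace pair above could be applied roughly like this. It relies on .NET allowing a variable-length lookbehind (the a* inside (?<=...)). The class and method names are only illustrative, and "O" is used as the replacement to match the results asked for in the question.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundReplace
{
    // Replaces each 'a' that sits between "st" and "ck" in a whole word.
    // Every 'a' is matched on its own, so n letters give n replacements.
    public static string ReplaceRepeats(string input)
    {
        return Regex.Replace(input, @"(?<=\bsta*)a(?=a*ck\b)", "O");
    }
}
```

Because of the \b anchors, a word such as stackoverflow is left alone; dropping them changes that, as noted above.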
Since I am not familiar with either C# or AS3, I will write a solution in JavaScript, but the concept in the solution can be used for C# code or AS3 code.
var str = "stack stackoverflow staaaaaack stOackoverflow should not replace";
var replaced = str.replace(/st(a+)ck/g, function ($0, $1) {
var r = "";
for (var i = 0; i < $1.length; i++) {
r += "O";
}
return "st" + r + "ck";
});
Output:
"stOck stOckoverflow stOOOOOOck stOackoverflow should not replace"
In C#, you would use Regex.Replace(String, String, MatchEvaluator) (or other Regex.Replace methods that takes in a MatchEvaluator delegate) to achieve the same effect.
In AS3, you can pass a function as replacement, similar to how I did above in JavaScript. Check out the documentation of String.replace() method.
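A C# counterpart of the JavaScript snippet might look like the sketch below, using the Regex.Replace overload that takes a MatchEvaluator (here written as a lambda). EvaluatorReplace and ReplaceRepeats are hypothetical names chosen for the example.

```csharp
using System.Text.RegularExpressions;

public static class EvaluatorReplace
{
    // For every "st" + one or more 'a' + "ck", builds a replacement with
    // as many 'O' characters as there were 'a' characters in the match.
    public static string ReplaceRepeats(string input)
    {
        return Regex.Replace(input, "st(a+)ck",
            m => "st" + new string('O', m.Groups[1].Length) + "ck");
    }
}
```

Like the JavaScript version, this has no word boundaries, so it also rewrites sta+ck inside longer words.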
For AS3 you can pass a function to the replace method on the String object where matching elements are into the arguments array. So you can build and return a new String with all the 'a' replaced by 'O'
for example:
// first way explicit loop
var s:String="staaaack";
trace("before", s);
var newStr:String = s.replace(/(st)(a+)(ck)/g, function():String{
var ret:String=arguments[1]; // here match 'st'
//arguments[2] match 'aaa..'
for (var i:int=0, len:int=arguments[2].length; i < len; i++)
ret += "O";
return ret + arguments[3]; // arguments[3] match 'ck'
});
trace("after", newStr); // output stOOOOck
// second way array and join
var s1:String="staaaack staaaaaaaaaaaaack stack paaaack"
trace("before", s1)
var after:String = s1.replace(/(st)(a+)(ck)/g, function():String{
return arguments[1]+(new Array(arguments[2].length+1)).join("O")+arguments[3]
})
trace("after", after)
here live example on wonderfl : http://wonderfl.net/c/bOwE
Why not use the String Replace() method instead?
var str = "stack";
str = str.Replace("a", "O");
I would do it like this:
String s = "Staaack";
Console.WriteLine(s);
while (Regex.Match(s,"St[O]*([a]{1})[a]*ck").Success){
s = Regex.Replace(s,"(St[O]*)([a]{1})([a]*ck)", "$1O$3");
Console.WriteLine(s);
}
Console.WriteLine(s);
Console.ReadLine();
It replaces one a with each iteration, until no more a's can be found.

Split the string with different conditions using Linq in C#

I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.
Some Examples:
"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""
How can I achieve that?
An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
both cases are WITHOUT checks. The OP can add checks if he wants them.
For the second question, using LINQ is really too difficult. With a Regex it's "easily doable".
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex Regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex Regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = Regex.Replace(str, "$1$4");
var str2 = Regex.Replace(str, "$3");
The difference between the two is that the first will use A-Z as upper case characters, the second one will use other "upper case" characters, for example ÀÈÉÌÒÙ
With code sample: http://ideone.com/FqcmY
This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:
Match lastSeparator = Regex.Match(strExample,
@"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
Use RegexOptions.RightToLeft to find the last match.
Use of Match.Result for a replace.
$`$' as replacement string: http://www.regular-expressions.info/refreplace.html
\p{Lu} for upper-case letters, you can change that to [A-Z] if you're more comfortable with that.
If the word shouldn't follow an upper case word, you can simplify the regex to:
@"[-/ ;(](\p{Lu}+)\b"
If you want other characters as well, you can use a character class (and maybe remove \b). For example:
@"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK
use a List of strings, set all the words to it
find the index of the / then use ElementAt() to determine the word to split which is "SHAM" in your question.
in the below sentence of yours your index of / will be 6.
string strSentence ="This TASK is assigned to ANIL/SHAM in our project";
then use ElementAt(6) at the end of
index is the index of the / in your List<string>
str = str.Select(s => strSentence.ElementAt(index+1)).ToList();
this will return you the SHAM
str = str.Delete(s => strSentence.ElementAt(index+1));
this will delete the SHAM then just print the strSentence without SHAM
if you dont want to use a list of strings you can use the " " to determinate the words in your sentence i think, but that would be a long way to go.
the idea of mine is right i think but the code may not be that flawless.
You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use linq

c# Regex substring after second time char appear

My problem is that I have a string in format like that:
dsadadsadas
dasdasda
dasda
4TOT651.43|0.00|651.43|98933|607.75|0.00|607.75|607.75|7621|14|0|0|799.42
dsda
dasad
das
I need to find the line that contains 4TOT and extract the value between the second and third '|'. Any ideas how I can obtain that with a regex or substring?
For now I have only this:
var test = Regex.Match(fileContent, "4TOT.*").Value;
Which finds me entire line.
When the input is simple and follows a strict format like this, I usually prefer to use plain old string handling over regex. In this case it's spiced up with some LINQ for simpler code:
// filter out lines to use
var linesToUse = input
.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
.Where(s => s.StartsWith("4TOT"));
foreach (string line in linesToUse)
{
// pick out the value
string valueToUse = line.Split('|')[2];
// more code here, I guess
}
If you know that the input contains only one line that you are interested in, you can remove the loop:
string line = input
.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
.Where(s => s.StartsWith("4TOT"))
.FirstOrDefault();
string value = string.IsNullOrEmpty(line) ? string.Empty : line.Split('|')[2];
Update
Here is an approach that will work well when loading the input from a file instead:
foreach (var line in File.ReadLines(@"c:\temp\input.txt")
.Where(s => s.StartsWith("4TOT")))
{
string value = string.IsNullOrEmpty(line) ? string.Empty : line.Split('|')[2];
Console.WriteLine(value);
}
File.ReadLines is new in .NET 4 and enumerates the lines in the file without loading the full file into memory, but instead it reads it line by line. If you are using an earlier version of .NET you can fairly easily make your own method providing this behavior.
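For earlier versions of .NET, a home-made replacement could be sketched as an iterator around StreamReader, as below. LineReader is a hypothetical helper name, and the sketch assumes the reader's default encoding detection is acceptable.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineReader
{
    // Lazily yields the lines of a file one at a time, so the whole file
    // is never held in memory. The reader is disposed when enumeration
    // finishes or the enumerator is disposed early.
    public static IEnumerable<string> ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

It can be dropped into the loop above in place of File.ReadLines; the file is only opened once enumeration actually starts.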
What about this regex?
Seems to be working for me.
4TOT.*?\|.*?\|(.*?)\|
Captures the value you're looking for into a group.
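Reading the captured group in C# could look roughly like this. TotField and ThirdField are made-up names, and returning an empty string when nothing matches is an assumption of the sketch.

```csharp
using System.Text.RegularExpressions;

public static class TotField
{
    // Finds the 4TOT line and returns the text between its second and
    // third '|'. Since '.' does not match a newline by default, the
    // match cannot run over into the following lines.
    public static string ThirdField(string fileContent)
    {
        Match match = Regex.Match(fileContent, @"4TOT.*?\|.*?\|(.*?)\|");
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```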
Why don't you split your string twice: firstly with newline and then if target substring is found by '|' symbol without using of regex?
var tot = source.Split(Environment.NewLine.ToCharArray())
.FirstOrDefault(s => s.StartsWith("4TOT"));
if (tot != null)
{
// gets 651.43
var result = tot.Split('|')
.Skip(2)
.FirstOrDefault();
}
Use the regex : ^4TOT(?:(?:[0-9]*(?:\.[0-9]*)?)\|){2}([0-9]*(?:\.[0-9]*)?).*
This regex matches 4TOT at the beginning, followed twice by "a number (with an optional decimal part) then |", and then captures the next number. The rest is ignored.
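If you then use (RegexOptions.Multiline lets ^ match at the start of every line of the input):
Match match = Regex.Match(input, pattern, RegexOptions.Multiline);
You will find the answer in match.Groups[1]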
Memo:
Numbers are [0-9]*\.[0-9]*
Using the (?: ... ) makes a non-capturing parenthesis
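Put together, this approach might be used as in the sketch below. Two things in it are assumptions rather than part of the answer as posted: the dots are escaped so they only match a literal decimal point, and RegexOptions.Multiline is passed so that ^ can match at the start of the 4TOT line inside a multi-line input. The trailing .* is dropped because only the captured group is read. TotNumber and ThirdNumber are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class TotNumber
{
    // 4TOT at the start of a line, then two "number|" fields, then the
    // number to capture. (?: ... ) groups do not create captures, so the
    // wanted value is group 1.
    private const string Pattern =
        @"^4TOT(?:[0-9]*(?:\.[0-9]*)?\|){2}([0-9]*(?:\.[0-9]*)?)";

    public static string ThirdNumber(string input)
    {
        Match match = Regex.Match(input, Pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```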
