How to pull out alpha and count digits using regex? - c#

I want to build a regex in C#. I need to know how to pull out the alphabetic part of a string and count its digits using a regex.
string example = "ASDFG 3457";
I need to pull out "ASDFG" and then count the digits (e.g. 4, or 5 to 7). If 4 digits are found, the return value should be 3457, without the letters. How can I do this in C#?
I know it is better to do this without a regex, but I have a requirement that I must use one.

If all you're doing is trying to get the numbers from a piece of text, you can do this:
string expr = @"\d+";
string text = "ASDFG 3457";
MatchCollection mc = Regex.Matches(text, expr);
foreach (Match m in mc)
{
Console.WriteLine(m);
}

regex
(?<alpha>\w*) (?<number>\d*)
This extracts two named groups: alpha and number.
It assumes the first group contains only word characters and the second only digits, and that they are separated by a single space.
Neither of them is mandatory.
If you need to make them mandatory, replace * with +.
You can also force the number of digits to exactly four with \d{4}.
I'd recommend reading a regex tutorial and taking some C# samples from the web. @Srb1313711's answer already helps you with that.
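A minimal sketch of how the named groups above could be used from C#, assuming the input has the shape "letters, one space, digits". The names AlphaDigits and TryExtract are made up for illustration, and [A-Za-z] replaces \w so that digits cannot end up in the alpha group (an assumption about the data). The digit count is simply the length of the number group.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlphaDigits
{
    // Same idea as the pattern above, with + so that both parts are mandatory.
    private static readonly Regex Pattern =
        new Regex(@"^(?<alpha>[A-Za-z]+) (?<number>\d+)$");

    // Returns true and fills the out parameters when the input matches.
    public static bool TryExtract(string input, out string alpha, out string digits)
    {
        Match m = Pattern.Match(input);
        alpha = m.Success ? m.Groups["alpha"].Value : null;
        digits = m.Success ? m.Groups["number"].Value : null;
        return m.Success;
    }

    static void Main()
    {
        if (TryExtract("ASDFG 3457", out string alpha, out string digits))
        {
            // digits.Length is the digit count the question asks about.
            Console.WriteLine($"{alpha}: {digits} ({digits.Length} digits)");
        }
    }
}
```

The caller can then decide what to do with a 4-digit value versus a 5 to 7 digit one by checking digits.Length.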

Obviously (cough) the simplest "solution" is here:
using System;
using System.Collections.Generic;
class Program
{
private static IEnumerable<long> ParseNumbers(IEnumerable<char> stream)
{
bool eos = false;
using (var it = stream.GetEnumerator())
do
{
Func<bool> advance = () => !(eos = !it.MoveNext());
while (advance() && !char.IsDigit(it.Current)) ;
if (eos) break;
long accum = 0;
do accum = accum * 10 + (it.Current - '0');
while (advance() && char.IsDigit(it.Current));
yield return accum;
}
while (!eos);
}
static void Main()
{
foreach (var num in ParseNumbers("ASDFG 3457 ASDFG.\n 123457"))
{
Console.WriteLine(num);
}
}
}
For fun, of course.
Edit
For more fun: the unsafe variation. Note this is also no longer deferred, so it won't work if not all input has arrived yet, and it generates an eager list of values:
using System;
using System.Collections.Generic;
class Program
{
private static unsafe List<long> ParseNumbers(char[] input)
{
var r = new List<long>();
fixed (char* begin = input)
{
char* it = begin, end = begin + input.Length;
while (true)
{
while (it != end && (*it < '0' || *it > '9'))
++it;
if (it == end) break;
long accum = 0;
while (it != end && *it >= '0' && *it <= '9')
accum = accum * 10 + (*(it++) - '0');
r.Add(accum);
}
}
return r;
}
static void Main()
{
foreach (var number in ParseNumbers("ASDFG 3457 ASDFG.\n 123457".ToCharArray()))
{
Console.WriteLine(number);
}
}
}

Description
This regular expression will:
capture the text into group 1
count the number of digits and place them into a capture group based on how many were found
Capture group 2 will have numbers which are 8 or more digits long
Capture group 3 will have numbers which are 5-7 digits long
Capture group 4 will have numbers which are exactly 4 digits long
Capture group 5 will have numbers which are 1-3 digits long
([A-Za-z]*) (?:(\d{8,})|(\d{5,7})|(\d{4})|(\d{1,3}))
Example
Live Demo: http://www.rubular.com/r/AIO9uUNNQc
Sample Text
ASDFG 1234567890
ASDFG 123456789
ASDFG 12345678
ASDFG 1234567
ASDFG 123456
ASDFG 12345
ASDFG 1234
ASDFG 123
ASDFG 12
ASDFG 1
Capture Groups
[0][0] = ASDFG 1234567890
[0][1] = ASDFG
[0][2] = 1234567890
[0][3] =
[0][4] =
[0][5] =
[1][0] = ASDFG 123456789
[1][1] = ASDFG
[1][2] = 123456789
[1][3] =
[1][4] =
[1][5] =
[2][0] = ASDFG 12345678
[2][1] = ASDFG
[2][2] = 12345678
[2][3] =
[2][4] =
[2][5] =
[3][0] = ASDFG 1234567
[3][1] = ASDFG
[3][2] =
[3][3] = 1234567
[3][4] =
[3][5] =
[4][0] = ASDFG 123456
[4][1] = ASDFG
[4][2] =
[4][3] = 123456
[4][4] =
[4][5] =
[5][0] = ASDFG 12345
[5][1] = ASDFG
[5][2] =
[5][3] = 12345
[5][4] =
[5][5] =
[6][0] = ASDFG 1234
[6][1] = ASDFG
[6][2] =
[6][3] =
[6][4] = 1234
[6][5] =
[7][0] = ASDFG 123
[7][1] = ASDFG
[7][2] =
[7][3] =
[7][4] =
[7][5] = 123
[8][0] = ASDFG 12
[8][1] = ASDFG
[8][2] =
[8][3] =
[8][4] =
[8][5] = 12
[9][0] = ASDFG 1
[9][1] = ASDFG
[9][2] =
[9][3] =
[9][4] =
[9][5] = 1
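As a rough sketch, the same pattern could be used from C# by checking which of the digit groups succeeded. The names DigitBuckets and MatchedGroup are hypothetical; only the pattern itself comes from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitBuckets
{
    // The pattern from this answer, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"([A-Za-z]*) (?:(\d{8,})|(\d{5,7})|(\d{4})|(\d{1,3}))");

    // Returns the index (2 to 5) of the digit group that matched,
    // or 0 when the input does not match at all.
    public static int MatchedGroup(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return 0;
        for (int i = 2; i <= 5; i++)
        {
            if (m.Groups[i].Success) return i;
        }
        return 0;
    }

    static void Main()
    {
        foreach (string s in new[] { "ASDFG 3457", "ASDFG 123456", "ASDFG 12" })
        {
            Console.WriteLine($"{s} -> group {MatchedGroup(s)}");
        }
    }
}
```

For the three inputs in Main, the matching groups are 4, 3 and 5 respectively, in line with the capture group listing above.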

Related

Selecting multiple months for a MonthlyTrigger using C#

I need to create scheduled Windows tasks from a C# app. I have a comma-separated string that stores the months I'd like to run the task in. The string contains the short values of the MonthsOfTheYear type, e.g. "1,2,4,16,128,1024".
The example I have shows that you can assign multiple months separated by a pipe, as follows:
MonthlyTrigger mt = new MonthlyTrigger();
mt.StartBoundary = Convert.ToDateTime(task.getStartDateTime());
mt.DaysOfMonth = new int[] { 10, 20 };
mt.MonthsOfYear = MonthsOfTheYear.July | MonthsOfTheYear.November;
My question is: how do I assign multiple months to the trigger dynamically, using the values from the comma-separated string?
I'm not quite sure what your problem is, and you didn't post the code of your trigger or your enum. Because of this, I'll provide a complete example, with a list for comparison:
public class MonthlyTrigger
{
[Flags] // Important because we want to set multiple values to this type
public enum MonthOfYear
{
Jan = 1, // 1st bit
Feb = 2, // 2nd bit..
Mar = 4,
Apr = 8,
May = 16,
Jun = 32,
Jul = 64,
Aug = 128,
Sep = 256,
Oct = 512,
Nov = 1024,
Dec = 2048
}
public HashSet<int> Months { get; set; } = new HashSet<int>(); // classical list to check months
public MonthOfYear MonthFlag { get; set; } // our new type
}
public static void Main(string[] args)
{
MonthlyTrigger mt = new MonthlyTrigger();
string monthsFromFileOrSomething = "1,3,5,7,9,11"; // fake some string..
IEnumerable<int> splittedMonths = monthsFromFileOrSomething.Split(',').Select(s => Convert.ToInt32(s)); // split to values and convert to integers
foreach (int month in splittedMonths)
{
mt.Months.Add(month); // adding to list (hashset)
// Here we "add" another month to our Month-Flag => "Flag = Flag | Month"
MonthlyTrigger.MonthOfYear m = (MonthlyTrigger.MonthOfYear)Convert.ToInt32(Math.Pow(2, month - 1));
mt.MonthFlag |= m;
}
Console.WriteLine(String.Join(", ", mt.Months)); // let's see our list
Console.WriteLine(mt.MonthFlag); // what is contained in our flag?
Console.WriteLine(Convert.ToString((int)mt.MonthFlag, 2)); // how is it binarily-stored?
// Or, if you prefer a more compact form (month numbers are shifted into flag bits and OR-ed together):
mt.MonthFlag = 0;
foreach (MonthlyTrigger.MonthOfYear m in monthsFromFileOrSomething.Split(',').Select(s => (MonthlyTrigger.MonthOfYear)(1 << (Convert.ToInt32(s) - 1))))
mt.MonthFlag |= m;
return;
}
Adding or removing single flags in an enum:
MyEnumType myEnum = (MyEnumType)1; // enum with the first flag set
myEnum |= (MyEnumType)2; // Adds the second bit. Of course you can use the enum name here: "MyEnumType.MyValueForBitTwo"
// Because:
// 0000 0001
// | 0000 0010 "or"
// = 0000 0011
myEnum &= ~(MyEnumType)2; // Clears the second bit.
// Because:
// 0000 0011
// & 1111 1101 "and"
// = 0000 0001
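The question's string already holds flag values ("1,2,4,16,128,1024") rather than month numbers, so a possible sketch for that case is to cast each value and OR the results together. The Months enum below is a hypothetical stand-in for the scheduler library's MonthsOfTheYear, declared only so the sketch compiles on its own; MonthFlags and Parse are made-up names as well.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the library's MonthsOfTheYear enum.
[Flags]
enum Months
{
    None = 0,
    January = 1, February = 2, March = 4, April = 8,
    May = 16, June = 32, July = 64, August = 128,
    September = 256, October = 512, November = 1024, December = 2048
}

static class MonthFlags
{
    // Combines comma-separated flag values with bitwise OR.
    public static Months Parse(string csv)
    {
        return csv.Split(',')
                  .Select(s => (Months)int.Parse(s.Trim()))
                  .Aggregate(Months.None, (acc, m) => acc | m);
    }

    static void Main()
    {
        Months months = Parse("1,2,4,16,128,1024");
        Console.WriteLine(months);                          // names of the set flags
        Console.WriteLine(months.HasFlag(Months.November)); // True
    }
}
```

With the real library type, the result of such a helper would be assigned to mt.MonthsOfYear in place of the hard-coded July | November expression.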

What corner case am I missing in my algorithm that determines whether 0 or 1 character removals can make the character counts equal?

For example, it is not possible to remove 0 or 1 characters from "aaabbbcc" to make an equal number of each character, but it is for "aaabbcc" (remove 1 'a').
My algorithm passes 12 of 15 test cases, and the inputs for the ones it fails are too large to debug by hand. My guess is that there's some corner case I'm missing. Let me know if you see any flaws in the logic, which I've detailed in the comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Solution
{
static void Main(String[] args)
{
string s = Console.ReadLine();
// s = source string, consists of characters in range 'a', ..., 'z'
int[] f = Enumerable.Repeat(0, 26).ToArray();
foreach(var c in s)
f[c - 'a'] += 1;
// f[0] = # of 'a' characters, ..., f[25] = # of 'z' characters
var ff = new Dictionary<int,int>();
foreach(var k in f.Where(x => x > 0)) // for non-zero character frequencies
{
// run an "add or increment" on the frequencies of character frequencies
if(ff.ContainsKey(k)) ff[k] += 1;
else ff[k] = 1;
if(ff.Keys.Count() > 2)
{
// if there are more than 2 distinct counts of characters, it is impossible to level
// the counts by removing a single character
Console.WriteLine("NO");
return;
}
}
// ff[k] = number of distinct characters that appear exactly k times in s, excluding zero counts
var ffl = ff.ToList();
if(ffl.Count == 2)
{
// If there are 2 distinct frequencies of characters, one of those frequencies must be
// represented only 1 time and its frequency must be 1 more than the other one.
// For example, if s="aaabbcc" then ffl = { {3,1}, {2,2} }, meaning that there is 1 example
// of a character that appears 3 times (the 'a') and 2 examples of characters that appear 2
// times (the 'b' and 'c'). So we can remove 1 'a' to level the character counts.
// For another example, if s="aaabbbcc", then ffl = { {3,2}, {2,1} }, meaning that there are 2
// examples of a character appearing 3 times (the 'a' and 'b') and 1 example of a
// character appearing 2 times (the 'c'). So we have no way of leveling the character counts.
KeyValuePair<int,int> oneff, other;
if(ffl[0].Value == 1)
{
oneff = ffl[0];
other = ffl[1];
}
else if(ffl[1].Value == 1)
{
oneff = ffl[1];
other = ffl[0];
}
else
{
Console.WriteLine("NO");
return;
}
if((oneff.Key - 1) != other.Key)
{
Console.WriteLine("NO");
return;
}
}
// if we're here, there are 1 or fewer distinct counts of characters, meaning the
// frequency of characters in the string is already level
Console.WriteLine("YES");
}
}
EDIT: I found the corner case on my own. If the frequency that is represented 1 time is 1, then we're good. For example, with
a -> 111
b -> 111
c -> 111
d -> 111
e -> 111
f -> 111
g -> 111
h -> 111
i -> 111
j -> 0
k -> 0
l -> 0
m -> 0
n -> 1
o -> 0
p -> 0
q -> 0
r -> 0
s -> 0
t -> 0
u -> 0
v -> 0
w -> 0
x -> 0
y -> 0
z -> 0
we can remove the n character. I changed
if((oneff.Key - 1) != other.Key)
to
if((oneff.Key != 1) && (oneff.Key - 1) != other.Key)
and it worked.
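A compact sketch of the same check with the corner case folded in, not the code from the post. It compares the smallest and largest character counts directly, so it does not depend on which dictionary entry is picked as the "single" one; that matters for an input such as "aabbb", where both counts occur exactly once. The names CharLevel and CanLevel are made up for illustration.

```csharp
using System;
using System.Linq;

static class CharLevel
{
    // True if removing 0 or 1 characters makes all remaining character counts equal.
    public static bool CanLevel(string s)
    {
        // Map each distinct character count to the number of characters that have it.
        var ff = s.GroupBy(c => c)
                  .GroupBy(g => g.Count())
                  .ToDictionary(g => g.Key, g => g.Count());

        if (ff.Count <= 1) return true;   // already level (or empty input)
        if (ff.Count > 2) return false;   // three or more distinct counts

        int lo = ff.Keys.Min();
        int hi = ff.Keys.Max();

        // Corner case: exactly one character occurs once; remove it entirely.
        if (lo == 1 && ff[lo] == 1) return true;

        // Otherwise exactly one character must have a single occurrence too many.
        return hi == lo + 1 && ff[hi] == 1;
    }

    static void Main()
    {
        Console.WriteLine(CanLevel("aaabbcc") ? "YES" : "NO");
        Console.WriteLine(CanLevel("aaabbbcc") ? "YES" : "NO");
    }
}
```

For the two strings from the question, this prints YES and then NO.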

Sort List<string> by Leading Numbers

I am having trouble properly sorting my list based on the leading number. When I sort, it starts with 1, then goes to 10, 11, etc.
I am trying to sort the following in order:
1 | Text One
10 | Text Two
11 | Text Three
The method I'm trying to sort is here:
finalnoteslist = finalnoteslist.OrderBy(num => num).ToList();
System.Text.StringBuilder clipData = new System.Text.StringBuilder();
foreach (object value in finalnoteslist)
{
clipData.AppendLine(value.ToString());
}
Clipboard.Clear();
Clipboard.SetText(clipData.ToString());
MessageBox.Show(clipData.ToString() + Environment.NewLine + "NOTES COPIED TO CLIPBOARD. CONTROL + V TO PASTE IN DRAWING");
}
int CompareStringBuilders(System.Text.StringBuilder a, System.Text.StringBuilder b)
{
for (int i = 0; i < a.Length && i < b.Length; i++)
{
var comparison = a[i].CompareTo(b[i]);
if (comparison != 0)
return comparison;
}
return a.Length.CompareTo(b.Length);
}
Split each item on its separator | and parse the first part into an int value. Then sort by those values.
List<string> finalnoteslist = new List<string>()
{ "1 | Text One",
"10 | Text Two",
"11 | Text Three"
};
finalnoteslist = finalnoteslist.OrderBy(x => int.Parse(x.Split('|').First())).ToList();
You could use string.Split to split and get the leading integer, which can be used to sort your list.
finalnoteslist = finalnoteslist.OrderBy(x=> int.Parse(x.Split('|')[0])).ToList();
To Sort the List in-place:
List<string> strings = new List<string>()
{
"1 | Text One", "12 | Text Two", "100 | Text Three", "2 | Text Four"
};
Func<string, int> getNumber = (str) => Int32.Parse(str.Split('|').FirstOrDefault());
strings.Sort((y, x) => getNumber(y) - getNumber(x));
To Sort using Linq (creates a new List):
strings = strings.OrderBy(x => getNumber(x)).ToList();

How to split year, month and day if the year does not exist in the given string, using C#?

The code below splits the numbers out of the given string and stores the corresponding integers in comboboxes. That works perfectly.
But I want to know: if the year does not exist in the string, how do I assign zero as the year and store the next integer, the month, in the second combobox?
For example, if the string is "4Month(s)2Day(s)", there is no year. How do I detect that the year is missing and insert zero into comboBox1, 4 into comboBox2 and 2 into comboBox3
in the following code?
int count = 0;
string[] delimiterChars = {"Year","Years","Years(s)","Month","Month(s)","Day","Day(s)"};
string variable =agee;
string[] words = variable.Split(delimiterChars, StringSplitOptions.None);
foreach (string s in words)
{
var data = Regex.Match(s, @"\d+").Value;
count++;
if (count == 1)
{
comboBox1.Text = data;
}
else if (count == 2)
{
comboBox2.Text = data;
}
else if (count == 3)
{
comboBox3.Text = data;
}
}
You can do it with Regex like this:
int combBox1 = 0, combBox2 = 0, combBox3 = 0; // default to 0 when a part is missing
var sample = "1Year(s)4month(s)2DaY(s)";
var yearString = Regex.Match(sample, @"\d+Year", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(yearString))
combBox1 = int.Parse(Regex.Match(yearString, @"\d+").Value);
var monthString = Regex.Match(sample, @"\d+Month", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(monthString))
combBox2 = int.Parse(Regex.Match(monthString, @"\d+").Value);
var dayStrings = Regex.Match(sample, @"\d+Day", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(dayStrings))
combBox3 = int.Parse(Regex.Match(dayStrings, @"\d+").Value);
You can skip the int.Parse() and keep the matched strings if you want, but then you have to set "0" manually when a part is missing.
Instead of first splitting the string and then using a RegEx to parse the parts, I'd use a RegEx for the entire work.
Using Regex Hero's tester (requires Silverlight to work...) I came up with the following:
(?:(?<years>\d+)Year\(?s?\)?)?(?<months>\d+)Month\(?s?\)?(?<days>\d+)Day\(?s?\)?
This matches all of the following inputs
Input                       Matching groups
*****                       ***************
4Month(s)2Day(s)            months: 4, days: 2
1Year(s)4Month(s)2Day(s)    years: 1, months: 4, days: 2
3Years6Month(s)14Day(s)     years: 3, months: 6, days: 14
1Year1Month1Day             years: 1, months: 1, days: 1
As you see, it matches everything that's there. If you don't have a match for years, you can test for that with the Success property of the capture group.
Sample
var pattern = @"(?:(?<years>\d+)Year\(?s?\)?)?(?<months>\d+)Month\(?s?\)?(?<days>\d+)Day\(?s?\)?";
var regex = new Regex(pattern);
var testCases = new List<string> {
"4Month(s)2Day(s)",
"1Year(s)4Month(s)2Day(s)",
"3Years6Month(s)14Day(s)",
"1Year1Month1Day"
};
foreach (var test in testCases) {
var match = regex.Match(test);
var years = match.Groups["years"].Success ? match.Groups["years"].Value : "0";
var months = match.Groups["months"].Value;
var days = match.Groups["days"].Value;
string.Format("input: {3}, years: {0}, months: {1}, days: {2}", years, months, days, test).Dump();
}
Run that in LinqPad, and you'll see
input: 4Month(s)2Day(s), years: 0, months: 4, days: 2
input: 1Year(s)4Month(s)2Day(s), years: 1, months: 4, days: 2
input: 3Years6Month(s)14Day(s), years: 3, months: 6, days: 14
input: 1Year1Month1Day, years: 1, months: 1, days: 1
I think you have another problem here. If you split the string, you don't know whether a value is a year, a month or a day. This information gets lost when splitting. Maybe you should parse the string another way, to keep this information.
You can create three boolean variables that record whether the string contains a year, a month and a day, and check each one before assigning a value to the corresponding combobox.
bool hasYear = variable.Contains("Year");
bool hasMonth = variable.Contains("Month");
bool hasDay = variable.Contains("Day");
Use a better pattern
string input1 = "1Year(s)4Month(s)2Day(s)";
string pattern1 = @"(?'year'\d+)?(Year(\(s\))?)?(?'month'\d+)(Month(\(s\))?)?(?'day'\d+)(Day(\(s\))?)?";
Match match1 = Regex.Match(input1, pattern1);
string year1 = match1.Groups["year"].Value;
string month1 = match1.Groups["month"].Value;
string day1 = match1.Groups["day"].Value;
string input2 = "4Month(s)2Day(s)";
string pattern2 = @"(?'year'\d+)?(Year(\(s\))?)?(?'month'\d+)(Month(\(s\))?)?(?'day'\d+)(Day(\(s\))?)?";
Match match2 = Regex.Match(input2, pattern2);
string year2 = match2.Groups["year"].Value;
string month2 = match2.Groups["month"].Value;
string day2 = match2.Groups["day"].Value;
You could very simply do like this:
string agee = "1Year4Month(s)2Day(s)";
string[] delimiterChars = {"Year", "Month", "Day"};
string variable = agee.Replace("(s)", "").Replace("s", "");
string[] words = variable.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
int count = words.Length;
switch (count)
{
case 0:
combobox1.Text = "0";
combobox2.Text = "0";
combobox3.Text = "0";
break;
case 1:
combobox1.Text = "0";
combobox2.Text = "0";
combobox3.Text = words[0];
break;
case 2:
combobox1.Text = "0";
combobox2.Text = words[0];
combobox3.Text = words[1];
break;
case 3:
combobox1.Text = words[0];
combobox2.Text = words[1];
combobox3.Text = words[2];
break;
}

Split strings that have strange pattern

I need help to split a collection of strings that have rather strange pattern.
Example data:
List<string> input = new List<string>();
input.Add("Blue Code \n 03 ID \n 05 Example \n Sky is blue");
input.Add("Green Code\n 01 ID\n 15");
input.Add("Test TestCode \n 99 \n Testing is fun");
Expected output:
For input[0]:
string part1 = "Blue"
string part2 = "Code \n 03"
string part3 = "ID \n 05"
string part4 = "Example \n Sky is blue"
For input[1]:
string part1 = "Green"
string part2 = "Code\n 01"
string part3 = "ID\n 15"
For input[2]:
string part1 = "Test"
string part2 = "TestCode \n 99"
string part3 = "\n Testing is fun"
Edited with one more example:
"038 038\n 0004 049.0\n 0006"
Expected output:
"038"
"038\n 0004"
"049.0\n 0006"
In short, I don't even know how to describe the pattern... It seems like I need the first string(act as a key) right before the "\n" as part of the new string, but the last input[2] has slightly different pattern from the other 2. Also, please take note of the spaces, they are extremely inconsistent.
I know this is a long shot, but please let me know if anyone can figure out how to deal with these data.
Updated: I think I can forget about solving this... When I took a detailed look at the database, I found out that there is NOT only \n; it can be... anything, including |a |b |c (from a-z, A-Z) and \a \b \c (from a-z, A-Z). Manually re-entering the data could be much easier...
I would say the pattern is:
List<string> input = new List<string>();
input.Add("Blue Code \n 03 ID \n 05 Example \n Sky is blue");
input.Add("Green Code\n 01 ID\n 15");
input.Add("Test TestCode \n 99 \n Testing is fun");
foreach (string text in input)
{
var parts = new List<string>();
// 1. Take the first word
string part1 = text.Split(' ')[0];
parts.Add(part1);
string rest = text.Substring(part1.Length).TrimStart(' ');
// 2. While the rest contains "\n" followed by a number
int index;
while ((index = rest.IndexOf('\n')) >= 0)
{
// Key: everything up to and including "\n"
string partNa = rest.Substring(0, index + 1);
// Value: the first token after "\n"
string temp = rest.Substring(index + 1).TrimStart(' ');
string partNb = temp.Split(' ')[0];
int n;
if (!int.TryParse(partNb, out n))
break; // not "key \n number", so the remainder is the last part
parts.Add(partNa + " " + partNb);
rest = temp.Substring(partNb.Length).TrimStart(' ');
}
// 3. Take the rest, if anything is left
if (rest.Length > 0)
parts.Add(rest);
}
It could probably be written a bit more optimised, but you get the idea.
Ok, I have got this little code snippet to generate the output you are looking for. the Pattern seems to be: Word [Key \n Value] [Key \n Value] [Key \n Value (With Spaces)]
Where the Key can be empty. Is that right?
var input = new List<string>
{
"Blue Code \n 03 ID \n 05 Example \n Sky is blue",
"Green Code\n 01 ID\n 15",
"038 038\n 0004 049.0\n 0006",
"Test TestCode \n 99 \n Testing is fun"
};
var output = new List<List<string>>();
foreach (var item in input)
{
var items = new List<string> {item.Split(' ')[0]};
const string strRegex = @"(?<group>[a-zA-Z0-9\.]*\s*\n\s*[a-zA-Z0-9\.]*)";
var myRegex = new Regex(strRegex, RegexOptions.None);
var matchCollection = myRegex.Matches(item.Remove(0, item.Split(' ')[0].Length));
for (var i = 0; i < 2; i++)
{
if (matchCollection[i].Success)
{
items.Add(matchCollection[i].Value);
}
}
var index = item.IndexOf(items.Last()) + items.Last().Length;
var final = item.Substring(index);
if (final.Contains("\n"))
{
items.Add(final);
}
else
{
items[items.Count -1 ] = items[items.Count - 1] + final;
}
output.Add(items);
}
