Regular expression to include starting string - c#

I have managed to match into groups as follows using the expression below, but it's incomplete.
\([^\)]*\)
Example strings are,
s11(h 1 1 c)(h 1 1 c) x="" y="" z="" phi="" theta=""
e(45,10,h 1 1 c,1,cross,max) x="" y="" z="" phi="" theta=""
With the above expression I can match (h 1 1 c)(h 1 1 c) and (45,10,h 1 1 c,1,cross,max)
But I want to capture the starting string s11 and e along with (h 1 1 c)(h 1 1 c) and (45,10,h 1 1 c,1,cross,max)

You can use
// Requires: using System.Linq; using System.Text.RegularExpressions;
var lines = new List<string>
{
    "s11(h 1 1 c)(h 1 1 c) x=\"\" y=\"\" z=\"\" phi=\"\" theta=\"\"",
    "e(45,10,h 1 1 c,1,cross,max) x=\"\" y=\"\" z=\"\" phi=\"\" theta=\"\""
};
foreach (var s in lines)
{
    Console.WriteLine("==== Next string: \"" + s + "\" =>");
    // Whole matches only: the prefix plus one or more parenthesized parts
    Console.WriteLine(string.Join(", ",
        Regex.Matches(s, @"\w+(?:\([^()]*\))+").Cast<Match>().Select(x => x.Value)));
    Console.WriteLine("=== With groups and captures:");
    // Group 1 is the prefix; group 2 collects each parenthesized part as a capture
    var results = Regex.Matches(s, @"(\w+)(?:(\([^()]*\)))+");
    foreach (Match m in results)
    {
        Console.WriteLine(m.Groups[1].Value);
        Console.WriteLine(string.Join(", ", m.Groups[2].Captures.Cast<Capture>().Select(z => z.Value)));
    }
}
See the C# demo. Output:
==== Next string: "s11(h 1 1 c)(h 1 1 c) x="" y="" z="" phi="" theta=""" =>
s11(h 1 1 c)(h 1 1 c)
=== With groups and captures:
s11
(h 1 1 c), (h 1 1 c)
==== Next string: "e(45,10,h 1 1 c,1,cross,max) x="" y="" z="" phi="" theta=""" =>
e(45,10,h 1 1 c,1,cross,max)
=== With groups and captures:
e
(45,10,h 1 1 c,1,cross,max)
Depending on what exact results you want to get, you may use a regex with or without capturing groups:
\w+(?:\([^()]*\))+
(\w+)(?:(\([^()]*\)))+
See the regex 1 demo and regex 2 demo.
Details
\w+ - one or more word chars (letters, digits and connector punctuation such as _)
(?:\([^()]*\))+ - one or more repetitions of
\( - a ( char
[^()]* - zero or more chars other than ( and )
\) - a ) char.

Related

how to remove delimiters in C#

Hi, I have already tried the code below, but it does not remove the delimiters from a dat file:
StreamReader input = new StreamReader(txtFile.Text);
string content = input.ReadToEnd().Trim();
string[] split = System.Text.RegularExpressions.Regex.Split(content, "\\s+", RegexOptions.None);
foreach (string s in split)
{
    txtStatus.Text = s + "\r\n" + txtStatus.Text;
}
Here is some sample data from the dat file I am working on:
xú !' Date D Time C Millitm N TagIndex N Value C Status C! Marker C" Internal C#
2020032403:25:25829 0 # Bÿÿÿÿ 2020032403:25:25829 1 9# Bÿÿÿÿ 2020032403:25:26844 0 # 2020032403:25:26844 1 9# 2020032403:25:27845 0 # 2020032403:25:27845 1 :# 2020032403:25:28847 0 # 2020032403:25:28847 1 ;# 2020032403:25:29851 0 # 2020032403:25:29851 1 <# 2020032403:25:30857 0 # 2020032403:25:30857 1 =# 2020032403:25:31861 0 #
2020032403:25:31861 1 ># 2020032403:25:32867 0 # 2020032403:25:32867 1 ?#
2020032403:25:33873 0 # 2020032403:25:33873 1 ## 2020032403:25:34877 0 # 2020032403:25:34877 1 €## 2020032403:25:35879 0 # 2020032403:25:35879 1 A# 2020032403:25:36888 0 # 2020032403:25:36888 1 €A# 2020032403:25:37890 0 # 2020032403:25:37890 1 B# 2020032403:25:38838 0 # 2020032403:25:38838 1 €B# 2020032403:25:39841 0 # 2020032403:25:39841 1 C# 2020032403:25:40846 0 # 2020032403:25:40846 1 €C# 2020032403:25:41849 0 # 2020032403:25:41849 1 D# 2020032403:25:42851 0 # 2020032403:25:42851 1 €D# ! 2020032403:25:43852 0 # " 2020032403:25:43852 1 E# # 2020032403:25:44860 0 # $ 2020032403:25:44860 1 €E# % 2020032403:25:45862 0 # & 2020032403:25:45862 1 F# ' 2020032403:25:46869 0 # ( 2020032403:25:46869 1 €F# ) 2020032403:25:47873 0 # * 2020032403:25:47873 1 G# + 2020032403:25:48883 0 # , 2020032403:25:48883 1 €G# - 2020032403:25:49887 0 # . 2020032403:25:49887 1 H# / 2020032403:25:50842 0 # 0 2020032403:25:50842 1 €H# 1 2020032403:25:51844 0 # 2 2020032403:25:51844 1 I# 3 2020032403:25:52866 0 # 4 2020032403:25:52866 1 €I# 5 2020032403:25:53868 0 # 6 2020032403:25:53868 1 J# 7 2020032403:25:54884 0 # 8 2020032403:25:54884 1 €J# 9 2020032403:25:55886 0 # : 2020032403:25:55886 1 K# ; 2020032403:25:56896 0 # < 2020032403:25:56896 1 €K# = 2020032403:25:57860 0 # > 2020032403:25:57860 1 L# ? 2020032403:25:58866 0 # # 2020032403:25:58866 1 €L# A 2020032403:25:59868 0 # B 2020032403:25:59868 1 M# C 2020032403:26:00873
Can anyone help me?
It depends on your definition of special characters. To remove all characters except ASCII digits, ASCII letters, whitespace and ":", you can use the pattern [^0-9a-zA-Z:\s]+ inside your loop. The foreach variable s is read-only, so store the result in a new variable:
string cleaned = Regex.Replace(s, @"[^0-9a-zA-Z:\s]+", "");
txtStatus.Text = cleaned + "\r\n" + txtStatus.Text;
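As a self-contained illustration of that pattern, here is a minimal sketch. The DelimiterCleaner class and the StripDelimiters helper are hypothetical names introduced only for this example, and the sample input is made up rather than taken from the dat file:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterCleaner
{
    // Hypothetical helper: keeps ASCII digits, ASCII letters, ':' and whitespace,
    // and removes every other character.
    public static string StripDelimiters(string input)
    {
        return Regex.Replace(input, @"[^0-9a-zA-Z:\s]+", "");
    }

    public static void Main()
    {
        // '#' and '!' are removed; digits, ':' and the space are kept.
        Console.WriteLine(StripDelimiters("03:25:25 #1!")); // prints "03:25:25 1"
    }
}
```

If the file is really binary rather than text, a regex over the decoded string may not be the right tool, so treat this only as a way to try the pattern.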

Regex - Match versus Groups

I apologize in advance if this is a duplicate, but I could not find an existing answer to these questions.
Could you please help and explain:
Where is the match or capture for name alone held? The initial part of the pattern, [A-Za-z0-9_\-\.]+, is not in parentheses, so I understand it is not a group. How then is name captured and held as a component of Match 0?
If I change the string t2 to name@domain.com alt@yahoo.net and the pattern to ^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$
I would expect 2 matches, one for each full email address. The output shows only 1 match holding both addresses separated by a space. Why?
How should the pattern read to get 2 matches, or would the string need to be different for this pattern?
I don't see consistency in the Group output: it does not show another Group holding capture 0 = com and capture 1 = net, similar to Group 2 holding the domain. and yahoo. captures. Why?
Group 3's captures seem to hold the characters of Group 2's captures 0 and 1. Is that how hierarchies work, with captures of captures of groups?
Code
static void Main(string[] args)
{
    string t2 = "name@domain.com";
    string p2 = @"^[A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";
    MatchCollection matches = Regex.Matches(t2, p2);
    GroupCollection gc;
    int groupIndex = 0;
    int matchIndex = 0;
    int captureIndex = 0;
    foreach (Match nextMatch in matches)
    {
        gc = nextMatch.Groups;
        Console.WriteLine("Match {0} holds: {1}", matchIndex, nextMatch.Value);
        matchIndex++;
        foreach (Group g in gc)
        {
            Console.WriteLine("Group {0} holding: {1}", groupIndex, g.ToString());
            groupIndex++;
            foreach (Capture capture in g.Captures)
            {
                Console.WriteLine("\tCapture {0} holds {1}", captureIndex, capture.ToString());
                captureIndex++;
            }
            captureIndex = 0;
        }
        groupIndex = 0;
    }
    matchIndex = 0;
}
Output for the above code:
Match 0 holds: name@domain.com
Group 0 holding: name@domain.com
    Capture 0 holds name@domain.com
Group 1 holding: domain.
    Capture 0 holds domain.
Group 2 holding: n
    Capture 0 holds d
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds a
    Capture 4 holds i
    Capture 5 holds n
Group 3 holding: m
    Capture 0 holds c
    Capture 1 holds o
    Capture 2 holds m
Press any key to continue . . .
Output if string t2 = "name@domain.com alt@yahoo.net"; and string p2 = @"^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$";
Match 0 holds: name@domain.com alt@yahoo.net
Group 0 holding: name@domain.com alt@yahoo.net
    Capture 0 holds name@domain.com alt@yahoo.net
Group 1 holding: alt@yahoo.net
    Capture 0 holds name@domain.com
    Capture 1 holds alt@yahoo.net
Group 2 holding: yahoo.
    Capture 0 holds domain.
    Capture 1 holds yahoo.
Group 3 holding: o
    Capture 0 holds d
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds a
    Capture 4 holds i
    Capture 5 holds n
    Capture 6 holds y
    Capture 7 holds a
    Capture 8 holds h
    Capture 9 holds o
    Capture 10 holds o
Group 4 holding: t
    Capture 0 holds c
    Capture 1 holds o
    Capture 2 holds m
    Capture 3 holds n
    Capture 4 holds e
    Capture 5 holds t
Press any key to continue . . .
The Match is the text matched by the entire regex. Group 0 always holds that whole match, which is why name is part of Match 0 even though it is not inside parentheses.
Groups are the parenthesized parts of that Match. If a group is repeated, as in (someRegex)+, its Captures hold every substring that group matched, while the Group value itself is only the last capture. Try changing ([A-Za-z\-])+ to ([A-Za-z\-]+) and see the difference!
Examples:
\w*(123)\w* on "asdsa123asdf"
Match -> asdsa123asdf
Group -> 123 (== last capture)
Captures -> 123
\w*([123])+\w* on "asdsa123asdf"
Match -> asdsa123asdf
Group -> 3 (== last capture)
Captures -> 1, 2, 3
Several sites let you test a regex and inspect its details, e.g. https://regexr.com or https://regex101.com
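To get one match per address, as the question asks, one option is to drop the ^ and $ anchors and the space from the first character class, so that Regex.Matches finds each address on its own. The sketch below assumes a simplified pattern that is not a full email validator; the EmailMatchSketch class and FindAddresses helper are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailMatchSketch
{
    // Hypothetical helper: returns every address-like substring as its own match.
    public static List<string> FindAddresses(string input)
    {
        // No ^/$ anchors and no space in the local part, so a match cannot span
        // two addresses. The + quantifiers sit inside the non-capturing group,
        // so each repetition consumes a whole label such as "domain.".
        const string pattern = @"[A-Za-z0-9_\-.]+@(?:[A-Za-z0-9\-]+\.)+[A-Za-z\-]+";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        // Prints each address on its own line.
        foreach (string address in FindAddresses("name@domain.com alt@yahoo.net"))
        {
            Console.WriteLine(address);
        }
    }
}
```

With the anchored original pattern, the whole string must match at once, which is why only a single match covering both addresses was reported.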

Regex to get numbers after a period in a string

I'm trying to find the right regex to extract the numbers after the . in the strings below. E.g., the first line should return an array of 1 1 1 1 1, and the second should return 2 1 0 1 2. I can't seem to figure out the correct regex to achieve this. Any help would be appreciated.
line = 0.1, 1.1, 2.1, 3.1, 4.1 // payline 0
line = 0.2, 1.1, 2.0, 3.1, 4.2 // payline 1
So far, I have the code below, but it just returns all the numbers in the string instead. E.g., the first line returns 0 1 1 1 2 1 3 1 4 1 0 and the second returns 0 2 1 1 2 0 3 1 4 2 1
foreach (var line in Paylines)
{
    int[] lines = (from Match m in Regex.Matches(line.ToString(), @"\d+")
                   select int.Parse(m.Value)).ToArray();
    foreach (var x in lines)
    {
        Console.WriteLine(x.ToString());
    }
}
You may use a lookbehind-based regex solution:
@"(?<=\.)\d+"
It matches 1+ digits after a dot without placing the dot into a match value.
See the regex demo.
In C#, you may use
var myVals = Regex.Matches(line, @"(?<=\.)\d+", RegexOptions.ECMAScript)
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToList();
See the C# demo.
The RegexOptions.ECMAScript option is passed for the \d to only match ASCII digits in the [0-9] range and avoid matching other Unicode digits.

To Count Occurrences of all sub strings in string C#

Question: I have a long string and need to count the occurrences of every substring in it, then print each substring with its count (where the count is > 1) in decreasing order of count.
Example:
String = "abcdabcd"
Result:
Substrings Count
abcd 2
abc 2
bcd 2
ab 2
bc 2
cd 2
a 2
b 2
c 2
d 2
Problem: My string can be 5000 characters long and I have not been able to find an efficient way to do this. (Efficiency is very important for the application.)
Is there an existing algorithm for this, or is it possible with multithreading? Please help.
Example using the getAllSubstrings approach from "Find a common string within a list of strings" (Dump() is a LINQPad method):
void Main()
{
    "abcdabcd".getAllSubstrings()
        .AsParallel()
        .GroupBy(x => x)
        .Select(g => new {g.Key, count=g.Count()})
        .Dump();
}
// Define other methods and classes here
public static class Ext
{
    public static IEnumerable<string> getAllSubstrings(this string word)
    {
        return from charIndex1 in Enumerable.Range(0, word.Length)
               from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
               where charIndex2 > 0
               select word.Substring(charIndex1, charIndex2);
    }
}
Produces:
a 2
dabc 1
abcdabc 1
b 2
abc 2
dabcd 1
bc 2
bcda 1
abcd 2
ab 2
bcdab 1
cdabc 1
abcda 1
d 2
bcdabc 1
dab 1
bcd 2
abcdab 1
c 2
bcdabcd 1
abcdabcd 1
cd 2
da 1
cdab 1
cda 1
cdabcd 1
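The question asks only for substrings that occur more than once, in decreasing order of count. A minimal sketch of that filtering and ordering, using a plain Dictionary instead of LINQ grouping, might look like the following. The SubstringCounter and CountRepeated names are illustrative assumptions. It still enumerates every substring, so for a 5000-character input it may be too slow or memory-hungry; a suffix-array or suffix-automaton based approach would likely be needed there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringCounter
{
    // Hypothetical helper: counts every substring, keeps those seen more than once,
    // and orders them by descending count (ties: longer first, then ordinal order).
    public static List<KeyValuePair<string, int>> CountRepeated(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int start = 0; start < text.Length; start++)
        {
            for (int length = 1; start + length <= text.Length; length++)
            {
                string sub = text.Substring(start, length);
                counts.TryGetValue(sub, out int seen); // seen is 0 when sub is new
                counts[sub] = seen + 1;
            }
        }
        return counts.Where(p => p.Value > 1)
                     .OrderByDescending(p => p.Value)
                     .ThenByDescending(p => p.Key.Length)
                     .ThenBy(p => p.Key, StringComparer.Ordinal)
                     .ToList();
    }

    public static void Main()
    {
        // For "abcdabcd" this lists abcd, abc, bcd, ab, bc, cd, a, b, c, d, each with count 2.
        foreach (var pair in CountRepeated("abcdabcd"))
        {
            Console.WriteLine($"{pair.Key} {pair.Value}");
        }
    }
}
```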

get char position in textBox

I am trying to get the index of each char in the text "ABCDCEF" (textBox.text). The problem is that the first 'C' is at index 2 and the second 'C' is at index 4, but the result reports index 2 for the second 'C' too.
This is the code:
foreach (char character in textBox1.Text)
{
    MessageBox.Show(character + " - " + textBox1.Text.IndexOf(character));
}
Result:
char - index
A - 0
B - 1
C - 2
D - 3
C - 2
E - 5
F - 6
The correct result should be:
char - index
A - 0
B - 1
C - 2
D - 3
C - 4
E - 5
F - 6
Why is this happening?
Thanks
string.IndexOf returns the index of the first occurrence of a character, which is why every lookup of 'C' returns index 2.
MSDN says:
Reports the zero-based index of the first occurrence of a specified
Unicode character or string within this instance. The method returns
-1 if the character or string is not found in this instance.
You could convert it to a for loop and use the loop index for each character:
for (int i = 0; i < textBox1.Text.Length; i++)
{
    MessageBox.Show(textBox1.Text[i] + " - " + i);
}
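If you only need the positions of one particular character, an alternative is the IndexOf(char, int) overload, which starts searching at a given index. The sketch below is a console example rather than WinForms code, and the CharIndexSketch class and AllIndexesOf helper are hypothetical names introduced for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class CharIndexSketch
{
    // Hypothetical helper: finds every index of target by restarting the search
    // just past the previous hit, using the IndexOf(char, int) overload.
    public static List<int> AllIndexesOf(string text, char target)
    {
        var indexes = new List<int>();
        int index = text.IndexOf(target);
        while (index >= 0)
        {
            indexes.Add(index);
            index = text.IndexOf(target, index + 1);
        }
        return indexes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", AllIndexesOf("ABCDCEF", 'C'))); // prints "2, 4"
    }
}
```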
