Regex finding any characters between ( ) - C#

I have a long text that contains many items like ( hello , hi ) or (hello,hi), and I have to take the spaces into account. How do I detect them in the text, retrieve the words hello and hi, and add them to a list? Currently I use this regex:
string helpingWordPattern = "(?<=\\()(.*?)(?<=\\))";
Regex regexHelpingWord = new Regex(helpingWordPattern);
foreach (Match m in regexHelpingWord.Matches(lstQuestion.QuestionContent))
{
    // Remove "," and store the helping words in a list
    string str = m.ToString();
    if (str.Contains(","))
    {
        string[] strWords = str.Split(','); // One item still has a ")" attached, e.g. "hi)"
        if (strWords.Contains(")"))
        {
            strWords.Replace(")", ""); // Try to remove it. ERROR here because an array has no Replace method.
        }
        foreach (string words in strWords)
        {
            options.Add(words);
        }
    }
}
I googled and searched for the correct regex. The regex I use is supposed to exclude the ) too, but it doesn't.

Put the \\( \\) bracket matchers outside the group you wish to capture:
Regex regex = new Regex("\\((.*?)\\)");
foreach (Match m in regex.Matches(longText))
{
    string inside = m.Groups[1].Value; // inside the brackets
    // ...
}
Then use m.Groups[1].Value, not the whole text of the match.
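A minimal, self-contained sketch of the capture-group approach, run against a made-up sample string. The split-on-comma and `Trim()` step is an addition that covers the question's "take the spaces into account" requirement, and `ExtractOptions` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Made-up sample text that mixes both spacing styles from the question.
string longText = "Pick one ( hello , hi ) or (hello,hi) and continue.";

List<string> options = ExtractOptions(longText);
Console.WriteLine(string.Join("|", options));

static List<string> ExtractOptions(string text)
{
    var result = new List<string>();
    // The escaped brackets sit outside group 1, so group 1 holds only the inside text.
    var regex = new Regex(@"\((.*?)\)");
    foreach (Match m in regex.Matches(text))
    {
        string inside = m.Groups[1].Value;
        // Split on commas, then drop the spaces around each word.
        foreach (string word in inside.Split(','))
        {
            string trimmed = word.Trim();
            if (trimmed.Length > 0)
                result.Add(trimmed);
        }
    }
    return result;
}
```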

You can also use this regex pattern:
(?<=[\(,])(.*?)(?=[\),])
(?<=[\(,])(\D*?)(?=[\),]) // variant that matches only non-digit characters
Breakdown:
(?<=[\(,]) = positive lookbehind: looks for a `(` or `,` before the match
(.*?) = matches any character except a newline, lazily (as few characters as possible)
(?=[\),]) = positive lookahead: looks for a `)` or `,` after `hello`, `hi`, etc.
EDIT
You can try this sample code (untested):
List<string> lst = new List<string>();
MatchCollection mcoll = Regex.Matches(sampleStr, @"(?<=[\(,])(.*?)(?=[\),])");
foreach (Match m in mcoll)
{
    lst.Add(m.ToString());
    Debug.Print(m.ToString()); // Optional: check the result in the Output window.
}
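A small sketch for checking the lookaround pattern. Each match keeps the spaces around the word, so the values are trimmed here. The second (made-up) input shows an assumption built into the pattern: it also fires on commas outside parentheses, so it suits text where commas appear only inside the brackets. `Words` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Lookbehind for "(" or ",", lazy body, lookahead for ")" or ",".
const string Pattern = @"(?<=[\(,])(.*?)(?=[\),])";

// The match itself keeps the spaces around each word, so trim it.
List<string> Words(string text) =>
    Regex.Matches(text, Pattern)
         .Cast<Match>()
         .Select(m => m.Value.Trim())
         .ToList();

List<string> inParens = Words("( hello , hi ) or (hello,hi)");
// The comma before "y" is outside any brackets, yet it still starts a match.
List<string> withStrayComma = Words("x, y (hello, hi)");

Console.WriteLine(string.Join("|", inParens));
Console.WriteLine(string.Join("|", withStrayComma));
```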

There are a lot of different ways you could do this. Below is some code that uses regex to match and split.
string input = "txt ( apple , orange) txt txt txt ( hello, hi,5 ) txt txt txt txt";
List<string> Options = new List<string>();
Regex regexHelpingWord = new Regex(@"\((.+?)\)");
Regex regexSplitComma = new Regex(@"\s*,\s*");
foreach (Match m in regexHelpingWord.Matches(input))
{
    // Drop the brackets, then split on commas with any surrounding spaces
    string words = Regex.Replace(m.ToString(), @"[()]", "");
    foreach (string word in regexSplitComma.Split(words))
    {
        string Str = word.Trim();
        double Num;
        bool isNum = double.TryParse(Str, out Num);
        // Skip numeric items such as the 5 above
        if (!isNum) Options.Add(Str);
    }
}

Related

Getting a list of strings by splitting a string by a specific tag

I would like to split a string into a list or array by a specific tag.
<START><A>message<B>UnknownLengthOfText<BEOF><AEOF><A>message<B>UnknownLengthOfText<BEOF><AEOF><END>
I want to split the above example into two items, each being the string between the <A> and <AEOF> tags.
Any help is appreciated.
I would suggest a simple regex for this.
Take a look at this example:
using System.Diagnostics;
using System.Text.RegularExpressions;
...
Regex regex = new Regex("<A>(.*?)<B><BEOF>(.*?)<AEOF>");
string myString = @"<START><A>message<B><BEOF>UnknownLengthOfText<AEOF><A>message<B><BEOF>some other line of text<AEOF><END>";
MatchCollection matches = regex.Matches(myString);
foreach (Match m in matches)
{
    // Group 1 is the message, group 2 the text after <BEOF>
    Debug.WriteLine(m.Groups[1].Value + " | " + m.Groups[2].Value);
}
EDIT:
Since the string is on one line, the regex should be lazy, marked with the lazy quantifier ?. I also changed the regex to use sTrenat's suggestion, so it captures the message and the title as well.
So, instead of
Regex regex = new Regex("<A>(.*)<AEOF>");
I used
Regex regex = new Regex("<A>(.*?)<B><BEOF>(.*?)<AEOF>");
Notice the additional ?, which marks a lazy quantifier: the match stops at the first <AEOF> after each <A>. Without the ?, everything from the first <A> to the last <AEOF> is captured as a single match, instead of one match per message.
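To see the difference the `?` makes, this sketch runs both forms against a simplified one-line input (the `first`/`second` values are made up for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Two tagged items on one line, as in the question.
string oneLine = "<START><A>first<AEOF><A>second<AEOF><END>";

// Greedy: (.*) runs to the LAST <AEOF>, so both items end up in one match.
string[] greedy = Regex.Matches(oneLine, "<A>(.*)<AEOF>")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

// Lazy: (.*?) stops at the FIRST <AEOF> after each <A>.
string[] lazy = Regex.Matches(oneLine, "<A>(.*?)<AEOF>")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(string.Join(" | ", greedy)); // first<AEOF><A>second
Console.WriteLine(string.Join(" | ", lazy));   // first | second
```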
Try it with something like this:
string test = @"<START>
<A>message<B><BEOF>UnknownLengthOfText<AEOF>
<A>message<B><BEOF>UnknownLengthOfText<AEOF>
<END>";
// For this test, this gives you an array containing 3 items
// (string.Split(string) needs .NET Core 2.0 or later)
string[] tmp1 = test.Split("<AEOF>");
// Here you store your results
List<string> results = new List<string>();
// For every one of those 3 items:
foreach (string item in tmp1)
{
    // This is only true for the first and second item
    if (item.Contains("<A>"))
    {
        string[] tmp2 = item.Split("<A>");
        // The string you are looking for is always BEHIND the <A>, so
        // store tmp2[1] (tmp2[0] is the part in front of it)
        results.Add(tmp2[1]);
    }
}
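Rather than using String.Split, you can use Regex.Split as below:
var stringToSplit = @"<START>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<END>";
var regex = "<A>(.*)<AEOF>";
var splitStrings = Regex.Split(stringToSplit, regex);
splitStrings will contain 5 elements, because Regex.Split also returns the text captured by (.*), and the line breaks between the matches end up in elements of their own (. does not match a newline, so the greedy (.*) stays within one line):
splitStrings[0] = "<START>" plus the line break after it
splitStrings[1] = "message<B>UnknownLengthOfText<BEOF>"
splitStrings[2] = the line break between the two items
splitStrings[3] = "message<B>UnknownLengthOfText<BEOF>"
splitStrings[4] = the line break before "<END>", plus "<END>"
Playing with the regex could give you only the strings between <A> and <AEOF>.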
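A sketch confirming how `Regex.Split` treats the capturing group, using explicit `\n` line breaks (an assumption; a verbatim string would embed whatever line endings the source file uses). With exactly one capturing group, the captured text lands at the odd indexes, which is one way to keep only the strings between `<A>` and `<AEOF>`:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "\n" is written explicitly so the result does not depend on the file's line endings.
string stringToSplit =
    "<START>\n<A>message<B>UnknownLengthOfText<BEOF><AEOF>\n<A>message<B>UnknownLengthOfText<BEOF><AEOF>\n<END>";

// Regex.Split keeps the text of capturing groups in the result array.
string[] parts = Regex.Split(stringToSplit, "<A>(.*)<AEOF>");

// With exactly one capturing group, the captures sit at the odd indexes.
string[] between = parts.Where((s, i) => i % 2 == 1).ToArray();

Console.WriteLine(parts.Length); // 5
foreach (string item in between)
    Console.WriteLine(item);
```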
All answers so far are regex-based. Here is an alternative without regex:
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var input = @"
<START>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<A>message<B>UnknownLengthOfText<BEOF><AEOF>
<END>";
        var start = "<A>";
        var end = "<AEOF>";
        foreach (var item in ExtractEach(input, start, end))
        {
            Console.WriteLine(item);
        }
    }

    public static IEnumerable<string> ExtractEach(string input, string start, string end)
    {
        // Works line by line: keep the lines where start appears (index 0 included) before end
        foreach (var line in input
            .Split(Environment.NewLine.ToCharArray())
            .Where(x => x.IndexOf(start) >= 0 && x.IndexOf(start) < x.IndexOf(end)))
        {
            yield return Extract(line, start, end);
        }
    }

    public static string Extract(string input, string start, string end)
    {
        int startPosition = input.LastIndexOf(start) + start.Length;
        int length = input.IndexOf(end) - startPosition;
        return input.Substring(startPosition, length);
    }
}
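The line-based version expects each <A> to <AEOF> pair on a line of its own. A sketch of a variation that walks the whole string with `IndexOf` instead, so it also handles the single-line input from the question; `ExtractAll` is a hypothetical helper, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Single-line input from the question; no line breaks between items.
string input = "<START><A>message<B>UnknownLengthOfText<BEOF><AEOF><A>message<B>UnknownLengthOfText<BEOF><AEOF><END>";

List<string> items = ExtractAll(input, "<A>", "<AEOF>").ToList();
foreach (string item in items)
    Console.WriteLine(item);

// Scans forward from the end of the previous item instead of splitting into lines.
static IEnumerable<string> ExtractAll(string text, string start, string end)
{
    int position = 0;
    while (true)
    {
        int open = text.IndexOf(start, position, StringComparison.Ordinal);
        if (open < 0) yield break;               // no more start tags
        int contentStart = open + start.Length;
        int close = text.IndexOf(end, contentStart, StringComparison.Ordinal);
        if (close < 0) yield break;              // unmatched start tag: stop
        yield return text.Substring(contentStart, close - contentStart);
        position = close + end.Length;           // continue after this end tag
    }
}
```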

C# Regex returning multiple lines of text

I have the following function:
public static string ReturnEmailAddresses(string input)
{
    string regex1 = @"\[url=";
    string regex2 = @"mailto:([^\?]*)";
    string regex3 = @".*?";
    string regex4 = @"\[\/url\]";
    Regex r = new Regex(regex1 + regex2 + regex3 + regex4, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    MatchCollection m = r.Matches(input);
    if (m.Count > 0)
    {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        foreach (var match in m)
        {
            if (i > 0)
                sb.Append(Environment.NewLine);
            string shtml = match.ToString();
            var innerString = shtml.Substring(shtml.IndexOf("]") + 1, shtml.IndexOf("[/url]") - shtml.IndexOf("]") - 1);
            sb.Append(innerString); // just the titles
            i++;
        }
        return sb.ToString();
    }
    return string.Empty;
}
As you can see, I define a URL in the "markdown" format:
[url = http://sample.com]sample.com[/url]
Emails are written in the same format:
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url]
However, when I pass in a multiline string with multiple email addresses, it returns only the first one. I would like it to return multiple matches, but I cannot get that working.
For example:
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] \r\n a whole bunch of text here \r\n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]
This returns only the first email above. Why?
The mailto:([^\?]*) part of your pattern matches everything in your input string (it stops only at a ?), so the whole input becomes a single match ending at the last [/url]. Add the closing bracket ] to the excluded characters so that this part cannot overflow out of the "mailto" section and into the text within the "url" tags:
\[url=mailto:([^\?\]]*).*?\[\/url\]
See this link for an example: https://regex101.com/r/zcgeW8/1
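A sketch of the question's function rewritten around the corrected pattern. It is an assumption that returning a `List<string>` (rather than one joined string) is acceptable; reading `m.Groups[1].Value` removes the need for the `Substring` arithmetic:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] \r\n a whole bunch of text here \r\n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]";

List<string> emails = ReturnEmailAddresses(input);
Console.WriteLine(string.Join(Environment.NewLine, emails));

static List<string> ReturnEmailAddresses(string text)
{
    // [^\?\]]* cannot run past the ] that closes the opening [url=...] tag.
    var regex = new Regex(@"\[url=mailto:([^\?\]]*).*?\[\/url\]", RegexOptions.IgnoreCase);
    var result = new List<string>();
    foreach (Match m in regex.Matches(text))
        result.Add(m.Groups[1].Value); // group 1 is the address after "mailto:"
    return result;
}
```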
You can extract the desired result with a positive lookahead and a positive lookbehind. See http://www.rexegg.com/regex-lookarounds.html
Try the regex: (?<=\[url=mailto:).*?(?=\])
The regex above captures the two email addresses from this sample string:
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] \r\n a whole bunch of text here \r\n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]
Result:
service@paypal.com.au
anotheremail@paypal.com.au
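A short sketch of the lookaround version. Because the lookbehind and lookahead are not part of the match, `m.Value` is already the address:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] \r\n a whole bunch of text here \r\n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]";

// Match starts right after "[url=mailto:" and stops lazily before the next "]".
string[] addresses = Regex.Matches(input, @"(?<=\[url=mailto:).*?(?=\])")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, addresses));
```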

Regular Expression to split a string with comma and double quotes in c#

I have tried a regular expression to split a string on commas while keeping double-quoted values together. It matches all the cases except one. The code I have tried is:
List<string> strNewSplit = new List<string>();
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
    strNewSplit.Add(match.Value.TrimStart(','));
}
return strNewSplit;
CASE 1: "MYSQL,ORACLE","C#,ASP.NET"
Expected output:
"MYSQL,ORACLE"
"C#,ASP.NET"
RESULT: PASS
CASE 2: "MYSQL,ORACLE", "C#,ASP.NET"
Expected output:
"MYSQL,ORACLE"
"C#,ASP.NET"
Actual output:
"MYSQL,ORACLE"
"C#
ASP.NET"
RESULT: FAIL
If I put a space after the comma between the two double-quoted values, I don't get the expected output. Am I missing anything? Please suggest a better solution.
I normally write down the EBNF of the input I want to parse.
In your case I would say:
List = ListItem { Space* "," Space* ListItem } ;
ListItem = '"' Identifier '"' ; // Identifier is everything without "
Space = [\t ]+ ;
This means a List consists of a ListItem followed by zero or more further ListItems, each separated from the previous one by a comma with optional spaces around it.
That led me to the following (you are searching for ListItems):
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class Program
{
    static void Main(string[] args)
    {
        matchRegex("\"MYSQL,ORACLE\",\"C#,ASP.NET\"").ForEach(Console.WriteLine);
        matchRegex("\"MYSQL,ORACLE\", \"C#,ASP.NET\"").ForEach(Console.WriteLine);
    }

    static List<string> matchRegex(string input)
    {
        List<string> strNewSplit = new List<string>();
        // Match each quoted item; commas and spaces between items are skipped
        Regex csvSplit = new Regex("(\"(?:[^\"]*)\")", RegexOptions.Compiled);
        foreach (Match match in csvSplit.Matches(input))
        {
            strNewSplit.Add(match.Value);
        }
        return strNewSplit;
    }
}
This returns what you wanted. I hope I understood you correctly.
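The pattern above drops the `""` escape that the question's original regex supported. A sketch that keeps it while still ignoring the comma and spaces between fields; it assumes every field is quoted, as in both test cases, since unquoted fields are skipped entirely. `Fields` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// A quoted field: a quote, then any run of non-quote characters or doubled
// quotes (""), then a closing quote. Text between fields is simply skipped.
var quotedField = new Regex("\"(?:[^\"]|\"\")*\"");

List<string> Fields(string line)
{
    var fields = new List<string>();
    foreach (Match m in quotedField.Matches(line))
        fields.Add(m.Value);
    return fields;
}

List<string> case1 = Fields("\"MYSQL,ORACLE\",\"C#,ASP.NET\"");
List<string> case2 = Fields("\"MYSQL,ORACLE\", \"C#,ASP.NET\"");
List<string> escaped = Fields("\"say \"\"hi\"\"\", \"x\""); // made-up field with "" escapes

case2.ForEach(Console.WriteLine);
```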

A More Efficient Way to Parse a String in C#

I have code that reads a file and creates Regex groups. Then I walk through the groups and use other matches on keywords to extract what I need: the text between each keyword and the next space or newline. Is there a way to use the Regex keyword match itself to discard what I don't want (the keyword)?
// Create the pattern for the regex
String VSANMatchString = @"vsan\s(?<number>\d+)[:\s](?<info>.+)\n(\s+name:(?<name>.+)\s+state:(?<state>.+)\s+\n\s+interoperability mode:(?<mode>.+)\s\n\s+loadbalancing:(?<loadbal>.+)\s\n\s+operational state:(?<opstate>.+)\s\n)?";
// Set up the match
MatchCollection VSANInfoList = Regex.Matches(block, VSANMatchString);
// Set up the keyword matches
Regex VSANNum = new Regex(@" \d* ");
Regex VSANName = new Regex(@"name:\S*");
Regex VSANState = new Regex(@"operational state\S*");
// Now we can extract what we need, since all the VSAN info is matched to the correct VSAN:
// match each keyword (name, state, etc.), then split and extract the value
foreach (Match m in VSANInfoList)
{
    string num = String.Empty;
    string name = String.Empty;
    string state = String.Empty;
    string s = m.ToString();
    if (VSANNum.IsMatch(s)) { num = VSANNum.Match(s).ToString().Trim(); }
    if (VSANName.IsMatch(s))
    {
        string totrim = VSANName.Match(s).ToString().Trim();
        string[] strsplit = Regex.Split(totrim, "name:");
        name = strsplit[1].Trim();
    }
    if (VSANState.IsMatch(s))
    {
        string totrim = VSANState.Match(s).ToString().Trim();
        string[] strsplit = Regex.Split(totrim, "state:");
        state = strsplit[1].Trim();
    }
}
It looks like your single regex should already gather everything you need. Read each value straight from its named group:
string name = m.Groups["name"].Value; // named groups are read through Groups; Match.Captures has no string indexer
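A sketch of reading the named groups directly. The sample text and the shortened pattern are assumptions for illustration, not the real switch output or the question's full pattern; `RegexOptions.Singleline` lets `.*?` cross the line break between the `vsan` line and the `name:` line:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, simplified sample; the real output has more fields.
string block = "vsan 10 information\n    name:VSAN0010  state:active\n";

// Each value has its own named group, so no Split on the keyword is needed.
var regex = new Regex(@"vsan\s(?<number>\d+).*?name:(?<name>\S+)\s+state:(?<state>\S+)",
    RegexOptions.Singleline);

Match m = regex.Match(block);
string num = m.Groups["number"].Value;
string name = m.Groups["name"].Value;
string state = m.Groups["state"].Value;

Console.WriteLine($"{num} {name} {state}");
```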

Regex for a PropertyName: e.g. HelloWorld2HowAreYou should give Hello, HelloWorld2, HelloWorld2How

I need a regex that, for a PropertyName such as HelloWorld2HowAreYou, gives:
Hello, HelloWorld2, HelloWorld2How, etc.
I want to use it in C#.
[A-Z][a-z0-9]+ gives you all the words that start with a capital letter. You can then write code that concatenates them one by one to get the complete set of words.
For example, matching [A-Z][a-z0-9]+ against HelloWorld2HowAreYou with the global flag set gives the following matches:
Hello
World2
How
Are
You
Just iterate through the matches and concatenate them to form the words.
The following is JavaScript; port it to C#:
var s = "HelloWorld2HowAreYou";
var r = /[A-Z][a-z0-9]+/g;
var m;
var matches = [];
while ((m = r.exec(s)) != null)
    matches.push(m[0]);
var o = "";
for (var i = 0; i < matches.length; i++)
{
    o += matches[i];
    console.log(o);
}
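A possible C# port of the JavaScript above (a sketch, not the answerer's own code): `Regex.Matches` replaces the `exec` loop with the global flag, and a `StringBuilder` holds the running concatenation:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

string s = "HelloWorld2HowAreYou";

var prefixes = new List<string>();
var current = new StringBuilder();
// Each match is one capitalised word: Hello, World2, How, Are, You.
foreach (Match m in Regex.Matches(s, "[A-Z][a-z0-9]+"))
{
    current.Append(m.Value);          // grow the running concatenation
    prefixes.Add(current.ToString()); // record each intermediate result
}

prefixes.ForEach(Console.WriteLine);
```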
I think something like this is what you want:
var s = "HelloWorld2HowAreYou";
Regex r = new Regex("(?=[A-Z]|$)(?<=(.+))");
foreach (Match m in r.Matches(s))
{
    Console.WriteLine(m.Groups[1]);
}
The output is (as seen on ideone.com):
Hello
HelloWorld2
HelloWorld2How
HelloWorld2HowAre
HelloWorld2HowAreYou
How it works
The regex is based on two assertions:
(?=[A-Z]|$) matches positions just before an uppercase, and at the end of the string
(?<=(.+)) is a capturing lookbehind for .+ behind the current position into group 1
Essentially, the regex translates to:
"Everywhere just before an uppercase, or at the end of the string"...
"grab everything behind you if it's not an empty string"

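A sketch that makes the zero-width matches visible by printing `m.Index` next to group 1. Every match is empty, and group 1 equals the prefix of the string up to that index; position 0 yields no match because `.+` needs at least one character behind it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "HelloWorld2HowAreYou";
var r = new Regex("(?=[A-Z]|$)(?<=(.+))");

var positions = new List<int>();
var captures = new List<string>();
foreach (Match m in r.Matches(s))
{
    // Both assertions are zero-width, so m.Length is 0; m.Index is the
    // position just before an uppercase letter, or the end of the string.
    positions.Add(m.Index);
    captures.Add(m.Groups[1].Value); // everything before that position
    Console.WriteLine($"{m.Index,2}: {m.Groups[1].Value}");
}
```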