C# RegEx to find values within a string

I am new to RegEx. I have a string like the following, and I want to get the values between [{# and #}].
Ex: "Employee name is [{#John#}], works for [{#ABC Bank#}], [{#Houston#}]"
I would like to get the following values from the above string.
"John",
"ABC Bank",
"Houston"

Based on the solution Regular Expression Groups in C#.
You can try this:
string sentence = "Employee name is [{#john#}], works for [{#ABC BANK#}], [{#Houston#}]";
string pattern = @"\[\{\#(.*?)\#\}\]";
foreach (Match match in Regex.Matches(sentence, pattern))
{
    // Groups[0] is the whole match, so the capture needs at least two groups.
    if (match.Success && match.Groups.Count > 1)
    {
        var text = match.Groups[1].Value;
        Console.WriteLine(text);
    }
}
Console.ReadLine();

Based on the solution and awesome breakdown for matching patterns inside wrapping patterns you could try:
\[\{\#(?<Text>(?:(?!\#\}\]).)*)\#\}\]
Where \[\{\# is your escaped opening sequence of [{# and \#\}\] is the escaped closing sequence of #}].
Your inner values are in the matching group named Text.
string strRegex = @"\[\{\#(?<Text>(?:(?!\#\}\]).)*)\#\}\]";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);
string strTargetString = @"Employee name is [{#John#}], works for [{#ABC Bank#}], [{#Houston#}]";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // The named group "Text" holds the value between the delimiters.
        var text = myMatch.Groups["Text"].Value;
        // TODO: Do something with it.
    }
}
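The loop above can be wrapped in a small reusable helper. This is only a sketch: the names TokenExtractor and Extract are made up for illustration, and it uses the simpler lazy pattern from the first answer with a named group instead of the negative lookahead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Matches [{# ... #}] and captures the inner text in the group named "Text".
    // Singleline lets a value span line breaks; drop it if values never do.
    private static readonly Regex TokenRegex =
        new Regex(@"\[\{\#(?<Text>.*?)\#\}\]", RegexOptions.Singleline);

    // Hypothetical helper: returns every value wrapped in [{# #}], in order of appearance.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        if (string.IsNullOrEmpty(input))
            return values;

        foreach (Match match in TokenRegex.Matches(input))
            values.Add(match.Groups["Text"].Value);

        return values;
    }
}
```

Calling TokenExtractor.Extract on the sample sentence from the question should give the three values John, ABC Bank and Houston.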

using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(Test("the quick brown [{#fox#}] jumps over the lazy dog."));
Console.ReadLine();
}
public static string Test(string str)
{
if (string.IsNullOrEmpty(str))
return string.Empty;
var result = System.Text.RegularExpressions.Regex.Replace(str, @".*\[{#", string.Empty, RegexOptions.Singleline);
result = System.Text.RegularExpressions.Regex.Replace(result, @"\#}].*", string.Empty, RegexOptions.Singleline);
return result;
}
}
}

Related

How to extract a substring from one delimiter to another in C#?

My input is going to be as follows:
abc#gmail.com,def#yahoo.com;xyz#gmail.com;ghi#hotmail.com and so on
Now I want my output to be:
abc
def
xyz
ghi
The following is my code:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main(string[] args)
{
string str;
string[] newstr,newstr2;
Console.WriteLine("Enter the email addresses: ");
str=Console.ReadLine();
newstr=Regex.Split(str,",|;|#");
foreach (string s in newstr)
{
Console.WriteLine(s);
}
}
}
My output right now is:
abc
gmail.com
def
yahoo.com
xyz
gmail.com
ghi
hotmail.com
Any kind of help would be greatly appreciated. Thanks.
You shouldn't use regex for the split, and you should not split by #. Instead, use the following code:
using System;
public class Program
{
public static void Main(string[] args)
{
string str;
string[] newstr;
Console.WriteLine("Enter the email addresses: ");
str = Console.ReadLine();
newstr = str.Split(new char[] { ',', ';' }); // Split to get a temporary array of addresses
foreach (string s in newstr)
{
Console.WriteLine(s.Substring(0, s.IndexOf('#'))); // Extract the sender from the email addresses
}
}
}
Edit:
Or, with LINQ:
using System;
using System.Linq;
public class Program
{
public static void Main(string[] args)
{
string str;
string[] newstr;
Console.WriteLine("Enter the email addresses: ");
str = Console.ReadLine();
newstr = str.Split(new char[] { ',', ';' }) // Split to get an array of addresses to work with
.Select(s => s.Substring(0, s.IndexOf('#'))).ToArray(); // Extract the sender from the email addresses
foreach (string s in newstr)
{
Console.WriteLine(s);
}
}
}
Another approach without RegEx:
string input = "abc#gmail.com,def#yahoo.com;xy#gmail.com; ghi#hotmail.com";
var result = input.Split(',', ';').Select(x => x.Split('#').First());
First split the addresses by , and ;, then select the part before the # by splitting again.
You can use this email regex:
var regex = new Regex(@"(?<name>\w+([-+.']\w+)*)#\w+([-.]\w+)*\.\w+([-.]\w+)*");
var results =
regex.Matches("abc#gmail.com,def#yahoo.com;xyz#gmail.com;ghi#hotmail.com")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
Perhaps using this might help
str.Substring(0, str.LastIndexOf(" ")<0?0:str.LastIndexOf(" "));
As email addresses have a complex definition, I would never assume that something containing an # is an email address.
My best attempt would be to convert the string to a MailAddress, in case it looks like an address but is not one because of an invalid character or similar.
string input = "abc#gmail.com,ghi#hotmail.com;notme; #op this is not a mail!";
var result = input
.Split(',', ';') // Split
.Select(x =>
{
string adr = "";
try
{ // Create an MailAddress, MailAddress has no TryParse.
adr = new MailAddress(x).User;
}
catch
{
return new { isValid = false, mail = adr };
}
return new { isValid = true, mail = adr };
})
.Where(x => x.isValid)
.Select(x => x.mail);
Actually, to capture a substring with a regular expression, you need to wrap the expected content in ( and ).
The code below should work:
string str22 = "abc#gmail.com;def#yahoo.com,xyz#gmail.com;fah#yao.com,h347.2162#yahoo.com.hk";// ghi#hotmail.com";
List<string> ret = new List<string>();
string regExp = @"(.*?)#.*?[,;]{1}|(.*)#";
MatchCollection matches = Regex.Matches(str22, regExp, RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
if (match.Success)
{
int pvt = 1;
while (string.IsNullOrEmpty(match.Groups[pvt].Value))
{
pvt++;
}
MessageBox.Show(match.Groups[pvt].Value);
}
}
return;
The regular expression is as follows:
(.*?)#.*?[,;]{1}|(.*)#
The first alternative, (.*?)#.*?[,;]{1}, captures the substring before #; the ? makes each quantifier lazy, so it stops at the first possible match.
The last email is not followed by , or ;, so the second alternative, (.*)#, picks up its name as the substring before #.
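A single pattern without the alternation can do the same job. The sketch below is an assumption-laden variant, not the answer's method: it matches a run of characters that are not a separator, a # or whitespace, and requires a # right after it via a lookahead. The names LocalParts and Extract are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LocalParts
{
    // Returns the part before '#' of every address in a list separated by ',' or ';'.
    // [^,;#\s]+ is one run of non-separator characters; (?=#) keeps only runs
    // that are immediately followed by '#', which excludes the domain parts.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"[^,;#\s]+(?=#)")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Because the whole match is already the wanted text, there is no need to search for the first non-empty group as in the loop above.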

RegEx get highlighted text C#

I am trying to get from this text:
updateServer
cInterface_test-184
cServer_test-184
dControl_test-178
mcontrol_doorinterface_test-178
the italic strings, e.g. test-184 in the second line. I am using the following regex:
test+.*
But how can I extract test-184 into a string in C# WPF?
Thanks :)
EDIT: The stars are not in the text; I only used them to highlight the text I want.
After 'test-', is there always a number?
If so, the following is the best for you:
static void Main(string[] args)
{
var text = @"
updateServer
cInterface_*test-184*
cServer_*test-184*
dControl_*test-178*
mcontrol_doorinterface_*test-178*
";
var pattern = @"test-\d*";
foreach (Match match in Regex.Matches(text, pattern))
Console.WriteLine(match);
}
If 'test-' is not a constant, you can use the following code:
static void Main(string[] args)
{
var text = @"
updateServer
cInterface_*test-184*cServer_*test-184*
cServer_*test-184*
dControl_*test-178*
mcontrol_doorinterface_*test-178*
";
var pattern = @"\*(.*?)\*";
foreach (Match match in Regex.Matches(text, pattern))
Console.WriteLine(match.Groups[1].Value);
}
You could use code like this:
class Program
{
static void Main(string[] args)
{
var text = @"
updateServer
cInterface_*test-184*
cServer_*test-184*
dControl_*test-178*
mcontrol_doorinterface_*test-178*
";
foreach (Match match in Regex.Matches(text, "\\*(.*)\\*"))
{
Console.WriteLine(match.Groups[1].Value);
}
}
}
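The two star-based answers differ only in greediness: one uses \*(.*?)\* and the other \*(.*)\*. The following sketch, with the hypothetical helper StarDemo.Capture, shows how that matters when one line holds two starred values, as in the second answer's sample text.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class StarDemo
{
    // Returns the first capture group of every match of pattern in line.
    public static string[] Capture(string line, string pattern)
    {
        return Regex.Matches(line, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}

// Usage, for a line that contains two starred values:
//   var line = "cInterface_*test-184*cServer_*test-184*";
//   StarDemo.Capture(line, @"\*(.*?)\*");  // lazy: "test-184", "test-184"
//   StarDemo.Capture(line, @"\*(.*)\*");   // greedy: "test-184*cServer_*test-184"
```

With only one starred value per line, both patterns give the same result, because . does not match a newline by default.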

C# Regex extract certain digits from a number

I am trying to extract the digits from 10:131186; and get 10131186 without the : and ;.
What is the Regex pattern I need to create?
var input = "10:131186;";
string pattern = ":(.*);";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Value);
With the above code, I am getting :131186; instead of 10131186.
Why do you need Regex here? It is slower than the string.Replace method:
string input = "10:131186;";
input = input.Replace(":", "");
input = input.Replace(";", "");
Console.WriteLine(input);
You can try using Regex.Replace:
var input = "10:131186;";
string pattern = @"(\d+):(\d+);";
string res = Regex.Replace(input, pattern, "$1$2");
Console.WriteLine(res);
and you can also use Split with Join:
var input = "10:131186;";
Console.WriteLine(string.Join("", input.Split (new char[] { ':', ';' }, StringSplitOptions.RemoveEmptyEntries)));
Please try this..
string input = "10:131186;";
input = input.Replace(":", String.Empty).Replace(";", string.Empty);
Just print the group index 1.
var input = "10:131186;";
string pattern = ":(.*);";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups[1].Value);
or use assertions.
var input = "10:131186;";
string pattern = "(?<=:).*?(?=;)";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Value);
You could use the pattern \\d+ to match digits in the string and concatenate them into a single string.
using System;
using System.Text;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string input = "10:131186;";
MatchCollection mCol = Regex.Matches(input, "\\d+");
StringBuilder sb = new StringBuilder();
foreach (Match m in mCol)
{
sb.Append(m.Value);
}
Console.WriteLine(sb);
}
}
Results:
10131186
Demo
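A shorter variant of the same idea is to delete everything that is not a digit in one call. This is a sketch only; DigitsOnly and Extract are hypothetical names, and \D is the standard class for "any non-digit character".

```csharp
using System.Text.RegularExpressions;

public static class DigitsOnly
{
    // Removes every non-digit character, so "10:131186;" becomes "10131186".
    public static string Extract(string input)
    {
        return Regex.Replace(input, @"\D", string.Empty);
    }
}
```

Unlike the (\d+):(\d+); pattern, this does not depend on where the : and ; appear, which may or may not be what you want.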

Regex extract encoding type from given strings

Following a recent thread on Stack Overflow, I'm posting a new question:
I have several strings from which I want to extract the encoding type.
I'm willing to do it using regex:
Examples:
utf-8 quoted printable
string str = "=?utf-8?Q?=48=69=67=68=2d=45=6e=64=2d=44=65=73=69=67=6e=65=72=2d=57=61=74=63=68=2d=52=65=70=6c=69=63=61=73=2d=53=61=76=65=2d=54=48=4f=55=53=41=4e=44=53=2d=32=30=31=32=2d=4d=6f=64=65=6c=73?=";
utf-8 Base 64
string fld4 = "=?utf-8?B?VmFsw6lyaWUgTWVqc25lcm93c2tp?= <Valerie.renamed#company.com>";
Windows 1258 Base 64
string msg2= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
iso-8859-1 Quoted printable
string fld2 = "=?iso-8859-1?Q?Fr=E9d=E9ric_Germain?= <Frederic.Germain#company.com>";
etc...
In order to write a generic decoding function, we need to extract:
the charset (utf-8, windows-1258, etc.)
the transfer encoding type (quoted printable or base 64)
the encoded string
Any idea how to extract the pattern between ?xxx?Q? or ?xxx?B?
Note: this can be uppercase or lowercase
Thanks.
Here is a Rubular that will do it for you. In short, the regex =\?(.*?)\?[QBqb] will grab the encoding. Note that the third example you gave contains two matches, so decide what you want to do with the second one.
Here is a full working solution
public class Encoded
{
public string Charset;
public string ContentTransfertEncoding;
public string Data;
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
public class Decoding
{
public Decoding()
{
}
public List<Encoded> Process(string data)
{
List<Encoded> list = new List<Encoded>();
var occurences = new Regex(@"=\?[a-zA-Z0-9?=-]*\?[BbQq]\?[a-zA-Z0-9?=-]*\?=", RegexOptions.IgnoreCase);
var matches = occurences.Matches(data);
foreach (Match match in matches)
{
Encoded cls = new Encoded();
cls.Data = match.Groups[0].Value;
cls.Charset = GetCharset(cls.Data);
cls.ContentTransfertEncoding = GetContentTransfertEncoding(cls.Data);
// cleanup data
int pos = cls.Data.IndexOf("=?");
pos = cls.Data.IndexOf("?",pos+ 2);
cls.Data = cls.Data.Substring(pos + 3);
cls.Data = cls.Data.Replace("?=", "");
list.Add(cls);
}
return list;
}
private string GetContentTransfertEncoding(string data)
{
var occurences = new Regex(@"=\?(.*?)\?[QBqb]", RegexOptions.IgnoreCase);
var matches = occurences.Matches(data);
foreach (Match match in matches)
{
int pos = match.Groups[0].Value.LastIndexOf('?');
return match.Groups[0].Value.Substring(pos+1);
}
return data;
}
public string GetCharset(string data)
{
var occurences = new Regex(@"=\?(.*?)\?[QBqb]", RegexOptions.IgnoreCase);
var matches = occurences.Matches(data);
foreach (Match match in matches)
{
string str1 = match.Groups[0].Value.Replace("=?", "");
int pos = str1.IndexOf('?');
str1 = str1.Substring(0, pos);
return str1; // there should be only 1 match
}
return data;
}
public string Decodeetc...()
}
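The three pieces the question asks for (charset, transfer encoding, encoded string) can also be pulled out in one pass with named groups. The sketch below is not the code above: EncodedWord, EncodedWordParser and the group names are invented for illustration, and the pattern assumes the usual =?charset?B-or-Q?data?= layout with no ? inside the charset or the data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one =?charset?encoding?data?= block.
public class EncodedWord
{
    public string Charset;
    public string Encoding; // "B" (base 64) or "Q" (quoted printable), upper-cased
    public string Data;
}

public static class EncodedWordParser
{
    // Three named groups, separated by literal '?' characters.
    private static readonly Regex Pattern = new Regex(
        @"=\?(?<charset>[^?]+)\?(?<encoding>[BbQq])\?(?<data>[^?]*)\?=");

    // Returns one entry per encoded block found in the input, in order.
    public static List<EncodedWord> Parse(string input)
    {
        var words = new List<EncodedWord>();
        foreach (Match m in Pattern.Matches(input))
        {
            words.Add(new EncodedWord
            {
                Charset = m.Groups["charset"].Value,
                Encoding = m.Groups["encoding"].Value.ToUpperInvariant(),
                Data = m.Groups["data"].Value
            });
        }
        return words;
    }
}
```

For the windows-1258 example this returns two entries, one per encoded block, so the caller can decode each block and concatenate the results.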

Get the value between the apostrophes in C#

I want to extract the value between the apostrophes, for example from this string: package: name='com.app' versionCode='4' versionName='1.3' This is what "aapt" returns when developing android apps. I have to get the values com.app, 4, and 1.3. I'd appreciate any help :)
I found this, however this is VBA.
This regex should work on all cases, assuming that the ' character only occurs as the enclosing character for values:
string input = "package: name='com.app' versionCode='4' versionName='1.3'";
string[] values = Regex.Matches(input, @"'(?<val>.*?)'")
.Cast<Match>()
.Select(match => match.Groups["val"].Value)
.ToArray();
string strRegex = @"(?<==\')(.*?)(?=\')";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"package: name='com.app' versionCode='4' versionName='1.3'";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
RegEx Hero sample here.
In case you're interested, here's a translation of that VBA you linked to:
public static void Test1()
{
string sText = "this {is} a {test}";
Regex oRegExp = new Regex(@"{([^\}]+)", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
MatchCollection oMatches = oRegExp.Matches(sText);
foreach (Match Text in oMatches)
{
Console.WriteLine(Text.Value.Substring(1));
}
}
Also in VB.NET:
Sub Test1()
Dim sText = "this {is} a {test}"
Dim oRegExp = New Regex("{([^\}]+)", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
Dim oMatches = oRegExp.Matches(sText)
For Each Text As Match In oMatches
Console.WriteLine(Mid(Text.Value, 2, Len(Text.Value)))
Next
End Sub
