How can C# Regex capture everything between *| and |*? - c#

In C#, I need to capture variablename in the phrase *|variablename|*.
I've got this RegEx: Regex regex = new Regex(@"\*\|(.*)\|\*");
Online regex testers return "variablename", but in C# code, it returns *|variablename|*, or the string including the star and bar characters. Anyone know why I'm experiencing this return value?
Thanks much!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegExTester
{
    class Program
    {
        static void Main(string[] args)
        {
            String teststring = "This is a *|variablename|*";
            Regex regex = new Regex(@"\*\|(.*)\|\*");
            Match match = regex.Match(teststring);
            Console.WriteLine(match.Value);
            Console.Read();
        }
    }
}
// Outputs *|variablename|*, instead of variablename

match.Value contains the entire match. This includes the delimiters since you specified them in your regex. When I test your regex and input with RegexPal, it highlights *|variablename|*.
You want only the capture group (the part of the pattern in parentheses), so use match.Groups[1]:
String teststring = "This is a *|variablename|*";
Regex regex = new Regex(@"\*\|(.*)\|\*");
Match match = regex.Match(teststring);
Console.WriteLine(match.Groups[1]);
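
As a minimal sketch of the difference, the snippet below prints both match.Value and match.Groups[1].Value for the same match. The class and method names are made up for illustration. It also swaps in a lazy .*? so that two placeholders on one line are captured separately; that is an assumption about the input, not something the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class PlaceholderDemo
{
    // Lazy quantifier (assumption) so each *|...|* pair is matched on its own.
    private static readonly Regex Placeholder = new Regex(@"\*\|(.*?)\|\*");

    // Returns only the captured names, without the *| and |* delimiters.
    public static List<string> ExtractNames(string input)
    {
        var names = new List<string>();
        foreach (Match match in Placeholder.Matches(input))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        Match first = Placeholder.Match("This is a *|variablename|*");
        Console.WriteLine(first.Value);           // whole match, delimiters included
        Console.WriteLine(first.Groups[1].Value); // capture group only

        foreach (string name in ExtractNames("*|first|* and *|second|*"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Groups[0] is always the whole match, so numbered capture groups start at index 1.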

Related

Find exact substring in string array using LINQ in C#

I'm trying to check whether an exact substring exists in a string array. My code returns true when the substring exists in the string, but it also returns true when the string only contains a misspelling of it.
EDIT:
For example, if I check whether 'Connecticut' exists in the string array but it is spelled 'Connecticute', it still returns true, and I do not want it to. I want it to return false for 'Connecticute' and true for 'Connecticut' only.
Is there a way to do this using LINQ?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] sample = File.ReadAllLines(@"C:\samplefile.txt");
            /* Sample file containing data organised like
            Niall Gleeson 123 Fake Street UNIT 63 Connecticute 00703 USA
            */
            string[] states = File.ReadAllLines(@"C:\states.txt"); // Text file containing list of all US states
            foreach (string s in sample)
            {
                if (states.Any(s.Contains))
                {
                    Console.WriteLine("Found State");
                    Console.WriteLine(s);
                    Console.ReadLine();
                }
                else
                {
                    Console.WriteLine("Could not find State");
                    Console.WriteLine(s);
                    Console.ReadLine();
                }
            }
        }
    }
}
String.Contains returns true if the argument occurs anywhere within the string being searched.
Hence "Connecticute".Contains("Connecticut") will be true.
If you want exact matches, what you're looking for is String.Equals, which compares the whole string:
...
if (states.Any(s.Equals))
...
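You could use \b to match word boundaries (i.e. the position between a word character and whitespace, a period, the start or end of the string, etc.). Note the verbatim string: in a regular string literal, "\b" is a backspace character.
var r = new Regex(@"\bConnecticut\b", RegexOptions.IgnoreCase);
var m = r.Match("Connecticute");
Console.WriteLine(m.Success); // False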
Rather than using string.Contains, which matches whenever the string contains the sequence of letters, use a regular expression match with whatever pattern you consider appropriate. For example, this will match on word boundaries:
var x = new [] { "Connect", "Connecticute is a cute place", "Connecticut", "Connecticut is a nice place" };
x.Dump();
var p = new Regex(@"\bConnecticut\b", RegexOptions.Compiled);
x.Where(s=>p.IsMatch(s)).Dump();
This will match "Connecticut" and "Connecticut is a nice place" but not the other strings. Change the regex to suit your exact requirements.
(.Dump() is a LINQPad extension method; LINQPad is handy for experimenting with this sort of thing.)
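
To tie the word-boundary idea back to the LINQ call in the question, here is a rough sketch that checks one address line against a list of state names. The class name, the method name, and the two-element state list are placeholders for illustration, and the case-insensitive comparison is an assumption. Regex.Escape is used so that a state name is treated as literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class StateMatcher
{
    // True when any state name occurs in the line as a whole word or phrase.
    public static bool ContainsState(string line, string[] states)
    {
        return states.Any(state =>
            Regex.IsMatch(line,
                          @"\b" + Regex.Escape(state) + @"\b",
                          RegexOptions.IgnoreCase)); // case-insensitivity is an assumption
    }

    public static void Main()
    {
        // Stand-in for the contents of states.txt.
        string[] states = { "Connecticut", "New York" };

        Console.WriteLine(ContainsState("Niall Gleeson 123 Fake Street UNIT 63 Connecticute 00703 USA", states));
        Console.WriteLine(ContainsState("Niall Gleeson 123 Fake Street UNIT 63 Connecticut 00703 USA", states));
    }
}
```

The first line contains only the misspelling, so no state is found; the second contains the correctly spelled name as a whole word.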

Not able to get specific pattern inside a string

I want to find a specific pattern of substring inside a string. I can get part of the way there, but not exactly what I want to extract.
I am working on a console application. The code is below:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string item = @"wewe=23213123i18n("""", test. ),cstr(12),i18n("""",test3)hdsghwgdhwsgd)";
            item = @"MsgBox(I18N(CStr(539)," + "Cannot migrate to the same panel type.)" +", MsgBoxStyle.Exclamation, DOWNLOAD_CAPTION)";
            string reg1 = @"i18n(.*),(.*)\)";
            string strVal = Regex.Match(item, reg1, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase).Groups[0].Value;
            List<string> str = new List<string> ();
            str.Add(strVal);
            System.IO.File.WriteAllLines(@"C:\Users\E543925.PACRIM1\Desktop\Tools\Test.txt", str);
        }
    }
}
Expected output - I18N(CStr(539)," + "Cannot migrate to the same panel type.)
Actual output - I18N(CStr(539),Cannot migrate to the samepaneltype.),MsgBoxStyle.Exclamation, DOWNLOAD_CAPTION)
I need to change the regular expression. I tried, but without success.
I am new to regex and C#.
Please help.
Thanks in advance.
You want to make the .* lazy (i.e. match as few characters as possible) with .*?
(or perhaps make your regex something like "i18n\([^,)]*,[^)]*\)" instead).
If you want multiple matches, you should probably use a while loop.
This:
string item = @"wewe=23213123i18n("""", test. ),cstr(12),i18n("""",test3)hdsghwgdhwsgd)";
item = @"MsgBox(I18N(CStr(539)," + "Cannot migrate to the same panel type.)" +", MsgBoxStyle.Exclamation, DOWNLOAD_CAPTION)";
string reg1 = @"i18n(.*?),(.*?)\)";
Match match = Regex.Match(item, reg1, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
while (match.Success)
{
    string strVal = match.Groups[0].Value;
    Console.WriteLine(strVal);
    match = match.NextMatch();
}
Prints:
I18N(CStr(539),Cannot migrate to the same panel type.)
You can try this regex:
i18n(\([^\)]*\))
It means: match i18n, then capture a group that starts with an opening (, is followed by any characters except a closing ), and ends with a closing ).
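
A small sketch of how that negated-character-class pattern behaves on the two sample strings from the question. The class and method names are invented for illustration, and RegexOptions.IgnoreCase is carried over from the question's code. Because [^\)]* cannot cross a closing parenthesis, a nested call such as CStr(539) ends the match early.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class I18nDemo
{
    private static readonly Regex Call =
        new Regex(@"i18n(\([^\)]*\))", RegexOptions.IgnoreCase);

    // Returns the full text of every match found in the input.
    public static List<string> FindCalls(string input)
    {
        var calls = new List<string>();
        foreach (Match match in Call.Matches(input))
        {
            calls.Add(match.Value);
        }
        return calls;
    }

    public static void Main()
    {
        string flat = @"wewe=23213123i18n("""", test. ),cstr(12),i18n("""",test3)hdsghwgdhwsgd)";
        string nested = "MsgBox(I18N(CStr(539),Cannot migrate to the same panel type.), MsgBoxStyle.Exclamation, DOWNLOAD_CAPTION)";

        // Two complete i18n(...) calls are found here.
        foreach (string call in FindCalls(flat)) Console.WriteLine(call);

        // Here the match stops at the ) that closes CStr(539).
        foreach (string call in FindCalls(nested)) Console.WriteLine(call);
    }
}
```

So this pattern suits calls without nested parentheses; for the nested MsgBox line, the lazy version from the previous answer gets closer to the expected output.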

C# Regex Validating Mac Address

I am trying to validate MAC addresses. The input may or may not contain - or : separators; for example, a valid MAC would be any of:
000000000000
00-00-00-00-00-00
00:00:00:00:00:00
However, I keep getting false when I run the code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

namespace parsingxml
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Write("Give me a mac address: ");
            string input = Console.ReadLine();
            input = input.Replace(" ", "").Replace(":","").Replace("-","");
            Regex r = new Regex("^([:xdigit:]){12}$");
            if (r.IsMatch(input))
            {
                Console.Write("Valid Mac");
            }
            else
            {
                Console.Write("Invalid Mac");
            }
            Console.Read();
        }
    }
}
OUTPUT: Invalid Mac
.NET regex does not support POSIX character classes. Even in flavors that do support them, you need to enclose the class in [] for it to take effect, i.e. [[:xdigit:]]; otherwise it is treated as a character class containing the characters :, x, d, i, g, t.
You probably want this regex instead (for the MAC address after you have stripped the unwanted characters):
^[a-fA-F0-9]{12}$
Note that by stripping spaces, - and : from the string, you allow inputs such as the one below to pass:
34: 3-342-9fbc: 6:7
Try this regex instead:
^(?:(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}|(?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2}|(?:[0-9a-fA-F]{2}){5}[0-9a-fA-F]{2})$
Matches:
12-23-34-45-56-67
12:23:34:45:56:67
122334455667
But not:
12:34-4556-67
Edit: Your code works for me.
Seems like you could just do this:
Regex r = new Regex("^([0-9a-fA-F]{2}(?:(?:-[0-9a-fA-F]{2}){5}|(?::[0-9a-fA-F]{2}){5}|[0-9a-fA-F]{10}))$");
Or this, which is a lot simpler and would be a little more forgiving:
Regex r = new Regex("^([0-9a-fA-F]{2}(?:[:-]?[0-9a-fA-F]{2}){5})$");
I'd use a regular expression like this one, myself:
Regex rxMacAddress = new Regex( @"^[0-9a-fA-F]{2}(((:[0-9a-fA-F]{2}){5})|((-[0-9a-fA-F]{2}){5}))$") ;
6 pairs of hex digits, separated either by colons or by hyphens, but not a mixture.
Your regex isn't good. Here is a good one:
public const string ValidatorInvalidMacAddress = "^([0-9A-Fa-f]{2}[:-]?){5}([0-9A-Fa-f]{2})$";
The regex that worked for me, accepting a MAC address separated either entirely by - or entirely by :, is: "^[0-9a-fA-F]{2}(((:[0-9a-fA-F]{2}){5})|((-[0-9a-fA-F]{2}){5}))$"
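
For comparison, here is one possible sketch that accepts all three formats from the question (no separator, all hyphens, all colons) while rejecting mixed separators. It is a variant of the patterns above rather than any answerer's exact regex: it captures the first separator, which may be empty, and requires the same one everywhere else through the backreference \1. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class MacValidator
{
    // Group 1 captures the first separator (":", "-", or nothing);
    // \1 then forces the remaining four separators to be identical.
    private static readonly Regex MacPattern = new Regex(
        @"^[0-9a-fA-F]{2}([:-]?)[0-9a-fA-F]{2}(?:\1[0-9a-fA-F]{2}){4}$");

    public static bool IsValid(string input)
    {
        return input != null && MacPattern.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "000000000000",
            "00-00-00-00-00-00",
            "00:00:00:00:00:00",
            "12:34-4556-67",
            "34: 3-342-9fbc: 6:7"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsValid(sample));
        }
    }
}
```

Validating the raw input this way, instead of stripping separators first, avoids accepting strings like the last sample.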
using System.Net.NetworkInformation;

try
{
    PhysicalAddress py = PhysicalAddress.Parse("abcd");
}
catch (Exception)
{
    Console.WriteLine("Mac address not valid");
}

How to extract the useful data with regular expression in C#?

Sorry, it seems I didn't explain my question clearly. Please allow me to rephrase it.
I use WebClient to download a whole web page, and I get the content as a string:
"
.......
.....
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
......
";
In this content, I want only one line, which is
var picArr ="/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png"
Now I want to use a regular expression to find this line and get the value of picArr.
My regular expression is
var picArr ="([.]*)"
I thought the dot means any character, but it doesn't work. :(
Any ideas?
Thanks a lot
/picArr =\"([^\"]+)\"/
If I got this right that's what you need.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ExtractFileNames
{
    class Program
    {
        static void Main(string[] args)
        {
            string pageData = @"blah blah
var picArr =""/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png""
more blah decimal blah";
            var match = Regex.Match(pageData, @"var\s+picArr\s*=\s*""(.*?)""");
            var str = match.Groups[1].Value;
            var files = str.Split('|');
            foreach(var f in files)
            {
                Console.WriteLine(f);
            }
            Console.ReadLine();
        }
    }
}
Output:
/d/manhua/naruto/516/1.png
/d/manhua/naruto/516/2.png
/d/manhua/naruto/516/3.png
/d/manhua/naruto/516/4.png
/d/manhua/naruto/516/5.png
/d/manhua/naruto/516/6.png
/d/manhua/naruto/516/7.png
/d/manhua/naruto/516/8.png
/d/manhua/naruto/516/9.png
/d/manhua/naruto/516/10.png
/d/manhua/naruto/516/11.png
/d/manhua/naruto/516/12.png
/d/manhua/naruto/516/13.png
/d/manhua/naruto/516/14.png
/d/manhua/naruto/516/15.png
/d/manhua/naruto/516/16.png
If you just want to get the filenames, you could just do a split on the pipe:
var picArr = "/d/manhua/naruto/516/1.png|/d/manhua/naruto/516/2.png|/d/manhua/naruto/516/3.png|/d/manhua/naruto/516/4.png|/d/manhua/naruto/516/5.png|/d/manhua/naruto/516/6.png|/d/manhua/naruto/516/7.png|/d/manhua/naruto/516/8.png|/d/manhua/naruto/516/9.png|/d/manhua/naruto/516/10.png|/d/manhua/naruto/516/11.png|/d/manhua/naruto/516/12.png|/d/manhua/naruto/516/13.png|/d/manhua/naruto/516/14.png|/d/manhua/naruto/516/15.png|/d/manhua/naruto/516/16.png";
var splitPics = picArr.Split('|');
foreach (var pic in splitPics)
{
Console.WriteLine(pic);
}
It looks like you want the value of the string literal in your snippet, "/d/manhua/naruto/516/1.png|..."
Get rid of the square brackets. "." matches any character just as it is, without brackets. Square brackets are for matching a limited set of characters: For example, you'd use "[abc]" to match any "a", "b", or "c".
It looks like the brackets have the effect of escaping the ".", a feature I hadn't known about (or forgot, sometime in the Ordovician). But I tested the regex as you have it with the string value replaced with a series of dots, and the regex matched. It's being treated as a literal "." character, which you would more likely try to match with a backslash escape: "\."
So just get rid of the brackets and it should work. It works in VS2008 for me.
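
A quick sketch of the difference between [.] and . in practice. The helper name is made up and the input is a shortened stand-in for the real page content. With [.]* the group can only consume literal dots, so the pattern fails on a path; with .* it captures the value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class DotDemo
{
    // Returns the text captured by group 1, or null when the pattern does not match.
    public static string Capture(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened stand-in for the downloaded page content.
        string input = "var picArr =\"/d/1.png|/d/2.png\"";

        // [.] is a character class containing only a literal dot: no match here.
        Console.WriteLine(Capture("var picArr =\"([.]*)\"", input) ?? "(no match)");

        // An unbracketed . matches any character except a newline.
        Console.WriteLine(Capture("var picArr =\"(.*)\"", input) ?? "(no match)");

        // The bracketed form does match when the value consists of dots.
        Console.WriteLine(Capture("var picArr =\"([.]*)\"", "var picArr =\"...\"") ?? "(no match)");
    }
}
```

If the page could contain more than one quoted string on the same line, the lazy .*? or the [^"]+ form from the earlier answers would be the safer choice, since a greedy .* runs to the last quote.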

C# Regex, Either Or

I have a string that I parse in regex:
"one [two] three [four] five"
I have a regex that extracts the bracketed text into <bracket>, but now I want to add the other stuff (one, three, five) into <text>, and I want these to be separate matches.
So either it is a match for <text> or a match for <bracket>. Is this possible using regex?
So the list of matches would look like:
text=one, bracketed=null
text=null, bracketed=[two]
text=three, bracketed=null
text=null, bracketed=[four]
text=five, bracketed=null
Is this what you're after? Basically | is used for alternation in regular expressions.
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        string test = "one [two] three [four] five";
        Regex regex = new Regex(@"(?<text>[a-z]+)|(?<bracketed>\[[a-z]+\])");
        Match match = regex.Match(test);
        while (match.Success)
        {
            Console.WriteLine("text: {0}; bracketed: {1}",
                match.Groups["text"],
                match.Groups["bracketed"]);
            match = match.NextMatch();
        }
    }
}
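
In the code above, the group that did not take part in a match prints as an empty string. If the literal null shown in the question's list is wanted, one option is to check Group.Success, as in this sketch (the class and method names are assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class EitherOrDemo
{
    private static readonly Regex Token =
        new Regex(@"(?<text>[a-z]+)|(?<bracketed>\[[a-z]+\])");

    // Produces one line per match, writing "null" for the group that did not participate.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match match in Token.Matches(input))
        {
            Group text = match.Groups["text"];
            Group bracketed = match.Groups["bracketed"];
            lines.Add(string.Format("text={0}, bracketed={1}",
                text.Success ? text.Value : "null",
                bracketed.Success ? bracketed.Value : "null"));
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("one [two] three [four] five"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Each match satisfies exactly one side of the alternation, so exactly one of the two groups succeeds per match.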
