Regex not working in .NET - c#

I'm trying to match a regex and I'm fairly new at this. I used a validator and the pattern works there, but not when it's placed in the code-behind of a .NET 2.0 C# page.
The offending code is supposed to be able to split on a single semi-colon but not on a double semi-colon. However, when I used the string
"entry;entry2;entry3;entry4;"
I get a nonsense array that contains empty values, the last letter of the previous entry, and the semi-colons themselves. The online javascript validator splits it correctly. Please help!
My regex:
((;;|[^;])+)

Split on the following regular expression:
(?<!;);(?!;)
It means match semicolons that are neither preceded nor succeeded by another semicolon.
For example, this code
var input = "entry;entry2;entry3;entry4;";
foreach (var s in Regex.Split(input, @"(?<!;);(?!;)"))
    Console.WriteLine("[{0}]", s);
produces the following output:
[entry]
[entry2]
[entry3]
[entry4]
[]
The final empty field is a result of the semicolon on the end of the input.
If the semicolon is a terminator at the end of each field rather than a separator between consecutive fields, then use Regex.Matches instead:
foreach (Match m in Regex.Matches(input, @"(.+?)(?<!;);(?!;)"))
    Console.WriteLine("[{0}]", m.Groups[1].Value);
to get
[entry]
[entry2]
[entry3]
[entry4]
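The sample input above contains no doubled semicolon, so it does not show the case the question is really about. Here is a minimal sketch of how the lookaround pattern behaves on such input, next to a plain split on every semicolon. The input "a;;b;c" and the names SplitComparison, SplitOnSingleSemicolons and SplitOnEverySemicolon are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Splits only on semicolons that have no semicolon directly before or after them.
    public static string[] SplitOnSingleSemicolons(string input)
    {
        return Regex.Split(input, @"(?<!;);(?!;)");
    }

    // Splits on every semicolon and discards the empty fields this produces.
    public static string[] SplitOnEverySemicolon(string input)
    {
        return input.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "a;;b;c";
        Console.WriteLine(string.Join("|", SplitOnSingleSemicolons(input))); // a;;b|c
        Console.WriteLine(string.Join("|", SplitOnEverySemicolon(input)));   // a|b|c
    }
}
```

The first call keeps the doubled semicolon inside its field, while the second treats it as two separators.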

Why not use String.Split on the semicolon?
string sInput = "Entry1;entry2;entry3;entry4";
string[] sEntries = sInput.Split(';');
// Do what you have to do with the entries in the array...
Hope this helps,
Best regards,
Tom.

As tommieb75 wrote, you can use String.Split. With the StringSplitOptions enumeration you can control whether empty entries are kept in the resulting array:
string input = "entry1;;entry2;;;entry3;entry4;;";
char[] charSeparators = new char[] {';'};
// Split a string delimited by characters and return all non-empty elements.
string[] result = input.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
The result would contain only 4 elements like this:
<entry1><entry2><entry3><entry4>

Related

Get the string from known index till \r\n is found

I have a string s which reads my batch file content.
Suppose the content of s is as follows:
"\t\r\n##echo off\r\necho \"Hello World!!!\"\r\necho \"One\"\r\nset /p DUMMY=Hit ENTER to continue...\r\ncall second.bat\r\necho \"done!!!\"\r\ncall third.bat\r\necho \"done 3!!!\""
I want to write a condition that does the following:
while (s.Contains("call")) && (if string next to "call" contains(.bat))
How can I achieve this?
I am new to C#. Please help me in this regard.
Thanks in advance.
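You can split the string on new lines and process only the lines you want as follows (the array overload of Split is used here because it is available on every framework version; Where needs using System.Linq):
foreach (string line in s.Split(new[] { "\r\n" }, StringSplitOptions.None).Where(x => x.ToLower().StartsWith("call") && x.ToLower().EndsWith(".bat")))
{
    // do stuff here
}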
It seems that you are parsing some kind of log; in this case I suggest using regular expressions, e.g.
using System.Text.RegularExpressions;
...
string source =
"\t\r\n##echo off\r\necho \"Hello World!!!\"\r\necho \"One\"\r\nset /p DUMMY=Hit ENTER to continue...\r\ncall second.bat\r\necho \"done!!!\"\r\ncall third.bat\r\necho \"done 3!!!\"";
var matches = Regex
.Matches(source, @"call.+?\.bat", RegexOptions.IgnoreCase)
.OfType<Match>()
.Select(match => match.Value);
// call second.bat
// call third.bat
foreach (string match in matches) {
...
}
It's unclear what "string next to" means; in the code above I've treated it as "anything after call on the same line". If it means "a file name that follows call after one or more white spaces", the pattern will be
.Matches(source, @"call\s+\S+?\.bat", RegexOptions.IgnoreCase)
The first thing that comes to my mind is using the text.Split('\n', '\r') method. This way you get an array of strings separated at those line break characters. Because each \r\n pair produces an empty string, you should also filter those out. For that, I would recommend converting the array to a list, iterating through all elements and removing the empty strings (consider using string.IsNullOrEmpty(text)), or passing StringSplitOptions.RemoveEmptyEntries to Split.
If you always have \r\n, you can use text.Split(new[] { "\r\n" }, StringSplitOptions.None) instead, and you don't have to worry about the empty string between \r and \n. You could still convert the result to a list for easier use.
Then you get a list of the entire content, one entry per line. Now you can loop through that and do whatever you want.
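The split-and-filter approach described above could be sketched like this. The names BatchCalls and FindCallLines are hypothetical, and the check is a simple prefix/suffix test, so it would also accept a line such as "callother.bat"; treat it as a starting point, not a parser for batch files.

```csharp
using System;
using System.Collections.Generic;

public static class BatchCalls
{
    // Returns the trimmed lines that start with "call" and end with ".bat" (case-insensitive).
    public static List<string> FindCallLines(string text)
    {
        var result = new List<string>();
        // Splitting on both characters and dropping empty entries handles \r\n, \n and \r alike.
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.StartsWith("call", StringComparison.OrdinalIgnoreCase) &&
                trimmed.EndsWith(".bat", StringComparison.OrdinalIgnoreCase))
            {
                result.Add(trimmed);
            }
        }
        return result;
    }

    public static void Main()
    {
        string s = "echo \"One\"\r\ncall second.bat\r\necho \"done!!!\"\r\ncall third.bat\r\n";
        foreach (string line in FindCallLines(s))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the sample string in Main this prints "call second.bat" and "call third.bat", one per line.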

Splitting of a string using Regex

I have string of the following format:
string test = "test.BO.ID";
My aim is to get the part of the string that comes after the first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Checking for the first occurrence and take whatever comes after dot
var output = Regex.Match(test, @"^(?=.).*?");
The output I am getting is empty.
What is the modification I need to make it for Regex?
You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookbehind, to get the string as a Match.Value)
(?<=\.).*
A non-regex approach is to use String.Split with the count argument:
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.Length > 1)
    Console.WriteLine(res[1]);
If you only want the part after the first dot you don't need a regex at all (note the + 1, which skips the dot itself):
x.Substring(x.IndexOf('.') + 1)
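One thing to watch with the IndexOf approach is that IndexOf returns -1 when the string contains no dot, and that the text after the dot starts one position past the index it returns. Below is a small sketch that handles both cases. The names DotHelper and AfterFirstDot are made up, and returning an empty string when there is no dot is just an assumption about what you want in that case.

```csharp
using System;

public static class DotHelper
{
    // Returns everything after the first dot, or an empty string when there is no dot.
    public static string AfterFirstDot(string value)
    {
        int index = value.IndexOf('.');
        return index < 0 ? string.Empty : value.Substring(index + 1);
    }

    public static void Main()
    {
        Console.WriteLine(AfterFirstDot("test.BO.ID")); // BO.ID
    }
}
```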

Split the string based on regular expression

I have an array of strings like Name, Groups[0].Id, Types[11].Name.
I want to filter the strings that have square brackets and split them into two parts. For example, Groups[0].Id into Groups and Id.
How can I find the strings that have square brackets using a regular expression?
You can try this
Regex.Split(input, @"\[.*?\][.]");
Just for splitting a single string like
string value = "Groups[0].Id";
use
string[] parts = Regex.Split(value, @"\[\d+\]\.");
Explanation: you have to escape the square bracket and dot characters with a backslash (they have special meanings within a regular expression) and \d+ will accept only a string of number digits ('0'..'9') with at least one digit.
Links:
A nice .NET regex test page is http://regexhero.net/
MSDN documentation on Regex: http://msdn.microsoft.com/en-us/library/8yttk7sy.aspx
I'm not sure whether you wanted to split the strings, which is implied by your question title, or filter the list, which seems to be what you're asking at the end. You can split each element of the array on the brackets and the period with this regex. It does not assume that the indices are digits alone; for example, it will also allow an array keyed by strings.
Regex.Split(a, @"\[[^\]]+\]\.");
You can use LINQ to filter the array in one line.
string[] ary = new string[3] {"Name", "Groups[0].Id", "Types[11].Name" };
ary = ary.Where(a => Regex.Match(a, @"\[[^\]]+\]\.").Success).ToArray();
foreach (string str in ary)
{
    Console.WriteLine(str);
}

Regex: C# extract text within double quotes

I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
you
questions
Try this regex:
\"[^\"]*\"
or
\".*?\"
Explanation:
[^ character_group ]
Negation: Matches any single character that is not in character_group.
*?
Matches the previous element zero or more times, but as few times as possible.
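and some sample code (Groups[1] holds the text between the quotes; match.ToString() would include the quotes themselves):
foreach (Match match in Regex.Matches(inputString, "\"([^\"]*)\""))
    Console.WriteLine(match.Groups[1].Value);
// or in LINQ
var result = from Match match in Regex.Matches(inputString, "\"([^\"]*)\"")
             select match.Groups[1].Value;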
Based on @Ria's answer:
static void Main(string[] args)
{
    string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
    var reg = new Regex("\".*?\"");
    var matches = reg.Matches(str);
    foreach (var item in matches)
    {
        Console.WriteLine(item.ToString());
    }
}
The output is:
"you"
"questions"
You can use string.TrimStart('"') and string.TrimEnd('"'), or simply string.Trim('"'), to remove the double quotes if you don't want them.
I like the regex solutions. You could also think of something like this
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var stringArray = str.Split('"');
Then take the elements at odd indices from the array. If you use LINQ, you can do it like this:
var stringArray = str.Split('"').Where((item, index) => index % 2 != 0);
This also steals the regex from @Ria, but lets you get the matches into a collection and then remove the quotes:
string strText = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
MatchCollection mc = Regex.Matches(strText, "\"([^\"]*)\"");
for (int z = 0; z < mc.Count; z++)
{
    Response.Write(mc[z].ToString().Replace("\"", ""));
}
I combine Regex and Trim:
const string searchString = "This is a \"search text\" and \"another text\" and not \"this text";
var collection = Regex.Matches(searchString, "\\\"(.*?)\\\"");
foreach (var item in collection)
{
    Console.WriteLine(item.ToString().Trim('"'));
}
Result:
search text
another text
Try this (\"\w+\")+
I suggest downloading Expresso:
http://www.ultrapico.com/Expresso.htm
I needed to do this in C# for parsing CSV and none of these worked for me so I came up with this:
\s*(?:(?:(['"])(?<value>(?:\\\1|[^\1])*?)\1)|(?<value>[^'",]+?))\s*(?:,|$)
This will parse out a field with or without quotes and will exclude the quotes from the value while keeping embedded quotes and commas. <value> contains the parsed field value. Without using named groups, either group 2 or 3 contains the value.
There are better and more efficient ways to do CSV parsing and this one will not be effective at identifying bad input. But if you can be sure of your input format and performance is not an issue, this might work for you.
Slight improvement on the answer by @Ria:
\"[^\" ][^\"]*\"
Will recognize a starting double quote only when not followed by a space to allow trailing inch specifiers.
Side effect: It will not recognize "" as a quoted value.

Quick way of splitting a mixed alphanum string into text and numeric parts?

Say I have a string such as
abc123def456
What's the best way to split the string into an array such as
["abc", "123", "def", "456"]
string input = "abc123def456";
Regex re = new Regex(@"\D+|\d+");
string[] result = re.Matches(input).OfType<Match>()
    .Select(m => m.Value).ToArray();
string[] result = Regex.Split("abc123def456", "([0-9]+)");
The above will use any sequence of numbers as the delimiter, though wrapping it in () says that we still would like to keep our delimiter in our returned array.
Note: In the example snippet we will get an empty element as the last entry of our array.
The boundary you look for can be described as "A position where a digit follows a non-digit, or where a non-digit follows a digit."
So:
string[] result = Regex.Split("abc123def456", @"(?<=\D)(?=\d)|(?<=\d)(?=\D)");
Use [0-9] and [^0-9], respectively, if \d and \D are not specific enough.
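Add spaces around each run of digits, then split on the space. Note that .NET replacement strings refer to a group as $1, not \1, and that the trailing space has to be trimmed to avoid an empty last element:
Regex.Replace("abc123def456", @"(\d+)", " $1 ").Trim().Split(' ');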
I hope it works.
You could convert the string to a char array and then loop through the characters. As long as the characters are of the same type (letter or number) keep adding them to a string. When the next character no longer is of the same type (or you've reached the end of the string), add the temporary string to the array and reset the temporary string to null.
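The character loop described above might look roughly like this. The names AlphaNumSplitter and SplitRuns are made up, and "same type" is taken to mean "digit or not" as reported by char.IsDigit, which also accepts non-ASCII Unicode digits; a StringBuilder stands in for the temporary string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AlphaNumSplitter
{
    // Splits the input into consecutive runs of digits and non-digits.
    public static List<string> SplitRuns(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        foreach (char c in input)
        {
            // A change between digit and non-digit closes the current run.
            if (current.Length > 0 &&
                char.IsDigit(c) != char.IsDigit(current[current.Length - 1]))
            {
                parts.Add(current.ToString());
                current.Length = 0; // reset the buffer for the next run
            }
            current.Append(c);
        }
        // Add the last run once the end of the string is reached.
        if (current.Length > 0)
        {
            parts.Add(current.ToString());
        }
        return parts;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SplitRuns("abc123def456").ToArray())); // abc,123,def,456
    }
}
```

Unlike the Regex.Split variants, this never produces an empty element, and an empty input simply gives an empty list.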
