I want to write a regex to pull all the variable names and their values out of a variable declaration statement. Say I have
int i,k = 10,l=0
I want to write a regex something like int\s^,?|(^,?)*
but this will also accept
k = 10, i.e. without the int preceding it.
Basically, the idea is:
If the string starts with int, then
get the variable list separated by ,
I know how to extract CSV values, but here my string has some initial values as well. How can I resolve this?
Start thinking about the structure of a definition, say,
(a line can start with some spaces) followed by,
(Type) followed by
(at least one space)
(variable_1)
(optionally
    (comma          // next var
     |
     '='number      // initialization
    ) ...
then try to convert each group:
^      \s*     \w+           \s+        \w+           ?           (','  |   '='   \d+     ) ...
line   some    type          at least   var           optionally  more  or  init  some
start  spaces  (some chars)  one space  (some chars)              vars      val   digits
Left as homework to remove spaces and fix up the final regex.
Here is some useful information which you can use
http://compsci.ca/v3/viewtopic.php?t=6712
You could build up your regular expression from the [C# Grammar](http://msdn.microsoft.com/en-us/library/aa664812(VS.71).aspx). But building a parser would certainly be better.
Try this:
^(int|[sS]tring)\s+\w+\s*(=\s*[^,]+)?(,\s*\w+\s*(=\s*[^,]+)?)*$
It'll match your example code
int i,k = 10,l=0
And making a few assumptions about the language you may or may not be using, it'll also match:
int i, j, k=10, l=0
string i=23, j, k=10, l=0
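To go one step further and actually pull out the names and values, which is what the question asks for, a two-pass approach may be easier to reason about than a single pattern: strip the leading int, split on commas, then match each piece. This is only a sketch. The DeclarationParser class and its Parse method are hypothetical names, and it assumes that initial values never contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class DeclarationParser
{
    // Returns name/value pairs; Value is null when a variable has no initializer.
    // Returns an empty list when the line does not start with "int" or is malformed.
    public static List<KeyValuePair<string, string>> Parse(string line)
    {
        var result = new List<KeyValuePair<string, string>>();

        // Pass 1: require the leading type and capture everything after it.
        Match header = Regex.Match(line, @"^\s*int\s+(.+)$");
        if (!header.Success)
            return result;

        // Pass 2: each comma-separated piece is "name" or "name = value".
        foreach (string part in header.Groups[1].Value.Split(','))
        {
            Match m = Regex.Match(part, @"^\s*(\w+)\s*(?:=\s*(.+?))?\s*$");
            if (!m.Success)
                return new List<KeyValuePair<string, string>>();

            string value = m.Groups[2].Success ? m.Groups[2].Value : null;
            result.Add(new KeyValuePair<string, string>(m.Groups[1].Value, value));
        }
        return result;
    }
}
```

For int i,k = 10,l=0 this yields i (no value), k with 10 and l with 0, while k = 10 on its own yields an empty list because the int prefix is missing.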
Related
I've been trying to figure this out, but I don't think I understand Regex well enough to get to where I need to.
I have string that resemble these:
filename.txt(1)attribute, 2)attribute(s), more!)
otherfile.txt(abc, def)
Basically, a string that always starts with a filename, then has some text between parentheses. And I'm trying to extract that part which is between the main parentheses, but the text that's there can contain absolutely anything, even some more parentheses (it often does.)
Originally, there was a 'hacky' expression made like this:
/\(([^#]+)\)/g
And it worked, until we ran into a case where the input string contained a # and we were stuck. Obviously...
I can't change the way the strings are generated, it's always a filename, then some parentheses and something of unknown length and content inside.
I'm hoping for a simple Regex expression, since I need this to work in both C# and in Perl -- is such a thing possible? Or does this require something more complex, like its own parsing method?
Replace the [^#] class, which excludes the # symbol, with . so that any character is matched, and quantify it with * so that it matches zero or more characters. You can also simplify the regex by dropping the capturing group:
\(.*\)
Here is the explanation for the regular expression:
Symbol \( matches the character ( literally.
. matches any character (except for line terminators)
* quantifier: matches between zero and unlimited times, as many times
as possible, giving back as needed (greedy)
\) matches the character ) literally.
You can use regex101 to compose and debug your regular expressions.
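As a rough C# illustration of the greedy behaviour described above, the sketch below keeps a capturing group so that the outer parentheses are left out of the result. The ParenExtractor class name is hypothetical, and RegexOptions.Singleline is an added assumption so that text containing line breaks is also handled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class ParenExtractor
{
    // Returns the text between the first "(" and the last ")", or null if there is none.
    public static string Inner(string input)
    {
        // Greedy .* runs to the end of the input and backtracks to the last ")".
        Match m = Regex.Match(input, @"\((.*)\)", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the sample strings from the question, Inner("otherfile.txt(abc, def)") gives abc, def and Inner("filename.txt(1)attribute, 2)attribute(s), more!)") gives 1)attribute, 2)attribute(s), more!.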
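Regex seems like overkill to me in this case. This can be achieved more reliably with string manipulation methods.
string subString = null; // stays null when there is no "(...)" pair
int first = str.IndexOf("(");
int last = str.LastIndexOf(")");
if (first != -1 && last > first) // last > first also rules out last == -1
{
    // Take everything between the first "(" and the last ")"
    subString = str.Substring(first + 1, last - first - 1);
}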
I've never used Perl, but I'll venture a guess that it has equivalent methods.
Well, I hope the title is not too confusing. My task is to match (and replace) all Xs that are between Y and Z.
I use X,Y,Z since those values may vary at runtime, but that's not a problem at all.
What I've tried so far is this:
pattern = ".*Y.*?(X).*?Z.*";
Which actually works, but only for one X. I simply can't figure out how to match all Xs between those "tags".
I also tried this:
pattern = @"((Y|\G).*?)(?!Z)(X)"
But this matches all Xs, ignoring the "tags".
What is the correct pattern to solve my problem? Thanks in advance :)
Edit
some more information:
X is a single char, Y and Z are strings
A more real life test string:
Some.text.with.dots [nodots]remove.dots.here[/nodots] again.with.dots
=> match .s between [nodots] and [/nodots]
(note: I used xml-like syntax here, but that's not guaranteed so I can unfortunately not use a simple xml or html parser)
In C#, if you need to replace some text inside some block of text, you may match the block(s) with a simple regex like (?s)(START)(.*?)(END) and then inside a match evaluator make the necessary replacements in the matched blocks.
In your case, you may use something like
var res = Regex.Replace(str, @"(?s)(\[nodots])(.*?)(\[/nodots])",
    m => string.Format(
        "{0}{1}{2}",
        m.Groups[1].Value,                  // Restoring start delimiter
        m.Groups[2].Value.Replace(".", ""), // Modifying inner contents
        m.Groups[3].Value                   // Restoring end delimiter
    )
);
Pattern details:
(?s) - an inline version of the RegexOptions.Singleline modifier flag
(\[nodots]) - Group 1: starting delimiter (literal string [nodots])
(.*?) - Group 2: any 0+ chars as few as possible
(\[/nodots]) - Group 3: end delimiter (literal string [/nodots])
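Since the question says that X, Y and Z may vary at runtime, the same idea can be sketched with the delimiters passed in as parameters and escaped with Regex.Escape, so that characters such as [ are treated literally. The DelimitedReplacer class and the RemoveBetween method are hypothetical names, and the sketch assumes target is a non-empty string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class DelimitedReplacer
{
    // Removes every occurrence of target that lies between start and end.
    // Assumes target is non-empty (string.Replace rejects an empty search string).
    public static string RemoveBetween(string input, string start, string end, string target)
    {
        // Escape the delimiters so regex metacharacters in them are matched literally.
        string pattern = "(?s)(" + Regex.Escape(start) + ")(.*?)(" + Regex.Escape(end) + ")";

        return Regex.Replace(input, pattern, m =>
            m.Groups[1].Value                        // start delimiter, unchanged
            + m.Groups[2].Value.Replace(target, "")  // inner contents, target removed
            + m.Groups[3].Value);                    // end delimiter, unchanged
    }
}
```

Calling RemoveBetween(text, "[nodots]", "[/nodots]", ".") on the test string from the question removes only the dots inside the [nodots] block and leaves the rest of the text alone.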
Okay, I give up - time to call upon the regex gurus for some help.
I'm trying to validate CSV file contents, just to see if it looks like the expected valid CSV data. I'm not trying to validate all possible CSV forms, just that it "looks like" CSV data and isn't binary data, a code file or whatever.
Each line of data comprises comma-separated words, each word comprising a-z, 0-9, and a small number of punctuation chars, namely - and _. There may be several lines in the file. That's it.
Here's my simple code:
const string dataWord = @"[a-z0-9_\-]+";
const string dataLine = "("+dataWord+@"\s*,\s*)*"+dataWord;
const string csvDataFormat = "("+dataLine+") | (("+dataLine+@"\r\n)*"+dataLine +")";
Regex validCSVDataPattern = new Regex(csvDataFormat, RegexOptions.IgnoreCase);
protected override bool IsCorrectDataFormat(string fileContents)
{
return validCSVDataPattern.IsMatch(fileContents);
}
This gives me a regex pattern of
(([a-z0-9_\-]+\s*,\s*)*[a-z0-9_\-]+) | ((([a-z0-9_\-]+\s*,\s*)*[a-z0-9_\-]+\r\n)*([a-z0-9_\-]+\s*,\s*)*[a-z0-9_\-]+)
However, if I present this with a block of, say, C# code, the regex parser says it is a match. How is that? The C# code doesn't look anything like my CSV pattern (it has punctuation other than _ and -, for a start).
Can anyone point out my obvious error? Let me repeat - I am not trying to validate all possible CSV forms, just my simple subset.
Your regular expression is missing the ^ (beginning of line) and $ (end of line) anchors. This means that it would match any text that contains what is described by the expression, even if the text contains other completely unrelated parts.
For example, this text matches the expression:
foo, bar
and therefore this text also matches:
var result = calculate(foo, bar);
You can see where this is going.
Add ^ at the beginning and $ at the end of csvDataFormat to get the behavior you expect.
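The effect of the missing anchors can be reproduced with a small sketch. It uses only the single-line part of the pattern from the question; the AnchorDemo class and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration.
public static class AnchorDemo
{
    // One line of comma-separated words, as in the question's dataLine.
    private const string Line = @"([a-z0-9_\-]+\s*,\s*)*[a-z0-9_\-]+";

    // Without anchors, IsMatch succeeds if the pattern is found anywhere in the text.
    public static bool Unanchored(string s)
    {
        return Regex.IsMatch(s, Line, RegexOptions.IgnoreCase);
    }

    // With ^ and $, the whole input has to fit the pattern.
    public static bool Anchored(string s)
    {
        return Regex.IsMatch(s, "^(" + Line + ")$", RegexOptions.IgnoreCase);
    }
}
```

Unanchored("var result = calculate(foo, bar);") returns true because a word such as var is found inside the text, whereas Anchored returns false for the same input and true for foo, bar.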
Here is a better pattern, which looks for one or more CSV groups such as XXX, or yyy in each line:
^([\w\s_\-]*,?)+$
^ - Start of each line
( - a CSV match group start
[\w\s_\-]* - Valid characters: \w (letters, digits and _), whitespace (\s) and - in each CSV
,? - maybe a comma
)+ - End of the csv match group, 1 to many of these expected.
$ - End of each line
That will validate a whole file, line by line for a basic CSV structure and allow for empty ,, situations.
I came up with this regex:
^([a-z0-9_\-]+)(\s*)(,\s*[a-z0-9_\-]+)*$
Tests
asbc_- , khkhkjh, lkjlkjlkj_-, j : PASS
asbc, : FAIL
asbc_-,khkhkjh,lkjlkjlk909j_-,j : PASS
If you want to match empty lines like ,,, or when some values are blank like ,abcd,, use
^([a-z0-9_\-]*)(\s*)(,\s*[a-z0-9_\-]*)*$
Loop through all the lines to see if the file is ok:
const string dataLine = @"^([a-z0-9_\-]+)(\s*)(,\s*[a-z0-9_\-]+)*$";
Regex validCSVDataPattern = new Regex(dataLine, RegexOptions.IgnoreCase);
protected override bool IsCorrectDataFormat(string fileContents)
{
string[] lines = fileContents.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
foreach (var line in lines)
{
if (!validCSVDataPattern.IsMatch(line))
return false;
}
return true;
}
I think this is what you're looking for:
@"(?in)^[a-z0-9_-]+( *, *[a-z0-9_-]+)*([\r\n]+[a-z0-9_-]+( *, *[a-z0-9_-]+)*)*$"
The noteworthy changes are:
Added anchors (^ and $), because the regex is totally pointless without them
Removed spaces (which have to match literal spaces, and I don't think that's what you intended)
Replaced the \s in every occurrence of \s* with a literal space (because \s can match any whitespace character, and you only want to match actual spaces in those spots)
The basic structure of your regex looked pretty good until that | came along and bollixed things up. ;)
p.s., In case you're wondering, (?in) is an inline modifier that sets IgnoreCase and ExplicitCapture modes.
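As a quick way to try the whole-file pattern above, here is a minimal sketch that wraps it in a method. The CsvShapeCheck class and LooksLikeCsv method are hypothetical names; the pattern itself is copied unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration.
public static class CsvShapeCheck
{
    // (?in) turns on IgnoreCase and ExplicitCapture inline, so no RegexOptions are needed.
    private static readonly Regex Pattern = new Regex(
        @"(?in)^[a-z0-9_-]+( *, *[a-z0-9_-]+)*([\r\n]+[a-z0-9_-]+( *, *[a-z0-9_-]+)*)*$");

    public static bool LooksLikeCsv(string fileContents)
    {
        return Pattern.IsMatch(fileContents);
    }
}
```

One thing to watch for: contents that end with a trailing "\r\n" are rejected by this pattern, because every line break must be followed by another data line, so it may be worth trimming the input first if that matters.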
I need to ensure that an input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But I am lost as to how to ensure there are no repeating characters.
Can someone suggest a method using regular expressions?
You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.
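The character-by-character approach could look roughly like this. The UniqueUpper class and IsValid method are hypothetical names, and rejecting the empty string is an assumption on my part, since the question does not say how it should be treated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class UniqueUpper
{
    // True when s contains only A-Z and no character occurs twice.
    public static bool IsValid(string s)
    {
        var seen = new HashSet<char>();
        foreach (char c in s)
        {
            if (c < 'A' || c > 'Z')
                return false;       // not an upper-case ASCII letter
            if (!seen.Add(c))
                return false;       // Add returns false when c was already seen
        }
        return s.Length > 0;        // assumption: an empty string is not valid
    }
}
```

This stops at the first offending character, so it never scans more of the string than it has to.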
This cannot be done with regular expressions, because they are context-free. You need at least a context-sensitive grammar, so the only way to achieve this is to write the function by hand.
See formal grammar for background theory.
Why not check instead for a character that is repeated or is not in upper case? With something like ([A-Z])?.*?([^A-Z]|\1)
Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.
This isn't regex, and it would be slow, but you could create an array of the characters in the string and then iterate through the array, comparing each element with every element after it.
=Waldo
It can be done using what is called a backreference.
I am a Java programmer, so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find()); // true: 'A' is repeated
final Matcher aMatcher2 = aPattern.matcher("ABCDE");
System.out.println(aMatcher2.find()); // false: no character is repeated
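The regular expression is ([A-Z]).*\1 (written as "([A-Z]).*\\1" in a Java string literal), which translates to: any character from 'A' to 'Z' captured as group 1 (([A-Z])), followed by anything (.*), followed by the contents of group 1 again (\1).
In C# the backreference inside the pattern is also written \1; $1 is only used in replacement strings.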
Hope this helps.
Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
which does the job by splitting the words on every occurrence of a ",". What I want to know is this: say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state machine implementation. Doesn't it work in the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the , and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also to match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
for (int i = 1; i < m.Groups.Count; i++) {
Group g = m.Groups[i];
if (g.Success) {
// matched text: g.Value
// match start: g.Index
// match length: g.Length
}
}
m = m.NextMatch();
}
I would expect it only to get v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
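In .NET specifically, a capturing group inside a repeated group keeps every value it matched in Group.Captures, so a pattern of this shape can return all fields from a single match. A minimal sketch, with the ValueList class and Extract method as hypothetical names and a $ anchor added as an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class ValueList
{
    // Returns the comma-separated values that follow "value_name", or an empty list.
    public static List<string> Extract(string line)
    {
        var values = new List<string>();
        Match m = Regex.Match(line, @"^value_name\s+(?:([^,]+),?)*$");
        if (!m.Success)
            return values;

        // Groups[1].Value holds only the last repetition; Captures holds all of them.
        foreach (Capture c in m.Groups[1].Captures)
            values.Add(c.Value.Trim());
        return values;
    }
}
```

Most other regex engines keep only the last repetition of a group, which is why the two-pass approach is the more portable option.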
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);