I have a lot of source code with SQL queries like the one below:
c_query := "SELECT * FROM TABLE WHERE FIELD_NAME_ONE[2] = 'AB' AND FIELD_NAME_TWO[1,8] = 'ABCDEFGH'"
I would like to match FIELD_NAME_ONE[2] and FIELD_NAME_TWO[1,8], but only where these patterns appear between double quotes (").
Edit
c_query := "SELECT * FROM TABLE WHERE FIELD_NAME_ONE[2] = 'AB' AND FIELD_NAME_TWO[1,8] = 'ABCDEFGH' AND TESTE[9] = 'XXXXXXXXX' AND FOO = '" + is_an_array[2] + "'"
It shouldn't match is_an_array[2] because it is not inside the double quotes.
I'm assuming you want to be able to match more than just those two specific fields, otherwise you wouldn't have gone to the trouble of applying regular expressions:
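using System;
using System.Text.RegularExpressions;
// sql holds the source text to search
// Step 1: every double-quoted literal; group 1 is the text between the quotes
var tokens = Regex.Matches(sql, "\"([^\"]+)\"");
foreach (Match token in tokens) {
    string str = token.Groups[1].Value;
    // Step 2: within a literal, NAME[n] or NAME[n,m,...]
    var fields = Regex.Matches(str, @"(\w+\[\d+(,\d+)*\])");
    foreach (Match field in fields)
        Console.WriteLine(field.Value);
}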
This will find any sequence of letters, digits and underscores followed by square brackets containing one or more comma-separated numbers.
If you only want to match a sequence of letters and underscores before the square brackets, amend the pattern to:
@"([a-zA-Z_]+\[\d+(,\d+)*\])"
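The two-step approach above can be wrapped in a helper and checked against the edited example from the question. This is a sketch, not a drop-in: ExtractFields is a hypothetical name, and it assumes the double quotes on each line are balanced and never escaped inside a literal.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every NAME[n] / NAME[n,m,...] reference that
// appears inside a double-quoted literal on one source line.
// Assumes quotes are balanced and not escaped inside the literals.
static List<string> ExtractFields(string line)
{
    var fields = new List<string>();
    // Step 1: each "..." literal; group 1 is the text between the quotes.
    foreach (Match literal in Regex.Matches(line, "\"([^\"]+)\""))
    {
        // Step 2: word characters followed by [digits] or [digits,digits,...].
        foreach (Match field in Regex.Matches(literal.Groups[1].Value, @"\w+\[\d+(?:,\d+)*\]"))
            fields.Add(field.Value);
    }
    return fields;
}

// The edited example from the question ("" inside a verbatim string is one ").
string source = @"c_query := ""SELECT * FROM TABLE WHERE FIELD_NAME_ONE[2] = 'AB' AND FIELD_NAME_TWO[1,8] = 'ABCDEFGH' AND TESTE[9] = 'XXXXXXXXX' AND FOO = '"" + is_an_array[2] + ""'""";

Console.WriteLine(string.Join(", ", ExtractFields(source)));
// FIELD_NAME_ONE[2], FIELD_NAME_TWO[1,8], TESTE[9]
```

is_an_array[2] sits between two separate literals, so step 1 never hands it to step 2.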
Here's a regex pattern for what you seek (edited based on comment):
".*?[A-Z_]+\[\d(,\d)?\].*?"
I made the match non-greedy (.*?) in case there are multiple matches on the same line (unlikely, but for completeness...).
Edited
It's not clear if you want to match just the target field names or the whole SQL. If you want to match just the field names alone, use non-capturing groups for the rest:
(?:".*?)[A-Z_]+\[\d(,\d)?\](?:.*?")
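A caveat about the non-capturing version, shown as a minimal .NET sketch against the first query in the question: (?:...) stops a group from being captured, but the text it matches is still part of the overall match, so Match.Value contains the quotes and the surrounding SQL. Putting the field in a capturing group (the group name field is my own choice) is one way to read it on its own. Also note that the trailing (?:.*?") runs on to the closing quote, so this pattern finds only the first field in each literal.

```csharp
using System;
using System.Text.RegularExpressions;

// First query from the question ("" inside a verbatim string is one ").
string query = @"c_query := ""SELECT * FROM TABLE WHERE FIELD_NAME_ONE[2] = 'AB' AND FIELD_NAME_TWO[1,8] = 'ABCDEFGH'""";

// The pattern above, with the field reference wrapped in a named group.
Match m = Regex.Match(query, @"(?:"".*?)(?<field>[A-Z_]+\[\d(,\d)?\])(?:.*?"")");

// The non-capturing parts are still included in the overall match...
Console.WriteLine(m.Value);
// ...while the named group holds only the field reference: FIELD_NAME_ONE[2]
Console.WriteLine(m.Groups["field"].Value);
```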
I've got the following value:
--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'
I want a regular expression where I can pass the value (or part of the value) before AS and get back the value after AS, e.g.:
Z.NUMBER --> I'll receive ID
Z.LANGUAGE --> I'll receive LNG
Z.VALUE_01 --> I'll receive RUN_NUMB
Z.TXT_VALUE_01 --> I'll receive TXT
Currently I have something like this:
(?<=Z.NUMBER\sAS).+?(?=(,|FROM))
...but this doesn't work for my SUBSTR values
Edit: I'm using C# to execute the Regex:
string expr = @"--> Some comment ....."; //so the long text
string columnExprValue = "Z.LANGUAGE";
string asValue = Regex.Match(expr, @"(?<=" + columnExprValue + @"\sAS).+?(?=(,|FROM))")?.Value.Replace("AS", "").Trim() ?? ""; //Workaround to remove AS, because I don't know how to remove it in Regex
This should work, but the implementation is "naive" in the sense that it expects valid parameters that really exist; add whatever checks you need.
The regex I'm going to use is .*Z\.VALUE_01.*\s+AS\s+(?<Alias>[^,\s]*), where "Z\.VALUE_01" is passed in as a parameter. See the regex tester: https://regex101.com/r/UJi8pY/1
The idea is that the group named "Alias" holds exactly the value you are looking for.
The C# code then looks like this:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static string GetAlias(string input, string column)
    {
        // Escape the dot(s) in the column name so they match a literal '.'
        var regexPart = column.Replace(".", "\\.");
        // Groups["Alias"] is "" when there is no match
        return Regex.Match(input, $".*{regexPart}.*\\s+AS\\s+(?<Alias>[^,\\s]*)").Groups["Alias"].ToString();
    }
    public static void Main()
    {
        string val = @"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
        Console.WriteLine(GetAlias(val, "Z.NUMBER"));       // ID
        Console.WriteLine(GetAlias(val, "Z.LANGUAGE"));     // LNG
        Console.WriteLine(GetAlias(val, "Z.VALUE_01"));     // RUN_NUMB
        Console.WriteLine(GetAlias(val, "Z.TXT_VALUE_01")); // TXT
    }
}
.NET Fiddle - https://dotnetfiddle.net/Z9kd8h
A good suggestion from @the-fourth-bird in another answer: use Regex.Escape instead of column.Replace(".","\\.") so that all regex metacharacters are escaped, not just the dot.
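Folding that suggestion into GetAlias might look like the sketch below. GetAliasEscaped is a hypothetical name; like the original, it assumes the SQL is laid out like the example (one column per line), and it returns an empty string when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Same idea as GetAlias, but the column is escaped with Regex.Escape, so any
// regex metacharacter in it (not just '.') is matched literally.
static string GetAliasEscaped(string input, string column)
{
    string pattern = ".*" + Regex.Escape(column) + @".*\s+AS\s+(?<Alias>[^,\s]*)";
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Groups["Alias"].Value : "";
}

string sql = string.Join("\n",
    "--> Some comment",
    "CREATE VIEW ABC",
    "AS SELECT",
    "Z.NUMBER AS ID,",
    "Z.LANGUAGE AS LNG,",
    "SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,",
    "SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT",
    "FROM",
    "MYTABLE Z",
    "WHERE ID = '0033'",
    "AND LNG = 'DE'");

foreach (string column in new[] { "Z.NUMBER", "Z.LANGUAGE", "Z.VALUE_01", "Z.TXT_VALUE_01" })
    Console.WriteLine($"{column} -> {GetAliasEscaped(sql, column)}");
```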
Extracting values from SQL with a regex can be very brittle; this pattern is based on the example data.
To get the values only you might use lookarounds:
(?<=\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)
Explanation
(?<= Lookbehind assertion
\b A word boundary
Z\. Match Z.
(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b Match any of the alternatives followed by a word boundary (Or just match a single string like Z\.LANGUAGE)
.*? Match optional characters, as few as possible
\sAS\s+ Match AS between whitespace chars
) Close the lookbehind
[^\s,]+ Match 1+ non-whitespace chars except for a comma
(?=,|\s+FROM\b) Positive lookahead, assert either , or FROM to the right
See a .NET regex demo.
Or a capture group variant:
\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+([^\s,]+)(?:,|\s+FROM\b)
See another .NET regex demo.
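Why the capture group must be ([^\s,]+) rather than ([^\s,])+: when the quantifier sits outside the group, the group is re-entered once per character and Group.Value keeps only the last iteration. A small .NET check on one line of the example (the earlier iterations remain available through Group.Captures):

```csharp
using System;
using System.Text.RegularExpressions;

string line = "Z.LANGUAGE AS LNG,";

// Quantifier outside the group: one group iteration per character.
Match outside = Regex.Match(line, @"\sAS\s+([^\s,])+");
// Quantifier inside the group: a single capture spanning the whole alias.
Match inside = Regex.Match(line, @"\sAS\s+([^\s,]+)");

Console.WriteLine(outside.Groups[1].Value);          // G (last iteration only)
Console.WriteLine(inside.Groups[1].Value);           // LNG
Console.WriteLine(outside.Groups[1].Captures.Count); // 3 (L, N, G)
```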
If you want to make the pattern dynamic, you can use Regex.Escape to escape metacharacters such as the dot so they are matched literally; otherwise the dot would match any character.
For example:
using System;
using System.Text.RegularExpressions;
string input = @"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
// Escape the column so the dot matches a literal '.'
string columnExprValue = Regex.Escape("Z.LANGUAGE");
string pattern = @"(?<=\b" + columnExprValue + @"\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)";
// Regex.Match never returns null; Value is "" when there is no match
string asValue = Regex.Match(input, pattern).Value;
Console.WriteLine(asValue);
Output
LNG
Check this (PCRE syntax; .NET does not support \h or possessive quantifiers such as *+):
/^ \h*+ (?:substr[(])?(?: Z.TXT_VALUE_01 )(?:,[^,]+,[^,]+[)])? \h* AS \h+ (\w+) \v* [,]? \v* $/gmxi
I want to extract an e-mail address from a .txt file into a string variable. This is my code:
String path = "C:\\Users\\test.txt";
string from;
var fro = new Regex("from: (?<fr>)");
using (var reader = new StreamReader(File.OpenRead(@path)))
{
while (true)
{
var nextLine = reader.ReadLine();
if (nextLine == null)
break;
var matchb = fro.Match(nextLine);
if (matchb.Success)
{
from = matchb.Groups["fr"].Value;
Console.WriteLine(from);
}
}
}
I know that matchb.Success is true; however, from isn't displayed correctly. I suspect it has something to do with the escape sequence, but I was unable to find anything helpful online.
The textfile might look like this:
LOG 00:01:05 processID=123456-12345 from: test@test.org
LOG 00:01:06 processID=123456-12345 OK
Your (?<fr>) pattern defines a named group "fr" that matches an empty string.
To fill the group with some value you need to define the group pattern.
If you plan to match the rest of the line, you may use .*. To match a sequence of non-whitespace chars, use \S+. To match a sequence of non-whitespace chars that has an @ inside, use \S+@\S+. All three approaches will work for the current scenario.
In C#, it will look like
var fro = new Regex(@"from: *(?<fr>\S+@\S+)");
Note that @"..." is a verbatim string literal, in which a single backslash is a literal backslash, so you do not have to double it. I also suggest using the * quantifier to match 0 or more spaces before the email. You might want to use \s* (to match any 0+ whitespace chars) or [\p{Zs}\t]* (to match only horizontal whitespace chars) instead.
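A minimal sketch of both behaviours, using the two sample log lines in an in-memory array instead of the file (an assumption made so the example runs on its own): the original (?<fr>) group succeeds but captures an empty string, while the corrected group returns the address.

```csharp
using System;
using System.Text.RegularExpressions;

// In-memory stand-in for the lines the StreamReader loop would read.
string[] logLines =
{
    "LOG 00:01:05 processID=123456-12345 from: test@test.org",
    "LOG 00:01:06 processID=123456-12345 OK",
};

// The original group (?<fr>) has no pattern inside it, so it captures "".
string emptyCapture = new Regex("from: (?<fr>)").Match(logLines[0]).Groups["fr"].Value;

// The corrected pattern: the group now contains \S+@\S+.
var fro = new Regex(@"from: *(?<fr>\S+@\S+)");
string address = "";
foreach (string line in logLines)
{
    Match m = fro.Match(line);
    if (m.Success)
    {
        address = m.Groups["fr"].Value;
        break; // stop at the first line that contains an address
    }
}

Console.WriteLine($"old: '{emptyCapture}', new: '{address}'");
```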
I've spent two days trying to solve this problem: I have a MatchCollection, the pattern contains a group, and I want a list of the values that group captures (there are two or more matches).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
string s = groups[1].Value;
Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter
The first thing you need to correct in the code is Datum2.Text = s;, which overwrites the text in Datum2 on every iteration, so only the last value is shown when there is more than one match.
Now, about your regex,
^ anchors the match to the beginning of the input string (RegexOptions.Multiline is not set), so there can only be 1 match. If you remove it, it'll match twice.
I can't tell what $? all over the pattern was meant to do (just take them out).
[' '] matches either a quote, a space, or a quote again (there is no need to repeat characters in a character class).
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. A dot matches any character otherwise.
[0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d). Likewise, [0-3][1-9] for the day misses 10, 20 and 30.
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s= "";
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
DEMO
You should know that regex is not the right tool for parsing HTML. It will work for simple cases, but for real-world HTML, consider using the HTML Agility Pack.
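Since the question asks for a list rather than one string, here is a sketch that projects group 1 of every match into a List<string> with LINQ, using the corrected pattern above (written as a verbatim string so the \. escapes need no doubling). Cast<Match>() is used so it also compiles on older frameworks where MatchCollection is only a non-generic IEnumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string pattern = @"<tr><td>[D-M][i-r],[' ][0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";

// Project group 1 of every match into a list instead of overwriting one string.
List<string> values = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(match => match.Groups[1].Value)
    .ToList();

// For a TextBox such as Datum2, the list can be joined back together.
Console.WriteLine(string.Join(" ", values)); // 1 2
```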
I tried to find a method to count a specific word in a string, and I found this.
using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ",");
int cnt = matches.Count;
Console.WriteLine(cnt);
It worked fine, and the result shows 2.
But when I change "," to ".", it shows 18, not the expected 1. Why?
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ".");
And when I change "," to "(", I get an error!
The error reads:
SYSTEM.ARGUMENTEXCEPTION - THERE ARE TOO MANY (...
I don't understand why this is happening.
MatchCollection matches = Regex.Matches("hi( hi( everybody.", "(");
Other cases seem to work fine but I need to count "(".
The first instance, with the ., uses a special character that has a different meaning in regular expressions: it matches any character (except a newline), so it matches ALL of the characters in your string, hence the result of 18.
http://www.regular-expressions.info/dot.html
To match an actual "." character, you'll need to "escape" it so that it is read as a full-stop and not a special character.
MatchCollection matches = Regex.Matches("hi, hi, everybody.", @"\.");
The same exists for the ( character. It's a special character that has a different meaning in terms of regular expressions and you will need to escape it.
MatchCollection matches = Regex.Matches("hi( hi( everybody.", @"\(");
It looks like you're new to regular expressions, so I'd suggest some reading; the link I posted above is a good start.
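A quick check of the three cases from the question, as a sketch: in C# the escaped patterns are written as verbatim strings (@"\."), since "\." in a regular string literal is a compiler error. Regex.Escape, a standard .NET method not mentioned above, adds the backslash for you.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped '.' matches any character except '\n': all 18 characters count.
int anyChar = Regex.Matches("hi, hi, everybody.", ".").Count;
// Escaped dot; the verbatim string passes the backslash through to the regex engine.
int dots = Regex.Matches("hi, hi, everybody.", @"\.").Count;
// Regex.Escape turns "(" into @"\(" for you.
int parens = Regex.Matches("hi( hi( everybody.", Regex.Escape("(")).Count;

Console.WriteLine($"{anyChar} {dots} {parens}"); // 18 1 2
```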
HOWEVER!
If you are just looking to count occurrences in a string, you don't need regex.
See also the question "How would you count occurrences of a string within a string?"
If you're using .NET 3.5 or later, you can do this in a one-liner with LINQ:
int cnt = source.Count(f => f == '(');
If you don't want to use LINQ you can do it with:
int cnt = source.Split('(').Length - 1;
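Both of those count a single character. For the "count occurrences of a string within a string" case, a plain IndexOf loop also avoids regex escaping entirely; CountOccurrences below is a hypothetical helper that counts non-overlapping matches using ordinal comparison.

```csharp
using System;

// Hypothetical helper: counts non-overlapping occurrences of needle in
// haystack using ordinal (exact, culture-insensitive) comparison. No regex,
// so characters such as "(" or "." need no escaping.
static int CountOccurrences(string haystack, string needle)
{
    if (string.IsNullOrEmpty(needle)) return 0; // an empty needle would loop forever
    int count = 0;
    int index = 0;
    while ((index = haystack.IndexOf(needle, index, StringComparison.Ordinal)) >= 0)
    {
        count++;
        index += needle.Length; // continue after this occurrence
    }
    return count;
}

Console.WriteLine(CountOccurrences("hi( hi( everybody.", "(")); // 2
Console.WriteLine(CountOccurrences("hi, hi, everybody.", "hi")); // 2
```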
The second parameter represents a pattern, not necessarily just a character to search for in your string, and the ( by itself is an invalid pattern.
You don't need Regex to count occurrences of a character. Just use LINQ's Count():
var input = "hi( hi( everybody.";
var occurrences = input.Count(x => x == '('); // 2
The ( character is special: it starts a group. If you need a literal (, escape it as \( (in C#, @"\(" or "\\("). That should solve your problem.
I am a complete newb when it comes to regex and would like help writing an expression that matches the following:
{ValidFunctionName}({parameter}:"{value}")
{ValidFunctionName}({parameter}:"{value}",
{parameter}:"{value}")
{ValidFunctionName}()
where {x} is what I want to match, {parameter} can be anything (for example $%"$), and {value} must be enclosed in quotation marks.
ThisIsValid_01(a:"40")
would be "ThisIsValid_01", "a", "40"
ThisIsValid_01(a:"40", b:"ZOO")
would be "ThisIsValid_01", "a", "40", "b", "ZOO"
01_ThisIsntValid(a:"40")
wouldn't return anything
ThisIsntValid_02(a:40)
wouldn't return anything, as 40 is not enclosed in quotation marks.
ThisIsValid_02()
would return "ThisIsValid_02"
For a valid function name I came across: "[A-Za-z_][A-Za-z_0-9]*"
But I can't for the life of me figure out how to match the rest.
I've been playing around on http://regexpal.com/ to try to get valid matches to all conditions, but to no avail :(
It would be nice if you kindly explained the regex too, so I can learn :)
EDIT: This works and uses two regexes. The first gets the function name and everything inside the brackets; the second extracts each parameter/value pair from what's inside the function's brackets. You cannot do this with a single regex. Add some [ \t\n\r]* for whitespace.
using System.Collections.Generic;
using System.Text.RegularExpressions;
// r: function name + raw text between the brackets; inner: each param:"value" pair
Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<inner>.*?)\)");
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";
List<List<string>> matches = new List<List<string>>();
MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
{
var l = new List<string>();
l.Add(match.Groups["function"].Value);
foreach (Match m in inner.Matches(match.Groups["inner"].Value))
{
l.Add(m.Groups["param"].Value);
l.Add(m.Groups["value"].Value);
}
matches.Add(l);
}
(Old) Solution
(?<function>\w[\w\d]*?)\((?<param>.+?):"(?<value>[^"]*?)"\)
(Old) Explanation
Let's remove the group captures so it is easier to understand: \w[\w\d]*?\(.+?:"[^"]*?"\)
\w is the word class; it matches letters, digits and the underscore (roughly [a-zA-Z0-9_], though in .NET it also covers other Unicode letters and digits)
\d is the digit class, roughly [0-9] (in .NET it also matches other Unicode decimal digits)
\w[\w\d]*? makes sure there is a word character at the start of the function name, then matches zero or more further word or digit characters.
\(.+? Matches a left bracket then one or more of any characters (for the parameter)
:"[^"]*?"\) Matches a colon, then the opening quote, then zero or more of any character except quotes (for the value) then the close quote and right bracket.
Brackets (or parens, as some people call them) are escaped with backslashes because otherwise they would start a capturing group.
The (?<name> ) captures some text.
The ? after each of the * and + operators makes them non-greedy (lazy), meaning that they match the least, rather than the most, amount of text.
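To see the difference the ? makes, here is a small .NET comparison on an input with two calls (the input string is made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string text = "_test0(a) _test1(b)";

// Greedy: .+ first runs to the end, then backs off to the LAST ')'.
string greedy = Regex.Match(text, @"\(.+\)").Value;
// Lazy: .+? grows one character at a time and stops at the FIRST ')'.
string lazy = Regex.Match(text, @"\(.+?\)").Value;

Console.WriteLine(greedy); // (a) _test1(b)
Console.WriteLine(lazy);   // (a)
```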
(Old) Use
Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(aa%£$!:\"lolololol\") _test1(ghgasghe:\"asjkdgh\")";
List<string[]> matches = new List<string[]>();
if(r.IsMatch(input))
{
MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
matches.Add(new[] { match.Groups["function"].Value, match.Groups["param"].Value, match.Groups["value"].Value });
}
EDIT: Now that you've added an arbitrary number of parameters, I would recommend writing your own parser rather than using regexes. The example above only works with one parameter and strictly no whitespace. This pattern matches multiple parameters with strict whitespace, but it does not return the parameters and values:
\w[\w\d]*?\(.+?:"[^"]*?"(,.+?:"[^"]*?")*\)
Just for fun, the same as above but allowing whitespace:
\w[\w\d]*?[ \t\r\n]*\([ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?"([ \t\r\n]*,[ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?")*[ \t\r\n]*\)
Capturing the text you want will be hard, because you don't know how many captures you are going to have, and as such regexes are unsuited.
Someone else has already given an answer that gives you a flat list of strings, but in the interest of strong typing and proper class structure, I’m going to provide a solution that encapsulates the data properly.
First, declare two classes:
public class ParamValue // For a parameter and its value
{
public string Parameter;
public string Value;
}
public class FunctionInfo // For a whole function with all its parameters
{
public string FunctionName;
public List<ParamValue> Values;
}
Then do the matching and populate a list of FunctionInfos:
(By the way, I’ve made some slight fixes to the regexes... it will now match identifiers correctly, and it will not include the double-quotes as part of the “value” of each parameter.)
Regex r = new Regex(@"(?<function>[\p{L}_]\w*?)\((?<inner>.*?)\)");
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";
var matches = new List<FunctionInfo>();
if (r.IsMatch(input))
{
MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
{
var l = new List<ParamValue>();
foreach (Match m in inner.Matches(match.Groups["inner"].Value))
l.Add(new ParamValue
{
Parameter = m.Groups["param"].Value,
Value = m.Groups["value"].Value
});
matches.Add(new FunctionInfo
{
FunctionName = match.Groups["function"].Value,
Values = l
});
}
}
Then you can access the collection nicely with identifiers like FunctionName:
foreach (var match in matches)
{
Console.WriteLine("{0}({1})", match.FunctionName,
string.Join(", ", match.Values.Select(val =>
string.Format("{0}: \"{1}\"", val.Parameter, val.Value))));
}
Try this:
^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*)\(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\)
^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*) matches the function name. ^ anchors the match at the start of the string, so the function name must come first (after optional whitespace). You can remove the \s* if you don't need it; I just added it to make the match a little more flexible.
The next part, \(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\), captures each parameter-value pair inside the parentheses. You have to escape the function's parentheses because they are special characters in regex syntax.
The (?<name>...) constructs are named capture groups; when a library supports them, as .NET does, they make it a little easier to pull the groups out of each match.
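In .NET specifically, a group that is repeated by a quantifier records every iteration in Group.Captures, which is what lets this single pattern return all parameter/value pairs. A sketch that flattens a match into the list format from the question; Parse is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from this answer ("" is one " inside a verbatim string).
var regex = new Regex(@"^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*)\(((?<parameter>[^:]*):""(?<value>[^""]+)"",?\s*)*\)");

// Flattens one match into [name, param1, value1, param2, value2, ...];
// returns an empty array when the text does not match.
static string[] Parse(Regex re, string text)
{
    Match m = re.Match(text);
    if (!m.Success) return Array.Empty<string>();

    var parts = new List<string> { m.Groups["FunctionName"].Value };
    // Each pass through the repeated group added one Capture to each of these.
    CaptureCollection names = m.Groups["parameter"].Captures;
    CaptureCollection values = m.Groups["value"].Captures;
    for (int i = 0; i < names.Count; i++)
    {
        parts.Add(names[i].Value);
        parts.Add(values[i].Value);
    }
    return parts.ToArray();
}

foreach (string input in new[] { "ThisIsValid_01(a:\"40\", b:\"ZOO\")", "ThisIsValid_02()", "01_ThisIsntValid(a:\"40\")", "ThisIsntValid_02(a:40)" })
    Console.WriteLine($"{input} => [{string.Join(", ", Parse(regex, input))}]");
```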
Here:
\w[\w\d]*\s*\(\s*(?:(\w[\w\d]*):("[^"]*"|\d+))*\s*\)
For problems like this I always suggest not trying to "find" a single regex, but writing several regexes that share the work.
But here is my quick shot:
(?<funcName>[A-Za-z_][A-Za-z_0-9]*)
\(
(?<ParamGroup>
(?<paramName>[^(]+?)
:
"(?<paramValue>[^"]*)"
((,\s*)|(?=\)))
)*
\)
The whitespace is there for readability. Remove it, or pass RegexOptions.IgnorePatternWhitespace.
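A sketch of the second option, keeping the multi-line layout in a verbatim string and passing RegexOptions.IgnorePatternWhitespace; the repeated paramName and paramValue groups are then read through Group.Captures. Note that this pattern has no ^ anchor, so on its own it does not reject a name such as 01_ThisIsntValid (it would match from the underscore).

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer, unchanged apart from "" for each " inside
// the verbatim string. With IgnorePatternWhitespace the spaces and line
// breaks used for layout are ignored by the regex engine.
const string Pattern = @"
    (?<funcName>[A-Za-z_][A-Za-z_0-9]*)
    \(
    (?<ParamGroup>
        (?<paramName>[^(]+?)
        :
        ""(?<paramValue>[^""]*)""
        ((,\s*)|(?=\)))
    )*
    \)";

string call = @"ThisIsValid_01(a:""40"", b:""ZOO"")";
Match m = Regex.Match(call, Pattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(m.Groups["funcName"].Value);
// One Capture per pass through the repeated ParamGroup.
CaptureCollection names = m.Groups["paramName"].Captures;
CaptureCollection values = m.Groups["paramValue"].Captures;
for (int i = 0; i < names.Count; i++)
    Console.WriteLine(names[i].Value + " = " + values[i].Value);
```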
This regex passes all your test cases:
^(?<function>[A-Za-z][\w]*?)\(((?<param>[^:]*?):"(?<value>[^"]*?)",{0,1}\s*)*\)$
This works on multiple parameters and no parameters. It also handles special characters in the param name and whitespace after the comma. There may need to be some adjustments as your test cases do not cover everything you indicate in your text.
Please note that \w usually includes digits and is not appropriate as the leading character of the function name. Reference: http://www.regular-expressions.info/charclass.html#shorthand