Regular expression to capture the content of a group and reuse it - c#

How can I write a regular expression to replace
VALUES ('some text')
with
SELECT * FROM (SELECT 'some text') AS tmp...
Basically, I have an input file with multiple INSERT statements. I want to use a regex to convert each INSERT statement into an IF NOT EXISTS then INSERT statement (and run it in MySQL).
So, this is my input:
INSERT INTO table_listnames (name, address, tele) VALUES ('Rupert', 'Somewhere', '022')
and this is the desired output:
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT 'Rupert', 'Somewhere', '022') AS tmp
WHERE NOT EXISTS (
SELECT VersionNumber FROM ReleaseInfo WHERE VersionNumber = '1.0.0.1'
) LIMIT 1;

You could use
VALUES\s*\(([^()]*)\)
And replace this with
SELECT * FROM (SELECT $1) AS tmp
See a demo on regex101.com.
Broken down, this says:
VALUES      # match the literal VALUES
\s*\(       # optional whitespace, then the opening (
([^()]*)    # capture everything inside the parentheses (group 1, reused as $1)
\)          # the closing )
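In C#, the same replacement can be sketched with Regex.Replace, where $1 in the replacement string stands for the first captured group. This is only a sketch under assumptions: the class and method names are made up for illustration, RegexOptions.IgnoreCase is an addition that the answer does not mention, and the pattern will not match when a value itself contains a parenthesis, because [^()]* stops at the first one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class InsertRewriter
{
    // Group 1 captures everything between the parentheses after VALUES.
    // IgnoreCase is an assumption so that "values" is matched as well.
    private static readonly Regex ValuesClause =
        new Regex(@"VALUES\s*\(([^()]*)\)", RegexOptions.IgnoreCase);

    public static string Rewrite(string sql)
    {
        // $1 re-inserts the text captured by group 1.
        return ValuesClause.Replace(sql, "SELECT * FROM (SELECT $1) AS tmp");
    }

    public static void Main()
    {
        string input =
            "INSERT INTO table_listnames (name, address, tele) VALUES ('Rupert', 'Somewhere', '022')";
        Console.WriteLine(Rewrite(input));
    }
}
```

The WHERE NOT EXISTS (...) part of the desired output depends on the table and key, so it would still have to be appended separately.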

Related

string manipulation : how to split and join a string with delimiters include

I want to split a string and join a certain string at the same time. The string to be split is an SQL query.
I set the split delimiters: {". ", ",", ", ", " "}
for example:
select id, name, age, status from tb_test where age > 20 and status = 'Active'
I want it to produce a result something like this:
select
id
,
name
,
age
,
status
from
tb_test
where
age > 20
and
status = 'Active'
But what I get from string.Split is only word by word.
What should I do to get a result like the one above?
Thanks in advance.

First create a list of all SQL keywords you want to split on:
List<string> sql = new List<string>() {
    "select",
    "where",
    "and",
    "or",
    "from",
    ","
};
Then loop over this list and replace each keyword with itself surrounded by $ signs.
The $ sign is the character to split on later, so it must not occur in the query itself.
string query = "select id, name, age, status from tb_test where age > 20 and status = 'Active'";
foreach (string s in sql)
{
    // Plain substring replacement: it is case-sensitive and also hits keywords
    // embedded in longer words (for example "or" inside "order").
    query = query.Replace(s, "$" + s + "$");
}
Now split on $ and remove the leading and trailing spaces using Trim():
string[] splits = query.Split(new char[] { '$' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in splits) Console.WriteLine(s.Trim());
This splits on the SQL keywords. You can customize it further to your needs.
Result:
select
id
,
name
,
age
,
status
from
tb_test
where
age > 20
and
status = 'Active'

Here's a pure-regex solution:
(?:(?=,)|(?<![<>=]) +(?! *[<>=])|(?:(?<=,)))(?=(?:(?:[^'"]*(?P<s>['"])(?:(?!(?P=s)).)*(?P=s)))*[^'"]*$)
I made it so it can deal with the usual pitfalls, like strings, but there's probably still some stuff that'll break it. See demo.
Explanation:
(?:
    (?=,)                  # split before a comma.
  |
    (?<![<>=])             # if not preceded by an operator, ...
     +                     # ...split at a run of spaces (a literal space, then +)...
    (?! *[<>=])            # ...unless an operator follows the spaces.
  |
    (?<=,)                 # also split after a comma.
)
                           # HOWEVER, make sure this isn't inside of a string.
(?=                        # assert that there's an even number of quotes left in the text.
    (?:                    # consume pairs of quotes.
        [^'"]*             # all text up to a quote
        (?P<s>['"])        # capture the quote
        (?:(?!(?P=s)).)*   # consume everything up to the next quote of the same kind.
        (?P=s)             # the closing quote
    )*
    [^'"]*                 # then make sure there are no more quotes until the end of the text.
    $
)
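The regex above uses the (?P<s>...) and (?P=s) named-group syntax, which .NET does not accept. A possible C# adaptation is sketched below. It is an assumption-laden rewrite rather than the answer's own code: the backreference is swapped for an explicit alternation over single-quoted and double-quoted strings, so that Regex.Split has no capture groups to inject into its result, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the split; names are not from the original answer.
public static class SqlSplitter
{
    private static readonly Regex Splitter = new Regex(
        // Where to split: before a comma, at spaces not next to an operator, after a comma.
        @"(?:(?=,)|(?<![<>=]) +(?! *[<>=])|(?<=,))" +
        // Only split when the rest of the text contains complete quoted strings,
        // i.e. the current position is not inside a string literal.
        @"(?=(?:[^'""]*(?:'[^']*'|""[^""]*""))*[^'""]*$)");

    public static string[] Split(string sql)
    {
        return Splitter.Split(sql);
    }

    public static void Main()
    {
        string query =
            "select id, name, age, status from tb_test where age > 20 and status = 'Active'";
        foreach (string part in Split(query))
            Console.WriteLine(part);
    }
}
```

Because the lookahead scans to the end of the text at every candidate position, this is quadratic in the length of the query, which should be acceptable for single statements.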

The first split separates the keywords SELECT, FROM and WHERE.
The second split separates the columns using your delimiters.

One approach using regex:
string strRegex = @"\b(select|from|where)\b|[,.]";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
string strTargetString = @"select id, name, age, status from tb_test where age > 20 and status = 'Active'";
// $0 is the whole match. With separate groups such as (select)|(from)|(where), $1 is empty
// unless "select" matched, so the other keywords and the commas would be dropped.
string strReplace = "$0\r\n";
return myRegex.Replace(strTargetString, strReplace);
This should output:
select
 id,
 name,
 age,
 status from
 tb_test where
 age > 20 and status = 'Active'
You may want to perform another replacement to trim the space at the start of each line.
You can also use "\r\n$0\r\n" for the SQL keywords only (select, from, where, ...), so that each keyword ends up on its own line.
Hope this helps.

Efficient SQL Bucket Sort based on Length of Substring Match

Given a SQL database table containing strings indexed alphabetically, how might I perform a search query that orders by substring match?
For example, given the data set:
bad
banana
bandana
banker
bed
brother
And the search string band, I would expect the results ordered as follows
bandana (index 0-3 matched)
banana (index 0-2 matched)
banker
bad (index 0-1 matched)
bed (index 0 matched)
brother
Note that we only care about the length of the substring matched. The matches that fall into each bucket don't have to be sorted alphabetically, I only care about the bucket they fall into.
So I guess naively the problem involves:
Seeing the length of substring match against my input for each row
Putting each row into the appropriate bucket based on the match length
Ordering the buckets in a descending order, ie (4 chars matched, 3 chars matched, 2..)
But this sounds expensive, so how could I implement this in SQL or C#, and do it efficiently?
Is there a similar problem/pattern I could benefit from here?
Many thanks

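Not sure if it is the most efficient way, but here is one approach.
Using a numbers table, split each string into characters, join these to the characters of the search string on both position and value, then order by the match count and the string. Note that this counts characters that match at the same position, which equals the prefix length only when the matches are contiguous from the start, and that rows with no matching character at all drop out of the inner join.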
DECLARE @t TABLE ( string VARCHAR(50) )
INSERT INTO @t (string)
VALUES
    ('bad'),
    ('banana'),
    ('bandana'),
    ('banker'),
    ('bed'),
    ('brother')
DECLARE @search VARCHAR(50) = 'band'
;WITH numbers AS
(
    SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY t1.number) AS n
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
)
SELECT string
FROM @t t
CROSS APPLY (
    SELECT SUBSTRING(t.string, numbers.n, 1) c, n
    FROM numbers
    WHERE numbers.n <= LEN(string)
) s1
JOIN (
    SELECT SUBSTRING(@search, numbers.n, 1) c, n
    FROM numbers
    WHERE numbers.n <= LEN(@search)
) s2 ON s2.c = s1.c
    AND s2.n = s1.n
GROUP BY string
ORDER BY COUNT(1) DESC, string

String operations and SQL Server are not the best match, as far as I know.
My best bet would be to try a modified version of the Boyer-Moore-Horspool algorithm to find the number of matching characters. However, on a miss you wouldn't skip the full word length, only the length of the maximum match. Then simply insert the row into the appropriate bucket.
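Since the question also allows C#, the bucketing can be sketched in memory without any string-search algorithm, assuming (as the example data suggests) that only matches starting at index 0 count. The class and method names below are hypothetical, and this is not either answerer's method. It computes the common prefix length per row and sorts on it, which costs O(n * m) character comparisons plus the sort, for n rows and a search string of length m.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; assumes the match always starts at the first character.
public static class PrefixBuckets
{
    // Number of leading characters that word and search have in common.
    public static int CommonPrefixLength(string word, string search)
    {
        int limit = Math.Min(word.Length, search.Length);
        int i = 0;
        while (i < limit && word[i] == search[i]) i++;
        return i;
    }

    // Longest prefix match first. OrderByDescending is a stable sort,
    // so rows within the same bucket keep their input order.
    public static List<string> OrderByMatch(IEnumerable<string> words, string search)
    {
        return words.OrderByDescending(w => CommonPrefixLength(w, search)).ToList();
    }

    public static void Main()
    {
        var words = new[] { "bad", "banana", "bandana", "banker", "bed", "brother" };
        foreach (string w in OrderByMatch(words, "band"))
            Console.WriteLine(w);
    }
}
```

If the rows live in an alphabetically indexed table, each bucket is also a contiguous range of that index (all rows starting with "band", then the remaining rows starting with "ban", and so on), so a series of prefix range queries is another option.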

Regex replace between and including tags

I have the following line of text (META Title):
Buy [ProductName][Text] at a great price [/Text] from [ShopName] today.
I am replacing depending on what values I have.
I have it working as I require however I can't find the correct regex to replace:
[Text] at a great price [/Text]
The words (in and between the square brackets) change, so the only thing that will remain the same is:
[][/]
i.e I may also want to replace
[TestText]some test text[/TestText] with nothing.
I have this working:
System.Text.RegularExpressions.Regex.Replace(SEOContent, @"\[Text].*?\[/Text]", @"");
I presumed the regex of:
[.*?].*?\[/.*?]
Would work but it didn't! - I'm coding in ASP.NET C#
Thanks in advance,
Dave

Use a named capture to get the node name of [..], then find it again using \k<..>.
(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\])
Broken down using Ignore Pattern Whitespace and an example program.
string pattern = @"
(                    # Begin our match
  \[                 # Look for the literal [ (escaped)
  (?<Tag>[^\]]+)     # Place anything that is not a ] into the named group Tag
  \]                 # The closing ]
  [^\[]+             # Get all the text up to the next [
  \[/                # Start of the closing [/...] tag
  \k<Tag>            # Backreference to the named group Tag, so the tag names must agree
  \]                 # Properly closed end tag/node
)                    # Match is done";
string text = "[TestText]some test text[/TestText] with nothing.";
Console.WriteLine (Regex.Replace(text, pattern, "Jabberwocky", RegexOptions.IgnorePatternWhitespace));
// Outputs
// Jabberwocky with nothing.
As an aside, I would actually create a tokenizing regex (using a regex conditional with the above pattern) that identifies each section by named captures, and then replace the identified tokens in a match evaluator, such as:
string pattern = @"
(?(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\])   # If statement to check the []..[/] situation
  (                                        # Yes it is, match into named captures
    \[
    (?<Token>[^\]]+)                       # What is the text inside the [ ], into Token
    \]
    (?<TextOptional>[^\[]+)                # Optional text to reuse
    \[
    (?<Closing>/[^\]]+)                    # The closing tag info
    \]
  )
|                                          # Else, let us start a new check for either [] or plain text
  (?(\[)                                   # If a [ is found it is a token.
    (                                      # Yes, process token
      \[
      (?<Token>[^\]]+)                     # What is the text inside the [ ], into Token
      \]
    )
  |                                        # Or (No of the second if) it is just plain text
    (?<Text>[^\[]+)                        # Put it into the text match capture.
  )
)
";
string text = @"Buy [ProductName] [Text]at a great price[/Text] from [ShopName] today.";
Console.WriteLine (
Regex.Replace(text,
pattern,
new MatchEvaluator((mtch) =>
{
if (mtch.Groups["Text"].Success) // If just text, return it.
return mtch.Groups["Text"].Value;
if (mtch.Groups["Closing"].Success) // If a Closing match capture group reports success, then process
{
return string.Format("Reduced Beyond Comparison (Used to be {0})", mtch.Groups["TextOptional"].Value);
}
// Otherwise its just a plain old token, swap it out.
switch ( mtch.Groups["Token"].Value )
{
case "ProductName" : return "Jabberwocky";
case "ShopName" : return "StackOverFlowiZon";
}
return "???"; // If we get to here...we have failed...need to determine why.
}),
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture));
// Outputs:
// Buy Jabberwocky Reduced Beyond Comparison (Used to be at a great price) from StackOverFlowiZon today.

C# regular expression to NSRegularExpression

I have the following regular expression that gets me the table name and column details of a create index statement:
Regex r = new Regex(@"create\s*index.*?\son\s*\[?(?<table>[\s\w]*\w)\]?\s*\((?:(?<cname>[\s\d\w\[\]]*),?)*\)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
I would like to use this in Objective-C. I have tried the following:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern: @"create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\((?:([\\s\\d\\w\\[\\]]*),?)*\\)"
                                                                       options: NSRegularExpressionCaseInsensitive | NSRegularExpressionSearch
                                                                         error: &error];
if (nil != error)
{
    NSLog(@"Error is: %@. %@", [error localizedDescription], error);
}
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString: createStatement options: 0 range: NSMakeRange(0, [createStatement length])];
NSArray *matches = [regex matchesInString: createStatement
                                  options: 0
                                    range: NSMakeRange(0, [createStatement length])];
Which partially works. It gives me three ranges. The first one contains the entire string and the second one contains the table name. The problem is that the third one is empty.
Anyone have any ideas where I'm going wrong?
Edit: The string I'm trying to parse is: CREATE INDEX cardSetIndex ON [card] (cardSetId ASC)

The problem you seem to have is that the second capture group is overwritten by the last iteration of (?: )*. Since everything inside that group is optional, the last iteration matches an empty string, so the group always ends up blank.
Your regex:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?\s*
\(
(?:
( [\s\d\w\[\]]* )
,?
)*
\)
Change it to:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?
\s*
\(
(
(?: [\s\d\w\[\]]* ,? )*
)
\)
Compressed and escaped:
create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\(((?:[\\s\\d\\w\\[\\]]*,?)*)\\)
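The asker's original pattern is a .NET regex, so the corrected grouping can also be tried out in C#. The sketch below uses the unescaped form of the revised pattern and then splits the captured column list on commas. The class and method names are hypothetical, and the splitting step is an addition for illustration, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class CreateIndexParser
{
    // Group 1: table name. Group 2: the whole column list in one capture,
    // so it is no longer overwritten by later iterations of the inner loop.
    private static readonly Regex CreateIndex = new Regex(
        @"create\s*index.*?\son\s*\[?([\s\w]*\w)\]?\s*\(((?:[\s\d\w\[\]]*,?)*)\)",
        RegexOptions.IgnoreCase);

    public static (string Table, string[] Columns) Parse(string statement)
    {
        Match m = CreateIndex.Match(statement);
        if (!m.Success)
            throw new ArgumentException("Not a CREATE INDEX statement.", nameof(statement));

        // Split the single captured list into individual column entries.
        string[] columns = m.Groups[2].Value
            .Split(',')
            .Select(c => c.Trim())
            .Where(c => c.Length > 0)
            .ToArray();
        return (m.Groups[1].Value, columns);
    }

    public static void Main()
    {
        var result = Parse("CREATE INDEX cardSetIndex ON [card] (cardSetId ASC)");
        Console.WriteLine(result.Table);
        Console.WriteLine(string.Join(" | ", result.Columns));
    }
}
```

In .NET specifically, the repeated group from the original pattern would also be recoverable through Group.Captures, which keeps every iteration. NSRegularExpression exposes only one range per group, which is why capturing the whole list in a single group is the portable fix.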

Your question is sort of vague about what you actually want to grab, and that regex is quite gnarly, so I wasn't able to glean it from that.
CREATE\s*INDEX\s*(\w+)\s*ON\s*(\[\w+\])\s*\((.+)\)
That will grab the index name in group 1, the bracketed table name after ON in group 2, and the column list inside the parentheses in group 3.

SQL/C# - Primary Key error on UPSERT

UPDATE(simplified problem, removed C# from the issue)
How can I write an UPSERT that can recognize when two rows are the same in the following case...
See how there's a \b [backspace] encoded there (the weird little character)? SQL sees these as the same. While my UPSERT sees this as new data and attempts an INSERT where there should be an UPDATE.
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
FROM [table]
WHERE NOT EXISTS
    -- race condition risk here?
    ( SELECT 1 FROM [table]
      WHERE
          [col1] = @col1
          AND [col2] = @col2
          AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
    [col1] = @col1
    AND [col2] = @col2
    AND [col3] = @col3

You need the @ sign (a verbatim string literal), otherwise a C# character escape sequence is hit.
C# defines the following character escape sequences:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Unicode character 0
\a - Alert (character 7)
\b - Backspace (character 8)
\f - Form feed (character 12)
\n - New line (character 10)
\r - Carriage return (character 13)
\t - Horizontal tab (character 9)
\v - Vertical tab (character 11)
\uxxxx - Unicode escape sequence for character with hex value xxxx
\xn[n][n][n] - Unicode escape sequence for character with hex value nnnn (variable length version of \uxxxx)
\Uxxxxxxxx - Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)
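The difference between a regular and a verbatim literal can be seen in a small sketch. The class and field names are made up for illustration. In a regular literal the compiler turns \b into the backspace character, while the @ prefix keeps the backslash and the letter b as two separate characters.

```csharp
using System;

// Hypothetical demo class, not taken from the question.
public static class EscapeDemo
{
    // Regular literal: \b becomes the single character U+0008 (backspace).
    public const string Regular = "a\bc";

    // Verbatim literal: the backslash and the 'b' stay as two characters.
    public const string Verbatim = @"a\bc";

    public static void Main()
    {
        Console.WriteLine(Regular.Length);      // 3
        Console.WriteLine(Verbatim.Length);     // 4
        Console.WriteLine((int)Regular[1]);     // 8, the backspace character
        Console.WriteLine(Verbatim[1] == '\\'); // True
    }
}
```

The same distinction matters for regex patterns written in C#, where \b is meant to reach the regex engine as a word boundary and therefore needs either @"..." or a doubled backslash.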

After hours of tinkering it turns out I've been on a wild goose chase. The problem is very simple. I pulled my UPSERT from a popular SO post. The code is no good. The select will sometimes return > 1 rows on INSERT. Thereby attempting to insert a row, then insert the same row again.
The fix is to remove FROM
-- UPSERT
INSERT INTO [table]
SELECT [col1] = @col1, [col2] = @col2, [col3] = @col3, [col4] = @col4
--FROM [table] (Don't use FROM: not a race condition, just a bad SELECT)
WHERE NOT EXISTS
    ( SELECT 1 FROM [table]
      WHERE
          [col1] = @col1
          AND [col2] = @col2
          AND [col3] = @col3)
UPDATE [table]
SET [col4] = @col4
WHERE
    [col1] = @col1
    AND [col2] = @col2
    AND [col3] = @col3
Problem is gone.
Thanks to all of you.

You are using '\u' which generates a Unicode character.
Your column is a varchar, which does not support Unicode characters. nvarchar would support the character.
