Single quotation mark breaks the program in C#

I compare the words in a RichTextBox against a database, but if I type a word containing a single quotation mark into the RichTextBox, the program raises an exception.
Example:
who resort to primitive and barbaric methods to kill 'Israeli's.
In this sentence the word 'Israeli's contains a single quotation mark, so this word breaks the program.
I wrote the following code.
private void btnSeparte_Click(object sender, EventArgs e)
{
    string MyConString = "server=localhost;" +
        "database=sentiwordnet;" + "password=zia;" +
        "User Id=root;";
    MySqlConnection con = new MySqlConnection(MyConString);
    string line = rtbEmotion.Text;
    Regex replacer = new Regex(@"\b(is|are|am|could|will|the|you|'|not|I|in)\b|(\b\d\b)");
    line = replacer.Replace(line, "");
    string[] parts = Regex.Split(line, " ");
    foreach (string part in parts)
    {
        MySqlCommand cmd = new MySqlCommand("select * from score where Word='" + part + "'", con);
        con.Close();
        con.Open();
        MySqlDataReader r = cmd.ExecuteReader();
        if (r.Read())
        {
            txtBxPosEmot.Text = r["Pos"].ToString();
            TxtBoxNeg.Text = r["Neg"].ToString();
            pos = Convert.ToDouble(txtBxPosEmot.Text);
            neg = Convert.ToDouble(TxtBoxNeg.Text);
            listView1.Items.Add(part);
            listView1.Items.Add(pos.ToString());
            listView2.Items.Add(part);
            listView2.Items.Add(neg.ToString());
            pos1 = pos + pos1;
            neg1 = neg + neg1;
            r.Close();
            con.Close();
        }
        else
        {
            textBox1.Text = "";
            txtbPosSyth.Text = "";
            r.Close();
            con.Close();
        }
    }
}
Exception
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Israels'' at line

This is a classic SQL injection problem. In SQL, string literals are enclosed in single quotes, so the unescaped single quote in your word ends the literal prematurely. It also leaves you vulnerable to attack from malicious users. Use parameterized SQL to avoid both problems. See: http://www.dotnetperls.com/sqlparameter
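A minimal sketch of the parameterized approach, assuming the score table and its Word, Pos and Neg columns from the question. It is written against the provider-neutral System.Data interfaces so that it compiles without a database driver; the names ScoreQuery, Concatenated and Parameterized are made up for illustration. With MySQL Connector/NET you would pass in the MySqlConnection from the question.

```csharp
using System.Data;

public static class ScoreQuery
{
    // The SQL text is fixed; only the parameter value changes per word.
    public const string Sql = "SELECT Pos, Neg FROM score WHERE Word = @word";

    // What the question's code does: the word becomes part of the SQL text,
    // so a quote inside the word also ends the string literal.
    public static string Concatenated(string word)
    {
        return "select * from score where Word='" + word + "'";
    }

    // Parameterized alternative: the word is sent as data, never as SQL.
    public static IDbCommand Parameterized(IDbConnection con, string word)
    {
        IDbCommand cmd = con.CreateCommand();
        cmd.CommandText = Sql;
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = "@word";
        p.Value = word;
        cmd.Parameters.Add(p);
        return cmd;
    }
}
```

Calling Concatenated("Israeli's") yields a statement with three single quotes, which is why the server reports a syntax error, whereas the parameterized command text never changes no matter what the word contains.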

In case you want an explanation for the behavior you're seeing: your regex isn't catching the leading ' in your input because the pattern requires a word boundary on both sides of it, which is not the case here.
The ' is not a word character, so \b'\b only matches a ' that has a word character on each side (as in Sam's). The quote at the start of 'Israeli's follows a space, so it is left in place.
Your regex doesn't account for this.
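The word-boundary behavior can be checked in isolation. This is a small sketch using only System.Text.RegularExpressions; the class and method names are hypothetical, and the pattern is just the apostrophe alternative from the question's regex.

```csharp
using System.Text.RegularExpressions;

public static class QuoteStripping
{
    // \b'\b only matches an apostrophe with a word character on both sides.
    public static string StripInnerQuotes(string text)
    {
        return Regex.Replace(text, @"\b'\b", "");
    }

    // Removes every apostrophe, wherever it appears.
    public static string StripAllQuotes(string text)
    {
        return text.Replace("'", "");
    }
}
```

For the input "kill 'Israeli's." the first method returns "kill 'Israelis." (the leading quote survives), while the second returns "kill Israelis.". Stripping quotes changes the words being looked up, so it is a workaround rather than a fix for the SQL error.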

Description
If you're looking to replace just the ' trapped inside a word, then you could use the following regex.
(?<=\w\b)['](?=\b\w)
Example
This PHP routine is included simply to show how the regex works.
<?php
$sourcestring="She said 'hi, this is Sam's house'.
'The 'quick red f'ox jumpe'd over the 'laz'y brown dog.'";
echo preg_replace('/(?<=\w\b)[\'](?=\b\w)/im','',$sourcestring);
?>
$sourcestring after replacement:
She said 'hi, this is Sams house'.
'The 'quick red fox jumped over the 'lazy brown dog.'

Related

SQL Insert not considering blank values for the insert in my C# code

I have a piece of C# code that lets me import data into a table from a file with fewer columns than the SQL table (the file format is consistently bad).
My problem comes when a column has a blank entry. The VALUES clause does not pick up an empty column from the CSV, so I receive the error
You have more insert columns than values
Here is the query printed to a message box...
As you can see there is nothing for Crew members 4 to 11, below is the file...
Please see my code:
SqlConnection ADO_DB_Connection = new SqlConnection();
ADO_DB_Connection = (SqlConnection)
    (Dts.Connections["ADO_DB_Connection"].AcquireConnection(Dts.Transaction) as SqlConnection);
// Inserting data of file into table
int counter = 0;
string line;
string ColumnList = "";
// MessageBox.Show(fileName);
System.IO.StreamReader SourceFile =
    new System.IO.StreamReader(fileName);
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        query += "VALUES('" + line.Replace(FileDelimiter, "','") + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
If you could advise how to include those fields in the insert that would be great.
Here is the same file but opened with a text editor and not given in picture format...
Date,Flight_Number,Origin,Destination,STD_Local,STA_Local,STD_UTC,STA_UTC,BLOC,AC_Reg,AC_Type,AdultsPAX,ChildrenPAX,InfantsPAX,TotalPAX,AOC,Crew 1,Crew 2,Crew 3,Crew 4,Crew 5,Crew 6,Crew 7,Crew 8,Crew 9,Crew 10,Crew 11
05/11/2022,241,BOG,SCL,15:34,22:47,20:34,02:47,06:13,N726AV,"AIRBUS A-319 ",0,0,0,36,AV,100612,161910,323227
I'm not touching the potential for SQL injection, as I'm freehanding this code. If this is a system-generated file (mainframe extract, dump from Dynamics or a LoB app), the probability of SQL injection is awfully low.
// Requires: using System.Linq;
// Char required for counting delimiters (FileDelimiter itself is a string)
char FileDelimiterChar = FileDelimiter[0];
int columnCount = 0;
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
        // How many columns in line 1. Assumes no embedded commas
        // Add 1 as we will have one fewer delimiters than columns
        columnCount = line.Count(x => x == FileDelimiterChar) + 1;
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        // HACK: this fails if there are embedded delimiters
        int foundColumns = line.Count(x => x == FileDelimiterChar) + 1;
        // at this point, we know how many columns we have
        // and how many we should have.
        string csv = line.Replace(FileDelimiter, "','");
        // Pad out the current line with empty strings aka ','
        // One pad per missing column
        // Probably a classier linq way of doing this or string.Concat approach
        for (int index = foundColumns; index < columnCount; index++)
        {
            csv += "','";
        }
        query += "VALUES('" + csv + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
Something like that should get you a solid shove in the right direction. The concept is that you need to inspect the first line and see how many columns you should have. Then for each line of data, how many columns do you actually have and then stub in the empty string.
If you change this up to use SqlCommand objects and parameters, the approximate logic is still the same. You'll add all the expected parameters by figuring out columns in the first line and then for each line you will add your values and if you have a short row, you just send the empty string (or dbnull or whatever your system expects).
The big takeaway, IMO, is that CSV parsing libraries exist for a reason. There are so many cases not addressed in the above pseudocode that you'll likely want to trash the current approach in favor of a standard parsing library and, while you're at it, address the potential security flaws.
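The padding idea described above (count the header columns, then stub in empty strings for short rows) can be sketched as a small pure function. This is only an illustration under the same assumption as the answer, namely that there are no embedded delimiters; the names RowPadding and SplitAndPad are hypothetical. Each returned value could then be bound to its own SqlParameter instead of being concatenated into the statement.

```csharp
using System.Linq;

public static class RowPadding
{
    // Splits a line naively (no quoted-delimiter handling) and pads it with
    // empty strings so that a short row yields exactly columnCount values.
    // Rows that already have enough values are returned unchanged.
    public static string[] SplitAndPad(string line, char delimiter, int columnCount)
    {
        string[] values = line.Split(delimiter);
        if (values.Length >= columnCount)
        {
            return values;
        }
        return values
            .Concat(Enumerable.Repeat(string.Empty, columnCount - values.Length))
            .ToArray();
    }
}
```

For example, SplitAndPad("100612,161910,323227", ',', 5) returns five values, the last two being empty strings.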
I see your updated comment that you'll take the formatting concerns back to the source party. If they can't address them, I would envision your SSIS package being
Script Task -> Data Flow task.
Script Task is going to wrangle the unruly data into a strict CSV dialect that a Data Flow task can handle. Preprocessing the data into a new file instead of trying to modify the existing in place.
The Data Flow then becomes a chip shot of Flat File Source -> OLE DB Destination
Here's how you can process this file... I would still ask for Json or XML though.
You need two outputs set up. Flight Info (the 1st 16 columns) and Flight Crew (a business key [flight number and date maybe] and CrewID).
Seems to me the problem is how the crew is handled in the CSV.
So basic steps are Read the file, use regex to split it, write out first 16 col to output1 and the rest (with key) to flight crew. And skip the header row on your read.
var lines = System.IO.File.ReadAllLines("filepath");
// Some code I stole to split quoted CSVs
var r = new System.Text.RegularExpressions.Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)(?:[^\"]|\"\")*|[^,\"]*))\"?(?=,|$)");
for (int i = 1; i < lines.Length; i++) // start at 1 to skip the header row
{
    var m = r.Matches(lines[i]); // Gives you all matches in a MatchCollection
    // first 16 columns are always correct
    OutputBuffer0.AddRow();
    OutputBuffer0.Date = m[0].Groups[2].Value;
    OutputBuffer0.FlightNumber = m[1].Groups[2].Value;
    // [And so on until m[15]]
    for (int j = 16; j < m.Count; j++)
    {
        OutputBuffer1.AddRow(); // This is a new output that you need to set up
        OutputBuffer1.FlightNumber = m[1].Groups[2].Value;
        // [Keep adding to make a business key here]
        OutputBuffer1.CrewID = m[j].Groups[2].Value;
    }
}
Be careful as I just typed all this out to give you a general plan without any testing. For example m[0] might actually be m[0].Value and all of the data types will be strings that will need to be converted.
To check out how regex processes your rows, please visit https://regex101.com/r/y8Ayag/1 for explanation. You can even paste in your row data.
UPDATE:
I just tested this and it works now. Needed to escape the regex function. And specify that you wanted the value of group 2. Also needed to hit IO in the File.ReadAllLines.
The solution that I implemented in the end avoided the script task completely. Also meaning no SQL Injection possibilities.
I've done a flat file import. Everything into one column then using split_string and a pivot in SQL then inserted into a staging table before tidy up and off into main.
Flat File Import to single column table -> SQL transform -> Load
This also allowed me to iterate through the files better using a foreach loop container.
ELT on this occasion.
Thanks for all the help and guidance.

string with quotes in C#

How can I have a string value like "PropertyBag["ABCD"]"? I want to assign it to a property of a chart in my WPF app. I am using it as shown below:
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    ViewModel vm = new ViewModel();
    foreach (var str in vm.Data)
    {
        string name = "ABCD";
        LineSeries lineSeries = new LineSeries();
        lineSeries.ItemsSource = new ViewModel().Data;
        lineSeries.XBindingPath = "Date";
        lineSeries.YBindingPath = "PropertyBag[\" + name + \"]"; // here I am getting an error saying - Input string was not in a correct format
        lineSeries.IsSeriesVisible = true;
        lineSeries.ShowTooltip = true;
        chart.Series.Add(lineSeries);
    }
    this.DataContext = vm;
}
I want it like
lineSeries.YBindingPath = "PropertyBag["ABCD"]" //
(it should include all 4 double quotes). How is this possible?
I also tried this, but I still get the same error:
lineSeries.YBindingPath = String.Format(@"PropertBag[""{0}""]", name);
This should work, and it is a more elegant solution:
string name = "ABC";
lineSeries.YBindingPath = string.Format("\"PropertyBag[\"{0}\"]\"", name);
You escaped the quote before and after name, in doing that you didn't specify where the first part of the string ends and where the second one starts. To avoid errors like that it is better to use the Format method. It makes code easier to read and maintain.
The following solution:
string.Format(@"""PropertyBag[""{0}""]""", name)
is good too, but I think the first one is more readable, and I personally prefer it. It is up to you, but I would avoid concatenation here; it is error-prone when the language has more readable tools for the job.
You're escaping the quotes before the pluses, meaning they get interpreted as characters, not as addition operators. You need to add additional quotation marks, like so:
lineSeries.YBindingPath = "\"PropertyBag[\"" + name + "\"]\"";
This way, you define two new strings; a prefix and a suffix to the name variable.
You can also use string interpolation, which results in cleaner code:
lineSeries.YBindingPath = $"\"PropertyBag[\"{name}\"]\"";
Quoted string literals
("PropertyBag[\"" + name + "\"]");
Verbatim string literals
string.Format(@"PropertyBag[""{0}""]", name);
Try raw string literals if you're on C# 11
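To compare the options side by side, here is a small sketch in which each method builds the same text, PropertyBag["ABCD"] for the name ABCD. The class and method names are made up for illustration, and the last method assumes a compiler that supports C# 11 raw string literals.

```csharp
public static class BindingPaths
{
    // Regular literal: each embedded quote is escaped with a backslash.
    public static string Escaped(string name)
    {
        return "PropertyBag[\"" + name + "\"]";
    }

    // Verbatim literal: each embedded quote is doubled.
    public static string Verbatim(string name)
    {
        return string.Format(@"PropertyBag[""{0}""]", name);
    }

    // Interpolated regular literal.
    public static string Interpolated(string name)
    {
        return $"PropertyBag[\"{name}\"]";
    }

    // Interpolated raw string literal (C# 11 or later): no escaping needed.
    public static string Raw(string name)
    {
        return $"""PropertyBag["{name}"]""";
    }
}
```

Whether the binding path itself should also be wrapped in an outer pair of quotes, as some of the answers do, depends on what the chart control expects; the sketch only shows how to embed quotes in a C# string.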

C# code error: "; expected"

Code for inserting into the database:
query = "INSERT INTO Question(Image, AnswerA, AnswerB, AnswerC, AnswerD, CorrectAnswer)"
+ $"VALUES(\""{name}\",\""{answerList[0]}\",\"{answerList[1]}\",\""{answerList[2]}\",\"{answerList[3]}\",\"{name}\");";
I am getting the error "; expected" on this line:
+ $"VALUES(\""{name}\",\""{answerList[0]}\",\"{answerList[1]}\",\""{answerList[2]}\",\"{answerList[3]}\",\"{name}\");";
In three places, you have an unescaped second double quote, which ends the quoted string right there:
\""{name
and
\""{answerList[0]
and
\""{answerList[2]
Those break your C#, and if you escaped them, they'd break your SQL. So don't do that. Almost certainly, you should be using single quotes rather than double quotes as well (thanks Icarus):
query = "INSERT INTO Question(Image, AnswerA, AnswerB, AnswerC, AnswerD, CorrectAnswer)"
+ $"VALUES('{name}','{answerList[0]}','{answerList[1]}','{answerList[2]}','{answerList[3]}','{name}');";
However, that's very bad coding style. It's vulnerable to SQL injection attacks, it'll crash if one of your answers happens to have an apostrophe in it, and putting quoted or even just matched quotes in a string is highly error-prone, as you've discovered.
So start over and rewrite the code using parameters, which resolve all of these issues cleanly and simply:
SqlCommand cmd = new SqlCommand();
// ...etc.
cmd.Parameters.Add("@name", SqlDbType.NVarChar);
cmd.Parameters.Add("@answerList0", SqlDbType.NVarChar);
// ...etc.
cmd.Parameters["@name"].Value = name;
cmd.Parameters["@answerList0"].Value = answerList[0];
// ...etc.
query = "INSERT INTO Question(Image, AnswerA, AnswerB, AnswerC, AnswerD, CorrectAnswer)"
+ "VALUES(@name,@answerList0,@answerList1,@answerList2,@answerList3,@name);";
Try building the string like this:
query = $@"INSERT INTO Question(Image, AnswerA, AnswerB, AnswerC, AnswerD, CorrectAnswer)
VALUES(""{name}"",""{answerList[0]}"",""{answerList[1]}"",""{answerList[2]}"",""{answerList[3]}"",""{name}"");";
A single verbatim interpolated string keeps all of the quoting in one place, which is easier to check than several pieces joined with +.

How to insert a newline after a word in a SQL table in C#

I am trying to insert a new line after the word Car, but none of the following work:
Char(13) - not working
Environment.NewLine - this works, but it puts a '(' character in the SQL rows, like 'Car ( Rate:2CR'
\n\r - not working
Code:
cmd.Parameters.AddWithValue("@ColumnCar", Car + "char(13)" + "Rate:2CR");
//cmd.Parameters.AddWithValue("@ColumnCar", Car + "\n\r" + "Rate:2CR");
//cmd.Parameters.AddWithValue("@ColumnCar", Car + Environment.NewLine + "Rate:2CR");
cmd.ExecuteNonQuery();
Need output in sql table ColumnCar row value as follows:
Car
Rate:2cr
Note : here after Car there will be a newline and then Rate:2Cr will be added
With the line Car + "char(13)" + "Rate:2CR"; you will get the literal string "char(13)" between your 2 values, not a new line. If you want only a new line, you can append "\n" or the equivalent new line character, (char)10.
Which character or string actually represents a new line might depend on your environment, including the collation you are using. In simple ASCII/ANSI this will work. It might not be the same for another collation. As @mhasan pointed out, it could also be different depending on the O/S.
Using characters
const char carriageReturn = (char) 13; // see https://en.wikipedia.org/wiki/Carriage_return
const char newLine = (char) 10;
var car = "some car";
var toInsert = car + newLine + "Rate:2CR";
cmd.Parameters.AddWithValue("@ColumnCar", toInsert);
This would also work and produce the same result:
var toInsert = car + "\n" + "Rate:2CR";
Use a combination of carriage return and newline characters, i.e. char(13) + char(10), to insert a new line on Windows.
On classic Mac OS it is \r, i.e. char(13); on Linux (and modern macOS) it is \n, i.e. char(10); on Windows it is the combination of both.
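A short sketch of the Windows-style line break in C#; the names LineBreaks, CrLf and TwoLines are hypothetical. Note the order: carriage return first, then line feed ("\r\n"), whereas the "\n\r" tried in the question has the two characters reversed.

```csharp
public static class LineBreaks
{
    // Carriage return (char 13) followed by line feed (char 10).
    public const string CrLf = "\r\n";

    // Joins two pieces of text with a Windows-style line break between them.
    public static string TwoLines(string first, string second)
    {
        return first + CrLf + second;
    }
}
```

The result of TwoLines("Car", "Rate:2CR") can be passed as the parameter value unchanged. Some query tools show multi-line values on a single line in grid view, so it may be worth checking the stored value in a text view before concluding that the line break was lost.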
Try this code; I hope it works.
Build the whole value in a string variable first, for example:
string abc = textbox1.Text + " " + "Rate:2cr";
Then pass that variable to your existing code:
cmd.Parameters.AddWithValue("@ColumnCar", abc);
cmd.ExecuteNonQuery();
The following code works fine with unicode fields in a MS SQL-Server 2016 DB :
string carString = $"Volvo{Environment.NewLine}Rate: 2CR";
SqlParameter parameter = new SqlParameter("@ColumnCar", carString);
command.Parameters.Add(parameter);
The '(' when you use Environment.NewLine must be another error somewhere else. What is Car in your code? A class instance? What does its ToString() expand to?
Don't use string1 + " " + string2 concatenation.
Use string.Format(), $"" - inline syntax (like above) or StringBuilder to build your strings.

Regex to remove single-line SQL comments (--)

Question:
Can anybody give me a working regex expression (C#/VB.NET) that can remove single line comments from a SQL statement ?
I mean these comments:
-- This is a comment
not those
/* this is a comment */
because I already can handle the star comments.
I have made a little parser that removes those comments when they are at the start of a line, but they can also appear somewhere after code or, worse, inside a SQL string such as 'hello --Test -- World'.
Those comments should also be removed (except those in a SQL string, of course, if possible).
Surprisingly, I couldn't get the regex working. I would have assumed the star comments to be more difficult, but actually they aren't.
As requested, here is my code to remove /**/-style comments.
(To have it ignore SQL-style strings, you have to substitute each string with a unique identifier (I used 4 concatenated), then apply the comment removal, then substitute the strings back.)
static string RemoveCstyleComments(string strInput)
{
    string strPattern = @"/[*][\w\d\s]+[*]/";
    //strPattern = @"/\*.*?\*/"; // Doesn't work
    //strPattern = "/\\*.*?\\*/"; // Doesn't work
    //strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
    // http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
    strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/"; // Works !
    string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
    Console.WriteLine(strOutput);
    return strOutput;
} // End Function RemoveCstyleComments
I will disappoint all of you. This can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is comments inside a string. Lookarounds offer a little hope, but that's still not enough. Knowing that a quote precedes the match on a line doesn't guarantee anything. The only thing that guarantees anything is the parity of the quotes, which is something you can't determine with a regular expression. So simply go with a non-regular-expression approach.
EDIT:
Here's the c# code:
String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"' };
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
    int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
    if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
    {
        int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment, not from the start of the string
        if (eol == -1)
            eol = sql.Length; //no more newline, meaning end of the string
        else
            eol += 2; //remove the line break together with the comment
        sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
        lastCommentLiteral = newCommentLiteral;
    }
    else //this is within a string, find string ending and moving to it
    {
        int singleQuote = sql.IndexOf("'", newCommentLiteral);
        if (singleQuote == -1)
            singleQuote = sql.Length;
        int doubleQuote = sql.IndexOf('"', newCommentLiteral);
        if (doubleQuote == -1)
            doubleQuote = sql.Length;
        lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;
        //instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
    }
}
Console.WriteLine(sql);
What this does: it finds every comment literal. For each one, it checks whether it is within a string or not by counting the number of quotes between the current match and the last one. If this number is even, then it's a comment, so it is removed (find the first end of line and remove what's in between). If it's odd, the match is within a string, so find the end of the string and move to it. This snippet is based on a weird SQL trick: 'this" is a valid string, even though the 2 quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
You want something like this for the simple case
-{2,}.*
The -{2,} looks for a dash that happens 2 or more times
The .* gets the rest of the lines up to the newline
But for the edge cases, it appears that SinistraD is correct that you cannot catch everything; however, here is an article about how this can be done in C# with a combination of code and regex.
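A quick sketch of that simple pattern in C#, including one of the edge cases it gets wrong. The class and method names are hypothetical; only Regex.Replace from System.Text.RegularExpressions is used.

```csharp
using System.Text.RegularExpressions;

public static class NaiveCommentStripper
{
    // Removes everything from the first run of two or more dashes
    // to the end of the line. It has no notion of string literals.
    public static string Strip(string sql)
    {
        return Regex.Replace(sql, "-{2,}.*", "");
    }
}
```

Strip("SELECT 1 -- one") returns "SELECT 1 " as intended, but Strip("SELECT '--not a comment--' FROM ATable") returns "SELECT '", cutting the statement off inside the string literal. That is the case the other answers try to handle.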
This seems to work well for me so far; it even ignores comments within strings, such as SELECT '--not a comment--' FROM ATable
private static string removeComments(string sql)
{
    string pattern = @"(?<=^ ([^'""] |['][^']*['] |[""][^""]*[""])*) (--.*$|/\*(.|\n)*?\*/)";
    return Regex.Replace(sql, pattern, "", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
}
Note: it is designed to eliminate both /**/-style comments as well as -- style. Remove |/\*(.|\n)*?\*/ to get rid of the /**/ checking. Also be sure you are using the RegexOptions.IgnorePatternWhitespace Regex option!!
I wanted to be able to handle double-quotes too, but since T-SQL doesn't support them, you could get rid of |[""][^""]*[""] too.
Adapted from here.
Note (Mar 2015): In the end, I wound up using Antlr, a parser generator, for this project. There may have been some edge cases where the regex didn't work. In the end I was much more confident with the results having used Antlr, and it's worked well.
using System.Text.RegularExpressions;
public static string RemoveSQLCommentCallback(Match SQLLineMatch)
{
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    bool open = false; //opening of SQL String found
    char prev_ch = ' ';
    foreach (char ch in SQLLineMatch.ToString())
    {
        if (ch == '\'')
        {
            open = !open;
        }
        else if ((!open && prev_ch == '-' && ch == '-'))
        {
            break;
        }
        sb.Append(ch);
        prev_ch = ch;
    }
    return sb.ToString().Trim('-');
}
The code
public static void Main()
{
    string sqlText = "WHERE DEPT_NAME LIKE '--Test--' AND START_DATE < SYSDATE -- Don't go over today";
    //for every matching line call callback func
    string result = Regex.Replace(sqlText, ".*--.*", RemoveSQLCommentCallback);
}
Let's replace, find all the lines that match dash dash comment and call your parsing function for every match.
As a late solution, the simplest way is to do it using ScriptDom-TSqlParser:
// https://michaeljswart.com/2014/04/removing-comments-from-sql/
// http://web.archive.org/web/*/https://michaeljswart.com/2014/04/removing-comments-from-sql/
public static string StripCommentsFromSQL(string SQL)
{
Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser parser =
new Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser(true);
System.Collections.Generic.IList<Microsoft.SqlServer.TransactSql.ScriptDom.ParseError> errors;
Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment fragments =
parser.Parse(new System.IO.StringReader(SQL), out errors);
// clear comments
string result = string.Join(
string.Empty,
fragments.ScriptTokenStream
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.MultilineComment)
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.SingleLineComment)
.Select(x => x.Text));
return result;
}
or, instead of using the Microsoft parser, you can use the ANTLR4 TSqlLexer
or without any parser at all:
private static System.Text.RegularExpressions.Regex everythingExceptNewLines =
new System.Text.RegularExpressions.Regex("[^\r\n]");
// http://drizin.io/Removing-comments-from-SQL-scripts/
// http://web.archive.org/web/*/http://drizin.io/Removing-comments-from-SQL-scripts/
public static string RemoveComments(string input, bool preservePositions, bool removeLiterals = false)
{
//based on http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689
var lineComments = @"--(.*?)\r?\n";
var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
// literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
// there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
//var blockComments = @"/\*(.*?)\*/"; //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
//so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
var nestedBlockComments = @"/\*
(?>
/\* (?<LEVEL>) # On opening push level
|
\*/ (?<-LEVEL>) # On closing pop level
|
(?! /\* | \*/ ) . # Match any char unless the opening and closing strings
)+ # /* or */ in the lookahead string
(?(LEVEL)(?!)) # If level exists then fail
\*/";
string noComments = System.Text.RegularExpressions.Regex.Replace(input,
nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
me => {
if (me.Value.StartsWith("/*") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
else if (me.Value.StartsWith("/*") && !preservePositions)
return "";
else if (me.Value.StartsWith("--") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
else if (me.Value.StartsWith("--") && !preservePositions)
return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
return me.Value; // do not remove object identifiers ever
else if (!removeLiterals) // Keep the literal strings
return me.Value;
else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
{
var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
}
else if (removeLiterals && !preservePositions) // wrap completely all literals
return "''";
else
throw new System.NotImplementedException();
},
System.Text.RegularExpressions.RegexOptions.Singleline | System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace);
return noComments;
}
I don't know if C#/VB.net regex is special in some way but traditionally s/--.*// should work.
In PHP, I'm using this code to strip comments from SQL (single-line comments only):
$sqlComments = '#(([\'"`]).*?[^\\\]\2)|((?:\#|--).*?$)\s*|(?<=;)\s+#ms';
/* Commented version
$sqlComments = '#
(([\'"`]).*?[^\\\]\2) # $1 : Skip single & double quoted + backticked expressions
|((?:\#|--).*?$) # $3 : Match single line comments
\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
#msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
To remove all comments see Regex to match MySQL comments
