Regex to remove this string in C#?

I am scripting Agent Jobs using SMO for SQL Server, and the resulting script strings have a parameter and value that I want to remove from the final version I am storing. The portion of the script I'm looking at is the schedule being added to the job, which includes a @schedule_uid parameter with a GUID associated with it. I'd like to remove this entirely from the script.
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name='Job Name',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=10,
@freq_relative_interval=1,
@freq_recurrence_factor=0,
@active_start_date=20150119,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959,
@schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2'
The part that I want to replace is the following:
, \r\n\t\t@schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2'
So that the final string is:
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name='Job Name',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=10,
@freq_relative_interval=1,
@freq_recurrence_factor=0,
@active_start_date=20150119,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959
I've tried various combinations of things I've been reading online but I can't seem to make it replace or even match. I know that the regex for the guid matching is:
\b[A-F0-9]{8}(?:-[A-F0-9]{4}){3}-[A-F0-9]{12}\b'
I've tried to add this into a number of things, and thought that one of the following regexes would work, but I can't figure out what I'm doing wrong or missing:
@", \r\n\t\t@schedule_uid=N'\b[A-F0-9]{8}(?:-[A-F0-9]{4}){3}-[A-F0-9]{12}\b'"
@", \r\n\t\t@schedule_uid=N'[A-F0-9]{8}(?:-[A-F0-9]{4}){3}-[A-F0-9]{12}'"
@", \r\n\t\t\b@schedule_uid=N'[A-F0-9]{8}(?:-[A-F0-9]{4}){3}-[A-F0-9]{12}'\b"
I'm not looking for a solution as much as I'd like to know what I'm missing. I've been reading the regular-expressions.info site for a while and I'm usually able to figure out the correct regex, but this has had me stumped for a few days now.
EDIT:
It's not always the last item, and it's not guaranteed to occur only once within the script, since a job can have multiple schedules with different @schedule_uid values, and I want to get rid of all of them without looping. This is why I chose Regex for the operation. It also needs to remove the comma at the end of the previous parameter's line so that the code remains syntactically correct.

The following seems to work for me and it will enable you to remove all the newlines, tabs etc:
(?:\n|\t|\r|.){1,3}.*\@sc.*'
You can see it working here

There you go:
@schedule_uid=N'[\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}'
Created and tested using http://regexpal.com/
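The pattern above matches only the parameter itself, so the comma and line break in front of it would stay behind. A minimal C# sketch that also consumes them is shown here. It assumes the parameter is spelled @schedule_uid as in T-SQL, and the class and method names are made up for illustration. One likely reason the attempts in the question never match is that [A-F0-9] is case-sensitive while the GUID in the script is lowercase, so the sketch passes RegexOptions.IgnoreCase; it also uses \s* rather than a fixed whitespace sequence.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SMO.
public static class ScheduleUidStripper
{
    // A comma, any run of whitespace (spaces, \r\n, tabs), then
    // @schedule_uid=N'<guid>'. IgnoreCase lets [0-9A-F] match lowercase hex digits.
    private static readonly Regex ScheduleUid = new Regex(
        @",\s*@schedule_uid\s*=\s*N'[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}'",
        RegexOptions.IgnoreCase);

    // Removes every occurrence in one call, wherever it appears in the parameter list.
    public static string Strip(string script)
    {
        return ScheduleUid.Replace(script, string.Empty);
    }
}
```

Because the match starts at the preceding comma, the remaining parameter list stays valid whether the parameter was last or in the middle. It would not handle a @schedule_uid that is the very first parameter, since there is no comma in front of it.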

Assuming as little as possible, just using basic string operations.
string exec = ...
int i = exec.IndexOf("@schedule_uid");
while (i > -1)
{
int j = i;
// Find the previous comma (stop at the start of the string if there is none)
while (i > 0 && exec[i] != ',')
i--;
// Find the end, next line, or next comma
while (j < exec.Length && exec[j] != '\r' && exec[j] != ',')
j++;
// Remove everything from the preceding comma up to the end of the parameter
exec = exec.Remove(i, j - i);
i = exec.IndexOf("@schedule_uid");
}
I'm deliberately ignoring the no-looping requirement in favour of simple code that works. Tested against this:
string exec = @"
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule, @schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2', @job_id=@jobId, @name='Job Name',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=10,
@freq_relative_interval=1,
@freq_recurrence_factor=0,
@schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2',
@active_start_date=20150119,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959,
@schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2'";

A little more complicated, but works.
string test = "EXEC...";
// Requires: using System.Linq;
// Split on commas, drop every piece that holds a @schedule_uid parameter,
// then re-join with commas so that no trailing comma is left behind.
var pieces = test.Split(',')
.Select((piece, index) =>
{
var indexOf = piece.IndexOf("@schedule_uid");
if (indexOf == -1)
{
return piece;
}
// Only the first piece can carry text before the parameter that must be kept.
return index == 0 ? piece.Substring(0, indexOf) : null;
})
.Where(piece => piece != null);
test = string.Join(",", pieces);
JsFiddle Example.

Related

counting a string with special characters in a string in c#

I would like to count the occurrences of a string (the search term) in another string (a logfile).
Splitting the string with the Split method and searching the array afterwards is too inefficient for me, because the logfile is very large.
I found the following approach online, which has worked quite well so far:
count = Regex.Matches(_editor.Text, txtLookFor.Text, RegexOptions.IgnoreCase).Count;
However, I am now running into another problem: I get the following error when I count a string in the format of "Nachricht erhalten (".
Error message:
System.ArgumentException: "Nachricht erhalten (" analysed - not enough )-characters.

You need to escape the ( symbol, as it has a special function in regular expressions:
var test = Regex.Matches("Nachricht erhalten (3)", @"Nachricht erhalten \(", RegexOptions.IgnoreCase).Count;
If the search term comes from user input and the user is not familiar with regular expressions, you are probably better off using IndexOf in a while loop, where each search starts from the index found in the previous iteration. That might also perform a bit better than a regular expression. Example:
var test = "This is a test";
var searchFor = "is";
var count = 0;
var index = test.IndexOf(searchFor, 0);
while (index != -1)
{
++count;
index = test.IndexOf(searchFor, index + searchFor.Length);
}
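If the case-insensitive Regex.Matches call from the question is to be kept, another option is to escape the user's input with Regex.Escape before using it as a pattern, so that characters such as ( are treated literally. A minimal sketch follows; the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for counting a literal search term with a regex.
public static class TermCounter
{
    public static int CountOccurrences(string text, string searchTerm)
    {
        // An empty pattern would match at every position, so treat it as "nothing to count".
        if (string.IsNullOrEmpty(searchTerm))
        {
            return 0;
        }

        // Regex.Escape turns metacharacters such as ( . * into their literal forms.
        string pattern = Regex.Escape(searchTerm);
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

With this, TermCounter.CountOccurrences(_editor.Text, txtLookFor.Text) should no longer throw for a term like "Nachricht erhalten (".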

Regex to remove single-line SQL comments (--)

Question:
Can anybody give me a working regex expression (C#/VB.NET) that can remove single line comments from a SQL statement ?
I mean these comments:
-- This is a comment
not those
/* this is a comment */
because I already can handle the star comments.
I have made a little parser that removes those comments when they are at the start of the line, but they can also be somewhere after code or, worse, in a SQL string: 'hello --Test -- World'
Those comments should also be removed (except those in a SQL string, of course, if possible).
Surprisingly, I didn't get the regex working. I would have assumed the star comments to be more difficult, but actually, they aren't.
As requested, here is my code to remove /**/-style comments.
(In order to have it ignore SQL-style strings, you have to substitute the strings with a uniqueidentifier (I used 4 concatenated), then apply the comment removal, then substitute the strings back.)
static string RemoveCstyleComments(string strInput)
{
string strPattern = @"/[*][\w\d\s]+[*]/";
//strPattern = @"/\*.*?\*/"; // Doesn't work
//strPattern = "/\\*.*?\\*/"; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
//strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work
// http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/"; // Works !
string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
Console.WriteLine(strOutput);
return strOutput;
} // End Function RemoveCstyleComments

I will disappoint all of you. This can't be done with regular expressions. Sure, it's easy to find comments outside a string (even the OP could do that); the real problem is comment markers inside a string. There is a little hope with lookarounds, but that's still not enough. Knowing that there is a preceding quote on the line doesn't guarantee anything. The only thing that tells you something is the parity (odd or even) of the number of preceding quotes, which is something you can't find with a regular expression. So simply go with a non-regex approach.
EDIT:
Here's the c# code:
String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"'};
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
{
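int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment, not from the start of the string
if (eol == -1)
eol = sql.Length; //no more newline, meaning end of the string
else
eol += 2; //remove the line break together with the comment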
sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
lastCommentLiteral = newCommentLiteral;
}
else //this is within a string, find string ending and moving to it
{
int singleQuote = sql.IndexOf("'", newCommentLiteral);
if (singleQuote == -1)
singleQuote = sql.Length;
int doubleQuote = sql.IndexOf('"', newCommentLiteral);
if (doubleQuote == -1)
doubleQuote = sql.Length;
lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;
//instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
}
}
Console.WriteLine(sql);
What this does: find every comment literal. For each one, check whether it is inside a string by counting the quotes between the current match and the previous one. If the count is even, it's a comment, so remove it (find the next end of line and remove what's in between). If it's odd, the match is inside a string, so find the end of the string and move past it. This snippet relies on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.

You want something like this for the simple case
-{2,}.*
The -{2,} looks for a dash that happens 2 or more times
The .* gets the rest of the lines up to the newline
For the edge cases, however, it appears that SinistraD is correct that you cannot catch everything. That said, here is an article about how this can be done in C# with a combination of code and regex.
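A quick way to see both the simple case and the edge case is to run the pattern through Regex.Replace. The sketch below is only an illustration with made-up names. It swaps .* for [^\r\n]*, because in .NET the dot matches \r, so with Windows line endings .* would remove the \r as well.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper showing the naive approach and its limitation.
public static class NaiveLineComments
{
    // Removes everything from "--" up to, but not including, the line break.
    // It knows nothing about string literals, so "--" inside quotes is cut too.
    public static string Strip(string sql)
    {
        return Regex.Replace(sql, @"-{2,}[^\r\n]*", string.Empty);
    }
}
```

For "SELECT 1 -- note\r\nFROM t" this leaves "SELECT 1 \r\nFROM t", which is the intended result. For "SELECT '--x'" it leaves "SELECT '", which shows why the string-literal case needs more than this pattern.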

This seems to work well for me so far; it even ignores comments within strings, such as SELECT '--not a comment--' FROM ATable
private static string removeComments(string sql)
{
string pattern = @"(?<=^ ([^'""] |['][^']*['] |[""][^""]*[""])*) (--.*$|/\*(.|\n)*?\*/)";
return Regex.Replace(sql, pattern, "", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
}
Note: it is designed to eliminate both /**/-style comments as well as -- style. Remove |/\*(.|\n)*?\*/ to get rid of the /**/ checking. Also be sure you are using the RegexOptions.IgnorePatternWhitespace Regex option!!
I wanted to be able to handle double-quotes too, but since T-SQL doesn't support them, you could get rid of |[""][^""]*[""] too.
Adapted from here.
Note (Mar 2015): In the end, I wound up using Antlr, a parser generator, for this project. There may have been some edge cases where the regex didn't work. In the end I was much more confident with the results having used Antlr, and it's worked well.

using System.Text.RegularExpressions;
public static string RemoveSQLCommentCallback(Match SQLLineMatch)
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
bool open = false; //opening of SQL String found
char prev_ch = ' ';
foreach (char ch in SQLLineMatch.ToString())
{
if (ch == '\'')
{
open = !open;
}
else if ((!open && prev_ch == '-' && ch == '-'))
{
break;
}
sb.Append(ch);
prev_ch = ch;
}
return sb.ToString().Trim('-');
}
The code
public static void Main()
{
string sqlText = "WHERE DEPT_NAME LIKE '--Test--' AND START_DATE < SYSDATE -- Don't go over today";
//for every matching line call callback func
string result = Regex.Replace(sqlText, ".*--.*", RemoveSQLCommentCallback);
}
Let's use Replace: find all the lines that contain a dash-dash comment and call your parsing function for every match.

As a late solution, the simplest way is to do it using ScriptDom-TSqlParser:
// https://michaeljswart.com/2014/04/removing-comments-from-sql/
// http://web.archive.org/web/*/https://michaeljswart.com/2014/04/removing-comments-from-sql/
public static string StripCommentsFromSQL(string SQL)
{
Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser parser =
new Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser(true);
System.Collections.Generic.IList<Microsoft.SqlServer.TransactSql.ScriptDom.ParseError> errors;
Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment fragments =
parser.Parse(new System.IO.StringReader(SQL), out errors);
// clear comments
string result = string.Join(
string.Empty,
fragments.ScriptTokenStream
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.MultilineComment)
.Where(x => x.TokenType != Microsoft.SqlServer.TransactSql.ScriptDom.TSqlTokenType.SingleLineComment)
.Select(x => x.Text));
return result;
}
or, instead of using the Microsoft parser, you can use the ANTLR4 TSqlLexer
or without any parser at all:
private static System.Text.RegularExpressions.Regex everythingExceptNewLines =
new System.Text.RegularExpressions.Regex("[^\r\n]");
// http://drizin.io/Removing-comments-from-SQL-scripts/
// http://web.archive.org/web/*/http://drizin.io/Removing-comments-from-SQL-scripts/
public static string RemoveComments(string input, bool preservePositions, bool removeLiterals = false)
{
//based on http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689
var lineComments = @"--(.*?)\r?\n";
var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
// literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
// there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
//var blockComments = @"/\*(.*?)\*/"; //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
//so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
var nestedBlockComments = @"/\*
(?>
/\* (?<LEVEL>) # On opening push level
|
\*/ (?<-LEVEL>) # On closing pop level
|
(?! /\* | \*/ ) . # Match any char unless the opening and closing strings
)+ # /* or */ in the lookahead string
(?(LEVEL)(?!)) # If level exists then fail
\*/";
string noComments = System.Text.RegularExpressions.Regex.Replace(input,
nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
me => {
if (me.Value.StartsWith("/*") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
else if (me.Value.StartsWith("/*") && !preservePositions)
return "";
else if (me.Value.StartsWith("--") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
else if (me.Value.StartsWith("--") && !preservePositions)
return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
return me.Value; // do not remove object identifiers ever
else if (!removeLiterals) // Keep the literal strings
return me.Value;
else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
{
var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
}
else if (removeLiterals && !preservePositions) // wrap completely all literals
return "''";
else
throw new System.NotImplementedException();
},
System.Text.RegularExpressions.RegexOptions.Singleline | System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace);
return noComments;
}

I don't know if C#/VB.net regex is special in some way but traditionally s/--.*// should work.

In PHP, i'm using this code to uncomment SQL (only single line):
$sqlComments = '#(([\'"`]).*?[^\\\]\2)|((?:\#|--).*?$)\s*|(?<=;)\s+#ms';
/* Commented version
$sqlComments = '#
(([\'"`]).*?[^\\\]\2) # $1 : Skip single & double quoted + backticked expressions
|((?:\#|--).*?$) # $3 : Match single line comments
\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
#msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
To remove all comments see Regex to match MySQL comments

Parsing CSV File enclosed with quotes in C#

I've seen lots of samples in parsing CSV File. but this one is kind of annoying file...
so how do you parse this kind of CSV
"1",1/2/2010,"The sample ("adasdad") asdada","I was pooping in the door "Stinky", so I'll be damn","AK"

The best answer in most cases is probably @Jim Mischel's. TextFieldParser seems to be exactly what you want for most conventional cases -- though it strangely lives in the Microsoft.VisualBasic namespace! But this case isn't conventional.
The last time I ran into a variation on this issue where I needed something unconventional, I embarrassingly gave up on regexp'ing and bullheaded a char by char check. Sometimes, that's not-wrong enough to do. Splitting a string isn't as difficult a problem if you byte push.
So I rewrote for this case as a string extension. I think this is close.
Do note that, "I was pooping in the door "Stinky", so I'll be damn", is an especially nasty case. Without the *** STINKY CONDITION *** code, below, you'd get I was pooping in the door "Stinky as one value and so I'll be damn" as the other.
The only way to do better than that for any anonymous weird splitter/escape case would be to have some sort of algorithm to determine the "usual" number of columns in each row, and then check for, in this case, fixed length fields like your AK state entry or some other possible landmark as a sort of normalizing backstop for nonconformist columns. But that's serious crazy logic that likely isn't called for, as much fun as it'd be to code. As @Vash points out, you're better off following some standard and coding a little more OFfensively.
But the problem here is probably easier than that. The only lexically meaningful case is the one in your example -- ", -- double quote, comma, and then a space. So that's what the *** STINKY CONDITION *** code checks. Even so, this code is getting nastier than I'd like, which means you have ever stranger edge cases, like "This is also stinky," a f a b","Now what?" Heck, even "A,"B","C" doesn't work in this code right now, iirc, since I treat the begin and end chars as having been escape pre- and post-fixed. So we're largely back to @Vash's comment!
Apologies for all the brackets for one-line if statements, but I'm stuck in a StyleCop world right now. I'm not necessarily suggesting you use this -- that strictEscapeToSplitEvaluation plus the STINKY CONDITION makes this a little complex. But it's worth keeping in mind that a normal csv parser that's intelligent about quotes is significantly more straightforward to the point of being tedious, but otherwise trivial.
namespace YourFavoriteNamespace
{
using System;
using System.Collections.Generic;
using System.Text;
public static class Extensions
{
public static Queue<string> SplitSeeingQuotes(this string valToSplit, char splittingChar = ',', char escapeChar = '"',
bool strictEscapeToSplitEvaluation = true, bool captureEndingNull = false)
{
Queue<string> qReturn = new Queue<string>();
StringBuilder stringBuilder = new StringBuilder();
bool bInEscapeVal = false;
for (int i = 0; i < valToSplit.Length; i++)
{
if (!bInEscapeVal)
{
// Escape values must come immediately after a split.
// abc,"b,ca",cab has an escaped comma.
// abc,b"ca,c"ab does not.
if (escapeChar == valToSplit[i] && (!strictEscapeToSplitEvaluation || (i == 0 || (i != 0 && splittingChar == valToSplit[i - 1]))))
{
bInEscapeVal = true; // not capturing escapeChar as part of value; easy enough to change if need be.
}
else if (splittingChar == valToSplit[i])
{
qReturn.Enqueue(stringBuilder.ToString());
stringBuilder = new StringBuilder();
}
else
{
stringBuilder.Append(valToSplit[i]);
}
}
else
{
// Can't use switch b/c we're comparing to a variable, I believe.
if (escapeChar == valToSplit[i])
{
// Repeated escape always reduces to one escape char in this logic.
// So if you wanted "I'm ""double quote"" crazy!" to come out with
// the double double quotes, you're toast.
if (i + 1 < valToSplit.Length && escapeChar == valToSplit[i + 1])
{
i++;
stringBuilder.Append(escapeChar);
}
else if (!strictEscapeToSplitEvaluation)
{
bInEscapeVal = false;
}
// *** STINKY CONDITION ***
// Kinda defense, since only `", ` really makes sense.
else if ('"' == escapeChar && i + 2 < valToSplit.Length &&
valToSplit[i + 1] == ',' && valToSplit[i + 2] == ' ')
{
i = i+2;
stringBuilder.Append("\", ");
}
// *** EO STINKY CONDITION ***
else if (i+1 == valToSplit.Length || (i + 1 < valToSplit.Length && valToSplit[i + 1] == splittingChar))
{
bInEscapeVal = false;
}
else
{
stringBuilder.Append(escapeChar);
}
}
else
{
stringBuilder.Append(valToSplit[i]);
}
}
}
// NOTE: The `captureEndingNull` flag is not tested.
// Catch null final entry? "abc,cab,bca," could be four entries, with the last an empty string.
if ((captureEndingNull && splittingChar == valToSplit[valToSplit.Length-1]) || (stringBuilder.Length > 0))
{
qReturn.Enqueue(stringBuilder.ToString());
}
return qReturn;
}
}
}
Probably worth mentioning that the "answer" you gave yourself doesn't have the "Stinky" problem in its sample string. ;^)
[Understanding that we're three years after you asked,] I will say that your example isn't as insane as folks here make out. I can see wanting to treat escape characters (in this case, ") as escape characters only when they're the first value after the splitting character or, after finding an opening escape, stopping only if you find the escape character before a splitter; in this case, the splitter is obviously ,.
If the row of your csv is abc,bc"a,ca"b, I would expect that to mean we've got three values: abc, bc"a, and ca"b.
Same deal in your "The sample ("adasdad") asdada" column -- quotes that don't begin and end a cell value aren't escape characters and don't necessarily need doubling to maintain meaning. So I added a strictEscapeToSplitEvaluation flag here.
Enjoy. ;^)

I very strongly recommend using TextFieldParser. Hand-coded parsers that use String.Split or regular expressions almost invariably mishandle things like quoted fields that have embedded quotes or embedded separators.
I would be surprised, though, if it handled your particular example. As others have said, that line is, at best, ambiguous.
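For reference, a minimal sketch of reading CSV text with TextFieldParser is shown below. The wrapper class and method names are made up for illustration, and the sketch assumes well-formed input in which embedded quotes are doubled. As far as I know, a line the parser cannot interpret raises a MalformedLineException, which is probably what would happen with the line from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper around TextFieldParser for comma-delimited text.
public static class CsvLines
{
    public static List<string[]> Parse(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "..." as a single field even when it contains commas.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

On .NET Framework this needs a reference to the Microsoft.VisualBasic assembly, even from a C# project.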

Split based on
",
I would use MyString.IndexOf("\",")
and then substring the parts. Other than that, I'm sure someone has written a CSV parser out there that can handle this :)

I found a way to parse this malformed CSV. I looked for a pattern and found it.... I first replace (",") with a character... like "¤" and then split it...
from this:
"Annoying","CSV File","poop@mypants.com",1999,01-20-2001,"oh,boy",01-20-2001,"yeah baby","yeah!"
to this:
"Annoying¤CSV File¤poop@mypants.com",1999,01-20-2001,"oh,boy",01-20-2001,"yeah baby¤yeah!"
then split it:
ArrayA[0]: "Annoying //this value will be trimmed by replace("\"","") same as the array[4]
ArrayA[1]: CSV File
ArrayA[2]: poop@mypants.com",1999,01-20-2001,"oh,boy",01-20-2001,"yeah baby
ArrayA[3]: yeah!"
after splitting it, I will replace strings from ArrayA[2] ", and ," with ¤ and then split it again
from this
ArrayA[2]: poop@mypants.com",1999,01-20-2001,"oh,boy",01-20-2001,"yeah baby
to this
ArrayA[2]: poop@mypants.com¤1999,01-20-2001¤oh,boy¤01-20-2001¤yeah baby
then split it again and would turn to this
ArrayB[0]: poop@mypants.com
ArrayB[1]: 1999,01-20-2001
ArrayB[2]: oh,boy
ArrayB[3]: 01-20-2001
ArrayB[4]: yeah baby
and lastly... I'll split the Year only and the date from ArrayB[1] with , to ArrayC
It's tedious but there's no other way to do it...

Another open source library, Cinchoo ETL, handles quoted strings fine. Here is some sample code.
string csv = @"""1"",1/2/2010,""The sample(""adasdad"") asdada"",""I was pooping in the door ""Stinky"", so I'll be damn"",""AK""";
using (var r = ChoCSVReader.LoadText(csv)
.QuoteAllFields()
)
{
foreach (var rec in r)
Console.WriteLine(rec.Dump());
}
Output:
[Count: 5]
Key: Column1 [Type: Int64]
Value: 1
Key: Column2 [Type: DateTime]
Value: 1/2/2010 12:00:00 AM
Key: Column3 [Type: String]
Value: The sample(adasdad) asdada
Key: Column4 [Type: String]
Value: I was pooping in the door Stinky, so I'll be damn
Key: Column5 [Type: String]
Value: AK

You could split the string by ",". It is recommended that each cell value in the CSV file be enclosed in quotes, like "1","2","3".....

I don't see how you could if each line is different. This line is malformed CSV. Quotes contained within a value must be doubled, as shown below. I can't even tell for sure where the values should be terminated.
"1",1/2/2010,"The sample (""adasdad"") asdada","I was pooping in the door ""Stinky"", so I'll be damn","AK"
Here's my code to parse a CSV file but I don't see how any code would know how to handle your line because it's malformed.

You might want to give CsvReader a try. It handles quoted strings fine, so you will just have to remove the leading and trailing quotes.
It will fail if your strings contain a comma. To avoid this, the quotes need to be doubled, as said in other answers.

As no (decent) .csv parser can parse non-csv-data correctly, the task isn't to parse the data, but to fix the file(s) (and then to parse the correct data).
To fix the data you need a list of bad rows (to be sent to the person responsible for the garbage for manual editing). To get such a list, you can either:
use Access with a correct import specification to import the file (you'll get a list of import failures), or
write a script/program that opens the file via the OLEDB text driver.
Sample file:
"Id","Remark","DateDue"
1,"This is good",20110413
2,"This is ""good""",20110414
3,"This is ""good"","bad",and "ugly",,20110415
4,"This is ""good""" again,20110415
Sample SQL/Result:
SELECT * FROM [badcsv01.csv]
Id Remark DateDue
1 This is good 4/13/2011
2 This is "good" 4/14/2011
3 This is "good", NULL
4 This is "good" again 4/15/2011
SELECT * FROM [badcsv01.csv] WHERE DateDue Is Null
Id Remark DateDue
3 This is "good", NULL

First, do it for the column names:
DataTable pbResults = new DataTable();
OracleDataAdapter oda = new OracleDataAdapter(cmd);
oda.Fill(pbResults);
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
IEnumerable<string> columnNames = pbResults.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
sb1.Append(string.Join("\"" + "," + "\"", columnNames));
sb2.Append("\"");
sb2.Append(sb1);
sb2.AppendLine("\"");
Second you will do it for each row:
foreach (DataRow row in pbResults.Rows)
{
IEnumerable<string> fields = row.ItemArray.Select(field => field.ToString());
sb2.Append("\"");
sb2.Append(string.Join("\"" + "," + "\"", fields));
sb2.AppendLine("\"");
}
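Note that the code above wraps each value in quotes but does not touch quotes inside a value, which is how malformed lines like the one in the question come about. A small sketch of a quoting helper that doubles embedded quotes follows. The class and method names are made up for illustration and are not part of the answer above.

```csharp
// Hypothetical helper: wraps a value in quotes and doubles any quotes inside it,
// so that a CSV parser can tell where the field ends.
public static class CsvQuoting
{
    public static string QuoteField(object value)
    {
        // Null values become an empty quoted field.
        string text = value == null ? string.Empty : value.ToString();
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

In the row loop, this could be used as string.Join(",", row.ItemArray.Select(CsvQuoting.QuoteField)), without the separate Append calls for the opening and closing quotes.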

Retrieving embedded resources with special characters

I'm having a problem getting streams for embedded resources. Most online samples show paths that can be directly translated by changing the slash of a path to a dot for the source (MyFolder/MyFile.ext becomes MyNamespace.MyFolder.MyFile.ext). However when a folder has a dot in the name and when special characters are used, manually getting the resource name does not work. I'm trying to find a function that can convert a path to a resource name as Visual Studio renames them when compiling..
These names from the solution ...
Content/jQuery.UI-1.8.2/jQuery.UI.css
Scripts/jQuery-1.5.2/jQuery.js
Scripts/jQuery.jPlayer-2.0.0/jQuery.jPlayer.js
Scripts/jQuery.UI-1.8.2/jQuery.UI.js
... are changed into these names in the resources ...
Content.jQuery.UI_1._8._2.jQuery.UI.css
Scripts.jQuery_1._5._2.jQuery.js
Scripts.jQuery.jPlayer_2._0._0.jQuery.jPlayer.js
Scripts.jQuery.UI_1._8._12.jQuery.UI.js
Slashes are translated to dots. However, when a dot is used in a folder name, the first dot is apparently considered an extension and the rest of the dots are changed to be prefixed with an underscore. This logic does not apply on the jQuery.js file, though, maybe because the 'extension' is a single number? Here's a function able to translate the issues I've had so far, but doesn't work on the jQuery.js path.
protected String _GetResourceName( String[] zSegments )
{
String zResource = String.Empty;
for ( int i = 0; i < zSegments.Length; i++ )
{
if ( i != ( zSegments.Length - 1 ))
{
int iPos = zSegments[i].IndexOf( '.' );
if ( iPos != -1 )
{
zSegments[i] = zSegments[i].Substring( 0, iPos + 1 )
+ zSegments[i].Substring( iPos + 1 ).Replace( ".", "._" );
}
}
zResource += zSegments[i].Replace( '/', '.' ).Replace( '-', '_' );
}
return String.Concat( _zAssemblyName, zResource );
}
Is there a function that can change the names for me? What is it? Or where can I find all the rules so I can write my own function? Thanks for any assistance you may be able to provide.

This is kinda a very late answer... But since this was the first hit on google, I'll post what I've found!
You can simply force the compiler to name the embedded resource the way you want, which solves this problem from the start. You just have to edit your csproj file, which you normally do anyway if you want wildcards in it. Here is what I did:
<EmbeddedResource Include="$(SolutionDir)\somefolder\**">
<Link>somefolder\%(RecursiveDir)%(Filename)%(Extension)</Link>
<LogicalName>somefolder:\%(RecursiveDir)%(Filename)%(Extension)</LogicalName>
</EmbeddedResource>
In this case, I'm telling Visual Studio that I want all the files in "somefolder" to be imported as embedded resources. I also want them to be shown under "somefolder" in the VS Solution Explorer (this is the Link tag). And finally, when compiling them, I want them to be named with exactly the same name and path they had on my disk, with only the "somefolder:\" prefix. The last part does the magic.
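Whichever naming scheme ends up in the assembly, the names that were actually embedded can be checked at runtime with Assembly.GetManifestResourceNames. The sketch below looks a resource up by the end of its name instead of rebuilding the full name by hand. The class and method names are made up for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for finding an embedded resource without guessing its full name.
public static class ResourceNames
{
    // Returns the first manifest resource name ending with the given suffix,
    // or null when the assembly has no such resource.
    public static string FindBySuffix(Assembly assembly, string suffix)
    {
        foreach (string name in assembly.GetManifestResourceNames())
        {
            if (name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            {
                return name;
            }
        }
        return null;
    }
}
```

The returned name can then be passed to assembly.GetManifestResourceStream(name). A suffix match can be ambiguous if two folders contain files with the same name, so the suffix should be long enough to be unique.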

This is what I came up with to solve the issue. I'm still open for better methods, as this is a bit of a hack (but seems to be accurate with the current specifications). The function expects a segment from an Uri to process (LocalPath when dealing with web requests). Example call is below..
protected String _GetResourceName( String[] zSegments )
{
// Initialize the resource string to return.
String zResource = String.Empty;
// Initialize the variables for the dot- and find position.
int iDotPos, iFindPos;
// Loop through the segments of the provided Uri.
for ( int i = 0; i < zSegments.Length; i++ )
{
// Find the first occurrence of the dot character.
iDotPos = zSegments[i].IndexOf( '.' );
// Check if this segment is a folder segment.
if ( i < zSegments.Length - 1 )
{
// A dash in a folder segment will cause each following dot occurrence to be appended with an underscore.
if (( iFindPos = zSegments[i].IndexOf( '-' )) != -1 && iDotPos != -1 )
{
zSegments[i] = zSegments[i].Substring( 0, iFindPos + 1 ) + zSegments[i].Substring( iFindPos + 1 ).Replace( ".", "._" );
}
// A dash is replaced with an underscore when no underscores are in the name or a dot occurrence is before it.
//if (( iFindPos = zSegments[i].IndexOf( '_' )) == -1 || ( iDotPos >= 0 && iDotPos < iFindPos ))
{
zSegments[i] = zSegments[i].Replace( '-', '_' );
}
}
// Each slash is replaced by a dot.
zResource += zSegments[i].Replace( '/', '.' );
}
// Return the assembly name with the resource name.
return String.Concat( _zAssemblyName, zResource );
}
Example call:
var testResourceName = _GetResourceName( new String[] {
"/",
"Scripts/",
"jQuery.UI-1.8.12/",
"jQuery-_.UI.js"
});

Roel,
Hmmm... This is a hack, but I guess it should work. Just define an empty "Marker" class in each directory which contains resources, then get the FullName of its type, remove the class name from the end, and voilà: there's your decoded path.
string path = (new MarkerClass()).GetType().FullName.Replace(".MarkerClass", "");
I'm sure there's a "better" way to do it... with a LOT more lines of code; and this one has the advantage that Microsoft maintains it when they change stuff ;-)
Cheers. Keith.
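As a small variation on the marker-class idea, Type.Namespace returns the same prefix without instantiating the class or trimming its name. This is only a sketch: the namespace MyApp.Resources.Scripts is hypothetical, and it assumes the marker's namespace mirrors the folder that holds the resources.

```csharp
using System;

namespace MyApp.Resources.Scripts
{
    // Hypothetical empty marker type placed in the folder that holds the resources.
    public sealed class MarkerClass { }
}

static class MarkerDemo
{
    static void Main()
    {
        // Type.Namespace gives the prefix directly, with no string replacement needed.
        string prefix = typeof(MyApp.Resources.Scripts.MarkerClass).Namespace;
        Console.WriteLine(prefix); // MyApp.Resources.Scripts
    }
}
```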

A late answer here as well. I googled before attempting this on my own, but eventually I had to.
Here's the solution I came up with:
public string ProcessFolderDash(string path)
{
string[] segments = path.Split('/');
// Replace dashes in every segment except the last one, which is the file name.
for (int i = 0; i < segments.Length - 1; i++)
{
segments[i] = segments[i].Replace("-", "_");
}
return String.Join("/", segments);
}

Remove characters after specific character in string, then remove substring?

I feel kind of dumb posting this when this seems kind of simple and there are tons of questions on strings/characters/regex, but I couldn't find quite what I needed (except in another language: Remove All Text After Certain Point).
I've got the following code:
[Test]
public void stringManipulation()
{
String filename = "testpage.aspx";
String currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";
String fullUrlWithoutQueryString = currentFullUrl.Replace("?.*", "");
String urlWithoutPageName = fullUrlWithoutQueryString.Remove(fullUrlWithoutQueryString.Length - filename.Length);
String expected = "http://localhost:2000/somefolder/myrep/";
String actual = urlWithoutPageName;
Assert.AreEqual(expected, actual);
}
I tried the solution in the question above (hoping the syntax would be the same!) but nope. I want to first remove the queryString which could be any variable length, then remove the page name, which again could be any length.
How can I remove the query string from the full URL such that this test passes?

For string manipulation, if you just want to kill everything after the ?, you can do this:
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.IndexOf("?");
if (index >= 0)
input = input.Substring(0, index);
Edit: If you want to remove everything after the last slash, do something like:
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.LastIndexOf("/");
if (index >= 0)
input = input.Substring(0, index); // or index + 1 to keep slash
Alternatively, since you're working with a URL, you can let the Uri class do the work, like this:
System.Uri uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");
string fixedUri = uri.AbsoluteUri.Replace(uri.Query, string.Empty);
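One caveat with the Replace approach: String.Replace throws an ArgumentException when its first argument is an empty string, which is what uri.Query returns for a URL without a query string. A sketch of an alternative that avoids this, using Uri.GetLeftPart; the second half, which resolves "." against the URL to get the containing folder, is an addition for the page-name part of the question rather than something from the answer above:

```csharp
using System;

static class UriDemo
{
    static void Main()
    {
        var uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");

        // Scheme, authority and path, without the query string or fragment.
        string withoutQuery = uri.GetLeftPart(UriPartial.Path);
        Console.WriteLine(withoutQuery); // http://www.somesite.com/what/test.aspx

        // Resolving "." against the URL drops the last segment (the page name).
        string folder = new Uri(uri, ".").AbsoluteUri;
        Console.WriteLine(folder);
    }
}
```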

To remove everything before the first /
input = input.Substring(input.IndexOf("/"));
To remove everything after the first /
input = input.Substring(0, input.IndexOf("/") + 1);
To remove everything before the last /
input = input.Substring(input.LastIndexOf("/"));
To remove everything after the last /
input = input.Substring(0, input.LastIndexOf("/") + 1);
An even simpler solution for removing characters after a specified char is to use the String.Remove() method as follows:
To remove everything after the first /
input = input.Remove(input.IndexOf("/") + 1);
To remove everything after the last /
input = input.Remove(input.LastIndexOf("/") + 1);
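The one-liners above assume the separator is present. When it is not, IndexOf and LastIndexOf return -1, so Substring(-1) throws an ArgumentOutOfRangeException, while the "+ 1" variants quietly return an empty string. A guarded version might look like the sketch below (the helper name KeepThroughLast is made up for illustration):

```csharp
using System;

static class StringTrimDemo
{
    // Returns the text up to and including the last occurrence of separator,
    // or the input unchanged when the separator is absent.
    public static string KeepThroughLast(string input, char separator)
    {
        int index = input.LastIndexOf(separator);
        return index >= 0 ? input.Substring(0, index + 1) : input;
    }

    static void Main()
    {
        Console.WriteLine(KeepThroughLast("http://localhost:2000/somefolder/myrep/test.aspx", '/'));
        // http://localhost:2000/somefolder/myrep/

        Console.WriteLine(KeepThroughLast("no-slash-here", '/'));
        // no-slash-here
    }
}
```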

Here's another simple solution. The following code will return everything before the '|' character:
if (path.Contains('|'))
path = path.Split('|')[0];
In fact, you could have as many separators as you want, but assuming you only have one separation character, here is how you would get everything after the '|':
if (path.Contains('|'))
path = path.Split('|')[1];
(All I changed in the second piece of code was the index of the array.)

The Uri class is generally your best bet for manipulating URLs.

To remove everything before a specific char, use the following:
string1 = string1.Substring(string1.IndexOf('$') + 1);
This removes everything up to and including the $ char. If you instead want to remove everything after the character, take the substring that starts at index 0 and ends at the character's position, i.e. string1.Substring(0, string1.IndexOf('$')).
But for a URL, I would use the built-in .NET class to take care of that.

Request.QueryString lets you read the parameters and values included in the URL.
Example:
string http = "http://dave.com/customers.aspx?customername=dave";
string customername = Request.QueryString["customername"];
So the customername variable should be equal to dave.
regards

I second Hightechrider: there is a specialized Uri class already built for you.
I must also point out, however, that PHP's replaceAll uses regular expressions for the search pattern, which you can do in .NET as well: look at the Regex class.
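To make the regex route concrete: String.Replace treats its first argument as literal text, which is why Replace("?.*", "") in the question never matches anything. A minimal sketch of the same idea with Regex.Replace, using the URL from the question:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexQueryDemo
{
    static void Main()
    {
        string url = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";

        // \? matches a literal question mark; .* consumes the rest of the string.
        string withoutQuery = Regex.Replace(url, @"\?.*$", "");
        Console.WriteLine(withoutQuery); // http://localhost:2000/somefolder/myrep/test.aspx
    }
}
```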

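You can use .NET's built-in method to remove a value from the query string:
Request.QueryString.Remove("whatever");
Here "whatever" is the name of the query string parameter you want to remove. Note that in ASP.NET the Request.QueryString collection itself is read-only, so make the call on a writable copy, for example one returned by HttpUtility.ParseQueryString.
I hope this helps.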

You can use this extension method to remove query parameters (everything after the ?) in a string
public static string RemoveQueryParameters(this string str)
{
int index = str.IndexOf("?");
return index >= 0 ? str.Substring(0, index) : str;
}


Regex to remove this string in C#? - c#

The following seems to work for me, and it will let you remove all the newlines, tabs, etc.: (?:\n|\t|\r|.){1,3}.*\#sc.*'

There you go: #schedule_uid=N'[\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}' Created and tested using http://regexpal.com/
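For completeness, here is how a pattern along these lines could be applied from C#. This is a sketch under a few assumptions: the parameter marker is accepted as either @ (as in real T-SQL) or # (as it appears in the listing), the preceding comma and whitespace are consumed with ,\s* rather than a fixed \r\n\t\t, and the GUID is matched case-insensitively. The last point may be what the attempts in the question were missing, since [A-F0-9] only matches uppercase hex digits while the GUID in the script is lowercase.

```csharp
using System;
using System.Text.RegularExpressions;

static class ScheduleUidDemo
{
    // ,\s*   the preceding comma plus any spaces, newlines and tabs
    // [@#]   parameter marker (@ in T-SQL; # as it appears in the listing)
    // GUID   8-4-4-4-12 hex digits, matched case-insensitively
    private static readonly Regex ScheduleUid = new Regex(
        @",\s*[@#]schedule_uid=N'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}'",
        RegexOptions.IgnoreCase);

    public static string RemoveScheduleUid(string script)
    {
        return ScheduleUid.Replace(script, "");
    }

    static void Main()
    {
        string script = "@active_end_time=235959, \r\n\t\t@schedule_uid=N'a70709af-bce7-4c65-a4cd-7574acd31ca2'";
        Console.WriteLine(RemoveScheduleUid(script)); // @active_end_time=235959
    }
}
```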
