Really simple question: I just want to represent a double quote (") without needing to write "" or \".
cases that I'm aware of:
var s = @"123 "" 456 """;
var s="123 \" 456 \"";
It'd make a reasonable difference if I could remove this noise somehow. The reason is that the escape character \ and the double quote both have meaning in a domain-specific language (DSL) that we're using. Sometimes it's convenient to throw some syntax inline into a C# string.
What I'd like is a way to tell .NET not to touch it. Perhaps some kind of catch-all via the DLR?
Within a C# literal, there's nothing you can do - don't forget this is all done at compile-time.
If you don't use single quotes, you could always do:
var s = "123 ' 456 '".Replace("'", "\"");
(Or choose some other character you don't use much, and replace that afterwards instead.)
Other than that, avoiding storing lots of data in your source code helps a lot with this sort of thing - for test data, I often use an embedded resource and load that in at execution time.
I don't suppose you could just read them in from a file or database?
Yeah, there's definitely a way to do that, and I use it all the time for exactly that reason.
You create a string resource collection (open Project Properties, Resources, make sure it's on Strings) and put your literal strings in there. Then, when you need one of those strings, use the Properties.Resources.{insert string resource name} reference to collect it in a pure and unadulterated form!
For completeness, I'll mention that you can use hex in a C# string, so in this case, \x0022. Note that you can omit the leading 0's if the character immediately following isn't hex.
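A quick sketch of the escape spellings side by side. The class and method names are made up for illustration; only the escape sequences themselves come from the language.

```csharp
using System;

public static class QuoteEscapes
{
    // The usual backslash escape: 123 " 456 "
    public static string Backslash() { return "123 \" 456 \""; }

    // \x takes one to four hex digits, so the short form is only safe when the
    // next character is not a hex digit (here: a space, then the end of the literal).
    public static string HexEscape() { return "123 \x22 456 \x22"; }

    // \u always takes exactly four hex digits, so it has no such ambiguity.
    public static string UnicodeEscape() { return "123 \u0022 456 \u0022"; }
}
```

All three methods return the same string at run time. None of them is any less noisy than \", so this is mostly of academic interest.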
Those nasty single quotes that love to cause havoc in MySQL seem to have cousins!!! We have a system where users will do job updates from clients, either pasting in content from emails or copying and pasting from almost anywhere, and every time we cater for one single quote, another pops up. Here are the different ones : ’ ´ ' ` <--
As my regex is pretty weak, my fellow developer said I should just remove them using:
return Regex.Replace(oldText, @"[’´'`""]", @"");
I don't like this; it's unprofessional, removing all single quotes. What I want to do, as many forums suggest, is just double up the single quotes. Would this be correct?
return Regex.Replace(oldText, @"[’´'`""]", @"''");
This is done, though, because in his DAL he constructs his insert statements with single quotes:
sql.Append(",`to_complete_by`='" + obj.toCompleteBy + "'");
Would I be able to avoid this error by changing ^^ to this?
sql.Append(",`to_complete_by`=\"" + obj.toCompleteBy + "\"");
Or is regex replace the preferred method?
If you're working with an old version of MySQL that doesn't support prepared statements, you should at least take advantage of the fact that modern database interfaces can emulate such features if they aren't supported by the database itself. In your case it's not so much the potential performance gain as the added security through separation of statement syntax and value representations.
Let's say I have this string:
"param1,r:1234,p:myparameters=1,2,3"
...and I would like to split it into:
param1
r:1234
p:myparameters=1,2,3
I've used the split function and of course it splits it at every comma. Is there a way to do this using regex or will I have to write my own split function?
Personally, I would try something like this:
,(?=[^,]+:.*?)
Basically, use a positive look-ahead to find a comma followed by a "key-value" pair (defined here as a key, a colon, and more data, which may include other commas). This should disqualify the commas between the numbers, too.
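In C# that pattern can be handed straight to Regex.Split. A minimal sketch, with a hypothetical class name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    // Split only on commas that are followed by "key:", i.e. by one or more
    // non-comma characters and then a colon. Commas inside "1,2,3" are not
    // followed by a colon before the next comma, so they are left alone.
    public static string[] Split(string input)
    {
        return Regex.Split(input, @",(?=[^,]+:.*?)");
    }
}
```

For the sample input, KeyValueSplitter.Split("param1,r:1234,p:myparameters=1,2,3") gives param1, r:1234 and p:myparameters=1,2,3. It relies on every item after the first having a key and a colon, and it will split in the wrong place if a value itself contains something like ",x:".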
You can use ; to separate the values, which makes the string easy to work with.
Since , is used both as the separator and inside the values, it is difficult to split.
You have
string str = "param1,r:1234,p:myparameters=1,2,3";
Recommended to use
string str = "param1;r:1234;p:myparameters=1,2,3";
which can be split as
var strArray = str.Split(';');
strArray[0]; // contains param1
strArray[1]; // r:1234
strArray[2]; // p:myparameters=1,2,3
I'm not sure how you would write a split that knew which commas to split on there, honestly.
Unless it's a fixed number each time, in which case just use the String.Split overload that takes an int specifying the maximum number of substrings to return.
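A small sketch of that overload, assuming the number of items really is fixed at three (the class name is just for illustration):

```csharp
using System;

public static class FixedCountSplitter
{
    // Split(char[], int) stops splitting once it has produced maxParts
    // substrings; the last substring keeps the rest of the input, commas and all.
    public static string[] Split(string input, int maxParts)
    {
        return input.Split(new[] { ',' }, maxParts);
    }
}
```

FixedCountSplitter.Split("param1,r:1234,p:myparameters=1,2,3", 3) leaves p:myparameters=1,2,3 intact as the third element. It only helps when the comma-containing item is the last one.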
If you're going to have comma-delimited data that's not always a fixed number of items and it could have literal commas in the data itself, they really should be quoted. If you can control the input in any way, you should encourage that, and use an actual CSV parser instead of String.Split.
That depends. You can't parse it with regex (or anything else) unless you can identify a consistent rule separating one group from another. Based on your sample, I can't clearly identify such a rule (though I have some guesses). How does the system know that p:myparameters=1,2,3 is a single item? For example, if there were another item after it, what would be the difference between that and the 1,2,3? Figure that out and you'll be pretty close to a solution.
If you're able to change the format of the input string, why not decide on a consistent delimiter between your groups? ; would be a good choice. Use an input like param1;r:1234;p:myparameters=1,2,3 and there will be no ambiguity where the groups are, plus you can just split on ; and you won't need regex.
The simplest approach would be changing your delimiter from "," to something like "|". Then you can split on "|" no problem. However if you can't change the delimiting character then maybe you could encode the sections in a fashion similar to CSV.
CSV files have the same issue... the standard there is to put double quotes "" around columns.
For example, your string would be "param1","r:1234","p:myparameters=1,2,3".
Then you could use the Microsoft.VisualBasic.FileIO.TextFieldParser to split/parse. You can use this from C# even though it's in the VisualBasic namespace.
TextFieldParser
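Roughly, parsing one quoted line with it could look like this. It is a sketch: the wrapper class is hypothetical, and on .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvLine
{
    // Parse a single comma-delimited line whose fields may be wrapped in
    // double quotes; commas inside quotes stay part of the field.
    public static string[] Parse(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Feeding it the quoted string from above returns three fields with the quotes stripped, the last being p:myparameters=1,2,3.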
Do you mean this: string[] str = System.Text.RegularExpressions.Regex.Split("param1,r:1234,p:myparameters=1,2,3", @"\,");
I want to build a comma-separated list so that I can split on the comma later to get an array of the values. However, the values may have commas in them. In fact, they may have any normal keyboard character in them (they are supplied by a user). What is a good strategy for determining a character you are sure will not collide with the values?
In case this matters in a language-dependent way, I am building the "some character" separated list in C# and sending it to a browser to be split in JavaScript.
If JavaScript is consuming the list, why not send it in the form of a JavaScript array? It already has an established and reliable method for representing a list and escaping characters.
["Value 1", "Value 2", "Escaped \"Quotes\"", "Escaped \\ Backslash"]
You could split it by a null character, and terminate your list with a double null character.
I always use | but if you still think that it can contain it, you can use combinations like #|#. For example:
"string one#|#string two#|#...#|#last string"
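On the C# side, building and splitting such a list might look like the sketch below (the class name is illustrative). In JavaScript the matching call would be str.split("#|#").

```csharp
using System;

public static class TokenDelimiter
{
    private const string Separator = "#|#";

    // Join the values with the multi-character separator.
    public static string Join(string[] values)
    {
        return string.Join(Separator, values);
    }

    // Split on the whole "#|#" sequence, not on its individual characters.
    public static string[] Split(string joined)
    {
        return joined.Split(new[] { Separator }, StringSplitOptions.None);
    }
}
```

This only makes a collision less likely. A value that happens to contain #|# will still break the round trip.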
Eric S. Raymond wrote a book chapter on this that you might find useful. It is directed toward Unix users but should still apply.
As for your question, if you will have commas within cells, then you will need some form of escaping. Using \, is a standard way, but you will also have to escape slashes, which are also common.
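A minimal sketch of that kind of escaping, assuming backslash as the escape character; the class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class EscapedList
{
    // Escape backslashes first, then commas, so the two never get confused.
    public static string Join(IEnumerable<string> values)
    {
        return string.Join(",",
            values.Select(v => v.Replace("\\", "\\\\").Replace(",", "\\,")));
    }

    // Walk the string once: a backslash means "take the next character
    // literally", an unescaped comma ends the current value.
    public static List<string> Split(string joined)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < joined.Length; i++)
        {
            char c = joined[i];
            if (c == '\\' && i + 1 < joined.Length)
            {
                current.Append(joined[++i]);
            }
            else if (c == ',')
            {
                result.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        result.Add(current.ToString());
        return result;
    }
}
```

Note that an empty list and a list holding one empty string both join to the empty string, so the receiver cannot tell them apart.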
Alternatively, use another character such as the pipe (|), tab, or something else of your choice. If users need to work with the data using a spreadsheet program, you can usually add filter rules to split cells on the delimiter of your choice. If this is a concern, it's probably best to choose a delimiter that users can easily type, which excludes the nul char, among others.
You could also use quoting:
"value1", "value2", "etc"
In which case, you will only need to escape quotes (and slashes). This should also be accepted by spreadsheets given the correct filter options.
There are several ways to do this. The first is to select a separator character that would not normally be input from the keyboard; NULL or TAB are normally good. The second is to use a character sequence as a separator. Excel CSV files are a good example, where the cell values are delimited by quotes with commas separating the cells.
The answer is dependent on whether you want to reinvent the wheel or not.
If there is potential for any splitting character to appear in your strings, then I would suggest that you write a script element to your output with a JavaScript array definition in it. For example:
<script>
var myVars=new Array();
myVars[0]="abc|#123$";
myVars[1]="123*456";
myVars[2]="blah|blah";
</script>
Your JavaScript can then reference that array.
Doing this also avoids the need to create a comma-separated string from your C# string array.
The only gotcha I can think of is strings that contain quotes; in this case you would have to escape them in C# when writing them out to the myVars output.
There is an RFC which documents the CSV format. Follow the standards and you will avoid reinventing the wheel and creating a mess for the next guy to come along and maintain your code. The nice thing is that there are libraries available to import/export CSV for just about any platform you can imagine.
That said, if you are serialising data to send to a browser, JSON is really the way to go. It too is documented in an RFC, and you can get libraries for just about any platform, such as JSON.NET.
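As a rough illustration, here is the same idea with System.Text.Json, which is an assumption on my part (it ships with .NET Core 3.0 and later). With JSON.NET the equivalent call would be JsonConvert.SerializeObject. The wrapper class is hypothetical.

```csharp
using System.Text.Json;

public static class JsonList
{
    // Serialize the values as a JSON array; the serializer takes care of
    // escaping quotes, backslashes and anything else that needs it.
    public static string ToJson(string[] values)
    {
        return JsonSerializer.Serialize(values);
    }

    // The reverse direction, useful for checking the round trip in C#.
    public static string[] FromJson(string json)
    {
        return JsonSerializer.Deserialize<string[]>(json);
    }
}
```

In the browser, JSON.parse on the resulting text gives back an array of the original strings, so no delimiter needs to be chosen at all. The exact escape form in the output (for example \u0022 versus \") depends on the serializer settings.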
I'd like to String.Split() the following string using a comma as the delimiter:
John,Smith,123 Main Street,212-555-1212
The above content is entered by a user. If they enter a comma in their address, the resulting string would cause problems to String.Split() since you now have 5 fields instead of 4:
John,Smith,123 Main Street, Apt 101,212-555-1212
I can use String.Replace() on all user input to replace commas with something else, and then use String.Replace() again to convert things back to commas:
value = value.Replace(",", "*");
However, this can still be fooled if a user happens to use the placeholder delimiter "*" in their input. Then you'd end up with extra commas and no asterisks in the result.
I see solutions online for dealing with escaped delimiters, but I haven't found a solution for this seemingly common situation. What am I missing?
EDIT: This is called delimiter collision.
This is a common scenario — you have some arbitrary string values that you would like to compose into a structure, which is itself a string, but without allowing the values to interfere with the delimiters in structure around them.
You have several options:
Input restriction: If it is acceptable for your scenario, the simplest solution is to restrict the use of delimiters in the values. In your specific case, this means disallow commas.
Encoding: If input restriction is not appropriate, the next easiest option would be to encode the entire input value. Choose an encoding that does not have delimiters in its range of possible outputs (e.g. Base64 does not feature commas in its encoded output)
Escaping delimiters: A slightly more complex option is to come up with a convention for escaping delimiters. If you're working with something mainstream like CSV it is likely that the problem of escaping is already solved, and there's a standard library that you can use. If not, then it will take some thought to come up with a complete escaping system, and implement it.
If you have the flexibility to not use CSV for your data representation this would open up a host of other options. (e.g. Consider the way in which parameterised SQL queries sidestep the complexity of input escaping by storing the parameter values separately from the query string.)
This may not be an option for you, but would it not be easier to use a very uncommon character, say a pipe |, as your delimiter and not allow this character to be entered in the first place?
If this is CSV, the address should be surrounded by quotes. CSV parsers are widely available that take this into account when parsing the text.
John,Smith,"123 Main Street, Apt. 6",212-555-1212
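Producing a line like that is straightforward. A sketch of the usual CSV quoting convention (wrap the field in double quotes and double any embedded quotes), with hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvLineWriter
{
    // Quote a field only if it contains a comma, a quote or a line break;
    // embedded double quotes are doubled.
    public static string QuoteField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Join already-separate fields into one CSV line.
    public static string JoinLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(QuoteField));
    }
}
```

Reading the line back should be left to a CSV parser rather than String.Split, since only a parser knows to ignore the commas inside quotes.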
One foolproof solution would be to convert the user input to base64 and then delimit with a comma. It will mean that you will have to convert back after parsing.
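Something along these lines, assuming UTF-8 for the text encoding (the class name is just for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class Base64Fields
{
    // Base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so a comma
    // can never appear inside an encoded field.
    public static string Join(IEnumerable<string> values)
    {
        return string.Join(",",
            values.Select(v => Convert.ToBase64String(Encoding.UTF8.GetBytes(v))));
    }

    // Split on the comma first, then decode each piece.
    public static string[] Split(string joined)
    {
        return joined.Split(',')
            .Select(p => Encoding.UTF8.GetString(Convert.FromBase64String(p)))
            .ToArray();
    }
}
```

The price is that the stored string is no longer human-readable and grows by about a third.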
You could try putting quotes, or some other begin and end delimiters, around each of the user inputs, and ignore any special character between a set of quotes.
This really comes down to a situation of cleansing user inputs. You should only allow desired characters in the user input and reject/strip invalid inputs from the user. This way you could use your asterisk delimiter.
The best solution is to define the valid characters and reject invalid ones somehow, then use an invalid character (which will not appear in the input, since it is "banned") as your delimiter.
Don't allow the user to enter the character you are using as a delimiter. I personally feel this is the best way.
Funny solution (works if the address is the only field with a comma):
Split the string by comma. The first two pieces will be the first name and last name; the last piece is the telephone number. Take those away, then join the rest back together with commas: that would be the address ;)
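In code, that trick could be sketched like this, assuming the four-field layout from the question (the class name is made up):

```csharp
using System;
using System.Linq;

public static class ContactLine
{
    // Returns { firstName, lastName, address, phone }. Everything between
    // the first two pieces and the last piece is glued back into the address.
    public static string[] Parse(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length < 4)
        {
            throw new FormatException("Expected at least 4 comma-separated pieces.");
        }
        string address = string.Join(",", parts.Skip(2).Take(parts.Length - 3));
        return new[] { parts[0], parts[1], address, parts[parts.Length - 1] };
    }
}
```

For John,Smith,123 Main Street, Apt 101,212-555-1212 the address comes back as "123 Main Street, Apt 101". A comma in any other field silently shifts everything.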
In a sense, the user is already "escaping" the comma with the space afterward.
So, try this:
string[] values = Regex.Split(value, ",(?![ ])");
The user can still break this if they don't put a space, and there is a more foolproof method (using the standard CSV method of quoting values that contain commas), but this will do the trick for the use case you've presented.
One more solution: provide an "Address 2" field, which is where things like apartment numbers would traditionally go. Users can still break it if they are lazy, though what they'll actually break is the fields after Address 2.
Politely remind your users that properly-formed street addresses in the United States and Canada should NEVER contain any punctuation whatsoever, perhaps?
The process of automatically converting corrupted data into useful data is non-trivial without heuristic logic. You could try to outsource the parsing by calling a third-party address-formatting library to apply the USPS formatting rules.
Even USPS requires the user to perform much of the work, by having components of the address entered into distinct fields on their address "canonicalizer" page (http://zip4.usps.com/zip4/welcome.jsp).
I have a really long string. I would like to add a linefeed every 80 characters. Is there a regular expression replacement pattern I can use to insert "\r\n" every 80 characters? I am using C# if that matters.
I would like to avoid using a loop.
I don't need to worry about being in the middle of a word. I just want to insert a linefeed exactly every 80 characters.
I don't know the exact C# names, but it should be something like
Regex.Replace(str, "(.{80})", "$1\r\n");
The idea is to grab 80 characters and save it in a group, then put it back in (I think "$1" is the right syntax) along with the "\r\n".
(Edit: The original regex had a + in it, which you definitely don't want. That would completely eliminate everything except the last line and any leftover pieces--a decidedly suboptimal result.)
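Spelled out in C#, with the width as a parameter so it is easy to try on a short string (the helper name is hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreaker
{
    // Capture each run of exactly `width` characters and write it back
    // followed by "\r\n". Leftover characters at the end are untouched.
    public static string InsertBreaks(string text, int width)
    {
        return Regex.Replace(text, "(.{" + width + "})", "$1\r\n");
    }
}
```

LineBreaker.InsertBreaks("abcdefghij", 4) yields "abcd\r\nefgh\r\nij". Two things to be aware of: a string whose length is an exact multiple of the width ends up with a trailing line break, and since . does not match \n by default, existing newlines in the text restart the count.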
Note that this way, you will most likely split inside words, so it might look pretty ugly.
You should be looking more into word wrapping if this is indeed supposed to be readable text. A little googling turned up a couple of functions; or if this is a text box, you can just turn on the WordWrap property.
Also, check out the .NET page at regular-expressions.info. It's by far the best reference site for regexes that I know of. (Jan Goyvaerts is on SO, but nobody told me to say that.)