Replace broken characters - c#

I have a small program that replaces strings containing umlauts, apostrophes, etc.
But sometimes I have broken strings that contain, for example, A¶ for ü, A¼ (or ü) for ö, and so on.
Is there a way to fix these strings?
I tried adding more replace statements:
str = str.Replace("A¶", "ü");
str = str.Replace("A¼", "ö");
str = str.Replace("ü", "ö");
But this does not work for me.

It looks like the match fails because these are non-ASCII characters. You will probably have to use Regex.Replace and reference the Unicode values of the characters in your regex; see "How can you strip non-ASCII characters from a string? (in C#)".
Unicode/UTF8 reference: http://www.utf8-chartable.de/
Complete Unicode character set: http://www.unicode.org/charts/
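As a rough sketch of the "reference the Unicode value" idea: the snippet below spells the broken sequences with \u escapes, so the encoding of the source file cannot change what is being searched for. It assumes the damage comes from UTF-8 text that was decoded as ISO-8859-1, in which case ö typically shows up as U+00C3 U+00B6 and ü as U+00C3 U+00BC. The class and method names are made up for illustration. The second method reverses that mis-decoding in one step instead of listing every pair; it will not help if the text was damaged in some other way (for example via Windows-1252, which differs from ISO-8859-1 in the 0x80-0x9F range).

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Replaces two known broken sequences, written as Unicode escapes so the
    // search strings do not depend on how this source file is encoded.
    public static string ReplaceKnownSequences(string input)
    {
        return input
            .Replace("\u00C3\u00B6", "\u00F6")  // U+00C3 U+00B6 -> o-umlaut
            .Replace("\u00C3\u00BC", "\u00FC"); // U+00C3 U+00BC -> u-umlaut
    }

    // Assumption: the text was UTF-8 that got decoded as ISO-8859-1.
    // Encoding back to the original bytes and decoding as UTF-8 undoes that.
    public static string RedecodeLatin1AsUtf8(string input)
    {
        byte[] originalBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(input);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string broken = "sch\u00C3\u00B6n und gr\u00C3\u00BCn";
        Console.WriteLine(ReplaceKnownSequences(broken));
        Console.WriteLine(RedecodeLatin1AsUtf8(broken));
    }
}
```

If the broken strings keep appearing, it is usually worth finding the place where the text is read with the wrong encoding rather than patching the result afterwards.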

Related

When multi-line text pasted into text input regex does not match the space

When user pastes something like this (from notepad for example):
multi
line@email.com
into an input text box, the line break disappears and it looks like this:
multi
line@email.com
But whatever the line break is converted to does not match this regex:
'\s|\t|\r|\n|\0','i'
so this invalid character passes through js validation to the .NET application code I am working on.
Interestingly, this text editor does the same transformation, which is why I had to post the original sample as code. I would like to find out what the line break got converted to, so I can add a literal to the regex, but I don't know how. Many thanks!
Here is the whole snippet:
var invalidChars = new RegExp('(^[.])|[<]|[>]|[(]|[)]|[\]|[,]|[;]|[:]|([.])[.]|\s|\t|\r|\n|\0', 'i');
if (text.match(invalidChars)) {
    return false;
}
Your immediate problem is escaping. You're using a string literal to create the regex, like this:
'(^[.])|[<]|[>]|[(]|[)]|[\]|[,]|[;]|[:]|([.])[.]|\s|\t|\r|\n|\0'
But before it ever reaches the RegExp constructor, the string literal is processed: [\] becomes []; \s becomes s; and \0, \t, \r and \n are converted to the characters they represent (NUL, tab, carriage return and linefeed, respectively). That won't happen if you use a regex literal instead, but you still have to escape the backslash to match a literal backslash.
Your regex also has way more brackets than it needs. I think this is what you were trying for:
/^\.|\.\.|[<>()\\,;:\s]/
That matches a dot at the beginning, two consecutive dots, or one of several forbidden characters including any whitespace character (\s matches any whitespace character, not just a space).
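Since the value is also handled by .NET code, the same check could be repeated on the server. The following is only a sketch under that assumption; the class and method names are invented here. It reuses the pattern above inside a verbatim string, which plays the same role as the JavaScript regex literal: backslashes reach the regex engine unchanged. Note that the .NET and JavaScript definitions of \s are not identical, although both cover tab, carriage return and linefeed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidation
{
    // Same pattern as the regex literal above. The verbatim string (@"...")
    // passes every backslash through to the regex engine unchanged.
    private static readonly Regex InvalidChars =
        new Regex(@"^\.|\.\.|[<>()\\,;:\s]");

    // Returns true when the text contains none of the forbidden constructs.
    public static bool IsValid(string text)
    {
        return !InvalidChars.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("user@example.com"));        // True
        Console.WriteLine(IsValid("multi\r\nline@email.com")); // False
    }
}
```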
OK, here it is:
vbCrLf
This is what pasted line breaks are converted to. I added a (vbCrLf) group and those spaces are now detected. Thanks, Dan1M
http://forums.asp.net/t/1183613.aspx?Multiline+Textbox+Input+not+showing+line+breaks+in+Repeater+Control

Characters are not escaped properly in a Dictionary

I have a string such as this:
Hello[00]
And I want to replace the [00] with 00 (I don't want to do it through deleting the [] because that won't be useful for me later). I want a direct replace from [00] to 00. To do so, I have the following code:
var conversionRegex = new Regex(string.Join("|", conversion.Keys));
var textConverted = conversionRegex.Replace(allLines, n => conversion[n.Value]);
"conversion" is a Dictionary<string, string>, and one of its entries is this one:
{@"\[00\]","00"}
According to my knowledge and experience, that should work properly, but it doesn't. It throws an exception: the key can't be found in the dictionary. However, when the exception is thrown, the debugger says that "n.Value" equals "[00]". So it should be found in the dictionary, because it's there!
I have more elements in this Dictionary, but the only ones that are throwing exceptions are the ones with characters that should be escaped. Somehow they are not escaped properly...
Any ideas on this? Thank you very much!
I think you are confusing escaping for regex with escaping for C# string literals. Square brackets ([]) have no special meaning in C# string literals and thus do not need to be escaped. However, they do have special meaning in regex so they do need to be escaped in the regex string if you wish to match those chars. Your key is properly escaped for regex but that means your C# string literal contains literal backslash chars.
Here is how C# interprets the following string literals:
"[00]" is a 4-char string containing the chars [00].
"\[00\]" is invalid C# due to invalid \[ and \] C# string literal escape sequences. It will not compile.
@"\[00\]" is a 6-char string containing the chars \[00\]. This is the proper format for escaping for regex but it's important to recognize that the backslashes are part of the C# string literal and not C# escape sequences. This will not match "[00]" because they are different strings.
"\\[00\\]" is the same as the previous. Instead of using @, it uses the C# \\ escape sequence which emits a literal backslash char.
When you use @"\[00\]" as a dictionary key, your dictionary key includes those backslash chars. Therefore, your dictionary does not contain the key "[00]".
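The difference between the literals can be checked directly. This small sketch (names chosen for illustration only) prints the lengths and shows that a dictionary keyed by the regex-escaped form does not contain the text the regex actually matches:

```csharp
using System;
using System.Collections.Generic;

public static class LiteralDemo
{
    public static void Main()
    {
        string plain = "[00]";        // 4 chars: [ 0 0 ]
        string verbatim = @"\[00\]";  // 6 chars: \ [ 0 0 \ ]
        string escaped = "\\[00\\]";  // the same 6 chars, written with C# \\ escapes

        Console.WriteLine(plain.Length);        // 4
        Console.WriteLine(verbatim.Length);     // 6
        Console.WriteLine(verbatim == escaped); // True

        var conversion = new Dictionary<string, string> { { verbatim, "00" } };
        // The regex matches the text "[00]", which is not the key that was stored.
        Console.WriteLine(conversion.ContainsKey("[00]")); // False
    }
}
```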
There are a few different ways you could rewrite your code to accomplish what you are trying to do. Here's an easy way: use the strings without regex escaping as the dictionary keys, then use Regex.Escape to escape them when generating the regex string.
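using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Keys are the plain strings to find, with no regex escaping.
var conversion = new Dictionary<string, string> {
    { @"[00]", "00" }
};
var allLines = "Hello[00]\r\nWorld[00]";
// Regex.Escape turns each key into a pattern that matches it literally.
var conversionRegex = new Regex(string.Join("|", conversion.Keys.Select(key => Regex.Escape(key))));
var textConverted = conversionRegex.Replace(allLines, n => conversion[n.Value]);
Console.WriteLine(textConverted);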

Regex in between characters

I'm trying to create a regex that will match ASCII characters in a string so that they can be hex-converted afterwards. The string is received as follows: <<<441234567895,ASCII,4,54657379>>>, so I am looking to match everything between the third comma and the >>> characters at the end of the string, like so:
<<<441234567895,ASCII,4,54657379>>>
So far I have managed to create this regex (/([^,]*,[^,]*)*([^;]*)>>>/), but the third comma is picked up as well, which I don't want. What do I need to do to remove it from the match?
thanks Callum
(?<=,)[^,]+(?=>>>)
This should do it. See demo:
https://regex101.com/r/sJ9gM7/79
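For reference, here is how that pattern might be used from C#. This is a sketch, not the asker's code: the class and method names are invented, and the hex decoding step assumes the payload is an even number of valid hex digits representing ASCII bytes, which is only a guess at what "converted with hex" means.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Returns the run of non-comma characters that follows a comma and
    // precedes >>>, or an empty string when nothing matches.
    public static string ExtractPayload(string input)
    {
        return Regex.Match(input, @"(?<=,)[^,]+(?=>>>)").Value;
    }

    // Decodes pairs of hex digits into ASCII characters.
    // Assumes an even number of valid hex digits.
    public static string HexToAscii(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        string payload = ExtractPayload("<<<441234567895,ASCII,4,54657379>>>");
        Console.WriteLine(payload);             // 54657379
        Console.WriteLine(HexToAscii(payload)); // Tesy
    }
}
```

The lookbehind and lookahead keep the comma and the >>> out of the match, so no capture group is needed.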
Do you need to use Regex?
string input = "<<<441234567895,ASCII,4,54657379>>>";
string match = input.Substring(3, input.Length - 6).Split(',')[3];
You can also use further splits on the beginning and ending padding strings or check their lengths if you want something safer than the Substring magic.
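A more defensive variant of the Substring/Split approach could look like the sketch below. The method name and the rule that a frame has exactly four comma-separated fields are assumptions made for this example; adjust them to the real message format.

```csharp
using System;

public static class FrameParser
{
    // Tries to read the fourth comma-separated field of a <<<...>>> frame.
    // Returns false when the markers are missing or the field count is not four.
    public static bool TryGetPayload(string input, out string payload)
    {
        payload = string.Empty;
        const string prefix = "<<<";
        const string suffix = ">>>";

        if (input.Length < prefix.Length + suffix.Length
            || !input.StartsWith(prefix, StringComparison.Ordinal)
            || !input.EndsWith(suffix, StringComparison.Ordinal))
        {
            return false;
        }

        string body = input.Substring(prefix.Length, input.Length - prefix.Length - suffix.Length);
        string[] fields = body.Split(',');
        if (fields.Length != 4)
        {
            return false;
        }

        payload = fields[3];
        return true;
    }

    public static void Main()
    {
        if (TryGetPayload("<<<441234567895,ASCII,4,54657379>>>", out string payload))
        {
            Console.WriteLine(payload); // 54657379
        }
    }
}
```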

Escape double quotes in a string

Double quotes can be escaped like this:
string test = @"He said to me, ""Hello World"". How are you?";
But this involves adding an extra " character to the string. Is there a C# function or other method to escape double quotes so that the string does not have to be changed?
No.
Either use verbatim string literals as you have, or escape the " using backslash.
string test = "He said to me, \"Hello World\" . How are you?";
The string has not changed in either case - there is a single escaped " in it. This is just a way to tell C# that the character is part of the string and not a string terminator.
You can use backslash either way:
string str = "He said to me, \"Hello World\". How are you?";
It prints:
He said to me, "Hello World". How are you?
which is exactly the same that is printed with:
string str = @"He said to me, ""Hello World"". How are you?";
" is still part of your string.
You can check Jon Skeet's Strings in C# and .NET article for more information.
In C# you can use the backslash to put special characters to your string.
For example, to put ", you need to write \".
There are a lot of characters that you write using the backslash:
Backslash with other characters
\0 nul character
\a Bell (alert)
\b Backspace
\f Formfeed
\n New line
\r Carriage return
\t Horizontal tab
\v Vertical tab
\' Single quotation mark
\" Double quotation mark
\\ Backslash
Any character substitution by numbers:
\xh to \xhhhh, or \uhhhh - Unicode character in hexadecimal notation (\x takes one to four hex digits, \u takes exactly 4)
\Uhhhhhhhh - Unicode code point (8 hex digits; code points above U+FFFF are stored as a surrogate pair, i.e. 2 chars)
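A few of these escapes can be checked with a short program. This is only an illustrative sketch; the class name is made up. The last line shows why the variable length of \x can be surprising.

```csharp
using System;

public static class EscapeDemo
{
    public static void Main()
    {
        Console.WriteLine("\x41" == "A");       // True: \x followed by two hex digits
        Console.WriteLine("\u0041" == "A");     // True: \u always takes four hex digits
        Console.WriteLine("\U0001F600".Length); // 2: stored as a surrogate pair in UTF-16
        Console.WriteLine("\"".Length);         // 1: the backslash is not part of the value
        Console.WriteLine("a\tb".Length);       // 3: a, tab, b

        // \x consumes up to four hex digits, so this is the single char U+41BC,
        // not "A" followed by "BC".
        Console.WriteLine("\x41BC".Length);     // 1
    }
}
```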
Another thing worth mentioning: from C# 6, interpolated strings can be used along with @.
Example:
string helloWorld = @"""Hello World""";
string test = $"He said to me, {helloWorld}. How are you?";
Or
string helloWorld = "Hello World";
string test = $@"He said to me, ""{helloWorld}"". How are you?";
You're misunderstanding escaping.
The extra " characters are part of the string literal; they are interpreted by the compiler as a single ".
The actual value of your string is still He said to me, "Hello World". How are you?, as you'll see if you print it at runtime.
2022 UPDATE: Previously the answer would have been "no". However, C# 11 introduces a new feature called "raw string literals". To quote the Microsoft documentation:
"Beginning with C# 11, you can use raw string literals to more easily create strings that are multi-line, or use any characters requiring escape sequences. Raw string literals remove the need to ever use escape sequences. You can write the string, including whitespace formatting, how you want it to appear in output."
SOURCE: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#raw-string-literals
EXAMPLE: So using the original example, you could do this (note that raw string literals always begin with three or more quotation marks):
string testSingleLine = """He said to me, "Hello World". How are you?""";
string testMultiLine = """
He said to me, "Hello World". How are you?
""";
Please explain your problem. You say:
But this involves adding character " to the string.
What problem is that? You can't type string foo = "Foo"bar"";, because that causes a compile error. As for the adding part, in terms of string size that is not true:
@"""".Length == 1
"\"".Length == 1
In C# 11.0 preview you can use raw string literals.
Raw string literals are a new format for string literals. Raw string literals can contain arbitrary text, including whitespace, new lines, embedded quotes, and other special characters without requiring escape sequences. A raw string literal starts with at least three double-quote (""") characters. It ends with the same number of double-quote characters. Typically, a raw string literal uses three double quotes on a single line to start the string, and three double quotes on a separate line to end the string.
string test = """He said to me, "Hello World" . How are you?""";
In C#, there are at least four ways to embed a quote within a string:
Escape quote with a backslash
Precede string with @ and use double quotes
Use the corresponding ASCII character
Use the hexadecimal Unicode character
Please refer to this document for a detailed explanation.
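The four approaches listed above can be compared side by side. The sketch below (class name invented for the example) builds the same sentence each way and confirms that the resulting strings are identical:

```csharp
using System;

public static class QuoteDemo
{
    public static void Main()
    {
        string a = "He said, \"Hello World\".";                              // backslash escape
        string b = @"He said, ""Hello World"".";                             // verbatim string, doubled quote
        string c = "He said, " + (char)34 + "Hello World" + (char)34 + "."; // character code 34
        string d = "He said, \u0022Hello World\u0022.";                      // Unicode escape

        Console.WriteLine(a);
        Console.WriteLine(a == b && b == c && c == d); // True
    }
}
```

Whichever form is used, the escaping exists only in the source code; the string value itself contains a single " character at each position.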

Something like Python's triple-quote in F# (or C#)?

I want to assign some XML to a string variable.
In Python I can do this without escaping single or double quotes by using triple quotes.
Is there a similar way to do this in F# or C#?
F# 3.0 supports triple quoted strings. See Visual Studio F# Team Blog Post on 3.0 features.
The F# 3.0 Spec Strings and Characters section specifically mentions the XML scenario:
A triple-quoted string is specified by using three quotation marks
(""") to ensure that a string that includes one or more escaped
strings is interpreted verbatim. For example, a triple-quoted string
can be used to embed XML blobs:
As far as I know, there is no syntax corresponding to this in C# / F#. If you use @"str" then you have to replace each quote with two quotes, and if you just use "str" then you need to add a backslash.
In any case, there is some encoding of ":
var str = @"allows
multiline, but still need to encode "" as two chars";
var str = "need to use backslash \" here";
However, the best thing to do when you need to embed large strings (such as XML data) into your application is probably to use .NET resources (or store the data somewhere else, depending on your application). Embedding large string literals in a program is generally not recommended. Also, there used to be a plugin for pasting XML as a tree that constructs XElement objects for C#, but I'm not sure whether it still exists.
Although, I would personally vote to add """ as known from Python to F# - it is very useful, especially for interactive scripting.
In case someone ran into this question when looking for triple-quoted strings in C# (rather than F#), C# 11 now has raw string literals and they're (IMO) better than Python's (due to how indentation is handled)!
Raw string literals are a new format for string literals. Raw string literals can contain arbitrary text, including whitespace, new lines, embedded quotes, and other special characters without requiring escape sequences. A raw string literal starts with at least three double-quote (""") characters. It ends with the same number of double-quote characters. Typically, a raw string literal uses three double quotes on a single line to start the string, and three double quotes on a separate line to end the string. The newlines following the opening quote and preceding the closing quote are not included in the final content:
string longMessage = """
    This is a long message.
    It has several lines.
        Some are indented
                more than others.
    Some should start at the first column.
    Some have "quoted text" in them.
    """;
Any whitespace to the left of the closing double quotes will be removed from the string literal. Raw string literals can be combined with string interpolation to include braces in the output text. Multiple $ characters denote how many consecutive braces start and end the interpolation:
var location = $$"""
   You are at {{{Longitude}}, {{Latitude}}}
   """;
The preceding example specifies that two braces start and end an interpolation. The third repeated opening and closing brace are included in the output string.
https://devblogs.microsoft.com/dotnet/csharp-11-preview-updates/#raw-string-literals
https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-11
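Applied to the XML scenario from the question, a raw string literal might be used as in the sketch below. It requires a C# 11 compiler; the XML content, the class name and the method names are invented for the example.

```csharp
using System;

public static class RawStringDemo
{
    public static string BuildXml()
    {
        // Whitespace to the left of the closing """ is stripped from every line.
        return """
            <person name="Ada" role='admin'>
              <note>No escaping needed for " or '</note>
            </person>
            """;
    }

    public static string Describe(int longitude, int latitude)
    {
        // With $$, {{...}} marks an interpolation and a lone { or } is literal text.
        return $$"""
            You are at {{{longitude}}, {{latitude}}}
            """;
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
        Console.WriteLine(Describe(12, 41)); // You are at {12, 41}
    }
}
```

Because the closing quotes set the left margin, the literal can be indented to match the surrounding code without that indentation ending up in the string.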
As shoosh said, you want to use verbatim string literals in C#, where the string starts with @ and is enclosed in double quotation marks. The only exception is if you need to put a double quotation mark in the string, in which case you need to double it:
System.Console.WriteLine(@"Hello ""big"" world");
would output
Hello "big" world
http://msdn.microsoft.com/en-us/library/362314fe.aspx
In C# the syntax is @"some string".
