How to decode special characters (unreadable blank squares) in C#?

I have a problem that I can't manage to resolve: I need to replace unreadable characters. (I can't paste one here because it isn't rendered, but it shows up as a blank square in the Visual C# debugger.)
When those characters are inserted into the SQL database they are replaced by a ?, which I don't want. I tried a simple Replace on the string, but Visual C# makes it impossible to paste such characters.

You can't. When you see the red squares, you've already lost the relevant character info and you can't convert them back.

You can replace any character as long as you know its Unicode character code:
s = s.Replace('\u0080', ' ');
You can use a regular expression to replace any character outside of a set of allowed characters:
s = Regex.Replace(s, @"[^0-9A-Za-z]", " ");
Another alternative is to use a unicode data type in the database, so that it can handle any character that you have in your string.
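As a rough sketch of the two replacement approaches above: the class and method names below (TextCleaner, ReplaceControlChars, KeepAlphanumerics) are hypothetical, and the first method assumes the unreadable characters are control characters such as U+0080, which may not hold for your data.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TextCleaner
{
    // Replaces every control character except tab, CR and LF with a space.
    // Assumption: the "blank squares" are control characters (e.g. U+0080).
    public static string ReplaceControlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool keep = !char.IsControl(c) || c == '\t' || c == '\r' || c == '\n';
            sb.Append(keep ? c : ' ');
        }
        return sb.ToString();
    }

    // Whitelist approach: anything outside 0-9, A-Z, a-z becomes a space.
    public static string KeepAlphanumerics(string input)
    {
        return Regex.Replace(input, @"[^0-9A-Za-z]", " ");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceControlChars("abc\u0080def")); // abc def
        Console.WriteLine(KeepAlphanumerics("a-b_c"));          // a b c
    }
}
```

The whitelist version is very strict (it also removes punctuation and accented letters), so in practice the allowed set would probably need widening.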

Related

How to prevent ’ character being entered into a textbox?

I am trying to prevent the ’ character from being entered into a textbox.
The problem is if you copy and paste the ’ character it becomes a right single quotation, which causes an error in the sql when trying to save the textbox to the database.
You can see this in the Unicode lookup.
If you manually enter in ' then it is an apostrophe and this character is accepted.
The problem is people are copying and pasting this character from emails into the textbox and it gets converted to the single right quote.
Regex regex = new Regex(@"^[a-zA-Z0-9]*$");
Match match = regex.Match(txtName.Text);
if (!match.Success)
{
DisplayMsg("Name contains invalid character");
}
Is there a way to still allow the user to enter an apostrophe but prevent the single right quotation?
Rather than preventing users from entering (especially if they are copying and pasting), you'd be better off replacing the character yourself if you really need to get rid of it:
txtName.Text = txtName.Text
.Replace((char) 0x2018, '\'') // ‘ (&lsquo;)
.Replace((char) 0x2019, '\'') // ’ (&rsquo;)
.Replace((char) 0x201C, '"') // “ (&ldquo;)
.Replace((char) 0x201D, '"'); // ” (&rdquo;)
This way, you won't get in the way of your users and you'll still remove this character.
However, it does sound like you might be building up queries using string concatenation, and this is a more serious problem!
Is it possible to just encode the strings using HttpUtility.HtmlEncode("A string with lot's of q'o'u't'e's that should work f'i'n'e'")? That results in A string with lot&#39;s of q&#39;o&#39;u&#39;t&#39;e&#39;s that should work f&#39;i&#39;n&#39;e&#39;...
Since I can't comment on anyone else's answers other than my own, I have to do it like this.
Building off of the solution from @Richard, it could be made a little more readable for others who will have to follow after you.
string someString = "This has a left quote: ‘\nThis has a right quote: ’";
string sanitized = someString.Replace(HttpUtility.HtmlDecode("&lsquo;"), "'")
.Replace(HttpUtility.HtmlDecode("&rsquo;"), "'");
Console.WriteLine(sanitized);
Edit: Misunderstood the original intent.

How to make wordwrap consider a whitespace in a string as a regular character?

I want to create a multiline textbox control that will be used for an application as "Snap to grid" when all of the characters are Monospaced, and the fitting font size was found (for example - for a textbox with 6 columns, exactly 6 characters should be entered). - Of course with Word Wrap!
As for the above - it's OK. The correct way to calculate the font size was found.
The only issue is that I need to create an "Alt+Enter" option to represent "Enter".
For some reason, I can't use \r\n; instead, the rest of the line needs to be filled with spaces.
The problem is that word wrap doesn't keep a run of spaces that is longer than the textbox's width.
For example:
If I write in a textbox with width=8cells, the following (or paste that string etc..):
"Hello World!" ("Hello+6Whitespaces+World!")
I would like to receive: (_=whitespace)
First Line: Hello___
Second Line: ___World
Third Line: !
What really happens is:
First Line: Hello___
Second Line: World!
(No spaces at all in the beginning of the second line)
PS - On debug I can see that the spaces (all of them) are considered as a part of the string.
Firstly, try Environment.NewLine instead of \r\n. That will ensure you get the correct line breaks on your target platform.
Secondly, Word Wrap is likely ignoring the extra spaces, so you need to replace all normal spaces with the Unicode non-breaking space, \u00A0:
string spaceReplacer = "\u00A0";
And to use it, try this:
textBox1.Text = textBox1.Text.Replace(" ", spaceReplacer);
Or even this:
textBox1.Text = textBox1.Text.Replace(" ", "\u00A0");
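To tie this back to the "Alt+Enter" requirement, here is a minimal sketch of filling the rest of a line with non-breaking spaces. GridText, ProtectSpaces and FillToLineEnd are hypothetical names, and the code assumes that every character occupies exactly one cell and that a line that is already full needs no padding.

```csharp
using System;

public static class GridText
{
    private const char Nbsp = '\u00A0';

    // Replaces normal spaces with non-breaking spaces so that word wrap
    // treats them like any other character (as suggested above).
    public static string ProtectSpaces(string text)
    {
        return text.Replace(' ', Nbsp);
    }

    // Pads the text with non-breaking spaces up to the next multiple of
    // `columns`, emulating "Enter" without inserting \r\n.
    public static string FillToLineEnd(string text, int columns)
    {
        if (columns <= 0) throw new ArgumentOutOfRangeException(nameof(columns));
        int remainder = text.Length % columns;
        int padding = remainder == 0 ? 0 : columns - remainder;
        return text + new string(Nbsp, padding);
    }

    public static void Main()
    {
        string line = FillToLineEnd(ProtectSpaces("Hello"), 8);
        Console.WriteLine(line.Length); // 8
    }
}
```

Whether the control then wraps exactly at the column boundary still depends on the font-size calculation described in the question.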

C# Reading a file add special characters

I am reading a text file with this structure:
20150218;"C7";"B895";00101;"FTBCCAL16"
I read the line and split like this:
System.IO.StreamReader fichero = new System.IO.StreamReader(ruta, Encoding.Default);
while ((linea = fichero.ReadLine()) != null)
{
// Split by ";"
String[] separador = linea.Split(';');
}
But when I see the content of "linea", I have this:
"20150218";\"C7\";\"B895\";"00101";\"FTBCCAL16\"
As you can see, the StreamReader seems to add some special characters to the output, like "" and \. I want to obtain this:
20150218;"C7";"B895";00101;"FTBCCAL16"
Is there a way to obtain this?
Thanks in advance! Regards!
You are looking at it in the Visual Studio debugger, which just displays strings this way. Write the result to the console or to a file and you will see the normal text without the extra characters.
StreamReader is not adding or modifying the strings read from the file at all.
If you are viewing the contents of separador in the Visual Studio debugger, it will add an escape sequence to any special characters (for display purposes).
The displayed format matches how you would have to enter them in the code editor if you were creating a string constant.
For example, a double quote inside a string is displayed as \".
However, the real contents of these strings (in memory) are not escaped. They are exactly as you expect them to be in your question.
If you output them or try to manipulate them in code they will have the correct contents.
So, your code is correct. You just have to understand escape sequences and how strings appear in the Visual Studio debugger.
Update:
See this question for an explanation of how to display unquoted strings in the debugger.
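A small sketch can confirm this outside the debugger. The line is hard-coded here instead of being read from the file, and DebuggerVsContent and SplitLine are hypothetical names.

```csharp
using System;

public static class DebuggerVsContent
{
    public static string[] SplitLine(string linea)
    {
        return linea.Split(';');
    }

    public static void Main()
    {
        // The literal needs \" escapes in source code, but the string in
        // memory contains plain double quotes and no backslashes.
        string linea = "20150218;\"C7\";\"B895\";00101;\"FTBCCAL16\"";
        string[] separador = SplitLine(linea);

        Console.WriteLine(linea);                  // 20150218;"C7";"B895";00101;"FTBCCAL16"
        Console.WriteLine(separador[1]);           // "C7"
        Console.WriteLine(separador[1].Length);    // 4
        Console.WriteLine(separador[1].Trim('"')); // C7
    }
}
```

The length of 4 shows that the field holds only the two quotes plus C7; the backslashes exist only in the debugger's display.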
Okay here is the quotation from MSDN
At compile time, verbatim strings are converted to ordinary strings with all the same escape sequences. Therefore, if you view a verbatim string in the debugger watch window, you will see the escape characters that were added by the compiler, not the verbatim version from your source code. For example, the verbatim string @"C:\files.txt" will appear in the watch window as "C:\\files.txt".
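In your case the debugger shows each " as \" (the escaped form used in a regular string literal), and that is what you see at debugging time.
Why does this happen?
Inside a regular string literal, the double quotation mark has to be written as the escape sequence \".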
Escape sequences are typically used to specify actions such as carriage returns and tab movements on terminals and printers. They are also used to provide literal representations of nonprinting characters and characters that usually have special meanings, such as the double quotation mark (")
So when a string contains such a character, the debugger displays it with its escape sequence, the way the compiler expects it in a regular string literal. That is what you see in the debugger.

How to merge or inject the "@" character in a string including escape characters without defining the string variable from scratch in C#

Hi, I have 2 related questions.
1) Suppose we have:
string strMessage="\nHellow\n\nWorld";
Console.WriteLine(strMessage);
Result is:
Hellow
World
Now if we want to show the string in its original format on one line,
we must redefine the first variable from scratch.
string strOrignelMessage=@"\nHellow\n\nWorld" ;
Console.WriteLine(strOrignelMessage);
Result is:
\nHellow\n\nWorld --------------------->and everything is ok.
I am wondering whether there is a way to avoid defining
the new variable (strOrignelMessage) in code for this purpose, and instead use only
the first string variable (strMessage), apply some trick, and print it on one line.
At first I tried the following workaround, but it has a bug. Suppose we have:
string strMessage="a\aa\nbb\nc\rccc";
string strOrigenalMessage=strMessage.Replace("\n","\\n").Replace("\r","\\r");
Console.WriteLine(strOrigenalMessage);
result is :aa\nbb\nc\rccc
Notice that the first "\" (in "\a") is not printed. So my second question is:
2) How can we fix this new problem with the single "\" in the string?
I hope I have titled this issue correctly and that my explanation is sufficient. Thanks.
No, because the compiler has already converted all of your escaped characters in the original string to the characters they represent. After the fact, it is too late to convert them to non-special characters. You can do a search and replace, converting '\n' to literally @"\n", but that is whacky and you're better off defining the string correctly in the first place. If you wanted to escape the backslashes in the first place, why not put an extra backslash character in front of each of them:
Instead of "\n" use "\\n".
Updated in response to your comment:
If the string is coming from user input, you don't need to escape the backslash, because it will be stored as a backslash in the input string. The escape character only works as an escape character in string literals in code (and not preceded by @, which makes them verbatim string literals).
If you want "\n\n\a\a\r\blah" to print as \n\n\a\a\r\blah without @, just replace all \ with \\
\ is the escaper in a non-verbatim string. So you simply need to escape the escaper, as it were.
If you want to use both strings, but want to have only one in the code, then write the string with @, and construct the other one with Replace(@"\n","\n").
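If the string really has to be converted back at run time, the search-and-replace idea can be extended to the other escape characters. The following is only a sketch: EscapeHelper and ToLiteral are hypothetical names, and only the escape sequences listed in the switch are handled.

```csharp
using System;
using System.Text;

public static class EscapeHelper
{
    // Turns special characters back into their C# escape sequences so the
    // string prints on one line, e.g. a real newline becomes backslash + n.
    public static string ToLiteral(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                case '\t': sb.Append(@"\t"); break;
                case '\a': sb.Append(@"\a"); break;
                case '\b': sb.Append(@"\b"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string strMessage = "a\aa\nbb\nc\rccc";
        Console.WriteLine(ToLiteral(strMessage)); // a\aa\nbb\nc\rccc
    }
}
```

Unlike the two chained Replace calls in the question, this also covers \a, which is why the first backslash no longer goes missing.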
An explanation for Anthony Pegram (if I understand you right) and anyone else who finds it useful:
I think I have found my answer to question 2.
At first, unfortunately, I thought that the
escape characters were limited to \n, \t, \r and \v, and
this confused me, because in my sample string I used \a and \b
and the compiler's behaviour was not understandable to me.
But finally I found that \a and \b are in the
escape-character set too, and that if you use "\" without a valid escape character after it,
a compile-time error is raised (it is funny when I think about my mistake again).
Please refer to this useful MSDN article for more info:
2.4.4.5 String literals
You can't replace \ (a single \) with \\,
because fundamentally you can't have a single \ in a string literal without
an escape character after it. So we can't write a string like this in code:
string strTest="abc\pwww"; ------> compile time error
To retrieve a version of a string with its escape characters deactivated,
we can simply use the string.Replace method, as I did before.
Excuse me for the long story, and thank you all for your cooperation.

I need a regular expression to convert US tel number to link

Basically, the input field is just a string. People input their phone number in various formats. I need a regular expression to find and convert those numbers into links.
Input examples:
(201) 555-1212
(201)555-1212
201-555-1212
555-1212
Here's what I want:
(201) 555-1212 - Notice the space is gone
(201)555-1212
201-555-1212
555-1212
I know it should be more robust than just removing spaces, but it is for an internal web site that my employees will be accessing from their iPhone. So, I'm willing to "just get it working."
Here's what I have so far in C# (which should show you how little I know about regular expressions):
strchk = Regex.Replace(strchk, @"\b([\d{3}\-\d{4}|\d{3}\-\d{3}\-\d{4}|\(\d{3}\)\d{3}\-\d{4}])\b", "<a href='tel:$&'>$&</a>", RegexOptions.IgnoreCase);
Can anyone help me by fixing this or suggesting a better way to do this?
EDIT:
Thanks everyone. Here's what I've got so far:
strchk = Regex.Replace(strchk, @"\b(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}|\d{3}[-\.\s]\d{4})\b", "<a href='tel:$1'>$1</a>", RegexOptions.IgnoreCase);
It is picking up just about everything EXCEPT those with (nnn) area codes, with or without spaces between it and the 7 digit number. It does pick up the 7 digit number and link it that way. However, if the area code is specified it doesn't get matched. Any idea what I'm doing wrong?
Second Edit:
Got it working now. All I did was remove the \b from the start of the string.
Remove the [] and add \s* (zero or more whitespace characters) around each \-.
Also, you don't need to escape the -. (You can take out the \ from \-)
Explanation: [abcA-Z] is a character group, which matches a, b, c, or any character between A and Z.
It's not what you're trying to do.
Edits
In response to your updated regex:
Change [-\.\s] to [-\.\s]+ to match one or more of any of those characters (eg, a - with spaces around it)
The problem is that \b doesn't match the boundary between a space and a (.
Afaik, no phone enters the other characters, so why not replace [^0-9] with '' ?
Here's a regex I wrote for finding phone numbers:
(\+?\d[-\.\s]?)?(\(\d{3}\)\s?|\d{3}[-\.\s]?)\d{3}[-\.\s]?\d{4}
It's pretty flexible... allows a variety of formats.
Then, instead of killing yourself trying to replace it w/out spaces using a bunch of back references, instead pass the match to a function and just strip the spaces as you wanted.
C#/.net should have a method that allows a function as the replace argument...
Edit: They call it a MatchEvaluator. That example uses a delegate, but I'm pretty sure you could use the slightly less verbose
(m) => m.Value.Replace(" ", "")
or something. Working from memory here. (Note that it has to be the string overload of Replace, since C# has no empty char literal.)
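Here is a hedged sketch of the MatchEvaluator approach using a lambda. PhoneLinker and Linkify are hypothetical names, the pattern is a simplified one covering only the four input formats from the question, and keeping the original text as the link label while stripping whitespace only from the tel: target is an assumption about the desired output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneLinker
{
    // Optional area code ("(201)" or "201-"), then a 7-digit number.
    // The lookarounds stop the match from starting or ending inside a
    // longer run of digits; \b is avoided because it fails before "(".
    private static readonly Regex Phone = new Regex(
        @"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])?\d{3}[-.\s]\d{4}(?!\d)");

    public static string Linkify(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate.
        return Phone.Replace(input, m =>
        {
            string target = Regex.Replace(m.Value, @"\s", "");
            return "<a href='tel:" + target + "'>" + m.Value + "</a>";
        });
    }

    public static void Main()
    {
        Console.WriteLine(Linkify("Call (201) 555-1212 now"));
        // Call <a href='tel:(201)555-1212'>(201) 555-1212</a> now
    }
}
```

Doing the clean-up inside the evaluator keeps the pattern readable and avoids juggling several back-references in the replacement string.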
