I'm trying to parse some phone numbers, and I have a function to check if the parsed string is made up of only numbers or the + sign.
In some of them there is a hidden character with the value 8236.
Comparing it against '\0' and '\u8236' doesn't work...
What is this character and how do I remove it?
Thanks to @Maximilian Gerhardt, who sent this link in a comment: https://www.fileformat.info/info/unicode/char/202c/index.htm
I was able to figure out that 8236 corresponds to the character '\u202c'.
So I did str.Trim('\u202c')
and it worked.
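A minimal sketch of the fix, assuming the parsed value lives in a string variable (the name phone is hypothetical):

// Assumes: using System; using System.Linq;
// Strip the invisible POP DIRECTIONAL FORMATTING character (U+202C, decimal 8236)
// before validating that the phone number contains only digits or '+'.
string phone = "\u202C+15551234567\u202C";
string cleaned = phone.Trim('\u202C');
bool looksValid = cleaned.All(c => char.IsDigit(c) || c == '+');
Console.WriteLine($"{cleaned} valid: {looksValid}");   // +15551234567 valid: True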
Edit:
The simple way to get the corresponding code is to convert from decimal to hex:
8236 (decimal) -> 202C (hexadecimal)
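You can also check the conversion quickly in C# (a small sketch, not tied to any particular data):

// Assumes: using System;
int code = 8236;
Console.WriteLine(code.ToString("X"));   // 202C
Console.WriteLine((int)'\u202C');        // 8236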
I had the same issue, but with character 8237, which led me to this post.
This corresponds to the character \u202d.
sample data1: value1 sampledata2: value value 2
sample data3: data3 sampledata5:
sampledata4: value value value2
sampledata6: sampledata7: value-value,value
I tried the following regex:
(*keywordsample* *\d+ *:[ ]{0,25})([\w\-\,\.] {0,2})+
I assumed that if there are 25 whitespaces after the "keyword:", then the value is null for that keyword.
Values have at most 2 whitespaces between them. Ex:
value value - valid (1 whitespace between values)
value  value - valid (2 whitespaces between values)
value   value - invalid (3 whitespaces between values)
The following data has values:
sample data1-value1
sampledata2-value value 2
sample data3-data3
sampledata5-null
sampledata4-value value value2
sampledata6-null
sampledata7-value-value,value
However, I think relying on the 25 spaces is not safe.
Is there any other way to implement this?
Will there ever be any other value than whitespace?
You can reverse the logic and check for anything that is not whitespace. Use it in conjunction with the end-of-line anchor and the Multiline option and you should be good to go.
Something like:
keyword\s*([^\s]*)$
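As a rough sketch of that idea (the keyword name below is just one of the samples; adjust to your real keywords):

// Assumes: using System; using System.Text.RegularExpressions;
string input = "sample data3: data3\nsampledata5:";
var m = Regex.Match(input, @"sampledata5:\s*([^\s]*)$", RegexOptions.Multiline);
Console.WriteLine(m.Success ? $"value = '{m.Groups[1].Value}'" : "no match");
// Prints: value = ''   (i.e. a null/empty value for sampledata5)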
Based on comments:
Okay, never mind my previous edit ;) I finally got behind a computer and nailed it (probably needs some minor tweaks to fit your case exactly):
keyword:\s*([^\s:]+\s{0,2}[^\s:]*)\s[^\s]+:
The lesson here is you should look for something that can be used as a delimiter. In this case, there is not much to work with, except the knowledge that there will be another keyword followed by a :
Again, for values at the end of a line use:
keyword:\s*([^\s:]+\s{0,2}[^\s:]*)$
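A quick sketch of how the delimiter-based pattern behaves (the keyword and input line are made up from the samples above):

// Assumes: using System; using System.Text.RegularExpressions;
string line = "sampledata2: value value sampledata4: other";
// Capture up to two space-separated words after the keyword, then require the
// next "keyword:" token as the delimiter, per the idea above.
var m = Regex.Match(line, @"sampledata2:\s*([^\s:]+\s{0,2}[^\s:]*)\s[^\s]+:");
Console.WriteLine(m.Success ? m.Groups[1].Value : "(null)");   // value value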
(?<=keyword {0,1}: *)\b(\w+ {0,2})(?!( {0,2}\w)+:)\b
I finally discovered the answer, thank you for the ideas.
I have some string that contains the following code/value:
"You won\u0092t find a ...."
It looks like that string contains the Right Apostrophe special character.
ref1: Unicode control 0092
ref2: ASCII chart (both 127 + extra extended ascii)
I'm not sure how to display this in the web browser. It keeps displaying the tofu square-box character instead. I'm under the impression that the Unicode (hex) value 0092 can be converted to an HTML entity.
Is my understanding correct?
Update 1:
It was suggested by @sam-axe that I HtmlEncode the string. That didn't work. Here it is...
Note the ampersand got correctly encoded....
It looks like there's an encoding mix-up. In .NET, strings are normally encoded as UTF-16, and a right apostrophe should be represented as \u2019. But in your example, the right apostrophe is represented as \x92, which suggests the original encoding was Windows code page 1252. If you include your string in a Unicode document, the character \x92 won't be interpreted properly.
You can fix the problem by re-encoding your string as UTF-16. To do so, treat the string as an array of bytes, and then convert the bytes back to Unicode using the 1252 code page:
// Requires: using System.Linq; and using System.Text;
string title = "You won\u0092t find a cheaper apartment * Sauna & Spa";
byte[] bytes = title.Select(c => (byte)c).ToArray();    // reinterpret each char as a raw byte
title = Encoding.GetEncoding(1252).GetString(bytes);     // decode those bytes as Windows-1252
// Result: "You won’t find a cheaper apartment * Sauna & Spa"
Note: much of my answer is based on guessing and looking at the decompiled code of System.Web 4.0. The reference source looks very similar (identical?).
You're correct that an HTML numeric entity such as "&#146;" (6 characters) can be displayed in the browser. Your output string, however, contains "\u0092" (1 character). This is a control character, not an HTML entity.
According to the reference code, WebUtility.HtmlEncode() doesn't transform characters between 128 and 160 - all characters in this range are control characters (ampersand is special-cased in the code as are a few other special HTML symbols).
My guess is that because these are control characters, they're output without transformation because transforming it would change the meaning of the string. (I tried running some examples using LinqPad, this character was not rendered.)
If you really want to transform these characters (or remove them), you'll probably have to write your own function before/after calling HtmlEncode() - there may be something that does this already but I don't know of any.
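If you go that route, a rough sketch might look like this (the helper name CleanControlChars and its replacement map are assumptions, not anything built into HtmlEncode()):

// Assumes: using System.Linq; using System.Net;
static string CleanControlChars(string s)
{
    // Map the common Windows-1252 "smart punctuation" control codes to their
    // real Unicode equivalents, then drop anything else left in the 128-159 range.
    s = s.Replace('\u0091', '\u2018')    // left single quote
         .Replace('\u0092', '\u2019')    // right single quote
         .Replace('\u0093', '\u201C')    // left double quote
         .Replace('\u0094', '\u201D');   // right double quote
    return new string(s.Where(c => c < 128 || c > 159).ToArray());
}

string encoded = WebUtility.HtmlEncode(CleanControlChars("You won\u0092t find a ...."));
// The 0x92 control character is now a real ’ (U+2019) before encoding.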
Hope this helps.
Edit: Michael Liu's answer seems correct. I'm leaving my answer here because it may be useful in cases when the input encoding of a string is not known.
I have what I think is an easy problem. For some reason the following code generates the exception, "String must be exactly one character long".
int n = 0;
foreach (char letter in charMsg)
{
// Get the integral value of the character.
int value = Convert.ToInt32(letter);
// Convert the decimal value to a hexadecimal value in string form.
string hexOutput = String.Format("{0:X}", value);
//Console.WriteLine("Hexadecimal value of {0} is {1}", letter, hexOutput);
charMsg[n] = Convert.ToChar(hexOutput);
n++;
}
The exception occurs at the charMsg[n] = Convert.ToChar(hexOutput); line. Why does it happen? When I check the values of charMsg, it seems to contain all of them properly, yet it still throws an error at me.
UPDATE: I've solved this problem, it was my mistake. Sorry for bothering you.
OK, this was a really stupid mistake on my part. Point is, with my problem I'm not even supposed to do this as hex values clearly won't help me in any way.
What I am trying to do is to encrypt a message in an image. I've already encrypted the length of said message in the last digits of each color channel of the first pixel. Now I'm trying to put the message itself in there. I looked here: http://en.wikipedia.org/wiki/ASCII and told myself without thinking that using hex values would be a good idea. Can't believe I thought that.
Convert.ToChar(string s), per the documentation, requires a single-character string; otherwise it throws a FormatException, as you've noted. It is a rough, though more restrictive, equivalent of
public char string2char( string s )
{
return s[0] ;
}
Your code does the following:
Iterates over all the characters in some enumerable collection of characters.
For each such character, it...
Converts the char to an int. Hint: a char is an integral type: it's an unsigned 16-bit integral value.
Converts that value to a string containing a hex representation of the character in question. For most characters, that string will be at least two characters in length: for instance, converting the space character (' ', 0x20) this way will give you the string "20".
You then try to convert that back to a char and replace the current item being iterated over. This is where your exception is thrown. One thing you should note here is that altering a collection being enumerated is likely to cause the enumerator to throw an exception.
What exactly are you trying to accomplish here? For instance, given a charMsg that consists of 3 characters, 'a', 'b' and 'c', what should happen? A clear problem statement helps us to help you.
Since printable Unicode characters can be anywhere in the range from 0x0000 to 0xFFFF, your hexOutput variable can hold more than one character - this is why the error is thrown.
Convert.ToChar(string) always checks the length of the string, and if it is not equal to 1, it throws. So it will not convert the string "30" to the hexadecimal number 0x30 and then to its ASCII representation, the symbol '0'.
Can you elaborate on what you are trying to achieve?
Your hexOutput is a string, and I'm assuming charMsg is a character array. Suppose the first element in charMsg is 'p', or hex value 70. The documentation for Convert.ToChar(string) says it'll use just the first character of the string ('7'), but it's wrong. It'll throw this error. You can test this with a static example, like charMsg[n] = Convert.ToChar("70");. You'll get the same error.
Are you trying to replace characters with hex values? If so, you might try using a StringBuilder object instead of your array assignments.
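If that's the goal, something along these lines might do it (charMsg and the two-digit hex format are assumptions about your intent):

// Assumes: using System; using System.Text;
char[] charMsg = { 'a', 'b', 'c' };
var sb = new StringBuilder();
foreach (char letter in charMsg)
{
    // Append each character's code point as two hex digits instead of trying
    // to stuff a multi-character hex string back into a single char.
    sb.AppendFormat("{0:X2}", (int)letter);
}
string hexMsg = sb.ToString();   // "616263"
Console.WriteLine(hexMsg);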
Passing an empty string to Convert.ToChar(string) leads to this error as well; in VB.NET you could use CChar() instead.
We have an import service for stores into our database. The latitude and longitude columns are not currently being formatted on import, which has resulted in invalid values being inserted into our db. I therefore need to format any lat/long to have 3 places before the decimal and up to 6 after, which I thought would be Format("###.######"), but it doesn't seem to work. Input values such as 38.921322 or -12.235 do not seem to conform to the formatting provided. Could anyone a bit more experienced in the area of C# string formatting shed light on how to achieve this? Thank you in advance.
Have you tried String.Format("{0:000.000000}", value);?
Console.WriteLine("{0:000.000000}", 123);
Outputs: 123.000000
Use "0" instead of "#". From MSDN:
0 Replaces the zero with the corresponding digit if one is present; otherwise, zero appears in the result string.
# Replaces the "#" symbol with the corresponding digit if one is present; otherwise, no digit appears in the result string.
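Applied to the sample values from the question, a quick sketch:

// Assumes: using System;
double lat = 38.921322;
double lng = -12.235;
Console.WriteLine(lat.ToString("000.000000"));   // 038.921322
Console.WriteLine(lng.ToString("000.000000"));   // -012.235000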
I want an infinity symbol in my string. I used the following code to get the infinity symbol:
char.ConvertFromUtf32(8734)
and it goes into the JSON, but when the JSON is encoded, i.e.
Encoding.ASCII.GetBytes(json)
it converts "∞" to the "?" symbol.
How can I resolve this problem? Please help me.
Thanks.
The infinity sign ∞ is not part of the ASCII character set. So by using Encoding.ASCII.GetBytes() you explicitly exclude it from the string, effectively replacing it with a placeholder, in this case ?.
Since you use the resulting byte array for a JSON reply, you might want to consider using UTF-8 instead of ASCII.
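A minimal sketch of the difference (the json string here is made up for illustration):

// Assumes: using System; using System.Text;
string json = "{\"limit\": \"" + char.ConvertFromUtf32(8734) + "\"}";

byte[] ascii = Encoding.ASCII.GetBytes(json);   // ∞ becomes '?' (0x3F)
byte[] utf8  = Encoding.UTF8.GetBytes(json);    // ∞ survives as the bytes E2 88 9E

Console.WriteLine(Encoding.ASCII.GetString(ascii));   // {"limit": "?"}
Console.WriteLine(Encoding.UTF8.GetString(utf8));     // {"limit": "∞"}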