Unable to Remove \\u0000 From String [duplicate] - c#

This question already has answers here:
Help with \0 terminated strings in C#
(7 answers)
Closed 2 years ago.
I am calling an API that returns a string with the following information: "abc \\u0000\\u0000 fjkdshf". I have tried removing this using the following code but it doesn't seem to work.
string res = str.Replace("\\", string.Empty)
.Replace("u0000", string.Empty)
.Trim();
I have read in a couple of articles that these characters are not actually displayed outside the Visual Studio debugger, so I don't know how to fix this problem. Please help!

You are mistaking the double backslash for actual characters. The debugger displays the string in escaped form, so the first backslash is only there to escape the second one. What you see is the escape sequence \u0000, a single null character (U+0000), not six separate characters. If you want to remove it, simply replace "\u0000".
Here is an example program that prints the UTF-16 code of each character:
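using System;
using System.Linq;

class Program
{
    static void Main()
    {
        string s = "abc \u0000\u0000 fjkdshf";
        // print the UTF-16 code of each character; the two nulls show up as 0
        Console.WriteLine(string.Join(" ", s.Select(x => Convert.ToInt32(x))));
        string res = s.Replace("\u0000", string.Empty);
        Console.WriteLine(string.Join(" ", res.Select(x => Convert.ToInt32(x))));
    }
}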
Output:
97 98 99 32 0 0 32 102 106 107 100 115 104 102
97 98 99 32 32 102 106 107 100 115 104 102
As you can see in the output, the zeros are gone after the replacement.
More information about literal string escaping
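To make the distinction concrete, here is a small sketch (C# 9 top-level statements; all variable names are illustrative) that compares the length of the escape sequence with the length of the text the debugger shows, and runs the question's replacement chain next to the direct replacement on the same input:

```csharp
using System;

// "\u0000" is one null character (U+0000); "\\u0000" is six visible characters: \ u 0 0 0 0.
string nullChar = "\u0000";
string escapedText = "\\u0000";
Console.WriteLine(nullChar.Length);    // 1
Console.WriteLine(escapedText.Length); // 6

string fromApi = "abc \u0000\u0000 fjkdshf";

// The question's chain looks for a backslash and the text "u0000"; neither occurs in this string.
string questionAttempt = fromApi.Replace("\\", string.Empty).Replace("u0000", string.Empty).Trim();

// Replacing the actual null character removes both of them.
string cleaned = fromApi.Replace("\u0000", string.Empty);

Console.WriteLine(questionAttempt.Length); // 14
Console.WriteLine(cleaned.Length);         // 12
```

Because the question's chain never finds a backslash or the text "u0000", the string keeps all 14 characters; only replacing the real null character shortens it.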

Related

NetworkStream.read() reads all bytes but won't convert to string

I am trying to read the server's response when attempting to log on, using NetworkStream.Read() in the following code:
if (connectionStream.DataAvailable && connectionStream.CanRead)
{
    byte[] myReadBuffer = new byte[64];
    string responseMessage = string.Empty;
    int numberOfBytesRead = 0;
    do
    {
        numberOfBytesRead = connectionStream.Read(myReadBuffer, 0, myReadBuffer.Length);
        // append each chunk; plain assignment would keep only the last one
        responseMessage += Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead);
    } while (connectionStream.DataAvailable);
    Debug.Log("Message:" + responseMessage);
    // breakpoint here
    if (responseMessage.Contains("OK"))
    {
        Debug.Log("logon successful");
    }
    else
    {
        Debug.LogError("Logon denied!");
    }
}
By inspecting my local variables at the breakpoint, I know that Read() is executed without problems: numberOfBytesRead is set to 32, and myReadBuffer is filled with 32 bytes (all bytes in myReadBuffer match the bytes sent by the server). However, after extracting the string from myReadBuffer using Encoding.ASCII.GetString(), the string is still empty (Visual Studio also says it is empty at the breakpoint), even though myReadBuffer isn't.
The bytes in myReadBuffer read:
32 0 0 0
1 0 0 0
0 0 0 0
76 79 71 79
78 58 32 48
59 79 75 59
32 83 83 61
54 66 67 0
which translates to (_ marks a space or non-printable byte): _ _ _ _ _ _ _ _ _ _ _ _ L O G O N : _ 0 ; O K ; _ S S = 6 B C _
Any suggestions as to what can cause this?
The response from the server contains some null ('\0') characters. Despite what the docs on Strings (C#) say about null termination in C#:
There is no null-terminating character at the end of a C# string; therefore a C# string can contain any number of embedded null characters ('\0').
Unity does not seem to comply with this, and effectively terminates a string once a null character has been encountered, though I couldn't find any reference to this in the Unity docs.
The fix I ended up going with was replacing the null characters with spaces (I could also remove them completely, but I want to know the characters were there at some point), like so:
responseMessage = responseMessage.Replace('\x0', '\x0020');
While creating this post I figured it out, but couldn't find any other posts on SO describing my problem, so I'm answering it myself for future reference. If anyone has other or better solutions, or additional information, I'd still be glad to hear about them and accept them.
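A quick way to check whether the characters are really missing from the string, or only hidden when it is displayed, is to decode the exact bytes from the question and look at the string's length. The sketch below is a plain .NET console program (C# 9 top-level statements) with no network I/O; it does not reproduce Unity's console, so how Unity displays the result is an assumption to verify there:

```csharp
using System;
using System.Text;

// The 32 bytes listed in the question, copied verbatim (no NetworkStream involved).
byte[] buffer =
{
    32, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 76, 79, 71, 79,
    78, 58, 32, 48, 59, 79, 75, 59, 32, 83, 83, 61, 54, 66, 67, 0
};

string responseMessage = Encoding.ASCII.GetString(buffer, 0, buffer.Length);

// The decoded string keeps every character, including the embedded nulls.
Console.WriteLine(responseMessage.Length);         // 32
Console.WriteLine(responseMessage.Contains("OK")); // True

// Replacing each null with a space, as in the self-answer, leaves nothing
// that a null-terminated display could stop at.
string printable = responseMessage.Replace('\0', ' ');
Console.WriteLine(printable);
```

If the length is 32 and Contains("OK") is true, the data is intact and only the display is cut short at the first '\0'.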

C# string.format add a "-" value?

I have a string.format issue ...
I'm trying to pass my invoice ID as an argument to my program, and the 6th argument always ends up with a "-" in front of it no matter what I do (we must use the ¿ separator because of an old program).
public static void OpenIdInvoice(string wdlName, string IdInvoice, Form sender){
MessageBox.Show(string.Format("¿{0}",IdInvoice));
proc.Arguments = string.Format("{0}¿{1}¿{2}¿{3}¿{4}¿­{5}",
session.SessionId.ToString(),
Session.GetCurrentDatabaseName(),
session.Librairie,
wdlName,
"",
IdInvoice
);
System.Windows.Forms.MessageBox.Show(proc.Arguments);
In the end, a "-" is always added to my formatted result, but only before my IdInvoice (so Id 10 ends up as -10 in my Arguments).
Now the fun part: I hardcoded some strings, and...
If I pass -1 instead of an Id, I get --1 as a result, and if I write "banana", I get "-banana".
I know I could just build the string otherwise ... but I'm getting curious as to why it happens.
EDIT:
That's the copy/paste of my code:
var proc = new System.Diagnostics.ProcessStartInfo("Achat.exe");
System.Windows.Forms.MessageBox.Show(string.Format("¿{0}",args));
proc.Arguments = string.Format(@"{0}¿{1}¿{2}¿{3}¿{4}¿­{5}¿{6}",
"12346", //session.SessionId.ToString(),
"fake DB",//Session.GetCurrentDatabaseName().ToString(),
"false", //session.Librairie.ToString(),
"myScreenName", //wdl.ToString(),
"123456",
"Banana",
"123456"
//args.ToString(),
);
System.Windows.Forms.MessageBox.Show(proc.Arguments);
System.Windows.Forms.MessageBox.Show(args);
And that's the copy/paste of my Text Visualizer result:
12346¿fake DB¿false¿myScreenName¿123456¿­Banana¿123456
You literally have an extra character before "{5}", called a soft hyphen (U+00AD, decimal 173). It's one of those characters that isn't always displayed. If you place your cursor after the "{" in "{5}", press the left arrow and then Backspace, it will actually delete it. Alternatively, you can use an editor like Notepad++ that displays it. I was able to find it by running the following code:
var t = @"{0}¿{1}¿{2}¿{3}¿{4}¿­{5}";
foreach (var c in t)
{
    Console.WriteLine((int)c + " " + c);
}
which printed out:
123 {
48 0
125 }
191 ¿
123 {
49 1
125 }
191 ¿
123 {
50 2
125 }
191 ¿
123 {
51 3
125 }
191 ¿
123 {
52 4
125 }
191 ¿
173 -
123 {
53 5
125 }
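The same check can be narrowed to just the suspicious characters and combined with a fix. This sketch (C# 9 top-level statements; the argument values are the hardcoded ones from the question's edit) writes the soft hyphen as the escape \u00AD so it is visible in the source, reports every character outside printable ASCII, and strips U+00AD before formatting:

```csharp
using System;

// The answer's format string, with the invisible soft hyphen written as \u00AD.
string format = "{0}¿{1}¿{2}¿{3}¿{4}¿\u00AD{5}";

// Report only the characters outside printable ASCII (here: each ¿ and the soft hyphen).
for (int i = 0; i < format.Length; i++)
{
    char c = format[i];
    if (c < 32 || c > 126)
    {
        Console.WriteLine($"index {i}: {(int)c} (U+{(int)c:X4})");
    }
}

// Removing U+00AD before formatting gets rid of the stray "-" in front of {5}.
string cleanedFormat = format.Replace("\u00AD", string.Empty);
string arguments = string.Format(cleanedFormat,
    "12346", "fake DB", "false", "myScreenName", "123456", "Banana");
Console.WriteLine(arguments);
```

Deleting the character in the editor, as described above, is the real fix; a Replace call like this is only a guard in case such characters arrive from somewhere else.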

parse hex incoming / create outgoing strings

I've modified an example to send & receive from serial, and that works fine.
The device I'm connecting to has three commands I need to work with.
My experience is with C.
MAP - returns a list of field_names, (decimal) values & (hex) addresses
I can keep track of which values are returned as decimal or hex.
Each line is terminated with CR
:: Example:
MEMBERS:10 - number of (decimal) member names
NAME_LENGTH:15 - (decimal) length of each name string
NAME_BASE:0A34 - 10 c-strings of (15) characters each starting at address (0x0A34) (may have junk following each null terminator)
etc.
GET hexaddr hexbytecount - returns a list of 2-char hex values starting from (hexaddr).
The returned bytes are a mix of bytes/ints/longs and null-terminated C strings; the response is terminated with CR.
:: Example:
get 0a34 10 -- will return
0A34< 54 65 73 74 20 4D 65 20 4F 75 74 00 40 D3 23 0B
This happens to be 'Test Me Out'(00) followed by junk
etc.
PUT hexaddr hexbytevalue {{value...} {value...}} sends multiple hex byte values separated by spaces starting at hex address, terminated by CR/LF
These bytes are a mix of bytes/ints/longs and null-terminated C strings.
:: Example:
put 0a34 50 75 73 68 - (ascii Push)
Will replace the first 4-chars at 0x0A34 to become 'Push Me Out'
SAVED OK
See my previous answer about serial handling, which might be useful: Serial Port Polling and Data handling
To convert your response to actual text:
using System;
using System.Linq;

var s = "0A34 < 54 65 73 74 20 4D 65 20 4F 75 74 00 40 D3 23 0B";
var hex = s.Substring(s.IndexOf("<") + 1).Trim().Split(new char[] {' '});
var numbers = hex.Select(h => Convert.ToInt32(h, 16)).ToList();
var toText = String.Join("",numbers.TakeWhile(n => n!=0)
.Select(n => Char.ConvertFromUtf32(n)).ToArray());
Console.WriteLine(toText);
which:
skips through the string until after the < character, then splits the rest into hex strings;
then converts each hex string into an int (base 16);
then takes each number until it finds a 0 and converts it to text (treating it as a Unicode code point, via Char.ConvertFromUtf32);
then joins all the converted strings together to recreate the original text.
Alternatively, more condensed:
var hex = s.Substring(s.IndexOf("<") + 1).Trim().Split(new char[] {' '});
var bytes = hex.Select(h => (byte) Convert.ToInt32(h, 16)).TakeWhile(n => n != 0);
var toText = Encoding.ASCII.GetString(bytes.ToArray());
For converting a number to hex:
Console.WriteLine(123.ToString("X"));  // 7B
Console.WriteLine(123.ToString("X4")); // 007B
Console.WriteLine(123.ToString("X8")); // 0000007B
Console.WriteLine(123.ToString("x4")); // 007b
Working with hex data is also well documented at https://msdn.microsoft.com/en-us/library/bb311038.aspx
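Going the other way, the "X" format strings are enough to build an outgoing PUT line. A sketch, assuming the device accepts uppercase hex digits (the question's example uses lowercase, so switch to "x2"/"x4" if it is case-sensitive) and the CR/LF terminator described in the question; BuildPutCommand is a hypothetical helper name:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: formats "put <hexaddr> <hexbyte> <hexbyte> ..." followed by CR/LF,
// following the PUT description in the question.
static string BuildPutCommand(int address, byte[] data)
{
    string hexBytes = string.Join(" ", data.Select(b => b.ToString("X2")));
    return $"put {address:X4} {hexBytes}\r\n";
}

// ASCII bytes of "Push", matching the question's example (50 75 73 68).
byte[] payload = Encoding.ASCII.GetBytes("Push");
string command = BuildPutCommand(0x0A34, payload);
Console.Write(command); // put 0A34 50 75 73 68
```

Multi-byte ints and longs would need their own byte order decided before formatting; the question does not say which order the device uses, so that part is left out here.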

How to filter hidden characters in a String using C#

I am new to C# and trying to learn how to filter data that I read from a file. I have a file that has data similar to the following:
3 286 858 95.333 0.406 0.427 87.00 348 366 4 b
9 23 207 2.556 0.300 1.00 1.51 62 207 41 a
9 37 333 4.111 0.390 0.811 2.03 130 270 64 a
10 21 210 2.100 0.348 0.757 3.17 73 159 23 a
9 79 711 8.778 0.343 0.899 2.20 244 639 111 a
10 66 660 6.600 0.324 0.780 2.25 214 515 95 a
When I read this data, some values have carriage return or line feed characters hidden in them. Can you please tell me if there is a way to remove them? For example, one of my variables may hold the following value due to a newline character in it:
mystringval = "9
"
I want this mystringval variable to be converted back to
mystringval = "9"
If you want to get rid of all special characters, you can learn regular expressions and use Regex.Replace.
var value = "&*^)#abcd.";
var filtered = System.Text.RegularExpressions.Regex.Replace(value, @"[^\w]", "");
REGEXPLANATION
the @ before the string means that you're using a verbatim string literal, so C# escape sequences don't apply, leaving only the regex escape sequences
[^abc] matches all characters that are not a, b, or c (so they can be replaced with an empty string)
\w is a special regex code that means a letter, number, or underscore
you can also use @"[^A-Za-z0-9\.]", which keeps only letters, numbers and the decimal point. See http://rubular.com/ for more details.
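To see the difference between the two patterns on values like the question's, here is a small sketch (the inputs are made up in the question's format). Note that [^\w] also drops the decimal point, while the second pattern keeps it; both drop CR/LF, and both would also drop spaces, so they suit single values rather than whole lines:

```csharp
using System;
using System.Text.RegularExpressions;

// [^\w] keeps letters, digits and underscore, so the decimal point is removed too.
string withWord = Regex.Replace("95.333\r\n", @"[^\w]", "");

// [^A-Za-z0-9\.] keeps letters, digits and '.', and removes CR/LF and everything else.
string withDecimal = Regex.Replace("95.333\r\n", @"[^A-Za-z0-9\.]", "");
string single = Regex.Replace("9\r\n", @"[^A-Za-z0-9\.]", "");

Console.WriteLine(withWord);    // 95333
Console.WriteLine(withDecimal); // 95.333
Console.WriteLine(single);      // 9
```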
As well as using Regex, you can use LINQ to do something like this:
var goodCharacters = input
.Replace("\r", " ")
.Replace("\n", " ")
.Where(c => char.IsLetterOrDigit(c) || c == ' ' || c == '.')
.ToArray();
var result = new string(goodCharacters).Trim();
The first two Replace calls will guard against having a number at the end of one line and a number at the start of the next, e.g. "123\r\n987" would otherwise be "123987", whereas I assume you want "123 987".
Try my sample here on ideone.com.
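A different approach that may be simpler here: if the values are read line by line, the line terminators never end up in the fields at all, and Trim() removes any leftover \r or \n because it strips whitespace. A sketch using StringReader in place of the file (with a real file, File.ReadLines would play the same role); the sample text is two rows copied from the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Two rows from the question, with Windows line endings; stands in for the file contents.
string fileContents =
    "9 23 207 2.556 0.300 1.00 1.51 62 207 41 a\r\n" +
    "10 21 210 2.100 0.348 0.757 3.17 73 159 23 a\r\n";

var rows = new List<string[]>();
using var reader = new StringReader(fileContents);
string? line;
// ReadLine returns each line without its CR/LF terminator.
while ((line = reader.ReadLine()) != null)
{
    rows.Add(line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
Console.WriteLine($"{rows.Count} rows, {rows[0].Length} fields in the first");

// Trim() strips leading and trailing whitespace, which includes '\r' and '\n'.
string mystringval = "9\r\n".Trim();
Console.WriteLine(mystringval);
```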

adding 48 to string number

I have a string in C# like this:
string only_number;
I assigned it a value = 40
When I check only_number[0], I get 52
When I check only_number[1], I get 48
Why is it adding 48 to the character at each position?
A string is basically a char[], so what you are seeing is the ASCII values of the chars '4' and '0'.
Proof: the difference between 4 and 0 equals the difference between 52 and 48.
Since it is a string, you didn't assign it 40; you assigned it "40".
What you see is the ASCII code of '4' and '0'.
It's not adding 48 to the character. What you see is the character code, and the characters for digits start at 48 in Unicode:
'0' = 48
'1' = 49
'2' = 50
'3' = 51
'4' = 52
'5' = 53
'6' = 54
'7' = 55
'8' = 56
'9' = 57
A string is a sequence of char values, and each char is a 16-bit integer (a UTF-16 code unit), which for characters like these is simply their code point in the Unicode character set.
When you read from only_number[0] you get a char value that is '4', and the character code for that is 52. So what you have done is read a character from the string and then convert it to an integer before displaying it.
So:
string only_number = "40";
char c = only_number[0];
Console.WriteLine(c); // displays 4
int n = (int)only_number[0]; // explicit cast to integer
Console.WriteLine(n); // displays 52
int m = only_number[0]; // no cast needed: char converts implicitly to int
Console.WriteLine(m); // displays 52
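If the goal is the digit values (or the number 40) rather than the character codes, subtracting '0' works because the digit characters are consecutive starting at 48, as in the table above; int.Parse converts the whole string. A short sketch:

```csharp
using System;

string only_number = "40";

// Subtracting '0' (48) maps the characters '0'..'9' to the values 0..9.
int firstDigit = only_number[0] - '0';
int secondDigit = only_number[1] - '0';

// char.GetNumericValue does the same for a digit character, but returns a double.
double viaHelper = char.GetNumericValue(only_number[0]);

// Parsing the whole string gives back the number that was meant.
int value = int.Parse(only_number);
Console.WriteLine($"{firstDigit} {secondDigit} {viaHelper} {value}"); // 4 0 4 40
```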
When you access this string, you get the ASCII character codes of your two characters, '4' and '0'; please see here:
http://www.theasciicode.com.ar/ascii-control-characters/null-character-ascii-code-0.html
A string is an array of chars, so that's why you received these results: it basically displays the ASCII codes of '4' and '0'.
