XmlTextReader node value not the same as value set - c#

I have an XmlTextReader initialized from a MemoryStream, and the value in the MemoryStream is:
<val><![CDATA[value]]></val>
The MemoryStream contains the correct byte array for this value, but when I do:
XmlTextReader reader = new XmlTextReader(myMemoryStream);
reader.ReadToFollowing("val");
string result = reader.ReadElementContentAsString();
I get the following result:
"\r\n\t\t\t\tvalue\r\n\t\t\t"
Why are carriage returns and tabs appended to the value? I don't add them when I create the reader...
I hope I'm being clear enough.
Thanks for any help.
[EDIT]
byte[] DEBUGvalue = myMemoryStream.GetBuffer()
    .SkipWhile(b => b != (byte)'[')
    .TakeWhile(b => b != (byte)']')
    .ToArray();
And DEBUGvalue contains:
[0] 91 byte ([)
[1] 67 byte (C)
[2] 68 byte (D)
[3] 65 byte (A)
[4] 84 byte (T)
[5] 65 byte (A)
[6] 91 byte ([)
[7] 118 byte (v)
[8] 97 byte (a)
[9] 108 byte (l)
[10] 117 byte (u)
[11] 101 byte (e)
[12] 32 byte ( )
[13] 32 byte ( )
[14] 32 byte ( )
[15] 32 byte ( )
[16] 32 byte ( )

Are you sure this is the literal input for this result?
Have you tried dumping the memStream to a (debug) file and examining the contents?
ReadElementContentAsString() will concatenate CDATA and whitespace. It looks like your input is more like
<val>
<![CDATA[value]]>
</val>

You could create your XmlReader like this:
var settings = new XmlReaderSettings { IgnoreWhitespace = true };
var reader = XmlReader.Create(new StringReader(@"<val> <![CDATA[value]]> </val>"), settings);
That would make things a bit easier.
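The same merging behaviour is easy to reproduce outside .NET; here is a small Python sketch (using the standard `xml.etree.ElementTree` parser) showing that any XML parser folds CDATA together with the surrounding whitespace text, so pretty-printed input leaks `\n` and `\t` into the element content:

```python
# Sketch: an XML parser merges CDATA with the surrounding whitespace text
# nodes, so indented input leaks \n and \t into the element's content.
import xml.etree.ElementTree as ET

pretty = "<val>\n\t<![CDATA[value]]>\n</val>"  # indented, as a serializer might write it
compact = "<val><![CDATA[value]]></val>"       # no surrounding whitespace

print(repr(ET.fromstring(pretty).text))   # '\n\tvalue\n' -- whitespace included
print(repr(ET.fromstring(compact).text))  # 'value'
```

This supports the diagnosis above: the `\r\n\t...` is really present in the stream, not added by XmlTextReader.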

Related

Order of bytes after BitArray to byte[] conversion

I'm trying to figure out the byte order after conversion from BitArray to byte[].
Firstly, here is the BitArray content:
BitArray encoded = huffmanTree.Encode(input);
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0));
}
Console.WriteLine();
Output:
Encoded: 000001010110101011111111
Okay, so if we convert this binary to hex manually (reading each 8-bit group most-significant-bit first) we get: 05 6A FF
However, when I do the conversion in C#, here is what I get:
BitArray encoded = huffmanTree.Encode(input);
byte[] bytes = new byte[encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1)];
encoded.CopyTo(bytes, 0);
string StringByte = BitConverter.ToString(bytes);
Console.WriteLine(StringByte); // just to check the Hex
Output:
A0-56-FF
Nevertheless, as I mentioned, it should be 05 6A FF. Please help me understand why that is.
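The mismatch comes from bit ordering: BitArray.CopyTo packs bit i into byte i / 8 at bit position i % 8, i.e. the first bit in the sequence becomes the least significant bit of each byte, whereas the manual reading above treats it as the most significant. A quick Python sketch reproduces the observed output:

```python
# Sketch of BitArray.CopyTo's packing rule: bit i goes into byte i // 8 at
# bit position i % 8, so the FIRST bit of each 8-bit group becomes the
# LEAST significant bit of the byte (not the most significant one).
bits = "000001010110101011111111"

def pack_lsb_first(bits):
    out = []
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        # bit j of the chunk contributes 2**j (LSB-first)
        out.append(sum(int(b) << j for j, b in enumerate(chunk)))
    return out

print("-".join(f"{b:02X}" for b in pack_lsb_first(bits)))  # A0-56-FF
```

Reversing each 8-bit group before packing would give the expected 05-6A-FF.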

SortedDictionary throws "same key already exists" with two different entries

I have two strings; they are not equal:
var filename1 = "Statuts PE signés.pdf";
var filename2 = "Statuts PE signés.pdf";
The characters for filename1:
[0] S= 83
[1] t=116
[2] a=97
[3] t=116
[4] u=117
[5] t=116
[6] s=115
[7] =32
[8] P=80
[9] E=69
[10] =32
[11] s=115
[12] i=105
[13] g=103
[14] n=110
[15] e=101
[16] ´=769
[17] s=115
[18] .=46
[19] p=112
[20] d=100
[21] f=102
The characters for filename2:
[0] S=83
[1] t=116
[2] a=97
[3] t=116
[4] u=117
[5] t=116
[6] s=115
[7] =32
[8] P=80
[9] E=69
[10] =32
[11] s=115
[12] i=105
[13] g=103
[14] n=110
[15] é=233
[16] s=115
[17] .=46
[18] p=112
[19] d=100
[20] f=102
I can add these two entries to a Dictionary:
var files1 = new Dictionary<string, int>();
files1.Add(filename1, 1);
files1.Add(filename2, 2); // OK
But when I try with a SortedDictionary, I get "ArgumentException : An entry with the same key already exists" :
var files2 = new SortedDictionary<string, int>();
files2.Add(filename1, 1);
files2.Add(filename2, 2); // throw "ArgumentException : An entry with the same key already exists"
Why?
It's because by default Dictionary<string, TValue> uses EqualityComparer<string>.Default, which considers filename1 and filename2 different because it uses ordinal comparison. On the other hand, SortedDictionary<string, TValue> uses Comparer<string>.Default, which uses invariant comparison, which considers these strings equal:
Console.WriteLine(filename1 == filename2); // false
Console.WriteLine(EqualityComparer<string>.Default.Equals(filename1, filename2)); // false
Console.WriteLine(Comparer<string>.Default.Compare(filename1, filename2) == 0); // true
You can enforce ordinal comparison for SortedDictionary as well by passing StringComparer.Ordinal to the constructor:
var files3 = new SortedDictionary<string, int>(StringComparer.Ordinal);
files3.Add(filename1, 1);
files3.Add(filename2, 2); // OK
Console.WriteLine(StringComparer.Ordinal.Compare(filename1, filename2) == 0); // false
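The underlying cause is visible in the character dumps above: filename1 spells the accent as e + a combining acute (U+0301, code 769), while filename2 uses the precomposed é (U+00E9, code 233). The two forms are canonically equivalent but not codepoint-identical, which a short Python sketch makes concrete:

```python
# Sketch: filename1 uses 'e' + combining acute (U+0301); filename2 uses the
# precomposed 'é' (U+00E9). An ordinal (codepoint-by-codepoint) comparison
# sees them as different; after canonical (NFC) normalization they compare
# equal, which is analogous to what the culture-aware Comparer does.
import unicodedata

filename1 = "Statuts PE signe\u0301s.pdf"  # 'e' + U+0301
filename2 = "Statuts PE sign\u00e9s.pdf"   # precomposed 'é'

print(filename1 == filename2)  # False -- ordinal, like EqualityComparer<string>.Default
print(unicodedata.normalize("NFC", filename1) ==
      unicodedata.normalize("NFC", filename2))  # True -- like Comparer<string>.Default
```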

C# Ascii to bytes, parsing not conversion

I have a Windows Forms app where you input text in one textbox, and it outputs the conversion in the other textbox. I have various conversions.
Say I input "hello world".
My ASCII-to-bytes function gives me back: 10410110810811132119111114108100
All is good. Now I need to use my bytes-to-ASCII function to convert it back.
The problem is that
byte[] b = ASCIIEncoding.ASCII.GetBytes(plaintext); // plaintext is the string from the textbox
OK, MOSTLY SOLVED. BUT the problem still remains: take "1101000 1100101" as an input string, parse it as a byte array, and then get a string out of it (I know how to do the last part).
UPDATE
From binary input string to ASCII string
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string input = "1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100";
string[] binary = input.Split(' ');
Console.WriteLine(String.Join("", binary.Select(b => Convert.ToChar(Convert.ToByte(b, 2))).ToArray()));
}
}
Results:
hello world
OLD ANSWER
So now it sounds like you want to convert your string to binary and then from binary back to a string. As in my OLD ANSWER below, you can use the LINQ Select() method to convert your string to a binary string array.
Once you have a binary string array, to convert it back you convert each element to a byte from base 2, then convert each byte to a char, giving a char[] that can be joined back into a string. No padding is necessary, since Convert.ToByte(s, 2) accepts strings without leading zeros.
using System;
using System.Linq;
using System.Text;
public class Program
{
public static void Main()
{
string input = "hello world";
byte[] inputBytes = ASCIIEncoding.ASCII.GetBytes(input);
// Decimal display
Console.WriteLine(String.Join(" ", inputBytes));
// Hex display
Console.WriteLine(String.Join(" ", inputBytes.Select(ib => ib.ToString("X2"))));
// Binary display
string[] binary = inputBytes.Select(ib => Convert.ToString(ib, 2)).ToArray();
Console.WriteLine(String.Join(" ", binary));
// Converting bytes back to string
Console.WriteLine(ASCIIEncoding.ASCII.GetString(inputBytes, 0, inputBytes.Length));
// Binary to ASCII (This is what you're looking for)
Console.WriteLine(String.Join("", binary.Select(b => Convert.ToChar(Convert.ToByte(b, 2)))));
}
}
Results:
104 101 108 108 111 32 119 111 114 108 100
68 65 6C 6C 6F 20 77 6F 72 6C 64
1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100
hello world
hello world
The inverse of ASCIIEncoding.ASCII.GetBytes(string) is ASCIIEncoding.ASCII.GetString(byte[]):
string plaintext = "hello world";
byte[] b = ASCIIEncoding.ASCII.GetBytes(plaintext);
// b = new byte[] { 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100 }
// (note: Console.WriteLine(b) itself would only print "System.Byte[]")
string s = ASCIIEncoding.ASCII.GetString(b);
Console.WriteLine(s); // "hello world"
how the heck does ASCIIEncoding.ASCII.GetBytes("hello world") give me back 10410110810811132119111114108100?! that's not binary!
It does not give you that number. It gives you a byte array: an array of bytes. And a byte is a number between 0 and 255 (which can be stored in one byte, hence the name). What did you expect? A string containing only 1 and 0 characters? That's not binary either; that's a string.
You can use Convert.ToString to get a binary string from a single byte:
Console.WriteLine(Convert.ToString(104, 2)); // "1101000"
Note that you need to left-pad those strings to make them use 8 characters.
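The whole round trip (text to space-separated binary and back) also fits in a few lines; here is a Python sketch of the same steps, including the left-padding point:

```python
# Sketch of the round trip: text -> space-separated binary strings -> text.
# Padding each byte to 8 digits only matters if the binary strings are later
# concatenated without separators; int(s, 2) ignores missing leading zeros.
text = "hello world"

binary = " ".join(format(ord(c), "b") for c in text)    # unpadded, like Convert.ToString(b, 2)
padded = " ".join(format(ord(c), "08b") for c in text)  # left-padded to 8 digits

decoded = "".join(chr(int(b, 2)) for b in binary.split(" "))
print(binary)   # 1101000 1100101 ... 1100100
print(decoded)  # hello world
```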

PHP SHA1 vs PasswordDeriveBytes SHA1 - differing lengths

Having read around on this issue on these posts (to name but two)...
Encrypting / Decrypting in vb.net does not always return the same value
C# PasswordDeriveBytes Confusion
...I was wondering if anyone knew the method that Microsoft uses to key-stretch their output?
To integrate with a system written in C# by someone else, I need to generate a large hash in PHP. The code that generates their SHA1 hash produces a 32-byte output, rather than the standard 20-byte SHA1 digest. So has anyone worked out how Microsoft fills in the remaining 12 bytes? I've tried lots of combinations of appending a SHA1 of the salt, the password, etc., as well as appending previous iterations, but nothing seems to reproduce Microsoft's output.
I'm slightly concerned by this comment on https://stackoverflow.com/a/13482133/4346051...
"PasswordDeriveBytes uses an unknown, proprietary, non-deterministic, broken,
cryptographically insecure method of key stretching for PasswordDeriveBytes"
Sample code as follows...
The C# code I need to replicate in PHP in order to produce the correct hash for integration with third party software:
string sPassword = "iudF98yfh9aRfhanifakdfn0a4ut-a8J789jasdpasd=";
string sSalt = "lYU7m+MDCvVWVQyuLIX3og==";
int iLength = 256 / 8;
byte[] saltInBytes = Encoding.ASCII.GetBytes(sSalt);
PasswordDeriveBytes password = new PasswordDeriveBytes(sPassword, saltInBytes, "SHA1", 2);
byte[] keyBytes = password.GetBytes(iLength);
Console.WriteLine("\nkeyBytes = \n");
foreach (byte value in keyBytes)
{
Console.WriteLine(value);
}
...which outputs...
0
176
172
127
35
113
212
85
123
19
71
65
23
127
84
165
163
225
80
207
67
125
128
205
188
248
103
52
23
245
111
20
My PHP code for the same functionality is...
$sPassword = "iudF98yfh9aRfhanifakdfn0a4ut-a8J789jasdpasd=";
$sSalt = "lYU7m+MDCvVWVQyuLIX3og==";
$iIterations = 2;
$iLength = 256 / 8;
// combine password with salt
$key = $sPassword.$sSalt;
// perform sha1 how ever many times, using raw binary output instead of hex
for($i = 1; $i <= $iIterations; $i++) {
$key = sha1($key, true);
}
// get the iLength number of chars
$key = substr($key, 0, $iLength);
$aKeyBytes = unpack('C*', $key);
print_r($aKeyBytes);
...and this outputs...
Array
(
[1] => 0
[2] => 176
[3] => 172
[4] => 127
[5] => 35
[6] => 113
[7] => 212
[8] => 85
[9] => 123
[10] => 19
[11] => 71
[12] => 65
[13] => 23
[14] => 127
[15] => 84
[16] => 165
[17] => 163
[18] => 225
[19] => 80
[20] => 207
)
...but as you can see, we're 12 bytes short.
Unfortunately, it has to be replicated. I can't amend the C# to use a better method; I have no control over that side. My PHP code has to generate the same output.
I needed more or less the same thing: data encrypted using AES256-CBC in PHP that I needed to be able to decrypt in VB code.
It took me a while to search the internet and test various code samples, but I was able to get it working using information from the following posts:
C# / VB code here
How do I convert this C# Rijndael encryption to PHP?
Kind regards,
Sven

Why are ASCII values of a byte different when cast as Int32?

I'm in the process of creating a program that will scrub extended ASCII characters from text documents. I'm trying to understand how C# is interpreting the different character sets and codes, and am noticing some oddities.
Consider:
using System;
using System.Text;

namespace ASCIITest
{
class Program
{
static void Main(string[] args)
{
string value = "Slide™1½”C4®";
byte[] asciiValue = Encoding.ASCII.GetBytes(value); // byte array
char[] array = value.ToCharArray(); // char array
Console.WriteLine("CHAR\tBYTE\tINT32");
for (int i = 0; i < array.Length; i++)
{
char letter = array[i];
byte byteValue = asciiValue[i];
Int32 int32Value = array[i];
//
Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
}
Console.ReadLine();
}
}
}
Output from program
CHAR BYTE INT32
S 83 83
l 108 108
i 105 105
d 100 100
e 101 101
T 63 8482 <- trademark symbol
1 49 49
½ 63 189 <- fraction
" 63 8221 <- smartquotes
C 67 67
4 52 52
r 63 174 <- registered trademark symbol
In particular, I'm trying to understand why the extended characters (the ones with my notes added to the right of the third column) show up with the correct value when widened to Int32, but all show up as 63 when converted to bytes. What's going on here?
ASCII.GetBytes replaces every character outside the ASCII range (0-127) with a question mark (code 63).
So since your string contains characters outside that range, your asciiValue has ? in place of all the interesting symbols like ™, whose char (Unicode) representation is 8482, which is indeed outside the 0-127 range.
Converting the string to a char array does not modify the character values, so you still have the original Unicode code points (char is essentially a UInt16); casting to the wider integer type Int32 does not change the value.
Below are the possible conversions of that character into byte/integer values:
var value = "™";
var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 ('?') - outside the 0-127 range
var castToByte = (byte)value[0];               // 34 = 8482 % 256 (narrowing cast truncates)
var asInt16 = (Int16)value[0];                 // 8482
var asInt32 = (Int32)value[0];                 // 8482
Details available at ASCIIEncoding Class
ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
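The same replacement behaviour can be demonstrated with a short Python sketch (encoding to ASCII with a replacement fallback, analogous to ASCIIEncoding's):

```python
# Sketch: encoding to ASCII with a replacement fallback turns every
# character above U+007F into '?' (63), while the character itself keeps
# its full Unicode code point.
value = "\u2122"  # ™, TRADE MARK SIGN

print(value.encode("ascii", errors="replace"))  # b'?' -> byte 63
print(ord(value))                               # 8482 -> the Int32 value
print(ord(value) % 256)                         # 34   -> what a narrowing cast keeps
```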
