String not equal even though it is actually equal - C#

My application has an auto-update feature. To verify that the file downloaded successfully, I compare two hashes: one from the update XML and one generated after downloading. The two hashes look the same, yet the comparison reports that they are different. When I check the lengths, the hash string from the XML is 66 characters and the other is 36. I have tried the Trim method, but still no luck.
string file = ((string[])e.Argument)[0];
string updateMD5 = "--" + ((string[])e.Argument)[1].ToUpper() + "--";
// Hash the file and compare to the hash in the update xml
string xx = "--" + Hasher.HashFile(file, HashType.MD5).ToUpper() + "--";
int xxx = updateMD5.Trim().Length;
int xxxxx = xx.Trim().Length;
if (String.Equals(updateMD5.Trim(), xx.Trim(), StringComparison.InvariantCultureIgnoreCase))
    e.Result = DialogResult.OK;
else
    e.Result = DialogResult.No;
Hasher code:
internal static string HashFile(string filePath, HashType algo)
{
    // Open the file once and dispose the stream when done.
    using (FileStream stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        switch (algo)
        {
            case HashType.MD5:
                return MakeHashString(MD5.Create().ComputeHash(stream));
            case HashType.SHA1:
                return MakeHashString(SHA1.Create().ComputeHash(stream));
            case HashType.SHA512:
                return MakeHashString(SHA512.Create().ComputeHash(stream));
            default:
                return "";
        }
    }
}
private static string MakeHashString(byte[] hash)
{
    StringBuilder s = new StringBuilder();
    foreach (byte b in hash)
        s.Append(b.ToString("x2").ToLower());
    return s.ToString();
}
NOTE: I use the '--' to check whether there are trailing spaces.
StringBuilder s = new StringBuilder();
foreach (char c in updateMD5.Trim())
    s.AppendLine(string.Format("{0}=={1}", c, (int)c));

Once you showed the character-by-character output of the longer string, the explanation is clear.
As for why this happens, that is pretty much impossible to tell from our end, given the nature of the problem.
Anyway, the problem is these two characters:
==8204
==8203
Those two code points are 0x200C and 0x200B aka:
0x200C = ZERO WIDTH NON-JOINER
0x200B = ZERO WIDTH SPACE
These are invisible characters meant to give hints to word-breaking algorithms and similar gory stuff.
Simply put, somewhere in your code where you concatenate strings you have those two characters as part of your source code. Since they're not visible in your source code either (zero width, remember) they can be hard to spot.
I would take a look at all the strings involved here; in particular I would start with the "x2" format string used to build up the hash, or possibly the code that returns the MD5 hash for the update to apply.
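As a belt-and-braces fix, you could also strip Unicode "format" characters (the category both of those code points belong to) before comparing. A minimal sketch, assuming a helper named StripFormatChars that is not part of your code and a using System.Text; directive:

static string StripFormatChars(string s)
{
    // Remove zero-width and other "format" (Cf) characters such as U+200B and U+200C.
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.Format)
            sb.Append(c);
    }
    return sb.ToString();
}

// Usage: compare the cleaned strings instead of the raw ones.
bool match = string.Equals(StripFormatChars(updateMD5).Trim(),
                           StripFormatChars(xx).Trim(),
                           StringComparison.InvariantCultureIgnoreCase);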

Related

C# utf string conversion, characters which don't display correctly get converted to "unknown character" - how to prevent this?

I've got two strings which are derived from Windows filenames, which contain unicode characters that do not display correctly in Windows (they show just the square box "unknown character" instead of the correct character). However the filenames are valid and these files exist without problems in the operating system, which means I need to be able to deal with them correctly and accurately.
I'm loading the filenames the usual way:
string path = @"c:\folder";
foreach (FileInfo file in new DirectoryInfo(path).EnumerateFiles())
{
    string filename = file.FullName;
}
but for the purposes of explaining this problem, these are the two filenames I'm having issues with:
string filename1 = "\ude18.txt";
string filename2 = "\udca6.txt";
Two strings, two filenames with a single unicode character plus an extension, both different. So far this is fine: I can read and write these files without problems. However, I need to store these strings in a SQLite db and later retrieve them, and every attempt I make to do so results in both of these characters being changed to the "unknown character", so the original data is lost and I can no longer differentiate the two strings. At first I thought this was a SQLite issue, and I made sure my db is in UTF16, but it turns out it's the conversion in C# to UTF16 that is causing the problem.
If I ignore sqlite entirely, and simply try to manually convert these strings to UTF16 (or to any other encoding), these characters are converted to the "unknown character" and the original data is lost. If I do this:
System.Text.Encoding enc = System.Text.Encoding.Unicode;
string filename1 = "\ude18.txt";
string filename2 = "\udca6.txt";
byte[] name1Bytes = enc.GetBytes(filename1);
byte[] name2Bytes = enc.GetBytes(filename2);
and if I then inspect the byte arrays 'name1Bytes' and 'name2Bytes', they are identical, and I can see that the unicode character in both cases has been converted to the byte pair 253, 255 - the unknown character. And sure enough, when I convert back
string newFilename1 = enc.GetString(name1Bytes);
string newFilename2 = enc.GetString(name2Bytes);
the original unicode character in each case is lost and replaced with a diamond question mark symbol. I have lost the original filenames altogether.
It seems that these encoding conversions rely on the system font being able to display the characters, and this is a problem as these strings already exist as filenames, and changing the filenames isn't an option. I need to preserve this data somehow when sending it to sqlite, and when it's sent to sqlite it will go through a conversion process to UTF16, and it's this conversion that I need it to survive without losing data.
If you cast a char to an int, you get the numeric value, bypassing the Unicode conversion mechanism:
foreach (char ch in filename1)
{
    int i = ch; // 0x0000de18 == 56856 for the first char in filename1
    // ... do whatever, e.g., create an int array, store it as base64
}
This turns out to work as well, and is perhaps more elegant:
foreach (int ch in filename1)
{
    // ...
}
So perhaps something like this:
string Encode(string raw)
{
    // Two bytes per UTF-16 code unit, little-endian; no surrogate-pair validation.
    byte[] bytes = new byte[2 * raw.Length];
    int i = 0;
    foreach (int ch in raw)
    {
        bytes[i++] = (byte)(ch & 0xff);
        bytes[i++] = (byte)(ch >> 8);
    }
    return Convert.ToBase64String(bytes);
}

string Decode(string encoded)
{
    byte[] bytes = Convert.FromBase64String(encoded);
    char[] chars = new char[bytes.Length / 2];
    for (int i = 0; i < chars.Length; ++i)
    {
        chars[i] = (char)(bytes[i * 2] | (bytes[i * 2 + 1] << 8));
    }
    return new string(chars);
}
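A quick round-trip check of the idea (a sketch only, reusing the Encode/Decode methods above):

string filename1 = "\ude18.txt";
string stored = Encode(filename1);        // plain ASCII, safe to store in sqlite as text
string restored = Decode(stored);
Console.WriteLine(filename1 == restored); // True - the unpaired surrogate survives the round trip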

Hashed passwords display weirdly

I've run into some curious behaviour when trying to hash a string password and then display the hash in the console.
My code is:
static void Main(string[] args)
{
    string password = "password";
    ConvertPasswordToHash(password);
}

private static void ConvertPasswordToHash(string password)
{
    using (HashAlgorithm sha = SHA256.Create())
    {
        byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
        string hashText = Encoding.UTF8.GetString(result);
        Console.WriteLine(hashText);
        StringBuilder sb = new StringBuilder();
        foreach (var item in result)
        {
            sb.Append((char)item);
        }
        Console.WriteLine(sb);
    }
}
The problem is twofold:
1) hashText and sb contain different values (both are 32 characters long before outputting), and 2) the console outputs are even stranger: they are not 32 characters long, and they differ slightly from each other:
When examining the strings before outputting them, I noticed that hashText contains, for instance, \u0004, which could be a unicode character of some sort, while sb does not contain that at all (that is, before outputting the values to the console).
My questions are:
Which way is the correct way of getting a string of chars from the provided array of bytes?
Why are the console outputs different but only so slightly? It does not look like it is the fault of using the wrong Encoding.
How do I output the correct hash (32 symbols) to the console? I've tried adding '@' before the strings to cancel any possible carriage returns etc., pretty much without any result.
Maybe I am missing something obvious. Thank you.
The correct logic should be as follows:
private static void ConvertPasswordToHash(string password)
{
    using (HashAlgorithm sha = SHA256.Create())
    {
        byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
        StringBuilder sb = new StringBuilder();
        foreach (var item in result)
        {
            sb.Append(item.ToString("x2"));
        }
        Console.WriteLine(sb);
    }
}
ToString("x2") formats each byte as two hexadecimal characters.
Live example: https://dotnetfiddle.net/QkREkX
Another way is just to represent your byte[] array as a base 64 string, no StringBuilder required.
byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
Console.WriteLine(Convert.ToBase64String(result));
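For completeness, BitConverter gives a third option; it produces uppercase hex separated by dashes, so the Replace (and the optional lower-casing) is needed to match the output above. A minimal sketch:

byte[] result = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
Console.WriteLine(BitConverter.ToString(result).Replace("-", "").ToLowerInvariant());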

Why do these code blocks produce different results?

Below are 2 similar code blocks. They take a string, hash it with SHA512, then convert it to Base64. I had trouble getting the second code block to produce the same results as my manual test using online calculators and encoders, so I broke the process down step by step and discovered that it could produce the same results as my manual test, but only if it behaved like the first code block. Why do these two code blocks produce different results? Thanks!
private void EditText_AfterTextChanged(object sender, AfterTextChangedEventArgs e)
{
    //This builds a string to encrypt.
    string domain = txtDomain.Text;
    string username = txtUsername.Text;
    string pin = txtPin.Text;
    txtPreview.Text = string.Format("{0}+{1}+{2}", domain, username, pin);

    //This takes the above string, encrypts it.
    StringBuilder Sb = new StringBuilder();
    SHA512Managed HashTool = new SHA512Managed();
    Byte[] PhraseAsByte = System.Text.Encoding.UTF8.GetBytes(string.Concat(txtPreview.Text));
    Byte[] EncryptedBytes = HashTool.ComputeHash(PhraseAsByte);
    HashTool.Clear();

    //This rebuilds the calculated hash for manual comparison.
    foreach (Byte b in EncryptedBytes)
        Sb.Append(b.ToString("x2"));
    txtHash.Text = Sb.ToString();

    //This takes the rebuilt hash and re-converts it to bytes before encoding it in Base64
    EncryptedBytes = System.Text.Encoding.UTF8.GetBytes(string.Concat(txtHash.Text));
    txtResult.Text = Convert.ToBase64String(EncryptedBytes);
}
and
private void EditText_AfterTextChanged(object sender, AfterTextChangedEventArgs e)
{
    //This builds a string to encrypt.
    string domain = txtDomain.Text;
    string username = txtUsername.Text;
    string pin = txtPin.Text;
    txtPreview.Text = string.Format("{0}+{1}+{2}", domain, username, pin);

    //This takes the above string, encrypts it.
    StringBuilder Sb = new StringBuilder();
    SHA512Managed HashTool = new SHA512Managed();
    Byte[] PhraseAsByte = System.Text.Encoding.UTF8.GetBytes(string.Concat(txtPreview.Text));
    Byte[] EncryptedBytes = HashTool.ComputeHash(PhraseAsByte);
    HashTool.Clear();

    //This takes the EncryptedBytes and converts them to base64.
    txtResult.Text = Convert.ToBase64String(EncryptedBytes);

    //This reverses the EncryptedBytes into readable hash for manual comparison
    foreach (Byte b in EncryptedBytes)
        Sb.Append(b.ToString("x2"));
    txtHash.Text = Sb.ToString();
}
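For what it's worth, the difference between the two blocks comes down to what gets base64-encoded: the first encodes the UTF-8 bytes of the 128-character hex string, the second encodes the 64 raw hash bytes. A minimal sketch of the two, assuming EncryptedBytes holds the raw hash as in the blocks above:

// EncryptedBytes is the raw 64-byte SHA-512 hash.
StringBuilder hexSb = new StringBuilder();
foreach (Byte b in EncryptedBytes)
    hexSb.Append(b.ToString("x2"));
string hex = hexSb.ToString();                                        // 128 hex characters

string first = Convert.ToBase64String(Encoding.UTF8.GetBytes(hex));  // block 1: base64 of the hex text
string second = Convert.ToBase64String(EncryptedBytes);              // block 2: base64 of the raw bytes
// The two values will never match; online tools typically produce one or the other.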
Found the answer, no thanks to the less-than-useful downvotes.
Encoding.Unicode is Microsoft's misleading name for UTF-16 (a double-wide encoding, used in the Windows world for historical reasons but not used by anyone else). http://msdn.microsoft.com/en-us/library/system.text.encoding.unicode.aspx
If you inspect your bytes array, you'll see that every second byte is 0x00 (because of the double-wide encoding).
You should be using Encoding.UTF8.GetBytes instead.
But also, you will see different results depending on whether or not you consider the terminating '\0' byte to be part of the data you're hashing. Hashing the two bytes "Hi" will give a different result from hashing the three bytes "Hi". You'll have to decide which you want to do. (Presumably you want to do whichever one your friend's PHP code is doing.)
For ASCII text, Encoding.UTF8 will definitely be suitable. If you're aiming for perfect compatibility with your friend's code, even on non-ASCII inputs, you'd better try a few test cases with non-ASCII characters such as é and 家 and see whether your results still match up. If not, you'll have to figure out what encoding your friend is really using; it might be one of the 8-bit "code pages" that used to be popular before the invention of Unicode. (Again, I think Windows is the main reason that anyone still needs to worry about "code pages".)
Source: Hashing a string with Sha256

SHA1 C# method equivalent in Perl?

I was given C# code and I'm trying to generate the equivalent SHA1 using Perl.
public string GetHashedPassword(string passkey)
{
    // Add a timestamp to the passkey and encrypt it using SHA1.
    passkey = passkey + DateTime.UtcNow.ToString("yyyyMMddHH0000");
    using (SHA1 sha1 = new SHA1CryptoServiceProvider())
    {
        byte[] hashedPasskey =
            sha1.ComputeHash(Encoding.UTF8.GetBytes(passkey));
        return ConvertToHex(hashedPasskey);
    }
}
private string ConvertToHex(byte[] bytes)
{
    StringBuilder hex = new StringBuilder();
    foreach (byte b in bytes)
    {
        if (b < 16)
        {
            hex.AppendFormat("0{0:X}", b);
        }
        else
        {
            hex.AppendFormat("{0:X}", b);
        }
    }
    return hex.ToString();
}
This should be the same as:
use Digest::SHA1 qw( sha1_hex );
my $pass = "blahblah";
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime();
$year += 1900;
my $date = sprintf("%d%02d%02d%02d0000", $year, $mon+1, $mday, $hour);
my $passSha1 = sha1_hex($pass.$date);
# 9c55409372610f8fb3695d1c7c2e6945164a2578
I don't actually have any C# experience so I'm not able to test what is normally outputted from the C# code.
The code is supposed to be used as a checksum for a website but the one I'm providing is failing.
Edit: it also adds the UTC timestamp (yyyyMMDDHH0000) to the end of the pass before hashing so I've added that code in case the issue is there.
I do not know C# either. However, {0:X} formats hex digits using upper case letters. So, would
my $passSha1 = uc sha1_hex($pass);
help? (Assuming GetHashedPassword makes sense.)
The only difference I can see (from running the code under Visual Studio 2008) is that the C# code is returning the hex string with alphas in uppercase
D3395867D05CC4C27F013D6E6F48D644E96D8241
and the perl code is using lower case for alphas
d3395867d05cc4c27f013d6e6f48d644e96d8241
The format string used in the C# code is asking for uppercase ("X" as opposed to "x"):
hex.AppendFormat("{0:X}", b);
Maybe the code at the website is using a case sensitive comparison? I assume it would be trivial for you to convert the output from the CPAN function to uppercase before you submit it?
Could it be as simple as changing the uppercase 'X' in the AppendFormat call to a lowercase 'x'?
I think you're looking for Digest::SHA1
Your SHA-1 could have also just been:
BitConverter.ToString(SHA.ComputeHash(buffer)).Replace("-", "");
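Putting that one-liner back into the original method, a minimal sketch (equivalent output to the existing ConvertToHex, uppercase hex) might look like:

public string GetHashedPassword(string passkey)
{
    string data = passkey + DateTime.UtcNow.ToString("yyyyMMddHH0000");
    using (SHA1 sha1 = new SHA1CryptoServiceProvider())
    {
        // BitConverter yields "D3-39-58-..."; strip the dashes to match ConvertToHex.
        return BitConverter.ToString(sha1.ComputeHash(Encoding.UTF8.GetBytes(data)))
                           .Replace("-", "");
    }
}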

Decoding base64-encoded data from xml document

I receive some xml-files with embedded base64-encoded images, that I need to decode and save as files.
An unmodified (other than zipped) example of such a file can be downloaded below:
20091123-125320.zip (60KB)
However, I get errors like "Invalid length for a Base-64 char array" and "Invalid character in a Base-64 string". I marked the line in the code below where the error occurs.
A file could look like this:
<?xml version="1.0" encoding="windows-1252"?>
<mediafiles>
  <media media-type="image">
    <media-reference mime-type="image/jpeg"/>
    <media-object encoding="base64"><![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]></media-object>
    <media.caption>What up</media.caption>
  </media>
</mediafiles>
And the code to process like this:
var xd = new XmlDocument();
xd.Load(filename);
var nodes = xd.GetElementsByTagName("media");
foreach (XmlNode node in nodes)
{
    var mediaObjectNode = node.SelectSingleNode("media-object");
    // The line below is where the errors occur
    byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.InnerText);
    // Do stuff with the byte array to save the image
}
The xml-data is from an enterprise newspaper system, so I am pretty sure the files are OK - there must be something in the way I process them that is just wrong. Maybe a problem with the encoding?
I have tried writing out the contents of mediaObjectNode.InnerText, and it is the base64-encoded data - so navigating the xml-doc is not the issue.
I have been googling, binging, stackoverflowing and crying - and found no solution... Help!
Edit:
Added an actual example file (and a bounty). Please note the downloadable file uses a slightly different schema, since I simplified the example above, removing irrelevant stuff...
For a first shot I didn't use any programming language, just Notepad++.
I opened the xml file in it and copied the raw base64 content into a new file (without the square brackets).
Afterwards I selected everything (Ctrl-A) and used the option Extensions - Mime Tools - Base64 decode. This threw an error about the wrong text length (must be a multiple of 4). So I just added two equal signs ('=') as placeholders at the end to get the correct length.
Another retry and it decoded successfully into 'something'. Just save the file as .jpg and it opens like a charm in any picture viewer.
So I would say there IS something wrong with the data you receive: it just doesn't have the right number of equal signs at the end to pad the length to a multiple of 4.
The 'easy' way would be to keep adding equal signs until the decoding stops throwing. The better way is to count the characters (minus CR/LFs!) and add the needed padding in one step.
Further investigation
After some coding and reading of the Convert function, the problem turns out to be an incorrectly attached equal sign from the producer. Notepad++ has no problem with extra equal signs, but the .NET Convert function only accepts zero, one or two of them. So if you pad the string, which already ends in '=', with additional equal signs you get an error too! To get this thing to work, you have to strip all existing padding, calculate how much is needed and add it back.
Just for the bounty, here is my code (not absolutely perfect, but a good starting point): ;-)
static void Main(string[] args)
{
    var elements = XElement
        .Load("test.xml")
        .XPathSelectElements("//media/media-object[@encoding='base64']");
    foreach (XElement element in elements)
    {
        var image = AnotherDecode64(element.Value);
    }
}

static byte[] AnotherDecode64(string base64Encoded)
{
    // Strip any existing padding, then work out how much is actually needed.
    string temp = base64Encoded.TrimEnd('=');
    int asciiChars = temp.Length - temp.Count(c => Char.IsWhiteSpace(c));
    switch (asciiChars % 4)
    {
        case 1:
            // This would always produce an exception,
            // regardless of what (or what not) you attach to your string!
            // Better would be some kind of throw new Exception()
            return new byte[0];
        case 0:
            asciiChars = 0; // already a multiple of 4, no padding needed
            break;
        case 2:
            asciiChars = 2; // two '=' needed
            break;
        case 3:
            asciiChars = 1; // one '=' needed
            break;
    }
    temp += new String('=', asciiChars);
    return Convert.FromBase64String(temp);
}
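A possible way to use that helper, writing the decoded bytes to disk (the output filename is just an example, and File.WriteAllBytes needs using System.IO;):

foreach (XElement element in elements)
{
    byte[] image = AnotherDecode64(element.Value);
    File.WriteAllBytes("decoded.jpg", image);
}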
The base64 string is not valid; as Oliver has already said, the string length must be a multiple of 4 after removing whitespace characters. If you look at the end of the base64 string (see below) you will see the last line is shorter than the rest.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=
If you remove this line, your program will work, but the resulting image will have a missing section in the bottom right hand corner. You need to pad this line so the overall string length is correct. From my calculations, adding 3 characters should make it work.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=
Remove the last 2 characters until the image decodes properly:
public Image Base64ToImage(string base64String)
{
    // Convert Base64 string to byte[], trimming characters until it parses.
    byte[] imageBytes = null;
    bool iscatch = true;
    while (iscatch)
    {
        try
        {
            imageBytes = Convert.FromBase64String(base64String);
            iscatch = false;
        }
        catch
        {
            int length = base64String.Length;
            base64String = base64String.Substring(0, length - 2);
        }
    }
    // Convert byte[] to Image. The MemoryStream already contains the bytes,
    // so there is no need to Write them again (that would leave the position at the end).
    MemoryStream ms = new MemoryStream(imageBytes, 0, imageBytes.Length);
    Image image = Image.FromStream(ms, true);
    pictureBox1.Image = image;
    return image;
}
Try using Linq to XML:
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (var element in elements)
        {
            byte[] image = Convert.FromBase64String(element.Value);
        }
    }
}
UPDATE:
After downloading the XML file and analyzing the value of the media-object node it is clear that it is not a valid base64 string:
string value = "PUT HERE THE BASE64 STRING FROM THE XML WITHOUT THE NEW LINES";
byte[] image = Convert.FromBase64String(value);
throws a System.FormatException saying that the length is not valid for a base-64 string. Even when I remove the \n characters from the string it doesn't work:
var elements = XElement
    .Load("20091123-125320.xml")
    .XPathSelectElements("//media/media-object[@encoding='base64']");
foreach (var element in elements)
{
    string value = element.Value.Replace("\n", "");
    byte[] image = Convert.FromBase64String(value);
}
also throws System.FormatException.
I've also had a problem with decoding Base64 encoded string from XML document (specifically Office OpenXML package document).
It turned out that the string had additional encoding applied (HTML encoding), so doing HTML decoding first and then Base64 decoding did the trick:
private static byte[] DecodeHtmlBase64String(string value)
{
    return System.Convert.FromBase64String(System.Net.WebUtility.HtmlDecode(value));
}
Just in case someone else stumbles on the same issue.
Well, it's all very simple. CDATA is a node itself, so mediaObjectNode.InnerText actually produces <![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]>, which is obviously not valid Base64-encoded data.
To make things work, use mediaObjectNode.ChildNodes[0].Value and pass that value to Convert.FromBase64String.
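A minimal sketch of that suggestion applied to the original loop (whether it also resolves the padding problem discussed above is a separate question):

var mediaObjectNode = node.SelectSingleNode("media-object");
// Read the value of the CDATA child node directly instead of InnerText.
byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.ChildNodes[0].Value);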
Is the character encoding correct? The error sounds like there's a problem that causes invalid characters to appear in the array. Try copying out the text and decoding manually to see if the data is indeed valid.
(For the record, windows-1252 is not exactly the same as iso-8859-1, so that may be the cause of a problem, barring other sources of corruption.)
