strings from DynamoDB that were originally byte arrays have funky values

Now I'm not sure if this is something I'm doing wrong, or something that's happening in DynamoDB.
Basically, I'm building a simple registration/login system for my project, saving the user data/password in a DynamoDB instance with the password hashed using RIPEMD160, and salted as well using C#'s RNGCryptoServiceProvider().
Registration seems to work perfectly fine. The issue is upon login: no matter what, the passwords don't match up, and I think it's because I'm getting some funky characters back when pulling the hash/salt back from DynamoDB. First off, both the hash and the salt are byte arrays of length 20, and are converted to strings before being saved in the database.
These examples are copy/pasted from the dynamo web interface
Example Hash: ">�Bb.ŧ�E���d��Ʀ"
Example Salt: "`���!�!�Hb�m�}e�"
When they come back and I debug into the function that pulls the data from Dynamo, both strings have different characters (VS2010 Debugger):
Returned Hash: "\u001B>�Bb.ŧ�E��\u0003�d�\u001C�Ʀ"
Returned Salt: "`���!\u000B�!�Hb�\u001Dm\u0012�\u0001}e�"
It seems these \u001B, \u000B, \u001D, \u0012, \u0003, \u001C, and \u0001 are sneaking into the returned data, and I'm not entirely sure what's going on.

You shouldn't be trying to convert opaque binary data into a string in this way in the first place. They're not text so don't treat them that way. You're just begging to lose information that way.
Use Convert.ToBase64String(data) instead of Encoding.GetString before putting the data into the database. When you get it out again, use Convert.FromBase64String to retrieve the original binary data.
Alternatively, don't store the data in a text field to start with - use a database field type which is meant to store binary data...
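As a minimal sketch of the Base64 suggestion above: the class and method names here (BinaryAsText, NewSalt, ToStorable, FromStorable) are made up for illustration, and the 20-byte length is only taken from the question. The idea is that the bytes survive the trip through a text field unchanged.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: names are illustrative, not from the original post.
public static class BinaryAsText
{
    // Generate random salt bytes (20 bytes by default, matching the question).
    public static byte[] NewSalt(int length = 20)
    {
        var salt = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        return salt;
    }

    // Turn opaque bytes into plain ASCII text that is safe in a string field.
    public static string ToStorable(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Recover exactly the bytes that were passed to ToStorable.
    public static byte[] FromStorable(string stored)
    {
        return Convert.FromBase64String(stored);
    }
}
```

The hash and the salt would each be stored via ToStorable and read back via FromStorable before comparing, so no control characters ever reach the database as text.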

Related

WE8DEC (or MCS) encoding in c#

Okay, I have this big .NET project which uses multiple databases with a lot of already-written requests. The databases all use WE8DEC as the character set; until now, all the data was Latin and there was no problem.
But I now have the task of using a new database, again in WE8DEC, but this database stores Russian data, written in Cyrillic. Using a tool like DBeaver, it shows data like ÇÎËÎÒÀ�ÅÂ instead of the actual Cyrillic text.
I know I can retrieve the byte data directly from the database using the dump function to retrieve the bytes and then convert them.
WORD | DUMP(WORD)
ÇÎËÎÒÀ�ÅÂ | Typ=1 Len=9: 199,206,203,206,210,192,208,197,194
But I don't feel like duplicating/altering all my requests and the way I retrieve the results in C#. I have a place just before sending the data as JSON where I could re-encode all the strings before sending them.
So I was looking for a way to retrieve the bytes just like in Oracle, and found one using this line of code:
byte[] bytes = Encoding.GetEncoding("Windows-1252").GetBytes(word);
But my main problem is this: I can't find any exact equivalent of Oracle's WE8DEC encoding in .NET. Windows-1252 is the closest I found (but it is still incorrect).
So the question: is there an exact equivalent of WE8DEC, also called MCS, in C#?

Encoding two values in a short URL token

I am working on a short URL app, where the token must identify 2 values: the link ID and the user ID. Ideally this token should be short.
For example, considering the URL http://sho.rt/15qq6, the token "15qq6" must identify the link and user ID.
I guess one option is to insert both values in a table and use the auto-generated ID as a token, but I would rather not. I would prefer a solution involving encryption.
How could I use the .NET encryption classes for such purpose, if possible? Many thanks for your help.

I'm not clear on how short you want your code. I posted some code online to encrypt any number of query arguments.
The result could be shortened by base64-encoding it. That might still not be short enough for you though. (Note that I didn't base64-encode it because I had some concerns about base64 encoding being case-sensitive.)
Another approach would be to come up with a code that consists of an ID into your database and some sort of checksum. If the user tries modifying the ID, you could detect this. However, this approach may not be that secure since it might not be that hard to figure out how to create your own checksums.

Short answer is "You can't", at least, not easily.
Encryption typically doesn't change the length of the data being encrypted. So if you take the URL and UserId that you want to encode and encrypt them you'll end up with a token that's the same total length.
You could try compressing the data before encryption, but there's not a lot of redundancy in a single URL, and this won't buy you much.
You could hash the data to give you a shorter result, but there's no way to reverse this process to get your URL and userId back.
If it's a short token you need then the only real option I can think of is a lookup table on the server, using the token as the key.

I don't think you understand exactly how Encryption works.
Encryption is just a technique for making it difficult to decode the response, without knowing the original encryption key.
The encrypted data is at least as long as the original data, if not larger.
There is no viable way of encoding a URL into a smaller amount of data, that's still valid in a URL.
Use a database for this, that's what they're for.
Edit: D'oh, Andrew beat me to it with a better response after editing.

You could use something like the RNGCryptoServiceProvider to generate a unique set of characters. Use a few constant strings holding a range of characters like "a" to "z", "A" to "Z", and "1" to "9". Save the randomly generated mixed-case alphanumeric string with the original URL and UserID.

Generate a random token and save the link and user ID in the DB for this token. It is secure enough.
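The random-token idea from the two answers above could look roughly like this. It is only a sketch: TokenGenerator and NewToken are hypothetical names, the 62-character alphabet (including "0") is an assumption, and RandomNumberGenerator.Create() stands in for RNGCryptoServiceProvider.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: names and alphabet are assumptions for illustration.
public static class TokenGenerator
{
    private const string Alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static string NewToken(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException("length");

        var token = new StringBuilder(length);
        var buffer = new byte[1];

        // Largest multiple of the alphabet size that fits in one byte (248 for 62 chars).
        int limit = 256 - (256 % Alphabet.Length);

        using (var rng = RandomNumberGenerator.Create())
        {
            while (token.Length < length)
            {
                rng.GetBytes(buffer);
                // Reject values at or above the limit so every character is equally likely.
                if (buffer[0] >= limit) continue;
                token.Append(Alphabet[buffer[0] % Alphabet.Length]);
            }
        }
        return token.ToString();
    }
}
```

A five-character token such as the "15qq6" in the question would come from NewToken(5). The token itself carries no information, so it would be saved next to the link ID and user ID (ideally in a column with a unique constraint, regenerating on the rare collision).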

If you don't need encryption, then a simple combination of Convert.ToBase64String and BitConverter.GetBytes will give you a reasonable string. Note that Base64 uses some characters that are not URL-safe, so consider replacing them in the result (see Wikipedia: Base64) or using Base32 encoding.
int first = 1234;
int second = 789;
// Pack both values into one 64-bit number, then Base64-encode its 8 bytes.
var encoded = Convert.ToBase64String(
    BitConverter.GetBytes(((ulong)first << 32) + (ulong)second));
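To show the same packing idea with the URL-unsafe characters replaced and with a way back to the two values, here is a hedged sketch. PairToken, Encode and Decode are hypothetical names; the '-' and '_' substitutions follow the common "URL-safe Base64" convention rather than anything stated in the answer.

```csharp
using System;

// Hypothetical helper: a reversible, unencrypted token for two ints.
public static class PairToken
{
    public static string Encode(int first, int second)
    {
        // Go through uint so a negative value cannot sign-extend into the other half.
        ulong packed = ((ulong)unchecked((uint)first) << 32) | unchecked((uint)second);

        return Convert.ToBase64String(BitConverter.GetBytes(packed))
            .TrimEnd('=')          // padding is not needed for a fixed 8-byte payload
            .Replace('+', '-')     // '+' and '/' have special meaning in URLs
            .Replace('/', '_');
    }

    public static void Decode(string token, out int first, out int second)
    {
        string base64 = token.Replace('-', '+').Replace('_', '/');

        // Restore the padding that Encode stripped.
        int padding = (4 - base64.Length % 4) % 4;
        base64 = base64.PadRight(base64.Length + padding, '=');

        ulong packed = BitConverter.ToUInt64(Convert.FromBase64String(base64), 0);
        first = unchecked((int)(uint)(packed >> 32));
        second = unchecked((int)(uint)(packed & 0xFFFFFFFF));
    }
}
```

The result is always 11 characters, so it is longer than a token like "15qq6", and anyone can decode or forge it since nothing is encrypted or signed. BitConverter uses the machine's byte order, so tokens are only portable between machines with the same endianness.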

monitor html change using hash func

I want to write an application that gets a list of urls.
For each of them I need to monitor periodically if the content has changed.
I thought:
to use HtmlAgilityPack to fetch the HTML content (any other recommendation?)
I don't need to spot the change itself,
so I thought to hash the content, save it in the DB
and re-compare the hash in the future.
How would you suggest hashing? .NET's GetHashCode()?
I saw this documentation http://support.microsoft.com/kb/307020
which advises using
tmpSource = ASCIIEncoding.ASCII.GetBytes(sSourceData);
Why?

You should absolutely not use GetHashCode() for this. The documentation explicitly states:
Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework.
The results of GetHashCode can change between runs - all that's guaranteed is that calling it on two equal objects in the same process (possibly AppDomain) will give the same hash code. Indeed, String.GetHashCode's algorithm has changed over time, and in .NET 4 the 32-bit implementation is different to the 64-bit implementation.
If you want to use hashing, use MD5, SHA1 etc - something with a specified algorithm which will not change. (Note that these operate on binary data rather than string data, which is probably more appropriate too - you don't need to bother decoding the data as text.)
It's not clear to me whether refetching periodically is really the best idea though - do these servers not support last modified times, etags etc?
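A rough sketch of the hashing approach described above, working on the raw downloaded bytes rather than decoded text. ContentFingerprint, Compute and HasChanged are hypothetical names, and SHA-256 is just one choice of a fixed, specified algorithm.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: fingerprints page content so changes can be detected later.
public static class ContentFingerprint
{
    // Hash the raw response bytes and return a lowercase hex string for storage.
    public static string Compute(byte[] content)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Compare freshly fetched content against the fingerprint saved earlier.
    public static bool HasChanged(byte[] content, string storedFingerprint)
    {
        return !string.Equals(Compute(content), storedFingerprint, StringComparison.Ordinal);
    }
}
```

The bytes could come from something like WebClient.DownloadData, with the returned string saved in the database per URL. Pages that embed timestamps or other per-request values will look changed on every fetch, which this sketch does not try to handle.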

Since you asked for suggestions, I would have used this method instead:
WebClient client = new WebClient();
String htmlCode = client.DownloadString("http://google.com");
I would have saved this string in my DB and compared it against a fresh download after each interval.
But yes, I do agree the string size could be really large.
If I just wanted an alert that the content has changed somehow, I would use MD5, since an MD5 hash is only 16 bytes (32 characters when written as hex).
Hence it is easier to compare and store in the DB.

Migrate C# Hash Code to PHP

I know there are similar questions already on SO but none of them seem to address this problem. I have inherited the following c# code that has been used to create password hashes in a legacy .net app, for various reasons the C# implementation is now being migrated to php:
string input = "fred";
SHA256CryptoServiceProvider provider = new SHA256CryptoServiceProvider();
byte[] hashedValue = provider.ComputeHash(Encoding.ASCII.GetBytes(input));
string output = "";
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
foreach ( char c in asciiString ) {
    int tmp = c;
    output += String.Format("{0:x2}",
        (uint)System.Convert.ToUInt32(tmp.ToString()));
}
return output;
My php code is very simple but for the same input "fred" doesn't produce the same result:
$output = hash('sha256', "fred");
I've traced the problem down to an encoding issue - if I change this line in the C# code:
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
to
string asciiString = ASCIIEncoding.UTF7.GetString(hashedValue);
Then the php and C# output match (it yields d0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5).
Since I'm not able to change the .net code I need to work out how to replicate the results in php.
Thanks in advance for any help,

I don’t know PHP well enough to answer your question; however, I must point out that your C# code is broken. Try generating the hash of these two inputs: "âèí" and "çñÿ". You will find that their hash collides:
3f3b221c6c6e3f71223f51695d456d52223f243f3f363949443f3f763b483615
The first bug lies in this operation:
Encoding.ASCII.GetBytes(input)
This assumes that all characters within your input are US-ASCII. Any non-ASCII characters would cause the encoder to fall back to the byte value for the ? character, thereby giving (unwanted) hash collisions, as demonstrated above. Notwithstanding, this will not be an issue if your input is constrained to only allow US-ASCII characters.
The other (more severe) bug lies in the following operation:
ASCIIEncoding.ASCII.GetString(hashedValue)
ASCII only defines mappings for values 0–127. Since the elements of your hashedValue byte array may contain any byte value (0–255), encoding them as ASCII would cause data to be lost whenever a value greater than 127 is encountered. This may lead to further “unwanted” (read: potentially maliciously generated) hash collisions, even when your original input was US-ASCII.
Given that, statistically, half of the bytes constituting your hashes would be greater than 127, then you are losing at least half the strength of your hash algorithm. If a hacker gains access to your stored hashes, it is quite likely that they will manage to devise an attack to generate hash collisions by exploiting this cryptographic weakness.
Edit: Notwithstanding the considerations mentioned in my posts and Jon’s, here is the PHP code that succumbs to the same weakness – so to speak – as your C# code, and thereby gives the same hash:
$output = hash('sha256', $input, true);
for ($i = 0; $i < strlen($output); $i++)
    if ($output[$i] > chr(127))
        $output[$i] = '?';
$output = bin2hex($output);

Could you use mb_convert_encoding (see http://php.net/manual/en/function.mb-convert-encoding.php - the page also has a link to a list of supported encodings) to convert the PHP string to ASCII from UTF7?

I've traced the problem down to an encoding issue
Yes. You're trying to treat arbitrary binary data as if it's valid text-encoded data. It's not. You should not be using any Encoding here.
If you want the results in hex, the simplest approach is to use BitConverter.ToString
string text = BitConverter.ToString(hashedValue).Replace("-", "").ToLower();
And yes, as pointed out elsewhere, you probably shouldn't be using ASCII to convert the text to binary at the start of the hashing process. I'd probably use UTF-8.
It's really important that you understand the problem here though, as otherwise you'll run into it in other places too. You should only use encodings such as ASCII, UTF-8 etc (on any platform) when you've genuinely got encoded text data. You shouldn't use them for images, the results of cryptography, the results of hashing, etc.
EDIT: Okay, you say you can't change the C# code... it's not clear whether that just means you've got legacy data, or whether you need to keep using the C# code regardless. You should absolutely not run this code for a second longer than you have to.
But in PHP, you may find you can get away with just replacing every byte with a value >= 0x80 in the hash with 0x3F, which is the ASCII for "question mark". If you look through your data you'll probably find there are a lot of 3F bytes in there.
If you can get this to work, I would strongly suggest that you migrate over to the true SHA-256 hash without losing information like this. Wherever you're storing the hashes, store two: the legacy one (which is all you have now) and the rehashed one. Whenever you're asked to validate that a password is correct, you should:
1. Check whether you have a "new" one; if so, only use that - ignore the legacy one.
2. If you only have a legacy one:
   a. Hash the password in the broken way to check whether it's correct.
   b. If it is, hash it again properly and store the result in the "new" place.
Then when everyone's logged in correctly once, you'll be able to wipe out the legacy hashes.
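The migration steps above might be sketched in C# as follows. Everything named here (StoredPassword, PasswordMigration, LegacyHash, ProperHash, Verify) is hypothetical, LegacyHash is my reading of what the question's code produces (bytes of 0x80 and above become 0x3F), and the plain SHA-256 simply mirrors the thread rather than recommending it for password storage.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical storage shape: one record per user holding both hashes.
public class StoredPassword
{
    public string LegacyHash; // produced by the old, lossy code
    public string NewHash;    // null until the user next logs in successfully
}

public static class PasswordMigration
{
    private static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    // Mimics the legacy output: ASCII input, and hash bytes >= 0x80 replaced by '?' (0x3F).
    public static string LegacyHash(string password)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.ASCII.GetBytes(password));
            for (int i = 0; i < hash.Length; i++)
            {
                if (hash[i] >= 0x80) hash[i] = 0x3F;
            }
            return Hex(hash);
        }
    }

    // The lossless replacement: UTF-8 input, hash bytes written straight to hex.
    public static string ProperHash(string password)
    {
        using (var sha = SHA256.Create())
        {
            return Hex(sha.ComputeHash(Encoding.UTF8.GetBytes(password)));
        }
    }

    // Prefer the new hash; otherwise check the legacy one and upgrade on success.
    public static bool Verify(string password, StoredPassword stored)
    {
        if (stored.NewHash != null)
            return stored.NewHash == ProperHash(password);

        if (stored.LegacyHash != LegacyHash(password))
            return false;

        stored.NewHash = ProperHash(password); // the caller is expected to persist this
        return true;
    }
}
```

After a successful Verify against a legacy-only record, the caller would save the record so the next login takes the NewHash path.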

storing large data in string

I am trying to store large data (more than 255 characters) in a string datatype, but it is truncated after 255 characters. How can I achieve this? Basically, I need to pass this data to a database.

C# strings do not have any particular character limit. However the database column you are writing to may have a limit. If you are storing large amounts of data, you should use a BLOB column instead of an ordinary varchar type.

StringBuilder class
Like they said, the string class is not limited, but you can do this for large strings. I feel it handles them better.
StringBuilder sb = new StringBuilder();
sb.Append("Some text...");
sb.Append("more text...");
sb.Append("even more text!");
string result = sb.ToString();

Okay, it sounds like you have several different technologies involved - Excel, XML, databases etc. Try to tackle just one at a time. First read the data out of Excel, and make sure you can do that without any truncation.
Write a small console app which will read the value, then write it to the console - and its length. If that works, you know the problem isn't in Excel.
Next you can write a small console app with hardcoded input data (so you don't need to keep using interop with Excel) and write the XML from that, or whatever your next stage is.
Basically, take the one big problem ("when I read data from Excel and write it to the database it truncates long values") and split it into smaller and smaller ones until you've found what's wrong.

The string type does not limit strings to 255 characters. Your database column is probably limited to 255 characters.

I know that C# strings can hold much longer data than that. If the truncation occurs on committing to the DB, check the length constraint on your DB field.

The problem lies in the Excel part; .Character has a 255-character limit.
To read the complete text from a shape the following VBA syntax would do:
Worksheets("YourSheet").Shapes("Shape1").OLEFormat.Object.Text
