Persisting text as byte array to flat file - c#

Update
This is to hide my public key on the client side to make it slightly more difficult for prying eyes to get to it. Yes I know the intent of the public key is to stay public but I am trying to mitigate key substitution attacks and in addition to obfuscation, assembly merging, assembly signing and some other measures this is a part of the overall strategy. So in my code that gets shipped to the client side I want to be able to do something like this
string publicKey = #"random characters" //don't want this in the code
byte[] keyBytes = [............] //this should be in the code
I am not quite sure how do I take the text of my public key, convert it to a byte array and then use that in the client code to convert back into public key.
I reckon there is an easy way to do this but I am going round in circles trying to figure it out.
Essentially I have a series of text data which I want to be able to save as bytes to a flat file. I can read it as a byte array but when I use a BinaryWriter to write that byte array to a file I end up with the original text.
Can someone please help?

Just as every file, a text file is a binary file.
It just happens to be that in this case every binary number corresponds with a character, so when you open the file in a text editor, you see readable text.
Obligatory The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

You could create a little converter for this purpose.
string publicKey = "AsdfsSDhysffsdfsdfZ09";
Console.Write("byte[] keyBytes = { ");
Console.Write(String.Join(", ", Array.ConvertAll(Encoding.ASCII.GetBytes(publicKey), b => String.Format("0x{0:X2}", b))));
Console.WriteLine("};");
Run it. Then just copy the last line of the output to your source code.

Related

Obfuscating/Randomizing a String

How would I go about obfuscating text so it's unreadable to a user reading a text file, but my program could still read it? Basically, I'm going to have something like True*True*False*True*False*False*False*True*False*true* in a text file, and I need it to be all crazy looking.
I know how to get the text from the file and write to the file and all that stuff, I just need to figure out how to obfuscate the string and de-obfuscate it. Is this possible without getting into all crazy encryption stuff? I think AES and other encryption methods are overkill because in my program, this info isn't top secret or something, it can be viewed in the program anyways. I just don't want it edited directly through the file.
Thanks a bunch :D
Nathan
Is this possible without getting into all crazy encryption stuff?
Sure, but if user even remotely knows what he's doing he will be able to decode it with no problem.
// Encode
var bytes = Encoding.UTF8.GetBytes("true*false*true");
var base64 = Convert.ToBase64String(bytes);
// Decode
var data = Convert.FromBase64String(base64);
var decodedString = Encoding.UTF8.GetString(data); // get string and not bytes, thanks trope

Why binary data writing's result is string?

I want to create a binary file and store string data in it, I used this sample:
FileStream fs = new FileStream("c:\\test.data", FileMode.Create);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(Encoding.ASCII.GetBytes("david stein"));
bw.Close();
but when I opened created file by this sample (test.data) in notepad, it has string data in it ("david stein"), now my question is that whats the difference between this binary writing and text writing when the result is string?
I'm looking to create a data in binary file until user can not open and read my data by note pad and if user open it in notepad see real binary data .
in some files when you open theme in text editors you can not read file content like jpg files contents,they do not use any encryption methods,what about it?how can i wite my data like this?
now my question is that whats the difference between this binary writing and text writing when the result is string?
The data in a file is always "a sequence of bytes". In this case, the sequence of bytes you've written is "the bytes representing the text 'david stein'" in the ASCII encoding. So yes, if you open the file in any editor which tries to interpret the bytes as text in a way which is compatible with ASCII, you'll see the text "david stein". Really it's just a load of bytes though - it all depends on how you interpret them.
If you'd written:
File.WriteAllText("c:\\test.data", "david stein", Encoding.ASCII);
you'd have ended up with the exact same sequence of bytes. There are any number of ways you could have created a file with the same bytes in. There's nothing about File.WriteAllText which "marks" the file as text, and there's nothing about FileStream or BinaryWriter which marks the file as binary.
EDIT: From comments:
I'm looking to create a data in binary file until user can not open and read my data by note pad
Well, there are lots of ways of doing that with different levels of security. Ideally, you'd want some sort of encryption - but then the code reading the data would need to be able to decrypt it as well, which means it would need to be able to get a key. That then moves the question to "how do I store a key".
If you only need to verify the data in the file (e.g. check that it matches something from elsewhere) then you could use a cryptographic hash instead.
If you only need to prevent the most casual of snoopers, you could use something which is basically just obfuscation - a very weak form of encryption with no "key" as such. Anyone who dceompiled your code would easily be able to get at the data in that case, but you may not care too much about that.
It all depends on your requirements.
All data is binary. A text file is binary data that happens to be a limited subset that represent valid characters, but it's still binary.
The way text editors typically differentiate a text file from a binary file is they scan a certain portion of the file for zero values, \0. These never exist in text-only files and almost always exist in binary files.

Migrate C# Hash Code to PHP

I know there are similar questions already on SO but none of them seem to address this problem. I have inherited the following c# code that has been used to create password hashes in a legacy .net app, for various reasons the C# implementation is now being migrated to php:
string input = "fred";
SHA256CryptoServiceProvider provider = new SHA256CryptoServiceProvider();
byte[] hashedValue = provider.ComputeHash(Encoding.ASCII.GetBytes(input));
string output = "";
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
foreach ( char c in asciiString ) {
int tmp = c;
output += String.Format("{0:x2}",
(uint)System.Convert.ToUInt32(tmp.ToString()));
}
return output;
My php code is very simple but for the same input "fred" doesn't produce the same result:
$output = hash('sha256', "fred");
I've traced the problem down to an encoding issue - if I change this line in the C# code:
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
to
string asciiString = ASCIIEncoding.UTF7.GetString(hashedValue);
Then the php and C# output match (it yields d0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5).
Since I'm not able to change the .net code I need to work out how to replicate the results in php.
Thanks in advance for any help,
I don’t know PHP well enough to answer your question; however, I must point out that your C# code is broken. Try generating the hash of these two inputs: "âèí" and "çñÿ". You will find that their hash collides:
3f3b221c6c6e3f71223f51695d456d52223f243f3f363949443f3f763b483615
The first bug lies in this operation:
Encoding.ASCII.GetBytes(input)
This assumes that all characters within your input are US-ASCII. Any non-ASCII characters would cause the encoder to fall back to the byte value for the ? character, thereby giving (unwanted) hash collisions, as demonstrated above. Notwithstanding, this will not be an issue if your input is constrained to only allow US-ASCII characters.
The other (more severe) bug lies in the following operation:
ASCIIEncoding.ASCII.GetString(hashedValue)
ASCII only defines mappings for values 0–127. Since the elements of your hashedValue byte array may contain any byte value (0–255), encoding them as ASCII would cause data to be lost whenever a value greater than 127 is encountered. This may lead to further “unwanted” (read: potentially maliciously generated) hash collisions, even when your original input was US-ASCII.
Given that, statistically, half of the bytes constituting your hashes would be greater than 127, then you are losing at least half the strength of your hash algorithm. If a hacker gains access to your stored hashes, it is quite likely that they will manage to devise an attack to generate hash collisions by exploiting this cryptographic weakness.
Edit: Notwithstanding the considerations mentioned in my posts and Jon’s, here is the PHP code that succumbs to the same weakness – so to speak – as your C# code, and thereby gives the same hash:
$output = hash('sha256', $input, true);
for ($i = 0; $i < strlen($output); $i++)
if ($output[$i] > chr(127))
$output[$i] = '?';
$output = bin2hex($output);
Could you use mb_convert_encoding (see http://php.net/manual/en/function.mb-convert-encoding.php - the page also has a link to a list of supported encodings) to convert the PHP string to ASCII from UTF7?
I've traced the problem down to an encoding issue
Yes. You're trying to treat arbitrary binary data as if it's valid text-encoded data. It's not. You should not be using any Encoding here.
If you want the results in hex, the simplest approach is to use BitConverter.ToString
string text = BitConverter.ToString(hashedValue).Replace("-", "").ToLower();
And yes, as pointed out elsewhere, you probably shouldn't be using ASCII to convert the text to binary at the start of the hashing process. I'd probably use UTF-8.
It's really important that you understand the problem here though, as otherwise you'll run into it in other places too. You should only use encodings such as ASCII, UTF-8 etc (on any platform) when you've genuinely got encoded text data. You shouldn't use them for images, the results of cryptography, the results of hashing, etc.
EDIT: Okay, you say you can't change the C# code... it's not clear whether that just means you've got legacy data, or whether you need to keep using the C# code regardless. You should absolutey not run this code for a second longer than you have to.
But in PHP, you may find you can get away with just replacing every byte with a value >= 0x80 in the hash with 0x3F, which is the ASCII for "question mark". If you look through your data you'll probably find there are a lot of 3F bytes in there.
If you can get this to work, I would strongly suggest that you migrate over to the true MD5 hash without losing information like this. Wherever you're storing the hashes, store two: the legacy one (which is all you have now) and the rehashed one. Whenever you're asked to validate that a password is correct, you should:
Check whether you have a "new" one; if so, only use that - ignore the legacy one.
If you only have a legacy one:
Hash the password in the broken way to check whether it's correct
If it is, hash it again properly and store the results in the "new" place.
Then when everyone's logged in correctly once, you'll be able to wipe out the legacy hashes.

parsing binary file in C#

I have a binary file. i stored it in byte array. file size can be 20MB or more. then i want to parse or find particular value in the file. i am doing it by 2 ways ->
1. By converting full file in char array.
2. By converting full file in hex string.(i also have hex values)
what is best way to parse full file..or should i do in binary form. i am using vs-2005.
From the aspect of memory consumption, it would be best it you could parse it directly, on-the-fly.
Converting it to a char array in C# means effectively doubling it's size in memory (presuming you are converting each byte to a char), while hex string will take at least 4 times the size (C# chars are 16-bit unicode characters).
On the other hand, it you need to make many searches and parsing over an existing set of data repeatedly, you may benefit from having it stored in any form which suits your needs better.
What's stopping you from seaching in the byte[]?
IMHO, If you're simply searching for a byte of specified value, or several continous bytes, this is the easiest way and most efficient way to do it.
If I understood your question correctly you need to find strings which can contain any characters in a large binary file. Does the binary file contain text? If so do you know the encoding? If so you can use StreamReader class like so:
using (StreamReader sr = new StreamReader("C:\test.dat", System.Text.Encoding.UTF8))
{
string s = sr.ReadLine();
}
In any case I think it's much more efficient using some kind of stream access to the file, instead of loading it all to memory.
You could load it by chunks into the memory, and then use some pattern matching algorithm (like Knuth-Moris-Pratt or Karp-Rabin)

How to write a file format handler

Today i'm cutting video at work (yea me!), and I came across a strange video format, an MOD file format with an companion MOI file.
I found this article online from the wiki, and I wanted to write a file format handler, but I'm not sure how to begin.
I want to write a file format handler to read the information files, has anyone ever done this and how would I begin?
Edit:
Thanks for all the suggestions, I'm going to attempt this tonight, and I'll let you know. The MOI files are not very large, maybe 5KB in size at most (I don't have them in front of me).
You're in luck in that the MOI format at least spells out the file definition. All you need to do is read in the file and interpret the results based on the file definition.
Following the definition, you should be able to create a class that could read and interpret a file which returns all of the file format definitions as properties in their respective types.
Reading the file requires opening the file and generally reading it on a byte-by-byte progression, such as:
using(FileStream fs = File.OpenRead(path-to-your-file)) {
while(true) {
int b = fs.ReadByte();
if(b == -1) {
break;
}
//Interpret byte or bytes here....
}
}
Per the wiki article's referenced PDF, it looks like someone already reverse engineered the format. From the PDF, here's the first entry in the format:
Hex-Address: 0x00
Data Type: 2 Byte ASCII
Value (Hex): "V6"
Meaning: Version
So, a simplistic implementation could pull the first 2 bytes of data from the file stream and convert to ASCII, which would provide a property value for the Version.
Next entry in the format definition:
Hex-Address: 0x02
Data Type: 4 Byte Unsigned Integer
Value (Hex):
Meaning: Total size of MOI-file
Interpreting the next 4 bytes and converting to an unsigned int would provide a property value for the MOI file size.
Hope this helps.
If the files are very large and just need to be streamed in, I would create a new reader object that uses an unmanagedmemorystream to read the information in.
I've done a lot of different file format processing like this. More recently, I've taken to making a lot of my readers more functional where reading tends to use 'yield return' to return read only objects from the file.
However, it all depends on what you want to do. If you are trying to create a general purpose format for use in other applications or create an API, you probably want to conform to an existing standard. If however you just want to get data into your own application, you are free to do it however you want. You could use a binaryreader on the stream and construct the information you need within your app, or get the reader to return objects representing the contents of the file.
The one thing I would recommend. Make sure it implements IDisposable and you wrap it in a using!

Categories

Resources