Converting bytes to a string in C#

I want to convert a binary file to a string that could then be converted back to the binary file.
I tried this:
byte[] byteArray = File.ReadAllBytes(@"D:\pic.png");
for (int i = 0; i < byteArray.Length; i++)
{
    textBox1.Text += (char)byteArray[i];
}
but it's too slow: it takes about 20 seconds to convert 5KB on an i5 CPU.
I noticed that notepad does the same in much less time.
Any ideas on how to do it?
Thanks

If you want to be able to convert back to binary without losing any information, you shouldn't be doing this sort of thing at all - you should use base64 encoding or something similar:
textBox1.Text = Convert.ToBase64String(byteArray);
You can then convert back using byte[] data = Convert.FromBase64String(text);. The important thing is that base64 converts arbitrary binary data to known ASCII text; all byte sequences are valid, all can be round-tripped, and as it only requires ASCII it's friendly to many transports.
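A minimal sketch of that round trip, assuming .NET 5 or later so it can be written as top-level statements. The hard-coded bytes are made up and stand in for the result of File.ReadAllBytes:

```csharp
using System;
using System.Linq;

// Made-up bytes standing in for File.ReadAllBytes(...): a PNG-like header
// plus values such as 0x00 and 0xFF that are not meaningful text.
byte[] original = { 0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x0D, 0x0A };

// Binary -> ASCII-only text, safe for a TextBox, a config file, etc.
string text = Convert.ToBase64String(original);

// Text -> exactly the same bytes again.
byte[] restored = Convert.FromBase64String(text);

Console.WriteLine(text);
Console.WriteLine(original.SequenceEqual(restored)); // True
```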
There are four important things to take away here:
Don't treat arbitrary binary data as if it were valid text in a particular encoding. Phil Haack wrote about this in a blog post recently, in response to some of my SO answers.
Don't perform string concatenation in a loop; use a StringBuilder if you want to create one final string out of lots of bits, and you don't know how many bits in advance
Don't use UI properties in a loop unnecessarily - even if the previous steps were okay, it would have been better to construct the string with a loop and then do a single assignment to the Text property
Learn about System.Text.Encoding for the situation where you really have got encoded text; Encoding.UTF8.GetString(byteArray) would have been appropriate if this had been UTF-8-encoded data, for example
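The StringBuilder, single-assignment and Encoding points can be illustrated together. This is a sketch assuming .NET 5+ top-level statements; the three sample bytes are invented, hex formatting is just one example of per-byte work, and `textBox1` appears only in a comment because there is no UI here:

```csharp
using System;
using System.Text;

byte[] byteArray = { 0x48, 0x69, 0x21 }; // made-up sample bytes

// Build the result in a StringBuilder (sized up front), not with += in a loop.
var sb = new StringBuilder(byteArray.Length * 2);
foreach (byte b in byteArray)
{
    sb.Append(b.ToString("X2")); // two hex digits per byte
}
string hex = sb.ToString();

// Assign to the UI once, after the loop, e.g.:
// textBox1.Text = hex;

// If the bytes really are UTF-8 encoded text, decode them in a single call.
string decoded = Encoding.UTF8.GetString(byteArray);

Console.WriteLine(hex);     // 486921
Console.WriteLine(decoded); // Hi!
```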


Persisting text as byte array to flat file

Update
This is to hide my public key on the client side to make it slightly more difficult for prying eyes to get to it. Yes I know the intent of the public key is to stay public but I am trying to mitigate key substitution attacks and in addition to obfuscation, assembly merging, assembly signing and some other measures this is a part of the overall strategy. So in my code that gets shipped to the client side I want to be able to do something like this
string publicKey = @"random characters"; // don't want this in the code
byte[] keyBytes = [............]; // this should be in the code
I am not quite sure how to take the text of my public key, convert it to a byte array and then use that in the client code to convert it back into the public key.
I reckon there is an easy way to do this but I am going round in circles trying to figure it out.
Essentially I have a series of text data which I want to be able to save as bytes to a flat file. I can read it as a byte array but when I use a BinaryWriter to write that byte array to a file I end up with the original text.
Can someone please help?
Like every file, a text file is a binary file.
It just happens that in this case every byte value corresponds to a character, so when you open the file in a text editor, you see readable text.
Obligatory reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
You could create a little converter for this purpose.
string publicKey = "AsdfsSDhysffsdfsdfZ09";
Console.Write("byte[] keyBytes = { ");
Console.Write(String.Join(", ", Array.ConvertAll(Encoding.ASCII.GetBytes(publicKey), b => String.Format("0x{0:X2}", b))));
Console.WriteLine("};");
Run it. Then just copy the last line of the output to your source code.
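On the client side, the pasted array can be turned back into the key with the matching Encoding.ASCII.GetString call. A sketch assuming .NET 5+ top-level statements; the byte values are the ASCII codes of the sample key used in the converter:

```csharp
using System;
using System.Text;

// Pasted from the converter's output: ASCII codes of "AsdfsSDhysffsdfsdfZ09".
byte[] keyBytes = { 0x41, 0x73, 0x64, 0x66, 0x73, 0x53, 0x44, 0x68, 0x79, 0x73, 0x66,
                    0x66, 0x73, 0x64, 0x66, 0x73, 0x64, 0x66, 0x5A, 0x30, 0x39 };

// Decode with the same encoding that produced the bytes.
string publicKey = Encoding.ASCII.GetString(keyBytes);
Console.WriteLine(publicKey); // AsdfsSDhysffsdfsdfZ09
```

As the question itself acknowledges, this only makes the key slightly harder to spot: anyone with a decompiler can run the same call.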

How to decode (shifting and xoring) a massive byte array in a fast way?

I need it for a file viewer application that opens the archive file, decodes the files inside and displays them to the user. The files are encrypted with a byte shifting and xoring system. It is impossible for me to change the algorithm. Currently, I just read all the bytes and then run the Decode function on them.
The decode function that I currently use:
byte[] DecodeVOQ(byte[] EncodedBytes)
{
    for (int i = 0; i < EncodedBytes.Length; i++)
    {
        EncodedBytes[i] ^= (byte)194;
        EncodedBytes[i] = (byte)((EncodedBytes[i] << 4) | (EncodedBytes[i] >> 4));
    }
    return EncodedBytes;
}
Edit: I found out that the real performance problem is with displaying the text. Reading + Decoding is pretty fast.
One possible optimization would be to precompute the output for any input byte. So you'd have:
private static byte[] DecodedBytes = PrecomputeDecodedBytes();

public static byte[] DecodeVOQ(byte[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = DecodedBytes[data[i]];
    }
    return data;
}
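PrecomputeDecodedBytes is not shown in the answer. One straightforward way to fill it in is to run the original XOR-and-swap once on every possible byte value. A sketch assuming .NET 5+ top-level statements; the helper body and the local `decodedBytes` name are assumptions:

```csharp
using System;

// Hypothetical body for PrecomputeDecodedBytes: run the original decode step
// (XOR with 194, then swap the two 4-bit halves) once for every byte value.
static byte[] PrecomputeDecodedBytes()
{
    var table = new byte[256];
    for (int i = 0; i < 256; i++)
    {
        int b = i ^ 194;
        table[i] = (byte)((b << 4) | (b >> 4)); // cast keeps the low 8 bits
    }
    return table;
}

byte[] decodedBytes = PrecomputeDecodedBytes();

// Table-driven decode, in place: one array lookup per byte.
byte[] DecodeVOQ(byte[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = decodedBytes[data[i]];
    }
    return data;
}

Console.WriteLine(BitConverter.ToString(DecodeVOQ(new byte[] { 0xC2, 0x00 }))); // 00-2C
```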
It's quite possible that that will be slower than your existing bitshifting algorithm though. EDIT: I've just tried comparing this with the original bitshift but using a temporary local variable: they're about the same.
Have you benchmarked the current performance? Is it definitely too slow? In particular, loading the file from just about any storage medium will be much much slower than the cost of decoding. I've just tried this on my laptop - for 200MB of data, it takes about half a second. (EDIT: With Marcelo's answer, it takes under half a second.) Is that really too slow?
Would you be happy to use more than one processor? It's an embarrassingly parallelizable routine, after all. If you're using .NET 4, the TPL may well make this pretty simple.
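A sketch of the TPL idea, assuming .NET 4 or later for Parallel and Partitioner (and .NET 5+ for the top-level statement layout). DecodeVOQParallel is a hypothetical name. Range partitioning hands each worker a contiguous slice, so the per-byte work stays a plain loop:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical parallel variant of DecodeVOQ: same XOR-and-nibble-swap,
// but the index range is split into contiguous chunks, one per task.
static void DecodeVOQParallel(byte[] data)
{
    if (data.Length == 0) return; // Partitioner.Create rejects an empty range

    Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            int b = data[i] ^ 194;
            data[i] = (byte)((b << 4) | (b >> 4));
        }
    });
}

var buffer = new byte[16 * 1024 * 1024];
new Random(1).NextBytes(buffer);
DecodeVOQParallel(buffer);
Console.WriteLine("decoded " + buffer.Length + " bytes");
```

Whether this beats the single-threaded loop depends on buffer size and core count; for small buffers the scheduling overhead can dominate, so it is worth measuring.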
I should emphasize again though that this isn't "encryption" - it's a mild form of obfuscation, in the same way that the base-64 encoding of a username/password for basic HTTP authentication is.
I'm thinking a table-driven approach would be faster, right? Since it's just bytes, and no byte depends on an adjacent byte, there are only 256 possible inputs, so just look up each one.
You might speed things up by using a temporary:
byte b = (byte)(EncodedBytes[i] ^ 194);
EncodedBytes[i] = (byte)((b << 4) | (b >> 4));
You might speed things up further by using unsafe and raw pointers, thus avoiding checked accesses (though I don't know if that's a consideration with current JIT optimisers).
One approach to consider is to decode the data just as it's being displayed. That is, only decode a portion at a time. But I suspect you are just dumping the data to an edit control or something, which would not really make that an option. How are you displaying the data?
Other than that, I'm not sure how you'd speed it up too much.
Without .NET you'd be able to decode 4 bytes at a time, but here the only thing you can really do is precompute a translation table.
That's not xor, that's shift and or...
In assembly, that would be a single "rotate byte by 4" instruction.
BTW can't you decode it on demand? Decode the file in blocks, as you stream the file.
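Decoding on demand could look roughly like this: a sketch that decodes a Stream block by block, assuming .NET 5+ top-level statements. DecodeStream and the 64 KB default block size are illustrative choices, and a MemoryStream stands in for the archive file:

```csharp
using System;
using System.IO;

// Hypothetical helper: decode from one stream to another, one block at a time.
// Each output byte depends only on the matching input byte, so block
// boundaries need no special handling.
static void DecodeStream(Stream input, Stream output, int blockSize = 64 * 1024)
{
    var buffer = new byte[blockSize];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            int b = buffer[i] ^ 194;
            buffer[i] = (byte)((b << 4) | (b >> 4));
        }
        output.Write(buffer, 0, read); // a viewer could display this block instead
    }
}

// A MemoryStream stands in for File.OpenRead(...) on the archive.
using var source = new MemoryStream(new byte[] { 0xC2, 0x00, 0x2C });
using var target = new MemoryStream();
DecodeStream(source, target, blockSize: 2); // tiny blocks, just to cross a boundary
Console.WriteLine(BitConverter.ToString(target.ToArray())); // 00-2C-EE
```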

parsing binary file in C#

I have a binary file that I have stored in a byte array. The file size can be 20MB or more. I then want to parse it or find a particular value in it. I am doing this in two ways:
1. By converting the full file to a char array.
2. By converting the full file to a hex string (I also have the hex values).
What is the best way to parse the full file, or should I do it in binary form? I am using VS 2005.
From the aspect of memory consumption, it would be best if you could parse it directly, on the fly.
Converting it to a char array in C# effectively doubles its size in memory (presuming you convert each byte to a char), while a hex string will take at least 4 times the size (each byte needs two hex digits, and C# chars are 16-bit Unicode characters).
On the other hand, if you need to search and parse an existing set of data repeatedly, you may benefit from storing it in whatever form suits your needs better.
What's stopping you from searching in the byte[]?
IMHO, if you're simply searching for a byte of a specified value, or for several contiguous bytes, this is the easiest and most efficient way to do it.
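A minimal sketch of searching the byte[] directly, written as .NET 5+ top-level statements (the loop itself would also compile under VS 2005). The IndexOf helper is hypothetical, not a framework method, and the sample bytes are invented:

```csharp
using System;

// Hypothetical helper: index of the first occurrence of pattern in data
// at or after startIndex, or -1. Works directly on the bytes.
static int IndexOf(byte[] data, byte[] pattern, int startIndex = 0)
{
    if (pattern.Length == 0) return startIndex;
    for (int i = startIndex; i <= data.Length - pattern.Length; i++)
    {
        int j = 0;
        while (j < pattern.Length && data[i + j] == pattern[j]) j++;
        if (j == pattern.Length) return i;
    }
    return -1;
}

byte[] fileBytes = { 0x00, 0x01, 0xCA, 0xFE, 0xBA, 0xBE, 0x02 };

// A single byte value: the built-in Array.IndexOf is enough.
Console.WriteLine(Array.IndexOf(fileBytes, (byte)0xCA));          // 2
// Several contiguous bytes: use the helper.
Console.WriteLine(IndexOf(fileBytes, new byte[] { 0xBA, 0xBE })); // 4
```

On .NET Core 2.1 and later, the span-based MemoryExtensions.IndexOf overload that takes a ReadOnlySpan<T> value performs the same sequence search without a hand-written loop.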
If I understood your question correctly, you need to find strings, which can contain any characters, in a large binary file. Does the binary file contain text? If so, do you know the encoding? If so, you can use the StreamReader class like so:
using (StreamReader sr = new StreamReader(@"C:\test.dat", System.Text.Encoding.UTF8))
{
    string s = sr.ReadLine();
}
In any case I think it's much more efficient using some kind of stream access to the file, instead of loading it all to memory.
You could load it into memory in chunks and then use a pattern-matching algorithm (like Knuth-Morris-Pratt or Karp-Rabin).
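Knuth-Morris-Pratt fits chunked reading well because its match state carries over from one chunk to the next, so a match that straddles a chunk boundary is still found. A sketch, assuming .NET 5+ top-level statements; FindAll and the default chunk size are illustrative, and a MemoryStream stands in for the 20MB file (a FileStream works the same way):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: Knuth-Morris-Pratt search over a stream, read in chunks.
// The match state q survives across chunks, so boundary-straddling matches count.
static List<long> FindAll(Stream stream, byte[] pattern, int chunkSize = 64 * 1024)
{
    if (pattern.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    var fail = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) k++;
        fail[i] = k;
    }

    var matches = new List<long>();
    var buffer = new byte[chunkSize];
    long offset = 0; // stream position of buffer[0]
    int q = 0;       // number of pattern bytes matched so far
    int read;
    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            while (q > 0 && buffer[i] != pattern[q]) q = fail[q - 1];
            if (buffer[i] == pattern[q]) q++;
            if (q == pattern.Length)
            {
                matches.Add(offset + i - pattern.Length + 1);
                q = fail[q - 1]; // keep going so overlapping matches are found
            }
        }
        offset += read;
    }
    return matches;
}

using var ms = new MemoryStream(Encoding.ASCII.GetBytes("abababa"));
List<long> hits = FindAll(ms, Encoding.ASCII.GetBytes("aba"), chunkSize: 2);
Console.WriteLine(string.Join(", ", hits)); // 0, 2, 4
```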

Printing time to binary file in C# .net

I have the following problem.
I have a binary file to which I write vital system data.
One of the fields is the time, for which I use DateTime.Now.ToString("HHmmssffffff"), a format with microsecond resolution. I convert this string with ToCharArray (I checked it in the debugger and it is fine: it contains the time, valid down to the microsecond).
Then I write it and flush it to the file. When I open the file in PsPad, which translates binary to ASCII, I see that the data in this field and in another one is corrupted, but the rest of the message is fine.
The code:
void Write(string strData) {
    char[] cD = strData.ToCharArray();
    bw.Write(cD); // bw is of type BinaryWriter
    bw.Flush();
}
You're writing out Unicode characters, not ASCII bytes: BinaryWriter.Write(char[]) encodes the characters with the writer's own encoding. If you want ASCII bytes, convert the string explicitly with the Encoding class:
byte[] data = Encoding.ASCII.GetBytes(strData);
bw.Write(data);
I strongly recommend reading Joel Spolsky's article on character sets and encoding. It may help you understand why your current code is not working properly.
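Put together, the suggested fix looks roughly like this: a sketch assuming .NET 5+ top-level statements, with a MemoryStream standing in for the real file:

```csharp
using System;
using System.IO;
using System.Text;

// A MemoryStream stands in for the real log file.
using var stream = new MemoryStream();
using var bw = new BinaryWriter(stream);

string strData = DateTime.Now.ToString("HHmmssffffff"); // 12 digits, down to microseconds
byte[] data = Encoding.ASCII.GetBytes(strData);          // exactly one byte per digit
bw.Write(data); // Write(byte[]) writes the raw bytes, with no length prefix
bw.Flush();

Console.WriteLine(stream.Length); // 12
```

Note that BinaryWriter.Write(string) would not be a drop-in replacement here: it writes a length prefix before the encoded characters.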

Is there a better way to convert to ASCII from an arbitrary input?

I need to be able to take an arbitrary text input that may have a byte order mark (BOM) on it to mark its encoding, and output it as ASCII. We have some old tools that don't understand BOMs and I need to send them ASCII-only data.
Now, I just got done writing this code and I just can't quite believe the inefficiency here. Four copies of the data, not to mention any intermediate buffers internally in StreamReader. Is there a better way to do this?
// i_fileBytes is an incoming byte[]
string unicodeString = new StreamReader(new MemoryStream(i_fileBytes)).ReadToEnd();
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeString.ToCharArray());
byte[] ansiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, unicodeBytes);
string ansiString = Encoding.ASCII.GetString(ansiBytes);
I need the StreamReader() because it has an internal BOM detector to choose the encoding to read the rest of the file. Then the rest is just to make it convert into the final ASCII string.
Is there a better way to do this?
If you've got i_fileBytes in memory already, you can just check whether or not it starts with a BOM, and then convert either the whole of it or just the bit after the BOM using Encoding.Unicode.GetString. (Use the overload which lets you specify an index and length.)
So as code:
int start = (i_fileBytes.Length >= 2 && i_fileBytes[0] == 0xff && i_fileBytes[1] == 0xfe) ? 2 : 0;
string text = Encoding.Unicode.GetString(i_fileBytes, start, i_fileBytes.Length-start);
Note that that assumes a genuinely little-endian UTF-16 encoding, however. If you really need to detect the encoding first, you could either reimplement what StreamReader does, or perhaps just build a StreamReader from the first (say) 10 bytes, and use the CurrentEncoding property to work out what you should use for the encoding.
EDIT: Now, as for the conversion to ASCII - if you really only need it as a .NET string, then presumably all you want to do is replace any non-ASCII characters with "?" or something similar. (Alternatively it might be better to throw an exception... that's up to you, of course.)
EDIT: Note that when detecting the encoding, it would be a good idea to just call Read() a single time to read one character. Don't call ReadToEnd() as by picking 10 bytes as an arbitrary amount of data, it might end mid-character. I don't know offhand whether that would throw an exception, but it has no benefits anyway...
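Combining the suggestions in this answer (BOM detection from a short prefix with a single Read() call, decoding in place with GetString, then forcing ASCII), a sketch might look like this. It assumes .NET 5+ top-level statements; DetectEncoding, ToAscii and the 16-byte probe size are illustrative choices, and the '?' substitution is Encoding.ASCII's default behaviour rather than something the code configures:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: let StreamReader detect a BOM from the first few bytes
// only, reading a single character. 16 bytes is an arbitrary probe size,
// enough for any BOM plus at least one character.
static Encoding DetectEncoding(byte[] bytes)
{
    int probe = Math.Min(bytes.Length, 16);
    using var reader = new StreamReader(new MemoryStream(bytes, 0, probe), Encoding.UTF8, true);
    reader.Read();
    return reader.CurrentEncoding; // the UTF-8 fallback when there is no BOM
}

// Hypothetical helper: decode with the detected encoding, skipping the BOM
// only if it is really present, then reduce the text to ASCII.
static string ToAscii(byte[] bytes)
{
    Encoding encoding = DetectEncoding(bytes);
    byte[] bom = encoding.GetPreamble();
    int start = bytes.Take(bom.Length).SequenceEqual(bom) ? bom.Length : 0;
    string text = encoding.GetString(bytes, start, bytes.Length - start);

    // Encoding.ASCII replaces every non-ASCII character with '?' by default.
    return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
}

byte[] utf16WithBom = Encoding.Unicode.GetPreamble()
    .Concat(Encoding.Unicode.GetBytes("na\u00EFve caf\u00E9"))
    .ToArray();
Console.WriteLine(ToAscii(utf16WithBom)); // na?ve caf?
```

To throw instead of substituting '?', the final step could use an encoding created with Encoding.GetEncoding("us-ascii", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).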
System.Text.Encoding.ASCII.GetBytes(new StreamReader(new MemoryStream(i_fileBytes)).ReadToEnd())
That should save a few round-trips.
