String back into byte array - not working - c#

I have a stream that gets converted into a byte array.
I then take that bye array and turn it into a string.
When I try to turn that string back into a byte array it is not correct...see the code below.
private void Parse(Stream stream, Encoding encoding)
{
// Read the stream into a byte array
byte[] allData = ToByteArray(stream);
// Copy to a string for header parsing
string allContent = encoding.GetString(allData);
//This does not convert back right - just for demo purposes, not how the code is used
allData = encoding.GetBytes(allContent);
}
private byte[] ToByteArray(Stream stream)
{
byte[] buffer = new byte[32768];
using (MemoryStream ms = new MemoryStream())
{
while (true)
{
int read = stream.Read(buffer, 0, buffer.Length);
if (read <= 0)
return ms.ToArray();
ms.Write(buffer, 0, read);
}
}
}

Without having more information, I'm quite certain that this is a text encoding issue. Most likely, the text encoding in the stream is different than the encoding specified as your parameter. This will result in different values at the byte level.
Here's a few good articles that explains why you're seeing what you're seeing.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
How to Determine Text File Encoding
General questions, relating to UTF or Encoding Form

I think changing the ToByteArray method to use a StreamReader that matches the encoding should work in this case, although without seeing more of the code I can't be certain.
private byte[] ToByteArray(Stream stream, System.Text.Encoding encoding)
{
using(var sr = new StreamReader(stream, encoding))
{
return encoding.GetBytes(sr.ReadToEnd());
}
}
EDIT
Since you're working with image data, you should use Convert.ToBase64String to convert the byte[] to a string. You can then use Convert.FromBase64String decode to convert back into a byte[]. The reason encoding.GetBytes doesn't work is because there may be some data in the byte[] that cannot be represented as a string for that encoding.
private void Parse(Stream stream, Encoding encoding)
{
byte[] allData = ToByteArray(stream);
string allContent = Convert.ToBase64String(allData);
allData = Convert.FromBase64String(allContent);
}

Related

Prevent File.WriteAllText to write double Byte-Order Mark (BOM)?

In the following example, if (string) text starts with a BOM, File.writeAllText() will add another one, writing two BOMs.
I want to write the text two times:
Without a BOM at all
With a single BOM (if applicable to the encoding)
What is the canonical way to achieve this?
HttpWebResponse response = ...
Byte[] byte = ... // bytes from response possibly including BOM
var encoding = Encoding.GetEncoding(
response.get_CharacterSet(),
new EncoderExceptionFallback(),
new DecoderExceptionFallback()
);
string text = encoding.GetString(bytes); // will preserve BOM if any
System.IO.File.WriteAllText(fileName, text, encoding);
You are decoding and then reencoding the file... It is quite useless.
Inside the Encoding class there is a GetPreamble() method that returns the preamble (called BOM for utf-* encodings), in a byte[]. Then we can check if the bytes array received has already this prefix or not. Then based on this information we can write the two versions of the file, adding or removing the prefix when necessary.
var encoding = Encoding.GetEncoding(response.CharacterSet, new EncoderExceptionFallback(), new DecoderExceptionFallback());
// This will throw if the file is wrongly encoded
encoding.GetCharCount(bytes);
byte[] preamble = encoding.GetPreamble();
bool hasPreamble = bytes.Take(preamble.Length).SequenceEqual(preamble);
if (hasPreamble)
{
File.WriteAllBytes("WithPreambleFile.txt", bytes);
using (var fs = File.OpenWrite("WithoutPreamble.txt"))
{
fs.Write(bytes, preamble.Length, bytes.Length - preamble.Length);
}
}
else
{
File.WriteAllBytes("WithoutPreambleFile.txt", bytes);
using (var fs = File.OpenWrite("WithPreamble.txt"))
{
fs.Write(preamble, 0, preamble.Length);
fs.Write(bytes, 0, bytes.Length);
}
}

Read memory mapped file to string

I am trying to read data from a memory mapped file, which is written to the memory file from a C++ program. I am able to use the debug method and write the data as a string from a loop. However, I want to convert the byte array to a usable string that I can then manipulate.
using (MemoryMappedFile mmf = MemoryMappedFile.OpenExisting("DataFile"))
{
using (MemoryMappedViewAccessor reader = mmf.CreateViewAccessor())
{
var bytes = new byte[reader.Capacity];
reader.ReadArray<byte>(0, bytes, 0, bytes.Length);
for(int i = 0; i<bytes.Length; i++)
{
System.Diagnostics.Debug.Write((char) bytes[i]);
}
}
}
I've tried removing the for loop and replacing it with a GetString() encoder, but it only returns a question mark character instead of the full data string.
So long as you know the encoding the bytes were written with this is easy, you would use for example the System.Text.UTF8Encoding class (probably the System.Text.Encoding.UTF8 instance) and the GetString method:
string str = System.Text.Encoding.UTF8.GetString(bytes, 0, bytes.Length);
If you don't know the encoding this becomes much harder, you would have to try and use a heuristic method and guess what the encoding actually is.

Why does UTF8 encoding change/corrupt bytes as oppose to Base64 and ASCII, when writing to file?

I am writing an application, which would receive encrypted byte array, consisting of file name and file bytes, with the following protocol: file_name_and_extension|bytes. Byte array is then decrypted and passing into Encoding.UTF8.getString(decrypted_bytes) would be preferable, because I would like to trim file_name_and_extension from the received bytes to save actual file bytes into file_name_and_extension.
I simplified my application, to only receive file bytes which are then passed into Encoding.UTF8.GetString() and back into byte array with Encoding.UTF8.getBytes(). After that, I am trying to write a zip file, but the file is invalid. It works when using ASCII or Base64.
private void Decryption(byte[] encryptedMessage, byte[] iv)
{
using (Aes aes = new AesCryptoServiceProvider())
{
aes.Key = receiversKey;
aes.IV = iv;
// Decrypt the message
using (MemoryStream decryptedBytes = new MemoryStream())
{
using (CryptoStream cs = new CryptoStream(decryptedBytes, aes.CreateDecryptor(), CryptoStreamMode.Write))
{
cs.Write(encryptedMessage, 0, encryptedMessage.Length);
cs.Close();
string decryptedBytesString = Encoding.UTF8.GetString(decryptedBytes.ToArray()); //corrupts the zip
//string decryptedBytesString = Encoding.ASCII.GetString(decryptedBytes.ToArray()); //works
//String decryptedBytesString = Convert.ToBase64String(decryptedBytes.ToArray()); //works
byte[] fileBytes = Encoding.UTF8.GetBytes(decryptedBytesString);
//byte[] fileBytes = Encoding.ASCII.GetBytes(decryptedBytesString);
//byte[] fileBytes = Convert.FromBase64String(decryptedBytesString);
File.WriteAllBytes("RECEIVED\\received.zip", fileBytes);
}
}
}
}
Because one shouldn't try to interpret raw bytes as symbols in some encoding unless he actually knows/can deduce the encoding used.
If you receive some nonspecific raw bytes, then process them as raw bytes.
But why it works/doesn't work?
Because:
Encoding.Ascii seems to ignore values greater than 127 and return them as they are. So no matter the encoding/decoding done, raw bytes will be the same.
Base64 is a straightforward encoding that won't change the original data in any way.
UTF8 - theoretically with those bytes not being proper UTF8 string we may have some conversion data loss (though it would more likely result in an exception). But the most probable reason is a BOM being added during Encoding.UTF8.GetString call that would remain there after Encoding.UTF8.GetBytes.
In any case, I repeat - do not encode/decode anything unless it is actually string data/required format.

Base64 - CryptoStream with StreamWriter vs Convert.ToBase64String()

Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.
You probably should implement custom Stream, not a TextWriter. It is much easier to compose streams than writers (like pass your stream to compressed stream).
To create custom stream - derive from Stream and implement at least Write and Flush (and Read if you need R/W stream). The rest is more or less optional and depends on you additional needs, regular copy to other stream does not need anything else.
In constructor get inner stream passed to you for writing to. Base64 is always producing ASCII characters, so it should be easy to write output as UTF-8 with or without BOM directly to a stream, but if you want to specify encoding you can wrap inner stream with StreamWriter internally.
In your Write implementation buffer data till you get enough bytes to have block of multiple of 3 bytes (i.e. 300) and call Convert.ToBase64String on that portion. Make sure not to loose not-yet-converted portion. Since Base64 converts 3 bytes to 4 characters converting in blocks of multiple of 3 size will never have =/== padding at the end and can be concatenated with next block. So write that converted portion into inner stream/writer. Note that you want to limit block size to something relatively small like 3*10000 to avoid allocation of your blocks on large objects heap.
In Flush make sure to convert the last unwritten bytes (this will be the only one with = padding at the end) and write it to the stream too.
For reading you may need to be more careful as in Base64 white spaces are allowed, so you can't read fixed number of characters and convert to bytes. The easiest approach would be to read by character from StreamReader and convert each 4 non-space ones to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
Please try using following to encrypt. I am using fileName/filePath as input. You can adjust it as per your requirement. Using this I have encrypted over 1 gb file successfully without any out of memory exception.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
bool success = false;
// here assuming that you already have key
byte[] key = new byte[128];
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
algorithm.Key = key;
using (ICryptoTransform transform = algorithm.CreateEncryptor())
{
CryptoStream cs = null;
FileStream fsEncrypted = null;
try
{
using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
{
//First write IV
fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);
//then write using stream
cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
int bytesRead;
int _bufferSize = 1048576; //buggersize = 1mb;
byte[] buffer = new byte[_bufferSize];
do
{
bytesRead = fsInput.Read(buffer, 0, _bufferSize);
cs.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
success = true;
}
}
catch (Exception ex)
{
//handle exception or throw.
}
finally
{
if (cs != null)
{
cs.Close();
((IDisposable)cs).Dispose();
if (fsEncrypted != null)
{
fsEncrypted.Close();
}
}
}
}
return success;
}

Base64 Encode a PDF in C#?

Can someone provide some light on how to do this? I can do this for regular text or byte array, but not sure how to approach for a pdf. do i stuff the pdf into a byte array first?
Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
Byte[] fileBytes = File.ReadAllBytes(#"TestData\example.pdf");
var content = Convert.ToBase64String(fileBytes);
There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.
.Net includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.
I have tested the example code below, and I get identical output using either my method or Andrew's method above.
Here's how it works: You fire up a class called a CryptoStream. This is kind of an adapter that plugs into another stream. You plug a class called CryptoTransform into the CryptoStream (which in turn is attached to your file/memory/network stream) and it performs data transformations on the data while it's being read from or written to the stream.
Normally, the transformation is encryption/decryption, but .net includes ToBase64 and FromBase64 transformations as well, so we won't be encrypting, just encoding.
Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.
class Base64Encoder
{
public void Encode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
using(System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
{
// I'm going to use a 4k buffer, tune this as needed
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
cryptStream.Write(buffer, 0, bytesRead);
cryptStream.FlushFinalBlock();
}
}
public void Decode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
outFile.Write(buffer, 0, bytesRead);
outFile.Flush();
}
}
// this version of Encode pulls everything into memory at once
// you can compare the output of my Encode method above to the output of this one
// the output should be identical, but the crytostream version
// will use way less memory on a large file than this version.
public void MemoryEncode(string inFileName, string outFileName)
{
byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
}
}
I am also playing around with where I attach the CryptoStream. In the Encode method,I am attaching it to the output (writing) stream, so when I instance the CryptoStream, I use its Write() method.
When I read, I'm attaching it to the input (reading) stream, so I use the read method on the CryptoStream. It doesn't really matter which stream I attach it to. I just have to pass the appropriate Read or Write enumeration member to the CryptoStream's constructor.

Categories

Resources