I am trying to read data from a memory mapped file, which is written to the memory file from a C++ program. I am able to use the debug method and write the data as a string from a loop. However, I want to convert the byte array to a usable string that I can then manipulate.
using (MemoryMappedFile mmf = MemoryMappedFile.OpenExisting("DataFile"))
{
using (MemoryMappedViewAccessor reader = mmf.CreateViewAccessor())
{
var bytes = new byte[reader.Capacity];
reader.ReadArray<byte>(0, bytes, 0, bytes.Length);
for(int i = 0; i<bytes.Length; i++)
{
System.Diagnostics.Debug.Write((char) bytes[i]);
}
}
}
I've tried removing the for loop and replacing it with a GetString() encoder, but it only returns a question mark character instead of the full data string.
So long as you know the encoding the bytes were written with this is easy, you would use for example the System.Text.UTF8Encoding class (probably the System.Text.Encoding.UTF8 instance) and the GetString method:
string str = System.Text.Encoding.UTF8.GetString(bytes, 0, bytes.Length);
If you don't know the encoding this becomes much harder, you would have to try and use a heuristic method and guess what the encoding actually is.
Related
I am trying to convert byte[] to base64 string format so that i can send that information to third party. My code as below:
byte[] ByteArray = System.IO.File.ReadAllBytes(path);
string base64Encoded = System.Convert.ToBase64String(ByteArray);
I am getting below error:
Exception of type 'System.OutOfMemoryException' was thrown. Can you
help me please ?
Update
I just spotted #PanagiotisKanavos' comment pointing to Is there a Base64Stream for .NET?. This does essentially the same thing as my code below attempts to achieve (i.e. allows you to process the file without having to hold the whole thing in memory in one go), but without the overhead/risk of self-rolled code / rather using a standard .Net library method for the job.
Original
The below code will create a new temporary file containing the Base64 encoded version of your input file.
This should have a lower memory footprint, since rather than doing all data at once, we handle it several bytes at a time.
To avoid holding the output in memory, I've pushed that back to a temp file, which is returned. When you later need to use that data for some other process, you'd need to stream it (i.e. so that again you're not consuming all of this data at once).
You'll also notice that I've used WriteLine instead of Write; which will introduce non base64 encoded characters (i.e. the line breaks). That's deliberate, so that if you consume the temp file with a text reader you can easily process it line by line.
However, you can amend per your needs.
void Main()
{
var inputFilePath = #"c:\temp\bigfile.zip";
var convertedDataPath = ConvertToBase64TempFile(inputFilePath);
Console.WriteLine($"Take a look in {convertedDataPath} for your converted data");
}
//inputFilePath = where your source file can be found. This is not impacted by the below code
//bufferSizeInBytesDiv3 = how many bytes to read at a time (divided by 3); the larger this value the more memory is required, but the better you'll find performance. The Div3 part is because we later multiple this by 3 / this ensures we never have to deal with remainders (i.e. since 3 bytes = 4 base64 chars)
public string ConvertToBase64TempFile(string inputFilePath, int bufferSizeInBytesDiv3 = 1024)
{
var tempFilePath = System.IO.Path.GetTempFileName();
using (var fileStream = File.Open(inputFilePath,FileMode.Open))
{
using (var reader = new BinaryReader(fileStream))
{
using (var writer = new StreamWriter(tempFilePath))
{
byte[] data;
while ((data = reader.ReadBytes(bufferSizeInBytesDiv3 * 3)).Length > 0)
{
writer.WriteLine(System.Convert.ToBase64String(data)); //NB: using WriteLine rather than Write; so when consuming this content consider removing line breaks (I've used this instead of write so you can easily stream the data in chunks later)
}
}
}
}
return tempFilePath;
}
I have a compressed (LZMA) .txt file and need to decompress it, but i have to exclude the first 4 bytes as they are not part of the file content.
I load my file like this:
byte[] curFile = File.ReadAllBytes(files[i]);
Performance is critical as i have to loop trough over 14k+ files, average file size is around 4KB.
for (int i = 0; i < filePath.Length; i++)
{
var positionToSkipTo = 4;
using (var fileStream = File.OpenRead(filePath))
{
fileStream.Seek(positionToSkipTo, SeekOrigin.Begin);
var curFile = new byte[fileStream.Length - positionToSkipTo];
fileStream.Read(curFile, 0, curFile.Length);
//Do your thing
}
}
Everything is self-explanatory. Important functions are listed at MSDN FileStream class documentation.
If you're just using a byte array, you can utilize the ConstrainedCopy method in the Array class.
Array.ConstrainedCopy(unclippedArray, 4, clippedArray, 0, unclippedArray.Length - 4);
If you're not going to just be dealing with the raw bytes, utilize a memory stream and a binary reader or a filestream like other people suggested.
Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.
You probably should implement custom Stream, not a TextWriter. It is much easier to compose streams than writers (like pass your stream to compressed stream).
To create custom stream - derive from Stream and implement at least Write and Flush (and Read if you need R/W stream). The rest is more or less optional and depends on you additional needs, regular copy to other stream does not need anything else.
In constructor get inner stream passed to you for writing to. Base64 is always producing ASCII characters, so it should be easy to write output as UTF-8 with or without BOM directly to a stream, but if you want to specify encoding you can wrap inner stream with StreamWriter internally.
In your Write implementation buffer data till you get enough bytes to have block of multiple of 3 bytes (i.e. 300) and call Convert.ToBase64String on that portion. Make sure not to loose not-yet-converted portion. Since Base64 converts 3 bytes to 4 characters converting in blocks of multiple of 3 size will never have =/== padding at the end and can be concatenated with next block. So write that converted portion into inner stream/writer. Note that you want to limit block size to something relatively small like 3*10000 to avoid allocation of your blocks on large objects heap.
In Flush make sure to convert the last unwritten bytes (this will be the only one with = padding at the end) and write it to the stream too.
For reading you may need to be more careful as in Base64 white spaces are allowed, so you can't read fixed number of characters and convert to bytes. The easiest approach would be to read by character from StreamReader and convert each 4 non-space ones to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
Please try using following to encrypt. I am using fileName/filePath as input. You can adjust it as per your requirement. Using this I have encrypted over 1 gb file successfully without any out of memory exception.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
bool success = false;
// here assuming that you already have key
byte[] key = new byte[128];
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
algorithm.Key = key;
using (ICryptoTransform transform = algorithm.CreateEncryptor())
{
CryptoStream cs = null;
FileStream fsEncrypted = null;
try
{
using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
{
//First write IV
fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);
//then write using stream
cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
int bytesRead;
int _bufferSize = 1048576; //buggersize = 1mb;
byte[] buffer = new byte[_bufferSize];
do
{
bytesRead = fsInput.Read(buffer, 0, _bufferSize);
cs.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
success = true;
}
}
catch (Exception ex)
{
//handle exception or throw.
}
finally
{
if (cs != null)
{
cs.Close();
((IDisposable)cs).Dispose();
if (fsEncrypted != null)
{
fsEncrypted.Close();
}
}
}
}
return success;
}
I've written several ints, char[]s and the such to a data file with BinaryWriter in C#. Reading the file back in (in C#) with BinaryReader, I can recreate all of the pieces of the file perfectly.
However, attempting to read them back in with C++ yields some scary results. I was using fstream to attempt to read back the data and the data was not reading in correctly. In C++, I set up an fstream with ios::in|ios::binary|ios::ate and used seekg to target my location. I then read the next four bytes, which were written as the integer "16" (and reads correctly into C#). This reads as 1244780 in C++ (not the memory address, I checked). Why would this be? Is there an equivalent to BinaryReader in C++? I noticed it mentioned on msdn, but that's Visual C++ and intellisense doesn't even look like c++, to me.
Example code for writing the file (C#):
public static void OpenFile(string filename)
{
fs = new FileStream(filename, FileMode.Create);
w = new BinaryWriter(fs);
}
public static void WriteHeader()
{
w.Write('A');
w.Write('B');
}
public static byte[] RawSerialize(object structure)
{
Int32 size = Marshal.SizeOf(structure);
IntPtr buffer = Marshal.AllocHGlobal(size);
Marshal.StructureToPtr(structure, buffer, true);
byte[] data = new byte[size];
Marshal.Copy(buffer, data, 0, size);
Marshal.FreeHGlobal(buffer);
return data;
}
public static void WriteToFile(Structures.SomeData data)
{
byte[] buffer = Serializer.RawSerialize(data);
w.Write(buffer);
}
I'm not sure how I could show you the data file.
Example of reading the data back (C#):
BinaryReader reader = new BinaryReader(new FileStream("C://chris.dat", FileMode.Open));
char[] a = new char[2];
a = reader.ReadChars(2);
Int32 numberoffiles;
numberoffiles = reader.ReadInt32();
Console.Write("Reading: ");
Console.WriteLine(a);
Console.Write("NumberOfFiles: ");
Console.WriteLine(numberoffiles);
This I want to perform in c++. Initial attempt (fails at first integer):
fstream fin("C://datafile.dat", ios::in|ios::binary|ios::ate);
char *memblock = 0;
int size;
size = 0;
if (fin.is_open())
{
size = static_cast<int>(fin.tellg());
memblock = new char[static_cast<int>(size+1)];
memset(memblock, 0, static_cast<int>(size + 1));
fin.seekg(0, ios::beg);
fin.read(memblock, size);
fin.close();
if(!strncmp("AB", memblock, 2)){
printf("test. This works.");
}
fin.seekg(2); //read the stream starting from after the second byte.
int i;
fin >> i;
Edit: It seems that no matter what location I use "seekg" to, I receive the exact same value.
You realize that a char is 16 bits in C# rather than the 8 it usually is in C. This is because a char in C# is designed to handle Unicode text rather than raw data. Therefore, writing chars using the BinaryWriter will result in Unicode being written rather than raw bytes.
This may have lead you to calculate the offset of the integer incorrectly. I recommend you take a look at the file in a hex editor, and if you cannot work out the issue post the file and the code here.
EDIT1
Regarding your C++ code, do not use the >> operator to read from a binary stream. Use read() with the address of the int that you want to read to.
int i;
fin.read((char*)&i, sizeof(int));
EDIT2
Reading from a closed stream is also going to result in undefined behavior. You cannot call fin.close() and then still expect to be able to read from it.
This may or may not be related to the problem, but...
When you create the BinaryWriter, it defaults to writing chars in UTF-8. This means that some of them may be longer than one byte, throwing off your seeks.
You can avoid this by using the 2 argument constructor to specify the encoding. An instance of System.Text.ASCIIEncoding would be the same as what C/C++ use by default.
There are many thing going wrong in your C++ snippet. You shouldn't mix binary reading with formatted reading:
// The file is closed after this line. It is WRONG to read from a closed file.
fin.close();
if(!strncmp("AB", memblock, 2)){
printf("test. This works.");
}
fin.seekg(2); // You are moving the "get pointer" of a closed file
int i;
// Even if the file is opened, you should not mix formatted reading
// with binary reading. ">>" is just an operator for reading formatted data.
// In other words, it is for reading "text" and converting it to a
// variable of a specific data type.
fin >> i;
If it's any help, I went through how the BinaryWriter writes data here.
It's been a while but I'll quote it and hope it's accurate:
Int16 is written as 2 bytes and padded.
Int32 is written as Little Endian and zero padded
Floats are more complicated: it takes the float value and dereferences it, getting the memory address's contents which is a hexadecimal
Can someone provide some light on how to do this? I can do this for regular text or byte array, but not sure how to approach for a pdf. do i stuff the pdf into a byte array first?
Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
Byte[] fileBytes = File.ReadAllBytes(#"TestData\example.pdf");
var content = Convert.ToBase64String(fileBytes);
There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.
.Net includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.
I have tested the example code below, and I get identical output using either my method or Andrew's method above.
Here's how it works: You fire up a class called a CryptoStream. This is kind of an adapter that plugs into another stream. You plug a class called CryptoTransform into the CryptoStream (which in turn is attached to your file/memory/network stream) and it performs data transformations on the data while it's being read from or written to the stream.
Normally, the transformation is encryption/decryption, but .net includes ToBase64 and FromBase64 transformations as well, so we won't be encrypting, just encoding.
Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.
class Base64Encoder
{
public void Encode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
using(System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
{
// I'm going to use a 4k buffer, tune this as needed
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
cryptStream.Write(buffer, 0, bytesRead);
cryptStream.FlushFinalBlock();
}
}
public void Decode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
outFile.Write(buffer, 0, bytesRead);
outFile.Flush();
}
}
// this version of Encode pulls everything into memory at once
// you can compare the output of my Encode method above to the output of this one
// the output should be identical, but the crytostream version
// will use way less memory on a large file than this version.
public void MemoryEncode(string inFileName, string outFileName)
{
byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
}
}
I am also playing around with where I attach the CryptoStream. In the Encode method,I am attaching it to the output (writing) stream, so when I instance the CryptoStream, I use its Write() method.
When I read, I'm attaching it to the input (reading) stream, so I use the read method on the CryptoStream. It doesn't really matter which stream I attach it to. I just have to pass the appropriate Read or Write enumeration member to the CryptoStream's constructor.