I have a requirement to decrypt large files in memory because these files contain sensitive data (SSNs, DOBs, etc.); in other words, the decrypted data cannot be at rest (on disk). Using the BouncyCastle API for C#, I managed to make it work for files up to 780 MB. Basically, here's the code that works:
string PRIVATE_KEY_FILE_PATH = @"c:\pgp\privatekey.gpg";
string PASSPHRASE = "apassphrase";
string pgpFilePath = @"C:\test\test.gpg";

Stream inputStream = File.Open(pgpFilePath, FileMode.Open);
Stream privateKeyStream = File.Open(PRIVATE_KEY_FILE_PATH, FileMode.Open);

string pgpData = CryptoHelper.DecryptPgpData(inputStream, privateKeyStream, PASSPHRASE);
string[] pgpLines = pgpData.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);

Console.WriteLine(pgpLines.Length);
Console.ReadLine();

foreach (var x in pgpLines)
{
    Console.WriteLine(x);
    Console.ReadLine();
}
In the above code the entire decrypted payload is stored in the pgpData string, which is fine for files up to 780 MB, as stated previously.
However, I will be getting much larger files, and the solution above fails with an OutOfMemoryException.
I've been trying the code below, and I keep getting errors when the DecryptPgpData method is invoked. With a chunk size of 512 I get a "Premature end of stream in PartialInputStream" exception; with 1024 I get an "Exception starting decryption" exception. My question is: is this the correct way to decrypt chunks of data using BouncyCastle? The PGP file I'm trying to decrypt was encrypted with the gpg.exe utility. Any help is much appreciated; I've been trying to make this work for almost two days with no success.
string PRIVATE_KEY_FILE_PATH = @"c:\pgp\privatekey.gpg";
string PASSPHRASE = "apassphrase";
string pgpFilePath = @"C:\test\test.gpg";
string decryptedData = string.Empty;

Stream privateKeyStream = File.Open(PRIVATE_KEY_FILE_PATH, FileMode.Open);
FileInfo inFile = new FileInfo(pgpFilePath);
FileStream fs = inFile.OpenRead();

int chunkSize = 1024;
byte[] buffer = new byte[chunkSize];
int totalRead = 0;

while (totalRead < fs.Length)
{
    int readBytes = fs.Read(buffer, 0, chunkSize);
    totalRead += readBytes;

    // Each slice of the ciphertext is handed to DecryptPgpData on its own.
    Stream stream = new MemoryStream(buffer);
    decryptedData = CryptoHelper.DecryptPgpData(stream, privateKeyStream, PASSPHRASE);
    Console.WriteLine(decryptedData);
}
fs.Close();

Console.WriteLine(totalRead);
Console.ReadLine();
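(For context on why the chunked attempt fails: a 512- or 1024-byte slice of the file is not itself a complete PGP message, so DecryptPgpData cannot decrypt slices in isolation. The usual streaming approach is to hand BouncyCastle the whole encrypted stream once and then read the decrypted plaintext incrementally, so only a small buffer is in memory at any time. Below is a minimal sketch under that assumption, using the standard Org.BouncyCastle.Bcpg.OpenPgp API; the method name DecryptPgpToStream and the simplified single-key lookup are illustrative, not part of CryptoHelper:)

using System;
using System.IO;
using Org.BouncyCastle.Bcpg.OpenPgp;

static void DecryptPgpToStream(Stream inputStream, Stream privateKeyStream,
                               string passphrase, Stream outputStream)
{
    // Wrap the raw input so both armored and binary files are handled.
    var factory = new PgpObjectFactory(PgpUtilities.GetDecoderStream(inputStream));
    PgpObject first = factory.NextPgpObject();
    var encList = first as PgpEncryptedDataList
                  ?? (PgpEncryptedDataList)factory.NextPgpObject();

    var keyBundle = new PgpSecretKeyRingBundle(
        PgpUtilities.GetDecoderStream(privateKeyStream));

    foreach (PgpPublicKeyEncryptedData encData in encList.GetEncryptedDataObjects())
    {
        PgpSecretKey secretKey = keyBundle.GetSecretKey(encData.KeyId);
        if (secretKey == null)
            continue; // not our key; try the next session packet

        PgpPrivateKey privateKey = secretKey.ExtractPrivateKey(passphrase.ToCharArray());
        using (Stream clear = encData.GetDataStream(privateKey))
        {
            var plainFactory = new PgpObjectFactory(clear);
            PgpObject message = plainFactory.NextPgpObject();

            // gpg usually compresses before encrypting.
            if (message is PgpCompressedData compressed)
                message = new PgpObjectFactory(compressed.GetDataStream()).NextPgpObject();

            if (message is PgpLiteralData literal)
                using (Stream plain = literal.GetInputStream())
                    plain.CopyTo(outputStream); // plaintext flows through in small buffers
        }
        return;
    }
    throw new ArgumentException("No matching secret key found for this message.");
}

The caller decides where the plaintext goes (a MemoryStream, a network stream, and so on), so nothing decrypted needs to touch disk.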
I get the following error: "The archive entry was compressed using an unsupported compression method."
I need to decode the following gzip-compressed base64 string:
H4sIAAAAAAAAAD1SwW6bQBAdO0mDrUq9VGoPPWzVSj1ZItiO7aNjk4TI4IRgY7gtMLZxFnBhiYM/oLee+wn8QL+AT+mHVF1y6F5WM+/N07yZaQO0oBG2AaDRhGYYNH424GyS5DFvtOGE000LTjH2t1C/E2jdhgFeM7rJRPi3De3Hp5yx+SHGVIKmFsBX2R8gri96nXXf9zpduT/seArKHcWnA2WoKD30B6LuPk32mPIQsxZIHF94nmL22oYEZ0vKcoTfWNzJ7morB6s75hfapYitR5nNtd1+oMXLwptol1ok8Nur4zwcPgc3y15wuyzclZ57Nstd2ygc25VnUZ8Fk9F/rdnRL3RLlR1rc9CPi64xdY5utCjc3aLvWnrfUK63zo5FztFR3OlDT49UxbAfLvSdGc2nm65+81Dott4V8U63tsxRnK5h+eF6dTESDtpwHoTZntFCzG6WpCi9Jt9X5dDE73kojBKGz8iIIoMksldJwut5fqjKgU3ZUxh/I0lMUhrGXnLIPgvoi4DG+z0rCN+GGUnzGAlPiFdXkiQl6zxD+CRI/JAIYIN8iymhXNCRmHkc+vBWoPcYYMYpqyU/VuVlVbKZeqMa07HpkMn8UVctbSLBqUEjhE5VBn9+/SBV6ZuCS6sSw6qkcVV6XlWOEgEfxF/LI9GEw3fqC0/pmPM09HJer/OsbjQ7gXNzrBlXc7veL0j1ncGpuTBUgCa8mdKIblAcF/wDb9VytI4CAAA\u003d
After calling Convert.FromBase64String and converting the result to a string, I get:
a"\u001f�\b\0\0\0\0\0\0\0=R�n�#\u0010\u001d;I��J�Tj\u000f=l�J=Y"؎��c�����c�-0�q\u0016pa��?����\t�#��O�T]r�^V3��Ӽ�i\u0003��\u0011�\u0001�фf\u00184~6�l��1o���M\vN1��P�\u0013h݆\u0001^3��D��\r�ǧ���!�T��\u0016�W�\u001f �/z�u��:]�?�x\n�\u001dŧ\u0003e�(=�\a��>M���\u0010�\u0016H\u001c_x�b�چ\u0004gK�r��X���j+\a�;�\u0017ڥ��G�͵�~���\u009bh�Z$�۫�<\u001c>\a7�^p�,ܕ�{6�]�(\u001cەgQ�\u0005��\u007f���/tK�\u001dksЏ��1u�n�(�ݢ�Zz�P��ΎE��Q��CO�TŰ\u001f.��\u0019ͧ��~�P��\u0015�N���Q��a��zu1\u0012\u000e�p\u001e�ٞ�B�n��(�&�W����y(�\u0012��Ȉ\"�$�WI��y~�ʁM�S\u0018\u007f#ILR\u001a�^r�>\v苀��=+\b߆\u0019I�\u0018\tO�WW�$%�<C�$H��\b�|�)�\Б�y\u001c��V��\u0018`�)�%?V�eU��z�\u001aӱ���QW-m"��A#�NU\u0006\u007f~� U雂K�\u0012ê�qUz^U�\u0012\u0001\u001f�_�#ф�w�\vO��4�r^��n4;�ss�\u0019Ws��/H�����0T�&��҈nP\u001c\u0017�\u0003o�r��\u0002\0\0"
Could this have something to do with the problem?
Here is my code:
public static string Decompress(string input)
{
    byte[] compressed = Convert.FromBase64String(input);
    byte[] decompressed = Decompress(compressed);
    return Encoding.UTF8.GetString(decompressed);
}
private static byte[] Decompress(byte[] input)
{
    using (var source = new MemoryStream(input))
    {
        byte[] lengthBytes = new byte[4];
        source.Read(lengthBytes, 0, 4);
        var length = BitConverter.ToInt32(lengthBytes, 0);
        using (var decompressionStream = new GZipStream(source,
            CompressionMode.Decompress))
        {
            var result = new byte[length];
            // Error here: "The archive entry was compressed using an
            // unsupported compression method."
            decompressionStream.Read(result, 0, length);
            return result;
        }
    }
}
There is one little oddity in the base64 string, though it should not cause the error message you are getting: the \u003d should be replaced by an equals sign (=) for the base64 decoding to work properly. (I can't tell whether the string actually has those characters at the end or whether that's just a representation of a string ending in an equals sign. In the latter case, I don't know why it wouldn't just show an equals sign rather than a Unicode-escaped representation of one.)
Otherwise, that base64 string decodes to a valid gzip stream that should decompress with no problem.
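(Worth noting: the first bytes of any gzip stream are the fixed header 1f 8b 08, and the string above does start with them once decoded. If the 4-byte "length prefix" read in Decompress consumes bytes that are actually the gzip header, GZipStream sees a corrupted stream and can throw exactly this "unsupported compression method" error. A quick sanity check, sketched under the assumption that input holds the base64 text with a plain '=' at the end:)

// Hypothetical check: does the decoded payload really start with the
// gzip magic bytes 1f 8b and compression method 08 (deflate)?
byte[] payload = Convert.FromBase64String(input);
bool looksLikeGzip = payload.Length > 2
    && payload[0] == 0x1F && payload[1] == 0x8B && payload[2] == 0x08;
Console.WriteLine(looksLikeGzip); // true for the string in the question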
I solved the issue: I used GZipStream.CopyTo into a MemoryStream in place of the Read call. Here is the code in case anyone needs it!
public static string Decompress(string value)
{
    byte[] buffer = Convert.FromBase64String(value);
    byte[] decompressed;
    using (var inputStream = new MemoryStream(buffer))
    {
        using var outputStream = new MemoryStream();
        using (var gzip = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(outputStream);
        }
        decompressed = outputStream.ToArray();
    }
    return Encoding.UTF8.GetString(decompressed);
}
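A side note on why this also helps: CopyTo loops internally until the stream is exhausted, so the method no longer assumes that a single Read call returns the whole decompressed payload. Usage stays a one-liner (base64Input is a placeholder name):

string base64Input = "H4sIAAAAAAAAAD1SwW6bQBAd..."; // truncated placeholder
string text = Decompress(base64Input);
Console.WriteLine(text);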
I have a large file (the source file, assume 10 GB) that I need to read in chunks, then compress and hash.
(Finally, we have two outputs: the hash of the file as a string (md5HashHex) and the compressed file as bytes (destData).)
Before compressing, I also need to add a header to the destination (destData) and include it in the hash. After that, I open the source file and read it chunk by chunk, compressing and hashing each chunk. I found that my hash differs when I read the file chunk by chunk compared to hashing it in one go. Here is my code; I'd appreciate any help. I would also like to know whether I am doing the compression correctly. Thank you.
public static void CompresingHashing(string inputFile)
{
    MD5 md5 = MD5.Create();
    int byteCount = 0;
    var length = 8192;
    var chunk = new byte[length];
    byte[] destData = new byte[0];
    byte[] compressedData;
    byte[] header = Encoding.ASCII.GetBytes("HEADER");

    md5.TransformBlock(header, 0, header.Length, null, 0);
    destData = AppendingArrays(destData, header); // destination

    using (FileStream sourceFile = File.OpenRead(inputFile))
    {
        while ((byteCount = sourceFile.Read(chunk, 0, length)) > 0)
        {
            using (var ms = new MemoryStream())
            {
                using (ZlibStream result = new ZlibStream(ms, CompressionMode.Compress, CompressionLevel.Default))
                {
                    result.Write(chunk, 0, chunk.Length);
                }
                compressedData = ms.ToArray();
                md5.TransformBlock(compressedData, 0, compressedData.Length, null, 0);
                destData = AppendingArrays(destData, compressedData);
            }
        }
        md5.TransformFinalBlock(chunk, 0, 0);
        byte[] md5Hash = md5.Hash;
        string md5HashHex = string.Join(string.Empty, md5Hash.Select(b => b.ToString("x2")));
        Console.WriteLine("Hash : " + md5HashHex);
    }
}
public static byte[] AppendingArrays(byte[] existingArray, byte[] ArrayToAdd)
{
    byte[] newArray = new byte[existingArray.Length + ArrayToAdd.Length];
    existingArray.CopyTo(newArray, 0);
    ArrayToAdd.CopyTo(newArray, existingArray.Length);
    return newArray;
}
But if I hash destData (which is the header plus the compressed source data) I get a different result (for the sake of space I didn't repeat the code):
.
.
.
destData = AppendingArrays(destData, compressedData);
byte[] md5Hash = md5.ComputeHash(destData);
.
.
.
It looks like you are processing the last chunk twice in the MD5 computation. Simply call TransformFinalBlock with an empty byte[0] and an offset and length of 0.
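A minimal sketch of that pattern, independent of the compression step: each buffer goes through TransformBlock exactly once, and the final call passes an empty array, so nothing is hashed twice (inputFile is a placeholder):

using (var md5 = MD5.Create())
using (var fs = File.OpenRead(inputFile))
{
    var buffer = new byte[8192];
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
        md5.TransformBlock(buffer, 0, read, null, 0); // hash each chunk exactly once

    md5.TransformFinalBlock(new byte[0], 0, 0);       // finalize without extra data
    string hex = BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
    Console.WriteLine(hex); // equals md5.ComputeHash over the same byte sequence
}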
I'm working with GZipStream at the moment, using .NET 3.5.
I have the two methods listed below. As input I use a text file consisting of the character 's'; the file is 2 MB in size. This code works fine with .NET 4.5, but with .NET 3.5, after compressing and decompressing, I get a 435 KB file, which of course is not the same as the source file.
If I decompress the file via WinRAR, it also looks good (the same as the source file).
If I decompress a file using GZipStream from .NET 4.5 (a file compressed via GZipStream from .NET 3.5), the result is bad.
UPD:
In general I really need to read the file as several separate gzip chunks. In this case all the bytes of the compressed file are read in one call of the Read() method, so I still don't understand why decompression doesn't work.
public void CompressFile()
{
    string fileIn = @"D:\sin2.txt";
    string fileOut = @"D:\sin2.txt.pgz";
    using (var fout = File.Create(fileOut))
    {
        using (var fin = File.OpenRead(fileIn))
        {
            using (var zip = new GZipStream(fout, CompressionMode.Compress))
            {
                var buffer = new byte[1024 * 1024 * 10];
                int n = fin.Read(buffer, 0, buffer.Length);
                zip.Write(buffer, 0, n);
            }
        }
    }
}
public void DecompressFile()
{
    string fileIn = @"D:\sin2.txt.pgz";
    string fileOut = @"D:\sin2.1.txt";
    using (var fsout = File.Create(fileOut))
    {
        using (var fsIn = File.OpenRead(fileIn))
        {
            var buffer = new byte[1024 * 1024 * 10];
            int n;
            while ((n = fsIn.Read(buffer, 0, buffer.Length)) > 0)
            {
                using (var ms = new MemoryStream(buffer, 0, n))
                {
                    using (var zip = new GZipStream(ms, CompressionMode.Decompress))
                    {
                        int nRead = zip.Read(buffer, 0, buffer.Length);
                        fsout.Write(buffer, 0, nRead);
                    }
                }
            }
        }
    }
}
You're trying to decompress each "chunk" as if it's a separate gzip file. Don't do that - just read from the GZipStream in a loop:
using (var fsout = File.Create(fileOut))
{
    using (var fsIn = File.OpenRead(fileIn))
    {
        using (var zip = new GZipStream(fsIn, CompressionMode.Decompress))
        {
            var buffer = new byte[1024 * 32];
            int bytesRead;
            while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
            {
                fsout.Write(buffer, 0, bytesRead);
            }
        }
    }
}
Note that your compression code should look similar, reading in a loop rather than assuming a single call to Read will read all the data.
(Personally I'd skip fsIn and just use new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress), but that's just a personal preference.)
First, as @Jon Skeet mentioned, you are not using the Stream.Read method correctly. It doesn't matter whether your buffer is big enough: a stream is allowed to return fewer bytes than requested, with zero indicating the end, so reading from a stream should always be done in a loop.
However, the main problem in your decompression code is the way you share the buffer. You read the input into a buffer, then wrap it in a MemoryStream (note that the constructor used does not copy the passed array but actually uses it as its internal buffer), and then you try to read from and write to that buffer at the same time. Given that decompressing writes data "faster" than reading consumes it, it's surprising your code works at all.
The correct implementation is quite simple:
static void CompressFile()
{
    string fileIn = @"D:\sin2.txt";
    string fileOut = @"D:\sin2.txt.pgz";
    using (var input = File.OpenRead(fileIn))
    using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
        Write(input, output);
}

static void DecompressFile()
{
    string fileIn = @"D:\sin2.txt.pgz";
    string fileOut = @"D:\sin2.1.txt";
    using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
    using (var output = File.Create(fileOut))
        Write(input, output);
}

static void Write(Stream input, Stream output, int bufferSize = 10 * 1024 * 1024)
{
    var buffer = new byte[bufferSize];
    for (int readCount; (readCount = input.Read(buffer, 0, buffer.Length)) > 0;)
        output.Write(buffer, 0, readCount);
}
I have the following C# code (the code is inherited and I can't compile it). It is used to decrypt and unzip a saved file.
using System.Security.Cryptography;
using System.Text;
using ICSharpCode.SharpZipLib.Zip;

// Not the real key, but the same number of chars
private const string kEncyptionKey = "01234567";

public string DecryptAndDecompressText(string strFileName)
{
    // Decryption
    FileStream fin = null;
    try
    {
        fin = new FileStream(strFileName, FileMode.Open, FileAccess.Read);
    }
    catch (System.IO.FileNotFoundException)
    {
        return "";
    }

    MemoryStream memoryToDecompress = new MemoryStream();
    UnicodeEncoding UE = new UnicodeEncoding();
    RijndaelManaged RMCrypto = new RijndaelManaged();

    // This is the encryption key for our file
    byte[] key = UE.GetBytes(kEncyptionKey);

    // Decrypt the data to a stream
    CryptoStream cs = new CryptoStream(memoryToDecompress,
        RMCrypto.CreateDecryptor(key, key),
        CryptoStreamMode.Write);

    byte[] fileBuffer = new byte[fin.Length];
    fin.Read(fileBuffer, 0, fileBuffer.Length);
    cs.Write(fileBuffer, 0, fileBuffer.Length);
    fin.Close();

    // Reset the index of the memory stream
    memoryToDecompress.Position = 0;
    // Let the GC clean this up, we still need the memory stream
    //cs.Close();

    // Decompress the file
    ZipInputStream s = new ZipInputStream(memoryToDecompress);
    ZipEntry theEntry;
    try
    {
        theEntry = s.GetNextEntry();
    }
    catch (System.Exception)
    {
        // Could not open the file...
        return "";
    }
    // (snippet ends here in the original)
}
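(One detail that matters when porting this: UnicodeEncoding is UTF-16LE, so the 8-character constant becomes a 16-byte value, which RijndaelManaged treats as an AES-128 key and, here, also as the IV. A quick check:)

// UTF-16LE expands each ASCII char to two bytes (char, 0x00).
byte[] key = new UnicodeEncoding().GetBytes("01234567");
Console.WriteLine(key.Length);                 // 16
Console.WriteLine(BitConverter.ToString(key)); // 30-00-31-00-32-00-...-37-00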
I'm trying to create a Python program that does the same. This is what I've got:
from Crypto.Cipher import AES

KEY = '01234567'.encode('utf-16be')

_f = open('<file>', 'r')
_content = _f.read()

_cipher = AES.new(KEY, AES.MODE_CBC, KEY)
_dcontent = _cipher.decrypt(_content)

with open('extract.zip', 'w') as newfile:
    newfile.write(_dcontent)

_f.close()
I'm writing the result to the disk since I expect it to be a zip file (which contains one file). However I can't open the file with Archive Manager.
Any suggestions are welcome!
You have to use the same key. System.Text.UnicodeEncoding is the UTF-16LE encoding, which also has an equivalent in Python:
KEY = '01234567'.encode('utf-16le')
You also have to read and write the files in binary mode, particularly if you're on Windows:
_f = open('<file>', 'rb')
...
open('extract.zip', 'wb')
You should use a proper zip-file library (in Python, presumably the built-in zipfile module). My guess is that something format-specific is failing on your write statement; going through the library avoids such pitfalls. Its open function can also take an optional password (pwd) in case the archive is protected.
I'm trying to differentiate between "text" files and "binary" files, as I would effectively like to ignore files with "unreadable" contents.
I have a file that I believe is a GZIP archive. I'm trying to ignore this kind of file by detecting the magic numbers / file signature. If I open the file with the Hex Editor plugin in Notepad++, I can see that the first three hex codes are 1f 8b 08.
However, when I read the file using a StreamReader, I'm not sure how to get at the original bytes.
using (var streamReader = new StreamReader(@"C:\file"))
{
    char[] buffer = new char[10];
    streamReader.Read(buffer, 0, 10);
    var s = new String(buffer);

    byte[] bytes = new byte[6];
    System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
    var hex = BitConverter.ToString(bytes);
    var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}
At the end of the using statement I have the following variable values:
hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"
Neither of these starts with the hex values shown in Notepad++.
Is it possible to get the original bytes from the result of reading a file via StreamReader?
Your code tries to turn a binary buffer into a string. Strings are Unicode in .NET, so two bytes are required per char, and the result is a bit unpredictable, as you can see.
Just use a BinaryReader and its ReadBytes method:
using (FileStream fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
{
    using (var reader = new BinaryReader(fs, new ASCIIEncoding()))
    {
        byte[] buffer = reader.ReadBytes(10);
        if (buffer[0] == 31 && buffer[1] == 139 && buffer[2] == 8)
        {
            // you have a signature match....
        }
    }
}
Usage (for a pdf file):
Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));
Method GetMagicNumbers:
private static string GetMagicNumbers(string filepath, int bytesCount)
{
    // https://en.wikipedia.org/wiki/List_of_file_signatures
    byte[] buffer;
    using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(fs))
        buffer = reader.ReadBytes(bytesCount);

    var hex = BitConverter.ToString(buffer);
    return hex.Replace("-", String.Empty).ToLower();
}
You can't. StreamReader is made to read text, not binary data. Use the Stream directly to read bytes; in your case, a FileStream.
To guess whether a file is text or binary, you could read the first 4K into a byte[] and interpret that, as sketched below.
Btw, you tried to force chars into bytes, which is invalid in principle. I suggest you familiarize yourself with what an Encoding is: it is the only way to convert between chars and bytes in a semantically correct way.
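A rough sketch of that heuristic, with the 4 KB sample size and the NUL-byte / strict-UTF-8 checks as illustrative choices rather than a standard rule:

using System;
using System.IO;
using System.Text;

static bool LooksLikeText(string path)
{
    byte[] head;
    using (var fs = File.OpenRead(path))
    {
        var buffer = new byte[4096];
        int n = fs.Read(buffer, 0, buffer.Length);
        head = new byte[n];
        Array.Copy(buffer, head, n);
    }

    // NUL bytes almost never occur in text files.
    if (Array.IndexOf(head, (byte)0) >= 0)
        return false;

    try
    {
        // Strict UTF-8 decoding throws on invalid byte sequences.
        new UTF8Encoding(false, throwOnInvalidBytes: true).GetString(head);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}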