I have a large source file (assume 10 GB) that I need to read in chunks, compress, and hash.
(In the end there are two outputs: the hash of the file as a string (md5HashHex) and the compressed file as bytes (destData).)
Before compression I also need to prepend a header to the destination (destData) and include it in the hash. After that I open the source file and read it chunk by chunk, compressing and hashing each chunk. I found that the hash is different when I read the file chunk by chunk compared to hashing it in one go. Here is my code; I'd appreciate any help with it. I would also like to know whether I am doing the compression correctly. Thank you.
public static void CompresingHashing(string inputFile)
{
    MD5 md5 = MD5.Create();
    int byteCount = 0;
    var length = 8192;
    var chunk = new byte[length];
    byte[] destData = new byte[0];
    byte[] compressedData;
    byte[] header;
    header = Encoding.ASCII.GetBytes("HEADER");
    md5.TransformBlock(header, 0, header.Length, null, 0);
    destData = AppendingArrays(destData, header); //destination
    using (FileStream sourceFile = File.OpenRead(inputFile))
    {
        while ((byteCount = sourceFile.Read(chunk, 0, length)) > 0)
        {
            using (var ms = new MemoryStream())
            {
                using (ZlibStream result = new ZlibStream(ms, CompressionMode.Compress, CompressionLevel.Default))
                {
                    result.Write(chunk, 0, byteCount); // compress only the bytes actually read, not the whole buffer
                }
                compressedData = ms.ToArray();
                md5.TransformBlock(compressedData, 0, compressedData.Length, null, 0);
                destData = AppendingArrays(destData, compressedData);
            }
        }
        md5.TransformFinalBlock(chunk, 0, 0);
        byte[] md5Hash = md5.Hash;
        string md5HashHex = string.Join(string.Empty, md5Hash.Select(b => b.ToString("x2")));
        Console.WriteLine("Hash : " + md5HashHex);
    }
}
public static byte[] AppendingArrays(byte[] existingArray, byte[] ArrayToAdd)
{
    byte[] newArray = new byte[existingArray.Length + ArrayToAdd.Length];
    existingArray.CopyTo(newArray, 0);
    ArrayToAdd.CopyTo(newArray, existingArray.Length);
    return newArray;
}
But if I hash destData (which is the header plus the compressed data) I get a different result (for the sake of space I didn't repeat the code):
.
.
.
destData = AppendingArrays(destData, compressedData);
byte[] md5Hash = md5.ComputeHash(destData);
.
.
.
Looks like you are processing the last chunk twice in the MD5. Simply call TransformFinalBlock with a byte[0] and a length and offset of 0.
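To make the incremental hash line up with a one-shot hash, the bytes fed to TransformBlock must be exactly the bytes of destData. A minimal sketch of the comparison, assuming the loop above already feeds each compressedData block:

// Finalize without hashing any extra bytes.
md5.TransformFinalBlock(new byte[0], 0, 0);
byte[] incremental = md5.Hash;

// One-shot hash over the assembled destData; should match the incremental result.
using (var check = MD5.Create())
{
    byte[] oneShot = check.ComputeHash(destData);
    Console.WriteLine(incremental.SequenceEqual(oneShot)); // True when the same bytes were fed
}

As an aside, AppendingArrays re-copies everything accumulated so far on every call, which is quadratic over a 10 GB file. A sketch of a linear alternative, assuming destData is only needed as a byte[] at the very end:

using (var dest = new MemoryStream())
{
    dest.Write(header, 0, header.Length);
    // inside the read loop:
    // dest.Write(compressedData, 0, compressedData.Length);
    byte[] destData = dest.ToArray(); // materialize once, at the end
}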
I get the following error: The archive entry was compressed using an unsupported compression method.
I need to decode the following gzip-compressed base64 string:
H4sIAAAAAAAAAD1SwW6bQBAdO0mDrUq9VGoPPWzVSj1ZItiO7aNjk4TI4IRgY7gtMLZxFnBhiYM/oLee+wn8QL+AT+mHVF1y6F5WM+/N07yZaQO0oBG2AaDRhGYYNH424GyS5DFvtOGE000LTjH2t1C/E2jdhgFeM7rJRPi3De3Hp5yx+SHGVIKmFsBX2R8gri96nXXf9zpduT/seArKHcWnA2WoKD30B6LuPk32mPIQsxZIHF94nmL22oYEZ0vKcoTfWNzJ7morB6s75hfapYitR5nNtd1+oMXLwptol1ok8Nur4zwcPgc3y15wuyzclZ57Nstd2ygc25VnUZ8Fk9F/rdnRL3RLlR1rc9CPi64xdY5utCjc3aLvWnrfUK63zo5FztFR3OlDT49UxbAfLvSdGc2nm65+81Dott4V8U63tsxRnK5h+eF6dTESDtpwHoTZntFCzG6WpCi9Jt9X5dDE73kojBKGz8iIIoMksldJwut5fqjKgU3ZUxh/I0lMUhrGXnLIPgvoi4DG+z0rCN+GGUnzGAlPiFdXkiQl6zxD+CRI/JAIYIN8iymhXNCRmHkc+vBWoPcYYMYpqyU/VuVlVbKZeqMa07HpkMn8UVctbSLBqUEjhE5VBn9+/SBV6ZuCS6sSw6qkcVV6XlWOEgEfxF/LI9GEw3fqC0/pmPM09HJer/OsbjQ7gXNzrBlXc7veL0j1ncGpuTBUgCa8mdKIblAcF/wDb9VytI4CAAA\u003d
When I pass it to the Convert.FromBase64String method and convert the resulting bytes into a string, I receive this:
a"\u001f�\b\0\0\0\0\0\0\0=R�n�#\u0010\u001d;I��J�Tj\u000f=l�J=Y"؎��c�����c�-0�q\u0016pa��?����\t�#��O�T]r�^V3��Ӽ�i\u0003��\u0011�\u0001�фf\u00184~6�l��1o���M\vN1��P�\u0013h݆\u0001^3��D��\r�ǧ���!�T��\u0016�W�\u001f �/z�u��:]�?�x\n�\u001dŧ\u0003e�(=�\a��>M���\u0010�\u0016H\u001c_x�b�چ\u0004gK�r��X���j+\a�;�\u0017ڥ��G�͵�~���\u009bh�Z$�۫�<\u001c>\a7�^p�,ܕ�{6�]�(\u001cەgQ�\u0005��\u007f���/tK�\u001dksЏ��1u�n�(�ݢ�Zz�P��ΎE��Q��CO�TŰ\u001f.��\u0019ͧ��~�P��\u0015�N���Q��a��zu1\u0012\u000e�p\u001e�ٞ�B�n��(�&�W����y(�\u0012��Ȉ\"�$�WI��y~�ʁM�S\u0018\u007f#ILR\u001a�^r�>\v苀��=+\b߆\u0019I�\u0018\tO�WW�$%�<C�$H��\b�|�)�\Б�y\u001c��V��\u0018`�)�%?V�eU��z�\u001aӱ���QW-m"��A#�NU\u0006\u007f~� U雂K�\u0012ê�qUz^U�\u0012\u0001\u001f�_�#ф�w�\vO��4�r^��n4;�ss�\u0019Ws��/H�����0T�&��҈nP\u001c\u0017�\u0003o�r��\u0002\0\0"
Could this have something to do with the problem?
Here is my code:
public static string Decompress(string input)
{
    byte[] compressed = Convert.FromBase64String(input);
    byte[] decompressed = Decompress(compressed);
    return Encoding.UTF8.GetString(decompressed);
}

private static byte[] Decompress(byte[] input)
{
    using (var source = new MemoryStream(input))
    {
        byte[] lengthBytes = new byte[4];
        source.Read(lengthBytes, 0, 4);
        var length = BitConverter.ToInt32(lengthBytes, 0);
        using (var decompressionStream = new GZipStream(source,
            CompressionMode.Decompress))
        {
            var result = new byte[length];
            decompressionStream.Read(result, 0, length); // Error: The archive entry was compressed using an unsupported compression method.
            return result;
        }
    }
}
There is one little oddity in the base64 string, though it should not result in the error message you are getting: the \u003d should be replaced by an equal sign (=) in order for the base64 decoding to work properly. (I can't tell if the string actually has those five characters at the end, or if it is just a representation of a string with an equal sign at the end. In the latter case, I don't know why it wouldn't just show an equal sign as opposed to a Unicode-escaped representation of one.)
Otherwise, that base64 string decodes to a valid gzip stream that should decompress with no problem.
I solved the issue: I used GZipStream.CopyTo into a MemoryStream in place of the Read call. Here is the code if anyone needs it!
public static string Decompress(string value)
{
    byte[] buffer = Convert.FromBase64String(value);
    byte[] decompressed;
    using (var inputStream = new MemoryStream(buffer))
    {
        using var outputStream = new MemoryStream();
        using (var gzip = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(outputStream);
        }
        decompressed = outputStream.ToArray();
    }
    return Encoding.UTF8.GetString(decompressed);
}
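For completeness, a quick usage sketch; the input is assumed to be the base64 string from the question (with the trailing \u003d as a literal '=' once decoded):

string base64 = "H4sIAAAAAAAAAD1S..."; // abbreviated here; use the full string from above
string text = Decompress(base64);
Console.WriteLine(text); // the decompressed UTF-8 payload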
I have a C# program that copies a file or whole folders to another folder, using a SHA-512 checksum to verify that the input and output of the copy process are identical. The program works fine, but I need to test the whole program, and especially test or verify the checksum. How can I give the program an input file and modify the file somewhere in the process, in order to see that the checksum detects the error? Thanks for your suggestions.
Here's a simple example of testing the SHA-512 hash. We have two tests, TestSHA512Modify and TestSHA512Append: one modifies bytes within the file, the other appends bytes to the file. Both are useful tests of the hash.
static void TestSHA512Modify()
{
    var testFile = Path.GetTempFileName();
    CreateRandomFile(testFile, 1024);
    var sha512 = GetFileSHA512(testFile);
    Console.WriteLine("TestSHA512Modify: Original file SHA512: " + ToHexString(sha512));
    // Modify file bytes. Here we set byte offsets [100] [101] [102].
    WriteBytes(testFile, 100, new byte[] { 1, 2, 3 });
    var modifiedSha512 = GetFileSHA512(testFile);
    Console.WriteLine("TestSHA512Modify: Updated file SHA512: " + ToHexString(modifiedSha512));
    Console.WriteLine("TestSHA512Modify: SHA512 hashes are: " + (sha512.SequenceEqual(modifiedSha512) ? "EQUAL" : "NOT EQUAL"));
}

static void TestSHA512Append()
{
    var testFile = Path.GetTempFileName();
    CreateRandomFile(testFile, 1024);
    var sha512 = GetFileSHA512(testFile);
    Console.WriteLine("TestSHA512Append: Original file SHA512: " + ToHexString(sha512));
    // Append bytes to the end of the file.
    AppendBytes(testFile, new byte[] { 1 });
    var modifiedSha512 = GetFileSHA512(testFile);
    Console.WriteLine("TestSHA512Append: Updated file SHA512: " + ToHexString(modifiedSha512));
    Console.WriteLine("TestSHA512Append: SHA512 hashes are: " + (sha512.SequenceEqual(modifiedSha512) ? "EQUAL" : "NOT EQUAL"));
}
static void CreateRandomFile(string path, int length)
{
    // Make some random bytes.
    var randomData = new byte[length]; // use the length parameter rather than a hardcoded size
    RNGCryptoServiceProvider p = new RNGCryptoServiceProvider();
    p.GetBytes(randomData);
    File.WriteAllBytes(path, randomData);
}
static void WriteBytes(string path, int fileOffset, byte[] data)
{
    using (var fileStream = new FileStream(path, FileMode.Open))
    {
        fileStream.Seek(fileOffset, SeekOrigin.Begin);
        fileStream.Write(data, 0, data.Length);
    }
}

static void AppendBytes(string path, byte[] data)
{
    using (var fileStream = new FileStream(path, FileMode.Append))
    {
        fileStream.Write(data, 0, data.Length);
    }
}

static byte[] GetFileSHA512(string path)
{
    using (SHA512 sha = new SHA512Managed())
    {
        return sha.ComputeHash(File.ReadAllBytes(path));
    }
}

static string ToHexString(byte[] data)
{
    return string.Join("", data.Select(b => b.ToString("X2")));
}
I would like to make a secure container for my application, and here's the plan.
I finished the opening/saving code and tested it; however, an ArgumentException was thrown.
The code runs like this:
1. Create a byte[] variable holding the unencrypted user data.
2. FileStream writes the magic number to the first 5 bytes.
3. RijndaelManaged accepts the key and generates an initialization vector.
4. FileStream writes the initialization vector to the next 16 bytes. <- Exception thrown!
5. CryptoStream transforms the variable from step 1.
6. FileStream writes the encrypted data from the 22nd byte on.
Debugging, I found that the FileStream write threw the exception, with this message:
Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
I tried setting the length of the stream to (user data) + 21, but it doesn't work. I am attaching the entire code for saving the file, and I hope this problem can be solved.
Thank you!
private bool SaveFile(string FilePath, bool IsCrypt)
{
    byte[] Data = Encoding.UTF8.GetBytes(WorkspaceList[CurrentIndex]._textbox.Text);
    using (var Stream = new FileStream(FilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
    {
        if (IsCrypt)
        {
            Stream.SetLength(Data.Length + 21); // Tried when I got the exception
            Stream.Write(MagicNumber, 0, 5); // Magic number
            using (var CryptoHandler = new RijndaelManaged()) // AES256 encryption
            {
                CryptoHandler.BlockSize = 128;
                CryptoHandler.KeySize = 256;
                CryptoHandler.Padding = PaddingMode.PKCS7;
                CryptoHandler.Mode = CipherMode.CBC;
                var tempKey = WorkspaceList[CurrentIndex]._cryptkey;
                if (tempKey.Length < 32)
                {
                    tempKey.PadRight(32);
                }
                else if (tempKey.Length > 32)
                {
                    tempKey.Remove(33);
                }
                CryptoHandler.Key = Encoding.UTF8.GetBytes(WorkspaceList[CurrentIndex]._cryptkey.PadRight(32));
                CryptoHandler.GenerateIV();
                Stream.Write(CryptoHandler.IV, 5, 16); // IV insertion *** ArgumentException ***
                var CryptoInstance = CryptoHandler.CreateEncryptor(CryptoHandler.Key, CryptoHandler.IV);
                using (var MemoryHandler = new MemoryStream())
                {
                    using (var Crypto = new CryptoStream(MemoryHandler, CryptoInstance, CryptoStreamMode.Write))
                    {
                        byte[] _Buffer = Data;
                        Crypto.Read(Data, 0, Data.Length);
                        _Buffer = MemoryHandler.ToArray();
                        Stream.Write(_Buffer, 21, _Buffer.Length); // Insert encrypted data
                        Stream.Close();
                        return true;
                    }
                }
            }
        }
        else
        {
            Stream.Write(Data, 0, Data.Length);
            Stream.Close();
            return true;
        }
    }
}
Replace
Stream.Write(CryptoHandler.IV, 5, 16); //IV Insertion
with
Stream.Write(CryptoHandler.IV, 0, CryptoHandler.IV.Length); //IV Insertion
In Stream.Write(array, offset, count):
array = CryptoHandler.IV (the data you want to write)
offset = 0 (you write from the first byte of the array)
count = CryptoHandler.IV.Length (you write all bytes of CryptoHandler.IV)
Note that offset is relative to array, not to the stream. After a successful Write operation, the stream cursor sits just past the last written byte. I suppose you specified an offset of 5 to take the MagicNumber into account?
You would have had the same problem with Stream.Write(_Buffer, 21, _Buffer.Length);
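Putting that together, a minimal sketch of the intended layout (magic number, then IV, then ciphertext), assuming MagicNumber is 5 bytes; the stream advances by itself, so no file offsets are passed to Write. One further assumption is flagged in the comments:

Stream.Write(MagicNumber, 0, MagicNumber.Length);           // file bytes 0..4
Stream.Write(CryptoHandler.IV, 0, CryptoHandler.IV.Length); // file bytes 5..20
// With CryptoStreamMode.Write the plaintext is written INTO the CryptoStream
// and flushed, before the ciphertext is read back out of the MemoryStream.
Crypto.Write(Data, 0, Data.Length);
Crypto.FlushFinalBlock();
byte[] cipher = MemoryHandler.ToArray();
Stream.Write(cipher, 0, cipher.Length);                     // ciphertext from byte 21 on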
I have a requirement where I need to decrypt large files in memory because these files contain sensitive data (SSNs, DOBs, etc.). In other words, the decrypted data cannot be at rest (on disk). I was able to use the BouncyCastle API for C# and managed to make it work for files up to 780 MB. Basically, here's the code that works:
string PRIVATE_KEY_FILE_PATH = @"c:\pgp\privatekey.gpg";
string PASSPHRASE = "apassphrase";
string pgpFilePath = @"C:\test\test.gpg";
Stream inputStream = File.Open(pgpFilePath, FileMode.Open);
Stream privateKeyStream = File.Open(PRIVATE_KEY_FILE_PATH, FileMode.Open);
string pgpData = CryptoHelper.DecryptPgpData(inputStream, privateKeyStream, PASSPHRASE);
string[] pgpLines = pgpData.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
Console.WriteLine(pgpLines.Length);
Console.ReadLine();
foreach (var x in pgpLines)
{
    Console.WriteLine(x);
    Console.ReadLine();
}
In the above code the entire decrypted payload is stored in the pgpData string, and that's fine for files up to 780 MB, as stated previously.
However, I will be getting much larger files, and the solution above fails with an OutOfMemoryException.
I've been trying the code below, and I keep getting errors when the DecryptPgpData method is invoked. When using 512 as the chunk size I get a "Premature end of stream in PartialInputStream" exception. When using 1024 as the chunk size I get an "Exception starting decryption" exception. My question is: is this the correct way to decrypt chunks of data using BouncyCastle? The pgp file I'm trying to decrypt was encrypted using the gpg.exe utility. Any help is much appreciated. It's been almost two days that I've been trying to make this work with no success.
string PRIVATE_KEY_FILE_PATH = @"c:\pgp\privatekey.gpg";
string PASSPHRASE = "apassphrase";
string pgpFilePath = @"C:\test\test.gpg";
string decryptedData = string.Empty;
FileInfo inFile = new FileInfo(pgpFilePath);
FileStream fs = inFile.OpenRead();
int chunkSize = 1024;
byte[] buffer = new byte[chunkSize];
int totalRead = 0;
while (totalRead < fs.Length)
{
    int readBytes = fs.Read(buffer, 0, chunkSize);
    totalRead += readBytes;
    Stream stream = new MemoryStream(buffer);
    decryptedData = CryptoHelper.DecryptPgpData(stream, privateKeyStream, PASSPHRASE);
    Console.WriteLine(decryptedData);
}
fs.Close();
Console.WriteLine(totalRead);
Console.ReadLine();
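For context: an OpenPGP message is a single packet stream, so decrypting independent 1 KB slices generally cannot work, which is consistent with the "Premature end of stream in PartialInputStream" error. The usual way around the OutOfMemoryException is to consume the plaintext as a stream instead of one giant string. A sketch of that direction, assuming a hypothetical CryptoHelper.DecryptPgpDataToStream overload that yields the decrypted plaintext as a readable Stream:

// Hypothetical helper: exposes the decrypted plaintext as a Stream.
using (Stream plaintext = CryptoHelper.DecryptPgpDataToStream(inputStream, privateKeyStream, PASSPHRASE))
using (var reader = new StreamReader(plaintext))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        Console.WriteLine(line); // handle one line at a time; nothing is written to disk
    }
}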
I created a simple program.
I create a string, compress it with the methods below, and store it in a binary field in SQL Server 2008 (binary(1000) field type).
When I read the binary data back, the resulting string looks identical to the original (same length, same data), but when I try to decompress it I get an error.
I use this method to get bytes:
System.Text.ASCIIEncoding.ASCII.GetBytes(mystring)
And this method to get string:
System.Text.ASCIIEncoding.ASCII.GetString(binarydata)
Hard-coded in the VS2012 editor, the result string works fine, but when I read it from SQL it gives me this error on the first line of the decompression method:
The input is not a valid Base-64 string as it contains a
non-base 64 character, more than two padding characters,
or a non-white space character among the padding characters.
What's wrong with my code? These two strings are the same, but
string test1 = Decompress("mystring");
...this method works fine, while the following gives me that error and cannot decompress the retrieved string:
string temp = System.Text.ASCIIEncoding.ASCII.GetString(get data from sql);
string test2 = Decompress(temp);
Comparing these strings does not show any difference:
int result = string.Compare(test1, test2); // result=0
My compression method:
public static string Compress(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);
    var memoryStream = new MemoryStream();
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
    {
        gZipStream.Write(buffer, 0, buffer.Length);
    }
    memoryStream.Position = 0;
    var compressedData = new byte[memoryStream.Length];
    memoryStream.Read(compressedData, 0, compressedData.Length);
    var gZipBuffer = new byte[compressedData.Length + 4];
    Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
    Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
    return Convert.ToBase64String(gZipBuffer);
}
My decompression method:
public static string Decompress(string compressedText)
{
    byte[] gZipBuffer = Convert.FromBase64String(compressedText);
    using (var memoryStream = new MemoryStream())
    {
        int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
        memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
        var buffer = new byte[dataLength];
        memoryStream.Position = 0;
        using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
        {
            gZipStream.Read(buffer, 0, buffer.Length);
        }
        return Encoding.UTF8.GetString(buffer);
    }
}
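As a sanity check, a round trip through the two methods works when the base64 string survives unchanged, which matches the observation that the hard-coded case is fine:

string packed = Compress("mystring");   // base64 text
string unpacked = Decompress(packed);
Console.WriteLine(unpacked);            // prints: mystring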
The most likely issue is the way you are getting the string from the SQL binary field.
Currently (a guess, since you have not shown how you store or retrieve your data from SQL) your process is:
Compress: Text -> UTF8.GetBytes -> compress -> base64 string -> send to SQL (transformed to binary)
Decompress: Binary -> string representation of binary -> base64 decode -> decompress -> UTF8.GetString
Your issue is that the "string representation of binary" step is not the inverse of the "send to SQL (transformed to binary)" step. If you are storing this as a varbinary, you should return the byte array from Compress, and Decompress should take in a byte array.
public static byte[] Compress(string text)
{
    //Snip
}
public static string Decompress(byte[] compressedText)
{
    //Snip
}
This changes your process to:
Compress : Text -> UTF8.GetBytes -> compress -> Send to Sql
Decompress: Binary -> decompress -> UTF8.GetString
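A minimal sketch of that byte[]-based pair, assuming a varbinary column and keeping the 4-byte uncompressed-length prefix from the original code:

public static byte[] Compress(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);
    using (var memoryStream = new MemoryStream())
    {
        // Reserve 4 bytes for the uncompressed length, then append the gzip data.
        memoryStream.Write(BitConverter.GetBytes(buffer.Length), 0, 4);
        using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
        {
            gZipStream.Write(buffer, 0, buffer.Length);
        }
        return memoryStream.ToArray(); // store this byte[] in the varbinary column
    }
}

public static string Decompress(byte[] compressed)
{
    int dataLength = BitConverter.ToInt32(compressed, 0);
    using (var memoryStream = new MemoryStream(compressed, 4, compressed.Length - 4))
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
    {
        var buffer = new byte[dataLength];
        int read = 0;
        while (read < dataLength) // GZipStream.Read may return fewer bytes than requested
        {
            int n = gZipStream.Read(buffer, read, dataLength - read);
            if (n == 0) break;
            read += n;
        }
        return Encoding.UTF8.GetString(buffer, 0, read);
    }
}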