Modified MemoryStream byte length less than when it's saved - C#

I'm loading a binary file into a memory stream, modifying the bytes and then storing the file to disk. To save time, I retain the modified byte array to calculate a checksum. When I load the saved file from disk and calculate the checksum, the file length is about 150 bytes different from the original byte length when it was saved, and obviously the checksum doesn't match the one from before it was saved. Any ideas as to why this happens? I've searched and searched for clues, but it looks like I'd have to reload the file after it was saved to calculate an accurate checksum.
Also note that the shorter byte array renders its contents correctly, and so does the longer one; in fact, the two arrays render identically!
Here's the code that collects the modified bytes from the memory stream:
writerStream.Flush();
storedFile = new Byte[writerStream.Length];
writerStream.Position = 0;
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
And here's how I read the file:
using (BinaryReader readFile = new BinaryReader(Delimon.Win32.IO.File.Open(filePath, Delimon.Win32.IO.FileMode.Open)))
{
byte[] cgmBytes = readFile.ReadBytes(Convert.ToInt32(readFile.BaseStream.Length));
hash = fileCheck.ComputeHash(cgmBytes);
}
And here's how the file is saved:
using (BinaryWriter aWriter = new BinaryWriter(Delimon.Win32.IO.File.Create(filePath)))
{
aWriter.Write(storedFile);
}
Any suggestions would be much appreciated.
Thx

The problem seems to have resolved itself by simply changing the point where the stream position is set:
writerStream.Flush();
writerStream.Position = 0;
storedFile = new Byte[writerStream.Length];
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
In the previous code the Position was set after reading the stream Length; now the Position is set before reading the Length. In either case the byte length DOES NOT CHANGE, but the saved file, when retrieved, now comes back with the identical byte length. Why? Not sure: setting the stream position does not affect the stream length, nor should it affect how a newly instantiated writer decides to save the byte array. Gremlins?...
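As an aside, if writerStream is a MemoryStream (a sketch under that assumption), ToArray() sidesteps the ordering question entirely: it copies exactly Length bytes from the start of the underlying buffer regardless of the current Position.
writerStream.Flush();
// ToArray copies Length bytes from the start of the buffer, independent of Position
storedFile = writerStream.ToArray();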

Related

Hash from different files is always the same

I'm building an API which has a method that accepts a file via a POST request.
Based on that file, I need to create a hash of the file itself (not the name), check whether the hash already exists, and do some other actions.
My problem is that whatever file I send through Postman, the hash is always the same for every file, which means that every time I end up with only 1 file, which gets overwritten.
Here is my method
private string GetHashFromImage(IFormFile file)
{
/* Creates a hash with the image as a parameter
* with the SHA1 algorithm and returns the hash
* as a string since the ComputeHash() method
* creates a byte array.
*/
System.IO.MemoryStream image = new System.IO.MemoryStream();
file.CopyTo(image);
var hashedValue = System.Security.Cryptography.SHA1.Create().ComputeHash(image);
var hashAsString = Convert.ToBase64String(hashedValue).Replace(@"/", @"");
image.Seek(0, System.IO.SeekOrigin.Begin);
return hashAsString;
}
I need a hash method that is agnostic to the OS and will return the same hash for each file.
Not entirely sure why your solution is not working, but I think I have an idea of how to achieve what you want, and it uses MD5 instead of SHA1.
Let's create a function that receives an IFormFile, computes the MD5 hash of its contents, then returns the hash value as a string.
using System;
using System.IO;
using System.Security.Cryptography;
private string GetMD5Hash(IFormFile file)
{
    // copy the uploaded file's stream into a MemoryStream
    using (MemoryStream stream = new MemoryStream())
    using (MD5 md5 = MD5.Create())
    {
        file.OpenReadStream().CopyTo(stream);
        // compute the MD5 hash of the file's byte array
        // (ToArray is position-independent, unlike hashing the stream itself)
        byte[] bytes = md5.ComputeHash(stream.ToArray());
        return BitConverter.ToString(bytes).Replace("-", string.Empty).ToLower();
    }
}
Hope it works for you!
The real reason for this behaviour is the stream's position: it is at the end (the same position as after image.Seek(0, System.IO.SeekOrigin.End)) when the hash is calculated.
Stream operations like CopyTo, ComputeHash, etc. change the position of streams because they have to iterate through them. The final hash of any stream whose position is at the end is always the same: it is the hash of an empty stream or empty array.
Converting the stream to an array works, of course, because ToArray works with the whole stream (from position = 0), but it is generally not a very elegant solution because you have to copy the whole stream into memory (for a MemoryStream the data is already in memory anyway).
When you work directly with a stream, a function like ComputeHash reads the stream in small chunks (e.g. 4096 bytes) and computes the hash iteratively (see the .NET source code). It means the original solution should work as long as the seek back to the start is performed before the hash calculation.
Actually, you should be able to compute the hash directly from the input stream (from IFormFile) without copying the whole stream into memory (an array or MemoryStream), with better performance and without the risk of, e.g., an OutOfMemoryException.
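For example, a minimal sketch of that idea, hashing the upload's stream directly without any intermediate copy (SHA1 is kept here only to match the original question):
private string GetHashFromImage(IFormFile file)
{
    using (var stream = file.OpenReadStream())
    using (var sha1 = System.Security.Cryptography.SHA1.Create())
    {
        // ComputeHash(Stream) reads the stream in chunks, so the file is
        // never fully buffered in memory
        byte[] hashedValue = sha1.ComputeHash(stream);
        return Convert.ToBase64String(hashedValue).Replace(@"/", @"");
    }
}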

Difficulty reading large file into byte array

I have a very large BMP file that I have to read in all at once because I need to reverse the bytes when writing it to a temp file. This BMP is 1.28GB, and I'm getting the "Out of memory" error. I can't read it completely (using ReadAllBytes) or using a buffer into a binary array because I can't initialize an array of that size. I also can't read it into a List (which I could then Reverse()) using a buffer because halfway through it runs out of memory.
So basically the question is, how do I read a very large file backwards (ie, starting at LastByte and ending at FirstByte) and then write that to disk?
Bonus: when writing the reversed file to disk, do not write the last 54 bytes.
With a StreamReader object you can Seek on its BaseStream (place the "cursor") at any particular byte, so you can use that to go over the entire file's contents in reverse.
Example:
const int bufferSize = 1024;
string fileName = "yourfile.txt";
StreamReader myStream = new StreamReader(fileName);
long position = myStream.BaseStream.Length;
char[] bytes = new char[bufferSize];
while (position > 0)
{
    bytes.Initialize();
    // step back one buffer (clamped at the start of the file) and read from there
    position = Math.Max(0, position - bufferSize);
    myStream.BaseStream.Seek(position, SeekOrigin.Begin);
    myStream.DiscardBufferedData();   // keep the reader in sync with the new position
    int bytesRead = myStream.Read(bytes, 0, bufferSize);
}
You cannot normally handle files this big in .NET, because of the memory limits the CLR places on applications and on single objects/collections, on both 32-bit and 64-bit platforms.
For this you can use a memory-mapped file to read the file directly from disk without loading it into memory. Once the memory mapping is created, move the read pointer to the end of the file and read backwards.
Hope this helps.
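For illustration, a minimal sketch of that idea (the file names are placeholders, and it assumes the 54 bytes to drop are the original BMP header, which would otherwise end up at the end of the reversed output):
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

const int chunkSize = 1024 * 1024;                       // 1 MB chunks
long length = new FileInfo("huge.bmp").Length;

using (var mmf = MemoryMappedFile.CreateFromFile("huge.bmp", FileMode.Open))
using (var accessor = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
using (var output = File.Create("reversed.bmp"))
{
    byte[] chunk = new byte[chunkSize];
    long position = length;
    const long stopAt = 54;                              // skip the original header
    while (position > stopAt)
    {
        int read = (int)Math.Min(chunkSize, position - stopAt);
        position -= read;
        accessor.ReadArray(position, chunk, 0, read);    // read a chunk near the end
        Array.Reverse(chunk, 0, read);                   // reverse the byte order within it
        output.Write(chunk, 0, read);                    // append to the reversed output
    }
}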
You can use Memory Mapped Files.
http://msdn.microsoft.com/en-us/library/vstudio/dd997372%28v=vs.100%29.aspx
Also, you can use a FileStream and move to the necessary position with stream.Seek(offset, SeekOrigin.Begin/Current/End) or by setting the Position property.
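For example, a minimal sketch of reading the file backwards in fixed-size chunks with a FileStream (the path is a placeholder):
const int chunkSize = 1024 * 1024;
byte[] chunk = new byte[chunkSize];

using (FileStream stream = File.OpenRead("huge.bmp"))
{
    long position = stream.Length;
    while (position > 0)
    {
        int toRead = (int)Math.Min(chunkSize, position);
        position -= toRead;
        stream.Position = position;                      // absolute positioning
        int read = stream.Read(chunk, 0, toRead);
        // process chunk[0..read) here, e.g. reverse it and write it to the output file
    }
}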

Remove item from binary file

What's the best and the fastest method to remove an item from a binary file?
I have a binary file and I know that I need to remove B bytes starting at position A; how do I do it?
Thanks
You might want to consider working in batches to prevent allocation on the LOH, but that depends on the size of your file and the frequency with which you call this logic.
long skipIndex = 100;
int skipLength = 40;
using (FileStream fileStream = File.Open("file.dat", FileMode.Open))
{
int bufferSize;
checked
{
bufferSize = (int)(fileStream.Length - (skipLength + skipIndex));
}
byte[] buffer = new byte[bufferSize];
// read all data after
fileStream.Position = skipIndex + skipLength;
fileStream.Read(buffer, 0, bufferSize);
// write to displacement
fileStream.Position = skipIndex;
fileStream.Write(buffer, 0, bufferSize);
fileStream.SetLength(fileStream.Position); // trim the file
}
Depends... There are a few ways to do this, depending on your requirements.
The basic solution is to read chunks of data from the source file into a target file, skipping over the bits that must be removed (is it always only one segment to remove, or multiple segments?). After you're done, delete the original file and rename the temp file to the original's name.
Things to keep in mind here are that you should tend towards larger chunks rather than smaller. The size of your files will determine a suitable value. 1MB is a good 'default'.
The simple approach assumes that deleting and renaming a new file is not a problem. If you have specific permissions attached to the file, or used NTFS streams or some-such, this approach won't work.
In that case, make a copy of the original file. Then skip to the first byte after the segment to ignore in the copied file, skip to the start of the segment in the source file, and transfer the bytes from the copy back to the original. If you're using Streams, you'll want to call Stream.SetLength to truncate the original to the correct size.
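A minimal sketch of the basic copy-to-a-temp-file solution described above, assuming a single segment to remove that starts at position with length length (hypothetical names):
static void RemoveSegment(string path, long position, int length)
{
    string tempPath = path + ".tmp";
    byte[] buffer = new byte[1024 * 1024];               // 1 MB chunks, as suggested above

    using (FileStream source = File.OpenRead(path))
    using (FileStream target = File.Create(tempPath))
    {
        // copy everything before the segment, then everything after it
        CopyRange(source, target, 0, position, buffer);
        CopyRange(source, target, position + length, source.Length - (position + length), buffer);
    }

    // replace the original with the rewritten file
    File.Delete(path);
    File.Move(tempPath, path);
}

static void CopyRange(FileStream source, FileStream target, long offset, long count, byte[] buffer)
{
    source.Position = offset;
    while (count > 0)
    {
        int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
        if (read == 0) break;                            // unexpected end of stream
        target.Write(buffer, 0, read);
        count -= read;
    }
}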
If you want to just rewrite the original file, and remove a sequence from it the best way is to "rearrange" the file.
The idea is:
for i = A+1 to file.length - B
file[i] = file[i+B]
For better performance it's best to read and write in chunks rather than single bytes. Test with different chunk sizes to see what works best for your target system.
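A minimal sketch of that in-place rearrangement, done in chunks (using the A and B names from the idea above; the path is a placeholder):
static void RemoveSegmentInPlace(string path, long A, int B)
{
    byte[] buffer = new byte[1024 * 1024];               // tune this chunk size for your system

    using (FileStream stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
    {
        long readPos = A + B;                            // where the bytes to keep start
        long writePos = A;                               // where they should be moved to

        while (readPos < stream.Length)
        {
            stream.Position = readPos;
            int read = stream.Read(buffer, 0, buffer.Length);

            stream.Position = writePos;
            stream.Write(buffer, 0, read);

            readPos += read;
            writePos += read;
        }

        stream.SetLength(stream.Length - B);             // trim the duplicated tail
    }
}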

how to convert Image to string the most efficient way?

I want to convert an image file to a string. The following works:
MemoryStream ms = new MemoryStream();
Image1.Save(ms, ImageFormat.Jpeg);
byte[] picture = ms.ToArray();
string formmattedPic = Convert.ToBase64String(picture);
However, when saving this to an XmlWriter, it takes ages before it's saved (20 seconds for a 26 KB image file). Is there a way to speed this up?
Thanks,
Raks
There are three points where you are doing large operations needlessly:
Getting the stream's bytes
Converting it to Base64
Writing it to the XmlWriter.
Instead, first call Length and GetBuffer. This lets you operate on the stream's buffer directly (do flush it first, though).
Then, implement base-64 yourself. It's relatively simple: you take groups of 3 bytes, do some bit-twiddling to get the index of the character each group will be converted to, and then output that character. At the very end you add = symbols according to how many bytes were in the last partial block (== for one remainder byte, = for two remainder bytes, and none if there was no partial block).
Do this writing into a char buffer (a char[]). The most efficient size is a matter for experimentation, but I'd start with 2048 characters. When you've filled the buffer, call XmlWriter.WriteRaw on it, and then start writing back at index 0 again.
This way, you're doing fewer allocations, and you start producing output the moment you've got your image loaded into the memory stream. Generally, this should result in better throughput.
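If you'd rather not hand-roll the base-64 encoder, a sketch of the same idea using the built-in XmlWriter.WriteBase64 (which streams the encoding in chunks straight into the writer) looks like this; image and writer are assumed to be your Image and XmlWriter:
using (MemoryStream ms = new MemoryStream())
{
    image.Save(ms, ImageFormat.Jpeg);
    ms.Flush();

    // GetBuffer returns the underlying buffer without copying;
    // only the first ms.Length bytes are valid data
    byte[] buffer = ms.GetBuffer();
    writer.WriteBase64(buffer, 0, (int)ms.Length);
}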

How to read a binary file into an array of bytes?

I have a binary file which I am reading to a collection of byte arrays.
The file contains multiple (arbitrary number) of records. Essentially a block of bytes. Each record is of arbitrary length.
The header of the file provides the offsets of each of the records.
record 0: offset 2892
record 1: offset 4849
....
record 98: offset 328932
record 99: offset 338498
I have written code to loop and read each record into its byte array. Looking at the difference in offsets gives me the record size; a Seek to the offset and then a call to ReadBytes() reads the record into its array.
My current incomplete solution won't work for the last record. How would you read that last record into an array (remember, it is of arbitrary length)?
As for why: each record is encrypted and needs to be decrypted separately. I am writing code that reads each record into a byte array, decrypts it, and then writes all the records back to a file.
Code added at request:
//recordOffsets contain byte location of each record start. All headers (other than universal header) are contained within record 0.
recordBlocks = new List<RecordBlock>();
//store all recordOffsets. Record0 offset will be used to load rest of headers. Remaining are used to parse text of eBook.
for (int i = 0; i < standardHeader.numRecs; i++)
{
RecordBlock r = new RecordBlock();
r.offset = bookReader.ReadInt32(EndianReader.Endian.BigEndian);
r.number = bookReader.ReadInt32(EndianReader.Endian.BigEndian);
recordBlocks.Add(r);
}
foreach (RecordBlock r in recordBlocks)
{
if (r.number == recordBlocks.Count)
{
///deal with last record
}
else
{
r.size = recordBlocks[(r.number) + 1].offset - r.offset;
}
bookReader.Seek(r.offset, SeekOrigin.Begin);
r.data = bookReader.ReadBytes(r.size);
}
System.IO.File.ReadAllBytes() will read all the bytes into a byte array, and after that you can read from that byte array record by record.
You could use the Length property of the FileInfo class to determine the total number of bytes, so that you can calculate the number of bytes in the last record as well.
That way you can keep most of your current logic.
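For example, a minimal sketch of that idea inside the existing loop (bookFilePath is a hypothetical variable holding the path the bookReader was opened on, and record numbers are assumed to run from 0 to numRecs - 1):
if (r.number == recordBlocks.Count - 1)
{
    // the last record runs from its offset to the end of the file
    long fileLength = new System.IO.FileInfo(bookFilePath).Length;
    r.size = (int)(fileLength - r.offset);
}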
Your problem, as it seems to me, is how you will get the actual record size of the last record.
You could add this information to the header explicitly; then your code should work like a charm, in my view.
I'm not a .NET guy, but it would seem you have a couple of options. There has to be a way to tell the size of the file; if you can find that, you can read everything. Alternatively, the MSDN description for BinaryReader.ReadBytes() says that if you ask for more than the stream contains, you'll get whatever is in the file. Do you know the max size of the blob you're reading? If so, just read that into pre-cleared memory.
