Updating large binary files - C#

There are a number of large binary files (>1 GB each) that need to be partially updated with no length changes. How do I update them as fast as possible? It looks like writes are being buffered before being saved to disk when I use a standard FileStream call. I would like to set certain bytes directly on the file system.
Thanks.
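For what it's worth, here is a minimal sketch of in-place patching with a plain FileStream, assuming you know the offsets to change; FileOptions.WriteThrough and a minimal buffer size are one way to reduce buffering (the method and parameter names are illustrative):

using System;
using System.IO;

static class Patcher
{
    // Overwrites bytes in place at 'offset' without changing the file length.
    // FileOptions.WriteThrough asks the OS to bypass its write cache, and
    // bufferSize: 1 keeps FileStream's own internal buffering to a minimum.
    public static void PatchBytes(string path, long offset, byte[] patch)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Write,
            FileShare.None, bufferSize: 1, FileOptions.WriteThrough);
        fs.Seek(offset, SeekOrigin.Begin);
        fs.Write(patch, 0, patch.Length);
    }
}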

Related

Zipping a large amount of data into an output stream without loading all the data into memory first in C#

I have a C# program that generates a bunch of short (10 seconds or so) video files. These are stored in an Azure file storage blob. I want the user to be able to download these files at a later date as a zip. However, it would take a substantial amount of memory to load the entire collection of video files into memory to create the zip. I was wondering if it is possible to pull data from a stream into memory, zip-encode it, output it to another stream, and dispose of it before moving on to the next segment of data.
Let's say the user has generated 100 10 MB videos. If possible, this would allow me to send the zip to the user without first loading the entire 1 GB of footage into memory (or storing the entire zip in memory after the fact).
The individual videos are pretty small, so if I need to load an entire file into memory at a time, that is fine, as long as I can remove it from memory after it has been encoded and transmitted, before moving on to the next file.
Yes, it is certainly possible to stream files in, without requiring any of them to be entirely in memory at any one time, and to compress, stream out, and transmit a zip file containing them, without holding the entire zip file either in memory or in mass storage. The zip format is designed to be streamable. However, I am not aware of a library that will do that for you.
ZipFile would require saving the entire zip file before transmitting it. If you're ok with saving the zip file in mass storage (not memory) before transmitting, then use ZipFile.
To write your own zip streamer, you would need to generate the zip file format manually. The zip format is documented here. You can use DeflateStream to do the actual compression and Crc32 to compute the CRC-32s. You would transmit the local header before each file's compressed data, followed by a data descriptor after each. You would save the local header information in memory as you go along, and then transmit the central directory and end record after all of the local entries.
zip is a relatively straightforward format, so while it would take a little bit of work, it is definitely doable.
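As a rough illustration of the per-entry plumbing described above, here is a sketch using DeflateStream and the Crc32 type from the System.IO.Hashing NuGet package; it computes the three numbers a data descriptor needs, while the actual header/descriptor byte layouts are left to the spec:

using System;
using System.IO;
using System.IO.Compression;
using System.IO.Hashing; // Crc32 lives in the System.IO.Hashing NuGet package

static class ZipPlumbing
{
    // Compresses 'source' onto 'destination' and returns the numbers a zip
    // data descriptor needs: CRC-32, compressed size, and uncompressed size.
    public static (uint Crc, long Compressed, long Uncompressed)
        CompressEntry(Stream source, Stream destination)
    {
        var crc = new Crc32();
        long uncompressed = 0;
        long start = destination.Position; // assumes a seekable destination;
                                           // otherwise wrap it in a counting stream

        using (var deflate = new DeflateStream(destination,
                   CompressionLevel.Optimal, leaveOpen: true))
        {
            var buffer = new byte[81920];
            int n;
            while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                crc.Append(buffer.AsSpan(0, n)); // CRC over the *uncompressed* bytes
                uncompressed += n;
                deflate.Write(buffer, 0, n);     // compressed bytes flow straight out
            }
        }
        return (crc.GetCurrentHashAsUInt32(),
                destination.Position - start, uncompressed);
    }
}

Writing the local header (with the sizes and CRC deferred via general-purpose bit 3), the data descriptor, and the central directory around each call is then a matter of following the documented record layouts.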

How to efficiently write multiple data ranges from one file on the internet simultaneously into one file

I want to have multiple network stream threads writing/downloading into one file simultaneously.
So e.g. you have one file and download the ranges:
0-1000
1001-2002
2003-3004...
And I want them all to write their received bytes into one file as efficiently as possible.
Right now I am downloading each range into its own file and combining the parts into the final file once they are all finished.
I would like them all to write into one file, if possible, to reduce disk usage; I feel like this could be done better.
You could use persisted memory-mapped files; see https://learn.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
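A minimal sketch of that approach, with the ranges from the question and a placeholder for the actual HTTP range download:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

class ParallelRangeWriter
{
    static void Main()
    {
        const long fileLength = 3005; // total size, assumed known up front

        // A persisted memory-mapped file backed by the target file on disk.
        using var mmf = MemoryMappedFile.CreateFromFile(
            "download.bin", FileMode.Create, mapName: null, fileLength);

        var ranges = new (long Offset, int Length)[]
        {
            (0, 1001), (1001, 1002), (2003, 1002)
        };

        Parallel.ForEach(ranges, range =>
        {
            // Each worker gets its own view over just its byte range, so the
            // writes never overlap and need no locking.
            using var view = mmf.CreateViewAccessor(range.Offset, range.Length);
            byte[] buffer = Download(range.Offset, range.Length);
            view.WriteArray(0, buffer, 0, buffer.Length);
        });
    }

    // Placeholder for the actual HTTP range request.
    static byte[] Download(long offset, int length) => new byte[length];
}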

DotNetZip - Calculate final zip size before calling Save(stream)

When using DotNetZip, is it possible to get what the final zip file size will be before calling Save(stream)? I have a website where users will be downloading fairly large zip files (over 2 GB), and I would like to be able to stream the file to the user rather than buffering the entire file into memory. Something like this...
response.BufferOutput = false;
response.AddHeader("Content-Length", ????);
Is this possible?
If the stream is homogeneous, you could spend some time compressing a 'small' portion ahead of time, calculating the compression ratio, and extrapolating from that.
If you mean to set a Content-Length header or something like that, it can only be done when you (1) write a temporary file (advisable anyway if there is any risk of connection trouble and clients requesting specific chunks) or (2) can keep the entire file in memory (presumably only on a 64-bit system with copious memory).
Of course, you could waste enormous resources and just compress the stream twice, but I hope you agree that would be silly.
The way to do what you want is to save the file to a temporary filesystem file, then stream the result to the user. This lets you compute the size then transmit the file.
In this case DotNetZip will not buffer the file in memory.
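A sketch of that temp-file approach with DotNetZip, assuming classic ASP.NET ('response' is the HttpResponse and 'files' is a hypothetical collection of paths):

using System.IO;
using Ionic.Zip; // DotNetZip

string tempPath = Path.GetTempFileName();
try
{
    using (var zip = new ZipFile())
    {
        zip.AddFiles(files); // hypothetical collection of file paths
        zip.Save(tempPath);  // the zip is written to disk, not memory
    }

    long length = new FileInfo(tempPath).Length;
    response.BufferOutput = false;
    response.AddHeader("Content-Length", length.ToString());
    response.ContentType = "application/zip";
    response.TransmitFile(tempPath); // streams the file without buffering it
}
finally
{
    File.Delete(tempPath);
}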

MemoryStream "out of memory" C#

I have an implementation of a custom DataObject (virtual file); see here. I have drag-and-drop functionality in a control view (dragging a file OUT of a control view without a temp local file).
This works fine with smaller files, but as soon as the file is larger than, say, 12-15 MB, it says there is not enough memory available. It seems like the MemoryStream is out of memory.
What can I do about this? Can I somehow split a larger byte[] into several MemoryStreams and reassemble those into a single file?
Any help would be highly appreciated.
Can I somehow split a larger byte[] into several MemoryStreams and reassemble those into a single file?
Yes.
When I had to deal with a similar situation, I built my own stream that internally used byte arrays of 4 MB. This "paging" means it never has to allocate one large contiguous byte array, which is what MemoryStream does. So, drop MemoryStream and build your own stream based on another internal storage mechanism.
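A minimal sketch of such a paged stream (the page size and the subset of Stream members implemented are my own choices):

using System;
using System.Collections.Generic;
using System.IO;

public class ChunkedMemoryStream : Stream
{
    private const int PageSize = 4 * 1024 * 1024; // 4 MB pages
    private readonly List<byte[]> _pages = new List<byte[]>();
    private long _length;
    private long _position;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => true;
    public override long Length => _length;
    public override long Position { get => _position; set => _position = value; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int page = (int)(_position / PageSize);
            int pageOffset = (int)(_position % PageSize);
            while (_pages.Count <= page)
                _pages.Add(new byte[PageSize]); // allocate pages lazily

            int chunk = Math.Min(count, PageSize - pageOffset);
            Buffer.BlockCopy(buffer, offset, _pages[page], pageOffset, chunk);
            _position += chunk;
            offset += chunk;
            count -= chunk;
        }
        if (_position > _length) _length = _position;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = (int)Math.Min(count, _length - _position);
        int read = 0;
        while (read < total)
        {
            int page = (int)(_position / PageSize);
            int pageOffset = (int)(_position % PageSize);
            int chunk = Math.Min(total - read, PageSize - pageOffset);
            Buffer.BlockCopy(_pages[page], pageOffset, buffer, offset + read, chunk);
            _position += chunk;
            read += chunk;
        }
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        _position = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            _ => _length + offset,
        };
        return _position;
    }

    public override void SetLength(long value) => _length = value;
    public override void Flush() { }
}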

Options for header in raw byte file

I have a large raw data file (up to 1GB) which contains raw samples from a USB data logger.
I need to store extra information relating to the file (sample rate, description, trigger point, last seek position, etc.) and was looking into adding this as some sort of header.
The header should ideally be human-readable and flexible, so I've so far ruled out any sort of binary serialization into a header.
I also want to avoid two separate files, as they could end up separated when copied or backed up. I remember somebody telling me that the newer *.*x Microsoft Office documents are actually a number of files in a zip. Is there a simple way to achieve this? Could I still keep the quick seek times into the raw file?
Update
I started with the binary serializer and found it to be a pain. I ended up using the XML serializer, as I'm more comfortable with it.
I reserve some space at the start of the file for the XML. Simple.
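A minimal sketch of that layout, assuming a fixed reserved block with a small length prefix (the header type and sizes are illustrative):

using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical header type holding the metadata from the question.
public class LogHeader
{
    public int SampleRate { get; set; }
    public string Description { get; set; }
    public long TriggerPoint { get; set; }
    public long LastSeekPosition { get; set; }
}

public static class HeaderBlock
{
    const int HeaderSize = 4096; // reserved region; raw samples start at this offset

    public static void WriteHeader(string path, LogHeader header)
    {
        byte[] xml;
        using (var ms = new MemoryStream())
        {
            new XmlSerializer(typeof(LogHeader)).Serialize(ms, header);
            xml = ms.ToArray();
        }
        if (xml.Length > HeaderSize - sizeof(int))
            throw new InvalidOperationException("Header outgrew the reserved block.");

        using var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write);
        fs.Write(BitConverter.GetBytes(xml.Length), 0, sizeof(int)); // length prefix
        fs.Write(xml, 0, xml.Length);                                // human-readable XML
    }

    public static LogHeader ReadHeader(string path)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        using var reader = new BinaryReader(fs);
        int xmlLength = reader.ReadInt32();
        using var ms = new MemoryStream(reader.ReadBytes(xmlLength));
        return (LogHeader)new XmlSerializer(typeof(LogHeader)).Deserialize(ms);
    }
}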
When you say you want to make the header human-readable, this suggests opening the file in a text editor. Do you really want to do that, considering the file size and (I'm assuming) that the remainder of the file is non-human-readable binary data? If so, just write the text header data to the start of the binary file; it will be visible when the file is opened, but of course the remainder of the file will look like garbage.
You could create an uncompressed ZIP archive, which may allow you to seek directly to the binary data. See this for information on creating a ZIP archive: http://weblogs.asp.net/jgalloway/archive/2007/10/25/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib.aspx
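If you are on .NET 4.5 or later, the System.IO.Compression API (which postdates the linked article) makes the uncompressed-archive idea a one-liner per entry; the file names here are hypothetical:

using System.IO.Compression;

// Store both parts with NoCompression so the raw samples remain a
// contiguous, seekable run of bytes inside the archive.
using (var archive = ZipFile.Open("logger.zip", ZipArchiveMode.Create))
{
    archive.CreateEntryFromFile("header.xml", "header.xml",
        CompressionLevel.NoCompression);
    archive.CreateEntryFromFile("samples.raw", "samples.raw",
        CompressionLevel.NoCompression);
}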
