Streaming a large file into a database BLOB field

Streaming a large file into a database BLOB field - c#

I have a large file on disk whose contents I need to store in a SqlServer database as a VARBINARY(MAX) field. By "contents", I mean using something like File.ReadAllBytes(). (I understand the pitfalls of this, but it's the current mode of operations, so I have to deal with it for now.)
I found this answer, which provides a way to stream a large Byte[] by using UPDATE.WRITE:
How to I serialize a large graph of .NET object into a SQL Server BLOB without creating a large buffer?
However, simply reading the contents of a large file into a Byte[] in memory leads to this problem:
OutOfMemoryException when I read 500MB FileStream
I might be overlooking something obvious, but I just can't work out how should I go about getting from a large file on disk, to the resulting storage into the database.

Using some hints I got from these two pages, I have an answer that works:
http://www.syntaxwarriors.com/2013/stream-varbinary-data-to-and-from-mssql-using-csharp/ (this looks a lot like the serialize SO answer, but there's more here...not sure who copied who!).
How do I copy the contents of one stream to another?
Basically, it uses the same methodology as the answer about serializing Blobs, but instead of using BinaryFormatter (a class I'm not fond of anyhow), it creates a FileStream that takes the path to the file, and an extension method to copy that stream into the target stream, or BlobStream, as the example named it.
Here's the extension:
public static class StreamEx
{
public static void CopyTo(this Stream Input, Stream Output)
{
var buffer = new Byte[32768];
Int32 bytesRead;
while ((bytesRead = Input.Read(buffer, 0, buffer.Length)) > 0)
Output.Write(buffer, 0, bytesRead);
}
}
So the trick was to link two streams, copying the data from one to another in chunked fashion, as noted in the comments.

Related

What is the simplest way to decompress a ZIP buffer in C#?

When I use zlib in C/C++, I have a simple method uncompress which only requires two buffers and no more else. Its definition is like this:
int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source,
uLong sourceLen);
/*
Decompresses the source buffer into the destination buffer. sourceLen is the byte length of the source buffer. Upon entry,
destLen is the total size of the destination buffer, which must be
large enough to hold the entire uncompressed data. (The size of
the uncompressed data must have been saved previously by the
compressor and transmitted to the decompressor by some mechanism
outside the scope of this compression library.) Upon exit, destLen
is the actual size of the uncompressed data.
uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete.
In the case where there is not enough room, uncompress() will fill
the output buffer with the uncompressed data up to that point.
*/
I want to know if C# has a similar way. I checked SharpZipLib FAQ as follows but did not quite understand:
How do I compress/decompress files in memory?
Use a memory stream when creating the Zip stream!
MemoryStream outputMemStream = new MemoryStream();
using (ZipOutputStream zipOutput = new ZipOutputStream(outputMemStream)) {
// Use zipOutput stream as normal
...
You can get the resulting data with memory stream methods ToArray or GetBuffer.
ToArray is the cleaner and easiest to use correctly with the penalty
of duplicating allocated memory. GetBuffer returns a raw buffer raw
and so you need to account for the true length yourself.
See the framework class library help for more information.
I can't figure out if this block of code is for compression or decompression, if outputMemStream meas a compressed stream or an uncompressed stream. I really hope there is a easy-to-understand-way like in zlib. Thanks you very much if you can help me.

Check out the ZipArchive class, which I think has the features you need to accomplish in-memory decompression of zip files.
Assuming you have an array of bytes (byte []) which represent the ZIP file in memory, you have to instantiate a ZipArchive object which will be used to read that array of bytes and interpret them as the ZIP file you whish to load. If you check the ZipArchive class' available constructors in documentation, you will see that they require a stream object from which the data will be read. So, first step would be to convert your byte [] array to a stream that can be read by the constructors, and you can do this by using a MemoryStream object.
Here's an example of how to list all entries inside of a ZIP archive represented in memory as a bytes array:
byte [] zipArchiveBytes = ...; // Read the ZIP file in memory as an array of bytes
using (var inputStream = new MemoryStream(zipArchiveBytes))
using (var zipArchive = new ZipArchive(inputStream, ZipArchiveMode.Read))
{
Console.WriteLine("Listing archive entries...");
foreach (var archiveEntry in zipArchive.Entries)
Console.WriteLine($" {archiveEntry.FullName}");
}
Each file in the ZIP archive will be represented as a ZipArchiveEntry instance. This class offers properties which allow you to retrieve information such as the original length of a file from the ZIP archive, its compressed length, its name, etc.
In order to read a specific file which is contained inside the ZIP file, you can use ZipArchiveEntry.Open(). The following exemplifies how to open a specific file from an archive, if you have its FullName inside the ZIP archive:
ZipArchiveEntry archEntry = zipArchive.GetEntry("my-folder-inside-zip/dog-picture.jpg");
byte[] readResult;
using (Stream entryReadStream = archEntry.Open())
{
using (var tempMemStream = new MemoryStream())
{
entryReadStream.CopyTo(tempMemStream);
readResult = tempMemStream.ToArray();
}
}
This example reads the given file contents, and returns them as an array of bytes (stored in the byte[] readResult variable) which you can then use according to your needs.

MemoryStream - OutOfMemoryException when trying to allocate space

I'm attempting to take a large file, uploaded from a web app, and make it a memorystream for processing later. I was receiving OutOfMemory exceptions when trying to copy the HttpPostedFileBase's inputstream into a new MemoryStream. During troubleshooting, I tried just creating a new MemoryStream and allocate the same amount of space (roughly) as the length of the InputStream (935,638,275), like so:
MemoryStream memStream = new MemoryStream(935700000);
Even doing this results in a System.OutOfMemoryException on this line.
I only slightly understand MemoryStreams, and this seems to be something to do with how MemoryStreams buffer data. Is there a way for me to get all of the data into one MemoryStream without too much fuss?

I am not sure what the processing involves, but the HttpPostedFileBase already contains a stream with the data. You can use that stream to process what you need to do.
If you really need to move back and forth or multiple times over the stream, and the input stream does not support seeking/positioning, you may want to stream the data to a temporary local file first and then use a file stream to do your processing against that file.
If many people uploading via your web app, the array size you specified would quickly eat up all memory using a MemoryStream.

Principles behind FileStreaming

I've been working on a project recently that involves a lot of FileStreaming, something which I've not really touched on before.
To try and better acquaint myself with the principles of such methods, I've written some code that (theoretically) downloads a file from one dir to another, and gone through it step by step, commenting in my understanding of what each step achieves, like so...
Get fileinfo object from DownloadRequest Object
RemoteFileInfo fileInfo = svr.DownloadFile(request);
DownloadFile method in WCF Service
public RemoteFileInfo DownloadFile(DownloadRequest request)
{
RemoteFileInfo result = new RemoteFileInfo(); // create empty fileinfo object
try
{
// set filepath
string filePath = System.IO.Path.Combine(request.FilePath , #"\" , request.FileName);
System.IO.FileInfo fileInfo = new System.IO.FileInfo(filePath); // get fileinfo from path
// check if exists
if (!fileInfo.Exists)
throw new System.IO.FileNotFoundException("File not found",
request.FileName);
// open stream
System.IO.FileStream stream = new System.IO.FileStream(filePath,
System.IO.FileMode.Open, System.IO.FileAccess.Read);
// return result
result.FileName = request.FileName;
result.Length = fileInfo.Length;
result.FileByteStream = stream;
}
catch (Exception ex)
{
// do something
}
return result;
}
Use returned FileStream from fileinfo to read into a new write stream
// set new location for downloaded file
string basePath = System.IO.Path.Combine(#"C:\SST Software\DSC\Compilations\" , compName, #"\");
string serverFileName = System.IO.Path.Combine(basePath, file);
double totalBytesRead = 0.0;
if (!Directory.Exists(basePath))
Directory.CreateDirectory(basePath);
int chunkSize = 2048;
byte[] buffer = new byte[chunkSize];
// create new write file stream
using (System.IO.FileStream writeStream = new System.IO.FileStream(serverFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
do
{
// read bytes from fileinfo stream
int bytesRead = fileInfo.FileByteStream.Read(buffer, 0, chunkSize);
totalBytesRead += (double)bytesRead;
if (bytesRead == 0) break;
// write bytes to output stream
writeStream.Write(buffer, 0, bytesRead);
} while (true);
// report end
Console.WriteLine(fileInfo.FileName + " has been written to " + basePath + " - Done!");
writeStream.Close();
}
What I was hoping for is any clarification or expansion on what exactly happens when using a FileStream.
I can achieve the download, and now I know what code I need to write in order to perform such a download, but I would like to know more about why it works. I can find no 'beginner-friendly' or step by step explanations on the web.
What is happening here behind the scenes?

A stream is just an abstraction, fundamentally it works like a pointer within a collection of data.
Take the example string of "Hello World!" for example, it is just a collection of characters, which are fundamentally just bytes.
As a stream, it could be represented to have:
A length of 12 (possibly more including termination characters etc)
A position in the stream.
You read a stream by moving the position around and requesting data.
So reading the text above could be (in pseudocode) seen to be like this:
do
get next byte
add gotten byte to collection
while not the end of the stream
the entire data is now in the collection
Streams are really useful when it comes to accessing data from sources such as the file system or remote machines.
Imagine a file that is several gigabytes in size, if the OS loaded all of that into memory any time a program wanted to read it (say a video player), there would be a lot of problems.
Instead, what happens is the program requests access to the file, and the OS returns a stream; the stream tells the program how much data there is, and allows it to access that data.
Depending on implementation, the OS may load a certain amount of data into memory ahead of the program accessing it, this is known as a buffer.
Fundamentally though, the program just requests the next bit of data, and the OS either gets it from the buffer, or from the source (e.g. the file on disk).
The same principle applies to streams between different computers, except requesting the next bit of data may very well involve a trip to the remote machine to request it.
The .NET FileStream class and the Stream base class, all just defer to the windows systems for working with streams in the end, there's nothing particularly special about them, it's just what you can do with the abstraction that makes them so powerful.
Writing to a stream is just the same, but it just puts data into the buffer, ready for the requester to access.
Infinite Data
As a user pointed out, streams can be used for data of indeterminate length.
All stream operations take time, so reading a stream is typically a blocking operation that will wait until data is available.
So you could loop forever while the stream is still open, and just wait for data to come in - an example of this in practice would be a live video broadcast.

I've since located a book - C# 5.0 All-In-One For Dummies - It explains everything about all Stream classes, how they work, which one is most appropriate and more.
Only been reading about 30 minutes, already have such a better understanding. Excellent guide!

Uploading files to SQL Server without using too much memory

I am working with SQL Server 2008 and FileStream data types to save large files in the database.
So far everything is working fine but the problem is my upload method looks like this:
public static void UploadFileToDatabase(string location)
{
FileStream fs = new FileStream(location, FileMode.Open, FileAccess.Read);
byte[] data = new byte[fs.Length];
fs.Read(data, 0, System.Convert.ToInt32(fs.Length));
fs.Close();
SaveToDatabaseMethod(data)
data = null;
}
Obviously I am saving the files in memory and then uploading them to server which I think is a really bad practice so is there anyway I can at least limit the amount of memory needed for this ?
What is the best practice in my case ?

Filestream has a WriteByte method
WriteByte
See The Insert Data Example
Not sure this is best practice as it keeps memory down but adds steps. Would need to test it out. If the entire files does not load need logic to back it out.

Something like System.Diagnostics.Process.Start to run a stream

I get from server images and videos by stream.
Now I'm saving it:
Stream str = client.GetFile(path);
using (var outStream = new FileStream(#"c:\myFile.jpg", FileMode.Create))
{
var buffer = new byte[4096];
int count;
while ((count = str.Read(buffer, 0, buffer.Length)) > 0)
{
outStream.Write(buffer, 0, count);
}
}
I can be jpg, mpg, flv and a lot of other multimedia types (Before I get stream I know what is a extension of this file).
Now I want to not save it , but run direct from stream.
Examples:
I get stream which is mybirthay.avi and I call my method RunFile(stream) and I think this method should works like System.Diagnostics.Process.Start(path), so my stream should be opened by default program in my SO for example allplayer.
I get stream from myfile.jpg and it is opening by irfanview,
etc...
Is it possible ??

If you are set on not saving the stream to the disk, you could use something like Eldos' Callback File System: http://www.eldos.com/cbfs/
Or, use a ramdisk, save your stream there, and shell to that file location.

Windows puts up a wall between processes so they cannot directly access each other's memory without going through privileged debug-like API functions like ReadProcessMemory. This is what keeps the operating system stable and secure. But spells doom for what you are trying to accomplish.
You'll need to use a file. It won't be much slower than direct memory access, the file system cache takes care of that.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.