Principles behind FileStreaming - c#

I've been working on a project recently that involves a lot of FileStreaming, something which I've not really touched on before.
To try and better acquaint myself with the principles of such methods, I've written some code that (theoretically) downloads a file from one dir to another, and gone through it step by step, commenting in my understanding of what each step achieves, like so...
Get fileinfo object from DownloadRequest Object
RemoteFileInfo fileInfo = svr.DownloadFile(request);
DownloadFile method in WCF Service
public RemoteFileInfo DownloadFile(DownloadRequest request)
{
RemoteFileInfo result = new RemoteFileInfo(); // create empty fileinfo object
try
{
// set filepath
string filePath = System.IO.Path.Combine(request.FilePath , #"\" , request.FileName);
System.IO.FileInfo fileInfo = new System.IO.FileInfo(filePath); // get fileinfo from path
// check if exists
if (!fileInfo.Exists)
throw new System.IO.FileNotFoundException("File not found",
request.FileName);
// open stream
System.IO.FileStream stream = new System.IO.FileStream(filePath,
System.IO.FileMode.Open, System.IO.FileAccess.Read);
// return result
result.FileName = request.FileName;
result.Length = fileInfo.Length;
result.FileByteStream = stream;
}
catch (Exception ex)
{
// do something
}
return result;
}
Use returned FileStream from fileinfo to read into a new write stream
// set new location for downloaded file
string basePath = System.IO.Path.Combine(#"C:\SST Software\DSC\Compilations\" , compName, #"\");
string serverFileName = System.IO.Path.Combine(basePath, file);
double totalBytesRead = 0.0;
if (!Directory.Exists(basePath))
Directory.CreateDirectory(basePath);
int chunkSize = 2048;
byte[] buffer = new byte[chunkSize];
// create new write file stream
using (System.IO.FileStream writeStream = new System.IO.FileStream(serverFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
do
{
// read bytes from fileinfo stream
int bytesRead = fileInfo.FileByteStream.Read(buffer, 0, chunkSize);
totalBytesRead += (double)bytesRead;
if (bytesRead == 0) break;
// write bytes to output stream
writeStream.Write(buffer, 0, bytesRead);
} while (true);
// report end
Console.WriteLine(fileInfo.FileName + " has been written to " + basePath + " - Done!");
writeStream.Close();
}
What I was hoping for is any clarification or expansion on what exactly happens when using a FileStream.
I can achieve the download, and now I know what code I need to write in order to perform such a download, but I would like to know more about why it works. I can find no 'beginner-friendly' or step by step explanations on the web.
What is happening here behind the scenes?

A stream is just an abstraction, fundamentally it works like a pointer within a collection of data.
Take the example string of "Hello World!" for example, it is just a collection of characters, which are fundamentally just bytes.
As a stream, it could be represented to have:
A length of 12 (possibly more including termination characters etc)
A position in the stream.
You read a stream by moving the position around and requesting data.
So reading the text above could be (in pseudocode) seen to be like this:
do
get next byte
add gotten byte to collection
while not the end of the stream
the entire data is now in the collection
Streams are really useful when it comes to accessing data from sources such as the file system or remote machines.
Imagine a file that is several gigabytes in size, if the OS loaded all of that into memory any time a program wanted to read it (say a video player), there would be a lot of problems.
Instead, what happens is the program requests access to the file, and the OS returns a stream; the stream tells the program how much data there is, and allows it to access that data.
Depending on implementation, the OS may load a certain amount of data into memory ahead of the program accessing it, this is known as a buffer.
Fundamentally though, the program just requests the next bit of data, and the OS either gets it from the buffer, or from the source (e.g. the file on disk).
The same principle applies to streams between different computers, except requesting the next bit of data may very well involve a trip to the remote machine to request it.
The .NET FileStream class and the Stream base class, all just defer to the windows systems for working with streams in the end, there's nothing particularly special about them, it's just what you can do with the abstraction that makes them so powerful.
Writing to a stream is just the same, but it just puts data into the buffer, ready for the requester to access.
Infinite Data
As a user pointed out, streams can be used for data of indeterminate length.
All stream operations take time, so reading a stream is typically a blocking operation that will wait until data is available.
So you could loop forever while the stream is still open, and just wait for data to come in - an example of this in practice would be a live video broadcast.

I've since located a book - C# 5.0 All-In-One For Dummies - It explains everything about all Stream classes, how they work, which one is most appropriate and more.
Only been reading about 30 minutes, already have such a better understanding. Excellent guide!

Related

Reading data from a socket directly into a Memory Mapped File

I'm trying to create a .NET Core application and am getting a little stuck on IPC.
I have data coming in (let's say, from a socket, or a file, some sort of streaming interface), via executable 1. Now, I want this data to be read by executable 2. So, I've created a MMF, where executable 1 writes data, and executable 2 reads data. All is well.
However, I would really like to skip the "copy" step here. If I have a message coming in via a socket, I need to read the message (ergo; store it in some byte array), and then copy it to the appropriate Memory Mapped File.
Is there a way (especially now, with the new Memory, Span etc) to have them use the same memory?
This code almost seems to work, but not completely:
const int bufferSize = 1024;
var mappedFile = MemoryMappedFile.CreateNew("IPCFile", bufferSize);
var fileAccessor = mappedFile.CreateViewAccessor(0, bufferSize, MemoryMappedFileAccess.ReadWrite);
// THere now exists a region of byte[] * bufferSize somewhere. I want to write to that.
byte[] bufferMem = new byte[bufferSize];
// I know the memory address where this area is, and I know the size of it:
unsafe
{
byte* startOfRegion = (byte*)0;
fileAccessor.SafeMemoryMappedViewHandle.AcquirePointer(ref startOfRegion);
// But how do I "assign" this region to the managed object?
// This throws "System.MissingMethodException: 'No parameterless constructor defined for type 'System.Byte[]'."
//bufferMem = Marshal.PtrToStructure<byte[]>(new IntPtr(startOfRegion));
// This almost looks like it works, but bufferMem remains null. Does not give any errors though.
bufferMem = Unsafe.AsRef<byte[]>(startOfRegion);
}
// For a shorter example, just using a file stream
var incomingData = File.OpenRead(#"C:\Temp\plaatje.png");
// Pass in the "reserved" memory region. But StreamPipeReaderOptions wants a MemoryPool<byte>, not a byte[].
// How would I cast that? MemoryPool is abstract, so can't even be instantiated
var reader = PipeReader.Create(incomingData, new StreamPipeReaderOptions(bufferMem));
ReadResult readResult;
// Actually read data
while (true)
{
readResult = await reader.ReadAsync();
if (readResult.IsCompleted || readResult.IsCanceled)
break;
reader.AdvanceTo(readResult.Buffer.Start, readResult.Buffer.End);
}
// Now signal the other process to read the contents of the MMF and continue
This question seems to ask a similar thing, but does not have an answer, and is from 2013.

How to efficiently set the number of bytes to download for HttpWebRequest?

I'm currently working on a file downloader project. The application is designed so as to support resumable downloads. All downloaded data and its metadata(download ranges) are stored on the disk immediately per call to ReadBytes. Let's say that I used the following code snippet :-
var reader = new BinaryReader(response.GetResponseStream());
var buffr = reader.ReadBytes(_speedBuffer);
DownloadSpeed += buffr.Length;//used for reporting speed and zeroed every second
Here _speedBuffer is the number of bytes to download which is set to a default value.
I have tested the application by two methods. First is by downloading a file which is hosted on a local IIS server. The speed is great. Secondly, I tried to download the same file's copy(from where it was actually downloaded) from the internet. My net speed is real slow. Now, what I observed that if I increase the _speedBuffer then the downloading speed from the local server is good but for the internet copy, speed reporting is slow. Whereas if I decrease the value of _speedBuffer, the downloading speed(reporting) for the file's internet copy is good but not for the local server. So I thought, why shouldn't I change the _speedBuffer at runtime. But all the custom algorithms(for changing the value) I came up with were in-efficient. Means the download speed was still slow as compared other downloaders.
Is this approach OK?
Am I doing it the wrong way?
Should I stick with default value for _speedBuffer(byte count)?
The problem with ReadBytes in this case is that it attempts to read exactly that number of bytes, or it returns when there is no more data to read.
So you receive a packet containing 99 bytes of data, then calling ReadBytes(100) will wait for the next packet to include that missing byte.
I wouldn't use a BinaryReader at all:
byte[] buffer = new byte[bufferSize];
using (Stream responseStream = response.GetResponseStream())
{
int bytes;
while ((bytes = responseStream.Read(buffer, 0, buffer.Length)) > 0)
{
DownloadSpeed += bytes;//used for reporting speed and zeroed every second
// on each iteration, "bytes" bytes of the buffer have been filled, store these to disk
}
// bytes was 0: end of stream
}

Streaming a large file into a database BLOB field

I have a large file on disk whose contents I need to store in a SqlServer database as a VARBINARY(MAX) field. By "contents", I mean using something like File.ReadAllBytes(). (I understand the pitfalls of this, but it's the current mode of operations, so I have to deal with it for now.)
I found this answer, which provides a way to stream a large Byte[] by using UPDATE.WRITE:
How to I serialize a large graph of .NET object into a SQL Server BLOB without creating a large buffer?
However, simply reading the contents of a large file into a Byte[] in memory leads to this problem:
OutOfMemoryException when I read 500MB FileStream
I might be overlooking something obvious, but I just can't work out how should I go about getting from a large file on disk, to the resulting storage into the database.
Using some hints I got from these two pages, I have an answer that works:
http://www.syntaxwarriors.com/2013/stream-varbinary-data-to-and-from-mssql-using-csharp/ (this looks a lot like the serialize SO answer, but there's more here...not sure who copied who!).
How do I copy the contents of one stream to another?
Basically, it uses the same methodology as the answer about serializing Blobs, but instead of using BinaryFormatter (a class I'm not fond of anyhow), it creates a FileStream that takes the path to the file, and an extension method to copy that stream into the target stream, or BlobStream, as the example named it.
Here's the extension:
public static class StreamEx
{
public static void CopyTo(this Stream Input, Stream Output)
{
var buffer = new Byte[32768];
Int32 bytesRead;
while ((bytesRead = Input.Read(buffer, 0, buffer.Length)) > 0)
Output.Write(buffer, 0, bytesRead);
}
}
So the trick was to link two streams, copying the data from one to another in chunked fashion, as noted in the comments.

Something like System.Diagnostics.Process.Start to run a stream

I get from server images and videos by stream.
Now I'm saving it:
Stream str = client.GetFile(path);
using (var outStream = new FileStream(#"c:\myFile.jpg", FileMode.Create))
{
var buffer = new byte[4096];
int count;
while ((count = str.Read(buffer, 0, buffer.Length)) > 0)
{
outStream.Write(buffer, 0, count);
}
}
I can be jpg, mpg, flv and a lot of other multimedia types (Before I get stream I know what is a extension of this file).
Now I want to not save it , but run direct from stream.
Examples:
I get stream which is mybirthay.avi and I call my method RunFile(stream) and I think this method should works like System.Diagnostics.Process.Start(path), so my stream should be opened by default program in my SO for example allplayer.
I get stream from myfile.jpg and it is opening by irfanview,
etc...
Is it possible ??
If you are set on not saving the stream to the disk, you could use something like Eldos' Callback File System: http://www.eldos.com/cbfs/
Or, use a ramdisk, save your stream there, and shell to that file location.
Windows puts up a wall between processes so they cannot directly access each other's memory without going through privileged debug-like API functions like ReadProcessMemory. This is what keeps the operating system stable and secure. But spells doom for what you are trying to accomplish.
You'll need to use a file. It won't be much slower than direct memory access, the file system cache takes care of that.

ZIP file created with SharpZipLib cannot be opened on Mac OS X

Argh, today is the day of stupid problems and me being an idiot.
I have an application which creates a zip file containing some JPEGs from a certain directory. I use this code in order to:
read all files from the directory
append each of them to a ZIP file
using (var outStream = new FileStream("Out2.zip", FileMode.Create))
{
using (var zipStream = new ZipOutputStream(outStream))
{
foreach (string pathname in pathnames)
{
byte[] buffer = File.ReadAllBytes(pathname);
ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
entry.DateTime = now;
zipStream.PutNextEntry(entry);
zipStream.Write(buffer, 0, buffer.Length);
}
}
}
All works well under Windows, when I open the file e. g. with WinRAR, the files are extracted. But as soon as I try to unzip my archive on Mac OS X, it only creates a .cpgz file. Pretty useless.
A normal .zip file created manually with the same files on Windows is extracted without any problems on Windows and Mac OS X.
I found the above code on the Internet, so I am not absolutely sure if the whole thing is correct. I wonder if it is needed to use zipStream.Write() in order to write directly to the stream?
got the exact same problem today. I tried implementing the CRC stuff as proposed but it didn't help.
I finaly found the solution on this page: http://community.sharpdevelop.net/forums/p/7957/23476.aspx#23476
As a result, I just had to add this line in my code:
oZIPStream.UseZip64 = UseZip64.Off;
And the file opens up as it should on MacOS X :-)
Cheers
fred
I don't know for sure, because I am not very familiar with either SharpZipLib or OSX , but I still might have some useful insight for you.
I've spent some time wading through the zip spec, and actually I wrote DotNetZip, which is a zip library for .NET, unrelated to SharpZipLib.
Currently on the user forums for DotNetZip, there's a discussion going on about zip files generated by DotNetZip that cannot be read on OSX. One of the people using the library is having a problem that seems similar to what you are seeing. Except I have no idea what a .cpgxz file is.
We tracked it down, a little. At this point the most promising theory is that OSX does not like "bit 3" in the "general purpose bitfield" in the header of each zip entry.
Bit 3 is not new. PKWare added bit 3 to the spec 17 years ago. It was intended to support streaming generation of archives, in the way that SharpZipLib works. DotNetZip also has a way to produce a zipfile as it is streamed out, and it will also set bit-3 in the zip file if used in this way, although normally DotNetZip will produce a zipfile with bit-3 unset in it.
From what we can tell, when bit 3 is set, the OSX zip reader (whatever it is - like I said I'm not familiar with OSX) chokes on the zip file. The same zip contents produced without bit 3, allows the zip file to be opened. Actually it's not as simple as just flipping one bit - the presence of the bit signals the presence of other metadata. So I am using "bit 3" as a shorthand for all that.
So the theory is that bit 3 causes the problem. I haven't tested this myself. There's been some impedance mismatch on the communication with the person who has the OSX machine - so it is unresolved as yet.
But, if this theory holds, it would explain your situation: that WinRar and any Windows machine can open the file, but OSX cannot.
On the DotNetZip forums, we had a discussion about what to do about the problem. As near as I can tell, the OSX zip reader is broken, and cannot handle bit 3, so the workaround is to produce a zip file with bit 3 unset. I don't know if SharpZipLib can be convinced to do that.
I do know that if you use DotNetZip, and use the normal ZipFile class, and save to a seekable stream (like a filesystem file), you will get a zip that does not have bit 3 set. If the theory is correct, it should open with no problem on the Mac, every time. This is the result the DotNetZip user has reported. It's just one result so not generalizable yet, but it looks plausible.
example code for your scenario:
using (ZipFile zip = new ZipFile()
{
zip.AddFiles(pathnames);
zip.Save("Out2.zip");
}
Just for the curious, in DotNetZip you will get bit 3 set if you use the ZipFile class and save it to a nonseekable stream (like ASPNET's Response.OutputStream) or if you use the ZipOutputStream class in DotNetZip, which always writes forward only (no seeking back).
I think SharpZipLib's ZipOutputStream is also always "forward only."
So, I searched for a few more examples on how to use SharpZipLib and I finally got it to work on Windows and os x. Basically I added the "Crc32" of the file to the zip archive. No idea what this is though.
Here is the code that worked for me:
using (var outStream = new FileStream("Out3.zip", FileMode.Create))
{
using (var zipStream = new ZipOutputStream(outStream))
{
Crc32 crc = new Crc32();
foreach (string pathname in pathnames)
{
byte[] buffer = File.ReadAllBytes(pathname);
ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
entry.DateTime = now;
entry.Size = buffer.Length;
crc.Reset();
crc.Update(buffer);
entry.Crc = crc.Value;
zipStream.PutNextEntry(entry);
zipStream.Write(buffer, 0, buffer.Length);
}
zipStream.Finish();
// I dont think this is required at all
zipStream.Flush();
zipStream.Close();
}
}
Explanation from cheeso:
CRC is Cyclic Redundancy Check - it's a checksum on the entry data. Normally the header for each entry in a zip file contains a bunch of metadata, including some things that cannot be known until all the entry data has been streamed - CRC, Uncompressed size, and compressed size. When generating a zipfile through a streamed output, the zip spec allows setting a bit (bit 3) to specify that these three data fields will immediately follow the entry data.
If you use ZipOutputStream, normally as you write the entry data, it is compressed and a CRC is calculated, and the 3 data fields are written immediately after the file data.
What you've done is streamed the data twice - the first time implicitly as you calculate the CRC on the file before writing it. If my theory is correct, the what is happening is this: When you provide the CRC to the zipStream before writing the file data, this allows the CRC to appear in its normal place in the entry header, which keeps OSX happy. I'm not sure what happens to the other two quantities (compressed and uncompressed size).
I had exactly the same problem, my mistake was (and in your example code as well) that I didn't supply the file lenght for each entry.
Example code:
...
ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
entry.DateTime = now;
var fileInfo = new FileInfo(pathname)
entry.size = fileInfo.lenght;
...
I was separating the folder names with a backslash... when I changed this to a forward slash it worked!
What's going on with the .cpgz file is that Archive Utility is being launched by a file with a .zip extension. Archive Utility examines the file and thinks it isn't compressed, so it's compressing it. For some bizarre reason, .cpgz (CPIO archiving + gzip compression) is the default. You can set a different default in Archive Utility's Preferences.
If you do indeed discover this is a problem with OS X's zip decoder, please file a bug. You can also try using the ditto command-line tool to unpack it; you may get a better error message. Of course, OS X also ships unzip, the Info-ZIP utility, but I'd expect that to work.
I agree with Cheeso's answer however if the Input file size is greater than 2GB then byte[] buffer = File.ReadAllBytes(pathname); will throw an IO exception.
So i modified Cheeso code and it works like a charm for all the files.
.
long maxDataToBuffer = 104857600;//100MB
using (var outStream = new FileStream("Out3.zip", FileMode.Create))
{
using (var zipStream = new ZipOutputStream(outStream))
{
Crc32 crc = new Crc32();
foreach (string pathname in pathnames)
{
tempBuffLength = maxDataToBuffer;
FileStream fs = System.IO.File.OpenRead(pathname);
ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
entry.DateTime = now;
entry.Size = buffer.Length;
crc.Reset();
long totalBuffLength = 0;
if (fs.Length <= tempBuffLength) tempBuffLength = fs.Length;
byte[] buffer = null;
while (totalBuffLength < fs.Length)
{
if ((fs.Length - totalBuffLength) <= tempBuffLength)
tempBuffLength = (fs.Length - totalBuffLength);
totalBuffLength += tempBuffLength;
buffer = new byte[tempBuffLength];
fs.Read(buffer, 0, buffer.Length);
crc.Update(buffer, 0, buffer.Length);
buffer = null;
}
entry.Crc = crc.Value;
zipStream.PutNextEntry(entry);
tempBuffLength = maxDataToBuffer;
fs = System.IO.File.OpenRead(pathname);
totalBuffLength = 0;
if (fs.Length <= tempBuffLength) tempBuffLength = fs.Length;
buffer = null;
while (totalBuffLength < fs.Length)
{
if ((fs.Length - totalBuffLength) <= tempBuffLength)
tempBuffLength = (fs.Length - totalBuffLength);
totalBuffLength += tempBuffLength;
buffer = new byte[tempBuffLength];
fs.Read(buffer, 0, buffer.Length);
zipStream.Write(buffer, 0, buffer.Length);
buffer = null;
}
fs.Close();
}
zipStream.Finish();
// I dont think this is required at all
zipStream.Flush();
zipStream.Close();
}
}
I had a similar problem but on Windows 7. I updated to the as of this writing latest version of ICSharpZipLib 0.86.0.518. From then on I could no longer decompress any ZIP archives created with the code that was working so far.
There error messages were different depending on the tool I tried to extract with:
Unknown compression method.
Compressed size in local header does not match that of central directory header in new zip file.
What did the trick was to remove the CRC calculation as mentioned here: http://community.sharpdevelop.net/forums/t/8630.aspx
So I removed the line that is:
entry.Crc = crc.Value
And from then on I could again unzip the ZIP archives with any third party tool. I hope this helps someone.
There are two things:
Ensure your underlying output stream is seekable, or SharpZipLib won't be able to back up and fill in any ZipEntry fields that you omitted (size, crc, compressed size, ...). As a result, SharpZipLib will force "bit 3" to be enabled. The background has been explained pretty well in previous answers.
Fill in ZipEntry.Size, or explicitly set stream.UseZip64 = UseZip64.Off. The default is to conservatively assume the stream could be very large. Unzipping then requires "pk 4.5" support.
I encountered weird behavior when archive is empty (no entries inside it) it can not be opened on MAC - generates cpgz only. The idea was to put a dummy .txt file in it in case when no files for archiving.

Categories

Resources