I currently have a download site for my school that is based on .NET. We offer antivirus, AutoCAD, SPSS, Office, and a number of other large applications for students to download. It's currently set up to handle them in one of two ways: anything over 800 MB is directly accessible through a separate website, while anything under 800 MB is secured behind .NET code that uses a FileStream to feed it to the user in 10,000-byte chunks. I have all sorts of issues with feeding downloads this way. I'd like to be able to secure the large downloads, but the .NET site just can't handle it, and the smaller files will often fail. Is there a better approach to this?
edit - I just wanted to update on how I finally solved this: I ended up adding my download directory as a virtual directory in IIS and specifying a custom HTTP handler. The handler grabbed the file name from the request, checked permissions based on that, and then either redirected the user to an error/login page or let the download continue. I've had no problems with this solution, and I've been running it for probably 7 months now, serving files several gigabytes in size.
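For anyone curious, the handler logic was roughly along these lines (a rough sketch, not the exact code; DownloadHandler, the login page path, and UserCanDownload are placeholder names, and the real permission check depends on your licensing rules):
using System.IO;
using System.Security.Principal;
using System.Web;

public class DownloadHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // The requested file name comes straight from the mapped URL.
        string physicalPath = context.Request.PhysicalPath;
        string fileName = Path.GetFileName(physicalPath);

        if (!UserCanDownload(context.User, fileName))
        {
            // Not authorized: send the user to the error/login page.
            context.Response.Redirect("~/login.aspx");
            return;
        }

        // Authorized: let IIS stream the file from disk.
        context.Response.ContentType = "application/octet-stream";
        context.Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName);
        context.Response.TransmitFile(physicalPath);
    }

    private static bool UserCanDownload(IPrincipal user, string fileName)
    {
        // Placeholder: map the file name to whatever licensing rules apply.
        return user != null && user.Identity.IsAuthenticated;
    }
}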
If you are having performance issues and you are delivering files that exist on the filesystem (versus a DB), use the HttpResponse.TransmitFile function.
As for the failures, you likely have a bug. If you post the code, you may get a better response.
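For example, the whole chunking loop can collapse to something like this (a sketch; path and fileName stand in for however you resolve the physical file path and download name):
context.Response.Clear();
context.Response.ContentType = "application/octet-stream";
context.Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName);
// TransmitFile hands the file to IIS to stream from disk,
// without buffering it in managed memory first.
context.Response.TransmitFile(path);
context.Response.End();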
Look into BitTorrent. It's designed specifically for this sort of thing and is quite flexible.
I have two recommendations:
Increase the buffer size so that there are fewer iterations
AND/OR
Do not call IsClientConnected on each iteration.
The reason is that according to Microsoft Guidelines:
Response.IsClientConnected has some costs, so only use it before an operation that takes at least, say 500 milliseconds (that's a long time if you're trying to sustain a throughput of dozens of pages per second). As a general rule of thumb, don't call it in every iteration of a tight loop
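Applied to a download loop like the one in question, that might look like this (a sketch against the posted iStream/context code; the 64 KB buffer and the 16-iteration check interval are arbitrary starting points to tune):
const int BufferSize = 64 * 1024; // larger buffer, fewer iterations
byte[] buffer = new byte[BufferSize];
int length;
int iteration = 0;

while ((length = iStream.Read(buffer, 0, BufferSize)) > 0)
{
    // Poll the client connection only every 16 chunks (~1 MB),
    // not on every iteration.
    if (++iteration % 16 == 0 && !context.Response.IsClientConnected)
        break;

    context.Response.OutputStream.Write(buffer, 0, length);
    context.Response.Flush();
}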
What's wrong with using a robust web server (like Apache) and letting it deal with the files? Just as you now offload larger files to a separate web server, why not serve the smaller files the same way too?
Is there some hidden requirement preventing this?
Ok, this is what it currently looks like...
Stream iStream = null;

// Buffer to read 10K bytes in chunk:
byte[] buffer = new Byte[10000];

// Length of the file:
int length;

// Total bytes to read:
long dataToRead;

if (File.Exists(localfilename))
{
    try
    {
        // Open the file.
        iStream = new System.IO.FileStream(localfilename, System.IO.FileMode.Open,
            System.IO.FileAccess.Read, System.IO.FileShare.Read);

        // Total bytes to read:
        dataToRead = iStream.Length;

        context.Response.Clear();
        context.Response.Buffer = false;
        context.Response.ContentType = "application/octet-stream";

        Int64 fileLength = iStream.Length;
        context.Response.AddHeader("Content-Length", fileLength.ToString());
        context.Response.AddHeader("Content-Disposition", "attachment; filename=" + originalFilename);

        // Read the bytes.
        while (dataToRead > 0)
        {
            // Verify that the client is connected.
            if (context.Response.IsClientConnected)
            {
                // Read the data in buffer.
                length = iStream.Read(buffer, 0, 10000);

                // Write the data to the current output stream.
                context.Response.OutputStream.Write(buffer, 0, length);

                // Flush the data to the HTML output.
                context.Response.Flush();

                buffer = new Byte[10000];
                dataToRead = dataToRead - length;
            }
            else
            {
                // Prevent an infinite loop if the user disconnects.
                dataToRead = -1;
            }
        }

        iStream.Close();
        iStream.Dispose();
    }
    catch (Exception ex)
    {
        if (iStream != null)
        {
            iStream.Close();
            iStream.Dispose();
        }

        if (ex.Message.Contains("The remote host closed the connection"))
        {
            context.Server.ClearError();
            context.Trace.Warn("GetFile", "The remote host closed the connection");
        }
        else
        {
            context.Trace.Warn("IHttpHandler", "DownloadFile: - Error occurred");
            context.Trace.Warn("IHttpHandler", "DownloadFile: - Exception", ex);
        }

        context.Response.Redirect("default.aspx");
    }
}
There are a lot of licensing restrictions... for example, we have an Office 2007 license agreement that says any technical staff on campus can download and install Office, but students cannot. So we don't let students download it, and our solution was to hide those downloads behind .NET.
Amazon S3 sounds ideal for what you need, but it is a commercial service and files are served from their servers.
You should try contacting Amazon and asking for academic pricing, even if they don't advertise one.
I'm currently working on a file downloader project. The application is designed to support resumable downloads. All downloaded data and its metadata (download ranges) are stored on disk immediately on each call to ReadBytes. Let's say I used the following code snippet:
var reader = new BinaryReader(response.GetResponseStream());
var buffr = reader.ReadBytes(_speedBuffer);
DownloadSpeed += buffr.Length;//used for reporting speed and zeroed every second
Here _speedBuffer is the number of bytes to download, which is set to a default value.
I have tested the application in two ways. First, by downloading a file hosted on a local IIS server: the speed is great. Second, I tried to download a copy of the same file from the internet; my connection is quite slow. What I observed is that if I increase _speedBuffer, the download speed from the local server is good, but speed reporting for the internet copy is slow. If I decrease _speedBuffer, the download speed (reporting) for the internet copy is good, but not for the local server. So I thought, why not change _speedBuffer at runtime? But all the custom algorithms I came up with for changing the value were inefficient; the download speed was still slow compared to other downloaders.
Is this approach OK?
Am I doing it the wrong way?
Should I stick with the default value for _speedBuffer (byte count)?
The problem with ReadBytes in this case is that it attempts to read exactly that number of bytes, returning fewer only when there is no more data to read at all.
So if you receive a packet containing 99 bytes of data, a call to ReadBytes(100) will wait for the next packet to supply the missing byte.
I wouldn't use a BinaryReader at all:
byte[] buffer = new byte[bufferSize];
using (Stream responseStream = response.GetResponseStream())
{
    int bytes;
    while ((bytes = responseStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        DownloadSpeed += bytes; // used for reporting speed and zeroed every second

        // On each iteration, "bytes" bytes of the buffer have been filled; store these to disk.
    }
    // bytes was 0: end of stream
}
I've been working on a project recently that involves a lot of FileStream work, something I've not really touched before.
To try and better acquaint myself with the principles of such methods, I've written some code that (theoretically) downloads a file from one directory to another, and I've gone through it step by step, commenting in my understanding of what each step achieves, like so...
Get fileinfo object from DownloadRequest Object
RemoteFileInfo fileInfo = svr.DownloadFile(request);
DownloadFile method in WCF Service
public RemoteFileInfo DownloadFile(DownloadRequest request)
{
    RemoteFileInfo result = new RemoteFileInfo(); // create empty fileinfo object
    try
    {
        // set filepath
        string filePath = System.IO.Path.Combine(request.FilePath, @"\", request.FileName);
        System.IO.FileInfo fileInfo = new System.IO.FileInfo(filePath); // get fileinfo from path

        // check if exists
        if (!fileInfo.Exists)
            throw new System.IO.FileNotFoundException("File not found", request.FileName);

        // open stream
        System.IO.FileStream stream = new System.IO.FileStream(filePath,
            System.IO.FileMode.Open, System.IO.FileAccess.Read);

        // return result
        result.FileName = request.FileName;
        result.Length = fileInfo.Length;
        result.FileByteStream = stream;
    }
    catch (Exception ex)
    {
        // do something
    }
    return result;
}
Use returned FileStream from fileinfo to read into a new write stream
// set new location for downloaded file
string basePath = System.IO.Path.Combine(@"C:\SST Software\DSC\Compilations\", compName, @"\");
string serverFileName = System.IO.Path.Combine(basePath, file);
double totalBytesRead = 0.0;

if (!Directory.Exists(basePath))
    Directory.CreateDirectory(basePath);

int chunkSize = 2048;
byte[] buffer = new byte[chunkSize];

// create new write file stream
using (System.IO.FileStream writeStream = new System.IO.FileStream(serverFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    do
    {
        // read bytes from fileinfo stream
        int bytesRead = fileInfo.FileByteStream.Read(buffer, 0, chunkSize);
        totalBytesRead += (double)bytesRead;
        if (bytesRead == 0) break;

        // write bytes to output stream
        writeStream.Write(buffer, 0, bytesRead);
    } while (true);

    // report end
    Console.WriteLine(fileInfo.FileName + " has been written to " + basePath + " - Done!");
    writeStream.Close();
}
What I was hoping for is any clarification or expansion on what exactly happens when using a FileStream.
I can achieve the download, and now I know what code I need to write in order to perform such a download, but I would like to know more about why it works. I can find no 'beginner-friendly' or step by step explanations on the web.
What is happening here behind the scenes?
A stream is just an abstraction; fundamentally, it works like a pointer within a collection of data.
Take the string "Hello World!" for example: it is just a collection of characters, which are fundamentally just bytes.
As a stream, it could be represented as having:
A length of 12 (possibly more, including termination characters etc.)
A position in the stream.
You read a stream by moving the position around and requesting data.
So reading the text above could be (in pseudocode) seen to be like this:
do
get next byte
add gotten byte to collection
while not the end of the stream
the entire data is now in the collection
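In C#, that pseudocode maps almost directly onto the Stream API (a minimal sketch; stream can be any readable System.IO.Stream):
var collected = new List<byte>(); // needs System.Collections.Generic

int next;
// ReadByte advances the position and returns -1 at the end of the stream.
while ((next = stream.ReadByte()) != -1)
{
    collected.Add((byte)next);
}
// The entire data is now in the collection.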
Streams are really useful when it comes to accessing data from sources such as the file system or remote machines.
Imagine a file that is several gigabytes in size; if the OS loaded all of that into memory any time a program wanted to read it (say, a video player), there would be a lot of problems.
Instead, what happens is the program requests access to the file, and the OS returns a stream; the stream tells the program how much data there is, and allows it to access that data.
Depending on the implementation, the OS may load a certain amount of data into memory ahead of the program accessing it; this is known as a buffer.
Fundamentally though, the program just requests the next bit of data, and the OS either gets it from the buffer, or from the source (e.g. the file on disk).
The same principle applies to streams between different computers, except requesting the next bit of data may very well involve a trip to the remote machine to request it.
The .NET FileStream class and the Stream base class just defer to the Windows system for working with streams; there's nothing particularly special about them, and it's what you can do with the abstraction that makes them so powerful.
Writing to a stream is just the same, except it puts data into the buffer, ready for the requester to access.
Infinite Data
As a user pointed out, streams can be used for data of indeterminate length.
All stream operations take time, so reading a stream is typically a blocking operation that will wait until data is available.
So you could loop forever while the stream is still open, and just wait for data to come in - an example of this in practice would be a live video broadcast.
I've since located a book - C# 5.0 All-In-One For Dummies - it explains everything about all the Stream classes, how they work, which one is most appropriate, and more.
I've only been reading for about 30 minutes and already have a much better understanding. Excellent guide!
So on our website, we have multiple reports that can be downloaded as an Excel spreadsheet. We accomplish this by reading a blank template file from the hard drive, copying it into a MemoryStream, and pushing the data into the template with DocumentFormat.OpenXml.Spreadsheet; then we pass the MemoryStream to a function that sets the headers and copies the stream into the Response.
It works great in FF & Chrome, but IE9 (and 8, so my QA tells me) randomly pops a Windows Security login dialog asking you to log into the remote server. I can either cancel the dialog or hit OK (the credentials seem to be ignored), and I get the Excel file as expected. Looking at the queries (using CharlesProxy), I cannot get the login dialog to pop up until I disable CharlesProxy again, so I cannot see if there's any difference in the traffic between my dev machine and the server. It also doesn't happen when running debug from my localhost, just from the Dev/Test server.
Any help would be useful, the code in question follows. This is called out of a server-side function in the code behind, hence the RespondAsExcel clears the response and puts in the xlsx instead.
using (MemoryStream excelStream = new MemoryStream())
{
    using (FileStream template = new FileStream(Server.MapPath(@"Reports\AlignedTemplateRII.xlsx"),
        FileMode.Open, FileAccess.Read))
    {
        Master.CopyStream(template, excelStream);
    }

    // Logic here to push data into the MemoryStream using DocumentFormat.OpenXml.Spreadsheet
    Master.RespondAsExcel(excelStream, pgmName);
}

public void RespondAsExcel(MemoryStream excelStream, string fileName)
{
    var contenttype = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    Response.Clear();
    Response.ContentType = contenttype;
    fileName = Utils.ReplaceWhiteSpaceWithUnderScores(fileName);
    Response.AddHeader("content-disposition", "inline;filename=" + fileName);
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.BinaryWrite(excelStream.ToArray());
    // If that doesn't work, can try this way:
    // excelStream.WriteTo(Response.OutputStream);
    Response.End();
}

public void CopyStream(Stream source, Stream destination)
{
    byte[] buffer = new byte[32768];
    int bytesRead;
    do
    {
        bytesRead = source.Read(buffer, 0, buffer.Length);
        destination.Write(buffer, 0, bytesRead);
    } while (bytesRead != 0);
}
A couple of ideas come to mind regarding that "extra authentication dialog" that can always be dismissed... I won't promise this is your issue, but it sure smells like a first cousin of it.
Office 2007 and later documents open HTTP-based repositories with the WebClient libraries, which do not honor any of IE's security-zone filters when requests are made. If the file is requested by IE and the host URL contains dots (implying an FQDN), you'll get the "credential" dialog even if the site is anonymously authenticated (requiring no credentials); it can be cancelled or simply clicked through three times and dismissed. I was dealing with this problem just yesterday, and as best I can tell, there's no workaround if the file is delivered by IE. There's some quirk about how IE delivers the file that makes Office apps believe they have to authenticate the request before opening it, even though the file has already been delivered to the client!
The dialog issue may be overcome if the document is delivered from a host server in the same domain as the requesting server, eg some-server.a.domain.com to my-machine.a.domain.com.
The second idea is something strictly born of my own experience: the vnd.openxmlformats content types sometimes introduce their own oddness in document-stream situations. We've just used a content type of application/vnd.ms-excel and, while it seems it should map to the same applications, the problems don't seem to be as prevalent.
Perhaps that gives you some thoughts on going forward. Ultimately, right now, I don't think there's an ideal solution for the situation you're encountering. We're in the same boat, and we had to tell our in-house clients who get the dialog to just hit "Cancel", after which they get the document they want.
In your RespondAsExcel() method, change your content-disposition response header from inline to attachment. This will force the browser to open the file as read-only. See KB899927.
Response.AddHeader("content-disposition", "attachment;filename=" + fileName);
I had something similar with VBScript when using Response.ContentType = "application/vnd.ms-excel". I simply added the following code and the Windows Security popup window no longer appeared:
Response.AddHeader "content-disposition","attachment; filename=your_file_name_here.xls"
I'm uploading big files by dividing them into chunks (small parts) to my ASMX web service (ASMX doesn't support streaming; I haven't found another way):
bool UploadChunk(byte[] bytes, string path, string md5)
{
    ...
    using (FileStream fs = new FileStream(tempPath, FileMode.Append))
    {
        fs.Write(bytes, 0, bytes.Length);
    }
    ...
    return status;
}
but on some files, after ~20-50 invocations, I catch this error: "The process cannot access the file because it is being used by another process."
I suspect this is related to Windows not releasing the file. Any ideas on how to get rid of this annoying error?
EDIT
the requests execute sequentially and synchronously
EDIT2
client code looks like:
_service.StartUpload(path);
...
do
{
    ...
    bool status = _service.UploadChunk(buf, path, md5);
    if (!status) return Status.Failed;
    ...
}
while (bytesRead > 0);
_service.CheckFile(path, md5);
Each request is handled independently. The process still accessing the file may be the one that handled the previous request.
In general, you should use file transfer protocols to transfer files. ASMX is not good for that.
And, I presume you have a good reason to not use WCF?
Use WhoLockMe at the moment the error occurs to check who is using the file. You could put the application into debug mode and hold it at a breakpoint to do this. In all probability it will be your own process.
Also try adding a delay after each transfer (and before the next) to see if it helps. Maybe your transfers are too fast, and the stream is still in use or being flushed when the next transfer comes in.
Option 1: Get the requirements changed so you don't have to do this using ASMX. WCF supports a streaming model that I'm about to experiment with, but it should be much more effective for what you want.
Option 2: Look into WSE 3.0. I haven't looked at it much, but I think it extends ASMX web services to support things like DIME and MTOM which are designed for transferring files so that may help.
Option 3: Set the system up so that each call writes a piece of the file into a different filename, then write code to rejoin everything at the end.
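For option 3, the rejoin step could be as simple as this (a sketch; finalPath and chunkPaths are assumed to be the target file and the list of per-chunk file names in upload order):
using (FileStream output = new FileStream(finalPath, FileMode.Create, FileAccess.Write))
{
    foreach (string chunkPath in chunkPaths)
    {
        using (FileStream chunk = new FileStream(chunkPath, FileMode.Open, FileAccess.Read))
        {
            chunk.CopyTo(output); // .NET 4+; on older frameworks, copy via a byte buffer
        }
        File.Delete(chunkPath); // discard the piece once it has been appended
    }
}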
Use this for creating a file; if you want to append something, use FileMode.Append instead of FileMode.OpenOrCreate:
var fileStream = new FileStream(name, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
I have to interface with a slightly archaic system that doesn't use web services. In order to send data to this system, I need to post an XML document into a form on the other system's website. This XML document can get very large, so I would like to compress it.
The other system sits on IIS, and I use C# on my end. I could of course implement something that compresses the data before posting it, but that requires the other system to change so it can decompress the data. I would like to avoid changing the other system, as I don't own it.
I have heard vague things about enabling compression / HTTP 1.1 in IIS and the browser, but I have no idea how to translate that to my program. Basically, is there some property I can set in my program that will make it automatically compress the data it sends to IIS, and have IIS seamlessly decompress it so the receiving app doesn't even know the difference?
Here is some sample code to show roughly what I am doing;
private static void demo()
{
    Stream myRequestStream = null;
    Stream myResponseStream = null;

    HttpWebRequest myWebRequest = (HttpWebRequest)System.Net
        .WebRequest.Create("http://example.com");

    byte[] bytMessage = null;
    bytMessage = Encoding.ASCII.GetBytes("data=xyz");
    myWebRequest.ContentLength = bytMessage.Length;
    myWebRequest.Method = "POST";

    // Set the content type as form so that the data
    // will be posted as form
    myWebRequest.ContentType = "application/x-www-form-urlencoded";

    // Get Stream object
    myRequestStream = myWebRequest.GetRequestStream();

    // Writes a sequence of bytes to the current stream
    myRequestStream.Write(bytMessage, 0, bytMessage.Length);

    // Close stream
    myRequestStream.Close();

    WebResponse myWebResponse = myWebRequest.GetResponse();
    myResponseStream = myWebResponse.GetResponseStream();
}
"data=xyz" will actually be "data=[a several MB XML document]".
I am aware that this question may ultimately fall under the non-programming banner if this is achievable through non-programmatic means, so apologies in advance.
I see no way to compress the data on one side and receive it uncompressed on the other side without something actively decompressing it.
No idea if this will work, since all of the examples I could find were for downloads, but you could try using gzip to compress the data, then set the Content-Encoding header on the outgoing message to gzip. I believe the Content-Length should be the length of the zipped message, although you may want to try making it the length of the unencoded message if that doesn't work.
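If you want to try that, the sending side might look like this (a sketch using GZipStream from System.IO.Compression; whether IIS will transparently decompress it for the receiving app is exactly the open question):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// Declare the body as gzip-compressed; the receiving side must understand this.
request.Headers["Content-Encoding"] = "gzip";

byte[] payload = Encoding.ASCII.GetBytes("data=xyz");
using (Stream requestStream = request.GetRequestStream())
using (GZipStream zip = new GZipStream(requestStream, CompressionMode.Compress))
{
    // The compressed bytes go onto the wire; Content-Length is computed
    // automatically while the request body is being buffered.
    zip.Write(payload, 0, payload.Length);
}
WebResponse response = request.GetResponse();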
Good luck.
EDIT: I think the issue is whether the ISAPI filter that supports compression is ever/always/configurably invoked on upload. I couldn't find an answer to that, so I suspect the answer is never, but you won't know until you try (or find the answer that eluded me).