Best way to package large numbers of large files for download?

Best way to package large numbers of large files for download? - c#

I have an ASP.NET website that stores large numbers of files such as videos. I want an easy way to allow the user to download all the files in a single package. I was thinking about creating ZIP files dynamically.
All the examples I have seen involve creating the file before it is downloaded but potentially terabytes of information will be downloaded and therefor the user will have a long wait. Apparently ZIP files store all the information regarding what is in the ZIP file at the end of the file.
My idea is to dynamically create the file as its downloaded. This way I could allow the user to click download. The download would start and not require any space on the server to be pre packaged as it would copy things over uncompressed sequentially. The final part of the file would contain the information on the contents of what has been downloaded.
Has anyone had any experience of this? Does anyone know a better way of doing this? At the moment I cant see any pre made utilities for doing this but I believe it will work. If it doesn't exist then i'm thinking that I will have to read the Zip file format specifications and write my own code... something that will take more time than I was intending to spend on this.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Related

Bulk Downloads in Azure Blob Storage

I need to figure out a way to let my users download several pdf files (sometimes thousands), from Azure Blob Storage, I know that I can download the files in paralel, and that would make things quicker, but the issue here is that the user could possibly have thousands of pdf files to download, and, that isn't at all reasonable.
Also, I can't download the files to another server, zip them, and let the user download them from there, as that would be incredibly inefficient for me.
Is there a way to create a zip of the files and let the user download that (other than the way above)? I saw other questions on this topic but none gave an answer/solution that suits my needs.
What would be, the absolute best way I can do this? Or isn't there another way to preform this task?
Thank you in advance.

Since no one gave an answer, and I see more posts about this on stack overflow and other sites, I decided to share here my solution (can't share code, because reasons...)
Firstly, as of today 04-09-2020 there's still no support for bulk download from Azure Blob Storage in a zip (or other format) that is directly from azure to client, without routing the download flow through a server that does the organizing and zipping.
The problem I had...
A need to download (several) files from Azure Blob Storage, zip them (maybe organize them by folders), and prompt the client to download them in bulk without any download data passing through the server and not filling the client downloads folder with scattered files...
During my research I thought about doing everything on the client's side in javascript through memory and let the client download it, but it could be quite memory expensive since my downloads could be in the GB size range.
The solution...
Then I came across a javascript library called StreamSaver, this library writes the files with streams and writes directly on the client's machine, meaning the memory expense was much less.
By luck this library also allows to organize the files inside the 'download directory' that will be prompted to the user, and even lets me zip that directory before telling the user if he wants to download it, meaning that this one library solved, almost, all my problems.
Now I only have a webmethod called by javascript that returns all the Azure SAS url to download from, and the rest is all in javascript in the client.
TL;DR:
Used StreamSaver javascript library to download, organize and zip all the files from the client side and then prompt them to download it, only using a webmethod to get all the urls wich are to be downloaded.
This solution works (from what I've tested) in at least these browsers:
Chrome;
FireFox;
Opera;
Edge (Chromium)
Problems I came across using the StreamSaver Library...
There are a few drawbacks/problems with the library,
1st Safary doesn't support it! more info about this here
2nd StreamSaver only allows zipping to files smaller than 4GB, this could be worked around using yet another library for zipping...

c# Multiple files and file streams

This might not be the best place. I don't have a problem with some code but rather looking for a code idea.
I want to be able to scan a file to a see if the file contains multiple files within, hidden or not.
For example: Take a movie with mp4 extension that movie has a video stream and an audio stream and/or srt file embedded. you can hide a zip file behind a jpeg file using standard cmd command line.
So I want to be able to scan a file for those multiple hidden files/streams inside. Is there such a way and can anyone guide me packages or code snippet or website?
So far I haven't found anything cause I don't know what too google for.

c# uwp read csv from web

I am trying to make an app that make use of open data.
The data I try to read out is in a CSV format (and is about 40mb big).
I have 2 problems I can't solve.
First I having difficulties to read the file from the web.
I already read on MSDN how to read files asynchrome but it's all about local files. I want to make a list of objects. Each line (except the first line) contains all props for 1 object
Secondly when I finally managed to read the file, is there a way to save it's data and read it somehow the next time? Because 40mb is pretty big to re-download each time you open the app and it takes a lot of time.
I was wondering if it is possible that when I read the the file on the web again, it will only read and at the new lines.
I am a newbie in UWP (c#) applications, so my apologies for the questions.
Thanks in advance.

There are two APIs you can use to download a file. One is HttpClient, described here on MSDN Documentation and in a UWP sample here. This class is usually recommended for smaller files and smaller data, but can easily handler larger files as well. Its disadvantage is, that when the user closes the app, the file will stop downloading.
The alternative is BackgroundDownloader, again here on MSDN and here in UWP samples. This class is usually recommended for downloading larger files and data, as it automatically perfroms the download in the background so the download will continue even when the app is closed.
To store your files, you can use the ApplicationData.Current.LocalFolder. This is a special folder provided to you by the system for storage of application files. You have read/write access to this folder and you can not only store your files here, but even create subfolder structure using UWP StorageFile and StorageFolder APIs. More about this is on MSDN.

Best way to store a file temporarily untill the usage of file

As we all know that we can not get the full path of the file using File Upload control, we will follow the process for saving the file in to our application by creating a folder and by getting that folder path as follows
Server.MapPath
But i am having a scenario to select 1200 excel files, not at a time. I will select each and every excel file and read the requied content from that excel and saving the information to Database. While doing this i am saving the files to the Application Folder by creating a folder Excel. As i am having 1200 files all these files will be saved in to this folder after each and every run.
Is it the correct method to follow or not I don't know
I am looking for an alternative solution rather than saving the file to folder. I would like to save the full path of file temporarily until the process was executed.
So can any tell me the best way as per my requirement.

Grrbrr404 is correct. You can perfectly take the byte [] from the FileUpload.PostedFile and save it to the database directly without using the intermediate folder. You could store the file name with extension on a separate column so you know how to stream it later, in case you need to.
The debate of whether it's good or bad to store these things on the database itself or on the filesystem is very heated. I don't think either approach is best over the other; you'll have to look at your resources and your particular situation and make the appropriate decision. Search for "Store images on database or filesystem" in here or Google and you'll see what I mean.
See this one, for example.

Detect file extension c#

There is a virus that my brother got in his computer and what that virus did was to rename almost all files in his computer. It changed the file extensions as well. so a file that might have been named picture.jpg was renamed to kjfks.doc for example.
so what I have done in order to solve this problem is:
remove all file extensions from files. (I use a recursive method to search for all files in a directory and as I go through the files I remove the extension)
now the files do not have an extension. the files now look like:
I think this file names are stored in a local database created by the virus and if I purchase the anti virus they will be renamed back to their original name.
since my brother created a backup I selected the files that had a creation date latter than when my brother performed the backup. so I have placed that files in a directory.
I am not interested in getting the right extension as long as I can see the content of the file. for example, I will scan each file and if it has text inside I know it will have a .txt extension. maybe it was a .html or .css extension I will not be able to know that I know.
I belive that all pdf files should have something in common. or doc files should also have something in common. How can I figure what the most common types (pdf, doc, docx, png, jpg, etc) files have in common)
Edit:
I know it will probably take less time to go over all this 200 files and test each one instead of creating this program. it is just that I am curios to see if it will be possible to get the file extension.

In unix, you can use file to determine the type of file. There is also a port for windows and you can obviously write a script (batch, powershell, etc.) or C# program to automate this.

First, congratulate your brother on doing a backup. Many people don't, and are absolutely wiped out by these problems.
You're going to have to do a lot of research, I'm afraid, but you're on the right track.
Open each file with a TextReader or a BinaryReader and examine the headers. Most of them are detectable.
For instance: Every PDF starts with "%PDF-" and then its version number. Just look at those first 5 characters. If it's "%PDF-", then put a PDF on the filename and move on.
Similarly: "ÿØÿà..JFIF" for JPEG's, "[InternetShortcut]" for URL shortcuts, "L...........À......Fƒ" for regular shortcuts (the "." is a zero/null, BTW)
ZIPs / Compressed directories start with {0x50}{0x4B]{0x03}{0x04}{0x14}, and you should be aware that Office 2007/2010 documents are really ZIPs with XML files inside of them.
You'll have to do some digging as you find each type, but you should be able to write something to establish most of the file types.
You'll have to write some recursion to work through directories, but you can eliminate any file with no extension.
BTW - A great tool to help pwith this is HxD: http://www.mh-nexus.de/ It's what I used to pull this answer together!
Good luck!

"most common types" each have it's own format and most of them have some magic bytes at the fixed position near beginning of the file. You can detect most of formats quite easily. Even HTML, XML, .CSS and similar text files can be detected by analyzing their beginning. But it will take some time to write an application that will guess the format. For some types (such as ODF format or JAR format, which are built on top of regular ZIPs) you will be also able to detect this format.
But ... Can it be that there exists such application on the market? I guess you can find something if you search, cause the task is not as tricky as it initially seems to be.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.