I have an automated job that uses xcopy to pull files that clients upload to our servers via a client-facing site.
Is there any way to only pull files that are fully uploaded?
I have thought about creating a second "inProcess" folder to receive uploads, then moving each file to a "Done" folder once it is fully uploaded, but that still leaves a window of time while the file is in transit to the "Done" folder...
Any thoughts?
Use the .filepart extension for temporary files.
It's probably the simplest and clearest way of doing this.
WinSCP does this.
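A minimal sketch of the consumer side in C#, assuming the uploader writes to <name>.filepart and renames to the final name only after the last byte arrives (a rename on the same volume is atomic, so the puller never sees a half-written file); the helper name is illustrative:

using System;
using System.IO;

// Move every completed upload out of the drop folder; in-progress
// uploads still carry the ".filepart" suffix and are skipped.
static void PullCompletedUploads(string dropFolder, string doneFolder)
{
    foreach (string path in Directory.GetFiles(dropFolder))
    {
        if (path.EndsWith(".filepart", StringComparison.OrdinalIgnoreCase))
            continue; // still being uploaded

        File.Move(path, Path.Combine(doneFolder, Path.GetFileName(path)));
    }
}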
You can upload an MD5 hash of the file first, then upload the file itself. If the uploaded file doesn't match the MD5, it isn't finished (or, if it takes too long, perhaps it didn't upload properly).
MD5 is often used to check the integrity of a file by creating a hash that represents it. If the file varies at all, it will almost always generate a different MD5 hash (collisions are rare enough to ignore for this purpose). The only reason a file would not match its previously uploaded MD5 hash is that the upload wasn't finished, or the MD5/file was corrupted in transit.
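A minimal sketch of that check in C#, assuming the client uploads a hex-encoded hash in a sidecar file named <file>.md5 (the sidecar convention and helper name are illustrative):

using System;
using System.IO;
using System.Security.Cryptography;

static bool UploadIsComplete(string filePath)
{
    // Illustrative convention: the client uploads "<file>.md5" containing the hex hash.
    string sidecar = filePath + ".md5";
    if (!File.Exists(sidecar))
        return false; // hash not uploaded yet

    string expected = File.ReadAllText(sidecar).Trim();
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(filePath))
    {
        string actual = BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        return string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
    }
}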
There is also this, but it's Perl and from Experts Exchange (ick).
Related
I have this C# code which unzips a zip file.
ZipFile.ExtractToDirectory(_downloadPath, _extractPath);
To test the download process, I compare the file sizes. But for the extraction process, how do we ensure that it was successful? It could be corrupted (the extraction could stop halfway). Can I compare file counts?
I suggest you go ahead and compare the MD5 hashes of the files in the archive with those of the extracted files. Though it is definitely not the fastest process, this way you'll be sure the data is not corrupted.
You can find how to get the MD5 of a file inside an archive here:
I have to take the directory of a file in zip file
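A rough sketch of that comparison with System.IO.Compression, hashing each archive entry's stream and its extracted counterpart (the helper name is illustrative; entries with an empty Name are directories and are skipped):

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Security.Cryptography;

static bool ExtractionMatchesArchive(string zipPath, string extractPath)
{
    using (var md5 = MD5.Create())
    using (var archive = ZipFile.OpenRead(zipPath))
    {
        foreach (var entry in archive.Entries.Where(e => e.Name.Length > 0))
        {
            string extractedFile = Path.Combine(extractPath, entry.FullName);
            if (!File.Exists(extractedFile))
                return false; // extraction stopped before this file

            byte[] archiveHash, extractedHash;
            using (var entryStream = entry.Open())
                archiveHash = md5.ComputeHash(entryStream);
            using (var fileStream = File.OpenRead(extractedFile))
                extractedHash = md5.ComputeHash(fileStream);

            if (!archiveHash.SequenceEqual(extractedHash))
                return false; // content differs: corrupt or truncated
        }
    }
    return true;
}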
I am using Chilkat to download large .zip files from an FTP server.
File sizes usually run around 12-13 GB, and after downloading I need to validate that the file is not corrupt.
I've been trying to use ICSharpCode.SharpZipLib.Zip
like this
ZipFile zip = new ZipFile(path);
bool isValidZip = zip.TestArchive(true, TestStrategy.FindFirstError, null);
But validation takes a VERY long time or even crashes..
Are there any quicker solutions?
If the customer is uploading to FTP, then maybe the customer can also upload a SHA256 hash. For example, if the customer uploads x.zip, then compute the SHA256 of x.zip and also upload x.zip.sha256. Then your application can download both x.zip and x.zip.sha256, and then use Chilkat.Crypt2.HashFile to hash the x.zip and check against x.zip.sha256.
If it's not possible to get an expected hash value, then you might first check the file size against what is on the server. FTP servers differ in how file information is provided. Older servers provide human-readable directory listings (LIST command), whereas newer servers (i.e. within the last 10 years) support MLSD. Chilkat will use MLSD if possible. Older FTP servers might provide inaccurate (non-exact) file size information, whereas MLSD will be accurate. You can call the Ftp2.Feat method to check whether MLSD is supported. If so, you can first validate the size of the downloaded file. If it's not the expected size, you can skip any remaining validation because you already know it's invalid. (You can also set Ftp2.AutoGetSizeForProgress = true, and then Chilkat will not return a success status when MLSD is used and the total number of bytes downloaded is not equal to the expected download size.)
Assuming the byte counts are equal, or if you can't get an accurate byte count and you don't have an expected hash, then you can test whether the zip is valid. The first option is to call the Chilkat.Zip.OpenZip method. Opening the .zip walks the zip's local file headers and central directory headers, which will catch most errors if the .zip is corrupt.
The more comprehensive check is only possible by actually decompressing the data for each file within the zip -- and this is probably why SharpZipLib takes so long. The only way to validate the compressed data is to actually do the decompression: corrupted bytes would likely drive the decompressor into an impossible internal state, which clearly indicates corruption. Also, the CRC-32 of the uncompressed data is stored in each local file header within the .zip, and checking the CRC-32 requires decompression. SharpZipLib is surely checking the CRC-32 after it decompresses, and it's probably trying to decompress in memory and running out of memory. Chilkat's OpenZip does not check the CRC-32 because it does not decompress. You can call Chilkat's Unzip to unzip to the filesystem, and the act of unzipping also checks the CRC-32.
Anyway, you might decide that checking the byte count and being able to call Chilkat.Zip.OpenZip successfully is sufficient for the validation check.
Otherwise, if you're dealing with huge files, it's best to design the validation (using a parallel .sha256 file) into the system architecture.
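A rough sketch of that fallback flow (OpenZip is the Chilkat method named above; the surrounding logic, parameter names, and the idea of passing in the server-reported size are assumptions for illustration, not verified against the Chilkat docs):

using System.IO;

static bool LooksValid(string zipPath, long expectedSize)
{
    // 1. Cheap check: the byte count must match what the server reported (via MLSD).
    if (new FileInfo(zipPath).Length != expectedSize)
        return false;

    // 2. Walk the local file headers and central directory without decompressing.
    var zip = new Chilkat.Zip();
    return zip.OpenZip(zipPath);
}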
Some FTP servers have implemented hash commands (see Appendix B). Issue HELP at the ftp prompt to get a list of all available commands and see whether your server supports a hash command. Otherwise you must stick to zip testing.
I have an ASP.NET website that stores large numbers of files such as videos. I want an easy way to allow the user to download all the files in a single package. I was thinking about creating ZIP files dynamically.
All the examples I have seen involve creating the file before it is downloaded, but potentially terabytes of information will be downloaded, and therefore the user would face a long wait. Apparently ZIP files store all the information regarding the archive's contents at the end of the file.
My idea is to create the file dynamically as it's downloaded. That way the user could click download and the transfer would start immediately, without requiring any space on the server for pre-packaging, since the files would be copied over sequentially and uncompressed. The final part of the file would contain the information on the contents of what has been downloaded.
Has anyone had any experience with this? Does anyone know a better way of doing it? At the moment I can't see any pre-made utilities for this, but I believe it will work. If nothing exists, then I'm thinking I will have to read the ZIP file format specification and write my own code... something that will take more time than I was intending to spend on this.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
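For what it's worth, a minimal sketch of this idea using System.IO.Compression (available since .NET 4.5): ZipArchiveMode.Create writes sequentially and never seeks, so it can target the response stream directly. The handler name, folder path, and file naming are placeholders:

using System.IO;
using System.IO.Compression;
using System.Web;

public class ZipDownloadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "application/zip";
        context.Response.AddHeader("Content-Disposition", "attachment; filename=files.zip");
        context.Response.BufferOutput = false; // stream as we go, no server-side staging

        using (var archive = new ZipArchive(context.Response.OutputStream,
                                            ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (string path in Directory.GetFiles(@"D:\videos")) // placeholder folder
            {
                // Store entries uncompressed so the server just copies bytes through.
                var entry = archive.CreateEntry(Path.GetFileName(path),
                                                CompressionLevel.NoCompression);
                using (var entryStream = entry.Open())
                using (var fileStream = File.OpenRead(path))
                    fileStream.CopyTo(entryStream);
            }
        }
    }

    public bool IsReusable { get { return false; } }
}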
I have a folder with music videos which I want to back up from my laptop to an external HDD. I don't want to use a backup image, but a direct file copy, so I can watch the music videos from the backup HDD directly on another computer/laptop or a console.
Currently I use the freeware SyncBack Free to mirror the files to the external HDD. SyncBack Free is a nice tool, but it does not seem to fully satisfy my needs. The problem is that I like to modify the filenames of my music videos from time to time. Though SyncBack Free has an option for detecting files with identical content, it does not seem to work for videos, and you end up with two copies of the same file in each folder when you synchronize after a file name change.
So I'm thinking about writing my own freeware backup software.
The question is:
- How can I identify identical files with C#/.NET 4.0 without using the filename? I'm thinking of generating hashes or a checksum for the files, without knowing much about it.
- Is it too slow to really be used for backup software?
You can get a hash of a file like this:
using System;
using System.IO;
using System.Security.Cryptography;

static string GetFileHash(string filename)
{
    // Reads the whole file into memory, hashes it, and returns the hash as Base64.
    byte[] data = File.ReadAllBytes(filename);
    using (var md5 = MD5.Create())
    {
        byte[] hash = md5.ComputeHash(data);
        return Convert.ToBase64String(hash);
    }
}
MD5 is not the most secure hash, but it is fast, which makes it good for file checksums. If the files are large, ComputeHash() also accepts a Stream, so you don't have to load the whole file into memory.
You may also want to check out some other checksum algorithms in the HashLib library. It contains CRC and other algorithms which should be even faster. You can download it with NuGet.
There are other strategies you can use as well, such as checking whether only the first x bytes are the same.
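A minimal sketch of that shortcut for a pairwise comparison (the 4096-byte prefix length is an arbitrary choice; equal prefixes only mean the files might be identical, so a full hash is still needed to confirm a match):

using System.IO;

static bool PrefixesMatch(string fileA, string fileB, int prefixLength = 4096)
{
    using (var a = File.OpenRead(fileA))
    using (var b = File.OpenRead(fileB))
    {
        if (a.Length != b.Length)
            return false; // different sizes can never be identical files

        var bufferA = new byte[prefixLength];
        var bufferB = new byte[prefixLength];
        int readA = a.Read(bufferA, 0, prefixLength);
        int readB = b.Read(bufferB, 0, prefixLength);
        if (readA != readB)
            return false;

        for (int i = 0; i < readA; i++)
            if (bufferA[i] != bufferB[i])
                return false;

        return true;
    }
}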
You can keep a database of hashes that have been backed up so that you don't have to recompute the hashes each time the backup runs. You could loop through only files which have been modified since the last backup time and see if their hash is in your hash database. SQLite comes to mind as a good database to use for this if you want your backup program to be portable.
I'm creating an encrypted zip file using DotNetZip with WinZip AES-256. However, I'm able to read the directory and even remove some of the zip entries without having the encryption key.
As far as I understand, the directory visibility is a limitation of the ZIP format. I just wonder whether this also applies to changes such as removing/adding components to the zip file, or whether there is a way to prevent such changes.
EDIT:
A quick read of the ZIP file format spec suggests that double zipping is the only way to prevent arbitrary removal/addition of components in a zip file, regardless of the encryption of the individual entries.
From the WinZip knowledge base, last updated 20 Feb 2013:
To hide the names of the files in your encrypted Zip file, you can double zip them. To do this:
So I'll say no :-)
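A minimal sketch of the double-zip idea with DotNetZip (file names and password are placeholders); note that the outer archive's directory still shows the single entry "inner.zip", and the ZIP format still permits deleting that entry without the key:

using Ionic.Zip;

// Inner zip: encrypts the file contents, but the entry names stay visible.
using (var inner = new ZipFile())
{
    inner.Password = "secret";                          // placeholder
    inner.Encryption = EncryptionAlgorithm.WinZipAes256;
    inner.AddFile("report.xlsx");                       // placeholder file
    inner.Save("inner.zip");
}

// Outer zip: the only visible entry name is now "inner.zip",
// so the real file names are hidden from anyone without the key.
using (var outer = new ZipFile())
{
    outer.Password = "secret";                          // placeholder
    outer.Encryption = EncryptionAlgorithm.WinZipAes256;
    outer.AddFile("inner.zip");
    outer.Save("outer.zip");
}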
WinRAR has an option to encrypt the filenames; sadly, the algorithm isn't public.