We need to create, or integrate some existing software, which identifies the FTP folder from which to download files.
The problem is that the folder structure is configured by the user at run time, according to how the client stores the files on the FTP server, and is stored in some XML file or database.
The folder structure handling needs to be generic so that we can easily configure it for any type of structure. The folder or file names can contain dates, or parts of a date, which change every day.
For example, we can have a folder Files_DDMMYYYY, and inside it specific files which have to be downloaded every day.
OR
A single folder in which the individual file names contain dates.
The first major improvement you can make to your solution is a very simple one.
That would be to restructure the date format in your folder structure from using
Files_DDMMYYYY
to using
Files_YYYYMMDD
This way, your folders will list in the directory in sequential order. Otherwise, your folders will sort first by the day of the month, then by the month, and then by the year. With DDMMYYYY you'll see them listed something like this:
01102011
01102012
01102013
01112011
01112012
01112013
01122011
01122012
01122013
15102011
15102012
15102013
15112011
15112012
15112013
15122011
15122012
15122013
With YYYYMMDD you'll see them listed like this:
20111001
20111015
20111101
20111115
20111201
20111215
20121001
20121015
20121101
20121115
20121201
20121215
20131001
20131015
20131101
20131115
20131201
20131215
As I said, it's a very simple change that will help keep your structure organized. The second list is automatically sequentially ordered by date.
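Since the structure is user-configurable, one way to handle this is to keep the date portion of the name as a standard .NET date format string in your XML/database configuration and expand it at run time. A minimal sketch, assuming a hypothetical configured pattern (the pattern value and variable names below are purely illustrative):

// "Files_{0:yyyyMMdd}" stands in for whatever pattern the user configured (illustrative only)
string configuredPattern = "Files_{0:yyyyMMdd}";

// Expand the pattern for today's date at run time
string folderName = string.Format(configuredPattern, DateTime.Today);
// e.g. "Files_20131215" on 15 Dec 2013; names like this also sort chronologically as plain text

The same idea works for file names that embed a date: store the format string in the configuration rather than hard-coding it.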
Related
I have a job that, at set intervals, "looks" at the FTP server to see whether any new files have been uploaded. Once it finds any, it downloads them.
The question is how, using C#, to extract the time when the file was actually uploaded to FTP.
Thank you. I still can't figure out how to extract exactly the time when the file was uploaded to FTP, not modified, as the following shows the time of file modification:
fileInfo = session.GetFileInfo(FileFullPath);
dateUploaded = fileInfo.LastWriteTime;
Please advise some sample code that may be integrated into my current solution:
using (Session session = new Session())
{
    string FileFullPath = Dts.Variables["User::FTP_FileFullPath"].Value.ToString();

    session.Open(sessionOptions);

    DateTime dateTime = DateTime.Now;

    // Move the remote file aside, then download it to the local folder
    session.MoveFile(FileFullPath, newFTPFullPath);
    TransferOperationResult transferResult = session.GetFiles(newFTPFullPath,
        Dts.Variables["User::Local_DownloadFolder"].Value.ToString(), false);

    Dts.Variables["User::FTP_FileProcessDate"].Value = dateTime;
}
You might not be able to, unless you know the FTP server reliably sets the file create/modified date to the date it was uploaded. Do some test uploads and see. If it works out for you on this particular server then great; keep a note of when you last visited and retrieve files with a greater date. By way of an example, a test upload to an Azure FTP server just now (probably derived from Microsoft IIS) did indeed set the time of the file to the datetime it was uploaded. Beware that the listed file time sent by the server might not be in the same timezone as you are, nor will it carry any timezone info; it could simply be some number of hours out relative to your current time.
To get the date itself you'll need to parse the response the server gives you when you list the remote directory. If you're using an FTP library for C# (edit: you're using WinSCP), that might already be handled for you (edit: it is, see https://winscp.net/eng/docs/library_session_listdirectory and https://winscp.net/eng/docs/library_remotefileinfo). Unless things have improved recently, the default FTP provision in .NET isn't great; it's more intended for basic file retrieval than complex syncing, so I'd definitely look at using a capable library (and we don't do software recs here, sorry, so I can't recommend one) if you're scrutinizing the date info offered.
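If the timestamps do turn out to be reliable on your server, a rough sketch of the "remember when you last visited" idea with WinSCP's ListDirectory might look like this (the remote path, the lastVisit value, and how you persist it are all placeholders, not part of your existing code):

// using WinSCP;  -- assumes the session from your snippet is already open
DateTime lastVisit = DateTime.Now.AddDays(-1);   // placeholder: load this from wherever you persist it

RemoteDirectoryInfo directory = session.ListDirectory("/incoming");   // placeholder remote path
foreach (RemoteFileInfo file in directory.Files)
{
    // Treat anything with a listing time after the last visit as newly uploaded
    if (!file.IsDirectory && file.LastWriteTime > lastVisit)
    {
        session.GetFiles(
            RemotePath.EscapeFileMask(file.FullName),
            Dts.Variables["User::Local_DownloadFolder"].Value.ToString() + @"\").Check();
    }
}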
That said, there's another way to carry out this sync process, one that is more a side effect of what you want to do anyway and doesn't rely on parsing a non-standard listing output:
Keep a memory of every file you saw last time and reference it when looking at every file that is there now. This is actually quite easy to do:
Download all the files.
Disconnect.
Go back some time later and download any files that you don't already have.
Keep track of which files you downloaded and do something with them?
You say you want to download them anyway, so just treat any file you don't already have (or perhaps one that has a newer date, a different file size, etc.) as one that is new or changed since you last looked.
It's potentially a big job, depending on how many different servers you want to support.
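A rough sketch of that "download what you don't already have" approach, again with WinSCP (the local folder and remote path below are placeholders):

// using System.IO; using WinSCP;
string localFolder = @"C:\Downloads";                                  // placeholder
RemoteDirectoryInfo listing = session.ListDirectory("/incoming");      // placeholder remote path

foreach (RemoteFileInfo file in listing.Files)
{
    if (file.IsDirectory)
        continue;

    string localPath = Path.Combine(localFolder, file.Name);
    if (!File.Exists(localPath))   // never seen before, so treat it as new since the last run
    {
        session.GetFiles(RemotePath.EscapeFileMask(file.FullName), localPath).Check();
    }
}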
I've created a program that's supposed to run once each night. What it does is that it downloads images from my FTP, compresses them and uploads them back to the FTP. I'm using WinSCP for downloading and uploading files.
Right now I have a filemask applied that makes sure that only images are downloaded, that subdirectories are excluded and, most importantly, that only files modified in the last 24 hours are downloaded. Code snippet for this filemask:
DateTime currentDate = DateTime.Now;
string date = currentDate.AddHours(-24).ToString("yyyy-MM-dd");
transferOptions.FileMask = "*.jpg>=" + date + "; *.png>=" + date + "|*/";
Thing is, as I'm about to publish this I realize that if I run this once per night and it checks whether files were modified in the last 24 hours, it will just keep downloading and compressing the same files, as the modified timestamp is updated by each compression.
To fix this I need to edit the FileMask to only download NEW files, i.e. files that weren't in the folder the last time the program was run. I don't know if you can check the created timestamp in some way, or if I have to do some comparisons. I've been looking through the docs but I haven't found any solution for my specific use case.
Is there anyone experienced in WinSCP that can point me in the right direction?
It doesn't look like WinSCP can access the Created Date of the files.
Unless you can do something to make the files 'different' when you re-upload them (e.g. put them in a different folder), then your best option might be:
Forget about using FileMask
Use WinSCP method EnumerateRemoteFiles to get a list of the files
Loop through them yourself (it's a collection of RemoteFileInfo objects)
You'll probably need to keep a list of 'files already processed' somewhere and compare with that list
Call GetFiles for the specific files that you actually want
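A rough outline of that approach (the state file, paths and extension filtering below are made up for illustration, not WinSCP defaults):

// using System.Collections.Generic; using System.IO; using WinSCP;
var processed = new HashSet<string>(File.ReadAllLines(@"C:\state\processed.txt"));   // placeholder state file

foreach (RemoteFileInfo file in session.EnumerateRemoteFiles(
    "/images", null, WinSCP.EnumerationOptions.None))                                // placeholder remote path
{
    string ext = Path.GetExtension(file.Name).ToLowerInvariant();
    if (ext != ".jpg" && ext != ".png")
        continue;
    if (processed.Contains(file.FullName))
        continue;                                  // already compressed on a previous run

    session.GetFiles(RemotePath.EscapeFileMask(file.FullName), @"C:\work\").Check(); // placeholder local folder
    processed.Add(file.FullName);
}

File.WriteAllLines(@"C:\state\processed.txt", processed);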
There's a whole article on the WinSCP site: How do I transfer new/modified files only?
To summarize the article:
If you keep the past files locally, just run synchronization to download only the modified/new ones.
Then iterate the list returned by Session.SynchronizeDirectories to find out what the new files are (see the sketch after the snippet below).
Otherwise you have to use a time threshold. Just remember the last time you ran your application and use a time constraint that also includes a time, not just a date:
string date = lastRun.ToString("yyyy-MM-dd HH:mm:ss");
transferOptions.FileMask = "*.jpg>=" + date + "; *.png>=" + date + "|*/";
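And a rough sketch of the first, synchronization-based option (the local mirror folder and remote path are placeholders):

// using System; using WinSCP;
SynchronizationResult result = session.SynchronizeDirectories(
    SynchronizationMode.Local,          // download direction
    @"C:\mirror\images",                // placeholder: local copy kept between runs
    "/images",                          // placeholder remote path
    false);                             // do not delete local files missing on the server
result.Check();

// Only files that were new or modified on the server show up in Downloads
foreach (TransferEventArgs download in result.Downloads)
{
    Console.WriteLine("New or modified: " + download.FileName);
}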
I am developing a WinForms application that moves files between directories on a SAN. The app searches for a list of files in directories that contain over 200,000 files each, and then moves the found files to another directory.
For example I have this path:
\san\xxx\yyy\zzz
I perform the search in \zzz and then move the files to \yyy, but while I'm moving the files, another app is inserting more files into the xxx, yyy and zzz directories.
I don't want to impact the access or performance of other applications that use these directories.
The files' integrity is also a big concern, because once the application moves the files to the \yyy directory another app uses those files.
All of these steps should run in parallel.
What methods could I use to achieve this?
How can I handle concurrency?
Thanks in advance.
Using the published .NET sources, one can see how the move file operation works here. Now if you follow the logic you'll find there's a call into KERNEL32. The MSDN documentation for this states:
To perform this operation as a transacted operation, use the MoveFileTransacted function.
Based on that I'd say the move method is a safe/good way to move a file.
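For example, a basic move with a simple retry for the case where another process briefly holds the file (the paths, retry count and delay here are arbitrary, just to illustrate the idea):

// using System.IO; using System.Threading;
string source = @"\\san\xxx\yyy\zzz\example.dat";   // illustrative paths only
string target = @"\\san\xxx\yyy\example.dat";

for (int attempt = 0; attempt < 3; attempt++)
{
    try
    {
        File.Move(source, target);   // wraps the Win32 MoveFile call mentioned above
        break;
    }
    catch (IOException) when (attempt < 2)
    {
        Thread.Sleep(500);           // another process may have the file open; try again shortly
    }
}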
Searching through a large number of files is always going to take time. If you need to speed that operation up, I'd suggest thinking outside the box. For instance, I've seen cache buckets built for both letters and dates: with some sort of cache system you can answer queries like "files whose names start with 'a'" or "files between these two dates" much more quickly than by scanning the directory.
Presumably, these files are used by something. For the purposes of this conversation, let's assume they are indexed in an SQL Database.
TABLE: Files
File ID Filename
-------------- ----------------------------
1 \\a\b\aaaaa.bin
2 \\a\b\bbbbb.bin
To move them to \\a\c\ without impacting access, you can execute the following:
Copy the file from \\a\b\aaaa.bin to \\a\c\aaaa.bin
Update the database to point to \\a\c\aaaa.bin
Once all open sessions have been closed for \\a\b\aaaa.bin, delete the old file
If the files are on the same volume, you could also use a hardlink instead of a copy.
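A rough outline of those steps in code, using the Files table above (the connection string, the exact UPDATE statement and the paths are only illustrative):

// using System.Data.SqlClient; using System.IO;
string connectionString = "...";             // placeholder: your indexing database
string oldPath = @"\\a\b\aaaa.bin";
string newPath = @"\\a\c\aaaa.bin";

// 1. Copy first, so anything still reading the old path is unaffected
File.Copy(oldPath, newPath);

// 2. Repoint the database at the new copy
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("UPDATE Files SET Filename = @new WHERE Filename = @old", conn))
{
    cmd.Parameters.AddWithValue("@new", newPath);
    cmd.Parameters.AddWithValue("@old", oldPath);
    conn.Open();
    cmd.ExecuteNonQuery();
}

// 3. Delete the old file later, once nothing holds it open any more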
I am working on a small application to allow me to modify files and version each file before each change. What I would like the app to do is uniquely mark each file so that whenever the same file is opened up, the history for that particular file can be pulled back up. I am not using any of the big version control tools for this. How do I do this programmatically, please?
Simple solution: use a version control system which already exists (e.g. Git). But if you really want to do this yourself, then try this.
Each time you create a new version, copy the previous version of the file into a separate hidden directory and have a config file in that directory which holds the checksum of that file. The checksum will "more than likely" be unique since it's a hashed value of the file (each time the file changes, the checksum will be different; you need to calculate the checksum yourself).
When you open a file, just check whether that config file is in the directory and compare its checksum with the checksum of what's already open. If they are the same then you are looking at the same file. That's how it works.
You could also use checksums to optimise things: if a user goes into a file, changes things, changes them back to the way they were and saves, the checksum will come out the same (unless you include the modified date and time, etc.), so there's nothing new to store.
Each folder should have a name which follows a pattern (filenameVn.n, e.g. someTextFile.txt.v1.0), so that you can work out which directory a given point in the history lives in.
Another approach would be to simply copy the file and append some tag onto the end of its name (a checksum maybe? a version number?), so you wouldn't need extra folders.
Yet another approach would be to name each file after its recorded checksum and store the history of versions (along with the corresponding checksums) in a separate config file, then refer to that file when you want to figure out what the version you want to access is called. So each version would be referred to by its own checksum (like in Git).
So to sum up: each file version is stored somewhere, you can check whether two versions are the same (so you can optimise by avoiding storing multiple copies with no changes in them and wasting space), and you can dynamically determine where each version is and get access to it.
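For the checksum itself, a minimal sketch using SHA-256 from the framework (the helper name and example paths are just for illustration):

// using System; using System.IO; using System.Security.Cryptography;
static string ComputeChecksum(string path)
{
    using (var sha = SHA256.Create())
    using (var stream = File.OpenRead(path))
    {
        byte[] hash = sha.ComputeHash(stream);
        return BitConverter.ToString(hash).Replace("-", "");   // hex string such as "3A7F..."
    }
}

// Two versions are identical if their checksums match, e.g.:
// bool unchanged = ComputeChecksum("someTextFile.txt") == ComputeChecksum("someTextFile.txt.v1.0");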
Hope it gives you a bit more understanding of how to get started.
I have an application that stores images in a database.
Now I have learned that this isn't the best way to do this because of performance.
I've started extracting all the "blobs" out of the database and adding them (through a virtual folder) into the data folder.
The problem is that I already have 8000 blobs stored, and if I put them in the folder like this, "data/<blobid>/<blobname.extension>", the folder will contain too many subfolders to be manageable.
I was wondering what the best way to store these files is.
Should I group them by creation date, like this: "data/<year>/<month>/<day>/<blobid>/<name>"?
I also have to add that our files are stored in a tree in the database.
I was wondering if I should map that tree structure to the filesystem. The only problem there is that you can move branches, which would mean that I'd have to move the branches on the filesystem as well.
Any help is welcome.
Grtz,
M
What version of SQL Server are you using? Because if you are using 2008 you can use the FILESTREAM datatype to store images. This is just as efficient as storing them in the file system, but without any of the associated hassle. See Getting Traction with SQL Server 2008 Filestream.
A simple strategy is grouping according to the first [few] digit(s). E.g.:
1/
    2/
        123.blob
        129.blob
    5/
        151.blob
2/
    0/
        208.blob
That way, you know you'll never have more than 10 subdirectories in a directory. You may of course use more or fewer levels (of directories) and/or more digits per level.
A more complex, dynamic system could create sublevels on demand: if the number of blobs in a certain directory exceeds a preset maximum, create another 10 subdirectories and move the files in.
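A small helper along those lines, splitting the numeric id into one directory per leading digit (the two-level depth and the .blob extension are just illustrative choices):

// using System.Collections.Generic;
static string BlobPath(int blobId, int levels = 2)
{
    string id = blobId.ToString();
    var parts = new List<string>();

    // One directory per leading digit, e.g. 123 -> "1/2/123.blob", 208 -> "2/0/208.blob"
    for (int i = 0; i < levels && i < id.Length; i++)
        parts.Add(id[i].ToString());

    parts.Add(id + ".blob");
    return string.Join("/", parts);
}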
Most filesystems for BLOB data will set up a number of subdirectories. For instance, if you have IDs 1-10000, you could have:
00/
    00/
    01/
    02/
        00020.blob
        00021.blob
        ...
    ...
01/
02/
03/
...
The other question I have back for you: why is it so bad for you to manage them as BLOBs?
Do you need to store the files in the relevant tree structure? If not, you could name the file /YOURFOLDER/blobid_blobname.extension. This way the upload folder acts purely as a repository for the data, rather than mimicking the data structure.