Best way to calculate statistics from a file in C#

I have around 300k image files in a remote location. I have to download them and write the details of each file (with some additional info) to a text file, one line per file. Due to the nature of the info I'm getting, I have to process each file as it arrives to build some statistics. For example, I keep a list of objects with size and count attributes to see how many images of each size I have.
I have also thought about reading everything and writing it to the file without keeping any statistics, then opening the file again afterwards to compute them. But I can't think of a way to process a 250k-line, multi-attribute file for statistics.
I know the lists (yes, I have two of them) and the constant loop over each item are bogging the application down, but is there another way? Right now it has been 2 hours and the application is still at 26k. For each image, I do something like this to keep count: I check whether an image of the same size has come in before and, if so, increment the count on that list item.
public void AddSizeTokens(Token token)
{
int index = tokenList.FindIndex(item => item.size== token.size);
if (index >= 0)
tokenList[index].count+=1;
else
tokenList.Add(token);
}
A single line from the file I write to looks like this:
Hits Size Downloads Local Loc Virtual ID
204 88.3 4212 .../someImage.jpg f-dd-edb2-4a64-b42
I'm downloading the files like this:
try
{
using (WebClient client = new WebClient())
{
if (File.Exists(filePath + "/" + fileName + "." + ext))
{
return "File Exists: " + filePath + "/" + fileName + "." + ext;
}
client.DownloadFile(virtualPath, filePath + "/" + fileName + "." + ext);
return "Downloaded: " + filePath + "/" + fileName + "." + ext;
}
}
catch (Exception e) {
return "Problem Downloading " + fileName + ": " + e.Message;
}

You should change tokenList from List<Token> to Dictionary<long, Token>.
The key is the size.
Your code would look like this:
Dictionary<long, Token> tokens = new Dictionary<long, Token>();
public void AddSizeTokens(Token token)
{
Token existingToken;
if(!tokens.TryGetValue(token.size, out existingToken))
tokens.Add(token.size, token);
else
existingToken.count += 1;
}
That changes each lookup from an O(n) operation (FindIndex scans the list on every call) to an O(1) operation on average.
Another point to consider is Destrictor's comment: your internet connection speed is quite possibly the bottleneck here.
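The question also asks how the statistics could be computed afterwards from the written file. Here is a minimal sketch of that idea, assuming whitespace-separated columns with Size in the second position and a single header line, as in the sample line in the question. The names SizeStats, CountBySize and CountBySizeInFile are hypothetical, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SizeStats
{
    // Counts how many lines share each value of the Size column.
    // Assumption: whitespace-separated columns, Size is the second column,
    // and the first line is a header that must be skipped.
    public static Dictionary<string, int> CountBySize(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        bool isHeader = true;
        foreach (string line in lines)
        {
            if (isHeader) { isHeader = false; continue; }

            string[] columns = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (columns.Length < 2) continue; // skip blank or malformed lines

            string size = columns[1];
            counts.TryGetValue(size, out int current); // current stays 0 when the key is missing
            counts[size] = current + 1;
        }
        return counts;
    }

    // File.ReadLines streams the file line by line, so the whole file
    // is never held in memory at once.
    public static Dictionary<string, int> CountBySizeInFile(string path)
    {
        return CountBySize(File.ReadLines(path));
    }
}
```

Keeping the size as a string key avoids parsing issues with values like 88.3; if the sizes are really whole numbers, a long key as in the answer above works the same way.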

Well, I thought perhaps the code was the issue, and part of the problem was indeed there. As per Daniel Hilgarth's instructions, changing to a dictionary helped a lot, but only for the first 30 minutes. Then it got worse by the minute.
The problem was apparently the innocent-looking UI elements that I was feeding information to. They ate up so much CPU that they eventually killed the application. Minimizing the information fed to the UI helped (1.5k per minute, dropping to 1.3k at the slowest). Unbelievable! I hope this helps others who have similar problems.
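One possible way to reduce the UI feed is to report progress only every N items instead of on every item. This is just a sketch of that idea, not what the application above actually does; ThrottledReporter and its members are hypothetical names, and the callback stands in for whatever updates the UI.

```csharp
using System;

public sealed class ThrottledReporter
{
    private readonly int _every;
    private readonly Action<int> _report;
    private int _count;

    // every: how many items to process between two reports.
    // report: hypothetical callback that updates the UI with the running count.
    public ThrottledReporter(int every, Action<int> report)
    {
        if (every <= 0) throw new ArgumentOutOfRangeException(nameof(every));
        _every = every;
        _report = report ?? throw new ArgumentNullException(nameof(report));
    }

    // Call once per processed item; the callback only runs on every Nth call.
    public void ItemProcessed()
    {
        _count++;
        if (_count % _every == 0)
            _report(_count);
    }
}
```

With every set to 100, processing 300k files would touch the UI 3,000 times instead of 300,000.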

Related

How can I fix my code to move files that already exist to another directory?

This is my first C# script and my first non-SQL script in general. I'm really proud of it (and I couldn't have done it this quickly without help from this community, thanks!), but I know it's going to be all kinds of messy.
The script loops through all files in a single directory, removes special characters from the file names, and renames the files to a standard, user-friendly, name. The script is looking for a specific set of files in the directory. If it finds a file that isn't supposed to be in the directory, it moves the file to a safe folder and renames it. If the folder
I'm working with 4 files that have dynamic names that will include numbers and special characters. The renaming process happens in two steps:
Remove special characters and numbers from the name. Ex: From "EOY 12.21.2018 - 12.28.2018 PRF.xls" to "EOYPRF.xls"
Rename the file to clearly label what the file is. Ex: From "EOYPRF.xls" to "EOY_CompanyName.xls"
There may be files added to this directory by accident, and since they are payroll files, they are highly confidential and cannot be moved unless they need to be moved (only if they are one of the 4 files), so I move them to a subdirectory in the same directory the files are stored in and rename them.
I am also trying to account for if my script or process messes up midway. This script is part of a larger automation process run in SSIS, so there are many failure points. It may be possible that the script fails and leaves one or all of the 4 files in the directory. If this is the case, I need to move the file out of the main directory before the user adds new, unaltered master files to be processed. If the directory contains files of the same final name ("EOY_CompanyName.xls") then it will not work properly.
I'm testing the script by placing the following three scenarios in the directory:
2 files that are not in any way associated with the 4 master files.
4 unaltered master files formatted with numbers and special characters: "EOY 12.21.2018 - 12.28.2018 PRF.xls"
4 master files already in their final state (simulating a failure before the files are moved to their final directory). Ex: "EOY_CompanyName.xls"
The problem I'm facing is in the rare scenario where there are both unaltered master files and final master files in the directory, the script runs up until the first unaltered file, removes the special characters, then fails at the final renaming step because a file already exists with the same name (Scenario 3 from the 3 points above). It'll then continue to run the script and will move one of the master files into the unexpected file directory and stop processing any other files for some reason. I really need some help from someone with experience.
I've tried so many things, but I think it's a matter of the order in which the files are processed. I have two files named "a.xls" and "b.xls" which are placeholders for unexpected files. They are the first two files in the directory and always get processed first. The 3rd file in the directory is the file named above in its unaltered form ("EOY 12.21.2018 - 12.28.2018 PRF.xls"). It gets renamed and moved into the unexpected files folder, but really it should be passed over to move the master files containing the final name ("EOY_CompanyName.xls") into the unexpected folder. I want to make sure that the script only processes new files whenever it's run, so I want to move any already processed files that failed to get moved via the script into another directory.
public void Main()
{
///Define paths and vars
string fileDirectory_Source = Dts.Variables["User::PayrollSourceFilePath"].Value.ToString();
string fileDirectory_Dest = Dts.Variables["User::PayrollDestFilePath"].Value.ToString();
string errorText = Dts.Variables["User::errorText"].Value.ToString();
DirectoryInfo dirInfo_Source = new DirectoryInfo(fileDirectory_Source);
DirectoryInfo dirInfo_Dest = new DirectoryInfo(fileDirectory_Dest);
string finalFileName = "";
List<string> files = new List<string>(new string[]
{
fileDirectory_Source + "EOY_PRF.xls",
fileDirectory_Source + "EOY_SU.xls",
fileDirectory_Source + "FS_PRF.xls",
fileDirectory_Source + "FS_SU.xls"
});
Dictionary<string, string> fileNameChanges = new Dictionary<string, string>();
fileNameChanges.Add("EOYReportPRF.xls", "EOY_PRF.xls");
fileNameChanges.Add("PayrollEOY.xls", "EOY_SU.xls");
fileNameChanges.Add("PRFFundingStatement.xls", "FS_PRF.xls");
fileNameChanges.Add("SUFundingStatement.xls", "FS_SU.xls");
///Determine if all files present
int count = dirInfo_Source.GetFiles().Length;
int i = 0;
///Loop through directory to standardize file names
try
{
foreach (FileInfo fi in dirInfo_Source.EnumerateFiles())
{
string cleanFileName = Regex.Replace(Path.GetFileNameWithoutExtension(fi.Name), "[0-9]|[.,/ -]", "").TrimEnd() + fi.Extension;
File.Move(fileDirectory_Source + Path.GetFileName(fi.Name), fileDirectory_Source + cleanFileName);
///Move unexpectd files in source directory
if (!fileNameChanges.ContainsKey(cleanFileName))
{
errorText = errorText + "Unexpected File: " + cleanFileName.ToString() + " moved into the Unexpected File folder.\r\n";
File.Move(dirInfo_Source + cleanFileName, dirInfo_Source + "Unexpected Files\\" + Path.GetFileNameWithoutExtension(cleanFileName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + fi.Extension);
}
if (fileNameChanges.ContainsKey(cleanFileName))
{
///Final Friendly File Name from Dict
var friendlyName = fileNameChanges[cleanFileName];
///Handle errors produced by files that already exist
if (files.Contains(fileDirectory_Source + friendlyName))//File.Exists(fileDirectory_Source + friendlyName))
{
MessageBox.Show("File.Exists(dirInfo_Source + friendlyName)" + File.Exists(dirInfo_Source + friendlyName).ToString() + " cleanFileName " + cleanFileName);
errorText = errorText + "File already exists: " + friendlyName.ToString() + " moved into the Unexpected File folder.\r\n";
File.Move(dirInfo_Source + friendlyName, dirInfo_Source + "Unexpected Files\\" + Path.GetFileNameWithoutExtension(friendlyName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + Path.GetExtension(friendlyName));
return;
}
///Rename files to friendly name
File.Move(dirInfo_Source + cleanFileName, dirInfo_Source + friendlyName);
finalFileName = friendlyName.ToString();
}
///Count valid PR files
if (files.Contains(dirInfo_Source + finalFileName))
{
i++;
}
}
///Pass number of files in source folder to SSIS
Dts.Variables["User::FilesInSourceDir"].Value = i;
}
catch (Exception ex)
{
errorText = errorText + ("\r\nError at Name Standardization step: " + ex.Message.ToString()) + $"Filename: {finalFileName}\r\n";
}
///Search for missing files and store paths
try
{
if (i != 4)
{
var errors = files.Where(x => !File.Exists(x)).Select(x => x);
if (errors.Any())
errorText = (errorText + $" Missing neccessary files in PR Shared drive. Currently {i} valid files in directory.\r\n\n" + "Files missing\r\n" + string.Join(Environment.NewLine, errors) + "\r\n");
}
}
catch (Exception ex)
{
errorText = errorText + ("Error at Finding Missing PR Files step: " + ex.Message.ToString()) + "\r\n\n";
throw;
}
///Loop through directory to move files to encrypted location
try
{
if (i == 4)
foreach (FileInfo fi in dirInfo_Source.EnumerateFiles())
{
fi.MoveTo(fileDirectory_Dest + Path.GetFileName(fi.FullName));
}
}
catch (Exception ex)
{
errorText = errorText + ("Error at Move Files to Encrypted Directory step: " + ex.Message.ToString()) + "\r\n";
}
Dts.TaskResult = (int)ScriptResults.Success;
Dts.Variables["User::errorText"].Value = errorText;
}
#region ScriptResults declaration
/// <summary>
/// This enum provides a convenient shorthand within the scope of this class for setting the
/// result of the script.
///
/// This code was generated automatically.
/// </summary>
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
}
}
Ideally, I would like to move out all files that are already in the folder before the new files are cleaned and renamed, so I don't receive errors or commit records to the database that already exist.
If you made it this far, thank you for your time and I appreciate you taking the hour it probably took to read this. You are a hero.
As I understand it, you want to move out any of the four final-named files if they already exist before doing anything else. I would go with the code below. Please note that I did not run it.
I hope I understood you correctly.
///Loop through directory to standardize file names
try
{
//Cleanup source folder
//Entries in files are already full paths (fileDirectory_Source + name)
foreach (string filePath in files)
{
if (File.Exists(filePath))
{
//Time to move the file, it's old
string fileShortName = Path.GetFileName(filePath);
errorText = errorText + "Old File: " + fileShortName + " moved into the Old File folder.\r\n";
File.Move(filePath, fileDirectory_Source + "Old Files\\" + Path.GetFileNameWithoutExtension(fileShortName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + Path.GetExtension(fileShortName));
}
}
foreach (FileInfo fi in dirInfo_Source.GetFiles())

Download a file from a variable path every 15 minutes

I'm working on a console application that is scheduled in Windows Task Scheduler to run every 15 minutes. Each run downloads a file from a public website using WebClient.
string Url1 = "http://www2.epa.gov/sites/production/files/" + DateTime.Now.Year + "-" + DateTime.Now.Month.ToString("d2")+ "/rindata.csv";
WebClient webClient = new WebClient();
webClient.DownloadFile(Url1, filename);
The above code works fine, but the URL might or might not change in any given month, which causes my application to throw a 404 exception.
Example
Consider the URL http://www2.epa.gov/sites/production/files/2015-09/rindata.csv. The variable part of the URL is 2015-09, which contains the data for September. It might change to 2015-10 for October if there is any data change for that month, but there is no pattern to when or whether it changes each month.
Is there a better way to handle this?
To make it download every 15 minutes you can use a timer, set its interval to 15 minutes (in milliseconds), and put that code in the tick handler. Regarding the change of URL, I'm not aware of a better way to do it.
It sounds like the URL won't necessarily update every month. If that is the case, don't re-evaluate the string every 15 minutes. On the first run, set
string Url1 = "http://www2.epa.gov/sites/production/files/" + DateTime.Now.Year + "-" + DateTime.Now.Month.ToString("d2")+ "/rindata.csv";
You'll have to save this working value somewhere, such as the config file. Then you can keep reusing it until it fails, so every 15 minutes only run
WebClient webClient = new WebClient();
webClient.DownloadFile(Url1, filename);
instead of evaluating the URL again.
When it fails then re-evaluate it to the current month again.
if (failed)
{
Url1 = "http://www2.epa.gov/sites/production/files/" + DateTime.Now.Year + "-" + DateTime.Now.Month.ToString("d2")+ "/rindata.csv"; // assign to the existing variable instead of declaring a new local
}
then overwrite the saved value.
More info on saving to a settings file:
https://msdn.microsoft.com/en-us/library/aa730869(VS.80).aspx
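Put together, the approach above could look roughly like the sketch below. It is only an illustration under some assumptions: the working URL is cached in a plain text file rather than the settings file, and the actual download is passed in as a delegate (in the real application that delegate would wrap WebClient.DownloadFile in a try/catch for WebException and return false on failure). CachedUrlDownloader, BuildCurrentUrl, Download and cachePath are hypothetical names.

```csharp
using System;
using System.IO;

public static class CachedUrlDownloader
{
    // Builds the URL for the month of the given date, same format as in the question.
    public static string BuildCurrentUrl(DateTime now)
    {
        return "http://www2.epa.gov/sites/production/files/" + now.Year + "-" + now.Month.ToString("d2") + "/rindata.csv";
    }

    // cachePath: text file holding the last URL that worked (assumption, stands in for the config file).
    // tryDownload: hypothetical hook that returns true when the download succeeded.
    public static bool Download(string cachePath, DateTime now, Func<string, bool> tryDownload)
    {
        string url = File.Exists(cachePath)
            ? File.ReadAllText(cachePath).Trim()
            : BuildCurrentUrl(now);

        if (tryDownload(url))
        {
            File.WriteAllText(cachePath, url); // remember the working value
            return true;
        }

        // The saved URL failed: re-evaluate for the current month and retry once.
        string fresh = BuildCurrentUrl(now);
        if (fresh != url && tryDownload(fresh))
        {
            File.WriteAllText(cachePath, fresh); // overwrite the saved value
            return true;
        }
        return false;
    }
}
```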
In regard to the date change, I would probably do a backwards search from 12 to 1. Since the newest data is all you're interested in, the highest month that doesn't return a 404 would always hold the freshest data. A simple loop that checks for a 404 would be fine if you have no other way to know what the URL is. This is a very basic example, but the concept should be sound.
for (int i = 12; i >= 1; i--)
{
string folder = string.Format("{0}-{1}", DateTime.Now.Year, i.ToString().PadLeft(2, '0'));
string Url1 = "http://www2.epa.gov/sites/production/files/" + folder + "/rindata.csv";
try
{
using (WebClient client = new WebClient())
{
client.DownloadFile(Url1, "/rindata.csv");
}
break; // newest available month found, stop searching
}
catch (WebException e)
{
// Not necessarily a 404; e.Message carries the actual reason
Console.WriteLine(string.Format("Download failed: {0} ({1})", Url1, e.Message));
}
}

Best way to fix: Access to path 'C:\..\USB-map' is denied because the folder is already in use

I am trying to create a .cmd file containing this line: call .\CopyToTarget.cmd w60 glb "C:\Users\oma\me\trunk-r664\USB-map". I write this line about 5 times.
But since \trunk-r664\ is already in use, it seems I cannot write @"\trunk-r664\USB-map" into the .cmd file for some reason. Does anyone know how to fix it? I keep getting the error: UnauthorizedAccessException was unhandled: Access to the path 'C:\Users\me\Desktop\trunk-r664\USB-map' is denied.
using (StreamWriter sw = File.CreateText(Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
+ "\\trunk-r664\\trunk\\cmd\\custom\\RunAll.cmd"))
{
for (int j=0;j<installeerlijst64.Count;j++)
{
sw.WriteLine("call .\\CopyToTarget.cmd " + installeerlijst64[j] + " glb" +
File.CreateText(Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\trunk-r664\USB-map"));
}
}
I tried this too, but it tells me I am using an illegal character:
"\""+File.CreateText(Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
+ @"\trunk-r664\USB-map" + "\""));
File.CreateText creates a new file and keeps it open. On the first iteration of the for loop it creates and opens the file USB-map and holds a handle to it. On the second iteration it tries to do the same thing again, hence the "already in use" error. (If USB-map is an existing folder, File.CreateText fails as well, because it cannot create a file at a folder's path.)
Remove File.CreateText and you will get the desired result.
sw.WriteLine("call .\\CopyToTarget.cmd " + installeerlijst64[j] + " glb " + "\"" +
Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\trunk-r664\USB-map" + "\"");
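As a slightly fuller sketch of the same fix, the quoted path can be built as a plain string and written once per target. RunAllWriter, BuildCallLine and WriteRunAll are hypothetical names; only File.CreateText for the .cmd file itself remains.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RunAllWriter
{
    // Builds one line of the .cmd file; the folder path is only text here,
    // wrapped in double quotes so paths with spaces survive.
    public static string BuildCallLine(string target, string usbFolder)
    {
        return "call .\\CopyToTarget.cmd " + target + " glb \"" + usbFolder + "\"";
    }

    // Writes one call line per target into the .cmd file at cmdPath.
    public static void WriteRunAll(string cmdPath, IEnumerable<string> targets, string usbFolder)
    {
        using (StreamWriter sw = File.CreateText(cmdPath))
        {
            foreach (string target in targets)
                sw.WriteLine(BuildCallLine(target, usbFolder));
        }
    }
}
```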

Temp Files Asp.net

Is it possible to create temp files and images in ASP.NET applications using something like this?
If not, how can I do it?
(imagePB is a previously processed Bitmap)
if (System.IO.File.Exists(System.IO.Path.GetTempPath() + @"img" + imgID.ToString() + "PB" + extencao) == true)
{
try
{
System.IO.File.Delete(System.IO.Path.GetTempPath() + @"img" + imgID.ToString() + "PB" + extencao);
imagePB.Save(System.IO.Path.GetTempPath() + @"img" + imgID.ToString() + "PB" + extencao, imgFormat);
}
catch (Exception)
{ }
}
Yes, GetTempPath() should return a temporary file path, and the code you have posted should work. http://msdn.microsoft.com/en-us/library/system.io.path.gettemppath.aspx has more information about how GetTempPath() gets the path.
However, it does not verify that the temp directory exists or that the application can write to it. I haven't run into a situation where GetTempPath() returned an inaccessible path, but you'd probably want to handle that case in your application.
Also be mindful that this is quite possibly C:\Windows\Temp. It could have limited disk space, or its contents could be deleted at any time by someone else when disk space is needed. You may want to create a temp path within your application, and delete it when you no longer need it.
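A minimal sketch of that last suggestion, assuming the caller picks the base directory (for example a folder the application owns, or Path.GetTempPath()). AppTempFiles, CreateTempDirectory and DeleteTempDirectory are hypothetical names.

```csharp
using System;
using System.IO;

public static class AppTempFiles
{
    // Creates a uniquely named working folder under baseDirectory and returns its path.
    // Directory.CreateDirectory throws if the location is not writable, which
    // surfaces the "inaccessible temp path" case early.
    public static string CreateTempDirectory(string baseDirectory)
    {
        string dir = Path.Combine(baseDirectory, Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        return dir;
    }

    // Removes the working folder and everything inside it once it is no longer needed.
    public static void DeleteTempDirectory(string dir)
    {
        if (Directory.Exists(dir))
            Directory.Delete(dir, recursive: true);
    }
}
```

The image from the question could then be saved with Path.Combine(dir, "img" + imgID + "PB" + extencao) instead of concatenating onto GetTempPath() directly.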

When to use try/catch blocks?

I've done my reading and understand what a Try/Catch block does and why it's important to use one. But I'm stuck on knowing when/where to use them. Any advice? I'll post a sample of my code below in hopes that someone has some time to make some recommendations for my example.
public AMPFileEntity(string filename)
{
transferFileList tfl = new transferFileList();
_AMPFlag = tfl.isAMPFile(filename);
_requiresPGP = tfl.pgpRequired(filename);
_filename = filename.ToUpper();
_fullSourcePathAndFilename = ConfigurationSettings.AppSettings.Get("sourcePath") + _filename;
_fullDestinationPathAndFilename = ConfigurationSettings.AppSettings.Get("FTPStagePath") + _filename;
_hasBeenPGPdPathAndFilename = ConfigurationSettings.AppSettings.Get("originalsWhichHaveBeenPGPdPath");
}
public int processFile()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(" ");
sb.AppendLine(" --------------------------------");
sb.AppendLine(" Filename: " + _filename);
sb.AppendLine(" AMPFlag: " + _AMPFlag);
sb.AppendLine(" Requires PGP: " + _requiresPGP);
sb.AppendLine(" --------------------------------");
sb.AppendLine(" ");
string str = sb.ToString();
UtilityLogger.LogToFile(str);
if (_AMPFlag)
{
if (_requiresPGP == true)
{
encryptFile();
}
else
{
UtilityLogger.LogToFile("This file does not require encryption. Moving file to FTPStage directory.");
if (File.Exists(_fullDestinationPathAndFilename))
{
UtilityLogger.LogToFile(_fullDestinationPathAndFilename + " already exists. Archiving that file.");
if (File.Exists(_fullDestinationPathAndFilename + "_archive"))
{
UtilityLogger.LogToFile(_fullDestinationPathAndFilename + "_archive already exists. Overwriting it.");
File.Delete(_fullDestinationPathAndFilename + "_archive");
}
File.Move(_fullDestinationPathAndFilename, _fullDestinationPathAndFilename + "_archive");
}
File.Move(_fullSourcePathAndFilename, _fullDestinationPathAndFilename);
}
}
else
{
UtilityLogger.LogToFile("This file is not an AMP transfer file. Skipping this file.");
}
return (0);
}
private int encryptFile()
{
UtilityLogger.LogToFile("This file requires encryption. Starting encryption process.");
// first check for an existing PGPd file in the destination dir. if exists, archive it - otherwise this one won't save. it doesn't overwrite.
string pgpdFilename = _fullDestinationPathAndFilename + ".PGP";
if(File.Exists(pgpdFilename))
{
UtilityLogger.LogToFile(pgpdFilename + " already exists in the FTPStage directory. Archiving that file." );
if(File.Exists(pgpdFilename + "_archive"))
{
UtilityLogger.LogToFile(pgpdFilename + "_archive already exists. Overwriting it.");
File.Delete(pgpdFilename + "_archive");
}
File.Move(pgpdFilename, pgpdFilename + "_archive");
}
Process pProc = new Process();
pProc.StartInfo.FileName = "pgp.exe";
string strParams = @"--encrypt " + _fullSourcePathAndFilename + " --recipient infinata --output " + _fullDestinationPathAndFilename + ".PGP";
UtilityLogger.LogToFile("Encrypting file. Params: " + strParams);
pProc.StartInfo.Arguments = strParams;
pProc.StartInfo.UseShellExecute = false;
pProc.StartInfo.RedirectStandardOutput = true;
pProc.Start();
pProc.WaitForExit();
//now that it's been PGPd, save the orig in 'hasBeenPGPd' dir
UtilityLogger.LogToFile("PGP encryption complete. Moving original unencrypted file to " + _hasBeenPGPdPathAndFilename);
if(File.Exists(_hasBeenPGPdPathAndFilename + _filename + "original_which_has_been_pgpd"))
{
UtilityLogger.LogToFile(_hasBeenPGPdPathAndFilename + _filename + "original_which_has_been_pgpd already exists. Overwriting it.");
File.Delete(_hasBeenPGPdPathAndFilename + _filename + "original_which_has_been_pgpd");
}
File.Move(_fullSourcePathAndFilename, _hasBeenPGPdPathAndFilename + _filename + "original_which_has_been_pgpd");
return (0);
}
}
}
The basic rule of thumb for catching exceptions is to catch them if and only if you have a meaningful way of handling them.
Don't catch an exception if you're only going to log it and throw it up the stack. That serves no purpose and clutters the code.
Do catch an exception when you are expecting a failure in a specific part of your code and you have a fallback for it.
In languages with checked exceptions, such as Java, you are sometimes required to use try/catch blocks and have no other choice. (C# has no checked exceptions, so this does not apply to the code in the question.) Even then, make sure you log properly and handle the exception as cleanly as possible.
As some others have said, you want to use try-catch blocks around code that can throw an exception AND that you are prepared to deal with.
Regarding your particular examples, File.Delete can throw a number of exceptions, for example, IOException, UnauthorizedAccessException. What would you want your application to do in those situations? If you try to delete the file but someone somewhere else is using it, you will get an IOException.
try
{
File.Delete(pgpdFilename + "_archive");
}
catch(IOException)
{
UtilityLogger.LogToFile("File is in use, could not overwrite.");
//do something else meaningful to your application
//perhaps save it under a different name or something
}
Also, keep in mind that if this does fail, the File.Move you do next, outside of your if block, will also fail (again with an IOException, because the file was not deleted and is still there).
I was taught to use try/catch/finally for any methods/classes where multiple errors could occur and that you can actually handle. Database transactions, FileSystem I/O, streaming, etc. Core logic usually doesn't require try/catch/finally.
The great part about try/catch/finally is that you can have multiple catches, so you can create a series of exception handlers to deal with very specific errors, or use a general exception to catch whatever errors you don't see coming.
In your case, you're using File.Exists, which is good, but there may be another problem with the disk that throws an error File.Exists cannot handle. Yes, it's a boolean method, but say the file is locked: what happens if you try to write to it? With the catch, you can plan for a rare scenario, but without try/catch/finally, you may be exposing the code to completely unforeseen conditions.
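To illustrate the multiple-catch idea, here is a small sketch around File.Delete, which the question uses. SafeDelete and TryDelete are hypothetical names, and returning a short status string is just one possible reaction; the real application would log through UtilityLogger or pick a fallback.

```csharp
using System;
using System.IO;

public static class SafeDelete
{
    // Tries to delete a file and reports what happened instead of letting
    // the two exceptions we know how to react to bubble up.
    public static string TryDelete(string path)
    {
        try
        {
            File.Delete(path); // does nothing if the file is already gone
            return "deleted";
        }
        catch (UnauthorizedAccessException)
        {
            // No permission, the file is read-only, or the path is a directory.
            return "access denied";
        }
        catch (IOException)
        {
            // Typically: the file is in use by another process.
            return "in use";
        }
        // Any other exception type is deliberately not caught here,
        // because this method has no meaningful way to handle it.
    }
}
```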
The others have given quite a number of good pointers and references.
My input is a short one:
When to use it is one thing; equally or more important is how to use it properly.
PS: "it" refers to try/catch exception handling.
