I have a DataTable that contains approximately 150,000 rows. I am trying to compare those rows to the files (approximately 200,000) in a specific folder. My code looks like this:
foreach (DataRow row in KundenundDateinamenohneDoppelte.Rows)
{
OrdnerSchadenanlage = @"" + Settings.Default.PathtoKundenablage + "\\Kundenablage\\";
foreach (string item in listbox2.Items)
{
if (item == "Leerzeile")
{
OrdnerSchadenanlage += " ";
}
else if (item == "-" || item == ",")
{
OrdnerSchadenanlage += item;
}
else
{
OrdnerSchadenanlage += row[item].ToString().Replace("/", " u. ").Replace(@"""", "").Replace("00:00:00", "");
}
}
if (!Directory.Exists(OrdnerSchadenanlage))
{
Directory.CreateDirectory(OrdnerSchadenanlage);
}
DirectoryInfo DIR = new DirectoryInfo(Settings.Default.PathtoGesamtablage);
FileInfo[] FILES = DIR.GetFiles("*.*");
DirectoryInfo[] DIRECTORIES = DIR.GetDirectories();
foreach (FileInfo f in FILES)
{
double _fileSize = new System.IO.FileInfo(f.FullName).Length;
string filenametocheck = string.Empty;
string filenamewithoutklammern = f.Name;
string pattern = @"\s\([0-9]+\)\.";
bool m = Regex.IsMatch(filenamewithoutklammern, pattern);
if (m == true)
{
filenamewithoutklammern = Regex.Replace(filenamewithoutklammern, pattern, ".");
filenametocheck = filenamewithoutklammern;
}
else
{
filenametocheck = f.Name;
}
if (System.IO.Path.GetFileNameWithoutExtension(filenametocheck) == row["Dateiname"].ToString() && Math.Ceiling(_fileSize / 1024) == Convert.ToDouble(row["Größe in kB"].ToString()))
{
string folderfound = "";
if (newtable.Rows.Count > 0)
{
foreach (DataRow dr in newtable.Rows)
{
if (dr["Suchart"].ToString() == "beginnend")
{
if (f.Name.StartsWith(dr["Dateiname"].ToString()))
{
folderfound = dr["Ordnername"].ToString();
break;
}
else
{
folderfound = string.Empty;
}
}
else
{
if (f.Name.Contains(dr["Dateiname"].ToString()))
{
folderfound = dr["Ordnername"].ToString();
break;
}
else
{
folderfound = string.Empty;
}
}
}
if (folderfound != string.Empty)
{
OrdnermitUnterodner = OrdnerSchadenanlage + "\\" + folderfound + "";
if (!Directory.Exists(OrdnermitUnterodner))
{
Directory.CreateDirectory(OrdnermitUnterodner);
}
if (!File.Exists(System.IO.Path.Combine(OrdnerSchadenanlage, f.Name)) && !File.Exists(System.IO.Path.Combine(OrdnermitUnterodner, f.Name)))
{
File.Copy(f.FullName, System.IO.Path.Combine(OrdnermitUnterodner, f.Name));
File.Delete(f.FullName);
//kopiervorgang.Rows.Add(1, "Copied file " + f.FullName + " to " + System.IO.Path.Combine(OrdnermitUnterodner, f.Name) + "\n");
}
}
else
{
if (!File.Exists(System.IO.Path.Combine(OrdnerSchadenanlage, f.Name)) && !File.Exists(System.IO.Path.Combine(OrdnermitUnterodner, f.Name)))
{
File.Copy(f.FullName, System.IO.Path.Combine(OrdnerSchadenanlage, f.Name));
File.Delete(f.FullName);
//kopiervorgang.Rows.Add(1, "Copied file " + f.FullName + " to " + System.IO.Path.Combine(OrdnerSchadenanlage, f.Name) + "\n");
}
}
}
else
{
if (!File.Exists(System.IO.Path.Combine(OrdnerSchadenanlage, f.Name)) && !File.Exists(System.IO.Path.Combine(OrdnermitUnterodner, f.Name)))
{
File.Copy(f.FullName, System.IO.Path.Combine(OrdnerSchadenanlage, f.Name));
File.Delete(f.FullName);
//kopiervorgang.Rows.Add(1, "Copied file " + f.FullName + " to " + System.IO.Path.Combine(OrdnerSchadenanlage, f.Name) + "\n");
}
}
}
else
{
//kopiervorgang.Rows.Add(0, "Copy failed cause there is no match between filenmame and filesize.\n");
}
}
GC.Collect();
progressrow++;
double Percentage = Convert.ToDouble(progressrow) * 100 / KundenundDateinamenohneDoppelte.Rows.Count;
übertragenworker.ReportProgress(0, Percentage);
}
kopiervorgang.Rows.Add(0, "Übertragung abgeschlossen.");
e.Result = kopiervorgang;
}
This works just fine: the files are compared and copied to the intended folder. However, processing just 0.2% of this task takes my application about 1 hour. Extrapolated to 100%, this would be about 17 days.
Is there any way to make this loop more efficient and faster?
This needs what we call "refactoring" to achieve the desired optimization. There are too many nested loops, and they contain code that can be moved out of them.
One kind of refactoring needed here is moving everything that does not depend on a loop out of that loop. Anything that depends only on those loop-independent values can then be moved out as well (as long as it has no dependency on the loop itself, e.g. on the loop variable).
For example, I would start by moving this chunk out of the loop, because it is computationally expensive but does not depend on the loop at all.
// should remove these lines from the loop!
DirectoryInfo DIR = new DirectoryInfo(Settings.Default.PathtoGesamtablage);
FileInfo[] FILES = DIR.GetFiles("*.*");
DirectoryInfo[] DIRECTORIES = DIR.GetDirectories();
The process of refactoring involves identifying these cases and moving them to more suitable places. Another thing to do is to split the big method into smaller methods.
More things that can be moved out of the loop:
OrdnerSchadenanlage = @"" + Settings.Default.PathtoKundenablage + "\\Kundenablage\\";
string pattern = @"\s\([0-9]+\)\.";
FILES
Now that FILES is out of the loop, we can also move this out of the loop: double _fileSize = new System.IO.FileInfo(f.FullName).Length; The files are the same for every row in KundenundDateinamenohneDoppelte.Rows (the main foreach), so we can read all the sizes ONCE instead of once per data row.
Note: fileSize should be long not double.
The way to remove this from the loop is to create a dictionary of file sizes. For example:
// outside the loop, just under `FILES`
Dictionary<string, long> fileSizes = FILES.ToDictionary(file => file.FullName, file => file.Length);
You can then use this dictionary like this to get the file size without doing many calls to FileInfo.Length!
// inside the FILES loop, I replace double with long
long _fileSize = fileSizes[f.FullName];
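Taking the same idea one step further, the inner scan over all files could be replaced by a lookup that is built once. The following is only a minimal sketch under assumptions: the names FileIndex, MakeKey and Build are hypothetical, and it assumes a row matches a file when the name (without copy suffix and extension) and the size rounded up to whole kB are equal, as in the question's comparison.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: indexes the files of a folder once so that each
// DataRow can be matched with a single lookup instead of a full scan.
public static class FileIndex
{
    // Matches a copy suffix such as " (2)." in "report (2).pdf".
    private static readonly Regex CopySuffix = new Regex(@"\s\([0-9]+\)\.", RegexOptions.Compiled);

    // Key: file name without copy suffix and extension, plus size in whole kB (rounded up).
    public static string MakeKey(string fileName, long sizeInBytes)
    {
        string cleaned = CopySuffix.Replace(fileName, ".");
        long sizeInKb = (sizeInBytes + 1023) / 1024; // integer ceiling of bytes / 1024
        // '|' cannot occur in Windows file names, so it is a safe separator.
        return Path.GetFileNameWithoutExtension(cleaned) + "|" + sizeInKb;
    }

    // Enumerates the folder a single time and groups the files by key.
    public static ILookup<string, FileInfo> Build(string folder)
    {
        return new DirectoryInfo(folder)
            .EnumerateFiles()
            .ToLookup(f => MakeKey(f.Name, f.Length));
    }
}
```

Inside the row loop, the key for a row would be built from row["Dateiname"] and row["Größe in kB"] in the same "name|kB" form, and the lookup returns the matching files (or an empty sequence). This assumes the size column holds a whole number that formats the same way as the key; if it does not, normalize it first.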
newtable
I don't know what is inside newtable or how it changes, but I don't see it depending on the outer loops, so it should be possible to move this whole section out of the nested loops:
foreach (DataRow dr in newtable.Rows) {
if (dr["Suchart"].ToString() == "beginnend") {
if (f.Name.StartsWith(dr["Dateiname"].ToString())) {
folderfound = dr["Ordnername"].ToString();
break;
} else {
folderfound = string.Empty;
}
} else {
if (f.Name.Contains(dr["Dateiname"].ToString())) {
folderfound = dr["Ordnername"].ToString();
break;
} else {
folderfound = string.Empty;
}
}
}
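One way to lift that section out is to turn it into a small function that is called once per file. This is a sketch, not the original author's code: FolderRule and FolderRules are hypothetical names, the rules are assumed to have been copied out of newtable beforehand, and ordinal (case-sensitive) matching is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory form of one newtable row.
public sealed class FolderRule
{
    public string Pattern { get; set; }    // the "Dateiname" column
    public bool MatchAtStart { get; set; } // true when "Suchart" is "beginnend"
    public string Folder { get; set; }     // the "Ordnername" column
}

public static class FolderRules
{
    // Returns the folder of the first matching rule, or an empty string if none matches.
    public static string FindFolder(string fileName, IEnumerable<FolderRule> rules)
    {
        foreach (FolderRule rule in rules)
        {
            bool hit = rule.MatchAtStart
                ? fileName.StartsWith(rule.Pattern, StringComparison.Ordinal)
                : fileName.Contains(rule.Pattern);
            if (hit)
            {
                return rule.Folder;
            }
        }
        return string.Empty;
    }
}
```

Because the result depends only on the file name, it can be computed once per file (or cached in a dictionary) rather than once per file and data row.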
Related
private void pictureBox2_MouseUp(object sender, MouseEventArgs e)
{
if (e.Button != MouseButtons.Left) return;
if (DrawingRects.Count > 0)
{
// The last drawn shape
var dr = DrawingRects.Last();
if (dr.Rect.Width > 0 && dr.Rect.Height > 0)
{
rectImage = cropAtRect((Bitmap)pictureBox2.Image, dr.Rect);
if (saveRectangles)
{
DirectoryInfo dInfo = new DirectoryInfo(@"d:\rectangles");
var files = GetFilesByExtensions(dInfo, ".bmp");
if (files.Count() > 0)
{
foreach (var f in files)
{
}
}
rectangleName = @"d:\Rectangles\rectangle" + saveRectanglesCounter + ".bmp";
FileList.Add($"{dr.Location}, {dr.Size}", rectangleName);
string json = JsonConvert.SerializeObject(
FileList,
Formatting.Indented // this for pretty print
);
using (StreamWriter sw = new StreamWriter(@"d:\rectangles\rectangles.txt", false))
{
sw.Write(json);
sw.Close();
}
rectImage.Save(rectangleName);
saveRectanglesCounter++;
}
pixelsCounter = rect.Width * rect.Height;
pictureBox1.Invalidate();
listBox1.DataSource = FileList.ToList();
listBox1.SelectedIndex = listBox1.Items.Count - 1;
}
}
}
I'm using DirectoryInfo and the method GetFilesByExtensions
public IEnumerable<FileInfo> GetFilesByExtensions(DirectoryInfo dir, params string[] extensions)
{
if (extensions == null)
throw new ArgumentNullException("extensions");
IEnumerable<FileInfo> files = dir.EnumerateFiles();
return files.Where(f => extensions.Contains(f.Extension));
}
If there are existing files, for example rectangle1.bmp, rectangle2.bmp ... rectangle7.bmp, then when creating a new rectangle file on the hard disk I want it to be rectangle8.bmp.
Right now it tries to create another rectangle1.bmp and throws an exception. I don't want to delete the existing files, but to create new ones.
I would like it to be as generic as possible, but the main goal is to create new file names based on the existing ones and continue the counting.
You can write a method that checks if the proposed name exists or not
string GetNextName(string baseName, string extension)
{
int counter = 1;
string nextName = baseName + counter + extension;
while(File.Exists(nextName))
{
counter++;
nextName = baseName + counter + extension;
}
return nextName;
}
and call it in this way:
rectangleName = GetNextName(@"d:\Rectangles\rectangle", ".bmp");
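You can also use LINQ to find the highest existing index in one statement; the next name is then maxIndex + 1. Parse the index as a number, because a string comparison would rank "9" above "10":
DirectoryInfo di = new DirectoryInfo(@"D:\rectangles");
// assumes every matching file is named rectangle<number>.bmp; yields 0 when there are no files yet
var maxIndex = di.GetFiles("rectangle*.bmp").Select(fi => int.Parse(Path.GetFileNameWithoutExtension(fi.Name).Replace("rectangle", ""))).DefaultIfEmpty(0).Max();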
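A slightly more defensive variant is sketched below. It works on plain file names, so it can be tried without touching the disk, and it skips names that do not fit the pattern instead of throwing. RectangleNames and NextIndex are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class RectangleNames
{
    // Returns the next free index, given existing names such as "rectangle7.bmp".
    // Names that do not look like <prefix><number><extension> are ignored.
    public static int NextIndex(IEnumerable<string> fileNames, string prefix, string extension)
    {
        int max = 0;
        foreach (string name in fileNames)
        {
            int middleLength = name.Length - prefix.Length - extension.Length;
            if (middleLength <= 0
                || !name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                || !name.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            int index;
            if (int.TryParse(name.Substring(prefix.Length, middleLength), out index) && index > max)
            {
                max = index;
            }
        }
        return max + 1;
    }
}
```

The names could come from Directory.EnumerateFiles(folder).Select(Path.GetFileName). Unlike the File.Exists loop, this scans the folder once, which matters only when there are many files.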
I have a working example without the conditional here:
public string RenderPostTags(DMCResultSet resultSet)
{
string output = "";
string filterForm = RenderFilterForm(resultSet);
string pagination = RenderPagination(resultSet);
List<XElement> items = resultSet.items;
foreach(XElement i in items)
{
string tags = "";
if (i.Element("tags") != null)
{
foreach(string tag in i.Element("tags").Elements("tag"))
{
tags += "" + tag + "";
}
}
output += tags;
}
return output;
}
I know that just putting a count on it won't work, but I've tried several different methods and they haven't worked for me. It could be a syntax error; I'm a total C# noob.
What I need is to output adjusted HTML using an if/else conditional similar to this:
public string RenderPostTags(DMCResultSet resultSet){
string output = "";
string filterForm = RenderFilterForm(resultSet);
string pagination = RenderPagination(resultSet);
List<XElement> items = resultSet.items;
foreach(XElement i in items){
string tags = "";
if (i.Element("tags") != null) {
int count = 1;
int total = i.Element("tags").Elements("tag").Count;
foreach(string tag in i.Element("tags").Elements("tag")) {
if(count == total){
tags += "" + tag + "";
count++;
}else{
tags += "" + tag +","+ " " + "";
count++;
}
}
}
output += tags;
}
return output;
}
The methods I have tried can be found in this thread:
Foreach loop, determine which is the last iteration of the loop
Thank you for any assistance.
As @Sach said, use a for loop instead of foreach.
string output = "";
List<XElement> items = new List<XElement>();
foreach (XElement i in items)
{
string tags = "";
if (i.Element("tags") != null)
{
// Materialize the tags once so they are not re-enumerated on every iteration.
List<XElement> tagItems = i.Element("tags").Elements("tag").ToList();
for (int j = 0; j < tagItems.Count; j++)
{
// The explicit cast yields the element's text rather than its XML markup.
string tag = (string)tagItems[j];
if (j == tagItems.Count - 1)
{
tags += "" + tag + "";
}
else
{
tags += "" + tag + "," + " " + "";
}
}
}
output += tags;
}
You can write your foreach loop like this to capture both conditions. You can use the conditional operator (condition ? valueIfTrue : valueIfFalse) to append either a separator or nothing, depending on whether the current item is the last one in the collection.
int counter = 1; // Start with 1 since we are using != later on.
int totalRecords = i.Element("tags").Elements("tag").Count();
foreach (string tag in i.Element("tags").Elements("tag"))
tags += tag + (counter++ != totalRecords ? ", " : string.Empty); // parentheses are required: + binds tighter than != and ?:
The above is equivalent to:
if (i.Element("tags") != null)
{
int counter = 1;
int totalRecords = i.Element("tags").Elements("tag").Count();
foreach (string tag in i.Element("tags").Elements("tag"))
{
if (counter++ == totalRecords)
{
tags += "" + tag + "";
}
else
{
tags += "" + tag + ", " + "";
}
}
}
Note that IEnumerable<T> does not have a Count property, but it does have a Count() extension method (in System.Linq).
So maybe for namespace reasons the Count() method would not solve my issue.
However, after some toil this solution worked perfectly. Thank you to those who helped me get to it.
public string RenderPostTags(DMCResultSet resultSet){
string output = "";
string filterForm = RenderFilterForm(resultSet);
string pagination = RenderPagination(resultSet);
List<XElement> items = resultSet.items;
foreach(XElement i in items){
string tags = "";
if (i.Element("tags") != null) {
foreach(string tag in i.Element("tags").Elements("tag")){
if(tags != "") tags += ", ";
tags += "" + tag +"";
}
}
output += tags;
}
return output;
}
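For comparison, the same "separator between items, but not after the last one" behaviour can be had from string.Join, which avoids tracking the last iteration at all. This is a minimal sketch; TagRenderer and JoinTags are hypothetical names, and any HTML wrapped around each tag is left out.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TagRenderer
{
    // Joins the text of all <tag> children of <tags> with ", ".
    // Returns an empty string when the item has no <tags> element.
    public static string JoinTags(XElement item)
    {
        XElement tags = item.Element("tags");
        if (tags == null)
        {
            return string.Empty;
        }
        // The explicit cast to string reads each element's text content.
        return string.Join(", ", tags.Elements("tag").Select(t => (string)t));
    }
}
```

In RenderPostTags, the inner foreach would then shrink to output += TagRenderer.JoinTags(i);.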
I have a C# application that connects to a server, gets the data grid, and manipulates each row. For each updated row, the new row is uploaded to the server and one file per row is renamed on the hard disk.
The application works totally fine, but I analyzed it with the profiler and realized that this line of code:
File.Move(symbolsOldPath, symbolsPath);
takes 80% of the time my application needs to complete its task.
I went through all the questions on StackOverflow and elsewhere looking for an alternative with better performance, but I wasn't successful. The only other way I found was using the VB Rename method, but since it calls File.Move it is no improvement. Do you know an alternative with better performance?
Here is my code of the class that changes the data.
public DataTable ChangeData(DataTable unchangedData, string searchPathSymbols, string searchPathImages, ProgressBar pbForm)
{
pbtemp = pbForm;
int rowCount = unchangedData.Rows.Count;
foreach (DataRow row in unchangedData.Rows)
{
counter++;
if (counter == 10)
{
pbtemp.Value += counter;
counter = 0;
Application.DoEvents();
}
number = row[1].ToString();
symbolsPath = row[2].ToString();
symbolsPathCopy = symbolsPath;
imagesPath = row[3].ToString();
imagesPathCopy = imagesPath;
aliasSymbols = symbolsPath.Substring(0, symbolsPath.IndexOf('>') + 1);
if (symbolsPath == imagesPath)
{
if (aliasSymbols.Contains("Symbole"))
{
if (!string.IsNullOrEmpty(symbolsPath))
{
SymbolsChanger(searchPathSymbols, row);
row[3] = row[2];
}
}
else
{
if (!string.IsNullOrEmpty(imagesPath))
{
ImagesChanger(searchPathImages, row);
row[2] = row[3];
}
}
}
else
{
if (!string.IsNullOrEmpty(symbolsPath))
{
SymbolsChanger(searchPathSymbols, row);
}
if (!string.IsNullOrEmpty(imagesPath))
{
ImagesChanger(searchPathImages, row);
}
}
}
pbtemp.Value += (rowCount - pbtemp.Value);
return unchangedData;
}
private void SymbolsChanger(string searchPathSymbols, DataRow row)
{
string symbolsOldPath;
//Symbols
//Get and delete Alias and get filepath
int countAliasSymbolsIndex = symbolsPath.LastIndexOf('>') + 1;
symbolsPath = symbolsPath.Remove(0, countAliasSymbolsIndex);
symbolsOldPath = searchPathSymbols + "\\" + symbolsPath;
//Remove and replace numbers
int startSymbolsIndex = 0;
int endSymbolsIndex = symbolsPath.IndexOf('_') == -1 ? symbolsPath.LastIndexOf('.') : symbolsPath.IndexOf('_');
int countSymbolsIndex = endSymbolsIndex - startSymbolsIndex;
symbolsPath = symbolsPath.Remove(startSymbolsIndex, countSymbolsIndex);
string nameSymbols = number + symbolsPath;
symbolsPath = searchPathSymbols + "\\" + nameSymbols;
try
{
//Rename file
File.Move(symbolsOldPath, symbolsPath);
}
catch(FileNotFoundException)
{
try
{
File.Move(symbolsPath, symbolsPath);
}
catch (FileNotFoundException)
{
logArrayDataChange.Add(symbolsPathCopy);
}
}
row[2] = aliasSymbols + nameSymbols;
}
private void ImagesChanger(string searchPathImages, DataRow row)
{
string imagesOldPath;
//Images
//Get and delete Alias and get filepath
string aliasImage = imagesPath.Substring(0, imagesPath.IndexOf('>') + 1);
int countAliasImagesIndex = imagesPath.LastIndexOf('>') + 1;
imagesPath = imagesPath.Remove(0, countAliasImagesIndex);
imagesOldPath = imagesPath.StartsWith("\\") == true ? searchPathImages + imagesPath : searchPathImages + "\\" + imagesPath;
//Remove and replace numbers
int startImagesIndex = imagesPath.LastIndexOf("\\") == -1 ? 0 : imagesPath.LastIndexOf("\\");
int endImagesIndex = imagesPath.IndexOf('_') == -1 ? imagesPath.LastIndexOf('.') : imagesPath.IndexOf('_');
int countImagesIndex = endImagesIndex - startImagesIndex;
imagesPath = imagesPath.Remove(startImagesIndex + 1, countImagesIndex - 1);
int insertIndex = imagesPath.LastIndexOf("\\") == -1 ? 0 : imagesPath.LastIndexOf("\\");
string nameImages = imagesPath.Insert(insertIndex + 1, number);
imagesPath = searchPathImages + "\\" + nameImages;
try
{
//Rename file
File.Move(imagesOldPath, imagesPath);
}
catch (FileNotFoundException)
{
try
{
File.Move(imagesPath, imagesPath);
}
catch (FileNotFoundException)
{
logArrayDataChange.Add(imagesPathCopy);
}
}
row[3] = aliasImage + nameImages;
}
}
}
I would keep File.Move to do the job. Besides a little overhead (checks), File.Move uses only the native MoveFile Windows call to move the file:
[DllImport(KERNEL32, SetLastError=true, CharSet=CharSet.Auto, BestFitMapping=false)]
[ResourceExposure(ResourceScope.Machine)]
internal static extern bool MoveFile(String src, String dst);
You can call that method yourself, but I doubt it will get any faster than that.
From the documentation it seems that move is already built to rename efficiently:
The MoveFile function will move (rename) either a file or a directory ...
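Separately from the cost of the move itself, the question's code uses FileNotFoundException (and a second File.Move of a path onto itself) to detect missing files, and thrown exceptions are comparatively expensive in .NET. Whether that contributes measurably to the 80% is not something this thread establishes. As a sketch only, with Renamer and TryRename as hypothetical names, the same decisions can be made with File.Exists:

```csharp
using System.IO;

public static class Renamer
{
    // Renames oldPath to newPath if oldPath exists.
    // Returns true if the file is (now or already) at newPath,
    // false if neither path exists, so the caller can log it.
    public static bool TryRename(string oldPath, string newPath)
    {
        if (File.Exists(oldPath))
        {
            // Still throws IOException if newPath already exists, like the original code.
            File.Move(oldPath, newPath);
            return true;
        }
        // The file may already have been renamed in an earlier run.
        return File.Exists(newPath);
    }
}
```

The check and the move are not atomic, so if other processes touch the folder at the same time, File.Move can still throw and should stay inside a try block.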
I am using the code below to start at a path (root) provided by a GET variable, recursively go into every subfolder, and display its contents as list items. The path I'm using has about 3800 files and 375 subfolders. It takes about 45 seconds to render the page. Is there any way I can cut this time down? It is unacceptable for my users.
string output;
protected void Page_Load(object sender, EventArgs e) {
getDirectoryTree(Request.QueryString["path"]);
itemWrapper.InnerHtml = output;
}
private void getDirectoryTree(string dirPath) {
try {
System.IO.DirectoryInfo rootDirectory = new System.IO.DirectoryInfo(dirPath);
foreach (System.IO.DirectoryInfo subDirectory in rootDirectory.GetDirectories()) {
output = output + "<ul><li><a>" + Regex.Replace(subDirectory.Name, "_", " ");
if (subDirectory.GetFiles().Length != 0 || subDirectory.GetDirectories().Length != 0) {
output = output + " +</a>";
} else {
output = output + "</a>";
}
getDirectoryTree(subDirectory.FullName);
if (subDirectory.GetFiles().Length != 0) {
output = output + "<ul>";
foreach (System.IO.FileInfo file in subDirectory.GetFiles()) {
output = output + "<li><a href='" + file.FullName + "'>" + file.Name + "</a></li>";
}
output = output + "</ul>";
}
output = output + "</li></ul>";
}
} catch (System.UnauthorizedAccessException) {
//This throws when we don't have access.
}
}
You should use System.Text.StringBuilder (good performance) instead of string concatenation (bad performance, because strings are immutable and every concatenation allocates a new string).
You should also use the plain string Replace method when you are not doing a complex search: subDirectory.Name.Replace("_", " ");
The main reason for the slowness in your code is most likely the multiple calls to GetFiles and GetDirectories. You call them over and over again in the if conditions as well as in your initial lookups. You only need the results once. The string concatenation isn't helping either.
The following code ran through my simple USB drive in 300 ms and returned over 400 folders and 11,000 files. On a slow network drive, it returned in 9 seconds for 4000 files in 300 folders. It can probably be optimized further with Parallel.ForEach during recursion.
protected void Page_Load(object sender, EventArgs e) {
itemWrapper.InnerHtml = GetDirectory(Request.QueryString["path"]);
}
static string GetDirectory(string path)
{
StringBuilder output = new StringBuilder();
var subdir = System.IO.Directory.GetDirectories(path);
var files = System.IO.Directory.GetFiles(path);
output.Append("<ul><li><a>");
output.Append(path.Replace("_", " "));
output.Append(subdir.Length > 0 || files.Length > 0 ? "+</a>" : "</a>");
foreach(var sb in subdir)
{
output.Append(GetDirectory(sb));
}
if (files.Length > 0)
{
output.Append("<ul>");
foreach (var file in files)
{
output.AppendFormat("<li><a href='{0}'>{1}</a></li>", file, System.IO.Path.GetFileName(file));
}
output.Append("</ul>");
}
output.Append("</li></ul>");
return output.ToString();
}
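Combining the suggestions from both answers, a variant could look like the sketch below. It is not a tuned implementation: DirectoryTreeRenderer is a hypothetical name, the markup mirrors the question's, and HTML-encoding the names with WebUtility.HtmlEncode is an addition that neither answer mentions. Each directory is listed exactly once, and a single StringBuilder is shared by the whole recursion.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class DirectoryTreeRenderer
{
    public static string Render(string rootPath)
    {
        var html = new StringBuilder();
        AppendDirectory(html, new DirectoryInfo(rootPath));
        return html.ToString();
    }

    private static void AppendDirectory(StringBuilder html, DirectoryInfo dir)
    {
        DirectoryInfo[] subDirs;
        FileInfo[] files;
        try
        {
            // One listing call each per directory; the results are reused below.
            subDirs = dir.GetDirectories();
            files = dir.GetFiles();
        }
        catch (UnauthorizedAccessException)
        {
            return; // skip directories we are not allowed to read
        }

        html.Append("<ul><li><a>")
            .Append(WebUtility.HtmlEncode(dir.Name.Replace('_', ' ')))
            .Append(subDirs.Length > 0 || files.Length > 0 ? " +</a>" : "</a>");

        foreach (DirectoryInfo sub in subDirs)
        {
            AppendDirectory(html, sub);
        }

        if (files.Length > 0)
        {
            html.Append("<ul>");
            foreach (FileInfo file in files)
            {
                html.Append("<li><a href='")
                    .Append(WebUtility.HtmlEncode(file.FullName))
                    .Append("'>")
                    .Append(WebUtility.HtmlEncode(file.Name))
                    .Append("</a></li>");
            }
            html.Append("</ul>");
        }
        html.Append("</li></ul>");
    }
}
```

In Page_Load this would be used as itemWrapper.InnerHtml = DirectoryTreeRenderer.Render(Request.QueryString["path"]);. Note that it also renders the root folder itself, unlike the question's code, which starts with the root's subfolders.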
I have a folder filled with dwg files. I need to find the latest version of each file, or, if a file has no versions, just take the file itself, and copy it to a directory. For example, here are three files:
ABBIE 08-10 #6-09H4 FINAL 06-12-2012.dwg
ABBIE 08-10 #6-09H4 FINAL 06-12-2012_1.dwg
ABBIE 08-10 #6-09H4 FINAL 06-12-2012_2.dwg
Notice that one file has a _1 and another has a _2, so the latest file here is the _2. I need to keep the latest file and copy it to a directory. Some files will not have different versions, so those can be copied as they are. I cannot rely on the creation or modification date of the file because in many instances they are the same, so all I have to go on is the file name itself. I'm sure there is a more efficient way to do this than what I post below.
DirectoryInfo myDir = new DirectoryInfo(@"H:\Temp\Test");
var Files = myDir.GetFiles("*.dwg");
string[] fileList = Directory.GetFiles(@"H:\Temp\Test", "*FINAL*", SearchOption.AllDirectories);
ArrayList list = new ArrayList();
ArrayList WithUnderscores = new ArrayList();
string nameNOunderscores = "";
for (int i = 0; i < fileList.Length; i++)
{
//Try to get just the filename..
string filename = fileList[i].Split('.')[0];
int position = filename.LastIndexOf('\\');
filename = filename.Substring(position + 1);
filename = filename.Split('_')[0];
foreach (FileInfo allfiles in Files)
{
var withoutunderscore = allfiles.Name.Split('_')[0];
withoutunderscore = withoutunderscore.Split('.')[0];
if (withoutunderscore.Equals(filename))
{
nameNOunderscores = filename;
list.Add(allfiles.Name);
}
}
//If there is a number after the _ then capture it in an ArrayList
if (list.Count > 0)
{
foreach (string nam in list)
{
if (nam.Contains("_"))
{
//need regex to grab numeric value after _
var match = new Regex("_(?<number>[0-9]+)").Match(nam);
if (match.Success)
{
var value = match.Groups["number"].Value;
var number = Int32.Parse(value);
WithUnderscores.Add(number);
}
}
}
int removedcount = 0;
//Whats the max value?
if (WithUnderscores.Count > 0)
{
var maxval = GetMaxValue(WithUnderscores);
Int32 intmax = Convert.ToInt32(maxval);
foreach (FileInfo deletefile in Files)
{
string shorten = deletefile.Name.Split('.')[0];
shorten = shorten.Split('_')[0];
if (shorten == nameNOunderscores && deletefile.Name != nameNOunderscores + "_" + intmax + ".dwg")
{
//Keep track of count of Files that are no good to us so we can iterate to next set of files
removedcount = removedcount + 1;
}
else
{
//Copy the "Good" file to a seperate directory
File.Copy(@"H:\Temp\Test\" + deletefile.Name, @"H:\Temp\AllFinals\" + deletefile.Name, true);
}
}
WithUnderscores.Clear();
list.Clear();
}
i = i + removedcount;
}
else
{
//This File had no versions so it is good to be copied to the "Good" directory
File.Copy(@"H:\Temp\SH_Plats\" + filename, @"H:\Temp\AllFinals\" + filename, true);
i = i + 1;
}
}
I've made a Regex-based solution, though apparently I'm late to the party.
(?<fileName>[A-Za-z0-9-# ]*)_?(?<version>[0-9]+)?\.dwg
This regex recognises the file name and the version and splits them into groups. A simple foreach loop then collects the most recent version of each file in a dictionary (because I'm lazy), and you just need to put the file names back together before you access them (files without a version suffix are stored as version 0 and keep their plain name):
var fileName = file.Value == 0 ? file.Key + ".dwg" : file.Key + "_" + file.Value + ".dwg";
full code
var files = new[] {
"ABBIE 08-10 #6-09H4 FINAL 06-12-2012.dwg",
"ABBIE 08-10 #6-09H4 FINAL 06-12-2012_1.dwg",
"ABBIE 08-10 #6-09H4 FINAL 06-12-2012_2.dwg",
"Second File.dwg",
"Second File_1.dwg",
"Third File.dwg"
};
// regex to split fileName from version
var r = new Regex( @"(?<fileName>[A-Za-z0-9-# ]*)_?(?<version>[0-9]+)?\.dwg" );
var latestFiles = new Dictionary<string, int>();
foreach (var f in files)
{
var parsedFileName = r.Match( f );
var fileName = parsedFileName.Groups["fileName"].Value;
var version = parsedFileName.Groups["version"].Success ? int.Parse( parsedFileName.Groups["version"].Value ) : 0;
if( !latestFiles.ContainsKey( fileName ) || version > latestFiles[fileName] )
{
// add newly found file names, or replace the entry if this file has a newer version
// (an older version of a known file is skipped instead of causing a duplicate-key exception)
latestFiles[fileName] = version;
}
}
// open all most recent files
foreach (var file in latestFiles)
{
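// version 0 means the file had no "_N" suffix
var name = file.Value == 0 ? file.Key + ".dwg" : file.Key + "_" + file.Value + ".dwg";
var fileToCopy = File.OpenRead( name );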
// ...
}
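The same selection can also be written as a small function over plain file names, which makes it easy to try with sample data before touching the disk. This is a sketch under assumptions: LatestVersions and Pick are hypothetical names, a version is taken to be a "_<digits>" suffix directly before the extension, and base names are compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LatestVersions
{
    // Optional "_<digits>" suffix at the very end of the name (extension already removed).
    private static readonly Regex VersionSuffix = new Regex(@"_(\d+)$");

    // Returns one name per base name: the one with the highest version number.
    // A name without a suffix counts as version 0.
    public static List<string> Pick(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(name =>
            {
                string stem = Path.GetFileNameWithoutExtension(name);
                Match m = VersionSuffix.Match(stem);
                return new
                {
                    Name = name,
                    BaseName = m.Success ? stem.Substring(0, m.Index) : stem,
                    Version = m.Success ? int.Parse(m.Groups[1].Value) : 0
                };
            })
            .GroupBy(x => x.BaseName, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.OrderByDescending(x => x.Version).First().Name)
            .ToList();
    }
}
```

Because the version is parsed as a number, _10 correctly beats _9, and the trailing digits of a date such as 06-12-2012 are not mistaken for a version since no underscore precedes them.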
You can use this LINQ query with Enumerable.GroupBy, which should work (now tested):
var allFiles = Directory.EnumerateFiles(sourceDir, "*.dwg")
.Select(path => new
{
Path = path,
FileName = Path.GetFileName(path),
FileNameWithoutExtension = Path.GetFileNameWithoutExtension(path),
VersionStartIndex = Path.GetFileNameWithoutExtension(path).LastIndexOf('_')
})
.Select(x => new
{
x.Path,
x.FileName,
IsVersionFile = x.VersionStartIndex != -1,
Version = x.VersionStartIndex == -1 ? new Nullable<int>()
: x.FileNameWithoutExtension.Substring(x.VersionStartIndex + 1).TryGetInt(),
NameWithoutVersion = x.VersionStartIndex == -1 ? x.FileName
: x.FileName.Substring(0, x.VersionStartIndex)
})
.OrderByDescending(x => x.Version)
.GroupBy(x => x.NameWithoutVersion)
.Select(g => g.First());
foreach (var file in allFiles)
{
string oldPath = Path.Combine(sourceDir, file.FileName);
string newPath;
if (file.IsVersionFile && file.Version.HasValue)
newPath = Path.Combine(versionPath, file.FileName);
else
newPath = Path.Combine(noVersionPath, file.FileName);
File.Copy(oldPath, newPath, true);
}
Here's the extension method which I'm using to determine whether a string can be parsed as an int:
public static int? TryGetInt(this string item)
{
int i;
bool success = int.TryParse(item, out i);
return success ? (int?)i : (int?)null;
}
Note that I'm not using regex but string methods only.
Try this
var files = new My.Computer().FileSystem.GetFiles(@"c:\to\the\sample\directory", Microsoft.VisualBasic.FileIO.SearchOption.SearchAllSubDirectories, "*.dwg");
foreach (String f in files) {
Console.WriteLine(f);
};
NB: Add a reference to Microsoft.VisualBasic and use the following line at the beginning of the class:
using My = Microsoft.VisualBasic.Devices;
UPDATE
The working sample[tested]:
String dPath=@"C:\to\the\sample\directory";
var xfiles = new My.Computer().FileSystem.GetFiles(dPath, Microsoft.VisualBasic.FileIO.SearchOption.SearchAllSubDirectories, "*.dwg").Where(c => Regex.IsMatch(c,@"\d{3,}\.dwg$"));
XElement filez = new XElement("filez");
foreach (String f in xfiles)
{
var yfiles = new My.Computer().FileSystem.GetFiles(dPath, Microsoft.VisualBasic.FileIO.SearchOption.SearchAllSubDirectories, string.Format("{0}*.dwg",System.IO.Path.GetFileNameWithoutExtension(f))).Where(c => Regex.IsMatch(c, @"_\d+\.dwg$"));
if (yfiles.Count() > 0)
{
filez.Add(new XElement("file", yfiles.Last()));
}
else {
filez.Add(new XElement("file", f));
};
};
Console.Write(filez);
Can you do this by string sort? The only tricky part I see here is converting the file name to a sortable format. Just do a string replace from dd-mm-yyyy to yyyymmdd. Then sort the list and take the last record.
This is what you want, assuming fileList contains all the file names:
List<string> latestFiles=new List<string>();
foreach(var groups in fileList.GroupBy(x=>Regex.Replace(x,@"(_\d+\.dwg$|\.dwg$)","")))
{
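// (?<=_) makes sure only a "_N" suffix counts as a version, not the trailing digits of a date such as 06-12-2012
latestFiles.Add(groups.OrderBy(s=>Regex.Match(s,@"(?<=_)\d+(?=\.dwg$)").Value==""?0:int.Parse(Regex.Match(s,@"(?<=_)\d+(?=\.dwg$)").Value)).Last());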
}
latestFiles then contains the latest version of every file.
If fileList is large, use threading or PLINQ.