Creating a list of docs that contain the same name - C#

I'm creating a tool that is supposed to concatenate docs that share the same name.
Example: C_BA_20000_1.pdf and C_BA_20000_2.pdf
These files should be grouped in one list.
The tool runs on a directory, let's say:
//directory of pdf files
DirectoryInfo dirInfo = new DirectoryInfo(@"C:\Users\derp\Desktop");
FileInfo[] fileInfos = dirInfo.GetFiles("*.pdf");
foreach (FileInfo info in fileInfos)
I want to create an ArrayList that contains filenames of the same name
ArrayList list = new ArrayList();
list.Add(info.FullName);
and then have a list that contains all the ArrayLists of similar docs.
List<ArrayList> bigList = new List<ArrayList>();
So my question is: how can I group files that share the same name and put them in the same list?
EDIT:
File names all follow the pattern AB_CDEFG_i,
where i is a number from 1 to n. Files with the same name differ only in the number at the end.
AB_CDEFG_1
AB_CDEFG_2
HI_JKLM_1
Output should be:
List 1: AB_CDEFG_1 and AB_CDEFG_2
List 2: HI_JKLM_1

Create a method that extracts the shared part of the file name, e.g.:
public string GetRawName(string fileName)
{
int index = fileName.LastIndexOf("_");
return fileName.Substring(0, index);
}
And use this method for grouping:
var bigList = Directory.EnumerateFiles(@"C:\Users\derp\Desktop", "*.pdf")
.GroupBy(file => GetRawName(file))
.Select(g => g.ToList())
.ToList();
This will return List<List<string>> (without ArrayList).
UPDATE: Here is a regular expression that works with all kinds of file names, whether or not they have a number at the end:
public string GetRawName(string file)
{
string name = Path.GetFileNameWithoutExtension(file);
return Regex.Replace(name, @"(_\d+)?$", "");
}
Grouping:
var bigList = Directory.EnumerateFiles(@"C:\Users\derp\Desktop", "*.pdf")
.GroupBy(GetRawName)
.Select(g => g.ToList())
.ToList();
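To see what the regex-based key produces without touching the disk, here is a minimal sketch that runs the same grouping over an in-memory array. The sample names (including README.pdf, which has no numeric suffix) are made up for illustration, and GetRawName is written as a local function rather than a class method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample names standing in for a real directory listing.
var sampleFiles = new[]
{
    "AB_CDEFG_1.pdf",
    "AB_CDEFG_2.pdf",
    "HI_JKLM_1.pdf",
    "README.pdf", // no numeric suffix, so it forms its own group
};

// Strip the extension, then an optional trailing "_<digits>".
string GetRawName(string file) =>
    Regex.Replace(Path.GetFileNameWithoutExtension(file), @"(_\d+)?$", "");

List<List<string>> bigList = sampleFiles
    .GroupBy(file => GetRawName(file))
    .Select(g => g.ToList())
    .ToList();

// Print each group on one line, prefixed by its key.
foreach (var group in bigList)
    Console.WriteLine($"{GetRawName(group[0])}: {string.Join(", ", group)}");
```

GroupBy keeps groups in order of first appearance, so the AB_CDEFG group comes out first here.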

It sounds like the difficulty is in deciding which files are the same.
static string KeyFromFileName(string file)
{
// Convert from "C_BA_20000_2" to "C_BA_20000"
return file.Substring(0, file.LastIndexOf("_"));
// Note: This assumes there is an _ in the filename.
}
Then you can use this LINQ to build a list of file sets.
using System.IO; // Near top of file
using System.Linq;
var files = Directory.GetFiles(@"C:\Users\derp\Desktop", "*.pdf");
// GetFiles returns full paths as strings, so strip the directory and extension before extracting the key.
var fileSets = files
.GroupBy(file => KeyFromFileName(Path.GetFileNameWithoutExtension(file)))
.Select(g => new { g.Key, Files = g.ToList() })
.ToList();

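Your question doesn't define what "same name" means, but the typical solution is a GroupBy on whatever key defines it. Grouping on FullName, as below, puts every file in its own group, so substitute your own key selector: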
fileInfos.GroupBy ( f => f.FullName )
.Select( grp => grp.ToList() ).ToList();

This will get you a list of lists. It also won't throw an exception if a file name doesn't contain an underscore.
private string GetKey(FileInfo fi)
{
var index = fi.Name.LastIndexOf('_');
return index == -1 ? Path.GetFileNameWithoutExtension(fi.Name)
: fi.Name.Substring(0, index);
}
var bigList = fileInfos.GroupBy(GetKey)
.Select(x => x.ToList())
.ToList();

Related

DirectoryInfo.GetFiles with multiple filters

I am trying to get a list of FileInfo objects that satisfy multiple filters.
Every suggestion I have seen uses an array of file names/paths instead of FileInfo:
var files = Directory.GetFiles(sLogPath, "*.*", SearchOption.TopDirectoryOnly)
.Where(s => s.StartsWith("abc", StringComparison.CurrentCultureIgnoreCase) || s.StartsWith("def", StringComparison.CurrentCultureIgnoreCase));
What I am trying to get is:
DirectoryInfo di = new DirectoryInfo(sLogPath);
var files = di.GetFiles(<same filter as above>);
But it looks like I can only do something like:
var files = di.GetFiles("*_" + dateStr + ".log");
Based on your comment to me on your question, it looks like you want to filter on file names, but get the FileInfos that correspond to these names.
You can do something like this
var di = new DirectoryInfo(sLogPath);
var files = di
.GetFiles("*.*", SearchOption.TopDirectoryOnly)
.Where(x => x.Name.StartsWith("abc", StringComparison.CurrentCultureIgnoreCase)
|| x.Name.StartsWith("def", StringComparison.CurrentCultureIgnoreCase))
.ToList();
We're using the Name property in the filter and working with the FileInfo[] array returned by DirectoryInfo.GetFiles().

How to sort numbered filenames in list in C#

I'm trying to sort a list that contains file paths.
I want them sorted by the numbers in them.
The code I'm using doesn't give the expected result.
var mylist = mylist.OrderBy(x => int.Parse(Regex.Replace(x, "[^0-9]+", "0"))).ToList<string>();
I expect the result to be:
c:\somedir\1.jpg
c:\somedir\2.jpg
c:\somedir\3.jpg
c:\somedir\7.jpg
c:\somedir\8.jpg
c:\somedir\9.jpg
c:\somedir\10.jpg
c:\somedir\12.jpg
c:\somedir\20.jpg
But the output is random.
There is a simple way of achieving that.
Let's say you have a string list like this:
List<string> allThePaths = new List<string>()
{
"c:\\somedir\\1.jpg",
"c:\\somedir\\2.jpg",
"c:\\somedir\\20.jpg",
"c:\\somedir\\7.jpg",
"c:\\somedir\\12.jpg",
"c:\\somedir\\8.jpg",
"c:\\somedir\\9.jpg",
"c:\\somedir\\3.jpg",
"c:\\somedir\\10.jpg"
};
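You can get the desired result with this. It works here because every path has the same directory and extension, so a longer string always means a number with more digits: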
List<string> sortedPaths = allThePaths
.OrderBy(stringItem => stringItem.Length)
.ThenBy(stringItem => stringItem).ToList();
Note: Also make sure you've included LINQ:
using System.Linq;
Here is a demo example just in case it's needed.
More complex solutions can be found there.
A cleaner way of doing this would be to use System.IO.Path:
public IEnumerable<string> OrderFilesByNumberedName(IEnumerable<string> unsortedPathList) =>
unsortedPathList
.Select(path => new { name = Path.GetFileNameWithoutExtension(path), path }) // Get filename
.OrderBy(file => int.Parse(file.name)) // Sort by number
.Select(file => file.path); // Return only path
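Note that int.Parse throws a FormatException as soon as one file name is not a plain integer. A minimal sketch of a more forgiving variant follows; the helper name NumberOrMax, the decision to push non-numeric names to the end, and the sample paths are all assumptions for illustration. The sample paths use forward slashes so that Path splits them the same way on every platform.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical input; "cover.jpg" has no numeric name.
var paths = new List<string>
{
    "c:/somedir/10.jpg",
    "c:/somedir/2.jpg",
    "c:/somedir/cover.jpg",
    "c:/somedir/1.jpg",
};

// Returns the numeric file name, or int.MaxValue when the name is not a plain integer.
int NumberOrMax(string path) =>
    int.TryParse(Path.GetFileNameWithoutExtension(path), out int n) ? n : int.MaxValue;

List<string> sorted = paths
    .OrderBy(p => NumberOrMax(p))                   // numeric names first, in numeric order
    .ThenBy(p => p, StringComparer.OrdinalIgnoreCase) // stable order for the non-numeric rest
    .ToList();

foreach (var p in sorted)
    Console.WriteLine(p);
```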

C# How do I access this generic list when I didn't create the type?

I've stepped through the first part and that works correctly. My list ends up as fileName[288], and in my Locals window I have a "value". This is a list. I didn't create the type, so I don't know how to access it. I know it is a generic list of strings, so I imported System.Collections.Generic.List, but I can't seem to figure it out.
var fileName = new DirectoryInfo(text)
.GetFiles("*.*", SearchOption.AllDirectories)
.Select(x => x.Name)
.ToList();
for (var i = 0; i < fileName.Count; i++)
{
Console.WriteLine("Filename: {0}", fileName[i].?)
}
The last Select returns strings (x.Name is of type string):
...
.Select(x => x.Name) // select string(s)
.ToList(); // materialize them into a list
so fileName is of type List<string>, and you don't need to call anything further on the element:
Console.WriteLine("Filename: {0}", fileName[i]);
I suggest getting rid of the for loop and letting .NET do the work for you with foreach:
// we have (potentially) many file names, so let's use "fileNames" (plural)
var fileNames = new DirectoryInfo(text)
.GetFiles("*.*", SearchOption.AllDirectories)
.Select(x => x.Name);
foreach (var name in fileNames)
Console.WriteLine($"Filename: {name}"); // string interpolation for readability
Edit: Please note that we don't need .ToList() with foreach: all we want is to enumerate the names, without saving them into a collection (say, List<string>).

Sort List<string> based on character count

Example:
List<string> folders = new List<string>();
folders.Add("folder1/folder2/folder3/");
folders.Add("folder1/");
folders.Add("folder1/folder2/");
I want to sort this list by the number of '/' characters,
so my output will be:
folder1/
folder1/folder2/
folder1/folder2/folder3/
LINQ:
folders = folders.OrderBy(f => f.Length).ToList(); // consider null strings
or List.Sort
folders.Sort((s1, s2) => s1.Length.CompareTo(s2.Length));
A safe approach if the list could contain nulls:
folders = folders.OrderBy(f => f?.Length ?? int.MinValue).ToList();
If you actually want to sort by the folder-depth not string length:
folders = folders.OrderBy(f => f.Split(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar).Length).ToList();
It's likely you actually want to sort by name:
folders = folders.OrderBy(f => f).ToList();
Or simply:
folders.Sort();
This will work correctly for cases like this:
folder1/
folder1/subfolder1
folder1/subfolder1/subsubfolder
folder2
folder2/subfolder2
Sorting by length alone will consider "folder1" and "folder2" equal.
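To compare the two orderings side by side, here is a minimal sketch that sorts the same list once by depth and once by name. The sample folders are made up, all entries are given a trailing '/', and the ordinal comparer and the name tie-break on the depth sort are assumptions rather than something the answers above prescribe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical folder list, deliberately out of order.
var folders = new List<string>
{
    "folder2/subfolder2/",
    "folder1/",
    "folder1/subfolder1/subsubfolder/",
    "folder2/",
    "folder1/subfolder1/",
};

// Depth first: count the separators, then break ties by name.
List<string> byDepth = folders
    .OrderBy(f => f.Count(c => c == '/'))
    .ThenBy(f => f, StringComparer.Ordinal)
    .ToList();

// Name only: in this sample each parent sorts directly ahead of its own subfolders.
List<string> byName = folders
    .OrderBy(f => f, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, byDepth));
Console.WriteLine();
Console.WriteLine(string.Join(Environment.NewLine, byName));
```

The depth ordering lists both top-level folders before any subfolder, while the name ordering keeps each tree together; which one is wanted depends on how the list is consumed.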

Get distinct count of substring of file name

I have a directory with a list of file names.
VAH007157100-pic1.jpg
VAH007157100-pic2.jpg
VAH007157100-pic3.jpg
WAZ009999200-pic1.jpg
WAZ009999200-pic2.jpg
WAZ009999200-pic3.jpg
I want to know how many distinct values Substring(0, 12) produces across these names.
This isn't working for some reason:
string[] originalFiles = Directory.GetFiles(SelectedDirectory);
private int GetDistinctPolicyCountInDirectory()
{
var prefixes = originalFiles
.GroupBy(x => x.Substring(0, 12))
.Select(y => new { Policy = y.Key, Count = y.Count() });
return prefixes.Count();
}
I keep getting 0. Am I missing anything here?
Please note that I do not want to do a split to get the numbers separated. I want to do it by substringing.
UPDATE -
private int GetDistinctPolicyCountInDirectory(string[] originalFiles)
{
var count = originalFiles.Distinct(x => Path.GetFileName(x).Substring(0, 12)).Count();
return Convert.ToInt32(count);
}
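I'm running into an error here: Error 1 Cannot convert lambda expression to type 'System.Collections.Generic.IEqualityComparer<string>' because it is not a delegate type
Distinct() has no overload that takes a key selector, which is why the lambda is rejected. Project each path to its prefix with Select first, then call the parameterless .Distinct().
You also need to strip each entry down to just the file name instead of the full file path.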
originalFiles.Select(x => Path.GetFileName(x).Substring(0, 12))
.Distinct().Count();
GetFiles returns an array of file names with full paths, including the directory. You want to compare only the file name, so you should consider using Path.GetFileName.
GroupBy(x => Path.GetFileName(x).Substring(0, 12));
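Putting both answers together, here is a minimal sketch that counts distinct 12-character prefixes over an in-memory array. The sample paths, the "pics/" directory, and the guard that skips names shorter than 12 characters (Substring would otherwise throw an ArgumentOutOfRangeException) are assumptions added for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical paths standing in for Directory.GetFiles(SelectedDirectory).
var originalFiles = new[]
{
    "pics/VAH007157100-pic1.jpg",
    "pics/VAH007157100-pic2.jpg",
    "pics/WAZ009999200-pic1.jpg",
    "pics/short.jpg", // fewer than 12 characters, skipped by the guard below
};

const int PrefixLength = 12;

int distinctPolicies = originalFiles
    .Select(path => Path.GetFileName(path))          // drop the directory part
    .Where(name => name.Length >= PrefixLength)      // avoid Substring throwing on short names
    .Select(name => name.Substring(0, PrefixLength)) // keep only the policy prefix
    .Distinct()
    .Count();

Console.WriteLine(distinctPolicies);
```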
