Compare files in List<FileInfo> using LINQ C# - c#

I have two collections of files as List<FileInfo>. I am currently using two nested foreach loops to walk each set and match the files (shown below). Is there a quicker way to do this with LINQ, calling .RemoveAt when a match is found?
I need the filenames and file lengths to match.
var sdinfo = new DirectoryInfo(srcPath);
var ddinfo = new DirectoryInfo(dstPath);
var sFiles = new List<FileInfo>(sdinfo.GetFiles("*", SearchOption.AllDirectories));
var dFiles = new List<FileInfo>(ddinfo.GetFiles("*", SearchOption.AllDirectories));
foreach (var sFile in sFiles)
{
bool foundFile = false;
int i = 0;
foreach (var dFile in dFiles)
{
if (sFile.Name == dFile.Name && sFile.Length == dFile.Length)
{
foundFile = true;
dFiles.RemoveAt(i);
}
i += 1;
}
}
Cheers.

You could use the Enumerable.Except<TSource> method:
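private class FileInfoComparer : IEqualityComparer<FileInfo>
{
public bool Equals(FileInfo x, FileInfo y)
{
// Two nulls are equal; a null never equals a non-null.
if (x == null || y == null) return x == null && y == null;
return x.Name.Equals(y.Name, StringComparison.CurrentCultureIgnoreCase) && x.Length == y.Length;
}
public int GetHashCode(FileInfo obj)
{
// Must agree with Equals: Except checks hash codes first, and the default
// FileInfo.GetHashCode() is reference-based, so it would never report a match.
return StringComparer.CurrentCultureIgnoreCase.GetHashCode(obj.Name) ^ obj.Length.GetHashCode();
}
}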
sFiles = sFiles.Except(dFiles, new FileInfoComparer()).ToList();
In the example above you get all files from sFiles that are absent in the dFiles.

For one, this code will throw an exception when executed, because you're modifying a collection (dFiles) while iterating over it. That is easily solved by iterating over a copy made with ToList(). The code also has a second problem: the index is incremented whether or not an item was removed, so after the first removal it no longer lines up with the list, which is the classic off-by-one error.
If you're worried about speed, don't be. LINQ methods are themselves implemented with foreach loops and yield return, as you can see in the .NET Reference Source.
If you want to make the code easier to read, then this is where LINQ becomes useful. For one, there is the .Join() method:
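// Join on name and length (FileInfo itself only compares by reference)
// and remove the matching files from the destination list.
foreach (var fileToRemove in sFiles.Join(dFiles, s => new { s.Name, s.Length }, d => new { d.Name, d.Length }, (s, d) => d).ToArray())
dFiles.Remove(fileToRemove);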
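Assuming you're only iterating through the result afterwards, you can also use the .Except(...) method. Pass it an IEqualityComparer<FileInfo> that compares name and length (such as the FileInfoComparer from the answer above), because the default comparer only checks reference equality:
var files = sdinfo.GetFiles("*", SearchOption.AllDirectories)
.Except(ddinfo.GetFiles("*", SearchOption.AllDirectories), new FileInfoComparer());
Finally, if you need to KEEP sFiles, the following code wraps it all together:
List<FileInfo> sFiles, dFiles;
sFiles = sdinfo.GetFiles("*", SearchOption.AllDirectories).ToList();
dFiles = ddinfo.GetFiles("*", SearchOption.AllDirectories)
.Except(sFiles, new FileInfoComparer()).ToList();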

If you want to trade space for time, you could build a hash set from one list and then look up each element of the other list in it. Each lookup is O(1) on average, whereas scanning the other list for each element is O(n).
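As a rough sketch of that idea, the source files can be reduced to (name, length) keys in a HashSet, and each destination file is then checked against the set. FileMatcher and Unmatched are hypothetical names chosen for illustration, and the case-insensitive name comparison is an assumption; drop the ToUpperInvariant calls if names must match exactly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileMatcher
{
    // Returns the destination files that have no source file with the same
    // name (compared case-insensitively, an assumption) and the same length.
    public static List<FileInfo> Unmatched(IEnumerable<FileInfo> sFiles, IEnumerable<FileInfo> dFiles)
    {
        // One pass over the source files to build the lookup set.
        var seen = new HashSet<(string Name, long Length)>(
            sFiles.Select(f => (f.Name.ToUpperInvariant(), f.Length)));

        // One pass over the destination files; each Contains is O(1) on average.
        return dFiles
            .Where(d => !seen.Contains((d.Name.ToUpperInvariant(), d.Length)))
            .ToList();
    }
}
```

Calling FileMatcher.Unmatched(sFiles, dFiles) gives the same end state for dFiles as the nested loops were aiming for, without removing items from a list that is being enumerated.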

Related

Building up one array from multiple arrays in loop

I'm currently trying to build up one array by merging multiple nested arrays in a loop. This is not currently working, but I can't figure out what I'm doing wrong.
Here is my current code :
// initialise an empty array
var x = new List<Model>();
// initialise an empty array to use in the loop
var mergedX = new List<Model>();
// Build up the array using the loop
foreach (var y in ys) {
if (y.nestedArray != null) {
mergedX = x.Concat(y.nestedArray).ToList();
}
}
// Return the built up array
return mergedX;
What am I doing wrong/is there a better way to achieve this?
Thanks
The problem is with this line:
mergedX = x.Concat(y.nestedArray).ToList();
You are always taking the value of x, but never changing it. Thus mergedX will only contain the final array's items.
Perhaps full LINQ would be better:
return ys
.Where(y => y.nestedArray != null) // only take items from ys if nestedArray != null
.SelectMany(y => y.nestedArray) // flatten the many arrays into one (in order)
.ToList(); // materialise the result into a list
Alternatively, you can use List<T>'s AddRange method:
foreach (var y in ys)
{
if (y.nestedArray != null)
{
mergedX.AddRange(y.nestedArray);
}
}

Compare two list of string in a single iteration using C# - Unsorted list

I wish to implement logic to compare two lists in a single iteration using C# (unsorted lists).
For Example:
List<string> listA = new List<string>() {"IOS", "Android", "Windows"};
List<string> listB = new List<string>() {"LINUS", "IOS"};
Now I need to compare listB with listA and find the items missing from listB, like "Android" and "Windows", without using C# predefined methods.
Note: Iterate each list only once.
Kindly assist me.
This is most likely one of the most optimized answers you are likely to find:
public static List<T> Except<T>(List<T> a, List<T> b)
{
var hash = new HashSet<T>(b);
var results = new List<T>(a.Count);
foreach (var item in a)
{
if (!hash.Contains(item))
{
results.Add(item);
}
}
return results;
}
Rather than the X × Y iterations you get from comparing the lists directly, you get X + Y: Y for iterating the comparison list (when converting it to a hash set), and X for iterating over the source list (with no additional factor of Y, since hash set lookups are constant time).
Try this:
var objectList3 = listA.Where(o => !listB.Contains(o)).ToList();
I don't know if I got it completely right (please correct me if not), but this could be helpful:
//Remove all elements of b from a
foreach (string item in b)
{
a.Remove(item);
}
// check for all elements of a if they exist in b and store them in c if not
public static List<string> Excepts(List<string> a, List<string> b)
{
List<string> c = new List<string>();
foreach (string s1 in a)
{
bool found = false;
foreach (string s2 in b)
{
if (s1 == s2)
{
found = true;
break;
}
}
if (!found)
c.Add(s1);
}
return c;
}

What is the best way to trim a list?

I have a List of strings. It's being generated elsewhere, but I will generate it below to help describe this simplified example:
var list = new List<string>();
list.Add("Joe");
list.Add("");
list.Add("Bill");
list.Add("Bill");
list.Add("");
list.Add("Scott");
list.Add("Joe");
list.Add("");
list.Add("");
list = TrimList(list);
I would like a function that "trims" this list; by trim I mean removing all items at the end of the array that are blank strings (the final two in this case).
NOTE: I still want to keep the blank one that is the second item in the array (or any other blank that is not at the end), so I can't do a .Where(r => !String.IsNullOrEmpty(r))
I would just write it without any LINQ, to be honest. After all, you're modifying a collection rather than just querying it:
void TrimList(List<string> list)
{
int lastNonEmpty = list.FindLastIndex(x => !string.IsNullOrEmpty(x));
int firstToRemove = lastNonEmpty + 1;
list.RemoveRange(firstToRemove, list.Count - firstToRemove);
}
If you actually want to create a new list, then the LINQ-based solutions are okay... although potentially somewhat inefficient (as Reverse has to buffer everything).
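Take advantage of Reverse and SkipWhile.
// list.Reverse() on a List<T> binds to the void, in-place List<T>.Reverse(),
// so call the LINQ version explicitly.
list = Enumerable.Reverse(list).SkipWhile(s => String.IsNullOrEmpty(s)).Reverse().ToList();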
List<T> (not the interface) has a FindLastIndex method. Therefore you can wrap that in a method:
static IList<string> TrimList(List<string> input) {
return input.Take(input.FindLastIndex(x => !string.IsNullOrEmpty(x)) + 1)
.ToList();
}
This produces a copy, whereas Jon's modifies the list.
The only solution I can think of is to code a loop that starts at the end of the list and searches for an element that is not an empty string. Don't know of any library functions that would help. Once you know the last good element, you know which ones to remove.
Be careful not to modify the collection while you are iterating over it. Tends to break the iterator.
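The backwards loop described above might look something like the following. ListTrimmer and TrimTrailingEmpty are hypothetical names used only for illustration; the sketch scans first and removes afterwards, so nothing is modified while it is being examined.

```csharp
using System;
using System.Collections.Generic;

public static class ListTrimmer
{
    // Removes trailing null or empty strings from the list in place.
    public static void TrimTrailingEmpty(List<string> list)
    {
        // Walk backwards until the last non-empty element is found.
        // If every element is blank, 'last' ends up at -1.
        int last = list.Count - 1;
        while (last >= 0 && string.IsNullOrEmpty(list[last]))
        {
            last--;
        }

        // Everything after 'last' is blank; remove it in a single call.
        list.RemoveRange(last + 1, list.Count - (last + 1));
    }
}
```

Blank entries in the middle of the list are left alone because the scan stops at the first non-empty element it meets from the end.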
I always like to come up with the most generic solution possible. Why restrict yourself with lists and strings? Let's make an algorithm for generic enumerable!
public static class EnumerableExtensions
{
public static IEnumerable<T> TrimEnd<T>(this IEnumerable<T> enumerable, Predicate<T> predicate)
{
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
var accumulator = new LinkedList<T>();
foreach (var item in enumerable)
{
if (predicate(item))
{
accumulator.AddLast(item);
}
else
{
foreach (var accumulated in accumulator)
{
yield return accumulated;
}
accumulator.Clear();
yield return item;
}
}
}
}
Use it like this:
var list = new[]
{
"Joe",
"",
"Bill",
"Bill",
"",
"Scott",
"Joe",
"",
""
};
foreach (var item in list.TrimEnd(string.IsNullOrEmpty))
{
Console.WriteLine(item);
}

Collection of strings to dictionary

Given an ordered collection of strings:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:
IDictionary<string,int> stringToNumOccurrences = ...;
Preferably do this in a single pass over the strings collection...
var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());
Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motivation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.
Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.
Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)
EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:
public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
var dictionary = new Dictionary<string, int>();
using (var iterator = strings.GetEnumerator())
{
if (!iterator.MoveNext())
{
// No entries
return dictionary;
}
string current = iterator.Current;
int currentCount = 1;
while (iterator.MoveNext())
{
string next = iterator.Current;
if (next == current)
{
currentCount++;
}
else
{
dictionary[current] = currentCount;
current = next;
currentCount = 1;
}
}
// Write out the trailing result
dictionary[current] = currentCount;
}
return dictionary;
}
This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.
Actually you could do this with LINQ using Aggregate, but it would be pretty nasty.
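For the curious, one possible Aggregate version is sketched below. It is not taken from the answer above: AggregateCounter is a hypothetical name, and the approach simply threads a mutable dictionary through the fold, which is arguably what makes it "nasty". It does not rely on the input being ordered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateCounter
{
    // Counts occurrences in a single pass by folding over the sequence.
    public static Dictionary<string, int> Count(IEnumerable<string> strings)
    {
        return strings.Aggregate(
            new Dictionary<string, int>(),
            (counts, s) =>
            {
                // TryGetValue leaves 'current' at 0 when s has not been seen yet.
                counts.TryGetValue(s, out int current);
                counts[s] = current + 1;
                return counts; // the same dictionary instance is passed along
            });
    }
}
```

The lambda mutates its accumulator, which goes against the usual side-effect-free style of LINQ, so a plain loop is probably easier to read.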
The standard LINQ way is this:
stringToNumOccurrences = strings.GroupBy(s => s)
.ToDictionary(g => g.Key, g => g.Count());
If this is actual production code, I'd go with Timwi's response.
If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:
Dictionary<TKey, TValue> has a ContainsKey method.
The IDictionary<TKey, TValue> interface's this[TKey] property is settable; i.e., you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1).
From those clues I think you should be able to figure out how to do it "by hand."
If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
if (stringToNumOccurrences.ContainsKey(str))
stringToNumOccurrences[str]++;
else
stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;
This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.
I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?
static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
var result = new Dictionary<string, int>();
string prev = null;
int count = 0;
foreach (var s in source)
{
if (prev != s && count > 0)
{
result.Add(prev, count);
count = 0;
}
prev = s;
++count;
}
if (count > 0)
{
result.Add(prev, count);
}
return result;
}
Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

Best way to remove items from a collection

What is the best way to approach removing items from a collection in C# once the item is known, but not its index? This is one way to do it, but it seems inelegant at best.
//Remove the existing role assignment for the user.
int cnt = 0;
int assToDelete = 0;
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
assToDelete = cnt;
}
cnt++;
}
workspace.RoleAssignments.Remove(assToDelete);
What I would really like to do is find the item to remove by property (in this case, name) without looping through the entire collection and using 2 additional variables.
If RoleAssignments is a List<T> you can use the following code.
workspace.RoleAssignments.RemoveAll(x => x.Member.Name == shortName);
If you want to access members of the collection by one of their properties, you might consider using a Dictionary<TKey, TValue> or KeyedCollection<TKey, TItem> instead. This way you don't have to search for the item you're looking for.
Otherwise, you could at least do this:
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
break;
}
}
@smaclell asked why reverse iteration was more efficient in a comment to @sambo99.
Sometimes it's more efficient. Suppose you have a list of people, and you want to remove or filter out all customers with a credit rating < 1000.
We have the following data
"Bob" 999
"Mary" 999
"Ted" 1000
If we were to iterate forward, we'd soon get into trouble
for( int idx = 0; idx < list.Count ; idx++ )
{
if( list[idx].Rating < 1000 )
{
list.RemoveAt(idx); // whoops!
}
}
At idx = 0 we remove Bob, which then shifts all remaining elements left. The next time through the loop idx = 1, but list[1] is now Ted instead of Mary. We end up skipping Mary by mistake. We could use a while loop, and we could introduce more variables.
Or, we just reverse iterate:
for (int idx = list.Count-1; idx >= 0; idx--)
{
if (list[idx].Rating < 1000)
{
list.RemoveAt(idx);
}
}
All the indexes to the left of the removed item stay the same, so you don't skip any items.
The same principle applies if you're given a list of indexes to remove from an array. In order to keep things straight you need to sort the list and then remove the items from highest index to lowest.
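The highest-to-lowest rule for a list of indexes could be sketched like this. IndexRemoval and RemoveAtIndexes are hypothetical names, and the sketch assumes every index is within range of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexRemoval
{
    // Removes the items at the given positions, which refer to the list
    // as it was before any removal took place.
    public static void RemoveAtIndexes<T>(List<T> list, IEnumerable<int> indexes)
    {
        // Distinct() guards against the same index being listed twice.
        // Descending order means each removal only shifts items that
        // sit after every index still waiting to be processed.
        foreach (int index in indexes.Distinct().OrderByDescending(i => i))
        {
            list.RemoveAt(index);
        }
    }
}
```

Processing the same indexes in ascending order would remove the wrong items, for the same reason the forward loop above skips Mary.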
Nowadays you can just use List<T>.RemoveAll with a lambda and declare what you're doing in a straightforward manner.
list.RemoveAll(o => o.Rating < 1000);
For this case of removing a single item, it's no more efficient iterating forwards or backwards. You could also use List<T>.FindIndex with a lambda for this.
int removeIndex = list.FindIndex(o => o.Name == "Ted");
if( removeIndex != -1 )
{
list.RemoveAt(removeIndex);
}
If it's an ICollection then you won't have a RemoveAll method. Here's an extension method that will do it:
public static void RemoveAll<T>(this ICollection<T> source,
Func<T, bool> predicate)
{
if (source == null)
throw new ArgumentNullException("source", "source is null.");
if (predicate == null)
throw new ArgumentNullException("predicate", "predicate is null.");
source.Where(predicate).ToList().ForEach(e => source.Remove(e));
}
Based on:
http://phejndorf.wordpress.com/2011/03/09/a-removeall-extension-for-the-collection-class/
For a simple List<T>, the most efficient way seems to be the predicate-based RemoveAll.
Eg.
workspace.RoleAssignments.RemoveAll(x => x.Member.Name == shortName);
The reasons are:
The predicate-based RemoveAll method is implemented in List<T> itself and has access to the internal array storing the actual data. It compacts the surviving items in a single pass.
The RemoveAt method is comparatively slow when called repeatedly: each call shifts every element after the removed index down by one. Removing many items one at a time is therefore O(n²) for List<T>, and reverse iteration only reduces the amount shifted rather than avoiding it.
If you are stuck implementing this in the pre-C# 3.0 era, you have 2 options.
The easily maintainable option: copy all the items you want to keep into a new list and swap it in for the original list.
Eg.
List<int> list2 = new List<int>();
foreach (int i in GetList())
{
if (i % 2 != 0) // keep the odd values, i.e. remove the even ones
{
list2.Add(i);
}
}
list = list2; // replace the original list with the filtered copy
Or
The tricky slightly faster option, which involves shifting all the data in the list down when it does not match and then resizing the array.
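A minimal sketch of that second option follows, written against List<T> and using a read index and a write index. InPlaceFilter and RemoveWhere are hypothetical names; on a C# 2.0 compiler the predicate would be passed as an anonymous delegate rather than a lambda.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceFilter
{
    // Removes every item for which shouldRemove returns true, in one pass.
    public static void RemoveWhere<T>(List<T> list, Predicate<T> shouldRemove)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            if (!shouldRemove(list[read]))
            {
                // Keep this item: move it down to the next free slot.
                list[write] = list[read];
                write++;
            }
        }

        // Everything from 'write' onwards is now a leftover; cut it off.
        list.RemoveRange(write, list.Count - write);
    }
}
```

Each surviving item is moved at most once, so the whole operation is O(n) regardless of how many items are removed.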
If you are removing items really frequently from a list, perhaps another structure like a Hashtable (.NET 1.1), a Dictionary (.NET 2.0) or a HashSet (.NET 3.5) is better suited for this purpose.
What type is the collection? If it's a List<T>, you can use the helpful "RemoveAll":
int cnt = workspace.RoleAssignments
.RemoveAll(spa => spa.Member.Name == shortName);
(This works in .NET 2.0. Of course, if you don't have the newer compiler, you'll have to use "delegate (SPRoleAssignment spa) { return spa.Member.Name == shortName; }" instead of the nice lambda syntax.)
Another approach if it's not a List, but still an ICollection:
var toRemove = workspace.RoleAssignments
.FirstOrDefault(spa => spa.Member.Name == shortName);
if (toRemove != null) workspace.RoleAssignments.Remove(toRemove);
This requires the Enumerable extension methods. (You can copy the Mono ones in, if you are stuck on .NET 2.0). If it's some custom collection that cannot take an item, but MUST take an index, some of the other Enumerable methods, such as Select, pass in the integer index for you.
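For the index-only case, the Select overload that supplies the position could be used roughly as follows. IndexFinder and IndexOfFirst are hypothetical names, and returning -1 when nothing matches is a convention assumed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexFinder
{
    // Returns the position of the first item satisfying 'match', or -1 if none does.
    public static int IndexOfFirst<T>(IEnumerable<T> items, Func<T, bool> match)
    {
        return items
            .Select((item, index) => new { item, index }) // pair each item with its position
            .Where(pair => match(pair.item))
            .Select(pair => pair.index)
            .DefaultIfEmpty(-1)                           // -1 when there is no match
            .First();
    }
}
```

The returned index can then be handed to a collection's Remove(int) overload, after checking that it is not -1.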
This is my generic solution
public static IEnumerable<T> Remove<T>(this IEnumerable<T> items, Func<T, bool> match)
{
var list = items.ToList();
for (int idx = 0; idx < list.Count(); idx++)
{
if (match(list[idx]))
{
list.RemoveAt(idx);
idx--; // the list is 1 item shorter
}
}
return list.AsEnumerable();
}
It would look much simpler if extension methods supported passing by reference!
Usage:
var result = new string[] { "mike", "john", "ali" };
result = result.Remove(x => x == "mike").ToArray();
Assert.IsTrue(result.Length == 2);
EDIT: ensured that the list looping remains valid even when deleting items by decrementing the index (idx).
Here is a pretty good way to do it
http://support.microsoft.com/kb/555972
System.Collections.ArrayList arr = new System.Collections.ArrayList();
arr.Add("1");
arr.Add("2");
arr.Add("3");
/*This throws an exception
foreach (string s in arr)
{
arr.Remove(s);
}
*/
//whereas this works correctly
Console.WriteLine(arr.Count);
foreach (string s in new System.Collections.ArrayList(arr))
{
arr.Remove(s);
}
Console.WriteLine(arr.Count);
Console.ReadKey();
There is another approach you can take depending on how you're using your collection. If you're downloading the assignments one time (e.g., when the app runs), you could translate the collection on the fly into a hashtable where:
shortname => SPRoleAssignment
If you do this, then when you want to remove an item by short name, all you need to do is remove the item from the hashtable by key.
Unfortunately, if you're loading these SPRoleAssignments a lot, that obviously isn't going to be any more cost efficient in terms of time. The suggestions other people made about using Linq would be good if you're using a new version of the .NET Framework, but otherwise, you'll have to stick to the method you're using.
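The lookup-table idea might be sketched as below with a Dictionary keyed by short name. RoleAssignment here is a hypothetical stand-in for SPRoleAssignment, and the sketch assumes member names are unique. Note that removing an entry from the dictionary does not by itself remove anything from the underlying SharePoint collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SPRoleAssignment, reduced to the one property used here.
public sealed class RoleAssignment
{
    public string MemberName { get; set; }
}

public static class AssignmentIndex
{
    // Builds a short name => assignment lookup in one pass.
    // ToDictionary throws ArgumentException if two assignments share a name.
    public static Dictionary<string, RoleAssignment> Build(IEnumerable<RoleAssignment> assignments)
    {
        return assignments.ToDictionary(a => a.MemberName);
    }
}
```

After building the index once, index.Remove(shortName) removes an entry by key without any loop and returns false if the name was not present.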
Taking a similar Dictionary-based approach, I have done this:
Dictionary<string, bool> sourceDict = new Dictionary<string, bool>();
sourceDict.Add("Sai", true);
sourceDict.Add("Sri", false);
sourceDict.Add("SaiSri", true);
sourceDict.Add("SaiSriMahi", true);
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false);
foreach (var item in itemsToDelete)
{
sourceDict.Remove(item.Key);
}
Note:
The code above fails in the .NET Client Profile (3.5 and 4.5), and some readers mentioned that it fails for them in .NET 4.0 as well; I'm not sure which settings cause the problem. The error is "Collection was modified; enumeration operation may not execute."
To avoid it, add .ToList() to the Where statement so the matches are materialised before anything is removed:
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false).ToList();
Per MSDN, the Client Profile is discontinued from .NET 4.5 onwards. http://msdn.microsoft.com/en-us/library/cc656912(v=vs.110).aspx
Save your items first, then delete them.
var itemsToDelete = Items.Where(x => !!!your condition!!!).ToArray();
for (int i = 0; i < itemsToDelete.Length; ++i)
Items.Remove(itemsToDelete[i]);
You need to override GetHashCode() in your Item class.
The best way to do it is by using linq.
Example class:
public class Product
{
public string Name { get; set; }
public string Price { get; set; }
}
LINQ query:
int removedCount = collection1.RemoveAll(w => collection2.Any(q => q.Name == w.Name));
This removes every element of collection1 whose Name matches the Name of any element in collection2. RemoveAll returns the number of elements removed, not a new collection.
Remember to use: using System.Linq;
To do this while looping through the collection without getting the "collection was modified" exception, this is the approach I've taken in the past. Note the .ToList() at the end of the original collection: it creates another collection in memory, so you can modify the existing collection while iterating over the copy.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments.ToList())
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
}
}
If you have got a List<T>, then List<T>.RemoveAll is your best bet. There can't be anything more efficient. Internally it does the array moving in one shot, not to mention it is O(N).
If all you have is an IList<T> or an ICollection<T>, you have roughly these three options:
public static void RemoveAll<T>(this IList<T> ilist, Predicate<T> predicate) // O(N^2)
{
for (var index = ilist.Count - 1; index >= 0; index--)
{
var item = ilist[index];
if (predicate(item))
{
ilist.RemoveAt(index);
}
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Predicate<T> predicate) // O(N)
{
var nonMatchingItems = new List<T>();
// Move all the items that do not match to another collection.
foreach (var item in icollection)
{
if (!predicate(item))
{
nonMatchingItems.Add(item);
}
}
// Clear the collection and then copy back the non-matched items.
icollection.Clear();
foreach (var item in nonMatchingItems)
{
icollection.Add(item);
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Func<T, bool> predicate) // O(N^2)
{
foreach (var item in icollection.Where(predicate).ToList())
{
icollection.Remove(item);
}
}
Go for either 1 or 2.
1 is lighter on memory and faster if you have fewer deletes to perform (i.e. the predicate is false most of the time).
2 is faster if you have more deletes to perform.
3 is the cleanest code but performs poorly IMO. Again all that depends on input data.
For some benchmarking details see https://github.com/dotnet/BenchmarkDotNet/issues/1505
A lot of good responses here; I especially like the lambda expressions...very clean. I was remiss, however, in not specifying the type of Collection. This is a SPRoleAssignmentCollection (from MOSS) that only has Remove(int) and Remove(SPPrincipal), not the handy RemoveAll(). So, I have settled on this, unless there is a better suggestion.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name != shortName) continue;
workspace.RoleAssignments.Remove((SPPrincipal)spAssignment.Member);
break;
}
