I'm looking for a performant way to add distinct items of a second ICollection to an existing one. I'm using .NET 4.
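This should do it. Union already removes duplicates, so the custom comparer can be passed to it directly instead of chaining a separate Distinct:
list1.Union(list2, aCustomComparer).ToList()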
As long as they're IEnumerable, you can use the go-to LINQ answer:
var union = firstCollection.Union(secondCollection);
This uses the default equality comparison, which for classes that don't override Equals is reference equality. To change this, define an IEqualityComparer<T> for the item type in your collection that performs a more semantic comparison, and pass it as the second argument to Union. Note that Union returns a new, lazily evaluated sequence; it does not modify firstCollection.
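A minimal sketch of such a comparer, assuming a hypothetical Item class whose Id property defines "the same item" (Item, ItemIdComparer, and UnionExample are illustrative names, not from the question):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type: two Items with the same Id count as duplicates.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: equality and hashing look only at Id.
public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Item obj)
    {
        // Must agree with Equals: equal Ids give equal hash codes.
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class UnionExample
{
    public static List<Item> Merge(IEnumerable<Item> first, IEnumerable<Item> second)
    {
        // Union yields the first occurrence of each Id: all of 'first', then unseen Ids from 'second'.
        return first.Union(second, new ItemIdComparer()).ToList();
    }
}
```

Passing the comparer to Union itself means no separate Distinct call is needed.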
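Another way to add to your existing list would be:
list1.AddRange(list2.Except(list1).ToList());
Except already returns only distinct elements, so a separate Distinct() call is unnecessary; the ToList() materializes the new items before list1 is modified.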
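Since the question asks to add items to an existing collection rather than build a new one, here is a minimal in-place sketch. AddDistinct is a hypothetical extension method, not a framework API; it assumes the target's current contents can be copied into a HashSet<T> (available since .NET 3.5, so it works on .NET 4):

```csharp
using System;
using System.Collections.Generic;

public static class CollectionExtensions
{
    // Hypothetical helper: appends each item of 'source' that is not already in 'target'
    // (or earlier in 'source') and returns how many items were added.
    // 'source' must not be the same instance as 'target'.
    public static int AddDistinct<T>(this ICollection<T> target, IEnumerable<T> source,
                                     IEqualityComparer<T> comparer = null)
    {
        if (target == null) throw new ArgumentNullException("target");
        if (source == null) throw new ArgumentNullException("source");

        // One O(n) pass over the existing items, then O(1) average lookups per new item.
        var seen = new HashSet<T>(target, comparer ?? EqualityComparer<T>.Default);
        int added = 0;
        foreach (T item in source)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (seen.Add(item))
            {
                target.Add(item);
                added++;
            }
        }
        return added;
    }
}
```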
Since you didn't give much detail on the actual ICollection types you have as input or need as output, the most direct answer to your question is the one given by KeithS:
var union = firstCollection.Union(secondCollection);
This returns an IEnumerable of distinct items, and creating it is VERY fast. I made a small test app (below) that ran the Union method (MethodA) against a simple HashSet method of deduplicating that returns a HashSet<> (MethodB). At first glance, the Union method DESTROYS the HashSet:
MethodA: 1ms
MethodB: 2827ms
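However, Union uses deferred execution: MethodA never enumerates its result, so the timing above measures little more than creating the query. Having to convert that IEnumerable to some other type of collection such as List<> (like the version ADas posted) changes everything. Simply adding .ToList() to MethodA:
var union = firstCollection.Union(secondCollection).ToList();
changes the results: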
MethodA: 3656ms
MethodB: 2803ms
So more would need to be known about the specific case you are working with, and any solution you come up with should be tested, since a small code change can have a HUGE impact.
Below is the test I used to compare these methods. I'm sure it is a crude way to test (there is no warm-up run, and MethodA as written never enumerates the union), but it illustrates the point :)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    private static void Main(string[] args)
    {
        ICollection<string> collectionA = new List<string>();
        ICollection<string> collectionB = new List<string>();
        // 1,000 random strings, each added twice to both collections:
        // 2,000 items per collection, 1,000 distinct values overall.
        for (int i = 0; i < 1000; i++)
        {
            string randomString = Path.GetRandomFileName();
            collectionA.Add(randomString);
            collectionA.Add(randomString);
            collectionB.Add(randomString);
            collectionB.Add(randomString);
        }
        Stopwatch testA = new Stopwatch();
        testA.Start();
        MethodA(collectionA, collectionB);
        testA.Stop();
        Stopwatch testB = new Stopwatch();
        testB.Start();
        MethodB(collectionA, collectionB);
        testB.Stop();
        Console.WriteLine("MethodA: {0}ms", testA.ElapsedMilliseconds);
        Console.WriteLine("MethodB: {0}ms", testB.ElapsedMilliseconds);
        Console.ReadLine();
    }

    private static void MethodA(ICollection<string> collectionA, ICollection<string> collectionB)
    {
        for (int i = 0; i < 10000; i++)
        {
            // Union is deferred: this only builds the query. Append .ToList()
            // to force the union to actually be computed on each iteration.
            var result = collectionA.Union(collectionB);
        }
    }

    private static void MethodB(ICollection<string> collectionA, ICollection<string> collectionB)
    {
        for (int i = 0; i < 10000; i++)
        {
            // Eagerly builds a de-duplicated set from both collections.
            var result = new HashSet<string>(collectionA);
            foreach (string s in collectionB)
            {
                result.Add(s);
            }
        }
    }
}
I have two record structures and two lists as follows:
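public struct gtAliasRecType : ICloneable
{
public int lRecordNum;
public double dLocationCd;
// Clone() and the static CreateInstance() used below are omitted here
}
public struct gtCVARecType : ICloneable
{
public double dLocationCd;
// Clone() and CreateInstance() omitted here as well
}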
static public List<gtCVARecType> LCVARec = null;
static public List<gtAliasRecType> LAliasRec = null;
Now I want to iterate over the LAliasRec list and find whether a matching dLocationCd exists in the LCVARec list.
I tried using the Contains and Find methods of LCVARec, but ended up with errors:
public static void XYZ()
{
gtAliasRecType uAliasRec = gtAliasRecType.CreateInstance();
gtCVARecType uCVARec = gtCVARecType.CreateInstance();
for (int i = 0; i < LAliasRec.Count; i++)
{
uAliasRec = LAliasRec[i];
//trying Find method
gtCVARecType c1 = LCVARec.Find(uAliasRec.dLocationCd);
//trying Contains method
bool nReturn = LCVARec.Contains( uAliasRec.dLocationCd );
}
}
However, I ran into a "Cannot convert from 'double' to 'gtCVARecType'" error.
Thanks in advance :)
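You can't use Contains to find an item of a different type: List<T>.Contains takes a T, so it can't accept a double. You can use Find if you pass it a Predicate<gtCVARecType> rather than a value, but I'd personally use the LINQ Any method: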
foreach (var uAliasRec in LAliasRec)
{
bool nReturn = LCVARec.Any(rec => rec.dLocationCd == uAliasRec.dLocationCd);
// Presumably do something with nReturn
}
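For completeness, a minimal sketch of the predicate-based List<T> methods, using a trimmed-down copy of the question's gtCVARecType (only the field used here; LocationLookup is an illustrative name). One caveat: because gtCVARecType is a struct, Find returns default(gtCVARecType) when nothing matches, which is indistinguishable from a real record whose dLocationCd is 0.0, so Exists or FindIndex is the safer yes/no test.

```csharp
using System.Collections.Generic;

// Trimmed copy of the question's struct: only the field this sketch needs.
public struct gtCVARecType
{
    public double dLocationCd;
}

public static class LocationLookup
{
    // Exists takes a Predicate<T>, so it can test a field of a different type (double).
    public static bool HasLocation(List<gtCVARecType> cvaRecs, double locationCd)
    {
        return cvaRecs.Exists(rec => rec.dLocationCd == locationCd);
    }

    // FindIndex returns -1 when no element matches.
    public static int IndexOfLocation(List<gtCVARecType> cvaRecs, double locationCd)
    {
        return cvaRecs.FindIndex(rec => rec.dLocationCd == locationCd);
    }

    // Find also takes a predicate, but returns default(gtCVARecType) when there is no match.
    public static gtCVARecType FindRecord(List<gtCVARecType> cvaRecs, double locationCd)
    {
        return cvaRecs.Find(rec => rec.dLocationCd == locationCd);
    }
}
```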
If the lists are large, you might want to create a HashSet<double> for all the locations first, which is an up-front cost that will make everything else cheaper:
HashSet<double> locations = new HashSet<double>(LCVARec.Select(rec => rec.dLocationCd));
foreach (var uAliasRec in LAliasRec)
{
bool nReturn = locations.Contains(uAliasRec.dLocationCd);
// Presumably do something with nReturn
}
As an aside, I'd strongly advise you to start following regular .NET naming conventions. In its current form, your code is going to be very hard for anyone used to regular C# code to work with.
What about using Intersect?
var results = LAliasRec
.Select(x => x.dLocationCd)
.Intersect(LCVARec.Select(x => x.dLocationCd));
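bool exists = results.Any();
Select only the double values and intersect them. If the result is non-empty, the two lists share at least one dLocationCd value. Any() stops at the first match, whereas Count() would evaluate the whole intersection. Note that this tells you whether any value is shared, not which LAliasRec entries match.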
You can use LINQ and an inner join to find the intersection of two lists.
var query = from lcva in LCVARec
join lAlias in LAliasRec on lcva.dLocationCd equals lAlias.dLocationCd
select lcva;
Console.WriteLine(query.Count()); //prints number of matching items.
Update
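If you can change the List<T> to a SortedList<TKey, TValue> or SortedDictionary<TKey, TValue> keyed on dLocationCd, lookups become faster: ContainsKey is O(log n) instead of the linear scan that List<T>.Contains performs.
If you prefer to use Contains(), implement IEquatable<T> on the struct; otherwise List<T>.Contains falls back to ValueType.Equals, which boxes and can be slow. Contains also needs an instance of T to search for, not a double. If you want performance, Sort() the list (which requires IComparable<T> on the element type, or an IComparer<T>) and then use BinarySearch.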
Reference : https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1?view=netcore-3.1#remarks
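A minimal sketch of the Sort/BinarySearch route, using an IComparer<T> instead of implementing IComparable<T> on the struct. LocationComparer and SortedLookup are hypothetical names, and the struct is trimmed to the one field used. The list must be sorted with the same comparer that BinarySearch uses, and the search needs a probe record because BinarySearch takes a T, not a double.

```csharp
using System.Collections.Generic;

// Trimmed copy of the question's struct.
public struct gtCVARecType
{
    public double dLocationCd;
}

// Hypothetical comparer that orders records by dLocationCd only.
public class LocationComparer : IComparer<gtCVARecType>
{
    public int Compare(gtCVARecType x, gtCVARecType y)
    {
        return x.dLocationCd.CompareTo(y.dLocationCd);
    }
}

public static class SortedLookup
{
    // Sort once (O(n log n)); afterwards every lookup is O(log n).
    public static void Prepare(List<gtCVARecType> cvaRecs, LocationComparer comparer)
    {
        cvaRecs.Sort(comparer);
    }

    public static bool Contains(List<gtCVARecType> sortedRecs, double locationCd, LocationComparer comparer)
    {
        // BinarySearch needs a T, so wrap the double in a probe record.
        var probe = new gtCVARecType { dLocationCd = locationCd };
        // A non-negative result is the index of a match; a negative one means "not found".
        return sortedRecs.BinarySearch(probe, comparer) >= 0;
    }
}
```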
I have the following method:
private void SelectingCoreItems(SortedList<ICoreItem, ICoreItem> sortedList)
{
for (int i = 0; i < VisibleCoreItems.Count; i++)
{
CoreItem currentItem = VisibleCoreItems[i];
if (sortedList.ContainsKey(currentItem))
{
itemListView.SelectedItems.Add(currentItem);
}
}
}
I want to mark all equal items. That works, but the performance is very bad because sortedList contains 10,000 items and VisibleCoreItems contains over 200,000 items.
Is there a way to optimize the method?
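You can use a HashSet<ICoreItem> instead of SortedList<ICoreItem, ICoreItem> for the lookups. SortedList.ContainsKey does a binary search, so each of the 200,000 lookups costs O(log n) calls to the comparer, while HashSet<T>.Contains is O(1) on average: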
private void SelectingCoreItems(SortedList<ICoreItem, ICoreItem> sortedList)
{
var lookup = new HashSet<ICoreItem>(sortedList.Select(i => i.Key));
for (int i = 0; i < VisibleCoreItems.Count; i++)
{
CoreItem currentItem = VisibleCoreItems[i];
if (lookup.Contains(currentItem))
{
itemListView.SelectedItems.Add(currentItem);
}
}
}
Also, comparing instances of the ICoreItem interface may be slow (depending on the implementation). If they contain a property that is unique for elements in sortedList (for example, Id), it may be worth using that property for the lookup set.
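EDIT: If the elements of sortedList do not have reasonable GetHashCode and Equals implementations, you may also need to pass an IEqualityComparer<ICoreItem> as the second argument to the HashSet constructor. SortedList decides equality with its IComparer (a comparison result of 0), whereas HashSet uses GetHashCode and Equals, so the two can disagree about which items match.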
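A minimal sketch of the Id-based lookup suggested above, assuming the items expose a unique int Id property. ICoreItem's real members aren't shown in the question, so the interface, CoreItem class, and Selection helper below are hypothetical stand-ins, and the WinForms ListView is replaced by a returned List so the sketch is self-contained.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the question's types; only the assumed Id property is modeled.
public interface ICoreItem
{
    int Id { get; }
}

public class CoreItem : ICoreItem
{
    public CoreItem(int id) { Id = id; }
    public int Id { get; private set; }
}

public static class Selection
{
    // Returns the visible items whose Id also appears among the sorted list's keys.
    public static List<CoreItem> SelectMatching(SortedList<ICoreItem, ICoreItem> sortedList,
                                                IList<CoreItem> visibleItems)
    {
        // Hashing an int is cheap and needs no custom Equals/GetHashCode on the items.
        var ids = new HashSet<int>();
        foreach (ICoreItem key in sortedList.Keys)
        {
            ids.Add(key.Id);
        }

        var selected = new List<CoreItem>();
        foreach (CoreItem item in visibleItems)
        {
            if (ids.Contains(item.Id)) // O(1) on average per visible item
            {
                selected.Add(item);
            }
        }
        return selected;
    }
}
```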
What's the best way to create a list with an arbitrary number of new instances of the same class? That is, is there a more compact or efficient way to do the following?
static List<MyObj> MyObjs = Enumerable.Range(0, 100)
.Select(i => new MyObj())
.ToList();
(Enumerable.Repeat would give me 100 references to the same object, so I don't think it would work.)
Edit: updated to reflect that this method does not work.
I was curious about your comment about Enumerable.Repeat, so I tried it.
//do not use!
List<object> myList = Enumerable.Repeat(new object(), 100).ToList();
I confirmed that they do all share the same reference like the OP mentioned.
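The check is easy to reproduce with object.ReferenceEquals. A minimal sketch contrasting the two approaches (RepeatDemo is just an illustrative name):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RepeatDemo
{
    // 'new object()' is evaluated once; Repeat then yields that same reference 'count' times.
    public static List<object> Shared(int count)
    {
        return Enumerable.Repeat(new object(), count).ToList();
    }

    // Select runs the lambda once per element, so each element is a new instance.
    public static List<object> Separate(int count)
    {
        return Enumerable.Range(0, count).Select(i => new object()).ToList();
    }
}
```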
This wouldn't be hard to implement as an iterator:
IEnumerable<T> CreateItems<T> (int count) where T : new() {
return CreateItems(count, () => new T());
}
IEnumerable<T> CreateItems<T> (int count, Func<T> creator) {
for (int i = 0; i < count; i++) {
yield return creator();
}
}
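One thing to watch with the iterator approach: it is lazy, so every enumeration calls creator again and produces a fresh set of objects. Materialize it once with ToList() if you need stable instances. A minimal self-contained sketch; the ItemFactory class wrapper and the CreateList helper are assumptions, since the answer shows only the two CreateItems methods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ItemFactory
{
    public static IEnumerable<T> CreateItems<T>(int count) where T : new()
    {
        return CreateItems(count, () => new T());
    }

    // Deferred: the loop body runs again every time the sequence is enumerated.
    public static IEnumerable<T> CreateItems<T>(int count, Func<T> creator)
    {
        for (int i = 0; i < count; i++)
        {
            yield return creator();
        }
    }

    // Hypothetical helper: enumerate once so the same instances are reused afterwards.
    public static List<T> CreateList<T>(int count) where T : new()
    {
        return CreateItems<T>(count).ToList();
    }
}
```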
Apparently, the answer is "no". Thanks, everyone!
Not sure what is wrong with a for loop in this case. At the very least, we can presize the capacity of the list. That might not be important for 100 objects, but the size is arbitrary.
public class MyClass
{
static int Capacity = 100;
static List<MyObj> MyObjs = new List<MyObj>(Capacity);
static MyClass() {
for( var i = 0; i < Capacity; i++ ) {
MyObjs.Add(new MyObj());
}
}
}
You can use Enumerable.Repeat as a base and use Select to create the new objects:
List<object> myList = Enumerable.Repeat<object>(null, 100).Select(_ => new object()).ToList();
The explicit <object> type argument is required, because the compiler cannot infer a type from null alone. If you attach a debugger, you can see that new is executed for every element.
This is almost the same as your code, but instead of providing a dummy range you repeat a dummy null value.
I created a method that sorts a generic list without knowing the element type; it works whether the elements are int or decimal.
However, the code that retrieves the values from the textboxes uses List<int>.
I tried to convert it to List<T>, but it doesn't work.
I want this code to work whether the user types integers, decimals, or strings into the textboxes.
This was part of an interview question where they asked me not to use the built-in Sort method, and the input could be, for example, ints or decimals.
private void btnSort_Click(object sender, EventArgs e)
{
List<int> list = new List<int>();
list.Add(int.Parse(i1.Text));
list.Add(int.Parse(i2.Text));
list.Add(int.Parse(i3.Text));
list.Add(int.Parse(i4.Text));
list.Add(int.Parse(i5.Text));
Sort(list);
StringBuilder sb = new StringBuilder();
foreach (int t in list)
{
sb.Append(t.ToString());
sb.AppendLine();
}
result.Text = sb.ToString();
}
private void Sort<T>(List<T> list)
{
bool madeChanges;
int itemCount = list.Count;
do
{
madeChanges = false;
itemCount--;
for (int i = 0; i < itemCount; i++)
{
int result = Comparer<T>.Default.Compare(list[i], list[i + 1]);
if (result > 0)
{
Swap(list, i, i + 1);
madeChanges = true;
}
}
} while (madeChanges);
}
public List<T> Swap<T>(List<T> list,
int firstIndex,
int secondIndex)
{
T temp = list[firstIndex];
list[firstIndex] = list[secondIndex];
list[secondIndex] = temp;
return list;
}
I wanted something like this, but it gives an error:
Error 1 The type or namespace name 'T' could not be found (are you missing a using directive or an assembly reference?) c:\users\luis.simbios\documents\visual studio 2010\Projects\InterViewPreparation1\InterViewPreparation1\Generics\GenericsSorting1.cs 22 18 InterViewPreparation1
List<T> list = new List<T>();
list.Add(i1.Text);
list.Add(i2.Text);
Sort(list);
I'm not calling Sort because it's an interview question in which they asked me not to use the Sort method.
In this case you can add a generic constraint IComparable<T> and then use the CompareTo() method:
private void Sort<T>(List<T> list) where T : IComparable<T>
{
//...
}
Edit:
You would have to write custom code to determine whether the input is a string, int, or decimal, e.g. with TryParse(...), though this will be very fragile. Once you do know the type (one way or another), you can use MakeGenericType() and Activator.CreateInstance() to create your List<T> object at run time, and then use MakeGenericMethod() to call your generic method:
// needs: using System.Collections; using System.Collections.Generic; using System.Reflection;
Type type = typeof(string);
IList list = (IList) Activator.CreateInstance(typeof(List<>).MakeGenericType(type));
//add items to list here
var p = new Program();
// Sort is declared private in the question, so include NonPublic in the binding flags
MethodInfo method = typeof(Program).GetMethod("Sort", BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
MethodInfo genericMethod = method.MakeGenericMethod(type);
genericMethod.Invoke(p, new object[] { list });
I am pretty sure that is not what the interview question intended to ask for.
First, as Jason points out, let the platform do the work for you - call .Sort.
Second, it looks to me like you're going to have to select the T of the List based on examining the contents of the textboxes, so you can handle ints vs. strings, etc., and then add items to the list accordingly. But once you have decided, your sort won't care.
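A minimal sketch of that approach, assuming the inputs arrive as an array of strings (for example, the textboxes' Text values). It tries int first, then decimal, and falls back to string; parsing uses the invariant culture, which is an assumption made to keep the behavior predictable. BubbleSort<T> is a hypothetical stand-in for the question's own Sort<T>, since the interview ruled out List<T>.Sort. Mixed input (say, one non-number) drops everything to string ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextboxSorter
{
    // Picks T from the inputs (int, then decimal, else string) and hands off to the generic sort.
    public static List<string> SortInputs(string[] inputs)
    {
        var ints = new List<int>();
        var decimals = new List<decimal>();
        bool allInts = true, allDecimals = true;

        foreach (string s in inputs)
        {
            int i;
            decimal d;
            if (allInts && int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out i))
                ints.Add(i);
            else
                allInts = false;

            if (allDecimals && decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out d))
                decimals.Add(d);
            else
                allDecimals = false;
        }

        // Once T is chosen, BubbleSort<T> does not care which type it got.
        if (allInts) return ToStrings(BubbleSort(ints));
        if (allDecimals) return ToStrings(BubbleSort(decimals));
        return ToStrings(BubbleSort(new List<string>(inputs)));
    }

    // Bubble sort in the spirit of the question's Sort<T>; IComparable<T> makes CompareTo available.
    public static List<T> BubbleSort<T>(List<T> list) where T : IComparable<T>
    {
        bool swapped;
        int end = list.Count;
        do
        {
            swapped = false;
            end--;
            for (int i = 0; i < end; i++)
            {
                if (list[i].CompareTo(list[i + 1]) > 0)
                {
                    T temp = list[i];
                    list[i] = list[i + 1];
                    list[i + 1] = temp;
                    swapped = true;
                }
            }
        } while (swapped);
        return list;
    }

    private static List<string> ToStrings<T>(List<T> list)
    {
        var result = new List<string>(list.Count);
        foreach (T item in list)
        {
            result.Add(Convert.ToString(item, CultureInfo.InvariantCulture));
        }
        return result;
    }
}
```

Sorting the numbers as numbers matters here: as strings, "10" would sort before "2".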
You're not going about this the right way. Embrace generics correctly. What you want is this:
public string Foo<T>(IEnumerable<string> strings) where T : struct, IComparable<T> {
var list = strings.Select(s => (T)Convert.ChangeType(s, typeof(T))).ToList();
list.Sort((x, y) => (x.CompareTo(y)));
return String.Join("\n", list);
}
Now you can say
string response = Foo<int>(strings);
or
string response = Foo<decimal>(strings);
depending on which you want.
Note that
We use List<T>.Sort to do the sorting.
We use String.Join to build the string to display back to the user.
This should compile, but please excuse trivial errors if it doesn't. I can't fire up the ol' compiler right now.
Edit: I see you edited in that you can't use List<T>.Sort. It's easy enough to replace my use of List<T>.Sort with your own implementation.
Try something like:
private static IList foobar(Type t)
{
var listType = typeof(List<>);
var constructedListType = listType.MakeGenericType(t);
var instance = Activator.CreateInstance(constructedListType);
return (IList)instance;
}
Then use:
IList list = foobar(TYPE);
where TYPE is the type you want your list's elements to be.
Hope this helps!
I'm not on .NET 4.
I get a huge list from a data source. When the number of elements in the list is higher than X, I'd like to partition the list and assign each partition to a thread. After processing the partitions, I'd like to merge them.
var subsets = list.PartitionEager(50000);
//var subsets = list.Partition<T>(50000);
Thread[] threads = new Thread[subsets.Count()];
int i = 0;
foreach (var set in subsets)
{
threads[i] = new Thread(() => Convertor<T>(set));
threads[i].Start();
i++;
}
for (int j = 0; j < i; j++)
{
threads[j].Join();
}
The Convertor method is a static method that takes a list and does some lookups.
public static void Convertor<T>(List<T> list) where T : IInterface {
foreach (var element in list)
{
// do some lookup and assign a value to element
// then do more lookup and assign a value to element
}
}
When I run this code, even though I know that most of the elements should be assigned a value, they are in fact coming back null.
I'm aware that a copy of the list will be passed to the method, but any change to an element should be reflected in the calling method. However, this is happening only within the final subset.
I even added some code to merge the lists into a single one.
list.Clear();
foreach (var set in subsets)
{
list.AddRange(set);
}
Code for partitioning:
public static List<List<T>> PartitionEager<T>(this List<T> source, Int32 size)
{
List<List<T>> merged = new List<List<T>>();
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
{
merged.Add(new List<T>(source.Skip(size * i).Take(size)));
}
return merged;
}
What am I doing wrong, and how do I resolve this issue? I'd like the elements to be assigned values after the lookups. Is this related to synchronization or parameter passing?
If .NET 4 is an option, you can just use Parallel.For or Parallel.ForEach. These methods automatically handle partitioning for you, as well as providing many other advantages in terms of scalability across multiple degrees of concurrency on different systems.
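If .NET 4 does become available, a minimal sketch of that approach could use Partitioner.Create(fromInclusive, toExclusive, rangeSize) to hand out index ranges, so each worker updates a contiguous slice of the original list and nothing needs merging. Element and the string assignment are placeholders for the question's IInterface types and lookup code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Placeholder element type; stands in for the question's IInterface implementations.
public class Element
{
    public int Key { get; set; }
    public string Value { get; set; }
}

public static class ParallelConvert
{
    public static void ConvertAll(List<Element> list, int rangeSize)
    {
        // Partitioner.Create throws for an empty range, so bail out early.
        if (list.Count == 0) return;

        // Each Tuple<int, int> is a [from, to) index range of at most rangeSize elements.
        var ranges = Partitioner.Create(0, list.Count, rangeSize);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // Placeholder for the real lookups: each range writes only to its own elements.
                list[i].Value = "converted-" + list[i].Key;
            }
        });
    }
}
```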
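It looks like you're hitting the modified-closure problem while creating the threads. Before C# 5, a foreach loop has a single iteration variable shared by all iterations, so every lambda captures that same set variable and reads it only when its thread actually runs, by which time the loop may have moved on. If I'm correct, all your threads update the same (last) set. Modify the code this way: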
foreach (var set in subsets)
{
var setLocalCopy = set;
threads[i] = new Thread(() => Convertor<T>(setLocalCopy));
threads[i].Start();
i++;
}
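Putting the fix together, here is a minimal sketch of the whole pipeline using plain Thread objects (no TPL, so it fits pre-.NET 4). WorkItem is a placeholder for the question's IInterface elements, the string assignment stands in for the real lookups, and the list is split with List<T>.GetRange instead of the question's Skip/Take. Assuming the elements are reference types, each partition holds references to the same objects as the original list, so values assigned by the threads are visible through the original list and no merge step is required.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Placeholder element type standing in for the question's IInterface implementations.
public class WorkItem
{
    public int Key;
    public string Value;
}

public static class ThreadedConvert
{
    public static void ConvertAll(List<WorkItem> list, int size)
    {
        var threads = new List<Thread>();
        for (int start = 0; start < list.Count; start += size)
        {
            // GetRange copies the references, not the WorkItem objects themselves.
            List<WorkItem> setLocalCopy = list.GetRange(start, Math.Min(size, list.Count - start));
            // setLocalCopy is declared inside the loop, so each lambda captures its own partition.
            var thread = new Thread(() => Convertor(setLocalCopy));
            threads.Add(thread);
            thread.Start();
        }
        foreach (Thread t in threads)
        {
            t.Join(); // wait for every partition before the caller reads the results
        }
    }

    private static void Convertor(List<WorkItem> part)
    {
        foreach (WorkItem item in part)
        {
            item.Value = "converted-" + item.Key; // placeholder for the real lookups
        }
    }
}
```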