Remove Range of items with HashSet - c#

I am using a HashSet.
I am looking for a way to remove a range of items from the beginning of the HashSet.
With List it can be done with RemoveRange
Example
Removes 10 items from the beginning:
dinosaurs.RemoveRange(0, 10);
Can this be done with HashSet?
[Edit] The order of the HashSet does not matter, since it contains only random strings.

HashSet stores unique items (under the hood, its storage is much like the keys of a Dictionary), so it doesn't make much sense to remove "the first" 10 items: the enumeration order is effectively arbitrary, though not random enough to rely on for most needs.
If you need the removal to be properly random, OrderBy can do that:
var r = new Random();
// ToList() takes a snapshot so the set is not modified while it is being enumerated.
foreach(var dinoToRemove in dinosaurs
.OrderBy(x => r.Next())
.Take(10)
.ToList())
{
dinosaurs.Remove(dinoToRemove);
}
If you really are determined to remove the first ten, you could also use a counter with RemoveWhere:
var count = 0;
dinosaurs.RemoveWhere(x => count++ < 10);
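The counter trick above can be wrapped in a small helper. This is only a sketch: `RemoveAny` is a hypothetical extension name, not a framework method, and which items get removed depends on the set's unspecified enumeration order.

```csharp
using System;
using System.Collections.Generic;

static class HashSetExtensions
{
    // Hypothetical helper (not part of the framework): removes up to
    // `count` items, in whatever order RemoveWhere visits them.
    public static int RemoveAny<T>(this HashSet<T> set, int count)
    {
        var visited = 0;
        // The predicate is true for the first `count` items visited, false afterwards.
        return set.RemoveWhere(_ => visited++ < count);
    }
}

static class Program
{
    static void Main()
    {
        var dinosaurs = new HashSet<string> { "Rex", "Stego", "Raptor", "Trike", "Bronto" };
        int removedCount = dinosaurs.RemoveAny(2);
        Console.WriteLine($"{removedCount} removed, {dinosaurs.Count} left"); // 2 removed, 3 left
    }
}
```

RemoveWhere returns the number of items it removed, so the helper can report how many were actually dropped when the set holds fewer than `count` items.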

You can use LINQ and pass the results to a HashSet constructor:
var filteredList = new HashSet<Dinosaur>
(
originalList.Skip(10)
);
As Servy notes, a HashSet may not be ordered, so you might want to get an ordered enumerable before you skip 10.
var filteredList = new HashSet<Dinosaur>
(
originalList
.OrderBy( x => x.Foo )
.Skip(10)
);
If you don't want to instantiate a new HashSet (i.e. you want to remove items from the existing instance) you will need to do that one by one.
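One way to remove items from the existing instance is to snapshot the items to drop and pass them to ExceptWith. A minimal sketch, assuming a set of strings; `RemoveAnyInPlace` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SetTools
{
    // Hypothetical helper: removes up to `count` arbitrary items from `set` in place.
    public static void RemoveAnyInPlace<T>(HashSet<T> set, int count)
    {
        // Snapshot first: a HashSet must not be modified while it is being enumerated.
        List<T> toRemove = set.Take(count).ToList();

        // ExceptWith removes every listed item from the existing instance.
        set.ExceptWith(toRemove);
    }

    static void Main()
    {
        var dinosaurs = new HashSet<string> { "Rex", "Stego", "Raptor", "Trike", "Bronto" };
        RemoveAnyInPlace(dinosaurs, 2);
        Console.WriteLine(dinosaurs.Count); // 3
    }
}
```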

Related

Performance Improvement Tips for ForEach loop in C#?

I need to optimize the foreach loop below. It takes a long time to collect the unique items.
Alternatively, can FilterItems be converted into a list collection? If so, how? Then I could take the unique items from it easily.
The problem arises when I have 500,000 items in FilterItems.
Please suggest some ways to optimize the code below:
int i = 0;
List<object> order = new List<object>();
List<object> unique = new List<object>();
// FilterItems IS A COLLECTION OF RECORDS. CAN THIS BE CONVERTED TO A LIST COLLECTION DIRECTLY, SO THAT I CAN TAKE THE UNIQUE ITEMS FROM IT.
foreach (Record rec in FilterItems)
{
string text = rec.GetValue("Column Name");
int position = order.BinarySearch(text);
if (position < 0)
{
order.Insert(-position - 1, text);
unique.Add(text);
}
i++;
}
It's unclear what you mean by "converting FilterItems into a list" when we don't know anything about it, but you could definitely consider sorting after you've got all the items, rather than as you go:
var strings = FilterItems.Select(record => record.GetValue("Column Name"))
.Distinct()
.OrderBy(x => x)
.ToList();
The use of Distinct() here will avoid sorting lots of equal items - it looks like you only want distinct items anyway.
If you want unique to be in the original order but order to be the same items, just sorted, you could use:
var unique = FilterItems.Select(record => record.GetValue("Column Name"))
.Distinct()
.ToList();
var order = unique.OrderBy(x => x).ToList();
Now Distinct() isn't guaranteed to preserve order - but it does so in the current implementation, and that's the most natural implementation, too.
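If relying on the undocumented ordering of Distinct() is a concern, the first-occurrence order can be made explicit with a HashSet. This is a sketch only; the string array stands in for the column values read from FilterItems, and `UniqueInOrder` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UniqueDemo
{
    // Keeps the first occurrence of each value, in input order.
    // HashSet<T>.Add returns false when the value is already present.
    public static List<string> UniqueInOrder(IEnumerable<string> values)
    {
        var seen = new HashSet<string>();
        var unique = new List<string>();
        foreach (var value in values)
        {
            if (seen.Add(value))
                unique.Add(value);
        }
        return unique;
    }

    static void Main()
    {
        // Stand-in data for the values of "Column Name".
        var values = new[] { "pear", "apple", "pear", "fig", "apple" };

        List<string> unique = UniqueInOrder(values);
        List<string> order = unique.OrderBy(x => x, StringComparer.Ordinal).ToList();

        Console.WriteLine(string.Join(", ", unique)); // pear, apple, fig
        Console.WriteLine(string.Join(", ", order));  // apple, fig, pear
    }
}
```

Each value costs one hash lookup on average, instead of a BinarySearch plus an Insert that shifts the rest of the list.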

ConcurrentDictionary.Where very slow for filtering based int array (Key field)

I have the following
var links = new ConcurrentDictionary<int, Link>();
which is populated with around 20k records. I have another list of strings (List) that I turn into a sequence of ints using the following:
var intPossible = NonExistingListingIDs.Select(int.Parse); //this is very fast but need to be done
That part is pretty fast, but I would like to create a new list, or filter "links" down to only the entries whose Key is actually in intPossible.
I have the following Where clause, but the filtering takes about 50 seconds, which is far too slow for what I want to do.
var filtered = links.Where(x => intPossible.Any(y => y == x.Key)).ToList();
I know Intersect is pretty fast, but I have an array of ints, and Intersect does not work with it against a ConcurrentDictionary.
How can I filter the links so that it takes much less than 50 seconds?
You need to replace your O(n) inner lookup with something faster, such as a HashSet, which offers O(1) lookups. Note also that intPossible is a lazy Select, so int.Parse is re-run every time it is enumerated.
So
var intPossible = new HashSet<int>(NonExistingListingIDs.Select(int.Parse));
and
var filtered = links.Where(x => intPossible.Contains(x.Key)).ToList();
This will avoid iterating most of intPossible for every item in links.
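Put together, the HashSet approach might look like the sketch below. The `Link` type is not shown in the question, so `string` stands in for it here, and `FilterByKeys` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

static class FilterDemo
{
    // Returns the entries of `links` whose key appears in `ids`.
    public static List<KeyValuePair<int, string>> FilterByKeys(
        ConcurrentDictionary<int, string> links, IEnumerable<string> ids)
    {
        // Parse and build the set once; each Contains call is then O(1) on average.
        var wanted = new HashSet<int>(ids.Select(s => int.Parse(s)));
        return links.Where(pair => wanted.Contains(pair.Key)).ToList();
    }

    static void Main()
    {
        var links = new ConcurrentDictionary<int, string>();
        for (var i = 0; i < 20000; i++)
            links[i] = "link " + i;

        var filtered = FilterByKeys(links, new[] { "5", "42", "99999" });
        Console.WriteLine(filtered.Count); // 2
    }
}
```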
Alternatively, Linq is your friend:
var intPossible = NonExistingListingIDs.Select(int.Parse);
var filtered =
links.Join(intPossible, link => link.Key, intP => intP, (link, intP) => link);
The implementation of Join does much the same thing as I do above.
An alternative method would be to enumerate your list and use the indexer of the dictionary...might be a little cleaner...
var intPossible = NonExistingListingIDs.Select(int.Parse);
var filtered = from id in intPossible
where links.ContainsKey(id)
select links[id];
You might want to chuck in a .ToList() in there for good measure too...
This should actually be slightly faster than @spender's solution, since Join has to build a new hash table, whilst this method uses the hash table already inside the ConcurrentDictionary.
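A variation on the same idea uses TryGetValue, which does a single lookup per id and avoids a failure if another thread removes the key between the ContainsKey check and the indexer read. A sketch, with `string` standing in for the `Link` type and `LookUp` as a hypothetical helper name:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

static class LookupDemo
{
    // Collects the values for every id that is present in `links`.
    public static List<string> LookUp(
        ConcurrentDictionary<int, string> links, IEnumerable<int> ids)
    {
        var found = new List<string>();
        foreach (int id in ids)
        {
            // One atomic lookup: returns false when the key is missing.
            if (links.TryGetValue(id, out string link))
                found.Add(link);
        }
        return found;
    }

    static void Main()
    {
        var links = new ConcurrentDictionary<int, string>();
        links[1] = "first";
        links[2] = "second";

        List<string> found = LookUp(links, new[] { 2, 5, 1 });
        Console.WriteLine(string.Join(", ", found)); // second, first
    }
}
```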

Remove sublist from a list

I have 2 lists: list1 and list2 (both of type int)
Now I want to remove content of list2 from list1. How I can do this in C#?
PS: Don't use loop.
IMPORTANT CHANGE
As was pointed out in the comments, .Except() uses a set internally, so any duplicate members of list1 will be absent in the final result.
Produces the set difference of two sequences
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.except(v=vs.110).aspx
However, there is a solution that is both O(N) and preserves duplicates in the original list: Modify the RemoveAll(i => list2.Contains(i)) approach to use a HashSet<int> to hold the exclusion set.
List<int> list1 = Enumerable.Range(1, 10000000).ToList();
HashSet<int> exclusionSet = Enumerable.Range(500000, 10).ToHashSet();
list1.RemoveAll(i => exclusionSet.Contains(i));
The extension method ToHashSet() is available in MoreLinq; it is also built in from .NET Framework 4.7.2 and .NET Core 2.0 onward.
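The difference in how duplicates are treated can be seen on a small input. This is an illustrative sketch with made-up data; `RemoveAllIn` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExceptVsRemoveAll
{
    // Removes, in place, every element of `list` that occurs in `toExclude`,
    // keeping duplicates of the elements that stay.
    public static void RemoveAllIn(List<int> list, IEnumerable<int> toExclude)
    {
        var exclusionSet = new HashSet<int>(toExclude);
        list.RemoveAll(i => exclusionSet.Contains(i));
    }

    static void Main()
    {
        var list1 = new List<int> { 1, 1, 2, 3, 3, 4 };
        var list2 = new List<int> { 2, 4 };

        // Except is a set operation: duplicates in list1 collapse.
        Console.WriteLine(string.Join(", ", list1.Except(list2))); // 1, 3

        // RemoveAll with a HashSet keeps the duplicates.
        RemoveAllIn(list1, list2);
        Console.WriteLine(string.Join(", ", list1)); // 1, 1, 3, 3
    }
}
```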
ORIGINAL ANSWER
You can use Linq
list1 = list1.Except(list2).ToList();
UPDATE
Out of curiosity I did a simple benchmark of my solution vs. @HighCore's.
For list2 having just one element, his code is faster. As list2 gets larger and larger, his code gets extremely slow. It looks like his is O(N-squared) (or more specifically O(list1.length*list2.length) since each item in list1 is compared to each item in list2). Don't have enough data points to check the Big-O of my solution, but it is much faster when list2 has more than a handful of elements.
Code used to test:
List<int> list1 = Enumerable.Range(1, 10000000).ToList();
List<int> list2 = Enumerable.Range(500000, 10).ToList(); // Gets MUCH slower as 10 increases to 100 or 1000
Stopwatch sw = Stopwatch.StartNew();
//list1 = list1.Except(list2).ToList();
list1.RemoveAll(i => list2.Contains(i));
sw.Stop();
var ms1 = sw.ElapsedMilliseconds;
UPDATE 2
This solution assigns a new list to the variable list1. As @Толя points out, other references (if any) to the original list1 will not be updated. This solution drastically outperforms RemoveAll for all but the smallest sizes of list2. If no other references must see the update, it is preferable for that reason.
list1.RemoveAll(x => list2.Contains(x));
You can use this:
List<int> result = list1.Except(list2).ToList();
This will remove every item in secondList from firstList:
firstList.RemoveAll(item => secondList.Contains(item));

Some misunderstanding about sort extensions

I have the following code example:
List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
list.Add(5);
list.Add(6);
list.Add(7);
list.OrderByDescending(n=>n).Reverse();
But when I use this:
list.OrderByDescending(n=>n).Reverse();
I don't get wanted result.
If instead of the above statement, I use this one:
list.Reverse();
I get the wanted result.
Any idea why I don't get wanted result using the first statement ?
I believe I am missing something in understanding the extensions.
Thank you in advance.
The list.Reverse() method reverses the list in-place, so your original list is changed.
The .OrderByDescending() extension method produces a new list (or rather an IEnumerable<T>) and leaves your original list intact.
EDIT
To get two lists, for both sort orders:
List<int> upList = list.OrderBy(n => n).ToList();
List<int> downList = list.OrderByDescending(n => n).ToList();
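The difference between the two calls can be checked directly. A small sketch with made-up values; `SortedDescending` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReverseDemo
{
    // Returns a new, sorted list; `source` itself is left untouched.
    public static List<int> SortedDescending(List<int> source)
    {
        return source.OrderByDescending(n => n).ToList();
    }

    static void Main()
    {
        var list = new List<int> { 3, 1, 2 };

        // LINQ returns a new sequence; the result must be captured.
        List<int> downList = SortedDescending(list);
        Console.WriteLine(string.Join(", ", downList)); // 3, 2, 1
        Console.WriteLine(string.Join(", ", list));     // 3, 1, 2 (unchanged)

        // List<T>.Reverse() returns void and mutates the list itself.
        list.Reverse();
        Console.WriteLine(string.Join(", ", list));     // 2, 1, 3
    }
}
```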
Edit: So the problem seems to be that you think the Enumerable extensions change the original collection. They do not. They return something new that you need to assign to a variable:
IEnumerable<int> ordered = list.OrderByDescending(n => n);
foreach(int i in ordered)
Console.WriteLine(i);
OrderByDescending orders descending (highest first), which is what you obviously want, so I don't understand why you reverse it afterwards.
So this should give you the expected result:
var ordered = list.OrderByDescending(n=> n);
This returns an "arbitrary" order:
list.Reverse()
since it just reverses the order in which you added the ints. If you added them in sorted order, you don't need to order at all.
In general: use OrderBy or OrderByDescending if you want to sort a sequence, and Reverse if you want to invert a sequence that is not necessarily sorted (which is at least confusing).

How to delete multiple entries from a List without going out of range?

I have a list of integers that contains a number of values (say, 200).
List<int> ExampleList;
And another list of integers that holds the indexes that need to be deleted from ExampleList. However, this list is not sorted.
List<int> RemoveFromExampleList;
If it were sorted, I would have run a reverse loop and deleted all the values like this:
for (int i = (RemoveFromExampleList.Count-1); i >=0; i--)
{
ExampleList.RemoveAt(RemoveFromExampleList[i]);
}
Do I have to sort RemoveFromExampleList, or is there another way to prune the unnecessary values from ExampleList?
If I do have to sort, what's the easiest way to sort? Is there any built-in C# library/method to sort?
If RemoveFromExampleList is a list of indexes, you would have to sort it and work in descending order to delete based on those indexes. Doing it any other way would cause you to delete values you don't mean to delete.
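Here is a one-liner. Note that it relies on IndexOf, so it is only correct when every value in ExampleList is unique, and it does a linear search per element: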
ExampleList.RemoveAll(x => RemoveFromExampleList.Contains(ExampleList.IndexOf(x)));
You could replace the values you are going to remove with a sentinel value, i.e., one that you know doesn't occur in the list, and then remove all occurrences of that value.
Your option is to sort, yes. Sort the removal list in descending order and then remove by index that way.
// perform an orderby projection, remove
foreach (int index in RemoveFromExampleList.OrderByDescending(i => i))
ExampleList.RemoveAt(index);
Or
// actually sort the list, then remove
RemoveFromExampleList.Sort((a,b) => b.CompareTo(a));
foreach (int index in RemoveFromExampleList)
ExampleList.RemoveAt(index);
(Assumes there are no duplicates, use .Distinct() on the list/projection if otherwise.)
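Both points can be combined in a short sketch: the first helper removes in place, visiting distinct indexes from highest to lowest, and the second avoids sorting altogether by building a new list that skips the unwanted positions. `RemoveAtIndexes` and `WithoutIndexes` are hypothetical names, and the data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexRemoval
{
    // In place: highest index first, so earlier removals never shift later targets.
    public static void RemoveAtIndexes<T>(List<T> list, IEnumerable<int> indexes)
    {
        foreach (int index in indexes.Distinct().OrderByDescending(i => i))
            list.RemoveAt(index);
    }

    // No sorting: returns a new list without the items at the given positions.
    public static List<T> WithoutIndexes<T>(List<T> list, IEnumerable<int> indexes)
    {
        var skip = new HashSet<int>(indexes);
        return list.Where((item, i) => !skip.Contains(i)).ToList();
    }

    static void Main()
    {
        var exampleList = new List<int> { 10, 20, 30, 40, 50 };
        var removeFromExampleList = new List<int> { 3, 0, 3 }; // unsorted, with a duplicate

        Console.WriteLine(string.Join(", ", WithoutIndexes(exampleList, removeFromExampleList))); // 20, 30, 50

        RemoveAtIndexes(exampleList, removeFromExampleList);
        Console.WriteLine(string.Join(", ", exampleList)); // 20, 30, 50
    }
}
```

The in-place version shifts the tail of the list on every RemoveAt, so for long removal lists the single-pass version is likely to be cheaper, at the cost of allocating a new list.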
If you really had some aversion to sorting the list, you could make the list a list of nullable ints:
List<int?> ints;
Then you could nullify the values in the "delete list", and use the RemoveAll method to delete the null values.
But this is obviously a bit of a hack.
You could do it with LINQ / Lambda like so:
//EXAMPLE TO REMOVE ITEMS COMING FROM ANOTHER LIST
List<int> masterList = new List<int>();
masterList.Add(1);
masterList.Add(1);
masterList.Add(2);
masterList.Add(3);
List<int> itemsToRemove = new List<int>();
itemsToRemove.Add(1);
itemsToRemove.Add(2);
itemsToRemove.Add(3);
List<int> cleanList = new List<int>();
foreach (int value in itemsToRemove)
{
masterList = masterList.Where(x => x != value).ToList();
}
