How to make extension unionWith for Hashset - c#

I am trying to write an extension method for a custom type. This is my code. I don't understand how source ends up with zero items. Even in the debug section, the HashSet temp holds 10 LogEvents, but at the end source has zero items.
public static void UnionSpecialWith(this HashSet<LogEvent> source, List<LogEvent> given,IEqualityComparer<LogEvent> comparer)
{
List<LogEvent> original = new List<LogEvent>(source);
List<LogEvent> second = given.Condense(comparer);
source = new HashSet<LogEvent>(original.Condense(comparer),comparer);
foreach (LogEvent logEvent in second)
{
if (original.Contains(logEvent, comparer))
{
int index = original.FindIndex(x => comparer.Equals(x, logEvent));
original[index].filesAndLineNos.MergeFilesAndLineNos(logEvent.filesAndLineNos);
}
else
original.Add(logEvent);
}
#if DEBUG
String content = String.Join(Environment.NewLine, original.Select(x => x.GetContentAsEventsOnly()));
HashSet<LogEvent> temp = new HashSet<LogEvent>(original, comparer);
#endif
source = new HashSet<LogEvent>(original, comparer);
}
Can anybody point me out what is wrong?
EDIT:
This is my custom type. Whenever I find a duplicate, I want to merge its "filesAndLineNos" with the original one. This is what I am trying to achieve with the above code.
public class LogEvent
{
public String mainEventOriginal;
public String subEventOriginal;
public String mainEvent;
public String subEvent;
public int level;
public Dictionary<String,HashSet<int>> filesAndLineNos;
}
The usage is something like
HashSet<LogEvent> required = new HashSet<LogEvent>(initialUniqueSet);
required.UnionSpecialWith(givenListOfLogEvents);

This is simply a matter of parameters being passed by value in .NET by default. You're changing the value of source to refer to a different HashSet, and that doesn't change the caller's variable at all. Assuming that Condense doesn't modify the list (I'm unaware of that method) your method is as pointless as:
public void TrimString(string text)
{
// This changes the value of the *parameter*, but doesn't affect the original
// *object* (strings are immutable). The caller won't see any effect!
text = text.Trim();
}
If you call the above with:
string foo = " hello ";
TrimString(foo);
... then foo is still going to refer to a string with contents " hello ". Obviously your method is more complicated, but the cause of the problem is the same.
Either your extension method needs to modify the contents of the original HashSet passed in via the source parameter, or it should return the new set. Returning the new set is more idiomatically LINQ-like, but HashSet.UnionWith does modify the original set - it depends which model you want to be closer to.
EDIT: If you want to modify the set in place, but effectively need to replace the contents entirely due to the logic, then you might want to consider creating the new set, then clearing the old and adding all the contents back in:
public static void UnionSpecialWith(this HashSet<LogEvent> source,
                                    List<LogEvent> given,
                                    IEqualityComparer<LogEvent> comparer)
{
    List<LogEvent> original = new List<LogEvent>(source);
    List<LogEvent> second = given.Condense(comparer);
    foreach (LogEvent logEvent in second)
    {
        if (original.Contains(logEvent, comparer))
        {
            int index = original.FindIndex(x => comparer.Equals(x, logEvent));
            original[index].filesAndLineNos
                .MergeFilesAndLineNos(logEvent.filesAndLineNos);
        }
        else
        {
            original.Add(logEvent);
        }
    }
    source.Clear();
    foreach (var item in original)
    {
        source.Add(item);
    }
}
However, note:
This does not replace the comparer in the existing set. You can't do that.
It's pretty inefficient in general. It feels like a Dictionary would be a better fit, to be honest.
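The Dictionary idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the simplified LogEvent, the choice of (mainEvent, subEvent) as the key, and the MergeInto helper are made up for illustration, because the asker's comparer and MergeFilesAndLineNos are not shown.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the asker's LogEvent.
// Assumption: two events are "equal" when mainEvent and subEvent match.
public class LogEvent
{
    public string mainEvent;
    public string subEvent;
    public Dictionary<string, HashSet<int>> filesAndLineNos =
        new Dictionary<string, HashSet<int>>();
}

public static class LogEventMerging
{
    // Hypothetical helper: merges 'given' into 'byKey' in place.
    public static void MergeInto(Dictionary<(string, string), LogEvent> byKey,
                                 IEnumerable<LogEvent> given)
    {
        foreach (LogEvent incoming in given)
        {
            var key = (incoming.mainEvent, incoming.subEvent);
            if (!byKey.TryGetValue(key, out LogEvent existing))
            {
                // First time this key is seen: keep the event as it is.
                byKey.Add(key, incoming);
                continue;
            }

            // Duplicate: fold the incoming files and line numbers into the existing event.
            foreach (var pair in incoming.filesAndLineNos)
            {
                if (!existing.filesAndLineNos.TryGetValue(pair.Key, out HashSet<int> lines))
                {
                    lines = new HashSet<int>();
                    existing.filesAndLineNos.Add(pair.Key, lines);
                }
                lines.UnionWith(pair.Value);
            }
        }
    }
}
```

Each lookup is a hash lookup on the key, so the Contains plus FindIndex scan over a list is avoided. The caller owns the dictionary, so nothing has to be reassigned inside the method.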

Related

Return Linq.Where object (IEnumerable) from within a lock - is it thread safe?

Consider the following code block
public class Data
{
public bool Init { get; set; }
public string Value {get; set; }
}
public class Example
{
private Object myObject = new Object();
private List<Data> myList = new List<Data>
{
new Data { Init = true, Value = "abc" },
new Data { Init = false, Value = "def" },
};
public IEnumerable<string> Get()
{
lock(this.myObject)
{
return this.myList.Where(i => i.Init == true).Select(i => i.Value);
}
}
public void Set(string value)
{
lock (this.myObject)
{
this.myList.Add(new Data { Init = false, Value = value });
}
}
}
If multiple threads are calling Get() - will the method be thread-safe?
In addition, will invoking .ToList() on the LINQ query make it thread-safe?
return this.myList.Where(i => i.Init == true).Select(i => i.Value).ToList()
Note that you do not lock here:
public void Set(string value)
{
this.myList.Add(new Data { Init = false, Value = value });
}
So it's not thread-safe in any case.
Assuming you just forgot to do that, it's still not safe, because Get returns a "lazy" IEnumerable. It holds a reference to myList and enumerates it only when the returned IEnumerable itself is enumerated. So you are leaking a reference to myList, which you are trying to protect with the lock, to arbitrary code outside of the lock statement.
You can test it like this:
var example = new Example();
var original = example.Get();
example.Clear(); // new method which clears underlying myList
foreach (var x in original)
Console.WriteLine(x);
Here we call Get, then we clear myList, and then we enumerate what we got from Get. Naively one may assume original will contain the values the list had when Get was called, but it will not contain anything, because the query is evaluated only when we enumerate original, and at that point the list has already been cleared and is empty.
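The same effect can be reproduced without the Example class. The following is a minimal sketch using a plain List<int>; the class and variable names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyQueryDemo
{
    // Returns how many items the lazy query and the snapshot each yield
    // after the underlying list has been cleared.
    public static (int lazyCount, int snapshotCount) Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Only describes the query; the list is not read yet.
        IEnumerable<int> lazy = numbers.Where(n => n > 1);

        // ToList enumerates immediately and stores the results.
        List<int> snapshot = numbers.Where(n => n > 1).ToList();

        numbers.Clear();

        // The lazy query reads the (now empty) list only here.
        return (lazy.Count(), snapshot.Count);
    }
}
```

Run() returns (0, 2): the lazy query sees the cleared list, while the snapshot still holds the two values that matched when ToList was called.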
If you use
public IList<string> Get()
{
lock(this.myObject)
{
return this.myList.Where(i => i.Init == true).Select(i => i.Value).ToList();
}
}
Then it will be "safe". Now you return not a "lazy" IEnumerable but a new List<> instance with copies of the values you have in myList. Note that it's a good idea to change the return type to IList here; otherwise the caller might pay for an unnecessary extra copy (for example by calling ToArray or ToList on the result).
You have to be aware of the difference between the potential to enumerate a sequence (the IEnumerable) and the enumerated sequence itself (the List, Array, etc. that you get after enumerating the sequence).
So you have a class Example, which internally holds a List<Data> in member MyList. Every Data has at least a string property Value.
Class Example has Methods to extract Value, and to add new elements to the MyList.
I'm not sure if it is wise to call them Set and Get, these names are quite confusing. Maybe you've simplified your example (which by the way made it more difficult to talk about it).
You have an object of class Example and two threads, that both have access to this object. You worry, that while one thread is enumerating the elements of the sequence, that the other thread is adding elements of the sequence.
Your Get method returns the "potential to enumerate". The sequence has not been enumerated yet when you return from Get, and by then the lock has already been released.
This means, that when you start enumerating the sequence, the Data is not locked anymore. If you've ever returned an IEnumerable from data that you fetched from a database, you probably have seen the same problem: the connection to the database is disposed before you start enumerating.
Solution 1: return enumerated data: inefficient
You already mention one solution: enumerate the data into a List before you return. This way, property MyList will not be accessed after you return from Get, so the lock is not needed anymore:
public IEnumerable<string> GetInitializedValues()
{
lock(this.MyList)
{
return this.MyList
.Where(data => data.Init == true)
.Select(data => data.Value)
.ToList();
}
}
In words: Lock MyList, which is a sequence of Data. Keep only those Data in this sequence that have a true value for property Init. From every remaining Data, take the value of property Value and put them in a List. Dispose the lock and return the List.
This is not efficient if the caller doesn't need the complete list.
// Create an object of class Example which has a zillion Data in MyList:
Example example = CreateFilledClassExample();
// I only want the first element of MyList:
string onlyOneString = example.GetInitializedValues().FirstOrDefault();
GetInitializedValues creates a list of zillion elements, and returns it. The caller only takes the first initialized value and throws the rest of the list away. What a waste of processing power.
Solution 2: use yield return: only enumerate what must be enumerated
The keyword yield means something like: return the next element of the sequence. Keep everything alive, until the caller disposes the IEnumerator
public IEnumerable<string> GetInitializedValues()
{
lock(this.MyList)
{
IEnumerable<string> initializedValues = this.MyList
.Where(data => data.Init == true)
.Select(data => data.Value);
foreach (string initializedValue in initializedValues)
{
yield return initializedValue;
}
}
}
Because the yield is inside the lock, the lock remains active, until you dispose the enumerator:
List<string> someInitializedValues = GetInitializedValues()
.Take(3)
.ToList();
This one is safe, and only enumerates the first three elements.
Deep inside it will do something like this:
List<string> someInitializedValues = new List<string>();
IEnumerable<string> enumerableInitializedValues = GetInitializedValues();
// MyList is not locked yet!
// Create the enumerator. This is Disposable, so use using statement:
using (IEnumerator<string> initializedValuesEnumerator = enumerableInitializedValues.GetEnumerator())
{
// start enumerating the first 3 elements (remember: we used Take(3))
while (initializedValuesEnumerator.MoveNext() && someInitializedValues.Count < 3)
{
// there is a next element, fetch it and add the fetched value to the list
string fetchedInitializedValue = initializedValuesEnumerator.Current;
someInitializedValues.Add(fetchedInitializedValue);
}
// the enumerator is not disposed yet, MyList is still locked.
}
// the enumerator is disposed. MyList is not locked anymore
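Whether the lock is really held while the iterator is suspended can be checked with Monitor.IsEntered. The sketch below is an illustration only: LockedSource, its gate field and the demo class are hypothetical names, and everything runs on a single thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class LockedSource
{
    private readonly object gate = new object();
    private readonly List<string> items = new List<string> { "a", "b", "c" };

    // Iterator: the lock is taken on the first MoveNext and released
    // when the iterator finishes or its enumerator is disposed.
    public IEnumerable<string> Values()
    {
        lock (gate)
        {
            foreach (string item in items)
            {
                yield return item;
            }
        }
    }

    // True while the calling thread holds the lock.
    public bool IsLockedByCurrentThread => Monitor.IsEntered(gate);
}

public static class LockedSourceDemo
{
    public static (bool before, bool during, bool after) Run()
    {
        var source = new LockedSource();
        IEnumerable<string> values = source.Values();

        // The iterator body has not started yet, so the lock is not held.
        bool before = source.IsLockedByCurrentThread;

        bool during = false;
        foreach (string value in values)
        {
            // The iterator is suspended inside the lock statement.
            during = source.IsLockedByCurrentThread;
            break; // leaving the loop disposes the enumerator
        }

        // Dispose ran the lock's finally block, which released the lock.
        bool after = source.IsLockedByCurrentThread;
        return (before, during, after);
    }
}
```

Run() returns (false, true, false). The flip side of this design is that the lock stays held for as long as the caller takes to enumerate, and a caller that neither finishes nor disposes the enumerator never releases it.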

How to delete item from LinkedList by specific compare?

I have a LinkedList, which has a Remove(item) method that takes the item as a parameter.
I would like to know what I have to override to delete an item by a specific field.
For example, I have a LinkedList<MyClass>; MyClass has a variable int index, and I would like to compare by this value...
That is, I am looking for some compare(Obj) method that I can override, so that objects are compared by the value I need when I delete items from the LinkedList...
EDIT
The Where method does not fit, because I actually have a generic LinkedList implementation: my list looks like LinkedList<TYPE>, and TYPE could be any type. Because of this I can't use Where, since I don't know the exact type.
The simplest way is to use one variable that walks the linked list and checks whether each value is the one you're looking to delete, and ALSO a second variable that stays one value/step behind whatever the first variable is tracking.
That way, when the main variable (the one doing the checking) spots the list value you want to remove, you set the behind variable's "next value" equal to the main variable's "next value" thus overwriting and deleting it.
Each time you increment the main variable, also increment the behind variable so that you can keep it one step behind.
DIAGRAM (left is before, right is after): https://gyazo.com/4d989b6ff6249249c9d63a17a830a8c1
Basically, just set two variables: one that is checking each linked list value to find the one you're looking to delete and one BEHIND it so that when you find the value you want to delete, you set the variable that's behind it to the value in front of the main variable thus overwriting the deleted part.
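The two-variable approach described above can be sketched on a hand-written singly linked list. Everything here is hypothetical (Node<T>, SimpleLinkedList<T>, RemoveFirst); it is not the built-in System.Collections.Generic.LinkedList<T>. The comparison is passed in as a predicate, so the list itself does not need to know the element type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal singly linked list for illustration.
public class Node<T>
{
    public T Value;
    public Node<T> Next;
    public Node(T value) { Value = value; }
}

public class SimpleLinkedList<T>
{
    public Node<T> Head;

    public void AddFirst(T value)
    {
        Head = new Node<T>(value) { Next = Head };
    }

    // Removes the first node whose value satisfies 'matches'.
    // Returns true if a node was removed.
    public bool RemoveFirst(Func<T, bool> matches)
    {
        Node<T> behind = null;   // stays one step behind 'current'
        Node<T> current = Head;  // the node being checked

        while (current != null)
        {
            if (matches(current.Value))
            {
                if (behind == null)
                    Head = current.Next;         // removing the head node
                else
                    behind.Next = current.Next;  // skip over 'current'
                return true;
            }
            behind = current;
            current = current.Next;
        }
        return false;
    }

    public List<T> ToList()
    {
        var result = new List<T>();
        for (Node<T> node = Head; node != null; node = node.Next)
            result.Add(node.Value);
        return result;
    }
}
```

For a list of MyClass, a call such as list.RemoveFirst(item => item.index == 3) would then remove the first element whose index is 3.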
using System;
using System.Collections.Generic;
namespace ConsoleAppCore
{
public static class Extension
{
public static List<dynamic> Where(this IEnumerable<dynamic> list, Func<dynamic, bool> func)
{
List<dynamic> result = new List<dynamic>();
foreach(dynamic item in list) {
try {
if (func(item))
result.Add(item);
}
catch {
continue;
}
}
return result;
}
}
class YourClass
{
public int x = 5;
}
class Program
{
static void Main(string[] args)
{
LinkedList<dynamic> list = new LinkedList<dynamic>();
list.AddAfter(list.AddAfter(list.AddAfter(list.AddAfter(
list.AddAfter(list.AddAfter(list.AddAfter(list.AddFirst(
(decimal)1), 2), (double)3), "Hello"), 5), new YourClass()), (float)7), 8);
var newlist = list.Where(i => i == "Hello");
// only one logical operation at a time (caused exceptions break the logic)
newlist.AddRange(list.Where(i => i.x == 5));
newlist.AddRange(list.Where(i => i > 5));
foreach(var i in newlist)
Console.WriteLine(i);
}
}
}
Output
Hello, ConsoleAppCore.YourClass, 7, 8

Exception when modifying collection in foreach loop

I know the basic principle of not modifying collection inside a foreach, that's why I did something like this:
public void UpdateCoverages(Dictionary<PlayerID, double> coverages)
{
// TODO: temp
var keys = coverages.Select(pair => pair.Key);
foreach (var key in keys)
{
coverages[key] = 0.84;
}
}
And:
class PlayerID : IEquatable<PlayerID>
{
public PlayerID(byte value)
{
Value = value;
}
public byte Value { get; private set; }
public bool Equals(PlayerID other)
{
return Value == other.Value;
}
}
First I save all my keys not to have the Collection modified exception and then I go through it. But I still get the exception which I cannot understand.
How to correct this and what is causing the problem?
First I save all my keys
No you don't; keys is a live sequence that is actively iterating the collection as it is iterated by the foreach. To create an isolated copy of the keys, you need to add .ToList() (or similar) to the end:
var keys = coverages.Select(pair => pair.Key).ToList();
Although personally I'd probably go for:
var keys = new PlayerID[coverages.Count];
coverages.Keys.CopyTo(keys, 0);
(which allows for correct-length allocation, and memory-copy)
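Put together, the snapshot approach might look like the sketch below. It uses string keys instead of PlayerID to stay self-contained, and CoverageUpdater and SetAll are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CoverageUpdater
{
    // Sets every value to 'newValue' while iterating over a copy of the keys.
    public static void SetAll(Dictionary<string, double> coverages, double newValue)
    {
        // ToList enumerates the keys once, up front; the resulting list
        // is independent of the dictionary.
        List<string> keys = coverages.Keys.ToList();

        foreach (string key in keys)
        {
            // The loop walks the list, not the dictionary, so writing here is fine.
            coverages[key] = newValue;
        }
    }
}
```

Because the foreach iterates the copied list, the dictionary's own enumerator is never active while the values are being written.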
What is a live sequence actually?
The Select method creates non-buffered spooling iterator over another... that is a really complicated thing to understand, but basically: when you first start iterating var key in keys, it grabs the inner sequence of coverages (aka coverages.GetEnumerator()), and then every time the foreach asks for the next item, it asks for the next item. Yeah, that sounds complicated. The good news is the C# compiler has it all built in automatically, with it generating state machines etc for you. All mainly done using the yield return syntax. Jon Skeet gives an excellent discussion of this in Chapter 6 of C# in Depth. IIRC this used to be the "free chapter", but now it is not.
However, consider the following:
static IEnumerable<int> OneTwoOneTwoForever()
{
while(true) {
yield return 1;
yield return 2;
}
}
It might surprise you to learn that you can consume the above, using the same non-buffered "when you ask for another value, it runs just enough code to give you the next value" approach:
var firstTwenty = OneTwoOneTwoForever().Take(20).ToList(); // works!

Remove item from List and get the item simultaneously

In C# I am trying to get an item from a list at a random index. When it has been retrieved I want it to be removed so that it can't be selected anymore. It seems as if I need a lot of operations to do this, isn't there a function where I can simply extract an item from the list? the RemoveAt(index) function is void. I would like one with a return value.
What I am doing:
List<int> numLst = new List<int>();
numLst.Add(1);
numLst.Add(2);
do
{
int index = rand.Next(numLst.Count);
int extracted = numLst[index];
// do something with extracted value...
numLst.RemoveAt(index);
}
while(numLst.Count > 0);
What I would like to do:
List<int> numLst = new List<int>();
numLst.Add(1);
numLst.Add(2);
do
{
int extracted = numLst.removeAndGetItem(rand.Next(numLst.Count));
// do something with this value...
}
while(numLst.Count > 0);
Does such a "removeAndGetItem" function exist?
No. It would go against the command-query separation convention, under which a method either has a side effect or returns a useful value (i.e. not just an indication of an error state), but never both.
If you want the function to appear atomic, you can acquire a lock on the list, which will stop other threads from accessing the list while you are modifying it, provided they also use lock:
public static class Extensions
{
    public static T RemoveAndGet<T>(this IList<T> list, int index)
    {
        lock(list)
        {
            T value = list[index];
            list.RemoveAt(index);
            return value;
        }
    }
}
public static class ListExtensions
{
    public static T RemoveAndGetItem<T>(this IList<T> list, int iIndexToRemove)
    {
        var item = list[iIndexToRemove];
        list.RemoveAt(iIndexToRemove);
        return item;
    }
}
These are called extension methods; call them as list.RemoveAndGetItem(0) on any IList<T> instance.
Things to consider in the extension method
Exception handling for the index that you pass: check that the index is at least 0 and less than the count of the list before doing this.
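A variant with the index check added might look like this. It is a sketch only; CheckedListExtensions and RemoveAndGetItemChecked are hypothetical names chosen so that they do not clash with the method above.

```csharp
using System;
using System.Collections.Generic;

public static class CheckedListExtensions
{
    // Removes the item at 'index' and returns it.
    // Throws if the list is null or the index is outside 0..Count-1.
    public static T RemoveAndGetItemChecked<T>(this IList<T> list, int index)
    {
        if (list == null)
            throw new ArgumentNullException(nameof(list));
        if (index < 0 || index >= list.Count)
            throw new ArgumentOutOfRangeException(nameof(index), index,
                "Index must be between 0 and Count - 1.");

        T item = list[index];
        list.RemoveAt(index);
        return item;
    }
}
```

In the asker's loop this would be used as int extracted = numLst.RemoveAndGetItemChecked(rand.Next(numLst.Count));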

Adding items to an IEnumerable through an extension method does not work?

In most of my methods that return some kind of collection, I return IEnumerable rather than the specific type (e.g. List). In many cases I have another collection that I want to combine with the resulting IEnumerable, exactly like taking a List and adding another List to it with the AddRange method. In the following example I have created an extension method that should take a collection of items and add them to a base collection. While debugging, this appears to work, but the items are never added to the original collection. I don't understand why they aren't added; is there something about the implementation of IEnumerable that I am missing? I understand that IEnumerable is a read-only interface, but I am not adding to this list in the example below, I am replacing it, and still the original IEnumerable does not change.
class Program
{
static void Main(string[] args)
{
var collectionOne = new CollectionContainerOne();
var collectionTwo = new CollectionContainerTwo();
// Starts at 1- 50 //
collectionOne.Orders.AddRange(collectionTwo.Orders);
// Should now be 100 items but remains original 50 //
}
}
public class CollectionContainerOne
{
public IEnumerable<Order> Orders { get; set; }
public CollectionContainerOne()
{
var testIds = Enumerable.Range(1, 50);
var orders = new List<Order>();
foreach (int i in testIds)
{
orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
}
this.Orders = orders;
}
}
public class CollectionContainerTwo
{
public IEnumerable<Order> Orders { get; set; }
public CollectionContainerTwo()
{
var testIds = Enumerable.Range(51, 50);
var orders = new List<Order>();
foreach (int i in testIds)
{
orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
}
this.Orders = orders;
}
}
public class Order
{
public int Id { get; set; }
public string Name { get; set; }
public override string ToString()
{
return this.Name;
}
}
public static class IEnumerable
{
public static void AddRange<T>(this IEnumerable<T> enumerationToAddTo, IEnumerable<T> itemsToAdd)
{
var addingToList = enumerationToAddTo.ToList();
addingToList.AddRange(itemsToAdd);
// Neither of the following works //
enumerationToAddTo.Concat(addingToList);
// OR
enumerationToAddTo = addingToList;
// OR
enumerationToAddTo = new List<T>(addingToList);
}
}
You are modifying the parameter enumerationToAddTo, which is a reference. However, the reference is not itself passed by reference, so when you modify the reference, the change is not observable in the caller. Furthermore, the this parameter of an extension method cannot be declared ref for a reference type such as IEnumerable<T> (C# 7.2 and later allow ref this only for value types).
You are better off using Enumerable.Concat<T>. Alternatively, you can use ICollection<T>, which has an Add(T) method. Unfortunately, List<T>.AddRange isn't defined in any interface.
Here is an example to illustrate the passing of reference types by reference. As Nikola points out, this is not really useful code. Don't try this at home!
void Caller()
{
// think of ss as a piece of paper that tells you where to find the list.
List<string> ss = new List<string> { "a", "b" };
// passing by value: we take another piece of paper, copy the information on ss
// onto it, and pass that copy to the method
DoNotReassign(ss);
// at this point, ss refers to the same list, which now contains { "a", "b", "c" }
//passing by reference: we pass the actual original piece of paper to the method.
Reassign(ref ss);
// now, ss refers to a different list, whose contents are { "x", "y", "z" }
}
void DoNotReassign(List<string> strings)
{
strings.Add("c");
strings = new List<string> { "x", "y", "z" }; // the caller will not see the change of reference
// in the piece of paper analogy, we have erased the piece of paper and written the location
// of the new list on it. Because this piece of paper is a copy of ss, the caller doesn't see the change.
}
void Reassign(ref List<string> strings)
{
strings.Add("d");
//at this point, strings contains { "a", "b", "c", "d" }, but we're about to throw that away:
strings = new List<string> { "x", "y", "z" };
//because strings is a reference to the caller's variable ss, the caller sees the reassignment to a new collection
//in the piece of paper analogy, when we erase the paper and put the new object's
//location on it, the caller sees that, because we are operating on the same
//piece of paper ("ss") as the caller
}
EDIT
Consider this program fragment:
string originalValue = "Hello, World!";
string workingCopy = originalValue;
workingCopy = workingCopy.Substring(0, workingCopy.Length - 1);
workingCopy = workingCopy + "?";
Console.WriteLine(originalValue.Equals("Hello, World!")); // writes "True"
Console.WriteLine(originalValue.Equals(workingCopy)); // writes "False"
If your assumption about reference types were true, the output would be "False" then "True"
Calling your extensions method like this:
collectionOne.Orders.AddRange(collectionTwo.Orders);
Is essentially the same as:
IEnumerable.AddRange(collectionOne.Orders, collectionTwo.Orders);
Now what happens there is that you pass a copy of the reference to collectionOne.Orders to the AddRange method. In your AddRange implementation you try to assign a new value to the copy. It gets "lost" inside. You are not assigning a new value to collectionOne.Orders; you assign it to the local copy, whose scope is only the method body itself. As a result, the outside world notices none of the modifications happening inside AddRange.
You either need to return new enumerable, or work on lists directly. Having mutating methods on IEnumerable<T> is rather counterintuitive, I'd stay away from doing that.
What you want exists and is called Concat. Essentially, when you do this in your Main:
var combined = collectionOne.Orders.Concat(collectionTwo.Orders);
Here, combined will refer to an IEnumerable that will traverse both source collections when enumerated.
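A small self-contained illustration of that behaviour, using plain integers as stand-ins for the Order objects (ConcatDemo is a made-up name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // Returns the number of items in the first sequence and in the combined one.
    public static (int firstCount, int combinedCount) Run()
    {
        IEnumerable<int> first = Enumerable.Range(1, 50);
        IEnumerable<int> second = Enumerable.Range(51, 50);

        // Concat returns a new lazy sequence; 'first' itself is unchanged.
        IEnumerable<int> combined = first.Concat(second);

        return (first.Count(), combined.Count());
    }
}
```

Run() returns (50, 100). If the container really has to hold the combined sequence, the result has to be assigned back by the caller, for example collectionOne.Orders = collectionOne.Orders.Concat(collectionTwo.Orders); which works in the question's code because Orders has a public setter.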
IEnumerable does not support adding. What your code in essence does is create a new collection from your enumerable and add the items to that new collection. Your old collection still has the same items.
E.g., you create a collection of numbers like this
Collection1 = [ 1, 2, 3, 4, 5 ]
When you call Collection1.ToList().Add(...), you get a new collection with the same members, and the new members are added to that one:
NewCollection = [ 1, 2, 3, 4, 5, 6, 7, ... ]
Your old collection, however, still holds only the old members, because ToList creates a new collection.
Solution #1:
Instead of using IEnumerable use IList which supports modification.
Solution #2 (bad):
Cast your IEnumerable back to its derived type and add members to it. This is quite bad though; in fact it's better to just return List in the first place.
IEnumerable<Order> collectionOne = ...;
List<Order> collectionOneList = (List<Order>)collectionOne;
collectionOneList.Add(new Order());
General guideline (best):
If you are returning collections which are standard in .NET there is no reason to return their interfaces. In this case it's best to use original type. If you are however returning collection which you implemented yourself, then you should return an interface
It's a completely different case when you are thinking about input parameters. If your method is asking to enumerate over items, then you should ask for IEnumerable. This way you can do what you need over it, and you are placing least constraint on person who is calling it. They can send any enumerable. If you need to add to that collection, you may require IList so that you can also modify it in your method.
Basically, the problem is that assigning to enumerationToAddTo has no effect outside the method, because it isn't a reference parameter. Also, as phoog mentions, ToList() creates a new list and does not cast the existing IEnumerable to a list.
This isn't really a good use of an extension method. I would recommend that you add a method to your container class that allows you to add new items to the IEnumerable instance. This would better encapsulate the logic that's particular to that class.
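One way such a container method could look is sketched below. OrderContainer and AddOrders are hypothetical names, and the Order class is cut down to the two properties from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class OrderContainer
{
    // The concrete list stays private; callers only see IEnumerable<Order>.
    private readonly List<Order> orders = new List<Order>();

    public IEnumerable<Order> Orders => orders;

    // The container decides how items are added to its own collection.
    public void AddOrders(IEnumerable<Order> itemsToAdd)
    {
        if (itemsToAdd == null)
            throw new ArgumentNullException(nameof(itemsToAdd));
        orders.AddRange(itemsToAdd);
    }
}
```

With this shape, the call from the question becomes collectionOne.AddOrders(collectionTwo.Orders), and the change is visible through collectionOne.Orders because the same underlying list is modified.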
