Currently, my code looks like this:
private List<Node> dirtyNodes = new List<Node>();

public void UpdateDirtyNodes()
{
    while (dirtyNodes.Count > 0)
    {
        Node nodeToUpdate = dirtyNodes[0];
        nodeToUpdate.UpdateNode();
        dirtyNodes.Remove(nodeToUpdate);
    }
}

public void MarkNodeDirty(Node node)
{
    if (!dirtyNodes.Contains(node))
    {
        dirtyNodes.Add(node);
    }
}

public void MarkNodeClean(Node node)
{
    dirtyNodes.Remove(node);
}
This is a performance-critical part of the code, and it's slower than I'd like because dirtyNodes.Contains has to iterate over the entire list in most cases. I'd like to replace the List with a HashSet because it should be faster, but I can't figure out how to make that work with UpdateDirtyNodes().
The difficulty is that UpdateNode() can add or remove nodes from dirtyNodes at any time, hence the slightly awkward while loop. Is there a way I can get the "first" value from a HashSet? Order doesn't matter, I just need to stay in the while loop until dirtyNodes is empty, updating whatever node comes next.
I would prefer to avoid using LINQ, since this code will be part of a library and I don't want to force its users to include LINQ.
How can I do this?
Turns out it's very easy to just use the enumerator directly:
public void UpdateDirtyNodes()
{
    while (dirtyNodes.Count > 0)
    {
        using (HashSet<Node>.Enumerator enumerator = dirtyNodes.GetEnumerator())
        {
            if (enumerator.MoveNext())
            {
                Node nodeToUpdate = enumerator.Current;
                nodeToUpdate.UpdateNode();
                dirtyNodes.Remove(nodeToUpdate);
            }
        }
    }
}

public void MarkNodeDirty(Node node)
{
    dirtyNodes.Add(node);
}
I originally tried something similar but didn't fully understand how to manually use enumerators and it didn't work.
It's significantly faster than the List (overall frame time is ~25-50% better depending on the situation, just from that one change), so I'm very happy.
Add a bool dirty field inside the Node class. This is in addition to keeping a hash set. Then MarkNodeClean() doesn't need to remove nodes from the HashSet, shaving off some CPU cycles.
If you feel that adding a field inside the Node class is too "dirty" (pun intended), then make a HashSet<(Node, bool)> instead of HashSet<Node>, but you lose some performance to allocating and garbage-collecting the extra objects, which is not ideal since your code is performance-critical.
UpdateDirtyNodes() will take nodes one at a time until the HashSet is empty. After taking each node, it will look at the boolean flag to decide whether the node is actually dirty.
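A rough sketch of how that flag-plus-HashSet approach might look. Note the Node members here (Dirty, Updates) are invented for illustration; the Updates counter just stands in for whatever UpdateNode() really does:

```csharp
using System;
using System.Collections.Generic;

var updater = new NodeUpdater();
var a = new Node();
var b = new Node();
updater.MarkNodeDirty(a);
updater.MarkNodeDirty(b);
updater.MarkNodeClean(b);   // b stays in the set but is skipped later
updater.UpdateDirtyNodes();
Console.WriteLine($"a updated {a.Updates} time(s), b updated {b.Updates} time(s)");

public class Node
{
    public bool Dirty;
    public int Updates;               // stands in for the real UpdateNode() work
    public void UpdateNode() => Updates++;
}

public class NodeUpdater
{
    private readonly HashSet<Node> dirtyNodes = new HashSet<Node>();

    public void MarkNodeDirty(Node node)
    {
        node.Dirty = true;
        dirtyNodes.Add(node);         // Add is already a no-op for duplicates
    }

    public void MarkNodeClean(Node node)
    {
        node.Dirty = false;           // no HashSet.Remove needed: just flip the flag
    }

    public void UpdateDirtyNodes()
    {
        while (dirtyNodes.Count > 0)
        {
            Node next = null;
            using (var e = dirtyNodes.GetEnumerator())
            {
                if (e.MoveNext())
                    next = e.Current;
            }
            dirtyNodes.Remove(next);
            if (next.Dirty)           // skip nodes marked clean in the meantime
            {
                next.Dirty = false;
                next.UpdateNode();
            }
        }
    }
}
```

The trade-off is that stale entries linger in the set until the next UpdateDirtyNodes() pass, which is usually cheaper than paying for Remove on every MarkNodeClean.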
P.S.
You should remove dirtyNodes.Clear(); from UpdateDirtyNodes(). This is a race condition. If a node is added after the while loop finds that dirtyNodes.Count is 0, then dirtyNodes.Clear(); clears out that node without processing it. This is a separate bug, not related to your question.
Related
I'm facing an issue with some simple C# code which I would easily fix in C/C++.
I guess I'm missing something.
I want to do the following (modifying items in a list -- in place):
//pseudocode
void modify<T>(List<T> a) {
foreach(var item in a) {
if(condition(item)) {
item = somethingElse;
}
}
}
I understand that foreach treats the collection it loops over as immutable, so the code above can't work.
I therefore tried the following :
void modify<T>(List<T> a) {
using (var sequenceEnum = a.GetEnumerator())
{
while (sequenceEnum.MoveNext())
{
var m = sequenceEnum.Current;
if(condition(m)) {
sequenceEnum.Current = somethingElse;
}
}
}
}
I naively thought the enumerator was some kind of pointer to my element. Apparently enumerators are read-only as well.
In C++ I would write something like that:
template<typename T>
struct Node {
    T* value;
    Node* next;
};
being then able to modify *value without touching anything in Node and therefore in the parent collection:
Node<T>* current = a->head;
while(current != nullptr) {
    if(condition(current->value))
        current->value = ...;
    current = current->next;
}
Do I really have to resort to unsafe code?
Or am I stuck with the awkwardness of calling the subscript operator inside the loop?
You could also use a simple for loop.
void modify<T>(List<T> a)
{
for (int i = 0; i < a.Count; i++)
{
if (condition(a[i]))
{
a[i] = default(T);
}
}
}
In short - do not modify lists while iterating. You can achieve the desired effect with
a = a.Select(x => <some logic> ? x : default(T)).ToList()
In general, lists in C# are immutable during iteration. You can however use .RemoveAll or similar methods.
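A quick sketch of the .RemoveAll route mentioned above (the even-number predicate is just for illustration):

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// RemoveAll mutates the list in place and is safe,
// unlike calling Remove inside a foreach over the same list.
numbers.RemoveAll(n => n % 2 == 0);

Console.WriteLine(string.Join(", ", numbers)); // 1, 3, 5
```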
As described in the documentation here, System.Collections.Generic.List<T> is the generic counterpart of System.Collections.ArrayList and has O(1) complexity for indexed access. Much like C++'s std::vector<>, the cost of inserting or adding elements varies, but access is constant time (complexity-wise; caching and so on are another matter).
The equivalent of your C++ code snippet would be LinkedList<T>.
As for the immutability of your collection during iteration, it is clearly stated in the documentation of GetEnumerator method here. Indeed, during enumeration (within a foreach, or using IEnumerator.MoveNext directly):
Enumerators can be used to read the data in the collection, but they
cannot be used to modify the underlying collection.
Moreover, modifying the list will invalidate the enumerator and usually throw an exception:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
I believe this consistency of the interface contract between the various types of collections leads to your misunderstanding: it would be possible to implement a mutable list, but it is not required by the interface contract.
Imagine you wanted to implement a list that is mutable during enumeration. Would the enumerator hold a reference to (or a way to retrieve) the entry, or the entry itself? The entry itself would make it immutable; a reference would be invalidated when inserting elements into a linked list, for example.
The simple for loop proposed by @Igor seems to be the best way to go if you want to use the standard collections library. Otherwise, you may need to reimplement it yourself.
Use something like this:
List<T> GetModified<T>(List<T> list, Func<T, bool> condition, Func<T> replacement)
{
    return list.Select(m => condition(m) ? m : replacement()).ToList();
}
Usage:
originalList = GetModified(originalList, i => i.IsAwesome(), () => null);
But this can also get you into trouble with cross-thread operations. Try to use immutable instances where possible, especially with IEnumerable.
If you really really want to modify the instance of list:
//if you ever want to also remove items, this is magic (why I iterate backwards)
for (int i = list.Count - 1; i >= 0; i--)
{
if (condition)
{
list[i] = whateverYouWant;
}
}
Within a class I have a property, used by one method, which I want to remain in the same state after a call to a second method (which might alter that state).
Example: for a property Value I could do something like this:
void MethodOne()
{
...
var tempValue = this.Value;
MethodTwo(); // might modify this.Value
this.Value = tempValue;
...
}
For a single property this isn't a big deal. If I have multiple properties it gets uglier.
I'm looking for a C# solution but would be interested to know if this kind of construct appears in any common language. The sort of syntax I'm after might look something like this:
void MethodOne()
{
...
preserving(this.Value)
{
MethodTwo(); // might modify this.Value
}
...
}
where the preserving keyword could potentially accept multiple properties/fields.
In my specific case it's a recursive method, so the code looks more like:
void MethodOne(object[] args)
{
...
// Do something which might modify this.Value
preserving(this.Value)
{
MethodOne(args);
}
...
}
Is there an accepted pattern / best practice to achieve this?
EDIT
The specific case for which I'm asking is something like this:
For the purposes of sorting lists I have a custom comparison class which implements IComparer. Its Compare method acts on objects which appear in collections (which may therefore be sorted). These collections might be nested, so sorting such a collection might result in the sort function, and therefore Compare(), being called recursively.
The actual comparison function is partially dynamic, which means that it could be set at runtime to something invalid (e.g. non-transitive or non-deterministic). I can't prevent this, so I want to set a limit on the number of comparisons (let's say n-squared, where n is the length of the list being sorted) to protect against cases where an invalid comparison function might result in the sorting algorithm going into an infinite loop.
The Compare method might be called from (e.g.) various LINQ methods such as OrderBy, possibly resulting in lazily evaluated sorts and possibly from code over which I have no control. However, I need to count the number of comparisons in each sort without any 'subsorts' of nested objects corrupting the count (but also counting comparisons in those subsorts).
My code looks something like this:
public int Compare(T x, T y)
{
// this.MaxComparisons is set from outside this code, since this method does not know the length of the list it is sorting.
if (++this.ComparisonCount > this.MaxComparisons)
{
// Error: too many comparisons
}
if (predicate)
{
// Preserve...
var tempComparisonCount = this.ComparisonCount;
var tempMaxComparisons = this.MaxComparisons;
// ...reset...
this.ComparisonCount = 0;
this.MaxComparisons = ... ; // set as required
var result = this.customComparer.Compare(x.Child, y.Child); // might involve further calls to the above method, which should be counted separately
// ...and restore
this.ComparisonCount = tempComparisonCount;
this.MaxComparisons = tempMaxComparisons;
return result;
}
else
{
return otherComparer.Compare(x, y);
}
}
I hope this makes it clearer why I have asked the question.
private static void Preserving<T>(ref T value, Action act)
{
T old = value;
act();
value = old;
}
then you can do:
Preserving(ref this.Value, MethodTwo);
(Note that Value must be a field for this to compile; you can't pass a property by ref.)
If you have multiple variables you want to save and restore, you should probably create a Context class containing the state you want to save and then push/pop them from a stack.
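A rough sketch of that push/pop idea, shaped around the comparison counters from the question (the class and member names are illustrative, not from the original code):

```csharp
using System;
using System.Collections.Generic;

var comparer = new CountingComparer { ComparisonCount = 5, MaxComparisons = 100 };

comparer.PushContext(newMax: 50);   // entering a nested sort: reset the counters
comparer.ComparisonCount = 7;       // ...the inner sort does its own counting...
comparer.PopContext();              // leaving: the outer counters come back intact

Console.WriteLine($"{comparer.ComparisonCount}/{comparer.MaxComparisons}"); // 5/100

public class CountingComparer
{
    // Each stack entry is one saved context; a stack handles recursion naturally.
    private readonly Stack<(int Count, int Max)> saved = new Stack<(int, int)>();

    public int ComparisonCount;
    public int MaxComparisons;

    public void PushContext(int newMax)
    {
        saved.Push((ComparisonCount, MaxComparisons));
        ComparisonCount = 0;
        MaxComparisons = newMax;
    }

    public void PopContext()
    {
        (ComparisonCount, MaxComparisons) = saved.Pop();
    }
}
```

Because the saved state lives on a stack, arbitrarily deep recursive Compare calls each restore exactly the context they pushed.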
I am reading this blog: Pipes and filters pattern
I am confused by this code snippet:
public class Pipeline<T>
{
private readonly List<IOperation<T>> operations = new List<IOperation<T>>();
public Pipeline<T> Register(IOperation<T> operation)
{
operations.Add(operation);
return this;
}
public void Execute()
{
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
}
}
What is the purpose of this statement: while (enumerator.MoveNext());? It seems like a no-op.
First consider this:
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
This code appears to be creating nested enumerables, each of which takes elements from the previous, applies some operation to them, and passes the result to the next. But it only constructs the enumerables. Nothing actually happens yet. It's just ready to go, stored in the variable current. There are lots of ways to implement IOperation.Execute but it could be something like this.
IEnumerable<T> Execute(IEnumerable<T> ts)
{
foreach (T t in ts)
yield return this.operation(t); // Perform some operation on t.
}
Another option suggested in the article is a sort:
IEnumerable<T> Execute(IEnumerable<T> ts)
{
// Thank-you LINQ!
// This was 10 lines of non-LINQ code in the original article.
return ts.OrderBy(t => t.Foo);
}
Now look at this:
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
This actually causes the chain of operations to be performed. When the elements are requested from the enumeration, it causes elements from the original enumerable to be passed through the chain of IOperations, each of which performs some operation on them. The end result is discarded so only the side-effect of the operation is interesting - such as writing to the console or logging to a file. This would have been a simpler way to write the last two lines:
foreach (T t in current) {}
Another thing to observe is that the initial list that starts the process is an empty list so for this to make sense some instances of T have to be created inside the first operation. In the article this is done by asking the user for input from the console.
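To make that concrete, here is a hedged sketch of such a "source" operation wired into the article's pipeline shape. The IOperation<T> interface is reproduced from the snippet above; NumberSource and Doubler are invented stand-ins (the article reads from the console instead):

```csharp
using System;
using System.Collections.Generic;

var pipeline = new List<IOperation<int>> { new NumberSource(), new Doubler() };

// Build the chain exactly as the article does: nothing executes yet.
IEnumerable<int> current = new List<int>();
foreach (var op in pipeline)
    current = op.Execute(current);

// Draining the chain is what finally triggers the work.
var results = new List<int>();
foreach (var v in current)
    results.Add(v);
Console.WriteLine(string.Join(", ", results)); // 2, 4, 6

public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// A "source" operation: ignores the (empty) incoming sequence
// and produces values of its own.
public class NumberSource : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> ignored)
    {
        for (int i = 1; i <= 3; i++)
            yield return i;
    }
}

// A downstream operation that does work as values flow past.
public class Doubler : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        foreach (var i in input)
            yield return i * 2;
    }
}
```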
In this case, the while (enumerator.MoveNext()); is simply evaluating all the items that are returned by the final IOperation<T>. It looks a little confusing, but the empty List<T> is only created in order to supply a value to the first IOperation<T>.
In many collections this would do exactly nothing, as you suggest, but given that we are talking about the pipes and filters pattern, it is likely that the final value is some sort of iterator that will cause code to be executed. It could be something like this, for example (assuming T is an integer):
public class WriteToConsoleOperation : IOperation<int>
{
public IEnumerable<int> Execute(IEnumerable<int> ints)
{
foreach (var i in ints)
{
Console.WriteLine(i);
yield return i;
}
}
}
So calling MoveNext() for each item on the IEnumerator<int> returned by this iterator will return each of the values (which are ignored in the while loop) but also output each of the values to the console.
Does that make sense?
while (enumerator.MoveNext());
Inside the current block of code, there is no effect (it just moves through all the items in the enumeration); the displayed code doesn't act on the current element. What might be happening is that MoveNext() is moving to the next element and doing something to the objects in the collection (updating an internal value, pulling the next row from the database, etc.). Since the type is List<T> this is probably not the case, but in other instances it could be.
According to the requirements, we have to return a collection either in reverse order or as it is. We, as beginning-level programmers, designed the collection as follows (a sample is given):
namespace Linqfying
{
class linqy
{
static void Main()
{
InvestigationReport rpt=new InvestigationReport();
// rpt.GetDocuments(true) refers
// to return the collection in reverse order
foreach( EnquiryDocument doc in rpt.GetDocuments(true) )
{
// printing document title and author name
}
}
}
class EnquiryDocument
{
string _docTitle;
string _docAuthor;
// properties to get and set doc title and author name goes below
public EnquiryDocument(string title,string author)
{
_docAuthor = author;
_docTitle = title;
}
public EnquiryDocument(){}
}
class InvestigationReport
{
EnquiryDocument[] docs=new EnquiryDocument[3];
public IEnumerable<EnquiryDocument> GetDocuments(bool IsReverseOrder)
{
/* some business logic to retrieve the document
docs[0]=new EnquiryDocument("FundAbuse","Margon");
docs[1]=new EnquiryDocument("Sexual Harassment","Philliphe");
docs[2]=new EnquiryDocument("Missing Resource","Goel");
*/
//if reverse order is preferred
if(IsReverseOrder)
{
for (int i = docs.Length; i != 0; i--)
yield return docs[i-1];
}
else
{
foreach (EnquiryDocument doc in docs)
{
yield return doc;
}
}
}
}
}
Questions:
Can we use another collection type to improve efficiency?
Would mixing collections with LINQ reduce the code? (We are not familiar with LINQ.)
Looks fine to me. Yes, you could use the Reverse extension method... but that won't be as efficient as what you've got.
How much do you care about the efficiency though? I'd go with the most readable solution (namely Reverse) until you know that efficiency is a problem. Unless the collection is large, it's unlikely to be an issue.
If you've got the "raw data" as an array, then your use of an iterator block will be more efficient than calling Reverse. The Reverse method will buffer up all the data before yielding it one item at a time - just like your own code does, really. However, simply calling Reverse would be a lot simpler...
Aside from anything else, I'd say it's well worth you learning LINQ - at least LINQ to Objects. It can make processing data much, much cleaner than before.
Two questions:
Does the code you currently have work?
Have you identified this piece of code as being your performance bottleneck?
If the answer to either of those questions is no, don't worry about it. Just make it work and move on. There's nothing grossly wrong about the code, so no need to fret! Spend your time building new functionality instead. Save LINQ for a new problem you haven't already solved.
This task seems pretty straightforward: I'd just use the Reverse method on a generic List.
This should already be well-optimized.
Your GetDocuments method has a return type of IEnumerable, so there is no need to even loop over your array when IsReverseOrder is false; you can just return it as-is, since the Array type is IEnumerable...
As for when IsReverseOrder is true, you can use either Array.Reverse or the LINQ Reverse() extension method to reduce the amount of code.
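A sketch of what that simplification might look like, using a generic stand-in rather than the asker's exact types (note this does pull in LINQ, which the question wanted to avoid learning but which this answer recommends):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Arrays already implement IEnumerable<T>, so the forward case needs no loop at all.
static IEnumerable<T> GetItems<T>(T[] items, bool reverseOrder)
{
    if (!reverseOrder)
        return items;             // hand the array back as-is
    return items.Reverse();       // LINQ buffers the array, then yields back-to-front
}

var docs = new[] { "FundAbuse", "Sexual Harassment", "Missing Resource" };
Console.WriteLine(string.Join(" | ", GetItems(docs, true)));
```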
Whenever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yielding, because I feel the overhead of maintaining the state of the yielding method doesn't buy me much. In almost all cases where I am returning a collection, I feel that 90% of the time the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.
I do understand its usefulness in LINQ, but I feel that only the LINQ team is writing such complex queryable objects that yield is useful.
Has anyone written anything like (or not like) LINQ where yield was useful?
Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.
Take, for example, a filter iterator:
IEnumerable<T> Filter<T>(this IEnumerable<T> coll, Func<T, bool> func)
{
    foreach (T t in coll)
        if (func(t)) yield return t;
}
Now, you can chain this:
MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)
Your method would be creating (and tossing) three lists. My method iterates over the collection just once.
Also, when you return a collection, you are forcing a particular implementation on your users. An iterator is more generic.
I do understand its usefulness in LINQ, but I feel that only the LINQ team is writing such complex queryable objects that yield is useful.
Yield was useful as soon as it got implemented in .NET 2.0, which was long before anyone ever thought of LINQ.
Why would I write this function:
IList<string> LoadStuff() {
var ret = new List<string>();
foreach(var x in SomeExternalResource)
ret.Add(x);
return ret;
}
When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:
IEnumerable<string> LoadStuff() {
foreach(var x in SomeExternalResource)
yield return x;
}
It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.
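A small sketch making that early-exit saving visible. The loads counter is invented purely so we can observe how much work actually happened:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int loads = 0;   // counts how many items were actually produced

IEnumerable<int> LoadStuff()
{
    for (int i = 0; i < 1000; i++)
    {
        loads++;          // stands in for expensive per-item work (I/O, parsing, ...)
        yield return i;
    }
}

// The consumer only wants the first 5 elements, so only 5 "loads" ever happen.
foreach (var x in LoadStuff().Take(5)) { }
Console.WriteLine(loads);   // 5, not 1000
```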
I could go on and on....
I recently had to make a representation of mathematical expressions in the form of an Expression class. When evaluating the expression I have to traverse the tree structure with a post-order treewalk. To achieve this I implemented IEnumerable<T> like this:
public IEnumerator<Expression<T>> GetEnumerator()
{
if (IsLeaf)
{
yield return this;
}
else
{
foreach (Expression<T> expr in LeftExpression)
{
yield return expr;
}
foreach (Expression<T> expr in RightExpression)
{
yield return expr;
}
yield return this;
}
}
Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.
At a previous company, I found myself writing loops like this:
for (DateTime date = schedule.StartDate; date <= schedule.EndDate;
date = date.AddDays(1))
With a very simple iterator block, I was able to change this to:
foreach (DateTime date in schedule.DateRange)
It made the code a lot easier to read, IMO.
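The iterator block behind that property might look something like this (the Schedule shape is assumed; the original code isn't shown):

```csharp
using System;
using System.Collections.Generic;

var schedule = new Schedule
{
    StartDate = new DateTime(2009, 1, 1),
    EndDate = new DateTime(2009, 1, 3),
};

foreach (DateTime date in schedule.DateRange)
    Console.WriteLine(date.ToString("yyyy-MM-dd"));

public class Schedule
{
    public DateTime StartDate;
    public DateTime EndDate;

    // Yields each day from StartDate through EndDate inclusive.
    public IEnumerable<DateTime> DateRange
    {
        get
        {
            for (DateTime d = StartDate; d <= EndDate; d = d.AddDays(1))
                yield return d;
        }
    }
}
```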
Yield was developed for C# 2 (before LINQ in C# 3).
We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.
Collections are great any time you have a few elements that you're going to hit multiple times.
However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.
This is essentially what the SqlDataReader does - it's a forward only custom enumerator.
What yield lets you do is quickly and with minimal code write your own custom enumerators.
Everything yield does could be done in C#1 - it just took reams of code to do it.
Linq really maximises the value of the yield behaviour, but it certainly isn't the only application.
Whenever your function returns IEnumerable, you should consider yielding - and not only in .NET 3.0 and later.
A .NET 2.0 example:
public static class FuncUtils
{
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
...
public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
{
foreach (T el in e)
if (filterFunc(el))
yield return el;
}
public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
{
foreach (T el in e)
yield return mapFunc(el);
}
...
I'm not sure about C#'s implementation of yield, but in dynamic languages it's far more efficient than creating the whole collection. In many cases it makes it easy to work with data sets much bigger than RAM.
I am a huge yield fan in C#. This is especially true in large homegrown frameworks, where methods or properties often return a List that is a subset of another IEnumerable. The benefits that I see are:
the return value of a method that uses yield is immutable
you are only iterating over the list once
it is lazily executed, meaning the code to produce the values is not run until needed (though this can bite you if you don't know what you're doing)
if the source list changes, you don't have to call again to get another IEnumerable; you just iterate over the IEnumerable again
many more
One other HUGE benefit of yield is when your method could potentially return millions of values - so many that you risk running out of memory just building the List before the method can even return it. With yield, the method can create and return millions of values, as long as the caller doesn't store every value. So it's good for large-scale data processing / aggregating operations.
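A sketch of that streaming style (the ten-million figure is arbitrary; the point is that no ten-million-element List ever exists):

```csharp
using System;
using System.Collections.Generic;

// Streams values one at a time; the full sequence is never materialized.
IEnumerable<long> ManyValues()
{
    for (long i = 0; i < 10_000_000; i++)
        yield return i;
}

long sum = 0;
foreach (var v in ManyValues())
    sum += v;          // aggregate as values flow past; peak memory stays tiny

Console.WriteLine(sum); // 49999995000000
```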
Personally, I haven't found myself using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime), where you have async and concurrency issues.
Anyway, still trying to get my head around it as well.
Yield is useful because it saves you space. Most optimizations in programming make a trade-off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.
Consider this example:
static IEnumerable<Person> GetAllPeople()
{
return new List<Person>()
{
new Person() { Name = "George", Surname = "Bush", City = "Washington" },
new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
new Person() { Name = "Joe", Surname = "Average", City = "New York" }
};
}
static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people, string where)
{
foreach (var person in people)
{
if (person.City == where) yield return person;
}
yield break;
}
static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
foreach (var person in people)
{
if (person.Name.StartsWith(initial)) yield return person;
}
yield break;
}
static void Main(string[] args)
{
var people = GetAllPeople();
foreach (var p in people.GetPeopleFrom("Washington"))
{
// do something with washingtonites
}
foreach (var p in people.GetPeopleWithInitial("G"))
{
// do something with people with initial G
}
foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
{
// etc
}
}
(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)
As you can see, if you have a lot of these "filter" methods (but it can be any kind of method that does some work on a list of people) you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.
The first side-effect of yield is that it delays execution of the filtering logic until you actually require it. If you therefore create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space which is a powerful and free optimization.
The other side-effect is that yield operates on the lowest common collection interface (IEnumerable<>) which enables the creation of library-like code with wide applicability.
Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumerable is not done until the element is actually requested. This gives you the power to do a couple of different things. One is that you can yield an infinitely long list without needing to actually make infinite calculations. Second, you can return an enumeration of function applications; the functions will only be applied as you iterate through the list.
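For instance, here is a sketch of the infinite-sequence case (Fibonacci is just a familiar example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An infinite sequence: each element is computed only when the consumer asks for it.
IEnumerable<long> Fibonacci()
{
    long a = 0, b = 1;
    while (true)
    {
        yield return a;
        (a, b) = (b, a + b);
    }
}

// Take(8) forces exactly eight computations; the "infinite" tail is never touched.
Console.WriteLine(string.Join(", ", Fibonacci().Take(8))); // 0, 1, 1, 2, 3, 5, 8, 13
```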
I've used yield in non-LINQ code for things like this (assuming the functions do not live in the same class):
public IEnumerable<string> GetData()
{
foreach(String name in _someInternalDataCollection)
{
yield return name;
}
}
...
public void DoSomething()
{
foreach(String value in GetData())
{
//... Do something with value that doesn't modify _someInternalDataCollection
}
}
You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over, though, or it will throw an exception.
Yield is very useful in general. It exists in Ruby, among other languages that support functional-style programming, so it's not as if it's tied to LINQ; it's more the other way around: LINQ is functional in style, so it uses yield.
I had a problem where my program was using a lot of CPU in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event-based argument), and still be able to break the functions up if they took too much CPU. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)
The System.Linq IEnumerable extensions are great, but sometimes you want more. For example, consider the following extension:
public static class CollectionSampling
{
public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
{
var rand = new Random();
using (var enumerator = coll.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return enumerator.Current;
int currentSample = rand.Next(max);
for (int i = 1; i <= currentSample; i++)
enumerator.MoveNext();
}
}
}
}
Another interesting advantage of yielding is that the caller cannot cast the return value to the original collection type and modify your internal collection.
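A sketch of that difference (the Holder class and its members are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

var holder = new Holder();

// Returning the list directly lets a caller cast it back and mutate internal state:
((List<int>)holder.Exposed()).Add(99);

// The yield-based version returns a compiler-generated iterator, not the list itself,
// so the same cast would throw InvalidCastException at runtime:
bool castable = holder.Guarded() is List<int>;
Console.WriteLine($"count after cast-and-add: {holder.Count}, guarded castable: {castable}");

public class Holder
{
    private readonly List<int> data = new List<int> { 1, 2, 3 };

    public int Count => data.Count;

    public IEnumerable<int> Exposed() => data;   // hands out the internal list

    public IEnumerable<int> Guarded()
    {
        foreach (var x in data)
            yield return x;                      // shields the list behind an iterator
    }
}
```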