Find the closest DateTime key in Dictionary<DateTime, double> - c#

I have the Date portion of a DateTime as a lookup value and would like to retrieve the matching value from a Dictionary<DateTime, double>. Note that the DateTime keys are stored as only the Date portion.
My problem is that there may be no key that matches my lookup value. In that case I would like to find the nearest previous DateTime.Date key and its matching value.
Now, I am aware that Dictionaries are not sorted by key. I could use a SortedDictionary, but I prefer to use a Dictionary for a specific reason, or else switch to a List collection (which can be pre-sorted). My question is what you would recommend in this case: would it be more efficient to retain the Dictionary structure and decrement my lookup value until I find a matching key, or would it be better to use a List collection and LINQ? Each Dictionary contains around 5000 key/value pairs. Also, please note that I am looking for a highly computationally efficient solution because the frequency of lookups is quite high (potentially many hundreds of thousands of lookups, and each lookup is guaranteed to be different from any previous value).

Since you need it fast, I think the best thing would be to use the results of a BinarySearch. This requires a List<T> that's sorted.
int result = myList.BinarySearch(targetDate);
if (result >= 0)
return myList[result];
else
{
int nextLarger = ~result;
// return next smaller, or if that doesn't exist, the smallest one
return myList[Math.Max(0, nextLarger - 1)];
}
It should be possible to create a class that combines a Dictionary<TKey,TValue> and a sorted List<TKey> that still serializes like a Dictionary<TKey,TValue>. The serialization might be as simple (in Json.NET) as putting a [JsonConverter(typeof(KeyValuePairConverter))] on your class.
Just for completeness and in case others read this in the future, if speed wasn't very important, you can do it more simply with something like this:
var result = myDict.Keys.Where(x => x <= targetDate).Max();
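As a rough sketch of the "Dictionary plus sorted keys" idea, the class below copies a dictionary into two parallel arrays and does a floor lookup with Array.BinarySearch. The names DateSeries and TryGetOnOrBefore are made up for illustration, and unlike the snippet above it reports failure instead of falling back to the smallest key when nothing lies on or before the lookup date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public sealed class DateSeries
{
    private readonly DateTime[] _dates;   // sorted ascending
    private readonly double[] _values;    // _values[i] belongs to _dates[i]

    public DateSeries(IDictionary<DateTime, double> source)
    {
        _dates = source.Keys.OrderBy(d => d).ToArray();
        _values = _dates.Select(d => source[d]).ToArray();
    }

    // Finds the value stored for the given date or, failing that, for the
    // nearest earlier date. Returns false when no such key exists.
    public bool TryGetOnOrBefore(DateTime date, out double value)
    {
        int index = Array.BinarySearch(_dates, date.Date);
        if (index < 0)
        {
            // ~index is the position of the first larger key; step back one.
            index = ~index - 1;
        }
        if (index < 0)
        {
            value = default(double);
            return false;
        }
        value = _values[index];
        return true;
    }
}
```

The arrays are built once, so each lookup costs O(log n) comparisons and allocates nothing. This assumes the data does not change between lookups.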

I would use a custom structure and collection to store this information:
public struct DateValue
{
public DateValue(DateTime date, double val)
: this()
{
this.Date = date;
this.Value = val;
}
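public DateTime Date { get; set; }
// the constructor assigns Value, so the property has to be declared
public double Value { get; set; }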
}
Here is a possible implementation of a collection that holds all DateValues and encapsulates the logic to return the nearest. It uses List.BinarySearch to find it. If it doesn't find a direct match it uses the logic of BinarySearch to detect the nearest which is:
The index of the specified value in the specified array, if value is
found. If value is not found and value is less than one or more
elements in array, a negative number which is the bitwise complement
of the index of the first element that is larger than value. If value
is not found and value is greater than any of the elements in array, a
negative number which is the bitwise complement of (the index of the
last element plus 1).
public class DateValueCollection : List<DateValue>, IComparer<DateValue>
{
public DateValueCollection() { }
public DateValueCollection(IEnumerable<DateValue> dateValues, bool isOrdered)
{
if (isOrdered)
base.AddRange(dateValues);
else
base.AddRange(dateValues.OrderBy(dv => dv.Date));
}
public DateValue GetNearest(DateTime date)
{
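if (base.Count == 0)
return default(DateValue);
// a single element is trivially the nearest; the neighbour checks below need at least two
if (base.Count == 1)
return base[0];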
DateValue dv = new DateValue(date, 0);
int index = base.BinarySearch(dv, this);
if (index >= 0)
{
return base[index];
}
// If not found, List.BinarySearch returns the complement of the index
index = ~index;
DateValue[] all;
if(index >= base.Count - 1)
{
// proposed index is last, check previous and last
all = new[] { base[base.Count - 1], base[base.Count - 2] };
}
else if(index == 0)
{
// proposed index is first, check first and second
all = new[] { base[index], base[index + 1] };
}
else
{
// return nearest DateValue from previous and this
var thisDV = base[index];
var prevDV = base[index - 1];
all = new[]{ thisDV, prevDV };
}
return all.OrderBy(x => (x.Date - date).Duration()).First();
}
public int Compare(DateValue x, DateValue y)
{
return x.Date.CompareTo(y.Date);
}
}
Quick test:
var dateVals = new[] {
new DateValue(DateTime.Today.AddDays(10), 1), new DateValue(DateTime.Today, 3), new DateValue(DateTime.Today.AddDays(4), 7)
};
var dvCollection = new DateValueCollection(dateVals, false);
DateValue nearest = dvCollection.GetNearest(DateTime.Today.AddDays(1));

Why worry about optimisation prematurely?
Do it.
Then, and only then, if it's slow, you have a problem.
Measure it with a profiler.
That is where understanding starts; at that point you can try other approaches and profile them.
The answer is: if you do it any way at all and there isn't a performance problem, you have just saved time and can do something else useful to add value to your day.
Premature optimization is not only pointless; you will usually be completely wrong about where you should be looking.

Related

Looking for something like a HashSet, but with a range of values for the key?

I'm wondering if there is something like HashSet, but keyed by a range of values.
For example, we could add an item which is keyed to all integers between 100 and 4000. This item would be returned if we used any key between 100 and 4000, e.g. 287.
I would like the lookup speed to be quite close to HashSet, i.e. O(1). It would be possible to implement this using a binary search, but this would be too slow for the requirements. I would like to use standard .NET API calls as much as possible.
Update
This is interesting: https://github.com/mbuchetics/RangeTree
It has a time complexity of O(log(N)) where N is number of intervals, so it's not exactly O(1), but it could be used to build a working implementation.
I don't believe there's a structure for it already. You could implement something like a RangedDictionary:
class RangedDictionary {
private Dictionary<Range, int> _set = new Dictionary<Range, int>();
public void Add(Range r, int key) {
_set.Add(r, key);
}
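public int Get(int key) {
//find a range that includes that key and return _set[range] (linear scan)
foreach (var pair in _set) {
if (pair.Key.Begin <= key && key <= pair.Key.End) return pair.Value;
}
throw new KeyNotFoundException();
}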
}
struct Range {
public int Begin;
public int End;
//override GetHashCode() and Equals() methods so that you can index a Dictionary by Range
}
EDIT: changed from HashSet to Dictionary
Here is a solution you can try out. However it assumes some points :
No range overlaps
When you request for a number, it is effectively inside a range (no error check)
From what you said, this one is O(N), but you can make it O(log(N)) with little effort I think.
The idea is that a class will handle the range thing, it will basically convert any value given to it to its range's lower boundary. This way your Hashtable (here a Dictionary) contains the low boundaries as keys.
public class Range
{
//We store all the ranges we have
private static List<int> ranges = new List<int>();
public int value { get; set; }
public static void CreateRange(int RangeStart, int RangeStop)
{
ranges.Add(RangeStart);
ranges.Sort();
}
public Range(int value)
{
int previous = ranges[0];
//Here we will find the range and give it the low boundary
//This is a very simple foreach loop but you can make it better
foreach (int item in ranges)
{
if (item > value)
{
break;
}
previous = item;
}
this.value = previous;
}
public override int GetHashCode()
{
return value;
}
}
Here is to test it.
class Program
{
static void Main(string[] args)
{
Dictionary<int, int> myRangedDic = new Dictionary<int,int>();
Range.CreateRange(10, 20);
Range.CreateRange(50, 100);
myRangedDic.Add(new Range(15).value, 1000);
myRangedDic.Add(new Range(75).value, 5000);
Console.WriteLine("searching for 16 : {0}", myRangedDic[new Range(16).value].ToString());
Console.WriteLine("searching for 64 : {0}", myRangedDic[new Range(64).value].ToString());
Console.ReadLine();
}
}
I don't believe you really can go below O(Log(N)) because there is no way for you to know immediately in which range a number is, you must always compare it with a lower (or upper) bound.
If you had predetermined ranges, that would have been easier to do. i.e. if your ranges are every hundreds, it is really easy to find the correct range of any number by calculating it modulo 100, but here we can assume nothing, so we must check.
To go down to Log(N) with this solution, just replace the foreach with a loop that will look at the middle of the array, then split it in two every iteration...
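A minimal sketch of that replacement, assuming the range starts are kept in a sorted, non-empty List<int> as in the Range class above. The helper name FindLowerBoundary is hypothetical; it leans on List<T>.BinarySearch rather than a hand-written halving loop.

```csharp
using System;
using System.Collections.Generic;

public static class RangeLookup
{
    // Returns the start of the range that 'value' falls into.
    // Assumes sortedStarts is sorted ascending and not empty.
    public static int FindLowerBoundary(List<int> sortedStarts, int value)
    {
        int index = sortedStarts.BinarySearch(value);
        if (index < 0)
        {
            // ~index is the position of the first start greater than value,
            // so the range containing value begins one slot earlier.
            index = ~index - 1;
        }
        // Values below the first start are clamped to it, which mirrors
        // what the foreach loop does.
        return sortedStarts[Math.Max(0, index)];
    }
}
```

Like the loop it replaces, this does no check against the upper bound of a range, so a value that falls in a gap between two ranges is still mapped to the range below it.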

C# data structure, list which can dynamically resize up to a given limit, and allows fast access to any index

I'm implementing a memory system for an AI agent. It needs to have an internal list of state transitions which is capped at some number, say 10000.
If at capacity, adding a new memory should automatically remove the oldest memory.
Importantly, I also need to be able to quickly access any item in this list.
A wrapper for Queue at first seemed obvious, but Queue does not allow fast access of any element. (O(n))
Similarly, remove an item from the beginning of a List structure takes O(n).
LinkedLists allow fast additions and removals, but again do not allow quick access to every index.
An array would allow random access but obviously it's not dynamically resizeable and deletion is problematic.
I've seen a HashMap being suggested but I'm unsure how that might be implemented.
Suggestions?
If you want the queue to be a fixed length, you could use a circular buffer which enables O(1) enqueue, dequeue and indexing operations and automatically overwrites old entries when the queue is full.
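For illustration, here is a minimal circular buffer sketch along those lines. The class name and members are assumptions made for this example, not an existing .NET type; index 0 is taken to mean the oldest surviving item.

```csharp
using System;

public sealed class CircularBuffer<T>
{
    private readonly T[] _items;
    private int _start;   // position of the oldest item
    private int _count;   // number of items currently stored

    public CircularBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    public int Count { get { return _count; } }

    // Adds an item in O(1); once the buffer is full the oldest item is overwritten.
    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            _items[_start] = item;
            _start = (_start + 1) % _items.Length;
        }
        else
        {
            _items[(_start + _count) % _items.Length] = item;
            _count++;
        }
    }

    // O(1) access by age: index 0 is the oldest item, Count - 1 the newest.
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[(_start + index) % _items.Length];
        }
    }
}
```

With a capacity of 10000 this would cover the memory use case from the question: Add never shifts elements, and any stored transition can be read by index.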
Try using a Dictionary with a LinkedList. The keys of the Dictionary are the indexes of the LinkedList nodes and the values of the Dictionary are of type LinkedListNode; that is, the LinkedList nodes.
The Dictionary would give you almost an O(1) on its operations and removing/adding LinkedListNode(s) to the beginning or end of a LinkedList is of O(1) as well.
Another alternative is to use a HashTable. However, in this case you have to know the capacity of the table beforehand (See Hashtable.Add Method) in order to get the O(1) performance:
If Count is less than the capacity of the Hashtable, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
In the first solution, no matter what the capacity of the LinkedList or the Dictionary is, you would still get almost O(1) from both the Dictionary and the LinkedList. Of course, an add or remove inside your memory class takes three or four constant-time steps, depending on how many operations you perform on the Dictionary and the LinkedList, but that is still O(1) overall. Search access is always O(1) because you will be using only the Dictionary.
HashMap is a Java type; the closest C# equivalent is Dictionary (see "C# Java HashMap equivalent"). But I wouldn't say that this is the ultimate answer.
If you implement it as Dictionary, which key == the content, then you can search the content with O(1). However, you cannot have same key. Also, because it is not ordered, you may not know which the 1st content is.
If you implement it as Dictionary, which key == index, and value == the content, searching for the content still takes O(n) because you don't know the location of content.
A List or an Array will cost O(1) if you search the content by index reference. So, please double check your statement that it takes O(n)
If searching by index is sufficient, then the circular array/buffer that @Lee mentioned is good enough.
Otherwise, similar to DB, you might want to maintain in 2 separate data: 1 for storing the data (Circular Array) and the other one for search (Hash).
EDIT: @Lee has it right. A circular buffer seems to give you what you want. Answer left in place though.
I think the data structure you want might be a priority queue -- it depends on what you mean by 'quickly access any item'. If you mean 'able to enumerate all items in O(N)', then a priority queue fits the bill. If you mean 'enumerate the list in historical order', then it won't.
Assuming you need these operations;
add an item and associate with a time
remove the oldest item
enumerate all existing items in arbitrary order
Then you could easily extend this priority queue implementation I wrote a little while ago.
You'll want implement IEnumerable as a loop through the T[] data array from 0 to cursor. This will give you your enumeration.
Implement a GetItem(i) function which returns this.data[i] so long as i <= cursor.
Implement an automatic size limit by putting this into the Push() method;
if (queue.Size >= 10000) {
queue.Pop();
}
I think this is O(ln n) for push and pop, and O(N) to enumerate ALL items, or O(i) to find ANY item, so long as you don't need them in order.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Mindfire.DataStructures
{
public class PiorityQueue<T>
{
private int[] priorities;
private T[] data;
private int cursor;
private int capacity;
public int Size
{
get
{
return cursor+1;
}
}
public PiorityQueue(int capacity)
{
this.cursor = -1;
this.capacity = capacity;
this.priorities = new int[this.capacity];
this.data = new T[this.capacity];
}
public T Pop()
{
if (this.Size == 0)
{
throw new InvalidOperationException($"The {this.GetType().Name} is Empty");
}
var result = this.data[0];
this.data[0] = this.data[cursor];
this.priorities[0] = this.priorities[cursor];
this.cursor--;
var loc = 0;
while (true)
{
// children of loc in a 0-based binary heap
var l = loc * 2 + 1;
var r = loc * 2 + 2;
// find the largest priority among loc and its children
var largest = loc;
if (l <= cursor && this.priorities[largest] < this.priorities[l])
{
largest = l;
}
if (r <= cursor && this.priorities[largest] < this.priorities[r])
{
largest = r;
}
if (largest == loc)
{
break;
}
Swap(loc, largest);
loc = largest;
}
return result;
}
public void Push(int priority, T v)
{
this.cursor++;
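if (this.cursor == this.capacity)
{
Resize(this.capacity * 2);
}
this.data[this.cursor] = v;
this.priorities[this.cursor] = priority;
// sift up: keep swapping with the parent while the parent has a lower priority
var loc = this.cursor;
while (loc > 0)
{
var parent = (loc - 1) / 2;
if (this.priorities[parent] >= this.priorities[loc])
{
break;
}
this.Swap(parent, loc);
loc = parent;
}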
}
private void Swap(int a, int b)
{
if (a == b) { return; }
var data = this.data[b];
var priority = this.priorities[b];
this.data[b] = this.data[a];
this.priorities[b] = this.priorities[a];
this.priorities[a] = priority;
this.data[a] = data;
}
private void Resize(int newCapacity)
{
var newPriorities = new int[newCapacity];
var newData = new T[newCapacity];
this.priorities.CopyTo(newPriorities, 0);
this.data.CopyTo(newData, 0);
this.data = newData;
this.priorities = newPriorities;
this.capacity = newCapacity;
}
public PiorityQueue() : this(1)
{
}
public T Peek()
{
if (this.cursor >= 0)
{
return this.data[0];
}
else
{
return default(T);
}
}
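public void Push(T item, int priority)
{
// convenience overload with the arguments swapped; forwards to Push(int, T)
this.Push(priority, item);
}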
}
}

Quickest way to compare two generic lists for differences

What is the quickest (and least resource intensive) way to compare two massive lists (>50,000 items) and as a result have two lists like the ones below:
items that show up in the first list but not in the second
items that show up in the second list but not in the first
Currently I'm working with List or IReadOnlyCollection and solve this with a LINQ query:
var onlyInFirst = list.Where(i => !list2.Contains(i)).ToList();
var onlyInSecond = list2.Where(i => !list.Contains(i)).ToList();
But this doesn't perform as well as I would like.
Any ideas for making this quicker and less resource intensive, as I need to process a lot of lists?
Use Except:
var firstNotSecond = list1.Except(list2).ToList();
var secondNotFirst = list2.Except(list1).ToList();
I suspect there are approaches which would actually be marginally faster than this, but even this will be vastly faster than your O(N * M) approach.
If you want to combine these, you could create a method with the above and then a return statement:
return !firstNotSecond.Any() && !secondNotFirst.Any();
One point to note is that there is a difference in results between the original code in the question and the solution here: any duplicate elements which are only in one list will only be reported once with my code, whereas they'd be reported as many times as they occur in the original code.
For example, with lists of [1, 2, 2, 2, 3] and [1], the "elements in list1 but not list2" result in the original code would be [2, 2, 2, 3]. With my code it would just be [2, 3]. In many cases that won't be an issue, but it's worth being aware of.
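To make the difference concrete, here is a small sketch that puts both behaviours side by side. The helper names are hypothetical; the second method keeps duplicates like the Where/Contains code in the question, but looks items up in a HashSet so it avoids the O(N * M) cost.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DifferenceDemo
{
    // Set semantics: each missing value is reported once.
    public static List<int> DistinctDifference(List<int> first, List<int> second)
    {
        return first.Except(second).ToList();
    }

    // Keeps every occurrence from 'first' that is absent from 'second'.
    public static List<int> DifferenceKeepingDuplicates(List<int> first, List<int> second)
    {
        var lookup = new HashSet<int>(second);
        return first.Where(x => !lookup.Contains(x)).ToList();
    }
}
```

With the lists from the example, DistinctDifference gives [2, 3] and DifferenceKeepingDuplicates gives [2, 2, 2, 3].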
Enumerable.SequenceEqual Method
Determines whether two sequences are equal according to an equality comparer.
MS.Docs
Enumerable.SequenceEqual(list1, list2);
This works for all primitive data types. If you need to use it on custom objects you need to implement IEqualityComparer
IEqualityComparer Interface
Defines methods to support the comparison of objects for equality.
MS.Docs for IEqualityComparer
More efficient would be using Enumerable.Except:
var inListButNotInList2 = list.Except(list2);
var inList2ButNotInList = list2.Except(list);
This method is implemented by using deferred execution. That means you could write for example:
var first10 = inListButNotInList2.Take(10);
It is also efficient since it internally uses a Set<T> to compare the objects. It works by first collecting all distinct values from the second sequence, and then streaming the results of the first, checking that they haven't been seen before.
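The description above can be sketched roughly as follows. This is not the framework's actual source, only an illustration of the same idea with a plain HashSet<T> and the default equality comparer; the name ExceptSketch is made up.

```csharp
using System.Collections.Generic;

public static class ExceptIllustration
{
    public static IEnumerable<T> ExceptSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Collect all distinct values of the second sequence up front.
        var seen = new HashSet<T>(second);
        foreach (var item in first)
        {
            // Add returns false for values that were in 'second' or were
            // already yielded, so each result appears at most once.
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }
}
```

Because it is an iterator method, nothing runs until the result is enumerated, which is the deferred execution mentioned above.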
If you want the results to be case insensitive, the following will work:
List<string> list1 = new List<string> { "a.dll", "b1.dll" };
List<string> list2 = new List<string> { "A.dll", "b2.dll" };
var firstNotSecond = list1.Except(list2, StringComparer.OrdinalIgnoreCase).ToList();
var secondNotFirst = list2.Except(list1, StringComparer.OrdinalIgnoreCase).ToList();
firstNotSecond would contain b1.dll
secondNotFirst would contain b2.dll
using System; // needed for IEquatable<T>
using System.Collections.Generic;
using System.Linq;
namespace YourProject.Extensions
{
public static class ListExtensions
{
public static bool SetwiseEquivalentTo<T>(this List<T> list, List<T> other)
where T: IEquatable<T>
{
if (list.Except(other).Any())
return false;
if (other.Except(list).Any())
return false;
return true;
}
}
}
Sometimes you only need to know if two lists are different, and not what those differences are. In that case, consider adding this extension method to your project. Note that your listed objects should implement IEquatable!
Usage:
public sealed class Car : IEquatable<Car>
{
public Price Price { get; }
public List<Component> Components { get; }
...
public override bool Equals(object obj)
=> obj is Car other && Equals(other);
public bool Equals(Car other)
=> Price == other.Price
&& Components.SetwiseEquivalentTo(other.Components);
public override int GetHashCode()
=> Components.Aggregate(
Price.GetHashCode(),
(code, next) => code ^ next.GetHashCode()); // Bitwise XOR
}
Whatever the Component class is, the methods shown here for Car should be implemented almost identically.
It's very important to note how we've written GetHashCode. In order to properly implement IEquatable, Equals and GetHashCode must operate on the instance's properties in a logically compatible way.
Two lists with the same contents are still different objects, and will produce different hash codes. Since we want these two lists to be treated as equal, we must let GetHashCode produce the same value for each of them. We can accomplish this by delegating the hashcode to every element in the list, and using the standard bitwise XOR to combine them all. XOR is order-agnostic, so it doesn't matter if the lists are sorted differently. It only matters that they contain nothing but equivalent members.
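One caveat, offered as a sketch rather than a correction: XOR cancels pairs of equal values, so a list containing the same element twice would hash differently from a list containing it once, even though a set-wise comparison treats them as equal. Hashing only the distinct elements avoids that. The helper below is hypothetical and assumes the elements are non-null.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HashDemo
{
    // XOR-combines the hash codes of the distinct elements, so neither the
    // order nor the number of repetitions affects the result.
    public static int SetwiseHash<T>(IEnumerable<T> items)
    {
        return items.Distinct().Aggregate(0, (code, next) => code ^ next.GetHashCode());
    }
}
```

Here SetwiseHash(new[] { "a", "b", "c" }) equals SetwiseHash(new[] { "c", "a", "b" }), and a list holding "x" twice hashes the same as a list holding it once.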
Note: the strange name is to imply the fact that the method does not consider the order of the elements in the list. If you do care about the order of the elements in the list, this method is not for you!
try this way:
var difList = list1.Where(a => !list2.Any(a1 => a1.id == a.id))
.Union(list2.Where(a => !list1.Any(a1 => a1.id == a.id)));
Not for this problem, but here's some code to compare lists of equal (but not identical) objects:
public class EquatableList<T> : List<T>, IEquatable<EquatableList<T>> where T : IEquatable<T>
{
/// <summary>
/// True, if this contains element with equal property-values
/// </summary>
/// <param name="element">element of Type T</param>
/// <returns>True, if this contains element</returns>
public new Boolean Contains(T element)
{
return this.Any(t => t.Equals(element));
}
/// <summary>
/// True, if list is equal to this
/// </summary>
/// <param name="list">list</param>
/// <returns>True, if instance equals list</returns>
public Boolean Equals(EquatableList<T> list)
{
if (list == null) return false;
return this.All(list.Contains) && list.All(this.Contains);
}
}
If only combined result needed, this will work too:
var set1 = new HashSet<T>(list1);
var set2 = new HashSet<T>(list2);
var areEqual = set1.SetEquals(set2);
where T is type of lists element.
While Jon Skeet's answer is excellent advice for everyday practice with a small to moderate number of elements (up to a few million), it is nevertheless not the fastest approach and not very resource efficient. An obvious drawback is that getting the full difference requires two passes over the data (even three if the elements that are equal are of interest as well). Clearly, this can be avoided by a customized reimplementation of the Except method, but it remains that the creation of a hash set requires a lot of memory and the computation of hashes requires time.
For very large data sets (in the billions of elements) it usually pays off to consider the particular circumstances. Here are a few ideas that might provide some inspiration:
If the elements can be compared (which is almost always the case in practice), then sorting the lists and applying the following zip approach is worth consideration:
/// <returns>The elements of the specified (ascendingly) sorted enumerations that are
/// contained only in one of them, together with an indicator,
/// whether the element is contained in the reference enumeration (-1)
/// or in the difference enumeration (+1).</returns>
public static IEnumerable<Tuple<T, int>> FindDifferences<T>(IEnumerable<T> sortedReferenceObjects,
IEnumerable<T> sortedDifferenceObjects, IComparer<T> comparer)
{
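var refs = sortedReferenceObjects.GetEnumerator();
var diffs = sortedDifferenceObjects.GetEnumerator();
bool hasRef = refs.MoveNext();
bool hasDiff = diffs.MoveNext();
while (hasRef && hasDiff)
{
int comparison = comparer.Compare(refs.Current, diffs.Current);
if (comparison == 0)
{
// insert code that emits the current element if equal elements should be kept
hasRef = refs.MoveNext();
hasDiff = diffs.MoveNext();
}
else if (comparison < 0)
{
yield return Tuple.Create(refs.Current, -1);
hasRef = refs.MoveNext();
}
else
{
yield return Tuple.Create(diffs.Current, 1);
hasDiff = diffs.MoveNext();
}
}
// one sequence is exhausted: whatever is left in the other is a difference
while (hasRef)
{
yield return Tuple.Create(refs.Current, -1);
hasRef = refs.MoveNext();
}
while (hasDiff)
{
yield return Tuple.Create(diffs.Current, 1);
hasDiff = diffs.MoveNext();
}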
}
This can e.g. be used in the following way:
const int N = <Large number>;
const int omit1 = 231567;
const int omit2 = 589932;
IEnumerable<int> numberSequence1 = Enumerable.Range(0, N).Select(i => i < omit1 ? i : i + 1);
IEnumerable<int> numberSequence2 = Enumerable.Range(0, N).Select(i => i < omit2 ? i : i + 1);
var numberDiffs = FindDifferences(numberSequence1, numberSequence2, Comparer<int>.Default);
Benchmarking on my computer gave the following result for N = 1M:
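| Method   | Mean      | Error    | StdDev   | Ratio | Gen 0     | Gen 1     | Gen 2     | Allocated  |
|----------|-----------|----------|----------|-------|-----------|-----------|-----------|------------|
| DiffLinq | 115.19 ms | 0.656 ms | 0.582 ms | 1.00  | 2800.0000 | 2800.0000 | 2800.0000 | 67110744 B |
| DiffZip  | 23.48 ms  | 0.018 ms | 0.015 ms | 0.20  | -         | -         | -         | 720 B      |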
And for N = 100M:
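| Method   | Mean     | Error    | StdDev   | Ratio | Gen 0      | Gen 1      | Gen 2      | Allocated    |
|----------|----------|----------|----------|-------|------------|------------|------------|--------------|
| DiffLinq | 12.146 s | 0.0427 s | 0.0379 s | 1.00  | 13000.0000 | 13000.0000 | 13000.0000 | 8589937032 B |
| DiffZip  | 2.324 s  | 0.0019 s | 0.0018 s | 0.19  | -          | -          | -          | 720 B        |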
Note that this example of course benefits from the fact that the lists are already sorted and integers can be very efficiently compared. But this is exactly the point: If you do have favourable circumstances, make sure that you exploit them.
A few further comments: The speed of the comparison function is clearly relevant for the overall performance, so it may be beneficial to optimize it. The flexibility to do so is a benefit of the zipping approach. Furthermore, parallelization seems more feasible to me, although by no means easy and maybe not worth the effort and the overhead. Nevertheless, a simple way to speed up the process by roughly a factor of 2 is to split each list into two halves (if that can be done efficiently) and compare the parts in parallel, one processing from front to back and the other in reverse order.
I have used this code to compare two lists that each hold millions of records.
This method does not take much time:
//Method to compare two list of string
private List<string> Contains(List<string> list1, List<string> list2)
{
List<string> result = new List<string>();
result.AddRange(list1.Except(list2, StringComparer.OrdinalIgnoreCase));
result.AddRange(list2.Except(list1, StringComparer.OrdinalIgnoreCase));
return result;
}
I compared 3 different methods for comparing different data sets. Tests below create a string collection of all the numbers from 0 to length - 1, then another collection with the same range, but with even numbers. I then pick out the odd numbers from the first collection.
Using Linq Except
public void TestExcept()
{
WriteLine($"Except {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newArray = new string[length/2];
int j = 0;
for (int i = 0; i < length; i+=2)
{
newArray[j++] = i.ToString();
}
dateTime = DateTime.Now;
Write("Count of items: ");
WriteLine(array.Except(newArray).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
Except 2021-08-14 11:43:03 AM
Populate set processing time: 00:00:03.7230479
2021-08-14 11:43:09 AM
Count of items: 10000000
Count processing time: 00:00:02.9720879
Using HashSet.Add
public void TestHashSet()
{
WriteLine($"HashSet {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var hashSet = new HashSet<string>();
for (int i = 0; i < length; i++)
{
hashSet.Add(i.ToString());
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newHashSet = new HashSet<string>();
for (int i = 0; i < length; i+=2)
{
newHashSet.Add(i.ToString());
}
dateTime = DateTime.Now;
Write("Count of items: ");
// HashSet Add returns true if item is added successfully (not previously existing)
WriteLine(hashSet.Where(s => newHashSet.Add(s)).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
HashSet 2021-08-14 11:42:43 AM
Populate set processing time: 00:00:05.6000625
Count of items: 10000000
Count processing time: 00:00:01.7703057
Special HashSet test:
public void TestLoadingHashSet()
{
int length = 20000000;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
var dateTime = DateTime.Now;
var hashSet = new HashSet<string>(array);
Write("Time to load hashset: ");
WriteLine(DateTime.Now - dateTime);
}
> TestLoadingHashSet()
Time to load hashset: 00:00:01.1918160
Using .Contains
public void TestContains()
{
WriteLine($"Contains {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newArray = new string[length/2];
int j = 0;
for (int i = 0; i < length; i+=2)
{
newArray[j++] = i.ToString();
}
dateTime = DateTime.Now;
WriteLine(dateTime);
Write("Count of items: ");
WriteLine(array.Where(a => !newArray.Contains(a)).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
Contains 2021-08-14 11:19:44 AM
Populate set processing time: 00:00:03.1046998
2021-08-14 11:19:49 AM
Count of items: Hosting process exited with exit code 1.
(Didn't complete. Killed it after 14 minutes)
Conclusion:
Linq Except ran approximately 1 second slower on my device than using HashSets (n=20,000,000).
Using Where and Contains ran for a very long time
Closing remarks on HashSets:
Unique data
Make sure to override GetHashCode (correctly) for class types
May need up to 2x the memory if you make a copy of the data set, depending on implementation
HashSet is optimized for cloning other HashSets using the IEnumerable constructor, but it is slower to convert other collections to HashSets (see special test above)
First approach:
if (list1 != null && list2 != null && list1.Select(x => list2.SingleOrDefault(y => y.propertyToCompare == x.propertyToCompare && y.anotherPropertyToCompare == x.anotherPropertyToCompare) != null).All(x => x))
return true;
Second approach if you are ok with duplicate values:
if (list1 != null && list2 != null && list1.Select(x => list2.Any(y => y.propertyToCompare == x.propertyToCompare && y.anotherPropertyToCompare == x.anotherPropertyToCompare)).All(x => x))
return true;
Both Jon Skeet's and miguelmpn's answers are good. It depends on whether the order of the list elements is important or not:
// take order into account
bool areEqual1 = Enumerable.SequenceEqual(list1, list2);
// ignore order
bool areEqual2 = !list1.Except(list2).Any() && !list2.Except(list1).Any();
One line:
var list1 = new List<int> { 1, 2, 3 };
var list2 = new List<int> { 1, 2, 3, 4 };
if (list1.Except(list2).Count() + list2.Except(list1).Count() == 0)
Console.WriteLine("same sets");
I wrote a generic function for comparing two lists.
public static class ListTools
{
public enum RecordUpdateStatus
{
Added = 1,
Updated = 2,
Deleted = 3
}
public class UpdateStatu<T>
{
public T CurrentValue { get; set; }
public RecordUpdateStatus UpdateStatus { get; set; }
}
public static List<UpdateStatu<T>> CompareList<T>(List<T> currentList, List<T> inList, string uniqPropertyName)
{
var res = new List<UpdateStatu<T>>();
res.AddRange(inList.Where(a => !currentList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Added,
}));
res.AddRange(currentList.Where(a => !inList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Deleted,
}));
res.AddRange(currentList.Where(a => inList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Updated,
}));
return res;
}
}
I think this is a simple and easy way to compare two lists element by element (Python):
x=[1,2,3,5,4,8,7,11,12,45,96,25]
y=[2,4,5,6,8,7,88,9,6,55,44,23]
tmp = []
# iterate over the indexes that both lists share
for i in range(min(len(x), len(y))):
    if x[i]>y[i]:
        tmp.append(1)
    else:
        tmp.append(0)
print(tmp)
Maybe it's funny, but this works for me:
string.Join("",List1) != string.Join("", List2)
This is the best solution you'll find
var list3 = list1.Where(l => list2.ToList().Contains(l));

get next available integer using LINQ

Say I have a list of integers:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
I would like to get the next available integer, ordered by increasing integer. Not the last or highest one, but in this case the next integer that is not in this list. In this case the number is 4.
Is there a LINQ statement that would give me this? As in:
var nextAvailable = myInts.SomeCoolLinqMethod();
Edit: Crap. I said the answer should be 2 but I meant 4. I apologize for that!
For example: Imagine that you are responsible for handing out process IDs. You want to get the list of current process IDs, and issue a next one, but the next one should not just be the highest value plus one. Rather, it should be the next one available from an ordered list of process IDs. You could get the next available starting with the highest, it does not really matter.
I see a lot of answers that write a custom extension method, but it is possible to solve this problem with the standard linq extension methods and the static Enumerable class:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
// This will set firstAvailable to 4.
int firstAvailable = Enumerable.Range(1, Int32.MaxValue).Except(myInts).First();
The answer provided by @Kevin has an undesirable performance profile. The logic will access the source sequence numerous times: once for the .Count call, once for the .First call, and once for each .Contains call. If the IEnumerable<int> instance is a deferred sequence, such as the result of a .Select call, this will cause at least 2 calculations of the sequence, along with once for each number. Even if you pass a list to the method, it will potentially go through the entire list for each checked number. Imagine running it on the sequence { 1, 1000000 } and you can see how it would not perform well.
LINQ strives to iterate source sequences no more than once. This is possible in general and can have a big impact on the performance of your code. Below is an extension method which will iterate the sequence exactly once. It does so by looking for the difference between each successive pair, then adds 1 to the first lower number which is more than 1 away from the next number:
public static int? FirstMissing(this IEnumerable<int> numbers)
{
int? priorNumber = null;
foreach(var number in numbers.OrderBy(n => n))
{
var difference = number - priorNumber;
if(difference != null && difference > 1)
{
return priorNumber + 1;
}
priorNumber = number;
}
return priorNumber == null ? (int?) null : priorNumber + 1;
}
Since this extension method can be called on any arbitrary sequence of integers, we make sure to order them before we iterate. We then calculate the difference between the current number and the prior number. If this is the first number in the list, priorNumber will be null and thus difference will be null. If this is not the first number in the list, we check whether the difference from the prior number is greater than 1. If so, we know there is a gap and we can add 1 to the prior number. A difference of 0 (a duplicate) or 1 (the next consecutive number) means there is no gap yet.
You can adjust the return statement to handle sequences with 0 or 1 items as you see fit; I chose to return null for empty sequences and n + 1 for the sequence { n }.
This will be fairly efficient:
static int Next(this IEnumerable<int> source)
{
int? last = null;
foreach (var next in source.OrderBy(_ => _))
{
// '>' rather than '!=' so that a duplicate value is not mistaken for a gap
if (last.HasValue && next > last.Value + 1)
{
return last.Value + 1;
}
last = next;
}
return last.HasValue ? last.Value + 1 : Int32.MaxValue;
}
public static class IntExtensions
{
public static int? SomeCoolLinqMethod(this IEnumerable<int> ints)
{
int counter = ints.Count() > 0 ? ints.First() : -1;
while (counter < int.MaxValue)
{
if (!ints.Contains(++counter)) return counter;
}
return null;
}
}
Usage:
var nextAvailable = myInts.SomeCoolLinqMethod();
Ok, here is the solution that I came up with that works for me.
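// Enumerable.Range takes (start, count), not (start, end), so the count is derived from Min and Max;
// the extra 1 lets the search reach Max + 1 when the list has no gaps.
var nextAvailableInteger = Enumerable.Range(myInts.Min(), myInts.Max() - myInts.Min() + 2).FirstOrDefault(r => !myInts.Contains(r));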
If anyone has a more elegant solution I would be happy to accept that one. But for now, this is what I'm putting in my code and moving on.
Edit: this is what I implemented after Kevin's suggestion to add an extension method. And that was the real answer - that no single LINQ extension would do so it makes more sense to add my own. That is really what I was looking for.
public static int NextAvailableInteger(this IEnumerable<int> ints)
{
return NextAvailableInteger(ints, 1); // by default we use one
}
public static int NextAvailableInteger(this IEnumerable<int> ints, int defaultValue)
{
if (ints == null || ints.Count() == 0) return defaultValue;
var ordered = ints.OrderBy(v => v);
int counter = ints.Min();
int max = ints.Max();
while (counter < max)
{
if (!ordered.Contains(++counter)) return counter;
}
return (++counter);
}
Not sure if this qualifies as a cool Linq method, but using the left outer join idea from This SO Answer
var thelist = new List<int> {1,2,3,4,5,100,101};
var nextAvailable = (from curr in thelist
join next in thelist
on curr + 1 equals next into g
from newlist in g.DefaultIfEmpty()
where !g.Any ()
orderby curr
select curr + 1).First();
This puts the processing on the sql server side if you're using Linq to Sql, and allows you to not have to pull the ID lists from the server to memory.
var nextAvailable = myInts.Prepend(0).TakeWhile((x,i) => x == i).Last() + 1;
It is 7 years later, but there are better ways of doing this than the selected answer or the answer with the most votes.
The list is already in order, and based on the example 0 doesn't count. We can just prepend 0 and check if each item matches its index. TakeWhile will stop evaluating once it hits a number that doesn't match, or at the end of the list.
The answer is the last item that matches, plus 1.
TakeWhile is more efficient than enumerating all the possible numbers and then excluding the existing numbers using Except, because TakeWhile will only go through the list until it finds the first available number, and the resulting Enumerable collection has at most n items.
The answer using Except generates an entire enumerable of answers that are not needed just to grab the first one. Linq can do some optimization with First(), but it is still much slower and more memory intensive than TakeWhile.
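The one-liner above relies on the list being sorted, free of duplicates, and made up of positive integers. If that cannot be guaranteed, a sketch along the following lines normalizes the input first. The names NextAvailable and FirstGap are made up for illustration, and the extra sorting gives up some of the speed advantage, so treat it as one possible variation rather than a drop-in replacement:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextAvailable
{
    // Hypothetical helper: smallest positive integer not present in 'ids'.
    // Assumption: only positive IDs matter, so zero and negatives are dropped.
    public static int FirstGap(IEnumerable<int> ids)
    {
        return ids.Where(x => x > 0)             // ignore values that cannot line up with an index
                  .Distinct()                    // duplicates would shift the index comparison
                  .OrderBy(x => x)               // the index trick needs ascending order
                  .Prepend(0)                    // so that value == index for 0, 1, 2, ...
                  .TakeWhile((x, i) => x == i)   // stop at the first value that skips ahead
                  .Last() + 1;
    }
}

public static class Program
{
    public static void Main()
    {
        var myInts = new List<int> { 5, 1, 3, 2, 2, 13 };
        Console.WriteLine(NextAvailable.FirstGap(myInts)); // prints 4
    }
}
```

An empty input yields 1 here, because only the prepended 0 matches its index.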

Access Enumerator within a foreach loop?

I have a List class, and I would like to override GetEnumerator() to return my own Enumerator class. This Enumerator class would have two additional properties that would be updated as the Enumerator is used.
For simplicity (this isn't the exact business case), let's say those properties were CurrentIndex and RunningTotal.
I could manage these properties within the foreach loop manually, but I would rather encapsulate this functionality for reuse, and the Enumerator seems to be the right spot.
The problem: foreach hides all the Enumerator business, so is there a way to, within a foreach statement, access the current Enumerator so I can retrieve my properties? Or would I have to forgo foreach, use a nasty old while loop, and manipulate the Enumerator myself?
Strictly speaking, I would say that if you want to do exactly what you're saying, then yes, you would need to call GetEnumerator and control the enumerator yourself with a while loop.
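For reference, driving the enumerator by hand could look roughly like the sketch below. TrackingEnumerator is a hypothetical class written for this illustration (it is not from the question), and it assumes the two extra properties are CurrentIndex and RunningTotal as described there:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator that wraps another one and tracks two extra properties.
public sealed class TrackingEnumerator : IEnumerator<decimal>
{
    private readonly IEnumerator<decimal> _inner;

    public TrackingEnumerator(IEnumerable<decimal> source)
    {
        _inner = source.GetEnumerator();
    }

    public int CurrentIndex { get; private set; } = -1;
    public decimal RunningTotal { get; private set; }

    public decimal Current => _inner.Current;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (!_inner.MoveNext()) return false;
        CurrentIndex++;
        RunningTotal += _inner.Current;
        return true;
    }

    // Note: enumerators produced by iterator methods do not support Reset.
    public void Reset()
    {
        _inner.Reset();
        CurrentIndex = -1;
        RunningTotal = 0M;
    }

    public void Dispose() => _inner.Dispose();
}

public static class Program
{
    public static void Main()
    {
        var payments = new List<decimal> { 10M, 20M, 30M };

        // 'using' disposes the enumerator, which foreach would otherwise do for you.
        using (var e = new TrackingEnumerator(payments))
        {
            while (e.MoveNext())
            {
                Console.WriteLine("Index: {0}, Value: {1}, Running Total: {2}",
                    e.CurrentIndex, e.Current, e.RunningTotal);
            }
        }
    }
}
```

The while loop is the price for seeing the enumerator: the extra properties are readable inside the loop body because the enumerator variable is in scope, which foreach never exposes.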
Without knowing too much about your business requirement, you might be able to take advantage of an iterator function, such as something like this:
public static IEnumerable<decimal> IgnoreSmallValues(List<decimal> list)
{
decimal runningTotal = 0M;
foreach (decimal value in list)
{
// if the value is less than 1% of the running total, then ignore it
if (runningTotal == 0M || value >= 0.01M * runningTotal)
{
runningTotal += value;
yield return value;
}
}
}
Then you can do this:
List<decimal> payments = new List<decimal>() {
123.45M,
234.56M,
.01M,
345.67M,
1.23M,
456.78M
};
foreach (decimal largePayment in IgnoreSmallValues(payments))
{
// handle the large payments so that I can divert all the small payments to my own bank account. Mwahaha!
}
Updated:
Ok, so here's a follow-up with what I've termed my "fishing hook" solution. Now, let me add a disclaimer that I can't really think of a good reason to do something this way, but your situation may differ.
The idea is that you simply create a "fishing hook" object (reference type) that you pass to your iterator function. The iterator function manipulates your fishing hook object, and since you still have a reference to it in your code outside, you have visibility into what's going on:
public class FishingHook
{
public int Index { get; set; }
public decimal RunningTotal { get; set; }
public Func<decimal, bool> Criteria { get; set; }
}
public static IEnumerable<decimal> FishingHookIteration(IEnumerable<decimal> list, FishingHook hook)
{
hook.Index = 0;
hook.RunningTotal = 0;
foreach(decimal value in list)
{
// the hook object may define a Criteria delegate that
// determines whether to skip the current value
if (hook.Criteria == null || hook.Criteria(value))
{
hook.RunningTotal += value;
yield return value;
hook.Index++;
}
}
}
You would utilize it like this:
List<decimal> payments = new List<decimal>() {
123.45M,
.01M,
345.67M,
234.56M,
1.23M,
456.78M
};
FishingHook hook = new FishingHook();
decimal min = 0;
hook.Criteria = x => x > min; // exclude any values that are less than/equal to the defined minimum
foreach (decimal value in FishingHookIteration(payments, hook))
{
// update the minimum
if (value > min) min = value;
Console.WriteLine("Index: {0}, Value: {1}, Running Total: {2}", hook.Index, value, hook.RunningTotal);
}
// Resulting output is:
//Index: 0, Value: 123.45, Running Total: 123.45
//Index: 1, Value: 345.67, Running Total: 469.12
//Index: 2, Value: 456.78, Running Total: 925.90
// we've skipped the values .01, 234.56, and 1.23
Essentially, the FishingHook object gives you some control over how the iterator executes. The impression I got from the question was that you needed some way to access the inner workings of the iterator so that you could manipulate how it iterates while you are in the middle of iterating, but if this is not the case, then this solution might be overkill for what you need.
With foreach you indeed can't get the enumerator - you could, however, have the enumerator return (yield) a tuple that includes that data; in fact, you could probably use LINQ to do it for you...
(I couldn't cleanly get the index using LINQ - can get the total and current value via Aggregate, though; so here's the tuple approach)
using System.Collections;
using System.Collections.Generic;
using System;
class MyTuple
{
public int Value {get;private set;}
public int Index { get; private set; }
public int RunningTotal { get; private set; }
public MyTuple(int value, int index, int runningTotal)
{
Value = value; Index = index; RunningTotal = runningTotal;
}
static IEnumerable<MyTuple> SomeMethod(IEnumerable<int> data)
{
int index = 0, total = 0;
foreach (int value in data)
{
yield return new MyTuple(value, index++,
total = total + value);
}
}
static void Main()
{
int[] data = { 1, 2, 3 };
foreach (var tuple in SomeMethod(data))
{
Console.WriteLine("{0}: {1} ; {2}", tuple.Index,
tuple.Value, tuple.RunningTotal);
}
}
}
You can also do something like this in a more functional way, depending on your requirements. What you are asking can be thought of as "zipping" together multiple sequences, and then iterating through them all at once. The three sequences for the example you gave would be:
The "value" sequence
The "index" sequence
The "Running Total" Sequence
The next step would be to specify each of these sequences separately:
List<decimal> ValueList;
var Indexes = Enumerable.Range(0, ValueList.Count);
The last one is more fun... the two methods I can think of are to either have a temporary variable used to sum up the sequence, or to recalculate the sum for each item. The second is obviously much less performant, so I would rather use the temporary:
decimal Sum = 0;
var RunningTotals = ValueList.Select(v => Sum = Sum + v);
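One thing to keep in mind with the captured temporary: Select is deferred, so the lambda runs again every time the sequence is enumerated, and the sum keeps growing from wherever it was left. The sketch below shows one way to sidestep that by materializing the totals once. RunningTotalDemo and Compute are hypothetical names used only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningTotalDemo
{
    // Hypothetical helper: returns the running totals as a materialized list.
    public static List<decimal> Compute(IEnumerable<decimal> values)
    {
        decimal sum = 0;
        // ToList forces a single enumeration, so 'sum' advances exactly once per item.
        return values.Select(v => sum = sum + v).ToList();
    }

    public static void Main()
    {
        var valueList = new List<decimal> { 1M, 2M, 3M };

        // Safe to enumerate as often as needed.
        List<decimal> totals = Compute(valueList);
        Console.WriteLine(string.Join(", ", totals)); // prints 1, 3, 6

        // The deferred form, by contrast, carries state between enumerations.
        decimal shared = 0;
        var deferred = valueList.Select(v => shared = shared + v);
        Console.WriteLine(deferred.Last()); // prints 6
        Console.WriteLine(deferred.Last()); // prints 12, because 'shared' was already 6
    }
}
```

If the running totals are only ever enumerated once, the deferred form is fine as it stands.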
The last step would be to zip these all together. .NET 4 and later have the Zip operator built in, so it will look like this:
var ZippedSequence = ValueList.Zip(Indexes, (value, index) => new {value, index}).Zip(RunningTotals, (temp, total) => new {temp.value, temp.index, total});
This obviously gets noisier the more things you try to zip together.
In the last link, there is source for implementing the Zip function yourself. It really is a simple little bit of code.
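For readers without that source at hand, a hand-rolled version might look roughly like the sketch below. It is an illustration only, not the code from the link. The method is named ZipWith (a made-up name) so it does not clash with the built-in Enumerable.Zip:

```csharp
using System;
using System.Collections.Generic;

public static class ZipSketch
{
    // Hypothetical stand-in for Zip: pairs items positionally and stops
    // as soon as either sequence runs out.
    public static IEnumerable<TResult> ZipWith<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> selector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            while (e1.MoveNext() && e2.MoveNext())
            {
                yield return selector(e1.Current, e2.Current);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var values = new List<decimal> { 10M, 20M, 30M };
        var indexes = new[] { 0, 1, 2 };

        foreach (var pair in values.ZipWith(indexes, (value, index) => new { value, index }))
        {
            Console.WriteLine("{0}: {1}", pair.index, pair.value);
        }
    }
}
```

Because it is an iterator method, nothing is enumerated until the result is consumed, which matches how the built-in operator behaves.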
