Order a list by id - C#

I'm making a web application using jQuery, C# and ASP.NET. In this application I insert an id and build it the following way:
the first two digits are the current day, the next two digits are the current month, the third pair of digits is the current year, and the final digit(s) are a consecutive number.
So if I generate the first id of the current day, the id will look like this:
DDMMYY+CONSECUTIVE NUMBER
I want to order all ids from oldest to newest based on the date in the id and the consecutive number.
How could I do this using LINQ?

If you don't want to store equivalent versions of the ids as strongly typed objects, I would write a comparer like the following (you can tweak it for your exact requirements and add validation if needed):
using System;
using System.Collections.Generic;
public class IdComparer : IComparer<string>
{
public int Compare(string x, string y)
{
int xYear = GetYear(x);
int yYear = GetYear(y);
if (xYear != yYear)
{
return xYear.CompareTo(yYear);
}
int xMonth = GetMonth(x);
int yMonth = GetMonth(y);
if (xMonth != yMonth)
{
return xMonth.CompareTo(yMonth);
}
int xDay = GetDay(x);
int yDay = GetDay(y);
if (xDay != yDay)
{
return xDay.CompareTo(yDay);
}
int xUniqueId = GetUniqueIdentifier(x);
int yUniqueId = GetUniqueIdentifier(y);
if (xUniqueId != yUniqueId)
{
return xUniqueId.CompareTo(yUniqueId);
}
return 0;
}
private static int GetYear(string id)
{
return Int32.Parse(id.Substring(4, 2));
}
private static int GetMonth(string id)
{
return Int32.Parse(id.Substring(2, 2));
}
private static int GetDay(string id)
{
return Int32.Parse(id.Substring(0, 2));
}
private static int GetUniqueIdentifier(string id)
{
return Int32.Parse(id.Substring(6));
}
}
Then you just need to call idList.OrderBy(x => x, new IdComparer()); (add .ToList() if you need a list back).
This is cleaner and easier to read than trying to do everything in one LINQ statement, unless you have a specific requirement to do so.
The class can also be unit tested, and any issues or bugs would be easier to track down than in a long LINQ statement.
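If you would rather keep everything in one LINQ query anyway, the same year, month, day, counter rules can be chained with OrderBy/ThenBy. This is a minimal sketch, not part of the answer above: `SortIds` is a hypothetical helper, and it assumes every id is numeric, in DDMMYY + counter form, and at least 7 characters long (there is no validation).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: orders DDMMYY+counter ids from oldest to newest.
// Assumes every id is numeric and at least 7 characters long.
List<string> SortIds(IEnumerable<string> ids) =>
    ids.OrderBy(id => int.Parse(id.Substring(4, 2)))   // year
       .ThenBy(id => int.Parse(id.Substring(2, 2)))    // month
       .ThenBy(id => int.Parse(id.Substring(0, 2)))    // day
       .ThenBy(id => int.Parse(id.Substring(6)))       // consecutive number
       .ToList();

var sorted = SortIds(new[] { "14111901", "14101801", "13101801", "14111802", "14111801" });
Console.WriteLine(string.Join(", ", sorted));
// prints 13101801, 14101801, 14111801, 14111802, 14111901
```

Compared with the comparer class, this keeps the parsing inline but parses each id several times; for large lists the comparer (or a precomputed key) may be preferable.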

So the rules for comparing these ids are:
If the year of one id is greater than the other, then that id is considered greater.
If the years are equal but the month of one id is greater than the other, then that id is considered greater.
If the years and months are equal but the day of one id is greater than the other, then that id is considered greater.
If the years, months, and days are equal but the incrementing portion of one id is greater than the other, then that id is considered greater.
So now we just need to break the id value into 4 parts, after which we can do a proper comparison:
The year part, which is Substring(4, 2)
The month part, which is Substring(2, 2)
The day part, which is Substring(0, 2)
The incrementing number part, which is Substring(6)
Here's one way this could be done: create a class that implements IComparer<Item> (where Item is your class). Note: this answer assumes that your id column is a string, since you're creating it by concatenating different values together. If it's not a string, the arguments would be int values and would need to be converted to strings inside the method first:
using System.Collections.Generic;
using System.Linq; // for All(char.IsDigit)
public class ItemComparer : IComparer<Item>
{
public int Compare(Item x, Item y)
{
return Compare(x?.Id, y?.Id);
}
private int Compare(string first, string second)
{
// Rudimentary argument validation - this could be improved
// Null check - consider null as less than non-null
if (first == null) return second == null ? 0 : -1;
if (second == null) return 1;
// Number check - if fails, return string comparison
if (!first.All(char.IsDigit) || !second.All(char.IsDigit))
return first.CompareTo(second);
// Length check - if fails, return int comparison
if (first.Length < 7 || second.Length < 7)
return int.Parse(first).CompareTo(int.Parse(second));
// Compare years
var result = int.Parse(first.Substring(4, 2))
.CompareTo(int.Parse(second.Substring(4, 2)));
if (result != 0) return result;
// Compare months part
result = int.Parse(first.Substring(2, 2))
.CompareTo(int.Parse(second.Substring(2, 2)));
if (result != 0) return result;
// Compare days part
result = int.Parse(first.Substring(0, 2))
.CompareTo(int.Parse(second.Substring(0, 2)));
if (result != 0) return result;
// Compare incrementing number part
return int.Parse(first.Substring(6)).CompareTo(int.Parse(second.Substring(6)));
}
}
Now that we have this comparer in place, we can order a list of items by id. For example, let's start with a simple class:
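public class Item
{
public string Id { get; set; }
public string Value { get; set; }
// Override ToString so Console.WriteLine prints something readable
public override string ToString() => $"{Id}: {Value}";
}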
We can populate a list of these items and use our comparer class to order them:
public static void Main(string[] args)
{
var items = new List<Item>
{
// Second-oldest item: Oct. 14, 2018
new Item {Id = "14101801", Value = "Second"},
// Fifth-oldest (newest) item: Nov. 14, 2019
new Item {Id = "14111901", Value = "Fifth"},
// Oldest item: Oct. 13, 2018
new Item {Id = "13101801", Value = "First"},
// Fourth-oldest item: Nov. 14, 2018 (counter 02)
new Item {Id = "14111802", Value = "Fourth"},
// Third-oldest item: Nov. 14, 2018 (counter 01)
new Item {Id = "14111801", Value = "Third"},
};
// Order our items by Id:
items = items.OrderBy(i => i, new ItemComparer()).ToList();
// Output our results
items.ForEach(Console.WriteLine);
Console.WriteLine("\nDone! Press any key to exit...");
Console.ReadKey();
}
The items come out in the order First, Second, Third, Fourth, Fifth.

If you change your ID to be YYMMDD + the extra number (with the extra number zero-padded to a fixed width), LINQ's OrderBy(foo => foo.Id) will work exactly as you need, because YYMMDD strings sort in date order.
If you can't change the format of the ID, you will need to implement a comparer and pass it to OrderBy.
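One way to get that natural ordering without changing what is stored (an assumption on my part, not something the answer prescribes) is to rearrange each DDMMYY id into a YYMMDD key on the fly and order by that key with an ordinal comparison. `ToSortableKey` is a hypothetical name, and the sketch assumes all ids have the same length, i.e. a fixed-width, zero-padded counter.

```csharp
using System;
using System.Linq;

// Hypothetical helper: DDMMYY + counter  ->  YYMMDD + counter.
// Assumes all ids have the same length (fixed-width, zero-padded counter).
string ToSortableKey(string id) =>
    id.Substring(4, 2) + id.Substring(2, 2) + id.Substring(0, 2) + id.Substring(6);

var ids = new[] { "14111901", "14101801", "13101801", "14111802", "14111801" };

// Ordinal comparison avoids culture-specific string rules.
var ordered = ids.OrderBy(id => ToSortableKey(id), StringComparer.Ordinal).ToArray();
Console.WriteLine(string.Join(", ", ordered));
// prints 13101801, 14101801, 14111801, 14111802, 14111901
```

If the counter is not zero-padded, "9" would sort after "10" again, so in that case a comparer that parses the counter as an int is the safer choice.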

Related

How can I get a sorted `List<string>` based on an associated numeric index

I want to sort a list of strings by index, but it stops working after the 10th index: after calling Sort(), the 10th and later indexes end up right after the 1st index in the list.
I have tried the code below, but it's not working.
List<string> stringList = new List<string>();
foreach (ManagementObject disk in objectSearcher.Get() )
{
stringList.Add(string.Format("{0, -15} {1,-35} {2, -20}",
disk.GetPropertyValue("Index"),
disk.GetPropertyValue("Model"),
diskSize));
}
stringList.Sort();
In the above scenario, the code works fine for indexes 0-9, but later indexes are not sorted as expected.
Put your object into a class structure and work with that strong type as long as possible:
public class DiskInfo
{
private int index = 0;
private string model = String.Empty;
private ulong size = 0;
public int getIndex() { return index; }
public string getModel() { return model; }
public ulong getSize() { return size; }
public DiskInfo(int index, string model, ulong size)
{
this.index = index;
this.model = model;
this.size = size;
}
public override string ToString()
{
return string.Format("{0, -15} {1,-35} {2, -20}", index, model, size);
}
}
// ...
List<DiskInfo> lst = new List<DiskInfo>();
foreach (ManagementObject disk in objectSearcher.Get() )
{
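lst.Add(new DiskInfo(
Convert.ToInt32(disk.GetPropertyValue("Index")), // WMI returns a boxed value, so convert explicitly
(string)disk.GetPropertyValue("Model"),
diskSize // assumed to already be a ulong
));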
}
Adjust types as needed.
Then you can use simple LINQ to sort:
lst = lst.OrderBy(x => x.getIndex()).ToList();
On top of that, you get IDE support and compiler errors instead of trying to figure out why you get format exceptions etc. when mucking around with strings.
If your input data is not of the correct data type, convert it then and there.
For example, if index gets passed as a string:
string strIdx = "15";
lst.Add(new DiskInfo(int.Parse(strIdx), ...));
It's not working after 10th index.
That is because List<T>.Sort() uses the default string comparison. In string comparison "0" is less than "1", "1" is less than "11", and "12" is less than "2", etc., so it stops working after 10.
You can define a simple comparison function as below:
public static int Compare(string a, string b)
{
return int.Parse(a.Substring(0, 15)).CompareTo(int.Parse(b.Substring(0, 15)));
}
and then invoke it in sort method:
stringList.Sort(Compare);
This requires that the first 15 characters of every string can be parsed as an integer.
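Here is a self-contained run of that comparison with made-up model and size values in the question's layout; the strings are generated in reverse order so the sort has something to do. It relies on int.Parse accepting the trailing padding spaces, which the default NumberStyles.Integer allows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up test data in the question's layout: index left-aligned in 15 chars.
var stringList = Enumerable.Range(0, 12)
    .Select(i => string.Format("{0, -15} {1,-35} {2, -20}", i, "Model", "35GB"))
    .Reverse()                       // start out of order
    .ToList();

// Compare by the integer in the first 15 characters.
int Compare(string a, string b) =>
    int.Parse(a.Substring(0, 15)).CompareTo(int.Parse(b.Substring(0, 15)));

stringList.Sort(Compare);

var indexes = stringList.Select(s => int.Parse(s.Substring(0, 15))).ToList();
Console.WriteLine(string.Join(" ", indexes));   // prints 0 1 2 3 4 5 6 7 8 9 10 11
```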
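You are probably looking for the "logical sort order" seen in Windows Explorer. Below I have replaced the default string comparer with a comparer using that API, StrCmpLogicalW (from shlwapi.dll, so Windows only):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
class Program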
{
public sealed class NaturalStringComparer : IComparer<string>
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
public int Compare(string a, string b) => StrCmpLogicalW(a, b);
}
static void Main()
{
var stringList = new List<string>();
var index = 0;
while (index < 12)
{
stringList.Add($"{index,-15} {"Model",-35} {"35GB",-20}");
index++;
}
stringList.Sort(new NaturalStringComparer());
foreach (var s in stringList)
{
Console.WriteLine(s);
}
}
}
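StrCmpLogicalW only exists on Windows. If the code also has to run elsewhere, a rough managed substitute is to compare runs of ASCII digits by numeric value and everything else ordinally. This is a simplified sketch of natural sorting, not a port of StrCmpLogicalW (whose exact rules differ), and `NaturalCompare` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Only ASCII 0-9 count as digits here.
static bool IsDigit(char c) => c >= '0' && c <= '9';

// Simplified "natural" comparison: runs of digits compare by numeric value,
// all other characters compare ordinally. Not a port of StrCmpLogicalW.
static int NaturalCompare(string a, string b)
{
    int i = 0, j = 0;
    while (i < a.Length && j < b.Length)
    {
        if (IsDigit(a[i]) && IsDigit(b[j]))
        {
            int startA = i, startB = j;
            while (i < a.Length && IsDigit(a[i])) i++;
            while (j < b.Length && IsDigit(b[j])) j++;
            // Drop leading zeros; then a longer run is a larger number, and
            // equal-length runs compare digit by digit (no overflow risk).
            string numA = a.Substring(startA, i - startA).TrimStart('0');
            string numB = b.Substring(startB, j - startB).TrimStart('0');
            int cmp = numA.Length.CompareTo(numB.Length);
            if (cmp == 0) cmp = string.CompareOrdinal(numA, numB);
            if (cmp != 0) return cmp;
        }
        else
        {
            int cmp = a[i].CompareTo(b[j]);
            if (cmp != 0) return cmp;
            i++;
            j++;
        }
    }
    // Whichever string still has characters left sorts after the other.
    return (a.Length - i).CompareTo(b.Length - j);
}

var items = new List<string> { "10 Model", "2 Model", "1 Model", "11 Model" };
items.Sort(NaturalCompare);
Console.WriteLine(string.Join(" | ", items));   // prints 1 Model | 2 Model | 10 Model | 11 Model
```

Comparing digit runs as strings (length first, then ordinal) avoids parsing them into an int, so very long numbers cannot overflow.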
You seem to be deliberately left-aligning your index numbers, which means that the ascending string-sorted sequence of 1 through 12 would be 1, 10, 11, 12, 2, 3, 4, ...
Since you have the index value during the creation of the string, it would be wasteful to again parse the number out of the string in order to sort it. It would be better to retain the index and the string separately in a suitable data structure, sort by the index, and then project out just the string.
Updated for OP's new Question
Creating a custom POCO class (with or without an IComparable implementation) seems like overkill every time you need to sort an enumerable of related data by one of its properties.
Instead, you can easily build up a sortable anonymous class, struct or tuple containing the sortable integer and the concatenated string, then sort, then project out just the string. Either way, OP's GetPropertyValue method appears to return a weakly typed value such as object or string; the accepted answer wouldn't compile, as it needs to convert index to an int.
Here's value tuple solution:
var tuples = new List<(int index, string str)>();
foreach (ManagementObject disk in objectSearcher.Get() )
{
var indexValue = Convert.ToInt32(disk.GetPropertyValue("Index"));
tuples.Add((indexValue, string.Format("{0, -15} {1,-35} {2, -20}",
indexValue,
disk.GetPropertyValue("Model"),
diskSize)));
}
// Sort by index, and project out the assembled string.
var myList = tuples
.OrderBy(t => t.index)
.Select(t => t.str)
.ToList();
Original Answer, OP had a simple loop
What I've done below is keep a value tuple of the original string and the parsed integer value of its first 15 characters.
Note that this will break if there are non-numeric characters in the first 15 characters of your string.
// Test Data
var strings = Enumerable.Range(0, 12)
.Select(i => (string.Format("{0, -15} {1,-35} {2, -20}", i, "Model", "35GB")));
// Project out a tuple of (index, string)
var indexedTuples = strings.Select(s => (idx: int.Parse(s.Substring(0, 15)), str: s));
var sorted = indexedTuples.OrderBy(t => t.idx)
.Select(t => t.str);

Find the closest DateTime key in Dictionary<DateTime, double>

I have the Date portion of a DateTime as a lookup value and would like to retrieve the matching value in a Dictionary<DateTime, double>. Note that the DateTime keys are stored as only the Date portion.
My problem is that there may be no key that matches my lookup value. In that case I would like to find the nearest previous dateTime.Date key and its matching value.
Now, I am aware that Dictionaries are not sorted by key. I could use a SortedDictionary but prefer to use a Dictionary for a specific reason, or else switch to a List collection (which can be pre-sorted). My question is, what would you recommend in this case: would it be more efficient to retain the Dictionary structure and decrement my lookup value until I find a matching key, or would it be better to use a list collection and LINQ? Each Dictionary contains around 5000 key/value pairs. Also, please note I am looking for a highly computationally efficient solution, because lookups are very frequent (potentially many hundreds of thousands of times, and each lookup is guaranteed to be different from any previous value).
Since you need it fast, I think the best thing would be to use the results of a BinarySearch. This requires a List<T> that's sorted.
int result = myList.BinarySearch(targetDate);
if (result >= 0)
return myList[result];
else
{
int nextLarger = ~result;
// return next smaller, or if that doesn't exist, the smallest one
return myList[Math.Max(0, nextLarger - 1)];
}
It should be possible to create a class that combines a Dictionary<TKey,TValue> and a sorted List<TKey> that still serializes like a Dictionary<TKey,TValue>. The serialization might be as simple (in Json.NET) as putting a [JsonConverter(typeof(KeyValuePairConverter))] on your class.
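A minimal sketch of that idea, assuming the Dictionary stays the main store and a sorted copy of its keys is built once and reused: try the O(1) dictionary lookup first, and only fall back to List<T>.BinarySearch for the nearest earlier key. The data and the `TryGetFloor` name are made up, and unlike the BinarySearch snippet above it reports failure instead of returning the smallest key when the date precedes every key. The serialization part is not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data. The Dictionary stays the main store; a sorted copy of
// its keys is built once and reused for every lookup.
var values = new Dictionary<DateTime, double>
{
    [new DateTime(2020, 1, 1)] = 1.0,
    [new DateTime(2020, 1, 5)] = 5.0,
    [new DateTime(2020, 1, 10)] = 10.0,
};
var sortedKeys = values.Keys.OrderBy(d => d).ToList();

// Hypothetical helper: value for `date`, or for the nearest earlier key when
// there is no exact match. Returns false if `date` precedes every key.
bool TryGetFloor(DateTime date, out double value)
{
    if (values.TryGetValue(date, out value))
        return true;                              // exact hit: O(1)

    // `date` is not a key here, so BinarySearch returns the bitwise
    // complement of the index of the next larger key: O(log n).
    int previous = ~sortedKeys.BinarySearch(date) - 1;
    if (previous < 0)
        return false;                             // nothing earlier exists
    value = values[sortedKeys[previous]];
    return true;
}

bool found = TryGetFloor(new DateTime(2020, 1, 7), out double v);
Console.WriteLine($"{found} {v}");   // prints True 5
```

The sorted key list must be rebuilt (or updated) whenever the dictionary changes; with a fixed set of about 5000 keys that cost is paid once.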
Just for completeness, and in case others read this in the future: if speed isn't very important, you can do it more simply with something like this (<= so that an exact match is returned; note that Max() throws if no key qualifies):
var result = myDict.Keys.Where(x => x <= targetDate).Max();
I would use a custom structure and collection to store this information:
public struct DateValue
{
public DateValue(DateTime date, double val)
: this()
{
this.Date = date;
this.Value = val;
}
public DateTime Date { get; set; }
public double Value { get; set; }
}
Here is a possible implementation of a collection that holds all DateValues and encapsulates the logic to return the nearest one. It uses List.BinarySearch to find it. If it doesn't find a direct match, it uses the return value of BinarySearch to locate the nearest, which is documented as:
The index of the specified value in the specified array, if value is found. If value is not found and value is less than one or more elements in array, a negative number which is the bitwise complement of the index of the first element that is larger than value. If value is not found and value is greater than any of the elements in array, a negative number which is the bitwise complement of (the index of the last element plus 1).
public class DateValueCollection : List<DateValue>, IComparer<DateValue>
{
public DateValueCollection() { }
public DateValueCollection(IEnumerable<DateValue> dateValues, bool isOrdered)
{
if (isOrdered)
base.AddRange(dateValues);
else
base.AddRange(dateValues.OrderBy(dv => dv.Date));
}
public DateValue GetNearest(DateTime date)
{
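if (base.Count == 0)
return default(DateValue);
// With a single element there is nothing to compare against
// (the code below would otherwise read base[-1])
if (base.Count == 1)
return base[0];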
DateValue dv = new DateValue(date, 0);
int index = base.BinarySearch(dv, this);
if (index >= 0)
{
return base[index];
}
// If not found, List.BinarySearch returns the complement of the index
index = ~index;
DateValue[] all;
if(index >= base.Count - 1)
{
// proposed index is last, check previous and last
all = new[] { base[base.Count - 1], base[base.Count - 2] };
}
else if(index == 0)
{
// proposed index is first, check first and second
all = new[] { base[index], base[index + 1] };
}
else
{
// return nearest DateValue from previous and this
var thisDV = base[index];
var prevDV = base[index - 1];
all = new[]{ thisDV, prevDV };
}
return all.OrderBy(x => (x.Date - date).Duration()).First();
}
public int Compare(DateValue x, DateValue y)
{
return x.Date.CompareTo(y.Date);
}
}
Quick test:
var dateVals = new[] {
new DateValue(DateTime.Today.AddDays(10), 1), new DateValue(DateTime.Today, 3), new DateValue(DateTime.Today.AddDays(4), 7)
};
var dvCollection = new DateValueCollection(dateVals, false);
DateValue nearest = dvCollection.GetNearest(DateTime.Today.AddDays(1));
Why worry about optimisation prematurely?
Do it first.
THEN AND ONLY THEN, if it's slow, do you have a problem.
Measure it with a profiler.
Then understanding starts, at which point you try other ways and profile them.
The answer is: if you do it any way at all and there isn't a performance problem, you just saved time and managed to do something else useful to add value in your day.
Premature optimization is not only pointless; you will usually be completely wrong about where you should be looking.

C# application: how could I implement a dictionary or hashtable for this?

This is my problem: I want to write a basic console application where I enter a date as input. If that date hasn't been entered before, the application then lets me enter a time range and a note, e.g. for 07/08/2013, time 5:00 - 7:00 pm, enter text "blah blah".
Then the application keeps looping. If I enter the same date, I shouldn't be able to enter the same times as above, but I should be able to enter 7:00 to 8:00, for example.
I was thinking of using a dictionary:
Dictionary<string, Booking> BookingDict = new Dictionary<string, Booking>();
and adding the date as the key, but it seems each key can only be entered once.
Can someone please help?
Do you have to use a list or dictionary? If you will not be searching in it a lot and do not have a lot of elements, you are probably best going with a generic list: List<Booking>
Then you can use LINQ to search through the list to retrieve the bookings that you need.
I would recommend a dictionary with the key being the date and the value being an array of 24 booleans representing hours. That only works if bookings are for full hours; if they're by the minute, the array would be too big and another option might be needed.
Each cell in the array is true if that hour is booked; for example, <07/08/13, booked[2] = true> means that hour 2-3 on 07/08/13 is booked.
When you get a new booking, you'll need to check each hour between the two. If you get hours 4-10, you'll need to check the values at 4, 5, 6, 7, 8, 9. Not the best efficiency, I think, but not that bad unless someone books a full week or so.
To sum up, my answer is to use:
Dictionary<DateTime, bool[]> bookingTbl // each value is a bool[24], one slot per hour
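A sketch of that bookkeeping, assuming whole-hour bookings given as a half-open range [startHour, endHour) and the date as the key; `TryBook` is a hypothetical helper. It rejects the whole booking if any hour in the range is already taken, which matches the 5:00-7:00 then 7:00-8:00 example in the question.

```csharp
using System;
using System.Collections.Generic;

// One 24-slot array per date; slot h == true means hour h..h+1 is booked.
var bookingTbl = new Dictionary<DateTime, bool[]>();

// Hypothetical helper: books [startHour, endHour) on `date` if every hour in
// that range is still free; otherwise books nothing and returns false.
bool TryBook(DateTime date, int startHour, int endHour)
{
    if (startHour < 0 || endHour > 24 || startHour >= endHour)
        throw new ArgumentOutOfRangeException(nameof(startHour));
    if (!bookingTbl.TryGetValue(date.Date, out var hours))
    {
        hours = new bool[24];
        bookingTbl[date.Date] = hours;
    }
    for (int h = startHour; h < endHour; h++)
        if (hours[h]) return false;          // overlap: reject the whole booking
    for (int h = startHour; h < endHour; h++)
        hours[h] = true;
    return true;
}

var day = new DateTime(2013, 8, 7);          // 07/08/2013, read here as 7 August
Console.WriteLine(TryBook(day, 17, 19));     // True  (5:00-7:00 pm)
Console.WriteLine(TryBook(day, 17, 19));     // False (same hours again)
Console.WriteLine(TryBook(day, 19, 20));     // True  (7:00-8:00 pm)
```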
Create a key with two DateTimes (the start and end DateTime). When you want to enter a new booking, check (with LINQ, for instance) whether there's a known end DateTime between your new start and end DateTime, and do the same with the known start DateTime. If there is no overlap, add it.
Here's an example:
Dictionary<TwoUintsKeyInfo,object> test = new Dictionary<TwoUintsKeyInfo, object>();
test.Add(new TwoUintsKeyInfo { IdOne = 3, IdTwo = 9 }, new object());
test.Add(new TwoUintsKeyInfo { IdOne = 10, IdTwo = 15 }, new object());
uint newStartPoint1 = 16,newEndPoint1=20;
bool mayUse = (from result in test
let newStartPointIsBetweenStartAndEnd = newStartPoint1.Between(result.Key.IdOne,result.Key.IdTwo)
let newEndPointIsBetweenStartAndEnd = newEndPoint1.Between(result.Key.IdOne,result.Key.IdTwo)
let completeOverlap = result.Key.IdOne < newStartPoint1 && result.Key.IdTwo > newEndPoint1
let oldDateWithinNewRange = result.Key.IdOne.Between(newStartPoint1, newEndPoint1) || result.Key.IdTwo.Between(newStartPoint1, newEndPoint1)
let FoundOne = 1
where newStartPointIsBetweenStartAndEnd || newEndPointIsBetweenStartAndEnd || completeOverlap || oldDateWithinNewRange
select FoundOne).Sum()==0;
using:
public static class LinqExtensions
{
/// <summary>
/// Note: for the compare parameters, pass the low bound first, then the high bound (both inclusive)
/// </summary>
/// <returns>Bool</returns>
public static bool Between<T1, T2, T3>(this T1 actual, T2 lowest, T3 highest)
where T1 : IComparable
where T2 : IConvertible
where T3 : IConvertible
{
return actual.CompareTo(lowest.ToType(typeof(T1), null)) >= 0 &&
actual.CompareTo(highest.ToType(typeof(T1), null)) <= 0;
}
}
public class TwoUintsKeyInfo
{
public uint IdOne { get; set; }
public uint IdTwo { get; set; }
public class EqualityComparerTwoUintsKeyInfo : IEqualityComparer<TwoUintsKeyInfo>
{
public bool Equals(TwoUintsKeyInfo x, TwoUintsKeyInfo y)
{
return x.IdOne == y.IdOne &&
x.IdTwo == y.IdTwo;
}
public int GetHashCode(TwoUintsKeyInfo x)
{
return (Math.Pow(x.IdOne, Math.E) + Math.Pow(x.IdTwo, Math.PI)).GetHashCode();
}
}
}
I tested it, seems to be working.
Good luck!
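Because Between is inclusive at both ends, the four `let` conditions above add up to the standard closed-interval overlap test: two ranges overlap exactly when each starts no later than the other ends. A shorter sketch of the same check, using value tuples in place of TwoUintsKeyInfo (`Overlaps` and `MayUse` are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Existing bookings as (start, end) pairs, mirroring the 3-9 and 10-15 example.
var bookings = new List<(uint Start, uint End)> { (3, 9), (10, 15) };

// Closed intervals [a.Start, a.End] and [b.Start, b.End] overlap exactly when
// each one starts no later than the other ends. This single test covers partial
// overlap, containment in either direction, and shared endpoints.
bool Overlaps((uint Start, uint End) a, (uint Start, uint End) b) =>
    a.Start <= b.End && b.Start <= a.End;

bool MayUse(uint start, uint end) => !bookings.Any(b => Overlaps(b, (start, end)));

Console.WriteLine(MayUse(16, 20));  // True
Console.WriteLine(MayUse(8, 12));   // False
```

If bookings that merely touch (one ends at 9, the next starts at 9) should be allowed, switch to half-open ranges and use `<` instead of `<=`.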

Quickest way to compare two generic lists for differences

What is the quickest (and least resource-intensive) way to compare two massive lists (>50,000 items) and as a result get two lists like the ones below:
items that show up in the first list but not in the second
items that show up in the second list but not in the first
Currently I'm working with List or IReadOnlyCollection and solving this in a LINQ query:
var onlyInList = list.Where(i => !list2.Contains(i)).ToList();
var onlyInList2 = list2.Where(i => !list.Contains(i)).ToList();
But this doesn't perform as well as I would like.
Any idea how to make this quicker and less resource-intensive, as I need to process a lot of lists?
Use Except:
var firstNotSecond = list1.Except(list2).ToList();
var secondNotFirst = list2.Except(list1).ToList();
I suspect there are approaches which would actually be marginally faster than this, but even this will be vastly faster than your O(N * M) approach.
If you want to combine these, you could create a method with the above and then a return statement:
return !firstNotSecond.Any() && !secondNotFirst.Any();
One point to note is that there is a difference in results between the original code in the question and the solution here: any duplicate elements which are only in one list will only be reported once with my code, whereas they'd be reported as many times as they occur in the original code.
For example, with lists of [1, 2, 2, 2, 3] and [1], the "elements in list1 but not list2" result in the original code would be [2, 2, 2, 3]. With my code it would just be [2, 3]. In many cases that won't be an issue, but it's worth being aware of.
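If you need the original code's duplicate-preserving behaviour without its O(N * M) cost, one option (a sketch, not part of the answer above; `WhereNotIn` is a hypothetical name) is to build a HashSet from the second list once and filter the first list against it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keeps duplicates from `source` (unlike Except) while still doing only one
// O(1) hash lookup per element.
List<T> WhereNotIn<T>(IEnumerable<T> source, IEnumerable<T> other)
{
    var exclude = new HashSet<T>(other);
    return source.Where(x => !exclude.Contains(x)).ToList();
}

var list1 = new List<int> { 1, 2, 2, 2, 3 };
var list2 = new List<int> { 1 };

Console.WriteLine(string.Join(", ", list1.Except(list2)));        // 2, 3
Console.WriteLine(string.Join(", ", WhereNotIn(list1, list2)));   // 2, 2, 2, 3
```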
Enumerable.SequenceEqual Method
Determines whether two sequences are equal according to an equality comparer.
(MS Docs)
Enumerable.SequenceEqual(list1, list2);
This returns true only if both lists contain equal elements in the same order. It works out of the box for primitive data types. For custom objects, either override equality on the type (e.g. implement IEquatable<T>) or pass an IEqualityComparer<T>:
IEqualityComparer Interface
Defines methods to support the comparison of objects for equality.
(MS Docs for IEqualityComparer)
More efficient would be using Enumerable.Except:
var inListButNotInList2 = list.Except(list2);
var inList2ButNotInList = list2.Except(list);
This method is implemented by using deferred execution. That means you could write for example:
var first10 = inListButNotInList2.Take(10);
It is also efficient since it internally uses a Set<T> to compare the objects. It works by first collecting all distinct values from the second sequence, and then streaming the results of the first, checking that they haven't been seen before.
If you want the results to be case insensitive, the following will work:
List<string> list1 = new List<string> { "a.dll", "b1.dll" };
List<string> list2 = new List<string> { "A.dll", "b2.dll" };
var firstNotSecond = list1.Except(list2, StringComparer.OrdinalIgnoreCase).ToList();
var secondNotFirst = list2.Except(list1, StringComparer.OrdinalIgnoreCase).ToList();
firstNotSecond would contain b1.dll
secondNotFirst would contain b2.dll
using System.Collections.Generic;
using System.Linq;
namespace YourProject.Extensions
{
public static class ListExtensions
{
public static bool SetwiseEquivalentTo<T>(this List<T> list, List<T> other)
where T: IEquatable<T>
{
if (list.Except(other).Any())
return false;
if (other.Except(list).Any())
return false;
return true;
}
}
}
Sometimes you only need to know if two lists are different, and not what those differences are. In that case, consider adding this extension method to your project. Note that your listed objects should implement IEquatable!
Usage:
public sealed class Car : IEquatable<Car>
{
public Price Price { get; }
public List<Component> Components { get; }
...
public override bool Equals(object obj)
=> obj is Car other && Equals(other);
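public bool Equals(Car other)
=> other != null
&& Price == other.Price
&& Components.SetwiseEquivalentTo(other.Components);
public override int GetHashCode()
// Distinct() because SetwiseEquivalentTo ignores duplicates; XOR-ing a duplicate
// twice would cancel it out and break the Equals/GetHashCode contract
=> Components.Distinct().Aggregate(
Price.GetHashCode(),
(code, next) => code ^ next.GetHashCode()); // Bitwise XOR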
}
Whatever the Component class is, the methods shown here for Car should be implemented almost identically.
It's very important to note how we've written GetHashCode. In order to properly implement IEquatable, Equals and GetHashCode must operate on the instance's properties in a logically compatible way.
Two lists with the same contents are still different objects, and will produce different hash codes. Since we want these two lists to be treated as equal, we must let GetHashCode produce the same value for each of them. We can accomplish this by delegating the hashcode to every element in the list, and using the standard bitwise XOR to combine them all. XOR is order-agnostic, so it doesn't matter if the lists are sorted differently. It only matters that they contain nothing but equivalent members.
Note: the strange name is to imply the fact that the method does not consider the order of the elements in the list. If you do care about the order of the elements in the list, this method is not for you!
Try this:
var difList = list1.Where(a => !list2.Any(a1 => a1.id == a.id))
.Union(list2.Where(a => !list1.Any(a1 => a1.id == a.id)));
Not for this problem, but here's some code to compare lists for equal (not identical!) objects:
// Requires using System.Linq; for Any and All
public class EquatableList<T> : List<T>, IEquatable<EquatableList<T>> where T : IEquatable<T>
{
/// <summary>
/// True, if this contains element with equal property-values
/// </summary>
/// <param name="element">element of Type T</param>
/// <returns>True, if this contains element</returns>
public new Boolean Contains(T element)
{
return this.Any(t => t.Equals(element));
}
/// <summary>
/// True, if list is equal to this
/// </summary>
/// <param name="list">list</param>
/// <returns>True, if instance equals list</returns>
public Boolean Equals(EquatableList<T> list)
{
if (list == null) return false;
return this.All(list.Contains) && list.All(this.Contains);
}
}
If only the combined result is needed (whether both lists contain the same elements), this will work too:
var set1 = new HashSet<T>(list1);
var set2 = new HashSet<T>(list2);
var areEqual = set1.SetEquals(set2);
where T is the type of the list elements.
While Jon Skeet's answer is excellent advice for everyday practice with a small to moderate number of elements (up to a few million), it is nevertheless not the fastest approach and not very resource-efficient. An obvious drawback is that getting the full difference requires two passes over the data (even three if the elements that are equal are of interest as well). Clearly, this can be avoided by a customized reimplementation of the Except method, but the creation of a hash set still requires a lot of memory, and the computation of hashes requires time.
For very large data sets (in the billions of elements) it usually pays off to consider the particular circumstances. Here are a few ideas that might provide some inspiration:
If the elements can be compared (which is almost always the case in practice), then sorting the lists and applying the following zip approach is worth consideration:
/// <returns>The elements of the specified (ascendingly) sorted enumerations that are
/// contained only in one of them, together with an indicator,
/// whether the element is contained in the reference enumeration (-1)
/// or in the difference enumeration (+1).</returns>
public static IEnumerable<Tuple<T, int>> FindDifferences<T>(IEnumerable<T> sortedReferenceObjects,
IEnumerable<T> sortedDifferenceObjects, IComparer<T> comparer)
{
using (var refs = sortedReferenceObjects.GetEnumerator())
using (var diffs = sortedDifferenceObjects.GetEnumerator())
{
bool hasRef = refs.MoveNext();
bool hasDiff = diffs.MoveNext();
while (hasRef && hasDiff)
{
int comparison = comparer.Compare(refs.Current, diffs.Current);
if (comparison == 0)
{
// insert code that emits the current element if equal elements should be kept
hasRef = refs.MoveNext();
hasDiff = diffs.MoveNext();
}
else if (comparison < 0)
{
yield return Tuple.Create(refs.Current, -1);
hasRef = refs.MoveNext();
}
else
{
yield return Tuple.Create(diffs.Current, 1);
hasDiff = diffs.MoveNext();
}
}
// One sequence is exhausted; everything left in the other one is a difference
while (hasRef)
{
yield return Tuple.Create(refs.Current, -1);
hasRef = refs.MoveNext();
}
while (hasDiff)
{
yield return Tuple.Create(diffs.Current, 1);
hasDiff = diffs.MoveNext();
}
}
}
This can e.g. be used in the following way:
const int N = 1_000_000; // e.g. 1M or 100M, as in the benchmarks below
const int omit1 = 231567;
const int omit2 = 589932;
IEnumerable<int> numberSequence1 = Enumerable.Range(0, N).Select(i => i < omit1 ? i : i + 1);
IEnumerable<int> numberSequence2 = Enumerable.Range(0, N).Select(i => i < omit2 ? i : i + 1);
var numberDiffs = FindDifferences(numberSequence1, numberSequence2, Comparer<int>.Default);
Benchmarking on my computer gave the following result for N = 1M:
| Method | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| DiffLinq | 115.19 ms | 0.656 ms | 0.582 ms | 1.00 | 2800.0000 | 2800.0000 | 2800.0000 | 67110744 B |
| DiffZip | 23.48 ms | 0.018 ms | 0.015 ms | 0.20 | - | - | - | 720 B |
And for N = 100M:
| Method | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| DiffLinq | 12.146 s | 0.0427 s | 0.0379 s | 1.00 | 13000.0000 | 13000.0000 | 13000.0000 | 8589937032 B |
| DiffZip | 2.324 s | 0.0019 s | 0.0018 s | 0.19 | - | - | - | 720 B |
Note that this example of course benefits from the fact that the lists are already sorted and integers can be very efficiently compared. But this is exactly the point: If you do have favourable circumstances, make sure that you exploit them.
A few further comments: the speed of the comparison function is clearly relevant for the overall performance, so it may be beneficial to optimize it. The flexibility to do so is a benefit of the zipping approach. Furthermore, parallelization seems more feasible to me, although by no means easy and maybe not worth the effort and the overhead. Nevertheless, a simple way to speed up the process by roughly a factor of 2 is to split each list into two halves (if that can be done efficiently) and compare the parts in parallel, one processing from front to back and the other in reverse order.
I have used this code to compare two lists that have millions of records.
This method does not take much time:
//Method to compare two list of string
private List<string> Contains(List<string> list1, List<string> list2)
{
List<string> result = new List<string>();
result.AddRange(list1.Except(list2, StringComparer.OrdinalIgnoreCase));
result.AddRange(list2.Except(list1, StringComparer.OrdinalIgnoreCase));
return result;
}
I compared 3 different methods for comparing different data sets. The tests below create a string collection of all the numbers from 0 to length - 1, then another collection over the same range but with only the even numbers. I then pick out the odd numbers from the first collection.
Using Linq Except
public void TestExcept()
{
WriteLine($"Except {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newArray = new string[length/2];
int j = 0;
for (int i = 0; i < length; i+=2)
{
newArray[j++] = i.ToString();
}
dateTime = DateTime.Now;
Write("Count of items: ");
WriteLine(array.Except(newArray).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
Except 2021-08-14 11:43:03 AM
Populate set processing time: 00:00:03.7230479
2021-08-14 11:43:09 AM
Count of items: 10000000
Count processing time: 00:00:02.9720879
Using HashSet.Add
public void TestHashSet()
{
WriteLine($"HashSet {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var hashSet = new HashSet<string>();
for (int i = 0; i < length; i++)
{
hashSet.Add(i.ToString());
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newHashSet = new HashSet<string>();
for (int i = 0; i < length; i+=2)
{
newHashSet.Add(i.ToString());
}
dateTime = DateTime.Now;
Write("Count of items: ");
// HashSet Add returns true if item is added successfully (not previously existing)
WriteLine(hashSet.Where(s => newHashSet.Add(s)).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
HashSet 2021-08-14 11:42:43 AM
Populate set processing time: 00:00:05.6000625
Count of items: 10000000
Count processing time: 00:00:01.7703057
Special HashSet test:
public void TestLoadingHashSet()
{
int length = 20000000;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
var dateTime = DateTime.Now;
var hashSet = new HashSet<string>(array);
Write("Time to load hashset: ");
WriteLine(DateTime.Now - dateTime);
}
> TestLoadingHashSet()
Time to load hashset: 00:00:01.1918160
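The tests here time themselves by subtracting DateTime.Now values. System.Diagnostics.Stopwatch is intended for measuring elapsed time and uses a monotonic, high-resolution timer, so it is not thrown off by system clock adjustments. A small helper like this one (Timing and Measure are hypothetical names) could replace those subtractions:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs 'action' once and returns how long it took, measured with Stopwatch.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, WriteLine(Timing.Measure(() => new HashSet<string>(array))); would time only the HashSet construction.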
Using .Contains
public void TestContains()
{
WriteLine($"Contains {DateTime.Now}");
int length = 20000000;
var dateTime = DateTime.Now;
var array = new string[length];
for (int i = 0; i < length; i++)
{
array[i] = i.ToString();
}
Write("Populate set processing time: ");
WriteLine(DateTime.Now - dateTime);
var newArray = new string[length/2];
int j = 0;
for (int i = 0; i < length; i+=2)
{
newArray[j++] = i.ToString();
}
dateTime = DateTime.Now;
WriteLine(dateTime);
Write("Count of items: ");
WriteLine(array.Where(a => !newArray.Contains(a)).Count());
Write("Count processing time: ");
WriteLine(DateTime.Now - dateTime);
}
Output
Contains 2021-08-14 11:19:44 AM
Populate set processing time: 00:00:03.1046998
2021-08-14 11:19:49 AM
Count of items: Hosting process exited with exit code 1.
(Didn't complete. Killed it after 14 minutes.)
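Array.Contains is a linear scan, so for each of the 20,000,000 items it may walk all 10,000,000 entries of newArray. Keeping the same Where/Contains shape but building a HashSet from newArray once makes each lookup O(1) on average. A sketch (ContainsWithSet and CountMissing are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ContainsWithSet
{
    // Counts the items of 'array' that are not in 'exclude'.
    // The HashSet is built once, so each Contains call is a hash lookup
    // instead of a scan over 'exclude'.
    public static int CountMissing(string[] array, string[] exclude)
    {
        var excludeSet = new HashSet<string>(exclude);
        return array.Count(a => !excludeSet.Contains(a));
    }
}
```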
Conclusion:
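LINQ Except took about 1.2 seconds longer on my device than the HashSet approach for the counting step (2.97 s vs 1.77 s, n = 20,000,000).
Using Where and Contains did not finish within 14 minutes, because Array.Contains scans newArray linearly for every item of array.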
Closing remarks on HashSets:
HashSets hold unique data only (adding a duplicate is a no-op)
Make sure to override Equals and GetHashCode (consistently) for class types
May need up to 2x the memory if you make a copy of the data set, depending on implementation
HashSet is optimized for cloning other HashSets using the IEnumerable constructor, but it is slower to convert other collections to HashSets (see special test above)
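On the GetHashCode point: HashSet (and Except) compare hash codes first and then call Equals, so a class used as an element should override both, based on the same fields. A minimal sketch with a hypothetical Person type keyed on Id and Name (HashCode.Combine is available from .NET Core 2.1 onward):

```csharp
using System;

public sealed class Person : IEquatable<Person>
{
    public int Id { get; }
    public string Name { get; }

    public Person(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Two people are equal when both Id and Name match.
    public bool Equals(Person other) =>
        other != null && Id == other.Id && string.Equals(Name, other.Name);

    public override bool Equals(object obj) => Equals(obj as Person);

    // Equal objects must produce equal hash codes, so hash exactly the fields Equals compares.
    public override int GetHashCode() => HashCode.Combine(Id, Name);
}
```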
First approach (SingleOrDefault throws if list2 contains more than one matching item):
if (list1 != null && list2 != null && list1.Select(x => list2.SingleOrDefault(y => y.propertyToCompare == x.propertyToCompare && y.anotherPropertyToCompare == x.anotherPropertyToCompare) != null).All(found => found))
return true;
Second approach, if you are OK with duplicate values:
if (list1 != null && list2 != null && list1.Select(x => list2.Any(y => y.propertyToCompare == x.propertyToCompare && y.anotherPropertyToCompare == x.anotherPropertyToCompare)).All(found => found))
return true;
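The same check can also be written with All and Any directly, which stops at the first item of list1 that has no match and does not throw on duplicates the way SingleOrDefault does. A sketch with a hypothetical Item type standing in for the real element class (ItemListComparer and AllMatched are also hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the property names mirror the ones used above.
public class Item
{
    public int propertyToCompare { get; set; }
    public string anotherPropertyToCompare { get; set; }
}

public static class ItemListComparer
{
    // True when every item in list1 has at least one item in list2 with the same two properties.
    public static bool AllMatched(List<Item> list1, List<Item> list2)
    {
        if (list1 == null || list2 == null)
            return false;

        return list1.All(x => list2.Any(y =>
            y.propertyToCompare == x.propertyToCompare &&
            y.anotherPropertyToCompare == x.anotherPropertyToCompare));
    }
}
```

Like the original, this only checks one direction; call it with the arguments swapped as well if list2 must not contain extra items.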
Both Jon Skeet's and miguelmpn's answers are good. It depends on whether the order of the list elements is important or not:
// take order into account
bool areEqual1 = Enumerable.SequenceEqual(list1, list2);
// ignore order (note: Except also ignores how many times each item occurs)
bool areEqual2 = !list1.Except(list2).Any() && !list2.Except(list1).Any();
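If order should be ignored but duplicate counts should still matter (so that { 1, 1, 2 } and { 1, 2, 2 } are different), one option is to sort both lists and then use SequenceEqual. A sketch, assuming the element type has a default ordering (MultisetCompare and SameElements are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultisetCompare
{
    // True when both lists contain the same elements the same number of times, in any order.
    public static bool SameElements<T>(List<T> list1, List<T> list2)
    {
        if (list1.Count != list2.Count)
            return false;

        // OrderBy uses Comparer<T>.Default, so T needs a natural ordering.
        return list1.OrderBy(x => x).SequenceEqual(list2.OrderBy(x => x));
    }
}
```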
One line:
var list1 = new List<int> { 1, 2, 3 };
var list2 = new List<int> { 1, 2, 3, 4 };
if (list1.Except(list2).Count() + list2.Except(list1).Count() == 0)
Console.WriteLine("same sets");
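HashSet<T>.SetEquals expresses the same set comparison in a single call (order and duplicates ignored). A sketch using int lists as in the example (SetCompare and SameSet are hypothetical names):

```csharp
using System.Collections.Generic;

public static class SetCompare
{
    // True when both lists contain the same distinct values, ignoring order and duplicates.
    public static bool SameSet(List<int> list1, List<int> list2) =>
        new HashSet<int>(list1).SetEquals(list2);
}
```

With the two lists above ({ 1, 2, 3 } and { 1, 2, 3, 4 }) this returns false, matching the Except-based check.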
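I wrote a generic function for comparing two lists. It marks each item as Added, Deleted or Updated, matching items on the value of the property named by uniqPropertyName (compared case-insensitively as strings).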
public static class ListTools
{
public enum RecordUpdateStatus
{
Added = 1,
Updated = 2,
Deleted = 3
}
public class UpdateStatu<T>
{
public T CurrentValue { get; set; }
public RecordUpdateStatus UpdateStatus { get; set; }
}
public static List<UpdateStatu<T>> CompareList<T>(List<T> currentList, List<T> inList, string uniqPropertyName)
{
var res = new List<UpdateStatu<T>>();
res.AddRange(inList.Where(a => !currentList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Added,
}));
res.AddRange(currentList.Where(a => !inList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Deleted,
}));
res.AddRange(currentList.Where(a => inList.Any(x => x.GetType().GetProperty(uniqPropertyName).GetValue(x)?.ToString().ToLower() == a.GetType().GetProperty(uniqPropertyName).GetValue(a)?.ToString().ToLower()))
.Select(a => new UpdateStatu<T>
{
CurrentValue = a,
UpdateStatus = RecordUpdateStatus.Updated,
}));
return res;
}
}
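CompareList looks the key up through reflection inside nested Any calls, so GetProperty/GetValue run for every pair and the whole comparison is O(n × m). If the caller can pass the key as a delegate instead of a property name, the same Added/Deleted/Updated split can be done with hash sets. A sketch under that assumption; KeyedListCompare and CompareByKey are hypothetical names, it returns a tuple of lists rather than UpdateStatu<T> items, and StringComparer.OrdinalIgnoreCase only roughly mirrors the ToLower() comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedListCompare
{
    // Added: key only in inList. Deleted: key only in currentList.
    // Updated: key in both (the currentList item is returned, as in CompareList).
    // Keys are compared case-insensitively and assumed to be non-null.
    public static (List<T> Added, List<T> Deleted, List<T> Updated) CompareByKey<T>(
        List<T> currentList, List<T> inList, Func<T, string> keySelector)
    {
        var currentKeys = new HashSet<string>(currentList.Select(keySelector), StringComparer.OrdinalIgnoreCase);
        var inKeys = new HashSet<string>(inList.Select(keySelector), StringComparer.OrdinalIgnoreCase);

        var added = inList.Where(a => !currentKeys.Contains(keySelector(a))).ToList();
        var deleted = currentList.Where(a => !inKeys.Contains(keySelector(a))).ToList();
        var updated = currentList.Where(a => inKeys.Contains(keySelector(a))).ToList();

        return (added, deleted, updated);
    }
}
```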
I think this is a simple and easy way to compare two lists element by element (in Python):
x = [1, 2, 3, 5, 4, 8, 7, 11, 12, 45, 96, 25]
y = [2, 4, 5, 6, 8, 7, 88, 9, 6, 55, 44, 23]
tmp = []
# zip pairs the items by position and stops at the end of the shorter list
for a, b in zip(x, y):
    if a > b:
        tmp.append(1)
    else:
        tmp.append(0)
print(tmp)
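In C#, the same element-by-element comparison can be written with LINQ's Zip, which pairs items by position and stops at the end of the shorter sequence. A sketch using the values from the Python example (ElementWise and GreaterFlags are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ElementWise
{
    // For each position i, yields 1 if x[i] > y[i], otherwise 0.
    public static List<int> GreaterFlags(IEnumerable<int> x, IEnumerable<int> y) =>
        x.Zip(y, (a, b) => a > b ? 1 : 0).ToList();
}
```

For the two lists above the result is 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1.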
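Maybe it's funny, but this works for me (true when the lists differ):
string.Join("", List1) != string.Join("", List2)
It is only reliable when concatenation cannot make different lists look alike: { "1", "23" } and { "12", "3" } both join to "123". Joining with a separator that cannot occur in the values avoids that.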
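This is the best solution you'll find:
// list3 holds the items of list1 that also occur in list2 (no need to copy list2 with ToList() on every iteration)
var list3 = list1.Where(l => list2.Contains(l));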
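LINQ also has Intersect for this. The difference is that Intersect is a set operation and returns each common value once, whereas the Where/Contains form keeps duplicates from list1 (and does a linear Contains per item). A small sketch showing both (CommonItems and its methods are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CommonItems
{
    // Keeps every occurrence from list1 whose value also appears in list2.
    public static List<int> WhereContains(List<int> list1, List<int> list2) =>
        list1.Where(l => list2.Contains(l)).ToList();

    // Returns each value common to both lists once, in list1 order.
    public static List<int> IntersectDistinct(List<int> list1, List<int> list2) =>
        list1.Intersect(list2).ToList();
}
```

With list1 = { 1, 1, 2, 3 } and list2 = { 1, 3, 4 }, WhereContains gives { 1, 1, 3 } and IntersectDistinct gives { 1, 3 }.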

Access Enumerator within a foreach loop?

I have a List class, and I would like to override GetEnumerator() to return my own Enumerator class. This Enumerator class would have two additional properties that would be updated as the Enumerator is used.
For simplicity (this isn't the exact business case), let's say those properties were CurrentIndex and RunningTotal.
I could manage these properties within the foreach loop manually, but I would rather encapsulate this functionality for reuse, and the Enumerator seems to be the right spot.
The problem: foreach hides all the Enumerator business, so is there a way to, within a foreach statement, access the current Enumerator so I can retrieve my properties? Or would I have to abandon foreach, use a nasty old while loop, and manipulate the Enumerator myself?
Strictly speaking, I would say that if you want to do exactly what you're saying, then yes, you would need to call GetEnumerator and control the enumerator yourself with a while loop.
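A minimal sketch of that while-loop approach. RunningTotalEnumerator and EnumeratorDemo are hypothetical names; CurrentIndex and RunningTotal are the property names from the question, and the enumerator simply wraps a List<decimal>:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator that tracks its position and running total as it advances.
public class RunningTotalEnumerator : IEnumerator<decimal>
{
    private readonly List<decimal> _items;

    public RunningTotalEnumerator(List<decimal> items)
    {
        _items = items;
        Reset();
    }

    public int CurrentIndex { get; private set; }
    public decimal RunningTotal { get; private set; }

    // Only valid after MoveNext has returned true.
    public decimal Current => _items[CurrentIndex];
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (CurrentIndex + 1 >= _items.Count)
            return false;
        CurrentIndex++;
        RunningTotal += _items[CurrentIndex];
        return true;
    }

    public void Reset()
    {
        CurrentIndex = -1;
        RunningTotal = 0M;
    }

    public void Dispose() { }
}

public static class EnumeratorDemo
{
    // Drives the enumerator by hand, since foreach would hide it.
    public static List<string> Describe(List<decimal> values)
    {
        var lines = new List<string>();
        using (var e = new RunningTotalEnumerator(values))
        {
            while (e.MoveNext())
                lines.Add($"{e.CurrentIndex}: {e.Current} (total {e.RunningTotal})");
        }
        return lines;
    }
}
```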
Without knowing too much about your business requirement, you might be able to take advantage of an iterator function, such as this one:
public static IEnumerable<decimal> IgnoreSmallValues(List<decimal> list)
{
decimal runningTotal = 0M;
foreach (decimal value in list)
{
// if the value is less than 1% of the running total, then ignore it
if (runningTotal == 0M || value >= 0.01M * runningTotal)
{
runningTotal += value;
yield return value;
}
}
}
Then you can do this:
List<decimal> payments = new List<decimal>() {
123.45M,
234.56M,
.01M,
345.67M,
1.23M,
456.78M
};
foreach (decimal largePayment in IgnoreSmallValues(payments))
{
// handle the large payments so that I can divert all the small payments to my own bank account. Mwahaha!
}
Updated:
Ok, so here's a follow-up with what I've termed my "fishing hook" solution. Now, let me add a disclaimer that I can't really think of a good reason to do something this way, but your situation may differ.
The idea is that you simply create a "fishing hook" object (reference type) that you pass to your iterator function. The iterator function manipulates your fishing hook object, and since you still have a reference to it in your code outside, you have visibility into what's going on:
public class FishingHook
{
public int Index { get; set; }
public decimal RunningTotal { get; set; }
public Func<decimal, bool> Criteria { get; set; }
}
public static IEnumerable<decimal> FishingHookIteration(IEnumerable<decimal> list, FishingHook hook)
{
hook.Index = 0;
hook.RunningTotal = 0;
foreach(decimal value in list)
{
// the hook object may define a Criteria delegate that
// determines whether to skip the current value
if (hook.Criteria == null || hook.Criteria(value))
{
hook.RunningTotal += value;
yield return value;
hook.Index++;
}
}
}
You would utilize it like this:
List<decimal> payments = new List<decimal>() {
123.45M,
.01M,
345.67M,
234.56M,
1.23M,
456.78M
};
FishingHook hook = new FishingHook();
decimal min = 0;
hook.Criteria = x => x > min; // exclude any values that are less than/equal to the defined minimum
foreach (decimal value in FishingHookIteration(payments, hook))
{
// update the minimum
if (value > min) min = value;
Console.WriteLine("Index: {0}, Value: {1}, Running Total: {2}", hook.Index, value, hook.RunningTotal);
}
// Resulting output is:
//Index: 0, Value: 123.45, Running Total: 123.45
//Index: 1, Value: 345.67, Running Total: 469.12
//Index: 2, Value: 456.78, Running Total: 925.90
// we've skipped the values .01, 234.56, and 1.23
Essentially, the FishingHook object gives you some control over how the iterator executes. The impression I got from the question was that you needed some way to access the inner workings of the iterator so that you could manipulate how it iterates while you are in the middle of iterating, but if this is not the case, then this solution might be overkill for what you need.
With foreach you indeed can't get the enumerator - you could, however, have the enumerator return (yield) a tuple that includes that data; in fact, you could probably use LINQ to do it for you...
(I couldn't cleanly get the index using LINQ, though I can get the total and current value via Aggregate, so here's the tuple approach.)
using System.Collections;
using System.Collections.Generic;
using System;
class MyTuple
{
public int Value {get;private set;}
public int Index { get; private set; }
public int RunningTotal { get; private set; }
public MyTuple(int value, int index, int runningTotal)
{
Value = value; Index = index; RunningTotal = runningTotal;
}
static IEnumerable<MyTuple> SomeMethod(IEnumerable<int> data)
{
int index = 0, total = 0;
foreach (int value in data)
{
yield return new MyTuple(value, index++,
total = total + value);
}
}
static void Main()
{
int[] data = { 1, 2, 3 };
foreach (var tuple in SomeMethod(data))
{
Console.WriteLine("{0}: {1} ; {2}", tuple.Index,
tuple.Value, tuple.RunningTotal);
}
}
}
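For reference, Enumerable.Select has an overload whose selector also receives the element's zero-based index, so the same triple can be produced with LINQ if a captured variable for the total is acceptable. A sketch using a C# 7 value tuple in place of MyTuple (IndexedTotals and WithTotals are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexedTotals
{
    // Yields (Value, Index, RunningTotal) for each element.
    // 'total' is captured by the lambda: each call to WithTotals starts a fresh sum,
    // but enumerating the returned sequence twice would keep adding to it.
    public static IEnumerable<(int Value, int Index, int RunningTotal)> WithTotals(IEnumerable<int> data)
    {
        int total = 0;
        return data.Select((value, index) => (Value: value, Index: index, RunningTotal: total += value));
    }
}
```

Because of the captured total, enumerate the result only once (or call ToList() on it); the iterator version above has no such restriction.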
You can also do something like this in a more functional way, depending on your requirements. What you are asking can be thought of as "zipping" together multiple sequences and then iterating through them all at once. The three sequences for the example you gave would be:
The "value" sequence
The "index" sequence
The "running total" sequence
The next step would be to specify each of these sequences separately:
List<decimal> ValueList = new List<decimal>(); // fill with the values to iterate over
var Indexes = Enumerable.Range(0, ValueList.Count);
The last one is more fun. The two methods I can think of are to either use a temporary variable to sum up the sequence, or to recalculate the sum for each item. The second is much less performant (it re-sums the prefix for every item, O(n²) overall), so I would rather use the temporary:
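decimal Sum = 0;
// Select is lazy: Sum is updated as RunningTotals is enumerated and is not reset if it is enumerated again
var RunningTotals = ValueList.Select(v => Sum = Sum + v);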
The last step would be to zip these all together. .Net 4 will have the Zip operator built in, in which case it will look like this:
var ZippedSequence = ValueList.Zip(Indexes, (value, index) => new {value, index}).Zip(RunningTotals, (temp, total) => new {temp.value, temp.index, total});
This obviously gets noisier the more things you try to zip together.
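On .NET 6 and later, Enumerable.Zip also has an overload that takes two further sequences and returns (First, Second, Third) tuples, which avoids the nested anonymous types. A sketch combining the three sequences above under that assumption (ZipThree and Build are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ZipThree
{
    // Pairs each value with its index and the running total at that point.
    public static List<(decimal Value, int Index, decimal Total)> Build(List<decimal> valueList)
    {
        var indexes = Enumerable.Range(0, valueList.Count);
        decimal sum = 0;
        var runningTotals = valueList.Select(v => sum += v);

        // Zip(second, third) yields (First, Second, Third) tuples (.NET 6+).
        return valueList.Zip(indexes, runningTotals)
            .Select(t => (Value: t.First, Index: t.Second, Total: t.Third))
            .ToList();
    }
}
```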
In the last link, there is source for implementing the Zip function yourself. It really is a simple little bit of code.
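On frameworks without a built-in Zip, a hand-written version only needs to advance two enumerators in lockstep. A sketch; ZipWith is a hypothetical name chosen so it does not clash with Enumerable.Zip on .NET 4 and later:

```csharp
using System;
using System.Collections.Generic;

public static class ZipExtensions
{
    // Combines elements at the same position; stops at the end of the shorter sequence.
    public static IEnumerable<TResult> ZipWith<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

        return ZipIterator(first, second, resultSelector);
    }

    // Separate iterator so the argument checks above run eagerly, not on the first MoveNext.
    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
        }
    }
}
```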
