LINQ GroupBy fails to group all rows with identical keys - C#

This is my query:
rows.GroupBy(row => new TaxGroupObject
{
EnvelopeID = row.Field<int>("EnvelopeID"),
PolicyNumber = row.Field<string>("PolicyNumber"),
TZ = row.Field<string>("TZ")
})
.Select(row =>
{
int i;
if (row.Key.EnvelopeID == 5713 && row.Key.PolicyNumber == "50002617" && row.Key.TZ == "50002617")
i=1+1;
var newRow = structure.NewRow();
newRow["PolicyNumber"]=row.Key.PolicyNumber;
newRow["TZ"]=row.Key.TZ;
newRow["CreditPremiaTaxParagraph45"] = row.Sum(x => decimal.Parse(x["CreditPremiaTaxParagraph45"].ToString()));
newRow["WorklossTax"] = row.Sum(x => decimal.Parse(x["WorklossTax"].ToString()));
newRow["MiscTax"] = row.Sum(x => decimal.Parse(x["MiscTax"].ToString()));
newRow["EnvelopeID"] = row.Key.EnvelopeID;
return newRow;
}
);
internal class TaxGroupObject
{
public long? EnvelopeID{ get; set; }
public string PolicyNumber { get; set; }
public string TZ { get; set; }
}
I put a breakpoint on the line with "i=1+1", inside an if statement that compares all the keys I used for the GroupBy with some hard-coded values. That breakpoint is hit twice, although GroupBy is supposed to put all rows with the same keys into a single group. For most of the values in the table the grouping works just fine, and I can't understand how this is possible. Any help would be highly appreciated.

The problem is that TaxGroupObject does not implement GetHashCode and Equals. These methods are used by GroupBy to determine what makes one TaxGroupObject object equal to another. By default, it's by reference equality, not property equality.
This should work, using the GetHashCode algorithm from What is the best algorithm for an overridden System.Object.GetHashCode?:
internal class TaxGroupObject
{
public long? EnvelopeID { get; set; }
public string PolicyNumber { get; set; }
public string TZ { get; set; }
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
hash = hash * 23 + EnvelopeID.GetHashCode();
hash = hash * 23 + (PolicyNumber != null ? PolicyNumber.GetHashCode() : -2);
hash = hash * 23 + (TZ != null ? TZ.GetHashCode() : -1);
return hash;
}
}
public override bool Equals(object obj)
{
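// A null argument is never equal; checking it first avoids a NullReferenceException.
if (obj == null || obj.GetType() != typeof(TaxGroupObject))
return false;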
var other = (TaxGroupObject)obj;
return this.EnvelopeID == other.EnvelopeID &&
this.PolicyNumber == other.PolicyNumber &&
this.TZ == other.TZ;
}
}
Also, you should only use immutable objects in something like a grouping or dictionary. At a minimum, you must be sure that the objects here do not change during your grouping.
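As a hedged alternative to a hand-written key class, the grouping key can be a value tuple, which compares its components by value and cannot be mutated through the grouping. The sketch below is not the asker's code: TaxRow and SumMiscTax are hypothetical names, and a plain class stands in for the DataRow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TaxGrouping
{
    // Hypothetical flat record standing in for a DataRow (assumption, not from the question).
    public sealed class TaxRow
    {
        public int EnvelopeID { get; set; }
        public string PolicyNumber { get; set; }
        public string TZ { get; set; }
        public decimal MiscTax { get; set; }
    }

    // Groups by a value tuple. Tuples compare component by component,
    // so rows with identical key values end up in the same group.
    public static Dictionary<(int EnvelopeID, string PolicyNumber, string TZ), decimal> SumMiscTax(
        IEnumerable<TaxRow> rows)
    {
        return rows
            .GroupBy(r => (r.EnvelopeID, r.PolicyNumber, r.TZ))
            .ToDictionary(g => g.Key, g => g.Sum(r => r.MiscTax));
    }
}
```

An anonymous type (new { ... }) as the key behaves the same way for grouping, because the compiler generates Equals and GetHashCode for it.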

Eventually I found it simpler to give up inheritance and use a struct instead of a class. That also works: a struct is a value type, so its default Equals compares field values and no override is needed. I am interested in which of these approaches performs better, if anybody knows. Intuitively structs seem more efficient, but I am not sure, and I currently don't have the time to benchmark the two options or do proper research.
Thanks

Related

How to find the differences between two objects using IEquatable?

I am comparing two objects of the same type by implementing the IEquatable interface on the class. If they are not equal, I update the DB; otherwise, I leave it as it is. The context is that I need to update a table with data coming from an Excel sheet, and update only when the data has changed.
Below is the code for this:
var constructionMaterialTypes = package.GetObjectsFromSheet<ConstructionMaterialTypeExcelDto>(ConstructionDataSheetNames.CONSTRUCTION_MATERIAL_TYPE,
ConstructionDataTableNames.ConstructionMaterialType);
var materialTypes = new List<ConstructionMaterialType>();
foreach (var materialType in constructionMaterialTypes)
{
var materialTypeId = GetGuidFromMD5Hash(materialType.name);
List<string> materialTypeNotes = new();
if (!string.IsNullOrEmpty(materialType.notes))
{
materialTypeNotes.Add(materialType.notes);
}
var existingMaterialType = ctx.ConstructionMaterialTypes.SingleOrDefault(cm => cm.Id == materialTypeId);
var constructionMaterialType = new ConstructionMaterialType
{
Id = materialTypeId,
Name = materialType.name,
NotesHTML = materialTypeNotes
};
if (existingMaterialType != default)
{
if (existingMaterialType != constructionMaterialType) // Object comparison happening here
{
existingMaterialType.Name = materialType.name;
existingMaterialType.NotesHTML = materialTypeNotes;
}
}
else
{
materialTypes.Add(constructionMaterialType);
}
}
Below is the class in which I implement the IEquatable interface:
public sealed class ConstructionMaterialType : IIdentity<Guid>, IEquatable<ConstructionMaterialType>
{
[Key]
public Guid Id { get; set; }
public string Name { get; set; }
public List<string> NotesHTML { get; set; }
public bool Equals(ConstructionMaterialType other)
{
if (other is null)
return false;
return this.Id == other.Id
&& this.Name == other.Name
&& this.NotesHTML == other.NotesHTML;
}
public override bool Equals(object obj) => Equals(obj as ConstructionMaterialType);
public override int GetHashCode()
{
int hash = 19;
hash = hash * 31 + (Id == default ? 0 : Id.GetHashCode());
hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode(StringComparison.OrdinalIgnoreCase));
hash = hash * 31 + (NotesHTML == null ? 0 : NotesHTML.GetHashCode());
return hash;
}
}
The condition existingMaterialType != constructionMaterialType is always true, even if both objects hold the same values.
I am not sure what I am doing wrong in the above code. Could anyone please point me in the right direction?
Many thanks in advance
You did not overload the != operator, so it still performs a reference comparison. You can use !existingMaterialType.Equals(constructionMaterialType) instead.
this.NotesHTML == other.NotesHTML does a reference comparison of the two lists, so even if both contain exactly the same strings, it returns false if the two lists are different instances. You might want to use this.NotesHTML.SequenceEqual(other.NotesHTML) instead (this needs some adaptation if NotesHTML can be null).
Note: GetHashCode must deliver the same result for all objects that compare equal. So if you change anything in the Equals method, you probably also have to change GetHashCode. Since objects that compare non-equal are not required to have different hash codes, one option is simply to leave some properties out of the hash. Here: just omit the line with NotesHTML.
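The three points above can be put together roughly as follows. This is a minimal sketch, not the asker's class: MaterialType is a simplified, hypothetical stand-in for ConstructionMaterialType (no IIdentity<Guid>, no [Key] attribute), and it assumes that two null note lists should count as equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-in for ConstructionMaterialType.
public sealed class MaterialType : IEquatable<MaterialType>
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<string> NotesHTML { get; set; }

    public bool Equals(MaterialType other)
    {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return Id == other.Id
            && Name == other.Name
            && NotesEqual(NotesHTML, other.NotesHTML);
    }

    // Compares list contents in order; assumes two null lists are equal.
    private static bool NotesEqual(List<string> a, List<string> b)
    {
        if (a is null || b is null) return a is null && b is null;
        return a.SequenceEqual(b);
    }

    public override bool Equals(object obj) => Equals(obj as MaterialType);

    // NotesHTML is deliberately left out of the hash: objects that are
    // equal still produce equal hash codes, which is all that is required.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 19;
            hash = hash * 31 + Id.GetHashCode();
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            return hash;
        }
    }

    // Without these overloads, == and != compare references for a class.
    public static bool operator ==(MaterialType left, MaterialType right)
        => left is null ? right is null : left.Equals(right);

    public static bool operator !=(MaterialType left, MaterialType right)
        => !(left == right);
}
```

With the operators overloaded, the original condition existingMaterialType != constructionMaterialType compares values rather than references.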

Best way to find values not in two lists c#

I have two lists which I need to compare (carOptions and custOptions).
Both of these lists are in my Customer class like below:
public class CustomerDTO
{
public int CustomerId { get; set; }
//other props removed for brevity
public List<OptionDTO> SelectedCarOptions { get; set; }
public List<OptionDTO> SelectedCustomerOptions { get; set; }
}
var existingData = _myRepository.GetDataByCustomer(customerId, year);
var existingCarOptions = existingData.Select(f => f.SelectedCarOptions);
var existingCustomerOptions = existingData.Select(f => f.SelectedCustomerOptions);
existingData is an IEnumerable of CustomerDTO and then existingCarOptions and existingCustomerOptions is an IEnumerable<List<OptionDTO>>
In the method, I have a list of IEnumerable<OptionDTO> options that gets passed in. I then break this down into car or customer based on the Enum as below:
var newCarOptions = options.Where(o => o.OptionTypeID == OptionType.CarOptions);
var newCustomerOptions = options.Where(o => o.OptionTypeID == OptionType.CustomerOptions).ToList();
What I need to do is find which options are in one collection but not in the other.
I tried the code below, but I am getting an error on the Except call (maybe I need to create my own static method in that class), and I am not sure this is really the best approach.
if (existingCarOptions.Count() != newCarOptions.Count())
{
//var test = newCarOptions.Except(existingCarOptions);
}
if (existingCustomerOptions.Count() != newCustomerOptions.Count())
{
//var test2 = newCustomerOptions.Except(existingCustomerOptions);
}
It is also quite a bit of code in the method. I could split it out into separate methods if required, but perhaps there is a simpler way to achieve this?
Assuming OptionDTO has a property called Id that uniquely identifies an option (change the code accordingly if this is not the case), you can use HashSets to quickly find unmatched OptionDTOs while keeping the overall time cost at O(n), where n is the total number of options.
Create the existing options sets:
var existingCarOptions = existingData.SelectMany(d => d.SelectedCarOptions).Select(o => o.Id);
var existingCustomerOptions = existingData.SelectMany(d => d.SelectedCustomerOptions).Select(o => o.Id);
var existingCarOptionsIds = new HashSet<int>(existingCarOptions);
var existingCustomerOptionsIds = new HashSet<int>(existingCustomerOptions);
Then you extract options missing in existing sets with:
var unmatchedCarOptions = newCarOptions.Where(o => !existingCarOptionsIds.Contains(o.Id));
var unmatchedCustomerOptions = newCustomerOptions.Where(o => !existingCustomerOptionsIds.Contains(o.Id));
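The question asks for options that are in one collection but not in the other, so a two-way comparison may be useful as well. The following is a minimal sketch under the same assumption of an integer Id; OptionDiff and Compare are hypothetical names, and the method works on the Ids only.

```csharp
using System;
using System.Collections.Generic;

public static class OptionDiff
{
    // Returns the Ids present only in newIds (Added) and the Ids present
    // only in existingIds (Removed). Hypothetical helper, not from the answer.
    public static (HashSet<int> Added, HashSet<int> Removed) Compare(
        IEnumerable<int> existingIds, IEnumerable<int> newIds)
    {
        var existing = new HashSet<int>(existingIds);
        var incoming = new HashSet<int>(newIds);

        // Removed = existing minus incoming (incoming is still the full new set here).
        var removed = new HashSet<int>(existing);
        removed.ExceptWith(incoming);

        // Added = incoming minus existing.
        incoming.ExceptWith(existing);

        return (incoming, removed);
    }
}
```

Each set operation is linear in the size of its inputs, so the overall cost stays O(n).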
If you want to compare two objects of a class, you can implement IEqualityComparer<T>:
public class OptionComparer : IEqualityComparer<OptionDTO>
{
public bool Equals(OptionDTO x, OptionDTO y)
{
if (object.ReferenceEquals(x, y))
{
return true;
}
if (object.ReferenceEquals(x, null) ||
object.ReferenceEquals(y, null))
{
return false;
}
return x.OptionTypeID == y.OptionTypeID ;
}
public int GetHashCode(OptionDTO obj)
{
if (obj == null)
{
return 0;
}
return obj.OptionTypeID.GetHashCode();
}
}
This comparer defines what equality means for OptionDTO objects.
Now we can find the values that differ:
public List<OptionDTO> CalculateDiffBetweenLists(List<OptionDTO> left, List<OptionDTO> right)
{
// Items of left that have no equal item in right, according to OptionComparer.
return left.Except(right, new OptionComparer()).ToList();
}

C# RemoveAll with Generic Lists showing error

I'm writing a SAT solver (mainly DPLL or partial DPLL) and I have a method for unit propagation. Basically, it checks whether there are any standalone literals, then removes that literal and any instance of it found in the other clauses. An example would be
(x) (x,y) (w,z)
The unit literal would be 'x', and performing the unit propagation would leave only (w,z).
In this method I have several nested foreach loops over List<literals>, where literals is a custom class with two members: negation (bool) and literalCharacter (char).
The code is below, and I explain it underneath:
foreach (clauses c1 in listOfClauses)
{
if (c1.listOfLiterals.Count == 1)
{
literals l1 = c1.listOfLiterals[0];
solved.Add(l1);
foreach (clauses c2 in listOfClauses)
{
List<literals> tempList = new List<literals>();
foreach (literals l2 in listOfLiterals)
{
if (l2.solveDPLL(l1))
{
removable.Add(c2);
}
else
{
if (c2.listOfLiterals.Count == 1)
{
UNSAT = true;
return false;
}
else
{
if (l1.solveDPLL(l2))
{
tempList.Add(l2);
}
}
}
c2.listOfLiterals.RemoveAll(tempList); //obviously giving error
}
}
}
}
return true;
}
I have two List<literals>, tempList and listOfLiterals, where the latter is the "parent".
I am trying to remove the entries of listOfLiterals that match entries in tempList. I use c2.listOfLiterals.RemoveAll(tempList);, which obviously produces an error because the argument is not a delegate.
I've searched a lot, including on Stack Overflow, but every example compares against an ID or an integer. In my case, since I'm just comparing two lists, how can I write the delegate so that the entries that appear in both listOfLiterals and tempList are removed from listOfLiterals?
Many thanks
EDIT:
Literals Class
public class literals
{
public char literalCharacter { get; set; }
public bool negation { get; set; }
public literals(char lc, bool neg )
{
literalCharacter = lc;
negation = neg;
}
public bool solveDPLL (literals lit)
{
return ((Object.Equals(literalCharacter, lit.literalCharacter) && (negation == lit.negation)));
}
public String toString()
{
return literalCharacter + " : " + !negation;
}
}
If you're okay with using a little LINQ magic:
c2.listOfLiterals = c2.listOfLiterals.Except(tempList).ToList();
Or loop over the tempList:
foreach (var item in tempList)
{
c2.listOfLiterals.Remove(item);
}
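Both approaches rely on Equals, so you may need your literals class to override Equals and GetHashCode (ideally also implementing IEquatable<literals>), or you can pass a separate IEqualityComparer<literals> to Except. Note that Except also removes duplicates from the result. See the MSDN page for Except for a good example of this.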
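Since the question asks specifically how to write the delegate, here is a hedged sketch of RemoveAll with a predicate. Literal, Matches, and RemoveMatching are hypothetical names standing in for the question's literals class and its solveDPLL method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's literals class.
public sealed class Literal
{
    public char Character { get; }
    public bool Negation { get; }

    public Literal(char character, bool negation)
    {
        Character = character;
        Negation = negation;
    }

    // Same comparison as solveDPLL in the question: character and negation must match.
    public bool Matches(Literal other) =>
        Character == other.Character && Negation == other.Negation;
}

public static class LiteralLists
{
    // Removes every element of source that matches some element of toRemove.
    // List<T>.RemoveAll takes a Predicate<T> and returns the number of removed items.
    public static int RemoveMatching(List<Literal> source, List<Literal> toRemove)
    {
        return source.RemoveAll(item => toRemove.Any(r => r.Matches(item)));
    }
}
```

Because the predicate calls the comparison method directly, this version does not depend on Equals or GetHashCode being overridden.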

Loading the Table little by little in chunks

To avoid running out of memory by bringing in the whole table at once, I am loading it in chunks of LOAD_SIZE records.
Here is how I am doing it. I feel like some indexes may be off by one record, and that there may be performance improvements I could make.
So I wanted to have your opinion on this approach.
int totalCount = repo.Context.Employees.Count();
int startRow = 0;
while (startRow <= totalCount)
{
repo.PaginateEmployees(startRow, LOAD_SIZE);
startRow = startRow + LOAD_SIZE ;
}
public List<EmpsSummary> PaginateEmployees(int startRow, int loadSize)
{
var query = (from p in this.Context.Employees
.Skip(startRow).Take(loadSize)
select new EmpsSummary
{
FirstName = p.FirstName,
LastName = p.LastName,
Phone = p.Phone
});
return query.ToList();
}
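On the off-by-one concern: with while (startRow <= totalCount), one extra query that returns nothing is issued whenever totalCount is an exact multiple of LOAD_SIZE, so startRow < totalCount would be enough. The sketch below is one possible way to page without a separate Count() call; Paging and InChunks are hypothetical names, and the explicit ordering is an assumption added here because Skip and Take are only well defined over a stable order (some LINQ providers reject or warn about Skip without one).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Lazily yields pages of at most pageSize items, ordered by keySelector.
    // Stops on the first empty or short page, so no total count is needed.
    public static IEnumerable<List<T>> InChunks<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> keySelector,
        int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        IQueryable<T> ordered = source.OrderBy(keySelector);
        for (int startRow = 0; ; startRow += pageSize)
        {
            List<T> page = ordered.Skip(startRow).Take(pageSize).ToList();
            if (page.Count == 0) yield break;      // nothing left
            yield return page;
            if (page.Count < pageSize) yield break; // last, partial page
        }
    }
}
```

When the row count is an exact multiple of pageSize, this still issues one final empty query; keeping the Count() call and looping while startRow < totalCount avoids that at the cost of the extra count query.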
Because of how LINQ works (lazy loading and hash compares), if you formulate your statements correctly it will manage memory much better than you would be able to yourself.
From your comments (which should be added to the question) I offer this solution which should manage memory for you just fine.
This example code is not intended to compile -- it is given as an example
// insert list
List<EmpsSummary> insertList;
// add stuff to insertList
List<EmpsSummary> filteredList = insertList.Except(this.Context.Employees).ToList();
This assumes that this.Context.Employees is of type EmpsSummary. If it isn't you have to cast it to the correct type.
Also, you will need to be able to compare EmpsSummary objects. To do so, implement IEquatable like this:
This example code is not intended to compile -- it is given as an example
public class EmpsSummary : IEquatable<EmpsSummary>
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string Phone { get; set; }
public bool Equals(EmpsSummary other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the employees' properties are equal.
//The == operator on string is null-safe, unlike calling Equals on a possibly null property.
return FirstName == other.FirstName &&
LastName == other.LastName &&
Phone == other.Phone;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
int hashProductFirstName = FirstName == null ? 0 : FirstName.GetHashCode();
int hashProductLastName = LastName == null ? 0 : LastName.GetHashCode();
int hashProductPhone = Phone == null ? 0 : Phone.GetHashCode();
//Calculate the hash code
return hashProductFirstName ^ hashProductLastName ^ hashProductPhone;
}
}

MSTest: CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of

Question:
Can anyone tell me why my unit test is failing with this error message?
CollectionAssert.AreEquivalent failed. The expected collection contains 1
occurrence(s) of . The actual
collection contains 0 occurrence(s).
Goal:
I'd like to check if two lists are identical. They are identical if both contain the same elements with the same property values. The order is irrelevant.
Code example:
This is the code which produces the error. list1 and list2 are identical, i.e. a copy-paste of each other.
[TestMethod]
public void TestListOfT()
{
var list1 = new List<MyPerson>()
{
new MyPerson()
{
Name = "A",
Age = 20
},
new MyPerson()
{
Name = "B",
Age = 30
}
};
var list2 = new List<MyPerson>()
{
new MyPerson()
{
Name = "A",
Age = 20
},
new MyPerson()
{
Name = "B",
Age = 30
}
};
CollectionAssert.AreEquivalent(list1.ToList(), list2.ToList());
}
public class MyPerson
{
public string Name { get; set; }
public int Age { get; set; }
}
I've also tried this line (source)
CollectionAssert.AreEquivalent(list1.ToList(), list2.ToList());
and this line (source)
CollectionAssert.AreEquivalent(list1.ToArray(), list2.ToArray());
P.S.
Related Stack Overflow questions:
I've seen both these questions, but the answers didn't help.
CollectionAssert use with generics?
Unit-testing IList with CollectionAssert
You are absolutely right. Unless you provide something like an IEqualityComparer<MyPerson> or override MyPerson.Equals() (together with GetHashCode()), the MyPerson objects are compared with object.Equals, which for a class means reference equality. Since the two lists contain different instances, the assert fails.
It works if I add an IEqualityComparer<T> as described on MSDN and if I use Enumerable.SequenceEqual. Note however, that now the order of the elements is relevant.
In the unit test
//CollectionAssert.AreEquivalent(list1, list2); // Does not work
Assert.IsTrue(list1.SequenceEqual(list2, new MyPersonEqualityComparer())); // Works
IEqualityComparer
public class MyPersonEqualityComparer : IEqualityComparer<MyPerson>
{
public bool Equals(MyPerson x, MyPerson y)
{
if (object.ReferenceEquals(x, y)) return true;
if (object.ReferenceEquals(x, null) || object.ReferenceEquals(y, null)) return false;
return x.Name == y.Name && x.Age == y.Age;
}
public int GetHashCode(MyPerson obj)
{
if (object.ReferenceEquals(obj, null)) return 0;
int hashCodeName = obj.Name == null ? 0 : obj.Name.GetHashCode();
int hashCodeAge = obj.Age.GetHashCode();
return hashCodeName ^ hashCodeAge;
}
}
I was getting this same error when testing a collection persisted by nHibernate. I was able to get this to work by overriding both the Equals and GetHashCode methods. If I didn't override both I still got the same error you mentioned:
CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of .
The actual collection contains 0 occurrence(s).
I had the following object:
public class EVProjectLedger
{
public virtual long Id { get; protected set; }
public virtual string ProjId { get; set; }
public virtual string Ledger { get; set; }
public virtual AccountRule AccountRule { get; set; }
public virtual int AccountLength { get; set; }
public virtual string AccountSubstrMethod { get; set; }
private Iesi.Collections.Generic.ISet<Contract> myContracts = new HashedSet<Contract>();
public virtual Iesi.Collections.Generic.ISet<Contract> Contracts
{
get { return myContracts; }
set { myContracts = value; }
}
public override bool Equals(object obj)
{
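// Use 'as' so that null or an object of another type yields false instead of throwing.
EVProjectLedger evProjectLedger = obj as EVProjectLedger;
if (evProjectLedger == null) return false;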
return ProjId == evProjectLedger.ProjId && Ledger == evProjectLedger.Ledger;
}
public override int GetHashCode()
{
return new { ProjId, Ledger }.GetHashCode();
}
}
Which I tested using the following:
using (ITransaction tx = session.BeginTransaction())
{
var evProject = session.Get<EVProject>("C0G");
CollectionAssert.AreEquivalent(TestData._evProjectLedgers.ToList(), evProject.EVProjectLedgers.ToList());
tx.Commit();
}
I'm using nHibernate which encourages overriding these methods anyways. The one drawback I can see is that my Equals method is based on the business key of the object and therefore tests equality using the business key and no other fields. You could override Equals however you want but beware of equality pollution mentioned in this post:
CollectionAssert.AreEquivalent failing... can't figure out why
If you would like to achieve this without having to write an equality comparer, there is a unit testing library called FluentAssertions that you can use:
https://fluentassertions.com/documentation/
It has many built-in equality extension methods, including ones for collections. You can install it through NuGet and it's really easy to use.
Taking the example in the question above, all you have to write in the end is
list1.Should().BeEquivalentTo(list2);
By default, BeEquivalentTo ignores the order of the items in the two collections; if the order matters, this can be changed with the WithStrictOrdering() option.
I wrote this to test collections where the order is not important:
public static bool AreCollectionsEquivalent<T>(ICollection<T> collectionA, ICollection<T> collectionB, IEqualityComparer<T> comparer)
{
if (collectionA.Count != collectionB.Count)
return false;
foreach (var a in collectionA)
{
if (!collectionB.Any(b => comparer.Equals(a, b)))
return false;
}
return true;
}
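Not as elegant as using SequenceEqual, but it works. Note that it does not count duplicates: two collections of the same size that contain the same distinct items in different quantities are still reported as equivalent.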
Of course to use it you simply do:
Assert.IsTrue(AreCollectionsEquivalent<MyType>(collectionA, collectionB, comparer));
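If duplicates matter, a variant that counts occurrences may be a better fit. This is a minimal sketch, not part of the answer above: CollectionEquivalence and AreEquivalent are hypothetical names, the comparer's GetHashCode is assumed to be consistent with its Equals, and elements are assumed to be non-null because they are used as dictionary keys.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionEquivalence
{
    // True when both sequences contain the same items the same number of
    // times, in any order. Runs in roughly O(n) instead of O(n^2).
    public static bool AreEquivalent<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        var counts = new Dictionary<T, int>(comparer);

        // Count the occurrences of each item in the first sequence.
        foreach (T item in first)
        {
            counts.TryGetValue(item, out int n);
            counts[item] = n + 1;
        }

        // Consume one occurrence for each item in the second sequence.
        foreach (T item in second)
        {
            if (!counts.TryGetValue(item, out int n)) return false;
            if (n == 1) counts.Remove(item);
            else counts[item] = n - 1;
        }

        // Anything left over appeared more often in the first sequence.
        return counts.Count == 0;
    }
}
```

It is used the same way as the method above, for example Assert.IsTrue(CollectionEquivalence.AreEquivalent(collectionA, collectionB, comparer)).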
