Remove all but 1 object in list based on grouping - c#

I have a list of objects with multiple properties in it. Here is the object.
public class DataPoint
{
private readonly string uniqueId;
public DataPoint(string uid)
{
this.uniqueId = uid;
}
public string UniqueId
{
get
{
return this.uniqueId;
}
}
public string ScannerID { get; set; }
public DateTime ScanDate { get; set; }
}
Now in my code, I have a giant list of these, hundreds maybe a few thousand.
Each data point object belongs to some type of scanner, and has a scan date. I want to remove any data points that were scanned on the same day except for the last one for a given machine.
I tried using LINQ as follows but this did not work. I still have many duplicate data points.
this.allData = this.allData.GroupBy(g => g.ScannerID)
.Select(s => s.OrderByDescending(o => o.ScanDate))
.First()
.ToList();
I need to group the data points by scanner ID, because there could be data points scanned on the same day but on a different machine. I only need the last data point for a day if there are multiple.
Edit for clarification - By last data point I mean the last scanned data point for a given scan date for a given machine. I hope that helps. So when grouping by scanner ID, I then tried to order by scan date and then only keep the last scan date for days with multiple scans.
Here is some test data for 2 machines:
Unique ID Scanner ID Scan Date
A1JN221169H07 49374 2003-02-21 15:12:53.000
A1JN22116BK08 49374 2003-02-21 15:14:08.000
A1JN22116DN09 49374 2003-02-21 15:15:23.000
A1JN22116FP0A 49374 2003-02-21 15:16:37.000
A1JOA050U900J 80354 2004-10-05 10:53:24.000
A1JOA050UB30K 80354 2004-10-05 10:54:39.000
A1JOA050UD60L 80354 2004-10-05 10:55:54.000
A1JOA050UF80M 80354 2004-10-05 10:57:08.000
A1JOA0600O202 80354 2004-10-06 08:38:26.000

I want to remove any data points that were scanned on the same day except for the last one for a given machine.
So I assume you want to group by both ScanDate and ScannerID. Here is the code:
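var result = dataPoints.GroupBy(i => new { i.ScanDate.Date, i.ScannerID })
.OrderByDescending(g => g.Key.Date)
// within each (day, scanner) group, keep the scan with the latest time
.Select(g => g.OrderByDescending(x => x.ScanDate).First())
.ToList();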

If I understand you correctly this is what you want.
var result = dataPoints.GroupBy(i => new { i.ScanDate.Date, i.ScannerID })
.Select(i => i.OrderBy(x => x.ScanDate).Last())
.ToList();
This groups by the scanner ID and the day (ScanDate.Date zeroes out the time portion), then orders each group by ScanDate (since every item in a group falls on the same day, this orders by time) and takes the last item. So for each day you get one result per scanner: the one with the latest ScanDate on that day.
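To make the grouping concrete, here is a minimal sketch run against a few rows of the question's test data. It assumes a console program using top-level statements, and it uses named tuples in place of the DataPoint class purely to keep the sample short; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A subset of the question's rows as (UniqueId, ScannerID, ScanDate) tuples,
// standing in for the DataPoint class.
var dataPoints = new List<(string UniqueId, string ScannerID, DateTime ScanDate)>
{
    ("A1JN221169H07", "49374", new DateTime(2003, 2, 21, 15, 12, 53)),
    ("A1JN22116FP0A", "49374", new DateTime(2003, 2, 21, 15, 16, 37)),
    ("A1JOA050U900J", "80354", new DateTime(2004, 10, 5, 10, 53, 24)),
    ("A1JOA050UF80M", "80354", new DateTime(2004, 10, 5, 10, 57, 8)),
    ("A1JOA0600O202", "80354", new DateTime(2004, 10, 6, 8, 38, 26)),
};

// One group per (scanner, calendar day); keep the latest scan in each group.
var lastPerDay = dataPoints
    .GroupBy(p => new { p.ScannerID, Day = p.ScanDate.Date })
    .Select(g => g.OrderByDescending(p => p.ScanDate).First())
    .OrderBy(p => p.ScanDate)
    .ToList();

// Prints: A1JN22116FP0A, A1JOA050UF80M, A1JOA0600O202
Console.WriteLine(string.Join(", ", lastPerDay.Select(p => p.UniqueId)));
```

Scanner 80354 keeps two rows because its scans fall on two different days.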

Just as an aside, the class could be defined as
public class DataPoint
{
public DataPoint(string uid)
{
UniqueId = uid;
}
public string UniqueId {get; private set; }
public string ScannerID { get; set; }
public DateTime ScanDate { get; set; }
}

Related

How to Sort a List based on the value of a list within the list in C#?

The question is probably a bit confusing, but basically I have a list whose items each contain another list. Like this:
public class Transaction
{
public string TransactionType { get; set; }
public List<TransactionDetails> Details { get; set; } = new List<TransactionDetails>();
}
public class TransactionDetails
{
public string TransactionDate { get; set; }
public double TransactionAmount { get; set; }
public double TransactionBalance { get; set; }
}
Now, I want to sort the Transaction list based on the TransactionDate inside the TransactionDetails list. If, say, I just wanted to sort it by TransactionType, I know that this is the way to do it (assuming the list is already populated):
List<Transaction> listTransactions;
List<Transaction> sortedListTransactions;
sortedListTransactions = listTransactions
.OrderBy(x => x.TransactionType)
.ToList();
I tried it and this worked. But when I applied basically the same method to my case, which is to sort the list based on TransactionDate, it doesn't give me the result I need. Please let me know if there is any way I could achieve this.
If you have to sort a list, you have to pick an element which defines the order (like a value, a date, etc.) If the property you like to use for sorting is also a list, you have to either pick one element from that inner list (e.g. min, max, first, last) or to aggregate all inner list values (like sum, average, median, concat) to one value.
One approach in your case could be to find the minimum element within the inner list and use this as criteria for sorting the outer elements. This could be something like this:
var sortedTransactions = transactions
.OrderBy(transaction => transaction.Details
// project to the date first: TransactionDetails itself is not comparable
.Select(detail => detail.TransactionDate)
.OrderBy(date => date)
.FirstOrDefault())
.ToList();
Going by your example, another way to sort the outer elements based on the inner list could be to take the sum of the amounts of all transactions, which would be something like this:
var sortedTransactions = transactions
.OrderBy(transaction => transaction.Details
.Sum(detail => detail.TransactionAmount))
.ToList();
Nevertheless, it is up to you to define the criteria of sorting on every level and then let it bubble up to give the sorting criteria back.
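Because TransactionDate is a string, the result also depends on the date format: strings compare character by character, so only a format like yyyy-MM-dd sorts chronologically. The sketch below assumes the dates are stored as dd/MM/yyyy (the question does not say) and parses them before taking the minimum. The tuple shape and the Parse helper are illustrative stand-ins for the Transaction classes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// (Type, Dates) stands in for Transaction with its Details[].TransactionDate values.
var transactions = new List<(string Type, List<string> Dates)>
{
    ("Deposit",  new List<string> { "10/03/2021", "15/01/2021" }),
    ("Withdraw", new List<string> { "31/12/2020" }),
    ("Transfer", new List<string>()),                 // no details at all
    ("Payment",  new List<string> { "01/02/2021", "20/02/2021" }),
};

// Assumed format: dd/MM/yyyy. Adjust to whatever the real data uses.
DateTime Parse(string s) =>
    DateTime.ParseExact(s, "dd/MM/yyyy", CultureInfo.InvariantCulture);

var sortedTypes = transactions
    // Sort key: earliest detail date; transactions without details sort last.
    .OrderBy(t => t.Dates.Select(Parse).DefaultIfEmpty(DateTime.MaxValue).Min())
    .Select(t => t.Type)
    .ToList();

// Prints: Withdraw, Deposit, Payment, Transfer
Console.WriteLine(string.Join(", ", sortedTypes));
```

Sorting the raw strings instead would put "01/02/2021" before "31/12/2020", which is one way the original attempt could have produced an unexpected order.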

LINQ - take x objects from database for each object property value

I have an entity:
public class Component
{
public Guid Id { get; set; }
public string Name { get; set; }
public ProductType Type { get; set; }
}
ProductType
public enum ProductType
{
Harddrive = 1,
GraphicCard,
ComputerCase,
}
I'm trying to get a list of Product that contains 15 random items (5 per ProductType) in a single LINQ query.
ComputerCase, GraphicCard and Harddrive inherit from the same base class.
For now I have something like this:
var response = db.Components
.Select(x => new Product
{
Id = x.Id,
Name = x.Name,
Type = x.Type,
}).ToList();
but I have no idea how I could achieve what I need. Can anyone help me with that?
Make groups of Components with the same ProductType. From every group take the first five Components, then flatten the groups back into a single sequence.
var result = myComponents // take the collection of Components
.GroupBy(component => component.Type) // group into groups of components with the same Type
.SelectMany(group => group.Take(5)) // from every group take the first five elements
.ToList();
Of course, if you really want a proper random selection, you'll have to fetch the Components into local memory and use Random to pick random elements from each group.
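A minimal in-memory sketch of that random selection is shown below. It assumes the components have already been loaded from the database; the sample data and the (Name, Type) tuples are made up for illustration. Each group is shuffled by sorting on a random key, which is simple but not the most efficient way to shuffle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: three types with eight components each.
var types = new[] { "Harddrive", "GraphicCard", "ComputerCase" };
var components = types
    .SelectMany(type => Enumerable.Range(1, 8)
        .Select(i => (Name: $"{type}-{i}", Type: type)))
    .ToList();

var rng = new Random();

var picked = components
    .GroupBy(c => c.Type)
    // Shuffle each group by ordering on a random key, then keep five of them.
    .SelectMany(g => g.OrderBy(_ => rng.Next()).Take(5))
    .ToList();

foreach (var c in picked)
    Console.WriteLine($"{c.Type}: {c.Name}");
```

The result holds 15 items, five per type, and the selection changes from run to run.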

LINQ - AND, ANY and NOT Query

I am fairly new to LINQ. I can do some basic stuff with it, but I am in need of an expert.
I am using Entity Framework and I have a table that has 3 columns.
public class Aspect
{
[Key, Column(Order = 0)]
public int AspectID { get; set; }
[Key, Column(Order = 1)]
public int AspectFieldID { get; set; }
public string Value { get; set; }
}
I have 3 lists of words from a user's input. One contains phrases or words that must all be in the Value field (AND), another contains phrases or words of which at least one must be in the Value field (ANY), and the last contains phrases or words that must not be found in the Value field (NOT).
I need to get every record that has all of the ALL words, any of the ANY words and none of the NOT words.
Here are my objects.
public class SearchAllWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchAnyWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchNotWord
{
public string Word { get; set; }
}
What I have so far is this,
var aspectFields = getAspectFieldIDs().Where(fieldID => fieldID > 0).ToList();//retrieves a list of AspectFieldID's that match user input
var result = db.Aspects
.Where(p => aspectFields.Contains(p.AspectFieldID))
.ToList();
Any and all help is appreciated.
First let me say, if this is your requirement... your query will read every record in the database. This is going to be a slow operation.
IQueryable<Aspect> query = db.Aspects.AsQueryable();
//note, if AllWords is empty, query is not modified.
foreach(SearchAllWord x in AllWords)
{
//important, lambda should capture local variable instead of loop variable.
string word = x.Word;
query = query.Where(aspect => aspect.Value.Contains(word));
}
foreach(SearchNotWord x in NotWords)
{
string word = x.Word;
query = query.Where(aspect => !aspect.Value.Contains(word));
}
if (AnyWords.Any()) //haha!
{
List<string> words = AnyWords.Select(x => x.Word).ToList();
query =
from aspect in query
from word in words //does this work in EF?
where aspect.Value.Contains(word)
group aspect by aspect into g
select g.Key;
}
If you're sending this query to SQL Server, be aware of the ~2100 parameter limit. Each word is going to be sent as a parameter.
What you need are the set operators, specifically
Intersect
Any
Bundle up your "all" words into a string array (or some other enumerable) and then you can use intersect and count to check they are all present.
Here are two sets
var A = new string[] { "first", "second", "third" };
var B = new string[] { "second", "third" };
A is a superset of B?
var isSuperset = A.Intersect(B).Count() == B.Count();
A is disjoint with B?
var isDisjoint1 = !A.Intersect(B).Any();
var isDisjoint2 = !A.Any(a => B.Any(b => a == b)); //same check written with nested Any
Your objects are not strings, so you will either want to project them to strings first (for example with Select(x => x.Word)) or use the overload of Intersect that takes an IEqualityComparer<T>.
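Put together, the set operators above could be applied to the ALL / ANY / NOT lists roughly as follows. This is an in-memory sketch only. It assumes whole-word matching on space-separated values, which is stricter than the substring Contains used in the other answer, and the sample values and word lists are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample Value strings and word lists (already projected to plain strings).
var values = new[] { "fast red car", "slow red truck", "fast blue car", "red bicycle" };
var allWords = new[] { "red" };
var anyWords = new[] { "car", "truck" };
var notWords = new[] { "slow" };

var cmp = StringComparer.OrdinalIgnoreCase;

var matches = values.Where(value =>
{
    var tokens = value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    // ALL: the intersection must contain every distinct required word.
    bool hasAll = allWords.Intersect(tokens, cmp).Count() == allWords.Distinct(cmp).Count();
    // ANY: at least one shared word (an empty ANY list imposes no restriction).
    bool hasAny = anyWords.Length == 0 || anyWords.Intersect(tokens, cmp).Any();
    // NOT: the two sets must be disjoint.
    bool hasNone = !notWords.Intersect(tokens, cmp).Any();
    return hasAll && hasAny && hasNone;
}).ToList();

// Prints: fast red car
Console.WriteLine(string.Join(", ", matches));
```

Whether an equivalent expression translates to SQL depends on the provider, so treat this as a description of the logic rather than a ready-made Entity Framework query.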
And now some soapboxing.
Much as I love Linq2sql it is not available in ASP.NET Core and the EF team wants to keep it that way, probably because jerks like me keep saying "gross inefficiency X of EF doesn't apply to Linq2Sql".
Core is the future. Small, fast and cross platform, it lets you serve a Web API from a Raspberry Pi running Windows IOT or Linux -- or get ridiculously high performance on big hardware.
EF is not and probably never will be a high performance proposition because it takes control away from you while insisting on being platform agnostic, which prevents it from exploiting the platform.
In the absence of Linq2sql, the solution seems to be libraries like Dapper, which handle parameters when sending the query and map results into object graphs when the result arrives, but otherwise don't do much. This makes them more or less platform agnostic but still lets you exploit the platform - apart from parameter substitution your SQL is passthrough.

Algorithm to calculate frequency and recency of an entity?

I have a list of entities opened by various users.
I keep track of each access of any entity by storing access dates and times as the following:
public class Entity
{
public int Id { get; set; }
public virtual ICollection<AccessInfo> Accesses { get; set; }
= new HashSet<AccessInfo>();
}
public class AccessInfo
{
public int Id { get; set; }
public AccessInfoType Type { get; set; }
public User User { get; set; }
public DateTime DateTime { get; set; }
}
public enum AccessInfoType
{
Create,
Read,
Update,
Delete,
}
Now I'm trying to make an algorithm that filters the most wanted contacts based on both factors: recency and frequency.
I want contacts that were accessed 5 times yesterday to be prioritized over a contact that was accessed 30 times a week ago. On the other hand, a contact that was accessed only once today is less important.
Is there an official name for this? I'm sure people have worked on a frequency calculation like this one before, and I'd like to read about this before I spend some time coding.
I thought about calculating the sum of the access dates in recent month and sort accordingly but I'm still not sure it's the right way, I'd love to learn from the experts.
return Entities
.OrderBy(c =>
c.Accesses
.Where(a => a.User.UserName == UserName)
.Where(a => a.DateTime > lastMonth)
.Select(a => a.DateTime.Ticks)
.Sum());
Exponential decay is what you're looking for. See this link:
http://www.evanmiller.org/rank-hotness-with-newtons-law-of-cooling.html
I would use a heuristic that assigns points to Entities for access and uses some kind of decay on those points.
For example, you could give an entity 1 point every time it is accessed, and once every day multiply all the points by a factor of 0.8
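The daily multiplication can also be computed on demand from the stored access dates: each access contributes factor^(age in days) to the score. The sketch below assumes a decay factor of 0.7 and a fixed "now" so the numbers are reproducible; both are illustrative choices, and the entity names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var now = new DateTime(2024, 1, 10, 12, 0, 0);   // fixed reference time for the example
const double dailyDecay = 0.7;                   // assumed tuning value

// Each access is worth 1 point at the moment it happens and loses value every day.
double Score(IEnumerable<DateTime> accesses) =>
    accesses.Sum(t => Math.Pow(dailyDecay, (now - t).TotalDays));

var entities = new Dictionary<string, List<DateTime>>
{
    ["fiveYesterday"]  = Enumerable.Repeat(now.AddDays(-1), 5).ToList(),
    ["thirtyLastWeek"] = Enumerable.Repeat(now.AddDays(-7), 30).ToList(),
    ["onceToday"]      = new List<DateTime> { now },
};

var ranked = entities
    .OrderByDescending(e => Score(e.Value))
    .Select(e => e.Key)
    .ToList();

// Prints: fiveYesterday, thirtyLastWeek, onceToday
Console.WriteLine(string.Join(", ", ranked));
```

The factor needs tuning against the ordering you want. With 0.7 the scores are about 3.5, 2.47 and 1.0, which matches the priorities in the question. With 0.8, thirty accesses a week ago (about 6.29) would still outrank five accesses yesterday (4.0).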

Get all records with inner record from last month - MongoDB C# SDK

I'm trying to get all the users that had any kind of activity in the last month from my MongoDB database, using C# SDK.
My User record contains a list of statistical records (as ObjectId) with a creation date.
public class UserRecord
{
private string firstName;
public ObjectId Id
{
get;
set;
}
public List<ObjectId> Statistics
{
get;
set;
}
}
And my query builder function looks like this:
static IMongoQuery GenerateGetLiveUsersQuery(DateTime lastStatistic)
{
List<IMongoQuery> queries = new List<IMongoQuery>();
queries.Add((Query<UserRecord>.GTE(U =>
U.Statistics.OrderByDescending(x => x.CreationTime).First().CreationTime
, lastStatistic)));
///.... More "queries.Add"...
return Query.And(queries);
}
Not sure what I'm doing wrong, but I get a System.NotSupportedException while trying to build the query (the GTE query):
Unable to determine the serialization information for the expression:
(UserRecord U) =>
Enumerable.First(Enumerable.OrderByDescending(U.Statistics, (ObjectId x) => x.CreationTime)).CreationTime.
The following query should work
var oId = new ObjectId(lastStatistic, 1, 1, 1);
var query = Query<UserRecord>.GTE(e => e.Statistics, oId);
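You can create an ObjectId based on the lastStatistic date, which is your cut-off. This works because the leading bytes of an ObjectId encode its creation timestamp, so ObjectIds compare in roughly chronological order. Then you can just query the UserRecord collection to find any records that have an item in the Statistics list that is greater than or equal to your ObjectId.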
