C# - Custom GroupBy taking too long for a large dataset

The code below groups the result (a List of ClassTypeObject with 500,000 items) into a list of an anonymous type.
The GroupBy takes around 40 to 50 seconds to execute. Is there any way to optimize this?
var groupByTest = result
    .GroupBy(g => new
    {
        First = g.Field1
    })
    .Select(gp => new
    {
        Field1 = gp.Key.First,
        InnerList = result.Where(x => x.Field1 == gp.Key.First).ToList()
    })
    .ToList();

You are selecting InnerList from the non-grouped collection, i.e. result, which is why your query takes so long. You can change the inner assignment to
InnerList = gp.ToList()
because gp is already grouped by Field1.
Full code:
var groupByTest = result
    .GroupBy(g => new
    {
        First = g.Field1
    })
    .Select(gp => new
    {
        Field1 = gp.Key.First,
        InnerList = gp.ToList()
    })
    .ToList();

The way the original query is written, InnerList ends up containing just the items in the group, but the original source is scanned once for each group key. The equivalent query below scans the source only once:
var groupByTest = result
    .GroupBy(g => g.Field1)
    .Select(gp => new
    {
        Field1 = gp.Key,
        InnerList = gp.ToList()
    })
    .ToList();
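To make the difference concrete, here is a small self-contained sketch that counts how many elements the original Where(...) inspects compared with a single GroupBy pass. The Item record and the GroupingCost methods are hypothetical stand-ins for the question's ClassTypeObject and query, not code from the answers.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassTypeObject.
public record Item(int Field1, string Payload);

public static class GroupingCost
{
    // Mimics the original query: for every group key, Where(...) re-scans
    // the whole source. Returns how many elements were inspected in total.
    public static long CountRescanComparisons(IReadOnlyList<Item> source)
    {
        long comparisons = 0;
        _ = source
            .GroupBy(x => x.Field1)
            .Select(gp => source.Where(x =>
            {
                comparisons++;
                return x.Field1 == gp.Key;
            }).ToList())
            .ToList();
        return comparisons;
    }

    // The fixed query: each group already holds its own elements,
    // so GroupBy enumerates the source only once.
    public static Dictionary<int, List<Item>> GroupOnce(IEnumerable<Item> source) =>
        source
            .GroupBy(x => x.Field1)
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

With N items and K distinct keys, the re-scanning version inspects N × K elements, which is consistent with the slowdown described in the question for 500,000 items.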
Once this is fixed, the query can easily be parallelized with AsParallel():
var groupByTest = result.AsParallel()
    .GroupBy(g => g.Field1)
    .Select(gp => new
    {
        Field1 = gp.Key,
        InnerList = gp.ToList()
    })
    .ToList();
This uses all available cores to partition the data, group it, and construct the final list.
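One caveat: PLINQ does not guarantee the order of its output unless the query is explicitly ordered, so if the list should come back in a stable order, sort it after grouping. A minimal sketch, using plain integers grouped by a hypothetical modulo key instead of the question's Field1:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParallelGrouping
{
    // Groups in parallel, then sorts sequentially by key so the result
    // order is deterministic. Items inside each group are sorted too,
    // because PLINQ may change the order in which elements reach a group.
    public static List<(int Key, List<int> Items)> GroupByModulo(IEnumerable<int> source, int modulo) =>
        source
            .AsParallel()
            .GroupBy(x => x % modulo)
            .Select(g => (Key: g.Key, Items: g.OrderBy(x => x).ToList()))
            .AsSequential()
            .OrderBy(t => t.Key)
            .ToList();
}
```

For small inputs the cost of partitioning can outweigh the gain, so it is worth timing both versions on real data.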

Related

How to OrderBy() as per the order requested in distinct string list

How to use OrderBy to shape the output in the same order as the requested distinct list:
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
    var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
        .OrderBy(----------) //TODO: RE-ORDER IN THE SAME ORDER AS REQUESTED requestedDataPointList
        .ToList();
    dataPoints.ForEach(dp =>
    {
        ....
    });
}
Do the sorting on the client side:
public DataCollectionList GetLatestDataCollection(List<string> requestedDataPointList)
{
    var dataPoints = _context.DataPoints.Where(c => requestedDataPointList.Contains(c.dataPointName))
        .AsEnumerable() // switch to LINQ to Objects so IndexOf runs in memory
        .OrderBy(c => requestedDataPointList.IndexOf(c.dataPointName));
    foreach (var dp in dataPoints)
    {
        ....
    }
}
NOTE: Also, I don't think ToList().ForEach() is ever better than foreach ().
I think the fastest method is to join the result back with the request list. This relies on the fact that LINQ's join preserves the order of the first (outer) list:
var dataPoints = _context.DataPoints
    .Where(c => requestedDataPointList.Contains(c.dataPointName))
    .ToList();
var ordered = from n in requestedDataPointList
              join dp in dataPoints on n equals dp.dataPointName
              select dp;
foreach (var dataPoint in ordered)
{
    ...
}
This doesn't involve any sorting; the join does it all. Because Join builds a hash lookup over dataPoints, it runs in close to O(n) time.
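To check the ordering behaviour without a database, here is a sketch that runs the same join against in-memory lists. The DataPoint record and the RequestOrder helper are hypothetical; only the join itself mirrors the answer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity returned by _context.DataPoints.
public record DataPoint(string dataPointName, double Value);

public static class RequestOrder
{
    // Join keeps the order of the outer sequence (the requested names)
    // and finds matches through a hash lookup built from dataPoints.
    public static List<DataPoint> OrderByJoin(
        IEnumerable<string> requestedDataPointList,
        IEnumerable<DataPoint> dataPoints) =>
        (from n in requestedDataPointList
         join dp in dataPoints on n equals dp.dataPointName
         select dp).ToList();
}
```

Because this is an inner join, a name requested twice yields its data point twice, and requested names with no matching row are skipped.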
Another fast method consists of creating a dictionary of sequence numbers:
var indexes = requestedDataPointList
    .Select((n, i) => new { n, i })
    .ToDictionary(x => x.n, x => x.i);
var ordered = dataPoints.OrderBy(dp => indexes[dp.dataPointName]);
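ToDictionary throws an ArgumentException if requestedDataPointList contains the same name twice, and indexes[...] throws a KeyNotFoundException for a name that is not in the dictionary. A slightly more defensive sketch of the same idea keeps the first index of each name and sorts unknown names last; OrderByRequest is a hypothetical helper name, and Dictionary.TryAdd requires .NET Core 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RequestIndexOrder
{
    // Sorts items by the position of their key in 'requested'.
    // Duplicate names keep their first position; unknown keys go last.
    public static List<T> OrderByRequest<T>(
        IEnumerable<T> items,
        IReadOnlyList<string> requested,
        Func<T, string> keySelector)
    {
        var indexes = new Dictionary<string, int>();
        for (int i = 0; i < requested.Count; i++)
        {
            indexes.TryAdd(requested[i], i); // ignores later duplicates
        }

        return items
            .OrderBy(item => indexes.TryGetValue(keySelector(item), out var index) ? index : int.MaxValue)
            .ToList();
    }
}
```

Usage with the answer's names would be RequestIndexOrder.OrderByRequest(dataPoints, requestedDataPointList, dp => dp.dataPointName).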

Linq Groupby Not Grouping in c#

Below I have a snippet of code that outputs a list of appointments based on clients. Some clients can have more than one appointment, but only the latest one should be output for each client.
The output is not grouping at all, and I cannot figure out why.
foreach (ClientRecord client in clients)
{
    List<ReturnRecord> records = db.Appointments
        .AsNoTracking()
        .Include(rec => rec.Property)
        .Include(rec => rec.Property.Address)
        .Include(rec => rec.AppointmentType)
        .ToList()
        .Where(rec => rec.ClientID == client.ID)
        .Select(rec => new ReturnRecord
        {
            ClientName = $"{client.FirstNames} {client.Surnames}",
            PropertyAddress = $"{rec.Property.Address.FormattedAddress}",
            AppStatus = $"{rec.AppointmentStatus.Name}",
            StockStatus = $"{rec.Property.Stocks.FirstOrDefault().StockStatus.Name}",
            LastUpdated = rec.LastUpdated
        })
        .ToList();
    returnList.AddRange(records);
}
returnList.GroupBy(rec => rec.PropertyAddress);
return Ok(returnList);
Here is a screen grab of the output.
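You need to assign the result of GroupBy() to a variable and use it. GroupBy() returns groups rather than ReturnRecord items, so it cannot be assigned back to returnList itself:
var grouped = returnList.GroupBy(rec => rec.PropertyAddress).ToList();
return Ok(grouped);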
Make sure to actually use the new IEnumerable that the .GroupBy() method returns.
If you want to return a List<ReturnRecord>, you need a workaround:
1. Get the IEnumerable<IGrouping<string, ReturnRecord>> from .GroupBy().
2. Use .SelectMany() to collect the elements of all groups into a single IEnumerable<ReturnRecord>.
3. Convert that IEnumerable into a List with .ToList().
Example:
// Longer alternative
IEnumerable<IGrouping<string, ReturnRecord>> groups = returnList
    .GroupBy(rec => rec.PropertyAddress);
IEnumerable<ReturnRecord> result = groups.SelectMany(group => group);
List<ReturnRecord> listResult = result.ToList();
return Ok(listResult);
// Shorter alternative
return Ok(returnList
    .GroupBy(rec => rec.PropertyAddress)
    .SelectMany(group => group)
    .ToList());
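Both alternatives above return every record; SelectMany only places records with the same address next to each other. Since the question asks for just the latest appointment per client, one option is to reduce each group to its newest record by LastUpdated. A sketch with a trimmed-down, hypothetical ReturnRecord holding only the fields used here; swap the grouping key (for example to PropertyAddress) to match your needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down, hypothetical version of the question's ReturnRecord.
public record ReturnRecord(string ClientName, string PropertyAddress, DateTime LastUpdated);

public static class LatestPerGroup
{
    // Keeps only the most recently updated record for each client.
    public static List<ReturnRecord> LatestPerClient(IEnumerable<ReturnRecord> records) =>
        records
            .GroupBy(rec => rec.ClientName)
            .Select(g => g.OrderByDescending(rec => rec.LastUpdated).First())
            .ToList();
}
```

On .NET 6 or later, g.MaxBy(rec => rec.LastUpdated) can replace the OrderByDescending(...).First() pair.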

Linq on List<object>

Existing legacy code is as follows:
List<object> myItems;
//myItems gets populated by a method call
foreach (object[] item in myItems)
{
    string Id = item[0].ToString();
    string Number = item[1].ToString();
    //now do some processing if Number satisfies some criteria
}
I would like to convert this to LINQ to select all Ids that match a certain Number.
All suggestions would be appreciated.
Thanks.
Use Select() and Where()
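bool IsSatisfyingNumber(string number)
{
    // Placeholder: return true if number satisfies your criteria
    throw new NotImplementedException();
}
List<string> matchingIds = myItems
    .Cast<object[]>() // each item is actually an object[]
    .Where(item => IsSatisfyingNumber(item[1].ToString()))
    .Select(item => item[0].ToString())
    .ToList();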
The list myItems contains items of type object, but each item is actually an object[], so we need to cast to object[] first and then filter and select based on the number we are searching for.
string certainNumber = "1";
var myIds = myItems
.Where(o => ((object[]) o)[1].ToString() == certainNumber)
.Select(o => ((object[]) o)[0].ToString());
The equality operator on strings performs an ordinal (case-sensitive and culture-insensitive) comparison, so change the comparison in the Where clause if you need a different kind of comparison.
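If the comparison should ignore case, or if some entries in myItems might not be object[] at all, the filter can use OfType<object[]>() and string.Equals with an explicit StringComparison. A sketch; LegacyRows.IdsMatching is a hypothetical name, and the column positions (0 for Id, 1 for Number) follow the question's legacy code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LegacyRows
{
    // Returns the Id (column 0) of every row whose Number (column 1)
    // equals certainNumber under the given comparison.
    // OfType<object[]>() skips entries that are not object[],
    // whereas Cast<object[]>() would throw for them.
    public static List<string> IdsMatching(
        IEnumerable<object> myItems,
        string certainNumber,
        StringComparison comparison = StringComparison.Ordinal) =>
        myItems
            .OfType<object[]>()
            .Where(row => string.Equals(row[1]?.ToString(), certainNumber, comparison))
            .Select(row => row[0]?.ToString())
            .ToList();
}
```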
Got it working and wanted to share the information:
var myIds =
    (from item in myItems.Cast<object[]>()
     select new { Id = item[0], Number = (string)item[1] })
    .Where(x => x.Number == filtercondition)
    .Select(x => (string)x.Id)
    .ToList();

How do i sum a list of items by code(or any field)?

I have an object that has a list of another object in it, i.e. Object1 contains List<Object2>.
Assuming this is the definition of Object2:
public class Object2
{
    public string code;
    public string name;
    public decimal amount;
}
I want to be able to make a second list from this list, containing something similar to what a statement like select name, code, sum(amount) group by code would give me.
This is what I did, but the result didn't contain what I needed:
var newlist = obj2List.GroupBy(x => x.code)
    .Select(g => new { Amount = g.Sum(x => x.amount) });
I want code and name in the new list, just like the SQL statement above.
You're almost there:
var newlist = obj2List.GroupBy(x => x.code)
    .Select(g => new
    {
        Code = g.First().code,
        Name = g.First().name,
        Amount = g.Sum(x => x.amount)
    });
This groups the items by code and creates an anonymous object for each group, taking the code and name of first item of the group. (I assume that all items with the same code also have the same name.)
If you are grouping by code and not by name, you have to choose something for the name from the group, perhaps with First() or Last().
var newlist = obj2List.GroupBy(x => x.code).Select(g => new
{
    Code = g.Key,
    Name = g.First().name,
    Amount = g.Sum(x => x.amount)
});
var query = Object1.Obj2List
    .GroupBy(obj2 => obj2.code)
    .Select(g => new
    {
        Names = string.Join(",", g.Select(obj2 => obj2.name)),
        Code = g.Key,
        Amount = g.Sum(obj2 => obj2.amount)
    });
Since you group by code only you need to aggregate the name also in some way. I have used string.Join to create a string like "Name1,Name2,Name3" for each code-group.
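If several items in one group share the same name, string.Join repeats it. A runnable sketch that joins only distinct names; Object2 mirrors the question's fields (made public so the query can read them), and CodeTotals is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Object2
{
    public string code;
    public string name;
    public decimal amount;
}

public static class CodeTotals
{
    // One row per code: distinct names joined with commas, amounts summed.
    // GroupBy keeps the groups in order of each code's first appearance.
    public static List<(string Code, string Names, decimal Amount)> Summarize(IEnumerable<Object2> items) =>
        items
            .GroupBy(o => o.code)
            .Select(g => (
                Code: g.Key,
                Names: string.Join(",", g.Select(o => o.name).Distinct()),
                Amount: g.Sum(o => o.amount)))
            .ToList();
}
```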
Now you could consume the query for example with a foreach:
foreach (var x in query)
{
    Console.WriteLine("Code: {0} Names: {1} Amount: {2}",
        x.Code, x.Names, x.Amount);
}
Instead of using the LINQ extension methods .GroupBy() and .Select(), you could also use LINQ query syntax, which is often easier to read if you come from a SQL background.
var ls = new List<Object2>();
var newLs = from obj in ls
            group obj by obj.code into codeGroup
            select new { code = codeGroup.Key, amount = codeGroup.Sum(s => s.amount) };
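The query-syntax version above returns only the code and the amount. A sketch that also keeps the name, taking the first name in each group (the same assumption as the first answer, that all items with one code share a name); QuerySyntaxTotals is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Object2
{
    public string code;
    public string name;
    public decimal amount;
}

public static class QuerySyntaxTotals
{
    // Query-syntax equivalent of GroupBy(...).Select(...), including the name.
    public static List<(string Code, string Name, decimal Amount)> Summarize(List<Object2> ls) =>
        (from obj in ls
         group obj by obj.code into codeGroup
         select (Code: codeGroup.Key,
                 Name: codeGroup.First().name,
                 Amount: codeGroup.Sum(s => s.amount)))
        .ToList();
}
```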

Split query into multiple queries and then join the results

I have this function below that takes a list of id's and searches the DB for the matching persons.
public IQueryable<Person> GetPersons(List<int> list)
{
    return db.Persons.Where(a => list.Contains(a.person_id));
}
The reason I need to split this into four queries is that the query can't take more than 2100 values (each value is sent as a parameter):
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
How can I split the list into 4 pieces and make a query for each list. Then join the results into one list of persons?
Solved
I don't want to post this as a separate answer and take credit away from @George Duckett's answer; I just want to show the solution:
public IQueryable<Person> GetPersons(List<int> list)
{
    var persons = Enumerable.Empty<Person>().AsQueryable<Person>();
    var limit = 2000;
    var result = list.Select((value, index) => new { Index = index, Value = value })
        .GroupBy(x => x.Index / limit)
        .Select(g => g.Select(x => x.Value).ToList())
        .ToList();
    foreach (var r in result)
    {
        var row = r;
        persons = persons.Union(db.Persons.Where(a => row.Contains(a.person_id)));
    }
    return persons;
}
See this answer for splitting up your list: Divide a large IEnumerable into smaller IEnumerable of a fix amount of item
var result = list.Select((value, index) => new { Index = index, Value = value })
    .GroupBy(x => x.Index / 5)
    .Select(g => g.Select(x => x.Value).ToList())
    .ToList();
Then do a foreach over the result (a list of lists), running one query per inner list and combining the results.
See this answer for combining the results: How to combine Linq query results
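On .NET 6 or later, Enumerable.Chunk does the same splitting as the Select/GroupBy trick, and the per-chunk results can simply be concatenated in memory. A sketch under assumptions: the database call is abstracted as a hypothetical queryChunk delegate (in the question it would be something like ids => db.Persons.Where(a => ids.Contains(a.person_id)).ToList()), and Person is a minimal stand-in for the real entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal, hypothetical stand-in for the question's Person entity.
public record Person(int person_id, string Name);

public static class ChunkedLookup
{
    // Splits ids into chunks of at most chunkSize (Enumerable.Chunk, .NET 6+),
    // runs one query per chunk and concatenates the results.
    public static List<Person> GetPersons(
        IEnumerable<int> ids,
        int chunkSize,
        Func<int[], IEnumerable<Person>> queryChunk)
    {
        var persons = new List<Person>();
        foreach (int[] chunk in ids.Chunk(chunkSize))
        {
            persons.AddRange(queryChunk(chunk));
        }
        return persons;
    }
}
```

Unlike Union, concatenation does not try to remove duplicates, which is fine here as long as the id list itself contains no duplicates, because then no two chunks share an id.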
I am not sure why you have a method like this. What exactly are you trying to do? Anyway, you can do it with the Skip and Take methods that are normally used for paging.
List<Person> peopleToReturn = new List<Person>();
int pageSize = 100;
var idPage = list.Skip(0).Take(pageSize).ToList();
int index = 1;
while (idPage.Count > 0)
{
    peopleToReturn.AddRange(db.Persons.Where(a => idPage.Contains(a.person_id)).ToList());
    idPage = list.Skip(index++ * pageSize).Take(pageSize).ToList();
}
