I have a loop like the following; can I do the same using multiple Sum() calls?
foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
                                                        pd.InventoryType == InventoryTypes.Finished))
{
    weight += detail.GrossWeight;
    length += detail.Length;
    items += detail.NrDistaff;
}
Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that might make it simpler:
public static class EnumerableExtensions
{
    public static void Each<T>(this IEnumerable<T> col, Action<T> itemWorker)
    {
        foreach (var item in col)
        {
            itemWorker(item);
        }
    }
}
And call it like so:
// Declare and initialize variables in parent scope
double weight = 0;
double length = 0;
int items = 0;
ArticleLedgerEntries
    .Where(pd =>
        pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
        pd.InventoryType == InventoryTypes.Finished)
    .Each(pd =>
    {
        // Close around variables defined in parent scope
        weight += pd.GrossWeight;
        length += pd.Length;
        items += pd.NrDistaff;
    });
UPDATE:
Just one additional note: the above example relies on a closure. The variables weight, length, and items must be declared (and initialized) in a parent scope, allowing them to persist beyond each call to the itemWorker action. I've updated the example to reflect this for clarity's sake.
You can call Sum three times, but it will be slower because it will make three loops.
For example:
var list = ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
                                         && pd.InventoryType == InventoryTypes.Finished);
var totalWeight = list.Sum(pd => pd.GrossWeight);
var totalLength = list.Sum(pd => pd.Length);
var items = list.Sum(pd => pd.NrDistaff);
Because of deferred execution, it will also re-evaluate the Where call every time, although that's not such an issue in your case. This could be avoided by calling ToArray, but that will cause an array allocation (and it would still run three loops).
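For example, a sketch of the materialized version, reusing the names from the question:
// Run the filter once and keep the results in an array
var filtered = ArticleLedgerEntries
    .Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
              && pd.InventoryType == InventoryTypes.Finished)
    .ToArray();

// Still three loops, but the Where clause is not re-evaluated
var totalWeight = filtered.Sum(pd => pd.GrossWeight);
var totalLength = filtered.Sum(pd => pd.Length);
var items = filtered.Sum(pd => pd.NrDistaff);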
However, unless you have a very large number of entries or are running this code in a tight loop, you don't need to worry about performance.
EDIT: If you really want to use LINQ, you could misuse Aggregate, like this:
double totalWeight = 0, totalLength = 0;
int items = 0;
list.Aggregate((a, b) => {
    totalWeight += b.GrossWeight;
    totalLength += b.Length;
    items += b.NrDistaff;
    return a;
});
This is phenomenally ugly code, but should perform almost as well as a straight loop.
You could also sum in the accumulator (see example below), but this would allocate a temporary object for every item in your list, which is a bad idea (anonymous types are immutable, so each step has to build a new instance).
var totals = list.Aggregate(
    new { Weight = 0.0, Length = 0.0, Items = 0 },
    (t, pd) => new
    {
        Weight = t.Weight + pd.GrossWeight,
        Length = t.Length + pd.Length,
        Items = t.Items + pd.NrDistaff
    }
);
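If you want a single pass without a per-item allocation, a minimal sketch is to fold into one mutable accumulator object instead; the Totals class here is hypothetical, and I'm assuming GrossWeight and Length are doubles:
class Totals
{
    public double Weight;
    public double Length;
    public int Items;
}

// One object allocated for the whole run; the lambda mutates and returns it
var totals = list.Aggregate(new Totals(), (t, pd) =>
{
    t.Weight += pd.GrossWeight;
    t.Length += pd.Length;
    t.Items += pd.NrDistaff;
    return t;
});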
You could also group by a constant (every item gets the key 1, so they all land in a single group that can then be counted or summed):
var results = from x in ArticleLedgerEntries
              group x by 1 into aggregatedTable
              select new
              {
                  SumOfWeight = aggregatedTable.Sum(y => y.GrossWeight),
                  SumOfLength = aggregatedTable.Sum(y => y.Length),
                  SumOfNrDistaff = aggregatedTable.Sum(y => y.NrDistaff)
              };
As far as running time goes, it is almost as good as the loop (plus a small constant overhead).
You'd be able to do this pivot-style, using the answer in this topic: Is it possible to Pivot data using LINQ?
OK, I realize that there isn't an easy way to do this using LINQ. I'll keep my foreach loop, since I understand now that it isn't so bad. Thanks to all of you.
Related
I need to return the 3 latest elements in a collection. If I use LINQ, e.g. .OrderByDescending(a => a.Year).Take(3), this is fine as long as the collection contains at least 3 elements. What I want is for it to always return 3, so for example if there are only 2 items, the last item would be a blank/initialized element (ideally one where I could configure what is returned).
Is this possible?
You can concatenate the sequence with another (lazily created) sequence of 3 elements:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Range(0, 3).Select(_ => new ResultElement()))
.Take(3);
Or perhaps:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Repeat(new ResultElement(), 3))
.Take(3);
(The latter will end up with duplicate references and will always create an empty element, so I'd probably recommend the former... but it depends on the context. You might want to use Enumerable.Repeat(null, 3) and handle null elements instead.)
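For instance, a sketch of that null-padding route (assuming ResultElement is a class):
var result = query
    .OrderByDescending(a => a.Year)
    .Concat(Enumerable.Repeat<ResultElement>(null, 3))
    .Take(3)
    // only the padding slots are null, so blanks are created exactly when needed
    .Select(e => e ?? new ResultElement());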
You could write your own extension method:
public static IEnumerable<T> TakeAndCreate<T>(this IEnumerable<T> input, int amount, Func<T> defaultElement)
{
    int counter = 0;
    foreach (T element in input.Take(amount))
    {
        yield return element;
        counter++;
    }
    for (int i = 0; i < amount - counter; i++)
    {
        yield return defaultElement.Invoke();
    }
}
Usage is
var result = input.OrderByDescending(a => a.Year).TakeAndCreate(3, () => new ResultElement());
One advantage of this solution is that it will create new elements only if they are actually needed, which might be good for performance if you have a lot of elements to create or their creation is not trivial.
Online demo: https://dotnetfiddle.net/HHexGd
I have the below snippet, which takes a long time to run as the data increases.
OrderEntityColection is a List<OrderEntity> and samplePriceList is a List<Price>.
OrderEntityColection = 30k trades
samplePriceList = 1 million prices
It easily takes 10-15 minutes to finish, or more.
I have tested this with 1500 orders and 300k prices, but even that takes around 40-50 seconds, and as the orders increase, so do the prices, and it takes even longer.
Can you see how I can improve this? I have already cut it down to these numbers beforehand from a bigger set.
MarketId = int
Audit = string
foreach (var tradeEntity in OrderEntityColection)
{
    Parallel.ForEach(samplePriceList, new ParallelOptions { MaxDegreeOfParallelism = 8 }, (price) =>
    {
        if (price.MarketId == tradeEntity.MarketId)
        {
            if (tradeEntity.InstructionPriceAuditId == price.Audit)
            {
                // OrderExportColection.Enqueue(tradeEntity);
                count++;
            }
        }
    });
}
So you want to do this in memory; OK, but you need to be smart about the way you shape the data up front. The first thing is that you're repeatedly looking up prices by MarketId, so build that lookup once:
var pricesLookupByMarketId = samplePriceList
    .GroupBy(p => p.MarketId)
    .ToDictionary(
        g => g.Key,
        g => g.ToDictionary(p => p.Audit));
Now you have a Dictionary<int, Dictionary<string, Price>> (MarketId is an int and Audit a string, per the question; other key types work the same way).
Now your code becomes super simple and a lot faster
foreach (var tradeEntity in OrderEntityColection)
{
    if (pricesLookupByMarketId.ContainsKey(tradeEntity.MarketId)
        && pricesLookupByMarketId[tradeEntity.MarketId].ContainsKey(tradeEntity.InstructionPriceAuditId))
    {
        count++;
    }
}
Or, if you're a fan of one long line:
var count = OrderEntityColection.Count(tradeEntity => pricesLookupByMarketId.ContainsKey(tradeEntity.MarketId)
    && pricesLookupByMarketId[tradeEntity.MarketId].ContainsKey(tradeEntity.InstructionPriceAuditId));
As pointed out in the comments, this can be further optimized to stop repeated reads of the dictionaries - but the exact implementation depends on how you want to use this data in the end.
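For example, one TryGetValue per trade avoids reading the outer dictionary twice (a sketch, assuming the lookup built above and the question's property types):
var count = 0;
foreach (var tradeEntity in OrderEntityColection)
{
    Dictionary<string, Price> pricesByAudit; // inner key is Audit, a string per the question
    if (pricesLookupByMarketId.TryGetValue(tradeEntity.MarketId, out pricesByAudit)
        && pricesByAudit.ContainsKey(tradeEntity.InstructionPriceAuditId))
    {
        count++;
    }
}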
In the parallel loop you have cases where you skip processing for certain items. That's quite expensive, as you rely on that check also happening on a separate thread. I'd filter out the non-matching items before processing them, as follows:
foreach (var tradeEntity in OrderEntityColection)
{
    Parallel.ForEach(
        samplePriceList.Where(item => item.MarketId == tradeEntity.MarketId
                                   && item.Audit == tradeEntity.InstructionPriceAuditId),
        new ParallelOptions { MaxDegreeOfParallelism = 8 },
        (price) =>
        {
            // Do whatever processing is required here
            Interlocked.Increment(ref count);
        });
}
On a side note, it seems like you need to replace count++ with Interlocked.Increment(ref count) to be thread-safe.
Managed to do this with the help of my friend:
var samplePriceList = PriceCollection
    .GroupBy(priceEntity => priceEntity.MarketId)
    .ToDictionary(g => g.Key, g => g.ToList());

foreach (var tradeEntity in OrderEntityColection)
{
    var price = samplePriceList[tradeEntity.MarketId].FirstOrDefault(obj => obj.Audit == tradeEntity.Audit);
    if (price != null)
    {
        count += 1;
    }
}
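A hedged variant of the same idea: if only the count is needed, a HashSet of audits per market makes each inner lookup O(1) instead of a linear FirstOrDefault scan, and TryGetValue avoids a KeyNotFoundException for markets with no prices:
var auditsByMarket = PriceCollection
    .GroupBy(p => p.MarketId)
    .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(p => p.Audit)));

var count = 0;
foreach (var tradeEntity in OrderEntityColection)
{
    HashSet<string> audits;
    if (auditsByMarket.TryGetValue(tradeEntity.MarketId, out audits)
        && audits.Contains(tradeEntity.Audit))
    {
        count += 1;
    }
}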
I have this code below:
InstanceCollection instances = this.MyService(typeID, referencesIDs);
My problem is that when referencesIDs.Count() is greater than a certain size, the call throws a SQL-related error.
It was suggested to me to call this.MyService multiple times, so that each call processes fewer referencesIDs.
What is the way to do that? I am thinking of using a while loop like this:
while (referencesIDs.Count() != maxCount)
{
    newReferencesIDs = referencesIDs.Take(500).ToArray();
    instances = this.MyService(typeID, newReferencesIDs);
    maxCount += newReferencesIDs.Count();
}
The problem I can see here is: how do I skip the first 500 referencesIDs on the next iteration? If I don't remove the first 500 after the first loop, it will keep taking the same referencesIDs.
Are you just looking to update the referencesIDs value? Something like this?:
referencesIDs = referencesIDs.Skip(500);
Then the next time you call .Take(500) on referencesIDs it'll get the next 500 values.
Alternatively, without updating the referencesIDs variable, you can include the Skip in your loop. Something like this:
var pageSize = 500;
var skipCount = 0;
while (...)
{
    newReferencesIDs = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
    skipCount += pageSize;
    ...
}
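Putting it together, a minimal sketch of the full loop might look like this (assuming the IDs are an in-memory collection, that InstanceCollection is enumerable, and that partial results can be collected into a list; Instance is a hypothetical element type):
var pageSize = 500;
var skipCount = 0;
var allInstances = new List<Instance>(); // Instance is a hypothetical element type

while (skipCount < referencesIDs.Count())
{
    var page = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
    var batch = this.MyService(typeID, page);
    allInstances.AddRange(batch); // combine the partial results
    skipCount += pageSize;
}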
My first choice would be to fix the service, if you have access to it. A SQL-specific error could be a result of an incomplete database configuration, or a poorly written SQL query on the server. For example, Oracle limits IN lists in SQL queries to about 1000 items by default, but your Oracle DBA should be able to re-configure this limit for you. Alternatively, server side programmers could rewrite their query to avoid hitting this limit in the first place.
If this does not work, you could split your list into blocks of max size that does not trigger the error, make multiple calls to the server, and combine the instances on your end, like this:
InstanceCollection instances = referencesIDs
.Select((id, index) => new {Id = id, Index = index})
.GroupBy(p => p.Index / 500) // 500 is the max number of IDs
.SelectMany(g => this.MyService(typeID, g.Select(item => item.Id).ToArray()))
.ToList();
If you want a general way of splitting lists into chunks, you can use something like:
/// <summary>
/// Split a source IEnumerable into smaller (more manageable) lists.
/// </summary>
public static IEnumerable<IList<TSource>> SplitIntoChunks<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    long i = 1;
    var list = new List<TSource>();
    foreach (var t in source)
    {
        list.Add(t);
        if (i++ % chunkSize == 0)
        {
            yield return list;
            list = new List<TSource>();
        }
    }
    if (list.Count > 0)
        yield return list;
}
And then you can use SelectMany to flatten results:
InstanceCollection instances = referencesIDs
    .SplitIntoChunks(500)
    .SelectMany(chunk => MyService(typeID, chunk.ToArray()))
    .ToList();
I have a query something like this
function List<CustomObject2> GetDataPoint(List<CustomObject> listDataPoints)
{
    if (listDataPoints.Count == 0)
        return;

    var startPoint = new CustomObject();
    startPoint = listDataPoint.First();

    List<CustomObject2> cObjList = (from r in listDataPoints
                                    where r != null && r.GetDistance(startPoint) > 100
                                    select new CustomObject2
                                    {
                                        Var1 = r.Var1
                                    }).ToList();
}
The problem here is that, in the beginning the startPoint is set to the first object in listDataPoint. However, after the comparison in the query (GetDistance) I want to reassign startPoint to the value of "r" if the Distance is greater than 100.
Is there any way to do so?
Thanks in advance
No, there is no clean way to do that.
LINQ is essentially a piece of functional programming that has been brought into C#. In functional programming, values are immutable (they cannot be changed). Thanks to being functional and using immutability, LINQ queries can be lazily evaluated. It is not uncommon for a LINQ query to be only partly run, or for some parts of the sequence to be evaluated several times. That is safe to do thanks to immutability.
As soon as you want to change a value, you are working against LINQ. In this case you are much better off with a for loop.
Of course there are ways to solve this in a functional manner, as it is possible to solve this in a purely functional language. But in C# it is much cleaner to use a for loop.
You can use a fold:
var cObjList = listDataPoints.Where(r => r != null)
    .Aggregate(Tuple.Create(startPoint, new List<CustomObject2>()), (acc, r) =>
    {
        if (r.GetDistance(acc.Item1) > 100)
        {
            acc.Item2.Add(new CustomObject2 { Var1 = r.Var1 });
            return Tuple.Create(r, acc.Item2);
        }
        else return acc;
    }).Item2;
Since you were null-checking the elements of listDataPoints, I assume it may contain null objects. In that case, your code is vulnerable when the first element of the list is null.
//there is no function or procedure in c#;
//function List<CustomObject2> GetDataPoint(List<CustomObject> listDataPoints)
List<CustomObject2> GetDataPoint(List<CustomObject> listDataPoints)
{
    var dataPoints = listDataPoints.Where(r => r != null);
    if (!dataPoints.Any())
        //return; you can't return nothing from a method that returns a value
        return null; //or return an empty list
        //return new List<CustomObject2>();

    var cObjList = dataPoints.Aggregate(
        new Stack<CustomObject>(),
        (results, r) =>
        {
            // push the first point as the seed, then every point farther than 100 from the last pushed one
            if (results.Count == 0 || r.GetDistance(results.Peek()) > 100)
                results.Push(r);
            return results;
        })
        .Select(r => new CustomObject2() { Var1 = r.Var1 })
        .ToList();

    //return directly the line above or do more work with cObjList...
}
Yet this is still messy and not easily maintained. As Anders Abel suggests, you are best off with a plain loop in this case:
var cObjList = new List<CustomObject2>();
var startPoint = dataPoints.First();
foreach (var r in dataPoints)
{
    if (r.GetDistance(startPoint) > 100)
    {
        cObjList.Add(new CustomObject2() { Var1 = r.Var1 });
        startPoint = r; // reassign the reference point, as the question asked
    }
}
//...
return cObjList;
Here is some sample code I have basically written thousands of times in my life:
// find bestest thingy
Thing bestThing = null;
float bestGoodness = float.MinValue;
foreach (Thing x in arrayOfThings)
{
    float goodness = somefunction(x.property, localvariable);
    if (goodness > bestGoodness)
    {
        bestGoodness = goodness;
        bestThing = x;
    }
}
return bestThing;
And it seems to me C# should already have something that does this in just a line. Something like:
return arrayOfThings.Max(delegate(Thing x)
    { return somefunction(x.property, localvariable); });
But that doesn't return the thing (or an index to it, which would be fine); it returns the goodness-of-fit value.
So maybe something like:
var sortedByGoodness = from x in arrayOfThings
                       orderby somefunction(x.property, localvariable) descending
                       select x;
return sortedByGoodness.First();
But that's doing a whole sort of the entire array and could be too slow.
Does this exist?
This is what you can do using System.Linq:
var value = arrayOfThings
.OrderByDescending(x => somefunction(x.property, localvariable))
.First();
If the array can be empty, use .FirstOrDefault(); to avoid exceptions.
You don't really know how this is implemented internally, so you can't assume it will sort the whole array just to get the first element. For example, with LINQ to SQL, the server would receive a query that includes both the sort and the condition; it wouldn't fetch the array, sort it, and then take the first element.
In fact, until you call First, the first part of the query isn't evaluated at all. It is a one-step evaluation, not a two-step one.
var sortedValues = arrayOfThings
    .OrderByDescending(x => somefunction(x.property, localvariable));
// sortedValues isn't evaluated yet
var value = sortedValues.First();
// the whole expression is evaluated at this point
I don't think this is possible in standard LINQ without sorting the enumerable (which is slow in the general case), but you can use the MaxBy() method from the MoreLinq library to achieve this. I always include this library in my projects, as it is so useful.
http://code.google.com/p/morelinq/source/browse/trunk/MoreLinq/MaxBy.cs
(The code actually looks very similar to what you have, but generalized.)
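For reference, here is a minimal sketch of what such a MaxBy extension can look like; this is my own simplified version, not the MoreLinq source:
public static class MaxByExtensions
{
    public static T MaxBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> selector)
        where TKey : IComparable<TKey>
    {
        bool any = false;
        T best = default(T);
        TKey bestKey = default(TKey);
        foreach (var item in source)
        {
            var key = selector(item);
            // keep the element whose projected key is largest so far
            if (!any || key.CompareTo(bestKey) > 0)
            {
                best = item;
                bestKey = key;
                any = true;
            }
        }
        if (!any)
            throw new InvalidOperationException("Sequence contains no elements");
        return best;
    }
}
Usage is then a single line: var best = arrayOfThings.MaxBy(x => somefunction(x.property, localvariable));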
I would implement IComparable<Thing> and just use arrayOfThings.Max().
Example here:
http://msdn.microsoft.com/en-us/library/bb347632.aspx
I think this is the cleanest approach and IComparable may be of use in other places.
UPDATE
There is also an overloaded Max method that takes a projection function, so you can provide different logic for obtaining height, age, etc.
http://msdn.microsoft.com/en-us/library/bb534962.aspx
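A minimal sketch of the IComparable approach (assuming the goodness can be computed from the Thing alone; the question's somefunction also closes over a local variable, which a comparer baked into the type can't express):
public class Thing : IComparable<Thing>
{
    public float Goodness { get; set; }

    public int CompareTo(Thing other)
    {
        return Goodness.CompareTo(other.Goodness);
    }
}

// With no selector, Enumerable.Max() uses the default comparer
// (which picks up IComparable<Thing>) and returns the Thing itself:
Thing best = arrayOfThings.Max();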
I followed the link Porges listed in the comment, How to use LINQ to select object with minimum or maximum property value and ran the following code in LINQPad and verified that both LINQ expressions returned the correct answers.
void Main()
{
var things = new Thing[] {
    new Thing { Value = 100 },
    new Thing { Value = 22 },
    new Thing { Value = 10 },
    new Thing { Value = 303 },
    new Thing { Value = 223 }
};
var query1 = (from t in things
orderby GetGoodness(t) descending
select t).First();
var query2 = things.Aggregate((curMax, x) =>
(curMax == null || (GetGoodness(x) > GetGoodness(curMax)) ? x : curMax));
}
int GetGoodness(Thing thing)
{
return thing.Value * 2;
}
public class Thing
{
public int Value {get; set;}
}
Result from LINQPad