Implementation options of summing, averaging, concatenating, etc items in an IEnumerable - c#

I'm looking for the shortest code to create methods to perform common operations on items in an IEnumerable.
For example:
public interface IPupil
{
string Name { get; set; }
int Age { get; set; }
}
Summing a property - e.g. IPupil.Age in IEnumerable<IPupil>
Averaging a property - e.g. IPupil.Age in IEnumerable<IPupil>
Building a CSV string - e.g. IPupil.Name in IEnumerable<IPupil>
I'm interested in the various approaches to solve these examples: foreach (long hand), delegates, LINQ, anonymous methods, etc...
Sorry for the poor wording, I'm having trouble describing exactly what I'm after!

Summing and averaging: easy with LINQ:
var sum = pupils.Sum(pupil => pupil.Age);
var average = pupils.Average(pupil => pupil.Age);
Building a CSV string - there are various options here, including writing your own extension methods. This will work though:
var csv = string.Join(",", pupils.Select(pupil => pupil.Name).ToArray());
Note that it's tricky to compute multiple things (e.g. average and sum) in one pass over the data with normal LINQ. If you're interested in that, have a look at the Push LINQ project which Marc Gravell and I have written. It's a pretty specialized requirement though.
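For comparison, the long-hand foreach approach mentioned in the question can compute all three results in a single pass. This is only a minimal sketch: the Pupil class and the PupilStats.Summarize helper are hypothetical names introduced for illustration, and the CSV output does no quoting or escaping of names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public interface IPupil
{
    string Name { get; set; }
    int Age { get; set; }
}

// Hypothetical concrete type, only here so the sketch can be run.
public class Pupil : IPupil
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PupilStats
{
    // Computes the sum, the average and a comma-separated name list in one pass.
    public static void Summarize(IEnumerable<IPupil> pupils,
        out int sum, out double average, out string csv)
    {
        sum = 0;
        int count = 0;
        var names = new StringBuilder();

        foreach (IPupil pupil in pupils)
        {
            sum += pupil.Age;
            if (count > 0) names.Append(',');
            names.Append(pupil.Name);
            count++;
        }

        // Unlike Enumerable.Average, this returns 0 for an empty sequence
        // instead of throwing.
        average = count == 0 ? 0.0 : (double)sum / count;
        csv = names.ToString();
    }
}
```

The trade-off is more code than the LINQ one-liners in exchange for a single enumeration of the source.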

Related

Efficiently calculating totals from a file using LINQ

I'm reading a file and turning each line within it into a class, let's call it Record, and returning each Record as it is read using IEnumerable<Record> and yield return.
Because of this I only start actually performing these reads whenever I do an operation on the enumeration, such as performing a sum on it or iterating through it with a foreach.
I do need to go through each record and then translate that into a database, but due to database design before my time I need the totals on each record in the database, so I need these totals before I start translating them into my database.
At the moment I have five separate .Count() or .Sum() operations on my enumeration before I start iterating the enumeration (example int i = records.Sum(r => r.SomeField) or int j = records.Count(r => r.IsSomethingTrue)). Each one of those counts or sums will loop through the entire file to calculate each one separately. I'm not really happy with this behaviour and would like to find a more efficient way of doing this.
I am using .NET 3.5 if that makes any difference.
You could use your own struct to calculate several values in a single pass through an enumerable object.
public struct ComplexAccumulator
{
public int TotalSumField { get; set; }
public int CountSomethingTrue { get; set; }
}
Now you can use the Aggregate extension method to accumulate the values:
records.Aggregate(default(ComplexAccumulator), (a, r) => new ComplexAccumulator
{
TotalSumField = a.TotalSumField + r.SomeField,
// Parentheses are required: + binds more tightly than ?:
CountSomethingTrue = a.CountSomethingTrue + (r.IsSomethingTrue ? 1 : 0),
});
Instead of the struct you could use a suitable Tuple instance, e.g. something like Tuple<int, int, int> (note that Tuple is only available from .NET 4 onwards).
Efficiency is not a strength of LINQ... You need to replace some LINQ things with manual loops here.
You seem to need two passes over the data. One for aggregation:
var sum = 0; //etc.
foreach (var item in items) {
//compute all 5 aggregates here
}
And then one to translate the data:
items.Select(item => Translate(item, aggregates))
Whether you should buffer items (for example using ToList) or not depends on whether available memory can hold those items or not.
You can use Aggregate to perform all 5 aggregations in one pass but that's not better than a loop in any way. It's slower, far more code and the code arguably is illegible.
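The aggregation loop described above could be sketched as follows. The Record fields follow the names used in the question; the Totals class, the RecordTotals helper and the choice to buffer into a List are assumptions made for illustration, and buffering is only sensible if the records fit in memory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape; the field names follow the question's examples.
public class Record
{
    public int SomeField { get; set; }
    public bool IsSomethingTrue { get; set; }
}

// Hypothetical holder for the aggregates (only three of the five are shown).
public class Totals
{
    public int SumOfSomeField;
    public int CountSomethingTrue;
    public int RecordCount;
}

public static class RecordTotals
{
    // Reads the source once, computing every aggregate in the same loop and
    // buffering the records so the translation pass need not re-read the file.
    public static Totals Compute(IEnumerable<Record> source, List<Record> buffer)
    {
        var totals = new Totals();
        foreach (Record r in source)
        {
            totals.SumOfSomeField += r.SomeField;
            if (r.IsSomethingTrue) totals.CountSomethingTrue++;
            totals.RecordCount++;
            buffer.Add(r);
        }
        return totals;
    }
}
```

After Compute returns, the second pass can iterate over the buffer and use the totals while translating each record.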

Make a list readonly in c#

I have this example code. What I want to do is to make it so that the "Nums" value can only be written to using the "AddNum" method.
namespace ConsoleApplication1
{
public class Person
{
string myName = "N/A";
int myAge = 0;
List<int> _nums = new List<int>();
public List<int> Nums
{
get
{
return _nums;
}
}
public void AddNum(int NumToAdd)
{
_nums.Add(NumToAdd);
}
public string Name { get; set; }
public int Age { get; set; }
}
}
Somehow, I've tried a bunch of things regarding AsReadOnly() and the readonly keyword, but I can't seem to get it to do what I want it to do.
Here is the sample of the code I have to access the property.
Person p1 = new Person();
p1.Nums.Add(25); //access 1
p1.AddNum(37); //access 2
Console.WriteLine("press any key");
Console.ReadLine();
I really want "access 1" to fail, and "access 2" to be the ONLY way that the value can be set. Thanks in advance for the help.
√ DO use ReadOnlyCollection<T>, a subclass of ReadOnlyCollection<T>,
or in rare cases IEnumerable<T> for properties or return values
representing read-only collections.
This is a quote from this article.
You should have something like this:
List<int> _nums = new List<int>();
public ReadOnlyCollection<int> Nums
{
get
{
return _nums.AsReadOnly();
}
}
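Put together with the question's class, a minimal sketch might look like the following. The ReadOnlyDemo class is a hypothetical wrapper added only to show what happens to "access 1" and "access 2".

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Person
{
    private readonly List<int> _nums = new List<int>();

    // AsReadOnly returns a read-only wrapper around _nums, not a copy.
    public ReadOnlyCollection<int> Nums
    {
        get { return _nums.AsReadOnly(); }
    }

    public void AddNum(int numToAdd)
    {
        _nums.Add(numToAdd);
    }
}

public static class ReadOnlyDemo
{
    public static int Run()
    {
        var p1 = new Person();
        // p1.Nums.Add(25); // access 1: does not compile, there is no public Add
        p1.AddNum(37);      // access 2: the only supported way to add

        // Casting to IList<int> compiles, but Add throws at run time.
        try
        {
            ((IList<int>)p1.Nums).Add(25);
        }
        catch (NotSupportedException)
        {
            // Expected: the wrapper rejects modification.
        }
        return p1.Nums.Count;
    }
}
```

Because the wrapper is a view over the underlying list, values added later through AddNum are visible through a previously obtained Nums reference.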
In general, collection types make poor properties because even when a collection is wrapped in ReadOnlyCollection, it's inherently unclear what:
IEnumerable<int> nums = myPerson.Nums;
myPerson.AddNum(23);
foreach(int i in nums) // Should the 23 be included!?
...
is supposed to mean. Is the object returned from Nums a snapshot of the numbers that existed when it was called, or is it a live view?
A cleaner approach is to have a method called something like GetNumsAsArray which returns a new array each time it's called; it may also be helpful in some cases to have a GetNumsAsList variant depending upon what the caller will want to do with the numbers. Some methods only work with arrays, and some only work with lists, so if only one of the above is provided some callers will have to call it and then convert the returned object to the required type.
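A snapshot-style API along those lines could be sketched like this. SnapshotPerson is a hypothetical class name; the method names follow the ones suggested above.

```csharp
using System;
using System.Collections.Generic;

public class SnapshotPerson
{
    private readonly List<int> _nums = new List<int>();

    public void AddNum(int numToAdd)
    {
        _nums.Add(numToAdd);
    }

    // Each call copies the current contents into a new array, so the
    // result is unambiguously a snapshot.
    public int[] GetNumsAsArray()
    {
        return _nums.ToArray();
    }

    // Same idea for callers that need a list; changes made to the
    // returned list do not affect the person.
    public List<int> GetNumsAsList()
    {
        return new List<int>(_nums);
    }
}
```

With this shape, a snapshot taken before AddNum(23) never contains the 23, which removes the ambiguity in the foreach example.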
If performance-sensitive callers will be needing to use this code a lot, it may be helpful to have a more general-purpose method:
int CopyNumsIntoArray(int sourceIndex, int reqCount, ref int[] dest,
int destIndex, CopyCountMode mode);
where CopyCountMode indicates what the code should do if the number of items available starting at sourceIndex is greater or less than reqCount; the method should either return the number of items that were available, or throw an exception if that number violates the caller's stated expectations. Some callers might start by creating and passing in a 10-item array but be prepared to have the method replace it with a bigger array if there are more than ten items to be returned; others might expect that there will be exactly 23 items and be unprepared to handle any other number. Using a parameter to specify the mode allows one method to serve many kinds of callers.
Although many collection authors don't bother including any method that fits the above pattern, such methods can greatly improve efficiency in cases where code wants to work with a significant minority of a collection (e.g. 1,000 items out of a collection of 50,000). In the absence of such methods, code wishing to work with such a range must either ask for a copy of the whole thing (very wasteful) or request thousands of items individually (also wasteful). Allowing the caller to supply the destination array would improve efficiency in the case where the same method makes many queries, especially if the destination array would be large enough to be put on the large object heap.
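One possible reading of that method is sketched below. The three CopyCountMode values and the NumStore class are assumptions; the text above gives only the signature and does not enumerate the modes, so treat this as an illustration rather than the author's design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical modes; the original text does not enumerate them.
public enum CopyCountMode
{
    Exact,    // throw unless exactly reqCount items are available
    AtMost,   // copy up to reqCount items and report how many were available
    GrowDest  // copy everything available, replacing dest if it is too small
}

public class NumStore
{
    private readonly List<int> _nums = new List<int>();

    public void AddNum(int numToAdd) { _nums.Add(numToAdd); }

    // Returns the number of items available from sourceIndex onward.
    public int CopyNumsIntoArray(int sourceIndex, int reqCount, ref int[] dest,
        int destIndex, CopyCountMode mode)
    {
        if (sourceIndex < 0 || sourceIndex > _nums.Count)
            throw new ArgumentOutOfRangeException("sourceIndex");
        if (reqCount < 0) throw new ArgumentOutOfRangeException("reqCount");
        if (destIndex < 0) throw new ArgumentOutOfRangeException("destIndex");

        int available = _nums.Count - sourceIndex;
        if (mode == CopyCountMode.Exact && available != reqCount)
            throw new InvalidOperationException(
                "Expected " + reqCount + " items but " + available + " are available.");

        int toCopy = mode == CopyCountMode.GrowDest
            ? available
            : Math.Min(available, reqCount);

        int needed = destIndex + toCopy;
        if (dest == null || dest.Length < needed)
        {
            if (mode != CopyCountMode.GrowDest)
                throw new ArgumentException("Destination array is too small.", "dest");
            // Array.Resize keeps existing contents and also accepts null.
            Array.Resize(ref dest, needed);
        }

        _nums.CopyTo(sourceIndex, dest, destIndex, toCopy);
        return available;
    }
}
```

The ref parameter is what lets the GrowDest mode hand a larger array back to a caller that started with a small one.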

Best way to store ROE in C#

I want to store the rate of exchange (ROE) of all currencies against different base currencies. What is the best and most efficient type or struct for storing this? Currently I am using
Dictionary<string, Dictionary<string, decimal>>();
Thanks in advance. Please suggest
Typically I use a variation of the following class.
public class ForexSpotContainer : IEnumerable<FxSpot>
{
[DataMember] private readonly Dictionary<string, FxSpot> _fxSpots;
public FxSpot this[string baseCurrency, string quoteCurrency]
{
get
{
var baseCurrencySpot = _fxSpots[baseCurrency];
var quoteCurrencySpot = _fxSpots[quoteCurrency];
return baseCurrencySpot.Invert()*quoteCurrencySpot;
}
}
public IEnumerator<FxSpot> GetEnumerator()
{
return _fxSpots.Values.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I tend to then create a Money class and a FxSpot class then create +-*/ operators for the Money and FxSpot classes so that I can do financial calculations in a safe way.
EDIT: In my experience, when working on financial systems, I have always had issues with code like this:
decimal sharePriceOfMsft = 40.30m;
decimal usdEur = 0.75m;
decimal msftInEur = sharePriceOfMsft * usdEur;
Since it always takes me a few seconds to check whether I should multiply by the spot or divide.
The problem is compounded when I have to use Forex crosses, such as JPYEUR or EURJPY, and hours were lost to subtle bugs from close Forex spots.
The dimensional analysis of equations also matters. When you multiply lots of numbers together and expect a price, are you sure you didn't mix up a multiply and a divide? By creating a new class for each unit, you get a little more compile-time error checking, which can ultimately save you seconds for each line of code you read (and in a library of many thousands of lines that adds up to hours very quickly).
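A stripped-down version of such Money and FxSpot types might look like the sketch below. The field names, the rate convention (units of quote currency per one unit of base currency) and the exact set of operators are assumptions made for illustration; the answer does not show its own implementation.

```csharp
using System;

public struct Money
{
    public readonly decimal Amount;
    public readonly string Currency;

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Adding amounts in different currencies is rejected at run time.
    public static Money operator +(Money a, Money b)
    {
        if (a.Currency != b.Currency)
            throw new InvalidOperationException("Currency mismatch.");
        return new Money(a.Amount + b.Amount, a.Currency);
    }
}

public struct FxSpot
{
    public readonly string BaseCurrency;
    public readonly string QuoteCurrency;
    // Assumed convention: units of quote currency per 1 unit of base currency.
    public readonly decimal Rate;

    public FxSpot(string baseCurrency, string quoteCurrency, decimal rate)
    {
        BaseCurrency = baseCurrency;
        QuoteCurrency = quoteCurrency;
        Rate = rate;
    }

    public FxSpot Invert()
    {
        return new FxSpot(QuoteCurrency, BaseCurrency, 1m / Rate);
    }

    // Chains A/B with B/C to give the cross rate A/C.
    public static FxSpot operator *(FxSpot left, FxSpot right)
    {
        if (left.QuoteCurrency != right.BaseCurrency)
            throw new InvalidOperationException("Spots do not chain.");
        return new FxSpot(left.BaseCurrency, right.QuoteCurrency, left.Rate * right.Rate);
    }

    // Converts money held in the base currency into the quote currency.
    public static Money operator *(Money money, FxSpot spot)
    {
        if (money.Currency != spot.BaseCurrency)
            throw new InvalidOperationException("Money is not in the spot's base currency.");
        return new Money(money.Amount * spot.Rate, spot.QuoteCurrency);
    }
}
```

In this sketch the compiler stops a Money from being multiplied by another Money or by a bare decimal, while mismatched currency codes are only caught at run time; catching those at compile time would need one type per currency.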
I don't see a big problem with your approach. If you want alternatives:
1) Use a third-party in-memory database (Redis-like), though that may prove too cumbersome for your case.
2) Derive from your type:
public class MyCustomHolder : Dictionary<string, Dictionary<string, decimal>> {
}
This avoids repeating the long and confusing definition throughout the code and gives more meaning to
anyone reading the code, including yourself.

Allowing ad-hoc expressions over IEnumerable<MyObj>

I have the following model
public class Model
{
public string Name {get;set;}
public DateTime HireDate {get;set;}
public decimal Salary {get;set;}
public int Hours {get;set;}
}
Now I have the following List
List<Model> employees = new List<Model>();
I am taking expression string input from user, following are the examples of what user might use
Salary
Salary + 500
Salary * Hours
SUM(Hours)
SUM(Hours * Salary)
SUM(Hours) / MIN (Hours)
I have to process user's expressions into another IEnumerable of either int or decimal, depending on calculation and which type has higher precision, for example The 3rd Expression will generate the following
var result = employees.Select(e => e.Salary * e.Hours)
The 4th one will result into this
var result = employees.Sum(e => e.Hours)
I am currently doing this by first parsing the expression into a parse tree and then building an expression tree using the System.Linq.Expressions namespace. That approach takes a lot of code and is not easily readable by other developers. Is there an easier way?
Rather than building it all again yourself, you might find some help on Scott Gu's blog (the post is already pretty old).
At the very least it should give you a good code base to look into.
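For reference, the hand-built expression tree that the question describes for the third expression (Salary * Hours) can be sketched as follows. ExpressionSketch and BuildSalaryTimesHours are hypothetical names, and the parsing step that would normally drive this is left out.

```csharp
using System;
using System.Linq.Expressions;

public class Model
{
    public string Name { get; set; }
    public DateTime HireDate { get; set; }
    public decimal Salary { get; set; }
    public int Hours { get; set; }
}

public static class ExpressionSketch
{
    // Builds and compiles the equivalent of: e => e.Salary * e.Hours
    public static Func<Model, decimal> BuildSalaryTimesHours()
    {
        ParameterExpression e = Expression.Parameter(typeof(Model), "e");
        Expression salary = Expression.Property(e, "Salary");

        // Hours is an int, so it is converted to decimal (the wider type)
        // before multiplying.
        Expression hours = Expression.Convert(
            Expression.Property(e, "Hours"), typeof(decimal));

        Expression body = Expression.Multiply(salary, hours);
        return Expression.Lambda<Func<Model, decimal>>(body, e).Compile();
    }
}
```

The compiled delegate can then be passed to employees.Select(...), which gives the same result as the hand-written lambda in the question.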

Is possible to change search method in LINQ?

I have a CSV file with 30 000 lines. I have to select many values based on many conditions, so instead of many loops and ifs I decided to use LINQ. I have written a class to read the CSV. It implements IEnumerable so that it can be used with LINQ. This is my enumerator:
class CSVEnumerator : IEnumerator
{
private CSVReader _csv;
private int _index;
public CSVEnumerator(CSVReader csv)
{
_csv = csv;
_index = -1;
}
public void Reset(){_index = -1;}
public object Current
{
get
{
return new CSVRow(_index,_csv);
}
}
public bool MoveNext()
{
return ++_index < _csv.TotalRows;
}
}
It works, but it's slow. Let's say I want to select the max value in column A for rows in the range 100 to 150.
max = (from CSVRow r in csv where r.ID > 100 && r.ID < 150 select r).Max(y=>y["A"]);
This works, but LINQ examines all 30 000 rows instead of just the 49 in range.
As I said, I could use a loop, but only in this simple example; the real conditions are "brutal" :)
Is there any way to override how LINQ searches the collection? Something like: inspect the query used on my enumerator, check whether any condition in the "where" clause contains a row ID filter, and supply different data based on that.
I don't want to copy part of the data to another array or collection, and the problem is not in my CSV reader. Accessing any row by ID is fast; the only problem is accessing all 30 000 of them.
Any help appreciated :-)
If you wanted to be able to use LINQ for this efficiently, you would need to use expression trees, in a way similar to (but much simpler than) what the various LINQ providers for SQL databases do. While doable, I think it would be quite a lot of code for such a simple task.
Because of that, I think a better solution would be to use a separate method to select the rows you want (and then possibly use LINQ to work with the result).
Also, many operations that return collections (including your original code and my modification) can be simplified by using iterator methods.
So, your code could look something like this:
public static IEnumerable<CSVRow> GetRows(
this CSVReader reader, int idGreaterThan, int idLessThan)
{
for (int i = idGreaterThan + 1; i < idLessThan; i++)
{
// Argument order matches the question's constructor: CSVRow(index, reader).
yield return new CSVRow(i, reader);
}
}
Here, it's an extension method for CSVReader, but another solution (e.g. actual method on that class) might be more appropriate for you.
Your example would then look something like:
max = csvReader.GetRows(100, 150).Max(y => y["A"]);
(Also, I find it weird that when you have limits 100 and 150, you actually want rows between 101 and 149. But I'm assuming you have a reason for that, so I did the same.)
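To try the iterator approach in isolation, here is a self-contained sketch. The CSVReader and CSVRow classes below are hypothetical stand-ins with just enough behaviour to run; the real classes from the question are not shown in full, so their members are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: holds integer columns in memory by name.
public class CSVReader
{
    private readonly Dictionary<string, int[]> _columns;

    public CSVReader(Dictionary<string, int[]> columns)
    {
        _columns = columns;
    }

    public int GetValue(int row, string column)
    {
        return _columns[column][row];
    }
}

// Hypothetical stand-in: a lightweight view of one row, read on demand.
public class CSVRow
{
    private readonly CSVReader _csv;
    public int ID { get; private set; }

    public CSVRow(int id, CSVReader csv)
    {
        ID = id;
        _csv = csv;
    }

    public int this[string column]
    {
        get { return _csv.GetValue(ID, column); }
    }
}

public static class CSVReaderExtensions
{
    // Yields only the rows whose ID lies strictly between the two bounds,
    // so rows outside the range are never created or inspected.
    public static IEnumerable<CSVRow> GetRows(
        this CSVReader reader, int idGreaterThan, int idLessThan)
    {
        for (int i = idGreaterThan + 1; i < idLessThan; i++)
        {
            yield return new CSVRow(i, reader);
        }
    }
}
```

Calling reader.GetRows(100, 150).Max(r => r["A"]) then touches 49 rows, regardless of how many rows the file contains.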
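As far as LINQ is concerned, r.ID is simply a value that is being filtered, so all 30k rows are considered for the Max operation. If it is a row index, which seems to be the case here, you can use Skip and Take so that enumeration stops once the rows you need have been read, instead of visiting all 30k. Cast is needed because the reader implements the non-generic IEnumerable, and Skip(101).Take(49) matches IDs 101 to 149, assuming IDs are zero-based row positions:
max = csv.Cast<CSVRow>().Skip(101).Take(49).Max(y => y["A"]);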
@DougM is right about the order of evaluation, but in this case what I would do is take a one-time hit on initialization and generate lookups for any "index" fields: basically, precalculate a map (dictionary) from each index value to its rows. That said, this would only be useful if you have many repeated queries for a given index field.
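The precalculated lookup idea could be sketched with Enumerable.ToLookup. IndexedRow and RowIndex are hypothetical names used in place of the question's CSVRow, whose members are not fully shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for CSVRow.
public class IndexedRow
{
    public int ID { get; set; }
    public string Region { get; set; }
    public int Value { get; set; }
}

public static class RowIndex
{
    // Groups the rows by the chosen key once (a full scan), so that later
    // queries on that key are lookups rather than scans.
    public static ILookup<TKey, IndexedRow> Build<TKey>(
        IEnumerable<IndexedRow> rows, Func<IndexedRow, TKey> keySelector)
    {
        return rows.ToLookup(keySelector);
    }
}
```

For example, var byRegion = RowIndex.Build(rows, r => r.Region); pays for one pass up front, after which byRegion["north"].Max(r => r.Value) only looks at the matching rows. Indexing a key that is absent returns an empty sequence, so Max over it would throw for a non-nullable int.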
