I am processing a complex query in parallel. From the called methods I get a lot of Tuple<IEnumerable<Object>, int> objects. I would like to aggregate them quickly, but probably .Aggregate (code below) is not the best option. What is the right way to do it?
public static Tuple<IEnumerable<Object>, int> Parse(Object obj)
{
var ieo = new List<Object>();
var x = 5;
return new Tuple<IEnumerable<Object>, int>(ieo, x);
}
public static void Query(List<Object> obj)
{
var result = obj
.AsParallel()
.Select(o => Parse(o))
. // do something to aggregate this quickly and get a tuple of:
// - flattened IEnumerable<Object>
// - summed up all second items
}
Here is my Aggregate attempt, which is probably very slow and looks terrible, but it works:
.Aggregate((t1, t2) => new Tuple<IEnumerable<Object>, int>(t1.Item1.Concat(t2.Item1), t1.Item2 + t2.Item2));
You can write a custom flattener as an extension method:
public static Tuple<IEnumerable<T>, int> MagicFlatten<T>(
this IEnumerable<Tuple<IEnumerable<T>, int>> tupleCrap)
{
var item1 = tupleCrap.SelectMany(x => x.Item1);
var item2 = tupleCrap.Sum(x => x.Item2);
return Tuple.Create(item1, item2);
}
and later you can use it:
.AsParallel()
.Select(o => Parse(o))
.MagicFlatten();
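One caveat with MagicFlatten as written: it enumerates its input twice (once for Sum, and again whenever the SelectMany result is consumed), so a deferred parallel query would run Parse twice per element. A minimal sketch of a single-pass alternative follows; the name FlattenOnce is hypothetical, and it assumes the flattened items may be buffered in a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleAggregation
{
    // Hypothetical helper: walks the source exactly once, copying every inner
    // item into one list and summing the second tuple components as it goes.
    public static Tuple<IEnumerable<T>, int> FlattenOnce<T>(
        this IEnumerable<Tuple<IEnumerable<T>, int>> source)
    {
        var items = new List<T>();
        var sum = 0;
        foreach (var tuple in source)
        {
            items.AddRange(tuple.Item1);
            sum += tuple.Item2;
        }
        return Tuple.Create<IEnumerable<T>, int>(items, sum);
    }
}
```

It is called the same way, for example obj.AsParallel().Select(o => Parse(o)).FlattenOnce(). The parsing still runs in parallel; only the final merge is sequential.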
I have this LINQ query:
return ngrms.GroupBy(x => x)
.Select(s => new { Text = s.Key, Count = s.Count() })
.Where(x => x.Count > minCount)
.OrderByDescending(x => x.Count)
.ToDictionary(g => g.Text, g => g.Count);
ngrms is IEnumerable<String>
Is there a way that I can optimize this code?
I don't care if I have to rewrite all the code, and I am open to any low-level tweaks.
If you implement a Dictionary whose values can be incremented in place (emulating a multiset or bag), you can run about 3x faster than LINQ, but the difference is small unless you have a lot of ngrms. On a list of 10 million with about 100 unique values, the LINQ code still takes less than a second on my PC. If your LINQ code takes time 1, a foreach with a Dictionary<string,int> takes 0.85 and this code takes 0.32.
Here is the class for creating an updateable value in the Dictionary:
public class Ref<T> {
public T val { get; set; }
public Ref(T firstVal) => val = firstVal;
public static implicit operator T(Ref<T> rt) => rt.val;
}
(If C# allowed operator ref T, you could return a ref to the val property and almost treat a Ref<T> as if it were an lvalue of type T.)
Now you can count the occurrences of the strings in a Dictionary<string,Ref<int>> with only one lookup per string:
var dictCounts = new Dictionary<string, Ref<int>>();
foreach (var s in ngrms) {
if (dictCounts.TryGetValue(s, out var refn))
++refn.val;
else
dictCounts.Add(s, new Ref<int>(1));
}
Finally you can compute the answer by filtering the counts to the ones you want to keep:
var ans = dictCounts.Where(kvp => kvp.Value > minCount).ToDictionary(kvp => kvp.Key, kvp => kvp.Value.val);
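As a possible alternative to the Ref<T> wrapper, here is a minimal sketch that gets the same one-lookup-per-string behavior with a plain Dictionary<string, int>. It assumes .NET 6 or later, where CollectionsMarshal.GetValueRefOrAddDefault is available; the class and method names are hypothetical, and it has not been timed against the numbers quoted above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

public static class NgramCounter
{
    // Hypothetical helper: counts each string, then keeps counts > minCount.
    public static Dictionary<string, int> CountAbove(
        IEnumerable<string> ngrms, int minCount)
    {
        var counts = new Dictionary<string, int>();
        foreach (var s in ngrms)
        {
            // Single hash lookup: returns a ref to the existing slot, or to a
            // newly added slot that starts at default(int), i.e. 0.
            ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(
                counts, s, out _);
            slot++;
        }
        return counts
            .Where(kvp => kvp.Value > minCount)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }
}
```

The dictionary must not be modified while the returned ref is in use, which holds here because the ref is only used for the increment on the next line.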
Going by your LINQ query, you may consider rewriting the code with a simple foreach loop for better performance, as shown below. It runs in O(n) time:
Dictionary<string, int> dict = new Dictionary<string, int>();
foreach(var s in ngrms)
{
if (dict.ContainsKey(s))
dict[s]++;
else
dict.Add(s, 1);
}
return dict.Where(a => a.Value > minCount).ToDictionary(a => a.Key, a => a.Value);
I just wrote myself a utility function
private static IEnumerable<T> Flatten<T>(params object[] items) where T : class
{
return items.SelectMany(c => c is T ? new[] {c as T} : (IEnumerable<T>) c);
}
It allowed me to go from this:
var lines = records
.GroupBy(c => new {c.CODEID, c.DESCRIPTION})
.SelectMany(c =>
new[] { string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION) }
.Concat(c.Select(d => string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)))
.Concat(new [] {Environment.NewLine}))
;
To this:
var lines2 = records
.GroupBy(c => new {c.CODEID, c.DESCRIPTION})
.SelectMany(c => Flatten<string>(
string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION),
c.Select(d => string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)),
Environment.NewLine))
;
Before I commit such an obscure looking thing to my code base, I wanted to see if I was overlooking some other obvious way to avoid the use of Concat in the first way above...
NOTE: maybe this belongs on code review... not sure
You can make the code simpler, but more importantly make the whole thing statically typed, by creating a method that simply prepends a single item to the start of a sequence:
public static IEnumerable<T> Prepend<T>(
this IEnumerable<T> sequence,
T item)
{
yield return item;
foreach(var current in sequence)
yield return current;
}
And one to append an item to the end of a sequence:
public static IEnumerable<T> Append<T>(
this IEnumerable<T> sequence,
T item)
{
foreach(var current in sequence)
yield return current;
yield return item;
}
Now your method can be written as:
var lines = records
.GroupBy(c => new {c.CODEID, c.DESCRIPTION})
.SelectMany(c =>
c.Select(d =>
string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE))
.Prepend(string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION))
.Append(Environment.NewLine));
The other route you can go is to write an AsSequence method that can more effectively turn an item into a sequence of size one.
public static IEnumerable<T> AsSequence<T>(this T item)
{
yield return item;
}
This does clean up your original code a bit by making the entire query one fluent sequence of method calls:
var lines = records
.GroupBy(c => new {c.CODEID, c.DESCRIPTION})
.SelectMany(c =>
string.Format(insertRecord, c.Key.CODEID, c.Key.DESCRIPTION)
.AsSequence()
.Concat(c.Select(d =>
string.Format(insertDetail, d.CODEID, d.CODESEQ, d.DATAVALUE, d.DISPLAYVALUE)))
.Concat(Environment.NewLine.AsSequence()));
I am trying to determine if there is a better way to execute the following query:
I have a List of Pair objects.
A Pair is defined as
public class Pair
{
public int IDA;
public int IDB;
public double Stability;
}
I would like to extract a list of all distinct ID's (ints) contained in the List<Pair>.
I am currently using
var pIndices = pairs.SelectMany(p => new List<int>() { p.IDA, p.IDB }).Distinct().ToList();
Which works, but it seems unintuitive to me to create a new List<int> only to have it flattened out by SelectMany.
This is another option, which I find inelegant to say the least:
var pIndices = pairs.Select(p => p.IDA).ToList();
pIndices.AddRange(pairs.Select(p => p.IDB).ToList());
pIndices = pIndices.Distinct().ToList();
Is there a better way? And if not, which would you prefer?
You could use Union() to get both the A's and B's after selecting them individually.
var pIndices = pairs.Select(p => p.IDA).Union(pairs.Select(p => p.IDB));
You could possibly shorten the inner expression to p => new [] { p.IDA, p.IDB }.
If you don't want to create a 2-element array/list for each Pair, and don't want to iterate your pairs list twice, you could just do it by hand:
HashSet<int> distinctIDs = new HashSet<int>();
foreach (var pair in pairs)
{
distinctIDs.Add(pair.IDA);
distinctIDs.Add(pair.IDB);
}
This is one without a new collection:
var pIndices = pairs.Select(p => p.IDA)
.Concat(pairs.Select(p => p.IDB))
.Distinct();
Shorten it like this:
var pIndices = pairs.SelectMany(p => new[] { p.IDA, p.IDB }).Distinct().ToList();
Using Enumerable.Repeat is a little unorthodox, but here it is anyway:
var pIndices = pairs
.SelectMany(
p => Enumerable.Repeat(p.IDA, 1).Concat(Enumerable.Repeat(p.IDB, 1))
).Distinct()
.ToList();
Finally, if you do not mind a little helper class, you can do this:
public static class EnumerableHelper {
// usage: EnumerableHelper.AsEnumerable(obj1, obj2);
public static IEnumerable<T> AsEnumerable<T>(params T[] items) {
return items;
}
}
Now you can do this:
var pIndices = pairs
.SelectMany(p => EnumerableHelper.AsEnumerable(p.IDA, p.IDB))
.Distinct()
.ToList();
I want to generate an observable where each value of the observable is dependent on the one before it, starting from a single value. If I have a simple transformation between values like Func<int, int>, it is easy to do with Observable.Generate like so:
Func<int, IObservable<int>> mkInts = init =>
Observable.Generate(
init, // start value
_ => true, // continue ?
i => i + 1, // transformation function
i => i); // result selector
using (mkInts(1).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
This will happily write numbers on my screen until I press enter. However, my transformation function does some network IO, so the type is Func<int, IObservable<int>>, so I cannot use that approach. Instead, I have tried this:
// simulate my transformation function
Func<int, IObservable<int>> mkInt = ts =>
Observable.Return(ts)
.Delay(TimeSpan.FromMilliseconds(10));
// pre-assign my generator function, since the function calls itself recursively
Func<int, IObservable<int>> mkInts = null;
// my generator function
mkInts = init =>
{
var ints = mkInt(init);
// here is where I depend on the previous value.
var nextInts = ints.SelectMany(i => mkInts(i + 1));
return ints.Concat(nextInts);
};
using (mkInts(1).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
But this throws a StackOverflowException after printing about 5000 numbers. How can I solve this?
I think I've got a nice clean solution for you.
First-up, go back to using a Func<int, int> - it can easily be turned into a Func<int, IObservable<int>> using Observable.FromAsyncPattern.
I used this for testing:
Func<int, int> mkInt = ts =>
{
Thread.Sleep(100);
return ts + 1;
};
Now here's the money maker:
Func<int, Func<int, int>, IObservable<int>> mkInts = (i0, fn) =>
Observable.Create<int>(o =>
{
var ofn = Observable
.FromAsyncPattern<int, int>(
fn.BeginInvoke,
fn.EndInvoke);
var s = new Subject<int>();
var q = s.Select(x => ofn(x)).Switch();
var r = new CompositeDisposable(new IDisposable[]
{
q.Subscribe(s),
s.Subscribe(o),
});
s.OnNext(i0);
return r;
});
The iterating function is turned into an asynchronous observable.
The q variable feeds the values from the subject into the observable iterating function and selects the calculated observable. The Switch method flattens out the result and ensures that each call to the observable iterating function is properly cleaned up.
Also, the use of a CompositeDisposable allows the two subscriptions to be disposed of as one. Very neat!
It's easily used like this:
using (mkInts(7, mkInt).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
Now you have a fully parametrized version of your generator function. Nice, huh?
I find the following answer correct, but a little too complicated.
The only change I suggest is the mkInts method:
Func<int, Func<int, int>, IObservable<int>> mkInts = (i0, fn) =>
{
var s = new Subject<int>();
s.ObserveOn(Scheduler.NewThread).Select(fn).Subscribe(s);
s.OnNext(i0);
return s;
};
I was not entirely sure if you meant to feed the eventual result of the function back into the function again or if you meant to have a separate function that would get the next input, so I made both. The trick here is to let the IScheduler do the heavy lifting of the repeated calls.
public Func<T, IObservable<T>> Feedback<T>(Func<T, IObservable<T>> generator,
IScheduler scheduler)
{
return seed =>
Observable.Create((IObserver<T> observer) =>
scheduler.Schedule(seed,
(current, self) =>
generator(current).Subscribe(value =>
{
observer.OnNext(value);
self(value);
})));
}
public Func<T, IObservable<T>> GenerateAsync<T>(Func<T, IObservable<T>> generator,
Func<T, T> seedTransform,
IScheduler scheduler)
{
return seed =>
Observable.Create((IObserver<T> observer) =>
scheduler.Schedule(seed,
(current, self) =>
generator(current).Subscribe(value =>
{
observer.OnNext(value);
self(seedTransform(current));
})));
}
I believe the code is not tail recursive and hence causes a StackOverflowException. The code below works fine without any such exception.
public static IObservable<int> GetObs(int i)
{
return Observable.Return(i).Delay(TimeSpan.FromMilliseconds(10));
}
public static IObservable<int> MakeInts(int start)
{
return Observable.Generate(start, _ => true, i => i + 1, i => GetObs(i))
.SelectMany(obs => obs);
}
using (MakeInts(1).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
Or by modifying your code like:
Action<int, IObserver<int>> mkInt = (i,obs) =>
Observable.Return(i)
.Delay(TimeSpan.FromMilliseconds(10)).Subscribe<int>(ii => obs.OnNext(ii));
// pre-assign my generator function, since the function calls itself recursively
Func<int, IObservable<int>> mkInts = null;
// my generator function
mkInts = init =>
{
var s = new Subject<int>();
var ret = s.Do(i => {
mkInt(i + 1, s);
});
mkInt(init, s);
return ret;
};
using (mkInts(1).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
I found a solution, which, although it may not be the prettiest, does what I want it to. If anyone has a better solution, I will mark that as an answer.
Func<int, IObservable<int>> mkInt = ts =>
Observable.Return(ts)
.Delay(TimeSpan.FromMilliseconds(10));
Func<int, IObservable<int>> mkInts = init =>
{
Subject<int> subject = new Subject<int>();
IDisposable sub = null;
Action<int> onNext = null;
onNext = i =>
{
subject.OnNext(i);
sub.Dispose();
sub = mkInt(i + 1).Subscribe(onNext);
};
sub = mkInt(init).Subscribe(onNext);
return subject;
};
using (mkInts(1).Subscribe(Console.WriteLine))
{
Console.ReadLine();
}
I have this structure:
static Dictionary<int, Dictionary<int, string>> tasks =
new Dictionary<int, Dictionary<int, string>>();
It looks like this:
[1]([8] => "str1")
[3]([8] => "str2")
[2]([6] => "str3")
[5]([6] => "str4")
I want to get from this list all of the [8] strings, meaning str1 + str2
The method should look like the following:
static List<string> getTasksByNum(int num){
}
How do I access it?
With LINQ, you can do something like:
return tasks.Values
.Where(dict => dict.ContainsKey(8))
.Select(dict => dict[8])
.ToList();
While this is elegant, the TryGetValue pattern is normally preferable to the two lookup operations this uses (first trying ContainsKey and then using the indexer to get the value).
If that's an issue for you, you could do something like (with a suitable helper method):
return tasks.Values
.Select(dict => dict.TryGetValueToTuple(8))
.Where(tuple => tuple.Item1)
.Select(tuple => tuple.Item2)
.ToList();
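The helper method is not shown in the answer above. A minimal sketch of what it might look like follows: the name TryGetValueToTuple comes from the query above, but the signature and the containing class name are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Assumed shape of the helper: wraps TryGetValue so that the success flag
    // and the value travel together through a LINQ pipeline.
    // Item1 is true when the key was found; Item2 is the value, or
    // default(TValue) when the key is missing.
    public static Tuple<bool, TValue> TryGetValueToTuple<TKey, TValue>(
        this IDictionary<TKey, TValue> dictionary, TKey key)
    {
        TValue value;
        bool found = dictionary.TryGetValue(key, out value);
        return Tuple.Create(found, value);
    }
}
```

With this in scope, each inner dictionary is probed once per query instead of twice.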
Just iterate over all values of the first hierarchy level and use TryGetValue on the second level:
var result = new List<string>();
foreach(var inner in tasks.Values)
{
string tmp;
if(inner.TryGetValue(yourKey, out tmp))
result.Add(tmp);
}
This solution has a major advantage over all other solutions presented so far:
It actually uses the dictionaries of the second hierarchy level as a dictionary, i.e. the part inside the foreach loop is O(1) instead of O(n) as with all other solutions.
Check this function:
tasks.
Where(task => task.Value.ContainsKey(8)).
Select(task => task.Value[8]);
Daniel's solution is probably best, since it's easier to understand. But it's possible to use TryGetValue in a linq approach, too:
return tasks.Values
.Select(dictionary => {
string task;
var success = dictionary.TryGetValue(yourKey, out task);
return new { success, task };
})
.Where(t => t.success)
.Select(t => t.task)
.ToList();
Are you building tasks ?
And if I'm guessing right it's tasks[task_id]([cpu] => "task_name");
I would advise you to also build cpu_tasks[cpu]([task_id] => "task_name");
static Dictionary<int, Dictionary<int, string>> cpu_tasks
It would require some more maintenance but would give you a faster run on this specific function.
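The extra maintenance could be sketched roughly as below, keeping both dictionaries in step on every insert. The TaskIndex class, its member names, and the reading of the keys as task id and CPU are all assumptions carried over from the guess above, not something the question confirms.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TaskIndex
{
    // tasks[taskId][cpu] = name (the original layout)
    private readonly Dictionary<int, Dictionary<int, string>> tasks =
        new Dictionary<int, Dictionary<int, string>>();

    // cpuTasks[cpu][taskId] = name (the reverse index)
    private readonly Dictionary<int, Dictionary<int, string>> cpuTasks =
        new Dictionary<int, Dictionary<int, string>>();

    // Every write goes through here so the two structures never diverge.
    public void Add(int taskId, int cpu, string name)
    {
        if (!tasks.TryGetValue(taskId, out var byCpu))
            tasks[taskId] = byCpu = new Dictionary<int, string>();
        byCpu[cpu] = name;

        if (!cpuTasks.TryGetValue(cpu, out var byTask))
            cpuTasks[cpu] = byTask = new Dictionary<int, string>();
        byTask[taskId] = name;
    }

    // One lookup in the reverse index instead of a scan over all tasks.
    public List<string> GetTasksByNum(int num) =>
        cpuTasks.TryGetValue(num, out var byTask)
            ? byTask.Values.ToList()
            : new List<string>();
}
```

Removals and renames would need the same double bookkeeping, which is the maintenance cost mentioned above.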
Dictionary<int, Dictionary<int, string>> tasks = new Dictionary<int, Dictionary<int, string>>();
List<string> strings = new List<string>();
foreach(var dict in tasks.Values)
{
if(dict.ContainsKey(8))
strings.Add(dict[8]);
}
Dictionary<int, Dictionary<int, string>> tasks = new Dictionary<int, Dictionary<int, string>>();
var result = string.Empty;
//more human-readable version
var searchValue = 8;
foreach (var task in tasks)
{
if (task.Value.ContainsKey(searchValue))
result += task.Value[searchValue];
}
//one-line version
result = tasks.Aggregate(string.Empty, (a, kvp) => a + (kvp.Value.ContainsKey(searchValue) ? kvp.Value[searchValue] : string.Empty));