Combining one observable with the latest value from another observable - C#

I'm trying to combine two observables whose values share a key.
I want to produce a new value whenever the first observable produces a new value, combined with the latest value from the second observable whose key matches the key of that new value.
Pseudo-code example:
var obs1 = Observable.Interval(TimeSpan.FromSeconds(1)).Select(x => Tuple.Create(SomeKeyThatVaries, x));
var obs2 = Observable.Interval(TimeSpan.FromMilliseconds(1)).Select(x => Tuple.Create(SomeKeyThatVaries, x));
from x in obs1
let latestFromObs2WhereKeyMatches = …
select Tuple.create(x, latestFromObs2WhereKeyMatches)
Any suggestions?
Clearly this could be implemented by subscribing to the second observable and maintaining a dictionary of the latest values, indexed by key. But I'm looking for a different approach...
Usage scenario: one-minute price bars computed from a stream of stock quotes. Here the key is the ticker, and the dictionary holds the latest ask and bid prices per ticker, which are then used in the computation.
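To make that concrete, here is a rough sketch of the dictionary variant I described, using tuples as in the pseudo-code above (quoteStream, barTicks and the tuple shapes are made up for illustration, they are not part of my real code):
var latestQuotes = new ConcurrentDictionary<string, double>();
// q is a hypothetical Tuple<string, double>: (ticker, price)
quoteStream.Subscribe(q => latestQuotes[q.Item1] = q.Item2);
// t is a hypothetical Tuple<string, long>: (ticker, bar tick)
var bars = barTicks.Select(t =>
{
    double latest;
    latestQuotes.TryGetValue(t.Item1, out latest);
    return Tuple.Create(t.Item1, t.Item2, latest);   // bar keyed by ticker plus the latest quote
});
(Which is exactly the kind of plumbing I was hoping Rx could hide.)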
(By the way, thank you Dave and James this has been a very fruitful discussion)
(sorry about the formatting, hard to get right on an iPad..)

...why are you looking for a different approach? It sounds like you are on the right lines to me. It's short, simple code... roughly speaking, it will be something like:
var cache = new ConcurrentDictionary<long, long>();
obs2.Subscribe(x => cache[x.Item1] = x.Item2);
var results = obs1.Select(x => new {
    Obs1 = x.Item2,
    Obs2 = cache.ContainsKey(x.Item1) ? cache[x.Item1] : 0
});
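(A small aside: cache.TryGetValue would avoid the double dictionary lookup on a hit, but for a quick sketch like this it hardly matters.)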
At the end of the day, C# is an OO language and the heavy lifting of the thread-safe mutable collections is already all done for you.
There may be a fancy Rx approach (it feels like joins might be involved)... but how maintainable will it be? And how will it perform?
$0.02

I'd like to know the purpose of such a query. Would you mind describing the usage scenario a bit?
Nevertheless, it seems like the following query may solve your problem. The initial projections aren't necessary if you already have some way of identifying the origin of each value, but I've included them for the sake of generalization, to be consistent with your extremely abstract mode of questioning. ;-)
Note: I'm assuming that someKeyThatVaries is not shared data as you've shown it, which is why I've also included the term anotherKeyThatVaries; otherwise, the entire query really makes no sense to me.
var obs1 = Observable.Interval(TimeSpan.FromSeconds(1))
.Select(x => Tuple.Create(someKeyThatVaries, x));
var obs2 = Observable.Interval(TimeSpan.FromSeconds(.25))
.Select(x => Tuple.Create(anotherKeyThatVaries, x));
var results = obs1.Select(t => new { Key = t.Item1, Value = t.Item2, Kind = 1 })
    .Merge(
        obs2.Select(t => new { Key = t.Item1, Value = t.Item2, Kind = 2 }))
    .GroupBy(t => t.Key, t => new { t.Value, t.Kind })
    .SelectMany(g =>
        g.Scan(
            new { X = -1L, Y = -1L, Yield = false },
            (acc, cur) => cur.Kind == 1
                ? new { X = cur.Value, Y = acc.Y, Yield = true }
                : new { X = acc.X, Y = cur.Value, Yield = false })
         .Where(s => s.Yield)
         .Select(s => Tuple.Create(s.X, s.Y)));
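For what it's worth, a hypothetical usage of the query above, just to show the shape of the output (each emitted tuple pairs an obs1 value with the most recent obs2 value sharing its key):
using (results.Subscribe(pair =>
    Console.WriteLine("obs1: {0}, latest matching obs2: {1}", pair.Item1, pair.Item2)))
{
    Console.ReadLine();   // keep the subscription alive while values arrive
}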

Related

LINQ Lambda efficiency of code groupby orderby

I have this code, but I think it could run faster, or at least I hope so. I have plenty of data, and I'd like the query to be as efficient as it can be.
Here is the code:
(I need to return the newest translations of words (Language and Value) from resources, grouped by resource and language, filtered by Expression<Func<ResourcesTranslation, bool>> ConditionExpression.)
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language })
    .Select(m => m.OrderByDescending(o => o.Changed ?? o.Created))
    .Select(s => new KeyValues
    {
        Language = s.FirstOrDefault().Language,
        KeyValue = s.FirstOrDefault().Value
    }).ToList();
As you need only one element after grouping, you can return it right in the GroupBy clause; it will simplify your code:
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language },
             (x, y) => new { Max = y.OrderByDescending(o => o.Changed ?? o.Created).First() })
    .Select(s => new KeyValues
    {
        Language = s.Max.Language,
        KeyValue = s.Max.Value
    })
    .ToList();
You can also gain some performance by removing the first, unneeded Select (depending on the volume of data, the improvement could be anywhere from minimal to moderate), like this:
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language })
    .Select(s => s.OrderByDescending(o => o.Changed ?? o.Created).First())
    .Select(s => new KeyValues
    {
        Language = s.Language,
        KeyValue = s.Value
    }).ToList();
Depending on your case, you could:
If your data is in a database, make database-side improvements such as adding indexes, updating statistics, using hints, etc.
If this is local data, use some strategy to split new and old data between separate enumerables.
Beyond that, there is no way to significantly improve this LINQ query; you would need a different strategy to achieve that.
I found out in Visual Studio that this gets translated into nested SELECTs, so I realized that the best solution for cases like this is to create a view. Just answering my own question for anyone else who runs into this.

How to remove Items from a list with the same property value, where the count is greater than 2

I have a List of Job Cards. I want to remove the Job Cards where the VehicleID count is greater than two.
Here is my attempt:
var OpenJobCards = await _context.WorkshopJobCards
    .Include(wjc => wjc.WorkshopJobCardCategory)
    .Where(wjc => wjc.Job_Card_Closed == false)
    .ToListAsync();
OpenJobCards.Remove(OpenJobCards.GroupBy(wjc => wjc.VehicleID).Count() > 2);
Based on your comment, it would seem you're actually looking at this the wrong way. If vehicles have a collection of job cards, and your ultimate goal is to show only vehicles with less than 2 job cards assigned to them, then just do:
var vehicles = await _context.Vehicles.Where(x => x.JobCards.Count() < 2).ToListAsync();
Done.
As far as I understand, you are using a DbContext. If so, please also keep in mind that it is better to stay with IQueryable where possible, so that filtering is done on the database side and the data is not fetched into your application's memory.
var listToRemove = await _context.WorkshopJobCards
    .Include(wjc => wjc.WorkshopJobCardCategory)
    .Where(wjc => wjc.Job_Card_Closed == false)
    .GroupBy(wjc => wjc.VehicleID)
    .Where(t => t.Count() > 2)
    .Select(x => new OpenJobCard() { Id = x.Key })
    .ToListAsync();
_context.entity.RemoveRange(listToRemove);

Improve LINQ query performance that uses ToList()

This code was written by Rahul Singh in this post, Convert TSQL to Linq to Entities:
var result = _dbContext.ExtensionsCategories.ToList()
    .GroupBy(x => x.Category)
    .Select(x =>
    {
        var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).Contains(f.Extension));
        return new
        {
            Category = x.Key,
            TotalFileCount = files
        };
    });
but this code has a problem when used against a database context; we have to add ToList() like this to fix the "Only primitive types or enumeration types are supported in this context" error:
var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).ToList().Contains(f.Extension));
The problem with that is ToList() fetches all the records and hurts performance, so I wrote my own version:
var categoriesByExtensionFileCount =
    _dbContext.ExtensionsCategories.Select(ec =>
        new
        {
            Category = ec.Category,
            TotalSize = _dbContext.FileLists.Count(w => w.Extension == ec.Extension)
        });

var categoriesTotalFileCount =
    categoriesByExtensionFileCount.Select(se =>
        new
        {
            se.Category,
            TotalCount =
                categoriesByExtensionFileCount.Where(w => w.Category == se.Category).Sum(su => su.TotalSize)
        })
        .GroupBy(x => x.Category)
        .Select(y => y.FirstOrDefault());
The performance of this code is better, but it is a lot more code. Any ideas on improving the performance of the first version, or shortening the second one? :D
Regards, Mojtaba
You should have a navigation property from ExtensionCategories to FileLists. If you are using DB First, and have your foreign key constraints set up in the database, it should do this automatically for you.
If you supply your table designs (or model classes), it would help a lot too.
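For illustration only, a sketch of what the per-category count might look like if such a navigation property existed (the property name FileLists is assumed here, it isn't in your posted code):
var result = _dbContext.ExtensionsCategories
    .GroupBy(ec => ec.Category)
    .Select(g => new
    {
        Category = g.Key,
        // counts the files whose extension matches any extension in this category
        TotalFileCount = g.Sum(ec => ec.FileLists.Count())
    });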
Lastly, you can replace .ToList().Contains(...) with .Any(), which should solve your immediate issue. Something like:
_dbContext.FileLists.Count(f => x.Any(z => z.Extension == f.Extension));

Parallelizing data processing

I'm trying to improve the runtime of some data processing I'm doing. The data starts out as various collections (Dictionary mostly, but a few other IEnumerable types), and the end result of processing should be a Dictionary<DataType, List<DataPoint>>.
I have all this working just fine... except it takes close to an hour to run, and it needs to run in under 20 minutes. None of the data has any connection to any other from the same collection, although they cross-reference other collections frequently, so I figured I should parallelize it.
The main structure of the processing has two levels of loops with some processing in between:
// Custom class, 0.01%
var primaryData = GETPRIMARY().ToDictionary(x => x.ID, x => x);
// Custom class, 11.30%
var data1 = GETDATAONE().GroupBy(x => x.Category)
.ToDictionary(x => x.Key, x => x);
// DataRows, 8.19%
var data2 = GETDATATWO().GroupBy(x => x.Type)
.ToDictionary(x => x.Key, x => x.OrderBy(y => y.ID));
foreach (var key in listOfKeys)
{
// 0.01%
var subData1 = data1[key].ToDictionary(x => x.ID, x => x);
// 1.99%
var subData2 = data2.GroupBy(x => x.ID)
.Where(x => primaryData.ContainsKey(x.Type))
.ToDictionary(x => x.Key, x => ProcessDataTwo(x, primaryData[x.Key]));
// 0.70%
var grouped = primaryData.Select(x => new { ID = x.Key,
Data1 = subData1[x.Key],
Data2 = subData2[x.Key] }).ToList();
foreach (var item in grouped)
{
// 62.12%
item.Data1.Results = new Results(item.ID, item.Data2);
// 12.37%
item.Data1.Status = new Status(item.ID, item.Data2);
}
results.Add(key, grouped);
}
return results;
listOfKeys is very small, but each grouped will have several thousand items. How can I structure this so that each call to item.Data1.Process(item.Data2) can get queued up and executed in parallel?
According to my profiler, all the ToDictionary() calls together take up about 21% of the time, the ToList() takes up 0.7%, and the two items inside the inner foreach together take up 74%. Hence why I'm focusing my optimization there.
I don't know if I should use Parallel.ForEach() to replace the outer foreach, the inner one, both, or if there's some other structure I should use. I'm also not sure if there's anything I can do to the data (or the structures holding it) to improve parallel access to it.
(Note that I'm stuck on .NET4, so don't have access to async or await)
Based on the percentages you posted, and since you said grouped is very large, you would definitely benefit from parallelizing only the inner loop.
Doing so is fairly simple:
var grouped = primaryData.Select(x => new { ID = x.Key,
Data1 = subData1[x.Key],
Data2 = subData2[x.Key] }).ToList();
Parallel.ForEach(grouped, (item) =>
{
item.Data1.Results = new Results(item.ID, item.Data2);
item.Data1.Status = new Status(item.ID, item.Data2);
});
results.Add(key, grouped);
This assumes that new Results(item.ID, item.Data2) and new Status(item.ID, item.Data2) are safe to run concurrently (the only concern I would have is if they access non-thread-safe static resources internally, and even then a non-thread-safe constructor is a really bad design flaw).
There is one big caveat: this will only help if you are CPU bound. If Results or Status is IO bound (for example, waiting on a database call or a file on disk), this will hurt your performance instead of helping it. If you are IO bound rather than CPU bound, your only options are to buy faster hardware, try to optimize those two methods further, or cache in memory where possible so you don't have to do the slow IO at all.
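As an aside, if you do go this route and want to leave a core free for the rest of the application, the degree of parallelism can be capped explicitly; a minimal sketch:
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
};
Parallel.ForEach(grouped, options, item =>
{
    item.Data1.Results = new Results(item.ID, item.Data2);
    item.Data1.Status = new Status(item.ID, item.Data2);
});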
EDIT
Given the time measurements provided after I wrote this answer, it appears that this approach was looking for savings in the wrong places. I'll leave my answer as a warning against optimization without measurement!!!
So, because of the nesting in your approach, you are causing some unnecessary over-iteration of your collections, leading to rather nasty Big-O characteristics.
This can be mitigated by using the ILookup interface to pre-group collections by a key, and then using those lookups instead of repeated, expensive Where clauses.
I've had a stab at re-imagining your code to reduce complexity (but it is somewhat abstract):
var data2Lookup = data2.ToLookup(x => x.Type);

var tmp1 =
    listOfKeys
        .Select(key =>
            new {
                key,
                subData1 = data1[key],
                subData2 = data2Lookup[key].GroupBy(x => x.Category)
            })
        .Select(x =>
            new {
                x.key,
                x.subData1,
                x.subData2,
                subData2Lookup = x.subData2.ToLookup(y => y.Key)
            });

var tmp2 =
    tmp1
        .Select(x =>
            new {
                x.key,
                grouped = x.subData1
                    .Select(sd1 =>
                        new {
                            Data1 = sd1,
                            Data2 = x.subData2Lookup[sd1]
                        })
            });

var result =
    tmp2
        .ToDictionary(x => x.key, x => x.grouped);
It seems to me that the processing is somewhat arbitrarily placed midway through the building of results, but moving it shouldn't affect the outcome, right?
So once results is built, let's process it...
var items = result.SelectMany(kvp => kvp.Value);
foreach (var item in items)
{
    item.Data1.Process(item.Data2);
}
EDIT
I've deliberately avoided going parallel for the time being, so if you can get this working, there may be further speedup to be had by adding a bit of parallel magic.
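If the per-item processing does turn out to be CPU bound, that final loop is the natural place for it; a sketch, assuming Process touches no shared mutable state:
var items = result.SelectMany(kvp => kvp.Value);
Parallel.ForEach(items, item => item.Data1.Process(item.Data2));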

Order By on the Basis of Integer present in string

I have a problem in my C# application... I have some school classes in the database, for example 8-B, 9-A, 10-C, 11-C, and so on. When I use an order by clause to sort them, string comparison gives results like:
10-C
11-C
8-B
9-A
but I want integer sorting based on the first number present in the string, i.e.
8-B
9-A
10-C
11-C
hope you'll understand...
I've tried this, but it throws an exception:
var query = cx.Classes.Select(x=>x.Name)
.OrderBy( x=> new string(x.TakeWhile(char.IsDigit).ToArray()));
Please help me... I want the ordering based on the class number...
Maybe Split will do?
.OrderBy(x => Convert.ToInt32(x.Split('-')[0]))
.ThenBy(x => x.Split('-')[1])
If the input is well-formed enough, this would do:
var maxLen = cx.Classes.Max(x => x.Name.Length);
var query = cx.Classes.Select(x => x.Name).OrderBy(x => x.PadLeft(maxLen));
You can left-pad with '0' to a fixed length that fits your data, for example 6:
.OrderBy(x => x.PadLeft(6, '0'))
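A quick in-memory illustration of the zero-padding idea (assuming no class name is longer than 6 characters):
var names = new[] { "10-C", "8-B", "9-A", "11-C" };
var sorted = names.OrderBy(x => x.PadLeft(6, '0')).ToList();
// sorted: 8-B, 9-A, 10-C, 11-C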
This is fundamentally the same approach as Andrius's answer, written out more explicitly:
var names = new[] { "10-C", "8-B", "9-A", "11-C" };
var sortedNames =
(from name in names
let parts = name.Split('-')
select new {
fullName = name,
number = Convert.ToInt32(parts[0]),
letter = parts[1]
})
.OrderBy(x => x.number)
.ThenBy(x => x.letter)
.Select(x => x.fullName);
It's my naive assumption that this would be more efficient because the Split is only processed once in the initial select rather than in both OrderBy and ThenBy, but for all I know the extra "layers" of LINQ may outweigh any gains from that.
