Is it possible to accelerate (dynamic) LINQ queries using GPU? - c#

I have been searching for some days for solid information on the possibility of accelerating LINQ queries using a GPU.
Technologies I have "investigated" so far:
Microsoft Accelerator
Cudafy
Brahma
In short, would it even be possible at all to do an in-memory filtering of objects on the GPU?
Let's say we have a list of objects and we want to filter on something like:
var result = myList.Where(x => x.SomeProperty == SomeValue);
Any pointers on this one?
Thanks in advance!
UPDATE
I'll try to be more specific about what I am trying to achieve :)
The goal is to use any technology that can filter a list of objects (ranging from ~50,000 to ~2,000,000 items) as fast as possible.
The operations I perform on the data once the filtering is done (sum, min, max etc.) use the built-in LINQ methods and are already fast enough for our application, so that's not a problem.
The bottleneck is "simply" the filtering of the data.
UPDATE
Just wanted to add that I have tested about 15 databases, including MySQL (checking a possible cluster approach / memcached solution), H2, HSQLDB, VelocityDB (currently investigating further), SQLite, MongoDB etc., and NONE is good enough when it comes to the speed of filtering data (of course, the NoSQL solutions do not offer this in the same way the SQL ones do, but you get the idea) and/or returning the actual data.
Just to summarize what I/we need:
A database that can sort data of about 200 columns and 250,000 rows in less than 100 ms.
I currently have a solution using parallelized LINQ which (on a specific machine) spends only nanoseconds on each row when filtering AND processing the result!
So we need something like sub-nanosecond filtering per row.
Why does it seem that only in-memory LINQ is able to provide this?
Why would this be impossible?
Some figures from the logfile:
Total tid för 1164 frågor: 2579
This is Swedish and translates:
Total time for 1164 queries: 2579
Where the queries in this case are queries like:
WHERE SomeProperty = SomeValue
And those queries are all being run in parallel on 225,639 rows.
So, 225,639 rows are being filtered in memory 1,164 times in about 2.5 seconds.
That is 2.579 s / (1,164 × 225,639) ≈ 9.5 nanoseconds per row, BUT that also includes the actual processing of the numbers! We do Count (not null), total count, Sum, Min, Max, Avg, Median. So we have 7 operations on these filtered rows.
So we could say it's actually 7 times faster than the databases we've tried, since we do NOT do any aggregation in those cases!
So, in conclusion, why are the databases so poor at filtering data compared to in-memory LINQ filtering? Has Microsoft really done such a good job that it is impossible to compete with it? :)
It makes sense that in-memory filtering should be faster, but I don't want a vague sense that it is faster. I want to know what is actually faster and, if possible, why.

I will answer definitively about Brahma, since it's my library, but it probably applies to the other approaches as well. The GPU has no knowledge of your objects. Its memory is also largely separate from CPU memory.
If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.
Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you had simply used the CPU in the first place (like the sample above).
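To make the packing step concrete, here is a minimal C# sketch (not Brahma's actual API; the int layout of SomeProperty is an assumption) that flattens just the needed field into an array a GPU API could consume:
// Hedged sketch: copy only the field the filter needs into a flat array,
// because the GPU cannot walk managed object graphs.
int n = myList.Count;
int[] keys = new int[n];          // assumes SomeProperty is an int
for (int i = 0; i < n; i++)
    keys[i] = myList[i].SomeProperty;
// keys is what you would upload to a GPU buffer; the kernel writes back a
// flag (or index) per element, which you then use on the CPU to pick the
// matching objects out of myList.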
Hope this helps.

The GPU is really not intended for general-purpose computing of this kind, especially with an object-oriented design like this, and filtering an arbitrary collection of objects would really not be an appropriate task for it.
GPU computation is great when you are performing the same operation on a large dataset - which is why things like matrix operations and transforms can work very nicely. There, the cost of copying the data can be outweighed by the GPU's incredibly fast computational capabilities...
In this case, you'd have to copy all of the data onto the GPU to make this work, and restructure it into some form the GPU will understand, which would likely be more expensive than just performing the filter in software in the first place.
Instead, I would recommend looking at PLINQ for speeding up queries of this nature. Provided your filter is thread safe (which it'd have to be for any GPU-related work anyway...), this is likely a better option for general-purpose query optimization, as it won't require copying your data. PLINQ would work by rewriting your query as:
var result = myList.AsParallel().Where(x => x.SomeProperty == SomeValue);
If the predicate is an expensive operation, or the collection is very large (and easily partitionable), this can make a significant improvement to the overall performance when compared to standard LINQ to Objects.
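As a rough sketch of how that fits the workload described in the question (Value is an assumed numeric field; everything else follows the question's names), the filtered set can be materialized once and the aggregates run on it:
// Hedged sketch: parallel filter, then the aggregations mentioned in the
// question (assumes at least one row matches; Value is an assumed field).
var filtered = myList.AsParallel()
                     .Where(x => x.SomeProperty == SomeValue)
                     .ToList();               // materialize once, reuse below
var count = filtered.Count;
var sum = filtered.Sum(x => x.Value);
var min = filtered.Min(x => x.Value);
var max = filtered.Max(x => x.Value);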

GpuLinq
GpuLinq's main mission is to democratize GPGPU programming through LINQ. The main idea is that we represent the query as an expression tree and, after various transformations and optimizations, compile it into fast OpenCL kernel code. In addition, we provide a very easy-to-use API without the need to mess with the details of the OpenCL API.
https://github.com/nessos/GpuLinq
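For readers unfamiliar with expression trees: the point is that the query is kept as data rather than compiled MSIL, so a library can inspect and translate it. A plain .NET illustration (standard System.Linq.Expressions, not GpuLinq's own API):
using System;
using System.Linq.Expressions;

// The lambda is captured as a tree of nodes instead of compiled code.
Expression<Func<int, bool>> predicate = x => x % 2 == 0;
Console.WriteLine(predicate);               // x => ((x % 2) == 0)
Console.WriteLine(predicate.Body.NodeType); // Equal - the tree can be walked
                                            // and translated, e.g. to OpenCL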

select *
from table1 -- contains 100k rows
left join table2 -- contains 1M rows
on table1.id1=table2.id2 -- this would run for ~100G times
-- unless they are cached on sql side
where table1.id between 1 and 100000 -- but this optimizes things (depends)
could be turned into
select id1 from table1 -- 400 KB if id1 is 32-bit
-- no need to order
-- result kept in host memory
select id2 from table2 -- 4 MB if id2 is 32-bit
-- no need to order
-- result kept in host memory
Both arrays are then sent to the GPU and processed with a kernel (CUDA, OpenCL) like the one below:
int i = get_global_id(0);           // one thread per id2 element
int selectedID2 = id2[i];
int summary__ = -1;                 // -1 means "no matching id1 found"
for (int j = 0; j < id1Length; j++)
{
    int selectedID1 = id1[j];
    summary__ = (selectedID2 == selectedID1 ? j : summary__); // no branching
}
summary[i] = summary__;             // stores the matching table1 index for the
                                    // "on table1.id1=table2.id2" part
On the host side, you can run
select * from table1 --- query3
and
select * from table2 --- query4
then use the index list computed on the GPU to stitch the rows together:
// pseudocode: x is a row of table1, x.index its position, summary[x.index] the matching table2 row (or -1)
myList.AsParallel().ForAll(x => x.leftJoinData = summary[x.index] >= 0 ? query4[summary[x.index]] : null);
The GPU code shouldn't take longer than about 50 ms on a GPU with constant memory, global broadcast ability and a few thousand cores.
If any trigonometric function is used for filtering, performance drops quickly. Also, the row counts of the left-joined tables make this O(m*n) in complexity, so millions versus millions would be much slower. GPU memory bandwidth is important here.
Edit:
A single gpu.findIdToJoin(table1, table2, "id1", "id2") operation on my HD 7870 (1280 cores) and R7 240 (320 cores), with a "products" table (64k rows) and a "categories" table (64k rows) as the left-join filter, took 48 milliseconds with an unoptimized kernel.
ADO.NET's "nosql"-style LINQ join took more than 2000 ms with only a 44k-row products table and a 4k-row categories table.
Edit-2:
A left join with a string search condition gets 50 to 200x faster on the GPU once the tables grow to thousands of rows, each row having at least hundreds of characters.

The simple answer for your use case is no.
1) There's no solution for that kind of workload even in raw LINQ to Objects, much less in something that would replace your database.
2) Even if you were fine with loading the whole data set at once (this takes time), it would still be much slower: GPUs have high throughput, but access to them is high latency. So if you're looking for "very" fast solutions, GPGPU is often not the answer - just preparing and sending the workload and getting the results back will be slow, and in your case it would probably need to be done in chunks too.

Related

LINQ join three tables ordered by Timestamp

I have three tables:
Sites:
Id
Timezone (string)
SiteName
Vehicles:
Id
SiteId
Name
Positions:
Id
TimestampLocal (DateTimeOffset)
VehicleId
Data1
Data2
...
Data50
There are multiple positions for a single vehicle. The Positions table is very large (100+ million records).
I need to get the last position for each vehicle (by timestamp, as they can send old data) and its timezone, so I can do further data processing based on the timezone. Something like {PositionId, VehicleId, Timezone, Data1}.
I have tried with:
var result =
from ot in entities.Positions
join v in entities.Vehicles on ot.VehicleId equals v.Id
join s in entities.Sites on v.SiteId equals s.Id
group ot by ot.VehicleId into grp
select grp.OrderByDescending(g=>g.TimestampLocal).FirstOrDefault();
then I process the data with:
foreach (var rr in result){... update Data1 field ... }
This gets the last values, but it brings back all the fields in Positions (a lot of data) and no Timezone. Also, the foreach part is very CPU intensive (probably because that is when the data is actually fetched); it sits at 100% CPU for a few seconds.
How can this be done in LINQ ...and be lightweight for the DB and its transfers?
Check out the following. It is along the same lines as what you have done, but the result contains the Sites Timezone column as projected; it is projected in the Join itself:
var result =
S.Join(V,s1=>s1.Id,v1=>v1.SiteId,(s1,v1)=>new {v1.Id,s1.Timezone})
.Join(P,v1=>v1.Id,p1=>p1.VehicleId,(v1,p1)=>new {p1,v1.Timezone})
.GroupBy(g=>g.p1.VehicleId)
.Select(x=>x.OrderByDescending(y=>y.p1.TimestampLocal).FirstOrDefault())
.Select(y=>new {y.p1,y.Timezone});
Now, the following are important points related to the questions you have asked:
To reduce the number of columns fetched (since you may not want all the Positions columns), the following needs to be done:
In this line - Join(P,v1=>v1.Id,p1=>p1.VehicleId,(v1,p1)=>new {p1,v1.Timezone})
project only the fields you need as the result of the Join, something like:
new {p1.Id,p1.TimestampLocal,p1.VehicleId,p1.Data1,p1...,v1.Timezone}
which will fetch only the projected fields, but then the GroupBy would change to GroupBy(g=>g.VehicleId).
The other option would be to change the GroupBy projection instead of the Join, as follows:
GroupBy(g=>g.p1.VehicleId,
g=>new {g.p1.Id,g.p1.TimestampLocal,g.p1.VehicleId,g.p1.Data1,g.p1...,g.Timezone})
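Putting both points together, a hedged sketch of the fully projected query could look like this (Data1 stands in for whichever Data columns you actually need; entity and property names follow the question):
var result =
    entities.Sites
        .Join(entities.Vehicles,
              s => s.Id, v => v.SiteId,
              (s, v) => new { VehicleId = v.Id, s.Timezone })
        .Join(entities.Positions,
              sv => sv.VehicleId, p => p.VehicleId,
              (sv, p) => new { p.Id, p.VehicleId, p.TimestampLocal, p.Data1, sv.Timezone })
        .GroupBy(x => x.VehicleId)
        .Select(g => g.OrderByDescending(x => x.TimestampLocal).FirstOrDefault());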
Now, for the remaining part being CPU intensive and hitting 100% CPU, the following could be done to optimize:
If each foreach iteration makes a network Update call, that is bound to make this a network-intensive and therefore slow process. It is preferable to make all the necessary changes in memory and then update the database in one go, although that will still be heavy if you have millions of records.
Even doing that for a million records will never be a good idea, since it would be fairly network and CPU intensive and thus slow. Your options would be:
Chunk the work into smaller pieces. I am not sure anyone needs to update a million records in one shot, so make it multiple smaller updates, executed one after another and triggered by the user; that will be much less taxing on system resources.
Bring a smaller dataset into memory: do the filtering at the database level using a parameter and pass only the data that needs to be modified into memory for the update.
Use projection as shown in the LINQ above to bring only the required columns; that reduces the overall memory footprint and is bound to have an impact.
If the logic is such that the various updates are mutually exclusive, do them using the Parallel API on a thread-safe structure; that ensures efficient use of all the CPU cores and is therefore faster, though the CPU will spike to 100% for a fraction of the time of the non-parallel execution.
Beyond this, provide specific details and I can help with more concrete suggestions; these are basic ones, and there is no golden rule for solving such optimization issues.

No improvement with the following PLINQ code

I do not see any improvements in processing speed using the following code:
IEnumerable<Quote> sortedQuotes = (from x in unsortedQuotes.AsParallel()
orderby (x.DateTimeTicks)
select x);
over the sequential version:
IEnumerable<Quote> sortedQuotes = (from x in unsortedQuotes
orderby (x.DateTimeTicks)
select x);
Am I missing something here? I varied the number of items in the source collection from thousands to several tens of millions, and at no size did the parallel version come out ahead.
Any tips appreciated. By the way, if anyone knows of a faster way to sort more efficiently (given my item type, which contains a long DateTimeTicks field that the items are sorted by in the collection), that would also be appreciated.
Edit: "sorting efficiently" -> as fast as possible.
Thanks
According to this page,
If you have a sort in your query, stop-and-go will be used instead because pipelining the output of a sort is wasteful. A sort exhibits extremely high latency [...], and so PLINQ prefers to devote all processing power to completing the sort as quickly as possible.
Your query contains only a sort (the select doesn't count), so the PLINQ engine will execute it sequentially.
You can only expect an improvement when the sorting is part of a larger query.
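For example, a hedged sketch where the sort is only one part of the query (Price and threshold are assumed names) gives PLINQ parallel work to do before the stop-and-go sort:
var result = unsortedQuotes.AsParallel()
                           .Where(q => q.Price > threshold)   // parallel filter
                           .OrderBy(q => q.DateTimeTicks)     // stop-and-go sort
                           .ToList();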

When does compiling LINQ to SQL queries improve performance

I was referring to an article which focuses on speeding up LINQ to SQL queries. One of the techniques it mentions is "Use Compiled Queries", and it explains how to use them.
I wanted to see the performance improvement of compiled queries, so I tried the same example provided by the author. I used the Northwind DB as the data context and compared normal execution with compiled-query execution in LINQPad.
First I tried executing the query without using CompiledQuery. It took 2.065 seconds.
var oo = from o in Orders
where o.OrderDetails.Any (p => p.UnitPrice > 100)
select o;
oo.Dump ("Order items with unit price more than $100");
var oo1 = from o in Orders
where o.OrderDetails.Any (p => p.UnitPrice > 10)
select o;
oo1.Dump ("Order items with unit price more than $10");
Secondly, the queries using CompiledQuery. They took 2.100 seconds.
var oo = CompiledQuery.Compile ((TypedDataContext dc, decimal unitPrice) =>
from o in Orders
where o.OrderDetails.Any (p => p.UnitPrice > unitPrice)
select o
);
oo (this, 100).Dump ("Order items with unit price more than $100");
oo (this, 10).Dump ("Order items with unit price more than $10");
Re-executing them several times showed that the time taken by both approaches is almost identical.
Here we see only two query executions per method. I also tried making 10 queries for each of them, but both completed in around 7 seconds.
Does pre-compiling the queries really improve performance? Or am I getting the usage wrong?
Thank you for your time and consideration.
Edit:
After reading the accepted answer, readers may also want to go through this article which nicely explains how compiled queries improve performance.
Bear in mind that there are two main pieces of a LINQ query that can be particularly expensive:
Compiling the LINQ expressions into an SQL Statement.
Running the SQL Statement and retrieving the results
In your case, you have a relatively simple query, and either a very slow database connection, some very large data sets, or tables that are not indexed in an optimal way to run this particular query. Or maybe a combination of all three.
So compared to the amount of time it is taking to produce the SQL for your query (maybe 10-50 milliseconds), the second step is taking so much time (~1000 ms) that you can hardly notice the difference.
You would see significant improvements if the following conditions are all true:
your LINQ query is complex,
you have a fast connection to your database,
the SQL query itself runs quickly on that database, and
the result set is small enough that it gets transferred back from the database relatively quickly.
In practice, I've had queries that can take upwards of 500ms to compile, but only a few milliseconds to actually run. These are usually the cases where I focus on precompiling queries.
One good way to know ahead of time what kind of performance gains you can expect from precompiled queries is to time the second instance of your query using a Stopwatch object, and then run the generated SQL directly using LINQPad's Analyze SQL feature. If the SQL query returns quickly but the LINQ query takes a long time, that's a good candidate for precompiling.
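A hedged sketch of that measurement (NorthwindDataContext, db and the navigation properties are assumed names; in LINQ to SQL the translation to SQL happens on the first invocation of the compiled delegate and is reused afterwards):
// Needs System.Data.Linq (CompiledQuery) and System.Diagnostics (Stopwatch).
var compiled = CompiledQuery.Compile((NorthwindDataContext dc, decimal unitPrice) =>
    dc.Orders.Where(o => o.OrderDetails.Any(d => d.UnitPrice > unitPrice)));

var sw = Stopwatch.StartNew();
var first = compiled(db, 100m).ToList();   // first call: translation + SQL execution
sw.Stop();
Console.WriteLine($"First call:  {sw.ElapsedMilliseconds} ms");

sw.Restart();
var second = compiled(db, 10m).ToList();   // later calls: SQL execution only
sw.Stop();
Console.WriteLine($"Second call: {sw.ElapsedMilliseconds} ms");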

Speed up linq query without where clauses

Quick LINQ performance question.
I have a database with many many records and it's used for a webshop.
All query logic and paging is done with LINQ, and it performs quite well.
This is because the usual search for products contains one or more where clauses, which shortens my result set to a couple of hundred results at most.
But... there is an option to list all products (when no search criteria are provided), and that query is slow... real slow. Even though I'm just asking for a single page with .Skip(20).Take(10), it's still slow because the total result is something like 140,000 products. Is there a way to limit this (or every) query so that the speed of the whole thing stays okay?
I don't want to force my customers to provide one or more criteria, but on the other hand I have no problem with telling them that they can never see more than 2000 products.
Thanks for helping!
Tys
Why don't you limit the number of records on the SQL side, as described in this post:
http://www.sqlservercurry.com/2009/06/skip-and-take-n-number-of-records-in.html
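In LINQ terms, a hedged sketch of that idea (db.Products, pageIndex and pageSize are assumed names): as long as the source stays an IQueryable, Skip/Take are translated into SQL paging instead of pulling all 140,000 rows to the client:
var page = db.Products                      // IQueryable<Product>, not a List
             .OrderBy(p => p.Id)            // a stable order is required for paging
             .Skip(pageIndex * pageSize)
             .Take(pageSize)
             .ToList();                     // only one page crosses the wire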
Watch out for any "premature" enumerations when you pass queries/results around in your code!
There are also several LINQ visualizers available which can help you see what your LINQ expressions actually translate to. Or you can play around with the expressions in LINQPad before integrating them into your code…
What you can do is have LINQ use a stored procedure in the database.
In that case it will be faster, because the database engine does the work and returns the result to LINQ; the database engine is made for that, and it is closer to the data than LINQ is.
I suggest you give it a try and give us feedback.
You can check which indexes the table has and what the PK is. It could be that the table has no index at all, so records are compared by field values. You can also capture the query in SQL Profiler, run it separately and analyse its query plan.

List<T> FirstOrDefault() bad performance - is dictionary possible in this case?

I have a set of 'codes' Z that are valid in a certain time period.
Since I need them many times in a large loop (a million+ iterations), and every time I have to look up the corresponding code, I cache them in a List<>. After finding the correct codes, I'm inserting a million rows (using SqlBulkCopy).
I look up the id with the following code (l_z is a List<T>):
var z_fk = (from z in l_z
where z.CODE == lookupCode &&
z.VALIDFROM <= lookupDate &&
z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
In other situations I have used a Dictionary with superb performance, but in those cases I only had to look up the id based on the code.
But now, searching on a combination of fields, I am stuck.
Any ideas? Thanks in advance.
Create a Dictionary that stores a list of items per lookup code - Dictionary<string, List<Code>> (assuming that the lookup code is a string and the objects are of type Code).
Then, when you need to query based on lookupDate, you can run your query directly off dict[lookupCode]:
var z_fk = (from z in dict[lookupCode]
where z.VALIDFROM <= lookupDate &&
z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
Then just make sure that whenever you have a new Code object, it gets added to the List<Code> in dict under its lookupCode (creating the list if one doesn't exist yet).
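A hedged sketch of building and querying that structure (Code, CODE, VALIDFROM, VALIDUNTIL and id follow the question; id is assumed to be an int):
// Build once, before the big loop.
var dict = new Dictionary<string, List<Code>>();
foreach (var z in l_z)
{
    if (!dict.TryGetValue(z.CODE, out var list))
        dict[z.CODE] = list = new List<Code>();
    list.Add(z);
}

// Inside the loop: only the entries for this code are scanned.
var z_fk = dict.TryGetValue(lookupCode, out var candidates)
    ? candidates.Where(z => z.VALIDFROM <= lookupDate && z.VALIDUNTIL >= lookupDate)
                .Select(z => z.id)
                .SingleOrDefault()
    : 0;   // 0 (or your own sentinel) when the code is unknown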
A simple improvement would be to use...
//in initialization somewhere
ILookup<string, T> l_z_lookup = l_z.ToLookup(z=>z.CODE);
//your repeated code:
var z_fk = (from z in l_z_lookup[lookupCode]
where z.VALIDFROM <= lookupDate && z.VALIDUNTIL >= lookupDate
select z.id).SingleOrDefault();
You could go further and use a smarter data structure that stores the dates in sorted order and use a binary search to find the id, but this may be sufficient. Further, you mention SqlBulkCopy - if you're dealing with a database, perhaps you can execute the query on the database side instead, and simply create an appropriate index covering the CODE, VALIDUNTIL and VALIDFROM columns.
I generally prefer a Lookup over a Dictionary of Lists, since it's trivial to construct and has a cleaner API (e.g. when a key is not present).
We don't have enough information to give very prescriptive advice, but there are some general things you should be thinking about.
What types are the time values? Are you comparing DateTimes or some primitive value (like a time_t)? Think about how your data types affect performance. Choose the best ones.
Should you really be doing this in memory, or should you be putting all these rows into SQL and letting it be queried there? It's really good at that.
But let's stick with what you asked about - in-memory searching.
When searching is taking too long there is only one solution - search fewer things. You do this by partitioning your data in a way that allows you to rule out as many items as possible with as few operations as possible.
In your case you have two criteria - a code and a date range. Here are some ideas...
You could partition based on code - i.e. Dictionary<string, List<T>>. If you have many evenly distributed codes, each list will be about N/M in size (where N = total item count and M = number of distinct codes), so a million items with ten codes now requires searching 100k items rather than a million. But you could take that a bit further: each List could itself be sorted by starting time, allowing a binary search to rule out many other items very quickly (this of course has a trade-off in the time spent building the collection). This should provide very quick lookups (see the sketch after this list).
You could partition based on date and just store all the data in a single list sorted by start date, then use a binary search to find the start date and march forward to find the code. Is there a benefit to this approach over the dictionary? That depends on the rest of your program. Maybe being an IList is important. I don't know. You need to figure that out.
You could flip the dictionary model and partition the data by start time rounded to some boundary (depending on the length, granularity and frequency of your events). This basically buckets the data into groups with similar start times; e.g. all the events that start between 12:00 and 12:01 might be in one bucket, and so on. If you have a small number of distinct codes and a lot of highly frequent (but not pathologically so) events, this might give you very good lookup performance.
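A hedged sketch of the first idea, with each code's list sorted by start time (field names follow the question; id is assumed to be an int, and the linear walk could be replaced by a binary search on VALIDFROM):
// Build once: one list per code, sorted by VALIDFROM.
var byCode = l_z.GroupBy(z => z.CODE)
                .ToDictionary(g => g.Key,
                              g => g.OrderBy(z => z.VALIDFROM).ToList());

int? FindId(string lookupCode, DateTime lookupDate)
{
    if (!byCode.TryGetValue(lookupCode, out var list)) return null;
    // Walk backwards past entries that start after lookupDate; a binary
    // search for the last VALIDFROM <= lookupDate would rule them out faster.
    for (int i = list.Count - 1; i >= 0; i--)
    {
        if (list[i].VALIDFROM > lookupDate) continue;
        if (list[i].VALIDUNTIL >= lookupDate) return list[i].id;
    }
    return null;
}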
The point? Think about your data. Consider how expensive it should be to add new data and how expensive it should be to query it. Think about how your data types affect those characteristics. Make an informed decision based on that. When in doubt, let SQL do it for you.
To me this sounds like a situation where all of this could happen on the database via a single statement. Then you can use indexing to keep the query fast and avoid pushing the data back and forth over the wire to and from your database.
