I want to use Entity Framework to pull a database view into memory, because when I call ToList() my subsequent LINQ queries stop working and return a list of 0 elements.
Is it possible to do something like
var vwEntityList = dc.vwEntity.ToList().AsQueryable();
? I've tried this, but LINQ fails when I query the object. If I use just .AsQueryable() it works, but it is really slow because it hits the database for every operation. I want to avoid that and bring the EF view into memory as a queryable object.
When I have my list and try to use LINQ on it like
var newlist = vwEntityList.Where(x => x.deal_status == "B").ToList();
newlist comes back with 0 items, even though running a similar query directly against the database returns rows, and the same query also works when I use just .AsQueryable().
Thanks.
Based on what you've provided, it sounds like this works (but is too slow for your liking):
var list = context.vwEntity.Where(x => x.deal_status == "B").ToList();
but this does not:
var totalList = context.vwEntity.ToList();
var list = totalList.Where(x => x.deal_status == "B").ToList();
The reason this would be the case is database collation. Some databases, such as SQL Server, use a case-insensitive comparison for strings by default. So if a record had a deal_status of "b", the first query would return that record because in SQL "B" == "b"; however, once the data is loaded into memory, a Linq2Objects Where clause does a case-sensitive comparison, so "B" != "b". SQL Server databases can also be configured with case-sensitive collations, so it is dangerous to assume strings will always be treated case-insensitively.
Linq expressions work against IEnumerable so you don't need to force a List back to IQueryable.
If you are working with Linq2Objects, or against a case-sensitive database, you should ensure case-insensitive matching is done explicitly where you want it, rather than relying on collation:
// this should work...
var totalList = context.vwEntity.ToList();
var list = totalList.Where(x => x.deal_status.ToUpper() == "B").ToList();
That said, reading an entire table/view into memory and then querying the objects should never really be "faster" than querying the entities, unless perhaps you need to run a lot of different queries and want to cache the table in memory. Querying across strings commonly hits performance snags if the fields being queried are not indexed. I would suggest getting familiar with a query profiler for whatever database you are running against, so you can capture the exact queries that EF is executing and identify the performance bottlenecks. When facing slow queries, I don't think I've ever recommended "load the entire 200k records into memory first" as a solution. :)
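If you want to see the exact SQL without installing a separate profiler, one option is EF's built-in logging. A minimal sketch, assuming EF6 and the same context variable used above:
// EF6: write every SQL command the context generates to the console (or any logger)
context.Database.Log = Console.Write;
// Run the query and inspect the captured SQL
var filtered = context.vwEntity.Where(x => x.deal_status == "B").ToList();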
Related
I'm coding an application with Entity Framework in which I rely heavily on user-defined functions.
I have a question about the best (most optimized) way to limit and page my result sets. Basically, I am wondering whether these two options are the same or whether one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (@size * @page) ROWS FETCH NEXT @size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware of when your LINQ statement is an IEnumerable and when it is an IQueryable. As long as your statement is an IQueryable<...>, the provider will try to translate it into SQL and let your database do the query. Once it has lost the IQueryable and has become a plain IEnumerable, the data has been brought into local memory, and all further LINQ operations will be performed by your process, not by the database.
If you use your debugger, you will see that your fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory and your OrderBy etc. is performed by your process.
Usually it is much more efficient to move only the records that you will actually use into local memory. Besides that, do not fetch complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use.
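A minimal sketch of that idea, assuming the data is also exposed as a composable DbSet (the _DB.DataTable name and the projected Name column are assumptions, not from the original post):
var pageIndex = 0;
var pageSize = 100;
var results = _DB.DataTable                      // hypothetical DbSet for [Data].[Table]
    .OrderBy(x => x.Id)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .Select(x => new { x.Id, x.Name })           // project only the columns you actually need
    .ToList();                                   // one SQL query with ORDER BY / OFFSET / FETCH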
I suggest the second option because SQL Server can do this faster than C# methods.
In your first option, you take all of the records in the table and loop through them. With the second option, SQL Server does the work for you and you get only what you want.
You should apply the limiting and where clauses in the database as far as possible (how much this helps depends on the table indexes). For the first example:
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider adding limitations to filter records in the database. So the second option is the better approach in this case.
Our front end UI has a filtering system that, in the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Due to the large number of rows it operates over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId)) // get driver IDs for each car ID listed in the filter object
    .Select(cd => cd.DriverId)
    .Distinct()
    .ToList();
driverIds = driverIds.Concat(
        db.DriverShopManyToManyTable
            .Where(ds => filter.ShopIds.Contains(ds.ShopId)) // get driver IDs for each shop listed in the filter object
            .Select(ds => ds.DriverId)
            .Distinct())
    .Distinct()
    .ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All of them require keeping the parts of the query typed as IQueryable<T>, i.e. do not use ToList, ToArray, AsEnumerable, or other methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered Ids (which will be unique by definition) and use the join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way is to use two OR-ed Any-based conditions, which would translate to an EXISTS(...) OR EXISTS(...) SQL filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever smaller number seems reasonable to you.
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
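As an illustration (a minimal sketch; the Cars and Shops navigation properties are assumptions, not part of the original schema), the filter from the question could then be expressed without pulling any IDs into memory:
query = query.Where(d =>
    d.Cars.Any(c => filter.CarIds.Contains(c.Id)) ||   // hypothetical Driver.Cars navigation property
    d.Shops.Any(s => filter.ShopIds.Contains(s.Id)));  // hypothetical Driver.Shops navigation property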
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to add indexes on all of the "key" columns that you are joining on. The best way to figure out what indexes you need is to take the SQL query generated by your EF linq and bring it over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your @p parameters, just to make an example. Once you've done this, right click on the query and either use Display Estimated Execution Plan or Include Actual Execution Plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like using the method-call versions of the LINQ extensions is creating several collections before you're done. Using the query (from statement) syntax should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
             where filter.CarIds.Contains(record.CarId)
             select record.DriverId)
            .Concat(from record in db.DriverShopManyToManyTable
                    where filter.ShopIds.Contains(record.ShopId)
                    select record.DriverId)
            .Distinct();
Also, using the GroupBy extension would give better performance than querying each driver Id separately.
Are these 2 queries functionally equivalent?
1)
var z=Categories
.Where(s=>s.CategoryName.Contains("a"))
.OrderBy(s => s.CategoryName).AsEnumerable()
.Select((x,i)=>new {x.CategoryName,Rank=i});
2)
var z=Categories.AsEnumerable()
.Where(s=>s.CategoryName.Contains("a"))
.OrderBy(s => s.CategoryName)
.Select((x,i)=>new {x.CategoryName,Rank=i});
I mean, does the order of "AsEnumerable()" in the query change the number of data items retrieved from the client, or the way they are retrieved?
Thank you for you help.
Are these 2 queries functionally equivalent?
If by equivalent you mean the final results, then probably yes (depending on how the provider implements those operations). The difference is that in the second query you are using in-memory extensions.
I mean, does the order of "AsEnumerable()" in the query change the number of data items retrieved from the client, or the way they are retrieved?
Yes, in the first query, Where and OrderBy will be translated to SQL and the Select will be executed in memory.
In your second query, all the information from the database is brought into memory, and then it is filtered and transformed in memory.
Categories is probably an IQueryable, so you will be using the extensions in the Queryable class. These versions of the extensions receive an Expression as a parameter, and these expression trees are what allow your code to be translated into SQL queries.
AsEnumerable() returns the object as an IEnumerable, so you will be using the extensions in the Enumerable class, which execute directly in memory.
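To illustrate the difference, compare the (simplified) framework signatures: the Queryable version takes an expression tree that a provider can translate to SQL, while the Enumerable version takes a compiled delegate that runs in memory.
// System.Linq.Queryable (simplified): predicate is an expression tree, translatable to SQL
//   public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate);
// System.Linq.Enumerable (simplified): predicate is a compiled delegate, executed in memory
//   public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);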
Yes, they do the same thing but in different ways. The first query does all the selection, ordering, and filtering in the SQL database itself.
However, the second code segment fetches all the rows from the database and stores them in memory. Only then does it filter, order, and project the fetched data, i.e. now in memory.
AsEnumerable() breaks the query into two parts:
The inside part (the query before AsEnumerable) is executed as LINQ to SQL.
The outside part (the query after AsEnumerable) is executed as LINQ to Objects.
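Applied to the first query from the question, the split looks roughly like this (comments added for illustration):
var z = Categories
    .Where(s => s.CategoryName.Contains("a"))             // inside part: translated to SQL
    .OrderBy(s => s.CategoryName)                          // inside part: translated to SQL
    .AsEnumerable()                                        // boundary: results stream into memory from here
    .Select((x, i) => new { x.CategoryName, Rank = i });   // outside part: LINQ to Objects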
I have many queries to run, and I was wondering if there is a significant performance difference between querying a List, a DataTable, or even an indexed SQL Server table. Or would it maybe be faster if I went with another type of collection?
In general, what do you think?
Thank you!
Querying anything in memory, like a List<T> or a DataTable, should almost always be faster than querying a database.
Having said that, you have to get the data into an in-memory object like a List before it can be queried, so I certainly hope you're not thinking of dumping your DB into a List<T> for fast querying. That would be a very bad idea.
Am I getting the point of your question?
You might be confusing Linq with a database query language. I would suggest reading up on Linq, particularly IQueryable vs IEnumerable.
In short, Linq is an in-code query language, which can be pointed at nearly any collection of data to perform searches, projections, aggregates, etc. in a similar fashion to SQL, but not limited to RDBMSes. It is not, on its face, a DB query language like SQL; it can merely be translated into one by use of an IQueryable provider, like Linq2SQL, Linq2Azure, Linq for Entities... the list goes on.
The IEnumerable side of Linq, which works on in-memory objects that are already in the heap, will almost certainly perform better than the IQueryable side, which exists to be translated into a native query language like SQL. However, that's not because of any inherent weakness or strength in either side of the language. It is instead a factor of (usually) having to send the translated IQueryable command over a network channel and get the results over same, which will perform much more slowly than your local computer's memory.
However, the "heavy lifting" of pulling records out of a data store and creating in-memory object representations has to be done at some time, and IQueryable Linq will almost certainly be faster than instantiating ALL records as in-memory objects, THEN using IEnumerable Linq (Linq 2 Objects) to filter to get your actual data.
To illustrate: You have a table MyTable; it contains a relatively modest 200 million rows. Using a Linq provider like Linq2SQL, your code might look like this:
// GetContext<>() is a method that will return the IQueryable provider
// used to produce MyTable entity objects.
// Pull all records for the past 5 days:
var results = from t in Repository.GetContext<MyTable>()
              where t.SomeDate >= DateTime.Today.AddDays(-5)
                 && t.SomeDate <= DateTime.Now
              select t;
This will be digested by the Linq2SQL IQueryable provider into a SQL string like this:
SELECT [each of MyTable's fields] FROM MyTable WHERE SomeDate BETWEEN @p1 AND @p2; -- @p1 = '2/26/2011', @p2 = '3/3/2011 9:30:00'
This query can be easily digested by the SQL engine to return EXACTLY the information needed (say 500 rows).
Without a Linq provider, but wanting to use Linq, you may do something like this:
// GetAllMyTable() is a method that will execute and return the results of
// "SELECT * FROM MyTable".
// Pull all records for the past 5 days:
var results = from t in Repository.GetAllMyTable()
              where t.SomeDate >= DateTime.Today.AddDays(-5)
                 && t.SomeDate <= DateTime.Now
              select t;
On the surface, the difference is subtle. Behind the scenes, the devil's in those details. This second query relies on a method that retrieves and instantiates an object for every record in the database. That means it has to pull all those records, and create a space in memory for them. That will give you a list of 200 MILLION records, which isn't so modest anymore now that each of those records was transmitted over the network and is now taking up residence in your page file. The first query MAY introduce some overhead in building and then digesting the expression tree into SQL, but it's MUCH preferred over dumping an entire table into an in-memory collection and iterating over it.
Code:
var result = db.rows.Take(30).ToList().Select(a => AMethod(a));
db.rows.Take(30) is Linq-To-SQL
I am using ToList() to enumerate the results, so the rest of the query isn't translated to SQL
Which is the fastest way of doing that? ToArray()?
Use Enumerable.AsEnumerable:
var result = db.rows
    .Take(30)
    .AsEnumerable()
    .Select(a => AMethod(a));
Use Enumerable.AsEnumerable() if you don't want to execute the database query immediately, because AsEnumerable() will still defer the database query execution until you start enumerating the LINQ to Objects query.
If you are sure that you will need the data and/or want to execute the database query immediately, use Enumerable.ToList() or Enumerable.ToArray(). The performance difference should not be too big.
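For illustration, a small sketch based on the query from the question:
// Deferred: no SQL has run yet; the query executes when `deferred` is enumerated
var deferred = db.rows.Take(30).AsEnumerable().Select(a => AMethod(a));
// Immediate: ToList() executes the SQL here and materializes the 30 rows right away
var materialized = db.rows.Take(30).ToList().Select(a => AMethod(a));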
I assume the rows are read into a variable-sized container at first in both calls, because the number of rows is not yet known. So I tend to say that ToList() could be a bit faster, because the rows can be read directly into the list, while ToArray() probably reads the rows into a kind of list first and then copies them to an array after all rows have been transferred.