I have a web service that returns List<SampleClass>. I take data from database using LINQ-to-SQL as IEnumerable<SampleClass>. I then convert IEnumerable<SampleClass> to List<SampleClass>.
LINQ-to-SQL operation performance is OK, but IEnumerable<SampleClass> to List<SampleClass> takes some time to do the operation. Are there any solutions to get best performance from IEnumerable<SampleClass> to List<SampleClass>?
I read more than 3000 records from my database.
Thank you
IEnumerable to List takes some time to do
the operation.
The reason you are getting the delay is because, when you do ToList, that is the time, when the actual query gets executed and fetches records from the database.
This is called Deferred execution.
var query = db.yourTable.Where(r=> r.ID > 10);
var List = query.ToList(); //this is where the actual query gets executed.
When ever you iterate over the query using ToList, ToArray , Count() etc, that is when the actual query gets executed.
Are there any solutions to get best performance from
IEnumerable to List? I read more than 3000
records from my database.
Without improving the query, No, you can't. But do you really need to fetch 3000 records, you may look into paging using Skip and Take
Related
Are this 2 queries functionally equivalent?
1)
var z=Categories
.Where(s=>s.CategoryName.Contains("a"))
.OrderBy(s => s.CategoryName).AsEnumerable()
.Select((x,i)=>new {x.CategoryName,Rank=i});
2)
var z=Categories.AsEnumerable()
.Where(s=>s.CategoryName.Contains("a"))
.OrderBy(s => s.CategoryName)
.Select((x,i)=>new {x.CategoryName,Rank=i});
I mean, does the order of "AsNumerable()" in the query change the number of data items retrieved from the client, or the way they are retrieved?
Thank you for you help.
Are this 2 queries functionally equivalent?
If by equivalent you means to the final results, then probably yes (depending how the provider implements those operations), the difference is in the second query you are using in-memory extensions.
I mean, does the order of "AsNumerable()" in the query change the
number of data items retrieved from the client, or the way they are
retrieved?
Yes, in the first query, Where and OrderBy will be translated to SQL and the Select will be executed in memory.
In your second query all the information from the database is brought to memory, then is filtered and transformed in memory.
Categories is probably an IQueryable, so you will be using the extensions in Queryable class. this version of the extensions receive a Expression as parameter, and these expression trees is what allows transform your code to sql queries.
AsEnumerable() returns the object as an IEnumerable, so you will be using the extensions in Enumerable class that are executed directly in memory.
Yes they do the same thing but in different ways. The first query do all the selection,ordering and conditions in the SQL database itself.
However the second code segment fetches all the rows from the database and store it in the memory. Then after it sorts, orders, and apply conditions to the fetched data i.e now in the memory.
AsEnumerable() breaks the query into two parts:
The Inside-Part(query before AsEnumerable) is executed as LINQ-to-SQL
The Outside-Part(query after AsEnumerable) is executed as LINQ-to-Objects
I have a simple count query using LINQ and EF:
var count = (from I in db.mytable
where xyz
select I).Count();
the code above shows the query being locked in the database.
while the execute sql executes right away:
var count = db.SqlQuery<int>("select count(*) from mytable where xyz").FirstOrDefault();
the code above returns immediately.
I few have suggested to remove the .ToList() which I did and not difference. One thing is that this only happens on the PROD server. The QA server executes pretty fast as expected. But the prod server shows that it gets suspended. I suspect this could be a data storage limitation or server related. But wanted to make sure I am not doing something stupid in the code.
UPDATE:
One thing I noticed is the first time it execute is takes longer the first time. When I set next statement to run it again, it executes immediately. Is there a compile of the query the first time?
Because you are calling ToList in the first query and that causes fetching all records from DB and do the counting in memory. Instead of ToList you can just call Count() to get the same behaviour:
var count = (from I in db.mytable
where xyz
select I).Count();
You must not call .ToList() method, because you start retrieve all objects from database.
Just call .Count()
var count = (from I in db.mytable
where xyz
select I).Count();
Count can take a predicate. I'm not sure if it will speed up your code any but you can write the count as such.
var count = db.mytable.Count(x => predicate);
Where predicate is whatever you are testing for in your where clause.
Simple fiddling in LINQPad shows that this will generate similar, if not exactly the same, SQL as above. This is about the simplest way, in terseness of code, that I know how to do it.
If you need much higher speeds than what EF provides, yet stay in the confines of EF without using inline SQL you could make a stored procedure and call it from EF.
In my C# Class Library project I have a method that needs to compute some statistics GetFaultRate, that, given a date, computes the number of products with faults over the number of products produced.
float GetFaultRate(DateTime date)
{
var products = GetProducts(date);
var faultyProducts = GetFaultyProducts(date);
var rate = (float) (faultyProducts.Count() / products.Count());
return rate;
}
Both methods, GetProducts and GetFaultyProducts take the data from a Repository class _productRepository.
IEnumerable<Product> GetProducts(DateTime date)
{
var products = _productRepository.GetAll().ToList();
var periodProducts = products.Where(p => CustomFunction(p.productionDate) == date);
return periodProducts;
}
IEnumerable<Product> GetFaultyProducts(DateTime date)
{
var products = _productRepository.GetAll().ToList();
var periodFaultyProducts = products.Where(p => CustomFunction(p.ProductionDate) == date && p.Faulty == true);
return periodFaultyProducts;
}
Where GetAll has signature:
IQueryable<Product> GetAll();
The products in the database are many and it takes a lot of time to retrieve them and convert ToList(). I need to enumerate the collection since any custom function such as CustomFunction, cannot be executed in a IQueryable<T>.
My application gets stuck for a long time before obtaining the fault rate. I guess it is because of the large number of objects to be retrieved. I can indeed remove the two functions GetProducts and GetFaultyProducts and implement the logic inside GetFaultRate. However since I have other functions that use GetProducts and GetFaultyProducts, with the latter solution I have only one access to the database but a lot of duplicate code.
What can be a good compromise?
First off, don't convert the IQueryable to a list. It forces the entire data set to be brought into memory all at once, rather than just calling Where directly on the query which will allow you to filter the data as it comes in. This will substantially decrease your memory footprint, and (very) marginally increase the runtime speed. If you need to convert an IQueryable to an IEnumerable so that the Where isn't executed by the database simply use AsEnumerable.
Next, getting all of the data is something you should avoid if at all possible, especially multiple times. You'd need to show us what your date function does, but it's possible that it is something that could be done on the database. Any filtering you can do at all at the database will substantially increase performance.
Next, you really don't need two queries here. The second query is just a subset of the first, so if you know that you'll always be using both queries then you should just just perform the first query, bring the results into memory (i.e. with a ToList that you store) and then use a Where on that to filter the results further. This will avoid another database trip as well as all of the data processing/filtering.
If you won't always be using both queries, but will sometimes use just one or the other, then you can improve the second query by filtering out on Faulty before getting all items. Add Where(p => p.Faulty) before you call AsEnumerable and filter on the date information after calling AsEnumerable (and that's if you can't convert any of the date filtering to filtering that can be done at the database).
It appears that in the end you only need to compute the ratio of items that are faulty as compared to the total. That can easily be done with a single query, rather than two.
You've said that Count is running really slowly in your code, but that's not really true. Count is simply the method that is actually enumerating your query, whereas all of the other methods were simply building the query, not executing it. However, you can cut your performance costs drastically by combining the queries entirely.
var lookup = _productRepository.GetAll()
.AsEnumerable()//if at all possible, try to re-write the `Where`
//to be a valid SQL query so that you don't need this call here
.Where(p => CustomFunction(p.productionDate) == date)
.ToLookup(product => product.Faulty);
int totalCount = lookup[true].Count() + lookup[false].Count();
double rate = lookup[true].Count() / (double) totalCount;
var products = GetProducts(date);
var periodFaultyProducts = (from p in products.AsParallel()
where p.Faulty == true
select p).AsEnumerable();
You need to reduce the number of database requests. ToList, First, FirstOrDefault, Any, Take and Count forces your query to run at a database. As Servy pointed out, AsEnumerable converts your query from IQueryable to IEnumerable. If you have to find subsets you can use Where.
I have many queries to do and I was wondering if there is a significant performance difference between querying a List and a DataTable or even a SQL server indexed table? Or maybe would it be faster if I go with another type of collection?
In general, what do you think?
Thank you!
It should almost always be faster querying anything in memory, like a List<T> or a DataTable vis-a-vis a database.
Having said that, you have to get the data into an in-memory object like a List before it can be queried, so I certainly hope you're not thinking of dumping your DB into a List<T> for fast querying. That would be a very bad idea.
Am I getting the point of your question?
You might be confusing Linq with a database query language. I would suggest reading up on Linq, particularly IQueryable vs IEnumerable.
In short, Linq is an in-code query language, which can be pointed at nearly any collection of data to perform searches, projections, aggregates, etc in a similar fashion as SQL, but not limited to RDBMSes. It is not, on its face, a DB query language like SQL; it can merely be translated into one by use of an IQueryable provider, line Linq2SQL, Linq2Azure, Linq for Entities... the list goes on.
The IEnumerable side of Linq, which works on in-memory objects that are already in the heap, will almost certainly perform better than the IQueryable side, which exists to be translated into a native query language like SQL. However, that's not because of any inherent weakness or strength in either side of the language. It is instead a factor of (usually) having to send the translated IQueryable command over a network channel and get the results over same, which will perform much more slowly than your local computer's memory.
However, the "heavy lifting" of pulling records out of a data store and creating in-memory object representations has to be done at some time, and IQueryable Linq will almost certainly be faster than instantiating ALL records as in-memory objects, THEN using IEnumerable Linq (Linq 2 Objects) to filter to get your actual data.
To illustrate: You have a table MyTable; it contains a relatively modest 200 million rows. Using a Linq provider like Linq2SQL, your code might look like this:
//GetContext<>() is a method that will return the IQueryable provider
//used to produce MyTable entitiy objects
//pull all records for the past 5 days
var results = from t in Repository.GetContext<MyTable>()
where t.SomeDate >= DateTime.Today.AddDays(-5)
&& t.SomeDate <= DateTime.Now
select t;
This will be digested by the Linq2SQL IQueryable provider into a SQL string like this:
SELECT [each of MyTable's fields] FROM MyTable WHERE SomeDate Between #p1 and #p2; #p1 = '2/26/2011', #p2 = '3/3/2011 9:30:00'
This query can be easily digested by the SQL engine to return EXACTLY the information needed (say 500 rows).
Without a Linq provider, but wanting to use Linq, you may do something like this:
//GetAllMyTable() is a method that will execute and return the results of
//"Select * from MyTable"
//pull all records for the past 5 days
var results = from t in Repository.GetAllMyTable()
where t.SomeDate >= DateTime.Today.AddDays(-5)
&& t.SomeDate <= DateTime.Now
select t;
On the surface, the difference is subtle. Behind the scenes, the devil's in those details. This second query relies on a method that retrieves and instantiates an object for every record in the database. That means it has to pull all those records, and create a space in memory for them. That will give you a list of 200 MILLION records, which isn't so modest anymore now that each of those records was transmitted over the network and is now taking up residence in your page file. The first query MAY introduce some overhead in building and then digesting the expression tree into SQL, but it's MUCH preferred over dumping an entire table into an in-memory collection and iterating over it.
Code:
var result = db.rows.Take(30).ToList().Select(a => AMethod(a));
db.rows.Take(30) is Linq-To-SQL
I am using ToList() to enumerate the results, so the rest of the query isn't translated to SQL
Which is the fastest way of doing that? ToArray()?
Use Enumerable.AsEnumerable:
var result = db.rows
.Take(30)
.AsEnumerable()
.Select(a => AMethod(a));
Use Enumerable.AsEnumerable() if you don't want to execute the database query immidiatly because because AsEnumerable() will still deffer the database query execution until you start enumerating the LINQ to Object query.
If you are sure that you will require the data and/or want to immidiatly execute the database query, use Enumerable.ToList() or Enumerable.ToArray(). The performance difference should be not to big.
I assume the rows are read into a variable sized container at first in both call, because the number of rows is not yet known. So I tend to say that ToList() could be a bit faster, because the rows could be directly read into the list while ToArray() probably reads the rows into a kind of list at first and then copies to an array after all rows have been transfered.