Using First() get the 2nd item of LINQ result? - c#

I'm new to LINQ and Entity Framework. I ran into this in a sample program while learning them.
The data in table is like this:
BlogId Title
1 Hello Blog
2 New Blog
3 New Blog
I have the following LINQ code, which tries to read the first blog id (expected to be 2):
var name = "New Blog";
var blogs = (from b in db.Blogs
where b.Title == name
orderby b.Title
select b);//.ToList();
Console.Write("The first id: ");
Console.WriteLine(blogs.First().BlogId);
The result comes out to be 3.
Then I use ToList():
var blogs = (from b in db.Blogs
where b.Title == name
orderby b.Title
select b).ToList();
Console.Write("The first id: ");
Console.WriteLine(blogs.First().BlogId);
The result comes out to be 2.
Can anyone help to explain this? Or is this a bug?
Thanks.
//////////////////////// UPDATE /////////////////////////////
I just deleted the data in the database and inserted some new items. Now the table is like this:
BlogId Title
5 New Blog
6 New Blog
7 New Blog
8 New Blog
Then I ran the program above (without ToList()), and the First() method returned the id 6.
So I assume the method always returns the 2nd item in this situation, and it doesn't seem to have anything to do with the RDBMS. Can anyone explain?
Thanks.
/////////////////////////////////////////////////////
FYI, the following is the whole .cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.Entity;
using System.ComponentModel.DataAnnotations;
namespace SampleNew
{
class Program
{
public class Blog
{
[Key]
public Int32 BlogId { get; set; }
public String Title { get; set; }
public virtual List<Post> Posts { get; set; }
}
public class Post
{
[Key]
public Int32 PostId { get; set; }
public String Title{ get; set; }
public String Content { get; set; }
}
public class BlogContext : DbContext
{
public DbSet<Blog> Blogs{ get; set; }
public DbSet<Post> Posts { get; set; }
}
static void Main(string[] args)
{
using (var db = new BlogContext())
{
// Create and save a new Blog
// Console.Write("Enter a name for a new Blog: ");
var name = "New Blog";
//var blog = new Blog { Title = name };
var blogs = (from b in db.Blogs
where b.Title == name
orderby b.Title
select b).ToList();
Console.Write("The first id: ");
Console.WriteLine(blogs.First().BlogId);
Console.WriteLine(blogs.Count());
Blog blog = null;
foreach (Blog b in blogs)
{
blog = b;
Console.WriteLine(blog.BlogId);
}
Console.WriteLine(blog.BlogId);
Console.WriteLine(blogs.First().BlogId);
Console.WriteLine(blogs.First().BlogId);
Console.WriteLine(blogs.Last().BlogId);
Console.WriteLine(blogs.Last().BlogId);
blog.Posts = new List<Post>();
var post = new Post { Content = "Test Content2", Title = "Test Title2"};
blog.Posts.Add(post);
db.Posts.Add(post);
db.SaveChanges();
// Display all Blogs from the database
var query = from b in db.Blogs
orderby b.Title
select b;
Console.WriteLine("All blogs in the database:");
foreach (var item in query)
{
Console.WriteLine(item.Title);
}
Console.WriteLine("Press any key to exit...");
Console.ReadKey();
}
}
}
}

You've got two identical titles there, but with different IDs. Your RDBMS has the flexibility of returning the rows that correspond to your 'New Blog' in any order that it wishes, because your code does not specify anything beyond the requirement to order by the title. Moreover, it is not even required to return results in the same order each time that you run the same query.
If you would like predictable results, add a "then by" to your LINQ statement to force the ordering that you wish to have:
var query = from b in db.Blogs
orderby b.Title, b.BlogId
select b;
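For reference, the same tie-breaker in method syntax is OrderBy followed by ThenBy. Below is a minimal in-memory sketch (LINQ to Objects rather than EF, so no database is involved); the Blog class is a stand-in for the entity in the question, and OrderingDemo and FirstIdByTitle are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Blog entity in the question (assumption: only these two fields matter here).
public class Blog
{
    public int BlogId { get; set; }
    public string Title { get; set; }
}

public static class OrderingDemo
{
    // Orders by Title, then by BlogId, so rows with equal titles
    // always come back in the same order.
    public static int FirstIdByTitle(IEnumerable<Blog> blogs, string title)
    {
        return blogs
            .Where(b => b.Title == title)
            .OrderBy(b => b.Title)
            .ThenBy(b => b.BlogId) // tie-breaker for identical titles
            .First()
            .BlogId;
    }

    public static void Main()
    {
        var blogs = new List<Blog>
        {
            new Blog { BlogId = 3, Title = "New Blog" },
            new Blog { BlogId = 1, Title = "Hello Blog" },
            new Blog { BlogId = 2, Title = "New Blog" },
        };
        Console.WriteLine(FirstIdByTitle(blogs, "New Blog")); // prints 2
    }
}
```

With the tie-breaker in place the result no longer depends on the order in which the source happens to yield rows with equal titles.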
EDIT :
When I ran the program above, the First() method returns the id 6 so I assume the method always returns the 2nd item in the situation above. And it doesn't seem to have anything to do with the RDBMS. Can anyone explain?
This also happens in the RDBMS itself, and it is reproducible without LINQ. Here is a small demo (link to sqlfiddle):
create table blogs(blogid int,title varchar(20));
insert into blogs(blogid,title) values (5,'New blog');
insert into blogs(blogid,title) values (6,'New blog');
insert into blogs(blogid,title) values (7,'New blog');
insert into blogs(blogid,title) values (8,'New blog');
SELECT * FROM Blogs ORDER BY Title
This query produces results in "natural" order:
BLOGID TITLE
------ --------
5 New blog
6 New blog
7 New blog
8 New blog
However, this query, which is what EF runs to get the First() item in RDBMS
SELECT TOP 1 * FROM Blogs ORDER BY Title
returns the second row in natural order:
BLOGID TITLE
------ --------
6 New blog
It does not mean that it is going to return the same row in other RDBMSs (link to a demo with MySQL returning a different row for the same query), or even in the same RDBMS. It simply demonstrates that LINQ relies on RDBMS for the selection of the row, and the RDBMS returns an arbitrarily selected row.

I suspect the difference comes in the optimizations that are taken by First() without the ToList().
When you call ToList(), the entire ordered list must be created. So it will order everything using an efficient sort algorithm.
However, with First(), it only needs to find the min value. So it can use a much more efficient algorithm that basically goes through the enumerable once and stores the current min object value. (So it will result in the first object with the min value.)
This is a different algorithm than sorting the entire list, and hence it gets a different result.
Update:
Also, this being a database, it may be using LINQ to SQL, which will produce a different query based on the above description (getting a sorted list vs. getting the first row with the min value).

Related

Optimize EF core query in an alphabetically ordered list

I've been dealing with an issue lately, and although I have some solutions in mind, I'd like to find the best one from every point of view.
Let's say I have a WPF app with EF Core. There are about 3000 customers in my database (SQLite in my case, but in the future this should also work with slower ones). When the user opens the customer list, I load only some of them (quantity = 50, page = 0), in alphabetical order. As soon as the user scrolls down to the bottom, 50 more are loaded (quantity = 50, page = 1).
CustomerRepository.GetQueryableAll().Skip(page * quantity).Take(quantity).ToList();
Everything works fine. Here comes the problem though: there's a button to create a new customer, which opens a modal window. Let's say the user creates a customer whose name starts with W. As soon as he/she hits SAVE, the new customer is saved to the database, the window is closed, and the list must be reloaded. But loading the whole list up to W is, of course, really slow.
So far, I've tried querying the database in a background task and storing how many customers start with each letter of the alphabet in a static Dictionary: as soon as SAVE is hit, I can guess more or less how many "pages" to Skip() in the database and get the group of 50 that contains the new customer. It works, and it's quite fast, but I'm worried that it won't work in countries with non-Latin alphabets:
public async Task<Dictionary<char, int>> GetCustomersByInitialsCount()
{
return await Task.Run(async delegate
{
var dictionary = new Dictionary<char, int>();
for (char c = 'A'; c <= 'Z'; c++)
{
var count = await CustomerRepository.GetCustomerCountStartingWith(c.ToString());
dictionary.Add(c, count);
}
return dictionary;
});
}
[... and in the repository:]
public async Task<int> GetCustomerCountStartingWith(string startingLetter)
{
using (var dbContext = new MyDbContext())
{
return await dbContext.Set<Customer>().CountAsync(p => p.LastName.ToUpper().StartsWith(startingLetter.ToUpper()));
}
}
Otherwise, instead of this background query, I could also try to "guess" the right page from the starting character, but I'm still puzzled by the unexpected outcomes I could get with non-Latin languages.
If anybody knows better tools or has any other useful ideas, I'll gladly consider them!
Thank you very much in advance and happy coding.
What if you add a query to get all the first "letters" in your table?
public async Task<List<string>> GetCustomerFirstLetter()
{
using (var dbContext = new MyDbContext())
{
// ToListAsync (not ToList) so that the result can be awaited
return await dbContext.Set<Customer>().Select(x => x.LastName.Substring(0, 1)).Distinct().ToListAsync();
}
}
and then
public async Task<Dictionary<char, int>> GetCustomersByInitialsCount()
{
return await Task.Run(async delegate
{
var dictionary = new Dictionary<char, int>();
var letters = await GetCustomerFirstLetter();
foreach (var letter in letters)
{
var count = await CustomerRepository.GetCustomerCountStartingWith(letter);
// letter is a one-character string; the dictionary key is a char
dictionary.Add(letter[0], count);
}
return dictionary;
});
}
An alternative solution, which is a little more efficient from my point of view.
Your problem boils down to finding the new customer's row number in the whole dataset ordered by customer name.
First of all, in plain SQL for SQLite or MSSQL you can get the right page number with the ROW_NUMBER function (SQLite supports window functions from version 3.25). The example below uses MSSQL syntax; in SQLite, drop TOP 1 and append LIMIT 1 instead. Query example:
SELECT TOP 1 rnd.rownum, rnd.LastName
from (SELECT ROW_NUMBER() OVER( ORDER BY c.LastName) AS rownum, c.LastName
FROM [Customer] c) rnd
WHERE rnd.LastName = '<your new customers name here>'
So, once you have the exact row number and you already know the page size, you can easily calculate the page you need.
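The page calculation can be sketched as follows. This assumes the row number is 1-based (as ROW_NUMBER() returns it) and the page index is 0-based, matching the Skip(page * quantity).Take(quantity) call in the question; Paging and PageIndexFor are hypothetical names.

```csharp
using System;

public static class Paging
{
    // rowNumber: 1-based position of the row in the ordered result.
    // quantity: number of rows per page.
    // Returns the 0-based page index that contains the row.
    public static int PageIndexFor(long rowNumber, int quantity)
    {
        if (rowNumber < 1) throw new ArgumentOutOfRangeException(nameof(rowNumber));
        if (quantity < 1) throw new ArgumentOutOfRangeException(nameof(quantity));
        return (int)((rowNumber - 1) / quantity);
    }

    public static void Main()
    {
        Console.WriteLine(PageIndexFor(1, 50));    // prints 0
        Console.WriteLine(PageIndexFor(50, 50));   // prints 0
        Console.WriteLine(PageIndexFor(51, 50));   // prints 1
        Console.WriteLine(PageIndexFor(2875, 50)); // prints 57
    }
}
```

Subtracting 1 before dividing is what keeps row 50 on page 0 rather than pushing it to page 1.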
Getting back to your code: in EF this can be done with the overload of the Select method that also passes the element index, but unfortunately it has not been implemented in EF Core for IQueryable yet (see this).
But you can still pass the exact query straight to the database using the FromSql method.
The solution consists of two steps:
To get the required data, define a Query in the model builder like this (the additional fields are just an example; you only need RowNum):
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Query<CustomerRownNum>();
}
public class CustomerRownNum
{
public long RowNum { get; set; }
public Guid Id { get; set; }
public string LastName { get; set; }
}
Then pass the SQL query mentioned above to the context's Query method like this:
string customerLastName = "<your customer's last name>";
// @"..." is a verbatim string literal, which allows the query to span several lines
var result = dbContext.Query<CustomerRownNum>().FromSql(
@"select top 1 rnd.RowNum, rnd.Id, rnd.LastName
from
(SELECT ROW_NUMBER() OVER( ORDER BY c.LastName) AS RowNum
, c.Id, c.LastName
FROM [Customer] c) rnd
WHERE rnd.LastName = {0}", customerLastName).FirstOrDefault();
Finally you'll get data you needed right in result variable.
Hope that helps!

Update collection from DbSet object via Linq

I know it is not complicated, but I am struggling with it.
I have an IList<Material> collection:
public class Material
{
public string Number { get; set; }
public decimal? Value { get; set; }
}
var materials = new List<Material>();
// Number is declared as a string, so the values must be string literals
materials.Add(new Material { Number = "111" });
materials.Add(new Material { Number = "222" });
And I have a DbSet<Material> collection
with columns Number and ValueColumn.
I need to update the Value property of the items in the IList<Material> from the DbSet<Material> collection, under the following conditions:
Only one query is sent to the database
The data returned from the database is limited by the Number identifier (do not load the whole database table into memory)
I tried the following (based on my previous question):
Working solution 1, but it downloads the whole table into memory (monitored in SQL Server Profiler).
var result = (
from db_m in db.Material
join m in model.Materials
on db_m.Number.ToString() equals m.Number
select new
{
db_m.Number,
db_m.Value
}
).ToList();
model.Materials.ToList().ForEach(m => m.Value= result.SingleOrDefault(db_m => db_m.Number.ToString() == m.Number).Value);
Working solution 2, but it executes a query for each item in the collection.
model.Materials.ToList().ForEach(m => m.Value= db.Material.FirstOrDefault(db_m => db_m.Number.ToString() == m.Number).Value);
Incomplete solution, where I tried to use the Contains method:
// I am trying to get new filtered collection from database, which i will iterate after.
var result = db.Material
.Where(x=>
// here is the reasonable error: cannot convert int into Material class, but i do not know how to solve this.
model.Materials.Contains(x.Number)
)
.Select(material => new Material { Number = material.Number.ToString(), Value = material.Value});
Any ideas? For me it is much easier to execute a stored procedure with comma-separated id values as a parameter and get the data directly, but I want to master LINQ too.
I'd do something like this, without trying to get too cute:
// If db_m.Number is an int column, parse the strings first, e.g. .Select(m => int.Parse(m.Number))
var numbersToFilterBy = model.Materials.Select(m => m.Number).ToArray();
...
var result = from db_m in db.Material where numbersToFilterBy.Contains(db_m.Number) select new { ... }
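Once the filtered rows are back from the single query, the values still have to be copied onto the in-memory list. A rough sketch of that step, done purely in memory: dbRows stands in for the query result, MaterialUpdater and ApplyValues are hypothetical names, and the sketch assumes Number is unique and non-null in the returned rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Material
{
    public string Number { get; set; }
    public decimal? Value { get; set; }
}

public static class MaterialUpdater
{
    // Copies Value from the rows returned by the database onto the
    // matching in-memory items; items without a match are left unchanged.
    public static void ApplyValues(IList<Material> materials, IEnumerable<Material> dbRows)
    {
        // One dictionary lookup per item instead of scanning the result list each time.
        // Assumption: Number is unique in dbRows (ToDictionary throws on duplicates).
        var valueByNumber = dbRows.ToDictionary(r => r.Number, r => r.Value);
        foreach (var m in materials)
        {
            if (valueByNumber.TryGetValue(m.Number, out var value))
            {
                m.Value = value;
            }
        }
    }

    public static void Main()
    {
        var materials = new List<Material>
        {
            new Material { Number = "111" },
            new Material { Number = "222" },
        };
        var dbRows = new List<Material> { new Material { Number = "111", Value = 9.5m } };
        ApplyValues(materials, dbRows);
        Console.WriteLine(materials[0].Value);         // prints 9.5
        Console.WriteLine(materials[1].Value == null); // prints True
    }
}
```

Unlike calling SingleOrDefault(...).Value on each item, TryGetValue does not throw when a number has no matching row.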

Get Id's of recently inserted rows in Entity Framework

I'm bulk inserting rows into a table (which has a identity column which auto increments every time a new row is inserted) based on the following post
https://stackoverflow.com/a/5942176/3861992
After all rows are inserted, how do I get the list of ids of the rows that are recently inserted?
Thanks
After you add an entity and call SaveChanges(), Entity Framework (EF) sets the value of its Id.
Suppose that the entity you want to insert into the database is as follows:
public class EntityToInsert
{
public int Id { get; set; }
public string Name { get; set; }
public int Age { get; set; }
}
And you want to insert a list of entities:
var list = new List<EntityToInsert>()
{
new EntityToInsert() {Name = "A", Age = 15},
new EntityToInsert() {Name = "B", Age = 25},
new EntityToInsert() {Name = "C", Age = 35}
};
foreach (var item in list)
{
context.Set<EntityToInsert>().Add(item);
}
context.SaveChanges();
// get the list of ids of the rows that are recently inserted
var listOfIds=list.Select(x => x.Id).ToList();
I hope this helps.
When all rows are really inserted in the database(after calling SaveChanges() in Entity Framework), the real IDs of these rows are populated.
So after SaveChanges() you will have IDs there in inserted objects without doing any query.
Try this:
dbcontext.Entry( [object] ).GetDatabaseValues();
This is for a single row. If my internet connection wasn't so slow at the moment, I'd look up the documentation to see if it's easy to get multiple rows. At the very least you can iterate through your list of database objects and get each entry's values. That, however, may not be the fastest solution.

DB first Entity Framework query incredibly slow

I am new to databases, and to EF. I am using EF within an ASP.NET Core MVC project. The implementation code below is from a Controller, aiming to combine data from two tables into a summary.
The database has tables: Batch, Doc.
Batch has many columns, including: int BatchId, string BatchEnd. BatchEnd is a consistently formatted DateTime, e.g. 23/09/2016 14:33:21
Doc has many columns including: string BatchId, string HardCopyDestination. Many Docs can refer to the same BatchId, but all Docs that do so have the same value for HardCopyDestination.
I want to populate the following ViewModel
public class Batch
{
public int BatchId { get; set; }
public string Time { get; set; } // from BatchEnd
public string HardCopyDestination { get; set; }
}
But my current query, below, is running dog slow. Have I implemented this correctly?
var BatchViewModels = new List<Batch>();
// this is fine
var batches = _context.BatchTable.Where(
b => b.BatchEnd.Contains(
DateTime.Now.Date.ToString("dd/MM/yyyy")));
// this bit disappears down a hole
foreach (var batch in batches)
{
var doc = _context.DocTable.FirstOrDefault(
d => d.BatchId == batch.BatchId.ToString());
if (doc != null)
{
var newBatchVM = new Batch
{
BatchId = batch.BatchId,
Time = batch.BatchEnd.Substring(whatever to get time),
HardCopyDestination = doc.HardCopyDestination
};
BatchViewModels.Add(newBatchVM);
continue;
}
}
return View(BatchViewModels);
I think you're hitting the database once per batch. If you have many batches that is expensive. You can get all documents in one go from db.
var batchDict = batches.ToDictionary(b => b.BatchId);
var documents = _context.DocTable.Where(doc => batchDict.Keys.Contains(doc.BatchId));
BatchViewModels.AddRange(documents.Select(d => new Batch
{
BatchId = d.BatchId,
Time = batchDict[d.BatchId].BatchEnd.TimeOfDay, // you only want the time?
HardCopyDestination = d.HardCopyDestination
}));
By the way, Igor is right about dates. In addition, if BatchId is an int in BatchTable, then it should be an int in DocTable as well. In the code above I assume they are the same type, but it shouldn't be hard to change if they aren't.
Igor is also right that profiling the database is a good way to see what the problem is. I'm just taking a guess based on your code.
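The original loop produced at most one view model per batch, using the first matching Doc. Once the batches and the matching documents have each been loaded in a single query, that pairing can be done in memory. Below is a rough sketch under the column types described in the question (int BatchId on Batch, string BatchId on Doc, BatchEnd formatted like "23/09/2016 14:33:21"); BatchRow, DocRow, BatchSummary and BatchSummaries are hypothetical stand-ins for the real entities and view model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two tables and the view model.
public class BatchRow { public int BatchId { get; set; } public string BatchEnd { get; set; } }
public class DocRow { public string BatchId { get; set; } public string HardCopyDestination { get; set; } }
public class BatchSummary
{
    public int BatchId { get; set; }
    public string Time { get; set; }
    public string HardCopyDestination { get; set; }
}

public static class BatchSummaries
{
    public static List<BatchSummary> Build(IEnumerable<BatchRow> batches, IEnumerable<DocRow> docs)
    {
        // All docs of a batch share the same destination, so keep the first one per batch id.
        var destinationByBatchId = docs
            .GroupBy(d => d.BatchId)
            .ToDictionary(g => g.Key, g => g.First().HardCopyDestination);

        var result = new List<BatchSummary>();
        foreach (var batch in batches)
        {
            // Batches without any doc are skipped, as in the original loop.
            if (destinationByBatchId.TryGetValue(batch.BatchId.ToString(), out var destination))
            {
                result.Add(new BatchSummary
                {
                    BatchId = batch.BatchId,
                    // Assumption: "dd/MM/yyyy HH:mm:ss", so the time is the part after the space.
                    Time = batch.BatchEnd.Split(' ')[1],
                    HardCopyDestination = destination
                });
            }
        }
        return result;
    }

    public static void Main()
    {
        var batches = new List<BatchRow>
        {
            new BatchRow { BatchId = 1, BatchEnd = "23/09/2016 14:33:21" },
            new BatchRow { BatchId = 2, BatchEnd = "23/09/2016 15:00:00" },
        };
        var docs = new List<DocRow>
        {
            new DocRow { BatchId = "1", HardCopyDestination = "Printer A" },
            new DocRow { BatchId = "1", HardCopyDestination = "Printer A" },
        };
        foreach (var s in Build(batches, docs))
        {
            Console.WriteLine(s.BatchId + " " + s.Time + " " + s.HardCopyDestination);
        }
    }
}
```

This keeps the database work at two queries regardless of how many batches there are.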

Linq-to-Entities return list along with average of ratings for each item

I am returning a list of restaurants that pulls information from the RESTAURANT, CUISINE, CITY, and STARRATING tables. I want to get a list of each restaurant with its associated city and cuisine along with the average rating in the STARRATING table. This is what I have, so far ... Thanks in advance.
RestaurantsEntities db = new RestaurantsEntities();
public List<RESTAURANT> getRestaurantsWRating(string cuisineName, string cityName, string priceName, string ratingName)
{
var cuisineID = db.CUISINEs.First(s => s.CUISINE_NAME == cuisineName).CUISINE_ID;
List<RESTAURANT> result = (from RESTAURANT in db.RESTAURANTs.Include("CITY").Include("CUISINE").Include("STARRATING")
where RESTAURANT.CUISINE_ID == cuisineID
orderby RESTAURANT.REST_NAME ascending
select RESTAURANT).ToList();
return result;
}
From what you have it looks like Restaurant has a STARRATING collection. If so, this is what you can do:
from r in db.RESTAURANTs
where r.CUISINE_ID == cuisineID
orderby r.REST_NAME ascending
select new {
Restaurant = r,
City = r.CITY,
Cuisine = r.CUISINE,
AvgRating = r.STARRATING.Average(rt => rt.Rating)
}
You'd need to give more information about your classes and associations (preferably a class diagram) if this is not right.
(BTW, using all capitals for class and property names is not conventional.)
First I would wrap your whole code block above in a using statement:
using(RestaurantsEntities db = new RestaurantsEntities())
{
...
}
This will help with cleanup for the EF context.
The way I would typically do this is if you have control of your database, I would create a view in the database that does this work, add the view to your entity model and query the view. This simplifies the whole process and offloads the work of the aggregation to the database.
If you don't have control over the database or don't prefer the view technique then I would query using the include technique as you have done and then add a partial class to RESTAURANT (if using model-first) in order to add an AverageRating property and then manually calculate the average for each related STARRATING set of related rows and apply the resultant value to the added property. You could do this through linq to objects once you have all the data back. This technique would not scale very well as more data is accumulated unless you are confident you never return but one or a few RESTAURANT instances. You could use something like:
//query data as you have done above...
foreach(RESTAURANT r in result)
{
if(r.STARRATING.Count() > 0)
{
r.AverageRating = r.STARRATING.Average(rating => rating.Value); //.Value is your field name
}
else
{
r.AverageRating = 0; // or whatever default you prefer...
}
}
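The empty-collection check in the loop above matters because, in LINQ to Objects, Average throws an InvalidOperationException on an empty sequence of non-nullable numbers. One compact way to express the same default is DefaultIfEmpty. A minimal in-memory sketch, assuming the rating values are plain ints (Ratings and AverageOrZero are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ratings
{
    // Returns the average rating, or 0 when there are no ratings at all.
    public static double AverageOrZero(IEnumerable<int> ratings)
    {
        // DefaultIfEmpty(0) turns an empty sequence into a single 0,
        // so Average never sees an empty sequence.
        return ratings.DefaultIfEmpty(0).Average();
    }

    public static void Main()
    {
        Console.WriteLine(AverageOrZero(new[] { 4, 5 })); // 4.5 (decimal separator depends on culture)
        Console.WriteLine(AverageOrZero(new int[0]));     // prints 0
    }
}
```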
Hope this helps.