I have been using ServiceStack.Redis recently and need to query through IRedisTypedClient. I know all the data is in memory, but I would still like to know: is there a speed difference between GetAll().Where() and GetByIds()?
GetAll() and GetByIds() are two methods provided by ServiceStack.Redis.
With GetAll() I can keep searching the result with a lambda, which means I can use custom conditions. What I don't know is whether that loads all the data from Redis and then searches the resulting IEnumerable<T>, and whether that search is slower than GetByIds().
I just ran an experiment: I stored 1 million objects (P.S. there seems to be a ServiceStack bug, only about half a million objects can be stored at once)
and queried them with these two methods.
DateTime beginDate = DateTime.Now;
Debug.WriteLine("Query 1 started");
Website site = WebsiteRedis.GetByCondition(w => w.Name == "网址2336677").First();
double time = (DateTime.Now - beginDate).TotalMilliseconds;
Debug.WriteLine("Elapsed: " + time + "ms");
DateTime beginDate2 = DateTime.Now;
Debug.WriteLine("Query 2 started");
Website site2 = WebsiteRedis.GetByID(new Guid("29284415-5de0-4781-bea4-5e01332814b2"));
double time2 = (DateTime.Now - beginDate2).TotalMilliseconds;
Debug.WriteLine("Elapsed: " + time2 + "ms");
The results:
GetAll().Where() takes 19 seconds,
GetById() takes 190 ms.
My guess is that this is because ServiceStack uses the object id as the Redis key. So GetAll().Where() should never be used as a query; every object should be reachable through its id and fetched with GetById(). GetAll() should only be used on types with few records.
You can have a look at the implementations of GetAll and GetByIds to see how they work.
GetByIds converts each id to the fully qualified key its entry is stored under, then calls GetValues(), which issues a single MGET request to fetch all the values:
public IList<T> GetByIds(IEnumerable ids)
{
    if (ids != null)
    {
        var urnKeys = ids.Map(x => client.UrnKey<T>(x));
        if (urnKeys.Count != 0)
            return GetValues(urnKeys);
    }
    return new List<T>();
}
public IList<T> GetAll()
{
    var allKeys = client.GetAllItemsFromSet(this.TypeIdsSetKey);
    return this.GetByIds(allKeys.ToArray());
}
GetAll fetches all the Ids from the TypeIdsSetKey (i.e. Redis SET containing all ids for that Type) then calls GetByIds().
So GetByIds is faster because it makes one less call to Redis, but even GetAll only makes 2 Redis operations in total.
Note that both return an in-memory .NET List<T>, so you can use LINQ to filter the returned results further. However, GetAll returns every record of that type and the filtering is performed on the client, so this isn't efficient for large datasets. Instead you should look at creating manual indexes using Redis SETs for common queries.
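To illustrate the manual index idea, here is a minimal in-memory sketch. It does not call Redis or ServiceStack at all: the two dictionaries are stand-ins (my assumption) for the per-object keys and for one Redis SET per indexed value, and the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the idea only; no Redis or ServiceStack calls are made.
public class Website
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class IndexedWebsiteStore
{
    // Plays the role of the per-object keys that GetByIds reads.
    private readonly Dictionary<Guid, Website> byId = new Dictionary<Guid, Website>();

    // Plays the role of one Redis SET of ids per indexed Name value.
    private readonly Dictionary<string, HashSet<Guid>> idsByName =
        new Dictionary<string, HashSet<Guid>>();

    public void Store(Website site)
    {
        byId[site.Id] = site;
        if (!idsByName.TryGetValue(site.Name, out var ids))
        {
            ids = new HashSet<Guid>();
            idsByName[site.Name] = ids;
        }
        ids.Add(site.Id);   // in Redis this would be an SADD on the index set
    }

    public List<Website> GetByName(string name)
    {
        // Read the small index set first, then fetch only the matching objects by id.
        if (!idsByName.TryGetValue(name, out var ids))
            return new List<Website>();
        return ids.Select(id => byId[id]).ToList();
    }
}

public static class IndexDemo
{
    public static void Main()
    {
        var store = new IndexedWebsiteStore();
        for (var i = 0; i < 1000; i++)
            store.Store(new Website { Id = Guid.NewGuid(), Name = "site" + i });

        var hits = store.GetByName("site42");
        Console.WriteLine(hits.Count);   // 1
    }
}
```

The point of the design is that a lookup by name touches one small set plus the matching objects, instead of transferring every record to the client. This sketch does not handle renames; a real index would also have to remove the id from the old set when the indexed value changes.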
I was writing unit tests to compare an original response to a filtered response using a request object as a parameter. In doing so I noticed that if I change the request object after getting a response the IEnumerable list will change - As I type this, my thinking is that because it is an IEnumerable with LINQ, the request.Filter property is a reference in the LINQ query, which is what causes this behavior. If I converted this to a list instead of an IEnumerable, I suspect the behavior would go away because the .ToList() will evaluate the LINQ expressions instead of deferring. Is that the case?
public class VendorResponse {
    public IEnumerable<string> Vendors { get; set; }
}
var request = new VendorRequest() {
    Filter = ""
};
var response = await _service.GetVendors(request);
int vendorCount = response.Vendors.Count(); // 20
request.Filter = "at&t";
int newCount = response.Vendors.Count(); // 17
public async Task<VendorResponse> GetVendors(VendorRequest request)
{
    var vendors = await _dataService.GetVendors();
    return new VendorResponse {
        Vendors = vendors.Where(v => v.IndexOf(request.Filter) >= 0)
    };
}
If deferred execution is preferable, you can capture the current state of request.Filter in a local variable and use that in the Where predicate:
public async Task<VendorResponse> GetVendors(VendorRequest request)
{
    var filter = request.Filter;
    var vendors = await _dataService.GetVendors();
    return new VendorResponse {
        Vendors = vendors.Where(v => v.IndexOf(filter) >= 0)
    };
}
Yes!
This is an example of deferred execution of an IEnumerable, which just encapsulates a query on some data without encapsulating the result of that query.
An IEnumerable can be enumerated (via its IEnumerator), and "knows" how to enumerate the query it encapsulates, but this will not actually happen until something executes the enumeration.
In your case the enumeration is executed by the call to .Count() which needs to know how many items are in the result of the query. The enumeration occurs every time you call .Count(), so changing the filter between the two invocations leads to you getting two different results.
As you have correctly deduced, calling .ToList() and capturing the result in a variable before performing any further operations would lead to you capturing the resulting data rather than the query, and so lead to both counts having the same value.
Try this out yourself. In future, be sure to force the evaluation of the enumerable before passing to other queries, or returning out to unknown code, otherwise you or your users will encounter unexpected behaviour and possible performance issues.
Hope this helps :)
Edit 1:
As Moho has pointed out, and you have also alluded to in your original post, this is also a result of the lambda capturing the request object by reference: request.Filter is read again every time the IEnumerable is enumerated. If you capture the filter value in a local variable and use that instead, the result of the IEnumerable will no longer change when the request's filter is modified.
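The difference can be seen in a small console sketch. The vendor names and the Deferred/Materialized helpers are made up for the example (they stand in for the service call in the question); only the Where and ToList behaviour is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class VendorRequest
{
    public string Filter { get; set; }
}

public static class DeferredDemo
{
    // Hypothetical in-memory data standing in for _dataService.GetVendors().
    private static readonly string[] Vendors = { "at&t", "verizon", "t-mobile" };

    // Deferred: the lambda reads request.Filter every time the result is enumerated.
    public static IEnumerable<string> Deferred(VendorRequest request) =>
        Vendors.Where(v => v.IndexOf(request.Filter) >= 0);

    // Materialized: ToList() runs the query once and stores the results.
    public static IList<string> Materialized(VendorRequest request) =>
        Vendors.Where(v => v.IndexOf(request.Filter) >= 0).ToList();

    public static void Main()
    {
        var request = new VendorRequest { Filter = "" };
        var deferred = Deferred(request);
        var materialized = Materialized(request);

        Console.WriteLine(deferred.Count());     // 3: the empty filter matches everything
        request.Filter = "at&t";
        Console.WriteLine(deferred.Count());     // 1: the query is re-run with the new filter
        Console.WriteLine(materialized.Count);   // 3: the list was fixed when ToList() ran
    }
}
```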
I have two datetime pickers on my form. I want a function that will return all datetimes from a specific table (which are values of a specific column) between those two dates.
My method looks like this:
public DateTime[] GetAllArchiveDates(string username = null)
{
    var result = new DateTime[0];
    if (username != null)
    {
        result = this._context.archive.OrderBy(s => s.IssuingDate).Where(s => s.insertedBy == username).Select(s => s.issuing_date).Distinct().ToArray();
    }
    else
    {
        result = this._context.archive.OrderBy(s => s.IssuingDate).Select(s => s.issuing_date).Distinct().ToArray();
    }
    return result;
}
But I am getting this error:
System.NotSupportedException: 'The specified type member 'IssuingDate' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported.'
How can I do this?
The cause of your error message
You should be aware of the differences between IEnumerable and IQueryable.
An object of a class that implements IEnumerable holds everything to enumerate over the sequence of items it represents. You can ask for the first item of the sequence, and once you've got one, you can ask for the next item, until there are no more items.
On the other hand, an object of a class that implements IQueryable holds everything to ask another process to provide data to create an IEnumerable sequence. To do this, it holds an Expression and a Provider.
The Expression is a generic representation of what kind of IEnumerable must be created once you start enumerating the IQueryable.
The Provider knows who must execute the query, and it knows how to translate the Expression into a format that the executor understands, for instance SQL.
There are two kinds of LINQ statements: those that use deferred execution, and those that don't. The deferred functions can be recognized because they return IQueryable<TResult> (or IEnumerable<TResult>). Examples are Where, Select, GroupBy, etc.
The non-deferred functions return something else, such as a collection or a single TResult: ToList, ToDictionary, FirstOrDefault, Max.
As long as you concatenate deferred LINQ functions, the query is not executed, only the Expression is changed. Once you start enumerating, either explicitly using GetEnumerator and MoveNext, or implicitly using foreach, ToList, Max, etc, the Expression is sent to the Provider who will translate it to SQL and execute the query. The result is represented as an IEnumerable, on which the GetEnumerator is performed.
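A small sketch makes this visible. It uses an in-memory sequence wrapped with AsQueryable() instead of a database (an assumption made so the example runs without a provider such as Entity Framework), and a counter that records when the source is really enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    // Number of times the source sequence has actually been enumerated.
    public static int Enumerations;

    // Iterator method: its body only starts running on the first MoveNext.
    private static IEnumerable<int> Numbers()
    {
        Enumerations++;
        for (var i = 1; i <= 5; i++)
            yield return i;
    }

    public static IQueryable<int> BuildQuery()
    {
        // Only the Expression is built here; nothing is executed yet.
        return Numbers().AsQueryable()
            .Where(n => n % 2 == 1)
            .Select(n => n * 10);
    }

    public static void Main()
    {
        var query = BuildQuery();
        Console.WriteLine(Enumerations);       // 0: the source has not been touched
        Console.WriteLine(query.Expression);   // textual form of the expression tree

        var list = query.ToList();             // non-deferred: executes the query
        Console.WriteLine(Enumerations);       // 1
        Console.WriteLine(list.Count);         // 3 (10, 30, 50)
    }
}
```

With a database provider the same thing happens, except that executing the query means translating the Expression to SQL rather than running it in memory.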
What has this to do with my question?
Because the Expression must be translated into SQL, it can't hold anything that you invented. After all, SQL doesn't know your functions. In fact, there are a lot of standard functions that can't be used in an IQueryable. See Supported and unsupported LINQ functions.
Alas, you forgot to give us the archive class definition, but I think that it is not a POCO: it contains functions and properties that do more than just get / set. I think that IssuingDate is not just get / set.
For IQueryables you should keep your classes simple: use only {get; set;} properties in your query, nothing more. Other functions can be called after you've materialized your IQueryable into an IEnumerable, which is then executed within your local process.
Back to your question
So you have a database with a table Archive with at least columns IssuingDate and InsertedBy. It seems that InsertedBy is just a string. It could be a foreign key to a table with users. This won't influence the answer very much.
Following the Entity Framework code first conventions, this leads to the following classes:
class Archive
{
    public int Id {get; set;}
    public DateTime IssuingDate {get; set;}
    public string InsertedBy {get; set;}
    ...
}
public class MyDbContext : DbContext
{
    public DbSet<Archive> Archives {get; set;}
}
By the way, is there a proper reason you deviate so often from Microsoft standards about naming identifiers, especially pluralization and camel casing?
Anyway, your requirement
I have two datetime pickers on my form. I want a function that will return all datetimes from a specific table (which are values of a specific column) between those two dates.
Your code seems to do a lot more, but let's first write a function that meets your requirement. I'll write it as an extension method on IQueryable<Archive>, declared in a static class. This keeps your Archive class simple (only {get; set;}), yet it adds functionality. Writing it as an extension method also enables you to use it as if it were any other LINQ function. See Extension methods demystified.
public static IQueryable<Archive> BetweenDates(this IQueryable<Archive> archives,
    DateTime startDate,
    DateTime endDate)
{
    return archives.Where(archive => startDate <= archive.IssuingDate
        && archive.IssuingDate <= endDate);
}
If I look at your code, you don't do anything of selecting archives between dates. You do something with a userName, ordering, select distinct... It is a bit strange that you first Order all your million archives, and then decide to keep only the ten archives that belong to userName, and if you have several same issuing dates you decide to remove the duplicates. Wouldn't it be more efficient to first limit the number of issuing dates before you start ordering them?
public static IQueryable<DateTime> ToIssuingDatesOfUser(this IQueryable<Archive> archives,
    string userName)
{
    // first limit the number of archives, depending on userName,
    // then select the IssuingDate, remove duplicates, and finally order
    var archivesOfUser = (userName == null) ? archives :
        archives.Where(archive => archive.InsertedBy == userName);
    return archivesOfUser.Select(archive => archive.IssuingDate)
        .Distinct()
        .OrderBy(issuingDate => issuingDate);
}
Note: until now, I have only created IQueryables. So only the Expression is changed, which is fairly efficient. The database has not been contacted yet.
Example of usage:
Requirement: given a userName, a startDate and an endDate, give me the unique issuingDates of all archives that are issued by this user, in ascending order
public ICollection<DateTime> GetIssuingDatesOfUserBetweenDates(string userName,
    DateTime startDate,
    DateTime endDate)
{
    using (var dbContext = new MyDbContext(...))
    {
        return dbContext.Archives
            .BetweenDates(startDate, endDate)
            .ToIssuingDatesOfUser(userName)
            .ToList();
    }
}
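To try the two extension methods without a database, they can be run against an in-memory list. This is only a sketch: AsQueryable() stands in for dbContext.Archives, the sample rows and user names are invented, and the static class name ArchiveQueries is my own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Archive
{
    public int Id { get; set; }
    public DateTime IssuingDate { get; set; }
    public string InsertedBy { get; set; }
}

public static class ArchiveQueries
{
    public static IQueryable<Archive> BetweenDates(this IQueryable<Archive> archives,
        DateTime startDate, DateTime endDate)
    {
        return archives.Where(a => startDate <= a.IssuingDate && a.IssuingDate <= endDate);
    }

    public static IQueryable<DateTime> ToIssuingDatesOfUser(this IQueryable<Archive> archives,
        string userName)
    {
        // Filter first, then project, de-duplicate and order.
        var ofUser = userName == null
            ? archives
            : archives.Where(a => a.InsertedBy == userName);
        return ofUser.Select(a => a.IssuingDate).Distinct().OrderBy(d => d);
    }
}

public static class ArchiveDemo
{
    // Hypothetical sample rows; AsQueryable() stands in for dbContext.Archives.
    public static IQueryable<Archive> SampleArchives()
    {
        return new List<Archive>
        {
            new Archive { Id = 1, IssuingDate = new DateTime(2020, 3, 1), InsertedBy = "ann" },
            new Archive { Id = 2, IssuingDate = new DateTime(2020, 1, 1), InsertedBy = "ann" },
            new Archive { Id = 3, IssuingDate = new DateTime(2020, 1, 1), InsertedBy = "ann" },
            new Archive { Id = 4, IssuingDate = new DateTime(2020, 2, 1), InsertedBy = "bob" },
            new Archive { Id = 5, IssuingDate = new DateTime(2021, 1, 1), InsertedBy = "ann" },
        }.AsQueryable();
    }

    public static void Main()
    {
        var dates = SampleArchives()
            .BetweenDates(new DateTime(2020, 1, 1), new DateTime(2020, 12, 31))
            .ToIssuingDatesOfUser("ann")
            .ToList();

        // Two distinct dates for "ann" within 2020, in ascending order.
        foreach (var d in dates)
            Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```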
I'm looking at a problem where I wish to get a collection from an expensive service call and then store it in cache so it can be used for subsequent operations on the UI. The code I'm using is as follows:
List<OrganisationVO> organisations = (List<OrganisationVO>)MemoryCache.Default["OrganisationVOs"];
List<Organisation> orgs = new List<Organisation>();
if (organisations == null)
{
    organisations = new List<OrganisationVO>();
    orgs = pmService.GetOrganisationsByName("", 0, 4000, ref totalCount);
    foreach (Organisation org in orgs)
    {
        OrganisationVO orgVO = new OrganisationVO();
        orgVO = Mapper.ToViewObject(org);
        organisations.Add(orgVO);
    }
    MemoryCache.Default.AddOrGetExisting("OrganisationVOs", organisations, DateTime.Now.AddMinutes(10));
}
List<OrganisationVO> data = new List<OrganisationVO>();
data = organisations;
if (!string.IsNullOrEmpty(filter) && filter != "*")
{
    data.RemoveAll(filterOrg => !filterOrg.DisplayName.ToLower().StartsWith(filter.ToLower()));
}
The issue I'm facing is that the data.RemoveAll operation affects the cached version. i.e. I want the cached version to always reflect the full dataset returned by the service call. I then want to retrieve this collection from cache whenever the filter is set and apply it but this should not change cached data - i.e. subsequent filters should happen on the full dataset - what is the best way to do this?
You need to make a copy of the list if you want to use the RemoveAll operation (ToList would be enough), because data and the cached list are the same object.
Also, instead of modifying the list, consider using LINQ operations like Where/Select.
I would either:
apply the filter dynamically and replace the filter if needed (so you cache the complete data but only return cachedData.Where(currentFilter)), or
make two caches, one for the complete data and one for the filtered data. In this case the first one should only consist of the data returned from the service; there is no need to cache the VO data as well.
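A minimal sketch of the first option follows. A static list of strings stands in for the cached List<OrganisationVO> (an assumption, so the example runs without MemoryCache or the service), and the names are invented. It also uses a case-insensitive StartsWith instead of the ToLower() calls in the question, which is a small substitution on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrganisationFilterDemo
{
    // Stands in for the list held in MemoryCache; the names are made up.
    private static readonly List<string> CachedNames =
        new List<string> { "Acme", "Apex", "Borealis", "Contoso" };

    public static int CachedCount => CachedNames.Count;

    // Where + ToList builds a new list, so the cached list is never modified.
    public static List<string> Filter(string filter)
    {
        if (string.IsNullOrEmpty(filter) || filter == "*")
            return CachedNames.ToList();   // a copy, so callers cannot mutate the cache

        return CachedNames
            .Where(name => name.StartsWith(filter, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(Filter("a").Count);   // 2
        Console.WriteLine(Filter("b").Count);   // 1
        Console.WriteLine(CachedCount);         // 4: the cached data is untouched
    }
}
```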
I am doing some experiments with Castle AR and the 2nd level cache of NH. In the following two methods, I can see caching working fine, but only when the same call is repeated. In other words, if I call RetrieveByPrimaryKey twice for the same PK, the object is found in the cache. And if I call RetrieveAll twice, I see SQL issued only once.
But if I call RetrieveAll and then RetrieveByPrimaryKey with some PK, I see two SQL statements being issued. My question is: why does AR not look for that entity in the cache first? Surely it would have found it there as a result of the previous call to RetrieveAll.
public static T RetrieveByPrimaryKey(Guid id)
{
    var res = default(T);
    var findCriteria = DetachedCriteria.For<T>().SetCacheable(true);
    var eqExpression = NHibernate.Criterion.Expression.Eq("Id", id);
    findCriteria.Add(eqExpression);
    var items = FindAll(findCriteria);
    if (items != null && items.Length > 0)
        res = items[0];
    return res;
}
public static T[] RetrieveAll()
{
    var findCriteria = DetachedCriteria.For<T>().SetCacheable(true);
    var res = FindAll(findCriteria);
    return res;
}
You're using caching on specific queries. That means the cache lookup is done in the following way:
search the cache for results of a query with identical syntax AND the same parameters. If found, use the cached results.
NHibernate (this has nothing to do with AR, by the way) doesn't know that, logically, one query 'contains' the other. That is why you're getting 2 DB trips.
I would suggest using ISession.Get to retrieve items by ID (it's the recommended method). I think (though I have not tested it) that Get can use items cached by other queries.
Here's a nice blog post from Ayende about it.
I use C# on WP7 (Mango). I am trying to write a special query, but I receive an error:
Method 'Int32 orderBirthday(System.DateTime)' has no supported translation to SQL.
Yes, I know... LINQ can't use my function, but I don't know the right way to do this.
I have a database table with the columns name and birthday. In my query I want to calculate how many days remain until the next birthday (for all items) and then order with "descending".
static int orderBirthday(DateTime Birthday)
{
    DateTime today = DateTime.Today;
    DateTime birthday = Birthday;
    DateTime next = new DateTime(today.Year, birthday.Month, birthday.Day);
    if (next < today)
        next = next.AddYears(1);
    int numDays = (next - today).Days;
    // No Conversion
    return numDays;
}
public void LoadCollectionsFromDatabase()
{
    DateTime today = DateTime.Today;
    var toDoItemsInDB = from ToDoItem todo in toDoDB.Items
                        let daysToBirthday = orderBirthday(todo.ItemDate)
                        orderby daysToBirthday ascending
                        select todo;
    // Query the database and load all to-do items.
    AllToDoItems = new ObservableCollection<ToDoItem>(toDoItemsInDB);
    .
    .
    .
}
You either have to pull everything from the database and sort it locally (as Enigmativity shows), or find a way to express the sort operation in the LINQ statement itself. And since you extracted the sorting behavior into its own function, you probably want to reuse this logic. In that case your best bet is to create a function that orders an IQueryable.
Here is an example of how to do this:
public static IQueryable<Item> OrderByBirthday(
    this IQueryable<Item> items)
{
    return
        from item in items
        let today = DateTime.Today
        let birthday = item.ItemDate
        let next = new DateTime(today.Year, birthday.Month, birthday.Day)
        let next2 = next < today ? next.AddYears(1) : next
        // order by the adjusted date, so birthdays already passed count towards next year
        orderby (next2 - today).Days
        select item;
}
You can use the method as follows:
var toDoItemsInDB = OrderByBirthday(toDoDB.Items);
Or you can use it as an extension method:
var toDoItemsInDB = toDoDB.Items.OrderByBirthday();
It's easy if you do this:
var toDoItemsInDB = from ToDoItem todo in toDoDB.Items.ToArray()
                    let daysToBirthday = orderBirthday(todo.ItemDate)
                    orderby daysToBirthday ascending
                    select todo;
Notice the .ToArray() added to Items. You basically bring the results into memory, and then your function can work.
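Once the items are in memory, any ordinary C# method can be used in the ordering. The sketch below is a variant of orderBirthday that takes today as a parameter so the result is repeatable; the sample dates are invented. It also clamps 29 February to 28 February in non-leap years, which is my own assumption: the original method would throw on new DateTime(year, 2, 29) in such a year.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class BirthdayOrdering
{
    // Days from 'today' until the next occurrence of 'birthday'.
    // Assumption: a 29 February birthday is observed on 28 February in non-leap years.
    public static int DaysToNextBirthday(DateTime birthday, DateTime today)
    {
        var next = InYear(birthday, today.Year);
        if (next < today)
            next = InYear(birthday, today.Year + 1);
        return (next - today).Days;
    }

    // The birthday's month and day in the given year, clamped to the month's length.
    private static DateTime InYear(DateTime birthday, int year)
    {
        var day = Math.Min(birthday.Day, DateTime.DaysInMonth(year, birthday.Month));
        return new DateTime(year, birthday.Month, day);
    }

    public static void Main()
    {
        var today = new DateTime(2023, 6, 15);   // fixed date so the output is repeatable
        var birthdays = new List<DateTime>
        {
            new DateTime(1990, 6, 10),   // already passed this year
            new DateTime(1985, 6, 20),   // five days away
            new DateTime(1992, 2, 29),   // leap-day birthday
        };

        // The list is in memory, so a plain C# method works inside OrderBy.
        foreach (var b in birthdays.OrderBy(d => DaysToNextBirthday(d, today)))
        {
            Console.WriteLine(b.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                + ": " + DaysToNextBirthday(b, today));
        }
    }
}
```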
Two ways:
One: Pull it from Linq2SQL to Linq2Objects using AsEnumerable(), and then use orderBirthday at the C# level.
The advantage is that it's simple to code and maintain; the disadvantage is that it can be less efficient (depending on just what you are doing).
Two: Write an equivalent function in SQL, let's say it is called dbo.orderBirthday. Make your orderBirthday method a non-static method of your DataContext-derived class, and then mark your method as having a SQL function equivalent:
// IsComposable is true for functions that can be used within queries,
// false for stored procedures that must be called on their own.
[Function(Name="dbo.orderBirthday",IsComposable=true)]
public int OrderBirthday([Parameter(Name="@birthday",DbType="datetime")] DateTime birthday)
{
    // Just to show that we can keep the static version around if we want and call into it.
    // Alternatively we could just move the whole body here.
    return Helper.OrderBirthday(birthday);
}
Here the C# code is used in a non-Linq2SQL context, and the SQL code is used in composing a SQL query in a Linq2SQL context.
Advantage: Can stay within SQL longer. Disadvantage: Two versions of the same method can fall out of sync and cause bugs.
It's also possible to have the C# code call the SQL code all the time:
[Function(Name="dbo.orderBirthday",IsComposable=true)]
public int OrderBirthday([Parameter(Name="@birthday",DbType="datetime")] DateTime birthday)
{
    return (int)ExecuteMethodCall(this, (MethodInfo)MethodInfo.GetCurrentMethod(), birthday).ReturnValue;
}
Advantage: Keeps one version (the SQL) as the only version, so it can't fall out of sync with the C# version. Disadvantage: Calls SQL even when working on objects that have nothing to do with SQL.
If you don't want to load all the items into memory and you want the database to execute the calculation, you can write a stored procedure that performs the complex calculation and call the procedure using ADO or EF.