Raven query only works sometimes


We've been having a problem where the following call, which queries a RavenDB database, works only about 90% of the time:
member.UserId = userService.GivenUsernameGetUserId(command.EmailAddress.ToLower());
To counteract this, I made this ugly workaround, which does seem to have fixed the problem:
member.UserId = userService.GivenUsernameGetUserId(command.EmailAddress.ToLower());
System.Threading.Thread.Sleep(1000);
if (member.UserId.IsNullOrEmpty())
{
    logger.Error("the userid was not loaded in time");
    for (int i = 0; i < 5; i++)
    {
        member.UserId = userService.GivenUsernameGetUserId(command.EmailAddress.ToLower());
        System.Threading.Thread.Sleep(1000);
        if (member.UserId.IsNotNullOrEmpty())
        {
            logger.Info("The userid was retrieved in a loop after some delay ");
            break;
        }
    }
    if (member.UserId.IsNullOrEmpty())
    {
        logger.Error("a loop of 5 cycles was run trying to retrieve the userId but couldn't get it.");
    }
}
Can anyone see why it might only be retrieving the correct data sometimes, and whether there's a more elegant way to make sure it keeps trying until it retrieves the data? I'm wondering whether there's some basic timeout setting that can be set in web.config or something similar.

The issue is likely stale indexes: the user has been recently created, and the indexes haven't had a chance to update. (Usually this takes milliseconds, but on a large database it can take longer.)
There are 3 things you can do here to fix your problem:
Option 1: Make User Ids based on email address. Then you don't have to mess with indexes at all.
Option 2: You can leave user IDs as-is, but wait for non-stale indexes.
Option 3: When you create a user, wait for indexes to update.
I'll describe each of these options below:
Option 1:
Make your user IDs well-known, so that you don't have to use indexes at all.
Say your object is called User. When you register the User, your code will look like:
public void RegisterUser(string emailAddress)
{
    var user = new User
    {
        UserName = emailAddress,
        ...
    };
    // Give the User a well-known ID, so that we don't have to mess with indexes later.
    user.Id = "Users/" + emailAddress;
    ravenSession.Store(user);
}
If you do that, you won't have to mess with indexes at all. When it comes time to load your user:
public string GivenUsernameGetUserId(string userName)
{
    // Look ma, no query needed.
    return "Users/" + userName;
    // Or, need to return the User itself? You can use .Load, which will never be stale.
    // return ravenSession.Load<User>("Users/" + userName);
}
This is really your best option: you never have to deal with indexes, and therefore you'll never have to deal with stale data.
Option 2
Option 2 is to use one of the .WaitForNonStaleResults customizations, here .WaitForNonStaleResultsAsOfNow. It waits for the indexes to become up-to-date before returning results.
public string GivenUsernameGetUserId(string userName)
{
    // Use .WaitForNonStaleResultsAsOfNow()
    return ravenSession.Query<User>()
        .Customize(x => x.WaitForNonStaleResultsAsOfNow())
        .Where(u => u.UserName == userName)
        .Select(u => u.Id)
        .FirstOrDefault();
}
Option 3
Option 3 is to wait for indexes to update when saving your user.
This requires Raven 3.5 or greater.
public void RegisterUser(string userName)
{
    ravenSession.Advanced.WaitForIndexesAfterSaveChanges(timeout: TimeSpan.FromSeconds(30));
    var user = new User {...};
    ravenSession.Store(user);
    ravenSession.SaveChanges(); // This won't return until the User is stored *AND* the indexes are updated.
}
Personally, I'd recommend using #1: well-known IDs for your users. Also, I recommend #3 even if you implement other solutions: SaveChanges will wait for the indexes to be updated before returning. This will result in fewer surprises around stale indexes, so I recommend it generally.

I am using RavenDB 3.5 with Option 3 from the answer above.
It does work, but I ran into a problem with this approach:
In one specific use case, the operation took about 60 seconds to complete.
I wouldn't recommend making general use of WaitForIndexesAfterSaveChanges(), as it can lead to severe performance issues.
Instead, I'd configure the queries to use WaitForNonStaleResultsAsOfNow().

Related

Methods executing out of order when using Linq Select

I have a series of methods that do some insertions into the database in a pretty set order. However, I'm running into an issue where one of the methods is firing off before the other method is called, unless I put a breakpoint in.
My initial thought was that due to me passing around IEnumerables, the method wasn't firing off until the Controller was returning the object, but after converting everything to lists, the error still occurs.
My code looks something like the following:
// controller
public IActionResult CreateConversation([FromBody] CreateConvoRequest model)
{
    var userId = ViewBag.UserData.UserID;
    model.message.UserFrom = userId;
    model.users.Add(userId);
    var result = MessagingLogic.CreateConversation(model.conversation, model.message, model.users);
    return Json(result);
}
// Logic
public static Conversation CreateConversation(Conversation conversation, ConversationMessage message,
    List<int> users)
{
    conversation.DateCreated = DateTime.Now;
    conversation.LastUpdated = DateTime.Now;
    var convo = SaveNewConversation(conversation, users);
    message.ConversationId = convo.ConversationId;
    SendMessage(message);
    return convo;
}
public static Conversation SaveNewConversation(Conversation conversation, List<int> users)
{
    conversation = SaveConversation(conversation);
    conversation.Users = users.Select(n => CreateConversationUser(conversation.ConversationId, n)).ToList();
    // the above linq executes some SQL insertions and returns the new object
    return conversation;
}
public static ConversationMessage SendMessage(ConversationMessage message)
{
    if (message.CloseConversation) CloseConversation(message.ConversationId);
    return SaveMessage(message);
}
What appears to be happening is that SendMessage is being called before CreateConversationUser. This in turn causes my messages not to be saved for the users, as the users aren't saved into the database until after the SendMessage method is called.
However, if I put a breakpoint on SaveNewConversation method, everything works as intended.
Edit
So after some more tinkering, adding the ToList to my LINQ Select corrected the issue with it executing out of order. However, my SendMessage SQL statement uses an INSERT INTO ... SELECT. It appears that the users often haven't been inserted into the database by the time the SendMessage SQL executes.
For example, 2 of my users were inserted at 2017-09-20 10:29:35.820 and 2017-09-20 10:29:35.823 respectively. However, the message was inserted at 2017-09-20 10:29:35.810.
It appears something odd is happening on the SQL server.
Edit 2
Further testing is putting the entirety of the blame on SQL Server. I ran the SQL Profiler tool and the calls are coming in, in the correct order. However, there is only roughly a 3 millisecond delay between the last user being inserted and the messages being inserted.
If I place a sleep statement between the user insert and the message insert of a couple hundred milliseconds, it works as intended. I'm not quite sure how to address this issue, but it no longer appears to be executing out of order.

Parallel.ForEach Source List with Where Condition

I have a code block which processes StoreProducts and then adds or updates them in the database in a foreach loop. But this is slow. When I convert the code to a Parallel.ForEach block, the same products get both added and updated at the same time. I could not figure out how to safely use Parallel.ForEach for the following functionality; any help would be appreciated.
var validProducts = storeProducts.Where(p => p.Price2 > 0
    && !string.IsNullOrEmpty(p.ProductAtt08Desc.Trim())
    && !string.IsNullOrEmpty(p.Barcode.Trim())
    ).ToList();
var processedProductCodes = new List<string>();
var po = new ParallelOptions()
{
    MaxDegreeOfParallelism = 4
};
Parallel.ForEach(validProducts.Where(p => !processedProductCodes.Contains(p.ProductCode)), po,
    (product) =>
    {
        lock (_lockThis)
        {
            processedProductCodes.Add(product.ProductCode);
        }
        // Check if Product Exists in Db
        // if product is not in Db Add to Db
        // if product is in Db Update product in Db
    });
The thing here is that the list validProducts may contain several entries with the same ProductCode (they are variants), and I have to make sure that once one of them has been processed, the others are not processed again.
So the Where condition passed to the parallel foreach, 'validProducts.Where(p => !processedProductCodes.Contains(p.ProductCode))', does not work as expected, the way it does in a normal foreach.

The bulk of my answer is less-so an answer to your question and more some guidance - if you were to provide some more technical details, I may be able to assist more precisely.
A Parallel.ForEach is probably not the best solution here -- especially when you have a shared list or a busy server.
You are locking to write but not to read from that shared list. So I'm surprised it's not throwing during the Where. Turn the List<string> into a ConcurrentDictionary<string, bool> (just to create a simple concurrent hash table) then you'll get better write throughput and it won't throw during reads.
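A minimal sketch of that suggestion, assuming a hypothetical Product class standing in for the question's StoreProduct and leaving the database work as a comment. The dictionary's bool value is never read; TryAdd is used only as an atomic "claim this code" operation, so no explicit lock is needed:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DedupeSketch
{
    // Hypothetical stand-in for the StoreProduct type in the question.
    public sealed class Product
    {
        public string ProductCode { get; set; }
    }

    // Returns the product codes that were claimed, one per distinct code.
    public static ICollection<string> ProcessDistinct(IEnumerable<Product> products)
    {
        // The bool value is unused; the dictionary only serves as a concurrent set.
        var processed = new ConcurrentDictionary<string, bool>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(products, options, product =>
        {
            // TryAdd is atomic: exactly one thread succeeds for each code.
            if (!processed.TryAdd(product.ProductCode, true))
            {
                return; // another variant with this code was already claimed
            }

            // The add-or-update database work for this product would go here.
        });

        return processed.Keys;
    }

    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { ProductCode = "A" },
            new Product { ProductCode = "A" },
            new Product { ProductCode = "B" },
        };
        Console.WriteLine(ProcessDistinct(products).Count); // 2
    }
}
```

Note that the filter has moved from the source sequence into the loop body, so the check and the claim happen in one step.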
But you're going to have database contention issues (if using multiple connections) because your insert will likely still require locks. Even if you simply split the workload you would run into this. This DB locking could cause blocks/deadlocks so it may end up slower than the original. If using one connection, you generally cannot parallelize commands.
I would try wrapping the majority of inserts in a transaction containing batches of say 1000 inserts or place the entire workload into one bulk insert. Then the database will keep the data in-memory and commit the entire thing to disk when finished (instead of one record at a time).
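As a rough illustration of the batching idea, the helper below splits a sequence into groups of at most 1000 items. InBatches is a hypothetical name, and the transaction and insert calls are left as a comment because they depend on your data access layer:

```csharp
using System;
using System.Collections.Generic;

public static class BatchSketch
{
    // Splits items into consecutive batches of at most batchSize elements.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }

    public static void Main()
    {
        var rows = new List<int>();
        for (int i = 0; i < 2500; i++) rows.Add(i);

        foreach (var batch in InBatches(rows, 1000))
        {
            // Hypothetical: begin a transaction, insert every row in the batch, commit.
            Console.WriteLine(batch.Count); // 1000, 1000, 500
        }
    }
}
```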
Depending on your typical workload, you may want to try different storage solutions. Databases are generally bad for inserting large volumes of records... you will likely see much better performance with alternative solutions (such as Key-Value stores). Or place the data into something like Redis and slowly persist to the database in the background.

Parallel.ForEach buffers items internally for each thread. One option is to switch to a partitioner that does not use buffering:
var pat = Partitioner.Create(validProducts.Where(p => !processedProductCodes.Contains(p.ProductCode))
,EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(pat, po, (product) => ...
That will get you closer, but you will still have a race condition where two of the same object can be processed, because you don't break out of the loop if you find a duplicate.
The better option is to switch processedProductCodes to a HashSet<string> and change your code to:
var processedProductCodes = new HashSet<string>();
var po = new ParallelOptions()
{
    MaxDegreeOfParallelism = 4
};
Parallel.ForEach(validProducts, po,
    (product) =>
    {
        //You can safely lock on processedProductCodes
        lock (processedProductCodes)
        {
            if(!processedProductCodes.Add(product.ProductCode))
            {
                //Add returns false if the code is already in the collection.
                return;
            }
        }
        // Check if Product Exists in Db
        // if product is not in Db Add to Db
        // if product is in Db Update product in Db
    });
HashSet has a much faster lookup than List, and the lookup is built into the Add function.

Updating a field through workflows, better approach?

I have been asked to create a view that includes entities that fit within a date range. So if the entity's new_date1 field is lesser than today, and its new_date2 field is greater than today, the entity should appear in the subgrid on the form.
Unfortunately, you can't do this with simple views as FetchXML doesn't support calculations and operators that could return today's date.
I have come up with the idea of creating an Active field on the entity, then have javascript rules set that field depending on the date range entered.
A view could then use the Active field for a filter criteria.
The problem is that if the entity's form is not opened in a while, the entity might become inactive (today's date is now beyond both date1 and date2 for example) but if the users are not opening the entity's form, the field won't update itself and the view will show inactive entities as active ones.
So I thought of having a scheduled workflow gather all entities that should be active or inactive; this workflow then launches child workflows that set the Active flag to yes or no.
Here's a bit of the code involved:
private void LaunchUpdateOpportunityWorkflow(IOrganizationService service, ITracingService tracingService, DataCollection<Entity> collection, bool active)
{
    foreach (Entity entity in collection)
    {
        //launch a different workflow, depending on whether we want it active or inactive...
        Guid wfId = (active) ? setActiveWorkflowId : setInactiveWorkflowId;
        ExecuteWorkflowRequest execRequest = new ExecuteWorkflowRequest();
        execRequest.WorkflowId = wfId;
        execRequest.EntityId = (Guid)entity["opportunityid"];
        try
        {
            CrmServiceExtensions.ExecuteWithRetry<ExecuteWorkflowResponse>(service, execRequest);
        }
        catch (Exception ex)
        {
            tracingService.Trace(string.Format("Error executing workflow for opportunity {0}: {1}", entity["opportunityid"], ex.Message));
        }
    }
}
The process of gathering the relevant DataCollection is done through simple RetrieveMultipleRequest requests.
The problem with that approach is that if the server reboots, someone has to go and start the workflow that runs the code above.
Is there a better approach to this? I am using MS CRM 2016.

Adding to Jame's answer: if the filter criteria get so complicated that they cannot be expressed in FetchXML, you can always use a plugin.
Register a plugin on "RetrieveMultiple" message.
var queryExpression = PluginExecutionContext.InputParameters["Query"] as QueryExpression;
if (queryExpression == null || !queryExpression.EntityName.Equals("yourentityname", StringComparison.InvariantCultureIgnoreCase)) return;
Add a condition that is unique to the advanced find, because there is no other way to tell which advanced find is triggering the plugin on your entity. The easiest way to achieve this is to add an attribute and use it in the advanced find query.
Check for the condition. If it is found, the user is trying to run the advanced find you have set up:
if (queryExpression.Criteria == null || queryExpression.Criteria.Conditions == null ||
!queryExpression.Criteria.Conditions.Any()) return;
Find the matching condition, so you can remove it and add conditions you'd like to filter the data by:
var matchCondition = queryExpression.Criteria.Conditions.FirstOrDefault(c => c.AttributeName == "yourflagattribute");
if (matchCondition == null) return;
Remove the dummy match condition and add your own criteria:
queryExpression.Criteria.Conditions.Remove(matchCondition);
queryExpression.Criteria.Conditions.Add(new ConditionExpression("new_date1", ConditionOperator.LessThan, DateTime.Now));
queryExpression.Criteria.Conditions.Add(new ConditionExpression("new_field2", ConditionOperator.Equal, "Some complex value which cannot be set using fetchxml")); //for example, based on certain values, you might want to call a webservice to get the filter value.

I think you can probably achieve this with FetchXML.
new_date1 field is lesser than today
This is older than 24 hours.
new_date2 field is greater than today
This is any date in the future, so assuming your dates are no further than 100 years in the future you can use Next X Years.
As Darren Lewis pointed out, older than 24 hours might not be yesterday depending on your definition of yesterday. In which case try using Last X Years.

Entity Framework COUNT is doing a SELECT of all records

I profiled my code because it was taking a long time to execute. It turns out it is generating a SELECT instead of a COUNT, and as there are 20,000 records it is very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.

So you have 2 completely separate queries going on here and I think I can explain why you get different results. Let's look at the first one
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account, and assuming you are leveraging the framework's ability to lazily load related tables, your first call to catAccount.Cats is going to result in a SELECT for all the Cat records associated with that particular account. This brings those rows into memory, so the call to Count() results in an internal check of the Count property of the in-memory collection (hence no COUNT SQL is generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
Is directly against the Cats table (which would be IQueryable<T>) therefore the only operations performed against the table are Where/Count, and both of these will be evaluated on the DB-side before execution so it's obviously a lot more efficient than the first.
However, if you need both Account and Cats then I would recommend you eager load the data on the fetch, that way you take the hit upfront once
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);

Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local in-memory collection. The problem is that you don't want that. Now you have three options:
check whether your provider offers some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

Check whether nickname already exists in List<User>

I am creating an ASP website with a possibility to register.
The nickname that visitors choose to register has to be unique.
Every time a user registers, I select all users from the database, and then I use a foreach loop to check whether the username already exists:
private List<User> listExistingUsers;
listExistingUsers = Database.GetUsers();
foreach (User u in listExistingUsers)
{
    if (u.Nickname == txtNickname.Text)
    {
        Error = "Username already in use.";
    }
}
But the code above doesn't work properly. It doesn't check all the items in the list which are read from the database. So it is possible to have users with the same usernames, which I don't want.
What can I do to solve this problem? I read about LINQ, but I think that this is the wrong way of checking usernames with List<> in my opinion. I think this username-check must be done in another way.
Can you experts help me? I could also do this check with a SQL-query, but I would like to do it in c#.

Instead of returning ALL users from DB, pass username to Query/stored procedure and let backend do the check, and then return back just a status flag 1/0 - exists/doesn't.

if (Database.GetUsers().Select(x => x.Nickname).Contains(txtNickname.Text)) should do what you want.
I've condensed everything into a single line, so I'll give a quick explanation. First I use your Database.GetUsers() method to retrieve the users, then I use Select to project the Nickname, since that's what we're comparing. If that were to execute on its own, it would result in an IEnumerable<string> with all of the nicknames. From there I use Contains to see if that list contains the nickname that (I'm assuming) has been entered in the UI.

You can use the Contains operator to check:
listExistingUsers.Select(x => x.Nickname).Contains(txtNickname.Text);
link : http://msdn.microsoft.com/fr-fr/library/bhkz42b3%28v=vs.80%29.aspx
Remark: You can also use Any or Count (Count being the most expensive solution).

Use the Any operator. It checks whether any element of a sequence satisfies some condition. In your case the condition is that the user's nickname equals the text in the text box:
if (Database.GetUsers().Any(u => u.Nickname == txtNickname.Text))
Error = "Username already in use.";
BTW if you change GetUsers to return IQueryable<User> then check will occur on server side.

Get the list of nicknames once:
var nickNames = new List<string>();
for (int i = 0; i < listExistingUsers.Count; i++)
{
    nickNames.Add(listExistingUsers[i].Nickname);
}
Then you can simply use:
if (nickNames.Contains(txtNickname.Text))
{
    Error = "Username already in use.";
}

1) Have you verified that Database.GetUsers() is actually returning the full list, with no SQL issues?
2) Do you need it to be case-insensitive?
3) You can use the LINQ to do the query like this:
if (listExistingUsers.Any(u => string.Equals(u.Nickname, txtNickname.Text, StringComparison.CurrentCultureIgnoreCase)))
{
// Process error
}

If Database.GetUsers() returns all the users from the database, do not use it! Imagine you already have 1000 users: for each new user it will load all of them, and you will have performance issues.
Instead, create a new method on Database that searches the database and returns only whether a match exists.
Something like:
public static bool UserExists(string nickname)
{
    // Your query to the database, with a WHERE clause looking for the nickname. It could be a LINQ query, or any other way you use in your system.
    // If it brings back 1 result, it has to return true; otherwise false.
    throw new NotImplementedException(); // placeholder until the query is written
}

I think the trickiest part of your task is filling the database correctly.
In particular:
Cut off leading and trailing spaces
Decide whether user names should be case sensitive
Make sure that when creating a new user name you do not already have the nick
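The three points above can be sketched as follows. This is only an illustration: the in-memory HashSet stands in for the database, Normalize and TryRegister are hypothetical names, and case-insensitive matching is an assumption you can change by picking a different comparer.

```csharp
using System;
using System.Collections.Generic;

public static class NicknameSketch
{
    // Trims surrounding whitespace so " Bob " and "Bob" are the same nickname.
    public static string Normalize(string nickname)
    {
        return (nickname ?? string.Empty).Trim();
    }

    // Returns true if the nickname was free and is now registered.
    public static bool TryRegister(HashSet<string> taken, string nickname)
    {
        var normalized = Normalize(nickname);
        if (normalized.Length == 0) return false; // reject empty nicknames
        // Add returns false when an equal entry (per the set's comparer) already exists.
        return taken.Add(normalized);
    }

    public static void Main()
    {
        // The comparer decides case sensitivity; here nicknames are case-insensitive.
        var taken = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(TryRegister(taken, "Bob"));   // True
        Console.WriteLine(TryRegister(taken, " bob ")); // False
        Console.WriteLine(TryRegister(taken, "Alice")); // True
    }
}
```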
About loading users and checking:
As mentioned above, LINQ is the most effective, C#-like way of checking for duplicates
(if (Database.GetUsers().Select(x => x.Nickname).Contains(txtNickname.Text)))
I am more used to writing SQL statements than using LINQ. If you've got lots of users SQL will read only the selected ones but I don't know if the LINQ statement above pulls all users into the memory pool or just the one(s) with the same nickname.
