On a single-instance MongoDB server, even with the write concern on the client set to journaled, one in every couple of thousand documents isn't replaceable immediately after inserting.
I was under the impression that once journaled, documents are immediately available for querying.
The code below inserts a document, then updates the DateModified property of the document and tries to update the document based on the document's Id and the old value of that property.
public class MyDocument
{
public BsonObjectId Id { get; set; }
public DateTime DateModified { get; set; }
}
static void Main(string[] args)
{
var r = Task.Run(MainAsync);
Console.WriteLine("Inserting documents... Press any key to exit.");
Console.ReadKey(intercept: true);
}
private static async Task MainAsync()
{
var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("updateInsertedDocuments");
var concern = new WriteConcern(journal: true);
var collection = database.GetCollection<MyDocument>("docs").WithWriteConcern(concern);
int errorCount = 0;
int totalCount = 0;
do
{
totalCount++;
// Create and insert the document
var document = new MyDocument
{
DateModified = DateTime.Now,
};
await collection.InsertOneAsync(document);
// Save and update the modified date
var oldDateModified = document.DateModified;
document.DateModified = DateTime.Now;
// Try to update the document by Id and the earlier DateModified
var result = await collection.ReplaceOneAsync(d => d.Id == document.Id && d.DateModified == oldDateModified, document);
if (result.ModifiedCount == 0)
{
Console.WriteLine($"Error {++errorCount}/{totalCount}: doc {document.Id} did not have DateModified {oldDateModified.ToString("yyyy-MM-dd HH:mm:ss.ffffff")}");
await DoesItExist(collection, document, oldDateModified);
}
}
while (true);
}
The code inserts at a rate of around 250 documents per second. Roughly one in 1,000-15,000 calls to ReplaceOneAsync(d => d.Id == document.Id && d.DateModified == oldDateModified, ...) fails, returning a ModifiedCount of 0. The failure rate depends on whether we run a Debug or Release build and whether a debugger is attached: more speed means more errors.
The code shown represents something that I can't really easily change. Of course I'd rather perform a series of Update.Set() calls, but that's not really an option right now. The InsertOneAsync() followed by a ReplaceOneAsync() is abstracted by some kind of repository pattern that updates entities by reference. The non-async counterparts of the methods display the same behavior.
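For reference, a minimal sketch of the Update.Set() approach mentioned above, using the driver's Builders API (an illustration, not a drop-in replacement for the repository pattern):
var update = Builders<MyDocument>.Update.Set(d => d.DateModified, DateTime.Now);
// UpdateOneAsync only touches the named field instead of replacing the whole document.
var result = await collection.UpdateOneAsync(
    d => d.Id == document.Id && d.DateModified == oldDateModified,
    update);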
A simple Thread.Sleep(100) between inserting and replacing mitigates the problem.
When the query fails, and we wait a while and then attempt to query the document again in the code below, it'll be found every time.
private static async Task DoesItExist(IMongoCollection<MyDocument> collection, MyDocument document, DateTime oldDateModified)
{
Thread.Sleep(500);
var fromDatabaseCursor = await collection.FindAsync(d => d.Id == document.Id && d.DateModified == oldDateModified);
var fromDatabaseDoc = await fromDatabaseCursor.FirstOrDefaultAsync();
if (fromDatabaseDoc != null)
{
Console.WriteLine("But it was found!");
}
else
{
Console.WriteLine("And wasn't found!");
}
}
Versions on which this occurs:
MongoDB Community Server 3.4.0, 3.4.1, 3.4.3, 3.4.4 and 3.4.10, all on WiredTiger storage engine
Server runs on Windows, other OSes as well
C# Mongo Driver 2.3.0 and 2.4.4
Is this an issue in MongoDB, or are we doing (or assuming) something wrong?
Or, the actual end goal, how can I ensure an insert is immediately retrievable by an update?
ReplaceOneAsync returns a ModifiedCount of 0 if the new document is identical to the old one (because nothing changed).
It looks to me like, if your test executes fast enough, consecutive calls to DateTime.Now can return the same value, so it is possible that you are passing the exact same document to InsertOneAsync and ReplaceOneAsync.
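A quick way to verify this theory is to compare MatchedCount with ModifiedCount on the ReplaceOneResult (both are available in driver 2.x when the write is acknowledged): if the filter matched but nothing was modified, the replacement document was identical.
var result = await collection.ReplaceOneAsync(
    d => d.Id == document.Id && d.DateModified == oldDateModified,
    document);
// MatchedCount tells us the filter found the document;
// ModifiedCount stays 0 when the replacement document is identical to it.
if (result.MatchedCount == 1 && result.ModifiedCount == 0)
{
    Console.WriteLine("Matched but not modified: DateTime.Now returned the same value twice.");
}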
Related
I'm trying to update a string field in a specific document using Firebase Firestore for my Android app, but every method I see requires knowing the document reference, which I find difficult to obtain in my program.
I'd appreciate help with another method, or with finding the document reference using a specific field value.
Thanks in advance.
(Using C#, by the way.)
private async Task<string> GetDocRefAsync(string userId)
{
Object obj = await FirestoreData.GetFirestore().Collection(DBConstants.FS_COLLECTION_USERS).
WhereEqualTo(DBConstants.FS_COLLECTION_USERS_USER_ID, userId).Get();
QuerySnapshot snapshot = (QuerySnapshot)obj;
if (snapshot.IsEmpty)
{
Log.Debug("UpdateGroupAsync", "userId: " + userId + " not found");
return null;
}
string docRef = "";
foreach (DocumentSnapshot item in snapshot.Documents)
{
//docRef = item.;
}
return docRef;
}
Firstly, I've tried to find the document reference using this code, but there's no function to get the reference even after getting the correct document.
The fourth line from the bottom is where I couldn't find it.
this.groupCode = code;
string strUserRef = GetDocRefAsync(userRef).ToString();
DocumentReference docReference = database.Collection(DBConstants.FS_COLLECTION_USERS_GROUPS_CODE).Document(strUserRef);
docReference.Update(DBConstants.FS_COLLECTION_USERS_GROUPS_CODE, groupCode);
If you want to get the documents where a field has a given value, you can use a query. Once the query returns, you can get the document IDs with the .Id property on each DocumentSnapshot in the returned documents.
You will also need to await the returned value, since it is an async method returning a Task<string>, not a string.
private async Task<string> GetDocRefAsync(string userId) {
CollectionReference usersRef = FirestoreData.GetFirestore().Collection(DBConstants.FS_COLLECTION_USERS);
Query query = usersRef.WhereEqualTo(DBConstants.FS_COLLECTION_USERS_USER_ID, userId);
// or GetSnapshotAsync depending on the version of firebase
QuerySnapshot querySnapshot = await query.Get();
// Note: if it matches multiple documents this will just return
// the ID of the first match
foreach (DocumentSnapshot item in querySnapshot.Documents)
{
return item.Id;
}
Log.Debug("UpdateGroupAsync", "userId: " + userId + " not found");
return null;
}
And you can use it like this to update a document (note that you were using a different collection here - probably by mistake).
string userDocId = await GetDocRefAsync(userId);
CollectionReference userCollection = database.Collection(DBConstants.FS_COLLECTION_USERS);
DocumentReference docReference = userCollection.Document(userDocId);
// or UpdateAsync depending on your version of firebase
docReference.Update(DBConstants.FS_COLLECTION_USERS_GROUPS_CODE, groupCode);
Since collection.InsertOne(document) returns void, how do I know that the document was written to the database for sure? I have a function which needs to run exactly after the document is written to the database.
How can I check that without running a new query?
"Since collection.InsertOne(document) returns void" - is wrong, see db.collection.insertOne():
Returns: A document containing:
A boolean acknowledged as true if the operation ran with write concern or false if write concern was disabled.
A field insertedId with the _id value of the inserted document.
So, run
ret = db.collection.insertOne({your document})
print(ret.acknowledged);
or
print(ret.insertedId);
to directly get the _id of the inserted document.
The write concern can be configured on either the connection string or the MongoClientSettings which are both passed in to the MongoClient object on creation.
var client = new MongoClient(new MongoClientSettings
{
WriteConcern = WriteConcern.W1
});
More information on write concern can be found on the MongoDB documentation - https://docs.mongodb.com/manual/reference/write-concern/
If the document is not saved, the C# driver will throw an exception (MongoWriteException).
Also, with a write concern of Acknowledged or higher, you'll get back the Id of the document you've just saved.
var client = new MongoClient(new MongoClientSettings
{
WriteConcern = WriteConcern.W1
});
var db = client.GetDatabase("test");
var orders = db.GetCollection<Order>("orders");
var newOrder = new Order {Name = $"Order-{Guid.NewGuid()}"};
await orders.InsertOneAsync(newOrder);
Console.WriteLine($"Order Id: {newOrder.Id}");
// Output
// Order Id: 5f058d599f1f033f3507c368
public class Order
{
public ObjectId Id { get; set; }
public string Name { get; set; }
}
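For the original requirement of running a function only after a confirmed write, a minimal sketch under the same acknowledged write concern (RunAfterInsert is a hypothetical callback, not a driver API):
try
{
    await orders.InsertOneAsync(newOrder);
    // Reaching this line means the server acknowledged the write.
    RunAfterInsert(newOrder.Id);
}
catch (MongoWriteException ex)
{
    Console.WriteLine($"Insert failed: {ex.Message}");
}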
I am using Parse in a mobile application I am working on in Xamarin/C#. I am trying to query a table by the updatedAt field so that I can reduce the number of data calls made by my application. I am querying my Parse DB with the most recent "updatedAt" date in my local SQLite DB. The only issue is that Parse is returning all items within that table. Here is my function:-
public static async Task getNewLiveTips(Action<bool> callback)
{
DateTime lastDate = App.Current.modelManager.GetOnlyLocalTip().updatedAt;
if (lastDate != null) {
var query = new ParseQuery<ParseTip>();
query.WhereGreaterThanOrEqualTo("updatedAt", lastDate);
IEnumerable<ParseTip> parseTips = await query.FindAsync();
foreach (var tip in parseTips)
{
Log.Debug(TAG, "Adding new updated live tip item");
App.Current.modelManager.SaveLocalTip(ModelUtils.parseToLocalTip((ParseTip)tip));
}
}
callback(true);
}
I don't do any manipulation of dates anywhere so the date from my local SQLite DB looks like this:-
06/09/2016 12:50:02
The dates returned are:-
06/09/2016 15:14:23
17/08/2016 21:12:31
As you can see, one of the dates is more recent and one of the dates is older. Can anyone spot my issue?
Thanks
I didn't manage to figure out why this function didn't work, but I managed to get the same results by doing:-
private static async Task getAllLiveTips()
{
var query = new ParseQuery<ParseTip>().OrderByDescending("updatedAt").Limit(5);
IEnumerable<ParseTip> parseTips = await query.FindAsync();
if (parseTips != null)
{
foreach (var liveTip in parseTips)
{
Log.Debug(TAG, "Adding live tip item");
App.Current.modelManager.SaveLocalTip(ModelUtils.parseToLocalTip(liveTip));
}
}
}
We have an email queue table in the database. It holds the subject, HTML body, to address, from address etc.
In Global.asax, the Process() function is called every interval; it dispatches a set number of emails. Here's the code:
namespace v2.Email.Queue
{
public class Settings
{
// How often process() should be called in seconds
public const int PROCESS_BATCH_EVERY_SECONDS = 1;
// How many emails should be sent in each batch. Consult SES send rates.
public const int EMAILS_PER_BATCH = 20;
}
public class Functions
{
private static Object QueueLock = new Object();
/// <summary>
/// Process the queue
/// </summary>
public static void Process()
{
lock (QueueLock)
{
using (var db = new MainContext())
{
var emails = db.v2EmailQueues.OrderBy(c => c.ID).Take(Settings.EMAILS_PER_BATCH);
foreach (var email in emails)
{
var sent = Amazon.Emailer.SendEmail(email.FromAddress, email.ToAddress, email.Subject,
email.HTML);
if (sent)
db.ExecuteCommand("DELETE FROM v2EmailQueue WHERE ID = " + email.ID);
else
db.ExecuteCommand("UPDATE v2EmailQueue Set FailCount = FailCount + 1 WHERE ID = " + email.ID);
}
}
}
}
}
}
The problem is that every now and then it's sending one email twice.
Is there any reason from the code above that could explain this double sending?
A small test, as per Matthew's suggestion:
const int testRecordID = 8296;
using (var db = new MainContext())
{
context.Response.Write(db.tblLogs.SingleOrDefault(c => c.ID == testRecordID) == null ? "Not Found\n\n" : "Found\n\n");
db.ExecuteCommand("DELETE FROM tblLogs WHERE ID = " + testRecordID);
context.Response.Write(db.tblLogs.SingleOrDefault(c => c.ID == testRecordID) == null ? "Not Found\n\n" : "Found\n\n");
}
using (var db = new MainContext())
{
context.Response.Write(db.tblLogs.SingleOrDefault(c => c.ID == testRecordID) == null ? "Not Found\n\n" : "Found\n\n");
}
Returns when there is a record:
Found
Found
Not Found
If I use this method to clear the context cache after the delete sql query it returns:
Found
Not Found
Not Found
However, I'm still not sure whether it's the root cause of the problem. I would have thought the locking would definitely stop double sends.
The issue that you're having is due to the way Entity Framework does its internal caching.
In order to increase performance, Entity Framework will cache entities to avoid doing a database hit.
Entity Framework will update its cache when you are doing certain operations on DbSet.
Entity Framework does not understand that your "DELETE FROM ... WHERE ..." statement should invalidate the cache because EF is not an SQL engine (and does not know the meaning of the statement you wrote). Thus, to allow EF to do its job, you should use the DbSet methods that EF understands.
foreach (var email in db.v2EmailQueues.OrderBy(c => c.ID).Take(Settings.EMAILS_PER_BATCH))
{
    var sent = Amazon.Emailer.SendEmail(email.FromAddress, email.ToAddress, email.Subject, email.HTML);
if (sent)
{
db.v2EmailQueues.Remove(email);
}
else
{
email.FailCount++;
}
}
// this will update the database, and its internal cache.
db.SaveChanges();
On a side note, you should leverage the ORM as much as possible; not only will it save time debugging, it also makes your code easier to understand.
I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
for each year between X and Y, query the DB to get a list of document references to process
for each of these references, process a local file
The process method is, I think, independent and should be parallelizable as long as the input args are different:
private static bool ProcessDocument(
DocumentsDataset.DocumentsRow d,
string langCode
)
{
try
{
var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
var htmFullPath = Path.Combine(@"x:\path", htmFileName);
var missingHtmlFile = !File.Exists(htmFullPath);
if (!missingHtmlFile)
{
var html = File.ReadAllText(htmFullPath);
// ProcessHtml is quite long : it use a regex search for a list of reference
// which are other documents, then sends the result to a custom WS
ProcessHtml(ref html);
File.WriteAllText(htmFullPath, html);
}
return true;
}
catch (Exception exc)
{
Trace.TraceError("{0,8}Fail processing {1} : {2}","[FATAL]", d.UniqueDocRef, exc.ToString());
return false;
}
}
In order to enumerate my document, I have this method :
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year => {
return Document.FindAll((short)year).Documents;
});
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method takes a year, and I'm sure a document can't be returned by more than one year (year is part of the key, actually).
Note the use of AsParallel() here; I've never had an issue with this one.
Now, my main method is :
var documents = EnumerateDocuments();
var result = documents.Select(d => {
bool success = true;
foreach (var langCode in new string[] { "-e","-f" })
{
success &= ProcessDocument(d, langCode);
}
return new {
d.UniqueDocRef,
success
};
});
using (var sw = File.CreateText("summary.csv"))
{
sw.WriteLine("Level;UniqueDocRef");
foreach (var item in result)
{
string level;
if (!item.success) level = "[ERROR]";
else level = "[OK]";
sw.WriteLine(
"{0};{1}",
level,
item.UniqueDocRef
);
//sw.WriteLine(item);
}
}
This method works as expected in this form. However, if I replace
var documents = EnumerateDocuments();
by
var documents = EnumerateDocuments().AsParallel();
It stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already opened by another program.
I don't understand what can cause my program not to work as expected. As my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving document :
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
DbCommand cm = db.GetStoredProcCommand("Document_Select");
if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);
string[] tableNames = { "Documents", "Years" };
DocumentsDS ds = new DocumentsDS();
db.LoadDataSet(cm, ds, tableNames);
return ds;
}
[Edit2] A possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c=>c.Count>1);
var testLst = testGr.ToList();
Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result :
0
1758
Removing the AsParallel returns the same output.
Conclusion: my EnumerateDocuments has something wrong and returns each document twice.
I'll have to dive in here, I think; the source enumeration is probably the cause.
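Pending a fix at the source, a defensive dedupe would confirm the diagnosis (a sketch; grouping by UniqueDocRef as the natural key is my assumption):
// Collapse duplicate rows before processing; each UniqueDocRef is kept once.
var documents = EnumerateDocuments()
    .GroupBy(d => d.UniqueDocRef)
    .Select(g => g.First());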
I suggest you have each task put the file data into a global queue and have a separate thread take writing requests from the queue and do the actual writing.
Anyway, the performance of writing in parallel to a single disk is much worse than writing sequentially, because the disk needs to seek to the next writing location, so you are just bouncing the disk around between seeks. It's better to do the writes sequentially.
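A minimal sketch of that queue-and-single-writer approach, assuming BlockingCollection<T> from System.Collections.Concurrent and Task.Run (.NET 4.5+) are available:
var writeQueue = new BlockingCollection<Tuple<string, string>>();

// Single consumer: all disk writes happen sequentially on this task.
var writer = Task.Run(() =>
{
    foreach (var item in writeQueue.GetConsumingEnumerable())
        File.WriteAllText(item.Item1, item.Item2);
});

// In ProcessDocument, enqueue instead of writing directly:
// writeQueue.Add(Tuple.Create(htmFullPath, html));

// When all producers are finished:
// writeQueue.CompleteAdding();
// writer.Wait();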
Is Document.FindAll((short)year).Documents thread-safe? The difference between the first and the second version is that in the second (broken) version, this call runs multiple times concurrently. That could plausibly be the cause of the issue.
Sounds like you're trying to write to the same file. Only one thread/program can write to a file at a given time, so you can't use Parallel.
If you're reading from the same file, then you need to open the file with only read permissions as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
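A minimal sketch of that locking approach (the static lock object is my addition, not code from the question):
private static readonly object FileWriteLock = new object();

// Inside ProcessDocument, serialize only the write; ProcessHtml stays parallel.
lock (FileWriteLock)
{
    File.WriteAllText(htmFullPath, html);
}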