MongoDB C# driver 2.0 InsertManyAsync vs BulkWriteAsync - c#

I have to insert many documents into a MongoDB collection, using the new C# 2.0 driver. Does it make any difference whether I use collection.InsertManyAsync(...) or collection.BulkWriteAsync(...), particularly regarding performance?
From what I understand of the MongoDB documentation, an insert with an array of documents should be a bulk operation under the hood. Is that correct?
Thanks for your help.

I found the answer by looking at the driver source code: InsertManyAsync internally uses BulkWriteAsync.
So using InsertManyAsync is the same as writing:
List<BsonDocument> documents = ...;
await collection.BulkWriteAsync(documents.Select(d => new InsertOneModel<BsonDocument>(d)));
Obviously, if all the operations are inserts, InsertManyAsync should be used.
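As a self-contained sketch of the two equivalent calls (assuming an open IMongoCollection&lt;BsonDocument&gt; named collection; the class and method names here are just for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class InsertExample
{
    // Both calls issue a single bulk insert under the hood.
    public static async Task InsertAsync(IMongoCollection<BsonDocument> collection,
                                         List<BsonDocument> documents)
    {
        // Option 1: the convenience method.
        await collection.InsertManyAsync(documents);

        // Option 2: the equivalent explicit bulk write.
        await collection.BulkWriteAsync(
            documents.Select(d => new InsertOneModel<BsonDocument>(d)));
    }
}
```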

Related

How to perform 2 phase commit in C# Mongodb strongly typed driver

I am using the official C# MongoDB strongly typed driver, version 2.5.0, to interact with MongoDB.
I have multiple update operations (up to 4) on documents that belong to different collections, which need to be performed in an all-or-nothing fashion (similar to transactions in relational databases).
According to the official MongoDB documentation (Perform Two Phase Commits), transaction-like updates are performed using two-phase commits, so my questions are:
How can I achieve that functionality using the C# MongoDB strongly typed driver (please provide a code example)?
And if it is not possible, please suggest NoSQL databases that support this functionality.

Indexing an entire MongoDB collection into Elasticsearch quickly

I have a collection in MongoDB which I am indexing into Elasticsearch. I am doing this in a C# process. The collection has 100 million documents, and for each document, I have to query other documents in order to denormalise into the Elasticsearch index.
This all takes time. Reading from MongoDB is the slow part (indexing is relatively quick). I am batching the data from MongoDB as efficiently as I can but the process takes over 2 days.
This only has to happen when the mapping in Elasticsearch changes, but that has happened a couple of times over the last month.
Are there any ways of improving the performance for this?
Maybe you don't need to relaunch the import from scratch (I mean the import from MongoDB) when you change mappings. Read this: Elasticsearch Reindex API
When you need to change a mapping you must:
Create a new index with the new mapping
Reindex data from the old index into the new index using the built-in feature of Elasticsearch.
After this, the old documents will be indexed with the new mapping inside the new index. And the built-in reindex in Elasticsearch will work much more quickly than an import from MongoDB via the HTTP API.
If you use reindex, don't forget the wait_for_completion parameter (described in the documentation): setting wait_for_completion=false runs the reindex in the background.
Will this approach solve your problem?
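As a sketch, the Reindex API call can be triggered from C# with a plain HTTP request (the localhost endpoint and the index names old_index/new_index are placeholders):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class ReindexExample
{
    public static async Task ReindexAsync()
    {
        using (var client = new HttpClient())
        {
            // Source and destination indices for the built-in reindex.
            var body = @"{
              ""source"": { ""index"": ""old_index"" },
              ""dest"":   { ""index"": ""new_index"" }
            }";

            // wait_for_completion=false returns a task id immediately
            // and runs the reindex in the background on the cluster.
            var response = await client.PostAsync(
                "http://localhost:9200/_reindex?wait_for_completion=false",
                new StringContent(body, Encoding.UTF8, "application/json"));

            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}
```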

Is there a C# equivalent for the Java MongoDB Hadoop Connector?

I'm playing with Mobius (the C# language binding for Spark) and the C# Driver for MongoDB. What I'm aiming to do is use MongoDB as the input/output for the Spark queries within my C# application. I know there's a Java MongoDB Hadoop Connector but I would like to continue using Mobius to write my Spark queries.
You could use the MongoDB Spark Connector and the DataFrame API in Mobius to query MongoDB. The code to load the data looks like:
var mongoDbDataFrame = sqlContext.Read.Format("com.mongodb.spark.sql").Load();
Once the data is loaded, you can perform Select() and Filter() operations on the DataFrame. You can also register the DataFrame as a temp table to use SQL queries, using the template below:
mongoDbDataFrame.RegisterTempTable("MongoDbDataFrameTempTable");
sqlContext.Sql("SELECT <columns> FROM MongoDbDataFrameTempTable WHERE <condition>");
Note that you need to include the connector and its dependencies on the classpath; the "--jars" parameter can be used for that.
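Putting those fragments together, a minimal Mobius sketch might look like this (the Spark setup follows the usual Mobius pattern; "someColumn" and the table name are placeholders, not anything prescribed by the connector):

```csharp
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

public static class MongoMobiusExample
{
    public static void Run()
    {
        // Spark session setup; the MongoDB connection URI is expected
        // to come from the Spark configuration for the connector.
        var sparkConf = new SparkConf().SetAppName("MongoDbViaMobius");
        var sparkContext = new SparkContext(sparkConf);
        var sqlContext = new SqlContext(sparkContext);

        // Load the MongoDB collection as a DataFrame via the Spark Connector.
        var mongoDbDataFrame = sqlContext.Read.Format("com.mongodb.spark.sql").Load();

        // Either use the DataFrame API directly...
        mongoDbDataFrame.Select("someColumn").Show();

        // ...or register a temp table and query it with SQL.
        mongoDbDataFrame.RegisterTempTable("MongoDbDataFrameTempTable");
        sqlContext.Sql("SELECT someColumn FROM MongoDbDataFrameTempTable").Show();
    }
}
```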

MongoDB Insertion Failure Report

Is it possible to know about document insertion failures in MongoDB? I am using C# to connect to MongoDB.
You have to enable "safe mode" at the operation level (safe=True in Python). Check the C# driver's docs for the corresponding functionality in C#.
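In the 2.x C# driver, the equivalent of safe mode is an acknowledged write concern (which is the default); a failed insert then surfaces as an exception you can catch. A minimal sketch, assuming an open IMongoCollection&lt;BsonDocument&gt;:

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

public static class InsertFailureExample
{
    public static void Insert(IMongoCollection<BsonDocument> collection)
    {
        try
        {
            // With WriteConcern.Acknowledged (the 2.x default),
            // the server reports write errors back to the client.
            var acknowledged = collection.WithWriteConcern(WriteConcern.Acknowledged);
            acknowledged.InsertOne(new BsonDocument("_id", 1));
            acknowledged.InsertOne(new BsonDocument("_id", 1)); // duplicate key
        }
        catch (MongoWriteException ex)
        {
            // The exception carries the server-side error details.
            Console.WriteLine("Insert failed: " + ex.WriteError?.Message);
        }
    }
}
```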

How can I use Linq with a MySql database on Mono?

There are numerous libraries providing Linq capabilities to C# code interacting with a MySql database. Which one of them is the most stable and usable on Mono?
Background (mostly irrelevant): I have a simple C# (.Net 2.0) program updating values in a MySql database. It is executed nightly via a cron job and runs on a Pentium 3 450Mhz, Linux + Mono. I want to rewrite it using Linq (.Net 3.5) mostly as an exercise (I have not yet used Linq).
The only (free) linq provider for MySql is DbLinq, and I believe it is a long way from production-ready.
There is also MyDirect.Net, which is commercial, but I have heard mixed reviews of its capabilities.
I've read that MySql will be implementing the Linq to Entities API for the 5.3 version of the .net connector, but I don't know if there's even a timeline for that. In fact, MySql has been totally silent about Entity Framework support for months.
Addendum: The latest release of the MySql Connector/Net 6.0 has support for the EF according to the release notes. I have no idea how stable/useful this is, so I'd love to hear from anybody who have tried it.
According to the Mono roadmap, I'm not sure Linq is available for Mono yet.
At least some of Linq might be available in the very latest release, but Linq to DB is listed for Mono 2.4 (Feb 2009).
Not sure about Mono, but I just started using LightSpeed and that supports LINQ-to-MySQL.
At this time you cannot use Linq to SQL; you might look into a third-party Linq MySql provider or Linq to Entities. Linq to SQL only works with SQL Server databases.
LINQ to SQL is simply an ORM layer leveraging the power of expressions to make it easy to construct queries in your code.
If you are just calling adhoc queries for your tool, there is little need to use LINQ, it just adds an extra layer of abstraction to your code.
I have tried the tutorial at http://www.primaryobjects.com/CMS/Article100.aspx. This uses dblinq/dbmetal to generate the data context class and classes for each table.
The code failed at the first attempt with an unhandled exception (MySql.Data.Types.MySqlConversionException: "Unable to convert MySQL date/time value to System.DateTime"). Googling revealed this should be easily solved by appending "Allow Zero Datetime=True;" to the connection string.
Unfortunately, this turned out not to solve my problem. Thinking the MySQL .Net Connector was to blame, I executed the SQL generated by DbLinq without the linq2sql intermediary layer, using the MySQL Connector directly. This time no exception occurred.
Tables which do not have a date column did work with DbLinq.
So, from my experiment I agree with Adam, DbLinq is a long way from production ready.
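For reference, the commonly suggested connection-string workaround mentioned above (it did not help in this particular case) looks like the following; the server, database, and credential values are placeholders:

```csharp
using MySql.Data.MySqlClient;

public static class ConnectionExample
{
    public static MySqlConnection Open()
    {
        // "Allow Zero Datetime=True" makes the connector return zero
        // dates (0000-00-00) as MySqlDateTime instead of throwing
        // MySqlConversionException when converting to System.DateTime.
        var connectionString =
            "Server=localhost;Database=mydb;Uid=user;Pwd=secret;" +
            "Allow Zero Datetime=True;";

        var connection = new MySqlConnection(connectionString);
        connection.Open();
        return connection;
    }
}
```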
