I need to export thousands of documents from a Cosmos DB container, and I am wondering whether there is a more efficient way to get them all (I haven't been able to find one by browsing the documentation and searching).
Right now I am using the FeedIterator to get my results:
Database database = m_cosmosClient.GetDatabase(m_databaseId);
DatabaseResponse databaseResponse = await database.ReadAsync();

// The response from Azure Cosmos
DatabaseProperties properties = databaseResponse;

Container container = databaseResponse.Database.GetContainer(m_cosmosDbContainer);

QueryDefinition query = new QueryDefinition(queryString);
QueryRequestOptions queryOptions = new QueryRequestOptions { MaxItemCount = 10000, MaxBufferedItemCount = 10000 };

List<Article> results = new List<Article>();
FeedIterator<Article> resultSetIterator = container.GetItemQueryIterator<Article>(query, null, queryOptions);
while (resultSetIterator.HasMoreResults)
{
    FeedResponse<Article> response = await resultSetIterator.ReadNextAsync();
    results.AddRange(response);

    if (response.Diagnostics != null)
    {
        Console.WriteLine($"\nQueryWithSqlParameters Diagnostics: {response.Diagnostics}");
    }
}
I am worried that without some form of multi-tasking I could run out of memory, and then again it is always nice to have a faster run time.
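For reference, one way to bound memory is to write each page out as it arrives instead of accumulating everything in results. A minimal sketch, assuming the same Article type, container, and queryString as above (the per-page file naming is just for illustration):

using System.IO;
using System.Text.Json;

FeedIterator<Article> iterator = container.GetItemQueryIterator<Article>(
    new QueryDefinition(queryString),
    requestOptions: new QueryRequestOptions { MaxItemCount = 1000 });

int page = 0;
while (iterator.HasMoreResults)
{
    FeedResponse<Article> response = await iterator.ReadNextAsync();

    // Persist the page immediately so memory stays bounded by one page.
    await File.WriteAllTextAsync($"export-{page++}.json",
        JsonSerializer.Serialize(response.Resource));
}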
The Cosmos DB Data Migration Tool is a good (and simple) option if you want to run the extract from a workstation. It can be run interactively or automated using scripts.
Creating a job in Azure Data Factory is a bit more complex but also offers a lot more flexibility.
This article discusses the various options for data migration in and out of Cosmos DB.
I have an aggregation query that I run in a mongo shell. What I want is to execute this command from a simple console application with .NET 5.
I just want to run the command (not use any fancy LINQ queries) and loop over the results to measure some things.
I tried a simple query:
_client = new MongoClient("mongodb://localhost:27017");
_database = _client.GetDatabase("zzz");
var res = await _database.RunCommandAsync<BsonDocument>("users.find({})");
But the result is always null. I haven't quite understood whether runCommand supports only built-in commands or whether it's more generic.
Could you please elaborate?
Disclaimer: I don't know much about MongoDb.
From the docs:
db.runCommand(
    {
        "find": <string>,
        // ...
    }
)
So it seems to me your query should be:
var res = await _database.RunCommandAsync<BsonDocument>("{ find: 'users' }");
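If it helps, the find command replies with a cursor document, so the matching documents come back under cursor.firstBatch. A minimal sketch, assuming the same localhost connection and database name "zzz" from the question:

using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("zzz");

var res = await database.RunCommandAsync<BsonDocument>("{ find: 'users' }");

// The server replies with { cursor: { firstBatch: [...], id: ... }, ok: 1 }.
foreach (BsonValue doc in res["cursor"]["firstBatch"].AsBsonArray)
{
    Console.WriteLine(doc);
}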
I am loving Apache Ignite, particularly as a distributed cache. However, I have realised that the tooling is not as good.
I am looking for a simple desktop tool to be able to view and search the cache for values etc., something similar to Redis Desktop Manager.
I am in a Windows environment. My Google searches have returned "DBeaver", which I have downloaded and configured, but it doesn't show my cache key values. The other one has been "Web Console", though this is web-based and I prefer a desktop tool - not sure if I can install this locally?
Anything else around?
Much appreciated.
I think the closest you can get is LINQPad + the .NET thin client.
The Ignite NuGet package actually includes a LINQPad sample that gets the first 5 items from every cache in the cluster and displays them; you can modify it to your needs.
This approach requires some coding, but it is quite flexible, with LINQ capabilities and a rich API at your disposal, plus LINQPad's data display features.
Sample code:
var cfg = new IgniteClientConfiguration { Host = "127.0.0.1" };
using (var client = Ignition.StartClient(cfg))
{
    // Create cache for demo purposes.
    var fooCache = client.GetOrCreateCache<int, object>("thin-client-test")
        .WithKeepBinary<int, IBinaryObject>();
    fooCache[1] = client.GetBinary().GetBuilder("foo")
        .SetStringField("Name", "John")
        .SetTimestampField("Birthday", new DateTime(2001, 5, 15).ToUniversalTime())
        .Build();

    var cacheNames = client.GetCacheNames();
    "Displaying first 5 items from each cache:".Dump();

    foreach (var name in cacheNames)
    {
        var cache = client.GetCache<object, object>(name).WithKeepBinary<object, object>();
        var items = cache.Query(new ScanQuery<object, object>()).Take(5)
            .ToDictionary(x => x.Key.ToString(), x => x.Value.ToString());
        items.Dump(name);
    }
}
GridGain has a GUI tool which allows you to connect to your grid and peek into caches, as well as many more things.
It is part of a commercial offering, but it will connect to Apache Ignite grids.
I have an Orchard CMS site running that is tied to a user synchronization. This sync updates each user overnight, and the code begins with
... = mContentManager.Get<Orchard.Users.Models.UserPart>(
    lOrchardUser.ContentItem.Id,
    Orchard.ContentManagement.VersionOptions.DraftRequired);
As you can see, VersionOptions.DraftRequired is passed to the Get() method, which creates a new draft each time the user is synchronized. It's not intended to create a new draft here, so I changed it to VersionOptions.Published, which avoids creating a new version record on each call.
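For clarity, the changed call looks like this (same identifiers as above):

// Request the published version so no new draft is created on each sync.
... = mContentManager.Get<Orchard.Users.Models.UserPart>(
    lOrchardUser.ContentItem.Id,
    Orchard.ContentManagement.VersionOptions.Published);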
But the issue here is that passing VersionOptions.DraftRequired in the past has created around 120 version records for each user, and there are around 1000 users in the DB.
When I now use IContentManager.Query() it takes considerably longer due to the high number of versions.
My idea is to remove all versions except the published one, as I don't need them, but IContentManager doesn't provide any version removal option, and deleting the records by using IRepository<> causes an NHibernate exception.
So my last resort was to use LINQ queries to remove the versions. This did work, but I was told that using LINQ in Orchard is not recommended.
I wonder whether anyone has had issues with high amounts of version records, as the longer the system runs, the more data accumulates and the system gets slower and slower.
So, my questions are:
Is there an Orchard way to remove version records?
Is there a way to disable versioning?
As there seems to be no Orchard way to do this, I deleted the versions with the following LINQ to SQL code:
public System.Web.Mvc.ActionResult RemoveVersions()
{
    // Select user IDs. An alternate way is to retrieve user IDs via the content
    // manager, but this takes a veeeeery long time due to the need of querying all versions:
    //
    // var lOrchardUserIDs = mContentManager
    //     .Query<Orchard.Users.Models.UserPart, Orchard.Users.Models.UserPartRecord>(Orchard.ContentManagement.VersionOptions.AllVersions)
    //     .List()
    //     .Select(u => u.Id)
    //     .ToList();
    var lOrchardUserIDs = mUserRepository.Fetch(u => true).Select(u => u.Id).ToList();

    foreach (var lOrchardUserID in lOrchardUserIDs)
    {
        var lContentItemVersionRecords =
            (from r in DataContext.ContentItemVersionRecords
             where r.ContentItemRecord_id == lOrchardUserID
             select r).ToList();

        if (lContentItemVersionRecords.Count > 1)
        {
            foreach (var lContentItemVersionRecord in lContentItemVersionRecords)
            {
                if (lContentItemVersionRecord.Number == 1)
                {
                    // Keep version 1; flag it as latest/published if the most
                    // recent version record was published.
                    if (lContentItemVersionRecords[lContentItemVersionRecords.Count - 1].Published)
                    {
                        lContentItemVersionRecord.Latest = true;
                        lContentItemVersionRecord.Published = true;
                    }
                }
                else
                {
                    DataContext.ContentItemVersionRecords.DeleteOnSubmit(lContentItemVersionRecord);
                }
            }
        }
    }

    DataContext.SubmitChanges();
    return Content("Done");
}
Currently we are using OleDb in an older application at our company.
I have started profiling the application, and dotTrace tells me that this code is one of the bottlenecks. In total it takes about 18 s to execute (avg. 6 ms per call).
m_DataSet = new DataSet("CommandExecutionResult");
m_DataAdapter.SelectCommand = m_OleDbCommand;
m_DataAdapter.Fill(m_DataSet, "QueryResult"); // <-- bottleneck
ReturnValue = m_DataSet.Tables[0].Copy();
m_InsertedRecordId = -1;
m_EffectedRecords = m_DataSet.Tables[0].Rows.Count;
I know there may be ways to reduce the number of queries. BUT is there a way to get a DataTable from an Access database without using the DataAdapter?
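For what it's worth, one alternative worth profiling is DataTable.Load over an OleDbDataReader, which skips the DataAdapter/DataSet layer entirely. A minimal sketch, reusing m_OleDbCommand from above:

using System.Data;
using System.Data.OleDb;

var table = new DataTable("QueryResult");
using (OleDbDataReader reader = m_OleDbCommand.ExecuteReader())
{
    // Load streams rows from the reader straight into the table.
    table.Load(reader);
}
m_EffectedRecords = table.Rows.Count;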
I've searched all over and now I have to ask SO. I'm trying to construct a simple dataflow using EzAPI. It's been anything but easy, but I'm committed to figuring this out. What I can't figure out is how to get the EzOleDbDestination working. Here's my complete code:
var a = new Application();
// Using a template since it's impossible to set up an ADO.NET connection to MySQL
// using EzAPI and potentially even with the raw SSIS API...
var pkg = new EzPackage(a.LoadPackage(@"C:\...\Package.dtsx", null));
pkg.Name = "Star";

var df = new EzDataFlow(pkg);
df.Name = "My DataFlow";

var src = new EzAdoNetSource(df);
src.Name = "Source Database";
src.SqlCommand = "SELECT * FROM enum_institution";
src.AccessMode = AccessMode.AM_SQLCOMMAND;
src.Connection = new EzConnectionManager(pkg, pkg.Connections["SourceDB"]);
src.ReinitializeMetaData();

var derived = new EzDerivedColumn(df);
derived.AttachTo(src);
derived.Name = "Prepare Dimension Attributes";
derived.LinkAllInputsToOutputs();
derived.Expression["SourceNumber"] = "id";
derived.Expression["Name"] = "(DT_STR,255,1252)description";

// EDIT: reordered the operation here and I no longer get an error, but
// I'm not getting any mappings or any input columns when I open the package in the designer.
var dest = new EzOleDbDestination(df);
dest.AttachTo(derived, 0, 0);
dest.Name = "Target Database";
dest.AccessMode = 0;
dest.Table = "[dbo].[DimInstitution]";
dest.Connection = new EzConnectionManager(pkg, pkg.Connections["TargetDB"]);

// This comes from Yahia's link.
var destInput = dest.Meta.InputCollection[0];
var destVirInput = destInput.GetVirtualInput();
var destInputCols = destInput.InputColumnCollection;
var destExtCols = destInput.ExternalMetadataColumnCollection;
var sourceColumns = derived.Meta.OutputCollection[0].OutputColumnCollection;

foreach (IDTSOutputColumn100 outputCol in sourceColumns)
{
    // Now getting a COM exception here...
    var extCol = destExtCols[outputCol.Name];
    if (extCol != null)
    {
        // Create an input column from an output column of the previous component.
        destVirInput.SetUsageType(outputCol.ID, DTSUsageType.UT_READONLY);
        var inputCol = destInputCols.GetInputColumnByLineageID(outputCol.ID);
        if (inputCol != null)
        {
            // Map the input column to an external metadata column.
            dest.Comp.MapInputColumn(destInput.ID, inputCol.ID, extCol.ID);
        }
    }
}
Basically, anything that involves a call to ReinitializeMetaData() results in 0xC0090001, because that method is where the error happens. There's no real documentation to help me, so I have to rely on any gurus here.
I should mention that the source DB is MySQL and the target DB is SQL Server. Building packages like this using the SSIS designer works fine, so I know it's possible.
Feel free to tell me if I'm doing anything else wrong.
EDIT: here's a link to the base package I'm using as a template: http://www.filedropper.com/package_1 . I've redacted the connection details, but any MySQL and SQL Server database will do. The package will read from MySQL (using the MySQL ADO.NET Connector) and write to SQL Server.
The database schema is mostly irrelevant. For testing, just make a table in MySQL that has two columns: id (int) and description (varchar), with id being the primary key. Make equivalent columns in SQL Server. The goal here is simply to copy from one to the other. It may end up being more complex at some point, but I have to get past this hurdle first.
I can't test this right now, BUT I am rather sure that the following will help you get it working:
Calling ReinitializeMetaData() causes the component to fetch the table metadata. It should only be called after setting the AccessMode and related properties, and you are calling it before setting AccessMode...
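Roughly something like this for the destination (a sketch only; the property values are taken from the question's code):

// Configure the destination fully before asking it to fetch metadata.
dest.Connection = new EzConnectionManager(pkg, pkg.Connections["TargetDB"]);
dest.AccessMode = 0; // as in the question
dest.Table = "[dbo].[DimInstitution]";
dest.ReinitializeMetaData(); // only now, once everything above is set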
Various samples, including advice on debugging problems
Define the derived column(s) directly in the SQL command instead of using an EzDerivedColumn
Try to get it working with two SQL Server DBs first; some of the available MySQL ADO.NET providers have shortcomings under some circumstances
UPDATE - as per the comments, here is some more information on debugging this and a link to a complete end-to-end sample with source:
http://blogs.msdn.com/b/mattm/archive/2009/08/03/looking-up-ssis-hresult-comexception-errorcode.aspx
http://blogs.msdn.com/b/mattm/archive/2009/08/03/debugging-a-comexception-during-package-generation.aspx
Complete working sample with source
I've had this exact same issue and was able to resolve it with a lot of experimentation. In short, you must set the connection for both the source and the destination, and then call AttachTo after both connections are set. You must call AttachTo for every component.
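A rough sketch of that ordering, with the component and connection names taken from the question (treat it as illustrative, not verified against your package):

var src = new EzAdoNetSource(df);
src.Connection = new EzConnectionManager(pkg, pkg.Connections["SourceDB"]);
src.SqlCommand = "SELECT * FROM enum_institution";
src.AccessMode = AccessMode.AM_SQLCOMMAND;

var dest = new EzOleDbDestination(df);
dest.Connection = new EzConnectionManager(pkg, pkg.Connections["TargetDB"]);
dest.Table = "[dbo].[DimInstitution]";

// Attach only after both connections are set.
var derived = new EzDerivedColumn(df);
derived.AttachTo(src);
dest.AttachTo(derived, 0, 0);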
I've written a blog post about starting with an SSIS package as a template, and then manipulating it programmatically to produce a set of new packages.
The article explains the issue in more detail.