I'm using the RavenDB streaming API to retrieve all 355,000 DocsToProcess from my database instance using the following code:
_myDocs = new List<DocsToProcess>();
var query = RavenSession.Query<DocsToProcess>();
using (var enumerator = RavenSession.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        DocsToProcess u = enumerator.Current.Document;
        _myDocs.Add(u);
    }
}
However, the following exception message is thrown:
StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.
How can I correctly iterate through all elements of type DocsToProcess in my C# application?
The documentation says explicitly for unbounded results:
Important side notes:
the index already exists. Creation of an index won't occur, and the query will error with an IndexDoesNotExistsException exception.
And that's what your exception is saying. You have to create a static index for streaming results.
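For illustration, a minimal static index plus a streaming call could look like the sketch below. The index class name and its mapped field are my own assumptions, not taken from the question; the API shape follows the RavenDB client of that era.

```csharp
// Sketch: a static index covering all DocsToProcess documents.
// "DocsToProcess_All" is an illustrative name, not from the question.
public class DocsToProcess_All : AbstractIndexCreationTask<DocsToProcess>
{
    public DocsToProcess_All()
    {
        // Map a trivial field so every DocsToProcess document is indexed.
        Map = docs => from doc in docs
                      select new { doc.Id };
    }
}

// Stream against the static index instead of a dynamic one:
var query = RavenSession.Query<DocsToProcess, DocsToProcess_All>();
using (var enumerator = RavenSession.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
        _myDocs.Add(enumerator.Current.Document);
}
```

Remember to deploy the index (for example via `IndexCreation.CreateIndexes`) before streaming against it.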
Similar to JHo above, the solution I came up with means you do not need to create a static index for streaming, because you rely on the default index and use the StartsWith overload of Stream<T> in the Raven client.
We've found the solution below to work fine for most of our use-cases where we need to get everything from a Raven instance.
public IEnumerable<T> GetAll()
{
    var results = new List<T>();
    var conventions = _documentStore.Conventions ?? new DocumentConvention();
    var defaultIndexStartsWith = conventions.GetTypeTagName(typeof(T));
    using (var session = _documentStore.OpenSession())
    {
        using (var enumerator = session.Advanced.Stream<T>(defaultIndexStartsWith))
        {
            while (enumerator.MoveNext())
                results.Add(enumerator.Current.Document);
        }
    }
    return results;
}
To get by without creating a static index, you can provide some minimal bounding like this:
using (var session = store.OpenSession())
{
    IEnumerator<StreamResult<Employee>> stream =
        session.Advanced.Stream<Employee>("employees/");
    while (stream.MoveNext())
    {
        // ....
    }
}
I'm using the Cosmos SDK V3.
Does Cosmos support LINQ Skip and Take for server-side pagination, as in the following example?
Based on my analysis, although I'm able to retrieve data, it seems the query is not doing server-side pagination.
Why do I say that:
I used Fiddler and put a breakpoint at the beginning of the while loop to check whether Cosmos DB is called with skip and take. However, there was no server-side call; it seems all the data is fetched when Count itself is called.
private static async Task ExportAsync<T>(Database database, string partitionKeyName, string partitionKeyPath)
{
    IOrderedQueryable<T> query = database
        .GetContainer(SourceContainerName)
        .GetItemLinqQueryable<T>(allowSynchronousQueryExecution: true);
    var totalCount = query.Count();
    int skip = 0;
    int take = MAX_BATCH_SIZE;
    int taken = 0;
    while (taken < totalCount)
    {
        //breakpoint
        var itemsToInsert = query.Skip(skip).Take(take).ToList().AsReadOnly();
        await ProcessBatchAsync(database, partitionKeyName, partitionKeyPath, itemsToInsert);
        taken += take;
        skip += take; // advance by the batch size, not by one
    }
}
Adding to what #404 mentioned in their answer: Cosmos DB does support skip and take via the OFFSET and LIMIT clauses, but using them is not really advisable, for the following reasons:
It results in expensive operations in terms of RU consumption.
It still does not provide true server-side pagination: when you execute a query with OFFSET and LIMIT, the number of documents you get is determined by the LIMIT value, and the response does not tell you whether more matching documents are available.
More on OFFSET and LIMIT clauses can be found here: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-offset-limit.
In your scenario, the recommendation would be to make use of continuation tokens (as suggested by #mjwills). Using continuation tokens, you can achieve server-side pagination where you request a certain number of items (specified using QueryRequestOptions). When the query executes, you get two things back:
Documents matching your query and
Continuation token if more documents are available matching your query.
You process the documents received. If you also receive a continuation token, you send another query to the Cosmos DB service (this time including the token), and the service returns the next set of documents.
Please see the sample code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

namespace SO67263501
{
    class Program
    {
        static string connectionString = "connection-string";
        static string databaseName = "database-name";
        static string containerName = "container-name";

        static async Task Main(string[] args)
        {
            string continuationToken = null;
            int pageSize = 100; // Let's fetch 100 items at a time
            CosmosClient cosmosClient = new CosmosClient(connectionString);
            Container container = cosmosClient.GetContainer(databaseName, containerName);
            QueryRequestOptions requestOptions = new QueryRequestOptions()
            {
                MaxItemCount = pageSize
            };
            do
            {
                FeedIterator<dynamic> queryResult = container.GetItemLinqQueryable<dynamic>(true, continuationToken, requestOptions).ToFeedIterator();
                FeedResponse<dynamic> feedResponse = await queryResult.ReadNextAsync();
                List<dynamic> documents = feedResponse.Resource.ToList();
                continuationToken = feedResponse.ContinuationToken;
                // Do something with the documents...
            } while (continuationToken != null);
            Console.WriteLine("All done...");
            Console.WriteLine("Press any key to terminate the application.");
            Console.ReadKey();
        }
    }
}
It's supported, and you can verify it by calling ToString() on the queryable to view the query that's sent to the database.
var query = container.GetItemLinqQueryable<Dictionary<string, object>>()
    .OrderBy(x => x["_ts"])
    .Skip(50)
    .Take(10)
    .ToString();
// result:
// {"query":"SELECT VALUE root FROM root ORDER BY root[\"_ts\"] ASC OFFSET 50 LIMIT 10"}
RU usage for OFFSET grows linearly with the offset, so when you have many pages, queries for the later pages become extremely expensive. If possible, you're better off using a continuation token or a WHERE clause to filter the results.
I have a small C# application that queries a Neo4j database. The queries work fine, but I'm having trouble converting arrays of integers to an appropriate C# list type. It just seems that they should be in a proper List/Array/etc. on the C# side.
Here is the query that I run - it returns the correct results, but I'm not smart enough to get the returned list portion into a proper C# list-type object. Here is the code that is running:
using (var driver = GraphDatabase.Driver("bolt://validURL", AuthTokens.Basic("username", "password")))
using (var session = driver.Session())
{
    var result = session.Run("MATCH (c:Climate) WHERE c.koppen = 'BSh' RETURN c.koppen AS koppen, c.counties AS counties");
    foreach (var record in result)
    {
        var koppen = record["koppen"].As<string>();
        var counties = record["counties"];
    }
}
The record["counties"] value is a list of integers, so it seems I should easily be able to put it into a C# list object that I can iterate through. I need to do that in order to process each county and display results. C# is pretty new for me, so thanks for your help in letting me connect these two languages!
EDIT: No responses, so I tried a new approach - to simplify things, I am no longer returning the koppen value - just the list of counties (which is where the problem lies). I found some code in the Neo4j manual, and tried it:
public List<int> GetCounties()
{
    using (var driver = GraphDatabase.Driver("bolt://validURL", AuthTokens.Basic("username", "password")))
    using (var session = driver.Session())
    {
        return session.ReadTransaction(tx =>
        {
            var result = tx.Run("MATCH (c:Climate) WHERE c.koppen = 'BSh' RETURN c.counties");
            var a = result.Select(record => record[0].As<int>()).ToList();
            return a;
        });
    }
}
Now I'm getting an exception:
System.InvalidCastException occurred
HResult=0x80004002
Message=Unable to cast object of type 'System.Collections.Generic.List`1[System.Object]' to type 'System.IConvertible'.
Source=
StackTrace:
The exception occurs on the line ending with the ToList() call. Sorry for not understanding the conversions that need to occur between Neo4j list results and C# lists/arrays. I'm happy to try any suggestions I get!
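Reading the exception message: `Record[0]` is itself a Cypher list (materialized as `List<object>`), not a single int, so `.As<int>()` tries an impossible conversion. The Neo4j .NET driver's `As<T>` extension can also convert list values, so a likely fix (a sketch, assuming one `c.counties` list per returned record) is to convert the whole column value to `List<int>`:

```csharp
// Sketch: convert each record's first column (a Cypher list) to List<int>
// in one step, then flatten all records' lists into a single result.
var result = tx.Run("MATCH (c:Climate) WHERE c.koppen = 'BSh' RETURN c.counties");
var counties = result
    .SelectMany(record => record[0].As<List<int>>()) // list conversion, not int
    .ToList();
return counties;
```

If only one Climate node matches, `result.Single()[0].As<List<int>>()` would do the same without the flattening step.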
I have a process that reads a CSV file into a database. The CSV is over 600 MB, so I can't load it all into memory.
I use a generic pattern to achieve this, but I have problems with casting.
Here is where I read the file:
using (var fs = File.OpenText(Path.Combine(FolderContainer, filename)))
{
    var csvConfiguration = new CsvConfiguration
    {
        HasHeaderRecord = true,
        Delimiter = ","
    };
    using (var csvReader = new CsvReader(fs, csvConfiguration))
    {
        csvReader.Configuration.RegisterClassMap(CsvMapping.CsvMapping.RetrieveMapType(type));
        var list = csvReader.GetRecords(type);
        // Console.WriteLine(list.First());
        dynamic repository = Activator.CreateInstance(typeof(Repository<>).MakeGenericType(type), UnitOfWork);
        // var activities = new Repository<Activity>(UnitOfWork);
        repository.InsertAllOnSubmit(list.Take(100));
        repository.SubmitChanges();
    }
}
I use Take(100) for testing purposes, and I use an in-memory unit of work.
Here are the repository methods that get called:
public void InsertAllOnSubmit(IEnumerable<T> entities)
{
    _source.InsertAllOnSubmit(entities);
}

public void InsertAllOnSubmit(IEnumerable<object> entities)
{
    // foreach (var entity in entities)
    // {
    //     InsertOnSubmit((T) entity);
    // }
    this.InsertAllOnSubmit((IEnumerable<T>)entities);
}
When I execute my test, I get a cast exception:
System.InvalidCastException : Unable to cast object of type '<TakeIterator>d__3a`1[System.Object]' to type 'System.Collections.Generic.IEnumerable`1[KboOpenDataData.Activity]'.
I tried adding AsEnumerable() or ToList() after the Take(100), as described at http://blog.codelab.co.nz/2009/12/22/unable-to-cast-object-of-type-issue/, with no success.
And I don't want to use ToList, because the file is too big to hold entirely in memory.
When I uncomment the foreach lines in the repository class, the foreach works, but it's very, very slow.
Any advice on getting the cast to succeed?
Try the following:
repository.InsertAllOnSubmit(list.Take(100).Cast<Activity>());
The Cast extension method lets you convert an enumerable of one type to an enumerable of another (assuming, of course, that the cast is valid, which is what you expect with explicit casts). It is also lazy, so it doesn't force the whole file into memory.
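A small, self-contained sketch of why this addresses the memory concern: Cast<T> wraps the source sequence and converts each element only as it is enumerated, so combined with Take it never materializes the whole source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CastDemo
{
    // Simulate a large streamed source (like GetRecords) that yields
    // untyped objects one at a time.
    static IEnumerable<object> Records()
    {
        for (int i = 0; ; i++)       // endless: laziness is essential here
            yield return i;
    }

    static void Main()
    {
        // Take(5) stops after five items; Cast<int> unboxes each one as it
        // flows through. The infinite source is never fully enumerated.
        List<int> first = Records().Take(5).Cast<int>().ToList();
        Console.WriteLine(string.Join(",", first)); // 0,1,2,3,4
    }
}
```

The same pipeline with an eager ToList() before Take would hang on the infinite source, which mirrors why ToList on the 600 MB file is the wrong move.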
I want one method that can query my entire RavenDB database.
My method signature looks like this:
public static DataTable GetData(string className, int amount, string orderByProperty, string filterByProperty, string filterByOperator, string filterCompare)
I figured I can accomplish all of the above with a dynamic LuceneQuery.
session.Advanced.LuceneQuery<dynamic>();
The problem is: Since I'm using dynamic in the type given, how do I ensure that the query only includes the types matching the className?
I'm looking for something like .WhereType(className) or .Where("type: " + className).
Solution
This returns the results of the correct type:
var type = Type.GetType("Business.Data.DTO." + className);
var tagName = RavenDb.GetTypeTagName(type);
using (var session = RavenDb.OpenSession())
{
    var result = session.Advanced
        .LuceneQuery<object, RavenDocumentsByEntityName>()
        .WhereEquals("Tag", tagName)
        .ToList();
}
Note that it is not possible to add additional WhereEquals calls or other filters to this, because nothing specific to the document type is included in the RavenDocumentsByEntityName index.
This means that this solution cannot be used for what I wanted to accomplish.
What I ended up doing
Although it doesn't fulfill my requirement completely, this is what I ended up doing:
public static List<T> GetData<T>(DataQuery query)
{
    using (var session = RavenDb.OpenSession())
    {
        var result = session.Advanced.LuceneQuery<T>();
        if (!string.IsNullOrEmpty(query.FilterByProperty))
        {
            if (query.FilterByOperator == "=")
            {
                result = result.WhereEquals(query.FilterByProperty, query.FilterCompare);
            }
            else if (query.FilterByOperator == "StartsWith")
            {
                result = result.WhereStartsWith(query.FilterByProperty, query.FilterCompare);
            }
            else if (query.FilterByOperator == "EndsWith")
            {
                result = result.WhereEndsWith(query.FilterByProperty, query.FilterCompare);
            }
        }
        if (!string.IsNullOrEmpty(query.OrderByProperty))
        {
            if (query.Descending)
            {
                result = result.OrderByDescending(query.OrderByProperty);
            }
            else
            {
                result = result.OrderBy(query.OrderByProperty);
            }
        }
        result = result.Skip(query.Skip).Take(query.Amount);
        return result.ToList();
    }
}
Although this is most certainly an anti-pattern, it's a neat way to just look at some data, if that's what you want. It's called very easily like this:
DataQuery query = new DataQuery
{
    Amount = int.Parse(txtAmount.Text),
    Skip = 0,
    FilterByProperty = ddlFilterBy.SelectedValue,
    FilterByOperator = ddlOperator.SelectedValue,
    FilterCompare = txtCompare.Text,
    OrderByProperty = ddlOrderBy.SelectedValue,
    Descending = chkDescending.Checked
};
grdData.DataSource = DataService.GetData<Server>(query);
grdData.DataBind();
"Server" is one of the classes/document types I'm working with, so the downside, where it isn't completely dynamic, is that I would have to define a call like that for each type.
I strongly suggest you don't go down this road. You are essentially attempting to hide the RavenDB Session object, which is very powerful and intended to be used directly.
Just looking at the signature of the method you want to create, the parameters are all very restrictive and make a lot of assumptions that might not be true for the data you're working on. And the return type - why would you return a DataTable? Maybe return an object or a dynamic, but nothing in Raven is structured in tables, so DataTable is a bad idea.
To answer the specific question, the type name comes from the Raven-Entity-Name metadata, which you would need to build an index over. This happens automatically when you index using the from docs.YourEntity syntax in an index. Raven does this behind the scenes when you use a dynamic index such as .Query<YourEntity> or .Advanced.LuceneQuery<YourEntity>.
Still, you shouldn't do this.
I am experimenting with Castle ActiveRecord and NHibernate's second-level cache. In the following two methods, I can see caching working fine, but only for repeated calls of the same method. In other words, if I call RetrieveByPrimaryKey twice for the same PK, the object is found in the cache. And if I call RetrieveAll twice, I see SQL issued only once.
But if I call RetrieveAll and then RetrieveByPrimaryKey with some PK, I see two SQL statements issued. My question is: why doesn't AR look for that entity in the cache first? Surely it would have found it there as a result of the previous call to RetrieveAll.
public static T RetrieveByPrimaryKey(Guid id)
{
    var res = default(T);
    var findCriteria = DetachedCriteria.For<T>().SetCacheable(true);
    var eqExpression = NHibernate.Criterion.Expression.Eq("Id", id);
    findCriteria.Add(eqExpression);
    var items = FindAll(findCriteria);
    if (items != null && items.Length > 0)
        res = items[0];
    return res;
}

public static T[] RetrieveAll()
{
    var findCriteria = DetachedCriteria.For<T>().SetCacheable(true);
    var res = FindAll(findCriteria);
    return res;
}
You're using the query cache on specific queries. That means the cache lookup works like this:
Search the cache for the results of a query with identical syntax AND the same parameters. If found, use the cached results.
NHibernate (this has nothing to do with AR, by the way) doesn't know that, logically, one query 'contains' the other, so that's why you're getting two DB trips.
I would suggest using ISession.Get to retrieve items by ID (it's the recommended method). I think (though I haven't tested it) that Get can use items cached by other queries.
Here's a nice blog post from Ayende about it.
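For illustration, the recommended lookup-by-ID looks roughly like this (a sketch: `sessionFactory` and `MyEntity` are placeholder names, and how you obtain the ISession depends on your ActiveRecord setup):

```csharp
// Sketch: ISession.Get checks the first-level (session) cache, then the
// second-level entity cache, before finally querying the database.
using (ISession session = sessionFactory.OpenSession())
{
    // Get returns null if no such row exists; Load would return a proxy
    // and throw later if the entity turns out to be missing.
    var item = session.Get<MyEntity>(id);
}
```

This differs from a cacheable criteria query: Get is keyed by entity type and ID, so it can hit the entity cache regardless of which query originally loaded the object.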