I have a collection with 1000 items. I want to sort them by date (the SaveDateUtc field) and remove the top 10 (the oldest), so I'm left with the 990 newest items in my collection.
I could do a Find and then a Remove, no problem, but it would be much better if I could do this with just a single Remove call. However, I can't find a way to sort and limit to the top 10 through the query.
So my question is, can I do this in just one call?
(I'm using the C# driver)
Actually, there was a similar question: MongoDB find and remove - the fastest way
But unfortunately findAndModify cannot be limited to a number of records. So my suggestions to you:
You may introduce a surrogate field so you can use it as a field in the query
You may write your own server-side JavaScript function that will do this operation on the server. Benefits: the operation is done on the server and it is atomic. Pitfall: it does not work with sharded collections.
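For reference, a minimal two-call sketch of the Find-then-Remove approach the question mentions, written against the 2.x C# driver API (the Item class, its Id field, and the items collection variable are assumptions; sorting ascending puts the oldest documents first):

// Fetch the ids of the 10 oldest documents.
var oldestTen = items.Find(FilterDefinition<Item>.Empty)
    .SortBy(i => i.SaveDateUtc) // ascending: oldest first
    .Limit(10)
    .Project(i => i.Id)
    .ToList();

// Second round trip: delete exactly those documents.
items.DeleteMany(Builders<Item>.Filter.In(i => i.Id, oldestTen));

This is still two round trips and not atomic, which is exactly the limitation the suggestions above try to work around.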
How can I find out if a DynamoDB table contains any items using the .NET SDK?
One option is to do a Scan operation, and check the returned Count. But Scans can be costly for large tables and should be avoided.
The DescribeTable item count does not return a real-time value; it is updated approximately every six hours.
The best way is to issue a single Scan without any filter expression and check the Count. This need not be costly: you scan only once, without paginating through the whole table (there is no need to follow LastEvaluatedKey recursively just to learn whether any item exists), and a single Scan request returns at most 1 MB of data.
If the use case requires a real-time value, this is the best and only option available.
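A minimal sketch with the low-level .NET SDK client (the table name is the placeholder from the answer below; Limit = 1 is an added assumption that stops the scan after a single item, making the call as cheap as possible):

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();

// Limit = 1 stops the scan after one item is read, so the request
// stays cheap no matter how large the table is.
var response = await client.ScanAsync(new ScanRequest
{
    TableName = "FooTable",
    Limit = 1
});

bool hasItems = response.Count > 0;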
Edit: While the below appears to work fine with small tables on localhost, the docs state
DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
so only use DescribeTable if you don't need an accurate, up-to-date figure.
Original:
It looks like the best way to do this is to use the DescribeTable method on AmazonDynamoDBClient:

AmazonDynamoDBClient client = ...;

// Note: ItemCount is only refreshed about every six hours (see the edit above).
if (client.DescribeTable("FooTable").Table.ItemCount == 0)
{
    // do stuff
}
I have an ASP.NET Web API application. In the database I have a big list (between 100,000 and 200,000 items) of pairs like id:name, and this list changes quite rarely. I need to implement filtering like this: /pair/filter?fragment=bla. It should return the first 25 pairs where any word in the name starts with the word fragment.

I see two approaches here. The first is to load the data into a cache (HttpRuntime.Cache, Redis or something like that) to cut loading time, and filter with LINQ. But I think there will be problems with the time required for serializing/deserializing. The other approach: for instance, I have a pair 22:some title here, so I would provide a separate table like this:
ID | FRAGMENT
22 | some
22 | title
22 | here
with a primary key on both columns and a separate index on the FRAGMENT column to make queries faster. Any suggestions and remarks are welcome.
UPD: I've now rethought the problem. I don't want to query the database, because requests happen quite often. So now I think the best solution is to:
load the entire list into memory
build a trie structure that keeps a hashset of matching ids in each node
for a single text fragment, just return the hashset from the trie node; for several fragments, find all the hashsets and take their intersection (see the sketch below)
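A rough sketch of that trie (all class and member names are assumptions): each node stores the ids of every pair with a word passing through that node, so a fragment lookup is one walk down the trie.

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public HashSet<int> Ids = new HashSet<int>();
}

class PairTrie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(int id, string name)
    {
        foreach (var word in name.ToLowerInvariant().Split(' '))
        {
            var node = _root;
            foreach (var ch in word)
            {
                TrieNode child;
                if (!node.Children.TryGetValue(ch, out child))
                    node.Children[ch] = child = new TrieNode();
                node = child;
                node.Ids.Add(id); // every prefix of the word points to this pair
            }
        }
    }

    // Ids of all pairs where some word in the name starts with the fragment.
    public HashSet<int> Find(string fragment)
    {
        var node = _root;
        foreach (var ch in fragment.ToLowerInvariant())
            if (!node.Children.TryGetValue(ch, out node))
                return new HashSet<int>();
        return node.Ids;
    }
}

For several fragments, call Find for each one, copy the first set (new HashSet<int>(first)) and IntersectWith the rest, so the trie's internal sets stay untouched. Note the memory cost: an id is stored once per distinct prefix of each word in its name.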
You could try a full-text index on your current DB (if it's supported) and the CONTAINS keyword, like so (note that SQL Server requires a prefix term to be wrapped in double quotes for the * to act as a wildcard):

SELECT * FROM tableName WHERE CONTAINS(name, '"bla*"');

This looks for words starting with "bla" anywhere in the string, so it also matches a value like "Monkeys blabla".
I don't really understand your question, but if you want to query the table you can do so, since you already have the query string. You can try this:

var res = _repository.Table.Where(c => c.Name.StartsWith("bla")).Take(25);

If that doesn't help, try to restructure your question a little bit.
Is this a case of premature optimization?
How many users will be hitting this service simultaneously? How many will be hitting your database simultaneously? How efficient is your query? How much data will be returned across the wire?
In most cases, you can't outsmart an efficient database for performance. Your row count is too small to create a truly heavy burden on your application's runtime performance when querying. This assumes, of course, that your query is well written and that you're properly opening, closing, and freeing resources in a timely fashion.
Caching the data in memory has trade-offs that should be considered: it increases the memory footprint of your application and requires you to write and maintain additional code to keep that cache correct. That is by no means prohibitive, but it should be weighed against your overall architecture.
Consider these things carefully. From what I can tell, keeping this data in the database is fine. Deserialization tends to be fast (as most of the data you return is native types), and shouldn't be cost-prohibitive.
I'm rather new to Parse and Cloud Code, and I'm having trouble writing a certain query script.
I have a table of Salespeople, who have two integers: dailySold and dailyQuota.
The dailySold is reset to 0 each day, and the dailyQuota is defined by upper management.
Now, I'd like to make queries that fetch users in bulk. Say, all users whose dailySold is below their dailyQuota. In MySQL it would just look like this:
select * from salespeople where dailySold < dailyQuota
But in Parse / Cloud Code I have been unable to find something like this. Currently, I'm loading all the entries and going through them one by one, populating a large array client-side. This feels like absolutely the wrong way to do it.
And the query.WhereNotEqualTo() function (and its siblings) seem to only be able to compare a key with a static value.
Does anyone know how to put together a query that optimizes this? It needs to go through thousands of records, and it's often only 10-20 results I'm interested in. If nothing else, I'll have to write a Cloud Code function that iterates server-side for me, but I still feel like there should be some function I can use to make a leaner query.
You can't compare two columns in a query. You can only compare a key with a provided object. If the dailyQuota is set by upper management, I'm assuming this is the same for all salespeople, or for groups of people. I'd suggest first making a query for the daily quota and then either use
whereKey:matchesKey:inQuery
or just fetch the dailyQuota first and then use that value in the second query.
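A rough sketch of the second suggestion with the Parse .NET SDK; the QuotaSettings class, its dailyQuota field holding the one shared quota, and the Salespeople class name are assumptions for illustration:

// First query: fetch the shared quota object.
var settings = await ParseObject.GetQuery("QuotaSettings").FirstAsync();
int quota = settings.Get<int>("dailyQuota");

// Second query: let the server filter instead of iterating client-side.
var below = await ParseObject.GetQuery("Salespeople")
    .WhereLessThan("dailySold", quota)
    .Limit(20)
    .FindAsync();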
We have a big List of more than 1000 items with big classes (all of the same type).
Items are inserted into or deleted from the list very frequently, about 10, 20 or 30 at a time. For each item, I find the exact insert position using a binary search.
But I wonder whether it would be better to add all items to the end of the list and then sort it with List.Sort (I believe MS uses a quicksort algorithm), i.e. whether that would consume less CPU than the current approach.
I am using C#, .NET Framework 2.0.
There's rarely a general answer to questions like these. It depends very heavily on your scenario. But here's an intermediate suggestion between the two choices you bring up:
Sort the list of items to be inserted (this means sorting 10-30 items, based on your description), then insert them in order. Note that once you find the position to insert the first item, the position to insert the second item must be strictly after that location (and so on for each subsequent item), so you never need to search from the beginning again. Each search can be restricted to the part of the list after the previous insertion point, since the list keeps its ordering after every insertion (see the sketch below).
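A sketch of that approach; the Item type, the comparer, and the method name are placeholders, and List<T>.BinarySearch (available in .NET 2.0) is used to find each insertion point:

static void InsertBatch(List<Item> list, List<Item> batch, IComparer<Item> cmp)
{
    // Sort the small incoming batch first (10-30 items).
    batch.Sort(cmp);

    int start = 0;
    foreach (Item item in batch)
    {
        // Search only from the previous insertion point onward.
        // BinarySearch returns the bitwise complement of the
        // insertion index when the item is not found.
        int index = list.BinarySearch(start, list.Count - start, item, cmp);
        if (index < 0)
            index = ~index;

        list.Insert(index, item);
        start = index + 1; // later items can only land at or after this point
    }
}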
I wonder if anyone else has asked a similar question.
Basically, I have a huge tree I'm building up in RAM using LINQ objects, and then I dump it all in one go using DataContext.SubmitChanges().
It works, but I can't find a way to give the user some visual indication of how far the operation has progressed. If I could ultimately implement a progress bar, that would be great, even if there is a minimal loss in performance.
Note that I have quite a large number of rows to put into the DB, over 750,000.
I haven't timed it exactly, but it takes a long while to insert them.
Edit: I thought I'd better give some indication of what I'm doing.
Basically, I'm building a suffix tree from the Lord of the Rings. Thus, there are a lot of Nodes, and certain Nodes have positions associated to them (Nodes that happen to be at the end of a suffix). I am building the Linq objects along these lines.
suffixTreeDB.NodeObjs.InsertOnSubmit(new NodeObj()
{
    NodeID = 0,
    ParentID = 0,
    Path = "$"
});
After the suffix tree has been fully generated in RAM (which only takes a few seconds), I then call suffixTreeDB.SubmitChanges();
What I'm wondering is if there is any faster way of doing this. Thanks!
Edit 2: I've timed it with a stopwatch, and apparently it takes precisely 6 minutes for the DB to be written.
I suggest you divide the inserts into batches, as they are sent to the DB as separate statements anyway. This will also reduce the size of the transaction that LINQ to SQL opens when calling SubmitChanges().
If you divide them into 10 blocks of 75,000, you can provide a rough progress estimate on a 1/10 scale.
Update 1: After re-reading your post and your new comments, I think you should take a look at SqlBulkCopy instead. If you need to improve the time of the operation, that's the way to go. Check this related question/answer: What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
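For reference, a rough SqlBulkCopy sketch for the suffix-tree nodes; the destination table name, column names, and the nodes and connectionString variables are guesses based on the NodeObj snippet above. As a bonus, NotifyAfter gives natural ticks for the progress bar the question asks about:

using System;
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("NodeID", typeof(int));
table.Columns.Add("ParentID", typeof(int));
table.Columns.Add("Path", typeof(string));

foreach (var node in nodes) // nodes: the tree already built in RAM
    table.Rows.Add(node.NodeID, node.ParentID, node.Path);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "NodeObjs";
    bulk.BatchSize = 5000;   // commit in chunks rather than one huge batch
    bulk.NotifyAfter = 5000; // fire SqlRowsCopied every 5000 rows
    bulk.SqlRowsCopied += (sender, e) =>
        Console.WriteLine("{0} rows copied so far", e.RowsCopied);
    bulk.WriteToServer(table);
}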
I was able to get percentage progress for ctx.SubmitChanges() by using ctx.Log and an ActionTextWriter:
// ActionTextWriter is a small TextWriter that forwards each write to a
// delegate; it is described in the blog post linked below.
ctx.Log = new ActionTextWriter(s =>
{
    if (s.StartsWith("INSERT INTO"))
        insertsCount++;

    ReportProgress(insertsCount);
});
More details are available in my blog post:
http://epandzo.wordpress.com/2011/01/02/linq-to-sql-ctx-submitchanges-progress/
and this Stack Overflow question:
LINQ to SQL SubmitChanges() progress
This isn't ideal, but you could create another thread that periodically queries the table you're populating to count the number of records inserted so far. I'm not sure how, or whether, this will work if you are running inside a transaction, though, since there could be locking issues.
What I really think I need is a form of bulk insert; however, it appears that LINQ to SQL doesn't support it.