How to get Total document count from Firestore - c#

How do I get the total document count from Firestore in Unity C#?
My Firestore DB has a collection named "users". I want to know two things:
1. How do I get the total count of documents in the "users" collection in Unity C#?
2. How do I filter by school and get the name of each matching person in Unity C#?

For the count, you have at least 2 choices:
a) Retrieve all documents and count them. This is simple but costs as many reads as there are documents (not viable if you have many documents!)
b) Keep a counter in a separate document and increment/decrement it on each document creation/deletion. This costs some writes but only 1 read to get the count. It is a bit more complex to set up; just make sure the document creation/deletion and the increment/decrement are done in the same batch operation to avoid inconsistencies in case of errors.
For the filter, perform a simple query such as collection("users").where("school","==", "XXX").get(). That is the JavaScript form; in the Unity C# SDK the equivalent calls are Collection("users").WhereEqualTo("school", "XXX").GetSnapshotAsync().

Related

How can I deal with slow performance on Contains query in Entity Framework / MS-SQL?

I'm building a proof of concept data analysis app, using C# & Entity Framework. Part of this app is calculating TF*IDF scores, which means getting a count of documents that contain every word.
I have a SQL query (to a remote database with about 2,000 rows) wrapped in a foreach loop:
idf = db.globalsets.Count(t => t.text.Contains("myword"));
Depending on my dataset, this loop would run 50-1,000+ times for a single report. On a sample set where it only has to run about 50 times, it takes nearly a minute, so about 1 second per query. So I'll need much faster performance to continue.
Is 1 second per query slow for an MSSQL contains query on a remote machine?
What paths could be used to dramatically improve that? Should I look at upgrading the web host the database is on? Running the queries async? Running the queries ahead of time and storing the result in a table (I'm assuming a WHERE = query would be much faster than a CONTAINS query?)
You can do much better than full text search in this case, by making use of your local machine to store the idf scores, and writing back to the database once the calculation is complete. There aren't enough words in all the languages of the world for you to run out of RAM:
Create a dictionary Dictionary<string,int> documentFrequency
Load each document in the database in turn, and split into words, then apply stemming. Then, for each distinct stem in the document, add 1 to the value in the documentFrequency dictionary.
Once all documents are processed this way, write the document frequencies back to the database.
Calculating a tf-idf for a given term in a given document can now be done just by:
Loading the document.
Counting the number of instances of the term.
Loading the correct idf score from the idf table in the database.
Doing the tf-idf calculation.
This should be thousands of times faster than your original, and hundreds of times faster than full-text-search.
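The steps above can be sketched in C# as follows. This is a minimal illustration, not the answerer's code: the class and method names are hypothetical, lower-casing stands in for real stemming, the documents are passed in as plain strings rather than loaded from the database, and the idf is assumed to be log(N / df), which is one common definition that the answer does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names are illustrative only.
public static class TfIdfSketch
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    // Splits text into lower-cased words. Lower-casing is a stand-in for stemming.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // One pass over all documents: each distinct word in a document adds 1 to its count.
    public static Dictionary<string, int> BuildDocumentFrequency(IEnumerable<string> documents)
    {
        var documentFrequency = new Dictionary<string, int>();
        foreach (string document in documents)
        {
            foreach (string word in Tokenize(document).Distinct())
            {
                // count is 0 when the word has not been seen yet.
                documentFrequency.TryGetValue(word, out int count);
                documentFrequency[word] = count + 1;
            }
        }
        return documentFrequency;
    }

    // Assumed formula: tf * log(N / df), with tf as the raw occurrence count.
    public static double TfIdf(string term, string document,
        IReadOnlyDictionary<string, int> documentFrequency, int totalDocuments)
    {
        string key = term.ToLowerInvariant();
        int tf = Tokenize(document).Count(w => w == key);
        if (tf == 0) return 0.0;
        if (!documentFrequency.TryGetValue(key, out int df)) return 0.0;
        return tf * Math.Log((double)totalDocuments / df);
    }
}
```

The dictionary is built once for the whole corpus, so each later score needs only a lookup instead of a Contains query per word. In the setup described above, the dictionary contents would be written back to an idf table after the pass completes.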
As others have recommended, I think you should implement that query on the db side. Take a look at this article about SQL Server Full Text Search, that should be the way to solve your problem.
Applying a Contains query in a loop is an extremely bad idea. It kills performance and hammers the database. You should change your approach: I strongly suggest creating full-text search indexes and querying those. You can retrieve the texts of the matching records with your query strings.
select t.Id, t.SampleColumn from containstable(Student, SampleColumn, 'word or sampleword') C
inner join Student t ON C.[KEY] = t.Id
Perform just one query, combining the words you are searching for with operators (or, and, etc.), and retrieve the matched texts. Then you can calculate the TF-IDF scores in memory.
Streaming the texts from SQL Server into memory might still take a while, but it is a better option than running N Contains queries in a loop.

How do I find out if a DynamoDB table is empty?

How can I find out if a DynamoDB table contains any items using the .NET SDK?
One option is to do a Scan operation, and check the returned Count. But Scans can be costly for large tables and should be avoided.
The item count returned by DescribeTable is not a real-time value. It is updated approximately every six hours.
The best way is to issue a single Scan without any filter expression and check the returned count. This is not costly: you only need one Scan request, not a paginated scan of the entire table, to find out whether the table has any items, and setting Limit to 1 on the request keeps the read to a single item.
A single Scan request returns at most 1 MB of data.
If the use case requires a real-time value, this is the best and only option available.
Edit: While the below appears to work fine with small tables on localhost, the docs state
DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
so only use DescribeTable if you don't need an accurate, up to date figure.
Original:
It looks like the best way to do this is to use the DescribeTable method on AmazonDynamoDBClient:
AmazonDynamoDBClient client = ...
if (client.DescribeTable("FooTable").Table.ItemCount == 0)
{
    // do stuff
}

RavenDB indices chains

Is it possible to use an output of one index as an input for another?
Something like:
public class ChainedIndex: AbstractIndexCreationTask<InputIndex, InputIndexOutputType, ReduceResult>
{
//blahblahblah
}
Yes. You can now do this.
Enable the Scripted Index Results bundle
Write your first index, for example - a map/reduce index.
Write a script that writes the result back to another document.
Write a new index against those documents.
As changes to the original documents are indexed, the resulting changes get written to new documents, which then get indexed. Repeat if desired, just be careful not to create an endless loop.
This is a new feature for RavenDB 2.5. Oren describes it in this video at 21:36.

How to convert datagrid data to a tree structure, and can it improve search speed?

I am trying to solve a telephone business problem. There is a very large number of call records, each of which mainly contains "caller num", "called num", "start datetime", and "duration time".
The original data was stored in Excel, and I have imported it into an Infragistics UltraGrid control; call it grid_A.
What I want to do is this: for a given call record outside grid_A, search grid_A for a matching record. Here, matching means the two records have the same "caller num" and "called num", and similar "start datetime" and "duration time". For example, their "start datetime" values are 00:10:00 and 00:12:00, just 2 minutes apart.
Because this comparison will be done many times and my current search code is too slow, I am wondering whether I can convert the grid data into a tree structure.
The tree I imagine has this form:
The first level is "caller num"; assume 100 "caller num" nodes.
Below it is "called num"; assume each "caller num" has 1-10 "called num" children.
The last level is "start time" and "duration time".
What I want to know is whether this structure can be built in C#, how to do it, and whether it would improve my search speed much.
I look forward to your help.
It sounds like you are trying to do a search on the records displayed in your grid, although I will admit I had a difficult time parsing your language.
If this is the case, and since you only have four columns, I recommend creating an index (Dictionary<string, IEnumerable<object>>) over all of your records, keyed on the combination of caller num and called num (the value would be the row objects that match the given key). Then you can efficiently find match candidates and use LINQ to sort through the (ideally much smaller) subset of records to find those that also meet your remaining criteria.
In fact, thinking about it, this approach turns the problem into a map-reduce problem, where the map phase is the querying of the individual index entries to find ones that match. The LINQ queries can be executed in parallel to further reduce execution time.
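A minimal sketch of such an index is shown below, under a few assumptions: the CallRecord and CallIndex types are hypothetical stand-ins for the grid rows, the key is a (caller, called) value tuple rather than the single string suggested above, and the two tolerances are parameters because the question does not say how close "similar" has to be.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for one grid row.
public sealed class CallRecord
{
    public string Caller { get; set; }
    public string Called { get; set; }
    public DateTime Start { get; set; }
    public TimeSpan Duration { get; set; }
}

// Hypothetical index: groups records by (caller, called) for fast candidate lookup.
public sealed class CallIndex
{
    private readonly Dictionary<(string, string), List<CallRecord>> _byNumbers =
        new Dictionary<(string, string), List<CallRecord>>();

    public CallIndex(IEnumerable<CallRecord> records)
    {
        foreach (CallRecord record in records)
        {
            var key = (record.Caller, record.Called);
            if (!_byNumbers.TryGetValue(key, out List<CallRecord> bucket))
            {
                bucket = new List<CallRecord>();
                _byNumbers[key] = bucket;
            }
            bucket.Add(record);
        }
    }

    // Looks up the bucket with the same numbers, then filters it with LINQ
    // by the absolute difference in start time and in duration.
    public IEnumerable<CallRecord> FindSimilar(
        CallRecord probe, TimeSpan startTolerance, TimeSpan durationTolerance)
    {
        if (!_byNumbers.TryGetValue((probe.Caller, probe.Called), out List<CallRecord> bucket))
        {
            return Enumerable.Empty<CallRecord>();
        }
        return bucket.Where(r =>
            (r.Start - probe.Start).Duration() <= startTolerance &&
            (r.Duration - probe.Duration).Duration() <= durationTolerance);
    }
}
```

Building the index is a single pass over the grid rows. After that, each search touches only the records that share both numbers with the probe, which plays the same role as the first two levels of the tree described in the question.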

How do I change how Solr scores a document?

I'm using SolrNet to map Solr index documents and results to classes, with the server backing a desktop search application. What I need from Solr is to take a query string and return a list of documents with two details: the unique id of each document and its score.
But the score I want is not the score that Solr calculates by itself. I need a score that reflects only the frequency of that string in the document (in other words, the hit count in that document). How do I change how Solr scores documents so that the score for each document is either equal to or proportional to the hit count?
Have you looked at function queries? Specifically, termfreq may be helpful for you.
http://wiki.apache.org/solr/FunctionQuery#termfreq
You can sort by termfreq alone using http://solrurl/?q=myterm&sort=termfreq(text,'myterm') desc
