Is it possible to use an output of one index as an input for another?
Something like:
public class ChainedIndex : AbstractIndexCreationTask<InputIndex, InputIndexOutputType, ReduceResult>
{
    // blahblahblah
}
Yes. You can now do this.
Enable the Scripted Index Results bundle
Write your first index, for example - a map/reduce index.
Write a script that writes the result back to another document.
Write a new index against those documents.
As changes to the original documents are indexed, the resulting changes get written to new documents, which then get indexed. Repeat if desired, just be careful not to create an endless loop.
This is a new feature for RavenDB 2.5. Oren describes it in this video at 21:36.
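A hedged sketch of steps 2-4, assuming the bundle is enabled. The index, document names, and script below are illustrative assumptions, not from the original answer:

using System.Linq;
using Raven.Client.Indexes;

// Illustrative source document type.
public class Order
{
    public string Id { get; set; }
    public string CustomerId { get; set; }
    public decimal Amount { get; set; }
}

// Step 2: an ordinary map/reduce index.
public class Orders_TotalByCustomer : AbstractIndexCreationTask<Order, Orders_TotalByCustomer.Result>
{
    public class Result
    {
        public string CustomerId { get; set; }
        public decimal Total { get; set; }
    }

    public Orders_TotalByCustomer()
    {
        Map = orders => from o in orders
                        select new { o.CustomerId, Total = o.Amount };

        Reduce = results => from r in results
                            group r by r.CustomerId into g
                            select new { CustomerId = g.Key, Total = g.Sum(x => x.Total) };
    }
}

// Step 3: with the bundle enabled, a setup document (conventionally stored at
// Raven/ScriptedIndexResults/<index name>, e.g. Raven/ScriptedIndexResults/Orders/TotalByCustomer)
// carries an IndexScript that writes each reduce result back as its own document, roughly:
//   PutDocument("CustomerTotals/" + this.CustomerId, { Total: this.Total });
// Step 4: define another index over those CustomerTotals documents.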
How to get the total document count from Firestore in Unity C#?
The picture below shows my Firestore DB. I want to know two things.
How do I get the total count of documents in the "users" collection in Unity C#?
How do I filter based on the school and get the person's name in Unity C#?
You have at least 2 choices:
a) Either you retrieve all documents and count them. This is simple but will cost you as many reads as there are documents (not viable if you have many documents!).
b) You create a counter in an external document which you increment/decrement on each document creation/deletion. This costs you some writes but only one read to get the count. It is a bit more complex to set up; just make sure the document creation/deletion and the increment/decrement are done in the same batch operation to avoid inconsistencies in case of errors.
For the filtering, perform a simple query such as collection("users").where("school", "==", "XXX").get(), as sketched below.
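A hedged Unity C# sketch of both answers (option (a) for the count, plus the school filter). It uses the Firebase Unity SDK; the field names "school" and "name" and the value "XXX" come from the question, everything else is illustrative, and the exact API can differ between SDK versions:

using Firebase.Firestore;
using Firebase.Extensions;
using UnityEngine;

public class UserQueries : MonoBehaviour
{
    void Start()
    {
        FirebaseFirestore db = FirebaseFirestore.DefaultInstance;

        // 1) Total count by fetching the whole collection (option a: one read per document).
        db.Collection("users").GetSnapshotAsync().ContinueWithOnMainThread(task =>
        {
            QuerySnapshot snapshot = task.Result;
            Debug.Log("Total users: " + snapshot.Count);
        });

        // 2) Filter by school, then read each matching person's name.
        db.Collection("users").WhereEqualTo("school", "XXX").GetSnapshotAsync()
            .ContinueWithOnMainThread(task =>
            {
                foreach (DocumentSnapshot doc in task.Result.Documents)
                {
                    Debug.Log("Name: " + doc.GetValue<string>("name"));
                }
            });
    }
}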
My app will build an item list and grab the necessary data (e.g. prices, customer item codes) from an Excel file.
This reference Excel file has 650 rows and 7 columns.
The app will read 10-12 item rows in a single run.
Would it be wiser to read the file line item by line item?
Or should I first read all line items in the Excel file into a list/array and search from there?
Thank you
It's good to start by designing the classes that best represent the data regardless of where it comes from. Pretend that there is no Excel, SQL, etc.
If your data is always going to be relatively small (650 rows) then I would just read the whole thing into whatever data structure you create (your own classes.) Then you can query those for whatever data you want, like
var itemsIWant = allMyData.Where(item => item.Value == "something");
The reason is that it enables you to separate the query (selecting individual items) from the storage (whatever file or source the data comes from.) If you replace Excel with something else you won't have to rewrite other code. If you read it line by line then the code that selects items based on criteria is mingled with your Excel-reading code.
Keeping things separate enables you to more easily test parts of your code in isolation. You can confirm that one component correctly reads what's in Excel and converts it to your data. You can confirm that another component correctly executes a query to return the data you want (and it doesn't care where that data came from.)
With regard to optimization - you're going to be opening the file from disk and no matter what you'll have to read every row. That's where all the overhead is. Whether you read the whole thing at once and then query or check each row one at a time won't be a significant factor.
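A minimal sketch of that separation; the names (PriceListItem, IPriceListReader, the property names) are illustrative assumptions, not from the question:

using System.Collections.Generic;

// The shape of one row, designed without any reference to Excel.
public class PriceListItem
{
    public string CustomerItemCode { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

// The only thing the rest of the app knows about the data source.
// One implementation reads the Excel file; a test double or a future SQL reader can replace it.
public interface IPriceListReader
{
    IReadOnlyList<PriceListItem> ReadAll();
}

// Querying stays independent of where the data came from, e.g.:
//   var matches = reader.ReadAll().Where(i => i.CustomerItemCode == code).ToList();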
I have to extract data from a saved search and drop each column into a CSV file. This search routinely returns over 300 rows, and I need to parse each record into a separate CSV file (so 300+ CSV files need to be created).
With all the previous searches I have done this with, the number of columns required was small (fewer than 10) and the number of joins minimal to none, so efficiency wasn't a large concern.
I now have a project that has 42 fields in the saved search. The search is built off of a sales order and includes joins to the customer record and item records.
The search makes extensive use of custom fields as well as formulas.
What is the most efficient way for me to step through all of this?
I am thinking that the easiest method (and maybe the quickest) is to wrap it in a
foreach (TransactionSearchRow row in searchResult.searchRowList)
{
    using (var sw = System.IO.File.CreateText(path + filename))
    {
        ....
    }
}
block, but I want to try and avoid
if (customFieldRef is SelectCustomFieldRef)
{
    SelectCustomFieldRef selectCustomFieldRef = (SelectCustomFieldRef)customFieldRef;
    if (selectCustomFieldRef.internalId.Equals("custom_field_name"))
    {
        ....
    }
}
as I expect this code to become excessively long with this process. So any ideas are appreciated.
Using the NetSuite WSDL-generated API, there is no alternative to the nested type/name tests when reading custom fields. It just sucks and you have to live with it.
You could drop down to manual SOAP and parse the XML response yourself. That sounds like torture to me but with a few helper functions you could make the process of reading custom fields much more logical.
The other alternative would be to ditch SuiteTalk entirely and do your search in a SuiteScript RESTlet. The JavaScript API has much simpler, more direct access to custom fields than the SOAP API. You could do whatever amount of pre-processing you wanted on the server side before returning data (which could be JSON, XML, plain text, or even the final CSV) to the calling application.
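Within SuiteTalk, one small mitigation is to push the nested type/name tests into a single helper so the per-field code stays short. A hedged sketch, assuming the usual WSDL-generated proxy classes and properties (CustomFieldRef, StringCustomFieldRef, SelectCustomFieldRef, scriptId/internalId, value); adjust the names and the set of handled types to your endpoint version:

// Centralizes the type checks so callers just ask for a field's value by id.
static string GetCustomFieldValue(CustomFieldRef[] customFields, string fieldId)
{
    if (customFields == null) return null;

    foreach (CustomFieldRef field in customFields)
    {
        // Depending on endpoint version, custom fields are keyed by scriptId or internalId.
        bool matches = fieldId.Equals(field.scriptId) || fieldId.Equals(field.internalId);
        if (!matches) continue;

        if (field is StringCustomFieldRef s) return s.value;
        if (field is SelectCustomFieldRef sel) return sel.value != null ? sel.value.name : null;
        if (field is BooleanCustomFieldRef b) return b.value.ToString();
        if (field is LongCustomFieldRef l) return l.value.ToString();
        if (field is DoubleCustomFieldRef d) return d.value.ToString();

        return null; // add other custom field types as needed
    }
    return null;
}

// Usage while writing one CSV row (the field id is illustrative):
//   sw.Write(GetCustomFieldValue(someRecord.customFieldList, "custbody_my_field"));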
I'm taking over a project, so I'm still learning this. The project uses Lucene.NET. I also have no idea if this piece of functionality is correct or not. Anyway, I am instantiating:
var writer = new IndexWriter(directory, analyzer, false);
For specific documents, I'm calling:
writer.DeleteDocuments(new Term(...));
In the end, I'm calling the usual writer.Optimize(), writer.Commit(), and writer.Close().
The field in the Term object is a Guid, converted to a string (.ToString("D")), and is stored in the document using Field.Store.YES and Field.Index.NO.
However, with these settings, I cannot seem to delete these documents. The goal is to delete, then add the updated versions, so I'm getting duplicates of the same document. I can provide more code/explanation if needed. Any ideas? Thanks.
The field must be indexed. If a field is not indexed, its terms will not show up in enumeration.
I don't think there is anything wrong with how you are handling the writer.
It sounds as if the term you are passing to DeleteDocuments is not returning any documents. Have you tried to do a query using the same term to see if it returns any results?
Also, if your goal is to simply recreate the document, you can call UpdateDocument:
// Updates a document by first deleting the document(s) containing term and
// then adding the new document. The delete and then add are atomic as seen
// by a reader on the same index (flush may happen only after the add). NOTE:
// if this method hits an OutOfMemoryError you should immediately close the
// writer. See above for details.
You may also want to check out SimpleLucene (http://simplelucene.codeplex.com) - it makes it a bit easier to do basic Lucene tasks.
[Update]
Not sure how I missed it, but @Shashikant Kore is correct: you need to make sure the field is indexed, otherwise your term query will not return anything.
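A hedged sketch of the fix (Lucene.NET 3.x-style API, continuing the question's writer; the field name "id" and the myGuid variable are illustrative):

string idValue = myGuid.ToString("D");

var doc = new Document();
// NOT_ANALYZED keeps the Guid as a single exact term, so Term("id", idValue) can match it.
doc.Add(new Field("id", idValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
// ... add the rest of your fields here ...

// Deletes any existing document(s) with this id and adds the new one atomically.
writer.UpdateDocument(new Term("id", idValue), doc);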
Can anyone please explain the concept of map-reduce, particularly in Mongo?
I also use C# so any specifics in that area would also be useful.
One way to understand Map-Reduce coming from C# and LINQ is to think of it as a SelectMany() followed by a GroupBy() followed by an Aggregate() operation.
In a SelectMany() you are projecting a sequence, but each element can become multiple elements. This is equivalent to using multiple emit statements in your map operation. The map operation can also choose not to call emit, which is like having a Where() clause inside your SelectMany() operation.
In a GroupBy() you are collecting elements with the same key which is what Map-Reduce does with the key value that you emit from the map operation.
In the Aggregate() or reduce step you are taking the collections associated with each group key and combining them in some way to produce one result for each key. Often this combination is simply adding up a single '1' value output with each key from the map step but sometimes it's more complicated.
One important caveat with MongoDB's map-reduce is that the reduce operation must accept and output the same data type because it may be applied repeatedly to partial sets of the grouped data. If you are passed an array of values, don't simply take the length of it because it might be a partial result from an earlier reduce operation.
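A hedged C# sketch of that analogy, as a tag-count rollup; the BlogPost type and the data are illustrative, and this is plain LINQ, not MongoDB driver code:

using System.Collections.Generic;
using System.Linq;

public record BlogPost(string Title, List<string> Tags); // illustrative type

public static class MapReduceAnalogy
{
    // "map" = SelectMany (one emit per tag), grouping = GroupBy on the emitted key,
    // "reduce" = Aggregate over each group's emitted values.
    public static IEnumerable<(string Tag, int Count)> CountTags(IEnumerable<BlogPost> posts)
    {
        return posts
            .SelectMany(p => p.Tags)
            .GroupBy(tag => tag)
            .Select(g => (Tag: g.Key, Count: g.Aggregate(0, (sum, _) => sum + 1)));
    }
}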
Here's a spot to get started with Map Reduce in Mongo. The cookbook has a few examples; I would focus on these two.
I like to think of map-reduces in the context of "data warehousing jobs" or "rollups". You're basically taking detailed data and "rolling up" a smaller version of that data.
In SQL you would normally do this with sum() and avg() and group by. In MongoDB you would do this with a Map Reduce. The basic premise of a Map Reduce is that you have two functions.
The first function (map) is basically a giant for loop that runs over your data and "emits" certain keys and values. The second function (reduce) is a giant loop over all of the emitted data. The map says "hey, this is the data you want to summarize" and the reduce says "hey, this array of values reduces to this single value".
The output from a map-reduce can come in many forms (typically flat files). In MongoDB, the output is actually a new collection.
C# Specifics
In MongoDB, all of the Map Reduces are performed inside of the JavaScript engine, so both the map and reduce functions are written in JavaScript. The various drivers will allow you to build the JavaScript and issue the command; however, this is not how I normally do it.
The preferred method for running Map Reduce jobs is to put the JS into a file and then run mongo map_reduce.js. Generally you'll do this on the server somewhere as a cron job or a scheduled task.
Why?
Well, map reduce is not "real-time", especially with a big data set. It's really designed to be used in a batch fashion. Don't get me wrong, you can call it from your code, but generally you don't want users to initiate map reduce jobs. Instead you want those jobs to be scheduled, and you want users to be querying the results :)
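That said, if you do want to issue it from code, here is a hedged sketch using the 2.x C# driver; the collection, field names, and output collection are illustrative assumptions, and the exact options API varies by driver version:

using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var orders = client.GetDatabase("shop").GetCollection<BsonDocument>("orders");

// The map and reduce functions are still JavaScript, passed as strings;
// the driver only builds and issues the command.
var map = new BsonJavaScript("function () { emit(this.customerId, this.amount); }");
var reduce = new BsonJavaScript("function (key, values) { return Array.sum(values); }");

var options = new MapReduceOptions<BsonDocument, BsonDocument>
{
    // Write the rollup to a new collection instead of returning it inline.
    OutputOptions = MapReduceOutputOptions.Replace("order_totals")
};

using var cursor = orders.MapReduce(map, reduce, options);
foreach (var doc in cursor.ToList())
{
    Console.WriteLine(doc);
}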
Map Reduce is a way to process data where you have a map stage/function that identifies all the data to be processed and processes it, row by row.
Then you have a reduce step/function that can be run multiple times, for example once per server in a cluster and then once more on the client to return a final result.
Here is a Wiki article describing it in more detail:
http://en.wikipedia.org/wiki/MapReduce
And here is the documentation for MongoDB for Mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
Simple example: find the longest string in a list.
The map step will loop over the list calculating the length of each string; the reduce step will loop over the results from map and keep the longest one.
This can of course be much more complex, but that's the essence of it.
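A hedged plain-C# sketch of that longest-string example, just to make the map/reduce split concrete (no MongoDB involved; the word list is illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

var words = new List<string> { "map", "reduce", "aggregate", "rollup" };

// Map: project each element to a value/length pair.
var mapped = words.Select(w => new { Word = w, Length = w.Length });

// Reduce: combine pairs two at a time, keeping the longer one. Because it only ever
// compares partial results, it could run once per server and once more over those
// partial winners, as described above.
var longest = mapped.Aggregate((a, b) => a.Length >= b.Length ? a : b);

Console.WriteLine(longest.Word); // "aggregate"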