Elasticsearch NEST Indices and Indexing - C#

My problem is that I have a List of items and want to index them with Elasticsearch. I have a running Elasticsearch instance, and this instance has an index called "default".
So I'm running following code:
var items = GetAListOfItem();
var response = Client.IndexMany(items);
I also tried Client.IndexManyAsync(items), but that didn't change anything.
Only one item of the list gets indexed, nothing more. I believe it is the last item that gets indexed.
I thought it could be an issue with IEnumerable and multiple enumeration, but I passed it as a List<Item>.
Another question is about best practice with Elasticsearch: is it common to use one index per model? So if I'm gathering data from, for example, Exchange and another system, would I create two indices?
ExchangeIndex
OtherSystemIndex
Thank you for your help.
Update: I saw that Client.Index performs all those calls successfully, but all the objects got the same ID from NEST. Shouldn't it generate a unique ID for each document by itself?
Update 2: I fixed the indexing problem. I had set an empty ID field on the objects.
But I still have the question about best practice with Elasticsearch.

If you upload all the documents with the same id, Elasticsearch will not increment the id; it will keep updating the document with that id and you will end up with only one record. So either upload the data without an id (Elasticsearch will generate one) or give each record a unique id so it can be identified.
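A minimal sketch of both options with NEST, assuming a simple Item class and an index called "default"; the class, property names, and the exact response properties may differ slightly between NEST versions:
// Assumption: a simple POCO; if Id is left null, Elasticsearch generates its own id.
public class Item
{
    public string Id { get; set; }
    public string Name { get; set; }
}

var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultIndex("default");
var client = new ElasticClient(settings);

// Option 1: leave Id null and let Elasticsearch assign ids.
// Option 2: assign a unique id per item, e.g. a GUID.
foreach (var item in items)
    item.Id = Guid.NewGuid().ToString();

var response = client.IndexMany(items);
if (response.Errors)
{
    // Inspect the items that failed instead of silently losing them.
    foreach (var itemWithError in response.ItemsWithErrors)
        Console.WriteLine("Failed to index " + itemWithError.Id + ": " + itemWithError.Error);
}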
The other common problem is that your records do not match the mapping you defined for the index.
About the other question: an index stores whatever information is relevant for you, even if that content comes from several models. The only thing to avoid is mixing unrelated information; if you have an index for server logs, for example, don't mix user activity data into it.
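If you do decide to keep the two data sources apart, a small sketch of indexing into separate indices (the index names here are just placeholders):
// Index each data source into its own index; names are illustrative.
var exchangeResponse = client.IndexMany(exchangeItems, "exchangeindex");
var otherResponse = client.IndexMany(otherSystemItems, "othersystemindex");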

Related

C#-Replacing Sharepoint list data nightly

I have a Sharepoint list on a site that I want to update nightly from a SQL server DB, preferably using C#. Here is the catch, I do not know if any records were removed, added, or if any field in any record has been updated. I would believe then the simplest thing to do is remove the data from the list and then replace it with the new list data. But is there any simple way to do this? I would hate to remove 3000+ items line by line from the list and then add the 3000+ records one at a time.
It depends on your environment. If there is not much load on the systems at night, I would prefer one of the following approaches:
1) Build a timer job that deletes the list (not the items one by one, because that is slow), recreates it, and imports the items from the DB. With 3,000 - 5,000 elements this is not that much data, and I think it would finish in under 10 minutes.
2) Loop through the SharePoint list items and check field by field whether each was updated in the DB; if so, update it.
I would prefer to delete the list and import the complete table, because we are not talking about that much data.
Another way, which is a good option, is to use BCS or BDC. Then the data would always be in place and synced with the DB. Look at:
https://msdn.microsoft.com/en-us/library/office/jj163782.aspx
https://msdn.microsoft.com/de-de/library/ee231515(v=vs.110).aspx
Unfortunately there is no "easy" or elegant way to delete all the items in a list, like a DELETE statement in SQL. You can delete the entire list and recreate it, if the list can easily be created from a list definition. Alternatively, if your concern is performance, since SharePoint 2007 the SPWeb class has a method called ProcessBatchData. You can use it to batch-process commands and avoid the performance penalty of issuing 6,000 separate commands to the server. However, it still requires you to pass an ugly XML string that lists all the items to be deleted or added.
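A rough sketch of the batch-delete approach with the server object model; the batch XML layout shown here is the commonly used one, but treat it as an assumption to verify against your SharePoint version:
// web: the SPWeb that contains the list (e.g. from SPContext or a new SPSite/SPWeb).
// "MyList" is a hypothetical list name.
SPList list = web.Lists["MyList"];
var sb = new System.Text.StringBuilder();
sb.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch OnError=\"Continue\">");
foreach (SPListItem item in list.Items)
{
    // One Method element per item; the delete happens only when ProcessBatchData runs.
    sb.AppendFormat(
        "<Method ID=\"{0}\"><SetList>{1}</SetList>" +
        "<SetVar Name=\"Cmd\">Delete</SetVar>" +
        "<SetVar Name=\"ID\">{0}</SetVar></Method>",
        item.ID, list.ID);
}
sb.Append("</Batch>");
web.ProcessBatchData(sb.ToString());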
The ideal way is to enumerate all the rows from the database and check whether each row already exists in the SharePoint list, using a primary field value. If it already exists, simply update it[1]. Otherwise, add a new item.
[1] - Optionally, while updating, compare the list item field values with the database column values and only perform the update if any field has actually changed; otherwise skip it.
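A minimal sketch of that upsert loop with the server object model; the field names ("BusinessKey", "Title") and the dbTable data source are assumptions for illustration:
// Assumption: each database row carries a unique business key stored in a
// SharePoint column named "BusinessKey".
foreach (DataRow row in dbTable.Rows)
{
    string key = row["BusinessKey"].ToString();

    SPQuery query = new SPQuery
    {
        Query = "<Where><Eq><FieldRef Name='BusinessKey'/>" +
                "<Value Type='Text'>" + key + "</Value></Eq></Where>",
        RowLimit = 1
    };
    SPListItemCollection matches = list.GetItems(query);

    SPListItem item;
    if (matches.Count > 0)
    {
        item = matches[0];
        // Only write when something actually changed, to avoid needless versions.
        if (Equals(item["Title"], row["Title"]))
            continue;
    }
    else
    {
        item = list.Items.Add();
        item["BusinessKey"] = key;
    }
    item["Title"] = row["Title"];
    item.Update();
}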

Retrieve random DB record

What is the best way to retrieve "X" random records using Entity Framework (EF5 if it's relevant)? The value of "X" will be set based on where this will be used.
Is there a method for doing this built into EF, or is it best to pull down a result set and then use a C# random number function to pick the records? Or is there an approach that I'm not thinking of?
On the off chance that it's relevant: I have a table that stores images that I use for different purposes (there is a FK to an image type table). The images I use in the carousel on the homepage are what I want to add some variety to, so how "random" it is doesn't matter much to me. I'm just trying to get away from the same six or so pictures always being displayed. (Also, I'm not really interested in debating/discussing storing images in a table vs. local storage.)
The solution needs to be one using EF via a LINQ statement. If this isn't directly possible, I may end up doing something similar to what #cmd recommended in the comments. This would most likely be a matter of retrieving a record count, testing the PK to make sure the resulting object isn't null, and building a list of X PKs to pass to the front end. The carousel lazy-loads the images, so I don't actually need the image itself when building the list that the carousel will use.
Can you just add an ORDER BY RAND() clause to your query?
See this related question: MySQL: Alternatives to ORDER BY RAND()
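Since the question asks for an EF/LINQ approach, here is a sketch of two common options, assuming a context with an Images DbSet (the set, the ImageTypeId filter, and x are placeholders). Ordering by Guid.NewGuid() translates to NEWID() on SQL Server in many EF versions, but verify that against EF5; the count-and-skip variant is the safer fallback:
// Option 1: random ordering pushed to the database (verify the translation on EF5).
var randomImages = context.Images
    .Where(i => i.ImageTypeId == carouselTypeId)   // hypothetical filter
    .OrderBy(i => Guid.NewGuid())
    .Take(x)
    .ToList();

// Option 2: pick random offsets; each Skip/Take is a small, translatable query.
var rng = new Random();
int count = context.Images.Count(i => i.ImageTypeId == carouselTypeId);
var picked = new List<Image>();
for (int n = 0; n < x && count > 0; n++)
{
    int offset = rng.Next(count);
    var image = context.Images
        .Where(i => i.ImageTypeId == carouselTypeId)
        .OrderBy(i => i.Id)                        // Skip requires an ordering
        .Skip(offset)
        .Take(1)
        .FirstOrDefault();
    if (image != null && !picked.Contains(image))
        picked.Add(image);
}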

storing dataset of entire table and doing query on copy then updating GridView with results of query

I'm new to n-tier enterprise development. I got quite a tutorial just reading through the 'questions that may already have your answer', but didn't find what I was looking for. I'm building a genealogy site that starts off with the first guy that came over on the boat; you click on his name and the grid gets populated with all his children, then you click on one of his kids that has kids and the grid gets populated with that person's kids, and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records that match the ParentID, which returns all the kids. The data is never changed (at least by the user), so I want to do a single database access, fill all fields into one DataTable, and then requery it each time to get the records to display. In the DAL I put all the records into a List, and in the ObjectDataSource the function that fills the GridView just returns the List of all entries. My code is in 3 files here
(I can't get the backticks to show my code in this window.) All I need is to figure out how to run a new query against the existing DataTable and copy the result to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just run a new query against the database each time, and less resource intensive (if the database gets too large in the future) than storing it all in memory, but I just want to know if I can do it this way - that is, working from one copy of the entire table.] Any ideas...
Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all the data in one go can be done with a complex stored procedure.
You are already considering performance, and that's always a good thing to keep in mind when coming up with a design. But building something first, improving it, and only then starting to optimize seems a better way to go.
Since relational databases are not really good at hierarchical data, consider a NoSQL (graph) database. As you mentioned there are almost no writes to the DB, which is where NoSQL shines.
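For the direct question of requerying the in-memory copy: DataTable.Select with a filter on ParentID, plus CopyToDataTable, does exactly that. A minimal sketch, assuming the full table is already loaded and the column is named ParentID (CopyToDataTable requires a reference to System.Data.DataSetExtensions):
// allPeople: the single DataTable loaded once from the database.
// Returns a new DataTable containing only the children of the chosen person.
DataTable GetChildren(DataTable allPeople, int parentId)
{
    DataRow[] rows = allPeople.Select("ParentID = " + parentId);
    // CopyToDataTable needs at least one row; fall back to an empty clone.
    return rows.Length > 0 ? rows.CopyToDataTable() : allPeople.Clone();
}

// Usage when a person is clicked:
// grid.DataSource = GetChildren(allPeople, selectedId);
// grid.DataBind();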

Generating unique ID for clients on a system running over a LAN in C#

I have a simple client registration system that runs over a network. The system is supposed to generate a unique three-digit ID (primary key) with the current year concatenated (e.g. 001-2013). However, I've encountered the problem that the same primary key gets generated when two users on different computers (over the LAN) try to register different clients.
What if the user cancels the registration after an ID has already been generated? I have to reuse that ID for another client. I've read about static variables, but that didn't solve my problem. I'd really appreciate your ideas.
Unique and sequential IDs are hard to implement. To achieve it completely, you would have to serialize the commits that create client information, so that an ID is generated only when the data is actually stored; otherwise you'll end up with holes whenever something goes wrong during submission.
If you don't need strictly sequential numbers, giving out ranges of IDs (1-22, 23-44, ...) to each system is a common approach. Instead of ranges you can hand out lists of IDs to use ({1,3,233,234}, {235,236,237}) if you need to use as many IDs as possible.
Issue:
New item -001 is created, but not saved yet
New item -002 is created, but not saved yet
Item -001 is cancelled
What to do with ID -001?
The easiest solution is to simply not assign an ID until an item is definitely stored.
An alternative is to look up the first free ID when finally saving an item. If the item from step 2 (#2) is saved before the one from step 1, #2 gets ID -001. When #1 is then saved, the saving logic sees that its claimed ID (-001) is already in use, so it assigns -002. So IDs get reassigned.
Finally, you can simply find the next free ID when creating a new item. In the three steps described above, this means you initially have a gap where -001 is supposed to be. If you now create a new item, your code will see that -001 is unused and will assign it to the new item.
But, and this totally depends on requirements you didn't specify, -001 is now created later in time than -002, and I don't know whether that is allowed. Furthermore, at any given moment you can have a gap in your numbering where an item has been cancelled; if that happens at the end of a reporting period, this will cause errors (-033, -034, -036).
You also might want to include an auto-incrementing primary key instead of this invoice number or whatever it is.
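A rough sketch of the "assign the ID only at save time" idea against SQL Server, using a serializable transaction so two machines on the LAN cannot claim the same number; the table and column names (Clients, ClientId, Name) are assumptions:
using System;
using System.Data;
using System.Data.SqlClient;

// Computes "NNN-YYYY" and inserts the client in one transaction.
string RegisterClient(string connectionString, string name)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction(IsolationLevel.Serializable))
        {
            int year = DateTime.Now.Year;

            // Find the next sequence number for the current year while holding a range lock.
            var next = new SqlCommand(
                "SELECT ISNULL(MAX(CAST(LEFT(ClientId, 3) AS int)), 0) + 1 " +
                "FROM Clients WITH (UPDLOCK, HOLDLOCK) " +
                "WHERE ClientId LIKE '%-' + @year",
                conn, tx);
            next.Parameters.AddWithValue("@year", year.ToString());
            int sequence = (int)next.ExecuteScalar();

            string clientId = sequence.ToString("D3") + "-" + year;  // e.g. 001-2013

            var insert = new SqlCommand(
                "INSERT INTO Clients (ClientId, Name) VALUES (@id, @name)", conn, tx);
            insert.Parameters.AddWithValue("@id", clientId);
            insert.Parameters.AddWithValue("@name", name);
            insert.ExecuteNonQuery();

            tx.Commit();
            return clientId;
        }
    }
}
Because nothing is reserved until the insert commits, a cancelled registration never leaves a hole, which addresses the "reuse the ID" concern from the question.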

How do I determine if a record already exists in a DataTable?

I have a DataTable that I am binding to a GridView on my ASP.NET page. I also allow editing and insertion.
Upon saving/insertion, I need to determine whether there is a duplicate description in the GridView.
How can I accomplish this?
In any case, the data you are binding will have a unique id.
So after binding, check whether that id already exists in the DataTable. We can't say more than this unless you explain the scenario further.
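A minimal sketch of that in-memory check, assuming the column is called "Description" (swap in your real column name, or the id column, as appropriate); AsEnumerable and Field<T> require System.Linq and System.Data.DataSetExtensions:
// True if any existing row already carries the same description.
bool IsDuplicate(DataTable table, string description)
{
    return table.AsEnumerable()
                .Any(r => string.Equals(r.Field<string>("Description"),
                                        description,
                                        StringComparison.OrdinalIgnoreCase));
}

// Usage before inserting the new row:
// if (IsDuplicate(myTable, newDescription)) { /* show a validation message */ }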
We may need some more information on what kind of database you are using to give you the right answer, but I'll take a swing anyway.
First, you need a PRIMARY KEY on your database table for several reasons, including a default index and ensuring uniqueness. Second, you can configure the table to have a UNIQUE INDEX on the description column. This will prevent the insertion of duplicate data at the database level. But once you do that, you will likely get some kind of exception or error in your client application that you will need to catch and handle.
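If you go the UNIQUE INDEX route, the duplicate shows up as a SqlException on the insert. A sketch of handling it (error numbers 2601 and 2627 are SQL Server's duplicate-key codes; insertCommand and errorLabel are placeholders):
try
{
    insertCommand.ExecuteNonQuery();   // the insert that may violate the unique index
}
catch (SqlException ex)
{
    if (ex.Number == 2601 || ex.Number == 2627)
    {
        // Duplicate key / unique index violation: surface it as a validation
        // message instead of letting the page error out.
        errorLabel.Text = "That description already exists.";
    }
    else
    {
        throw;
    }
}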
Also, you could create an AJAX function to filter the data as the user types in the new row and show them records that are similar. I did this on an app where the users would put in the same request but use slightly different wording.
