As part of a project I'm working on, I need to implement a cache-like mechanism.
My cache is composed of 3 main things: a Dictionary, a List, and a File.
The Dictionary holds the last 1000 items,
the List holds as much as it can within the memory it is given,
and the File holds all items.
When I search for a range of items, the lookup order is:
Dictionary -> List -> File.
Searching the File should account for roughly 0% of lookups, since efficiency is required.
I'm trying to come up with a proper function to search those last items (usually they are the ones I need to extract).
For example, a caller can request items 1 to 1200 (searching the Dictionary and then the List), and so on.
What is the best way to do this kind of search?
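For illustration, a minimal sketch of the Dictionary -> List -> File order described above; the Item type, the integer IDs, and ReadFromFile are assumptions, not part of the original design:

using System.Collections.Generic;

class Item { public int Id; /* payload fields */ }

class TieredCache
{
    private readonly Dictionary<int, Item> recent = new Dictionary<int, Item>(); // last 1000 items
    private readonly List<Item> inMemory = new List<Item>();                     // whatever fits in memory

    public IEnumerable<Item> GetRange(int from, int to)
    {
        for (int id = from; id <= to; id++)
        {
            Item item;
            if (recent.TryGetValue(id, out item))                       // 1) Dictionary
                yield return item;
            else if ((item = inMemory.Find(x => x.Id == id)) != null)   // 2) List
                yield return item;
            else
                yield return ReadFromFile(id);                          // 3) File (should be rare)
        }
    }

    private Item ReadFromFile(int id)
    {
        // hypothetical: read the item from the backing file
        return null;
    }
}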
I have a program that creates a list of objects from a file, and also creates a list of the same type of object from the database, but with fewer (and some different) properties populated, like:
List from FILE: Address ID, Address, City, State, Zip, other important properties
List from DB: Address ID, Address, City, State
I have implemented IEquatable on this CustObj so that it only compares against Address, City, and State, in the hopes of doing easy comparisons between the two lists.
The ultimate goal is to get the address ID from the database and update the address IDs for each address in the list of objects from the file. These two lists could have quite a lot of objects (over 1,000,000) so I want it to be fast.
The alternative is to offload this to the database and have the DB return the info we need. If that would be significantly faster/more resource efficient, I will go that route, but I want to see if it can be done quickly and efficiently in code first.
Anyways, I see there's a Zip method. I was wondering if I could use that to say "if there's a match between the two lists, keep the data in list 1 but update the address id property of each object in list 1 to the address Id from list 2".
Is that possible?
The answer is, it really depends. There are a lot of parameters you haven't mentioned.
The only way to be sure is to build one solution (preferably using the Zip method, since it involves less work), and if it works within the parameters of your requirements (time, memory footprint, or whatever else matters), you can stop there.
Otherwise you have to offload it to the database. Mind you, you would have to hold the 1 million records from the file and the 1 million records from the database in memory at the same time if you want to use the Zip method.
The problem with pushing everything to the database is that inserting that many records is resource-consuming (time, space, etc.). Moreover, if you want to do that every day, it becomes even more demanding, resource-wise.
Your question didn't say whether this is going to be a one-time thing or a daily event in a production environment. Even that will make a difference in which approach to choose.
To repeat, you would have to try different approaches to see which works best for you based on your requirements: is this a one-time thing? How many resources does the process have? How much time does it have? And possibly many more.
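For what it's worth, Zip pairs elements strictly by position, so it only lines up correctly if both lists are sorted identically. A sketch of an order-independent match using a Dictionary keyed on the shared fields; the property names come from your question, but fileList, dbList, and the exact CustObj shape are assumptions:

// requires System.Linq and C# 7 value tuples
var dbLookup = dbList.ToDictionary(
    d => (d.Address, d.City, d.State),       // key on the fields both lists share
    d => d.AddressId);

foreach (var rec in fileList)
{
    if (dbLookup.TryGetValue((rec.Address, rec.City, rec.State), out var id))
        rec.AddressId = id;                  // copy the DB id onto the file record
}

Building the dictionary and probing it are each a single pass, so this scales roughly linearly with the list sizes, memory permitting.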
It also kind of sounds like a job for .Aggregate(), e.g.
var aggreg = list1.Aggregate(otherListPrefilled, (acc, elemFrom1) =>
{
    // some code to create the joined data, using elemFrom1 to find
    // and modify the correct element in otherListPrefilled, e.g. via
    // the IEquatable match: acc.Find(x => x.Equals(elemFrom1))
    return acc;
});
Normally I would use an empty otherListPrefilled; I'm not sure how it performs on 100k data items, though.
If it's a one-time thing, it's probably faster to dump your file to a CSV, import it into your database as a temporary table, and join the data in SQL.
I have a SharePoint list on a site that I want to update nightly from a SQL Server DB, preferably using C#. Here is the catch: I do not know if any records were removed, added, or if any field in any record has been updated. So the simplest thing to do seems to be to remove the data from the list and then replace it with the new list data. But is there any simple way to do this? I would hate to remove 3000+ items from the list line by line and then add the 3000+ records one at a time.
It depends on your environment. If there is not much load on the systems at night, I would prefer one of the following ways:
1) Build a timer job, delete the list (not the items one by one, because that is slow), recreate the list, and import the items from the DB. When we are talking about 3,000-5,000 elements, that is not much, and I think it can be done in under 10 minutes.
2) Loop through the SharePoint list items and check field by field whether each one was updated in the DB; if yes, update it.
I would prefer to delete the list and import the complete table, because we are not talking about that much data.
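A rough sketch of option 1 with the server object model (Microsoft.SharePoint, plus System.Data for the DataRow); the list name, the field mapping, and dbTable are placeholders, and Lists.TryGetList needs SP 2010 or later:

using (SPSite site = new SPSite("http://server/site"))
using (SPWeb web = site.OpenWeb())
{
    SPList oldList = web.Lists.TryGetList("NightlyData");
    if (oldList != null)
        oldList.Delete();                                   // drop the whole list, not item by item

    Guid newId = web.Lists.Add("NightlyData", "Nightly import from SQL", SPListTemplateType.GenericList);
    SPList newList = web.Lists[newId];

    foreach (DataRow row in dbTable.Rows)                   // dbTable: rows read from the SQL DB
    {
        SPListItem item = newList.Items.Add();
        item["Title"] = row["Title"].ToString();            // map your DB columns to list fields here
        item.Update();
    }
}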
Another way, which is a good idea, is to use BCS or BDC. Then you would always have the data in place and synced with the DB. Have a look at:
https://msdn.microsoft.com/en-us/library/office/jj163782.aspx
https://msdn.microsoft.com/de-de/library/ee231515(v=vs.110).aspx
Unfortunately there is no "easy" or elegant way to delete all the items in a list, like a DELETE statement in SQL. You can delete the entire list and recreate it, if the list can easily be created from a list definition. Alternatively, if your concern is performance, since SP 2007 the SPWeb class has had a method called ProcessBatchData. You can use it to batch-process commands and avoid the performance penalty of issuing 6000 separate commands to the server. However, it still requires you to pass an ugly XML document that lists all the items to be deleted or added.
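A hedged sketch of that batch approach; the Method/SetList/SetVar Cmd=Delete layout below is the commonly used batch schema for ProcessBatchData, so verify it against your farm's documentation:

static void DeleteAllItems(SPWeb web, SPList list)
{
    var sb = new System.Text.StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>");
    foreach (SPListItem item in list.Items)
    {
        sb.AppendFormat(
            "<Method><SetList Scope=\"Request\">{0}</SetList>" +
            "<SetVar Name=\"ID\">{1}</SetVar>" +
            "<SetVar Name=\"Cmd\">Delete</SetVar></Method>",
            list.ID, item.ID);
    }
    sb.Append("</Batch>");
    web.ProcessBatchData(sb.ToString());   // one round trip instead of thousands of deletes
}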
The ideal way is to enumerate all the rows from the database and check whether each row already exists in the SharePoint list using a primary field value. If it already exists, simply update it[1]. Otherwise, add a new item.
[1] - Optionally, while updating, compare the list item field values with the database column values and update only if any field has changed; otherwise skip it.
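Very roughly, the per-row upsert could look like the following; "KeyField", "City", keyValue, and row are placeholders for whatever identifies and fills a row in your data:

SPQuery query = new SPQuery
{
    Query = "<Where><Eq><FieldRef Name='KeyField'/><Value Type='Text'>" + keyValue + "</Value></Eq></Where>",
    RowLimit = 1
};
SPListItemCollection matches = list.GetItems(query);
if (matches.Count > 0)
{
    SPListItem item = matches[0];
    // [1]: only write and call Update when a field actually differs
    if ((string)item["City"] != row["City"].ToString())
    {
        item["City"] = row["City"].ToString();
        item.Update();
    }
}
else
{
    SPListItem item = list.Items.Add();
    item["KeyField"] = keyValue;
    item["City"] = row["City"].ToString();
    item.Update();
}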
We have a big List of more than 1000 items of a large class (all of the same type).
Items are inserted into or deleted from the list very frequently, about 10 to 30 items at a time. For each item, I find the exact position to insert it at using a quick search algorithm.
But I wonder: if I instead add every item to the end of the list and then sort it using List.Sort (I believe Microsoft uses a quicksort algorithm), would that be better, i.e. consume less CPU than the current approach?
I am using C#, .Net Framework 2.0.
There's rarely a general answer to questions like these. It depends very heavily on your scenario. But here's an intermediate suggestion between the two choices you bring up:
Sort the list of items to be inserted (this means sorting 10-30 items, based on your description). Then insert them in order. Note that once you find the position to insert the first item, the position to insert the second item must be strictly after that location (and so on for each subsequent item), so you don't need to search from the beginning again. The list being inserted into only needs to be searched forward from the previous insertion point, since it maintains its ordering after each insertion.
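A small sketch of that suggestion, assuming a List<int> kept in ascending order; for your real objects you would pass an IComparer<T> to Sort and BinarySearch instead of null:

static void InsertBatch(List<int> target, List<int> batch)
{
    batch.Sort();                                   // sort the 10-30 incoming items first
    int start = 0;                                  // never search before the last insertion point
    foreach (int item in batch)
    {
        int idx = target.BinarySearch(start, target.Count - start, item, null);
        if (idx < 0)
            idx = ~idx;                             // BinarySearch returns the complement of the insertion point
        target.Insert(idx, item);
        start = idx + 1;                            // later items can only go at or after this spot
    }
}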
I have approximately 10,000 records. Each record has 2 fields: one field is a string up to 300 characters long and the other is a decimal value. This is like a product catalog with product names and the price of each product.
What I need to do is allow the user to type any word and display all products containing that word together with their prices in a listbox. That's all.
What type of collection is best for this scenario?
If I need to sort based on either product name or price, will the choice still be the same?
Right now I am using an XML file, but I thought using a collection would be simpler, since I could embed all the values in the code. Thanks for your suggestions.
A Dictionary will do the job. However, if you are doing rapid partial matches (e.g. search as the user types) you may get better performance by creating multiple keys which point to the same item. For example, the word "Apple" could be located with "Ap", "App", "Appl", and "Apple".
I have used this approach on a similar number of records with very good results. I have turned my 10K source items into about 50K unique keys. Each of these Dictionary entries points to a list containing references to all matches for that term. You can then search this much smaller list more efficiently. Despite the large number of lists this creates, the memory footprint is quite reasonable.
You can also make up your own keys if desired, to redirect common misspellings or point to related items. This also eliminates most of the issues with unique keys, because each key points to a list. A single item may be classified by each of the words in its name; this is extremely useful if you have long product names with multiple words in them. When classifying your items, each word in the name can be mapped to one or more keys.
I should also point out that building and classifying 10K items shouldn't take long if done correctly (a couple hundred milliseconds is reasonable). The results can be cached for as long as you want using Application, Cache, or static members.
To summarize, the resulting structure is a Dictionary<string, List<T>> where the string is a short (2-6 characters works well) but unique key. Each key points to a List<T> (or other collection, if you are so inclined) of items which match that key. When a search is performed, you locate the key which matches the term provided by the user. Depending on the length of your keys, you may truncate the user's search to your maximum key length. After locating the correct child collection, you then search that collection for a complete or partial match using whatever methodology you wish.
Lastly, you may wish to create a lightweight structure for each item in the list so that you can store additional information about the item. For example, you might create a small Product class which stores the name, price, department, and popularity of the product. This can help you refine the results you show to the user.
All-in-all, you can perform intelligent, detailed, fuzzy searches in real-time.
The aforementioned structures should provide functionality roughly equivalent to a trie.
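A condensed sketch of that structure; the Product class, the key length of 6, and the word-splitting rules are illustrative choices, not requirements:

using System;
using System.Collections.Generic;
using System.Linq;

class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

static class ProductIndex
{
    const int MaxKeyLength = 6;

    public static Dictionary<string, List<Product>> Build(IEnumerable<Product> products)
    {
        var index = new Dictionary<string, List<Product>>(StringComparer.OrdinalIgnoreCase);
        foreach (var product in products)
        {
            // classify the item under every word in its name
            foreach (var word in product.Name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                // one key per prefix of the word, e.g. "Ap", "App", ..., "Apple"
                for (int len = 2; len <= Math.Min(word.Length, MaxKeyLength); len++)
                {
                    string key = word.Substring(0, len);
                    List<Product> bucket;
                    if (!index.TryGetValue(key, out bucket))
                        index[key] = bucket = new List<Product>();
                    if (!bucket.Contains(product))
                        bucket.Add(product);
                }
            }
        }
        return index;
    }

    public static IEnumerable<Product> Search(Dictionary<string, List<Product>> index, string term)
    {
        // truncate the user's term to the maximum key length, then scan the small bucket
        string key = term.Length > MaxKeyLength ? term.Substring(0, MaxKeyLength) : term;
        List<Product> bucket;
        return index.TryGetValue(key, out bucket)
            ? bucket.Where(p => p.Name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            : Enumerable.Empty<Product>();
    }
}

Usage would be something like: build the index once with ProductIndex.Build(allProducts), then bind ProductIndex.Search(index, userText) to the listbox as the user types.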
10K records is not that much.
A Dictionary<string,decimal> would fit the bill. You can sort by key or by value using LINQ, as well as do searches.
This assumes that product names are unique.
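A tiny sketch of that simpler route (the sample entries are made up):

using System;
using System.Collections.Generic;
using System.Linq;

var catalog = new Dictionary<string, decimal>(StringComparer.OrdinalIgnoreCase)
{
    ["Red Apple"] = 1.25m,
    ["Green Apple"] = 1.10m
};

string term = "apple";
var matches = catalog
    .Where(kvp => kvp.Key.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
    .OrderBy(kvp => kvp.Key)            // or .OrderBy(kvp => kvp.Value) to sort by price
    .ToList();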
I have an Excel COM add-in which reads the CustomDocumentProperties section of a workbook.
This is how I access a particular entry in the CustomDocumentProperties section:
DocumentProperties docProperties =
    (DocumentProperties)xlWorkbook.CustomDocumentProperties;
docProperty = docProperties[propName];
The problem is that when the CustomDocumentProperties contain more than 8000 entries, the performance of this code is really bad. I ran a CPU profiler and it showed that the following line takes more than a minute:
docProperty = docProperties[propName];
Does anyone know how to improve the performance of accessing DocumentProperties?
Thanks!
I doubt that there is anything that you could do to improve the performance of the document properties. I believe that it is implemented as a simple list -- not as a dictionary or hash table. In fact, I don't believe that the list is sorted, so with 8000 entries, on average half of them, or 4000, would have to be accessed in order to find the property that you are looking for.
You might consider not using the CustomDocumentProperties as a dictionary. Instead, you might try putting all 8000 of your entries into a custom dictionary, serializing it, and then adding the entire serialized dictionary to the CustomDocumentProperties as a single entry. So to use it, you would access the CustomDocumentProperties, deserialize the dictionary, and then use it repeatedly. When done, if there were any changes to the dictionary, you would have to re-serialize it and save it back to the CustomDocumentProperties, which you would probably only want to do once -- for example, just before saving your workbook. (You might want to put code to re-serialize and save your custom dictionary to the CustomDocumentProperties within the Workbook.BeforeSave event.)
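A bare-bones sketch of that idea, assuming the properties are simple name/value string pairs; the property name "SerializedProps" and the tab/newline separators are arbitrary illustrative choices:

using System.Collections.Generic;
using System.Linq;

static class PropertyBag
{
    public static string Serialize(Dictionary<string, string> props)
    {
        return string.Join("\n", props.Select(kvp => kvp.Key + "\t" + kvp.Value));
    }

    public static Dictionary<string, string> Deserialize(string data)
    {
        return data.Split('\n')
                   .Select(line => line.Split('\t'))
                   .ToDictionary(parts => parts[0], parts => parts[1]);
    }
}

// Read once per session, then do O(1) lookups in memory:
//   var docProps = (DocumentProperties)xlWorkbook.CustomDocumentProperties;
//   var bag = PropertyBag.Deserialize((string)docProps["SerializedProps"].Value);
//   var value = bag[propName];
// Re-serialize and write the single property back only when something changed,
// e.g. in the Workbook.BeforeSave event.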