Adding multiple bitmap values for one key in a dictionary with C#
I have a dictionary that stores pattern images for OCR purposes. I take these bitmaps from the dictionary and compare them to the ones I crop from an image; if they match, I grab the key (the OCR part is done).
The problem arises here. One key should be represented by several different bitmaps (i.e. values). How do I add multiple bitmaps to the dictionary to represent the same key?
This is how I use the dictionary:
Dictionary<string, Bitmap> lookup = new Dictionary<string, Bitmap>();
lookup.Add("A", new Bitmap(@"C:\08\letters\1\a1.bmp", true));
lookup.Add("A", new Bitmap(@"C:\08\letters\1\a2.bmp", true)); // Throws here, because key "A" already exists with one Bitmap value.
lookup.Add("a", new Bitmap(@"C:\08\letters\1\aa1.bmp", true));
lookup.Add("B", new Bitmap(@"C:\08\letters\1\b1.bmp", true));
Now, to grab the image and its key, I do the following:
var target = lookup.ToList();
Bitmap b1 = target[j].Value; // grab value
// if value == cropped bitmap => proceed
string key = target[j].Key; // grab key
How will this process change according to your solution?
P.S. I have heard of "System.Linq.Lookup(Of TKey, TElement)", but have never used it before. Will this "lookup" help me solve my problem, or is it a completely different tool? Google doesn't know much about it either, so an example would be welcome.
Please note that I load the dictionary only once, at program start, so it doesn't matter how fast adding is.
Lookup speed, on the other hand, is what bothers me the most. I have 120 elements in two of my dictionaries, and according to this article http://www.dotnetperls.com/dictionary-time lookup in a List is much slower than in a Dictionary.
Anyway, I'll take some measurements to see how the List solution suggested below compares to the Dictionary solution I have right now, and report the results later, probably this evening.
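Use a Lookup. It is basically a dictionary from a key to a list of values, instead of from a key to a single value. Conceptually it behaves as below (note that System.Linq.Lookup<TKey, TElement> itself has no public Add method and is built with ToLookup(), so these calls are illustrative only):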
lookup.Add("a", "123"); // creates 'a' key and adds '123' to it
lookup.Add("a", "456"); // adds '456' to existing 'a' key
lookup.Add("b", "000"); // creates 'b' key and adds '000' to it
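Since the question asks for an example, here is a minimal sketch of how the same three entries could be grouped with ToLookup(). The class and method names (LookupDemo, Build) are made up for illustration, and strings stand in for the Bitmap values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    // Builds an immutable lookup from (key, value) pairs; repeated keys are grouped.
    public static ILookup<string, string> Build()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, string>("a", "123"),
            new KeyValuePair<string, string>("a", "456"),
            new KeyValuePair<string, string>("b", "000"),
        };
        return pairs.ToLookup(p => p.Key, p => p.Value);
    }

    static void Main()
    {
        ILookup<string, string> lookup = Build();
        foreach (string value in lookup["a"])
            Console.WriteLine(value);            // the two values stored under "a"
        Console.WriteLine(lookup["z"].Count());  // a missing key yields an empty sequence
    }
}
```

Because the lookup cannot be modified after it is built, this fits the case where everything is loaded once at program start.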
You cannot add an item to a dictionary under a key that already exists. I think you are using the wrong data structure. I think you might have to look into using a List<> instead. Like this:
var lookup=new List<KeyValuePair<string,Bitmap>>();
lookup.Add(new KeyValuePair<string,Bitmap>("A", new Bitmap(@"C:\08\letters\1\a1.bmp", true)));
lookup.Add(new KeyValuePair<string,Bitmap>("A", new Bitmap(@"C:\08\letters\1\a2.bmp", true)));
lookup.Add(new KeyValuePair<string,Bitmap>("a", new Bitmap(@"C:\08\letters\1\aa1.bmp", true)));
lookup.Add(new KeyValuePair<string,Bitmap>("B", new Bitmap(@"C:\08\letters\1\b1.bmp", true)));
Then you can do this instead, without calling ToList():
Bitmap b1 = lookup[j].Value; // grab value
string key = lookup[j].Key; // grab key
Edit
But if you are doing a ToList() on a Dictionary, then you are missing the point of having a Dictionary in the first place, because you end up accessing it like a list anyway. I can also see a problem with doing ToList() on a Dictionary: the order in which a Dictionary enumerates its entries is not guaranteed to be the order in which you inserted them, since entries are stored by hash. That means you cannot be sure that index 1 is the item you added at index 1. You also have to take into consideration that adding to a Dictionary is not as cheap as adding to a List. The good thing about a Dictionary is that lookup by key is fast, but you are not using that with your current solution.
So there are two ways, as far as I can see: the one above, or make sure that the keys are unique and get the value by a lookup in the Dictionary. Like this:
Dictionary<string, Bitmap> lookup = new Dictionary<string, Bitmap>();
lookup.Add("A", new Bitmap(@"C:\08\letters\1\a1.bmp", true));
lookup.Add("B", new Bitmap(@"C:\08\letters\1\a2.bmp", true));
lookup.Add("C", new Bitmap(@"C:\08\letters\1\aa1.bmp", true));
lookup.Add("D", new Bitmap(@"C:\08\letters\1\b1.bmp", true));
Then you can get the Bitmap like this:
Bitmap bm;
if(lookup.TryGetValue("A",out bm))
{
//Do something
}
Or, if you know that the key is present in the Dictionary, you can do this:
Bitmap bm;
bm= lookup["A"];
First, having 'multiple bmp-s' that 'represent the key' is different from having 'one key' associated with (mapped to) multiple 'values' - the latter is what Yorye rightly suggested.
So, if you want more values attached to a single key, you can use something like Dictionary<TKey, IList<TValue>>, where TKey and TValue are the types you need.
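A minimal sketch of that key-to-list mapping, under some assumptions: file names stand in for Bitmap objects, List<string>.Contains stands in for whatever bitmap comparison the OCR code uses, and the helper names (MultiMapDemo, Add, FindKey) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class MultiMapDemo
{
    // Adds a value under a key, creating the list on first use.
    public static void Add(Dictionary<string, List<string>> map, string key, string value)
    {
        List<string> values;
        if (!map.TryGetValue(key, out values))
        {
            values = new List<string>();
            map[key] = values;
        }
        values.Add(value);
    }

    // Returns the first key whose list contains the candidate, or null if none does.
    public static string FindKey(Dictionary<string, List<string>> map, string candidate)
    {
        foreach (KeyValuePair<string, List<string>> entry in map)
            if (entry.Value.Contains(candidate))
                return entry.Key;
        return null;
    }

    static void Main()
    {
        var map = new Dictionary<string, List<string>>();
        Add(map, "A", "a1.bmp");  // file names stand in for Bitmap objects
        Add(map, "A", "a2.bmp");
        Add(map, "B", "b1.bmp");
        Console.WriteLine(FindKey(map, "a2.bmp"));  // prints "A"
    }
}
```

Note that FindKey still walks every stored value, so this changes how the data is grouped, not how fast a match is found.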
But that doesn't solve indexing and querying of the data.
It assumes that your 'key' is just 'A' in your case - and it is not clear what it really is.
If so, you're using the 'dictionary' for something it shouldn't be used for. A Dictionary is a hashing structure (basically it indexes all its entries into buckets etc.) which serves the purpose of speeding up the querying process, i.e. locating the 'right' value.
As I see it, in your case the 'key' is the 'set of bitmaps' which sort of presents a 'signature' of the OCR-ed image, if I'm right? I'm not much into OCR, so I'm guessing here.
That complicates things a bit; you'd need to create a 'composite' key of a sort.
The 'key', and not the 'value' (or value list), would be the bitmaps (provided they can be made comparable as equal or non-equal; there's also the problem of how you compare multiple values to multiple values etc.).
If that's the case, usually (but for simpler cases than yours) you'd create a custom class and give that class GetHashCode() and Equals() overrides (or an IEqualityComparer) so that it can be used as a key in dictionaries. And then you use that as a key.
Again, in your case I think that's a bit of a stretch (in the sense that it's not easy to implement).
Basically you need to think about 'querying' the data, not so much about storing it. What are the real 'keys' for your system? If it's a bitmap that's always the same (or, if not, however you compare against the signature bmps), then you might save some bmp hash code instead and use that as a key - and compare that, instead of the bmps.
i.e. you need to think about things like that - and then the solution, what you need to use, will usually be obvious.
I would not recommend a list, as that's a poor man's choice - unless you have only a couple of items so it's easy to go through them by hand, and somehow I don't think that's the case here.
If you need some way of 'indexing' by some key or keys, then it's usually the dictionary (or a dictionary is involved in some way or part) - but you can have many 'dictionaries', or combinations. You can also have 'many types of keys' and values etc.
You'd need to give us some data for that.
Hope this helps.
EDIT: And lastly - getting the right 'hash code' is also not a simple thing to do. As with your custom structure and comparing, that's something you need to work out yourself. It boils down to what your key is and what represents the 'key' (as in which property or value best describes it and makes it unique - a hard thing to do for an image/bmp?), the distribution of hash values etc.
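The "save some bmp hash code and use that as a key" idea could be sketched as follows. This assumes the pattern and the cropped image are byte-for-byte identical, which real OCR crops often are not; the byte arrays are hypothetical stand-ins for pixel data extracted from each Bitmap, SHA-256 is just one possible choice of hash, and HashKeyDemo and Signature are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

static class HashKeyDemo
{
    // Hashes raw pixel bytes into a string usable as a dictionary key.
    public static string Signature(byte[] pixels)
    {
        using (SHA256 sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(pixels));
    }

    static void Main()
    {
        // Hypothetical pixel data; real code would extract bytes from each pattern Bitmap.
        byte[] patternA1 = { 0, 255, 255, 0 };
        byte[] patternA2 = { 255, 0, 0, 255 };

        // Several signatures may map to the same letter.
        var letterBySignature = new Dictionary<string, string>();
        letterBySignature[Signature(patternA1)] = "A";
        letterBySignature[Signature(patternA2)] = "A";

        byte[] cropped = { 255, 0, 0, 255 };
        string letter;
        if (letterBySignature.TryGetValue(Signature(cropped), out letter))
            Console.WriteLine(letter);  // exact byte match with patternA2
    }
}
```

With this layout the bitmap signature is the key and the letter is the value, so finding the letter for a cropped image is a single dictionary lookup rather than a scan.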
Related
Problems with finding tuple key in c# dictionary
Dictionary<Tuple<int, int>, link_data> dic_links I have the above code. I use tuple as the dictionary key. I want to find value using only one of the two values in tuple. Is there any way to find it using only index instead of searching the entire dictionary in foreach? cout << dic_links[new Tuple<int, int>(57,**).data;
No, it is not possible to use only a partial key to search a dictionary with O(1) performance. Your options are either to search through all keys or to keep a separate dictionary that maps each part of the key to the object (make sure to keep them in sync). If you only need to search by the full key or by one component, and O(log n) is reasonable, you can use a sorted list instead (you will not be able to search by the second component with a single array). For more ideas you can search for "range queries on dictionaries", where one would like to find "all items with key 10 to 100", which is the same problem.
No. The Dictionary is designed for efficient search using strict equality of the key. If you don't know exactly the key, then you must enumerate all elements one-by-one. In your case you'll probably have duplicate values on each individual property of the tuple, so you'll not be able to use a simple Dictionary<int, link_data> for indexing by property. You could use either a Lookup<int, link_data> if your data are static, or a Dictionary<int, List<link_data>> if you need to add/remove elements after the creation of the index.
You can convert the dictionary to a nested dictionary. Dictionary<int, Dictionary<int, link_data>>dic_links;
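A minimal sketch of the nested-dictionary layout, assuming lookups by the first tuple component only. A string stands in for link_data, and NestedIndexDemo and Add are illustrative names.

```csharp
using System;
using System.Collections.Generic;

static class NestedIndexDemo
{
    // Stores data under (first, second), creating the inner dictionary on demand.
    public static void Add(Dictionary<int, Dictionary<int, string>> index,
                           int first, int second, string data)
    {
        Dictionary<int, string> inner;
        if (!index.TryGetValue(first, out inner))
        {
            inner = new Dictionary<int, string>();
            index[first] = inner;
        }
        inner[second] = data;
    }

    static void Main()
    {
        var links = new Dictionary<int, Dictionary<int, string>>();
        Add(links, 57, 1, "link 57-1");
        Add(links, 57, 2, "link 57-2");
        Add(links, 60, 1, "link 60-1");

        // All entries whose first component is 57, found without scanning other keys.
        Dictionary<int, string> matches;
        if (links.TryGetValue(57, out matches))
            foreach (KeyValuePair<int, string> entry in matches)
                Console.WriteLine(entry.Key + ": " + entry.Value);
    }
}
```

Searching by the second component alone would still require visiting every inner dictionary, or a second index keyed the other way round.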
Search Large ConcurrentBag?
.NET 4.5.1 I have a ConcurrentBag with 200,000 objects. An object is considered "unique" by two properties of type long. I need to check the bag for a previous existence of a unique object, and if it does not exist, add it. I think doing something like the below is not correct - var foundRef = mybag.Where( r => r.mainid == tempObj.mainid && r.subid == tempObj.subid); what is the right way to search the bag as quickly as possible? I do need the concurrency/safety of the 'bag. Thanks.
Why not use ConcurrentDictionary<Tuple<long, long>, Foo>? Your data will be indexed by the two properties mainid and subid. The only disadvantage of this approach is that you have to create a new Tuple<long, long> each time you want to retrieve a value from the dictionary:
var foundRef = myDict[new Tuple<long, long>(tempObj.mainid, tempObj.subid)];
But it will give you the fastest possible access time, close to O(1).
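For the "check whether it exists, and add it if not" part of the question, a hedged sketch using ConcurrentDictionary.TryAdd, which performs the check and the insert as one atomic operation. The Foo class is a stand-in with only the two key properties, and UniqueStoreDemo and AddIfNew are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

class Foo
{
    public long mainid;
    public long subid;
}

static class UniqueStoreDemo
{
    static readonly ConcurrentDictionary<Tuple<long, long>, Foo> store =
        new ConcurrentDictionary<Tuple<long, long>, Foo>();

    // Returns true if the object was added, false if an equal key already existed.
    public static bool AddIfNew(Foo item)
    {
        return store.TryAdd(Tuple.Create(item.mainid, item.subid), item);
    }

    static void Main()
    {
        Console.WriteLine(AddIfNew(new Foo { mainid = 1, subid = 2 }));  // True
        Console.WriteLine(AddIfNew(new Foo { mainid = 1, subid = 2 }));  // False
    }
}
```

Unlike the indexer, which throws when the key is absent, TryAdd is safe to call without a prior existence check.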
list with non-unique index
I've always thought that any index should be unique, but I think that's not true, at least for SQL Server, as shown in the following post: Do clustered indexes have to be unique? Recently I had to store a very large amount of data in a collection and thought of using a dictionary, since it's the fastest collection for getting an object by index. But my collection would have to allow duplicated keys. In fact, duplicated keys would not be a problem, since any of the objects returned would meet the requirements (the objects are not exactly unique, but the keys would be). Some more research led me to the following post: C# Hashset Contains Non-Unique Objects, which shows a way to get a HashSet with "duplicated keys". His problem would be my solution, but I wonder if there's any other way to have a list with duplicated keys that allows me to search very fast without having to use a workaround to get this done.
"duplicated indexes would not be a problem since any of them would be meet the requirements" If by this you mean that obtaining any item stored against the same index value would be satisfactory when retrieving an item by index, then a simple Dictionary will suffice. E.g.
Dictionary<int, string> myData = new Dictionary<int, string>();
myData[1] = "foo";
myData[2] = "bar";
myData[2] = "baz"; // overwrites "bar"
var myDatum = myData[2]; // retrieves "baz", not "bar", but this is satisfactory.
What is the fastest way to search a List<T> across multiple properties?
I have a process I've inherited that I'm converting to C# from another language. Numerous steps in the process loop through what can be a lot of records (100K-200K) to do calculations. As part of those processes it generally does a lookup into another list to retrieve some values. I would normally move this kind of thing into a SQL statement (and we have where we've been able to) but in these cases there isn't really an easy way to do that. In some places we've attempted to convert the code to a stored procedure and decided it wasn't working nearly as well as we had hoped. Effectively, the code does this: var match = cost.Where(r => r.ryp.StartsWith(record.form.TrimEnd()) && r.year == record.year && r.period == record.period).FirstOrDefault(); cost is a local List type. If I was doing a search on only one field I'd probably just move this into a Dictionary. The records aren't always unique either. Obviously, this is REALLY slow. I ran across the open source library I4O which can build indexes, however it fails for me in various queries (and I don't really have the time to attempt to debug the source code). It also doesn't work with .StartsWith or .Contains (StartsWith is much more important since a lot of the original queries take advantage of the fact that doing a search for "A" would find a match in "ABC"). Are there any other projects (open source or commercial) that do this sort of thing? EDIT: I did some searching based on the feedback and found Power Collections which supports dictionaries that have keys that aren't unique. I tested ToLookup() which worked great - it's still not quite as fast as the original code, but it's at least acceptable. It's down from 45 seconds to 3-4 seconds. I'll take a look at the Trie structure for the other look ups. Thanks.
Looping through a list of 100K-200K items doesn't take very long. Finding matching items within the list by using nested loops (n^2) does take long. I infer this is what you're doing (since you assign to a local match variable). If you want to quickly match items together, use .ToLookup.
var lookup = cost.ToLookup(r => new {r.year, r.period, form = r.ryp});
foreach(var group in lookup)
{
    // do something with items in group.
}
Your StartsWith criterion is troublesome for key-based matching. One way to approach that problem is to ignore it when generating keys.
var lookup = cost.ToLookup(r => new {r.year, r.period });
var key = new {record.year, record.period};
string lookForThis = record.form.TrimEnd();
var match = lookup[key].FirstOrDefault(r => r.ryp.StartsWith(lookForThis));
Ideally, you would create the lookup once and reuse it for many queries. Even if you didn't, and created the lookup each time, it would still be faster than n^2.
Certainly you can do better than this. Let's start by considering that dictionaries are not useful only when you want to query one field; you can very easily have a dictionary where the key is an immutable value that aggregates many fields. So for this particular query, an immediate improvement would be to create a key type:
// should be immutable, GetHashCode and Equals should be implemented, etc etc
struct Key
{
    public int year;
    public int period;
}
and then package your data into an IDictionary<Key, ICollection<T>> or similar, where T is the type of your current list. This way you can cut down heavily on the number of rows considered in each iteration. The next step would be to use not an ICollection<T> as the value type but a trie (this looks promising), which is a data structure tailored to finding strings that have a specified prefix. Finally, a free micro-optimization would be to take the TrimEnd out of the loop. Now certainly all of this only applies to the specific example given and may need to be revisited due to other specifics of your situation, but in any case you should be able to extract practical gain from this or something similar.
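One way the composite-key idea could look in full, as a sketch rather than a definitive implementation. The Cost class is a hypothetical stand-in with only the three fields used in the question, the hash combination is one arbitrary choice, a plain List<Cost> is used for each bucket instead of a trie, and the ordinal StartsWith comparison is an assumption (the original call is culture-sensitive).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Immutable composite key with value equality, usable as a dictionary key.
struct Key : IEquatable<Key>
{
    public readonly int Year;
    public readonly int Period;
    public Key(int year, int period) { Year = year; Period = period; }
    public bool Equals(Key other) { return Year == other.Year && Period == other.Period; }
    public override bool Equals(object obj) { return obj is Key && Equals((Key)obj); }
    public override int GetHashCode() { return unchecked(Year * 397 ^ Period); }
}

class Cost { public string ryp; public int year; public int period; }

static class CompositeKeyDemo
{
    // Groups the rows once so that each query only scans one (year, period) bucket.
    public static Dictionary<Key, List<Cost>> BuildIndex(IEnumerable<Cost> costs)
    {
        return costs.GroupBy(c => new Key(c.year, c.period))
                    .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Returns the first row in the bucket whose ryp starts with the prefix, or null.
    public static Cost Find(Dictionary<Key, List<Cost>> index, int year, int period, string prefix)
    {
        List<Cost> bucket;
        if (!index.TryGetValue(new Key(year, period), out bucket))
            return null;
        return bucket.FirstOrDefault(c => c.ryp.StartsWith(prefix, StringComparison.Ordinal));
    }

    static void Main()
    {
        var costs = new List<Cost>
        {
            new Cost { ryp = "ABC", year = 2012, period = 1 },
            new Cost { ryp = "XYZ", year = 2012, period = 1 },
            new Cost { ryp = "ABD", year = 2013, period = 2 },
        };
        var index = BuildIndex(costs);
        Cost match = Find(index, 2012, 1, "A");
        Console.WriteLine(match == null ? "no match" : match.ryp);
    }
}
```

The index is meant to be built once outside the record loop; rebuilding it per record would give back most of the gain.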
What is the most efficient way to do look-up table in C#
I have a look-up table. Sort of like
0 "Thing 1"
1 "Thing 2"
2 "Reserved"
3 "Reserved"
4 "Reserved"
5 "Not a Thing"
So if someone wants "Thing 1" or "Thing 2" they pass in 0 or 1. But they may pass in something else as well. I have 256 of these kinds of things and maybe 200 of them are reserved. So what is the most efficient way to set this up? One option is a string array or dictionary variable that holds all of the values, and then take the integer and return the value at that place. One problem I have with this solution is all of the "Reserved" values. I don't want to create those redundant "Reserved" values. Or else I can have an if statement against all of the various places that are reserved, but they might not be just 2-3; they might be 2-3, 40-55, and various other places in the byte. This if statement would get unruly quickly. My other option was a switch statement. I would have all of the 50ish known values and would fall through to the default for the reserved values. I am wondering if this is a lot more processing than creating a string array or dictionary and just returning the appropriate value. Something else? Is there another way to consider?
"Retrieving a value by using its key is very fast, close to O(1), because the Dictionary(TKey, TValue) class is implemented as a hash table."
var things = new Dictionary<int, string>();
things[0]="Thing 1";
things[1]="Thing 2";
things[4711]="Carmen Sandiego";
The absolute fastest way to do lookups of integer values in C# is with an array. This will be preferable to using a dictionary, maybe, if you are trying to do tens of thousands of lookups at a time. For most purposes, this is overkill; it's more likely that you need to optimize developer time than processor time. If the reserved keys are not simply all keys that aren't in the lookup table (i.e. if a lookup for a key can return the found value, a not-found status, or a reserved status), you'll need to save the reserved keys somewhere. Saving them as dictionary entries with magic values (e.g. the key of any dictionary entry whose value is null is reserved) is OK unless you write code that iterates over the dictionary's entries without filtering them. A way to solve that problem is to use a separate HashSet<int> to store the reserved keys, and maybe bake the whole thing into a class, e.g.:
public class LookupTable
{
    // Get-only properties; the readonly modifier is not valid on properties of a class.
    public Dictionary<int, string> Table { get; }
    public HashSet<int> ReservedKeys { get; }
    public LookupTable()
    {
        Table = new Dictionary<int, string>();
        ReservedKeys = new HashSet<int>();
    }
    public string Lookup(int key)
    {
        return ReservedKeys.Contains(key) ? null : Table[key];
    }
}
You'll note that this still has the magic-value issue - Lookup returns null if the key is reserved, and throws an exception if it's not in the table - but at least now you can iterate over Table.Values without filtering magic values.
Check out the HybridDictionary. It automatically adjusts its underlying storage mechanism based on size to get the greatest efficiency. http://msdn.microsoft.com/en-us/library/system.collections.specialized.hybriddictionary.aspx
If you have lots of reserved (currently unused) values, or if the range of the integer values can get very big, then I would use a generic dictionary (Dictionary):
var myDictionary = new Dictionary<int, string>();
myDictionary.Add(0, "Value 1");
myDictionary.Add(200, "Another value");
// and so on
Otherwise, if you have a fixed number of values and only a few of them are currently unused, then I'd use a string array (string[200]) and set/leave the reserved entries as null.
var myArray = new string[200];
myArray[0] = "Value 1";
myArray[2] = "Another value";
// myArray[1] is null
The built-in Dictionary object (preferably a generic dictionary) would be ideal for this, and is specifically designed for fast/efficient retrieval of the values relating to the keys. From the linked MSDN article: Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table. As far as your "reserved" keys go, I wouldn't worry about that at all if we're only talking about a few hundred keys/values. It's only when you reach tens, maybe hundreds of thousands of "reserved" keys/values that you'll want to implement something more efficient. In those cases, probably the most efficient storage container would be an implementation of a Sparse Matrix.
I'm not quite sure I understand your problem correctly. You have a collection of strings. Each string is associated with an index. The consumer gives an index and you return the corresponding string, unless the index is reserved. Right? Can't you simply set reserved items to null in the array? If not, using a dictionary that doesn't contain the reserved items seems a reasonable solution. Anyway, you'll probably get better answers if you clarify your problem.
I would use a Dictionary to do the lookups. This is the most efficient way to do lookups by far. Using a string will run somewhere in the region of O(n) to find the object. It might be useful to have a second Dictionary to allow you to do a reverse lookup if it's needed.
Load all your values into
var dic = new Dictionary<int, string>();
And use this for retrieval:
string GetDescription(int val)
{
    // Reject values outside the byte range first.
    if (val < 0 || val > 255)
        throw new ApplicationException("Value must be between 0 and 255");
    // Anything in range that was never loaded counts as reserved.
    if (!dic.ContainsKey(val))
        return "Reserved";
    return dic[val];
}
Your question seems to imply that the query key is an integer. Since you have at most 256 items, then the query key is in the range 0..255, right? If so, just have a string array of 256 strings, and use the key as an index into the array. If your query key is a string value, then it's more like a real lookup table. Using a Dictionary object is simple, but if you're after raw speed for a set of as few as 50 or so actual answers, a do-it-yourself approach such as binary search, or a trie, could be quicker. If you use binary search, since the number of items is so small, you could unroll it. How often does the list of items change? If it only changes very seldom, you can get even better speed by generating code to do the search, which you can then compile and execute to do each query. On the other hand, I assume you've proven that this lookup is your bottleneck, either by profiling or taking stackshots. If less than 10% of time-when-slow is spent in this query, then it is not your bottleneck so you may as well do the thing that is easiest to code.
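The 256-entry array approach could be sketched like this, assuming the key is a byte and that every unset slot should read as "Reserved". The three populated entries come from the question's example table; ArrayLookupDemo and Describe are illustrative names.

```csharp
using System;

static class ArrayLookupDemo
{
    static readonly string[] table = BuildTable();

    static string[] BuildTable()
    {
        var t = new string[256];  // unset slots stay null and are treated as reserved
        t[0] = "Thing 1";
        t[1] = "Thing 2";
        t[5] = "Not a Thing";
        return t;
    }

    // A byte parameter makes out-of-range keys impossible, so no bounds check is needed.
    public static string Describe(byte code)
    {
        return table[code] ?? "Reserved";
    }

    static void Main()
    {
        Console.WriteLine(Describe(1));  // Thing 2
        Console.WriteLine(Describe(3));  // Reserved
    }
}
```

Only the known values are written out; the reserved ranges need no entries and no if statements.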