C# - Need Suggestions on Improving a Section of Code

I have a function that receives three different "people" objects and generates a new "compatibility" object based on the combined values in the "people" objects.
However, about 1/3 of the time the three "people" objects that it receives as input are the same as one before, though possibly in a different order. In these cases I do NOT want to make a new "score" object, but simply return a value contained within the existing object.
Originally, the program just looped through the List<> of "compatibility" objects, searching for the one that belongs to these three "people" (since each "compatibility" object contains an array of people objects). This method is really slow, considering that there are thousands of "compatibility" objects and over a million "people" objects.
I had the idea of using a dictionary where the key is a number I generate by combining the three people objects' id values into a single UInt64 using XOR, and storing the score objects as dictionary values rather than in a list. This cuts the time by about half, which is acceptable in terms of performance, but there are far too many collisions, and it returns the wrong score too often.
Any suggestions or pointers would be much appreciated.
Edit: To add to the original question, each "people" object has a bunch of other fields that I could use, but the problem is making a key that is UNIQUE and COMMUTATIVE.

I think you're making this more complex than it needs to be. Take the 3 PersonID values and sort them, so that they're always in the same order, no matter which order they were passed in. Then set a value in a hashtable using the three PersonIDs as the key, separated with a hyphen or some other character that won't occur in a PersonID value. Then later, check if there's a value in the hashtable with that key.
So if the three PersonIDs are 10, 5 and 22, the hash key could be something like "5-10-22".
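As a rough sketch of the sorted-key idea, assuming the PersonIDs are non-negative integers and the cached result is a plain double (the names CompatibilityCache, MakeKey and GetOrCompute are made up for illustration, not part of the question's code):

```csharp
using System;
using System.Collections.Generic;

public static class CompatibilityCache
{
    // Hypothetical cache: sorted-ID key -> previously computed score.
    private static readonly Dictionary<string, double> Scores =
        new Dictionary<string, double>();

    // Builds an order-independent key such as "5-10-22".
    // Assumes IDs are non-negative, so "-" cannot appear inside an ID.
    public static string MakeKey(long id1, long id2, long id3)
    {
        long[] ids = { id1, id2, id3 };
        Array.Sort(ids);
        return $"{ids[0]}-{ids[1]}-{ids[2]}";
    }

    // Returns the cached score for this trio, computing it only on a miss.
    public static double GetOrCompute(long id1, long id2, long id3, Func<double> compute)
    {
        string key = MakeKey(id1, id2, id3);
        if (!Scores.TryGetValue(key, out double score))
        {
            score = compute();
            Scores[key] = score;
        }
        return score;
    }
}
```

Because the key contains all three IDs in full, two different trios can never share a key, which is the property the XOR approach lacks.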

Create the key by concatenating the object IDs after sorting the trio in a predetermined order.

Your best option would be a custom IEqualityComparer class. Declare your Dictionary like this
Dictionary<List<People>, Compatability> people =
new Dictionary<List<People>, Compatability>(new PersonListComparer());
You'll need to create a PersonListComparer class that implements IEqualityComparer<List<People>>. There are two methods you'll need to implement, one that gets a hash code and one that compares equality. The Dictionary will use GetHashCode to determine if two lists are POSSIBLY equal, and the Equals method to determine if they actually are (in other words, the hash code is fast and could give a false positive, but never a false negative). Use your existing hashing algorithm (the XOR) for GetHashCode, then just compare the two lists explicitly in the Equals method.
This should do the trick!
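A minimal sketch of such a comparer might look like the following. The People class here is a hypothetical stand-in with only an Id property (the real class is not shown in the question), and equality is assumed to mean "same IDs in any order":

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's "people" class; only Id is assumed.
public class People
{
    public ulong Id { get; set; }
}

public class PersonListComparer : IEqualityComparer<List<People>>
{
    // Order-independent equality: both lists hold the same IDs, in any order.
    public bool Equals(List<People> x, List<People> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Count != y.Count) return false;
        return x.Select(p => p.Id).OrderBy(id => id)
                .SequenceEqual(y.Select(p => p.Id).OrderBy(id => id));
    }

    // XOR is commutative, so lists that differ only in order get the same hash code.
    // Different trios may still collide here; Equals resolves those cases.
    public int GetHashCode(List<People> people)
    {
        ulong combined = 0;
        foreach (People p in people) combined ^= p.Id;
        return combined.GetHashCode();
    }
}
```

With this comparer the XOR collisions only cost an extra Equals call instead of returning the wrong entry. One caveat with a List as a key: the list must not be modified after it has been added to the dictionary.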

Why not use the names of the people as the dictionary key? (Sort the names first, so that order of passing doesn't matter.)
That is, John, Alice, and Bob become something like my_dictionary["Alice_Bob_John"]. If that key exists, you've already computed the score; otherwise, you need to compute it. As an alternative to my string hacking above, you could actually use a structure:
NameTriple n = new NameTriple("John", "Alice", "Bob");
// NameTriple internally sorts the names.
my_dictionary[n] ...
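The answer only names NameTriple, so the following is one possible sketch of it rather than the author's implementation. It assumes names are compared ordinally (case-sensitive) and that the struct is used directly as a dictionary key:

```csharp
using System;

// Hypothetical sketch of NameTriple: sorts the names once in the constructor
// so that any ordering of the same three names produces an equal value.
public readonly struct NameTriple : IEquatable<NameTriple>
{
    public string First { get; }
    public string Second { get; }
    public string Third { get; }

    public NameTriple(string a, string b, string c)
    {
        string[] names = { a, b, c };
        Array.Sort(names, StringComparer.Ordinal);
        First = names[0];
        Second = names[1];
        Third = names[2];
    }

    public bool Equals(NameTriple other) =>
        First == other.First && Second == other.Second && Third == other.Third;

    public override bool Equals(object obj) => obj is NameTriple other && Equals(other);

    // Combines the three (already sorted) names into one hash code.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (First?.GetHashCode() ?? 0);
            hash = hash * 31 + (Second?.GetHashCode() ?? 0);
            hash = hash * 31 + (Third?.GetHashCode() ?? 0);
            return hash;
        }
    }
}
```

Note that names are only a safe key if they are unique per person; with a million people, the ID values are probably the better choice for the three fields.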

If you want to keep everything in memory and not use a database, I'd recommend something akin to a tree structure. Assuming your object IDs are sortable and order doesn't matter, you can accomplish this with nested dictionaries.
Namely, a Dictionary<Key, Dictionary<Key, Dictionary<Key, Compatibility>>> should do the trick. Sort the IDs, and use the lowest value in the outer dictionary, the next value in the next, and the final value to find the compatibility object. This way, there will be no collisions, and lookup should be quite fast.
Or, now that I think again, this doesn't have to be that complicated. Just use a string as a key and concatenate the IDs together in sorted order with a "!" or something else in between that doesn't occur naturally in the IDs.

Assuming all "Person" objects are unique, store a UUID in the object.
In your function, statically store the quad (P1,P2,P3,V), where P1,P2,P3 are the UUIDs of the Person objects, sorted (to avoid the ordering problem), and V is the result of the previous calculation.
Then your function checks whether there is an entry for this triplet of Persons; if not, it does the work and stores it.
You can store the (P1,P2,P3,V) values in a dictionary; just key off some hash of the three P values.

Related

SQL good practices and foreign key

I have to create a database structure. I have a question about foreign keys and good practice:
I have a table which must have a field that can be two different string values, either "A" or "B".
It cannot be anything else (therefore, I cannot use a string type field).
What is the best way to design this table:
1) create an int field which is a foreign key to another table with just two records, one for the string "A" and one for the string "B"
2) create an int field then, in my application, create an enumeration such as this
public enum StringAllowedValues
{
A = 1,
B
}
3) ???
In advance, thanks for your time.
Edit: 13 minutes later and I get all this awesome feedback. Thank you all for the ideas and insight.
Many database engines support enumerations as a data type. And there are, indeed, cases where an enumeration is the right design solution.
However...
There are two requirements which may decide that a foreign key to a separate table is better.
The first is: it may be necessary to increase the number of valid options in that column. In most cases, you want to do this without a software deployment; enumerations are "baked in", so in this case, a table into which you can write new data is much more efficient.
The second is: the application needs to reason about the values in this column, in ways that may go beyond "A" or "B". For instance, "A" may be greater/older/more expensive than "B", or there is some other attribute to A that you want to present to the end user, or A is short-hand for something.
In this case, it is much better to explicitly model this as columns in a table, instead of baking this knowledge into your queries.
In 30 years of working with databases, I personally have never found a case where an enumeration was the right decision....
Create a secondary table with the meanings of these integer codes. There's nothing that compels you to JOIN that in, but if you need to that data is there. Within your C# code you can still use an enum to look things up but try to keep that in sync with what's in the database, or vice-versa. One of those should be authoritative.
In practice you'll often find that short strings are easier to work with than rigid enums. In the 1990s when computers were slow and disk space scarce you had to do things like this to get reasonable performance. Now it's not really an issue even on tables with hundreds of millions of rows.

C# Linq - Elegant way to order a dataset by the maximum value between two columns?

To elaborate on what I'm asking here, I'm working in C# trying to order a list of data that is coming out of a database. A certain object (table) has two separate properties (columns) which can hold dates. I want to sort this list based on date, meaning for each data row, I want to take the maximum of the two dates and use that for the sort.
Is there an elegant way to do this? At the moment I'm thinking that I should just get the max time from each record, and store it as a key in a Dictionary, with the data row being the value, and then iterate through the Dictionary with the sorted keys. Although this wouldn't be terrible to code, I'm trying to see if there is a nicer way to do this using Linq. Any help would be appreciated. Thanks.
If you want an alternative to sorting your collection every time, .NET contains a class called SortedList. Here is the link to the MSDN article: http://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.110).aspx
MSDN states that a SortedList:
Represents a collection of key/value pairs that are sorted by the keys and are accessible by key and by index.
So if you want to sort by date you can declare your sorted list as:
SortedList<DateTime, YourData> mySortedList = new SortedList<DateTime, YourData>();
and when you add values to it, it will already be sorted. Or you can just go with LINQ and @Alexander's answer.
Edit
I just understood what you want to do. You can do:
table.OrderBy(item => Math.Max(item.Date1.Ticks, item.Date2.Ticks));
Note: The linq query above will not be performant on a large collection.
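For an in-memory list, the same idea can be sketched without going through Ticks by comparing the DateTime values directly. The Row class and its property names are assumptions based on the Date1/Date2 names used in the query above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with the two date columns from the question.
public class Row
{
    public string Name { get; set; }
    public DateTime Date1 { get; set; }
    public DateTime Date2 { get; set; }
}

public static class RowSorting
{
    // Returns the later of the row's two dates.
    public static DateTime LatestDate(Row row) =>
        row.Date1 > row.Date2 ? row.Date1 : row.Date2;

    // Sorts ascending by the later of the two dates (LINQ to Objects).
    public static List<Row> SortByLatestDate(IEnumerable<Row> rows) =>
        rows.OrderBy(r => LatestDate(r)).ToList();
}
```

This version runs in memory only; whether a database LINQ provider can translate such an expression depends on the provider.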

What type of collection should I use?

I have approximately 10,000 records. Each record has 2 fields: one field is a string up to 300 characters in length and the other field is a decimal value. This is like a product catalog with product names and the price of each product.
What I need to do is allow the user to type any word and display all products containing that word together with their prices in a listbox. That's all.
What type of collection is best for this scenario?
If I need to sort based on either product name or price, will the choice still be the same?
Right now I am using an XML file, but I thought using a collection so that I can embed all the values in the code is simpler. Thanks for your suggestions.
A Dictionary will do the job. However, if you are doing rapid partial matches (e.g. search as the user types) you may get better performance by creating multiple keys which point to the same item. For example, the word "Apple" could be located with "Ap", "App", "Appl", and "Apple".
I have used this approach on a similar number of records with very good results. I have turned my 10K source items into about 50K unique keys. Each of these Dictionary entries points to a list containing references to all matches for that term. You can then search this much smaller list more efficiently. Despite the large number of lists this creates, the memory footprint is quite reasonable.
You can also make up your own keys if desired to redirect common misspellings or point to related items. This also eliminates most of the issues with unique keys because each key points to a list. A single item may be classified by each of the words in its name; this is extremely useful if you have long product names with multiple words in it. When classifying your items, each word in the name can be mapped to one or more keys.
I should also point out that building and classifying 10K items shouldn't take long if done correctly (couple hundred milliseconds is reasonable). The results can be cached for as long as you want using Application, Cache, or static members.
To summarize, the resulting structure is a Dictionary<string, List<T>> where the string is a short (2-6 characters works well) but unique key. Each key points to a List<T> (or other collection, if you are so inclined) of items which match that key. When a search is performed, you locate the key which matches the term provided by the user. Depending on the length of your keys, you may truncate the user's search to your maximum key length. After locating the correct child collection, you then search that collection for a complete or partial match using whatever methodology you wish.
Lastly, you may wish to create a lightweight structure for each item in the list so that you can store additional information about the item. For example, you might create a small Product class which stores the name, price, department, and popularity of the product. This can help you refine the results you show to the user.
All-in-all, you can perform intelligent, detailed, fuzzy searches in real-time.
The aforementioned structures should provide functionality roughly equivalent to a trie.
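The structure described above could be sketched roughly as follows. The Product and PrefixIndex classes, the 2-6 character key lengths, and splitting names on spaces are all assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lightweight item type: product name plus price.
public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class PrefixIndex
{
    private const int MinKeyLength = 2;
    private const int MaxKeyLength = 6;

    // Each short prefix key points to the list of products having a word with that prefix.
    private readonly Dictionary<string, List<Product>> _buckets =
        new Dictionary<string, List<Product>>(StringComparer.OrdinalIgnoreCase);

    public PrefixIndex(IEnumerable<Product> products)
    {
        foreach (Product product in products)
        {
            string[] words = product.Name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                int longest = Math.Min(word.Length, MaxKeyLength);
                for (int length = MinKeyLength; length <= longest; length++)
                {
                    string key = word.Substring(0, length);
                    if (!_buckets.TryGetValue(key, out List<Product> bucket))
                    {
                        bucket = new List<Product>();
                        _buckets[key] = bucket;
                    }
                    // Products are indexed one at a time, so a duplicate can only be the last entry.
                    if (bucket.Count == 0 || !ReferenceEquals(bucket[bucket.Count - 1], product))
                        bucket.Add(product);
                }
            }
        }
    }

    // Truncates the term to the maximum key length, finds the bucket,
    // then filters that smaller list for names containing the full term.
    public List<Product> Search(string term)
    {
        if (string.IsNullOrEmpty(term) || term.Length < MinKeyLength) return new List<Product>();
        string key = term.Length > MaxKeyLength ? term.Substring(0, MaxKeyLength) : term;
        if (!_buckets.TryGetValue(key, out List<Product> bucket)) return new List<Product>();
        return bucket
            .Where(p => p.Name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

This sketch only finds terms that start a word in the product name; matching text in the middle of a word would need extra keys or a full scan.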
10K records is not that much.
A Dictionary<string,decimal> would fit the bill. You can sort by key or by value using LINQ, as well as do searches.
This assumes that product names are unique.

C# : Using hashtables to store two of the same value. Is it possible?

I'm fairly new to programming in C# and I have written a program that uses hashtables to store data (in my case, the user's name, and whether they are "Ready" or "Not Ready"). I have 2 tables in total. The first table has the username as the key and the IP address of the client as the value. The second table has the Ready/Not Ready status (given by a combo box) as the key, and the IP address as the value.
The first table isn't a problem, as I don't want the user's name to re-occur. However, in the second table I need the Ready/Not Ready status to re-occur many times. This does not work, as it says there is already a key called "Ready" in the hashtable. Is there any way to get around this?
You could use a Dictionary<Status,HashSet<IP>> for the second table. This has the additional advantage that inserting/removing an IP is fast since it's a key into the HashSet.
So, the reason for the second hashtable is to quickly look up who is ready or not, correct?
In that case, consider splitting that up into 2 different collections: one for those who are ready, and one for those who are not.
Most likely, a simple List<T> will be fine here, since you just need to see who is in there, rather than finding a specific one (because if you want to do that, you could just look in the other hashtable). If it's important to have similar lookup properties to the hashtable, you can use a HashSet<T> instead, but it depends on your needs.
Keys have to be unique. If you tried to access a value by the key how would it know which one you really wanted?
It sounds like Hashtable is probably not the ideal data structure for your problem.
Keys within a hashtable / dictionary must be unique, so no, you can't technically store two entries in a hash table that share the exact same key.
Also, you should probably use a Dictionary<TKey, TValue> instead of the actual Hashtable type, as it has better performance characteristics.
You can simulate what you want by doing something like creating a Dictionary whose value is a set of some kind:
// Map containing two sets of IP addresses: those that are ready,
// and those that are not ready.
var readyMap = new Dictionary<bool, HashSet<string>>();
readyMap[true] = new HashSet<string>();
readyMap[false] = new HashSet<string>();
// Add an IP address that is ready.
readyMap[true].Add(ipAddress1);
// Add an IP address that is not ready.
readyMap[false].Add(ipAddress2);
However, that might not be the ideal solution. What is the actual problem you are trying to solve?
Okay, I have finally figured this out. I have 3 tables. First one is Key: Username Value: IP. Second is Key: IP Value: Username. Third is Key: Username Value: Ready/Not Ready. Then I simply reference htReady.Value (the hashtable). My whole code is here: http://pastebin.com/Z60GEjK8. The parts you want to be looking at are the start of class ChatServer, AddUser, RemoveUser and AcceptClient.
I am new to this, so if you could suggest a better way, I'm all ears.

best way to represent this lookup table in c#

I need to represent a lookup table in C#, here is the basic structure:
Name Range Multiplier
Active 10-20 0.5
What do you guys suggest?
I will need to lookup on range and retrieve the multiplier.
I will also need to lookup using the name.
Update
It will have maybe 10-15 rows in total.
Range is an integer data type.
What you actually have is two lookup tables: one by Name and one by Range. There are several ways you can represent these in memory depending on how big the table will get.
The most likely fit for the "by-name" lookup is a dictionary:
var MultiplierByName = new Dictionary<string, double>() { {"Active",.5}, {"Other", 1.0} };
The range is trickier. For that you will probably want to store either just the minimum or the maximum item, depending on how your range works. You may also need to write a function to reduce any given integer to its corresponding stored key value (hint: use integer division or the mod operator).
From there you can choose another dictionary (Dictionary<int, double>), or if it works out right you could make your reduce function return a sequential int and use a List<double> so that your 'key' just becomes an index.
But like I said: to know for sure what's best we really need to know the scope and nature of the data in the lookup, and the scenario you'll use to access it.
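As an illustration of the "reduce" function, here is a sketch that assumes every range is 10 wide, half-open (10 covers 10 through 19), and keyed by its minimum value. The class name, the bucket width, and the second table entry are hypothetical; only the 10-20 range with multiplier 0.5 comes from the question:

```csharp
using System.Collections.Generic;

public static class RangeLookup
{
    // Assumption: all ranges have the same width.
    private const int BucketWidth = 10;

    // Keyed by the minimum value of each range.
    private static readonly Dictionary<int, double> MultiplierByRangeStart =
        new Dictionary<int, double>
        {
            { 10, 0.5 },  // from the question's example row
            { 20, 1.0 }   // hypothetical second row
        };

    // Reduces a non-negative integer to the start of its range, e.g. 17 -> 10.
    public static int ReduceToKey(int value) => value / BucketWidth * BucketWidth;

    // Looks up the multiplier for the range containing value, if any.
    public static bool TryGetMultiplier(int value, out double multiplier) =>
        MultiplierByRangeStart.TryGetValue(ReduceToKey(value), out multiplier);
}
```

If the real ranges are not all the same width, this reduction does not apply and a list of (low, high) rows is the simpler fit.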
Create a class to represent each row. It would have Name, RangeLow, RangeHigh and Multiplier properties. Create a list of such rows (read from a file or entered in the code), and then use LINQ to query it:
from r in LookupTable
where r.RangeLow <= x && r.RangeHigh >= x
select r.Multiplier;
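Filled out, that approach might look something like this. The LookupRow and LookupTable names and the second row are assumptions; ranges are treated as inclusive on both ends, matching the query above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the lookup table.
public class LookupRow
{
    public string Name { get; set; }
    public int RangeLow { get; set; }
    public int RangeHigh { get; set; }
    public double Multiplier { get; set; }
}

public static class LookupTable
{
    public static readonly List<LookupRow> Rows = new List<LookupRow>
    {
        new LookupRow { Name = "Active", RangeLow = 10, RangeHigh = 20, Multiplier = 0.5 },
        // Hypothetical second row, for illustration only.
        new LookupRow { Name = "Other", RangeLow = 21, RangeHigh = 30, Multiplier = 1.0 }
    };

    // Returns the multiplier of the first row whose range contains x, or null if none does.
    public static double? FindByValue(int x) =>
        Rows.Where(r => r.RangeLow <= x && r.RangeHigh >= x)
            .Select(r => (double?)r.Multiplier)
            .FirstOrDefault();

    // Returns the multiplier of the row with the given name, or null if there is none.
    public static double? FindByName(string name) =>
        Rows.Where(r => string.Equals(r.Name, name, StringComparison.Ordinal))
            .Select(r => (double?)r.Multiplier)
            .FirstOrDefault();
}
```

With only 10-15 rows, a linear scan like this is likely fast enough that a dictionary is not needed.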
Sometimes simplicity is best. How many entries are we looking at, and are the ranges integer ranges as you seem to imply in your example? While there are several approaches I can think of, the first one that comes to mind is to maintain two different lookup dictionaries, one for the name and one for the value (range) and then just store redundant info in the range dictionary. Of course, if your range is keyed by doubles, or your range goes into the tens of thousands I'd look for something different, but simplicity rules in my book.
I would implement this using a DataTable, assuming there was no pressing reason to use another datatype. DataTable.Select would work fine for running a lookup on Name or Range. You do lose some performance using a DataTable for this, but with 10-15 records, would it matter that much?
