I have a history of recorded combinations from a deck of 52 cards, and I want to find the most frequently occurring combinations in those history records.
I could iterate over each card/number and check for its combinations in the history, but that is not an efficient algorithm, so I am looking for the usual efficient way of finding frequent combinations in a history.
Let's say I have the history in a table:
AKJ
JKQ
AKK
AJJ
A123
AKJ
AKQ
A234
AKQ
AKQ
AKQ
and so on for all the cards.
Now I want to get the most frequent combination for A; in the history above that is AKQ, which occurs 4 times.
Similarly for all other cards.
I have tried iterating over the history for each card, collecting its combinations into a list and counting them, but this is inefficient.
I want to know the best pattern-matching approach for this. Do I need to use sets, and if so, how?
From what I understand, you are trying to get the count of the combination which is most frequent for a particular card/number.
And a combination would match a number if the number is present anywhere in the combination.
I think to answer this type of question efficiently for a growing history, you can use a graph to model the history.
The graph would be bipartite (nodes of 2 types). One type of nodes would be each single number (13 of them) and the other type of nodes would be the unique combinations present in the history.
Each combination node would be connected to all the numbers present in the combination. Each combination node would also have a count of its frequency.
In your example, node A would be connected to AKJ(2 times), AKK(1 time), AJJ(1 time), A123(1 time), A234(1 time) and AKQ(4 times).
Whenever a combination occurs, you can check whether it is already present in the graph (using a hash table). If present, increment its count. Else, add a new combination node for it and connect it to the numbers present in it.
Using this graph, you can query the combinations matching any number by iterating through its neighbors and finding the neighbor (combination) with the highest frequency. So for A, among its neighbors, the most frequent one is AKQ.
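A minimal sketch of that idea in C# (all names are illustrative; the counts live in one dictionary, the edges in another):

using System;
using System.Collections.Generic;
using System.Linq;

class CombinationHistory
{
    // combination -> frequency (the "count" stored on each combination node)
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();
    // card -> combinations containing it (the edges of the bipartite graph)
    private readonly Dictionary<char, HashSet<string>> _byCard = new Dictionary<char, HashSet<string>>();

    public void Record(string combination)
    {
        if (_counts.ContainsKey(combination))
        {
            _counts[combination]++;               // existing node: just bump its count
            return;
        }
        _counts[combination] = 1;                 // new node: connect it to its cards
        foreach (char card in combination.Distinct())
        {
            if (!_byCard.TryGetValue(card, out var set))
                _byCard[card] = set = new HashSet<string>();
            set.Add(combination);
        }
    }

    // Most frequent combination containing the given card, e.g. 'A' -> "AKQ".
    public string MostFrequentFor(char card) =>
        _byCard.TryGetValue(card, out var set)
            ? set.OrderByDescending(c => _counts[c]).First()
            : null;
}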
If I understand your question right, you want to get the count of usages of each card in your history.
What you can do is either:
iterate through the possible cards and find all their occurrences in the history,
or
iterate through the history and count occurrences for each card.
To gain performance here, you can compare the current maximum count with the number of entries still left in the history: once the leader can no longer be overtaken, and you only want to know which card wins, you are finished.
If you want the exact count, you have to iterate through all of the entries.
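A minimal sketch of the second option (assuming history is an IEnumerable<string> of combinations and using System.Linq is in scope):

var usage = new Dictionary<char, int>();      // card -> number of appearances
foreach (string combination in history)
    foreach (char card in combination)
    {
        usage.TryGetValue(card, out int n);   // n stays 0 for unseen cards
        usage[card] = n + 1;
    }
char mostUsed = usage.OrderByDescending(kv => kv.Value).First().Key;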
I'm looking for help on a problem which I don't know how to deal with. I'm guessing similar questions have already been asked, but I couldn't google it the right way.
What I'm trying to do is a randomizer for the boards in FFXII in C# and there's a part of the problem I don't know how to solve: the randomization.
I'm simplifying a bit here: there are 12 boards containing licenses that you can unlock to equip stuff or use magic. Board spots may be empty and no single license may appear twice on one board, but licenses can occur several times if they are on different boards. Each board also has a different number of licenses. There's a total of 1626 licenses on the boards, with the number of unique licenses being around 350. I have a list of all licenses, along with the number of times they occur in the original board setup. (The one you get if you play the game normally.)
I would like help with generating 12 random license lists of predetermined size, without duplicates within a list, from the multiset of license occurrences in the original game. What I'm specifically worried about is that the algorithm might get stuck in a state where there are more duplicate elements left than there are sets with room for those elements. The total size of the 12 lists is equal to the number of elements in the multiset, of course. (I'll place them on the board myself, that is not too difficult.)
I have a list that I divided into a fixed number of sections (with some that might be empty).
Every section contains unordered elements; the sections themselves, however, are ordered backwards.
I am referencing the beginning of each section through a fixed-size array, whose elements are the indexes at which each section begins in the list.
I regularly extract the whole section at the tail of the list. When I do so, I set its index inside the array to 0 (so the section will start to regrow from the head of the list) and then circularly increment the lastSection variable that I use to keep track of which section is at the tail of the list.
With the same frequency I also need to insert back into the list some new elements that will be spread across one or more sections.
I chose a single sectioned list (instead of a list of lists or something like that) because, even though the section sizes vary a lot (from empty to some thousands of elements), the total number of elements varies little during the application runtime, AND because I also frequently need to get all the elements in the list and didn't want to concatenate multiple lists to get that result.
Graphical representation of the data structure
Existential question:
Up to this point, have I made any mistakes in the choice of the data structure, given that the operations described above are all I do with it?
Going forward:
The problem I am trying to address, since this is the core of the application I am building (and I want to squeeze out every bit of performance I can, since it should run on smartphones), is: how can I do those multiple inserts as fast as possible?
Trivial solution:
For each new group of elements belonging to a certain section, just do an insertRange (sectionBeginning, groupOfElements).
Performance footprint:
every insertRange forces the list to shift all the content after the beginning of a section to the right, and with multiple insertRanges this means that some data may be shifted up to M times, where M is the number of insertRange calls done with index != list.Count.
Little smarter solution:
Knowing before every multiple-inserts step which and how many new elements per section I need to add, I can add empty elements to the back of the list, perform M shifts of determined size, then copy the new elements to the corresponding "holes" left inside the list.
I could extend the list class and implement a new insertRange(int[] index, IEnumerable[] collection) where each index points to the beginning of a section. However, I am worried about possible internal optimizations of the list class that my hand-rolled for-loop shifts would lose, like an Array.Copy that I don't think I can access. Is there a way to do a performant list shift in order to implement this and gain an advantage over multiple standard insertRanges? (A sketch of the idea follows the image link below.)
Note: index and collections should be ordered by section.
Graphical representation of the multiple-at-once insertRange approach
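A minimal sketch of that multiple-at-once insertRange (a hypothetical helper; it assumes inserts is sorted ascending by index, that indexes refer to the list before insertion, and that using System.Linq is in scope):

static void InsertRangeMulti<T>(List<T> list, (int Index, IReadOnlyList<T> Items)[] inserts)
{
    int oldCount = list.Count;
    int total = inserts.Sum(g => g.Items.Count);
    // grow once; every placeholder slot is overwritten below
    list.AddRange(Enumerable.Repeat(default(T), total));

    int shift = total;          // how far the current block moves right
    int blockEnd = oldCount;    // exclusive end of the block being moved
    for (int g = inserts.Length - 1; g >= 0; g--)
    {
        var (index, items) = inserts[g];
        // shift the block [index, blockEnd) right by 'shift', back to front
        for (int i = blockEnd - 1; i >= index; i--)
            list[i + shift] = list[i];
        // drop the new items into the hole that just opened
        shift -= items.Count;
        for (int i = 0; i < items.Count; i++)
            list[index + shift + i] = items[i];
        blockEnd = index;
    }
}

With this, each existing element is copied at most once instead of up to M times with repeated insertRange calls.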
Another similar thread about insertRange:
Replace multiple InsertRange() into efficient way
Another similar thread about shifts in lists:
Does code exist, for shifting List elements to left or right by specified amount, in C#?
I apologize if there is a similar question already out there. There
are several questions about scoring hands but I don't need that.
The project I am working on takes in 10 cards and needs to report the
best possible 5-card hand found ("straight", "high card", "flush"
etc.). Luckily what the actual hand of cards is is irrelevant, I just
need a name.
I've already parsed and sorted all the cards out and have the tests
for all the possible hands laid out. All I need now is a convenient
way to store the hands. My mad method is as follows, in pseudocode
terms:
I want to have a dynamic list horizontally that I can populate with the NUMBER values of the cards, in order from highest to lowest. For
example, "Q J T 7 4 2 1". T is 10. Duplicates of values will be
ignored. Next, I want each of those values to have, underneath, a
list of the suits of each value that exist in the deck. For example,
J will have a sub-list with the values "D H" to represent that I have
a Jack of Diamonds and a Jack of Hearts.
I believe this to be the most elegant way to deal with these cards,
since most poker hands deal with only values and this way I don't have
to worry about cards of the same value in a row for say the straight
test. Then the two tests that do deal with suit can easily be
tested for by referring to the values under the keys.
Take a deep breath, almost there.
So an instance of Lookup appears to be perfect! It has the exact "one
key to multiple values" structure that I want. However, it doesn't
allow me to add the suits as I come to them. I have to add them all
at once or not at all since the lists are immutable after entry.
So I either:
have to find all the suits at once before I even make the Lookup,
somehow add values to the Lookup lists, or
use something else.
Any ideas on any of these?
UPDATE
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
IMPORTANT NOTE: The output of this program should be a string containing the name of the highest hand possible, for example "four of a kind", "two pair" or "high card".
I found one solution (which I unfortunately lost the link to and can't find again) where they suggested re-creating the entire Lookup with the new list. It may just be me but I find that solution to be very... ugly... Anyway several other solutions I have explored or tested are to:
METHOD 1
Roll through and populate another array with the suits associated with each value. Basically (in actual pseudocode this time >_>):
Create an array of ArrayLists (array 1)
Iterate through a sorted string array of "cards" (array 2)
For each card:
Take the char at string index [1] (representing the suit) and add into the ArrayList in array 1 at the index number extracted from string index [0].
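A minimal C# sketch of that pseudocode (illustrative names; List<char> instead of ArrayList, reusing a getValue helper like the one in METHOD 2 below):

// index 0-12 by card value; each slot holds the suits seen for that value
var suitsByValue = new List<char>[13];          // "array 1"
foreach (string card in cardsArray)             // "array 2", e.g. "JD", "JH", "2S"
{
    int value = getValue(card);                 // value parsed from the card string
    if (suitsByValue[value] == null)
        suitsByValue[value] = new List<char>();
    suitsByValue[value].Add(card[1]);           // the char at index 1 is the suit
}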
This way I have the list of values with associated suits that I wanted, and the list of suits is the minimal size to boot, making later iteration easier. With some extra steps I can even make the umbrella array an ArrayList and populate it with the card values in order so there are no gaps and no duplicate numbers. This leaves me with a jagged array of what I want.

To be clear, this is not a homework assignment. However, it IS from a coding class project my roommate completed in the past, which is why I have the constraints and requirements I have. Someone else I asked told me SE gets plagued by these kinds of homework questions around this time, so I understand your skepticism. This is a personal project because I want to learn C# (all I know is Java right now, and I like the sound of reference parameter passing in C# methods, which Java does not do).
If it WERE for a grade I would end there because it works. But I don't really like arraylists of arraylists, they seem messy to me. So I want to know if there is another method.
METHOD 2
I also considered simply dealing with duplicates that inevitably appear. For example, here is my test for a straight:
for (int i = 0; i < 5; i++)
{
    int counter = 0;
    for (int j = i; j < i + 4; j++)
    {
        int secondCard = getValue(cardsArray[j + 1]);
        int firstCard = getValue(cardsArray[j]);
        if (secondCard == firstCard)
        {
            break;
        }
        if (secondCard == (firstCard + 1))
        {
            counter++;
            if (counter == 4)
            {
                isStraight = true;
                return "straight";
            }
        }
    }
}
This code does not work. It needs some tweaks somewhere or other to work completely but I want to analyze if it is worth it before I try to fix it. It DOES accurately test for a straight, though. Also a couple notes: firstCard and secondCard are there for readability and debug purposes, and isStraight is there so that I don't reinvent the wheel later when I test for a straight flush.
This nested loop will iterate through all the cards up to the 5th card (since you can't start a straight any later than that among ten sorted cards) and then check the next five cards as you would expect. If during this iteration I encounter a duplicate entry, it means it's the same value in another suit and I simply "break". What SHOULD happen as a result of this one statement is that the second iteration moves on by one to check the next card instead of the current one. The count of in-order cards stays the same, so a list like "1D 2D 3S 3H 4D 5C" will skip over the second 3 when finding the straight. Despite the break I was actually quite pleased with the elegance of this solution, whether I had a right to be or not.
It all goes back to the flaws of using a simple array of strings ("cards"), which is what my code is tailored to right now. And I hate fixing issues, I'd rather avoid them. Maybe I'm being unnecessarily picky but I'm learning along the way.
METHOD 3
My consideration of the weaknesses of an array of strings led me to Dictionaries, which looked attractive. A Dictionary can easily be made to hold my values in order, and makes it easy to find whether I have a certain suit for a key (TryGetValue), all in a neat, tailor-made package. Creating multiple array lists and doing things like "(find index of my value); array1[index].Add(value)" would be replaced by "Dictionary.Add(value, suit)".

But I can only add a suit to a key at the point of creation. I couldn't make a "2" key, add "S", and then, when I find out the next card is a "2D", add a D under the "2" key. Dictionary just doesn't support that, or even adding multiple values at all. I can make a dictionary of lists, but I still can't edit the list since Dictionaries are mostly query data structures. Lookups support multiple values per key but still cannot be changed after the initial Add().

Again, I could "re-create" the entire Lookup or Dictionary to add a suit and keep everything in order. But to me that seems like rebuilding the whole bridge because one cable is too long and I don't have an industrial cable cutter. It's a problem that SHOULD have an easier solution, like maybe going and GETTING some cutters (importing a class, maybe?).
CONCLUSION
Your suggestion that my needs are no different from what a hand-scoring system could deliver leads me to another question:
Are hand scores directly tied to certain hands? Like I mentioned earlier the result I want is "The best hand you can make is a full house" not "This player has the highest hand." So can I calculate the highest scoring hand and extrapolate a "full house" from that score? If so then I guess this is all unnecessary code, but I would kind of like to solve this anyway in that case.
As I wrote this edit it dawned on me that this is basically a vanity issue. I don't "like" the solution I have. I also don't want to use the accepted solution (table lookup) because that is not a coding project, that is a copy-paste project. I would greatly appreciate any input.
Let's do this the simplest way possible. First, say you have an array of 10 strings, each of which is the name of a card. Like "Four of Hearts" and "Queen of Spades". That's really inconvenient to work with. So the first thing we do is convert those strings to numbers to represent each card. A very convenient way to do it is to use numbers 0-12 for hearts, 13-25 for diamonds, etc. So you have code (possibly a lookup table) that converts names to numbers:
Ace of Hearts = 0
Two of Hearts = 1
Three of Hearts = 2
...
Queen of Hearts = 11
King of Hearts = 12
Ace of Diamonds = 13
...
...
Ace of Clubs = 26
...
...
King of Spades = 51
So you have an array of numbers that represents the 10 cards. Call it cardsArray:
int[] cardsArray = new int[10];
// here, fill the cards array from the input
It's easy to check for flushes if you sort by suit and value. Remember, there are only 10 cards, so sorting isn't going to take a huge amount of time. The sort is really easy:
int[] sortedBySuit =
    cardsArray
        .OrderBy(x => x / 13)             // sorts by suit
        .ThenByDescending(x => x % 13)    // then by value, descending
        .ToArray();
You can then go sequentially through the array and determine if you have a flush, straight flush, and what the high card in the flush (if any) is.
You have to save that information, because four-of-a-kind beats a flush, for example. So you need to check that, too.
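A minimal sketch of that sequential scan (my names, not a definitive implementation; it ignores the corner case of two different 5-card suits among the ten cards):

bool isFlush = false;
int flushHigh = -1;                             // 0..12 value of the flush's top card
int run = 1;
for (int i = 1; i < sortedBySuit.Length; i++)
{
    // same suit as the previous card extends the run, otherwise restart it
    run = (sortedBySuit[i] / 13 == sortedBySuit[i - 1] / 13) ? run + 1 : 1;
    if (run >= 5)
    {
        isFlush = true;
        flushHigh = sortedBySuit[i - 4] % 13;   // values descend within a suit block
        break;
    }
}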
Next, sort by value:
int[] sortedByValue =
    cardsArray
        .OrderByDescending(x => x % 13)
        .ToArray();
Now you can go sequentially through that list to determine high card, pairs, three of a kind, four of a kind, or straights. As you find each type of hand, you save that hand information ("king high straight" or "three tens", along with the hand's rank: 1 for high card, 2 for pair, and so on up through straight, flush, full house, etc. in the proper order) to a list.
Then you just pick the hand with the highest value from those that you found.
That's definitely not the fastest way to do things, but it's simple, uses very little memory, and is fast enough for a prototype. It's certainly simpler than using a dictionary or array of arrays, etc.
To be clear, I didn't read your novel.
TL;DR SPARKNOTES VERSION: How can I add more values to the keys inside of a Lookup?
Usually, when I need a "dictionary" with a key that has multiple values, I use a List<KeyValuePair<string, int>>. You could use LINQ to Objects to select all the values. For example:
static void StackOverflowExample()
{
    var cardList = new List<KeyValuePair<string, int>>()
    {
        new KeyValuePair<string, int>("Club", 8),
        new KeyValuePair<string, int>("Spade", 9),
        new KeyValuePair<string, int>("Heart", 10)
    };
    var results = cardList.Where(p => p.Key == "Heart");
}
The variable results is an IEnumerable<KeyValuePair<string, int>>.
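And, unlike a Lookup, nothing is frozen after creation, so you can keep appending pairs as you discover suits (continuing the example above):

// a newly seen Heart can be added at any time
cardList.Add(new KeyValuePair<string, int>("Heart", 2));
var heartValues = cardList.Where(p => p.Key == "Heart").Select(p => p.Value);  // 10, 2

Hopefully, this helps.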
I have to write a program that compares 10'000'000+ Entities against one another. The entities are basically flat rows in a database/csv file.
The comparison algorithm has to be pretty flexible, it's based on a rule engine where the end user enters rules and each entity is matched against every other entity.
I'm thinking about how I could possibly split this task into smaller workloads but I haven't found anything yet. Since the rules are entered by the end user pre-sorting the DataSet seems impossible.
What I'm trying to do now is fit the entire DataSet in memory and process each item. But that's not highly efficient and requires approx. 20 GB of memory (compressed).
Do you have an idea how I could split the workload or reduce its size?
Thanks
If your rules are at the highest level of abstraction (e.g. any unknown comparison function), you can't achieve your goal. 10^14 comparison operations will run for ages.
If the rules are not completely general I see 3 solutions to optimize different cases:
if the comparison is transitive and you can calculate a hash (somebody already recommended this), do it. Hashes can be as sophisticated as your rules =). Find a good hash function and it might help in many cases.
if entities are sortable, sort them. For this purpose I'd recommend not sorting in-place but building an array of indexes (or IDs) of the items. If your comparison can be transformed to SQL (as I understand, your data is in a database), you can perform this on the DBMS side more efficiently and read back the sorted indexes (for example 3,1,2, meaning that the item with ID=3 is the lowest, ID=1 is in the middle and ID=2 is the largest). Then you need to compare only adjacent elements.
if neither of those applies, I would try heuristic sorting or hashing. I mean, I would create a hash which does not necessarily uniquely identify equal elements, but can split your dataset into groups between which there is definitely no pair of equal elements. Then all equal pairs will be inside the groups, and you can read the groups one by one and do the expensive comparison within a group of, say, 100 elements rather than 10,000,000. The other sub-approach is heuristic sorting, with the same purpose of guaranteeing that equal elements don't end up at opposite ends of the dataset. After that you can read elements one by one and compare each with, for example, the 1000 previous elements (already read and kept in memory). I would keep, say, 1100 elements in memory and free the oldest 100 every time a new 100 arrives; this optimizes your DB reads. Another implementation of this is possible if your rules contain clauses like (Attribute1 = Value1) AND (...), or (Attribute1 < Value2) AND (...), or any other simple rule: you can cluster first by those criteria and then compare items within the created clusters.
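A minimal sketch of that grouping idea (entities is the in-memory dataset; CoarseKey, RulesMatch and ReportPair are hypothetical placeholders for the user-defined rules):

// a coarse key guarantees equal entities land in the same group, so the
// expensive rule engine only runs inside each small group
var groups = entities.GroupBy(e => CoarseKey(e));
foreach (var group in groups)
{
    var items = group.ToList();                 // ideally ~100 items, not 10^7
    for (int i = 0; i < items.Count; i++)
        for (int j = i + 1; j < items.Count; j++)
            if (RulesMatch(items[i], items[j]))
                ReportPair(items[i], items[j]);
}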
By the way, what if your rule considers all 10,000,000 elements equal? Would you like to get 10^14 result pairs? This case proves that you can't solve this task in the general case. Try making some limitations and assumptions.
I would try to think about rule hierarchy.
Let's say for example that rule A is "Color" and rule B is "Shape".
If you first divide objects by color,
then there is no need to compare a red circle with a blue triangle.
This will reduce the number of comparisons you will need to do.
I would create a hash code from each entity. You probably have to exclude the ID from the hash generation and then test for equality. Once you have the hashes, you can sort all the hash codes; having all entities in order makes it pretty easy to check for duplicates.
If you want to compare each entity with every other entity, then effectively you need to cluster the data; there is rarely a reason to compare totally unrelated things (comparing Clothes with Human does not make sense), and I think your rules will effectively cluster the data.
So you need to cluster the data; try some clustering algorithms like K-Means.
Also see Apache Mahout.
Are you looking for the most suitable sorting algorithm, of a kind, for this?
I think Divide and Conquer seems good.
If the algorithm fits, you have plenty of other ways to do the calculation, especially parallel processing using MPICH or something similar, which may get you to your final destination.
But before deciding how to execute, you have to check whether the algorithm fits first.
I have a list of recipes obtained from a database that looks like this:
List<RecipeNode> _recipeList;
RecipeNode, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others).
public sealed class RecipeNode
{
public Guid RecipeId;
public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
//... More stuff
}
Finding a random recipe from _recipeList in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the Tags in O(1).
Right now, my only idea is to make an array of List<RecipeNodes>, keyed by tag. For example:
List<RecipeNode>[] _recipeListByTag;
Then, _recipeListByTag[5] would contain a list of all the recipes that have a 5 in the Tags array. I could then choose a random allowed tag and a random recipe within that tag in O(1).
The drawback of this approach is that the size of this structure would be Recipes * Tags (e.g., the sum of Tags.Length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since RecipeNode is a reference type, I'm only repeating the 4-byte references to the recipes, so this still might be the best way to go.
Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!
List<RecipeNode>[] _recipeListByTag is probably the best approach for you, and its size is not Recipes * Tags because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become Recipes * Tags if every single recipe contained every single tag.
If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.
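A minimal sketch of that structure (following the names in the question; the 256 comes from Tags being bytes):

var byTag = new List<RecipeNode>[256];
foreach (var recipe in _recipeList)
    foreach (byte tag in recipe.Tags)
    {
        if (byTag[tag] == null)
            byTag[tag] = new List<RecipeNode>();
        byTag[tag].Add(recipe);
    }
foreach (var list in byTag)
    if (list != null)
        list.TrimExcess();                      // reclaim over-allocated capacity

// O(1) random pick among recipes tagged 5 (assumes the tag occurs at least once)
var rng = new Random();
var withTag5 = byTag[5];
var randomRecipe = withTag5[rng.Next(withTag5.Count)];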
Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer.
However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1).
UPDATE:
Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe.
You can SELECT a random record in SQL Server this way: SQL Server Random Sort. If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm. Just make sure your WHERE clause has Tag=17 in it.
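For example, a minimal ADO.NET sketch (assuming System.Data.SqlClient; the table and column names are assumptions, and ORDER BY NEWID() is the SQL Server random-sort trick from the link):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT TOP 1 RecipeId FROM Recipes WHERE Tag = @tag ORDER BY NEWID()", conn))
{
    cmd.Parameters.AddWithValue("@tag", 17);
    conn.Open();
    object result = cmd.ExecuteScalar();        // a random matching RecipeId, or null
}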
If you want to keep the data in memory, you won't do much better than a list of (4 byte) pointers for each tag. If you can use a DB... well, others have already posted about that. Depending on how huge is "huge", you might just fork out some $$$ to add RAM to the target machine.
If you do want to keep the data in memory, but want to be ridiculously parsimonious with memory, you could try to squeeze down the 4 bytes per tag-recipe combination. For example, keep all the recipes in a big array, and (assuming you won't have more than about a million) store array indexes in 3 bytes each.
To go even further, you could divide the recipes with a given tag into equally-sized "buckets" (each occupying a contiguous area of the big array), store a starting index for each "bucket" (in 3-4 bytes), and then store a list of delta values between indexes of consecutive recipes with the given tag. Encode the delta values using an array of bytes, in such a way that a single delta value can use anything from 1-4 bytes, as required.
Because the number of recipes in a "bucket" will be limited to a constant number, retrieval using this approach is still O(1).
(I have done embedded programming on micros with as little as 256 bytes of RAM... when you do that you start thinking of very creative ways to save bytes or even bits. I'm sure that going to such lengths will not be necessary in your application, but I thought this was an interesting idea.)
I would export references to all suitable elements from the source list into another list, make the random choice there, and then take the element from the source list according to the chosen reference.
If there is a possibility that you could reuse the same derived list, keep such lists in a larger list of them.
(Of course, the choice of algorithm depends on the real statistics of your list.)
If you use only one parameter, you could order your list by this parameter and remember, in another list B, the indexes at which the elements with each parameter value start. Later you could simply pick a random index in the interval [B[4], B[5]-1]. This makes a random pick O(1).
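A minimal sketch of that single-parameter layout (PrimaryTag is a hypothetical single-tag property; assumes the queried tag occurs at least once):

var sorted = _recipeList.OrderBy(r => r.PrimaryTag).ToList();
var start = new int[257];                 // start[t] = first index with tag >= t
for (int t = 0, i = 0; t <= 256; t++)
{
    while (i < sorted.Count && sorted[i].PrimaryTag < t)
        i++;
    start[t] = i;
}

// O(1) random pick among recipes whose tag is exactly 4
var rng = new Random();
int lo = start[4], hi = start[5];         // half-open interval [lo, hi)
var pick = sorted[lo + rng.Next(hi - lo)];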
In this case I would personally go for an SQLite solution (as I personally know it better than the alternatives). I see that you worry about space rather than raw speed, but in terms of constant retrieval time, and of flexibility of data access, SQLite is imo the way to go.
Design your small DB the way you like and execute queries and joins the way you want.
This is an old but still valid example of how you can use it.
There is also, naturally, an ORM solution (for example a LINQ driver), but to me personally it seems like overhead.
Hope this helps.