Best way to store / retrieve bits C# [duplicate] - c#

This question already has answers here:
Best way to store long binary (up to 512 bit) in C#
(5 answers)
Closed 9 years ago.
I am modifying an existing C# solution in which data is validated and the status is stored as follows:
a) A given record is validated against a certain number of conditions (say 5). Passed/failed status is represented by a bit value (0 - passed; 1 - failed).
b) So, if a record failed all 5 validations, the value will be 11111. This is converted to a decimal number and stored in a DB.
Later, this decimal value is converted back to binary (using the bitwise & operator), which is used to show the passed/failed records.
The issue is that the long datatype is used in C# to handle this value, and the decimal datatype in SQL Server 2008 to store it. Since a long is only 64 bits wide, the validation count is currently restricted to 64.
My requirement is to remove this limit and allow any number of validations.
How do I store a large number of bits and retrieve them again? Also, please keep in mind that this is an existing (.NET 2.0) solution; I can't afford to upgrade or use any third-party libraries, and changes must be minimal.
Latest update
Yes, this solution seems to be OK from an application perspective, i.e. if only I (a.k.a. the present solution) were using C# alone. However, the designers of the existing solution made things complicated by storing the binary value (11111 means all 5 validations failed, 10111 means all but the 4th failed, and so on) converted to decimal in the SQL Server DB. A stored procedure takes this value to arrive at the number of records that failed each validation.
OPEN sValidateCUR
FETCH NEXT FROM sValidateCUR INTO @ValidationOID, @ValidationBit, @ValidationType
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Fetch the error record count
    SET @nBitVal = ABS(RPT.fGetPowerValue(@ValidationBit)) -- @ValidationBit is the number of a type of validation, say 60; the first time the loop runs, @ValidationBit will be 0
    SELECT @ErrorRecordCount = COUNT(1)
    FROM <<Error_Table_where_flags_are_available_as_decimal_values>> WITH (NOLOCK)
    WHERE ExpressionValidationFlags & CAST(CAST(@nBitVal AS VARCHAR(20)) AS BIGINT) = CAST(@nBitVal AS VARCHAR(20)) -- For @ValidationBit = 3, @nBitVal = 2^3 = 8
Now, in the application, using BitArray, I managed to store the passed/failed records, converted the BitArray to a byte[], and stored that in SQL Server as VARBINARY(100) (the same column, ExpressionValidationFlags, which was earlier BIGINT, is now VARBINARY and holds the byte array). However, to complete my changes, I need to modify the stored procedure above.
Again, looking forward to your help!
Thanks

Why not use the specially designed BitArray class?
https://msdn.microsoft.com/en-us/library/system.collections.bitarray.aspx
e.g.
BitArray array = new BitArray(150); // <- up to 150 bits
...
array[140] = true; // <- set 140th bit
array[130] = false; // <- reset 130th bit
...
if (array[120]) { // <- if 120th bit is set
...

There are several ways to go about this, based on the limitations of the database you are using.
If you are able to store byte arrays within the database, you can use the BitArray class. You can pass the constructor a byte array, use it to easily check and set each bit by index, and then use its built-in CopyTo method to copy it back out into a byte array.
Example:
byte[] statusBytes = yourDatabase.Get("passed_failed_bits");
BitArray statusBits = new BitArray(statusBytes);
...
statusBits[65] = false;
statusBits[66] = true;
...
statusBits.CopyTo(statusBytes, 0);
yourDatabase.Set("passed_failed_bits", statusBytes);
If the database is unable to deal with raw byte arrays, you can always encode the byte array as a hex string:
string hex = BitConverter.ToString(statusBytes);
hex = hex.Replace("-", ""); // Replace returns a new string, so the result must be assigned
and then get it back into a byte array again:
int numberChars = hex.Length;
byte[] statusBytes= new byte[numberChars / 2];
for (int i = 0; i < numberChars; i += 2) {
statusBytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
}
And if you can't even store strings, there are more creative ways to turn the byte array into multiple longs or doubles.
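For example, here is a minimal hedged sketch of chunking the flag bytes into 64-bit integers (the helper name ToLongs is mine, not from the original answer):
static long[] ToLongs(byte[] bytes)
{
    int count = (bytes.Length + 7) / 8;      // number of 8-byte chunks needed
    byte[] padded = new byte[count * 8];     // zero-pad the tail so BitConverter always has 8 bytes
    Array.Copy(bytes, padded, bytes.Length);
    long[] result = new long[count];
    for (int i = 0; i < count; i++)
        result[i] = BitConverter.ToInt64(padded, i * 8);
    return result;
}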
Also, if space efficiency is an issue, there are other, more efficient (but more complicated) ways to encode bytes as ASCII text by using more of the character range without using control characters. You may also want to look into run-length encoding the byte array if you find the data stays the same value for long stretches.
Hope this helps!

Why not use a string instead? You could put a very large number of characters in the database (use VARCHAR and not NVARCHAR since you control the input).
Using your example, if you had "11111", you could skip bitwise operations and just do things like this:
string myBits = "11111";
bool failedPosition0 = myBits[0] == '1';
bool failedPosition1 = myBits[1] == '1';
bool failedPosition2 = myBits[2] == '1';
bool failedPosition3 = myBits[3] == '1';
bool failedPosition4 = myBits[4] == '1';
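If a flag needs to change later, remember that strings are immutable; here is a small hedged sketch of rebuilding the string (reusing the myBits variable from the example above):
char[] chars = myBits.ToCharArray();
chars[2] = '0';              // mark the validation at position 2 as passed
myBits = new string(chars);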

Related

C# 4 bit data type [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Does C# have a 4-bit data type? I want to make a program with variables that waste the minimum amount of memory, because the program will consume a lot of it.
For example: I need to save a value that I know will only go from 0 to 10, and a 4-bit variable can go from 0 to 15, which is perfect. But the closest thing I found was the 8-bit (1 byte) Byte data type.
I have the idea of creating a C++ DLL with a custom data type, something like a nibble. But if that's the solution to my problem, I don't know where to start or what I have to do.
Limitations: creating a Byte and splitting it in two is NOT an option.
No, there is no such thing as a four-bit data type in C#.
Incidentally, four bits will only store a number from 0 to 15, so it doesn't sound like it is fit for purpose if you are storing values from 0 to 127. To compute the maximum value a variable with N bits can hold, use the formula 2^N - 1; for N = 4 that gives 2^4 - 1 = 15.
If you need to use a data type that is less than 8 bits in order to save space, you will need to use a packed binary format and special code to access it.
You could, for example, store two four-bit values in a byte using an AND mask plus a bit shift, e.g.
byte source = 0xAD;
var hiNybble = (source & 0xF0) >> 4; // Left-hand nybble = A
var loNybble = (source & 0x0F);      // Right-hand nybble = D
Or using integer division and modulus, which works well too but maybe isn't quite as readable:
var hiNybble = source / 16;
var loNybble = source % 16;
And of course you can use an extension method.
static class NybbleExtensions
{
    public static byte GetLowNybble(this byte input)
    {
        return (byte)(input % 16);   // cast needed: the % result is an int
    }

    public static byte GetHighNybble(this byte input)
    {
        return (byte)(input / 16);
    }
}
var hiNybble = source.GetHighNybble();
var loNybble = source.GetLowNybble();
Storing it is easier:
var source = (byte)(hiNybble * 16 + loNybble);
Updating just one nybble is harder:
source = (byte)((source & 0xF0) | loNybble);        // Update only the low four bits
source = (byte)((source & 0x0F) | (hiNybble << 4)); // Update only the high four bits
A 4-bit data type (a.k.a. a nibble) only goes from 0 to 15. It requires 7 bits to go from 0 to 127. You essentially need a byte.
No, C# does not have a 4-bit numeric data type. If you wish to pack 2 4-bit values in a single 8-bit byte, you will need to write the packing and unpacking code yourself.
No, even a boolean is 8 bits in size.
You can use >> and << operators to store and read two 4 bit values from one byte.
https://msdn.microsoft.com/en-us/library/a1sway8w.aspx
https://msdn.microsoft.com/en-us/library/xt18et0d.aspx
Depending on how many of your nibbles you need to handle and how much of an issue performance is over memory usage, you might want to have a look at the BitArray and BitVector32 classes. For passing around of values, you'd still need bigger types though.
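For instance, here is a minimal hedged sketch of packing two 4-bit values into a BitVector32 (the section and variable names are illustrative, not from the original answer):
using System;
using System.Collections.Specialized;

class NibbleDemo
{
    static void Main()
    {
        // Each section holds values 0..15, i.e. exactly four bits.
        BitVector32.Section low = BitVector32.CreateSection(15);
        BitVector32.Section high = BitVector32.CreateSection(15, low);

        BitVector32 packed = new BitVector32(0);
        packed[low] = 10;
        packed[high] = 3;

        Console.WriteLine(packed[low]);  // 10
        Console.WriteLine(packed[high]); // 3
        Console.WriteLine(packed.Data);  // 58 (0x3A), the packed integer
    }
}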
Yet another option could also be StructLayout fiddling, ... beware of dragons though.

integer conversion fails when a certain bit is set on

Most confused - we are trying to process an octet-stream binary file. We have various possible destination structs. The incoming file is a string of x bytes - a blob - which we understand we first need to convert to a byte array. We use a FOR loop to move a byte at a time into the byte array. Then, when we know the specific struct of the data - as defined by a fixed-position text field within the data - we use a deserialize routine specific to that struct. Character arrays use one deserialize function to populate string variables, integer fields populate other variables (generally UINT16s), and so on through the received data.
When we know we have an int16 (2-byte integer), processing fails if the high-order bit of the low-order byte is set. We don't know whether the 8 bits handled in the FOR loop are an integer, a char, or something else until after the blob has been moved to the byte array using the FOR loop (a standard
for (i=1, I <= blob_length, i++)
{ dest(i) = source(i); }
) and we have identified which struct is in play.
By the time we exit deserialize, we see the data is corrupted as follows:
so decimal 511 (binary 01 11111111) converts to decimal 256 (binary 01 00000000),
but decimal 383 (binary 01 01111111) converts correctly.
We cannot tell if the FOR loop processing is somehow unable to handle an 8-bit field when the high-order bit is on, or if the actual deserialize process for the UINT16 is failing. We have struggled through other ASCII-related issues where that 8th bit corrupts processing. We are not sure whether this is yet another of those, or something else.
Any insight or guidance would be gratefully appreciated.
Usually the indexes are 0-based and the for-loop should look like this:
for(int i = 0; i < blob_length; i++) {
dest[i] = source[i];
}
Probably you were one byte off.

Convert ten character classification string into four character one in C#

What's the best way to convert (to hash) a string like 3800290030, which represents a classification id, into a four-character one like 3450 (I need to support at most 9999 classes)? We will only have fewer than 1000 classes in the 10-character space, and it will never grow to more than 10k.
The hash needs to be unique and always the same for the same input.
The resulting string should be numeric (but it will be saved as char(4) in SQL Server).
I removed the requirement for reversibility.
This is my solution, please comment:
string classTIC = "3254002092";
MD5 md5Hasher = MD5.Create();
byte[] classHash = md5Hasher.ComputeHash(Encoding.Default.GetBytes(classTIC));
StringBuilder sBuilder = new StringBuilder();
foreach (byte b in classHash)
{
sBuilder.Append(b.ToString());
}
string newClass = (double.Parse(sBuilder.ToString())%9999 + 1).ToString();
You can do something like
str.GetHashCode() % 9999 + 1;
The hash can't be unique, since you have more than 9,999 strings.
It is not unique, so it cannot be reversible.
And of course my answer is wrong if you don't have more than 9,999 different 10-character classes.
In case you don't have more than 9,999 classes, you need a mapping from the string id to its 4-character representation - for example, save the strings in a list, and each string's key will be its index in the list.
When you want to reverse the process, and have no knowledge about the id's apart from that there are at most 9999 of them, I think you need to use a translation dictionary to map each id to its short version.
Even without the need to reverse the process, I don't think there is a way to guarantee unique ids without such a dictionary.
This short version could then simply be incremented by one with each new id.
You do not want a hash. Hashing by design allows for collisions. There is no possible hashing function for the kind of strings you work with that won't have collisions.
You need to build a persistent mapping table to convert the string to a number, logically similar to a Dictionary<string, int>. The first string you add gets number 0. When you need to map, look up the string and return its associated number. If it is not present, add the string and simply assign it a number equal to the count.
Making this mapping table persistent is what you'll need to think about. Trivially done with a database, of course.
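A minimal in-memory sketch of that idea (the class and method names are mine; a real version would persist the dictionary, e.g. in a table):
using System.Collections.Generic;

class ClassificationMap
{
    private readonly Dictionary<string, int> map = new Dictionary<string, int>();

    public int GetOrAdd(string classId)
    {
        int number;
        if (!map.TryGetValue(classId, out number))
        {
            number = map.Count;        // next free number: 0, 1, 2, ...
            map.Add(classId, number);
        }
        return number;
    }
}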
Unique is difficult: you have - per your request - 4 characters, that's a max of 9999, so collisions will occur.
A hash is not reversible; data is lost (obviously).
I think you might need to create and store a lookup table to be able to support your requirements. And in that case you don't even need a hash - you could just increment the last-used 4-digit lookup code.
use md5 or sha like:
string = substring(md5("05910395410"), 0, 4)   // pseudocode
or write your own simple method, for example:
int sum = 0;
foreach (char c in classTIC)   // classTIC is the 10-character id from the question
{
    sum += (int)c;
}
sum %= 9999;
Convert the number to base35/base36
ex: 3800290030 decimal = 22CGHK5 base-35 //length: 7
Or maybe convert to base-60 [ignoring capital O and small o so they are not confused with 0]
ex: 3800290030 decimal = 4tDw7A base-60 //length: 6
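For illustration, a small hedged sketch of converting a positive number to an arbitrary base (the digit string shown is base-36; the base-35/base-60 variants just use different digit strings):
static string ToBase(long value, string digits)
{
    string result = "";
    do
    {
        result = digits[(int)(value % digits.Length)] + result;
        value /= digits.Length;
    } while (value > 0);
    return result;
}
// e.g. ToBase(3800290030, "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ") for base-36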
Convert your int to binary and then Base64-encode it. It won't be numbers then, but it will be a reversible hash.
Edit:
As far as I can tell, you are asking for the impossible.
You cannot take totally random data and somehow reduce the amount of data it takes to encode it (some values might come out shorter, others longer), so your requirement that the number be unique is not possible; there has to be some data loss somewhere, and no matter how you do it, it won't ensure uniqueness.
Second, because of the above, it is also not possible to make it reversible, so that is out of the question.
Therefore, the only possible way I can see is if you have an enumerable data source, i.e. you know all the values prior to calculating the value. In that case you can simply assign them a sequential id.

Need a smaller alternative to GUID for DB ID but still unique and random for URL

I have looked all of the place for this and I can't seem to get a complete answer for this. So if the answer does already exist on stackoverflow then I apologize in advance.
I want a unique and random ID so that users in my website can't guess the next number and just hop to someone else's information. I plan to stick to a incrementing ID for the primary key but to also store a random and unique ID (sort of a hash) for that row in the DB and put an index on it.
From my searching I realize that I would like to avoid collisions and I have read some mentions of SHA1.
My basic requirements are
Something smaller than a GUID. (Looks horrible in URL)
Must be unique
Avoid collisions
Not a long list of strange characters that are unreadable.
An example of what I am looking for would be www.somesite.com/page.aspx?id=AF78FEB
I am not sure whether I should be implementing this in the database (I am using SQL Server 2005) or in the code (I am using C# ASP.Net)
EDIT:
From all the reading I have done I realize that this is security through obscurity. I do intend to have proper authorization and authentication for access to the pages, using .NET's authentication and authorization framework. But once a legitimate user has logged in and is accessing a legitimate (but dynamically created) page filled with links to items that belong to him, a link might be, for example, www.site.com/page.aspx?item_id=123. What is stopping him from clicking on that link and then altering the URL to www.site.com/page.aspx?item_id=456, which does NOT belong to him? I know some Java technologies like Struts (I stand to be corrected) store everything in the session and somehow work it out from that, but I have no idea how this is done.
Raymond Chen has a good article on why you shouldn't use "half a guid", and offers a suitable solution to generating your own "not quite guid but good enough" type value here:
GUIDs are globally unique, but substrings of GUIDs aren't
His strategy (without a specific implementation) was based on:
Four bits to encode the computer number,
56 bits for the timestamp, and
four bits as a uniquifier.
We can reduce the number of bits to make the computer unique since the number of computers in the cluster is bounded, and we can reduce the number of bits in the timestamp by assuming that the program won’t be in service 200 years from now.
You can get away with a four-bit uniquifier by assuming that the clock won’t drift more than an hour out of skew (say) and that the clock won’t reset more than sixteen times per hour.
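A hedged sketch of how those fields might be packed into a single 64-bit value (the method and parameter names are mine, not from the linked article):
static long MakeId(int machineNumber, long timestamp, int uniquifier)
{
    // 4 bits machine number + 56 bits timestamp + 4 bits uniquifier = 64 bits
    return ((long)(machineNumber & 0xF) << 60)
         | ((timestamp & 0x00FFFFFFFFFFFFFF) << 4)
         | (long)(uniquifier & 0xF);
}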
UPDATE (4 Feb 2017):
Walter Stabosz discovered a bug in the original code. Upon investigation there were further bugs discovered, however, extensive testing and reworking of the code by myself, the original author (CraigTP) has now fixed all of these issues. I've updated the code here with the correct working version, and you can also download a Visual Studio 2015 solution here which contains the "shortcode" generation code and a fairly comprehensive test suite to prove correctness.
One interesting mechanism I've used in the past is to internally just use an incrementing integer/long, but to "map" that integer to an alphanumeric "code".
Example
Console.WriteLine($"1371 as a shortcode is: {ShortCodes.LongToShortCode(1371)}");
Console.WriteLine($"12345 as a shortcode is: {ShortCodes.LongToShortCode(12345)}");
Console.WriteLine($"7422822196733609484 as a shortcode is: {ShortCodes.LongToShortCode(7422822196733609484)}");
Console.WriteLine($"abc as a long is: {ShortCodes.ShortCodeToLong("abc")}");
Console.WriteLine($"ir6 as a long is: {ShortCodes.ShortCodeToLong("ir6")}");
Console.WriteLine($"atnhb4evqqcyx as a long is: {ShortCodes.ShortCodeToLong("atnhb4evqqcyx")}");
// PLh7lX5fsEKqLgMrI9zCIA
Console.WriteLine(GuidToShortGuid( Guid.Parse("957bb83c-5f7e-42b0-aa2e-032b23dcc220") ) );
Code
The following code shows a simple class that will change a long to a "code" (and back again!):
public static class ShortCodes
{
    // You may change the "shortcodeKeyspace" string to contain as many or as few characters as you
    // please. The more characters that are included in "shortcodeKeyspace", the shorter
    // the codes you can produce for a given long.
    private static string shortcodeKeyspace = "abcdefghijklmnopqrstuvwxyz0123456789";

    public static string LongToShortCode(long number)
    {
        // Guard clause. If passed 0 as input
        // we always return an empty string.
        if (number == 0)
        {
            return string.Empty;
        }

        var keyspaceLength = shortcodeKeyspace.Length;
        var shortcodeResult = "";
        var numberToEncode = number;
        var i = 0;
        do
        {
            i++;
            var characterValue = numberToEncode % keyspaceLength == 0 ? keyspaceLength : numberToEncode % keyspaceLength;
            var indexer = (int)characterValue - 1;
            shortcodeResult = shortcodeKeyspace[indexer] + shortcodeResult;
            numberToEncode = ((numberToEncode - characterValue) / keyspaceLength);
        }
        while (numberToEncode != 0);
        return shortcodeResult;
    }

    public static long ShortCodeToLong(string shortcode)
    {
        var keyspaceLength = shortcodeKeyspace.Length;
        long shortcodeResult = 0;
        var shortcodeLength = shortcode.Length;
        var codeToDecode = shortcode;
        foreach (var character in codeToDecode)
        {
            shortcodeLength--;
            var codeChar = character;
            var codeCharIndex = shortcodeKeyspace.IndexOf(codeChar);
            if (codeCharIndex < 0)
            {
                // The character is not part of the keyspace, so the entire shortcode is invalid.
                return 0;
            }
            try
            {
                checked
                {
                    shortcodeResult += (codeCharIndex + 1) * (long)(Math.Pow(keyspaceLength, shortcodeLength));
                }
            }
            catch (OverflowException)
            {
                // We've overflowed the maximum size for a long (possibly the shortcode is invalid or too long).
                return 0;
            }
        }
        return shortcodeResult;
    }
}
This is essentially your own base-X numbering system (where X is the number of unique characters in the shortcodeKeyspace string).
To make things unpredictable, start your internal incrementing numbering at something other than 1 or 0 (e.g. start at 184723), and also change the order of the characters within shortcodeKeyspace (i.e. use the letters a-z and the numbers 0-9, but scramble their order within the string). This will help make each code somewhat unpredictable.
If you're using this to "protect" anything, it is still security by obscurity, and if a given user can observe enough of these generated codes, they can predict the relevant code for a given long. The "security" (if you can call it that) rests on the shortcodeKeyspace string being scrambled and remaining secret.
EDIT:
If you just want to generate a GUID, and transform it to something that is still unique, but contains a few less characters, this little function will do the trick:
public static string GuidToShortGuid(Guid gooid)
{
string encoded = Convert.ToBase64String(gooid.ToByteArray());
encoded = encoded.Replace("/", "_").Replace("+", "-");
return encoded.Substring(0, 22);
}
If you don't want other users to see people's information, why don't you secure the page on which you are using the id?
If you do that, then it won't matter if you use an incrementing id.
[In response to the edit]
You should consider query strings as "evil input". You need to programmatically check that the authenticated user is allowed to view the requested item.
if (!item456.BelongsTo(user123))
{
    // Either show them one of their items or show an error message.
}
You could randomly generate a number. Check that this number is not already in the DB and use it. If you want it to appear as a random string, you could just convert it to hexadecimal, so you get A-F in there, just like in your example.
A GUID is 128 bits. If you take these bits and represent them not with a character set of just 16 characters (16 = 2^4, and 128/4 = 32 characters) but with a character set of, say, 64 characters (like Base64), you would end up with only 22 characters (64 = 2^6, and 128/6 = 21.333, so 22 characters).
Take your auto-increment ID and HMAC-SHA1 it with a secret known only to you. This will generate a random-looking 160 bits that hide the real incremental ID. Then take a prefix of a length that makes collisions sufficiently unlikely for your application - say 64 bits, which you can encode in a relatively short string. Use this as your string.
HMAC will guarantee that no one can map from the bits shown back to the underlying number. By hashing an auto-increment ID, you can be pretty sure it will be unique, so your risk of collisions comes from the likelihood of a 64-bit partial collision in SHA1. With this method, you can determine in advance whether you will have any collisions, by pre-generating all the strings this method will produce (e.g. up to the number of rows you expect) and checking.
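A minimal hedged sketch of that approach (the key handling, 8-byte prefix, and hex encoding here are illustrative choices, not a prescribed implementation; hex makes the token 16 characters, so use Base64 or a shorter prefix if you want it smaller):
using System;
using System.Security.Cryptography;

static string ObscureId(long id, byte[] secretKey)
{
    using (HMACSHA1 hmac = new HMACSHA1(secretKey))
    {
        byte[] mac = hmac.ComputeHash(BitConverter.GetBytes(id));
        // Keep the first 8 bytes (64 bits) of the MAC and hex-encode them for the URL.
        return BitConverter.ToString(mac, 0, 8).Replace("-", "");
    }
}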
Of course, if you are willing to specify a unique condition on your database column, then simply generating a totally random number will work just as well. You just have to be careful about the source of randomness.
How long is too long? You could convert the GUID to Base 64, which ends up making it quite a bit shorter.
What you could do is something I do when I want exactly what you are wanting:
1. Create your GUID.
2. Remove the dashes, and take a substring of however long you want your ID to be.
3. Check the DB for that ID; if it exists, go to step 1.
4. Insert the record.
This is the simplest way to ensure it is obscured and unique.
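A small hedged sketch of that loop (IdExists stands in for a hypothetical lookup against your table):
string id;
do
{
    id = Guid.NewGuid().ToString("N").Substring(0, 7).ToUpper(); // "N" format = no dashes
} while (IdExists(id));   // retry until the substring is not already in the DB
// insert the record using "id" here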
I have just had an idea, and I see Greg also pointed it out: I have the user stored in the session with a user ID. When I create my query I will join on the Users table with that user ID; if the result set is empty, then we know he was hacking the URL, and I can redirect to an error page.
A GUID is just a number
The latest generation of GUIDs (version 4) is basically a big random number*
Because it's a big random number the chances of a collision are REALLY small.
The biggest number you can make with a GUID is over:
5,000,000,000,000,000,000,000,000,000,000,000,000
So if you generate two GUIDs the chance the second GUID is the same as the first is:
1 in 5,000,000,000,000,000,000,000,000,000,000,000,000
If you generate 100 BILLION GUIDs.
The chance your 100 billionth GUID collides with the other 99,999,999,999 GUIDs is:
1 in 50,000,000,000,000,000,000,000,000
Why 128 bits?
One reason is that computers like working with multiples of 8 bits.
8, 16, 32, 64, 128, etc
The other reason is that the guy who came up with the GUID felt 64 wasn't enough, and 256 was way too much.
Do you need 128 bits?
No, how many bits you need depends on how many numbers you expect to generate and how sure you want to be that they don't collide.
64-bit example
If you used 64 bits instead, the chance that your second number collides with the first would be:
1 in 18,000,000,000,000,000,000 (64 bit)
Instead of:
1 in 5,000,000,000,000,000,000,000,000,000,000,000,000 (128 bit)
What about the 100 billionth number?
The chance your 100 billionth number collides with the other 99,999,999,999 would be:
1 in 180,000,000 (64 bit)
Instead of:
1 in 50,000,000,000,000,000,000,000,000 (128 bit)
So should you use 64 bits?
That depends: are you generating 100 billion numbers? Even if you are, does 1 in 180,000,000 make you uncomfortable?
A little more details about GUIDs
I'm specifically talking about version 4.
Version 4 doesn't actually use all 128 bits for the random number portion; it uses 122 bits. The other 6 bits are used to indicate that it is version 4 of the GUID standard.
The numbers in this answer are based on 122 bits.
And yes since it's just a random number you can just take the number of bits you want from it. (Just make sure you don't take any of the 6 versioning bits that never change - see above).
Instead of taking bits from the GUID, though, you could use the same random number generator the GUID got its bits from.
It probably used the random number generator that comes with the operating system.
Late to the party but I found this to be the most reliable way to generate Base62 random strings in C#.
private static Random random = new Random();
void Main()
{
var s = RandomString(7);
Console.WriteLine(s);
}
public static string RandomString(int length)
{
const string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}

ensure two char arrays are not the same

I am randomly generating a grid of characters and storing it in a char[,] array ...
I need a way to ensure that I haven't already generated a grid before serializing it to a database in binary format. What is the best way to compare two grids based on bytes? The last thing I want to do is loop through their contents, as I am already pulling one of them from the DB in byte form.
I was thinking checksum but not so sure if this would work.
char[,] grid = new char[8,8];
char[,] secondgrid = new char[8,8];//gets its data from db
From what I can see, you are going to have to loop over the contents (or at least a portion of them); there is no other way of talking about an array's contents.
Well, as a fast "definitely not the same" you could compute a hash over the array - i.e. something like:
int hash = 7;
foreach (char c in data) {
hash = (hash * 17) + c.GetHashCode();
}
This has the risk of some false positives (reporting a dup when it is unique), but is otherwise quite cheap. Any use? You could store the hash alongside the data in the database to allow fast checks - but if you do that you should pick your own hash algorithm for char (since it isn't guaranteed to stay the same) - perhaps just convert to an int, for example - or to re-use the existing implementation:
int hash = 7;
foreach (char c in data) {
hash = (hash * 17) + (c | (c << 0x10));
}
As an aside - for 8x8, you could always just think in terms of a 64 character string, and just check ==. This would work equally well at the database and application.
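For example, a tiny hedged sketch of flattening the 8x8 grid into a 64-character string (the helper name is mine):
static string GridToString(char[,] grid)
{
    var sb = new System.Text.StringBuilder(grid.Length);
    foreach (char c in grid)   // foreach visits all 64 cells in row-major order
        sb.Append(c);
    return sb.ToString();
}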
Can't you get the database to do it? Make the grid column UNIQUE. Then, if you need to detect that you've generated a duplicate grid, the method for doing this might involve checking the number of rows affected by your operation, or perhaps testing for errors.
Also, if each byte is simply picked at random from [0, 255], then performing a hash to get a 4-byte number is no better than taking the first four bytes out of the grid. The chance of collisions is the same.
I'd go with a checksum/hash mechanism to catch a large percentage of the matches, then do a full comparison if you get a match.
What is the range of characters used to fill in your grid? If you're using just letters (not mixed case, or case not important), and an 8x8 grid, you're only talking about 7 or so possible collisions per item within your problem space (a very rare occurence) assuming a good hashing function. You could do something like:
1. Generate the grid.
2. Load any matching grids from the DB.
3. If a match was found in step 2, go to step 1.
4. Use your new grid.
Try this (invoke ComputeHash for every matrix and compare the guids):
private static MD5 md5 = MD5.Create();

public static Guid ComputeHash(object value)
{
    Guid g = Guid.Empty;
    BinaryFormatter bf = new BinaryFormatter();
    using (MemoryStream stm = new MemoryStream())
    {
        bf.Serialize(stm, value);
        g = new Guid(md5.ComputeHash(stm.ToArray()));
        stm.Close();
    }
    return g;
}
Note: generating the byte array could be accomplished a lot more simply, since you already have a char array.
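For instance, a hedged sketch that skips BinaryFormatter by flattening the char[,] and hashing its UTF-8 bytes directly (this reuses the grid and md5 variables from the snippets above and needs System.Linq and System.Text):
byte[] gridBytes = Encoding.UTF8.GetBytes(new string(grid.Cast<char>().ToArray()));
Guid gridHash = new Guid(md5.ComputeHash(gridBytes));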
