I have an object with the following properties
GID
ID
Code
Name
Some of the clients don't want to enter the Code, so the initial plan was to put the ID in the Code, but the base object of the ORM is different, so that won't work.
My plan now is to put totally random values of the form ####-#### in Code. How can I generate something like that, say in the style of a Windows 7 serial generator? Would that add overhead? What would you do in this case?
Do you want a random value, or a unique value?
random != unique.
Remember, random merely states a probability of not generating the same value, or a probability of generating the same value again. As time increases, likelihood of generating a previous value increases - becoming a near certainty. Which do you require?
Personally, I recommend just using a Guid with some context [refer to easiest section below]. I also provided some other suggestions so you have options, depending on your situation.
easiest
If Code is an unbounded string [ie can be of any length], easiest semi-legible means of generating a unique code would be
OrmObject ormObject = new OrmObject ();
string code = string.Format ("{0} [{1}]", ormObject.Name, Guid.NewGuid ()).Trim ();
// generates something like
// "My Product [DA9190E1-7FC6-49d6-9EA5-589BBE6E005E]"
You can substitute any distinguishable string for ormObject.Name. I would typically use objectInstance.GetType ().Name, but that only works if OrmObject is a base class; if it's a concrete class used for everything, all codes will end up with the same tag. The point is to add some user context so that, as in the wtf article @Yuriy Faktorovich referenced, users have something to read.
random
I responded a day or two ago about random number generation. Not so much generating numbers as building a simple flexible framework around a generator to improve quality of code and data, this should help streamline your source.
If you read that, you could easily write an extension method, say
public static class IRandomExtensions
{
public static CodeType GetCode (this IRandom random)
{
// 1. get as many random bytes as required
// 2. transform bytes into a 'Code'
// 3. bob's your uncle
...
}
}
// elsewhere in code
...
OrmObject ormObject = new OrmObject ();
ormObject.Code = random.GetCode ();
...
To actually generate a value, I would suggest implementing an IRandom interface with a System.Security.Cryptography.RNGCryptoServiceProvider implementation. Said implementation would generate a buffer of X random bytes, and dole out as many as required, regenerating a stream when exhausted.
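The buffering scheme described above could be sketched roughly as follows. The IRandom interface from the linked answer is not shown here, so the single-method interface below is an assumption, as are the names CryptoRandom and bufferSize. The sketch also uses RandomNumberGenerator.Create () rather than constructing RNGCryptoServiceProvider directly; both hand back a cryptographic generator.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical interface: the real IRandom from the linked answer may differ.
public interface IRandom
{
    byte[] GetBytes (int count);
}

// Fills a buffer with random bytes and doles them out,
// regenerating the buffer whenever it is exhausted.
public sealed class CryptoRandom : IRandom, IDisposable
{
    private readonly RandomNumberGenerator generator = RandomNumberGenerator.Create ();
    private readonly byte[] buffer;
    private int position;

    public CryptoRandom (int bufferSize = 256)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException ("bufferSize");
        buffer = new byte[bufferSize];
        position = bufferSize; // marks the buffer as exhausted, forcing a fill on first use
    }

    public byte[] GetBytes (int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException ("count");
        byte[] result = new byte[count];
        int copied = 0;
        while (copied < count)
        {
            if (position == buffer.Length)
            {
                generator.GetBytes (buffer); // regenerate when exhausted
                position = 0;
            }
            int chunk = Math.Min (count - copied, buffer.Length - position);
            Buffer.BlockCopy (buffer, position, result, copied, chunk);
            position += chunk;
            copied += chunk;
        }
        return result;
    }

    public void Dispose ()
    {
        generator.Dispose ();
    }
}
```

This sketch is not thread-safe; a shared instance would need a lock around GetBytes.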
Furthermore - I don't know why I keep writing, I guess this problem is really quite fascinating! - if CodeType is string and you want something readable, you could just take said random bytes and turn them into a "seemingly" readable string via Base64 conversion
public static class IRandomExtensions
{
// assuming 'CodeType' is in fact a string
public static string GetCode (this IRandom random)
{
// 1. get as many random bytes as required
byte[] randomBytes; // fill from random
// 2. transform bytes into a 'Code'
string randomBase64String =
System.Convert.ToBase64String (randomBytes).TrimEnd ('=');
// 3. bob's your uncle
...
}
}
Remember
random != unique.
Your values will repeat. Eventually.
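To put a rough number on "eventually", the standard birthday-bound approximation can be used. The sketch below assumes each # in ####-#### is a decimal digit (so 10^8 possible codes) and that codes are drawn uniformly; the class and parameter names are made up for illustration.

```csharp
using System;

public static class CollisionEstimate
{
    // Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / (2N)),
    // where n codes have been issued out of N equally likely codes.
    public static double Probability (double codesIssued, double possibleCodes)
    {
        double exponent = -codesIssued * (codesIssued - 1) / (2.0 * possibleCodes);
        return 1.0 - Math.Exp (exponent);
    }
}
```

Under those assumptions, CollisionEstimate.Probability (10000, 1e8) comes out at roughly 0.39: after only ten thousand random codes there is already about a 39% chance that at least two of them are identical.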
unique
There are a number of questions you need to ask yourself about your problem.
Must all Code values be unique? [if not, you're trying too hard]
What Type is Code? [if any-length string, use a full Guid]
Is this a distributed application? [if not, use a DB value as suggested by @LBushkin above]
If it is a distributed application, can client applications generate and submit instances of these objects? [if so, then you want a globally unique identifier, and again Guids are a sure bet]
I'm sure you have more constraints, but this is an example of the kind of line of inquiry you need to perform when you encounter a problem like your own. From these questions, you will come up with a series of constraints. These constraints will inform your design.
Hope this helps :)
Btw, you will receive better quality solutions if you post more details [ie constraints] about your problem. Again, what Type is Code, are there length constraints? Format constraints? Character constraints?
Arg, last edit, I swear. If you do end up using Guids, you may wish to obfuscate this, or even "compress" their representation by encoding them in base64 - similar to base64 conversion above for random numbers.
public static class GuidExtensions
{
public static string ToBase64String (this Guid id)
{
return System.Convert.
ToBase64String (id.ToByteArray ()).
TrimEnd ('=');
}
}
Unlike truncating, base64 conversion is not a lossy transformation. Of course, the trim above is lossy in the context of the full base64 expansion, but = is just padding: extra information introduced by the conversion, not part of the original Guid data. If you want to go back to a Guid from this base64 value, you will have to re-pad the string with = until its length is a multiple of 4. Look up base64 if you are interested :)
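As a minimal sketch of the round trip described above (the class and method names here are illustrative, not part of the original answer), the padding can be restored before decoding:

```csharp
using System;

public static class GuidBase64
{
    // 16 bytes encode to 24 base64 characters ending in "==",
    // so the trimmed form is 22 characters long.
    public static string ToShortString (Guid id)
    {
        return Convert.ToBase64String (id.ToByteArray ()).TrimEnd ('=');
    }

    public static Guid FromShortString (string text)
    {
        // Re-pad with '=' until the length is a multiple of 4, then decode.
        int padding = (4 - text.Length % 4) % 4;
        string padded = text + new string ('=', padding);
        return new Guid (Convert.FromBase64String (padded));
    }
}
```

Note that standard base64 output can contain + and /, so the short form would need further escaping before being used in a URL or file name.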
You could generate a Guid using :
Guid.NewGuid().ToString();
It would give you something like :
788E94A0-C492-11DE-BFD4-FCE355D89593
Use an Autonumber column or Sequencer from your database to generate a unique code number. Almost all modern databases support automatically generated numbers in one form or another. Look into what your database supports.
Autonumber/Sequencer values from the DB are guaranteed to be unique and are relatively inexpensive to acquire. If you want to avoid completely sequential numbers assigned to codes, you can pad and concatenate several sequencer values together.
Related
I am writing a templating engine and I am searching for a good way to detect if a template has changed.
For this I have the following requirements (in order of importance):
non-equal strings must be detected as different
as fast as possible
as little memory as possible (=> do not store the whole string for comparison)
high probability of detecting equal strings as equal
It is not a big problem if equal strings are sometimes not detected as equal, as this would just trigger an unnecessary "re-rendering", but because re-rendering is heavy work, it should happen as rarely as possible.
I first thought of using String.GetHashCode(), but the probability of getting the same hash code for two non-equal strings is pretty high.
Is there a good combination, such as checking the hash code and Length, that reduces the probability of two non-equal strings being wrongly detected as equal to an unrealistically low number?
Or is a hashing algorithm like MD5 or SHA a good alternative (checked after the hash codes are equal)?
My rendering looks something like the following:
public string RenderTemplate(string name, string template)
{
var cachedTemplate = Cache.Get(name);
if(cachedTemplate == null || !cachedTemplate.Equals(template)) // <= Equals
{
cachedTemplate = new Template(name, template);
cachedTemplate.Render();
Cache.Set(name, cachedTemplate);
}
return cachedTemplate.Result;
}
The Equals is the point I am asking about.
I am also open for other suggestions how this could be solved.
UPDATE:
To add some numbers to get more context:
I expect to have >1000 individual templates and each template will have up to at least a few thousand characters.
This is why I would like to avoid storing the whole template-string "in memory" only for the comparison.
Most of the templates are stored in the DB.
UPDATE 2:
What do you think about extending my RenderTemplate method with a timestamp as suggested by Nikola:
public string RenderTemplate(string name, string template, DateTime timestamp)
Then I could compare name, GetHashCode and timestamp which does not need much memory, should be pretty fast and the probability of a "wrongly detected equality" is practically 0. The timestamp I can read from the DB (have it already there) or the "last changed date" from the file-system for a file-based template.
You don't have much choice. If you don't compare strings by their content, you have to use a hash algorithm to determine whether they are equal. Personally, I would probably use a hash algorithm. If you are a bit paranoid and afraid of collisions, choose an algorithm with the widest output space (e.g. SHA512).
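A minimal sketch of the hash-based approach, assuming it is acceptable to keep one SHA512 digest per template name instead of the template text itself. TemplateChangeDetector and its members are hypothetical names, not part of the question's Cache or Template classes.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class TemplateChangeDetector
{
    // Maps template name to the base64 digest of the last template seen.
    private readonly Dictionary<string, string> knownHashes = new Dictionary<string, string> ();

    public static string ComputeHash (string template)
    {
        using (var sha = SHA512.Create ())
        {
            byte[] digest = sha.ComputeHash (Encoding.UTF8.GetBytes (template));
            return Convert.ToBase64String (digest);
        }
    }

    // Returns true when the template is new or differs from
    // the one last seen under the same name.
    public bool HasChanged (string name, string template)
    {
        string current = ComputeHash (template);
        string previous;
        if (knownHashes.TryGetValue (name, out previous) && previous == current)
            return false;
        knownHashes[name] = current;
        return true;
    }
}
```

Each stored digest is 64 bytes regardless of template size. Hashing still reads the whole template on every call, so this saves memory rather than CPU time.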
Why do you need to compare strings to determine that a template has changed? Why not use a different approach?
If the file is stored on disk, why not use a file watcher?
If it is stored in a database, why not use a timestamp to detect when it was saved?
If the application is restarted, reload the templates anyway
Also, it's worrying that a template for UI changes so often that you must make checks like this. I think you have more problems with design beside comparing strings.
I'm looking at using a Guid as a random anonymous visitor identifier for a website (stored both as a cookie client-side, and in a db server-side), and I wanted a cryptographically strong way of generating Guids (so as to minimize the chance of collisions).
For the record, there are 16 bytes (or 128 bits) in a Guid.
This is what I have in mind:
/// <summary>
/// Generate a cryptographically strong Guid
/// </summary>
/// <returns>a random Guid</returns>
private Guid GenerateNewGuid()
{
byte[] guidBytes = new byte[16]; // Guids are 16 bytes long
RNGCryptoServiceProvider random = new RNGCryptoServiceProvider();
random.GetBytes(guidBytes);
return new Guid(guidBytes);
}
Is there a better way to do this?
Edit:
This will be used for two purposes, a unique Id for a visitor, and a transaction Id for purchases (which will briefly be the token needed for viewing/updating sensitive information).
In answer to the OP's actual question whether this is cryptographically strong, the answer is yes since it is created directly from RNGCryptoServiceProvider. However the currently accepted answer provides a solution that is most definitely not cryptographically secure as per this SO answer:
Is Microsoft's GUID generator cryptographically secure.
Whether this is the correct approach architecturally due to theoretical lack of uniqueness (easily checked with a db lookup) is another concern.
So, what you're building is not technically a GUID. A GUID is a Globally Unique Identifier. You're building a random string of 128 bits. I suggest, like the previous answerer, that you use the built-in GUID generation methods. This method has an (albeit tremendously small) chance of generating duplicate GUIDs.
There are a few advantages to using the built-in functionality, including cross-machine uniqueness (partially due to the MAC address being referenced in the GUID; see http://en.wikipedia.org/wiki/Globally_Unique_Identifier).
Regardless of whether you use the built in methods, I suggest that you not expose the Purchase GUID to the customer. The standard method used by Microsoft code is to expose a Session GUID that identifies the customer and expires comparatively quickly. Cookies track customer username and saved passwords for session creation. Thus your 'short term purchase ID' is never actually passed to (or, more importantly, received from) the client and there is a more durable wall between your customers' personal information and the Interwebs at large.
Collisions are theoretically impossible (it's not Globally Unique for nothing), but predictability is a whole other question. As Christopher Stevenson correctly points out, given a few previously generated GUIDs it actually becomes possible to start predicting a pattern within a much smaller keyspace than you'd think. GUIDs guarantee uniqueness, not unpredictability. Most algorithms take it into account, but you should never count on it, especially not for a transaction Id for purchases, however briefly. You're creating an open door for brute force session hijacking attacks.
To create a proper unique ID, take some random stuff from your system, append some visitor-specific information, append a string only you know on the server, and then put a good hash algorithm over the whole thing. Hashes are meant to be unpredictable and irreversible, unlike GUIDs.
To simplify: if uniqueness is all you care about, why not just give all your visitors sequential integers, from 1 to infinity? They are guaranteed to be unique, just terribly predictable: when you have just purchased item 684, you can start hacking away at 685 until it appears.
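The recipe above (random data plus visitor information plus a server-only secret, all hashed) could be sketched as below. This is one possible reading, not the answerer's code: it swaps "append a secret and hash" for HMAC-SHA256 keyed with the secret, which is the usual way to mix a secret into a hash. VisitorToken, visitorInfo and serverSecret are illustrative names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class VisitorToken
{
    public static string Create (string visitorInfo, byte[] serverSecret)
    {
        // 1. "some random stuff": 16 bytes from the crypto RNG
        byte[] nonce = new byte[16];
        using (var rng = RandomNumberGenerator.Create ())
        {
            rng.GetBytes (nonce);
        }

        // 2. append the visitor-specific information
        byte[] info = Encoding.UTF8.GetBytes (visitorInfo);
        byte[] message = new byte[nonce.Length + info.Length];
        Buffer.BlockCopy (nonce, 0, message, 0, nonce.Length);
        Buffer.BlockCopy (info, 0, message, nonce.Length, info.Length);

        // 3. mix in the server-only secret via a keyed hash (assumption: HMAC-SHA256)
        using (var hmac = new HMACSHA256 (serverSecret))
        {
            // 32-byte digest rendered as 64 hex characters
            return BitConverter.ToString (hmac.ComputeHash (message)).Replace ("-", "");
        }
    }
}
```

Because the random part is discarded, the resulting token cannot be recomputed later; it would have to be stored server-side and looked up when the client presents it.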
To avoid collisions:
If you can't keep a global count, then use Guid.NewGuid().
Otherwise, increment some integer and use 1, 2, 3, 4...
"But isn't that ridiculously easy to guess?"
Yes, but accidental and deliberate collisions are different problems with different solutions, best solved separately, not least because predictability helps prevent accidental collision while simultaneously making deliberate collision easier.
If you can increment globally, then number 2 guarantees no collisions. UUIDs were invented as a means to approximate that without the ability to globally track.
Let's say we use incrementing integers. Let's say the ID we have in a given case is 123.
We can then do something like:
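// requires: using System.Linq; using System.Text;
private static string GetProtectedID(int id)
{
// must be built exactly as chkHash is in GetIDFromProtectedID
string hashString = id.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
using(var sha = System.Security.Cryptography.SHA1.Create())
{
return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(hashString)).Select(b => b.ToString("X2"))) + id.ToString();
}
}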
Which produces 09C495910319E4BED2A64EA16149521C51791D8E123. To decode it back to the id we do:
private static int GetIDFromProtectedID(string str)
{
int chkID;
if(int.TryParse(str.Substring(40), out chkID))
{
string chkHash = chkID.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
using(var sha = System.Security.Cryptography.SHA1.Create())
{
if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkHash)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
return chkID;
}
}
return 0;//or perhaps raise an exception here.
}
Even if someone guessed from that they were given number 123, it wouldn't let them deduce that the id for 122 was B96594E536C9F10ED964EEB4E3D407F183FDA043122.
Alternatively, the two could be given as separate tokens, and so on.
I generally just use Guid.NewGuid();
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx
I'm developing a ticketing system for tracking bugs and software changes using ASP.NET MVC 4 and Entity Framework 5. I need a way to pick a unique number from a set of possible numbers. My thought is to create a set of possible numbers and mark numbers from this set as they are used and assigned to a support ticket.
I have this code for generating all possible ticket numbers to choose from, but I want to have leading zeroes so that all ticket numbers have the same length:
public static class GenerateNumber
{
private static IEnumerable<int> GenerateNumbers(int count)
{
return Enumerable.Range(0, count);
}
public static IEnumerable<string> GenerateTicketNumbers(int count)
{
return GenerateNumbers(count).Select(n => "TN" + n.ToString());
}
}
I want the output of
IEnumerable<string> ticketNumbers = GenerateNumber.GenerateTicketNumbers(Int32.MaxValue);
to be something like this:
TN0000000001
.
.
.
TN2147483647
Hopefully we won't need anything as large as Int32.MaxValue, as that would mean we have way too many bugs, haha. I just wanted to be safe rather than sorry about the limits of the available numbers. Perhaps we could reuse ticket numbers after they have been resolved. However, I don't know how I feel about reuse, as it could lead to ambiguity when referring to documentation later on.
Considering the size of this set, is this the most efficient method to go about having unique ticket numbers?
Use an identity column in the database - this will autoincrement for you.
If you need a prefix as well, then store this as a separate varchar column and then for display purposes you can concatenate it (with your requisite leading zeros if that is absolutely really necessary). Trying to store an incrementing number in a varchar field is going to bite you in the ass one day.
As a side note, why the leading zeros? If I am fixing a ticket, I want to annotate my code with the ticket number. Leading zeros are just a pain - why not just have TN-123 and have the number get bigger as required?
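If the leading zeros are kept, the display-time concatenation suggested above might look like this minimal sketch. It assumes the integer comes from the identity column; TicketNumber, prefix and digits are illustrative names.

```csharp
using System;

public static class TicketNumber
{
    // Builds the display string only; the database keeps the plain integer.
    public static string Format (int id, string prefix = "TN", int digits = 10)
    {
        if (id < 0) throw new ArgumentOutOfRangeException ("id");
        // The "D" format specifier pads with leading zeros to the given width.
        return prefix + id.ToString ("D" + digits);
    }
}
```

With the defaults, TicketNumber.Format (1) gives TN0000000001 and TicketNumber.Format (Int32.MaxValue) gives TN2147483647, matching the range asked for in the question without materializing the whole set of numbers up front.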
I have the following t-sql code which I have converted to c#.
DECLARE @guidRegular UNIQUEIDENTIFIER, @dtmNow DATETIME
SELECT @guidRegular = '{5bf8e554-8dbc-4008-9d48-5c6e0a4d28d7}'
SELECT @dtmNow = '2012-02-09 18:31:38'
print (CAST(CAST(@guidRegular AS BINARY(10)) + CAST(@dtmNow AS BINARY(6)) AS UNIQUEIDENTIFIER))
When I execute the .NET version of the code (using the same Guid and DateTime), I get a different Guid. It looks like it has something to do with the datetime element. Can anyone help?
c# extension code:
using System.Data.Linq;
...
...
public static class GuidExtensions
{
public static Guid ToNewModifiedGuid(this Guid guid)
{
var dateTime = new DateTime(2012,02,09,18,31,38);
var guidBinary = new Binary(guid.ToByteArray().Take(10).ToArray());
var dateBinary = new Binary(BitConverter.GetBytes(dateTime.ToBinary()).ToArray().Take(6).ToArray());
var bytes = new byte[guidBinary.Length + dateBinary.Length];
Buffer.BlockCopy(guidBinary.ToArray(), 0, bytes, 0, guidBinary.ToArray().Length);
Buffer.BlockCopy(dateBinary.ToArray(), 0, bytes, guidBinary.ToArray().Length, dateBinary.ToArray().Length);
return new Guid(bytes);
}
}
I'm not surprised that SQL and .NET have different binary representations of a date/time. I would be surprised if they had the same one.
Your C# code is asking the DateTime structure to serialize a value to a 64-bit (8 byte) byte array that can be used to recreate the same value. Then you're throwing away 2 bytes (the year? the millisecond? a checksum? who knows?)
Your SQL code is asking the SQL engine to take its internal representation of a datetime, which is also 8 bytes, throw away two, and give the result.
So:
If you want identical values, you would need to stop relying on the internals of how a datetime is stored / serialized. Convert it to 6 bytes using a repeatable method you can write in both .net and tsql
Realize that you are removing the 6 bytes of a guid that represent the spatially unique portion and replacing them with the time. So you are creating a GUID that has the time encoded twice, and are greatly increasing the odds of duplicate GUIDs being created.
Of course, this ignores the more glaring issue of "why would anyone want to do that?" I'm going to assume that it's some really brilliant subsystem, instead of the more likely explanation that somebody is desperately trying to solve the wrong problem.
The original article has a flaw in the logic. The author describes both Natural and Surrogate keys but doesn't recognize that the RFC for UUIDs can be used to create a Natural key. Of course, doing so would require creating a custom function for generating a UUID based on some solution domain information, rather than relying on the default machine/time-based function for their generation.
Doing a single function to replace the generation of the keys makes a lot more sense than this, though.
I'm working on a mobile app and I want to optimise the data that it's receiving from the server (as JSON).
There are 3 lists returned (each containing its own class of objects, the approximate list sizes are 50, 100 and 170). Each object has a Guid id and there is some relation data for each object. E.g.:
o = { Id = "8f088552-5b24-4ba4-a6e5-8958c4353581",
RelatedIds = ["19d2e562-0874-473f-8e05-7052e8defd9a", "615b4c47-199a-4f7d-8268-08ed43d9c891", ... ] }
Is there a way to compress these Guids to something shorter without storing an identity map? Perhaps using a hash function?
You can convert the 16-byte representation of a GUID into a Base 64 string. However you didn't mention a programming language so we can't help further.
A hash function is not recommended here because hash functions are generally lossy.
No. One of the attributes of (non-cryptographic) hashes is that they collide: hash(a) == hash(b) but a != b. They are a performance optimization in the case where you are doing a lot of equality checks and you expect many false results (because if hash(a) != hash(b) then a != b). A GUID->counter map is probably the best way to get smaller ids here.
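A GUID-to-counter map of the kind mentioned above could be sketched as follows. This is only an illustration under the assumption that the server can send the lookup table once alongside the lists; GuidCompactor and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class GuidCompactor
{
    private readonly Dictionary<Guid, int> indexes = new Dictionary<Guid, int> ();
    private readonly List<Guid> guids = new List<Guid> ();

    // Returns a small integer for the Guid, assigning the next free
    // index the first time a given Guid is seen.
    public int GetIndex (Guid id)
    {
        int index;
        if (!indexes.TryGetValue (id, out index))
        {
            index = guids.Count;
            indexes[id] = index;
            guids.Add (id);
        }
        return index;
    }

    // The table to send once, so the client can map indexes back to Guids.
    public IList<Guid> Table
    {
        get { return guids; }
    }
}
```

Each Guid then appears in full only once in the payload, and every entry in RelatedIds becomes a short integer. The cost is exactly the identity map the question hoped to avoid.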
You can convert hex (base16) to base64, and remove all the punctuation. A GUID is 36 characters as hyphenated hex (32 hex digits plus 4 hyphens); the same 16 bytes take 24 characters in base64, or 22 without the = padding.
Thinking about it some more, I've realized that HTTP compression (if enabled) is probably going to compress that data well enough anyway, so it's not really worth the effort to compress the data manually.