I have a situation where I need to create some kind of uniqueness between 'entities', but it is not a GUID, and it is not saved in a database (It is saved, however. Just not by a database).
The basic use of the key is a mere redundancy check. It does not have to be as scalable as a real 'primary key', but in the simplest terms I can think of, this is how it works:
[receiver] has List<possibilities>.
Possibilities exist independently, but many will have the same values (impossible to predict; this is by design).
Frequently, the receiver's list of possibilities will have to be emptied and then refilled (this is a business requirement).
The key is basically used to add a very lightweight redundancy check. In other words, sometimes the same possibility will be repeated, sometimes it should only appear once in the receiver's list.
I basically want to use something very light and simple. A string is sufficient. I was just wanting to figure out a modest algorithm to accomplish this. I thought about using the GetHashCode() method, but I am not certain about how reliable that is. Can I get some thoughts?
If you can use GetHashCode() at first glance, you can probably use an MD5 hash as well, with a lower collision probability. The resulting MD5 hash can be stored as a 24-character string by encoding it in Base64. See this example:
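using System;
using System.Security.Cryptography;
using System.Text;

public static class MD5Gen
{
    public static string Encode(string toEncode)
    {
        // MD5 instances are not thread-safe, so create one per call.
        using (var hash = MD5.Create())
        {
            return Convert.ToBase64String(
                hash.ComputeHash(Encoding.UTF8.GetBytes(toEncode)));
        }
    }
}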
With this, you encode a source string into an MD5 hash that is also in string format. You just have to express the "possibility" class in terms of a string.
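As a rough sketch of how such a key could back the redundancy check itself: the receiver keeps a HashSet<string> of keys and skips any possibility whose key is already present. The names PossibilityKeys and KeyFor, and the idea of flattening a possibility to a "red|large"-style string, are assumptions made for illustration, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class PossibilityKeys
{
    // Hypothetical helper: derives a short, stable key from a possibility's values.
    public static string KeyFor(string possibilityValues)
    {
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(possibilityValues));
            return Convert.ToBase64String(digest); // 16 bytes become 24 characters
        }
    }

    public static void Main()
    {
        var seen = new HashSet<string>();
        foreach (var p in new[] { "red|large", "blue|small", "red|large" })
        {
            // HashSet.Add returns false when the key is already present.
            bool added = seen.Add(KeyFor(p));
            Console.WriteLine(p + " -> " + (added ? "added" : "duplicate"));
        }
    }
}
```

When the receiver's list is emptied and refilled, clearing the set alongside it keeps the two in step.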
Try this for generating Guid.
VBScript Function to Generate a UUID/GUID
If you are on Windows, you can use the simple VBScript below to generate a UUID. Just save the code to a file called createguid.vbs, and then run cscript createguid.vbs at a command prompt.
Set TypeLib = CreateObject("Scriptlet.TypeLib")
NewGUID = TypeLib.Guid
WScript.Echo(left(NewGUID, len(NewGUID)-2))
Set TypeLib = Nothing
Create a UUID/GUID via the Windows Command Line
If you have the Microsoft SDK installed on your system, you can use the utility uuidgen.exe, which is located in the "C:\Program Files\Microsoft SDK\Bin" directory
or try the same for more info.
Link
I would say go for the Windows command line as it is more reliable.
Related
I am writing a templating engine and I am searching for a good way to detect if a template has changed.
For this I have the following requirements (in order of importance):
Non-equal strings must be detected as different.
As fast as possible.
As little memory as possible (=> do not store the whole string for comparison).
High probability of detecting equal strings as equal.
It is not a big problem if equal strings are occasionally not detected as equal, since that would only trigger an unnecessary re-rendering; but because re-rendering is heavy work, it should happen as rarely as possible.
I first thought of using String.GetHashCode(), but the probability of getting the same hash code for two non-equal strings is pretty high.
Are there any good combinations, such as checking both the hash code and the Length, that push the probability of two non-equal strings being wrongly detected as equal down to an unrealistically low number?
Or is using some hashing algorithm, like MD5 or SHA, a good alternative (after the hash codes are equal)?
My rendering looks something like the following:
public string RenderTemplate(string name, string template)
{
    var cachedTemplate = Cache.Get(name);
    if (cachedTemplate == null || !cachedTemplate.Equals(template)) // <= Equals
    {
        cachedTemplate = new Template(name, template);
        cachedTemplate.Render();
        Cache.Set(name, cachedTemplate);
    }
    return cachedTemplate.Result;
}
The Equals is the point I am asking about.
I am also open for other suggestions how this could be solved.
UPDATE:
To add some numbers to get more context:
I expect to have >1000 individual templates, and each template will have at least a few thousand characters.
This is why I would like to avoid storing the whole template-string "in memory" only for the comparison.
Most of the templates are stored in the DB.
UPDATE 2:
What do you think about extending my RenderTemplate method with a timestamp as suggested by Nikola:
public string RenderTemplate(string name, string template, DateTime timestamp)
Then I could compare name, GetHashCode and timestamp which does not need much memory, should be pretty fast and the probability of a "wrongly detected equality" is practically 0. The timestamp I can read from the DB (have it already there) or the "last changed date" from the file-system for a file-based template.
You don't have much choice. If you don't compare the strings by content, use a hash algorithm to determine whether they are equal. Personally, I would probably use a hash algorithm. If you are a bit paranoid and afraid of collisions, choose an algorithm with a wide output space (e.g. SHA-512).
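A minimal sketch of the hash-based check, assuming it is acceptable to keep one short digest per template name instead of the full text. SHA-256 is used here as one reasonable choice; TemplateFingerprints, Fingerprint and HasChanged are hypothetical names, and the dictionary stands in for whatever cache the engine actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class TemplateFingerprints
{
    // name -> Base64 SHA-256 digest of the template text last seen under that name
    private static readonly Dictionary<string, string> LastSeen =
        new Dictionary<string, string>();

    public static string Fingerprint(string template)
    {
        using (var sha = SHA256.Create())
        {
            return Convert.ToBase64String(
                sha.ComputeHash(Encoding.UTF8.GetBytes(template)));
        }
    }

    // Returns true when the text differs from the last one seen under this name
    // (or when the name has not been seen before), and records the new digest.
    public static bool HasChanged(string name, string template)
    {
        string current = Fingerprint(template);
        string previous;
        if (LastSeen.TryGetValue(name, out previous) && previous == current)
        {
            return false;
        }
        LastSeen[name] = current;
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(HasChanged("home", "<h1>Hello</h1>"));   // True: first time seen
        Console.WriteLine(HasChanged("home", "<h1>Hello</h1>"));   // False: same text
        Console.WriteLine(HasChanged("home", "<h1>Hello!</h1>"));  // True: text changed
    }
}
```

Each stored digest is 44 characters regardless of template size. Equal texts always produce equal digests, so the only possible error is a collision, which for SHA-256 is negligible in practice.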
Why do you need to compare strings to determine that a template has changed? Why not use a different approach?
If the file is stored on disk, why not use a file watcher?
If it is stored in a database, why not use a timestamp to detect when it was saved?
If the application is restarted, reload the templates anyway.
Also, it's worrying that a UI template changes so often that you must make checks like this. I think you have more design problems besides comparing strings.
I am using this example that Brett gave:
Encrypt and decrypt a string
And doing this:
public static bool VerifyLicenseKey(string applicationGuid)
{
    Console.WriteLine("G: " + applicationGuid);
    var appSettings = AppSettings.GetInstance();
    if (appSettings == null)
    {
        return false;
    }
    var hwinfo = HardwareInfo.GetHardwareSerial();
    Console.WriteLine("h: " + hwinfo);
    Console.WriteLine("a: " + applicationGuid);
    var currentSerial = Crypto.EncryptStringAES(hwinfo, applicationGuid);
    Console.WriteLine("c: " + currentSerial);
    Console.WriteLine("o: " + appSettings.LicenseSerialNumber);
    if (currentSerial == appSettings.LicenseSerialNumber)
    {
        return true;
    }
    return false;
}
GetHardwareSerial and applicationGuid come back the same every time, but the result of EncryptStringAES does not.
Am I using the wrong class? Is it not supposed to be the same each time?
If not, does someone have a better example where the encrypted values are the same?
Your "encryption" is actually just obfuscation and not too hard to bypass.
All one needs to know is your application guid (probably stored public) and the method to get the same hardware ID (you probably didn't write that and it's easy to find).
Of course, how hard your protection needs to work also depends on how valuable or high-volume your software is, so simple obfuscation may be enough. Forget AES; what you need here is a hash algorithm, such as SHA or MD5, with which you can hash together your application GUID, hardware number, user name, etc. and store the hash. For most typical users this will be enough of a deterrent.
If you insist on having hard-to-crack protection, what you need is digital signatures and an activation procedure. See RSACryptoServiceProvider.
Basically, you create a service that knows your private key, and you place the matching public key in your software. Then from your software you call the service with HardwareInfo and whatever other info you want to have verified; the service signs it and returns the signature.
Once you have that on your client side you can use the public key to check the signature and even though the info can be stored in plaintext the signature can not be easily recreated.
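The sign-then-verify flow described above might look roughly like the following. This is a sketch under assumptions: both halves run in one process purely for demonstration, the payload format "app-guid|hardware-serial" is invented, and it uses the RSA base class (RSA.Create, SignData, VerifyData) rather than RSACryptoServiceProvider directly. In a real setup the private key would never leave the activation service.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LicenseSignatureSketch
{
    public static void Main()
    {
        using (RSA serviceKey = RSA.Create())
        {
            // Service side: sign whatever the client sent for verification.
            byte[] payload = Encoding.UTF8.GetBytes("app-guid|hardware-serial");
            byte[] signature = serviceKey.SignData(
                payload, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            // Client side: only the public half of the key ships with the software.
            RSAParameters publicOnly = serviceKey.ExportParameters(false);
            using (RSA clientKey = RSA.Create())
            {
                clientKey.ImportParameters(publicOnly);

                bool genuine = clientKey.VerifyData(
                    payload, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
                Console.WriteLine("original payload verifies: " + genuine);

                // The same signature does not verify against different hardware info.
                byte[] other = Encoding.UTF8.GetBytes("app-guid|other-serial");
                bool forged = clientKey.VerifyData(
                    other, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
                Console.WriteLine("altered payload verifies: " + forged);
            }
        }
    }
}
```

The client can store both payload and signature in plaintext; without the private key, a valid signature for a different machine cannot be produced.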
Also check this question for more info.
The algorithm you're referring to uses the RijndaelManaged class and it seems to be using the default value for its IV property which is (quite rightly) automatically set to a new random value whenever you create a new instance (see documentation).
Hence, you'll get a different result every time. (You'll find more about the purpose of the IV on Wikipedia, for example.)
Yes, most AES encryption is non-deterministic (and for good reason), so it will not work for you. But since you just want to compare a cryptographic result and don't really want to decrypt anything, might I suggest using an HMAC instead?
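A minimal sketch of the HMAC suggestion, assuming the application GUID is used as the key and the hardware serial as the message. ComputeSerial is a hypothetical name and the sample values are placeholders. Unlike the AES call, this returns the same string for the same inputs on every call.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SerialHmac
{
    // Deterministic: identical inputs always produce an identical serial.
    public static string ComputeSerial(string hardwareSerial, string applicationGuid)
    {
        byte[] key = Encoding.UTF8.GetBytes(applicationGuid);
        using (var hmac = new HMACSHA256(key))
        {
            byte[] mac = hmac.ComputeHash(Encoding.UTF8.GetBytes(hardwareSerial));
            return Convert.ToBase64String(mac);
        }
    }

    public static void Main()
    {
        string a = ComputeSerial("HW-12345", "d3b07384-d9a0-4c9b-8f3e-2f1e6a7b9c10");
        string b = ComputeSerial("HW-12345", "d3b07384-d9a0-4c9b-8f3e-2f1e6a7b9c10");
        Console.WriteLine(a == b); // True
    }
}
```

This is only as strong as the secrecy of the key: if the application GUID is public, anyone can recompute the serial, which is the obfuscation caveat raised in the other answer.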
I want to write an application that gets a list of urls.
For each of them I need to monitor periodically if the content has changed.
I thought:
to use HtmlAgilityPack to fetch the HTML content (any other recommendation?)
I don't need to spot the change itself,
so I thought to hash the content, save it in the DB,
and re-compare the hash in the future.
How would you suggest hashing? .NET's GetHashCode()?
I saw this documentation http://support.microsoft.com/kb/307020
which advises using
tmpSource = ASCIIEncoding.ASCII.GetBytes(sSourceData);
why?
You should absolutely not use GetHashCode() for this. The documentation explicitly states:
Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework.
The results of GetHashCode can change between runs - all that's guaranteed is that calling it on two equal objects in the same process (possibly AppDomain) will give the same hash code. Indeed, String.GetHashCode's algorithm has changed over time, and in .NET 4 the 32-bit implementation is different to the 64-bit implementation.
If you want to use hashing, use MD5, SHA1 etc - something with a specified algorithm which will not change. (Note that these operate on binary data rather than string data, which is probably more appropriate too - you don't need to bother decoding the data as text.)
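A small sketch of hashing the raw response bytes, assuming the body has already been downloaded into a byte[] (for example with WebClient.DownloadData). SHA-256 is used here as one example of a specified algorithm; ContentHash and Hex are hypothetical names, and the hex string is what would be stored in the DB and compared on the next poll.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentHash
{
    // Hashes the bytes as received; no text decoding is involved.
    public static string Hex(byte[] content)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(content);
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    public static void Main()
    {
        // Stand-ins for two fetches of the same URL.
        byte[] firstFetch = Encoding.UTF8.GetBytes("<html>v1</html>");
        byte[] secondFetch = Encoding.UTF8.GetBytes("<html>v2</html>");

        string stored = Hex(firstFetch); // value saved in the DB
        Console.WriteLine("changed: " + (Hex(secondFetch) != stored));
    }
}
```

Because the algorithm is fixed, the stored value stays comparable across processes, machines and framework versions, which GetHashCode() does not promise.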
It's not clear to me whether refetching periodically is really the best idea though - do these servers not support last modified times, etags etc?
Since you asked for suggestions: I would have used this method instead.
WebClient client = new WebClient();
String htmlCode = client.DownloadString("http://google.com");
And I would have saved this string in my DB. After the chosen interval I could fetch the page again and compare the two.
But yes, I do agree the string would be really large.
If I just want to get an alert that the content has changed somehow, I would use MD5, as an MD5 hash is only 24 characters when Base64-encoded (32 as hex).
Hence it is easier to compare and store in the DB.
Is it possible to know the location of const variables within an exe? We were thinking of watermarking our program so that each user that downloads the program from our server will have some unique key embedded in the code.
Is there another way to do this?
You could build a binary with a watermark that is a string representation of a GUID in a .net type as a constant. After you build, perform a search for the GUID string in the binary file to check its location. You can change this GUID value to another GUID value and then run the binary and actually see the changed value in code output.
Note: The formatting is important as the length would be very important since you're messing with a compiled binary. For example, you'll want to keep the leading zeros of a GUID so that all instances have the same char length when converted to a string.
I have actually done this sort of thing with Win32 DLLs and even the SQL Server 2000 Desktop exe. (There was a hack where you could switch the desktop edition into a full-blown SQL Server by flipping a switch in the binary.)
This process could then be automated and a new copy of a DLL would be programmatically altered by a small, server-side utility for each client download.
Also take a look at this: link
It discusses the use of storing settings in a .Net DLL and uses a class-based approach and embeds the app settings file and is configurable after compilation.
Key consideration #1: Assembly signing
Since you are distributing your application, clearly you are signing it. As such, since you're modifying the binary contents, you'll have to integrate the signing process directly in the downloading process.
Key consideration #2: const or readonly
There is a key difference between const and readonly variables that many people do not know about. In particular, if I do the following:
private static readonly int SomeValue = 3;
...
if (SomeValue > 0)
...
Then it will compile to byte code like the following:
ldsfld [SomeValue]
ldc.i4.0
ble.s
If you make the following:
private const int SomeValue = 3;
...
if (SomeValue > 0)
...
Then it will compile to byte code like the following:
{contents of if block here}
const variables are [allowed to be] substituted and evaluated by the compiler instead of at run time, whereas readonly variables are always evaluated at run time. This makes a big difference when you expose fields to other assemblies, as a change to a const variable is a breaking change that forces a recompile of all dependent assemblies.
My recommendation
I see two reasonably easy options for watermarking, though I'm not an expert in the area so don't know how "good" they are overall.
Watermark the embedded splash screen or About box logo image.
Watermark the symmetric key for loading your string resources. Keep a cache so you only have to decode them once and it won't be a performance problem - this is a variation on a commonly used obfuscation technique. The strings are stored in the binary as UTF-8 encoded strings, and can be replaced in-line as long as the new string's null-terminated length is less than or equal to the length of the string currently found in the binary.
Finally, Google reported the following article on watermarking software that you might want to take a look at.
In C++ (for example):
#define GUID_TO_REPLACE "CC7839EB7EC047B290D686C65F98E0F4"
printf(GUID_TO_REPLACE);
in PHP:
<?php
exec("sed -e 's/CC7839EB7EC047B290D686C65F98E0F4/replacedreplacedreplacedreplaced/g' TestApp.exe > TestAppTagged.exe");
?>
If you stick your compiled binary on the server, visit the php script, download the tagged exe, and run it...you'll see that it now prints the "replaced" string rather than the GUID :)
Note that the length of the replaced string must be identical to the original (32 in this case), so you'll need to pad the length if you want to tag it with something shorter.
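The same same-length replacement can be sketched in C# for a server-side utility. This assumes the placeholder was compiled in as a narrow (ASCII) string literal, as in the C++ example; a string literal in a .NET assembly is stored as UTF-16, so Encoding.Unicode would be the assumption to try there. BinaryTagger and ReplaceInPlace are hypothetical names, and the byte array in Main is a stand-in for File.ReadAllBytes on the real exe.

```csharp
using System;
using System.Text;

public static class BinaryTagger
{
    // Overwrites every occurrence of placeholder with tag and returns the count.
    // Lengths must match so that no offsets inside the binary shift.
    public static int ReplaceInPlace(byte[] image, string placeholder, string tag)
    {
        byte[] find = Encoding.ASCII.GetBytes(placeholder);
        byte[] repl = Encoding.ASCII.GetBytes(tag);
        if (find.Length == 0 || find.Length != repl.Length)
        {
            throw new ArgumentException("Tag must be non-empty and exactly as long as the placeholder.");
        }

        int count = 0;
        for (int i = 0; i + find.Length <= image.Length; i++)
        {
            int j = 0;
            while (j < find.Length && image[i + j] == find[j]) j++;
            if (j == find.Length)
            {
                Buffer.BlockCopy(repl, 0, image, i, repl.Length);
                count++;
                i += find.Length - 1; // continue after the replaced span
            }
        }
        return count;
    }

    public static void Main()
    {
        const string placeholder = "CC7839EB7EC047B290D686C65F98E0F4";
        // Stand-in for the compiled binary's bytes.
        byte[] image = Encoding.ASCII.GetBytes("\0\0header" + placeholder + "trailer\0");

        int n = ReplaceInPlace(image, placeholder, "0123456789ABCDEF0123456789ABCDEF");
        Console.WriteLine("replacements: " + n);
        Console.WriteLine(Encoding.ASCII.GetString(image, 8, 32));
    }
}
```

If the binary is signed, this patch invalidates the signature, so signing has to happen after tagging.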
I'm not sure what you mean by "location" of a const value. You can certainly use items like reflection to access a const field on a particular type. Const fields bind like any other non-instance field of the same accessibility. I don't know if that fits your definition of location though.
I have an object with the following properties
GID
ID
Code
Name
Some of the clients don't want to enter the Code, so the initial plan was to put the ID in the Code, but the base object of the ORM is different, so I'm kind of screwed...
My plan was to put totally random values of the form ####-#### in Code. How can I generate something like that, say the way a Windows 7 serial generator does? Would that not have an overhead? What would you do in this case?
Do you want a random value, or a unique value?
random != unique.
Remember, random merely states a probability of not generating the same value, or a probability of generating the same value again. As time increases, likelihood of generating a previous value increases - becoming a near certainty. Which do you require?
Personally, I recommend just using a Guid with some context [refer to easiest section below]. I also provided some other suggestions so you have options, depending on your situation.
easiest
If Code is an unbounded string [ie can be of any length], easiest semi-legible means of generating a unique code would be
OrmObject ormObject = new OrmObject ();
string code = string.Format ("{0} [{1}]", ormObject.Name, Guid.NewGuid ()).Trim ();
// generates something like
// "My Product [DA9190E1-7FC6-49d6-9EA5-589BBE6E005E]"
You can substitute any distinguishable string for ormObject.Name. I would typically use objectInstance.GetType ().Name, but that only works if OrmObject is a base class; if it's a concrete class used for everything, all instances will end up with the same tag. The point is to add some user context, such that - as in @Yuriy Faktorovich's referenced wtf article - users have something to read.
random
I responded a day or two ago about random number generation. Not so much generating numbers as building a simple flexible framework around a generator to improve quality of code and data, this should help streamline your source.
If you read that, you could easily write an extension method, say
public static class IRandomExtensions
{
public static CodeType GetCode (this IRandom random)
{
// 1. get as many random bytes as required
// 2. transform bytes into a 'Code'
// 3. bob's your uncle
...
}
}
// elsewhere in code
...
OrmObject ormObject = new OrmObject ();
ormObject.Code = random.GetCode ();
...
To actually generate a value, I would suggest implementing an IRandom interface with a System.Security.Cryptography.RNGCryptoServiceProvider implementation. Said implementation would generate a buffer of X random bytes, and dole out as many as required, regenerating a stream when exhausted.
Furthermore - I don't know why I keep writing, I guess this problem is really quite fascinating! - if CodeType is string and you want something readable, you could just take said random bytes and turn them into a "seemingly" readable string via Base64 conversion
public static class IRandomExtensions
{
    // assuming 'CodeType' is in fact a string
    public static string GetCode (this IRandom random)
    {
        // 1. get as many random bytes as required
        byte[] randomBytes; // fill from random
        // 2. transform bytes into a 'Code'; string.Trim takes chars, not
        //    strings, so strip the '=' padding with TrimEnd
        string randomBase64String =
            System.Convert.ToBase64String (randomBytes).TrimEnd ('=');
        // 3. bob's your uncle
        ...
    }
}
Remember
random != unique.
Your values will repeat. Eventually.
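To make the random route concrete, here is a sketch that produces codes in the ####-#### shape from the question and rejects repeats against a set of codes already issued. It uses RandomNumberGenerator.Create() in place of the IRandom wrapper described above; the alphabet, the names CodeGenerator, NextCode and NextUniqueCode, and the in-memory set are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class CodeGenerator
{
    // 32 symbols with look-alikes (I, O, 0, 1) removed.
    // 256 is a multiple of 32, so "byte % 32" is evenly distributed.
    private const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    // Produces a random code shaped like "XXXX-XXXX". Random, not unique.
    public static string NextCode(RandomNumberGenerator rng)
    {
        var bytes = new byte[8];
        rng.GetBytes(bytes);

        var chars = new char[9];
        for (int i = 0, c = 0; i < 8; i++)
        {
            if (i == 4) chars[c++] = '-';
            chars[c++] = Alphabet[bytes[i] % Alphabet.Length];
        }
        return new string(chars);
    }

    // Uniqueness comes from the check against issued codes, not from randomness.
    public static string NextUniqueCode(RandomNumberGenerator rng, ISet<string> issued)
    {
        string code;
        do
        {
            code = NextCode(rng);
        } while (!issued.Add(code));
        return code;
    }

    public static void Main()
    {
        var issued = new HashSet<string>();
        using (var rng = RandomNumberGenerator.Create())
        {
            for (int i = 0; i < 3; i++)
            {
                Console.WriteLine(NextUniqueCode(rng, issued));
            }
        }
    }
}
```

Eight symbols from a 32-symbol alphabet give 2^40 possible codes, so collisions are rare at small volumes but do happen, which is why the issued-set check (or a unique constraint in the database) is still needed.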
unique
There are a number of questions you need to ask yourself about your problem.
Must all Code values be unique? [if not, you're trying too hard]
What Type is Code? [if any-length string, use a full Guid]
Is this a distributed application? [if not, use a DB value as suggested by @LBushkin above]
If it is a distributed application, can client applications generate and submit instances of these objects? [if so, then you want a globally unique identifier, and again Guids are a sure bet]
I'm sure you have more constraints, but this is an example of the kind of line of inquiry you need to perform when you encounter a problem like your own. From these questions, you will come up with a series of constraints. These constraints will inform your design.
Hope this helps :)
Btw, you will receive better quality solutions if you post more details [ie constraints] about your problem. Again, what Type is Code, are there length constraints? Format constraints? Character constraints?
Arg, last edit, I swear. If you do end up using Guids, you may wish to obfuscate this, or even "compress" their representation by encoding them in base64 - similar to base64 conversion above for random numbers.
public static class GuidExtensions
{
    public static string ToBase64String (this Guid id)
    {
        // TrimEnd ('=') strips the base64 padding; Trim ("=") would not compile
        return System.Convert.
            ToBase64String (id.ToByteArray ()).
            TrimEnd ('=');
    }
}
Unlike truncating, base64 conversion is not a lossy transformation. Of course, the trim above is lossy in the context of full base64 expansion - but = is just padding, extra information introduced by the conversion, and not part of the original Guid data. If you want to go back to a Guid from this base64 converted value, then you will have to re-pad your base64 string until its length is a multiple of 4 - don't ask, just look up base64 if you are interested :)
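The round trip, including the re-padding step, can be sketched as follows. ShortGuid, ToShortString and FromShortString are hypothetical names; the only library calls are Guid.ToByteArray, the Guid(byte[]) constructor and the Convert base64 methods.

```csharp
using System;

public static class ShortGuid
{
    // 16 bytes encode to 24 base64 characters ending in "=="; trimming leaves 22.
    public static string ToShortString(Guid id)
    {
        return Convert.ToBase64String(id.ToByteArray()).TrimEnd('=');
    }

    public static Guid FromShortString(string s)
    {
        // Re-pad with '=' until the length is a multiple of 4.
        int padding = (4 - s.Length % 4) % 4;
        string padded = s.PadRight(s.Length + padding, '=');
        return new Guid(Convert.FromBase64String(padded));
    }

    public static void Main()
    {
        Guid original = Guid.NewGuid();
        string shortForm = ToShortString(original);
        Guid restored = FromShortString(shortForm);

        Console.WriteLine(shortForm.Length);     // 22
        Console.WriteLine(restored == original); // True
    }
}
```

Standard base64 can contain '+' and '/', so a code meant for URLs or file names may need those characters swapped for something safer.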
You could generate a Guid using :
Guid.NewGuid().ToString();
It would give you something like :
788E94A0-C492-11DE-BFD4-FCE355D89593
Use an Autonumber column or Sequencer from your database to generate a unique code number. Almost all modern databases support automatically generated numbers in one form or another. Look into what your database supports.
Autonumber/Sequencer values from the DB are guaranteed to be unique and are relatively inexpensive to acquire. If you want to avoid completely sequential numbers assigned to codes, you can pad and concatenate several sequencer values together.