How expensive is MD5 generation in .NET? - c#

To interact with an external data feed I need to pass a rolling security key which has been MD5 hashed (every day we need to generate a new MD5 hashed key).
I'm weighing up whether or not to do it every time we call the external feed. I need to hash a string of about 10 characters for the feed.
It's for an ASP.NET (C#/.NET 3.5) site and the feed is used on pretty much every page. Would I be best off generating the hash once a day and then storing it in the application cache, and taking the memory hit, or generating it on each request?

The only acceptable basis for optimizations is data. Measure generating this inline and measure caching it.
My high-end workstation can calculate well over 100k MD5 hashes of a 10-byte data segment in a second. There would be zero benefit from caching this for me and I bet it's the same for you.

Generate some sample data. Well, a lot of it. Compute the MD5 of the sample data. Measure the time it takes. Decide for yourself.
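For example, a rough timing sketch as a console program (assuming .NET 3.5; the 10-character sample string and the 100,000 iteration count are arbitrary):
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

class Md5Benchmark
{
    static void Main()
    {
        const int iterations = 100000;                        // arbitrary sample size
        byte[] input = Encoding.UTF8.GetBytes("abcde12345");  // ~10-character key

        using (MD5 md5 = MD5.Create())
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                md5.ComputeHash(input);
            }
            sw.Stop();
            Console.WriteLine("{0} hashes in {1} ms", iterations, sw.ElapsedMilliseconds);
        }
    }
}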

Calculate the time complexity of the algorithm!
Look at the following code:
public string GetMD5Hash(string input)
{
    // MD5CryptoServiceProvider is disposable, so wrap it in a using block
    using (var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider())
    {
        byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
        bs = md5.ComputeHash(bs);

        // Build the lowercase hex representation of the 16-byte hash
        var s = new System.Text.StringBuilder();
        foreach (byte b in bs)
        {
            s.Append(b.ToString("x2").ToLower()); // "x2" is already lowercase, so ToLower is redundant
        }
        return s.ToString();
    }
}
If we were to calculate the time complexity we would get T = 11 + n * 2, but this is just "what we see"; i.e. ToLower might do some heavy work that we don't know about. Still, from this point we can see that the algorithm is O(n) in all cases, meaning the time grows as the data grows.
Also, to address the cache issue: I'd rather keep my "heavy" work in memory, since memory is less expensive compared to CPU usage.

If it'll be the same for a given day, caching it might be an idea. You could even set the cache to expire after 24 hours and write code to regenerate the hash when the cache expires.

Using the ASP.NET cache is very easy, so I don't see why you shouldn't cache the key.
Storing the key in cache may even save some memory since you can reuse it instead of creating a new one for each request.
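For instance, a minimal sketch of caching the hash for 24 hours in the ASP.NET cache; the DailyKeyCache class, the cache key name and the seed parameter are illustrative, and Md5Hex just repeats the hashing shown earlier in this thread:
using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.Caching;

public static class DailyKeyCache
{
    private const string CacheKey = "FeedSecurityKey"; // hypothetical cache key name

    public static string GetDailyKey(string seed)
    {
        string cached = HttpRuntime.Cache[CacheKey] as string;
        if (cached == null)
        {
            cached = Md5Hex(seed);
            // Absolute 24-hour expiry; the first request after expiry regenerates the hash
            HttpRuntime.Cache.Insert(CacheKey, cached, null,
                DateTime.UtcNow.AddHours(24), Cache.NoSlidingExpiration);
        }
        return cached;
    }

    private static string Md5Hex(string input)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            var sb = new StringBuilder();
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }
}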

Related

C# Quick bit array

As stated in the title, I am evaluating the cost of implementing a BitArray over byte[] (I have understood that the native BitArray is pretty slow) instead of using a string representation of bits (e.g. "001001001"), but I am open to any suggestions that are more effective.
The length of the array is not known at design time, but I suppose it may be between 200 and 500 bits per array.
Memory is not a concern, so using a lot of memory to represent the array is not an issue; what matters is speed when the array is created and manipulated (they will be manipulated a lot).
Thanks in advance for your consideration and suggestions on the topic.
A few suggestions:
1) Computers don't process single bits, so an int or long will be manipulated at the same speed
2) To squeeze out more speed you can consider writing it with unsafe code
3) new is expensive. If the objects are created a lot, pool them: create a bulk of 10K objects at a time and serve them from a method when required. Once the pool runs out you can create another batch. Have another method so that, once processing of an object completes, you clean it up and return it to the pool
4) Make sure your manipulation is optimal (a sketch of points 1 and 4 follows below)
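To illustrate points 1 and 4, here is a minimal sketch of a fixed-size bit set backed by ulong words; the FastBitSet name and API are illustrative only:
public sealed class FastBitSet
{
    private readonly ulong[] words; // 64 bits per word

    public FastBitSet(int bitCount)
    {
        words = new ulong[(bitCount + 63) / 64];
        Length = bitCount;
    }

    public int Length { get; private set; }

    public bool Get(int index)
    {
        // Word index is index / 64, bit position is index % 64
        return (words[index >> 6] & (1UL << (index & 63))) != 0;
    }

    public void Set(int index, bool value)
    {
        if (value)
            words[index >> 6] |= 1UL << (index & 63);
        else
            words[index >> 6] &= ~(1UL << (index & 63));
    }
}
For 200-500 bits that is only 4 to 8 words, so Get and Set are a shift and a mask with no per-operation allocation.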

Is this a cryptographically strong Guid?

I'm looking at using a Guid as a random anonymous visitor identifier for a website (stored both as a cookie client-side, and in a db server-side), and I wanted a cryptographically strong way of generating Guids (so as to minimize the chance of collisions).
For the record, there are 16 bytes (or 128 bits) in a Guid.
This is what I have in mind:
/// <summary>
/// Generate a cryptographically strong Guid
/// </summary>
/// <returns>a random Guid</returns>
private Guid GenerateNewGuid()
{
    byte[] guidBytes = new byte[16]; // Guids are 16 bytes long
    RNGCryptoServiceProvider random = new RNGCryptoServiceProvider();
    random.GetBytes(guidBytes);
    return new Guid(guidBytes);
}
Is there a better way to do this?
Edit:
This will be used for two purposes, a unique Id for a visitor, and a transaction Id for purchases (which will briefly be the token needed for viewing/updating sensitive information).
In answer to the OP's actual question whether this is cryptographically strong, the answer is yes since it is created directly from RNGCryptoServiceProvider. However the currently accepted answer provides a solution that is most definitely not cryptographically secure as per this SO answer:
Is Microsoft's GUID generator cryptographically secure.
Whether this is the correct approach architecturally due to theoretical lack of uniqueness (easily checked with a db lookup) is another concern.
So, what you're building is not technically a GUID. A GUID is a Globally Unique Identifier. You're building a random string of 128 bits. I suggest, like the previous answerer, that you use the built-in GUID generation methods. Your method has an (albeit tremendously small) chance of generating duplicate GUIDs.
There are a few advantages to using the built-in functionality, including cross-machine uniqueness (partly because the MAC address is referenced in some GUID versions; see http://en.wikipedia.org/wiki/Globally_Unique_Identifier).
Regardless of whether you use the built-in methods, I suggest that you not expose the purchase GUID to the customer. The standard method used by Microsoft code is to expose a session GUID that identifies the customer and expires comparatively quickly. Cookies track customer username and saved passwords for session creation. Thus your "short-term purchase ID" is never actually passed to (or, more importantly, received from) the client, and there is a more durable wall between your customers' personal information and the Interwebs at large.
Collisions are theoretically impossible (it's not Globally Unique for nothing), but predictability is a whole other question. As Christopher Stevenson correctly points out, given a few previously generated GUIDs it actually becomes possible to start predicting a pattern within a much smaller keyspace than you'd think. GUIDs guarantee uniqueness, not predictability. Most algorithms take it into account, but you should never count on it, especially not as transaction Id for purchases, however briefly. You're creating an open door for brute force session hijacking attacks.
To create a proper unique ID, take some random stuff from your system, append some visitor-specific information, append a string only you know on the server, and then put a good hash algorithm over the whole thing. Hashes are meant to be unpredictable and irreversible, unlike GUIDs.
To simplify: if uniqueness is all you care about, why not just give all your visitors sequential integers, from 1 to infinity? Guaranteed to be unique, just terribly predictable: when you've just purchased item 684, someone can start hacking away at 685 until it appears.
To avoid collisions:
If you can't keep a global count, then use Guid.NewGuid().
Otherwise, increment some integer and use 1, 2, 3, 4...
"But isn't that ridiculously easy to guess?"
Yes, but accidental and deliberate collisions are different problems with different solutions, best solved separately, not least because predictability helps prevent accidental collision while simultaneously making deliberate collision easier.
If you can increment globally, then number 2 guarantees no collisions. UUIDs were invented as a means to approximate that without the ability to globally track.
Let's say we use incrementing integers. Let's say the ID we have in a given case is 123.
We can then do something like:
private static string GetProtectedID(int id)
{
    // The secret seed must match the one used in GetIDFromProtectedID below
    string hashString = id.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
    using (var sha = System.Security.Cryptography.SHA1.Create())
    {
        return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(hashString)).Select(b => b.ToString("X2"))) + id.ToString();
    }
}
Which produces 09C495910319E4BED2A64EA16149521C51791D8E123. To decode it back to the id we do:
private static int GetIDFromProtectedID(string str)
{
    int chkID;
    if (int.TryParse(str.Substring(40), out chkID))
    {
        // Rebuild the hash from the claimed id plus the same secret seed
        string chkHash = chkID.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
        using (var sha = System.Security.Cryptography.SHA1.Create())
        {
            if (string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkHash)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
                return chkID;
        }
    }
    return 0; // or perhaps raise an exception here
}
Even if someone guessed from that they were given number 123, it wouldn't let them deduce that the id for 122 was B96594E536C9F10ED964EEB4E3D407F183FDA043122.
Alternatively, the two could be given as separate tokens, and so on.
I generally just use Guid.NewGuid();
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx

Encrypting Windows Phone Resources at the Bit Level... am I doing this right?

I have a question concerning encryption, more specifically encryption that requires no internet connection (as opposed to private/public key or OAuth methods).
The problem arose when I discovered that the WP7 app store is not secure. I won't post a link, but a basic search will yield a desktop application that allows you to download any free WP7 app in the marketplace. Then it's a matter of renaming .xap to .zip, and using Reflector to look at the code.
I believe that Dotfuscator will solve my problem, but as a learning experience I decided to come up with my own solution.
I decided to have a program that in prebuild gathers the files I want to encrypt, puts them in one file, encrypts that file, and adds it to the project for compilation. Code in the phone app only needs to decrypt the data.
The data I'm encrypting / decrypting is several API Keys (for ~10 web services), meant to be readable as plain text when decrypted.
This is the encryption algorithm (roughly, and with a few alterations) that I came up with:
public static byte[] ShuffleData(byte[] data)
{
    // Create a bit array to deal with the data on the bit level
    BitArray bits = new BitArray(data);
    // Generate a random GUID, and store it in a bit array as well
    Guid guid = Guid.NewGuid();
    BitArray guidBits = new BitArray(guid.ToByteArray());
    int guidBitsIndex = 0;
    // Iterate over the first half of the data bit by bit
    for (int i = 0; i < bits.Count / 2; i++)
    {
        // If the current GUID bit is true (1), then swap
        // the current bit with its mirror
        if (guidBits[guidBitsIndex])
        {
            bool temp = bits[i];
            bits[i] = bits[bits.Length - 1 - i];
            bits[bits.Length - 1 - i] = temp;
        }
        // Because the data being shuffled is expected to
        // contain more bits than the GUID, this index
        // wraps around to the start of the GUID
        guidBitsIndex = (guidBitsIndex + 1) % guidBits.Count;
    }
    // HideGuidInData hides the bits for the GUID in a hard
    // coded location inside the data being encrypted.
    HideGuidInData(ref bits, guidBits);
    // Convert the shuffled data bits (now containing the
    // GUID needed to decrypt the bits) into a byte array
    byte[] shuffled = new byte[bits.Length / 8];
    bits.CopyTo(shuffled, 0);
    // Return the data, now shuffled. (This array should
    // be the length of the original data, plus 16 bytes,
    // since 16 bytes are needed to store the GUID.)
    return shuffled;
}
I may be shooting myself in the foot by posting this, but if it's not known that the data is encrypted using this method, brute-force breaking of it takes n! time, where n is the total number of bits in the file (basically a much, much bigger search space than randomly guessing a GUID).
Assuming the GUID is well hidden within the file, a brute force attack would take a very long time to figure out.
I spent a lot of time learning about encryption on my way to this solution, and everything I read seemed to be WAY more complicated than this (and, obviously all the things I read dealt with two parties, where encryption can involve a key being passed between them).
What I learned is this:
If the key to encrypting the data is stored with the data, it's only a matter of time for someone to crack it, and get the data
There is no such thing as "perfectly secure". There are varying degrees of success in encryption, and generally speaking, when picking a method of encryption you will want to weigh the importance of the data being secure with the ease with which (considering processor and memory limitations) the data can be decrypted by your program.
I'm thinking that this is too simple to be a good solution. Can anyone prove that suspicion, and explain to me why this isn't as secure as some other methods of encryption? (or make me very happy and tell me this is pretty secure?)
These are the downsides to this algorithm that I can see right now:
The algorithm requires all of the data to be in memory (not TOO worried about this, since I'm encrypting a very small file that's ~500 bytes)
The algorithm requires changing the position of the stream reading the data in order to extract the GUID (basically you can't stream the file from the beginning to the end to decrypt it).
As a note, my application is not really of high importance; realistically it's not likely that anyone malicious will ever use Reflector to look at my code (realistically it's just people like me who want to know how something works, not do any harm).
This algorithm isn't going to buy you much. Someone who goes to the trouble of downloading your app and using Reflector will have your encrypted data and the code of the decryption process. They could just find your method for decrypting the data, and then use it.
The problem is that you're storing the "encryption key" in the cypher text. There is no way to make that secure when the attacker also has access to the algorithm used. Doesn't matter what crypto system you use.
The basic problem you have is that the phone application itself has to have all the information needed to decrypt and use the data, so anyone looking at the code will be able to see that.
It's the same reason that DRM schemes on DVDs, etc. are routinely broken so quickly. Any device, or application, that is able to play DRM-protected material has to have the means to decrypt it. Do enough poking around in memory while the device or app is playing the content and you'll find the decryption key, and then you can crack any similarly protected media any time you like.

C# dictionary - how to solve limit on number of items?

I am using Dictionary and I need to store almost 13,000,000 keys in it. Unfortunately, after adding the 11,950,000th key I got a "System.OutOfMemoryException". Is there any solution to this problem? I will need my program to run on less powerful computers than mine in the future.
I need that many keys because I need to store pairs - sequence name and sequence length; it is for solving a bioinformatics-related problem.
Any help will be appreciated.
Buy more memory, install a 64-bit version of the OS and recompile for 64 bits. No, I'm not kidding. If you want that many objects... in RAM... and then call it a "feature"... If the new Android can require 16 GB of memory to be compiled...
I was forgetting... You could begin by reading C# array of objects, very large, looking for a better way
Do you know how much 13 million objects is?
To make a comparison, a 32-bit Windows app has access to less than 2 GB of address space. So that's 2 billion bytes (give or take)... 2 billion / 13 million = something around 150 bytes/object. Now, if we consider how much space a reference type occupies... it's quite easy to eat 150 bytes.
I'll add something: I've looked into my Magic 8-Ball and it told me: show us your code. If you don't tell us what you are using for the key and the values, how are we supposed to help you? What are you using, class or struct or "primitive" types? Tell us the "size" of your TKey and TValue. Sadly our crystal ball broke yesterday :-)
C# is not a language that was designed to solve heavy-duty scientific computation problems. It is absolutely possible to use C# to build tools that do what you want, but the off-the-shelf parts like Dictionary were designed to solve more common business problems, like mapping zip codes to cities and that sort of thing.
You're going to have to go with external storage of some sort. My recommendation would be to buy a database and use it to store your data. Then use a DataSet or some similar technology to load portions of the data into memory, manipulate them, and then pour more data from the database into the DataSet, and so on.
Well, I had almost exactly the same problem.
I wanted to load about 12.5 million [string, int]s into a dictionary from a database (for all the programming "gods" above who don't understand why, the answer is that it is enormously quicker when you are working with a 150 GB database if you can cache a proportion of one of the key tables in memory).
It annoyingly threw an out-of-memory exception at pretty much the same place - just under the 12 million mark - even though the process was only consuming about 1.3 GB of memory (reduced to about 800 MB after a judicious change in the DB read method to not try to do it all at once), despite running on an i7 with 8 GB of memory.
The solution was actually remarkably simple -
In Visual Studio (2010), in Solution Explorer, right-click the project and select Properties.
In the Build tab set Platform Target to x64 and rebuild.
It rattles through the load into the Dictionary in a few seconds and the Dictionary performance is very good.
An easy solution is to just use a simple DB. The most obvious choice in this case, IMHO, is SQLite.NET: fast, easy and with a low memory footprint.
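As a sketch of that approach, assuming the System.Data.SQLite ADO.NET provider is referenced (the table, file and sequence names below are placeholders):
using System.Data.SQLite; // System.Data.SQLite provider

class SequenceStore
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=sequences.db"))
        {
            conn.Open();

            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "CREATE TABLE IF NOT EXISTS seq (name TEXT PRIMARY KEY, length INTEGER)";
                cmd.ExecuteNonQuery();
            }

            // Insert one pair; in practice wrap bulk inserts in a transaction for speed
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "INSERT OR REPLACE INTO seq (name, length) VALUES (@name, @len)";
                cmd.Parameters.AddWithValue("@name", "sequence_000001");
                cmd.Parameters.AddWithValue("@len", 1523);
                cmd.ExecuteNonQuery();
            }

            // Look up a length by name instead of holding 13 million entries in memory
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT length FROM seq WHERE name = @name";
                cmd.Parameters.AddWithValue("@name", "sequence_000001");
                object len = cmd.ExecuteScalar();
            }
        }
    }
}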
I think that you need a new approach to your processing.
I must assume that you obtain the data from a file or a database, either way that is where it should remain.
There is no way you can actually increase the limit on the number of values stored within a Dictionary, other than increasing system memory, but either way it is an extremely inefficient means of processing such a large amount of data.
You should rethink your algorithm so that you can process the data in more manageable portions. It will mean processing it in stages until you get your result. This may mean many hundreds of passes through the data, but it's the only way to do it.
I would also suggest that you look at using generics to help speed up this repetitive processing and cut down on memory usage.
Remember that there will still be a balancing act between system performance and access to externally stored data (be it external disk store or database).
It is not a problem with the Dictionary object, but with the available memory on your machine. I did some investigation to find the failure point of the Dictionary object, but it never failed. Below is the code for your reference:
private static void TestDictionaryLimit()
{
    int intCnt = 0;
    Dictionary<long, string> dItems = new Dictionary<long, string>();
    Console.WriteLine("Total number of iterations = {0}", long.MaxValue);
    Console.WriteLine("....");
    for (long lngCnt = 0; lngCnt < long.MaxValue; lngCnt++)
    {
        if (lngCnt < 11950020)
            dItems.Add(lngCnt, lngCnt.ToString());
        else
            break;
        // Print a progress marker every 100,000 additions
        if (lngCnt % 100000 == 0)
            Console.Write(intCnt++);
    }
    Console.WriteLine("Completed..");
    Console.WriteLine("{0} number of items in dictionary", dItems.Count);
}
The above code executes properly, and stores more items than the count you mentioned.
Really, 13,000,000 items are quite a lot.
If those 13,000,000 items are allocated classes, that's a very deep kick into the garbage collector's stomach!
Also, even if you find a way to use the default .NET dictionary, the performance would be really bad: that many keys approaches the number of values a 31-bit hash can represent, so performance will be awful in whatever system you use, and of course memory usage will be too high!
If you need a data structure that handles memory better than a hash table, you probably need a custom hash table mixed with a custom binary tree data structure.
Yes, it is possible to write your own combination of the two.
You cannot rely on the .NET hashtable for such a strange and specific problem.
Consider that a tree has a lookup complexity of O(log n), but a build complexity of O(n log n); of course, building it will take a long time.
You could then build a hashtable of binary trees (or vice versa) that allows you to use both data structures while consuming less memory.
Then, think about compiling it in 32-bit mode, not in 64-bit mode: 64-bit mode uses more memory for pointers.
At the same time the opposite may be true: the 32-bit address space may not be sufficient for your problem.
I've never had a problem that ran out of 32-bit address space!
If both keys and values are simple value types, I would suggest writing your data structure in a C DLL and using it through C#.
You can try writing a dictionary of dictionaries.
Let's say you split your data into chunks of 500,000 items between 26 dictionaries, for example, but the occupied memory would still be very, very big; I don't think your system will handle it.
public class MySuperDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] dictionaries;

    public MySuperDictionary()
    {
        this.dictionaries = new Dictionary<TKey, TValue>[373]; // must be a prime number.
        for (int i = 0; i < dictionaries.Length; ++i)
            dictionaries[i] = new Dictionary<TKey, TValue>(13000000 / dictionaries.Length);
    }

    public void Add(TKey key, TValue value)
    {
        int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
        dictionaries[bucket].Add(key, value);
    }

    public bool Remove(TKey key)
    {
        int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
        return dictionaries[bucket].Remove(key);
    }

    public bool TryGetValue(TKey key, out TValue result)
    {
        int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
        return dictionaries[bucket].TryGetValue(key, out result);
    }

    public static int GetSecondaryHashCode(TKey key)
    {
        // Here you should return a hash code for the key, ideally using a different
        // hashing algorithm than the one the inner dictionaries use.
        return key.GetHashCode(); // placeholder only; see the comment above
    }
}
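A quick usage sketch, using string keys and int lengths to match the original question's sequence-name/length pairs:
var lengths = new MySuperDictionary<string, int>();
lengths.Add("sequence_000001", 1523);

int length;
if (lengths.TryGetValue("sequence_000001", out length))
    Console.WriteLine("sequence_000001 has length {0}", length);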
With that many keys, you should either use a database or something like memcached, swapping chunks of the cache out to storage. I doubt you need all of the items at once, and if you do, there's no way it's going to work on a low-powered machine with little RAM.

Compress Guids by hashing in small data sets

I'm working on a mobile app and I want to optimise the data that it's receiving from the server (as JSON).
There are 3 lists returned (each containing its own class of objects, the approximate list sizes are 50, 100 and 170). Each object has a Guid id and there is some relation data for each object. E.g.:
o = { Id = "8f088552-5b24-4ba4-a6e5-8958c4353581",
RelatedIds = ["19d2e562-0874-473f-8e05-7052e8defd9a", "615b4c47-199a-4f7d-8268-08ed43d9c891", ... ] }
Is there a way to compress these Guids to something shorter without storing an identity map? Perhaps using a hash function?
You can convert the 16-byte representation of a GUID into a Base 64 string. However you didn't mention a programming language so we can't help further.
A hash function is not recommended here because hash functions are generally lossy.
No. One of the attributes of (non-cryptographic) hashes is that they collide: hash(a) == hash(b) but a != b. They are a performance optimization in the case where you are doing a lot of equality checks and you expect many false results (because if hash(a) != hash(b) then a != b). A GUID->counter map is probably the best way to get smaller ids here.
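A minimal sketch of such a GUID-to-counter map (the class and method names are illustrative):
using System;
using System.Collections.Generic;

class GuidCounterMap
{
    private readonly Dictionary<Guid, int> map = new Dictionary<Guid, int>();

    // Returns a small sequential id for the Guid, assigning a new one on first sight
    public int GetShortId(Guid id)
    {
        int shortId;
        if (!map.TryGetValue(id, out shortId))
        {
            shortId = map.Count;
            map.Add(id, shortId);
        }
        return shortId;
    }
}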
You can convert hex (base16) to base64 and drop the punctuation. You save about 25% by using base64, and another 4 characters by removing the hyphens.
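A minimal sketch of that conversion in C#; trimming the trailing "==" padding is optional, and the sample Guid is the one from the question:
using System;

class GuidShortener
{
    static void Main()
    {
        Guid id = new Guid("8f088552-5b24-4ba4-a6e5-8958c4353581");

        // 36 characters as hex-with-hyphens, 22 as unpadded Base64
        string short64 = Convert.ToBase64String(id.ToByteArray()).TrimEnd('=');
        Console.WriteLine(short64);

        // To get the Guid back, re-append the padding before decoding
        Guid roundTripped = new Guid(Convert.FromBase64String(short64 + "=="));
        Console.WriteLine(roundTripped);
    }
}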
Thinking about it some more, I've realized that HTTP compression (if enabled) is probably going to compress that data well enough anyway, so it's not really worth the effort to compress the data manually.
