I saw the following code in my company's codebase and thought to myself "Damn, that's a fine line of LINQ, I'd like to translate that to Haskell to see what it's like in an actual functional language."
static Random random = new Random();
static string RandomString(int length)
{
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)])
        .ToArray());
}
However I'm having a bit of trouble getting a concise and direct translation to Haskell because of how awkward it is to generate random numbers in this language.
I've considered a couple of approaches. The most direct translation of the C# code only generates a single random index and then uses that in place of the random.Next(s.Length). But I need to generate multiple indexes, not a single one.
Then I considered doing a list of IO Int random number actions, but I can't work out how to go through and convert the list of IO actions into actual random numbers.
So the Haskell code that I end up writing ends up looking quite convoluted compared to the C# (which I wasn't expecting in this case) and I haven't even got it to work anyway.
My question: what would be a natural translation of the C# to Haskell? Or, more generally, how would you go about generating a random String of a specified length in Haskell (because this C# way doesn't seem to translate well)?
Note: I'm mainly interested in what the algorithm to generate a random string looks like in Haskell. I'm not really interested in any standard libraries which do the job for me
The natural translation to Haskell involves having some sort of IO (as you need randomness). Since you are essentially trying to perform the action of choosing a character n times, you want to use replicateM. Then, for getting a random number in a range, you can use randomRIO.
import Control.Monad (replicateM)
import System.Random (randomRIO)
randomString :: Int -> IO String
randomString n = replicateM n (do r <- randomRIO (0, m); pure (chars !! r))
  where
    chars = ['A'..'Z'] ++ ['0'..'9']
    m = length chars - 1  -- randomRIO's range is inclusive, so the last valid index is length - 1
This is somewhat complicated by the fact that you want a string of only characters in a certain range. Otherwise, you'd have a one-liner: randomString n = replicateM n randomIO.
That said, the more faithful translation would use conduit. I'm not going to include imports and language pragmas (because they are a bit painful). This looks a lot more like what you would write in C#:
randomString' :: Int -> IO String
randomString' n = runConduit $ replicate n chars
    .| mapM (\cs -> do r <- randomRIO (0, m); pure (cs !! r))
    .| sinkList
  where
    chars = ['A'..'Z'] ++ ['0'..'9']
    m = length chars - 1  -- randomRIO's range is inclusive
Related
I wrote the following method to help me to generate a combination of decimal digits in C# :
public static string GenerateCodeNumeric(int length)
{
    Random random = new Random();
    string characters = "0123456789";
    StringBuilder result = new StringBuilder(length);
    for (int i = 0; i < length; i++)
    {
        result.Append(characters[random.Next(characters.Length)]);
    }
    return result.ToString();
}
The issue here is that there is some probability that this method produces two or more identical combinations. Is there any way to prevent this from happening?
I have a proposal to solve this:
Every time my method generates a code, I store it in a database table, and before generating a new code I check that table to see whether the code already exists.
But the problem is that once the table holds a lot of data, checking whether a code already exists will take a long time, and performance will suffer.
If it were up to me, I would just increment a counter; that's an absolute guarantee of no collision. (And if you need to generate the codes in different places, reserve some fixed digits to distinguish between them.)
Then if you need the codes to look random, scramble the bits in a reversible way so that distinct values remain distinct. If the codes are long enough, you can even insert random extra bits where you like.
By such a procedure, the values are implicitly unique and there is no need to check after the fact.
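As a rough illustration of the counter-plus-reversible-scramble idea for decimal codes, here is a minimal sketch. The class name, the multiplier and the offset are all made up for this example; the only property relied on is that multiplying by a number coprime to 10 and reducing modulo 10^length maps distinct counters to distinct codes.

```csharp
using System;

public static class UniqueCodes
{
    // Hypothetical constants: any multiplier that is odd and not divisible by 5
    // (i.e. coprime to 10) keeps the mapping one-to-one.
    private const long Multiplier = 387420489; // 3^18
    private const long Offset = 12345;         // arbitrary illustrative value

    // Maps counter values 0, 1, 2, ... to distinct decimal codes of the given length.
    public static string FromCounter(long counter, int length)
    {
        // Lengths above 9 would need wider arithmetic to avoid overflow.
        if (length < 1 || length > 9)
            throw new ArgumentOutOfRangeException(nameof(length));

        long modulus = 1;
        for (int i = 0; i < length; i++) modulus *= 10;

        if (counter < 0 || counter >= modulus)
            throw new ArgumentOutOfRangeException(nameof(counter), "Code space exhausted.");

        // x -> (a*x + b) mod m is a bijection on 0..m-1 when gcd(a, m) == 1.
        long scrambled = (counter * Multiplier + Offset) % modulus;
        return scrambled.ToString().PadLeft(length, '0');
    }
}
```

The counter itself still has to be stored somewhere (one row, not one row per code). Note that this only makes the codes look shuffled; anyone who knows the constants can invert the mapping, so it should not be treated as a security measure.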
My question is how I can represent a string as an int code. I don't want it parsed or converted to int (think of it as translating from English to French or German, for example).
What I want is to convert the string into an int code that can be used as a search reference. I was going to use the string's hash code, but since hashing depends on the machine's environment settings it is not optimal for my project. I had also considered using the ASCII codes of each letter in the word, but there are a great many long words in several languages and the app is globalized, so that is not very viable either. The project is going to be deployed as an Azure cloud site, so I don't have full-text search.
Any ideas what can I do in this case?
You are already giving a solution instead of giving requirements, so it may very well be that there are better options.
Anyway, you could use a platform independent hash with the managed cryptography classes like SHA512Managed. Be aware however that this is not guaranteed unique, so you might end up with collisions; but at least it's built in and you don't have to reinvent the wheel. Go here for an example.
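A minimal sketch of that idea follows. It uses SHA512.Create() rather than SHA512Managed (which recent .NET versions mark as obsolete), hashes the UTF-8 bytes, and folds the first four digest bytes into a uint. The class and method names are invented for this example, and truncating to 32 bits is an assumption about what "int code" means here; it makes collisions more likely than with the full digest.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Returns the same value for the same string on every machine and every run.
    public static uint Of(string s)
    {
        using (SHA512 sha = SHA512.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(s));

            // Combine the first four bytes explicitly (big-endian) so the result
            // does not depend on the CPU's byte order.
            return ((uint)digest[0] << 24)
                 | ((uint)digest[1] << 16)
                 | ((uint)digest[2] << 8)
                 | (uint)digest[3];
        }
    }
}
```

Because the value is only 32 bits wide, it is best used to narrow a search, with a final comparison of the actual strings to rule out collisions.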
I solved this by creating another table that contained just the words from each post and calling it with the ID of each word
One such hash is the sum of the char codes of each letter:
int hash = s.Select<char, int>(x => (int)x).Aggregate((x, y) => x + y);
However, that hash has collisions (any two anagrams produce the same sum). Another way is to concatenate the char codes of each letter, but you quickly exceed the largest value an integer can hold. One workaround is to subtract 64 from the value of each char first, although this only postpones the overflow and assumes every char code is at least 64:
uint hash = Convert.ToUInt32(s.Select<char, string>(x => (((int)x) - 64).ToString()).Aggregate((x, y) => x + y));
I'm working on a simple game and I have the requirement of taking a word or phrase such as "hello world" and converting it to a series of numbers.
The criteria are:
Numbers need to be distinct
Need the ability to configure the maximum length of the sequence, i.e. 10 numbers total.
Need the ability to configure the max range for each number in the sequence.
Must be deterministic, that is, we should get the same sequence every time for the same input phrase.
I've tried breaking down the problem like so:
Convert characters to ASCII number code: "hello world" = 104 101 108 108 111 32 119 111 114 108 100
Remove every other number until we satisfy the total number count (10 in this case)
For each number, if number > max number then divide by 2 until number <= max number
If any numbers are duplicated, increase or decrease the first occurrence until satisfied. (This could cause a problem, as you could create a duplicate by solving another duplicate)
Is there a better way of doing this or am I on the right track? As stated above I think I may run into issues with removing distinction.
If you want to limit the size of the output series - then this is impossible.
Proof:
Assume your output is a series of size k, each of range r <= M for some predefined M. Then there are at most M^k possible outputs.
However, there are infinitely many inputs, and in particular there are M^k + 1 different inputs.
By the pigeonhole principle (where the inputs are the pigeons and the outputs are the pigeonholes), at least 2 pigeons (inputs) end up in one pigeonhole (output), so the requirement cannot be achieved.
Original answer, provides workaround without limiting the size of the output series:
You can use prime numbers: let p1, p2, ... be distinct primes, all larger than 255 (257, 263, 269, ...).
Then, convert the string into a series of numbers using number[i] = ascii(char[i]) * p_i
The range of each number is then obviously [0, 255 * p_i]
Since every p_i is larger than any character code, p_i * x = p_j * y with i != j and 1 <= x, y <= 255 would require p_i to divide y, which is impossible, so you get uniqueness (a character with code 0 is the one exception). However, this is mainly nice theoretically, as the generated numbers might grow quickly, and for a practical implementation you may need a big number library such as Java's BigInteger (System.Numerics.BigInteger in C#).
Another possible solution (with the same relaxation of no series limitation) is:
number[i] = ascii(char[i]) + 256*(i-1)
In here the range for number[i] is [256*(i-1),256*i), and elements are still distinct.
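The offset scheme can be sketched in a few lines. The class and method names are made up for this example, and the index is 0-based here, so the i-th character (0-based) is shifted by 256 * i, which matches the 256 * (i - 1) of the 1-based formula above.

```csharp
using System;
using System.Linq;

public static class PositionalEncoding
{
    // Maps each character to code + 256 * position, so every position owns
    // its own block of 256 values and no two outputs can be equal.
    public static int[] Encode(string s)
    {
        if (s.Any(c => c > 255))
            throw new ArgumentException("Only characters with codes 0..255 are supported.");

        return s.Select((c, i) => c + 256 * i).ToArray();
    }
}
```

For "hello world" the first two values are 104 and 101 + 256 = 357. The output is as long as the input and its largest value grows with the input length, which is exactly the relaxation this answer assumes.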
Mathematically, it is theoretically possible to do what you want, but you won't be able to do it with C#'s fixed-size integer types:
If your outputs are required to be distinct, then you cannot lose any information after encoding the string using ASCII values. This means that if you limit your output size to n numbers then the numbers will have to include all information from the encoding.
So for your example
"hello world" -> 104 101 108 108 111 32 119 111 114 108 100
you would have to preserve the meaning of each of those numbers. The simplest way to do this would be to zero-pad your numbers to three digits and concatenate them into one large number, making your result 104101108108111032119111114108100 for max numbers = 1.
(You can see where the issue becomes, for arbitrary length input you need very large numbers.) So certainly it is possible to encode any arbitrary length string input to n numbers, but the numbers will become exceedingly large.
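To make the size problem concrete, here is a small sketch of the pad-and-concatenate encoding using System.Numerics.BigInteger, which is an assumption of this example rather than something the answer prescribes. The class and method names are invented.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Numerics;

public static class PaddedEncoding
{
    // Encodes each character as exactly three decimal digits and joins them
    // into a single arbitrarily large number.
    public static BigInteger Encode(string s)
    {
        if (s.Length == 0)
            throw new ArgumentException("Input must not be empty.");
        if (s.Any(c => c > 255))
            throw new ArgumentException("Only characters with codes 0..255 are supported.");

        string digits = string.Concat(
            s.Select(c => ((int)c).ToString("D3", CultureInfo.InvariantCulture)));

        return BigInteger.Parse(digits, CultureInfo.InvariantCulture);
    }
}
```

Every extra input character adds three more decimal digits, so an 11-character phrase already needs a 33-digit number. One caveat of this sketch: leading characters with code 0 vanish when the digit string is parsed, so inputs that differ only by leading NUL characters would collide.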
If by "numbers" you meant digits, then no, you cannot have distinct outputs, as @amit explained in his example with the pigeonhole principle.
Let's eliminate your criteria as easily as possible.
For distinct, deterministic, just use a hash code. (Hash actually isn't guaranteed to be distinct, but is highly likely to be):
string s = "hello world";
uint hash = unchecked((uint)s.GetHashCode());
Note that I reinterpreted the signed int returned from GetHashCode as unsigned, to avoid the chance of having a '-' appear. (A plain Convert.ToUInt32 would throw an OverflowException for negative hash codes.)
Then, for your max range per number, just convert the base.
That leaves you with the maximum sequence criteria. Without understanding your requirements better, all I can propose is truncate if necessary:
hash.ToString().Substring(0, size)
Truncating leaves a chance that you'll no longer be distinct, but that must be built in as acceptable to your requirements? As amit explains in another answer, you can't have infinite input and non-infinite output.
Ok, so in one comment you've said that this is just to pick lottery numbers. In that case, you could do something like this:
public static List<int> GenNumbers(String input, int count, int maxNum)
{
    List<int> ret = new List<int>();
    Random r = new Random(input.GetHashCode());
    for (int i = 0; i < count; ++i)
    {
        int next = r.Next(maxNum - i);
        foreach (int picked in ret.OrderBy(x => x))
        {
            if (picked <= next)
                ++next;
            else
                break;
        }
        ret.Add(next);
    }
    return ret;
}
The idea is to seed a random number generator with the hash code of the String. The rest of that is just picking numbers without replacement. I'm sure it could be written more efficiently - an alternative is to generate all maxNum numbers and shuffle the first count. Warning, untested.
I know newer versions of the .NET runtime use a randomized String hash code algorithm (so results will differ between runs). On the .NET Framework this is opt-in, but on .NET Core and later it is always on, so there you would need to write your own hash algorithm.
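Here is one way the two suggestions (a hand-written hash plus "shuffle the first count") might be combined. It is a sketch, not a tested replacement: the class and method names are invented, FNV-1a is just one simple choice of run-independent hash, and the numbers are 0-based like in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PhraseNumbers
{
    // FNV-1a over the UTF-8 bytes: a simple hash whose result does not change
    // between runs or machines, unlike String.GetHashCode.
    public static int StableSeed(string input)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(input))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // Picks `count` distinct numbers from 0..maxNum-1 by shuffling only the
    // first `count` slots of the pool (a partial Fisher-Yates shuffle).
    public static List<int> Generate(string input, int count, int maxNum)
    {
        if (count < 0 || count > maxNum)
            throw new ArgumentOutOfRangeException(nameof(count));

        var rng = new Random(StableSeed(input));
        int[] pool = new int[maxNum];
        for (int i = 0; i < maxNum; i++) pool[i] = i;

        var result = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            int j = rng.Next(i, maxNum); // random slot in i..maxNum-1
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            result.Add(pool[i]);
        }
        return result;
    }
}
```

A call such as PhraseNumbers.Generate("hello world", 6, 49) returns six distinct values below 49. The sequence is repeatable only as long as the Random implementation for a given seed stays the same, so swapping in a hand-written generator would remove that last dependency.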
Given a consistently seeded Random:
Random r = new Random(0);
Calling r.Next() consistently produces the same series; so is there a way to quickly discover the N-th value in that series, without calling r.Next() N times?
My scenario is a huge array of values created via r.Next(). The app occasionally reads a value from the array at arbitrary indexes. I'd like to optimize memory usage by eliminating the array and instead, generating the values on demand. But brute-forcing r.Next() 5 million times to simulate the 5 millionth index of the array is more expensive than storing the array. Is it possible to short-cut your way to the Nth .Next() value, without / with less looping?
I don't know the details of the PRNG used in the BCL, but my guess is that you will find it extremely difficult / impossible to find a nice, closed-form solution for N-th value of the series.
How about this workaround:
Make the seed to the random-number generator the desired index, and then pick the first generated number. This is equally 'deterministic', and gives you a wide range to play with in O(1) space.
static int GetRandomNumber(int index)
{
    return new Random(index).Next();
}
In theory, if you knew the exact algorithm and the initial state, you'd be able to duplicate the series, but the end result would just be identical to calling r.Next().
Depending on how 'good' you need your random numbers to be, you might consider creating your own PRNG based on a linear congruential generator, for which it is relatively easy and fast to generate numbers. If you can live with a "bad" PRNG there are likely other algorithms that may be better suited to your purpose. Whether this would be faster or better than just storing a large array of numbers from r.Next() is another question.
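For a linear congruential generator the N-th state can be reached in O(log N) steps, because applying x -> a*x + c repeatedly is itself a transform of the same form and can be squared. The sketch below uses the well-known Numerical Recipes constants with modulus 2^32; the class and method names are invented, and nothing here applies to System.Random itself.

```csharp
using System;

public static class JumpableLcg
{
    // Numerical Recipes LCG constants; the modulus 2^32 comes from uint wraparound.
    private const uint A = 1664525;
    private const uint C = 1013904223;

    // One ordinary step of the generator.
    public static uint Next(uint state)
    {
        unchecked { return A * state + C; }
    }

    // Returns the state reached after n steps from `seed` without iterating n times.
    public static uint Skip(uint seed, ulong n)
    {
        unchecked
        {
            uint accMul = 1, accAdd = 0; // accumulated transform, starts as identity
            uint curMul = A, curAdd = C; // transform for 2^k steps, starts at one step

            while (n > 0)
            {
                if ((n & 1) != 0)
                {
                    // Compose: acc = cur applied after acc.
                    accAdd = curMul * accAdd + curAdd;
                    accMul = curMul * accMul;
                }
                // Square the current transform: cur = cur applied after cur.
                curAdd = curMul * curAdd + curAdd;
                curMul = curMul * curMul;
                n >>= 1;
            }
            return accMul * seed + accAdd;
        }
    }
}
```

With this, the value at index 5,000,000 costs about 23 loop iterations instead of five million calls. The usual caveat applies: a 32-bit LCG has weak low-order bits, so whether its quality is acceptable depends on the application.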
No, I don't believe there is. For some RNG algorithms (such as linear congruential generators) it's possible in principle to get the n'th value without iterating through n steps, but the Random class doesn't provide a way of doing that.
I'm not sure whether the algorithm it uses makes it possible in principle -- it's a variant (details not disclosed in documentation) of Knuth's subtractive RNG, and it seems like the original Knuth RNG should be equivalent to some sort of polynomial-arithmetic thing that would allow access to the n'th value, but (1) I haven't actually checked that and (2) whatever tweaks Microsoft have made might break that.
If you have a good enough "scrambling" function f then you can use f(0), f(1), f(2), ... as your sequence of random numbers, instead of f(0), f(f(0)), f(f(f(0))), etc. (the latter being roughly what most RNGs do) and then of course it's trivial to start the sequence at any point you please. But you'll need to choose a good f, and it'll probably be slower than a standard RNG.
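As an illustration of the f(0), f(1), f(2), ... approach, here is a sketch that uses the 32-bit finalizer from MurmurHash3 as the scrambling function. That choice is an assumption made for this example, as are the class and method names; its statistical quality has not been checked against any particular requirement.

```csharp
using System;

public static class IndexedRandom
{
    // Returns a pseudo-random value for the given index, with no state to advance.
    public static uint ValueAt(uint seed, uint index)
    {
        unchecked
        {
            // Spread consecutive indexes apart, then mix in the seed.
            uint h = index * 0x9E3779B9u + seed;

            // MurmurHash3 32-bit finalizer: each step is invertible,
            // so distinct indexes always give distinct outputs.
            h ^= h >> 16;
            h *= 0x85EBCA6Bu;
            h ^= h >> 13;
            h *= 0xC2B2AE35u;
            h ^= h >> 16;
            return h;
        }
    }
}
```

Reading "element 5,000,000" is then just IndexedRandom.ValueAt(seed, 5000000), at the same cost as reading element 0. Because the mapping is one-to-one, values never repeat within the 2^32 indexes, which differs from a truly random sequence and may or may not matter for the use case.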
You could build your own on-demand dictionary of 'indexes' & 'random values'. This assumes that you will always 'demand' indexes in the same order each time the program runs or that you don't care if the results are the same each time the program runs.
Random rnd = new Random(0);
Dictionary<int,int> randomNumbers = new Dictionary<int,int>();
int getRandomNumber(int index)
{
    if (!randomNumbers.ContainsKey(index))
        randomNumbers[index] = rnd.Next();
    return randomNumbers[index];
}
I'm sorry, didn't know that Length is computed at construction time!!
I have a 200-character string A and a 5-character string B.
If I do
int Al = A.Length;
int Bl = B.Length;
and compare them, all seems fine, but if I do this a few million times to
calculate something, it's too expensive for what I need.
A much simpler and neater way would be some function that compares two strings and tells me whether one is at least as long as the other.
Something like compare_string_lengths(stringA, stringB), which returns TRUE when string B is at least as long (in chars) as string A.
Yes,
I know that the function wouldn't have any idea which string is shorter, but if the lengths of the two strings were counted in parallel, the function would know what to "answer" as soon as one exceeds the other.
Thanks for any hints.
If you only need to know whether the strings differ in length (or if you wish to check whether the lengths are equal before comparing), I don't think you can do it faster than comparing the Length property. Retrieving the length from a string is an O(1) operation.
To actually compare the strings you need to look at each character, which makes it an O(n) operation.
Edit:
If things run too slowly, you should have a look in a profiler to see which parts are the slowest. Perhaps it is the construction of your strings that takes the time?
There are few things cheaper than comparing the length of two strings.
If you want to find a string in a list of strings, use a Hashtable, like:
var x = new System.Collections.Generic.Dictionary<string, bool>();
x.Add("string", true);
if (x.ContainsKey("string"))
    Console.WriteLine("Found string.");
This is amazingly fast.