random binary of one million bit file size - c#

I want to generate one million random bits, but my problem is that the code below takes too much time and never seems to finish. Why does that happen?
string result1 = "";
Random rand = new Random();
for (int i = 0; i < 1000000; i++)
{
    result1 += ((rand.Next() % 2 == 0) ? "0" : "1");
}
textBox1.Text = result1.ToString();

Concatenating strings is an O(N) operation. Strings are immutable, so each time you append, the previous contents are copied into a brand-new string. Since you append once per iteration, the amount copied grows with every pass, and the overall cost becomes O(N^2). With N = 1,000,000 this takes a very, very long time and burns through a lot of memory allocating intermediate, throw-away strings.
The normal solution when building a string from an arbitrary number of inputs is to use a StringBuilder instead, although a 1,000,000-character bit string is still unwieldy. Assuming a bit string is really what you want, you can change your code to something like the following and get a far more performant solution.
public string GetGiantBitString() {
    var sb = new StringBuilder();
    var rand = new Random();
    for (var i = 0; i < 1_000_000; i++) {
        sb.Append(rand.Next() % 2);
    }
    return sb.ToString();
}

This works for me; it takes about 0.035 seconds on my box:
private static IEnumerable<Byte> MillionBits()
{
    var rand = new RNGCryptoServiceProvider();
    // a million bits is 125,000 bytes, so
    var bytes = new List<byte>(125000);
    for (var i = 0; i < 125; ++i)
    {
        byte[] tempBytes = new byte[1000];
        rand.GetBytes(tempBytes);
        bytes.AddRange(tempBytes);
    }
    return bytes;
}

private static string BytesAsString(IEnumerable<Byte> bytes)
{
    var buffer = new StringBuilder();
    foreach (var byt in bytes)
    {
        buffer.Append(Convert.ToString(byt, 2).PadLeft(8, '0'));
    }
    return buffer.ToString();
}
and then:
var myStopWatch = new Stopwatch();
myStopWatch.Start();
var lotsOfBytes = MillionBits();
var bigString = BytesAsString(lotsOfBytes);
var len = bigString.Length;
var elapsed = myStopWatch.Elapsed;
The len variable was a million, and the string looked like it was all 1s and 0s.
If you really want to fill your textbox full of ones and zeros, just set its Text property to bigString.
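For what it's worth, on newer runtimes the same byte-based idea fits in a few lines. This is only a sketch and assumes .NET Core 2.1 or later, where RandomNumberGenerator.Fill is available:
byte[] bytes = new byte[125_000];                       // 125,000 bytes = 1,000,000 bits
System.Security.Cryptography.RandomNumberGenerator.Fill(bytes);
var sb = new System.Text.StringBuilder(1_000_000);
foreach (var b in bytes)
    sb.Append(Convert.ToString(b, 2).PadLeft(8, '0')); // each byte becomes 8 characters
string bits = sb.ToString();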


Convert byte array to array segments of a certain length

I have a byte array and I would like to return sequential chunks (in the form of new byte arrays) of a certain size.
I tried:
originalArray = BYTE_ARRAY
var segment = new ArraySegment<byte>(originalArray,0,640);
byte[] newArray = new byte[640];
for (int i = segment.Offset; i <= segment.Count; i++)
{
    newArray[i] = segment.Array[i];
}
Obviously this only creates an array of the first 640 bytes from the original array. Ultimately, I want a loop that goes through the first 640 bytes and returns an array of those bytes, then goes through the NEXT 640 bytes and returns an array of THOSE bytes. The purpose of this is to send messages to a server, and each message must contain 640 bytes. I cannot guarantee that the original array length is divisible by 640.
Thanks
If speed isn't a concern:
var bytes = new byte[640 * 6];

for (var i = 0; i < bytes.Length; i += 640)
{
    var chunk = bytes.Skip(i).Take(640).ToArray();
    ...
}
Alternatively you could use
Span.Slice Method
Buffer.BlockCopy(Array, Int32, Array, Int32, Int32) Method
Span
Span<byte> bytes = arr; // Implicit cast from T[] to Span<T>
...
var slicedBytes = bytes.Slice(i, 640);
BlockCopy
Note this will probably be the fastest of the 3
var chunk = new byte[640];
Buffer.BlockCopy(bytes, i, chunk, 0, 640);
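Put together, a BlockCopy-based loop might look like the sketch below (it reuses the bytes array from the snippet above; the handling of a shorter final chunk is my own assumption, since the question says the array length may not divide evenly by 640):
var chunks = new List<byte[]>();
for (int i = 0; i < bytes.Length; i += 640)
{
    // the last chunk may be shorter than 640 bytes
    int size = Math.Min(640, bytes.Length - i);
    var chunk = new byte[size];
    Buffer.BlockCopy(bytes, i, chunk, 0, size);
    chunks.Add(chunk);
}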
If you truly want to make new arrays from each 640 byte chunk, then you're looking for .Skip and .Take
Here's a working example (and a repl of the example) that I hacked together.
using System;
using System.Linq;
using System.Text;
using System.Collections;
using System.Collections.Generic;
class MainClass {
    public static void Main (string[] args) {
        // mock up a byte array from something
        var seedString = String.Join("", Enumerable.Range(0, 1024).Select(x => x.ToString()));
        var byteArrayInput = Encoding.ASCII.GetBytes(seedString);

        var skip = 0;
        var take = 640;
        var total = byteArrayInput.Length;

        var output = new List<byte[]>();

        // note: the final chunk may be shorter than 640 bytes,
        // since the input length isn't guaranteed to divide evenly
        while (skip < total) {
            output.Add(byteArrayInput.Skip(skip).Take(take).ToArray());
            skip += take;
        }

        output.ForEach(c => Console.WriteLine($"chunk: {BitConverter.ToString(c)}"));
    }
}
It's probably better to actually use the ArraySegment properly, unless this is an assignment to learn LINQ extensions.
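For what that's worth, a minimal sketch of the ArraySegment route (the helper name is made up here) avoids copying entirely, since each segment is just a view over the original array:
public static IEnumerable<ArraySegment<byte>> AsSegments(byte[] source, int chunkSize)
{
    for (int offset = 0; offset < source.Length; offset += chunkSize)
    {
        // the final segment may be shorter than chunkSize
        int count = Math.Min(chunkSize, source.Length - offset);
        yield return new ArraySegment<byte>(source, offset, count);
    }
}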
You can write a generic helper method like this:
public static IEnumerable<T[]> AsBatches<T>(T[] input, int n)
{
    for (int i = 0, r = input.Length; r >= n; r -= n, i += n)
    {
        var result = new T[n];
        Array.Copy(input, i, result, 0, n);
        yield return result;
    }
}
Then you can use it in a foreach loop:
byte[] byteArray = new byte[123456];
foreach (var batch in AsBatches(byteArray, 640))
{
    Console.WriteLine(batch.Length); // Do something with the batch.
}
Or if you want a list of batches just do this:
List<byte[]> listOfBatches = AsBatches(byteArray, 640).ToList();
If you want to get fancy you could make it an extension method, but this is only recommended if you will be using it a lot (don't make an extension method for something you'll only be calling in one place!).
Here I've changed the name to InChunksOf() to make it more readable:
public static class ArrayExt
{
    public static IEnumerable<T[]> InChunksOf<T>(this T[] input, int n)
    {
        for (int i = 0, r = input.Length; r >= n; r -= n, i += n)
        {
            var result = new T[n];
            Array.Copy(input, i, result, 0, n);
            yield return result;
        }
    }
}
Which you could use like this:
byte[] byteArray = new byte[123456];
// ... initialise byteArray[], then:
var listOfChunks = byteArray.InChunksOf(640).ToList();
[EDIT] Corrected loop terminator from r > n to r >= n.
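Note that the loop above yields full chunks only and silently drops a trailing partial chunk. If that matters (the question says the length may not divide evenly by 640), a variant along these lines covers the remainder; this is a sketch, not part of the original answer, and would live in the same static class:
public static IEnumerable<T[]> InChunksOfWithRemainder<T>(this T[] input, int n)
{
    for (int i = 0; i < input.Length; i += n)
    {
        int size = Math.Min(n, input.Length - i); // last chunk may be shorter than n
        var result = new T[size];
        Array.Copy(input, i, result, 0, size);
        yield return result;
    }
}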

Faster way to generate random text file C#

The output should be a large text file where each line has the form "Number. String" and the text is random:
347. Bus
20175. Yes Yes
15. The same
2. Hello world
178. Tree
The file size must be specified in bytes. I'm interested in the fastest way to generate files of about 1000 MB and larger.
Here is my code for generating random text:
public string[] GetRandomTextWithIndexes(int size)
{
    var result = new string[size];

    var sw = Stopwatch.StartNew();
    var indexes = Enumerable.Range(0, size).AsParallel().OrderBy(g => GenerateRandomNumber(0, 5)).ToList();
    sw.Stop();
    Console.WriteLine("Queue fill: " + sw.Elapsed);

    sw = Stopwatch.StartNew();
    Parallel.For(0, size, i =>
    {
        var text = GetRandomText(GenerateRandomNumber(1, 20));
        result[i] = $"{indexes[i]}. {text}";
    });
    sw.Stop();
    Console.WriteLine("Text fill: " + sw.Elapsed);

    return result;
}

public string GetRandomText(int size)
{
    var builder = new StringBuilder();
    for (var i = 0; i < size; i++)
    {
        var character = LegalCharacters[GenerateRandomNumber(0, LegalCharacters.Length)];
        builder.Append(character);
    }
    return builder.ToString();
}

private int GenerateRandomNumber(int min, int max)
{
    lock (_synlock)
    {
        if (_random == null)
            _random = new Random();
        return _random.Next(min, max);
    }
}
I don't know how to make this code work with a target size in MB rather than a number of strings. When I set size to about 1000000000 I get an OutOfMemoryException. Is there perhaps also a faster way to generate the indexes?
Disk is your bottleneck, no need for parallel processing
No need to store everything in memory before writing
using (var fs = File.OpenWrite(@"c:\w\test.txt"))
using (var w = new StreamWriter(fs))
{
    for (var i = 0; i < size; i++)
    {
        var text = GetRandomText(GenerateRandomNumber(1, 20));
        var number = GenerateRandomNumber(0, 5);
        var line = $"{number}. {text}";
        w.WriteLine(line);
    }
}
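To stop at a target file size in bytes rather than after a fixed number of lines, a variant of the same loop can check the underlying stream's position. This is only a sketch reusing the question's GetRandomText and GenerateRandomNumber helpers; the 1000 MB target is an example, and the periodic flush (needed to keep fs.Position reasonably current) costs a little speed:
long targetBytes = 1000L * 1024 * 1024; // ~1000 MB
using (var fs = File.OpenWrite(@"c:\w\test.txt"))
using (var w = new StreamWriter(fs))
{
    var i = 0;
    while (fs.Position < targetBytes)
    {
        var text = GetRandomText(GenerateRandomNumber(1, 20));
        w.WriteLine($"{GenerateRandomNumber(0, 5)}. {text}");
        if (++i % 10000 == 0)
            w.Flush(); // fs.Position only advances when the writer's buffer is flushed
    }
}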
It's better to put the full exception in the question. I bet it shows at
var result = new string[size];
1000000000 for the size of a string array is too much; try to run this dotnetfiddle and you'll get:
Run-time exception (line 12): Array dimensions exceeded supported range.
Stack Trace:
[System.OutOfMemoryException: Array dimensions exceeded supported range.] at Program.Main() :line 12
Please have a look at the following to see why you are getting that exception and what the workaround is.
What is the Maximum Size that an Array can hold?
Can't create huge arrays
Error when Dictionary count is bigger as 89478457

Intersect and Union in byte array of 2 files

I have two files: the first is the source file and the second is the destination file.
Below is my code for intersecting and unioning the two files using byte arrays.
FileStream frsrc = new FileStream("Src.bin", FileMode.Open);
FileStream frdes = new FileStream("Des.bin", FileMode.Open);
int length = 24; // get file length
byte[] src = new byte[length];
byte[] des = new byte[length]; // create buffer
int Counter = 0; // actual number of bytes read
int subcount = 0;
while (frsrc.Read(src, 0, length) > 0)
{
    try
    {
        Counter = 0;
        frdes.Position = subcount * length;
        while (frdes.Read(des, 0, length) > 0)
        {
            var data = src.Intersect(des);
            var data1 = src.Union(des);
            Counter++;
        }
        subcount++;
        Console.WriteLine(subcount.ToString());
    }
    catch (Exception ex)
    {
    }
}
It works fine and is very fast.
But now the problem is that I want the counts, and when I use the code below it becomes very slow.
var data = src.Intersect(des).Count();
var data1 = src.Union(des).Count();
So, is there any solution for that?
If yes, then please let me know as soon as possible.
Thanks
Intersect and Union are not the fastest operations. The reason you see it being fast is that you never actually enumerate the results!
Both return an enumerable, not the actual results of the operation. Nothing happens until you enumerate it - this is called "deferred execution". When you do Count, you actually enumerate the enumerable and incur the full cost of the Intersect and Union - believe me, the Count itself is relatively trivial (though still an O(n) operation!).
You'll need to make your own methods, most likely. You want to avoid the enumerable overhead, and more importantly, you'll probably want a lookup table.
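As a sketch of that lookup-table idea (assuming what you ultimately want is the number of distinct byte values shared by the two buffers and covered by either of them), something like this avoids the LINQ overhead:
static void CountIntersectUnion(byte[] src, byte[] des, out int intersect, out int union)
{
    var inSrc = new bool[256];
    var inDes = new bool[256];
    foreach (var b in src) inSrc[b] = true;
    foreach (var b in des) inDes[b] = true;

    intersect = 0;
    union = 0;
    for (int v = 0; v < 256; v++)
    {
        if (inSrc[v] && inDes[v]) intersect++; // byte value present in both buffers
        if (inSrc[v] || inDes[v]) union++;     // byte value present in either buffer
    }
}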
A few points: the comment // get file length is misleading as it is the buffer size. Counter is not the number of bytes read, it is the number of blocks read. data and data1 will end up with the result of the last block read, ignoring any data before them. That is assuming that nothing goes wrong in the while loop - you need to remove the try structure to see if there are any errors.
What you can do is count the number of occurrences of each byte in each file; then if a byte occurs at least once in any of the files it is a member of the union of the files, and if it occurs at least once in every file it is a member of the intersection of the files.
It is just as easy to write the code for more than two files as it is for two files, whereas LINQ is easy for two but a little bit more fiddly for more than two. (I put in a comparison with using LINQ in a naïve fashion for only two files at the end.)
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var file1 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\Crysis3.exe"; // 26MB
var file2 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\d3dcompiler_46.dll"; // 3MB
List<string> files = new List<string> { file1, file2 };
var sw = System.Diagnostics.Stopwatch.StartNew();
// Prepare array of counters for the bytes
var nFiles = files.Count;
int[][] count = new int[nFiles][];
for (int i = 0; i < nFiles; i++)
{
count[i] = new int[256];
}
// Get the counts of bytes in each file
int bufLen = 32768;
byte[] buffer = new byte[bufLen];
int bytesRead;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
using (var sr = new FileStream(files[fileNum], FileMode.Open, FileAccess.Read))
{
bytesRead = bufLen;
while (bytesRead > 0)
{
bytesRead = sr.Read(buffer, 0, bufLen);
for (int i = 0; i < bytesRead; i++)
{
count[fileNum][buffer[i]]++;
}
}
}
}
// Find which bytes are in any of the files or in all the files
var inAny = new List<byte>(); // union
var inAll = new List<byte>(); // intersect
for (int i = 0; i < 256; i++)
{
Boolean all = true;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
if (count[fileNum][i] > 0)
{
if (!inAny.Contains((byte)i)) // avoid adding same value more than once
{
inAny.Add((byte)i);
}
}
else
{
all = false;
}
};
if (all)
{
inAll.Add((byte)i);
};
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
// Display the results
Console.WriteLine("Union: " + string.Join(",", inAny.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + string.Join(",", inAll.Select(x => x.ToString("X2"))));
Console.WriteLine();
// Compare to using LINQ.
// N/B. Will need adjustments for more than two files.
var srcBytes1 = File.ReadAllBytes(file1);
var srcBytes2 = File.ReadAllBytes(file2);
sw.Restart();
var intersect = srcBytes1.Intersect(srcBytes2).ToArray().OrderBy(x => x);
var union = srcBytes1.Union(srcBytes2).ToArray().OrderBy(x => x);
Console.WriteLine(sw.ElapsedMilliseconds);
Console.WriteLine("Union: " + String.Join(",", union.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + String.Join(",", intersect.Select(x => x.ToString("X2"))));
Console.ReadLine();
}
}
}
The counting-the-byte-occurrences method is roughly five times faster than the LINQ method on my computer, even though the LINQ timing excludes loading the files, and across a range of file sizes (a few KB to a few MB).

Cryptographically random unique strings

In this answer, the code below was posted for creating unique random alphanumeric strings. Could someone clarify for me how exactly the strings are guaranteed to be unique in this code, and to what extent they are unique? If I rerun this method on different occasions, would I still get unique strings?
Or did I just misunderstand the reply and these are not generating unique keys at all, only random?
I already asked this in a comment to that answer but the user seems to be inactive.
public static string GetUniqueKey()
{
    int maxSize = 8;
    char[] chars = new char[62];
    string a;
    a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    chars = a.ToCharArray();
    int size = maxSize;
    byte[] data = new byte[1];
    RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
    crypto.GetNonZeroBytes(data);
    size = maxSize;
    data = new byte[size];
    crypto.GetNonZeroBytes(data);
    StringBuilder result = new StringBuilder(size);
    foreach (byte b in data)
    { result.Append(chars[b % (chars.Length - 1)]); }
    return result.ToString();
}
There is nothing in the code that guarantees that the result is unique. To get a unique value you either have to keep all previous values so that you can check for duplicates, or use a lot longer codes so that duplicates are practically impossible (e.g. a GUID). The code contains less than 48 bits of information, which is a lot less than the 128 bits of a GUID.
The string is just random, and although a crypto strength random generator is used, that is ruined by how the code is generated from the random data. There are some issues in the code:
A char array is created, that is just thrown away and replaced with another.
A one byte array of random data is created for no apparent reason at all, as it's not used for anything.
The GetNonZeroBytes method is used instead of the GetBytes method, which adds a skew to the distribution of characters as the code does nothing to handle the lack of zero values.
The modulo (%) operator is used to reduce the random number down to the number of characters used, but the random number can't be evenly divided into the number of characters, which also adds a skew to the distribution of characters.
chars.Length - 1 is used instead of chars.Length when the number is reduced, which means that only 61 of the predefined 62 characters can occur in the string.
Although those issues are minor, they are important when you are dealing with crypto-strength randomness.
A version of the code that would produce a string without those issues, and give a code with enough information to be considered practically unique:
public static string GetUniqueKey() {
    int size = 16;
    byte[] data = new byte[size];
    RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
    crypto.GetBytes(data);
    return BitConverter.ToString(data).Replace("-", String.Empty);
}
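If the 62-character alphabet is a requirement rather than hex output, a common way to keep the crypto-strength randomness while avoiding the GetNonZeroBytes and modulo-skew issues described above is rejection sampling. This is a sketch of that idea, not the original answer's code:
public static string GetRandomAlphanumeric(int length)
{
    const string alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var result = new StringBuilder(length);
    var buffer = new byte[1];
    using (var crypto = new RNGCryptoServiceProvider())
    {
        while (result.Length < length)
        {
            crypto.GetBytes(buffer);
            if (buffer[0] < 248)                        // 248 = 4 * 62; rejecting 248..255 keeps the distribution uniform
                result.Append(alphabet[buffer[0] % 62]);
        }
    }
    return result.ToString();
}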
Uniqueness and randomness are mutually exclusive concepts. If a random number generator is truly random, then it can return the same value. If values are truly unique, although they may not be deterministic, they certainly aren't truly random, because every value generated removes a value from the pool of allowed values. This means that every run affects the outcome of subsequent runs, and at a certain point the pool is exhausted (barring of course the possibility of an infinitely-sized pool of allowed values, but the only way to avoid collisions in such a pool would be the use of a deterministic method of choosing values).
The code you're showing generates values that are very random, but not 100% guaranteed to be unique. After enough runs, there will be a collision.
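As a rough birthday-problem estimate (my own arithmetic, not from the answer above): with 62^8 ≈ 2.2 × 10^14 possible 8-character strings, the chance of at least one collision reaches about 50% after roughly sqrt(2 × 62^8 × ln 2) ≈ 1.7 × 10^7 generated strings, i.e. on the order of tens of millions of keys.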
I need to generate a 7-character alphanumeric string. After a little searching, I wrote the code below; performance results are shown above.
I used the Hashtable class to guarantee uniqueness and the RNGCryptoServiceProvider class to get high-quality random chars.
The results are for generating samples of 100,000, 1,000,000, and 10,000,000 strings.
Generating unique strings (thanks to nipul parikh):
public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
{
uniqueHashTable = new Hashtable();
nonUniqueList = new List<string>();
uniqueList = new List<string>();
for (int i = 0; i < count; i++)
{
isUniqueGenerated = false;
while (!isUniqueGenerated)
{
uniqueStr = GetUniqueKey();
try
{
uniqueHashTable.Add(uniqueStr, "");
isUniqueGenerated = true;
}
catch (Exception ex)
{
nonUniqueList.Add(uniqueStr);
// Non-unique generated
}
}
}
uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();
return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
}
public static string GetUniqueKey()
{
int size = 7;
char[] chars = new char[62];
string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
chars = a.ToCharArray();
RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
byte[] data = new byte[size];
crypto.GetNonZeroBytes(data);
StringBuilder result = new StringBuilder(size);
foreach (byte b in data)
result.Append(chars[b % (chars.Length - 1)]);
return Convert.ToString(result);
}
Whole Console Application Code below;
class Program
{
static string uniqueStr;
static Stopwatch stopwatch;
static bool isUniqueGenerated;
static Hashtable uniqueHashTable;
static List<string> uniqueList;
static List<string> nonUniqueList;
static Tuple<List<string>, List<string>> generatedTuple;
static void Main(string[] args)
{
int i = 0, y = 0, count = 100000;
while (i < 10 && y < 4)
{
stopwatch = new Stopwatch();
stopwatch.Start();
generatedTuple = GenerateUniqueList(count);
stopwatch.Stop();
Console.WriteLine("Time elapsed: {0} --- {1} Unique --- {2} nonUnique",
stopwatch.Elapsed,
generatedTuple.Item1.Count().ToFormattedInt(),
generatedTuple.Item2.Count().ToFormattedInt());
i++;
if (i == 9)
{
Console.WriteLine(string.Empty);
y++;
count *= 10;
i = 0;
}
}
Console.ReadLine();
}
public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
{
uniqueHashTable = new Hashtable();
nonUniqueList = new List<string>();
uniqueList = new List<string>();
for (int i = 0; i < count; i++)
{
isUniqueGenerated = false;
while (!isUniqueGenerated)
{
uniqueStr = GetUniqueKey();
try
{
uniqueHashTable.Add(uniqueStr, "");
isUniqueGenerated = true;
}
catch (Exception ex)
{
nonUniqueList.Add(uniqueStr);
// Non-unique generated
}
}
}
uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();
return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
}
public static string GetUniqueKey()
{
int size = 7;
char[] chars = new char[62];
string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
chars = a.ToCharArray();
RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
byte[] data = new byte[size];
crypto.GetNonZeroBytes(data);
StringBuilder result = new StringBuilder(size);
foreach (byte b in data)
result.Append(chars[b % (chars.Length - 1)]);
return Convert.ToString(result);
}
}
public static class IntExtensions
{
public static string ToFormattedInt(this int value)
{
return string.Format(CultureInfo.InvariantCulture, "{0:0,0}", value);
}
}
Using strictly alphanumeric characters restricts the pool you draw from to 62. Using the complete set of printable ASCII characters (codes 33-126) increases your pool to 94, decreasing the likelihood of collisions and removing the need to define the character pool separately.
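A sketch of that larger-pool idea, again using rejection sampling to avoid modulo bias (the method name and the threshold 188 = 2 × 94 are my own choices, not from the answer):
public static string GetPrintableKey(int length)
{
    var result = new StringBuilder(length);
    var buffer = new byte[1];
    using (var crypto = new RNGCryptoServiceProvider())
    {
        while (result.Length < length)
        {
            crypto.GetBytes(buffer);
            if (buffer[0] < 188)                            // reject 188..255 to avoid modulo bias
                result.Append((char)(33 + buffer[0] % 94)); // printable ASCII 33..126
        }
    }
    return result.ToString();
}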

Random class generating same sequence

I have a method which I am using to generate random strings by creating random integers and casting them to char
public static string GenerateRandomString(int minLength, int maxLength)
{
    var length = GenerateRandomNumber(minLength, maxLength);
    var builder = new StringBuilder(length);
    var random = new Random((int)DateTime.Now.Ticks);
    for (var i = 0; i < length; i++)
    {
        builder.Append((char) random.Next(255));
    }
    return builder.ToString();
}
The problem is that when I call this method frequently, it creates the same sequence of values, as the docs already say:
The random number generation starts from a seed value. If the same
seed is used repeatedly, the same series of numbers is generated. One
way to produce different sequences is to make the seed value
time-dependent, thereby producing a different series with each new
instance of Random.
As you can see, I am making the seed time-dependent and also creating a new instance of Random on each call to the method. Even so, my test is still failing.
[TestMethod]
public void GenerateRandomStringTest()
{
    for (var i = 0; i < 100; i++)
    {
        var string1 = Utilitaries.GenerateRandomString(10, 100);
        var string2 = Utilitaries.GenerateRandomString(10, 20);
        if (string1.Contains(string2))
            throw new InternalTestFailureException("");
    }
}
How could I make sure that independently of the frequency on which I call the method, the sequence will "always" be different?
Your test is failing because the GenerateRandomString function completes too soon for the DateTime.Now.Ticks to change. On most systems it is quantized at either 10 or 15 ms, which is more than enough time for a modern CPU to generate a sequence of 100 random characters.
Inserting a small delay in your test should fix the problem:
var string1 = Utilitaries.GenerateRandomString(10, 100);
Thread.Sleep(30);
var string2 = Utilitaries.GenerateRandomString(10, 20);
You're effectively doing the same as Random's default constructor, which uses Environment.TickCount. Take a look at the example in the MSDN documentation for the Random constructor. It shows that inserting a Thread.Sleep between the initialization of the different Random instances will yield different results.
If you really want to get different values, I suggest you change to a seed value that's not time-dependent.
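One way to do that (a sketch, not the only option) is to seed Random from the crypto RNG instead of the clock, so two instances created within the same clock tick still diverge:
private static Random CreateRandom()
{
    var seed = new byte[4];
    using (var rng = new RNGCryptoServiceProvider())
    {
        rng.GetBytes(seed); // 4 cryptographically random bytes
    }
    return new Random(BitConverter.ToInt32(seed, 0));
}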
dasblinkenlight has explained why this is happening.
To overcome the problem, you can do this:
public static string GenerateRandomString(Random random, int minLength, int maxLength)
{
    var length = GenerateRandomNumber(random, minLength, maxLength);
    var builder = new StringBuilder(length);
    for (var i = 0; i < length; i++)
        builder.Append((char) random.Next(255));
    return builder.ToString();
}

public void GenerateRandomStringTest()
{
    Random rnd = new Random();
    for (var i = 0; i < 100; i++)
    {
        var string1 = Utilitaries.GenerateRandomString(rnd, 10, 100);
        var string2 = Utilitaries.GenerateRandomString(rnd, 10, 20);
        if (string1.Contains(string2))
            throw new InternalTestFailureException("");
    }
}
