Convert byte array to array segments of a certain length - c#

I have a byte array and I would like to return sequential chunks (in the form of new byte arrays) of a certain size.
I tried:
byte[] originalArray = BYTE_ARRAY; // placeholder for the source data
var segment = new ArraySegment<byte>(originalArray, 0, 640);
byte[] newArray = new byte[640];
for (int i = 0; i < segment.Count; i++)
{
newArray[i] = segment.Array[segment.Offset + i];
}
Obviously this only creates an array of the first 640 bytes from the original array. Ultimately, I want a loop that goes through the first 640 bytes and returns an array of those bytes, then goes through the NEXT 640 bytes and returns an array of THOSE bytes. The purpose of this is to send messages to a server, and each message must contain 640 bytes. I cannot guarantee that the original array length is divisible by 640.
Thanks

If speed isn't a concern:
var bytes = new byte[640 * 6];
for (var i = 0; i < bytes.Length; i += 640)
{
var chunk = bytes.Skip(i).Take(640).ToArray(); // needs using System.Linq; the last chunk may be shorter
...
}
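Alternatively, you could use the Span<T>.Slice method or the Buffer.BlockCopy(Array, Int32, Array, Int32, Int32) method.
Span
Span<byte> bytes = arr; // implicit conversion from T[] to Span<T>
...
var slicedBytes = bytes.Slice(i, Math.Min(640, bytes.Length - i)); // a view over arr; nothing is copied
BlockCopy
Note this will probably be the fastest of the three if you need a new array (Slice itself does not copy; it only returns a view over the original):
var chunk = new byte[Math.Min(640, bytes.Length - i)];
Buffer.BlockCopy(bytes, i, chunk, 0, chunk.Length);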
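Putting the BlockCopy fragment into a complete loop might look like the minimal sketch below. BlockCopyChunker and Split are hypothetical names. The Math.Min call is there because the question says the length may not be a multiple of 640, and asking Buffer.BlockCopy for a full 640 bytes from the last offset would throw an ArgumentException.

```csharp
using System;
using System.Collections.Generic;

public static class BlockCopyChunker
{
    // Copies `source` into new arrays of `chunkSize` bytes; the last array holds
    // whatever is left over, so it may be shorter than chunkSize.
    public static List<byte[]> Split(byte[] source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var chunks = new List<byte[]>();
        for (int i = 0; i < source.Length; i += chunkSize)
        {
            int count = Math.Min(chunkSize, source.Length - i);
            var chunk = new byte[count];
            Buffer.BlockCopy(source, i, chunk, 0, count);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

For a 1500-byte input and a chunk size of 640 this gives three arrays of 640, 640 and 220 bytes.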

If you truly want to make new arrays from each 640-byte chunk, then you're looking for .Skip and .Take.
Here's a working example (and a repl of the example) that I hacked together.
using System;
using System.Linq;
using System.Text;
using System.Collections;
using System.Collections.Generic;
class MainClass {
public static void Main (string[] args) {
// mock up a byte array from something
var seedString = String.Join("", Enumerable.Range(0, 1024).Select(x => x.ToString()));
var byteArrayInput = Encoding.ASCII.GetBytes(seedString);
var skip = 0;
var take = 640;
var total = byteArrayInput.Length;
var output = new List<byte[]>();
while (skip < total) { // Take() simply returns fewer bytes for the last, shorter chunk
output.Add(byteArrayInput.Skip(skip).Take(take).ToArray());
skip += take;
}
output.ForEach(c => Console.WriteLine($"chunk: {BitConverter.ToString(c)}"));
}
}
It's probably better to use ArraySegment properly, unless this is an assignment to learn the LINQ extension methods.
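Using ArraySegment for this might look like the following minimal sketch. SegmentChunker, Segments and Copy are hypothetical names. Each segment is a view over the original array, so nothing is copied until you call Copy, which you would only do if the sending API needs a separate byte[].

```csharp
using System;
using System.Collections.Generic;

public static class SegmentChunker
{
    // Yields views over `source`; no bytes are copied here.
    public static IEnumerable<ArraySegment<byte>> Segments(byte[] source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        for (int offset = 0; offset < source.Length; offset += size)
        {
            // The final segment covers only the remaining bytes.
            yield return new ArraySegment<byte>(source, offset, Math.Min(size, source.Length - offset));
        }
    }

    // Copies one segment into a new array, for APIs that only accept byte[].
    public static byte[] Copy(ArraySegment<byte> segment)
    {
        var copy = new byte[segment.Count];
        Array.Copy(segment.Array, segment.Offset, copy, 0, segment.Count);
        return copy;
    }
}
```

Because Segments is an iterator, the argument check runs when enumeration starts, not at the call site.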

You can write a generic helper method like this:
public static IEnumerable<T[]> AsBatches<T>(T[] input, int n)
{
for (int i = 0, r = input.Length; r >= n; r -= n, i += n)
{
var result = new T[n];
Array.Copy(input, i, result, 0, n);
yield return result;
}
}
Then you can use it in a foreach loop:
byte[] byteArray = new byte[123456];
foreach (var batch in AsBatches(byteArray, 640))
{
Console.WriteLine(batch.Length); // Do something with the batch.
}
Or if you want a list of batches just do this:
List<byte[]> listOfBatches = AsBatches(byteArray, 640).ToList();
If you want to get fancy you could make it an extension method, but this is only recommended if you will be using it a lot (don't make an extension method for something you'll only be calling in one place!).
Here I've changed the name to InChunksOf() to make it more readable:
public static class ArrayExt
{
public static IEnumerable<T[]> InChunksOf<T>(this T[] input, int n)
{
for (int i = 0, r = input.Length; r >= n; r -= n, i += n)
{
var result = new T[n];
Array.Copy(input, i, result, 0, n);
yield return result;
}
}
}
Which you could use like this:
byte[] byteArray = new byte[123456];
// ... initialise byteArray[], then:
var listOfChunks = byteArray.InChunksOf(640).ToList();
[EDIT] Corrected loop terminator from r > n to r >= n.
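Note that with r >= n the loop stops once fewer than n elements remain, so for byteArray = new byte[123456] the last 576 bytes (123456 - 192 * 640) are never returned. Since the question says the length may not be divisible by 640, a variant that also yields the shorter final chunk might look like this sketch (ArrayChunkExt and InChunksOfIncludingRemainder are hypothetical names). On .NET 6 and later, the built-in Enumerable.Chunk method behaves similarly for any IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayChunkExt
{
    // Like InChunksOf, but the trailing partial chunk is yielded instead of dropped.
    public static IEnumerable<T[]> InChunksOfIncludingRemainder<T>(this T[] input, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        for (int i = 0; i < input.Length; i += n)
        {
            var result = new T[Math.Min(n, input.Length - i)];
            Array.Copy(input, i, result, 0, result.Length);
            yield return result;
        }
    }
}
```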

Related

random binary of one million bit file size

I want to generate a random binary string of one million bits, but the code takes too much time and never finishes. Why does that happen?
string result1 = "";
Random rand = new Random();
for (int i = 0; i < 1000000; i++)
{
result1 += ((rand.Next() % 2 == 0) ? "0" : "1");
}
textBox1.Text = result1.ToString();
Concatenating strings is an O(N) operation. Strings are immutable, so when you add to a string the new value is copied into a new string, which requires iterating the previous string. Since you're adding a value for each iteration, the amount that has to be read each time grows with each addition, leading to a performance of O(N^2). Since your N is 1,000,000 this takes a very, very long time, and probably is eating all of the memory you have storing these intermediary throw-away strings.
The normal solution when building a string from an arbitrary number of inputs is to use a StringBuilder instead. That said, a 1,000,000-character bit string is still unwieldy. Assuming a bit string is what you want/need, you can change your code to something like the following for a much more performant solution.
public string GetGiantBitString() {
var sb = new StringBuilder();
var rand = new Random();
for(var i = 0; i < 1_000_000; i++) {
sb.Append(rand.Next() % 2);
}
return sb.ToString();
}
This works for me; it takes about 0.035 seconds on my box:
private static IEnumerable<Byte> MillionBits()
{
var rand = new RNGCryptoServiceProvider();
//a million bits is 125,000 bytes, so
var bytes = new List<byte>(125000);
for (var i = 0; i < 125; ++i)
{
byte[] tempBytes = new byte[1000];
rand.GetBytes(tempBytes);
bytes.AddRange(tempBytes);
}
return bytes;
}
private static string BytesAsString(IEnumerable<Byte> bytes)
{
var buffer = new StringBuilder();
foreach (var byt in bytes)
{
buffer.Append(Convert.ToString(byt, 2).PadLeft(8, '0'));
}
return buffer.ToString();
}
and then:
var myStopWatch = new Stopwatch();
myStopWatch.Start();
var lotsOfBytes = MillionBits();
var bigString = BytesAsString(lotsOfBytes);
var len = bigString.Length;
var elapsed = myStopWatch.Elapsed;
The len variable was 1,000,000, and the string looked like it was all 1s and 0s.
If you really want to fill your textbox full of ones and zeros, just set its Text property to bigString.
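A variant of the same idea that skips the intermediate List<byte> and the per-byte strings from Convert.ToString/PadLeft might look like this minimal sketch (BitStrings, BytesToBitString and MillionRandomBits are hypothetical names). It fills a char[] of the exact final length and builds the string once. It uses RandomNumberGenerator.Create(), because RNGCryptoServiceProvider is marked obsolete starting with .NET 6.

```csharp
using System;
using System.Security.Cryptography;

public static class BitStrings
{
    // Writes each byte as 8 '0'/'1' characters, most significant bit first
    // (the same order as Convert.ToString(b, 2).PadLeft(8, '0')).
    public static string BytesToBitString(byte[] bytes)
    {
        var chars = new char[bytes.Length * 8];
        for (int i = 0; i < bytes.Length; i++)
        {
            for (int bit = 0; bit < 8; bit++)
            {
                chars[i * 8 + bit] = ((bytes[i] >> (7 - bit)) & 1) == 1 ? '1' : '0';
            }
        }
        return new string(chars);
    }

    // 1,000,000 bits = 125,000 bytes.
    public static string MillionRandomBits()
    {
        var bytes = new byte[125_000];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return BytesToBitString(bytes);
    }
}
```

textBox1.Text = BitStrings.MillionRandomBits(); would then fill the text box as in the question.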

How to convert int[] to short[]?

int[] iBuf = new int[2];
iBuf[0] = 1;
iBuf[1] = 2;
short[] sBuf = new short[2];
Buffer.BlockCopy(iBuf, 0, sBuf, 0, 2);
result
iBuf[0] = 1
sBuf[0] = 1
iBuf[1] = 2
sBuf[1] = 0
My desired result
iBuf[0] = 1
sBuf[0] = 1
iBuf[1] = 2
sBuf[1] = 2
The result is different from what I want.
Is there a way to convert without using loops?
You can use the Array.ConvertAll method.
Example:
int[] iBuf = new int[2];
...
short[] sBuf = Array.ConvertAll(iBuf, input => (short) input);
This method takes an input array and a converter and the result will be your desired array.
Edit:
An even shorter version would be to use the existing Convert.ToInt16 method inside ConvertAll:
int[] iBuf = new int[5];
short[] sBuf = Array.ConvertAll(iBuf, Convert.ToInt16);
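The two versions are not equivalent for values outside the range of short: in the default unchecked context a cast silently keeps the low 16 bits, while Convert.ToInt16 throws an OverflowException. A small sketch to illustrate (IntToShortDemo, ByCast and ByConvert are hypothetical names):

```csharp
using System;

public static class IntToShortDemo
{
    // A plain cast keeps the low 16 bits, so 70000 becomes 4464.
    public static short[] ByCast(int[] values) => Array.ConvertAll(values, v => (short)v);

    // Convert.ToInt16 throws OverflowException for values outside short's range.
    public static short[] ByConvert(int[] values) => Array.ConvertAll(values, Convert.ToInt16);
}
```

Pick the cast if wrapping is acceptable, or Convert.ToInt16 if an out-of-range value should be treated as an error.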
So, how does ConvertAll work? Let's have a look at the implementation:
public static TOutput[] ConvertAll<TInput, TOutput>(TInput[] array, Converter<TInput, TOutput> converter)
{
if (array == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
}
if (converter == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
}
Contract.Ensures(Contract.Result<TOutput[]>() != null);
Contract.Ensures(Contract.Result<TOutput[]>().Length == array.Length);
Contract.EndContractBlock();
TOutput[] newArray = new TOutput[array.Length];
for (int i = 0; i < array.Length; i++)
{
newArray[i] = converter(array[i]);
}
return newArray;
}
To answer the actual question... no, at some point there will be a loop involved to convert all values. You can either program it yourself or use already built methods.
int is 32 bits long and short is 16 bits long, so that way of copying data won't work correctly.
A more general way would be to create a method to convert ints to shorts:
public IEnumerable<short> IntToShort(IEnumerable<int> iBuf)
{
foreach (var i in iBuf)
{
yield return (short)i;
}
}
and then use it:
int[] iBuf = new int[2];
iBuf[0] = 1;
iBuf[1] = 2;
short[] sBuf = IntToShort(iBuf).ToArray();
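To see why the Buffer.BlockCopy call in the question produced sBuf[1] = 0: its last argument is a byte count, not an element count, so 2 copies only the first two bytes of iBuf[0]. The sketch below (BlockCopyDemo and CopyRawBytes are hypothetical names) copies all 8 bytes of a two-element int[]. On a little-endian machine the shorts come out as 1, 0, 2, 0, i.e. each int becomes its low half followed by its high half, which is why an element-by-element conversion is needed.

```csharp
using System;

public static class BlockCopyDemo
{
    // Buffer.BlockCopy counts BYTES and copies them verbatim.
    // Each int is 4 bytes, so 2 ints means 8 bytes, which fill 4 shorts.
    public static short[] CopyRawBytes(int[] source)
    {
        var dest = new short[source.Length * 2];
        Buffer.BlockCopy(source, 0, dest, 0, source.Length * sizeof(int));
        return dest;
    }
}
```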

Replace bit in BitArray c#

I am using this to convert a file into a BitArray:
public static byte[] GetBinaryFile(string filename)
{
byte[] bytes;
using (FileStream file = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
bytes = new byte[file.Length];
file.Read(bytes, 0, (int)file.Length);
}
return bytes;
}
var x = GetBinaryFile(@"path");
BitArray bits = new BitArray(x);
How do I replace a pattern of Bit in a BitArray?
You can use the Set method to set a specific bit in the BitArray:
bits.Set(index, value);
value is a bool; true and false correspond to 1 and 0 in your BitArray.
For example, to set the 10th bit to 1, use:
bits.Set(9, true);
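When you later write the bits back to a file, it helps to know how indices map onto bytes. In this minimal sketch (BitArrayDemo and SetAndPack are hypothetical names), new BitArray(byte[]) places the least significant bit of the first byte at index 0, and BitArray.CopyTo packs the bits back into a byte[], so setting index 9 on two zero bytes turns the second byte into 0x02.

```csharp
using System;
using System.Collections;

public static class BitArrayDemo
{
    // Sets one bit and returns the bytes it maps back to.
    // Index 0 is the least significant bit of bytes[0]; index 9 is bit 1 of bytes[1].
    public static byte[] SetAndPack(byte[] bytes, int index, bool value)
    {
        var bits = new BitArray(bytes);
        bits.Set(index, value);
        var packed = new byte[bytes.Length];
        bits.CopyTo(packed, 0); // BitArray.CopyTo accepts byte[], int[] and bool[]
        return packed;
    }
}
```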
The code below should work, using a basic two-pass algorithm to find match locations and then perform the replacements.
Note that for 10MB files, it takes roughly 10 seconds on my semi-decent laptop. If you want it to go faster, you can implement it using byte arrays and masks instead of the clunky and not-so-powerful BitArray abstraction.
Even better, you could use unsafe code, where you can make use of pointers and much faster copying... But as it's a C# question and you're already using the BitArray abstraction, I thought I'd show you how it can be achieved as is.
private static BitArray Replace(BitArray input, BitArray pattern, BitArray replacement)
{
var replacementPositions = GetReplacementPositions(input, pattern);
return PerformReplacements(input, pattern.Length, replacement, replacementPositions);
}
private static List<int> GetReplacementPositions(BitArray input, BitArray pattern)
{
if (pattern.Length == 0) throw new ArgumentException("Pattern cannot have 0 length", nameof(pattern));
var matchIndicies = new List<int>();
var maxCheckIndex = input.Length - pattern.Length;
var i = 0;
while (i <= maxCheckIndex)
{
if (MatchesAt(input, pattern, i))
{
matchIndicies.Add(i);
i += pattern.Length;
continue;
}
i++;
}
return matchIndicies;
}
private static bool MatchesAt(BitArray input, BitArray pattern, int index)
{
for (var j = 0; j < pattern.Length; j++)
{
if (input[index + j] != pattern[j]) return false;
}
return true;
}
private static BitArray PerformReplacements(BitArray input, int patternLength, BitArray replacement, List<int> replacementPositions)
{
var outLength = input.Length + replacementPositions.Count * (replacement.Length - patternLength);
var output = new BitArray(outLength);
var currentReadIndex = 0;
var currentWriteIndex = 0;
foreach (var matchPosition in replacementPositions)
{
var inputSubstringLength = matchPosition - currentReadIndex;
CopyFromTo(input, output, currentReadIndex, inputSubstringLength, currentWriteIndex);
currentReadIndex = matchPosition + patternLength;
currentWriteIndex += inputSubstringLength;
CopyFromTo(replacement, output, 0, replacement.Length, currentWriteIndex);
currentWriteIndex += replacement.Length;
}
CopyFromTo(input, output, currentReadIndex, input.Length - currentReadIndex, currentWriteIndex);
return output;
}
private static void CopyFromTo(BitArray from, BitArray to, int fromIndex, int fromLength, int toIndex)
{
for (var i = 0; i < fromLength; i++)
{
to.Set(toIndex + i, from.Get(fromIndex + i));
}
}

Intersect and Union in byte array of 2 files

I have 2 files.
The first is the source file and the second is the destination file.
Below is my code to compute the intersection and union of the two files using byte arrays.
FileStream frsrc = new FileStream("Src.bin", FileMode.Open);
FileStream frdes = new FileStream("Des.bin", FileMode.Open);
int length = 24; // get file length
byte[] src = new byte[length];
byte[] des = new byte[length]; // create buffer
int Counter = 0; // actual number of bytes read
int subcount = 0;
while (frsrc.Read(src, 0, length) > 0)
{
try
{
Counter = 0;
frdes.Position = subcount * length;
while (frdes.Read(des, 0, length) > 0)
{
var data = src.Intersect(des);
var data1 = src.Union(des);
Counter++;
}
subcount++;
Console.WriteLine(subcount.ToString());
}
catch (Exception ex)
{
}
}
It works fine and is very fast.
But now the problem is that I want the counts, and when I use the code below it becomes very slow.
var data = src.Intersect(des).Count();
var data1 = src.Union(des).Count();
So, is there any solution for that?
If yes, then please let me know as soon as possible.
Thanks
Intersect and Union are not the fastest operations. The reason you see it being fast is that you never actually enumerate the results!
Both return an enumerable, not the actual results of the operation. You're supposed to go through that and enumerate the enumerable, otherwise nothing happens - this is called "deferred execution". Now, when you do Count, you actually enumerate the enumerable, and incur the full cost of the Intersect and Union - believe me, the Count itself is relatively trivial (though still an O(n) operation!).
You'll need to make your own methods, most likely. You want to avoid the enumerable overhead, and more importantly, you'll probably want a lookup table.
A few points: the comment // get file length is misleading as it is the buffer size. Counter is not the number of bytes read, it is the number of blocks read. data and data1 will end up with the result of the last block read, ignoring any data before them. That is assuming that nothing goes wrong in the while loop - you need to remove the try structure to see if there are any errors.
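A quick way to see the deferred execution described above is to count how many times the second sequence is actually read. In this sketch (DeferredDemo and Run are hypothetical names), creating the Intersect query reads nothing; only Count() makes it read the whole second array.

```csharp
using System.Linq;

public static class DeferredDemo
{
    // Returns how many elements of `des` were read before and after Count(),
    // plus the count itself.
    public static (int beforeCount, int afterCount, int intersectCount) Run(byte[] src, byte[] des)
    {
        int reads = 0;
        var tracked = des.Select(b => { reads++; return b; });
        var query = src.Intersect(tracked); // nothing is read yet: deferred execution
        int before = reads;
        int count = query.Count();          // enumerating forces the real work
        return (before, reads, count);
    }
}
```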
What you can do is count the number of occurrences of each byte in each file. Then, if the count of a byte is greater than zero in any file, it is a member of the union of the files, and if the count of a byte is greater than zero in all the files, it is a member of the intersection of the files.
It is just as easy to write the code for more than two files as it is for two files, whereas LINQ is easy for two but a little bit more fiddly for more than two. (I put in a comparison with using LINQ in a naïve fashion for only two files at the end.)
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var file1 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\Crysis3.exe"; // 26MB
var file2 = @"C:\Program Files (x86)\Electronic Arts\Crysis 3\Bin32\d3dcompiler_46.dll"; // 3MB
List<string> files = new List<string> { file1, file2 };
var sw = System.Diagnostics.Stopwatch.StartNew();
// Prepare array of counters for the bytes
var nFiles = files.Count;
int[][] count = new int[nFiles][];
for (int i = 0; i < nFiles; i++)
{
count[i] = new int[256];
}
// Get the counts of bytes in each file
int bufLen = 32768;
byte[] buffer = new byte[bufLen];
int bytesRead;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
using (var sr = new FileStream(files[fileNum], FileMode.Open, FileAccess.Read))
{
bytesRead = bufLen;
while (bytesRead > 0)
{
bytesRead = sr.Read(buffer, 0, bufLen);
for (int i = 0; i < bytesRead; i++)
{
count[fileNum][buffer[i]]++;
}
}
}
}
// Find which bytes are in any of the files or in all the files
var inAny = new List<byte>(); // union
var inAll = new List<byte>(); // intersect
for (int i = 0; i < 256; i++)
{
Boolean all = true;
for (int fileNum = 0; fileNum < nFiles; fileNum++)
{
if (count[fileNum][i] > 0)
{
if (!inAny.Contains((byte)i)) // avoid adding same value more than once
{
inAny.Add((byte)i);
}
}
else
{
all = false;
}
};
if (all)
{
inAll.Add((byte)i);
};
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
// Display the results
Console.WriteLine("Union: " + string.Join(",", inAny.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + string.Join(",", inAll.Select(x => x.ToString("X2"))));
Console.WriteLine();
// Compare to using LINQ.
// N/B. Will need adjustments for more than two files.
var srcBytes1 = File.ReadAllBytes(file1);
var srcBytes2 = File.ReadAllBytes(file2);
sw.Restart();
var intersect = srcBytes1.Intersect(srcBytes2).ToArray().OrderBy(x => x);
var union = srcBytes1.Union(srcBytes2).ToArray().OrderBy(x => x);
Console.WriteLine(sw.ElapsedMilliseconds);
Console.WriteLine("Union: " + String.Join(",", union.Select(x => x.ToString("X2"))));
Console.WriteLine();
Console.WriteLine("Intersect: " + String.Join(",", intersect.Select(x => x.ToString("X2"))));
Console.ReadLine();
}
}
}
The counting-the-byte-occurrences method is roughly five times faster than the LINQ method on my computer, even though the LINQ timing excludes loading the files, over a range of file sizes (a few KB to a few MB).
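The same lookup-table idea also works per block, inside a loop like the one in the question. A minimal sketch (ByteSetCounts and Count are hypothetical names): it returns the same numbers as src.Intersect(des).Count() and src.Union(des).Count(), i.e. counts of distinct byte values. The length parameters are meant to receive the value returned by Read, so a short final block is not compared against stale bytes left over from the previous read.

```csharp
using System;

public static class ByteSetCounts
{
    // Distinct byte values present in both buffers (intersect) and in either (union),
    // using two 256-entry lookup tables instead of LINQ's hash sets.
    public static (int intersect, int union) Count(byte[] a, int aLength, byte[] b, int bLength)
    {
        var inA = new bool[256];
        var inB = new bool[256];
        for (int i = 0; i < aLength; i++) inA[a[i]] = true;
        for (int i = 0; i < bLength; i++) inB[b[i]] = true;

        int intersect = 0, union = 0;
        for (int v = 0; v < 256; v++)
        {
            if (inA[v] && inB[v]) intersect++;
            if (inA[v] || inB[v]) union++;
        }
        return (intersect, union);
    }
}
```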

Optimization - Encode a string and get hexadecimal representation of 3 bytes

I am currently working in an environment where performance is critical, and this is what I am doing:
var iso_8859_5 = System.Text.Encoding.GetEncoding("iso-8859-5");
var dataToSend = iso_8859_5.GetBytes(message);
Then I need to group the bytes by 3, so I have a for loop that does this (i being the loop index):
byte[] dataByteArray = { dataToSend[i], dataToSend[i + 1], dataToSend[i + 2], 0 };
I then get an integer out of these 4 bytes
BitConverter.ToUInt32(dataByteArray, 0)
and finally the integer is converted to a hexadecimal string that I can place in a network packet.
The last two steps repeat about 150 times.
I am currently at 50 milliseconds of execution time, and ideally I would like to get it close to 0. Is there a faster way to do this that I am not aware of?
UPDATE
Just tried
string hex = BitConverter.ToString(dataByteArray);
hex.Replace("-", "")
to get the hex string directly but it is 3 times slower
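One thing that may help in a loop like this is avoiding the temporary 4-byte array and the separate formatting call for every group. The minimal sketch below assumes the hex string is the integer written as 8 upper-case digits (what value.ToString("X8") gives), since the exact format is not shown; TripletHex, Pack3 and ToHexGroups are hypothetical names. Pack3 computes the same value that BitConverter.ToUInt32 returns for { b0, b1, b2, 0 } on a little-endian machine.

```csharp
using System;
using System.Text;

public static class TripletHex
{
    private const string Digits = "0123456789ABCDEF";

    // b0 | b1 << 8 | b2 << 16: same as BitConverter.ToUInt32(new byte[] { b0, b1, b2, 0 }, 0)
    // on little-endian hardware, without allocating the temporary array.
    public static uint Pack3(byte[] data, int i) =>
        (uint)(data[i] | (data[i + 1] << 8) | (data[i + 2] << 16));

    // Appends each full 3-byte group as 8 hex digits into one StringBuilder.
    // Leftover bytes (data.Length % 3) are ignored in this sketch.
    public static string ToHexGroups(byte[] data)
    {
        var sb = new StringBuilder((data.Length / 3) * 8);
        for (int i = 0; i + 3 <= data.Length; i += 3)
        {
            uint v = Pack3(data, i);
            for (int shift = 28; shift >= 0; shift -= 4)
            {
                sb.Append(Digits[(int)((v >> shift) & 0xFu)]);
            }
        }
        return sb.ToString();
    }
}
```

Whether this beats the current code in your environment is something to measure rather than assume.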
Ricardo Silva's answer adapted
public byte[][] GetArrays(byte[] fullMessage, int size)
{
var returnArrays = new byte[(fullMessage.Length / size)+1][];
int i, j;
for (i = 0, j = 0; i < (fullMessage.Length - 2); i += size, j++)
{
returnArrays[j] = new byte[size + 1];
Buffer.BlockCopy(
src: fullMessage,
srcOffset: i,
dst: returnArrays[j],
dstOffset: 0,
count: size);
returnArrays[j][returnArrays[j].Length - 1] = 0x00;
}
switch (fullMessage.Length - i) // bytes left over after the last full group (also avoids % 0 for an empty message)
{
case 0: {
returnArrays[j] = new byte[] { 0, 0, EOT, 0 };
} break;
case 1: {
returnArrays[j] = new byte[] { fullMessage[i], 0, EOT, 0 };
} break;
case 2: {
returnArrays[j] = new byte[] { fullMessage[i], fullMessage[i + 1], EOT, 0 };
} break;
}
return returnArrays;
}
After the line below you will have the whole message as a byte array:
var dataToSend = iso_8859_5.GetBytes(message);
My suggestion is to work with Buffer.BlockCopy and test whether it is faster than your current method.
Try the code below and tell us if it is faster than your current code:
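public static byte[][] GetArrays(byte[] fullMessage, int size)
{
// Round up so a shorter final group still gets its own array.
var returnArrays = new byte[(fullMessage.Length + size - 1) / size][];
for(int i = 0, j = 0; i < fullMessage.Length; i += size, j++)
{
returnArrays[j] = new byte[size + 1];
Buffer.BlockCopy(
src: fullMessage,
srcOffset: i,
dst: returnArrays[j],
dstOffset: 0,
count: Math.Min(size, fullMessage.Length - i)); // don't read past the end of the message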
returnArrays[j][returnArrays[j].Length - 1] = 0x00;
}
return returnArrays;
}
EDIT 1: I ran the test below and the output was 245900ns (or 0.2459ms).
[TestClass()]
public class Form1Tests
{
[TestMethod()]
public void GetArraysTest()
{
var expected = new byte[] { 0x30, 0x31, 0x32, 0x00 };
var size = 3;
var stopWatch = new Stopwatch();
stopWatch.Start();
var iso_8859_5 = System.Text.Encoding.GetEncoding("iso-8859-5");
var target = iso_8859_5.GetBytes("012");
var arrays = Form1.GetArrays(target, size);
BitConverter.ToUInt32(arrays[0], 0);
stopWatch.Stop();
foreach(var array in arrays)
{
for(int i = 0; i < expected.Count(); i++)
{
Assert.AreEqual(expected[i], array[i]);
}
}
Console.WriteLine(string.Format("{0}ns", stopWatch.Elapsed.TotalMilliseconds * 1000000));
}
}
EDIT 2
I looked at your code and I have only one suggestion. I understand that you need to add an EOT marker and that the length of the input array will not always be a multiple of the size you want to split by.
BUT your code now has TWO responsibilities, which breaks the S of the SOLID principles.
The S stands for Single Responsibility: each method has ONE, and only ONE, responsibility.
The code you posted has TWO responsibilities (splitting the input array into N smaller arrays, and adding EOT). Try to think of a way to create two completely independent methods (one to split an array into N other arrays, and another to append EOT to any array you pass it). This will allow you to write unit tests for each method (and guarantee that they work and won't be broken by future changes), and to call both methods from the class that does the system integration.
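A minimal sketch of that separation, under assumptions: MessageFraming, Split and AppendTerminator are hypothetical names, and the terminator byte is passed in by the caller (0x04 is ASCII EOT, used in the tests only as an example value). Split knows nothing about terminators and AppendTerminator knows nothing about splitting, so each can be unit tested on its own. It does not reproduce the exact padding layout of the adapted code above.

```csharp
using System;

public static class MessageFraming
{
    // Splits input into consecutive chunks of at most `size` bytes; the last chunk may be shorter.
    public static byte[][] Split(byte[] input, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var chunks = new byte[(input.Length + size - 1) / size][];
        for (int i = 0, j = 0; i < input.Length; i += size, j++)
        {
            chunks[j] = new byte[Math.Min(size, input.Length - i)];
            Buffer.BlockCopy(input, i, chunks[j], 0, chunks[j].Length);
        }
        return chunks;
    }

    // Returns a copy of `data` with `terminator` appended.
    public static byte[] AppendTerminator(byte[] data, byte terminator)
    {
        var result = new byte[data.Length + 1];
        Buffer.BlockCopy(data, 0, result, 0, data.Length);
        result[data.Length] = terminator;
        return result;
    }
}
```

A caller would split first, then pass the last chunk to AppendTerminator.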
