I was trying to write a try/catch around Convert.FromBase64String() when I found out that there is already a TryFromBase64String() method. But it takes three arguments:
public static bool TryFromBase64String(string s, Span<byte> bytes, out int bytesWritten);
So how can I use Span<byte> bytes there?
I only found this in the docs, but without a proper description. Maybe this is too obvious.
https://learn.microsoft.com/en-us/dotnet/api/system.convert.tryfrombase64string?view=netcore-2.1
Thanks to @Damien_The_Unbeliever and THIS article I found out more about Span. So...
Span is used to save memory and to avoid triggering the GC so often. It can wrap an array or a portion of an array, but I still can't figure out how to use it in that method.
As written in the linked questions, System.Span<T> is a new C# 7.2 feature (and Convert.TryFromBase64String is a newer .NET Core API).
To use System.Span<T> you have to install a NuGet package:
Install-Package System.Memory
Then to use it:
byte[] buffer = new byte[((b64string.Length * 3) + 3) / 4 -
    (b64string.Length > 0 && b64string[b64string.Length - 1] == '=' ?
        b64string.Length > 1 && b64string[b64string.Length - 2] == '=' ?
            2 : 1 : 0)];
int written;
bool success = Convert.TryFromBase64String(b64string, buffer, out written);
Here b64string is your base-64 string. The over-complicated size expression gives the exact length of the decoded data, based on the length and padding of b64string.
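If it helps readability, the same calculation can be pulled out into a small helper; this is just a sketch and the name GetDecodedLength is mine:

static int GetDecodedLength(string b64)
{
    // exact decoded byte count for a properly padded base-64 string
    int padding = b64.EndsWith("==") ? 2 : b64.EndsWith("=") ? 1 : 0;
    return ((b64.Length * 3) + 3) / 4 - padding;
}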
You could use it like this, making use of all the TryFromBase64String arguments:
public string DecodeUtf8Base64(string input)
{
    var bytes = new Span<byte>(new byte[256]); // 256 is arbitrary
    if (!Convert.TryFromBase64String(input, bytes, out var bytesWritten))
    {
        throw new InvalidOperationException("The input is not a valid base64 string");
    }

    return Encoding.UTF8.GetString(bytes.Slice(0, bytesWritten));
}
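A quick round-trip check of how it might be called (the sample text is arbitrary):

string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes("héllo wörld"));
Console.WriteLine(DecodeUtf8Base64(encoded)); // prints: héllo wörld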
Here's another approach, using ArrayPool, if you need the buffer only temporarily:
// Minimum length that is sure to fit all the data.
// We don't need to be 100% accurate here,
// because ArrayPool might return a larger buffer anyway.
var length = ((value.Length * 3) + 3) / 4;
var buffer = ArrayPool<byte>.Shared.Rent(length);
try
{
    // (buffer is implicitly cast to Span<byte>)
    if (Convert.TryFromBase64String(value, buffer, out var bytesWritten))
    {
        // do something with it...
        return Encoding.UTF8.GetString(buffer, 0, bytesWritten);
    }

    throw new FormatException("Invalid base-64 sequence.");
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
I used it like this:
string base64String = "somebase64";
Span<byte> bytesBuffer = stackalloc byte[base64String.Length];
if (!Convert.TryFromBase64String(base64String, bytesBuffer, out int bytesWritten))
{
    return false;
}
ReadOnlySpan<byte> actualBytes = bytesBuffer[..bytesWritten];
UPDATE:
A more precise way to count the expected bytes:
const int bitsEncodedPerChar = 6;
int bytesExpected = (base64String.Length * bitsEncodedPerChar) >> 3; // divide by 8 bits in a byte
see https://en.wikipedia.org/wiki/Base64#Base64_table_from_RFC_4648
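Putting the update together with the snippet above, the stack buffer can be sized from that estimate instead of the raw string length (a sketch; the actual number of bytes written can still be smaller because of padding):

const int bitsEncodedPerChar = 6;
int bytesExpected = (base64String.Length * bitsEncodedPerChar) >> 3; // length * 6 / 8
Span<byte> bytesBuffer = stackalloc byte[bytesExpected];
if (!Convert.TryFromBase64String(base64String, bytesBuffer, out int bytesWritten))
{
    return false;
}
ReadOnlySpan<byte> actualBytes = bytesBuffer[..bytesWritten];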
I'm passing an int[] array that holds an image; later I want to convert it to a byte[] and save the image to a local path. However, I notice that bytePic[] ends up the same length as the int[] arrPic, but the values are missing.
Below is the entire function:
public string ChangeMaterialPicture(int[] arrPic, int materialId, string defaultPath)
{
    var material = _warehouseRepository.GetMaterialById(materialId);
    if (material is not null)
    {
        // Convert the Array to Bytes
        byte[] bytePic = new byte[arrPic.Length];
        for (var i = 0; i < arrPic.Length; i++)
        {
            AddByteToArray(bytePic, Convert.ToByte(arrPic[i]));
        }

        // Convert the Bytes to IMG
        string filename = Guid.NewGuid().ToString() + "_.png";
        System.IO.File.WriteAllBytes($@"{defaultPath}\materials\{material.VendorId}\{filename}", bytePic);

        // Update the Image
        material.Picture = filename;
        _warehouseRepository.UpdateMaterial(material);

        return material.Picture;
    }
    else
    {
        return String.Empty;
    }
}

public byte[] AddByteToArray(byte[] bArray, byte newByte)
{
    byte[] newArray = new byte[bArray.Length + 1];
    bArray.CopyTo(newArray, 1);
    newArray[0] = newByte;
    return newArray;
}
You are creating the new array newArray in AddByteToArray and returning it. But at the call site you are never using this returned value, and the bytePic array remains unchanged.
The code in AddByteToArray makes no sense. Why create a new array when the intention was to insert one byte into an existing array? What you need to do is to cast the int into byte. Simply write:
byte[] bytePic = new byte[arrPic.Length];
for (int i = 0; i < arrPic.Length; i++)
{
bytePic[i] = (byte)arrPic[i];
}
And delete the method AddByteToArray.
This assumes that every value in the int array is in the range 0 to 255 and therefore fits into one byte.
There are different ways to do this. With LINQ you could also write:
byte[] bytePic = arrPic.Select(i => (byte)i).ToArray();
I would assume your original array uses an int to represent a full RGBA pixel, since 32-bit-per-pixel mono images are very rare in my experience. And if you do have such an image, you probably want to be more intelligent about how you do this conversion. The only time just casting int to byte would be a good idea is if you are sure only the lower 8 bits are used, but if that is the case, why are you using an int array in the first place?
If you actually have RGBA pixels you do not want to convert individual int values to bytes, but rather convert a single int value to 4 bytes. And this is not that difficult to do, you just need to use the right methods. The old-school option is to use Buffer.BlockCopy.
Example:
byte[] bytePic = new byte[arrPic.Length * 4];
Buffer.BlockCopy(arrPic, 0, bytePic, 0, bytePic.Length);
But if your write-method accepts a span you might want to just convert your array to a span and cast this to the right type, avoiding the copy.
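For example, MemoryMarshal.AsBytes (in System.Runtime.InteropServices) can reinterpret the arrPic array from the question as bytes without copying; a sketch, assuming the consumer accepts a ReadOnlySpan<byte>:

using System.Runtime.InteropServices;

// byteView sees the same memory as arrPic, 4 bytes per int, and nothing is copied
ReadOnlySpan<byte> byteView = MemoryMarshal.AsBytes(arrPic.AsSpan());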
I want to split a large array of UTF-8 encoded data, so that decoding it into chars can be parallelized.
It seems that there's no way to find out how many bytes Encoding.GetCharCount reads. I also can't use GetByteCount(GetChars(...)) since it decodes the entire array anyway, which is what I'm trying to avoid.
UTF-8 has well-defined byte sequences and is considered self-synchronizing, meaning that given any byte position you can find where the character containing that byte begins.
The UTF-8 spec (Wikipedia is the easiest link) defines the following byte sequences:
0_______ : ASCII (0-127) char
10______ : Continuation
110_____ : Two-byte character
1110____ : Three-byte character
11110___ : Four-byte character
So, the following method (or something similar) should get your result:
Get the byte count for bytes (e.g. bytes.Length)
Determine how many sections to split into
Select the byte at index byteCount / sectionCount
Test that byte against the table:
If byte & 0x80 == 0x00 then you can make this byte part of either section
If byte & 0xE0 == 0xC0 then you need to seek ahead one byte, and keep it with the current section
If byte & 0xF0 == 0xE0 then you need to seek ahead two bytes, and keep it with the current section
If byte & 0xF8 == 0xF0 then you need to seek ahead three bytes, and keep it with the current section
If byte & 0xC0 == 0x80 then you are in a continuation, and should seek ahead until the first byte that does not satisfy val & 0xC0 == 0x80, then keep everything up to (but not including) that byte in the current section
Select byteStart through byteCount + offset, where offset is determined by the tests above
Repeat for each section.
Of course, if we redefine our test as returning the current char start position, we have two cases:
1. If (byte[i] & 0xC0) == 0x80 then we need to walk backwards through the array
2. Else, return the current i (since it's not a continuation)
This gives us the following method:
public static int GetCharStart(ref byte[] arr, int index) =>
(arr[index] & 0xC0) == 0x80 ? GetCharStart(ref arr, index - 1) : index;
Next, we want to get each section. The easiest way is to use a state-machine (or abuse, depending on how you look at it) to return the sections:
public static IEnumerable<byte[]> GetByteSections(byte[] utf8Array, int sectionCount)
{
    var sectionStart = 0;
    var sectionEnd = 0;

    for (var i = 0; i < sectionCount; i++)
    {
        sectionEnd = i == (sectionCount - 1)
            ? utf8Array.Length
            : GetCharStart(ref utf8Array, (int)Math.Round((double)utf8Array.Length / sectionCount * i));
        yield return GetSection(ref utf8Array, sectionStart, sectionEnd);
        sectionStart = sectionEnd;
    }
}
I built it this way because I want to use Parallel.ForEach to demonstrate the result, which is easy when we have an IEnumerable, and it also lets the processing stay lazy: we only gather sections when they are needed, on demand, which is a good thing, no?
Lastly, we need to be able to get a section of bytes, so we have the GetSection method:
public static byte[] GetSection(ref byte[] array, int start, int end)
{
    var result = new byte[end - start];
    for (var i = 0; i < result.Length; i++)
    {
        result[i] = array[i + start];
    }

    return result;
}
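If you prefer, Array.Copy does the same job without the manual loop (behaviour is identical):

public static byte[] GetSection(ref byte[] array, int start, int end)
{
    var result = new byte[end - start];
    Array.Copy(array, start, result, 0, result.Length); // copy [start, end) into the new array
    return result;
}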
Finally, the demonstration:
var sourceText = "Some test 平仮名, ひらがな string that should be decoded in parallel, this demonstrates that we work flawlessly with Parallel.ForEach. The only downside to using `Parallel.ForEach` the way I demonstrate is that it doesn't take order into account, but oh-well.";
var source = Encoding.UTF8.GetBytes(sourceText);
Console.WriteLine(sourceText);
var results = new ConcurrentBag<string>();
Parallel.ForEach(GetByteSections(source, 10),
new ParallelOptions { MaxDegreeOfParallelism = 1 },
x => { Console.WriteLine(Encoding.UTF8.GetString(x)); results.Add(Encoding.UTF8.GetString(x)); });
Console.WriteLine();
Console.WriteLine("Assemble the result: ");
Console.WriteLine(string.Join("", results.Reverse()));
Console.ReadLine();
The result:
Some test ???, ???? string that should be decoded in parallel, this demonstrates that we work flawlessly with Parallel.ForEach. The only downside to using `Parallel.ForEach` the way I demonstrate is that it doesn't take order into account, but oh-well.
Some test ???, ??
?? string that should b
e decoded in parallel, thi
s demonstrates that we work
flawlessly with Parallel.
ForEach. The only downside
to using `Parallel.ForEach`
the way I demonstrate is
that it doesn't take order into account, but oh-well.
Assemble the result:
Some test ???, ???? string that should be decoded in parallel, this demonstrates that we work flawlessly with Parallel.ForEach. The only downside to using `Parallel.ForEach` the way I demonstrate is that it doesn't take order into account, but oh-well.
Not perfect, but it does the job. If we change MaxDegreeOfParallelism to a higher value, our string gets jumbled:
Some test ???, ??
e decoded in parallel, thi
flawlessly with Parallel.
?? string that should b
to using `Parallel.ForEach`
ForEach. The only downside
that it doesn't take order into account, but oh-well.
s demonstrates that we work
the way I demonstrate is
So, as you can see, super easy. You'll want to make modifications to allow for correct order-reassembly, but this should demonstrate the trick.
If we modify the GetByteSections method as follows, the last section is no longer ~2x the size of the remaining ones:
public static IEnumerable<byte[]> GetByteSections(byte[] utf8Array, int sectionCount)
{
    var sectionStart = 0;
    var sectionEnd = 0;
    var sectionSize = (int)Math.Ceiling((double)utf8Array.Length / sectionCount);

    for (var i = 0; i < sectionCount; i++)
    {
        if (i == (sectionCount - 1))
        {
            var lengthRem = utf8Array.Length - i * sectionSize;
            sectionEnd = GetCharStart(ref utf8Array, i * sectionSize);
            yield return GetSection(ref utf8Array, sectionStart, sectionEnd);
            sectionStart = sectionEnd;
            sectionEnd = utf8Array.Length;
            yield return GetSection(ref utf8Array, sectionStart, sectionEnd);
        }
        else
        {
            sectionEnd = GetCharStart(ref utf8Array, i * sectionSize);
            yield return GetSection(ref utf8Array, sectionStart, sectionEnd);
            sectionStart = sectionEnd;
        }
    }
}
The result:
Some test ???, ???? string that should be decoded in parallel, this demonstrates that we work flawlessly with Parallel.ForEach. The only downside to using `Parallel.ForEach` the way I demonstrate is that it doesn't take order into account, but oh-well. We can continue to increase the length of this string to demonstrate that the last section is usually about double the size of the other sections, we could fix that if we really wanted to. In fact, with a small modification it does so, we just have to remember that we'll end up with `sectionCount + 1` results.
Some test ???, ???? string that should be de
coded in parallel, this demonstrates that we work flawless
ly with Parallel.ForEach. The only downside to using `Para
llel.ForEach` the way I demonstrate is that it doesn't tak
e order into account, but oh-well. We can continue to incr
ease the length of this string to demonstrate that the las
t section is usually about double the size of the other se
ctions, we could fix that if we really wanted to. In fact,
with a small modification it does so, we just have to rem
ember that we'll end up with `sectionCount + 1` results.
Assemble the result:
Some test ???, ???? string that should be decoded in parallel, this demonstrates that we work flawlessly with Parallel.ForEach. The only downside to using `Parallel.ForEach` the way I demonstrate is that it doesn't take order into account, but oh-well. We can continue to increase the length of this string to demonstrate that the last section is usually about double the size of the other sections, we could fix that if we really wanted to. In fact, with a small modification it does so, we just have to remember that we'll end up with `sectionCount + 1` results.
And finally, if for some reason you split into an abnormally large number of sections compared to the input size (my input size of ~578 bytes at 250 chars demonstrates this) you'll hit an IndexOutOfRangeException in GetCharStart; the following version fixes that:
public static int GetCharStart(ref byte[] arr, int index)
{
    if (index >= arr.Length)
    {
        index = arr.Length - 1;
    }

    return (arr[index] & 0xC0) == 0x80 ? GetCharStart(ref arr, index - 1) : index;
}
Of course this leaves you with a bunch of empty results, but when you reassemble, the string doesn't change, so I'm not even going to bother posting the full scenario test here. (I leave it up to you to experiment.)
Great answers, Mathieu and Der; adding a Python variant 100% based on your answer, which works great:
def find_utf8_split(data, bytes=None):
    bytes = bytes or len(data)
    while bytes > 0 and data[bytes - 1] & 0xC0 == 0x80:
        bytes -= 1
    if bytes > 0:
        if data[bytes - 1] & 0xE0 == 0xC0: bytes = bytes - 1
        if data[bytes - 1] & 0xF0 == 0xE0: bytes = bytes - 1
        if data[bytes - 1] & 0xF8 == 0xF0: bytes = bytes - 1
    return bytes
This code finds a UTF-8-compatible split point in a given byte string. It does not do the split itself, as that would take more memory; that is left to the rest of the code.
For example you could:
position = find_utf8_split(data)
leftovers = data[position:]
text = data[:position].decode('utf-8')
So in C#, I needed a generator for a random number below a given number, and I found one on Stack Overflow. But near the end it converts the byte array into a BigInteger. I tried doing the same, though I am using the Deveel-Math lib as it allows me to use BigDecimals. I have tried to turn the array into a value, and that into a string, but I keep getting a "Could not find any recognizable digits." error, and as of now I am stumped.
public static BigInteger RandomIntegerBelow1(BigInteger N)
{
    byte[] bytes = N.ToByteArray();
    BigInteger R;
    Random random = new Random();

    do
    {
        random.NextBytes(bytes);
        bytes[bytes.Length - 1] &= (byte)0x7F; //force sign bit to positive
        R = BigInteger.Parse(BytesToStringConverted(bytes));
        //the Param needs a String value, exp: BigInteger.Parse("100")
    } while (R >= N);

    return R;
}

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
Deveel-Math
Wrong string conversion
You are converting your byte array to a string of characters based on UTF encoding. I'm pretty sure this is not what you want.
If you want to convert a byte array to a string that contains a number expressed in decimal, try this answer using BitConverter.
if (BitConverter.IsLittleEndian)
Array.Reverse(array); //need the bytes in the reverse order
int value = BitConverter.ToInt32(array, 0);
This is way easier
On the other hand, I notice that Deveel-Math's BigInteger has a constructor that takes a byte array as input (see line 226). So you should be able to greatly simplify your code by doing this:
R = new Deveel.Math.BigInteger(1, bytes) ;
However, since Deveel.Math appears to be big-endian, you may need to reverse the array first:
System.Array.Reverse(bytes);
R = new Deveel.Math.BigInteger(1, bytes);
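For comparison, the same loop works with the built-in System.Numerics.BigInteger, whose constructor takes a little-endian byte[] directly, so no string round-trip is needed. A sketch (not Deveel-Math):

public static System.Numerics.BigInteger RandomIntegerBelow(System.Numerics.BigInteger n)
{
    byte[] bytes = n.ToByteArray();
    var random = new Random();
    System.Numerics.BigInteger r;
    do
    {
        random.NextBytes(bytes);
        bytes[bytes.Length - 1] &= 0x7F; // clear the sign bit so the value is non-negative
        r = new System.Numerics.BigInteger(bytes);
    } while (r >= n);
    return r;
}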
I'm busy rewriting an old project that was done in C++, to C#.
My task is to rewrite the program so that it functions as close to the original as possible.
During a bunch of file-handling the previous developer who wrote this program creates a structure containing a ton of fields that correspond to the set format that a file has to be written in, so all that work is already done for me.
These fields are all byte arrays. What the C++ code then does is use memset to set this entire structure to all space characters (0x20). One line of code. Easy.
This is very important as the utility that this file eventually goes to is expecting the file in this format. What I've had to do is change this struct to a class in C#, but I cannot find a way to easily initialize each of these byte arrays to all space characters.
What I've ended up having to do is this in the class constructor:
//Initialize all of the variables to spaces.
int index = 0;
foreach (byte b in UserCode)
{
UserCode[index] = 0x20;
index++;
}
This works fine, but I'm sure there must be a simpler way to do this. When the array is set to UserCode = new byte[6] in the constructor, the byte array gets automatically initialized to the default zero values. Is there no way that I can make it become all spaces upon declaration, so that when I call my class' constructor it is initialized straight away like this? Or some memset-like function?
For small arrays use array initialisation syntax:
var sevenItems = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };
For larger arrays use a standard for loop. This is the most readable and efficient way to do it:
var sevenThousandItems = new byte[7000];
for (int i = 0; i < sevenThousandItems.Length; i++)
{
sevenThousandItems[i] = 0x20;
}
Of course, if you need to do this a lot then you could create a helper method to help keep your code concise:
byte[] sevenItems = CreateSpecialByteArray(7);
byte[] sevenThousandItems = CreateSpecialByteArray(7000);
// ...
public static byte[] CreateSpecialByteArray(int length)
{
var arr = new byte[length];
for (int i = 0; i < arr.Length; i++)
{
arr[i] = 0x20;
}
return arr;
}
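On newer runtimes (.NET Core 2.0 and later) there is also Array.Fill, which does this in a single call, so the helper may not even be needed:

var buffer = new byte[7000];
Array.Fill(buffer, (byte)0x20); // fill every element with the space character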
Use this to create the array in the first place:
byte[] array = Enumerable.Repeat((byte)0x20, <number of elements>).ToArray();
Replace <number of elements> with the desired array size.
You can use Enumerable.Repeat()
Enumerable.Repeat generates a sequence that contains one repeated value.
Array of 100 items initialized to 0x20:
byte[] arr1 = Enumerable.Repeat((byte)0x20,100).ToArray();
var array = Encoding.ASCII.GetBytes(new string(' ', 100));
If you need to initialise a small array you can use:
byte[] smallArray = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };
If you have a larger array, then you could use:
byte[] bitBiggerArray = Enumerable.Repeat((byte)0x20, 7000).ToArray();
Which is simple, and easy for the next guy/girl to read. And will be fast enough 99.9% of the time.
(Normally will be the BestOption™)
However if you really really need super speed, calling out to the optimized memset method, using P/invoke, is for you:
(Here wrapped up in a nice to use class)
public static class Superfast
{
    [DllImport("msvcrt.dll",
        EntryPoint = "memset",
        CallingConvention = CallingConvention.Cdecl,
        SetLastError = false)]
    private static extern IntPtr MemSet(IntPtr dest, int c, int count);

    //If you need super speed, calling out to M$ memset optimized method using P/invoke
    public static byte[] InitByteArray(byte fillWith, int size)
    {
        byte[] arrayBytes = new byte[size];
        GCHandle gch = GCHandle.Alloc(arrayBytes, GCHandleType.Pinned);
        MemSet(gch.AddrOfPinnedObject(), fillWith, arrayBytes.Length);
        gch.Free();
        return arrayBytes;
    }
}
Usage:
byte[] oneofManyBigArrays = Superfast.InitByteArray(0x20,700000);
Maybe these could be helpful?
What is the equivalent of memset in C#?
http://techmikael.blogspot.com/2009/12/filling-array-with-default-value.html
The posters before me gave you your answer. I just want to point out your misuse of the foreach loop. Since you have to increment an index, a standard for loop would be not only more compact, but also more efficient (foreach does many things under the hood):
for (int index = 0; index < UserCode.Length; ++index)
{
UserCode[index] = 0x20;
}
This is a faster version of the code from the post marked as the answer.
All of the benchmarks that I have performed show that a simple for loop that only contains something like an array fill is typically twice as fast if it is decrementing versus if it is incrementing.
Also, the array Length property is already passed as the parameter so it doesn't need to be retrieved from the array properties. It should also be pre-calculated and assigned to a local variable.
Loop bounds calculations that involve a property accessor will re-compute the value of the bounds before each iteration of the loop.
public static byte[] CreateSpecialByteArray(int length)
{
byte[] array = new byte[length];
int len = length - 1;
for (int i = len; i >= 0; i--)
{
array[i] = 0x20;
}
return array;
}
Just to expand on my answer, a neater way of doing this multiple times would probably be:
PopulateByteArray(UserCode, 0x20);
which calls:
public static void PopulateByteArray(byte[] byteArray, byte value)
{
for (int i = 0; i < byteArray.Length; i++)
{
byteArray[i] = value;
}
}
This has the advantage of a nice efficient for loop (credit to gwiazdorrr's answer) as well as a neat-looking call if it is being used a lot. And it is a lot more readable at a glance than the enumeration one, I personally think. :)
The fastest way to do this is to call the Win32 API directly:
bR = 0xFF;
RtlFillMemory(pBuffer, nFileLen, bR);
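RtlFillMemory is a native function, so it needs a P/Invoke declaration before C# can call it; a sketch of what that might look like (the export lives in kernel32.dll):

[DllImport("kernel32.dll", SetLastError = false)]
static extern void RtlFillMemory(IntPtr destination, UIntPtr length, byte fill); // length is SIZE_T natively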
Here pBuffer is a pointer to the buffer, nFileLen is the length to write, and bR is the fill byte. I think the fastest way to do it in managed code (much slower) is to create a small block of initialized bytes, then use Buffer.BlockCopy to write them to the byte array in a loop. I threw this together but haven't tested it, but you get the idea:
long size = GetFileSize(FileName);
const int blocksize = 1024;

// temporary block filled with the target byte
byte[] ntemp = new byte[blocksize];
// destination buffer
byte[] nbyte = new byte[size];

// fill the temporary block
for (int i = 0; i < blocksize; i++)
    ntemp[i] = 0xff;

// get dimensions
int blocks = (int)(size / blocksize);
int remainder = (int)(size - (blocks * blocksize));
int count = 0;

// copy whole blocks to the buffer
while (count < blocks)
{
    Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * count, blocksize);
    count++;
}

// copy remaining bytes
Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * count, remainder);
This function is way faster than a for loop for filling an array.
The Array.Copy command is a very fast memory copy function. This function takes advantage of that by repeatedly calling the Array.Copy command and doubling the size of what we copy until the array is full.
I discuss this on my blog at https://grax32.com/2013/06/fast-array-fill-function-revisited.html (Link updated 12/16/2019). Also see Nuget package that provides this extension method. http://sites.grax32.com/ArrayExtensions/
Note that this would be easy to make into an extension method by just adding the word "this" to the method declarations i.e. public static void ArrayFill<T>(this T[] arrayToFill ...
public static void ArrayFill<T>(T[] arrayToFill, T fillValue)
{
    // if called with a single value, wrap the value in an array and call the main function
    ArrayFill(arrayToFill, new T[] { fillValue });
}

public static void ArrayFill<T>(T[] arrayToFill, T[] fillValue)
{
    if (fillValue.Length >= arrayToFill.Length)
    {
        throw new ArgumentException("fillValue array length must be smaller than length of arrayToFill");
    }

    // set the initial array value
    Array.Copy(fillValue, arrayToFill, fillValue.Length);

    int arrayToFillHalfLength = arrayToFill.Length / 2;

    for (int i = fillValue.Length; i < arrayToFill.Length; i *= 2)
    {
        int copyLength = i;
        if (i > arrayToFillHalfLength)
        {
            copyLength = arrayToFill.Length - i;
        }

        Array.Copy(arrayToFill, 0, arrayToFill, i, copyLength);
    }
}
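For the space-filling case from the question, usage would look like this:

byte[] userCode = new byte[6];
ArrayFill(userCode, (byte)0x20); // fill the whole array with spaces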
You can use a collection initializer:
UserCode = new byte[]{0x20,0x20,0x20,0x20,0x20,0x20};
This will work better than Repeat if the values are not identical.
You could speed up the initialization and simplify the code by using the Parallel class (.NET 4 and newer):
public static void PopulateByteArray(byte[] byteArray, byte value)
{
Parallel.For(0, byteArray.Length, i => byteArray[i] = value);
}
Of course you can create the array at the same time:
public static byte[] CreateSpecialByteArray(int length, byte value)
{
var byteArray = new byte[length];
Parallel.For(0, length, i => byteArray[i] = value);
return byteArray;
}
I have a very painful library which, at the moment, is accepting a C# string as a way to get arrays of data; apparently, this makes marshalling for pinvokes easier.
So how do I make a ushort array into a string by bytes? I've tried:
int i;
String theOutData = "";
ushort[] theImageData = inImageData.DataArray;
//this is as slow as molasses in January
for (i = 0; i < theImageData.Length; i++) {
    byte[] theBytes = System.BitConverter.GetBytes(theImageData[i]);
    theOutData += String.Format("{0:d}{1:d}", theBytes[0], theBytes[1]);
}
I can do it this way, but it doesn't finish in anything remotely close to a sane amount of time.
What should I do here? Go unsafe? Go through some kind of IntPtr intermediate?
If it were a char* in C++, this would be significantly easier...
edit: the function call is
DataElement.SetByteValue(string inArray, VL Length);
where VL is a 'Value Length', a DICOM type, and the function itself is generated as a wrapper to a C++ library by SWIG. It seems that the representation chosen is string, because that can cross managed/unmanaged boundaries relatively easily, but throughout the C++ code in the project (this is GDCM), the char* is simply used as a byte buffer. So, when you want to set your image buffer pointer, in C++ it's fairly simple, but in C#, I'm stuck with this weird problem.
This is hackeration, and I know that probably the best thing is to make the SWIG library work right. I really don't know how to do that, and would rather have a quick workaround on the C# side, if one exists.
P/Invoke can actually handle what you're after most of the time using StringBuilder to create writable buffers, for example see pinvoke.net on GetWindowText and related functions.
However, that aside, with the data as ushort, I assume that it is encoded in UTF-16LE. If that is the case you can use Encoding.Unicode.GetString(), but that will expect a byte array rather than a ushort array. To turn your ushorts into bytes, you can allocate a separate byte array and use Buffer.BlockCopy, something like this:
ushort[] data = new ushort[10];
for (int i = 0; i < data.Length; ++i)
data[i] = (char) ('A' + i);
string asString;
byte[] asBytes = new byte[data.Length * sizeof(ushort)];
Buffer.BlockCopy(data, 0, asBytes, 0, asBytes.Length);
asString = Encoding.Unicode.GetString(asBytes);
However, if unsafe code is OK, you have another option. Get the start of the array as a ushort*, and hard-cast it to char*, and then pass it to the string constructor, like so:
string asString;
unsafe
{
    fixed (ushort* dataPtr = &data[0])
        asString = new string((char*)dataPtr, 0, data.Length);
}
One thing you can do is switch from using a string to a StringBuilder; it will help performance tremendously.
If you are willing to use unsafe code you can use pointers and implement your C# code just like your C++. Or you could write a small C++/CLI DLL that implements this functionality.
Look into the Buffer class:
ushort[] theImageData = inImageData.DataArray;
byte[] buf = new byte[Buffer.ByteLength(theImageData)]; // 2 bytes per short
Buffer.BlockCopy(theImageData, 0, buf, 0, Buffer.ByteLength(theImageData));
string theOutData = System.Text.Encoding.ASCII.GetString(buf);
Just FYI, this has been fixed in later revision (gdcm 2.0.10). Look here:
http://gdcm.sourceforge.net/
-> http://apps.sourceforge.net/mediawiki/gdcm/index.php?title=GDCM_Release_2.0
I don't like this much, but it seems to work given the following assumptions:
1. Each ushort is an ASCII char between 0 and 127
2. (Ok, I guess there is just one assumption)
ushort[] data = inData; // The ushort array source
Byte[] bytes = new Byte[data.Length]; // Assumption - only need one byte per ushort
int i = 0;
foreach(ushort x in data) {
byte[] tmp = System.BitConverter.GetBytes(x);
bytes[i++] = tmp[0];
// Note: not using tmp[1] as all characters in 0 < x < 127 use one byte.
}
String str = Encoding.ASCII.GetString(bytes);
I'm sure there are better ways to do this, but it's all I could come up with quickly.
You can avoid unnecessary copying this way:
public static class Helpers
{
    // Unsafe comes from System.Runtime.CompilerServices.Unsafe
    public static string ConvertToString(this ushort[] uSpan)
    {
        byte[] bytes = new byte[sizeof(ushort) * uSpan.Length];
        for (int i = 0; i < uSpan.Length; i++)
        {
            Unsafe.As<byte, ushort>(ref bytes[i * 2]) = uSpan[i];
        }
        return Encoding.Unicode.GetString(bytes);
    }
}
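If you are on .NET Core 2.1 or later, MemoryMarshal.Cast plus the span-based Encoding.GetString overload avoids even the intermediate byte[]; a sketch (the class and method names are mine):

using System.Runtime.InteropServices;
using System.Text;

public static class SpanHelpers
{
    public static string ConvertToStringNoCopy(ReadOnlySpan<ushort> data)
    {
        // reinterpret the ushorts as UTF-16 code units without copying them
        ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<ushort, byte>(data);
        return Encoding.Unicode.GetString(bytes);
    }
}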