How do I truncate a string while converting to bytes in C#?

I would like to put a string into a byte array, but the string may be too big to fit. In the case where it's too large, I would like to put as much of the string as possible into the array. Is there an efficient way to find out how many characters will fit?

In order to truncate a string to a UTF-8 byte array without splitting in the middle of a character, I use this:
// Requires: using System.Text;
static string Truncate(string s, int maxLength)
{
    // Fast path: the whole string already fits.
    if (Encoding.UTF8.GetByteCount(s) <= maxLength)
        return s;
    var cs = s.ToCharArray();
    int length = 0;
    int i = 0;
    while (i < cs.Length)
    {
        // A surrogate pair must be measured (and kept) as a unit.
        int charSize = 1;
        if (i < (cs.Length - 1) && char.IsSurrogate(cs[i]))
            charSize = 2;
        int byteSize = Encoding.UTF8.GetByteCount(cs, i, charSize);
        if ((byteSize + length) <= maxLength)
        {
            i += charSize;
            length += byteSize;
        }
        else
            break;
    }
    return s.Substring(0, i);
}
The returned string can then be safely transferred to a byte array of length maxLength.
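For instance, a minimal usage sketch (input is a placeholder string and the 1000-byte limit is just an assumed value):
// Requires: using System.Text;
// Truncate first, then encode; the resulting buffer cannot exceed the limit.
byte[] buffer = Encoding.UTF8.GetBytes(Truncate(input, 1000));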

You should be using the Encoding class to do your conversion to a byte array, correct? All Encoding objects have an overridden GetMaxCharCount method, which gives you "the maximum number of characters produced by decoding the specified number of bytes." You should be able to use this value to trim your string and properly encode it.
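A rough sketch of that idea (text and maxBytes are assumed names, and this only gives a coarse first cut; an exact fit still needs a GetByteCount check, since most non-ASCII characters need more than one byte):
// Requires: using System.Text;
int maxBytes = 1000;                                     // assumed byte budget
int maxChars = Encoding.UTF8.GetMaxCharCount(maxBytes);  // most characters maxBytes bytes could ever decode to
if (text.Length > maxChars)
    text = text.Substring(0, maxChars);                  // anything longer certainly cannot fit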

An efficient way would be to find how many bytes you will (pessimistically) need per character with
Encoding.GetMaxByteCount(1);
then divide your string size by the result, and convert that many characters with
public virtual int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex)
If you want to use less memory, use
Encoding.GetByteCount(string);
but that is a much slower method.
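A sketch of that approach (text, maxLength, and the other names are assumptions; the division is deliberately pessimistic, so part of the buffer may go unused):
// Requires: using System; using System.Text;
int maxLength = 1000;                                   // assumed byte budget
int worstPerChar = Encoding.UTF8.GetMaxByteCount(1);    // worst-case bytes for a single char
int charCount = Math.Min(text.Length, maxLength / worstPerChar);
byte[] buffer = new byte[maxLength];
int written = Encoding.UTF8.GetBytes(text, 0, charCount, buffer, 0);  // bytes actually produced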

The Encoding class in .NET has a method called GetByteCount which can take in a string or char[]. If you pass in 1 character, it will tell you how many bytes are needed for that 1 character in whichever encoding you are using.
The method GetMaxByteCount is faster, but it does a worst case calculation which could return a higher number than is actually needed.
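For instance (the byte counts in the comments are for UTF-8):
// Requires: using System.Text;
int a = Encoding.UTF8.GetByteCount(new[] { 'a' });   // 1 byte
int b = Encoding.UTF8.GetByteCount(new[] { 'é' });   // 2 bytes
int c = Encoding.UTF8.GetMaxByteCount(1);            // worst-case estimate, larger than either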

Related

How to get a unique ID for a string and the string from this ID with C#?

I have this name:
string name = "Centos 64 bit";
I want to generate a 168-bit (or whatever is feasible) uid from this name, and to be able to get the name back from this id (and vice versa).
I tried GetHashCode() without success.
Result would be something like:
Centos 64 bit (=) 91C47A57-E605-4902-894B-74E791F37C1F
One solution I would recommend is to use a hash function and something like a dictionary. So, get a hash - say SHA-256 - of your input string and truncate it to 168 bits (21 bytes).
Now, to go back from a uid to the original string, you would need a dictionary which stores pairs like (input_string, string_uid), where input_string is the original string and string_uid is the uid generated for it using the method from the first paragraph.
Using this dictionary you can easily get back to the original input string from string_uid.
This is one way - provided, of course, that you are allowed to store mappings between string and uid.
The hash normally gives you the result as a byte array; converting that byte array to a string is a separate step.
For example, if you have 10 bytes with values in the range [0, 255] and you encode the byte array as a hex string, the string will take 20 bytes.
So the next question is: do you want the length of the uid as a string to be 21 bytes?
Because that would mean the hash output can only be around 10 bytes, which reflects poorly on the collision resistance of the output.
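A minimal sketch of that approach, assuming SHA-256, a 21-byte (168-bit) truncation, a hex-encoded uid, and an in-memory Dictionary for the reverse lookup (all of these are choices, not requirements):
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static readonly Dictionary<string, string> UidToName = new Dictionary<string, string>();

static string GetUid(string name)
{
    using (var sha = SHA256.Create())
    {
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
        // Keep the first 21 bytes (168 bits) and hex-encode them into 42 characters.
        string uid = BitConverter.ToString(hash, 0, 21).Replace("-", "");
        UidToName[uid] = name;   // remember the mapping so the name can be recovered
        return uid;
    }
}

static string GetName(string uid)
{
    return UidToName.TryGetValue(uid, out var name) ? name : null;
}
Remember that the truncated hash is not reversible by itself; the dictionary (or a persistent table) is what makes the uid-to-name direction possible.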
What you want is not achievable. You need to store a lookup table of hash to name. Since you don't give more details of your system, it's hard to say whether that has to be persistent or in memory. If in memory, just use a dictionary of string -> string.
Here you go sir:
public byte[] GetUID(string name)
{
    var bytes = Encoding.ASCII.GetBytes(name);   // requires: using System.Text;
    if (bytes.Length > 21)
        throw new ArgumentException("Value is too long to be used as an ID");
    // Copy into a fixed-size 21-byte buffer; the unused tail stays zero.
    var uid = new byte[21];
    Buffer.BlockCopy(bytes, 0, uid, 0, bytes.Length);
    return uid;   // return the padded buffer, not the original bytes
}

public string GetName(byte[] UID)
{
    // Read up to the first zero byte (the padding), then decode as ASCII.
    int length = UID.Length;
    for (int i = 0; i < UID.Length; i++)
    {
        if (UID[i] == 0)
        {
            length = i;
            break;
        }
    }
    return Encoding.ASCII.GetString(UID, 0, length);
}
Caveats: it works for strings up to 21 characters in length that only use ASCII characters (no Unicode support) and it doesn't encrypt the string in any way, but I believe it meets your requirements.
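Usage would then look like this:
byte[] uid = GetUID("Centos 64 bit");   // 21 bytes, zero-padded at the end
string name = GetName(uid);             // "Centos 64 bit"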

Fastest way to convert float to bytes and then save byte array in memory?

I am currently writing code that converts an audio clip into a float array; I then want to convert that float array into bytes, and finally convert that byte array to hexadecimal.
Everything works, but we are trying to save arrays that are hundreds of thousands of elements long once converted to bytes, and saving that data as a hexadecimal string is too much, or takes too long, for the mobile devices we are testing on.
So my question is: are there any ways to optimize / speed up this process?
Here is my code to convert our float array to bytes:
public byte[] ConvertFloatsToBytes(float[] audioData)
{
    byte[] bytes = new byte[audioData.Length * 4];
    //*** This function converts our current float array elements to the same exact place in byte data
    Buffer.BlockCopy(audioData, 0, bytes, 0, bytes.Length);
    return bytes;
}
Here we convert that data into a hex string:
public static string ByteArrayToString(byte[] ba)
{
    string hex = BitConverter.ToString(ba);
    //Debug.Log("ba.length = " + ba.Length.ToString() + " hex string = " + hex);
    return hex.Replace("-", "");
}
Ultimately, at the end we save the string out and later convert it from the hex string back to a float array.
Like I said, that code is slow but it is working; I am just trying to find the best ways to optimize / speed up this process to improve performance.
Do you know which part is costing you? I strongly suspect that the conversion to a hexadecimal string isn't the bottleneck in your program.
The final part, where you remove the hyphens, ends up copying the string. You can probably do better by writing your own method that duplicates what BitConverter.ToString does, without the hyphens. That is:
const string chars = "0123456789ABCDEF";

public string ByteArrayToString(byte[] ba)
{
    var sb = new StringBuilder(ba.Length * 2);
    for (int i = 0; i < ba.Length; ++i)
    {
        var b = ba[i];
        sb.Append(chars[b >> 4]);
        sb.Append(chars[b & 0x0F]);
    }
    return sb.ToString();
}
That will avoid one string copy.
If you're willing to use unsafe code (I don't know if you can on the devices you're working with), you can speed that up even further by not even copying to the array of bytes. Rather, you fix the array of floats in memory and then address it with a byte pointer. See Unsafe Code and Pointers if you're interested in that.
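A sketch of that unsafe variant (names are assumptions, and the project must allow unsafe code); it reads the float array's memory directly as bytes, skipping the intermediate byte[]:
// Requires: using System.Text; and compiling with /unsafe
const string hexChars = "0123456789ABCDEF";

public static unsafe string FloatArrayToHex(float[] data)
{
    var sb = new StringBuilder(data.Length * 8);    // 4 bytes -> 8 hex chars per float
    fixed (float* pData = data)
    {
        byte* p = (byte*)pData;
        int byteCount = data.Length * sizeof(float);
        for (int i = 0; i < byteCount; i++)
        {
            byte b = p[i];
            sb.Append(hexChars[b >> 4]);
            sb.Append(hexChars[b & 0x0F]);
        }
    }
    return sb.ToString();
}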
That sounds really convoluted; are audio samples not normally integers?
Anyway, BinaryWriter supports writing float and double values natively, so you could use it to fill a MemoryStream that you then convert to hex.
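Something along those lines, as a sketch (the method and variable names are assumptions):
// Requires: using System; using System.IO;
public static string FloatsToHex(float[] samples)
{
    using (var ms = new MemoryStream(samples.Length * sizeof(float)))
    using (var writer = new BinaryWriter(ms))
    {
        foreach (float f in samples)
            writer.Write(f);                 // 4 little-endian bytes per float
        return BitConverter.ToString(ms.ToArray()).Replace("-", "");
    }
}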

Limit UTF-8 encoded bytes length from string

I need to limit the output byte[] length encoded with UTF-8 encoding. E.g. the byte[] length must be less than or equal to 1000. First I wrote the following code:
int maxValue = 1000;
if (text.Length > maxValue)
text = text.Substring(0, maxValue);
var textInBytes = Encoding.UTF8.GetBytes(text);
This works well if the string only uses ASCII characters, because then it is 1 byte per character. But characters beyond that can take 2, 3, or even 4 bytes each, which would be a problem with the above code. So to fix that problem I wrote this:
List<byte> textInBytesList = new List<byte>();
char[] textInChars = text.ToCharArray();
for (int a = 0; a < textInChars.Length; a++)
{
    byte[] valueInBytes = Encoding.UTF8.GetBytes(textInChars, a, 1);
    if ((textInBytesList.Count + valueInBytes.Length) > maxValue)
        break;
    textInBytesList.AddRange(valueInBytes);
}
I haven't tested the code, but I'm sure it will work as I want. However, I don't like the way it is done; is there any better way to do this? Something I'm missing, or not aware of?
Thank you.
My first posting on Stack Overflow, so be gentle! This method should take care of things pretty quickly for you:
public static byte[] GetBytes(string text, int maxArraySize, Encoding encoding)
{
    if (string.IsNullOrEmpty(text)) return null;
    int tail = Math.Min(text.Length, maxArraySize);
    int size = encoding.GetByteCount(text.Substring(0, tail));
    // Walk back from the end until the encoded size fits (tail > 0 guards the Substring call).
    while (tail > 0 && size > maxArraySize)
    {
        size -= encoding.GetByteCount(text.Substring(tail - 1, 1));
        --tail;
    }
    return encoding.GetBytes(text.Substring(0, tail));
}
It's similar to what you're doing, but without the added overhead of the List or having to count from the beginning of the string every time. I start from the other end of the string, and the assumption is, of course, that all characters must be at least one byte. So there's no sense in starting to iterate down through the string any farther in than maxArraySize (or the total length of the string).
Then you can call the method like so..
byte[] bytes = GetBytes(text, 1000, Encoding.UTF8);

Difference between using Encoding.GetBytes or cast to byte [duplicate]

This question already has answers here:
Encoding used in cast from char to byte
(3 answers)
Closed 9 years ago.
I was wondering if there's any difference between converting characters to bytes with Encoding.UTF8.GetBytes and manually casting each character with (byte) to convert it to a byte.
For example, look at the following code:
public static byte[] ConvertStringToByteArray(string str)
{
    int i, n;
    n = str.Length;
    byte[] x = new byte[n];
    for (i = 0; i < n; i++)
    {
        x[i] = (byte)str[i];
    }
    return x;
}
var arrBytes = ConvertStringToByteArray("Hello world");
or
var arrBytes = Encoding.UTF8.GetBytes("Hello world");
I liked the question, so I executed your code on ANSI text in Hebrew that I read from a text file.
The text was "שועל"
string text = System.IO.File.ReadAllText(@"d:\test.txt");
var arrBytes = ConvertStringToByteArray(text);
var arrBytes1 = Encoding.UTF8.GetBytes(text);
The results were not the same: there is a difference whenever the code point of one of your characters exceeds the 0-255 range of a byte.
Your ConvertStringToByteArray method is incorrect.
You are casting each char to byte. A char's numerical value is its UTF-16 code unit, which can be larger than a byte, so the cast will often silently truncate the value (or overflow in a checked context).
Your example works because you've used characters with code points within the byte range.
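A small illustration of the difference, using one Hebrew letter as the example:
// Requires: using System.Text;
char ch = 'ש';                                        // U+05E9, numeric value 1513
byte cast = (byte)ch;                                 // 233: the high byte is silently dropped
byte[] utf8 = Encoding.UTF8.GetBytes(new[] { ch });   // { 0xD7, 0xA9 }: two bytes, decodes back to 'ש'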
When the characters you are casting need a multi-byte encoding, you can't use the first approach; you have to choose an encoding standard explicitly.
Yes, there is a difference. All .NET strings are stored as UTF-16 LE.
Use this code to make a test string, so you get high-order bytes in your chars, i.e. chars that have a different representation in UTF-8 and UTF-16.
var testString = new string(
    Enumerable.Range(char.MinValue, char.MaxValue - char.MinValue)
              .Select(Convert.ToChar)
              .ToArray());
This makes a string with every possible char value. If you do
ConvertStringToByteArray(testString).SequenceEqual(
Encoding.UTF8.GetBytes(testString));
It will return false, demonstrating that the results differ.

How do I convert a decimal number with a point to a binary number in C#?

On this site I see that all the answers about things like converting a decimal number to binary refer to numbers without a point (int)...
I want to know how to convert a decimal number with a point, like "332.434", to binary in C#.
Example I have seen:
using System;

namespace _01.Decimal_to_Binary
{
    class DecimalToBinary
    {
        static void Main(string[] args)
        {
            Console.Write("Decimal: ");
            int decimalNumber = int.Parse(Console.ReadLine());
            int remainder;
            string result = string.Empty;
            while (decimalNumber > 0)
            {
                remainder = decimalNumber % 2;
                decimalNumber /= 2;
                result = remainder.ToString() + result;
            }
            Console.WriteLine("Binary: {0}", result);
        }
    }
}
The example refers to converting from an int without a point.
Thanks.
Just use a BitConverter to get the bytes then loop over them converting those to strings and appending the current string of bits to the previous one.
byte[] byteArray = BitConverter.GetBytes(MyDouble);
string ByteString = System.String.Empty;
for (int i = 0; i < byteArray.Length; i++)
    ByteString += Convert.ToString(byteArray[i], 2).PadLeft(8, '0');  // append each byte's 8 bits
You may have to do some tinkering to get the bits in the correct order, but I assume ByteString will have the high-order bits on the left. Here's the MSDN page for that ToString method: http://msdn.microsoft.com/en-us/library/8s62fh68.aspx
You can't simply convert a non-integer number to a binary format. E.g. for 3.1415926 a computer keeps a sign (+/-), a normalized significand with a leading zero (0.31415926), and an exponent (here 10^1). So you need to keep all 3 parts. Read more on Wikipedia: http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
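If you just want to look at those parts for a given value, one possible sketch (not the only representation) is to pull the IEEE 754 fields out of a double with BitConverter:
// Requires: using System;
double value = 332.434;
long bits = BitConverter.DoubleToInt64Bits(value);
int sign = (int)((bits >> 63) & 1);            // 1 sign bit
int exponent = (int)((bits >> 52) & 0x7FF);    // 11 exponent bits, biased by 1023
long mantissa = bits & 0xFFFFFFFFFFFFFL;       // 52 significand bits
string binary = Convert.ToString(bits, 2).PadLeft(64, '0');   // full 64-bit pattern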
