How are strings VS chars handled in C# vs Javascript?


In JavaScript, single and double quotes are somewhat interchangeable and largely a matter of style (there is a good discussion of why this isn't quite the case in one of the answers here: When to use double or single quotes in JavaScript?). How are chars and strings handled in C#?
For example:
string test = "hello world";
string test2 = 'hello world'; // Too many characters in character literal
char test3 = 'a';
char test4 = "a"; // Cannot implicitly convert type string to char
It looks like strings and chars are handled as separate, non-interchangeable types, and that the choice of single or double quotes marks which one you get?
What is the relationship between chars and strings in typed languages? Specifically, would it be correct to say that a string is an array of chars?

would it be correct to say that a string is an array of chars
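In .NET, a string is an object holding a contiguous block of memory that contains UTF-16 code units. A char is a separate (primitive) data type that holds exactly one UTF-16 code unit, with no object overhead. A single char is therefore not always a full Unicode code point: characters outside the Basic Multilingual Plane take two chars.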
From this interesting blog post from Jon Skeet, where he compares the .NET vs. Java implementation:
A long string consists of a single large object in memory. Compare this with Java, where a String is a “normal” type in terms of memory consumption, containing an offset and length into a char array – so a long string consists of a small object referring to a large char array.
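The code unit point above can be checked directly. This is a minimal sketch, assuming C# 9 or later (top-level statements); the variable names are made up for illustration.

```csharp
using System;

// "A" fits in one UTF-16 code unit; U+1F600 lies outside the BMP and needs two.
string ascii = "A";
string emoji = "\U0001F600";

int asciiUnits = ascii.Length;   // Length counts char values (code units)
int emojiUnits = emoji.Length;   // a surrogate pair counts as two chars
bool firstIsHighSurrogate = char.IsHighSurrogate(emoji[0]);

Console.WriteLine($"{asciiUnits} {emojiUnits} {firstIsHighSurrogate}");
```

So Length reports the number of char values, which is not necessarily the number of characters a reader would see.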

C# uses the quotes to indicate the type - ' is always a char, " is always a string.
In .NET a string behaves like a read-only array of char, so:
// Treat string like an array
char c = "This is a string"[3];
int len = "This is a string".Length;
// Now c holds the char at index 3
bool same = c == 's'; // true
What you can't do is edit them:
// This fails
"This is a string"[3] = 'x';
// This is fine, because we get a new editable char[]
char[] c = "This is a string".ToCharArray();
c[3] = 'x';
This is true in any .NET language as they all use the same string implementation. Other strongly typed frameworks have different ways of handling strings.
In .NET an int must be explicitly cast to a char, while a char converts implicitly to an int, so:
char c = (char) 115; // 's'
int i = c; // 115
Finally, char is a value type, while string (a sequence of UTF-16 code units under the covers) is an immutable reference type. They behave very similarly in code but are stored in different ways in memory. I think this is why C# makes the distinction in how they are delimited: "s" and 's' are different things, stored in different ways (entirely unlike the JavaScript string).

Related

Set specific length for each string

I don't know if there is already a function that does this:
I need to keep my string at a specific length of 20 characters. If my string is 5 characters long, the remaining 15 characters should be spaces in front of it.
Example
string test=12345;
string finalstring =test;
output
finalstring= 12345;
or
string test=13 characters;
string finalstring = 13 characters;
I can't specify it better.

You can't specify the length of a string in C#. Strings are immutable: once created, a string's contents and length never change. Although string is a reference type, it behaves much like a value type in this respect, because operations that appear to grow or shrink a string (formatting, concatenation, and so on) actually produce a new string. What can change is the reference, which may be pointed at a new string built from other strings. Take this for example:
var Name = string.Empty;
Here Name is an immutable string of 0 characters and is empty.
Name = Name + "Michael";
Here the immutable empty string is combined with the immutable "Michael" string, and the Name reference is reassigned to the resulting string "Michael". Neither original string is modified; in general, every concatenation yields a new string object on the heap (the runtime can skip the allocation in special cases, such as when one operand is empty). This is why building strings through repeated concatenation can be very resource intensive.
However, there is a StringBuilder class which handles this work for you. It lets you pass in strings and/or characters and builds up a buffer of characters internally, which can then be turned into a string at the end.
var nameBuilder = new StringBuilder();
nameBuilder.Append("Michael");
So far only one immutable string, "Michael", has been used, and its characters were copied into nameBuilder. The builder can then be passed around and modified without allocating a new string for every change. That's a lot of background, but here's your answer.
In order to control the length of a string you need to either work with character arrays or use the widely adopted StringBuilder. With StringBuilder you can specify the initial capacity and maximum capacity, work with it in a string-like fashion, and benefit from better use of resources.
var initialCapacity = 20;
var maxCapacity = 20;
var nameBuilder = new StringBuilder(initialCapacity, maxCapacity);
When you're done with StringBuilder you can get the produced string by calling the overridden ToString() method supplied with it.
Hopefully this helps you understand a little bit more about how it works and why you can't set a size for string. Some languages, like C++, have string objects that let you set the max capacity but they are simply an array of characters with built in features. Immutable strings are better for performance but not formatting them in a reasonable way can make them worse; so it's a win / win if you know what you're doing and a lose / lose if you don't.
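As a rough sketch of how the capacity settings behave (assuming C# 9 or later for top-level statements; the 14-character filler is arbitrary): capacity limits how much the builder may hold, but it does not pad the resulting string to that length.

```csharp
using System;
using System.Text;

// Capacity and MaxCapacity are both 20, mirroring the values used above.
var nameBuilder = new StringBuilder(20, 20);
nameBuilder.Append("Michael");

string name = nameBuilder.ToString();   // still 7 characters, not 20
int capacity = nameBuilder.Capacity;

bool overflowRejected = false;
try
{
    // 7 + 14 = 21 characters would exceed MaxCapacity.
    nameBuilder.Append(new string('x', 14));
}
catch (ArgumentOutOfRangeException)
{
    overflowRejected = true;
}

Console.WriteLine($"{name.Length} {capacity} {overflowRejected}");
```

If the goal is a string that is exactly 20 characters wide, padding is still needed on top of this.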

Use the string.PadLeft method; see the documentation on MSDN.
string test="12345";
string finalstring = test.PadLeft(20, ' ');
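One detail worth knowing: PadLeft never shortens a string, so input longer than 20 characters comes back unchanged. The sketch below (C# 9 or later assumed) shows that, plus a hypothetical helper named FixWidth, not part of the framework, that also truncates.

```csharp
using System;

string test = "12345";
string padded = test.PadLeft(20);              // 15 spaces, then "12345"

string longer = "1234567890123456789012345";   // 25 characters
string unchanged = longer.PadLeft(20);         // already longer than 20: returned as-is

// Hypothetical helper: pad short values, cut long ones to exactly `width` chars.
string FixWidth(string value, int width) =>
    value.Length >= width ? value.Substring(0, width) : value.PadLeft(width);

Console.WriteLine($"[{padded}]");
Console.WriteLine($"[{FixWidth(longer, 20)}]");
```

Whether long values should be truncated or rejected depends on the data; the helper simply keeps the first 20 characters.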

Char pointer equivalent in c#

I am converting C++ code to C#. A char in C++ takes 8 bits, while in C# it takes 16 bits, and I don't know how to handle char*.
What is the equivalent of char* in C#: byte[] or [MarshalAs(UnmanagedType.LPStr)] StringBuilder? And is the C# equivalent of a C++ char a byte or a string?

For an input parameter it could be a string or a byte[], depending on the meaning of the parameter. If it represents a sequence of characters then use string. If the parameter is a buffer to some arbitrary data then it's most likely a byte[].
However in C/C++ a char * can also be an output parameter, such as in the sprintf function. In that case a StringBuilder or a byte[] would be the equivalent types, depending again on the meaning of the parameter.
With regard to the char datatype in C#, please keep in mind that a char in C# means a character, whereas its meaning in C/C++ is closer to that of a byte in C#.
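To make the byte/char distinction concrete, here is a small sketch (C# 9 or later assumed) that turns a C# string into the kind of zero-terminated byte buffer a C char* usually points at, and back again. ASCII is an assumption; the original C++ code may use a different encoding.

```csharp
using System;
using System.Text;

string text = "abc";

// C-style buffer: one byte per ASCII character plus a terminating 0.
byte[] buffer = new byte[text.Length + 1];
Encoding.ASCII.GetBytes(text, 0, text.Length, buffer, 0);

// Back to a string: decode up to (not including) the first 0 byte.
int end = Array.IndexOf(buffer, (byte)0);
string roundTrip = Encoding.ASCII.GetString(buffer, 0, end);

Console.WriteLine($"{buffer.Length} bytes, round trip: {roundTrip}");
```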

I don't fully understand the question, but C# has a char type too: http://www.dotnetperls.com/char

It depends:
char *ptr should be replaced with string if it acts as a string container (as it commonly does in C).
If what matters is the numeric value of each element, use sbyte[] (range [-128, 127], like a signed C char) or byte[] (range [0, 255]); use char[] only for actual characters, e.g. char[] array1 = { 'b', 'c', 'a' };

Specified Cast is not Valid - trying to cast a char

I am trying to cast a char as follows:
while (Reader.Read())
{
VM VMResult = new VM();
VMResult.status = (char)Reader["status"];
VMList.Add(VMResult);
}
Then comes the fun part: Specified Cast is not Valid.
VMResult.status is a char
The returned data is a char(1) in sql
I assume there must be a difference in the C# / SQL char terminology.
What do you think?

I assume there must be a difference in the C# / SQL char terminology.
That's correct. A char in SQL Server is a fixed-length character string. It can be nullable.
A char in .NET is a structure that represents a single character as a UTF-16 code unit. It cannot be null since it's a structure.
There is no fixed-length character string in .NET unless you consider a char array or byte array a fixed-length string.
Since most of the .net ecosystem has better support for strings than chars, char arrays or byte arrays, you're much better off just using the string that gets returned for the char(x) fields.
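If a single char is still wanted on the C# side, one option is to go through the string first. The sketch below assumes the column value arrives as a string, DBNull or null; ToStatusChar is a hypothetical helper name, and C# 9 or later is assumed for the top-level statements.

```csharp
using System;

// Hypothetical helper: maps a column value to a nullable char.
char? ToStatusChar(object value)
{
    var s = value as string;   // DBNull and null both end up as null here
    return string.IsNullOrEmpty(s) ? (char?)null : s[0];
}

Console.WriteLine(ToStatusChar("A"));
Console.WriteLine(ToStatusChar(DBNull.Value).HasValue);
```

Returning char? rather than char keeps the "no value" case visible instead of hiding it behind a placeholder character.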

If you know for a fact that Reader["status"] will always hold a single character (or you only want the first char), and the actual type of Reader["status"] is a string, you can always do this:
string status = Reader["status"] as string; // null if the value is DBNull
VMResult.status = !string.IsNullOrEmpty(status) ? status[0] : '\0';
EDIT: null checking ftw.

OK so you basically want to cast a string to a char, this is going to assume that your "status" value is a single character string:
VMResult.status = Reader["status"].ToString()[0];
this also assumes Reader[] does not already return a string (if it did then the ToString is not required) and that the value is not null.

Converting unicode character to a single hexadecimal value in C#

I am getting a character from an EMF record using Encoding.Unicode.GetString, and the resulting string contains only one character but has two bytes. I don't know much about encoding schemes or multi-byte character sets. I want to convert that character to its equivalent single hexadecimal value. Can you help me with this?

It's not clear what you mean. A char in C# is a 16-bit unsigned value. If you've got a binary data source and you want to get Unicode characters, you should use an Encoding to decode the binary data into a string, that you can access as a sequence of char values.
You can convert a char to a hex string by first converting it to an integer, and then using the X format specifier like this:
char c = '\u0123';
string hex = ((int)c).ToString("X4"); // Now hex = "0123"
Now, that leaves one more issue: surrogate pairs. Values which aren't in the Basic Multilingual Plane (U+0000 to U+FFFF) are represented by two UTF-16 code units - a high surrogate and a low surrogate. You can use the char.IsSurrogate* methods to check for surrogate pairs... although it's harder (as far as I can see) to then convert a surrogate pair into a UCS-4 value. If you're lucky, you won't need to deal with this... if you're happy converting your binary data into a sequence of UTF-16 code units instead of strict UCS-4 values, you don't need to worry.
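For the surrogate-pair case, the framework's char.ConvertToUtf32 method combines the two code units into a single code point. A minimal sketch, assuming C# 9 or later and using U+1D11E purely as an example value:

```csharp
using System;

string text = "\U0001D11E";   // MUSICAL SYMBOL G CLEF, outside the BMP

bool isPair = char.IsSurrogatePair(text, 0);
int codePoint = char.ConvertToUtf32(text, 0);   // combines both code units
string hex = codePoint.ToString("X");

Console.WriteLine($"{isPair} U+{hex}");
```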
EDIT: Given your comments, it's still not entirely clear what you've got to start with. You say you've got two bytes... are they separate, or in a byte array? What do they represent? Text in a particular encoding, presumably... but which encoding? Once you know the encoding, you can convert a byte array into a string easily:
byte[] bytes = ...;
// For example, if your binary data is UTF-8
string text = Encoding.UTF8.GetString(bytes);
char firstChar = text[0];
string hex = ((int)firstChar).ToString("X4");
If you could edit your question to give more details about your actual situation, it would be a lot easier to help you get to a solution. If you're generally confused about encodings and the difference between text and binary data, you might want to read my article about it.

Try this:
System.Text.Encoding.Unicode.GetBytes(theChar.ToString())
.Aggregate("", (agg, val) => agg + val.ToString("X2"));
However, since you don't specify exactly which encoding the character is in, this could fail. Further, you don't make it very clear whether you want the output to be a string of hex chars or bytes. I'm guessing the former, since I'd guess you want to generate HTML. Let me know if any of this is wrong.

I created an extension method to convert a Unicode or non-Unicode string to a hex string.
I'm sharing it for anyone who may need it.
public static class StringHelper
{
public static string ToHexString(this string str)
{
byte[] bytes = str.IsUnicode() ? Encoding.UTF8.GetBytes(str) : Encoding.Default.GetBytes(str);
return BitConverter.ToString(bytes).Replace("-", string.Empty);
}
public static bool IsUnicode(this string input)
{
const int maxAnsiCode = 255;
return input.Any(c => c > maxAnsiCode);
}
}

Get thee to StringInfo:
http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx
http://msdn.microsoft.com/en-us/library/8k5611at.aspx
The .NET Framework supports text elements. A text element is a unit of text that is displayed as a single character, called a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The StringInfo class provides methods that allow your application to split a string into its text elements and iterate through the text elements. For an example of using the StringInfo class, see String Indexing.

How Char By Char Traversal is possible?

When I use IEnumerator and call MoveNext(), will it traverse like
C-style 'a' 'p' 'p' 'l' 'e' '\0' until it finds the null character? I thought it would return the entire string. How does the enumeration work here?
string ar = "apple";
IEnumerator enu = ar.GetEnumerator();
while (enu.MoveNext())
{
Console.WriteLine(enu.Current);
}
I get output as
a
p
p
l
e

Strings are not null-terminated in C#. Or, rather, the fact that strings are null-terminated is an implementation detail that is hidden from the user. The string "apple" has five characters, not six. You ask to see those five characters, we show all of them to you. There is no sixth null character.

The null character is not an inherent part of a CLR / .NET string and hence will not show up in the enumeration. Enumerating a string returns the characters of the string in order.

An enumerator returns each element of the underlying container per iteration (MoveNext() call). In this case, your container is a string and its element type is char, so the enumerator will return a character per each iteration.
Also, the length of the string is known by the string type, which may be leveraged by the enumerator implementation to know when to terminate its traversal.

C# strings are stored like COM strings, a length field and a list of unicode chars. Therefore there's no need of a terminator. It uses a bit more memory (2 bytes more) but the strings themselves can hold nulls without any issues.
Another way to walk through a string, which uses the same functionality as your code but is more C#-like, is:
string s="...";
foreach(char c in s)
Console.WriteLine(c);
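The claim that strings can hold nulls is easy to confirm. In this sketch (C# 9 or later assumed; the sample text is arbitrary) the loop keeps going past an embedded '\0' because the length is stored with the string.

```csharp
using System;

string s = "ap\0ple";   // '\0' is an ordinary char inside a .NET string

int count = 0;
foreach (char c in s)
{
    count++;            // the loop does not stop at '\0'
}

Console.WriteLine($"{s.Length} {count}");
```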
