Encoding Persian string using C#

I am developing an SMS application in C# for a bank. It sends transaction alerts (for example, ATM transactions) to customers through an SMS gateway. The application works fine except for one issue: Persian text is not encoded correctly.
Here is the method that is supposed to encode Persian text as UTF-16.
public static string Endian2UTF(string s)
{
    Encoding ui = Encoding.BigEndianUnicode;
    Encoding u8 = Encoding.UTF8;
    string str = u8.GetString(ui.GetBytes(s));
    return str;
}
Some characters are not encoded correctly. The message on the mobile device looks like this:
مشترۯ뾽 ۯ뾽رامۯ뾽 شما به مقدار 500.00 افغانۯ뾽 از حساب خوب از ماشۯ뾽ن صرافۯ뾽 نماۯ뾽ندۯ뾽ۯ뾽 شهر نو بدست اوردۯ뾽د، تشۯ뾽ر از شما.
The issue is only with some characters as you see above.
For your information, there is no issue with the English string.

Finally, I found the issue. The library itself was encoding the text incorrectly, so I traced it with a debugger breakpoint and found the root cause: it was encoding the message as UTF-8. Once I changed it to BigEndianUnicode, it worked like a charm. Here is the code. Apply the change below in the SendSms method in the SMPPClient.cs file.
if (dataCoding == 8)
{
    //data = Encoding.UTF8.GetBytes(text);
    data = Encoding.BigEndianUnicode.GetBytes(text);
}
else
{
    data = Encoding.ASCII.GetBytes(text);
}
If the SMS is still sent as garbage, one more change is needed: comment out the part that converts the text with Endian2UTF in SMPPClient.cs.
if (dataCoding == 8)
{
    //text = Tools.Endian2UTF(text);
    maxLength = 70;
}
I hope this helps anyone who uses the EasySMPP library to send SMS messages to clients.
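To see why the fix works, here is a minimal sketch, not the EasySMPP source. It assumes that data_coding 8 means UCS2 (as in SMPP 3.4) and that the gateway expects big-endian bytes. The class and method names are hypothetical and exist only for illustration. BrokenEndian2Utf reproduces the original conversion so the two results can be compared.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of EasySMPP.
public static class SmsTextEncoder
{
    // Mirrors the if/else from the answer above: data_coding 8 is treated
    // as UCS2 (big-endian UTF-16 bytes), everything else as ASCII.
    public static byte[] Encode(string text, byte dataCoding)
    {
        return dataCoding == 8
            ? Encoding.BigEndianUnicode.GetBytes(text)
            : Encoding.ASCII.GetBytes(text);
    }

    // Reproduces the original bug: UTF-16BE bytes are reinterpreted as
    // UTF-8, which produces a different string for Persian text.
    public static string BrokenEndian2Utf(string s)
    {
        return Encoding.UTF8.GetString(Encoding.BigEndianUnicode.GetBytes(s));
    }
}
```

A .NET string is already UTF-16 in memory, so the only conversion needed is from string to bytes at the point where the PDU is built. Converting string to bytes and back to string under a different encoding, as Endian2UTF does, changes the characters themselves.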

Related

Best way to transform string to valid encoding in C#

Sorry in advance if this is a duplicate or a simple question; I couldn't find the answer.
I'm working with a DLL written in Delphi that sends data to a device. When the data is sent, some strings are rejected or written as blanks. The data sent to the device is stored in a txt file that was generated by a third-party program.
So I think the strings are in an unknown encoding. If I send them as UTF-8, the device receives all the records, but some strings arrive as ???? ????.
Many of my texts are in the Cyrillic alphabet.
What I did:
// string that send to device
[MarshalAsAttribute(UnmanagedType.LPStr, SizeConst = 36)]
public string Name;
When I did this, the device received only 10 out of 100 records.
If I convert through UTF-8:
byte[] bytes = Encoding.Default.GetBytes(getDvsName[1].ToString());
string res = Encoding.UTF8.GetString(bytes);
This way all the data arrived, but too many strings became ??? ????.
I also tried this:
static private string Win1251ToUTF8(string source)
{
    Encoding utf8 = Encoding.GetEncoding("utf-8");
    Encoding win1251 = Encoding.GetEncoding("windows-1251");
    byte[] utf8Bytes = win1251.GetBytes(source);
    byte[] win1251Bytes = Encoding.Convert(win1251, utf8, utf8Bytes);
    source = win1251.GetString(win1251Bytes);
    return source;
}
None of the above methods helped. How can I receive incoming information in the correct format? Are there other ways?

Here is what went wrong: you encoded the string with Encoding.Default instead of UTF-8. Use the same encoding in both directions:
string tom = "ටොම් හැන්ක්ස්";
byte[] bytes = Encoding.UTF8.GetBytes(tom);
string res = Encoding.UTF8.GetString(bytes);
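The same rule applies to the text file in the question: its raw bytes have to be decoded with the encoding they were written in. The sketch below assumes the file is windows-1251, which the question suggests but does not confirm. The class name is hypothetical. On .NET Core and .NET 5+ code page 1251 is only available after registering CodePagesEncodingProvider; on .NET Framework that registration is normally unnecessary.

```csharp
using System.Text;

// Hypothetical helper; assumes the source bytes are windows-1251.
public static class LegacyTextReader
{
    public static string DecodeWindows1251(byte[] raw)
    {
        // Makes legacy code pages such as 1251 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Decode the raw bytes once, with the encoding they were written in.
        return Encoding.GetEncoding(1251).GetString(raw);
    }
}
```

The bytes would typically come from File.ReadAllBytes, so that no implicit decoding happens before this call.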

How can I create correct line breaks in an SNS SMS message?

I'm using the AWS SDK for .NET to send SMS messages through the AWS SNS service. So far, so good, but when I use line breaks, a ? character appears in the SMS just before each line break. After that character, the line break is added as expected. Is there any way to get a line break without this ? character?
I have also tried the following:
StringBuilder.AppendLine,
"\\n",
"\\r\\n",
@"\n",
@"\r\n",
Environment.NewLine
And encoding the string into UTF-8.
Example which doesn't work:
// Create message string
var sb = new StringBuilder();
sb.AppendLine("Line1.");
sb.Append("Line2.\\n");
sb.AppendLine(Environment.NewLine);
sb.Append(@"Line4\n");
// Encode into UTF-8
var utf8 = UTF8Encoding.UTF8;
var stringBytes = Encoding.Default.GetBytes(sb.ToString());
var decodedString = utf8.GetString(stringBytes);
var message = decodedString;
// Create request
var publishRequest = new PublishRequest
{
PhoneNumber = "+491234567890",
Message = message,
Subject = "subject",
MessageAttributes = "Promotional"
};
// Send SMS
var response = await snsClient.PublishAsync("topic", message, "subject");

Simply remove all attempts to encode the string. .NET strings are Unicode, specifically UTF16 already. PublishAsync expects a .NET string, not UTF8 bytes.
As for why this error occurs, it's because the code converts the string into bytes using the local machine's codepage and then tries to read those bytes as if they were UTF8, which they aren't - using UTF8 as a system codepage is a beta feature on Windows 10 which breaks a lot of applications.
The newline character for SMS is \n. Environment.NewLine returns \r\n unless you use .NET Core on Linux. StringBuilder.AppendLine uses Environment.NewLine so you can't use it.
You shouldn't need anything more than String.Join to combine multiple lines into a single message:
var message=String.Join("\n",lines);
If you need to use a StringBuilder, use AppendFormat to append a line with the \n character at the end, eg :
builder.AppendFormat("{0}\n",line);
Update
I was able to send an SMS containing newlines with this code:
var region = Amazon.RegionEndpoint.EUWest1;
var snsClient = new AmazonSimpleNotificationServiceClient(region);
var sb = new StringBuilder()
.Append("Line1.\n")
.Append("Line2.\n")
.Append("Line4\n");
var message = sb.ToString();
// Create request
var publishRequest = new PublishRequest
{
PhoneNumber = phone,
Message = message,
};
// Send SMS
var response = await snsClient.PublishAsync(publishRequest);
The message I received contained :
Line1.
Line2.
Line4.
I decided to get fancy and changed the last line to :
.Append("Line4ΑΒΓ£§¶\n");
I received this text without problems too
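If the message text is built elsewhere and may already contain Windows line endings, one option is to normalize it just before publishing. This is a minimal sketch. The class and method names are made up for illustration and are not part of the AWS SDK.

```csharp
// Hypothetical helper; not part of the AWS SDK.
public static class SmsText
{
    // Converts "\r\n" and bare "\r" to "\n" so that no carriage return
    // reaches the SMS gateway.
    public static string NormalizeNewlines(string text)
    {
        return text.Replace("\r\n", "\n").Replace("\r", "\n");
    }
}
```

With this in place, the result of StringBuilder.AppendLine can still be used, for example Message = SmsText.NormalizeNewlines(sb.ToString()).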

Sending a string containing special characters through a TcpClient (byte[])

I'm trying to send a string containing special characters through a TcpClient (byte[]). Here's an example:
Client enters "amé" in a textbox
Client converts string to byte[] using a certain encoding (I've tried all the predefined ones plus some like "iso-8859-1")
Client sends byte[] through TCP
Server receives and outputs the string reconverted with the same encoding (to a listbox)
Edit :
I forgot to mention that the resulting string was "am?".
Edit-2 (as requested, here's some code):
@DJKRAZE here's a bit of code:
byte[] buffer = Encoding.ASCII.GetBytes("amé");
(TcpClient)server.Client.Send(buffer);
On the server side:
byte[] buffer = new byte[1024];
Client.Receive(buffer);
string message = Encoding.ASCII.GetString(buffer);
ListBox1.Items.Add(message);
The string that appears in the listbox is "am?"
=== Solution ===
Encoding encoding = Encoding.GetEncoding("iso-8859-1");
byte[] message = encoding.GetBytes("babé");
Update:
Simply using Encoding.UTF8.GetBytes("ééé"); works like a charm.

It's never too late to answer a question, I think; I hope someone will find answers here.
C# uses 16-bit chars, while ASCII covers only 7-bit code points, so Encoding.ASCII replaces every character outside that range with '?'. After some research, I found UTF-8 to be the best encoding for special characters.
//data to send via TCP or any stream/file
byte[] string_to_send = UTF8Encoding.UTF8.GetBytes("amé");
//when receiving, pass the received byte array to GetString to get the string back
string received_string = UTF8Encoding.UTF8.GetString(string_to_send);

Your problem appears to be the Encoding.ASCII.GetBytes("amé"); and Encoding.ASCII.GetString(buffer); calls, as hinted at by '500 - Internal Server Error' in his comments.
The é character is a multi-byte character which is encoded in UTF-8 with the byte sequence C3 A9. When you use the Encoding.ASCII class to encode and decode, the é character is converted to a question mark since it does not have a direct ASCII encoding. This is true of any character that has no direct coding in ASCII.
Change your code to use Encoding.UTF8.GetBytes() and Encoding.UTF8.GetString() and it should work for you.
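A minimal sketch of that change follows. The class and method names are hypothetical. Decode takes the number of bytes actually received, because decoding the whole 1024-byte buffer from the question would also turn the unused zero bytes into '\0' characters.

```csharp
using System.Text;

// Hypothetical helper for illustration.
public static class WireText
{
    // Sender side: string to UTF-8 bytes.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Receiver side: decode only the first 'count' bytes of the buffer.
    public static string Decode(byte[] buffer, int count)
    {
        return Encoding.UTF8.GetString(buffer, 0, count);
    }

    // Shows the original problem: ASCII has no code for 'é',
    // so the character is replaced with '?'.
    public static string AsciiRoundTrip(string text)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
    }
}
```

On the server, the count would be the return value of Socket.Receive, for example int n = Client.Receive(buffer); string message = WireText.Decode(buffer, n);. A real protocol also needs message framing, since one Receive call does not necessarily return one complete message.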

Your question and the error are not clear to me, but using a Base64 string may solve the problem.
Something like this:
static public string EncodeTo64(string toEncode)
{
    // UTF-8 rather than ASCII, so characters such as é survive the round trip
    byte[] toEncodeAsBytes
        = System.Text.Encoding.UTF8.GetBytes(toEncode);
    string returnValue
        = System.Convert.ToBase64String(toEncodeAsBytes);
    return returnValue;
}
static public string DecodeFrom64(string encodedData)
{
    byte[] encodedDataAsBytes
        = System.Convert.FromBase64String(encodedData);
    string returnValue =
        System.Text.Encoding.UTF8.GetString(encodedDataAsBytes);
    return returnValue;
}

Why do I get only numbers in UCS2, and how can I fix it with AT commands and C#?

I am having a problem reading my SMS messages through PuTTY. When I type AT+CMGL="ALL", the message text and the phone number come back as nothing but numbers. I read that my GSM modem (Nokia s10) uses UCS2, but I don't know what to do about it. How can I read my messages instead of just seeing numbers? Help, please.
I am also using this code from CodeProject. I changed the AT+CSCS line, but the result is the same as in PuTTY: just numbers in UCS2.
public ShortMessageCollection ReadSMS(SerialPort port, string p_strCommand)
{
    // Set up the phone and read the messages
    ShortMessageCollection messages = null;
    try
    {
        #region Execute Command
        // Check connection
        ExecCommand(port,"AT", 300, "No phone connected");
        // Use message format "Text mode"
        ExecCommand(port,"AT+CMGF=1", 300, "Failed to set message format.");
        // Use character set "UCS2" (this is the line I changed; it was "PCCP437")
        ExecCommand(port, "AT+CSCS=\"UCS2\"", 300, "Failed to set character set.");
        // Select SIM storage
        ExecCommand(port,"AT+CPMS=\"SM\"", 300, "Failed to select message storage.");
        // Read the messages
        string input = ExecCommand(port, p_strCommand, 5000, "Failed to read the messages.");
        #endregion
        #region Parse messages
        messages = ParseMessages(input);
        #endregion
    }
    catch (Exception ex)
    {
        throw ex;
    }
    if (messages != null)
        return messages;
    else
        return null;
}

Note that AT+CSCS only affects string parameters of commands and responses. In the case of AT+CMGL, the content of the message is not a string but a <data> format. See the 27.005 specification for more details on that format; it is a bit complicated (only pay attention to the first "In the case of SMS" part and ignore the second "In the case of CBS" part).
The short version is that for UCS-2 you will get the data hex encoded (e.g. the two characters '2' and 'A' represent one byte with value 0x2A, the ASCII/UTF-8 character '*'). So you should decode each group of four received characters as the hex encoding of the 16 bits in one UCS-2 character.
So decode the hex into a byte array and then convert that to a string; see Appleman1234's answer for the conversion (his answer does not address the core issue, namely the hex decoding).

To convert from the UCS-2 encoding store the result (input) in a byte array instead of a string and then call
System.Text.Encoding enc = Encoding.Unicode;
string myString = enc.GetString(myByteArray);
If the UCS-2 encoding is Big Endian then change System.Text.Encoding enc = Encoding.Unicode; to
System.Text.Encoding enc = Encoding.BigEndianUnicode;.
Related resources include:
Unicode and .NET
C# big-endian UCS-2
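Putting the two answers together, a minimal sketch of the decoding might look like this. It assumes the modem returns the UCS-2 data as big-endian hex digits with no separators, which should be checked against the actual modem output. The class name is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper; assumes big-endian UCS-2 delivered as hex digits.
public static class Ucs2Hex
{
    public static string Decode(string hex)
    {
        // Four hex digits make up one 16-bit UCS-2 character.
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Expected four hex digits per UCS-2 character.", nameof(hex));

        // Step 1: hex digits to raw bytes, two digits per byte.
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);

        // Step 2: raw big-endian UTF-16 bytes to a .NET string.
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

For example, Ucs2Hex.Decode("00480069") yields "Hi". With AT+CSCS="UCS2" set, the phone number field is a string parameter and is hex encoded in the same way, so the same helper can be tried on it.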

BlockingSenderDestination.sendReceive() UTF-8 issue

In my Blackberry application I am loading JSON using the following method.
private static Object loadJson(String uriStr){
    Object _json = null;
    Message response = null;
    BlockingSenderDestination bsd = null;
    try
    {
        bsd = (BlockingSenderDestination)
            DestinationFactory.getSenderDestination
                ("CommAPISample", URI.create(uriStr));
        if(bsd == null)
        {
            bsd =
                DestinationFactory.createBlockingSenderDestination
                    (new Context("CommAPISample"),
                     URI.create(uriStr), new JSONMessageProcessor()
                    );
        }
        response = bsd.sendReceive();
        _json = response.getObjectPayload();
    }
    catch(Exception e)
    {
        System.out.println(e.toString());
    }
    finally
    {
        if(bsd != null)
        {
            bsd.release();
        }
    }
    return _json;
}
This works fine. The problem is that when I fetch the JSON, Arabic characters show up as junk
(Ø§Ù„Ø±Ø¦ÙŠØ³ Ø§Ù„ØªÙ†Ù). I submitted this issue to the BlackBerry support forum:
Arabic shows corrupted in the JSON output
As per that discussion, I encoded the Arabic characters in \uxxxx format (in my server-side application) and it worked. But now I have to consume JSON from somebody else, where I can't change the server-side code.
They are using ASP.NET with C#; according to them, they send the data as follows.
JsonResult result = new JsonResult();
result.ContentEncoding = System.Text.Encoding.UTF8;
result.JsonRequestBehavior = JsonRequestBehavior.AllowGet;
result.Data = "Data Object (Contains Arabic) comes here";
return result;
So my question is: if the server provides the data in the above manner, can the BlockingSenderDestination.sendReceive method handle UTF-8 data? Or does it expect non-ASCII characters only in \uxxxx-escaped form? Or do I have to do something else (such as sending a header to the server) so that I can use the UTF-8 data directly?
In debug mode I checked the value of 'response'. It already shows junk characters.
Apart from JSON, I am able to handle Arabic everywhere else.
Yesterday I posted this issue in the BlackBerry forum, but there has been no reply so far.
I am new to BlackBerry and Java, so I am sorry if this is a silly question.
Thanks in advance.

What is the content type in the response? Is the server explicitly defining the UTF-8 character encoding in the HTTP header? e.g.:
Content-Type: text/json; charset=UTF-8
If the API is ignoring the charset in the HTTP content type, an easier way to do the String conversion is to determine whether the Message received is a ByteMessage or a StreamMessage, retrieve the message as a byte array, and then convert it to a string using the UTF-8 encoding,
i.e.:
Message msg = bsd.sendReceive();
byte[] msgBytes = null;
if (msg instanceof ByteMessage) {
    msgBytes = ((ByteMessage) msg).getBytePayload();
}
else { /* StreamMessage */
    // TODO read the bytes from the stream into a byte array
}
return new String(msgBytes,"UTF-8");

At last I found the solution myself.
The data sent by the server was in UTF-8, which uses two bytes for each Arabic character. BlockingSenderDestination.sendReceive() does not recognize that, so it creates one character for each byte. The solution was to take each character, get the byte from that character, and add it to a byte array, then create a string from that byte array with UTF-8 encoding.
If anyone knows how to make BlockingSenderDestination.sendReceive() handle UTF-8, please post here so that we can avoid this extra conversion method.
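The conversion described above can be sketched as follows. This is written in C#, like the other examples on this page, rather than BlackBerry Java, and it is not the poster's actual code. The class and method names are hypothetical. It assumes every character of the damaged string holds exactly one original byte (a value from 0 to 255).

```csharp
using System.Text;

// Hypothetical helper illustrating the byte-per-character repair.
public static class MojibakeRepair
{
    // Takes a string in which each char holds one raw UTF-8 byte and
    // decodes those bytes properly.
    public static string Repair(string misdecoded)
    {
        byte[] bytes = new byte[misdecoded.Length];
        for (int i = 0; i < misdecoded.Length; i++)
            bytes[i] = (byte)misdecoded[i]; // assumes each char is <= 0xFF
        return Encoding.UTF8.GetString(bytes);
    }

    // Reproduces the damage for demonstration: one char per UTF-8 byte.
    public static string SimulateByteToCharBug(string original)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(original);
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return new string(chars);
    }
}
```

This repair only works while the one-char-per-byte assumption holds. If the text passed through a different single-byte code page on the way, some bytes may already be lost or remapped.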
