Okay, I have this big .NET project which uses multiple databases with a lot of already-written requests. The databases all use WE8DEC as the character set; until now, all the data was Latin and there was no problem.
But I now have the task of using a new database, again in WE8DEC, but this one stores Russian data written in Cyrillic. A tool like DBeaver shows data like ÇÎËÎÒÀ�Å instead of the actual Cyrillic text.
I know I can retrieve the raw bytes directly from the database using the DUMP function and then convert them:
WORD | DUMP(WORD)
ÇÎËÎÒÀ�Å | Typ=1 Len=9: 199,206,203,206,210,192,208,197,194
But I don't feel like duplicating/altering all my requests and the way I retrieve the results in C#. I have a place, just before sending the data as JSON, where I could simply re-encode all the strings before sending them.
So I was looking for a way to retrieve the bytes just like in Oracle, and found one using this line of code:
byte[] bytes = Encoding.GetEncoding("Windows-1252").GetBytes(word);
But my main problem is this: I can't find an exact equivalent of Oracle's WE8DEC encoding in .NET. Windows-1252 is the closest I found, but it is still incorrect.
So the question: is there an exact equivalent of WE8DEC, also called MCS, in C#?
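For illustration, here is the full round-trip I have in mind. The Windows-1251 second step is just a guess on my part to show the shape of the conversion (the dumped bytes look like they could be Windows-1251 Cyrillic); the first step is exactly where the WE8DEC question comes in:

using System.Text;

// On .NET Core / .NET 5+ the legacy code pages must be registered first
// (requires the System.Text.Encoding.CodePages package):
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Step 1: recover the raw bytes that the driver mis-decoded.
// Windows-1252 only approximates WE8DEC, which is the weak spot.
byte[] bytes = Encoding.GetEncoding("Windows-1252").GetBytes(word);

// Step 2: decode the bytes with the encoding the data was really written in;
// Windows-1251 is an assumption for the Cyrillic database.
string fixedWord = Encoding.GetEncoding("windows-1251").GetString(bytes);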
I've found nothing on Google or SO that quite lines up with my issue.
In SQL Server, I have a scalar function (we'll call it dbo.MySqlStringFunction).
What this function does is call a utility written in C# that requests an ASP.NET view and returns the HTML as a SqlString.
The function definition in SQL Server is:
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [Utils.UserDefinedFunctions].[MySqlStringFunction]
The C# code, simplified, is:
var request = (HttpWebRequest)WebRequest.Create("http://www.mydomain.com");
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
{
    using (var streamReader = new StreamReader(stream, Encoding.UTF8))
    {
        return new SqlString(streamReader.ReadToEnd());
    }
}
When I put the C# code into a console app and run it, I get everything exactly as it should be.
When I access the URL directly in my browser, it displays exactly as it should be.
When I do SELECT dbo.MySqlStringFunction() however, characters such as ™, §, ¤ display as 2 or 3 question marks each.
It appears that something is going wonky somewhere between the return new SqlString(..) and the SQL function returning the value. But I'm at a loss as to what it could be.
It seems that the issue was the location of the return. The current code (shown in the question) returns from inside 3 nested using blocks, one of which is the UTF-8 stream being read. This probably confused things, as SQLCLR runs in memory isolated from the main SQL Server memory, and you usually can't return via a still-open stream. It is best to read everything first, close the open stream by letting the using blocks call Dispose(), and only then return. Hence:
Create a string above the first using (i.e. string _TempReturn = String.Empty;)
Inside the inner-most using, replace return with: _TempReturn = streamReader.ReadToEnd();
Below the last using closing bracket, add: return new SqlString(_TempReturn);
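Putting those three steps together, the body of the method would look something like this (same placeholder URL as in the question):

string _TempReturn = String.Empty;

var request = (HttpWebRequest)WebRequest.Create("http://www.mydomain.com");
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
{
    using (var streamReader = new StreamReader(stream, Encoding.UTF8))
    {
        // Read everything while the stream is still open...
        _TempReturn = streamReader.ReadToEnd();
    }
}

// ...but only return after the using blocks have called Dispose().
return new SqlString(_TempReturn);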
(old answer, will remove in the near future)
The problem is with the encoding difference between the web page and SQL Server. You are using Encoding.UTF8 for the web page (quite likely correct, given that UTF-8 is the most common encoding on the interwebs), but SQL Server (along with .NET and Windows in general) is UTF-16 Little Endian. This is why you are getting 2 or 3 ?s for each character above Code Point 127: UTF-8 is a multi-byte encoding that uses 1 to 4 bytes per character, whereas UTF-16 is always 2 bytes (well, supplementary characters are 4 bytes, but that is due to being a pair of 2-byte values).
You need to convert the encoding to UTF-16 Little Endian before, or as, you pass back the string. In .NET, UTF-16 Little Endian is the Unicode encoding, while BigEndianUnicode refers to UTF-16 Big Endian. So you want to convert to the Unicode encoding.
OR, it could be the reverse: that the web page is NOT UTF-8, in which case you have declared it incorrectly in the StreamReader. If this is true, then you need to specify the correct encoding in the StreamReader constructor.
I want to first convert some old files to a human-readable format. The Delphi code that writes them is the following:
if OpenFileWriteRA(MyF, dd+'invoice.mfs', SizeOf(TFSerialDocEx)) then
and then calling
ReadFile(MyF, vSS1, SizeOf(TFSerialDocEx), nr1, nil);
So I am looking for a way to convert these files with a small program. I want to write it in C#, as I am more familiar with C# than with Delphi. The .MFS file is written in binary, so what would I need to convert it to text/string? I tried a simple binary conversion but it was not right; it seems the SizeOf of the record passed as a parameter is the big thing here, right?
Broadly speaking, there are three approaches that I would consider:
1. Transform data with Delphi code
Since you already have Delphi code to read the data, and the structures defined, it will be simplest and quickest to transform the data with Delphi code. Simply read it using your existing code and then output it in human-readable form, for instance using the built-in JSON libraries.
2. Define an equivalent formatted C# structure and blit the binary data onto that structure
Define a formatted structure in C# that has a binary layout identical to the structure put to disk. This will use LayoutKind.Sequential and perhaps specify Pack = 1 if the Delphi structure is packed. You may need to use the MarshalAs attribute on some members to achieve binary equivalence. Then read the structure from disk into a byte array. Pin this array, and use Marshal.PtrToStructure on the pinned object address to deserialize. Once you have the data, you can write it out however you please.
An example can be found here: Proper struct layout from delphi packed record
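A minimal sketch of the blit approach; the fields of TFSerialDocEx below are hypothetical stand-ins and must be replaced with the real fields from the Delphi declaration, in the same order:

using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical layout -- copy the real field list from the Delphi record.
// Pack = 1 matches a Delphi "packed record"; drop it if the record is aligned.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct TFSerialDocEx
{
    public int SerialNo;       // Delphi Integer
    public double Amount;      // Delphi Double
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public byte[] Name;        // Delphi string[20]: length byte + 20 ANSI chars
}

static class MfsReader
{
    public static TFSerialDocEx ReadRecord(BinaryReader reader)
    {
        byte[] buffer = reader.ReadBytes(Marshal.SizeOf<TFSerialDocEx>());
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // Reinterpret the pinned bytes as the structure.
            return Marshal.PtrToStructure<TFSerialDocEx>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}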
3. Read the structure field by field with a binary reader
Rather than declaring a binary compatible structure you can use a BinaryReader to read from a stream one field at a time. Method calls like Read, ReadInt32, ReadDouble, etc. let you work your way through the record. Remember that the fields will have been written in the order in which the Delphi record was declared. If the original record is aligned rather than packed you will need to step over any padding. Again, once you have the data available to your C# code you can write it as you please.
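A sketch of the field-by-field variant, again with hypothetical fields standing in for the real TFSerialDocEx declaration:

using System;
using System.IO;
using System.Text;

using (var stream = File.OpenRead("invoice.mfs"))
using (var reader = new BinaryReader(stream))
{
    while (stream.Position < stream.Length)
    {
        // Field order must match the Delphi record declaration exactly.
        int serialNo = reader.ReadInt32();       // hypothetical Integer field
        double amount = reader.ReadDouble();     // hypothetical Double field
        byte nameLen = reader.ReadByte();        // Delphi string[20]: length prefix...
        byte[] nameBytes = reader.ReadBytes(20); // ...followed by 20 ANSI chars
        string name = Encoding.Default.GetString(nameBytes, 0, nameLen);

        Console.WriteLine($"{serialNo};{amount};{name}");
    }
}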
I am working on a system that needs to read a binary file containing certain Persian names/stock instruments. I need to convert the binary data into strings to be used in further processing. I have googled it and haven't really found a solution to my problem. Has anyone here worked in such a scenario, or does anyone know how to tackle such a problem?
Here is the code that I am using to convert the bytes to a string (simple as it may be):
byte[] data = binaryReader.ReadBytes(amountOfData);
string symbolRead = Encoding.ASCII.GetString(data);
FYI, I have tried changing my system locale to Persian and that hasn't helped either, although it does allow me to view already-written Persian text.
Hoping to find a solution.
Thanks.
Don't use ASCII as the encoding. First try Encoding.Default after setting your locale; then try asking someone directly which encoding is most commonly used for Persian, and use that one.
Determine which encoding is used in your file and use the corresponding encoding instead of Encoding.ASCII.GetString(...). Possible values could be Encoding.UTF8.GetString(...) or Encoding.Default.GetString(...) to use your system encoding. See the documentation of the Encoding class for other possibilities.
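For example, if the file turns out to use the Windows Arabic code page (Windows-1256 is a common legacy choice for Persian text, but that is an assumption to verify against your actual file):

using System.Text;

// Legacy code pages must be registered on .NET Core / .NET 5+:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = binaryReader.ReadBytes(amountOfData);
string symbolRead = Encoding.GetEncoding(1256).GetString(data);

// Or, if the file is actually UTF-8:
// string symbolRead = Encoding.UTF8.GetString(data);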
Now I'm not sure if this is something I'm doing wrong, or something that's happening in DynamoDB.
Basically, I'm building a simple registration/login system for my project, saving the user data/password in a DynamoDB instance, with the password hashed using RIPEMD160 and salted using C#'s RNGCryptoServiceProvider().
Registration seems to work perfectly fine. The issue is that upon login, no matter what, the passwords don't match up, and I think it's because I'm getting some funky characters back when pulling the hash/salt from DynamoDB. First off, both the hash and the salt are byte arrays of length 20, and they are converted to strings before being saved in the database.
These examples are copy/pasted from the DynamoDB web interface:
Example Hash: ">�Bb.ŧ�E���d��Ʀ"
Example Salt: "`���!�!�Hb�m�}e�"
When they're coming back and I debug into the function that pulls back the data from dynamo, both strings have different characters (VS2010 Debugger):
Returned Hash: "\u001B>�Bb.ŧ�E��\u0003�d�\u001C�Ʀ"
Returned Salt: "`���!\u000B�!�Hb�\u001Dm\u0012�\u0001}e�"
It seems these \u001B, \u000B, \u001D, \u0012, \u0003, \u001C, and \u0001 characters are sneaking into the returned data, and I'm not entirely sure what's going on.
You shouldn't be trying to convert opaque binary data into a string in this way in the first place. They're not text so don't treat them that way. You're just begging to lose information that way.
Use Convert.ToBase64String(data) instead of Encoding.GetString before putting the data into the database. When you get it out again, use Convert.FromBase64String to retrieve the original binary data.
Alternatively, don't store the data in a text field to start with - use a database field type which is meant to store binary data...
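A minimal sketch of that round trip, using the same 20-byte arrays as in the question:

using System;
using System.Security.Cryptography;

// Registration: generate the salt and store both hash and salt as Base64 text.
byte[] salt = new byte[20];
using (var rng = new RNGCryptoServiceProvider())
{
    rng.GetBytes(salt);
}
string saltForDb = Convert.ToBase64String(salt);   // safe to store in a string field

// Login: decode the stored text back into the exact original bytes.
byte[] saltFromDb = Convert.FromBase64String(saltForDb);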
I'm working on a web app that needs to take a list of files on a query string (specifically a GET and not a POST), something like:
http://site.com/app?things=/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
I want to shorten that string:
http://site.com/app?things=somekindofencoding
The string isn't terribly long, varies from 20-150 chars. Something that short isn't really suitable for GZip, but it does have an awful lot of repetition so compression should be possible.
I don't want a DB or Dictionary of strings - the URL will be built by a different application to the one that consumes it. I want a reversible compression that shortens this URL. It doesn't need to be secure.
Is there an existing way to do this? I'm working in C#/.Net but would be happy to adapt an algorithm from some other language/stack.
If you can express the data in BNF, you could construct a parser for it. Instead of sending the data, you could send the AST, where each node is identified by one character (or several, if you have a lot of different node types). In your example we could have:
files : file files
      |
file  : path id
path  : /items/thing
      | /files/item
      | /stuff/things/item
You could then represent a list of files as path[id1,id2,...,idn], using 0, 1, 2 for the paths. With the input being:
/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
/files/item1,/files/item46,/files/item7
you'd then end up with ?things=2[123,456,789]1[1,46,7],
where /stuff/things/item is represented by 2 and /files/item by 1. Each number within [...] is an id, so 2[123] would expand to /stuff/things/item123.
EDIT: The approach does not have to be static. If you have to discover the repeated items dynamically, you can use the same approach and pass along the map between identifier and token. In that case the above example would be:
?things=2[123,456,789]1[1,46,7]&tokens=2=/stuff/things/,1=/files/item
which, if the grammar is this simple, would of course do better as:
?things=/stuff/things/[123,456,789]/files/item[1,46,7]
Compressing the repeated part to less than the size of the unique values is possible with such a short string, but it will most likely have to be based on constraining the possible values, or you risk actually increasing the size when "compressing".
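A rough sketch of the token-substitution idea in C#; the prefix-to-code map here is an assumption and would have to be shared by (or passed between) the two applications:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ThingsCodec
{
    // Assumed shared map; with the dynamic variant this travels in &tokens=.
    static readonly Dictionary<string, char> Codes = new Dictionary<string, char>
    {
        ["/stuff/things/item"] = '2',
        ["/files/item"] = '1',
    };

    // "/stuff/things/item123,/files/item1" -> "2[123]1[1]"
    public static string Compress(string things)
    {
        var groups = things.Split(',')
            .Select(path =>
            {
                string prefix = Codes.Keys.First(path.StartsWith);
                return (code: Codes[prefix], id: path.Substring(prefix.Length));
            })
            .GroupBy(t => t.code);

        return string.Concat(groups.Select(g =>
            $"{g.Key}[{string.Join(",", g.Select(t => t.id))}]"));
    }

    // "2[123]1[1]" -> "/stuff/things/item123", "/files/item1"
    public static IEnumerable<string> Expand(string encoded)
    {
        var reverse = Codes.ToDictionary(kv => kv.Value, kv => kv.Key);
        foreach (Match m in Regex.Matches(encoded, @"(.)\[([^\]]*)\]"))
        {
            string prefix = reverse[m.Groups[1].Value[0]];
            foreach (string id in m.Groups[2].Value.Split(','))
                yield return prefix + id;
        }
    }
}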
You can try zlib using raw deflate (no zlib or gzip headers and trailers). It will generally provide some compression even on short strings that are composed of printable characters, and it does look for and take advantage of repeated strings. I haven't tried it, but you could also see if smaz works for your data.
I would recommend obtaining a large set of real-life example URLs to use for benchmark testing of possible compression approaches.
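A sketch of the deflate route: DeflateStream in the BCL emits raw deflate (no zlib/gzip header), and a URL-safe Base64 variant keeps the result usable in a query string. Note that on strings this short the gain may be modest:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static string CompressForUrl(string things)
{
    byte[] input = Encoding.UTF8.GetBytes(things);
    using (var output = new MemoryStream())
    {
        // Raw deflate: DeflateStream writes no zlib or gzip header/trailer.
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal))
            deflate.Write(input, 0, input.Length);

        // URL-safe Base64 so the bytes survive in a query string.
        return Convert.ToBase64String(output.ToArray())
            .Replace('+', '-').Replace('/', '_').TrimEnd('=');
    }
}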