Convert ReadProcessMemory output to string - c#

I am using ReadProcessMemory in C#, and the output is a byte[]. I want to convert this to a string. How can I do that? My code is below.
if (!ReadProcessMemory(appProcess.Handle, mbi.BaseAddress, buffer, mbi.RegionSize, ref nRead))
{
    int lastError = Marshal.GetLastWin32Error();
    if (lastError != 0)
        throw new ApplicationException(string.Format("ReadProcessMemory returned Win32 Error {0}", lastError));
}
I am using string szData = Encoding.UTF8.GetString(buffer); and I am getting the output below. How do I get a valid string?
�#y��Actx Actx �ȶ�+eMZ�Actx Actx Actx ��ؚ~���������������MZ�j��xIlj�u�z�uy�u͙�u�}�u:�u��If�՜��D$f��f�4$��5Q�G"��L[���T_�N�b�l"���aa1wa��[�ۖ+3�����⯚*�e%��m�v�a�����S�+ ��b�r��o���V�G�q�1)v��*��[k<�CP�C�FYYE^i>�o �R��敠{�u�B3�����w�/���E�{U-��v|5�馘���U1�7�ҡ��[�## P^�J�
S4����S�<���� ���cD$�$ډD$$���&,�}�34���e��_��U����V�,I�
R��}��=63S�L���M�z[�|�v�{Y^OZ�q<2�#u�c7��dzx����8�.��'h��Jsw���V�J�4)���˧JV#c�z�R��~i�
��c0g�r�|
e����e�t2�!. �+�X*m�#�U9�5�������������E
��q
�n�'s�Yi��
�������H�����vG�Z�O� �0d��C͕����{D %�#�C���Y�M_E
�6�;3�v��c��Ʌ1]�y}�ldu�����#t���A�h�9#�SVG���zfnuy�osKђ�N��q�OD$������E0�v�؃�������������sȶ1+e�����?�������5��h0MZ��D$��M�z�uB|�u�;�ulj�uy�u���'��H[���&���
BEGINTHM�y[������RESCDIRRESCSEG{��"~��������D-x�.MZ���.�z�uB|�u�;�uK�u�E�uy�u�&��__�5����DD�.9���WU����~~�z==G�dd��]]�2+�ss�������OOѣ��D""fT**~;���
����FF����)k���(<���y�^^�
���v���;d22Vt::N
�II�
H$$l�\���]���nC����bb�9���1������7�yy����2���Cn77Y�mm�������d�NN�I����ll��VV�������%�ee��zz�G���o����xx�J%%o..r8$W���s��Ǘ��Q���#���|�tt�>!�KK�a���
�������pp�|>>Bq����ff��HH�����aa�j55_�WW�i���������X:''������8���+���"3�ii����p���3���-���<"������ ���I�UU�P((x���z���Y��� ���
e������1�BB��hh��AA�)���Z--w{��˨TT�m���,:��cc��||��ww��{{
����kk��ooT���P00��gg}V++���b����M����vvE��ʝ��#��ɇ�}}����YYɎGG
����A��g����_���E���#���S����rr[����u������=��jL&&Zl66A~??���O���\h44�Q��4�������qqs���Sb11?*R���eF##^���(0�7��
�/�� 6$���=���&���iN''�����uu ���tX,,.4-6��nn�ZZ�[����RRMv;;a����}��{R))>���q^//�����SSh���_ValidateTexInfoatToResourceFormat��y��{��"~����{��"~����RESCSEG�\�Ѕȶ1+e����ȶ1+e�������?�������'��P��W��n��W������9$�?������MZ��L$V3��y�t�ы��;T$t��F�Ѓx�u���ID�����ts.r�.��-�������#.MxX� p���O�.rsrc��lp���
�r�aaI��dGS��pOBB�W.�6t��g����MZ�����u��u�v�u���u��u��u�&\w~��u���u���u���u��\w�\w�=�uA\w��u#��u��uئ�u�D�u���uZ�u;��uܔ�u

You are reading raw binary data from a process; it will be a string only by accident. If it is a string at all, it is very unlikely to be encoded in UTF8, a format you would normally see in files or in data sent across the Internet. The in-memory representation of strings in a Windows process is typically ASCII (ANSI) or UTF-16.
But start out by dumping this data in the same kind of format the debugger uses in the Debug + Windows + Memory 1 window. You can find the code to do so in this post.
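As a rough illustration of that kind of dump, here is a minimal sketch that prints an offset, the hex bytes, and the printable ASCII characters for each row. The class and method names (HexDump.Format) and the 16-bytes-per-row default are assumptions made for this example; this is not the code from the linked post.

```csharp
using System;
using System.Text;

public static class HexDump
{
    // Formats a buffer as rows of: offset, hex bytes, printable ASCII.
    public static string Format(byte[] data, int bytesPerLine = 16)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += bytesPerLine)
        {
            int count = Math.Min(bytesPerLine, data.Length - offset);
            sb.Append(offset.ToString("X8")).Append("  ");

            // Hex column, padded so the ASCII column lines up on a short last row.
            for (int i = 0; i < bytesPerLine; i++)
            {
                sb.Append(i < count ? data[offset + i].ToString("X2") + " " : "   ");
            }

            sb.Append(' ');

            // ASCII column: anything outside 0x20..0x7E is shown as a dot.
            for (int i = 0; i < count; i++)
            {
                byte b = data[offset + i];
                sb.Append(b >= 0x20 && b < 0x7F ? (char)b : '.');
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

Calling Console.Write(HexDump.Format(buffer)) on the region you read makes it much easier to spot runs of readable ASCII, or the letter-zero-letter-zero pattern that UTF-16 text leaves behind.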

It depends on the text encoding. For UTF8, you can do this:
string s = Encoding.UTF8.GetString(buffer);

You need to specify an encoding and then use that to construct your string.
Example:
byte [] dBytes = ...
string str;
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
str = enc.GetString(dBytes);
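To see why the choice of encoding matters, the sketch below decodes the same bytes two ways and only looks at the bytes that were actually read. The helper name DecodeBuffer.Decode and its parameters are hypothetical; bytesRead stands in for the count that ReadProcessMemory reports.

```csharp
using System;
using System.Text;

public static class DecodeBuffer
{
    // Decodes only the first bytesRead bytes of the buffer with the given encoding.
    public static string Decode(byte[] buffer, int bytesRead, Encoding encoding)
    {
        int count = Math.Min(bytesRead, buffer.Length);
        return encoding.GetString(buffer, 0, count);
    }
}
```

For example, the four bytes 48 00 69 00 decode to "Hi" with Encoding.Unicode (UTF-16 little-endian), but to "H", nul, "i", nul with Encoding.ASCII. Neither decoder can tell you which interpretation is right; that has to come from knowing what the target process stored at that address.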

Related

String path to base64string

I'm having a problem generating a Base64 string with a specific encoding.
I have an application that generates this Base64 string:
RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==
which is equal to
E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll
Now I'm trying to convert
E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll
to a Base64 string, but I'm getting this instead:
RTpcUHJvamVjdHNcWW91dHViZS5NYW5hZ2VyXFlvdXR1YmUuTWFuYWdlci5Nb2RlbHMuQ29udGFpbmVyXG9ialxSZWxlYXNlXG5ldHN0YW5kYXJkMi4wXFlvdXR1YmUuTWFuYWdlci5Nb2RlbHMuQ29udGFpbmVyLmRsbA==
I want to get the same result as the first Base64 string, which is
RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==
How can I do it?
This is my code, which gives me the wrong result:
var bytes= Encoding.ASCII.GetBytes(msg);
return Convert.ToBase64String(bytes);
The problem here is the text encoding you're using.
The first Base64 string you posted is encoded as UTF-16 little-endian (Encoding.Unicode in .NET) with a nul terminator byte pair. The trailing 'AAAAA==' is a dead giveaway here. You can see it yourself by examining the byte array:
var originalB64 = "RQA6AFwAUAByAG8AagBlAGMAdABzAFwAWQBvAHUAdAB1AGIAZQAuAE0AYQBuAGEAZwBlAHIAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIAXABvAGIAagBcAFIAZQBsAGUAYQBzAGUAXABuAGUAdABzAHQAYQBuAGQAYQByAGQAMgAuADAAXABZAG8AdQB0AHUAYgBlAC4ATQBhAG4AYQBnAGUAcgAuAE0AbwBkAGUAbABzAC4AQwBvAG4AdABhAGkAbgBlAHIALgBkAGwAbAAAAA==";
var bytes = Convert.FromBase64String(originalB64);
Decoding these bytes with Encoding.Unicode gives a nul-terminated string 125 characters long, with the last character being nul.
Given a path that is not nul-terminated, you can reproduce the original Base64 string as follows:
string path = @"E:\Projects\Youtube.Manager\Youtube.Manager.Models.Container\obj\Release\netstandard2.0\Youtube.Manager.Models.Container.dll";
string newB64 = Convert.ToBase64String(Encoding.Unicode.GetBytes(path + "\0"));
This matches the original Base64 string exactly in my tests.
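The difference is easy to see on a short input. The following sketch wraps the two encodings in small helpers; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class PathBase64
{
    // UTF-16 little-endian with a trailing nul, as in the first Base64 string.
    public static string EncodeUtf16WithNul(string text) =>
        Convert.ToBase64String(Encoding.Unicode.GetBytes(text + "\0"));

    // One byte per character, as in the code from the question.
    public static string EncodeAscii(string text) =>
        Convert.ToBase64String(Encoding.ASCII.GetBytes(text));

    // Reverses EncodeUtf16WithNul and strips the nul terminator.
    public static string DecodeUtf16WithNul(string base64) =>
        Encoding.Unicode.GetString(Convert.FromBase64String(base64)).TrimEnd('\0');
}
```

For the three characters E:\ the ASCII helper returns "RTpc" and the UTF-16 helper returns "RQA6AFwAAAA=", which are the same prefixes that the two long Base64 strings in the question start with.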

C# Encoding from utf-16 to ascii

I get question marks in the output of my program: ?????? ??????
string str = "Привет медвед";
Encoding srcEncodingFormat = Encoding.GetEncoding("utf-16");
Encoding dstEncodingFormat = Encoding.ASCII;
byte [] originalByteString = srcEncodingFormat.GetBytes(str);
byte [] convertedByteString = Encoding.Convert(srcEncodingFormat,
dstEncodingFormat, originalByteString);
string finalString = dstEncodingFormat.GetString(convertedByteString);
Console.WriteLine (finalString);
There is no such thing as plain text, only encoded text. .NET's char and string use Unicode/UTF-16, as you know, so you can simplify your code by calling GetBytes on the destination encoding and passing in the string directly, instead of encoding twice as your code does.
As for your question, ASCII has no Cyrillic characters, so you have a choice of a lossy conversion or no conversion at all. Below is code that prevents a lossy conversion: with an EncoderExceptionFallback, GetBytes throws an EncoderFallbackException instead of silently substituting question marks.
Now, how do you see the result of an encoding that does succeed? As with all text, it is a sequence of bytes. Your best bet is to write the bytes to a file and open the file in an editor that lets you specify the encoding and that can use a font supporting the characters you want to see.
string str = "Привет медвед";
Encoding dstEncodingFormat = Encoding.GetEncoding("US-ASCII",
new EncoderExceptionFallback(),
new DecoderReplacementFallback());
byte[] output = dstEncodingFormat.GetBytes(str);
File.WriteAllBytes("Test Привет медвед.txt", output);
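The two options can be put side by side in a short sketch. The helper names here are invented for the example; the encoding calls themselves are the standard System.Text ones.

```csharp
using System;
using System.Text;

public static class AsciiConversion
{
    // Lossy: Encoding.ASCII replaces every unrepresentable character with '?'.
    public static string Lossy(string text) =>
        Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));

    // Strict: returns false instead of substituting when a character does not fit.
    public static bool TryStrict(string text, out byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            bytes = strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

Lossy("Привет медвед") produces exactly the "?????? ??????" from the question, while TryStrict reports failure for the same input and succeeds for plain ASCII text such as "hello".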

Understand how to decode this base64 encoded string

I've handled base64 encoded images and strings and have been able to decode them using C# in the past.
I'm now trying on what looks to me like a base64 string, but the value I'm getting is about 98% accurate and I just don't understand what is affecting the output.
Here is the string:
http://pastebin.com/ntcth6uN
And this is the decoded value:
http://pastebin.com/Buh4xXDA
That IS what it should be, but you can clearly see where there are artifacts and the decoded value isn't quite right.
Any idea why it's failing?
var data = Convert.FromBase64String(Faces[i].InfoData);
Faces[i].InfoData = Encoding.UTF8.GetString(data);
Thanks for your help.
The string is not encoded as UTF8, but instead as another encoding. Thus the same encoding must be used to decode it.
Use the following to decode it:
Encoding.ASCII.GetString(data);
If ASCII isn't the correct encoding, here's some code that iterates through the available encodings and prints the first 200 characters decoded with each one, so you can pick out the original encoding by eye:
String encStr = "PFdhdGNoIG5hbWU9IlBpa2FjaHUDV2F0Y2DDQ2hyb25vIiBkZXNjcmlwdGlvbj0iIiBhdXRob3I9IkFsY2hlbWlzdFByaW1lIiB3ZWJfbGluaz0iaHR0cgov42ZhY2VyZXBv4mNvbS9hcHAvZmFjZXMvdXNlci9BbGNoZW1pc3RQcmltZS8xIiBiZ19jb2xvcj0iYWJhYmFiIiBpbmRfbG9jPSJ0YyIDaW5kX2JnPSJZIiBob3R3b3JkX2xvYz0idGMiIGhvdHdvcmRfYmc9IlkiPDoDICADPExheWVyIHR5cGU9ImltYWdlIiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0i4mltZzQ5OTIucHBuZyIDd2lkdGD9IjU1MCIDaGVpZ2h0PSI1NTAiIGNvbG9yPSJmZmZmZmYiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTciIHk9IjQ5IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2RofTp7ZG16fSIDdGVLdF9zaXplPSIzMiIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iOTAiIHk9IjU0IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie2Rzen0iIHRleHRfc2l6ZT0iMTDiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjEyNCIDeT0iNTQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMTAwIG9yIgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkYX0iIHRleHRfc2l6ZT0iMjAiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0LOSIDeT0iNTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7Ymx9IiB0ZXh0X3NpemU9IjE5IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJzaGFwZSIDeg0iMSIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJTcXVhcmUiIHdpZHRoPSIyNzDiIGhlaWdodg0iNgUiIGNvbG9yPSJhYmFiYWIiIGRpc3BsYXk9ImJkIi8
+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMCIDeT0iNgDiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0ie3N3cnN9PT0DMCBhbmQDMCBvciAxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9Intzd219Ontzd3N9Ontzd3Nzc30iIHRleHRfc2l6ZT0iMzYiIGZvbnQ9IkxgRFBIT05FIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9Ii0xMTDiIHk9Ijk2IiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0ie3N3cn0DYW5kICZhcG9zO1NUT1AmYXBvczsDb3IDJmFwb3M7U1RBUlQmYXBvczsiIHRleHRfc2l6ZT0iMTUiIGZvbnQ9IlJvYm90by1SZWd1bGFyIiB0cmFuc2Zvcm09ImLiIGNvbG9yX2RpbT0iMWQxZgFkIiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIDdGFwX2FjdGlvbj0ic3dfc3RhcnRfc3RvcCIvPDoDICADPExheWVyIHR5cGU9InRleHQiIHD9IjExOCIDeT0iOTYiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJSRVNFVCIDdGVLdF9zaXplPSIxNSIDZm9udg0iUm9ib3Rv4VJlZ3VsYXIiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19yZXNldCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTI0IiB5PSIxNgEiIGd5cm89IjAiIHJvdGF0aW9uPSItOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSItMTE3IiB5PSIxNgQiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IlNxdWFyZSIDd2lkdGD9IjEyMCIDaGVpZ2h0PSIxMjAiIGNvbG9yPSIyOWI5ZgMiIGRpc3BsYXk9ImJkIiB0YXBfYWN0aW9uPSJzd19zdGFydF9zdG9wIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjEyNCIDeT0iMTQxIiBneXJvPSIwIiByb3RhdGlvbj0iOTAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHNoYXBlPSJUcmlhbmdsZSIDd2lkdGD9IjM1IiBoZWlnaHQ9IjM1IiBjb2xvcj0iMWQxZgFkIiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9InNoYXBlIiBLPSIxMTciIHk9IjE0NCIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIwIiBhbGlnbm1lbnQ9ImNjIiBzaGFwZT0iU3F1YXJlIiB3aWR
0ag0iMTIwIiBoZWlnaHQ9IjEyMCIDY29sb3I9IjI5YjlkMyIDZGlzcGxheT0iYmQiIHRhcF9hY3Rpb2L9InN3X3Jlc2V0Ii8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0i4TDyIiB5PSItNgMiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiB0ZXh0PSJ7d3RkfSIDdGVLdF9zaXplPSIyNSIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0iaW1hZ2VfY29uZCIDeg0i4TD5IiB5PSItOTIiIGd5cm89IjAiIHJvdGF0aW9uPSIwIiBza2V3X3D9IjAiIHNrZXdfeT0iMCIDb3BhY2l0eT0iMTAwIiBhbGlnbm1lbnQ9ImNjIiBwYXRoPSJ3ZWF0aGVyX3NldF8z4nBwbmciIHdpZHRoPSI3MCIDaGVpZ2h0PSI3MCIDY29sb3I9IjQ1NgU0NSIDY29uZF92YWx1ZT0iJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgFkJmFwb3M7IGFuZCAxIG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzAyZCZhcG9zOyBhbmQDMiBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczswM2QmYXBvczsDYW5kIgMDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MgRkJmFwb3M7IGFuZCA0IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzA5ZCZhcG9zOyBhbmQDNSBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczsxMGQmYXBvczsDYW5kIgYDb3IDJmFwb3M7e3djaX0mYXBvczsDPT0DJmFwb3M7MTFkJmFwb3M7IGFuZCA3IG9yICZhcG9zO3t3Y2l9JmFwb3M7Ig09ICZhcG9zOzEzZCZhcG9zOyBhbmQDOCBvciAmYXBvczt7d2NpfSZhcG9zOyA9PSAmYXBvczs1MGQmYXBvczsDYW5kIgkDb3IDMSIDY29uZF9ncmlkPSIzegMiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0idGVLdCIDeg0iMTEzIiB5PSIzIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iTW9vbiIDdGVLdF9zaXplPSIxOCIDZm9udg0iTENEUEhPTkUiIHRyYW5zZm9ybT0ibiIDY29sb3JfZGltPSIxZgFkMWQiIGNvbG9yPSIxZgFkMWQiIGRpc3BsYXk9ImJkIi8+CiADICA8TGF5ZXIDdHlwZT0ic2hhcGUiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDc2hhcGU9IkNpcmNsZSIDd2lkdGD9IjYwIiBoZWlnaHQ9IjYwIiBjb2xvcj0iNgU0NTQ1IiBkaXNwbGF5PSJiZCIvPDoDICADPExheWVyIHR5cGU9ImltYWdlX2NvbmQiIHD9IjExNCIDeT0i4TUyIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDcGF0ag0ibW9vbl9zZXRfMy5wcG5nIiB3aWR0ag0iNjAiIGhlaWdodg0iNjAiIGNvbG9
yPSJhYmFiYWIiIGNvbmRfdmFsdWU9Int3bXB9IiBjb25kX2dyaWQ9IjNLMyIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIwIiB5PSIwIiBneXJvPSIwIiByb3RhdGlvbj0iMCIDc2tld19LPSIwIiBza2V3X3k9IjAiIG9wYWNpdHk9IjEwMCIDYWxpZ25tZW50PSJjYyIDdGVLdg0iIiB0ZXh0X3NpemU9IjQwIiBmb250PSJSb2JvdG8tUmVndWxhciIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09ImZmZmZmZiIDY29sb3I9ImZmZmZmZiIDZGlzcGxheT0iYmQi4zLKICADIgxMYXllciB0eXBlPSJ0ZXh0IiBLPSIxMCIDeT0i4TExNiIDZ3lybz0iMCIDcm90YXRpb2L9IjAiIHNrZXdfeg0iMCIDc2tld195PSIwIiBvcGFjaXR5PSIxMgAiIGFsaWdubWVudg0iY2MiIHRleHQ9IntkZHd9IHtkbm5ufSB7ZGR9IiB0ZXh0X3NpemU9IjE2IiBmb250PSJMQ0RQSE9ORSIDdHJhbnNmb3JtPSJuIiBjb2xvcl9kaW09IjFkMWQxZCIDY29sb3I9IjFkMWQxZCIDZGlzcGxheT0iYmQi4zLKPC9XYXRjagLK";
var data = Convert.FromBase64String(encStr);
// Iterate over all encodings, and decode
foreach (EncodingInfo ei in Encoding.GetEncodings())
{
Encoding e = ei.GetEncoding();
Console.Write("{0,-15} - {1}{2}", ei.CodePage, e.EncodingName, System.Environment.NewLine);
Console.WriteLine(e.GetString(data).Substring(0, 200));
Console.Write(System.Environment.NewLine + "---------------------" + System.Environment.NewLine);
}
Fiddle: https://dotnetfiddle.net/LcV5s8

Notepad++ .NET plugin - get current buffer text -- encoding issues

I have a .NET plugin which needs to get the text of the current buffer. I found this page, which shows a way to do it:
public static string GetDocumentText(IntPtr curScintilla)
{
int length = (int)Win32.SendMessage(curScintilla, SciMsg.SCI_GETLENGTH, 0, 0) + 1;
StringBuilder sb = new StringBuilder(length);
Win32.SendMessage(curScintilla, SciMsg.SCI_GETTEXT, length, sb);
return sb.ToString();
}
And that's fine, until we reach the character encoding issues. I have a buffer that is set in the Encoding menu to "UTF-8 without BOM", and I write that text to a file:
System.IO.File.WriteAllText(@"C:\Users\davet\BBBBBB.txt", sb.ToString());
When I open that file (in Notepad++) the encoding menu shows UTF-8 without BOM, but the ß character is broken (it shows as ÃŸ).
I was able to get as far as finding the encoding for my current buffer:
int currentBuffer = (int)Win32.SendMessage(PluginBase.nppData._nppHandle, NppMsg.NPPM_GETCURRENTBUFFERID, 0, 0);
Console.WriteLine("currentBuffer: " + currentBuffer);
int encoding = (int) Win32.SendMessage(PluginBase.nppData._nppHandle, NppMsg.NPPM_GETBUFFERENCODING, currentBuffer, 0);
Console.WriteLine("encoding = " + encoding);
And that shows "4" for "UTF-8 without BOM" and "0" for "ASCII", but I cannot find what notepad++ or Scintilla thinks those values are supposed to represent.
So I'm a bit lost for where to go next (Windows not being my natural habitat). Anyone know what I'm getting wrong, or how to debug it further?
Thanks.
Removing the StringBuilder fixes this problem.
public static string GetDocumentTextBytes(IntPtr curScintilla) {
    int length = (int) Win32.SendMessage(curScintilla, SciMsg.SCI_GETLENGTH, 0, 0) + 1;
    byte[] sb = new byte[length];
    unsafe {
        fixed (byte* p = sb) {
            IntPtr ptr = (IntPtr) p;
            Win32.SendMessage(curScintilla, SciMsg.SCI_GETTEXT, length, ptr);
        }
        return System.Text.Encoding.UTF8.GetString(sb).TrimEnd('\0');
    }
}
Alternative approach:
The reason for the broken UTF-8 characters is that this line..
Win32.SendMessage(curScintilla, SciMsg.SCI_GETTEXT, length, sb);
..reads the string using [MarshalAs(UnmanagedType.LPStr)], which uses your computer's default ANSI encoding when decoding strings (MSDN). This means you get a string with one character per byte, which breaks for multi-byte UTF-8 characters.
Now, to save the original UTF-8 bytes to disk, you simply need to use the same default ANSI encoding when writing the file:
File.WriteAllText(@"C:\Users\davet\BBBBBB.txt", sb.ToString(), Encoding.Default);
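The effect can be reproduced without Notepad++ at all. In the sketch below, ISO-8859-1 (Latin-1) stands in for the machine's ANSI code page; that substitution is an assumption made so the example behaves the same on any machine, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Latin-1 maps every byte 0x00-0xFF to exactly one char, so it is used here
    // in place of the default ANSI code page (an assumption for illustration).
    private static readonly Encoding SingleByte = Encoding.GetEncoding("iso-8859-1");

    // What happens when UTF-8 bytes are decoded one char per byte.
    public static string MisdecodeUtf8(byte[] utf8Bytes) =>
        SingleByte.GetString(utf8Bytes);

    // Re-encoding with the same single-byte encoding restores the original
    // UTF-8 bytes, which can then be decoded correctly.
    public static string Repair(string misdecoded) =>
        Encoding.UTF8.GetString(SingleByte.GetBytes(misdecoded));
}
```

"Straße" is six characters but seven UTF-8 bytes, so the misdecoded string has seven characters. Passing it through Repair gives "Straße" back, which is the same reason writing the file with the matching single-byte encoding puts the original UTF-8 bytes on disk.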

Bit Array to String and back to Bit Array

Possible duplicate: Converting byte array to string and back again in C#
I am using Huffman Coding for compression and decompression of some text from here
The code in there builds a huffman tree to use it for encoding and decoding. Everything works fine when I use the code directly.
For my situation, I need to get the compressed content, store it, and decompress it whenever needed.
The output from the encoder and the input to the decoder are both BitArray.
When I tried to convert this BitArray to a String and back to a BitArray and decode it using the following code, I got a weird answer.
string input = Console.ReadLine();
Tree huffmanTree = new Tree();
huffmanTree.Build(input);
BitArray encoded = huffmanTree.Encode(input);
// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();
// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);
// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);
// Convert string back to bytes
e = Encoding.UTF8.GetBytes(output);
// Convert bytes back to bit array
BitArray todecode = new BitArray(e);
string decoded = huffmanTree.Decode(todecode);
Console.WriteLine("Decoded: " + decoded);
Console.ReadLine();
The Output of Original code from the tutorial is:
The Output of My Code is:
Where am I wrong friends? Help me, Thanks in advance.
You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.
string output = Encoding.UTF8.GetString(e);
At this point e is just binary garbage, not a UTF8 string, so calling UTF8 methods on it does not make sense.
Solution: Don't convert to a string and back. This does not round-trip. Why are you doing that in the first place? If you need a string, use a round-trippable format like base-64 or base-85.
I'm pretty sure Encoding doesn't round-trip: you can't decode an arbitrary sequence of bytes to a string, then use the same Encoding to get bytes back and always expect them to be the same.
If you want to be able to round-trip from your raw bytes to a string and back to the same raw bytes, you'd need to use base64 encoding, e.g.
http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx
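A minimal sketch of the base64 route for a BitArray follows. The class name, the method names, and the "bitCount:base64" text layout are all assumptions made for this example; the bit count is stored because packing into bytes pads the last byte, and the decoder would otherwise see extra bits.

```csharp
using System;
using System.Collections;

public static class BitArrayText
{
    // Packs the bits into bytes and encodes them as "bitCount:base64".
    public static string ToBase64(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bits.Length + ":" + Convert.ToBase64String(bytes);
    }

    // Reverses ToBase64 and trims the padding bits of the last byte.
    public static BitArray FromBase64(string text)
    {
        int sep = text.IndexOf(':');
        int bitCount = int.Parse(text.Substring(0, sep));
        var bits = new BitArray(Convert.FromBase64String(text.Substring(sep + 1)));
        bits.Length = bitCount;
        return bits;
    }
}
```

With this, string stored = BitArrayText.ToBase64(encoded); can be saved anywhere text is accepted, and BitArrayText.FromBase64(stored) yields a BitArray with the same length and the same bits to pass to the decoder. By contrast, pushing bytes such as FF 00 80 through Encoding.UTF8.GetString and back produces a different, longer byte array, because invalid sequences are replaced with U+FFFD.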
