Read a file whose name contains Hangul from the console [duplicate] - c#

I am trying to read a Unicode string from the console in C#. For the sake of example, let's use this one:
c:\SVN\D³ebugger\src\виталик\Program.cs
At first I just tried Console.ReadLine(), which returned c:\SVN\D3ebugger\src\???????\Program.cs
I then set Console.InputEncoding to UTF-8 with Console.InputEncoding = Encoding.UTF8, but that returned c:\SVN\D³ebugger\src\???????\Program.cs, mangling the Cyrillic part of the string.
Stumbling around, I then tried Console.InputEncoding = Encoding.GetEncoding(1251); which returned c:\SVN\D?ebugger\src\виталик\Program.cs, this time corrupting the ³ character.
At this point it seems that by switching encodings for the InputStream I can only get a single language at a time.
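The "one language at a time" effect can be reproduced without the console at all. The following is a minimal sketch, assuming .NET Framework (on .NET Core and later, code pages 1251 and 1252 would first need the code-pages encoding provider registered); the class and method names are made up for illustration. It round-trips the sample path through two single-byte code pages and through UTF-16, using a replacement fallback so that unmappable characters become '?':

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Encodes the text in the given code page and decodes it again.
    // Characters the code page cannot represent are replaced with '?'.
    static string RoundTrip(string text, int codePage)
    {
        Encoding enc = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ReplacementFallback,
            DecoderFallback.ReplacementFallback);
        return enc.GetString(enc.GetBytes(text));
    }

    static void Main()
    {
        const string path = @"c:\SVN\D³ebugger\src\виталик\Program.cs";
        Console.OutputEncoding = Encoding.UTF8;

        // Windows-1251 has the Cyrillic letters but no superscript three.
        Console.WriteLine(RoundTrip(path, 1251));
        // Windows-1252 has the superscript three but no Cyrillic letters.
        Console.WriteLine(RoundTrip(path, 1252));
        // UTF-16 can represent every character in the path.
        Console.WriteLine(Encoding.Unicode.GetString(Encoding.Unicode.GetBytes(path)));
    }
}
```

Any single-byte code page can hold at most 256 characters, so no choice of legacy InputEncoding covers both scripts; only a Unicode encoding does.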
I've also tried going native and doing something like this:
// Code
public static string ReadLine()
{
const uint nNumberOfCharsToRead = 1024;
StringBuilder buffer = new StringBuilder();
uint charsRead = 0;
bool result = ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, nNumberOfCharsToRead, out charsRead, (IntPtr)0);
// Return the input minus the newline character
if (result && charsRead > 1) return buffer.ToString(0, (int)charsRead - 1);
return string.Empty;
}
// Extern definitions
[DllImport("Kernel32.DLL", ExactSpelling = true)]
internal static extern IntPtr GetStdHandle(int nStdHandle);
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
static extern bool ReadConsoleW(IntPtr hConsoleInput, [Out] StringBuilder lpBuffer,
uint nNumberOfCharsToRead, out uint lpNumberOfCharsRead, IntPtr lpReserved);
That worked fine for non-Unicode strings; however, when I tried to read my sample string, the application crashed. I told Visual Studio to break on ALL exceptions (including native ones), yet the application still crashed.
I also found this open bug on Microsoft Connect that seems to say it is currently impossible to read Unicode from the console's input stream.
It is worth noting, even though not strictly related to my question, that Console.WriteLine is able to print this string just fine, if Console.OutputEncoding is set to UTF8.
Thank you!
Update 1
I am looking for a solution for .NET 3.5
Update 2
Updated with the full native code I've used.

This seems to work fine when targeting the .NET 4 client profile, but unfortunately not when targeting the .NET 3.5 client profile. Make sure you change the console font to Lucida Console.
As pointed out by @jcl, this works when targeting .NET 4 only because I have .NET 4.5 installed.
class Program
{
private static void Main(string[] args)
{
Console.InputEncoding = Encoding.Unicode;
Console.OutputEncoding = Encoding.Unicode;
while (true)
{
string s = Console.ReadLine();
if (!string.IsNullOrEmpty(s))
{
Debug.WriteLine(s);
Console.WriteLine(s);
}
}
}
}

Here's one fully working version in .NET 3.5 Client:
class Program
{
[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr GetStdHandle(int nStdHandle);
[DllImport("kernel32.dll")]
static extern bool ReadConsoleW(IntPtr hConsoleInput, [Out] byte[]
lpBuffer, uint nNumberOfCharsToRead, out uint lpNumberOfCharsRead,
IntPtr lpReserved);
public static IntPtr GetWin32InputHandle()
{
const int STD_INPUT_HANDLE = -10;
IntPtr inHandle = GetStdHandle(STD_INPUT_HANDLE);
return inHandle;
}
public static string ReadLine()
{
const int bufferSize = 1024; // capacity in UTF-16 characters
// ReadConsoleW counts characters, not bytes, so reserve 2 bytes per character
var buffer = new byte[bufferSize * 2];
uint charsRead = 0;
ReadConsoleW(GetWin32InputHandle(), buffer, bufferSize, out charsRead, (IntPtr)0);
// -2 to remove the trailing \r\n; clamp at zero in case nothing was read
int nc = Math.Max((int)charsRead - 2, 0) * 2;
// The buffer holds UTF-16LE, which Encoding.Unicode decodes directly
return Encoding.Unicode.GetString(buffer, 0, nc);
}
static void Main(string[] args)
{
Console.OutputEncoding = Encoding.UTF8;
Console.Write("Input: ");
var st = ReadLine();
Console.WriteLine("Output: {0}", st);
}
}
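As a variation on the answer above, the buffer can be declared as a char[] so that the marshaller hands back UTF-16 characters and no byte-level decoding is needed. This is only a sketch under a few assumptions: it is Windows-only, it expects standard input to be a real console (ReadConsoleW fails on a redirected handle), and the class name UnicodeConsole is invented for the example.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

static class UnicodeConsole
{
    const int STD_INPUT_HANDLE = -10;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetStdHandle(int nStdHandle);

    // With CharSet.Unicode a char[] is passed as a WCHAR buffer.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    static extern bool ReadConsoleW(IntPtr hConsoleInput, [Out] char[] lpBuffer,
        uint nNumberOfCharsToRead, out uint lpNumberOfCharsRead, IntPtr lpReserved);

    public static string ReadLine()
    {
        var buffer = new char[1024];
        uint charsRead;
        // The length argument is a character count, which matches buffer.Length here.
        if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer,
                          (uint)buffer.Length, out charsRead, IntPtr.Zero))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        // Drop the trailing carriage return / line feed, if present.
        return new string(buffer, 0, (int)charsRead).TrimEnd('\r', '\n');
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.Write("Input: ");
        Console.WriteLine("Output: {0}", ReadLine());
    }
}
```

Checking the return value also makes failures visible, instead of silently returning whatever happens to be in the buffer.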

Related

Why can't an IntPtr be used in a subsequent call?

My program:
class Program {
[DllImport("libiconvD.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr libiconv_open([MarshalAs(UnmanagedType.LPStr)]
string tocode,
[MarshalAs(UnmanagedType.LPStr)]
string fromcode);
[DllImport("libiconvD.dll", CallingConvention = CallingConvention.Cdecl)]
static extern ulong libiconv(IntPtr icd,
ref StringBuilder inbuf, ref ulong inbytesleft,
out StringBuilder outbuf, out ulong outbytesleft);
[DllImport("libiconvD.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int libiconv_close(IntPtr icd);
static void Main(string[] args) {
var inbuf = new StringBuilder("Rule(s): Global Tag – Refer to Print Rules – General Requirements");
ulong inbytes = (ulong)inbuf.Length;
ulong outbytes = inbytes;
StringBuilder outbuf = new StringBuilder((int)outbytes);
IntPtr icd = libiconv_open("utf8", "windows-1252");
var rcode1 = libiconv(icd, ref inbuf, ref inbytes, out outbuf, out outbytes);
Debug.WriteLine(rcode1);
var rcode2 = libiconv_close(icd);
Debug.WriteLine(rcode2);
}//Main()
}//Program CLASS
The first call, libiconv_open(), works and returns a pointer (icd).
When the second call, libiconv(), runs, it gets an access violation on the icd pointer.
Here is the C code being called:
size_t iconv (iconv_t icd,
ICONV_CONST char* * inbuf, size_t *inbytesleft,
char* * outbuf, size_t *outbytesleft)
{
conv_t cd = (conv_t) icd;
if (inbuf == NULL || *inbuf == NULL)
return cd->lfuncs.loop_reset(icd,outbuf,outbytesleft);
else
return cd->lfuncs.loop_convert(icd,
(const char* *)inbuf,inbytesleft,
outbuf,outbytesleft);
}
It seems it can't access the functions defined in the structure that the pointer points to. Is there something special that has to be done to a returned pointer to make it usable in subsequent calls?
Thanks
It turns out that the libiconv library is unnecessary in C#. Just use the Encoding class.
static void Main(string[] args) {
UTF8Encoding utf8 = new UTF8Encoding();
Encoding w1252 = Encoding.GetEncoding(1252);
string inbuf = "Rule(s): Global Tag – Refer to Print Rules – General Requirements";
byte[] bytearray = utf8.GetBytes(inbuf);
byte[] outbytes = Encoding.Convert(utf8, w1252, bytearray);
Debug.WriteLine("*************************");
Debug.WriteLine(String.Format(" Input: {0}", inbuf));
// outbytes are Windows-1252 bytes, so decode them with w1252 rather than utf8
Debug.WriteLine(String.Format(" Output: {0}", w1252.GetString(outbytes)));
Debug.WriteLine("*************************");
}//Main()
*************************
Input: Rule(s): Global Tag – Refer to Print Rules – General Requirements
Output: Rule(s): Global Tag – Refer to Print Rules – General Requirements
*************************

What's the appropriate method for marshalling an array of strings?

I'm having an issue where I get a different memory layout when debugging with ReSharper.
I have an unmanaged method that returns an array of (at most) 7-character, null-terminated strings. When executing this method without ReSharper's debugger, the start of the "next" string is 16 bytes later. When executing it with ReSharper's debugger (via ReSharper's Unit Test form choosing the "Debug Unit Tests" option), the start is 64 bytes later.
The method signature is similar to the snippet below. The string array is then "created" similar to the solution here.
[return: MarshalAs(UnmanagedType.I1)]
[DllImport("myDll.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
private static extern bool GetStrings(IntPtr sourceFile,
out IntPtr ptrToStrings,
out uint numberOfStrings);
Try using this to obtain the strings:
[return: MarshalAs(UnmanagedType.I1)]
[DllImport("myDll.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
private static unsafe extern bool GetStrings(IntPtr sourceFile,
[Out] out byte* ptrToStrings,
[Out] out uint numberOfStrings);
[SecuritySafeCritical]
private static unsafe string[] ManagedMethod(IntPtr sourceFile)
{
uint size;
byte* array;
if (!GetStrings(sourceFile, out array, out size))
{
throw new Exception("Unable to read strings.");
}
string[] retval = new string[size];
for (int i = 0, p = 0; i < size; i++, p += 8)
{
retval[i] = Marshal.PtrToStringAnsi(new IntPtr(&array[p]));
}
return retval;
}
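If the spacing between strings really does differ between runs, hard-coding 8 will not hold up. A minimal sketch of the same idea without unsafe code, taking the stride as a parameter, is shown below. The names FixedStrideStrings and Read are hypothetical, and the native GetStrings call is replaced by a block of unmanaged memory filled in by the test code, so that the example runs on its own.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class FixedStrideStrings
{
    // Reads `count` null-terminated ANSI strings that start every `stride` bytes.
    public static string[] Read(IntPtr basePtr, int count, int stride)
    {
        var result = new string[count];
        for (int i = 0; i < count; i++)
        {
            // IntPtr.Add needs .NET 4; on 3.5 use new IntPtr(basePtr.ToInt64() + i * stride).
            result[i] = Marshal.PtrToStringAnsi(IntPtr.Add(basePtr, i * stride));
        }
        return result;
    }

    static void Main()
    {
        const int stride = 8; // 7 characters plus the terminating null
        string[] samples = { "alpha", "beta", "gamma" };

        // Stand-in for the buffer the unmanaged method would return.
        IntPtr block = Marshal.AllocHGlobal(samples.Length * stride);
        try
        {
            for (int i = 0; i < samples.Length; i++)
            {
                byte[] bytes = Encoding.ASCII.GetBytes(samples[i] + "\0");
                Marshal.Copy(bytes, 0, IntPtr.Add(block, i * stride), bytes.Length);
            }

            foreach (string s in Read(block, samples.Length, stride))
                Console.WriteLine(s);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

Whether the stride is 8, 16 or 64 depends on how the native library lays out its array, so it is worth confirming that on the native side before settling on a value.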

Using pinvoke in c# to call sprintf and friends on 64-bit

I am having an interesting problem with using pinvoke in C# to call _snwprintf. It works for integer types, but not for floating point numbers.
This is on 64-bit Windows, it works fine on 32-bit.
My code is below, please keep in mind that this is a contrived example to show the behavior I am seeing.
class Program
{
[DllImport("msvcrt.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
private static extern int _snwprintf([MarshalAs(UnmanagedType.LPWStr)] StringBuilder str, IntPtr length, String format, int p);
[DllImport("msvcrt.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
private static extern int _snwprintf([MarshalAs(UnmanagedType.LPWStr)] StringBuilder str, IntPtr length, String format, double p);
static void Main(string[] args)
{
Double d = 1.0f;
Int32 i = 1;
Object o = (object)d;
StringBuilder str = new StringBuilder(32);
_snwprintf(str, (IntPtr)str.Capacity, "%10.1lf", (Double)o);
Console.WriteLine(str.ToString());
o = (object)i;
_snwprintf(str, (IntPtr)str.Capacity, "%10d", (Int32)o);
Console.WriteLine(str.ToString());
Console.ReadKey();
}
}
The output of this program is
0.0
1
It should print 1.0 on the first line and not 0.0, and so far I am stumped.
I'm not exactly sure why your calls do not work, but the secure versions of these methods do work properly on both x86 and x64.
The following code does work, as expected:
class Program
{
[DllImport("msvcrt.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
private static extern int _snwprintf_s([MarshalAs(UnmanagedType.LPWStr)] StringBuilder str, IntPtr bufferSize, IntPtr length, String format, int p);
[DllImport("msvcrt.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
private static extern int _snwprintf_s([MarshalAs(UnmanagedType.LPWStr)] StringBuilder str, IntPtr bufferSize, IntPtr length, String format, double p);
static void Main(string[] args)
{
// Preallocate this to a given length
StringBuilder str = new StringBuilder(100);
double d = 1.4;
int i = 7;
float s = 1.1f;
// No need for box/unbox
_snwprintf_s(str, (IntPtr)100, (IntPtr)32, "%10.1lf", d);
Console.WriteLine(str.ToString());
_snwprintf_s(str, (IntPtr)100, (IntPtr)32, "%10.1f", s);
Console.WriteLine(str.ToString());
_snwprintf_s(str, (IntPtr)100, (IntPtr)32, "%10d", i);
Console.WriteLine(str.ToString());
Console.ReadKey();
}
}
It is possible with the undocumented __arglist keyword:
using System;
using System.Text;
using System.Runtime.InteropServices;
class Program {
[DllImport("msvcrt.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
private static extern int _snwprintf(StringBuilder str, int length, String format, __arglist);
static void Main(string[] args) {
Double d = 1.0f;
Int32 i = 1;
String s = "nobugz";
StringBuilder str = new StringBuilder(666);
_snwprintf(str, str.Capacity, "%10.1lf %d %s", __arglist(d, i, s));
Console.WriteLine(str.ToString());
Console.ReadKey();
}
}
Please don't use that.
uint is 32 bits. The length parameter of snprintf is size_t, which is 64 bits in 64 bit processes. Change the second parameter to IntPtr, which is the closest .NET equivalent of size_t.
In addition, you need to preallocate your StringBuilder. Currently you have a buffer overrun.
Try MarshalAs(UnmanagedType.R8) on the last parameter of the second function (R8 is an 8-byte floating-point value, i.e. a double).
Take a look at these two articles:
http://www.codeproject.com/Messages/2840231/Alternative-using-MSVCRT-sprintf.aspx (this is actually note to CodeProject article)
http://bartdesmet.net/blogs/bart/archive/2006/09/28/4473.aspx

How to call NetUserModalsGet() from C#.NET?

EDIT: followup at NetUserModalsGet() returns strings incorrectly for C#.NET
I'm struggling with the DLL declarations for this function:
NET_API_STATUS NetUserModalsGet(
__in LPCWSTR servername,
__in DWORD level,
__out LPBYTE *bufptr
);
(Reference: http://msdn.microsoft.com/en-us/library/aa370656%28VS.85%29.aspx)
I tried this:
private string BArrayToString(byte[] myArray)
{
string retVal = "";
if (myArray == null)
retVal = "Null";
else
{
foreach (byte myByte in myArray)
{
retVal += myByte.ToString("X2");
}
}
return retVal;
}
...
[DllImport("netapi32.dll")]
public static extern int NetUserModalsGet(
string servername,
int level,
out byte[] bufptr
);
[DllImport("netapi32.dll")]
public static extern int NetApiBufferFree(
byte[] bufptr
);
...
int retVal;
byte[] myBuf;
retVal = NetUserModalsGet("\\\\" + tbHost.Text, 0, out myBuf);
myResults.Text += String.Format("retVal={0}\nBuffer={1}\n", retVal, BArrayToString(myBuf));
retVal = NetApiBufferFree(myBuf);
I get a return value of 1231 (Network Location cannot be reached) no matter if I use an IP address or a NetBIOS name of a machine that's undoubtedly online, or even my own. On edit: this happens even if I don't put a "\\" in front of the hostname.
I'm doing things wrong, I know, and let's not even get started on how to declare that blasted return buffer (which can have a number of different lengths, ewww).
From pinvoke.net (they also have some sample code on how to use it):
[DllImport("netapi32.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.StdCall, SetLastError = true)]
static extern uint NetUserModalsGet(
string server,
int level,
out IntPtr BufPtr);
It would be better to use System.Text.Encoding.ASCII.GetBytes(somestring_param) and System.Text.Encoding.ASCII.GetString(byte_array) instead of your own routine BArrayToString.
Hope this helps.
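Putting the IntPtr declaration to use might look like the sketch below. It assumes information level 0, for which the buffer holds a USER_MODALS_INFO_0 structure of five DWORD fields, and it frees the buffer with NetApiBufferFree declared on an IntPtr rather than a byte[]. The class name PasswordPolicy and the command-line handling are invented for the example. Note the CharSet.Unicode on the import: the server name is an LPCWSTR, and marshalling it as ANSI is a plausible cause of the 1231 error in the question.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class PasswordPolicy
{
    // Managed mirror of USER_MODALS_INFO_0 (five DWORD fields).
    [StructLayout(LayoutKind.Sequential)]
    struct USER_MODALS_INFO_0
    {
        public uint usrmod0_min_passwd_len;
        public uint usrmod0_max_passwd_age;
        public uint usrmod0_min_passwd_age;
        public uint usrmod0_force_logoff;
        public uint usrmod0_password_hist_len;
    }

    [DllImport("netapi32.dll", CharSet = CharSet.Unicode)]
    static extern uint NetUserModalsGet(string server, int level, out IntPtr bufPtr);

    [DllImport("netapi32.dll")]
    static extern uint NetApiBufferFree(IntPtr buffer);

    static void Main(string[] args)
    {
        // A null server name means the local computer.
        string server = args.Length > 0 ? @"\\" + args[0] : null;

        IntPtr buf;
        uint status = NetUserModalsGet(server, 0, out buf);
        if (status != 0)
            throw new Win32Exception((int)status);

        try
        {
            // Copy the unmanaged structure into a managed one.
            var info = (USER_MODALS_INFO_0)Marshal.PtrToStructure(buf, typeof(USER_MODALS_INFO_0));
            Console.WriteLine("Minimum password length: {0}", info.usrmod0_min_passwd_len);
            Console.WriteLine("Password history length: {0}", info.usrmod0_password_hist_len);
        }
        finally
        {
            // The API allocates the buffer; the caller must release it.
            NetApiBufferFree(buf);
        }
    }
}
```

Other information levels return different structures, so the struct definition has to match the level passed in.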
