Do you know of a C# library or a dictionary that could help me convert Hiragana to Kanji?
I know about the Windows IME, but I would like to fully customize the design of the Kanji candidate list for a given Hiragana input, and that is not possible with this IME.
Example: the user types "toru", which is first converted to Hiragana: "とる"
I would like to have this list of choice:
撮る
取る
盗る
Thanks!
Unfortunately, I do not know of a C# library. Everything I found involves importing native libraries, as in this SO thread: Japanese to Romaji with Kakasi
If you are willing to do so, perhaps JWPce might help.
Although it is implemented as a Japanese text editor, it also contains a dictionary function (in fact, a multitude of character lookup systems) that does what you want.
Possibly you can compile the project and then reuse that lookup functionality? JWPce is licensed under the GPL, and both a binary executable and the source code can be downloaded directly from the homepage.
[Edit]
Researching some more I stumbled over mozc at Google Code:
Mozc is a Japanese Input Method Editor (IME) designed for
multi-platform such as Chromium OS, Windows, Mac and Linux. This
open-source project originates from Google Japanese Input.
(BSD license)
I have not looked into it myself yet, but it might be closer to what you are looking for, as it does not have a full application "around it" but is instead intended to be used as a library. Just like you wanted.
They also link to a short video showing what the input looks like: http://www.google.co.jp/ime/
Unfortunately, this is still C++, not .NET, but it might be a starting point.
Microsoft publishes this as a separate product, called Visual Studio International Pack
http://visualstudiogallery.msdn.microsoft.com/74609641-70BD-4A18-8550-97441850A7A8
I do not know a C# library either. But given that a dictionary might be sufficient, you may want to look into using the IME dictionary that comes with Anthy.
If you download the sources of the most recent version, you'll find dictionary sources in the mkworddic and alt-cannadic directories. Look at the various files ending in .t.
Note that they are encoded in EUC-JP; you might want to convert them to UTF-8.
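To give an idea of what using such dictionary files could look like, here is a minimal sketch in Python (easy to port to C#). It assumes each entry line has the shape "reading #tag word [word ...]"; I have not checked that against every .t file, so treat the layout as an assumption and adjust the parsing to what you actually find. The sample entries are made up for illustration; only the EUC-JP decoding comes from the note above.

```python
from collections import defaultdict


def load_candidates(raw_bytes):
    """Build a reading -> [candidates] map from EUC-JP dictionary bytes.

    Assumed (not verified) line layout: reading, a '#'-prefixed
    part-of-speech tag, then one or more candidate words, all
    separated by whitespace.
    """
    table = defaultdict(list)
    text = raw_bytes.decode("euc_jp")  # the source files are EUC-JP
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].startswith("#"):
            continue  # skip comments, blanks, and lines in another shape
        reading, words = fields[0], fields[2:]
        for word in words:
            if word not in table[reading]:
                table[reading].append(word)
    return dict(table)


# Made-up sample entries in the assumed layout, encoded like the real files.
sample = "とる #R5r 撮る\nとる #R5r 取る 盗る\n".encode("euc_jp")
candidates = load_candidates(sample)
print(candidates["とる"])
```

Once the text is decoded, writing it back out with `text.encode("utf-8")` gives the UTF-8 version, and the resulting map can feed whatever custom candidate list you want to draw.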
Back in the late eighties I seem to remember using a Unix utility named 'banner' (see http://en.wikipedia.org/wiki/Banner_(Unix)). It basically took a string of text and 'rendered' it as a larger text 'banner', using each character as the blocks that form the enlarged version of that character. It was usually used at the start of print runs to create a heading for multi-sheet reports.
Does anyone know of a C# library that reproduces this functionality?
I remember a similar thing on the VAX mainframes at college. It was useful with a roomful of people banging in Pascal programs on VT100 terminals, and only one matrix printer behind closed doors in the hallowed office where the IT staff worked.
Anyway, this is a C# ASCII art creator - if you could find a way of rendering your text to a graphic file you could then convert it.
You could probably invoke a free software C program that does the same, perhaps figlet.
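If the goal is only the classic effect, the technique itself is small enough to write by hand. Below is a rough sketch in Python (straightforward to port to C#) with a hypothetical two-letter font; the glyph table and the `banner` function are my own illustration, not taken from the Unix tool or from figlet.

```python
# Tiny 5-row font covering only the letters used below; a real tool
# would ship glyphs for the full character set.
FONT = {
    "H": ["X   X", "X   X", "XXXXX", "X   X", "X   X"],
    "I": ["XXXXX", "  X  ", "  X  ", "  X  ", "XXXXX"],
}


def banner(text):
    """Return text as block letters, each drawn with its own character.

    Raises KeyError for characters missing from FONT.
    """
    rows = []
    for row in range(5):
        parts = []
        for ch in text.upper():
            pattern = FONT[ch][row]
            parts.append(pattern.replace("X", ch))  # draw 'H' out of H's, etc.
        rows.append("  ".join(parts).rstrip())
    return "\n".join(rows)


print(banner("hi"))
```

Extending it is mostly a matter of filling in the font table for the characters you need.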
Just for my personal interest, I see from my research that it's not that easy to start your own OCR. However, I would like to hear ideas on how to achieve the challenge of not just recognising characters, but also giving back the results in the formatted string.
For example, I have an image of a table (imagine that it's an image with "|" and "_" being drawn straight lines):
|Number, AnotherNumber|Some Text|
|1,4 |Blah |
And after using a silent OCR, I get the result as "|Number, AnotherNumber|SomeText|\n|1,4|Blah|"
Any ideas on how I could achieve this, and which available tools/libraries I could make use of? I would also like to write this in C# with Visual Studio 2010, ideally working with PDFs, but other image formats are fine. I've already looked at some libraries, but they seem incompatible as they use C++ or C.
Thank you.
Alina.
Getting OCR libraries is quite hard (at least if you don't want to pay for one).
You could try this one. It's not free, but it works if you have Office 2007:
http://www.codeproject.com/Articles/41709/How-To-Use-Office-2007-OCR-Using-C
Quick question: How can I access the BN_CLICKED constant and other constants defined for the Win32 API from .NET? Are they defined in some library? Do I have to define them myself? If so, where can I find these values? And are the values version-specific between versions of Windows?
I find the PInvoke Interop Assistant to be really helpful:
http://blogs.microsoft.co.il/blogs/sasha/archive/2008/01/12/p-invoke-signature-generator.aspx
It has almost everything and can convert the C++ to C#/VB for you. I rarely, if ever, resort to searching google/pinvoke.net anymore.
Here's the MSDN Magazine Article: http://msdn.microsoft.com/en-us/magazine/cc164193.aspx
The original January 2008 MSDN Magazine Article is now only available as a .CHM help file download, linked from the very bottom of https://msdn.microsoft.com/magazine/msdn-magazine-issues. (Column "CLR Inside Out: Marshaling between managed and unmanaged code.")
And here's the download: http://download.microsoft.com/download/f/2/7/f279e71e-efb0-4155-873d-5554a0608523/CLRInsideOut2008_01.exe. The source code can be found at http://clrinterop.codeplex.com/.
You could download the Microsoft Platform SDK and take a look at the header files (*.h). E.g. the BN_CLICKED is defined in the winuser.h file.
Usually, if you just need one or two constants, a Google search and a look at the first few results is also sufficient, since the value is printed there.
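If you do have the headers on disk, pulling values out of them can be scripted. The following is only a sketch, in Python for brevity: it handles plain `#define NAME value` lines whose value is a decimal or hex literal or another identifier, and nothing more (no casts, expressions, octal literals, or trailing comments). The sample text is illustrative and written in the shape of winuser.h entries, so check the values against your own SDK copy.

```python
import re

# Matches "#define NAME value" where value is a decimal or hex literal,
# or another identifier (an alias to follow).
DEFINE = re.compile(
    r"^\s*#\s*define\s+(\w+)\s+(0[xX][0-9A-Fa-f]+|\d+|[A-Za-z_]\w*)\s*$"
)


def parse_defines(header_text):
    """Collect simple object-like #define lines into a name -> token map."""
    defines = {}
    for line in header_text.splitlines():
        match = DEFINE.match(line)
        if match:
            defines[match.group(1)] = match.group(2)
    return defines


def resolve(name, defines, depth=0):
    """Follow aliases until a numeric literal is reached."""
    if depth > 10:
        raise ValueError("alias chain too long: " + name)
    token = defines[name]
    if token[0].isdigit():
        return int(token, 0)  # base 0 accepts both "5" and "0x0111"
    return resolve(token, defines, depth + 1)


# Illustrative sample only; a real script would read the header file instead.
SAMPLE = """
#define WM_COMMAND 0x0111
#define BN_CLICKED 0
#define BN_DOUBLECLICKED 5
#define BN_DBLCLK BN_DOUBLECLICKED
"""

defines = parse_defines(SAMPLE)
for name in ("BN_CLICKED", "BN_DBLCLK", "WM_COMMAND"):
    print(name, resolve(name, defines))
```

The resolved numbers can then be copied into your own constants class on the .NET side.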
http://pinvoke.net/ is an excellent resource for many common P/Invoke definitions.
The MagNumDB website (by SO user Simon Mourier) is an easy way to look up constants:
http://www.magnumdb.com/search?q=BN_CLICKED
It's kind of a proof-of-concept, but I put together a script that can look up most any Windows API constant. Example usage:
PS > .\Get-WindowsSDKConstant.ps1 BN_CLICKED
0
PS > .\Get-WindowsSDKConstant.ps1 BN_DBLCLK
5
PS > .\Get-WindowsSDKConstant.ps1 WM_COMMAND
273
It requires you to download Visual Studio and the Windows 10 SDK, because behind-the-scenes it compiles a program that looks up the constant.
Finally, here are some answers to the asker's questions:
Are [the constants] defined in some library?
The authoritative source is the Windows Platform SDK.
Do I have to define them myself?
They're not built-in to Windows or .NET, which means you'll probably define them yourself (or copy them from somewhere).
And are the values version-specific between versions of Windows?
They're very stable, because otherwise a program compiled for one version of Windows might stop working when a user upgrades to a newer version of Windows. Microsoft goes to great lengths to prevent this from happening.
However, I've seen at least one place where the constants are different depending on what platform/architecture you're compiling on. I wouldn't assume that just because your code works on x86 64-bit Windows, it'll work on ARM 32-bit Windows RT, for example.
I'm maintaining a WinForms application that was not written using any development patterns conducive to localization, at least for the classes in the project that are not directly associated with forms or with the forms' code-behind partials.
Thus, there is MessageBox() code with English text in it in almost every code file. I'd like to find a tool which will "scrape" those strings from the code, insert them into a resource file, and replace each literal with a call to the resource, keeping the original string in a comment.
Does such a tool exist?
See ReSharper 5 Internationalization Features
When ReSharper finds a localizable
string, it helps you move it to a
resource file with only a couple of
clicks. You can optionally search for
identical strings and refactor them to
use the new resource item.
Hope it helps.
The Visual Localizer project focuses on this very issue and is free.
ReSharper can do it. There is a ReSharper plugin called RGreatEx that has a lot of localization refactorings for strings. I'm guessing you are looking for something free though, and both of these cost money. RGreatEx also hasn't seen updates in more than two years.
EDIT: Did some more searching, and found this tool on CodePlex. It doesn't support pulling strings into resources, but it does have side-by-side editing of multiple resource files to ease writing string translations.
ReSharper is of course the best in this field, as it can scan all strings in your code base and let you know which can be placed into resources.
But if you intend to use a free tool, Microsoft does have an open source one here,
http://resourcerefactoring.codeplex.com/
You have to manually scan all files using this tool as it is not as smart as ReSharper.
You can also use T4 templates to do this kind of thing. They're built into Visual Studio as of 2005, I think.
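If none of the tools fit, the "scrape" step from the question can be roughed out with a script. This is only a sketch, in Python: it handles plain double-quoted literals passed as the first argument to MessageBox.Show, and the `Resources.MsgNNN` naming is a hypothetical convention of mine, not something any of the tools above generate. Verbatim strings, concatenations, and literals containing `*/` would need extra handling.

```python
import re

# Matches the first plain double-quoted literal passed to MessageBox.Show(...).
# Verbatim (@"...") and interpolated ($"...") strings are not handled.
CALL = re.compile(r'MessageBox\.Show\(\s*"((?:[^"\\]|\\.)*)"')


def extract(source, prefix="Msg"):
    """Return (rewritten_source, resources) for one C# source string."""
    resources = {}  # key -> literal text
    keys = {}       # literal text -> key, so duplicates share one entry

    def swap(match):
        literal = match.group(1)
        if literal not in keys:
            keys[literal] = "%s%03d" % (prefix, len(keys) + 1)
            resources[keys[literal]] = literal
        # Keep the original text in a block comment, as the question asks.
        return 'MessageBox.Show(Resources.%s /* "%s" */' % (keys[literal], literal)

    return CALL.sub(swap, source), resources


code = 'MessageBox.Show("File saved.");\nMessageBox.Show("File saved.", "Info");'
rewritten, resources = extract(code)
print(rewritten)
print(resources)
```

The `resources` map would still have to be written into a .resx file, and the rewritten source reviewed by hand before committing it.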
I am building a Windows dialog box that has the standard 'OK' and 'Cancel' buttons. Given that Windows uses the same button text in its own dialogs is there a way for me to grab the correct strings to use on the buttons?
This way my application will have the correct strings no matter which language is being used, without me needing to localize it for lots of different languages myself. I am using C# but can happily use platform invoke to access an OS method if needed.
NOTE: Yes, I can easily localize the resources but I do not want to find and have to enter the zillion different language strings when it must be present within windows already. Please do not answer by saying localize the app!
In Visual Studio: File + Open + File, type c:\windows\system32\user32.dll. Open the String Table node and double click String Table. Scroll down to 800.
Microsoft takes a pretty no-nonsense stance against relying on these resource IDs. Given the number of programmers who've done what you're contemplating, it is however unlikely they can ever change these numbers. You'll need to P/Invoke LoadLibrary() and LoadString().
However, the ultimate downfall of this plan is Vista/Win7 Ultimate with MUI language packs, which allow the user to switch between languages without updating the resource strings in the DLLs. Such an edition will always have English strings.
See MB_GetString, which claims to do exactly this:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn910915(v=vs.85).aspx
However, it seems to require runtime linkage:
http://undoc.airesoft.co.uk/user32.dll/MB_GetString.php
Well, if you use the standard MessageBox.Show() function and pass it appropriate parameters, it will automatically localize the yes/no/okay/cancel buttons for you.
What is more interesting is how you localize the message text.
No, there is no standard, supported way to do this. Yes, Windows does store these strings and it's (with some effort) possible to obtain them, but there is no guarantee that they'll remain in the same location and under the same identifier from version to version.
While you might not want this to be the answer, the answer is, indeed, to localize your application. If you're localizing everything else (as you'd have to, unless you just wanted OK and Cancel to be localized), I'm not sure why it would be any great effort to include localized values for OK and Cancel as well.