Gujarati language in C#

I am creating a "Gujarati To English Dictionary" application using C#.
I want the TextBox to display "તાજમહાલ" as the user types "tajmhal" on the keyboard.
Likewise, typing "koml" should display "કોમલ", and so on.
I have downloaded and set the font for the textbox to "Gujarati Saral-1" and it works.
But when I store the text of the textbox in the database, it is stored as "tajmhal", not "તાજમહાલ".
So, could you please suggest another solution?

From what I can tell after looking at this font info page, this font is NOT a Unicode font; it displays Latin code points as Gujarati glyphs.
To make your idea work, you need to use a Unicode-compliant Gujarati font and replace each Latin character with its equivalent Gujarati character. See this table for the Gujarati code points.
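A minimal sketch of that replacement step, assuming a simple one-key-to-one-letter layout. The mapping table is hypothetical: it only covers the keys in the question's two examples, and the real key layout of "Gujarati Saral-1" would have to be taken from the font's documentation.

```csharp
using System.Collections.Generic;
using System.Text;

public static class GujaratiTransliterator
{
    // Hypothetical key-to-letter table (an assumption, not the font's real layout).
    // Only the keys needed for "tajmhal" and "koml" are listed.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 't', '\u0AA4' }, // GUJARATI LETTER TA
        { 'j', '\u0A9C' }, // GUJARATI LETTER JA
        { 'm', '\u0AAE' }, // GUJARATI LETTER MA
        { 'h', '\u0AB9' }, // GUJARATI LETTER HA
        { 'l', '\u0AB2' }, // GUJARATI LETTER LA
        { 'k', '\u0A95' }, // GUJARATI LETTER KA
        { 'a', '\u0ABE' }, // GUJARATI VOWEL SIGN AA
        { 'o', '\u0ACB' }, // GUJARATI VOWEL SIGN O
    };

    // Replaces each typed Latin character with its mapped Gujarati code point.
    public static string Transliterate(string latin)
    {
        var result = new StringBuilder(latin.Length);
        foreach (char c in latin)
        {
            char mapped;
            // Characters without an entry are passed through unchanged.
            result.Append(Map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return result.ToString();
    }
}
```

With this table, Transliterate("tajmhal") produces "તાજમહાલ" as real Unicode text, which is what should be stored instead of the typed Latin string.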

If the user types tajmhal from the keyboard, then that will be saved in the database. You can show it using whatever font you like - a nice Indian font, fancy calligraphy font, comic sans, barcode, wingdings - but it is still tajmhal.
You need to translate the word or characters from one language to another. That's not a font issue; it requires some mapping from one to the other. The characters will have different Unicode values. You can try mapping them yourself, character by character, but this is unlikely to work unless the two languages have the same number of letters and they map directly from one to another. So you need to translate.
The other answer suggests using Google. You're writing a desktop application, but you can still integrate with Google technology; if the network is down, you don't translate and try again later.
Question: if the text is displaying as you want, then why do you need to translate it? You're obviously using an English keyboard, so why not store the text as English characters?

Google has a transliteration API (unfortunately deprecated) which might solve your problem. If you need to, there are other services like Quillpad that you might want to buy. These let you type in one script and get the phonetic equivalent in another script, i.e. transliteration. Once you store your data in the database, you should be able to display it again, unless you're storing the English string there by mistake.

OK, can you check the column's datatype in SQL: is it varchar or nvarchar? Only nvarchar stores Unicode characters.

Related

Search for Arabic text ignoring diacritics, alef hamza differences, and kashida in SQL

Well, I have a table with an Arabic column which may contain variant forms of the same word in multiple rows. For example, the word "أسمى" might appear in the following forms:
1- with diacritics: "أَسْمَى"
2- with changing the last letter of "ى" into "ي" so it would be like "أسمي"
3- with kashida or "ـ" in some part of the word so it would possibly be "أسمــى"
4- with variant forms of alef hamza (أ - إ - ا - ء), so it might be "اسمى" or "إسمى"
5- any combination of the former cases, e.g. diacritics and kashida
I'm looking for a way to store these values in the database (actually I need a solution for SQL Server) and to retrieve them regardless of these differences.
I found that I should use an Arabic collation like arabic_ci_ai, but this collation only solved problems 1 and 2.
In addition, I considered using a full-text index on the column, but this has its drawbacks and doesn't provide a full solution.
Thanks in advance.
Arabic Case Insensitive In Database Systems: How To Solve Alef With and Without Hamza Problem
update:
The link above is broken so here is the same article on Medium by Ahmed Essam
https://ahmedessamdev.medium.com/arabic-case-insensitive-in-database-systems-how-to-solve-alef-with-and-without-hamza-problem-c54ee6d40bed
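One common way to get matches regardless of these differences is to store a normalized copy of the text next to the original and search against that copy. The sketch below is an assumption about how such a normalizer could look in C#, not necessarily the linked article's method. It handles diacritics, kashida, alef with hamza, and the final ى/ي difference; it does not touch the standalone hamza (ء).

```csharp
using System.Text;

public static class ArabicNormalizer
{
    // Produces a search key: apply it to the stored text (in an extra column)
    // and to the search term, then compare the two keys.
    public static string Normalize(string input)
    {
        var key = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Skip the diacritics U+064B..U+0652 (fathatan through sukun).
            if (c >= '\u064B' && c <= '\u0652') continue;
            // Skip kashida (tatweel, U+0640).
            if (c == '\u0640') continue;

            switch (c)
            {
                case '\u0622': // alef with madda above
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    key.Append('\u0627'); // bare alef
                    break;
                case '\u0649': // alef maksura
                    key.Append('\u064A'); // yeh
                    break;
                default:
                    key.Append(c);
                    break;
            }
        }
        return key.ToString();
    }
}
```

All five variants listed in the question reduce to the same key, so an equality or LIKE search on the normalized column finds every row.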

C# Get character type(s) of a string by Unicode character range data

I am not trying to detect the specific language of a string (one or two words), such as "English" or "German". For example, "Hand" can be both English and German. That seems to be very complicated work that would have to rely on cloud services, which cost money and add network delay.
I just want to decide the character type, such as Latin, Chinese, Hiragana, etc, something like those shown here (http://jrgraphix.net/research/unicode_blocks.php). I could create my own class and type all those values myself, but it seems a lot of tedious work. I feel as if there should already be something that does that, and I do not want to reinvent the wheel. Is there any library for this?
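.NET regular expressions already know the Unicode block names through the \p{IsBlockName} syntax, so the ranges may not need to be typed by hand. A minimal sketch, assuming only a handful of blocks matter; the class name, the short labels and the choice of blocks are illustrative, and characters outside the Basic Multilingual Plane are not handled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnicodeBlockDetector
{
    // Label -> pattern that matches exactly one character from the named block(s).
    private static readonly KeyValuePair<string, Regex>[] Blocks =
    {
        Block("Latin", @"^[\p{IsBasicLatin}\p{IsLatin-1Supplement}]$"),
        Block("Hiragana", @"^\p{IsHiragana}$"),
        Block("Katakana", @"^\p{IsKatakana}$"),
        Block("CJK", @"^\p{IsCJKUnifiedIdeographs}$"),
        Block("Arabic", @"^\p{IsArabic}$"),
        Block("Gujarati", @"^\p{IsGujarati}$"),
    };

    private static KeyValuePair<string, Regex> Block(string label, string pattern)
    {
        return new KeyValuePair<string, Regex>(label, new Regex(pattern));
    }

    // Returns the labels of all blocks found in the text, in order of first appearance.
    public static List<string> DetectBlocks(string text)
    {
        var found = new List<string>();
        foreach (char c in text)
        {
            // Ignore digits, spaces and punctuation; only letters decide the type.
            if (!char.IsLetter(c)) continue;

            string single = c.ToString();
            foreach (var block in Blocks)
            {
                if (block.Value.IsMatch(single))
                {
                    if (!found.Contains(block.Key)) found.Add(block.Key);
                    break;
                }
            }
        }
        return found;
    }
}
```

For "Hand" this returns a single entry, "Latin"; a string mixing Latin letters and hiragana returns both labels.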

Outputting Programmatically to MSword; sensing end of line

I'm trying to use the MS Word Interop library to write a C# application that outputs specially formatted text (isolated Arabic letters) to a file. The problem I'm running into is determining how many characters remain before the text wraps onto a new line. I need each word to stay on one line instead of wrapping, which is the default behavior. I'm finding this difficult because when I have the Arabic letters of the word isolated with spaces, they are treated as individual characters and therefore behave differently than connected words.
Any help is appreciated. Thanks.
Add each character to your range and then check the number of lines in the range
LineCount = range.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
When the line count changes, you know it has been wrapped, and can remove the last character or reformat accordingly
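The loop described above could be sketched like this. To keep the example runnable without Word installed, the three Word operations are passed in as delegates; the class and method names are hypothetical, and the wiring shown in the comment is an assumption about how they would map onto the interop Range object.

```csharp
using System;

public static class LineFitter
{
    // Appends characters one at a time until the line count grows,
    // then removes the character that caused the wrap.
    // Returns how many characters were kept on the current line.
    //
    // Possible wiring against Word interop (assumption, not tested here):
    //   append     = c  => range.InsertAfter(c.ToString());
    //   removeLast = () => range.Characters.Last.Delete();
    //   countLines = () => range.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
    public static int AppendUntilWrap(
        string text,
        Action<char> append,
        Action removeLast,
        Func<int> countLines)
    {
        int startLines = countLines();
        int written = 0;
        foreach (char c in text)
        {
            append(c);
            if (countLines() > startLines)
            {
                // This character wrapped onto a new line: undo it and stop.
                removeLast();
                break;
            }
            written++;
        }
        return written;
    }
}
```

The return value tells the caller where to resume, for example by starting a new paragraph and writing the remaining characters there.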
Actually, I don't know how this behaves today, but I once wrote something against the MS Word API and ran into a somewhat weird fact: you can't actually find that out. In MS Word, text in a document is always in paragraphs.
If you input text into your document, it doesn't simply land on a page; the page will contain at least one paragraph holding the text you wrote.
Unfortunately I can't verify this again, because I don't have a license for MS Word these days.
Give it a try and look at the problem again in this way.
Hope this helps, and if not, please provide the code that generates the input and the exact version of MSWord.
Greetings,
Kjellski
I'm not sure what "Arabic letters of the word isolated with spaces" means exactly, but I assume that non breaking space is what you need.
Here's more details.

Handling Alphabets (Norwegian and Danish) in Sql Server

I have an application that has a lot of products with special characters like é, è, ê, ó, ò, â, and ô.
These characters give me problems: when I store them in SQL Server, they get replaced by ?. I also run into problems during processing.
How can I handle these?
Should I keep using string to handle them, or use something else?
What should their data types be in SQL Server?
Any help is appreciated.
Have you tried using nvarchar as the datatype? This is usually recommended when storing non-English text (the cost is more storage space). We use nvarchar for Finnish text (ä ö å), and have no problems or special processing. If writing to a stream, then make sure to use the iso-8859-1 encoding (at least for scandic languages. Eastern European languages use a different one).
If it's not possible for you to change the datatype, let me know and we can come up with a different solution.
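To see where the ? comes from, here is a small sketch that pushes text through an encoding and back. It is only an illustration of the effect, not what SQL Server does internally; the class and method names are made up for the example.

```csharp
using System.Text;

public static class EncodingDemo
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    // Characters the encoding cannot represent come back as '?'.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With Encoding.GetEncoding("iso-8859-1"), "é" survives the round trip, while a Gujarati letter such as "\u0AA4" comes back as "?". With Encoding.UTF8 both survive, which is why a Unicode column type and a Unicode stream encoding cover every language at once.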

My website could possibly receive both Unicode and Non-Unicode characters

A user on my website might add a comment on an item that contains both Arabic and English characters. Sometimes just Arabic, sometimes just English, and sometimes French!
It's an international website, so I can't predict which characters will be stored by my application.
My website has nothing to do with Facebook, but I need my comment TextBoxes to accept and show any characters from any language!
...So how do you think I could achieve this?
All strings in .NET are unicode strings (see documentation of the String class for more information). So taking input from the user in a mix of languages shouldn't be a problem.
If you plan to store this information in the database you will need to make sure the database columns are of type nchar or nvarchar. As others pointed out, when you run queries against these columns out of SSMS you will need to prefix Unicode strings with N to make sure they are handled properly. When executing queries from code you should use parameterized queries (or an ORM, which would probably be better) and ADO.NET will take care of properly constructing queries for you.
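A minimal sketch of such a parameterized insert with ADO.NET. The table and column names (Comments, Body) and the 4000 length are hypothetical, and the System.Data.SqlClient namespace is assumed; newer projects may reference Microsoft.Data.SqlClient instead.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class CommentCommands
{
    // Builds an INSERT whose text parameter is explicitly typed as NVarChar,
    // so the value travels to the server as Unicode without an N'' literal.
    // "Comments" and "Body" are placeholder names for this example.
    public static SqlCommand BuildInsert(SqlConnection connection, string body)
    {
        var command = new SqlCommand(
            "INSERT INTO Comments (Body) VALUES (@body)", connection);
        command.Parameters.Add("@body", SqlDbType.NVarChar, 4000).Value = body;
        return command;
    }
}
```

The caller opens the connection and runs command.ExecuteNonQuery(); the comment text is never concatenated into the SQL string.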
There are two elements here:
Displaying the characters: this is handled on the user's side. If something is missing there, the output will be gibberish, but you can't affect that.
Saving the characters in your database: this you can affect. Convert everything to UTF-8 regardless of the input. Any popular browser is able to render UTF-8.
If you use Unicode as the charset on both the web pages and the database, then you don't have to worry about where your users are from, since they will all be typing Unicode into your textboxes.
First, make sure that any database fields that may store Unicode characters are nvarchar, not varchar.
Be aware that nvarchar uses two bytes per character, so it takes twice the space in the row.
For example, the maximum size of a varchar(n) column in SQL Server is 8000 characters,
so nvarchar(n) tops out at 4000, which already takes the full 8000 bytes.
Second, set the language and charset of your page according to the language the user selected, which you can read from the URL, e.g. http://website.tld/ar/:
<meta http-equiv="Content-Language" content="ar" >
<meta http-equiv="Content-Type" content="text/html; charset=windows-1256" >
When the language in the URL changes, change the meta tags of your page to match that language.
Regards
