My program reads a CSV file that contains Hebrew text and then displays the values in a form, but the text is unreadable. What am I doing wrong?
Thanks
James
Possible options for what you're doing wrong:
Reading the file with the wrong encoding
Using a font that doesn't support Hebrew
Using a control that doesn't support right-to-left
How are you reading the file? If you look at the data in the debugger, does it seem correct? Do you know what encoding the file is in to start with?
See my "Debugging Unicode Problems" page for some suggestions, although they won't help with any right-to-left issues. (I'm afraid I don't know much about bidi displays.)
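To rule out the first option, it can help to name the encoding explicitly when reading instead of relying on the default. The following is only a sketch: it assumes the file was saved as Windows-1255 (a common legacy Hebrew code page; it could just as well be UTF-8), and the path data.csv and the helper ReadCsvLines are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class HebrewCsv
{
    // Hypothetical helper: reads all lines of a text file using the given encoding.
    public static string[] ReadCsvLines(string path, Encoding encoding)
    {
        return File.ReadAllLines(path, encoding);
    }

    static void Main()
    {
        // Assumption: the file is Windows-1255. On .NET Core and later, call
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
        Encoding hebrew = Encoding.GetEncoding(1255);

        foreach (string line in ReadCsvLines("data.csv", hebrew))
        {
            // Inspect 'line' in the debugger here, before it reaches any control.
            // Naive split: does not handle quoted fields that contain commas.
            string[] fields = line.Split(',');
            Console.WriteLine(fields.Length);
        }
    }
}
```

If the strings look right in the debugger at this point, the reading side is fine and the problem is more likely the font or the control.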
Related
I am trying to read a file in which some letters don't show up correctly when I convert the file to XML. The letters appear as blocks when I open the result in Notepad++, but in the original document they are correct. An example letter is á.
I am using UTF-8 to read the file, so the letter should be covered, but for some reason it isn't. If I change the encoding to Windows-1252, the character shows correctly.
Why is it not available in UTF-8 encoding but is in Windows 1252?
If you need any more information then just ask. Thanks in advance.
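One likely explanation, offered as a sketch rather than a diagnosis: the file was probably saved as Windows-1252 in the first place. In that encoding á is the single byte 0xE1, which is not a valid sequence on its own in UTF-8, so a UTF-8 decoder substitutes the replacement character U+FFFD (often drawn as a block). The sample bytes and the helper name Decode are made up, and the code assumes .NET Framework, where code page 1252 is available without extra setup.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: decodes the same bytes with whichever encoding is passed in.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        // "más" as it would be stored in a Windows-1252 file: m, 0xE1 (á), s.
        byte[] bytes = { 0x6D, 0xE1, 0x73 };

        // Decoded as UTF-8, the lone 0xE1 is invalid and becomes U+FFFD.
        string asUtf8 = Decode(bytes, Encoding.UTF8);
        Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0);

        // Decoded as Windows-1252, 0xE1 maps to á as intended.
        // On .NET Core and later, register CodePagesEncodingProvider first.
        string as1252 = Decode(bytes, Encoding.GetEncoding(1252));
        Console.WriteLine(as1252 == "m\u00E1s");
    }
}
```

So á is perfectly available in UTF-8; it is just stored as different bytes (0xC3 0xA1) from the ones in this file.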
I need help developing a Windows application using C#.NET in VS2010. The functionality is very simple: the user supplies a text file, and my program is supposed to extract the relevant data from it and output it as CSV, text or whatever.
My biggest problem whenever I deal with text files is the format. When I open the input text file in Notepad or WordPad it looks perfect, layout and all. But once I start programming against it, I realize that what I am seeing is not how the data is actually stored in the file. I have read many articles on Unicode, UTF and so on, but I still don't have a definite way of knowing exactly what my file format is. The end result is that I get many exceptions.
In Unix shell scripting it used to be simple. There are good Unix commands such as less, which is similar to more but also displays any formatting characters inside the file. There are also useful commands like unix2dos and dos2unix.
Nevertheless, is there some program, code or professional method that can find the exact formatting of my input file and then reformat it to "plain text", so that the data extraction becomes easy and bug-free?
Thanks
I have a file named "Connecticut is now 2 °C.txt". The name contains a Unicode character, but the file contents are just normal characters. Previously the code checked whether the file name contained Unicode and, if so, wrote the file header with the Unicode details. This implementation leads to conflicts in the output file. Can anyone suggest how to find out whether the file stream itself contains Unicode?
Thanks in advance,
Lokesh.
By far the simplest strategy is to decide on an encoding for a particular file, e.g. UTF-8, and use it exclusively, both when you write the file and when you read it. Trying to detect which encoding is in use is decidedly error-prone, so it's best not to have to do this detection at all.
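As a minimal sketch of that strategy, the encoding can be fixed in one place and passed to every read and write. The class, the method names and the temporary file name below are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8RoundTrip
{
    // One encoding, defined once, used for every read and write (UTF-8 without a BOM).
    static readonly Encoding FileEncoding = new UTF8Encoding(false);

    public static void Save(string path, string text)
    {
        File.WriteAllText(path, text, FileEncoding);
    }

    public static string Load(string path)
    {
        return File.ReadAllText(path, FileEncoding);
    }

    static void Main()
    {
        // Hypothetical file name in the temp directory.
        string path = Path.Combine(Path.GetTempPath(), "roundtrip-demo.txt");
        string text = "Connecticut is now 2 \u00B0C";

        Save(path, text);
        Console.WriteLine(Load(path) == text); // prints True
    }
}
```

Because both sides use the same encoding, the degree sign survives the round trip without any detection step.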
UPDATE
In the comments below you clarify that you wish to write to a file that is created by somebody else with an unknown encoding.
In full generality this is impossible to do with 100% reliability.
If you are lucky, the file may come with a Byte Order Mark (BOM), in which case you can read the BOM and infer the encoding from it. There's no requirement for a text file to contain a BOM, though, and many don't.
However, I would urge you to agree an interchange format with whoever is creating these files. Pick a single encoding and always use it.
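The BOM check mentioned above can be sketched as follows. This only recognises the standard UTF-8, UTF-16 and UTF-32 marks and returns null when none is present, in which case the encoding is still unknown. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Returns the encoding indicated by a byte order mark, or null if there is none.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 must be checked before UTF-16: both little-endian BOMs start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true); // big-endian UTF-32
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode; // UTF-16 little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;
        return null; // no BOM: the encoding cannot be inferred this way
    }

    // Reads up to the first four bytes of a file and checks them for a BOM.
    public static Encoding DetectFromFile(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);
        return DetectFromBom(head);
    }
}
```

A null result does not mean the file is ASCII or ANSI; a UTF-8 file without a BOM looks exactly the same to this check.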
I think this link would be helpful for you. Pay attention to the IsTextUnicode function.
I have a problem: when I write C# code, the output is sometimes words in Arabic, and they appear as strange symbols. How can I make C# read and show Arabic?
I don't know the precise problem you're having, but would suggest you read The Absolute Minimum Every Programmer Should Know About Unicode to give yourself a solid grounding in this often confusing topic.
Arabic Console Output/Input is not possible on Windows Platforms, according to Microsoft:
http://www.microsoft.com/middleeast/msdn/arabicsupp.aspx#12
C#/.NET will handle Arabic characters without a problem, as it represents strings internally as UTF-16.
The issue is with how you display the characters.
If you are on the web, you need to ensure that you are including the correct charset encoding header or meta tag for the output.
Please provide more information on where you don't see the characters, and how you are outputting the strings.
Maybe it is a problem with your system language. Go to Control Panel, then to the language options, try changing your system locale to Arabic, and ensure that the language for non-Unicode programs is Arabic.
Please make sure that you have correct fonts installed. If you have them on your system, it could be a fallback mechanism problem.
For web pages (Asp.Net), please make sure that:
You are using (and declaring) correct encoding.
You have correct fonts declared in your style definition.
I know that it sounds strange, but for Internet Explorer, it helps to set language for non-Unicode programs to what you need to support (in my case it was Chinese Simplified on German Windows 2003).
I'm reading a CSV file with Fast CSV Reader (on CodeProject). When I print the content of the fields, the console shows the character '?' in some words. How can I fix it?
The short version is that you have to know the encoding of any text file you're going to read up front. You could use things like byte order marks and other heuristics if you really aren't going to know, but you should always allow for the value to be tweaked (in the same way that Excel does if you're importing CSV).
It's also worth double-checking the values in the debugger, as it may be the output that is wrong rather than the reading. Bear in mind that all strings are Unicode internally, and conversion to '?' sounds like a failure to convert the Unicode to the relevant code page for the console.
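That kind of lossy conversion can be reproduced in isolation. The sketch below uses ASCII as a stand-in for a narrow console code page (an assumption; the actual console code page will differ) to show a correct string turning into '?', and then asks the console to use UTF-8 instead. The class and method names are made up, and whether the characters finally render also depends on the console font.

```csharp
using System;
using System.Text;

static class ConsoleDemo
{
    // Simulates what a narrow code page does to characters it cannot represent.
    public static string ThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text); // unrepresentable characters become '?'
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        string field = "caf\u00E9"; // "café", read correctly from the file

        // The string is fine in memory; the conversion is what loses the 'é'.
        Console.WriteLine(ThroughAscii(field)); // prints caf?

        // Ask the console to encode output as UTF-8 rather than its default code page.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(field);
    }
}
```

If the field looks correct in the debugger but wrong on screen, the fix belongs on the output side, not in the CSV reader.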