I am using an Oracle 10g database for a C# application. The issue is that the NVARCHAR column in the database doesn't save any language other than English. Since NVARCHAR supports Unicode, this should work. To investigate, I tried a simple method from a tutorial, as follows:
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
//Convert the string into a byte[].
byte[] unicodeBytes = ascii.GetBytes("আমার সোনার বাংলা!"); //Text to show
//Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(ascii, unicode, unicodeBytes);
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
Console.WriteLine(asciiString);
Console.ReadKey();
This may seem silly, but I was expecting to be able to display the text in the console app with the above approach. Instead it shows question marks (??????). Is there any way I can display the text, or at least save it in some other format, so that I can retrieve it and show it properly in the front-end?
If you can use Unicode (and you should, hey, it's 2018), then it would be best to avoid Bijoy altogether. Process and store everything that is a string as System.String in .NET and as NVARCHAR in Oracle.
The Windows console can handle Unicode without any problems, provided we observe two important prerequisites that the documentation clearly states:
Support for Unicode [...] requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a [...] font such as Consolas or Lucida Console
This is something you must ensure in the Windows settings, independently of your .NET application.
The second prerequisite, emphasis mine:
[...] Console class supports UTF-8 encoding [...] Beginning with the .NET Framework 4.5, the Console class also supports UTF-16 encoding [...] To display Unicode characters to the console, you set the OutputEncoding property to either UTF8Encoding or UnicodeEncoding.
What the documentation does not say is that none of the fonts that can be selected from the properties menu of the console window will normally contain glyphs for all alphabets in the world. And if you need right-to-left capability, as for example with Hebrew or Arabic, you're out of luck.
If the program is running on a Windows version without the required Bangla fonts preinstalled, follow this tutorial to install the Bangla Language Interface Pack (KB3180030).
Then apply this answer to our problem as follows (the registry step can also be done from code, as sketched after these steps):
open the Windows Registry Editor
navigate to HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
create a new string value, give it an available name like "000", and set its value to "Bangla Medium"
reboot the PC
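The registry edit can also be done from code; a minimal sketch using Microsoft.Win32, assuming the process runs elevated and that "000" is still a free slot:

using Microsoft.Win32;

// Creates the registry value from the step above; requires administrator rights.
// The value names under TrueTypeFont follow the pattern "0", "00", "000", ...
Registry.SetValue(
    @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont",
    "000",
    "Bangla Medium");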
Now set the console font to "Bangla", using the window menu of the console, last menu item "Properties", second tab "Font".
Finally, get rid of all that encoding back and forth, and simply write:
using System;
using System.Text;

namespace so49851713
{
    class Program
    {
        public static void Main()
        {
            var mbb = "\u263Aআমার সোনার বাংলা!";

            /* prepare console (once per process) */
            Console.OutputEncoding = Encoding.UTF8;

            Console.WriteLine(mbb);
            Console.ReadLine();
        }
    }
}
What I need
To generate a working GS1 DataMatrix, using this test content:
(240)1234567890(10)AA12345(11)123456(21)1(96)1234567
Steps
I've downloaded the ZXing.Net NuGet package and created a console app that uses this code:
private static void DoGs1DataMatrixStuff()
{
    var writer = new BarcodeWriter
    {
        Format = BarcodeFormat.DATA_MATRIX
    };
    writer
        .Write("(240)1234567890(10)AA12345(11)123456(21)1(96)1234567")
        .Save(@"C:\Temp\barcode.png");
}
There's no obvious specific GS1_DataMatrix format I can use ...
That gives me a barcode which, when read by a scanner app on my smartphone, yields the literal content that I originally presented, not the FNC1-formatted data that I expect for GS1:
(240)1234567890(10)AA12345(11)123456(21)1(96)1234567
while it should be
2401234567890 10AA12345 11123456211 961234567
From another source (not a source I can use) I got this barcode:
Using my smartphone app, this barcode reads into the correct data.
Question
How can I recreate this working GS1 DataMatrix using ZXing.Net?
Also see this link, where Chris Bahns raises the same concern I have, but his request didn't get a working answer.
You have to use a formatted string with ASCII character 29 (GS - Group Separator):
< GS >2401234567890< GS >10AA12345< GS >11123456211< GS >961234567
(replace the "< GS >" with ASCII 29)
ZXing.Net supports the GS symbol with the ASCII encoder since version 0.15. It replaces the ASCII 29 value with the FNC1 codeword (232) in the resulting datamatrix image.
That's only low-level support. There is no built-in class or anything similar that understands AIs (application identifiers) with fixed or variable length (similar to the result parser classes for vCards, vEvents, ISBNs, ...).
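Applied to the code from the question, a minimal sketch (assuming ZXing.Net 0.15 or newer; the "< GS >" placeholders become the literal character \u001d):

using ZXing;

private static void DoGs1DataMatrixStuff()
{
    const string GS = "\u001D"; // ASCII 29; the ASCII encoder maps it to the FNC1 codeword (232)

    var writer = new BarcodeWriter
    {
        Format = BarcodeFormat.DATA_MATRIX
    };
    writer
        .Write(GS + "2401234567890" + GS + "10AA12345" + GS + "11123456211" + GS + "961234567")
        .Save(@"C:\Temp\barcode.png");
}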
I need to simulate a checkbox in a PDF I am generating with the MigraDoc library. I stumbled across two sources that offer essentially the same solution (here and here).
However, I am not getting the expected results. Instead I am getting þ for boxes that are supposed to be checked, and ¨ for those that are to be unchecked. What might the issue be?
Snippet of my code
para = section.AddParagraph();
para.Style = "ListLevelOne";
para.AddFormattedText("1 ", "Bold");
para.AddFormattedText(IsQ1Checked ? "\u00fe" : "\u00A8", new Font("Wingdings"));
MigraDoc does not use the font "Wingdings"; instead it uses a default font (could be MS Sans Serif or similar), and therefore you see the characters from a standard font, not the Wingdings symbols.
The problem is somewhere outside the code snippet you are showing here. Make sure the font Wingdings is installed on the computer.
You may want to embed the font and ensure that MigraDoc uses Unicode encoding instead of ANSI:
private const bool unicode = true;
private const PdfFontEmbedding embedding = PdfFontEmbedding.Always;
//...
var pdfRenderer = new PdfDocumentRenderer(unicode, embedding);
http://www.pdfsharp.net/wiki/migradochelloworld-sample.ashx
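For context, here is a sketch of how those constants plug into the rendering pipeline, following the linked Hello World sample (document is assumed to be the MigraDoc Document built elsewhere):

// Render with Unicode and font embedding so the Wingdings glyphs end up in the PDF.
var pdfRenderer = new PdfDocumentRenderer(unicode, embedding)
{
    Document = document // the MigraDoc Document containing the checkbox paragraphs
};
pdfRenderer.RenderDocument();
pdfRenderer.PdfDocument.Save("checkboxes.pdf"); // hypothetical output path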
I have a web scraper written in C# for extracting data. I want to copy text from the web browser control and paste it into a Word file programmatically. When I try to extract the rich text content using its ID and InnerText, the text contains encoded characters like %2c.
I need to get the text with all its formatting, but I can't find a way. I have tried Encoding, HttpUtility.UrlDecode, SendKeys and elem.InvokeMember() without success.
How can I programmatically copy and paste text from the web browser control while preserving formatting?
Here is the sample data to extract:
Description
The Advance Concepts Engineering team designs and develops new vehicles which will meet future regulatory requirements and customer competitive requirements. A qualified candidate will be responsible for the total vehicle packaging. The candidate will identify and resolve adaptation and packaging issues as the vehicle moves toward production. They will lead cross functional team meetings working with Systems & Components, Advance Manufacturing, Service, etc. to ensure that the solutions are optimized for all stages of the vehicle's life.
HtmlElement elem = wb.Document.GetElementById("ctl00_contplhDynamic_txtDescrContentHiddenTextarea");
if (elem == null) return;

// Attempt 1: simulate keyboard select-all and copy
elem.InvokeMember("Click");
//elem.InvokeMember("Select All");
//elem.InvokeMember("Copy");
SendKeys.SendWait("^a");
SendKeys.SendWait("^c");

// Attempt 2: invoke the copy commands directly
Clipboard.Clear();
elem.Focus();
elem.InvokeMember("Right Click");
elem.InvokeMember("Select All");
elem.InvokeMember("Copy");
Clipboard.SetText(elem.InnerText);
string clipbrdText = Clipboard.GetText();

// Attempt 3: read InnerText and try to fix the encoding
string data = elem.InnerText;
richTextBox1.Text = data;
string temp = System.Web.HttpUtility.UrlDecode(data);

Encoding iso = Encoding.GetEncoding("windows-1252");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(data);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
The text with "%2c" etc has been encoded. If you are getting the content of a web page, you are decoding the HTML, not the URL. You can use HttpUtility.HtmlDecode, or if you are using .NET 4.0 or above you can also use WebUtility.HtmlDecode - this is available within the System.Net namespace.
You should note that Word does not use HTML for its formatting, so you won't be able to paste HTML tags and expect it to recognise them; e.g. <strong>Description</strong> will not result in bold text if you type that into Word.
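A minimal sketch, assuming data holds the encoded InnerText from the question:

using System.Net;

// Turns entities like &amp; and &#44; back into the literal characters.
string decoded = WebUtility.HtmlDecode(data);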
EDIT:
It looks like you are mixing two different ways to copy the text in the code you pasted - both SendKeys.SendWait("^c"); and elem.InvokeMember("Copy");. I presume both of these methods work?
I think the problem you are having lies in the way you are getting the text. I see you're using Clipboard.GetText() to get the text. Try specifying that it is formatted text using Clipboard.GetText(TextDataFormat.Rtf) or Clipboard.GetText(TextDataFormat.Html). This should hopefully copy the string preserving the formatting.
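Something along these lines (a sketch, assuming a Windows Forms context and that the copy command has already run):

// Ask the clipboard for the formatted flavour instead of plain text.
if (Clipboard.ContainsText(TextDataFormat.Rtf))
{
    string rtf = Clipboard.GetText(TextDataFormat.Rtf);
    richTextBox1.Rtf = rtf; // preserves bold/italic instead of flattening to plain text
}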
I created two txt files (in Windows Notepad) with the same content, "thank you - спасибо", and saved them as UTF-8 and Unicode. In Notepad they look fine. Then I tried to read them using .NET:
...File.ReadAllText(utf8FileFullName, Encoding.UTF8);
and
...File.ReadAllText(unicodeFileFullName, Encoding.Unicode);
But in both cases I got "thank you - ???????". What's wrong?
Update:
Code for UTF-8:
static void Main(string[] args)
{
    var encoding = Encoding.UTF8;
    var file = new FileInfo(@"D:\encodes\enc.txt");
    Console.OutputEncoding = encoding;
    var content = File.ReadAllText(file.FullName, encoding);
    Console.WriteLine("encoding: " + encoding);
    Console.WriteLine("content: " + content);
    Console.ReadLine();
}
Result:
thanks ÑпаÑибо
Edited, as UTF-8 does support those characters. It seems that you're outputting to a console or another location which hasn't had its encoding set. If so, you need to set the encoding. For the console you can do this:
string allText = File.ReadAllText(unicodeFileFullName, Encoding.UTF8);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(allText);
Use Encoding.Default:
File.ReadAllText(unicodeFileFullName, Encoding.Default);
It will fix the ???? characters.
When outputting Unicode or UTF-8 encoded multi-byte characters to the console, you need to set the encoding and also ensure that the console uses a font that supports those multi-byte characters, so the corresponding glyphs can be displayed. With your existing code, a MessageBox.Show(content) or display on a Windows or Web Form would appear correctly.
Have a look at http://msdn.microsoft.com/en-us/library/system.console.aspx for an explanation on setting fonts within the console window.
"Support for Unicode characters requires the encoder to recognize a particular Unicode character, and also requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a non-raster or TrueType font such as Consolas or Lucida Console."
As a side note, you can use the FileStream class to read the first three bytes of the file and look for the byte order mark indicator to automatically set the encoding when reading the file. For example, if byte[0] == 0xEF && byte[1] == 0xBB && byte[2] == 0xBF then you have a UTF-8 encoded file. Refer to http://en.wikipedia.org/wiki/Byte_order_mark for more information.
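A sketch of that check (UTF-8 only; the UTF-16 marks FF FE and FE FF could be handled the same way):

using System.IO;
using System.Text;

static Encoding DetectEncoding(string path)
{
    var bom = new byte[3];
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        fs.Read(bom, 0, 3);
    }

    // EF BB BF is the UTF-8 byte order mark.
    if (bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return Encoding.UTF8;

    return Encoding.Default; // no BOM found; fall back to the system default
}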
I have a C# program we use to replace some values with others, to be used afterwards as parameters: 'NAME1' is replaced with &1, 'NAME2' with &2, and so on.
The problem is that the data to modify is in a text file encoded on UNIX, and special characters like í get read as a square (invalid char), even in memory. Due to specifications that are out of my control, the file can't be changed, and I have no choice but to read it as it is.
I have tried to read it with most of the 130 encodings C# offers me:
EncodingInfo[] info = System.Text.Encoding.GetEncodings();
string text;
for (int a = 0; a < info.Length; ++a)
{
    text = File.ReadAllText(fn, info[a].GetEncoding());
    File.WriteAllText(fn + a, text, info[a].GetEncoding());
}
fn is the file path to read. I have checked all the generated files (about 130); none of them writes the í properly, so I'm out of ideas and unable to find anything on the internet.
SOLUTION:
It turned out this code did the job of reading the text properly; I also had to use the same encoder for the writing part:
System.Text.Encoding encoding = System.Text.Encoding.GetEncodings()[41].GetEncoding(); //Central European (Windows)
String text = File.ReadAllText(fn, encoding); // get file text
// DO ALL THE STUFF I HAD TO
File.WriteAllText(fn, text, encoding); // write back with the same encoding
/* ALL THESE ENCODINGS APPARENTLY WORKED FOR ME WITH ALL THE WEIRD CHARS I WAS ABLE TO WRITE :P
System.Text.Encoding.GetEncodings()[115].GetEncoding(); //Latin 9 (ISO)
System.Text.Encoding.GetEncodings()[108].GetEncoding(); //Baltic (ISO)
System.Text.Encoding.GetEncodings()[107].GetEncoding(); //Latin 3 (ISO)
System.Text.Encoding.GetEncodings()[106].GetEncoding(); //Central European (ISO)
System.Text.Encoding.GetEncodings()[105].GetEncoding(); //Western European (ISO)
System.Text.Encoding.GetEncodings()[49].GetEncoding(); //Vietnamese (Windows)
System.Text.Encoding.GetEncodings()[45].GetEncoding(); //Turkish (Windows)
System.Text.Encoding.GetEncodings()[41].GetEncoding(); //Central European (Windows) <-- Used this one
*/
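Note that the numeric indexes into GetEncodings() are not guaranteed to be stable across machines or framework versions. If the comment above is right that index 41 is Central European (Windows), that corresponds to code page 1250, which can be pinned directly (a sketch):

var encoding = System.Text.Encoding.GetEncoding(1250); // Central European (Windows), by code page
String text = File.ReadAllText(fn, encoding);
// ... do the replacements ...
File.WriteAllText(fn, text, encoding);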
Thank you very much for your help
You have to determine the proper encoding first. Try:
Use file -i. That will output MIME-type information for the file, which will also include the character-set encoding. I found a man-page for it, too :)
Or try enca:
It can guess and even convert between encodings. Just look at the man page.
Once you have the proper encoding, look for a way to apply it to your file reading.
Quoted from: How to find encoding of a file in Unix via script(s)
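Once the tool reports a charset, the name can usually be passed straight to .NET. For example, if it reported iso-8859-1 (just an illustration; use whatever the tool prints):

var enc = System.Text.Encoding.GetEncoding("iso-8859-1"); // name as reported by file -i
string text = System.IO.File.ReadAllText(fn, enc);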