Whitespace converted to rectangles - C#

When I copy the first paragraph of this page (the one under the heading محمد بصل) from my browser into Notepad, I see the whitespace replaced with rectangles in Notepad:
http://www.shorouknews.com/news/view.aspx?cdate=03092012&id=73df0e96-a9d8-44a1-83a8-77b0daf314a7
How can I convert this text in C# so that it is inserted into a SQL Server table as proper whitespace?
Thanks.

The issue is likely that the characters used in the source web page are not supported by the font used by Notepad. As long as the character codes are maintained everywhere (your IDE, your DB library and so on), you should be fine. Of course, this is an assumption; I have not tried out Visual Studio's extended character support myself.
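A minimal sketch of that path, assuming a hypothetical Articles table with an NVARCHAR(MAX) Body column, and assuming the rectangles are no-break or zero-width spaces (U+00A0 / U+200B) that you want mapped to ordinary whitespace:

    using System.Data;
    using System.Data.SqlClient;

    class ArticleInserter
    {
        static void InsertParagraph(string connectionString, string paragraph)
        {
            // If plain spaces are what you want stored, map the usual suspects
            // first (U+00A0 no-break space, U+200B zero-width space).
            string normalized = paragraph
                .Replace('\u00A0', ' ')
                .Replace("\u200B", string.Empty);

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Articles (Body) VALUES (@body)", conn)) // hypothetical table/column
            {
                // NVarChar keeps the text as Unicode end to end; -1 means NVARCHAR(MAX).
                cmd.Parameters.Add("@body", SqlDbType.NVarChar, -1).Value = normalized;
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

As long as the column is NVARCHAR (not VARCHAR) and the value is passed as a parameter rather than concatenated into the SQL string, the Arabic text and its whitespace arrive unchanged; whether they look right afterwards is purely a question of the font used to display them.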

Related

Special characters won't display after ServerReport.Render() call

On the server running SQL Server Reporting Services, I'm able to run a report and get back a document with symbols included within (☎ and ✉). I'm even able to save the resulting page as a PDF - special characters intact.
The problem turns up when I create the PDF in C# code using ServerReport.Render() and all the special characters get turned into little empty squares (􀀀 and 􀀀).
I tried adding <HumanReadablePDF>true</HumanReadablePDF> into <Extension Name="PDF" Type="Microsoft.ReportingServices.Rendering.ImageRenderer.PDFRenderer,Microsoft.ReportingServices.ImageRendering"> in the \Reporting Services\ReportServer\rsreportserver.config file but that hasn't helped either.
Is there something I'm missing in the configuration of the SSRS server? Is there another way of accomplishing this, perhaps with a special font or by parsing and replacing text with images?
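For context, the render call in question presumably looks something like this minimal sketch (the viewer variable and output path are assumptions):

    using System.IO;
    using Microsoft.Reporting.WinForms;

    class ReportPdfExport
    {
        static void SaveServerReportAsPdf(ReportViewer viewer, string outputPath)
        {
            string mimeType, encoding, extension;
            string[] streamIds;
            Warning[] warnings;

            // Ask the report server to render the report to PDF bytes.
            byte[] pdf = viewer.ServerReport.Render(
                "PDF", null,
                out mimeType, out encoding, out extension,
                out streamIds, out warnings);

            File.WriteAllBytes(outputPath, pdf);
        }
    }

If the bytes written here show squares while the interactive view does not, the PDF renderer is most likely substituting or subsetting a font that lacks the ☎ and ✉ glyphs.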

C# Windows Forms cannot display Simplified Chinese characters

Somehow my previous question has been marked as a duplicate.
Question:
I have a database with records in Chinese characters. I can take them out, and use them in button.Text.
However, when I use
Console.WriteLine(button.Text);
The output displays every Chinese character as a "?"
Now, why is the question NOT duplicate?
I have THOROUGHLY searched for a solution, not just on Stack Overflow but everywhere I can search (with my limited skills), and read all the related posts. I found two potential solutions:
One:
Console.OutputEncoding = Encoding.Unicode;
I tried Unicode, UTF8, UTF7, and UTF32.
Two:
Change my computer's locale in Control Panel to a region with Simplified Chinese. Then reboot and run the solution again.
I have tried both of these suggested solutions, individually and together. Nothing works. The output changes from "?" to complete gibberish: unrecognizable characters.
Does anyone have any idea what to do here?
This is a more complete version of my comment. The way I was able to display Simplified Chinese characters was by changing the language for non-Unicode programs to Chinese, and then setting the font to Consolas in the cmd window properties.
I didn't even need to set Console.OutputEncoding. The result: Chinese characters copied and pasted from the internet displayed correctly.
I think this is a duplicate of "How to write Unicode characters to the console?", which indicates that although .NET and Unicode support your characters, the font you are using as the console's output font does not support those Unicode characters.
Your post does not indicate that you have tried adjusting the console font.
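A minimal sketch of the two knobs involved - the output encoding, which you set in code, and the console font, which you set in the console window's properties (the sample characters are arbitrary):

    using System;
    using System.Text;

    class ConsoleChinese
    {
        static void Main()
        {
            // Emit UTF-8 so the characters are not downgraded to "?" by the
            // default console code page...
            Console.OutputEncoding = Encoding.UTF8;

            // ...but the glyphs only appear if the console font (set in the
            // cmd window's properties) actually contains them.
            Console.WriteLine("按钮文本: 你好，世界");
        }
    }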

Workaround for Crystal Reports soft hyphen bug

I'm completely new to Crystal Reports 2013 and am running into the soft hyphen bug (described in more detail in the following SAP thread). In short: the soft hyphen character won't show up in the generated Crystal Report file, but I need this exact character for a barcode validation, and it only shows up in the generated PDF.
Since I need the Crystal Report file rather than a PDF, I'm looking for a workaround. I also tried the bug fix mentioned in the link above, but I would have to apply it on many systems, so it's not the best solution.
I thought about the following workaround: when a Crystal Report file is going to be created, the program should generate an image of the barcode text and replace the barcode text with that image in the appropriate formula field.
What I want to know is:
Is this workaround even possible?
If not: are there any other, better, workarounds?
If not: do you know a working bugfix?
EDIT:
I have tried a few things since posting my question:
I thought it might be a machine-related problem. I'm working on a German version of Windows 8.1 Enterprise, so I tried to recreate the exact problem on an English Windows 8.1 OS. Unfortunately, I got the same incorrect barcode, so it doesn't seem to be machine related.
I generated, programmatically with C#, a string rendered in the "Code128" font and saved it as a .png on my machine. It also rendered the soft hyphen as another character (Unicode 172).
The problem is that the Barcode128 formula generates a checksum character, and for the exact string I had trouble validating, it generates character 173 (the soft hyphen). So I can't directly avoid the soft hyphen, since I need it for the validation.
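For reference, the checksum the Barcode128 formula computes is presumably the standard Code 128 one; a minimal sketch for code set B (how the resulting value maps to a font character is entirely up to the barcode font, which is where the soft hyphen comes from):

    static class Code128
    {
        // Standard Code 128 checksum for code set B: start value 104, plus each
        // data symbol value (ASCII - 32) weighted by its 1-based position, mod 103.
        public static int ChecksumB(string data)
        {
            int sum = 104; // value of the Start B symbol
            for (int i = 0; i < data.Length; i++)
            {
                sum += (data[i] - 32) * (i + 1);
            }
            return sum % 103;
        }
    }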
Finally I've got the answer. It was a problem with my barcode font, "Code128.ttf".
I don't know that much about fonts, but after a while I decided to change the ASCII code of the barcode character. So I opened the font in a font-editing program and saw that there is actually no glyph assigned to the soft hyphen. That's a little tricky, since the Windows character map and other programs show an alternate character for the soft hyphen; in my case it was the yen character (ASCII code 165).
Either use another Code 128 font, or use a font-editing program to inspect the actual character layout.
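For completeness, here is a minimal sketch of the image workaround discussed above, assuming the Code 128 font is installed and that the string passed in already contains the start, data, checksum, and stop characters laid out for whichever font you end up using:

    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Drawing.Text;

    static class BarcodeImage
    {
        public static void Render(string encoded, string outputPath)
        {
            using (var font = new Font("Code128", 36f)) // font name is an assumption
            {
                // Measure the rendered string first so the bitmap fits it exactly.
                SizeF size;
                using (var probe = new Bitmap(1, 1))
                using (var g = Graphics.FromImage(probe))
                {
                    size = g.MeasureString(encoded, font);
                }

                using (var bmp = new Bitmap((int)Math.Ceiling(size.Width),
                                            (int)Math.Ceiling(size.Height)))
                using (var g = Graphics.FromImage(bmp))
                {
                    g.Clear(Color.White);
                    // No anti-aliasing: the bars must stay crisp for the scanner.
                    g.TextRenderingHint = TextRenderingHint.SingleBitPerPixelGridFit;
                    g.DrawString(encoded, font, Brushes.Black, 0f, 0f);
                    bmp.Save(outputPath, ImageFormat.Png);
                }
            }
        }
    }

The resulting PNG could then be placed into the report as a picture object instead of the formula field, sidestepping the font's character mapping entirely.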

Uneven character kerning in PDF when converted from Word via automation

I need your expertise in fixing a problem I have been facing for a week. It has already moved into the 'royal pain in the lower backside' category, and time is running out fast.
Problem
I have developed a C# script that I call from ColdFusion to help me convert Word documents to PDF. The script does the conversion properly, but the (justified) text in the paragraphs is not spaced properly: I get a non-selectable space next to some characters.
See the images - one showing what it should look like and one showing what it actually looks like; the red marks were added to show the extra spaces.
Now, if I open the file in Word manually and save it, I do not get this problem. What is it that I'm missing or doing wrong that results in this error?
Details of my application flow -
I create a DOC (based on my design needs) and save it as HTML.
This HTML will be used by my CF application to manipulate the content based on some placeholders and the final output is again saved as HTML.
The xx.html file is renamed to xx.doc and passed to my C#-based converter, which does the DOC-to-PDF conversion via Word automation.
I admire my well-formed PDF output, but am disappointed that the text is a bit messy.
I have tried this with multiple fonts, and I observe that it only happens with certain fonts (in my case Palatino Linotype). What is different between the manual route and automation? Is there a setting (a boolean, perhaps) that needs to be applied, or some other hack?
My system configuration -
Windows 2008 R2 64b + .NET 4 + Office 2010
Note: I know that Office automation is bad, but at this date and time it is the only option I have to get the job done.
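For context, the Word-automation step of that flow presumably looks something like this minimal sketch using Microsoft.Office.Interop.Word (paths and options are assumptions); the spacing difference described above appears on this automated path but not when the same file is converted by hand:

    using Word = Microsoft.Office.Interop.Word;

    class DocToPdf
    {
        static void Convert(string inputPath, string outputPath)
        {
            var word = new Word.Application { Visible = false };
            try
            {
                Word.Document doc = word.Documents.Open(inputPath, ReadOnly: true);
                try
                {
                    // Export via the built-in PDF exporter (Office 2010's
                    // "Save as PDF/XPS" path).
                    doc.ExportAsFixedFormat(outputPath,
                        Word.WdExportFormat.wdExportFormatPDF);
                }
                finally
                {
                    doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                }
            }
            finally
            {
                word.Quit();
            }
        }
    }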
I found a work-around for this. It seems to be dependent on the selected printer!
First go to the print dialog (File / Print) and select "Microsoft XPS Document Writer" instead of your normal printer. You don't need to print anything.
Now export the PDF (File / Export / Create PDF)
Selecting other printer drivers may work also. I found this solution at this thread: http://www.howtofixcomputers.com/forums/microsoft-office/bad-kerning-pdf-using-save-pdf-xps-add-244886.html
Notes:
I also installed Adobe PDF Writer before finding this. It's possible that this affected the result.
My system is Windows 8.1 & Office 2013 running under Fusion 5.0.3 on a Mac mini.
I guess the trouble could be in the font used. Please try to:
change the font
ensure that the language of the text (the LanguageID property) is correct
Or it could be an inserted special character, for example a wrongly interpreted "no-width optional break". Try selecting the text, cutting and pasting it in Word, and showing non-printable characters - it should then be visible.
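If the document is produced via automation anyway, the LanguageID suggestion can be applied from the same C# converter; a minimal sketch (wdEnglishUS is just an example value):

    using Word = Microsoft.Office.Interop.Word;

    static class ProofingLanguage
    {
        // Force a consistent proofing language on every story in the document
        // so no run is treated as foreign-language text during export.
        public static void Apply(Word.Document doc)
        {
            foreach (Word.Range story in doc.StoryRanges)
            {
                story.LanguageID = Word.WdLanguageID.wdEnglishUS;
            }
        }
    }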

Microsoft IDEs, source file encodings, BOMs and the Unicode character \uFEFF?

We have parsers for various Microsoft languages (VB6, VB.net, C#, MS dialects of C/C++).
They are Unicode enabled to the extent that we all agree on what Unicode is. Where we don't agree, our lexers object.
Recent MS IDEs all seem to read/write their source code files in UTF-8... I'm not sure this is always true. Is there some reference document that makes it clear how MS will write a source code file? With or without byte order marks? Does it vary from IDE version to version? (I can't imagine that the old VB6 dev environment wrote anything other than an 8-bit character set, and I'd guess it would be in the CP-xxxx encoding established by the locale, right?)
For C# (and I assume other modern language dialects supported by MS), the character code \uFEFF can actually be found in the middle of a file. This code is defined as a zero-width no-break space. It appears to be ignored by VS 2010 when found in the middle of an identifier or in whitespace, but is significant in keywords and numbers. So, what are the rules? Or does MS have some kind of identifier normalization to handle things like composite characters, which allows different identifier strings to be treated as identical?
This is in a way a non-answer, because it does not tell what Microsoft says but what the standards say. Hope it will be of assistance anyway.
U+FEFF as a regular character
As you stated, U+FEFF should be treated as a BOM (byte order mark) at the beginning of a file. Theoretically it could also appear in the middle of text, since it actually is a character denoting a zero-width non-breaking space (ZWNBSP). In some languages/writing systems all words in a line are joined (written together), and in such cases this character could be used as a separator, just like a regular space in English, except that it does not cause a typographically visible gap. I'm not actually familiar with such scripts, so my view might not be fully correct.
U+FEFF should only appear as a BOM
However, the usage of U+FEFF as a ZWNBSP has been deprecated as of Unicode version 3.2, and currently the purpose of U+FEFF is to act as a BOM. Instead of ZWNBSP as a separator, the U+2060 (word joiner) character is strongly preferred by the Unicode Consortium. Their FAQ also suggests that any U+FEFF occurring in the middle of a file can be treated as an unsupported character that should be displayed as invisible. Another possible solution that comes to mind would be to replace any U+FEFF occurring in the middle of a file with U+2060, or just ignore it.
Accidentally added U+FEFF
I guess the most probable reason for U+FEFF to appear in the middle of text is that it is an erroneous result (or side effect) of a string concatenation. RFC 3629, which incorporated the usage of a BOM, notes that stripping the leading U+FEFF is necessary when concatenating strings. This also implies that the character could simply be removed when found in the middle of text.
U+FEFF and UTF-8
U+FEFF as a BOM has no real effect when the text is encoded as UTF-8, since UTF-8 always has the same byte order. A BOM in UTF-8 interferes with systems that rely on the presence of certain leading characters and with protocols that explicitly mandate the encoding or an encoding-identification method. Real-world experience has also shown that some applications choke on UTF-8 with a BOM. Therefore the usage of a BOM is generally discouraged with UTF-8. Removing the BOM from a UTF-8 encoded file should not cause incorrect interpretation of the file (unless there is some checksum or digital signature tied to the byte stream of the file).
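To make the stripping concrete, a minimal C# sketch (the file name is hypothetical) showing that .NET's text readers drop a leading BOM and that a stray U+FEFF in the middle of a string can simply be replaced:

    using System;
    using System.IO;
    using System.Text;

    class BomDemo
    {
        static void Main()
        {
            // Write a small file as UTF-8 with BOM, then inspect the raw bytes.
            File.WriteAllText("sample.cs", "class C { }", new UTF8Encoding(true));
            byte[] raw = File.ReadAllBytes("sample.cs");
            Console.WriteLine("File starts with EF BB BF: {0}",
                raw.Length >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF);

            // Reading it back as text strips the BOM transparently.
            string text = File.ReadAllText("sample.cs");
            Console.WriteLine("Decoded text starts with U+FEFF: {0}",
                text.Length > 0 && text[0] == '\uFEFF');

            // A stray U+FEFF left over from concatenating files can just be removed.
            string concatenated = "part1" + "\uFEFF" + "part2";
            Console.WriteLine(concatenated.Replace("\uFEFF", string.Empty)); // part1part2
        }
    }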
On "how MS will write a souce code file" : VS can save files with and without BOM, as well in whole bunch of other encodings. The default is UTF-8 with BOM. You can try it yourself by going File -> Save ... as -> click triangle on "Save" button and chose "save with encoding".
On the usage of FEFF in actual code: I've never seen anyone use it in code... Wikipedia suggests that it should be treated as a zero-width space if it appears anywhere but the first position ( http://en.wikipedia.org/wiki/Byte_order_mark ).
For C++, the file is either Unicode with BOM, or will be interpreted as ANSI (meaning the system code page, not necessarily 1252). Yes, you can save with whatever encoding you want, but the compiler will choke if you try to compile a Shift-JIS file (Japanese, code page 932) on an OS with 1252 as system code page.
In fact, even the editor will get it wrong. You can save it as Shift-JIS on a 1252 system, and it will look OK. But close the project and open it again, and the text looks like junk. So the info is not preserved anywhere.
So that's your best guess: if there is no BOM, assume ANSI. That is what the editor/compiler do.
Also: that is VS 2008 and VS 2010; older editors were not so Unicode friendly.
And C++ has different rules than C# (for C++ the files are ANSI by default, for C# they are UTF-8).
