C# - .NET Core: Write an ANSI-encoded text file

I have to create an ANSI-encoded .txt file because the system I have to load it into only reads ANSI, not UTF-8. I tried following various threads but nothing worked. My code is now:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
using (StreamWriter sw = new StreamWriter(fileName, false, Encoding.GetEncoding(1252)))
{
    sw.Write(sb.ToString());
}
sb.Clear();
When I open the generated file in Notepad++, I see that it is in UTF-8. Can someone help me?
Thanks

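Notepad++ guesses the encoding from the file's contents, because a text file doesn't record which encoding it uses (unless it starts with a BOM).
If you write using code page 1252, then that's the encoding used. If the text contains only ASCII characters, though, the bytes are identical to UTF-8 without a BOM, so Notepad++ has no way to tell them apart.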
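Here is a minimal sketch of that last point. It assumes .NET Core or .NET 5+, where code page 1252 has to be registered through CodePagesEncodingProvider first; on some older .NET Core versions that type comes from the System.Text.Encoding.CodePages package. The class name AsciiOnlyDemo and the sample string are illustrative only. The code encodes the same ASCII-only string as Windows-1252 and as UTF-8, then compares the bytes.

```csharp
using System;
using System.Linq;
using System.Text;

static class AsciiOnlyDemo // illustrative name
{
    // Formats bytes as space-separated uppercase hex, e.g. "61 62 63".
    public static string Hex(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("X2")));

    public static void Main()
    {
        // Makes Windows code pages such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const string text = "abc123";
        byte[] ansi = Encoding.GetEncoding(1252).GetBytes(text);
        // GetBytes never prepends a BOM, so this matches "UTF-8 without BOM" on disk.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(Hex(ansi));                // 61 62 63 31 32 33
        Console.WriteLine(Hex(utf8));                // 61 62 63 31 32 33
        Console.WriteLine(ansi.SequenceEqual(utf8)); // True
    }
}
```

Because the bytes are identical, any tool that guesses the encoding may report UTF-8 for a file that was actually written as code page 1252.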

Related

Unable to force ANSI encoding (windows-1252)

I'm trying to save two text files in ANSI encoding for later processing by a legacy system. However when I save it in the correct encoding, it still saves as a UTF-8 file.
I've tried the following:
File.WriteAllLines(filePath, lines, Encoding.GetEncoding(1252));
File.WriteAllLines(filePath, lines, Encoding.GetEncoding("windows-1252"));
using (StreamWriter writer = new StreamWriter(fileName, false, Encoding.GetEncoding(1252)))
{
    foreach (string line in lines)
    {
        writer.WriteLine(line);
    }
}
I've also tried converting an existing UTF-8 file to ANSI:
File.WriteAllBytes(fileName, Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(1252), File.ReadAllBytes(fileName)));
None of the above solutions worked; the files are still UTF-8. The only way I managed to make it save as ANSI was by inserting Swedish characters like åäö, which is a hack I cannot use for one of the files.
I'm at a loss. Has anyone got a solution to this issue?
We're on .NET Framework 4.5, C# 7.3
I did a thorough investigation and found that it works, just not in the way I expected. As @jdweng said: nothing in the data records the encoding; you're just saving bytes. For the most part you're saving regular ASCII characters. So when you open the file in, for instance, Notepad++, it defaults to whatever encoding it prefers, unless a special character hints at which encoding to use.
I encoded a file in four encodings (default (UTF-8), ANSI, ASCII and UTF-8-BOM) and opened all of them in a hex editor. I found that in most cases the ä in these files determined which decoder Notepad++ used.
So if the legacy system uses an ANSI decoder, it should be able to open an "ANSI" encoded file without special characters, even though Notepad++ shows it as UTF-8.
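The four-encoding experiment can be reproduced without a hex editor. The sketch below prints the preamble plus the encoded bytes for "ä", which is roughly what File.WriteAllText would put on disk for each encoding. It uses hypothetical class and helper names and assumes .NET Core or .NET 5+ with CodePagesEncodingProvider available.

```csharp
using System;
using System.Linq;
using System.Text;

static class FourEncodingsDemo // hypothetical name
{
    // Hypothetical helper: preamble (BOM, if the encoding has one) followed by the encoded text.
    public static byte[] FileBytes(Encoding encoding, string text) =>
        encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();

    public static string Hex(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("X2")));

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        const string text = "\u00E4"; // "ä", escaped so the source file's encoding doesn't matter

        Console.WriteLine("UTF-8 (no BOM): " + Hex(FileBytes(new UTF8Encoding(false), text)));   // C3 A4
        Console.WriteLine("ANSI (1252):    " + Hex(FileBytes(Encoding.GetEncoding(1252), text))); // E4
        Console.WriteLine("ASCII:          " + Hex(FileBytes(Encoding.ASCII, text)));             // 3F ('?')
        Console.WriteLine("UTF-8 with BOM: " + Hex(FileBytes(new UTF8Encoding(true), text)));    // EF BB BF C3 A4
    }
}
```

Only the 1252 output is the single byte E4. For text without such characters, the first three outputs would be byte-for-byte identical.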
It definitely works. Try the following program:
using System.IO;
using System.Text;
namespace Demo
{
    static class Program
    {
        static void Main()
        {
            string filePath = @"E:\tmp\test"; // Put your path here.
            string[] lines = { "ÿ" };
            File.WriteAllLines(filePath + ".1.bin", lines, Encoding.GetEncoding(1252));
            File.WriteAllLines(filePath + ".2.bin", lines);
        }
    }
}
Run the program and then inspect the contents of the files in a binary editor.
You will see the following:
test.1.bin contains: FF 0D 0A
test.2.bin contains: C3 BF 0D 0A
(Note: If you drag and drop a ".bin" file into Visual Studio, it will open it in binary mode.)

Open a .prn file that includes an image with the right encoding using C#

I need to open a .prn file and replace some strings.
In the .prn file I included an image that has a string like this:
When I open the .prn file, C# is not able to read the string as it is.
It's probably an encoding issue, but I'm not sure which encoding.
I tried different encodings, but without success.
Here is the code that opens the file in read mode:
string text = File.ReadAllText(root + @"testImage.prn");
C# reads that string this way,
and I'm not able to print the file with the image included.
Thanks in advance for your help.
Most PRN files use ISO encoding (ISO-8859-1). So try reading the file with System.IO.StreamReader and explicitly specify that encoding.
The following example worked perfectly in my case:
System.Text.Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
string text;
using (System.IO.StreamReader sr = new System.IO.StreamReader(path, encoding))
{
    text = sr.ReadToEnd();
}
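If the goal is to replace strings and then print the file, the file also has to be written back with the same encoding. The sketch below shows why ISO-8859-1 suits PRN files with embedded images: it maps every byte 0x00-0xFF to the character with the same code, so decoding and re-encoding returns the original bytes. ReplaceInPrn and RoundTrips are hypothetical helper names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class PrnLatin1Demo // hypothetical name
{
    // ISO-8859-1 maps byte N to char U+00NN and back, for all 256 byte values.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: replaces text in a file that also contains binary (image) data.
    // newValue should only contain chars up to U+00FF; anything else is encoded as '?'.
    public static void ReplaceInPrn(string path, string oldValue, string newValue)
    {
        // ReadAllBytes + GetString skips BOM detection, which could misread binary data.
        string text = Latin1.GetString(File.ReadAllBytes(path));
        File.WriteAllBytes(path, Latin1.GetBytes(text.Replace(oldValue, newValue)));
    }

    // Hypothetical helper: true if decoding and re-encoding reproduces the input exactly.
    public static bool RoundTrips(byte[] data) =>
        Latin1.GetBytes(Latin1.GetString(data)).SequenceEqual(data);

    public static void Main()
    {
        byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        Console.WriteLine(RoundTrips(allBytes)); // True
    }
}
```

The same reasoning explains why reading with ISO-8859-1 but saving with File.WriteAllText's default UTF-8 would corrupt the image bytes.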
In Java, it worked for me this way, using a Stream and the charset ISO-8859-1:
Stream<String> stream = Files.lines(Paths.get(filePath), Charset.forName("ISO-8859-1"));

C# Write to file using StreamWriter with ANSI encoding doesn't work

I want to write to text file with ANSI encoding. Code looks like below:
string text = "abc123";
string filePath = "C:\\Data\\MyFile.csv";
using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.OpenOrCreate), Encoding.Default))
{
    sw.Write(text);
}
When I open result file with Notepad++ and click on 'Encoding' button on menu then there is always 'UTF-8 (without BOM)' which I want to avoid.
I tried choosing 'Convert to ANSI', but after saving and reopening the file it's still 'UTF-8'.
I have been stuck on this issue for a long time. Could anyone give me a hint?
There is no problem with StreamWriter, it is how Notepad++ works. You can easily see it yourself. Just open classic windows Notepad, type "test" and "save as" with ANSI encoding. Then open in Notepad++ - it will recognize encoding as UTF8.
If you export text containing special characters, for example 'é', Notepad++ will then show the encoding as ANSI instead of UTF-8:
string text = "éçë";
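Here is a sketch of the same check in code, assuming .NET Core or .NET 5+. On those platforms Encoding.Default is always UTF-8 rather than the system ANSI code page, so the sketch requests code page 1252 explicitly after registering CodePagesEncodingProvider. WriteAndReadBack is a hypothetical helper; it writes "éçë" both ways and returns the bytes on disk.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class AnsiWriteDemo // hypothetical name
{
    // Hypothetical helper: writes text with the given encoding and returns the raw bytes on disk.
    public static byte[] WriteAndReadBack(string path, string text, Encoding encoding)
    {
        // FileMode.Create truncates an existing file; FileMode.OpenOrCreate would not.
        using (var sw = new StreamWriter(File.Open(path, FileMode.Create), encoding))
        {
            sw.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string path = Path.GetTempFileName();
        const string text = "\u00E9\u00E7\u00EB"; // "éçë"

        byte[] ansi = WriteAndReadBack(path, text, Encoding.GetEncoding(1252));
        byte[] utf8 = WriteAndReadBack(path, text, new UTF8Encoding(false));

        Console.WriteLine(string.Join(" ", ansi.Select(b => b.ToString("X2")))); // E9 E7 EB
        Console.WriteLine(string.Join(" ", utf8.Select(b => b.ToString("X2")))); // C3 A9 C3 A7 C3 AB
        File.Delete(path);
    }
}
```

FileMode.Create is used instead of the question's FileMode.OpenOrCreate. OpenOrCreate does not truncate, so writing a shorter text over a longer existing file leaves the old trailing bytes in place.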

OpenXml SDK excel accented French Chars (éèçà) [duplicate]

I've been trying this for quite a while now, but can't figure it out. I'm trying to export data to Excel via a *.csv file. It works great so far, but I have some encoding problems when opening the files in Excel.
(original string on the left, EXCEL result on the right):
Messwert(µm / m) ==> Messwert(µm / m)
Dümme Mässöng ==> Dümme Mässöng
Notepad++ tells me that the file is encoded "ANSI as UTF8"(WTF?)
So here are different ways I tried to get a valid result:
obvious implementation:
tWriter.Write(";Messwert(µm /m)");
A more sophisticated one (I tried probably a dozen or more encoding combinations):
tWriter.Write(Encoding.Default.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
tWriter.Write(Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
and so on
Whole source code for the method creating the data:
MemoryStream tStream = new MemoryStream();
StreamWriter tWriter = new StreamWriter(tStream);
tWriter.Write("\uFEFF");
tWriter.WriteLine(string.Format("{0}", aMeasurement.Name));
tWriter.WriteLine(aMeasurement.Comment);
tWriter.WriteLine();
tWriter.WriteLine("Zeit in Minuten;Messwert(µm / m)");
TimeSpan tSpan;
foreach (IMeasuringPoint tPoint in aMeasurement)
{
    tSpan = new TimeSpan(tPoint.Time - aMeasurement[0].Time);
    tWriter.WriteLine(string.Format("{0};{1};", (int)tSpan.TotalMinutes, getMPString(tPoint)));
}
tWriter.Flush();
return tStream;
Generated CSV file:
Dümme Mössäng
Testmessung die erste
Zeit in Minuten;Messwert(µm / m)
0;-703;
0;-381;
1;1039;
1;1045;
2;1457;
2;1045;
This worked perfectly for me:
private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252
this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));
CSV encoding issues (Microsoft Excel)
Try the following:
using (var sw = File.Create(Path.Combine(txtPath.Text, "UTF8.csv")))
{
    var preamble = Encoding.UTF8.GetPreamble();
    sw.Write(preamble, 0, preamble.Length);
    var data = Encoding.UTF8.GetBytes("懘荧,\"Hello\",text");
    sw.Write(data, 0, data.Length);
}
It writes the proper UTF8 preamble to the file before writing the UTF8 encoded CSV.
This solution is written up as a fix for a Java application, but you should be able to do something similar in C#. You may also want to look at the documentation for the StreamWriter class; its remarks discuss the byte order mark (BOM).
"ANSI as UTF8"(WTF?)
Notepad++ is probably correct. The encoding is UTF-8 (i.e., a correct Unicode header), but the file contains only ANSI data (i.e., é is not encoded the correct UTF-8 way, which would take two bytes).
Or it is the other way around: the file is ANSI (no BOM file header), but the individual characters are encoded as UTF-8, or look like it. This would explain ü and other characters expanding into more than one character. You can fix this by forcing the file to be read as Unicode.
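The second explanation can be checked directly: decoding UTF-8 bytes as Windows-1252 turns each non-ASCII character into two. The sketch below assumes .NET Core or .NET 5+, hence the provider registration. MisreadUtf8AsAnsi is a hypothetical name, and the inputs are the strings from the question.

```csharp
using System;
using System.Text;

static class MojibakeDemo // hypothetical name
{
    // Hypothetical helper: encodes as UTF-8, then wrongly decodes those bytes as Windows-1252.
    public static string MisreadUtf8AsAnsi(string original)
    {
        byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(original);
        return Encoding.GetEncoding(1252).GetString(utf8Bytes);
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // Inputs escaped so the source file's encoding doesn't matter.
        string a = MisreadUtf8AsAnsi("D\u00FCmme M\u00E4ss\u00F6ng"); // value: "DÃ¼mme MÃ¤ssÃ¶ng"
        string b = MisreadUtf8AsAnsi("Messwert(\u00B5m / m)");         // value: "Messwert(Âµm / m)"
        Console.WriteLine(a);
        Console.WriteLine(b);
    }
}
```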
If it's possible to post (part of) your CSV, we may be able to help fixing it at the source.
Edit
Now that we've seen your code: can you remove the StreamWriter and replace it with a TextWriter? Also, remove the hand-encoding of the BOM, it is not necessary. When you create a TextWriter, you can specify the encoding (don't use ASCII, try UTF8).
Trevor Germain's answer, which writes the UTF-8 preamble before the data, helped me save in the correct encoding.
I'd suggest you open the text file in a hex editor and see what it really is. The BOM for UTF-16 is 0xFEFF, which the writing code is apparently writing to the stream. The rest of the writing, however, doesn't specify an encoding, so it uses the StreamWriter's default encoding, UTF-8. There appears to be a mix-up of encodings.
When you open the file in hex view, if you see lots of 0x00 bytes between the characters, you're working with UTF-16, which is Encoding.Unicode in C#. If there are no 0x00 bytes between characters, the encoding is probably UTF-8.
In the latter case, just fix the BOM to be EF BB BF rather than FE FF, and read normally with UTF-8 encoding.
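The hex-editor checks above can also be scripted. This is a rough sketch with a few assumptions. DetectBom and LooksLikeUtf16 are hypothetical helpers. The "at least a quarter of the bytes are 0x00" threshold is an arbitrary choice. UTF-32 BOMs are not handled, so FF FE 00 00 would be reported as UTF-16 LE.

```csharp
using System;
using System.IO;
using System.Linq;

static class BomSniffer // hypothetical name
{
    // Hypothetical helper: identifies a file by its byte order mark only.
    // BOM-less files (ANSI, UTF-8 without BOM) give "no BOM".
    public static string DetectBom(byte[] b)
    {
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
        return "no BOM";
    }

    // Hypothetical heuristic (threshold assumed): many 0x00 bytes suggest UTF-16.
    public static bool LooksLikeUtf16(byte[] b) =>
        b.Length > 0 && b.Count(x => x == 0) * 4 >= b.Length;

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: BomSniffer <file>");
            return;
        }
        byte[] bytes = File.ReadAllBytes(args[0]);
        Console.WriteLine($"BOM: {DetectBom(bytes)}, looks like UTF-16: {LooksLikeUtf16(bytes)}");
    }
}
```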
In my scenario using StreamWriter, I found that explicitly passing UTF-8 encoding to the StreamWriter enabled Excel to read the file with the correct encoding.
See this answer for more details:
https://stackoverflow.com/a/22306937/999048
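Here is a sketch of that approach for the measurement CSV. Passing new UTF8Encoding(true) lets StreamWriter write the EF BB BF preamble itself, so the hand-written "\uFEFF" from the question is not needed. Encoding.UTF8 also carries that preamble, so passing it behaves the same way. WriteCsv, the file name and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ExcelCsvDemo // hypothetical name
{
    // Hypothetical helper: writes CSV lines as UTF-8 with a BOM.
    public static void WriteCsv(string path, IEnumerable<string> lines)
    {
        // UTF8Encoding(true) = emit the UTF-8 BOM; StreamWriter writes it before the first data.
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(true)))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "measurement.csv"); // hypothetical file name
        WriteCsv(path, new[] { "Zeit in Minuten;Messwert(\u00B5m / m)", "0;-703;" });
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine($"{bytes[0]:X2} {bytes[1]:X2} {bytes[2]:X2}"); // EF BB BF
    }
}
```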

C# Help reading foreign characters using StreamReader

I'm using the code below to read a text file that contains foreign characters. The file is ANSI-encoded and looks fine in Notepad, but the code below doesn't work: when the file values are read and shown in the DataGrid, the characters appear as squares. Could there be another problem elsewhere?
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.ANSI);
using (reader = File.OpenText(inputFilePath))
Thanks
Update 1: I have tried all encodings found under System.Text.Encoding, and all fail to show the file correctly.
Update 2: I've changed the file encoding (resaved the file) to Unicode and used System.Text.Encoding.Unicode, and it worked just fine. So why did Notepad read it correctly? And why didn't System.Text.Encoding.Unicode read the ANSI file?
You may also try the Default encoding, which on .NET Framework uses the current system's ANSI code page (on .NET Core and .NET 5+, Encoding.Default is always UTF-8).
StreamReader reader = new StreamReader(inputFilePath, Encoding.Default, true);
When you try using the Notepad "Save As" menu with the original file, look at the encoding combo box. It will tell you which encoding notepad guessed is used by the file.
Also, if it is an ANSI file, the detectEncodingFromByteOrderMarks parameter will probably not help much.
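Here is a small sketch of why detectEncodingFromByteOrderMarks does not help with ANSI files: StreamReader only switches encodings when it finds a BOM, and otherwise it uses the encoding you pass in. Decode is a hypothetical helper, and the code assumes .NET Core or .NET 5+ with CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDetectionDemo // hypothetical name
{
    // Hypothetical helper: reads bytes through StreamReader with BOM detection turned on.
    public static string Decode(byte[] data, Encoding fallback)
    {
        using (var reader = new StreamReader(new MemoryStream(data), fallback, detectEncodingFromByteOrderMarks: true))
            return reader.ReadToEnd();
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] ansiFile = { 0x63, 0x61, 0x66, 0xE9, 0x21 };                          // "café!" in Windows-1252, no BOM
        byte[] utf8BomFile = { 0xEF, 0xBB, 0xBF, 0x63, 0x61, 0x66, 0xC3, 0xA9, 0x21 }; // "café!" as UTF-8 with BOM

        // No BOM: the fallback encoding is used, so it must be the right one.
        Console.WriteLine(Decode(ansiFile, Encoding.GetEncoding(1252)) == "caf\u00E9!"); // True
        // No BOM and the wrong fallback: the 0xE9 byte becomes U+FFFD.
        Console.WriteLine(Decode(ansiFile, Encoding.UTF8).IndexOf('\uFFFD') >= 0);     // True
        // BOM present: it overrides the fallback.
        Console.WriteLine(Decode(utf8BomFile, Encoding.GetEncoding(1252)) == "caf\u00E9!"); // True
    }
}
```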
I had the same problem and my solution was simple: instead of
Encoding.ASCII
use
Encoding.GetEncoding("iso-8859-1")
The answer was found here.
Edit: more solutions. This may be a more accurate one:
Encoding.GetEncoding(1252);
Also, in some cases this will work too, if your OS default encoding matches the file's encoding:
Encoding.Default;
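Both the iso-8859-1 and the 1252 suggestions work for Swedish, French or German text because Windows-1252 and ISO-8859-1 agree on bytes 0xA0-0xFF. They differ in most of 0x80-0x9F, where 1252 has printable characters such as € and ISO-8859-1 has control characters. Here is a sketch that assumes .NET Core or .NET 5+ with CodePagesEncodingProvider; DecodeBoth is a hypothetical helper.

```csharp
using System;
using System.Text;

static class Cp1252VsLatin1 // hypothetical name
{
    // Hypothetical helper: decodes the same bytes with both encodings.
    public static (string cp1252, string latin1) DecodeBoth(byte[] bytes) =>
        (Encoding.GetEncoding(1252).GetString(bytes),
         Encoding.GetEncoding("iso-8859-1").GetString(bytes));

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = { 0x80, 0xC5, 0xE4, 0xF6 }; // € Å ä ö in Windows-1252
        var (cp1252, latin1) = DecodeBoth(bytes);

        // Å, ä, ö (0xC5, 0xE4, 0xF6) decode identically ...
        Console.WriteLine(cp1252.Substring(1) == latin1.Substring(1)); // True
        // ... but 0x80 is the euro sign in 1252 and a C1 control character in ISO-8859-1.
        Console.WriteLine((int)cp1252[0]); // 8364 (U+20AC)
        Console.WriteLine((int)latin1[0]); // 128 (U+0080)
    }
}
```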
Yes, it could be the actual encoding of the file, probably Unicode. Try UTF-8, as that is the most common form of Unicode encoding. Otherwise, if the file is ASCII, then standard ASCII encoding should work.
Using Encoding.Unicode won't accurately decode an ANSI file in the same way that a JPEG decoder won't understand a GIF file.
I'm surprised that Encoding.Default didn't work for the ANSI file if it really was ANSI - if you ever find out exactly which code page Notepad was using, you could use Encoding.GetEncoding(int).
In general, where possible I'd recommend using UTF-8.
Try a different encoding such as Encoding.UTF8. You can also try letting StreamReader find the encoding itself:
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.UTF8, true);
Edit: Just saw your update. Try letting StreamReader do the guessing.
For Swedish Å, Ä, Ö, the only one of the solutions above that worked was:
Encoding.GetEncoding("iso-8859-1")
Hopefully this will save someone time.
File.OpenText() always uses a UTF-8 StreamReader implicitly. Create your own StreamReader instance instead and specify the desired encoding, for example:
using (StreamReader reader = new StreamReader(@"C:\test.txt", Encoding.Default))
{
    // ...
}
I solved my problem reading Portuguese characters by changing the source file's encoding in Notepad++.
C#
var url = System.Web.HttpContext.Current.Server.MapPath(@"~/Content/data.json");
string s = string.Empty;
using (System.IO.StreamReader sr = new System.IO.StreamReader(url, System.Text.Encoding.UTF8, true))
{
    s = sr.ReadToEnd();
}
I'm also reading an exported file that contains French and German text. I used Encoding.GetEncoding("iso-8859-1") with detectEncodingFromByteOrderMarks set to true, and it worked without any problems.
For Arabic, I used Encoding.GetEncoding(1256). It works well.
I had a similar problem with ProcessStartInfo and the property StandardOutputEncoding. I set it for German language console output to code page 850. This way I could read the output like ausführen instead of ausf�hren.
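Here is a sketch of the ProcessStartInfo setup described above. It assumes .NET Core or .NET 5+ on Windows, where code page 850 needs CodePagesEncodingProvider. CreateStartInfo is a hypothetical helper, and "cmd.exe /c dir" is only a placeholder command; substitute the console program whose output you are reading.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConsoleOutputDemo // hypothetical name
{
    // Hypothetical helper: redirected stdout will be decoded as code page 850.
    public static ProcessStartInfo CreateStartInfo(string fileName, string arguments)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,         // required for redirecting output
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.GetEncoding(850),
        };
    }

    public static void Main()
    {
        // Placeholder command (Windows only).
        ProcessStartInfo psi = CreateStartInfo("cmd.exe", "/c dir");
        using (Process p = Process.Start(psi))
        {
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            Console.WriteLine(output);
        }
    }
}
```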
