Getting special character while creating a file - C#

I am writing text to a file, but a special character gets appended at the beginning. Can anyone explain why this is happening and how to avoid it?
using (FileStream fs = new FileStream(FILE_NAME, FileMode.CreateNew))
{
    using (BinaryWriter w = new BinaryWriter(fs))
    {
        w.Write(@"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
    }
}
When I open this file, I see a special character at the start, like:
Â
COMMENT: OnDemand Generic Index File Format COMMENT: This file has
been generated by DOC Application
COMMENT: date

BinaryWriter is intended for writing binary files when paired with BinaryReader - it implements a very simple protocol for a range of common types needed in simple serializers - for example, strings are length-prefixed. What you're seeing here as Â is the length prefix.
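For illustration, here's a minimal sketch of that length prefix in action (a MemoryStream demo, not the asker's code):

using System;
using System.IO;

using var ms = new MemoryStream();
using (var w = new BinaryWriter(ms, System.Text.Encoding.UTF8, leaveOpen: true))
{
    w.Write("COMMENT"); // writes a 7-bit-encoded byte count, then the encoded chars
}
// Prints 07-43-4F-4D-4D-45-4E-54: the leading 07 is the length prefix
Console.WriteLine(BitConverter.ToString(ms.ToArray()));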
Basically, don't use BinaryWriter to write text files. Either use File.WriteAllText (for a single string), or File.CreateText, which will give you a TextWriter (specifically, a StreamWriter).
So:
File.WriteAllText(FILE_NAME, @"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
or:
using (var file = File.CreateText(FILE_NAME))
{
    file.Write(...); // etc
}

It's an encoding problem; it used to happen to ALL my files too. After a little playing, here's the easiest way: get the bytes with an explicit encoding first; the byte[] overload of BinaryWriter.Write then writes them as-is, with no length prefix and no guessed encoding:
byte[] mybytes = System.Text.Encoding.ASCII.GetBytes(@"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
using (FileStream fs = new FileStream(@"C:\temp\2.txt", FileMode.CreateNew))
{
    using (BinaryWriter w = new BinaryWriter(fs))
    {
        w.Write(mybytes);
    }
}
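For what it's worth, once you have the raw bytes the BinaryWriter adds nothing; writing them straight to the FileStream is equivalent (a sketch, with a hypothetical path):

byte[] mybytes = System.Text.Encoding.ASCII.GetBytes("COMMENT: date");
using (FileStream fs = new FileStream(@"C:\temp\3.txt", FileMode.CreateNew))
{
    fs.Write(mybytes, 0, mybytes.Length); // raw bytes, no length prefix
}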

Related

C# StreamWriter - Problem with the encoding

I have some product data that I want to write into a csv file. First I have a function that writes the header into the csv file:
using (StreamWriter streamWriter = new StreamWriter(path))
{
    string[] headerContent = { "banana", "apple", "orange" };
    string header = string.Join(",", headerContent);
    streamWriter.WriteLine(header);
}
Another function goes over the products and writes their data into the csv file:
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}
When I write the products with FileMode.Open and Encoding.UTF8, the encoding is set correctly in the file, meaning that special characters in German or French are shown correctly. But the problem is that this overwrites my header.
The solution I tried was to use FileMode.Append instead of FileMode.Open, which works, but then for some reason the encoding just gets ignored.
What can I do to append the data while maintaining the encoding? And why is this happening in the first place?
EDIT:
Example with FileMode.Open:
Fußpflegecreme
Example with FileMode.Append:
FuÃŸpflegecreme
The important question here is: what does the file actually contain? For example, if I use the following:
using System.Text;
string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
then the output is (re-formatted a little):
EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A
The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8 encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem is that in your version there is no BOM, but the rest of the data is correct. Some tools, in the absence of a BOM, will choose the wrong encoding. But also, some tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and may also, since they're clearly not using the BOM, pick the wrong encoding). The preferred option: specify the encoding explicitly when reading the file, and use a tool that can handle the presence or absence of a BOM.
Whether or not to include a BOM (especially in the case of UTF-8) is a complex question, and there are pros/cons of each - and there are tools that will work better, or worse, with each. A lot of UTF-8 text files do not include a BOM, but: there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM - but how that is interpreted (in either case) is up to the specific tool that you're using to read the data (and how that tool is configured).
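On the reading side, specifying the encoding explicitly covers both cases, since StreamReader (and the File helpers built on it) detects and skips a UTF-8 BOM when told to expect UTF-8. A sketch:

// Works whether or not the file starts with EF-BB-BF; a BOM, if present, is skipped.
string text = File.ReadAllText(path, Encoding.UTF8);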
I think this will be solved once you explicitly choose the UTF-8 encoding when writing the header; that will prefix the file with a BOM.
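A sketch of the header function with the encoding made explicit (same path variable as in the question; false means overwrite, so the header starts a fresh file):

using (StreamWriter streamWriter = new StreamWriter(path, false, Encoding.UTF8))
{
    string[] headerContent = { "banana", "apple", "orange" };
    streamWriter.WriteLine(string.Join(",", headerContent)); // BOM is emitted at position 0
}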

Unable to convert special characters in UTF-8 file into ANSI

I have a file that needs to be read, and some text has to be added at the end.
The program failed due to the character "í".
On opening the file in Notepad++ with UTF-8 encoding, I could see the "í" displayed correctly.
In my C# code I tried to convert it to the Default encoding, but the application changes it to "?" instead of "í".
Sample code:
string processFilePath = @"D:\Test\File1.txt";
string outfile = @"D:\Test\File2.txt";
using (StreamReader reader = new StreamReader(processFilePath))
{
    using (StreamWriter writer = new StreamWriter(outfile, false, Encoding.Default))
    {
        writer.WriteLine(reader.ReadToEnd());
    }
}
I looked into similar questions on SO (the above code snippet was a modified version from here):
UTF-8 to ANSI Conversion using C#
I tried the different encodings available in System.Text.Encoding - ASCII, UTF*, Default - but the best I could get is a "?" instead of "í".
I had also gone through http://kunststube.net/encoding/ ; I learned a lot, but was still unable to resolve the issue.
What I am getting: "?" in place of the "í".
What I need: the "í" preserved in the output.
What else am I missing? (This would have been easy if System.Text.Encoding.ANSI existed.)
MSDN:
StreamReader defaults to UTF-8 encoding unless specified otherwise,
instead of defaulting to the ANSI code page for the current system.
i.e. when opening StreamReader(processFilePath), the data is treated as UTF-8, which is evidently not what the file contains. If the source text is ANSI - most likely Windows-1252 for Spanish - use:
using (StreamReader reader = new StreamReader(processFilePath, Encoding.GetEncoding(1252)))
{
    using (StreamWriter writer = new StreamWriter(outfile, false, Encoding.UTF8))
    {
        writer.WriteLine(reader.ReadToEnd());
    }
}
Note the encodings specified: 1252 for reading and UTF-8 for writing.
P.S. Also note that the false passed to StreamWriter means it will overwrite the file rather than append to the end.
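Since the goal was to add text at the end, the writer would pass true instead. One caveat, stated as an assumption about your runtime: on .NET Core / .NET 5+, code-page encodings such as 1252 require the System.Text.Encoding.CodePages package and a one-time registration, otherwise Encoding.GetEncoding(1252) throws. A sketch:

using System.Text;

// One-time setup on .NET Core / .NET 5+ (System.Text.Encoding.CodePages package):
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

using (StreamWriter writer = new StreamWriter(outfile, true, Encoding.UTF8))
{
    writer.WriteLine("text added at the end"); // true = append rather than overwrite
}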

C# StreamReader/StreamWriter encoding oddity

I have a very simple C# console app that reads through a text file and outputs the same file, but with a particular string replaced on each line where it appears, using StreamReader and StreamWriter. I do not know the encoding of the source file. I have encountered a situation where a character in the file (extended ASCII decimal 166, broken pipe) gets "mangled" when run through this app with the default encoding (in the output file it ends up as a "box" character). Since I do not know the source file encoding, I tried multiple options to see what would produce an unaltered result, and oddly the only way that works is reading in UTF-7 and writing in UTF-8.
UTF-7 to UTF-7 causes problems like & changing to +AC. UTF-8 to UTF-8 (which I believe is the default) converts the character in question to the "box". ASCII to ASCII turns it into ?. Unicode to Unicode results in gibberish. Shouldn't reading and writing with the same encoding give the same results? Simplified code example below:
using (var fileStream = new FileStream(fileName, FileMode.Open))
using (var fileReader = new StreamReader(fileStream, Encoding.UTF7))
using (var fileStreamOut = new FileStream(tempFileName, FileMode.Create))
using (var fileWriter = new StreamWriter(fileStreamOut, Encoding.UTF8))
{
    while (!fileReader.EndOfStream)
    {
        var inputLine = fileReader.ReadLine();
        if (inputLine != null)
        {
            inputLine = inputLine.Substring(0, 3) + newRdfi + inputLine.Substring(12);
            fileWriter.WriteLine(inputLine);
        }
    }
    fileWriter.Flush();
}
After clarification on the file creation method from the developer of the source system, and knowledge of the server it is produced on, I came to the conclusion that the encoding was Windows-1252. Changing my read and write streams to use Encoding.GetEncoding(1252) resulted in all characters reading and outputting as expected.
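For reference, the corrected setup might look like this (a sketch reusing fileName, tempFileName and newRdfi from the snippet above; on .NET Core / .NET 5+, Encoding.GetEncoding(1252) additionally needs the CodePagesEncodingProvider registered first):

var windows1252 = Encoding.GetEncoding(1252); // matches the source system
using (var fileReader = new StreamReader(fileName, windows1252))
using (var fileWriter = new StreamWriter(tempFileName, false, windows1252))
{
    while (!fileReader.EndOfStream)
    {
        var inputLine = fileReader.ReadLine();
        if (inputLine != null)
        {
            fileWriter.WriteLine(inputLine.Substring(0, 3) + newRdfi + inputLine.Substring(12));
        }
    }
}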

Writing to .bin binary file

I am trying to write integer numbers to a binary file, but it keeps giving weird characters in the file. For example, I try to write 2000, but in the file I get something strange. How do I fix it? I couldn't find the solution anywhere.
I use the following code:
// Create the file.
FileStream fs = new FileStream("iram.bin", FileMode.Create);
// Create the writer for data.
BinaryWriter w = new BinaryWriter(fs);
w.Write((int)2000);
w.Close();
fs.Close();
I think the problem is that you are not reading the data back properly.
You will need to read the data back using a BinaryReader like so...
using (FileStream fs2 = new FileStream("iram.bin", FileMode.Open))
{
    using (BinaryReader r = new BinaryReader(fs2))
    {
        var integerValue = r.ReadInt32();
    }
}
Unless, of course, you actually want to write text to the file, in which case you probably don't want a BinaryWriter writing the data out.
If you do want to write out text data, you could do so like this (be sure to set your encoding to what you need):
using (var tw = new StreamWriter("iram.txt", true, Encoding.ASCII))
{
    tw.WriteLine(2000);
}
Edit: As Jesse mentioned, you normally want to wrap disposable objects in using blocks.
The reason you're getting unexpected chars in the file is that what you're writing is not meant to be interpreted as a sequence of chars in the first place.
When you open it in Notepad or another text editor, it will just take what's there, guess the encoding (or use a default), and show you whatever chars the data would encode if it were encoding chars. It's not intended to be human-readable.
A human-readable text file containing the character sequence 2000 actually has the encoding of the character 2, followed by the encoding of 0 three times; in Unicode that's U+0032 U+0030 U+0030 U+0030.
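To make that concrete, here's a sketch contrasting the two representations of 2000:

using System;
using System.Text;

// What BinaryWriter stores for the int 2000: four little-endian bytes.
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(2000)));    // D0-07-00-00

// What a text writer stores for the string "2000": one byte per digit in ASCII.
Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes("2000"))); // 32-30-30-30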

How to set a file's encoding when writing it out in C#?

I'm not fully certain I understand file encoding completely. If I write out text to a file in C#, how can I set the encoding type of that file? Maybe I just do not understand the full spectrum of file encoding.
using (var sw = new StreamWriter(File.Open(@"c:\test.txt", FileMode.CreateNew), Encoding.GetEncoding("iso-8859-1")))
{
    sw.WriteLine("my text...");
}
Your code does exactly that - you're writing out text using ISO Latin 1.
Note that there's nothing in the file itself to specify the encoding, unless you're writing out a file which allows you to specify that. The file is basically just a sequence of bytes. The encoding you're specifying in your code determines how the text you're writing is converted into bytes, that's all.
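To see that concretely, the same text encoded two ways produces different bytes, and neither output records which encoding was used (a sketch):

using System;
using System.Text;

string text = "café";
Console.WriteLine(BitConverter.ToString(Encoding.GetEncoding("iso-8859-1").GetBytes(text)));
// 63-61-66-E9     (Latin-1: é is the single byte E9)
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)));
// 63-61-66-C3-A9  (UTF-8: é is the two bytes C3 A9)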
Use the constructor that accepts an Encoding parameter, which you already do, and set the encoding to the one you like.
Something like this:
using (var sw = new StreamWriter(fileName, true, System.Text.Encoding.UTF8, 512))
{
    sw.WriteLine("text here");
}
