I am trying to write integer numbers to a binary file, but I keep getting weird characters in the file. For example, I try to write 2000, but the file contains something strange. How do I fix it? I couldn't find a solution anywhere.
I use the following code:
//create the file
FileStream fs = new FileStream("iram.bin", FileMode.Create);
// Create the writer for data.
BinaryWriter w = new BinaryWriter(fs);
w.Write((int) 2000);
w.Close();
fs.Close();
I think the problem is that you are not reading the data back properly.
You will need to read the data back using a BinaryReader like so...
using (FileStream fs2 = new FileStream("iram.bin", FileMode.Open))
{
using(BinaryReader r = new BinaryReader(fs2))
{
var integerValue = r.ReadInt32();
}
}
Unless of course you actually want to write text to the file in which case you probably don't want a BinaryWriter to write the data out.
If you actually want to write out text data you could do so like this... (Be sure to set your encoding to what you need)
using (var tw = new StreamWriter("iram.txt", true, Encoding.ASCII))
{
tw.WriteLine(2000);
}
Edit: As Jesse mentioned you normally want to wrap disposable objects in using blocks.
The reason you're getting unexpected chars in the file is that what you're writing to the file is not meant to be interpreted as a sequence of chars in the first place.
When you open it in Notepad or another text editor, it will just take what's there, guess the encoding (or use a default), and show you whatever chars the data would encode if it were encoding chars. It's not intended to be human readable.
A human-readable text file that contains the character sequence 2000 actually holds the encoding of the character 2, followed by the encoding of the character 0 three times.
In Unicode, that is U+0032 U+0030 U+0030 U+0030.
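To make the difference concrete, here is a minimal sketch that compares the bytes BinaryWriter produces for the integer 2000 with the bytes of the text "2000". The in-memory stream and the variable names are illustrative assumptions, not part of the original question.

```csharp
using System;
using System.IO;
using System.Text;

// Binary form: BinaryWriter stores an int as 4 bytes, least significant byte first.
using var buffer = new MemoryStream();
using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
{
    writer.Write(2000);
}
string binaryBytes = BitConverter.ToString(buffer.ToArray());

// Text form: one byte per digit character (0x32 is '2', 0x30 is '0').
string textBytes = BitConverter.ToString(Encoding.ASCII.GetBytes("2000"));

Console.WriteLine(binaryBytes); // D0-07-00-00
Console.WriteLine(textBytes);   // 32-30-30-30
```

A text editor shown the first sequence has no digits to display, which is why the file looks like garbage even though it is a valid binary encoding of 2000.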
I have some product data that I want to write into a csv file. First I have a function that writes the header into the csv file:
using(StreamWriter streamWriter = new StreamWriter(path))
{
string[] headerContent = {"banana","apple","orange"};
string header = string.Join(",", headerContent);
streamWriter.WriteLine(header);
}
Another function goes over the products and writes their data into the csv file:
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
foreach (var product in products)
{
await streamWriter.WriteLineAsync(product.ToString());
}
}
When I write the products into the csv file with FileMode.Open and Encoding.UTF8, the encoding is applied correctly, meaning that special characters in German or French are shown correctly. But the problem is that I overwrite my header when I do it like this.
The solution I tried was to use FileMode.Append instead of FileMode.Open, which works, but then for some reason the encoding just gets ignored.
What could I do to append the data while maintaining the encoding? And why is this happening in the first place?
EDIT:
Example with FileMode.Open:
Fußpflegecreme
Example with FileMode.Append:
Fußpflegecreme
The important question here is: what does the file actually contain; for example, if I use the following:
using System.Text;
string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
then the output is (re-formatted a little):
EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A
The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8 encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem here is that in your version there is no BOM, but the rest of the data is correct. Some tools, in the absence of a BOM, will choose the wrong encoding. Other tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and, because they're clearly not using the BOM, may also use the wrong encoding). The preferred option is to specify the encoding explicitly when reading the file, and to use a tool that can handle the presence or absence of a BOM.
Whether or not to include a BOM (especially in the case of UTF-8) is a complex question, and there are pros/cons of each - and there are tools that will work better, or worse, with each. A lot of UTF-8 text files do not include a BOM, but: there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM - but how that is interpreted (in either case) is up to the specific tool that you're using to read the data (and how that tool is configured).
I think this will be solved once you explicitly choose the UTF-8 encoding (Encoding.UTF8) when writing the header. This will prefix the file with a BOM.
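A minimal sketch of that suggestion follows, assuming a temporary file and placeholder rows (both hypothetical). The header is written with an explicit Encoding.UTF8 so the BOM lands at the start of the file, and the rows are appended afterwards; because the append starts at a non-zero position, StreamWriter does not emit a second BOM.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file location, used only for this demonstration.
string path = Path.Combine(Path.GetTempPath(), "products-demo.csv");

// Header: with an explicit Encoding.UTF8, the writer emits the BOM at offset 0.
using (var header = new StreamWriter(path, false, Encoding.UTF8))
{
    header.WriteLine("banana,apple,orange");
}

// Rows: append mode starts at the end of the file, so no second BOM is written.
using (var rows = new StreamWriter(path, true, Encoding.UTF8))
{
    rows.WriteLine("Fußpflegecreme");
}

byte[] bytes = File.ReadAllBytes(path);
bool startsWithBom = bytes.Length >= 3
    && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
string roundTrip = File.ReadAllText(path, Encoding.UTF8);

Console.WriteLine(startsWithBom); // True
Console.WriteLine(roundTrip);
```

Whether the BOM is desirable still depends on the tool that reads the csv, as discussed above; this only shows how to get a consistent file with one.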
I am writing text to a file but getting a special character added at the beginning. Can anyone please explain why this is happening and how to avoid it?
using (FileStream fs = new FileStream(FILE_NAME, FileMode.CreateNew))
{
using (BinaryWriter w = new BinaryWriter(fs))
{
w.Write(@"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
}
}
When I open this file I see a special character at the start, like this:
Â
COMMENT: OnDemand Generic Index File Format COMMENT: This file has
been generated by DOC Application
COMMENT: date
BinaryWriter is intended for writing binary files when paired with BinaryReader - it implements a very simple protocol for a range of common types needed in simple serializers - for example, strings are length prefixed. What you're seeing here as Â is the length prefix.
Basically, don't use BinaryWriter to write text files. Either use File.WriteAllText (for a single string), or File.CreateText which will give you a TextWriter (specifically, a StreamWriter).
So:
File.WriteAllText(FILE_NAME, @"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
or:
using(var file = File.CreateText(FILE_NAME))
{
file.Write(...); // etc
}
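The length prefix is easy to see on a short string. The sketch below is illustrative only (the two-character string and the in-memory stream are assumptions chosen to keep the output small); it compares what BinaryWriter writes for a string with the plain encoded characters that a text writer would produce.

```csharp
using System;
using System.IO;
using System.Text;

using var buffer = new MemoryStream();
using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
{
    // The string overload writes a length prefix, then the encoded characters.
    writer.Write("AB");
}
string withPrefix = BitConverter.ToString(buffer.ToArray());

// The same characters encoded as plain text, with nothing in front.
string plain = BitConverter.ToString(Encoding.UTF8.GetBytes("AB"));

Console.WriteLine(withPrefix); // 02-41-42
Console.WriteLine(plain);      // 41-42
```

The leading 02 is the byte count of the string. A text editor tries to render that prefix as a character, which is where the stray symbol at the start of the file comes from.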
It's an encoding problem; it used to happen to ALL my files too. After a little playing, here's the easiest way: get the bytes with an explicit encoding first, then BinaryWriter.Write won't need to wrongly guess the encoding:
byte[] mybytes = System.Text.Encoding.ASCII.GetBytes(@"
COMMENT: OnDemand Generic Index File Format
COMMENT: This file has been generated by DOC Application
COMMENT: date");
using (FileStream fs = new FileStream(@"C:\temp\2.txt", FileMode.CreateNew))
{
using (BinaryWriter w = new BinaryWriter(fs))
{
w.Write(mybytes);
}
}
I have been having issues reading a file that contains a mix of Arabic and Western text. I read the file into a TextBox as follows:
tbx1.Text = File.ReadAllText(fileName.Text, Encoding.UTF8);
No matter what value I tried instead of "Encoding.UTF8" I got garbled characters displayed in place of the Arabic. The western text was displayed fine.
I thought it might have been an issue with the way the TextBox was defined, but on start up I write some mixed Western/Arabic text to the textbox and this displays fine:
tbx1.Text = "Start السلا عليكم" + Environment.NewLine + "Here";
Then I opened Notepad and copied the above text into it, then saved the file, at which point Notepad's save dialogue asked which encoding to use.
I then presented the saved file to my code and it displayed all the content correctly.
I examined the file and found 3 binary bytes at the beginning (not visible in Notepad):
The 3 bytes, I subsequently found through research, represent the BOM, and this enables the C# "File.ReadAllText(fileName.Text, Encoding.UTF8);" to read/display the data as desired.
What puzzles me is that specifying the "Encoding.UTF8" value should take care of this.
The only way I can think of is to code up a step to add this data to a copy of the file, then process that file. But this seems rather long-winded. Just wondering if there is a better way to do it, or why Encoding.UTF8 is not yielding the desired result.
Edit:
Still no luck despite trying the suggestion in the answer.
I cut the test data down to containing just Arabic as follows:
Code as follows:
FileStream fs = new FileStream(fileName.Text, FileMode.Open);
StreamReader sr = new StreamReader(fs, Encoding.UTF8, false);
tbx1.Text = sr.ReadToEnd();
sr.Close();
fs.Close();
Tried with both "true" and "false" on the 2nd line, but both give the same result.
If I open the file in Notepad++, and specify the Arabic ISO-8859-6 Character set it displays fine.
Here is what it looks like in Notepad++ (and what I would like the textbox to display):
Not sure if the issue is in the reading from file, or the writing to the textbox.
I will try inspecting the data post read to see. But at the moment, I'm puzzled.
The StreamReader class has a constructor that will take care of testing for the BOM for you:
using (var stream = new FileStream(fileName.Text, FileMode.Open, FileAccess.Read))
{
using (var sr = new StreamReader(stream, Encoding.UTF8, true))
{
var text = sr.ReadToEnd();
}
}
The final true parameter is detectEncodingFromByteOrderMarks:
The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used. See the Encoding.GetPreamble method for more information.
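As a rough illustration of that behavior, the sketch below writes a UTF-8 payload with a BOM into an in-memory stream and reads it back with a deliberately wrong fallback encoding. The stream, the sample text, and the ASCII fallback are assumptions made for the demonstration; the point is only that the detected BOM takes precedence over the encoding passed to the constructor.

```csharp
using System;
using System.IO;
using System.Text;

// "Start " followed by an Arabic word, written with escapes to keep the source ASCII-only.
string original = "Start \u0633\u0644\u0627\u0645";

// Build a UTF-8 payload that begins with the BOM (EF BB BF).
var utf8WithBom = new UTF8Encoding(true);
using var stream = new MemoryStream();
using (var writer = new StreamWriter(stream, utf8WithBom, 1024, leaveOpen: true))
{
    writer.Write(original);
}
stream.Position = 0;

// ASCII is the wrong fallback on purpose; BOM detection overrides it.
using var reader = new StreamReader(stream, Encoding.ASCII, true);
string text = reader.ReadToEnd();
string detected = reader.CurrentEncoding.WebName;

Console.WriteLine(text == original); // True
Console.WriteLine(detected);         // utf-8
```

If the file has no BOM, detection has nothing to work with and the fallback encoding is used, so a file that is really ISO-8859-6 still has to be opened with that encoding explicitly.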
I have a very simple C# console app that reads through a text file and outputs the same file but with a particular string replaced on each line where it appears, using StreamReader and StreamWriter. I do not know the encoding of the source file. I have encountered a situation where there is a character in the file (extended ASCII decimal 166, broken pipe) that gets "mangled" when run through this app using the default encoding (in the output file it ends up as a "box" character). Since I do not know the source file encoding, I have tried multiple options to see what would give an unaltered result, and oddly the only way that works is reading it as UTF-7 and writing it as UTF-8.
UTF-7 to UTF-7 causes problems like & changing to +AC. UTF-8 to UTF-8 (which I believe is the default) converts the character in question to the "box". ASCII to ASCII turns it into ?. Unicode to Unicode results in gibberish. Shouldn't using the same encoding for reading and writing give the same result? Simplified code example below:
using (var fileStream = new FileStream(fileName, FileMode.Open))
using (var fileReader = new StreamReader(fileStream,Encoding.UTF7))
using (var fileStreamOut = new FileStream(tempFileName,FileMode.Create))
using (var fileWriter = new StreamWriter(fileStreamOut,Encoding.UTF8))
{
while (!fileReader.EndOfStream)
{
var inputLine = fileReader.ReadLine();
if (inputLine != null)
{
inputLine = inputLine.Substring(0, 3) + newRdfi + inputLine.Substring(12);
fileWriter.WriteLine(inputLine);
}
}
fileWriter.Flush();
}
After clarification on the file creation method received from the developer of the source system and knowledge of the server it is being produced on I came to the conclusion the encoding was Windows-1252. Changing my read and write streams to use Encoding.GetEncoding(1252) resulted in all characters reading and outputting as expected.
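The effect can be reproduced on three bytes. This is a sketch under assumptions: the byte array is made up for illustration, and the RegisterProvider call is what .NET Core and .NET 5+ need before code page 1252 can be resolved (on .NET Framework the code page is available without it, and the provider type comes from a separate package there).

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+, where code page 1252 is not registered by default.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1252 = Encoding.GetEncoding(1252);

// 'A', the broken bar character (decimal 166), 'B' as stored in a Windows-1252 file.
byte[] source = { 0x41, 0xA6, 0x42 };

string asUtf8 = Encoding.UTF8.GetString(source); // 0xA6 on its own is invalid UTF-8
string as1252 = windows1252.GetString(source);

// UTF-8 decoding substitutes U+FFFD, which many fonts draw as a box.
bool replaced = asUtf8[1] == '\uFFFD';

// Reading and writing with the same, correct encoding preserves the bytes.
string roundTrip = BitConverter.ToString(windows1252.GetBytes(as1252));

Console.WriteLine(replaced);  // True
Console.WriteLine(roundTrip); // 41-A6-42
```

Using the same encoding for reading and writing only preserves the data when that encoding can actually represent every byte in the source, which is why UTF-8 to UTF-8 lost the character and 1252 to 1252 did not.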
I'm not fully certain I understand file encoding completely. If I write out text to a file in C#, how can I set the encoding type of that file? Maybe it's just that I do not understand the full spectrum of file encoding.
using (var sw = new StreamWriter(File.Open(@"c:\test.txt", FileMode.CreateNew), Encoding.GetEncoding("iso-8859-1")))
{
sw.WriteLine("my text...");
}
Your code does exactly that - you're writing out text using ISO Latin 1.
Note that there's nothing in the file itself to specify the encoding, unless you're writing out a file which allows you to specify that. The file is basically just a sequence of bytes. The encoding you're specifying in your code determines how the text you're writing is converted into bytes, that's all.
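A small sketch of that point, with a sample word chosen only for illustration: the same string becomes different bytes depending on the encoding, and nothing in those bytes records which encoding produced them.

```csharp
using System;
using System.Text;

string text = "caf\u00E9"; // "café"; the last letter is U+00E9

// ISO Latin 1 stores U+00E9 as the single byte E9.
string latin1 = BitConverter.ToString(Encoding.GetEncoding("iso-8859-1").GetBytes(text));

// UTF-8 stores the same character as the two bytes C3 A9.
string utf8 = BitConverter.ToString(Encoding.UTF8.GetBytes(text));

Console.WriteLine(latin1); // 63-61-66-E9
Console.WriteLine(utf8);   // 63-61-66-C3-A9
```

A reader has to be told, or has to guess, which of the two conversions to reverse.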
Use the constructor that accepts an Encoding parameter, which you already do, and set the encoding to the one you like.
Something like this
using (var sw = new StreamWriter(fileName, true, System.Text.Encoding.UTF8, 512))
{
sw.WriteLine("text here");
}