How to read this text from a file - C#

How can I read the text below?
‰€ˆ‡‰�#îõ‘þüŠ ꑯõù ‚†ƒ� -#�ª÷‘þü “‘
ª“îù )øþ¦ùý ¤ª—ùý î‘õ•þø—¤(#•¢þ¢�
ø¤÷¢ù ꑯõù îõ‘þü#^a—ú¤�ö^b•¦øû÷¢ð‘ö
¤�ù ¢�÷©^cˆˆƒ�#‚€� «.: õ¬ø¤Š
›¢øñ#…�…ˆí/Š…/
…€�…Š}TK{^aˆˆƒ�#†/„€€#}BF{#ª“îùû‘ý
î‘õ•þø—¤ý#ª“îùû‘ý î‘õ•þø—¤ý --
�¥õøöû‘#^c
I use this code, but it does not display all the characters correctly:
FileStream fs = new FileStream(open.FileName, FileMode.Open, FileAccess.Read);
System.Text.Encoding enc = System.Text.Encoding.UTF8 ;
byte[] data = new byte[fs.Length];
fs.Read(data, 0, data.Length);
string text = enc.GetString(data);
and it shows this text:
†‰€ˆ‡‰Â�#îõâ� �˜Ã¾Ã¼Å
ꑯõù ‚†ƒ� -#�ª÷‘þü
“‘ ª“îù )øþ¦ùý
¤ª—ùý î‘õ•þø—¤(#�
�¢þ¢� ø¤÷¢ù ꑯõù
îõ‘þü#^a—ú¤�ö
^b•¦øû÷¢ð‘ö ¤�ù
¢Â�֩^cˆˆƒÂ�#‚₠¬Â� «.:
õ¬ø¤Š›¢øñ#…Â�…ˆí/Å
…/ …€Â�…Š}TK{^aˆˆƒÂ�#â€
/„€€#}BF{#ª“îùû ‘ý
î‘õ•þø—¤ý#�
�“îùû‘ý î‘õ
This is a DOS text file,
and the encoding of this text is one of:
IBM037
IBM437
IBM500
ASMO-708
DOS-720
ibm737
ibm775
ibm850
ibm852
IBM855
ibm857
IBM00858
IBM860
ibm861
DOS-862
IBM863
IBM864
IBM865
cp866
ibm869
IBM870
windows-874
cp875
shift_jis
gb2312
ks_c_5601-1987
big5
IBM1026
IBM01047
IBM01140
IBM01141
IBM01142
IBM01143
IBM01144
IBM01145
IBM01146
IBM01147
IBM01148
IBM01149
utf-16
unicodeFFFE
windows-1250
windows-1251
Windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
Johab
macintosh
x-mac-japanese
x-mac-chinesetrad
x-mac-korean
x-mac-arabic
x-mac-hebrew
x-mac-greek
x-mac-cyrillic
x-mac-chinesesimp
x-mac-romanian
x-mac-ukrainian
x-mac-thai
x-mac-ce
x-mac-icelandic
x-mac-turkish
x-mac-croatian
utf-32
utf-32BE
x-Chinese-CNS
x-cp20001
x-Chinese-Eten
x-cp20003
x-cp20004
x-cp20005
x-IA5
x-IA5-German
x-IA5-Swedish
x-IA5-Norwegian
us-ascii
x-cp20261
x-cp20269
IBM273
IBM277
IBM278
IBM280
IBM284
IBM285
IBM290
IBM297
IBM420
IBM423
IBM424
x-EBCDIC-KoreanExtended
IBM-Thai
koi8-r
IBM871
IBM880
IBM905
IBM00924
EUC-JP
x-cp20936
x-cp20949
cp1025
koi8-u
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-13
iso-8859-15
x-Europa
iso-8859-8-i
iso-2022-jp
csISO2022JP
iso-2022-jp
iso-2022-kr
x-cp50227
euc-jp
EUC-CN
euc-kr
hz-gb-2312
GB18030
x-iscii-de
x-iscii-be
x-iscii-ta
x-iscii-te
x-iscii-as
x-iscii-or
x-iscii-ka
x-iscii-ma
x-iscii-gu
x-iscii-pa
utf-7
utf-8

To read the file, you need to know which encoding was used to write it.
If you don't know, you can iterate through all the available encodings and see whether one of them produces readable text.
using System;
using System.IO;
using System.Text;

const string FileName = "FileName";
foreach (var encodingInfo in Encoding.GetEncodings())
{
    try
    {
        var encoding = encodingInfo.GetEncoding();
        var text = File.ReadAllText(FileName, encoding);
        // Show at most the first 20 characters; Substring(0, 20) alone throws on shorter text
        Console.WriteLine("{0} - {1}", encodingInfo.Name, text.Substring(0, Math.Min(20, text.Length)));
        // put a breakpoint here and check whether the text is readable
    }
    catch (Exception)
    {
        Console.WriteLine("Failed: {0}", encodingInfo.Name);
    }
}
Disclaimer: this assumes the file is a text file and that it isn't huge, since every pass reads the whole file into memory.
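Once a candidate has been spotted, the bytes can be decoded with that code page directly. The following is a minimal sketch, assuming CodePagesEncodingProvider (from System.Text.Encoding.CodePages) is available, which is needed for legacy DOS code pages on .NET Core and later. The class name CodePageDecoding is hypothetical, and the code page number 437 in the usage line is only an example, not the known encoding of the file in the question.

```csharp
using System;
using System.Text;

public static class CodePageDecoding
{
    static CodePageDecoding()
    {
        // Assumption: running on a runtime where legacy code pages must be
        // registered before Encoding.GetEncoding(int) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes raw bytes using the given code page number (e.g. 437, 850, 1252).
    public static string Decode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }
}
```

Usage could look like `string text = CodePageDecoding.Decode(File.ReadAllBytes(path), 437);`. The same byte can map to different characters depending on the code page, which is why the wrong choice produces unreadable text.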

It looks like you're trying to open a .dat file, which is probably stored in a binary format.
Try the following code (note that this is Java, not C#):
File readThis = new File("file directory");
byte[] aByte = new byte[(int) readThis.length()];
FileInputStream fis = new FileInputStream(readThis);
fis.read(aByte);
System.out.println("Contents: " + java.util.Arrays.toString(aByte));
fis.close();
Let me know how it goes :)

Related

Prevent File.WriteAllText to write double Byte-Order Mark (BOM)?

In the following example, if (string) text starts with a BOM, File.WriteAllText() will add another one, writing two BOMs.
I want to write the text two times:
Without a BOM at all
With a single BOM (if applicable to the encoding)
What is the canonical way to achieve this?
HttpWebResponse response = ...
byte[] bytes = ... // bytes from the response, possibly including a BOM
var encoding = Encoding.GetEncoding(
    response.CharacterSet,
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback()
);
string text = encoding.GetString(bytes); // will preserve the BOM, if any
System.IO.File.WriteAllText(fileName, text, encoding);
You are decoding and then re-encoding the file, which is unnecessary.
The Encoding class has a GetPreamble() method that returns the preamble (called the BOM for the utf-* encodings) as a byte[]. We can check whether the received byte array already starts with this prefix, and then write the two versions of the file, adding or removing the prefix as necessary.
var encoding = Encoding.GetEncoding(response.CharacterSet, new EncoderExceptionFallback(), new DecoderExceptionFallback());
// This will throw if the file is wrongly encoded
encoding.GetCharCount(bytes);
byte[] preamble = encoding.GetPreamble();
bool hasPreamble = bytes.Take(preamble.Length).SequenceEqual(preamble);
if (hasPreamble)
{
    File.WriteAllBytes("WithPreamble.txt", bytes);
    // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes
    using (var fs = File.Create("WithoutPreamble.txt"))
    {
        fs.Write(bytes, preamble.Length, bytes.Length - preamble.Length);
    }
}
else
{
    File.WriteAllBytes("WithoutPreamble.txt", bytes);
    using (var fs = File.Create("WithPreamble.txt"))
    {
        fs.Write(preamble, 0, preamble.Length);
        fs.Write(bytes, 0, bytes.Length);
    }
}
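The same preamble check can also be expressed on byte arrays alone, without touching the file system. This is only a sketch of the idea described above; the class PreambleHelper and its method names are hypothetical, and it treats encodings with an empty preamble (such as ASCII) as never having one.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PreambleHelper
{
    // True if the encoding defines a preamble and the bytes start with it.
    public static bool HasPreamble(byte[] bytes, Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        return preamble.Length > 0
            && bytes.Length >= preamble.Length
            && bytes.Take(preamble.Length).SequenceEqual(preamble);
    }

    // Returns the bytes with a leading preamble removed, if one is present.
    public static byte[] WithoutPreamble(byte[] bytes, Encoding encoding)
    {
        if (!HasPreamble(bytes, encoding)) return bytes;
        return bytes.Skip(encoding.GetPreamble().Length).ToArray();
    }

    // Returns the bytes with exactly one leading preamble (none if the encoding has none).
    public static byte[] WithPreamble(byte[] bytes, Encoding encoding)
    {
        if (HasPreamble(bytes, encoding)) return bytes;
        return encoding.GetPreamble().Concat(bytes).ToArray();
    }
}
```

Each result can then be passed to File.WriteAllBytes, which writes the array as-is and never adds a BOM of its own.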

How can I change the encoding of a file 'without BOM' to a 'Windows-1252' encoded file?

This is my function to convert the encoding of a file.
Before the conversion I opened the file in Notepad++ and checked the encoding using the Encoding menu; it showed that the encoding is UTF-8. I tried to convert the file using the following function, but it did not convert to ASCII.
Please have a look at the function.
public static void ConvertFileEncoding(string srcFile, Encoding srcEncoding, string tempFile)
{
    try
    {
        using (var reader = new StreamReader(srcFile))
        using (var writer = new StreamWriter(tempFile, false, Encoding.ASCII))
        {
            char[] buf = new char[1024];
            while (true)
            {
                int count = reader.Read(buf, 0, buf.Length);
                if (count == 0)
                {
                    break;
                }
                writer.Write(buf, 0, count);
            }
        }
        System.IO.File.Copy(tempFile, srcFile, true); // Source file is replaced with temp file
        DeleteTempFile(tempFile);
        // TO DO -- Log success details
    }
    catch (Exception e)
    {
        // TO DO -- Log failure details
        throw new IOException("Encoding conversion failed.", e);
    }
}
Please help me understand what goes wrong when I convert the file without BOM to Windows-1252.
Characters that have values less than 128 in ASCII are all the same when encoded in UTF-8 or ASCII. If your file consists only of these (it is likely) then the file is identical as UTF-8 or ASCII.
A program can't be expected to distinguish these, because they are identical. UTF-8 is very commonly used now, so it's a reasonable choice when a program has no information other than the content of a file to guess from and it wants to display the encoding.
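To see the point about identical bytes concretely, and what an actual conversion to Windows-1252 might look like, here is a minimal sketch. It assumes CodePagesEncodingProvider (from System.Text.Encoding.CodePages) is available, since code page 1252 is not built in on .NET Core and later. The class and method names are hypothetical, and a leading UTF-8 BOM in the input is not handled.

```csharp
using System;
using System.Text;

public static class EncodingConversion
{
    static EncodingConversion()
    {
        // Assumption: legacy code pages such as 1252 need this registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-encodes UTF-8 bytes as Windows-1252 bytes.
    public static byte[] Utf8ToWindows1252(byte[] utf8Bytes)
    {
        Encoding source = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        Encoding target = Encoding.GetEncoding(1252);
        return Encoding.Convert(source, target, utf8Bytes);
    }
}
```

For input that contains only characters below 128, the output bytes equal the input bytes, so an editor has nothing to tell the two encodings apart. Only characters such as é change: two bytes (C3 A9) in UTF-8 become one byte (E9) in Windows-1252.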

Determining text file encoding schema

I am trying to create a method that can detect the encoding schema of a text file. I know there are many out there, but I know for sure my text file will be either ASCII, UTF-8, or UTF-16. I only need to detect these three. Does anyone know a way to do this?
First, open the file in binary mode and read it into memory.
For UTF-8 (or ASCII), do a validation check. You can decode the text using Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes) and catch the exception. If you don't get one, the data is valid UTF-8. Here is the code:
private bool detectUTF8Encoding(string filename)
{
    byte[] bytes = File.ReadAllBytes(filename);
    try
    {
        Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes);
        return true;
    }
    catch
    {
        return false;
    }
}
For UTF-16, check for the BOM (FE FF or FF FE, depending on byte order).
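The two checks can be combined into one routine. The sketch below is an illustration under the question's assumption that only ASCII, UTF-8, and UTF-16 occur; the class name, method name, and returned labels are hypothetical. UTF-16 without a BOM is not detected, and a UTF-32 little-endian BOM would be mistaken for UTF-16.

```csharp
using System;
using System.Text;

public static class SimpleEncodingDetector
{
    public static string Detect(byte[] bytes)
    {
        // UTF-16 BOMs: FF FE (little endian) or FE FF (big endian).
        if (bytes.Length >= 2)
        {
            if (bytes[0] == 0xFF && bytes[1] == 0xFE) return "UTF-16LE";
            if (bytes[0] == 0xFE && bytes[1] == 0xFF) return "UTF-16BE";
        }

        // Pure ASCII: every byte is below 0x80.
        bool allAscii = true;
        foreach (byte b in bytes)
        {
            if (b >= 0x80) { allAscii = false; break; }
        }
        if (allAscii) return "ASCII";

        // Strict UTF-8 decoder: throws on any invalid byte sequence.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            strictUtf8.GetString(bytes);
            return "UTF-8";
        }
        catch (DecoderFallbackException)
        {
            return "unknown";
        }
    }
}
```

A call such as `SimpleEncodingDetector.Detect(File.ReadAllBytes(filename))` would then return one of the labels. The ASCII test comes before the UTF-8 test because every ASCII file is also valid UTF-8.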
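Use a StreamReader to identify the encoding. It detects the encoding from the byte order mark, and CurrentEncoding is only reliable after the first read.
Example:
using (var r = new StreamReader(filename, Encoding.Default))
{
    richtextBox1.Text = r.ReadToEnd();
    var encoding = r.CurrentEncoding;
}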

how to read special character like é, â and others in C#

I can't read those special characters.
I tried it like this:
1st way:
string xmlFile = File.ReadAllText(fileName);
2nd way:
FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
StreamReader r = new StreamReader(fs);
string s = r.ReadToEnd();
But neither statement reads those special characters correctly.
How should I read them?
UPDATE:
I also tried all the encodings with
string xmlFile = File.ReadAllText(fileName, Encoding. );
but it still doesn't read those special characters.
There is no such thing as a "special character". Those are most likely extended characters from the Latin-1 set (iso-8859-1).
You can read them by supplying the encoding explicitly to the StreamReader (otherwise it will assume UTF-8):
using (StreamReader r = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
{
    string text = r.ReadToEnd();
}
StreamReader sr = new StreamReader(stream, Encoding.UTF8);
This worked for me:
var json = System.IO.File.ReadAllText(@"././response/response.json", System.Text.Encoding.GetEncoding("iso-8859-1"));
You have to tell the StreamReader that you are reading Unicode (Encoding.Unicode is UTF-16), like so:
StreamReader sr = new StreamReader(stream, Encoding.Unicode);
If your file uses some other encoding, specify it as the second parameter.
I had to "find" the encoding of the file first:
// try to "find" the encoding; if not found, use UTF-8
var enc = GetEncoding(filePath) ?? Encoding.UTF8;
var text = File.ReadAllText(filePath, enc);
(please refer to this answer to get the GetEncoding function)
If you can modify the file in question, you can save it with encoding.
I had a json file that I had created (normally) in VS, and I was having the same problem. Rather than specify the encoding when reading the file (I was using System.IO.File.ReadAllText which defaults to UTF8), I resaved the file (File->Save As) and on the Save button, I clicked the arrow and chose "Save with Encoding", then chose "Unicode (UTF-8 with signature) - Codepage 65001".
Problem solved, no need to specify the encoding when reading the file.

Read exe file as binary file in C#

I want to read an exe file in my C# code and then encode it as Base64.
I am doing it like this:
FileStream fr = new FileStream(@"c:\1.exe", FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader sr = new StreamReader(fr);
fr.Read(data, 0, count);
But the problem is that when I write this file out, the written file is corrupted.
When analyzing it in Hex Workshop, the byte value 20 (hex) is replaced by 0.
A StreamReader should be used only with text files. With binary files you need to use a FileStream directly, or:
byte[] buffer = File.ReadAllBytes(@"c:\1.exe");
string base64Encoded = Convert.ToBase64String(buffer);
// TODO: do something with the Base64-encoded string
buffer = Convert.FromBase64String(base64Encoded);
File.WriteAllBytes(@"c:\2.exe", buffer);
StreamReader official docs:
"Implements a TextReader that reads characters from a byte stream in a particular encoding."
It's for text, not binary files. Try just a Stream or a BinaryReader. (Why did you try a StreamReader?)
