writing string in utf8 format to zip file ( Ionic.Zip ) - c#

I need the output string to be UTF-8 encoded. Currently I write an ANSI string to the zip file without problems, like this:
StreamReader tr = new StreamReader( "myutf8-file.xml");
String myfilecontent = tr.ReadToEnd();
ZipFile zip = new ZipFile();
zip.AddFileFromString("my.xml", "", myfilecontent );
How can I force the string (the my.xml file content) to be UTF-8?

Don't use the deprecated AddFileFromString method. Use AddEntry(string, string, string, Encoding) instead:
zip.AddEntry("my.xml", "", myfilecontent, Encoding.UTF8);
If you're actually reading a UTF-8 text file to start with though, why not just open the stream and pass that into AddEntry? There's no need to decode from UTF-8 and then re-encode...
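To illustrate that last point: the bytes of a UTF-8 file can go into the archive untouched. The sketch below uses the built-in System.IO.Compression.ZipArchive rather than Ionic.Zip (a swapped-in library, so the calls differ from AddEntry), and the helper names ZipBytes and ReadEntryText are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipUtf8Sketch
{
    // Stores raw bytes as a single entry; no decode/re-encode step is involved.
    public static byte[] ZipBytes(string entryName, byte[] content)
    {
        using (var output = new MemoryStream())
        {
            using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
            using (var entryStream = archive.CreateEntry(entryName).Open())
            {
                entryStream.Write(content, 0, content.Length);
            }
            return output.ToArray();
        }
    }

    // Reads one entry back and decodes it as UTF-8.
    public static string ReadEntryText(byte[] zipBytes, string entryName)
    {
        using (var input = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(input, ZipArchiveMode.Read))
        using (var reader = new StreamReader(archive.GetEntry(entryName).Open(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for File.ReadAllBytes("myutf8-file.xml").
        byte[] utf8Xml = Encoding.UTF8.GetBytes("<note>caf\u00E9</note>");
        byte[] zip = ZipBytes("my.xml", utf8Xml);
        Console.WriteLine(ReadEntryText(zip, "my.xml") == "<note>caf\u00E9</note>");
    }
}
```

Because the entry receives the file's bytes as they are, whatever was valid UTF-8 on disk is valid UTF-8 inside the zip.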

Related

How to remove a special character or unwanted symbols from a file written in JSON format in C# Unity?

When I read the JSON back after writing it, it looks as follows:
�[{"SaveValues":[{"id":1,"allposition":{"x":-5.12429666519165,"y":4.792403697967529},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":1},{"id":1,"allposition":{"x":-4.788785934448242,"y":-3.4373996257781984},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":1}],"NoteValues":[{"movenumber":1,"notemsg":"Move One"}]},{"SaveValues":[{"id":1,"allposition":{"x":-5.12429666519165,"y":4.792403697967529},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":1,"allposition":{"x":-4.788785934448242,"y":-3.4373996257781984},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":2,"allposition":{"x":5.185188293457031,"y":4.803859233856201},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2},{"id":2,"allposition":{"x":5.154441833496094,"y":-4.023111343383789},"allrotation":{"x":0.0,"y":0.0,"z":0.0,"w":1.0},"allscale":{"x":1.0,"y":1.0},"linepos0":{"x":0.0,"y":0.0,"z":0.0},"linepos1":{"x":0.0,"y":0.0,"z":0.0},"movetype":2}],"NoteValues":[{"movenumber":2,"notemsg":"Move Two"}]}]
The code I use for saving to JSON format is given below.
ListContainer container = new ListContainer(getAlldata,playerNotes);
var temp = container;
//--Adding data in container into List<string> jsonstring
jsonstring.Add(JsonUtility.ToJson(temp));
//--Combing list of string into a single string
string jsons = "[" +string.Join(",", jsonstring)+"]";
//Writing into a JSON file in the persistent path
using (FileStream fs = new FileStream( Path.Combine(Application.persistentDataPath , savedName+".json"), FileMode.Create))
{
BinaryWriter filewriter = new BinaryWriter(fs);
filewriter.Write(jsons);
fs.Close();
}
I want to remove the special character that appears at the start of the JSON.
I am trying to read the JSON using the following code:
using (FileStream fs = new FileStream(Application.persistentDataPath + "/" + filename, FileMode.Open))
{
fs.Dispose();
string dataAsJson = File.ReadAllText(Path.Combine(Application.persistentDataPath, filename));
Debug.Log("DataJsonRead - - -" + dataAsJson);
}
I am getting an error: ArgumentException: JSON parse error: Invalid value.
How can I remove that special or unwanted symbol from the start? I think it has something to do with how the file is written to the directory. When I saved with other methods, I did not find any such character or symbol.
� is the Unicode replacement character (U+FFFD), emitted when a decoder meets bytes that are not valid in the encoding it is using, for example when text is read with the wrong codepage. It's not a BOM: File.ReadAllText would recognize a BOM and use it to load the rest of the file with the correct encoding. This means there's garbage at the start of the file.
The problem is caused by the inappropriate use of BinaryWriter. That class is used to write fields of primitive types in a binary format to a stream. For variable-length types like strings, the first byte(s) contain the field length.
This code :
using var ms=new MemoryStream();
using( BinaryWriter writer = new BinaryWriter(ms))
{
writer.Write(new String('0',3));
}
var b=ms.ToArray();
Produces
3, 48,48,48
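The same experiment explains the � in the question. As far as I know, BinaryWriter writes the length as a 7-bit-encoded integer, so a string longer than 127 bytes gets a prefix whose first byte has the high bit set, and that byte is not valid UTF-8 on its own. A small sketch (the 200-character payload is an arbitrary stand-in for the JSON, and WriteWithBinaryWriter is a made-up helper name):

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixSketch
{
    // Returns the exact bytes BinaryWriter emits for a string.
    public static byte[] WriteWithBinaryWriter(string value)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(value);
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        string payload = new string('0', 200);
        byte[] bytes = WriteWithBinaryWriter(payload);

        // Number of prefix bytes that precede the payload.
        Console.WriteLine(bytes.Length - payload.Length);

        // Decoding everything as UTF-8 turns the first prefix byte into U+FFFD.
        string decoded = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(decoded[0] == '\uFFFD');
    }
}
```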
Use StreamWriter or File.WriteAllText instead. The default encoding used is UTF8 so there's no need to specify an encoding or try to change anything :
using (FileStream fs = new FileStream( Path.Combine(Application.persistentDataPath , savedName+".json"), FileMode.Create))
using(var writer=new StreamWriter(fs))
{
writer.Write(jsons);
}
or
var path=Path.Combine(Application.persistentDataPath , savedName+".json");
using(var writer=new StreamWriter(path))
{
writer.Write(jsons);
}
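File.WriteAllText is mentioned above but not shown. A minimal round-trip sketch, assuming a temp directory in place of Application.persistentDataPath so it runs outside Unity; the file name and the SaveAndLoad helper are placeholders:

```csharp
using System;
using System.IO;

public static class JsonFileSketch
{
    // Writes the text as UTF-8 (no BOM, no length prefix) and reads it back.
    public static string SaveAndLoad(string path, string json)
    {
        File.WriteAllText(path, json);
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        // Path.GetTempPath() stands in for Application.persistentDataPath.
        string path = Path.Combine(Path.GetTempPath(), "saved-sketch.json");
        string json = "[{\"movenumber\":1,\"notemsg\":\"Move One\"}]";

        string loaded = SaveAndLoad(path, json);
        Console.WriteLine(loaded == json); // True

        // The first byte on disk is '[' itself, with nothing in front of it.
        Console.WriteLine(File.ReadAllBytes(path)[0] == (byte)'['); // True

        File.Delete(path);
    }
}
```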
First, add this to your .cs file:
using System.Text.RegularExpressions;
Then we can do this with Regex as follows:
varName = Regex.Replace(SaveValues, "[-*]", "");
This removes every - and * character from your string.
Hope this helps.

Byte array read from a file and byte array converted from string read from same file differs

If I read a byte array from a file and write it back using the code below
byte[] bytes = File.ReadAllBytes(filePath);
File.WriteAllBytes(filePath, bytes);
it works perfectly fine. I can open and view the written file properly.
But if I read the file contents into a string and then convert it to a byte array like this
string s = File.ReadAllText(filePath);
var byteArr = System.Text.Encoding.UTF8.GetBytes(s);
the byte array is larger than the one read directly from the file and its values are different, so a file written from this array cannot be read when opened.
Note: the file is UTF-8 encoded.
I found that out using the code below:
using (StreamReader reader = new StreamReader(filePath, Encoding.UTF8, true))
{
reader.Peek(); // you need this!
var encoding = reader.CurrentEncoding;
}
I am unable to understand why the two arrays differ.
I was using the image attached below for converting and then writing.
With
using (StreamReader reader = new StreamReader(filePath, Encoding.UTF8, true))
{
reader.Peek(); // you need this!
var encoding = reader.CurrentEncoding;
}
your var encoding will just echo the Encoding.UTF8 parameter. You are deceiving yourself there.
A binary file just has no text encoding.
Need to save a file that may be anything, an image or a text
Then just use ReadAllBytes/WriteAllBytes. A text file is always also a byte[], but not all file types are text. To put arbitrary bytes into a string you would need Base64 encoding first, and that just adds to the size.
The safest way to convert byte arrays to strings is indeed to encode them in something like Base64.
Like:
string s = Convert.ToBase64String(bytes);       // byte[] -> string
byte[] decoded = Convert.FromBase64String(s);   // string -> byte[]
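To see why the two arrays differ, here is a small sketch with four bytes from a typical JPEG header standing in for the image file (an assumption; any non-text bytes behave similarly). Invalid sequences become U+FFFD on decoding, and each U+FFFD re-encodes to three bytes, so the array grows, while Base64 round-trips exactly. The helper names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryVsTextSketch
{
    // Decodes bytes as UTF-8 text and encodes the text again (lossy for non-text data).
    public static byte[] ThroughUtf8String(byte[] data)
    {
        string s = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(s);
    }

    // Encodes bytes as Base64 text and decodes them again (lossless).
    public static byte[] ThroughBase64String(byte[] data)
    {
        string s = Convert.ToBase64String(data);
        return Convert.FromBase64String(s);
    }

    public static void Main()
    {
        byte[] original = { 0xFF, 0xD8, 0xFF, 0xE0 };

        Console.WriteLine(ThroughUtf8String(original).SequenceEqual(original));   // False
        Console.WriteLine(ThroughBase64String(original).SequenceEqual(original)); // True
    }
}
```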

Detecting encoding of uploaded text file (ASP.NET MVC) [duplicate]

Possible Duplicate:
How can I detect the encoding/codepage of a text file
I have an ASP.NET MVC application. In my view I upload a text file and process it with a controller method with this signature:
[HttpPost]
public ActionResult FromCSV(HttpPostedFileBase file, string platform)
I get a stream from the uploaded file as file.InputStream and read it using a standard StreamReader
using (var sr = new StreamReader(file.InputStream))
{
...
}
The problem is that this only works for UTF text files. When I have a text file in Windows-1250, the characters get messed up. I can work with Windows-1250 encoded text files when I explicitly specify the encoding:
using (var sr = new StreamReader(file.InputStream, Encoding.GetEncoding(1250)))
{
...
}
My problem is that I need to support both UTF and Windows-1250 encoded files, so I need a way to detect the encoding of the submitted file.
Decoding a Windows-1250 file as UTF-8 with an exception fallback is extremely likely to throw (and if it doesn't, the file most likely uses only the ASCII subset, so it doesn't matter which encoding decodes it). So you could try each encoding in turn, like this:
Encoding[] encodings = new Encoding[]{
Encoding.GetEncoding("UTF-8", new EncoderExceptionFallback(), new DecoderExceptionFallback()),
Encoding.GetEncoding(1250, new EncoderExceptionFallback(), new DecoderExceptionFallback())
};
String result = null;
foreach( Encoding enc in encodings ) {
try {
result = enc.GetString( fileAsByteArray );
break;
}
catch( DecoderFallbackException ) {
// Not valid in this encoding; try the next one.
}
}
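A self-contained variant of the same idea, as a sketch: try strict UTF-8 first and fall back to a caller-supplied encoding. DecodeUtf8OrFallback is a made-up helper name. The demo uses iso-8859-1 as the fallback only because it is available everywhere; on .NET Core and later, code page 1250 needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first, as far as I know.

```csharp
using System;
using System.Text;

public static class EncodingFallbackSketch
{
    // Strict UTF-8: throws DecoderFallbackException on invalid bytes instead of emitting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string DecodeUtf8OrFallback(byte[] data, Encoding fallback)
    {
        try
        {
            return StrictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the legacy single-byte encoding.
            return fallback.GetString(data);
        }
    }

    public static void Main()
    {
        Encoding legacy = Encoding.GetEncoding("iso-8859-1");

        byte[] utf8Bytes = { 0x63, 0x61, 0x66, 0xC3, 0xA9 }; // "café" in UTF-8
        byte[] legacyBytes = { 0x63, 0x61, 0x66, 0xE9 };     // "café" in iso-8859-1

        Console.WriteLine(DecodeUtf8OrFallback(utf8Bytes, legacy) == "caf\u00E9");   // True
        Console.WriteLine(DecodeUtf8OrFallback(legacyBytes, legacy) == "caf\u00E9"); // True
    }
}
```

In the controller, the bytes would come from reading file.InputStream fully before decoding.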

how to read special character like é, â and others in C#

I can't read those special characters.
I tried it like this.
First way:
string xmlFile = File.ReadAllText(fileName);
Second way:
FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
StreamReader r = new StreamReader(fs);
string s = r.ReadToEnd();
But neither statement handles those special characters.
How should I read the file?
UPDATE:
I also tried all the encodings with
string xmlFile = File.ReadAllText(fileName, Encoding. );
but it still doesn't handle those special characters.
There is no such thing as a "special character". Those are most likely extended ASCII characters from the Latin-1 set (iso-8859-1).
You can read them by supplying the encoding explicitly to the stream reader (otherwise it will assume UTF-8):
using (StreamReader r = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
r.ReadToEnd();
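A small sketch of the difference, writing two Latin-1 bytes to a temp file (the path and the ReadBothWays helper are throwaway assumptions) and reading them back both ways:

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1ReadSketch
{
    // Reads the file with the default (UTF-8) decoder and with an explicit iso-8859-1 decoder.
    public static string[] ReadBothWays(string path)
    {
        string asUtf8 = File.ReadAllText(path);
        string asLatin1 = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
        return new[] { asUtf8, asLatin1 };
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "latin1-sketch.txt");
        File.WriteAllBytes(path, new byte[] { 0xE9, 0xE2 }); // "éâ" in iso-8859-1

        string[] results = ReadBothWays(path);

        // Without an encoding the bytes are invalid UTF-8 and become U+FFFD.
        Console.WriteLine(results[0].Contains("\uFFFD"));    // True
        // With the right encoding each byte maps to the intended character.
        Console.WriteLine(results[1] == "\u00E9\u00E2");     // True

        File.Delete(path);
    }
}
```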
StreamReader sr = new StreamReader(stream, Encoding.UTF8);
This worked for me :
var json = System.IO.File.ReadAllText(@"././response/response.json" , System.Text.Encoding.GetEncoding("iso-8859-1"));
You have to tell the StreamReader that you are reading Unicode (Encoding.Unicode means UTF-16 little-endian), like so:
StreamReader sr = new StreamReader(stream, Encoding.Unicode);
If your file uses some other encoding, specify that as the second parameter.
I had to "find" the encoding of the file first
//try to "find" the encoding, if not found, use UTF8
var enc = GetEncoding(filePath)??Encoding.UTF8;
var text = File.ReadAllText(filePath, enc );
(please refer to this answer to get the GetEncoding function)
If you can modify the file in question, you can save it with encoding.
I had a json file that I had created (normally) in VS, and I was having the same problem. Rather than specify the encoding when reading the file (I was using System.IO.File.ReadAllText which defaults to UTF8), I resaved the file (File->Save As) and on the Save button, I clicked the arrow and chose "Save with Encoding", then chose "Unicode (UTF-8 with signature) - Codepage 65001".
Problem solved, no need to specify the encoding when reading the file.
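For reference, the "signature" is the three-byte UTF-8 byte order mark. A sketch, assuming a throwaway temp file and a made-up WriteWithSignature helper, showing that it is written at the start of the file and skipped again by File.ReadAllText:

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8SignatureSketch
{
    // Writes text as UTF-8 with the signature (BOM) and returns the raw file bytes.
    public static byte[] WriteWithSignature(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "bom-sketch.json");
        string json = "{\"name\":\"\u00E9\"}";
        byte[] raw = WriteWithSignature(path, json);

        Console.WriteLine(BitConverter.ToString(raw, 0, 3)); // EF-BB-BF
        Console.WriteLine(File.ReadAllText(path) == json);   // True

        File.Delete(path);
    }
}
```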

how to read this text from file

How can I read the text below?
‰€ˆ‡‰�#îõ‘þüŠ ꑯõù ‚†ƒ� -#�ª÷‘þü “‘
ª“îù )øþ¦ùý ¤ª—ùý î‘õ•þø—¤(#•¢þ¢�
ø¤÷¢ù ꑯõù îõ‘þü#^a—ú¤�ö^b•¦øû÷¢ð‘ö
¤�ù ¢�÷©^cˆˆƒ�#‚€� «.: õ¬ø¤Š
›¢øñ#…�…ˆí/Š…/
…€�…Š}TK{^aˆˆƒ�#†/„€€#}BF{#ª“îùû‘ý
î‘õ•þø—¤ý#ª“îùû‘ý î‘õ•þø—¤ý --
�¥õøöû‘#^c
I use this code, but it does not display all the characters:
FileStream fs = new FileStream(open.FileName, FileMode.Open, FileAccess.Read);
System.Text.Encoding enc = System.Text.Encoding.UTF8 ;
byte[] data = new byte[fs.Length];
fs.Read(data, 0, data.Length);
string text = enc.GetString(data);
and it shows this text:
†‰€ˆ‡‰Â�#îõâ� �˜Ã¾Ã¼Å
ꑯõù ‚†ƒ� -#�ª÷‘þü
“‘ ª“îù )øþ¦ùý
¤ª—ùý î‘õ•þø—¤(#�
�¢þ¢� ø¤÷¢ù ꑯõù
îõ‘þü#^a—ú¤�ö
^b•¦øû÷¢ð‘ö ¤�ù
¢Â�֩^cˆˆƒÂ�#‚₠¬Â� «.:
õ¬ø¤Š›¢øñ#…Â�…ˆí/Å
…/ …€Â�…Š}TK{^aˆˆƒÂ�#â€
/„€€#}BF{#ª“îùû ‘ý
î‘õ•þø—¤ý#�
�“îùû‘ý î‘õ
This is DOS text,
and the candidate encodings for this text are:
IBM037
IBM437
IBM500
ASMO-708
DOS-720
ibm737
ibm775
ibm850
ibm852
IBM855
ibm857
IBM00858
IBM860
ibm861
DOS-862
IBM863
IBM864
IBM865
cp866
ibm869
IBM870
windows-874
cp875
shift_jis
gb2312
ks_c_5601-1987
big5
IBM1026
IBM01047
IBM01140
IBM01141
IBM01142
IBM01143
IBM01144
IBM01145
IBM01146
IBM01147
IBM01148
IBM01149
utf-16
unicodeFFFE
windows-1250
windows-1251
Windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
Johab
macintosh
x-mac-japanese
x-mac-chinesetrad
x-mac-korean
x-mac-arabic
x-mac-hebrew
x-mac-greek
x-mac-cyrillic
x-mac-chinesesimp
x-mac-romanian
x-mac-ukrainian
x-mac-thai
x-mac-ce
x-mac-icelandic
x-mac-turkish
x-mac-croatian
utf-32
utf-32BE
x-Chinese-CNS
x-cp20001
x-Chinese-Eten
x-cp20003
x-cp20004
x-cp20005
x-IA5
x-IA5-German
x-IA5-Swedish
x-IA5-Norwegian
us-ascii
x-cp20261
x-cp20269
IBM273
IBM277
IBM278
IBM280
IBM284
IBM285
IBM290
IBM297
IBM420
IBM423
IBM424
x-EBCDIC-KoreanExtended
IBM-Thai
koi8-r
IBM871
IBM880
IBM905
IBM00924
EUC-JP
x-cp20936
x-cp20949
cp1025
koi8-u
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-13
iso-8859-15
x-Europa
iso-8859-8-i
iso-2022-jp
csISO2022JP
iso-2022-jp
iso-2022-kr
x-cp50227
euc-jp
EUC-CN
euc-kr
hz-gb-2312
GB18030
x-iscii-de
x-iscii-be
x-iscii-ta
x-iscii-te
x-iscii-as
x-iscii-or
x-iscii-ka
x-iscii-ma
x-iscii-gu
x-iscii-pa
utf-7
utf-8
To read the file, you need to know which encoding was used to write it.
If you don't know, you can iterate through all the encodings and see whether you find one that works.
const string FileName = "FileName";
foreach (var encodingInfo in Encoding.GetEncodings())
{
try
{
var encoding = encodingInfo.GetEncoding();
var text = File.ReadAllText(FileName, encoding);
// Preview at most 20 characters so short files don't throw.
Console.WriteLine("{0} - {1}", encodingInfo.Name, text.Substring(0, Math.Min(20, text.Length)));
// put break point and check if text is readable here
}
catch (Exception)
{
Console.WriteLine("Failed: {0}", encodingInfo.Name);
}
}
Disclaimer: this assumes it is a text file and that the file isn't huge.
Well, it looks like you're trying to open a .dat file, which by the looks of it was probably written in a binary format.
Try the following code (this is Java):
File readThis = new File("file directory");
byte[] aByte = new byte[(int)readThis.length()];
FileInputStream Fis = new FileInputStream(readThis);
Fis.read(aByte);
System.out.println("Contents: " + java.util.Arrays.toString(aByte));
Fis.close();
Let me know how it goes :)
