Encoding string issue reading a CSV file in C#

Encoding string issue reading a CSV file in C# - c#

I am currently developing a Windows Phone 8 application in which one I have to download a CSV file from a web-service and convert data to a C# business object (I do not use a library for this part).
Download the file and convert data to a C# business object is not an issue using RestSharp.Portable, StreamReader class and MemoryStream class.
The issue I face to is about the bad encoding of the string fields.
With the library RestSharp.Portable, I retrieve the csv file content as a byte array and then convert data to string with the following code (where response is a byte array) :
using (var streamReader = new StreamReader(new MemoryStream(response)))
{
while (streamReader.Peek() >= 0)
{
var csvLine = streamReader.ReadLine();
}
}
but instead of "Jérome", my csvLine variable contains J�rome. I tried several things to obtain Jérome but without success like :
using (var streamReader = new StreamReader(new MemoryStream(response), true))
or
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.UTF8))
When I open the CSV file with a simple notepad software like notepad++ I obtain Jérome only when the file is encoding in ANSI. But if I try the following code in C# :
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("ANSI")))
I have the following exception :
'ANSI' is not a supported encoding name.
Can someone help me to decode correctly my CSV file ?
Thank you in advance for your help or advices !

You need to pick one of these.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
If you don't know, you can try to guess it. Guessing isn't a perfect solution, per the answer here.
You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results.

From the link of Lawtonfogle I tried to use
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("Windows-1252")))
But I had the following error :
'Windows-1252' is not a supported encoding name.
Searching why on the internet, I finally found following thread with the following answer that works for me.
So here the working solution in my case :
using (var streamReader = new StreamReader(new MemoryStream(response), Encoding.GetEncoding("ISO-8859-1")))
{
while (streamReader.Peek() >= 0)
{
var csvLine = streamReader.ReadLine();
}
}

Related

Decompress Stream to String using SevenZipSharp

I'd like to compress a string using SevenZipSharp and have cobbled together a C# console application (I'm new to C#) using the following code, (bits and pieces of which came from similar questions here on SO).
The compress part seems to work (albeit I'm passing in a file instead of a string), output of the compressed string to the console looks like gibberish but I'm stuck on the decompress...
I'm trying to do the same thing as here (I think):
https://stackoverflow.com/a/4305399/3451115
https://stackoverflow.com/a/45861659/3451115
https://stackoverflow.com/a/36331690/3451115
Appreciate any help, ideally the console will display the compressed string followed by the decompressed string.
Thanks :)
using System;
using System.IO;
using SevenZip;
namespace _7ZipWrapper
{
public class Program
{
public static void Main()
{
SevenZipCompressor.SetLibraryPath(#"C:\Temp\7za64.dll");
SevenZipCompressor compressor = new SevenZipCompressor();
compressor.CompressionMethod = CompressionMethod.Ppmd;
compressor.CompressionLevel = SevenZip.CompressionLevel.Ultra;
compressor.ScanOnlyWritable = true;
var compStream = new MemoryStream();
var decompStream = new MemoryStream();
compressor.CompressFiles(compStream, #"C:\Temp\a.txt");
StreamReader readerC = new StreamReader(compStream);
Console.WriteLine(readerC.ReadToEnd());
Console.ReadKey();
// works up to here... below here output to consol is: ""
SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
extractor.ExtractFile(0, decompStream);
StreamReader readerD = new StreamReader(decompStream);
Console.WriteLine(readerD.ReadToEnd());
Console.ReadKey();
}
}
}

The result of compression is binary data - it isn't a string. If you try to read it as a string, you'll just see garbage. That's to be expected - you shouldn't be treating it as a string.
The next problem is that you're trying to read from compStream twice, without "rewinding" it first. You're starting from the end of the stream, which means there's no data for it to decompress. If you just add:
compStream.Position = 0;
before you create the extractor, you may well find it works immediately. You may also need to rewind the decompStream before reading from it. So you'd have code like this:
// Rewind to the start of the stream before decompressing
compStream.Position = 0;
SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
extractor.ExtractFile(0, decompStream);
// Rewind to the start of the decompressed stream before reading
decompStream.Position = 0;

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

I am dealing with files in many formats, including Shift-JIS and UTF8 NoBOM. Using a bit of language knowledge, I can detect if the files are being interepeted correctly as UTF8 or ShiftJIS, but if I detect that the file is not of the type I read in, I was wondering if there is a way to just reinterperet my in-memory array without having to re-read the file with a new encoding specified.
Right now, I read in the file assuming Shift-JIS as such:
using (StreamReader sr = new StreamReader(path, Encoding.GetEncoding("shift-jis"), true))
{
String line = sr.ReadToEnd();
// Detection must be done AFTER you read from the file. Silly rabbit.
fileFormatCertain = !sr.CurrentEncoding.Equals(Encoding.GetEncoding("shift-jis"));
codingFromBOM = sr.CurrentEncoding;
}
and after I do my magic to determine if it is either a known format (has a BOM) or that the data makes sense as Shift-JIS, all is well. If the data is garbage though, then I am re-reading the file via:
using (StreamReader sr = new StreamReader(path, Encoding.UTF8))
{
String line = sr.ReadToEnd();
}
I am trying to avoid this re-read step and reinterperet the data in memory if possible.
Or is magic already happening and I am needlessly worrying about double I/O access?

var buf = File.ReadAllBytes(path);
var text = Encoding.UTF8.GetString(buf);
if (text.Contains("\uFFFD")) // Unicode replacement character
{
text = Encoding.GetEncoding(932).GetString(buf);
}

Creating a stream out of an httppostedfile always returns an empty file

I have the following code:
file.InputStream.Seek(0,0);
Stream s = file.InputStream
s.Position = 0;
s = File.Create(path);
My goal is for the final output to be a duplicate of the original file. Using file.SaveAs(path) successfully does this. However converting it to a stream and then trying to create the file does not. Is there something completely obvious that I'm missing or is there a bigger problem?

The problem is this line:
s = File.Create(path)
That doesn't do what you want it to. That's creating a new stream - at which point you're ignoring the old one entirely.
You probably want something like:
using (var output = File.Create(path))
{
file.InputStream.CopyTo(output);
}

Read a PDF into a string or byte[] and write that string/byte[] back to disk

I am having a problem in my app where it reads a PDF from disk, and then has to write it back to a different location later.
The emitted file is not a valid PDF anymore.
In very simplified form, I have tried reading/writing it using
var bytes = File.ReadAllBytes(#"c:\myfile.pdf");
File.WriteAllBytes(#"c:\output.pdf", bytes);
and
var input = new StreamReader(#"c:\myfile.pdf").ReadToEnd();
File.WriteAllText("c:\output.pdf", input);
... and about 100 permutations of the above with various encodings being specified. None of the output files were valid PDFs.
Can someone please lend a hand? Many thanks!!

In C#/.Net 4.0:
using (var i = new FileStream(#"input.pdf", FileMode.Open, FileAccess.Read))
using (var o = File.Create(#"output.pdf"))
i.CopyTo(o);
If you insist on having the byte[] first:
using (var i = new FileStream(#"input.pdf", FileMode.Open, FileAccess.Read))
using (var ms = new MemoryStream())
{
i.CopyTo(ms);
byte[] rawdata = ms.GetBuffer();
using (var o = File.Create(#"output.pdf"))
ms.CopyTo(o);
}
The memory stream may need to be ms.Seek(0, SeekOrigin.Origin) or something like that before the second CopyTo. look it up, or try it out

You're using File.WriteAllText to write your file out.
Try File.WriteAllBytes.

Saving unicode characters as "“" into a file; C#

I have an xml file (converted from xfdl) which contains something like:
<custom:popUp xfdl:compute="toggle(activated,'off','on') == '1' ? viewer.messageBox('o Once you click ..... page.
o When you use the “Create ” function in.......Portal.','Information'):''">
I load it and save it using...
XmlDocument xmlOut = new XmlDocument(); //note: not read only
FileStream outfs = new FileStream(tempOutXmlFileName, FileMode.Open, FileAccess.Read,
FileShare.ReadWrite);
xmlOut.Load(outfs);
xmlOut.Save(tempOutXmlFileName);
outfs.Close();
This process converts some of the unicode instructions into actual characters which completely messes up the xml/xfdl parsing as there are now quotation marks where quotation marks shouldn't be.
Does anybody know a way I can save the file with all the lovely “ characters intact?
Thank you.
Well, after fiddling around for a bit and getting the xml->xfdl conversion working better, I ran into a new problem.
The solution below seems to work and all the parsing of the xml is correct, but the program to read the xfdl file doesn't seem to like when I encode it using UTF-8 and wants the encoding to be ISO-8859-1.
Any ideas?

Using StreamReader and StreamWriter should help. To be clear you are trying to read from and write to the same file? I added some nice using statements aswell.
XmlDocument xmlOut = new XmlDocument();
//note: not read only
using (FileStream outfs = new FileStream(tempOutXmlFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader reader = new StreamReader(outfs, Encoding.UTF8))
{
xmlOut.Load(reader);
}
using (StreamWriter writer = new StreamWriter(tempOutXmlFileName, false, Encoding.UTF8))
{
xmlOut.Save(writer);
}
I set append to false in the StreamWriter, seems to make sense.

Turns out that reading and writing the files byte by byte solved the problem since the writer never got the opportunity to do any interpretation on the content.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Encoding string issue reading a CSV file in C# - c#

Related

Decompress Stream to String using SevenZipSharp

Convert a file froM Shift-JIS to UTF8 No BOM without re-reading from disk

Creating a stream out of an httppostedfile always returns an empty file

Read a PDF into a string or byte[] and write that string/byte[] back to disk

Saving unicode characters as "“" into a file; C#

Categories

Resources