How to read Hebrew text using System.IO.FileStream? - c#

Am I missing something or does System.IO.FileStream not read Unicode text files containing Hebrew?
public CSVReader(Stream s, Encoding enc)
{
    this.stream = s;
    if (!s.CanRead)
    {
        throw new CSVReaderException("Could not read the given CSV stream!");
    }
    reader = (enc != null) ? new StreamReader(s, enc) : new StreamReader(s);
}
Thanks
Jonathan

The FileStream is nothing but a byte stream, which is language/charset agnostic. You need an encoding to convert bytes into characters (including Hebrew) and back.
There are several classes to help you with that, the most important being System.Text.Encoding, System.IO.StreamReader, and System.IO.StreamWriter.
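As a minimal sketch of that idea: wrap the stream (or file) in a StreamReader with an explicit encoding. The file name and the UTF-8 assumption below are illustrative; if the file was saved as UTF-16 pass Encoding.Unicode, and for legacy Windows Hebrew files Encoding.GetEncoding(1255).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class HebrewCsvExample
{
    // Reads all lines from a text file, decoding bytes to characters
    // with the supplied encoding (UTF-8 covers Hebrew just fine).
    public static string[] ReadLines(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            var lines = new List<string>();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
            return lines.ToArray();
        }
    }
}
```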

The stream might be closed.
From MSDN on Stream.CanRead:

If a class derived from Stream does not support reading, calls to the Read, ReadByte, and BeginRead methods throw a NotSupportedException.
If the stream is closed, this property returns false.

I'd wager that you're simply not using the right encoding. Chances are you're passing in Encoding.Default or Encoding.ASCII when you should actually be passing Encoding.UTF8 (most common) or Encoding.Unicode to that method.
If you're sure that you're using the right encoding, post the full code and an example of the file.

Related

C# Streamwriter - Problem with the encoding

I have some product data that I want to write into a csv file. First I have a function that writes the header into the csv file:
using (StreamWriter streamWriter = new StreamWriter(path))
{
    string[] headerContent = { "banana", "apple", "orange" };
    string header = string.Join(",", headerContent);
    streamWriter.WriteLine(header);
}
Another function goes over the products and writes their data into the csv file:
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}
When I write the products with FileMode.Open and Encoding.UTF8, the encoding is set correctly in the file, meaning that special characters in German or French show up correctly. But the problem is that this overwrites my header.
The solution I tried was to use FileMode.Append instead of FileMode.Open, which works, but then for some reason the encoding just gets ignored.
What could I do to append the data while maintaining the encoding? And why is this happening in the first place?
EDIT:
Example with FileMode.Open:
Fußpflegecreme
Example with FileMode.Append:
FuÃŸpflegecreme
The important question here is: what does the file actually contain; for example, if I use the following:
using System.Text;

string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
then the output is (re-formatted a little):
EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A
The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8 encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem here is that in your version there is no BOM, but the rest of the data is correct. Some tools, in the absence of a BOM, will choose the wrong encoding. Other tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and may also, because they're clearly not using the BOM, choose the wrong encoding). The preferred option is to specify the encoding explicitly when reading the file, and to use a tool that can handle both the presence and the absence of a BOM.
Whether or not to include a BOM (especially in the case of UTF-8) is a complex question, and there are pros/cons of each - and there are tools that will work better, or worse, with each. A lot of UTF-8 text files do not include a BOM, but: there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM - but how that is interpreted (in either case) is up to the specific tool that you're using to read the data (and how that tool is configured).
I think this will be solved once you explicitly choose the UTF-8 encoding when writing the header as well. That will prefix the file with a BOM.
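A sketch of that suggestion, with placeholder file name and row content: write the header with an explicit Encoding.UTF8 (which makes StreamWriter emit the BOM on create), then append with the same encoding.

```csharp
using System.IO;
using System.Text;

static class CsvHeaderFix
{
    // Create the file with an explicit UTF-8 encoding so the BOM is
    // written once, at the start; subsequent appends with the same
    // encoding keep the bytes consistently UTF-8.
    public static void WriteCsv(string path)
    {
        using (var writer = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
        {
            writer.WriteLine(string.Join(",", "banana", "apple", "orange"));
        }
        using (var writer = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
        {
            writer.WriteLine("Fußpflegecreme");
        }
    }
}
```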

String to zip file

I use a webservice that returns a zip file as a string, not as bytes as I expected. I tried writing it to disk, but when I open it, it tells me that it is corrupt. What am I doing wrong?
string cCsv = oResponse.fileCSV; // this is the result from the webservice
MemoryStream ms = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(cCsv));
using (FileStream file = new FileStream("test.zip", FileMode.Create, FileAccess.Write))
{
    ms.WriteTo(file);
}
ms.Close();
I'm not sure what kind of encoding the string is in, but assuming UTF-8, the following should work. UTF-16 would be another guess.
string cCsv = oResponse.fileCSV;
using (BinaryWriter bw = new BinaryWriter(File.Create("test.zip")))
{
    bw.Write(System.Text.Encoding.UTF8.GetBytes(cCsv));
}
It'd be informative to look at the characters and the raw string itself being returned.
Edit
Per Frank's answer, the correct encoding is base64, which of course makes sense because it's binary data stored as a string.
Also, per Frank's answer, if the only action is to directly write a single byte array, then File.WriteAllBytes is more compact.
OK, I solved the problem:
File.WriteAllBytes("testbase64.zip", Convert.FromBase64String(cCsv));

File is getting saved in UTF16 format. What might be the issue?

I am trying to save an XML string as a file, but the file is getting saved in UTF16 format. What might be the issue?
private void SaveFile(string xmlData, string fileName)
{
    File.WriteAllText(fileName, xmlData, Encoding.UTF8);
}
Even though I have specified the encoding as UTF8, the file is still getting saved in UTF16 format.
I'm guessing you have done something like:
string xml;
using (var sw = new StringWriter())
{
    xmlSerializer.Serialize(sw, obj);
    xml = sw.ToString();
}
in which case yes, the xml will internally declare utf-16, because the serializer has correctly determined that it is writing to something that is inherently utf-16 (a .NET string). There are ways to work around this in the writer (XmlWriterSettings.Encoding, for example), but a better approach would be either:
to write/serialize directly to the file, for example via a StreamWriter onto the file
to write/serialize to a MemoryStream rather than a StringWriter, since MemoryStream has no inherent utf-16 encoding
The encoding of a file is not quite the same thing as the declared encoding in the xml; if the xml as a string says utf-16, that won't magically change just because you write the string out as utf-8.
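A sketch of the first suggestion, serializing straight to the file. The Product type and file name are hypothetical stand-ins for whatever "obj" really is; the point is that the serializer sees a UTF-8 StreamWriter and so declares encoding="utf-8".

```csharp
using System.IO;
using System.Xml.Serialization;

public class Product // hypothetical type standing in for "obj"
{
    public string Name { get; set; }
}

public static class XmlToFileExample
{
    // Serialize directly onto a StreamWriter over the file; StreamWriter
    // defaults to UTF-8, so the XML declaration matches the file bytes.
    public static void Save(string path, Product product)
    {
        var serializer = new XmlSerializer(typeof(Product));
        using (var sw = new StreamWriter(path))
        {
            serializer.Serialize(sw, product);
        }
    }
}
```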

What is ADODB.Stream?

What exactly is it, or what was it used for, given that it's a COM interop type?
Here, this is the method where I use it:
public void SaveAttachmentMime(String fileName, CDO.Message message)
{
    ADODB.Stream stream = message.BodyPart.GetStream();
    String messageString = stream.ReadText(stream.Size);
    StreamWriter outputStream = new StreamWriter(fileName);
    outputStream.Write(messageString);
    outputStream.Flush();
    outputStream.Close();
}
The ADODB.Stream object was used to read files and other streams. What it does is part of what the StreamReader, StreamWriter, FileStream and Stream does in the .NET framework.
For what the code in that method uses it for, in .NET you would use a StreamReader to read from a Stream.
Note that the code in the method only works properly if the stream contains non-Unicode data, because it uses the size in bytes to determine how many characters to read. With a Unicode encoding, some characters are encoded as several bytes, so the reader would hit the end of the stream before it had read the requested number of characters.
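A rough .NET counterpart of the method above, assuming the body part is available as a System.IO.Stream and the text is UTF-8 (both assumptions are mine; adjust the encoding to match the actual source):

```csharp
using System.IO;
using System.Text;

public static class StreamTextSaver
{
    // Copy text from any Stream to a file. StreamReader counts
    // characters rather than bytes, so multi-byte UTF-8 sequences
    // are decoded correctly (unlike ReadText(stream.Size) above).
    public static void SaveText(string fileName, Stream source)
    {
        using (var reader = new StreamReader(source, Encoding.UTF8))
        using (var writer = new StreamWriter(fileName))
        {
            writer.Write(reader.ReadToEnd());
        }
    }
}
```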
It is a COM object, which is used to represent a stream of data or text. The data can be binary. If I recall correctly, it implements the IStream interface, which stores data in a structured storage object. You can find the interop representation of the interface in System.Runtime.InteropServices.ComTypes.IStream.

C# GZipStream to String

I am in need of a way to write a GZipStream to a string.
I am using:
GZipStream Decompress = new GZipStream(inFile, CompressionMode.Decompress);
I have tried several methods, but can't figure it out.
Does anyone have any ideas?
Many thanks,
Brett
You have a decompressing GZipStream, so you need to read data from it. The easiest way is to wrap the GZipStream with a StreamReader which has a ReadToEnd method returning a string.
Something like:
string res;
using (var decompress = new GZipStream(inFile, CompressionMode.Decompress))
using (var sr = new StreamReader(decompress))
{
    res = sr.ReadToEnd();
}
(The using statements ensure that inFile is closed and any other resources are freed.)
NB this does assume that inFile contains text encoded as UTF-8 or UTF-16. Binary content or another text encoding could cause problems (you can override the encoding with a different StreamReader constructor).
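If the payload is in some other encoding, that last point can be sketched as a variant that takes the encoding from the caller (the choice of encoding is an assumption the caller must make about the original data):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipTextReader
{
    // Decompress a gzip stream and decode the text with an explicit,
    // caller-supplied encoding instead of the StreamReader default.
    public static string ReadAllText(Stream inFile, Encoding encoding)
    {
        using (var decompress = new GZipStream(inFile, CompressionMode.Decompress))
        using (var sr = new StreamReader(decompress, encoding))
        {
            return sr.ReadToEnd();
        }
    }
}
```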
