StreamReader "detect encoding" Strange Behavior - c#

I'm just doing this:
using (var f = File.Open("File.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    using (var sw = new StreamWriter(f, Encoding.ASCII))
    {
        sw.WriteLine("Test");
    }
}
using (var f = File.Open("File.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    using (var sr = new StreamReader(f, detectEncodingFromByteOrderMarks: true))
    {
        var r = sr.ReadLine();
        var e = sr.CurrentEncoding;
        //e = UTF8Encoding ???????? !!!!!
    }
}
Why does the stream not detect the encoding correctly?

See the MSDN link on StreamReader.CurrentEncoding
The value can be different after the first call to any Read method of StreamReader, since encoding autodetection is not done until the first call to a Read method.
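For example, with a file that does carry a BOM (written here as UTF-16; Utf16File.txt is just a made-up name for this sketch), the detected encoding only shows up after the first read:

File.WriteAllText("Utf16File.txt", "Test", Encoding.Unicode); // writes an FF FE BOM

using (var sr = new StreamReader("Utf16File.txt", detectEncodingFromByteOrderMarks: true))
{
    var before = sr.CurrentEncoding; // still the constructor default (UTF-8)
    sr.ReadLine();                   // detection happens here
    var after = sr.CurrentEncoding;  // now reports UTF-16 (Encoding.Unicode)
}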

I found this, from MSDN:
The detectEncodingFromByteOrderMarks parameter detects the encoding by
looking at the first three bytes of the stream. It automatically
recognizes UTF-8, little-endian Unicode, and big-endian Unicode text
if the file starts with the appropriate byte order marks. Otherwise,
the UTF8Encoding is used. See the Encoding.GetPreamble method for more
information.
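Since ASCII has no byte order mark, there is nothing for the reader to detect in the file written above. You can see what the reader is actually looking for by comparing the preambles:

var asciiBom   = Encoding.ASCII.GetPreamble();            // empty - ASCII has no BOM
var utf8Bom    = Encoding.UTF8.GetPreamble();             // EF BB BF
var utf16LeBom = Encoding.Unicode.GetPreamble();          // FF FE
var utf16BeBom = Encoding.BigEndianUnicode.GetPreamble(); // FE FF

With nothing to match, the reader falls back to UTF8Encoding, which is harmless here: ASCII text is also valid BOM-less UTF-8.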

Related

c# encoding issues with CsvHelper

var orders = new List<Order>();
....
orders.Add(...)
string csvstring;
using (var ms = new MemoryStream())
using (var wr = new StreamWriter(ms, Encoding.UTF8))
using (var csvWriter = new CsvWriter(wr, CultureInfo.InvariantCulture, false))
{
    csvWriter.WriteRecords(orders);
    csvstring = Encoding.UTF8.GetString(ms.ToArray());
}
And then
sftp.WriteAllText(fileNameAbsolutePath, csvstring, Encoding.UTF8);
The content of the file created on the SFTP server has "feff" (a byte order mark) at the beginning:
" orders.csv: text/plain; charset=utf-8".
This is the first part of the problem. What I am looking to do is convert this UTF-8 to ISO-8859-1, as the charset expected in the final file is ISO-8859-1.
Maybe I should do something like this?
byte[] bytesSS = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("ISO-8859-1"), Encoding.UTF8.GetBytes(csvstring));
string s1 = Encoding.GetEncoding("ISO-8859-1").GetString(bytesSS, 0, bytesSS.Length);
I tried to google "<feff>" but didn't quite get the concept of the BOM or how to fix this.
I have no idea which SFTP class you use as .NET itself doesn't have an SFTP client. I'll assume you use this one simply because it came first in a Google search for sftp WriteAllText.
If you want to create a file with a specific encoding, specify it in the StreamWriter constructor instead of UTF8:
using (var ms = new MemoryStream())
using (var wr = new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1")))
using (var csvWriter = new CsvWriter(wr, CultureInfo.InvariantCulture, false))
{
    csvWriter.WriteRecords(orders);
}
On the other hand, UTF8 and Latin1 (or any codepage) use the exact same values for characters in the range 0-127. If you only ever send English text, there won't be any difference no matter which encoding you use. If the actual requirement is to create a UTF8 file without a BOM, you can specify that by using the appropriate UTF8Encoding constructor:
var utf8NoBom = new UTF8Encoding(false);
using (var ms = new MemoryStream())
using (var wr = new StreamWriter(ms, utf8NoBom))
using (var csvWriter = new CsvWriter(wr, CultureInfo.InvariantCulture, false))
{
    csvWriter.WriteRecords(orders);
}
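A minimal way to check the difference, as a sketch (it assumes the .NET 4.5+ StreamWriter overload with a leaveOpen flag):

using (var ms = new MemoryStream())
{
    using (var wr = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
    {
        wr.Write("Test");
    }
    byte[] bytes = ms.ToArray();
    // bytes starts with 0x54 ('T'); with Encoding.UTF8 the first three bytes
    // would have been the BOM 0xEF 0xBB 0xBF.
}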
All SFTP clients have (or should have) a way to upload data using a stream. This means you can use Stream.CopyTo to copy data from the memory stream to the upload stream. Assuming OpenWrite is available, you can modify the code to:
using (var ms = new MemoryStream())
{
    // leaveOpen: true (available from .NET 4.5) keeps the MemoryStream usable
    // after the writer is disposed
    using (var wr = new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"), 1024, leaveOpen: true))
    using (var csvWriter = new CsvWriter(wr, CultureInfo.InvariantCulture, false))
    {
        csvWriter.WriteRecords(orders);
    }
    ms.Position = 0;
    using (var stream = sftp.OpenWrite(somePath))
    {
        ms.CopyTo(stream);
    }
}
When CsvHelper completes, the MemoryStream's position is at the end of the stream, so CopyTo wouldn't copy anything. Setting ms.Position = 0 moves the position back to the start of the stream.
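You can see the effect in isolation with two plain memory streams (just a sketch):

using (var src = new MemoryStream(Encoding.ASCII.GetBytes("hello")))
using (var dst = new MemoryStream())
{
    src.Seek(0, SeekOrigin.End); // simulate a stream that was just written to
    src.CopyTo(dst);             // copies nothing, dst.Length == 0
    src.Position = 0;
    src.CopyTo(dst);             // now dst.Length == 5
}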

Converting DotNetZip memory stream to string

I am trying to read a file within a zip to check if that file has a certain string in it, but I can't seem to get the "file" (a memory stream) into a string in order to search it.
When I use the following code, "stringOfStream" is always blank. What am I doing wrong? The reader always has a length, and ReadByte returns different numbers.
using (ZipFile zip = ZipFile.Read(currentFile.FullName))
{
    ZipEntry e = zip[this.searchFile.Text];
    using (MemoryStream reader = new MemoryStream())
    {
        e.Extract(reader);
        var stringReader = new StreamReader(reader);
        var stringOfStream = stringReader.ReadToEnd();
    }
}
Thanks
I think when you call Extract, the stream's position ends up at the end of the extracted data, so you need to reposition it before reading.
Can you try this, please:
using (ZipFile zip = ZipFile.Read(currentFile.FullName))
{
    ZipEntry e = zip[this.searchFile.Text];
    using (MemoryStream reader = new MemoryStream())
    {
        e.Extract(reader);
        reader.Position = 0;
        var stringReader = new StreamReader(reader);
        var stringOfStream = stringReader.ReadToEnd();
    }
}
Check if it works or not.
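Wrapped up as a small helper (same idea; ReadEntryAsString is just a name made up for this sketch):

static string ReadEntryAsString(string zipPath, string entryName)
{
    using (ZipFile zip = ZipFile.Read(zipPath))
    using (var ms = new MemoryStream())
    {
        ZipEntry e = zip[entryName];
        e.Extract(ms);   // Extract leaves the stream positioned at the end
        ms.Position = 0; // rewind before reading
        using (var reader = new StreamReader(ms))
        {
            return reader.ReadToEnd();
        }
    }
}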

Create and write to a text file inmemory and convert to byte array in one go

How can I create a .csv file implicitly/automatically (using the correct method), add text to that file while it exists only in memory, and then convert the in-memory data to a byte array?
string path = @"C:\test.txt";
File.WriteAllLines(path, GetLines());
byte[] bytes = System.IO.File.ReadAllBytes(path);
With that approach I always create a file (good), write into it (good), then close it (bad), then open the file again from its path and read it from the hard disk (bad).
How can I improve that?
UPDATE
One nearly good approach would be:
using (var fs = new FileStream(@"C:\test.csv", FileMode.Create, FileAccess.ReadWrite))
{
    using (var memoryStream = new MemoryStream())
    {
        fs.CopyTo(memoryStream);
        return memoryStream.ToArray();
    }
}
but I am not able to write text into that filestream... just bytes...
UPDATE 2
using (var fs = File.Create(@"C:\temp\test.csv"))
{
    using (var sw = new StreamWriter(fs, Encoding.Default))
    {
        using (var ms = new MemoryStream())
        {
            String message = "Message is the correct ääüö Pi(\u03a0), and Sigma (\u03a3).";
            sw.Write(message);
            sw.Flush();
            fs.CopyTo(ms);
            return ms.ToArray();
        }
    }
}
The string message is not persisted to the test.csv file. Does anyone know why?
Write the text into a MemoryStream:
byte[] bytes = null;
using (var ms = new MemoryStream())
{
    using (TextWriter tw = new StreamWriter(ms))
    {
        tw.Write("blabla");
        tw.Flush();
        ms.Position = 0;
        bytes = ms.ToArray();
    }
}
UPDATE
Use a FileStream directly and write to the file:
using (var fs = new FileStream(@"C:\ed\test.csv", FileMode.Create, FileAccess.ReadWrite))
{
    using (TextWriter tw = new StreamWriter(fs))
    {
        tw.Write("blabla");
        tw.Flush();
    }
}
You can get a byte array from a string using encoding:
Encoding.ASCII.GetBytes(aString);
Or
Encoding.UTF8.GetBytes(aString);
But I don't know why you would want a csv as bytes. You could load the entire file to a string, add to it and then save it:
string content;
using (var reader = new StreamReader(filename))
{
    content = reader.ReadToEnd();
}
content += "x,y,z";
using (var writer = new StreamWriter(filename))
{
    writer.Write(content);
}
Update: Create a csv in memory and pass back as bytes:
var stringBuilder = new StringBuilder();
foreach (var line in GetLines())
{
    stringBuilder.AppendLine(line);
}
return Encoding.ASCII.GetBytes(stringBuilder.ToString());
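If the lines can contain non-ASCII characters (like the ääüö example above), Encoding.ASCII will silently turn them into '?'; a hedged variant using UTF-8 instead:

var sb = new StringBuilder();
foreach (var line in GetLines())
{
    sb.AppendLine(line);
}
// UTF-8 preserves umlauts and the Greek letters from the sample message
byte[] csvBytes = Encoding.UTF8.GetBytes(sb.ToString());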

Why does one method of string compression return an empty string but another one doesn't?

I'm using GZipStream to compress a string, and I've modified two different examples to see what works. The first code snippet, which is a heavily modified version of the example in the documentation, simply returns an empty string.
public static String CompressStringGzip(String uncompressed)
{
    String compressedString;
    // Convert the uncompressed source string to a stream stored in memory
    // and create the MemoryStream that will hold the compressed string
    using (MemoryStream inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)),
        outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            inStream.CopyTo(compress);
            StreamReader reader = new StreamReader(outStream);
            compressedString = reader.ReadToEnd();
        }
    }
    return compressedString;
}
When I debug it, all I can tell is that nothing is read from reader, which is why compressedString is empty. However, the second method I wrote, modified from a CodeProject snippet, is successful.
public static String CompressStringGzip3(String uncompressed)
{
    //Transform string to byte array
    String compressedString;
    byte[] uncompressedByteArray = Encoding.Unicode.GetBytes(uncompressed);
    using (MemoryStream outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            compress.Write(uncompressedByteArray, 0, uncompressedByteArray.Length);
            compress.Close();
        }
        byte[] compressedByteArray = outStream.ToArray();
        StringBuilder compressedStringBuilder = new StringBuilder(compressedByteArray.Length);
        foreach (byte b in compressedByteArray)
            compressedStringBuilder.Append((char)b);
        compressedString = compressedStringBuilder.ToString();
    }
    return compressedString;
}
Why is the first code snippet not successful while the other one is? Even though they're slightly different, I don't know why the minor changes in the second snippet allow it to work. The sample string I'm using is SELECT * FROM foods f WHERE f.name = 'chicken';
I ended up using the following code for compression and decompression:
public static String Compress(String decompressed)
{
    byte[] data = Encoding.UTF8.GetBytes(decompressed);
    using (var input = new MemoryStream(data))
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
        }
        return Convert.ToBase64String(output.ToArray());
    }
}
public static String Decompress(String compressed)
{
    byte[] data = Convert.FromBase64String(compressed);
    using (MemoryStream input = new MemoryStream(data))
    using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return Encoding.UTF8.GetString(output.ToArray());
    }
}
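A quick round trip with the sample query from the question (just a usage sketch):

string original   = "SELECT * FROM foods f WHERE f.name = 'chicken'";
string compressed = Compress(original);     // Base64 text, safe to store or send
string restored   = Decompress(compressed); // equals the original string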
The explanation for a part of the problem comes from this question. Although I fixed the problem by changing the code to what I included in this answer, these lines (in my original code):
foreach (byte b in compressedByteArray)
compressedStringBuilder.Append((char)b);
are problematic, because as dlev aptly phrased it:
You are interpreting each byte as its own character, when in fact that is not the case. Instead, you need the line:
string decoded = Encoding.Unicode.GetString(compressedByteArray);
The basic problem is that you are converting to a byte array based on an encoding, but then ignoring that encoding when you retrieve the bytes.
Therefore, the problem is solved, and the new code I'm using is much more succinct than my original code.
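To see why the per-byte cast is lossy for compressed (binary) data, compare it with Base64; the byte values below are just illustrative:

byte[] compressed = { 0x1F, 0x8B, 0x08, 0x00, 0xC3, 0xA9 };
// Casting each byte to char treats the data as code points 0-255; anything
// above 127 no longer survives an encoding round trip.
var sb = new StringBuilder();
foreach (byte b in compressed)
    sb.Append((char)b);
byte[] backViaUtf8 = Encoding.UTF8.GetBytes(sb.ToString()); // 8 bytes, not 6 - corrupted
// A text-safe representation keeps the bytes intact:
string base64 = Convert.ToBase64String(compressed);
byte[] back = Convert.FromBase64String(base64);             // identical 6 bytes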
You need to move the code below outside the second using statement:
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
    inStream.CopyTo(compress);
    outStream.Position = 0;
    StreamReader reader = new StreamReader(outStream);
    compressedString = reader.ReadToEnd();
}
CopyTo() is not flushing the results to the underlying MemoryStream.
Update
It seems that GZipStream closes and disposes its underlying stream when it is disposed (not the way I would have designed the class). I've updated the sample above and tested it.
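On .NET 4.5 and later, GZipStream also has an overload with a leaveOpen flag, which sidesteps the disposal problem entirely. A sketch, not the original poster's code (uncompressed is the input string from the snippets above):

string compressedString;
using (var inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)))
using (var outStream = new MemoryStream())
{
    using (var compress = new GZipStream(outStream, CompressionMode.Compress, leaveOpen: true))
    {
        inStream.CopyTo(compress);
    } // disposing the GZipStream flushes the final gzip block; outStream stays open

    // The result is binary; Base64 is a safe way to turn it into a string.
    compressedString = Convert.ToBase64String(outStream.ToArray());
}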

Edit RTF as plain text but unable to open it again

I have an RTF file that I want to open, replace the string "TEMPLATE_Name" in, and save. But after saving, the file cannot be opened correctly again: when I open it in MS Word, it shows the raw RTF code instead of the text.
I am afraid I am breaking the format or the encoding, but I don't really know how:
using (MemoryStream ms = new MemoryStream(1000))
using (StreamWriter sw = new StreamWriter(ms, Encoding.UTF8))
{
    using (Stream fsSource = new FileStream(Server.MapPath("~/LetterTemplates/TestTemplate.rtf"), FileMode.Open))
    using (StreamReader sr = new StreamReader(fsSource, Encoding.UTF8))
        while (!sr.EndOfStream)
        {
            String line = sr.ReadLine();
            line = line.Replace("TEMPLATE_Name", model.FirstName + " " + model.LastName);
            sw.WriteLine(line);
        }
    ms.Position = 0;
    using (FileStream fs = new FileStream(Server.MapPath("~/LetterTemplates/test.rtf"), FileMode.Create))
        ms.CopyTo(fs);
}
Any idea about what could be the issue?
Thanks.
SOLUTION: One problem was what #BrokenGlass pointed out: I was not flushing the stream. The other was the encoding. In the first line of the RTF file I can see:
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\
So, even without understanding anything about RTF, I set the encoding to code page 1252 and it works:
using (MemoryStream ms = new MemoryStream(1000))
using (StreamWriter sw = new StreamWriter(ms, Encoding.GetEncoding(1252)))
{
    using (Stream fsSource = new FileStream(Server.MapPath("~/LetterTemplates/TestTemplate.rtf"), FileMode.Open))
    using (StreamReader sr = new StreamReader(fsSource, Encoding.GetEncoding(1252)))
        while (!sr.EndOfStream)
        {
            String line = sr.ReadLine();
            line = line.Replace("TEMPLATE_Name", model.FirstName + " " + model.LastName);
            sw.WriteLine(line);
        }
    sw.Flush();
    ms.Position = 0;
    using (FileStream fs = new FileStream(Server.MapPath("~/LetterTemplates/test.rtf"), FileMode.Create))
        ms.CopyTo(fs);
}
StreamWriter is buffering content - make sure you call sw.Flush() before reading from your memory stream.
StreamWriter.Flush():
Clears all buffers for the current writer and causes any buffered data
to be written to the underlying stream.
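If you would rather not call Flush() explicitly, StreamWriter also exposes an AutoFlush property that flushes after every write (at a small performance cost). An alternative sketch, reusing the ms and line variables from the question's code:

using (var sw = new StreamWriter(ms, Encoding.GetEncoding(1252)) { AutoFlush = true })
{
    // every WriteLine goes straight through to the MemoryStream
    sw.WriteLine(line);
}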
Edit in light of comments:
A better alternative, as #leppie alluded to, is restructuring the code so the using block forces the flush, instead of calling it explicitly:
using (MemoryStream ms = new MemoryStream(1000))
{
    // leaveOpen: true so disposing the writer flushes but does not close ms
    using (StreamWriter sw = new StreamWriter(ms, Encoding.UTF8, 1024, leaveOpen: true))
    {
        //...
    }
    ms.Position = 0;
    //Write to file
}
An even better alternative, as #Slaks pointed out, is writing to the file directly and not using a memory stream at all. Unless there are other reasons you are buffering in memory, this seems to be the most straightforward solution: it simplifies your code and avoids holding the whole file in memory.
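That direct approach could look roughly like this (a sketch built from the question's own paths and variables, not tested against the actual templates):

using (var sr = new StreamReader(Server.MapPath("~/LetterTemplates/TestTemplate.rtf"), Encoding.GetEncoding(1252)))
using (var sw = new StreamWriter(Server.MapPath("~/LetterTemplates/test.rtf"), false, Encoding.GetEncoding(1252)))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        sw.WriteLine(line.Replace("TEMPLATE_Name", model.FirstName + " " + model.LastName));
    }
}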
