I have an ASP.NET project and want to return a CSV file when an AJAX post is sent (yes, that works; see Handle file download from AJAX post).
The special thing is that I want to build the result in a MemoryStream and return it as a FileResult.
My problem is that German umlauts (ä, ö, ü) get corrupted. Here is my code:
public ActionResult Download(FormCollection form) {
    string[] v = new string[16];
    MemoryStream stream = new MemoryStream();
    StreamWriter writer = new StreamWriter(stream,
        System.Text.Encoding.GetEncoding("Windows-1252"));
    SqlCommand cmd = dbconn.CreateCommand();
    //create SQL command

    while (rs.Read()) {
        v = new string[16];
        v[0] = rs.GetString("IstAktiv");
        v[1] = rs.GetString("Haus");
        //cache all the values
        ...

        //write cached values
        for (int i = 0; i < v.Length; i++) {
            if (i > 0) writer.Write(";");
            writer.Write(v[i]);
            writer.Flush();
        }
        writer.Write("\r\n");
        writer.Flush();
    } //end while rs.Read()

    FileContentResult ret = new FileContentResult(stream.ToArray(), "text/csv");
    ret.FileDownloadName = "Kontakte.csv";
    writer.Close();
    return ret;
} //end method
When I open the resulting file in Excel, the umlauts are converted into something strange. For example, the upper-case letter "Ä" ends up as "�".
Is there any way to solve this issue?
Best regards
For Excel to read a CSV file correctly, it expects the file to be UTF-8 encoded (with a BOM).
So, without a doubt, your StreamWriter would have to be set up this way:
StreamWriter writer = new StreamWriter(stream,
System.Text.Encoding.GetEncoding("UTF-8"));
However, if that doesn't work for you, it's very likely because the characters are being corrupted before you even get a chance to write them to the stream. You may be facing an encoding conversion problem as you read the data from the database.
v = new string[16];
v[0] = rs.GetString("IstAktiv");
v[1] = rs.GetString("Haus");
To validate that, place a breakpoint as you read the values into the 'v' array, and check that the characters still look ok at this step. If they are corrupted, then you know that the problem is between the code and the database, and the writing to the CSV is not the problem.
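If you want a quick check you can leave in temporarily, a small diagnostic like the one below (my suggestion, not part of the original answer) dumps the UTF-16 code points of one cached value; an intact "ä" shows up as 00E4, while corrupted data typically shows FFFD or other unexpected values.
// Hypothetical spot check right after the values are read into v
foreach (char c in v[1])
{
    System.Diagnostics.Debug.Write(((int)c).ToString("X4") + " ");
}
System.Diagnostics.Debug.WriteLine("");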
EDIT: Here is an isolated test case you can use to prove that UTF-8 is the correct encoding to write CSVs. Perhaps you can try that first:
Encoding enc = Encoding.GetEncoding("UTF-8");
using (StreamWriter writer = new StreamWriter(@"d:\test\test.csv", false, enc))
{
    writer.Write(@"""hello ä, ö, ü world""");
}
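For completeness, here is a minimal sketch of how the Download action from the question could look with a BOM-emitting UTF-8 writer; the data-access calls (dbconn, rs, the field names) are kept from the question, and everything else is an assumption rather than a definitive fix:
public ActionResult Download(FormCollection form)
{
    MemoryStream stream = new MemoryStream();

    // new UTF8Encoding(true) emits the BOM that Excel uses to recognize UTF-8.
    StreamWriter writer = new StreamWriter(stream, new UTF8Encoding(true));

    SqlCommand cmd = dbconn.CreateCommand();
    //create SQL command as before

    while (rs.Read())
    {
        string[] v = new string[16];
        v[0] = rs.GetString("IstAktiv");
        v[1] = rs.GetString("Haus");
        //cache the remaining values as before

        writer.WriteLine(string.Join(";", v));
    }

    writer.Flush(); // make sure everything has reached the MemoryStream before copying it

    FileContentResult ret = new FileContentResult(stream.ToArray(), "text/csv");
    ret.FileDownloadName = "Kontakte.csv";
    writer.Close();
    return ret;
}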
As you can imagine, I'm addressing my question to you because I wasn't able to solve my problem through research. I'm also rather inexperienced, so maybe you can help.
At work I have a simulation program that produces a huge amount of data, so I want to write out every time increment as a VTK file using binary. For reasons of acquiring the data, I want to implement this VTK write myself.
As you might know, that means I need to write some lines in ASCII format that contain the declaration of the data part, which I then need to write in binary format.
Up to now my code looks like this:
public void WriteVTKBin(int increment, string output_dir, bool format)
{
    int data_structure = 12;
    int[,] ElementTypes = new int[hex8Elements.numberOfElements, 1];
    for (int ii = 0; ii < ElementTypes.Length; ii++) ElementTypes[ii, 0] = data_structure;

    string File_name = "output_bin_" + increment + ".vtk";
    var output = new FileStream(@output_dir + "/" + File_name, FileMode.Create);
    var output_Stream = new StreamWriter(output);
    using (output)
    {
        output_Stream.WriteLine("# vtk DataFile Version 3.0");
        output_Stream.WriteLine("vtk-Output for increment");
        if (format) output_Stream.WriteLine("BINARY");
        else output_Stream.WriteLine("ASCII");
        output_Stream.WriteLine("DATASET UNSTRUCTURED_GRID");
        output_Stream.WriteLine("POINTS " + nodes.numberOfNodes + " double");
    }

    var output_rest = new FileStream(@output_dir + "/" + File_name, FileMode.Append);
    var BinWriter = new BinaryWriter(output_rest);
    using (output_rest)
    {
        BinWriter.Write(GetBytesDouble(GetPointsOutput()));
    }
}
The argument passed to the BinaryWriter is a byte array that I produce using a different method.
My idea was to initialize the file with FileMode.Create to overwrite any existing old file and then write the header section. Afterwards I close the file, open it again using FileMode.Append, and write my binary data. I want to repeat this until all the fields I want to write out are contained in the VTK file.
The problem is: the BinaryWriter overwrites my header even though it's in Append mode, and when I close it and then want to write another ASCII line, I'm told that I cannot access an already-closed file.
So is there a solution for my approach, or is there an even more sophisticated way to deal with this type of output?
Thank you very much in advance,
RR
You can simply convert your ASCII strings to bytes and write those to the file as such:
ByteArray = Encoding.XXX.GetBytes(text)
where XXX is the encoding you want, followed by:
BinaryWriter.Write(ByteArray)
When you open the file, anything that decodes it as ASCII will show your header strings there for you to read.
You have missed the using statements for the StreamWriter and BinaryWriter. Change your code like this and it will work:
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new StreamWriter(fstream))
{
    // Write headers
}

using (var fstream = new FileStream(..., FileMode.Append))
using (var writer = new BinaryWriter(fstream))
{
    // Write binary data
}
Alternatively, you can encode your headers to byte[] and use only a BinaryWriter.
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new BinaryWriter(fstream))
{
    // Write headers
    writer.Write(Encoding.ASCII.GetBytes("..."));
    // Write binary data
}
Or just use the BinaryWriter to write the string (note that BinaryWriter.Write(string) writes a length-prefixed string, so the header bytes will not be plain ASCII text).
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new BinaryWriter(fstream))
{
    // Write headers
    writer.Write("...");
    // Write binary data
}
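If you would rather keep a single FileStream open for the whole file, here is a rough sketch of another option (my addition, not from the original answer); it assumes .NET 4.5 or later, where StreamWriter and BinaryWriter have overloads with a leaveOpen parameter, and it reuses output_dir, File_name, GetBytesDouble and GetPointsOutput from the question:
using (var fstream = new FileStream(output_dir + "/" + File_name, FileMode.Create))
{
    // leaveOpen: true keeps fstream usable after the StreamWriter is disposed.
    using (var text = new StreamWriter(fstream, Encoding.ASCII, 4096, leaveOpen: true))
    {
        text.WriteLine("# vtk DataFile Version 3.0");
        text.WriteLine("vtk-Output for increment");
        text.WriteLine("BINARY");
    } // disposing the StreamWriter flushes the header into fstream

    using (var bin = new BinaryWriter(fstream, Encoding.ASCII, leaveOpen: true))
    {
        bin.Write(GetBytesDouble(GetPointsOutput()));
    }
}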
I am dealing with files in many formats, including Shift-JIS and UTF-8 with no BOM. Using a bit of language knowledge, I can detect whether a file is being interpreted correctly as UTF-8 or Shift-JIS, but if I detect that the file is not of the type I read it in as, I was wondering if there is a way to just reinterpret my in-memory array without having to re-read the file with a new encoding specified.
Right now, I read in the file assuming Shift-JIS as such:
using (StreamReader sr = new StreamReader(path, Encoding.GetEncoding("shift-jis"), true))
{
    String line = sr.ReadToEnd();
    // Detection must be done AFTER you read from the file. Silly rabbit.
    fileFormatCertain = !sr.CurrentEncoding.Equals(Encoding.GetEncoding("shift-jis"));
    codingFromBOM = sr.CurrentEncoding;
}
and after I do my magic to determine if it is either a known format (has a BOM) or that the data makes sense as Shift-JIS, all is well. If the data is garbage though, then I am re-reading the file via:
using (StreamReader sr = new StreamReader(path, Encoding.UTF8))
{
    String line = sr.ReadToEnd();
}
I am trying to avoid this re-read step and reinterpret the data in memory if possible.
Or is magic already happening and I am needlessly worrying about double I/O access?
var buf = File.ReadAllBytes(path);        // read the raw bytes once
var text = Encoding.UTF8.GetString(buf);  // try UTF-8 first
if (text.Contains("\uFFFD"))              // Unicode replacement character => invalid UTF-8 sequences
{
    text = Encoding.GetEncoding(932).GetString(buf);  // reinterpret the same bytes as Shift-JIS (code page 932)
}
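As a variation (my sketch, not part of the answer above), you can make the failed decode explicit instead of scanning for U+FFFD by using a UTF8Encoding that throws on invalid bytes; note that pure-ASCII content is valid in both encodings, so either approach simply keeps it as is:
byte[] buf = File.ReadAllBytes(path);   // single read; everything after this happens in memory
string text;
try
{
    // UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
    text = new UTF8Encoding(false, true).GetString(buf);
}
catch (DecoderFallbackException)
{
    // Not valid UTF-8, so reinterpret the same byte array as Shift-JIS (code page 932).
    text = Encoding.GetEncoding(932).GetString(buf);
}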
I've got a pesky problem with GZipStream targeting .NET 3.5. This is my first time working with GZipStream; I have modeled my code after a number of tutorials, including the one here, and I'm still stuck.
My app serializes a DataTable to XML and inserts it into a database, storing the compressed data in a varbinary(max) field along with the original length of the uncompressed buffer. Then, when I need it, I retrieve this data, decompress it, and recreate the DataTable. The decompression is what seems to fail.
EDIT: Sadly, after changing GetBuffer to ToArray as suggested, my issue remains. Code updated below.
Compress code:
DataTable dt = new DataTable("MyUnit");
//do stuff with dt

//okay... now compress the table
using (MemoryStream xmlstream = new MemoryStream())
{
    //instead of stream, use xmlwriter?
    System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
    settings.Encoding = Encoding.GetEncoding(1252);
    settings.Indent = false;
    System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(xmlstream, settings);
    try
    {
        dt.WriteXml(writer);
        writer.Flush();
    }
    catch (ArgumentException)
    {
        //likely an encoding issue... okay, base64 encode it
        var base64 = Convert.ToBase64String(xmlstream.ToArray());
        xmlstream.Write(Encoding.GetEncoding(1252).GetBytes(base64), 0, Encoding.GetEncoding(1252).GetBytes(base64).Length);
    }

    using (MemoryStream zipstream = new MemoryStream())
    {
        GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress);
        log.DebugFormat("Compressing commands...");
        zip.Write(xmlstream.GetBuffer(), 0, xmlstream.ToArray().Length);
        zip.Flush();
        float ratio = (float)zipstream.ToArray().Length / (float)xmlstream.ToArray().Length;
        log.InfoFormat("Resulting compressed size is {0:P2} of original", ratio);

        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.CommandText = "INSERT INTO tinydup (lastid, command, compressedlength) VALUES (@lastid,@compressed,@length)";
            cmd.Connection = db;
            cmd.Parameters.Add("@lastid", SqlDbType.Int).Value = lastid;
            cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.ToArray();
            cmd.Parameters.Add("@length", SqlDbType.Int).Value = xmlstream.ToArray().Length;
            cmd.ExecuteNonQuery();
        }
    }
}
Decompress Code:
/* This is an encapsulation of what I get from the database
public class DupUnit{
public uint lastid;
public uint complength;
public byte[] compressed;
}*/
//I have already retrieved my list of work to do from the database in a List<DupUnit> dupunits
foreach (DupUnit unit in dupunits)
{
    DataSet ds = new DataSet();
    //DataTable dt = new DataTable();
    //uncompress and extract to original datatable
    try
    {
        using (MemoryStream zipstream = new MemoryStream(unit.compressed))
        {
            GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
            byte[] xmlbits = new byte[unit.complength];
            //WHY ARE YOU ALWAYS 0!!!!!!!!
            int bytesdecompressed = zip.Read(xmlbits, 0, unit.compressed.Length);
            MemoryStream xmlstream = new MemoryStream(xmlbits);
            log.DebugFormat("Uncompressed XML against {0} is: {1}", m_source.DSN, Encoding.GetEncoding(1252).GetString(xmlstream.ToArray()));
            try
            {
                ds.ReadXml(xmlstream);
            }
            catch (Exception)
            {
                //it may have been base64 encoded... decode first.
                ds.ReadXml(Encoding.GetEncoding(1254).GetString(
                    Convert.FromBase64String(
                        Encoding.GetEncoding(1254).GetString(xmlstream.ToArray())))
                );
            }
            xmlstream.Dispose();
        }
    }
    catch (Exception e)
    {
        log.Error(e);
        Thread.Sleep(1000); //sleep a sec!
        continue;
    }
}
Note the comment above... bytesdecompressed is always 0. Any ideas? Am I doing it wrong?
EDIT 2:
So this is weird. I added the following debug code to the decompression routine:
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
int offset = 0;
while (zip.CanRead && offset < xmlbits.Length)
{
    while (zip.Read(xmlbits, offset, 1) == 0) ;
    offset++;
}
When debugging, sometimes that loop would complete, but other times it would hang. When I'd stop the debugging, it would be at byte 1600 out of 1616. I'd continue, but it wouldn't move at all.
EDIT 3: The bug appears to be in the compress code. For whatever reason, it is not saving all of the data. When I try to decompress the data using a third party gzip mechanism, I only get part of the original data.
I'd start a bounty, but I really don't have much reputation to give as of now :-(
Finally found the answer. The compressed data wasn't complete because GZipStream.Flush() does absolutely nothing to ensure that all of the data is out of its internal buffer - you need to call GZipStream.Close(), as pointed out here. Of course, once you have a bad compress, it all goes downhill: if you try to decompress it, Read() will always return 0.
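A minimal sketch of the fixed compression step (only the GZipStream handling changes; the surrounding code stays as in the question). The key point is that the GZipStream is closed, here via using, before zipstream.ToArray() is called:
using (MemoryStream zipstream = new MemoryStream())
{
    // Close()/Dispose() writes the final GZip block; Flush() alone does not.
    using (GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress, true))
    {
        zip.Write(xmlstream.ToArray(), 0, (int)xmlstream.Length);
    }

    // zipstream now holds a complete GZip payload.
    byte[] compressed = zipstream.ToArray();
    // ... store "compressed" in the varbinary(max) column as before ...
}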
I'd say this line, at least, is the most wrong:
cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.GetBuffer();
MemoryStream.GetBuffer:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method.
It should be noted that the zip format works by first locating data stored at the end of the file - so if you've stored more data than was required, the required entries at the "end" of the file don't exist.
As an aside, I'd also recommend a different name for your compressedlength column - I'd initially taken it (despite your narrative) as being intended to store, well, the length of the compressed data (and written part of my answer to address that). Maybe originalLength would be a better name?
Part of a list of projects I'm doing is a little text-editor.
At one point, you can load all the subdirectories and files in a given directory. The program will add each one as a node in a TreeView.
What I want the functionality to be is to add only the files that are readable by a normal text reader.
This code currently adds it to the tree:
TreeNode navNode = new TreeNode();
navNode.Text = file.Name;
navNode.Tag = file.FullName;
directoryNode.Nodes.Add(navNode);
I know I could easily create an if statement with something like:
if (file.Extension.Equals(".txt"))
but I would have to expand that statement to contain every single extension that it could possibly be.
Is there an easier way to do this? I'm thinking it may have something to do with the mime types or file encoding.
There is no general way of figuring out the type of information stored in a file.
Even if you know in advance that it is some sort of text, if you don't know what encoding was used to create the file, you may not be able to load it properly.
Note that HTTP gives you a hint about the type of a file via the Content-Type header, but there is no such information on the file system.
There are a few methods you could use to "best guess" whether or not the file is a text file. Of course, the more encodings you support, the harder this becomes, especially if you plan to support CJK (Chinese, Japanese, Korean) scripts. Let's just start with Encoding.ASCII and Encoding.UTF8 for now.
Fortunately, most non-text files (executables, images, and the like) have a lot of non-parsable characters in their first couple of kilobytes.
What you could do is take a file and scan the first 1-4KB (up to you) and see if any "non-printable" characters come up. This operation shouldn't take much time and will at least give you some certainty of the contents of the file.
public static async Task<bool> IsValidTextFileAsync(string path, int scanLength = 4096)
{
    using (var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        var bufferLength = (int)Math.Min(scanLength, stream.Length);
        var buffer = new char[bufferLength];
        var bytesRead = await reader.ReadBlockAsync(buffer, 0, bufferLength);
        reader.Close();

        if (bytesRead != bufferLength)
            throw new IOException("There was an error reading from the file.");

        for (int i = 0; i < bytesRead; i++)
        {
            var c = buffer[i];
            if (char.IsControl(c))
                return false;
        }

        return true;
    }
}
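A hypothetical usage sketch tying this back to the TreeView code from the question (file and directoryNode are the question's variables; the await assumes the calling method is async):
// Only add the node when the scan says the file looks like readable text.
if (await IsValidTextFileAsync(file.FullName))
{
    TreeNode navNode = new TreeNode();
    navNode.Text = file.Name;
    navNode.Tag = file.FullName;
    directoryNode.Nodes.Add(navNode);
}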
My approach, based on @Rubenisme's comment and @Erik's answer.
public static bool IsValidTextFile(string path)
{
    using (var stream = System.IO.File.Open(path, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
    using (var reader = new System.IO.StreamReader(stream, System.Text.Encoding.UTF8))
    {
        var text = reader.ReadToEnd();
        reader.Close();

        return text.All(c =>      // Are all the characters either a:
            c == (char)10         // New line
            || c == (char)13      // Carriage return
            || c == (char)9       // Tab
            || c == (char)11      // Vertical tab
            || !char.IsControl(c) // Non-control (regular) character
        );
    }
}
A hacky way to do it would be to see if the file contains any of the lower control characters (0-31) that aren't forms of white space (carriage return, tab, vertical tab, line feed, and just to be safe null and end of text). If it does, then it is probably binary. If it does not, it probably isn't. I haven't done any testing or anything to see what happens when applying this rule to non ASCII encodings, so you'd have to investigate further yourself :)
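A rough byte-level sketch of that heuristic (my illustration, not part of the answer; the allowed set here is tab, line feed, vertical tab, form feed and carriage return, which is one reasonable choice among several, and it requires System, System.IO and System.Linq):
public static bool LooksLikeText(string path, int scanLength = 4096)
{
    // Control bytes below 32 that are still acceptable in plain text.
    byte[] allowed = { 0x09, 0x0A, 0x0B, 0x0C, 0x0D };

    using (var stream = File.OpenRead(path))
    {
        int toRead = (int)Math.Min(scanLength, stream.Length);
        var buffer = new byte[toRead];
        int read = stream.Read(buffer, 0, toRead);

        // Any other byte in the 0-31 range suggests a binary file.
        return buffer.Take(read).All(b => b >= 32 || allowed.Contains(b));
    }
}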
I'm having a problem writing Norwegian characters into an XML file using C#. I have a string variable containing some Norwegian text (with letters like æøå).
I'm writing the XML using an XmlTextWriter, writing the contents to a MemoryStream like this:
MemoryStream stream = new MemoryStream();
XmlTextWriter xmlTextWriter = new XmlTextWriter(stream, Encoding.GetEncoding("ISO-8859-1"));
xmlTextWriter.Formatting = Formatting.Indented;
xmlTextWriter.WriteStartDocument(); //Start doc
Then I add my Norwegian text like this:
xmlTextWriter.WriteCData(myNorwegianText);
Then I write the file to disk like this:
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile);
stream.Position = 0;
StreamReader sr = new StreamReader(stream);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();
Now the problem is that in the resulting file, all the Norwegian characters look funny.
I'm probably doing the above in some stupid way. Any suggestions on how to fix it?
Why are you writing the XML first to a MemoryStream and then writing that to the actual file stream? That's pretty inefficient. If you write directly to the FileStream it should work.
If you still want to do the double write, for whatever reason, do one of two things. Either
Make sure that the StreamReader and StreamWriter objects you use all use the same encoding as the one you used with the XmlWriter (not just the StreamWriter, like someone else suggested), or
Don't use StreamReader/StreamWriter. Instead just copy the stream at the byte level using a simple byte[] and Stream.Read/Write. This is going to be, btw, a lot more efficient anyway.
Both your StreamWriter and your StreamReader are using UTF-8, because you're not specifying the encoding. That's why things are getting corrupted.
As tomasr said, using a FileStream to start with would be simpler - but also MemoryStream has the handy "WriteTo" method which lets you copy it to a FileStream very easily.
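For illustration, here is a sketch of the WriteTo approach (my code, not Jon's), reusing myNorwegianText and myPath from the question plus a hypothetical "Root" element so the document is well formed:
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xmlTextWriter = new XmlTextWriter(stream, Encoding.GetEncoding("ISO-8859-1"));
    xmlTextWriter.Formatting = Formatting.Indented;
    xmlTextWriter.WriteStartDocument();
    xmlTextWriter.WriteStartElement("Root");   // hypothetical root element
    xmlTextWriter.WriteCData(myNorwegianText);
    xmlTextWriter.WriteEndElement();
    xmlTextWriter.WriteEndDocument();
    xmlTextWriter.Flush();

    // Copy the raw ISO-8859-1 bytes straight to disk; no StreamReader/StreamWriter re-decoding involved.
    using (FileStream myFile = new FileStream(myPath, FileMode.Create))
    {
        stream.WriteTo(myFile);
    }
}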
I hope you've got a using statement in your real code, by the way - you don't want to leave your file handle open if something goes wrong while you're writing to it.
Jon
You need to set the encoding every time you write a string or read binary data as a string.
Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile, encoding);
stream.Position = 0;
StreamReader sr = new StreamReader(stream, encoding);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();
As mentioned in the answers above, the biggest issue here is the encoding, which is being defaulted because it is left unspecified.
When you do not specify an Encoding for this kind of conversion, the default of UTF-8 is used - which may or may not match your scenario. You are also converting the data needlessly by pushing it into a MemoryStream and then out into a FileStream.
If your original data is not UTF-8, what will happen here is that the first transition into the MemoryStream will attempt to decode using default Encoding of UTF-8 - and corrupt your data as a result. When you then write out to the FileStream, which is also using UTF-8 as encoding by default, you simply persist that corruption into the file.
In order to fix the issue, you likely need to specify Encoding into your Stream objects.
You can actually skip the MemoryStream process entirely, also - which will be faster and more efficient. Your updated code might look something more like:
FileStream fs = new FileStream(myPath, FileMode.Create);
XmlTextWriter xmlTextWriter =
    new XmlTextWriter(fs, Encoding.GetEncoding("ISO-8859-1"));

xmlTextWriter.Formatting = Formatting.Indented;
xmlTextWriter.WriteStartDocument(); //Start doc
xmlTextWriter.WriteCData(myNorwegianText);

xmlTextWriter.Flush(); //push the buffered XML through to the FileStream
fs.Flush();
fs.Close();
Which encoding do you use for displaying the result file? If it is not ISO-8859-1, it will not display correctly.
Is there a reason to use this specific encoding, instead of, for example, UTF-8?
After investigating, this is what worked best for me:
var doc = new XDocument(new XDeclaration("1.0", "ISO-8859-1", ""));
using (XmlWriter writer = doc.CreateWriter())
{
    writer.WriteStartDocument();
    writer.WriteStartElement("Root");
    writer.WriteElementString("Foo", "value");
    writer.WriteEndElement();
    writer.WriteEndDocument();
}
doc.Save("dte.xml");