Interop.Excel UTF-8 encoding when saving file - c#

I'm having an issue saving an Excel file as a CSV with UTF-8 encoding.
Because I have non-standard characters (from a different language) in my Excel document, they cause issues when the document is saved as a CSV. This was solved here by setting the web options encoding to UTF-8.
I am creating a basic C# program that parses data from an Excel file and saves it in CSV format, but I can't get it to save using UTF-8 encoding.
I am using Microsoft.Office.Interop.Excel to work with the Excel files.
This is my code:
private Excel.Application application = new Excel.Application { Visible = false };
private Excel.Workbook Workbook = application.Workbooks.Open(OrigionalFileUrl);
Workbook.SaveAs(NewFileUrl);
I have tried setting
application.DefaultWebOptions.Encoding = MsoEncoding.msoEncodingUTF8;
but it doesn't work, and the CSV file that I get is always a mess in the sections with special characters.
Thanks!

I believe you want to set that at the workbook level, not on the application. Also, because you didn't pass CSV as the file format, it's possible that SaveAs is writing the native workbook format and only changing the file extension.
Try these and see if they address your issue:
Workbook.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
Workbook.SaveAs(NewFileUrl, XlFileFormat.xlCSV);

The proposed solution didn't work for me. But according to the documentation there is now an XlFileFormat value for CSV UTF-8:
XlFileFormat.xlCSVUTF8
https://learn.microsoft.com/en-us/office/vba/api/excel.xlfileformat
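Putting the answers together, a minimal sketch of the whole save might look like the following. It assumes a reference to the Microsoft.Office.Interop.Excel assembly and an installed Excel version recent enough to support the xlCSVUTF8 format; the class and method names (CsvUtf8Export, SaveAsCsvUtf8) are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

public static class CsvUtf8Export
{
    // Hypothetical helper: opens a workbook and saves it as UTF-8 CSV.
    // Requires Excel to be installed on the machine running this code.
    public static void SaveAsCsvUtf8(string sourcePath, string targetPath)
    {
        var application = new Excel.Application { Visible = false, DisplayAlerts = false };
        Excel.Workbook workbook = null;
        try
        {
            workbook = application.Workbooks.Open(sourcePath);
            // Pass the file format explicitly; without it SaveAs keeps the native format.
            workbook.SaveAs(targetPath, Excel.XlFileFormat.xlCSVUTF8);
        }
        finally
        {
            // Close without prompting and release the COM object so Excel can exit.
            workbook?.Close(false);
            application.Quit();
            Marshal.ReleaseComObject(application);
        }
    }
}
```

Usage would be something like CsvUtf8Export.SaveAsCsvUtf8(OrigionalFileUrl, NewFileUrl), with both arguments being full paths.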

Related

.Net web api download csv file (Excel separator)

I have a controller endpoint from which I want to generate a csv file and download it.
Currently I am using nuget CsvHelper and my code is like this:
var cc = new CsvConfiguration(new System.Globalization.CultureInfo("sl-SI"));
using (var ms = new MemoryStream())
{
using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true)))
{
using (var cw = new CsvWriter(sw, cc))
{
cw.WriteRecords(ListOfReports);
}// The stream gets flushed here.
return File(ms.ToArray(), "text/csv", $"{docNumber.Trim()}_{docType}.csv");
}
}
It generated the CSV nicely, but when I opened it in Excel, the whole row ended up in the first column and was not split.
I added this part:
cw.WriteField("sep=,", false);
cw.NextRecord();
before cw.WriteRecords(ListOfReports);, which made it work in Excel, but if I open the file in Notepad, there is a sep=, in the first row.
I noticed that the CultureInfo makes a difference: if I set "sl-SI" it works properly on Slovenian Windows (the separator will be ;), and if I set "en-US" it works on English Windows (separator ,). But what do I need to do to make it work with any culture?
Does anyone have any idea how to fix this so it works properly in Excel and in any other text editor?
This is effectively the same as this SuperUser question, but it appears you want a programming-oriented solution rather than a user-oriented one.
The problem is fundamentally that Excel is really bad at dealing with CSV files, especially when taking non-US cultures into account. My suggestion would be to allow users to download a real Excel file using a library like DocumentFormat.OpenXml.
If you have two separate use cases for your downloads (i.e. some users who open the file in notepad or consume it with software that reads CSV, and others who open the file in Excel), give the users separate options to download in CSV or Excel.
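For the plain-CSV download option, one way to avoid depending on the server culture is to pick the delimiter explicitly. The sketch below uses only the base class library rather than CsvHelper; the PlainCsv class and its methods are hypothetical names, and the quoting follows the usual CSV convention of doubling embedded quotes. It does not solve Excel's locale-dependent parsing, which is why the Excel download is suggested as a separate option.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class PlainCsv
{
    // Quote a field only when it contains the delimiter, a quote, or a line break.
    public static string Escape(string field, char delimiter)
    {
        if (field == null) return "";
        bool mustQuote = field.IndexOfAny(new[] { delimiter, '"', '\r', '\n' }) >= 0;
        return mustQuote ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Build the file contents as UTF-8 bytes with a byte order mark.
    public static byte[] Build(IEnumerable<string[]> rows, char delimiter)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
            {
                foreach (var row in rows)
                {
                    sw.Write(string.Join(delimiter.ToString(),
                        row.Select(f => Escape(f, delimiter))));
                    sw.Write("\r\n");
                }
            } // Disposing the writer flushes it into the MemoryStream.
            return ms.ToArray();
        }
    }
}
```

In the controller from the question, the resulting byte array could be returned with File(bytes, "text/csv", fileName) in the same way as before.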

Excel file generated with C# is corrupted (ExcelExportEngine & IExportEngine)

I have an application written in C# that generates an Excel file from a list (List<>).
The code that generates the Excel file has worked without problems, but today a user who still uses Windows 7 reported that the Excel file is generated but is unreadable when opened.
The code to generate the excel is as follows:
IExportEngine engine = new ExcelExportEngine();
engine.AddData(productListExport);
MemoryStream memory = engine.Export();
FileStream fileStream;
SaveFileDialog saveFileDialog1 = new SaveFileDialog();
saveFileDialog1.Filter = "Excel files (*.xls or *.xlsx)|*.xls;*.xlsx";
saveFileDialog1.Title = "Export product list to Excel";
saveFileDialog1.ShowDialog();
if (saveFileDialog1.FileName != "")
{
String path = Path.GetFullPath(saveFileDialog1.FileName);
fileStream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite);
memory.WriteTo(fileStream);
fileStream.Close();
}
When you open the Excel file, a message appears saying that the format and extension do not match. If you choose to open it anyway, it looks like the following image:
Any suggestions or comments are welcome.
UPDATE:
About ExcelExportEngine:
https://github.com/vvenegasv/exportable
The screenshot looks like the contents of an xlsx file. An xlsx file is a zip package containing XML files. The various xml paths in there are a very strong indication. The PK bytes too, but I saw docProps.xml first.
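The PK check mentioned above can also be done in code. The sketch below is only an illustration with made-up names (FileSignature, LooksLikeZipPackage): it reads the first four bytes and compares them with the ZIP local file header signature. A match means the content is a ZIP package, which is consistent with xlsx but does not prove it is a valid workbook.

```csharp
using System;
using System.IO;

public static class FileSignature
{
    // ZIP local file header signature: 'P', 'K', 0x03, 0x04.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    public static bool LooksLikeZipPackage(Stream stream)
    {
        var header = new byte[ZipMagic.Length];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until done.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) return false; // Shorter than the signature.
            read += n;
        }
        for (int i = 0; i < ZipMagic.Length; i++)
        {
            if (header[i] != ZipMagic[i]) return false;
        }
        return true;
    }

    public static bool LooksLikeZipPackage(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            return LooksLikeZipPackage(fs);
        }
    }
}
```

Calling LooksLikeZipPackage on the MemoryStream returned by engine.Export() before choosing a file extension would be one way to apply it here.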
The code itself has a serious problem - it generates the Excel file contents before asking for a format. Given that the xls format became obsolete 13 years ago, the only sensible default is to use xlsx. If an xlsx file is saved as xls, Excel will complain. In my case though, it was able to load the file.
It looks like the code uses the Exportable package. The examples in the Github repo show how to specify the format. Apart from MemoryStream Export() though, the library also has an Export(string path) that writes to a file. The source code shows that Export(string) selects the format based on the extension and throws if it's invalid.
This means that the code can be reduced to:
IExportEngine engine = new ExcelExportEngine();
engine.AddData(productListExport);
var saveFileDialog1 = new SaveFileDialog(){
Filter = "Excel files (*.xls or *.xlsx)|*.xls;*.xlsx",
Title = "Export product list to Excel"
};
saveFileDialog1.ShowDialog();
if (saveFileDialog1.FileName != "")
{
var path = Path.GetFullPath(saveFileDialog1.FileName);
engine.Export(path);
}
I'd also suggest getting rid of the xls option. It's not just that it was replaced 13 years ago. The format wasn't well defined to begin with, so libraries always have issues producing the same output Excel did. Services like Google Sheets or Office Online only work with xlsx. You have to pay to get xls support. xlsx files are a lot smaller too.

Specify Encoding Format when Export DataTable to Excel with Open Xml SDK in c#

I am trying to export a DataTable to Excel using Open XML. I followed the approach below:
Export DataTable to Excel with Open Xml SDK in c#
But I couldn't specify the encoding of the Excel file that is created. I want to create or save the Excel file with UTF-8 encoding.
Can someone help with how the encoding can be specified using Open XML?
To save an Excel file with UTF-8 encoding from Excel itself, you specify the encoding under Tools -> Web Options -> Encoding while saving the file.
With OpenXML you need to create a new WebPublishing and add it to Workbook.
WebPublishing webPublishing1 = new WebPublishing(){ AllowPng = true, TargetScreenSize = TargetScreenSizeValues.Sz1024x768, CodePage = (UInt32Value)65001U };
workbook1.Append(webPublishing1);
//Where workbook1 is WorkbookPart.Workbook

file data seems to be corrupted when reading file as a string

I'm trying to read a file as string. But it seems that the data is corrupted.
string filepaths = Files[0].FullName;
System.IO.StreamReader myFile = new System.IO.StreamReader(filepaths);
string datas = myFile.ReadToEnd();
But datas contains "pk0101" etc. instead of the original data. I'm doing this so I can replace a placeholder with the string in datas, and when I do the replacement, the inserted text is 0101 etc. Is that because of the content of datas? How can I read the file as a string? Your help will be greatly appreciated. Thank you.
A *.docx file is a ZIP package whose parts are XML documents, which is why the raw data starts with "PK". Take a look here to become more familiar with this format definition.
For working with Office formats, Microsoft recommends the Open XML SDK (the DocumentFormat.OpenXml library).
Here is a great article for learning how to work with Word files.
It works as follows:
// filepaths is the path to the .docx file, as in the question; false opens it read-only.
using (var wordDocument = WordprocessingDocument.Open(filepaths, false))
{
var body = wordDocument.MainDocumentPart.Document.Body;
var text = body.GetFirstChild<Paragraph>().InnerText;
}
Also, take a look at this SO question: How do I read data from a word with format using the OpenXML Format SDK with c#?

How to get the contents of attachments of a file

I am trying to get the content of an attachment. It may be an Excel file, a Word document, or a text file; whatever it is, I want to store it in a database, so I am using this code:
foreach (FileAttachment file in em.Attachments)// Here em is type of EmailMessage class
{
Console.Write("Hello friends" + file.Name);
file.Load();
var stream = new System.IO.MemoryStream(file.Content);
var reader = new System.IO.StreamReader(stream, UTF8Encoding.UTF8);
var text = reader.ReadToEnd();
reader.Close();
Console.Write("Text Document" + text);
}
Printing file.Name shows the attachment file name. Printing text on the console works if the attachment is a .txt file, but if it is a .doc or .xls file, the output is a string of symbols and I don't get any readable text. Am I doing something wrong or missing something? I want the text content of any kind of file attachment. Please help me, I am a beginner in C#.
What you are seeing is what is actually in the file. Try opening one with Notepad.
There is no built-in way in .NET to show the "text contents" of arbitrary file formats. You'll have to create (preferably using third-party libraries that already solve this problem) some kind of logic that extracts plaintext from rich text documents.
See for example How to extract text from Pdf, Word and Excel documents?, Extract text from pdf and word files, and so on.
First, what do you expect when reading a binary file?
Your result is exactly what is expected. A text file can be shown as a string, but a doc or xls file is a binary file. You will see the binary content of the file. You will need to use a tool/lib to get the text/content from a binary file in human readable format.
A TXT file is simple; DOC or XLS files are much more complex. You can read a TXT file because it is just text; DOC, XLS, PPT, or other formats need to be interpreted by some other mechanism.
For example, a Word document has different colors or font sizes, and an Excel document may contain a chart. How can you show that in a simple TextBox or RichTextBox? Short answer: you can't.
