How to read from excel in a specific encoding?

I copied the column names from an Excel file into a txt file, which I read the following way:
CultureInfo cultureHU;
Encoding encodingHU;
cultureHU = CultureInfo.GetCultureInfo("hu-HU");
encodingHU = Encoding.GetEncoding(cultureHU.TextInfo.ANSICodePage);
using (StreamReader sr = new StreamReader("settings.txt", encodingHU, true))
{
...
}
How do I read from an Excel file with the same encoding? If I read it the default way (xlRange.Cells[1, i].Value.ToString()), I get wrong values:
in the Excel and txt files I have: "Szerzõdõ"
reading from the text file with the encodingHU encoding I get: "Szerződő" (this is the correct form)
reading from Excel in C# I get: "Szerzõdõ"

You can convert the bad string to the correct encoding like this:
Console.WriteLine(encodingHU.GetString(Encoding.Default.GetBytes(str)));
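The conversion above only works when Encoding.Default is the encoding that produced the wrong string. A minimal sketch that names both code pages explicitly, assuming the text was stored as Windows-1250 bytes but decoded as Windows-1252 (the class and method names are hypothetical, and the provider registration is only needed on .NET Core / .NET 5+):

```csharp
using System;
using System.Text;

static class EncodingRepair
{
    static EncodingRepair()
    {
        // On .NET Core / .NET 5+ the Windows code pages are not available
        // until this provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-encodes the mis-decoded text with the wrong code page to recover the
    // original bytes, then decodes those bytes with the intended code page.
    public static string Recode(string misdecoded, int wrongCodePage, int rightCodePage)
    {
        byte[] originalBytes = Encoding.GetEncoding(wrongCodePage).GetBytes(misdecoded);
        return Encoding.GetEncoding(rightCodePage).GetString(originalBytes);
    }
}

class Program
{
    static void Main()
    {
        // Assumption: 1252 is the code page that was applied by mistake.
        Console.WriteLine(EncodingRepair.Recode("Szerzõdõ", 1252, 1250));
    }
}
```

This round trip is lossless only when every character of the wrong string exists in the wrong code page; otherwise the original bytes cannot be recovered.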

I used ExcelDataReader and needed code page 1250:
ExcelReaderConfiguration conf = new ExcelReaderConfiguration();
conf.FallbackEncoding = Encoding.GetEncoding(1250);
using (IExcelDataReader reader = ExcelReaderFactory.CreateReader(stream, conf))
{
...
}

Related

What is the best way to check if a file contains a key before adding a new line?

I have a CSV file containing the following columns -
Key,Value
First,Line
Second,Line
Third,Line
I want to add a new Key-Value to this file given the key is not already present in the file using C#? What would be the best way to do this? Would I have to traverse line by line and check for the Keys or is there any other better way?
I am not using the CSVHelper package or any other CSV writer.

You could do this:
string path = @"PathToFile.csv";
string content;
using (StreamReader reader = new StreamReader(path))
{
    content = reader.ReadToEnd();
}
// Note: Contains matches the text anywhere in the file, including values.
if (!content.Contains("YourKey"))
{
    using (StreamWriter sw = new StreamWriter(path))
    {
        // Trim trailing line breaks so that no blank line is written before the new row.
        sw.WriteLine(content.TrimEnd('\r', '\n') + Environment.NewLine + "YourKey,YourValue");
    }
}
Read the whole file into a string variable and check whether the key exists in it. If it does not, write the content back to the file along with your new key. As the file grows, searching the whole file takes longer and longer, but this works well for a couple of thousand lines.
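A substring search over the whole file also matches keys that only appear inside values. A minimal sketch of the line-by-line alternative the question asks about, comparing only the first column and appending instead of rewriting the file. The class and method names are hypothetical, and it assumes a header row, no quoted commas in keys, and a file that already ends with a line break:

```csharp
using System;
using System.IO;
using System.Linq;

static class CsvKeys
{
    // Appends "key,value" only if no existing data row has that key in its
    // first column. Returns true when a row was added.
    public static bool AddIfMissing(string path, string key, string value)
    {
        bool exists = File.ReadLines(path)
            .Skip(1) // skip the "Key,Value" header row
            .Any(line => line.Split(',')[0].Equals(key, StringComparison.Ordinal));

        if (exists)
        {
            return false;
        }

        File.AppendAllText(path, key + "," + value + Environment.NewLine);
        return true;
    }
}
```

File.ReadLines reads lazily, so the scan stops at the first matching key and never holds the whole file in memory.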

c# docx load xml from string xml

I have written a program that reads a file as an array of bytes. It consumes Word files using the DocX library from Xceed. What I want to do is recreate the parsed .docx file from that array of bytes.
To bytes:
var doc = DocX.Load("afile.docx");
...
return Encoding.Unicode.GetBytes(doc.Xml.Document.ToString());
Parse:
var doc = DocX.Create("anotherFile.docx");
var document = Encoding.Unicode.GetString({--returned bytes--}); // document is a string containing the XML
How can I save the document so that it matches the original?
I only get a blank file without any content.

using (var doc = DocX.Load("afile.docx"))
{
//here modify
doc.SaveAs("anotherFile.docx");
}

See the BinaryWriter documentation:
bWriter.Write(byteArray);
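A .docx file is a zip package made up of several parts, so the main document XML alone is not enough to rebuild it. A minimal sketch, assuming the goal is simply to carry the document around as bytes and write it back unchanged (the class and method names are hypothetical, and the DocX library is not involved):

```csharp
using System.IO;

static class DocxBytes
{
    // Reads the complete .docx package, not just its XML, into memory.
    public static byte[] ToBytes(string path)
    {
        return File.ReadAllBytes(path);
    }

    // Writes the same bytes back out; the result is a byte-for-byte copy.
    public static void FromBytes(byte[] bytes, string path)
    {
        File.WriteAllBytes(path, bytes);
    }
}
```

Usage would be DocxBytes.FromBytes(DocxBytes.ToBytes("afile.docx"), "anotherFile.docx"); the copy can then be opened and modified with the library as in the first answer.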

Replacing Invalid XML characters from an excel file and writing it back to disk causes file is corrupted error on opening in MS Excel

A little background on problem:
We have an ASP.NET MVC5 Application where we use FlexMonster to show the data in grid. The data source is a stored procedure that brings all the data into the UI grid, and once user clicks on export button, it exports the report to Excel. However, in some cases export to excel is failing.
Some of the data has some invalid characters, and it is not possible/feasible to fix the source as suggested here
My approach so far:
The EPPlus library fails when initializing the workbook because the input Excel file contains some invalid XML characters. I found that the file was written out with an invalid character in it, so I looked into possible approaches.
First, I identified the problematic character in the Excel file. I replaced the invalid character with a blank space manually using Notepad++, and EPPlus could then read the file successfully.
Next, using the approaches given in other SO threads here and here, I replaced all possible occurrences of invalid characters. At the moment I am using the
XmlConvert.IsXmlChar
method to find the problematic XML characters and replace them with a blank space.
I created a sample program in which I try to process the problematic Excel sheet.
//in main method
String readFile = File.ReadAllText(filePath);
string content = RemoveInvalidXmlChars(readFile);
File.WriteAllText(filePath, content);
//removal of invalid characters
static string RemoveInvalidXmlChars(string inputText)
{
    StringBuilder withoutInvalidXmlCharsBuilder = new StringBuilder();
    int firstOccurenceOfRealData = inputText.IndexOf("<t>");
    int lastOccurenceOfRealData = inputText.LastIndexOf("</t>");
    if (firstOccurenceOfRealData < 0 ||
        lastOccurenceOfRealData < 0 ||
        firstOccurenceOfRealData > lastOccurenceOfRealData)
        return inputText;
    withoutInvalidXmlCharsBuilder.Append(inputText.Substring(0, firstOccurenceOfRealData));
    int remaining = lastOccurenceOfRealData - firstOccurenceOfRealData;
    string textToCheckFor = inputText.Substring(firstOccurenceOfRealData, remaining);
    foreach (char c in textToCheckFor)
    {
        withoutInvalidXmlCharsBuilder.Append((XmlConvert.IsXmlChar(c)) ? c : ' ');
    }
    withoutInvalidXmlCharsBuilder.Append(inputText.Substring(lastOccurenceOfRealData));
    return withoutInvalidXmlCharsBuilder.ToString();
}
If I replace the problematic character manually using Notepad++, the file opens fine in MS Excel. The code above successfully replaces the same invalid character and writes the content back to the file. However, when I try to open that file in MS Excel, it reports that the file may be corrupted and displays no content (snapshots below). Moreover, the following code
var excelPackage = new ExcelPackage(new FileInfo(filePath));
on the file that I updated via Notepad++ throws the following exception:
"CRC error: the file being extracted appears to be corrupted. Expected 0x7478AABE, Actual 0xE9191E00"
My questions:
Is my approach to modifying the content this way correct?
If yes, how can I write the updated string to an Excel file?
If my approach is wrong, how can I get rid of the invalid XML characters?
Errors shown on opening the file (without the invalid XML character):
[Screenshot: first pop-up]
[Screenshot: dialog shown after clicking Yes]
Thanks in advance!

It does sound like a binary (presumably XLSX) file, based on your last comment. To confirm, open the file created by FlexMonster with 7zip. If it opens properly and you see a bunch of XML files in folders, it's an XLSX.
In that case, a search/replace on a binary file sounds like a very bad idea. It might work on the XML parts but might also replace legitimate characters in other parts. I think the better approach would be to do as @PanagiotisKanavos suggests and use ZipArchive. But you have to rebuild it in the right order, otherwise Excel complains. Similar to how it was done here https://stackoverflow.com/a/33312038/1324284, you could do something like this:
public static void ReplaceXmlString(this ZipArchive xlsxZip, FileInfo outFile, string oldString, string newstring)
{
    using (var outStream = outFile.Open(FileMode.Create, FileAccess.ReadWrite))
    using (var copiedzip = new ZipArchive(outStream, ZipArchiveMode.Update))
    {
        //Go through each file in the zip one by one and copy it over to the new file - entries need to be in order
        foreach (var entry in xlsxZip.Entries)
        {
            var newentry = copiedzip.CreateEntry(entry.FullName);
            var newstream = newentry.Open();
            var orgstream = entry.Open();
            //Copy non-xml files over
            if (!entry.Name.EndsWith(".xml"))
            {
                orgstream.CopyTo(newstream);
            }
            else
            {
                //Load the xml document to manipulate
                var xdoc = new XmlDocument();
                xdoc.Load(orgstream);
                var xml = xdoc.OuterXml.Replace(oldString, newstring);
                xdoc = new XmlDocument();
                xdoc.LoadXml(xml);
                xdoc.Save(newstream);
            }
            orgstream.Close();
            newstream.Flush();
            newstream.Close();
        }
    }
}
When it is used like this:
[TestMethod]
public void ReplaceXmlTest()
{
    var datatable = new DataTable("tblData");
    datatable.Columns.AddRange(new[]
    {
        new DataColumn("Col1", typeof (int)),
        new DataColumn("Col2", typeof (int)),
        new DataColumn("Col3", typeof (string))
    });
    for (var i = 0; i < 10; i++)
    {
        var row = datatable.NewRow();
        row[0] = i;
        row[1] = i * 10;
        row[2] = i % 2 == 0 ? "ABCD" : "AXCD";
        datatable.Rows.Add(row);
    }
    using (var pck = new ExcelPackage())
    {
        var workbook = pck.Workbook;
        var worksheet = workbook.Worksheets.Add("source");
        worksheet.Cells.LoadFromDataTable(datatable, true);
        worksheet.Tables.Add(worksheet.Cells["A1:C11"], "Table1");
        //Now simulate the copy/open of the excel file into a zip archive
        using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
        {
            var fi = new FileInfo(@"c:\temp\ReplaceXmlTest.xlsx");
            if (fi.Exists)
                fi.Delete();
            orginalzip.ReplaceXmlString(fi, "AXCD", "REPLACED!!");
        }
    }
}
Gives this: [screenshot of the resulting workbook omitted]
Just keep in mind that this is completely brute force. Anything you can do to make the file filter smarter, rather than simply processing ALL xml files, would be a very good thing. Maybe limit it to the sharedStrings.xml file if that is where the problem lies, or to the xml files in the worksheet folders. Hard to say without knowing more about the data.
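One caveat with loading each part into XmlDocument is that parsing itself fails if the part contains an invalid XML character, which is the situation described in the question. A minimal sketch of a narrower variant that treats only the shared-strings part as text and replaces invalid characters with a space. The class and method names are hypothetical, and it assumes the part is UTF-8 encoded, the invalid characters appear literally rather than as numeric character references, and the target file does not exist yet:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

static class XlsxSanitizer
{
    // Replaces characters that are not legal in XML 1.0 with a space.
    public static string ReplaceInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Keep a valid surrogate pair (high surrogate followed by low surrogate).
                sb.Append(c).Append(text[++i]);
            }
            else
            {
                sb.Append(' ');
            }
        }
        return sb.ToString();
    }

    // Copies every entry of the package in its original order and cleans
    // only the shared-strings part on the way.
    public static void SanitizeSharedStrings(string sourcePath, string targetPath)
    {
        using (ZipArchive source = ZipFile.OpenRead(sourcePath))
        using (ZipArchive target = ZipFile.Open(targetPath, ZipArchiveMode.Create))
        {
            foreach (ZipArchiveEntry entry in source.Entries)
            {
                ZipArchiveEntry copy = target.CreateEntry(entry.FullName);
                using (Stream input = entry.Open())
                using (Stream output = copy.Open())
                {
                    if (!entry.FullName.EndsWith("sharedStrings.xml", StringComparison.OrdinalIgnoreCase))
                    {
                        input.CopyTo(output);
                        continue;
                    }
                    string xml;
                    using (var reader = new StreamReader(input, Encoding.UTF8))
                    {
                        xml = reader.ReadToEnd();
                    }
                    // Write back as UTF-8 without a byte order mark.
                    byte[] cleaned = new UTF8Encoding(false).GetBytes(ReplaceInvalidXmlChars(xml));
                    output.Write(cleaned, 0, cleaned.Length);
                }
            }
        }
    }
}
```

Because the archive is rebuilt entry by entry, the zip checksums are recomputed for the cleaned part, which editing the .xlsx file as plain text does not do.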

Read CSV file and insert to LocalDB (asp.net MVC)

I'm trying out a project with ASP.Net MVC and have a large CSV file that I want to save to the LocalDB.
I have been following this tutorial (and the ones before that are about MVC): https://learn.microsoft.com/en-us/aspnet/mvc/overview/getting-started/introduction/creating-a-connection-string
Now I want to add data to this database that I have set up and I would like to read this data from a csv file and then save it to my database.
I have tried this: https://www.aspsnippets.com/Articles/Upload-Read-and-Display-CSV-file-Text-File-data-in-ASPNet-MVC.aspx
but when I try to upload my file I get an error that my file is too large?
I would love it if it could be automated so that when I start my application the database will be populated with the data from my csv file (and if it already is populated it will not do it again) or just some way of coding so that I can add the data from my csv file to the database (LocalDB).
protected override void Seed(ProductsDBContext context)
{
Assembly assembly = Assembly.GetExecutingAssembly();
string resourceName = "WebbApplication.App_Data.SeedData.price_detail.csv";
using (Stream stream = assembly.GetManifestResourceStream(resourceName))
{
using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
{
CsvReader csvReader = new CsvReader(reader);
var products = csvReader.GetRecords<PriceDetail>().ToArray();
context.PriceDetails.AddOrUpdate(c => c.PriceValueId, products);
}
}
}

Your second link includes the following line:
string csvData = System.IO.File.ReadAllText(filePath);
If you are getting an out-of-memory exception, then you should not load the entire file into memory at once, i.e. do not read all of the text.
StreamReader can read the file one line at a time instead:
using (var file = new System.IO.StreamReader(filePath))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        // Replace with your own operation
        System.Console.WriteLine(line);
    }
}
Potentially the same problem solved at this question.
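When the lines are destined for a database, saving them in groups keeps memory use flat without issuing one write per row. A minimal sketch of reading a CSV in fixed-size batches; the class and method names are hypothetical, and what happens to each batch (for example adding it to a context and saving) is left to the caller:

```csharp
using System.Collections.Generic;
using System.IO;

static class CsvBatches
{
    // Lazily yields the reader's lines in lists of at most batchSize items,
    // so only one batch is held in memory at a time.
    public static IEnumerable<List<string>> ReadInBatches(TextReader reader, int batchSize)
    {
        var batch = new List<string>(batchSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // the final, possibly shorter, batch
        }
    }
}
```

Passing a StreamReader over the uploaded or embedded file works the same way as the StringReader a unit test would use.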

With Cinchoo ETL, an open source library, you can bulk load a CSV file into SQL Server with a few lines of code.
using (var p = new ChoCSVReader(** YOUR CSV FILE **)
.WithFirstLineHeader()
)
{
p.Bcp("** ConnectionString **", "** tablename **");
}
For more information, please see the CodeProject article.
Hope it helps.

Reading contents of CSV file

I have loaded a CSV file
Here is a sample of the content available in the CSV file
Name,Address,Address1,LandMark,User_location,City,State,Phone1,Phone2,Email,Category
Sriram Electricals and Plumbing Contractors,No 12, Vinayakar Koil Street Easa,"Back Side Of Therasa School,",Pallavaram,Chennai,Tamil Nadu,(044) 66590405,,sriram@gmail.com,Electrican
I've tried to convert the file to a list
public ActionResult UserCsv(HttpPostedFileBase uploadfile)
{
using (var sr = new StreamReader(uploadfile.InputStream, Encoding.UTF8))
{
var reader = new CsvReader(sr);
//CSVReader will now read the whole file into an enumerable
IEnumerable<UserCSVModel> records = reader.GetRecords<UserCSVModel>();
}
}
Unable to get a correct output.

Try this article:
http://www.codeproject.com/Articles/415732/Reading-and-Writing-CSV-Files-in-Csharp
Or this question on Stack Overflow:
Reading CSV file and storing values into an array
Hope it helps.

Have a look at http://www.filehelpers.net/. It's a great library for working with CSV files and will give you an Enumerable that you can work with.
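The sample row in the question contains a quoted field with a comma inside it (the LandMark value), which is the case a plain Split(',') gets wrong and the libraries above handle. A minimal sketch of quote-aware splitting for a single line, assuming double quotes as the quote character, "" as an escaped quote, and no line breaks inside fields (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one CSV line into fields, treating commas inside double quotes
    // as part of the field and "" inside quotes as a literal quote.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false; // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field
        return fields;
    }
}
```

For real uploads a maintained CSV library remains the safer choice, since it also covers multi-line fields and mapping to model classes.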
