I am using Papa Parse as the CSV parser for my data, but I can't get the data converted to UTF-8.
string name = dataitem.Headers.ContentDisposition.FileName.Replace("\"", "");
string newFileName = Guid.NewGuid() + Path.GetExtension(name);
File.Move(dataitem.LocalFileName, Path.Combine(rootPath, newFileName));
List<JObject> rows = new List<JObject>();
using (FileStream stream = File.OpenRead(Path.Combine(rootPath, newFileName)))
{
Papa.parse(stream, new Config()
{
header = true,
skipEmptyLines = true,
encoding = Encoding.UTF8,
complete = parsed =>
{
foreach (JObject jo in JArray.Parse(parsed.dataWithHeader.DumpAsJson()))
rows.Add(jo);
var dt = new DataTable();
dt.Columns.Add("data");
foreach (object jo in rows)
dt.Rows.Add(jo.ToString());
if (result.Rows[0]["Result"].ToString() == "False")
{
throw new Exception(result.Rows[0]["Message"].ToString());
}
}
});
}
File.Delete(Path.Combine(rootPath, newFileName));
Upon checking, when Papa Parse parses the data it turns Evelyn N Baliño into Evelyn N Bali�o, even though I already changed the encoding to UTF-8. What am I doing wrong? Should I specify the encoding in the FileStream?
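One way to narrow this down, before blaming the parser, would be to check whether the bytes on disk are valid UTF-8 at all; a minimal diagnostic sketch (reusing rootPath and newFileName from above):

// Read the raw file back with an explicit UTF-8 decoder.
// If the name already prints as "Bali�o" here, the file itself is not UTF-8
// (CSVs exported from Excel are often Windows-1252), and no parser setting can fix that.
using (var reader = new StreamReader(Path.Combine(rootPath, newFileName), Encoding.UTF8))
{
    Console.WriteLine(reader.ReadLine());
}

If the file turns out to be Windows-1252, decoding it with Encoding.GetEncoding(1252) instead should recover the "ñ".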
I was trying to read a CSV file in C#.
I tried File.ReadAllLines(path).Select(a => a.Split(';')), but the problem is that it does not work when a cell contains a multi-line value (an embedded \n).
So I tried the following:
using LumenWorks.Framework.IO.Csv;
var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
csvTable.Load(csvReader);
}
for (int i = 0; i < csvTable.Rows.Count; i++)
{
if (!(csvTable.Rows[i][0] is DBNull))
{
var row1= csvTable.Rows[i][0];
}
if (!(csvTable.Rows[i][1] is DBNull))
{
var row2= csvTable.Rows[i][1];
}
}
The issue is that the above code throws an exception:
The CSV appears to be corrupt near record '0' field '5 at position '63'
This is because the CSV header contains doubled double quotes, as below:
"Header1",""Header2""
Is there a way to ignore the double quotes and process the CSV?
Update
I have tried TextFieldParser, as below:
public static void GetCSVData()
{
using (var parser = new TextFieldParser(path))
{
parser.HasFieldsEnclosedInQuotes = false;
parser.Delimiters = new[] { "," };
while (parser.PeekChars(1) != null)
{
string[] fields = parser.ReadFields();
foreach (var field in fields)
{
Console.Write(field + " ");
}
Console.WriteLine(Environment.NewLine);
}
}
}
(The output and the sample CSV data I used were attached as screenshots.)
Any help is appreciated.
Hope this works!
First, replace the doubled double quotes in the CSV, as below:
using (FileStream fs = new FileStream(Path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
StreamReader sr = new StreamReader(fs);
string contents = sr.ReadToEnd();
// replace "" with "
contents = contents.Replace("\"\"", "\"");
// go back to the beginning of the stream
fs.Seek(0, SeekOrigin.Begin);
// adjust the length to make sure all of the original
// content is overwritten
fs.SetLength(contents.Length);
StreamWriter sw = new StreamWriter(fs);
sw.Write(contents);
sw.Close();
}
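The in-place rewrite above works, but the Seek/SetLength bookkeeping is easy to get wrong (contents.Length counts characters, not bytes). A simpler sketch of the same idea, assuming the file fits in memory (path is the CSV path):

// Read the file, collapse the doubled quotes, and rewrite it in two steps,
// letting the File.* helpers handle stream lengths and encodings.
string contents = File.ReadAllText(path);
contents = contents.Replace("\"\"", "\"");
File.WriteAllText(path, contents);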
Then use the same LumenWorks CSV reader:
using LumenWorks.Framework.IO.Csv;
var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
csvTable.Load(csvReader);
}
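Alternatively, LumenWorks can be told to skip records it cannot parse instead of throwing, which avoids rewriting the file; a sketch, assuming it is acceptable to lose the malformed line entirely:

var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
    // Skip any record that fails to parse (such as the doubled-quote header)
    // rather than throwing "The CSV appears to be corrupt".
    csvReader.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
    csvTable.Load(csvReader);
}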
Thanks.
I have the following code that I'm trying to use to parse a CSV file that is being uploaded:
private Dictionary<string, string[]> LoadData(IFormFile file)
{
// Verify that the user selected a file
if (file != null && file.Length > 0)
{
string wwwPath = this.environment.WebRootPath;
// string contentPath = this.environment.ContentRootPath;
string path = Path.Combine(wwwPath, "WeeklySchedules");
if (!Directory.Exists(path))
{
Directory.CreateDirectory(path);
}
string fileName = Path.GetFileName(file.FileName);
using (FileStream stream = new FileStream(Path.Combine(path, fileName), FileMode.Create))
{
file.CopyTo(stream);
// System.Threading.Thread.Sleep(1000);
using (TextFieldParser parser = new TextFieldParser(Path.Combine(path, fileName)))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
Dictionary<string, string[]> parsedData = new Dictionary<string, string[]>();
while (!parser.EndOfData)
{
// Process row
string[] fields = parser.ReadFields();
int count = 0;
if (count++ == 0)
{
continue;
}
var pickup = fields[0];
var pickupDate = fields[1];
var dropoff = fields[2];
var dropoffDate = fields[3];
var driver = fields[7];
var pickupTime = DateTime.Parse(pickupDate).ToLongTimeString();
// string[] data =
}
}
}
}
return null;
}
You will note that I am passing the parser the path of the saved upload, rather than the stream itself. I tried passing in the stream, but that doesn't work either. When I check in wwwroot/WeeklySchedules, the file is there. But when the parser gets to it, it comes back as empty. I even threw in a Sleep() to see if I was just hitting the file too soon, but that didn't make any difference.
I am getting some weird errors with the original stream, but the file is written, which is puzzling to me.
The errors are:
stream.ReadTimeout = 'stream.ReadTimeout' threw an exception of type 'System.InvalidOperationException'
stream.WriteTimeout = 'stream.WriteTimeout' threw an exception of type 'System.InvalidOperationException'
I've read through a bunch of blog posts and SO questions on the technique for loading/parsing a CSV file, but none of them indicate this as an issue.
Does anyone have any ideas?
Your first FileStream is still open inside its using block while you try to read the same file again with TextFieldParser. Close it first, as below. (One more fix: int count = 0; was declared inside the while loop, so the header-skip continue fired on every row; it is moved outside the loop here.)
private Dictionary<string, string[]> LoadData(IFormFile file)
{
// Verify that the user selected a file
if (file != null && file.Length > 0)
{
string wwwPath = this.environment.WebRootPath;
// string contentPath = this.environment.ContentRootPath;
string path = Path.Combine(wwwPath, "WeeklySchedules");
if (!Directory.Exists(path))
{
Directory.CreateDirectory(path);
}
string fileName = Path.GetFileName(file.FileName);
using (FileStream stream = new FileStream(Path.Combine(path, fileName), FileMode.Create))
{
file.CopyTo(stream);
}
// System.Threading.Thread.Sleep(1000);
using (TextFieldParser parser = new TextFieldParser(Path.Combine(path, fileName)))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
Dictionary<string, string[]> parsedData = new Dictionary<string, string[]>();
int count = 0; // declared outside the loop so only the header row is skipped
while (!parser.EndOfData)
{
// Process row
string[] fields = parser.ReadFields();
if (count++ == 0)
{
continue; // skip the header row
}
var pickup = fields[0];
var pickupDate = fields[1];
var dropoff = fields[2];
var dropoffDate = fields[3];
var driver = fields[7];
var pickupTime = DateTime.Parse(pickupDate).ToLongTimeString();
// string[] data =
}
}
}
return null;
}
Preserving your approach of going via a file: untangle the two using statements so that the file has been written completely and closed properly before the parser starts reading it.
using (FileStream stream = new FileStream(Path.Combine(path, fileName), FileMode.Create))
{
file.CopyTo(stream);
}
using (TextFieldParser parser = new TextFieldParser(Path.Combine(path, fileName)))
{
// ..
}
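If the file on disk is not actually needed, the round-trip can be skipped entirely; a sketch parsing the upload in memory (assuming the uploads are small enough to buffer):

using (var ms = new MemoryStream())
{
    file.CopyTo(ms);
    ms.Position = 0; // rewind: CopyTo leaves the position at the end of the stream
    using (var parser = new TextFieldParser(ms))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        // ... same row processing as above
    }
}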
(This question was closed as a duplicate of "Writing Large File To Disk Out Of Memory Exception".)
We have an endpoint that loads records from the database, creates a CSV from them, and then returns the file stream. But when there are more than 200K records, we get an OutOfMemoryException.
public async Task<IActionResult> Export()
{
var records = // get all records from the database
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream);
var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);
await csvWriter.WriteRecordsAsync(records);
csvWriter.Flush();
streamWriter.Flush();
memoryStream.Flush();
string filename = $"Records_{DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss")}.csv";
memoryStream.Seek(0, SeekOrigin.Begin);
return File(memoryStream, "text/csv", filename);
}
Is there a better way of doing this to prevent the OutOfMemoryException?
I don't know why you are getting this error, but with the code below I am able to read large amounts of data; I tested with approximately 3 GB. What is your data size?
Here is my code using CsvHelper.
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
using (TextReader reader = new StreamReader(source, Encoding.UTF8))
{
var cache = new TypeConverterCache();
cache.AddConverter<float>(new CSVSingleConverter());
cache.AddConverter<double>(new CSVDoubleConverter());
var csv = new CsvReader(reader,
new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
{
Delimiter = ";",
HasHeaderRecord = true,
CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
TypeConverterCache = cache
});
csv.Read();
csv.ReadHeader();
var map = (
from col in cols
from src in col.Sources()
let index = csv.GetFieldIndex(src, isTryGet: true)
where index != -1
select new { col.Name, Index = index, Type = col.DataType }).ToList();
while (csv.Read())
{
yield return map.ToDictionary(
col => col.Name,
col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
}
}
}
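That covers the read side. For the original export endpoint, the usual fix for the OutOfMemoryException is to avoid buffering the whole CSV in a MemoryStream and write it straight to the response body instead. A sketch, assuming a recent CsvHelper version (GetRecords stands in for the database query; colons are dropped from the timestamp because they are not valid in a Content-Disposition filename):

public async Task Export()
{
    var records = GetRecords(); // placeholder for the database query
    string filename = $"Records_{DateTime.UtcNow:yyyy-MM-ddTHH-mm-ss}.csv";
    Response.ContentType = "text/csv";
    Response.Headers.Add("Content-Disposition", $"attachment; filename=\"{filename}\"");

    // Rows are encoded and flushed incrementally into the response stream,
    // so memory use stays flat regardless of the record count.
    var streamWriter = new StreamWriter(Response.Body);
    var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);
    await csvWriter.WriteRecordsAsync(records);
    await csvWriter.FlushAsync();
    await streamWriter.FlushAsync();
}

If the database query itself streams (an unbuffered IEnumerable rather than a fully loaded list), the endpoint never holds more than a handful of rows at a time.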
I'm trying to generate a JSON file from Excel files. I have different Excel files and I would like to read them and generate a JSON file. I imagine it must be quite easy, but I'm having some trouble.
OK, so I'm using the Excel Data Reader tool, as this is what my lead says we should use. I tried following this link: https://www.hanselman.com/blog/ConvertingAnExcelWorksheetIntoAJSONDocumentWithCAndNETCoreAndExcelDataReader.aspx
I always get the ReadTimeout and WriteTimeout errors. Also, it never reads my Excel file; it always writes null into my JSON document.
public static IActionResult GetData(
[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
ILogger log)
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var inFilePath = "C:\\Users\\a\\Desktop\\exelreader\\Wave.xlsx";
var outFilePath = "C:\\Users\\a\\Desktop\\exelreader\\text.json";
using (var inFile = File.Open(inFilePath, FileMode.Open, FileAccess.Read))
using (var outFile = File.CreateText(outFilePath))
{
using (var reader = ExcelReaderFactory.CreateReader(inFile, new ExcelReaderConfiguration()
{ FallbackEncoding = Encoding.GetEncoding(1252) }))
using (var writer = new JsonTextWriter(outFile))
{
writer.Formatting = Formatting.Indented; //I likes it tidy
writer.WriteStartArray();
reader.Read(); //SKIP FIRST ROW, it's TITLES.
do
{
while (reader.Read())
{
//peek ahead? Bail before we start anything so we don't get an empty object
var status = reader.GetString(1);
if (string.IsNullOrEmpty(status)) break;
writer.WriteStartObject();
writer.WritePropertyName("Source");
writer.WriteValue(reader.GetString(1));
writer.WritePropertyName("Event");
writer.WriteValue(reader.GetString(2));
writer.WritePropertyName("Campaign");
writer.WriteValue(reader.GetString(3));
writer.WritePropertyName("EventDate");
writer.WriteValue(reader.GetString(4));
//writer.WritePropertyName("FirstName");
//writer.WriteValue(reader.GetString(5).ToString());
//writer.WritePropertyName("LastName");
//writer.WriteValue(reader.GetString(6).ToString());
writer.WriteEndObject();
}
} while (reader.NextResult());
writer.WriteEndArray();
}
}
//never mind this return
return null;
}
Can anybody give some help on this matter? The idea is to read the first row of my Excel files as headers and then the other rows as values, so I can write the JSON.
For converting Excel data to JSON, you could try reading the Excel data as a DataSet and then serializing the DataSet to JSON.
Try the code below:
public async Task<IActionResult> ConvertExcelToJson()
{
var inFilePath = @"xx\Wave.xlsx";
var outFilePath = @"xx\text.json";
using (var inFile = System.IO.File.Open(inFilePath, FileMode.Open, FileAccess.Read))
using (var outFile = System.IO.File.CreateText(outFilePath))
{
using (var reader = ExcelReaderFactory.CreateReader(inFile, new ExcelReaderConfiguration()
{ FallbackEncoding = Encoding.GetEncoding(1252) }))
{
var ds = reader.AsDataSet(new ExcelDataSetConfiguration()
{
ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
{
UseHeaderRow = true
}
});
var table = ds.Tables[0];
var json = JsonConvert.SerializeObject(table, Formatting.Indented);
outFile.Write(json);
}
}
return Ok();
}
For AsDataSet, install the package ExcelDataReader.DataSet. If you get an error related to Encoding.GetEncoding(1252), add the line below to Startup.cs:
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
Reference: ExcelDataReader
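One design note: AsDataSet materializes the entire workbook in memory, which is fine for report-sized files; for very large sheets, the row-by-row reader.Read() loop from the question scales better.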
I know how to convert JSON data into a DataTable; here I need to know whether there is any formula to get the expected DataTable row count without actually converting the JSON into a DataTable.
As already commented, parse the big JSON as a stream to handle huge amounts of data.
Then it's up to you to count the rows or process them into DataTables without memory exceptions:
using (FileStream s = File.OpenRead("big.json")) // or any other stream
using (StreamReader streamReader = new StreamReader(s))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
reader.SupportMultipleContent = true;
int rowCount = 0;
var serializer = new JsonSerializer();
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
Contact contact = serializer.Deserialize<Contact>(reader); // Contact stands in for your row type
rowCount++;
}
}
}
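If only the row count is needed, the per-row deserialization can be dropped entirely; a count-only sketch, assuming the rows are objects sitting directly inside a top-level array (nesting depth 1):

int rowCount = 0;
using (var streamReader = new StreamReader("big.json"))
using (var reader = new JsonTextReader(streamReader))
{
    while (reader.Read())
    {
        // Count object starts at depth 1 without materializing anything.
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 1)
            rowCount++;
    }
}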
You can filter using JObject this way:
string jsonData = "";
using (StreamReader reader = new StreamReader("big.json"))
{
jsonData = reader.ReadToEnd();
reader.Close();
}
JObject o = JObject.Parse(jsonData);
var results = o["datatable"].Where(x => (bool)x["filter"]).ToArray();
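Bear in mind that JObject.Parse loads the whole document into memory, so this filtering approach only suits files that fit comfortably in RAM; for very large files, prefer the streaming JsonTextReader approach above.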