Working with a CSV file - C#

I've been trying to solve this problem for about a week now, and at this point I'm wondering whether I can solve it without diving even deeper into the C# language. I'm fairly new to C#, as well as to working with CSV files and sorting and organizing them, so I'm inexperienced across the whole spectrum of this.
I'm trying to sort a CSV file alphabetically, hide the items that are flagged as hidden, and give each item a depth level based on its parent, child, and grandchild relationships.
I've been successful with a couple of these and written somewhat working code, but I don't know how to sort the items alphabetically and give them the proper depth level based on the parent and child they belong to.
Here's the mockup CSV that I've been trying to organize:
ID;MenuName;ParentID;isHidden;LinkURL
1;Company;NULL;False;/company
2;About Us;1;False;/company/aboutus
3;Mission;1;False;/company/mission
4;Team;2;False;/company/aboutus/team
5;Client 2;10;False;/references/client2
6;Client 1;10;False;/references/client1
7;Client 4;10;True;/references/client4
8;Client 5;10;True;/references/client5
10;References;NULL;False;/references
I've split the items on the semicolon and displayed the items that need to be shown, but I can't sort them the way I should.
The sorting should look like this:
Company
About Us
Team
Mission
References
Client 1
Client 2
I've tried to sort them, or display them in that order, by counting the slashes in the link URL, but what the code produces is not how it should be displayed; it looks like this:
Company
About Us
Mission
Team
Client 2
Client 1
References
In my other attempt, where I recursively match the parent ID with the ID, the console output looks like this:
Company
About Us
Mission
Team
Client 2
Client 1
References
I've tried solving this with a friend, and even he doesn't know how to approach the problem, since this code should also work on a different file that uses different parent IDs.
On top of all this, I'm unable to index the items into an array: either there's only an index 0, the index is based on their letters, or the console crashes if I enter an index position of 1.
Here's the code for the first part where I fail to sort them:
class Program
{
static void Main(string[] args)
{
StreamReader sr = new StreamReader(@"Navigation.csv");
string data = sr.ReadLine();
while (data != null)
{
string[] rows = data.Split(';');
int id;
int parentId;
bool ids = Int32.TryParse(rows[0], out id);
string name = rows[1];
bool pIds = Int32.TryParse(rows[2], out parentId);
string isHidden = rows[3];
string linkUrl = rows[4];
string[] splitted = linkUrl.Split('/');
if (isHidden == "False")
{
List<CsvParentChild> pIdCid = new List<CsvParentChild>()
{
new CsvParentChild(id, parentId, name, linkUrl)
};
}
data = sr.ReadLine();
}
}
}
class CsvParentChild
{
public int Id;
public int ParentId;
public string Name;
public string LinkUrl;
public List<CsvParentChild> Children = new List<CsvParentChild>();
public CsvParentChild(int id, int parentId, string name, string linkUrl)
{
Id = id;
ParentId = parentId;
Name = name;
LinkUrl = linkUrl;
string[] splitted = linkUrl.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
if (splitted.Length == 1)
{
Console.WriteLine($". { name }");
}
else if (splitted.Length == 2)
{
Console.WriteLine($".... { name }");
}
else if (splitted.Length == 3)
{
Console.WriteLine($"....... { name }");
}
}
}
And here's the code for the second part:
class Program
{
static void Main(string[] args)
{
// Get the path for the file
const string filePath = @"../../Navigation.csv";
// Read the file
StreamReader sr = new StreamReader(File.OpenRead(filePath));
string data = sr.ReadLine();
while (data != null)
{
string[] rows = data.Split(';');
ListItems lis = new ListItems();
int id;
int parentId;
// Get the rows/columns from the Csv file
bool ids = Int32.TryParse(rows[0], out id);
string name = rows[1];
bool parentIds = Int32.TryParse(rows[2], out parentId);
string isHidden = rows[3];
string linkUrl = rows[4];
// Split the linkUrl so that we get the position of the
// elements based on their slash
string [] splitted = linkUrl.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
// If item.isHidden == "False"
// then display the all items whose state is set to false.
// If the item.isHidden == "True", then display the item
// whose state is set to true.
if (isHidden == "False")
{
// Set the items
ListItems.data = new List<ListItems>()
{
new ListItems() { Id = id, Name = name, ParentId = parentId },
};
// Make a new instance of ListItems()
ListItems listItems = new ListItems();
// Loop through the CSV data
for (var i = 0; i < data.Count(); i++)
{
if (splitted.Length == 1)
{
listItems.ListThroughItems(i, i);
}
else if (splitted.Length == 2)
{
listItems.ListThroughItems(i, i);
}
else
{
listItems.ListThroughItems(i, i);
}
}
}
// Break out of infinite loop
data = sr.ReadLine();
}
}
public class ListItems
{
public int Id { get; set; }
public string Name { get; set; }
public int ParentId { get; set; }
public static List<ListItems> data = null;
public List<ListItems> Children = new List<ListItems>();
// http://stackoverflow.com/a/36250045/7826856
public void ListThroughItems(int id, int level)
{
Id = id;
// Match the parent id with the id
List<ListItems> children = data
.Where(p => p.ParentId == id)
.ToList();
foreach (ListItems child in children)
{
string depth = new string('.', level * 4);
Console.WriteLine($".{ depth } { child.Name }");
ListThroughItems(child.Id, level + 1);
}
}
}
}

For each item, you need to construct a kind of "sort array" consisting of ids: the ids of the item's ancestors in order from most distant to least distant, followed by the item's own id. For "Team", the sort array is [1, 2, 4].
Here are the sort arrays of each item:
[1]
[1, 2]
[1, 3]
[1, 2, 4]
[10, 5]
[10, 6]
[10, 7]
[10, 8]
[10]
Once you have this, sorting the items is simple. When comparing two "sort arrays", compare their numbers pairwise in order. At the first position where they differ, sort by that number and you're done. If one array runs out of numbers first, the shorter array comes first, i.e., nothing comes before something.
Applying this algorithm, we get:
[1]
[1, 2]
[1, 2, 4]
[1, 3]
[10]
[10, 5]
[10, 6]
[10, 7]
[10, 8]
After that, hide the items based on the flag. I leave that to you because it's so simple. Depth is easy: It's the length of the sort array.
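To make this concrete, here is a minimal sketch of that approach (the Row class, the dictionary lookup, and the hardcoded sample data are assumptions for illustration, not code from the question):

using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public int Id;
    public int? ParentId;
    public string Name;
    public bool IsHidden;
}

class SortArrayDemo
{
    // Builds the "sort array": ancestor ids from root to the item itself.
    static List<int> SortKey(Row row, Dictionary<int, Row> byId)
    {
        var key = new List<int>();
        for (Row current = row; current != null;
             current = current.ParentId.HasValue ? byId[current.ParentId.Value] : null)
        {
            key.Insert(0, current.Id);
        }
        return key;
    }

    // Pairwise comparison; when one key is a prefix of the other,
    // the shorter one ("nothing") comes first.
    static int CompareKeys(List<int> a, List<int> b)
    {
        for (int i = 0; i < Math.Min(a.Count, b.Count); i++)
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        return a.Count.CompareTo(b.Count);
    }

    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Id = 1, ParentId = null, Name = "Company" },
            new Row { Id = 2, ParentId = 1, Name = "About Us" },
            new Row { Id = 3, ParentId = 1, Name = "Mission" },
            new Row { Id = 4, ParentId = 2, Name = "Team" },
            new Row { Id = 10, ParentId = null, Name = "References" },
        };
        var byId = rows.ToDictionary(r => r.Id);
        var keys = rows.ToDictionary(r => r.Id, r => SortKey(r, byId));
        rows.Sort((x, y) => CompareKeys(keys[x.Id], keys[y.Id]));
        foreach (var row in rows.Where(r => !r.IsHidden))
        {
            // Depth is the length of the sort array.
            Console.WriteLine($"{new string('.', keys[row.Id].Count)} {row.Name}");
        }
    }
}

Run against the sample data, this prints Company, About Us, Team, Mission, References, which matches the expected order above.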

My application, when compiled, produced the following output with your data:
Company
About Us
Team
Mission
References
Client 1
Client 2
Client 4
Client 5
I would attempt to use object relations to create your tree-like structure.
The main difficulty with the question is that parents don't matter. Children do.
So at some point in your code you will need to reverse the hierarchy: the rows reference their parents, but you read from the parents down to their children to create the output.
The roots of our tree are the data entries without parents.
Parsing
This should be pretty self-explanatory: we have a nice class with a constructor that parses the input array and stores the data in its properties.
We store all the rows in a list. After we are done with this, we have converted the list into objects, but no sorting has happened at all.
public partial class csvRow
{
// Your Data
public int Id { get; private set; }
public string MenuName { get; private set; }
public int? ParentId { get; private set; }
public bool isHidden { get; private set; }
public string LinkURL { get; private set; }
public csvRow(string[] arr)
{
Id = Int32.Parse(arr[0]);
MenuName = arr[1];
//Parent Id can be null!
ParentId = ToNullableInt(arr[2]);
isHidden = bool.Parse(arr[3]);
LinkURL = arr[4];
}
private static int? ToNullableInt(string s)
{
int i;
if (int.TryParse(s, out i))
return i;
else
return null;
}
}
static void Main(string[] args)
{
List<csvRow> unsortedRows = new List<csvRow>();
// Read the file
const string filePath = @"Navigation.csv";
StreamReader sr = new StreamReader(File.OpenRead(filePath));
string data = sr.ReadLine();
//Read each line
while (data != null)
{
var dataSplit = data.Split(';');
//We need to avoid parsing the first line.
if (dataSplit[0] != "ID" )
{
csvRow lis = new csvRow(dataSplit);
unsortedRows.Add(lis);
}
// Break out of infinite loop
data = sr.ReadLine();
}
sr.Dispose();
//At this point we got our data in our List<csvRow> unsortedRows
//It's parsed nicely. But we still need to sort it.
//So let's get ourselves the root values. Those are the data entries that don't have a parent.
//Please Note that the main method continues afterwards.
Creating our Tree Structure and Sorting the Items
We start by defining Children and a public ChildrenSorted property that returns them sorted. That's actually all the sorting we are doing; it's a lot easier to sort this way than to work recursively.
We also need a function that adds children. It pretty much filters the input and finds all the rows where row.ParentId == this.Id.
The last one defines our output and allows us to get something we can print to the console.
public partial class csvRow
{
private List<csvRow> children = new List<csvRow>();
public List<csvRow> ChildrenSorted
{
get
{
// This is a quite neat way of sorting, isn't it?
//Btw this is all the sorting we are doing, recursion for the win!
return children.OrderBy(row => row.MenuName).ToList();
}
}
public void addChildrenFrom(List<csvRow> unsortedRows)
{
// Adds only rows where this is the parent.
this.children.AddRange(unsortedRows.Where(
//Avoid running into null errors
row => row.ParentId.HasValue &&
//Find actual children
row.ParentId == this.Id &&
//Avoid adding a child twice. This shouldn't be a problem with your data,
//but why not be careful?
!this.children.Any(child => child.Id == row.Id)));
//And this is where the magic happens. We are doing this recursively.
foreach (csvRow child in this.children)
{
child.addChildrenFrom(unsortedRows);
}
}
//Depending on your use case this function should be replaced with something
//that actually makes sense for your business logic; it's an example of
//how to read from a recursive structure.
public List<string> FamilyTree
{
get
{
List<string> myFamily = new List<string>();
myFamily.Add(this.MenuName);
//Merges the Trees with itself as root.
foreach (csvRow child in this.ChildrenSorted)
{
foreach (string familyMember in child.FamilyTree)
{
//Adds a tab for all children, grandchildren etc.
myFamily.Add("\t" + familyMember);
}
}
return myFamily;
}
}
}
Adding Items to the Tree and Accessing Them
This is the second part of my main function, where we actually work with our data (right after sr.Dispose();):
var roots = unsortedRows.Where(row => row.ParentId.HasValue == false).
OrderBy(root => root.MenuName).ToList();
foreach (csvRow root in roots)
{
root.addChildrenFrom(unsortedRows);
}
foreach (csvRow root in roots)
{
foreach (string FamilyMember in root.FamilyTree)
{
Console.WriteLine(FamilyMember);
}
}
Console.Read();
}
Entire Source Code (Visual Studio C# Console Application)
You can use this to test, play around and learn more about recursive structures.
Copyright 2017 Eldar Kersebaum
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication49
{
class Program
{
static void Main(string[] args)
{
List<csvRow> unsortedRows = new List<csvRow>();
const string filePath = @"Navigation.csv";
StreamReader sr = new StreamReader(File.OpenRead(filePath));
string data = sr.ReadLine();
while (data != null)
{
var dataSplit = data.Split(';');
//We need to avoid parsing the first line.
if (dataSplit[0] != "ID" )
{
csvRow lis = new csvRow(dataSplit);
unsortedRows.Add(lis);
}
// Break out of infinite loop
data = sr.ReadLine();
}
sr.Dispose();
var roots = unsortedRows.Where(row => row.ParentId.HasValue == false).
OrderBy(root => root.MenuName).ToList();
foreach (csvRow root in roots)
{
root.addChildrenFrom(unsortedRows);
}
foreach (csvRow root in roots)
{
foreach (string FamilyMember in root.FamilyTree)
{
Console.WriteLine(FamilyMember);
}
}
Console.Read();
}
}
public partial class csvRow
{
// Your Data
public int Id { get; private set; }
public string MenuName { get; private set; }
public int? ParentId { get; private set; }
public bool isHidden { get; private set; }
public string LinkURL { get; private set; }
public csvRow(string[] arr)
{
Id = Int32.Parse(arr[0]);
MenuName = arr[1];
ParentId = ToNullableInt(arr[2]);
isHidden = bool.Parse(arr[3]);
LinkURL = arr[4];
}
private static int? ToNullableInt(string s)
{
int i;
if (int.TryParse(s, out i))
return i;
else
return null;
}
private List<csvRow> children = new List<csvRow>();
public List<csvRow> ChildrenSorted
{
get
{
return children.OrderBy(row => row.MenuName).ToList();
}
}
public void addChildrenFrom(List<csvRow> unsortedRows)
{
this.children.AddRange(unsortedRows.Where(
row => row.ParentId.HasValue &&
row.ParentId == this.Id &&
!this.children.Any(child => child.Id == row.Id)));
foreach (csvRow child in this.children)
{
child.addChildrenFrom(unsortedRows);
}
}
public List<string> FamilyTree
{
get
{
List<string> myFamily = new List<string>();
myFamily.Add(this.MenuName);
foreach (csvRow child in this.ChildrenSorted)
{
foreach (string familyMember in child.FamilyTree)
{
myFamily.Add("\t" + familyMember);
}
}
return myFamily;
}
}
}
}

Related

Problem in databinding Array data to DataGridView in c#

I have been binding simple data to a DataGridView in C# WinForms. However, I now need to bind a long string array of size 75 to the DataGridView. My data list class consists of 6 individual variables with get and set, plus an array of strings for which I have defined get and set properties. The individual variables are displayed, but the array of strings is not displayed in the DataGridView. In debug, I checked the data source of the DataGridView and it seems OK. How can I display the bound array in the grid view?
Below is my source code to populate DataGridView named Logview
public void populateLogData(string path)
{
StreamReader sr = null;
BindingList<LogList> bindLogList;
BindingSource bLogsource = new BindingSource();
List<LogList> loglist = new List<LogList>();
try
{
Logview.DataSource = null;
Logview.Rows.Clear();
Logview.Columns.Clear();
Logview.AutoGenerateColumns = true;
if (File.Exists(path))
{
try
{
sr = new StreamReader(path);
StringBuilder readline = new StringBuilder(sr.ReadLine());
if (readline.ToString() != null && readline.ToString() != "")
{
readline = new StringBuilder(sr.ReadLine());
while (readline.ToString() != null && readline.ToString() != "")
{
string[] subdata = readline.ToString().Split(',');
LogList tloglist = new LogList(subdata[0], subdata[1], subdata[2], subdata[3], subdata[4], subdata[5], max_index);
for (int i = 6; i < subdata.Length; i++)
tloglist.setPartList(i-6, subdata[i]);
loglist.Add(new LogList(subdata, subdata.Length));
readline = new StringBuilder(sr.ReadLine());
}
}
bindLogList = new BindingList<LogList>(loglist);
bLogsource.DataSource = bindLogList;
Logview.AutoGenerateColumns = true;
Logview.DataSource = bindLogList;
Logview.Columns[0].Width = 140; // project name
Logview.Columns[1].Width = 140; // data/time
Logview.Columns[2].Width = 90;
Logview.Columns[3].Width = 90;
Logview.Columns[4].Width = 90;
Logview.Columns[5].Width = 90;
// max_index is set from another part of code
for(int i = 0; i <= max_index; i++)
{
int counter = 6 + i;
Logview.Columns.Add(headertext[i], headertext[i]);
Logview.Columns[counter].Width = 90;
Logview.Columns[counter].HeaderText = headertext[i];
}
}
catch (IOException io)
{
MessageBox.Show("Error: Cannot Open log file.");
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
finally
{
if (sr != null) sr.Close();
}
}
else
{
MessageBox.Show("Log file not found \n" + path);
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
finally
{
GC.Collect();
}
}
Below is LogList class
class LogList
{
const int max_size = 100;
private string[] holdList;
public string project { get; set; }
public string date_time { get; set; }
public string Qty { get; set; }
public string Pass { get; set; }
public string Fail { get; set; }
public string Result { get; set; }
public string[] partlist
{
get
{
return holdList;
}
set
{
holdList = value;
}
}
public LogList(string project, string date_time, string Qty, string Pass, string Fail, string Result, int partsize)
{
this.project = project;
this.date_time = date_time;
this.Qty = Qty;
this.Pass = Pass;
this.Fail = Fail;
this.Result = Result;
partlist = new string[partsize+1];
}
public void setPartList(int size, string getValue)
{
partlist[size] = getValue;
}
}
Project, date/time, Qty, Pass, Fail, and Result are displayed, but the partlist array is not.
To supplement IVSoftware’s answer, below is an example using two grids in a master-detail scenario.
One issue I would have with your current approach is that it uses an array for the "parts list." Currently this is a string array, and that isn't going to work if we want to display it in a grid. Fortunately, there are a few easy ways we can get the data to display as we want.
One simple solution is to create a “wrapper” Class for the string. I will call this Class Part. I added a simple int ID property and the string PartName property. You could easily leave out the ID and have a simple string wrapper. This simple Class may look something like…
public class Part {
public int ID { get; set; }
public string PartName { get; set; }
}
This should allow the data to display correctly in the grid using just about any construct like an array, list, etc. So, we "could" change your current code to use an array of Part objects like…
Part[] Parts = new Part[X];
And this would work; however, if we use an array and we know that each LogItem may have a different number of parts in its PartsList, then we will have to manage the array sizes. So, a BindingList of Part objects will simplify this. The altered LogList (LogItem) Class is below…
public class LogItem {
public BindingList<Part> PartsList { get; set; }
public string Project { get; set; }
public string Date_Time { get; set; }
public string Qty { get; set; }
public string Pass { get; set; }
public string Fail { get; set; }
public string Result { get; set; }
public LogItem(string project, string date_Time, string qty, string pass, string fail, string result) {
Project = project;
Date_Time = date_Time;
Qty = qty;
Pass = pass;
Fail = fail;
Result = result;
PartsList = new BindingList<Part>();
}
}
So given the updated Classes, this should simplify things, and we will use the same DataSource for both grids. The DataSource for the "master" grid will be a BindingList of LogItem objects. In the "detail" grid, we simply need to point its DataMember property to the PartsList property of the currently selected LogItem. And this would look something like…
dgvLogs.DataSource = LogsBL;
if (LogsBL.Count > 0) {
dgvParts.DataMember = "PartsList";
dgvParts.DataSource = LogsBL;
}
Below is the code to test the Classes above in a master-detail scenario with two grids. Create a new winform solution and drop two (2) DataGridViews on the form. The grid on the left is dgvLogs and the grid on the right is dgvParts.
public void populateLogData(string path) {
BindingList<LogItem> LogsBL = new BindingList<LogItem>();
string currentLine;
if (File.Exists(path)) {
try {
using (StreamReader sr = new StreamReader(path)) {
LogItem tempLogItem;
currentLine = sr.ReadLine(); // <- header row - ignoring
currentLine = sr.ReadLine();
while (currentLine != null) {
if (!string.IsNullOrEmpty(currentLine)) {
string[] splitArray = currentLine.Split(',');
if (splitArray.Length >= 6) {
tempLogItem = new LogItem(splitArray[0], splitArray[1], splitArray[2], splitArray[3], splitArray[4], splitArray[5]);
for (int i = 6; i < splitArray.Length; i++) {
tempLogItem.PartsList.Add(new Part { ID = i, PartName = splitArray[i] });
}
LogsBL.Add(tempLogItem);
}
else {
Debug.WriteLine("DataRead Error: Not enough items to make a LogItem: " + currentLine);
}
}
else {
Debug.WriteLine("DataRead Empty row");
}
currentLine = sr.ReadLine();
}
}
dgvLogs.DataSource = LogsBL;
if (LogsBL.Count > 0) {
dgvParts.DataMember = "PartsList";
dgvParts.DataSource = LogsBL;
}
}
catch (IOException io) {
MessageBox.Show("Error: Cannot Open log file.");
}
catch (Exception ex) {
MessageBox.Show(ex.Message + " Stacktrace- " + ex.StackTrace);
}
}
else {
MessageBox.Show("Log file not found \n" + path);
}
}
And some test data…
H1,h2,h3,h4,h5,h6,h7,h8
Model: LMG600N_IF_2blablas,2022-9-6,112,61,51,Fail,p1,p3,p4,p5,p6
1,2022-9-6,2112,621,251,Pass,px4,px5,px6,px1,px2,px3
data1,2022-9-7,3456,789,123,Fail,z3,z3,z4
Model: LMG600N_IF_2blablas,2022-9-6,112,61,51,Fail
Model: LMG600N_IF_2blablas,2022-9-6,112,61,51,Fail,p1,p3,p4,p5,p6,p7,p8,p99
BadData Model: LMG600N_IF_2blablas,2022-9-6,112,61
Moxxxdel: LMG600N_IF_2blablas,2022-9-6,11x2,6x1,5x1,Fail
Hope this helps and makes sense.
Your data list class consists of 6 individual variables with get and set, and an array of strings; your question is that the variables are displayed but the array of strings is not.
Here's what has worked for me (similar to the excellent suggestion by JohnG) for displaying the string array. What I'm doing here is taking a DataGridView and dropping it into my main form without changing any settings (other than docking it). Given the default settings, the LogList class (shown here in a minimal reproducible example with one variable and one array of strings) is defined with a public string property named PartList and with this basic implementation:
class LogList
{
public LogList(string product, string[] partList)
{
Product = product;
_partList = partList;
}
public string Product { get; set; }
private string[] _partList;
public string PartList => string.Join(",", _partList);
}
To autoconfigure the DataGridView with Product and PartList columns, here is an example initializer method that sets the DataSource and adds the first three items as a test:
// Set data source property once. Clear it, Add to it, but no reason to nullify it.
BindingList<LogList> DataSource { get; } = new BindingList<LogList>();
private void InitDataGridView()
{
dataGridView1.DataSource = DataSource;
// Auto config columns by adding at least one Record.
DataSource.Add(
new LogList(
product: "LMG450",
// Four parts
partList: new string[]
{
"PCT2000",
"WCT100",
"ZEL-0812LN",
"EN61000-3-3/-11",
}
));
DataSource.Add(
new LogList(
product: "LMG600N",
// Three parts
partList: new string[]
{
"LTC2280",
"BMS6815",
"ZEL-0812LN",
}
));
DataSource.Add(
new LogList(
product: "Long Array",
// 75 parts
partList: Enumerable.Range(1, 75).Select(x => $"{ x }").ToArray()
));
// Use string indexer to access columns for formatting purposes.
dataGridView1
.Columns[nameof(LogList.Product)]
.AutoSizeMode = DataGridViewAutoSizeColumnMode.AllCells;
dataGridView1
.Columns[nameof(LogList.PartList)]
.AutoSizeMode = DataGridViewAutoSizeColumnMode.Fill;
}
After running this code, the DGV looks like this:
With the mouse hovered over the item all 75 "parts" can be viewed.
One last thing - I notice you have some methods to assign a new partList[] or perhaps change an individual part at a specified index. (I didn't show them in the minimal sample, but for sure you'll want things like that.) You probably know this, but make sure to call dataGridView1.Refresh after altering properties of an existing row/LogList object so that the view reflects the changes.
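For instance, a minimal sketch using the DataSource and dataGridView1 from the snippet above (the new product name is just an example value):

// LogList doesn't implement INotifyPropertyChanged, so changing a property
// on an existing item won't repaint the bound grid by itself.
DataSource[0].Product = "LMG450-B";
dataGridView1.Refresh();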
I hope there's something here that offers a few ideas to achieve the outcome you want.

C# - Reading data from CSV file and show them in DataGridView

public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void btnLoad_Click(object sender, EventArgs e)
{
dgvData.DataSource = LoadCSV(@"C:\working\Summary.csv");
}
public List<Product> LoadCSV(string csvFile)
{
var query = from line in File.ReadAllLines(csvFile)
let data = line.Split(',')
select new Product
{
A = data[0],
B = data[1]
};
return query.ToList();
}
public class Product
{
public string A { get; set; }
public string B { get; set; }
}
}
I am a beginner who started using C# last week for work.
A .csv file containing simple numbers is read, but it contains spaces/empty values, which results in an error:
System.IndexOutOfRangeException
Following is a simplified, non-LINQ version of the LoadCSV() method which might help you understand your scenario better in code. The method -
creates a Product only if the line has any value
creates the Product with only property A
sets a value for property B only if the second value is available
public List<Product> LoadCSV(string csvFile)
{
// create an empty list
var list = new List<Product>();
// read all the lines
var lines = File.ReadAllLines(csvFile);
// do some processing for each line
foreach (var line in lines)
{
// split line based on comma, only if line is not an empty string
// if line is an empty string, skip processing
var data = line.Split(',', StringSplitOptions.RemoveEmptyEntries);
if (data.Length == 0)
continue;
// we skipped empty lines, so data has at least one element
// we can safely create a Product with the first element for property A
var product = new Product { A = data[0] };
// if data has more than one element, then we have a second element
// we can safely assign the second element to property B
if (data.Length > 1)
{
product.B = data[1];
}
// add the product to list
list.Add(product);
}
return list;
}

Cannot get more than one result when using System.Linq.Dynamic.Core to query a list of objects that have a member dictionary

I am using System.Linq.Dynamic.Core v1.0.8.18
I am abbreviating the object I have--I have eliminated the JSON tags for serialization/deserialization as well as the constructor. Below is the abbreviated class for a line item on an order. Please note that this object is deserialized from JSON, and the purpose of the "other" dictionary is to capture any name/value pair that is not explicitly defined in the object (which works exactly as it should in testing and production):
public partial class OrderRequestItem
{
public string line_number { get; set; }
public decimal quantity { get; set; }
public string supplier_id { get; set; }
public string supplier_aux_id { get; set; }
public decimal unitprice { get; set; }
public string description { get; set; }
public string uom { get; set; }
public IDictionary<string, object> other;
public decimal extension
{
get
{
return unitprice * quantity;
}
}
public bool validated { get; set; }
public bool rejected { get; set; }
}
I am attempting to "split" an order using the following code based on a JSON config file entry that specifies which fields to split the order on (parameter 2):
private List<OrderRequest> SplitOrder(OrderRequest originalOrder, string[] orderSplitLineItemFields = null)
{
var retval = new List<OrderRequest>();
if (null == orderSplitLineItemFields || originalOrder.items.Count < 2) //Can't return more than one order if we don't have fields to split by, and we don't have at least 2 line items.
{
retval.Add(originalOrder);
}
else
{
var bareOrderHeader = (OrderRequest)originalOrder.DeepClone();
bareOrderHeader.items.Clear();
var firstLineItem = originalOrder.items[0];
var validOrderSplitLineItemFields = new List<string>();
var dynamicQueryBase = new List<string>();
int validFieldCount = 0;
foreach (var field in orderSplitLineItemFields)
{
if (firstLineItem.HasProperty(field))
{
validOrderSplitLineItemFields.Add(field);
dynamicQueryBase.Add(field + " = @" + validFieldCount++);
}
else if (null != firstLineItem.other[field])
{
validOrderSplitLineItemFields.Add("other[\"" + field + "\"]");
dynamicQueryBase.Add("other[\"" + field + "\"]" + " = #" + validFieldCount++);
}
}
if(validOrderSplitLineItemFields.Count<1) //Can't return more than one order if we don't have valid fields to split by.
{
retval.Add(originalOrder);
}
else //We have valid fields to split the order, so we might be able to return more than one order.
{
string distinctFields = String.Join(",", validOrderSplitLineItemFields);
var distinctFieldValues = originalOrder.items.AsQueryable().Select(distinctFields).Distinct();
var dynamicWhere = string.Join(" and ", dynamicQueryBase);
var originalLineItems = originalOrder.items.AsQueryable();
foreach (var distinctResult in distinctFieldValues)
{
var newOrderSplit = (OrderRequest)bareOrderHeader.DeepClone();
var results = originalLineItems.Where(dynamicWhere, distinctResult);
foreach (var lineitem in results)
{
newOrderSplit.items.Add(lineitem);
}
retval.Add(newOrderSplit);
}
}
}
return retval;
}
The field that I am attempting to split on is called "requested_delivery_date" which is being properly passed in to the SplitOrder function. Because this is not an actual property of OrderRequestItem, the split code checks (and in fact succeeds) in looking at/finding a dictionary entry in the "other" property and appropriately adds the field to the list of dynamic fields upon which to query--(I do it this way because the specifically defined properties are "required" and I won't be able to predict what additional fields we may be sent on future orders with other buyers).
I have a sample order file that contains 4 line items. The lines 1, 2, 3 all have a defined other["requested_delivery_date"] = 2018-09-29, and line 4 has a other["requested_delivery_date"] = 2018-09-30.
Based on the code, I would expect to return two orders, one with line items 1-3, and another with only line 4. However, what I am actually getting back is two orders, one with only line item #1, and another with only line item #4. It seems as though the line
var results = originalLineItems.Where(dynamicWhere, distinctResult);
only ever returns a single result when I query against the dictionary that is a member of OrderRequestItem.
I have been beating my head against the wall here for the better part of the day and I don't understand why I only get a single result when the debugger is showing me that the original list of items I am querying have more matches. I'm starting to think it is a bug in the current version of System.Linq.Dynamic.Core.
Any help/suggestions appreciated! Keep in mind that I need to use dynamic linq since I will be dealing with new or changed additional fields on the line items all the time--so going back to "regular linq" isn't an option here.
Changing this
dynamicQueryBase.Add("other[\"" + field + "\"]" + " = @" + validFieldCount++);
to this
dynamicQueryBase.Add("other[\"" + field + "\"].ToString()" + " = @" + validFieldCount++);
makes it work as expected. A likely explanation: the dictionary values are typed as object, so the compiled equality is a reference comparison between boxed values, and each distinct value then matches only the very row instance it was taken from, which is why exactly one line item came back per order. Converting the field to a string forces a value comparison instead.
I can't test right now; maybe the default return for Where is only a single item.
Try
var results = originalLineItems.Where(dynamicWhere, distinctResult).ToList();
And check if it's working fine.

How to read large text files and keep tracking of the information of previous lines using C#?

(This problem is an adaptation of a real-life scenario; I reduced the problem so it is easy to understand, otherwise this question would be 10,000 lines long.)
I have a pipe delimited text file that looks like this (the header is not in the file):
Id|TotalAmount|Reference
1|10000
2|50000
3|5000|1
4|5000|1
5|10000|2
6|10000|2
7|500|9
8|500|9
9|1000
The reference is optional and is the Id of another entry in this text file. Entries that have a reference are considered "children" of that reference, and the reference is their parent. I need to validate each parent in the file; the validation is that the sum of the TotalAmount of its children should be equal to the parent's TotalAmount. A parent can come either before or after its children in the file, like the entry with Id 9, which comes after its children.
In the provided file, the entry with Id 1 is valid, because the sum of the total amount of its children (Ids 3 and 4) is 10000, and the entry with Id 2 is invalid, because the sum of its children (Ids 5 and 6) is 20000.
For a small file like this, I could just parse everything to objects like this (pseudo code, I don't have a way to run it now):
class Entry
{
public int Id { get; set; }
public int TotalAmout { get; set; }
public int Reference { get; set; }
}
class Validator
{
public void Validate()
{
List<Entry> entries = GetEntriesFromFile(@"C:\entries.txt");
foreach (var entry in entries)
{
var children = entries.Where(e => e.Reference == entry.Id).ToList();
if (children.Count > 0)
{
var sum = children.Sum(e => e.TotalAmout);
if (sum == entry.TotalAmout)
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
else
{
Console.WriteLine("Entry with Id {0} is INVALID", entry.Id);
}
}
else
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
}
}
public List<Entry> GetEntriesFromFile(string file)
{
var entries = new List<Entry>();
using (var r = new StreamReader(file))
{
while (!r.EndOfStream)
{
var line = r.ReadLine();
var splited = line.Split('|');
var entry = new Entry();
entry.Id = int.Parse(splited[0]);
entry.TotalAmout = int.Parse(splited[1]);
if (splited.Length == 3)
{
entry.Reference = int.Parse(splited[2]);
}
entries.Add(entry);
}
}
return entries;
}
}
The problem is that I am dealing with large files (10 GB), and that would load way too many objects into memory.
Performance itself is NOT a concern here. I know that I could use dictionaries instead of the Where() method, for example. My only problem is performing the validation without loading everything into memory, and I don't have any idea how to do it, because an entry at the bottom of the file may have a reference to an entry at the top, so I need to keep track of everything.
So my question is: is it possible to keep track of each line in a text file without loading its information into memory?
Since performance is not an issue here, I would approach this in the following way:
First, I would sort the file so all the parents come right before their children. There are classical methods for sorting huge external data; see https://en.wikipedia.org/wiki/External_sorting
After that, the task becomes pretty trivial: read a parent's data, remember it, read and sum its children's data one by one, compare, repeat.
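A minimal sketch of that second pass, assuming the file has already been externally sorted so that each parent line comes right before its child lines (the sort step itself is not shown here):

using System;
using System.IO;

static class SortedFileValidator
{
    // Streams over the pre-sorted file and validates each parent's
    // TotalAmount against the running sum of its children.
    public static void Validate(string path)
    {
        int? parentId = null, parentTotal = null;
        int childSum = 0;
        bool hasChildren = false;

        void FlushParent()
        {
            if (parentId is int id)
            {
                bool valid = !hasChildren || childSum == parentTotal;
                Console.WriteLine($"Entry with Id {id} is {(valid ? "valid" : "INVALID")}");
            }
        }

        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split('|');
            if (parts.Length == 3)
            {
                // Child line: just add to the running sum for the current parent.
                childSum += int.Parse(parts[1]);
                hasChildren = true;
            }
            else
            {
                // New parent (or standalone) line: report the previous one first.
                FlushParent();
                parentId = int.Parse(parts[0]);
                parentTotal = int.Parse(parts[1]);
                childSum = 0;
                hasChildren = false;
            }
        }
        FlushParent(); // don't forget the last parent in the file
    }
}

Only a constant amount of state is kept in memory this way, no matter how large the file is.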
All you really need to keep in memory is the expected total for each non-child entity, and the running sum of the child totals for each parent entity. Everything else you can throw out, and if you use the File.ReadLines API, you can stream over the file and 'forget' each line once you've processed it. Since the lines are read on demand, you don't have to keep the entire file in memory.
public class Entry
{
public int Id { get; set; }
public int TotalAmount { get; set; }
public int? Reference { get; set; }
}
public static class EntryValidator
{
public static void Validate(string file)
{
var entries = GetEntriesFromFile(file);
var childAmounts = new Dictionary<int, int>();
var nonChildAmounts = new Dictionary<int, int>();
foreach (var e in entries)
{
if (e.Reference is int p)
childAmounts.AddOrUpdate(p, e.TotalAmount, (_, n) => n + e.TotalAmount);
else
nonChildAmounts[e.Id] = e.TotalAmount;
}
foreach (var id in nonChildAmounts.Keys)
{
var expectedTotal = nonChildAmounts[id];
if (childAmounts.TryGetValue(id, out var childTotal) &&
childTotal != expectedTotal)
{
Console.WriteLine($"Entry with Id {id} is INVALID");
}
else
{
Console.WriteLine($"Entry with Id {id} is valid");
}
}
}
private static IEnumerable<Entry> GetEntriesFromFile(string file)
{
foreach (var line in File.ReadLines(file))
yield return GetEntryFromLine(line);
}
private static Entry GetEntryFromLine(string line)
{
var parts = line.Split('|');
var entry = new Entry
{
Id = int.Parse(parts[0]),
TotalAmount = int.Parse(parts[1])
};
if (parts.Length == 3)
entry.Reference = int.Parse(parts[2]);
return entry;
}
}
This uses a nifty extension method for IDictionary<K, V>:
public static class DictionaryExtensions
{
public static TValue AddOrUpdate<TKey, TValue>(
this IDictionary<TKey, TValue> dictionary,
TKey key,
TValue addValue,
Func<TKey, TValue, TValue> updateCallback)
{
if (dictionary == null)
throw new ArgumentNullException(nameof(dictionary));
if (updateCallback == null)
throw new ArgumentNullException(nameof(updateCallback));
if (dictionary.TryGetValue(key, out var value))
value = updateCallback(key, value);
else
value = addValue;
dictionary[key] = value;
return value;
}
}

Importing an Excel Sheet and Validating the Imported Data in a Loosely Coupled Way

I am trying to develop a module which will read Excel sheets (possibly from other data sources too, so it should be loosely coupled) and convert them into entities to save.
The logic will be this:
The Excel sheet can be in different formats; for example, the column names in the Excel sheet can differ, so my system needs to be able to map different fields to my entities.
For now I will assume the format defined above stays the same, and hardcode it instead of having it come from the database dynamically after being set up on some configuration mapping UI.
The data needs to be validated before it even gets mapped, so I should be able to validate it beforehand against something. We're not using XSD or anything similar, so I should validate it against the object structure I am using as a template for importing.
The problem is, I've put some things together, but I can't say I like what I did. My question is how I can improve the code below, make things more modular, and fix the validation issues.
The code below is a mock-up and is not expected to work, just to see some structure of the design.
This is code I've come up with so far, and I've realized one thing that I need to improve my design patterns skills but for now I need your help, if you could help me:
//The Controller, a placeholder
class UploadController
{
//Somewhere here we call appropriate class and methods in order to convert
//excel sheet to dataset
}
After we have uploaded the file using an MVC Controller, there could be different controllers specialized for certain import behaviors; in this example I will be uploading person-related tables:
interface IDataImporter
{
void Import(DataSet dataset);
}
//We can use many other importers besides PersonImporter
class PersonImporter : IDataImporter
{
//We divide dataset to approprate data tables and call all the IImportActions
//related to Person data importing
//We call inserting to database functions here of the DataContext since this way
//we can do less db roundtrip.
public string PersonTableName {get;set;}
public string DemographicsTableName {get;set;}
public void Import(DataSet dataset)
{
CreatePerson(dataset);
CreateDemographics(dataset);
}
//We put different things in different methods to clear the field. High cohesion.
private void CreatePerson(DataSet dataset)
{
var personDataTable = GetDataTable(dataset,PersonTableName);
IImportAction addOrUpdatePerson = new AddOrUpdatePerson();
addOrUpdatePerson.MapEntity(personDataTable);
}
private void CreateDemographics(DataSet dataset)
{
var demographicsDataTable = GetDataTable(dataset,DemographicsTableName);
IImportAction demoAction = new AddOrUpdateDemographic(demographicsDataTable);
demoAction.MapEntity();
}
private DataTable GetDataTable(DataSet dataset, string tableName)
{
return dataset.Tables[tableName];
}
}
I have IDataImporter and the specialized concrete class PersonImporter. However, I'm not sure it looks good so far, since things should be SOLID, that is, easy to extend later in the project cycle; this will be a foundation for future improvements. Let's keep going:
The IImportActions are where the magic mostly happens. Instead of designing things table-based, I am developing them behavior-based, so one can call any of them to import things in a more modular fashion. For example, a table may have 2 different actions.
interface IImportAction
{
void MapEntity(DataTable table);
}
//A sample import action, AddOrUpdatePerson
class AddOrUpdatePerson : IImportAction
{
//Consider using default values as well?
public string FirstName {get;set;}
public string LastName {get;set;}
public string EmployeeId {get;set;}
public string Email {get;set;}
public void MapEntity(DataTable table)
{
//Each action is producing its own data context since they use
//different actions.
using(var dataContext = new DataContext())
{
foreach(DataRow row in table.Rows)
{
var emailValidation = ValidationFactory.EmailValidation.Value;
if(!emailValidation.Validate(row[Email]))
{
LoggingService.LogWarning(emailValidation.ValidationMessage);
}
var person = new Person(){
FirstName = row[FirstName],
LastName = row[LastName],
EmployeeId = row[EmployeeId],
Email = row[Email]
};
dataContext.SaveObject(person);
}
dataContext.SaveChangesToDatabase();
}
}
}
class AddOrUpdateDemographic: IImportAction
{
static string Name {get;set;}
static string EmployeeId {get;set;}
//So here for example, we will need to save dataContext first before passing it in
//to get the PersonId from Person (we're assuming that we need PersonId for Demographics)
public void MapEntity(DataTable table)
{
using(var dataContext = new DataCOntext())
{
foreach(DataRow row in table.Rows)
{
var demographic = new Demographic(){
Name = row[Name],
//Look up the parent person by employee id (== comparison, not assignment)
PersonId = dataContext.People.First(t => t.EmployeeId == (string)row[EmployeeId]).Id
};
dataContext.SaveObject(demographic);
}
dataContext.SaveChangesToDatabase();
}
}
}
And now the validation, which is mostly where I struggle, unfortunately. The validation needs to be easy to extend and loosely coupled, and I also need to be able to call it beforehand instead of wiring everything in.
public static class ValidationFactory
{
public static Lazy<IFieldValidation> PhoneValidation = new Lazy<IFieldValidation>(()=>new PhoneNumberValidation());
public static Lazy<IFieldValidation> EmailValidation = new Lazy<IFieldValidation>(()=>new EmailValidation());
//etc.
}
interface IFieldValidation
{
string ValidationMessage{get;set;}
bool Validate(object value);
}
class PhoneNumberValidation : IFieldValidation
{
public string ValidationMessage{get;set;}
public bool Validate(object value)
{
var validated = true; //lets say...
var innerValue = (string) value;
//validate innerValue using Regex or something
//if validation fails, then set the ValidationMessage property for logging.
return validated;
}
}
class EmailValidation : IFieldValidation
{
public string ValidationMessage{get;set;}
public bool Validate(object value)
{
var validated = true; //lets say...
var innerValue = (string) value;
//validate innerValue using Regex or something
//if validation fails, then set the ValidationMessage property for logging.
return validated;
}
}
I have done the same thing on a project. The difference is that I didn't have to import Excel sheets, but CSV files. I created a CSVValueProvider, and therefore the CSV data was bound to my IEnumerable model automatically.
As for validation, I figured that going through all rows and cells and validating them one by one is not very efficient, especially when the CSV file has thousands of records. So, what I did was create validation methods that went through the CSV data column by column, instead of row by row, ran a LINQ query on each column, and returned the row numbers of the cells with invalid data. Then I added the invalid row numbers/column names to ModelState.
UPDATE:
Here is what I have done...
CSVReader Class:
// A class that can read and parse the data in a CSV file.
public class CSVReader
{
// Regex expression that's used to parse the data in a line of a CSV file
private const string ESCAPE_SPLIT_REGEX = "({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*";
// String array to hold the headers (column names)
private string[] _headers;
// List of string arrays to hold the data in the CSV file. Each string array in the list represents one line (row).
private List<string[]> _rows;
// The StreamReader class that's used to read the CSV file.
private StreamReader _reader;
public CSVReader(StreamReader reader)
{
_reader = reader;
Parse();
}
// Reads and parses the data from the CSV file
private void Parse()
{
_rows = new List<string[]>();
string[] row;
int rowNumber = 1;
var headerLine = "RowNumber," + _reader.ReadLine();
_headers = GetEscapedSVs(headerLine);
rowNumber++;
while (!_reader.EndOfStream)
{
var line = rowNumber + "," + _reader.ReadLine();
row = GetEscapedSVs(line);
_rows.Add(row);
rowNumber++;
}
_reader.Close();
}
private string[] GetEscapedSVs(string data)
{
if (!data.EndsWith(","))
data = data + ",";
return GetEscapedSVs(data, ",", "\"");
}
// Parses each row by using the given separator and escape characters
private string[] GetEscapedSVs(string data, string separator, string escape)
{
string[] result = null;
int priorMatchIndex = 0;
MatchCollection matches = Regex.Matches(data, string.Format(ESCAPE_SPLIT_REGEX, separator, escape));
// Skip empty rows...
if (matches.Count > 0)
{
result = new string[matches.Count];
for (int index = 0; index <= result.Length - 2; index++)
{
result[index] = data.Substring(priorMatchIndex, matches[index].Groups["Separator"].Index - priorMatchIndex);
priorMatchIndex = matches[index].Groups["Separator"].Index + separator.Length;
}
result[result.Length - 1] = data.Substring(priorMatchIndex, data.Length - priorMatchIndex - 1);
for (int index = 0; index <= result.Length - 1; index++)
{
if (Regex.IsMatch(result[index], string.Format("^{0}.*[^{0}]{0}$", escape)))
result[index] = result[index].Substring(1, result[index].Length - 2);
result[index] = result[index].Replace(escape + escape, escape);
if (result[index] == null || result[index] == escape)
result[index] = "";
}
}
return result;
}
// Returns the number of rows
public int RowCount
{
get
{
if (_rows == null)
return 0;
return _rows.Count;
}
}
// Returns the number of headers (columns)
public int HeaderCount
{
get
{
if (_headers == null)
return 0;
return _headers.Length;
}
}
// Returns the value in a given column name and row index
public object GetValue(string columnName, int rowIndex)
{
if (rowIndex >= _rows.Count)
{
return null;
}
var row = _rows[rowIndex];
int colIndex = GetColumnIndex(columnName);
if (colIndex == -1 || colIndex >= row.Length)
{
return null;
}
var value = row[colIndex];
return value;
}
// Returns the column index of the provided column name
public int GetColumnIndex(string columnName)
{
int index = -1;
for (int i = 0; i < _headers.Length; i++)
{
if (_headers[i].Replace(" ","").Equals(columnName, StringComparison.CurrentCultureIgnoreCase))
{
index = i;
return index;
}
}
return index;
}
}
CSVValueProviderFactory Class:
public class CSVValueProviderFactory : ValueProviderFactory
{
public override IValueProvider GetValueProvider(ControllerContext controllerContext)
{
var uploadedFiles = controllerContext.HttpContext.Request.Files;
if (uploadedFiles.Count > 0)
{
var file = uploadedFiles[0];
var extension = file.FileName.Split('.').Last();
if (extension.Equals("csv", StringComparison.CurrentCultureIgnoreCase))
{
if (file.ContentLength > 0)
{
var stream = file.InputStream;
var csvReader = new CSVReader(new StreamReader(stream, Encoding.Default, true));
return new CSVValueProvider(controllerContext, csvReader);
}
}
}
return null;
}
}
CSVValueProvider Class:
// Represents a value provider for the data in an uploaded CSV file.
public class CSVValueProvider : IValueProvider
{
private CSVReader _csvReader;
public CSVValueProvider(ControllerContext controllerContext, CSVReader csvReader)
{
if (controllerContext == null)
{
throw new ArgumentNullException("controllerContext");
}
if (csvReader == null)
{
throw new ArgumentNullException("csvReader");
}
_csvReader = csvReader;
}
public bool ContainsPrefix(string prefix)
{
if (prefix.Contains('[') && prefix.Contains(']'))
{
if (prefix.Contains('.'))
{
var header = prefix.Split('.').Last();
if (_csvReader.GetColumnIndex(header) == -1)
{
return false;
}
}
int index = int.Parse(prefix.Split('[').Last().Split(']').First());
if (index >= _csvReader.RowCount)
{
return false;
}
}
return true;
}
public ValueProviderResult GetValue(string key)
{
if (!key.Contains('[') || !key.Contains(']') || !key.Contains('.'))
{
return null;
}
object value = null;
var header = key.Split('.').Last();
int index = int.Parse(key.Split('[').Last().Split(']').First());
value = _csvReader.GetValue(header, index);
if (value == null)
{
return null;
}
return new ValueProviderResult(value, value.ToString(), CultureInfo.CurrentCulture);
}
}
For the validation, as I mentioned before, I figured that it would not be efficient to do it using DataAnnotation attributes. A row by row validation of the data would take a long time for CSV files with thousands of rows. So, I decided to validate the data in the Controller after the Model Binding is done. I should also mention that I needed to validate the data in the CSV file against some data in the database. If you just need to validate things like Email Address or Phone Number, you might as well just use DataAnnotation.
Here is a sample method for validating the Email Address column:
private void ValidateEmailAddress(IEnumerable<CSVViewModel> csvData)
{
var invalidRows = csvData.Where(d => ValidEmail(d.EmailAddress) == false).ToList();
foreach (var invalidRow in invalidRows)
{
var key = string.Format("csvData[{0}].{1}", invalidRow.RowNumber - 2, "EmailAddress");
ModelState.AddModelError(key, "Invalid Email Address");
}
}
private static bool ValidEmail(string email)
{
if(email == "")
return false;
else
return new System.Text.RegularExpressions.Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,6}$").IsMatch(email);
}
UPDATE 2:
For validation using DataAnnotaion, you just use DataAnnotation attributes in your CSVViewModel like below (the CSVViewModel is the class that your CSV data will be bound to in your Controller Action):
public class CSVViewModel
{
// Use proper names for your CSV columns, these are just examples...
[Required]
public int Column1 { get; set; }
[Required]
[StringLength(30)]
public string Column2 { get; set; }
}
