I'm trying to parse Excel (.xls, .xlsx) files. The files all have the same structure; only the number of records differs.
I need to parse the industry, which in this case is "FinTech". Since the label and the value share one cell, I guess I have to use a regex such as ^Industry: (.*)$?
It also has to find the row/column where the list of people starts and put that list into an IEnumerable<Person>. It could use the following regexes.
Number always consists of 6 digits. ^[0-9]{6}$
Name consists of at least two words where each one of them starts with a capital letter. ^([a-zA-Z]+\s?\b){2,}$
A test .xlsx file can be found here https://docs.google.com/spreadsheets/d/15SR04cHXgGLWe0cuOOuuB5vUZigebh96/edit?usp=sharing&ouid=112418126731411268789&rtpof=true&sd=true.
List of people
Normal condition
Industry: FinTech
# Number Name
1 226250 Zain Griffiths
2 226256 Michael Houghton
3 226259 Hugo Willis Johnson
4 226264 Anna-Maria Rose
The actual question
First of all, I'm not completely sure whether my regexes are correct. I was only able to display the rows and the columns, but I'm not sure how to actually parse the industry and the list of people into an IEnumerable<Person>. So how do I do that?
Snippet
// Program.cs
var excel = new ExcelParser();
var sheet1 = excel.Import(@"a.xlsx");
Console.OutputEncoding = Encoding.UTF8;
for (var i = 0; i < sheet1.Rows.Count; i++)
{
for (var j = 0; j < sheet1.Columns.Count; j++)
{
var cell = sheet1.Rows[i][j].ToString()?.Trim();
Console.Write($"Column: {cell} | ");
}
Console.WriteLine();
}
Console.ReadLine();
// ExcelParser.cs
public sealed class ExcelParser
{
public ExcelParser()
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
}
public DataTable Import(string filePath)
{
// does file exist?
if (!File.Exists(filePath))
{
throw new FileNotFoundException();
}
// .xls or .xlsx allowed
var extension = new FileInfo(filePath).Extension.ToLowerInvariant();
if (extension is not (".xls" or ".xlsx"))
{
throw new NotSupportedException();
}
// read .xls or .xlsx
using var stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
using var reader = ExcelReaderFactory.CreateReader(stream);
var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = false
}
});
// Sheet1
return dataSet.Tables[0];
}
}
The structure of files is the same except for the amount of the records
As long as the table is structured (or semi-structured), you can state one or two simple assumptions and parse the table based on them. If a file does not follow the assumptions, you return a failure value (or throw an exception, etc.).
Designing regexes to parse the table is really just a way of encoding assumptions. I want to keep it simple, so, based on the problem statement, here are my assumptions:
There will be an "industry" (or "industry:"; call .ToLower() first) string in a cell of its own, and the industry's name will be in that same cell (a regex would do nothing more than find such a string).[1]
The first person's name will be in the cell right after the first cell that holds a 6-digit number.[2]
Here is the code
public (string industryName, List<string> peopleNames) ParseSheet(DataTable sheet1)
{
// 1. Get Indices of industry cell and first Name in people names..
var industryCellIndex = (-1, -1, false);
var peopleFirstCellIndex = (-1, -1, false);
for (var i = 0; i < sheet1.Rows.Count; i++)
{
for (var j = 0; j < sheet1.Columns.Count; j++)
{
// .ToLower() added; fall back to "" so a null cell text cannot throw below
var cell = sheet1.Rows[i][j].ToString()?.Trim().ToLower() ?? "";
if (cell.StartsWith("industry"))
{
industryCellIndex = (i, j, true);
break;
}
// the name after the first 6-digits number cell will be the first name in people records
if (cell.Length == 6 && int.TryParse(cell, out _))
{
peopleFirstCellIndex = (i, j + 1, true);
break;
}
}
if (industryCellIndex.Item3 && peopleFirstCellIndex.Item3)
break;
}
if (!industryCellIndex.Item3 || !peopleFirstCellIndex.Item3)
{
// throw new Exception("Excel file is not normalized!");
return (null, null);
}
// 2. retrieve the desired data
var industryName = sheet1.Rows[industryCellIndex.Item1][industryCellIndex.Item2]
.ToString().Trim()
.Replace(":", ""); // will do nothing if there were no ":"
// the cell starts with "industry" (checked above, ignoring case), so drop that prefix
industryName = industryName.Substring("industry".Length).Trim();
var peopleNames = new List<string>();
var colIndex = peopleFirstCellIndex.Item2;
for (var rowIndex = peopleFirstCellIndex.Item1;
rowIndex < sheet1.Rows.Count;
rowIndex++)
{
peopleNames.Add(sheet1.Rows[rowIndex][colIndex].ToString()?.Trim());
}
return (industryName, peopleNames);
}
[1] If this assumption needs some editing (for example, the industry name might be in the cell next to the one that holds the "industry" string), the idea stays the same; you can account for that when parsing.
[2] Alternatively, for example, 2 columns to the right of and 1 row below the "#" cell.
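Since the question asks for an IEnumerable<Person>, here is a minimal sketch of how the two assumptions above could be combined with the regexes from the question. The Person class, the SheetParser class and its method names are hypothetical (the question does not show them), and the name cell is only checked for being non-empty rather than validated with a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text.RegularExpressions;

// Hypothetical model; the question does not show its Person type.
public sealed class Person
{
    public Person(string number, string name) { Number = number; Name = name; }
    public string Number { get; }
    public string Name { get; }
}

public static class SheetParser
{
    // "Industry: FinTech" in a single cell; the captured group is the industry name.
    private static readonly Regex IndustryRegex =
        new Regex(@"^Industry:\s*(.+)$", RegexOptions.IgnoreCase);

    // A number always consists of exactly 6 digits.
    private static readonly Regex NumberRegex = new Regex(@"^[0-9]{6}$");

    private static string CellText(DataRow row, int column)
    {
        return (row[column]?.ToString() ?? string.Empty).Trim();
    }

    // Returns the industry name, or null if no cell matches.
    public static string FindIndustry(DataTable sheet)
    {
        foreach (DataRow row in sheet.Rows)
            for (var j = 0; j < sheet.Columns.Count; j++)
            {
                var match = IndustryRegex.Match(CellText(row, j));
                if (match.Success) return match.Groups[1].Value.Trim();
            }
        return null;
    }

    // Yields one Person per row that has a 6-digit cell followed by a non-empty cell.
    public static IEnumerable<Person> ParsePeople(DataTable sheet)
    {
        foreach (DataRow row in sheet.Rows)
            for (var j = 0; j + 1 < sheet.Columns.Count; j++)
            {
                var number = CellText(row, j);
                var name = CellText(row, j + 1);
                if (NumberRegex.IsMatch(number) && name.Length > 0)
                {
                    yield return new Person(number, name);
                    break; // at most one person per row
                }
            }
    }
}
```

Because every row is checked on its own, header rows and blank rows are skipped without having to locate the start of the list first.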
Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match of vcard to coverage table. Since I'm running 2k samples, it's hard to identify which files have no one-to-one match. I know that both kinds of files have unique identifier numbers; hence, if both folders contain files with the same unique number, I treat them as the "same" file.
I made a program that compares two folders and reports the unique entries in each folder. To do so, I made two lists that contain the file names unique to each directory.
I want to format the report file (tab delimited .txt file) such that it looks something like below:
Unique in fdr1 Unique in fdr2
file x file a
file y file b
file z file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If design of the code has to change (i.e. one list instead of two), please let me know
As requested by some user, this is how I was going to do it (not a working version)
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
// Write headers
sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
// Write unique entries in fdr1
foreach(string file in fdr1FileList)
{
sw.WriteLine(file + "\t");
}
// Write unique entries in fdr2
foreach (string file in fdr2FileList)
{
sw.WriteLine(file + "\t");
}
sw.Dispose();
}
As requested for my approach for finding unique entries, here's my code snippet
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for(int i = 0; i < fdr1FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if(Int32.TryParse(number, out glNumber))
{
fdr1Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
}
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if (Int32.TryParse(number, out glNumber))
{
fdr2Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
}
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
// If same file is not present in the second folder add to the list
// (ContainsKey, because indexing a missing key throws KeyNotFoundException)
if(!fdr2Dict.ContainsKey(glNumber))
{
fdr1FileList.Add(fdr1FileNames[i]);
}
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
// If same file is not present in the first folder add to the list
// (ContainsKey, because indexing a missing key throws KeyNotFoundException)
if (!fdr1Dict.ContainsKey(glNumber))
{
fdr2FileList.Add(fdr2FileNames[i]);
}
}
I am quite confident that this will work, as I've tested it:
static void Main(string[] args)
{
var firstDir = @"Path1";
var secondDir = @"Path2";
var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
print2Dirs(firstDirFiles, secondDirFiles);
}
private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
using (StreamWriter streamWriter = new StreamWriter("result.txt"))
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < maxIndex; i++)
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
}
}
}
It's quite simple code, but if you need help understanding it, just let me know :)
I would construct one line at a time. Something like this:
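int row = 0;
// Placeholders: use the two lists of unique file names from the question here
string[] fdr1FileList = new string[0];
string[] fdr2FileList = new string[0];
using (StreamWriter sw = new StreamWriter("Report.txt"))
{
while (row < fdr1FileList.Length || row < fdr2FileList.Length)
{
string rowText = "";
// First column: entry from list 1, or empty once list 1 is exhausted
rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
// Second column: entry from list 2, or empty once list 2 is exhausted
rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
sw.WriteLine(rowText);
row++;
}
}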
Try something like this:
static void Main(string[] args)
{
Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
int f1_size = unique_f1.Length;
int f2_size = unique_f2.Length;
int list_length = 0;
if (f1_size > f2_size)
{
list_length = f1_size;
Array.Resize(ref unique_f2, list_length);
}
else
{
list_length = f2_size;
Array.Resize(ref unique_f1, list_length);
}
using (StreamWriter writer = new StreamWriter("output.txt"))
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < list_length; i++)
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
}
}
}
static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
Dictionary<int, string> dict = new Dictionary<int, string>();
for (int i = 0; i < filenames.Length; i++)
{
int glNumber;
string filename = Path.GetFileName(filenames[i]);
string number = Regex.Match(filename, @"\d+").ToString();
if (int.TryParse(number, out glNumber))
dict.Add(glNumber, filename);
}
return dict;
}
I'm attempting to import a CSV file into a DataTable, however the CSV contains headers that are the same. (For example, there are multiple "Date" headers for different form sections). To fix this, I decided to create a loop that will run through the headers and replace the duplicates or unwanted entries based on their position. I've replaced my replaceWith array with dummy entries, but my actual code does have the correct size to correlate with the replace array.
string[] columnNames = null;
string[] oStreamDataValues = null;
int[] error = {0,1,2,3,4,7,8,9,10,11,15,21,34,37,57,61,65,68,69,71,75,79,82,83,85,89,93,96,97,99,103,107,110,111,113,117,121,124,125,127,128,129,130,132,182,210,212,213,214,215,216,222,226,239};
int[] replace = {14,16,17,17,20,23,24,27,28,29,31,32,44,58,59,60,62,63,64,66,67,70,72,73,74,76,77,78,80,81,84,86,87,88,90,91,92,94,95,98,100,101,102,104,105,106,108,109,112,114,115,116,118,119,120,122,123,126,134,136,138,140,142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180,184,186,187,188,190,191,192,194,195,196,198,199,200,202,203,204,206,207,208,209,236,242,243,244};
string[] replaceWith = {"Replace 1", "Replace 2", "Replace 3"};
string fix = "ignore_";
int inc = 00;
string entry = "";
while (!oStreamReader.EndOfStream)
{
string oStreamRowData = oStreamReader.ReadLine().Trim();
if (oStreamRowData.Length > 0)
{
//oStreamDataValues = Regex.Split(oStreamRowData, ",(?=(?:[^']*'[^']*')*[^']*$)");
oStreamDataValues = oStreamRowData.Split(',');
if (rowCount == 0)
{
rowCount = 1;
columnNames = oStreamDataValues;
for (int i = 0; i < columnNames.Length; i++)
{
for (int j = 0; j < error.Length; j++)
{
if (error[j] == i)
{
entry = fix + inc++;
}
}
for (int k = 0; k < replace.Length; k++)
{
if (replace[i] == i)
{
int add = 0;
entry = replaceWith[add++];
}
}
DataColumn oDataColumn = new DataColumn(entry, typeof(string));
oDataColumn.DefaultValue = string.Empty;
oDataTable.Columns.Add(oDataColumn);
}
}
}
I'm no coding expert, so my syntax/decision making isn't perfect.
However, the error that I get is: A column named 'ignore_4' already belongs to this DataTable.
I assume something is incorrect in my loop logic.
I think you have overcomplicated the loops. You just need to keep an index of the current position in the array of errors and array of replaces.
string rep = "replace_"; // base string for replace fields
string fix = "ignore_"; // base string for ignore fields
// For demonstration purposes I have commented out this array. If you
// want every 'replace' column to have its specific name, then prepare this
// array with exactly as many names as there are elements in the
// replace array
//
// string[] replaceWith = {"Replace 1", "Replace 2", "Replace 3"};
int idxErrors = 0; // Current position in the error array
int idxReplace = 0; // Current position in the replace array
int fixCounter = 1;
int repCounter = 1;
string entry = "";
for (int i = 0; i < columnNames.Length; i++)
{
// Is this the index of a column that should be ignored?
if (idxErrors < error.Length && i == error[idxErrors])
{
entry = fix + fixCounter.ToString("D2");
idxErrors++;
fixCounter++;
}
// Is this the index of a column that should have a different name?
else if (idxReplace < replace.Length && i == replace[idxReplace])
{
entry = rep + repCounter.ToString("D2");
// entry = replaceWith[idxReplace];
idxReplace++;
repCounter++;
}
else
entry = columnNames[i];
// Now create the column
DataColumn oDataColumn = new DataColumn(entry, typeof(string));
oDataColumn.DefaultValue = string.Empty;
oDataTable.Columns.Add(oDataColumn);
}
In this example I have used the same naming pattern for the ignored columns and for the columns that need to have their name changed. If you want to give each renamed column a proper name, then you need to prepare an array with these proper names, and this array should be of the same length as the replace array. Then use idxReplace to take the correct name from the array of proper names. Note that this approach relies on both index arrays being sorted in ascending order without duplicates.
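To illustrate the variant with proper names, here is a minimal sketch that only computes the final column names and leaves the DataTable out. The class and method names (HeaderFixer, BuildColumnNames) are made up for this example. It assumes both index arrays are sorted in ascending order and that replaceWith holds one name per entry in replace. The two small while loops are an addition to the loop above: they skip stale entries, so a duplicated index does not stall the walk.

```csharp
using System;

public static class HeaderFixer
{
    // Assumes `ignore` and `replace` are sorted ascending and that
    // `replaceWith` has exactly one name per entry in `replace`.
    public static string[] BuildColumnNames(
        string[] columnNames, int[] ignore, int[] replace, string[] replaceWith)
    {
        if (replaceWith.Length != replace.Length)
            throw new ArgumentException("replaceWith must have one name per replace index.");

        var result = new string[columnNames.Length];
        int idxIgnore = 0;      // current position in the ignore array
        int idxReplace = 0;     // current position in the replace array
        int ignoreCounter = 1;  // suffix for the generated ignore_NN names

        for (int i = 0; i < columnNames.Length; i++)
        {
            // Skip entries that are already behind us (for example, a duplicated index).
            while (idxIgnore < ignore.Length && ignore[idxIgnore] < i) idxIgnore++;
            while (idxReplace < replace.Length && replace[idxReplace] < i) idxReplace++;

            if (idxIgnore < ignore.Length && ignore[idxIgnore] == i)
                result[i] = "ignore_" + (ignoreCounter++).ToString("D2");
            else if (idxReplace < replace.Length && replace[idxReplace] == i)
                result[i] = replaceWith[idxReplace];
            else
                result[i] = columnNames[i];
        }
        return result;
    }
}
```

Each returned name can then be passed to new DataColumn(name, typeof(string)) exactly as in the loop above. The proper names themselves still have to be unique, otherwise the DataTable will reject them with the same error.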
I am creating a spreadsheet from a List<object[]> using LoadFromArrays
The first entry of the array is a title, the other entries are possibly numbers, text or dates (but the same for each array in the list).
The generated Excel sheet has the green triangle warning that numbers are formatted as text.
I loop through all the cells and set their format to Number like so ws.Cells[i, j].Style.Numberformat.Format = "0";
However the problem remains and I still see the green warning, even though the number format is set to number when I look in the Format Cell... dialogue.
What are my options here? It is possible for me to know a bit more about what type is in each column, but how do I then set a column title?
Is there a better solution than EPPlus? or some more post processing of the spreadsheet I can do before downloading it?
Since you are using object arrays, they can contain both numbers and strings that look like numbers, so you will have to go through each object and determine its type:
[TestMethod]
public void Object_Type_Write_Test()
{
//http://stackoverflow.com/questions/31537981/using-epplus-how-can-i-generate-a-spreadsheet-where-numbers-are-numbers-not-text
var existingFile = new FileInfo(@"c:\temp\temp.xlsx");
if (existingFile.Exists)
existingFile.Delete();
//Some data
var list = new List<Object[]>
{
new object[]
{
"111.11",
111.11,
DateTime.Now
}
};
using (var package = new ExcelPackage(existingFile))
{
var ws = package.Workbook.Worksheets.Add("Sheet1");
ws.Cells[1, 1, 2, 2].Style.Numberformat.Format = "0";
ws.Cells[1, 3, 2, 3].Style.Numberformat.Format = "[$-F400]h:mm:ss\\ AM/PM";
//This will cause numbers in string to be stored as string in excel regardless of cell format
ws.Cells["A1"].LoadFromArrays(list);
//Have to go through the objects to deal with numbers as strings
for (var i = 0; i < list.Count; i++)
{
for (var j = 0; j < list[i].Count(); j++)
{
if (list[i][j] is string)
ws.Cells[i + 2, j + 1].Value = Double.Parse((string) list[i][j]);
else if (list[i][j] is double)
ws.Cells[i + 2, j + 1].Value = (double)list[i][j];
else
ws.Cells[i + 2, j + 1].Value = list[i][j];
}
}
package.Save();
}
}
With the above, you see the image below as the output. Note the upper-left cell with the green triangle: it holds a string that looks like a number and was written by LoadFromArrays.
I created an extension method, LoadFormulasFromArrays, based on EPPlus's LoadFromArrays. Unlike LoadFromArrays, the method assumes that all objects in the list are to be treated as formulas. The big picture is that both the Value and Formula properties take a string instead of a specific type. I see this as a mistake, because there's no way to tell whether the string is text or a formula. Implementing a Formula type would enable overloading and type checking, thus making it possible to always do the right thing.
// usage: ws.Cells[2,2].LoadFormulasFromArrays(MyListOfObjectArrays)
public static class EppPlusExtensions
{
public static ExcelRangeBase LoadFormulasFromArrays(this ExcelRange Cells, IEnumerable<object[]> Data)
{
//thanx to Abdullin for the code contribution
ExcelWorksheet _worksheet = Cells.Worksheet;
int _fromRow = Cells.Start.Row;
int _fromCol = Cells.Start.Column;
if (Data == null) throw new ArgumentNullException("data");
int column = _fromCol, row = _fromRow;
foreach (var rowData in Data)
{
column = _fromCol;
foreach (var cellData in rowData)
{
Cells[row, column].Formula = cellData.ToString();
column += 1;
}
row += 1;
}
return Cells[_fromRow, _fromCol, row - 1, column - 1];
}
}
The trick is to not pass the numbers as "raw objects" to EPPlus but casting them properly.
Here's how I did that in a DataTable-to-Excel export method I made with EPPlus:
if (dc.DataType == typeof(int)) ws.SetValue(row, col, !r.IsNull(dc) ? (int)r[dc] : (int?)null);
else if (dc.DataType == typeof(decimal)) ws.SetValue(row, col, !r.IsNull(dc) ? (decimal)r[dc] : (decimal?)null);
else if (dc.DataType == typeof(double)) ws.SetValue(row, col, !r.IsNull(dc) ? (double)r[dc] : (double?)null);
else if (dc.DataType == typeof(float)) ws.SetValue(row, col, !r.IsNull(dc) ? (float)r[dc] : (float?)null);
else if (dc.DataType == typeof(string)) ws.SetValue(row, col, !r.IsNull(dc) ? (string)r[dc] : null);
else if (dc.DataType == typeof(DateTime))
{
if (!r.IsNull(dc))
{
ws.SetValue(row, col, (DateTime)r[dc]);
// Change the following line if you need a different DateTime format
var dtFormat = "dd/MM/yyyy";
ws.Cells[row, col].Style.Numberformat.Format = dtFormat;
}
else ws.SetValue(row, col, null);
}
IMPORTANT: It's worth noting that DateTime values require more work to be handled properly, since we want them formatted in a certain way AND, arguably, want to support NULL values in the column. The method above fulfills both of these requirements.
I published the full code sample (DataTable to Excel file with EPPlus) in this post on my blog.
The code below takes in data from an Excel spreadsheet, validates it against a set of predefined rules and writes any errors to the console.
This works up to a point: the data is returned as expected up to column Z. If any errors are reported past Z (AB, AC, AD, etc.), the column labels get messed up and I get values like ], ~, ?. I believe this is down to ASCII, as I am starting from decimal 65 (A). I guess I need to write some kind of method that can cope with this, but I do not know where to start. Any help is much appreciated.
namespace WorksheetValidator
{
public class XcelReader
{
private readonly List<List<IRule>> m_Rules;
public XcelReader(List<List<IRule>> rules)
{
m_Rules = rules;
}
public void ValidateWorksheet(string fileName)
{
bool allRulesPassed = true;
WorkbookProvider workbookProvider = new WorkbookProvider();
IWorkbook workbook;
using (FileStream fileStream = File.OpenRead(fileName))
workbook = workbookProvider.GetWorkbook(fileStream, SpreadsheetType.Xlsx);
for (int rowCounter = 1; rowCounter < workbook.Worksheets[1].Rows.Count; rowCounter++)
{
IRow row = workbook.Worksheets[1].Rows[rowCounter];
for (int columnCounter = 0; columnCounter < row.Cells.Count; columnCounter++)
{
List<string> failedRules = ColumnValueIsValid(row.Cells[columnCounter].Value, m_Rules[columnCounter]);
failedRules.ForEach(failedRule =>
{
allRulesPassed = false;
Console.WriteLine("\n[{0}:{1}] Failed: {2}", rowCounter + 1, (char)(columnCounter + 65), failedRule);
});
}
}
if(allRulesPassed)
Console.WriteLine("\n\n\nWOOHOO! worksheet is hunky dory");
}
private List<string> ColumnValueIsValid(string value, List<IRule> rules)
{
List<string> failedRules = new List<string>();
rules.ForEach(rule =>
{
if(!rule.IsValid(value))
failedRules.Add(rule.GetReasonForFailure(value));
});
return failedRules;
}
}
}
Replace this:
(char)(columnCounter + 65)
With a function that converts 0 to "A", 1 to "B"..... 26 to "AA", 27 to "AB", etc.
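A minimal sketch of such a function is shown below. The class and method names (ColumnNames, ToColumnName) are made up for this example. Spreadsheet column letters work like base 26 without a zero digit, which is why the loop subtracts 1 before each division.

```csharp
using System;

public static class ColumnNames
{
    // Converts a zero-based column index to spreadsheet letters:
    // 0 -> "A", 25 -> "Z", 26 -> "AA", 27 -> "AB", ...
    public static string ToColumnName(int zeroBasedIndex)
    {
        if (zeroBasedIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(zeroBasedIndex));

        var name = string.Empty;
        int n = zeroBasedIndex + 1; // switch to one-based: A = 1, Z = 26

        while (n > 0)
        {
            int remainder = (n - 1) % 26;            // 0..25 selects 'A'..'Z'
            name = (char)('A' + remainder) + name;   // prepend the next letter
            n = (n - 1) / 26;
        }
        return name;
    }
}
```

In the Console.WriteLine call of the question, (char)(columnCounter + 65) would then become ColumnNames.ToColumnName(columnCounter).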
I am getting a list of items from a csv file via a Web Api using this code:
private List<Item> items = new List<Item>();
public ItemRepository()
{
string filename = HttpRuntime.AppDomainAppPath + "App_Data\\items.csv";
var lines = File.ReadAllLines(filename).Skip(1).ToList();
for (int i = 0; i < lines.Count; i++)
{
var line = lines[i];
var columns = line.Split('$');
//get rid of newline characters in the middle of data lines
while (columns.Length < 9)
{
i += 1;
line = line.Replace("\n", " ") + lines[i];
columns = line.Split('$');
}
//Remove Starting and Trailing open quotes from fields
columns = columns.Select(c => { if (string.IsNullOrEmpty(c) == false) { return c.Substring(1, c.Length - 2); } return string.Empty; }).ToArray();
var temp = columns[5].Split('|', '>');
items.Add(new Item()
{
Id = int.Parse(columns[0]),
Name = temp[0],
Description = columns[2],
Photo = columns[7]
});
}
}
The Name attribute of each item in the list must come from a column whose structure is as follows:
Groups>Subgroup>item
Therefore I use var temp = columns[5].Split('|', '>'); in my code to get the first element of the column before the ">", which in the above case is Groups. And this works fine.
However, I am getting many duplicates in the result. This is because other items in the column may be:
(These are some of the entries in my csv column 9)
Groups>Subgroup2>item2, Groups>Subgroup3>item4, Groups>Subgroup4>item9
All start with Groups, but I only want to get Groups once.
As it is I get a long list of Groups. How do I stop the duplicates?
I want that if an Item in the list is returned with the Name "Groups", that no other item with that name would be returned. How do I make this check and implement it?
If you are successfully getting the list of groups, take that list of groups and use LINQ:
var undupedList = dupedList
.Distinct();
Update: The reason Distinct() did not work is that your code is requesting not just Name but also Description, etc. If you only ask for Name, Distinct() will work.
Update 2: Try this:
//Check whether an item with this name already exists
if (!items.Any(q => q.Name == temp[0]))
{
items.Add(...);
}
How about using a List to store Item.Name?
Then check List.Contains() before calling items.Add()
Simple, only 3 lines of code, and it works.
IList<string> listNames = new List<string>();
//
for (int i = 0; i < lines.Count; i++)
{
//
var temp = columns[5].Split('|', '>');
if (!listNames.Contains(temp[0]))
{
listNames.Add(temp[0]);
items.Add(new Item()
{
//
});
}
}
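The same check can also be done with a HashSet<string>, whose Add method returns false when the value is already present, so the lookup and the insert are a single call. The sketch below is a variation on the list-based check above, not part of the original code. The GroupNames class and its FirstSegmentsOnce method are hypothetical, and it works on plain strings instead of the Item class so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class GroupNames
{
    // Returns the first segment of each "Groups>Subgroup>item" value,
    // keeping only the first occurrence of every segment.
    public static List<string> FirstSegmentsOnce(IEnumerable<string> values)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();

        foreach (var value in values)
        {
            var first = value.Split('|', '>')[0];

            // Add returns false if the name was seen before, so duplicates are skipped.
            if (seen.Add(first))
            {
                result.Add(first);
            }
        }
        return result;
    }
}
```

In the repository constructor, the equivalent would be to wrap the items.Add(...) call in if (seen.Add(temp[0])) { ... }.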