C#: compare fields from different lines in a CSV file

I am trying to compare the value at index 0 of the array for one line with the value at index 0 for the following line. Imagine a CSV where the first column holds a user identifier and the second column holds a corresponding value:
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] on the next line (or the previous line, if that's easier) and, if the two match (as they do for USER1 in the example), append the second line's value to index 1. That is, the data should read as:
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed on to the next function. So far I have:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null)
            {
                break;
            }
            contact.ContactId = parts[0];
            long nextLine;
            nextLine = parser.LineNumber+1;
            //if line1 parts[0] == line2 parts[0] etc.
        }
    }
}
Does anyone have any suggestions? Thank you.

How about saving the previous line's array in a variable and comparing against it:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        // Holds the fields of the previous line; starts with an empty key.
        string[] oldParts = new string[] { string.Empty };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length < 1)
            {
                break;
            }
            if (oldParts[0] == parts[0])
            {
                // Same key as the previous line: concat logic goes here
            }
            else
            {
                // New key (contact is the object from your own code)
                contact.ContactId = parts[0];
            }
            // Remember this line for the comparison on the next iteration.
            oldParts = parts;
        }
    }
}
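The "concat logic" left open above could be filled in along these lines. This is a minimal sketch, not the answerer's code: it assumes the lines are already grouped by key (as in the question's sample), that each line is a simple "key, value" pair without quoted commas, and it works on plain strings instead of TextFieldParser so it can be tried in isolation. ConsecutiveMerger and MergeConsecutive are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveMerger
{
    // Merges adjacent "key, value" lines that share the same key, e.g.
    // "USER1, 1P" + "USER1, 3G" -> "USER1, 1P, 3G". Only adjacent duplicates
    // are merged, so the input is assumed to be grouped by key.
    public static IEnumerable<string> MergeConsecutive(IEnumerable<string> lines)
    {
        string currentKey = null;
        var currentValues = new List<string>();

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;

            // Naive split on the first comma only; quoted fields are not handled.
            string[] parts = line.Split(new[] { ',' }, 2);
            string key = parts[0].Trim();
            string value = parts.Length > 1 ? parts[1].Trim() : string.Empty;

            if (currentKey != null && key != currentKey)
            {
                // The key changed: emit the finished group before starting a new one.
                yield return currentKey + ", " + string.Join(", ", currentValues);
                currentValues.Clear();
            }

            currentKey = key;
            currentValues.Add(value);
        }

        // Emit the last group, if any line was read at all.
        if (currentKey != null)
            yield return currentKey + ", " + string.Join(", ", currentValues);
    }
}
```

Because it only remembers the previous key, it streams the input line by line; if equal keys can be scattered through the file, a dictionary or GroupBy approach is needed instead.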

If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and succinct way of doing this is to group by the first column using LINQ:
var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim() removes the space after the comma in "USER1, 1P".
    let user = new { Name = elements[0].Trim(), Value = elements[1].Trim() }
    group user by user.Name into users
    select users;

foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));
    Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.

The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you or you can iterate each line and parse to an object. It is a great tool and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use Linq to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Then, you can use the new, grouped by, collection for your end goal (write it to file or use it for whatever you need).
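Without CsvHelper, the GroupBy/Aggregate step described above might look like this sketch, assuming the rows have already been parsed into objects. UserValue, UserId, Value, UserValueGrouping and GroupValues are hypothetical names, and string.Join would work just as well as Aggregate here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for one parsed CSV row.
public class UserValue
{
    public string UserId { get; set; }
    public string Value { get; set; }
}

public static class UserValueGrouping
{
    // Groups rows by UserId and folds each group's values into one
    // comma-separated string with Aggregate (groups are never empty).
    public static Dictionary<string, string> GroupValues(IEnumerable<UserValue> rows)
    {
        return rows
            .GroupBy(r => r.UserId)
            .ToDictionary(
                g => g.Key,
                g => g.Select(r => r.Value).Aggregate((acc, next) => acc + ", " + next));
    }
}
```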

You're essentially collecting the values for each unique contact ID, so put them into a dictionary keyed by the contact ID, as follows:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length != 2)
            {
                break;
            }
            string contactId = parts[0].Trim();
            //if contact id not present in dictionary add
            if (!uniqueContacts.ContainsKey(contactId))
                uniqueContacts.Add(contactId, new List<string>());
            //now there's definitely an existing contact in dic (the one
            //we've just added or a previously added one) so add to the
            //list of strings for that contact
            uniqueContacts[contactId].Add(parts[1].Trim());
        }
        //now do something with that dictionary of unique user names and
        // lists of strings, for example dump them to console in the
        //format you specify:
        foreach (var contactId in uniqueContacts.Keys)
        {
            // string.Join puts ", " between the values, even when a
            // contact has the same value more than once.
            string values = string.Join(", ", uniqueContacts[contactId]);
            Console.WriteLine(contactId + ", " + values);
        }
    }
}

Related

Fill an MDF database with CSV data

This is what my CSV data looks like:
Artistname;RecordTitle;RecordType;Year;SongTitle
999;Concrete;LP;1981;Mercy Mercy
999;Concrete;LP;1981;Public Enemy No.1
999;Concrete;LP;1981;So Greedy
999;Concrete;LP;1981;Taboo
10cc;Bloody Tourists;LP;1978;Dreadlock Holiday
10cc;Bloody Tourists;LP;1978;Everyhing You've Ever Wanted To Know About!!!
10cc;Bloody Tourists;LP;1978;Shock On The Tube
This is the code that saves this data to the database:
private void FillDatabase()
{
    var firstTime = true;
    var lines = File.ReadAllLines("musicDbData.csv");
    var list = new List<string>();
    foreach (var line in lines)
    {
        var split = line.Split(";");
        if (!firstTime)
        {
            var artist = new Artist()
            {
                ArtistName = split[0],
            };
            db.Artists.Add(artist);
            db.SaveChanges();
        }
        else
        {
            firstTime = false;
        }
    }
}
The problem is that each artist should be in the database only once. Right now artist 999 appears 4 times and 10cc appears 3 times, but there should be only one row for 999 and one row for 10cc. What do I have to add to my code to get the expected result?
First, strictly speaking a CSV is a comma-separated values file, whereas yours uses semicolons.
Besides, on .NET Framework String.Split has no overload that takes a single string, so pass a char instead: line.Split(';').
And your CSV file starts with a line of column names, which you need to skip when reading the file.
if everything is correct there should only be one row for 999 and one row for 10cc
Do you just want to save the first row for 999 and for 10cc to the database? If so, you can first use LINQ to check whether the Artistname already exists in the database.
private void FillDatabase()
{
    var lines = File.ReadAllLines("musicDbData.csv");
    int count = 0; // line count
    foreach (var line in lines)
    {
        count++;
        if (count == 1) // skip the header line
            continue;
        var split = line.Split(';');
        string artistname = split[0];
        // FirstOrDefault, unlike SingleOrDefault, does not throw if several rows match
        var artistIndb = db.ArtistTables
            .FirstOrDefault(c => c.Artistname == artistname);
        if (artistIndb == null) // check if exists, if not ...
        {
            var artist = new ArtistTable()
            {
                Artistname = split[0],
                SongTitle = split[4]
            };
            db.ArtistTables.Add(artist);
            db.SaveChanges();
        }
    }
}
If you want to merge the lines with the same Artistname instead, you can refer to the following code.
if (artistIndb == null)
{
    // code omitted
    // ...
}
else
{
    artistIndb.SongTitle += ", " + split[4]; // append this song to the SongTitle column
    // Don't wrap this in an empty catch: a failed save would go unnoticed.
    db.SaveChanges();
}
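An alternative that avoids one database query per line is to deduplicate the artist names in memory first and then insert each one once. A minimal sketch, assuming the file fits in memory and that names differing only in letter case count as the same artist (hence StringComparer.OrdinalIgnoreCase, which is an assumption, not something the question states). ArtistImport and DistinctArtists are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArtistImport
{
    // Returns each artist name once, in first-seen order, from semicolon-separated
    // lines whose first line is the header ("Artistname;RecordTitle;...").
    public static List<string> DistinctArtists(IEnumerable<string> lines)
    {
        // Assumption: artist names are compared case-insensitively.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string line in lines.Skip(1)) // skip the header row
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;

            string artistName = line.Split(';')[0].Trim();
            // HashSet.Add returns false if the name was already seen.
            if (seen.Add(artistName))
                result.Add(artistName);
        }
        return result;
    }
}
```

With the question's own db context and Artist class, the names returned by DistinctArtists(File.ReadAllLines("musicDbData.csv")) could be added with db.Artists.Add(new Artist { ArtistName = name }) in a loop, followed by a single db.SaveChanges(). This only removes duplicates within the file; artists already stored in the database still need the lookup shown in the answer.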

How to read and separate segments of a txt file?

I have a txt file that has header lines followed by 3 columns of values, e.g.:
Description=null
area = 100
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list)
Then another segment
Description=null
area = 10
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list).
I just need one list per "table" of values. The values are always in 3 columns, but there are n segments. Any ideas?
Thanks!
List<double> VMM40xyz = new List<double>();
foreach (var item in VMM40blocklines)
{
    if (item.Contains(','))
    {
        VMM40xyz.AddRange(item.Split(',').Select(double.Parse).ToList());
    }
}
I tried this, but it puts all of the values into one big list.
It looks like you want your data to end up in a format like this:
public class SetOfData //Feel free to name these parts better.
{
public string Description = "";
public string Area = "";
public List<double> Data = new List<double>();
}
...stored somewhere in...
List<SetOfData> finalData = new List<SetOfData>();
So, here's how I'd read that in:
public static List<SetOfData> ReadCustomFile(string Filename)
{
if (!File.Exists(Filename))
{
throw new FileNotFoundException($"{Filename} does not exist.");
}
List<SetOfData> returnData = new List<SetOfData>();
SetOfData currentDataSet = null;
using (FileStream fs = new FileStream(Filename, FileMode.Open))
{
using (StreamReader reader = new StreamReader(fs))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
//This will start a new object on every 'Description' line.
if (line.Contains("Description="))
{
//Save off the old data set if there is one.
if (currentDataSet != null)
returnData.Add(currentDataSet);
currentDataSet = new SetOfData();
//Now, to make sure there is something after "Description=" and to set the Description if there is.
//Your example data used "null" here, which this will take literally to be a string containing the letters "null". You can check the contents of parts[1] inside the if block to change this.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Description = parts[1].Trim();
}
else if (line.Contains("area = "))
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
//And then we do some string splitting like we did for Description.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Area = parts[1].Trim();
}
else
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
string[] parts = line.Split(',');
foreach (string part in parts)
{
if (double.TryParse(part, out double number))
{
currentDataSet.Data.Add(number);
}
}
}
}
//Make sure to add the last set, unless the file was empty.
if (currentDataSet != null)
returnData.Add(currentDataSet);
}
}
return returnData;
}
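Since the question says the values always come in 3 columns, a variant could keep each line as a row of three numbers inside its table, instead of one flat List<double> per set. This is a sketch under that assumption: it reads from any sequence of lines (for example File.ReadLines(path)) so it can be tested without a file, it treats every "Description=" line as the start of a new table, and SegmentReader and ReadSegments are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SegmentReader
{
    // Returns one list per "Description=" segment; each numeric line "a,b,c"
    // becomes a double[3] row. Other header lines ("area = 100") are skipped.
    public static List<List<double[]>> ReadSegments(IEnumerable<string> lines)
    {
        var segments = new List<List<double[]>>();
        List<double[]> current = null;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.StartsWith("Description=", StringComparison.Ordinal))
            {
                current = new List<double[]>(); // a new table starts here
                segments.Add(current);
                continue;
            }

            string[] parts = line.Split(',');
            if (parts.Length != 3)
                continue; // not a data row (e.g. "area = 100" or a blank line)

            var row = new double[3];
            bool ok = true;
            for (int i = 0; i < 3 && ok; i++)
                ok = double.TryParse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture, out row[i]);

            if (!ok)
                continue; // skip lines such as "2,1,5 ..." that don't parse

            if (current == null) // data before any Description line
            {
                current = new List<double[]>();
                segments.Add(current);
            }
            current.Add(row);
        }
        return segments;
    }
}
```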

Delete rows in a csv file

I have two files, Example1.csv and Example2.csv. Note that they are not comma-separated, even though they are saved with the '.csv' extension.
Example1 has a single column containing only email addresses.
Example2 has many columns, one of which is the same email column as in Example1.
Example1.csv file
emails
abc@gmail.com
jhg@yahoo.com
...
...
Example 2.csv
Column1 column2 Column3 column4 emails
1 45 456 123 abc@gmail.com
2 89 898 254 jhg@yahoo.com
3 85 365 789 ...
Now I need to delete the rows in Example2.csv whose email matches an entry in Example1.csv. For example, rows 1 and 2 should be removed because their emails match.
string[] lines = File.ReadAllLines(@"C:\example2.csv");
var emails = File.ReadAllLines(@"C:\example1.csv");
List<string> linesToWrite = new List<string>();
foreach (string s in lines)
{
    String[] split = s.Split(' ');
    if (s.Contains(emails))
        linesToWrite.Remove(s);
}
File.WriteAllLines("file3.csv", linesToWrite);
This should work:
var emails = new HashSet<string>(File.ReadAllLines(@"C:\example1.csv").Skip(1));
File.WriteAllLines("file3.csv", File.ReadAllLines(@"C:\example2.csv")
    .Where(line => !emails.Contains(line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)[4])));
It reads all of file one and puts the emails into a HashSet, where lookup is fast, then goes through all lines in the second file and writes to disk only those whose 5th column doesn't match any of those emails. You may want to expand on many parts; for example, there is little to no error handling (a line with fewer than 5 columns will throw). It also compares emails case-sensitively, although email addresses are normally treated case-insensitively; passing StringComparer.OrdinalIgnoreCase to the HashSet constructor changes that.
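A variant of the same idea that splits on whitespace (the question says the files are not comma-separated), keeps rows with fewer than 5 columns instead of throwing, and compares emails case-insensitively. It is a sketch that takes the lines as arguments so it can be tested without files; EmailFilter and RemoveMatchingRows are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmailFilter
{
    // Keeps only the rows of example2 whose 5th whitespace-separated column
    // is not listed in example1. Emails are compared case-insensitively.
    public static List<string> RemoveMatchingRows(IEnumerable<string> example1Lines, IEnumerable<string> example2Lines)
    {
        var emails = new HashSet<string>(
            example1Lines.Skip(1).Select(e => e.Trim()), // skip the "emails" header
            StringComparer.OrdinalIgnoreCase);

        return example2Lines
            .Where(line =>
            {
                // A null separator splits on any whitespace.
                string[] columns = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                // Rows with fewer than 5 columns are kept unchanged.
                return columns.Length < 5 || !emails.Contains(columns[4]);
            })
            .ToList();
    }
}
```

Usage with the question's files: File.WriteAllLines("file3.csv", EmailFilter.RemoveMatchingRows(File.ReadAllLines(@"C:\example1.csv"), File.ReadAllLines(@"C:\example2.csv")));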
The variable line is not a string but a string array, just like lines, because you read it the same way as lines.
Also this line
if (s.Contains(line))
is not correct: you are checking whether a string contains an array. If you need to check whether a line contains an email from the list, this is better:
if (split.Intersect(line).Any())
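So, here is the final code:
var lines = File.ReadAllLines(@"C:\example2.csv");
// Skip the "emails" header so that the header row of example2 is kept.
var line = File.ReadAllLines(@"C:\example1.csv").Skip(1).ToArray();
var linesToWrite = new List<string>();
foreach (var s in lines)
{
    // The files are space-separated, not comma-separated.
    var split = s.Split(' ');
    // Keep only the rows that don't contain any email from example1.
    if (!split.Intersect(line).Any())
    {
        linesToWrite.Add(s);
    }
}
File.WriteAllLines("file3.csv", linesToWrite);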
static void Main(string[] args)
{
    var Example1CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example1.csv";
    var Example2CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example2.csv";
    var Example3CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example3.csv";
    var EmailsToDelete = new List<string>();
    var Result = new List<string>();
    foreach (var Line in System.IO.File.ReadAllLines(Example1CsvPath))
    {
        if (!string.IsNullOrWhiteSpace(Line) && Line.IndexOf('@') > -1)
        {
            EmailsToDelete.Add(Line.Trim());
        }
    }
    foreach (var Line in System.IO.File.ReadAllLines(Example2CsvPath))
    {
        if (!string.IsNullOrWhiteSpace(Line))
        {
            var Values = Line.Split(' ');
            if (!EmailsToDelete.Contains(Values[4]))
            {
                Result.Add(Line);
            }
        }
    }
    System.IO.File.WriteAllLines(Example3CsvPath, Result);
}
I know this question is 4 years old, but I got some ideas from it and would like to share my solution.
My use case is a simple CSV with at most about 20 lines (really, at most), so I decided to keep it basic and not use a database.
My solution rescans the CSV, saving every row except the one I want to delete into a list, and after the scan writes that list back to the CSV (which removes the row matching textBox1 or textBox2).
List<string> keptLines = new();
try {
    using (var reader = new StreamReader($"{Main.directory}\\bin\\ip.csv")) {
        while (!reader.EndOfStream) {
            var line = reader.ReadLine();
            var values = line.Split(',');
            // Skip the row that should be deleted.
            if (values[0] == textBox1.Text || values[1] == textBox2.Text)
                continue;
            keptLines.Add($"{values[0]},{values[1]},{values[2]},");
        }
    }
    // The reader is closed at this point, so the file can be overwritten.
    File.WriteAllLines($"{Main.directory}\\bin\\ip.csv", keptLines);
} catch (Exception f) {
    MessageBox.Show(f.Message);
}

Split string that includes multiline substrings into substrings [duplicate]

I'm writing a simple import application and need to read a CSV file, show the result in a DataGrid, and show the corrupted lines of the CSV file in another grid, for example the lines that have fewer than 5 values. I'm trying to do that like this:
StreamReader sr = new StreamReader(FilePath);
importingData = new Account();
string line;
string[] row = new string [5];
while ((line = sr.ReadLine()) != null)
{
    row = line.Split(',');
    importingData.Add(new Transaction
    {
        Date = DateTime.Parse(row[0]),
        Reference = row[1],
        Description = row[2],
        Amount = decimal.Parse(row[3]),
        Category = (Category)Enum.Parse(typeof(Category), row[4])
    });
}
but it's very difficult to operate on arrays in this case. Is there a better way to split the values?
Don't reinvent the wheel. Take advantage of what's already in the .NET BCL:
Add a reference to Microsoft.VisualBasic (yes, it says VisualBasic, but it works in C# just as well; remember that in the end it is all just IL).
Use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file.
Here is the sample code:
// Requires: using Microsoft.VisualBasic.FileIO;
using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        //Processing row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            //TODO: Process field
        }
    }
}
It works great for me in my C# projects.
Here are some more links:
MSDN: Read From Comma-Delimited Text Files in Visual Basic
MSDN: TextFieldParser Class
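TextFieldParser also has a constructor that takes a TextReader, which is handy for parsing CSV text that is already in memory. A minimal sketch; TextFieldParserDemo and ParseText are hypothetical names, and on .NET Framework it needs the Microsoft.VisualBasic reference mentioned above.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Parses CSV text held in a string (instead of a file path) with TextFieldParser.
    public static List<string[]> ParseText(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "b,c" stays one field
            while (!parser.EndOfData)
                rows.Add(parser.ReadFields());
        }
        return rows;
    }
}
```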
I recommend CsvHelper from NuGet.
PS: Regarding the other, more upvoted answers: I'm sorry, but adding a reference to Microsoft.VisualBasic is:
Ugly
Not fully cross-platform: TextFieldParser was not available in .NET Core before version 3.0 (it is included from .NET Core 3.0 on, including .NET 5), and Mono never had very good support for Visual Basic, so it may be buggy there.
In my experience there are many different CSV formats, especially in how they escape quotes and delimiters within a field.
These are the variants I have run into:
quotes are quoted and doubled (excel) i.e. 15" -> field1,"15""",field3
quotes are not changed unless the field is quoted for some other reason. i.e. 15" -> field1,15",field3
quotes are escaped with \. i.e. 15" -> field1,"15\"",field3
quotes are not changed at all (this is not always possible to parse correctly)
delimiter is quoted (excel). i.e. a,b -> field1,"a,b",field3
delimiter is escaped with \. i.e. a,b -> field1,a\,b,field3
I have tried many of the existing CSV parsers, but not a single one can handle all the variants I have run into. It is also difficult to find out from the documentation which escaping variants the parsers support.
In my projects I now use either the VB TextFieldParser or a custom splitter.
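As an illustration of such a custom splitter, here is a sketch for the two backslash variants listed above (quotes escaped as \" and delimiters escaped as \,). It is not the answerer's code, BackslashCsv and SplitLine are hypothetical names, and it deliberately handles only those variants: bare double quotes are simply dropped, so an Excel-style quoted delimiter such as "a,b" is not kept together.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BackslashCsv
{
    // Splits one line in which the delimiter and quotes are escaped with a
    // backslash, e.g. field1,a\,b,field3 or field1,"15\"",field3.
    // Unescaped double quotes are treated as field wrappers and dropped.
    public static List<string> SplitLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var field = new StringBuilder();

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length)
            {
                field.Append(line[++i]); // take the escaped character literally
            }
            else if (c == delimiter)
            {
                fields.Add(field.ToString()); // an unescaped delimiter ends the field
                field.Clear();
            }
            else if (c != '"')
            {
                field.Append(c);
            }
        }
        fields.Add(field.ToString()); // the last field has no trailing delimiter
        return fields;
    }
}
```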
Using a library is sometimes the right call when you don't want to reinvent the wheel, but in this case you can do the same job with fewer lines of code that are easier to read.
Here is a different approach which I find very easy to use.
In this example, I use:
StreamReader to read the file
a Regex to find the delimiters in each line
an array to collect the columns from index 0 to n
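// Requires: using System.IO; using System.Text.RegularExpressions;
// Define the pattern once, outside the loop: a comma is a separator only if it
// is followed by an even number of quotes up to the end of the line.
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
using (StreamReader reader = new StreamReader(fileName))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        //Separating columns to array (quoted fields keep their quotes)
        string[] X = CSVParser.Split(line);
        /* Do something with X */
    }
}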
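Note that Regex.Split leaves the surrounding quotes on quoted fields. The sketch below keeps the same pattern but also unquotes each field and turns doubled quotes ("") back into single ones. RegexCsv, SplitLine and Unquote are hypothetical names, and line breaks inside quoted fields are still not supported because the input is one line.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCsv
{
    // Same pattern as above: a comma separates fields only when an even number
    // of quote characters follows it up to the end of the line.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Splits a line, then strips the surrounding quotes of each quoted field.
    public static string[] SplitLine(string line)
    {
        return Separator.Split(line).Select(Unquote).ToArray();
    }

    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        return field;
    }
}
```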
CSV can get complicated real fast.
Use something robust and well-tested:
FileHelpers:
www.filehelpers.net
The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.
Another one for this list: Cinchoo ETL, an open-source library to read and write CSV files.
For the sample CSV file below:
Id, Name
1, Tom
2, Mark
You can quickly load it using the library as follows:
using (var reader = new ChoCSVReader("test.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have a POCO class matching the CSV file:
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
You can use it to load the CSV file as follows:
using (var reader = new ChoCSVReader<Employee>("test.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
I use this here:
http://www.codeproject.com/KB/database/GenericParser.aspx
The last time I was looking for something like this, I found it as an answer to this question.
private static DataTable ConvertCSVtoDataTable(string strFilePath)
{
DataTable dt = new DataTable();
using (StreamReader sr = new StreamReader(strFilePath))
{
string[] headers = sr.ReadLine().Split(',');
foreach (string header in headers)
{
dt.Columns.Add(header);
}
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
return dt;
}
private static void WriteToDb(DataTable dt)
{
string connectionString =
"Data Source=localhost;" +
"Initial Catalog=Northwind;" +
"Integrated Security=SSPI;";
using (SqlConnection con = new SqlConnection(connectionString))
{
using (SqlCommand cmd = new SqlCommand("spInsertTest", con))
{
cmd.CommandType = CommandType.StoredProcedure;
// Placeholder values: in real code, read them from the rows of dt.
cmd.Parameters.Add("@policyID", SqlDbType.Int).Value = 12;
cmd.Parameters.Add("@statecode", SqlDbType.VarChar).Value = "blagh2";
cmd.Parameters.Add("@county", SqlDbType.VarChar).Value = "blagh3";
con.Open();
cmd.ExecuteNonQuery();
}
}
}
Here's a solution I coded up today for a situation where I needed to parse a CSV without relying on external libraries. I haven't tested performance on large files, since that wasn't relevant to my use case, but I'd expect it to perform reasonably well in most situations.
static List<List<string>> ParseCsv(string csv) {
    var parsedCsv = new List<List<string>>();
    var row = new List<string>();
    var field = new StringBuilder(); // requires using System.Text
    bool inQuotedField = false;
    for (int i = 0; i < csv.Length; i++) {
        char current = csv[i];
        if (inQuotedField) {
            if (current != '"') {
                field.Append(current); // commas and line breaks are literal inside quotes
            } else if (i + 1 < csv.Length && csv[i + 1] == '"') {
                field.Append('"'); // "" inside a quoted field is an escaped quote
                i++; // skip the second quote
            } else {
                inQuotedField = false; // quote signifies the end of a quoted field
            }
        } else if (current == '"') {
            inQuotedField = true; // quote signifies the beginning of a quoted field
        } else if (current == ',') {
            row.Add(field.ToString());
            field.Clear();
        } else if (current == '\n') {
            row.Add(field.ToString());
            field.Clear();
            parsedCsv.Add(row);
            row = new List<string>();
        } else if (current != '\r') {
            field.Append(current); // the \r of \r\n line endings is ignored
        }
    }
    // Many files don't end with a newline, so flush the last row here.
    if (field.Length > 0 || row.Count > 0) {
        row.Add(field.ToString());
        parsedCsv.Add(row);
    }
    return parsedCsv;
}
First of all, you need to understand what CSV is and how it is written:
Each line (ending with \r\n) is the next "table" row.
Cells are separated by a delimiter character; the most commonly used ones are \t and ,.
A cell may contain the delimiter character; in that case the cell must start and end with a quote character.
A cell may contain \r\n characters; in that case, too, the cell must start and end with a quote character.
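Those rules translate into a small writer. This sketch quotes a cell when it contains the delimiter, a quote character, or a line break, and it additionally assumes the common convention (not stated in the list above) that a quote inside a quoted cell is written twice. CsvFormatting, FormatCell and FormatRow are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvFormatting
{
    // Wraps a cell in quotes if it contains the delimiter, a quote, CR or LF.
    // Assumption: embedded quotes are escaped by doubling them.
    public static string FormatCell(string cell, char delimiter)
    {
        bool needsQuotes = cell.IndexOf(delimiter) >= 0
                           || cell.IndexOfAny(new[] { '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + cell.Replace("\"", "\"\"") + "\"" : cell;
    }

    // Joins the cells of one row and terminates it with \r\n.
    public static string FormatRow(IEnumerable<string> cells, char delimiter = ',')
    {
        return string.Join(delimiter.ToString(), cells.Select(c => FormatCell(c, delimiter))) + "\r\n";
    }
}
```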
The easiest way to work with CSV files in C# or Visual Basic is to use the standard Microsoft.VisualBasic library. You just need to add the reference and the following line to your class:
using Microsoft.VisualBasic.FileIO;
Yes, you can use it in C#, don't worry. This library can read relatively large files and supports all of the rules above, so it should handle any CSV file that follows them.
Some time ago I wrote a simple class for reading and writing CSV based on this library. With it, you can work with a CSV file like a two-dimensional array.
You can find my class at the following link:
https://github.com/ukushu/DataExporter
A simple usage example:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5");
csv.FileSave("c:\\file2.csv");
To complement the previous answers: you may need a collection of objects from your CSV file, where each line is parsed with TextFieldParser or string.Split and then converted to an object via reflection. You first need to define a class that matches the lines of the CSV file.
I used the simple CSV Serializer from Michael Kropat found here: Generic class to CSV (all properties)
and reused his methods to get the fields and properties of the wished class.
I deserialize my CSV file with the following method:
// Requires: using System; using System.Collections.Generic; using System.IO;
// using System.Linq; using System.Reflection; using Microsoft.VisualBasic.FileIO;
public static IEnumerable<T> ReadCsvFileTextFieldParser<T>(string fileFullPath, string delimiter = ";") where T : new()
{
    if (!File.Exists(fileFullPath))
    {
        return null;
    }
    var list = new List<T>();
    var csvFields = GetAllFieldOfClass<T>();
    var fieldDict = new Dictionary<int, MemberInfo>();
    using (TextFieldParser parser = new TextFieldParser(fileFullPath))
    {
        parser.SetDelimiters(delimiter);
        bool headerParsed = false;
        while (!parser.EndOfData)
        {
            //Processing row
            string[] rowFields = parser.ReadFields();
            if (!headerParsed)
            {
                for (int i = 0; i < rowFields.Length; i++)
                {
                    // First row shall be the header!
                    var csvField = csvFields.FirstOrDefault(f => f.Name == rowFields[i]);
                    if (csvField != null)
                    {
                        fieldDict.Add(i, csvField);
                    }
                }
                headerParsed = true;
            }
            else
            {
                T newObj = new T();
                for (int i = 0; i < rowFields.Length; i++)
                {
                    // Skip columns whose header has no matching field or property.
                    if (!fieldDict.TryGetValue(i, out MemberInfo member))
                    {
                        continue;
                    }
                    var record = rowFields[i];
                    if (member is FieldInfo fi)
                    {
                        // Convert fields too, not only properties.
                        fi.SetValue(newObj, Convert.ChangeType(record, fi.FieldType));
                    }
                    else if (member is PropertyInfo pi)
                    {
                        pi.SetValue(newObj, Convert.ChangeType(record, pi.PropertyType), null);
                    }
                    else
                    {
                        throw new Exception("Unhandled case.");
                    }
                }
                list.Add(newObj);
            }
        }
    }
    return list;
}
public static IEnumerable<MemberInfo> GetAllFieldOfClass<T>()
{
return
from mi in typeof(T).GetMembers(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static)
where new[] { MemberTypes.Field, MemberTypes.Property }.Contains(mi.MemberType)
let orderAttr = (ColumnOrderAttribute)Attribute.GetCustomAttribute(mi, typeof(ColumnOrderAttribute))
orderby orderAttr == null ? int.MaxValue : orderAttr.Order, mi.Name
select mi;
}
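The core of that reflection step, reduced to a single row, could look like the sketch below. It is a simplification rather than the code above: it maps header names to public writable properties only, converts with Convert.ChangeType using the invariant culture, ignores unknown columns, and does not handle nullable property types. ReflectionMapper, MapRow and the Person class are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Reflection;

public static class ReflectionMapper
{
    // Creates a T and sets each public property whose name matches a header.
    // The class constraint avoids setting properties on a boxed struct copy.
    public static T MapRow<T>(string[] headers, string[] values) where T : class, new()
    {
        var obj = new T();
        for (int i = 0; i < headers.Length && i < values.Length; i++)
        {
            PropertyInfo prop = typeof(T).GetProperty(headers[i], BindingFlags.Public | BindingFlags.Instance);
            if (prop == null || !prop.CanWrite)
                continue; // no matching property: ignore this column
            object converted = Convert.ChangeType(values[i], prop.PropertyType, CultureInfo.InvariantCulture);
            prop.SetValue(obj, converted);
        }
        return obj;
    }
}

// Hypothetical example type.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```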
I'd highly suggest using CsvHelper.
Here's a quick example:
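public class csvExampleClass
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

// T is the record type, not a List of records.
var items = DeserializeCsvFile<csvExampleClass>( csvText );

// Older CsvHelper API: newer versions take a CultureInfo or a CsvConfiguration
// in the CsvReader constructor instead of a mutable Configuration property.
public static List<T> DeserializeCsvFile<T>(string text)
{
    using (CsvReader csv = new CsvReader( new StringReader( text ) ))
    {
        csv.Configuration.Delimiter = ",";
        csv.Configuration.HeaderValidated = null;
        csv.Configuration.MissingFieldFound = null;
        // GetRecords is lazy and returns IEnumerable<T>, so materialize it with
        // ToList() (requires System.Linq) before the reader is disposed.
        return csv.GetRecords<T>().ToList();
    }
}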
Full documentation can be found at: https://joshclose.github.io/CsvHelper

How to remove duplicates from List<string> without LINQ? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Remove duplicates from a List<T> in C#
I have a list like the one below (a very large list of emails):
source list :
item 0 : jumper@yahoo.com|32432
item 1 : goodzila@yahoo.com|32432|test23
item 2 : alibaba@yahoo.com|32432|test65
item 3 : blabla@yahoo.com|32432|test32
The important part of each item is the email address; the other parts (separated by pipes) are not important, but I want to keep them in the final list.
As I said, my list is very big, and I think it's not a good idea to use another list.
How can I remove items with duplicate emails (the entire item) from that list without using LINQ?
My code is below:
private void WorkOnFile(UploadedFile file, string filePath)
{
    File.SetAttributes(filePath, FileAttributes.Archive);
    FileSecurity fSecurity = File.GetAccessControl(filePath);
    fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
        FileSystemRights.FullControl,
        AccessControlType.Allow));
    File.SetAccessControl(filePath, fSecurity);
    string[] lines = File.ReadAllLines(filePath);
    List<string> list_lines = new List<string>(lines);
    var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
    List<string> new_list_lines = new List<string>(new_lines);
    int Duplicate_Count = 0;
    RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
    File.WriteAllLines(filePath, new_list_lines.ToArray());
}
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    char[] splitter = { '|' };
    list_lines.ForEach(delegate(string line)
    {
        // ??
    });
}
EDIT:
Some duplicate email addresses in the list have different remaining parts. What can I do about them? For example:
goodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234
Thanks in advance.
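Say you have a list of possible duplicates:
List<string> emailList ....
Then the unique list is the set of that list (note that this only removes lines that are identical as a whole, not lines that merely share an email):
HashSet<string> unique = new HashSet<string>( emailList );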
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    Duplicate_Count = 0;
    List<string> list_lines2 = new List<string>();
    HashSet<string> hash = new HashSet<string>();
    foreach (string line in list_lines)
    {
        string[] split = line.Split('|');
        string firstPart = split.Length > 0 ? split[0] : string.Empty;
        if (hash.Add(firstPart))
        {
            list_lines2.Add(line);
        }
        else
        {
            Duplicate_Count++;
        }
    }
    list_lines = list_lines2;
}
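The question also asked to avoid a second list. The same HashSet idea can be applied in place by compacting the list: each item whose email (the part before the first '|') has not been seen yet is moved down to the next free slot, and the leftover tail is removed at the end. Because only the email is compared, this also covers the EDIT. A sketch without LINQ; InPlaceDedup and RemoveDuplicateEmails are hypothetical names, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceDedup
{
    // Keeps the first item for each email and removes later ones by compacting
    // the list in place. Returns the number of duplicates removed.
    public static int RemoveDuplicateEmails(List<string> lines)
    {
        // Assumption: emails are compared case-insensitively.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        int write = 0;
        for (int read = 0; read < lines.Count; read++)
        {
            string line = lines[read];
            int pipe = line.IndexOf('|');
            string email = pipe >= 0 ? line.Substring(0, pipe) : line;
            if (seen.Add(email))
                lines[write++] = line; // keep it: move it to the next free slot
        }
        int removed = lines.Count - write;
        lines.RemoveRange(write, removed); // drop the now-unused tail
        return removed;
    }
}
```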
The easiest thing to do is to iterate through the lines in the file and add them to a HashSet. A HashSet won't insert duplicate entries, and HashSet.Add doesn't throw for them either; it simply returns false. At the end you'll have a set of unique items.
1 - Get rid of your pipe-separated strings (create a DTO class corresponding to the data they represent).
2 - Which rule do you want to apply to choose between two objects with the same ID?
Or maybe this code can be useful for you :)
It uses the same approach as @xanatos's answer.
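string[] lines = File.ReadAllLines(filePath);
// The dictionary must be created before use; the key is the email (first part).
Dictionary<string, string> items = new Dictionary<string, string>();
foreach (var line in lines)
{
    var key = line.Split('|')[0];
    if (!items.ContainsKey(key))
        items.Add(key, line);
}
// new List<string>(...) instead of ToList() keeps this free of LINQ.
List<string> list_lines = new List<string>(items.Values);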
First, I suggest loading the file via a stream.
Then, create a type that represents your rows and load them into a HashSet (for performance reasons).
Look (I've removed some of your code to keep it simple):
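public struct LineType : IEquatable<LineType>
{
    public string Email { get; set; }
    public string Others { get; set; }

    // Two lines are considered equal when their emails are equal.
    public bool Equals(LineType other) => string.Equals(Email, other.Email);
    public override bool Equals(object obj) => obj is LineType other && Equals(other);
    // HashSet<T> looks items up by hash code first, so GetHashCode must agree with Equals.
    public override int GetHashCode() => Email == null ? 0 : Email.GetHashCode();
}
private static HashSet<LineType> WorkOnFile(string filePath)
{
    HashSet<LineType> hashSet = new HashSet<LineType>();
    using (StreamReader stream = File.OpenText(filePath))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
            LineType lineType = new LineType()
            {
                Email = new_line.Split('|')[0], // the email is the first part
                Others = new_line
            };
            // Add does nothing (returns false) if an equal item is already present.
            hashSet.Add(lineType);
        }
    }
    return hashSet;
}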
