This is what my CSV data looks like:
Artistname;RecordTitle;RecordType;Year;SongTitle
999;Concrete;LP;1981;Mercy Mercy
999;Concrete;LP;1981;Public Enemy No.1
999;Concrete;LP;1981;So Greedy
999;Concrete;LP;1981;Taboo
10cc;Bloody Tourists;LP;1978;Dreadlock Holiday
10cc;Bloody Tourists;LP;1978;Everyhing You've Ever Wanted To Know About!!!
10cc;Bloody Tourists;LP;1978;Shock On The Tube
This is the code where I save this data to the database:
private void FillDatabase()
{
    var firstTime = true;
    var lines = File.ReadAllLines("musicDbData.csv");
    var list = new List<string>();
    foreach (var line in lines)
    {
        var split = line.Split(";");
        if (!firstTime)
        {
            var artist = new Artist()
            {
                ArtistName = split[0],
            };
            db.Artists.Add(artist);
            db.SaveChanges();
        }
        else
        {
            firstTime = false;
        }
    }
}
The problem is that every artist should be in the database only once. Right now artist 999 appears 4 times and 10cc appears 3 times; if everything were correct, there would be only one row for 999 and one row for 10cc. What do I have to add to my code to get the expected result?
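First, CSV stands for comma-separated values, but your file uses semicolons, so you have to split on ';'.
Also, String.Split has an overload that takes a char, so you can write line.Split(';'). (line.Split(";") only compiles on .NET Core 2.0 and later, which added string-separator overloads; on .NET Framework it is a compile error.)
Your CSV file also starts with a header line of column names, which you need to exclude when reading the file.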
if everything is correct there should only be one row for 999 and one row for 10cc
Do you only want to save the first row for 999 and the first row for 10cc to the database? If so, you can use LINQ to check whether the Artistname already exists in the database before adding it.
private void FillDatabase()
{
    var lines = File.ReadAllLines("musicDbData.csv");
    int count = 0; // line count
    foreach (var line in lines)
    {
        count++;
        if (count == 1) // skip the header line
            continue;
        var split = line.Split(';');
        string artistname = split[0];
        var artistIndb = db.ArtistTables
            .Where(c => c.Artistname == artistname)
            .SingleOrDefault();
        if (artistIndb == null) // check if exists, if not ...
        {
            var artist = new ArtistTable()
            {
                Artistname = split[0],
                SongTitle = split[4]
            };
            db.ArtistTables.Add(artist);
            // Save right away: the query above goes to the database,
            // so it only sees artists that have already been saved
            db.SaveChanges();
        }
    }
}
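If you only need each artist once, another option is to deduplicate in memory before touching the database, so you don't need one query per line. A minimal sketch, assuming the same semicolon-delimited layout with a header row; ArtistCsv and DistinctArtists are illustrative names, not part of the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArtistCsv
{
    // Returns each artist name once, in the order it first appears.
    // Assumes semicolon-delimited lines whose first line is a header row.
    public static List<string> DistinctArtists(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var line in lines.Skip(1)) // skip the header row
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // ignore blank lines
            var artist = line.Split(';')[0].Trim(); // first column = artist name
            if (seen.Add(artist)) // Add returns false for names already seen
                result.Add(artist);
        }
        return result;
    }
}
```

You could then pass File.ReadAllLines("musicDbData.csv") to DistinctArtists, add one Artist per returned name, and call db.SaveChanges() once after the loop.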
If you want to merge lines with the same Artistname instead, you can extend the check like this:
if (artistIndb == null)
{
    // code omitted
    // ...
}
else
{
    // Append this song to the SongTitle column of the existing row
    artistIndb.SongTitle += ", " + split[4];
    // Let errors surface instead of hiding them with an empty catch block
    db.SaveChanges();
}
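The merge can also be done in memory before anything is saved. A sketch under the same assumptions (semicolon-delimited, header row, song title in the 5th column); SongMerge and SongsByArtist are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SongMerge
{
    // Groups song titles (5th column) by artist (1st column) and joins them with ", ".
    // GroupBy keeps the songs of each artist in file order.
    public static Dictionary<string, string> SongsByArtist(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                   // skip the header row
            .Where(l => !string.IsNullOrWhiteSpace(l)) // ignore blank lines
            .Select(l => l.Split(';'))
            .Where(f => f.Length >= 5)                 // ignore malformed rows
            .GroupBy(f => f[0].Trim())
            .ToDictionary(
                g => g.Key,
                g => string.Join(", ", g.Select(f => f[4].Trim())));
    }
}
```

Each dictionary entry could then become one database row, with the key as Artistname and the value as SongTitle.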
I want to insert multiple "items" into a list using a foreach loop (looping over a list), and I want to insert each line as a <td> element. But when I specify the index at which to insert the line, the previously inserted line gets overwritten. How can I insert a line at a position and then add the rest after it without overwriting the previously added line?
private void Create_Driver_Report(string npcName)
{
    var fileName = Get_Path("Driver_Reports.html");
    var endTag = npcName;
    var lineToAdd = "<!--New Line Here-->";
    var htmlContent = File.ReadAllLines(fileName).ToList();
    var index = htmlContent.FindIndex(x => x.Contains(lineToAdd));
    htmlContent.Insert(index + 1, endTag);
    File.WriteAllLines("drivers.html", htmlContent);
}
How I want to do it, in theory:
foreach (Drivers item in drivers)
{
    Create_Driver_Report($"<td>{item.Driver_ID}</td>");
    Create_Driver_Report($"<td>{item.Driver_Name}</td>");
    Create_Driver_Report($"<td>{item.Vehicle_ID}</td>");
    Create_Driver_Report($"<td>{item.Company_ID}</td>");
    Create_Driver_Report($"<td>{item.Company_Name}</td>");
}
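Override the ToString() method or create a new one.
If you always want to insert the same properties, there is no need to call Create_Driver_Report over and over again. Each call re-reads Driver_Reports.html and rewrites drivers.html from scratch, which is why only the last inserted line survives.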
public override String ToString() {
    return
        $"<td>{this.Driver_ID}</td>\n" +
        $"<td>{this.Driver_Name}</td>\n" +
        $"<td>{this.Vehicle_ID}</td>\n" +
        $"<td>{this.Company_ID}</td>\n" +
        $"<td>{this.Company_Name}</td>";
}
and you can invoke it like this:
foreach (Drivers item in drivers) {
    Create_Driver_Report(item.ToString());
}
Edit:
Option 1:
Use LINQ Select() and List.InsertRange()
public String ToHtmlRow() {
    return
        $"<tr><td>{this.Driver_ID}</td><td>{this.Driver_Name}</td><td>{this.Vehicle_ID}</td><td>{this.Company_ID}</td><td>{this.Company_Name}</td></tr>";
}

// At the call site:
IEnumerable<string> lines = drivers.Select(driver => driver.ToHtmlRow());
Create_Driver_Report(lines);

// Instance method, like the original, so it can call Get_Path
private void Create_Driver_Report(IEnumerable<string> lines) {
    var fileName = Get_Path("Driver_Reports.html");
    var lineToAdd = "<!--New Line Here-->";
    var htmlContent = File.ReadAllLines(fileName).ToList();
    var index = htmlContent.FindIndex(x => x.Contains(lineToAdd));
    // InsertRange adds all rows, in order, right after the marker line
    htmlContent.InsertRange(index + 1, lines);
    File.WriteAllLines("drivers.html", htmlContent);
}
Option 2:
You just add all of the rows you want and call Create_Driver_Report only once.
List<String> toAdd = new List<String>();
foreach (Drivers item in drivers) {
    toAdd.Add(item.ToHtmlRow());
}
Create_Driver_Report(String.Join("\n", toAdd));
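To see why InsertRange is the right tool, here is a minimal sketch that works on an in-memory list instead of files (HtmlReport and InsertAfterMarker are illustrative names). Calling Insert(index + 1, ...) once per row in a loop would place each new row above the previous one and so reverse their order; InsertRange inserts the whole batch in order. The sketch also refuses to insert when the marker is missing, whereas FindIndex returning -1 in the code above would silently insert at the top of the file.

```csharp
using System;
using System.Collections.Generic;

public static class HtmlReport
{
    // Inserts all rows directly after the first line containing the marker,
    // preserving their order. Returns false (and changes nothing) if the marker is missing.
    public static bool InsertAfterMarker(List<string> html, string marker, IEnumerable<string> rows)
    {
        int index = html.FindIndex(x => x.Contains(marker));
        if (index < 0)
            return false; // marker not found: leave the document unchanged
        html.InsertRange(index + 1, rows);
        return true;
    }
}
```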
I have a .txt file that has headers followed by 3 columns of values, e.g.:
Description=null
area = 100
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list)
Then another segment
Description=null
area = 10
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list).
In fact, I just need one list per "table" of values. The values are always in 3 columns, but there are n segments. Any ideas?
Thanks!
List<double> VMM40xyz = new List<double>();
foreach (var item in VMM40blocklines)
{
    if (item.Contains(','))
    {
        VMM40xyz.AddRange(item.Split(',').Select(double.Parse).ToList());
    }
}
I tried this, but it puts all of the values into one big list.
It looks like you want your data to end up in a format like this:
public class SetOfData //Feel free to name these parts better.
{
    public string Description = "";
    public string Area = "";
    public List<double> Data = new List<double>();
}
...stored somewhere in...
List<SetOfData> finalData = new List<SetOfData>();
So, here's how I'd read that in:
public static List<SetOfData> ReadCustomFile(string Filename)
{
    if (!File.Exists(Filename))
    {
        throw new FileNotFoundException($"{Filename} does not exist.");
    }
    List<SetOfData> returnData = new List<SetOfData>();
    SetOfData currentDataSet = null;
    using (FileStream fs = new FileStream(Filename, FileMode.Open))
    {
        using (StreamReader reader = new StreamReader(fs))
        {
            while (!reader.EndOfStream)
            {
                string line = reader.ReadLine();
                //This will start a new object on every 'Description' line.
                if (line.Contains("Description="))
                {
                    //Save off the old data set if there is one.
                    if (currentDataSet != null)
                        returnData.Add(currentDataSet);
                    currentDataSet = new SetOfData();
                    //Now, to make sure there is something after "Description=" and to set the Description if there is.
                    //Your example data used "null" here, which this will take literally to be a string containing the letters "null". You can check the contents of parts[1] inside the if block to change this.
                    string[] parts = line.Split('=');
                    if (parts.Length > 1)
                        currentDataSet.Description = parts[1].Trim();
                }
                else if (line.Contains("area = "))
                {
                    //Just in case your file didn't start with a "Description" line for some reason.
                    if (currentDataSet == null)
                        currentDataSet = new SetOfData();
                    //And then we do some string splitting like we did for Description.
                    string[] parts = line.Split('=');
                    if (parts.Length > 1)
                        currentDataSet.Area = parts[1].Trim();
                }
                else
                {
                    //Just in case your file didn't start with a "Description" line for some reason.
                    if (currentDataSet == null)
                        currentDataSet = new SetOfData();
                    string[] parts = line.Split(',');
                    foreach (string part in parts)
                    {
                        if (double.TryParse(part, out double number))
                        {
                            currentDataSet.Data.Add(number);
                        }
                    }
                }
            }
            //Make sure to add the last set (an empty file yields no set, so don't add null).
            if (currentDataSet != null)
                returnData.Add(currentDataSet);
        }
    }
    return returnData;
}
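If you want each table as rows of three numbers rather than one flat list, a sketch along these lines may help. It assumes each segment begins with a "Description=" line and that data rows are exactly three comma-separated numbers; SegmentParser is a hypothetical name. It also parses with CultureInfo.InvariantCulture, because double.Parse and double.TryParse otherwise use the current culture, which on machines that use ',' as the decimal separator would misread values such as "1.5".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SegmentParser
{
    // Returns one list of (x, y, z) rows per segment.
    // Assumption: each segment starts with a "Description=" line; any line that is not
    // exactly three parseable comma-separated numbers (e.g. "area = 100") is ignored.
    public static List<List<double[]>> Parse(IEnumerable<string> lines)
    {
        var segments = new List<List<double[]>>();
        List<double[]> current = null;
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.StartsWith("Description=", StringComparison.OrdinalIgnoreCase))
            {
                current = new List<double[]>(); // start a new table
                segments.Add(current);
                continue;
            }
            var parts = line.Split(',');
            if (parts.Length != 3)
                continue; // header such as "area = 100", or a blank line
            var row = new double[3];
            bool ok = true;
            for (int i = 0; i < 3 && ok; i++)
                ok = double.TryParse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture, out row[i]);
            if (!ok)
                continue;
            if (current == null) // data before any Description line
            {
                current = new List<double[]>();
                segments.Add(current);
            }
            current.Add(row);
        }
        return segments;
    }
}
```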
I have two files, Example1.csv and Example2.csv. Note that they are not comma-separated; they are just saved with the .csv extension.
Example1.csv has one column, which contains only email addresses.
Example2.csv has many columns, one of which is the email column from Example1.csv.
Example1.csv:
emails
abc@gmail.com
jhg@yahoo.com
...
...
Example2.csv:
Column1 column2 Column3 column4 emails
1 45 456 123 abc@gmail.com
2 89 898 254 jhg@yahoo.com
3 85 365 789 ...
Now I need to delete the rows in Example2.csv that match the data in Example1.csv; for example, rows 1 and 2 should be removed because both of their emails match.
string[] lines = File.ReadAllLines(@"C:\example2.csv");
var emails = File.ReadAllLines(@"C:\example1.csv");
List<string> linesToWrite = new List<string>();
foreach (string s in lines)
{
    String[] split = s.Split(' ');
    if (s.Contains(emails))
        linesToWrite.Remove(s);
}
File.WriteAllLines("file3.csv", linesToWrite);
This should work:
var emails = new HashSet<string>(File.ReadAllLines(@"C:\example1.csv").Skip(1));
File.WriteAllLines("file3.csv", File.ReadAllLines(@"C:\example2.csv").Where(line => !emails.Contains(line.Split(' ')[4])));
It reads all of the first file, puts the emails into a HashSet where lookup is fast, then goes through all lines of the second file and writes to disk only those whose 5th column doesn't match any of the emails. You may want to expand on several parts; for example, there is little to no error handling, and a line with fewer than five space-separated fields will throw. It also compares emails case-sensitively, although in practice email addresses are usually treated as case-insensitive.
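If you want the case-insensitive comparison mentioned above, HashSet accepts a comparer. A minimal in-memory sketch, assuming whitespace-separated columns with the email in the 5th column and a one-line header in Example1.csv; EmailFilter and RemoveMatchingRows are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmailFilter
{
    // Keeps the lines of example2 whose 5th whitespace-separated field is NOT listed in example1.
    // The first line of example1 ("emails") is treated as a header and skipped.
    public static List<string> RemoveMatchingRows(IEnumerable<string> example1Lines, IEnumerable<string> example2Lines)
    {
        var emails = new HashSet<string>(
            example1Lines.Skip(1).Select(e => e.Trim()).Where(e => e.Length > 0),
            StringComparer.OrdinalIgnoreCase); // case-insensitive lookup

        return example2Lines
            .Where(line =>
            {
                // A null separator splits on any whitespace; runs of spaces count as one.
                var fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                // Keep short or blank lines instead of throwing on fields[4].
                return fields.Length < 5 || !emails.Contains(fields[4]);
            })
            .ToList();
    }
}
```

With the files, it could be called as File.WriteAllLines("file3.csv", EmailFilter.RemoveMatchingRows(File.ReadLines(@"C:\example1.csv"), File.ReadLines(@"C:\example2.csv"))).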
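The variable emails is not a string but a string array, just like lines, because you read it the same way (File.ReadAllLines).
Also this line
if (s.Contains(emails))
is not correct: you are trying to check whether a string contains an array. If you need to check whether a line contains an email from the list, this is better:
if (split.Intersect(emails).Any())
Note too that linesToWrite starts out empty, so calling Remove on it does nothing; add the lines you want to keep instead. So, here is the final code:
var lines = File.ReadAllLines(@"C:\example2.csv");
// Skip(1) drops the "emails" header, so the header row of example2 is kept
var emails = File.ReadAllLines(@"C:\example1.csv").Skip(1).ToArray();
var linesToWrite = new List<string>();
foreach (var s in lines)
{
    // The files are space-separated, not comma-separated
    var split = s.Split(' ');
    // Keep only the lines that contain none of the emails
    if (!split.Intersect(emails).Any())
    {
        linesToWrite.Add(s);
    }
}
File.WriteAllLines("file3.csv", linesToWrite);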
static void Main(string[] args)
{
    var Example1CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example1.csv";
    var Example2CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example2.csv";
    var Example3CsvPath = @"C:\Inetpub\Poligon\Poligon\Resources\Example3.csv";
    var EmailsToDelete = new List<string>();
    var Result = new List<string>();
    foreach (var Line in System.IO.File.ReadAllLines(Example1CsvPath))
    {
        // Only lines containing '@' are emails; this also skips the "emails" header
        if (!string.IsNullOrWhiteSpace(Line) && Line.IndexOf('@') > -1)
        {
            EmailsToDelete.Add(Line.Trim());
        }
    }
    foreach (var Line in System.IO.File.ReadAllLines(Example2CsvPath))
    {
        if (!string.IsNullOrWhiteSpace(Line))
        {
            var Values = Line.Split(' ');
            // Keep the line unless its 5th column is one of the emails to delete
            if (Values.Length < 5 || !EmailsToDelete.Contains(Values[4]))
            {
                Result.Add(Line);
            }
        }
    }
    System.IO.File.WriteAllLines(Example3CsvPath, Result);
}
I know this question is four years old, but I got some ideas from it and I'd like to share my solution.
My case is a simple CSV with a maximum of about 20 lines (really, that's the maximum), so I decided to keep it basic and not use a database for this.
My solution rescans the CSV, saving every row except the one I want to delete into a list; after the scan, it writes the list back to the CSV, which removes the row matching the values I passed in (textBox1 or textBox2).
List<string> keptLines = new();
try {
    using (var reader = new StreamReader($"{Main.directory}\\bin\\ip.csv")) {
        while (!reader.EndOfStream) {
            var line = reader.ReadLine();
            var values = line.Split(',');
            // Skip the row to delete
            if (values[0] == textBox1.Text || values[1] == textBox2.Text)
                continue;
            keptLines.Add($"{values[0]},{values[1]},{values[2]},");
        }
    }
    // The reader is closed at this point, so the file can be overwritten
    File.WriteAllLines($"{Main.directory}\\bin\\ip.csv", keptLines);
} catch (Exception f) {
    MessageBox.Show(f.Message);
}
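The same read-filter-rewrite idea can be wrapped in a small helper. A sketch, assuming comma-separated lines and a file small enough to hold in memory (as with the ~20-line file above); CsvRows.RemoveRows and its predicate parameter are hypothetical:

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvRows
{
    // Rewrites the file at `path`, dropping every line whose comma-separated fields
    // satisfy `shouldRemove`. ReadAllLines closes the file before it is rewritten.
    // Returns the number of removed lines.
    public static int RemoveRows(string path, Func<string[], bool> shouldRemove)
    {
        var lines = File.ReadAllLines(path);
        var kept = lines.Where(line => !shouldRemove(line.Split(','))).ToArray();
        File.WriteAllLines(path, kept);
        return lines.Length - kept.Length;
    }
}
```

For the text-box case, the predicate could be v => v[0] == textBox1.Text || (v.Length > 1 && v[1] == textBox2.Text).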
I am trying to compare the value at index 0 of the array for one line with the value at index 0 for the following line. Imagine a CSV where I have an identifier in the first column and a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] on the next line (or the previous one, if that's easier) and, if they are the same (as they are in the example), concatenate the other line's value onto index 1. That is, the data should read as:
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed on to the next function. So far I have:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null)
            {
                break;
            }
            contact.ContactId = parts[0];
            long nextLine;
            nextLine = parser.LineNumber+1;
            //if line1 parts[0] == line2 parts[0] etc.
        }
    }
}
Does anyone have any suggestions? Thank you.
How about saving the previous line's array in a variable:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        string[] oldParts = new string[] { string.Empty };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length < 1)
            {
                break;
            }
            if (oldParts[0] == parts[0])
            {
                // concat logic goes here
            }
            else
            {
                contact.ContactId = parts[0];
            }
            long nextLine;
            nextLine = parser.LineNumber+1;
            oldParts = parts;
            //if line1 parts[0] == line2 parts[0] etc.
        }
    }
}
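Filling in the "concat logic" part, here is a self-contained sketch of the compare-with-the-previous-line idea. It assumes rows with the same identifier are adjacent (as in the example) and uses string.Split instead of TextFieldParser so it can run on in-memory lines; ConsecutiveMerge is an illustrative name:

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveMerge
{
    // Merges runs of consecutive lines that share the same first field:
    // "USER1, 1P" followed by "USER1, 3G" becomes "USER1, 1P, 3G".
    public static List<string> Merge(IEnumerable<string> lines)
    {
        var result = new List<string>();
        string currentKey = null;
        List<string> currentValues = null;
        foreach (var line in lines)
        {
            var parts = line.Split(',');
            if (parts.Length < 2)
                continue; // skip malformed lines
            var key = parts[0].Trim();
            if (key != currentKey)
            {
                // The key changed: emit the previous run before starting a new one
                if (currentKey != null)
                    result.Add(currentKey + ", " + string.Join(", ", currentValues));
                currentKey = key;
                currentValues = new List<string>();
            }
            for (int i = 1; i < parts.Length; i++)
                currentValues.Add(parts[i].Trim());
        }
        if (currentKey != null) // flush the last run
            result.Add(currentKey + ", " + string.Join(", ", currentValues));
        return result;
    }
}
```

Unlike a group by, this only merges adjacent rows, so a USER1 row appearing later in the file would start a new entry.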
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this is a group by using LINQ:
var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim so " 1P" becomes "1P" and the output matches "USER1, 1P, 3G"
    let user = new { Name = elements[0].Trim(), Value = elements[1].Trim() }
    group user by user.Name into users
    select users;
foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));
    Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can either iterate over the lines and parse each one into an object yourself, or use CsvHelper, https://www.nuget.org/packages/CsvHelper/, a great tool that knows how to properly parse CSV files into a collection of objects. Either way, you can then use LINQ's GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, on your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, to combine the other property into a string. Finally, use the new, grouped collection for your end goal (write it to a file or use it for whatever you need).
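A sketch of that GroupBy/Aggregate approach without CsvHelper, parsing the lines by hand into a small class. ContactRow and ContactGrouping are hypothetical names, and the class stands in for whatever type CsvHelper would map the rows onto:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: one object per CSV line.
public class ContactRow
{
    public string UserId { get; set; }
    public string Value { get; set; }
}

public static class ContactGrouping
{
    // Parses "USER1, 1P" style lines into objects, skipping lines with fewer than 2 fields.
    public static List<ContactRow> Parse(IEnumerable<string> lines) =>
        lines.Select(l => l.Split(','))
             .Where(p => p.Length >= 2)
             .Select(p => new ContactRow { UserId = p[0].Trim(), Value = p[1].Trim() })
             .ToList();

    // GroupBy the key, then Aggregate the other property into one ", "-separated string.
    // Groups are never empty, so Aggregate without a seed is safe here.
    public static Dictionary<string, string> Group(IEnumerable<ContactRow> rows) =>
        rows.GroupBy(r => r.UserId)
            .ToDictionary(
                g => g.Key,
                g => g.Select(r => r.Value).Aggregate((acc, next) => acc + ", " + next));
}
```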
You're basically finding all the unique entries, so put them into a dictionary with the contact ID as the key, as follows:
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length != 2)
            {
                break;
            }
            //if contact id not present in dictionary add
            if (!uniqueContacts.ContainsKey(parts[0]))
                uniqueContacts.Add(parts[0], new List<string>());
            //now there's definitely an existing contact in dic (the one
            //we've just added or a previously added one) so add to the
            //list of strings for that contact
            uniqueContacts[parts[0]].Add(parts[1]);
        }
        //now do something with that dictionary of unique user names and
        //lists of strings, for example dump them to console in the
        //format you specify:
        foreach (var contactId in uniqueContacts.Keys)
        {
            // string.Join avoids a trailing ", " and, unlike comparing each value
            // with Last(), still works when a contact has the same value twice
            Console.WriteLine($"{contactId}, " + string.Join(", ", uniqueContacts[contactId]));
        }
    }
}
I can currently parse and extract data from a large tab-delimited file. I read, parse and extract it line by line and add the split items to my DataTable (with a row limit, adding 3 rows at a time). I need to skip the even lines, i.e. read the first line that has the maximum number of tab-delimited fields, skip the 2nd one, and read the 3rd one directly.
My tab-delimited source file format:
001Mean 26.975 1.1403 910.45
001Stdev 26.975 1.1403 910.45
002Mean 26.975 1.1403 910.45
002Stdev 26.975 1.1403 910.45
I need to skip (or avoid reading) the Stdev lines.
C# code:
Getting the maximum number of items on any tab-delimited line of the file by splitting each line:
using (var reader = new StreamReader(sourceFileFullName))
{
    string line = null;
    line = reader.ReadToEnd();
    if (!string.IsNullOrEmpty(line))
    {
        var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
        foreach (var value in list_with_max_cols)
        {
            var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
            MAX_NO_OF_COLUMNS = values.Length;
        }
    }
}
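ReadToEnd loads the entire file into one string before splitting it, which can be expensive for a large file. A streaming sketch of the same maximum-column count, assuming you pass it File.ReadLines(sourceFileFullName), which reads lazily line by line; TabColumns is an illustrative name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TabColumns
{
    // Returns the largest number of tab-separated fields on any line (0 if there are no lines).
    public static int MaxColumns(IEnumerable<string> lines) =>
        lines.Select(l => l.Split('\t').Length)
             .DefaultIfEmpty(0)
             .Max();
}
```

Usage: MAX_NO_OF_COLUMNS = TabColumns.MaxColumns(File.ReadLines(sourceFileFullName));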
Reading the file line by line; the first line with the maximum number of tab-delimited fields is taken as the header line, and the following lines are parsed and extracted:
using (var reader = new StreamReader(sourceFileFullName))
{
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;
        //The first matching line is the column list; the DataTable is created from it.
        if (firstLineOfFile)
        {
            columnData = new_read_line;
            firstLineOfFile = false;
            continue;
        }
        if (firstLineOfChunk)
        {
            firstLineOfChunk = false;
            chunkDataTable = CreateEmptyDataTable(columnData);
        }
        AddRow(chunkDataTable, new_read_line);
        chunkRowCount++;
        if (chunkRowCount == _chunkRowLimit)
        {
            firstLineOfChunk = true;
            chunkRowCount = 0;
            yield return chunkDataTable;
            chunkDataTable = null;
        }
    }
}
Creating the DataTable:
private DataTable CreateEmptyDataTable(string firstLine)
{
    IList<string> columnList = Split(firstLine);
    var dataTable = new DataTable("TableName");
    for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
    {
        string c_string = columnList[columnIndex];
        if (Regex.Match(c_string, "\\s").Success)
        {
            string tmp = Regex.Replace(c_string, "\\s", "");
            string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // To strip strings inside [] and inclusive [] alone
            columnList[columnIndex] = finaltmp;
        }
    }
    dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
    dataTable.Columns.Add("ID");
    return dataTable;
}
How can I skip lines by reading alternate ones, split them, and then add them to my DataTable?
AddRow function: I managed to meet my requirement with the following change:
private void AddRow(DataTable dataTable, string line)
{
    if (line.Contains("Stdev"))
    {
        return;
    }
    else
    {
        //Rest of Code
    }
}
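Filtering inside AddRow works, but chunkRowCount++ in the reading loop still runs for every ignored Stdev row, so a chunk can end up with fewer than _chunkRowLimit rows. The loop shown also only yields a chunk once it is full, so unless there is code after it that isn't shown, a trailing partial chunk would be dropped. A self-contained sketch of the same chunking pattern that skips Stdev rows before counting them and yields the last partial chunk; it assumes the first line is the header, and TabChunks and ReadChunks are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class TabChunks
{
    // Yields data rows in chunks of `chunkSize`, skipping the header line, blank lines,
    // and every row whose first field ends with "Stdev".
    // Note: as with any iterator, the argument check runs on first enumeration.
    public static IEnumerable<List<string[]>> ReadChunks(IEnumerable<string> lines, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var chunk = new List<string[]>(chunkSize);
        bool isHeader = true;
        foreach (var line in lines)
        {
            if (isHeader) { isHeader = false; continue; }
            if (string.IsNullOrWhiteSpace(line)) continue;
            var fields = line.Split('\t');
            if (fields[0].EndsWith("Stdev", StringComparison.Ordinal))
                continue; // filtered rows never count toward the chunk size
            chunk.Add(fields);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<string[]>(chunkSize); // fresh list; the yielded one is not reused
            }
        }
        if (chunk.Count > 0)
            yield return chunk; // last, possibly partial, chunk
    }
}
```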
Considering you have tab-separated values on each line, how about reading only the odd lines and splitting them into arrays? This is just a sample; you can expand on it.
Test data (file.txt)
luck is when opportunity meets preparation
this line needs to be skipped
microsoft visual studio
another line to be skipped
let us all code
Code
// index is zero-based, so index % 2 == 0 selects the 1st, 3rd, 5th, ... lines
var oddLines = File.ReadLines(@"C:\projects\file.txt").Where((item, index) => index % 2 == 0);
foreach (var line in oddLines)
{
    var words = line.Split('\t');
}
EDIT
To get the lines that don't contain 'Stdev':
var filteredLines = System.IO.File.ReadLines(@"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));
Change
using (var reader = new StreamReader(sourceFileFullName))
{
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;
To
using (var reader = new StreamReader(sourceFileFullName))
{
    int cnt = 0;
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        cnt++;
        // cnt is 1-based, so even values are the 2nd, 4th, ... lines
        if (cnt % 2 == 0)
            continue;
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;