I have a .txt file that has header lines followed by three columns of values, e.g.:
Description=null
area = 100
1,2,3
1,2,4
2,1,5 ...
... 1,2,1   (these are the values that I need in one list)
Then another segment:
Description=null
area = 10
1,2,3
1,2,4
2,1,5 ...
... 1,2,1   (these are the values that I need in one list)
In short, I need one list per "table" of values. The values are always in three columns, but there can be any number (n) of segments. Any ideas?
Thanks!
List<double> VMM40xyz = new List<double>();
foreach (var item in VMM40blocklines)
{
if (item.Contains(','))
{
VMM40xyz.AddRange(item.Split(',').Select(double.Parse).ToList());
}
}
I tried this, but it puts all the values into one big list.
It looks like you want your data to end up in a format like this:
public class SetOfData //Feel free to name these parts better.
{
public string Description = "";
public string Area = "";
public List<double> Data = new List<double>();
}
...stored somewhere in...
List<SetOfData> finalData = new List<SetOfData>();
So, here's how I'd read that in:
public static List<SetOfData> ReadCustomFile(string Filename)
{
if (!File.Exists(Filename))
{
throw new FileNotFoundException($"{Filename} does not exist.");
}
List<SetOfData> returnData = new List<SetOfData>();
SetOfData currentDataSet = null;
using (FileStream fs = new FileStream(Filename, FileMode.Open))
{
using (StreamReader reader = new StreamReader(fs))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
//This will start a new object on every 'Description' line.
if (line.Contains("Description="))
{
//Save off the old data set if there is one.
if (currentDataSet != null)
returnData.Add(currentDataSet);
currentDataSet = new SetOfData();
//Now, to make sure there is something after "Description=" and to set the Description if there is.
//Your example data used "null" here, which this will take literally to be a string containing the letters "null". You can check the contents of parts[1] inside the if block to change this.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Description = parts[1].Trim();
}
else if (line.Contains("area = "))
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
//And then we do some string splitting like we did for Description.
string[] parts = line.Split('=');
if (parts.Length > 1)
currentDataSet.Area = parts[1].Trim();
}
else
{
//Just in case your file didn't start with a "Description" line for some reason.
if (currentDataSet == null)
currentDataSet = new SetOfData();
string[] parts = line.Split(',');
foreach (string part in parts)
{
if (double.TryParse(part, out double number))
{
currentDataSet.Data.Add(number);
}
}
}
}
//Make sure to add the last set (if the file had any content at all).
if (currentDataSet != null)
returnData.Add(currentDataSet);
}
}
return returnData;
}
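A variant of the same idea can be sketched without the file handling, so it is easy to try out. It takes the lines as any sequence (for example File.ReadLines(path)), opens a new list whenever a "Description=" line appears, and parses numbers with CultureInfo.InvariantCulture. The name ReadSegments and the rule that "area" lines are headers to skip are assumptions based on the sample data, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Sketch: one list of numbers per segment.
// Assumptions: each segment starts with "Description=", "area" lines are headers
// to skip, and every other line holds comma-separated numbers.
List<List<double>> ReadSegments(IEnumerable<string> lines)
{
    var segments = new List<List<double>>();
    List<double>? current = null;
    foreach (string raw in lines)
    {
        string line = raw.Trim();
        if (line.StartsWith("Description=", StringComparison.OrdinalIgnoreCase))
        {
            // A new header starts a new segment; it is added to the result right away.
            current = new List<double>();
            segments.Add(current);
            continue;
        }
        if (line.Length == 0 || line.StartsWith("area", StringComparison.OrdinalIgnoreCase))
            continue;
        if (current == null)
        {
            // Data before the first header still gets a segment of its own.
            current = new List<double>();
            segments.Add(current);
        }
        foreach (string part in line.Split(','))
        {
            // InvariantCulture makes "1.5" parse the same way on every machine.
            if (double.TryParse(part.Trim(), NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
                current.Add(value);
        }
    }
    return segments;
}

string[] sample =
{
    "Description=null", "area = 100", "1,2,3", "1,2,4",
    "Description=null", "area = 10", "2,1,5", "1,2,1",
};
var result = ReadSegments(sample);
Console.WriteLine(result.Count);
Console.WriteLine(string.Join(",", result[1]));
```

Because each list is added to the result as soon as its header is seen, there is no separate "add the last set" step after the loop.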
I have to import two CSV files.
CSV 1 [49]: about 50 tab-separated columns.
CSV 2 [2]: three columns, which should replace columns [3], [6] and [11] of my first CSV.
So here's what I do:
1) Import the first CSV and split it into an array.
string employeedatabase = "MYPATH";
List<String> status = new List<String>();
StreamReader file2 = new System.IO.StreamReader(filename);
string line = file2.ReadLine();
while ((line = file2.ReadLine()) != null)
{
string[] ud = line.Split('\t');
status.Add(ud[0]);
}
String[] ud_status = status.ToArray();
PROBLEM 1: I have about 50 columns to handle, and ud_status is just the first one. Do I need 50 lists and 50 string arrays?
2) Import the second CSV and split it into arrays.
List<String> vorname = new List<String>();
List<String> nachname = new List<String>();
List<String> username = new List<String>();
StreamReader file = new System.IO.StreamReader(employeedatabase);
string line3 = file.ReadLine();
while ((line3 = file.ReadLine()) != null)
{
string[] data = line3.Split(';');
vorname.Add(data[0]);
nachname.Add(data[1]);
username.Add(data[2]);
}
String[] db_vorname = vorname.ToArray();
String[] db_nachname = nachname.ToArray();
String[] db_username = username.ToArray();
PROBLEM 2: After loading these two CSVs, I don't know how to combine them and replace the columns as mentioned above.
Something like this?
mynewArray = ud_status + "\t" + ud_xy[..n] + "\t" + changed_colum + ud_xy[..n];
Then save mynewArray as a tab-separated CSV with UTF-8 encoding.
To read the file into a meaningful format, you should set up a class that defines the format of your CSV:
public class CsvRow
{
public string vorname { get; set; }
public string nachname { get; set; }
public string username { get; set; }
public CsvRow (string[] data)
{
vorname = data[0];
nachname = data[1];
username = data[2];
}
}
Then populate a list of this:
List<CsvRow> rows = new List<CsvRow>();
using (StreamReader file = new System.IO.StreamReader(employeedatabase))
{
string line3 = file.ReadLine(); // reads (and discards) the header line
while ((line3 = file.ReadLine()) != null)
{
rows.Add(new CsvRow(line3.Split(';')));
}
}
Define your other CSV the same way, including unused properties for the new fields. Once you have loaded both, you can populate the new properties from this list in a loop, matching the records on whatever common field the CSVs hopefully share. Finally, write the resulting data out to a new CSV file.
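The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the answer's own code: MergeByKey is a hypothetical helper, and it assumes both files have a header line, the first file is tab-separated, the second is semicolon-separated (as in the question), and the caller knows which column holds the shared key.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Sketch: overwrite selected columns of a tab-separated file with values from a
// semicolon-separated file, matching rows on a shared key column.
// Assumptions (hypothetical): both inputs have a header line and no quoted fields.
List<string> MergeByKey(
    IEnumerable<string> mainLines, int mainKey,
    IEnumerable<string> lookupLines, int lookupKey,
    int[] targetColumns, int[] sourceColumns)
{
    // Index the lookup file by its key column; header skipped, first row per key wins.
    var lookup = new Dictionary<string, string[]>();
    foreach (string line in lookupLines.Skip(1))
    {
        string[] fields = line.Split(';');
        lookup.TryAdd(fields[lookupKey], fields);
    }

    var output = new List<string>();
    bool isHeader = true;
    foreach (string line in mainLines)
    {
        string[] fields = line.Split('\t');
        // The header and rows without a match are copied unchanged.
        if (!isHeader && lookup.TryGetValue(fields[mainKey], out var match))
        {
            for (int i = 0; i < targetColumns.Length; i++)
                fields[targetColumns[i]] = match[sourceColumns[i]];
        }
        isHeader = false;
        output.Add(string.Join("\t", fields));
    }
    return output;
}

string[] main = { "id\ta\tb\tc", "u1\tx\ty\tz", "u2\tp\tq\tr" };
string[] lookupFile = { "user;first;last", "u2;Anna;Berg" };
List<string> merged = MergeByKey(main, 0, lookupFile, 0, new[] { 1, 3 }, new[] { 1, 2 });

// Tab-separated output, written as UTF-8 without a byte-order mark.
string outPath = Path.Combine(Path.GetTempPath(), "merged.tsv");
File.WriteAllLines(outPath, merged, new UTF8Encoding(false));
```

In the question's terms, targetColumns would be { 3, 6, 11 }; sourceColumns and the key indexes depend on which field the two files actually share.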
The solution is not to use string arrays for this; that will just drive you crazy. It's better to use the System.Data.DataTable object.
I didn't get a chance to test the LINQ lambda expression at the end of this (or really any of it, I wrote this on a break), but it should get you on the right track.
using System;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;

using (var ds = new DataSet("My Data"))
{
    ds.Tables.Add("File0");
    ds.Tables.Add("File1");
    string rawLine;
    string[] line;
    using (var reader = new StreamReader("FirstFile"))
    {
        //first we get columns for table 0 from the header line
        foreach (string s in reader.ReadLine().Split('\t'))
            ds.Tables["File0"].Columns.Add(s);
        //ReadLine() returns null at the end of the file, so test it before splitting
        while ((rawLine = reader.ReadLine()) != null)
        {
            //and now the rest of the data.
            line = rawLine.Split('\t');
            var r = ds.Tables["File0"].NewRow();
            //< (not <=), so we never index past the end of the array or the columns
            for (int i = 0; i < line.Length && i < ds.Tables["File0"].Columns.Count; i++)
            {
                r[i] = line[i];
            }
            ds.Tables["File0"].Rows.Add(r);
        }
    }
    //we could probably do these in a loop or a second method,
    //but you may want subtle differences, so for now we just do it the same way
    //for file1 (which, per the question, is semicolon-separated)
    using (var reader2 = new StreamReader("SecondFile"))
    {
        foreach (string s in reader2.ReadLine().Split(';'))
            ds.Tables["File1"].Columns.Add(s);
        while ((rawLine = reader2.ReadLine()) != null)
        {
            //and now the rest of the data.
            line = rawLine.Split(';');
            var r = ds.Tables["File1"].NewRow();
            for (int i = 0; i < line.Length && i < ds.Tables["File1"].Columns.Count; i++)
            {
                r[i] = line[i];
            }
            ds.Tables["File1"].Rows.Add(r);
        }
    }
    //you now have these in functioning datatables. Because we named columns,
    //you can call them by name specifically, or by index, to replace in the first datatable.
    string[] columnsToReplace = new string[] { "firstColumnName", "SecondColumnName", "ThirdColumnName" };
    //you didn't give a sign of any relation between the two tables
    //so this is just by row, and assumes the row count is equivalent.
    //This is also not advised.
    //if there is a key these sets of data share
    //you should join on them instead.
    for (int i = 0; i < ds.Tables[0].Rows.Count && i < ds.Tables[1].Rows.Count; i++)
    {
        DataRow dr = ds.Tables[0].Rows[i];
        dr[3] = ds.Tables[1].Rows[i][columnsToReplace[0]];
        dr[6] = ds.Tables[1].Rows[i][columnsToReplace[1]];
        dr[11] = ds.Tables[1].Rows[i][columnsToReplace[2]];
    }
    //ds.Tables[0] now has the output you want.
    var output = new StringBuilder();
    //header line: the column names, tab-separated
    output.AppendLine(string.Join("\t", ds.Tables[0].Columns.Cast<DataColumn>().Select(c => c.ColumnName)));
    //then one tab-separated line per row
    foreach (DataRow r in ds.Tables[0].Rows)
        output.AppendLine(string.Join("\t", r.ItemArray));
    //create or overwrite the file as UTF-8, as the question asks
    File.WriteAllText("MYPATH", output.ToString(), Encoding.UTF8); //or a variable instead of string literal
}
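The comment about doing this "in a loop or a second method" could look something like the sketch below. LoadTable is a hypothetical helper; it assumes simple delimited text with a header line of unique column names (DataColumnCollection.Add throws on duplicates) and no quoted fields. Small in-memory readers stand in for the two files.

```csharp
using System;
using System.Data;
using System.IO;

// Sketch of a reusable loader: delimited text with a header line -> DataTable.
// Assumptions: unique header names, no quoted fields, so a plain Split is safe.
DataTable LoadTable(TextReader reader, string tableName, char delimiter)
{
    var table = new DataTable(tableName);
    string? header = reader.ReadLine();
    if (header == null)
        return table; // empty input: no columns, no rows

    foreach (string name in header.Split(delimiter))
        table.Columns.Add(name);

    string? line;
    while ((line = reader.ReadLine()) != null) // test for null before splitting
    {
        string[] fields = line.Split(delimiter);
        DataRow row = table.NewRow();
        // Stop at whichever runs out first, so ragged lines don't throw.
        for (int i = 0; i < fields.Length && i < table.Columns.Count; i++)
            row[i] = fields[i];
        table.Rows.Add(row);
    }
    return table;
}

DataTable file0 = LoadTable(new StringReader("Id\tName\tCity\n1\tTom\tNew York\n2\tMark\tFairFax"), "File0", '\t');
DataTable file1 = LoadTable(new StringReader("Id;City\n1;Las Vegas\n2;Dallas"), "File1", ';');

// Row-by-row replacement by column name (assumes both tables line up row for row).
for (int rowIndex = 0; rowIndex < file0.Rows.Count && rowIndex < file1.Rows.Count; rowIndex++)
    file0.Rows[rowIndex]["City"] = file1.Rows[rowIndex]["City"];

Console.WriteLine(file0.Rows[0]["City"]);
```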
With Cinchoo ETL, an open-source file helper library, you can merge the CSV files as below. This assumes both CSV files contain the same number of lines.
string CSV1 = @"Id Name City
1 Tom New York
2 Mark FairFax";
string CSV2 = @"Id City
1 Las Vegas
2 Dallas";
dynamic rec1 = null;
dynamic rec2 = null;
StringBuilder csv3 = new StringBuilder();
using (var csvOut = new ChoCSVWriter(new StringWriter(csv3))
.WithFirstLineHeader()
.WithDelimiter("\t")
)
{
using (var csv1 = new ChoCSVReader(new StringReader(CSV1))
.WithFirstLineHeader()
.WithDelimiter("\t")
)
{
using (var csv2 = new ChoCSVReader(new StringReader(CSV2))
.WithFirstLineHeader()
.WithDelimiter("\t")
)
{
while ((rec1 = csv1.Read()) != null && (rec2 = csv2.Read()) != null)
{
rec1.City = rec2.City;
csvOut.Write(rec1);
}
}
}
}
Console.WriteLine(csv3.ToString());
Hope it helps.
Disclaimer: I'm the author of this library.
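If you'd rather not take a dependency, the same row-by-row merge can be sketched with plain LINQ. This is not Cinchoo ETL; it is an alternative that pairs the lines with Enumerable.Zip. ReplaceColumnByRow is a hypothetical name, and the sketch assumes tab-separated files with a header line, no quoted fields, and the same number of lines in each file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: copy one named column from the second file into the first, row by row.
// Assumptions: tab-separated, header line in both, no quoted fields.
List<string> ReplaceColumnByRow(string[] first, string[] second, string column)
{
    string[] h1 = first[0].Split('\t');
    string[] h2 = second[0].Split('\t');
    int target = Array.IndexOf(h1, column);
    int source = Array.IndexOf(h2, column);
    if (target < 0 || source < 0)
        throw new ArgumentException($"Column '{column}' is missing from one of the headers.");

    var result = new List<string> { first[0] }; // keep the first file's header
    // Zip pairs line N of one file with line N of the other and stops at the shorter file.
    result.AddRange(first.Skip(1).Zip(second.Skip(1), (a, b) =>
    {
        string[] left = a.Split('\t');
        left[target] = b.Split('\t')[source];
        return string.Join("\t", left);
    }));
    return result;
}

string[] csv1 = { "Id\tName\tCity", "1\tTom\tNew York", "2\tMark\tFairFax" };
string[] csv2 = { "Id\tCity", "1\tLas Vegas", "2\tDallas" };
List<string> merged = ReplaceColumnByRow(csv1, csv2, "City");
foreach (string row in merged)
    Console.WriteLine(row);
```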
I am trying to compare the value at index 0 of the array for one line with index 0 of the following line. Imagine a CSV where I have an identifier in the first column and a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] on the next line (or the previous one, if that's easier) and, if they are the same (as they are in the example), append the second value to index 1. That is, the data should read as
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed on to the next function. So far I have:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
contact.ContactId = parts[0];
long nextLine;
nextLine = parser.LineNumber+1;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
Does anyone have any suggestions? Thank you.
How about saving the previous line's array in a variable:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
string[] oldParts = new string[] { string.Empty };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length < 1)
{
break;
}
if (oldParts[0] == parts[0])
{
// concat logic goes here
}
else
{
contact.ContactId = parts[0];
}
long nextLine;
nextLine = parser.LineNumber+1;
oldParts = parts;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
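One way to fill in the "concat logic goes here" placeholder is sketched below, assuming that only adjacent rows with the same first field should be merged (as in the question's sample) and that each row arrives as a string[] like TextFieldParser.ReadFields() returns. MergeAdjacent is a hypothetical helper; it works on an in-memory list so it can be tried without a file.

```csharp
using System;
using System.Collections.Generic;

// Sketch: merge *adjacent* rows that share the same first field.
// Assumption: rows are string[] as returned by TextFieldParser.ReadFields().
List<string> MergeAdjacent(IEnumerable<string[]> rows)
{
    var output = new List<string>();
    string? currentKey = null;
    var values = new List<string>();

    foreach (string[] parts in rows)
    {
        if (parts.Length < 2)
            continue; // skip malformed rows
        string key = parts[0].Trim();
        if (currentKey != null && key != currentKey)
        {
            // The key changed: emit the finished group and start a new one.
            output.Add(currentKey + ", " + string.Join(", ", values));
            values.Clear();
        }
        currentKey = key;
        values.Add(parts[1].Trim());
    }
    if (currentKey != null) // flush the last group
        output.Add(currentKey + ", " + string.Join(", ", values));
    return output;
}

var input = new List<string[]>
{
    new[] { "USER1", " 1P" }, new[] { "USER1", " 3G" },
    new[] { "USER2", " 1P" }, new[] { "USER3", " 1V" },
};
List<string> mergedLines = MergeAdjacent(input);
foreach (string line in mergedLines)
    Console.WriteLine(line);
```

This never holds more than one group in memory, but a user whose rows are not contiguous will appear more than once in the output.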
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this would be to Group By using LINQ:
var linesGroupedByUser =
from line in File.ReadAllLines(path)
let elements = line.Split(',')
let user = new {Name = elements[0].Trim(), Value = elements[1].Trim()} // Trim() drops the space after each comma
group user by user.Name into users
select users;
foreach (var user in linesGroupedByUser)
{
string valuesAsString = String.Join(", ", user.Select(x => x.Value));
Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you, or you can iterate over each line and parse it into an object yourself. CsvHelper is a great tool that knows how to parse CSV files properly into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use LINQ's GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, on your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Finally, you can use the new, grouped collection for your end goal (write it to a file or use it for whatever you need).
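The GroupBy/Aggregate approach described above can be sketched without CsvHelper. In this illustration the lines are parsed by hand into an anonymous type (so quoted fields are not supported), the UserId/Value names are made up, and GroupValues is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: GroupBy the key, then Aggregate the other property into one string.
// Assumption: simple "UserId, Value" lines with no quoted fields (no CsvHelper).
List<string> GroupValues(IEnumerable<string> lines) =>
    lines
        .Select(l => l.Split(','))
        .Where(p => p.Length >= 2)                        // skip malformed lines
        .Select(p => new { UserId = p[0].Trim(), Value = p[1].Trim() })
        .GroupBy(r => r.UserId)                           // groups come out in first-seen order
        .Select(g => g.Select(r => r.Value)
                      .Aggregate(g.Key, (acc, v) => acc + ", " + v)) // seed with the key
        .ToList();

string[] csv = { "USER1, 1P", "USER1, 3G", "USER2, 1P", "USER3, 1V" };
List<string> grouped = GroupValues(csv);
foreach (string line in grouped)
    Console.WriteLine(line);
```

GroupBy yields groups in the order their keys first appear, so the output follows the input order; unlike a previous-line comparison, it also merges rows for the same user that are not adjacent.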
You're basically finding all the unique entries, so put them into a dictionary with the contact ID as the key, as follows:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Count() != 2)
{
break;
}
//if contact id not present in dictionary add
if (!uniqueContacts.ContainsKey(parts[0]))
uniqueContacts.Add(parts[0],new List<string>());
//now there's definitely an existing contact in dic (the one
//we've just added or a previously added one) so add to the
//list of strings for that contact
uniqueContacts[parts[0]].Add(parts[1]);
}
//now do something with that dictionary of unique user names and
// lists of strings, for example dump them to console in the
//format you specify:
foreach (var contactId in uniqueContacts.Keys)
{
//string.Join adds the ", " separators without a trailing one; comparing each
//value to Last() would drop separators whenever a contact has duplicate values
Console.WriteLine($"{contactId}, {string.Join(", ", uniqueContacts[contactId])}");
}
}
}
This is the input my file contains:
50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3
Now I want to split every line of my text file and store it in my class, like this:
Expected output:
Planets[0]:
{
Number:50
name: Hallogen
object:Mercury
proportion[0]:
{
Number:4
},
proportion[1]:
{
Number:40
},
proportion[2]:
{
Number:1
}
}
etc........
My class to store all these values:
public class Planets
{
public int Number { get; set; } //This field points to first cell of every row.output 50,90,5
public string name { get; set; } //This field points to Second cell of every row.output Hallogen,Oxygen,Hydrogen
public string object { get; set; } ////This field points to third cell of every row.output Mercury,Mars,Saturn
public List<proportion> proportion { get; set; } //This will store all proportions with respect to planet object.
//for Hallogen it will store 4,40,1.Just store number.ignore M,C,A initials.
//for oxygen it will store 10,20,00.Just store number.ignore M,C,A initials.
}
public class proportion
{
public int Number { get; set; }
}
This is what I have done:
List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
String line;
while ((line = sr.ReadLine()) != null)
{
string[] parts = Regex.Split(line, @"(?<=[|;-])");
foreach (var item in parts)
{
var Obj = new Planets();//Not getting how to store it but not getting proper output in parts
}
Console.WriteLine(line);
}
}
Without changing any of the logic in your Planets class (only the name of the object property; see the note at the end), a quick solution to your problem would look like this:
List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
String line;
while ((line = sr.ReadLine()) != null)
{
Planets planet = new Planets();
String[] parts = line.Split('|');
planet.Number = Convert.ToInt32(parts[0]);
planet.name = parts[1];
planet.obj = parts[2];
String[] smallerParts = parts[3].Split(';');
planet.proportion = new List<proportion>();
foreach (var item in smallerParts)
{
proportion prop = new proportion();
prop.Number =
Convert.ToInt32(item.Split(':')[1]);
planet.proportion.Add(prop);
}
Planets.Add(planet);
}
}
Oh, before I forget: you should not name the property of class Planets "object", because object is a C# keyword (the alias for System.Object, the base class of everything). Use something like "obj", "myObject" or "planetObject", just not "object"; your compiler will tell you the same ;)
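If some lines might be malformed, Convert.ToInt32 will throw on them. A variant that validates every field with int.TryParse is sketched below; to keep it self-contained, a named tuple stands in for the Planets and proportion classes, and TryParsePlanet is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch: parse "50|Hallogen|Mercury|M:4;C:40;A:1" without throwing on bad input.
// A named tuple stands in for the Planets class (assumption, for self-containment).
bool TryParsePlanet(string line,
    out (int Number, string Name, string Obj, List<int> Proportions) planet)
{
    planet = default;
    string[] parts = line.Split('|');
    if (parts.Length != 4 || !int.TryParse(parts[0], out int number))
        return false;

    var proportions = new List<int>();
    foreach (string item in parts[3].Split(';'))
    {
        // Each item looks like "M:4"; keep only the number after the colon.
        string[] pair = item.Split(':');
        if (pair.Length != 2 || !int.TryParse(pair[1], out int value))
            return false;
        proportions.Add(value);
    }
    planet = (number, parts[1], parts[2], proportions);
    return true;
}

string[] input =
{
    "50|Hallogen|Mercury|M:4;C:40;A:1",
    "90|Oxygen|Mars|M:10;C:20;A:00",
    "5|Hydrogen|Saturn|M:33;C:00;A:3",
};
var planets = new List<(int Number, string Name, string Obj, List<int> Proportions)>();
foreach (string text in input)
{
    if (TryParsePlanet(text, out var p))
        planets.Add(p);
}
Console.WriteLine(planets.Count);
```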
As I understand it, the multiple delimiters encode a nested structure.
You need to split the whole string on the pipe first, then split the last pipe field on semicolons, and finally split each of those parts on the colon.
The order of splitting here is important. I don't think you can get all the tokens at once by splitting on all three delimiters.
Try the following code for the same kind of data:
var values = new List<string>
{
"50|Hallogen|Mercury|M:4;C:40;A:1",
"90|Oxygen|Mars|M:10;C:20;A:00",
"5|Hydrogen|Saturn|M:33;C:00;A:3"
};
foreach (var value in values)
{
var pipeSplitted = value.Split('|');
var firstNumber = pipeSplitted[0];
var name = pipeSplitted[1];
var objectName = pipeSplitted[2];
//split only the fourth pipe field ("M:4;C:40;A:1"), not the whole line
var semiSplitted = pipeSplitted[3].Split(';');
var secondNumber = semiSplitted[0].Split(':')[1];
var thirdNumber = semiSplitted[1].Split(':')[1];
var lastNumber = semiSplitted[2].Split(':')[1];
}
The most straightforward solution is to use a regex in which every (sub)field is captured in its own group. (Dump() below is LINQPad's extension method; outside LINQPad, use Console.WriteLine instead.)
var subjectString = @"50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3";
Regex regexObj = new Regex(@"^(.*?)\|(.*?)\|(.*?)\|M:(.*?);C:(.*?);A:(.*?)\r?$", RegexOptions.Multiline);
Match match = regexObj.Match(subjectString);
while (match.Success) {
match.Groups[1].Value.Dump();
match.Groups[2].Value.Dump();
match.Groups[3].Value.Dump();
match.Groups[4].Value.Dump();
match.Groups[5].Value.Dump();
match.Groups[6].Value.Dump();
match = match.NextMatch();
}
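The same regex can also be written with named groups and iterated with Regex.Matches, printing with Console.WriteLine instead of LINQPad's Dump(). This sketch assumes each sub-value is a whole number (\d+) and keeps the \r? so Windows line endings don't end up in the last group; the group names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: named groups instead of numbered ones.
// Assumptions: every sub-value is a whole number; "\r?" tolerates CRLF line endings.
var planetPattern = new Regex(
    @"^(?<number>\d+)\|(?<name>[^|]*)\|(?<obj>[^|]*)\|M:(?<m>\d+);C:(?<c>\d+);A:(?<a>\d+)\r?$",
    RegexOptions.Multiline);

string text = "50|Hallogen|Mercury|M:4;C:40;A:1\n90|Oxygen|Mars|M:10;C:20;A:00\n5|Hydrogen|Saturn|M:33;C:00;A:3";
MatchCollection matches = planetPattern.Matches(text);
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Groups["number"].Value} {m.Groups["name"].Value} {m.Groups["obj"].Value}: " +
                      $"{int.Parse(m.Groups["m"].Value)}, {int.Parse(m.Groups["c"].Value)}, {int.Parse(m.Groups["a"].Value)}");
}
```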
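If I understand correctly, your input is well formed. In that case you could use something like this:
string[] parts = Regex.Split(line, @"[|;-]");
var planet = new Planets(parts);
...
public Planets(string[] parts)
{
    //parts: [0] number, [1] name, [2] object, [3..] "M:4", "C:40", "A:1"
    int.TryParse(parts[0], out int number); //out can't target a property, so use a local
    this.Number = number;
    this.name = parts[1];
    this.obj = parts[2]; //"object" is a reserved keyword, so the property is renamed
    this.proportion = new List<proportion>();
    Regex propRegex = new Regex(@"\d+"); //verbatim string, so \d reaches the regex intact
    for (int i = 3; i < parts.Length; i++)
    {
        Match propMatch = propRegex.Match(parts[i]);
        if (propMatch.Success)
        {
            this.proportion.Add(new proportion { Number = int.Parse(propMatch.Value) });
        }
    }
}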
I can currently parse and extract data from a large tab-delimited file. I read, parse and extract it line by line, adding the split items to my DataTable (with a row limit of 3 rows per chunk). I need to skip the even lines, i.e. read the first line that has the maximum number of tab-delimited fields, skip the 2nd one, and read the 3rd one directly.
My tab-delimited source file format:
001Mean 26.975 1.1403 910.45
001Stdev 26.975 1.1403 910.45
002Mean 26.975 1.1403 910.45
002Stdev 26.975 1.1403 910.45
I need to skip (or avoid reading) the Stdev lines.
C# code:
Getting the maximum number of items in a tab-delimited line of the file by splitting each line:
using (var reader = new StreamReader(sourceFileFullName))
{
string line = null;
line = reader.ReadToEnd();
if (!string.IsNullOrEmpty(line))
{
var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
foreach (var value in list_with_max_cols)
{
var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
MAX_NO_OF_COLUMNS = values.Length;
}
}
}
Reading the file line by line; lines whose item count differs from that maximum are skipped, and the first matching line is used as the column list before the remaining lines are parsed and extracted:
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
//when reach first line it is column list need to create datatable based on that.
if (firstLineOfFile)
{
columnData = new_read_line;
firstLineOfFile = false;
continue;
}
if (firstLineOfChunk)
{
firstLineOfChunk = false;
chunkDataTable = CreateEmptyDataTable(columnData);
}
AddRow(chunkDataTable, new_read_line);
chunkRowCount++;
if (chunkRowCount == _chunkRowLimit)
{
firstLineOfChunk = true;
chunkRowCount = 0;
yield return chunkDataTable;
chunkDataTable = null;
}
}
}
Creating the DataTable:
private DataTable CreateEmptyDataTable(string firstLine)
{
IList<string> columnList = Split(firstLine);
var dataTable = new DataTable("TableName");
for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
{
string c_string = columnList[columnIndex];
if (Regex.Match(c_string, "\\s").Success)
{
string tmp = Regex.Replace(c_string, "\\s", "");
string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // strips anything inside [] together with the brackets
columnList[columnIndex] = finaltmp;
}
}
dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
dataTable.Columns.Add("ID");
return dataTable;
}
How can I skip alternate lines while reading, and then split the rest and add them to my DataTable?
AddRow function: I managed to meet my requirement with the following change:
private void AddRow(DataTable dataTable, string line)
{
if (line.Contains("Stdev"))
{
return;
}
else
{
//Rest of Code
}
}
Since you have tab-separated values on each line, how about reading only the odd-numbered lines and splitting them into arrays? This is just a sample; you can expand on it.
Test data (file.txt)
luck is when opportunity meets preparation
this line needs to be skipped
microsoft visual studio
another line to be skipped
let us all code
Code
var oddLines = File.ReadLines(@"C:\projects\file.txt").Where((item, index) => index%2 == 0); // index is 0-based, so this keeps the 1st, 3rd, 5th... lines
foreach (var line in oddLines)
{
var words = line.Split('\t');
}
EDIT
To get the lines that don't contain "Stdev":
var filteredLines = System.IO.File.ReadLines(@"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));
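Putting the filter together with the question's 3-rows-at-a-time batching might look like the sketch below. It assumes .NET 6 or later (for Enumerable.Chunk), a header line first, tab-separated values, and that the whole file fits in a string[] (e.g. from File.ReadAllLines); ReadChunks and the column names in the sample are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sketch: drop "Stdev" lines, then yield DataTables of at most chunkRowLimit rows.
// Assumptions: .NET 6+ (Enumerable.Chunk), header first, tab-separated, no quoting.
IEnumerable<DataTable> ReadChunks(string[] lines, int chunkRowLimit)
{
    string[] columns = lines[0].Split('\t');            // header line -> column names
    var dataRows = lines.Skip(1).Where(l => !l.Contains("Stdev"));

    // Chunk hands back arrays of up to chunkRowLimit lines.
    foreach (string[] chunk in dataRows.Chunk(chunkRowLimit))
    {
        var table = new DataTable("TableName");
        foreach (string c in columns)
            table.Columns.Add(c);
        foreach (string line in chunk)
            table.Rows.Add(line.Split('\t'));            // one value per column
        yield return table;
    }
}

string[] file =
{
    "Name\tA\tB\tC",
    "001Mean\t26.975\t1.1403\t910.45",
    "001Stdev\t26.975\t1.1403\t910.45",
    "002Mean\t26.975\t1.1403\t910.45",
    "002Stdev\t26.975\t1.1403\t910.45",
    "003Mean\t26.975\t1.1403\t910.45",
    "003Stdev\t26.975\t1.1403\t910.45",
    "004Mean\t26.975\t1.1403\t910.45",
};
List<DataTable> tables = ReadChunks(file, 3).ToList();
Console.WriteLine(tables.Count);
```

Filtering by content rather than by position keeps working even if a Mean line ever appears without its Stdev partner.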
Change
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
To
using (var reader = new StreamReader(sourceFileFullName))
{
int cnt = 0;
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
cnt++;
if(cnt % 2 == 0)
continue;
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
My project uses a .txt file as a database; each line in the file looks something like "abc,cdf,ghi,zkl".
Right now I read the text file line by line and split each line on "," into an array data[].
But I want to put each of these arrays into another main array called datas[], so I can keep datas[] in memory for the whole class to use.
I don't want to fix the size of datas[], because the number of records in the file will keep growing.
What can I do in this case? I tried making datas[] an ArrayList and storing each data[] array in it, but I got an error.
class user
{
ArrayList userDatas = new ArrayList();
public user()
{
readUsers();
}
public void readUsers()
{
string line;
StreamReader sr = new StreamReader("user.txt", System.Text.Encoding.Default);
while ((line = sr.ReadLine()) != null)
{
ArrayList temp = new ArrayList();
string[] rc = line.Split('|');
for (int i = 0; i < rc.Length; i++)
{
temp.Add(rc[i]);
}
userDatas.Add(temp);
}
}
public bool login(string ic, string password)
{
for (int i = 0; i < userDatas.Count; i++)
{
ArrayList temp = userDatas;
if ((temp[1] == ic) && (temp[2] == password))
{
return true;
}
}
return false;
}
}
Of course, if you don't mind being a little cute, you should be able to do it in one line courtesy of LINQ:
string[][] LinesSplitByComma = File.ReadAllLines("file path").Select(s => s.Split(',')).ToArray();
Instead of ArrayList, use List<string> for temp and List<string[]> for userDatas.
When you're done filling them, you can convert to an array by simply calling userDatas.ToArray().
Also, your error might be here:
ArrayList temp = userDatas;
if ((temp[1] == ic) && (temp[2] == password))
{
return true;
}
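You're not first checking that temp has 3 or more elements before referencing indexes 1 and 2. Also, temp is assigned the whole userDatas list instead of the current row, userDatas[i]. And because ArrayList stores object, temp[1] == ic compares references rather than string contents. With userDatas declared as List<string[]>, you could just say:
if (userDatas[i].Length >= 3 && userDatas[i][1] == ic && userDatas[i][2] == password)
return true;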
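A minimal sketch of the List<string[]> suggestion, with the length check and the per-row comparison in place. ReadUsers and Login are hypothetical names, and the sketch assumes '|'-separated lines with the IC number in field 1 and the password in field 2, as the question's login method implies; it reads from a StringReader here so it can run without user.txt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: a growable List<string[]> instead of ArrayList.
// Assumption: '|'-separated lines, IC number in field 1, password in field 2.
List<string[]> ReadUsers(TextReader reader)
{
    var users = new List<string[]>();
    string? line;
    while ((line = reader.ReadLine()) != null)
        users.Add(line.Split('|'));
    return users;
}

bool Login(List<string[]> users, string ic, string password)
{
    foreach (string[] user in users)
    {
        // Check the length first so a short line can't cause IndexOutOfRangeException;
        // == on string compares contents, unlike == on ArrayList's object elements.
        if (user.Length >= 3 && user[1] == ic && user[2] == password)
            return true;
    }
    return false;
}

// In real code: using var reader = new StreamReader("user.txt");
var userDatas = ReadUsers(new StringReader("alice|900101|secret\nbob|880202|hunter2\nbroken"));
Console.WriteLine(Login(userDatas, "880202", "hunter2"));
string[][] asArray = userDatas.ToArray(); // only if you really need an array
```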
EDIT
As requested, here's my original code. You already have much of it written (your code didn't show up at first), but here it is:
List<string[]> datas = new List<string[]>();
using (StreamReader reader = new StreamReader("user.txt")) //StreamReader needs a path or stream; "user.txt" as in the question
{
    string line;
    while (!reader.EndOfStream) {
        line = reader.ReadLine();
        datas.Add(line.Split(','));
    }
}
string[][] datas_array = datas.ToArray(); //an array of string arrays, not a string[]