Turn a .dat file into a matrix, including empty spaces - C#

I have a .dat file that looks like this:
I would like to turn this data into a matrix of some sort that includes values for the empty spaces. Any ideas on how to approach this?

If you wish to preserve the empty spaces, then you may need to treat the data as fixed-width columns:
https://codereview.stackexchange.com/questions/27782/how-to-read-fixed-width-data-fields-in-net
If it turns out that the data is tab-delimited, as @WaiHaLee suggests, then just split the lines on the tab character. For example:
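A minimal sketch of the fixed-width approach, assuming you measure the column widths from your own file (the widths in the usage note are made up, since the sample data isn't shown here). FixedWidthReader, ParseLine and ReadMatrix are hypothetical names. Blank or missing cells come back as null, so every row has the same number of cells:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FixedWidthReader
{
    // Cuts one line into fields of the given widths.
    // A field that is blank, or that lies past the end of a short line, is returned as null.
    public static string[] ParseLine(string line, int[] widths)
    {
        var fields = new string[widths.Length]; // every entry starts out as null
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            if (start < line.Length)
            {
                // Clamp the length so a short last field doesn't make Substring throw.
                int length = Math.Min(widths[i], line.Length - start);
                string raw = line.Substring(start, length).Trim();
                fields[i] = raw.Length == 0 ? null : raw;
            }
            start += widths[i];
        }
        return fields;
    }

    // Reads the file lazily and returns one string[] per line (a jagged "matrix").
    public static List<string[]> ReadMatrix(string path, int[] widths)
    {
        var rows = new List<string[]>();
        foreach (string line in File.ReadLines(path))
        {
            rows.Add(ParseLine(line, widths));
        }
        return rows;
    }
}
```

For example, `FixedWidthReader.ReadMatrix("C:/path/to/file.dat", new[] { 10, 8, 12 })` would return one string[] per line; you can then convert the non-null cells with double.TryParse and substitute whatever default value you want for the nulls.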
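// read all lines
var lines = System.IO.File.ReadAllLines("C:/path/to/file.txt");
// loop through all lines
foreach (var line in lines)
{
    // split the line on tabs; empty fields are kept as "" by default
    var splitString = line.Split('\t');
    // pull out some data from the 6th column (index 5);
    // TryParse avoids an exception when the cell is empty, which then becomes NaN
    double avDP = splitString.Length > 5
        && double.TryParse(splitString[5], System.Globalization.NumberStyles.Float,
            System.Globalization.CultureInfo.InvariantCulture, out double parsed)
        ? parsed
        : double.NaN;
    // save the data wherever you want
}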
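If the file does turn out to be tab-delimited, here is a sketch that builds the whole matrix at once. It assumes numeric data and uses double.NaN as the placeholder for empty or missing cells (pick a different value if that suits you better); TabMatrix and its methods are hypothetical names:

```csharp
using System.Globalization;
using System.IO;
using System.Linq;

public static class TabMatrix
{
    // Turns tab-delimited lines into a rectangular matrix.
    // Rows shorter than the widest row are padded, and empty or non-numeric cells become NaN.
    public static double[,] Parse(string[] lines)
    {
        // Split keeps empty entries by default, so "1\t\t3" gives three cells.
        string[][] cells = lines.Select(l => l.Split('\t')).ToArray();
        int rowCount = cells.Length;
        int colCount = rowCount == 0 ? 0 : cells.Max(r => r.Length);

        var matrix = new double[rowCount, colCount];
        for (int r = 0; r < rowCount; r++)
        {
            for (int c = 0; c < colCount; c++)
            {
                double value = double.NaN;
                if (c < cells[r].Length
                    && double.TryParse(cells[r][c], NumberStyles.Float,
                                       CultureInfo.InvariantCulture, out double parsed))
                {
                    value = parsed;
                }
                matrix[r, c] = value;
            }
        }
        return matrix;
    }

    // Convenience overload that reads the file first.
    public static double[,] ReadFile(string path) => Parse(File.ReadAllLines(path));
}
```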

Related

Load text file data into data table for specific length scenario

I have a text file that contains many irrelevant values, followed by the values I need to load into a table. A sample of the file looks like this:
Some file description date
C D 8989898989898 some words
D F 8979797979 some more words
8 H 98988989989898 Some more words for the purpose
KD978787878 280000841 1974DIAA EIDER 320
KK967867668 280000551 1999OOOD FIDERN 680
I can't rely on a fixed line count, because the description part (4 lines here, excluding the empty line) can span multiple lines; it can be up to 40-50 lines per text file.
The only way I can think of to pick out the data is to select only the rows that have 5 columns with a certain number of spaces between them.
I tried it with a foreach loop, but that didn't work out well. Maybe I am not implementing it correctly.
DataTable dt = new DataTable();
using (StreamWriter sw = File.CreateText(path))
{
    string[] rows = content.Split('\n');
    foreach (string s in rows)
    {
        // how to pick up rows when there are only 5 columns in a row separated by a definite number of spaces?
        // how to calculate the exact spaces here? The space count can differ from one column to the next,
        // e.g. the gap between the first and second columns is 16 and between the second and third is 8.
        string[] columns = s.Split(' ');
        foreach (string t in columns)
        {
        }
    }
}
A lot of this comes down to massaging and sanitizing the data (yuck!). I would:
1. Use String.Split on content to get all lines (like you did):
string[] lines = content.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
2. Filter out empty lines and loop over the result:
foreach (string line in lines.Where(x => !string.IsNullOrWhiteSpace(x)))
3. Use String.Split on each line to split out the fields of that row, discarding the empty entries left by repeated spaces:
string[] fields = line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
At this point you can count the number of fields in the row or do something with each actual field.
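Put together, the three steps might look like the sketch below. It assumes the wanted rows are exactly the ones with five whitespace-separated fields and stores every value as a string; FiveColumnLoader and the Column1..Column5 names are made up for illustration. Note that in the sample above, "C D 8989898989898 some words" also splits into five fields, so a field count alone lets that line through; a stricter check, such as the regex suggested below, may be needed:

```csharp
using System;
using System.Data;
using System.Linq;

public static class FiveColumnLoader
{
    // Builds a DataTable from every non-blank line that splits into exactly five fields.
    public static DataTable Load(string content)
    {
        var table = new DataTable();
        for (int i = 1; i <= 5; i++)
        {
            table.Columns.Add("Column" + i, typeof(string)); // hypothetical column names
        }

        // Step 1: split into lines, whatever the line ending
        string[] lines = content.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

        // Step 2: skip blank lines
        foreach (string line in lines.Where(x => !string.IsNullOrWhiteSpace(x)))
        {
            // Step 3: split on runs of spaces, dropping the empty entries between them
            string[] fields = line.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length == 5)
            {
                table.Rows.Add(fields); // one cell per field
            }
        }
        return table;
    }
}
```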
This is an ideal place to use a regex: it matches only the lines that fit your needs, and with proper grouping it gives you the trimmed values of the five columns directly.
The search expression would be something like "^(K[A-Z0-9]+) +([0-9]+) +([A-Z0-9]+) +([A-Z]+) +([0-9]+) *$" or similar. Knowing regex has helped me a lot in programming.
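A sketch of the regex approach, using the pattern above unchanged and matching line by line (so Windows \r\n line endings never interfere with the `$` anchor). RecordMatcher and Extract are hypothetical names, and the pattern assumes the data rows start with K, as they do in the sample:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordMatcher
{
    // The pattern from the answer: five space-separated fields, the first starting with 'K'.
    private static readonly Regex DataRow = new Regex(
        @"^(K[A-Z0-9]+) +([0-9]+) +([A-Z0-9]+) +([A-Z]+) +([0-9]+) *$");

    // Returns the five captured (already trimmed) values of every matching line.
    public static List<string[]> Extract(string content)
    {
        var rows = new List<string[]>();
        // Match line by line so "\r\n" endings never reach the '$' anchor.
        foreach (string line in content.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None))
        {
            Match m = DataRow.Match(line);
            if (!m.Success)
            {
                continue; // description, header or blank line
            }
            rows.Add(new[]
            {
                m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value,
                m.Groups[4].Value, m.Groups[5].Value
            });
        }
        return rows;
    }
}
```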

C# text file to string array and how to remove specific strings?

I need to read a text file (10 MB) and convert it to .csv. Here is a portion of the code:
string DirPathForm = System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetEntryAssembly().Location);
string[] lines = File.ReadAllLines(DirPathForm + @"\file.txt");
Part of the text file follows a pattern, so I used the following:
string[] lines1 = lines.Select(x => x.Replace("abc[", "ab,")).ToArray();
Array.Clear(lines, 0, lines.Length);
lines = lines1.Select(x => x.Replace("] CDE ", ",")).ToArray();
Another part has no pattern that I can handle directly with Replace. How can I remove the characters, numbers, and whitespace in that part? See below:
string[] lines = {
"a] 773 b",
"e] 1597 t",
"z] 0 c"
};
to get the result below:
string[] result = {
"a,b",
"e,t",
"z,c"
};
Note: the removed items need to be replaced by ",".
First of all, you should avoid ReadAllLines for a large file, since it loads the entire file into memory at once. Instead, read the lines one at a time in a loop (for example with File.ReadLines or StreamReader.ReadLine).
Second, you can use a regex to transform the lines from the first format into the second one.
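A sketch combining both suggestions: File.ReadLines enumerates the file lazily, one line at a time, and Regex.Replace turns "a] 773 b" into "a,b". The pattern `\]\s*\d+\s*` is an assumption based only on the three sample lines (a closing bracket, a number, and the whitespace around it); CsvConverter and its methods are hypothetical names, and the asker's other Replace calls could be chained inside ConvertLine:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class CsvConverter
{
    // A closing bracket, then a number, with any whitespace around the number.
    private static readonly Regex BracketNumber = new Regex(@"\]\s*\d+\s*");

    // "a] 773 b" -> "a,b"
    public static string ConvertLine(string line) => BracketNumber.Replace(line, ",");

    // File.ReadLines yields one line at a time instead of loading the whole file.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                writer.WriteLine(ConvertLine(line));
            }
        }
    }
}
```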

Problem reading csv file that has a column with first and last names

The CSV file has Id and Name columns. Some of the names consist of a first and last name separated by a comma, e.g. "John, Smith". After inserting into the SQL table, the database shows only "John" as the Name. Could you please suggest how to get the full name when it contains a ','?
string filepath = selecteditem.FullName;
using (StreamReader sr = new StreamReader(filepath))
{
    while (sr.Peek() != -1)
    {
        string line = sr.ReadLine();
        string[] value = line.Split(',');
        List<string> lineValues = line.Split(',').ToList();
        conn.Open();
        cmd.CommandText = "insert into
The string.Split method has an overload that lets you limit the maximum number of substrings it returns, so if your input string is
string input = "1,John, Smith";
var splits = input.Split(new char[] { ','}, 2, StringSplitOptions.RemoveEmptyEntries);
then you get at most two entries in the splits array: the first is the ID and the second is the name (here "John, Smith").
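A small sketch of that idea for a whole line, assuming the file has exactly two columns (Id first, Name last) and that names are not wrapped in quotes; IdNameParser is a hypothetical name. Because everything after the first comma goes into the name, the comma inside "John, Smith" survives:

```csharp
using System;

public static class IdNameParser
{
    // Splits "Id,Name" at the first comma only; any later commas stay in the name.
    public static (int Id, string Name) ParseLine(string line)
    {
        string[] parts = line.Split(new[] { ',' }, 2);
        if (parts.Length < 2)
        {
            throw new FormatException("Expected 'Id,Name' but got: " + line);
        }
        return (int.Parse(parts[0].Trim()), parts[1].Trim());
    }
}
```

If the real file quotes its fields (e.g. 1,"John, Smith"), a dedicated CSV parser is the safer choice.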
If you have access to Excel:
1. Open the file in Excel and save it as XLSX.
2. Find and replace all commas within the workbook with | or another equally obscure character, having first made sure that character isn't already in the sheet.
3. Save, and then re-save as .csv.
4. In the C# code or directly in SQL, replace the obscure character with the original comma.

Issue renaming two columns in a CSV file instead of one

I need to rename a column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name.
I have tried adapting code from similar posts, but I've only managed to update both columns. Below is my code, which renames both columns.
// locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
System.IO.StreamWriter sw = new System.IO.StreamWriter(file1);
foreach (string s in lines)
{
    sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed.
Here are the first couple rows of the CSV:
I'm assuming that you only need to update the column header; the data rows don't need to be updated.
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
// LastIndexOf ensures that you pick the second idn_prod
var indexToReplace = columnHeaders.LastIndexOf(textToReplace);
// remove the second idn_prod and insert the updated value in its place
columnHeaders = columnHeaders
    .Remove(indexToReplace, textToReplace.Length)
    .Insert(indexToReplace, newText);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    sw.WriteLine(columnHeaders);
    foreach (var str in lines.Skip(1))
    {
        sw.WriteLine(str);
    }
    sw.Flush();
}
Replace the foreach (string s in lines) loop with a for loop, use the line index to pick out the header line, and rename only the 2nd idn_prod column there.
I believe the only way to handle this properly is to crack the header line (first string that has column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself.
Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
// the question's file is a .csv; use '\t' instead for tab-separated data
const char delimiter = ',';
foreach (string s in lines)
{
    if (!headerSeen)
    {
        // special: this is the header
        string[] parts = s.Split(delimiter);
        int matches = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            // only fix the *second* idn_prod seen, as the question asks
            if (parts[i] == "idn_prod" && ++matches == 2)
            {
                parts[i] = "idn_prod1";
                break;
            }
        }
        sw.WriteLine(string.Join(delimiter.ToString(), parts));
        headerSeen = true;
    }
    else
    {
        sw.WriteLine(s);
    }
}
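The same header-only idea can be wrapped in a small reusable helper that renames the n-th column with a given name. This is a sketch; HeaderRenamer and RenameOccurrence are hypothetical names, and it assumes a simple header with no quoted column names. Comparing whole column names also avoids accidentally matching a longer name such as idn_prod_old, which a plain string.Replace would touch:

```csharp
public static class HeaderRenamer
{
    // Renames the n-th (1-based) column whose name equals oldName in a delimited header line.
    // Returns the header unchanged if there are fewer than n matching columns.
    public static string RenameOccurrence(string header, char delimiter,
                                          string oldName, string newName, int occurrence)
    {
        string[] parts = header.Split(delimiter);
        int seen = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            // exact comparison, so only whole column names match
            if (parts[i] == oldName && ++seen == occurrence)
            {
                parts[i] = newName;
                break;
            }
        }
        return string.Join(delimiter.ToString(), parts);
    }
}
```

With the question's file, writing `HeaderRenamer.RenameOccurrence(lines[0], ',', "idn_prod", "idn_prod1", 2)` as the first line and then lines.Skip(1) unchanged would rename only the second idn_prod.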
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and that opens up all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.

Convert txt with different number of spaces into xls file

I tried searching for a solution here, but I can't seem to find any answers. I have a text file that looks like this:
Nmr_test 101E-6 PASSED PASSED PASSED PASSED
Dc_volts 10V_100 CAL_+10V +9.99999000 +10.0000100 +9.99999740 +9.99999727
Dcv_lin 10V_6U 11.5 +0.0000E+000 +7.0000E+000 +2.0367E+001 +2.7427E+001
Dcv_lin 10V_6U 3 +0.0000E+000 +5.0000E+000 +1.3331E+001 +1.8872E+001
I have to convert this text file to an Excel (.xls) file, but I can't figure out how to put the values into the correct Excel columns, because there are different numbers of spaces between the columns. I've tried the code below, which uses a space as the separator, but of course it fails because of the varying number of spaces between the columns:
var lines = File.ReadAllLines(string.Concat(Directory.GetCurrentDirectory(), "\\Temp_textfile.txt"));
var rowcounter = 1;
foreach (var line in lines)
{
    var columncounter = 1;
    var values = line.Split(' ');
    foreach (var value in values)
    {
        excelworksheet.Cells[rowcounter, columncounter] = new Cell(value);
        columncounter++;
    }
    rowcounter++;
}
excelworkbook.Worksheets.Add(excelworksheet);
excelworkbook.Save(string.Concat(Directory.GetCurrentDirectory(), "\\Exported_excelfile.xls"));
Any advice?
EDIT: Got it working by using Substring to select each column by its fixed width.
