Load text file data into data table for specific length scenario - c#

I have a text file that contains many irrelevant values, followed by the values I need to load into a table. A sample of the file looks like this:
Some file description date
C D 8989898989898 some words
D F 8979797979 some more words
8 H 98988989989898 Some more words for the purpose
KD978787878 280000841 1974DIAA EIDER 320
KK967867668 280000551 1999OOOD FIDERN 680
I can't skip a fixed number of lines, because the description part (4 lines here, excluding the empty line) varies in length and can run to 40-50 lines per text file.
The only approach I can think of is to select only those rows that have 5 columns with a certain number of spaces between them.
I tried doing this with a foreach loop, but it didn't work out well; maybe I'm not implementing it correctly.
DataTable dt = new DataTable();
using (StreamWriter sw = File.CreateText(path))
{
    string[] rows = content.Split('\n');
    foreach (string s in rows)
    {
        // how to pick up rows when there are only 5 columns in a row separated by a definite number of space?
        string[] columns = s.Split(' '); // how to calculate exact spaces here, because space count could be different from one column to the other. Ex: difference between first column and second is 16 and second to third is 8.
        foreach (string t in columns)
        {
        }
    }
}

A lot of this comes down to massaging and sanitizing the data (yuck!). I would:
1. Use String.Split on content to get all lines (like you did)
    string[] lines = content.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
2. Filter out empty lines and loop over the result
    foreach (string line in lines.Where(x => !String.IsNullOrEmpty(x.Trim())))
3. Use String.Split on each line to split out each field for a particular row, stripping white space
    string[] fields = line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
At this point you can count the number of fields in the row or run a check against each individual field.
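
The three steps above could be put together roughly as follows. This is only a sketch: the class name FiveColumnLoader and the column names Col1..Col5 are placeholders, and the extra "last field is all digits" check is an assumption drawn from the sample, added because a description line such as "C D 8989898989898 some words" also happens to split into five fields.

```csharp
using System;
using System.Data;
using System.Linq;

public static class FiveColumnLoader
{
    // Builds a DataTable from the lines that split into exactly five fields.
    // Col1..Col5 are placeholder names, not names taken from the original file.
    public static DataTable Load(string content)
    {
        var table = new DataTable();
        for (int i = 1; i <= 5; i++)
        {
            table.Columns.Add("Col" + i, typeof(string));
        }

        // Step 1: split the content into lines, whatever the line ending is.
        string[] lines = content.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

        // Step 2: skip blank lines.
        foreach (string line in lines.Where(x => !string.IsNullOrWhiteSpace(x)))
        {
            // Step 3: split on spaces; runs of spaces produce no empty fields.
            string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            // Assumption: a data row has five fields and its last field is numeric.
            if (fields.Length == 5 && fields[4].All(char.IsDigit))
            {
                table.Rows.Add(fields);
            }
        }
        return table;
    }
}
```

Calling FiveColumnLoader.Load(content) on the sample file would keep only the two rows starting with KD and KK. If the real data rows can look different, the filter condition needs to be adapted.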

This is an ideal place to use a regex: it finds only the lines that fit your needs, and with capture groups you get the trimmed values of the five columns directly.
The search expression would be something like "^(K[A-Z0-9]+) +([0-9]+) +([A-Z0-9]+) +([A-Z]+) +([0-9]+) *$" or similar. Knowing regex has helped me a lot in programming.
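
As a rough illustration, the expression above could be applied with System.Text.RegularExpressions like this. The class and method names (RowMatcher, Extract) are made up for the example, and the pattern is only as good as the two sample rows it was derived from.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowMatcher
{
    // Five capture groups, one per column; the pattern is the one suggested above.
    private static readonly Regex RowPattern = new Regex(
        @"^(K[A-Z0-9]+) +([0-9]+) +([A-Z0-9]+) +([A-Z]+) +([0-9]+) *$");

    // Returns the five column values of every line that matches the pattern.
    public static List<string[]> Extract(IEnumerable<string> lines)
    {
        var rows = new List<string[]>();
        foreach (string line in lines)
        {
            // Drop a stray '\r' left over from splitting Windows line endings on '\n'.
            Match m = RowPattern.Match(line.TrimEnd('\r'));
            if (!m.Success)
            {
                continue;
            }

            var fields = new string[5];
            for (int i = 0; i < 5; i++)
            {
                fields[i] = m.Groups[i + 1].Value; // group 0 is the whole match
            }
            rows.Add(fields);
        }
        return rows;
    }
}
```

Each returned array can then be added to a DataTable row. Description lines never match because they do not follow the five-column shape.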

Related

C# text file to string array and how to remove specific strings?

I need to read a text file (10 MB) and convert it to .csv. See the portion of code below:
string DirPathForm = System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetEntryAssembly().Location);
string[] lines = File.ReadAllLines(DirPathForm + @"\file.txt");
Some portions of the text file follow a pattern, so I used Replace as below:
string[] lines1 = lines.Select(x => x.Replace("abc[", "ab,")).ToArray();
Array.Clear(lines, 0, lines.Length);
lines = lines1.Select(x => x.Replace("] CDE ", ",")).ToArray();
Other portions have no fixed pattern that Replace can target directly. The question is how to remove the characters, numbers and whitespace in these portions. Please see below:
string[] lines = {
"a] 773 b",
"e] 1597 t",
"z] 0 c"
};
to get the result below:
string[] result = {
"a,b",
"e,t",
"z,c"
};
Note: the removed items need to be replaced by ",".
First of all, you should avoid ReadAllLines for a large file, since it loads all the data into RAM at once. Instead, read the lines one by one in a loop (for example with File.ReadLines or a StreamReader).
Secondly, you can definitely use a regex to transform the data from the first form into the second.
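
A minimal sketch of both suggestions, assuming the unwanted part always looks like "]", optional whitespace, a number, and optional whitespace (as in the three sample lines). BracketCleaner, Clean and Convert are hypothetical names.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketCleaner
{
    // Matches "]" plus the number after it and the surrounding whitespace, e.g. "] 773 ".
    private static readonly Regex Middle = new Regex(@"\]\s*\d+\s*");

    // Replaces the matched part with a single comma.
    public static string Clean(string line)
    {
        return Middle.Replace(line, ",");
    }

    // File.ReadLines enumerates lazily, so the input is processed line by line
    // instead of being loaded into memory in one go.
    // inputPath and outputPath must be different files.
    public static void Convert(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(line => Clean(line)));
    }
}
```

With this, Clean("a] 773 b") gives "a,b". The earlier Replace calls for "abc[" and "] CDE " can be chained inside Clean in the same way.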

Issue renaming two columns in a CSV file instead of one

I need to be able to rename the column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name.
I have tried implementing code from similar posts, but I've only been able to update both columns. Below you'll find the code I have that just renames both columns.
//locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
System.IO.StreamWriter sw = new System.IO.StreamWriter(file1);
foreach (string s in lines)
{
    sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed.
Here are the first couple rows of the CSV:
I'm assuming that you only need to update the column header, the actual rows need not be updated.
// requires: using System.Linq; (for Skip)
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
// LastIndexOf ensures that you pick the second idn_prod
var indexToReplace = columnHeaders.LastIndexOf(textToReplace);
// Remove the second idn_prod and insert the updated value in its place
columnHeaders = columnHeaders
    .Remove(indexToReplace, textToReplace.Length)
    .Insert(indexToReplace, newText);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    sw.WriteLine(columnHeaders);
    foreach (var str in lines.Skip(1))
    {
        sw.WriteLine(str);
    }
    sw.Flush();
}
Replace the foreach (string s in lines) loop with a for loop over the line count, and rename only the 2nd column.
I believe the only way to handle this properly is to crack the header line (first string that has column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself.
Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
foreach (string s in lines)
{
    if (!headerSeen)
    {
        // special: this is the header
        string[] parts = s.Split('\t');
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i] == "idn_prod")
            {
                // only fix the *first* one seen
                parts[i] = "idn_prod1";
                break;
            }
        }
        sw.WriteLine(string.Join("\t", parts));
        headerSeen = true;
    }
    else
    {
        sw.WriteLine(s);
    }
}
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and this enters all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.
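
Since the question asks for the second idn_prod column specifically, the header-splitting idea can be generalized to "rename the n-th column with this name". The sketch below is an illustration only: HeaderRenamer and RenameNth are made-up names, and it assumes header fields are unquoted and never contain the delimiter.

```csharp
public static class HeaderRenamer
{
    // Renames the n-th (1-based) column called oldName in a delimited header line.
    // Assumption: header fields are unquoted and never contain the delimiter.
    public static string RenameNth(string header, char delimiter,
                                   string oldName, string newName, int occurrence)
    {
        string[] parts = header.Split(delimiter);
        int seen = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            // Count matching columns and stop at the requested one.
            if (parts[i].Trim() == oldName && ++seen == occurrence)
            {
                parts[i] = newName;
                break;
            }
        }
        return string.Join(delimiter.ToString(), parts);
    }
}
```

For a comma-separated file, RenameNth(lines[0], ',', "idn_prod", "idn_prod1", 2) would produce the new header line; the remaining lines are written back unchanged.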

Convert txt with different number of spaces into xls file

I tried searching for a solution here but I can't seem to find any answers. I have a textfile that appears like this:
Nmr_test 101E-6 PASSED PASSED PASSED PASSED
Dc_volts 10V_100 CAL_+10V +9.99999000 +10.0000100 +9.99999740 +9.99999727
Dcv_lin 10V_6U 11.5 +0.0000E+000 +7.0000E+000 +2.0367E+001 +2.7427E+001
Dcv_lin 10V_6U 3 +0.0000E+000 +5.0000E+000 +1.3331E+001 +1.8872E+001
I have to convert this textfile to an Excel/xls file but I can't figure out how to insert them to the correct excel columns as they have different number of spaces in between columns. I've tried using this code below which is using space as a separator but it fails of course due to the varying number of spaces between the columns:
var lines = File.ReadAllLines(string.Concat(Directory.GetCurrentDirectory(), "\\Temp_textfile.txt"));
var rowcounter = 1;
foreach (var line in lines)
{
    var columncounter = 1;
    var values = line.Split(' ');
    foreach (var value in values)
    {
        excelworksheet.Cells[rowcounter, columncounter] = new Cell(value);
        columncounter++;
    }
    rowcounter++;
}
excelworkbook.Worksheets.Add(excelworksheet);
excelworkbook.Save(string.Concat(Directory.GetCurrentDirectory(), "\\Exported_excelfile.xls"));
Any advice?
EDIT: Got it working using SubString that selects each column using their fixed width.
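
The Substring approach mentioned in the edit might look something like the sketch below. The column widths are not given in the question, so the widths array is a caller-supplied assumption, and FixedWidth.Slice is a hypothetical helper name.

```csharp
using System;

public static class FixedWidth
{
    // Cuts a line into fields of the given widths and trims the padding.
    // Lines shorter than the total width yield empty strings for the missing fields.
    public static string[] Slice(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Never read past the end of the line.
            int available = Math.Max(0, Math.Min(widths[i], line.Length - start));
            fields[i] = available > 0
                ? line.Substring(start, available).Trim()
                : string.Empty;
            start += widths[i];
        }
        return fields;
    }
}
```

Each element of the returned array can then be written to its own worksheet cell in the inner loop. If the columns never contain embedded spaces, line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) is a simpler alternative.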

Turn dat file into matrix, include empty spaces

I have a dat file that looks like this:
and I would like to turn this data into a matrix of some sort that includes values for the empty spaces. Any idea how to approach this?
If you wish to preserve the empty spaces then you may need to treat the data as fixed width columns.
https://codereview.stackexchange.com/questions/27782/how-to-read-fixed-width-data-fields-in-net
If it turns out that the data is tab delimited as @WaiHaLee suggests then just split the lines using the tab character. For example:
//read all lines
var lines = System.IO.File.ReadAllLines("C:/path/to/file.txt");
//loop through all lines
foreach (var line in lines)
{
    //split the line
    var splitString = line.Split(new char[] { '\t' });
    //pull out some data from the 6th column
    double avDP = double.Parse(splitString[5]);
    //save the data wherever you want
}
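
To keep the empty spaces in the resulting matrix, one option is to parse each cell into a nullable double. This sketch assumes tab-delimited input with invariant-culture numbers; DatParser and ParseRow are illustrative names.

```csharp
using System.Globalization;

public static class DatParser
{
    // Parses one tab-delimited line into nullable doubles.
    // Empty cells (and cells that are not valid numbers) become null.
    public static double?[] ParseRow(string line)
    {
        // Split without RemoveEmptyEntries so empty cells keep their position.
        string[] cells = line.Split('\t');
        var row = new double?[cells.Length];
        for (int i = 0; i < cells.Length; i++)
        {
            double value;
            row[i] = double.TryParse(cells[i], NumberStyles.Float,
                                     CultureInfo.InvariantCulture, out value)
                ? value
                : (double?)null;
        }
        return row;
    }
}
```

Applying ParseRow to every line gives a jagged array (double?[][]) in which null marks an empty position.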

Split string to string array without losing text order

I have a problem that I have been racking my brain over for 7 days, so I decided to ask you for help. Here is my problem:
I read data from a DataGridView (only 2 cells per row) and append it all to a StringBuilder; it is an article and its price, like an invoice (bill). I then convert the StringBuilder to a single string, intending to split it line by line. That part of my code works, but not the way I want. The articles line up one below another, but the prices do not form one vertical column; some sit further left and others further right, something like this:
Bread 10$
Egg 4$
Milk 5$
My code:
string[] lines;
StringBuilder sbd = new StringBuilder();
foreach (DataGridViewRow rowe in dataGridView2.Rows)
{
    sbd.Append(rowe.Cells[0].Value).Append(rowe.Cells[10].Value);
    sbd.Append("\n");
}
sbd.Remove(sbd.Length - 1, 1);
string userOutput = sbd.ToString();
lines = userOutput.Split(new string[] { "\r", "\n" },
    StringSplitOptions.RemoveEmptyEntries);
You can use the Trim method in order to remove existing leading and trailing spaces. With PadRight you can automatically add the right number of spaces in order to get a specified total length.
Also use a List<string> that grows automatically instead of using an array that you get from splitting what you just put together before:
List<string> lines = new List<string>();
foreach (DataGridViewRow row in dataGridView2.Rows) {
    lines.Add(row.Cells[0].Value.ToString().Trim().PadRight(25) +
              row.Cells[10].Value.ToString().Trim());
}
But keep in mind that this way of formatting works only if you display the string in a monospaced font (like Courier New or Consolas). Proportional fonts like Arial will yield jagged columns.
Alternatively you can create an array with the right size by reading the number of lines from the Count property
string[] lines = new string[dataGridView2.Rows.Count];
for (int i = 0; i < lines.Length; i++) {
    DataGridViewRow row = dataGridView2.Rows[i];
    lines[i] = row.Cells[0].Value.ToString().Trim().PadRight(25) +
               row.Cells[10].Value.ToString().Trim();
}
You can also use the PadLeft method in order to right align the amounts
row.Cells[10].Value.ToString().Trim().PadLeft(10)
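
To see the padding in isolation, here is a small sketch without the DataGridView. InvoiceFormatter and FormatLine are hypothetical names, and the default widths 25 and 10 simply mirror the values used above.

```csharp
public static class InvoiceFormatter
{
    // Left-aligns the article name in a fixed-width column and
    // right-aligns the price in a second fixed-width column.
    public static string FormatLine(string article, string price,
                                    int nameWidth = 25, int priceWidth = 10)
    {
        return article.Trim().PadRight(nameWidth) + price.Trim().PadLeft(priceWidth);
    }
}
```

Every line produced this way has the same length (nameWidth + priceWidth) as long as the values fit within their columns, so the prices end at the same position in a monospaced font.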
Have you tried the String.Split method?
string myString = "Bread ;10$;";
string articleName = myString.Split(';')[0];
string price = myString.Split(';')[1];
