Parsing a text file into a DataTable with irregular rows - C#

I am trying to parse tabular data in a text file into a DataTable.
The text file contains:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 52 0K 12K RUN 23:46 80.42% idle
12 root 1 -20 -139 0K 12K RUN AS 0:56 7.96% swi7:
The code I have is:
using System;
using System.Data;
using System.IO;
using System.Linq;

public class Program
{
    static void Main(string[] args)
    {
        var lines = File.ReadLines("bb.txt").ToArray();
        var headerLine = lines[0];
        var dt = new DataTable();
        var columnsArray = headerLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        var dataColumns = columnsArray.Select(item => new DataColumn { ColumnName = item });
        dt.Columns.AddRange(dataColumns.ToArray());
        for (int i = 1; i < lines.Length; i++)
        {
            var rowLine = lines[i];
            var rowArray = rowLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
            var x = dt.NewRow();
            x.ItemArray = rowArray;
            dt.Rows.Add(x);
        }
    }
}
I get the error "Input array is longer than the number of columns in this table" on the second pass through the loop, at
x.ItemArray = rowArray;
Of course, this happens because the second row has "RUN AS" as the value of the 8th column. That value contains a space, which is also the separator for the entire row, so the row splits into more values than there are columns.
What is a possible solution for this kind of situation?

Assuming that "RUN AS" is the only value that causes this, you could simply run var sanitizedLine = rowLine.Replace("RUN AS", "RUNAS") before your split and then separate the words back out afterwards. If this happens more often, however, you may need to check whether the array produced by the split matches the length of the header and, if it doesn't, combine the offending elements into a new array of the correct length before adding it to the table.
Ideally, however, you would instead have whatever is generating your input file wrap strings in quotes to make your life easier.
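A minimal sketch of the length check and merge described in the answer, assuming (as in the sample) that STATE, the 8th column (index 7), is the only column whose values can contain spaces. RowFixer and MergeExtraTokens are hypothetical names, not part of any library:

```csharp
using System;

public static class RowFixer
{
    // Hypothetical helper: when a row splits into more tokens than there are
    // columns, join the surplus tokens back into the column at mergeIndex.
    // Assumption: mergeIndex is the only column whose values may contain spaces.
    public static string[] MergeExtraTokens(string[] tokens, int columnCount, int mergeIndex)
    {
        int extra = tokens.Length - columnCount;
        if (extra <= 0)
            return tokens; // already the right length (or short); returned unchanged

        var result = new string[columnCount];

        // Columns before the merge point map one-to-one.
        Array.Copy(tokens, 0, result, 0, mergeIndex);

        // The merge column absorbs its own token plus the surplus ones, e.g. "RUN AS".
        result[mergeIndex] = string.Join(" ", tokens, mergeIndex, extra + 1);

        // Columns after the merge point are shifted left by the surplus count.
        Array.Copy(tokens, mergeIndex + extra + 1, result, mergeIndex + 1,
                   columnCount - mergeIndex - 1);
        return result;
    }
}
```

In the question's loop it would replace the direct assignment, e.g. x.ItemArray = RowFixer.MergeExtraTokens(rowArray, dt.Columns.Count, 7); (the string[] converts to object[] through array covariance, which the original assignment already relies on).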

Related

Issue renaming two columns in a CSV file instead of one

I need to be able to rename the column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name.
I have tried implementing code from similar posts, but I've only been able to update both columns. Below you'll find the code I have that just renames both columns.
//locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    foreach (string s in lines)
    {
        sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
    }
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed.
Here are the first couple rows of the CSV:
I'm assuming that you only need to update the column header, the actual rows need not be updated.
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
var indexToReplace = columnHeaders
    .LastIndexOf(textToReplace); // LastIndexOf ensures that you pick the second idn_prod
columnHeaders = columnHeaders
    .Remove(indexToReplace, textToReplace.Length)
    .Insert(indexToReplace, newText); // remove the second idn_prod and insert the updated value in its place
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    sw.WriteLine(columnHeaders);
    foreach (var str in lines.Skip(1)) // Skip requires using System.Linq;
    {
        sw.WriteLine(str);
    }
    sw.Flush();
}
Replace the foreach(string s in lines) loop with a for loop that runs up to the line count, and rename only the 2nd column.
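A sketch of that for-loop idea, assuming a plain comma-separated header with no quoted column names. HeaderRenamer and RenameSecondColumn are hypothetical names; the method renames only the second field on line 0 that equals oldName and copies every other line unchanged:

```csharp
public static class HeaderRenamer
{
    // Only line 0 (the header) is touched, and only the second field equal to
    // oldName is renamed. Assumes comma-delimited names without quotes.
    public static string[] RenameSecondColumn(string[] lines, string oldName, string newName)
    {
        var output = new string[lines.Length];
        for (int i = 0; i < lines.Length; i++)
        {
            if (i == 0)
            {
                string[] fields = lines[i].Split(',');
                int seen = 0;
                for (int j = 0; j < fields.Length; j++)
                {
                    // count matches; rename on the second one and stop
                    if (fields[j] == oldName && ++seen == 2)
                    {
                        fields[j] = newName;
                        break;
                    }
                }
                output[i] = string.Join(",", fields);
            }
            else
            {
                output[i] = lines[i]; // data rows are copied unchanged
            }
        }
        return output;
    }
}
```

The result could then be written back with File.WriteAllLines(file1, HeaderRenamer.RenameSecondColumn(File.ReadAllLines(file1), "idn_prod", "idn_prod1"));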
I believe the only way to handle this properly is to crack the header line (first string that has column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself.
Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
foreach (string s in lines)
{
    if (!headerSeen)
    {
        // special: this is the header
        string[] parts = s.Split('\t');
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i] == "idn_prod")
            {
                // only fix the *first* one seen
                parts[i] = "idn_prod1";
                break;
            }
        }
        sw.WriteLine(string.Join("\t", parts));
        headerSeen = true;
    }
    else
    {
        sw.WriteLine(s);
    }
}
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and this enters all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.
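If a library is not an option, the quoting problem mentioned above can at least be handled for single-line records. This is a minimal hand-written sketch (SimpleCsv and SplitLine are hypothetical names) assuming RFC 4180-style quoting; it does not handle quoted fields that span line breaks, which is one more reason to prefer a dedicated package:

```csharp
using System.Collections.Generic;
using System.Text;

public static class SimpleCsv
{
    // Splits ONE line into fields. A delimiter inside double quotes is data,
    // and "" inside a quoted field stands for a literal quote character.
    public static List<string> SplitLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString()); // end of field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field
        return fields;
    }
}
```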

Convert txt with different number of spaces into xls file

I tried searching for a solution here but I can't seem to find any answers. I have a text file that looks like this:
Nmr_test 101E-6 PASSED PASSED PASSED PASSED
Dc_volts 10V_100 CAL_+10V +9.99999000 +10.0000100 +9.99999740 +9.99999727
Dcv_lin 10V_6U 11.5 +0.0000E+000 +7.0000E+000 +2.0367E+001 +2.7427E+001
Dcv_lin 10V_6U 3 +0.0000E+000 +5.0000E+000 +1.3331E+001 +1.8872E+001
I have to convert this text file to an Excel/xls file, but I can't figure out how to insert the values into the correct Excel columns because the number of spaces between columns varies. I've tried the code below, which uses a space as the separator, but it fails, of course, because of the varying number of spaces between the columns:
var lines = File.ReadAllLines(string.Concat(Directory.GetCurrentDirectory(), "\\Temp_textfile.txt"));
var rowcounter = 1;
foreach (var line in lines)
{
    var columncounter = 1;
    var values = line.Split(' ');
    foreach (var value in values)
    {
        excelworksheet.Cells[rowcounter, columncounter] = new Cell(value);
        columncounter++;
    }
    rowcounter++;
}
excelworkbook.Worksheets.Add(excelworksheet);
excelworkbook.Save(string.Concat(Directory.GetCurrentDirectory(), "\\Exported_excelfile.xls"));
Any advice?
EDIT: Got it working by using Substring to select each column by its fixed width.
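A sketch of the fixed-width approach from the EDIT. The column start offsets are hypothetical; you would measure them from your own file. Each column runs up to the next start offset (the last one to the end of the line), values are trimmed, and lines that are too short yield empty strings instead of throwing:

```csharp
using System;

public static class FixedWidth
{
    // Cuts one line into columns given each column's start offset
    // (offsets must be in ascending order).
    public static string[] Cut(string line, int[] starts)
    {
        var values = new string[starts.Length];
        for (int i = 0; i < starts.Length; i++)
        {
            int start = starts[i];
            int end = i + 1 < starts.Length ? starts[i + 1] : line.Length;
            if (start >= line.Length)
            {
                values[i] = string.Empty; // line ends before this column
                continue;
            }
            end = Math.Min(end, line.Length); // last column may be shorter
            values[i] = line.Substring(start, end - start).Trim();
        }
        return values;
    }
}
```

Each returned value could then be written with excelworksheet.Cells[rowcounter, columncounter] = new Cell(value); as in the question. If no field ever contains a space, line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries), which treats any run of whitespace as one separator, may be enough on its own.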

How can I divide a line from an input file into two parts and then compare them to two columns of a database table in C#?

Given a text file, how would I go about reading particular digits in a line?
Say I have a file 123.txt. How would I read each line and store the first 5 digits in one variable and the next 6 digits in another?
All I've seen is stuff involving storing the entire text file as a String array, but there are some complications: the text file is enormously large, and the machine the application will run on isn't exactly a top-notch system. Speed isn't the top priority, but it is definitely a major issue.
// Please Help here
// Want to compare data of input file with database table columns.
// How to split data in to parts
// Access that split data later for comparison.
// Data in input file is like,
//
// 016584824684000000000000000+
// 045787544574000000000000000+
// 014578645447000000000000000+
// 047878741489000000000000000+ and so on..
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt"); // Input file
// How can I divide lines from the input file into 2 parts (for ex. 01658 and 4824684) and save them in variables so that I can use them for comparing later?
string conStr = ConfigurationManager.ConnectionStrings["BVI"].ConnectionString;
cnn = new SqlConnection(conStr);
cnn.Open();
// So I want to compare first 5 digits of all lines of input file (ex. 01658)with Transit_ID and next 6 digits with Client_Account and then export matching rows in excel file.
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = " //(What should I put here to compare with first 5 digits of all lines of input file)" AND Client_Account = " ??" );
All I've seen is stuff involving storing the entire text file as a String array
Large text files should be processed by streaming one line at a time, so that you don't needlessly allocate a large amount of memory.
using (StreamReader sr = File.OpenText(path))
{
    string s;
    while ((s = sr.ReadLine()) != null)
    {
        // How would I go about reading line number and store first 5
        // digits in different variable and next 6 digits to another variable.
        string first = s.Substring(0, 5);  // characters 0-4
        string second = s.Substring(5, 6); // the 6 characters right after them (5-10)
    }
}
https://msdn.microsoft.com/en-us/library/system.io.file.opentext(v=vs.110).aspx
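The same streaming idea can be wrapped in an iterator so the rest of the code consumes one (Transit_ID, Client_Account) pair at a time; File.ReadLines also reads lazily, one line at a time. This sketch assumes the key is always the first 5 characters followed by the next 6, as in the sample data, and skips lines shorter than 11 characters; TransitFile and ReadKeys are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;

public static class TransitFile
{
    // Lazily yields (first 5 chars, next 6 chars) for each line of the file.
    // Lines shorter than 11 characters are skipped instead of throwing.
    public static IEnumerable<(string TransitId, string Account)> ReadKeys(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length < 11)
                continue;
            yield return (line.Substring(0, 5), line.Substring(5, 6));
        }
    }
}
```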
Just use Substring(int32, int32) to get the appropriate values like this:
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt");
List<string> first = new List<string>();
List<string> second = new List<string>();
foreach (string line in lines)
{
    first.Add(line.Substring(0, 5));  // characters 0-4
    second.Add(line.Substring(5, 6)); // the next 6 characters (5-10)
}
Though Eric's answer is much cleaner. This was just a quick and dirty proof of concept using your sample data. You should definitely use the using statement and StreamReader as he suggested.
first will contain the first 5 digits from each element in lines, and second will contain the next 6 digits.
Then to build your SQL, you'd do something like this:
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = @TransitId AND Client_Account = @ClientAcct";
SqlCommand cmd = new SqlCommand(sql, cnn);
for (int i = 0; i < first.Count; i++)
{
    // remove the previous pass's parameters so each name is declared only once
    cmd.Parameters.Clear();
    cmd.Parameters.AddWithValue("@TransitId", first[i]);
    cmd.Parameters.AddWithValue("@ClientAcct", second[i]);
    //execute your command and validate results
}
That will loop N times and run a command for each of the values in lines.

How to skip txt file chunks

How do I skip the parts of the file in the red boxes and continue reading only at the blue boxes? What adjustments would I need to make to 'fileReader'?
So far, with the help of SO users, I've been able to successfully skip the first 8 lines (first red box) and read the rest of the file. But now I want to read ONLY the parts indicated in blue.
I'm thinking of making a method for each chunk in blue. Basically, start by skipping the first 8 lines of the file if it's the first blue box, about 23 for the next blue box, but stopping the file reader at the end of a chunk is where I'm having problems. I simply don't know what to use.
private void button1_Click(object sender, EventArgs e)
{
    // Reading/Inputting column values
    OpenFileDialog ofd = new OpenFileDialog();
    if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        string[] lines = File.ReadAllLines(ofd.FileName).Skip(8).ToArray();
        textBox1.Lines = lines;
        int[] pos = new int[3] {0, 6, 18}; // set len & pos to read specific column values
        int[] len = new int[3] {6, 12, 28}; // only doing 3 columns right now
        foreach (string line in textBox1.Lines)
        {
            for (int j = 0; j < 3; j++) // 3 columns
            {
                val[j] = line.Substring(pos[j], len[j]).Trim();
                list.Add(val[j]); // column values stored in list
            }
        }
    }
}
Try something like this:
using System.Text.RegularExpressions; //add this using

foreach (string line in lines)
{
    string[] tokens = Regex.Split(line.Trim(), " +");
    int seq = 0;
    DateTime dt;
    if (tokens.Length > 0 && int.TryParse(tokens[0], out seq))
    {
        // parse this line - 1st type
    }
    else if (tokens.Length > 0 && DateTime.TryParse(tokens[0], out dt))
    {
        // parse this line - 2nd type
    }
    // else - don't parse the line
}
Regex.Split is handy for breaking a line on spaces up to the next token: the pattern " +" matches one or more spaces, so each run of spaces, however long, acts as a single separator. Based on your example, you only want to parse lines that begin with a number or a date, which this should accomplish. Note that I trimmed leading and trailing spaces from the line so that you don't split on those and get empty-string tokens.
As I understand it, you want to read everything:
(1) between a line ending with Numerics (possibly starting one line after it)
(2) and a line starting with 0Total (that is a zero, right?);
(3) between a line ending with CURREN
(4) and a line with 1 as the first character in the row.
Shouldn't be hard. Read the file line by line. When (1) or (3) occurs, start collecting lines until (2) or (4), respectively.
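A minimal sketch of that start/end logic, assuming the marker tests listed above (a line ending with Numerics or CURREN starts a chunk; a line starting with 0Total or 1, respectively, ends it) and that chunks never nest. The marker lines themselves are not kept, and the "possibly one line after" offset is not handled; ChunkReader is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkReader
{
    // Keeps only the lines between a start marker and its matching end marker.
    public static List<string> ReadChunks(IEnumerable<string> lines)
    {
        var kept = new List<string>();
        Func<string, bool> endOfChunk = null; // null means "not inside a chunk"

        foreach (string line in lines)
        {
            if (endOfChunk == null)
            {
                // Outside a chunk: look for a start marker and remember its end test.
                string trimmed = line.TrimEnd();
                if (trimmed.EndsWith("Numerics", StringComparison.Ordinal))
                    endOfChunk = l => l.StartsWith("0Total", StringComparison.Ordinal);
                else if (trimmed.EndsWith("CURREN", StringComparison.Ordinal))
                    endOfChunk = l => l.StartsWith("1", StringComparison.Ordinal);
            }
            else if (endOfChunk(line))
            {
                endOfChunk = null; // chunk finished; skip until the next start marker
            }
            else
            {
                kept.Add(line); // inside a chunk
            }
        }
        return kept;
    }
}
```

You could feed it File.ReadLines(ofd.FileName) and then apply the question's Substring column extraction to each kept line.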

split string to string array without losing text order

I have a problem I've been stuck on for 7 days, so I decided to ask you for help. Here is my problem:
I read data from a DataGridView (only 2 cells) and put all of it into a StringBuilder; it is really an article and a price, like an invoice (bill). Then I turn what is in the StringBuilder into a single string, intending to split it into lines, one under another. That part of my code works, but not as I want: the articles line up one below another, but the prices do not. Some are further left and some further right instead of all being in one vertical column, something like this:
Bread 10$
Egg 4$
Milk 5$
My code:
string[] lines;
StringBuilder sbd = new StringBuilder();
foreach (DataGridViewRow rowe in dataGridView2.Rows)
{
    sbd.Append(rowe.Cells[0].Value).Append(rowe.Cells[10].Value);
    sbd.Append("\n");
}
sbd.Remove(sbd.Length - 1, 1);
string userOutput = sbd.ToString();
lines = userOutput.Split(new string[] { "\r", "\n" },
    StringSplitOptions.RemoveEmptyEntries);
You can use the Trim method in order to remove existing leading and trailing spaces. With PadRight you can automatically add the right number of spaces in order to get a specified total length.
Also use a List<string> that grows automatically instead of using an array that you get from splitting what you just put together before:
List<string> lines = new List<string>();
foreach (DataGridViewRow row in dataGridView2.Rows) {
lines.Add( row.Cells[0].Value.ToString().Trim().PadRight(25) +
row.Cells[10].Value.ToString().Trim());
}
But keep in mind that this way of formatting works only if you display the string in a monospaced font (like Courier New or Consolas). Proportional fonts like Arial will yield jagged columns.
Alternatively, you can create an array of the right size by reading the number of lines from the Count property:
string[] lines = new string[dataGridView2.Rows.Count];
for (int i = 0; i < lines.Length; i++) {
DataGridViewRow row = dataGridView2.Rows[i];
lines[i] = row.Cells[0].Value.ToString().Trim().PadRight(25) +
row.Cells[10].Value.ToString().Trim();
}
You can also use the PadLeft method in order to right-align the amounts:
row.Cells[10].Value.ToString().Trim().PadLeft(10)
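The same layout can also be produced with the alignment component of .NET composite formatting, where a negative width left-aligns and a positive width right-aligns. The widths 25 and 10 below are just example values, InvoiceLine is a hypothetical name, and like the padding approach it only lines up in a monospaced font:

```csharp
public static class InvoiceLine
{
    // {0,-25}: article left-aligned in a 25-character field.
    // {1,10}:  price right-aligned in a 10-character field.
    public static string Format(string article, string price) =>
        string.Format("{0,-25}{1,10}", article.Trim(), price.Trim());
}
```

In the loop it would be called as InvoiceLine.Format(row.Cells[0].Value.ToString(), row.Cells[10].Value.ToString()).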
Have you tried the String.Split method?
String myString = "Bread ;10$;";
String[] parts = myString.Split(';');
String articleName = parts[0];
String price = parts[1];
