Text file to Excel - C#

I have a text file containing table contents such as the following:
|ID |SN| | Date | Code |Comp|Source| Format |Unit|BuyQTY|DoneQTY|YetQTY|Late
21C011 5 1080201 BAO-99 高雄 10P056 5X3X5M/R RBDC-18865LA M 10000 7000 3000 1
21C006 1 1080201 BAO-99 高雄 20A001 5X8X2M/R 高廠軟 Q 料 M 60000 40000 20000 1
21C002 6 1080201 BAO-99 高雄 10W013 5X1X5M/R PVC+UV M 202000 100500 101500
21C006 4 1080212 BAO-99 高雄 10P038 4X5X5M/R DIGI PACK M 255000 255000
21C006 5 1080212 BAO-99 高雄 10P039 4X6X5M/R DIGI PACK 295000 295000
21C006 6 1080212 BAO-99 高雄 10P040 4X2X5M/R DIGI PACK M 114000 114000
21C006 7 1080212 BAO-99 高雄 10P041 4X9X5M/R DIGI PACK M 49500 49500
Notice that there are many missing values, and the "Format" column varies in length.
I tried to read it into Excel as follows.
Because of the missing values and the varying "Format" lengths, I can NOT simply use Split.
I tried to use Graphics.MeasureString() to get the pixel width of the substring up to certain positions, mapping width ranges to columns.
For example, a width between 125 and 140 would be "Unit".
But because of the Chinese characters and spaces, the results are all "crooked"!
I can never get the values into the right columns!
Could somebody please teach me how to get this done correctly?
Much appreciated!
Update:
I'm writing a program for somebody else to do this task, so I CAN'T ask him to modify the original text in Notepad++ or any other software.
I also can NOT ask him to import the file into Excel and set the column widths himself!
All of this is for their convenience!
So I apologize if I can't make life any easier!
PS: The Chinese characters are Big5.
The following is the code I use to parse the text file into a DataGridView:
float[] colLens = new float[] { 137, 161, 301, 359, 400, 510, 760, 804, 872, 944, 1010, 1035, 1050 };
Graphics g = CreateGraphics();
string[] str = File.ReadAllLines(ofd.FileName, Encoding.GetEncoding("BIG5"));
for (int i = 0; i < str.Count(); i++)
{
    int c = 0;
    DataGridViewRow row = new DataGridViewRow();
    row.CreateCells(dgvMain);
    int d = -1;
    for (int j = 1; j < str[i].Length; j++)
    {
        string s = str[i].Substring(0, j);
        SizeF size = g.MeasureString(s, new Font("細明體", 12));
        for (int k = d + 1; k < colLens.Count() - 1; k++)
        {
            if (size.Width < colLens[k]) break;
            else if (size.Width < colLens[k + 1])
            {
                d = k;
                row.Cells[d].Value = str[i].Substring(c, j - c);
                c = j;
                break;
            }
        }
    }
    dgvMain.Rows.Add(row);
}

Chinese encodings are variable-length, whether Big5 or GB18030. This means that X is stored as a single byte while 高 is stored as two bytes. It seems that this file has a fixed byte length per field, not a fixed character length.
This means that code that expects a fixed character length won't be able to read this file easily. That includes Excel and probably every CSV handling library or code.
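You can see the variable width directly. A minimal check (note: on .NET Core/.NET 5+, Encoding.GetEncoding(950) additionally requires the System.Text.Encoding.CodePages package and the RegisterProvider call shown):
// Big5 widths: ASCII characters take 1 byte, CJK characters take 2
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core/5+ only
var big5 = Encoding.GetEncoding(950);
Console.WriteLine(big5.GetByteCount("X"));  // 1
Console.WriteLine(big5.GetByteCount("高")); // 2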
In the worst case, you can read the bytes directly from a file stream. Each set of bytes can be converted to a string using Encoding.GetString. You can get the Big5 encoding with Encoding.GetEncoding(950).
Encoding _big5 = Encoding.GetEncoding(950);
byte[] _buffer = new byte[90];

// Reads `length` bytes at absolute position `offset` and decodes them as Big5
public string GetField(FileStream stream, long offset, int length)
{
    stream.Seek(offset, SeekOrigin.Begin);
    var read = stream.Read(_buffer, 0, length);
    if (read > 0)
    {
        return _big5.GetString(_buffer, 0, read);
    }
    else
    {
        return "";
    }
}
// Quick & dirty way to skip to the end of the current line
public void SkipToLineEnd(FileStream stream)
{
    int c;
    while ((c = stream.ReadByte()) > -1)
    {
        if (c == '\n')
        {
            return;
        }
    }
}
You can construct a record from a line this way:
public MyRecord GetNextRecord(FileStream stream)
{
    var start = stream.Position;
    var record = new MyRecord
    {
        Id = GetField(stream, start + 0, 9),
        ...
        // 6 bytes, not just 4
        Comp = GetField(stream, start + 28, 6),
        ...
        // Start from 50, 16 bytes
        Format = GetField(stream, start + 50, 16)
    };
    SkipToLineEnd(stream);
    return record;
}
You can write an iterator method that reads records this way until it reaches the end of the file. A quick-and-dirty way to do that is to check whether the Position of the stream is so close to the end that no full record can be produced, e.g.:
public IEnumerable<MyRecord> GetRecords(FileStream stream, int recordLength)
{
    while (stream.Position < stream.Length - recordLength)
    {
        yield return GetNextRecord(stream);
    }
}
And use it like this:
var records = GetRecords(myStream, 96);
foreach (var record in records)
{
    ....
}
This will take care of trailing newlines and possibly broken last lines.
To skip the header lines, just call SkipToLineEnd() as many times as needed.
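For example, assuming the posted file has a single header line (a sketch reusing the names above):
// Skip the one header line, then iterate the fixed-length records
SkipToLineEnd(myStream);
foreach (var record in GetRecords(myStream, 96))
{
    // process each record here
}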
You could use a library like EPPlus to generate an Excel file directly from this, e.g.:
using (var p = new ExcelPackage())
{
    var ws = p.Workbook.Worksheets.Add("MySheet");
    ws.Cells.LoadFromCollection(records);
    p.SaveAs(new FileInfo(@"c:\workbooks\myworkbook.xlsx"));
}

My two cents:
First use Split to read from the start up to the Source column, then from Late back to Unit (note the "reverse" order). What is left is Format.
I.e., if the columns are fixed-width and ONLY Format causes problems:
var colsIdToSource = line.Substring(0, 200);            // assuming 200 is the total width of the columns up to Source
var colsUnitToLate = line.Substring(line.Length - 150); // likewise from Unit to Late
var formatColumn = line.Substring(200, line.Length - 350); // may need adjusting by a char or so
Then you process the known columns, as sketched below.
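A minimal sketch of processing those known columns, assuming whitespace-separated fields within each fixed-width block (the 200/150 widths are the placeholders from above, not measured values):
// Split the fixed-width head and tail on spaces; Format needs no further splitting
var headParts = colsIdToSource.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// headParts: ID, SN, Date, Code, Comp, Source
var tailParts = colsUnitToLate.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// tailParts: Unit, BuyQTY, DoneQTY, YetQTY, Late; trailing values may be missing,
// which is why they are read from Late back to Unit
var format = formatColumn.Trim();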
Good luck :)

This task is not as easy as it looks. The code works with the posted input but may need small adjustments. See the code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;

namespace ConsoleApplication100
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.csv";

        static void Main(string[] args)
        {
            //|ID |SN| | Date | Code |Comp|Source| Format |Unit|BuyQTY|DoneQTY|YetQTY|Late
            DataTable dt = new DataTable();
            dt.Columns.Add("ID", typeof(string));
            dt.Columns.Add("SN", typeof(string));
            dt.Columns.Add("Date", typeof(string));
            dt.Columns.Add("Code", typeof(string));
            dt.Columns.Add("Comp", typeof(string));
            dt.Columns.Add("Source", typeof(string));
            dt.Columns.Add("Format", typeof(string));
            dt.Columns.Add("Unit", typeof(string));
            dt.Columns.Add("BuyQTY", typeof(int));
            dt.Columns.Add("DoneQTY", typeof(int));
            dt.Columns.Add("YetQTY", typeof(int));
            dt.Columns.Add("Late", typeof(int));

            // Note: the original file is Big5; use Encoding.GetEncoding("BIG5") when reading it directly
            StreamReader reader = new StreamReader(FILENAME, Encoding.Unicode);
            string line = "";
            int lineCount = 0;
            while ((line = reader.ReadLine()) != null)
            {
                if ((++lineCount > 1) && (line.Trim().Length > 0))
                {
                    string leader = line.Substring(0, 30).Trim();
                    string source = line.Substring(31, 16).Trim();
                    string trailer = line.Substring(48).TrimStart();
                    string format = trailer.Substring(0, 12).TrimStart();
                    trailer = trailer.Substring(12).Trim();

                    DataRow newRow = dt.Rows.Add();
                    string[] splitLeader = leader.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                    newRow["ID"] = splitLeader[0].Trim();
                    newRow["SN"] = splitLeader[1].Trim();
                    newRow["Date"] = splitLeader[2].Trim();
                    newRow["Code"] = splitLeader[3].Trim();
                    newRow["Comp"] = splitLeader[4].Trim();
                    newRow["Source"] = source;
                    newRow["Format"] = format;
                    newRow["Unit"] = trailer.Substring(0, 4).Trim();
                    newRow["BuyQTY"] = int.Parse(trailer.Substring(4, 8));
                    string doneQTYstr = trailer.Substring(12, 8).Trim();
                    if (doneQTYstr.Length > 0)
                    {
                        newRow["DoneQTY"] = int.Parse(doneQTYstr);
                    }
                    if (trailer.Length <= 28)
                    {
                        newRow["YetQTY"] = int.Parse(trailer.Substring(20));
                    }
                    else
                    {
                        newRow["YetQTY"] = int.Parse(trailer.Substring(20, 8));
                        newRow["Late"] = int.Parse(trailer.Substring(28));
                    }
                }
            }
        }
    }
}

Related

Parsing a field and list of people from Excel files

I'm trying to parse Excel (.xls, .xlsx) files. The structure of the files is the same except for the number of records.
I need to parse the industry. In this case it is "FinTech". Since it is in one cell, I guess I have to use a regex such as ^Industry: (.*)$?
It has to find the row/column where the list of people starts and put it into an IEnumerable<Person>. It could use the following regexes.
A number always consists of 6 digits: ^[0-9]{6}$
A name consists of at least two words, each starting with a capital letter: ^([a-zA-Z]+\s?\b){2,}$
A test .xlsx file can be found here: https://docs.google.com/spreadsheets/d/15SR04cHXgGLWe0cuOOuuB5vUZigebh96/edit?usp=sharing&ouid=112418126731411268789&rtpof=true&sd=true
List of people
Normal condition
Industry: FinTech
# Number Name
1 226250 Zain Griffiths
2 226256 Michael Houghton
3 226259 Hugo Willis Johnson
4 226264 Anna-Maria Rose
The actual question
First of all, I'm not completely sure my regexes are correct. I was only able to display the rows and columns, but I'm not sure how to actually parse the industry and the list of people into an IEnumerable<Person>. How do I do that?
Snippet
// Program.cs
var excel = new ExcelParser();
var sheet1 = excel.Import(@"a.xlsx");
Console.OutputEncoding = Encoding.UTF8;
for (var i = 0; i < sheet1.Rows.Count; i++)
{
    for (var j = 0; j < sheet1.Columns.Count; j++)
    {
        var cell = sheet1.Rows[i][j].ToString()?.Trim();
        Console.Write($"Column: {cell} | ");
    }
    Console.WriteLine();
}
Console.ReadLine();
// ExcelParser.cs
public sealed class ExcelParser
{
    public ExcelParser()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public DataTable Import(string filePath)
    {
        // does file exist?
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException();
        }
        // .xls or .xlsx allowed
        var extension = new FileInfo(filePath).Extension.ToLowerInvariant();
        if (extension is not (".xls" or ".xlsx"))
        {
            throw new NotSupportedException();
        }
        // read .xls or .xlsx
        using var stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
        using var reader = ExcelReaderFactory.CreateReader(stream);
        var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
        {
            ConfigureDataTable = _ => new ExcelDataTableConfiguration
            {
                UseHeaderRow = false
            }
        });
        // Sheet1
        return dataSet.Tables[0];
    }
}
The structure of the files is the same except for the number of records
As long as the table is structured (or semi-structured), you can state one or two simple assumptions and parse the tables based on them; when the structure does not follow the assumptions, you return false (throw an exception, etc.).
Actually, designing regexes to parse the table is just a way of encoding assumptions. I want to keep it simple, so, based on the problem statement, here are my assumptions:
There will be an "industry" (or "industry:"; call .ToLower()) string in a separate cell (the regex will do nothing more than find such a string), and the industry's name will be in the same cell.[1]
The first person's name will be next to the first 6-digit-number cell.[2]
Here is the code:
public (string industryName, List<string> peopleNames) ParseSheet(DataTable sheet1)
{
    // 1. Get indices of the industry cell and the first name in people names
    var industryCellIndex = (-1, -1, false);
    var peopleFirstCellIndex = (-1, -1, false);
    for (var i = 0; i < sheet1.Rows.Count; i++)
    {
        for (var j = 0; j < sheet1.Columns.Count; j++)
        {
            // .ToLower() added
            var cell = sheet1.Rows[i][j].ToString()?.Trim().ToLower();
            if (cell.StartsWith("industry"))
            {
                industryCellIndex = (i, j, true);
                break;
            }
            // the name after the first 6-digit number cell will be the first name in the people records
            if (cell.Length == 6 && int.TryParse(cell, out _))
            {
                peopleFirstCellIndex = (i, j + 1, true);
                break;
            }
        }
        if (industryCellIndex.Item3 && peopleFirstCellIndex.Item3)
            break;
    }
    if (!industryCellIndex.Item3 || !peopleFirstCellIndex.Item3)
    {
        // throw new Exception("Excel file is not normalized!");
        return (null, null);
    }
    // 2. retrieve the desired data
    var industryName = sheet1.Rows[industryCellIndex.Item1][industryCellIndex.Item2]
        .ToString()
        .Replace(":", ""); // will do nothing if there is no ":"
    industryName = industryName.Substring(
        industryName.IndexOf("industry", StringComparison.OrdinalIgnoreCase) + "industry".Length);
    var peopleNames = new List<string>();
    var colIndex = peopleFirstCellIndex.Item2;
    for (var rowIndex = peopleFirstCellIndex.Item1;
         rowIndex < sheet1.Rows.Count;
         rowIndex++)
    {
        peopleNames.Add(sheet1.Rows[rowIndex][colIndex].ToString()?.Trim());
    }
    return (industryName, peopleNames);
}
[1] If this assumption needs some editing (for example, the industry name might be in the next cell after the one containing the "industry" string), the idea stays the same: account for it in the parsing.
[2] And, for example, after the "#" cell by 2 columns and 1 row.
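Since the question asked for an IEnumerable<Person>, here is a minimal sketch of mapping the located cells into such a type, assuming a hypothetical Person record and the same 6-digit-number assumption:
public record Person(int Number, string Name); // hypothetical shape; adjust to your model

public IEnumerable<Person> ParsePeople(DataTable sheet)
{
    for (var i = 0; i < sheet.Rows.Count; i++)
    {
        for (var j = 0; j < sheet.Columns.Count - 1; j++)
        {
            var cell = sheet.Rows[i][j].ToString()?.Trim();
            // a 6-digit numeric cell marks a person row; the name sits in the next column
            if (cell?.Length == 6 && int.TryParse(cell, out var number))
            {
                var name = sheet.Rows[i][j + 1].ToString()?.Trim();
                if (!string.IsNullOrEmpty(name))
                    yield return new Person(number, name);
            }
        }
    }
}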

How do I split my textfile into a 2d array with delimiters in c#?

My text file contains contents such as:
1/1/2018;0;29;10
1/2/2018;0;16;1
1/3/2018;0;32;1
1/4/2018;0;34;15
1/5/2018;0;19;2
1/6/2018;0;21;2
Further down in the text file are decimals, which is why I am trying to use double:
1/29/2018;0.32;52;38
1/30/2018;0.06;44;21
I am trying to split on the semicolons and assign each value between them to a 2D array with 31 rows and 4 columns.
private void button1_Click(object sender, EventArgs e)
{
    // 2d array
    double[,] testarray = new double[31, 4];
    string inputFile = File.ReadAllText("testywesty.txt");
    char[] spearator = { ';', ' ' };
    for (int row = 0; row < 31; row++)
    {
        for (int column = 0; column < 4; column++)
        {
            string[] strlist = inputFile.Split(spearator);
            testarray[row, column] = double.Parse(strlist[column]);
        }
    }
}
I believe that I have the right loops needed to insert my values into the 2D array; however, I am getting an error on my input, and I believe it is because of the slashes.
Is my code sufficient for holding the text file's contents in my array? And how do I deal with the '/' characters?
You'd be getting an error on the slashes because you're trying to convert them to doubles, which is not possible. One thing you can do is first convert them to a DateTime using the Parse method of that class, and then use the ToOADate method to convert it to a double. Note that if you need to convert it back, you can use the DateTime.FromOADate method as I've done in the output below.
Also, it might be helpful to use File.ReadAllLines to read the file into an array of strings, where each string is a file line (this assumes that each line contains the four parts you need as you've shown in the sample file contents in the question). This way each line represents a row, and then we can split that line to get our columns.
For example:
private static void button1_Click(object sender, EventArgs e)
{
    var lines = File.ReadAllLines("testywesty.txt");
    var items = new double[lines.Length, 4];
    var delims = new[] { ';', ' ' };
    for (var line = 0; line < lines.Length; line++)
    {
        var parts = lines[line].Split(delims);
        var maxParts = Math.Min(parts.Length, items.GetLength(1));
        for (var part = 0; part < maxParts; part++)
        {
            if (part == 0)
            {
                // Parse first item to a date then use ToOADate to make it a double
                items[line, part] = DateTime.Parse(parts[part]).ToOADate();
            }
            else
            {
                items[line, part] = double.Parse(parts[part]);
            }
        }
    }
    // Show the output
    var output = new StringBuilder();
    for (var row = 0; row < items.GetLength(0); row++)
    {
        var result = new List<string>();
        for (var col = 0; col < items.GetLength(1); col++)
        {
            result.Add(col == 0
                ? DateTime.FromOADate(items[row, col]).ToShortDateString()
                : items[row, col].ToString());
        }
        output.AppendLine(string.Join(", ", result));
    }
    MessageBox.Show(output.ToString(), "Results");
}
Of course, you can read the data and parse it into an array. But since the data is polymorphic, the array needs to be of type object[,]. This is how I would approach it:
class Program
{
    static void Main(string[] args)
    {
        object[,] array = ReadFileAsArray("testywesty.txt");
    }

    static object[,] ReadFileAsArray(string file)
    {
        // how long is the file?
        // read it twice, once to count the rows
        // and a second time to read each row in
        int rows = 0;
        var fs = File.OpenText(file);
        while (!fs.EndOfStream)
        {
            fs.ReadLine();
            rows++;
        }
        fs.Close();
        var array = new object[rows, 4];
        fs = File.OpenText(file);
        int row = 0;
        while (!fs.EndOfStream)
        {
            // read line
            var line = fs.ReadLine();
            // split line into string parts at every ';'
            var parts = line.Split(';');
            // if 1st part is a date, store it in the 1st column
            if (DateTime.TryParse(parts[0], out DateTime date))
            {
                array[row, 0] = date;
            }
            // if 2nd part is a float, store it in the 2nd column
            if (float.TryParse(parts[1], out float x))
            {
                array[row, 1] = x;
            }
            // if 3rd part is an integer, store it in the 3rd column
            if (int.TryParse(parts[2], out int a))
            {
                array[row, 2] = a;
            }
            // if 4th part is an integer, store it in the 4th column
            if (int.TryParse(parts[3], out int b))
            {
                array[row, 3] = b;
            }
            row++;
        }
        fs.Close();
        return array;
    }
}
But I feel this is clunky. If the data types represented by the file are predetermined, then filling a collection of a custom type feels more natural in C#, as you let the type handle its own data and parsing. Consider the example below:
class Program
{
    static void Main(string[] args)
    {
        IEnumerable<MyData> list = ReadFileAsEnumerable("testywesty.txt");
        Debug.WriteLine(MyData.ToHeading());
        foreach (var item in list)
        {
            Debug.WriteLine(item);
        }
        // date x a b
        // 1/1/2018 0 29 10
        // 1/2/2018 0 16 1
        // 1/3/2018 0 32 1
        // 1/4/2018 0 34 15
        // 1/5/2018 0 19 2
        // 1/6/2018 0 21 2
        // 1/29/2018 0.32 52 38
        // 1/30/2018 0.06 44 21
    }

    public static IEnumerable<MyData> ReadFileAsEnumerable(string file)
    {
        var fs = File.OpenText(file);
        while (!fs.EndOfStream)
        {
            yield return MyData.Parse(fs.ReadLine());
        }
        fs.Close();
    }
}

/// <summary>
/// Stores a row of my data
/// </summary>
/// <remarks>
/// Mutable structures are evil. Make all properties read-only.
/// </remarks>
public struct MyData
{
    public MyData(DateTime date, float number, int a, int b)
    {
        this.Date = date;
        this.Number = number;
        this.A = a;
        this.B = b;
    }

    public DateTime Date { get; }
    public float Number { get; }
    public int A { get; }
    public int B { get; }

    public static MyData Parse(string line)
    {
        // split line into string parts at every ';'
        var parts = line.Split(';');
        // if 1st part is a date, store it in the 1st column
        if (DateTime.TryParse(parts[0], out DateTime date)) { }
        // if 2nd part is a float, store it in the 2nd column
        if (float.TryParse(parts[1], out float number)) { }
        // if 3rd part is an integer, store it in the 3rd column
        if (int.TryParse(parts[2], out int a)) { }
        // if 4th part is an integer, store it in the 4th column
        if (int.TryParse(parts[3], out int b)) { }
        return new MyData(
            date,
            number,
            a,
            b);
    }

    public static string ToHeading()
    {
        return $"{"date",-11} {"x",-4} {"a",-4} {"b",-4}";
    }

    public override string ToString()
    {
        return $"{Date.ToShortDateString(),-11} {Number,4} {A,4} {B,4}";
    }
}

Find the maximum length of every column in a csv file

So, I was trying to present a CSV document in a console application. However, due to the varying text lengths in it, the output was not in a presentable format.
To present it, I tried to find the maximum length of text in each column and then pad the shorter values in that column with whitespace so that all columns line up.
I got as far as counting characters, but I can't figure out how to proceed from there.
var file = File.ReadAllLines(@"E:\File.csv");
var lineList = file.Select(x => x.Split(',').ToList()).ToList();
int maxColumn = lineList.Select(x => x.Count).Max(x => x);
List<int> maxElementSize = new List<int>();
for (int i = 0; i < maxColumn; i++)
{
    //Some Logic
}
Any help would be highly appreciated.
Here's a sample console application that gets the maximum character length for each column:
static void Main(string[] args)
{
    string CSVPath = @"D:\test.csv";
    string outputText = "";
    using (var reader = File.OpenText(CSVPath))
    {
        outputText = reader.ReadToEnd();
    }
    var colSplitter = ',';
    var rowSplitter = new char[] { '\n' };
    var rows = (from row in outputText.Split(rowSplitter, StringSplitOptions.RemoveEmptyEntries)
                let cols = row.Split(colSplitter)
                select new { totalCols = cols.Length, cols }).ToList();
    int[] maxColLengths = new int[rows.Max(o => o.totalCols)];
    for (int i = 0; i < rows.Count; i++)
    {
        for (int j = 0; j < rows[i].cols.Length; j++)
        {
            int curLength = rows[i].cols[j].Trim().Length;
            if (curLength > maxColLengths[j])
                maxColLengths[j] = curLength;
        }
    }
    Console.WriteLine(string.Join(", ", maxColLengths));
}
Hope this helped.
Try with a nested for loop:
var inputLines = File.ReadAllLines(@"E:\File.csv");
Dictionary<int, int> dictIndexLenght = new Dictionary<int, int>();
foreach (var line in inputLines)
{
    List<string> columList = line.Split(',').ToList();
    for (int i = 0; i < columList.Count; i++)
    {
        int tempVal = 0;
        if (dictIndexLenght.TryGetValue(i, out tempVal))
        {
            if (tempVal < columList[i].Length)
            {
                dictIndexLenght[i] = columList[i].Length;
            }
        }
        else
        {
            dictIndexLenght[i] = columList[i].Length;
        }
    }
}
You can check the result with these lines of code:
for (int i = 0; i < dictIndexLenght.Count; i++)
{
    Console.WriteLine("Column {0} : {1}", i, dictIndexLenght[i]);
}
Here's how I would do it, very similar to un-lucky's answer, only using a List<int> instead of a Dictionary<int, int>. I added dummy data for testing, but the actual call to read the file is left in there, so you can just remove the dummy data and the line that reads it, and it should work fine:
static void Main(string[] args)
{
    var fileLines = new List<string>
    {
        "Lorem, Ipsum, is, simply, dummy, text, of, the, printing, and, typesetting,",
        "industry., Lorem, Ipsum, has, been, the, industry's, standard, dummy, text,",
        "ever, since, the, 1500s, when, an, ",
        "unknown, printer, took, a, galley, of, type, and, scrambled, it, to, make,",
        "a, type, specimen, book.,",
        "It, has, survived, not, only, five, centuries, but, also, the, leap,",
        "into, electronic, typesetting, remaining, essentially, unchanged.,",
        "It, was, popularised, in, the, 1960s, with, the, release,",
        "of, Letraset, sheets, containing, Lorem, Ipsum, passages, and, more, ",
        "recently, with, desktop, publishing,",
        "software, like, Aldus, PageMaker, including, versions, of, Lorem, Ipsum."
    };
    var filePath = @"f:\public\temp\temp.csv";
    var fileLinesColumns = File.ReadAllLines(filePath).Select(line => line.Split(','));
    var colWidths = new List<int>();
    // Remove this line to use file data
    fileLinesColumns = fileLines.Select(line => line.Split(','));
    // Get the max length of each column and add it to our list
    foreach (var fileLineColumns in fileLinesColumns)
    {
        for (int i = 0; i < fileLineColumns.Length; i++)
        {
            if (i > colWidths.Count - 1)
            {
                colWidths.Add(fileLineColumns[i].Length);
            }
            else if (fileLineColumns[i].Length > colWidths[i])
            {
                colWidths[i] = fileLineColumns[i].Length;
            }
        }
    }
    // Write out our columns, padding each one to match the longest line
    foreach (var fileLineColumns in fileLinesColumns)
    {
        for (int i = 0; i < fileLineColumns.Length; i++)
        {
            Console.Write(fileLineColumns[i].PadRight(colWidths[i]));
        }
        Console.WriteLine();
    }
    Console.Write("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}
Initialise your list, then loop over your lines, and within each line, loop over your columns:
for (int j = 0; j < maxColumn; j++)
{
    maxElementSize.Add(0);
}
for (int i = 0; i < lineList.Count; i++)
{
    for (int j = 0; j < lineList[i].Count; j++)
    {
        if (lineList[i][j].Length > maxElementSize[j])
            maxElementSize[j] = lineList[i][j].Length;
    }
}
I use the following Python code to make sure the columns in a database are large enough to take the CSV input data:
#!/usr/bin/python3
import array as arr
from csv import reader
import argparse

def csv_getFldLens(in_file, has_header=0, delimiter=','):
    # open file in read mode
    fldMaxLens = arr.array('i')
    headers = []
    has_header = has_header
    with open(in_file, 'r') as read_obj:
        # pass the file object to reader() to get the reader object
        csv_reader = reader(read_obj, delimiter=delimiter)
        # Iterate over each row in the csv using reader object
        rcnt = 0
        lastIndx = 0
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            # print(row)
            if has_header and rcnt == 0:
                for fld in row:
                    headers.append(fld)
                rcnt += 1
                continue
            j = 0
            for fld in row:
                fldLen = len(fld)
                if (lastIndx == 0) or (lastIndx < j):
                    # print("if --- li, i: ", lastIndx, i, "\n")
                    fldMaxLens.append(fldLen)
                    lastIndx = j
                else:
                    # print("else --- li, i: ", lastIndx, i, "\n")
                    v1 = fldMaxLens[j]
                    v2 = fldLen
                    fldMaxLens[j] = max(v1, v2)
                j = j + 1
            rcnt += 1
    j = 0
    if has_header:
        for f in headers:
            print(f, ": ", fldMaxLens[j])
            j += 1
    else:
        for i in fldMaxLens:
            print("Col[", j + 1, "]: ", fldMaxLens[j])
            j += 1

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Get column lengths of CSV fields.')
    parser.add_argument('--in_file', default='', help='The CSV input file')
    parser.add_argument('--has_header', action='store_true', help='The CSV file has headers')
    parser.add_argument('--delimiter', default=',', help='Sets the delimiter. Default is comma \',\'.')
    args = parser.parse_args()
    csv_getFldLens(in_file=args.in_file, has_header=args.has_header, delimiter=args.delimiter)

Spreadsheet column values not displaying correctly in console window after Z

The code below takes in data from an Excel spreadsheet, validates the data against a set of predefined rules, and writes any errors to the console.
This works up to a point: the data comes back as expected up to column Z. If any errors are returned past Z (AA, AB, AC, etc.), the returned values start messing up and I get values like ], ~, ?. I believe this issue is down to ASCII, as I am starting from decimal 65 (A). I guess I need to write some kind of method that can cope with this, but I do not know where to start. Any help is much appreciated.
namespace WorksheetValidator
{
    public class XcelReader
    {
        private readonly List<List<IRule>> m_Rules;

        public XcelReader(List<List<IRule>> rules)
        {
            m_Rules = rules;
        }

        public void ValidateWorksheet(string fileName)
        {
            bool allRulesPassed = true;
            WorkbookProvider workbookProvider = new WorkbookProvider();
            IWorkbook workbook;
            using (FileStream fileStream = File.OpenRead(fileName))
                workbook = workbookProvider.GetWorkbook(fileStream, SpreadsheetType.Xlsx);
            for (int rowCounter = 1; rowCounter < workbook.Worksheets[1].Rows.Count; rowCounter++)
            {
                IRow row = workbook.Worksheets[1].Rows[rowCounter];
                for (int columnCounter = 0; columnCounter < row.Cells.Count; columnCounter++)
                {
                    List<string> failedRules = ColumnValueIsValid(row.Cells[columnCounter].Value, m_Rules[columnCounter]);
                    failedRules.ForEach(failedRule =>
                    {
                        allRulesPassed = false;
                        Console.WriteLine("\n[{0}:{1}] Failed: {2}", rowCounter + 1, (char)(columnCounter + 65), failedRule);
                    });
                }
            }
            if (allRulesPassed)
                Console.WriteLine("\n\n\nWOOHOO! worksheet is hunky dory");
        }

        private List<string> ColumnValueIsValid(string value, List<IRule> rules)
        {
            List<string> failedRules = new List<string>();
            rules.ForEach(rule =>
            {
                if (!rule.IsValid(value))
                    failedRules.Add(rule.GetReasonForFailure(value));
            });
            return failedRules;
        }
    }
}
Replace this:
(char)(columnCounter + 65)
with a function that converts 0 to "A", 1 to "B", ..., 26 to "AA", 27 to "AB", etc.
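A minimal sketch of such a function, treating the zero-based index as a bijective base-26 number (the name ToColumnName is made up here):
// Converts a zero-based column index to an Excel-style column name:
// 0 => "A", 25 => "Z", 26 => "AA", 27 => "AB", 701 => "ZZ", 702 => "AAA"
static string ToColumnName(int columnIndex)
{
    var name = string.Empty;
    for (int n = columnIndex; n >= 0; n = n / 26 - 1)
    {
        name = (char)('A' + n % 26) + name;
    }
    return name;
}
The Console.WriteLine call above would then use ToColumnName(columnCounter) instead of the char cast.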

how to optimize data load from binary file

I have a binary file, encoded little-endian, containing ~250,000 values of var1 followed by the same number of values of var2. I need to write a method that reads the file and returns a DataSet with those values in the columns var1 and var2.
I am using the miscutil library mentioned here on SO multiple times; see here as well for details: will there be an update on MiscUtil for .Net 4?
Thanks a lot, Jon Skeet, for making it available. :)
I have the following code working. I am interested in better ideas on how to minimize the for loops that read from the file and populate the DataTable. Any suggestions?
private static DataSet parseBinaryFile(string filePath)
{
    var result = new DataSet();
    var table = result.Tables.Add("Data");
    table.Columns.Add("Index", typeof(int));
    table.Columns.Add("rain", typeof(float));
    table.Columns.Add("gnum", typeof(float));
    const int samplesCount = 259200; // 720 * 360
    float[] vRain = new float[samplesCount];
    float[] vStations = new float[samplesCount];
    try
    {
        if (string.IsNullOrWhiteSpace(filePath) || !File.Exists(filePath))
        {
            throw new ArgumentException(string.Format("Unable to open the file: '{0}'", filePath));
        }
        // at this point FilePath is valid and exists...
        using (FileStream fs = new FileStream(filePath, FileMode.Open))
        {
            // We are using the library found here: http://www.yoda.arachsys.com/csharp/miscutil/
            var reader = new MiscUtil.IO.EndianBinaryReader(MiscUtil.Conversion.LittleEndianBitConverter.Little, fs);
            int i = 0;
            while (reader.BaseStream.Position < reader.BaseStream.Length) //while (pos < length)
            {
                // Read Data
                float buffer = reader.ReadSingle();
                if (i < samplesCount)
                {
                    vRain[i] = buffer;
                }
                else
                {
                    vStations[i - samplesCount] = buffer;
                }
                ++i;
            }
            Console.WriteLine("number of reads was: {0}", (i / 2).ToString("N0"));
        }
        for (int j = 0; j < samplesCount; ++j)
        {
            table.Rows.Add(new object[] { j + 1, vRain[j], vStations[j] });
        }
    }
    catch (Exception exc)
    {
        Debug.WriteLine(exc.Message);
    }
    return result;
}
Option #1
Read the entire file into memory (or memory-map it) and loop once; see the sketch below.
Option #2
Add all the data table rows as you read the var1 section, using a placeholder value for var2. Then fix up the data table as you read the var2 section.
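A minimal sketch of Option #1, assuming the layout described in the question (samplesCount floats of var1 followed by samplesCount floats of var2); note BitConverter.ToSingle uses the machine's byte order, which is little-endian on x86/x64:
// Read the whole file once, then index into each half of the byte array
byte[] bytes = File.ReadAllBytes(filePath);
int samplesCount = bytes.Length / (2 * sizeof(float));
for (int j = 0; j < samplesCount; j++)
{
    float rain = BitConverter.ToSingle(bytes, j * sizeof(float));
    float gnum = BitConverter.ToSingle(bytes, (samplesCount + j) * sizeof(float));
    table.Rows.Add(new object[] { j + 1, rain, gnum });
}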
