Unpivot a table in C#

I'm building a function in C# to unpivot a complex table in a CSV file and insert it into a SQL table. The file looks something like this:
| 1/5/2018 | 1/5/2018 | 1/6/2018 | 1/6/2018...
City: | min: | max: | min: | max:
Boston(KBOS) | 1 | 10 | 5 | 12
My goal is to unpivot it like so:
airport_code | localtime | MinTemp | MaxTemp
KBOS | 1/5/2018 | 1 | 10
KBOS | 1/6/2018 | 5 | 12
My strategy is:
Store the first row of dates and the second row of headers into arrays
Use a CSV parser to read each following line and loop through each field
If the date that corresponds to the current field is the same as the previous one, it belongs in the same row. Put the data into the appropriate field.
Since there are only two temperature fields for each row, the row is now complete and can be inserted.
Otherwise, start a new row and put the data into the appropriate field.
However, I'm running into a problem: once insertRow is populated and inserted, I can't overwrite it or null out all its fields and reuse it - that throws an error saying the row has already been inserted. I can't move the declaration of insertRow inside the for loop because I need to preserve the data through multiple iterations to completely fill out the row. So instead I tried to declare it outside the loop but only initialize it inside the loop, something like:
if (insertRow == null)
{
    insertRow = MyDataSet.tblForecast.NewtblForecastRow();
}
But that throws a "use of unassigned local variable" error. Any ideas about how I can preserve insertRow on some iterations and dispose of it on others? Or, any suggestions about a better way to do what I'm looking for? The relevant portion of the code is below:
using (TextFieldParser csvParser = new TextFieldParser(FileName))
{
    csvParser.SetDelimiters(new string[] { "," });
    csvParser.ReadLine(); // Skip top line
    string[] dateList = csvParser.ReadFields();   // Get dates from second line
    string[] fieldNames = csvParser.ReadFields(); // Get headers from third line
    // Read through file
    while (!csvParser.EndOfData)
    {
        DataSet1.tblForecastRow insertRow = MyDataSet.tblForecast.NewtblForecastRow();
        string[] currRec = csvParser.ReadFields();
        // Get airport code
        string airportCode = currRec[0].Substring(currRec[0].LastIndexOf("(") + 1, 4);
        // Unpivot record
        DateTime currDate = DateTime.Parse("1/1/1900"); // initialize
        DateTime prevDate;
        for (int i = 1; i < fieldNames.Length; i++) // skip first col
        {
            prevDate = currDate; // previous date is the prior current date
            DateTime.TryParse(dateList[i], out currDate); // set new current date
            int val;
            int.TryParse(currRec[i], out val);
            switch (fieldNames[i].ToLower())
            {
                case "min:":
                    insertRow["MinTemp"] = val;
                    break;
                case "max:":
                    insertRow["MaxTemp"] = val;
                    break;
            }
            if (currDate == prevDate) // if same date, at end of row, insert
            {
                insertRow["airport_code"] = airportCode;
                insertRow["localTime"] = currDate;
                insertRow["Forecasted_date"] = DateTime.Today;
                MyDataSet.tblForecast.AddtblForecastRow(insertRow);
                ForecastTableAdapter.Update(MyDataSet.tblForecast);
            }
        }
    }
}

You create a new row once you've finished handling the current one. And you already know where that happens:
if (currDate == prevDate) // if same date, at end of row, insert
{
    insertRow["airport_code"] = airportCode;
    insertRow["localTime"] = currDate;
    insertRow["Forecasted_date"] = DateTime.Today;
    // we're storing insertRow
    MyDataSet.tblForecast.AddtblForecastRow(insertRow);
    // now it gets saved (note: this Update runs once per row, which is often)
    ForecastTableAdapter.Update(MyDataSet.tblForecast);
    // OKAY, let's create the new insertRow instance
    insertRow = MyDataSet.tblForecast.NewtblForecastRow();
    // and the next time we end up in this if,
    // the row we just created will be the one inserted
}
Your initial Row can be created outside the loop:
// first row creation
DataSet1.tblForecastRow insertRow = MyDataSet.tblForecast.NewtblForecastRow();
// Read through file
while (!csvParser.EndOfData)
{
    // (the row creation that used to be here has moved out of the loop)
    string[] currRec = csvParser.ReadFields();
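Stitched together, the reworked loop looks like this (a sketch; everything between ReadFields and the final if is the OP's existing parsing code, unchanged):
// create the first row once, before the loop
DataSet1.tblForecastRow insertRow = MyDataSet.tblForecast.NewtblForecastRow();
while (!csvParser.EndOfData)
{
    string[] currRec = csvParser.ReadFields();
    // ... airport code, date tracking and min/max assignment as before ...
    if (currDate == prevDate) // both temps seen, so the row is complete
    {
        insertRow["airport_code"] = airportCode;
        insertRow["localTime"] = currDate;
        insertRow["Forecasted_date"] = DateTime.Today;
        MyDataSet.tblForecast.AddtblForecastRow(insertRow);
        ForecastTableAdapter.Update(MyDataSet.tblForecast);
        insertRow = MyDataSet.tblForecast.NewtblForecastRow(); // fresh row for the next date
    }
}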


I want to split column data into different columns

I have data in a column that I want to split into different columns.
The data in the column is not consistent.
e.g.:
974/mt (ICD TKD)
974/mt (+AD 91.27/mt, ICD/TKD)
970-980/mt
970-980/mt
I have tried with Substring but haven't found a solution.
OUTPUT SHOULD BE:
min  | max | unit | description
-----|-----|------|------------
NULL | 974 | /mt  | ICD TKD
NULL | 974 | /mt  | +AD 91.27/mt, ICD/TKD
970  | 980 | /mt  | NULL
You can use Regex to parse the information, and then add columns with the parsed data.
Assumptions (due to lack of clarity in the OP):
Min Value is optional
If present, Min Value is followed by a "-", then Max Value
Description is optional
Since the OP hasn't mentioned what to assume when Min Value is not available, I have used the string type for the Min/Max values, but these should ideally be replaced by an apt DataType.
public Sample Split(string columnValue)
{
    var regex = new Regex(@"(?:(?<min>\d+)-)?(?<max>\d+)(?<unit>[\/a-zA-Z]+)\s?(\((?<description>(.+))\))?", RegexOptions.Compiled);
    var match = regex.Match(columnValue);
    if (match.Success)
    {
        return new Sample
        {
            Min = match.Groups["min"].Value,
            Max = match.Groups["max"].Value,
            Unit = match.Groups["unit"].Value,
            Description = match.Groups["description"].Value
        };
    }
    return default;
}
public class Sample
{
    public string Min { get; set; }
    public string Max { get; set; }
    public string Unit { get; set; }
    public string Description { get; set; }
}
For example:
var list = new[]
{
    "974/mt (ICD TKD)",
    "974/mt (+AD 91.27/mt, ICD/TKD)",
    "970-980/mt",
    "970-980/mt"
};
foreach (var item in list)
{
    var result = Split(item);
    Console.WriteLine($"Min={result.Min},Max={result.Max},Unit={result.Unit},Description={result.Description}");
}
Output
Min=,Max=974,Unit=/mt,Description=ICD TKD
Min=,Max=974,Unit=/mt,Description=+AD 91.27/mt, ICD/TKD
Min=970,Max=980,Unit=/mt,Description=
Min=970,Max=980,Unit=/mt,Description=
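The question asks for the values as table columns; here is a minimal sketch of loading the parsed pieces into a System.Data.DataTable (the table and column names are assumptions, not from the OP):
var table = new DataTable();
table.Columns.Add("min");
table.Columns.Add("max");
table.Columns.Add("unit");
table.Columns.Add("description");
foreach (var item in list)
{
    var result = Split(item);
    if (result != null) // Split returns default (null) when the pattern doesn't match
        table.Rows.Add(result.Min, result.Max, result.Unit, result.Description);
}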

EPPlus number format

I have an Excel sheet generated with EPPlus. I am experiencing some pain points and wish to be directed by someone who has solved a similar challenge.
I need to apply number formatting to a double value, and I want to present it in Excel like this:
8 → 8.0
12 → 12.0
14.54 → 14.5
0 → 0.0
Here is my code
ws.Cells[row, col].Style.Numberformat.Format = "##0.0";
The final Excel file always appends E+0 to the end of this format and therefore presents the final values like this instead:
8 → 8.0E+0
12 → 12.0E+0
14.54 → 14.5E+0
0 → 000.0E+0
When I check in the format cells of the generated Excel sheet, I see that my format appears as ##0.0E+2 instead of ##0.0 that I applied.
What may be wrong?
Here are some number format options for EPPlus:
//integer (not really needed unless you need to round numbers, Excel will use default cell properties)
ws.Cells["A1:A25"].Style.Numberformat.Format = "0";
//integer without displaying the number 0 in the cell
ws.Cells["A1:A25"].Style.Numberformat.Format = "#";
//number with 1 decimal place
ws.Cells["A1:A25"].Style.Numberformat.Format = "0.0";
//number with 2 decimal places
ws.Cells["A1:A25"].Style.Numberformat.Format = "0.00";
//number with 2 decimal places and thousand separator
ws.Cells["A1:A25"].Style.Numberformat.Format = "#,##0.00";
//number with 2 decimal places and thousand separator and money symbol
ws.Cells["A1:A25"].Style.Numberformat.Format = "€#,##0.00";
//percentage (1 = 100%, 0.01 = 1%)
ws.Cells["A1:A25"].Style.Numberformat.Format = "0%";
//accounting number format
ws.Cells["A1:A25"].Style.Numberformat.Format = "_-$* #,##0.00_-;-$* #,##0.00_-;_-$* \"-\"??_-;_-#_-";
Don't change the decimal and thousand separators to your own
localization. Excel will do that for you.
By request some DateTime formatting options.
//default DateTime pattern
worksheet.Cells["A1:A25"].Style.Numberformat.Format = DateTimeFormatInfo.CurrentInfo.ShortDatePattern;
//custom DateTime pattern
worksheet.Cells["A1:A25"].Style.Numberformat.Format = "dd-MM-yyyy HH:mm";
An addition to the accepted answer: because Value accepts an object, you must pass a number to Value. For example, if your input is a string:
var input = "5";
ws.Cells["A1:A25"].Value = double.Parse(input);
Another addition to the accepted answer: you can use nullable values and the formatting all looks good, BUT the value ends up being a string in Excel and you can't SUM, AVG, etc.
So make sure you use the actual Value of the nullable.
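A minimal sketch of what that means in practice (GetCost here is a hypothetical source of a double?, and row/col are placeholder coordinates):
double? cost = GetCost(); // hypothetical nullable source
if (cost.HasValue)
{
    // assign the underlying double so Excel stores a real number, not a string
    ws.Cells[row, col].Value = cost.Value;
}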
And if you want to format a specific column, like column "B", to a number format, you can do it this way:
using (var package = new ExcelPackage())
{
    var worksheet = package.Workbook.Worksheets.Add("SHEET1");
    worksheet.Cells["A1"].LoadFromDataTable(dataTable, PrintHeaders: true);
    for (var col = 1; col < dataTable.Columns.Count + 1; col++)
    {
        if (col == 2) // col number 2 is equivalent to column B
        {
            worksheet.Column(col).Style.Numberformat.Format = "#"; // apply the number formatting you need
        }
        worksheet.Column(col).AutoFit();
    }
    return File(package.GetAsByteArray(), XlsxContentType, "report.xlsx"); // downloads file
}
I solved it as follows: I just load the model and set the format per property if it is an int or DateTime.
var li = typeof(Model)
    .GetProperties()
    .ToArray();
using (var package = new ExcelPackage(stream))
{
    var workSheet = package.Workbook.Worksheets.Add("Sheet1");
    var i = 0;
    foreach (var c in li)
    {
        i++;
        // unwrap Nullable<T> so DateTime? and int? are handled like DateTime and int
        var type = Nullable.GetUnderlyingType(c.PropertyType) ?? c.PropertyType;
        if (type == typeof(DateTime))
            workSheet.Column(i).Style.Numberformat.Format = DateTimeFormatInfo.CurrentInfo.ShortDatePattern;
        if (type == typeof(int))
            workSheet.Column(i).Style.Numberformat.Format = "0";
    }
}
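For context, a hedged usage sketch: the loop above only sets column formats, so the data itself would typically be loaded first, e.g. with LoadFromCollection (items here is an assumed List<Model>, not from the original post):
// inside the same using (var package = ...) block, before the format loop runs:
workSheet.Cells["A1"].LoadFromCollection(items, PrintHeaders: true); // items is an assumed List<Model>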

Processing a text file where the fields are not consistent

A vendor is providing a delimited text file, but the file can, and likely will, be custom for each customer. So if the specification provides 100 fields, I may only receive 10 of them.
My concern is the overhead of each loop. In all I am using a while and two for loops just for the header, and there will be at least as many for the detail.
My answer is as follows:
using (StreamReader sr = new StreamReader(flName))
{
    // Process first line to get field names
    flHeader = sr.ReadLine().Split(charDelimiters);
    // Check first field to determine header or detail file
    if (flHeader[0].ToUpper() == "ORDERID")
    {
        header = true;
    }
    else if (flHeader[0].ToUpper() == "ORDERITEMID")
    {
        detail = true;
    }
}
// Use TextFieldParser to read and parse files
using (TextFieldParser parser = new TextFieldParser(flName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(delimiters);
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        // Send read line to header or detail processor
        if (header == true)
        {
            if (flHeader[0] != fields[0])
            {
                ProcessHeader(fields);
            }
        }
        if (detail == true)
        {
            if (flHeader[0] != fields[0])
            {
                ProcessDetail(fields);
            }
        }
    }
// Header Processor snippet
// Declare header class
Data.BLL.OrderExportHeader_BLL OrderHeaderBLL = new Data.BLL.OrderExportHeader_BLL();
foreach (string field in fields)
{
    int fldCnt = fields.Count();
    // Loop through each field, then use the switch to determine which column it belongs to
    for (int flds = 0; flds < fldCnt; flds++)
    {
        string strField = field.Trim();
        switch (flHeader[flds].ToUpper())
        {
            case "ORDERID":
                OrderHeaderBLL.OrderID = strField;
                break;
        }
    }
}
//header file
OrderID ManufacturerID CustomerID SalesRepID PONumber OrderDate CustomerName CustomerNumber RepNumber Discount Terms ShipVia Notes ShipToCompanyName ShipToContactName ShipToContactPhone ShipToFax ShipToContactEmail ShipToAddress1 ShipToAddress2 ShipToCity ShipToState ShipToZip ShipToCountry ShipDate BillingAddress1 BillingAddress2 BillingCity BillingState BillingZip BillingCountry FreightTerm PriceLevel OrderType OrderStatus IsPlaced ContactName ContactPhone ContactEmail ContactFax Exported ExportDate Source ContainerName ContainerCubes Origin MarketName FOB SubTotal OrderTotal TaxRate TaxTotal ShippingTotal IsDeleted IsContainer OrderGUID CancelDate DoNotShipBefore WrittenByName WrittenForName WrittenForRepNumber CatalogCode CatalogName ShipToCode
491975 18 0 2621 1234 7/17/2014 RepZio 2499174 0 Test 561-351-7416 max@repzio.com 465 Ocean Ridge Way Juno Beach FL 33408 7/18/2014 465 Ocean Ridge Way Juno Beach FL 33408 USA 0 ShopZio True Max Fraser 561-351-7416 max@repzio.com False ShopZio 0.00 ShopZio 1500.0000 1500.0000 0.000 0.0000 0.0000 False False 63960a7b-86b7-47a2-ad11-9763a6b52fd0 7/31/2014 7/18/2014
Your sample data is the key, and your sample is currently obscure, but I think it matches the description that follows.
Per your example of 10 fields out of a possible 100.
In parsing each line, you only need to split it into 10 fields. It looks like you are delimited by whitespace, but you have a problem in that fields can contain embedded whitespace. Perhaps your data is actually tab delimited, in which case you are OK.
For simplicity, I am going to assume your 100 fields are named 'fld0', 'fld1', ..., 'fld99'.
Now, assuming the received file contains this header
fld10, fld50, fld0, fld20, fld80, fld70, fld0, fld90, fld50, fld60
and a line of data looks like
Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet
e.g.
split[0] = "Alpha", split[1] = "Bravo", etc.
You parse the header and find that the indexes in your master list of 100 fields are 10,50,0 etc.
So you build a lookupFld array with these index values, i.e., lookupFld[0] = 10, lookupFld[1] = 50, etc.
Now, as you process each line, split into 10 fields and you have an immediate indexed lookup of the correct corresponding field in your master field list.
Now MasterList[0] = "fld0", MasterList[1] = "fld1", ..., MasterList[99] = "fld99"
for (int ii = 0; ii < lookupFld.Length; ++ii)
{
    // MasterField[lookupFld[ii]] is represented by split[ii]
    // when ii = 0:
    //   lookupFld[0] is 10
    //   so MasterField[10] /* fld10 */ is represented by split[0] /* "Alpha" */
}
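A short C# sketch of that lookup-building step (MasterList, receivedHeader, masterValues, and line are assumed names matching the walkthrough above, not code from the question):
// position of every master field name in the master list, computed once
var masterIndex = new Dictionary<string, int>();
for (int m = 0; m < MasterList.Length; m++)
    masterIndex[MasterList[m]] = m;

// map each received column to its master index, once per file
int[] lookupFld = new int[receivedHeader.Length];
for (int c = 0; c < receivedHeader.Length; c++)
    lookupFld[c] = masterIndex[receivedHeader[c].Trim()];

// per data line: split once, then one direct indexed assignment per field
string[] split = line.Split('\t'); // assuming tab-delimited, per the caveat above
for (int ii = 0; ii < lookupFld.Length; ii++)
    masterValues[lookupFld[ii]] = split[ii];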

Splitting an array into 2 parts

I am attempting to read a log file in this format:
date | cost
date | cost
...etc.
Using the following code to read the file in to an array:
string[] lines = File.ReadAllLines("log.txt");
My question is: how do I slice the array into two parts per line so that I can add them to a list view with two columns? I was thinking perhaps a dictionary would be a good start.
Assuming this is C# rather than C, the following may do what you're looking for:
public class LogEntry
{
    public string Date;
    public string Cost;
    public LogEntry(string date, string cost)
    {
        Date = date;
        Cost = cost;
    }
}
...
// Grab the lines from the file:
string[] lines = File.ReadAllLines("log.txt");
// Create our output set:
LogEntry[] logEntries = new LogEntry[lines.Length];
// For each line in the file:
for (int i = 0; i < lines.Length; i++)
{
    // Split the line:
    string[] linePieces = lines[i].Split('|');
    // Safety check - make sure this is a line we want:
    if (linePieces.Length != 2)
    {
        // No thanks!
        continue;
    }
    // Create the entry:
    logEntries[i] = new LogEntry(linePieces[0], linePieces[1]);
}
// Do something with logEntries.
Note that this sort of processing should only be done with a relatively small log file. File.ReadAllLines("log.txt") becomes very inefficient with large files, at which point using a raw FileStream is more suitable.
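If the file is large, the same loop shape works against File.ReadLines, which streams the file lazily instead of reading everything up front; a sketch reusing the LogEntry class above:
var logEntries = new List<LogEntry>();
foreach (string line in File.ReadLines("log.txt"))
{
    string[] linePieces = line.Split('|');
    if (linePieces.Length == 2) // same safety check as above
        logEntries.Add(new LogEntry(linePieces[0], linePieces[1]));
}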
var lines = File.ReadAllLines("log.txt").Select(l=> l.Split('|'));
var dictionary= lines.ToDictionary(x => x[0], y => y[1]);
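One caveat: ToDictionary throws an ArgumentException if the same date appears on two lines. If duplicate dates are possible, ToLookup tolerates them:
var byDate = File.ReadAllLines("log.txt")
    .Select(l => l.Split('|'))
    .ToLookup(k => k[0].Trim(), v => v[1].Trim());
// byDate["3/13/56"] then yields every cost logged for that date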
Use a 2D array and string.Split('|')
string[] lines = File.ReadAllLines("log.txt");
// Create an array with lines.Length rows and 2 columns
string[,] table = new string[lines.Length, 2];
for (int i = 0; i < lines.Length; i++)
{
    // Split the line in 2 with the | character
    string[] parts = lines[i].Split('|');
    // Store them in the array, trimming the spaces off
    table[i, 0] = parts[0].Trim();
    table[i, 1] = parts[1].Trim();
}
Now you will have an array that looks like this:
table[date, cost]
If you want to improve it, you could use a dictionary so you only have to look up the date. EDIT: as @Damith has done.
Additionally, with LINQ you could simplify this into:
var table = File.ReadAllLines("log.txt").Select(s => s.Split('|')).ToDictionary(k => k[0].TrimEnd(' '), v => v[1].TrimStart(' '));
You can then easily get the results from the LINQ expression with:
foreach (KeyValuePair<string, string> kv in table)
{
    Console.WriteLine("Key: " + kv.Key + " Value: " + kv.Value);
}
Also note if you do not need the spaces in your file you can omit the Trim()s
And just because this post was originally tagged C :)
Here is a C example:
With a data file (I called it temp.txt) that looks like this:
3/13/56 | 13.34
3/14/56 | 14.14
3/15/56 | 15.00
3/16/56 | 16.56
3/17/56 | 17.87
3/18/56 | 18.34
3/19/56 | 19.31
3/20/56 | 20.01
3/21/56 | 21.00
This code will read it and parse it into a pair of string columns, char col[2][80][20]:
#include <stdio.h>
#include <string.h>
int main()
{
    int i;
    char *buf;
    char line[260];
    char col[2][80][20];
    FILE *fp;
    fp = fopen("c:\\dev\\play\\temp.txt", "r");
    if (!fp) return 1; /* bail out if the file is missing */
    i = -1;
    while (fgets(line, 260, fp))
    {
        i++;
        buf = strtok(line, "|");
        if (buf) strcpy(col[0][i], buf);
        buf = strtok(NULL, "|");
        if (buf) strcpy(col[1][i], buf);
    }
    fclose(fp);
    return 0;
}

Convert a file full of "INSERT INTO xxx VALUES" in to something Bulk Insert can parse

This is a followup to my first question "Porting “SQL” export to T-SQL".
I am working with a 3rd party program that I have no control over and cannot change. This program exports its internal database into a set of .sql files, each with a format of:
INSERT INTO [ExampleDB] ( [IntField] , [VarcharField], [BinaryField])
VALUES
(1 , 'Some Text' , 0x123456),
(2 , 'B' , NULL),
--(SNIP, it does this for 1000 records)
(999, 'E' , null);
(1000 , 'F' , null);
INSERT INTO [ExampleDB] ( [IntField] , [VarcharField] , BinaryField)
VALUES
(1001 , 'asdg', null),
(1002 , 'asdf' , 0xdeadbeef),
(1003 , 'dfghdfhg' , null),
(1004 , 'sfdhsdhdshd' , null),
--(SNIP 1000 more lines)
This pattern continues until the .sql file has reached a file size set during the export. The export files are grouped as EXPORT_PATH\%Table_Name%\Export#.sql, where the # is a counter starting at 1.
Currently I have about 1.3GB of data, and I have it exporting in 1MB chunks (1,407 files across 26 tables; all but 5 tables have only one file, and the largest table has 207 files).
Right now I just have a simple C# program that reads each file into RAM and then calls ExecuteNonQuery. The issue is that I am averaging 60 sec/file, which means it will take about 23 hours to process the entire export.
I assume that if I could somehow format the files to be loaded with a BULK INSERT instead of INSERT INTO statements, it would go much faster. Is there an easy way to do this, or do I have to write some kind of find & replace and keep my fingers crossed that it doesn't fail on some corner case and blow up my data?
Any other suggestions on how to speed up the inserts would also be appreciated.
UPDATE:
I ended up going with the parse-and-SqlBulkCopy method. It went from 1 file/min to 1 file/sec.
Well, here is my "solution" for helping convert the data into a DataTable or otherwise (run it in LINQPad):
var i = "(null, 1 , 'Some''\n Text' , 0x123.456)";
var pat = @",?\s*(?:(?<n>null)|(?<w>[\w.]+)|'(?<s>.*)'(?!'))";
Regex.Matches(i, pat,
    RegexOptions.IgnoreCase | RegexOptions.Singleline).Dump();
The match should be run once per value group (e.g. (a,b,etc)). Parsing of the results (e.g. conversion) is left to the caller and I have not tested it [much]. I would recommend creating the correctly-typed DataTable first -- although it may be possible to pass everything "as a string" to the database? -- and then use the information in the columns to help with the extraction process (possibly using type converters). For the captures: n is null, w is word (e.g. number), s is string.
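A hedged sketch of consuming those captures, one entry per value; the actual type conversion is still left to the caller as described above:
var values = Regex.Matches(i, pat, RegexOptions.IgnoreCase | RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups["n"].Success
        ? null                          // SQL NULL
        : m.Groups["w"].Success
            ? m.Groups["w"].Value       // word/number, still a raw string here
            : m.Groups["s"].Value)      // quoted string
    .ToArray();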
Happy coding.
Apparently your data is always wrapped in parentheses and starts with a left parenthesis. You might want to use this rule to split (with RemoveEmptyEntries) each of those lines and load them into a DataTable. Then you can use SqlBulkCopy to copy everything at once into the database.
This approach would not necessarily be fail-safe, but it would be certainly faster.
Edit: Here's how you could get the schema for every table:
private static DataTable extractSchemaTable(IEnumerable<String> lines)
{
    DataTable schema = null;
    var insertLine = lines.SkipWhile(l => !l.StartsWith("INSERT INTO [")).Take(1).First();
    var startIndex = insertLine.IndexOf("INSERT INTO [") + "INSERT INTO [".Length;
    var endIndex = insertLine.IndexOf("]", startIndex);
    var tableName = insertLine.Substring(startIndex, endIndex - startIndex);
    using (var con = new SqlConnection("CONNECTION"))
    {
        using (var schemaCommand = new SqlCommand("SELECT * FROM " + tableName, con))
        {
            con.Open();
            using (var reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
            {
                // Load with a SchemaOnly reader builds an empty DataTable with the
                // correct columns, so rows can be added to it later
                schema = new DataTable(tableName);
                schema.Load(reader);
            }
        }
    }
    return schema;
}
Then you simply need to iterate over each line in the file, check whether it starts with (, and split that line with Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries). Then you can add the resulting array to the created schema table.
Something like this:
var allLines = System.IO.File.ReadAllLines(path);
DataTable result = extractSchemaTable(allLines);
for (int i = 0; i < allLines.Length; i++)
{
    String line = allLines[i];
    if (line.StartsWith("("))
    {
        String data = line.Substring(1, line.Length - (line.Length - line.LastIndexOf(")")) - 1);
        var fields = data.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        // you might need to parse it to the correct DataColumn.DataType
        result.Rows.Add(fields);
    }
}
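And once the rows are in the DataTable, SqlBulkCopy can push them to the server in one shot. A sketch (the connection string is the same placeholder as in the schema snippet, and the destination table name is assumed to be surfaced from extractSchemaTable):
using (var con = new SqlConnection("CONNECTION"))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = tableName; // parsed from the INSERT INTO line
        bulk.WriteToServer(result);            // one bulk operation instead of thousands of INSERTs
    }
}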
