I have written this code with reference to one of the posts on this forum, but I am getting the error below.
My CSV data is as follows:
How should I proceed?
Before you can add data to a DataTable, you must add the columns you will need. You can also check whether a column already exists before adding it by using dt.Columns.Contains(...). If your CSV file has a header row, you can use it to name your columns.
Something like this (it compiles, but I have not actually run it):
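// Requires: using System.Data; using System.Linq;
var dt = new DataTable();
var rows = csvfile.Split('\n');
// Add the columns, named after the header row
var colHeaders = rows[0].Trim().Split(',');
foreach (var header in colHeaders)
{
    dt.Columns.Add(header, typeof(string));
}
// Now add the data rows
foreach (var row in rows.Skip(1))
{
    if (!string.IsNullOrWhiteSpace(row))
    {
        // Trim drops the '\r' left over from Windows line endings
        var data = row.Trim().Split(',');
        // Add all the values as one row; adding them one at a time would create a row per value
        dt.Rows.Add(data);
    }
}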
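As a rough sketch of the same idea with the dt.Columns.Contains check included, the following small program (written with C# 9+ top-level statements) builds a DataTable from CSV text. The sample input, the CsvToDataTable name, and the "_2" suffix rule for duplicate headers are assumptions made for illustration, and quoted fields containing commas are not handled.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample input; in practice this could come from File.ReadAllText.
string csvText = "Name,City,Name\r\nAnn,Oslo,x\r\nBob,Rome,y\r\n";

DataTable table = CsvToDataTable(csvText);
Console.WriteLine($"{table.Columns.Count} columns, {table.Rows.Count} rows");

// Builds a DataTable from simple comma-separated text (no quoted fields).
static DataTable CsvToDataTable(string csv)
{
    var dt = new DataTable();
    // Split on '\n' and trim so a trailing '\r' is dropped; skip blank lines.
    var rows = csv.Split('\n').Select(r => r.Trim()).Where(r => r.Length > 0).ToList();
    if (rows.Count == 0)
        return dt;

    foreach (var header in rows[0].Split(','))
    {
        // Column names must be unique, so add a numeric suffix if the name is taken.
        var name = header;
        var suffix = 2;
        while (dt.Columns.Contains(name))
            name = header + "_" + suffix++;
        dt.Columns.Add(name, typeof(string));
    }

    foreach (var row in rows.Skip(1))
    {
        // Passing the whole array adds one DataRow with one value per column.
        object[] values = row.Split(',');
        dt.Rows.Add(values);
    }
    return dt;
}
```

A line with more values than there are columns makes Rows.Add throw an ArgumentException, so real input may need a length check first.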
There are other examples available on the internet specifically for reading CSV files into DataTables (here's one).
I'm using an SSIS package with a Script Task to get files not older than n days, and it's working fine, but now I need to bring the CreatedTime for each file into the next step. Below I have pasted the body of my script. It works partially; I just can't pass the new value into LastUpdated. Frankly, I don't know how to deal with this structure: can I add another dimension to the existing list, or should I create another list? I plan to use User::LastUpdated in the same way as FileNameArray.
Thanks very much!
DataTable NewList = new DataTable();
DataColumn col = new DataColumn("FileName");
NewList.Columns.Add(col);
DataColumn col2 = new DataColumn("LastUpdated", System.Type.GetType("System.DateTime"));
NewList.Columns.Add(col2);
foreach (string f in MyDirFiles)
{
finf = new System.IO.FileInfo(f);
if (finf.LastWriteTime > DateTime.Now.AddDays(-7))
{
NewList.Rows.Add(System.IO.Path.GetFileName(f) ,
System.IO.File.GetCreationTime(f));
}
}
Dts.Variables["User::FileNameArray"].Value = NewList.Columns["FileName"]; //<--- need convert into object
////**Dts.Variables["User::LastUpdated"].Value = NewList(xxx);
Dts.TaskResult = (int)ScriptResults.Success;
From your code and comments, I can conclude the following:
The NewList2 variable (not present in the code) has the DataTable type.
The User::LastUpdated SSIS package variable has the DateTime type.
In that case you are trying to assign a complex structure (a DataTable) to a single-valued DateTime variable, which raises an error. To fix this, change the type of User::LastUpdated to Object.
You can also extend the NewList table to contain both columns, as in the example below:
DataTable NewList = new DataTable();
DataColumn col = new DataColumn("FileName");
NewList.Columns.Add(col);
DataColumn col2 = new DataColumn("LastUpdated", System.Type.GetType("System.DateTime"));
NewList.Columns.Add(col2);
Adding a new row then takes a few more lines:
DataRow newRow = NewList.NewRow();
newRow["FileName"] = System.IO.Path.GetFileName(f);
newRow["LastUpdated"] = System.IO.File.GetCreationTime(f);
NewList.Rows.Add(newRow);
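Outside SSIS, the two-column table can be tried out as a small stand-alone program (C# 9+ top-level statements). This is only a sketch: the sample rows are made up, the Dts.Variables assignments are left out because they only exist inside a Script Task, and whether you hand over the whole table or one array per column depends on how the next step consumes the variable.

```csharp
using System;
using System.Data;
using System.Linq;

var newList = new DataTable();
newList.Columns.Add("FileName", typeof(string));
newList.Columns.Add("LastUpdated", typeof(DateTime));

// Hypothetical sample rows standing in for the files found by the script.
newList.Rows.Add("a.csv", new DateTime(2015, 9, 8));
newList.Rows.Add("b.csv", new DateTime(2015, 9, 7));

// Option 1: treat the whole table as a single object value.
object wholeTable = newList;

// Option 2: project each column into its own array; the arrays stay index-aligned.
object[] fileNames = newList.Rows.Cast<DataRow>().Select(r => r["FileName"]).ToArray();
object[] lastUpdated = newList.Rows.Cast<DataRow>().Select(r => r["LastUpdated"]).ToArray();

for (int i = 0; i < fileNames.Length; i++)
    Console.WriteLine($"{fileNames[i]} -> {lastUpdated[i]:yyyy-MM-dd}");
```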
CSVHelper and FileHelper are not an option.
I have a .csv export that I need to check for consistency, structured like the sample below:
Reference,Date,EntryID
ABC123,08/09/2015,123
ABD234,08/09/2015,124
XYZ987,07/09/2015,125
QWE456,08/09/2016,126
I can use ReadLine or ReadAllLines and .Split, which give me entire rows/columns, but I need to select each row and then go through each attribute (separated by ',') for format checking.
I am running into problems here: I cannot single out each value in a row for this check.
It is probably either something simple in:
class Program
{
static void Main(string[] args)
{
string csvFile = @"proof.csv";
string[] lines = File.ReadAllLines(csvFile);
var values = lines.Skip(1).Select(l => new { FirstRow = l.Split('\n').First(), Values = l.Split('\n').Select(v => int.Parse(v)) });
foreach (var value in values)
{
Console.WriteLine(string.Format("{0}", value.FirstRow));
}
}
}
Or I am going down the wrong path; my searches turn up ways of pulling specific rows or columns, as opposed to checking the individual values in them.
The sample data above contains a highlighted example: the date is next year, and I would like to be able to check that value (just an example, as errors could appear in any column).
I can not single out each value in a row
That's because you split on \n twice. The values within a row are separated by comma (,).
I'm not sure what all that LINQ is supposed to do, but it's as simple as this:
string[] lines = File.ReadAllLines(csvFile);
foreach (var line in lines.Skip(1))
{
var values = line.Split(',');
// access values[0], values[1] ...
}
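To illustrate the per-value check, here is a sketch (C# 9+ top-level statements) that validates each field after splitting on the comma. The rules are assumptions, since the question does not state them: a reference is three capital letters followed by three digits, dates are dd/MM/yyyy and must not be in the future, and EntryID is an integer. The Validate name and the fixed "today" are also made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Sample lines copied from the question; normally File.ReadAllLines(csvFile).
string[] lines =
{
    "Reference,Date,EntryID",
    "ABC123,08/09/2015,123",
    "QWE456,08/09/2016,126",
};

// Fixed "today" so the example is repeatable; use DateTime.Today in practice.
var today = new DateTime(2015, 9, 10);
List<string> errors = Validate(lines, today);
errors.ForEach(Console.WriteLine);

static List<string> Validate(string[] rows, DateTime asOf)
{
    var found = new List<string>();
    for (int i = 1; i < rows.Length; i++)   // start at 1 to skip the header
    {
        var values = rows[i].Split(',');
        if (values.Length != 3)
        {
            found.Add($"Line {i + 1}: expected 3 values, found {values.Length}");
            continue;
        }
        // Assumed rule: three capital letters followed by three digits.
        if (!Regex.IsMatch(values[0], "^[A-Z]{3}[0-9]{3}$"))
            found.Add($"Line {i + 1}: bad reference '{values[0]}'");
        // Assumed format dd/MM/yyyy; dates after asOf are flagged.
        if (!DateTime.TryParseExact(values[1], "dd/MM/yyyy", CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out var date))
            found.Add($"Line {i + 1}: bad date '{values[1]}'");
        else if (date > asOf)
            found.Add($"Line {i + 1}: date '{values[1]}' is in the future");
        if (!int.TryParse(values[2], out _))
            found.Add($"Line {i + 1}: bad entry ID '{values[2]}'");
    }
    return found;
}
```

Collecting messages in a list rather than throwing on the first problem lets you report every bad value in the file in one run.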
Instead of reading it as text, read it through OLEDB, so the data of the CSV file arrives in a DataTable and you do not need to split it.
To read the CSV file you can use these OLEDB classes:
System.Data.OleDb.OleDbCommand
System.Data.OleDb.OleDbDataAdapter
System.Data.OleDb.OleDbConnection
and
System.Data.DataTable
Quick question regarding the FileHelpers library:
I have used the FileHelpers engine to read a stream and do my validation. If the CSV file has no header, we need to match/map it to my model, i.e.
id, name, age, phone, sex,
but the CSV might not come in this format/order all the time and we need to match them using a drop down list for each column.
Is there any way I can do this?
Thanks,
The short answer is no, but you can create a dependent class dynamically:
Since you have the list of possible fields in your JSON file, I would recommend doing a basic System.IO ReadLine for the first data row, and then parse by your delimiter for the individual headers. i.e.:
string headerString;
var headers = new List<String>();
var file = new System.IO.StreamReader("C:\\myFile.txt");
headerString = file.ReadLine();
file.Close();
headers = headerString.Split(',').ToList();
Now you have the list of strings from the first row to match against your JSON file. Then you can create your dependent class using System.Reflection.Emit (see the link referenced below):
typeBuilder.SetParent(typeof(MyFileHelperBaseClass));
// can place the property definitions in a for loop against your headers
foreach(string h in headers){
typeBuilder.DefineProperty("<header/col#>", ..., typeof(System.Int32), null);
}
stackoverflow article 14724822: How Can I add properties to a class on runtime in C#?
File Helpers gets a little finicky at times, so it will take some tweaking.
Hope this helps
You can use File.ReadLines(@"C:\myfile.txt").First() to read the first line and get the headers.
Then you can just use a FileHelpers CodeBuilder to build your runtime class. From the example for a delimited csv file:
DelimitedClassBuilder cb = new DelimitedClassBuilder("Customers", ",");
cb.IgnoreFirstLines = 1;
cb.IgnoreEmptyLines = true;
cb.AddField("BirthDate", typeof(DateTime));
cb.LastField.TrimMode = TrimMode.Both;
cb.LastField.FieldNullValue = DateTime.Today;
cb.AddField("Name", typeof(string));
cb.LastField.FieldQuoted = true;
cb.LastField.QuoteChar = '"';
cb.AddField("Age", typeof(int));
var engine = new FileHelperEngine(cb.CreateRecordClass());
DataTable dt = engine.ReadFileAsDT("testCustomers.txt");
Then you can traverse the resulting data table.
When I use the line below, it reads all the tables in the document:
foreach (Microsoft.Office.Interop.Word.Table tableContent in document.Tables)
But I want to read only the tables in a particular part of the content, for example from one identifier to another.
The identifiers have a form like [SRS oraganisation_123] for the start and [SRS Oraganisation_456] for the end.
I want to read only the tables between these two identifiers.
Suppose page 34 contains my first identifier: I want to read all tables from that point until I come across my second identifier, and none of the remaining tables.
Please ask if anything in the question needs clarification.
Say start and end Identifiers are stored in variables called myStartIdentifier and myEndIdentifier -
Range myRange = doc.Range();
int iTagStartIdx = 0;
int iTagEndIdx = 0;
if (myRange.Find.Execute(myStartIdentifier))
iTagStartIdx = myRange.Start;
myRange = doc.Range();
if (myRange.Find.Execute(myEndIdentifier))
iTagEndIdx = myRange.Start;
foreach (Table tbl in doc.Range(iTagStartIdx,iTagEndIdx).Tables)
{
// Your code goes here
}
Not sure how your program is structured... but if you can access the identifier in tableContent then you should be able to write a LINQ query.
var identifiers = new List<string>();
identifiers.Add("myIdentifier");
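// Identifier stands for however you read the identifier from a table.
// Tables is a non-generic collection, so Cast<Table>() is needed before Where (not Select) can filter it.
var tablesWithOnlyTheIdentifiersIWant = document.Tables.Cast<Table>().Where(tableContent => identifiers.Contains(tableContent.Identifier));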
foreach(var tableContent in tablesWithOnlyTheIdentifiersIWant)
{
//Do something
}
Go through the following code; it may help you.
System.Data.DataTable dt = new System.Data.DataTable();
foreach (Microsoft.Office.Interop.Word.Cell c in r.Cells)
{
if(c.Range.Text=="Content you want to compare")
dt.Columns.Add(c.Range.Text);
}
foreach (Microsoft.Office.Interop.Word.Row row in newTable.Rows)
{
    System.Data.DataRow dr = dt.NewRow();
    int i = 0;
    foreach (Cell cell in row.Cells)
    {
        if (!string.IsNullOrEmpty(cell.Range.Text) && (cell.Range.Text == "Text you want to compare with"))
        {
            dr[i] = cell.Range.Text;
        }
        // Advance the column index for every cell, not once per row
        i++;
    }
    dt.Rows.Add(dr);
}
See the third answer to the following linked question:
Replace bookmark text in Word file using Open XML SDK
I have no control over how the data is saved in this table. However, I have to query the table and combine the data for similar pn_id column as one row/record.
For instance current data structure is as follows,
Here the same pn_id is repeated with different question ids. In my opinion this should really have been saved as one pn_id with each question as a separate column. However, I have to retrieve the data below as one record, like this:
Any idea how this can be done?
Thanks
Here's some pseudocode for the transform algorithm. Note that it requires scanning the entire data set twice; there are a few other opportunities to improve the efficiency, for example, if the input data can be sorted. Also, since it's pseudocode, I haven't added handling for null values.
var columnNames = new HashSet<string> { "pn_id" };
foreach (var record in data)
    columnNames.Add(record.question_id.ToString());
var table = new DataTable();
foreach (var name in columnNames)
    table.Columns.Add(new DataColumn(name, typeof(string)));
// Looking rows up by pn_id (DataTable.Rows.Find) requires a primary key
table.PrimaryKey = new[] { table.Columns["pn_id"] };
foreach (var record in data)
{
    var targetRecord = CreateNewOrGetExistingRecord(table, record.pn_id);
    targetRecord[record.question_id.ToString()] = record.char_value ?? record.date_value.ToString();
}
And here's a sketch of the helper method:
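DataRow CreateNewOrGetExistingRecord(DataTable table, object primaryKeyValue)
{
    // Rows.Find requires table.PrimaryKey to be set to the pn_id column
    var result = table.Rows.Find(primaryKeyValue);
    if (result != null)
        return result;
    // Not found: create a new row, add it to the table, and return it to the caller
    result = table.NewRow();
    result["pn_id"] = primaryKeyValue;
    table.Rows.Add(result);
    return result;
}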
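Put together as a runnable sketch (C# 9+ top-level statements), the transform could look like the program below. It is a variation rather than the exact algorithm above: columns are added the first time a question id is seen, so the data is scanned once instead of twice. The tuple shape, the sample rows, and the Pivot name are assumptions, because the real column types are not shown in the question, and null handling is still left out.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical input rows: (pn_id, question_id, value).
var data = new List<(string PnId, int QuestionId, string Value)>
{
    ("P1", 1, "yes"),
    ("P1", 2, "2015-09-08"),
    ("P2", 1, "no"),
};

DataTable pivoted = Pivot(data);
foreach (DataRow row in pivoted.Rows)
    Console.WriteLine(string.Join(" | ", row.ItemArray));

static DataTable Pivot(IEnumerable<(string PnId, int QuestionId, string Value)> records)
{
    var table = new DataTable();
    var key = table.Columns.Add("pn_id", typeof(string));
    // Rows.Find only works once a primary key is defined.
    table.PrimaryKey = new[] { key };

    foreach (var record in records)
    {
        // Add the question's column the first time it is seen.
        var columnName = record.QuestionId.ToString();
        if (!table.Columns.Contains(columnName))
            table.Columns.Add(columnName, typeof(string));

        // Reuse the row for this pn_id, or create it on first sight.
        var target = table.Rows.Find(record.PnId);
        if (target == null)
        {
            target = table.NewRow();
            target["pn_id"] = record.PnId;
            table.Rows.Add(target);
        }
        target[columnName] = record.Value;
    }
    return table;
}
```

Questions that a given pn_id never answered are left as DBNull in that row.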
The structure is fine. It wouldn't make sense to have one column per question, because you would have to add a new column every time a new question was added.
Your problem can easily be solved with PIVOT. Take a look at this link for an explanation.