I have a .NET Core application where I want to change the column names of a CSV file. I'm using the Cinchoo ETL library. I have tried the following:
string csv = "../../../../data.csv";
using (var w = new ChoCSVWriter(csv).WithFirstLineHeader().Setup(s => s.FileHeaderWrite += (o, e) =>
{
e.HeaderText = "Test,Test2";
}))
{
w.Write(csv);
}
This is what my data.csv file looks like:
ID,Name
1, David
2, Bob
This is what my CSV looks like after running my code:
Test,Test2
../../../../data.csv
The CSV header names have changed, but my issue is that it deleted all my data and added the path to the file for some odd reason. Any ideas on why that is?
ChoCSVWriter.Write serializes whatever object you pass it. Here that object is the path string csv itself, not the contents of the file, which is why the path shows up as your only data row. There are a couple of ways you can rename the columns and produce the CSV output.
Option 1:
StringBuilder csvIn = new StringBuilder(#"ID,Name
1, David
2, Bob");
StringBuilder csvOut = new StringBuilder();
using (var r = new ChoCSVReader(csvIn)
.WithFirstLineHeader()
)
{
using (var w = new ChoCSVWriter(csvOut)
.WithFirstLineHeader()
)
w.Write(r.Select(r1 => new { Test1 = r1.ID, Test2 = r1.Name }));
}
Console.WriteLine(csvOut.ToString());
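If I read the sample right, this should print something like the following (the header names come from the anonymous type's properties):
Test1,Test2
1,David
2,Bob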
Option 2:
StringBuilder csvIn = new StringBuilder(@"ID,Name
1, David
2, Bob");
StringBuilder csvOut = new StringBuilder();
using (var r = new ChoCSVReader(csvIn)
.WithFirstLineHeader()
)
{
using (var w = new ChoCSVWriter(csvOut)
.WithFirstLineHeader()
.Setup(s => s.FileHeaderWrite += (o, e) =>
{
e.HeaderText = "Test,Test2";
})
)
w.Write(r);
}
Console.WriteLine(csvOut.ToString());
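This should print the same rows, but with the overridden header text:
Test,Test2
1,David
2,Bob
The difference between the two: Option 1 reshapes each record into a new anonymous type, so the property names become the headers, while Option 2 leaves the records untouched and only replaces the header line at write time.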
UPDATE:
Using CSV files instead of text input
string csvInFilePath = @"C:\CSVIn.csv";
string csvOutFilePath = @"C:\CSVOut.csv";
using (var r = new ChoCSVReader(csvInFilePath)
.WithFirstLineHeader()
)
{
using (var w = new ChoCSVWriter(csvOutFilePath)
.WithFirstLineHeader()
)
w.Write(r.Select(r1 => new { Test1 = r1.ID, Test2 = r1.Name }));
}
UPDATE:
To get the headers, cast each record to IDictionary<string, object> and use its Keys property to get the keys
string csvInFilePath = @"C:\CSVIn.csv";
string csvOutFilePath = @"C:\CSVOut.csv";
using (var r = new ChoCSVReader(csvInFilePath)
.WithFirstLineHeader()
)
{
foreach (IDictionary<string, object> rec in r)
{
var keys = rec.Keys.ToArray();
}
}
In order to auto-discover the datatypes of CSV columns, you must set MaxScanRows on the parser. Otherwise, all columns will be treated as string type.
StringBuilder csvIn = new StringBuilder(@"ID,Name,Date
1, David, 1/1/2018
2, Bob, 2/12/2019");
using (var r = new ChoCSVReader(csvIn)
.WithFirstLineHeader()
.WithMaxScanRows(2)
)
{
foreach (IDictionary<string, object> rec in r.Take(1))
{
foreach (var kvp in rec)
Console.WriteLine($"{kvp.Key} - {r.Configuration[kvp.Key].FieldType}");
}
}
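If my understanding of the scan is right, this should print something like the following (the exact inferred type names may vary by Cinchoo version):
ID - Int64
Name - String
Date - DateTime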
Hope it helps.
Related
I've been playing with Cinchoo's fantastic ETL system for C#. I need to compare two CSV files, where one CSV file is defined as a dynamically growing master table and the other is a feeder "detail" table.
The detail table may differ in terms of NEW records, CHANGED records, or records that no longer exist in it (DELETED, relative to the master CSV file).
The output should be a third table that replaces or updates the master table, so it's a growing CSV file.
Both tables have unique ID columns and a header row.
MASTER CSV
ID,name
1,Danny
2,Fred
3,Sam
DETAIL
ID,name
1,Danny
(2,Fred) <-- record no longer exists
3,Pamela <-- name change
4,Fernando <-- new record
So far I've been referring to this fiddle, and the code below:
using System;
using ChoETL;
using System.Linq;
public class Program
{
public static void Main()
{
var input1 = ChoCSVReader.LoadText(csv1).WithFirstLineHeader().ToArray();
var input2 = ChoCSVReader.LoadText(csv2).WithFirstLineHeader().ToArray();
Console.WriteLine("NEW records\n");
using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(),
new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}
Console.WriteLine("\n\nDELETED records\n");
using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(),
new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}
Console.WriteLine("\n\nCHANGED records\n");
using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(),
new ChoDynamicObjectEqualityComparer(new string[] { "id", "name" })));
}
}
static string csv1 = #"
ID,name
1,Danny
2,Fred
3,Sam";
static string csv2 = #"
ID,name
1,Danny
3,Pamela
4,Fernando";
}
OUTPUT
NEW records
ID,name
4,Fernando
DELETED records
ID,name
2,Fred
CHANGED records
ID,name
2,Fred
3,Sam
The CHANGED records output is not working. As an added extra, I need a status column, so I want it to look like this:
CHANGED records
ID,name,status
1,Danny,NOCHANGE
2,Fred,DELETED
3,Pamela,CHANGED
4,Fernando,NEW
Thanks
Here is how you can do it with Cinchoo ETL:
string csv1 = #"ID,name
1,Danny
2,Fred
3,Sam";
string csv2 = #"ID,name
1,Danny
3,Pamela
4,Fernando";
var r1 = ChoCSVReader.LoadText(csv1).WithFirstLineHeader().ToArray();
var r2 = ChoCSVReader.LoadText(csv2).WithFirstLineHeader().ToArray();
using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
var newItems = r2.OfType<ChoDynamicObject>().Except(r1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
.Select(r =>
{
var dict = r.AsDictionary();
dict["Status"] = "NEW";
return new ChoDynamicObject(dict);
}).ToArray();
var deletedItems = r1.OfType<ChoDynamicObject>().Except(r2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
.Select(r =>
{
var dict = r.AsDictionary();
dict["Status"] = "DELETED";
return new ChoDynamicObject(dict);
}).ToArray();
var changedItems = r2.OfType<ChoDynamicObject>().Except(r1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)
.Except(newItems.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
.Select(r =>
{
var dict = r.AsDictionary();
dict["Status"] = "CHANGED";
return new ChoDynamicObject(dict);
}).ToArray();
var noChangeItems = r1.OfType<ChoDynamicObject>().Intersect(r2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)
.Select(r =>
{
var dict = r.AsDictionary();
dict["Status"] = "NOCHANGE";
return new ChoDynamicObject(dict);
}).ToArray();
var finalResult = Enumerable.Concat(newItems, deletedItems).Concat(changedItems).Concat(noChangeItems).OfType<dynamic>().OrderBy(r => r.ID);
w.Write(finalResult);
}
Console.WriteLine();
Output:
ID,name,Status
1,Danny,NOCHANGE
2,Fred,DELETED
3,Pamela,CHANGED
4,Fernando,NEW
Sample fiddle: https://dotnetfiddle.net/mrHpFx
UPDATE #1:
The above approach will work for small CSV files. For large CSV files, you should avoid it and instead process them in a streaming manner. The sample fiddle below shows how (not fully tested, but it gives you a direction to do it).
Sample fiddle: https://dotnetfiddle.net/mh6w44
UPDATE #2:
Cinchoo ETL (v1.2.1.33) now comes with a built-in API to compare CSV files in a simplified manner:
var r1 = ChoCSVReader.LoadText(csv1).WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
var r2 = ChoCSVReader.LoadText(csv2).WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
foreach (var t in r1.Compare(r2, "ID", "name" ))
{
dynamic v1 = t.MasterRecord as dynamic;
dynamic v2 = t.DetailRecord as dynamic;
if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
{
v1.Status = t.Status.ToString();
w.Write(v1);
}
else
{
v2.Status = t.Status.ToString();
w.Write(v2);
}
}
}
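Likely output (the status text comes straight from the CompareStatus enum, so the names differ slightly from the hand-rolled version; the New/Changed member names are my assumption):
ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New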
Sample fiddle: https://dotnetfiddle.net/uPR5Sq
I'm using Dapper to query an SQL procedure to which I don't know the returned columns.
I want to write the results to a CSV file with CsvHelper.
At runtime, I want to dynamically ignore some of the columns.
CsvHelper has a mapping configuration which accepts only predefined classes.
var records = sqlCon.Query(sqlProcedure); //dynamic columns
using (var writer = new StreamWriter(#"file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
var map = new CsvHelper.Configuration.DefaultClassMap<dynamic>();
...
csv.Context.RegisterClassMap(map);
csv.WriteRecords(records);
}
I couldn't figure out a way to do it with mapping, but it is possible to manually write the fields out and ignore certain columns.
void Main()
{
dynamic obj1 = new ExpandoObject();
obj1.Id = 1;
obj1.Name = "Bill";
obj1.IgnoreProperty = "Please ignore me";
dynamic obj2 = new ExpandoObject();
obj2.Id = 2;
obj2.Name = "Brenda";
obj2.IgnoreProperty = "Please also ignore me";
var records = new List<dynamic>
{
obj1,
obj2
};
//using (var writer = new StreamWriter("path\\to\\file.csv"))
using (var csv = new CsvWriter(Console.Out, CultureInfo.InvariantCulture))
{
var recordDictionary = (IDictionary<string, object>)records.First();
var properties = recordDictionary.Keys;
foreach (var property in properties)
{
if (property != "IgnoreProperty")
{
csv.WriteField(property);
}
}
csv.NextRecord();
foreach (var record in records)
{
var expanded = (IDictionary<string, object>)record;
foreach (var property in properties)
{
if (property != "IgnoreProperty")
{
csv.WriteField(expanded[property]);
}
}
csv.NextRecord();
}
}
}
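Since the ignore list is only known at runtime, the hard-coded property check generalizes naturally to a set lookup. A minimal sketch building on the example above (ignoredColumns is a hypothetical set you would populate at runtime):
var ignoredColumns = new HashSet<string> { "IgnoreProperty" }; // filled at runtime

using (var csv = new CsvWriter(Console.Out, CultureInfo.InvariantCulture))
{
    // take the column list from the first record, minus the ignored ones
    var properties = ((IDictionary<string, object>)records.First()).Keys
        .Where(k => !ignoredColumns.Contains(k))
        .ToList();

    foreach (var property in properties)
        csv.WriteField(property);
    csv.NextRecord();

    foreach (IDictionary<string, object> record in records)
    {
        foreach (var property in properties)
            csv.WriteField(record[property]);
        csv.NextRecord();
    }
}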
I want to use LINQ and then let the user download a CSV file to their specific location.
I learned about CsvHelper from the internet and am still learning it.
I wrote something which works and downloads to a specific location (D:\export.csv), as below.
public ActionResult YRecordReport()
{
using (var sw = new StreamWriter(#"d:\export.csv", true, Encoding.GetEncoding("big5")))
using (var writer = new CsvWriter(sw))
{
var abc = from t in db.TRecordDBSet
join s in db.SInfoDBSet on t.SId equals s.SId
orderby s.SId, t.StartToEndDate
select new TRecord()
{
StaffId = t.SId,
Date = t.StartToEndDate,
Name = s.Cname,
CourseName = t.Tname,
Organizer = t.Organizer,
Hour = t.Hour,
Score = t.Score
};
var source = abc.ToList();
var config = new MapperConfiguration(cfg =>
{
cfg.CreateMap<TRecord, TRecord_Data>();
});
config.AssertConfigurationIsValid();
Mapper.Initialize(cfg => cfg.CreateMap<TRecord, TRecord_Data>());
writer.Configuration.Encoding = Encoding.GetEncoding("big5");
var mapper = config.CreateMapper();
List<TRecord_Data> records = Mapper.Map<List<TRecord>, List<TRecord_Data>>(source);
foreach (var item in records)
{
writer.WriteRecord(item);
}
}
}
Because I want to let the user download the file to their own location, I modified the code as below. It runs OK, but the export.csv file is empty. Did I write something incorrectly? And how can I let the user download it and use their own filename?
[MyAuthFilter(Auth = "Admin")]
public ActionResult YearTrainingRecordReport()
{
var memoryStream = new MemoryStream();
var stream = new StreamWriter(memoryStream);
var writer = new CsvWriter(stream);
var abc = from t in db.TRecordDBSet
join s in db.SInfoDBSet on t.SId equals s.SId
orderby s.SId, t.StartToEndDate
select new TRecord()
{
StaffId = t.SId,
Date = t.StartToEndDate,
Name = s.Cname,
CourseName = t.Tname,
Organizer = t.Organizer,
Hour = t.Hour,
Score = t.Score
};
var source = abc.ToList();
var config = new MapperConfiguration(cfg =>
{
cfg.CreateMap<TRecord, TRecord_Data>();
});
config.AssertConfigurationIsValid();
Mapper.Initialize(cfg => cfg.CreateMap<TRecord, TRecord_Data>());
writer.Configuration.Encoding = Encoding.GetEncoding("big5");
var mapper = config.CreateMapper();
List<TRecord_Data> records = Mapper.Map<List<TRecord>, List<TRecord_Data>>(source);
foreach (var item in records)
{
writer.WriteRecord(item);
}
return File(memoryStream, "text/csv", "export.csv");
}
You need to flush your writer and reset the stream before returning the file. With your variable names (stream is the StreamWriter, memoryStream is the MemoryStream):
stream.Flush();
memoryStream.Position = 0;
If you're using CsvHelper 3.0 or later (currently in pre-release), you'll also need to force a new record.
writer.NextRecord();
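Putting both fixes into the action from the question, the ending would look roughly like this (a sketch using your variable names, untested):
foreach (var item in records)
{
    writer.WriteRecord(item);
    writer.NextRecord(); // CsvHelper 3.0+: terminates each record
}
stream.Flush();            // push the StreamWriter's buffer into the MemoryStream
memoryStream.Position = 0; // rewind so the FileResult reads from the start
return File(memoryStream, "text/csv", "export.csv");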
I have a PDF with buttons that take you out to web links. I used iTextSharp to split it into separate PDFs (one per page) per outside requirements. ISSUE: any button that has multiple positions lost its actions.
QUESTION: Does anyone know how to update these actions? I can open the new file, but I'm not sure how to go about using the PdfStamper to add an AA to this Annotation.
So when opening the original file, you could get to the Additional Action by doing this:
var r = new PdfReader(f.FullName);
var positionsOfThisButton = r.AcroFields.GetFieldPositions("14");
var field = r.AcroForm.GetField("14");
var targetObject = PdfReader.GetPdfObject(field.Ref);
var kids = targetObject.GetAsArray(PdfName.KIDS);
foreach (var k in kids){
var ko = (PdfDictionary)(k.IsIndirect() ? PdfReader.GetPdfObject(k) : k);
var aaObj = ko.Get(PdfName.AA);
//(aaObj is NULL in the new file)
var aa = (PdfDictionary)(aaObj.IsIndirect() ? PdfReader.GetPdfObject(aaObj) : aaObj);
var dObj = aa.Get(PdfName.D);
var d = (PdfDictionary)(dObj.IsIndirect() ? PdfReader.GetPdfObject(dObj) : dObj);
Debug.WriteLine("S:" + d.GetAsName(PdfName.S).ToString() );
//returns S:/Uri
Debug.WriteLine("URI:" + d.GetAsString(PdfName.URI).ToString() );
//returns URI:http://www.somesite.com/etc
}
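(For anyone attempting the write-back: one possible direction, untested, is to put a rebuilt /AA dictionary back on each kid widget through a PdfStamper and flag it with MarkUsed. Here newFilePath and outPath are hypothetical paths and the URI is hardcoded; in practice you would carry the dictionary over from the original file as read above.)
var reader = new PdfReader(newFilePath);
using (var stamper = new PdfStamper(reader, new FileStream(outPath, FileMode.Create)))
{
    var field = reader.AcroForm.GetField("14");
    var fieldDict = (PdfDictionary)PdfReader.GetPdfObject(field.Ref);
    var kids = fieldDict.GetAsArray(PdfName.KIDS);
    foreach (var k in kids)
    {
        var ko = (PdfDictionary)(k.IsIndirect() ? PdfReader.GetPdfObject(k) : k);
        var aa = new PdfDictionary();
        aa.Put(PdfName.D, new PdfAction("http://www.somesite.com/etc")); // URI action
        ko.Put(PdfName.AA, aa);
        stamper.MarkUsed(ko); // tell the stamper this dictionary was modified
    }
}
reader.Close();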
Thanks for any help.
FYI ONLY - The following is how I split the files:
List<byte[]> Get(FileInfo f) {
List<byte[]> outputFiles = new List<byte[]>();
var reader = new PdfReader(f.FullName);
int n = reader.NumberOfPages;
reader.Close();
for (int i = n; i > 0; i--) {
reader = new PdfReader(f.FullName);
using (var document = new Document(reader.GetPageSizeWithRotation(1))) {
using (var outputStream = new MemoryStream()) {
using (var writer = new PdfCopy(document, outputStream)) {
writer.SetMergeFields();
writer.PdfVersion = '6';
document.Open();
writer.AddDocument(reader, new List<int> { i });
document.Close();
writer.Close();
}
outputFiles.Insert(0, outputStream.ToArray());
}
}
reader.Close();
}
return outputFiles;
}
What is the best way of reading an XML file using LINQ? In the code below you will see that I have three different loops, and I feel like it's not elegant. Do I have options to refactor the code below?
public static void readXMLOutput(Stream stream)
{
XDocument xml = LoadFromStream(stream);
var header = from p in xml.Elements("App").Elements("Application")
select p;
foreach (var record in header)
{
string noym = record.Element("nomy").Value;
string Description = record.Element("Description").Value;
string Name = record.Element("Name").Value;
string Code = record.Element("Code").Value;
}
var appRoles = from q in xml.Elements("App").Elements("Application").Elements("AppRoles").Elements("Role")
select q;
foreach (var record1 in appRoles)
{
string Name = record1.Element("Name").Value;
string modifiedName = record1.Element("ModifiedName").Value;
}
var memeber = from r in xml.Elements("App").Elements("Application").Elements("AppRoles").Elements("Role").Elements("Members")
select r;
foreach (var record2 in memeber)
{
string ExpirationDate = record2.Element("ExpirationDate").Value;
string FullName = record2.Element("FullName").Value;
}
}
UPDATED:
foreach (var record in headers)
{
..............
string Name1 = record.Attribute("Name").Value;
string UnmodifiedName = record.Attribute("UnmodifiedName").Value;
string ExpirationDate = record.Attribute("ExpirationDate").Value;
string FullName = record.Attribute("FullName").Value;
...............
}
Is that your actual code? All those string variables you are assigning in the foreach loops only have a scope of one iteration of the loop. They are created and destroyed each time.
This may not work precisely in your case depending on the XML structure. Play around with it. Try it using LINQPad.
var applications = from p in xml.Descendants("Application")
select new { Nomy = p.Element("nomy").Value
, Description = p.Element("Description").Value
, Name = p.Element("Name").Value
, Code = p.Element("Code").Value
};
var appRoles = from r in xml.Descendants("Role")
select new { Name = r.Element("Name").Value
, ModifiedName = r.Element("ModifiedName").Value
};
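Consuming the projections is then just a foreach over the anonymous types, e.g.:
foreach (var app in applications)
    Console.WriteLine($"{app.Name} ({app.Code}): {app.Description}");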
This answer uses a hierarchical query.
var headers =
from header in xml.Elements("App").Elements("Application")
select new XElement("Header",
new XAttribute("noym", header.Element("nomy").Value),
new XAttribute("Description", header.Element("Description").Value),
new XAttribute("Name", header.Element("Name").Value),
new XAttribute("Code", header.Element("Code").Value),
from role in header.Elements("AppRoles").Elements("Role")
select new XElement("Role",
new XAttribute("Name", role.Element("Name").Value),
new XAttribute("ModifiedName", role.Element("ModifiedName").Value),
from member in role.Elements("Members")
select new XElement("Member",
new XAttribute("ExpirationDate", member.Element("ExpirationDate").Value),
new XAttribute("FullName", member.Element("FullName").Value)
)
)
);
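The query yields one XElement per header, so you can materialize and inspect the whole hierarchy by wrapping it in a root element, e.g.:
var doc = new XElement("Headers", headers);
Console.WriteLine(doc);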