Modify CSV Parser to work with TSV files in C#

I have this code for parsing a CSV file.
var query = from line in File.ReadAllLines("E:/test/sales/" + filename)
let customerRecord = line.Split(',')
select new FTPSalesDetails
{
retailerName = "Example",
};
foreach (var item in query)
{
//sales details table
ItemSale ts = new ItemSale
{
RetailerID = GetRetailerID(item.retailerName)
};
}
Obviously there will be more data in the above code; I am just awaiting the test file details/structure.
In the meantime, I thought I'd ask if this could be modified to parse TSV files?
All help is appreciated,
thanks :)

Assuming TSV means tab-separated values, you can use
line.Split('\t')
If you are using .NET 4.0, I would recommend that you use File.ReadLines for large files, so you can still use LINQ without loading all the lines into memory at once.
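Putting both suggestions together, here is a minimal sketch (assuming the same FTPSalesDetails type and folder as the question; the field mapping is a placeholder until the real file structure is known):
// File.ReadLines (.NET 4.0+) streams the file lazily instead of loading it all at once
var query = from line in File.ReadLines("E:/test/sales/" + filename)
let customerRecord = line.Split('\t') // tab instead of comma for TSV
select new FTPSalesDetails
{
retailerName = "Example",
// map further properties from customerRecord[n] once the structure is known
};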

Related

Basic Read CSV File Questions

Thanks in advance, C# newb here having a few issues.
I have this CSV file provided daily; it's large and has no header. I only need certain items out of this file.
Here is the code I have so far.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
HasHeaderRecord = false,
};
using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
var records = new List<BQFile>();
csv.Read();
csv.ReadHeader();
while (csv.Read())
{
var record = new BQFile()
{
SNumber = csv.GetField<string>("SNumber"),
FOBPoint = csv.GetField<string>("FOBPoint")
};
}
}
What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data. For example, SNumber is column 46 and FOBPoint is column 123. I am finding the CsvHelper documentation a little limited.
Any help is appreciated.
What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data
By index, because there is no header
In your BQFile, decorate the properties with an attribute of [Index(NNN)], where NNN is the column number (0-based). The IndexAttribute is found in the CsvHelper.Configuration.Attributes namespace - I mention this because Entity Framework also has an Index attribute; be sure you use the correct one
public class BQFile {
[Index(46)]
public string SNumber { get; set;}
...
}
Then do:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
HasHeaderRecord = false,
};
using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
var records = csv.GetRecords<BQFile>();
...
records is an enumeration on top of the file stream (via CsvHelper, which reads records as it goes and creates instances of BQFile). You can only enumerate it once; once you're done enumerating, the file stream will be at the end, so if you wanted to re-read the file you'd have to Seek the stream or renew the reader. Also, the file is only read (in chunks, progressively) as you enumerate. If you return records somewhere, so that you drop out of the using and dispose the reader, you'll get an error when you try to start reading from records (because it's disposed)
To work with records, you either foreach it, processing the objects you get as you go:
foreach(BQFile bqf in records){
//do stuff with each BQFile here
}
Or if you want to load it all into memory, you can do something like ToList() it so you end up with a bunch of BQFile in a List, and then you can e.g. access them randomly, read them over and over etc..
var bqfs = records.ToList();
PS: I don't know, when you said "it's column 46", whether that's counting from 1 or 0. You might have to adjust your 46.
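To make the disposal point concrete, here is a hedged sketch of a helper that materialises the records before the reader goes out of scope (LoadBQFiles is a made-up name; it needs using CsvHelper;, using CsvHelper.Configuration;, using System.Globalization; and using System.Linq;):
static List<BQFile> LoadBQFiles(string path)
{
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
HasHeaderRecord = false,
};
using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader, config))
{
// ToList() forces the whole file to be read here, while the reader is still open,
// so the caller gets a plain in-memory list rather than a lazy enumeration
return csv.GetRecords<BQFile>().ToList();
}
}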

Find if a filename does not exist in an array of file names

I have a list of zipped files, where each entry contains a ZipArchive and the zipped filename as a String. I also have a final list of filenames that I need to check my list against; any entries that do not match the final list of filenames should be dumped from my zipped file list.
I understand that may not be worded the best, so let me try to explain with my code/pseudocode.
Here is my list:
List<ZipContents> importList = new List<ZipContents>();
Which has two parameters:
ZipArchive which is called ZipFile
String which is called FileName
filenames is the final list of file names that I am trying to check my ZipContents list against.
Here is the start of what I am trying to do:
foreach (var import in importList)
{
var fn = import.FileName;
// do some kind of lookup to see if fn is in the List<String> filenames
// If not in list dump from ZipContents
}
The commented-out section is what I am unsure about. Would someone be able to help get me on the right track? Thanks!
EDIT 1
I know I did not say this originally, but I think that LINQ would be a much cleaner route to take. I am just not positive how. I am assuming that using .RemoveAll(..) would be the way I would want to go?
Loop through importList in reverse and remove items when they are not found in filenames. Assuming you don't have too many items, performance should be fine:
for (int i = importList.Count - 1; i >= 0; i--)
{
if (!filenames.Contains(importList[i].FileName))
{
importList.RemoveAt(i);
}
}
You can't remove items from the list using a foreach because it modifies the collection, but you can do it with the construct in my example.
You could do something like:
if (!filenames.Contains(fn)) {
importList.Remove(import);
}
Note, though, that calling Remove inside the foreach above will throw an InvalidOperationException, since you can't modify a collection while enumerating it; you'd need to iterate over a copy (e.g. importList.ToList()) or collect the items to remove first.
Alternatively, I believe you could use LINQ to simplify this logic into just one line.
Edit:
Yes, you can just create a new list of just the ones you want, like this:
var newImportList = importList.Where(il => filenames.Contains(il.FileName)).ToList();
You can do this in one line. Just use LINQ to re-establish your list:
var filenames = new List<string> {"file1", "file2"};
var zipcontents = new List<ZipContents>
{
new ZipContents {FileName = "file1"},
new ZipContents {FileName = "file2"},
new ZipContents {FileName = "file3"}
};
zipcontents = zipcontents.Where(z => filenames.Contains(z.FileName)).ToList();
//zipcontents contains only files that were in 'filenames'
Honestly, this is what LINQ was made for: querying data.
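For completeness, the RemoveAll route mentioned in the question's edit would look something like the following sketch; the HashSet is optional, but it makes each membership test O(1), which helps when both lists are large:
var lookup = new HashSet<string>(filenames);
// keep only entries whose FileName appears in the final list, mutating the list in place
importList.RemoveAll(il => !lookup.Contains(il.FileName));
Unlike the Where(...).ToList() approach, this modifies the existing list rather than building a new one.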

How to remove contents of one csv file from another in C#

I have 2 csv files, file1.csv and file2.csv. Some lines in each file will be identical. I wish to create a 3rd csv file, based upon file2.csv but with any lines that are present in file1.csv removed from it. Effectively I wish to subtract file1.csv from file2.csv ignoring any lines present in file1 that are not in file2.
I know that I could use a StreamReader to read each line in file2.csv and search for it in file1.csv. If it does not exist in file1.csv I can write it to file3.csv. However, the files are very large (over 30000 lines) and I believe this would take a lot of processing time.
I suspect there may be a better method of loading each csv into an array and then performing a simple subtraction function on them to obtain the desired result. I would appreciate either some help with the code or a pointer to the method I should use to approach this problem.
Example content of files:
file1.csv
dt97861.jpg,149954,c1714ee1,\folder1\folderA\,
dt97862.jpg,149955,c1714ee0,\folder1\folderA\,
dt97863.jpg,59368,cd23f223,\folder2\folderA\,
dt97864.jpg,57881,0835be4a,\folder2\folderB\,
dt97865.jpg,57882,0835be4b,\folder2\folderB\,
file2.csv
dt97862.jpg,149955,c1714ee0,\folder1\folderA\,
dt97863.jpg,59368,cd23f223,\folder2\folderA\,
dt97864.jpg,57881,0835be4a,\folder2\folderB\,
dt97865.jpg,57882,0835be4b,\folder2\folderB\,
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
The results I require is:
file3.csv
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
EDIT:
With the help below I came to the following solution which I believe to be nice and elegant:
public static IEnumerable<string> ReadFile(string path)
{
string line;
using (var reader = File.OpenText(path))
while ((line = reader.ReadLine()) != null)
yield return line;
}
then:
var file2 = ReadFile(file2FilePath);
var file1 = ReadFile(file1FilePath);
var file3 = file2.Except(file1);
File.WriteAllLines(file3FilePath, file3);
Assuming the lines are perfectly identical, you can read both files into two IEnumerable<string> and extract the difference with Enumerable.Except<T>. This will produce the same result regardless of the ordering.
Example :
var file1 = new List<string>{
@"dt97861.jpg,149954,c1714ee1,\folder1\folderA\,",
@"dt97862.jpg,149955,c1714ee0,\folder1\folderA\,",
@"dt97863.jpg,59368,cd23f223,\folder2\folderA\,",
@"dt97864.jpg,57881,0835be4a,\folder2\folderB\,",
@"dt97865.jpg,57882,0835be4b,\folder2\folderB\,",
};
var file2 = new List<string>{
@"dt97862.jpg,149955,c1714ee0,\folder1\folderA\,",
@"dt97863.jpg,59368,cd23f223,\folder2\folderA\,",
@"dt97864.jpg,57881,0835be4a,\folder2\folderB\,",
@"dt97865.jpg,57882,0835be4b,\folder2\folderB\,",
@"dt97866.jpg,57883,0835be4c,\folder2\folderB\,",
@"dt97867.jpg,57884,0835be4d,\folder3\folderA\,",
@"dt97868.jpg,57885,0835be4e,\folder3\folderA\,",
};
file2.Except(file1).Dump(); // Dump() is a LINQPad method; in a console app, loop and Console.WriteLine instead
Output :
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
Here is the function to load any file into an IEnumerable<string>. Just don't forget using System.IO;.
public static IEnumerable<string> ReadFile(string path)
{
string line;
using(var reader = File.OpenText(path))
while((line = reader.ReadLine()) != null)
yield return line;
}
To write the result to a file :
//using System.IO; is required
File.WriteAllLines("file3.csv", file2.Except(file1))
Remarks: File.WriteAllLines will create or overwrite the file.
While this may not be the best approach, it's the one I've used in the past. It's a bit of a dirty hack, but...
Import both CSV files into a DataTable, so you will have two DataTables (I personally prefer ClosedXML if you plan to use an Excel-type format, otherwise just use a normal file read/write; my example uses a regular read/write)
Move data from the DataTable into a list (my example assumes comma-separated values, one per line.)
Find unique values between the lists and merge
Export the merged list to a csv file
*[Edited steps after actually working on the code]
Per request from Bit, I've added an example using sample data from Some Random Website. This was written in VS2008 against .NET 3.5, but it should work on 3.5+. I copied us-500 into two versions, the original and one with a single modified row to create a unique value to test. This project targets the x86 platform. I've used a new Windows Form for testing.
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Linq;
using System.Windows.Forms;
namespace TestSandbox
{
public partial class Form1 : Form
{
public Form1()
{
var file1 = new DataTable();
var file2 = new DataTable();
InitializeComponent();
//Gets data from csv file, select allows for filtering
using (var conn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"))
{
conn.Open();
using (var adapter = new OleDbDataAdapter(@"select * from [us-500.csv]", conn))
{
adapter.Fill(file1);
}
}
using (var conn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"))
{
conn.Open();
using (var adapter = new OleDbDataAdapter(@"select * from [us-500-2.csv]", conn))
{
adapter.Fill(file2);
}
}
//Moves datatable information to lists for comparison
var file1List = (from DataRow row in file1.Rows select row.ItemArray.Select(field => field.ToString()).ToArray() into fields select string.Join(",", fields)).ToList();
var file2List = (from DataRow row in file2.Rows select row.ItemArray.Select(field => field.ToString()).ToArray() into fields select string.Join(",", fields)).ToList();
//Adds all data from file2 into file1 list, except for data that already exists in file1
file1List.AddRange(file2List.Except(file1List));
//Exports all results to c:\results.csv
File.WriteAllLines(@"C:\Results.csv", file1List.ToArray());
}
}
}
*Note: After looking at the code, importing straight to a list looks like it would be more efficient, but I'll leave this as is for now since it's not overly complicated.
Step 1. Using System.IO, we'll read two files using FileStream and create a third file using StreamWriter.
Step 2. Use FileStream to read file #1. e.g.
using (var FS = new System.IO.FileStream(file1, System.IO.FileMode.Open, System.IO.FileAccess.Read)) { ...<insert next steps in here>...}
Step 3. Nest another FileStream to read file #2. This stream will be read multiple times, so it's best if you can put the smaller file in this part of the nest. You can do this by checking the size of the file prior to jumping into these loops.
Step 4. Read in a single line from our biggest file, File#1, then we compare it against ALL lines from File#2 sequentially. If a match is found, set a boolean to TRUE indicating that there is a matching line found in File #2.
Step 5. Once we're at the end of File #2, check for a true/false condition of the boolean. If its false, SAVE the string we read from File #1 into File #3. This is your output file.
Step 6. Reset the stream pointer for File #2 to the beginning of the file e.g. FS.Seek(0, System.IO.SeekOrigin.Begin)
Step 7. Repeat from Step 4 until we've reached the end of File #1. File #3's contents should represent only unique entries from File #1 that are not members of File #2
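A rough sketch of those steps (needs using System.IO;; the path variables are placeholders, and to get the question's file3.csv you'd pass file2.csv as the outer file and file1.csv as the inner one). Note the DiscardBufferedData call, which is required after seeking the stream underneath a StreamReader, and that this is O(n*m), so the Except approach above will usually be faster:
using (var reader1 = new StreamReader(outerPath)) // the file whose unique lines we keep
using (var reader2 = new StreamReader(innerPath)) // the file that gets re-read for every line
using (var writer = new StreamWriter(resultPath))
{
string line1;
while ((line1 = reader1.ReadLine()) != null) // Step 4: one line from file #1
{
bool found = false;
string line2;
while ((line2 = reader2.ReadLine()) != null)
{
if (line1 == line2) { found = true; break; }
}
if (!found)
writer.WriteLine(line1); // Step 5: keep lines not present in file #2
reader2.BaseStream.Seek(0, SeekOrigin.Begin); // Step 6: rewind file #2
reader2.DiscardBufferedData(); // flush the reader's internal buffer after seeking
}
}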

Execute SQL scripts in a required order

I need to create a small utility to execute SQL files on SQL Server 2008 R2. I have tried the following code:
private static void ExecuteScripts()
{
string sqlConnectionString = "UID=sa;password=passw0rd;Data Source=somesqlserver\\db01";
DirectoryInfo info = new DirectoryInfo(@"c:\dxsh\");
FileInfo[] fileInfos = info.GetFiles("1.8*");
foreach (var fileInfo in fileInfos)
{
string script = fileInfo.OpenText().ReadToEnd();
var conn = new SqlConnection(sqlConnectionString);
var server = new Server(new ServerConnection(conn));
server.ConnectionContext.ExecuteNonQuery(script);
}
}
I will have the following files in the folder:
1. 1.8_DatabaseAndUsers.sql
2. 1.8_TablesAndTypes.sql
3. 1.8_Views.sql
4. 1.8_KeysAndIndex.sql
5. 1.8_ProceduresAndFunction.sql
I need to execute the files in this order only. Please help.
If you know the order in which you want to execute the files, just fetch the files in the order you expect:
string[] files = { "1.8_DatabaseAndUsers.sql", "1.8_TablesAndTypes.sql", ... };
foreach (var file in files)
{
// Simpler way of reading files (and doesn't leave the file handle open)
string script = File.ReadAllText(file);
// using statement to avoid leaking resources
using (var conn = new SqlConnection(...))
{
var server = new Server(new ServerConnection(conn));
server.ConnectionContext.ExecuteNonQuery(script);
}
}
Basically, you shouldn't rely on the order in which the files are returned by GetFiles - if you want them in a specific order, just enforce that yourself.
Another option is to use GetFiles but make sure the filenames can be ordered appropriately, e.g.
1.8_01_DatabaseAndUsers.sql
1.8_02_TablesAndTypes.sql
1.8_03_Views.sql
1.8_04_KeysAndIndex.sql
1.8_05_ProceduresAndFunction.sql
That way you don't need to hard-code the names in your program, but you can still guarantee the order, just by sorting the filenames before executing the scripts.
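For example, with that naming scheme the whole loop could look like this sketch (folder path taken from the question; OrderBy needs using System.Linq;):
foreach (var file in Directory.GetFiles(@"c:\dxsh", "1.8_*.sql").OrderBy(f => f))
{
string script = File.ReadAllText(file);
// execute the script exactly as before
}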
I presume you do not wish to hardcode the file names into your code.
In this case, you should rename the files so that they are alphabetically ordered, which can easily be achieved by putting a number in front of their name (such as 01, 02, etc.) so the first file will be 01 1.8_DatabaseAndUsers.sql and so on.
Directory.GetFiles() should return the files in alphabetical order, but you can use the following code to retrieve the file names and add them to a list, which you then explicitly sort into alphabetical order:
// Get a list of files and add them to a list
List<string> fileList = new List<string>();
foreach (string item in Directory.GetFiles(@"c:\dxsh", "* 1.8*"))
fileList.Add(item);
fileList.Sort();
// Go through each file in the list order
for (int i = 0; i < fileList.Count; i++)
{
string filename = fileList[i];
string script = File.ReadAllText(filename);
// Run your code
}

Bulk data insertion in SQL Server table from delimited text file using c#

I have a tab-delimited text file. The file is around 100 MB. I want to store the data from this file in a SQL Server table. The file yields about 1 million records when stored in SQL Server. What is the best way to achieve this?
I can create an in-memory DataTable in C# and then upload it to SQL Server, but in that case it will load the entire 100 MB file into memory. What if the file size gets bigger?
No problem; a streaming CsvReader (for example, the LumenWorks Fast CSV Reader) will handle most delimited text formats and implements IDataReader, so it can be used to feed a SqlBulkCopy. For example:
using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = first row is headers; the reader also accepts a delimiter argument (e.g. '\t') for tab-separated input
using (var bcp = new SqlBulkCopy(connectionString))
{
bcp.DestinationTableName = "Foo";
bcp.WriteToServer(csv);
}
Note that CsvReader has lots of options for more subtle file handling (specifying the delimiter rules, etc.). SqlBulkCopy is the high-performance bulk-load API, and very efficient. This is a streaming reader/writer API; it does not load all the data into memory at once.
You should read the file line by line, so you don't have to load the whole file into memory:
using (var file = System.IO.File.OpenText(filename))
{
while (!file.EndOfStream)
{
string line = file.ReadLine();
// TODO: Do your INSERT here
}
}
* Update *
"This will make 1 million separate insert commands to sql server. Is there any way to make it in bulk"
You could use parameterised queries, which would still issue 1M inserts, but would still be quite fast.
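As a hedged sketch of that parameterised route (the table and column names MyTable, ColA and ColB are made up; the transaction is optional, but batching the inserts inside one speeds things up considerably):
using (var conn = new SqlConnection(connectionString))
{
conn.Open();
using (var tran = conn.BeginTransaction())
using (var cmd = new SqlCommand("INSERT INTO MyTable (ColA, ColB) VALUES (@a, @b)", conn, tran))
{
cmd.Parameters.Add("@a", SqlDbType.NVarChar, 100);
cmd.Parameters.Add("@b", SqlDbType.NVarChar, 100);
foreach (var line in File.ReadLines(path)) // streams the file line by line
{
var parts = line.Split('\t'); // tab-delimited, per the question
cmd.Parameters["@a"].Value = parts[0];
cmd.Parameters["@b"].Value = parts[1];
cmd.ExecuteNonQuery(); // one insert per row; the prepared parameters are reused
}
tran.Commit();
}
}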
Alternatively, you can use SqlBulkCopy, but that's going to be rather difficult if you don't want to use 3rd party libraries. If you are more amenable to the MS license, you could use the LINQ Entity Data Reader (distributed under Ms-PL license), which provides the AsDataReader extension method:
void MyInsertMethod()
{
using (var bulk = new SqlBulkCopy("MyConnectionString"))
{
bulk.DestinationTableName = "MyTableName";
bulk.WriteToServer(GetRows().AsDataReader());
}
}
class MyType
{
public string A { get; set; }
public string B { get; set; }
}
IEnumerable<MyType> GetRows()
{
using (var file = System.IO.File.OpenText("MyTextFile"))
{
while (!file.EndOfStream)
{
var splitLine = file.ReadLine().Split(',');
yield return new MyType() { A = splitLine[0], B = splitLine[1] };
}
}
}
If you didn't want to use the MS-licensed code either, you could implement IDataReader yourself, but that is going to be a PITA. Note that the delimiter handling above (Split(',')) is not at all robust, and for the tab-delimited file in the question you'd want Split('\t'); also, the column names in the table must match the property names on MyType. TBH, I'd recommend you go with Marc's answer on this one.
