I have a number of text files that all follow the same content format:
"Title section","Version of the app"
10
"<thing 1>","<thing 2>","<thing 3>","<thing 4>","<thing 5>","<thing 6>","<thing 7>","<thing 8>","<thing 9>","<thing 10>"
'Where:
' first line never changes, it always contains exactly these 2 items
' second line is a count of how many "line 3s" there are
' line 3 contains a command to execute and (up to) 9 parameters
' - there will always be 10 qoute-delimited entries, even if some are blank
' - there can be N number of entries (in this example, there will be 10 commands to read)
I am reading each of these text files in, using StreamReader, and want to set each file up in its own class.
public class MyTextFile{
public string[] HeaderLine { get; set; }
public int ItemCount { get; set; }
List<MyCommandLine> Commands { get; set;}
}
public class MyCommandLine{
public string[] MyCommand { get; set; }
}
private void btnGetMyFilesiles_Click(object sender, EventArgs e){
DirectoryInfo myFolder = new DirectoryInfo(#"C:\FileSpot");
FileInfo[] myfiles = myfolder.GetFiles("*.ses");
string line = "";
foreach(FileInfo file in Files ){
str = str + ", " + file.Name;
// Read the file and display it line by line.
System.IO.StreamReader readingFile = new System.IO.StreamReader(file.Name);
MyTextFile myFileObject = new MyTextFile()
while ((line = readingFile.ReadLine()) != null){
' create the new MyTextFile here
}
file.Close();
}
}
}
The objective is to determine what the actual command being called is (""), and if any of the remaining parameters point to a pre-existing file, determine if that file exists. My problem is that I can't figure out how to read N number of "line 3" into their own objects and append these objects to the MyTextFile object. I'm 99% certain that I've led myself astray in reading each file line-by-line, but I don't know how to get out of it.
So, addressing the specific issue of getting N number of line 3 items into your class, you could do something like this (obviously you can make some changes so it is more specific to your application).
public class MyTextFile
{
public List<Array> Commands = new List<Array>();
public void EnumerateCommands()
{
for (int i = 0; i < Commands.Count; i++)
{
foreach (var c in Commands[i])
Console.Write(c + " ");
Console.WriteLine();
}
}
}
class Program
{
static void Main(string[] args)
{
string line = "";
int count = 0;
MyTextFile tf = new MyTextFile();
using (StreamReader sr = new StreamReader(#"path"))
{
while ((line = sr.ReadLine()) != null)
{
count += 1;
if (count >= 3)
{
object[] Arguments = line.Split(',');
tf.Commands.Add(Arguments);
}
}
}
tf.EnumerateCommands();
Console.ReadLine();
}
}
At least now you have a list of commands within your 'MyTextFile' class that you can enumerate through and do stuff with.
** I added the EnumerateCommands method so that you could actually see the list is storing the line items. The code should run in a Console application with the appropriate 'using' statements.
Hope this helps.
If all of the is separated with coma sign , you can just do something like :
int length = Convert.ToInt32 (reader.ReadLine ());
string line = reader.ReadLine ();
IEnumerable <string> things = line.Split (',').Select (thing => thing. Replace ('\"'', string.Empty).Take(length);
Take indicates how many things to take from the line.
Related
I'm currently working on a c# form.Basically, I have a lot of log files and most of them have duplicates lines between them. This form is supposed to concatenate a lot of those files into one file then delete all the duplicates in it so that I can have one log file without duplicates. I've already successfully made it work by taking 2 files, concatenating them, deleting all the duplicates in it then reproducing the process until I have no more files. Here is the function I made for this:
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
path_list contains all the path to the files I need to process.
path_list[0] contains all the probe.0.log files
path_list[1] contains all the probe.1.log files ...
Here is the idea I have for my problem but I have no idea how to code it :
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
// how I see it
if (currentMemoryUsage + newfile.Length > maximumProcessMemory) {
LogFile = LogFile.Distinct().ToList();
}
//end
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
I think this method will be much quicker, and it will adjust to any computer specs. Can anyone help me to make this work ? Or tell me if I'm wrong.
You are creating far too many lists and arrays and Distincts.
Just combine everything in a HashSet, then write it out
private static void CombineNoDuplicates(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
foreach (var paths in path_list)
{
logFile.Clear();
foreach (var path in paths)
{
var lines = File.ReadLines(file);
logFile.UnionWith(lines);
backgroundWorker1.ReportProgress(1);
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
File.WriteAllLines(new_path, logFile);
}
}
Ideally you should use async instead of BackgroundWorker which is deprecated. This also means you don't need to store a whole file in memory at once, except for the first one.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
foreach (var paths in path_list)
{
logFile.Clear();
foreach (var path in paths)
{
using (var sr = new StreamReader(file))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
logFile.Add(line);
}
}
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
await File.WriteAllLinesAsync(new_path, logFile);
}
}
If you want to risk a colliding hash-code, you could cut down your memory usage even further by just putting the strings' hashes in a HashSet, then you can fully stream all files.
Caveat: colliding hash-codes are a distinct possibility, especially with many strings. Analyze your data to see fi you can risk this.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var hashes = new HashSet<int>(1000); // pre-size your hashset to a suitable size
foreach (var paths in path_list)
{
hashes.Clear();
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
using (var output = new StreamWriter(new_path))
{
foreach (var path in paths)
{
using (var sr = new StreamReader(file))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
if (hashes.Add(line.GetHashCode())
await output.WriteLineAsync(line);
}
}
}
}
}
}
You can get even more performance if you would read Span<byte> arrays and parse the lines like that, I will leave that as an exercise to the reader as it's quite complex.
Assuming your log files already contain lines that are sorted in chronological order1, we can effectively treat them as intermediate files for a multi-file sort and perform merging/duplicate elimination in one go.
It would be a new class, something like this:
internal class LogFileMerger : IEnumerable<string>
{
private readonly List<IEnumerator<string>> _files;
public LogFileMerger(HashSet<string> fileNames)
{
_files = fileNames.Select(fn => File.ReadLines(fn).GetEnumerator()).ToList();
}
public IEnumerator<string> GetEnumerator()
{
while (_files.Count > 0)
{
var candidates = _files.Select(e => e.Current);
var nextLine = candidates.OrderBy(c => c).First();
for (int i = _files.Count - 1; i >= 0; i--)
{
while (_files[i].Current == nextLine)
{
if (!_files[i].MoveNext())
{
_files.RemoveAt(i);
break;
}
}
}
yield return nextLine;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
You can create a LogFileMerger using the set of input log file names and pass it directly as the IEnumerable<string> to some method like File.WriteAllLines. Using File.ReadLines should mean that the amount of memory being used for each input file is just a small buffer on each file, and we never attempt to have all of the data from any of the files loaded at any time.
(You may want to adjust the OrderBy and comparison operations in the above if there are requirements around case insensitivity but I don't see any evident in the question)
(Note also that this class cannot be enumerated multiple times in the current design. That could be adjusted by storing the paths instead of the open enumerators in the class field and making the list of open enumerators a local inside GetEnumerator)
1If this is not the case, it may be more sensible to sort each file first so that this assumption is met and then proceed with this plan.
Using C#. I am creating a program that stores movies in a text file such as:
38#Finding Nemo#Family#2001#yes
36#Wonder Woman#Action#2017#yes
35#Solo#Action#2018#yes
I am trying to process a batch transaction into my movie inventory file. This text file holds an action code after the first delimiter('#') whether to add, change, or delete a movie in the movie inventory file. The problem that I am having is that each line has a different format while I am trying to store it into an array. What is a way to read the header line differently and ignore empty values between delimiters?
An example batch file is:
H#Movie Inventory Updates#10/31/2019
D#C#5#Family Movie 1###no
D#A#21#Sci-Fi Movie 1#sci-fi#2005#yes
D#A#22#Other Movie 1#other#2001#yes
D#C#1###2002#
D#D#4####
public static Batch[] ReadBatchFile()
{
Batch[] bTransactions = new Batch[100];
if (File.Exists("batch_transaction.txt"))
{
StreamReader inFile = new StreamReader("batch_transaction.txt");
//string path = "batch_transaction.txt";
string line = inFile.ReadLine();
while (line != null) //read all transaction records, ignoring header and footer
{
string[] tempArray = line.Split('#');
bTransactions[Batch.GetCount()] = new Batch(tempArray[0], tempArray[1], int.Parse(tempArray[2]), tempArray[3], tempArray[4], double.Parse(tempArray[5]), tempArray[6]);
Batch.SetCount(Batch.GetCount() + 1);
line = inFile.ReadLine();
}
inFile.Close();
}
else
{
Console.WriteLine("File not found.");
}
return bTransactions;
}
}
Here is how I'm currently trying to read in the the other lines:
public Batch(string recordType, string actionCode, int movieId, string movieTitle, string movieGenre, double releaseYear, string inStock)
{
this.recordType = recordType;
this.actionCode = actionCode;
this.movieId = movieId;
this.movieTitle = movieTitle;
this.movieGenre = movieGenre;
this.releaseYear = releaseYear;
}
I do not want to read the whole file at any point, I know there are answers on that question, I want t
o read the First or Last line.
I know that my code locks the file that it's reading for two reasons 1) The application that writes to the file crashes intermittently when I run my little app with this code but it never crashes when I am not running this code! 2) There are a few articles that will tell you that File.ReadLines locks the file.
There are some similar questions but that answer seems to involve reading the whole file which is slow for large files and therefore not what I want to do. My requirement to only read the last line most of the time is also unique from what I have read about.
I nead to know how to read the first line (Header row) and the last line (latest row). I do not want to read all lines at any point in my code because this file can become huge and reading the entire file will become slow.
I know that
line = File.ReadLines(fullFilename).First().Replace("\"", "");
... is the same as ...
FileStream fs = new FileStream(#fullFilename, FileMode.Open, FileAccess.Read, FileShare.Read);
My question is, how can I repeatedly read the first and last lines of a file which may be being written to by another application without locking it in any way. I have no control over the application that is writting to the file. It is a data log which can be appended to at any time. The reason I am listening in this way is that this log can be appended to for days on end. I want to see the latest data in this log in my own c# programme without waiting for the log to finish being written to.
My code to call the reading / listening function ...
//Start Listening to the "data log"
private void btnDeconstructCSVFile_Click(object sender, EventArgs e)
{
MySandbox.CopyCSVDataFromLogFile copyCSVDataFromLogFile = new MySandbox.CopyCSVDataFromLogFile();
copyCSVDataFromLogFile.checkForLogData();
}
My class which does the listening. For now it simply adds the data to 2 generics lists ...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MySandbox.Classes;
using System.IO;
namespace MySandbox
{
public class CopyCSVDataFromLogFile
{
static private List<LogRowData> listMSDataRows = new List<LogRowData>();
static String fullFilename = string.Empty;
static LogRowData previousLineLogRowList = new LogRowData();
static LogRowData logRowList = new LogRowData();
static LogRowData logHeaderRowList = new LogRowData();
static Boolean checking = false;
public void checkForLogData()
{
//Initialise
string[] logHeaderArray = new string[] { };
string[] badDataRowsArray = new string[] { };
//Get the latest full filename (file with new data)
//Assumption: only 1 file is written to at a time in this directory.
String directory = "C:\\TestDir\\";
string pattern = "*.csv";
var dirInfo = new DirectoryInfo(directory);
var file = (from f in dirInfo.GetFiles(pattern) orderby f.LastWriteTime descending select f).First();
fullFilename = directory + file.ToString(); //This is the full filepath and name of the latest file in the directory!
if (logHeaderArray.Length == 0)
{
//Populate the Header Row
logHeaderRowList = getRow(fullFilename, true);
}
LogRowData tempLogRowList = new LogRowData();
if (!checking)
{
//Read the latest data in an asynchronous loop
callDataProcess();
}
}
private async void callDataProcess()
{
checking = true; //Begin checking
await checkForNewDataAndSaveIfFound();
}
private static Task checkForNewDataAndSaveIfFound()
{
return Task.Run(() => //Call the async "Task"
{
while (checking) //Loop (asynchronously)
{
LogRowData tempLogRowList = new LogRowData();
if (logHeaderRowList.ValueList.Count == 0)
{
//Populate the Header row
logHeaderRowList = getRow(fullFilename, true);
}
else
{
//Populate Data row
tempLogRowList = getRow(fullFilename, false);
if ((!Enumerable.SequenceEqual(tempLogRowList.ValueList, previousLineLogRowList.ValueList)) &&
(!Enumerable.SequenceEqual(tempLogRowList.ValueList, logHeaderRowList.ValueList)))
{
logRowList = getRow(fullFilename, false);
listMSDataRows.Add(logRowList);
previousLineLogRowList = logRowList;
}
}
//System.Threading.Thread.Sleep(10); //Wait for next row.
}
});
}
private static LogRowData getRow(string fullFilename, bool isHeader)
{
string line;
string[] logDataArray = new string[] { };
LogRowData logRowListResult = new LogRowData();
try
{
if (isHeader)
{
//Asign first (header) row data.
//Works but seems to block writting to the file!!!!!!!!!!!!!!!!!!!!!!!!!!!
line = File.ReadLines(fullFilename).First().Replace("\"", "");
}
else
{
//Assign data as last row (default behaviour).
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
}
logDataArray = line.Split(',');
//Copy Array to Generics List and remove last value if it's empty.
for (int i = 0; i < logDataArray.Length; i++)
{
if (i < logDataArray.Length)
{
if (i < logDataArray.Length - 1)
{
//Value is not at the end, from observation, these always have a value (even if it's zero) and so we'll store the value.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else
{
//This is the last value
if (logDataArray[i].Replace("\"", "").Trim().Length > 0)
{
//In this case, the last value is not empty, store it as normal.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else { /*The last value is empty, e.g. "123,456,"; the final comma denotes another field but this field is empty so we will ignore it now. */ }
}
}
}
}
catch (Exception ex)
{
if (ex.Message == "Sequence contains no elements")
{ /*Empty file, no problem. The code will safely loop and then will pick up the header when it appears.*/ }
else
{
//TODO: catch this error properly
Int32 problemID = 10; //Unknown ERROR.
}
}
return logRowListResult;
}
}
}
I found the answer in a combination of other questions. One answer explaining how to read from the end of a file, which I adapted so that it would read only 1 line from the end of the file. And another explaining how to read the entire file without locking it (I did not want to read the entire file but the not locking part was useful). So now you can read the last line of the file (if it contains end of line characters) without locking it. For other end of line delimeters, just replace my 10 and 13 with your end of line character bytes...
Add the method below to public class CopyCSVDataFromLogFile
private static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
and replace this line ...
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
with this code block ...
Int32 endOfLineCharacterCount = 0;
Int32 previousCharByte = 0;
Int32 currentCharByte = 0;
//Read the file, from the end, for 1 line, allowing other programmes to access it for read and write!
using (FileStream reader = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
{
int i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while ((-i < reader.Length) /*Belt and braces: if there were no end of line characters, reading beyond the file would give a catastrophic error here (to be avoided thus).*/
&& (endOfLineCharacterCount < 2)/*Exit Condition*/)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
currentCharByte = byteRead;
//Exit condition: the first 2 characters we read (reading backwards remember) were end of line ().
//So when we read the second end of line, we have read 1 whole line (the last line in the file)
//and we must exit now.
if (currentCharByte == 13 && previousCharByte == 10)
{
endOfLineCharacterCount++;
}
if (byteRead == 10 && lineBuffer.Length > 0)
{
line += Reverse(lineBuffer.ToString());
lineBuffer.Remove(0, lineBuffer.Length);
}
lineBuffer.Append((char)byteRead);
previousCharByte = byteRead;
}
reader.Close();
}
I'm trying to work out a way of removing records from a program I'm writing. I have a text file with all the customer data spread over a set of lines and I read in these lines one at a time and store them in a List
When writing I simply append to the file. However, for deleting I had the idea of adding a character such as * or # to the front of lines no longer needed. However I am unsure how to do this
Below is how I currrently read the data in:
Thanks in advance
StreamReader dataIn = null;
CustomerClass holdcus; //holdcus and holdacc are used as "holding pens" for the next customer/account
Accounts holdacc;
bool moreData = false;
string[] cusdata = new string[13]; //holds customer data
string[] accdata = new string[8]; //holds account data
if (fileIntegCheck(inputDataFile, ref dataIn))
{
moreData = getCustomer(dataIn, cusdata);
while (moreData == true)
{
holdcus = new CustomerClass(cusdata[0], cusdata[1], cusdata[2], cusdata[3], cusdata[4], cusdata[5], cusdata[6], cusdata[7], cusdata[8], cusdata[9], cusdata[10], cusdata[11], cusdata[12]);
customers.Add(holdcus);
int x = Convert.ToInt32(cusdata[12]);
for (int i = 0; i < x; i++) //Takes the ID number for the last customer, as uses it to set the first value of the following accounts
{ //this is done as a key to which accounts map to which customers
moreData = getAccount(dataIn, accdata);
accdata[0] = cusdata[0];
holdacc = new Accounts(accdata[0], accdata[1], accdata[2], accdata[3], accdata[4], accdata[5], accdata[6], accdata[7]);
accounts.Add(holdacc);
}
moreData = getCustomer(dataIn, cusdata);
}
}
if (moreData != null) dataIn.Close();
Since your using string arrays, you can just do cusdata[index] = "#"+cusdata[index] to append it to the beginning of the line. However if your question is how to delete it from the file, why not skip the above step and just not add the line you want deleted when writing the file?
Here is a small read / write sample that should suit your needs. If it doesnt then let me know in the comment.
class Program
{
static readonly string filePath = "c:\\test.txt";
static void Main(string[] args)
{
// Read your file
List<string> lines = ReadLines();
//Create your remove logic here ..
lines = lines.Where(x => x.Contains("Julia Roberts") != true).ToList();
// Rewrite the file
WriteLines(lines);
}
static List<string> ReadLines()
{
List<string> lines = new List<string>();
using (StreamReader sr = new StreamReader(new FileStream(filePath, FileMode.Open)))
{
while (!sr.EndOfStream)
{
string buffer = sr.ReadLine();
lines.Add(buffer);
// Just to show you the results
Console.WriteLine(buffer);
}
}
return lines;
}
static void WriteLines(List<string> lines)
{
using (StreamWriter sw = new StreamWriter(new FileStream(filePath, FileMode.Create)))
{
foreach (var line in lines)
{
sw.WriteLine(line);
}
}
}
}
I used the following "data sample" for this
Matt Damon 100 222
Julia Roberts 125 152
Robert Downey Jr. 150 402
Tom Hanks 55 932
I have the following code which takes a CSV and writes to a console:
using (CsvReader csv = new CsvReader(
new StreamReader("data.csv"), true))
{
// missing fields will not throw an exception,
// but will instead be treated as if there was a null value
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
// to replace by "" instead, then use the following action:
//csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};",
headers[i],
csv[i] == null ? "MISSING" : csv[i]));
Console.WriteLine();
}
}
The CSV file has 7 headers for which I have 7 columns in my SQL table.
What is the best way to take each csv[i] and write to a row for each column and then move to the next row?
I tried to add the ccsv[i] to a string array but that didn't work.
I also tried the following:
SqlCommand sql = new SqlCommand("INSERT INTO table1 [" + csv[i] + "]", mysqlconnectionstring);
sql.ExecuteNonQuery();
My table (table1) is like this:
name address city zipcode phone fax device
your problem is simple but I will take it one step further and let you know a better way to approach the issue.
when you have a problem to sold, always break it down into parts and apply each part in each own method. For example, in your case:
1 - read from the file
2 - create a sql query
3 - run the query
and you can even add validation to the file (imagine your file does not even have 7 fields in one or more lines...) and the example below it to be taken, only if your file never passes around 500 lines, as if it does normally you should consider to use a SQL statement that takes your file directly in to the database, it's called bulk insert
1 - read from file:
I would use a List<string> to hold the line entries and I always use StreamReader to read from text files.
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
2 - generate the sql
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line))
.TrimEnd('\'').TrimEnd(','); // remove last ",'"
this.Query.Add(string.Format(this.LineTemplate, entries));
}
3 - run the query
SqlCommand sql = new SqlCommand(string.Join("", query), mysqlconnectionstring);
sql.ExecuteNonQuery();
having some fun I end up doing the solution and you can download it here, the output is:
The code can be found here. It needs more tweaks but I will left that for others. Solution written in C#, VS 2013.
The ExtractCsvIntoSql class is as follows:
public class ExtractCsvIntoSql
{
private string CsvPath, Separator;
private bool HasHeader;
private List<string[]> Lines;
private List<string> Query;
/// <summary>
/// Header content of the CSV File
/// </summary>
public string[] Header { get; private set; }
/// <summary>
/// Template to be used in each INSERT Query statement
/// </summary>
public string LineTemplate { get; set; }
public ExtractCsvIntoSql(string csvPath, string separator, bool hasHeader = false)
{
this.CsvPath = csvPath;
this.Separator = separator;
this.HasHeader = hasHeader;
this.Lines = new List<string[]>();
// you can also set this
this.LineTemplate = "INSERT INTO [table1] SELECT ({0});";
}
/// <summary>
/// Generates the SQL Query
/// </summary>
/// <returns></returns>
public List<string> Generate()
{
if(this.CsvPath == null)
throw new ArgumentException("CSV Path can't be empty");
// extract csv into object
Extract();
// generate sql query
GenerateQuery();
return this.Query;
}
private void Extract()
{
string line;
string[] splittedLine;
int iLine = 0;
try
{
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
}
catch (Exception ex)
{
if(ex.InnerException != null)
while (ex.InnerException != null)
ex = ex.InnerException;
throw ex;
}
// Lines will have all rows and each row, the column entry
}
private void GenerateQuery()
{
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line))
.TrimEnd('\'').TrimEnd(','); // remove last ",'"
this.Query.Add(string.Format(this.LineTemplate, entries));
}
}
}
and you can run it as:
class Program
{
static void Main(string[] args)
{
string file = Ask("What is the CSV file path? (full path)");
string separator = Ask("What is the current separator? (; or ,)");
var extract = new ExtractCsvIntoSql(file, separator);
var sql = extract.Generate();
Output(sql);
}
private static void Output(IEnumerable<string> sql)
{
foreach(var query in sql)
Console.WriteLine(query);
Console.WriteLine("*******************************************");
Console.Write("END ");
Console.ReadLine();
}
private static string Ask(string question)
{
Console.WriteLine("*******************************************");
Console.WriteLine(question);
Console.Write("= ");
return Console.ReadLine();
}
}
Usually i like to be a bit more generic so i'll try to explain a very basic flow i use from time to time:
I don't like the hard coded attitude so even if your code will work it will be dedicated specifically to one type. I prefer i simple reflection, first to understand what DTO is it and then to understand what repository should i use to manipulate it:
For example:
public class ImportProvider
{
private readonly string _path;
private readonly ObjectResolver _objectResolver;
public ImportProvider(string path)
{
_path = path;
_objectResolver = new ObjectResolver();
}
public void Import()
{
var filePaths = Directory.GetFiles(_path, "*.csv");
foreach (var filePath in filePaths)
{
var fileName = Path.GetFileName(filePath);
var className = fileName.Remove(fileName.Length-4);
using (var reader = new CsvFileReader(filePath))
{
var row = new CsvRow();
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
while (reader.ReadRow(row))
{
var dtoInstance = (DtoBase)_objectResolver.Resolve("DAL.DTO", className + "Dto");
dtoInstance.FillInstance(row.ToArray());
repository.Save(dtoInstance);
}
}
}
}
}
Above is a very basic class responsible importing the data. Nevertheless of how this piece of code parsing CSV files (CsvFileReader), the important part is thata "CsvRow" is a simple List.
Below is the implementation of the ObjectResolver:
public class ObjectResolver
{
private readonly Assembly _myDal;
public ObjectResolver()
{
_myDal = Assembly.Load("DAL");
}
public object Resolve(string nameSpace, string name)
{
var myLoadClass = _myDal.GetType(nameSpace + "." + name);
return Activator.CreateInstance(myLoadClass);
}
}
The idea is to simple follow a naming convetion, in my case is using a "Dto" suffix for reflecting the instances, and "Dao" suffix for reflecting the responsible dao. The full name of the Dto or the Dao can be taken from the csv name or from the header (as you wish)
Next step is filling the Dto, each dto or implements the following simple abstract:
public abstract class DtoBase
{
public abstract void FillInstance(params string[] parameters);
}
Since each Dto "knows" his structure (just like you knew to create an appropriate table in the database), it can easily implement the FillInstanceMethod, here is a simple Dto example:
public class ProductDto : DtoBase
{
public int ProductId { get; set; }
public double Weight { get; set; }
public int FamilyId { get; set; }
public override void FillInstance(params string[] parameters)
{
ProductId = int.Parse(parameters[0]);
Weight = double.Parse(parameters[1]);
FamilyId = int.Parse(parameters[2]);
}
}
After you have your Dto filled with data you should find the appropriate Dao to handle it
which is basically happens in reflection in this line of the Import() method:
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
In my case the Dao implements an abstract base class - but it's not that relevant to your problem, your DaoBase can be a simple abstract with a single Save() method.
This way you have a dedicated Dao to CRUD your Dto's - each Dao simply knows how to save for its relevant Dto. Below is the corresponding ProductDao to the ProductDto:
public class ProductDao : DaoBase
{
private const string InsertProductQuery = #"SET foreign_key_checks = 0;
Insert into product (productID, weight, familyID)
VALUES (#productId, #weight, #familyId);
SET foreign_key_checks = 1;";
public override void Save(DtoBase dto)
{
var productToSave = dto as ProductDto;
var saveproductCommand = GetDbCommand(InsertProductQuery);
if (productToSave != null)
{
saveproductCommand.Parameters.Add(CreateParameter("#productId", productToSave.ProductId));
saveproductCommand.Parameters.Add(CreateParameter("#weight", productToSave.Weight));
saveproductCommand.Parameters.Add(CreateParameter("#familyId", productToSave.FamilyId));
ExecuteNonQuery(ref saveproductCommand);
}
}
}
Please ignore the CreateParameter() method, since it's an abstraction from the base classs. you can just use a CreateSqlParameter or CreateDataParameter etc.
Just notice, it's a real naive implementation - you can easily remodel it better, depends on your needs.
From the first impression of your questionc I guess you would be having hugely number of records (more than lacs). If yes I would consider the SQL bulk copies an option. If the record would be less go ahead single record. Insert. The reason for you insert not working is u not providing all the columns of the table and also there's some syntax error.