Converting log file to CSV - c#

I have to convert a Squid Web Proxy Server log file to a CSV file so that it can be loaded into PowerPivot for analysis of the queries.
How should I start? Any help would be greatly appreciated.
I have to use C# for this task. The log looks like the following:
Format: Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content
1473546438.145 917 5.45.107.68 TCP_DENIED/403 4114 GET http://atlantis.pennergame.de/pet/ - NONE/- text/html
1473546439.111 3 146.148.96.13 TCP_DENIED/403 4604 POST http://mobiuas.ebay.com/services/mobile/v1/UserAuthenticationService - NONE/- text/html
1473546439.865 358 212.83.168.7 TCP_DENIED/403 3955 GET http://www.theshadehouse.com/left-sidebar-post/ - NONE/- text/html
1473546439.985 218 185.5.97.68 TCP_DENIED/403 3600 GET http://www.google.pl/search? - NONE/- text/html
1473546440.341 2 146.148.96.13 TCP_DENIED/403 4604 POST http://mobiuas.ebay.com/services/mobile/v1/UserAuthenticationService - NONE/- text/html
1473546440.840 403 115.29.46.240 TCP_DENIED/403 4430 POST http://et.airchina.com.cn/fhx/consumeRecord/getCardConsumeRecordList.htm - NONE/- text/html
1473546441.486 2 52.41.27.39 TCP_DENIED/403 3813 POST http://www.deezer.com/ajax/action.php - NONE/- text/html
1473546441.596 2 146.148.96.13 TCP_DENIED/403 4604 POST http://mobiuas.ebay.com/services/mobile/v1/UserAuthenticationService - NONE/- text/html

It is already close to a CSV, so read it line by line and clean each line up a little:
...
line = line
    .Replace("   ", " ") // compress 3 spaces to 1
    .Replace("  ", " ") // compress 2 spaces to 1
    .Replace("  ", " ") // compress 2 spaces to 1, again
    .Replace(" - ", "|") // replace " - " by '|' (must run while the spaces are still there)
    .Replace(" ", "|"); // replace the remaining spaces by '|'
You may want to tweak this for fields like TCP_DENIED/403.
This gives you a '|'-separated line, which is easy to convert to any separator you need. Or split it up:
// write it out or process it further
string[] parts = line.Split('|');

public static class SquidWebProxyServerCommaSeparatedWriter
{
public static void WriteToCSV(string destination, IEnumerable<SquidWebProxyServerLogEntry> serverLogEntries)
{
var lines = serverLogEntries.Select(ConvertToLine);
File.WriteAllLines(destination, lines);
}
private static string ConvertToLine(SquidWebProxyServerLogEntry serverLogEntry)
{
return string.Join(",", serverLogEntry.Timestamp, serverLogEntry.Elapsed.ToString(),
serverLogEntry.ClientIPAddress, serverLogEntry.ActionCode, serverLogEntry.Size.ToString(),
serverLogEntry.Method.ToString(), serverLogEntry.Uri, serverLogEntry.Identity,
serverLogEntry.HierarchyFrom, serverLogEntry.MimeType);
}
}
public static class SquidWebProxyServerLogParser
{
public static IEnumerable<SquidWebProxyServerLogEntry> Parse(FileInfo fileInfo)
{
using (var streamReader = fileInfo.OpenText())
{
string row;
while ((row = streamReader.ReadLine()) != null)
{
yield return ParseRow(row);
}
}
}
private static SquidWebProxyServerLogEntry ParseRow(string row)
{
var fields = row.Split(new[] {"\t", " "}, StringSplitOptions.RemoveEmptyEntries); // drop the empty entries produced by runs of padding spaces
return new SquidWebProxyServerLogEntry
{
Timestamp = fields[0],
Elapsed = int.Parse(fields[1]),
ClientIPAddress = fields[2],
ActionCode = fields[3],
Size = int.Parse(fields[4]),
Method =
(SquidWebProxyServerLogEntry.MethodType)
Enum.Parse(typeof(SquidWebProxyServerLogEntry.MethodType), fields[5], true), // ignore case: the log has "GET", the enum has Get
Uri = fields[6],
Identity = fields[7],
HierarchyFrom = fields[8],
MimeType = fields[9]
};
}
public static IEnumerable<SquidWebProxyServerLogEntry> Parse(IEnumerable<string> rows) => rows.Select(ParseRow);
}
public sealed class SquidWebProxyServerLogEntry
{
public enum MethodType
{
Get = 0,
Post = 1,
Put = 2
}
public string Timestamp { get; set; }
public int Elapsed { get; set; }
public string ClientIPAddress { get; set; }
public string ActionCode { get; set; }
public int Size { get; set; }
public MethodType Method { get; set; }
public string Uri { get; set; }
public string Identity { get; set; }
public string HierarchyFrom { get; set; }
public string MimeType { get; set; }
}

A CSV is a delimited file whose field delimiter is ,. Almost all programs allow you to specify different field and record delimiters, using , and \n as defaults.
Your file could be treated as delimited if it didn't contain multiple spaces for alignment. You can replace runs of spaces with a single one using a regex such as [ \t]{2,} (\s{2,} would also swallow the \r\n line breaks when the whole file is read at once), e.g.:
var regex=new Regex(@"[ \t]{2,}");
var original=File.ReadAllText(somePath);
var delimited=regex.Replace(original," ");
File.WriteAllText(somePath,delimited);
Power BI Desktop already allows you to use a space as a delimiter. Even if it didn't, you could just replace every run of spaces with a comma by changing the pattern to [ \t]+, i.e.:
var regex=new Regex(@"[ \t]+");
...
var delimited=regex.Replace(original,",");
...
Log files are large, so it's a good idea to reduce the amount of memory they use. You can avoid reading the entire file into memory if you use ReadLines to read one line at a time, make the replacement and write it out:
using(var writer=File.CreateText(targetPath))
{
foreach(var line in File.ReadLines(somePath))
{
var newline=regex.Replace(line," ");
writer.WriteLine(newline);
}
}
Unlike ReadAllLines which loads all lines in an array, ReadLines is an iterator that reads and returns one line at a time.
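Putting the pieces together, a minimal sketch of a line-by-line converter might look like the following. The names SquidCsvSketch, ToCsvLine and Convert are made up for illustration, and the sketch assumes that no field of the log contains embedded whitespace. Fields that contain a comma or a double quote (URIs sometimes do) are quoted so that the output stays a valid CSV.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SquidCsvSketch
{
    // Split one log line on runs of spaces/tabs and join the fields with commas.
    public static string ToCsvLine(string logLine)
    {
        string[] fields = logLine.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(",", fields.Select(f => Escape(f)));
    }

    // Quote a field only when it contains a comma or a double quote.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Stream the log one line at a time so the whole file is never held in memory.
    public static void Convert(string sourcePath, string targetPath)
    {
        using (var writer = File.CreateText(targetPath))
        {
            foreach (var line in File.ReadLines(sourcePath))
            {
                if (line.Trim().Length == 0) continue; // skip blank lines
                writer.WriteLine(ToCsvLine(line));
            }
        }
    }
}
```

Calling SquidCsvSketch.Convert(somePath, targetPath) would then write one comma-separated record per log line.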

Related

Problems with using Regex but don't know why

I am making a WPF application where you insert PDF files and it converts them to text. After that, a few regular expressions are applied to the text to give me only the important parts of the PDF.
The first problem I am running into is with numbers: if the number is, for example, 6.90, it comes out as 6.9. I have tried changing my regex, but it makes no difference.
The second problem is with dates: for example, for 09-06-2022 nothing is written at all. I have also tried changing the regex, but the date just won't show up.
Does anyone know why this is?
This is a line from the PDF I use; I am trying to get only 6.90:
Date: 06-09-2022 € 5.70 € 1.20 € 6.90
This is the regex I use to get only the amount:
(?<=Date\:?\s?\s?\s?\d{0,2}\-\d{0,2}\-\d{0,4}\s?\€\s\d{0,10}\.?\,?\d{0,2}\s?\€\s\d{0,10}\,?\.?\d{0,10}\s?\€\s)\d{0,10}\.\d{0,2}
This is the regex I use to get only the date:
(?<=Date\:?\s?\s?\s?)\d{0,2}\-\d{0,2}\-\d{0,4}
There are a lot of "?" in it because I have to make it compatible with multiple different PDFs.
Screenshot of the outcome for the number in my self-made regex executor application
Screenshot of the outcome for the date in my self-made regex executor application
Screenshot of the outcome I get when I insert a PDF
As you can see in the screenshots, I get different results for some reason, and I have no clue why.
MainWindow
The button does all the work: it receives the PDF, converts it to text and passes the text to the class that holds all the regexes.
using Microsoft.Win32;
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
//ItextSharp is a tool i use in Visual Studio
public partial class MainWindow : Window
{
private List<IRegexPDFFactuur> _listRegexFactuur = new
List<IRegexPDFFactuur>();
public MainWindow()
{
InitializeComponent();
}
public void btnUpload_Click(object sender, EventArgs e)
{
var openFileDialog = new OpenFileDialog();
if (openFileDialog.ShowDialog() == true)
{
tbInvoer.Text = "";
var file = openFileDialog.FileName;
var text = File.ReadAllText(file);
PdfReader pdf_Reader = new PdfReader(file);
String tempPDFText = "";
for (int i = 1; i <= pdf_Reader.NumberOfPages; i++)
{
tempPDFText = tempPDFText +
PdfTextExtractor.GetTextFromPage(pdf_Reader, i);
}
var PDFText = tempPDFText;
_listRegexFactuur.Add(new PDFtest1Type());
foreach (var tempRegexFactuurType in _listRegexFactuur)
{
if
(tempRegexFactuurType.IsRegexTypeValidForPDF(PDFText))
{
var tempPDFdate = tempRegexFactuurType.GetPDFdate(PDFText);
var tempTotalamount = tempRegexFactuurType.GetTotalamount(PDFText);
tbInvoer.Text += $"PDF Date: {tempPDFdate}\r\n";
tbInvoer.Text += $"Total amount: {tempTotalamount}";
break;
}
}
}
}
}
Interface for Regex
string regexPDFname { get; set; }
string regexPDFdate { get; set; }
string regexTotalamount { get; set; }
bool IsRegexTypeValidForPDF(string argInput);
double? GetPDFdate(string argInput);
double? GetTotalamount(string argInput);
Class with implemented Interface for Regex
public string regexPDFname { get; set; } = @"(PDFtest1)";
public string regexPDFdate { get; set; } = @"(?<=Date\:?\s?\s?\s?)\d{0,2}\-\d{0,2}\-\d{0,4}";
public string regexTotalamount { get; set; } = @"(?<=Date\:?\s?\s?\s?\d{0,2}\-\d{0,2}\-\d{0,4}\s?\€\s\d{0,10}\.?\,?\d{0,2}\s?\€\s\d{0,10}\,?\.?\d{0,10}\s?\€\s)\d{0,10}\.\d{0,2}";
public bool IsRegexTypeValidForPDF(string argInput)
{
var tempMatch = Regex.Match(argInput, regexPDFname, RegexOptions.IgnoreCase);
if (!tempMatch.Success) return false;
if (tempMatch.Value == "PDFtest1") return true;
else return false;
}
public double? GetPDFdate(string argInput)
{
var tempMatch = Regex.Match(argInput, regexPDFdate, RegexOptions.IgnoreCase);
if (!tempMatch.Success) return null;
if (Double.TryParse(tempMatch.Value, out var tempPDFdate)) return tempPDFdate;
else return null;
}
public double? GetTotalamount(string argInput)
{
var tempMatch = Regex.Match(argInput, regexTotalamount, RegexOptions.IgnoreCase);
if (!tempMatch.Success) return null;
if (Double.TryParse(tempMatch.Value, out var tempTotalamount)) return tempTotalamount;
else return null;
}
This is much easier without Regex
string input = "Date: 06-09-2022 € 5.70 € 1.20 € 6.90";
string[] array = input.Split(new char[] {':', '€'});
DateTime date = DateTime.Parse(array[1]);
decimal amount1 = decimal.Parse(array[2]);
decimal amount2 = decimal.Parse(array[3]);
decimal amount3 = decimal.Parse(array[4]);
If you still want to use Regex, this is a much simpler solution
Date\:\s{0,}(\d{1,2}-?\d{1,2}-?\d{2}(?:\d{2})?).+(\d+\.\d+).+(\d+\.\d+).+(\d+\.\d+)
Breakdown
Date\:\s{0,} matches Date: followed by 0 or more spaces
(\d{1,2}-?\d{1,2}-?\d{2}(?:\d{2})?) matches your date string, accepting 1 or 2 digits for day and month and 2 or 4 digits for the year
.+(\d+\.\d+) matches any characters until it matches 1 or more numbers followed by . and 1 or more numbers. This is repeated 3 times to obtain the currency values
RegEx Storm Example
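For what it's worth, the two symptoms in the question have a likely common cause: both values are returned as double?. A double does not keep trailing zeros, so 6.90 prints as 6.9, and Double.TryParse cannot parse 06-09-2022, so the date method returns null and nothing is shown. A minimal sketch that keeps the amount as a decimal and the date as a DateTime could look like this. The class name InvoiceLineParser, the simplified pattern and the day-month-year order are assumptions, not part of the original code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class InvoiceLineParser
{
    // Simplified pattern for a line such as "Date: 06-09-2022 € 5.70 € 1.20 € 6.90".
    private static readonly Regex LinePattern = new Regex(
        @"Date:?\s*(\d{2}-\d{2}-\d{4})\s*€\s*(\d+[.,]\d{2})\s*€\s*(\d+[.,]\d{2})\s*€\s*(\d+[.,]\d{2})");

    public static bool TryParse(string text, out DateTime date, out decimal total)
    {
        date = default(DateTime);
        total = 0m;

        Match match = LinePattern.Match(text);
        if (!match.Success) return false;

        // Assumption: the date is written day-month-year.
        if (!DateTime.TryParseExact(match.Groups[1].Value, "dd-MM-yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
            return false;

        // Normalise a decimal comma to a point, then parse culture-independently.
        string amount = match.Groups[4].Value.Replace(',', '.');
        return decimal.TryParse(amount, NumberStyles.Number, CultureInfo.InvariantCulture, out total);
    }
}
```

Unlike double, a decimal remembers its scale, so formatting the parsed total with the invariant culture gives 6.90 again.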

How to Get index of a Character in an Unknown Line of a Multiline string in c#

I'm trying to get COVID-19 results (only the information about Iran) from an API and show them in a textbox.
The full result (all countries) that I get from the API is in JSON format.
To get only the Iran section, I made a function that loops through the lines of the string one by one and checks whether the line contains a "{". If so, it stores that index and keeps checking the following lines for a "}", and stores that index too. It then checks whether "Iran" occurs between the two and, if so, puts this text (from "{" to "}") into a string:
private string getBetween(string strSourceText, string strStartingPosition, string strEndingPosition)
{
int Starting_CurlyBracket_Index = 0;
int Ending_CurlyBracket_Index = 0;
string FinalText = null;
bool isTurnTo_firstIf = true;
foreach (var line in strSourceText.Split('\r', '\n'))
{
if (isTurnTo_firstIf == true)
{
if (line.Contains(strStartingPosition))
{
Starting_CurlyBracket_Index = line.IndexOf(strStartingPosition); //i think problem is here
isTurnTo_firstIf = false;
}
}
else if (isTurnTo_firstIf == false)
{
if (line.Contains(strEndingPosition))
{
Ending_CurlyBracket_Index = line.IndexOf(strEndingPosition); //i think problem is here
if (strSourceText.Substring(Starting_CurlyBracket_Index, Ending_CurlyBracket_Index - Starting_CurlyBracket_Index).Contains("Iran")) //error here
{
FinalText = strSourceText.Substring(Starting_CurlyBracket_Index, Ending_CurlyBracket_Index - Starting_CurlyBracket_Index);
break;
}
else
{
isTurnTo_firstIf = true;
}
}
}
}
return FinalText;
}
I call the function like this:
string OnlyIranSection = getBetween(Sorted_Covid19_Result, "{", "}"); //Sorted_Covid19_Result is the full result in json format that converted to string
textBox1.Text = OnlyIranSection;
But I get this error:
I know it's because the function gets the indexes within the current line, while what I need is the index within strSourceText, so that I can show only this section of the whole result:
USING JSON
As the comments suggest, using a JSON library is the easier way to achieve what you need.
You can start with this basic example:
static void Main(string[] args)
{
string jsonString = @"{
""results"": [
{""continent"":""Asia"",""country"":""Indonesia""},
{""continent"":""Asia"",""country"":""Iran""},
{""continent"":""Asia"",""country"":""Philippines""}
]
}";
var result = JsonConvert.DeserializeObject<JsonResult>(jsonString);
var iranInfo = result.InfoList.Where(i => i.Country.ToString() == "Iran").FirstOrDefault();
}
public class JsonResult
{
[JsonProperty("results")]
public List<Info> InfoList { get; set; }
}
public class Info
{
public object Continent { get; set; }
public object Country { get; set; }
}
UPDATE: USING INDEX
As long as the structure of the JSON is always consistent, this kind of sample solution can give you a hint.
Console.WriteLine("Original JSON:");
Console.WriteLine(jsonString);
Console.WriteLine();
Console.WriteLine("Step1: Make the json as single line,");
jsonString = jsonString.Replace(" ", "").Replace(Environment.NewLine, " ");
Console.WriteLine(jsonString);
Console.WriteLine();
Console.WriteLine("Step2: Get index of country Iran. And use that index to get the below output using substring.");
var iranIndex = jsonString.ToLower().IndexOf(@"""country"":""iran""");
var iranInitialInfo = jsonString.Substring(iranIndex);
Console.WriteLine(iranInitialInfo);
Console.WriteLine();
Console.WriteLine("Step3: Get index of continent. And use that index to get below output using substring.");
var continentIndex = iranInitialInfo.IndexOf(@"""continent"":");
iranInitialInfo = iranInitialInfo.Substring(0, continentIndex-3);
Console.WriteLine(iranInitialInfo);
Console.WriteLine();
Console.WriteLine("Step4: Get the first part of the info by using. And combine it with the initialInfo to bring the output below.");
var beginningIranInfo = jsonString.Substring(0, iranIndex);
var lastOpenCurlyBraceIndex = beginningIranInfo.LastIndexOf("{");
beginningIranInfo = beginningIranInfo.Substring(lastOpenCurlyBraceIndex);
var iranInfo = beginningIranInfo + iranInitialInfo;
Console.WriteLine(iranInfo);
OUTPUT USING INDEX:
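If you do want to stay with plain string indexes, the error in the question can be avoided by searching the whole string instead of individual lines, so that every index refers to the same string. A rough sketch, assuming the country objects are flat (no nested braces inside them); GetObjectContaining is a hypothetical helper name:

```csharp
using System;

public static class JsonSlice
{
    // Returns the "{...}" block that surrounds the first occurrence of keyword,
    // or null when the keyword or one of the braces cannot be found.
    public static string GetObjectContaining(string source, string keyword)
    {
        int hit = source.IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
        if (hit < 0) return null;

        int start = source.LastIndexOf('{', hit); // nearest "{" before the keyword
        int end = source.IndexOf('}', hit);       // nearest "}" after the keyword
        if (start < 0 || end < 0) return null;

        return source.Substring(start, end - start + 1);
    }
}
```

A real JSON parser remains the more robust option, because this sketch breaks as soon as an object contains nested braces or the keyword appears in another field.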

How to read .txt and count word/length, etc

I wrote an exam last week and had a really hard task that I couldn't solve.
I had a .txt file with a text.
The text is like this:
Der zerbrochne Krug, ein Lustspiel,
von Heinrich von Kleist.
Berlin. In der Realschulbuchhandlung.
1811.
[8]
PERSONEN.
WALTER, Gerichtsrath. ADAM, Dorfrichter. LICHT, Schreiber. FRAU MARTHE
RULL. EVE, ihre Tochter. VEIT TÜMPEL, ein Bauer. RUPRECHT, sein Sohn.
FRAU BRIGITTE. EIN BEDIENTER, BÜTTEL, MÄGDE, etc.
Die Handlung spielt in einem niederländischen Dorfe bei Utrecht.
[9] Scene: Die Gerichtsstube. Erster Auftritt.
And I got the Main with this code:
var document = new Document("Text.txt");
if (document.Contains("Haus") == true)
Console.WriteLine(document["Haus"]); // Word: haus, Frequency.: 36, Length: 4
else
Console.WriteLine("Word not found!");
Now I had to write a class which makes the code above work.
Does anyone have an idea how to solve this problem and would help a young student of business informatics to understand how this works?
Normally the StreamReader is easy for me, but in this case I couldn't manage it...
Thank you very much, and much love and health to all of you who try to help me.
Well this is the class you are looking for, hope this might help you.
class Document : Dictionary<string, int>
{
private const char WORDSPLITTER = ' ';
public string Filename { get; }
public Document(string filename)
{
Filename = filename;
Fill();
}
private void Fill()
{
foreach (var item in File.ReadLines(Filename))
{
foreach (var word in item.Split(WORDSPLITTER))
{
if (ContainsKey(word))
base[word] += 1;
else
Add(word, 1);
}
}
}
public bool Contains(string word) => ContainsKey(word);
public new string this[string word]
{
get
{
if (ContainsKey(word))
return $"Word: {word}, frequency: {base[word]}, Length: {word.Length}";
else
return $"Word {word} not found!";
}
}
}
Try the below function :
private bool FindWord( string SearchWord)
{
List<string> LstWords = new List<string>();
string[] Lines = File.ReadAllLines("Path of your File");
foreach (string line in Lines )
{
string[] words = line.Split(' ');
foreach (string word in words )
{
LstWords.Add(word);
}
}
// Find word set word to upper letters and target word to upper
int index = LstWords.FindIndex(x => x.Trim ().ToUpper ().Equals(SearchWord.ToUpper ()));
if (index==-1)
{
// Not Found
return false;
}
else
{
//word found
return true;
}
}
I find that Regex could be a good way to solve this:
var ms = Regex.Matches(textToSearch, wordToFind, RegexOptions.IgnoreCase);
if (ms.Count > 0)
{
Console.WriteLine($"Word: {wordToFind} Frequency: {ms.Count} Length: {wordToFind.Length}");
}
else
{
Console.WriteLine("Word not found!");
}
Regex is in the namespace:
using System.Text.RegularExpressions;
You will need to set the RegexOptions that are appropriate for your problem.
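Note that a plain pattern also matches inside longer words (searching for Haus would count Hausfrau as well). One possible way to count whole words only is to wrap the escaped word in \b word boundaries. CountWholeWord below is a hypothetical helper, not part of the original answer:

```csharp
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts case-insensitive occurrences of word that are not part of a longer word.
    public static int CountWholeWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' in the word from acting as regex syntax.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```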
One approach would be the steps below.
Create a class Document with the properties below:
//Contains file name
public string FileName { get; set; }
//Contains file data
public string FileData { get; set; }
//Contains word count
public int WordCount { get; set; }
//Holds all the words
public Dictionary<string, int> DictWords { get; set; } = new Dictionary<string, int>();
Define the constructor, which does three things:
Assign the FileName property from the incoming file name
Read the file from the path and get all the words from the file
Count the words and insert them into the dictionary, so the final dictionary will
have all the <word, count> records
//Constructor
public Document(string fileName)
{
//1. Assign the FileName property
FileName = fileName;
//2. Read File from the Path
string text = System.IO.File.ReadAllText(fileName, Encoding.Default);
string[] source = text.Split(new char[] { '.', '!', '?', ',', '(', ')', '\t', '\n', '\r', ' ' },
StringSplitOptions.RemoveEmptyEntries);
//3. Add the counts to Dictionary
foreach (String word in source)
{
if (DictWords.ContainsKey(word))
{
DictWords[word]++;
} else
{
DictWords[word] = 1;
}
}
}
Create a "Contains" method, which is used to check whether the word is present in
the document:
//4. Method will return true /false based on the existence of the key/word.
public bool Contains(string word)
{
if (DictWords.ContainsKey(word))
{
return true;
}
else
{
return false;
}
}
Create an indexer on string for the class to get the desired output to print to the
Console:
//5. Define an indexer on the word.
public string this[string word]
{
get
{
if (DictWords.TryGetValue(word, out int value))
{
return $"Word: {word}, Frequency.:{value}, Length: {word.Length}";
}
return string.Empty;
}
}
Tests :
var document = new Document(@"Text.txt");
if (document.Contains("BEDIENTER") == true)
Console.WriteLine(document["BEDIENTER"]);
else
Console.WriteLine("Word not found!");
//Output
// Word: BEDIENTER, Frequency.:1, Length: 9

Create list of arrays from text file in C#

I have a number of text files that all follow the same content format:
"Title section","Version of the app"
10
"<thing 1>","<thing 2>","<thing 3>","<thing 4>","<thing 5>","<thing 6>","<thing 7>","<thing 8>","<thing 9>","<thing 10>"
'Where:
' first line never changes, it always contains exactly these 2 items
' second line is a count of how many "line 3s" there are
' line 3 contains a command to execute and (up to) 9 parameters
' - there will always be 10 quote-delimited entries, even if some are blank
' - there can be N number of entries (in this example, there will be 10 commands to read)
I am reading each of these text files in, using StreamReader, and want to set each file up in its own class.
public class MyTextFile{
public string[] HeaderLine { get; set; }
public int ItemCount { get; set; }
List<MyCommandLine> Commands { get; set;}
}
public class MyCommandLine{
public string[] MyCommand { get; set; }
}
private void btnGetMyFiles_Click(object sender, EventArgs e){
    DirectoryInfo myFolder = new DirectoryInfo(@"C:\FileSpot");
    FileInfo[] myFiles = myFolder.GetFiles("*.ses");
    string line = "";
    string str = "";
    foreach(FileInfo file in myFiles){
        str = str + ", " + file.Name;
        // Read the file line by line.
        using (System.IO.StreamReader readingFile = new System.IO.StreamReader(file.FullName)){
            MyTextFile myFileObject = new MyTextFile();
            while ((line = readingFile.ReadLine()) != null){
                // create the new MyTextFile here
            }
        }
    }
}
The objective is to determine what the actual command being called is (""), and if any of the remaining parameters point to a pre-existing file, determine if that file exists. My problem is that I can't figure out how to read N number of "line 3" into their own objects and append these objects to the MyTextFile object. I'm 99% certain that I've led myself astray in reading each file line-by-line, but I don't know how to get out of it.
So, addressing the specific issue of getting N number of line 3 items into your class, you could do something like this (obviously you can make some changes so it is more specific to your application).
public class MyTextFile
{
public List<Array> Commands = new List<Array>();
public void EnumerateCommands()
{
for (int i = 0; i < Commands.Count; i++)
{
foreach (var c in Commands[i])
Console.Write(c + " ");
Console.WriteLine();
}
}
}
class Program
{
static void Main(string[] args)
{
string line = "";
int count = 0;
MyTextFile tf = new MyTextFile();
using (StreamReader sr = new StreamReader(@"path"))
{
while ((line = sr.ReadLine()) != null)
{
count += 1;
if (count >= 3)
{
object[] Arguments = line.Split(',');
tf.Commands.Add(Arguments);
}
}
}
tf.EnumerateCommands();
Console.ReadLine();
}
}
At least now you have a list of commands within your 'MyTextFile' class that you can enumerate through and do stuff with.
** I added the EnumerateCommands method so that you could actually see the list is storing the line items. The code should run in a Console application with the appropriate 'using' statements.
Hope this helps.
If all of the items are separated by commas, you can just do something like:
int length = Convert.ToInt32(reader.ReadLine());
string line = reader.ReadLine();
IEnumerable<string> things = line.Split(',').Select(thing => thing.Replace("\"", string.Empty)).Take(length);
Take indicates how many things to take from the line.
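To read the N command lines into their own entries, one option is to let the count on the second line drive a loop. The following is only a sketch under the format described in the question: CommandFile and SplitQuoted are hypothetical names, every entry is assumed to be wrapped in double quotes, and no entry is assumed to contain the sequence "," itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class CommandFile
{
    public string[] HeaderLine { get; set; }
    public int ItemCount { get; set; }
    public List<string[]> Commands { get; } = new List<string[]>();

    // Reads the header line, the count line and then up to ItemCount command lines.
    public static CommandFile Read(TextReader reader)
    {
        string header = reader.ReadLine();
        string count = reader.ReadLine();
        if (header == null || count == null)
            throw new InvalidDataException("The header line or the count line is missing.");

        var file = new CommandFile
        {
            HeaderLine = SplitQuoted(header),
            ItemCount = int.Parse(count.Trim())
        };

        for (int i = 0; i < file.ItemCount; i++)
        {
            string line = reader.ReadLine();
            if (line == null) break; // fewer lines than announced
            file.Commands.Add(SplitQuoted(line));
        }
        return file;
    }

    // Strips the outer quotes and splits on the "," sequence between entries.
    private static string[] SplitQuoted(string line)
    {
        string trimmed = line.Trim();
        if (trimmed.Length >= 2 && trimmed[0] == '"' && trimmed[trimmed.Length - 1] == '"')
            trimmed = trimmed.Substring(1, trimmed.Length - 2);
        return trimmed.Split(new[] { "\",\"" }, StringSplitOptions.None);
    }
}
```

With a StreamReader opened on one of the .ses files, CommandFile.Read(reader) would return an object whose Commands list holds one string array per command line; the first element of each array is the command and the rest are its parameters.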

How to take a CSV field and write to columns in SQL

I have the following code which takes a CSV and writes to a console:
using (CsvReader csv = new CsvReader(
new StreamReader("data.csv"), true))
{
// missing fields will not throw an exception,
// but will instead be treated as if there was a null value
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
// to replace by "" instead, then use the following action:
//csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};",
headers[i],
csv[i] == null ? "MISSING" : csv[i]));
Console.WriteLine();
}
}
The CSV file has 7 headers for which I have 7 columns in my SQL table.
What is the best way to take each csv[i] and write to a row for each column and then move to the next row?
I tried to add the csv[i] to a string array but that didn't work.
I also tried the following:
SqlCommand sql = new SqlCommand("INSERT INTO table1 [" + csv[i] + "]", mysqlconnectionstring);
sql.ExecuteNonQuery();
My table (table1) is like this:
name address city zipcode phone fax device
Your problem is simple, but I will take it one step further and show you a better way to approach the issue.
When you have a problem to solve, always break it down into parts and put each part in its own method. For example, in your case:
1 - read from the file
2 - create a sql query
3 - run the query
You can even add validation of the file (imagine your file does not have 7 fields in one or more lines...). The example below is only suitable if your file never exceeds around 500 lines; if it does, you should consider using a SQL statement that loads your file directly into the database. It's called bulk insert.
1 - read from file:
I would use a List<string[]> to hold the line entries, and I always use StreamReader to read from text files.
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
2 - generate the sql
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line), "'"); // wrap every value in single quotes
this.Query.Add(string.Format(this.LineTemplate, entries));
}
3 - run the query
SqlCommand sql = new SqlCommand(string.Join("", query), mysqlconnectionstring);
sql.ExecuteNonQuery();
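Since concatenating values into the SQL text breaks on values that contain a quote and is open to SQL injection, a parameterized command is usually the safer choice. The sketch below is written against the provider-independent System.Data interfaces. InsertBuilder, BuildInsertSql and InsertRow are illustrative names, and the table and column names are assumed to come from trusted code rather than from the file.

```csharp
using System;
using System.Data;
using System.Linq;

public static class InsertBuilder
{
    // Builds "INSERT INTO [table] ([c0], [c1]) VALUES (@p0, @p1);" for the given columns.
    public static string BuildInsertSql(string table, string[] columns)
    {
        string columnList = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string parameterList = string.Join(", ", columns.Select((c, i) => "@p" + i));
        return "INSERT INTO [" + table + "] (" + columnList + ") VALUES (" + parameterList + ");";
    }

    // Inserts one CSV row; the values travel as parameters, never as SQL text.
    public static void InsertRow(IDbConnection connection, string table, string[] columns, string[] values)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = BuildInsertSql(table, columns);
            for (int i = 0; i < columns.Length; i++)
            {
                IDbDataParameter parameter = command.CreateParameter();
                parameter.ParameterName = "@p" + i;
                parameter.Value = (object)values[i] ?? DBNull.Value; // missing field becomes NULL
                command.Parameters.Add(parameter);
            }
            command.ExecuteNonQuery();
        }
    }
}
```

InsertRow expects an already opened connection, for example a SqlConnection, and would be called once per line read from the CSV file.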
Having some fun, I ended up writing the complete solution. It needs more tweaks, but I will leave those for others. The solution is written in C#, VS 2013.
The ExtractCsvIntoSql class is as follows:
public class ExtractCsvIntoSql
{
private string CsvPath, Separator;
private bool HasHeader;
private List<string[]> Lines;
private List<string> Query;
/// <summary>
/// Header content of the CSV File
/// </summary>
public string[] Header { get; private set; }
/// <summary>
/// Template to be used in each INSERT Query statement
/// </summary>
public string LineTemplate { get; set; }
public ExtractCsvIntoSql(string csvPath, string separator, bool hasHeader = false)
{
this.CsvPath = csvPath;
this.Separator = separator;
this.HasHeader = hasHeader;
this.Lines = new List<string[]>();
this.Query = new List<string>(); // must be initialized before GenerateQuery() adds to it
// you can also set this
this.LineTemplate = "INSERT INTO [table1] VALUES ({0});";
}
/// <summary>
/// Generates the SQL Query
/// </summary>
/// <returns></returns>
public List<string> Generate()
{
if(this.CsvPath == null)
throw new ArgumentException("CSV Path can't be empty");
// extract csv into object
Extract();
// generate sql query
GenerateQuery();
return this.Query;
}
private void Extract()
{
string line;
string[] splittedLine;
int iLine = 0;
try
{
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
}
catch (Exception ex)
{
if(ex.InnerException != null)
while (ex.InnerException != null)
ex = ex.InnerException;
throw ex;
}
// Lines will have all rows and each row, the column entry
}
private void GenerateQuery()
{
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line), "'"); // wrap every value in single quotes
this.Query.Add(string.Format(this.LineTemplate, entries));
}
}
}
and you can run it as:
class Program
{
static void Main(string[] args)
{
string file = Ask("What is the CSV file path? (full path)");
string separator = Ask("What is the current separator? (; or ,)");
var extract = new ExtractCsvIntoSql(file, separator);
var sql = extract.Generate();
Output(sql);
}
private static void Output(IEnumerable<string> sql)
{
foreach(var query in sql)
Console.WriteLine(query);
Console.WriteLine("*******************************************");
Console.Write("END ");
Console.ReadLine();
}
private static string Ask(string question)
{
Console.WriteLine("*******************************************");
Console.WriteLine(question);
Console.Write("= ");
return Console.ReadLine();
}
}
Usually I like to be a bit more generic, so I'll try to explain a very basic flow I use from time to time.
I don't like the hard-coded approach: even if your code works, it will be dedicated to one specific type. I prefer a little reflection, first to work out which DTO a file maps to and then to work out which repository I should use to handle it.
For example:
public class ImportProvider
{
private readonly string _path;
private readonly ObjectResolver _objectResolver;
public ImportProvider(string path)
{
_path = path;
_objectResolver = new ObjectResolver();
}
public void Import()
{
var filePaths = Directory.GetFiles(_path, "*.csv");
foreach (var filePath in filePaths)
{
var fileName = Path.GetFileName(filePath);
var className = fileName.Remove(fileName.Length-4);
using (var reader = new CsvFileReader(filePath))
{
var row = new CsvRow();
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
while (reader.ReadRow(row))
{
var dtoInstance = (DtoBase)_objectResolver.Resolve("DAL.DTO", className + "Dto");
dtoInstance.FillInstance(row.ToArray());
repository.Save(dtoInstance);
}
}
}
}
}
Above is a very basic class responsible for importing the data. Regardless of how this piece of code parses CSV files (CsvFileReader), the important part is that a "CsvRow" is a simple List.
Below is the implementation of the ObjectResolver:
public class ObjectResolver
{
private readonly Assembly _myDal;
public ObjectResolver()
{
_myDal = Assembly.Load("DAL");
}
public object Resolve(string nameSpace, string name)
{
var myLoadClass = _myDal.GetType(nameSpace + "." + name);
return Activator.CreateInstance(myLoadClass);
}
}
The idea is simply to follow a naming convention: in my case, a "Dto" suffix for resolving the instances and a "Dao" suffix for resolving the responsible Dao. The full name of the Dto or the Dao can be taken from the CSV file name or from the header (as you wish).
The next step is filling the Dto; each Dto implements the following simple abstract class:
public abstract class DtoBase
{
public abstract void FillInstance(params string[] parameters);
}
Since each Dto "knows" its structure (just as you knew how to create an appropriate table in the database), it can easily implement the FillInstance method. Here is a simple Dto example:
public class ProductDto : DtoBase
{
public int ProductId { get; set; }
public double Weight { get; set; }
public int FamilyId { get; set; }
public override void FillInstance(params string[] parameters)
{
ProductId = int.Parse(parameters[0]);
Weight = double.Parse(parameters[1]);
FamilyId = int.Parse(parameters[2]);
}
}
After you have your Dto filled with data, you should find the appropriate Dao to handle it,
which basically happens via reflection in this line of the Import() method:
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
In my case the Dao implements an abstract base class - but it's not that relevant to your problem, your DaoBase can be a simple abstract with a single Save() method.
This way you have a dedicated Dao to CRUD your Dtos; each Dao simply knows how to save its relevant Dto. Below is the ProductDao that corresponds to the ProductDto:
public class ProductDao : DaoBase
{
private const string InsertProductQuery = @"SET foreign_key_checks = 0;
Insert into product (productID, weight, familyID)
VALUES (@productId, @weight, @familyId);
SET foreign_key_checks = 1;";
public override void Save(DtoBase dto)
{
var productToSave = dto as ProductDto;
var saveproductCommand = GetDbCommand(InsertProductQuery);
if (productToSave != null)
{
saveproductCommand.Parameters.Add(CreateParameter("@productId", productToSave.ProductId));
saveproductCommand.Parameters.Add(CreateParameter("@weight", productToSave.Weight));
saveproductCommand.Parameters.Add(CreateParameter("@familyId", productToSave.FamilyId));
ExecuteNonQuery(ref saveproductCommand);
}
}
}
Please ignore the CreateParameter() method, since it's an abstraction from the base class. You can just use CreateSqlParameter, CreateDataParameter, etc.
Just note that it's a really naive implementation; you can easily remodel it better, depending on your needs.
From my first impression of your question, I guess you would have a huge number of records (more than a lakh, i.e. 100,000). If so, I would consider SQL bulk copy as an option. If there are fewer records, go ahead with single-record inserts. The reason your insert is not working is that you are not providing all the columns of the table, and there is also a syntax error.
