Need help to develop a C# code to put the header validation - c#

I have a flat file with comma-separated values which needs to be transferred to a DataTable. The values on the first line are header names and will be used as the column names of the DataTable. Before that, I need to check that all required (mandatory) headers are present in the flat file. Please help me write C# code for this header validation.
.
.
.
// getting full file path of uploaded file and read all text
System.IO.StreamReader file = new System.IO.StreamReader(@path);
string line;
while ((line = file.ReadLine()) != null)
{
string[] linetemp = line.Split(new char[] { ',' });
if(tblcsv.Rows.Count==0)
{
foreach (string ColName in linetemp)
{
tblcsv.Columns.Add(ColName); // creating columns with the available header names
}
}
tblcsv.Rows.Add();
.
.
.
// remaining code
For example, if the flat file contains:
datetime,status,Assignee,Reporter,Duration,Col1,Col2,Remarks
1504451523568,Inprogress,ABC,BCD,120,True,B,comments...
1504451523567,Completed,DFG,BCD,120,True,B,comments...
1504451523566,unassigned,VNB,BCD,160,,B,comments...
1504451523565,Inprogress,ERT,FGH,150,True,,comments...
I need to check that the first line contains all mandatory headers (for example datetime, Status, Assignee and Duration).

I tried to implement your requirement using a sample CSV file found online. The CSV file can be found here. The code may not be sophisticated, but I took the simplest route to solve this particular problem.
Below is the short version of the code, which is the part that matters for you.
String firstLine;
var fileStream = new FileStream(@"C:\Users\user\Desktop\AssetsImportCompleteSample.csv", FileMode.Open,
FileAccess.Read);
using(var streamReader = new StreamReader(fileStream, Encoding.UTF8)) {
firstLine = streamReader.ReadLine();
}
var values = firstLine.Split(',');
for (int i = 0; i < values.Length; i++) {
values[i] = values[i].Trim();
}
if (values.Length == 4)
{
int count=0;
IList<string> newList = new List<string> { "MXASSETInterface", "SRM_SaaS_ES", "EN", "AddChange" };
for (int i = 0; i < values.Length; i++)
{
if (newList.Contains(values[i]))
{
count++;
newList.Remove(values[i]);
}
}
if (count == 4)
{
Console.WriteLine("head is correct");
}
else
{
Console.WriteLine("head is incorrect");
}
}
The complete console application is below and can be run directly:
class Program
{
static void Main(string[] args)
{
try
{
String firstLine;
var fileStream = new FileStream(@"C:\Users\user\Desktop\AssetsImportCompleteSample.csv", FileMode.Open,
FileAccess.Read);
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
{
firstLine = streamReader.ReadLine();
}
if (firstLine != null)
{
var values = firstLine.Split(',');
Console.WriteLine(firstLine);
for (int i = 0; i < values.Length; i++)
{
values[i] = values[i].Trim();
Console.WriteLine(values[i]);
}
if (values.Length == 4)
{
int count=0;
IList<string> newList = new List<string> { "MXASSETInterface", "SRM_SaaS_ES", "EN", "AddChange" };
for (int i = 0; i < values.Length; i++)
{
if (newList.Contains(values[i]))
{
count++;
newList.Remove(values[i]);
}
}
if (count == 4)
{
Console.WriteLine("head is correct");
}
else
{
Console.WriteLine("head is incorrect");
}
}
else
{
Console.WriteLine("header is Invalid");
}
}
else
{
Console.WriteLine("header is Invalid");
}
Console.ReadLine();
}
catch (Exception e)
{
Console.WriteLine("Please check if file is available or path is correct: {0}", e.Message);
}
Console.ReadLine();
}
}
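The check above only passes when the file has exactly four columns, while the file in the question has eight columns of which four are mandatory. A minimal sketch of a set-based check follows. It assumes header names should be compared case-insensitively (the sample file has `status` while the requirement lists `Status`). `HeaderValidator` and `FindMissingHeaders` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderValidator
{
    // Returns the required headers that are absent from the header line.
    // Assumption: names are compared ignoring case and surrounding whitespace.
    public static List<string> FindMissingHeaders(string headerLine, IEnumerable<string> requiredHeaders)
    {
        if (headerLine == null)
        {
            // An empty file has no header line, so every required header is missing.
            return requiredHeaders.ToList();
        }

        var present = new HashSet<string>(
            headerLine.Split(',').Select(h => h.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return requiredHeaders.Where(h => !present.Contains(h)).ToList();
    }
}
```

Calling it with the first line of the file and `new[] { "datetime", "Status", "Assignee", "Duration" }` returns an empty list when the header is valid. Otherwise the list names exactly which columns are missing, which can go straight into an error message before any rows are added to the DataTable.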

I suggest using the CsvHelper library for parsing the CSV file. It lets you define a class that represents a row in your file. Header names are matched to property names by default, or they can be mapped using the fluent API.
var csv = new CsvReader(textReader);
var records = csv.GetRecords<MyRow>(); // MyRow is a placeholder for your own row class
GetRecords will fail if some headers are missing.

Related

Fastest way to fuzzy match two csv files

I have written a very simple program using a NuGet package in C# to read in 2 CSV files, fuzzy match them, and output a new CSV file with all the matches. The problem is that I need the program to be able to read files of up to 700k rows and compare them against 100k rows. I haven't been able to find a way to speed up the process. Is there any way I can do this? I will even use another language if need be.
You can ignore all the commented-out code; it is only there from when I was testing. Sorry, I'm a newer programmer.
The ReadCSV function reads in the CSV. The rest is code inside another function, where I pass in the string arrays to run them through the fuzzy match.
static string[] ReadCSV(string path)
{
List<string> name = new List<string>();
List<string> address = new List<string>();
List<string> city = new List<string>();
List<string> state = new List<string>();
List<string> zip = new List<string>();
using (var reader = new StreamReader(path))
{
reader.ReadLine();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
name.Add(values[0] +", "+ values[1]);
//address.Add(values[1]);
//city.Add(values[2]);
//state.Add(values[3]);
//zip.Add(values[4]);
}
}
string[] name1 = name.ToArray();
return name1;
//foreach (var item in name)
//{
// Console.WriteLine(item.ToString());
//}
}
StringBuilder csvcontent = new StringBuilder();
string csvpath = @"C:\Users\bigel\Documents\outputtest.csv";
csvcontent.AppendLine("Name,Address,Match");
//Console.WriteLine("Levenshtein Edit Distance:");
int x = 1;
foreach (var name in string1)
{
for (int i = 0; i < length; i++)
{
int leven = match[i].LevenshteinDistance(name);
//Console.WriteLine(match[i] + "\t{0} against {1}", leven, name);
if (leven <= 7)
{
output[i] = input[i] + ",match";
csvcontent.AppendLine(output[i]);
//Console.WriteLine(match[i] + " " + leven + " against " + name + " is a Match");
//Console.WriteLine(output[i]);
}
else
{
if (i == 500)
{
Console.WriteLine(x);
x++;
}
}
}
}
File.AppendAllText(csvpath, csvcontent.ToString());
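One cheap way to reduce the work in the nested loop is to reject pairs before computing the full distance. The Levenshtein distance between two strings is never smaller than the difference of their lengths, so any pair whose lengths differ by more than the threshold (7 in the code above) cannot match. The sketch below illustrates that idea with a plain two-row implementation. `FuzzyMatch`, `Distance` and `IsWithin` are hypothetical names, and this is not the NuGet package's API.

```csharp
using System;

public static class FuzzyMatch
{
    // Classic Levenshtein distance, keeping only two rows of the table in memory.
    public static int Distance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap the rows so that prev always holds the most recent one.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // True when the distance is at most maxDistance.
    public static bool IsWithin(string a, string b, int maxDistance)
    {
        // The distance is at least the length difference, so skip hopeless pairs early.
        if (Math.Abs(a.Length - b.Length) > maxDistance) return false;
        return Distance(a, b) <= maxDistance;
    }
}
```

How much this helps depends on how varied the name lengths are. Because each name from the first file is compared independently, the outer loop may also be a candidate for `Parallel.ForEach`, provided the matches are collected in a thread-safe way.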

StreamWriter: Starting and ending on a specific line number

I would like some tips and help with the reading/writing part of my C# program.
Situation:
I have to read a CSV file (this part is OK);
If the CSV file name starts with "Load_", I want to write the data from line 2 to the last line to another CSV;
If the CSV file name starts with "RO_", I want to write to 2 different CSVs: one with lines 1 to 4 and the other with line 4 to the last one;
What I have so far is:
public static void ProcessFile(string[] ProcessFile)
{
// Keeps track of your current position within a record
int wCurrLine = 0;
// Number of rows in the file that constitute a record
const int LINES_PER_ROW = 1;
int ctr = 0;
foreach (string filename in ProcessFile)
{
var sbText = new System.Text.StringBuilder(100000);
int stop_line = 0;
int start_line = 0;
// Used for the output name of the file
var dir = Path.GetDirectoryName(filename);
var fileName = Path.GetFileNameWithoutExtension(filename);
var ext = Path.GetExtension(filename);
var folderbefore = Path.GetFullPath(Path.Combine(dir, @"..\"));
var lineCount = File.ReadAllLines(@filename).Length;
string outputname = folderbefore + "output\\" + fileName;
using (StreamReader Reader = new StreamReader(@filename))
{
if (filename.Contains("RO_"))
{
start_line = 1;
stop_line = 5;
}
else
{
start_line = 2;
stop_line = lineCount;
}
ctr = 0;
while (!Reader.EndOfStream && ctr < stop_line)
{
// Add the text
sbText.Append(Reader.ReadLine());
// Increment our current record row counter
wCurrLine++;
// If we have read all of the rows for this record
if (wCurrLine == LINES_PER_ROW)
{
// Add a line to our buffer
sbText.AppendLine();
// And reset our record row count
wCurrLine = 0;
}
ctr++;
} // end of the while
}
int total_lenght = sbText.Length;
// When all of the data has been loaded, write it to the output file in one fell swoop
using (StreamWriter Writer = new StreamWriter(dir + "\\" + "output\\" + fileName + "_out" + ext))
{
Writer.Write(sbText.ToString());
}
} // end of the foreach
} // end of ProcessFile
I was thinking about using IF/ELSE around the "using (StreamWriter Writer = new StreamWriter(dir + "\\" + "output\\" + fileName + "_out" + ext))" part. However, I am not sure how to tell StreamWriter to write only from/to a specific line number.
Any help is welcome! If I am missing some information, please let me know (I am pretty new on Stack Overflow).
Thank you.
The code is way too complicated. Here is a simpler version:
using System.Collections.ObjectModel;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication57
{
class Program
{
static void Main(string[] args)
{
}
public static void ProcessFile(string[] ProcessFile)
{
foreach (string filename in ProcessFile)
{
// Used for the output name of the file
var dir = Path.GetDirectoryName(filename);
var fileName = Path.GetFileNameWithoutExtension(filename);
var ext = Path.GetExtension(filename);
var folderbefore = Path.GetFullPath(Path.Combine(dir, @"..\"));
var lineCount = File.ReadAllLines(@filename).Length;
string outputname = folderbefore + "output\\" + fileName;
using (StreamWriter Writer = new StreamWriter(dir + "\\" + "output\\" + fileName + "_out" + ext))
{
int rowCount = 0;
using (StreamReader Reader = new StreamReader(@filename))
{
string inputLine = "";
while ((inputLine = Reader.ReadLine()) != null)
{
rowCount++; // count each line as it is read
if (filename.Contains("RO_"))
{
if (rowCount <= 4)
{
Writer.WriteLine(inputLine);
}
if (rowCount == 4) break;
}
else
{
if (rowCount >= 2)
{
Writer.WriteLine(inputLine);
}
}
} // end of the while
Writer.Flush();
}
}
} // end of the foreach
} // end of ProcessFile
}
}
You can use LINQ to Take and Skip lines.
public abstract class CsvProcessor
{
private readonly IEnumerable<string> processFiles;
public CsvProcessor(IEnumerable<string> processFiles)
{
this.processFiles = processFiles;
}
protected virtual IEnumerable<string> GetAllLinesFromFile(string fileName)
{
using(var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
using(var reader = new StreamReader(stream))
{
var line = String.Empty;
while((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
protected virtual void ProcessFiles()
{
var sb1 = new StringBuilder();
var sb2 = new StringBuilder();
foreach(var file in this.processFiles)
{
var fileName = Path.GetFileNameWithoutExtension(file);
var lines = GetAllLinesFromFile(file);
if(fileName.StartsWith("RO_", StringComparison.InvariantCultureIgnoreCase))
{
sb1.AppendLine(String.Join(Environment.NewLine, lines.Take(4))); //take only the first four lines
sb2.AppendLine(String.Join(Environment.NewLine, lines.Skip(4).TakeWhile(s => !String.IsNullOrEmpty(s)))); //skip the first four lines, take everything else
}
else if(fileName.StartsWith("Load_", StringComparison.InvariantCultureIgnoreCase))
{
sb2.AppendLine(String.Join(Environment.NewLine, lines.Skip(1).TakeWhile(s => !String.IsNullOrEmpty(s))));
}
}
// now write your StringBuilder objects to file...
}
protected virtual void WriteFile(StringBuilder sb1, StringBuilder sb2)
{
// ... etc..
}
}
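The same Skip/Take idea can also write straight to the output files without an intermediate StringBuilder. The following is a minimal sketch under the assumption that line numbers are 1-based and inclusive, as in the question. `CsvSplitter` and `CopyLineRange` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvSplitter
{
    // Writes lines firstLine..lastLine (1-based, inclusive) of inputPath to outputPath.
    // Pass int.MaxValue as lastLine to copy through the end of the file.
    public static void CopyLineRange(string inputPath, string outputPath, int firstLine, int lastLine)
    {
        // File.ReadLines streams the file lazily, so large inputs are not loaded at once.
        IEnumerable<string> lines = File.ReadLines(inputPath).Skip(firstLine - 1);
        if (lastLine != int.MaxValue)
        {
            lines = lines.Take(lastLine - firstLine + 1);
        }
        File.WriteAllLines(outputPath, lines);
    }
}
```

For a "Load_" file this would be one call with `firstLine` 2 and `lastLine` `int.MaxValue`. For an "RO_" file it would be two calls, one for lines 1 to 4 and one starting at line 5 (or at line 4, if the fourth line is meant to appear in both outputs).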

Replace specific data in a csv file

I am trying to replace a specific data field in my csv file but am having issues.
My csv file is structured like:
user, password, role, id,
1, abc, 2, 3
2, def, 2, 4
3, ghi, 5, 5
I can read the file fine, but I am having issues when I try to replace a password using a textbox and a button in a Windows Form.
private void resetBtn_Click(object sender, EventArgs e)
{
var encoding = Encoding.GetEncoding("iso-8859-1");
var csvLines = File.ReadAllLines("C:\\Users\\hughesa3\\Desktop\\test environment\\users.csv", encoding);
foreach (var line in csvLines)
{
var values = line.Split(',');
if (values[0].Contains(form2value))
{
values[1] = confirmPass.Text;
}
}
}
form2value is the username. What I'm trying to do is: if the first column contains what was entered in form2value, go to the 2nd column of that row.
I have tried this
var values = line.Split(',');
if (values[0].Contains(form2value))
{
MessageBox.Show(values[1]);
values[1] = confirmPass.Text;
MessageBox.Show(values[1]);
}
}
just to see if the value is changing, and it is, but it also displays every values[1] when I only want it displayed if form2value was found.
I tried to explain this as best as I could, but if anyone needs more info please let me know.
Does anybody know what I am doing wrong?
Life would be easier for you if you used a DataTable.
Here is an excerpt.
dt is a DataTable.
Split the first line of your file and use dt.Columns.Add to add the column headings.
private void AddDataToDataTable()
{
using (StreamReader sr = new StreamReader(new MemoryStream(this.FileContents)))
{
//Ignore headings & blank lines
string line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
//If blank line then skip line
if (line == string.Empty)
{
continue;
}
dt.Rows.Add(line.Split(this.Delimeter));
}
}
}
Hope this helps
You're changing the values internal array you use in your code, not the file itself. In fact you're not writing the file anywhere, just reading it.
You'll need to: Read the file, get the line where the username is (if it exists), then write that specific line with the password.
Here's how you can do it:
private void resetBtn_Click(object sender, EventArgs e)
{
var encoding = Encoding.GetEncoding("iso-8859-1");
var csvLines = File.ReadAllLines("C:\\Users\\hughesa3\\Desktop\\test environment\\users.csv", encoding);
for (int i = 0; i < csvLines.Length; i++)
{
var values = csvLines[i].Split(',');
if (values[0].Contains(form2value))
{
values[1] = confirmPass.Text;
using (FileStream stream = new FileStream("C:\\Users\\hughesa3\\Desktop\\test environment\\users.csv", FileMode.Create))
{
using (StreamWriter writer = new StreamWriter(stream, encoding))
{
for (int currentLine = 0; currentLine < csvLines.Length; ++currentLine)
{
if (currentLine == i)
{
writer.WriteLine(string.Join(",", values));
}
else
{
writer.WriteLine(csvLines[currentLine]);
}
}
writer.Close();
}
stream.Close();
}
}
}
}
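The read, modify, write sequence described above can also be sketched more compactly by updating the array in memory and writing it back with `File.WriteAllLines`. This is a sketch under a few assumptions: the first line is a header row, the username must match the first column exactly (with `Contains`, a username of "1" would also match "10" or "21"), and passwords do not contain commas. `UserCsv` and `ResetPassword` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class UserCsv
{
    // Replaces the password (second column) of the row whose first column equals userName.
    // Returns true when a row was updated and the file was rewritten.
    public static bool ResetPassword(string path, string userName, string newPassword, Encoding encoding)
    {
        string[] lines = File.ReadAllLines(path, encoding);

        // Start at 1 to skip the header row (assumption: the file has one).
        for (int i = 1; i < lines.Length; i++)
        {
            string[] values = lines[i].Split(',');
            if (values.Length > 1 && values[0].Trim() == userName)
            {
                // Keep the ", " spacing used in the sample file.
                values[1] = " " + newPassword;
                lines[i] = string.Join(",", values);
                File.WriteAllLines(path, lines, encoding);
                return true;
            }
        }
        return false;
    }
}
```

In the click handler this would be called with the path, `form2value`, `confirmPass.Text` and the same encoding used for reading. The return value tells the form whether the user was found.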

Read file, check correctness of column, write file C#

I need to check certain columns of data to make sure there are no trailing blank spaces. At first I thought it would be very easy, but after attempting it I got stuck.
I know that there should be 6 digits in the column I need to check. If there are fewer, I will reject the row; if there are more, I will trim the blank spaces. After doing that for the entire file, I want to write it back to the file with the same delimiters.
This is my attempt.
Everything seems to be working correctly except for writing the file.
if (File.Exists(filename))
{
using (StreamReader sr = new StreamReader(filename))
{
string lines = sr.ReadLine();
string[] delimit = lines.Split('|');
while (delimit[count] != "COLUMN_DATA_TO_CHANGE")
{
count++;
}
string[] allLines = File.ReadAllLines(@filename);
foreach(string nextLine in allLines.Skip(1)){
string[] tempLine = nextLine.Split('|');
if (tempLine[count].Length == 6)
{
checkColumn(tempLine);
writeFile(tempLine);
}
else if (tempLine[count].Length > 6)
{
tempLine[count] = tempLine[count].Trim();
checkColumn(tempLine);
}
else
{
throw new Exception("Not enough numbers");
}
}
}
}
}
public static void checkColumn(string[] str)
{
for (int i = 0; i < str[count].Length; i++)
{
char[] c = str[count].ToCharArray();
if (!Char.IsDigit(c[i]))
{
throw new Exception("A non-digit is contained in data");
}
}
}
public static void writeFile(string[] str)
{
string temp;
using (StreamWriter sw = new StreamWriter(filename+ "_tmp", false))
{
StringBuilder builder = new StringBuilder();
bool firstColumn = true;
foreach (string value in str)
{
if (!firstColumn)
{
builder.Append('|');
}
if (value.IndexOfAny(new char[] { '"', ',' }) != -1)
{
builder.AppendFormat("\"{0}\"", value.Replace("\"", "\"\""));
}
else
{
builder.Append(value);
}
firstColumn = false;
}
temp = builder.ToString();
sw.WriteLine(temp);
}
}
If there is a better way to go about this, I would love to hear it. Thank you for looking at the question.
edit:
file structure-
country| firstname| lastname| uniqueID (column I am checking)| address| etc
USA|John|Doe|123456 |5 main street|
notice the blank space after the 6
var oldLines = File.ReadAllLines(filePath);
var newLines = oldLines.Select(FixLine).ToArray();
File.WriteAllLines(filePath, newLines);
string FixLine(string oldLine)
{
string fixedLine = ....
return fixedLine;
}
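One possible shape for the line-fixing step is sketched below, using the rules from the question: split on `|`, trim the ID column, and require exactly six digits. The column index parameter and the names `ColumnFixer` and `FixLine` are assumptions for illustration. The question looks the index up from the header row, and that lookup is not repeated here.

```csharp
using System;
using System.Linq;

public static class ColumnFixer
{
    // Trims the field at columnIndex and checks that it is exactly six digits.
    // Returns the line rebuilt with the same '|' delimiter.
    public static string FixLine(string oldLine, int columnIndex)
    {
        string[] fields = oldLine.Split('|');
        string value = fields[columnIndex].Trim();

        if (value.Length != 6 || !value.All(c => char.IsDigit(c)))
        {
            throw new FormatException("Expected six digits but found '" + value + "'");
        }

        fields[columnIndex] = value;
        return string.Join("|", fields);
    }
}
```

With the sample row from the question and index 3, the trailing space after `123456` is removed and the rest of the line is left as it was. The header line should be written through unchanged rather than passed to this method, since its ID column is not numeric.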
The main problem with writing the file is that you're opening the output file for each output line, and you're opening it with append=false, which causes the file to be overwritten every time. A better approach would be to open the output file one time (probably right after validating the input file header).
Another problem is that you're opening the input file a second time with .ReadAllLines(). It would be better to read the existing file one line at a time in a loop.
Consider this modification:
using (StreamWriter sw = new StreamWriter(filename+ "_tmp", false))
{
string nextLine;
while ((nextLine = sr.ReadLine()) != null)
{
string[] tempLine = nextLine.Split('|');
...
writeFile(sw, tempLine);

Matching the name and size of a file

I'm having some trouble integrating two pieces of code. The first checks the size of a file, and the second loops through a SQL database looking for a matching file name. I basically want to check whether it's a new file or whether the file has changed since I last logged some of its data.
This gets the size of each file in the directory
// Make a reference to a directory.
DirectoryInfo di = new DirectoryInfo("C:\\Users");
// Get a reference to each file in that directory.
FileInfo[] fiArr = di.GetFiles();
// Display the names and sizes of the files.
MessageBox.Show(string.Format("The directory {0} contains the following files:", di.Name));
foreach (FileInfo f in fiArr)
MessageBox.Show("The size of " + f.Name + " is " + f.Length + " bytes.");
This code loops until it finds a match or until all entries have been looked through.
try
{
// LINQ query for all files containing the word '.txt'.
var files = from file in
Directory.EnumerateFiles("C:\\Users")
where file.ToLower().Contains(".txt")
select file;
foreach (var file in files)
{
//Get path to HH file
filename = System.IO.Path.GetFileName(file);
tempString = "";
//Keep looking through the database until the database is exhausted or HH is found
while (inc != numberOfSessions && (filename != tempString))
{
sessionRow = sessions.Tables["Sessions"].Rows[inc];
tempString = sessionRow.ItemArray.GetValue(1).ToString();
inc++;
}
Let's say ItemArray.GetValue(2) returns the saved size of a file. How can I most efficiently keep the while loop going if
inc != numberOfSessions && (filename != tempString) && (sessionRow.ItemArray.GetValue(2) == f.length)
Thanks for having a look!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
class Program
{
static void Main(string[] args)
{
var files1 = new List<string>(Directory.GetFiles(args[0],
"*.txt",
SearchOption.AllDirectories));
List<FileData> ListFiles = new List<FileData>();
for (int i = 0; i < files1.Count; i++)
{
FileInfo file = new FileInfo(files1[i]);
FileData _tmpfile = new FileData(file.Name.ToString(), file.Length,
File.GetLastWriteTime(files1[i]).ToString("yyyy-MM-dd H:mm:ss"),
File.GetLastAccessTime(files1[i]).ToString("yyyy-MM-dd H:mm:ss"));
ListFiles.Add(_tmpfile);
}
DataSet sessions = new DataSet();
DataTable dt = sessions.Tables["Sessions"];
for (int i = 0; i < ListFiles.Count; i++)
{
//compares every file in folder to database
FileData _tmp = ListFiles[i];
for (int j = 0; j < dt.Rows.Count; j++)
{
if (_tmp.GSFileName == dt.Rows[j][0].ToString())
{
//put some code here
break;
}
if (_tmp.GSSize == long.Parse(dt.Rows[j][1].ToString()))
{
//put some code here
break;
}
}
}
}
}
public class FileData
{
string FileName = "";
public string GSFileName
{
get { return FileName; }
set { FileName = value; }
}
long Size = 0;
public long GSSize
{
get { return Size; }
set { Size = value; }
}
string DateOfModification = "";
public string GSDateOfModification
{
get { return DateOfModification; }
set { DateOfModification = value; }
}
string DateOfLastAccess = "";
public string GSDateOfLastAccess
{
get { return DateOfLastAccess; }
set { DateOfLastAccess = value; }
}
public FileData(string fn, long si, string dateofmod, string dateofacc)
{
FileName = fn;
Size = si;
DateOfModification = dateofmod;
DateOfLastAccess = dateofacc;
}
}
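If the goal is only to tell new, changed and unchanged files apart, the inner scan over the table can be replaced by a dictionary lookup. The sketch below assumes the saved name and size have already been loaded into a dictionary keyed by file name (for example, built once from the rows of the "Sessions" table). `FileStatus`, `FileChangeDetector` and `Classify` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum FileStatus { New, Changed, Unchanged }

public static class FileChangeDetector
{
    // knownSizes maps a file name to the size recorded the last time it was logged.
    public static FileStatus Classify(string fileName, long currentSize, IDictionary<string, long> knownSizes)
    {
        long savedSize;
        if (!knownSizes.TryGetValue(fileName, out savedSize))
        {
            // No entry with this name, so the file has not been logged before.
            return FileStatus.New;
        }
        return savedSize == currentSize ? FileStatus.Unchanged : FileStatus.Changed;
    }
}
```

Building the dictionary costs one pass over the table, after which each file is classified in constant time instead of by a scan of all sessions. Comparing sizes alone will miss edits that leave the length unchanged, so the last write time may be worth storing as well.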
