This is the first time I have done any sort of work with flat files.
I need this to be a plain txt file NOT XML.
I have written the following opting for a comma delimited format.
public static void DataTableToFile(string fileLoc, DataTable dt)
{
StringBuilder str = new StringBuilder();
// get the column headers
foreach (DataColumn c in dt.Columns)
{
str.Append(c.ColumnName.ToString() + ",");
}
str.Remove(str.Length-1, 1);
str.AppendLine();
// write the data here
foreach (DataRow dr in dt.Rows)
{
foreach (var field in dr.ItemArray)
{
str.Append(field.ToString() + ",");
}
str.Remove(str.Length-1, 1);
str.AppendLine();
}
try
{
Write(fileLoc, str.ToString());
}
catch (Exception ex)
{
//ToDO:Add error logging
}
}
My question is: Can I do this better or faster?
The call str.Remove(str.Length-1, 1); is there to remove the trailing comma; it was the only way I could think of.
Any suggestions?
Use String.Join together with LINQ's Cast and Select (this needs using System.Linq;):
public static void DataTableToFile(string fileLoc, DataTable dt)
{
StringBuilder str = new StringBuilder();
// get the column headers
str.Append(String.Join(",", dt.Columns.Cast<DataColumn>()
.Select(col => col.ColumnName)) + "\r\n");
// write the data here
dt.Rows.Cast<DataRow>().ToList()
.ForEach(row => str.Append(string.Join(",", row.ItemArray) + "\r\n"));
try
{
Write(fileLoc, str.ToString());
}
catch (Exception ex)
{
//ToDO:Add error logging
}
}
The key point would be: there is no need to construct this in memory with a StringBuilder - you should instead be writing to a file via something like StreamWriter, i.e. via File.CreateText. The API is similar to StringBuilder, but you shouldn't try to remove - instead, don't add - i.e.
bool first = true;
foreach(...blah...) {
if(first) { first = false; }
else { writer.Write(','); }
... write the data ...
}
As another consideration: CSV is not just a case of adding commas. You need to think about quoted text (for data with , in), and multi-line data. Unless the data is very very simple. You might also want to make the format more explicit than just .ToString(), which is very culture-sensitive. The classic example would be large parts of Europe that use , as the decimal character, thus "CSV" often uses a different separator, to avoid having to quote everything. If the choice is available, personally I'd always use TSV instead of CSV - less problematic (in theory, although you still need to handle data with tabs in).
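A minimal sketch of the streaming approach described above, assuming a tab separator and data that contains no tabs or line breaks (quoting is deliberately left out). The names DelimitedWriter and WriteTable are made up for illustration. The separator is written before every field except the first, and values are formatted with the invariant culture so the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;

public static class DelimitedWriter
{
    // Writes the table straight to the writer, one field at a time.
    // The separator goes *before* every field except the first, so there
    // is never a trailing separator to remove afterwards.
    public static void WriteTable(DataTable dt, TextWriter writer, char separator = '\t')
    {
        bool first = true;
        foreach (DataColumn c in dt.Columns)
        {
            if (first) { first = false; } else { writer.Write(separator); }
            writer.Write(c.ColumnName);
        }
        writer.WriteLine();

        foreach (DataRow row in dt.Rows)
        {
            first = true;
            foreach (object field in row.ItemArray)
            {
                if (first) { first = false; } else { writer.Write(separator); }
                // InvariantCulture formats numbers and dates the same way on
                // every machine; DBNull comes out as an empty string.
                writer.Write(Convert.ToString(field, CultureInfo.InvariantCulture));
            }
            writer.WriteLine();
        }
    }
}
```

To write to a file, the writer could come from File.CreateText(fileLoc) inside a using block, so nothing larger than one field is held in memory at a time.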
I need to be able to rename the column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name.
I have tried implementing code from similar posts, but I've only been able to update both columns. Below you'll find the code I have that just renames both columns.
//locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
System.IO.StreamWriter sw = new System.IO.StreamWriter(file1);
foreach(string s in lines)
{
sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed.
Here are the first couple rows of the CSV:
I'm assuming that you only need to update the column header, the actual rows need not be updated.
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
var indexToReplace = columnHeaders
.LastIndexOf("idn_prod");//LastIndex ensures that you pick the second idn_prod
columnHeaders = columnHeaders
.Remove(indexToReplace,textToReplace.Length)
.Insert(indexToReplace, newText);//I'm removing the second idn_prod and replacing it with the updated value.
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
sw.WriteLine(columnHeaders);
foreach (var str in lines.Skip(1))
{
sw.WriteLine(str);
}
sw.Flush();
}
Replace the foreach(string s in lines) loop with a for loop over the line count, and rename only the 2nd column.
I believe the only way to handle this properly is to crack the header line (first string that has column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself.
Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
foreach (string s in lines)
{
if (!headerSeen)
{
// special: this is the header
string [] parts = s.Split('\t'); // char overload; use ',' instead if the file is comma-delimited
int seen = 0;
for (int i = 0; i < parts.Length; i++)
{
if (parts[i] == "idn_prod" && ++seen == 2)
{
// only fix the *second* one seen, as the question asks
parts[i] = "idn_prod1";
break;
}
}
sw.WriteLine( string.Join("\t", parts));
headerSeen = true;
}
else
{
sw.WriteLine( s );
}
}
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and this enters all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.
Good day,
I am having trouble reading CSV files in my ASP.NET project.
It always returns the error "index out of range, cannot find column 6".
Before I go on explaining what I did, here is the code:
string savepath;
HttpPostedFile postedFile = context.Request.Files["Filedata"];
savepath = context.Server.MapPath("files");
string filename = postedFile.FileName;
todelete = savepath + @"\" + filename;
string forex = savepath + @"\" + filename;
postedFile.SaveAs(savepath + @"\" + filename);
DataTable tblcsv = new DataTable();
tblcsv.Columns.Add("latitude");
tblcsv.Columns.Add("longitude");
tblcsv.Columns.Add("mps");
tblcsv.Columns.Add("activity_type");
tblcsv.Columns.Add("date_occured");
tblcsv.Columns.Add("details");
string ReadCSV = File.ReadAllText(forex);
foreach (string csvRow in ReadCSV.Split('\n'))
{
if (!string.IsNullOrEmpty(csvRow))
{
//Adding each row into datatable
tblcsv.Rows.Add();
int count = 0;
foreach (string FileRec in csvRow.Split('-'))
{
tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
count++;
}
}
}
I tried using comma-separated columns, but the text in the file itself contains commas, so I tried the - symbol instead to make sure there are no excess commas in the text file. The same error still pops up.
Am I doing something wrong?
Thank you in advance.
Your CSV file might have more than 6 columns in one or more rows. In that case the split in the inner foreach yields more values than tblcsv has columns, so there is nowhere to assign the extra values.
Try something like this:
foreach (string FileRec in csvRow.Split('-'))
{
if(count > 5)
break; // stop after the sixth field; return would leave the whole method
tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
count++;
}
However it would be better if you check for additional columns before processing and handle the issue.
StringBuilder errors = new StringBuilder(); // collects the rows that have more than 6 fields
foreach (string csvRow in ReadCSV.Split('\n'))
{
if (!string.IsNullOrEmpty(csvRow))
{
// Build the row first; add it to the table only if every field fits
DataRow dr = tblcsv.NewRow();
bool rowIsValid = true;
int count = 0;
foreach (string FileRec in csvRow.Split('-'))
{
try
{
dr[count] = FileRec;
}
catch (IndexOutOfRangeException)
{
errors.AppendLine(csvRow);
rowIsValid = false;
break;
}
count++;
}
if (rowIsValid)
{
tblcsv.Rows.Add(dr);
}
}
}
Now we know which CSV rows cause the error, and the rest are processed successfully. Check whether the rows collected in errors are the input you expect; if not, correct the values in the CSV file.
You can't treat the file as a CSV if the delimiter appears inside a field. In this case you can use a regular expression to extract the first five fields up to the dash, then read the rest of the line as the sixth field. With a regex you can match the entire string and even avoid splitting lines.
Regular expressions are also a lot faster than splits and consume less memory because they don't create temporary strings. That's why they are used extensively to parse log files. The ability to capture fields by name doesn't hurt either.
The following sample parses the entire file and captures each field in a named group. The last field captures everything to the end of the line:
var pattern="^(?<latitude>.*?)-(?<longitude>.*?)-(?<mps>.*?)-(?<activity_type>.*?)-" +
"(?<date_occured>.*?)-(?<details>.*)$";
var regex=new Regex(pattern,RegexOptions.Multiline);
var matches=regex.Matches(ReadCSV); // match against the file contents, not the file path
foreach (Match match in matches)
{
DataRow dr = tblcsv.NewRow();
dr["latitude"]=match.Groups["latitude"].Value;
dr["longitude"]=match.Groups["longitude"].Value;
...
tblcsv.Rows.Add(dr);
}
The (?<latitude>.*?)- pattern captures everything up to the first dash into a group named latitude. The .*? pattern means the matching isn't greedy, i.e. it won't try to capture everything to the end of the line but will stop when the first - is encountered.
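To see the non-greedy groups at work on a single record, here is a small self-contained sketch. The class and method names (DashRecordParser, Parse) and the sample values are made up for illustration. It also assumes that the first five fields never contain a dash themselves, so negative coordinates or dash-formatted dates would need a different pattern. The optional \r before $ is an addition that keeps a Windows line ending out of the last group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashRecordParser
{
    // Five non-greedy groups, each ending at the next dash, followed by a
    // final group that takes the rest of the line (dashes included).
    private static readonly Regex Pattern = new Regex(
        @"^(?<latitude>.*?)-(?<longitude>.*?)-(?<mps>.*?)-(?<activity_type>.*?)-" +
        @"(?<date_occured>.*?)-(?<details>.*?)\r?$",
        RegexOptions.Multiline);

    // Returns the six fields of one record, or null if the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return new[]
        {
            m.Groups["latitude"].Value,
            m.Groups["longitude"].Value,
            m.Groups["mps"].Value,
            m.Groups["activity_type"].Value,
            m.Groups["date_occured"].Value,
            m.Groups["details"].Value
        };
    }
}
```

For the sample line "14.6-121.0-3-patrol-20150102-checkpoint A - north gate", the last element keeps its inner dash, because only the first five dashes act as separators.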
The column names match the field names, which means you can add all fields with a loop:
foreach (Match match in matches)
{
var row = tblcsv.NewRow();
// group names must match the column names exactly
foreach (DataColumn col in tblcsv.Columns)
{
row[col.ColumnName]=match.Groups[col.ColumnName].Value;
}
tblcsv.Rows.Add(row);
}
I am a beginner C# programmer and have a quick question about an application I am building. My process reads in multiple files with the purpose of stripping out specific records based on a 1 or 0 flag in a pipe-delimited field of the text file. It is actually the last delimited field in the record. If it is a 0, I write the record to a temp file (which will later replace the original that I read); if it is anything else, I do not. There are two types of records in the file: a header row, followed by a few supp rows. The header row is the only one that has the flag, so, as you can tell from the code below, if the bool gets set to a good record by the flag being 0, it writes the header record along with all supp records below it until it hits a bad header, in which case it skips writing until the next good one.
However, what I am trying to do now (and would like to know the easiest way to do) is write the header record without the last pipe-delimited field (i.e. the flag). Since it should always be the last 2 characters of the row (for example "0|" or "1|", as the preceding pipe is needed), should it be a string trim on my inputRecord string? Is there an easier way? Is there a way to do a split on the record but not actually include the last field (in this case, field 36)? Any advice would be appreciated. Thank you,
static void Main(string[] args)
{
try
{
string executionDirectory = RemoveFlaggedRecords.Properties.Settings.Default.executionDirectory;
string workDirectory = RemoveFlaggedRecords.Properties.Settings.Default.workingDirectory;
string[] files = Directory.GetFiles(executionDirectory, "FilePrefix*");
foreach (string file in files)
{
string tempFile = Path.Combine(workDirectory,Path.GetFileName(file));
using (StreamReader sr = new StreamReader(file,Encoding.Default))
{
StreamWriter sw = new StreamWriter(tempFile);
string inputRecord = sr.ReadLine();
bool goodRecord = false;
bool isheaderRecord = false;
while (inputRecord != null)
{
string[] fields = inputRecord.Split('|');
if (fields[0].ToString().ToUpper() == "HEADER")
{
goodRecord = Convert.ToInt32(fields[36]) == 0;
isheaderRecord = true;
}
if (goodRecord == true && isheaderRecord == true)
{
// I'm not sure what to do here to write the string without the 36th field***
}
else if (goodRecord == true)
{
sw.WriteLine(inputRecord);
}
inputRecord = sr.ReadLine();
}
sr.Close();
sw.Close();
sw = null;
}
}
string[] newFiles = Directory.GetFiles(workDirectory, "fileprefix*");
foreach (string file in newFiles)
{
string tempFile = Path.Combine(workDirectory, Path.GetFileName(file));
string destFile = Path.Combine(executionDirectory, Path.GetFileName(file));
File.Copy(tempFile, destFile, true);
if (File.Exists(destFile))
{
File.Delete(tempFile);
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
finally
{
// not done
}
}
One way you could do this - if what you want at that point in the code is to always write all but the final element in your string[] - is construct a for loop that terminates before the last item:
for (int i = 0; i < fields.Length - 1; i++)
{
// write your field here
}
This is assuming that you want to write each field individually, and that you want to iterate through fields in the first place. If all you want to do is just write a single string to a single line without using a loop, you could do this:
var truncatedFields = fields.Take(fields.Length - 1);
And then just write the truncatedFields sequence (Take returns an IEnumerable<string>, not a string[], and needs using System.Linq;) as you see fit. One way you could accomplish all this in a single line might look like so:
sw.WriteLine(String.Join("|", fields.Take(fields.Length - 1)));
goodRecord = fields.Last().Trim() == "0";
string outputRecord = inputRecord.Contains("|") ? inputRecord.Substring(0, inputRecord.LastIndexOf("|")) : inputRecord;
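One detail worth checking against the real files: if a header record ends with a pipe, as the "0|" example in the question suggests, then "HEADER|A|B|0|".Split('|') yields five elements and the last one is an empty string, so "the last element" is not the flag. The sketch below works on the string directly instead. FlagStripper and RemoveFlag are hypothetical names, and the record layout (flag as the final field, optionally followed by one terminating pipe) is an assumption taken from the question.

```csharp
using System;

public static class FlagStripper
{
    // Returns the record without its flag field and hands the flag back
    // through the out parameter: "HEADER|A|B|0|" -> "HEADER|A|B|", flag "0".
    public static string RemoveFlag(string record, out string flag)
    {
        // Ignore one terminating pipe, if present.
        string body = record.EndsWith("|", StringComparison.Ordinal)
            ? record.Substring(0, record.Length - 1)
            : record;

        int lastPipe = body.LastIndexOf('|');
        if (lastPipe < 0)
        {
            // No delimiter left, so there is no separate flag field.
            flag = null;
            return record;
        }

        flag = body.Substring(lastPipe + 1);
        // Keep the pipe that precedes the flag, as the question requires.
        return body.Substring(0, lastPipe + 1);
    }
}
```

In the loop from the question, the header branch could then call RemoveFlag on inputRecord, compare the returned flag with "0", and write the returned string with sw.WriteLine.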
I have a list with many lines extracted from a file, and I want to display it in a RichTextBox with this code:
foreach (string s in Dettaglio)
{
txtDettaglio.Text += s + Environment.NewLine;
}
And the definition of Dettaglio is:
System.Collections.Generic.List<string> Dettaglio = new System.Collections.Generic.List<string>();
But it takes a lot of time to complete. Is there another solution, or should I not use a RichTextBox?
Firstly: I'd use AppendText instead of string concatenation:
foreach (string s in Dettaglio)
{
txtDettaglio.AppendText(s);
txtDettaglio.AppendText(Environment.NewLine);
}
It may be faster to use concatenation to avoid calling AppendText twice:
foreach (string s in Dettaglio)
{
txtDettaglio.AppendText(s + Environment.NewLine);
}
Now it could be that that won't actually be any faster, but it's what I'd try to start with - the internal data structure of RichTextBox may need to do work in order to fetch the Text property, and using AppendText you may avoid it having to reparse text that it's already handled.
Maybe using a StringBuilder will be faster:
StringBuilder sb = new StringBuilder();
foreach (string s in Dettaglio)
{
sb.AppendLine(s);
}
txtDettaglio.Text = sb.ToString();
I had a look on the site and on Google, but I couldn't seem to find a good solution to what I'm trying to do.
Basically, I have a client server application (C#) where I send the server an SQL select statement (Connecting to SQL Server 2008) and would like to return results in a CSV manner back to the client.
So far I have the following:
if (sqlDataReader.HasRows)
{
while(sqlDataReader.Read())
{
//not really sure what to put here and if the while should be there!
}
}
Unfortunately, I'm really new to connecting C# with SQL. I need any tips on how to simply put the results in a string in a csv format. The columns and fields are likely to be different so I cannot use the method of something[something] as I've seen in a few sites. I'm not sure if I'm being comprehensible tbh!
I would really appreciate any tips / points on how to go about this please!
Here is a method I use to dump any IDataReader out to a StreamWriter. I generally create the StreamWriter like this: new StreamWriter(Response.OutputStream). I convert any double-quote characters in the input into single-quote characters (maybe not the best way to handle this, but it works for me).
public static void createCsvFile(IDataReader reader, StreamWriter writer) {
string Delimiter = "\"";
string Separator = ",";
// write header row
for (int columnCounter = 0; columnCounter < reader.FieldCount; columnCounter++) {
if (columnCounter > 0) {
writer.Write(Separator);
}
writer.Write(Delimiter + reader.GetName(columnCounter) + Delimiter);
}
writer.WriteLine(string.Empty);
// data loop
while (reader.Read()) {
// column loop
for (int columnCounter = 0; columnCounter < reader.FieldCount; columnCounter++) {
if (columnCounter > 0) {
writer.Write(Separator);
}
writer.Write(Delimiter + reader.GetValue(columnCounter).ToString().Replace('"', '\'') + Delimiter);
} // end of column loop
writer.WriteLine(string.Empty);
} // data loop
writer.Flush();
}
As mentioned, there are quite a few issues with delimiters, escaping characters correctly, and formatting different types correctly. But if you are just looking for an example of putting data into a string, here is yet another one. It does not do any checking for the aforementioned complications.
public static void ReaderToString( IDataReader Reader )
{
while ( Reader.Read() )
{
StringBuilder str = new StringBuilder();
for ( int i = 0; i < Reader.FieldCount; i++ )
{
if ( Reader.IsDBNull( i ) )
str.Append( "null" );
else
str.Append( Reader.GetValue( i ).ToString() );
if ( i < Reader.FieldCount - 1 )
str.Append( ", " );
}
// do something with the string here
Console.WriteLine(str);
}
}
When dealing with CSV files I usually go for the FileHelpers library: it has a SqlServerStorage class which you can use to read records from a SQL server and write them to a CSV file.
You may be able to adapt the implementation of a CSV writer available here.
If you also need to parse CSV files, the implementation here is relatively good.
The CSV format is more complicated than it looks - particularly if you're going to deal with arbitrary data coming back from a query. You would need to be able to handle escaping of special characters (like quotes and commas), dealing with line breaks, and the like. You are better off finding and using a proven implementation - especially if you're new to C#.
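To make the escaping concrete, here is a minimal sketch of the quoting convention described in RFC 4180: a field is wrapped in double quotes when it contains a comma, a double quote, or a line break, and embedded double quotes are doubled. CsvEscaper, Escape and Line are hypothetical names, and the sketch is not a substitute for a proven library; it ignores encodings, culture-specific formatting, and alternative separators.

```csharp
using System;
using System.Linq;

public static class CsvEscaper
{
    // Characters that force a field to be quoted.
    private static readonly char[] Special = { ',', '"', '\r', '\n' };

    // Quotes a single field only when it needs it.
    public static string Escape(string field)
    {
        if (field == null)
        {
            return string.Empty;
        }
        if (field.IndexOfAny(Special) < 0)
        {
            return field; // plain field, no quoting needed
        }
        // Double every embedded quote, then wrap the field in quotes.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-stringified fields into one CSV line (no line terminator).
    public static string Line(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}
```

With a data reader, each value would first be converted to a string and then passed through Escape before being written, instead of replacing double quotes with single quotes.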
You can get the table column names like this:
SqlConnection conn = new SqlConnection(connString);
conn.Open();
SqlCommand cmd = new SqlCommand(sql, conn);
SqlDataReader rdr = cmd.ExecuteReader();
DataTable schema = rdr.GetSchemaTable();
foreach (DataRow row in schema.Rows)
{
foreach (DataColumn col in schema.Columns)
Console.WriteLine(col.ColumnName + " = " + row[col]);
}
rdr.Close();
conn.Close();
Of course, you can determine the column names from the first row only; here it does so on every row.
You can now put your own code to join the columns into a CSV line pretty easily...
Thanks