I'm reading a CSV file and changing the delimiter from "," to "|". However, I've noticed that my data (which I have no control over) sometimes contains quoted fields with a comma inside them. How can I best avoid replacing the commas in these exceptions?
For example:
ABSON TE,Wick Lane,"Abson, Pucklechurch",Bristol,Avon,ENGLAND,BS16 9SD,37030,17563,BS0001A1,,
Should be changed to:
ABSON TE|Wick Lane|"Abson, Pucklechurch"|Bristol|Avon|ENGLAND|BS16 9SD|37030|17563|BS0001A1||
The code to read and replace the CSV file is this:
var contents = File.ReadAllText(filePath).Split(new string[] { "\n", "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToArray();
var formattedContents = contents.Select(line => line.Replace(',', '|'));
For anyone else struggling with this, I ended up using the built-in .NET CSV parser (TextFieldParser, in the Microsoft.VisualBasic.FileIO namespace). See here for more details and an example: http://coding.abel.nu/2012/06/built-in-net-csv-parser/
My specific code:
// Create new parser object and setup parameters
var parser = new TextFieldParser(new StringReader(File.ReadAllText(filePath)))
{
HasFieldsEnclosedInQuotes = true,
Delimiters = new string[] { "," },
TrimWhiteSpace = true
};
var csvSplitList = new List<string>();
// Reads all fields on the current line of the CSV file and returns as a string array
// Joins each field together with new delimiter "|"
while (!parser.EndOfData)
{
csvSplitList.Add(String.Join("|", parser.ReadFields()));
}
// Joins the lines with newline characters, flattening the List<string> into a single string
var formattedCsvToSave = String.Join(Environment.NewLine, csvSplitList);
// Write single string to file
File.WriteAllText(filePathFormatted, formattedCsvToSave);
parser.Close();
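Note that joining the parsed fields with "|" drops the original quotes, whereas the desired output in the question keeps them. If the quotes should survive unchanged, one alternative is to scan each line and swap only the delimiters that sit outside quotes. The following is a minimal sketch, not the parser's own behaviour: DelimiterSwap and ReplaceOutsideQuotes are hypothetical names, and it assumes that quoted fields never span multiple lines and that the data does not already contain the new delimiter.

```csharp
using System.Text;

static class DelimiterSwap
{
    // Replaces oldDelimiter with newDelimiter only where it appears outside double quotes.
    // Assumption: a quoted field never contains a line break, so each line can be handled alone.
    public static string ReplaceOutsideQuotes(string line, char oldDelimiter, char newDelimiter)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Toggle on every quote; an escaped quote ("") toggles twice,
                // which leaves the in-quotes state unchanged.
                inQuotes = !inQuotes;
                sb.Append(c);
            }
            else if (c == oldDelimiter && !inQuotes)
            {
                sb.Append(newDelimiter);
            }
            else
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

With the question's own code, this would slot in as contents.Select(line => DelimiterSwap.ReplaceOutsideQuotes(line, ',', '|')).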
I want to extract each string between the first "" for each row and create a text file with it.
sample CSV:
number,season,episode,airdate,title,tvmaze link
1,1,1,13 Sep 05,"Pilot","https://www.tvmaze.com/episodes/991/supernatural-1x01-pilot"
2,1,2,20 Sep 05,"Wendigo","https://www.tvmaze.com/episodes/992/supernatural-1x02-wendigo"
3,1,3,27 Sep 05,"Dead in the Water","https://www.tvmaze.com/episodes/993/supernatural-1x03-dead-in-the-water"
4,1,4,04 Oct 05,"Phantom Traveler","https://www.tvmaze.com/episodes/994/supernatural-1x04-phantom-traveler"
5,1,5,11 Oct 05,"Bloody Mary","https://www.tvmaze.com/episodes/995/supernatural-1x05-bloody-mary"
Final result .txt file:
Pilot
Wendigo
Dead in the Water
Phantom Traveler
Bloody Mary
my function:
private void GetEpisodeNamesFromCSV()
{
using (StreamReader sr = new StreamReader(AppDir + "\\list.csv"))
{
string strResult = sr.ReadToEnd();
string[] result = strResult.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
File.WriteAllLines(AppDir + "\\list_generated_" + ShowTitel + ".txt", result);
}
}
I can't figure out how to properly Split the stream reader object, to only get the names on each Line. I'm very new to programming, and this site helped me immensely! But this problem is specific, and I couldn't find the answer myself. I appreciate any help.
EDIT:
I went with the CsvHelper solution suggested by @Jesús López:
// Create a List
List<string> episodeNames = new List<string>();
// Make sure there are no empty lines in the CSV file
var lines = File.ReadAllLines(AppDir + "\\list.csv").Where(arg => !string.IsNullOrWhiteSpace(arg));
File.WriteAllLines(AppDir + "\\list.csv", lines);
// Open the file stream
var streamReader = File.OpenText(AppDir + "\\list.csv");
var csv = new CsvReader(streamReader, CultureInfo.InvariantCulture);
// Read the File
csv.Read();
// Read the Header
csv.ReadHeader();
// Create a string array with Header
string[] header = csv.Context.Reader.HeaderRecord;
// Select the column and get the Index
var columnExtracted = "title";
int extractedIndex = Array.IndexOf(header, columnExtracted);
// Read the file and fill the List
while (csv.Read())
{
string[] row = csv.Context.Reader.Parser.Record;
string column = row[extractedIndex];
episodeNames.Add(column);
}
// Convert the List to a string array
string[] result = episodeNames.ToArray();
//write the array to a text file
File.WriteAllLines(AppDir + "\\list.txt", result);
This is not so much help on StreamReader as it is on strings.
If you are confident of the file layout and format as shown (and that it will be consistent), try this quick-and-dirty approach in a Console app:
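string line;
// Read in the loop condition so that `continue` still advances to the next line
while ((line = sr.ReadLine()) != null)
{
    // Skip blank lines
    if (line.Trim() == string.Empty) continue;
    var lineEntries = line.Split(',', StringSplitOptions.RemoveEmptyEntries);
    // The title is the fifth column; strip the surrounding quotes
    Console.WriteLine(lineEntries[4].Trim('"'));
}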
Note that I offer this because of your statement "I am very new to programming", to show off the methods string.Split() and .Trim() (also check out .Join()) and how easy they make the basic logic of what you want to achieve.
Using a proper CSV reader is the best idea for a robust solution (plus data-integrity checking, exception handling etc), but there is a reciprocal danger of over-engineering, so if this code displays what you want/expect for a once-off learning experience, then go ahead and implement;-)
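Since the question asks for the text between the first pair of quotes on each row, another quick option is a regular expression. This is only a sketch under the same assumption as above (the layout stays as shown, and titles contain no embedded quotes); QuotedText and FirstQuoted are hypothetical names chosen for illustration.

```csharp
using System.Text.RegularExpressions;

static class QuotedText
{
    // Returns the text inside the first pair of double quotes on the line,
    // or null if the line has no quoted section (for example, the header row).
    public static string FirstQuoted(string line)
    {
        Match m = Regex.Match(line, "\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Applied to each line from File.ReadAllLines, with the null results filtered out, this yields the episode names ready for File.WriteAllLines.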
Hi, my application reads a CSV file that always has the same format, and I need it to create a CSV file with different formatting. Reading and writing the CSV file is not the issue; the problem is getting the amount values, because these are formatted with a "," in the CSV file (ex: 4,500). As a result, they are being split when writing to the CSV file.
Ex: From the line below, how can I get the full numbers, i.e. 2241.84 and 1072809.33?
line = "\"02 MAY 18\",\"TTEWTWTE\",\"GRHGWHWH\",\"02 MAY 18\",\"2,241.84\",\"\",\"1,072,809.33\""
This is how I am reading from CSV file.
openFileDialog1.ShowDialog();
var reader = new StreamReader(File.OpenRead(openFileDialog1.FileName));
List<string> searchList = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
searchList.Add(line);
}
So far I have tried the code below, which gets \"2,241.84\" (which is correct), but when writing to the CSV file I only get 2.
searchList[2].Split(',')[1].Replace("\"", "")
Let me visualize contents in another way:
"
\"02 MAY 18\",
\"TTEWTWTE\",
\"GRHGWHWH\",
\"02 MAY 18\",
\"2,241.84\",
\"\",
\"1,072,809.33\"
"
It seems that your separator is "," (quote, comma, quote) rather than a bare comma. Change searchList[2].Split(',')[1].Replace("\"", "") to searchList[2].Split(new string[] { "\",\"" }, StringSplitOptions.None).
In your case you can use this:
var result = searchList[2].Split(new string[] { "\",\"" }, StringSplitOptions.None)[4].Replace("\"", "");
Split your string with "," separator, instead of ,.
I don't know why you are using static numbers for indexes, but I will assume it's for test purposes.
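To get from the split fields to the full numbers the question asks for, the thousands separators still have to be handled. A minimal sketch, assuming every field is wrapped in quotes and the amounts use "," for thousands and "." for decimals; AmountParser, SplitQuotedLine, and ParseAmount are hypothetical helper names.

```csharp
using System;
using System.Globalization;

static class AmountParser
{
    // Splits a line whose fields are all quoted on the "," separator.
    // The first and last pieces keep one stray quote, which ParseAmount trims.
    public static string[] SplitQuotedLine(string line)
    {
        return line.Split(new string[] { "\",\"" }, StringSplitOptions.None);
    }

    // Parses an amount such as 2,241.84 into a decimal, ignoring surrounding quotes.
    public static decimal ParseAmount(string field)
    {
        return decimal.Parse(
            field.Trim('"'),
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);
    }
}
```

For the sample line, ParseAmount(SplitQuotedLine(line)[4]) gives 2241.84 and index 6 gives 1072809.33. When writing the value back out, format it with CultureInfo.InvariantCulture so that no thousands separator is reintroduced.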
I want to add data to a text file based on a specific output: the program reads an XML file and writes a certain line to a text file. If the data has already been written to the text file, I don't want to write it again.
Code:
public void output(string folder)
{
string S = "Data" + DateTime.Now.ToString("yyyyMMddHHmm") + ".xml";
//Trades.Save(S);
string path = Path.Combine(folder, S);
Console.WriteLine(path);
XDocument f = new XDocument(Trades);
f.Save(path);
string[] lines = File.ReadAllLines(path);
File.WriteAllLines(path, lines);
using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"H:\Test" + DateTime.Now.ToString("yyMMdd") + ".txt", true))
{
foreach (string line in lines)
{
if (line.Contains("CertainData"))
{
file.WriteLine(line);
if (File.ReadAllLines(path).Any(x => x.Equals(line)))
{
}
else
{
string[] tradeRefLines = File.ReadAllLines(path);
File.WriteAllLines(path, tradeRefLines); ;
}
}
}
}
}
The problem is that it still writes the line even if exactly the same data is already there. I don't want duplicate lines.
Any advice?
CLARIFICATION UPDATE
The "CertainData" is a reference number
I have a bunch of files with data in them, and the piece I want to separate out and put into a text file is the "CertainData" field, which holds a reference number.
Sometimes the files I get sent contain the same formatted information, with "CertainData" appearing in there for reference.
When I run this program, if the text file already contains the "CertainData" reference number, I don't want it to be written again.
If you need any more clarification, let me know and I will update the post.
I think you want this: read all lines, filter out those containing a keyword and write it to a new file.
var lines = File.ReadAllLines(path).ToList();
var filteredLines = lines.Where(line => !line.Contains("CertainData"));
File.WriteAllLines(path, filteredLines);
If you also want to remove duplicate lines, you can add a distinct like this:
filteredLines = filteredLines.Distinct();
Why don't you use Distinct before the for loop? This will filter your lines before they are written to the file.
Try something like this
string[] lines = new string[] { "a", "b", "c", "a" };
string[] filterLines = lines.Distinct().ToArray<string>();
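Distinct only removes duplicates within the lines currently in memory. The clarification in the question is about lines that already exist in the text file from an earlier run, so the existing file contents need to be checked as well. A minimal sketch of that idea, assuming the target file is small enough to load fully; UniqueAppender and AppendNewLines are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class UniqueAppender
{
    // Appends to targetPath only those candidate lines that are not already in the file.
    // Returns the number of lines actually written.
    public static int AppendNewLines(string targetPath, IEnumerable<string> candidates)
    {
        // Start from the lines already on disk, if the file exists
        var seen = File.Exists(targetPath)
            ? new HashSet<string>(File.ReadAllLines(targetPath))
            : new HashSet<string>();

        // HashSet.Add returns false for a value it already holds,
        // so this also drops duplicates within the candidates themselves
        var fresh = candidates.Where(line => seen.Add(line)).ToList();

        File.AppendAllLines(targetPath, fresh);
        return fresh.Count;
    }
}
```

In the question's method, this could be called with lines.Where(l => l.Contains("CertainData")) as the candidates. The comparison is on whole lines, so if only the reference number should be compared, it would need to be extracted from each line first.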
I am using Microsoft.VisualBasic.FileIO.TextFieldParser to read a CSV file, edit it, then parse it.
The problem is the quotes are not being kept after parsing.
I tried using parser.HasFieldsEnclosedInQuotes = true; but it does not seem to keep the quotes for some reason.
This breaks when a quoted field contains a comma, for example:
Before
"some, field"
After
some, field
As two separate fields
Here is my method
public static void CleanStaffFile()
{
String path = @"C:\file.csv";
String dpath = String.Format(@"C:\file_{0}.csv",DateTime.Now.ToString("MMddyyHHmmss"));
List<String> lines = new List<String>();
if (File.Exists(path))
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.HasFieldsEnclosedInQuotes = true;
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
if ((parts[12] != "") && (parts[12] != "*,116"))
{
parts[12] = parts[12].Substring(0, 3);
}
else
{
parts[12] = "0";
}
lines.Add(string.Join(",", parts));
}
}
using (StreamWriter writer = new StreamWriter(dpath, false))
{
foreach (String line in lines)
writer.WriteLine(line);
}
}
MessageBox.Show("CSV file successfully processed :\n");
}
So you want to have quotes after you have modified it at string.Join(",", parts)? Then it's easy since only fields which contain the separator were wrapped in quotes before. Just add them again before the String.Join.
So before (and desired):
"some, field"
after(not desired):
some, field
This should work:
string[] fields = parser.ReadFields();
// insert your logic here ....
var newFields = fields
.Select(f => f.Contains(",") ? string.Format("\"{0}\"", f) : f);
lines.Add(string.Join(",", newFields));
Edit
I would like to keep quotes even if doesn't contain a comma
Then it's even easier:
var newFields = fields.Select(f => string.Format("\"{0}\"", f));
The TextFieldParser.HasFieldsEnclosedInQuotes property is used as follows, from the MSDN page:
If the property is True, the parser assumes that fields are enclosed in quotation marks (" ") and may contain line endings.
If a field is enclosed in quotation marks, for example, abc, "field2a,field2b", field3 and this property is True, then all text enclosed in quotation marks will be returned as is; this example would return abc|field2a,field2b|field3. Setting this property to False would make this example return abc|"field2a|field2b"|field3.
The quotes will indicate the start and end of a field, which may then contain the character(s) used to normally separate fields. If your data itself has quotes, you need to set HasFieldsEnclosedInQuotes to false.
If your data fields can contain both separators and quotes, you will need to start escaping quotes before parsing, which is a problem. Basically, you're going beyond the capabilities of a simple CSV file.
I have a tab-delimited text file which I have to convert into a CSV file, and all of this must be done in C# code. My txt file is very large (about 1.5 GB), so I want to convert it quickly. Please help me.
If your input tab-delimited text file does not have any commas as part of the data, then it is a very straightforward find and replace, similar to the other answers here:
var lines = File.ReadAllLines(path);
var csv= lines.Select(row => string.Join(",", row.Split('\t')));
File.WriteAllLines(path, csv);
But if your data has commas, doing this is going to break your columns, as you now have extra commas that are not supposed to be delimiters but will be interpreted as such. How to handle this depends greatly on which application you will be using to read the CSV.
A Microsoft Excel compatible CSV is going to have double quotes around fields with commas to make sure they are interpreted as data and not a delimiter. This also means that fields that contain double quotes as data will need special treatment.
I would recommend a similar approach with an extension method.
var input = File.ReadAllLines(path);
var lines = input.Select(row => row.Split('\t'));
lines = lines.Select(row => row.Select(field => field.EscapeCsvField(',', '"')).ToArray());
var csv = lines.Select(row => string.Join(",", row));
File.WriteAllLines(path, csv.ToArray());
And here's the EscapeCsvField extension method:
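static class Extension
{
    public static String EscapeCsvField(this String source, Char delimiter, Char escapeChar)
    {
        // Quote the field only if it contains the delimiter or the quote character
        if (source.IndexOf(delimiter) >= 0 || source.IndexOf(escapeChar) >= 0)
        {
            // Double any embedded quote characters so they are read back as data
            var escaped = source.Replace(escapeChar.ToString(), new String(escapeChar, 2));
            return String.Format("{0}{1}{0}", escapeChar, escaped);
        }
        return source;
    }
}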
Also, if the file is large, it might be best to not read the entire file into memory. In that case, I would suggest writing the CSV output to a different file and then you could use StreamReader and StreamWriter to only work with it 1 line at a time.
var tabPath = path;
var csvPath = Path.Combine(
Path.GetDirectoryName(path),
String.Format("{0}.{1}", Path.GetFileNameWithoutExtension(path), "csv"));
using (var sr = new StreamReader(tabPath))
using (var sw = new StreamWriter(csvPath, false))
{
while (!sr.EndOfStream)
{
var line = sr.ReadLine().Split('\t').Select(field => field.EscapeCsvField(',', '"')).ToArray();
var csv = String.Join(",", line);
sw.WriteLine(csv);
}
}
File.Delete(tabPath);
var csv = File.ReadAllLines("Path").Select(line => line.Replace("\t", ","));
You could simply call
public void ConvertToCSV(string strPath, string strOutput)
{
File.WriteAllLines(strOutput, File.ReadAllLines(strPath).Select(line => line.Replace("\t", ",")));
}
There is a lot of content already on SO for handling .CSV files; please search first or try something yourself.
If the format of your file is strict, you could use string.Split and string.Join:
var lines = File.ReadAllLines(path);
var newLines = lines.Select(l => string.Join(",", l.Split('\t')));
File.WriteAllLines(path, newLines);