How to convert tab separated file to CSV file
I have a tab-delimited text file that I have to convert into a CSV file, and all of this must be done in C# code. My text file is very large (about 1.5 GB), so I want the conversion to be fast. Please help.
If your tab-delimited input file does not have any commas as part of the data, then it is a very straightforward find and replace, similar to the other answers here:
var lines = File.ReadAllLines(path);
var csv= lines.Select(row => string.Join(",", row.Split('\t')));
File.WriteAllLines(path, csv);
But if your data has commas, doing this will break your columns: you now have extra commas that are not supposed to be delimiters but will be interpreted as such. How to handle this depends greatly on which application you will use to read the CSV.
A Microsoft Excel compatible CSV is going to have double quotes around fields with commas to make sure they are interpreted as data and not a delimiter. This also means that fields that contain double quotes as data will need special treatment.
I would recommend a similar approach with an extension method.
var input = File.ReadAllLines(path);
var lines = input.Select(row => row.Split('\t'));
lines = lines.Select(row => row.Select(field => field.EscapeCsvField(',', '"')).ToArray());
var csv = lines.Select(row => string.Join(",", row));
File.WriteAllLines(path, csv.ToArray());
And here's the EscapeCsvField extension method:
static class Extension
{
public static String EscapeCsvField(this String source, Char delimiter, Char escapeChar)
{
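// Embedded escape characters are doubled before the field is wrapped in them
if (source.Contains(delimiter) || source.Contains(escapeChar))
return String.Format("{0}{1}{0}", escapeChar, source.Replace(escapeChar.ToString(), new String(escapeChar, 2)));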
return source;
}
}
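To make the quoting rule concrete, here is a minimal sketch of Excel-style field quoting. It assumes the common CSV convention (as described in RFC 4180) that a double quote inside a field is written as two double quotes, and that fields containing line breaks are quoted too. CsvQuoting and QuoteField are hypothetical names used only for illustration.

```csharp
using System;

static class CsvQuoting
{
    // Hypothetical helper: quotes a field only when it contains the
    // delimiter, a double quote, or a line break.
    public static string QuoteField(string field, char delimiter = ',')
    {
        bool needsQuotes = field.IndexOf(delimiter) >= 0
            || field.IndexOf('"') >= 0
            || field.IndexOf('\n') >= 0
            || field.IndexOf('\r') >= 0;
        if (!needsQuotes)
            return field;

        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

With this sketch, a field such as a,b becomes "a,b", while a field without special characters is returned unchanged.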
Also, if the file is large, it might be best not to read the entire file into memory. In that case, I would suggest writing the CSV output to a different file, so that you can use StreamReader and StreamWriter to work with only one line at a time.
var tabPath = path;
var csvPath = Path.Combine(
Path.GetDirectoryName(path),
String.Format("{0}.{1}", Path.GetFileNameWithoutExtension(path), "csv"));
using (var sr = new StreamReader(tabPath))
using (var sw = new StreamWriter(csvPath, false))
{
while (!sr.EndOfStream)
{
var line = sr.ReadLine().Split('\t').Select(field => field.EscapeCsvField(',', '"')).ToArray();
var csv = String.Join(",", line);
sw.WriteLine(csv);
}
}
File.Delete(tabPath);
var csv = File.ReadAllLines("Path").Select(line => line.Replace("\t", ","));
You could simply call
public void ConvertToCSV(string strPath, string strOutput)
{
File.WriteAllLines(strOutput, File.ReadAllLines(strPath).Select(line => line.Replace("\t", ",")));
}
There is a lot of content already on SO for handling .CSV files; please search first or try something.
If the format of your file is strict, you could use string.Split and string.Join:
var lines = File.ReadAllLines(path);
var newLines = lines.Select(l => string.Join(",", l.Split('\t')));
File.WriteAllLines(path, newLines);
I have a text file with several lines and a list of approved characters that can be used. If there are any characters in a line that are not on the approved list, the entire line needs to be deleted.
How can I go about completing this? C# would be the ideal, but Python, PowerShell or JS would be helpful as well.
Example approved characters: abcdefg
Valid: abc
Invalid: abc1
For my program I want the following list of approved characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;.
After sorting the contents I want it to write them back to the file (without the invalid lines).
Here's a program that filters out all lines that contain invalid characters where args[0] is the input file and args[1] is the output file.
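using System.IO;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    public static async Task Main(string[] args)
    {
        const string AllowedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;.";
        // ReadAllLines returns one string per line (ReadAllText would return a single string)
        var lines = File.ReadAllLines(args[0]);
        using StreamWriter outfile = new (args[1]);
        foreach (string line in lines)
            // Keep the line only if every character is on the approved list
            if (line.All(x => AllowedChars.Contains(x)))
                await outfile.WriteLineAsync(line);
    }
}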
You can try using Linq in order to query the file:
using System.Collections.Generic;
using System.IO;
using System.Linq;
...
// HashSet<T> is more efficient than List<T> for Contains: O(1) vs. O(N)
HashSet<char> allowed = new HashSet<char>(
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#^,;."
);
string fileName = @"c:\MyFile.txt";
var clearedLines = File
.ReadLines(fileName)
.Where(line => line.All(letter => allowed.Contains(letter)))
.ToArray(); // Since we have to write back, we have to materialize the data
File.WriteAllLines(fileName, clearedLines);
I have the following code in a text file:
static const char* g_FffectNames[EFFECT_COUNT] = {
"Fade In and Fade Out",
"Halloween Eyes",
"Rainbow Cycles"
};
I can use g_FffectNames[EFFECT_COUNT] as a starting point to search in this big text file. But I need to get the text within the quotes (e.g. Halloween Eyes or Rainbow Cycles).
What is the best way to get that text in C#? I also have to assume that there is more code at the top of this file (before the static const) and toward the bottom (after the };), and that spacing between characters such as = { or }; is optional.
Should I compress all of these lines into one string and start the search or should I use some sort of regex matching to make this easier?
You can use a regular expression to parse the input file:
var input = /* file content */;
var regex = new Regex("^\\s*\"(?<row>[^\"]+)\\s*\"", RegexOptions.Multiline);
var values = regex.Matches(input)
.Cast<Match>()
.Select(m => m.Groups["row"].Value).ToArray(); // .Value yields the captured text as a string
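Since the question says there is other code before and after the array, a pattern that matches any quoted line may also pick up unrelated strings. One possible refinement, sketched below, first isolates the initializer block of the named array and only then collects the quoted strings inside it. EffectNameParser and Extract are hypothetical names, and the sketch assumes the strings contain no escaped quotes and no closing brace followed by a semicolon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class EffectNameParser
{
    // Hypothetical helper: returns the quoted strings inside the initializer
    // of the array called arrayName, or an empty list if it is not found.
    public static List<string> Extract(string source, string arrayName)
    {
        // Match "arrayName [ ... ] = { body } ;" with optional whitespace.
        // Singleline lets '.' span line breaks inside the braces.
        Match block = Regex.Match(
            source,
            Regex.Escape(arrayName) + @"\s*\[[^\]]*\]\s*=\s*\{(?<body>.*?)\}\s*;",
            RegexOptions.Singleline);
        if (!block.Success)
            return new List<string>();

        // Collect every double-quoted string inside the braces.
        return Regex.Matches(block.Groups["body"].Value, "\"(?<name>[^\"]*)\"")
            .Cast<Match>()
            .Select(m => m.Groups["name"].Value)
            .ToList();
    }
}
```

Calling EffectNameParser.Extract(File.ReadAllText(path), "g_FffectNames") would then return only the names from that one array, regardless of the spacing around = { and };.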
I got it working based on this post:
c# search string in txt file
public string readText()
{
string test = string.Empty;
var mytext = File.ReadLines("C:\\temp\\test_search.txt")
.SkipWhile(line => !line.Contains("g_FffectNames[EFFECT_COUNT]"))
.Skip(1)
.TakeWhile(line => !line.Contains("};"));
foreach (var line in mytext)
{
test += line;
}
return test;
}
I'm reading a CSV file and changing the delimiter from a "," to a "|". However, I've noticed that my data (which I have no control over) does not always follow this rule: in certain cases it contains quoted data with a comma in it. How can I best avoid replacing the commas in these exceptions?
For example:
ABSON TE,Wick Lane,"Abson, Pucklechurch",Bristol,Avon,ENGLAND,BS16 9SD,37030,17563,BS0001A1,,
Should be changed to:
ABSON TE|Wick Lane|"Abson, Pucklechurch"|Bristol|Avon|ENGLAND|BS16 9SD|37030|17563|BS0001A1||
The code to read and replace the CSV file is this:
var contents = File.ReadAllText(filePath).Split(new string[] { "\n", "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToArray();
var formattedContents = contents.Select(line => line.Replace(',', '|'));
For anyone else struggling with this, I ended up using the built-in .NET CSV parser (TextFieldParser). See here for more details and an example: http://coding.abel.nu/2012/06/built-in-net-csv-parser/
My specific code:
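// TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace (Microsoft.VisualBasic assembly)
// Create new parser object and set up parameters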
var parser = new TextFieldParser(new StringReader(File.ReadAllText(filePath)))
{
HasFieldsEnclosedInQuotes = true,
Delimiters = new string[] { "," },
TrimWhiteSpace = true
};
var csvSplitList = new List<string>();
// Reads all fields on the current line of the CSV file and returns as a string array
// Joins each field together with new delimiter "|"
while (!parser.EndOfData)
{
csvSplitList.Add(String.Join("|", parser.ReadFields()));
}
// Newline characters added to each line and flattens List<string> into single string
var formattedCsvToSave = String.Join(Environment.NewLine, csvSplitList.Select(x => x));
// Write single string to file
File.WriteAllText(filePathFormatted, formattedCsvToSave);
parser.Close();
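Note that ReadFields returns the field text without its enclosing quotes, so the output of the code above will not contain the quotes shown in the desired result. If the quotes should be kept, one alternative is a small character scan that swaps the delimiter only outside quoted sections. This is a sketch, not the approach used above: DelimiterSwap and ReplaceOutsideQuotes are hypothetical names, and it assumes well-formed input in which quotes are balanced within each line.

```csharp
using System;
using System.Text;

static class DelimiterSwap
{
    // Hypothetical helper: replaces oldDelimiter with newDelimiter, but only
    // where the character sits outside a double-quoted field.
    public static string ReplaceOutsideQuotes(string line, char oldDelimiter, char newDelimiter)
    {
        var result = new StringBuilder(line.Length);
        bool insideQuotes = false;
        foreach (char c in line)
        {
            // Each quote toggles the state; a doubled quote ("") toggles
            // twice, so the state is unchanged afterwards.
            if (c == '"')
                insideQuotes = !insideQuotes;

            result.Append(!insideQuotes && c == oldDelimiter ? newDelimiter : c);
        }
        return result.ToString();
    }
}
```

Applied line by line, for example with File.ReadLines(filePath).Select(l => DelimiterSwap.ReplaceOutsideQuotes(l, ',', '|')), this keeps "Abson, Pucklechurch" intact while the other commas become pipes.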
This question has been asked several times with different inputs so I thought of reposting it with my requirement.
I have a CSV file which contains string fields in the way given below.
idnum,name1, name2,groupid
idnum,name1, name2,groupid
idnum,name1, name2,groupid
example
s001,sahil,payap,gid0
s002,Amir,Khan,gid02
d003,hrithik,roshan,gid03
I have a two-dimensional string array. I want to read the file row by row into this array.
When it is read, it should look like this:
arr[0][0]=s001
arr[0][1]=name1
arr[0][2]=name2
arr[0][3]=gid01
arr[1][0]=s002
arr[1][1]=Amir
arr[1][2]=Khan
arr[1][3]=gid04
There are 40 records in the file, and it should be read until the end of the file.
I need to implement this in C#.
Any code sample or any explanation would be great help.
I have no knowledge in csv file handling so please don't ask what did you try, at least if you could give me a code sample for reading just one string for a variable it would be a great help.
And please don't ask to go for another solution.
Thanks.
The simplest way to read a csv file in the way you suggest is probably:
var rows = File.ReadAllLines("myfile.csv").Select(l => l.Split(',').ToArray()).ToArray();
Then:
Console.WriteLine(rows[0][0]); // Will output s001
Console.WriteLine(rows[0][1]); // Will output sahil
Console.WriteLine(rows[0][2]); // Will output payap
Console.WriteLine(rows[0][3]); // Will output gid0
Console.WriteLine(rows[1][0]); // Will output s002
Console.WriteLine(rows[2][0]); // Will output d003
The file would have to be read in line-wise. Each line would have to be separated using String.Split. Then the resulting strings would have to be trimmed using Trim, and finally would have to be written into the respective columns of the current row. However I totally second the comments above; more convenient would be to use some class or struct called Person and then to parse into a List<Person>.
The reading could be done as follows:
String line = String.Empty;
System.IO.StreamReader file = new System.IO.StreamReader("c:\\file.txt");
while((line = file.ReadLine()) != null)
{
String[] parts_of_line = line.Split(',');
for ( int i = 0; i < parts_of_line.Length; i++ )
parts_of_line[i] = parts_of_line[i].Trim();
// do with the parts of the line whatever you like
}
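The suggestion to parse into a List<Person> could look roughly like the sketch below. The Person class, its property names, and PersonReader are assumptions made for illustration, based on the column order idnum,name1,name2,groupid from the question. The reader accepts any TextReader, so a StreamReader works for a file and a StringReader for a quick test.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type; the property names are assumptions based on
// the column order idnum,name1,name2,groupid from the question.
class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string GroupId { get; set; }
}

static class PersonReader
{
    // Reads comma-separated lines and skips lines with fewer than four fields.
    public static List<Person> ReadPeople(TextReader reader)
    {
        var people = new List<Person>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(',');
            if (parts.Length < 4)
                continue;

            // Trim removes stray spaces such as the one in "name1, name2".
            people.Add(new Person
            {
                Id = parts[0].Trim(),
                FirstName = parts[1].Trim(),
                LastName = parts[2].Trim(),
                GroupId = parts[3].Trim()
            });
        }
        return people;
    }
}
```

For a file, something like using (var reader = new StreamReader("c:\\file.txt")) { var people = PersonReader.ReadPeople(reader); } would fill the list. Like the Split-based code above, this sketch does not handle quoted fields that contain commas.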
You can do that using the CsvHelper library:
const string Csv = @"s001,sahil,payap,gid0
s002,Amir,Khan,gid02
d003,hrithik,roshan,gid03";
var rows = new List<string[]>();
string[] row;
using (var stringReader = new StringReader(Csv))
using (var parser = new CsvParser(stringReader))
while ((row = parser.Read()) != null)
{
rows.Add(row);
}
Console.WriteLine(rows[0][0]); // Will output s001
Console.WriteLine(rows[0][1]); // Will output sahil
Console.WriteLine(rows[0][2]); // Will output payap
Console.WriteLine(rows[0][3]); // Will output gid0
Console.WriteLine(rows[1][0]); // Will output s002
Console.WriteLine(rows[2][0]); // Will output d003
For a working example, check out this .NET fiddle: http://dotnetfiddle.net/PLPXo8
If you want to read directly from file, you can do this:
var rows = new List<string[]>();
string[] row;
using (var parser = new CsvParser(File.OpenText(@"c:\test.csv")))
while ((row = parser.Read()) != null)
{
rows.Add(row);
}
I have a .txt file with a list of 174 different strings. Each string has an unique identifier.
For example:
123|this data is variable|
456|this data is variable|
789|so is this|
etc..
I wish to write a program in C# that will read the .txt file and display only one of the 174 strings if I specify the ID of the string I want. This is because all the data in the file is variable, so only the ID can be used to pull the string. So instead of ending up with the example above, I get just one line,
e.g. just
123|this data is variable|
I seem to be able to write a program that pulls just the ID from the .txt file and not the entire string, or a program that merely reads the whole file and displays it. But I have yet to write one that does exactly what I need. HELP!
The actual strings I get from the .txt file have no '|'; they were just in the example. An example of a real string would be: 0111111(0010101), where the data in the brackets is variable. The brackets don't exist in the real string either.
namespace String_reader
{
    class Program
    {
        static void Main(string[] args)
        {
            String filepath = @"C:\my file name here";
            string line;
            if (File.Exists(filepath))
            {
                StreamReader file = null;
                try
                {
                    file = new StreamReader(filepath);
                    while ((line = file.ReadLine()) != null)
                    {
                        string regMatch = "ID number here"; //this is where it all falls apart.
                        Regex.IsMatch(line, regMatch);
                        Console.WriteLine(line); // When program is run it just displays the whole .txt file
                    }
                }
                finally
                {
                    if (file != null)
                        file.Close();
                }
            }
            Console.ReadLine();
        }
    }
}
Use a Regex. Something along the lines of Regex.Match("|"+inputString+"|",@"\|[ ]*\d+\|(.+?)\|").Groups[1].Value
Oh, I almost forgot; you'll need to substitute the \d+ for the actual index you want. Right now, that'll just get you the first one.
The "|" before and after the input string makes sure both the index and the value are enclosed in a | for all elements, including the first and last. There's ways of doing a Regex without it, but IMHO they just make your regex more complicated, and less readable.
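As a sketch of that substitution, the helper below builds the pattern from a specific ID. IdLookup and FindById are hypothetical names. Two small deviations from the pattern above are assumptions on my part: \s* replaces [ ]* so that line breaks between records are tolerated, and Regex.Escape protects against IDs that contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

static class IdLookup
{
    // Hypothetical helper: returns the text stored after the given ID,
    // or null when the ID is not present in the input.
    public static string FindById(string input, string id)
    {
        // The ID must follow a '|' (plus optional whitespace) and be
        // followed by '|', so "23" does not match inside "123".
        string pattern = @"\|\s*" + Regex.Escape(id) + @"\|(.+?)\|";

        // Wrapping the input in '|' gives the first record a leading
        // separator, as described above.
        Match match = Regex.Match("|" + input + "|", pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

One caveat: if a data value happens to be identical to an ID, this pattern can match the value instead, so it relies on IDs being distinguishable from the data.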
Assuming you have path and id.
Console.WriteLine(File.ReadAllLines(path).Where(l => l.StartsWith(id + "|")).FirstOrDefault());
Use File.ReadAllLines to get a string array of lines, then split each line on the |
You could use Regex.Split method
FileInfo info = new FileInfo("filename.txt");
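// Split the file content into lines first, then split each line on '|'
String[] lines = info.OpenText().ReadToEnd().Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach(String line in lines)
{
int id = Convert.ToInt32(line.Split('|')[0]);
string text = line.Split('|')[1];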
}
Read the data into a string
Split the string on "|"
Read the items 2 by 2: key:value,key:value,...
Add them to a dictionary
Now you can easily find your string with dictionary[key].
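The steps above can be sketched as follows. PipeDictionary and Parse are hypothetical names, and the sketch assumes the file content looks like the example in the question, with each record ending in a '|' followed by a line break.

```csharp
using System;
using System.Collections.Generic;

static class PipeDictionary
{
    // Hypothetical helper: turns "id|text|id|text|..." into a lookup table.
    public static Dictionary<string, string> Parse(string content)
    {
        var result = new Dictionary<string, string>();
        string[] items = content.Split('|');

        // Walk the items two at a time: items[i] is the key and
        // items[i + 1] is the value that belongs to it.
        for (int i = 0; i + 1 < items.Length; i += 2)
        {
            // Trim drops the line break left over from the previous record.
            string key = items[i].Trim();
            if (key.Length > 0)
                result[key] = items[i + 1];
        }
        return result;
    }
}
```

After var dictionary = PipeDictionary.Parse(File.ReadAllText(path)), a lookup such as dictionary["123"] returns the matching text; TryGetValue avoids an exception when the ID is missing.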
First load the whole file into a string.
then try this:
string s = "123|this data is variable| 456|this data is also variable| 789|so is this|";
int index = s.IndexOf("123", 0);
string temp = s.Substring(index,s.Length-index);
string[] splitStr = temp.Split('|');
Console.WriteLine(splitStr[1]);
hope this is what you are looking for.
private static IEnumerable<string> ReadLines(string fspec)
{
using (var reader = new StreamReader(new FileStream(fspec, FileMode.Open, FileAccess.Read, FileShare.Read)))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
var dict = ReadLines("input.txt")
.Select(s =>
{
var split = s.Split("|".ToArray(), 2);
return new {Id = Int32.Parse(split[0]), Text = split[1]};
})
.ToDictionary(kv => kv.Id, kv => kv.Text);
Please note that with .NET 4.0 you don't need the ReadLines function above, because there is File.ReadLines.
You can now work with that as any dictionary:
Console.WriteLine(dict[12]);
Console.WriteLine(dict[999]);
No error handling here, please add your own
You can use the Split method to divide the entire text into parts separated by '|'. Then all even elements will correspond to numbers and all odd elements to strings.
StreamReader sr = new StreamReader(filename);
string text = sr.ReadToEnd();
string[] data = text.Split('|');
Then convert the data elements to numbers and strings, i.e. a List<int> IDs and a List<string> Strs. Find the index of the given ID with idx = IDs.IndexOf(ID) and the corresponding string will be Strs[idx]
List<int> IDs = new List<int>();
List<string> Strs = new List<string>();
for (int i = 0; i < data.Length - 1; i += 2)
{
IDs.Add(int.Parse(data[i])); // int.Parse tolerates the leading line break left by the previous record
Strs.Add(data[i + 1]);
}
int idx = IDs.IndexOf(ID); // we get ID from input; IndexOf returns -1 if the ID is missing
string answer = idx >= 0 ? Strs[idx] : null;