I'm having a tough time getting a small application to work faster. I'm not a developer, and it took me some time to get this working as is. Can anyone offer suggestions or alternate code to speed this process up? It's taking about 1 hour to process 10 MB of the input file.
The code is listed below, and here is an example of the input file:
4401,imei:0000000000,2012-09-01 12:12:12.9999
using System;
using System.Globalization;
using System.IO;

class Sample
{
    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            return;
        }

        using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            using (StreamReader streamReader = new StreamReader(stream))
            {
                System.Text.StringBuilder builder = new System.Text.StringBuilder();
                while (!streamReader.EndOfStream)
                {
                    var line = streamReader.ReadLine();
                    var values = line.Split(',');
                    DateTime dt = new DateTime();
                    DateTime.TryParse(values[2], out dt);
                    values[2] = Convert.ToString(dt.Ticks);
                    string[] output = new string[values.Length];
                    bool firstColumn = true;
                    for (int index = 0; index < values.Length; index++)
                    {
                        if (!firstColumn)
                            builder.Append(',');
                        builder.Append(values[index]);
                        firstColumn = false;
                    }
                    File.WriteAllText(args[1], builder.AppendLine().ToString());
                }
            }
        }
    }
}
The biggest performance hit is that every time a line is read, the entire output accumulated so far is written back to disk. For a quick win, move the File.WriteAllText call out of the loop and write the StringBuilder's contents once at the end:
System.Text.StringBuilder builder = new System.Text.StringBuilder();
using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader streamReader = new StreamReader(stream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            bool firstColumn = true;
            for (int index = 0; index < values.Length; index++)
            {
                if (!firstColumn)
                    builder.Append(',');
                builder.Append(values[index]);
                firstColumn = false;
            }
            builder.AppendLine();
        }
    }
}
File.WriteAllText(args[1], builder.ToString());
If you want to refactor further, replace the manual comma-separating loop with string.Join:
System.Text.StringBuilder builder = new System.Text.StringBuilder();
using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader streamReader = new StreamReader(stream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            builder.AppendLine(string.Join(",", values));
        }
    }
}
File.WriteAllText(args[1], builder.ToString());
Edit: To avoid the memory usage, drop the StringBuilder and write through a second FileStream as you go. Your proposed solution (building up a List) will still use a substantial amount of memory and will likely break on larger files:
using (FileStream input = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (FileStream output = File.Create(args[1]))
{
    using (StreamReader streamReader = new StreamReader(input))
    using (StreamWriter streamWriter = new StreamWriter(output))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            streamWriter.WriteLine(string.Join(",", values));
        }
    }
}
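If parsing is still a bottleneck after that, DateTime.TryParseExact with a fixed format string avoids the culture-sensitive format probing that DateTime.TryParse does. A minimal sketch, assuming every timestamp looks exactly like the sample line (yyyy-MM-dd HH:mm:ss.ffff):

DateTime dt;
// Format assumed from the sample row: 2012-09-01 12:12:12.9999
if (DateTime.TryParseExact(values[2], "yyyy-MM-dd HH:mm:ss.ffff",
        CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
{
    values[2] = dt.Ticks.ToString();
}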
Here is what I found fixes this and handles the large files.
Thanks to @Muzz and @Vache for the assistance.
string line = "";
System.IO.StreamReader file = new System.IO.StreamReader("c:/test.txt");
List<string> convertedLines = new List<string>();

while ((line = file.ReadLine()) != null)
{
    string[] lineSplit = line.Split(',');
    DateTime dt = new DateTime();
    DateTime.TryParse(lineSplit[2], out dt);
    lineSplit[2] = Convert.ToString(dt.Ticks);
    string convertedline = lineSplit[0] + "," + lineSplit[1] + "," + lineSplit[2];
    convertedLines.Add(convertedline);
}
file.Close();

File.WriteAllLines("c:/newTest.txt", convertedLines);
I was trying to read a CSV file in C#.
I tried the File.ReadAllLines(path).Select(a => a.Split(';')) approach, but it doesn't work when a cell contains an embedded \n (a multi-line value).
So I tried the following:
using LumenWorks.Framework.IO.Csv;

var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
    csvTable.Load(csvReader);
}

for (int i = 0; i < csvTable.Rows.Count; i++)
{
    if (!(csvTable.Rows[i][0] is DBNull))
    {
        var row1 = csvTable.Rows[i][0];
    }
    if (!(csvTable.Rows[i][1] is DBNull))
    {
        var row2 = csvTable.Rows[i][1];
    }
}
The issue is that the above code throws an exception:
The CSV appears to be corrupt near record '0' field '5 at position '63'
This is because the CSV's header row contains doubled double quotes, like this:
"Header1",""Header2""
Is there a way to ignore the extra double quotes and still process these CSVs?
Update
I have tried TextFieldParser as below:
public static void GetCSVData()
{
    using (var parser = new TextFieldParser(path))
    {
        parser.HasFieldsEnclosedInQuotes = false;
        parser.Delimiters = new[] { "," };
        while (parser.PeekChars(1) != null)
        {
            string[] fields = parser.ReadFields();
            foreach (var field in fields)
            {
                Console.Write(field + " ");
            }
            Console.WriteLine(Environment.NewLine);
        }
    }
}
Any help is appreciated.
Hope this works!
First, replace the doubled double quotes in the CSV, as below:
using (FileStream fs = new FileStream(Path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
    StreamReader sr = new StreamReader(fs);
    string contents = sr.ReadToEnd();

    // replace "" with "
    contents = contents.Replace("\"\"", "\"");

    // go back to the beginning of the stream
    fs.Seek(0, SeekOrigin.Begin);

    // adjust the length to make sure all the original
    // contents are overwritten (note: contents.Length counts chars,
    // so this assumes a single-byte encoding)
    fs.SetLength(contents.Length);

    StreamWriter sw = new StreamWriter(fs);
    sw.Write(contents);
    sw.Close();
}
Then load it with the same CSV helper as before:
using LumenWorks.Framework.IO.Csv;

var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
    csvTable.Load(csvReader);
}
Thanks.
I'm trying to copy the contents of one Excel file to another while replacing a string inside the file during the copy. It's working for the most part, but the output file is losing 27 KB of data. Any suggestions?
public void ReplaceString(string what, string with, string path)
{
    List<string> doneContents = new List<string>();
    List<string> doneNames = new List<string>();

    using (ZipArchive archive = ZipFile.Open(_path, ZipArchiveMode.Read))
    {
        int count = archive.Entries.Count;
        for (int i = 0; i < count; i++)
        {
            ZipArchiveEntry entry = archive.Entries[i];
            using (var entryStream = entry.Open())
            using (StreamReader reader = new StreamReader(entryStream))
            {
                string txt = reader.ReadToEnd();
                if (txt.Contains(what))
                {
                    txt = txt.Replace(what, with);
                }
                doneContents.Add(txt);
                string name = entry.FullName;
                doneNames.Add(name);
            }
        }
    }

    using (MemoryStream zipStream = new MemoryStream())
    {
        using (ZipArchive newArchive = new ZipArchive(zipStream, ZipArchiveMode.Create, true, Encoding.UTF8))
        {
            for (int i = 0; i < doneContents.Count; i++)
            {
                int spot = i;
                ZipArchiveEntry entry = newArchive.CreateEntry(doneNames[spot]);
                using (var entryStream = entry.Open())
                using (var sw = new StreamWriter(entryStream))
                {
                    sw.Write(doneContents[spot]);
                }
            }
        }

        using (var fileStream = new FileStream(path, FileMode.Create))
        {
            zipStream.Seek(0, SeekOrigin.Begin);
            zipStream.CopyTo(fileStream);
        }
    }
}
I've used Microsoft's DocumentFormat.OpenXml and Excel Interop; however, they are both lacking a few main components that I need.
Update:
using (var fileStream = new FileStream(path, FileMode.Create))
{
    var wrapper = new StreamWriter(fileStream);
    wrapper.AutoFlush = true;
    zipStream.Seek(0, SeekOrigin.Begin);
    zipStream.CopyTo(wrapper.BaseStream);
    wrapper.Flush();
    wrapper.Close();
}
Try the process without changing the string and see if the file size is the same. If it is, then your copy is working correctly; however, as Marc B suggested, with compression even a small change in content can produce a larger change in the overall size.
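For a quick sanity check along those lines, something like this (file names hypothetical, and assuming _path already points at the source workbook) runs the copy with a no-op replacement and compares the sizes:

// No-op replacement: "same" -> "same" leaves every entry's text unchanged,
// so any size difference comes from the re-zipping itself.
ReplaceString("same", "same", @"C:\book-copy.xlsx");
long originalSize = new FileInfo(@"C:\book.xlsx").Length;      // hypothetical source
long copySize = new FileInfo(@"C:\book-copy.xlsx").Length;     // hypothetical destination
Console.WriteLine("original: {0} bytes, copy: {1} bytes", originalSize, copySize);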
I wrote a program that splits text files into smaller pieces, but my method is slow. The file is about 1 GB, and I use a variable named "pageSize" that determines how many lines go into each split file.
Is the foreach the bottleneck? Is there a better way?
private void button1_Click(object sender, EventArgs e)
{
    string inputFile = @"G:\Programming\C#\c# tamrin reza\large-Orders.txt";
    int seed = 1000;
    const int pageSize = 5000;
    int count = 1;
    const string destinationFileName = @"F:\Output\";
    string outputFile;
    string baseName = "-CustomerApp";
    string extension = Path.GetExtension(inputFile);
    var lst = new List<string>();

    //FileInfo fileInfo = new FileInfo(inputFile);
    //long fileSize = fileInfo.Length / pageSize;

    FileStream fs = new FileStream(inputFile, FileMode.Open);
    StreamReader sr = new StreamReader(fs);
    while (!sr.EndOfStream)
    {
        for (int j = 1; j <= pageSize; j++)
        {
            var line = sr.ReadLine();
            lst.Add(line);
        }
        outputFile = destinationFileName + count + baseName + extension;
        CopyLines(lst, outputFile);
        lst.Clear();
        count++;
    }
}

private void CopyLines(List<string> line, string outputFile)
{
    FileStream outputFileStream = new FileStream(outputFile, FileMode.Create, FileAccess.Write);
    //StreamWriter writer = new StreamWriter(outputFile);
    //for (int i = 1; i < line.Count; i++)
    //{
    //}
    using (StreamWriter sw = new StreamWriter(outputFileStream))
    {
        foreach (var li in line)
        {
            sw.WriteLine(li);
        }
    }
}
Thanks
You are iterating over the entire collection twice: once to read it into the list and once to write it out. If you write to the output file as you read, you save a full pass (and the intermediate list):
while (!sr.EndOfStream)
{
    outputFile = destinationFileName + count + baseName + extension;
    FileStream outputFileStream = new FileStream(outputFile, FileMode.Create, FileAccess.Write);
    using (StreamWriter sw = new StreamWriter(outputFileStream))
    {
        for (int j = 1; j <= pageSize; j++)
        {
            var line = sr.ReadLine();
            if (line == null) // the last page can be shorter than pageSize
                break;
            sw.WriteLine(line);
        }
    }
    count++;
}
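If you would rather skip the manual EndOfStream loop altogether, File.ReadLines streams the file lazily, so even a 1 GB input is read one line at a time. A sketch using the same pageSize and naming variables from the question:

int count = 1;
var page = new List<string>(pageSize);
foreach (var line in File.ReadLines(inputFile))   // lazy: one line in memory at a time
{
    page.Add(line);
    if (page.Count == pageSize)
    {
        File.WriteAllLines(destinationFileName + count + baseName + extension, page);
        page.Clear();
        count++;
    }
}
if (page.Count > 0)   // flush the final, possibly short, page
    File.WriteAllLines(destinationFileName + count + baseName + extension, page);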
I have a process that loads data into a SQL table from a flat file and then needs to immediately move the file to an archive folder.
However, when running the code it imports the data but throws an IOException:
{"The process cannot access the file because it is being used by another process."}
There appears to be some contention in the process. Where and how should I avoid this?
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Text;

internal class Program
{
    private static void Main(string[] args)
    {
        string sourceFolder = @"c:\ImportFiles\";
        string destinationFolder = @"c:\ImportFiles\Archive\";

        foreach (string fileName in Directory.GetFiles(sourceFolder, "*.*"))
        {
            string sourceFileName = Path.GetFileName(fileName);
            string destinationFileName = Path.GetFileName(fileName) + ".arc";
            ProcessFile(fileName);
            string source = String.Concat(sourceFolder, sourceFileName);
            string destination = String.Concat(destinationFolder, destinationFileName);
            File.Move(source, destination);
        }
    }

    static void ProcessFile(string fileName)
    {
        Encoding enc = new UTF8Encoding(true, true);
        DataTable dt = LoadRecordsFromFile(fileName, enc, ',');
        SqlBulkCopy bulkCopy = new SqlBulkCopy("Server=(local);Database=test;Trusted_Connection=True;",
            SqlBulkCopyOptions.TableLock);
        bulkCopy.DestinationTableName = "dbo.tblManualDataLoad";
        bulkCopy.WriteToServer(dt);
        bulkCopy.Close();
    }

    public static DataTable LoadRecordsFromFile(string fileName, Encoding encoding, char delimiter)
    {
        DataTable table = null;
        if (fileName != null &&
            !fileName.Equals(string.Empty) &&
            File.Exists(fileName))
        {
            try
            {
                string tableName = "DataImport";
                FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
                List<string> rows = new List<string>();
                StreamReader reader = new StreamReader(fs, encoding);
                string record = reader.ReadLine();
                while (record != null)
                {
                    rows.Add(record);
                    record = reader.ReadLine();
                }

                List<string[]> rowObjects = new List<string[]>();
                int maxColsCount = 0;
                foreach (string s in rows)
                {
                    string[] convertedRow = s.Split(new char[] { delimiter });
                    if (convertedRow.Length > maxColsCount)
                        maxColsCount = convertedRow.Length;
                    rowObjects.Add(convertedRow);
                }

                table = new DataTable(tableName);
                for (int i = 0; i < maxColsCount; i++)
                {
                    table.Columns.Add(new DataColumn());
                }
                foreach (string[] rowArray in rowObjects)
                {
                    table.Rows.Add(rowArray);
                }

                // Remove header row from import file
                DataRow row = table.Rows[0];
                row.Delete();
                table.AcceptChanges();
            }
            catch
            {
                //TODO SEND EMAIL ALERT ON ERROR
                throw new Exception("Error in ReadFromFile: IO error.");
            }
        }
        else
        {
            //TODO SEND EMAIL ALERT ON ERROR
            throw new FileNotFoundException("Error in ReadFromFile: the file path could not be found.");
        }
        return table;
    }
}
Your program is likely holding the file open. Wrap the FileStream and StreamReader objects in using statements; that disposes (and closes) them when the using block finishes, releasing the file handle so the later File.Move can succeed.
The part of your LoadRecordsFromFile function that reads the file should look something like:
...
string tableName = "DataImport";
List<string> rows = new List<string>();
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader reader = new StreamReader(fs, encoding))
    {
        string record = reader.ReadLine();
        while (record != null)
        {
            rows.Add(record);
            record = reader.ReadLine();
        }
    }
}
...
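If you don't need the stream-level control, File.ReadAllLines opens, reads, and closes the file in a single call, which sidesteps the open-handle problem entirely. A sketch, using the overload that takes your encoding:

// Opens and closes the file internally, so no handle survives to block File.Move.
List<string> rows = new List<string>(File.ReadAllLines(fileName, encoding));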
I am reading a file using a StreamReader opened in ReadWrite mode. The requirement I have is to check the file for specific text and, if it is found, replace that line with a new line.
Currently I have initialized a StreamWriter for writing.
It writes text to the file, but it appends it as a new line at the end.
So what should I do to replace the particular line's text?
System.IO.FileStream oStream = new System.IO.FileStream(sFilePath, System.IO.FileMode.Append, System.IO.FileAccess.Write, System.IO.FileShare.Read);
System.IO.FileStream iStream = new System.IO.FileStream(sFilePath, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.ReadWrite);
System.IO.StreamWriter sw = new System.IO.StreamWriter(oStream);
System.IO.StreamReader sr = new System.IO.StreamReader(iStream);

string line;
int counter = 0;
while ((line = sr.ReadLine()) != null)
{
    if (line.Contains("line_found"))
    {
        sw.WriteLine("line_found false");
        break;
    }
    counter++;
}
sw.Close();
sr.Close();
Hi, try the code below; it should help.
// Replace all the "Hi" occurrences in the text file
var fileContents = System.IO.File.ReadAllText(@"C:\Sample.txt");
fileContents = fileContents.Replace("Hi", "BYE");
System.IO.File.WriteAllText(@"C:\Sample.txt", fileContents);

// Replace the "hi" in a particular line only
string[] lines = System.IO.File.ReadAllLines("Sample.txt");
for (int i = 0; i < lines.Length; i++)
{
    if (lines[i].Contains("hi"))
    {
        MessageBox.Show("Found");
        lines[i] = lines[i].Replace("hi", "BYE");
        break;
    }
}
System.IO.File.WriteAllLines("Sample.txt", lines);
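Both snippets read the whole file into memory, which is fine for modest files. For very large files, a streaming variant can write to a temporary file and swap it into place at the end; a rough sketch (path hypothetical):

string path = @"C:\Sample.txt";   // hypothetical path
string temp = path + ".tmp";
using (var writer = new System.IO.StreamWriter(temp))
{
    foreach (var line in System.IO.File.ReadLines(path))   // lazy, line by line
        writer.WriteLine(line.Replace("Hi", "BYE"));
}
System.IO.File.Delete(path);
System.IO.File.Move(temp, path);  // swap the rewritten file into place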