How to Speed Up Reading a File Using FileStream - C#

I am facing a performance issue while searching the contents of a file. I am using the FileStream class to read files (~10 files are involved in each search, each ~70 MB in size). However, all of these files are simultaneously being accessed and updated by another process during my search, so I cannot lock them while reading. Even when I set a buffer size on the StreamReader and use a regex, the search takes about 3 minutes.
Has anyone come across a similar situation who could offer any pointers on improving the performance of the file search?
Code Snippet
private static int BufferSize = 32768;
using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (TextReader txtReader = new StreamReader(fs, Encoding.UTF8, true, BufferSize))
{
System.Text.RegularExpressions.Regex patternMatching = new System.Text.RegularExpressions.Regex(@"(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})(.*?)(?=\n\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
System.Text.RegularExpressions.Regex dateStringMatch = new System.Text.RegularExpressions.Regex(@"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}");
char[] temp = new char[1048576];
while (txtReader.ReadBlock(temp, 0, 1048576) > 0)
{
StringBuilder parseString = new StringBuilder();
parseString.Append(temp);
if (temp[1023].ToString() != Environment.NewLine)
{
parseString.Append(txtReader.ReadLine());
while (txtReader.Peek() > 0 && !(txtReader.Peek() >= 48 && txtReader.Peek() <= 57))
{
parseString.Append(txtReader.ReadLine());
}
}
if (parseString.Length > 0)
{
string[] allRecords = patternMatching.Split(parseString.ToString());
foreach (var item in allRecords)
{
var contentString = item.Trim();
if (!string.IsNullOrWhiteSpace(contentString))
{
var matches = dateStringMatch.Matches(contentString);
if (matches.Count > 0)
{
var rowDatetime = DateTime.MinValue;
if (DateTime.TryParse(matches[0].Value, out rowDatetime))
{
if (rowDatetime >= startDate && rowDatetime < endDate)
{
if (contentString.ToLowerInvariant().Contains(searchText))
{
var result = new SearchResult
{
LogFileType = logFileType,
Message = string.Format(messageTemplateNew, item),
Timestamp = rowDatetime,
ComponentName = componentName,
FileName = filePath,
ServerName = serverName
};
searchResults.Add(result);
}
}
}
}
}
}
}
}
}
}
return searchResults;

Some time ago I had to analyse many FileZilla Server log files, each >120 MB.
I used a simple List<string> to hold all lines of each log file and then got great performance searching for specific lines.
List<string> fileContent = File.ReadAllLines(pathToFile).ToList();
But in your case I think the main reason for the bad performance isn't reading the file. Try timing some parts of your loop with a Stopwatch to check where most of the time is spent. Regex and TryParse can be very time consuming if used many times in a loop like yours.
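For example, a rough sketch of how that timing could be wired into the existing loop (the stopwatch names are mine, not part of the original code):
var splitWatch = new System.Diagnostics.Stopwatch();
var parseWatch = new System.Diagnostics.Stopwatch();

// Inside the existing while (txtReader.ReadBlock(...)) loop:
splitWatch.Start();
string[] allRecords = patternMatching.Split(parseString.ToString());
splitWatch.Stop();

parseWatch.Start();
// ... the existing foreach with Regex.Matches / DateTime.TryParse / Contains ...
parseWatch.Stop();

// After the loop:
Console.WriteLine($"Regex split: {splitWatch.ElapsedMilliseconds} ms, record parsing: {parseWatch.ElapsedMilliseconds} ms");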

Related

C# - Concatenating files until process memory is full then delete duplicates

I'm currently working on a C# form. Basically, I have a lot of log files and most of them have duplicate lines between them. This form is supposed to concatenate many of those files into one file and then delete all the duplicates in it, so that I end up with one log file without duplicates. I've already successfully made it work by taking 2 files, concatenating them, deleting all the duplicates, and then repeating the process until I have no more files. Here is the function I made for this:
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
path_list contains all the path to the files I need to process.
path_list[0] contains all the probe.0.log files
path_list[1] contains all the probe.1.log files ...
Here is the idea I have for my problem, but I have no idea how to code it:
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
// how I see it
if (currentMemoryUsage + newfile.Length > maximumProcessMemory) {
LogFile = LogFile.Distinct().ToList();
}
//end
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
I think this method will be much quicker, and it will adjust to any computer's specs. Can anyone help me make this work? Or tell me if I'm wrong.
You are creating far too many lists, arrays, and Distinct calls.
Just combine everything in a HashSet, then write it out:
private static void CombineNoDuplicates(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
logFile.Clear();
foreach (var path in path_list[j])
{
var lines = File.ReadLines(path);
logFile.UnionWith(lines);
backgroundWorker1.ReportProgress(1);
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
File.WriteAllLines(new_path, logFile);
}
}
Ideally you should use async instead of BackgroundWorker, which is deprecated. This also means you don't need to store a whole file in memory at once, except for the first one.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
logFile.Clear();
foreach (var path in path_list[j])
{
using (var sr = new StreamReader(path))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
logFile.Add(line);
}
}
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
await File.WriteAllLinesAsync(new_path, logFile);
}
}
If you are willing to risk a colliding hash code, you can cut down your memory usage even further by putting only the strings' hash codes in a HashSet; then you can fully stream all files.
Caveat: colliding hash codes are a distinct possibility, especially with many strings. Analyze your data to see if you can risk this.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var hashes = new HashSet<int>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
hashes.Clear();
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
using (var output = new StreamWriter(new_path))
{
foreach (var path in path_list[j])
{
using (var sr = new StreamReader(path))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
if (hashes.Add(line.GetHashCode()))
await output.WriteLineAsync(line);
}
}
}
}
}
}
You can get even more performance by reading the files as raw bytes and slicing lines out of Span<byte> buffers yourself. I will leave that as an exercise to the reader, as it is quite involved.
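For reference, a minimal sketch of that byte-level idea (this is my illustration, not the answer's code; it assumes UTF-8 input and a runtime where Encoding.GetString accepts a ReadOnlySpan<byte>, e.g. .NET Core 2.1+ / .NET 5+, and for simplicity it still reads the whole file into memory):
// Needs: using System; using System.Collections.Generic; using System.IO; using System.Text;
static List<string> ReadLinesFromBytes(string path)
{
    byte[] bytes = File.ReadAllBytes(path);   // simplification: whole file in memory
    var lines = new List<string>();
    int start = 0;
    while (start < bytes.Length)
    {
        ReadOnlySpan<byte> rest = bytes.AsSpan(start);
        int newline = rest.IndexOf((byte)'\n');
        int length = newline < 0 ? rest.Length : newline;
        if (length > 0 && rest[length - 1] == (byte)'\r')
            length--;                          // strip a trailing CR
        lines.Add(Encoding.UTF8.GetString(rest.Slice(0, length)));
        if (newline < 0)
            break;
        start += newline + 1;
    }
    return lines;
}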
Assuming your log files already contain lines that are sorted in chronological order [1], we can effectively treat them as intermediate files for a multi-file sort and perform merging/duplicate elimination in one go.
It would be a new class, something like this:
internal class LogFileMerger : IEnumerable<string>
{
private readonly List<IEnumerator<string>> _files;
public LogFileMerger(HashSet<string> fileNames)
{
_files = fileNames.Select(fn => File.ReadLines(fn).GetEnumerator()).ToList();
_files.RemoveAll(e => !e.MoveNext()); // prime each enumerator so Current is valid, dropping empty files
}
public IEnumerator<string> GetEnumerator()
{
while (_files.Count > 0)
{
var candidates = _files.Select(e => e.Current);
var nextLine = candidates.OrderBy(c => c).First();
for (int i = _files.Count - 1; i >= 0; i--)
{
while (_files[i].Current == nextLine)
{
if (!_files[i].MoveNext())
{
_files.RemoveAt(i);
break;
}
}
}
yield return nextLine;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
You can create a LogFileMerger using the set of input log file names and pass it directly as the IEnumerable<string> to some method like File.WriteAllLines. Using File.ReadLines should mean that the amount of memory being used for each input file is just a small buffer on each file, and we never attempt to have all of the data from any of the files loaded at any time.
(You may want to adjust the OrderBy and comparison operations above if there are requirements around case insensitivity, but I don't see any evidence of that in the question.)
(Note also that this class cannot be enumerated multiple times in the current design. That could be adjusted by storing the paths instead of the open enumerators in the class field and making the list of open enumerators a local inside GetEnumerator)
[1] If this is not the case, it may be more sensible to sort each file first so that this assumption is met, and then proceed with this plan.
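For example, a minimal usage sketch for the first group of files (reusing path_list and parent_path from the question):
HashSet<string> inputFiles = path_list[0];   // the group of sorted log files to merge
string outputPath = Path.Combine(parent_path, "new_data", "probe.0.log");
File.WriteAllLines(outputPath, new LogFileMerger(inputFiles));   // streams merged, de-duplicated lines to disk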

Lines missing in file split

I'm writing a program which splits a CSV file into four almost-equal parts.
I'm using a 2000-line CSV input file as an example, and when reviewing the output files, there are lines missing in the first file, and there are also incomplete lines, which makes no sense since I'm writing line by line. Here is the code:
using System.IO;
using System;
class MainClass {
public static void Main(string[] args){
string line;
int linesNumber = 0, linesEach = 0, cont = 0;
StreamReader r = new StreamReader("in.csv");
StreamWriter w1 = new StreamWriter("out-1.csv");
StreamWriter w2 = new StreamWriter("out-2.csv");
StreamWriter w3 = new StreamWriter("out-3.csv");
StreamWriter w4 = new StreamWriter("out-4.csv");
while((line = r.ReadLine()) != null)
++linesNumber;
linesEach = linesNumber / 4;
r.DiscardBufferedData();
r.BaseStream.Seek(0, SeekOrigin.Begin);
r.BaseStream.Position = 0;
while((line = r.ReadLine()) != null){
++cont;
if(cont == 1){
//first line must be skipped
continue;
}
if(cont < linesEach){
Console.WriteLine(line);
w1.WriteLine(line);
}
else if(cont < (linesEach*2)){
w2.WriteLine(line);
}
else if(cont < (linesEach*3)){
w3.WriteLine(line);
}
else{
w4.WriteLine(line);
}
}
}
}
Why is the writing part going wrong? How can I fix it?
Thank you all for your help.
You could simplify your approach by using a Partitioner and some LINQ. It also has the benefit of only having two file handles open at once, instead of 1 for each output file plus the original input file.
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
namespace FileSplitter
{
internal static class Program
{
internal static void Main(string[] args)
{
var input = File.ReadLines("in.csv").Skip(1);
var partitioner = Partitioner.Create(input);
var partitions = partitioner.GetPartitions(4);
for (int i = 0; i < partitions.Count; i++)
{
var enumerator = partitions[i];
using (var stream = File.OpenWrite($"out-{i + 1}.csv"))
{
using (var writer = new StreamWriter(stream))
{
while (enumerator.MoveNext())
{
writer.WriteLine(enumerator.Current);
}
}
}
}
}
}
}
This is not a direct answer to your question, just an alternative.
LINQ can be used to write shorter code:
int inx = 0;
var fInfo = new FileInfo(filename);
var lines = File.ReadAllLines(fInfo.FullName);
foreach (var groups in lines.GroupBy(x => inx++ / (lines.Length / 4)))
{
var newFileName = $"{fInfo.DirectoryName}\\{fInfo.Name}_{groups.Key}{fInfo.Extension}";
File.WriteAllLines(newFileName, groups);
}
Thank you all for your answers.
The problem was, as Jegan and spender suggested, that the StreamWriter needs to be wrapped in a using block. With that, problem solved.
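For reference, a sketch of that fix applied to the question's code (abbreviated; the counting and WriteLine logic stays exactly as before):
using (var r = new StreamReader("in.csv"))
using (var w1 = new StreamWriter("out-1.csv"))
using (var w2 = new StreamWriter("out-2.csv"))
using (var w3 = new StreamWriter("out-3.csv"))
using (var w4 = new StreamWriter("out-4.csv"))
{
    // ... count the lines, seek back, and write to w1..w4 exactly as in the question ...
}   // Disposing the writers flushes their buffers, which is what was missing.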

Reading text files with 64bit process very slow

I'm merging text files (.itf) located in a folder, applying some logic as I go. When I compile it as 32-bit (console application, .NET 4.6) everything works fine, except that I get OutOfMemory exceptions if there is a lot of data in the folders. Compiling it as 64-bit solves that problem, but it runs super slowly compared to the 32-bit process (more than 15 times slower).
I tried it with BufferedStream and ReadAllLines, but both perform very poorly. The profiler tells me that these methods use 99% of the time. I don't know where the problem is...
Here's the code:
private static void readData(Dictionary<string, Topic> topics)
{
foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
{
Topic currentTopic = null;
Table currentTable = null;
Object currentObject = null;
using (var fs = File.Open(file, FileMode.Open))
{
using (var bs = new BufferedStream(fs))
{
using (var sr = new StreamReader(bs, Encoding.Default))
{
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.IndexOf("ETOP") > -1)
{
currentTopic = null;
}
else if (line.IndexOf("ETAB") > -1)
{
currentTable = null;
}
else if (line.IndexOf("ELIN") > -1)
{
currentObject = null;
}
else if (line.IndexOf("MTID") > -1)
{
MTID = line.Replace("MTID ", "");
}
else if (line.IndexOf("MODL") > -1)
{
MODL = line.Replace("MODL ", "");
}
else if (line.IndexOf("TOPI") > -1)
{
var name = line.Replace("TOPI ", "");
if (topics.ContainsKey(name))
{
currentTopic = topics[name];
}
else
{
var topic = new Topic(name);
currentTopic = topic;
topics.Add(name, topic);
}
}
else if (line.IndexOf("TABL") > -1)
{
var name = line.Replace("TABL ", "");
if (currentTopic.Tables.ContainsKey(name))
{
currentTable = currentTopic.Tables[name];
}
else
{
var table = new Table(name);
currentTable = table;
currentTopic.Tables.Add(name, table);
}
}
else if (line.IndexOf("OBJE") > -1)
{
if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
{
var shortLine = line.Replace("OBJE ", "");
var obje = new Object(shortLine.Substring(shortLine.IndexOf(" ")));
currentObject = obje;
currentTable.Objects.Add(obje);
}
}
else if (currentTopic != null && currentTable != null && currentObject != null)
{
currentObject.Data.Add(line);
}
}
}
}
}
}
}
The biggest problem with your program is that, when you run it in 64-bit mode, it can read a lot more files. Which is nice; a 64-bit process has a thousand times more address space than a 32-bit process, so running out of it is excessively unlikely.
But you do not get a thousand times more RAM.
The universal principle of "there is no free lunch" at work. Having enough RAM matters a great deal in a program like this. First and foremost, it is used by the file system cache. That magical operating system feature that makes it look like reading files from a disk is very cheap. It is not at all, one of the slowest things you can do in a program, but it is very good at hiding it. You'll invoke it when you run your program more than once. The second, and subsequent, times you won't read from the disk at all. That's a pretty dangerous feature and very hard to avoid when you test your program, you get very unrealistic assumptions about how efficient it is.
The problem with a 64-bit process is that it easily makes the file system cache ineffective. Since you can read a lot more files, thus overwhelming the cache. And getting old file data removed. Now the second time you run your program it will not be fast anymore. The files you read will not be in the cache anymore but must be read from the disk. You'll now see the real perf of your program, the way it will behave in production. That's a good thing, even though you don't like it very much :)
A secondary problem with RAM is the lesser one: if you allocate a lot of memory to store the file data, then you force the operating system to find the RAM to store it. That can cause a lot of hard page faults, incurred when it must unmap memory used by another process, or yours, to free up the RAM that you need. A generic problem called "thrashing". Page faults are something you can see in Task Manager; use View > Select Columns to add the column.
Given that the file system cache is the most likely source of the slow-down, a simple test you can do is rebooting your machine, which ensures that the cache cannot have any of the file data, then run the 32-bit version. With the prediction that it will also be slow and that BufferedStream and ReadAllLines are the bottlenecks. Like they should be.
One final note, even though your program doesn't match the pattern, you cannot make strong assumptions about .NET 4.6 perf problems yet. Not until this very nasty bug gets fixed.
A few tips:
Why do you use File.Open, then BufferedStream, then StreamReader when you can do the job with just a StreamReader, which is buffered?
You should reorder your conditions so that the ones that occur most often come first.
Consider reading all the lines and then using Parallel.ForEach, as sketched below.
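On that last tip: the question's parser keeps per-file state (currentTopic/currentTable/currentObject), so parallelising over lines is not safe as-is; a more workable variant is to parallelise over files. A rough sketch (ParseFile is a hypothetical method wrapping the existing per-file parsing loop, and the shared topic map becomes a ConcurrentDictionary):
// Needs: using System.Collections.Concurrent; using System.IO; using System.Threading.Tasks;
var topics = new ConcurrentDictionary<string, Topic>();
Parallel.ForEach(Directory.EnumerateFiles(Path, "*.itf"), file =>
{
    // ParseFile: hypothetical wrapper around the existing if/else chain for a single file,
    // using topics.GetOrAdd(name, n => new Topic(n)) instead of ContainsKey/Add.
    ParseFile(file, topics);
});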
I could solve it. It seems there is a bug in the .NET compiler: unchecking the code optimization checkbox in VS2015 led to a huge performance increase. Now it runs with performance similar to the 32-bit version. My final version, with some optimizations:
private static void readData(ref Dictionary<string, Topic> topics)
{
Regex rgxOBJE = new Regex("OBJE [0-9]+ ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
Regex rgxTABL = new Regex("TABL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
Regex rgxTOPI = new Regex("TOPI ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
Regex rgxMTID = new Regex("MTID ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
Regex rgxMODL = new Regex("MODL ", RegexOptions.IgnoreCase | RegexOptions.Compiled);
foreach (string file in Directory.EnumerateFiles(Path, "*.itf"))
{
if (file.IndexOf("itf_merger_result") == -1)
{
Topic currentTopic = null;
Table currentTable = null;
Object currentObject = null;
using (var sr = new StreamReader(file, Encoding.Default))
{
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine(file + " read, parsing ...");
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.IndexOf("OBJE") > -1)
{
if (currentTable.Name != "Metadata" || currentTable.Objects.Count == 0)
{
var obje = new Object(rgxOBJE.Replace(line, ""));
currentObject = obje;
currentTable.Objects.Add(obje);
}
}
else if (line.IndexOf("TABL") > -1)
{
var name = rgxTABL.Replace(line, "");
if (currentTopic.Tables.ContainsKey(name))
{
currentTable = currentTopic.Tables[name];
}
else
{
var table = new Table(name);
currentTable = table;
currentTopic.Tables.Add(name, table);
}
}
else if (line.IndexOf("TOPI") > -1)
{
var name = rgxTOPI.Replace(line, "");
if (topics.ContainsKey(name))
{
currentTopic = topics[name];
}
else
{
var topic = new Topic(name);
currentTopic = topic;
topics.Add(name, topic);
}
}
else if (line.IndexOf("ETOP") > -1)
{
currentTopic = null;
}
else if (line.IndexOf("ETAB") > -1)
{
currentTable = null;
}
else if (line.IndexOf("ELIN") > -1)
{
currentObject = null;
}
else if (currentTopic != null && currentTable != null && currentObject != null)
{
currentObject.Data.Add(line);
}
else if (line.IndexOf("MTID") > -1)
{
MTID = rgxMTID.Replace(line, "");
}
else if (line.IndexOf("MODL") > -1)
{
MODL = rgxMODL.Replace(line, "");
}
}
sw.Stop();
Console.WriteLine(file + " parsed in {0}s", sw.ElapsedMilliseconds / 1000.0);
}
}
}
}
Removing the code optimization checkbox should typically result in performance slowdowns, not speedups. There may be an issue in the VS 2015 product. Please provide a stand-alone repro case with an input set to your program that demonstrates the performance problem and report it at: http://connect.microsoft.com/

How to avoid C# File.ReadLines First() locking file

I do not want to read the whole file at any point. I know there are answers on that question; I only want to read the first or last line.
I know that my code locks the file it is reading, for two reasons: 1) the application that writes to the file crashes intermittently when I run my little app with this code, but never crashes when I am not running it; and 2) there are a few articles that will tell you that File.ReadLines locks the file.
There are some similar questions, but those answers seem to involve reading the whole file, which is slow for large files and therefore not what I want to do. My requirement to only read the last line most of the time also differs from what I have read about.
I need to know how to read the first line (header row) and the last line (latest row). I do not want to read all lines at any point in my code because this file can become huge and reading the entire file would be slow.
I know that
line = File.ReadLines(fullFilename).First().Replace("\"", "");
... is the same as ...
FileStream fs = new FileStream(@fullFilename, FileMode.Open, FileAccess.Read, FileShare.Read);
My question is, how can I repeatedly read the first and last lines of a file which may be being written to by another application, without locking it in any way? I have no control over the application that is writing to the file. It is a data log which can be appended to at any time. The reason I am listening in this way is that this log can be appended to for days on end. I want to see the latest data in this log in my own C# programme without waiting for the log to finish being written to.
My code to call the reading / listening function ...
//Start Listening to the "data log"
private void btnDeconstructCSVFile_Click(object sender, EventArgs e)
{
MySandbox.CopyCSVDataFromLogFile copyCSVDataFromLogFile = new MySandbox.CopyCSVDataFromLogFile();
copyCSVDataFromLogFile.checkForLogData();
}
My class which does the listening. For now it simply adds the data to two generic lists...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MySandbox.Classes;
using System.IO;
namespace MySandbox
{
public class CopyCSVDataFromLogFile
{
static private List<LogRowData> listMSDataRows = new List<LogRowData>();
static String fullFilename = string.Empty;
static LogRowData previousLineLogRowList = new LogRowData();
static LogRowData logRowList = new LogRowData();
static LogRowData logHeaderRowList = new LogRowData();
static Boolean checking = false;
public void checkForLogData()
{
//Initialise
string[] logHeaderArray = new string[] { };
string[] badDataRowsArray = new string[] { };
//Get the latest full filename (file with new data)
//Assumption: only 1 file is written to at a time in this directory.
String directory = "C:\\TestDir\\";
string pattern = "*.csv";
var dirInfo = new DirectoryInfo(directory);
var file = (from f in dirInfo.GetFiles(pattern) orderby f.LastWriteTime descending select f).First();
fullFilename = directory + file.ToString(); //This is the full filepath and name of the latest file in the directory!
if (logHeaderArray.Length == 0)
{
//Populate the Header Row
logHeaderRowList = getRow(fullFilename, true);
}
LogRowData tempLogRowList = new LogRowData();
if (!checking)
{
//Read the latest data in an asynchronous loop
callDataProcess();
}
}
private async void callDataProcess()
{
checking = true; //Begin checking
await checkForNewDataAndSaveIfFound();
}
private static Task checkForNewDataAndSaveIfFound()
{
return Task.Run(() => //Call the async "Task"
{
while (checking) //Loop (asynchronously)
{
LogRowData tempLogRowList = new LogRowData();
if (logHeaderRowList.ValueList.Count == 0)
{
//Populate the Header row
logHeaderRowList = getRow(fullFilename, true);
}
else
{
//Populate Data row
tempLogRowList = getRow(fullFilename, false);
if ((!Enumerable.SequenceEqual(tempLogRowList.ValueList, previousLineLogRowList.ValueList)) &&
(!Enumerable.SequenceEqual(tempLogRowList.ValueList, logHeaderRowList.ValueList)))
{
logRowList = getRow(fullFilename, false);
listMSDataRows.Add(logRowList);
previousLineLogRowList = logRowList;
}
}
//System.Threading.Thread.Sleep(10); //Wait for next row.
}
});
}
private static LogRowData getRow(string fullFilename, bool isHeader)
{
string line;
string[] logDataArray = new string[] { };
LogRowData logRowListResult = new LogRowData();
try
{
if (isHeader)
{
//Assign first (header) row data.
//Works but seems to block writing to the file!!!
line = File.ReadLines(fullFilename).First().Replace("\"", "");
}
else
{
//Assign data as last row (default behaviour).
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
}
logDataArray = line.Split(',');
//Copy Array to Generics List and remove last value if it's empty.
for (int i = 0; i < logDataArray.Length; i++)
{
if (i < logDataArray.Length)
{
if (i < logDataArray.Length - 1)
{
//Value is not at the end, from observation, these always have a value (even if it's zero) and so we'll store the value.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else
{
//This is the last value
if (logDataArray[i].Replace("\"", "").Trim().Length > 0)
{
//In this case, the last value is not empty, store it as normal.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else { /*The last value is empty, e.g. "123,456,"; the final comma denotes another field but this field is empty so we will ignore it now. */ }
}
}
}
}
catch (Exception ex)
{
if (ex.Message == "Sequence contains no elements")
{ /*Empty file, no problem. The code will safely loop and then will pick up the header when it appears.*/ }
else
{
//TODO: catch this error properly
Int32 problemID = 10; //Unknown ERROR.
}
}
return logRowListResult;
}
}
}
I found the answer in a combination of other questions: one answer explaining how to read from the end of a file, which I adapted so that it reads only one line from the end; and another explaining how to read the entire file without locking it (I did not want to read the entire file, but the not-locking part was useful). So now you can read the last line of the file (if it contains end-of-line characters) without locking it. For other end-of-line delimiters, just replace my 10 and 13 with your end-of-line character bytes...
Add the method below to public class CopyCSVDataFromLogFile
private static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
and replace this line ...
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
with this code block ...
Int32 endOfLineCharacterCount = 0;
Int32 previousCharByte = 0;
Int32 currentCharByte = 0;
//Read the file, from the end, for 1 line, allowing other programmes to access it for read and write!
using (FileStream reader = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
{
int i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while ((-i < reader.Length) /*Belt and braces: if there were no end of line characters, reading beyond the file would give a catastrophic error here (to be avoided thus).*/
&& (endOfLineCharacterCount < 2)/*Exit Condition*/)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
currentCharByte = byteRead;
//Exit condition: the first 2 characters we read (reading backwards, remember) were end-of-line bytes (LF then CR).
//So when we read the second end of line, we have read 1 whole line (the last line in the file)
//and we must exit now.
if (currentCharByte == 13 && previousCharByte == 10)
{
endOfLineCharacterCount++;
}
if (byteRead == 10 && lineBuffer.Length > 0)
{
line += Reverse(lineBuffer.ToString());
lineBuffer.Remove(0, lineBuffer.Length);
}
lineBuffer.Append((char)byteRead);
previousCharByte = byteRead;
}
reader.Close();
}
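The header row can be read in the same non-locking way; a small sketch of my own (not from the linked answers) that opens the file with FileShare.ReadWrite and reads only the first line:
string headerLine = null;
using (var fs = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var sr = new StreamReader(fs))
{
    headerLine = sr.ReadLine();   // null if the file is still empty
}
if (headerLine != null)
{
    line = headerLine.Replace("\"", "");
}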

Ignore certain lines in a text file (C# Streamreader)

I'm trying to work out a way of removing records from a program I'm writing. I have a text file with all the customer data spread over a set of lines; I read these lines in one at a time and store them in a List.
When writing, I simply append to the file. However, for deleting I had the idea of adding a character such as * or # to the front of lines that are no longer needed. However, I am unsure how to do this.
Below is how I currently read the data in:
Thanks in advance
StreamReader dataIn = null;
CustomerClass holdcus; //holdcus and holdacc are used as "holding pens" for the next customer/account
Accounts holdacc;
bool moreData = false;
string[] cusdata = new string[13]; //holds customer data
string[] accdata = new string[8]; //holds account data
if (fileIntegCheck(inputDataFile, ref dataIn))
{
moreData = getCustomer(dataIn, cusdata);
while (moreData == true)
{
holdcus = new CustomerClass(cusdata[0], cusdata[1], cusdata[2], cusdata[3], cusdata[4], cusdata[5], cusdata[6], cusdata[7], cusdata[8], cusdata[9], cusdata[10], cusdata[11], cusdata[12]);
customers.Add(holdcus);
int x = Convert.ToInt32(cusdata[12]);
for (int i = 0; i < x; i++) //Takes the ID number for the last customer, as uses it to set the first value of the following accounts
{ //this is done as a key to which accounts map to which customers
moreData = getAccount(dataIn, accdata);
accdata[0] = cusdata[0];
holdacc = new Accounts(accdata[0], accdata[1], accdata[2], accdata[3], accdata[4], accdata[5], accdata[6], accdata[7]);
accounts.Add(holdacc);
}
moreData = getCustomer(dataIn, cusdata);
}
}
if (dataIn != null) dataIn.Close();
Since you're using string arrays, you can just do cusdata[index] = "#" + cusdata[index] to prepend it to the line. However, if your question is how to delete it from the file, why not skip that step and simply not write the lines you want deleted when rewriting the file? A small sketch of both ideas follows.
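For illustration, a minimal sketch of the flag-and-skip idea (the file name and index are placeholders, not taken from the question):
// Needs: using System.IO; using System.Linq;
// Mark a record as deleted by prefixing its first field with '#'.
cusdata[0] = "#" + cusdata[0];

// When rewriting the file, drop every line that carries the deletion marker.
var keptLines = File.ReadAllLines("customers.txt")          // placeholder file name
                    .Where(l => !l.StartsWith("#"))
                    .ToArray();
File.WriteAllLines("customers.txt", keptLines);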
Here is a small read/write sample that should suit your needs. If it doesn't, let me know in the comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Program
{
static readonly string filePath = "c:\\test.txt";
static void Main(string[] args)
{
// Read your file
List<string> lines = ReadLines();
//Create your remove logic here ..
lines = lines.Where(x => x.Contains("Julia Roberts") != true).ToList();
// Rewrite the file
WriteLines(lines);
}
static List<string> ReadLines()
{
List<string> lines = new List<string>();
using (StreamReader sr = new StreamReader(new FileStream(filePath, FileMode.Open)))
{
while (!sr.EndOfStream)
{
string buffer = sr.ReadLine();
lines.Add(buffer);
// Just to show you the results
Console.WriteLine(buffer);
}
}
return lines;
}
static void WriteLines(List<string> lines)
{
using (StreamWriter sw = new StreamWriter(new FileStream(filePath, FileMode.Create)))
{
foreach (var line in lines)
{
sw.WriteLine(line);
}
}
}
}
I used the following data sample for this:
Matt Damon 100 222
Julia Roberts 125 152
Robert Downey Jr. 150 402
Tom Hanks 55 932
