How to avoid C# File.ReadLines().First() locking the file

I do not want to read the whole file at any point. I know there are answers on that question; I want to read the first or last line.
I know that my code locks the file that it's reading, for two reasons: 1) the application that writes to the file crashes intermittently when I run my little app with this code, but it never crashes when I am not running this code; 2) there are a few articles that will tell you that File.ReadLines locks the file.
There are some similar questions, but those answers seem to involve reading the whole file, which is slow for large files and therefore not what I want to do. My requirement to read only the last line most of the time also differs from what I have read about.
I need to know how to read the first line (header row) and the last line (latest row). I do not want to read all lines at any point in my code because this file can become huge and reading the entire file would be slow.
I know that
line = File.ReadLines(fullFilename).First().Replace("\"", "");
... is the same as ...
FileStream fs = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.Read);
My question is: how can I repeatedly read the first and last lines of a file which may be being written to by another application, without locking it in any way? I have no control over the application that is writing to the file. It is a data log which can be appended to at any time. The reason I am listening in this way is that this log can be appended to for days on end. I want to see the latest data in this log in my own C# programme without waiting for the log to finish being written to.
My code to call the reading / listening function ...
//Start Listening to the "data log"
private void btnDeconstructCSVFile_Click(object sender, EventArgs e)
{
MySandbox.CopyCSVDataFromLogFile copyCSVDataFromLogFile = new MySandbox.CopyCSVDataFromLogFile();
copyCSVDataFromLogFile.checkForLogData();
}
My class which does the listening. For now it simply adds the data to 2 generics lists ...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MySandbox.Classes;
using System.IO;
namespace MySandbox
{
public class CopyCSVDataFromLogFile
{
static private List<LogRowData> listMSDataRows = new List<LogRowData>();
static String fullFilename = string.Empty;
static LogRowData previousLineLogRowList = new LogRowData();
static LogRowData logRowList = new LogRowData();
static LogRowData logHeaderRowList = new LogRowData();
static Boolean checking = false;
public void checkForLogData()
{
//Initialise
string[] logHeaderArray = new string[] { };
string[] badDataRowsArray = new string[] { };
//Get the latest full filename (file with new data)
//Assumption: only 1 file is written to at a time in this directory.
String directory = "C:\\TestDir\\";
string pattern = "*.csv";
var dirInfo = new DirectoryInfo(directory);
var file = (from f in dirInfo.GetFiles(pattern) orderby f.LastWriteTime descending select f).First();
fullFilename = directory + file.ToString(); //This is the full filepath and name of the latest file in the directory!
if (logHeaderArray.Length == 0)
{
//Populate the Header Row
logHeaderRowList = getRow(fullFilename, true);
}
LogRowData tempLogRowList = new LogRowData();
if (!checking)
{
//Read the latest data in an asynchronous loop
callDataProcess();
}
}
private async void callDataProcess()
{
checking = true; //Begin checking
await checkForNewDataAndSaveIfFound();
}
private static Task checkForNewDataAndSaveIfFound()
{
return Task.Run(() => //Call the async "Task"
{
while (checking) //Loop (asynchronously)
{
LogRowData tempLogRowList = new LogRowData();
if (logHeaderRowList.ValueList.Count == 0)
{
//Populate the Header row
logHeaderRowList = getRow(fullFilename, true);
}
else
{
//Populate Data row
tempLogRowList = getRow(fullFilename, false);
if ((!Enumerable.SequenceEqual(tempLogRowList.ValueList, previousLineLogRowList.ValueList)) &&
(!Enumerable.SequenceEqual(tempLogRowList.ValueList, logHeaderRowList.ValueList)))
{
logRowList = getRow(fullFilename, false);
listMSDataRows.Add(logRowList);
previousLineLogRowList = logRowList;
}
}
//System.Threading.Thread.Sleep(10); //Wait for next row.
}
});
}
private static LogRowData getRow(string fullFilename, bool isHeader)
{
string line;
string[] logDataArray = new string[] { };
LogRowData logRowListResult = new LogRowData();
try
{
if (isHeader)
{
//Assign first (header) row data.
//Works but seems to block writing to the file!
line = File.ReadLines(fullFilename).First().Replace("\"", "");
}
else
{
//Assign data as last row (default behaviour).
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
}
logDataArray = line.Split(',');
//Copy Array to Generics List and remove last value if it's empty.
for (int i = 0; i < logDataArray.Length; i++)
{
if (i < logDataArray.Length)
{
if (i < logDataArray.Length - 1)
{
//Value is not at the end, from observation, these always have a value (even if it's zero) and so we'll store the value.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else
{
//This is the last value
if (logDataArray[i].Replace("\"", "").Trim().Length > 0)
{
//In this case, the last value is not empty, store it as normal.
logRowListResult.ValueList.Add(logDataArray[i]);
}
else { /*The last value is empty, e.g. "123,456,"; the final comma denotes another field but this field is empty so we will ignore it now. */ }
}
}
}
}
catch (Exception ex)
{
if (ex.Message == "Sequence contains no elements")
{ /*Empty file, no problem. The code will safely loop and then will pick up the header when it appears.*/ }
else
{
//TODO: catch this error properly
Int32 problemID = 10; //Unknown ERROR.
}
}
return logRowListResult;
}
}
}

I found the answer in a combination of other questions: one answer explaining how to read from the end of a file, which I adapted so that it reads only one line from the end, and another explaining how to read the entire file without locking it (I did not want to read the entire file, but the not-locking part was useful). So now you can read the last line of the file (if it contains end-of-line characters) without locking it. For other end-of-line delimiters, just replace my 10 and 13 with your end-of-line character bytes...
Add the method below to public class CopyCSVDataFromLogFile
private static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
and replace this line ...
line = File.ReadLines(fullFilename).Last().Replace("\"", "");
with this code block ...
line = string.Empty; //Make sure the local declared earlier is initialised, since we append to it below.
Int32 endOfLineCharacterCount = 0;
Int32 previousCharByte = 0;
Int32 currentCharByte = 0;
//Read the file, from the end, for 1 line, allowing other programmes to access it for read and write!
using (FileStream reader = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
{
int i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while ((-i < reader.Length) /*Belt and braces: if there were no end of line characters, reading beyond the file would give a catastrophic error here (to be avoided thus).*/
&& (endOfLineCharacterCount < 2)/*Exit Condition*/)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
currentCharByte = byteRead;
//Exit condition: reading backwards, we count an end of line each time we see byte 13 immediately followed by byte 10 in file order (CR LF).
//So when we read the second end of line, we have read 1 whole line (the last line in the file)
//and we must exit now.
if (currentCharByte == 13 && previousCharByte == 10)
{
endOfLineCharacterCount++;
}
if (byteRead == 10 && lineBuffer.Length > 0)
{
line += Reverse(lineBuffer.ToString());
lineBuffer.Remove(0, lineBuffer.Length);
}
lineBuffer.Append((char)byteRead);
previousCharByte = byteRead;
}
reader.Close();
}
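The header row can be read in the same non-locking way. Here is a minimal sketch along the same lines (not shown in my code above): open the file with FileShare.ReadWrite so the writer is never blocked, and read only up to the first line break. It could stand in for the File.ReadLines(fullFilename).First() call in getRow.
//Read only the first (header) line without blocking the application that writes the log.
using (var fs = new FileStream(fullFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 0x1000, FileOptions.SequentialScan))
using (var sr = new StreamReader(fs))
{
    line = (sr.ReadLine() ?? string.Empty).Replace("\"", "");
}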

Related

C# - Concatenating files until process memory is full then delete duplicates

I'm currently working on a C# form. Basically, I have a lot of log files and most of them have duplicate lines between them. This form is supposed to concatenate a lot of those files into one file and then delete all the duplicates in it so that I can have one log file without duplicates. I've already successfully made it work by taking 2 files, concatenating them, deleting all the duplicates, then repeating the process until I have no more files. Here is the function I made for this:
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
path_list contains all the path to the files I need to process.
path_list[0] contains all the probe.0.log files
path_list[1] contains all the probe.1.log files ...
Here is the idea I have for my problem but I have no idea how to code it :
private static void DeleteAllDuplicatesFastWithMemoryManagement(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
for (int j = 0; j < path_list.Length; j++)
{
HashSet<string>.Enumerator em = path_list[j].GetEnumerator();
List<string> LogFile = new List<string>();
while (em.MoveNext())
{
// how I see it
if (currentMemoryUsage + newfile.Length > maximumProcessMemory) {
LogFile = LogFile.Distinct().ToList();
}
//end
var secondLogFile = File.ReadAllLines(em.Current);
LogFile = LogFile.Concat(secondLogFile).ToList();
LogFile = LogFile.Distinct().ToList();
backgroundWorker1.ReportProgress(1);
}
LogFile = LogFile.Distinct().ToList();
string new_path = parent_path + "/new_data/probe." + j + ".log";
File.WriteAllLines(new_path, LogFile.Distinct().ToArray());
}
}
I think this method will be much quicker, and it will adjust to any computer's specs. Can anyone help me make this work, or tell me if I'm wrong?
You are creating far too many lists, arrays and Distincts.
Just combine everything in a HashSet, then write it out:
private static void CombineNoDuplicates(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1, BackgroundWorker backgroundWorker1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
logFile.Clear();
foreach (var path in path_list[j])
{
var lines = File.ReadLines(path);
logFile.UnionWith(lines);
backgroundWorker1.ReportProgress(1);
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
File.WriteAllLines(new_path, logFile);
}
}
Ideally you should use async instead of BackgroundWorker which is deprecated. This also means you don't need to store a whole file in memory at once, except for the first one.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var logFile = new HashSet<string>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
logFile.Clear();
foreach (var path in path_list[j])
{
using (var sr = new StreamReader(path))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
logFile.Add(line);
}
}
}
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
await File.WriteAllLinesAsync(new_path, logFile);
}
}
If you want to risk a colliding hash-code, you could cut down your memory usage even further by just putting the strings' hashes in a HashSet, then you can fully stream all files.
Caveat: colliding hash-codes are a distinct possibility, especially with many strings. Analyze your data to see if you can risk this.
private static async Task CombineNoDuplicatesAsync(HashSet<string>[] path_list, string parent_path, ProgressBar pBar1)
{
var hashes = new HashSet<int>(1000); // pre-size your hashset to a suitable size
for (int j = 0; j < path_list.Length; j++)
{
hashes.Clear();
string new_path = Path.Combine(parent_path, "new_data", "probe." + j + ".log");
using (var output = new StreamWriter(new_path))
{
foreach (var path in path_list[j])
{
using (var sr = new StreamReader(path))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
if (hashes.Add(line.GetHashCode()))
await output.WriteLineAsync(line);
}
}
}
}
}
}
You can get even more performance if you read the file as raw bytes and parse the lines out of Span<byte> slices; I will leave that as an exercise for the reader as it's quite complex.
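For illustration, here is a rough sketch of that idea (not a drop-in replacement, and the same hash-collision caveat applies): read the whole file as bytes, walk it with Span<byte> slices split on '\n', and hash the raw bytes with FNV-1a so no string is allocated per line.
private static void CombineNoDuplicatesSpan(string inputPath, string outputPath)
{
    var seen = new HashSet<ulong>(1000);
    byte[] bytes = File.ReadAllBytes(inputPath); // whole file for brevity; a real version would read in chunks
    using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
    {
        int start = 0;
        while (start < bytes.Length)
        {
            int nl = Array.IndexOf(bytes, (byte)'\n', start);
            int end = nl < 0 ? bytes.Length : nl;
            ReadOnlySpan<byte> line = bytes.AsSpan(start, end - start);
            ulong hash = 14695981039346656037UL;                     // FNV-1a 64-bit offset basis
            foreach (byte b in line)
                hash = (hash ^ b) * 1099511628211UL;                 // FNV-1a 64-bit prime
            if (seen.Add(hash))                                      // first time we see this line: keep it
            {
                output.Write(bytes, start, end - start);
                output.WriteByte((byte)'\n');
            }
            start = end + 1;
        }
    }
}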
Assuming your log files already contain lines that are sorted in chronological order1, we can effectively treat them as intermediate files for a multi-file sort and perform merging/duplicate elimination in one go.
It would be a new class, something like this:
// Requires: using System.Collections; using System.Collections.Generic; using System.IO; using System.Linq;
internal class LogFileMerger : IEnumerable<string>
{
private readonly List<IEnumerator<string>> _files;
public LogFileMerger(HashSet<string> fileNames)
{
// Prime each enumerator so Current is valid before GetEnumerator runs, and drop any empty files up front.
_files = fileNames.Select(fn => File.ReadLines(fn).GetEnumerator()).Where(e => e.MoveNext()).ToList();
}
public IEnumerator<string> GetEnumerator()
{
while (_files.Count > 0)
{
var candidates = _files.Select(e => e.Current);
var nextLine = candidates.OrderBy(c => c).First();
for (int i = _files.Count - 1; i >= 0; i--)
{
while (_files[i].Current == nextLine)
{
if (!_files[i].MoveNext())
{
_files.RemoveAt(i);
break;
}
}
}
yield return nextLine;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
You can create a LogFileMerger using the set of input log file names and pass it directly as the IEnumerable<string> to some method like File.WriteAllLines. Using File.ReadLines should mean that the amount of memory being used for each input file is just a small buffer on each file, and we never attempt to have all of the data from any of the files loaded at any time.
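As a usage sketch (the file names here are purely illustrative), the merger can be passed straight to File.WriteAllLines, which streams the merged, de-duplicated lines to disk without holding them all in memory:
var inputFiles = new HashSet<string> { "probe.0.a.log", "probe.0.b.log" };
File.WriteAllLines("probe.0.merged.log", new LogFileMerger(inputFiles));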
(You may want to adjust the OrderBy and comparison operations in the above if there are requirements around case insensitivity, but I don't see any evidence of that in the question.)
(Note also that this class cannot be enumerated multiple times in the current design. That could be adjusted by storing the paths instead of the open enumerators in the class field and making the list of open enumerators a local inside GetEnumerator.)
¹ If this is not the case, it may be more sensible to sort each file first so that this assumption is met, and then proceed with this plan.

How to count strings from text file in C#

In this button click event I am trying to count strings from a text file that match the text in the textboxes, then display the number of them in a label. My problem is that I have no idea how to count them; I'm talking about the code inside the if-statement. I would really appreciate any help.
private void btnCalculate_Click(object sender, EventArgs e)
{
string openFileName;
using (OpenFileDialog ofd = new OpenFileDialog())
{
if (ofd.ShowDialog() != DialogResult.OK)
{
MessageBox.Show("You did not select OK");
return;
}
openFileName = ofd.FileName;
}
FileStream fs = null;
StreamReader sr = null;
try
{
fs = new FileStream("x", FileMode.Open, FileAccess.Read);
fs.Seek(0, SeekOrigin.Begin);
sr = new StreamReader(fs);
string s = sr.ReadLine();
while (s != null)
{
s = sr.ReadLine();
}
if(s.Contains(tbFirstClub.Text))
{
s.Count = lblResult1.Text; //problem is here
}
else if(s.Contains(tbSecondClub.Text))
{
s.Count = lblResult2.Text; //problem is here
}
}
catch (IOException)
{
MessageBox.Show("Error reading file");
}
catch (Exception)
{
MessageBox.Show("Something went wrong");
}
finally
{
if (sr != null)
{
sr.Close();
}
}
}
Thanks in advance.
s.Count = lblResult1.Text; //problem is here
Wait... you are saying here:
you have a variable (s)
and you access its property (Count)
and then set it to the label text (lblResult1.Text)
Is that what you're trying to do? Because the reverse seems more likely.
Using LINQ you can get the number of occurrences, like below (assuming the club names appear as space-separated words in the line):
int numOfOccurrences = s.Split(' ').Count(word => word == tbFirstClub.Text);
lblResult1.Text = numOfOccurrences.ToString();
Welcome to Stack Overflow.
I want to point out something you said.
else if(s.Contains(tbSecondClub.Text))
{
s.Count = lblResult2.Text; //problem is here
}
s is the string that we just read from the file.
You're assigning the label text to s.Count (the length of the string).
I don't think this is what you want. We want to return the number of times the specified strings show up in a specified file.
Let's refactor this, (And add some tricks along the way).
// Let's create a dictionary to store all of our desired texts, and the counts.
var textAndCounts = new Dictionary<string, int>();
textAndCounts.Add(tbFirstClub.Text, 0); // Assuming the type of Text is string, change accordingly
textAndCounts.Add(tbSecondClub.Text, 0);
//We added both our text fields to our dictionary with a value of 0
// Read all the lines from the file.
var allLines = File.ReadAllLines(openFileName); /* using System.IO */
foreach(var line in allLines)
{
if(line.Contains(tbFirstClub.Text))
{
textAndCounts[tbFirstClub.Text] += 1; // Go to where we stored our count for our text and increment
}
if(line.Contains(tbSecondClub.Text))
{
textAndCounts[tbSecondClub.Text] += 1;
}
}
This should solve your problem, but it's still pretty brittle. Optimally, we want to design a system that works for any number of strings and counts them.
So how would I do it?
public Dictionary<string, int> GetCountsPerStringInFile(IEnumerable<string> textsToSearch, string filePath)
{
//Lets use Linq to create a dictionary, assuming all strings are unique.
//This means, create a dictionary in this list, where the key is the values in the list, and the value is 0 <Text, 0>
var textsAndCounts = textsToSearch.ToDictionary(text => text, count => 0);
var allLines = File.ReadAllLines(filePath);
foreach (var line in allLines)
{
// You didn't specify if a line could maintain multiple values, so let's handle that here.
var keysContained = textsAndCounts.Keys.Where(c => line.Contains(c)); // take all the keys where the line has that key.
foreach (var key in keysContained)
{
textsAndCounts[key] += 1; // increment the count associated with that string.
}
}
return textsAndCounts;
}
The above code allows us to return a data structure with any amount of strings with a count.
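As a rough usage sketch in the button click handler (assuming openFileName comes from the OpenFileDialog as in the question):
var counts = GetCountsPerStringInFile(new[] { tbFirstClub.Text, tbSecondClub.Text }, openFileName);
lblResult1.Text = counts[tbFirstClub.Text].ToString();
lblResult2.Text = counts[tbSecondClub.Text].ToString();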
I think this is a good example for you to save you some headaches going forward, and it's probably a good first toe-dip into design patterns. I'd suggest looking up some material on Data structures and their use cases.

Add two lines from csv file to array(s)

I have a csv file with the following data:
500000,0.005,6000
690000,0.003,5200
I need to add each line as a separate array. So 500000, 0.005, 6000 would be array1. How would I do this?
Currently my code adds each column into one element.
For example data[0] is showing 500000
690000
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] data = line.Split(',');
Console.WriteLine(data[0] + " " + data[1]);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
Using the limited data set you've provided...
const string test = @"500000,0.005,6000
690000,0.003,5200";
var result = test.Split('\n')
.Select(x=> x.Split(',')
.Select(y => Convert.ToDecimal(y))
.ToArray()
)
.ToArray();
foreach (var element in result)
{
Console.WriteLine($"{element[0]}, {element[1]}, {element[2]}");
}
Can it be done without LINQ? Yes, but it's messy...
const string test = @"500000,0.005,6000
690000,0.003,5200";
List<decimal[]> resultList = new List<decimal[]>();
string[] lines = test.Split('\n');
foreach (var line in lines)
{
List<decimal> decimalValueList = new List<decimal>();
string[] splitValuesByComma = line.Split(',');
foreach (string value in splitValuesByComma)
{
decimal convertedValue = Convert.ToDecimal(value);
decimalValueList.Add(convertedValue);
}
decimal[] decimalValueArray = decimalValueList.ToArray();
resultList.Add(decimalValueArray);
}
decimal[][] resultArray = resultList.ToArray();
That will give the exact same output as what I've done in the first example.
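If you want the same result read straight from the file instead of the hard-coded test string, the LINQ version can be fed by File.ReadLines. This is a sketch: filePath is assumed to be the same parameter as in your ReadFromFile method, and InvariantCulture is used so "0.005" parses regardless of the machine's locale (requires using System.Globalization;).
var result = File.ReadLines(filePath)
    .Select(line => line.Split(',')
        .Select(value => decimal.Parse(value, CultureInfo.InvariantCulture))
        .ToArray())
    .ToArray();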
If you can use a List<string[]>, you do not have to worry about the array length.
In the following example, the variable lines will be a list arrays, like:
["500000", "0.005", "6000"]
["690000", "0.003", "5200"]
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
List<string[]> lines = new List<string[]>();
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] splittedLine = line.Split(',');
lines.Add(splittedLine);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
While others have shown the Split method, I will take a more structured, strongly typed approach.
You have some CSV values in a file. Give the object stored in the CSV a name, name every column, and give each column a type.
Define the default values of those fields, and define what happens for missing columns and malformed fields. Is there a header?
Now that you know what you have, define what you want. This time again: object name -> property -> type.
Believe it or not, simply defining your input and output solves most of your issue.
Use CsvHelper to simplify your code.
CSV File Definition:
public class CsvItem_WithARealName
{
public int data1;
public decimal data2;
public int goodVariableNames;
}
public class CsvItemMapper : ClassMap<CsvItem_WithARealName>
{
public CsvItemMapper()
{ //mapping based on index. cause file has no header.
Map(m => m.data1).Index(0);
Map(m => m.data2).Index(1);
Map(m => m.goodVariableNames).Index(2);
}
}
A CSV reader method: point it at a document and it will yield the CSV items.
Here we have some configuration: no header, and InvariantCulture for the decimal conversion.
private IEnumerable<CsvItem_WithARealName> GetCsvItems(string filePath)
{
using (var fileReader = File.OpenText(filePath))
using (var csvReader = new CsvHelper.CsvReader(fileReader))
{
csvReader.Configuration.CultureInfo = CultureInfo.InvariantCulture;
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.RegisterClassMap<CsvItemMapper>();
while (csvReader.Read())
{
var record = csvReader.GetRecord<CsvItem_WithARealName>();
yield return record;
}
}
}
Usage :
var filename = "csvExemple.txt";
var items = GetCsvItems(filename);

Ignore certain lines in a text file (C# StreamReader)

I'm trying to work out a way of removing records from a program I'm writing. I have a text file with all the customer data spread over a set of lines, and I read these lines in one at a time and store them in a List.
When writing, I simply append to the file. However, for deleting I had the idea of adding a character such as * or # to the front of lines that are no longer needed. However, I am unsure how to do this.
Below is how I currently read the data in:
Thanks in advance
StreamReader dataIn = null;
CustomerClass holdcus; //holdcus and holdacc are used as "holding pens" for the next customer/account
Accounts holdacc;
bool moreData = false;
string[] cusdata = new string[13]; //holds customer data
string[] accdata = new string[8]; //holds account data
if (fileIntegCheck(inputDataFile, ref dataIn))
{
moreData = getCustomer(dataIn, cusdata);
while (moreData == true)
{
holdcus = new CustomerClass(cusdata[0], cusdata[1], cusdata[2], cusdata[3], cusdata[4], cusdata[5], cusdata[6], cusdata[7], cusdata[8], cusdata[9], cusdata[10], cusdata[11], cusdata[12]);
customers.Add(holdcus);
int x = Convert.ToInt32(cusdata[12]);
for (int i = 0; i < x; i++) //Takes the ID number for the last customer, as uses it to set the first value of the following accounts
{ //this is done as a key to which accounts map to which customers
moreData = getAccount(dataIn, accdata);
accdata[0] = cusdata[0];
holdacc = new Accounts(accdata[0], accdata[1], accdata[2], accdata[3], accdata[4], accdata[5], accdata[6], accdata[7]);
accounts.Add(holdacc);
}
moreData = getCustomer(dataIn, cusdata);
}
}
if (moreData != null) dataIn.Close();
Since you're using string arrays, you can just do cusdata[index] = "#" + cusdata[index] to prepend it to the beginning of the line. However, if your question is how to delete it from the file, why not skip that step and just not write the lines you want deleted when writing the file?
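If you do go with the marker idea, the read side can simply skip soft-deleted lines. A small sketch of that (the helper name is just illustrative; feed the surviving lines into your existing parsing):
private static List<string> ReadActiveLines(string path)
{
    var active = new List<string>();
    foreach (var line in File.ReadLines(path))
    {
        if (line.StartsWith("#") || line.StartsWith("*"))
            continue; // soft-deleted record: ignore it
        active.Add(line);
    }
    return active;
}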
Here is a small read/write sample that should suit your needs. If it doesn't, let me know in the comments.
class Program
{
static readonly string filePath = "c:\\test.txt";
static void Main(string[] args)
{
// Read your file
List<string> lines = ReadLines();
//Create your remove logic here ..
lines = lines.Where(x => x.Contains("Julia Roberts") != true).ToList();
// Rewrite the file
WriteLines(lines);
}
static List<string> ReadLines()
{
List<string> lines = new List<string>();
using (StreamReader sr = new StreamReader(new FileStream(filePath, FileMode.Open)))
{
while (!sr.EndOfStream)
{
string buffer = sr.ReadLine();
lines.Add(buffer);
// Just to show you the results
Console.WriteLine(buffer);
}
}
return lines;
}
static void WriteLines(List<string> lines)
{
using (StreamWriter sw = new StreamWriter(new FileStream(filePath, FileMode.Create)))
{
foreach (var line in lines)
{
sw.WriteLine(line);
}
}
}
}
I used the following "data sample" for this
Matt Damon 100 222
Julia Roberts 125 152
Robert Downey Jr. 150 402
Tom Hanks 55 932

Finding elements from a memory mapped file in C#

I need to find certain elements within a memory mapped file. I have managed to map the file, however I get some problems finding the elements. My idea was to save all file elements into a list, and then search on that list.
How do I create a function that returns a list with all elements of the mapped file?
// Index indicates the line to read from
public List<string> GetElement(int index) {
}
The way I am mapping the file:
public void MapFile(string path)
{
string mapName = Path.GetFileName(path);
try
{
// Opening existing mmf
if (mapName != null)
{
_mmf = MemoryMappedFile.OpenExisting(mapName);
}
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception ex) {....}
try
{
// Trying to create the mmf
_mmf = MemoryMappedFile.CreateFromFile(path);
// Setting the pointer at the start of the file
_pointer = 0;
// We create the accessor to read the file
_accessor = _mmf.CreateViewAccessor();
// We mark the file as open
_open = true;
}
catch (Exception exInner){..}
}
The file that I am mapping is a UTF-8 ASCII file. Nothing weird.
What I have done:
var list = new List<string>();
// String to store what we read
string trace = string.Empty;
// We read the byte at the pointer
byte b = _accessor.ReadByte(_pointer);
int tracei = 0;
var traceb = new byte[2048];
// If b is different from 0 we have some data to read
if (b != 0)
{
while (b != 0)
{
// Check if it's an endline
if (b == '\n')
{
trace = Encoding.UTF8.GetString(traceb, 0, tracei - 1);
list.Add(trace);
trace = string.Empty;
tracei = 0;
_lastIndex++;
}
else
{
traceb[tracei++] = b;
}
// Advance and read
b = _accessor.ReadByte(++_pointer);
}
}
The code is difficult to read for humans and is not very efficient. How can I improve it?
You are re-inventing StreamReader; it does exactly what you are doing. The odds that you really want a memory-mapped file are quite low: they take a lot of virtual memory, which only pays off if you repeatedly read the same file at different offsets. That is very unlikely here, since text files must be read sequentially; you don't know how long the lines are.
Which makes this one line of code the probable best replacement for what you posted:
string[] trace = System.IO.File.ReadAllLines(path);
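For completeness, here is what that looks like spelled out with the StreamReader it is built on; a sketch (the method name is mine, not from the question), useful if you want to filter lines or stop early while reading:
public List<string> ReadAllElements(string path)
{
    var lines = new List<string>();
    using (var sr = new StreamReader(path))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
            lines.Add(line);
    }
    return lines;
}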
