Editing a line in a file by its number [duplicate] - c#

This question already has answers here:
Edit a specific Line of a Text File in C#
(6 answers)
Closed 5 years ago.
I have to write an implementation of an array that stores its values on the hard drive instead of in RAM (I know how stupid it sounds, but it's intended to teach us how different sorting algorithms perform on RAM versus a hard drive). This is what I've written so far:
class HDDArray : IEnumerable<int>
{
private string filePath;
public int this[int index]
{
get
{
using (var reader = new StreamReader(filePath))
{
string line = reader.ReadLine();
for (int i = 0; i < index; i++)
{
line = reader.ReadLine();
}
return Convert.ToInt32(line);
}
}
set
{
using (var fs = File.Open(filePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
var reader = new StreamReader(fs);
var writer = new StreamWriter(fs);
for (int i = 0; i < index; i++)
{
reader.ReadLine();
}
writer.WriteLine(value);
writer.Dispose();
}
}
}
public int Length
{
get
{
int length = 0;
using (var reader = new StreamReader(filePath))
{
while (reader.ReadLine() != null)
{
length++;
}
}
return length;
}
}
public HDDArray(string file)
{
filePath = file;
if (File.Exists(file))
File.WriteAllText(file, String.Empty);
else
File.Create(file).Dispose();
}
public IEnumerator<int> GetEnumerator()
{
using (var reader = new StreamReader(filePath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return Convert.ToInt32(line);
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
The problem I'm facing is that when trying to edit a line (in the set portion of the indexer) I end up adding a new line instead of editing the old one (it's pretty obvious why, I just can't figure out how to fix it).

Your array is designed to work with integers. Such a class is quite easy to create, because every value has a fixed length of 4 bytes.
class HDDArray : IEnumerable<int>, IDisposable
{
readonly FileStream stream;
readonly BinaryWriter writer;
readonly BinaryReader reader;
public HDDArray(string file)
{
stream = new FileStream(file, FileMode.Create, FileAccess.ReadWrite);
writer = new BinaryWriter(stream);
reader = new BinaryReader(stream);
}
public int this[int index]
{
get
{
stream.Position = index * 4;
return reader.ReadInt32();
}
set
{
stream.Position = index * 4;
writer.Write(value);
}
}
public int Length
{
get
{
return (int)stream.Length / 4;
}
}
public IEnumerator<int> GetEnumerator()
{
stream.Position = 0;
// compare positions instead of using PeekChar, which decodes bytes as
// characters and can throw on arbitrary binary data
while (stream.Position < stream.Length)
yield return reader.ReadInt32();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void Dispose()
{
reader?.Dispose();
writer?.Dispose();
stream?.Dispose();
}
}
Since the size of each array element is known, we can simply seek within the stream by changing its Position property.
BinaryWriter and BinaryReader are very convenient for writing and reading numbers.
Opening a stream is a heavy operation, so do it once, when the class is created. At the end of the work you need to clean up after yourself, which is why I implemented the IDisposable interface.
Usage:
HDDArray arr = new HDDArray("test.dat");
Console.WriteLine("Length: " + arr.Length);
for (int i = 0; i < 10; i++)
arr[i] = i;
Console.WriteLine("Length: " + arr.Length);
foreach (var n in arr)
Console.WriteLine(n);
// Console.WriteLine(arr[20]); // Exception!
arr.Dispose(); // release resources
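Since HDDArray implements IDisposable, a using block is the more idiomatic way to guarantee cleanup, even if an exception is thrown; a minimal sketch:
using (var arr = new HDDArray("test.dat"))
{
    for (int i = 0; i < 10; i++)
        arr[i] = i;
    foreach (var n in arr)
        Console.WriteLine(n);
} // Dispose is called automatically here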

I stand to be corrected, but I don't think there is an easy way to rewrite a specific line in place, so you will probably find it easier to rewrite the whole file, modifying that line.
You could change your set code as follows:
set
{
    var allLinesInFile = File.ReadAllLines(filePath);
    allLinesInFile[index] = value.ToString(); // the indexer value is an int, the lines are strings
    File.WriteAllLines(filePath, allLinesInFile);
}
It goes without saying that there should be some safety checks in there, to verify that the file exists and that index < allLinesInFile.Length; a sketch with those checks follows.
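A sketch of the setter with those checks added (whether to throw or silently return on a bad index is up to you; throwing is assumed here):
set
{
    if (!File.Exists(filePath))
        throw new FileNotFoundException("Backing file missing", filePath);
    var allLinesInFile = File.ReadAllLines(filePath);
    if (index < 0 || index >= allLinesInFile.Length)
        throw new IndexOutOfRangeException();
    allLinesInFile[index] = value.ToString();
    File.WriteAllLines(filePath, allLinesInFile);
}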

I think that for a homework exercise on sorting algorithms you needn't bother yourself with memory-size issues.
Of course, please add a check that the file exists before reading.
Note: line counting in this example starts from 0; lineToEditNmb is the zero-based number of the line to replace and lineToWrite is the replacement text.
string[] lines = File.ReadAllLines(filePath);
using (StreamWriter writer = new StreamWriter(filePath))
{
for (int currentLineNmb = 0; currentLineNmb < lines.Length; currentLineNmb++ )
{
if (currentLineNmb == lineToEditNmb)
{
writer.WriteLine(lineToWrite);
continue;
}
writer.WriteLine(lines[currentLineNmb]);
}
}
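Wrapped as a reusable helper (a sketch; ReplaceLine is a made-up name), including the file-existence check mentioned above:
static void ReplaceLine(string filePath, int lineToEditNmb, string lineToWrite)
{
    if (!File.Exists(filePath))
        throw new FileNotFoundException("File to edit was not found", filePath);
    string[] lines = File.ReadAllLines(filePath);
    using (StreamWriter writer = new StreamWriter(filePath))
    {
        for (int currentLineNmb = 0; currentLineNmb < lines.Length; currentLineNmb++)
        {
            // write the replacement text for the target line, the original text otherwise
            writer.WriteLine(currentLineNmb == lineToEditNmb ? lineToWrite : lines[currentLineNmb]);
        }
    }
}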

Related

Write and Read an Array to a Binary File

I have an array consisting of 1 string value and 2 int values, which I would like to write to a binary file.
It consists of name, index and score.
I have attached the array code below, how could I write this to a file?
Player[] playerArr = new Player[10];
int index = 0;
index = index + 1; // when a new player is added the index is increased by one
Player p = new Player(txtName3.Text, index, Convert.ToInt16(txtScore.Text)); // set the values of the object p
p.refName = txtName3.Text; // set refName to be the string value that is entered in txtName
p.refTotalScore = Convert.ToInt16(txtScore.Text);
playerArr[index] = p; // set the p object to be equal to a position inside the array
I would also like to sort the contents of the array so they are output in descending order of score. How could this be done?
The file handling code I have so far is:
private static void WriteToFile(Player[] playerArr, int size)
{
Stream sw;
BinaryFormatter bf = new BinaryFormatter();
try
{
sw = File.Open("Players.bin", FileMode.Create);
bf.Serialize(sw, playerArr[0]);
sw.Close();
sw = File.Open("Players.bin", FileMode.Append);
for (int x = 1; x < size; x++)
{
bf.Serialize(sw, playerArr[x]);
}
sw.Close();
}
catch (IOException e)
{
MessageBox.Show("" + e.Message);
}
}
private int ReadFromFile(Player[] playerArr)
{
int size = 0;
Stream sr;
try
{
sr = File.OpenRead("Players.bin");
BinaryFormatter bf = new BinaryFormatter();
try
{
while (sr.Position < sr.Length)
{
playerArr[size] = (Player)bf.Deserialize(sr);
size++;
}
sr.Close();
}
catch (SerializationException e)
{
sr.Close();
return size;
}
return size;
}
catch (IOException e)
{
MessageBox.Show("\n\n\tFile not found" + e.Message);
}
finally
{
lstLeaderboard2.Items.Add("");
}
return size;
}
For the first part, you need to mark your class as Serializable, like this:
[Serializable]
public class Player
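Based on the members used in the question, the marked-up class might look like this (the exact field shapes are an assumption):
[Serializable]
public class Player
{
    public string refName;      // player name
    public int refIndex;        // position in the array (assumed field, not shown in the question)
    public int refTotalScore;   // score
    public Player(string name, int index, int score)
    {
        refName = name;
        refIndex = index;
        refTotalScore = score;
    }
}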
It's fine to Append to a new file, so you can change your code to this:
sw = File.Open(@"C:\Players.bin", FileMode.Append);
for (int x = 0; x < size; x++)
{
bf.Serialize(sw, playerArr[x]);
}
sw.Close();
(with the appropriate exception handling, and you'll obviously need to amend this if the file might already exist).
For the second part, you can sort an array in descending order like this using LINQ:
var sortedList = playerArr.OrderByDescending(p => p.Score);
If you require an array as output, do this:
var sortedArray = playerArr.OrderByDescending(p => p.Score).ToArray();
(Here, Score is the name of the property on the Player class by which you want to sort.)
If you'd like any more help, you'll need to be more specific about the problem!
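For example, to sort by score (descending) and write the result back out in one go, a sketch (assuming refTotalScore is the score member, and skipping unused array slots):
var sorted = playerArr.Where(p => p != null)
                      .OrderByDescending(p => p.refTotalScore)
                      .ToArray();
var bf = new BinaryFormatter();
using (var sw = File.Open("Players.bin", FileMode.Create))
{
    foreach (var p in sorted)
        bf.Serialize(sw, p);
}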

protobuf-net Serialize a nested list of objects using SerializeWithLengthPrefix

I'm currently trying to serialize the following data structure using protobuf-net:
[ProtoContract]
public class Recording
{
[ProtoMember(1)]
public string Name;
[ProtoMember(2)]
public List<Channel> Channels;
}
[ProtoContract]
public class Channel
{
[ProtoMember(1)]
public string ChannelName;
[ProtoMember(2)]
public List<float> DataPoints;
}
I have a fixed number of 12 channels, however the number of datapoints per channel can get very big (up to the GB range across all channels).
Therefore (and because the data is a continuous stream) I don't want to read and save the structure for one recording all at once, but utilise SerializeWithLengthPrefix (and DeserializeItems) to save and load it continuously.
My question is, is it even possible to do this with such a nested structure or do I have to flatten it?
I've seen the examples for a list in the first hierarchy level, but none for my specific case.
Also, is there any benefit if I write the datapoints as "chunks" of 10, 100, ... (like using List<List<float>> instead of List<float>) over serializing them directly?
Thanks in advance for your help
Tobias
The key challenge in what you are trying to do is that it is heavily stream based internally to each object. protobuf-net can work in that way, but it is not trivial. There is also an issue that you want to interleave data from a single channel over multiple fragments, which is not idiomatic protobuf layout. So the core object materializer code probably doesn't do quite what you want - i.e. treat it as an open stream, not all loaded into memory, for both read and write.
That said: you could use the raw reader/writer API to achieve streaming. You should probably compare and contrast to similar code using BinaryWriter / BinaryReader (a small comparison sketch follows the example below), but essentially the following works:
using ProtoBuf;
using System;
using System.Collections.Generic;
using System.IO;
static class Program
{
static void Main()
{
var path = "big.blob";
WriteFile(path);
int channelTotal = 0, pointTotal = 0;
foreach(var channel in ReadChannels(path))
{
channelTotal++;
pointTotal += channel.Points.Count;
}
Console.WriteLine("Read: {0} points in {1} channels", pointTotal, channelTotal);
}
private static void WriteFile(string path)
{
string[] channels = {"up", "down", "top", "bottom", "charm", "strange"};
var rand = new Random(123456);
int totalPoints = 0, totalChannels = 0;
using (var encoder = new DataEncoder(path, "My file"))
{
for (int i = 0; i < 100; i++)
{
var channel = new Channel {
Name = channels[rand.Next(channels.Length)]
};
int count = rand.Next(1, 50);
var data = new List<float>(count);
for (int j = 0; j < count; j++)
data.Add((float)rand.NextDouble());
channel.Points = data;
encoder.AddChannel(channel);
totalPoints += count;
totalChannels++;
}
}
Console.WriteLine("Wrote: {0} points in {1} channels; {2} bytes", totalPoints, totalChannels, new FileInfo(path).Length);
}
public class Channel
{
public string Name { get; set; }
public List<float> Points { get; set; }
}
public class DataEncoder : IDisposable
{
private Stream stream;
private ProtoWriter writer;
public DataEncoder(string path, string recordingName)
{
stream = File.Create(path);
writer = new ProtoWriter(stream, null, null);
if (recordingName != null)
{
ProtoWriter.WriteFieldHeader(1, WireType.String, writer);
ProtoWriter.WriteString(recordingName, writer);
}
}
public void AddChannel(Channel channel)
{
ProtoWriter.WriteFieldHeader(2, WireType.StartGroup, writer);
var channelTok = ProtoWriter.StartSubItem(null, writer);
if (channel.Name != null)
{
ProtoWriter.WriteFieldHeader(1, WireType.String, writer);
ProtoWriter.WriteString(channel.Name, writer);
}
var list = channel.Points;
if (list != null)
{
switch(list.Count)
{
case 0:
// nothing to write
break;
case 1:
ProtoWriter.WriteFieldHeader(2, WireType.Fixed32, writer);
ProtoWriter.WriteSingle(list[0], writer);
break;
default:
ProtoWriter.WriteFieldHeader(2, WireType.String, writer);
var dataToken = ProtoWriter.StartSubItem(null, writer);
ProtoWriter.SetPackedField(2, writer);
foreach (var val in list)
{
ProtoWriter.WriteFieldHeader(2, WireType.Fixed32, writer);
ProtoWriter.WriteSingle(val, writer);
}
ProtoWriter.EndSubItem(dataToken, writer);
break;
}
}
ProtoWriter.EndSubItem(channelTok, writer);
}
public void Dispose()
{
using (writer) { if (writer != null) writer.Close(); }
writer = null;
using (stream) { if (stream != null) stream.Close(); }
stream = null;
}
}
private static IEnumerable<Channel> ReadChannels(string path)
{
using (var file = File.OpenRead(path))
using (var reader = new ProtoReader(file, null, null))
{
while (reader.ReadFieldHeader() > 0)
{
switch (reader.FieldNumber)
{
case 1:
Console.WriteLine("Recording name: {0}", reader.ReadString());
break;
case 2: // each "2" instance represents a different "Channel" or a channel switch
var channelToken = ProtoReader.StartSubItem(reader);
int floatCount = 0;
List<float> list = new List<float>();
Channel channel = new Channel { Points = list };
while (reader.ReadFieldHeader() > 0)
{
switch (reader.FieldNumber)
{
case 1:
channel.Name = reader.ReadString();
break;
case 2:
switch (reader.WireType)
{
case WireType.String: // packed array - multiple floats
var dataToken = ProtoReader.StartSubItem(reader);
while (ProtoReader.HasSubValue(WireType.Fixed32, reader))
{
list.Add(reader.ReadSingle());
floatCount++;
}
ProtoReader.EndSubItem(dataToken, reader);
break;
case WireType.Fixed32: // simple float
list.Add(reader.ReadSingle());
floatCount++; // got 1
break;
default:
Console.WriteLine("Unexpected data wire-type: {0}", reader.WireType);
break;
}
break;
default:
Console.WriteLine("Unexpected field in channel: {0}/{1}", reader.FieldNumber, reader.WireType);
reader.SkipField();
break;
}
}
ProtoReader.EndSubItem(channelToken, reader);
yield return channel;
break;
default:
Console.WriteLine("Unexpected field in recording: {0}/{1}", reader.FieldNumber, reader.WireType);
reader.SkipField();
break;
}
}
}
}
}
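For comparison, a minimal BinaryWriter / BinaryReader version of the same per-channel streaming (a sketch; it uses its own trivial format, not the protobuf wire format):
static void WriteChannel(BinaryWriter writer, Channel channel)
{
    writer.Write(channel.Name ?? "");    // length-prefixed string
    writer.Write(channel.Points.Count);  // point count
    foreach (var val in channel.Points)
        writer.Write(val);               // 4 bytes per float
}
static Channel ReadChannel(BinaryReader reader)
{
    var channel = new Channel { Name = reader.ReadString() };
    int count = reader.ReadInt32();
    channel.Points = new List<float>(count);
    for (int i = 0; i < count; i++)
        channel.Points.Add(reader.ReadSingle());
    return channel;
}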

How to read large file and split by "\r\n"

I have a large file (>200 MB). The file is a CSV file from an external party, but sadly I cannot just read the file line by line, as only \r\n is used to define a new line (the data itself contains lone \r and \n characters).
Currently I am reading in all the lines using this approach:
var file = File.ReadAllText(filePath, Encoding.Default);
var lines = Regex.Split(file, @"\r\n");
for (int i = 0; i < lines.Length; i++)
{
string line = lines[i];
...
}
How can I optimize this? After calling ReadAllText on my 225 MB file, the process is using more than 1 GB of RAM. Is it possible to use a streaming approach in my case, where I need to split the file using my \r\n pattern?
EDIT1:
The solutions using File.ReadLines and a StreamReader will not work, as they treat every lone \r or \n as a line break as well. I need to split the file using only my \r\n pattern. Reading the file using my code results in 758,371 lines (which is correct), whereas a normal line count results in more than 1.5 million.
SOLUTION
public static IEnumerable<string> ReadLines(string path)
{
    const string delim = "\r\n";
    using (StreamReader sr = new StreamReader(path))
    {
        StringBuilder sb = new StringBuilder();
        while (!sr.EndOfStream)
        {
            // match the delimiter character by character
            for (int i = 0; i < delim.Length; i++)
            {
                char c = (char)sr.Read();
                sb.Append(c);
                if (c != delim[i])
                    break; // not (yet) a delimiter, keep accumulating
                if (i == delim.Length - 1)
                {
                    // full delimiter matched: strip it and emit the record
                    sb.Remove(sb.Length - delim.Length, delim.Length);
                    yield return sb.ToString();
                    sb = new StringBuilder();
                    break;
                }
            }
        }
        if (sb.Length > 0)
            yield return sb.ToString();
    }
}
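Usage then mirrors File.ReadLines, yielding one \r\n-delimited record at a time:
foreach (var line in ReadLines(filePath))
{
    // process a single record here
}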
You can use File.ReadLines, which returns an IEnumerable<string> instead of loading the whole file into memory.
foreach (var line in File.ReadLines(filePath, Encoding.Default)
.Where(l => !String.IsNullOrEmpty(l)))
{
}
Using StreamReader, it is easy:
using (StreamReader sr = new StreamReader(path))
{
foreach (string line in GetLine(sr))
{
//
}
}
IEnumerable<string> GetLine(StreamReader sr)
{
while (!sr.EndOfStream)
yield return new string(GetLineChars(sr).ToArray());
}
IEnumerable<char> GetLineChars(StreamReader sr)
{
if (sr.EndOfStream)
yield break;
var c1 = sr.Read();
if (c1 == '\r')
{
var c2 = sr.Read();
if (c2 == '\n')
{
// a full \r\n pair marks the end of a record
yield break;
}
// a lone \r belongs to the data, keep both characters
yield return (char)c1;
if (c2 != -1)
yield return (char)c2;
}
else
yield return (char)c1;
}
Use StreamReader to read file line by line:
using (StreamReader sr = new StreamReader(filePath))
{
while (true)
{
string line = sr.ReadLine();
if (line == null)
break;
}
}
How about
StreamReader sr = new StreamReader(path);
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
}
Using the stream reader approach means the whole file won't get loaded into memory.
This was my lunch break :)
Set MAXREAD to the amount of data you want in memory at a time; since I'm using yield return, a foreach only ever holds one buffer. Use the code at your own risk, I've only tried it on smaller sets of data :)
Your usage would be something like:
foreach (var row in new StreamReader(FileName).SplitByChar(new char[] {'\r','\n'}))
{
// Do something awesome! :)
}
And the extension method like this:
public static class FileStreamExtensions
{
public static IEnumerable<string> SplitByChar(this StreamReader stream, char[] splitter)
{
int MAXREAD = 1024 * 1024;
var chars = new List<char>(MAXREAD);
var bytes = new char[MAXREAD];
var lastStop = 0;
var read = 0;
while (!stream.EndOfStream)
{
read = stream.Read(bytes, 0, MAXREAD);
lastStop = 0;
for (int i = 0; i < read; i++)
{
if (bytes[i] == splitter[0])
{
var assume = true;
for (int p = 1; p < splitter.Length; p++)
{
assume &= splitter[p] == bytes[i + p];
}
if (assume)
{
chars.AddRange(bytes.Skip(lastStop).Take(i - lastStop));
var res = new String(chars.ToArray());
chars.Clear();
yield return res;
i += splitter.Length - 1;
lastStop = i + 1;
}
}
}
chars.AddRange(bytes.Skip(lastStop).Take(read - lastStop)); // carry the tail after the last delimiter over to the next buffer
}
if (chars.Count > 0)
yield return new String(chars.ToArray());
}
}

Most efficient way for reading IMDB movies list

I am reading the IMDB movies listing from a text file on my hard drive (originally available from the IMDB site at ftp://ftp.fu-berlin.de/pub/misc/movies/database/movies.list.gz).
It takes around 5 minutes on my machine (basic info: Win7 x64, 16 GB RAM, 500 GB SATA hard disk at 7200 RPM) to read this file line by line using the code below.
I have two questions:
Is there any way I can optimize code to improve the read time?
Data access doesn't need to be sequential, as I won't mind reading data from top to bottom, bottom to top, or any order for that matter, as long as it reads one line at a time. I am wondering whether there is a way to read in multiple directions to improve the read time?
The application is a Windows Console Application.
Update: Many responses correctly pointed out that writing to the console takes substantial time. Consider the displaying of data on the Windows console desirable but not mandatory.
//Code Block
string file = @"D:\movies.list";
FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None, 8, FileOptions.None);
using (StreamReader sr = new StreamReader(fs))
{
while (sr.Peek() >= 0)
{
Console.WriteLine(sr.ReadLine());
}
}
I'm not certain whether this is more efficient or not, but an alternate method would be to use File.ReadAllLines:
var movieFile = File.ReadAllLines(file);
foreach (var movie in movieFile)
Console.WriteLine(movie);
I am not a C# developer, but how about doing a one-time bulk insert of the file into a database? Then you can reuse the data and export it as well.
In .NET 4 you can use File.ReadLines for lazy evaluation, and thus lower RAM usage, when working on large files.
You can do LINQ operations directly on files, and this, along with File.ReadLines, would improve load time; see the sketch at the end of this answer.
For a better understanding you can check Read text file word-by-word using LINQ.
You can also do comparisons by adding time intervals.
However, if you are making a web app, you can read the whole file on the application start event and cache it in the application pool for better performance.
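A small sketch of the lazy File.ReadLines approach (the path is illustrative):
// streams the file; only one line is held in memory at a time
foreach (var line in File.ReadLines(@"D:\movies.list"))
{
    // process the line
}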
First of all, if you don't care about printing out the list to console, please edit your question.
Second, I created a timing program to test the speeds of the different methods suggested:
class Program
{
private static readonly string file = @"movies.list";
private static readonly int testStart = 1;
private static readonly int numOfTests = 7; // one slot per test below
private static readonly int MinTimingVal = 1000;
private static string[] testNames = new string[] {
"Naive",
"OneCallToWrite",
"SomeCallsToWrite",
"InParallel",
"InParallelBlcoks",
"IceManMinds",
"TestTiming"
};
private static double[] avgSecs = new double[numOfTests];
private static int[] testIterations = new int[numOfTests];
public static void Main(string[] args)
{
Console.WriteLine("Starting tests...");
Debug.WriteLine("Starting tests...");
Console.WriteLine("");
Debug.WriteLine("");
//*****************************
//The console is the bottle-neck, so we can
//speed-up redrawing it by only showing 1 line at a time.
Console.WindowHeight = 1;
Console.WindowWidth = 50;
Console.BufferHeight = 100;
Console.BufferWidth = 50;
//******************************
Action[] actionArray = new Action[numOfTests];
actionArray[0] = naive;
actionArray[1] = oneCallToWrite;
actionArray[2] = someCallsToWrite;
actionArray[3] = inParallel;
actionArray[4] = inParallelBlocks;
actionArray[5] = iceManMinds;
actionArray[6] = testTiming;
for (int i = testStart; i < actionArray.Length; i++)
{
Action a = actionArray[i];
DoTiming(a, i);
}
printResults();
Console.WriteLine("");
Debug.WriteLine("");
Console.WriteLine("Tests complete.");
Debug.WriteLine("Tests complete.");
Console.WriteLine("Press Enter to Close Console...");
Debug.WriteLine("Press Enter to Close Console...");
Console.ReadLine();
}
private static void DoTiming(Action a, int num)
{
a.Invoke();
Stopwatch watch = new Stopwatch();
Stopwatch loopWatch = new Stopwatch();
bool shouldRetry = false;
int numOfIterations = 2;
do
{
watch.Start();
for (int i = 0; i < numOfIterations; i++)
{
a.Invoke();
}
watch.Stop();
shouldRetry = false;
if (watch.ElapsedMilliseconds < MinTimingVal) //if the time was less than the minimum, increase load and re-time.
{
shouldRetry = true;
numOfIterations *= 2;
watch.Reset();
}
} while (shouldRetry);
long totalTime = watch.ElapsedMilliseconds;
double avgTime = ((double)totalTime) / (double)numOfIterations;
avgSecs[num] = avgTime / 1000.00;
testIterations[num] = numOfIterations;
}
private static void printResults()
{
Console.WriteLine("");
Debug.WriteLine("");
for (int i = testStart; i < numOfTests; i++)
{
TimeSpan t = TimeSpan.FromSeconds(avgSecs[i]);
Console.WriteLine("ElapsedTime: {0:N4}, " + "test: " + testNames[i], t.ToString() );
Debug.WriteLine("ElapsedTime: {0:N4}, " + "test: " + testNames[i], t.ToString() );
}
}
public static void naive()
{
FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None, 8, FileOptions.None);
using (StreamReader sr = new StreamReader(fs))
{
while (sr.Peek() >= 0)
{
Console.WriteLine( sr.ReadLine() );
}
}
}
public static void oneCallToWrite()
{
FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None, 8, FileOptions.None);
using (StreamReader sr = new StreamReader(fs))
{
StringBuilder sb = new StringBuilder();
while (sr.Peek() >= 0)
{
string s = sr.ReadLine();
sb.Append("\n" + s);
}
Console.Write(sb);
}
}
public static void someCallsToWrite()
{
FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None, 8, FileOptions.None);
using (StreamReader sr = new StreamReader(fs))
{
StringBuilder sb = new StringBuilder();
int count = 0;
int mod = 10000;
while (sr.Peek() >= 0)
{
count++;
string s = sr.ReadLine();
sb.Append("\n" + s);
if (count % mod == 0)
{
Console.Write(sb);
sb = new StringBuilder();
}
}
Console.Write( sb );
}
}
public static void inParallel()
{
string[] wordsFromFile = File.ReadAllLines( file );
int length = wordsFromFile.Length;
Parallel.For( 0, length, i => {
Console.WriteLine( wordsFromFile[i] );
});
}
public static void inParallelBlocks()
{
string[] wordsFromFile = File.ReadAllLines(file);
int length = wordsFromFile.Length;
Parallel.For<StringBuilder>(0, length,
() => { return new StringBuilder(); },
(i, loopState, sb) =>
{
sb.Append("\n" + wordsFromFile[i]);
return sb;
},
(x) => { Console.Write(x); }
);
}
#region iceManMinds
public static void iceManMinds()
{
string FileName = file;
long ThreadReadBlockSize = 50000;
int NumberOfThreads = 4;
byte[] _inputString;
var fi = new FileInfo(FileName);
long totalBytesRead = 0;
long fileLength = fi.Length;
long readPosition = 0L;
Console.WriteLine("Reading Lines From {0}", FileName);
var threads = new Thread[NumberOfThreads];
var instances = new ReadThread[NumberOfThreads];
_inputString = new byte[fileLength];
while (totalBytesRead < fileLength)
{
for (int i = 0; i < NumberOfThreads; i++)
{
var rt = new ReadThread { StartPosition = readPosition, BlockSize = ThreadReadBlockSize };
instances[i] = rt;
threads[i] = new Thread(rt.Read);
threads[i].Start();
readPosition += ThreadReadBlockSize;
}
for (int i = 0; i < NumberOfThreads; i++)
{
threads[i].Join();
}
for (int i = 0; i < NumberOfThreads; i++)
{
if (instances[i].BlockSize > 0)
{
Array.Copy(instances[i].Output, 0L, _inputString, instances[i].StartPosition,
instances[i].BlockSize);
totalBytesRead += instances[i].BlockSize;
}
}
}
string finalString = Encoding.ASCII.GetString(_inputString);
Console.WriteLine(finalString);//.Substring(104250000, 50000));
}
private class ReadThread
{
public long StartPosition { get; set; }
public long BlockSize { get; set; }
public byte[] Output { get; private set; }
public void Read()
{
Output = new byte[BlockSize];
var inStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
inStream.Seek(StartPosition, SeekOrigin.Begin);
BlockSize = inStream.Read(Output, 0, (int)BlockSize);
inStream.Close();
}
}
#endregion
public static void testTiming()
{
Thread.Sleep(500);
}
}
Each of these tests prints the file out to the console.
When run under default console settings, each test took between 5:30 and 6:10 (min:sec).
After considering the Console properties, I set Console.WindowHeight = 1, so that only 1 line is shown at a time (you can scroll up and down to see the most recent 100 lines), and achieved a speed-up.
Currently, the task completes in just a little over 2:40 (Min:Sec) for most methods.
Try it out on your computer and see how it works for you.
Interestingly enough, the different methods were basically equivalent, with the OP's code being basically the fastest.
The timing code warms up the code, then runs it twice and averages the time it takes; it does this for each method.
Feel free to try out your own methods and time them.
The answer to this question really depends on what it is you will be doing with the data. If your intention truly is to just read in the file and dump the contents to the console screen, then it would be better to use the StringBuilder class to build up a string of, say, 1000 lines, dump the contents to the screen, reset the string, read in another 1000 lines, dump them, and so on.
However, if you are trying to build something that is part of a larger project and you are using .NET 4.0, you can use the MemoryMappedFile class to read the file and call CreateViewAccessor to create a "window" that operates on just a portion of the data instead of reading in the entire file; a small sketch appears below.
Another option would be to make threads that read different parts of the file all at once, then put it all together in the end.
If you can be more specific as to what you plan to do with this data, I can help you more. Hope this helps!
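A minimal MemoryMappedFile sketch of that "window" idea (the offset and window size are illustrative):
using System.IO.MemoryMappedFiles;
...
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\Users\Public\movies.list", FileMode.Open))
using (var view = mmf.CreateViewStream(0, 64 * 1024)) // a 64 KB window starting at offset 0
using (var reader = new StreamReader(view, Encoding.ASCII))
{
    Console.WriteLine(reader.ReadToEnd()); // note: the window may end mid-line
}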
EDIT:
Try this code out, man. I was able to read the whole list in literally 3 seconds using threads:
using System;
using System.IO;
using System.Text;
using System.Threading;
namespace ConsoleApplication36
{
class Program
{
private const string FileName = @"C:\Users\Public\movies.list";
private const long ThreadReadBlockSize = 50000;
private const int NumberOfThreads = 4;
private static byte[] _inputString;
static void Main(string[] args)
{
var fi = new FileInfo(FileName);
long totalBytesRead = 0;
long fileLength = fi.Length;
long readPosition = 0L;
Console.WriteLine("Reading Lines From {0}", FileName);
var threads = new Thread[NumberOfThreads];
var instances = new ReadThread[NumberOfThreads];
_inputString = new byte[fileLength];
while (totalBytesRead < fileLength)
{
for (int i = 0; i < NumberOfThreads; i++)
{
var rt = new ReadThread { StartPosition = readPosition, BlockSize = ThreadReadBlockSize };
instances[i] = rt;
threads[i] = new Thread(rt.Read);
threads[i].Start();
readPosition += ThreadReadBlockSize;
}
for (int i = 0; i < NumberOfThreads; i++)
{
threads[i].Join();
}
for (int i = 0; i < NumberOfThreads; i++)
{
if (instances[i].BlockSize > 0)
{
Array.Copy(instances[i].Output, 0L, _inputString, instances[i].StartPosition,
instances[i].BlockSize);
totalBytesRead += instances[i].BlockSize;
}
}
}
string finalString = Encoding.ASCII.GetString(_inputString);
Console.WriteLine(finalString.Substring(104250000, 50000));
}
private class ReadThread
{
public long StartPosition { get; set; }
public long BlockSize { get; set; }
public byte[] Output { get; private set; }
public void Read()
{
Output = new byte[BlockSize];
var inStream = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
inStream.Seek(StartPosition, SeekOrigin.Begin);
BlockSize = inStream.Read(Output, 0, (int)BlockSize);
inStream.Close();
}
}
}
}
You will need to change the FileName to match the location of your movies.list file. Also, you can adjust the total number of threads. I used 4, but you can decrease or increase this at will. You can also change the block size, which is how much data each thread reads in. Also, I'm assuming it's an ASCII text file. If it's not, you need to change the encoding type to UTF8 or whatever encoding the file is in. Good luck!

Read double value from a file C#

I have a txt file that the format is:
0.32423 1.3453 3.23423
0.12332 3.1231 9.23432432
9.234324234 -1.23432 12.23432
...
Each line has three double values. There are more than 10000 lines in this file. I can use StreamReader.ReadLine and String.Split, then convert the values.
I want to know is there any faster method to do it.
Best Regards,
StreamReader.ReadLine, String.Split and Double.TryParse sound like a good solution here.
No need for improvement.
There may be some little micro-optimisations you can perform, but the way you've suggested sounds about as simple as you'll get.
10000 lines shouldn't take very long - have you tried it and found you've actually got a performance problem? For example, here are two short programs - one creates a 10,000 line file and the other reads it:
CreateFile.cs:
using System;
using System.IO;
public class Test
{
static void Main()
{
Random rng = new Random();
using (TextWriter writer = File.CreateText("test.txt"))
{
for (int i = 0; i < 10000; i++)
{
writer.WriteLine("{0} {1} {2}", rng.NextDouble(),
rng.NextDouble(), rng.NextDouble());
}
}
}
}
ReadFile.cs:
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
public class Test
{
static void Main()
{
Stopwatch sw = Stopwatch.StartNew();
using (TextReader reader = File.OpenText("test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] bits = line.Split(' ');
foreach (string bit in bits)
{
double value;
if (!double.TryParse(bit, out value))
{
Console.WriteLine("Bad value");
}
}
}
}
sw.Stop();
Console.WriteLine("Total time: {0}ms",
sw.ElapsedMilliseconds);
}
}
On my netbook (which admittedly has an SSD in it) it only takes 82ms to read the file. I would suggest that's probably not a problem :)
I would suggest reading all your lines at once with
string[] lines = System.IO.File.ReadAllLines(fileName);
This would ensure that the I/O is done with maximum efficiency. You would have to measure (profile), but I would expect the conversions to take far less time.
Your method is already good!
You can improve it by writing a ReadLine-style function that returns an array of doubles, so you can reuse the function in other programs.
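One possible shape for such a helper (a sketch; the name ReadDoubles is made up):
static double[] ReadDoubles(TextReader reader)
{
    string line = reader.ReadLine();
    if (line == null)
        return null; // end of file
    string[] bits = line.Split(' ');
    var values = new double[bits.Length];
    for (int i = 0; i < bits.Length; i++)
        values[i] = double.Parse(bits[i]);
    return values;
}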
This solution is a little bit slower (see the benchmarks at the end), but it's nicer to read. It should also be more memory efficient, because only the current character is buffered at a time (instead of the whole file or line).
Reading arrays is an additional feature of this reader, which assumes that the size of the array always comes first as an int value.
IParsable is another feature that makes it easy to implement Parse methods for various types.
class StringStreamReader {
private StreamReader sr;
public StringStreamReader(StreamReader sr) {
this.sr = sr;
this.Separator = ' ';
}
private StringBuilder sb = new StringBuilder();
public string ReadWord() {
eol = false;
sb.Clear();
char c;
while (!sr.EndOfStream) {
c = (char)sr.Read();
if (c == Separator) break;
if (IsNewLine(c)) {
eol = true;
char nextch = (char)sr.Peek();
while (IsNewLine(nextch)) {
sr.Read(); // consume all newlines
nextch = (char)sr.Peek();
}
break;
}
sb.Append(c);
}
return sb.ToString();
}
private bool IsNewLine(char c) {
return c == '\r' || c == '\n';
}
public int ReadInt() {
return int.Parse(ReadWord());
}
public double ReadDouble() {
return double.Parse(ReadWord());
}
public bool EOF {
get { return sr.EndOfStream; }
}
public char Separator { get; set; }
bool eol;
public bool EOL {
get { return eol || sr.EndOfStream; }
}
public T ReadObject<T>() where T : IParsable, new() {
var obj = new T();
obj.Parse(this);
return obj;
}
public int[] ReadIntArray() {
int size = ReadInt();
var a = new int[size];
for (int i = 0; i < size; i++) {
a[i] = ReadInt();
}
return a;
}
public double[] ReadDoubleArray() {
int size = ReadInt();
var a = new double[size];
for (int i = 0; i < size; i++) {
a[i] = ReadDouble();
}
return a;
}
public T[] ReadObjectArray<T>() where T : IParsable, new() {
int size = ReadInt();
var a = new T[size];
for (int i = 0; i < size; i++) {
a[i] = ReadObject<T>();
}
return a;
}
internal void NextLine() {
eol = false;
}
}
interface IParsable {
void Parse(StringStreamReader r);
}
It can be used like this:
public void Parse(StringStreamReader r) {
double x = r.ReadDouble();
int y = r.ReadInt();
string z = r.ReadWord();
double[] arr = r.ReadDoubleArray();
MyParsableObject o = r.ReadObject<MyParsableObject>();
MyParsableObject [] oarr = r.ReadObjectArray<MyParsableObject>();
}
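MyParsableObject is hypothetical; implementing IParsable just means consuming the reader in field order, e.g.:
class MyParsableObject : IParsable
{
    public double X;
    public int Y;
    public void Parse(StringStreamReader r)
    {
        X = r.ReadDouble();
        Y = r.ReadInt();
    }
}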
I did some benchmarking, comparing StringStreamReader with some of the other approaches already proposed (StreamReader.ReadLine and File.ReadAllLines). Here are the methods I used for benchmarking:
private static void Test_StringStreamReader(string filename) {
var sw = new Stopwatch();
sw.Start();
using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
var r = new StringStreamReader(sr);
r.Separator = ' ';
while (!r.EOF) {
var dbls = new List<double>();
while (!r.EOF) {
dbls.Add(r.ReadDouble());
}
}
}
sw.Stop();
Console.WriteLine("elapsed: {0}", sw.Elapsed);
}
private static void Test_ReadLine(string filename) {
var sw = new Stopwatch();
sw.Start();
using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
var dbls = new List<double>();
while (!sr.EndOfStream) {
string line = sr.ReadLine();
string[] bits = line.Split(' ');
foreach(string bit in bits) {
dbls.Add(double.Parse(bit));
}
}
}
sw.Stop();
Console.WriteLine("elapsed: {0}", sw.Elapsed);
}
private static void Test_ReadAllLines(string filename) {
var sw = new Stopwatch();
sw.Start();
string[] lines = System.IO.File.ReadAllLines(filename);
var dbls = new List<double>();
foreach(var line in lines) {
string[] bits = line.Split(' ');
foreach (string bit in bits) {
dbls.Add(double.Parse(bit));
}
}
sw.Stop();
Console.WriteLine("Test_ReadAllLines: {0}", sw.Elapsed);
}
I used a file with 1,000,000 lines of double values (3 values per line). The file is located on an SSD disk and each test was repeated multiple times in release mode. These are the results (on average):
Test_StringStreamReader: 00:00:01.1980975
Test_ReadLine: 00:00:00.9117553
Test_ReadAllLines: 00:00:01.1362452
So, as mentioned, StringStreamReader is a bit slower than the other approaches. For 10,000 lines, the performance is around 120 ms / 95 ms / 100 ms respectively.
