Producer and consumer problem in operating systems - C#

Producer-consumer problem in C#, from the book Operating System Concepts by Abraham Silberschatz.
I have written this pseudocode in C#, but I get the warning "Unreachable code detected" on line 43. I am new to programming; a little guidance is needed to solve this problem!
pseudo-code given in book:
#define BUFFER_SIZE 5
typedef struct {
. . .
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
Producer:
item next_produced;
while (true) {
/* produce an item in next produced */
while (((in + 1) % BUFFER_SIZE) == out)
; /* do nothing */
buffer[in] = next_produced;
in = (in + 1) % BUFFER_SIZE;
}
Consumer:
item next_consumed;
while (true) {
while (in == out)
; /* do nothing */
next_consumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
/* consume the item in next consumed */
}
My Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace producer_consumer_problem_Csharp
{
struct item
{
private int iData;
public item(int x)
{
this.iData = x;
}
};
class Program
{
static void Main(string[] args)
{
int Buffer_Size = 5;
item[] buffer = new item[Buffer_Size];
int inp = 0;
int outp = 0;
item next_produced;
while(true)
{
Console.WriteLine("Enter the next produced item");
int x = Convert.ToInt32( Console.ReadLine() );
next_produced = new item(x);
while ((inp + 1) % Buffer_Size == outp) ;
// do nothing
buffer[inp] = next_produced;
inp = (inp + 1) % Buffer_Size;
}
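// Everything below this point is unreachable: the while (true) loop above never exits,
// which is what the compiler warning is pointing at.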
item next_consumed = new item();
while (true)
{
while (inp == outp);
/*donothing*/
next_consumed = buffer[outp];
outp = (outp +1) % Buffer_Size; /* consume the item in next consumed */
Console.WriteLine("Next consuumed item is: {0} ", next_consumed);
}
}
}
}

As Soner mentioned, the first infinite loop prevents the second one from running. To get your "consumer-producer" pair to work, you have to run both parts in concurrent threads. Here's an example:
using System;
using System.Threading;
struct item
{
private int iData;
public item(int x)
{
this.iData = x;
}
public override string ToString()
{
return iData.ToString();
}
};
class Program
{
static void Main(string[] args)
{
int Buffer_Size = 5;
item[] buffer = new item[Buffer_Size];
int inp = 0;
int outp = 0;
var console_lock = new object();
new Thread(() =>
{
item next_produced;
while (true)
{
lock (console_lock)
{
Console.WriteLine("Enter the next produced item");
int x = Convert.ToInt32(Console.ReadLine());
next_produced = new item(x);
}
while ((inp + 1) % Buffer_Size == outp) ;
// do nothing
buffer[inp] = next_produced;
inp = (inp + 1) % Buffer_Size;
}
}).Start();
new Thread(() =>
{
item next_consumed = new item();
while (true)
{
while (inp == outp) ;
/*donothing*/
next_consumed = buffer[outp];
outp = (outp + 1) % Buffer_Size; /* consume the item in next consumed */
lock (console_lock) Console.WriteLine("Next consumed item is: {0} ", next_consumed);
}
}).Start();
Thread.Sleep(Timeout.Infinite);
}
}
Please keep in mind that this is a very inefficient implementation and should be considered just a primitive example of the "producer-consumer" pattern, not a guide. This code will needlessly burn a lot of your computing power in the /*donothing*/ loops, which are essentially busy-wait spins.
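For reference, here is a minimal sketch of the same bounded buffer built on BlockingCollection&lt;T&gt; (System.Collections.Concurrent, .NET 4 and later), which blocks instead of spinning; the capacity of 5 mirrors BUFFER_SIZE from the book:
using System;
using System.Collections.Concurrent;
using System.Threading;
class BoundedBufferExample
{
    static void Main()
    {
        // Add blocks while the buffer already holds 5 items;
        // GetConsumingEnumerable blocks while it is empty.
        var buffer = new BlockingCollection<int>(boundedCapacity: 5);

        new Thread(() =>
        {
            foreach (var item in buffer.GetConsumingEnumerable())
                Console.WriteLine("Next consumed item is: {0}", item);
        }).Start();

        while (true)
        {
            Console.WriteLine("Enter the next produced item");
            buffer.Add(Convert.ToInt32(Console.ReadLine()));
        }
    }
}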

Related

Producer/consumer doesn't generate expected results

I've written this producer/consumer code, which should generate a big file filled with random data:
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
Random random = new Random();
String filename = @"d:\test_out";
long numlines = 1000000;
var buffer = new BlockingCollection<string[]>(10); //limit to not get OOM.
int arrSize = 100; //size of each string chunk in buffer;
String[] block = new string[arrSize];
Task producer = Task.Factory.StartNew(() =>
{
long blockNum = 0;
long lineStopped = 0;
for (long i = 0; i < numlines; i++)
{
if (blockNum == arrSize)
{
buffer.Add(block);
blockNum = 0;
lineStopped = i;
}
block[blockNum] = random.Next().ToString();
//null is sign to stop if last block is not fully filled
if (blockNum < arrSize - 1)
{
block[blockNum + 1] = null;
}
blockNum++;
}
if (lineStopped < numlines)
{
buffer.Add(block);
}
buffer.CompleteAdding();
}, TaskCreationOptions.LongRunning);
Task consumer = Task.Factory.StartNew(() =>
{
using (var outputFile = new StreamWriter(filename))
{
foreach (string[] chunk in buffer.GetConsumingEnumerable())
{
foreach (string value in chunk)
{
if (value == null) break;
outputFile.WriteLine(value);
}
}
}
}, TaskCreationOptions.LongRunning);
Task.WaitAll(producer, consumer);
}
}
And it does what it is intended to do. But for some unknown reason it produces only ~550000 strings, not 1000000, and I cannot understand why this is happening.
Can someone point on my mistake? I really don't get what's wrong with this code.
The buffer
String[] block = new string[arrSize];
is declared outside the Lambda. That means it is captured and re-used.
That would normally go unnoticed (you would just write out the wrong random data) but because your if (blockNum < arrSize - 1) is placed inside the for loop you regularly write a null into the shared buffer.
Exercise: instead of
block[blockNum] = random.Next().ToString();
use
block[blockNum] = i.ToString();
and predict and verify the results.
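One possible fix, as a minimal sketch of just the producer loop (the BlockingCollection setup and the consumer stay as in the question): allocate a fresh array after each Add, so the consumer never reads a block the producer is still rewriting:
// Producer body only; buffer, random, numlines and arrSize are the question's variables.
string[] block = new string[arrSize];
long blockNum = 0;
for (long i = 0; i < numlines; i++)
{
    block[blockNum++] = random.Next().ToString();
    if (blockNum == arrSize)
    {
        buffer.Add(block);
        block = new string[arrSize]; // hand the old array off; never touch it again
        blockNum = 0;
    }
}
if (blockNum > 0)
    buffer.Add(block); // partial last block: the unfilled slots are already null, the consumer's stop marker
buffer.CompleteAdding();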

Task process ending before finishing all the work

I've been having trouble running multiple tasks with heavy operations.
It seems as if the task processes are killed before all the operations are complete.
The code here is example code I used to replicate the issue. If I add something like Debug.Write(), the added wait for writing fixes the issue. The issue is also gone if I test on a smaller sample size. The reason there is a class in the example below is to create complexity for the test.
The real case where I first encountered the issue is too complicated to explain in a post here.
public static class StaticRandom
{
static int seed = Environment.TickCount;
static readonly ThreadLocal<Random> random =
new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seed)));
public static int Next()
{
return random.Value.Next();
}
public static int Next(int maxValue)
{
return random.Value.Next(maxValue);
}
public static double NextDouble()
{
return random.Value.NextDouble();
}
}
// this is the test function I run to recreate the problem:
static void tasktest()
{
var testlist = new List<ExampleClass>();
for (var index = 0; index < 10000; ++index)
{
var newClass = new ExampleClass();
newClass.Populate(Enumerable.Range(0, 1000).ToList());
testlist.Add(newClass);
}
var anotherClassList = new List<ExampleClass>();
var threadNumber = 5;
if (threadNumber > testlist.Count)
{
threadNumber = testlist.Count;
}
var taskList = new List<Task>();
var tokenSource = new CancellationTokenSource();
CancellationToken cancellationToken = tokenSource.Token;
int stuffPerThread = testlist.Count / threadNumber;
var stuffCounter = 0;
for (var count = 1; count <= threadNumber; ++count)
{
var toSkip = stuffCounter;
var threadWorkLoad = stuffPerThread;
var currentIndex = count;
// these ifs make sure all the indexes are covered
if (stuffCounter + threadWorkLoad > testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
else if (count == threadNumber && stuffCounter + threadWorkLoad < testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
taskList.Add(Task.Factory.StartNew(() => taskfunc(testlist, anotherClassList, toSkip, threadWorkLoad),
cancellationToken, TaskCreationOptions.None, TaskScheduler.Default));
stuffCounter += stuffPerThread;
}
Task.WaitAll(taskList.ToArray());
}
public class ExampleClass
{
public ExampleClassInner[] Inners { get; set; }
public ExampleClass()
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner();
}
}
public void Populate(List<int> intlist) {/*adds random ints to the inner class*/}
public ExampleClass(ExampleClass copyFrom)
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner(copyFrom.Inners[index]);
}
}
public class ExampleClassInner
{
public bool SomeBool { get; set; } = false;
public int SomeInt { get; set; } = -1;
public ExampleClassInner()
{
}
public ExampleClassInner(ExampleClassInner copyFrom)
{
SomeBool = copyFrom.SomeBool;
SomeInt = copyFrom.SomeInt;
}
}
}
static int expensivefunc(int theint)
{
/*a lot of pointless arithmetic and loops done only on primitives and with primitives,
just to increase the complexity*/
theint *= theint + 1;
var anotherlist = Enumerable.Range(0, 10000).ToList();
for (var index = 0; index < anotherlist.Count; ++index)
{
theint += index;
if (theint % 5 == 0)
{
theint *= index / 2;
}
}
var yetanotherlist = Enumerable.Range(0, 50000).ToList();
for (var index = 0; index < yetanotherlist.Count; ++index)
{
theint += index;
if (theint % 7 == 0)
{
theint -= index / 3;
}
}
while (theint > 8)
{
theint /= 2;
}
return theint;
}
// this function is intentionally creating a lot of objects, to simulate complexity
static void taskfunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
if (take == 0)
{
take = intlist.Count;
}
var partial = intlist.Skip(skip).Take(take).ToList();
for (var index = 0; index < partial.Count; ++index)
{
var testint = expensivefunc(index);
var newClass = new ExampleClass(partial[index]);
newClass.Inners[StaticRandom.Next(5)].SomeInt = testint;
anotherClassList.Add(new ExampleClass(newClass));
}
}
The expected result is that the list anotherClassList will be the same size as testlist and this happens when the lists are smaller or the complexity of the task operations is smaller. However, when I increase the volume of operations, the anotherClassList has a few indexes missing and sometimes some of the indexes in the list are null objects.
Example result: (screenshot omitted; the output list ends up with missing entries and some null objects)
Why does this happen when I have Task.WaitAll?
Your problem is that it's just not thread-safe; you can't add to a List&lt;T&gt; from multiple threads and expect it to play nice.
One way is to use lock or a thread-safe collection, but I feel this all should be refactored (my OCD is going off all over the place).
private static object _sync = new object();
...
private static void TaskFunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
...
var partial = intlist.Skip(skip).Take(take).ToList();
...
// note that locking here will likely drastically decrease any performance threading gain
lock (_sync)
{
for (var index = 0; index < partial.Count; ++index)
{
// this is your problem, you are adding to a list from multiple threads
anotherClassList.Add(...);
}
}
}
In short, I think you need to think more carefully about the threading logic of your method: identify what you are trying to achieve, and how to make it conceptually thread-safe (while keeping your performance gains).
After TheGeneral enlightened me that Lists are not thread-safe, I changed the List I was adding to in the threads to an Array type, and this fixed my issue.
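For reference, here is a minimal sketch of the concurrent-collection route, assuming the rest of the question's code stays unchanged: swap the shared List&lt;ExampleClass&gt; for a ConcurrentBag&lt;ExampleClass&gt;, whose Add is safe from any number of threads (a bag does not preserve ordering, which matches the unordered adds here):
using System.Collections.Concurrent;

// In tasktest(): var anotherClassList = new ConcurrentBag<ExampleClass>();
static void taskfunc(List<ExampleClass> intlist, ConcurrentBag<ExampleClass> anotherClassList, int skip, int take)
{
    var partial = intlist.Skip(skip).Take(take).ToList();
    for (var index = 0; index < partial.Count; ++index)
    {
        var testint = expensivefunc(index);
        var newClass = new ExampleClass(partial[index]);
        newClass.Inners[StaticRandom.Next(5)].SomeInt = testint;
        anotherClassList.Add(new ExampleClass(newClass)); // thread-safe; no lock required
    }
}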

C# How to run an application faster

I am creating a word list of possible passwords to prove how insecure 8-character passwords are. This code will write aaaaaaaa, then aaaaaaab, then aaaaaaac, and so on until zzzzzzzz:
using System;
using System.IO;
using System.Linq;
class Program
{
static string path;
static int file = 0;
static void Main(string[] args)
{
new_file();
var alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var q = alphabet.Select(x => x.ToString());
int size = 3;
int counter = 0;
for (int i = 0; i < size - 1; i++)
{
q = q.SelectMany(x => alphabet, (x, y) => x + y);
}
foreach (var item in q)
{
if (counter >= 20000000)
{
new_file();
counter = 0;
}
if (File.Exists(path))
{
using (StreamWriter sw = File.AppendText(path))
{
sw.WriteLine(item);
Console.WriteLine(item);
/*if (!(Regex.IsMatch(item, @"(.)\1")))
{
sw.WriteLine(item);
counter++;
}
else
{
Console.WriteLine(item);
}*/
}
}
else
{
new_file();
}
}
}
static void new_file()
{
path = @"C:\" + "list" + file + ".txt";
if (!File.Exists(path))
{
using (StreamWriter sw = File.CreateText(path))
{
}
}
file++;
}
}
The code is working fine, but it takes weeks to run. Does anyone know a way to speed it up, or do I have to wait? If anyone has an idea, please tell me.
Performance:
size 3: 0.02s
size 4: 1.61s
size 5: 144.76s
Hints:
removed LINQ for combination generation
removed Console.WriteLine for each password
removed StreamWriter
large buffer (128k) for file writing
const string alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var byteAlphabet = alphabet.Select(ch => (byte)ch).ToArray();
var alphabetLength = alphabet.Length;
var newLine = new[] { (byte)'\r', (byte)'\n' };
const int size = 4;
var number = new byte[size];
var password = Enumerable.Range(0, size).Select(i => byteAlphabet[0]).Concat(newLine).ToArray();
var watcher = new System.Diagnostics.Stopwatch();
watcher.Start();
var isRunning = true;
for (var counter = 0; isRunning; counter++)
{
Console.Write("{0}: ", counter);
Console.Write(password.Select(b => (char)b).ToArray());
using (var file = System.IO.File.Create(string.Format(@"list.{0:D5}.txt", counter), 2 << 16))
{
for (var i = 0; i < 2000000; ++i)
{
file.Write(password, 0, password.Length);
var j = size - 1;
for (; j >= 0; j--)
{
if (number[j] < alphabetLength - 1)
{
password[j] = byteAlphabet[++number[j]];
break;
}
else
{
number[j] = 0;
password[j] = byteAlphabet[0];
}
}
if (j < 0)
{
isRunning = false;
break;
}
}
}
}
watcher.Stop();
Console.WriteLine(watcher.Elapsed);
}
Try the following modified code. In LINQPad it runs in under 1 second; with your original code I gave up after 40 seconds. It removes the overhead of opening and closing the file for every WriteLine operation. You'll need to test it and ensure it gives the same results, because I'm not willing to run your original code for 24 hours to verify the output.
using System;
using System.IO;
using System.Linq;
class Program
{
static string path;
static int file = 0;
static void Main(string[] args)
{
new_file();
var alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var q = alphabet.Select(x => x.ToString());
int size = 3;
int counter = 0;
for (int i = 0; i < size - 1; i++)
{
q = q.SelectMany(x => alphabet, (x, y) => x + y);
}
StreamWriter sw = File.AppendText(path);
try
{
foreach (var item in q)
{
if (counter >= 20000000)
{
sw.Dispose();
new_file();
counter = 0;
}
sw.WriteLine(item);
Console.WriteLine(item);
}
}
finally
{
if(sw != null)
{
sw.Dispose();
}
}
}
static void new_file()
{
path = @"C:\temp\list" + file + ".txt";
if (!File.Exists(path))
{
using (StreamWriter sw = File.CreateText(path))
{
}
}
file++;
}
}
Your alphabet is missing 0.
With that fixed there would be 89 chars in your set. Let's call it 100 for simplicity. The set you are looking for is all the 8-character strings drawn from that set. There are 100^8 of these, i.e. 10,000,000,000,000,000.
The disk space they will take up depends on how you encode them. Let's be generous: assume you use some 8-bit character set that contains these characters and you don't put in carriage returns, so one byte per character. That is 8 bytes per string, so 10,000,000,000,000,000 strings =~ 80 petabytes.
Do you have 80 petabytes of disk (80,000 TB)?
[EDIT] In response to 'this is not an answer':
The original motivation is to create the list; this shows how large that list would be. It's hard to see what could be done with the list if it were actually materialised, i.e. it would always be quicker to regenerate it than to load it. Surely whatever point could be made by producing the list can also be made by simply knowing its size, which the above shows how to work out.
There are LOTS of inefficiencies in your code, but if your question is 'how can I quickly produce this list and write it to disk', the answer is 'you literally cannot'.
[/EDIT]
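To make the arithmetic easy to check, here is a small sketch that computes the password count and the raw on-disk size for both the idealised 100-character set and the actual 89-character set:
using System;

class WordlistSize
{
    static void Main()
    {
        // passwords = alphabet^length; bytes = passwords * length (1 byte per char, no line breaks)
        foreach (int alphabet in new[] { 100, 89 })
        {
            double passwords = Math.Pow(alphabet, 8);
            double petabytes = passwords * 8 / 1e15;
            Console.WriteLine("{0}^8 = {1:E2} passwords, ~{2:F0} PB", alphabet, passwords, petabytes);
        }
        // Prints roughly: 100^8 = 1.00E+016 passwords, ~80 PB
        //                  89^8 = 3.94E+015 passwords, ~31 PB
    }
}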

Parsing Through Wave File with BinaryReader

In the .NET mscorlib assembly, System.IO namespace, I am using the ReadInt16() method to loop through audio data bytes and dump signed integer values into a text file. How does one interpret the two values associated with one sample? That is, if I have one second of mono data there will be 88200 bytes, and ReadInt16() then returns 88200 discrete integers. This is too much information; I should only have 44100 integers. So do I need to use a different method, or perhaps advance the loop by one on each iteration?
Many thanks..........Mickey
using System;
using System.IO;
public struct WaveFormat
{
private short m_FormatTag; // most often PCM = 1
private short m_nChannels; // number of channels
private int m_SamplesPerSecond; // samples per second eg 44100
private int m_AvgBytesPerSecond; // bytes per second eg 176400
private short m_BlockAlign; // blockalign (byte per sample) eg 4 bytes
private short m_BitsPerSample; // bits per sample, 8, 16, 24
public WaveFormat(byte BPS, int SPS, byte nChn)
{
m_FormatTag = 1; //PCM
m_nChannels = nChn;
m_SamplesPerSecond = SPS;
m_BitsPerSample = BPS;
m_BlockAlign = (short)(m_nChannels * m_BitsPerSample / 8);
m_AvgBytesPerSecond = (int)(m_BlockAlign * m_SamplesPerSecond);
}
public short FormatTag
{
get { return m_FormatTag; }
set { m_FormatTag = value; }
}
public short Channels
{
get { return m_nChannels; }
}
public int SamplesPerSecond
{
get { return m_SamplesPerSecond; }
}
public int AvgBytesPerSecond
{
get { return m_AvgBytesPerSecond; }
}
public short BlockAlign
{
get { return m_BlockAlign; }
}
public short BitsPerSample
{
get { return m_BitsPerSample; }
}
public void Read(BinaryReader br)
{
m_FormatTag = br.ReadInt16();
m_nChannels = br.ReadInt16();
m_SamplesPerSecond = br.ReadInt32();
m_AvgBytesPerSecond = br.ReadInt32();
m_BlockAlign = br.ReadInt16();
m_BitsPerSample = br.ReadInt16();
}
public void Write(BinaryWriter bw)
{
bw.Write(m_FormatTag);
bw.Write(m_nChannels);
bw.Write(m_SamplesPerSecond);
bw.Write(m_AvgBytesPerSecond);
bw.Write(m_BlockAlign);
bw.Write(m_BitsPerSample);
}
public override string ToString()
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
sb.AppendLine("FormatTag: " + m_FormatTag.ToString());
sb.AppendLine("nChannels: " + m_nChannels.ToString());
sb.AppendLine("SamplesPerSecond: " + m_SamplesPerSecond.ToString());
sb.AppendLine("AvgBytesPerSecond: " + m_AvgBytesPerSecond.ToString());
sb.AppendLine("BlockAlign: " + m_BlockAlign.ToString());
sb.AppendLine("BitsPerSample: " + m_BitsPerSample.ToString());
return sb.ToString();
}
}
Generally when you read arrays of data your code should look like:
for(int i = 0; i < totalNumberOfEntries; i++)
{
// read all data for this entry
var component1 = reader.ReadXXX();
var component2 = reader.ReadXXX();
// deal with data for this entry
someEntryStorage.Add(new Entry(component1, component2));
}
Most likely (I don't know the Wave file format) you either need to read pairs of Int16 values (if the samples are interleaved) or read the channels separately (if the data for one channel comes after the other).
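For the interleaved (stereo) case, a minimal sketch; it assumes reader is a BinaryReader positioned at the start of the sample data and dataLength is the data chunk's byte count. For 16-bit mono, by contrast, every ReadInt16() already is one complete sample, so 88200 bytes yield 44100 integers:
int frames = dataLength / 4;           // 2 channels * 2 bytes per 16-bit sample
var left = new short[frames];
var right = new short[frames];
for (int i = 0; i < frames; i++)
{
    left[i] = reader.ReadInt16();      // channel 0 (left)
    right[i] = reader.ReadInt16();     // channel 1 (right)
}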
You must read the chunk infos. The data chunk tells you how many bytes you have to read; the WaveFormat tells you the average bytes per second and much more. I had some VB code...
I have converted the VB code to C# with SharpDevelop; maybe it helps a little bit...
using System;
using System.IO;
public class ChunkInfo
{
private byte[] m_Header;
private long m_Length;
private long m_OffSet;
public ChunkInfo(string Header)
{
m_Header = new byte[Header.Length];
for (int i = 0; i <= m_Header.GetUpperBound(0); i++)
{
m_Header[i] = (byte)Header[i];
}
}
public ChunkInfo(byte[] Header)
{
m_Header = Header;
}
public void Read(BinaryReader br)
{
m_OffSet = SearchOffset(br);
if (m_OffSet >= 0)
{
br.BaseStream.Position = m_OffSet + m_Header.Length;
m_Length = br.ReadInt32();
}
}
public void Write(BinaryWriter bw)
{
bw.Write(m_Header);
bw.Write(m_Length);
}
public long Length
{
get { return m_Length; }
}
public long OffSet
{
get { return m_OffSet; }
}
private long SearchOffset(BinaryReader br)
{
byte[] haystack = null;
bool found = false;
long offset = 0;
long basepos = 0;
int hlength = 260;
long basepos_grow = hlength - m_Header.Length;
while (!(found || (basepos >= br.BaseStream.Length)))
{
br.BaseStream.Position = basepos;
haystack = br.ReadBytes(hlength);
offset = BoyerMooreHorspool.find(haystack, m_Header);
found = offset >= 0;
if (found)
{
offset += basepos;
break;
}
else
{
basepos += basepos_grow;
}
}
return offset;
}
}
public static class BoyerMooreHorspool
{
//detects a needle in the haystack
const int UBYTE_MAX = 255;
static int[] bad_char_skip4 = new int[UBYTE_MAX + 3];
static int[] bad_char_skip8 = new int[UBYTE_MAX + 3];
static bool IsInitialized = false;
public static void init()
{
//little optimization for needles with length 4 or 8
for (int i = 0; i <= UBYTE_MAX + 2; i++)
{
bad_char_skip4[i] = 4;
bad_char_skip8[i] = 8;
}
IsInitialized = true;
}
public static int find(byte[] haystack, byte[] needle, int start = 0)
{
if (!IsInitialized) init();
int i_n = 0; // needle index
int n_n = needle.Length;
int[] bad_char_skip = null;
switch (n_n)
{
case 4:
bad_char_skip = bad_char_skip4;
break;
case 8:
bad_char_skip = bad_char_skip8;
break;
default:
bad_char_skip = new int[UBYTE_MAX + 3];
for (i_n = 0; i_n <= UBYTE_MAX + 2; i_n++)
{
bad_char_skip[i_n] = n_n;
}
break;
}
int ifind = -1; // if not found then return -1
int i_h = start; // haystack index
int n_h = haystack.Length;
if (n_n > n_h)
throw new ArgumentOutOfRangeException("needle", "needle is too long");
int last = n_n - 1;
for (i_n = 0; i_n <= last - 1; i_n++)
{
bad_char_skip[needle[i_n]] = last - i_n;
}
byte bcs = 0;
int bhs = 0;
while ((n_h - start) >= n_n)
{
i_n = last;
while (haystack[i_h + i_n] == needle[i_n])
{
if (i_n == 0)
{
// matched all the way down to needle[0]: report the first occurrence
return i_h;
}
i_n -= 1;
}
bhs = haystack[i_h + last];
bcs = (byte)(bad_char_skip[bhs]);
n_h -= bcs;
i_h += bcs;
}
return ifind;
}
}
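To show how the pieces above fit together, here is a hypothetical usage sketch; it assumes a canonical RIFF/WAVE layout, and "test.wav" and the local variable names are illustrative:
using (var br = new BinaryReader(File.OpenRead("test.wav")))
{
    var fmtChunk = new ChunkInfo("fmt ");   // note the trailing space in the chunk id
    fmtChunk.Read(br);                      // leaves the reader just past the chunk size field
    var format = new WaveFormat(16, 44100, 2);
    format.Read(br);                        // replaces the defaults with the file's actual values

    var dataChunk = new ChunkInfo("data");
    dataChunk.Read(br);                     // dataChunk.Length = number of sample-data bytes
    long frames = dataChunk.Length / format.BlockAlign;
    Console.WriteLine(format);
    Console.WriteLine("Frames: {0}", frames);
}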

Read double value from a file C#

I have a txt file whose format is:
0.32423 1.3453 3.23423
0.12332 3.1231 9.23432432
9.234324234 -1.23432 12.23432
...
Each line has three double values. There are more than 10000 lines in this file. I can use StreamReader.ReadLine and String.Split, then convert the values.
I want to know whether there is any faster method to do it.
Best Regards,
StreamReader.ReadLine, String.Split and Double.TryParse sounds like a good solution here.
No need for improvement.
There may be some little micro-optimisations you can perform, but the way you've suggested sounds about as simple as you'll get.
10000 lines shouldn't take very long - have you tried it and found you've actually got a performance problem? For example, here are two short programs - one creates a 10,000 line file and the other reads it:
CreateFile.cs:
using System;
using System.IO;
public class Test
{
static void Main()
{
Random rng = new Random();
using (TextWriter writer = File.CreateText("test.txt"))
{
for (int i = 0; i < 10000; i++)
{
writer.WriteLine("{0} {1} {2}", rng.NextDouble(),
rng.NextDouble(), rng.NextDouble());
}
}
}
}
ReadFile.cs:
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
public class Test
{
static void Main()
{
Stopwatch sw = Stopwatch.StartNew();
using (TextReader reader = File.OpenText("test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] bits = line.Split(' ');
foreach (string bit in bits)
{
double value;
if (!double.TryParse(bit, out value))
{
Console.WriteLine("Bad value");
}
}
}
}
sw.Stop();
Console.WriteLine("Total time: {0}ms",
sw.ElapsedMilliseconds);
}
}
On my netbook (which admittedly has an SSD in it) it only takes 82ms to read the file. I would suggest that's probably not a problem :)
I would suggest reading all your lines at once with
string[] lines = System.IO.File.ReadAllLines(fileName);
This would ensure that the I/O is done with maximum efficiency. You would have to measure (profile), but I would expect the conversions to take far less time.
Your method is already good!
You can improve it by writing a read-line function that returns an array of doubles, and reuse this function in other programs; a sketch follows.
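For instance, a minimal sketch of such a helper (the names are illustrative, not an existing API):
using System;
using System.Globalization;
using System.IO;

static class LineParser
{
    // Returns the whitespace-separated doubles of the next line, or null at end of stream.
    public static double[] ReadDoubles(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null) return null;
        string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        var values = new double[parts.Length];
        for (int i = 0; i < parts.Length; i++)
            values[i] = double.Parse(parts[i], CultureInfo.InvariantCulture);
        return values;
    }
}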
This solution is a little bit slower (see the benchmarks at the end), but it's nicer to read. It should also be more memory efficient, because only the current character is buffered at a time (instead of the whole file or line).
Reading arrays is an additional feature of this reader, which assumes that the size of an array always comes first as an int value.
IParsable is another feature that makes it easy to implement Parse methods for various types.
class StringStreamReader {
private StreamReader sr;
public StringStreamReader(StreamReader sr) {
this.sr = sr;
this.Separator = ' ';
}
private StringBuilder sb = new StringBuilder();
public string ReadWord() {
eol = false;
sb.Clear();
char c;
while (!sr.EndOfStream) {
c = (char)sr.Read();
if (c == Separator) break;
if (IsNewLine(c)) {
eol = true;
char nextch = (char)sr.Peek();
while (IsNewLine(nextch)) {
sr.Read(); // consume all newlines
nextch = (char)sr.Peek();
}
break;
}
sb.Append(c);
}
return sb.ToString();
}
private bool IsNewLine(char c) {
return c == '\r' || c == '\n';
}
public int ReadInt() {
return int.Parse(ReadWord());
}
public double ReadDouble() {
return double.Parse(ReadWord());
}
public bool EOF {
get { return sr.EndOfStream; }
}
public char Separator { get; set; }
bool eol;
public bool EOL {
get { return eol || sr.EndOfStream; }
}
public T ReadObject<T>() where T : IParsable, new() {
var obj = new T();
obj.Parse(this);
return obj;
}
public int[] ReadIntArray() {
int size = ReadInt();
var a = new int[size];
for (int i = 0; i < size; i++) {
a[i] = ReadInt();
}
return a;
}
public double[] ReadDoubleArray() {
int size = ReadInt();
var a = new double[size];
for (int i = 0; i < size; i++) {
a[i] = ReadDouble();
}
return a;
}
public T[] ReadObjectArray<T>() where T : IParsable, new() {
int size = ReadInt();
var a = new T[size];
for (int i = 0; i < size; i++) {
a[i] = ReadObject<T>();
}
return a;
}
internal void NextLine() {
eol = false;
}
}
interface IParsable {
void Parse(StringStreamReader r);
}
It can be used like this:
public void Parse(StringStreamReader r) {
double x = r.ReadDouble();
int y = r.ReadInt();
string z = r.ReadWord();
double[] arr = r.ReadDoubleArray();
MyParsableObject o = r.ReadObject<MyParsableObject>();
MyParsableObject [] oarr = r.ReadObjectArray<MyParsableObject>();
}
I did some benchmarking, comparing StringStreamReader with some other approaches, already proposed (StreamReader.ReadLine and File.ReadAllLines). Here are the methods I used for benchmarking:
private static void Test_StringStreamReader(string filename) {
var sw = new Stopwatch();
sw.Start();
using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
var r = new StringStreamReader(sr);
r.Separator = ' ';
while (!r.EOF) {
var dbls = new List<double>();
while (!r.EOF) {
dbls.Add(r.ReadDouble());
}
}
}
sw.Stop();
Console.WriteLine("elapsed: {0}", sw.Elapsed);
}
private static void Test_ReadLine(string filename) {
var sw = new Stopwatch();
sw.Start();
using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
var dbls = new List<double>();
while (!sr.EndOfStream) {
string line = sr.ReadLine();
string[] bits = line.Split(' ');
foreach(string bit in bits) {
dbls.Add(double.Parse(bit));
}
}
}
sw.Stop();
Console.WriteLine("elapsed: {0}", sw.Elapsed);
}
private static void Test_ReadAllLines(string filename) {
var sw = new Stopwatch();
sw.Start();
string[] lines = System.IO.File.ReadAllLines(filename);
var dbls = new List<double>();
foreach(var line in lines) {
string[] bits = line.Split(' ');
foreach (string bit in bits) {
dbls.Add(double.Parse(bit));
}
}
sw.Stop();
Console.WriteLine("Test_ReadAllLines: {0}", sw.Elapsed);
}
I used a file with 1,000,000 lines of double values (3 values per line). The file is located on an SSD disk and each test was repeated multiple times in release mode. These are the results (on average):
Test_StringStreamReader: 00:00:01.1980975
Test_ReadLine: 00:00:00.9117553
Test_ReadAllLines: 00:00:01.1362452
So, as mentioned, StringStreamReader is a bit slower than the other approaches. For 10,000 lines, the performance is around 120ms / 95ms / 100ms.
