I want to implement the fastest possible Inter-Process Communication (IPC) (least CPU-bound) between 2 .NET Core (or even Mono, if possible) applications on Linux (an SBC).
I tried TCP (sockets over loopback) and anonymous/named pipes, which are way too slow. Now I am testing MemoryMappedFiles (shared memory) and I am observing some strange behavior.
The following code works fine for me:
static async Task Main(string[] args)
{
    var are = new AutoResetEvent(false);

    var masterTask = Task.Run(async () =>
    {
        using (var memoryMappedFile = MemoryMappedFile.CreateNew("test", 100_000))
        {
            using (var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor())
            {
                are.Set();
                for (int i = 0; i < 100; i++)
                {
                    Console.WriteLine($"Master: {i}");
                    memoryMappedViewAccessor.Write(0, i);
                    await Task.Delay(1000);
                }
            }
        }
    });

    are.WaitOne();

    var slaveTask = Task.Run(async () =>
    {
        using (var memoryMappedFile = MemoryMappedFile.OpenExisting("test"))
        {
            using (var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor())
            {
                int number;
                do
                {
                    number = memoryMappedViewAccessor.ReadInt32(0);
                    Console.WriteLine($"Slave: {number}");
                    await Task.Delay(1000);
                }
                while (number < 99);
            }
        }
    });

    await Task.WhenAll(masterTask, slaveTask);

    Console.WriteLine("...");
    Console.ReadKey();
}
But when I split it into 2 applications:
static async Task Main(string[] args)
{
    using (var memoryMappedFile = MemoryMappedFile.CreateNew("test", 100_000))
    {
        using (var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor())
        {
            for (int i = 0; i < 100; i++)
            {
                Console.WriteLine($"Master: {i}");
                memoryMappedViewAccessor.Write(0, i);
                await Task.Delay(1000);
            }
        }
    }

    Console.WriteLine("...");
    Console.ReadKey();
}
and
static async Task Main(string[] args)
{
    using (var memoryMappedFile = MemoryMappedFile.OpenExisting(mapName: "test"))
    {
        using (var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor())
        {
            int number;
            do
            {
                number = memoryMappedViewAccessor.ReadInt32(0);
                Console.WriteLine($"Slave: {number}");
                await Task.Delay(1000);
            }
            while (number < 99);
        }
    }

    Console.WriteLine("...");
    Console.ReadKey();
}
the second "client" app return:
Unhandled Exception:
System.IO.FileNotFoundException:
at System.IO.MemoryMappedFiles.MemoryMapImpl.OpenFile (System.String path, System.IO.FileMode mode, System.String mapName, System.Int64& capacity, System.IO.MemoryMappedFiles.MemoryMappedFileAccess access, System.IO.MemoryMappedFiles.MemoryMappedFileOptions options) [0x00065] in :0
Why is that? Could I somehow let the second app see the resources of the first one?
(Note: I'm not 100% sure, but I could swear I was able to run two apps like this several years ago on Mono. I wish that were true and that there is still a way.)
Another thought: is there some C++ tool that can do this and that could be wrapped from C#? Is that possible?
We are f....d as developers, guys :) https://chat.openai.com/chat helped me solve this promptly.
On Linux there is a directory, /dev/shm, where shared-memory files are stored (a kind of RAM disk backed by tmpfs). So modify the first process to
MemoryMappedFile.CreateFromFile("/dev/shm/test", System.IO.FileMode.OpenOrCreate, "test", 100_000)
and the second to
MemoryMappedFile.CreateFromFile("/dev/shm/test", FileMode.Open)
and voilà. And it is incredibly fast at moving data from one process to the other.
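For completeness, the two split programs end up looking roughly like this. This is only a sketch based on the snippets above: mapName is passed as null because .NET Core does not support named maps on Linux and identifies the mapping by the backing file path instead, while Mono may also accept the name as shown above.

// Writer process (sketch)
static async Task Main()
{
    using (var mmf = MemoryMappedFile.CreateFromFile(
               "/dev/shm/test", FileMode.OpenOrCreate, mapName: null, capacity: 100_000))
    using (var accessor = mmf.CreateViewAccessor())
    {
        for (int i = 0; i < 100; i++)
        {
            accessor.Write(0, i);
            await Task.Delay(1000);
        }
    }
}

// Reader process (sketch)
static async Task Main()
{
    using (var mmf = MemoryMappedFile.CreateFromFile("/dev/shm/test", FileMode.Open))
    using (var accessor = mmf.CreateViewAccessor())
    {
        int number;
        do
        {
            number = accessor.ReadInt32(0);
            Console.WriteLine($"Slave: {number}");
            await Task.Delay(1000);
        } while (number < 99);
    }
}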
I needed a very basic serial execution queue.
I wrote the following based on this idea, but I needed the queue to ensure FIFO order, so I added an intermediate ConcurrentQueue<>.
Here is the code:
public class SimpleSerialTaskQueue
{
    private SemaphoreSlim _semaphore = new SemaphoreSlim(0);
    private ConcurrentQueue<Func<Task>> _taskQueue = new ConcurrentQueue<Func<Task>>();

    public SimpleSerialTaskQueue()
    {
        Task.Run(async () =>
        {
            Func<Task> dequeuedTask;
            while (true)
            {
                if (await _semaphore.WaitAsync(1000))
                {
                    if (_taskQueue.TryDequeue(out dequeuedTask) == true)
                    {
                        await dequeuedTask();
                    }
                }
                else
                {
                    Console.WriteLine("Nothing more to process");
                    // If I don't do that, memory pressure is never released
                    //GC.Collect();
                }
            }
        });
    }

    public void Add(Func<Task> o_task)
    {
        _taskQueue.Enqueue(o_task);
        _semaphore.Release();
    }
}
When I run that in a loop, simulating heavy load, I get some kind of memory leak. Here is the code:
static void Main(string[] args)
{
    SimpleSerialTaskQueue queue = new SimpleSerialTaskQueue();

    for (int i = 0; i < 100000000; i++)
    {
        queue.Add(async () =>
        {
            await Task.Delay(0);
        });
    }

    Console.ReadLine();
}
EDIT:
I don't understand why, once the tasks have been executed, I still see around 750 MB of memory in use (based on the VS2015 diagnostic tools). I thought that once they were executed it would drop back down. The GC doesn't seem to collect anything.
Can anyone tell me what is happening? Is this related to the async state machine?
I am reading the contents of a zip file and trying to extract the entries.
var allZipEntries = ZipFile.Open(zipFileFullPath, ZipArchiveMode.Read).Entries;
Now if I extract them using a foreach loop, this works fine. The drawback is that it is just the equivalent of the built-in extract method, and I gain nothing when I intend to extract all the files.
foreach (var currentEntry in allZipEntries)
{
    if (currentEntry.FullName.Equals(currentEntry.Name))
    {
        currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
    }
    else
    {
        var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
        Directory.CreateDirectory(subDirectoryPath);
        currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
    }
}
Now, to take advantage of the TPL, I tried using Parallel.ForEach, but that throws the following exception:
An exception of type 'System.IO.InvalidDataException' occurred in System.IO.Compression.dll but was not handled in user code
Additional information: A local file header is corrupt.
Parallel.ForEach(allZipEntries, currentEntry =>
{
    if (currentEntry.FullName.Equals(currentEntry.Name))
    {
        currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
    }
    else
    {
        var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
        Directory.CreateDirectory(subDirectoryPath);
        currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
    }
});
To avoid this I could use a lock, but that defeats the whole purpose.
Parallel.ForEach(allZipEntries, currentEntry =>
{
    lock (thisLock)
    {
        if (currentEntry.FullName.Equals(currentEntry.Name))
        {
            currentEntry.ExtractToFile($"{tempPath}\\{currentEntry.Name}");
        }
        else
        {
            var subDirectoryPath = Path.Combine(tempPath, Path.GetDirectoryName(currentEntry.FullName));
            Directory.CreateDirectory(subDirectoryPath);
            currentEntry.ExtractToFile($"{subDirectoryPath}\\{currentEntry.Name}");
        }
    }
});
Is there any other or better way to extract the files?
ZipFile is explicitly documented as not guaranteed to be thread safe for instance members. (This is no longer mentioned on the current page; see the snapshot from Nov 2016.)
What you're trying to do cannot be done with this library. There may be other libraries out there that support multiple threads per zip file, but I wouldn't expect it.
You can use multi-threading to unzip multiple files at the same time, but not for multiple entries in the same zip file.
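If the real goal is throughput across many archives, a sketch of that per-file parallelism might look like this (the paths are hypothetical; it is safe because each thread owns its own archive):

// Each zip file gets its own thread and its own ZipArchive instance,
// so no instance members are shared between threads.
Parallel.ForEach(Directory.GetFiles(@"c:\zips", "*.zip"), zipPath =>
{
    var destination = Path.Combine(tempPath, Path.GetFileNameWithoutExtension(zipPath));
    ZipFile.ExtractToDirectory(zipPath, destination);
});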
Writing/reading in parallel is not a good idea, as the hard drive controller will only run the requests one by one. By having multiple threads you just add overhead and queue them all up for no gain.
Try reading the file into memory first; this will avoid your exception. However, if you benchmark it you may find it's actually slower due to the overhead of the extra threads.
If the file is very large and the decompression takes a long time, running the decompression in parallel may improve speed, but the IO reads/writes will not. Most decompression libraries are already multi-threaded anyway, so you will only see a performance gain from doing this if this one is not.
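Here is a sketch of the "read it into memory first" idea, assuming zipFileFullPath and tempPath are the same variables as in the question: each worker opens its own ZipArchive over its own MemoryStream, so no archive instance is shared between threads.

// Requires System.IO.Compression, System.Collections.Concurrent and System.Threading.Tasks.
byte[] zipBytes = File.ReadAllBytes(zipFileFullPath); // whole archive held in memory

int entryCount;
using (var countArchive = new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read))
{
    entryCount = countArchive.Entries.Count;
}

Parallel.ForEach(Partitioner.Create(0, entryCount), range =>
{
    // One private ZipArchive per partition/thread.
    using (var archive = new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read))
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            var entry = archive.Entries[i];
            if (string.IsNullOrEmpty(entry.Name)) continue; // directory entry

            var destination = Path.Combine(tempPath, entry.FullName);
            Directory.CreateDirectory(Path.GetDirectoryName(destination));
            entry.ExtractToFile(destination, overwrite: true);
        }
    }
});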
Edit: a dodgy way to make the library thread safe is shown below. It runs slower than, or on par with, the sequential version depending on the zip archive, which proves the point that this is not something that benefits from parallelism.
Array.ForEach(Directory.GetFiles(@"c:\temp\output\"), File.Delete);

Stopwatch timer = new Stopwatch();
timer.Start();

int numberOfThreads = 8;
var clonedZipEntries = new List<ReadOnlyCollection<ZipArchiveEntry>>();

for (int i = 0; i < numberOfThreads; i++)
{
    clonedZipEntries.Add(ZipFile.Open(@"c:\temp\temp.zip", ZipArchiveMode.Read).Entries);
}

int totalZipEntries = clonedZipEntries[0].Count;
int numberOfEntriesPerThread = totalZipEntries / numberOfThreads;

Func<object, int> action = (object thread) =>
{
    int threadNumber = (int)thread;
    int startIndex = numberOfEntriesPerThread * threadNumber;
    int endIndex = startIndex + numberOfEntriesPerThread;
    if (endIndex > totalZipEntries) endIndex = totalZipEntries;

    for (int i = startIndex; i < endIndex; i++)
    {
        Console.WriteLine($"Extracting {clonedZipEntries[threadNumber][i].Name} via thread {threadNumber}");
        clonedZipEntries[threadNumber][i].ExtractToFile($@"C:\temp\output\{clonedZipEntries[threadNumber][i].Name}");
    }

    // Check for any remainders due to a non evenly divisible size
    if (threadNumber == numberOfThreads - 1 && endIndex < totalZipEntries)
    {
        for (int i = endIndex; i < totalZipEntries; i++)
        {
            Console.WriteLine($"Extracting {clonedZipEntries[threadNumber][i].Name} via thread {threadNumber}");
            clonedZipEntries[threadNumber][i].ExtractToFile($@"C:\temp\output\{clonedZipEntries[threadNumber][i].Name}");
        }
    }

    return 0;
};

// Construct the tasks
var tasks = new List<Task<int>>();
for (int threadNumber = 0; threadNumber < numberOfThreads; threadNumber++)
{
    tasks.Add(Task<int>.Factory.StartNew(action, threadNumber));
}

Task.WaitAll(tasks.ToArray());
timer.Stop();
var threaderTimer = timer.ElapsedMilliseconds;

Array.ForEach(Directory.GetFiles(@"c:\temp\output\"), File.Delete);

timer.Reset();
timer.Start();

var entries = ZipFile.Open(@"c:\temp\temp.zip", ZipArchiveMode.Read).Entries;
foreach (var entry in entries)
{
    Console.WriteLine($"Extracting {entry.Name} via thread 1");
    entry.ExtractToFile($@"C:\temp\output\{entry.Name}");
}

timer.Stop();

Console.WriteLine($"Threaded version took: {threaderTimer} ms");
Console.WriteLine($"Non-Threaded version took: {timer.ElapsedMilliseconds} ms");
Console.ReadLine();
I am trying to poll an API as quickly and as efficiently as possible to get market data. The API lets you request market data for batchSize markets per request, and allows 3 concurrent requests but no more (otherwise it throws errors).
I may be requesting data from many more than batchSize different markets.
I continuously loop through all of the markets, requesting the data in batches: one batch per thread, with 3 threads running at any time.
The total number of markets (and hence batches) can change at any time.
I'm using the following code:
private static object lockObj = new object();

private void PollMarkets()
{
    const int NumberOfConcurrentRequests = 3;
    for (int i = 0; i < NumberOfConcurrentRequests; i++)
    {
        int batch = 0;
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                if (markets.Count > 0)
                {
                    List<string> batchMarketIds;
                    lock (lockObj)
                    {
                        var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
                        batchMarketIds = markets.Keys.Skip(batch * batchSize).Take(batchSize).ToList();
                        batch = (batch + 1) % numBatches;
                    }
                    var marketData = await GetMarketData(batchMarketIds);
                    // Do something with marketData
                }
                else
                {
                    await Task.Delay(1000); // wait for some markets to be added.
                }
            }
        });
    }
}
Even though there is a lock around the critical section, each thread starts with batch = 0 (so each thread is often polling for duplicate data).
If I change batch to a private volatile field, the above code works as I want it to (with both volatile and the lock).
So for some reason my lock doesn't work? I feel like it's something obvious, but I'm missing it.
I believe it is better here to use a lock rather than a volatile field; is that also correct?
Thanks
The issue was that you were defining the batch variable inside the for loop. That meant that the threads were using their own variable instead of sharing it.
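In other words, a minimal sketch of the fix (not the poster's final code) is to hoist the declaration out of the loop so all three tasks share the one variable that the lock protects:

private void PollMarkets()
{
    const int NumberOfConcurrentRequests = 3;
    int batch = 0; // declared once, shared by every task, only touched inside lock (lockObj)

    for (int i = 0; i < NumberOfConcurrentRequests; i++)
    {
        Task.Factory.StartNew(async () =>
        {
            // ... same loop body as in the question; every read/write of batch
            // stays inside lock (lockObj), so the tasks now rotate through the
            // batches instead of each starting from 0.
        });
    }
}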
In my mind you should use a Queue<> to create a jobs pipeline.
Something like this:
private int batchSize = 10;
private Queue<int> queue = new Queue<int>();

private void AddMarket(params int[] marketIDs)
{
    lock (queue)
    {
        foreach (var marketID in marketIDs)
        {
            queue.Enqueue(marketID);
        }

        if (queue.Count >= batchSize)
        {
            Monitor.Pulse(queue);
        }
    }
}

private void Start()
{
    for (var tid = 0; tid < 3; tid++)
    {
        Task.Run(async () =>
        {
            while (true)
            {
                List<int> toProcess;
                lock (queue)
                {
                    if (queue.Count < batchSize)
                    {
                        Monitor.Wait(queue);
                        continue;
                    }

                    toProcess = new List<int>(batchSize);
                    for (var count = 0; count < batchSize; count++)
                    {
                        toProcess.Add(queue.Dequeue());
                    }

                    if (queue.Count >= batchSize)
                    {
                        Monitor.Pulse(queue);
                    }
                }

                var marketData = await GetMarketData(toProcess);
            }
        });
    }
}
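A possible way to drive this, as a hypothetical usage sketch from inside the same class: start the three workers once, then feed market ids in as they arrive.

Start();
AddMarket(1, 2, 3, 4, 5);          // below batchSize, so the workers keep waiting
AddMarket(6, 7, 8, 9, 10, 11, 12); // crosses batchSize, so Pulse wakes one worker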
I have a console application which reads messages from Console.OpenStandardInput().
I am doing this in a task, but it does not seem to be working.
// Static fields as used below (their declarations were omitted from the original post)
static CancellationTokenSource wtoken;
static Task readInputStream;

static void Main(string[] args)
{
    wtoken = new CancellationTokenSource();

    readInputStream = Task.Factory.StartNew(() =>
    {
        wtoken.Token.ThrowIfCancellationRequested();
        while (true)
        {
            if (wtoken.Token.IsCancellationRequested)
            {
                wtoken.Token.ThrowIfCancellationRequested();
            }
            else
            {
                OpenStandardStreamIn();
            }
        }
    }, wtoken.Token);

    Console.ReadLine();
}
Here is my OpenStandardStreamIn function
public static void OpenStandardStreamIn()
{
    Stream stdin = Console.OpenStandardInput();
    int length = 0;
    byte[] bytes = new byte[4];
    stdin.Read(bytes, 0, 4);
    length = System.BitConverter.ToInt32(bytes, 0);

    string input = "";
    for (int i = 0; i < length; i++)
    {
        input += (char)stdin.ReadByte();
    }

    Console.Write(input);
}
Any help? Why is it not working in a continuous loop?
You basically have a race condition between Console.ReadLine and your task. Both of them are trying to read from standard input - and I certainly don't know what you should expect when reading from standard input from two threads at the same time, but it seems like something worth avoiding.
You can easily test this by changing the task to do something other than reading from standard input. For example:
using System;
using System.Threading;
using System.Threading.Tasks;

class Test
{
    static void Main()
    {
        var wtoken = new CancellationTokenSource();
        var readInputStream = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 10; i++)
            {
                Console.WriteLine(i);
                Thread.Sleep(200);
            }
        }, wtoken.Token);

        Console.ReadLine();
    }
}
If your real code needs to read from standard input, then I suggest you change Console.ReadLine() into readInputStream.Wait(). I'd also suggest you use Task.Run instead of Task.Factory.StartNew() if you're using .NET 4.5, just for readability - assuming you don't need any of the more esoteric behaviour of TaskFactory.StartNew.
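A sketch of those suggestions applied to the original code (keeping wtoken and readInputStream as static fields, as in the question):

static void Main(string[] args)
{
    wtoken = new CancellationTokenSource();

    readInputStream = Task.Run(() =>
    {
        while (!wtoken.Token.IsCancellationRequested)
        {
            OpenStandardStreamIn();
        }
    }, wtoken.Token);

    // Wait on the task instead of competing with it for standard input.
    readInputStream.Wait();
}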
I have a simple IO-bound .NET 4.0 console application which sends 1 to n requests to a web service, waits for their completion, and then exits. Here is a sample:
static int counter = 0;

static void Main(string[] args)
{
    foreach (my Loop)
    {
        ......................
        WebClientHelper.PostDataAsync(... =>
        {
            ................................
            ................................
            Interlocked.Decrement(ref counter);
        });
        Interlocked.Increment(ref counter);
    }

    while (counter != 0)
    {
        Thread.Sleep(500);
    }
}
Is this a correct implementation?
You can use Tasks and let the TPL manage those things.
Task<T>[] tasks = ...;
// ... start the tasks ...
Task.WaitAll(tasks);
Another way is to use TaskCompletionSource, as mentioned here.
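A sketch of the TaskCompletionSource approach, assuming PostDataAsync simply invokes its callback when the request finishes (its other arguments are elided here, as in the question):

var tasks = new List<Task>();

foreach (my Loop) // same loop as in the question
{
    var tcs = new TaskCompletionSource<bool>();

    WebClientHelper.PostDataAsync(/* ... */ result =>
    {
        // ... handle the response ...
        tcs.SetResult(true); // complete the task once the callback has run
    });

    tasks.Add(tcs.Task);
}

Task.WaitAll(tasks.ToArray()); // block until every request has completed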
As suggested by Hans, here's your code implemented with CountdownEvent:
static void Main(string[] args)
{
    // CountdownEvent requires an initial count; start at 1 so the count cannot
    // reach zero (and set the event) before the loop has finished adding work.
    var counter = new CountdownEvent(1);

    foreach (my Loop)
    {
        ......................
        counter.AddCount(); // add before starting the async call so a fast callback cannot drop the count to zero early
        WebClientHelper.PostDataAsync(... =>
        {
            ................................
            ................................
            counter.Signal();
        });
    }

    counter.Signal(); // remove the initial count added in the constructor
    counter.Wait();
}