I am using a RESTful API that returns a maximum of 50 records per call; if you need more than this, you must make multiple calls and pass an offset.
There are times when we need all of the results to be loaded. We are using something that resembles the code below: it makes one request after another and adds the results to a list, stopping when either the maximum is reached or the number of records returned by a call is less than the number requested.
How can I refactor this (using tasks, Parallel, or threads) to load the data with multiple requests in flight at once and still get exactly the same results? I have looked at creating multiple Tasks and awaiting them, but the problem is that the number of records to load is unknown until no more are available or the maximum is hit.
public IEnumerable<T> GetItems(int maxAmount = -1)
{
    var moreData = true;
    var result = new List<T>();
    var counter = 0;
    var batchAmount = 50;
    while (moreData)
    {
        // With no maximum (-1), always request a full batch.
        var requestAmount = maxAmount == -1
            ? batchAmount
            : Math.Min(batchAmount, maxAmount - result.Count);
        var items = GetItemsFromService(requestAmount, counter).ToList();
        counter += items.Count;
        result.AddRange(items);
        // Stop on a short page or once the maximum has been reached.
        moreData = items.Count == requestAmount && (maxAmount == -1 || maxAmount > result.Count);
    }
    return result;
}
private IEnumerable<T> GetItemsFromService(int batchAmount, int offset)
{
    // Let's assume that this gets data from a REST service that returns a maximum of batchAmount
    // and offsets using the offset variable.
    throw new NotImplementedException(); // placeholder so the stub compiles
}
Unfortunately you can't use async here as you are relying on the number of items from the previous request. This must be synchronous unless you want to do some asynchronous operations on the data that you've received.
It must be a badly designed API if it returns paged results without the total number of pages or items.
I managed to get this working. Basically, I keep sending paged requests until one of them comes back with nothing. Since they are started in order, once a response comes back empty we do not need to make any more requests; we just allow the in-flight requests to finish.
My working code looks like this.
private IEnumerable<object> GetEntitiesInParallel(Type type, string apiPath, Dictionary<string, string> parameters, int startPosition, int maxAmount)
{
var context = new TaskThreadingContext(maxAmount, startPosition);
var threads = Enumerable.Range(0, NumberOfThreads).Select(i =>
{
var task = Task.Factory.StartNew(() =>
{
while (context.Continue)
{
var rawData = String.Empty;
var offset = context.NextAmount();
var result = GetEntitiesSingleRequest(type, parameters, offset, apiPath, out rawData);
if (result.Any())
{
context.AddResult(result.Cast<object>(), rawData);
}
else
{
context.NoResult();
}
}
});
return task;
}).ToArray();
Task.WaitAll(threads);
var results = context.GetResults<object>();
return results;
}
private IEnumerable<object> GetEntitiesSingleRequest(Type type,Dictionary<string,string> parameters,
int offset,string apiPath, out string rawData)
{
var request = Utility.CreateRestRequest(apiPath, Method.GET,ApiKey,50,offset,parameters);
type = typeof(List<>).MakeGenericType(type);
var method = Client.GetType().GetMethods().Single(m => m.IsGenericMethod && m.Name == "Execute").MakeGenericMethod(type);
try
{
dynamic response = (IRestResponse)method.Invoke(Client, new object[] { request });
var data = response.Data as IEnumerable;
var dataList = data.Cast<object>().ToList();
rawData = response.Content.Replace("\n", Environment.NewLine);
return dataList.OfType<object>().ToList();
}
catch (Exception ex)
{
if (ex.Message.IndexOf("404") != -1)
{
rawData = null;
return Enumerable.Empty<object>();
}
throw;
}
}
private class TaskThreadingContext
{
private int batchAmount = 50;
private object locker1 = new object();
private object locker2 = new object();
private CancellationTokenSource tokenSource;
private CancellationToken token;
private volatile bool cont = true;
private volatile int offset = 0;
private volatile int max = 0;
private volatile int start = 0;
private List<object> result = new List<object>();
private List<string> raw = new List<string>();
public bool Continue { get { return cont; } }
public TaskThreadingContext(int maxRows = 0,int startPosition = 0)
{
max = maxRows;
offset = start = startPosition;
}
public int NextAmount()
{
lock(locker1)
{
var ret = offset;
var temp = offset + batchAmount;
if (temp - start > max && max > 0)
{
// Clamp to the end of the requested range so the offset never moves backwards.
temp = start + max;
}
offset = temp;
if (offset - start >= max && max > 0)
{
cont = false;
}
return ret;
}
}
public TaskThreadingContext()
{
tokenSource = new CancellationTokenSource();
token = tokenSource.Token;
}
public void AddResult(IEnumerable<object> items,string rawData)
{
lock(locker2)
{
result.AddRange(items);
raw.Add(rawData);
}
}
public IEnumerable<T> GetResults<T>()
{
return this.result.Cast<T>().ToList();
}
public void NoResult()
{
cont = false;
}
}
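One thing to watch with the approach above is that results are appended in completion order, so the combined list may not match what the sequential loop returns. The following is a minimal sketch of the same "keep requesting until a page comes back short" idea that also reassembles pages by index. `PagedLoader`, `LoadAll`, and the `fetchPage` delegate are hypothetical names standing in for the real REST call, the worker count is arbitrary, and the sketch leaves out the maximum-row limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PagedLoader
{
    // fetchPage(offset, count) is a hypothetical stand-in for the REST call.
    public static List<T> LoadAll<T>(Func<int, int, IReadOnlyList<T>> fetchPage,
                                     int pageSize = 50, int workers = 4)
    {
        var pages = new ConcurrentDictionary<int, IReadOnlyList<T>>();
        int nextPage = -1; // last page index handed out to a worker
        int stop = 0;      // set to 1 once a short or empty page is seen

        var tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(() =>
        {
            while (Volatile.Read(ref stop) == 0)
            {
                // Page indices are handed out in order, one per request.
                int page = Interlocked.Increment(ref nextPage);
                var items = fetchPage(page * pageSize, pageSize);
                if (items.Count > 0) pages[page] = items;
                // A short page means the data ends here; later pages are empty.
                if (items.Count < pageSize) Volatile.Write(ref stop, 1);
            }
        })).ToArray();

        // Requests already in flight are allowed to finish.
        Task.WaitAll(tasks);

        // Reassemble by page index so the order matches a sequential load.
        return pages.OrderBy(p => p.Key).SelectMany(p => p.Value).ToList();
    }
}
```

The cost of this design is a few wasted requests past the end of the data (at most one per worker), which is the same trade-off the code above makes.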
This is for a Unity project: a procedural terrain generation system that runs in generations. The output of each generation becomes the input of the next, so they must run sequentially.
They are also slow enough that they need to run on another thread.
The code below works by generating a "sector" and storing it in a dictionary with its "generation" number as the key. It also stores a small initialization object with just the data required to initialize a given generation, so that I can destroy sectors to save memory and re-instantiate them back down the chain.
Increment() finds the highest key and generates a new sector with the previous one as its input.
Task.Run() does the work of generating sectors without blocking the rest of the game. The problem is that it's possible to request a new sector before the previous one has finished generating.
What's the best pattern to prevent generation 3 from being generated before generation 2 is finished?
public class WorldGeneratorAsync : MonoBehaviour
{
public Dictionary<int, SectorController> SectorContollerDict = new Dictionary<int, SectorController>();
public Dictionary<int, TerrainGraphInput> GraphInputDict = new Dictionary<int, TerrainGraphInput>();
public int globalSeed;
public TerrainGraph terrainGraph;
async void Initialise()
{
DestroyAllSectors();
await InstantiateSectorAsync(0);
}
async void Increment()
{
var highestGeneration = SectorContollerDict.Keys.Max();
await InstantiateSectorAsync(highestGeneration+1);
}
async Task InstantiateSectorAsync(int generation)
{
if (generation == 0) // if we are first generation, init with dummy data
{
var inputData = new TerrainGraphInput(globalSeed, generation); // dummy data
var outputData = await Task.Run(() =>terrainGraph.GetGraphOutput(inputData)); // slow function
lock (SectorContollerDict)
{
SectorContollerDict[generation] = SectorController.New(outputData);
}
lock (GraphInputDict)
{
GraphInputDict[generation] = inputData;
}
}
else // we take the init data from the previous generation
{
int adder = generation > 0 ? -1 : 1;
TerrainGraphInput inputData;
if (GraphInputDict.Keys.Contains(generation))
{
inputData = GraphInputDict[generation];
}
else if (SectorContollerDict.Keys.Contains(generation + adder))
{
var previousSectorController = SectorContollerDict[generation + adder];
inputData = new TerrainGraphInput(
previousSectorController.sectorData,
previousSectorController.sectorData.EndSeeds,
generation,
globalSeed
);
}
else
{
throw new NoValidInputException();
}
var outputData = await Task.Run(()=>terrainGraph.GetGraphOutput(inputData)); // slow function
lock (SectorContollerDict)
{
SectorContollerDict[generation] = SectorController.New(outputData);
}
lock (GraphInputDict)
{
GraphInputDict[generation] = inputData;
}
}
}
private void DestroyAllSectors()
{
SectorContollerDict = new Dictionary<int, SectorController>();
GraphInputDict = new Dictionary<int, TerrainGraphInput>();
foreach (var sc in GameObject.FindObjectsOfType<SectorController>())
{
sc.DestroyMe();
}
}
}
Thanks to Orace, their idea worked. It was simpler than I expected: just switch the dictionary of sector controllers to a dictionary of tasks, and await the previous generation in the instantiation function.
public class WorldGeneratorAsync : MonoBehaviour
{
public Dictionary<int, Task<SectorController>> TaskDict = new();
// public Dictionary<int, SectorController> SectorContollerDict = new();
public Dictionary<int, TerrainGraphInput> GraphInputDict = new();
public int globalSeed;
public TerrainGraph terrainGraph;
async void Initialise()
{
DestroyAllSectors();
TaskDict[0] = InstantiateSectorAsync(0);
}
async void Increment()
{
var highestGeneration = TaskDict.Keys.Max();
int newGeneration = highestGeneration + 1;
TaskDict[newGeneration] = InstantiateSectorAsync(newGeneration);
}
async Task<SectorController> InstantiateSectorAsync(int generation)
{
SectorController sc;
if (generation == 0) // if we are first generation, init with dummy data
{
var inputData = new TerrainGraphInput(globalSeed, generation); // dummy data
var outputData = await Task.Run(() =>terrainGraph.GetGraphOutput(inputData)); // slow function
sc = SectorController.New(outputData);
GraphInputDict[generation] = inputData;
}
else
{
int adder = generation > 0 ? -1 : 1;
TerrainGraphInput inputData;
if (GraphInputDict.Keys.Contains(generation))
{
inputData = GraphInputDict[generation];
}
else if (TaskDict.Keys.Contains(generation + adder))
{
// var previousSectorController = SectorContollerDict[generation + adder];
var previousSectorController = await TaskDict[generation + adder]; // await previous generation
inputData = new TerrainGraphInput(
previousSectorController.sectorData,
previousSectorController.sectorData.EndSeeds,
generation,
globalSeed
);
}
else
{
throw new NoValidInputException();
}
var outputData = await Task.Run(()=>terrainGraph.GetGraphOutput(inputData)); // slow function
sc = SectorController.New(outputData);
GraphInputDict[generation] = inputData;
}
return sc;
}
private void DestroyAllSectors()
{
GraphInputDict = new Dictionary<int, TerrainGraphInput>();
TaskDict = new();
foreach (var sc in GameObject.FindObjectsOfType<SectorController>())
{
sc.DestroyMe();
}
}
}
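Stripped of the Unity types, the pattern above is "store a task per generation and have each new task await its predecessor". Here is a minimal sketch of just that chaining. `GenerationChain`, `Request`, and the `slowStep` delegate are hypothetical names, and an `int` stands in for the sector data. It assumes `Request` is always called from one thread (as it would be from Unity's main thread), because the dictionary is not synchronized.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class GenerationChain
{
    private readonly Dictionary<int, Task<int>> _tasks = new Dictionary<int, Task<int>>();
    private readonly Func<int, int> _slowStep; // hypothetical slow generator

    public GenerationChain(Func<int, int> slowStep)
    {
        _slowStep = slowStep;
    }

    // Returns the task for a generation, creating any missing predecessors first.
    public Task<int> Request(int generation, int seed)
    {
        if (_tasks.TryGetValue(generation, out var existing)) return existing;

        // Generation 0 starts from the seed; later ones start from the previous task.
        Task<int> previous = generation == 0
            ? Task.FromResult(seed)
            : Request(generation - 1, seed);

        var task = RunAfter(previous);
        _tasks[generation] = task;
        return task;
    }

    private async Task<int> RunAfter(Task<int> previous)
    {
        int input = await previous;                      // wait for the earlier generation
        return await Task.Run(() => _slowStep(input));   // then do the slow work off-thread
    }
}
```

Because the task is stored before it completes, a second request for the next generation simply awaits the stored task instead of starting early.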
I'm working with C# ConcurrentQueue and multi-threading in TCP/IP socket programming.
First, the socket programming itself is already done: I've finished coding the client, the server, and the communication between them.
The basic structure is a pipeline (a producer-consumer problem), and I'm now working on the bit conversion.
Below is a brief summary of my code:
client-socket -> server-socket -> concurrent_queue_1 (of type byte[65536], processed by Thread_1) -> concurrent_queue_2 (of type double[40,3500], processed by Thread_2) -> display data or other work (it can be GPU work)
*(double[40,3500] can be changed to another size)
So far, I've implemented putting data into queue 1 (Thread 1) and simply dequeuing everything (Thread 2), and the speed is about 700 Mbps.
The reason I use two concurrent queues is that I want the communication and type conversion work to be processed in the background, independently of the main control procedure.
Here is the code for my own blocking concurrent queue:
public class BlockingConcurrentQueue<T> : IDisposable
{
private readonly ConcurrentQueue<T> _internalQueue;
private AutoResetEvent _autoResetEvent;
private long _consumed;
private long _isAddingCompleted = 0;
private long _produced;
private long _sleeping;
public BlockingConcurrentQueue()
{
_internalQueue = new ConcurrentQueue<T>();
_produced = 0;
_consumed = 0;
_sleeping = 0;
_autoResetEvent = new AutoResetEvent(false);
}
public bool IsAddingCompleted
{
get
{
return Interlocked.Read(ref _isAddingCompleted) == 1;
}
}
public bool IsCompleted
{
get
{
if (Interlocked.Read(ref _isAddingCompleted) == 1 && _internalQueue.IsEmpty)
return true;
else
return false;
}
}
public void CompleteAdding()
{
Interlocked.Exchange(ref _isAddingCompleted, 1);
}
public void Dispose()
{
_autoResetEvent.Dispose();
}
public void Enqueue(T item)
{
// Reject the item before it is queued once adding has been completed.
if (Interlocked.Read(ref _isAddingCompleted) == 1)
throw new InvalidOperationException("Adding Completed.");
_internalQueue.Enqueue(item);
Interlocked.Increment(ref _produced);
if (Interlocked.Read(ref _sleeping) == 1)
{
Interlocked.Exchange(ref _sleeping, 0);
_autoResetEvent.Set();
}
}
public bool TryDequeue(out T result)
{
if (Interlocked.Read(ref _consumed) == Interlocked.Read(ref _produced))
{
Interlocked.Exchange(ref _sleeping, 1);
_autoResetEvent.WaitOne();
}
if (_internalQueue.TryDequeue(out result))
{
Interlocked.Increment(ref _consumed);
return true;
}
return false;
}
}
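For comparison, .NET ships `BlockingCollection<T>`, which wraps a `ConcurrentQueue<T>` by default and already provides blocking takes, `CompleteAdding`, and a bounded capacity. The sketch below shows one producer and one consumer stage built on it. `PipelineSketch`, `Run`, and the capacity of 64 are made up for illustration, and the consumer only sums chunk lengths where the real code would convert the data.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Produces the given chunks on one task and consumes them on another,
    // returning the total number of bytes the consumer saw.
    public static long Run(IEnumerable<byte[]> chunks, int capacity = 64)
    {
        using (var queue = new BlockingCollection<byte[]>(capacity))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    // Add blocks when the queue is full (back-pressure).
                    foreach (var chunk in chunks) queue.Add(chunk);
                }
                finally
                {
                    // Lets the consumer's loop end once the queue drains.
                    queue.CompleteAdding();
                }
            });

            var consumer = Task.Run(() =>
            {
                long total = 0;
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (var chunk in queue.GetConsumingEnumerable())
                {
                    total += chunk.Length;
                }
                return total;
            });

            producer.Wait();
            return consumer.Result;
        }
    }
}
```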
My question is this:
As mentioned above, concurrent_queue1's element type is byte[65536], and 65536 bytes = 8192 doubles.
(40 * 3500 = 8192 * 17.08984375)
I want to merge multiple blocks of 8192 doubles into a double[40,3500] (the size can change) and enqueue it to concurrent_queue2 with Thread2.
It's easy to do with a naive approach (using many complex for loops), but that is slow because it copies all the data and exposes it to the upper class or layer.
I'm looking for a way to enqueue automatically with the matching size, the way a foreach loop automatically iterates through a 2D array in row-major order, but I haven't found one yet.
Is there a fast way to merge 1D byte arrays into a 2D double array and enqueue it?
Thanks for your help!
I tried to understand your conversion rule and wrote this conversion code. It uses Parallel.For to speed up the calculation.
int maxSize = 65536;
byte[] dim1Array = new byte[maxSize];
for (int i = 0; i < maxSize; ++i)
{
dim1Array[i] = (byte)(i % 256);
}
int dim2Row = 40;
int dim2Column = 3500;
int byteToDoubleRatio = 8;
int toDoubleSize = maxSize / byteToDoubleRatio;
double[,] dim2Array = new double[dim2Row, dim2Column];
Parallel.For(0, toDoubleSize, i =>
{
int row = i / dim2Column;
int col = i % dim2Column;
int originByteIndex = row * dim2Column * byteToDoubleRatio + col * byteToDoubleRatio;
dim2Array[row, col] = BitConverter.ToDouble(
dim1Array,
originByteIndex);
});
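Because 40 * 3500 doubles is not a whole number of 8192-double chunks, frames will straddle chunk boundaries. The sketch below is one possible way to handle that: it uses `Buffer.BlockCopy`, which copies raw bytes and accepts a multi-dimensional primitive array as the destination, filling it in row-major order. `FrameAssembler` and `Append` are hypothetical names, and the sketch assumes the incoming bytes are already in the machine's native byte order for doubles.

```csharp
using System;

public sealed class FrameAssembler
{
    private readonly int _rows;
    private readonly int _cols;
    private readonly int _frameBytes;
    private double[,] _frame;
    private int _filledBytes; // bytes already written into the current frame

    public FrameAssembler(int rows, int cols)
    {
        _rows = rows;
        _cols = cols;
        _frameBytes = rows * cols * sizeof(double);
        _frame = new double[rows, cols];
    }

    // Appends one received chunk; onFrame is called for every frame it completes.
    public void Append(byte[] chunk, Action<double[,]> onFrame)
    {
        int source = 0;
        while (source < chunk.Length)
        {
            // Copy as much as fits into the current frame.
            int count = Math.Min(chunk.Length - source, _frameBytes - _filledBytes);
            Buffer.BlockCopy(chunk, source, _frame, _filledBytes, count);
            source += count;
            _filledBytes += count;

            if (_filledBytes == _frameBytes)
            {
                onFrame(_frame);                    // e.g. enqueue to the second queue
                _frame = new double[_rows, _cols];  // start a fresh frame
                _filledBytes = 0;
            }
        }
    }
}
```

Each byte is copied once and there is no per-element `BitConverter` call. In the pipeline described in the question, `onFrame` would be something like `frame => queue2.Enqueue(frame)`.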
I've been having trouble running multiple tasks with heavy operations.
It seems as if the tasks are killed before all the operations are complete.
The code here is an example I used to replicate the issue. If I add something like Debug.Write(), the added wait for writing fixes the issue. The issue also disappears if I test on a smaller sample size. The reason there is a class in the example below is to create complexity for the test.
The real case where I first encountered the issue is too complicated to explain in a post here.
public static class StaticRandom
{
static int seed = Environment.TickCount;
static readonly ThreadLocal<Random> random =
new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seed)));
public static int Next()
{
return random.Value.Next();
}
public static int Next(int maxValue)
{
return random.Value.Next(maxValue);
}
public static double NextDouble()
{
return random.Value.NextDouble();
}
}
// this is the test function I run to recreate the problem:
static void tasktest()
{
var testlist = new List<ExampleClass>();
for (var index = 0; index < 10000; ++index)
{
var newClass = new ExampleClass();
newClass.Populate(Enumerable.Range(0, 1000).ToList());
testlist.Add(newClass);
}
var anotherClassList = new List<ExampleClass>();
var threadNumber = 5;
if (threadNumber > testlist.Count)
{
threadNumber = testlist.Count;
}
var taskList = new List<Task>();
var tokenSource = new CancellationTokenSource();
CancellationToken cancellationToken = tokenSource.Token;
int stuffPerThread = testlist.Count / threadNumber;
var stuffCounter = 0;
for (var count = 1; count <= threadNumber; ++count)
{
var toSkip = stuffCounter;
var threadWorkLoad = stuffPerThread;
var currentIndex = count;
// these ifs make sure all the indexes are covered
if (stuffCounter + threadWorkLoad > testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
else if (count == threadNumber && stuffCounter + threadWorkLoad < testlist.Count)
{
threadWorkLoad = testlist.Count - stuffCounter;
}
taskList.Add(Task.Factory.StartNew(() => taskfunc(testlist, anotherClassList, toSkip, threadWorkLoad),
cancellationToken, TaskCreationOptions.None, TaskScheduler.Default));
stuffCounter += stuffPerThread;
}
Task.WaitAll(taskList.ToArray());
}
public class ExampleClass
{
public ExampleClassInner[] Inners { get; set; }
public ExampleClass()
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner();
}
}
public void Populate(List<int> intlist) {/*adds random ints to the inner class*/}
public ExampleClass(ExampleClass copyFrom)
{
Inners = new ExampleClassInner[5];
for (var index = 0; index < Inners.Length; ++index)
{
Inners[index] = new ExampleClassInner(copyFrom.Inners[index]);
}
}
public class ExampleClassInner
{
public bool SomeBool { get; set; } = false;
public int SomeInt { get; set; } = -1;
public ExampleClassInner()
{
}
public ExampleClassInner(ExampleClassInner copyFrom)
{
SomeBool = copyFrom.SomeBool;
SomeInt = copyFrom.SomeInt;
}
}
}
static int expensivefunc(int theint)
{
/*a lot of pointless arithmetic and loops done only on primitives and with primitives,
just to increase the complexity*/
theint *= theint + 1;
var anotherlist = Enumerable.Range(0, 10000).ToList();
for (var index = 0; index < anotherlist.Count; ++index)
{
theint += index;
if (theint % 5 == 0)
{
theint *= index / 2;
}
}
var yetanotherlist = Enumerable.Range(0, 50000).ToList();
for (var index = 0; index < yetanotherlist.Count; ++index)
{
theint += index;
if (theint % 7 == 0)
{
theint -= index / 3;
}
}
while (theint > 8)
{
theint /= 2;
}
return theint;
}
// this function is intentionally creating a lot of objects, to simulate complexity
static void taskfunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
if (take == 0)
{
take = intlist.Count;
}
var partial = intlist.Skip(skip).Take(take).ToList();
for (var index = 0; index < partial.Count; ++index)
{
var testint = expensivefunc(index);
var newClass = new ExampleClass(partial[index]);
newClass.Inners[StaticRandom.Next(5)].SomeInt = testint;
anotherClassList.Add(new ExampleClass(newClass));
}
}
The expected result is that the list anotherClassList will be the same size as testlist, and this is what happens when the lists are smaller or the task operations are less complex. However, when I increase the volume of operations, anotherClassList has a few items missing, and sometimes some of the entries in the list are null.
Why does this happen even though I call Task.WaitAll?
Your problem is that it's just not thread-safe; you can't add to a List<T> from multiple threads and expect it to play nicely.
One way is to use lock or a thread-safe collection, but I feel this should all be refactored (my OCD is going off all over the place).
private static object _sync = new object();
...
private static void TaskFunc(List<ExampleClass> intlist, List<ExampleClass> anotherClassList, int skip, int take)
{
...
var partial = intlist.Skip(skip).Take(take).ToList();
...
// note that locking here will likely drastically decrease any performance threading gain
lock (_sync)
{
for (var index = 0; index < partial.Count; ++index)
{
// this is your problem, you are adding to a list from multiple threads
anotherClassList.Add(...);
}
}
}
In short, I think you need to think more carefully about the threading logic of your method, identify what you are trying to achieve, and work out how to make it conceptually thread-safe (while keeping your performance gains).
After TheGeneral enlightened me that Lists are not thread-safe, I changed the List I was adding to from multiple threads to an array, and this fixed my issue.
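An array helps here only if each task writes to its own slots: the array never resizes, and writes to distinct elements from different threads do not interfere with each other. A minimal sketch of that idea follows; `IndexedFill` and `SquareAll` are made-up names, and squaring stands in for the expensive work.

```csharp
using System;
using System.Threading.Tasks;

public static class IndexedFill
{
    // Each iteration writes only to output[i], so no lock is needed.
    public static int[] SquareAll(int[] input)
    {
        var output = new int[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            output[i] = input[i] * input[i];
        });
        // Parallel.For returns only after every iteration has finished.
        return output;
    }
}
```

If two tasks could ever write to the same index, this would need a lock or a thread-safe collection again.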
I have a restartable program that runs over a very large space and I have started parallelizing it some. Each Task runs independently and updates a database with its results. It doesn't matter if tasks are repeated (they are fully deterministic based on the input array and will simply generate the same result they did before), but doing so is relatively inefficient. So far I have come up with the following pattern:
static void Main(string[] args) {
GeneratorStart = Storage.Load();
var tasks = new List<Task>();
foreach (int[] temp in Generator()) {
var arr = temp;
var task = new Task(() => {
//... use arr as needed
});
task.Start();
tasks.Add(task);
if (tasks.Count > 4) {
Task.WaitAll(tasks.ToArray());
Storage.UpdateStart(temp);
tasks = new List<Task>();
}
}
}
Prior to making the generator restartable, I had a simple Parallel.ForEach loop over it, and it was a bit faster. I think I am losing some CPU time in the WaitAll operation. How can I get rid of this bottleneck while keeping track of which tasks I don't have to run again when I restart?
Other bits for those concerned (shortened for brevity to question):
class Program {
static bool Done = false;
static int[] GeneratorStart = null;
static IEnumerable<int[]> Generator() {
var s = new Stack<int>();
//... omitted code to initialize stack to GeneratorStart for brevity
yield return s.ToArray();
while (!Done) {
Increment(s);
yield return s.Reverse().ToArray();
}
}
static int Base = 25600; //example number (none of this is important
static void Increment(Stack<int> stack) { //outside the fact
if (stack.Count == 0) { //that it is generating an array
stack.Push(1); //of a large base
return; //behaving like an integer
} //with each digit stored in an
int i = stack.Pop(); //array position)
i++;
if (i < Base) {
stack.Push(i);
return;
}
Increment(stack);
stack.Push(0);
}
}
I've come up with this:
var tasks = new Queue<Pair<int[],Task>>();
foreach (var temp in Generator()) {
var arr = temp;
tasks.Enqueue(new Pair<int[], Task>(arr, Task.Run(() => {
//... use arr as needed
})));
// Collect the tasks that are still running.
var tArray = tasks.Select(v => v.B).Where(t => !t.IsCompleted).ToArray();
if (tArray.Length > 7) {
Task.WaitAny(tArray);
var first = tasks.Peek();
while (first != null && first.B.IsCompleted) {
Storage.UpdateStart(first.A);
tasks.Dequeue();
first = tasks.Count == 0 ? null : tasks.Peek();
}
}
}
...
class Pair<TA,TB> {
public TA A { get; set; }
public TB B { get; set; }
public Pair(TA a, TB b) { A = a; B = b; }
}
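The queue above only advances the restart point while the oldest task is complete, which can also be expressed as tracking the contiguous prefix of finished work. The sketch below shows that bookkeeping on its own, assuming each generated array is given a sequence number (0, 1, 2, ...) as it is produced. `CheckpointTracker` and `Complete` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class CheckpointTracker
{
    private readonly object _gate = new object();
    private readonly HashSet<long> _done = new HashSet<long>();
    private long _next; // lowest sequence number that has not finished yet

    // Marks a sequence number as finished and returns how many items are
    // complete contiguously from the start. Everything below the returned
    // value can be skipped after a restart.
    public long Complete(long sequence)
    {
        lock (_gate)
        {
            _done.Add(sequence);
            // Advance past every finished item that is now contiguous.
            while (_done.Remove(_next))
            {
                _next++;
            }
            return _next;
        }
    }
}
```

A task would call `Complete` with its own sequence number when it finishes, and the caller would persist the matching array whenever the returned value grows.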
Let's say I have two sequences returning integers 1 to 5.
The first returns 1, 2 and 3 very fast, but 4 and 5 take 200ms each.
public static IEnumerable<int> FastFirst()
{
for (int i = 1; i < 6; i++)
{
if (i > 3) Thread.Sleep(200);
yield return i;
}
}
The second returns 1, 2 and 3 with a 200ms delay, but 4 and 5 are returned fast.
public static IEnumerable<int> SlowFirst()
{
for (int i = 1; i < 6; i++)
{
if (i < 4) Thread.Sleep(200);
yield return i;
}
}
Unioning both these sequences give me just numbers 1 to 5.
FastFirst().Union(SlowFirst());
I cannot guarantee which of the two methods has delays at what point, so no fixed order of execution solves this for me. Therefore, I would like to parallelise the union, in order to minimise the (artificial) delay in my example.
A real-world scenario: I have a cache that returns some entities, and a datasource that returns all entities. I'd like to be able to return an iterator from a method that internally parallelises the request to both the cache and the datasource so that the cached results yield as fast as possible.
Note 1: I realise this is still wasting CPU cycles; I'm not asking how can I prevent the sequences from iterating over their slow elements, just how I can union them as fast as possible.
Update 1: I've tailored achitaka-san's great response to accept multiple producers, and to use ContinueWhenAll to set the BlockingCollection's CompleteAdding just the once. I just put it here since it would get lost in the lack of comments formatting. Any further feedback would be great!
public static IEnumerable<TResult> SelectAsync<TResult>(
params IEnumerable<TResult>[] producer)
{
var resultsQueue = new BlockingCollection<TResult>();
var taskList = new HashSet<Task>();
foreach (var result in producer)
{
taskList.Add(
Task.Factory.StartNew(
() =>
{
foreach (var product in result)
{
resultsQueue.Add(product);
}
}));
}
Task.Factory.ContinueWhenAll(taskList.ToArray(), x => resultsQueue.CompleteAdding());
return resultsQueue.GetConsumingEnumerable();
}
Take a look at this.
The first method just returns everything in the order the results arrive.
The second checks uniqueness. If you chain them, I think you will get the result you want.
public static class Class1
{
public static IEnumerable<TResult> SelectAsync<TResult>(
IEnumerable<TResult> producer1,
IEnumerable<TResult> producer2,
int capacity)
{
var resultsQueue = new BlockingCollection<TResult>(capacity);
var producer1Done = false;
var producer2Done = false;
Task.Factory.StartNew(() =>
{
foreach (var product in producer1)
{
resultsQueue.Add(product);
}
producer1Done = true;
if (producer1Done && producer2Done) { resultsQueue.CompleteAdding(); }
});
Task.Factory.StartNew(() =>
{
foreach (var product in producer2)
{
resultsQueue.Add(product);
}
producer2Done = true;
if (producer1Done && producer2Done) { resultsQueue.CompleteAdding(); }
});
return resultsQueue.GetConsumingEnumerable();
}
public static IEnumerable<TResult> SelectAsyncUnique<TResult>(this IEnumerable<TResult> source)
{
HashSet<TResult> knownResults = new HashSet<TResult>();
foreach (TResult result in source)
{
if (knownResults.Contains(result)) {continue;}
knownResults.Add(result);
yield return result;
}
}
}
The cache would be nearly instant compared to fetching from the database, so you could read from the cache first and return those items, then read from the database and return the items except those that were found in the cache.
If you try to parallelise this, you will add a lot of complexity but get quite a small gain.
Edit:
If there is no predictable difference in the speed of the sources, you could run them in threads and use a synchronised hash set to keep track of which items you have already got, put the new items in a queue, and let the main thread read from the queue:
public static IEnumerable<TItem> GetParallel<TItem, TKey>(Func<TItem, TKey> getKey, params IEnumerable<TItem>[] sources) {
HashSet<TKey> found = new HashSet<TKey>();
List<TItem> queue = new List<TItem>();
object sync = new object();
int alive = 0;
object aliveSync = new object();
foreach (IEnumerable<TItem> source in sources) {
lock (aliveSync) {
alive++;
}
new Thread(s => {
foreach (TItem item in s as IEnumerable<TItem>) {
TKey key = getKey(item);
lock (sync) {
if (found.Add(key)) {
queue.Add(item);
}
}
}
lock (aliveSync) {
alive--;
}
}).Start(source);
}
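while (true) {
// Read the liveness flag before draining, so items added by a producer
// just before it finished are not lost.
bool done;
lock (aliveSync) {
done = alive == 0;
}
TItem[] batch;
lock (sync) {
batch = queue.ToArray();
queue.Clear();
}
// Yield outside the lock so producers are not blocked by the consumer.
foreach (TItem item in batch) {
yield return item;
}
if (done) break;
Thread.Sleep(100);
}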
}
Test stream:
public static IEnumerable<int> SlowRandomFeed(Random rnd) {
int[] values = new int[100];
for (int i = 0; i < 100; i++) {
int pos = rnd.Next(i + 1);
values[i] = i;
int temp = values[pos];
values[pos] = values[i];
values[i] = temp;
}
foreach (int value in values) {
yield return value;
Thread.Sleep(rnd.Next(200));
}
}
Test:
Random rnd = new Random();
foreach (int item in GetParallel(n => n, SlowRandomFeed(rnd), SlowRandomFeed(rnd), SlowRandomFeed(rnd), SlowRandomFeed(rnd))) {
Console.Write("{0:0000 }", item);
}