Due to Twitter's rate limit of 180 requests per 15 minutes, I made this implementation that adds a delay to the task, but it doesn't seem to work. What's the issue with it?
What I implemented is essentially a 15-minute wait after every 180 requests. Is my implementation correct?
var currentRequestIndex = 1;
var timeToDelay = 0;
foreach (var item in items)
{
    var contactFeed = item;
    if (currentRequestIndex % 180 == 0)
    {
        timeToDelay = currentRequestIndex * 5000;
    }
    Delay(timeToDelay * 5000).ContinueWith(_ => Task.Factory.StartNew(
        () =>
        {
            // call to twitter api here
        }
    ));
    currentRequestIndex++;
}
public Task Delay(int milliseconds)
{
    var tcs = new TaskCompletionSource<object>();
    new Timer(_ => tcs.SetResult(null)).Change(milliseconds, -1);
    return tcs.Task;
}
Well, you set timeToDelay to 0 and then wait timeToDelay * 5000, which, given the former, is also 0. So the first 179 requests fire with no delay at all, and when timeToDelay finally does get set, you multiply it by 5000 a second time.
Solution 1 - Spread them evenly
Let's assume your network has no lag and all requests are sent to Twitter immediately. Then, in order to spread your requests evenly over the 15-minute interval, you should delay the ith request by precisely i * 15 * 60 * 1000 / 180 milliseconds, i.e. 5,000 ms per request.
var currentRequestIndex = 1;
foreach (var item in items)
{
    var contactFeed = item;
    var timeToDelay = currentRequestIndex * (15 * 60 * 1000 / 180); // 5,000 ms per request
    Delay(timeToDelay).ContinueWith(_ => Task.Factory.StartNew(
        () =>
        {
            // call to twitter api here
        }
    ));
    currentRequestIndex++;
}
Solution 2 - Send them all at once, wait for the rest of the 15 minutes to pass
I'll just post the code; it's pretty much self-explanatory.
Action makeRequests = null;
makeRequests = () =>
{
    DateTime start = DateTime.Now;
    foreach (var item in items)
    {
        // Call twitter api here
    }
    TimeSpan diff = DateTime.Now - start;
    Delay(15 * 60 * 1000 - (int)diff.TotalMilliseconds).ContinueWith(_ => Task.Factory.StartNew(makeRequests));
};
makeRequests();
P.S. By the looks of it, you are using .NET 4.0, but if I'm mistaken and you are compiling against 4.5, you can use the built-in Task.Delay method.
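For what it's worth, here is a minimal sketch of the same even-spacing idea on .NET 4.5+, using async/await and the built-in Task.Delay (CallTwitterApiAsync is a hypothetical placeholder for the actual API call, and items is assumed to be in scope):
// Sketch for .NET 4.5+: spread the 180 requests evenly across the 15-minute window.
// CallTwitterApiAsync is a hypothetical placeholder for the actual API call.
public async Task RunThrottledAsync()
{
    const int delayPerRequest = 15 * 60 * 1000 / 180; // 5,000 ms per request
    foreach (var item in items)
    {
        await CallTwitterApiAsync(item); // do the work first...
        await Task.Delay(delayPerRequest); // ...then wait before the next request
    }
}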
I am using C# and ASP.NET Core 6 MVC.
I have a requirement to fetch all results from an API using an offset, whether there are just 64 records or 6,300. I need to adjust the offset and make concurrent or parallel calls to get all the records, in the best way possible.
I am calling an API which returns at most 100 records per call, although the overall total (totalResult) can be around 65, 120, 1500, 2520, 6534, etc. There is an integer offset I can pass to the API to fetch the next 100 results each time; by default it is zero, which brings back at most the first 100 records.
For example, for a totalResult of 65, offset 0 is sufficient, as it will bring back all 65 records. For a totalResult of 150, offset 0 will bring 100 records, and the next iteration needs offset 100 to bring the rest. Likewise, for 6,530 records the offset has to be advanced to 100, 200, 300... to get all the results.
Now, I need to run this task in parallel to avoid delays.
This is my function:
var offset = 0;
// My async call method
var addressResult = await _postcode.GetAddresses(strPostcode, offset);
if (addressResult?.Results != null && addressResult.Results.Any())
{
    // concurrency code to run here with offset
    int total = addressResult.Header.TotalResults; // total result count, e.g. 6500
    var thePostcoderesult = addressResult.Results;
    // max result could be any number, depending on the total result count
    int maxresult = thePostcoderesult.Count();
}
So in the end, when all the concurrent API calls finish, thePostcoderesult should have all the results added to it:
var thePostcoderesult = addressResult.Results;
Now, I am aware we can achieve this through
await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
with the help of the accepted answer to How to make multiple API calls faster?
I tried implementing that logic, but it only gives me up to 1,000 results; something about the offset and the parallel loop is not aligned. The tasks run only 10 times, producing 1,000 results, although there are 1,630 results for the postcode I am searching.
Here is my updated code, but as mentioned, it does not wait to finish, nor does it run through the total number of offsets.
var addressResult = await _postcode.GetAddresses(strPostcode, 0);
if (addressResult?.Results != null && addressResult.Results.Any())
{
    int total = addressResult.Header.TotalResults;
    // Setting the offsets here - but something is not right
    IEnumerable<int> offsets = Enumerable
        .Range(0, total)
        .Select(n => checked(n * 100))
        .TakeWhile(offset => offset < Volatile.Read(ref total));
    // wanted to use 10 parallel threads, which I believe is a safe bet
    var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
    var thePostcoderesult = new List<AddressResult>();
    await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
    {
        var pageResult = await _postcode.GetAddresses(strPostcode, offset);
        if (offset == 0)
        {
            // I am not using it
            //Volatile.Write(ref total, Jresult.Results.Count());
        }
        thePostcoderesult.AddRange(pageResult.Results);
    });
    return thePostcoderesult;
}
Apologies in advance for the detailed post. If you can help me do this in a more correct or neater way, please do.
Many thanks.
You've got a lot going on there; I don't think it needs to be quite that complicated. Since the initial GetAddresses call tells you how many records you're going to have, you can do something like this:
var initialResponse = await _postcode.GetAddresses(strPostcode, 0);
if (initialResponse?.Results == null || !initialResponse.Results.Any())
{
    return Array.Empty<AddressResult>();
}
var totalPostCodeResults = new AddressResult[initialResponse.Header.TotalResults];
// fill up to the first 100 since you have it, and bail if that's all there is
FillItems(initialResponse.Results, totalPostCodeResults, 0);
if (totalPostCodeResults.Length <= 100)
    return totalPostCodeResults;
// Fill the offsets (aka start indexes) starting at 100
// (the final partial page is already covered, since the loop adds an offset for it)
var offsets = new List<int>();
var offset = 100;
while (offset < totalPostCodeResults.Length)
{
    offsets.Add(offset);
    offset += 100;
}
// Kick off a task for each offset range
var tasks = new Task[offsets.Count];
for (int i = 0; i < tasks.Length; i++)
{
    // copy i to a scoped variable to avoid parallel messiness
    var index = i;
    tasks[index] = Task.Run(async () =>
    {
        var response = await _postcode.GetAddresses(strPostcode, offsets[index]);
        FillItems(response.Results, totalPostCodeResults, offsets[index]);
    });
}
// Wait for all of them to finish
await Task.WhenAll(tasks);
return totalPostCodeResults;

void FillItems(List<AddressResult> results, AddressResult[] totalArray, int startIndex)
{
    var index = startIndex;
    results.ForEach(item => totalArray[index++] = item);
}
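If you would rather stick with the Parallel.ForEachAsync approach from your question, here is a minimal sketch under the same assumptions (_postcode.GetAddresses and AddressResult come from your code; ConcurrentBag lives in System.Collections.Concurrent and, unlike List<T>, is safe to add to from multiple threads):
// Sketch only: page through the API with Parallel.ForEachAsync.
var firstPage = await _postcode.GetAddresses(strPostcode, 0);
if (firstPage?.Results == null || !firstPage.Results.Any())
    return new List<AddressResult>();

int total = firstPage.Header.TotalResults;
var bag = new ConcurrentBag<AddressResult>(firstPage.Results);

// One offset per remaining page: 100, 200, 300, ...
var offsets = Enumerable.Range(1, (total - 1) / 100).Select(n => n * 100);
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

await Parallel.ForEachAsync(offsets, options, async (pageOffset, ct) =>
{
    var page = await _postcode.GetAddresses(strPostcode, pageOffset);
    foreach (var r in page.Results)
        bag.Add(r); // ConcurrentBag is safe for concurrent adds
});

return bag.ToList();
One caveat: ConcurrentBag does not preserve order, so if result order matters, the indexed-array approach above is the better fit.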
This question is a follow up on How to optimize performance in a simple TPL DataFlow pipeline?
The source code is here - https://github.com/MarkKharitonov/LearningTPLDataFlow
Given:
Several solutions covering about 400 C# projects, encompassing thousands of C# source files totaling more than 10,000,000 lines of code.
A file containing string literals, one per line.
I want to produce a JSON file listing all the occurrences of the literals in the source code. For every matching line I want to have the following pieces of information:
The project path
The C# file path
The matching line itself
The matching line number
And all the records arranged as a dictionary keyed by the respective literal.
So the challenge is to do it as efficiently as possible (in C#, of course).
The DataFlow pipeline can be found in this file - https://github.com/MarkKharitonov/LearningTPLDataFlow/blob/master/FindStringCmd.cs
Here it is:
private void Run(string workspaceRoot, string outFilePath, string[] literals, bool searchAllFiles, int workSize, int maxDOP1, int maxDOP2, int maxDOP3, int maxDOP4)
{
    var res = new SortedDictionary<string, List<MatchingLine>>();
    var projects = (workspaceRoot + "build\\projects.yml").YieldAllProjects();
    var progress = new Progress();
    var taskSchedulerPair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, Environment.ProcessorCount);
    var produceCSFiles = new TransformManyBlock<ProjectEx, CSFile>(p => YieldCSFiles(p, searchAllFiles), new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDOP1
    });
    var produceCSFileContent = new TransformBlock<CSFile, CSFile>(CSFile.PopulateContentAsync, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDOP2
    });
    var produceWorkItems = new TransformManyBlock<CSFile, (CSFile CSFile, int Pos, int Length)>(csFile => csFile.YieldWorkItems(literals, workSize, progress), new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDOP3,
        TaskScheduler = taskSchedulerPair.ConcurrentScheduler
    });
    var produceMatchingLines = new TransformManyBlock<(CSFile CSFile, int Pos, int Length), MatchingLine>(o => o.CSFile.YieldMatchingLines(literals, o.Pos, o.Length, progress), new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxDOP4,
        TaskScheduler = taskSchedulerPair.ConcurrentScheduler
    });
    var getMatchingLines = new ActionBlock<MatchingLine>(o => AddResult(res, o));
    var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
    produceCSFiles.LinkTo(produceCSFileContent, linkOptions);
    produceCSFileContent.LinkTo(produceWorkItems, linkOptions);
    produceWorkItems.LinkTo(produceMatchingLines, linkOptions);
    produceMatchingLines.LinkTo(getMatchingLines, linkOptions);
    var progressTask = Task.Factory.StartNew(() =>
    {
        var delay = literals.Length < 10 ? 1000 : 10000;
        for (; ; )
        {
            var current = Interlocked.Read(ref progress.Current);
            var total = Interlocked.Read(ref progress.Total);
            Console.Write("Total = {0:n0}, Current = {1:n0}, Percents = {2:P} \r", total, current, ((double)current) / total);
            if (progress.Done)
            {
                break;
            }
            Thread.Sleep(delay);
        }
        Console.WriteLine();
    }, TaskCreationOptions.LongRunning);
    projects.ForEach(p => produceCSFiles.Post(p));
    produceCSFiles.Complete();
    getMatchingLines.Completion.GetAwaiter().GetResult();
    progress.Done = true;
    progressTask.GetAwaiter().GetResult();
    res.SaveAsJson(outFilePath);
}
The default parameters are (https://github.com/MarkKharitonov/LearningTPLDataFlow/blob/master/FindStringCmd.cs#L24-L28):
private int m_maxDOP1 = 3;
private int m_maxDOP2 = 20;
private int m_maxDOP3 = Environment.ProcessorCount;
private int m_maxDOP4 = Environment.ProcessorCount;
private int m_workSize = 1_000_000;
My idea is to divide the work into work items, where a work item size is computed by multiplying the number of lines in the respective file by the count of the string literals. So, if a C# file contains 500 lines, then searching it for all the 3401 literals results in a work of size 3401 * 500 = 1700500
A work item is capped at 1,000,000 such units by default, so in the aforementioned example the file would result in 2 work items:
Literals 0..1999
Literals 2000..3400
And it is the responsibility of the produceWorkItems block to generate these work items from files.
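For illustration only, here is a rough sketch of what such a work-item generator could look like (the real YieldWorkItems lives in the linked repo; the EstimatedLineCount member is an assumption for this sketch):
// Rough sketch - the actual implementation is in the linked repo.
// Splits the literal range into chunks so that each work item's size
// (file line count x number of literals in the chunk) stays <= workSize.
static IEnumerable<(CSFile CSFile, int Pos, int Length)> YieldWorkItemsSketch(
    CSFile csFile, string[] literals, int workSize)
{
    int lineCount = csFile.EstimatedLineCount; // assumed member name
    int literalsPerItem = Math.Max(1, workSize / Math.Max(1, lineCount));
    for (int pos = 0; pos < literals.Length; pos += literalsPerItem)
    {
        yield return (csFile, pos, Math.Min(literalsPerItem, literals.Length - pos));
    }
}
With a 500-line file, workSize = 1,000,000 and 3,401 literals, this yields exactly the two work items from the example: (0, 2000) and (2000, 1401).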
Example runs:
C:\work\TPLDataFlow [master ≡]> .\bin\Debug\net5.0\TPLDataFlow.exe find-string -d C:\xyz\tip -o c:\temp -l C:\temp\2.txt
Locating all the instances of the 4 literals found in the file C:\temp\2.txt in the C# code ...
Total = 49,844,516, Current = 49,702,532, Percents = 99.72%
Elapsed: 00:00:18.4320676
C:\work\TPLDataFlow [master ≡]> .\bin\Debug\net5.0\TPLDataFlow.exe find-string -d C:\xyz\tip -o c:\temp -l C:\temp\1.txt
Locating all the instances of the 3401 literals found in the file c:\temp\1.txt in the C# code ...
Total = 42,379,095,775, Current = 42,164,259,870, Percents = 99.49%
Elapsed: 01:44:13.4289270
Question
Many work items are undersized. If I have 3 C# files, 20 lines each, my current code would produce 3 work items, because in my current implementation work items never cross a file boundary. This is inefficient. Ideally, they would be batched into a single work item, because 60 * 3401 = 204060 < 1000000. But the BatchBlock cannot be used here, because it expects me to provide the batch size, which I do not know - it depends on the work items in the pipeline.
How would you achieve such batching?
I have realized something. Maybe it is obvious, but I have just figured it out: the TPL DataFlow library is of no use if one can buffer all the items first. In my case I can do that, so I can buffer and sort the items from large to small, and then a simple Parallel.ForEach will do the work just fine. Having realized that, I changed my implementation to use Reactive Extensions like this:
Phase 1 - get all the items, this is where all the IO is
var input = (workspaceRoot + "build\\projects.yml")
    .YieldAllProjects()
    .ToObservable()
    .Select(project => Observable.FromAsync(() => Task.Run(() => YieldFiles(project, searchAllFiles))))
    .Merge(2)
    .SelectMany(files => files)
    .Select(file => Observable.FromAsync(file.PopulateContentAsync))
    .Merge(10)
    .ToList()
    .GetAwaiter().GetResult()
    .AsList();
input.Sort((x, y) => y.EstimatedLineCount - x.EstimatedLineCount);
Phase 2 - find all the matching lines (CPU only)
var res = new SortedDictionary<string, List<MatchingLine>>();
input
    .ToObservable()
    .Select(file => Observable.FromAsync(() => Task.Run(() => file.YieldMatchingLines(literals, 0, literals.Count, progress).ToList())))
    .Merge(maxDOP.Value)
    .ToList()
    .GetAwaiter().GetResult()
    .SelectMany(m => m)
    .ForEach(m => AddResult(res, m));
So, even though I have hundreds of projects, thousands of files and millions of lines of code, it is not the scale for TPL DataFlow, because my tool can read all the files into memory, rearrange them in a favorable order and then process them.
Regarding the first question (configuring the pipeline), I can't really offer any guidance. Optimizing the parameters of a dataflow pipeline seems like a black art to me!
Regarding the second question (how to batch a work load consisting of work items having unknown size at compile time), you could use the custom BatchBlock<T> below. It uses the DataflowBlock.Encapsulate method in order to combine two dataflow blocks into one. The first block is an ActionBlock<T> that receives the input and puts it into a buffer, and the second is a BufferBlock<T[]> that holds the batched items and propagates them downstream. The weightSelector is a lambda that returns the weight of each received item. When the accumulated weight surpasses the batchWeight threshold, a batch is emitted.
public static IPropagatorBlock<T, T[]> CreateDynamicBatchBlock<T>(
int batchWeight, Func<T, int> weightSelector,
DataflowBlockOptions options = null)
{
// Arguments validation omitted
options ??= new DataflowBlockOptions();
var outputBlock = new BufferBlock<T[]>(options);
List<T> buffer = new List<T>();
int sumWeight = 0;
var inputBlock = new ActionBlock<T>(async item =>
{
checked
{
int weight = weightSelector(item);
if (weight + sumWeight > batchWeight && buffer.Count > 0)
await SendBatchAsync();
buffer.Add(item);
sumWeight += weight;
if (sumWeight >= batchWeight) await SendBatchAsync();
}
}, new()
{
BoundedCapacity = options.BoundedCapacity,
CancellationToken = options.CancellationToken,
TaskScheduler = options.TaskScheduler,
MaxMessagesPerTask = options.MaxMessagesPerTask,
NameFormat = options.NameFormat
});
PropagateCompletion(inputBlock, outputBlock, async () =>
{
if (buffer.Count > 0) await SendBatchAsync();
});
Task SendBatchAsync()
{
var batch = buffer.ToArray();
buffer.Clear();
sumWeight = 0;
return outputBlock.SendAsync(batch);
}
static async void PropagateCompletion(IDataflowBlock source,
IDataflowBlock target, Func<Task> postCompletionAction)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
Exception ex =
source.Completion.IsFaulted ? source.Completion.Exception : null;
try { await postCompletionAction(); }
catch (Exception actionError) { ex = actionError; }
if (ex != null) target.Fault(ex); else target.Complete();
}
return DataflowBlock.Encapsulate(inputBlock, outputBlock);
}
Usage example:
var batchBlock = CreateDynamicBatchBlock<WorkItem>(1_000_000, wi => wi.Size);
If the int weight type doesn't have enough range and overflows, you could switch to long or double.
I'm trying to send/publish a message every 100 ms or so, and the message looks like this:
x.x.x.x.x.x.x.x.x.x
So every 100 ms or so, subscribe will be called. My problem is that I think it's not fast enough (i.e. the current subscribe callback may not be done when the next message is published).
I was wondering how I could keep populating the list while simultaneously graphing it with OxyPlot. Can I use threading for this?
var x = 0;
channel.Subscribe(message =>
{
    this.RunOnUiThread(() =>
    {
        var sample = message.Data;
        byte[] data = (byte[])sample;
        var data1 = System.Text.Encoding.ASCII.GetString(data);
        var splitData = data1.Split('-');
        foreach (string s in splitData) // contains 10 values
        {
            double y = double.Parse(s);
            y /= 100;
            series1.Points.Add(new DataPoint(x, y));
            MyModel.InvalidatePlot(true);
            x++;
        }
        if (x >= xaxis.Maximum)
        {
            xaxis.Pan(xaxis.Transform(-1 + xaxis.Offset));
        }
    });
});
Guaranteeing a minimum execution time falls under real-time programming, and with an app on a smartphone OS you are about as far from that as I can imagine. The only thing farther "off" would be an interpreted language (PHP, Python).
The only thing you can do is define a minimum time between iterations. I once wrote some example code for doing that from within an alternative thread. A basic rate-limiting loop:
int interval = 20;
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);
while (true)
{
    if (DateTime.Now >= dueTime)
    {
        //insert code here
        //Update the next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else
    {
        //Just yield to not tax out the CPU
        Thread.Sleep(1);
    }
}
Note that DateTime.Now only returns a new value about every 18 ms, so any interval below 20 ms would be too little.
If you think you cannot afford a minimum time, you may need to read the Speed Rant.
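If finer-grained pacing matters, a Stopwatch offers much higher resolution than DateTime.Now; here is a minimal sketch of the same loop built on it (DoWork is a hypothetical placeholder for the per-iteration work):
// Same rate-limiting idea, but Stopwatch has sub-millisecond resolution,
// whereas DateTime.Now only ticks every ~15-18 ms. DoWork() is a placeholder.
var interval = TimeSpan.FromMilliseconds(20);
var watch = System.Diagnostics.Stopwatch.StartNew();
var due = interval;
while (true)
{
    if (watch.Elapsed >= due)
    {
        DoWork();
        due = watch.Elapsed + interval; // schedule the next iteration
    }
    else
    {
        Thread.Sleep(1); // yield so we don't spin the CPU
    }
}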
I have an observable which streams a value for each millisecond; this happens every 250 ms (meaning 250 values per 250 ms, give or take).
Mock sample code:
IObservable<IEnumerable<int>> input = from _ in Observable.Interval(TimeSpan.FromMilliseconds(250))
                                      select CreateSamples(250);
input.Subscribe(values =>
{
    for (int i = 0; i < values.Count(); i++)
    {
        Console.WriteLine("Value : {0}", i);
    }
});
Console.ReadKey();

private static IEnumerable<int> CreateSamples(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i;
    }
}
What I need is to create some form of processing observable which processes the input observable at a rate of 8 values every 33 ms. Something along the lines of this:
IObservable<IEnumerable<int>> process = from _ in Observable.Interval(TimeSpan.FromMilliseconds(33))
select stream.Take(8);
I was wondering two things:
1) How can I write the first sample with the built-in operators that Reactive Extensions provides?
2) How can I create that processing stream which takes values from the input stream with the behavior I've described?
I tried using Window, as suggested in the comments below:
input.Window(TimeSpan.FromMilliseconds(33)).Take(8).Subscribe(winObservable => Debug.WriteLine(" !! "));
It seems as though I get exactly 8 observables of an unknown number of values. What I require is a recurrence of 8 values every 33 ms from the input observable. What the code above does is produce 8 observables of IEnumerable and then stand idle.
EDIT: Thanks to James World. Here's a sample:
var input = Observable.Range(1, int.MaxValue);
var timedInput = Observable.Interval(TimeSpan.FromMilliseconds(33))
.Zip(input.Buffer(8), (_, buffer) => buffer);
timedInput.SelectMany(x => x).Subscribe(Console.WriteLine);
But now it gets trickier: I need the Buffer size to be calculated from the actual milliseconds that passed between intervals, because when you write TimeSpan.FromMilliseconds(33), the Interval event of the timer is actually raised after around 45 ms, give or take. Is there any way to calculate the buffer size? Something like this pseudocode:
input.TimeInterval().Buffer(s => s.Interval.Milliseconds / 4)
You won't be able to do this with any kind of accuracy with a reasonable solution because .NET timer resolution is 15ms.
If the timer was fast enough, you would have to flatten and repackage the stream with a pacer, something like:
// flatten stream
var fs = input.SelectMany(x => x);
// buffer 8 values and release every 33 milliseconds
var xs = Observable.Interval(TimeSpan.FromMilliseconds(33))
.Zip(fs.Buffer(8), (_,buffer) => buffer);
Although as I said, this will give very jittery timing. If that kind of timing resolution is important to you, go native!
I agree with James' analysis.
I'm wondering if this query gives you a better result:
IObservable<IList<int>> input =
Observable
.Generate(
0,
x => true,
x => x < 250 ? x + 1 : 0,
x => x,
x => TimeSpan.FromMilliseconds(33.0 / 8.0))
.Buffer(TimeSpan.FromMilliseconds(33.0));
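If I read it right, Generate emits one value roughly every 33/8 ≈ 4 ms, and Buffer(TimeSpan.FromMilliseconds(33.0)) then packages whatever arrived in each 33 ms window, which should average out to about 8 values per buffer; the same timer-resolution caveat from the other answer applies, of course.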
I have a sequence of stock ticks coming in, and I want to take all the data from the last hour and do some processing on it. I am trying to achieve this with Reactive Extensions 2.0. I read in another post to use Interval, but I think that is deprecated.
Would this extension method solve your problem?
public static IObservable<T[]> RollingBuffer<T>(
this IObservable<T> #this,
TimeSpan buffering)
{
return Observable.Create<T[]>(o =>
{
var list = new LinkedList<Timestamped<T>>();
return #this.Timestamp().Subscribe(tx =>
{
list.AddLast(tx);
while (list.First.Value.Timestamp < DateTime.Now.Subtract(buffering))
{
list.RemoveFirst();
}
o.OnNext(list.Select(tx2 => tx2.Value).ToArray());
}, ex => o.OnError(ex), () => o.OnCompleted());
});
}
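For instance, a hypothetical usage, where ticks is your tick stream and ProcessWindow is a placeholder for your processing:
ticks.RollingBuffer(TimeSpan.FromHours(1))
     .Subscribe(window => ProcessWindow(window));
Note that the buffer is re-emitted on every tick, so ProcessWindow runs once per incoming tick over the trailing hour of data.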
You are looking for the Window operators!
Here is a lengthy article I wrote on working with sequences of coincidence (overlapping windows of sequences)
http://introtorx.com/Content/v1.0.10621.0/17_SequencesOfCoincidence.html
So if you wanted to build a rolling average you could use this sort of code
var scheduler = new TestScheduler();
var notifications = new Recorded<Notification<double>>[30];
for (int i = 0; i < notifications.Length; i++)
{
    notifications[i] = new Recorded<Notification<double>>(i * 1000000, Notification.CreateOnNext<double>(i));
}
//Push values into an observable sequence 0.1 seconds apart with values from 0 to 30
var source = scheduler.CreateHotObservable(notifications);
source.GroupJoin(
    source, //Take values from myself
    _ => Observable.Return(0, scheduler), //Just the first value
    _ => Observable.Timer(TimeSpan.FromSeconds(1), scheduler), //Window period, change to 1 hour
    (lhs, rhs) => rhs.Sum()) //Aggregation you want to do.
    .Subscribe(i => Console.WriteLine(i));
scheduler.Start();
And we can see it output the rolling sums as it receives values.
0, 1, 3, 6, 10, 15, 21, 28...
Very likely Buffer is what you are looking for:
var hourlyBatch = ticks.Buffer(TimeSpan.FromHours(1));
Or assuming data is already Timestamped, simply using Scan:
public static IObservable<IReadOnlyList<Timestamped<T>>> SlidingWindow<T>(this IObservable<Timestamped<T>> self, TimeSpan length)
{
    return self.Scan(new LinkedList<Timestamped<T>>(),
        (list, newSample) =>
        {
            list.AddLast(newSample);
            var oldest = newSample.Timestamp - length;
            while (list.Count > 0 && list.First.Value.Timestamp < oldest)
                list.RemoveFirst();
            return list;
        }).Select(l => l.ToList().AsReadOnly());
}
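A hypothetical usage, assuming ticks is the raw tick stream (Timestamp() attaches the timestamps that Scan relies on):
ticks.Timestamp()
     .SlidingWindow(TimeSpan.FromHours(1))
     .Subscribe(window => Console.WriteLine("Ticks in the last hour: {0}", window.Count));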