Algorithm for preventing bursts from a producer - C#

I have the following:
One producer that produces a random integer (around one every minute), e.g. 154609722148751
One consumer that consumes these integers one by one. Consuming one takes around 3 seconds.
From time to time the producer goes crazy and produces only one 'kind' of digit very quickly, then gets back to normal.
E.g. 6666666666666666666666666666666666666666666666675444696 in 1 second.
My goal is to keep the number of distinct digits left unconsumed as low as possible.
Say, in the previous sample
I have:
a lot of '6' not consumed
one '7' not consumed
one '5' not consumed
three '4' not consumed
one '9' not consumed
If I use a simple FIFO algorithm I am going to wait a long time before all the '6' are consumed. I would prefer to 'prioritize' the other digits and THEN consume the '6'.
Does such an algorithm already exist? (A C# implementation is a plus.)
Currently, I was thinking about this algorithm:
have a queue for each digit (Q0, Q1, Q2, ..., Q9)
sequentially dequeue one item from each queue:
private int _currentQueueId;
private Queue<T>[] _allQueues;   // one queue per digit: Q0 .. Q9

public T GetNextItemToConsume()
{
    // We assume that there is at least one item to consume, and that no lock is needed.
    while (true)
    {
        var nextQueue = _allQueues[_currentQueueId++ % 10];
        if (nextQueue.Count > 0)
            return nextQueue.Dequeue();
    }
}
Do you have a better algorithm than this one (or any other idea)?
NB1: I have no control over the consumption process (that is to say, I can't increase the consumption speed nor the number of consumer threads); indeed, an infinite consumption speed would solve my issue.
NB2: the exact timings are not relevant, but we can assume that consumption is ten times quicker than production.
NB3: I have no control over the 'crazy' behaviour of the producer; in fact it is a normal (but not so frequent) production behaviour.

Here is a more complete example than my comment above:
sealed class Item {
    public Item(int group) {
        ItemGroup = group;
    }
    public int ItemGroup { get; private set; }
}

Item TakeNextItem(IList<Item> items) {
    const int LowerBurstLimit = 1;
    if (items == null)
        throw new ArgumentNullException("items");
    if (items.Count == 0)
        return null;
    var item = items.GroupBy(x => x.ItemGroup)
                    .OrderBy(x => Math.Max(LowerBurstLimit, x.Count()))
                    .First()
                    .First();
    items.Remove(item);
    return item;
}
My idea here is to sort the groups based on their frequency and then just take from the one with the lowest frequency. It is the same idea as yours with multiple queues, but calculated on the fly.
If there are multiple groups with the same frequency it will take the oldest item (assuming GroupBy and OrderBy are stable; they are in practice, though I am not sure it is stated in the documentation).
Increase LowerBurstLimit if you want to process items in chronological order, except for the groups with more than LowerBurstLimit items in the queue.
To measure the time I just created this quick code in LINQPad.
(Eric Lippert: please ignore this part :-))
void Main()
{
    var rand = new Random();
    var items = Enumerable.Range(1, 1000)
        .Select(x => { // simulate burst
            int group = rand.Next(100);
            return new Item(group < 90 ? 1 : (group % 10));
        })
        .ToList();
    var first = TakeNextItem(items); // JIT
    var sw = new Stopwatch();
    sw.Start();
    while (TakeNextItem(items) != null) {
    }
    sw.Stop();
    Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
}
When I run this code with 1000 items it takes around 80 ms on my 3-year-old laptop, i.e. on average 80 µs per "Take".
GroupBy should be O(N*M), OrderBy O(M*LogM) and Remove O(N) (N = average queue length, M = number of groups), so the performance should scale linearly with N, i.e. a 10000-item queue takes ~800 µs per "Take" (I got ~700 µs with 10000 items in the test).

I would use the 10 queues as you mentioned and select the one to dequeue from statistically, based on the number of elements present in each queue. The queue with the highest number of elements is more likely to be selected for dequeuing.
For better performance, keep track of the total count of elements across all queues. For each dequeue operation, draw a random int X between 0 and total count - 1; this tells you which queue to dequeue from (loop through the queues and subtract each queue's count from X until you would go below zero, then pick that queue). For example:
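A minimal sketch of that weighted pick, assuming a _totalCount field that is kept in sync on every enqueue and dequeue (the field and method names are illustrative):

private readonly Random _random = new Random();
private Queue<int>[] _allQueues;   // Q0 .. Q9, as in the question
private int _totalCount;           // assumed to be updated on every enqueue/dequeue

public int GetNextItemWeighted()
{
    // Draw X in [0, totalCount): the probability of landing in a queue is
    // proportional to the number of elements it currently holds.
    int x = _random.Next(_totalCount);
    foreach (var queue in _allQueues)
    {
        if (x < queue.Count)
        {
            _totalCount--;
            return queue.Dequeue();
        }
        x -= queue.Count;
    }
    throw new InvalidOperationException("No items to consume.");
}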

Related

Performance of Queue? Expecting to have heavy RAM usage but instead I get heavy CPU usage

Let's say I am receiving a signal at a variable rate fluctuating between 50 and 200 times per second. I want to store the timestamp of each signal I received into a queue, so I can remove it from the queue when the signal was received more than 1 week ago.
public Queue<long> myQueue = new Queue<long>();

public void OnSignalReceived()
{
    myQueue.Enqueue(DateTime.UtcNow.Ticks);
    PurgeOldSignals();
}

public void PurgeOldSignals()
{
    while (myQueue.Count > 0 && myQueue.Peek() < DateTime.UtcNow.AddDays(-7).Ticks)
    {
        myQueue.Dequeue();
    }
}
Is there a more efficient way to do this? This is my implementation, and I was expecting to trade a lot of memory (at an average of 100 signals per second, the queue will hold about 60 million items (!) before it starts purging) for computational performance, thanks to the O(1) time of Enqueue() and Dequeue().
After testing, however, I noticed that the bottleneck is the CPU, not the RAM. In fact, the RAM barely gets eaten up, but the CPU usage never stops increasing. That is what I observed after about 16 hours of running (clearly far from my 7-day objective).
Any suggestions to optimize this?
EDIT 1:
In fact, the whole purpose of this is just to know, at any time, how many signals I got in the last week (accurate to the second). Maybe there is a better way to achieve this?
For the given task I would make a circular buffer of 3600*24*7 integers. Each integer would hold the number of events in one particular second (one slot for every second in a week). It would only need a few megabytes. On each measured event, the integer corresponding to the current second would be incremented. It is also convenient to keep a running sum of all items in the array and update it on every change, so reading the total is fast.
public class History
{
    protected int eventCount = 0;
    protected int[] array;
    protected readonly int _intervalLength_ms;
    long actualTime = 0;
    int actIndex = 0;

    public History(int intervalLength_ms, int numberOfIntervals)
    {
        _intervalLength_ms = intervalLength_ms;
        array = new int[numberOfIntervals];
    }

    public int EventCount
    {
        get
        {
            Update();          // expire old slots before reporting the running total
            return eventCount;
        }
    }

    public void InsertEvent()
    {
        Update();              // expire old slots first
        array[actIndex]++;     // count the event in the current slot
        eventCount++;
    }

    protected void Update()
    {
        // Current time measured in whole intervals (ticks -> ms -> intervals).
        long newTime = DateTime.Now.Ticks / 10000 / _intervalLength_ms;
        // Advance one interval at a time, expiring the events stored in each slot
        // we move into (that slot held data from exactly one buffer length ago).
        while (newTime > actualTime && eventCount > 0)
        {
            actualTime++;
            actIndex++;
            if (actIndex >= array.Length)
            {
                actIndex = 0;
            }
            eventCount -= array[actIndex];
            array[actIndex] = 0;
        }
        // If the buffer emptied before we caught up, jump straight to the current slot.
        if (newTime > actualTime)
        {
            actualTime = newTime;
            actIndex = (int)(actualTime % array.Length);
        }
    }
}
It would be constructed with parameters new History(1000, 3600*24*7).
I see two issues:
(minor) DateTime.Now is considerably more expensive than DateTime.UtcNow, and you might want to hoist that call out of the loop instead of making it 60 million times.
(major) you are looping through millions of items every time you get a signal. If you only want to purge signals older than 7 days, you should only run your purge once a day (see the sketch below).
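A hedged sketch of both suggestions, computing the cut-off once outside the loop and only purging periodically (the _lastPurgeUtc field is illustrative, not from the original code):

private DateTime _lastPurgeUtc = DateTime.MinValue;

public void OnSignalReceived()
{
    myQueue.Enqueue(DateTime.UtcNow.Ticks);
    // Only purge once a day instead of on every signal.
    if (DateTime.UtcNow - _lastPurgeUtc >= TimeSpan.FromDays(1))
    {
        PurgeOldSignals();
        _lastPurgeUtc = DateTime.UtcNow;
    }
}

public void PurgeOldSignals()
{
    // Compute the cut-off once, outside the loop.
    long cutoff = DateTime.UtcNow.AddDays(-7).Ticks;
    while (myQueue.Count > 0 && myQueue.Peek() < cutoff)
    {
        myQueue.Dequeue();
    }
}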
I suspect the reason this is slow is because of this
If Count already equals the capacity, the capacity of the Queue is increased by automatically reallocating the internal array, and the existing elements are copied to the new array before the new element is added. If Count is less than the capacity of the internal array, this method is an O(1) operation. If the internal array needs to be reallocated to accommodate the new element, this method becomes an O(n) operation, where n is Count.
From the MSDN documentation on Queue<T>.Enqueue(). The CPU usage is increasing proportionally to n because you are performing an O(n) operation on myQueue.
The solution then is to allocate as much memory as you want the program to use up front, by constructing the queue with a capacity, e.g. var myQueue = new Queue<long>(n); your code will then behave as desired, trading high memory usage for lower CPU usage.
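For the figures in the question (~100 signals per second over 7 days, i.e. roughly 60 million entries), a pre-sized queue might look like this; the capacity is only an estimate derived from those numbers:

// ~7 days at ~100 signals/second ≈ 60 million entries, sized up front so the
// internal array never needs to be reallocated and copied while filling up.
var myQueue = new Queue<long>(7 * 24 * 3600 * 100);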

Getting all combinations of K and less elements in List of N elements with big K

I want to have all combinations of elements in a list, for a result like this:
List: {1,2,3}
1
2
3
1,2
1,3
2,3
My problem is that I have 180 elements, and I want all combinations of up to 5 elements. With my tests with 4 elements, it took a long time (2 minutes) but all went well. With 5 elements, I get an out-of-memory exception.
My code presently is this:
public IEnumerable<IEnumerable<Rondin>> getPossibilites(List<Rondin> rondins)
{
    var combin5 = rondins.Combinations(5);
    var combin4 = rondins.Combinations(4);
    var combin3 = rondins.Combinations(3);
    var combin2 = rondins.Combinations(2);
    var combin1 = rondins.Combinations(1);
    return combin5.Concat(combin4).Concat(combin3).Concat(combin2).Concat(combin1).ToList();
}
With this function (taken from this question: Algorithm to return all combinations of k elements from n):
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
    return k == 0 ? new[] { new T[0] } :
        elements.SelectMany((e, i) =>
            elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}
I need to search the list for a combination whose elements add up (within a certain precision) to a target value, and do this for each element of another list. Here is all my code for this part:
var possibilites = getPossibilites(opt.rondins);
possibilites = possibilites.Where(p => p.Sum(r => r.longueur + traitScie) < 144);
foreach (BilleOptimisee b in opt.billesOptimisees)
{
    var proches = possibilites
        .Where(p => p.Sum(r => (r.longueur + traitScie)) < b.chute
                 && Math.Abs(b.chute - p.Sum(r => r.longueur)) - (p.Count() * 0.22) < 0.01)
        .OrderByDescending(p => p.Sum(r => r.longueur))
        .ElementAt(0);
    if (proches != null)
    {
        foreach (Rondin r in proches)
        {
            opt.rondins.Remove(r);
            b.rondins.Add(r);
            possibilites = possibilites.Where(p => !p.Contains(r));
        }
    }
}
With the code I have, how can I limit the memory taken by my list? Or is there a better way to search in a very big set of combinations?
Please, if my question is not good, tell me why and I will do my best to learn and ask better questions next time ;)
Your output list for combinations of 5 elements will have ~1.5*10^9 (that's billion with a b) sublists of size 5. Even with 32-bit integers and a perfect list with zero overhead, that is ~30 GB of raw data, and considerably more once real list overhead is included!
You should reconsider whether you actually need to materialize the list like you do; an alternative is streaming the combinations, i.e. generating them on the fly.
That can be done by creating a function which takes the last combination as an argument and outputs the next one. (To see how it is done, think about incrementing a number by one: you go from the last digit to the first, remembering a "carry over" until you are done.)
A streaming example for choosing 2 out of 4:
start: {4,3}
curr = start {4, 3}
curr = next(curr) {4, 2} // reduce last by one
curr = next(curr) {4, 1} // reduce last by one
curr = next(curr) {3, 2} // cannot reduce more, reduce the first by one, and set the follower to maximal possible value
curr = next(curr) {3, 1} // reduce last by one
curr = next(curr) {2, 1} // similar to {3,2}
done.
Now, you need to figure out how to do it for lists of size 2, then generalize it to arbitrary size - and program your streaming combination generator.
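For illustration, a minimal sketch of such a generator for k-element combinations of the values 1..n, kept in descending order as in the walk-through above (the method name and the in-place convention are my own, not from any library):

// Mutates c (a strictly decreasing combination such as {4, 3}) into the next
// combination in the order shown above. Returns false when {k, k-1, ..., 1}
// has been reached. Start from the maximal combination {n, n-1, ..., n-k+1}.
static bool NextCombination(int[] c)
{
    int k = c.Length;
    for (int i = k - 1; i >= 0; i--)
    {
        // Position i must stay above the k-1-i smaller values to its right,
        // so the smallest value it may hold is k - i.
        if (c[i] > k - i)
        {
            c[i]--;
            // Reset everything to the right to the largest values still allowed.
            for (int j = i + 1; j < k; j++)
                c[j] = c[i] - (j - i);
            return true;
        }
    }
    return false;
}

// Usage, reproducing the 2-out-of-4 walk-through:
// var c = new[] { 4, 3 };
// do { Console.WriteLine(string.Join(",", c)); } while (NextCombination(c));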
Good Luck!
Let your precision be defined in the imaginary spectrum.
Use a real index to access the leaf and then traverse the leaf with the required precision.
See PrecisLise # http://net7mma.codeplex.com/SourceControl/latest#Common/Collections/Generic/PrecicseList.cs
While the implementation is not 100% complete as linked you can find where I used a similar concept here:
http://net7mma.codeplex.com/SourceControl/latest#RtspServer/MediaTypes/RFC6184Media.cs
Using this concept I was able to re-order h.264 Access Units and their underlying Network Abstraction Layer components in what I consider a very interesting way... beyond being interesting, it also has the potential to be more efficient while using close to the same amount of memory.
For example, 0 can be followed by 0.1 or 0.01 or 0.001; depending on the type of the key in the list (double, float, Vector, inter alia) you may have the added benefit of using the FPU or even intrinsics if supported by your processor, thus making sorting and indexing much faster than would be possible on normal sets, regardless of the underlying storage mechanism.
Using this concept allows for very interesting ordering... especially if you provide a mechanism to filter the precision.
I was also able to find several bugs in the bit-stream parser of quite a few well known media libraries using this methodology...
I found my solution. I'm writing it here so that other people who have a similar problem to mine have something to work with...
I made a recursive function that searches for a fixed number of possibilities that fit the conditions. When that number of possibilities is found, I return the list of possibilities, do some calculations with the results, and restart the process. I added a timer to stop the search when it takes too long. Since my condition is based on the sum of the elements, I only build possibilities from distinct values, and search for a small number of possibilities each time (like 1).
So the function returns a possibility with a very high precision; I do what I need to do with this possibility, remove its elements from the original list, and call the function again with the same precision until nothing is returned, so I can continue with another precision. When several precisions have been done, there are only about 30 elements left in my list, so I can ask for all the possibilities (that still fit the maximum sum), and this part is much easier than the beginning.
Here is my code:
public List<IEnumerable<Rondin>> getPossibilites(IEnumerable<Rondin> rondins, int nbElements, double minimum, double maximum, int instance = 0, double longueur = 0)
{
    if (instance == 0)
        timer = DateTime.Now;
    List<IEnumerable<Rondin>> liste = new List<IEnumerable<Rondin>>();
    // Get all distinct rondins that can fit into the maximal length
    foreach (Rondin r in rondins.Where(r => r.longueur < (maximum - longueur)).DistinctBy(r => r.longueur).OrderBy(r => r.longueur))
    {
        // Check the current length
        double longueur2 = longueur + r.longueur + traitScie;
        // If the current length is under the maximal length
        if (longueur2 < maximum)
        {
            // Get all the possibilities with all rondins except the current one, and add them to the list
            foreach (IEnumerable<Rondin> poss in getPossibilites(rondins.Where(rondin => rondin.id != r.id), nbElements - liste.Count, minimum, maximum, instance + 1, longueur2).Select(possibilite => possibilite.Concat(new Rondin[] { r })))
            {
                liste.Add(poss);
                if (liste.Count >= nbElements && nbElements > 0)
                    break;
            }
            // If the current length is higher than the minimum, add it to the list
            if (longueur2 >= minimum)
                liste.Add(new Rondin[] { r });
        }
        // If we have enough possibilities, stop the search
        if (liste.Count >= nbElements && nbElements > 0)
            break;
        // If the search is taking too long, stop and return the list
        if (DateTime.Now.Subtract(timer).TotalSeconds > 30)
            break;
    }
    return liste;
}

Why is processing a sorted array slower than an unsorted array?

I have a list of 500000 randomly generated Tuple<long,long,string> objects on which I am performing a simple "between" search:
var data = new List<Tuple<long,long,string>>(500000);
...
var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
When I generate my random array and run my search for 100 randomly generated values of x, the searches complete in about four seconds. Knowing of the great wonders that sorting does to searching, however, I decided to sort my data - first by Item1, then by Item2, and finally by Item3 - before running my 100 searches. I expected the sorted version to perform a little faster because of branch prediction: my thinking has been that once we get to the point where Item1 == x, all further checks of t.Item1 <= x would predict the branch correctly as "no take", speeding up the tail portion of the search. Much to my surprise, the searches took twice as long on a sorted array!
I tried switching around the order in which I ran my experiments, and used a different seed for the random number generator, but the effect has been the same: searches in an unsorted array ran nearly twice as fast as the searches in the same array, but sorted!
Does anyone have a good explanation of this strange effect? The source code of my tests follows; I am using .NET 4.0.
private const int TotalCount = 500000;
private const int TotalQueries = 100;

private static long NextLong(Random r) {
    var data = new byte[8];
    r.NextBytes(data);
    return BitConverter.ToInt64(data, 0);
}

private class TupleComparer : IComparer<Tuple<long,long,string>> {
    public int Compare(Tuple<long,long,string> x, Tuple<long,long,string> y) {
        var res = x.Item1.CompareTo(y.Item1);
        if (res != 0) return res;
        res = x.Item2.CompareTo(y.Item2);
        return (res != 0) ? res : String.CompareOrdinal(x.Item3, y.Item3);
    }
}

static void Test(bool doSort) {
    var data = new List<Tuple<long,long,string>>(TotalCount);
    var random = new Random(1000000007);
    var sw = new Stopwatch();
    sw.Start();
    for (var i = 0; i != TotalCount; i++) {
        var a = NextLong(random);
        var b = NextLong(random);
        if (a > b) {
            var tmp = a;
            a = b;
            b = tmp;
        }
        var s = string.Format("{0}-{1}", a, b);
        data.Add(Tuple.Create(a, b, s));
    }
    sw.Stop();
    if (doSort) {
        data.Sort(new TupleComparer());
    }
    Console.WriteLine("Populated in {0}", sw.Elapsed);
    sw.Reset();
    var total = 0L;
    sw.Start();
    for (var i = 0; i != TotalQueries; i++) {
        var x = NextLong(random);
        var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
        total += cnt;
    }
    sw.Stop();
    Console.WriteLine("Found {0} matches in {1} ({2})", total, sw.Elapsed, doSort ? "Sorted" : "Unsorted");
}

static void Main() {
    Test(false);
    Test(true);
    Test(false);
    Test(true);
}
Populated in 00:00:01.3176257
Found 15614281 matches in 00:00:04.2463478 (Unsorted)
Populated in 00:00:01.3345087
Found 15614281 matches in 00:00:08.5393730 (Sorted)
Populated in 00:00:01.3665681
Found 15614281 matches in 00:00:04.1796578 (Unsorted)
Populated in 00:00:01.3326378
Found 15614281 matches in 00:00:08.6027886 (Sorted)
When you are using the unsorted list all tuples are accessed in memory-order. They have been allocated consecutively in RAM. CPUs love accessing memory sequentially because they can speculatively request the next cache line so it will always be present when needed.
When you are sorting the list you put it into random order because your sort keys are randomly generated. This means that the memory accesses to tuple members are unpredictable. The CPU cannot prefetch memory and almost every access to a tuple is a cache miss.
This is a nice example for a specific advantage of GC memory management: data structures which have been allocated together and are used together perform very nicely. They have great locality of reference.
The penalty from cache misses outweighs the saved branch prediction penalty in this case.
Try switching to a struct-tuple. This will restore performance because no pointer-dereference needs to occur at runtime to access tuple members.
Chris Sinclair notes in the comments that "for TotalCount around 10,000 or less, the sorted version does perform faster". This is because a small list fits entirely into the CPU cache. The memory accesses might be unpredictable but the target is always in cache. I believe there is still a small penalty because even a load from cache takes some cycles. But that seems not to be a problem because the CPU can juggle multiple outstanding loads, thereby increasing throughput. Whenever the CPU hits a wait for memory it will still speed ahead in the instruction stream to queue as many memory operations as it can. This technique is used to hide latency.
This kind of behavior shows how hard it is to predict performance on modern CPUs. The fact that we are only 2x slower when going from sequential to random memory access tells me how much is going on under the covers to hide memory latency. A memory access can stall the CPU for 50-200 cycles. Given that number, one could expect the program to become >10x slower when introducing random memory accesses.
LINQ doesn't know whether your list is sorted or not.
Since Count with a predicate parameter is an extension method for all IEnumerables, I think it doesn't even know whether it's running over a collection with efficient random access. So it simply checks every element, and Usr explained why performance got lower.
To exploit the performance benefits of a sorted array (such as binary search), you'll have to do a little bit more coding.
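One hedged sketch of that extra coding, assuming the list has been sorted by Item1 (helper names are illustrative): a binary search finds the first index whose Item1 exceeds x, so only the prefix of the list needs the Item2 test.

static int CountMatchesSorted(List<Tuple<long, long, string>> sortedByItem1, long x)
{
    int hi = UpperBoundByItem1(sortedByItem1, x);   // first index with Item1 > x
    int cnt = 0;
    for (int i = 0; i < hi; i++)
        if (sortedByItem1[i].Item2 >= x)
            cnt++;
    return cnt;
}

static int UpperBoundByItem1(List<Tuple<long, long, string>> sortedByItem1, long x)
{
    int lo = 0, hi = sortedByItem1.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (sortedByItem1[mid].Item1 <= x) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

This still scans the prefix where Item1 <= x linearly, so the benefit depends on where x falls; a structure such as an interval tree would be needed to improve the asymptotics.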

Comparing 2 huge lists using C# multiple times (with a twist)

Hey everyone, great community you got here. I'm an Electrical Engineer doing some "programming" work on the side to help pay for bills. I say this because I want you to take into consideration that I don't have proper Computer Science training, but I have been coding for the past 7 years.
I have several Excel tables with information (all numeric); basically it is "dialed phone numbers" in one column and the number of minutes to each of those numbers in another. Separately I have a list of "carrier prefix code numbers" for the different carriers in my country. What I want to do is separate all the "traffic" per carrier. Here is the scenario:
First dialed number row: 123456789ABCD,100 <-- That would be a 13 digit phone number and 100 minutes.
I have a list of 12,000+ prefix codes for carrier 1, these codes vary in length, and I need to check everyone of them:
Prefix Code 1: 1234567 <-- this code is 7 digits long.
I need to take the first 7 digits of the dialed number and compare them to the prefix code; if they match, I add the number of minutes to a subtotal for later use. Please consider that not all prefix codes are the same length; sometimes they are shorter or longer.
Most of this should be a piece of cake, and I should be able to do it, but I'm getting kind of scared by the massive amount of data: sometimes the dialed number lists consist of up to 30,000 numbers, the "carrier prefix code" lists are around 13,000 rows long, and I usually check 3 carriers, which means I have to do a lot of "matches".
Does anyone have an idea of how to do this efficiently using C#? Or any other language, to be honest. I need to do this quite often, so designing a tool to do it would make much more sense. I need a good perspective from someone who does have that "Computer Scientist" background.
The lists don't need to be in Excel worksheets; I can export to a CSV file and work from there, I don't need an "MS Office" interface.
Thanks for your help.
Update:
Thank you all for your time answering my question. I guess in my ignorance I over-exaggerated the word "efficient". I don't perform this task every few seconds; it's something I have to do once per day, and I hate doing it with Excel and VLOOKUPs, etc.
I've learned about new concepts from you guys and I hope I can build a solution(s) using your ideas.
UPDATE
You can do a simple trick - group the prefixes by their first digits into a dictionary and match the numbers only against the correct subset. I tested it with the following two LINQ statements, assuming every prefix has at least three digits.
const Int32 minimumPrefixLength = 3;

var groupedPefixes = prefixes
    .GroupBy(p => p.Substring(0, minimumPrefixLength))
    .ToDictionary(g => g.Key, g => g);

var numberPrefixes = numbers
    .Select(n => groupedPefixes[n.Substring(0, minimumPrefixLength)]
        .First(n.StartsWith))
    .ToList();
So how fast is this? 15,000 prefixes and 50,000 numbers took less than 250 milliseconds. Fast enough for two lines of code?
Note that the performance heavily depends on the minimum prefix length (MPL), hence on the number of prefix groups you can construct.
MPL    Runtime
---------------
 1     10,198 ms
 2      1,179 ms
 3        205 ms
 4        130 ms
 5        107 ms
Just to give a rough idea - I did just one run and had a lot of other stuff going on.
Original answer
I wouldn't care much about performance - an average desktop PC can quite easily deal with database tables of 100 million rows. Maybe it takes five minutes, but I assume you don't want to perform the task every other second.
I just made a test. I generated a list of 15,000 unique prefixes with 5 to 10 digits. From these prefixes I generated 50,000 numbers, each consisting of a prefix plus an additional 5 to 10 digits.
List<String> prefixes = GeneratePrefixes();
List<String> numbers = GenerateNumbers(prefixes);
Then I used the following LINQ to Objects query to find the prefix of each number.
var numberPrefixes = numbers.Select(n => prefixes.First(n.StartsWith)).ToList();
Well, it took about a minute on my 2.0 GHz Core 2 Duo laptop. So if one minute of processing time is acceptable, maybe two or three if you include aggregation, I would not try to optimize anything. Of course, it would be really nice if the program could do the task in a second or two, but this will add quite a bit of complexity and many things to get wrong. And it takes time to design, write, and test. The LINQ statement took me only seconds.
Test application
Note that generating many prefixes is really slow and might take a minute or two.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace Test
{
    static class Program
    {
        static void Main()
        {
            // Set number of prefixes and calls to not more than 50 to get results
            // printed to the console.
            Console.Write("Generating prefixes");
            List<String> prefixes = Program.GeneratePrefixes(5, 10, 15);
            Console.WriteLine();
            Console.Write("Generating calls");
            List<Call> calls = Program.GenerateCalls(prefixes, 5, 10, 50);
            Console.WriteLine();
            Console.WriteLine("Processing started.");
            Stopwatch stopwatch = new Stopwatch();
            const Int32 minimumPrefixLength = 5;
            stopwatch.Start();
            var groupedPefixes = prefixes
                .GroupBy(p => p.Substring(0, minimumPrefixLength))
                .ToDictionary(g => g.Key, g => g);
            var result = calls
                .GroupBy(c => groupedPefixes[c.Number.Substring(0, minimumPrefixLength)]
                    .First(c.Number.StartsWith))
                .Select(g => new Call(g.Key, g.Sum(i => i.Duration)))
                .ToList();
            stopwatch.Stop();
            Console.WriteLine("Processing finished.");
            Console.WriteLine(stopwatch.Elapsed);
            if ((prefixes.Count <= 50) && (calls.Count <= 50))
            {
                Console.WriteLine("Prefixes");
                foreach (String prefix in prefixes.OrderBy(p => p))
                {
                    Console.WriteLine(String.Format(" prefix={0}", prefix));
                }
                Console.WriteLine("Calls");
                foreach (Call call in calls.OrderBy(c => c.Number).ThenBy(c => c.Duration))
                {
                    Console.WriteLine(String.Format(" number={0} duration={1}", call.Number, call.Duration));
                }
                Console.WriteLine("Result");
                foreach (Call call in result.OrderBy(c => c.Number))
                {
                    Console.WriteLine(String.Format(" prefix={0} accumulated duration={1}", call.Number, call.Duration));
                }
            }
            Console.ReadLine();
        }

        private static List<String> GeneratePrefixes(Int32 minimumLength, Int32 maximumLength, Int32 count)
        {
            Random random = new Random();
            List<String> prefixes = new List<String>(count);
            StringBuilder stringBuilder = new StringBuilder(maximumLength);
            while (prefixes.Count < count)
            {
                stringBuilder.Length = 0;
                for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
                {
                    stringBuilder.Append(random.Next(10));
                }
                String prefix = stringBuilder.ToString();
                if (prefixes.Count % 1000 == 0)
                {
                    Console.Write(".");
                }
                if (prefixes.All(p => !p.StartsWith(prefix) && !prefix.StartsWith(p)))
                {
                    prefixes.Add(stringBuilder.ToString());
                }
            }
            return prefixes;
        }

        private static List<Call> GenerateCalls(List<String> prefixes, Int32 minimumLength, Int32 maximumLength, Int32 count)
        {
            Random random = new Random();
            List<Call> calls = new List<Call>(count);
            StringBuilder stringBuilder = new StringBuilder();
            while (calls.Count < count)
            {
                stringBuilder.Length = 0;
                stringBuilder.Append(prefixes[random.Next(prefixes.Count)]);
                for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
                {
                    stringBuilder.Append(random.Next(10));
                }
                if (calls.Count % 1000 == 0)
                {
                    Console.Write(".");
                }
                calls.Add(new Call(stringBuilder.ToString(), random.Next(1000)));
            }
            return calls;
        }

        private class Call
        {
            public Call(String number, Decimal duration)
            {
                this.Number = number;
                this.Duration = duration;
            }
            public String Number { get; private set; }
            public Decimal Duration { get; private set; }
        }
    }
}
It sounds to me like you need to build a trie from the carrier prefixes. You'll end up with a single trie, where the terminating nodes tell you the carrier for that prefix.
Then create a dictionary from carrier to an int or long (the total).
Then for each dialed number row, just work your way down the trie until you find the carrier. Find the total number of minutes so far for the carrier, and add the current row - then move on.
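A minimal sketch of such a digit trie (the class and method names are illustrative, not from any library; dialed numbers and prefixes are assumed to contain only digits):

sealed class PrefixTrie
{
    private sealed class Node
    {
        public readonly Node[] Children = new Node[10];
        public int? CarrierId;      // set on the node that terminates a prefix
    }

    private readonly Node _root = new Node();

    public void Add(string prefix, int carrierId)
    {
        var node = _root;
        foreach (char c in prefix)
        {
            int d = c - '0';
            node = node.Children[d] ?? (node.Children[d] = new Node());
        }
        node.CarrierId = carrierId;
    }

    // Walks the dialed number digit by digit and returns the carrier of the
    // longest matching prefix, or null if no prefix matches.
    public int? FindCarrier(string dialedNumber)
    {
        var node = _root;
        int? best = null;
        foreach (char c in dialedNumber)
        {
            node = node.Children[c - '0'];
            if (node == null) break;
            if (node.CarrierId.HasValue) best = node.CarrierId;
        }
        return best;
    }
}

The per-carrier totals can then live in a Dictionary<int, long> keyed by carrier id, adding each row's minutes to the entry returned by FindCarrier.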
The easiest data structure that would do this fairly efficiently would be a list of sets. Make a Set for each carrier to contain all the prefixes.
Now, to associate a call with a carrier:
foreach (Carrier carrier in carriers)
{
    bool found = false;
    for (int length = 1; length <= 7; length++)
    {
        int prefix = ExtractDigits(callNumber, length);
        if (carrier.Prefixes.Contains(prefix))
        {
            carrier.Calls.Add(callNumber);
            found = true;
            break;
        }
    }
    if (found)
        break;
}
If you have 10 carriers, there will be up to 70 set lookups per call. But a lookup in a set isn't slow (much faster than a linear search), so this should give you quite a big speed-up over a brute-force linear search.
You can go a step further and group the prefixes for each carrier according to their length. That way, if a carrier has only prefixes of length 7 and 4, you'd know to only bother extracting and looking up those lengths, each time looking in the set of prefixes of that length. For example:
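A hedged sketch of that per-length grouping, assuming the prefixes are kept as strings (all names here are illustrative):

// Build, per carrier, one HashSet of prefixes for each prefix length, so only
// the lengths that actually occur are extracted and looked up.
static Dictionary<int, HashSet<string>> GroupByLength(IEnumerable<string> prefixes)
{
    return prefixes.GroupBy(p => p.Length)
                   .ToDictionary(g => g.Key, g => new HashSet<string>(g));
}

static bool MatchesCarrier(string dialedNumber, Dictionary<int, HashSet<string>> prefixesByLength)
{
    foreach (var pair in prefixesByLength)
    {
        int length = pair.Key;
        if (dialedNumber.Length >= length &&
            pair.Value.Contains(dialedNumber.Substring(0, length)))
            return true;
    }
    return false;
}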
How about dumping your data into a couple of database tables and then querying them using SQL? Easy!
CREATE TABLE dbo.dialled_numbers ( number VARCHAR(100), minutes INT )
CREATE TABLE dbo.prefixes ( prefix VARCHAR(100) )
-- now populate the tables, create indexes etc
-- and then just run your query...
SELECT p.prefix,
SUM(n.minutes) AS total_minutes
FROM dbo.dialled_numbers AS n
INNER JOIN dbo.prefixes AS p
ON n.number LIKE p.prefix + '%'
GROUP BY p.prefix
(This was written for SQL Server, but should be very simple to translate for any other DBMS.)
Maybe it would be simpler (not necessarily more efficient) to do it in a database instead of C#.
You could insert the rows on the database and on insert determine the carrier and include it in the record (maybe in an insert trigger).
Then your report would be a sum query on the table.
I would probably just put the entries in a List, sort it, then use a binary search to look for matches. Tailor the binary search match criteria to return the first item that matches, then iterate along the list until you find one that doesn't match. A binary search takes only around 15 comparisons to search a list of 30,000 items.
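One possible reading of this approach (an illustrative sketch, not the poster's code): sort the call records by number once, then for each prefix binary-search the first record that is >= the prefix and walk forward while records still start with it.

static long MinutesForPrefix(List<(string Number, int Minutes)> sortedCalls, string prefix)
{
    // Lower bound: first index whose Number is >= prefix in ordinal order.
    int lo = 0, hi = sortedCalls.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (string.CompareOrdinal(sortedCalls[mid].Number, prefix) < 0) lo = mid + 1;
        else hi = mid;
    }
    // All numbers sharing the prefix form a contiguous range starting here.
    long total = 0;
    for (int i = lo; i < sortedCalls.Count && sortedCalls[i].Number.StartsWith(prefix, StringComparison.Ordinal); i++)
        total += sortedCalls[i].Minutes;
    return total;
}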
You may want to use a Hashtable in C#.
This way you have key-value pairs; your keys could be the phone numbers and your values the total minutes. If a match is found in the key set, modify the total minutes; otherwise, add a new key.
You would then just need to modify your searching algorithm to not look at the entire key, but only at the first 7 digits of it.
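A hedged reading of this suggestion (the calls collection and the fixed 7-digit key are assumptions for illustration): key a dictionary by the matched prefix and accumulate minutes into it.

var minutesByPrefix = new Dictionary<string, long>();
foreach (var (number, minutes) in calls)          // calls: sequence of (string, int) pairs, assumed
{
    string key = number.Substring(0, 7);          // the "first 7 digits" from the answer; numbers assumed longer
    minutesByPrefix.TryGetValue(key, out long total);
    minutesByPrefix[key] = total + minutes;
}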

Ensure uniform (ish) distribution with random number generation

I have a list of objects and I would like to access the objects in a random order continuously.
I was wondering if there was a way of ensuring that the random values were not always similar.
Example.
My list is a list of Queues, and I am trying to interleave the values to produce a real-world scenario for testing.
I don't particularly want all of the items in Queues 1 and 2 before any other item.
Is there a guaranteed way to do this?
Thanks
EDIT ::
The list of queues I have is basically a list of files that I am transmitting to a web service. Files need to be in a certain order, hence the queues.
So I have
Queue1 = "set1_1.xml", "set1_2.xml", ... "set1_n.xml"
Queue2 ...
...
QueueN
While each file needs to be transmitted in order relative to the other files in its queue, I would like to simulate a real-world scenario where files are received from different sources at different times, and so have them interleaved.
At the moment I am just using a simple rand from 0 to (number of queues) to determine which file to dequeue next. This works, but I was asking whether there might be a way to get more uniformity, rather than having 50 files from Queues 1 and 2 and then 5 files from Queue 3.
I do realise though that altering the randomness no longer makes it random.
Thank you for all your answers.
Well, it isn't entirely clear what the scenario is, but the thing with random is you never can tell ;-p. Anything you try to do to "guarantee" things will probably reduce the randomness.
How are you doing it? Personally I'd do something like:
static IEnumerable<T> GetItems<T>(IEnumerable<Queue<T>> queues)
{
    int remaining = queues.Sum(q => q.Count);
    Random rand = new Random();
    while (remaining > 0)
    {
        int index = rand.Next(remaining);
        foreach (Queue<T> q in queues)
        {
            if (index < q.Count)
            {
                yield return q.Dequeue();
                remaining--;
                break;
            }
            else
            {
                index -= q.Count;
            }
        }
    }
}
This should be fairly uniform over the entire set. The trick here is that by treating the queues as a single large queue, the queues with lots of items tend to get dequeued more quickly (since there is more chance of getting an index in their range). This means it should automatically balance consumption between the queues so that they all run dry at (roughly) the same time. If you don't have LINQ, just change the first line:
int remaining = 0;
foreach(Queue<T> q in queues) {remaining += q.Count;}
Example usage:
static void Main()
{
    List<Queue<int>> queues = new List<Queue<int>> {
        Build(1,2,3,4,5), Build(6,7,8), Build(9,10,11,12,13)
    };
    foreach (int i in GetItems(queues))
    {
        Console.WriteLine(i);
    }
}

static Queue<T> Build<T>(params T[] items)
{
    Queue<T> queue = new Queue<T>();
    foreach (T item in items)
    {
        queue.Enqueue(item);
    }
    return queue;
}
It depends on what you really want...
If the "random" values are truly random then you will get uniform distribution with enough iterations.
If you're talking about controlling or manipulating the distribution then the values will no longer be truly random!
So, you can either have:
Truly random values with uniform distribution, or
Controlled distribution, but no longer truly random
Are you trying to shuffle your list?
If so you can do it by sorting it on a random value.
Try something like this:
private Random random = new Random();

public int RandomSort(Queue q1, Queue q2)
{
    if (q1 == q2) { return 0; }
    return random.Next().CompareTo(random.Next());
}
And then use RandomSort as the comparison argument in a call to List.Sort().
If the items to be queued have a GetHashCode() implementation which distributes values evenly across all integers, you can use the modulus of the hash value to decide which queue to add each item to. This is basically the same principle that hash tables use to ensure an even distribution of values. For example:
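A tiny sketch of that modulus idea (the item and queues variables are illustrative):

// Mask off the sign bit so the index is never negative, then pick a queue.
int queueIndex = (item.GetHashCode() & 0x7FFFFFFF) % queues.Count;
queues[queueIndex].Enqueue(item);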
