I need to optimize the code below so it executes faster, by using more memory or parallelism if necessary. It currently takes 2 minutes to process a single record on a Windows 10 64-bit PC with 16GB RAM.
data1 list length = 1000
data2 list length = 100000
data3 list length = 100
for (int d1 = 0; d1 < data1.Count; d1++)
{
if (data1[d1].status == "UNMATCHED")
{
for (int d2 = 0; d2 < data2.Count; d2++)
{
if (data2[d2].status == "UNMATCHED")
{
vMatched = false;
for (int d3 = 0; d3 < data3.Count; d3++)
{
if (data3[d3].rule == "rule1")
{
if (data1[d1].value == data2[d2].value)
{
data1[d1].status = "MATCHED";
data2[d2].status = "MATCHED";
vMatched = true;
break;
}
}
else if (data3[d3].rule == "rule2")
{
...
}
else if (data3[d3].rule == "rule100")
{
...
}
}
if (vMatched)
break;
}
}
}
}
First of all, for any kind of performance-oriented programming, avoid using strings; use more appropriate types, like enums or bools, instead. Another recommendation is to profile your code, so you know which parts actually take time.
In the given example only one rule is shown, so the data3 loop could be eliminated by first checking whether that rule exists and only then proceeding with the matching.
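As a hedged sketch of both points (the enum and the rule-set lookup are assumptions about your model, not your actual types):

// Statuses as an enum instead of strings: comparisons become cheap
// integer checks and typos become compile errors.
enum Status { Unmatched, Matched }

// Precompute which rules exist so the data3 loop runs once up front,
// not once per (d1, d2) pair.
var activeRules = new HashSet<string>(data3.Select(d => d.rule));
bool rule1Exists = activeRules.Contains("rule1");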
This matching between items in data1 & data2 essentially pairs unmatched items with the same value. Whenever problems like this occur, the standard solution is some kind of search structure, like a dictionary, to get better than linear search time. For example
var data2Dictionary = data2.ToDictionary(d => Tuple.Create(d.value, d.status), d => d);
This should let you drastically decrease the time to find an item with a specific value and status. Keep in mind that the code above will throw if multiple items share the same value & status, and that the dictionary key will not be updated if an item's value or status changes.
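As a rough illustration of how the lookup could replace the inner scan, here is a hedged sketch; it assumes values are unique among unmatched items and keeps the index in sync by hand (member names come from the question's code):

// Index the unmatched data2 items by value once, then match each
// unmatched data1 item with a single O(1) lookup instead of a scan.
var unmatchedByValue = data2
    .Where(d => d.status == "UNMATCHED")
    .ToDictionary(d => d.value, d => d);

foreach (var item in data1.Where(d => d.status == "UNMATCHED"))
{
    if (unmatchedByValue.TryGetValue(item.value, out var partner))
    {
        item.status = "MATCHED";
        partner.status = "MATCHED";
        unmatchedByValue.Remove(item.value); // keep the index in sync
    }
}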
You can avoid starting the 2nd loop from 0 every time by keeping track of the last "UNMATCHED" index inside data2.
That should reduce the complexity.
In the worst case:
Now: 1000 * 100000 * 100 iterations = 10,000,000,000
New: (1000 + 100000) * 100 iterations = 10,100,000
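A hedged sketch of that bookkeeping (only the loop skeleton is shown; the rule checks stay as in the question):

// Remember where the first still-unmatched data2 item sits, so each
// pass over data2 starts there instead of at index 0.
int firstUnmatched = 0;

for (int d1 = 0; d1 < data1.Count; d1++)
{
    if (data1[d1].status != "UNMATCHED")
        continue;

    // Advance the marker past items that are already matched.
    while (firstUnmatched < data2.Count && data2[firstUnmatched].status != "UNMATCHED")
        firstUnmatched++;

    for (int d2 = firstUnmatched; d2 < data2.Count; d2++)
    {
        if (data2[d2].status != "UNMATCHED")
            continue;
        // ...apply the rules to data1[d1] and data2[d2] as before...
    }
}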
I always struggle with these types of algorithms. I have a scenario where I have a cubic value for freight and need to split this value into cartons of different sizes; there are 3 sizes available in this instance: 0.12m3, 0.09m3 and 0.05m3. A few examples:
Assume the total is 0.16m3; I need to consume this value into the appropriate cartons.
I will have 1 carton of 0.12m3, which leaves 0.04m3 to consume. This fits into the 0.05m3 carton, so I will have 1 carton of 0.05m3, and consumption is now complete. Final answer: 1 x 0.12m3 and 1 x 0.05m3.
Assume the total is 0.32m3; I would end up with 2 x 0.12m3 and 1 x 0.09m3.
I would prefer something in either C# or SQL that would easily return the results.
Many thanks for any help.
Cheers
I wrote an algorithm that may be a little messy, but I do think it works. Your problem statement isn't 100% unambiguous, so this solution assumes you want to pick cartons so as to minimize the remaining empty space, filling from the largest carton first.
// List of cartons
var cartons = new List<double>
{
0.12,
0.09,
0.05
};
// Amount of stuff that you want to put into cartons
var stuff = 0.32;
var distribution = new Dictionary<double, int>();
// For this algorithm, I want to sort by descending first.
cartons = cartons.OrderByDescending(x => x).ToList();
foreach (var carton in cartons)
{
var count = 0;
while (stuff > 0)
{
if (stuff >= carton)
{
// If the amount of stuff is bigger than the carton size, use a carton of this size, then update stuff
count++;
stuff = stuff - carton;
distribution.CreateNewOrUpdateExisting(carton, 1);
}
else
{
// Otherwise, among the other cartons, find those that can still hold the remaining stuff
var partial = cartons.Where(x => x - stuff >= 0 && x != carton);
if (partial.Any())
{
var min = partial.Min();
if (min > 0)
{
distribution.CreateNewOrUpdateExisting(min, 1);
stuff = stuff - min;
}
}
else
{
break;
}
}
}
}
There's an accompanying extension method, which either adds an item to the dictionary or, if the key already exists, increments the value.
public static class DictionaryExtensions
{
public static void CreateNewOrUpdateExisting(this IDictionary<double, int> map, double key, int value)
{
if (map.ContainsKey(key))
{
map[key]++;
}
else
{
map.Add(key, value);
}
}
}
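For the 0.32m3 example from the question, the distribution dictionary ends up as {0.12 → 2, 0.09 → 1} - that is, 2 x 0.12m3 and 1 x 0.09m3, matching the expected result.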
EDIT
Found a bug in the case where the initial stuff is smaller than the largest carton, so the code has been updated to fix it.
NOTE
This may still not be a 100% foolproof algorithm, as I haven't tested it extensively. But it should give you an idea of how to proceed.
EDIT EDIT
Changing the condition to while (stuff > 0) fixes the bug mentioned in the comments; the code above already includes this fix.
I want to have all combinations of the elements in a list, for a result like this:
List: {1,2,3}
1
2
3
1,2
1,3
2,3
My problem is that I have 180 elements, and I want all combinations of up to 5 elements. In my tests with 4 elements it took a long time (2 minutes), but all went well. With 5 elements, however, I get an OutOfMemoryException.
My code presently is this:
public IEnumerable<IEnumerable<Rondin>> getPossibilites(List<Rondin> rondins)
{
var combin5 = rondins.Combinations(5);
var combin4 = rondins.Combinations(4);
var combin3 = rondins.Combinations(3);
var combin2 = rondins.Combinations(2);
var combin1 = rondins.Combinations(1);
return combin5.Concat(combin4).Concat(combin3).Concat(combin2).Concat(combin1).ToList();
}
Using this function (taken from this question: Algorithm to return all combinations of k elements from n):
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}
I need to search the list for a combination whose elements add up to be near (within a certain precision) a target value, once for each element of another list. Here is all my code for this part:
var possibilites = getPossibilites(opt.rondins);
possibilites = possibilites.Where(p => p.Sum(r => r.longueur + traitScie) < 144);
foreach(BilleOptimisee b in opt.billesOptimisees)
{
var proches = possibilites.Where(p => p.Sum(r => (r.longueur + traitScie)) < b.chute && Math.Abs(b.chute - p.Sum(r => r.longueur)) - (p.Count() * 0.22) < 0.01).OrderByDescending(p => p.Sum(r => r.longueur)).FirstOrDefault();
if(proches != null)
{
foreach (Rondin r in proches)
{
opt.rondins.Remove(r);
b.rondins.Add(r);
possibilites = possibilites.Where(p => !p.Contains(r));
}
}
}
With the code I have, how can I limit the memory taken by my list? Or is there a better solution for searching in a very big set of combinations?
Please, if my question is not good, tell me why and I will do my best to learn and ask better questions next time ;)
Your output list for combinations of 5 elements will have ~1.5*10^9 (that's billion with a b) sublists of size 5. With 32-bit integers, even neglecting list overhead and assuming a perfect list with 0 bytes of overhead, that is ~30GB!
You should reconsider whether you actually need to generate the list the way you do. An alternative is streaming the combinations - i.e. generating them on the fly.
That can be done by writing a function which takes the last combination as an argument and outputs the next one. (To see how, think about incrementing a number by one: you go from the last digit to the first, carrying over, until you are done.)
A streaming example for choosing 2 out of 4:
start: {4,3}
curr = start {4, 3}
curr = next(curr) {4, 2} // reduce last by one
curr = next(curr) {4, 1} // reduce last by one
curr = next(curr) {3, 2} // cannot reduce more, reduce the first by one, and set the follower to maximal possible value
curr = next(curr) {3, 1} // reduce last by one
curr = next(curr) {2, 1} // similar to {3,2}
done.
Now, you need to figure out how to do it for lists of size 2, then generalize it for arbitrary size - and program your streaming combination generator.
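Here is a minimal sketch of such a generator, assuming combinations are encoded as descending index arrays exactly as in the walkthrough above (my illustration, not tested production code):

// Returns the next k-combination in the descending-index encoding used
// above, or null when the enumeration is exhausted. Start from
// {n, n-1, ..., n-k+1} and call repeatedly; only one combination is
// ever held in memory.
static int[] Next(int[] curr)
{
    var next = (int[])curr.Clone();
    int k = next.Length;
    // Find the rightmost position that can still be decreased;
    // position i (0-based) may not go below k - i.
    for (int i = k - 1; i >= 0; i--)
    {
        if (next[i] > k - i)
        {
            next[i]--;
            // Reset everything to the right to the maximal possible values.
            for (int j = i + 1; j < k; j++)
                next[j] = next[j - 1] - 1;
            return next;
        }
    }
    return null; // done
}

For the 2-out-of-4 example, starting from {4,3} this produces {4,2}, {4,1}, {3,2}, {3,1}, {2,1} and then null, matching the sequence above.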
Good Luck!
Let your precision be defined in the imaginary spectrum.
Use a real index to access the leaf and then traverse the leaf with the required precision.
See PrecisLise # http://net7mma.codeplex.com/SourceControl/latest#Common/Collections/Generic/PrecicseList.cs
While the implementation as linked is not 100% complete, you can find where I used a similar concept here:
http://net7mma.codeplex.com/SourceControl/latest#RtspServer/MediaTypes/RFC6184Media.cs
Using this concept I was able to re-order h.264 Access Units and their underlying Network Abstraction Layer components in what I consider a very interesting way... beyond being interesting, it also has the potential to be more efficient while using close to the same amount of memory.
For example, 0 can be followed by 0.1 or 0.01 or 0.001; depending on the type of the key in the list (double, float, Vector, inter alia) you may have the added benefit of using the FPU, or even possibly intrinsics if supported by your processor, thus making sorting and indexing much faster than would be possible on normal sets regardless of the underlying storage mechanism.
Using this concept allows for very interesting ordering... especially if you provide a mechanism to filter the precision.
I was also able to find several bugs in the bit-stream parsers of quite a few well-known media libraries using this methodology...
I found my solution. I'm writing it here so that other people who have a similar problem can have something to work with...
I made a recursive function that searches for a fixed number of possibilities that fit the conditions. When that number of possibilities is found, I return the list of possibilities, do some calculations with the results, and restart the process. I added a timer to stop the search when it takes too long. Since my condition is based on the sum of the elements, I consider only possibilities with distinct values, and search for a small number of possibilities each time (like 1).
So the function returns a possibility with a very high precision; I do what I need to do with this possibility, remove the elements from the original list, and call the function again with the same precision until nothing is returned, so I can continue with another precision. After several precisions are done, there are only about 30 elements left in my list, so I can ask for all the possibilities (that still fit the maximum sum), and this part is much easier than the beginning.
There is my code:
public List<IEnumerable<Rondin>> getPossibilites(IEnumerable<Rondin> rondins, int nbElements, double minimum, double maximum, int instance = 0, double longueur = 0)
{
if(instance == 0)
timer = DateTime.Now;
List<IEnumerable<Rondin>> liste = new List<IEnumerable<Rondin>>();
//Get all distinct rondins that can fit into the maximal length
foreach (Rondin r in rondins.Where(r => r.longueur < (maximum - longueur)).DistinctBy(r => r.longueur).OrderBy(r => r.longueur))
{
//Check the current length
double longueur2 = longueur + r.longueur + traitScie;
//If the current length is under the maximal length
if (longueur2 < maximum)
{
//Get all the possibilities with all rondins except the current one, and add them to the list
foreach (IEnumerable<Rondin> poss in getPossibilites(rondins.Where(rondin => rondin.id != r.id), nbElements - liste.Count, minimum, maximum, instance + 1, longueur2).Select(possibilite => possibilite.Concat(new Rondin[] { r })))
{
liste.Add(poss);
if (liste.Count >= nbElements && nbElements > 0)
break;
}
//If the current length is higher than the minimum, add it to the list
if (longueur2 >= minimum)
liste.Add(new Rondin[] { r });
}
//If we have enough possibilities, we stop the research
if (liste.Count >= nbElements && nbElements > 0)
break;
//If the search is taking too long, stop and return the list
if (DateTime.Now.Subtract(timer).TotalSeconds > 30)
break;
}
return liste;
}
I am comparing two lists of data that were generated from a binary file. I have a good idea why it's running slow: when there's a significant number of records, it does unneeded, redundant work.
For example, once 1a has been matched, 2a != 1a, so why even bother checking 1a again? I need to eliminate 1a from being checked again; if I don't, it will still be checking the first record when it gets to the 400,000th. I thought about making the second loop a foreach, but I can't remove 1a while iterating through the nested loop.
The number of items in either loop can vary, so I don't think a single for loop using 'i' will work, since a match can be anywhere. I'm reading from a binary file.
This is my current code. The program has been running for over an hour and is still going. I removed a lot of my iterating code for readability.
for (int i = 0; i < origItemList.Count; i++)
{
int modFoundIndex = 0;
Boolean foundIt = false;
for (int g = 0; g < modItemList.Count; g++)
{
if ((origItemList[i].X == modItemList[g].X)
&& (origItemList[i].Y == modItemList[g].Y)
&& (origItemList[i].Z == modItemList[g].Z)
&& (origItemList[i].M == modItemList[g].M))
{
foundIt = true;
modFoundIndex = g;
break;
}
else
{
foundIt = false;
}
}
if (foundIt)
{
/*
* This is run assuming it finds an x,y,z,m
* coordinate. It then checks the database file.
*/
//grab the rows where the coordinates match
DataRow origRow = origDbfFile.dataset.Tables[0].Rows[i];
DataRow modRow = modDbfFile.dataset.Tables[0].Rows[modFoundIndex];
//number matched indicates how many columns were matched
int numberMatched = 0;
//get the number of columns to match in order to detect all changes
int numOfColumnsToMatch = origDbfFile.datatable.Columns.Count;
List<String> mismatchedColumns = new List<String>();
//check each column name for a change
foreach (String columnName in columnNames)
{
//this grabs whatever value is in that field
String origRowValue = "" + origRow.Field<Object>(columnName);
String modRowValue = "" + modRow.Field<Object>(columnName);
//check if they are the same
if (origRowValue.Equals(modRowValue))
{
//if they are the same, increase the number matched by one
numberMatched++;
}
else
{
//otherwise add the column to the list of columns that don't match
mismatchedColumns.Add(columnName);
}
}
/* In the event it matches 15/16 columns, show the change */
if (numberMatched != numOfColumnsToMatch)
{
//Grab the shapeFile in question
Item differentAttrShpFile = origItemList[i];
//start blue highlighting
result += "<div class='turnBlue'>";
//show where the change was made at
result += "Change Detected at<br/> point X: " +
differentAttrShpFile.X + ",<br/> point Y: " +
differentAttrShpFile.Y + ",<br/>";
result += "</div>"; //end turnblue div
foreach (String mismatchedColumn in mismatchedColumns)
{
//iterate changes here
}
}
}
}
You're coming at this in a totally wrong way. The loop you have is O(n^2); breaking when you find a match will on average cut the time in half for a hit, and that's not enough. If you have a quarter million items in each list, then this loop executes 62 billion times, and even if the compiler optimizes out the extra array lookups, you're still looking at at least a trillion instructions. You don't do O(n^2) for large n if you can possibly help it!
What you need to do is get rid of the O(n^2) aspect of this. My suggestion (a sketch follows the list):
1) Define a hashing function that looks at the x, y, z & m and comes up with an integer value, my inclination would be to use one that's the wordsize of your target platform.
2) Iterate over both lists, compute hashes for everything.
3) Build an index to one of the tables, hash and the object. I suspect a dictionary is the best data structure here but a simple sorted array would also do.
4) Iterate over the list you didn't build the index over, and compare the hashes to the entries in the index. With a hash table that's an O(n) task; with a sorted array it's O(n log n).
5) When the hashes match do a full comparison to confirm the hit is real as you will get the occasional collision with a good 64-bit hash and you'll get a decent number of them if your hashes are 32-bit.
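A hedged sketch of steps 3-5 using a Dictionary as the index (the X/Y/Z/M members come from the question; the Hash helper is a hypothetical word-sized mixing function, and any decent one will do):

// Step 3: index modItemList by hash. Each bucket keeps the indices of
// all items sharing that hash, so collisions are handled explicitly.
var index = new Dictionary<long, List<int>>();
for (int g = 0; g < modItemList.Count; g++)
{
    long h = Hash(modItemList[g]);
    if (!index.TryGetValue(h, out var bucket))
        index[h] = bucket = new List<int>();
    bucket.Add(g);
}

// Steps 4-5: probe with origItemList, then confirm each hit for real.
for (int i = 0; i < origItemList.Count; i++)
{
    if (!index.TryGetValue(Hash(origItemList[i]), out var candidates))
        continue; // definitely no match
    foreach (int g in candidates)
    {
        if (origItemList[i].X == modItemList[g].X
            && origItemList[i].Y == modItemList[g].Y
            && origItemList[i].Z == modItemList[g].Z
            && origItemList[i].M == modItemList[g].M)
        {
            // real match: run the column comparison for (i, g) here
            break;
        }
    }
}

// Hypothetical mixing function over the four coordinates.
static long Hash(Item item)
{
    long h = item.X.GetHashCode();
    h = h * 31 + item.Y.GetHashCode();
    h = h * 31 + item.Z.GetHashCode();
    h = h * 31 + item.M.GetHashCode();
    return h;
}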
This is similar to what Loren said, but below it is in the language of .NET :)
1. Override the GetHashCode method to return a combination of X, Y, Z and M. Override the Equals method to check that combination.
2. Before the loop, create a HashSet from modItemList (a List).
3. In the loop, first check whether origItemList[i] exists in the HashSet using YourModHashSet.Contains(myObject).
4. If .Contains returns false, carry on; no match.
5. If .Contains returns true, iterate through the entire modItemList and apply your current logic of checking X, Y, Z and M. Note that here you should use the List, because the hash set may map many objects to the same hash code.
Also, I would use foreach instead of for, because I've seen foreach give slightly better results (5 to 30% faster) in such cases.
Update:
I created MyObject class like below:
public class MyObject
{
public int X, Y, Z, M;
public override int GetHashCode()
{
return X*10000 + Y*100 + Z*10 + M;
}
public override bool Equals(object obj)
{
return (obj.GetHashCode() == this.GetHashCode());
}
}
The GetHashCode method is important here. We don't want many false positives. A false positive occurs when the hash matches for some other combination of X, Y, Z and M. The best way to prevent false positives is to multiply each member so that each affects its own range of digits in the hash code. Note that you should take care not to exceed int.MaxValue. If the expected values of X, Y, Z and M are small, you should be good.
// set2 is a HashSet<MyObject>; list1 and list2 are List<MyObject>.
set2.Clear();
s1 = DateTime.Now;
MyObject matchingElement;
totalmatch = 0;
foreach (MyObject elem in list2)
set2.Add(elem);
foreach (MyObject t1 in list1)
{
if (set2.Contains(t1))
{
matchingElement = null;
foreach (MyObject t2 in list2)
{
if (t1.X == t2.X && t1.Y == t2.Y && t1.Z == t2.Z && t1.M == t2.M)
{
totalmatch++;
matchingElement = t2;
break;
}
}
//Do Something on matchingElement if not null
}
}
Console.WriteLine("set foreach with contains: " + (DateTime.Now - s1).TotalSeconds + "\t Total Match: " + totalmatch);
Above is the sample code I was describing in my answer. It should run very fast when few matches are expected.
I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I'm looking for the most efficient algorithm to solve this problem. I target C# and .Net 4.0, and thus also welcome ideas targeting parallelism.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article A Generic Binary Heap Class. In Sorting a Large Text File, I walk through the creation of multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, you can probably adapt it for use in your program.
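To make the heap approach concrete, here is a hedged sketch of a K-way merge; for brevity it uses the PriorityQueue<TElement, TPriority> type from .NET 6, so on .NET 4.0 you would substitute a binary heap such as the one from the articles above:

static IEnumerable<long> MergeK(IEnumerable<IEnumerable<long>> sequences)
{
    // Seed the heap with the first element of every non-empty sequence,
    // keyed by that element's value.
    var heap = new PriorityQueue<IEnumerator<long>, long>();
    foreach (var seq in sequences)
    {
        var e = seq.GetEnumerator();
        if (e.MoveNext())
            heap.Enqueue(e, e.Current);
    }

    // Repeatedly emit the smallest head, then re-enqueue that sequence
    // with its next element. O(N log K) overall.
    while (heap.TryDequeue(out var e, out _))
    {
        yield return e.Current;
        if (e.MoveNext())
            heap.Enqueue(e, e.Current);
    }
}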
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
merge sequences (1 and 2 into 1/2), (3 and 4 into 3/4), …
merge sequences (1/2 and 3/4 into 1/2/3/4), (5/6 and 7/8 into 5/6/7/8), …
…
Here is the merge function (with a bounds check so it never reads past the end of either sequence):
int j = 0;
int k = 0;
for(int i = 0; i < size_merged_seq; i++)
{
if (j < size_seq1 && (k >= size_seq2 || seq1[j] < seq2[k]))
{
merged_seq[i] = seq1[j];
j++;
}
else
{
merged_seq[i] = seq2[k];
k++;
}
}
The easy way is to merge the sequences one by one into an accumulator. However, this requires O(n*k^2) time, where k is the number of sequences and n is the average number of items per sequence. Using a divide and conquer approach you can lower this to O(n*k*log k). The algorithm is as follows (a sketch follows the list):
Divide the k sequences into k/2 groups of 2 elements each (plus 1 group of 1 element if k is odd).
Merge the sequences in each group. You now have k/2 sequences.
Repeat until you get a single sequence.
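A hedged sketch of that pairwise scheme (the two-way Merge helper here mirrors the loop shown earlier, with bounds checks):

// Repeatedly merge sequences in pairs until one remains: O(n*k*log k).
// The merges within each pass are independent, so they could also be
// dispatched with Parallel.For.
static List<long> MergeAll(List<List<long>> sequences)
{
    if (sequences.Count == 0) return new List<long>();
    while (sequences.Count > 1)
    {
        var next = new List<List<long>>();
        for (int i = 0; i + 1 < sequences.Count; i += 2)
            next.Add(Merge(sequences[i], sequences[i + 1]));
        if (sequences.Count % 2 == 1)
            next.Add(sequences[sequences.Count - 1]); // odd one passes through
        sequences = next;
    }
    return sequences[0];
}

// Two-way merge of sorted lists.
static List<long> Merge(List<long> a, List<long> b)
{
    var merged = new List<long>(a.Count + b.Count);
    int j = 0, k = 0;
    while (j < a.Count || k < b.Count)
    {
        if (j < a.Count && (k >= b.Count || a[j] <= b[k]))
            merged.Add(a[j++]);
        else
            merged.Add(b[k++]);
    }
    return merged;
}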
UPDATE:
Turns out that, with all the algorithms considered, it's still faster to do it the simple way:
private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
var list = sortedBunches.SelectMany(bunch => bunch).ToList();
list.Sort();
return list;
}
And for legacy purposes...
Here is the final version, based on prioritizing the enumerators:
private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
{
var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
while (enumerators.Count > 1)
{
yield return enumerators[0].Current;
if (enumerators[0].MoveNext())
{
// sift the advanced enumerator down so the list stays sorted by Current
int i = 0;
while (i + 1 < enumerators.Count && enumerators[i].Current.CompareTo(enumerators[i + 1].Current) > 0)
{
var tmp = enumerators[i];
enumerators[i] = enumerators[i + 1];
enumerators[i + 1] = tmp;
i++;
}
}
else
{
enumerators.RemoveAt(0);
}
}
if (enumerators.Count == 1)
{
do
{
yield return enumerators[0].Current;
} while (enumerators[0].MoveNext());
}
}
Hey everyone, great community you've got here. I'm an Electrical Engineer doing some "programming" work on the side to help pay the bills. I mention this because I want you to take into consideration that I don't have formal Computer Science training, but I have been coding for the past 7 years.
I have several Excel tables with information (all numeric): basically "dialed phone numbers" in one column and the number of minutes to each of those numbers in another. Separately I have a list of "carrier prefix codes" for the different carriers in my country. What I want to do is separate all the "traffic" per carrier. Here is the scenario:
First dialed number row: 123456789ABCD,100 <-- That would be a 13 digit phone number and 100 minutes.
I have a list of 12,000+ prefix codes for carrier 1, these codes vary in length, and I need to check everyone of them:
Prefix Code 1: 1234567 <-- this code is 7 digits long.
I need to take the first 7 digits of the dialed number and compare them to the prefix code; if a match is found, I add the number of minutes to a subtotal for later use. Please consider that not all prefix codes are the same length; sometimes they are shorter or longer.
Most of this should be a piece of cake, and I should be able to do it, but I'm getting kind of scared by the massive amount of data. Sometimes the dialed number lists consist of up to 30,000 numbers, the "carrier prefix code" lists are around 13,000 rows long, and I usually check 3 carriers; that means I have to do a lot of "matches".
Does anyone have an idea how to do this efficiently in C#? Or any other language, to be honest. I need to do this quite often, so designing a tool to do it would make much more sense. I need a good perspective from someone who does have that "Computer Scientist" background.
The lists don't need to be in Excel worksheets; I can export to a CSV file and work from there. I don't need an "MS Office" interface.
Thanks for your help.
Update:
Thank you all for your time answering my question. I guess in my ignorance I over-exaggerated the word "efficient". I don't perform this task every few seconds; it's something I have to do once per day, and I hate doing it with Excel and VLOOKUPs, etc.
I've learned about new concepts from you guys and I hope I can build a solution(s) using your ideas.
UPDATE
You can do a simple trick: group the prefixes by their first digits into a dictionary and match each number only against the correct subset. I tested it with the following two LINQ statements, assuming every prefix has at least three digits.
const Int32 minimumPrefixLength = 3;
var groupedPrefixes = prefixes
    .GroupBy(p => p.Substring(0, minimumPrefixLength))
    .ToDictionary(g => g.Key, g => g);
var numberPrefixes = numbers
    .Select(n => groupedPrefixes[n.Substring(0, minimumPrefixLength)]
        .First(n.StartsWith))
    .ToList();
So how fast is this? 15,000 prefixes and 50,000 numbers took less than 250 milliseconds. Fast enough for two lines of code?
Note that the performance heavily depends on the minimum prefix length (MPL), hence on the number of prefix groups you can construct.
MPL    Runtime
-----------------
1      10,198 ms
2       1,179 ms
3         205 ms
4         130 ms
5         107 ms
Just to give a rough idea - I did just one run and had a lot of other stuff going on.
Original answer
I wouldn't worry much about performance - an average desktop PC can quite easily deal with database tables of 100 million rows. Maybe it takes five minutes, but I assume you don't want to perform the task every other second.
I just ran a test. I generated a list of 15,000 unique prefixes with 5 to 10 digits. From these prefixes I generated 50,000 numbers, each a prefix plus an additional 5 to 10 digits.
List<String> prefixes = GeneratePrefixes();
List<String> numbers = GenerateNumbers(prefixes);
Then I used the following LINQ to Objects query to find the prefix of each number.
var numberPrefixes = numbers.Select(n => prefixes.First(n.StartsWith)).ToList();
Well, it took about a minute on my Core 2 Duo laptop at 2.0 GHz. So if one minute of processing time is acceptable, maybe two or three if you include aggregation, I would not try to optimize anything. Of course, it would be really nice if the program could do the task in a second or two, but that adds quite a bit of complexity and many things to get wrong. And it takes time to design, write, and test. The LINQ statement took me only seconds.
Test application
Note that generating many prefixes is really slow and might take a minute or two.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace Test
{
static class Program
{
static void Main()
{
// Set number of prefixes and calls to not more than 50 to get results
// printed to the console.
Console.Write("Generating prefixes");
List<String> prefixes = Program.GeneratePrefixes(5, 10, 15);
Console.WriteLine();
Console.Write("Generating calls");
List<Call> calls = Program.GenerateCalls(prefixes, 5, 10, 50);
Console.WriteLine();
Console.WriteLine("Processing started.");
Stopwatch stopwatch = new Stopwatch();
const Int32 minimumPrefixLength = 5;
stopwatch.Start();
var groupedPrefixes = prefixes
    .GroupBy(p => p.Substring(0, minimumPrefixLength))
    .ToDictionary(g => g.Key, g => g);
var result = calls
    .GroupBy(c => groupedPrefixes[c.Number.Substring(0, minimumPrefixLength)]
        .First(c.Number.StartsWith))
.Select(g => new Call(g.Key, g.Sum(i => i.Duration)))
.ToList();
stopwatch.Stop();
Console.WriteLine("Processing finished.");
Console.WriteLine(stopwatch.Elapsed);
if ((prefixes.Count <= 50) && (calls.Count <= 50))
{
Console.WriteLine("Prefixes");
foreach (String prefix in prefixes.OrderBy(p => p))
{
Console.WriteLine(String.Format(" prefix={0}", prefix));
}
Console.WriteLine("Calls");
foreach (Call call in calls.OrderBy(c => c.Number).ThenBy(c => c.Duration))
{
Console.WriteLine(String.Format(" number={0} duration={1}", call.Number, call.Duration));
}
Console.WriteLine("Result");
foreach (Call call in result.OrderBy(c => c.Number))
{
Console.WriteLine(String.Format(" prefix={0} accumulated duration={1}", call.Number, call.Duration));
}
}
Console.ReadLine();
}
private static List<String> GeneratePrefixes(Int32 minimumLength, Int32 maximumLength, Int32 count)
{
Random random = new Random();
List<String> prefixes = new List<String>(count);
StringBuilder stringBuilder = new StringBuilder(maximumLength);
while (prefixes.Count < count)
{
stringBuilder.Length = 0;
for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
{
stringBuilder.Append(random.Next(10));
}
String prefix = stringBuilder.ToString();
if (prefixes.Count % 1000 == 0)
{
Console.Write(".");
}
if (prefixes.All(p => !p.StartsWith(prefix) && !prefix.StartsWith(p)))
{
prefixes.Add(stringBuilder.ToString());
}
}
return prefixes;
}
private static List<Call> GenerateCalls(List<String> prefixes, Int32 minimumLength, Int32 maximumLength, Int32 count)
{
Random random = new Random();
List<Call> calls = new List<Call>(count);
StringBuilder stringBuilder = new StringBuilder();
while (calls.Count < count)
{
stringBuilder.Length = 0;
stringBuilder.Append(prefixes[random.Next(prefixes.Count)]);
for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
{
stringBuilder.Append(random.Next(10));
}
if (calls.Count % 1000 == 0)
{
Console.Write(".");
}
calls.Add(new Call(stringBuilder.ToString(), random.Next(1000)));
}
return calls;
}
private class Call
{
public Call (String number, Decimal duration)
{
this.Number = number;
this.Duration = duration;
}
public String Number { get; private set; }
public Decimal Duration { get; private set; }
}
}
}
It sounds to me like you need to build a trie from the carrier prefixes. You'll end up with a single trie, where the terminating nodes tell you the carrier for that prefix.
Then create a dictionary from carrier to an int or long (the total).
Then for each dialed number row, just work your way down the trie until you find the carrier. Find the total number of minutes so far for the carrier, and add the current row - then move on.
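A hedged sketch of such a trie (the types and names here are my own illustration, not a library API):

// Digit trie: each node has up to 10 children; a non-null Carrier on a
// node marks the end of a prefix belonging to that carrier.
class TrieNode
{
    public TrieNode[] Children = new TrieNode[10];
    public string Carrier; // null unless a prefix ends here
}

class PrefixTrie
{
    private readonly TrieNode root = new TrieNode();

    public void Add(string prefix, string carrier)
    {
        var node = root;
        foreach (char c in prefix)
        {
            int d = c - '0';
            node = node.Children[d] ?? (node.Children[d] = new TrieNode());
        }
        node.Carrier = carrier;
    }

    // Walk down the trie; the last carrier seen is the longest matching prefix.
    public string FindCarrier(string number)
    {
        var node = root;
        string carrier = null;
        foreach (char c in number)
        {
            node = node.Children[c - '0'];
            if (node == null) break;
            if (node.Carrier != null) carrier = node.Carrier;
        }
        return carrier;
    }
}

With that in place, a Dictionary<string, long> keyed by carrier can accumulate the minutes: look up FindCarrier(number) for each row and add that row's minutes to the carrier's total.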
The easiest data structure that would do this fairly efficiently would be a list of sets. Make a Set for each carrier to contain all the prefixes.
Now, to associate a call with a carrier:
foreach (Carrier carrier in carriers)
{
bool found = false;
for (int length = 1; length <= 7; length++)
{
int prefix = ExtractDigits(callNumber, length); // first 'length' digits as an int
if (carrier.Prefixes.Contains(prefix))
{
carrier.Calls.Add(callNumber);
found = true;
break;
}
}
if (found)
break;
}
If you have 10 carriers, there will be up to 70 set lookups per call. But a lookup in a set isn't slow (much faster than a linear search), so this should give you quite a big speed-up over a brute-force linear search.
You can go a step further and group the prefixes for each carrier according to the length. That way, if a carrier has only prefixes of length 7 and 4, you'd know to only bother to extract and look up those lengths, each time looking in the set of prefixes of that length.
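A hedged sketch of that refinement, using string prefixes for simplicity (callNumber comes from the code above; the rest is illustrative):

// For each carrier, group its prefixes by length so we only test the
// lengths that carrier actually uses.
var prefixesByLength = carrier.Prefixes
    .GroupBy(p => p.Length)
    .ToDictionary(g => g.Key, g => new HashSet<string>(g));

foreach (var entry in prefixesByLength)
{
    if (callNumber.Length < entry.Key)
        continue;
    if (entry.Value.Contains(callNumber.Substring(0, entry.Key)))
    {
        // match: attribute this call's minutes to the carrier
        break;
    }
}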
How about dumping your data into a couple of database tables and then querying them using SQL? Easy!
CREATE TABLE dbo.dialled_numbers ( number VARCHAR(100), minutes INT )
CREATE TABLE dbo.prefixes ( prefix VARCHAR(100) )
-- now populate the tables, create indexes etc
-- and then just run your query...
SELECT p.prefix,
SUM(n.minutes) AS total_minutes
FROM dbo.dialled_numbers AS n
INNER JOIN dbo.prefixes AS p
ON n.number LIKE p.prefix + '%'
GROUP BY p.prefix
(This was written for SQL Server, but should be very simple to translate for any other DBMS.)
Maybe it would be simpler (not necessarily more efficient) to do it in a database instead of C#.
You could insert the rows into the database and, on insert, determine the carrier and store it in the record (maybe in an insert trigger).
Then your report would be a sum query on the table.
I would probably just put the entries in a List, sort it, then use a binary search to look for matches. Tailor the binary search match criteria to return the first item that matches then iterate along the list until you find one that doesn't match. A binary search takes only around 15 comparisons to search a list of 30,000 items.
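A hedged sketch of that idea, with the dialed numbers kept as plain strings and the accumulation step left as a comment:

// Sort the dialed numbers once; ordinal order groups shared prefixes together.
numbers.Sort(StringComparer.Ordinal);

foreach (string prefix in prefixes)
{
    // BinarySearch returns the bitwise complement of the insertion point
    // when the exact value is absent; either way this lands at or near the
    // first candidate that could start with the prefix.
    int index = numbers.BinarySearch(prefix, StringComparer.Ordinal);
    if (index < 0)
        index = ~index;
    // If the prefix itself occurs as a full number, back up to its first occurrence.
    while (index > 0 && numbers[index - 1].StartsWith(prefix, StringComparison.Ordinal))
        index--;

    while (index < numbers.Count &&
           numbers[index].StartsWith(prefix, StringComparison.Ordinal))
    {
        // numbers[index] belongs to this prefix; accumulate its minutes here
        index++;
    }
}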
You may want to use a Hashtable in C#.
That way you have key-value pairs: your keys could be the phone numbers, and your values the total minutes. If a match is found in the key set, modify the total minutes; otherwise, add a new key.
You would then just need to modify your searching algorithm to look not at the entire key, but only at the first 7 digits of it.
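As a hedged sketch of that bookkeeping with the generic Dictionary (the modern counterpart of Hashtable), assuming calls is a sequence of (number, minutes) tuples and prefixes holds the carrier's codes:

// totals maps each matched prefix to its accumulated minutes.
var totals = new Dictionary<string, int>();
var prefixSet = new HashSet<string>(prefixes);

foreach (var (number, minutes) in calls)
{
    // Try progressively shorter leading substrings until one is a known
    // prefix; this honors prefixes of different lengths, longest first.
    for (int len = Math.Min(number.Length, 7); len >= 1; len--)
    {
        string candidate = number.Substring(0, len);
        if (prefixSet.Contains(candidate))
        {
            totals[candidate] = totals.TryGetValue(candidate, out int t)
                ? t + minutes : minutes;
            break;
        }
    }
}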