I have a Dictionary<string,T> where the string is the record's key, and I need to maintain two other pieces of information about each record in the dictionary: the record's category and its redundancy (how many times it is repeated).
For example: the record XYZ1 is of category 1 and is repeated once, so the implementation has to be something like this:
"XYZ1", {1,1}
Now moving on, I may encounter the same record in my dataset, therefore the value of the key has to be updated like:
"XYZ1", {1,2}
"XYZ1", {1,3}
...
Since I am processing a large number of records (on the order of 100K), I tried this approach, but it seems inefficient: the extra effort of fetching the value from the dictionary, slicing {1,1}, and converting both slices to integers puts a lot of overhead on the execution.
I was thinking of using binary digits to represent both category and repetition, and maybe a bitmask to fetch each piece.
Edit: I tried using an object with 2 properties, and then Tuple<int,int>. Complexity got worse!
My question: is it possible to do so?
If not (in terms of complexity), any suggestions?
What is your type T? You could define a custom type which holds the information you need (category and occurrences).
class MyInfo {
    public int c { get; set; } // category
    public int o { get; set; } // occurrences
}
Dictionary<String, MyInfo> data;
Then when traversing your data you can easily check whether some key is already present. If yes, just increment the occurrences; else insert a new element.
MyInfo d;
foreach (var e in elements) {
    if (!data.TryGetValue(e.key, out d))
        data.Add(e.key, new MyInfo { c = e.cat, o = 1 });
    else
        d.o++;
}
EDIT
You could also combine the category and the number of occurrences into one UInt64. For instance, keep the category in the upper 32 bits (i.e. you can have about 4 billion categories) and the number of occurrences in the lower 32 bits (i.e. each key can occur about 4 billion times):
Dictionary<string, UInt64> data = new Dictionary<string, UInt64>();
UInt64 d;
foreach (var e in elements) {
    if (!data.TryGetValue(e.key, out d))
        // cast before shifting: shifting a 32-bit int by 32 is a no-op in C#
        data[e.key] = ((UInt64)e.cat << 32) + 1;
    else
        data[e.key] = d + 1;
}
And if you want to get the number of occurrences for one specific key you can just inspect the respective part of the value.
var d = data["somekey"];
var occurrences = d & 0xFFFFFFFF;
var category = d >> 32;
It seems like the category never changes. So rather than using a simple string for the key of your dictionary, I would instead do something like:
Dictionary<Tuple<string,int>,int>, where the key of the dictionary is a Tuple<string,int> in which the string is the record and the int is the category. The value in the dictionary is then just a count.
A dictionary is probably going to be the fastest data structure for what you're trying to accomplish, as it has near-constant-time O(1) lookup and insertion.
You can speed it up a little by using the Tuple, since the category becomes part of the key and is no longer a piece of information you have to access separately.
At the same time, you could keep the string as the key and store a Tuple<int,int> as the value, with Item1 as the category and Item2 as the count.
Either way is going to be roughly equivalent in speed; processing 100k records in such a manner should be pretty fast.
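For illustration, a minimal sketch of the Tuple-keyed counting loop; the e.key and e.cat member names follow the earlier snippets and are placeholders for whatever your records actually expose:
var counts = new Dictionary<Tuple<string, int>, int>();

foreach (var e in elements)
{
    // Tuple provides structural equality and hashing, so it works as a dictionary key
    var key = Tuple.Create(e.key, e.cat);
    int count;
    counts[key] = counts.TryGetValue(key, out count) ? count + 1 : 1;
}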
I came across an algorithm problem. Suppose I receive a credit and would like to buy two items from a local store whose prices add up to the entire value of the credit. The input data has three lines:
the first line is the credit, the second line is the number of items, and the third line lists all the item prices.
Sample data 1:
200
7
150 24 79 50 88 345 3
Which means I have $200 to buy two items, and there are 7 items. I should buy item 1 and item 4, as 200 = 150 + 50.
Sample data 2:
8
8
2 1 9 4 4 56 90 3
Which indicates that I have $8 to pick two items from a total of 8 articles. The answer is items 4 and 5, because 8 = 4 + 4.
My thought is first to create the array, then pick any item, say item x. Create another array, say "remain", which removes x from the original array.
Subtract the price of x from the credit to get the remnant, and check whether "remain" contains the remnant.
Here is my code in C#.
// Read lines from input file and create array price
foreach (string s in price)
{
    int x = Int32.Parse(s);
    string y = (credit - x).ToString();
    index1 = Array.IndexOf(price, s);
    index2 = Array.IndexOf(price, y);
    remain = price.ToList();
    remain.RemoveAt(index1); // remove an element
    if (remain.Contains(y))
    {
        break;
    }
}
// return something....
My two questions:
How is the complexity? I think it is O(n²).
Any improvement to the algorithm? When I use sample 2, I have trouble getting the correct indices: because there are two "4"s in the array, it always returns the first index, since IndexOf(String) reports the zero-based index of the first occurrence of the specified string.
You can simply sort the array in O(n log n) time. Then for each element A[i], conduct a binary search for S - A[i]; each search costs O(log n), so the whole pass is again O(n log n).
EDIT: As pointed out by Heuster, you can solve the 2-SUM problem on the sorted array in linear time by using two pointers (one from the beginning and the other from the end).
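For illustration, a sketch of that two-pointer scan; it assumes the array is already sorted ascending and reports the matching indices through out parameters:
// O(n) 2-SUM scan over a sorted ascending array.
static bool TryFindPair(int[] sorted, int sum, out int low, out int high)
{
    low = 0;
    high = sorted.Length - 1;
    while (low < high)
    {
        int s = sorted[low] + sorted[high];
        if (s == sum) return true;  // sorted[low] + sorted[high] is the pair
        if (s < sum) low++;         // sum too small: move the left pointer up
        else high--;                // sum too large: move the right pointer down
    }
    return false;
}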
Create a HashSet<int> of the prices, then go through it sequentially. Something like:
HashSet<int> items = new HashSet<int>(itemsList);
int price1 = -1;
int price2 = -1;
foreach (int price in items)
{
    int otherPrice = 200 - price; // 200 is the target credit from sample 1
    // note: when otherPrice == price this can falsely match a single
    // occurrence; see the Sum/2 caveat in the answer below
    if (items.Contains(otherPrice))
    {
        // found a match.
        price1 = price;
        price2 = otherPrice;
        break;
    }
}
if (price2 != -1)
{
    // found a match.
    // price1 and price2 contain the values that add up to your target.
    // now remove the items from the HashSet
    items.Remove(price1);
    items.Remove(price2);
}
This is O(n) to create the HashSet. Because lookups in the HashSet are O(1), the foreach loop is O(n).
This problem is called 2-sum. See, for example, http://coderevisited.com/2-sum-problem/
Here is an algorithm with O(N) time complexity and O(N) space:
1. Put all numbers in a hash table.
2. For each number Arr[i], look up Sum - Arr[i] in the hash table in O(1).
3. If found, then (Arr[i], Sum - Arr[i]) is your pair that adds up to Sum.
Note: the only failing case is when Arr[i] = Sum/2; then you can get a false positive. But you can always check in O(N) whether the array contains two copies of Sum/2.
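For illustration, a sketch of this hash-based approach with the Sum/2 check folded in; it counts occurrences so a lone Sum/2 cannot produce a false positive:
static bool HasPairWithSum(int[] arr, int sum)
{
    // count occurrences so duplicates (e.g. the two 4s summing to 8) stay visible
    var counts = new Dictionary<int, int>();
    foreach (int x in arr)
    {
        int c;
        counts[x] = counts.TryGetValue(x, out c) ? c + 1 : 1;
    }
    foreach (int x in arr)
    {
        int other = sum - x;
        int c;
        if (!counts.TryGetValue(other, out c)) continue;
        if (other != x || c >= 2) return true; // Sum/2 needs at least two copies
    }
    return false;
}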
I know I am posting this a year and a half later, but I just happened to come across this problem and wanted to add my input.
If a solution exists, then both values in it must be less than the target sum.
Perform a binary search in the sorted array of values, searching for the target sum (which may or may not be there).
The binary search will end by either finding the sum or finding the closest value below it. That is your starting high bound when searching with the previously mentioned solutions; any value above it cannot be part of the solution, as it exceeds the target.
At this point, you have eliminated a chunk of data in O(log n) time that would otherwise be eliminated in O(n) time.
Again, this is an optimization that may only be worth implementing if the data set calls for it.
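A sketch of that pruning step, assuming a sorted ascending array of prices:
// Exclusive upper bound on the indices worth searching: everything at or
// beyond the returned index is >= targetSum and (with positive prices)
// cannot be part of a pair.
static int UpperBound(int[] sorted, int targetSum)
{
    int i = Array.BinarySearch(sorted, targetSum);
    return i < 0 ? ~i : i;
}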
I'm searching through a generic list (or IQueryable) which contains 3 columns. I'm trying to find the value of the third column based on the first and second, but the search is really slow. For a single search the speed isn't noticeable, but I'm performing this search in a loop, and for 700 iterations it takes a combined time of over 2 minutes, which isn't any use. Columns 1 and 2 are int and column 3 is a double. Here is the LINQ I'm using:
public static Distance FindByStartAndEnd(int start, int end, IQueryable<Distance> distanceList)
{
    Distance item = distanceList.Where(h => h.Start == start && h.End == end).FirstOrDefault();
    return item;
}
There could be up to 60,000 entries in the IQueryable. I know that is quite a lot, but I didn't think it would pose any problem for searching.
So my question is: is there a better way to search through a collection when you need to match 2 columns to get the value of a third? I need all 700 searches to be almost instant, but each takes about 300 ms, which soon mounts up.
UPDATE - Final Solution
I've now created a dictionary using Tuple with start and end as the key. I think this could be the right solution.
var dictionary = new Dictionary<Tuple<int, int>, double>();

// while loading, key each row on (start, end):
var key = new Tuple<int, int>(Convert.ToInt32(reader[0]), Convert.ToInt32(reader[1]));
var value = Convert.ToDouble(reader[2]);
if (value <= distance)
{
    dictionary.Add(key, value);
}

// a lookup key is then built the same way:
var lookupKey = new Tuple<int, int>(5, 20);
Works fine - much faster
Create a dictionary where columns 1 and 2 create the key. You create the dictionary once and then your searches will be almost instant.
If you have control over your collection and model classes, there is a library which allows you to index the properties of the class, which can greatly speed up searching.
http://i4o.codeplex.com/
I'd give a HashSet a try. This should speed things up ;)
Create a single value out of the first two columns, for example by packing them into a long, and use that as the key in a dictionary:
public long Combine(int start, int end) {
    // cast 'end' through uint so a negative value can't sign-extend into the high bits
    return ((long)start << 32) | (uint)end;
}
Dictionary<long, Distance> lookup = distanceList.ToDictionary(h => Combine(h.Start, h.End));
Then you can look up the value:
public static Distance FindByStartAndEnd(int start, int end, IQueryable<Distance> distanceList) {
    Distance item;
    if (!lookup.TryGetValue(Combine(start, end), out item)) {
        item = null;
    }
    return item;
}
Getting an item from a dictionary is close to an O(1) operation, which should make a dramatic difference compared with the O(n) operation of looping through the items to find one.
Your problem is that LINQ has to execute the expression tree every time you return an item. Just call this method with multiple start and end values:
public static IEnumerable<Distance> FindByStartAndEnd(
    IEnumerable<KeyValuePair<int, int>> startAndEnd,
    IQueryable<Distance> distanceList)
{
    return
        from item in distanceList
        where
            startAndEnd.Select(s => s.Key).Contains(item.Start)
            && startAndEnd.Select(s => s.Value).Contains(item.End)
        select item;
}
I have a scenario at work where we have several different tables of data in a format similar to the following:
Table Name: HingeArms
Hght Part #1 Part #2
33 S-HG-088-00 S-HG-089-00
41 S-HG-084-00 S-HG-085-00
49 S-HG-033-00 S-HG-036-00
57 S-HG-034-00 S-HG-037-00
where the first column (and possibly more) contains numeric data sorted ascending and represents a range used to determine the proper record to get (e.g. height <= 33 then Part 1 = S-HG-088-00; height <= 41 then Part 1 = S-HG-084-00; etc.).
I need to look up and select the nearest match given a specified value. For example, given a height of 34.25, I need to get the second record in the set above:
41 S-HG-084-00 S-HG-085-00
These tables are currently stored in a VB.NET Hashtable "cache" of data loaded from a CSV file, where the key for the Hashtable is a composite of the table name and one or more columns from the table that represent the "key" for the record. For example, for the above table, the Hashtable Add for the first record would be:
ht.Add("HingeArms,33","S-HG-088-00,S-HG-089-00")
This seems less than optimal and I have some flexibility to change the structure if necessary (the cache contains data from other tables where direct lookup is possible... these "range" tables just got dumped in because it was "easy"). I was looking for a "Next" method on a Hashtable/Dictionary to give me the closest matching record in the range, but that's obviously not available on the stock classes in VB.NET.
Any ideas on a way to do what I'm looking for with a Hashtable or in a different structure? It needs to be performant as the lookup will get called often in different sections of code. Any thoughts would be greatly appreciated. Thanks.
A hashtable is not a good data structure for this, because items are scattered around the internal array according to their hash code, not their values.
Use a sorted array or List<T> and perform a binary search, e.g.
Setup:
// assumes a HingeArm class exposing Height, Part1 and Part2 properties
var values = new List<HingeArm>
{
    new HingeArm(33, "S-HG-088-00", "S-HG-089-00"),
    new HingeArm(41, "S-HG-084-00", "S-HG-085-00"),
    new HingeArm(49, "S-HG-033-00", "S-HG-036-00"),
    new HingeArm(57, "S-HG-034-00", "S-HG-037-00"),
};
values.Sort((x, y) => x.Height.CompareTo(y.Height));
var keys = values.Select(x => x.Height).ToList();
Lookup:
var index = keys.BinarySearch(34.25);
if (index < 0)
{
    index = ~index;
}
var result = values[index];
// result == { Height = 41, Part1 = "S-HG-084-00", Part2 = "S-HG-085-00" }
You can use a sorted .NET array in combination with Array.BinarySearch().
If you get a non-negative value, it is the index of the exact match.
Otherwise, if the result is negative, use the formula
int index = ~Array.BinarySearch(sortedArray, value) - 1
to get the index of the previous "nearest" match.
The meaning of "nearest" is defined by the comparer you use; it must be the same one used when sorting the array. See:
http://gmamaladze.wordpress.com/2011/07/22/back-to-the-roots-net-binary-search-and-the-meaning-of-the-negative-number-of-the-array-binarysearch-return-value/
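For illustration, the "previous nearest" variant on the sample keys from the question (contrast with the "next greater" lookup in the answer above):
double[] keys = { 33, 41, 49, 57 }; // must already be sorted
int index = Array.BinarySearch(keys, 34.25);
if (index < 0)
    index = ~index - 1; // largest key <= 34.25; a result of -1 means the value is below keys[0]
// index is now 0, i.e. the 33 row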
How about LINQ-to-Objects? (This is by no means meant to be a performant solution, btw.)
var ht = new Dictionary<string, string>();
ht.Add("HingeArms,33", "S-HG-088-00,S-HG-089-00");
decimal wantedHeight = 34.25m;
var foundIt =
    ht.Select(x => new { Height = decimal.Parse(x.Key.Split(',')[1]), x.Key, x.Value })
      .Where(x => x.Height >= wantedHeight)  // nearest record at or above the wanted height
      .OrderBy(x => x.Height)
      .FirstOrDefault();                     // SingleOrDefault would throw on multiple matches
if (foundIt != null)
{
    // Do something with your item in foundIt
}
I am trying to write a program to select a random name from the US Census last name list. The list format is
Name Weight Cumulative line
----- ----- ----- -
SMITH 1.006 1.006 1
JOHNSON 0.810 1.816 2
WILLIAMS 0.699 2.515 3
JONES 0.621 3.136 4
BROWN 0.621 3.757 5
DAVIS 0.480 4.237 6
Assuming I load the data into a structure like
class NameEntry
{
    // a member can't share its enclosing type's name, hence NameEntry
    public string Name { get; set; }
    public decimal Weight { get; set; }
    public decimal Cumulative { get; set; }
}
What data structure would be best to hold the list of names, and what would be the best way to select a random name from the list so that the distribution of drawn names matches the real-world distribution?
I will only be working with the first 10,000 rows, if that makes a difference to the data structure.
I have tried looking at some of the other questions about weighted randomness, but I am having a bit of trouble turning theory into code. I do not know much math theory, so I do not know whether this is a "with or without replacement" random selection; I want the same name to be able to show up more than once, whichever one that means.
The "easiest" way to handle this would be to keep this in a list.
You could then just use:
NameEntry GetRandomName(Random random, List<NameEntry> names)
{
    double value = random.NextDouble() * (double)names[names.Count - 1].Cumulative;
    // first entry whose cumulative weight reaches the drawn value
    return names.First(name => (double)name.Cumulative >= value);
}
If speed is a concern, you could store a separate array of just the Cumulative values. With this, you could use Array.BinarySearch to quickly find the appropriate index:
NameEntry GetRandomName(Random random, List<NameEntry> names, double[] cumulativeValues)
{
    double value = random.NextDouble() * cumulativeValues[cumulativeValues.Length - 1];
    int index = Array.BinarySearch(cumulativeValues, value);
    if (index < 0)      // not found exactly: ~index is the next larger entry
        index = ~index;
    return names[index];
}
Another option, which is probably the most efficient, would be to use one of the C5 Generic Collection Library's tree classes. You could then use RangeFrom to find the appropriate name. This has the advantage of not requiring a separate collection.
I've created a C# library for randomly selecting weighted items.
It implements both the tree-selection and walker alias method algorithms, to give the best performance for all use-cases.
It is unit-tested and optimized.
It has LINQ support.
It's free and open-source, licensed under the MIT license.
Some example code:
IWeightedRandomizer<string> randomizer = new DynamicWeightedRandomizer<string>();
randomizer["Joe"] = 1;
randomizer["Ryan"] = 2;
randomizer["Jason"] = 2;
string name1 = randomizer.RandomWithReplacement();
//name1 has a 20% chance of being "Joe", 40% of "Ryan", 40% of "Jason"
string name2 = randomizer.RandomWithRemoval();
//Same as above, except whichever one was chosen has been removed from the list.
I'd say an array (vector if you prefer) would be best to hold them. As for the weighted selection: find the sum, pick a random number between zero and the sum, and take the first name whose cumulative value is at least that number (e.g. here, < 1.006 = Smith, 1.006-1.816 = Johnson, etc.).
Just for fun, and in no way optimal:
List<NameEntry> Names = // Load your structure into this
List<String> NameBank = new List<String>();
foreach (NameEntry name in Names)
    for (int i = 0; i < (int)(name.Weight * 1000); i++)
        NameBank.Add(name.Name);
then:
Random random = new Random();
String output = NameBank[random.Next(NameBank.Count)];
Hey everyone, great community you've got here. I'm an Electrical Engineer doing some "programming" work on the side to help pay the bills. I say this because I want you to take into consideration that I don't have formal Computer Science training, but I have been coding for the past 7 years.
I have several Excel tables with information (all numeric): basically "dialed phone numbers" in one column and the number of minutes to each of those numbers in another. Separately I have a list of "carrier prefix codes" for the different carriers in my country. What I want to do is separate all the "traffic" per carrier. Here is the scenario:
First dialed-number row: 123456789ABCD,100 <-- that would be a 13-digit phone number and 100 minutes.
I have a list of 12,000+ prefix codes for carrier 1; these codes vary in length, and I need to check every one of them:
Prefix Code 1: 1234567 <-- this code is 7 digits long.
I need to take the first 7 digits of the dialed number and compare them to the prefix code; if a match is found, I add the number of minutes to a subtotal for later use. Please consider that not all prefix codes are the same length; sometimes they are shorter or longer.
Most of this should be a piece of cake, and I should be able to do it, but I'm getting kind of scared by the massive amount of data: sometimes the dialed-number lists consist of up to 30,000 numbers, the "carrier prefix code" lists are around 13,000 rows long, and I usually check 3 carriers, which means I have to do a lot of "matches".
Does anyone have an idea of how to do this efficiently using C#? Or any other language, to be honest. I need to do this quite often, and designing a tool to do it would make much more sense. I need a good perspective from someone who does have that "Computer Scientist" background.
The lists don't need to be in Excel worksheets; I can export to a CSV file and work from there. I don't need an "MS Office" interface.
Thanks for your help.
Update:
Thank you all for your time answering my question. I guess in my ignorance I exaggerated the word "efficient". I don't perform this task every few seconds; it's something I have to do once per day, and I hate doing it with Excel and VLOOKUPs, etc.
I've learned about new concepts from you guys and I hope I can build a solution(s) using your ideas.
UPDATE
You can do a simple trick: group the prefixes by their first digits into a dictionary and match the numbers only against the correct subset. I tested it with the following two LINQ statements, assuming every prefix has at least three digits.
const Int32 minimumPrefixLength = 3;
var groupedPefixes = prefixes
.GroupBy(p => p.Substring(0, minimumPrefixLength))
.ToDictionary(g => g.Key, g => g);
var numberPrefixes = numbers
.Select(n => groupedPefixes[n.Substring(0, minimumPrefixLength)]
.First(n.StartsWith))
.ToList();
So how fast is this? 15,000 prefixes and 50,000 numbers took less than 250 milliseconds. Fast enough for two lines of code?
Note that the performance heavily depends on the minimum prefix length (MPL), hence on the number of prefix groups you can construct.
MPL  Runtime
-----------------
1    10,198 ms
2     1,179 ms
3       205 ms
4       130 ms
5       107 ms
Just to give a rough idea - I did just one run and had a lot of other stuff going on.
Original answer
I wouldn't worry much about performance - an average desktop PC can quite easily deal with database tables of 100 million rows. Maybe it takes five minutes, but I assume you don't want to perform the task every other second.
I just ran a test. I generated a list of 15,000 unique prefixes with 5 to 10 digits. From these prefixes I generated 50,000 numbers consisting of a prefix plus an additional 5 to 10 digits.
List<String> prefixes = GeneratePrefixes();
List<String> numbers = GenerateNumbers(prefixes);
Then I used the following LINQ to Objects query to find the prefix of each number.
var numberPrefixes = numbers.Select(n => prefixes.First(n.StartsWith)).ToList();
Well, it took about a minute on my Core 2 Duo laptop at 2.0 GHz. So if one minute of processing time is acceptable - maybe two or three if you include aggregation - I would not try to optimize anything. Of course, it would be really nice if the program could do the task in a second or two, but that will add quite a bit of complexity and many things to get wrong, and it takes time to design, write, and test. The LINQ statement took me only seconds.
Test application
Note that generating many prefixes is really slow and might take a minute or two.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace Test
{
    static class Program
    {
        static void Main()
        {
            // Set number of prefixes and calls to not more than 50 to get results
            // printed to the console.
            Console.Write("Generating prefixes");
            List<String> prefixes = Program.GeneratePrefixes(5, 10, 15);
            Console.WriteLine();
            Console.Write("Generating calls");
            List<Call> calls = Program.GenerateCalls(prefixes, 5, 10, 50);
            Console.WriteLine();
            Console.WriteLine("Processing started.");
            Stopwatch stopwatch = new Stopwatch();
            const Int32 minimumPrefixLength = 5;
            stopwatch.Start();
            var groupedPrefixes = prefixes
                .GroupBy(p => p.Substring(0, minimumPrefixLength))
                .ToDictionary(g => g.Key, g => g);
            var result = calls
                .GroupBy(c => groupedPrefixes[c.Number.Substring(0, minimumPrefixLength)]
                    .First(c.Number.StartsWith))
                .Select(g => new Call(g.Key, g.Sum(i => i.Duration)))
                .ToList();
            stopwatch.Stop();
            Console.WriteLine("Processing finished.");
            Console.WriteLine(stopwatch.Elapsed);
            if ((prefixes.Count <= 50) && (calls.Count <= 50))
            {
                Console.WriteLine("Prefixes");
                foreach (String prefix in prefixes.OrderBy(p => p))
                {
                    Console.WriteLine(String.Format("   prefix={0}", prefix));
                }
                Console.WriteLine("Calls");
                foreach (Call call in calls.OrderBy(c => c.Number).ThenBy(c => c.Duration))
                {
                    Console.WriteLine(String.Format("   number={0} duration={1}", call.Number, call.Duration));
                }
                Console.WriteLine("Result");
                foreach (Call call in result.OrderBy(c => c.Number))
                {
                    Console.WriteLine(String.Format("   prefix={0} accumulated duration={1}", call.Number, call.Duration));
                }
            }
            Console.ReadLine();
        }

        private static List<String> GeneratePrefixes(Int32 minimumLength, Int32 maximumLength, Int32 count)
        {
            Random random = new Random();
            List<String> prefixes = new List<String>(count);
            StringBuilder stringBuilder = new StringBuilder(maximumLength);
            while (prefixes.Count < count)
            {
                stringBuilder.Length = 0;
                for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
                {
                    stringBuilder.Append(random.Next(10));
                }
                String prefix = stringBuilder.ToString();
                if (prefixes.Count % 1000 == 0)
                {
                    Console.Write(".");
                }
                // keep the prefix set prefix-free: no prefix may start another
                if (prefixes.All(p => !p.StartsWith(prefix) && !prefix.StartsWith(p)))
                {
                    prefixes.Add(stringBuilder.ToString());
                }
            }
            return prefixes;
        }

        private static List<Call> GenerateCalls(List<String> prefixes, Int32 minimumLength, Int32 maximumLength, Int32 count)
        {
            Random random = new Random();
            List<Call> calls = new List<Call>(count);
            StringBuilder stringBuilder = new StringBuilder();
            while (calls.Count < count)
            {
                stringBuilder.Length = 0;
                stringBuilder.Append(prefixes[random.Next(prefixes.Count)]);
                for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
                {
                    stringBuilder.Append(random.Next(10));
                }
                if (calls.Count % 1000 == 0)
                {
                    Console.Write(".");
                }
                calls.Add(new Call(stringBuilder.ToString(), random.Next(1000)));
            }
            return calls;
        }

        private class Call
        {
            public Call(String number, Decimal duration)
            {
                this.Number = number;
                this.Duration = duration;
            }

            public String Number { get; private set; }
            public Decimal Duration { get; private set; }
        }
    }
}
It sounds to me like you need to build a trie from the carrier prefixes. You'll end up with a single trie, where the terminating nodes tell you the carrier for that prefix.
Then create a dictionary from carrier to an int or long (the total).
Then for each dialed number row, just work your way down the trie until you find the carrier. Find the total number of minutes so far for the carrier, and add the current row - then move on.
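For illustration, a minimal sketch of such a digit trie; the class and member names are placeholders:
// Terminating nodes carry a carrier id; a lookup walks the dialed number
// digit by digit and remembers the deepest prefix seen, so matching costs
// O(prefix length) regardless of how many prefixes are loaded.
class TrieNode
{
    public readonly TrieNode[] Children = new TrieNode[10];
    public int? CarrierId; // set on the node that ends a prefix
}

class PrefixTrie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string prefix, int carrierId)
    {
        var node = _root;
        foreach (char c in prefix)
        {
            int d = c - '0';
            node = node.Children[d] ?? (node.Children[d] = new TrieNode());
        }
        node.CarrierId = carrierId;
    }

    public int? Lookup(string number) // longest-prefix match; null if none
    {
        var node = _root;
        int? best = null;
        foreach (char c in number)
        {
            node = node.Children[c - '0'];
            if (node == null) break;
            if (node.CarrierId.HasValue) best = node.CarrierId;
        }
        return best;
    }
}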
The easiest data structure that would do this fairly efficiently would be a list of sets: make a Set for each carrier containing all of its prefixes.
Now, to associate a call with a carrier:
foreach (Carrier carrier in carriers)
{
    bool found = false;
    for (int length = 1; length <= 7; length++)
    {
        // ExtractDigits is assumed to return the first 'length' digits as an int
        int prefix = ExtractDigits(callNumber, length);
        if (carrier.Prefixes.Contains(prefix))
        {
            carrier.Calls.Add(callNumber);
            found = true;
            break;
        }
    }
    if (found)
        break;
}
If you have 10 carriers, there will be 70 lookups in the set per call. But a lookup in a set isn't too slow (much faster than a linear search). So this should give you quite a big speed up over a brute force linear search.
You can go a step further and group the prefixes for each carrier according to their length. That way, if a carrier has only prefixes of length 7 and 4, you'd know to extract and look up only those two lengths, each time looking in the set of prefixes of that length, as sketched below.
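For illustration, a sketch of that per-length grouping, using string prefixes for simplicity:
// build one HashSet per prefix length that actually occurs for a carrier
static Dictionary<int, HashSet<string>> GroupByLength(IEnumerable<string> prefixes)
{
    return prefixes
        .GroupBy(p => p.Length)
        .ToDictionary(g => g.Key, g => new HashSet<string>(g));
}

// one lookup per distinct prefix length instead of one per possible length
static bool MatchesCarrier(Dictionary<int, HashSet<string>> sets, string number)
{
    foreach (var pair in sets)
    {
        if (number.Length >= pair.Key &&
            pair.Value.Contains(number.Substring(0, pair.Key)))
            return true;
    }
    return false;
}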
How about dumping your data into a couple of database tables and querying them using SQL? Easy!
CREATE TABLE dbo.dialled_numbers ( number VARCHAR(100), minutes INT )
CREATE TABLE dbo.prefixes ( prefix VARCHAR(100) )
-- now populate the tables, create indexes etc
-- and then just run your query...
SELECT p.prefix,
       SUM(n.minutes) AS total_minutes
FROM dbo.dialled_numbers AS n
INNER JOIN dbo.prefixes AS p
    ON n.number LIKE p.prefix + '%'
GROUP BY p.prefix
(This was written for SQL Server, but should be very simple to translate for any other DBMS.)
Maybe it would be simpler (not necessarily more efficient) to do it in a database instead of C#.
You could insert the rows on the database and on insert determine the carrier and include it in the record (maybe in an insert trigger).
Then your report would be a sum query on the table.
I would probably just put the entries in a List, sort it, then use a binary search to look for matches. Tailor the binary search match criteria to return the first item that matches then iterate along the list until you find one that doesn't match. A binary search takes only around 15 comparisons to search a list of 30,000 items.
You may want to use a Hashtable in C#.
This way you have key-value pairs; your keys could be the phone numbers, and your values the total minutes. If a match is found in the key set, modify the total minutes; else, add a new key.
You would then just need to modify your searching algorithm to look not at the entire key, but only at the first 7 digits of it.
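For illustration, a sketch of that truncation idea: keep the prefixes in a HashSet<string> and try the longest candidate length first, shortening until a key matches:
// returns the longest prefix the number starts with, or null when none matches
static string FindLongestPrefix(HashSet<string> prefixes, string number, int maxPrefixLength)
{
    for (int len = Math.Min(maxPrefixLength, number.Length); len >= 1; len--)
    {
        string candidate = number.Substring(0, len);
        if (prefixes.Contains(candidate))
            return candidate;
    }
    return null;
}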