I need help in finding proper algorithms to solve my goal.
Let's say I have a dataset with 10000 records about some events. I have 50 event types so each record in my dataset is assigned a number of event (from 1 to 50).
Example of my dataset (2 columns: Record number, event number):
1. 13
2. 24
3. 6
4. 50
5. 24
6. 6
...
10000. 46
As you can see in this example, I have one repetitive sequence of numbers 24, 6. Now I would like to find out how many of these and also other unknown sequences are there in my dataset. I would also like to know multiplicity of each sequence. I have checked Rabin–Karp algorithm but it seems to me, that I have to specify the pattern / sequence first. However I would like that algorithm to find it on its own.
I was told to look also on hierarchical clustering, but I am not sure if it fits my requirements.
To sum up, I would like to find algorithm that will find all repetitive sequences with their multiplicity in a dataset like above.
I assumed you have this data in a text file with the same structure you provided,
I used the LINQ to group and count each value as shown in following code:
static void Main(string[] args)
{
//read lines from the text file
var arr = File.ReadAllLines("dataset.txt").AsQueryable();
//convert the data to List<object> by convert each line to anonymous object
var data = arr.Select(line => new { Index = line.Split('.')[0], Value = line.Split('.')[1] });
//group the data by the value and then select the value and its count
var res = data.GroupBy(item => item.Value).Select(group => new { Value = group.First().Value, Count = group.Count() });
//printing result
Console.WriteLine("Value\t\tCount");
foreach (var item in res)
{
Console.WriteLine("{0}\t\t{1}",item.Value,item.Count);
}
Console.ReadLine();
}
The result of previous code
hope that will help you.
Related
I need to implement the following SQL in C# Linq:
SELECT NTILE (3) OVER (ORDER BY TransactionCount DESC) AS A...
I couldn't find any answer to a similar problem except this. However I don't think that is what I am looking for.
I don't even know where to start, if anyone could please give me at least a starting point I'd appreciated.
-- EDIT --
Trying to explain a little better.
I have one Store X with Transactions, Items, Units and other data that I retrieve from SQL and store in a object in C#.
I have a list of all stores with the same data but in this case I retrieve it from Analysis Services due to the large amount of data retrieved (and other reasons) and I store all of it in another object in C#.
So what I need is to order the list and find out if store X is in the top quartile of that list or second or third...
I hope that helps to clarify what I am trying to achieve.
Thank you
I believe that there is no simple LINQ equivalent of NTILE(n). Anyway, depending on your needs it's not that hard to write one.
The T-SQL documentation says
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
(see here)
For a very crude implementation of NTILE you can use GroupBy. The following example uses an int[] for sake of simplicity, but of course you are not restricted to
int n = 4;
int[] data = { 5, 2, 8, 2, 3, 8, 3, 2, 9, 5 };
var ntile = data.OrderBy(value => value)
.Select((value,index) => new {Value = value, Index = index})
.GroupBy(c => Math.Floor(c.Index / (data.Count() / (double)n)), c => c.Value);
First, our data is ordered ascending by it's values. If you are not using simple ints this could be something like store => store.Revenue (given you'd like to get the quantiles by revenue of the stores). Futhermore we are selecting the ordered data to an anonymous type, to include the indices. This is necessary since the indices are necessary for grouping, but it seems as GroupBy does not support lambdas with indices, as Select does.
The third line is a bit less intuitive, but I'll try and explain: The NTILE function assigns groups, the rows are assigned to. To create n groups, we devide N (number of items) by n to get the items per group and then device the current index by that, to determine in which group the current item is. To get the number of groups right I had to make the number of items per group fractional and floor the calculated group number, but admittedly, this is rather empirical.
ntile will contain n groups, each one having Key equal to the group number. Each group is enumerable. If you'd like to determine, if an element is in the second quartile, you can check if groups.Where(g => g.Key == 1) contains the element.
Remarks: The method I've used to determine the group may need some fine adjustment.
You can do it using GroupBy function by grouping based on index of the object. Consider a list of integers like this:-
List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
You can first project the Index of all elements using Select and finally group by their resp. index. While calculating the Index we can divide it by NTILE value (3 in this case):-
var result = numbers.Select((v, i) => new { Value = v, Index = i / 3 })
.GroupBy(x => x.Index)
.Select(x => x.Select(z => z.Value).ToList());
Fiddle.
EDIT Question originally: Count and total repeating occurrences of a number in a specific array position - i.e. how many 0's in array[34].
Changed to Count and total repeating occurrences of a unique values in DataGridView column
This is because I feel I asked the wrong question and the new one gives a more accurate sense of what I was looking for
This is pretty much exactly what I wanted: http://www.codeproject.com/Tips/421557/Counting-Unique-Products-in-a-DataGridView-and-Dis
I obviously could have described the question much better.
I have a byte array populated with a random-access file, I would like to count the amount of times, let's say, 0 or 127 occurs in a certain position. At the moment I can count the amount of successful matches to several numbers but the problem is totalling it. I have tried this: (unsurprisingly does not work)
if (array[34] == 0)
{
Label_a.Text = "abc";
}
int res_a = Regex.Matches(array[34].ToString(), "0").Count;
If I create a MessageBox with res_a, it recognises if it's a 0 or not by displaying a set message. But I would like to count and total the amount of occurrences in the background where the data totalled is going to be populated into a chart. General layout of current code:
// FileStream
// BinaryReader
// While length of file is larger than 0
// FileStream.Seek
// Foreach loop for array
// GET COUNT AND TOTAL HERE!
For example, I import a file with 4 records with different status' where status code 127 is repeated 3 times and 0 is just once. I want to get THIS information - how many times it has occurred!
Another example, this file has 12 records where status code 127 is repeated 6 times and 0 is 6 too:-
Based on your comment to L.B., I understand that you want to count the number of occurences of each "status" value, so I would try something like this:
int[] vals = new int[10] { 1, 1, 2, 2, 3, 3, 3, 4, 4, 1 };
var res = vals.GroupBy(v => v).ToDictionary(g => g.Key, g => g.Count());
res now holds a Dictionary where each KeyValuePair has a Key representing a Status code, and a Value representing the number of time this Status code is in the array.
You can now test a Status code's number of appearance like this:
int Code1;
res.TryGetValue(1, out Code1); // testing for Status code 1
And so on.
Cheers
Couldn't you do something like this?
var countOf0s = array.Count(x => x == 0);
IMPORTANT NOTE
To the people who flagged this as a duplicate, please understand we do NOT want a LINQ-based solution. Our real-world example has several original lists in the tens-of-thousands range and LINQ-based solutions are not performant enough for our needs since they have to walk the lists several times to perform their function, expanding with each new source list.
That is why we are specifically looking for a non-LINQ algorithm, such as the one suggested in this answer below where they walk all lists simultaneously, and only once, via enumerators. That seems to be the best so far, but I am wondering if there are others.
Now back to the question...
For the sake of explaining our issue, consider this hypothetical problem:
I have multiple lists, but to keep this example simple, let's limit it to two, ListA and ListB, both of which are of type List<int>. Their data is as follows:
List A List B
1 2
2 3
4 4
5 6
6 8
8 9
9 10
...however the real lists can have tens of thousands of rows.
We next have a class called ListPairing that's simply defined as follows:
public class ListPairing
{
public int? ASide{ get; set; }
public int? BSide{ get; set; }
}
where each 'side' parameter really represents one of the lists. (i.e. if there were four lists, it would also have a CSide and a DSide.)
We are trying to do is construct a List<ListPairing> with the data initialized as follows:
A Side B Side
1 -
2 2
- 3
4 4
5 -
6 6
8 8
9 9
- 10
Again, note there is no row with '7'
As you can see, the results look like a full outer join. However, please see the update below.
Now to get things started, we can simply do this...
var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );
Which yields...
A Side B Side
1 -
2 -
4 -
5 -
6 -
8 -
9 -
and now we want to go back-fill the values from List B. This requires checking first if there is an already existing ListPairing with ASide that matches BSide and if so, setting the BSide.
If there is no existing ListPairing with a matching ASide, a new ListPairing is instantiated with only the BSide set (ASide is blank.)
However, I get the feeling that's not the most efficient way to do this considering all of the required 'FindFirst' calls it would take. (These lists can be tens of thousands of items long.)
However, taking a union of those lists once up front yields the following values...
1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)
My thinking was to somehow use that ordered union of the values, then 'walking' both lists simultaneously, building up ListPairings as needed. That eliminates repeated calls to FindFirst, but I'm wondering if that's the most efficient way to do this.
Thoughts?
Update
People have suggested this is a duplicate of getting a full outer join using LINQ because the results are the same...
I am not after a LINQ full outer join. I'm after a performant algorithm.
As such, I have updated the question.
The reason I bring this up is the LINQ needed to perform that functionality is much too slow for our needs. In our model, there are actually four lists, and each can be in the tens of thousands of rows. That's why I suggested the 'Union' approach of the IDs at the very end to get the list of unique 'keys' to walk through, but I think the posted answer on doing the same but with the enumerators is an even better approach as you don't need the list of IDs up front. This would yield a single pass through all items in the lists simultaneously which would easily out-perform the LINQ-based approach.
This didn't turn out as neat as I'd hoped, but if both input lists are sorted then you can just walk through them together comparing the head elements of each one: if they're equal then you have a pair, else emit the smallest one on its own and advance that list.
public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
IEnumerable<int> sortedBList)
{
// Should wrap these two in using() per Servy's comment with braces around
// the rest of the method.
var aEnum = sortedAList.GetEnumerator();
var bEnum = sortedBList.GetEnumerator();
bool haveA = aEnum.MoveNext();
bool haveB = bEnum.MoveNext();
while (haveA && haveB)
{
// We still have values left on both lists.
int comparison = aEnum.Current.CompareTo(bEnum.Current);
if (comparison < 0)
{
// The heads of the two remaining sequences do not match and A's is
// lower. Generate a partial pair with the head of A and advance the
// enumerator.
yield return new ListPairing() {ASide = aEnum.Current};
haveA = aEnum.MoveNext();
}
else if (comparison == 0)
{
// The heads of the two sequences match. Generate a pair.
yield return new ListPairing() {
ASide = aEnum.Current,
BSide = bEnum.Current
};
// Advance both enumerators
haveA = aEnum.MoveNext();
haveB = bEnum.MoveNext();
}
else
{
// No match and B is the lowest. Generate a partial pair with B.
yield return new ListPairing() {BSide = bEnum.Current};
// and advance the enumerator
haveB = bEnum.MoveNext();
}
}
if (haveA)
{
// We still have elements on list A but list B is exhausted.
do
{
// Generate a partial pair for all remaining A elements.
yield return new ListPairing() { ASide = aEnum.Current };
} while (aEnum.MoveNext());
}
else if (haveB)
{
// List A is exhausted but we still have elements on list B.
do
{
// Generate a partial pair for all remaining B elements.
yield return new ListPairing() { BSide = bEnum.Current };
} while (bEnum.MoveNext());
}
}
var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};
var left = from i in list1
join k in list2 on i equals k
into temp
from k in temp.DefaultIfEmpty()
select new {a = i, b = (i == k) ? k : (int?)null};
var right = from k in list2
join i in list1 on k equals i
into temp
from i in temp.DefaultIfEmpty()
select new {a = (i == k) ? i : (int?)i , b = k};
var result = left.Union(right);
If you need the ordering to be same as your example, then you will need to provide an index and order by that (then remove duplicates)
var result = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);
result.Select( o => new {o.a, o.b}).Distinct();
I have a scenario at work where we have several different tables of data in a format similar to the following:
Table Name: HingeArms
Hght Part #1 Part #2
33 S-HG-088-00 S-HG-089-00
41 S-HG-084-00 S-HG-085-00
49 S-HG-033-00 S-HG-036-00
57 S-HG-034-00 S-HG-037-00
Where the first column (and possibly more) contains numeric data sorted ascending and represents a range to determine the proper record of data to get (e.g. height <= 33 then Part 1 = S-HG-088-00, height <= 41 then Part 1 = S-HG-084-00, etc.)
I need to lookup and select the nearest match given a specified value. For example, given a height = 34.25, I need to get second record in the set above:
41 S-HG-084-00 S-HG-085-00
These tables are currently stored in a VB.NET Hashtable "cache" of data loaded from a CSV file, where the key for the Hashtable is a composite of the table name and one or more columns from the table that represent the "key" for the record. For example, for the above table, the Hashtable Add for the first record would be:
ht.Add("HingeArms,33","S-HG-088-00,S-HG-089-00")
This seems less than optimal and I have some flexibility to change the structure if necessary (the cache contains data from other tables where direct lookup is possible... these "range" tables just got dumped in because it was "easy"). I was looking for a "Next" method on a Hashtable/Dictionary to give me the closest matching record in the range, but that's obviously not available on the stock classes in VB.NET.
Any ideas on a way to do what I'm looking for with a Hashtable or in a different structure? It needs to be performant as the lookup will get called often in different sections of code. Any thoughts would be greatly appreciated. Thanks.
A hashtable is not a good data structure for this, because items are scattered around the internal array according to their hash code, not their values.
Use a sorted array or List<T> and perform a binary search, e.g.
Setup:
var values = new List<HingeArm>
{
new HingeArm(33, "S-HG-088-00", "S-HG-089-00"),
new HingeArm(41, "S-HG-084-00", "S-HG-085-00"),
new HingeArm(49, "S-HG-033-00", "S-HG-036-00"),
new HingeArm(57, "S-HG-034-00", "S-HG-037-00"),
};
values.Sort((x, y) => x.Height.CompareTo(y.Height));
var keys = values.Select(x => x.Height).ToList();
Lookup:
var index = keys.BinarySearch(34.25);
if (index < 0)
{
index = ~index;
}
var result = values[index];
// result == { Height = 41, Part1 = "S-HG-084-00", Part2 = "S-HG-085-00" }
You can use a sorted .NET array in combination with Array.BinarySearch().
If you get a non negative value this is the index of exact match.
Otherwise, if result is negative use formula
int index = ~Array.BinarySearch(sortedArray, value) - 1
to get index of previous "nearest" match.
The meaning of nearest is defined by a comparer you use. It must be the same you used when sorting the array. See:
http://gmamaladze.wordpress.com/2011/07/22/back-to-the-roots-net-binary-search-and-the-meaning-of-the-negative-number-of-the-array-binarysearch-return-value/
How about LINQ-to-Objects (This is by no means meant to be a performant solution, btw.)
var ht = new Dictionary<string, string>();
ht.Add("HingeArms,33", "S-HG-088-00,S-HG-089-00");
decimal wantedHeight = 34.25m;
var foundIt =
ht.Select(x => new { Height = decimal.Parse(x.Key.Split(',')[1]), x.Key, x.Value }).Where(
x => x.Height < wantedHeight).OrderBy(x => x.Height).SingleOrDefault();
if (foundIt != null)
{
// Do Something with your item in foundIt
}
Hey everyone, great community you got here. I'm an Electrical Engineer doing some "programming" work on the side to help pay for bills. I say this because I want you to take into consideration that I don't have proper Computer Science training, but I have been coding for the past 7 years.
I have several excel tables with information (all numeric), basically it is "dialed phone numbers" in one column and number of minutes to each of those numbers on another. Separately I have a list of "carrier prefix code numbers" for the different carriers in my country. What I want to do is separate all the "traffic" per carrier. Here is the scenario:
First dialed number row: 123456789ABCD,100 <-- That would be a 13 digit phone number and 100 minutes.
I have a list of 12,000+ prefix codes for carrier 1, these codes vary in length, and I need to check everyone of them:
Prefix Code 1: 1234567 <-- this code is 7 digits long.
I need to check the first 7 digits for the dialed number an compare it to the dialed number, if a match is found, I would add the number of minutes to a subtotal for later use. Please consider that not all prefix codes are the same length, some times they are shorter or longer.
Most of this should be a piece of cake, and I could should be able to do it, but I'm getting kind of scared with the massive amount of data; Some times the dialed number lists consists of up to 30,000 numbers, and the "carrier prefix code" lists around 13,000 rows long, and I usually check 3 carriers, that means I have to do a lot of "matches".
Does anyone have an idea of how to do this efficiently using C#? Or any other language to be kind honest. I need to do this quite often and designing a tool to do it would make much more sense. I need a good perspective from someone that does have that "Computer Scientist" background.
Lists don't need to be in excel worksheets, I can export to csv file and work from there, I don't need an "MS Office" interface.
Thanks for your help.
Update:
Thank you all for your time on answering my question. I guess in my ignorance I over exaggerated the word "efficient". I don't perform this task every few seconds. It's something I have to do once per day and I hate to do with with Excel and VLOOKUPs, etc.
I've learned about new concepts from you guys and I hope I can build a solution(s) using your ideas.
UPDATE
You can do a simple trick - group the prefixes by their first digits into a dictionary and match the numbers only against the correct subset. I tested it with the following two LINQ statements assuming every prefix has at least three digis.
const Int32 minimumPrefixLength = 3;
var groupedPefixes = prefixes
.GroupBy(p => p.Substring(0, minimumPrefixLength))
.ToDictionary(g => g.Key, g => g);
var numberPrefixes = numbers
.Select(n => groupedPefixes[n.Substring(0, minimumPrefixLength)]
.First(n.StartsWith))
.ToList();
So how fast is this? 15.000 prefixes and 50.000 numbers took less than 250 milliseconds. Fast enough for two lines of code?
Note that the performance heavily depends on the minimum prefix length (MPL), hence on the number of prefix groups you can construct.
MPL Runtime
-----------------
1 10.198 ms
2 1.179 ms
3 205 ms
4 130 ms
5 107 ms
Just to give an rough idea - I did just one run and have a lot of other stuff going on.
Original answer
I wouldn't care much about performance - an average desktop pc can quiete easily deal with database tables with 100 million rows. Maybe it takes five minutes but I assume you don't want to perform the task every other second.
I just made a test. I generated a list with 15.000 unique prefixes with 5 to 10 digits. From this prefixes I generated 50.000 numbers with a prefix and additional 5 to 10 digits.
List<String> prefixes = GeneratePrefixes();
List<String> numbers = GenerateNumbers(prefixes);
Then I used the following LINQ to Object query to find the prefix of each number.
var numberPrefixes = numbers.Select(n => prefixes.First(n.StartsWith)).ToList();
Well, it took about a minute on my Core 2 Duo laptop with 2.0 GHz. So if one minute processing time is acceptable, maybe two or three if you include aggregation, I would not try to optimize anything. Of course, it would be realy nice if the programm could do the task in a second or two, but this will add quite a bit of complexity and many things to get wrong. And it takes time to design, write, and test. The LINQ statement took my only seconds.
Test application
Note that generating many prefixes is really slow and might take a minute or two.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace Test
{
static class Program
{
static void Main()
{
// Set number of prefixes and calls to not more than 50 to get results
// printed to the console.
Console.Write("Generating prefixes");
List<String> prefixes = Program.GeneratePrefixes(5, 10, 15);
Console.WriteLine();
Console.Write("Generating calls");
List<Call> calls = Program.GenerateCalls(prefixes, 5, 10, 50);
Console.WriteLine();
Console.WriteLine("Processing started.");
Stopwatch stopwatch = new Stopwatch();
const Int32 minimumPrefixLength = 5;
stopwatch.Start();
var groupedPefixes = prefixes
.GroupBy(p => p.Substring(0, minimumPrefixLength))
.ToDictionary(g => g.Key, g => g);
var result = calls
.GroupBy(c => groupedPefixes[c.Number.Substring(0, minimumPrefixLength)]
.First(c.Number.StartsWith))
.Select(g => new Call(g.Key, g.Sum(i => i.Duration)))
.ToList();
stopwatch.Stop();
Console.WriteLine("Processing finished.");
Console.WriteLine(stopwatch.Elapsed);
if ((prefixes.Count <= 50) && (calls.Count <= 50))
{
Console.WriteLine("Prefixes");
foreach (String prefix in prefixes.OrderBy(p => p))
{
Console.WriteLine(String.Format(" prefix={0}", prefix));
}
Console.WriteLine("Calls");
foreach (Call call in calls.OrderBy(c => c.Number).ThenBy(c => c.Duration))
{
Console.WriteLine(String.Format(" number={0} duration={1}", call.Number, call.Duration));
}
Console.WriteLine("Result");
foreach (Call call in result.OrderBy(c => c.Number))
{
Console.WriteLine(String.Format(" prefix={0} accumulated duration={1}", call.Number, call.Duration));
}
}
Console.ReadLine();
}
private static List<String> GeneratePrefixes(Int32 minimumLength, Int32 maximumLength, Int32 count)
{
Random random = new Random();
List<String> prefixes = new List<String>(count);
StringBuilder stringBuilder = new StringBuilder(maximumLength);
while (prefixes.Count < count)
{
stringBuilder.Length = 0;
for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
{
stringBuilder.Append(random.Next(10));
}
String prefix = stringBuilder.ToString();
if (prefixes.Count % 1000 == 0)
{
Console.Write(".");
}
if (prefixes.All(p => !p.StartsWith(prefix) && !prefix.StartsWith(p)))
{
prefixes.Add(stringBuilder.ToString());
}
}
return prefixes;
}
private static List<Call> GenerateCalls(List<String> prefixes, Int32 minimumLength, Int32 maximumLength, Int32 count)
{
Random random = new Random();
List<Call> calls = new List<Call>(count);
StringBuilder stringBuilder = new StringBuilder();
while (calls.Count < count)
{
stringBuilder.Length = 0;
stringBuilder.Append(prefixes[random.Next(prefixes.Count)]);
for (int i = 0; i < random.Next(minimumLength, maximumLength + 1); i++)
{
stringBuilder.Append(random.Next(10));
}
if (calls.Count % 1000 == 0)
{
Console.Write(".");
}
calls.Add(new Call(stringBuilder.ToString(), random.Next(1000)));
}
return calls;
}
private class Call
{
public Call (String number, Decimal duration)
{
this.Number = number;
this.Duration = duration;
}
public String Number { get; private set; }
public Decimal Duration { get; private set; }
}
}
}
It sounds to me like you need to build a trie from the carrier prefixes. You'll end up with a single trie, where the terminating nodes tell you the carrier for that prefix.
Then create a dictionary from carrier to an int or long (the total).
Then for each dialed number row, just work your way down the trie until you find the carrier. Find the total number of minutes so far for the carrier, and add the current row - then move on.
The easiest data structure that would do this fairly efficiently would be a list of sets. Make a Set for each carrier to contain all the prefixes.
Now, to associate a call with a carrier:
foreach (Carrier carrier in carriers)
{
bool found = false;
for (int length = 1; length <= 7; length++)
{
int prefix = ExtractDigits(callNumber, length);
if (carrier.Prefixes.Contains(prefix))
{
carrier.Calls.Add(callNumber);
found = true;
break;
}
}
if (found)
break;
}
If you have 10 carriers, there will be 70 lookups in the set per call. But a lookup in a set isn't too slow (much faster than a linear search). So this should give you quite a big speed up over a brute force linear search.
You can go a step further and group the prefixes for each carrier according to the length. That way, if a carrier has only prefixes of length 7 and 4, you'd know to only bother to extract and look up those lengths, each time looking in the set of prefixes of that length.
How about dumping your data into a couple of database tables and then query them using SQL? Easy!
CREATE TABLE dbo.dialled_numbers ( number VARCHAR(100), minutes INT )
CREATE TABLE dbo.prefixes ( prefix VARCHAR(100) )
-- now populate the tables, create indexes etc
-- and then just run your query...
SELECT p.prefix,
SUM(n.minutes) AS total_minutes
FROM dbo.dialled_numbers AS n
INNER JOIN dbo.prefixes AS p
ON n.number LIKE p.prefix + '%'
GROUP BY p.prefix
(This was written for SQL Server, but should be very simple to translate for any other DBMS.)
Maybe it would be simpler (not necessarily more efficient) to do it in a database instead of C#.
You could insert the rows on the database and on insert determine the carrier and include it in the record (maybe in an insert trigger).
Then your report would be a sum query on the table.
I would probably just put the entries in a List, sort it, then use a binary search to look for matches. Tailor the binary search match criteria to return the first item that matches then iterate along the list until you find one that doesn't match. A binary search takes only around 15 comparisons to search a list of 30,000 items.
You may want to use a HashTable in C#.
This way you have key-value pairs, and your keys could be the phone numbers, and your value the total minutes. If a match is found in the key set, then modify the total minutes, else, add a new key.
You would then just need to modify your searching algorithm, to not look at the entire key, but only the first 7 digits of it.