Faster Update of Dictionary values [closed] - c#

My dictionary is
Dictionary<string, string> d = new Dictionary<string, string>();
I'm iterating through an XML file (very large) and saving key/value pairs in a dictionary.
The following snippet of code is very slow to execute, and I want to make it faster. It takes more than an hour to complete; by the end my ctr value reaches 3,332,130.
if (d.ContainsKey(dKey))
{
    dValue = d[dKey];
    d[dKey] = dValue + "," + ctr;
}
else
{
    d.Add(dKey, ctr.ToString());
}
ctr++;

3,332,130 entries is a lot to hold in memory; ideally you should not keep such a big collection in memory at all.
That said, let's try to optimize this.
Dictionary<string, StringBuilder> d = new Dictionary<string, StringBuilder>();

StringBuilder builder;
if (d.TryGetValue(dKey, out builder))
{
    builder.Append(",");
    builder.Append(ctr);
}
else
{
    d.Add(dKey, new StringBuilder(ctr.ToString()));
}
String concatenation in a tight loop is awfully slow; use a StringBuilder instead.
Use TryGetValue, which saves you the separate lookup dValue = d[dKey].
I believe this should increase performance significantly.

Performing a number of repeated concatenations (not known at compile time) on large strings is an inherently wasteful thing to do. If you end up concatenating a lot of values together, and they are not particularly small, that could easily be the source of your problem.
If so, it would have nothing at all to do with the dictionary. You should consider using a StringBuilder, or building up a collection of separate strings that you can join using string.Join once you have all of the strings you'll need for that value.
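For illustration, a rough sketch of the string.Join approach, reusing the dKey and ctr variables from the question's snippet; the join happens only once per key, after the whole file has been read:
var parts = new Dictionary<string, List<string>>();

// Inside the XML loop: just collect the counter values per key.
List<string> values;
if (!parts.TryGetValue(dKey, out values))
{
    values = new List<string>();
    parts.Add(dKey, values);
}
values.Add(ctr.ToString());
ctr++;

// After the loop: join each key's values once.
var d = new Dictionary<string, string>();
foreach (var pair in parts)
{
    d.Add(pair.Key, string.Join(",", pair.Value));
}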

You may want to consider using StringBuilders instead of strings:
var d = new Dictionary<string, StringBuilder>();
And append the values like this:
if (d.ContainsKey(dKey))
{
    d[dKey].Append("," + ctr);
}
else
{
    d.Add(dKey, new StringBuilder(ctr.ToString()));
}
++ctr;
But I suspect that the bottleneck is in fact somewhere else.

In addition to the string concatenation enhancements, you can also split your XML into several data sets and then populate a ConcurrentDictionary with them in parallel. Depending on your data and the framework you are using, the performance could improve several times over.
More examples here and here
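As a rough sketch of the parallel idea (not exact code for this XML file): assume the key/counter pairs have already been extracted into a collection named entries of (string Key, long Counter) tuples, one per element; entries is a hypothetical name used only for this example. Note that the order of counters within a key is no longer guaranteed:
using System.Collections.Concurrent; // ConcurrentDictionary, ConcurrentQueue
using System.Threading.Tasks;        // Parallel.ForEach

var d = new ConcurrentDictionary<string, ConcurrentQueue<long>>();

Parallel.ForEach(entries, entry =>
{
    // GetOrAdd and Enqueue are both thread-safe.
    var queue = d.GetOrAdd(entry.Key, _ => new ConcurrentQueue<long>());
    queue.Enqueue(entry.Counter);
});

// Afterwards, build the final comma-separated strings, e.g.:
// string joined = string.Join(",", d[someKey]);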

Related

Save large list of ints in memory for fast access [closed]

First, a little background: I enjoy working on Project Euler problems (https://projecteuler.net/archives), but many of them require a ton of heavy computation, so I try to save known constants in memory so they don't have to be recalculated every time. These include things like n!, nPr, nCr, and lists of primes. For the purpose of this question let's just stick with primes, because any solution for those can easily be ported to the others.
The question: Let's say I want to save the first 1,000,000 primes in memory for repeated access while doing heavy computation. The 1,000,000th prime is 15,485,863, so ints will do just fine here. I need to save these values in a way such that access is O(1), because they will be accessed a lot.
What I've tried so far:
Clearly I can't put all 1,000,000 in one .cs file because Visual Studio throws a fit, so I've been trying to break them into multiple files using a partial class and a 2-D List<List<int>>:
public partial class Primes
{
    // static so the combined list below can reference it in its initializer
    public static readonly List<int> _primes_1 = new List<int>
    {
        2, 3, ... 999983
    };
}
So _primes_1 has the primes below 1,000,000, _primes_2 has the primes between 1,000,000 and 2,000,000, etc., 15 files' worth. Then I put them together:
public partial class Primes
{
    public static readonly List<List<int>> _primes = new List<List<int>>()
    {
        _primes_1, _primes_2, _primes_3, _primes_4, _primes_5,
        _primes_6, _primes_7, _primes_8, _primes_9, _primes_10,
        _primes_11, _primes_12, _primes_13, _primes_14, _primes_15
    };
}
This methodology does work, as it is easy to enumerate through the list, and IsPrime(n) checks are fairly simple as well (binary search). The big downfall of this methodology is that VS starts to freak out because each file has ~75,000 ints in it (~8,000 lines depending on spacing). In fact, much of my editing of these files has to be done in Notepad++ just to keep VS from hanging/crashing.
Other things I've considered:
I originally read the numbers in from a text file and could do that in the program, but clearly I would want to do that at startup and then just have the values available. I also considered dumping them into SQL, but again, eventually they need to be in memory. For the in-memory storage I considered memcache, but I don't know enough about it to know how efficient its lookups are.
In the end, this comes down to two questions:
How do the numbers get in to memory to begin with?
What mechanism is used to store them?
Spending a little more time in spin up is fine (within reason) as long as the lookup mechanism is fast fast fast.
Quick note: Yes I know that if I only do 15 pages as shown then I won't have all 1,000,000 because 15,485,863 is on page 16. That's fine, for our purposes here this is good enough.
Bring them in from a single text file at startup. This data shouldn't be in source files (as you are discovering).
Store them in a HashSet<int>, so for any number n, isPrime = n => primeHashSet.Contains(n). This will give you your desired O(1) complexity.
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

HashSet<int> primeHashSet = new HashSet<int>(
    File.ReadLines(filePath)
        .AsParallel() //maybe?
        .SelectMany(line => Regex.Matches(line, @"\d+").Cast<Match>())
        .Select(m => m.Value)
        .Select(int.Parse));

Predicate<int> isPrime = primeHashSet.Contains;
bool someNumIsPrime = isPrime(5000); //for example
On my (admittedly fairly snappy) machine, this loads in about 300ms.

Isn't SortedList supposed to be slower than Dictionary? [closed]

Right now I am picking up C#, and I am trying to figure out possible performance issues with long lists. Therefore I wanted to have some figures on how the speed differs when I am using Dictionaries or SortedLists.
Which is why I came up with this example
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Diagnostics;

namespace dictionary_vs_sortedList
{
    class Program
    {
        static void Main(string[] args)
        {
            long search = 20000;
            SortedList<long, string> myList = new SortedList<long, string>();
            Dictionary<long, string> myDict = new Dictionary<long, string>();

            for (long i = 0; i < 10 * search; i++)
            {
                myDict.Add(i, "hi");
                myList.Add(i, "hey");
            }

            var watch1 = Stopwatch.StartNew();
            if (myDict.ContainsKey(search))
            {
                Console.WriteLine("is included");
            }
            watch1.Stop();
            TimeSpan ts1 = watch1.Elapsed;
            Console.WriteLine(ts1);

            var watch2 = Stopwatch.StartNew();
            if (myList.ContainsKey(search))
            {
                Console.WriteLine("is also included");
            }
            watch2.Stop();
            TimeSpan ts2 = watch2.Elapsed;
            Console.WriteLine(ts2);
        }
    }
}
The output of this always gives me a longer running time for the Dictionary than for the SortedList. I thought ContainsKey() runs in O(1) for a Dictionary and in O(n) for a SortedList.
Edit:
Thanks for the quick help.
I wasn't paying attention, as to how my measurement may be compromised by lines of code I do not want to measure.
This Question may now be closed.
You should not do just one run to test the performance of an operation. I put a for loop with 1000 iterations around each check (and removed the Console.WriteLine) and got the following results:
Dictionary 00:00:00.0000286
SortedList 00:00:00.0056493
So the result is obvious.
There are several things wrong with your time measurement:
First, before you start measuring something, you have to execute that part of the code at least once. The simple reason is that before code is executed it must be compiled by the JIT compiler, and that takes time. You do not want that compile time to be part of your measurement.
Second, the O(n) and O(1) figures assume a reasonably balanced filling. Furthermore, a search for one element can result in a different measurement than a search for another element in the same collection. I can think of situations where a dictionary performs very slowly and a SortedList very fast. So you should do multiple searches (1,000? 100,000?) with different (random?) values. Doing so will give a more reliable result.
Third, remove as much code from the measured section as possible. Console.WriteLine is a rather slow operation and will have a very negative impact on your measurement.
* EDIT *
The SortedList lookup is an O(log n) operation, not O(n), but that is beside the point.
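Putting those points together, a rough sketch of how the measurement could look, reusing the myDict, myList and search variables from the question's program (the iteration count and seed are arbitrary):
const int iterations = 100000;

// Warm-up, so JIT compilation is not part of the measurement:
myDict.ContainsKey(search);
myList.ContainsKey(search);

var rnd = new Random(12345);
var watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    myDict.ContainsKey(rnd.Next(0, (int)(10 * search)));
}
watch.Stop();
Console.WriteLine("Dictionary: " + watch.Elapsed);

rnd = new Random(12345); // same key sequence for both collections
watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    myList.ContainsKey(rnd.Next(0, (int)(10 * search)));
}
watch.Stop();
Console.WriteLine("SortedList: " + watch.Elapsed);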

List vs Dictionary when referring by index/key [closed]

So I have mainly been using lists to retrieve small amounts of data from a database, which feeds into a web application, but I have recently come across dictionaries, which produce more readable code with keys. What is the performance difference when just referring to an item by index/key?
I understand that a dictionary uses more memory but what is best practice in this scenario and is it worth the performance/maintenance trade-off bearing in mind that I will not be performing searches or sorting the data?
When you want to find an item in a List, you may have to look at ALL the items until you find the one with the matching key.
Let's look at a basic example. You have
class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
}
and you have a collection List<Person> persons, and you want to find a person by ID:
var person = persons.FirstOrDefault(x => x.ID == 5);
As written it has to enumerate the entire List until it finds the entry in the List that has the correct ID (does entry 0 match the lambda? No... Does entry 1 match the lambda? No... etc etc). This is O(n).
However, if you look it up in a Dictionary<int, Person> dictPersons:
var person = dictPersons[5];
When you look up an element by key in a dictionary, it can jump straight to where the element is stored - this is O(1) per lookup (so O(n) if you do it once for every person). If you want to know how this is done: the Dictionary runs a hash function on the key, which turns it into a position inside the dictionary's internal storage, the same position used when the value was inserted.
So a Dictionary is faster than a List because the Dictionary does not iterate through the whole collection; it takes the item from the exact place the hash function calculates. It is a better algorithm.
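For illustration, a minimal sketch of both lookups side by side, assuming the Person class and persons list from above (ToDictionary comes from System.Linq):
using System.Collections.Generic;
using System.Linq;

// Build the dictionary once, keyed by ID:
Dictionary<int, Person> dictPersons = persons.ToDictionary(p => p.ID);

Person byList = persons.FirstOrDefault(x => x.ID == 5); // scans the list: O(n)
Person byDict = dictPersons[5];                         // hash lookup: O(1)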
Dictionary relies on chaining (maintaining a list of items for each hash table bucket) to resolve collisions, whereas Hashtable uses rehashing for collision resolution (when a collision occurs, it tries another hash function to map the key to a bucket). You can read up on how hash functions work and the difference between chaining and rehashing.
Unless you're actually experiencing performance issues and need to optimize it's better to go with what's more readable and maintainable. That's especially true since you mentioned that it's small amounts of data. Without exaggerating - it's possible that over the life of the application the cumulative difference in performance (if any) won't equal the time you save by making your code more readable.
To put it in perspective, consider the work that your application already does just to read request headers and parse views and read values from configuration files. Not only will the difference in performance between the list and the dictionary be small, it will also be a tiny fraction of the overall processing your application does just to serve a single page request.
And even then, if you were to see performance issues and needed to optimize, there would probably be plenty of other optimizations (like caching) that would make a bigger difference.

Best data structure for collection strings in c# [closed]

I have a huge collection of strings. I frequently need to find all the strings that start with a given character. What would be the best collection for doing this? I will initialize the collection in sorted order.
Thanks
If you want a map from a character to all strings starting with that character, you might find ILookup<TKey, TElement> suitable. It's very similar to a Dictionary<TKey, TValue>, with two main differences:
Instead of a 1:1 mapping, it performs a 1:n mapping (i.e. there can be more than one value per key).
You cannot instantiate (new) nor populate it (.Add(…)) yourself; instead, you let .NET derive a fully populated instance from another collection by calling .ToLookup(…) on the latter.
Here's an example of how to build such a 1:n map:
using System;                     // for Console
using System.Collections.Generic; // for List<T>
using System.Linq;                // for ILookup<TKey, TElement> and .ToLookup(…)

// This represents the source of your strings. It doesn't have to be sorted:
var strings = new List<string>() { "Foo", "Bar", "Baz", "Quux", … };

// This is how you would build a 1:n lookup table mapping from first characters
// to all strings starting with that character. Empty strings are excluded:
ILookup<char, string> stringsByFirstCharacter =
    strings.Where(str => !string.IsNullOrEmpty(str)) // exclude empty strings
           .ToLookup(str => str[0]);                 // key := first character

// This is how you would look up all strings starting with B.
// The output will be Bar and Baz:
foreach (string str in stringsByFirstCharacter['B'])
{
    Console.WriteLine(str);
}
P.S.: The above hyperlink for ILookup<…> (the interface) refers you to the help page for Lookup<…> (the implementation class). This is on purpose, as I find the documentation for the class easier to read. I would, however, recommend using the interface in your code.
If you need to search a huge collection of strings regularly, then use a hash table. Remember to choose a hash function that distributes the keys evenly across the table to speed up lookups.
Well, so you need to create an index keyed on a function of the string.
For this I'd suggest using a
Dictionary<string, List<string>> data structure.
ToLookup isn't as good because it limits your ability to manipulate the data structure.
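A minimal sketch of such a mutable index, with the caveat that it is keyed by the first character (char) rather than by string, since that is what the question asks for; the strings variable is the same example list used in the ILookup answer above:
using System.Collections.Generic;

var index = new Dictionary<char, List<string>>();

foreach (string s in strings) // e.g. { "Foo", "Bar", "Baz", "Quux", ... }
{
    if (string.IsNullOrEmpty(s)) continue;

    List<string> bucket;
    if (!index.TryGetValue(s[0], out bucket))
    {
        bucket = new List<string>();
        index.Add(s[0], bucket);
    }
    bucket.Add(s); // unlike ILookup, the buckets stay editable afterwards
}

// All strings starting with 'B':
// List<string> startsWithB = index.ContainsKey('B') ? index['B'] : new List<string>();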

Use the += operator to concat strings [duplicate]

Possible Duplicate:
String vs StringBuilder
Why should I not use += to concat strings?
What is the quickest alternative?
Strings are immutable in .NET, which means that once they exist, they cannot be changed.
The StringBuilder is designed to mitigate this issue by allowing you to append to a pre-allocated character array of n size (the default is 16, I think). However, once the StringBuilder exceeds the specified capacity, it needs to allocate a bigger copy of itself and copy the content into it, thus creating a possibly bigger problem.
What this boils down to is premature optimization. Unless you're noticing issues with string concatenations using too much memory, worrying about it is useless.
+= and String1 = String1 + String2 do the same thing: they copy both whole strings into a new one.
If you do this in a loop, lots of memory allocations are generated, leading to poor performance.
If you want to build long strings, you should look into the StringBuilder class, which is optimized for such operations.
In short: a few string concatenations shouldn't hurt performance much, but building a large string by adding small bits in a loop will slow you down a lot and/or use lots of memory.
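For illustration, a small sketch contrasting the two approaches (the loop count is arbitrary; StringBuilder lives in System.Text):
using System.Text;

// Both loops build "0,1,2,...", but the first one copies the whole
// string so far on every pass (roughly O(n^2) work overall):
string slow = "";
for (int i = 0; i < 100000; i++)
{
    slow += i + ",";
}

// The StringBuilder appends into a growing buffer instead (roughly O(n)):
var sb = new StringBuilder();
for (int i = 0; i < 100000; i++)
{
    sb.Append(i).Append(',');
}
string fast = sb.ToString();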
Another interesting article on String performance: http://www.codeproject.com/Articles/3377/Strings-UNDOCUMENTED
