I have an application that receives certain "events", each uniquely identified by a 12-character string and a DateTime. Each event has an associated result, which is a string.
I need to keep these events in memory (for a maximum of, say, 8 hours) and, if I receive the same event a second time, be able to tell that I've already received it (in the last 8 hours).
Events to store will be less than 1000.
I can't use an external storage, it has to be done in memory.
My idea is to use a Dictionary where the key is a class composed of a string and a datetime, and the value is the result.
EDIT: the string itself (actually a MAC address) does not uniquely identify the event; it's the MAC AND the DateTime combined that are unique, which is why the key must be formed from both.
The application is a server that receives a certain event from a client: the event is marked on the client by client MAC and by the client datetime (can't use a guid).
It may happen that the client retransmits the same data, and by checking the dictionary for that MAC/Datetime key I would know that I have already received that data.
Then, every hour (for example), I can foreach through the whole collection and remove all the keys where datetime is older than 8 hours.
Can you suggest a better approach to the problem, or better data structures than the ones I have chosen, in terms of performance and cleanliness of the code?
Or a better way to delete old data, with LINQ for example.
Thanks,
Mattia
The event time should not be part of the key -- if it is, how are you going to tell that you have already received this event? So you should move to a dictionary where the keys are event names and the values are tuples of date and result.
Once in a while you can trim old data from the dictionary easily with LINQ:
dictionary = dictionary
    .Where(p => p.Value.DateOfEvent >= DateTime.Now.AddHours(-8))
    .ToDictionary(p => p.Key, p => p.Value);
If requirements state that updating once per hour is good enough, and you're never having more than 1000 items in the dictionary, your solution should be perfectly adequate and probably the most easily understood by anyone else looking at your code. I'd probably recommend immutable structs for the key instead of classes, but that's it.
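For illustration, an immutable struct key along those lines might look like this (the type name and members are my own, not from the question):

```csharp
using System;

// A sketch of an immutable struct key combining the MAC string and the
// DateTime. Implementing IEquatable<T> avoids boxing on dictionary lookups.
public readonly struct EventKey : IEquatable<EventKey>
{
    public string MacAddress { get; }
    public DateTime Time { get; }

    public EventKey(string macAddress, DateTime time)
    {
        MacAddress = macAddress;
        Time = time;
    }

    public bool Equals(EventKey other) =>
        MacAddress == other.MacAddress && Time == other.Time;

    public override bool Equals(object obj) => obj is EventKey k && Equals(k);

    public override int GetHashCode() =>
        (MacAddress?.GetHashCode() ?? 0) ^ Time.GetHashCode();
}
```

A `Dictionary<EventKey, string>` then gives O(1) duplicate detection without the allocation cost of a class key.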
If there's a benefit to removing them immediately rather than once per hour, you could do something where you also add a Timer that removes it after exactly 8 hours, but then you've got to deal with thread safety and cleaning up all the timers and such. Likely not worth it.
I'd avoid the OrderedDictionary approach since it's more code, and may be slower since it has to reorder with every insert.
It's a common mantra these days to focus first on keeping code simple, only optimize when necessary. Until you have a known bottleneck and have profiled it, you never know if you're even optimizing the right thing. (And from your description, there's no telling which part will be slowest without profiling it).
I would go for a Dictionary.
This way you can search very fast for the string (an O(1) operation).
Other collections are slower:
OrderedDictionary: slow because it needs boxing and unboxing.
SortedDictionary: lookups are an O(log n) operation.
Plain arrays and lists: a linear scan, O(n) on average.
An example:
public class Event
{
    public Event(string macAddress, DateTime time, string data)
    {
        MacAddress = macAddress;
        Time = time;
        Data = data;
    }

    public string MacAddress { get; set; }
    public DateTime Time { get; set; }
    public string Data { get; set; }
}
public class EventCollection
{
    private readonly Dictionary<Tuple<string, DateTime>, Event> _Events =
        new Dictionary<Tuple<string, DateTime>, Event>();

    public void Add(Event e)
    {
        _Events.Add(new Tuple<string, DateTime>(e.MacAddress, e.Time), e);
    }

    public IList<Event> GetOldEvents(bool autoRemove)
    {
        DateTime old = DateTime.Now - TimeSpan.FromHours(8);
        List<Event> results = new List<Event>();
        foreach (Event e in _Events.Values)
            if (e.Time < old)
                results.Add(e);

        // Clean up
        if (autoRemove)
            foreach (Event e in results)
                _Events.Remove(new Tuple<string, DateTime>(e.MacAddress, e.Time));

        return results;
    }
}
I would use an OrderedDictionary where the key is the 12 character identifier and the result and datetime are part of the value. Sadly, OrderedDictionary is not generic (keys and values are objects), so you would need to do the casting and type checking yourself. When you need to remove the old events, you can foreach through the OrderedDictionary and stop when you get to a time new enough to keep. This assumes the datetimes are in order when you add them to the dictionary.
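A rough sketch of that prune-from-the-front idea (EventRecord is an assumed value type, not from the answer; the pattern relies on entries having been added in time order):

```csharp
using System;
using System.Collections.Specialized;

// Assumed value type holding the per-event data.
class EventRecord
{
    public DateTime Time;
    public string Result;
}

static class EventPruner
{
    // Removes expired entries from the front of an insertion-ordered
    // OrderedDictionary, stopping at the first entry new enough to keep.
    public static void PruneOldEvents(OrderedDictionary events, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.Now - maxAge;
        while (events.Count > 0)
        {
            var oldest = (EventRecord)events[0]; // values are object, so cast
            if (oldest.Time >= cutoff)
                break; // every later entry is newer still
            events.RemoveAt(0);
        }
    }
}
```

Because pruning stops at the first fresh entry, each call does work proportional only to the number of expired items, not the whole collection.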
I've got a little problem here and I'd like some help.
My code runs an infinite search over web pages for patterns and, whenever it finds something new, writes it to a file.
However, sometimes the info I'm scavenging is already in the file but hasn't been updated, and I don't want repeated entries in my file.
Therefore, I simply created a List of strings, adding each entry to it, and every time the code finds what it's looking for, it checks whether the string is already in that list before writing to the file.
You can clearly see why this is a bad idea... Since it runs 24/7, this list will grow endlessly. But there is a catch: I'm 100% sure that the info I'm looking for will never repeat once 15 minutes have passed.
So, what I really want is to eliminate items that are on this list for 15 minutes. I just can't think of something simple and/or elegant to do this. Or, I don't know if there is some data structure or library that can solve this for me.
That's why I'm asking here: what is the best solution to create some kind of "timed list", where items that have been there for a while get removed at the end of the iteration?
Thanks in advance.
Have you tried .NET's built-in MemoryCache?
You can set a cache policy that includes an absolute timeout, which I think is what you want.
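A minimal sketch of that idea, assuming .NET Framework's System.Runtime.Caching (the wrapper class and method names here are mine): entries expire on their own after 15 minutes, so no manual pruning loop is needed.

```csharp
using System;
using System.Runtime.Caching;

public class SeenTracker
{
    private readonly MemoryCache cache = MemoryCache.Default;

    // Returns true if the entry was already recorded in the last 15 minutes;
    // otherwise records it and returns false.
    public bool SeenBefore(string entry)
    {
        if (cache.Contains(entry))
            return true;

        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(15)
        };
        cache.Add(entry, true, policy); // the value is just a placeholder
        return false;
    }
}
```

The cache evicts each key once its absolute expiration passes, which matches the "never repeats after 15 minutes" guarantee in the question.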
You'll need something running that periodically prunes the list.
What I've done in the past is:
Use a ConcurrentBag<Tuple<DateTime, T>> instead of List<T>
With the bag of Tuples, store the object and the time it was added: theBag.Add(Tuple.Create(DateTime.Now, myObject));
Run a secondary thread that periodically enumerates the bag, and removes any entries that have "expired".
This is a more active approach, but it's pretty simple. However, since you're now working with two threads, you've got to be careful not to deadlock. That's why I used something like ConcurrentBag. There are other concurrent collections you can look at as well. You mentioned a queue, so you could try a ConcurrentQueue.
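One hedged sketch of the pruning step: ConcurrentBag has no targeted Remove, so a simple (if blunt) approach is to drain the bag and re-add only the entries that haven't expired. The element type and the 15-minute window are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;

static class BagPruner
{
    // Drains the bag and returns a new bag containing only unexpired entries.
    // The caller swaps the returned bag in for the old one.
    public static ConcurrentBag<Tuple<DateTime, string>> Prune(
        ConcurrentBag<Tuple<DateTime, string>> bag)
    {
        var kept = new ConcurrentBag<Tuple<DateTime, string>>();
        DateTime cutoff = DateTime.Now.AddMinutes(-15);

        Tuple<DateTime, string> item;
        while (bag.TryTake(out item))
        {
            if (item.Item1 >= cutoff)
                kept.Add(item); // still fresh, keep it
        }
        return kept;
    }
}
```

Swapping the whole bag means entries added mid-prune by other threads need care in a real implementation; a lock around the swap, or a ConcurrentQueue pruned only from the head, avoids that.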
Take a good look at a caching library like others have suggested and weigh your options. A full caching library may be overkill.
Instead of a list of strings, create a class that has a string property and a timestamp property. When you create an instance of the class, auto populate the timestamp property with DateTime.Now.
Each time you iterate the list to see if a string exists, check the timestamp property as well and discard any item older than 15 minutes.
example
class TimeStampedSearchResult
{
    public string SearchResult { get; set; }
    public DateTime TimeStamp { get; private set; }

    public TimeStampedSearchResult(string searchResult)
    {
        SearchResult = searchResult;
        TimeStamp = DateTime.Now;
    }

    public void UpdateTimeStamp()
    {
        TimeStamp = DateTime.Now;
    }
}
then you could use it like:
public void SearchForever()
{
    // the results list
    List<TimeStampedSearchResult> results = new List<TimeStampedSearchResult>();
    // a list of expired results to remove from the results list
    List<TimeStampedSearchResult> expiredResults = new List<TimeStampedSearchResult>();
    while (true)
    {
        // search for a result
        var searchResult = new TimeStampedSearchResult(SearchForStuff());
        bool found = false;
        // iterate our list
        foreach (var result in results)
        {
            if (result.SearchResult == searchResult.SearchResult)
            {
                result.UpdateTimeStamp();
                found = true;
            }
            else if (result.TimeStamp < DateTime.Now.AddMinutes(-15))
            {
                expiredResults.Add(result);
            }
        }
        if (!found)
        {
            // add to our results list
            results.Add(searchResult);
            // write result to file
            WriteResult(searchResult.SearchResult, "myfile.txt");
        }
        // remove expired results
        foreach (var oldResult in expiredResults)
            results.Remove(oldResult);
        // make sure you clear the expired results list too
        expiredResults.Clear();
    }
}
I know that the int won't have a fixed position in memory, so it simply can't work like that.
But the exact same portion of code will be run concurrently with different names, parameters, etc.
I need to essentially pass a "Name" string and then somehow increment the corresponding count.
Dictionary<string, int> intStats = new Dictionary<string, int>();
This dictionary stores all the stats based on the "Name" supplied as the dictionaries string key.
And since I'm using a LOT of multi-threading, I want to keep the int count as synchronized as possible, which is why I'm attempting to use Interlocked.Increment(ref intStats[theName]);
But unfortunately this won't work.
Is there any alternatives that would work for my situation?
First, I suggest creating a custom type that captures the semantics of your abstract data type. That way you can experiment with different implementations, and that way your call sites become self-documenting.
internal sealed class NameCounter
{
    public int GetCount(string Name) { ... }
    public void Increment(string Name) { ... }
}
So: what implementation choices might you make, given that this must be threadsafe?
a private Dictionary<string, int> would work but you'd have to lock the dictionary on every access, which could get expensive.
a private ConcurrentDictionary<string, int>, but keep in mind that you have to use TryUpdate in a loop to make sure you don't lose values.
make a wrapper type:
internal sealed class MutableInt
{
    public int Value;
}
This is one of the rare cases when you'd want to make a public field. Now make a ConcurrentDictionary<string, MutableInt>, and then InterlockedIncrement the public field. Now you don't have to TryUpdate, but there is still a race here: if two threads both attempt to add the same name at the same time for the first time then you have to make sure that only one of them wins. Use AddOrUpdate carefully to ensure that this race doesn't happen.
Implement your own concurrent dictionary as a hash table that indexes into an int array; InterlockedIncrement on elements of the array. Again, you'll have to be extremely careful when a new name is introduced into the system to ensure that hash collisions are detected in a threadsafe manner.
Hash the string to one of n buckets, but this time the buckets are immutable dictionaries. Each bucket has a lock; lock the bucket, create a new dictionary from the old one, put it back in the bucket, unlock the bucket. If there is contention, increase n until it goes away.
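A sketch of the wrapper-type option (the third in the list above), using GetOrAdd rather than AddOrUpdate to sidestep the add race: even if two threads race to add the same name, GetOrAdd returns the single winning instance to both, so incrementing the returned object's field is safe.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Mutable wrapper so Interlocked can target a stable field.
internal sealed class MutableInt
{
    public int Value;
}

internal sealed class NameCounter
{
    private readonly ConcurrentDictionary<string, MutableInt> _counts =
        new ConcurrentDictionary<string, MutableInt>();

    public int GetCount(string name) =>
        _counts.TryGetValue(name, out var counter)
            ? Volatile.Read(ref counter.Value)
            : 0;

    public void Increment(string name) =>
        Interlocked.Increment(ref _counts.GetOrAdd(name, _ => new MutableInt()).Value);
}
```

Note the factory passed to GetOrAdd may run more than once under contention; that's harmless here because the losing MutableInt is simply discarded before any increment touches it.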
I am currently working on a project where we have a set of events. One piece of analysis we do on the events is to look through a specific type of event and check whether it was likely prompted by another event which happened shortly before (or slightly after, in one odd case). Each of these events can only be affected by a single event, but one event can be the causal event for multiple events. We want this association to go both ways so that, from any particular event, you can go straight to the event which caused it, or to one of the events it caused. Based on that, I started by adding the following members to the Event objects:
protected Event causalEvent;
protected List<Event> effectedEvents;
After a bit of thinking, I realized that we never want the same item added twice to the effectedEvents list. After reading the answer to Preventing Duplicate List&lt;T&gt; Entries, I went with a HashSet.
protected Event causalEvent;
protected HashSet<Event> effectedEvents;
A co-worker and I got to discussing the code I'd added and he pointed out that using a HashSet might confuse people since he tended to see a HashSet and assume that there's a great deal of data. In our case, due to the rules being used in the algorithms, effectedEvents is going to have 0 items in about 90% of the cases, 1 item in 9%, and 2 maybe 1% of the time. Almost never will we have more than 2 items, although it is possible. I believe the lookup cost is the same for both collections. The amount of memory used is very similar since both start assuming a small capacity (although, I will concede that List gives you the ability to set that capacity in the constructor while HashSet only allows one to trim the value down based on its contents, "rounded to an implementation-specific value").
So, long question short, is there any real penalty to using a HashSet other than possible confusion for those unfamiliar with using it to ensure uniqueness?
The analysis performed in this answer indicates that you only see a performance advantage with HashSet over List when you get to 5 strings, or 20 objects (of course, results will vary based on what you are doing). Since you are going to have 0-2 items in almost all cases, your best bet performance-wise is probably to use the List.
I would not worry about the confusion of those unfamiliar with using a HashSet to ensure uniqueness. That is one of the primary uses of a HashSet. Pick the best tool for the job, and if you think people will be confused, a short comment can help with that.
Also, though it is good to use the best performing coding strategy, you should also beware of spending too much time on micro-optimizations that can be premature. Unless you are using a lot of these objects, you probably will never notice the difference between List and HashSet in this case.
If you are after memory and performance you could use a plain object and put the event directly into the field. If you need more than one entry you can replace it on demand with a List or HashSet.
Below is some code to show the concept. This gives you maximum speed with a much reduced memory footprint if most of the time the List/HashSet is empty.
This is in my opinion the most elegant solution for such sparse data structures.
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace DynamicSet
{
    class Program
    {
        // Can be null, one stored Event, or a HashSet<Event>,
        // depending on how many elements are needed.
        object DynamicSet;

        bool Contains(Event ev)
        {
            if (DynamicSet == null)
            {
                return false;
            }
            var storedEvent = DynamicSet as Event;
            if (storedEvent != null)
            {
                return Object.ReferenceEquals(ev, storedEvent);
            }
            var set = (HashSet<Event>)DynamicSet;
            return set.Contains(ev);
        }

        void AddEvent(Event ev)
        {
            if (DynamicSet == null)
            {
                DynamicSet = ev;
                return;
            }
            var hash = DynamicSet as HashSet<Event>;
            if (hash != null)
            {
                hash.Add(ev);
            }
            else
            {
                hash = new HashSet<Event>();
                hash.Add((Event)DynamicSet);
                hash.Add(ev); // don't lose the event being added
                DynamicSet = hash;
            }
        }

        static void Main(string[] args)
        {
            Program p = new Program();
            Event ev1 = new Event();
            Event ev2 = new Event();
            p.AddEvent(ev1);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(!p.Contains(ev2));
            p.AddEvent(ev1);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(!p.Contains(ev2));
            p.AddEvent(ev2);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(p.Contains(ev2));
        }
    }
}
I'm trying to optimise the performance of a string comparison operation on each string key of a dictionary used as a database query cache. The current code looks like:
public void Clear(string tableName)
{
foreach (string key in cache.Keys.Where(key => key.IndexOf(tableName, StringComparison.Ordinal) >= 0).ToList())
{
cache.Remove(key);
}
}
I'm new to using C# parallel features and am wondering what the best way would be to convert this into a parallel operation so that multiple string comparisons can happen 'simultaneously'. The cache can often get quite large so maintenance on it with Clear() can get quite costly.
Make your cache object a ConcurrentDictionary and use TryRemove instead of Remove.
This will make your cache thread-safe; you can then replace your current foreach loop with:
Parallel.ForEach(cache.Keys, key =>
{
    if (key.IndexOf(tableName, StringComparison.Ordinal) >= 0)
    {
        dynamic value; // just because I don't know your dictionary.
        cache.TryRemove(key, out value);
    }
});
Hope that gives you a starting point.
Your approach can't work well on a Dictionary<string, Whatever> because that class isn't thread-safe for multiple writers, so the simultaneous deletes could cause all sorts of problems.
You will therefore have to use a lock to synchronise the removals, which will therefore make the access of the dictionary essentially single-threaded. About the only thing that can be safely done across the threads simultaneously is the comparison in the Where.
You could use ConcurrentDictionary because its use of striped locks will reduce this impact. It still doesn't seem the best approach though.
If you are building keys from strings such that each key starts with a sub-key, and removing an entire sub-key is a frequent need, then you could try using a Dictionary&lt;string, Dictionary&lt;string, Whatever&gt;&gt;. Adding or updating becomes a bit more expensive, but clearing becomes an O(1) removal of just one value from the higher-level dictionary.
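A sketch of that two-level layout, keyed by table name so Clear(tableName) becomes a single removal (the class name is mine, and object stands in for the cached value type):

```csharp
using System.Collections.Generic;

public class QueryCache
{
    private readonly Dictionary<string, Dictionary<string, object>> cache =
        new Dictionary<string, Dictionary<string, object>>();

    public void Add(string tableName, string key, object value)
    {
        Dictionary<string, object> perTable;
        if (!cache.TryGetValue(tableName, out perTable))
        {
            perTable = new Dictionary<string, object>();
            cache[tableName] = perTable;
        }
        perTable[key] = value;
    }

    public void Clear(string tableName)
    {
        // Drops every cached entry for the table in one O(1) removal;
        // no per-key string comparisons at all.
        cache.Remove(tableName);
    }
}
```

This eliminates the scan entirely rather than parallelizing it, which is usually the bigger win.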
I've used Dictionaries as caches before, and what I used to do is clean up the cache "on the fly": with each entry I also store its time of inclusion, and any time an entry is requested I remove the old entries. The performance hit was minimal for me, but if needed you could maintain a Queue (of Tuple&lt;DateTime, TKey&gt;, where TKey is the type of your dictionary's keys) as an index of these timestamps, so you don't need to iterate over the entire dictionary every time. Anyway, if you're having to think about these issues, it may be time to consider a dedicated caching server. For me, Shared Cache (http://sharedcache.codeplex.com) has been good enough.
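A sketch of that queue-indexed expiry idea (type and member names are illustrative): the queue records insertion order, so expiry only ever inspects the front of the queue instead of scanning the whole dictionary.

```csharp
using System;
using System.Collections.Generic;

public class ExpiringCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> cache = new Dictionary<TKey, TValue>();
    private readonly Queue<Tuple<DateTime, TKey>> index = new Queue<Tuple<DateTime, TKey>>();
    private readonly TimeSpan maxAge;

    public ExpiringCache(TimeSpan maxAge) { this.maxAge = maxAge; }

    public void Add(TKey key, TValue value)
    {
        RemoveExpired();
        cache[key] = value;
        index.Enqueue(Tuple.Create(DateTime.Now, key));
    }

    public bool TryGet(TKey key, out TValue value)
    {
        RemoveExpired();
        return cache.TryGetValue(key, out value);
    }

    private void RemoveExpired()
    {
        DateTime cutoff = DateTime.Now - maxAge;
        // Only the oldest entries can be expired, so stop at the first fresh one.
        while (index.Count > 0 && index.Peek().Item1 < cutoff)
            cache.Remove(index.Dequeue().Item2);
    }
}
```

One caveat: if the same key is re-added, the stale queue entry will evict the refreshed value early; a fuller implementation would re-check the stored timestamp before removing.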
I am running a server, and I would like to have a users dictionary, and give each user a specific number.
Dictionary<int,ServerSideUser> users = new Dictionary<int,ServerSideUser>();
The key represents the user on the server, so when people send messages to that user, they send them to this number. I might as well have used the user's IP address, but that's not a good idea.
I need to allocate such a number for each user, and I'm really not sure how to do so. Someone suggested something like
Enumerable.Range(int.MinValue, int.MaxValue)
    .Except(users.Select(x => x.Key)).First();
but I really don't think it's the optimal way.
Also, I have the same problem with a List (or LinkedList) somewhere else.
Any ideas?
If the size of the "number" doesn't matter, take a Guid, it will always be unique and non-guessable.
If you want a dictionary that uses an arbitrary, ordered integer key, you may also be able to use a List<ServerSideUser>, in which the list index serves as the key.
Is there a specific reason you need to use a Dictionary?
Using a List<> or similar data structure definitely has limitations. Because of concurrency issues, you wouldn't want to remove users from the list at all, except when cycling the server. Otherwise, you might have a scenario in which user 255 sends a message to user 1024, who disconnects and is replaced by a new user 1024. New user 1024 then receives the message intended for old user 1024.
If you want to be able to manage the memory footprint of the user list, many of the other approaches here work; Will's answer is particularly good if you want to use ints rather than Guids.
Why don't you keep track of the current maximum number and increment that number by one every time a new user is added?
Another option: Use a factory to generate ServerSideUser instances, which assigns a new, unique ID to each user.
In this example, the factory is the class itself. You cannot instantiate the class, you must get a new instance by calling the static Create method on the type. It increments the ID generator and creates a new instance with this new id. There are many ways to do this in a thread safe way, I'm doing it here in a rudimentary 1.1-compatible way (c# pseudocode that may actually compile):
public class ServerSideUser
{
    // user Id
    public int Id { get; private set; }

    // private constructors
    private ServerSideUser() { }
    private ServerSideUser(int id) { Id = id; }

    // lock object for generating an id
    private static object _idgenLock = new Object();
    private static int _currentId = 0; // or whatever

    // retrieves the next id; thread safe
    private static int CurrentId
    {
        get { lock (_idgenLock) { _currentId += 1; return _currentId; } }
    }

    public static ServerSideUser Create()
    {
        return new ServerSideUser(CurrentId);
    }
}
I suggest a combination of your approach and an incremental one.
Since your data is in memory, an identifier of type int is enough.
Keep a variable holding the next id, and a linked list of freed identifiers.
When a new user is added, take an id from the free list; if the list is empty, use the variable and increment it.
When a user is removed, add its identifier back to the free list.
P.S. Consider using a database.
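A sketch of that scheme (names are mine, and a Queue stands in for the linked list of freed ids; locking is included since the question involves a multi-client server):

```csharp
using System.Collections.Generic;

public class IdAllocator
{
    private readonly object gate = new object();
    private readonly Queue<int> free = new Queue<int>();
    private int next = 1;

    // Hands out a freed id when one is available, otherwise a fresh one.
    public int Allocate()
    {
        lock (gate)
        {
            return free.Count > 0 ? free.Dequeue() : next++;
        }
    }

    // Called when a user disconnects, so the id can be reused.
    public void Release(int id)
    {
        lock (gate) { free.Enqueue(id); }
    }
}
```

Ids stay small and dense this way, at the cost of reuse: as another answer notes, reusing an id too quickly risks delivering a late message to the wrong user, so delaying reuse (or never reusing) may be safer.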
First of all, I'd also start by seconding the GUID suggestion. Secondly, I'd assume that you're persisting the user information on the server somehow, and that somehow is likely a database. If this is the case, why not let the database pick a unique ID for each user via a primary key? Maybe it's not the best choice for what you're trying to do here, but this is the kind of problem that databases have been handling for years, so, why re-invent?
I think it depends on how you define the "uniqueness" of the clients.
For example, if you have two different clients on the same machine, do you consider them two clients or one?
I recommend using a long value representing the time the connection was established, like "hhmmss", possibly even including milliseconds.
Why not just start from 1 and count upwards?
lock (dict)
{
    int newId = dict.Count + 1;
    dict[newId] = new User();
}
If you're really concerned about half the world's population turning up at your one server, try using longs instead.. :-D
Maybe a bit brutal, but could DateTime.Now.Ticks be something for you? As an added bonus, you know when the user was added to your dict.
From the MSDN docs on Ticks...
A single tick represents one hundred nanoseconds or one ten-millionth of a second. There are 10,000 ticks in a millisecond.
The value of this property represents the number of 100-nanosecond intervals that have elapsed since 12:00:00 midnight, January 1, 0001, which represents DateTime.MinValue.