I am running a server, and I would like to have a users dictionary, and give each user a specific number.
Dictionary<int,ServerSideUser> users = new Dictionary<int,ServerSideUser>();
The key represents the user on the server, so when people send messages to that user, they send them to this number. I might as well have used the user's IP address, but that's not a good idea.
I need to allocate such a number for each user, and I'm really not sure how to do so. Someone suggested something like
Enumerable.Range(int.MinValue, int.MaxValue)
.Except(users.Select(x => x.Key)).First();
but I really don't think it's the optimal way.
Also, I have the same problem with a List (or LinkedList) somewhere else.
Any ideas?
If the size of the "number" doesn't matter, take a Guid, it will always be unique and non-guessable.
If you want a dictionary that uses an arbitrary, ordered integer key, you may also be able to use a List<ServerSideUser>, in which the list index serves as the key.
Is there a specific reason you need to use a Dictionary?
Using a List<> or similar data structure definitely has limitations. Because of concurrency issues, you wouldn't want to remove users from the list at all, except when cycling the server. Otherwise, you might have a scenario in which user 255 sends a message to user 1024, who disconnects and is replaced by a new user 1024. New user 1024 then receives the message intended for old user 1024.
If you want to be able to manage the memory footprint of the user list, many of the other approaches here work; Will's answer is particularly good if you want to use ints rather than Guids.
Why don't you keep track of the current maximum number and increment that number by one every time a new user is added?
Another option: Use a factory to generate ServerSideUser instances, which assigns a new, unique ID to each user.
In this example, the factory is the class itself. You cannot instantiate the class, you must get a new instance by calling the static Create method on the type. It increments the ID generator and creates a new instance with this new id. There are many ways to do this in a thread safe way, I'm doing it here in a rudimentary 1.1-compatible way (c# pseudocode that may actually compile):
public class ServerSideUser
{
    // user Id
    public int Id { get; private set; }

    // private constructors; instances come only from Create()
    private ServerSideUser() { }
    private ServerSideUser(int id) { Id = id; }

    // lock object for generating an id
    private static object _idgenLock = new Object();
    private static int _currentId = 0; // or whatever

    // retrieves the next id; thread safe
    private static int CurrentId
    {
        get { lock (_idgenLock) { _currentId += 1; return _currentId; } }
    }

    public static ServerSideUser Create()
    {
        return new ServerSideUser(CurrentId);
    }
}
I suggest a combination of your approach and an incremental counter.
Since your data is in memory, an int identifier is enough.
Keep a variable holding the next fresh id and a linked list of free identifiers.
When a new user is added, take an id from the list; if the list is empty, use the variable and increment it.
When a user is removed, add its identifier to the list of free identifiers.
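For illustration, here is a minimal sketch of that scheme (the class and member names are mine, and the simple lock is just one way to make it thread-safe):

using System.Collections.Generic;

public class IdAllocator
{
    private int _nextId = 1;                                           // next never-used id
    private readonly LinkedList<int> _freeIds = new LinkedList<int>(); // ids released by removed users
    private readonly object _lock = new object();

    public int Allocate()
    {
        lock (_lock)
        {
            if (_freeIds.Count > 0)
            {
                int id = _freeIds.First.Value;   // reuse a released id
                _freeIds.RemoveFirst();
                return id;
            }
            return _nextId++;                    // otherwise take a fresh one
        }
    }

    public void Release(int id)
    {
        lock (_lock)
        {
            _freeIds.AddLast(id);                // make the id available again
        }
    }
}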
P.S. Consider using a database.
First of all, I'd also start by seconding the GUID suggestion. Secondly, I'd assume that you're persisting the user information on the server somehow, and that somehow is likely a database. If this is the case, why not let the database pick a unique ID for each user via a primary key? Maybe it's not the best choice for what you're trying to do here, but this is the kind of problem that databases have been handling for years, so, why re-invent?
I think it depends on how you define the "uniqueness" of the clients.
For example, if you have two different clients on the same machine, do you consider them two clients or one?
I recommend using a long value that represents the time the connection was established, like "hhmmss", or you could even include milliseconds.
Why not just start from 1 and count upwards?
lock (dict)
{
    int newId = dict.Count + 1;
    dict[newId] = new User();
}
If you're really concerned about half the world's population turning up at your one server, try using longs instead.. :-D
Maybe a bit brutal, but could DateTime.Now.Ticks be something for you? As an added bonus, you know when the user was added to your dict.
From the MSDN docs on Ticks...
A single tick represents one hundred nanoseconds or one ten-millionth of a second. There are 10,000 ticks in a millisecond.
The value of this property represents the number of 100-nanosecond intervals that have elapsed since 12:00:00 midnight, January 1, 0001, which represents DateTime.MinValue.
I'm building a WCF translation service. The service uses Google Translate's web API.
To avoid re-fetching commonly searched queries, the service keeps a memory cache of the last searches.
The search function first checks the cache, and only then puts out a request to google.
One last important detail: every time a request arrives, I construct a string to be its key. It is composed of the search term and the two language name codes, so it is unique per specific search.
The question is this:
Suppose two identical requests arrive at the same time. I would like to lock the second one out of the whole search function, so that when it enters it will find the search result that was already added to the cache by the first one. If a different request arrives, I want to let it in.
My idea was to put a lock using the string I constructed, as it is unique per exact search (same search term, same languages). Will this give the result I described?
Thanks for reading this far (:
See the code example below - and again thanks!
public async Task<TranslationResult> TranslateAsync(Langueges From, Langueges To, string Content)
{
string key = Content + From.ToString() + To.ToString();
lock(key)
{
//look for the search result in the cache
//if not found, request from google and store result in the memory cache
}
}
Something like this?
// this cache is probably going to be the application-level cache in asp.net,
// but using Dictionary as an example in this case
Dictionary<string, TranslationResult> cache = new ...
private static object syncRoot = new Object();

if (!cache.ContainsKey(key))
{
    lock (syncRoot)
    {
        // this double check is not a mistake:
        // the second request checks again for the key
        // once it acquires the lock
        if (!cache.ContainsKey(key))
        {
            // send request to google, then store the result in the cache
        }
        else
        {
            return cache[key];
        }
    }
}
else
{
    return cache[key];
}
I don't think you'll get the behavior that you're looking for: the variable "key" will always be a newly created object and the current thread will always be granted the lock right away. I don't believe string interning will be in play here because the "key" variable is not a literal. If "key" were an interned string, then it might work like you want, but that would be a strange side-effect.
What to do? Consider using the ConcurrentDictionary.GetOrAdd method. Your "key" variable will become the key argument. If the key is not in the cache, the valueFactory will be called, and your valueFactory implementation should send the call to Google. ConcurrentDictionary is thread-safe, so no explicit locking is needed on your part.
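A minimal sketch of that idea, caching the Task itself so that concurrent callers await the same request (RequestFromGoogleAsync is a hypothetical helper standing in for the actual web call):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class TranslationCache
{
    // thread-safe cache keyed by the composed search string
    private readonly ConcurrentDictionary<string, Task<TranslationResult>> _cache =
        new ConcurrentDictionary<string, Task<TranslationResult>>();

    public Task<TranslationResult> TranslateAsync(Langueges from, Langueges to, string content)
    {
        string key = content + from.ToString() + to.ToString();

        // GetOrAdd returns the existing task if the key is already present;
        // otherwise it stores the task produced by the factory.
        // Note: the factory may run more than once under heavy contention,
        // and a faulted task stays cached with this simple approach.
        return _cache.GetOrAdd(key, _ => RequestFromGoogleAsync(from, to, content));
    }

    // hypothetical helper that performs the actual call to the translation API
    private Task<TranslationResult> RequestFromGoogleAsync(Langueges from, Langueges to, string content)
    {
        throw new NotImplementedException();
    }
}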
If two identical requests arrive at the same time, you could simply return one of them with the original (untranslated) text :)
You can then tell which one was not translated and send it again. I have done it this way before.
I have an application that receives certain "events", uniquely identified by a 12-character string and a DateTime. Each event has an associated result, which is a string.
I need to keep these events in memory (for a maximum of, for example, 8 hours) and, in case I receive the same event a second time, be able to know that I have already received it (in the last 8 hours).
There will be fewer than 1000 events to store.
I can't use external storage; it has to be done in memory.
My idea is to use a Dictionary where the key is a class composed of a string and a datetime, the value is the result.
EDIT: the string itself (actually a MAC address) does not uniquely identify the event; it's the MAC AND the DateTime. Those two combined are unique, which is why the key must be formed from both.
The application is a server that receives a certain event from a client: the event is marked on the client by the client MAC and the client datetime (I can't use a GUID).
It may happen that the client retransmits the same data, and by checking the dictionary for that MAC/Datetime key I would know that I have already received that data.
Then, every hour (for example), I can foreach through the whole collection and remove all the keys where datetime is older than 8 hours.
Can you suggest a better approach to the problem or to the data formats I have chosen, in terms of performance and cleanliness of the code?
Or a better way to delete old data, with LINQ for example.
Thanks,
Mattia
The event time should not be part of the key; if it is, how are you going to be able to tell that you have already received this event? So you should move to a dictionary where the keys are event names and the values are tuples of date and result.
Once in a while you can trim old data from the dictionary easily with LINQ:
dictionary = dictionary
    .Where(p => p.Value.DateOfEvent >= DateTime.Now.AddHours(-8))
    .ToDictionary(p => p.Key, p => p.Value);
If requirements state that updating once per hour is good enough, and you're never having more than 1000 items in the dictionary, your solution should be perfectly adequate and probably the most easily understood by anyone else looking at your code. I'd probably recommend immutable structs for the key instead of classes, but that's it.
If there's a benefit to removing them immediately rather than once per hour, you could do something where you also add a Timer that removes it after exactly 8 hours, but then you've got to deal with thread safety and cleaning up all the timers and such. Likely not worth it.
I'd avoid the OrderedDictionary approach since it's more code, and may be slower since it has to reorder with every insert.
It's a common mantra these days to focus first on keeping code simple, only optimize when necessary. Until you have a known bottleneck and have profiled it, you never know if you're even optimizing the right thing. (And from your description, there's no telling which part will be slowest without profiling it).
I would go for a Dictionary.
This way you can search for the string very fast (an O(1) operation).
Other collections are slower:
OrderedDictionary: is slow because it needs boxing and unboxing.
SortedDictionary: performs an O(log n) operation.
Normal arrays and lists: a linear search, O(n), about n/2 comparisons on average.
An example:
public class Event
{
    public Event(string macAddress, DateTime time, string data)
    {
        MacAddress = macAddress;
        Time = time;
        Data = data;
    }

    public string MacAddress { get; set; }
    public DateTime Time { get; set; }
    public string Data { get; set; }
}

public class EventCollection
{
    private readonly Dictionary<Tuple<string, DateTime>, Event> _Events =
        new Dictionary<Tuple<string, DateTime>, Event>();

    public void Add(Event e)
    {
        _Events.Add(new Tuple<string, DateTime>(e.MacAddress, e.Time), e);
    }

    public IList<Event> GetOldEvents(bool autoRemove)
    {
        DateTime old = DateTime.Now - TimeSpan.FromHours(8);

        List<Event> results = new List<Event>();
        foreach (Event e in _Events.Values)
            if (e.Time < old)
                results.Add(e);

        // Clean up
        if (autoRemove)
            foreach (Event e in results)
                _Events.Remove(new Tuple<string, DateTime>(e.MacAddress, e.Time));

        return results;
    }
}
I would use an OrderedDictionary where the key is the 12-character identifier and the result and datetime are part of the value. Sadly OrderedDictionary is not generic (key and value are objects), so you would need to do the casting and type checking yourself. When you need to remove the old events, you can foreach through the OrderedDictionary and stop when you get to a time new enough to keep. This assumes the datetimes you use are in order when you add them to the dictionary.
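A rough sketch of that approach (the class and member names are mine; it assumes events are added in chronological order, so the oldest entries sit at the front):

using System;
using System.Collections.Specialized;

public class EventLog
{
    private readonly OrderedDictionary _events = new OrderedDictionary();

    public void Add(string id, DateTime time, string result)
    {
        _events.Add(id, Tuple.Create(time, result));
    }

    public bool Contains(string id)
    {
        return _events.Contains(id);
    }

    public void PurgeOlderThan(TimeSpan age)
    {
        DateTime cutoff = DateTime.Now - age;
        // entries are in insertion order, so stop at the first one that is new enough to keep
        while (_events.Count > 0)
        {
            var oldest = (Tuple<DateTime, string>)_events[0]; // index 0 = earliest insertion
            if (oldest.Item1 >= cutoff)
                break;
            _events.RemoveAt(0);
        }
    }
}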
Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and uses the more efficient clustered index instead perhaps). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .net cache (System.Runtime.Caching or System.Web.Caching) and going out and using redis. Or is Redis just hands down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache
I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory
This then, as you say, causes an issue of cache invalidation - although actually that isn't critical in most places. But for this - redis events (pub/sub) allow an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: next time it is needed we'll pick up the new copy from redis. Hence we broadcast the key-names that are changing against a single event channel name.
Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to invalidate the cache for a given key from one node (the one that knows the state has changed) immediately (pretty much) to all nodes.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
using System;
using System.Text;
using BookSleeve;

static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();

            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });

            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
I need to manage some lists with timers: each element of these lists is associated with a timer, and when the timer expires, the corresponding element must be removed from the list.
In this way the length of the list does not grow too much because, as time goes on, the elements are progressively removed. The speed with which the list length increases also depends on the rate of addition of new elements.
However I need to add the following constraint: the amount of RAM used by the list does not exceed a certain limit, ie the user must specify the maximum number of items that can be stored in RAM.
Therefore, if the rate of addition of the elements is low, all items can be stored in RAM. If, however, the rate of addition of elements is high, old items are likely to be lost before the expiration of their timers.
Intuitively, I thought about taking a cue from swapping technique used by operating systems.
class SwappingList
{
    private List<string> _list;
    private SwapManager _swapManager;

    public SwappingList(int capacity, SwapManager swapManager)
    {
        _list = new List<string>(capacity);
        _swapManager = swapManager;
        // TODO
    }
}
One of the lists that I manage is made up of strings of constant length, and it must work as a hash table, so I should use HashMap, but how can I define the maximum capacity of a HashMap object?
Basically I would like to implement a caching mechanism, but I wish that the RAM used by the cache is limited to a number of items or bytes, which means that old items that are not expired yet, must be moved to a file.
According to the comments above you want a caching mechanism.
.NET 4 has this built-in (see http://msdn.microsoft.com/en-us/library/system.runtime.caching.aspx) - it comes with a configurable caching policy which you can use to configure expiration among other things... it even provides some events to which you can assign delegates that are called prior to removing a cache entry, to customize this process even further...
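For illustration, a minimal sketch using System.Runtime.Caching.MemoryCache (the key, value, and the 10 MB limit are placeholders; note that MemoryCache caps memory in megabytes rather than by item count):

using System;
using System.Collections.Specialized;
using System.Runtime.Caching;

class CacheSketch
{
    static void Main()
    {
        // a named cache with a memory limit; entries may be evicted under
        // memory pressure even before they expire
        var config = new NameValueCollection();
        config.Add("cacheMemoryLimitMegabytes", "10");
        var cache = new MemoryCache("events", config);

        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddHours(8),
            // invoked when an entry is removed (expired or evicted),
            // e.g. to swap the item out to a file instead of losing it
            RemovedCallback = args => Console.WriteLine("Removed: " + args.CacheItem.Key)
        };

        cache.Add("some-key", "some-value", policy);
        Console.WriteLine(cache.Get("some-key"));
    }
}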
You cannot specify the maximum capacity of a HashMap. You need to implement a wrapper around it, which, after each insertion, checks to see if the maximum count has been reached.
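A bare-bones sketch of such a wrapper (the names are mine; what to do when the limit is hit, reject, evict, or swap to disk as described in the question, is a policy decision left as a comment):

using System;
using System.Collections.Generic;

public class BoundedDictionary<TKey, TValue>
{
    private readonly int _maxCount;
    private readonly Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();

    public BoundedDictionary(int maxCount)
    {
        _maxCount = maxCount;
    }

    public void Add(TKey key, TValue value)
    {
        if (_inner.Count >= _maxCount)
        {
            // limit reached: reject, evict an old entry, or swap it out to a file
            throw new InvalidOperationException("Maximum capacity reached");
        }
        _inner.Add(key, value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return _inner.TryGetValue(key, out value);
    }

    public int Count { get { return _inner.Count; } }
}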
It is not clear to me whether that's all you are asking. If you have more questions, please be sure to state them clearly and use a question mark with each one of them.
Let's say I have a relatively large list of an object MyObjectModel called MyBigList. One of the properties of MyObjectModel is an int called ObjectID. In theory, I think MyBigList could reach 15-20MB in size. I also have a table in my database that stores some scalars about this list so that it can be recomposed later.
What is going to be more efficient?
Option A:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int RowID = PutScalarsInDB(MyBigList);
Option B:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int TheCount = MyBigList.Count();
StringBuilder ListOfObjectID = new StringBuilder();
foreach (MyObjectModel ThisObject in MyBigList)
{
    ListOfObjectID.Append(ThisObject.ObjectID.ToString());
}
int RowID = PutScalarsInDB ( TheCount, ListOfObjectID);
In option A I pass MyBigList to a function that extracts the scalars from the list, stores these in the DB and returns the row where these entries were made. In option B, I keep MyBigList in the page method where I do the extraction of the scalars and I just pass these to the PutScalarsInDB function.
Which is the better option, or is there yet another that's better still? I'm concerned about passing around objects of this size and about memory usage.
I don't think you'll see a material difference between these two approaches. From your description, it sounds like you'll be burning the same CPU cycles either way. The things that matter are:
Get the list
Iterate through the list to get the IDs
Iterate through the list to update the database
The order in which these three activities occur, and whether they occur within a single method or a subroutine, doesn't matter. All other activities (declaring variables, assigning results, etc.,) are of zero to negligible performance impact.
Other things being equal, your first option may be slightly more performant because you'll only be iterating once, I assume, both extracting IDs and updating the database in a single pass. But the cost of iteration will likely be very small compared with the cost of updating the database, so it's not a performance difference you're likely to notice.
Having said all that, there are many, many more factors that may impact performance, such as the type of list you're iterating through, the speed of your connection to the database, etc., that could dwarf these other considerations. It doesn't look like too much code either way. I'd strongly suggest building both and testing them.
Then let us know your results!
If you want to know which method performs better, you can use the Stopwatch class to measure the time each one needs. See here for Stopwatch usage: http://www.dotnetperls.com/stopwatch
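For example, a minimal sketch (RunOptionA and RunOptionB are placeholders for the two code paths being compared):

using System;
using System.Diagnostics;

class Benchmark
{
    static void RunOptionA() { /* Option A: pass the list to PutScalarsInDB */ }
    static void RunOptionB() { /* Option B: build the scalars first, then call PutScalarsInDB */ }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        RunOptionA();
        sw.Stop();
        Console.WriteLine("Option A: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        RunOptionB();
        sw.Stop();
        Console.WriteLine("Option B: {0} ms", sw.ElapsedMilliseconds);
    }
}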
I think there are other issues you need to verify for an ASP.NET application:
Where do you read your list from? If you read it from the database, would it be more efficient to do the work in the database, within a stored procedure?
Where is the list stored? Is it only read and then destroyed, or is it kept in session or application state?