I am currently working on a system that calls an external service and caches some of the data in the HttpContext.Current.Items collection for performance. The data can change quite regularly and is user-sensitive, which is why we currently store it only for the duration of the current HttpRequest.
Example:
if (HttpContext.Current.Items[cacheKey] != null)
{
    LogHelper.Debug<ExampleService>("[- CACHED RESULT -] GetUser({0})", () => email);
    return (ExampleUser)HttpContext.Current.Items[cacheKey];
}
using (var client = new UserServiceClient())
{
    using (new OperationContextScope(client.InnerChannel))
    {
        LogHelper.Debug<ExampleService>("GetUser({0})", () => email);
        // Call the service through the client created above.
        exampleUser = client.GetUser(email);
        HttpContext.Current.Items.Add(cacheKey, exampleUser);
    }
}
In my local environment this behaves as expected, and it mostly does in staging too, where the same thread is used for the duration of the request. In production, however, this is not the case, and there are still multiple calls to the external service within the same request. The logs show that the value in HttpContext.Current.Items[cacheKey] is not returned in cases where the Thread ID does not match that of the original request.
I guess this means that my current understanding of HttpContext.Current.Items is wrong and that it is not a suitable solution for my needs.
My question therefore is: can this be made to work across threads in the same request, and if so, should it? Otherwise, what suitable alternative is there?
One option is to use Session to store your data. Unfortunately, it's not applicable to API-specific requests (e.g. a mobile device calling the server API). Besides, out-of-process session state (State Server or SQL Server) requires all of your data to be serializable; in-process session state doesn't.
If Session does not satisfy your requirements, go to the next option: use a cache protected by something that identifies requests coming from the same user (a.k.a. an access token).
Currently we use the Apache Ignite.NET thin client to cache different sets of data. When a data request comes in, we check whether the data is already stored in the cache and, if not, request it from the database and put it into the cache.
I want to prevent several database requests when two data requests arrive at the same time.
Is there any way to manually lock the cache before the first database request starts, so that the second data request waits until the first one is completed?
I cannot solve this using .NET concurrency primitives, because the cache could be used by multiple client instances (load balancing).
I've already found the ICache.Lock(TK key) method, but it seems that it locks only the specified rows in the cache and is supported only in self-hosted mode, not by the Ignite.NET thin client.
A small piece of code that illustrates the issue:
var key = "cache_key";
using (var ignite = Ignition.StartClient(new Core.Client.IgniteClientConfiguration { Host = "127.0.0.1" }))
{
var cacheNames = ignite.GetCacheNames();
if (cacheNames.Contains(key))
{
return ignite.GetCache<int, Employee>(key).AsCacheQueryable();
}
else
{
var data = RequestDataFromDatabase();
var cache = ignite.CreateCache<int, Employee>(new CacheClientConfiguration(
EmployeeCacheName, new QueryEntity(typeof(int), typeof(Employee))));
cache.PutAll(data);
return cache.AsCacheQueryable();
}
}
The thin client doesn't have the required API.
If you don't need to check individual records and only need to know whether the cache is available, you can simply call CreateCache every time. On every invocation after the first, it should throw an exception saying that a cache with that name has already been started.
try {
var cache = ignite.CreateCache<int, Employee>(new CacheClientConfiguration(
EmployeeCacheName, new QueryEntity(typeof(int), typeof(Employee))));
// Cache created by this call => add data here
} catch (IgniteClientException e) when (e.Message.Contains("already started")) {
// Return existing cache, don't add data
}
Alexandr has provided a good and simple solution if you just need to initialize the cache once.
If you need more complex synchronization logic, atomic cache operations (PutIfAbsent, Replace) can often replace locks. For example, we could have a special cache to track the status of other caches:
var statusCache = Client.GetOrCreateCache<string, string>("status");
if (statusCache.PutIfAbsent("cache-name", "created"))
{
// Just created, add data
...
//
statusCache.Put("cache-name", "populated");
}
else
{
// Already exists, wait for data
while (statusCache["cache-name"] != "populated")
Thread.Sleep(1000);
}
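One thing to watch in the loop above: if the client that claimed the "created" status fails before writing "populated", every other client waits forever. Below is a minimal sketch of the same claim-then-wait idea with a deadline and a release on failure. It is not Ignite code: a ConcurrentDictionary is used as a local stand-in for the status cache (TryAdd plays the role of PutIfAbsent), and CacheInitializer, EnsurePopulated and the 50 ms poll interval are made-up names and values for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Local stand-in for the distributed status cache; TryAdd plays the role of PutIfAbsent.
public static class CacheInitializer
{
    private static readonly ConcurrentDictionary<string, string> Status =
        new ConcurrentDictionary<string, string>();

    // Returns true once the cache is populated (by this caller or another one),
    // false if the timeout elapses while waiting for someone else to finish.
    public static bool EnsurePopulated(string cacheName, Action populate, TimeSpan timeout)
    {
        if (Status.TryAdd(cacheName, "created"))
        {
            try
            {
                // Only the caller that won the claim loads the data.
                populate();
                Status[cacheName] = "populated";
                return true;
            }
            catch
            {
                // Release the claim so that another caller can retry the load.
                string ignored;
                Status.TryRemove(cacheName, out ignored);
                throw;
            }
        }

        // Somebody else claimed the load: poll until it is done or the deadline passes.
        var waited = Stopwatch.StartNew();
        while (waited.Elapsed < timeout)
        {
            string state;
            if (Status.TryGetValue(cacheName, out state) && state == "populated")
            {
                return true;
            }
            Thread.Sleep(50);
        }
        return false;
    }
}
```

With the real thin client, the same shape should carry over by swapping the dictionary calls for PutIfAbsent, Put and a keyed read on the status cache; what a caller does when the method returns false (retry, fail the request, load directly) is a design decision this sketch leaves open.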
There is a .NET 4.7 WebAPI application working with SQL Server using Entity Framework and hosting NServiceBus endpoint with MSMQ transport.
Simplified workflow can be described by a controller action:
[HttpPost]
public async Task<IHttpActionResult> SendDebugCommand()
{
var sample = new Sample
{
State = SampleState.Initial,
};
_dataContext.Set<Sample>().Add(sample);
await _dataContext.SaveChangesAsync();
sample.State = SampleState.Queueing;
var options = new TransactionOptions
{
IsolationLevel = IsolationLevel.ReadCommitted,
};
using (var scope = new TransactionScope(TransactionScopeOption.Required, options, TransactionScopeAsyncFlowOption.Enabled))
{
await _dataContext.SaveChangesAsync();
await _messageSession.Send(new DebugCommand {SampleId = sample.Id});
scope.Complete();
}
_logger.OnCreated(sample);
return Ok();
}
And DebugCommand handler, that is sent to the same NServiceBus endpoint:
public async Task Handle(DebugCommand message, IMessageHandlerContext context)
{
var sample = await _dataContext.Set<Sample>().FindAsync(message.SampleId);
if (sample == null)
{
_logger.OnNotFound(message.SampleId);
return;
}
if (sample.State != SampleState.Queueing)
{
_logger.OnUnexpectedState(sample, SampleState.Queueing);
return;
}
// Some work being done
sample.State = SampleState.Processed;
await _dataContext.SaveChangesAsync();
_logger.OnHandled(sample);
}
Sometimes the message handler retrieves the Sample from the DB and its state is still Initial, not Queueing as expected. That means the distributed transaction initiated in the controller action is not yet fully complete. This is also confirmed by the timestamps in the log file.
This happens quite rarely, under heavier load, and network latency probably plays a part. I couldn't reproduce the problem with a local DB, but could easily with a remote DB.
I checked the DTC configuration, and I confirmed that the transaction is escalated to a distributed transaction. Also, if scope.Complete() is not called, neither the DB update nor the message send happens.
When the transaction scope is completed and disposed, I intuitively expect both the DB and MSMQ to be settled before any further instruction is executed.
I couldn't find definite answers to these questions:
Is this the way DTC works? Is it normal for both transaction parties to commit while completion has not yet been reported back to the coordinator?
If yes, does it mean I should handle such events by altering the logic of the program?
Am I misusing transactions somehow? What would be the right way?
In addition to the comments by Evk on "Distributed transaction with MSMQ and SQL Server but sometimes getting dirty reads", here's an excerpt from the particular documentation page about transactions:
A distributed transaction between the queueing system and the persistent storage guarantees atomic commits but guarantees only eventual consistency.
Two additional notes:
NServiceBus uses IsolationLevel.ReadCommitted by default for the transaction used to consume messages. This can be configured, although I'm not sure whether setting it to Serializable on the consumer would really solve the issue here.
In general, it's not advised to share a database between services, as this greatly increases coupling and opens the door to issues like the one you're experiencing here. Try to pass the relevant data as part of the message and keep the database as internal storage for one service. Especially with web servers, a common pattern is to add all the relevant data to a message and fire it while confirming success to the user (as the message won't be lost), while the receiving endpoint stores the data in its database if necessary. Giving more specific recommendations requires more knowledge about your domain and use case. I can recommend the particular discussion community for design/architectural questions like this.
To make this easier to understand: We are using a database that does not have connection pooling built in. We are implementing our own connection pooler.
OK, so the title probably did not give the best description. Let me first describe what I am trying to do. We have a WCF service (hosted in a Windows service) that needs to be able to take and process multiple requests at once. The WCF service takes the request and tries to talk to one of (say) 10 available database connections. These database connections are all tracked by the WCF service and are set to busy while processing. If a request comes in and all 10 database connections are busy, we would like the WCF service to wait and return the response once a connection becomes available.
We have tried a few different things. For example, we could have a while loop (yuck):
[OperationContract(AsyncPattern=true)]
string ExecuteProgram(string clientId, string program, string[] args)
{
    string requestId = DbManager.RegisterRequest(clientId, program, args);
    string response = null;
    // Busy-wait until the DbManager has a response for this request.
    while (response == null)
    {
        response = DbManager.GetResponseForRequestId(requestId);
    }
    return response;
}
Basically the DbManager would track requests and responses. Each request would call the DbManager, which would assign a request id. When a database connection is available, it would assign (say) Responses[requestId] = [the database response]. The request would constantly ask the DbManager whether it had a response, and when it did, the request could return it.
This has problems all over the place. We could have multiple threads stuck in while loops for who knows how long, which would be terrible for performance and CPU usage (to say the least).
We have also looked into doing this with events/listeners. I don't know how this would be accomplished, so the code below is more a sketch of how we envisioned it working.
[OperationContract(AsyncPattern=true)]
ExecuteProgram(string clientId, string program, string[] args)
{
// register an event
// listen for that event
// when that event is called return its value
}
We have also looked into giving the DbManager a queue, or using things like Monitor.Wait/Monitor.Pulse (which we are unfamiliar with).
So, the question is: How can we have an async WCF Operation that returns when it is able to?
WCF supports the async/await keywords in .NET 4.5 (see http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx). You would need to do a bit of refactoring to make your ExecuteProgram async and to make your DbManager request operation awaitable.
If you need your DbManager to manage the completion of these tasks as results become available for given clientIds, you can map each clientId to a TaskCompletionSource. The TaskCompletionSource can be used to create a Task and the DbManager can use the TaskCompletionSource to set the results.
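To make the TaskCompletionSource idea a bit more concrete, here is a rough sketch of a registry the DbManager could hold. The PendingRequests class and its Register/Complete methods are hypothetical names, and I key the map by request id rather than clientId on the assumption that one client may have several requests in flight; adjust to whatever identifies a pending call in your system.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical registry: one TaskCompletionSource per pending request id.
public class PendingRequests
{
    private readonly ConcurrentDictionary<string, TaskCompletionSource<string>> _pending =
        new ConcurrentDictionary<string, TaskCompletionSource<string>>();

    // Called by the service operation. The returned task completes when a response arrives,
    // so the caller can await it instead of spinning in a while loop.
    public Task<string> Register(string requestId)
    {
        var tcs = new TaskCompletionSource<string>();
        if (!_pending.TryAdd(requestId, tcs))
        {
            throw new InvalidOperationException("Request already registered: " + requestId);
        }
        return tcs.Task;
    }

    // Called by whichever worker owns a database connection once the result is ready.
    // Returns false if the request id is unknown or was already completed.
    public bool Complete(string requestId, string response)
    {
        TaskCompletionSource<string> tcs;
        return _pending.TryRemove(requestId, out tcs) && tcs.TrySetResult(response);
    }
}
```

The operation would then await the task returned by Register, and no thread is tied up while all connections are busy. Timeouts and cancellation are left out here; in practice you would probably want to fail the task if no response arrives within some limit.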
This should work, with a properly-implemented async method to call:
[OperationContract]
string ExecuteProgram(string clientId, string program, string[] args)
{
Task<string> task = DbManager.DoRequestAsync(clientId, program, args);
return task.Result;
}
Are you manually managing the 10 DB connections? It sounds like you've re-implemented database connection pooling. Perhaps you should be using the connection pooling built into your DB server or driver.
If you only have a single database server (which I suspect is likely), then just use a BlockingCollection for your pool.
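As a rough illustration of the BlockingCollection suggestion, something along these lines might do. ConnectionPool and Use are made-up names, and TConnection stands in for whatever connection type your database driver gives you.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical fixed-size pool built on BlockingCollection.
public class ConnectionPool<TConnection>
{
    private readonly BlockingCollection<TConnection> _idle;

    public ConnectionPool(IEnumerable<TConnection> connections)
    {
        // All connections start out idle; the queue hands them out in FIFO order.
        _idle = new BlockingCollection<TConnection>(new ConcurrentQueue<TConnection>(connections));
    }

    // Blocks the calling thread until a connection is free, runs the work,
    // and always returns the connection to the pool afterwards.
    public TResult Use<TResult>(Func<TConnection, TResult> work)
    {
        TConnection connection = _idle.Take();
        try
        {
            return work(connection);
        }
        finally
        {
            _idle.Add(connection);
        }
    }
}
```

Take() does not spin while it waits, so this already avoids the CPU cost of the while loop, but the waiting thread is still blocked. If you want the WCF operation itself to be asynchronous, you would need an awaitable equivalent instead.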
I have a short lock-guarded section in a method (one that serves the request entirely) that performs all initialization (e.g. log-related). So only one thread can be in it at a time. In this section I also load system data from the database if it is not loaded yet. This naturally happens only on the first request, and it does not matter that it takes time and that no other threads can proceed, since it is done only once (by a dummy request).
static public void LoadAllSystemData()
{
SystemData newData = new SystemData(); //own type (etc. Hashtables in Hashtables).
LoadTables(ref newData);
LoadClasses(ref newData);
LoadAllSysDescrs(ref newData);
LoadFatFields(ref newData);
LoadAllFields(ref newData);
_allData = newData;
}
After the lock-guarded section the system data is accessed from concurrent threads only by reading and no locks are needed:
static public Hashtable GetTables()
{
return _allData.Tables;
}
Now the lock-guarded section must have a method that checks whether the system data is older than 24h and refreshes it. If this is done simply by calling the method below from the lock-guarded section, that thread takes a long time and no other thread can enter the lock-guarded section.
static public void CheckStatus()
{
    DateTime timeStamp = DateTime.Now;
    TimeSpan span = timeStamp.Subtract(_cacheTimeStamp);
    // TotalHours is the whole elapsed time; Hours is only the 0-23 component and never reaches 24.
    if (span.TotalHours >= 24)
    {
        LoadAllSystemData();
        _cacheTimeStamp = DateTime.Now;
    }
}
My questions are:
What is the best way to spawn a non-thread-pool thread to handle the IO, so that the thread pool worker thread can proceed and all threads spend minimum time in the lock-guarded section?
Is the _allData = newData; assignment in LoadAllSystemData atomic? If it is, that feels like the best way to implement this, since GetXxx methods like GetTables would not need any locking!
Is there any way to get LoadAllSystemData to be called before requests? For example on iisreset?
Thanks in advance for your answers!
Matti, you're asking multiple questions that point to the best structure for your application. I would summarize your questions as:
How do I pre-load data needed by my service prior to handling any legitimate requests?
How do I ensure the pre-loaded data is loaded "once"?
How do I refresh the pre-loaded data on a schedule, i.e. every 24 hours?
Understanding the Asp.Net pipeline and event structure will help you understand the answers to these questions.
First, the Asp.Net pipeline provides a one-time execution region in Application_Start. This is fired once per application cycle. It would also fire on 'iisreset', which cycles every application on a given server. However, applications also recycle on their own, based on their configuration, all of which is controlled through IIS settings.
Second, Asp.Net is a request system; it won't fire refresh events for you, nor can you use it by itself as a scheduler. You need an outside agent to act on that for you.
Here's what you could do:
Add your pre-loaded data routine to Application_Start.
Configure your web site's Application cycle settings for every 24 hours.
This would ensure your data is loaded once, via Application_Start. It would ensure your data is loaded prior to your site serving any requests. Last, it would ensure your data is refreshed every 24 hours.
Here's what I would do:
Add pre-loaded data routine to Application_Start.
Modify pre-loaded data routine to use Asp.Net Cache, instead of static usage.
Modify lookup data to retrieve data from cache, and re-load if data is not found.
Provide capability for diagnostic/monitoring system, i.e. nagios, to refresh data asynchronously via web method call, i.e. "preload_refresh()".
This would provide essentially the same forward effect as the above solution, but with better reliability for your service.
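A minimal sketch of the "retrieve from cache, re-load if not found" part could look like the following. I'm using System.Runtime.Caching.MemoryCache here instead of the Asp.Net Cache so the example stands on its own (it needs a reference to System.Runtime.Caching); SystemDataCache, Get and Refresh are hypothetical names, and the loader is assumed never to return null.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical wrapper: serves system data from a cache and reloads it
// when the entry is missing or has expired.
public class SystemDataCache<T> where T : class
{
    private const string Key = "system-data";
    private readonly MemoryCache _cache = new MemoryCache("system-data-cache");
    private readonly object _loadLock = new object();
    private readonly Func<T> _load;
    private readonly TimeSpan _lifetime;

    public SystemDataCache(Func<T> load, TimeSpan lifetime)
    {
        _load = load;
        _lifetime = lifetime;
    }

    public T Get()
    {
        // Fast path: no lock is taken while the entry is present.
        var data = _cache.Get(Key) as T;
        if (data != null)
        {
            return data;
        }

        lock (_loadLock)
        {
            // Re-check, since another thread may have loaded the data while we waited.
            data = _cache.Get(Key) as T;
            if (data == null)
            {
                data = _load();
                _cache.Set(Key, data, DateTimeOffset.UtcNow.Add(_lifetime));
            }
            return data;
        }
    }

    // Lets an external monitor force a reload, along the lines of "preload_refresh()".
    // Readers keep getting the old entry until the new one is stored.
    public void Refresh()
    {
        lock (_loadLock)
        {
            _cache.Set(Key, _load(), DateTimeOffset.UtcNow.Add(_lifetime));
        }
    }
}
```

You would create one instance with a 24 hour lifetime, call Get() once from Application_Start to warm it up, and have the lookup methods read through Get() rather than a static field.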
As always, your mileage may vary. Hope this helps.
To answer part of your question anyway, you can use the "Application_Start" method in the Global.asax to execute the statement on the start of the application if you want to pre-fill the data as soon as the application comes online.
Having set up a ReferenceDataRequest, I send it along to an EventQueue:
Service refdata = _session.GetService("//blp/refdata");
Request request = refdata.CreateRequest("ReferenceDataRequest");
// append the appropriate symbol and field data to the request
EventQueue eventQueue = new EventQueue();
Guid guid = Guid.NewGuid();
CorrelationID id = new CorrelationID(guid);
_session.SendRequest(request, eventQueue, id);
long _eventWaitTimeout = 60000;
myEvent = eventQueue.NextEvent(_eventWaitTimeout);
Normally I can grab the message from the queue, but I'm now hitting a situation where, if I make a number of requests in the same run of the app (normally around the tenth), I see a TIMEOUT EventType:
if (myEvent.Type == Event.EventType.TIMEOUT)
throw new Exception("Timed Out - need to rethink this strategy");
else
msg = myEvent.GetMessages().First();
These are being made on the same thread, but I'm assuming that there's something somewhere along the line that I'm consuming and not releasing.
Anyone have any clues or advice?
There aren't many references on SO to BLP's API, but hopefully we can start to rectify that situation.
I just wanted to share something, thanks to the code you included in your initial post.
If you make a request for historical intraday data for a long duration (which results in many events generated by Bloomberg API), do not use the pattern specified in the API documentation, as it may end up making your application very slow to retrieve all events.
Basically, do not call NextEvent() on a Session object! Use a dedicated EventQueue instead.
Instead of doing this:
var cID = new CorrelationID(1);
session.SendRequest(request, cID);
do {
Event eventObj = session.NextEvent();
...
}
Do this:
var cID = new CorrelationID(1);
var eventQueue = new EventQueue();
session.SendRequest(request, eventQueue, cID);
do {
Event eventObj = eventQueue.NextEvent();
...
}
This can result in some performance improvement, though the API is known to not be particularly deterministic...
I didn't really ever get around to solving this question, but we did find a workaround.
Based on a small, apparently throwaway, comment in the Server API documentation, we opted to create a second session. One session is responsible for static requests, the other for real-time. e.g.
_marketDataSession.OpenService("//blp/mktdata");
_staticSession.OpenService("//blp/refdata");
This means one session operates in subscription mode and the other more synchronously. I think it was this duality which was at the root of our problems.
Since making that change, we've not had any problems.
My reading of the docs agrees that you need separate sessions for the "//blp/mktdata" and "//blp/refdata" services.
A client appeared to have a similar problem. I solved it by making hundreds of sessions rather than passing hundreds of requests into one session. Bloomberg may not be too happy with this BFI (brute force and ignorance) approach, as we are sending the field requests for each session, but it works.
Nice to see another person on stackoverflow enjoying the pain of bloomberg API :-)
I'm ashamed to say I use the following pattern (I suspect copied from the example code). It seems to work reasonably robustly, but probably ignores some important messages. But I don't get your time-out problem. It's Java, but all the languages work basically the same.
cid = session.sendRequest(request, null);
while (true) {
    Event event = session.nextEvent();
    MessageIterator msgIter = event.messageIterator();
    while (msgIter.hasNext()) {
        Message msg = msgIter.next();
        // Compare correlation IDs by value rather than by reference.
        if (msg.correlationID().equals(cid)) {
            processMessage(msg, fieldStrings, result);
        }
    }
    // A RESPONSE event marks the end of the reply to this request.
    if (event.eventType() == Event.EventType.RESPONSE) {
        break;
    }
}
This may work because it consumes all messages off each event.
It sounds like you are making too many requests at once. BB will only process a certain number of requests per connection at any given time. Note that opening more and more connections will not help, because there are limits per subscription as well. If you make a large number of time-consuming requests simultaneously, some may time out. Also, you should either process each request completely (until you receive the RESPONSE message) or cancel it. A partial request that is outstanding is wasting a slot. Since splitting into two sessions seems to have helped you, it sounds like you are also making a lot of subscription requests at the same time. Are you using subscriptions as a way to take snapshots? That is, subscribing to an instrument, getting the initial values, and unsubscribing? If so, you should try to find a different design; this is not the way subscriptions are intended to be used. An outstanding subscription request also uses a request slot. That is why it is best to batch as many subscriptions as possible in a single subscription list instead of making many individual requests. Hope this helps with your use of the API.
By the way, I can't tell from your sample code, but while you are blocked on messages from the event queue, are you also reading from the main event queue (in a separate event queue)? You must process all the messages out of the queue, especially if you have outstanding subscriptions. Responses can queue up really fast. If you are not processing messages, the session may hit some queue limits, which may be why you are getting timeouts. Also, if you don't read messages, you may be marked as a slow consumer and not receive more data until you start consuming the pending messages. The API is async. Event queues are just a way to block on specific requests without having to process all messages from the main queue, in a context where blocking is OK and it would otherwise be difficult to interrupt the logic flow to process parts asynchronously.