Here is the situation that we're in.
We are distributing our assemblies (purely DLL) to our clients (we don't have control over their environment).
They call us by passing a list of item IDs, and we search through our huge database and return the items with the highest price. Since we have an SLA (30 milliseconds) to meet, we are caching our items in an in-memory cache (using Microsoft's MemoryCache). We are caching about a million items.
The problem here is that it only caches for the lifetime of our client's application. When the process exits, so do all the cached items.
Is there a way I can make my MemoryCache live longer, so that subsequent processes can reuse the cached items?
I have considered having a Windows service and letting all these different processes communicate with the one on the same box, but that's going to create a huge mess when it comes to deployment.
We are using AppFabric as our distributed cache, but the only way we can achieve our SLA is to use MemoryCache.
Any help would be greatly appreciated. Thank you
I don't see a way to make sure that your AppDomain lives longer - since all the calling assembly has to do is unload the AppDomain...
One option, although messy too, could be to implement some sort of "persistent MemoryCache"... to achieve performance you could use a ConcurrentDictionary persisted in a MemoryMappedFile...
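A rough sketch of that idea, with the file name, capacity and choice of serializer all placeholder assumptions: fast reads come from an in-process ConcurrentDictionary, and the dictionary is snapshotted into a file-backed memory-mapped file so a later process can warm itself from it:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.Serialization.Formatters.Binary;

public class PersistentItemCache
{
    private const string BackingFile = "item-cache.bin";   // placeholder path
    private const long Capacity = 256L * 1024 * 1024;      // assumed upper bound for a snapshot

    private readonly ConcurrentDictionary<long, decimal> _prices =
        new ConcurrentDictionary<long, decimal>();

    public bool TryGetPrice(long itemId, out decimal price)
    {
        return _prices.TryGetValue(itemId, out price);
    }

    public void SetPrice(long itemId, decimal price)
    {
        _prices[itemId] = price;
    }

    // Call on a timer or at shutdown: writes the current dictionary into the mapped file.
    public void Snapshot()
    {
        using (var map = MemoryMappedFile.CreateFromFile(
            BackingFile, FileMode.OpenOrCreate, null, Capacity))
        using (var stream = map.CreateViewStream())
        {
            new BinaryFormatter().Serialize(stream, new Dictionary<long, decimal>(_prices));
        }
    }

    // Call at startup: reloads whatever a previous process left behind, if anything.
    public void LoadSnapshot()
    {
        if (!File.Exists(BackingFile))
            return;
        try
        {
            using (var map = MemoryMappedFile.CreateFromFile(BackingFile, FileMode.Open, null, 0))
            using (var stream = map.CreateViewStream())
            {
                var data = (Dictionary<long, decimal>)new BinaryFormatter().Deserialize(stream);
                foreach (var pair in data)
                    _prices[pair.Key] = pair.Value;
            }
        }
        catch (Exception)
        {
            // A missing or corrupt snapshot just means we start cold.
        }
    }
}
```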
Another option would be to use a local database - it could even be SQLite - and implement the cache interface in memory such that all writes/updates/deletes are "write-through" while reads are pure RAM access...
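Along the same lines, here is a minimal sketch of the SQLite-backed write-through idea; the table layout, connection string and System.Data.SQLite usage are assumptions for illustration:

```csharp
using System.Collections.Concurrent;
using System.Data.SQLite;

public class WriteThroughItemCache
{
    private const string ConnectionString = "Data Source=item-cache.db";   // placeholder

    private readonly ConcurrentDictionary<long, decimal> _memory =
        new ConcurrentDictionary<long, decimal>();

    public WriteThroughItemCache()
    {
        using (var connection = new SQLiteConnection(ConnectionString))
        {
            connection.Open();
            using (var create = new SQLiteCommand(
                "CREATE TABLE IF NOT EXISTS Items (Id INTEGER PRIMARY KEY, Price REAL)", connection))
            {
                create.ExecuteNonQuery();
            }
            // Warm the in-memory side from whatever a previous process wrote.
            using (var select = new SQLiteCommand("SELECT Id, Price FROM Items", connection))
            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                    _memory[reader.GetInt64(0)] = (decimal)reader.GetDouble(1);
            }
        }
    }

    // Reads never touch the database.
    public bool TryGetPrice(long itemId, out decimal price)
    {
        return _memory.TryGetValue(itemId, out price);
    }

    // Writes update memory and the database ("write-through").
    public void SetPrice(long itemId, decimal price)
    {
        _memory[itemId] = price;
        using (var connection = new SQLiteConnection(ConnectionString))
        {
            connection.Open();
            using (var upsert = new SQLiteCommand(
                "INSERT OR REPLACE INTO Items (Id, Price) VALUES (@id, @price)", connection))
            {
                upsert.Parameters.AddWithValue("@id", itemId);
                upsert.Parameters.AddWithValue("@price", (double)price);
                upsert.ExecuteNonQuery();
            }
        }
    }
}
```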
Another option could be to include an EXE (as an embedded resource, for example) and start it from inside the DLL if it is not running... the EXE provides the MemoryCache, and communication could be via IPC (for example, shared memory...). Since the EXE is a separate process, it would stay alive even after unloading your AppDomain... the problem with this is more whether the client likes it and/or permissions allow it...
I really like the Windows Service approach, although I agree that could be a deployment mess...
The basic issue seems to be that you don't have control of the run-time Host - which is what controls the lifespan (and hence the cache).
I'd investigate creating some sort of (lightweight?) host - maybe a .exe or a service.
The bulk of your DLLs would hang off the new host, but you could still deploy a "facade" DLL which in turn calls your main solution (tied to your host). Yes, you could have the external clients call your new host directly, but that would mean changing/re-configuring those external callers, whereas leaving your original DLL/API in place would isolate the external callers from your internal changes.
This would (I assume) mean completely gutting and re-structuring your solution, particularly whatever DLLs the external callers currently hit, because instead of processing the requests itself, the facade is just going to pass the requests off to your new host.
Performance
Inter-process communication is more expensive than keeping it within a process - I'm not sure how the change in approach would affect your performance and ability to hit the SLA.
In particular, spinning up a new instance of the host will incur a performance hit.
Related
We have created a .NET Core Web API project which uses a SQL Server database. Now we are planning to deploy this project to Microsoft Azure.
While deploying this application, we are also considering enabling the autoscaling option (horizontal scaling).
Before we do it, we have some questions that we want to clarify.
Do we need to add any additional code in our application for autoscaling to work properly?
Properly in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and if more than one instance is running, will it cause race conditions (i.e., two instances accessing the same data at the same time)? Can we add a transaction (or use locking) in our code to avoid these kinds of scenarios?
I want to know whether there are any best practices to follow when implementing that kind of application.
Thank you and waiting for your answers!
Consider the following points when designing an autoscaling strategy:
The system must be designed to be horizontally scalable. Avoid making assumptions about instance affinity; do not design solutions that require that the code is always running in a specific instance of a process. When scaling a cloud service or web site horizontally, do not assume that a series of requests from the same source will always be routed to the same instance. For the same reason, design services to be stateless to avoid requiring a series of requests from an application to always be routed to the same instance of a service. When designing a service that reads messages from a queue and processes them, do not make any assumptions about which instance of the service handles a specific message, because autoscaling could start additional instances of a service as the queue length grows. The Competing Consumers pattern describes how to handle this scenario.
If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and Filters pattern provides an example of how you can achieve this. Alternatively, you can implement a checkpoint mechanism that records state information about the task at regular intervals, and save this state in durable storage that can be accessed by any instance of the process running the task. In this way, if the process is shut down, the work that it was performing can be resumed from the last checkpoint by using another instance.
For more information, see the doc: https://github.com/Huachao/azure-content/blob/master/articles/best-practices-auto-scaling.md
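To make the checkpointing suggestion concrete, here is an illustrative sketch (not taken from the linked doc); ICheckpointStore, ProcessItemAsync and the interval of 100 items are assumed abstractions, with the store backed by a database table or blob storage:

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface ICheckpointStore
{
    Task<long> LoadPositionAsync(string taskId);            // returns 0 if no checkpoint exists yet
    Task SavePositionAsync(string taskId, long position);   // must write to durable, shared storage
}

public class ResumableTask
{
    private readonly ICheckpointStore _checkpoints;

    public ResumableTask(ICheckpointStore checkpoints)
    {
        _checkpoints = checkpoints;
    }

    public async Task RunAsync(string taskId, long totalItems, CancellationToken shutdown)
    {
        // Any instance can pick the task up from wherever the last instance stopped.
        long position = await _checkpoints.LoadPositionAsync(taskId);

        while (position < totalItems && !shutdown.IsCancellationRequested)
        {
            await ProcessItemAsync(position);   // hypothetical unit of work
            position++;

            if (position % 100 == 0)            // checkpoint at regular intervals
                await _checkpoints.SavePositionAsync(taskId, position);
        }

        // Final save, so a scale-in between intervals loses at most 100 items of work.
        await _checkpoints.SavePositionAsync(taskId, position);
    }

    private Task ProcessItemAsync(long position)
    {
        return Task.CompletedTask;   // stub for the sketch
    }
}
```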
Regarding this:
Properly in the sense that there can be more than one instance of the application running because of horizontal scaling. We are using a database, and if more than one instance is running, will it cause race conditions (i.e., two instances accessing the same data at the same time)? Can we add a transaction (or use locking) in our code to avoid these kinds of scenarios?
Please keep in mind that, even if the app is running on a single machine, requests will still be handled concurrently. This means that even on a single machine 2 requests can cause the same entry in the database to be updated. So the above questions about race conditions apply to single instance web apps as well.
Try to avoid locking: the whole point of (horizontal) scaling is to gain performance benefits. By using locks you effectively remove these benefits, as only one process at a time can use the locked resource.
Other points of consideration are:
If you are using an in-memory cache, you might want to swap it out for a distributed cache (see the sketch after this list).
The guidance at the MS docs.
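For the distributed-cache point, a minimal ASP.NET Core sketch, assuming a Redis instance is available; the connection string, key names and ProductService are placeholders, not part of the original question:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;

public static class CacheSetup
{
    public static void ConfigureServices(IServiceCollection services)
    {
        // Requires the Microsoft.Extensions.Caching.StackExchangeRedis package.
        services.AddStackExchangeRedisCache(options =>
        {
            options.Configuration = "my-redis:6379";   // placeholder connection string
            options.InstanceName = "myapp:";
        });
    }
}

// Every instance behind the load balancer reads and writes the same cache entries.
public class ProductService
{
    private readonly IDistributedCache _cache;

    public ProductService(IDistributedCache cache)
    {
        _cache = cache;
    }

    public async Task<string> GetProductNameAsync(int id)
    {
        string key = $"product-name:{id}";
        string cached = await _cache.GetStringAsync(key);
        if (cached != null)
            return cached;

        string name = await LoadFromDatabaseAsync(id);   // hypothetical data access
        await _cache.SetStringAsync(key, name, new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });
        return name;
    }

    private Task<string> LoadFromDatabaseAsync(int id)
    {
        return Task.FromResult("Product " + id);   // stub for the sketch
    }
}
```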
I've been working on an internal developer tool on and off for a few weeks now, but I'm running into an ugly stumbling block I haven't managed to find a good solution for. I'm hoping someone can offer some ideas or guidance on the best ways to use the existing frameworks in .NET.
Background: the purpose of this tool is to load multiple different types of log files (Windows Event Log, IIS, SQL trace, etc.) to the same database table so they can be sorted and examined together. My personal goal is to make the entire thing streamlined so that we only make a single pass and do not cache the entire log either in memory or to disk. This is important when log files reach hundreds of MB or into the GB range. Fast performance is good, but slow and unobtrusive (allowing you to work on something else in the meantime) is better than running faster but monopolizing the system in the process, so I've focused on minimizing RAM and disk usage.
I've iterated through a few different designs so far trying to boil it down to something simple. I want the core of the log parser--the part that has to interact with any outside library or file to actually read the data--to be as simple as possible and conform to a standard interface, so that adding support for a new format is as easy as possible. Currently, the parse method returns an IEnumerable<Item> where Item is a custom struct, and I use yield return to minimize the amount of buffering.
However, we quickly run into some ugly constraints: the libraries provided (generally by Microsoft) to process these file formats. The biggest and ugliest problem: one of these libraries only works in 64-bit. Another one (Microsoft.SqlServer.Management.Trace TraceFile for SSMS logs) only works in 32-bit. As we all know, you can't mix and match 32- and 64-bit code. Since the entire point of this exercise is to have one utility that can handle any format, we need to have a separate child process (which in this case is handling the 32-bit-only portion).
The end result is that I need the 64-bit main process to start up a 32-bit child, provide it with the information needed to parse the log file, and stream the data back in some way that doesn't require buffering the entire contents to memory or disk. At first I tried using stdout, but that fell apart with any significant amount of data. I've tried using WCF, but it's really not designed to handle the "service" being a child of the "client", and it's difficult to get them synchronized backwards from how they want to work, plus I don't know if I can actually make them stream data correctly. I don't want to use a mechanism that opens up unsecured network ports or that could accidentally crosstalk if someone runs more than one instance (I want that scenario to work normally--each 64-bit main process would spawn and run its own child). Ideally, I want the core of the parser running in the 32-bit child to look the same as the core of a parser running in the 64-bit parent, but I don't know if it's even possible to continue using yield return, even with some wrapper in place to help manage the IPC. Is there any existing framework in .NET that makes this relatively easy?
WCF does have a P2P mode; however, if all your processes are on the local machine, you are better off with IPC such as named pipes, since the latter runs in kernel mode and does not have the messaging overhead of the former.
Failing that, you could try COM, which should not have a problem talking between 32- and 64-bit processes.
In case anyone stumbles across this, I'll post the solution that we eventually settled on. The key was to redefine the inter-process WCF service interface to be different from the intra-process IEnumerable interface. Instead of attempting to yield return across process boundaries, we stuck a proxy layer in between that uses an enumerator, so we can call a "give me an item" method over and over again. It's likely this has more performance overhead than a true streaming solution, since there's a method call for every item, but it does seem to get the job done, and it doesn't leak or consume memory.
We did follow Micky's suggestion of using named pipes, but still within WCF. We're also using named semaphores to coordinate the two processes, so we don't attempt to make service calls until the "child service" has finished starting up.
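For anyone curious what that shape looks like, here is a rough sketch (the contract, type names, the endpoint address and the Item placeholder are illustrative, not our actual code): the 32-bit child hosts the contract over a named pipe, and the 64-bit parent wraps the repeated "give me an item" calls back into an IEnumerable<Item> so the rest of the pipeline stays unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface ILogItemSource
{
    [OperationContract]
    void Open(string logFilePath);   // tell the child which log file to parse

    [OperationContract]
    ItemResult GetNextItem();        // HasItem == false signals end of the log
}

[DataContract]
public class ItemResult
{
    [DataMember] public bool HasItem { get; set; }
    [DataMember] public Item Item { get; set; }
}

// Stand-in for the custom struct mentioned in the question; it needs data-contract
// members to cross the WCF boundary.
[DataContract]
public struct Item
{
    [DataMember] public DateTime Timestamp { get; set; }
    [DataMember] public string Message { get; set; }
}

// Parent-side proxy: looks like any other in-process parser to the caller.
public class RemoteLogParser
{
    public IEnumerable<Item> Parse(string logFilePath)
    {
        var factory = new ChannelFactory<ILogItemSource>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/LogParserChild"));   // placeholder address
        try
        {
            ILogItemSource source = factory.CreateChannel();
            source.Open(logFilePath);
            while (true)
            {
                ItemResult result = source.GetNextItem();
                if (!result.HasItem)
                    yield break;
                yield return result.Item;
            }
        }
        finally
        {
            factory.Close();   // Abort() would be safer if the channel has faulted
        }
    }
}
```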
I essentially want to make an API for an application, but I only want one instance of that DLL to be running at one time.
So multiple applications also need to be able to use the DLL at the same time, as you would expect from a normal API.
However I want it to be the same instance of the dll that the different applications use. This is because of communication with hardware that I don't want to be able to overlap.
DLLs are usually loaded once per process, so if your application is guaranteed to only be running in single-instance mode, there's nothing else you have to do. Your single application instance will have only one loaded DLL.
Now, if you want to "share" a "single instance" of a DLL across applications, you will inevitably have to resort to a client-server architecture. Your DLL will have to be wrapped in a Windows Service, which would expose an HTTP (or WCF) API.
You can't do that as you intend to do. The best way to do this would be having a single process (a DLL is not a process) which receives and processes messages, and have your multiple clients use an API (this would be your DLL) that just sends messages to this process.
The intercommunication of those two processes (your single process and the clients sending or receiving messages via your API) could be done in many ways; choose the one that suits you best (basically, any kind of client/server architecture, even if the clients and the server are running on the same hardware).
This is an XY-Problem type of question. Your actual requirement is serializing interactions with the underlying hardware, so they do not overlap. Perhaps this is what you should explicitly and specifically be asking about.
Your proposed solution is to have a DLL that is kind of an OS-wide singleton or something like that. This is actually what you are asking about; although it is still not the right approach, in my opinion. The OS is in charge of managing the lifetime of the DLL modules in each process. There are many aspects to this, but for one: most DLL instances are already being shared between every process (mostly code sections, resources and such - data, of course, is not shared by default).
To solve your actual problem, you would have to resort to multi-process synchronization techniques. In Windows, this works mostly through named kernel objects like mutexes, semaphores, events and such. Another approach would be to use IPC, as other folks have already mentioned in their respective answers, which then again would require in itself some kind of synchronization.
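For example, here is a minimal sketch of serializing hardware access across processes with a named mutex; the mutex name and the timeout are assumptions, and the "Global\" prefix makes the mutex visible across sessions on the machine:

```csharp
using System;
using System.Threading;

public static class HardwareGate
{
    private static readonly Mutex CrossProcessLock =
        new Mutex(false, @"Global\MyHardwareLock");   // false = not initially owned

    public static void WithExclusiveHardwareAccess(Action action)
    {
        bool acquired = false;
        try
        {
            // Blocks until no other process holds the lock (or the timeout expires).
            acquired = CrossProcessLock.WaitOne(TimeSpan.FromSeconds(10));
            if (!acquired)
                throw new TimeoutException("Could not acquire the hardware lock.");
            action();
        }
        catch (AbandonedMutexException)
        {
            // Another process died while holding the mutex; this thread now owns it.
            acquired = true;
            action();
        }
        finally
        {
            if (acquired)
                CrossProcessLock.ReleaseMutex();
        }
    }
}
```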
Maybe all this is already handled by that hardware's device driver. What would be the real scenarios in which overlapped interactions with the underlying hardware would have a negative impact on the applications that use your DLL?
To ensure you have loaded one DLL per machine, you would need to run a controlling assembly in a separate AppDomain, then try creating a named pipe for remoting (with IpcChannel) and claim the hardware resources. IpcChannel will fail to be created a second time in the same environment. If you need high-performance communication with your hardware, use remoting only for claiming and releasing the resource from the other assembly used by the applications.
A Mutex is one solution for exclusive control across multiple processes.
But a Mutex can sometimes cause deadlock. Be careful if you use one.
We have a service running that connects with hundreds of devices over TCP. Every time we want to do an update of this service we need to restart it and this causes a connection loss for all devices.
To prevent this we want to divide our application into a connection part and a business logic/data layer part. This will give us the option to update the business logic/data layer without restarting the connection part. This could be done with WCF services, but the system should respond as fast as possible, and introducing another connection will cause an extra delay.
Would it be possible to update a DLL file without restarting the application, and give the application an instruction so it will load the new DLL and unload the old one? Of course, as long as the interface between the layers doesn't break.
According to MSDN:
"There is no way to unload an individual assembly without unloading all of the application domains that contain it. Even if the assembly goes out of scope, the actual assembly file will remain loaded until all application domains that contain it are unloaded."
Reference: http://msdn.microsoft.com/en-us/library/ms173101(v=vs.90).aspx
My approach would probably involve some sort of local communication between the communication layer and the business logic, each in a different context (AppDomain) - via named pipes or memory-mapped files, for example.
Here is a good example of loading/unloading an assembly dynamically:
http://www.c-sharpcorner.com/uploadfile/girish.nehte/how-to-unload-an-assembly-loaded-dynamically-using-reflection/
Be careful about speed: since MethodInfo.Invoke is slow, you might want to look into using DynamicMethod. Also, creating/destroying app domains is slow.
http://www.wintellect.com/blogs/krome/getting-to-know-dynamicmethod
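As a rough sketch of the AppDomain approach (not the linked article's code), the business layer can be hosted in its own AppDomain so it can be unloaded and replaced without restarting the process that holds the TCP connections; ILogicLayer, the type name and the path are illustrative, and the concrete type must derive from MarshalByRefObject so calls can cross the boundary:

```csharp
using System;

public interface ILogicLayer
{
    string Process(string message);
}

public class LogicHost
{
    private AppDomain _domain;

    public ILogicLayer Logic { get; private set; }

    public void Load(string assemblyPath)
    {
        // Shadow copying lets the DLL on disk be overwritten while the old copy is loaded.
        var setup = new AppDomainSetup { ShadowCopyFiles = "true" };
        _domain = AppDomain.CreateDomain("LogicDomain", null, setup);

        Logic = (ILogicLayer)_domain.CreateInstanceFromAndUnwrap(
            assemblyPath, "MyCompany.Logic.LogicLayer");
    }

    public void Unload()
    {
        if (_domain != null)
        {
            // Unloading the whole AppDomain is the only way to release the assembly.
            AppDomain.Unload(_domain);
            _domain = null;
            Logic = null;
        }
    }
}
```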
Also, you can use what is called a "plugin" framework. CodePlex has one called MEF, the "Managed Extensibility Framework":
http://mef.codeplex.com/
At the moment I am working on a project admin application in C# 3.5 on ASP.NET. In order to reduce hits to the database, I'm caching a lot of information using static variables. For example, a list of users is kept in memory in a static class. The class reads in all the information from the database on startup, and will update the database whenever changes are made, but it never needs to read from the database.
The class pings other webservers (if they exist) with updated information at the same time as a write to the database. The pinging mechanism is a Windows service to which the cache object registers using a random available port. It is used for other things as well.
The amount of data isn't all that great. At the moment I'm using it just to cache the users (password hashes, permissions, name, email etc.) It just saves a pile of calls being made to the database.
I was wondering if there are any pitfalls to this method and/or if there are better ways to cache the data?
A pitfall: A static field is scoped per app domain, and increased load will make the server generate more app domains in the pool. This is not necessarily a problem if you only read from the statics, but you will get duplicate data in memory, and you will get a hit every time an app domain is created or recycled.
Better to use the Cache object - it's intended for things like this.
Edit: Turns out I was wrong about AppDomains (as pointed out in comments) - more instances of the Application will be generated under load, but they will all run in the same AppDomain. (But you should still use the Cache object!)
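A minimal sketch of what that looks like with the ASP.NET Cache object; the key, the User type and GetUsersFromDatabase are placeholders for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class UserCacheHelper
{
    private const string CacheKey = "AllUsers";

    public static IList<User> GetUsers()
    {
        var users = HttpRuntime.Cache[CacheKey] as IList<User>;
        if (users == null)
        {
            users = GetUsersFromDatabase();   // hypothetical data-access call
            HttpRuntime.Cache.Insert(
                CacheKey,
                users,
                null,                            // no cache dependency
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromMinutes(30));       // evict if unused for 30 minutes
        }
        return users;
    }

    private static IList<User> GetUsersFromDatabase()
    {
        return new List<User>();   // stub for the sketch
    }
}

public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
}
```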
As long as you can expect that the cache will never grow to a size greater than the amount of available memory, it's fine. Also, be sure that there will only be one instance of this application per database, or the caches in the different instances of the app could "fall out of sync."
Where I work, we have a homegrown O/RM, and we do something similar to what you're doing with certain tables which are not expected to grow or change much. So, what you're doing is not unprecedented, and in fact in our system, is tried and true.
Another pitfall you must consider is thread safety. All of your application requests are running in the same AppDomain but may come in on different threads. Access to a static variable must account for it being accessed from multiple threads. That's probably a bit more overhead than you are looking for; the Cache object is better for this purpose.
Hmmm... The "classic" method would be the application cache, but provided you never update the static variables, or understand the locking issues if you do, and you understand that they can disappear at any time with an AppDomain restart, then I don't really see the harm in using a static.
I suggest you look into ways of having a distributed cache for your app. You can take a look at NCache or indeXus.Net
The reason I suggested that is because you rolled your own ad-hoc way of updating the information you're caching. Static variables/references are fine, but they don't update/refresh (so you'll have to handle aging on your own), and you seem to have a distributed setup.