Booksleve: what Redis version to use? - c#

With reference to this post about Booksleeve and to the fact that there is not an official Windows Redis distribuition, what is the best practice? Is better to compile on Win32 or the "Unofficial" win32/64 distribuition is reliable and mantained ?

Booksleeve is just any-other-redis-client, and is rather orthogonal to the redis-server version/platform that you choose to use. Personally, I would only currently use the win32 implementations of redis-server as a local developer convenience. Production machines should probably use the linux build (we use ubuntu server, if it matters). The reason for this comes down to the simple fact that redis-server is designed to make use of the cheap linux fork/copy-on-write functionality to perform background saving (and possibly other functionality). Windows does not have such a fork, and the "linux on windows" implementations typically do a memory copy (pretty expensive, and may significantly impact how certain operations perform).
Worse: at least one pure-windows version of redis-server simply substitutes a BGSAVE request for a SAVE request; on an busy server, this is death: SAVE is synchronous, and redis is single threaded, usually simply exploiting the fact that individual operations are so ridiculously fast, so that you wouldn't normally notice. However: if you suddenly get a SAVE request that takes 20 seconds, then your redis server is doing nothing else for those 20 seconds. When you are relying on replies that are typically around 0.3ms, this is a big problem.
Microsoft have been working on a port of redis-server, and it may well be that this is now production ready; however, all things considered, for now I would rather stick with the primary, well-tested implementation on a linux server.
But: for ad-hoc developer usage, any of the win32 builds should be fine.

Related

Efficiently streaming data across process boundaries in .NET

I've been working on an internal developer tool on and off for a few weeks now, but I'm running into an ugly stumbling block I haven't managed to find a good solution for. I'm hoping someone can offer some ideas or guidance on the best ways to use the existing frameworks in .NET.
Background: the purpose of this tool is to load multiple different types of log files (Windows Event Log, IIS, SQL trace, etc.) to the same database table so they can be sorted and examined together. My personal goal is to make the entire thing streamlined so that we only make a single pass and do not cache the entire log either in memory or to disk. This is important when log files reach hundreds of MB or into the GB range. Fast performance is good, but slow and unobtrusive (allowing you to work on something else in the meantime) is better than running faster but monopolizing the system in the process, so I've focused on minimizing RAM and disk usage.
I've iterated through a few different designs so far trying to boil it down to something simple. I want the core of the log parser--the part that has to interact with any outside library or file to actually read the data--to be as simple as possible and conform to a standard interface, so that adding support for a new format is as easy as possible. Currently, the parse method returns an IEnumerable<Item> where Item is a custom struct, and I use yield return to minimize the amount of buffering.
However, we quickly run into some ugly constraints: the libraries provided (generally by Microsoft) to process these file formats. The biggest and ugliest problem: one of these libraries only works in 64-bit. Another one (Microsoft.SqlServer.Management.Trace TraceFile for SSMS logs) only works in 32-bit. As we all know, you can't mix and match 32- and 64-bit code. Since the entire point of this exercise is to have one utility that can handle any format, we need to have a separate child process (which in this case is handling the 32-bit-only portion).
The end result is that I need the 64-bit main process to start up a 32-bit child, provide it with the information needed to parse the log file, and stream the data back in some way that doesn't require buffering the entire contents to memory or disk. At first I tried using stdout, but that fell apart with any significant amount of data. I've tried using WCF, but it's really not designed to handle the "service" being a child of the "client", and it's difficult to get them synchronized backwards from how they want to work, plus I don't know if I can actually make them stream data correctly. I don't want to use a mechanism that opens up unsecured network ports or that could accidentally crosstalk if someone runs more than one instance (I want that scenario to work normally--each 64-bit main process would spawn and run its own child). Ideally, I want the core of the parser running in the 32-bit child to look the same as the core of a parser running in the 64-bit parent, but I don't know if it's even possible to continue using yield return, even with some wrapper in place to help manage the IPC. Is there any existing framework in .NET that makes this relatively easy?
WCF does have a P2P mode however if all your processes are local machine you are better off with IPC such as named pipes due to the latter running in Kernel Mode and does not have the messaging overhead of the former.
Failing that you could try COM which should not have a problem talking between 32 and 64 bit processes. - Tell me more
In case anyone stumbles across this, I'll post the solution that we eventually settled on. The key was to redefine the inter-process WCF service interface to be different from the intra-process IEnumerable interface. Instead of attempting to yield return across process boundaries, we stuck a proxy layer in between that uses an enumerator, so we can call a "give me an item" method over and over again. It's likely this has more performance overhead than a true streaming solution, since there's a method call for every item, but it does seem to get the job done, and it doesn't leak or consume memory.
We did follow Micky's suggestion of using named pipes, but still within WCF. We're also using named semaphores to coordinate the two processes, so we don't attempt to make service calls until the "child service" has finished starting up.

Azure Web App. Free is faster than Basic and Standard?

I have a C# MVC application with a WCF service running on Azure. First of it was of course hosted on the free version, but as I had that one running smoothly I wanted to try and see how it ran on either Basic or Standard, which as far as I know should be dedicated servers.
To my surprise the code ran significantly slower once it was changed from Free to either Standard or Basic. I chose the smallest instance, but still expected them to perform better than the Free option?
From my performance logging I can see that the code that runs especially slow is something that is started as async from Task.Run. Initially it was old school Thread.Start() but considered whether this might spawn it in some lower priority thread and therefore changed it to Task.Run - without this changing anything - so perhaps it has nothing to do with it - but it might, so now you know.
The code that runs really slow basically works on some XML document, through XDocument, XElement etc. It loops through, has some LINQ etc. but nothing too fancy. But still it is 5-10 times slower on Basic and Standard as on the Free version? For the exact same request the Free version uses around 1000ms where as Basic and Standard uses 8000-10000ms?
In each test I have tried 5-10 times but without any decrease in response-times. I thought about whether I need to wait some hours before the Basic/Standard is fully functional or something like that, but each time I switch back, the Free version just outperforms it from the get-go.
Any suggestions? Is the Free version for some strange reason more powerful than Basic or Standard or do I need to configure something differently once I get up and running on Basic or Standard?
The notable difference between the Free and Basic/Standard tiers is that Free uses an undisclosed number of shared cores, whereas Basic/Standard has a defined number of CPU cores (1-4 based on how much you pay). Related to this is the fact that Free is a shared instance while Basic/Standard is a private instance.
My best guess based on this that since the Free servers you would be on house multiple different users and applications, they probably have pretty beef specs. Their CPUs are probably 8-core Xeons and there might even be multiple CPUs. Most likely, Azure isn't enforcing any caps but rather relying on quotas (60 CPU minutes / day for the Free tier) and overall demand on the server to restrict CPU use. In other words, if your site is the only one that happens to be doing anything at the moment (unlikely of course, but for the sake of example), you could be potentially utilizing all 8+ cores on the box, whereas when you move over to Basic/Standard you are hard-limited to 1-4. Processing XML is actually very CPU heavy, so this seems to line up with my assumptions.
More than likely, this is a fluke. Perhaps your residency is currently on a relatively newly provisioned server that hasn't been fill up with tenants yet. Maybe you just happen to be sharing with tenants that aren't doing much. Who knows? But, if the server is ever actually under real load, I'd imagine you'd see a much worse response time on the Free tier than even Basic/Standard.

Force simultaneous threads/tasks for C# load testing app?

Question:
Is there a way to force the Task Parallel Library to run multiple tasks simultaneously? Even if it means making the whole process run slower with all the added context switching on each core?
Background:
I'm fairly new to multithreading, so I could use some assistance. My initial research hasn't turned up much, but I also doubt I know what exactly to search for. Perhaps someone more experienced with multithreading can help me better understand TPL and/or find a better solution.
Our company is planning on deploying a piece of software to all users' machines that will connect to a central server a few times a day, and synchronize some files and MS Access data back to the user's machine. We would like to load-test this concept first and see how the Access DB holds up to lots of simultaneous connections.
I've been tasked with writing a .NET application that behaves like the client app (connecting & syncing with a network location), but does this on multiple threads simultaneously.
I've been getting familiar with the Task Parallel Library (TPL), as this seems like the best (newest) way to handle multithreading, and get return values back from each thread easily. However as I understand it, TPL decides how to run each "task" for the fastest execution possible, splitting the work among the available cores. So lets say I want to run 30 sync jobs on a 2-core machine... the TPL would run 15 on each core, sequentially. This would mean my load test would only be hitting the Access DB with at most 2 connections at the same time. I want to hit the database with lots of simultaneous connections.
You can force the TPL to do this by specifying TaskOptions.LongRunning. According to Reflector (not according to the docs, though) this always creates a new thread. I consider relying on this safe production use.
Normal tasks will not do, because they don't guarantee execution. Setting MinThreads is a horrible solution (for production) because you are changing a process global setting to solve a local problem. And still, you are not guaranteed success.
Of course, you can also start threads. Tasks are more convenient though because of error handling. Nothing wrong with using threads for this use case.
Based on your comment, I think you should reconsider using Access in the first place. It doesn't scale well and has problems once the database grows to a certain size. Especially if this is simply served off some file share on your network.
You can try and simulate load from your single machine but I don't think that would be very representative of what you are trying to accomplish.
Have you considered using SQL Server Express? It's basically a de-tuned version of the full-blown SQL Server which might suit your needs better.

Any 'quick wins' to make .NET remoting faster on a single machine?

I've been badly let-down and received an application that in certain situations is at least 100 times too slow, which I have to release to our customers very soon (a matter of weeks).
Through some very simple profiling I have discovered that the bottleneck is its use of .NET Remoting to transfer data between a Windows service and the graphical front-end - both running on the same machine.
Microsoft guidelines say "Minimize round trips and avoid chatty interfaces": write
MyComponent.SaveCustomer("bob", "smith");
rather than
MyComponent.Firstname = "bob";
MyComponent.LastName = "smith";
MyComponent.SaveCustomer();
I think this is the root of the problem in our application. Unfortunately calls to MyComponent.* (the profiler shows that 99.999% of the time is spent in such statements) are scattered liberally throughout the source code and I don't see any hope of redesigning the interface in accordance with the guidelines above.
Edit: In fact, most of the time the front-end reads properties from MyComponent rather than writes to it. But I suspect that MyComponent can change at any time in the back-end.
I looked to see if I can read all properties from MyComponent in one go and then cache them locally (ignoring the change-at-any-time issue above), but that would involve altering hundreds of lines of code.
My question is: Are they any 'quick-win' things I can try to improve performance?
I need at least a 100-times speed-up. I am a C/C++/Delphi programmer and am pretty-much unfamiliar with C#/.NET/Remoting other than what I have read up on in the last couple of days. I'm looking for things that can be completed in a few days - a major restructuring of the code is not an option.
Just for starters, I have already confirmed that it is using BinaryFormatter.
(Sorry, this is probably a terrible question along the lines of 'How can I feasibly fix X if I rule out all of the feasible options'… but I'm desperate!)
Edit 2
In response to Richard's comment below: I think my question boils down to:
Is there any setting I can change to reduce the cost of a .NET Remoting round-trip when both ends of the connection are on the same machine?
Is there any setting I can change to reduce the number of round-trips - so that each invocation of a remote object property doesn't result in a separate round-trip? And might this break anything?
Under .Net Remoting you have 3 ways of communicating by HTTP, TCP and IPC. If the commnuicatin is on the same pc I sugest using IPC channels it will speed up your calls.
In short, no there are no quick wins here. Personally I would not make MyComponent (as a DTO) a MarshalByRefObject (which is presumably the problem), as those round trips are going to cripple you. I would keep it as a regular class, and just move a few key methods to pump them around (i.e. have a MarshalByRef manager/repository/etc class).
That should reduce round-trips; if you still have problems then it will probably be bandwidth related; this is easier to fix; for example by changing the serializer. protobuf-net allows you to do this easily by simply implementing ISerializable and forwarding the two methods (one from the interface, plus the ctor) to ProtoBuf.Serializer - it then does all the work for you, and works with remoting. I can provide examples of this if you like.
Actually, protobuf-net may help with CPU usage too, as it is a much more CPU-efficient serializer.
Could you make MyComponent a class that will cache the values and only submit them when SaveCustomer() is called?
You can try compressing traffic. If not 100-times increase, you'll still gain some performance benefit
If you need the latest data (always see the real value), and the cost of getting the data each time dominates the runtime then you need to be radical.
How about changing polling to push. Rather than calling the remote side each time you need a value, have the remote push all changes and cache the latest values locally.
Local lookups (after the initial get) are always up to date with all remoting overhead being done in the background (on another thread). Just be careful about thread safety for non-atomic types.

Does HttpListener work well on Mono?

I'm looking to write a small web service to run on a small Linux box. I prefer to code in C#, so I'm looking to use Mono.
I don't want the overhead of running a full web server or Mono's version of ASP.NET. I'm thinking of having a single process with a thread dealing with each client connection. Shared memory between threads instead of a database.
I've read a little on Microsoft's version of HttpListener and how it works with the Http.sys driver. Alas, Mono's documentation on this class is just the automated class interface with no discussion of how it works under the hood. (Linux doesn't have Http.sys, so I imagine it's implemented substantially differently.)
Could anyone point me towards some resources discussing this module please?
Many thanks, Bill, billpg.com
(A little background to my question for the interested.)
Some time ago, I asked this question, interested in keeping a long conversation open with lots of back-and-forth. I had settled on designing my own ad-hoc protocol, but people I spoke to really wanted a REST interface, even at the cost of the "Okay, send your command now" signal.
So, I wondered about running ASP.NET on a Linux/Mono server, but stumbled upon HttpListener. This seemed ideal, as each "conversation" could run in a separate thread. The thread that calls HttpListener in a loop can look for which thread each incomming connection is for and pass the reference to that thread.
The alternative for an ASP.NET driven service, would be to have the ASPX code pick up the state from a database, and write back the new state when it finishes. Yes, it would work, but that's a lot of overhead.
Greetings,
The HttpListener class in Mono works without much of a problem. I think that the most significant difference between its usage in a MS environment and a Linux environment is that port 80 cannot be bound to without a root/su/sudo security. Other ports do not have this restriction. For instance if you specify the prefix: http://localhost:1234/ the HttpListener works as expected. However if you add the prefix http://localhost/, which you would expect to listen on port 80, it fails silently. If you explicitly attempt to bind to port 80 (http://localhost:80/) then you throw an exception. If you invoke your application as a super user or root, you can explicitly bind to port 80 (http://localhost:80/).
I have not yet explored the rest of the HttpListener members in enough detail to make any useful comments about how well it operates in a linux environment. However, if there is interest, I will continue to post my observations.
chickenSandwich
I am not sure why you want to look so deep into the hood. Even on Microsoft side, the documents about http.sys may not provide you really valuable information if you are using the .NET Framework.
To know if something works on Mono good enough, you are always supposed to download its VMware or VPC image, and test your applications on it.
http://www.go-mono.com/mono-downloads/download.html
Though Mono is much more mature than a few years ago, we cannot say it has been tested by enough real-world applications like Microsoft.NET. So please test out your applications and submit issues you find to Mono team.
Based on my experience, minor issues are fixed within only a few days, while for major issues it takes a longer time. But with Mono source code available, you can fix on your own or find out good workarounds most of the times.

Categories

Resources