Improve performance of Redis to reduce timeout exceptions

I have a Redis database on a CentOS server, and 3 Windows servers connect to it with approximately 1,000 reads/writes per second. All of them are on the same LAN, so the ping time is less than one millisecond.
The problem is that at least 5 percent of read operations time out, even though each read returns at most 3 KB of data and 'syncTimeout=15' is much longer than the network latency.
I installed Redis on Bash on Windows 10 on my PC and reproduced the problem there. I also stopped all write operations. However, the problem still exists, with 0.5 percent timeouts, even though there is no network latency.
I also used a CentOS server on my LAN to reproduce the problem; in this case I need at least 100 milliseconds for 'syncTimeout' to keep timeouts below 1 percent.
I considered using some dictionaries to cache data from Redis, so there is no need to make a request per item and I can take advantage of pipelining. But then I came across StackRedis.L1, which is developed as an L1 cache for Redis, and I am not confident about how it keeps the L1 cache up to date.
This is my code to simulate the problem:
var connectionMulti = ConnectionMultiplexer.Connect(
"127.0.0.1:6379,127.0.0.1:6380,allowAdmin=true,syncTimeout=15");
// 100,000 keys
var testKeys = File.ReadAllLines("D:\\RedisTestKeys.txt");
for (var i = 0; i < 3; i++)
{
var safeI = i;
Task.Factory.StartNew(() =>
{
var serverName = $"server {safeI + 1}";
var stringDatabase = connectionMulti.GetDatabase(12);
PerformanceTest($"{serverName} -> String: ",
key => stringDatabase.StringGet(key), testKeys);
});
}
and the PerformanceTest method is:
private static void PerformanceTest(string testName, Func<string, RedisValue> valueExtractor,
IList<string> keys)
{
Task.Factory.StartNew(() =>
{
Console.WriteLine($"Starting {testName} ...");
var timeouts = 0;
var errors = 0;
long totalElapsedMilliseconds = 0;
var stopwatch = new Stopwatch();
foreach (var key in keys)
{
var redisValue = new RedisValue();
stopwatch.Restart();
try
{
redisValue = valueExtractor(key);
}
catch (Exception e)
{
if (e is TimeoutException)
timeouts++;
else
errors++;
}
finally
{
stopwatch.Stop();
totalElapsedMilliseconds += stopwatch.ElapsedMilliseconds;
lock (FileLocker)
{
File.AppendAllLines("D:\\TestResult.csv",
new[]
{
$"{stopwatch.ElapsedMilliseconds.ToString()},{redisValue.Length()},{key}"
});
}
}
}
Console.WriteLine(
$"{testName} {totalElapsedMilliseconds * 1.0 / keys.Count} (errors: {errors}), (timeouts: {timeouts})");
});
}
I expect every read operation to complete successfully in less than 15 milliseconds.
To achieve this, is adding an L1 cache in front of Redis a good solution? (It is very fast, on the scale of nanoseconds, but how would I keep it synchronized?)
Or can Redis be improved by clustering or something else? (I tested it with Redis on Bash on my own PC, and I still did not get the expected result.)

Or can Redis be improved by clustering or something else?
Redis can be clustered, in different ways:
"regular" redis can be replicated to secondary read-only nodes, on the same machine or different machines; you can then send "read" traffic to some of the replicas
redis "cluster" exists, which allows you to split (shard) the keyspace over multiple primaries, sending appropriate requests to each node
redis "cluster" can also make use of readonly replicas of the sharded nodes
Whether that is appropriate or useful is contextual and needs local knowledge and testing.
To achieve this, is adding an L1 cache in front of Redis a good solution?
Yes, it is a good solution. A request you don't make is much faster (and has much less impact on the server) than a request you do make. There are tools to help with cache invalidation, including using the pub/sub API for invalidations. Redis vNext is also looking into additional knowledge APIs specifically for this kind of L1 scenario.
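A minimal sketch of the L1 idea, assuming a fixed time-to-live plus explicit invalidation. The class name L1Cache and its loader/clock parameters are hypothetical and are not part of StackExchange.Redis or StackRedis.L1; the loader would wrap your existing call (e.g. the stringDatabase.StringGet call in the question), and a pub/sub handler on a channel of your choosing would call Invalidate for each key it is told has changed.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Minimal in-process L1 cache in front of a slower store such as Redis.
// Assumptions: entries expire after a fixed TTL, and whatever learns that a key
// changed (e.g. a pub/sub subscriber) calls Invalidate for that key.
public sealed class L1Cache<TValue>
{
    private readonly ConcurrentDictionary<string, (TValue Value, DateTime ExpiresUtc)> _entries = new();
    private readonly Func<string, TValue> _loader;  // fetches from the backing store on a miss
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _utcNow;        // injectable clock, handy for tests

    public L1Cache(Func<string, TValue> loader, TimeSpan ttl, Func<DateTime>? utcNow = null)
    {
        _loader = loader;
        _ttl = ttl;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public TValue Get(string key)
    {
        var now = _utcNow();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresUtc > now)
            return entry.Value;                     // local hit: no network round trip

        var value = _loader(key);                   // miss or expired: ask the backing store
        _entries[key] = (value, now + _ttl);
        return value;
    }

    // Removes one key so the next Get reloads it from the backing store.
    public void Invalidate(string key) => _entries.TryRemove(key, out _);
}
```

The TTL bounds how stale a value can get if an invalidation message is missed. A reload that races with an invalidation can still reinsert an old value, which a production version would need to handle.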

Related

Configure OpenTelemetry to fail when sending traces to an exporter fails

We require that data is highly correct, more so than 100% uptime (I recognize this may mean that OpenTelemetry is not the best choice, but I would still like to know if it's possible).
We are exporting to Elastic using APM.
I have noticed 2 significant issues.
Issue 1: I provide no/incorrect bearer token. No errors (or traces) are recorded. Silent failure.
Issue 2: I try to write a huge number of traces (100k) as fast as possible. About 2k make it and the rest are discarded.
var tracerProvider = Sdk.CreateTracerProviderBuilder()
.SetSampler(new AlwaysOnSampler())
.AddSource("MyCompany.MyProduct.MyLibrary")
.AddOtlpExporter(o =>
{
o.ExportProcessorType = ExportProcessorType.Batch;
o.TimeoutMilliseconds = 100 * 1000;
o.Protocol = OtlpExportProtocol.Grpc;
o.Endpoint = new Uri("https://somepath:443");
o.Headers = "Authorization=Bearer token1";
})
//.AddConsoleExporter()
.Build();
Task.Run(() =>
{
for (int i = 0; i < 100000; i++)
{
using (var activity = MyActivitySource.StartActivity("SayHello"))
{
activity?.SetTag("foo", 1);
}
}
});
Console.WriteLine("Done");
Console.ReadLine();

I would run an OTel Collector locally, point your app code at the local collector, and let the local collector export traces to Elastic APM.
Configure the local OTel Collector:
1.) Enable the Collector's telemetry metrics and monitor the failure metrics of the OTLP exporter you use, e.g. https://grafana.com/grafana/dashboards/15983
2.) Use aggressive batching, i.e. a batch processor before exporting to Elastic APM, e.g.
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
    send_batch_max_size: 0
The Collector will decrease the ingestion rate to Elastic APM thanks to batching. Of course, you need to test it first, because some trace backend implementations use defaults: e.g. gRPC for Go has a default 4 MB gRPC message size, and 10k traces in one message will very likely exceed this limit. These infrastructure/app limits are very often not documented, so stress testing before production is highly recommended. Keep in mind that batching requires additional memory, so monitor memory usage as well.
I would customize retry and queue behaviour.
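Batching and queueing trade memory for throughput, and any bounded queue has to decide what happens when it fills up. The hypothetical BoundedBatcher class below only illustrates that trade-off; it is not the Collector's or the .NET SDK's code. Items beyond maxQueueSize are rejected but counted, so the loss shows up in a metric instead of disappearing silently. The .NET SDK's batch export processor is also bounded (its queue defaults to 2048 items and can be changed through BatchExportProcessorOptions), which may be related to only about 2k of the 100k spans arriving in the question.

```csharp
using System;
using System.Collections.Generic;

// Illustration only (not OpenTelemetry's code): a bounded queue that hands out
// export batches and counts the items it had to drop, so loss is measurable.
public sealed class BoundedBatcher<T>
{
    private readonly Queue<T> _queue = new();
    private readonly object _gate = new();
    private readonly int _maxQueueSize;
    private readonly int _batchSize;
    private long _dropped;

    public BoundedBatcher(int maxQueueSize, int batchSize)
    {
        if (maxQueueSize <= 0 || batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxQueueSize), "Sizes must be positive.");
        _maxQueueSize = maxQueueSize;
        _batchSize = batchSize;
    }

    // Total number of items rejected because the queue was full.
    public long Dropped { get { lock (_gate) return _dropped; } }

    // Returns false, and counts a drop, when the queue is already full.
    public bool TryAdd(T item)
    {
        lock (_gate)
        {
            if (_queue.Count >= _maxQueueSize)
            {
                _dropped++;
                return false;
            }
            _queue.Enqueue(item);
            return true;
        }
    }

    // Removes up to batchSize items for a single export call.
    public List<T> TakeBatch()
    {
        lock (_gate)
        {
            var batch = new List<T>(Math.Min(_batchSize, _queue.Count));
            while (batch.Count < _batchSize && _queue.Count > 0)
                batch.Add(_queue.Dequeue());
            return batch;
        }
    }
}
```

A larger queue drops less under bursts but holds more in memory, which is the same trade-off the batching advice above warns about.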

ASP.NET Core Distributed Redis Cache: Disconnect

I'm using Redis as a distributed cache in an ASP.NET app.
It works until Redis server becomes unavailable and the question is:
How to properly handle disconnection issues?
Redis is configured this way (Startup.cs):
services.AddDistributedRedisCache(...)
Option AbortOnConnectFail is set to false
Injected in service via constructor:
...
private IDistributedCache _cache;
public MyService(IDistributedCache cache)
{
_cache = cache;
}
When Redis is down the following code throws an exception (StackExchange.Redis.RedisConnectionException: SocketFailure on 127.0.0.1:6379/Subscription ...):
var val = await _cache.GetAsync(key, cancellationToken);
I don't think that using reflection to inspect a connection state inside _cache object is a good way. So are there any 'right' options to handle it?

Maybe you can check out the Polly project. It has Retry/WaitAndRetry/RetryForever policies and circuit breakers that can be handy, so you can catch that RedisConnectionException and then retry or fall back to another method.
There is also a plugin for the Microsoft DistributedCache provider.
Check it out.
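Polly provides ready-made circuit breakers; the sketch below is only a hand-rolled illustration of the idea, and the SimpleCircuitBreaker name and its parameters are hypothetical, not Polly's API. After a number of consecutive failures it stops calling Redis for a while and uses the fallback directly, so each request does not have to wait on a dead server.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Minimal circuit-breaker sketch (not Polly's API). After `failureThreshold`
// consecutive failures the circuit opens: calls skip the primary source
// (e.g. Redis) and go straight to the fallback (e.g. the database) until
// `openDuration` has passed, after which one trial call is let through.
// Not thread-safe as written; a shared instance would need locking.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _utcNow;
    private int _consecutiveFailures;
    private DateTime _openUntilUtc = DateTime.MinValue;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime>? utcNow = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> primary, Func<Task<T>> fallback)
    {
        if (_utcNow() < _openUntilUtc)
            return await fallback();              // circuit open: don't even try the primary

        try
        {
            var result = await primary();
            _consecutiveFailures = 0;             // success closes the circuit again
            return result;
        }
        catch (Exception)                         // real code would catch e.g. RedisConnectionException
        {
            if (++_consecutiveFailures >= _failureThreshold)
                _openUntilUtc = _utcNow() + _openDuration;
        }
        return await fallback();
    }
}
```

With IDistributedCache this could be called as breaker.ExecuteAsync(() => _cache.GetAsync(key, cancellationToken), loadFromDatabase), where loadFromDatabase is your own (hypothetical here) data-access delegate.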

First of all, why is your Redis server becoming unavailable? And for how long? You should minimize these kinds of situations. Do you use Redis as a service from AWS, e.g. ElastiCache? If so, you can configure it to promote a Redis slave/read-replica server to become the new master if the first master fails.
To improve fault tolerance and reduce write downtime, enable Multi-AZ with Automatic Failover for your Redis (cluster mode disabled) cluster with replicas. For more information, see Minimizing downtime in ElastiCache for Redis with Multi-AZ.
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/AutoFailover.html
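Apart from that, a fallback for an unresponsive Redis server is simply to retrieve the objects/entities that you are caching in Redis from the database when the Redis server is down. You can retry the Redis call two times with 5 seconds between each retry, and if the server is still down, just query the database. This results in a performance hit, but it is a better solution than throwing an error.
byte[] val = null;
const int maxAttempts = 3; // the first call plus two retries
for (var attempt = 1; attempt <= maxAttempts; attempt++)
{
try
{
val = await _cache.GetAsync(key, cancellationToken);
break; // Redis answered; a null here is an ordinary cache miss
}
catch (StackExchange.Redis.RedisConnectionException)
{
if (attempt < maxAttempts)
await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken); // wait before retrying
}
}
if (val == null)
{
// Redis is down or the key is missing: load from the database instead.
// LoadFromDatabaseAsync is a hypothetical helper standing in for your data access code.
val = await LoadFromDatabaseAsync(key, cancellationToken);
}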

Load test WebClient server

I am a developer who has no load test experience and would like to learn how to do this.
I have a simple client server application where the client sends a request to the server and the server sends a response back.
I would like to load test this but I am not sure how to do this. Here is my GetResponse method which receives a response from the server.
Response GetResponse(Request request)
{
string data = Newtonsoft.Json.JsonConvert.SerializeObject(request);
System.Net.WebClient wb = new System.Net.WebClient();
string response = wb.UploadString("http://localhost:8080", data);
return Newtonsoft.Json.JsonConvert.DeserializeObject<Response>(response);
}
My initial thoughts are to write a routine to send a load of get response requests all at the same time and then try and monitor the CPU ticks or other to see how it is performing.
Can anyone let me know if this is the correct way to go about it? I am also not really sure what the best stats to gather are?
Thanks in advance
EDIT....
Whilst waiting for an answer, I have written the following, which adds a new thread and processes the requests as desired. Please can you comment on whether this is sufficient to see what I need, or do I need a proper load testing tool?
static CountdownEvent _countDown; // completion counter shared with DoWork
static void Main()
{
DateTime startTime;
DateTime endTime;
Console.WriteLine("Test how many concurrent users?");
string users = Console.ReadLine();
int usersCount;
if (int.TryParse(users, out usersCount) && usersCount > 0)
{
startTime = DateTime.Now;
_countDown = new CountdownEvent(usersCount);
for (var i = 0; i < usersCount; i++)
{
string userName = string.Format("user{0}", i);
Task.Factory.StartNew(() => TestRun(userName));
}
_countDown.Wait();
endTime = DateTime.Now;
Console.WriteLine("All tasks are completed!");
Console.WriteLine(string.Format("Av time(ms) per user: {0}", (endTime - startTime).TotalMilliseconds / usersCount));
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
public static void TestRun(object userName)
{
Thread newThread = new Thread(DoWork);
newThread.Start(userName);
}
public static void DoWork(object userName)
{
LoadTest.Test(userName.ToString());
_countDown.Signal();
}

First of all, you need a load testing tool. If your Visual Studio license allows it, the most straightforward option would be the MS VS Load Testing Framework. If you don't have the Web and Load Test types, there are a number of free and open source load testing tools.
Creating the test itself: being a developer, you should know how to construct an HTTP request. If not, most load testing tools offer record-and-replay functionality.
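For a quick in-code experiment before adopting a proper tool, a small async driver can stand in for the one-thread-per-user approach in the question. This is a minimal sketch: MiniLoadTest, RunAsync and its parameters are hypothetical names, and sendRequest is a placeholder for your own call (for example, the request made in GetResponse). It caps the number of requests in flight with a SemaphoreSlim and records each request's latency, so the metrics listed below can be computed from real samples.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Minimal load-generator sketch: runs `totalRequests` calls of `sendRequest`
// with at most `concurrentUsers` in flight and records each call's latency.
public static class MiniLoadTest
{
    public sealed record Result(List<double> LatenciesMs, int Errors, TimeSpan Elapsed);

    public static async Task<Result> RunAsync(Func<Task> sendRequest, int totalRequests, int concurrentUsers)
    {
        var latencies = new ConcurrentBag<double>();
        var errors = 0;
        using var gate = new SemaphoreSlim(concurrentUsers);
        var total = Stopwatch.StartNew();

        var tasks = new List<Task>(totalRequests);
        for (var i = 0; i < totalRequests; i++)
        {
            await gate.WaitAsync();               // wait for a free "virtual user" slot
            tasks.Add(Task.Run(async () =>
            {
                var sw = Stopwatch.StartNew();
                try
                {
                    await sendRequest();
                    latencies.Add(sw.Elapsed.TotalMilliseconds);   // successful requests only
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref errors);
                }
                finally
                {
                    gate.Release();
                }
            }));
        }

        await Task.WhenAll(tasks);
        total.Stop();
        return new Result(new List<double>(latencies), errors, total.Elapsed);
    }
}
```

Running it repeatedly with a growing concurrentUsers value gives the "concurrent users vs response time/throughput" curves; the client machine itself can become the bottleneck, which a dedicated tool with several load generators avoids.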
Once you get the load test working, you can start ramping up the number of virtual users and keep an eye on the associated metrics and KPIs, e.g.:
Number of concurrent users vs response time
Number of concurrent users vs throughput
Transactions per second
Server hits per second
Response Time:
Average
Median
90/95/99 Percentile
What is the maximum number of concurrent users / requests per second your application is able to serve without errors and with a reasonable response time?
If the application is overloaded and stops responding, does it return to normal operating mode when the load decreases?
By analysing the above metrics you can:
determine the maximum performance and capacity of your application
identify the bottleneck and work around it if possible
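Averages hide the slow tail that users notice, which is why the percentiles in the list above matter. A minimal sketch of computing them from recorded latencies, assuming the nearest-rank definition of a percentile (other definitions interpolate between samples and give slightly different numbers); LatencyStats is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it (p in (0, 100]).
    public static double Percentile(IEnumerable<double> samples, double p)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        if (sorted.Length == 0) throw new ArgumentException("No samples.", nameof(samples));
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));
        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);  // 1-based rank
        return sorted[rank - 1];
    }

    // One-line report of average, median and tail percentiles (values in milliseconds).
    public static string Summary(IReadOnlyCollection<double> samplesMs) =>
        $"avg={samplesMs.Average():F1} median={Percentile(samplesMs, 50):F1} " +
        $"p90={Percentile(samplesMs, 90):F1} p95={Percentile(samplesMs, 95):F1} p99={Percentile(samplesMs, 99):F1}";
}
```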

neo4j REST API poor performance

According to my benchmark of creating nodes using
GraphClient.Create()
performance leaves much to be desired.
I've got about 10 empty nodes per second on my machine (Core i3, 8 GB RAM).
Even when I use multithreading, the time taken by each Create() call increases linearly (about N times longer with N threads), so overall throughput does not improve.
I've tested both the stable 1.9.2 and 2.0.0-M04. The results are exactly the same.
Does anybody know what's wrong?
EDIT: I tried to use neo4j REST API and I got similar results: ~ 20 empty nodes per second and multithreading also gives no benefits.
EDIT 2: At the same time, the Batch REST API, which allows batch creation, provides much better performance: about 250 nodes per second. It looks like there is an incredibly big overhead in handling a single request...

The poor performance is caused by the overhead of processing RESTful Cypher queries. Mostly it is network overhead, but there is also overhead from having to parse the query.
Use the Core Java API when you are interested in high performance. The Core Java API processes requests more than 10 times faster than the Cypher query language.
See these articles:
Performance of Graph Query Languages
Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language
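To see why batching helps so much, here is a rough back-of-the-envelope model. It assumes (an assumption, not something measured in the question) that each HTTP request costs a fixed overhead $o$ plus a per-node cost $c$, with $b$ nodes per request, and plugs in the REST figures from the question (about 20 single creates per second, about 250 nodes per second with the Batch API):

```latex
\begin{align*}
T(b) &= o + b\,c, \qquad \text{throughput}(b) = \frac{b}{o + b\,c} \\
\text{single creates } (b = 1):\quad o + c &\approx \tfrac{1}{20}\,\mathrm{s} = 50\,\mathrm{ms} \\
\text{batched } (b > 1):\quad \frac{o}{b} + c &\approx \tfrac{1}{250}\,\mathrm{s} = 4\,\mathrm{ms} \;\Rightarrow\; c < 4\,\mathrm{ms} \quad (\text{since } o > 0) \\
\Rightarrow\quad o &\approx 50\,\mathrm{ms} - c \gtrsim 46\,\mathrm{ms}
\end{align*}
```

Under this model, more than 90% of each single request is fixed per-request overhead, and throughput approaches $1/c$ as $b$ grows, which is consistent with the overhead explanation above.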

The neo4jclient itself uses the REST API, so you're already limited in performance (by bandwidth, network latency etc) when compared to a direct API call (for which you'd need Java).
What performance are you after?
What code are you running?
Some initial thoughts & tests to try:
Obviously there are things like CPU etc which will cause some throttling, some things to consider:
Is the Neo4J server on the same machine?
Have you tried your application not through Visual Studio? (i.e. no debugging)
In my test code (below), I get 10 entries in ~200ms - can you try this code in a simple console app and see what you get?
private static void Main()
{
var client = new GraphClient(new Uri("http://localhost.:7474/db/data"));
client.Connect();
for (int i = 0; i < 10; i++)
CreateEmptyNodes(10, client);
}
private static void CreateEmptyNodes(int numberToCreate, IGraphClient client)
{
var start = DateTime.Now;
for (int i = 0; i < numberToCreate; i++)
client.Create(new object());
var timeTaken = DateTime.Now - start;
Console.WriteLine("For {0} items, I took: {1}ms", numberToCreate, timeTaken.TotalMilliseconds);
}
EDIT:
This is a raw HttpClient approach to calling 'Create', which I believe is analogous to what neo4jclient does under the hood:
private static async Task StraightHttpClient(int iterations, int amount)
{
var client = new HttpClient {BaseAddress = new Uri("http://localhost.:7474/db/data/")};
for (int j = 0; j < iterations; j++)
{
DateTime start = DateTime.Now;
for (int i = 0; i < amount; i++)
{
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Post, "cypher/") { Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json") });
if(response.StatusCode != HttpStatusCode.OK)
Console.WriteLine("Not ok");
}
TimeSpan timeTaken = DateTime.Now - start;
Console.WriteLine("took {0}ms", timeTaken.TotalMilliseconds);
}
}
Now, if you didn't care about the response, you could just call client.SendAsync(..) without the await, and that gets you to a spiffy ~2500 per second. However, the big issue there is that you haven't necessarily sent any of those creates; you've basically queued them, so if you shut down your program straight after, chances are you'll have either no entries or a very small number.
So... clearly the code can handle firing x thousand calls a second with no problems (I've done a similar test to the above using ServiceStack and RestSharp, and both take similar times to HttpClient).
What it can't do is send those to the actual server at the same rate, so we're limited by the Windows HTTP stack and/or how fast Neo4j can process the request and supply a response.
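A middle ground between awaiting each call in turn and firing calls without awaiting them is to start the requests, keep the resulting tasks, and await them all before the program exits. This is a minimal sketch: FireThenAwait and CreateAllAsync are hypothetical names, and createNode is a placeholder for the real HTTP call (e.g. the SendAsync POST above) that reports whether the create succeeded. It does not make the server any faster; it only ensures that no queued create is silently lost at shutdown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch: start every request without awaiting it individually, keep the tasks,
// then await them all so nothing is abandoned when the program ends.
public static class FireThenAwait
{
    public static async Task<int> CreateAllAsync(Func<int, Task<bool>> createNode, int count)
    {
        var pending = new List<Task<bool>>(count);
        for (var i = 0; i < count; i++)
            pending.Add(createNode(i));                  // request is in flight, not yet confirmed

        bool[] results = await Task.WhenAll(pending);    // wait until every create has completed
        var succeeded = 0;
        foreach (var ok in results)
            if (ok) succeeded++;
        return succeeded;                                // number of creates that reported success
    }
}
```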

Load testing a website

I am currently writing a small application to load test a website and am having a few problems.
List<string> pageUrls = new List<string>();
// NOT SHOWN ... populate the pageUrls with thousands of links
var parallelOptions = new System.Threading.Tasks.ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 100;
System.Threading.Tasks.Parallel.ForEach(pageUrls, parallelOptions, pageUrl =>
{
var startedOn = DateTime.UtcNow;
var request = System.Net.HttpWebRequest.Create(pageUrl);
var responseTimeBefore = DateTime.UtcNow;
try
{
var response = (System.Net.HttpWebResponse)request.GetResponse();
responseCode = response.StatusCode.ToString();
response.Close();
}
catch (System.Net.WebException ex)
{
// NOT SHOWN ... write to the error log
}
var responseTimeAfter = DateTime.UtcNow;
var responseDuration = responseTimeAfter - responseTimeBefore;
// NOT SHOWN ... write the response duration out to a file
var endedOn = DateTime.UtcNow;
var threadDuration = endedOn - startedOn;
// sleep for one second
var oneSecond = new TimeSpan(0, 0, 1);
if (threadDuration < oneSecond)
{
System.Threading.Thread.Sleep(oneSecond - threadDuration);
}
}
);
When I set MaxDegreeOfParallelism to a low value such as 10, everything works fine and the responseDuration stays between 1 and 3 seconds. If I increase the value to 100 (as in the example), the responseDuration climbs quickly until, after around 300 requests, it has reached 25 seconds (and is still climbing).
I thought I might be doing something wrong, so I also ran Apache JMeter with the standard web test plan setup and set the users to 100. After about 300 samples, the response times had rocketed to around 40 seconds.
I'm skeptical that my server is reaching its limit. The task manager on the server shows that only 2GB of the 16GB is being used and the processor hangs around 5% effort.
Could I be hitting some limit on the number of simultaneous connections on my client computer? If so, how do I change this?
Am I forgetting to do something in my code? Clean-up/close connections?
Could it be that my code is OK and it is in fact my server that just can't handle the traffic?
For reference my client computer that is running the code above is running Windows 7 and is on the same network as the server I am testing. The server is running Windows Server 2008 IIS 7.5 and is a dedicated 8-core 16GB RAM machine.

MaxDegreeOfParallelism caps the number of concurrent operations, so it should be used only when you deliberately want to limit the number of cores used as part of your program's strategy.
By default, the Parallel library decides for itself how many threads to use, so setting this option to a fixed number will in most cases limit performance, depending on the environment running it.
I would suggest running this code without setting this option, which should improve performance.
See the remarks section of ParallelOptions.MaxDegreeOfParallelism Property on MSDN for more information.

Several suggestions:
How large is your recorded JMeter test script, and did you insert some think time? The larger the test, the heavier the load.
Make sure the LAN is not in use by competing traffic during test runs. A Gigabit Ethernet switch should be mandatory.
Do use 2-3 slave machines and avoid heavy results loggers in JMeter such as the results tree. You were right to minimize these graphs and results.
