According to my benchmark, creating nodes using
GraphClient.Create()
gives performance that leaves much to be desired.
I've got about 10 empty nodes per second on my machine (Core i3, 8 GB RAM).
Even when I use multithreading, the time per Create() call increases roughly linearly (~N times with N threads), so overall throughput does not improve.
I've tested both the stable 1.9.2 and 2.0.0-M04. The results are exactly the same.
Does anybody know what's wrong?
EDIT: I tried the neo4j REST API directly and got similar results: ~20 empty nodes per second, and multithreading again gives no benefit.
EDIT 2: At the same time, the batch REST API, which allows batch creation, provides much better performance: about 250 nodes per second. It looks like there is incredibly big overhead in handling a single request...
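For reference, the batch endpoint gets its speedup by carrying many operations in one HTTP request. A minimal payload POSTed to /db/data/batch might look like this (shape per the neo4j 1.9-era REST docs; verify the endpoint and job format against your server version):

```json
[
  { "method": "POST", "to": "/node", "body": {}, "id": 0 },
  { "method": "POST", "to": "/node", "body": {}, "id": 1 },
  { "method": "POST", "to": "/node", "body": {}, "id": 2 }
]
```

One round-trip creates all three nodes, which is how the per-request overhead gets amortized.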
The poor performance is caused by overhead in processing a RESTful Cypher query. Mostly it is network overhead, but there is also overhead from parsing the query.
Use the Core Java API when you are interested in high performance. The Core Java API processes requests more than 10 times faster than the Cypher query language.
See these articles:
Performance of Graph Query Languages
Get the full neo4j power by using the Core Java API for traversing
your Graph data base instead of Cypher Query Language
The neo4jclient itself uses the REST API, so you're already limited in performance (by bandwidth, network latency etc) when compared to a direct API call (for which you'd need Java).
What performance are you after?
What code are you running?
Some initial thoughts & tests to try:
Obviously there are things like CPU load which will cause some throttling. Some things to consider:
Is the Neo4J server on the same machine?
Have you tried your application not through Visual Studio? (i.e. no debugging)
In my test code (below), I get 10 entries in ~200ms - can you try this code in a simple console app and see what you get?
private static void Main()
{
var client = new GraphClient(new Uri("http://localhost.:7474/db/data"));
client.Connect();
for (int i = 0; i < 10; i++)
CreateEmptyNodes(10, client);
}
private static void CreateEmptyNodes(int numberToCreate, IGraphClient client)
{
var start = DateTime.Now;
for (int i = 0; i < numberToCreate; i++)
client.Create(new object());
var timeTaken = DateTime.Now - start;
Console.WriteLine("For {0} items, I took: {1}ms", numberToCreate, timeTaken.TotalMilliseconds);
}
EDIT:
This is a raw HttpClient approach to calling the 'Create', which I believe is analogous to what neo4jclient is doing under the hood:
private async static void StraightHttpClient(int iterations, int amount)
{
var client = new HttpClient {BaseAddress = new Uri("http://localhost.:7474/db/data/")};
for (int j = 0; j < iterations; j++)
{
DateTime start = DateTime.Now;
for (int i = 0; i < amount; i++)
{
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Post, "cypher/") { Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json") });
if(response.StatusCode != HttpStatusCode.OK)
Console.WriteLine("Not ok");
}
TimeSpan timeTaken = DateTime.Now - start;
Console.WriteLine("took {0}ms", timeTaken.TotalMilliseconds);
}
}
Now, if you didn't care about the response, you could just call Client.SendAsync(..) without the await, and that gets you to a spiffy ~2500 per second. However, the big issue is that you haven't necessarily sent any of those creates; you've basically queued them. Shut down your program straight after, and chances are you'll have either no entries or a very small number.
So clearly the code can handle firing a few thousand calls a second with no problems (I've done a similar test to the above using ServiceStack and RestSharp; both take similar times to the HttpClient).
What it can't do is send those to the actual server at the same rate, so we're limited by the Windows HTTP stack and/or how fast Neo4j can process the request and supply a response.
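To illustrate the difference between fire-and-forget and actually completing the requests, here is a sketch (same hypothetical endpoint and query as the example above) that issues the POSTs concurrently but still awaits them all, so you know every create reached the server before the process exits:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ConcurrentCreates
{
    // The same JSON body the raw HttpClient example above posts to the cypher endpoint.
    public static StringContent CreatePayload()
    {
        return new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json");
    }

    static void Main()
    {
        RunAsync().Wait();
    }

    static async Task RunAsync()
    {
        using (var client = new HttpClient { BaseAddress = new Uri("http://localhost:7474/db/data/") })
        {
            var sw = Stopwatch.StartNew();
            // Fire 100 creates concurrently, keeping the Task handles...
            var tasks = Enumerable.Range(0, 100)
                .Select(_ => client.PostAsync("cypher/", CreatePayload()))
                .ToList();
            // ...and await them all, so every create has actually completed
            // before we report a number or shut the process down.
            var responses = await Task.WhenAll(tasks);
            Console.WriteLine("{0} creates completed in {1}ms", responses.Length, sw.ElapsedMilliseconds);
        }
    }
}
```

This sits between the sequential awaited loop (~10-20/s) and pure fire-and-forget (~2500/s queued, not sent): requests overlap on the wire, but none are silently dropped.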
We require that data is highly correct, more so than 100% uptime (I recognize this may mean that OpenTelemetry is not the best choice, but I would still like to know if it's possible).
We are exporting to elastic using APM.
I have noticed 2 significant issues.
Issue 1: I provide no/incorrect bearer token. No errors (or traces) are recorded. Silent failure.
Issue 2: I try to write a huge number of traces (100k) as fast as possible. About 2k make it and the rest are discarded.
var tracerProvider = Sdk.CreateTracerProviderBuilder()
.SetSampler(new AlwaysOnSampler())
.AddSource("MyCompany.MyProduct.MyLibrary")
.AddOtlpExporter(o =>
{
o.ExportProcessorType = ExportProcessorType.Batch;
o.TimeoutMilliseconds = 100 * 1000;
o.Protocol = OtlpExportProtocol.Grpc;
o.Endpoint = new Uri("https://somepath:443");
o.Headers = "Authorization=Bearer token1";
})
//.AddConsoleExporter()
.Build();
Task.Run(() =>
{
for (int i = 0; i < 100000; i++)
{
using (var activity = MyActivitySource.StartActivity("SayHello"))
{
activity?.SetTag("foo", 1);
}
}
});
Console.WriteLine("Done");
Console.ReadLine();
I would start an OTel Collector locally, point your app code at the local collector, and let the local collector export traces to Elastic APM.
Configure local otel collector:
1.) Enable telemetry metrics and monitor failed metrics of used OTLP exporter, e.g. https://grafana.com/grafana/dashboards/15983
2.) Use aggressive batching - a batch processor before exporting to Elastic APM, e.g.
processors:
batch:
send_batch_size: 10000
timeout: 5s
send_batch_max_size: 0
The collector will decrease the ingestion rate to Elastic APM thanks to batching. Of course you need to test it first, because some trace backend implementations use defaults - e.g. gRPC for Go has a default 4MB gRPC message size, and 10k traces in one message will very likely exceed this limit. Very often these infrastructure/app limits are not documented, so stress testing before production is highly recommended. Keep in mind that batching will require additional memory, so monitor memory usage as well.
I would customize retry and queue behaviour.
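As a sketch, that retry and queue customization might look like this on the collector's OTLP exporter (the exporter name and the numbers are placeholders to tune; retry_on_failure and sending_queue are standard collector exporter options):

```yaml
exporters:
  otlp/elastic:
    endpoint: "https://somepath:443"
    headers:
      Authorization: "Bearer token1"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```

On the SDK side, note that the batch span processor's default queue size is 2048 spans, and spans beyond that are silently dropped (which lines up with the ~2k that make it in Issue 2), so raising o.BatchExportProcessorOptions.MaxQueueSize and calling tracerProvider.ForceFlush() before shutdown should also help, independently of the collector.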
I have a Redis database on a CentOS server, and 3 Windows servers are connected to it with approximately 1,000 reads/writes per second. All of them are on the same local LAN, so the ping time is less than one millisecond.
The problem is that at least 5 percent of read operations time out, while I read at most 3KB of data per operation with 'syncTimeout=15', which is much more than the network latency.
I installed Redis on Bash on my Windows 10 machine and simulated the problem. I also stopped the write operations. However, the problem still exists with 0.5 percent timeouts, while there is no network latency.
I also used a CentOS server in my LAN to simulate the problem; in this case, I need at least 100 milliseconds for 'syncTimeout' to be sure the number of timeouts is less than 1 percent.
I considered using some Dictionaries to cache data from Redis, so there is no need to make a request per item and I can take advantage of pipelining. Then I came across StackRedis.L1, which is developed as an L1 cache for Redis, but I am not confident about how it keeps the L1 cache updated.
This is my code to simulate the problem:
var connectionMulti = ConnectionMultiplexer.Connect(
"127.0.0.1:6379,127.0.0.1:6380,allowAdmin=true,syncTimeout=15");
// 100,000 keys
var testKeys = File.ReadAllLines("D:\\RedisTestKeys.txt");
for (var i = 0; i < 3; i++)
{
var safeI = i;
Task.Factory.StartNew(() =>
{
var serverName = $"server {safeI + 1}";
var stringDatabase = connectionMulti.GetDatabase(12);
PerformanceTest($"{serverName} -> String: ",
key => stringDatabase.StringGet(key), testKeys);
});
}
and the PerformanceTest method is:
private static void PerformanceTest(string testName, Func<string, RedisValue> valueExtractor,
IList<string> keys)
{
Task.Factory.StartNew(() =>
{
Console.WriteLine($"Starting {testName} ...");
var timeouts = 0;
var errors = 0;
long totalElapsedMilliseconds = 0;
var stopwatch = new Stopwatch();
foreach (var key in keys)
{
var redisValue = new RedisValue();
stopwatch.Restart();
try
{
redisValue = valueExtractor(key);
}
catch (Exception e)
{
if (e is TimeoutException)
timeouts++;
else
errors++;
}
finally
{
stopwatch.Stop();
totalElapsedMilliseconds += stopwatch.ElapsedMilliseconds;
lock (FileLocker)
{
File.AppendAllLines("D:\\TestResult.csv",
new[]
{
$"{stopwatch.ElapsedMilliseconds.ToString()},{redisValue.Length()},{key}"
});
}
}
}
Console.WriteLine(
$"{testName} {totalElapsedMilliseconds * 1.0 / keys.Count} (errors: {errors}), (timeouts: {timeouts})");
});
}
I expect all read operations to complete successfully in less than 15 milliseconds.
To achieve this, is an L1 cache for a Redis cache a good solution? (It is very fast, on the scale of nanoseconds, but how do I keep it synchronized?)
Or can Redis be enhanced by clustering or something else? (I tested it on Bash on my PC and did not get the expected result.)
Or can Redis be enhanced by clustering or something else?
Redis can be clustered, in different ways:
"regular" redis can be replicated to secondary read-only nodes, on the same machine or different machines; you can then send "read" traffic to some of the replicas
redis "cluster" exists, which allows you to split (shard) the keyspace over multiple primaries, sending appropriate requests to each node
redis "cluster" can also make use of readonly replicas of the sharded nodes
Whether that is appropriate or useful is contextual and needs local knowledge and testing.
To achieve this, is an L1 cache for a Redis cache a good solution?
Yes, it is a good solution. A request you don't make is much faster (and has much less impact on the server) than a request you do make. There are tools for helping with cache invalidation, including using the pub/sub API for invalidations. Redis vNext is also looking into additional knowledge APIs specifically for this kind of L1 scenario.
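For the synchronization question, here is a minimal sketch of the pub/sub invalidation idea with StackExchange.Redis (the channel name and cache shape are my own illustrative choices, not how StackRedis.L1 implements it):

```csharp
using System.Collections.Concurrent;
using StackExchange.Redis;

class L1Cache
{
    readonly ConcurrentDictionary<string, string> _local = new ConcurrentDictionary<string, string>();
    readonly IDatabase _db;
    readonly ISubscriber _sub;
    const string InvalidationChannel = "cache-invalidate"; // hypothetical channel name

    public L1Cache(ConnectionMultiplexer mux)
    {
        _db = mux.GetDatabase();
        _sub = mux.GetSubscriber();
        // Every writer publishes the key it changed; every subscriber evicts it locally.
        _sub.Subscribe(InvalidationChannel, (channel, key) =>
        {
            string ignored;
            _local.TryRemove((string)key, out ignored);
        });
    }

    public string Get(string key)
    {
        string value;
        if (_local.TryGetValue(key, out value))
            return value; // L1 hit: no network round-trip at all
        value = _db.StringGet(key);
        _local[key] = value;
        return value;
    }

    public void Set(string key, string value)
    {
        _db.StringSet(key, value);
        _sub.Publish(InvalidationChannel, key); // tell other nodes to drop their stale copy
        _local[key] = value;
    }
}
```

Note the sketch is deliberately naive: a writer also receives its own invalidation message and may evict the value it just cached, and pub/sub delivery is fire-and-forget. Real implementations version entries or use dedicated invalidation support, so treat this as the shape of the idea rather than production code.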
I am a developer who has no load test experience and would like to learn how to do this.
I have a simple client server application where the client sends a request to the server and the server sends a response back.
I would like to load test this but I am not sure how to do this. Here is my GetResponse method which receives a response from the server.
Response GetResponse(Request request)
{
string data = Newtonsoft.Json.JsonConvert.SerializeObject(request);
System.Net.WebClient wb = new System.Net.WebClient();
string response = wb.UploadString("http://localhost:8080", data);
return Newtonsoft.Json.JsonConvert.DeserializeObject<Response>(response);
}
My initial thoughts are to write a routine to send a load of GetResponse requests all at the same time and then monitor the CPU ticks or other counters to see how it is performing.
Can anyone let me know if this is the correct way to go about it? I am also not really sure what the best stats to gather are?
Thanks in advance
EDIT....
Whilst waiting for an answer I have written the following, which adds a new thread and processes the requests as desired. Please can you comment on whether this is sufficient to see what I need, or do I need a proper load testing tool?
DateTime startTime;
DateTime endTime;
Console.WriteLine("Test how many concurrent users?");
string users = Console.ReadLine();
int usersCount;
if (int.TryParse(users, out usersCount) && usersCount > 0)
{
startTime = DateTime.Now;
_countDown = new CountdownEvent(usersCount);
for (var i = 0; i < usersCount; i++)
{
string userName = string.Format("user{0}", i);
Task.Factory.StartNew(() => TestRun(userName));
}
_countDown.Wait();
endTime = DateTime.Now;
Console.WriteLine("All tasks are completed!");
Console.WriteLine(string.Format("Av time(ms) per user: {0}", (endTime - startTime).TotalMilliseconds / usersCount));
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
public static void TestRun(object userName)
{
Thread newThread = new Thread(DoWork);
newThread.Start(userName);
}
public static void DoWork(object userName)
{
LoadTest.Test(userName.ToString());
_countDown.Signal();
}
First of all you need a load testing tool. If your Visual Studio license allows it, the most straightforward option would be using the MS VS Load Testing Framework. If you don't have the Web and Load test types, there are a number of free and open source load testing tools.
Next, create the test itself. Being a developer, you should know how to construct an HTTP request. If not, most load testing tools offer record-and-replay functionality.
Once you get the load test working you can start ramping up the number of virtual users and keep an eye on associated metrics and KPIs, e.g.:
Number of concurrent users vs response time
Number of concurrent users vs throughput
Transactions per second
Server hits per second
Response Time:
Average
Median
90/95/99 Percentile
The maximum number of concurrent users / requests per second your application is able to serve without errors and with reasonable response times
Whether the application, if overloaded to the point of not responding, returns to normal operating mode when the load decreases
Analysing the above metrics you can:
determine the maximum performance and capacity of your application
identify the bottleneck and work around it if possible
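To make the response-time metrics above concrete, here is a small self-contained helper (nearest-rank percentile; the class and method names are mine) that turns raw per-request timings into the average/median/percentile figures worth tracking:

```csharp
using System;
using System.Linq;

static class LatencyStats
{
    // Nearest-rank percentile over an ascending-sorted array of latencies (ms).
    public static double Percentile(double[] sorted, double p)
    {
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Max(rank - 1, 0)];
    }

    static void Main()
    {
        // Example per-request latencies, e.g. collected in your TestRun loop.
        var latencies = new double[] { 12, 15, 11, 90, 14, 13, 250, 16, 12, 14 };
        var sorted = latencies.OrderBy(x => x).ToArray();
        Console.WriteLine("Average: {0}ms", latencies.Average()); // 44.7ms
        Console.WriteLine("Median:  {0}ms", Percentile(sorted, 50)); // 14ms
        Console.WriteLine("95th:    {0}ms", Percentile(sorted, 95)); // 250ms
        Console.WriteLine("99th:    {0}ms", Percentile(sorted, 99)); // 250ms
    }
}
```

Note how two slow outliers barely move the median but dominate the high percentiles; that is exactly why average alone is a poor load-test statistic.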
I noticed that the methods I use in my self-hosted WCF applications are slow.
I wrote a test method:
public int MeasureTime()
{
int begin = Environment.TickCount;
for (int i = 0; i < 1000000; ++i) Console.WriteLine(i);
int end = Environment.TickCount;
return end-begin;
}
This method returns a value of about 300 ms. If I measure it in my localhost client app, I get a time of about 600 ms. Is it normal that I have such big delays?
I use basicHttpBinding.
What does this have to do with WCF? I don't see any WCF code here.
You are measuring the time it takes to write a huge ton of text to the console. You aren't measuring WCF time.
I don't understand what you are aiming at. If you want to measure WCF round-trip time don't make the method do other stuff for 1e6 iterations.
A WCF call takes as long as the CPU needs to process it, plus a network round-trip. Something like ~1ms.
I need an Http request that I can use in .Net which takes under 100 ms. I'm able to achieve this in my browser so I really don't see why this is such a problem in code.
I've tried WinHTTP as well as WebRequest.Create and both of them are over 500ms which isn't acceptable for my use case.
Here are examples of the simple test I'm trying to pass. (WinHttpFetcher is a simple wrapper I wrote but it does the most trivial example of a Get Request that I'm not sure it's worth pasting.)
I'm getting acceptable results with LibCurlNet but if there are simultaneous usages of the class I get an access violation. Also since it's not managed code and has to be copied to bin directory, it's not ideal to deploy with my open source project.
Any ideas of another implementation to try?
[Test]
public void WinHttp_Should_Get_Html_Quickly()
{
var fetcher = new WinHttpFetcher();
var startTime = DateTime.Now;
var result = fetcher.Fetch(new Uri("http://localhost"));
var endTime = DateTime.Now;
Assert.Less((endTime - startTime).TotalMilliseconds, 100);
}
[Test]
public void WebRequest_Should_Get_Html_Quickly()
{
var startTime = DateTime.Now;
var req = (HttpWebRequest) WebRequest.Create("http://localhost");
var response = req.GetResponse();
var endTime = DateTime.Now;
Assert.Less((endTime - startTime).TotalMilliseconds, 100);
}
When benchmarking, it is best to discard at least the first two timings as they are likely to skew the results:
Timing 1: Dominated by JIT overhead i.e. the process of turning byte code into native code.
Timing 2: A possible optimization pass for the JIT'd code.
Timings after this will reflect repeat performance much better.
The following is an example of a test harness that will automatically disregard JIT and optimization passes, and run a test a set number of iterations before taking an average to assert performance. As you can see the JIT pass takes a substantial amount of time.
JIT: 410.79ms
Optimize: 0.98ms
Average over 10 iterations: 0.38ms
Code:
[Test]
public void WebRequest_Should_Get_Html_Quickly()
{
const int TestIterations = 10;
const int MaxMilliseconds = 100;
Action test = () =>
{
WebRequest.Create("http://localhost/iisstart.htm").GetResponse();
};
AssertTimedTest(TestIterations, MaxMilliseconds, test);
}
private static void AssertTimedTest(int iterations, int maxMs, Action test)
{
double jit = Execute(test); //disregard jit pass
Console.WriteLine("JIT:{0:F2}ms.", jit);
double optimize = Execute(test); //disregard optimize pass
Console.WriteLine("Optimize:{0:F2}ms.", optimize);
double totalElapsed = 0;
for (int i = 0; i < iterations; i++) totalElapsed += Execute(test);
double averageMs = (totalElapsed / iterations);
Console.WriteLine("Average:{0:F2}ms.", averageMs);
Assert.Less(averageMs, maxMs, "Average elapsed test time.");
}
private static double Execute(Action action)
{
Stopwatch stopwatch = Stopwatch.StartNew();
action();
return stopwatch.Elapsed.TotalMilliseconds;
}
Use the Stopwatch class to get accurate timings.
Then, make sure you're not seeing the results of un-optimized code or JIT compilation by running your timing test several times in a Release build. Discard the first few calls to remove the impact of JIT, and then take the mean timing of the rest.
VS.NET has the ability to measure performance, and you might also want to use something like Fiddler to see how much time you're spending "on the wire" and sanity check that it's not your IIS/web server causing the delays.
500ms is a very long time, and it's possible to be in the 10s of ms with these classes, so don't give up hope (yet).
Update #1:
This is a great article that talks about micro benchmarking and what's needed to avoid seeing things like JIT:
http://blogs.msdn.com/b/vancem/archive/2009/02/06/measureit-update-tool-for-doing-microbenchmarks.aspx
You're not quite micro-benchmarking, but there are lots of best practices in here.
Update #2:
So, I wrote this console app (using VS.NET 2010)...
class Program
{
static void Main(string[] args)
{
var stopwatch = Stopwatch.StartNew();
var req = (HttpWebRequest)WebRequest.Create("http://localhost");
var response = req.GetResponse();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
}
... and Ctrl-F5'd it. It was compiled as debug, but I ran it without debugging, and I got 63ms. I'm running this on my Windows 7 laptop, and so http://localhost brings back the default IIS7 home page. Running it again I get similar times.
Running a Release build gives times in the 50ms to 55ms range.
This is the order of magnitude I'd expect. Clearly, if your website is performing an ASP.NET recompile, or recycling the app pool, or doing lots of back-end processing, then your timings will differ. If your markup is massive it will also differ, but none of the classes you're using client side should be the rate-limiting step here. It'll be the network hop and/or the remote app's processing.
Try setting the Proxy property of the HttpWebRequest instance to null.
If that works, then try setting it to GlobalProxySelection.GetEmptyWebProxy(), which seems to be more correct.
You can read more about it here:
- WebRequest slow?: http://holyhoehle.wordpress.com/2010/01/12/webrequest-slow/
Update 2018: Pulling this up from the comments.
System.Net.GlobalProxySelection is obsolete: "This class has been deprecated. Please use WebRequest.DefaultWebProxy instead to access and set the global default proxy." Use null instead of GetEmptyWebProxy(). – jirarium Jul 22 '17
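Putting the advice together, a minimal sketch (the URL is a placeholder; the commented line shows the older GlobalProxySelection form for pre-.NET 4 frameworks):

```csharp
using System;
using System.Net;

class ProxyFix
{
    static void Main()
    {
        var req = (HttpWebRequest)WebRequest.Create("http://localhost");
        // Setting Proxy to null skips automatic proxy detection, which can add
        // hundreds of milliseconds to the first request on some machines.
        req.Proxy = null;
        // On pre-.NET 4 frameworks the equivalent was:
        // req.Proxy = GlobalProxySelection.GetEmptyWebProxy();
        Console.WriteLine("Proxy disabled: " + (req.Proxy == null));
    }
}
```

If this removes the ~500ms, the delay was proxy auto-detection rather than the HTTP classes themselves.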