Load testing a website

I am currently writing a small application to load test a website and am having a few problems.
List<string> pageUrls = new List<string>();
// NOT SHOWN ... populate the pageUrls with thousands of links
var parallelOptions = new System.Threading.Tasks.ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 100;
System.Threading.Tasks.Parallel.ForEach(pageUrls, parallelOptions, pageUrl =>
{
    var startedOn = DateTime.UtcNow;
    var request = System.Net.HttpWebRequest.Create(pageUrl);
    var responseTimeBefore = DateTime.UtcNow;
    try
    {
        var response = (System.Net.HttpWebResponse)request.GetResponse();
        responseCode = response.StatusCode.ToString();
        response.Close();
    }
    catch (System.Net.WebException ex)
    {
        // NOT SHOWN ... write to the error log
    }
    var responseTimeAfter = DateTime.UtcNow;
    var responseDuration = responseTimeAfter - responseTimeBefore;
    // NOT SHOWN ... write the response duration out to a file
    var endedOn = DateTime.UtcNow;
    var threadDuration = endedOn - startedOn;
    // pad each iteration so that it takes at least one second
    var oneSecond = new TimeSpan(0, 0, 1);
    if (threadDuration < oneSecond)
    {
        System.Threading.Thread.Sleep(oneSecond - threadDuration);
    }
}
);
When I set MaxDegreeOfParallelism to a low value such as 10, everything works fine and the responseDuration stays between 1 and 3 seconds. If I increase the value to 100 (as in the example), the responseDuration climbs quickly; after around 300 requests it has reached 25 seconds and is still climbing.
I thought I might be doing something wrong, so I also ran Apache JMeter with the standard web test plan and set the number of users to 100. After about 300 samples the response times had rocketed to around 40 seconds.
I'm skeptical that my server is reaching its limit. Task Manager on the server shows that only 2GB of the 16GB of RAM is in use and CPU usage hovers around 5%.
Could I be hitting some limit on the number of simultaneous connections on my client computer? If so, how do I change this?
Am I forgetting to do something in my code? Clean-up/close connections?
Could it be that my code is OK and it is in fact my server that just can't handle the traffic?
For reference my client computer that is running the code above is running Windows 7 and is on the same network as the server I am testing. The server is running Windows Server 2008 IIS 7.5 and is a dedicated 8-core 16GB RAM machine.

MaxDegreeOfParallelism caps the number of concurrent operations, so it should be set only when limiting how much of the machine your program uses is part of your strategy.
By default, the Parallel library uses as many threads as the underlying scheduler provides, so setting this option to a fixed number will mostly limit performance, depending on the environment running it.
I would suggest running this code without setting the option; that should improve performance.
See the Remarks section of the ParallelOptions.MaxDegreeOfParallelism property on MSDN for more information.
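To see what the option actually does, here is a minimal sketch (not part of the original answer; the `ParallelismProbe` class and the 20 ms sleep standing in for a web request are made up for illustration). It records how many loop bodies were running at the same moment. MaxDegreeOfParallelism is an upper bound on that number, and the default of -1 means no explicit limit.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismProbe
{
    // Runs `itemCount` dummy work items and returns the highest number of
    // loop bodies that were observed executing at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegreeOfParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        int running = 0;
        int peak = 0;

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, _ =>
        {
            int now = Interlocked.Increment(ref running);

            // Raise the high-water mark without taking a lock.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }

            Thread.Sleep(20); // hypothetical stand-in for a blocking request
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```

Calling `ParallelismProbe.MeasurePeakConcurrency(50, 4)` should never report more than 4. With -1 the result depends on the machine and on how quickly the thread pool adds threads.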

Several suggestions:
How large is your recorded JMeter test script, and did you insert some think time? The larger the test, the heavier the load.
Make sure the LAN is not carrying competing traffic during test runs. A Gigabit Ethernet switch should be mandatory.
Use 2-3 slave machines and avoid heavy result listeners in JMeter such as the results tree. You were right to minimize these graphs and results.

Configure OpenTelemetry to fail when sending traces to an exporter fails

We require that data is highly correct, more so than 100% uptime (I recognize this may mean that OpenTelemetry is not the best choice, but I would still like to know if it's possible).
We are exporting to Elastic using APM.
I have noticed 2 significant issues.
Issue 1: I provide no/incorrect bearer token. No errors (or traces) are recorded. Silent failure.
Issue 2: I try to write a huge number of traces (100k) as fast as possible. About 2k make it and the rest are discarded.
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetSampler(new AlwaysOnSampler())
    .AddSource("MyCompany.MyProduct.MyLibrary")
    .AddOtlpExporter(o =>
    {
        o.ExportProcessorType = ExportProcessorType.Batch;
        o.TimeoutMilliseconds = 100 * 1000;
        o.Protocol = OtlpExportProtocol.Grpc;
        o.Endpoint = new Uri("https://somepath:443");
        o.Headers = "Authorization=Bearer token1";
    })
    //.AddConsoleExporter()
    .Build();
Task.Run(() =>
{
    for (int i = 0; i < 100000; i++)
    {
        using (var activity = MyActivitySource.StartActivity("SayHello"))
        {
            activity?.SetTag("foo", 1);
        }
    }
});
Console.WriteLine("Done");
Console.ReadLine();

I would start an OpenTelemetry Collector locally, point your app at it, and let the local collector export traces to Elastic APM.
Configure the local collector:
1.) Enable telemetry metrics and monitor the failed-send metrics of the OTLP exporter in use, e.g. https://grafana.com/grafana/dashboards/15983
2.) Use aggressive batching with a batch processor before exporting to Elastic APM, e.g.
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
    send_batch_max_size: 0
Thanks to batching, the collector will decrease the ingestion rate to Elastic APM. Of course you need to test it first, because some trace backend implementations rely on defaults: for example, gRPC for Go has a default 4MB message size limit, and 10k traces in one message will very likely exceed it. Very often these infrastructure/app limits are not documented, so stress testing before production is highly recommended. Keep in mind that batching requires additional memory, so monitor memory usage as well.
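As a rough sanity check on the message size point, the arithmetic can be sketched like this. The `BatchSizing` class is hypothetical and the average span size is an assumption you would have to measure yourself. It treats the 4MB limit as 4 MiB and ignores per-message framing overhead.

```csharp
using System;

public static class BatchSizing
{
    // 4 MiB, taken here as the default gRPC message size limit mentioned above.
    public const long DefaultGrpcLimitBytes = 4L * 1024 * 1024;

    // Largest batch that stays under the limit for a given average span size.
    public static long MaxSpansPerMessage(long limitBytes, long averageSpanBytes)
    {
        if (averageSpanBytes <= 0) throw new ArgumentOutOfRangeException(nameof(averageSpanBytes));
        return limitBytes / averageSpanBytes;
    }

    // Bytes available per span if a whole batch must fit into one message.
    public static double BytesPerSpanBudget(long limitBytes, long batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return (double)limitBytes / batchSize;
    }
}
```

With `send_batch_size: 10000`, `BytesPerSpanBudget(DefaultGrpcLimitBytes, 10000)` comes to roughly 419 bytes per span. A span with a handful of attributes can easily be larger than that, which is why a batch of 10k is likely to hit the limit.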
I would customize retry and queue behaviour.

Improve performance of Redis to reduce timeout exceptions

I have a Redis database on a CentOS server, and 3 Windows servers are connected to it with approximately 1,000 reads/writes per second. All of them are on the same LAN, so the ping time is less than one millisecond.
The problem is that at least 5 percent of read operations time out, even though I read at most 3KB of data per operation with 'syncTimeout=15', which is far more than the network latency.
I installed Redis on Bash on my Windows 10 machine and simulated the problem. I also stopped the write operations. However, the problem still exists with 0.5 percent timeouts, even though there is no network latency.
I also used a CentOS server in my LAN to simulate the problem; in this case, I need at least 100 milliseconds for 'syncTimeout' to keep the share of timeouts below 1 percent.
I considered using some Dictionaries to cache data from Redis, so that there is no need to send a request per item and I can take advantage of pipelining. But I came across StackRedis.L1, which is developed as an L1 cache for Redis, and I am not confident about how it updates the L1 cache.
This is my code to simulate the problem:
var connectionMulti = ConnectionMultiplexer.Connect(
    "127.0.0.1:6379,127.0.0.1:6380,allowAdmin=true,syncTimeout=15");
// 100,000 keys
var testKeys = File.ReadAllLines("D:\\RedisTestKeys.txt");
for (var i = 0; i < 3; i++)
{
    var safeI = i;
    Task.Factory.StartNew(() =>
    {
        var serverName = $"server {safeI + 1}";
        var stringDatabase = connectionMulti.GetDatabase(12);
        PerformanceTest($"{serverName} -> String: ",
            key => stringDatabase.StringGet(key), testKeys);
    });
}
and the PerformanceTest method is:
private static void PerformanceTest(string testName, Func<string, RedisValue> valueExtractor,
    IList<string> keys)
{
    Task.Factory.StartNew(() =>
    {
        Console.WriteLine($"Starting {testName} ...");
        var timeouts = 0;
        var errors = 0;
        long totalElapsedMilliseconds = 0;
        var stopwatch = new Stopwatch();
        foreach (var key in keys)
        {
            var redisValue = new RedisValue();
            stopwatch.Restart();
            try
            {
                redisValue = valueExtractor(key);
            }
            catch (Exception e)
            {
                if (e is TimeoutException)
                    timeouts++;
                else
                    errors++;
            }
            finally
            {
                stopwatch.Stop();
                totalElapsedMilliseconds += stopwatch.ElapsedMilliseconds;
                lock (FileLocker)
                {
                    File.AppendAllLines("D:\\TestResult.csv",
                        new[]
                        {
                            $"{stopwatch.ElapsedMilliseconds.ToString()},{redisValue.Length()},{key}"
                        });
                }
            }
        }
        Console.WriteLine(
            $"{testName} {totalElapsedMilliseconds * 1.0 / keys.Count} (errors: {errors}), (timeouts: {timeouts})");
    });
}
I expect all read operations to complete successfully in less than 15 milliseconds.
To achieve this, is considering an L1 cache for a Redis cache a good solution? (It is very fast, on the scale of nanoseconds, but how do I keep it synchronized?)
Or can Redis be enhanced by clustering or something else? (I tested it on Bash on my PC and did not get the expected result.)

Or can Redis be enhanced by clustering or something else?
Redis can be clustered, in different ways:
"regular" redis can be replicated to secondary read-only nodes, on the same machine or different machines; you can then send "read" traffic to some of the replicas
redis "cluster" exists, which allows you to split (shard) the keyspace over multiple primaries, sending appropriate requests to each node
redis "cluster" can also make use of readonly replicas of the sharded nodes
Whether that is appropriate or useful is contextual and needs local knowledge and testing.
To achieve this, is considering an L1 cache for a Redis cache a good solution?
Yes, it is a good solution. A request you don't make is much faster (and has much less impact on the server) than a request you do make. There are tools for helping with cache invalidation, including using the pub/sub API for invalidations. Redis vNext is also looking into additional knowledge APIs specifically for this kind of L1 scenario.
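The idea can be sketched without any Redis client at all. In the hypothetical `L1Cache` class below (none of these names come from the answer or from a library), the loader delegate stands in for the real Redis read, and `Invalidate` is what you would call from your pub/sub handler when another server announces that a key has changed. Wiring it to an actual subscription is left out.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-process (L1) cache in front of a slower store such as Redis.
public sealed class L1Cache<TValue>
{
    private readonly ConcurrentDictionary<string, TValue> _local =
        new ConcurrentDictionary<string, TValue>();
    private readonly Func<string, TValue> _loadFromStore;

    public L1Cache(Func<string, TValue> loadFromStore)
    {
        _loadFromStore = loadFromStore ?? throw new ArgumentNullException(nameof(loadFromStore));
    }

    // Serves from memory when possible; otherwise calls the loader and keeps
    // the value. Under a race the loader may run more than once for a key.
    public TValue Get(string key) => _local.GetOrAdd(key, _loadFromStore);

    // Drops the local copy so the next Get goes back to the store.
    // Returns true if an entry was actually removed.
    public bool Invalidate(string key) => _local.TryRemove(key, out _);
}
```

Whoever writes a key would publish its name on an agreed channel after the write, and every subscriber would call `Invalidate` for that key. Between the write and the arrival of the message a reader can still see the old value, so this only fits data that tolerates a short window of staleness.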

Improving simultaneous HttpWebRequest performance in c#

I have an application that batches web requests to a single endpoint using the HttpWebRequest mechanism. The goal of the application is to revise large collections of product listings (specifically their descriptions).
Here is an example of the code I use to make these requests:
static class SomeClass
{
    static RequestCachePolicy cachePolicy;
    public static string DoRequest(string requestXml)
    {
        string responseXml = string.Empty;
        Uri ep = new Uri(API_ENDPOINT);
        HttpWebRequest theRequest = (HttpWebRequest)WebRequest.Create(ep);
        theRequest.ContentType = "text/xml;charset=\"utf-8\"";
        theRequest.Accept = "text/xml";
        theRequest.Method = "POST";
        theRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
        theRequest.Proxy = null;
        if (cachePolicy == null) {
            cachePolicy = new RequestCachePolicy(RequestCacheLevel.BypassCache);
        }
        theRequest.CachePolicy = cachePolicy;
        using (Stream requestStream = theRequest.GetRequestStream())
        {
            using (StreamWriter requestWriter = new StreamWriter(requestStream))
            {
                requestWriter.Write(requestXml);
            }
        }
        WebResponse theResponse = theRequest.GetResponse();
        using (Stream responseStream = theResponse.GetResponseStream())
        {
            using (MemoryStream ms = new MemoryStream())
            {
                responseStream.CopyTo(ms);
                byte[] resultBytes = GzCompressor.Decompress(ms.ToArray());
                responseXml = Encoding.UTF8.GetString(resultBytes);
            }
        }
        return responseXml;
    }
}
My question is this: if I thread the task, I can call and complete at most 3 requests per second (based on the average sent data length), and this is through a gigabit connection to a router running business-grade fibre internet. However, if I divide the task into 2 sets and run the second set in a second process, I can double the requests completed per second.
The same holds if I divide the task into 3 or 4 (after that, performance seems to plateau unless I grab another machine to do the same). Why is this? And can I change something in the first process so that running multiple processes (or computers) is no longer needed?
Things I have tried so far include the following:
Implementing GZip compression (as seen in the example above).
Re-using the RequestCachePolicy (as seen in the example above).
Setting Expect100Continue to false.
Setting DefaultConnectionLimit before the ServicePoint is created to a larger number.
Reusing the HttpWebRequest (does not work as remote host does not support it).
Increasing the ReceiveBufferSize on the ServicePoint both before and after creation.
Disabling proxy detection in Internet Explorer's Lan Settings.
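For reference, the two ServicePointManager settings from the list above would typically be applied as in this sketch. The `ConnectionSettings` class is made up for illustration, and the value passed in is whatever limit you choose (the question does not say which number was used). Both are process-wide settings on the classic .NET Framework HTTP stack, so they have to be set before the first request to the endpoint.

```csharp
using System;
using System.Net;

public static class ConnectionSettings
{
    // Call once at startup, before the first request creates a ServicePoint
    // for the endpoint; later changes do not affect existing ServicePoints.
    public static void Apply(int connectionLimit)
    {
        // Maximum concurrent connections per host for newly created ServicePoints.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;

        // Skip the "Expect: 100-continue" handshake before sending a POST body.
        ServicePointManager.Expect100Continue = false;
    }
}
```

For example, `ConnectionSettings.Apply(64);` as the first line of `Main`. Since each process gets its own ServicePointManager, a per-process connection cap would be consistent with throughput scaling when the work is split across processes, but that is only a guess given that raising the limit was already tried.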
My suspicion is not with the remote host, as I can quite clearly wrench far more performance out of it by the methods I explained, but rather that some mechanism is capping the amount of data that is allowed to be sent through the HttpWebRequest (maybe something to do with the ServicePoint?). Thanks in advance, and please let me know if there is anything else you need clarified.
--
Just to expand on the topic: my colleague and I used the same code on a system running Windows Server Standard 2016 64-bit, and requests using this method complete significantly faster and in greater numbers. This seems to indicate that some sort of software bottleneck is being imposed. The slow operations are observed on Windows 10 Home/Pro 64-bit and earlier versions, on faster hardware than the server is running on.

Scaling
I do not have a better solution for your problem, but I think I know why your performance seems to peak and why it is machine dependent.
Usually a program has the best performance when the number of threads or processes exactly matches the number of cores. That is because the system can run them independently and the overhead of scheduling and context switching is minimized.
You arrived at your peak performance at 3 or 4 different tasks. From that I would conclude your machine has 2 or 4 cores. That would exactly match my explanation.

Creating a push or background service that checks remote server

General idea of what I need:
I am porting an Android app to iOS (using Xamarin, but I can translate from Objective-C to C# easily enough) that relies heavily on the AlarmManager to do background checks on an HTML page on a website that I don't own. AlarmManager is essentially a task scheduler for Android. The user would set the frequency to whatever they desired.
What I've tried:
Background fetching:
app.SetMinimumBackgroundFetchInterval(240);
UNUserNotificationCenter.Current.RequestAuthorization(UNAuthorizationOptions.Alert, (approved, err) =>
{
    // Handle approval
});
UNUserNotificationCenter.Current.Delegate = new WEBSITEFUNCTIONS.UserNotificationCenterDelegate();
return base.FinishedLaunching(app, options);
public override void PerformFetch(UIApplication application, Action<UIBackgroundFetchResult> completionHandler)
{
    System.Diagnostics.Debug.WriteLine("interval");
    WEBSITEFUNCTIONS kf = new WEBSITEFUNCTIONS();
    kf.doCheck();
    completionHandler(UIBackgroundFetchResult.NewData);
}
PerformFetch is just straight up NEVER called. I need some consistency (being one minute off is no big deal, but several hours will not do). I let it run and it never fired. I've read a lot about how PerformFetch works, and I don't think it will give me the response time that the user needs.
UserNotifications:
New in iOS 10 is the ability to have repeating notifications. However, this repeats the same notification.
var trigger = UNTimeIntervalNotificationTrigger.CreateTrigger(60, true);
var requestID = "sampleRequest";
var request = UNNotificationRequest.FromIdentifier(requestID, content, trigger);
UNUserNotificationCenter.Current.AddNotificationRequest(request, (err) =>
{
    if (err != null)
    {
        // Do something with error...
    }
});
Push Alerts:
My own server
I could set up a server that does the checking and then sends a message to Firebase Cloud Messaging to notify the user about the new items. I have approximately 500 active users on the Android version; if they check 5 different pages every 5 minutes, at 90 kbs a check, that's about half a gig of bandwidth an hour.
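The hourly figure depends on how "90 kbs" is read, so here is the arithmetic as a small sketch. The `PollingBandwidth` class is hypothetical, units are decimal (1 MB = 1000 KB), and protocol overhead is ignored.

```csharp
using System;

public static class PollingBandwidth
{
    // Total page checks per hour across all users.
    public static double ChecksPerHour(int users, int pagesPerUser, double intervalMinutes)
    {
        if (intervalMinutes <= 0) throw new ArgumentOutOfRangeException(nameof(intervalMinutes));
        return users * pagesPerUser * (60.0 / intervalMinutes);
    }

    // Hourly transfer in megabytes when each check moves `kilobytesPerCheck`.
    public static double MegabytesPerHour(double checksPerHour, double kilobytesPerCheck)
    {
        return checksPerHour * kilobytesPerCheck / 1000.0;
    }
}
```

500 users, 5 pages and a 5 minute interval give 30,000 checks per hour. At 90 kilobits (11.25 KB) per check that is about 337.5 MB per hour, which is in the neighbourhood of the "half a gig" estimate. At 90 kilobytes per check it would be 2,700 MB, or 2.7 GB per hour.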
So the cons are:
Excessive bandwidth usage will make my home internet a lot slower
I will need to secure it myself
Power outages can sometimes last for days, leaving end users out of the loop
Their server could boot off my machine at any given moment; I could get a new IP address from my ISP if that happened... assuming they allow that
Using my shared hosting, set up a cron job every 15 minutes
I can set up a cron job to do an alert every 15 minutes. It's not the fastest, but way better than relying on the first option (as it just straight up never gets called)
Once again, I'm at the mercy of their server not kicking me off. The app completely breaks if they do this.
Shared hosting might cut me off for putting too much strain on their servers (Hostgator claims unlimited bandwidth, but I'm not sure they'd like me doing that)

Optimizing download of multiple web pages. C#

I am developing an app where I need to download a bunch of web pages, preferably as fast as possible. The way I do that right now is with multiple threads (hundreds) that each have their own System.Net.HttpWebRequest. This sort of works, but I am not getting the performance I would like. Currently I have a beefy 600+ Mb/s connection to work with, and at most 10% of it is utilized (at peaks). I guess my strategy is flawed, but I am unable to find any other good way of doing this.
Also: If the use of HttpWebRequest is not a good way to download web pages, please say so :)
The code has been semi-auto-converted from java.
Thanks :)
Update:
public String getPage(String link){
    myURL = new System.Uri(link);
    myHttpConn = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(myURL);
    myStreamReader = new System.IO.StreamReader(new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).BaseStream,
        new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).CurrentEncoding);
    System.Text.StringBuilder buffer = new System.Text.StringBuilder();
    //myLineBuff is a String
    while ((myLineBuff = myStreamReader.ReadLine()) != null)
    {
        buffer.Append(myLineBuff);
    }
    return buffer.toString();
}

One problem is that it appears you're issuing each request twice:
myStreamReader = new System.IO.StreamReader(
    new System.IO.StreamReader(
        myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).BaseStream,
    new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).CurrentEncoding);
It makes two calls to GetResponse. For reasons I fail to understand, you're also creating two stream readers. You can split that up and simplify it, and also do a better job of error handling...
var response = (HttpWebResponse)myHttpConn.GetResponse();
myStreamReader = new StreamReader(response.GetResponseStream(), Encoding.Default);
That should double your effective throughput.
Also, you probably want to make sure to dispose of the objects you're using. When you're downloading a lot of pages, you can quickly run out of resources if you don't clean up after yourself. In this case, you should call response.Close(). See http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.close.aspx
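One way to combine both points (a single GetResponse call plus deterministic clean-up) is to wrap everything in using blocks, as in the sketch below. The `PageDownloader` class and its `ReadAll` helper are illustrative names, not part of the question's code. Disposing the response has the same effect as calling response.Close().

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageDownloader
{
    // Issues exactly one request; the response and its stream are released
    // when the using blocks exit, even if reading throws.
    public static string GetPage(string link)
    {
        var request = (HttpWebRequest)WebRequest.Create(new Uri(link));
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            return ReadAll(stream, Encoding.Default);
        }
    }

    // Reads the whole stream as text; disposing the reader also closes the stream.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Unlike the ReadLine loop in the question, ReadToEnd keeps the line breaks in the page. If you really want them stripped, keep the loop inside `ReadAll`.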

I am adding this answer as another possibility which people may encounter when downloading from multiple servers using multi-threaded apps, with Windows XP or Vista as the operating system.
The tcpip.sys driver for these operating systems has a limit of 10 outbound connections per second. This is a rate limit, not a connection limit, so you can have hundreds of connections, but you cannot initiate more than 10/s. The limit was imposed by Microsoft to curtail the spread of certain types of virus/worm. Whether such methods are effective is outside the scope of this answer.
In a multi-threaded application that downloads from multitudes of servers, this limitation can manifest as a series of timeouts. Windows puts into a queue all of the "half-open" (newly open but not yet established) connections once the 10/s limit is reached. In my application, for example, I had 20 threads ready to process connections, but I found that sometimes I would get timeouts from servers I knew were operating and reachable.
To verify that this is happening, check the operating system's event log, under System. The error is:
EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts.
There are many references to this error and plenty of patches and fixes that remove the limit. However, because this problem is frequently encountered by P2P (Torrent) users, there is a great deal of malware disguised as this patch.
I have a requirement to collect data from over 1200 servers (that are actually data sensors) on 5-minute intervals. I initially developed the application (on WinXP) to reuse 20 threads repeatedly to crawl the list of servers and aggregate the data into a SQL database. Because the connections were initiated based on a timer tick event, this error happened often because at their invocation, none of the connections are established, thus 10 are immediately queued.
Note that this isn't a problem necessarily, because as connections are established, those queued are then processed. However if non-queued connections are slow to establish, that time can negatively impact the timeout limits of the queued connections (in my experience). The result, looking at my application log file, was that I would see a batch of connections that timed out, followed by a majority of connections that were successful. Opening a web browser to test "timed out" connections was confusing, because the servers were available and quick to respond.
I decided to try HEX editing the tcpip.sys file, which was suggested on a guide at speedguide.net. The checksum of my file differed from the guide (I had SP3 not SP2) and comments in the guide weren't necessarily helpful. However, I did find a patch that worked for SP3 and noticed an immediate difference after applying it.
From what I can find, Windows 7 does not have this limitation, and since moving the application to a Windows 7-based machine, the timeout problem has remained absent.

I do this very same thing, but with thousands of sensors that provide XML and text content. Factors that will definitely affect performance are not limited to the bandwidth of your connection and the power of your computer; they also include the bandwidth and response time of each server you are contacting, the timeout delays, the size of each download, and the reliability of the remote internet connections.
As comments indicate, hundreds of threads is not necessarily a good idea. Currently I've found that running between 20 and 50 threads at a time seems optimal. In my technique, as each thread completes a download, it is given the next item from a queue.
I run a custom ThreaderEngine Class on a separate thread that is responsible for maintaining the queue of work items and assigning threads as needed. Essentially it is a while loop that iterates through an array of threads. As the threads finish, it grabs the next item from the queue and starts the thread again.
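The ThreaderEngine class itself is not shown, so the following is only a simplified variant of the same idea, not the author's implementation: a fixed number of worker threads that each keep pulling the next item from a shared queue until it is empty. `WorkerPool` and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerPool
{
    // Starts `workerCount` threads; each takes the next item from the shared
    // queue as soon as it finishes the previous one. Returns all results in
    // no particular order. `work` is expected to handle its own exceptions.
    public static List<TResult> Run<TItem, TResult>(
        IEnumerable<TItem> items, int workerCount, Func<TItem, TResult> work)
    {
        var queue = new ConcurrentQueue<TItem>(items);
        var results = new ConcurrentBag<TResult>();
        var threads = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var thread = new Thread(() =>
            {
                while (queue.TryDequeue(out TItem item))
                {
                    results.Add(work(item));
                }
            });
            threads.Add(thread);
            thread.Start();
        }

        // Wait until every worker has drained the queue.
        foreach (var thread in threads)
        {
            thread.Join();
        }

        return new List<TResult>(results);
    }
}
```

With the download method below it could be called as `WorkerPool.Run(sensorAddresses, 30, ip => FileDownload(ip, 80, "data.xml", 5000, 5000))`, where the address list, port, file name and timeouts are placeholders. The worker count of 30 is simply a value inside the 20 to 50 range mentioned above.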
Each of my threads is actually downloading several separate items, but the method call is the same (.NET 4.0):
public static string FileDownload(string _ip, int _port, string _file, int Timeout, int ReadWriteTimeout, NetworkCredential _cred = null)
{
    string uri = String.Format("http://{0}:{1}/{2}", _ip, _port, _file);
    string Data = String.Empty;
    try
    {
        HttpWebRequest Request = (HttpWebRequest)WebRequest.Create(uri);
        if (_cred != null) Request.Credentials = _cred;
        Request.Timeout = Timeout; // applies to .GetResponse()
        Request.ReadWriteTimeout = ReadWriteTimeout; // applies to .GetResponseStream()
        Request.Proxy = null;
        Request.CachePolicy = new System.Net.Cache.RequestCachePolicy(System.Net.Cache.RequestCacheLevel.NoCacheNoStore);
        using (HttpWebResponse Response = (HttpWebResponse)Request.GetResponse())
        {
            using (Stream dataStream = Response.GetResponseStream())
            {
                if (dataStream != null)
                    using (BufferedStream buffer = new BufferedStream(dataStream))
                    using (StreamReader reader = new StreamReader(buffer))
                    {
                        Data = reader.ReadToEnd();
                    }
            }
            return Data;
        }
    }
    catch (AccessViolationException ave)
    {
        // ...
    }
    catch (Exception exc)
    {
        // ...
    }
}
Using this I am able to download about 60KB each from 1200+ remote machines (72MB) in less than 5 minutes. The machine is a Core 2 Quad with 2GB RAM and utilizes four bonded T1 connections (~6Mbps).
