Configure OpenTelemetry to fail when sending traces to an exporter fails - C#

We require that data is highly correct, more so than 100% uptime (I recognize this may mean that OpenTelemetry is not the best choice, but I would still like to know if it's possible).
We are exporting to Elastic using APM.
I have noticed two significant issues.
Issue 1: I provide no/an incorrect bearer token. No errors (or traces) are recorded. Silent failure.
Issue 2: I try to write a huge number of traces (100k) as fast as possible. About 2k make it and the rest are discarded.
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetSampler(new AlwaysOnSampler())
    .AddSource("MyCompany.MyProduct.MyLibrary")
    .AddOtlpExporter(o =>
    {
        o.ExportProcessorType = ExportProcessorType.Batch;
        o.TimeoutMilliseconds = 100 * 1000;
        o.Protocol = OtlpExportProtocol.Grpc;
        o.Endpoint = new Uri("https://somepath:443");
        o.Headers = "Authorization=Bearer token1";
    })
    //.AddConsoleExporter()
    .Build();
Task.Run(() =>
{
    for (int i = 0; i < 100000; i++)
    {
        using (var activity = MyActivitySource.StartActivity("SayHello"))
        {
            activity?.SetTag("foo", 1);
        }
    }
});
Console.WriteLine("Done");
Console.ReadLine();

I would start an OTel Collector locally, point your app code at the local Collector, and have the local Collector export the traces to Elastic APM.
Configure the local OTel Collector:
1.) Enable the Collector's own telemetry metrics and monitor the failed-export metrics of the OTLP exporter in use, e.g. https://grafana.com/grafana/dashboards/15983
2.) Use aggressive batching - a batch processor before exporting to Elastic APM, e.g.
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
    send_batch_max_size: 0
The Collector will decrease the ingestion rate to Elastic APM thanks to batching. Of course you need to test it first, because some trace backend implementations use defaults - e.g. gRPC for Go has a default 4MB gRPC message size, and 10k traces in one message will very likely exceed this limit. Very often these infrastructure/app limits are not documented, so stress testing before production is highly recommended. Keep in mind that batching requires additional memory, so monitor memory usage as well.
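On the application side, the ~2k ceiling from Issue 2 is consistent with the .NET SDK's batch export processor, whose default queue size is 2048: once the queue is full, additional spans are dropped silently. A minimal sketch of raising the queue size and flushing before exit (option names assume a recent OpenTelemetry .NET release; verify against your version):

var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("MyCompany.MyProduct.MyLibrary")
    .AddOtlpExporter(o =>
    {
        o.Endpoint = new Uri("https://somepath:443");
        // Default MaxQueueSize is 2048; spans beyond that are dropped silently.
        o.BatchExportProcessorOptions.MaxQueueSize = 200_000;
    })
    .Build();

// ... emit activities ...

// ForceFlush blocks until queued spans are exported (or the timeout hits)
// and returns false on failure - one way to at least detect data loss.
if (!tracerProvider.ForceFlush(timeoutMilliseconds: 30_000))
{
    Console.Error.WriteLine("Trace export did not complete; data may have been lost.");
}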
I would also customize the Collector's retry and queue behaviour.
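For instance, a sketch of the exporterhelper retry/queue settings on the Elastic-bound exporter (the exporter name, endpoint, and values here are illustrative, not tuned recommendations):

exporters:
  otlp/elastic:
    endpoint: "apm-server:8200"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000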

Related

Improve performance of Redis to reduce timeout exceptions

I have a Redis database on a CentOS server, and 3 Windows servers are connected to it with approximately 1,000 reads/writes per second, all of which are on the same local LAN, so the ping time is less than one millisecond.
The problem is that at least 5 percent of read operations time out, even though I read at most 3KB of data per operation with 'syncTimeout=15', which is much more than the network latency.
I installed Redis on Bash on my Windows 10 machine and simulated the problem. I also stopped the write operations. However, the problem still exists, with 0.5 percent timeouts, even though there is no network latency.
I also used a CentOS server in my LAN to simulate the problem; in this case, I need at least 100 milliseconds for 'syncTimeout' to be sure the timeout rate is less than 1 percent.
I considered using some Dictionaries to cache data from Redis, so there is no need to send a request per item, and I can take advantage of pipelining. But then I came across StackRedis.L1, which is developed as an L1 cache for Redis, and I am not confident about how the L1 cache is kept up to date.
This is my code to simulate the problem:
var connectionMulti = ConnectionMultiplexer.Connect(
    "127.0.0.1:6379,127.0.0.1:6380,allowAdmin=true,syncTimeout=15");
// 100,000 keys
var testKeys = File.ReadAllLines("D:\\RedisTestKeys.txt");
for (var i = 0; i < 3; i++)
{
    var safeI = i;
    Task.Factory.StartNew(() =>
    {
        var serverName = $"server {safeI + 1}";
        var stringDatabase = connectionMulti.GetDatabase(12);
        PerformanceTest($"{serverName} -> String: ",
            key => stringDatabase.StringGet(key), testKeys);
    });
}
and the PerformanceTest method is:
private static void PerformanceTest(string testName, Func<string, RedisValue> valueExtractor,
    IList<string> keys)
{
    Task.Factory.StartNew(() =>
    {
        Console.WriteLine($"Starting {testName} ...");
        var timeouts = 0;
        var errors = 0;
        long totalElapsedMilliseconds = 0;
        var stopwatch = new Stopwatch();
        foreach (var key in keys)
        {
            var redisValue = new RedisValue();
            stopwatch.Restart();
            try
            {
                redisValue = valueExtractor(key);
            }
            catch (Exception e)
            {
                if (e is TimeoutException)
                    timeouts++;
                else
                    errors++;
            }
            finally
            {
                stopwatch.Stop();
                totalElapsedMilliseconds += stopwatch.ElapsedMilliseconds;
                lock (FileLocker)
                {
                    File.AppendAllLines("D:\\TestResult.csv",
                        new[]
                        {
                            $"{stopwatch.ElapsedMilliseconds.ToString()},{redisValue.Length()},{key}"
                        });
                }
            }
        }
        Console.WriteLine(
            $"{testName} {totalElapsedMilliseconds * 1.0 / keys.Count} (errors: {errors}), (timeouts: {timeouts})");
    });
}
I expect all read operations to complete successfully in less than 15 milliseconds.
To achieve this, is considering an L1 cache for the Redis cache a good solution? (It is very fast, on the scale of nanoseconds, but how can I handle synchronization?)
Or can Redis be enhanced by clustering or something else? (I tested it on Bash on my PC, and I did not get the expected result.)
Or can Redis be enhanced by clustering or something else?
Redis can be clustered, in different ways:
"regular" redis can be replicated to secondary read-only nodes, on the same machine or different machines; you can then send "read" traffic to some of the replicas
redis "cluster" exists, which allows you to split (shard) the keyspace over multiple primaries, sending appropriate requests to each node
redis "cluster" can also make use of readonly replicas of the sharded nodes
Whether that is appropriate or useful is contextual and needs local knowledge and testing.
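As an illustrative sketch (my example, not the answer's): with StackExchange.Redis, once replicas are in the connection string, individual reads can be directed at them via command flags (CommandFlags.PreferReplica in 2.x; older versions call it PreferSlave). The endpoints below are hypothetical:

// Hypothetical endpoints: the primary and a read-only replica.
var muxer = ConnectionMultiplexer.Connect("primary:6379,replica:6379");
var db = muxer.GetDatabase();

// Route this read to a replica when one is available, else the primary.
RedisValue value = db.StringGet("some-key", CommandFlags.PreferReplica);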
To achieve this, is considering an L1 cache for the Redis cache a good solution?
Yes, it is a good solution. A request you don't make is much faster (and has much less impact on the server) than a request you do make. There are tools to help with cache invalidation, including using the pub/sub API for invalidations. Redis vNext is also looking into additional APIs specifically for this kind of L1 scenario.
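A minimal sketch of pub/sub-driven invalidation for a local dictionary cache (the channel name and wiring are my own assumptions; every writer must publish the keys it changes):

var muxer = ConnectionMultiplexer.Connect("127.0.0.1:6379");
var db = muxer.GetDatabase();
var localCache = new System.Collections.Concurrent.ConcurrentDictionary<string, string>();

// Drop the local copy whenever any writer announces a change to a key.
muxer.GetSubscriber().Subscribe("cache-invalidate",
    (channel, key) => localCache.TryRemove(key.ToString(), out _));

// Reads go through the L1 cache and fall back to Redis on a miss.
string GetValue(string key) =>
    localCache.GetOrAdd(key, k => (string)db.StringGet(k));

// Writers update Redis, then broadcast the invalidation.
void SetValue(string key, string value)
{
    db.StringSet(key, value);
    muxer.GetSubscriber().Publish("cache-invalidate", key);
}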

Why does DataCache client behave this way?

I have a cache instance running on Windows Azure. I'm connecting to it from my web application and getting intermittent exceptions with the following message:
ErrorCode:SubStatus:There is a temporary failure.
Please retry later. (One or more specified cache servers are
unavailable, which could be caused by busy network or servers. For
on-premises cache clusters, also verify the following conditions.
Ensure that security permission has been granted for this client
account, and check that the AppFabric Caching Service is allowed
through the firewall on all cache hosts. Also the MaxBufferSize on the
server must be greater than or equal to the serialized object size
sent from the client.). Additional Information : The client was trying
to communicate with the server:
net.tcp://myserver.cache.windows.net:22234.
I've been able to duplicate the problem with this snippet in LINQPad:
var config = new DataCacheFactoryConfiguration
{
    AutoDiscoverProperty = new DataCacheAutoDiscoverProperty(true, "myserver.cache.windows.net"),
    SecurityProperties = new DataCacheSecurity("key", false)
};
var factory = new DataCacheFactory(config);
var client = factory.GetDefaultCache();
//client.Put("foo", "bar");
for (int i = 0; i < 100; i++)
{
    System.Threading.Tasks.Task.Factory.StartNew(o =>
    {
        var i1 = (int)o;
        try
        {
            client.Get("foo").Dump();
        }
        catch (Exception e)
        {
            e.Message.Dump();
        }
    }, i);
}
If I run this snippet as-is, spawning more than about 50 threads, I get the error. If I uncomment the initial Put(), I can run it with 10,000 threads. I make sure the entry is in the cache regardless before I run this. I've tried using pessimistic locking and it does not seem to have any effect. I've used the latest client DLLs from NuGet. I've tried scaling the cache up to 1GB with no other usage besides this snippet.
Since my requests in my web app are coming in on different threads, I believe this reasonably simulates what's happening in my app. And I'm definitely getting the same exception in both cases. Can anyone suggest a way to avoid this exception? Does it have to do with the initial Put() happening on the same thread as the constructor? That seems unlikely but it's the only thing I can do in this test scenario to eliminate the exception.
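For what it's worth, since the error message itself says "Please retry later", a common mitigation while investigating the root cause is a small retry wrapper. A sketch only, assuming the AppFabric DataCacheException type and its RetryLater error code, with illustrative back-off values:

private static object GetWithRetry(DataCache cache, string key, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return cache.Get(key);
        }
        catch (DataCacheException ex)
        {
            if (ex.ErrorCode != DataCacheErrorCode.RetryLater || attempt >= maxAttempts)
                throw;
            // Transient failure: back off briefly, then try again.
            System.Threading.Thread.Sleep(100 * attempt);
        }
    }
}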

neo4j REST API poor performance

According to my benchmark of creating nodes using
GraphClient.Create()
performance leaves much to be desired.
I've got about 10 empty nodes per second on my machine (Core i3, 8 GB RAM).
Even when I use multithreading to perform the creates, the time for each Create() call increases linearly (~N times when N threads are used).
I've tested both stable 1.9.2 and 2.0.0-M04. The results are exactly the same.
Does anybody know what's wrong?
EDIT: I tried to use the neo4j REST API directly and got similar results: ~20 empty nodes per second, and multithreading also gives no benefit.
EDIT 2: At the same time, the Batch REST API, which allows batch creation, provides much better performance: about 250 nodes per second. It looks like there is an incredibly big overhead in handling a single request...
The poor performance is caused by overhead in processing the RESTful Cypher query. Mostly it is network overhead, but there is also overhead from the need to parse the query.
Use the Core Java API when you are interested in high performance. The Core Java API provides more than 10 times faster request processing than the Cypher query language.
See these articles:
Performance of Graph Query Languages
Get the full neo4j power by using the Core Java API for traversing
your Graph data base instead of Cypher Query Language
The neo4jclient itself uses the REST API, so you're already limited in performance (by bandwidth, network latency etc) when compared to a direct API call (for which you'd need Java).
What performance are you after?
What code are you running?
Some initial thoughts & tests to try:
Obviously there are things like CPU etc. which will cause some throttling; some things to consider:
Is the Neo4J server on the same machine?
Have you tried your application not through Visual Studio? (i.e. no debugging)
In my test code (below), I get 10 entries in ~200ms - can you try this code in a simple console app and see what you get?
private static void Main()
{
    var client = new GraphClient(new Uri("http://localhost.:7474/db/data"));
    client.Connect();
    for (int i = 0; i < 10; i++)
        CreateEmptyNodes(10, client);
}

private static void CreateEmptyNodes(int numberToCreate, IGraphClient client)
{
    var start = DateTime.Now;
    for (int i = 0; i < numberToCreate; i++)
        client.Create(new object());
    var timeTaken = DateTime.Now - start;
    Console.WriteLine("For {0} items, I took: {1}ms", numberToCreate, timeTaken.TotalMilliseconds);
}
EDIT:
This is a raw HttpClient approach to calling the 'Create', which I believe is analogous to what neo4jclient is doing under the hood:
private static async void StraightHttpClient(int iterations, int amount)
{
    var client = new HttpClient { BaseAddress = new Uri("http://localhost.:7474/db/data/") };
    for (int j = 0; j < iterations; j++)
    {
        DateTime start = DateTime.Now;
        for (int i = 0; i < amount; i++)
        {
            var response = await client.SendAsync(
                new HttpRequestMessage(HttpMethod.Post, "cypher/")
                {
                    Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json")
                });
            if (response.StatusCode != HttpStatusCode.OK)
                Console.WriteLine("Not ok");
        }
        TimeSpan timeTaken = DateTime.Now - start;
        Console.WriteLine("took {0}ms", timeTaken.TotalMilliseconds);
    }
}
Now, if you didn't care about the response, you could just call client.SendAsync(..) without the await, and that gets you to a spiffy ~2500 per second. However, the big issue here is that you haven't necessarily sent any of those creates; you've basically queued them, so if you shut down your program straight afterwards, chances are you'll have either no entries or a very small number.
So.. clearly the code can handle firing x thousand calls a second with no problems (I've done a similar test to the above using ServiceStack and RestSharp; both take similar times to the HttpClient).
What it can't do is send those to the actual server at the same rate, so we're limited by the Windows HTTP stack and/or how fast neo4j can process the request and supply a response.
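If you do care about completion but still want concurrency, one middle ground (my sketch, not part of the original answer; it assumes the same client and BaseAddress as above) is to start a batch of requests and await them all, so every create is actually confirmed:

private static async Task CreateBatchAsync(HttpClient client, int amount)
{
    // Start all requests concurrently instead of awaiting each one serially.
    var pending = new List<Task<HttpResponseMessage>>(amount);
    for (int i = 0; i < amount; i++)
    {
        pending.Add(client.SendAsync(new HttpRequestMessage(HttpMethod.Post, "cypher/")
        {
            Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json")
        }));
    }

    // Await them all, so nothing is left silently queued at shutdown.
    foreach (var response in await Task.WhenAll(pending))
        if (response.StatusCode != HttpStatusCode.OK)
            Console.WriteLine("Not ok");
}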

Load testing a website

I am currently writing a small application to load test a website and am having a few problems.
List<string> pageUrls = new List<string>();
// NOT SHOWN ... populate the pageUrls with thousands of links
var parallelOptions = new System.Threading.Tasks.ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 100;
System.Threading.Tasks.Parallel.ForEach(pageUrls, parallelOptions, pageUrl =>
{
    var startedOn = DateTime.UtcNow;
    var request = System.Net.HttpWebRequest.Create(pageUrl);
    var responseTimeBefore = DateTime.UtcNow;
    string responseCode = null;
    try
    {
        var response = (System.Net.HttpWebResponse)request.GetResponse();
        responseCode = response.StatusCode.ToString();
        response.Close();
    }
    catch (System.Net.WebException ex)
    {
        // NOT SHOWN ... write to the error log
    }
    var responseTimeAfter = DateTime.UtcNow;
    var responseDuration = responseTimeAfter - responseTimeBefore;
    // NOT SHOWN ... write the response duration out to a file
    var endedOn = DateTime.UtcNow;
    var threadDuration = endedOn - startedOn;
    // sleep for one second
    var oneSecond = new TimeSpan(0, 0, 1);
    if (threadDuration < oneSecond)
    {
        System.Threading.Thread.Sleep(oneSecond - threadDuration);
    }
});
When I set the MaxDegreeOfParallelism to a low value such as 10, everything works fine; the responseDuration stays between 1 and 3 seconds. If I increase the value to 100 (as in the example), the responseDuration climbs quickly until, after around 300 requests, it has reached 25 seconds (and is still climbing).
I thought I might be doing something wrong, so I also ran Apache JMeter with the standard web test plan setup and set the users to 100. After about 300 samples, the response times had rocketed to around 40 seconds.
I'm skeptical that my server is reaching its limit. Task Manager on the server shows that only 2GB of the 16GB is being used, and the processor hovers around 5%.
Could I be hitting some limit on the number of simultaneous connections on my client computer? If so, how do I change this?
Am I forgetting to do something in my code? Clean-up/close connections?
Could it be that my code is OK and it is in fact my server that just can't handle the traffic?
For reference my client computer that is running the code above is running Windows 7 and is on the same network as the server I am testing. The server is running Windows Server 2008 IIS 7.5 and is a dedicated 8-core 16GB RAM machine.
MaxDegreeOfParallelism should be used only when you are trying to limit the number of cores used as part of your program's strategy.
By default, the Parallel library utilizes the maximum number of available threads - so setting this option to any number will mostly limit performance, depending on the environment running it.
I would suggest you try running this code without setting this option; that should improve the performance.
ParallelOptions.MaxDegreeOfParallelism Property in MSDN - read the Remarks section for more information.
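One more client-side factor worth checking (my addition, not part of the original answer): HttpWebRequest is limited to two concurrent connections per host by default in client applications, so regardless of parallelism settings, extra requests queue up on the client, and their measured "response time" includes that wait. Raising the limit is a one-liner:

// Must be set before the first request to a given host is created;
// otherwise requests beyond 2 per host queue up client-side.
System.Net.ServicePointManager.DefaultConnectionLimit = 100;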
Several suggestions:
How large is your recorded JMeter test script, and did you insert some think time? The larger the test, the heavier the load.
Make sure the LAN is not in use by competing traffic during test runs. Having a Gigabit Ethernet switch should be mandatory.
Do use 2-3 slave machines, and avoid using heavy results loggers in JMeter, like the tree listener. You were right to minimize these graphs and results.

Optimizing download of multiple web pages. C#

I am developing an app where I need to download a bunch of web pages, preferably as fast as possible. The way I do that right now is that I have multiple threads (hundreds) that each have their own System.Net.HttpWebRequest. This sort of works, but I am not getting the performance I would like. Currently I have a beefy 600+ Mb/s connection to work with, and it is utilized at most 10% (at peaks). I guess my strategy is flawed, but I am unable to find any other good way of doing this.
Also: If the use of HttpWebRequest is not a good way to download web pages, please say so :)
The code has been semi-auto-converted from java.
Thanks :)
Update:
public String getPage(String link)
{
    myURL = new System.Uri(link);
    myHttpConn = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(myURL);
    myStreamReader = new System.IO.StreamReader(
        new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
            System.Text.Encoding.Default).BaseStream,
        new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
            System.Text.Encoding.Default).CurrentEncoding);
    System.Text.StringBuilder buffer = new System.Text.StringBuilder();
    // myLineBuff is a String
    while ((myLineBuff = myStreamReader.ReadLine()) != null)
    {
        buffer.Append(myLineBuff);
    }
    return buffer.ToString();
}
One problem is that it appears you're issuing each request twice:
myStreamReader = new System.IO.StreamReader(
    new System.IO.StreamReader(
        myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).BaseStream,
    new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).CurrentEncoding);
It makes two calls to GetResponse. For reasons I fail to understand, you're also creating two stream readers. You can split that up and simplify it, and also do a better job of error handling...
var response = (HttpWebResponse)myHttpConn.GetResponse();
myStreamReader = new StreamReader(response.GetResponseStream(), Encoding.Default);
That should double your effective throughput.
Also, you probably want to make sure to dispose of the objects you're using. When you're downloading a lot of pages, you can quickly run out of resources if you don't clean up after yourself. In this case, you should call response.Close(). See http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.close.aspx
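For example, a sketch of the disposal pattern (using blocks rather than an explicit Close, which is equivalent here):

// Disposing the response and reader returns the connection to the pool
// even if an exception is thrown mid-read.
using (var response = (HttpWebResponse)myHttpConn.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream(), Encoding.Default))
{
    string page = reader.ReadToEnd();
    // ... use page ...
}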
I am adding this answer as another possibility which people may encounter when:
downloading from multiple servers using multi-threaded apps
using Windows XP or Vista as the operating system
The tcpip.sys driver for these operating systems has a limit of 10 outbound connections per second. This is a rate limit, not a connection limit, so you can have hundreds of connections, but you cannot initiate more than 10/s. The limit was imposed by Microsoft to curtail the spread of certain types of virus/worm. Whether such methods are effective is outside the scope of this answer.
In a multi-threaded application that downloads from multitudes of servers, this limitation can manifest as a series of timeouts. Windows puts into a queue all of the "half-open" (newly open but not yet established) connections once the 10/s limit is reached. In my application, for example, I had 20 threads ready to process connections, but I found that sometimes I would get timeouts from servers I knew were operating and reachable.
To verify that this is happening, check the operating system's event log, under System. The error is:
EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts.
There are many references to this error and plenty of patches and fixes to apply to remove the limit. However because this problem is frequently encountered by P2P (Torrent) users, there's quite a prolific amount of malware disguised as this patch.
I have a requirement to collect data from over 1200 servers (that are actually data sensors) on 5-minute intervals. I initially developed the application (on WinXP) to reuse 20 threads repeatedly to crawl the list of servers and aggregate the data into a SQL database. Because the connections were initiated based on a timer tick event, this error happened often because at their invocation, none of the connections are established, thus 10 are immediately queued.
Note that this isn't necessarily a problem, because as connections are established, those queued are then processed. However, if non-queued connections are slow to establish, that time can negatively impact the timeout limits of the queued connections (in my experience). The result, looking at my application log file, was that I would see a batch of connections that timed out, followed by a majority of connections that were successful. Opening a web browser to test "timed out" connections was confusing, because the servers were available and quick to respond.
I decided to try hex editing the tcpip.sys file, which was suggested in a guide at speedguide.net. The checksum of my file differed from the guide's (I had SP3, not SP2) and the comments in the guide weren't necessarily helpful. However, I did find a patch that worked for SP3 and noticed an immediate difference after applying it.
From what I can find, Windows 7 does not have this limitation, and since moving the application to a Windows 7-based machine, the timeout problem has remained absent.
I do this very same thing, but with thousands of sensors that provide XML and Text content. Factors that will definitely affect performance are not limited to the speed and power of your bandwidth and computer, but the bandwidth and response time of each server you are contacting, the timeout delays, the size of each download, and the reliability of the remote internet connections.
As comments indicate, hundreds of threads is not necessarily a good idea. Currently I've found that running between 20 and 50 threads at a time seems optimal. In my technique, as each thread completes a download, it is given the next item from a queue.
I run a custom ThreaderEngine Class on a separate thread that is responsible for maintaining the queue of work items and assigning threads as needed. Essentially it is a while loop that iterates through an array of threads. As the threads finish, it grabs the next item from the queue and starts the thread again.
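A rough sketch of that pattern using the built-in types available since .NET 4.0 (my own naming, not the author's ThreaderEngine):

var workQueue = new System.Collections.Concurrent.BlockingCollection<string>();
// NOT SHOWN ... add download URLs to workQueue, then:
workQueue.CompleteAdding();

// A fixed pool of workers; each grabs the next item as soon as it finishes one.
var workers = new System.Threading.Tasks.Task[20];
for (int w = 0; w < workers.Length; w++)
{
    workers[w] = System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        foreach (string url in workQueue.GetConsumingEnumerable())
        {
            // NOT SHOWN ... download url and store the result
        }
    });
}
System.Threading.Tasks.Task.WaitAll(workers);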
Each of my threads is actually downloading several separate items, but the method call is the same (.NET 4.0):
public static string FileDownload(string _ip, int _port, string _file, int Timeout,
    int ReadWriteTimeout, NetworkCredential _cred = null)
{
    string uri = String.Format("http://{0}:{1}/{2}", _ip, _port, _file);
    string Data = String.Empty;
    try
    {
        HttpWebRequest Request = (HttpWebRequest)WebRequest.Create(uri);
        if (_cred != null) Request.Credentials = _cred;
        Request.Timeout = Timeout; // applies to .GetResponse()
        Request.ReadWriteTimeout = ReadWriteTimeout; // applies to .GetResponseStream()
        Request.Proxy = null;
        Request.CachePolicy = new System.Net.Cache.RequestCachePolicy(
            System.Net.Cache.RequestCacheLevel.NoCacheNoStore);
        using (HttpWebResponse Response = (HttpWebResponse)Request.GetResponse())
        {
            using (Stream dataStream = Response.GetResponseStream())
            {
                if (dataStream != null)
                    using (BufferedStream buffer = new BufferedStream(dataStream))
                    using (StreamReader reader = new StreamReader(buffer))
                    {
                        Data = reader.ReadToEnd();
                    }
            }
            return Data;
        }
    }
    catch (AccessViolationException ave)
    {
        // ...
    }
    catch (Exception exc)
    {
        // ...
    }
    // All paths must return a value; on failure, return what was read (empty).
    return Data;
}
Using this I am able to download about 60KB each from 1200+ remote machines (72MB) in less than 5 minutes. The machine is a Core 2 Quad with 2GB RAM and utilizes four bonded T1 connections (~6Mbps).
