I am performing a scroll search on Elasticsearch 5.3.0 via the C# NEST API. searchResponse.Total initially reports 12522 hits, and I fetch 100 documents per scroll request. The number of fetched documents climbs as expected until approximately 11300, but then searchResponse.Total suddenly starts to decrease: 7437, then 4000, until it reaches 0 or a value lower than the number of documents already read.
The code I am using is as follows:
List<ClusterJob> clusterJobData = new List<ClusterJob>();
var counter = 0;

// Initial request: opens the scroll context and returns the first page
var searchResponse = client.Search<ClusterJob>(search => search
    .Index("privateIndex")
    .Type("privateType")
    .Size(100)
    .Sort(ss => ss.Ascending(ff => ff.Request.RequestedTimeStamp))
    .Scroll("5m"));

while (searchResponse.Documents.Any())
{
    if (searchResponse.Documents.Count > 0)
    {
        clusterJobData.AddRange(searchResponse.Documents);
    }
    counter += searchResponse.Documents.Count;

    var results = searchResponse;
    var retryCount = 0;
    do
    {
        searchResponse = client.Scroll<ClusterJob>("1m", results.ScrollId);
        Task.Delay(100).Wait();
        retryCount++;
        // Retry up to 3 times if the request fails
    } while (retryCount < 3 && !searchResponse.IsValid);

    if (!searchResponse.IsValid)
    {
        Console.WriteLine("Failed to fetch Data");
    }
    Console.WriteLine("Lines Read : " + counter + "/" + searchResponse.Total);
}
My network speed is good and I am deliberately keeping the search context open for a long time. Any hints as to why searchResponse.Total shrinks while the code runs? Is this possibly a bug? I came across an old bug report but I am unsure whether the issue still persists: ElasticSearch Issue
Some information on my ES configuration, if it helps:
Distributed across a 5-node cluster, one node per VM
14 GB heap allocated to each instance
Version: 5.3.0
64-bit, running as a service on each VM
I have just started experimenting with Cassandra, and I'm using C# and the DataStax driver (v 3.0.8). I wanted to do some performance tests to see how fast Cassandra handles time series data.
The results are shocking: it takes an eternity to do a SELECT, so I guess I'm doing something wrong.
I have set up Cassandra on my local computer and created a table:
CREATE KEYSPACE dm WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE dm.daily_data_by_day (
symbol text,
value_type int,
as_of_day date,
revision_timestamp_utc timestamp,
value decimal,
PRIMARY KEY ((symbol, value_type), as_of_day, revision_timestamp_utc)
) WITH CLUSTERING ORDER BY (as_of_day ASC, revision_timestamp_utc ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I have filled this table with about 15 million rows, divided into about 10000 partitions, each containing up to 10000 rows.
Here's the test I'm running (updated on request by phact):
[Test]
public void SelectPerformance()
{
_cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
_stopwatch = new Stopwatch();
var items = new[]
{
// 20 different items...
};
foreach (var item in items)
{
var watch = Stopwatch.StartNew();
var rows = ExecuteQuery(item.Symbol, item.FieldType, item.StartDate, item.EndDate);
watch.Stop();
Console.WriteLine($"{watch.ElapsedMilliseconds}\t{rows.Length}");
}
Console.WriteLine($"Average Execute: {_stopwatch.ElapsedMilliseconds/items.Length}");
_cluster.Dispose();
}
private Row[] ExecuteQuery(string symbol, int fieldType, LocalDate startDate, LocalDate endDate)
{
using (var session = _cluster.Connect("dm"))
{
var ps = session.Prepare(
#"SELECT
symbol,
value_type,
as_of_day,
revision_timestamp_utc,
value
FROM
daily_data_by_day
WHERE
symbol = ? AND
value_type = ? AND
as_of_day >= ? AND as_of_day < ?");
var statement = ps.Bind(symbol, fieldType, startDate, endDate);
statement.EnableTracing();
_stopwatch.Start();
var rowSet = session.Execute(statement);
_stopwatch.Stop();
return rowSet.ToArray();
}
}
The stopwatch tells me that session.Execute() takes 20-30 milliseconds to execute (update: after changing the code to create the cluster only once I'm down to about 15 milliseconds). So I enabled some tracing and got the following result:
activity | source_elapsed
--------------------------------------------------------------------------------------------
Parsing SELECT symbol, value_type, as_of_day, revision_timestamp_utc,...; | 47
Preparing statement | 98
Executing single-partition query on daily_data_by_day | 922
Acquiring sstable references | 939
Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones | 978
Bloom filter allows skipping sstable 74 | 1003
Bloom filter allows skipping sstable 75 | 1015
Bloom filter allows skipping sstable 72 | 1024
Bloom filter allows skipping sstable 73 | 1032
Key cache hit for sstable 63 | 1043
Merged data from memtables and 5 sstables | 1329
Read 100 live and 0 tombstone cells | 1353
If I understand this trace correctly, Cassandra spends less than 1.4 milliseconds executing my query. So what is the DataStax driver doing the rest of the time?
(As a reference, I have done the same performance test against a local SQL Server instance resulting in about 1-2 milliseconds executing the same query from C#.)
Update:
I have attempted to do some profiling, which is not that easy to do with asynchronous code that you don't own...
My conclusion is that most of the time is spent parsing the response. Each response contains 2000-3000 rows, and parsing takes about 9 ms per response. Deserialization takes most of that time, about 6.5 ms, with decimal being the worst at about 3 ms per field. The other fields (text, int, date and timestamp) take about 0.5 ms per field.
Looking at my measured times I ought to have suspected this: the more rows in the response, the longer it takes, almost linearly.
#xmas79 highlighted a great point: you should not create too many Session instances (better to use one per keyspace), but there are also other guidelines that could help you. Follow the guidelines below and the reference:
Use one Cluster instance per (physical) cluster (per application lifetime)
Use at most one Session per keyspace, or use a single Session and explicitly specify the keyspace in your queries
If you execute a statement more than once, consider using a PreparedStatement
You can reduce the number of network round trips and also have atomic operations by using Batches (see the sketch after the reference link)
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
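To illustrate the Batches point above, here is a minimal sketch using the driver's BatchStatement. The prepared INSERT and the pointsToWrite collection are placeholders of mine, not code from the question, and batches mainly pay off for writes that target the same partition:

// Minimal sketch: group several INSERTs for one partition into a single batch.
var insertPs = session.Prepare(
    "INSERT INTO daily_data_by_day (symbol, value_type, as_of_day, revision_timestamp_utc, value) " +
    "VALUES (?, ?, ?, ?, ?)");

var batch = new BatchStatement();
foreach (var point in pointsToWrite)   // pointsToWrite is a hypothetical collection of rows
{
    batch.Add(insertPs.Bind(point.Symbol, point.ValueType, point.AsOfDay,
                            point.RevisionTimestampUtc, point.Value));
}
session.Execute(batch);                // one round trip instead of one per row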
EDIT
Also, taking a second look at your code, you are creating a prepared statement for every execution of the same query. The prepared statement should be created only once, and you should reuse its reference to execute the queries. What a prepared statement does is send the server the CQL you will execute often, so the server parses the string once and returns an identifier for it. So my suggestion is: don't use it if you are not going to share the PreparedStatement object across queries, or change your code to something like this:
[Test]
public void SelectPerformance()
{
_cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
_stopwatch = new Stopwatch();
var session = _cluster.Connect("dm");
var ps = session.Prepare(@"SELECT symbol, value_type, as_of_day, revision_timestamp_utc, value FROM daily_data_by_day WHERE symbol = ? AND value_type = ? AND as_of_day >= ? AND as_of_day < ?");
var items = new[]
{
// 20 different items...
};
foreach (var item in items)
{
var watch = Stopwatch.StartNew();
var rows = ExecuteQuery(session, ps, item.Symbol, item.FieldType, item.StartDate, item.EndDate);
watch.Stop();
Console.WriteLine($"{watch.ElapsedMilliseconds}\t{rows.Length}");
}
Console.WriteLine($"Average Execute: { _stopwatch.ElapsedMilliseconds/items.Length}");
_cluster.Dispose();
}
private Row[] ExecuteQuery(Session session, PreparedStatement ps, string symbol, int fieldType, LocalDate startDate, LocalDate endDate)
{
var statement = ps.Bind(symbol, fieldType, startDate, endDate);
// Do not enable request tracing for latency benchmarking
// statement.EnableTracing();
_stopwatch.Start();
var rowSet = session.Execute(statement);
_stopwatch.Stop();
return rowSet.ToArray();
}
Short answer: you want to keep the Cluster object open and reuse it across requests.
Creating the Cluster object is costly, but it gives you benefits like automatic load balancing, token awareness, automatic failover, etc.
Why do you execute
using (var session = _cluster.Connect("dm"))
on every query? You should build your Cluster instance once, connect to the cluster and get the Session once, and reuse them everywhere. The Cluster object configures important parameters like failover and load balancing, and the Session object manages them for you. Connecting every time will give you a performance penalty.
EDIT
It seems you are performing each SELECT with a latency of 10-15 ms. Are you getting the same tracing numbers (e.g. 1.4 ms) for every query? What is your storage I/O system? If you are on spinning disks, it could be a seek-time penalty from your disk subsystem.
I'm writing a video manager in C# using ObjectListView, where preview images are generated, saved, and an entry is put into a SQLite db. From there, I use a VirtualObjectListView component to display the entries (including the first image as a preview) in Details mode.
The problem I've run into is that with several hundred or more entries it starts to eat up RAM, and I'm seeing a lot of files (lots of duplicates too) open/locked in Process Explorer. So I've attempted to implement a caching system where only 30 images are loaded at a time, and ones no longer needed are unloaded while new ones are loaded.
It doesn't work. It ends up loading multiple copies of each file somehow, and it just feels... hacky. I've spent the past few days looking for an event or something that I can bind a method to in order to do this, but I can't find one, so I've had to use GetNthObject in AbstractVirtualListDataSource.
Anyways, here's my code:
public override object GetNthObject(int n) {
VideoInfo p = (VideoInfo)this.Objects[n % this.Objects.Count];
p.ID = n;
int storeBufferHalf = 5;
int storeFrom = (n - storeBufferHalf < 0) ? 0 : n - storeBufferHalf;
int storeTo = (n + storeBufferHalf >= Objects.Count()) ? Objects.Count() - 1 : n + storeBufferHalf;
foreach (int cacheItem in cacheList.ToList()) {
if (cacheItem >= storeFrom && cacheItem <= storeTo)
continue;
VideoInfo unloadItem = (VideoInfo)this.Objects[cacheItem];
//Debug.WriteLine(cacheItem + " Preparing to delete cache: " + unloadItem.Name);
unloadItem.DestroyPreviewImage();
cacheList.Remove(cacheItem);
}
//Load up items into cache.
for (int i = storeFrom; i < storeTo; i++) {
if (!cacheList.Contains(i)) {
VideoInfo loadItem = (VideoInfo)this.Objects[i];
if (loadItem.PreviewImage != null)
continue;
loadItem.SetPreviewImage();
cacheList.Add(i);
}
}
return p;
}
Some more information: Basically, it kind of works... It does load multiple copies of each file, it does end up loading more images than it should after you scroll down a bit (The entire test DB of 60 items, actually), BUT scrolling the list up and down a few times gets it to unload the files (At least according to Process Explorer).
After that, it starts loading them all up again, some of them multiple times...
I think you need to group/split your input source list.
Consider using this answer as a means to split an IEnumerable into n parts, then use a button to get the next/last section and set that as your binding source.
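For illustration only, here is a rough sketch of the splitting idea. The SplitIntoChunks helper and the usage names below are mine (not the code from the linked answer), and ObjectListView's SetObjects is used just as an example sink:

using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Split a sequence into consecutive chunks of at most `size` elements.
    public static IEnumerable<List<T>> SplitIntoChunks<T>(this IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        if (bucket.Count > 0)
            yield return bucket;
    }
}

// Usage sketch (hypothetical names): page through 30-item sections with a button.
// var pages = allVideos.SplitIntoChunks(30).ToList();
// objectListView.SetObjects(pages[currentPageIndex]);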
I have a list of 500000 randomly generated Tuple<long,long,string> objects on which I am performing a simple "between" search:
var data = new List<Tuple<long,long,string>>(500000);
...
var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
When I generate my random array and run my search for 100 randomly generated values of x, the searches complete in about four seconds. Knowing of the great wonders that sorting does to searching, however, I decided to sort my data - first by Item1, then by Item2, and finally by Item3 - before running my 100 searches. I expected the sorted version to perform a little faster because of branch prediction: my thinking has been that once we get to the point where Item1 == x, all further checks of t.Item1 <= x would predict the branch correctly as "no take", speeding up the tail portion of the search. Much to my surprise, the searches took twice as long on a sorted array!
I tried switching around the order in which I ran my experiments, and used a different seed for the random number generator, but the effect was the same: searches in an unsorted array ran nearly twice as fast as searches in the same array once it was sorted!
Does anyone have a good explanation of this strange effect? The source code of my tests follows; I am using .NET 4.0.
private const int TotalCount = 500000;
private const int TotalQueries = 100;
private static long NextLong(Random r) {
var data = new byte[8];
r.NextBytes(data);
return BitConverter.ToInt64(data, 0);
}
private class TupleComparer : IComparer<Tuple<long,long,string>> {
public int Compare(Tuple<long,long,string> x, Tuple<long,long,string> y) {
var res = x.Item1.CompareTo(y.Item1);
if (res != 0) return res;
res = x.Item2.CompareTo(y.Item2);
return (res != 0) ? res : String.CompareOrdinal(x.Item3, y.Item3);
}
}
static void Test(bool doSort) {
var data = new List<Tuple<long,long,string>>(TotalCount);
var random = new Random(1000000007);
var sw = new Stopwatch();
sw.Start();
for (var i = 0 ; i != TotalCount ; i++) {
var a = NextLong(random);
var b = NextLong(random);
if (a > b) {
var tmp = a;
a = b;
b = tmp;
}
var s = string.Format("{0}-{1}", a, b);
data.Add(Tuple.Create(a, b, s));
}
sw.Stop();
if (doSort) {
data.Sort(new TupleComparer());
}
Console.WriteLine("Populated in {0}", sw.Elapsed);
sw.Reset();
var total = 0L;
sw.Start();
for (var i = 0 ; i != TotalQueries ; i++) {
var x = NextLong(random);
var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
total += cnt;
}
sw.Stop();
Console.WriteLine("Found {0} matches in {1} ({2})", total, sw.Elapsed, doSort ? "Sorted" : "Unsorted");
}
static void Main() {
Test(false);
Test(true);
Test(false);
Test(true);
}
Populated in 00:00:01.3176257
Found 15614281 matches in 00:00:04.2463478 (Unsorted)
Populated in 00:00:01.3345087
Found 15614281 matches in 00:00:08.5393730 (Sorted)
Populated in 00:00:01.3665681
Found 15614281 matches in 00:00:04.1796578 (Unsorted)
Populated in 00:00:01.3326378
Found 15614281 matches in 00:00:08.6027886 (Sorted)
When you are using the unsorted list all tuples are accessed in memory-order. They have been allocated consecutively in RAM. CPUs love accessing memory sequentially because they can speculatively request the next cache line so it will always be present when needed.
When you are sorting the list you put it into random order because your sort keys are randomly generated. This means that the memory accesses to tuple members are unpredictable. The CPU cannot prefetch memory and almost every access to a tuple is a cache miss.
This is a nice example for a specific advantage of GC memory management: data structures which have been allocated together and are used together perform very nicely. They have great locality of reference.
The penalty from cache misses outweighs the saved branch prediction penalty in this case.
Try switching to a struct-tuple. This will restore performance because no pointer-dereference needs to occur at runtime to access tuple members.
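As a rough sketch of that suggestion (the struct name and layout are mine, not from the question):

// A plain struct keeps Item1/Item2 inline in the List's backing array,
// so the filter does not have to dereference a Tuple object per element.
public struct RangeEntry
{
    public long Item1;
    public long Item2;
    public string Item3; // still a reference, but the filter never touches it
}

// data becomes List<RangeEntry>; the query keeps the same shape:
// var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);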
Chris Sinclair notes in the comments that "for TotalCount around 10,000 or less, the sorted version does perform faster". This is because a small list fits entirely into the CPU cache. The memory accesses might be unpredictable but the target is always in cache. I believe there is still a small penalty because even a load from cache takes some cycles. But that seems not to be a problem because the CPU can juggle multiple outstanding loads, thereby increasing throughput. Whenever the CPU hits a wait for memory it will still speed ahead in the instruction stream to queue as many memory operations as it can. This technique is used to hide latency.
This kind of behavior shows how hard it is to predict performance on modern CPUs. The fact that we are only 2x slower when going from sequential to random memory access tells me how much is going on under the covers to hide memory latency. A memory access can stall the CPU for 50-200 cycles. Given that number, one could expect the program to become >10x slower when introducing random memory accesses.
LINQ doesn't know whether your list is sorted or not.
Since Count with a predicate parameter is an extension method for all IEnumerables, it doesn't even know whether it's running over a collection with efficient random access. So it simply checks every element, and Usr explained why performance got lower.
To exploit the performance benefits of a sorted array (such as binary search), you'll have to do a little more coding.
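For example, once the list is sorted by Item1, you can binary-search for the end of the prefix with Item1 <= x and only scan that prefix for the Item2 condition. A minimal sketch under that assumption (the helper name is mine):

// Assumes data is sorted ascending by Item1.
// Returns the number of tuples with Item1 <= x and Item2 >= x.
private static int CountMatchesSorted(List<Tuple<long, long, string>> data, long x)
{
    // Upper bound: first index whose Item1 is greater than x.
    int lo = 0, hi = data.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (data[mid].Item1 <= x) lo = mid + 1;
        else hi = mid;
    }

    // Only the prefix [0, lo) can satisfy Item1 <= x; check Item2 there.
    int count = 0;
    for (int i = 0; i < lo; i++)
        if (data[i].Item2 >= x) count++;
    return count;
}

Whether this beats the linear scan depends on how large the qualifying prefix is; with this data a large prefix typically still has to be scanned, so the win may be modest.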
I am running a test comparing fetch times between AppFabric and SQL Server 2008, and it looks like AppFabric is performing about 4x slower than SQL Server.
I have a SQL Server 2008 setup which contains only one table with 4 columns (all nvarchar). The table has 6000 rows. I insert the same rows (as CLR-serializable objects) into the AppFabric cache. I am running a loop to fetch the data x times.
Here is the code
public class AppFabricCache
{
readonly DataCache myDefaultCache;
public AppFabricCache()
{
//-------------------------
// Configure Cache Client
//-------------------------
//Define Array for 1 Cache Host
var servers = new List<DataCacheServerEndpoint>(1);
//Specify Cache Host Details
// Parameter 1 = host name
// Parameter 2 = cache port number
servers.Add(new DataCacheServerEndpoint(#"localhost", 22233));
//Create cache configuration
var configuration = new DataCacheFactoryConfiguration();
//Set the cache host(s)
configuration.Servers = servers;
//Set default properties for local cache (local cache disabled)
configuration.LocalCacheProperties = new DataCacheLocalCacheProperties();
//Disable exception messages since this sample works on a cache aside
DataCacheClientLogManager.ChangeLogLevel(System.Diagnostics.TraceLevel.Off);
//Pass configuration settings to cacheFactory constructor
DataCacheFactory myCacheFactory = new DataCacheFactory(configuration);
//Get reference to named cache called "default"
myDefaultCache = myCacheFactory.GetCache("default");
}
public bool TryGetCachedObject(string key, out object value)
{
value = myDefaultCache.Get(key);
bool result = value != null;
return result;
}
public void PutItemIntoCache(string key, object value)
{
myDefaultCache.Put(key, value, TimeSpan.FromDays(365));
}
}
And here is the loop to fetch data from the cache
public double RunReadStressTest(int numberOfIterations, out int recordReadCount)
{
recordReadCount = 0;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < numberOfIterations; i++)
{
for (int j = 1; j <= 6000; j++)
{
string posId = "PosId-" + j;
try
{
object value;
if (TryGetCachedObject(posId, out value))
recordReadCount++;
}
catch (Exception e)
{
Trace.WriteLine("AS%% - Exception - " + e.Message);
}
}
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
}
I have exactly the same logic to retrieve data from SQL Server. It creates a
sqlCommand = 'Select * from TableName where posId = 'someId''
Here are the results...
SQL Server 2008 R2      Reading-1 (ms)   Reading-2 (ms)   Reading-3 (ms)   Average Time (s)
Iteration Count = 5     2528             2649             2665             2.614
Iteration Count = 10    5280             5445             5343             5.356
Iteration Count = 15    7978             8370             7800             8.049333333
Iteration Count = 20    9277             9643             10220            9.713333333

AppFabric               Reading-1 (ms)   Reading-2 (ms)   Reading-3 (ms)   Average Time (s)
Iteration Count = 5     10301            10160            10186            10.21566667
Iteration Count = 10    20130            20191            20650            20.32366667
Iteration Count = 15    30747            30571            30647            30.655
Iteration Count = 20    40448            40541            40503            40.49733333
Am I missing something here? Why is it so slow?
The difference is the network overhead. In your SQL example, you hop over the network once and select N rows. In your AppFabric example, you hop over the network PER RECORD instead of in bulk. This is the difference. To prove this, temporarily store your records in AppFabric as a List and get just the list one time, or use the AppFabric Bulk API to select them all in one request - that should account for much of the difference.
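A rough sketch of the "store the records as a single List and fetch it once" experiment, using only the Put and Get calls already shown in the question (PosRecord and the "AllPositions" key are placeholder names of mine):

// One network hop to write the whole data set under a single key...
var allRecords = new List<PosRecord>();                     // PosRecord stands in for your cached type (must be serializable)
for (int j = 1; j <= 6000; j++)
    allRecords.Add(new PosRecord { PosId = "PosId-" + j }); // plus whatever other fields you cache
myDefaultCache.Put("AllPositions", allRecords, TimeSpan.FromDays(365));

// ...and one hop to read everything back, instead of 6000 Get calls.
var cached = (List<PosRecord>)myDefaultCache.Get("AllPositions");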
This may be caused by .NET's built-in serialisation.
.NET serialisation relies on reflection, which has very poor performance. I'd recommend looking into the use of custom written serialisation code.
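As one hedged example of what custom serialisation code can look like, the cached type could implement the standard ISerializable hooks so each field is written explicitly instead of being discovered by reflection over the whole object graph (the PosRecord type and its fields are placeholders, not your actual class):

using System;
using System.Runtime.Serialization;

[Serializable]
public class PosRecord : ISerializable
{
    public string PosId { get; set; }
    public string Content { get; set; }

    public PosRecord() { }

    // Write each field explicitly instead of letting the formatter reflect over the type.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("PosId", PosId);
        info.AddValue("Content", Content);
    }

    // Matching deserialization constructor.
    protected PosRecord(SerializationInfo info, StreamingContext context)
    {
        PosId = info.GetString("PosId");
        Content = info.GetString("Content");
    }
}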
I think your test is biased and your results are suboptimal.
About Distributed Cache
Local Cache: you have disabled the local cache feature, so cache objects are always retrieved from the server; the network transfer and deserialization have a cost.
BulkGet: BulkGet improves performance when used with small objects, for example when retrieving many objects of 1-5 KB or less in size.
No Data Compression: there is no compression between AppFabric and the cache clients. Check this.
About Your test
Another important thing is that you are not testing the same thing: on one side you test a single SELECT * and on the other side you test N x GET ITEM.
I'm using MongoDB 1.8.2 (Debian) and mongo-csharp-driver 1.1.0.4184 (IIS 7.5/.Net 4.0 x64).
Multiple items are inserted every second into an existing collection with ~3,000,000 objects (~1.9 GB).
The web server's memory increases by ~1 MB after every insert, which very quickly leads to > 2 GB of memory usage.
The memory is never released, and only an application pool recycle can free it.
Any ideas?
MongoServer server = MongoServer.Create(mongoDBConnectionString);
MongoDatabase database = server.GetDatabase(dbName);
MongoCollection<Result> resultCollection = database.GetCollection<Result>("Result");
resultCollection.Insert(result);
result looks like
private class Result
{
public ObjectId _id { get; set; }
public DateTime Timestamp { get; set; }
public int Location { get; set; }
public string Content { get; set; }
}
UPDATE:
My problem is not the insert, it's the select - sorry weird codebase to investigate ;-)
I reproduced it with this sample Code:
Console.WriteLine("Start - " + GC.GetTotalMemory(false).ToString("#,###,##0") + " Bytes");
for (int i = 0; i < 10; i++)
{
MongoServer server = MongoServer.Create(mongoDBConnectionString);
MongoDatabase database = server.GetDatabase(dbName);
MongoCollection<Result> resultCollection = database.GetCollection<Result>("Result");
var query = Query.And ( Query.EQ("Location", 1), Query.GTE("Timestamp", DateTime.Now.AddDays(-90)) );
MongoCursor<Result> cursor = resultCollection.FindAs<Result>(query);
foreach (Result result in cursor)
{
// NOOP
}
Console.WriteLine(i + " - " + GC.GetTotalMemory(false).ToString("#,###,##0") + " Bytes");
}
Output from a .Net 4.0 Console Application with 10.000 results in the cursor:
Start - 193.060 Bytes
0 - 12.736.588 Bytes
1 - 24.331.600 Bytes
2 - 16.180.484 Bytes
3 - 13.223.036 Bytes
4 - 30.974.892 Bytes
5 - 13.335.236 Bytes
6 - 13.439.448 Bytes
7 - 13.942.436 Bytes
8 - 14.026.108 Bytes
9 - 14.113.352 Bytes
Output from a .Net 4.0 Web Application with the same 10.000 results in the cursor:
Start - 5.258.376 Bytes
0 - 20.677.816 Bytes
1 - 29.893.880 Bytes
2 - 43.783.016 Bytes
3 - 20.921.280 Bytes
4 - 34.814.088 Bytes
5 - 48.698.704 Bytes
6 - 62.576.480 Bytes
7 - 76.453.728 Bytes
8 - 90.347.360 Bytes
9 - 104.232.800 Bytes
RESULT:
The bug was reported to 10gen and they will fix it in version >= 1.4 (they are currently working on 1.2)!!!
According to the documentation you should create one instance of MongoServer per server you connect to:
MongoServer class
This class serves as the root object for working with a MongoDB
server. You will create one instance of this class for each server you
connect to.
So you need only one instance of MongoServer; this should solve your memory leak issue.
I suggest using dependency injection (Unity, StructureMap, ...) to register the MongoServer instance in the container as a singleton, or just use the classic singleton pattern for MongoServer (see the sketch below). If you are new to dependency injection, take a look at Martin Fowler's article.
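A minimal sketch of the classic-singleton option, reusing the MongoServer.Create call from the question (the class name and the connection string literal are placeholders of mine):

using System;
using MongoDB.Driver;

public static class MongoServerSingleton
{
    // Created lazily, once per process; every caller shares the same MongoServer.
    private static readonly Lazy<MongoServer> _server =
        new Lazy<MongoServer>(() => MongoServer.Create("mongodb://localhost:27017")); // placeholder connection string

    public static MongoServer Instance
    {
        get { return _server.Value; }
    }
}

// Usage: per-request code only asks for databases/collections.
// var database = MongoServerSingleton.Instance.GetDatabase(dbName);
// var resultCollection = database.GetCollection<Result>("Result");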
Update:
Probably the issue was not in the MongoDB C# driver. I did the following testing:
Windows 7, IIS 7.5, same C# driver, same MongoDB:
for (int i = 0; i < 3000000; i++)
{
MongoServer server = MongoServer.Create("mongodb://localhost:27020");
MongoDatabase database = server.GetDatabase("mpower_read");
MongoCollection<Result> resultCollection =
database.GetCollection<Result>("results");
resultCollection.Insert(new Result()
{
_id = ObjectId.GenerateNewId(),
Content = i.ToString(),
Location = i,
Timestamp = DateTime.Now
});
}
I even ran this test 3 times and the server did not eat memory at all. So...