I'm using MongoDB 1.8.2 (Debian) and mongo-csharp-driver 1.1.0.4184 (IIS 7.5/.Net 4.0 x64).
Multiple items are inserted every second into an existing collection with ~3,000,000 objects (~1.9 GB).
The web server's memory grows by ~1 MB after every insert, which very quickly leads to more than 2 GB of memory usage.
The memory is never released; only an application pool recycle can free it.
Any ideas?
MongoServer server = MongoServer.Create(mongoDBConnectionString);
MongoDatabase database = server.GetDatabase(dbName);
MongoCollection<Result> resultCollection = database.GetCollection<Result>("Result");
resultCollection.Insert(result);
result looks like this:
private class Result
{
public ObjectId _id { get; set; }
public DateTime Timestamp { get; set; }
public int Location { get; set; }
public string Content { get; set; }
}
UPDATE:
My problem is not the insert, it's the select - sorry, it's a weird codebase to investigate ;-)
I reproduced it with this sample code:
Console.WriteLine("Start - " + GC.GetTotalMemory(false).ToString("#,###,##0") + " Bytes");
for (int i = 0; i < 10; i++)
{
MongoServer server = MongoServer.Create(mongoDBConnectionString);
MongoDatabase database = server.GetDatabase(dbName);
MongoCollection<Result> resultCollection = database.GetCollection<Result>("Result");
var query = Query.And ( Query.EQ("Location", 1), Query.GTE("Timestamp", DateTime.Now.AddDays(-90)) );
MongoCursor<Result> cursor = resultCollection.FindAs<Result>(query);
foreach (Result result in cursor)
{
// NOOP
}
Console.WriteLine(i + " - " + GC.GetTotalMemory(false).ToString("#,###,##0") + " Bytes");
}
Output from a .Net 4.0 Console Application with 10.000 results in the cursor:
Start - 193.060 Bytes
0 - 12.736.588 Bytes
1 - 24.331.600 Bytes
2 - 16.180.484 Bytes
3 - 13.223.036 Bytes
4 - 30.974.892 Bytes
5 - 13.335.236 Bytes
6 - 13.439.448 Bytes
7 - 13.942.436 Bytes
8 - 14.026.108 Bytes
9 - 14.113.352 Bytes
Output from a .Net 4.0 Web Application with the same 10.000 results in the cursor:
Start - 5.258.376 Bytes
0 - 20.677.816 Bytes
1 - 29.893.880 Bytes
2 - 43.783.016 Bytes
3 - 20.921.280 Bytes
4 - 34.814.088 Bytes
5 - 48.698.704 Bytes
6 - 62.576.480 Bytes
7 - 76.453.728 Bytes
8 - 90.347.360 Bytes
9 - 104.232.800 Bytes
RESULT:
The bug was reported to 10gen and they will fix it in a version >= 1.4 (they are currently working on 1.2)!
According to the documentation, you should create one instance of MongoServer for each server you connect to:
MongoServer class
This class serves as the root object for working with a MongoDB
server. You will create one instance of this class for each server you
connect to.
So you need only one instance of MongoServer; this should solve your memory leak issue.
I suggest using dependency injection (Unity, StructureMap, ...) to register the MongoServer instance in the container as a singleton, or just using the classic singleton pattern for MongoServer. If you are new to dependency injection, you can take a look at Martin Fowler's article.
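A minimal sketch of the classic singleton approach, assuming the 1.x driver API used in the question (the connection string is a placeholder):
public static class MongoServerProvider
{
    // One MongoServer per MongoDB server, shared by all requests.
    private static readonly MongoServer instance =
        MongoServer.Create("mongodb://localhost:27017"); // placeholder connection string

    public static MongoServer Instance
    {
        get { return instance; }
    }
}
Then every request reuses the same instance instead of calling MongoServer.Create again:
MongoDatabase database = MongoServerProvider.Instance.GetDatabase(dbName);
MongoCollection<Result> resultCollection = database.GetCollection<Result>("Result");
resultCollection.Insert(result);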
Update:
Probably the issue was not in the MongoDB C# driver. I did the following test:
Windows 7, IIS 7.5, same C# driver, same MongoDB:
for (int i = 0; i < 3000000; i++)
{
MongoServer server = MongoServer.Create("mongodb://localhost:27020");
MongoDatabase database = server.GetDatabase("mpower_read");
MongoCollection<Result> resultCollection =
database.GetCollection<Result>("results");
resultCollection.Insert(new Result()
{
_id = ObjectId.GenerateNewId(),
Content = i.ToString(),
Location = i,
Timestamp = DateTime.Now
});
}
I even ran this test 3 times and the server did not eat memory at all. So...
I am trying to read a huge file (mixed binary/text) using a memory-mapped file.
However, at a certain point in my loop iteration it just throws an access exception; it doesn't say anything about what kind of exception it is, why it couldn't read, etc. I've been trying to figure it out for a few hours but can't reach any conclusion.
Here's the code I am using to read it:
//numOfColors = 6
private static void ReadChunksFromLargeFile(int offsetToBegin, string fName, int numOfColors)
{
var idx = offsetToBegin;
int byteSizeForEachColor = (int)new FileInfo(fName).Length/ numOfColors;
var buffer = new byte[byteSizeForEachColor];
using (var mmf = MemoryMappedFile.CreateFromFile(fName))
{
for(int i=0; i < numOfColors; i++)
{
//numOfColors = 6
using (var view = mmf.CreateViewStream(idx, byteSizeForEachColor, MemoryMappedFileAccess.Read))
{
view.Seek(idx, SeekOrigin.Begin);
view.Read(buffer, 0, byteSizeForEachColor);
var temp = ByteArrayToHexString(buffer);
File.WriteAllText($#"C:\test\buffertest{i}.hex", temp);
}
idx += byteSizeForEachColor;
}
}
}
EDIT: offsetToBegin is 937
What I'm trying to do is read huge chunks based on a size I need. However, when it gets to i = 5 it just throws the exception.
The file I'm trying to read is this one: https://drive.google.com/file/d/1DsLaNnAOQDyWJ_g4PPNXGCNfbuirs_Ss/view?usp=sharing
Any input is appreciated. Thanks!
Your calculations are wrong. When you calculate the size of each color you are not taking the offset you want to skip into account, so when you call CreateViewStream and tell it to skip the offset, the last view tries to read too many bytes, causing the access exception.
For example:
Filesize = 60 bytes
offset = 2 bytes
num of colors = 6
Your original calculation would result in:
byteSizeForEachColor = 10
So your loop will skip the first 2 bytes and then read 10 bytes for each color, but when it comes to the last color it has already gone past the end of the file:
2 + 5 x 10 = 52 // start of the last view
52 + 10 = 62 // The file is only 60 bytes long - it has gone too far
You need to subtract the offsetToBegin from the calculated size to ensure it only reads the correct number of bytes.
Using the above values:
byteSizeForEachColor = (60 / 6) - 2 = 8
So it should only read 8 bytes for each color. You need to change your code to:
using (var view = mmf.CreateViewStream(idx, byteSizeForEachColor - offsetToBegin, MemoryMappedFileAccess.Read))
{
...
}
Now each loop will skip 2 bytes and read 8 - which will not cause it to go beyond the length of the file.
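For reference, a self-contained sketch of the whole method with the chunk size computed so that the offset plus all chunks never passes the end of the file. Note this sizes chunks as (length - offset) / numOfColors rather than the per-call subtraction above; ByteArrayToHexString and the output path are assumed from the question, and the namespaces System.IO and System.IO.MemoryMappedFiles are required:
private static void ReadChunksFromLargeFile(int offsetToBegin, string fName, int numOfColors)
{
    // Divide only the bytes after the offset among the colors.
    int byteSizeForEachColor = (int)((new FileInfo(fName).Length - offsetToBegin) / numOfColors);
    var buffer = new byte[byteSizeForEachColor];
    long idx = offsetToBegin;

    using (var mmf = MemoryMappedFile.CreateFromFile(fName))
    {
        for (int i = 0; i < numOfColors; i++)
        {
            // The view already starts at idx, so read from position 0 inside it
            // (the original Seek(idx, ...) would skip data a second time).
            using (var view = mmf.CreateViewStream(idx, byteSizeForEachColor, MemoryMappedFileAccess.Read))
            {
                view.Read(buffer, 0, byteSizeForEachColor);
                File.WriteAllText($@"C:\test\buffertest{i}.hex", ByteArrayToHexString(buffer));
            }
            idx += byteSizeForEachColor;
        }
    }
}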
I am performing a scroll search on Elasticsearch 5.3.0 via the C# NEST API. searchResponse.Total initially reports 12522 hits. I fetch 100 documents in each scroll request, and the number of fetched documents increases until approximately 11300. After this, however, the searchResponse.Total value suddenly starts to decrease, to 7437, then 4000, until it reaches 0 or a value lower than the number of documents already read.
The code I am using is as follows:
List<ClusterJob> clusterJobData = new List<ClusterJob>();
var counter = 0;
var searchResponse = client.Search<ClusterJob>(search => search
    .Index("privateIndex")
    .Type("privateType")
    .Size(100)
    .Sort(ss => ss.Ascending(ff => ff.Request.RequestedTimeStamp))
    .Scroll("5m"));

while (searchResponse.Documents.Any())
{
    if (searchResponse.Documents.Count > 0)
    {
        clusterJobData.AddRange(searchResponse.Documents);
    }
    counter += searchResponse.Documents.Count;

    var results = searchResponse;
    var retryCount = 0;
    do
    {
        searchResponse = client.Scroll<ClusterJob>("1m", results.ScrollId);
        Task.Delay(100).Wait(); // brief pause between attempts; Task.Delay alone is not awaited and returns immediately
        retryCount++;
        // Retry up to 3 times if the request fails
    } while (retryCount < 3 && !searchResponse.IsValid);

    if (!searchResponse.IsValid)
    {
        Console.WriteLine("Failed to fetch Data");
    }
    Console.WriteLine("Lines Read : " + counter + "/" + searchResponse.Total);
}
My network speed is sufficiently good and I am deliberately keeping the search context open for a long time. Any hints as to this unusual behavior of searchResponse.Total decreasing while the code runs? Is this possibly a bug? I came across an old bug report but am unsure whether the issue still persists: ElasticSearch Issue
Some information on my ES configuration, if it helps:
Distributed across a cluster of 5 VMs (5 nodes)
14 GB heap space allocated to each instance
Version: 5.3.0
64-bit, running as a service on each VM
I am working on a hobby project (a simple, efficient datastore). My current concern is the performance of reading data from disk (binary) and populating my objects.
My goal is to create a simple store optimized for read performance (for mobile) that is much faster than reading from a SQL database or CSV.
After profiling the application, I found that when I read data from disk (~1000 records in 240 ms), most of the time is spent in the method set(byte[]):
// data layout:
// strings are stored as their UTF-8 representation in a byte array
// within a "row", the first two bytes contain the length in bytes of the string data
// my data store also supports other types (which are much faster) - not shown below.
class myObject : IRow
{
public string Name;
public string Path;
// and so on
public void set(byte[] row_buffer)
{
int offset = 0;
short strLength = 0;
// Name - variable about 40 bytes
strLength = BitConverter.ToInt16(row_buffer, offset);
offset += 2;
Name = Encoding.UTF8.GetString(row_buffer, offset, strLength);
offset += strLength;
// Path - variable about 150 bytes
strLength = BitConverter.ToInt16(row_buffer, offset);
offset += 2;
Path = Encoding.UTF8.GetString(row_buffer, offset, strLength);
offset += strLength;
// and so on
}
}
Further remarks:
The data is read as binary from disk.
For each row in the file, a new object is created and the function set(row_buffer) is called.
Reading the stream into the row_buffer (using br.Read(row_Buffer, 0, rowLengths[i])) consumes ~10% of the time.
Converting the bytes to strings (GetString) consumes about 88% of the time.
-> I don't understand why creating strings is so expensive :(
Any idea how I can improve the performance? I am limited to "safe" C# code only.
Thanks for reading.
EDIT
I need to create the objects to run my LINQ queries. I would like to defer object creation but have failed to find a way at this stage. See my other SO question: Implement Linq query on byte[] for my own type
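One direction that might help with deferral (just a sketch under the same row layout, with illustrative field names, not a drop-in replacement for the IRow contract): keep the row buffer and the string offsets, and only call GetString when a property is actually read, so fields your LINQ query never touches never pay the decoding cost. Requires System and System.Text.
class LazyRow
{
    private readonly byte[] buffer;
    private readonly int nameOffset, nameLength;
    private readonly int pathOffset, pathLength;
    private string name, path;

    public LazyRow(byte[] rowBuffer)
    {
        buffer = rowBuffer;
        int offset = 0;

        // Same layout as above: 2-byte length prefix, then the UTF-8 bytes.
        nameLength = BitConverter.ToInt16(buffer, offset);
        nameOffset = offset + 2;
        offset = nameOffset + nameLength;

        pathLength = BitConverter.ToInt16(buffer, offset);
        pathOffset = offset + 2;
    }

    // Strings are materialized only on first access.
    public string Name
    {
        get { return name ?? (name = Encoding.UTF8.GetString(buffer, nameOffset, nameLength)); }
    }

    public string Path
    {
        get { return path ?? (path = Encoding.UTF8.GetString(buffer, pathOffset, pathLength)); }
    }
}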
I have limited the size of the content database to 200 GB. I have used the following code to get the size of a content database:
SPWebApplication elevatedWebApp = spWeb.Site.WebApplication;
SPContentDatabaseCollection dbCollection = elevatedWebApp.ContentDatabases;
//Guid id = dbCollection[0].Id;
Guid lastContentDbGuid = new Guid(contentDbGuid);
//SPContentDatabase lastDatabase = dbCollection[dbCollection.Count - 1];
SPContentDatabase lastDatabase = dbCollection[lastContentDbGuid];
ulong dbSize = lastDatabase.DiskSizeRequired;
contentDbMB = ConvertToMegabytes(dbSize);
static string ConvertToMegabytes(ulong bytes)
{
return ((decimal)bytes / 1024M / 1024M).ToString("F1") + "MB";
}
Similarly, I want to limit a site collection to 100 GB, so I am using the following code:
long bytes = spSite.Usage.Storage;
double siteCollectionSizeInMB = ConvertBytesToMegabytes(bytes);
static double ConvertBytesToMegabytes(long bytes)
{
return (bytes / 1024f) / 1024f;
}
I need the size of the site collection using C# only. Am I doing it the right way? If I am doing anything wrong, please guide me. If anyone has a different solution, please share.
The question is answered here http://social.msdn.microsoft.com/forums/en-us/sharepointdevelopment/thread/0d066e9b-f6b9-49bc-b741-fcf7abdc854b
Look at the two solutions here: there's both a PowerShell and a C# solution. The C# solution to get the collection size is:
SPWebService service = SPFarm.Local.Services.GetValue<SPWebService>();
foreach (SPWebApplication webapp in service.WebApplications)
{
Console.WriteLine("WebApplication : " + webapp.Name);
foreach (SPContentDatabase db in webapp.ContentDatabases)
{
Console.WriteLine("{0}, Nb Sites : {1}, Size : {2}, ", db.Name, db.Sites.Count, db.DiskSizeRequired);
}
}
You could use the following to get the total DB size etc. (taken from here):
Server srv = new Server();
Database db = srv.Databases["MyDatabase"];
// Display size and space information for the database.
Console.WriteLine("data space usage (KB): " + db.DataSpaceUsage.ToString());
Console.WriteLine("index space usage (KB): " + db.IndexSpaceUsage.ToString());
Console.WriteLine("space available (KB): " + db.SpaceAvailable.ToString());
Console.WriteLine("database size (MB): " + db.Size.ToString());
C# uses essentially the same structure as PowerShell in this case, and a good example can be found here.
using (SPSite site = new SPSite("http://yoursitehere.com/site"))
{
    SPSite.UsageInfo usageInfo = site.Usage;
    long storageReq = usageInfo.Storage;
}
If you need to iterate through each site in a content database, just call the content database object and reference its .Sites property, as in the sketch below.
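A small sketch of that, assuming db is an SPContentDatabase obtained as in the loop above, summing the storage of each site collection it contains:
ulong totalSiteStorage = 0;
foreach (SPSite site in db.Sites)
{
    try
    {
        // Usage.Storage is the number of bytes used by this site collection.
        totalSiteStorage += (ulong)site.Usage.Storage;
    }
    finally
    {
        // SPSite objects obtained this way must be disposed explicitly.
        site.Dispose();
    }
}
Console.WriteLine("Total site collection storage (MB): " + (totalSiteStorage / 1024 / 1024));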
I am running a test comparing fetch time between AppFabric and SQL Server 2008, and it looks like AppFabric is performing 4x slower than SQL Server.
I have a SQL Server 2008 setup which contains only one table with 4 columns (all nvarchar). The table has 6000 rows. I insert the same data (as CLR-serializable objects) into the AppFabric cache and run a loop to fetch the data x times.
Here is the code:
public class AppFabricCache
{
readonly DataCache myDefaultCache;
public AppFabricCache()
{
//-------------------------
// Configure Cache Client
//-------------------------
//Define Array for 1 Cache Host
var servers = new List<DataCacheServerEndpoint>(1);
//Specify Cache Host Details
// Parameter 1 = host name
// Parameter 2 = cache port number
servers.Add(new DataCacheServerEndpoint(#"localhost", 22233));
//Create cache configuration
var configuration = new DataCacheFactoryConfiguration();
//Set the cache host(s)
configuration.Servers = servers;
//Set default properties for local cache (local cache disabled)
configuration.LocalCacheProperties = new DataCacheLocalCacheProperties();
//Disable exception messages since this sample works on a cache aside
DataCacheClientLogManager.ChangeLogLevel(System.Diagnostics.TraceLevel.Off);
//Pass configuration settings to cacheFactory constructor
DataCacheFactory myCacheFactory = new DataCacheFactory(configuration);
//Get reference to named cache called "default"
myDefaultCache = myCacheFactory.GetCache("default");
}
public bool TryGetCachedObject(string key, out object value)
{
value = myDefaultCache.Get(key);
bool result = value != null;
return result;
}
public void PutItemIntoCache(string key, object value)
{
myDefaultCache.Put(key, value, TimeSpan.FromDays(365));
}
}
And here is the loop to fetch data from the cache
public double RunReadStressTest(int numberOfIterations, out int recordReadCount)
{
recordReadCount = 0;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < numberOfIterations; i++)
{
for (int j = 1; j <= 6000; j++)
{
string posId = "PosId-" + j;
try
{
object value;
if (TryGetCachedObject(posId, out value))
recordReadCount++;
}
catch (Exception e)
{
Trace.WriteLine("AS%% - Exception - " + e.Message);
}
}
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
}
I have exactly the same logic to retrieve data from SQL Server. It creates a
sqlCommand = 'Select * from TableName where posId = 'someId''
Here are the results...
SQL Server 2008 R2 Reading-1(ms) Reading-2(ms) Reading-3(ms) Average Time in Seconds
Iteration Count = 5 2528 2649 2665 2.614
Iteration Count = 10 5280 5445 5343 5.356
Iteration Count = 15 7978 8370 7800 8.049333333
Iteration Count = 20 9277 9643 10220 9.713333333
AppFabric Reading-1(ms) Reading-2(ms) Reading-3(ms) Average Time in Seconds
Iteration Count = 5 10301 10160 10186 10.21566667
Iteration Count = 10 20130 20191 20650 20.32366667
Iteration Count = 15 30747 30571 30647 30.655
Iteration Count = 20 40448 40541 40503 40.49733333
Am I missing something here? Why is it so slow?
The difference is the network overhead. In your SQL example, you hop over the network once and select N rows. In your AppFabric example, you hop over the network PER RECORD instead of in bulk. This is the difference. To prove this, temporarily store your records in AppFabric as a List and get just the list one time, or use the AppFabric Bulk API to select them all in one request - that should account for much of the difference.
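A hedged sketch of what the bulk read could look like with the AppFabric client, reusing the myDefaultCache field and the recordReadCount counter from the question (batching and error handling omitted):
// Build the full key list once, then fetch all values in a single request.
var keys = new List<string>();
for (int j = 1; j <= 6000; j++)
{
    keys.Add("PosId-" + j);
}

// DataCache.BulkGet returns the key/value pairs that were found.
foreach (KeyValuePair<string, object> pair in myDefaultCache.BulkGet(keys))
{
    if (pair.Value != null)
        recordReadCount++;
}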
This may be caused by .NET's built-in serialisation.
.NET serialisation utilises reflection, which in turn has very poor performance. I'd recommend looking into using custom-written serialisation code.
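As one hedged example of taking serialisation into your own hands (plain .NET, not AppFabric-specific; the type and field names are illustrative): implement ISerializable on the cached type so the formatter calls your code instead of reflecting over every field. Requires System and System.Runtime.Serialization.
[Serializable]
public class CachedPosition : ISerializable
{
    public string PosId;
    public string Name;

    public CachedPosition() { }

    // Deserialization constructor called by the formatter.
    protected CachedPosition(SerializationInfo info, StreamingContext context)
    {
        PosId = info.GetString("p");
        Name = info.GetString("n");
    }

    // Write only what is needed, with short names, to keep the payload small.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("p", PosId);
        info.AddValue("n", Name);
    }
}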
I think your test is biased and your results are non-optimal.
About Distributed Cache
Local Cache: you have disabled the local cache feature, so cache objects are always retrieved from the server; the network transfer and deserialization have a cost (see the sketch after this answer).
BulkGet: BulkGet improves performance when used with small objects, for example when retrieving many objects of 1-5 KB or less in size.
No Data Compression: there is no compression between AppFabric and cache clients. Check this.
About Your Test
Another important thing is that you are not testing the same thing: on one side you test SELECT * and on the other side you test N x GET ITEM.
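To illustrate the local cache point, a sketch of turning local caching on when building the factory configuration (the values are placeholders, and this assumes the DataCacheLocalCacheProperties overload that takes an object count, a default timeout and an invalidation policy):
var configuration = new DataCacheFactoryConfiguration();
configuration.Servers = servers; // same endpoint list as in the question

// With local caching enabled, repeated Gets for the same key are served
// from the client process instead of going over the network every time.
configuration.LocalCacheProperties = new DataCacheLocalCacheProperties(
    10000,                     // maximum number of locally cached objects (placeholder)
    TimeSpan.FromMinutes(5),   // how long a locally cached copy is considered fresh (placeholder)
    DataCacheLocalCacheInvalidationPolicy.TimeoutBased);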