Batch set data from Dictionary into Redis - c#

I am using the StackExchange.Redis client to insert a dictionary of key/value pairs using a batch, as below:
private static StackExchange.Redis.IDatabase _database;

public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    lock (_database)
    {
        TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
        var list = new List<Task<bool>>();
        var batch = _database.CreateBatch();
        foreach (var item in data)
        {
            string serializedObject = JsonConvert.SerializeObject(item.Value, Formatting.Indented,
                new JsonSerializerSettings { ContractResolver = new SerializeAllContractResolver(), ReferenceLoopHandling = ReferenceLoopHandling.Ignore });
            var task = batch.StringSetAsync(item.Key, serializedObject, expiration);
            list.Add(task);
            serializedObject = null;
        }
        batch.Execute();
        Task.WhenAll(list.ToArray());
    }
}
My problem: it takes around 7 seconds to set just 350 dictionary items.
My question: Is this the right way to set bulk items into Redis or is there a quicker way to do this?
Any help is appreciated. Thanks.

"just" is a very relative term, and doesn't really make sense without more context, in particular: how big are these payloads?
however, to clarify a few points to help you investigate:
there is no need to lock an IDatabase unless that is purely for your own purposes; SE.Redis deals with thread safety internally and is intended to be used by competing threads
at the moment, your timing of this will include all the serialization code (JsonConvert.SerializeObject); this will add up, especially if your objects are big; to get a decent measure, I strongly suggest you time the serialization and redis times separately
the batch.Execute() method uses a pipeline API and does not wait for responses between calls, so: the time you're seeing is not the cumulative effect of latency; that leaves just local CPU (for serialization), network bandwidth, and server CPU; the client library tools can't impact any of those things
there is a StringSet overload that accepts a KeyValuePair<RedisKey, RedisValue>[]; you could choose to use this instead of a batch, but the only difference here is that it is the variadic MSET rather than multiple SET commands; either way, you'll be blocking the connection for other callers for the duration (since the purpose of batch is to make the commands contiguous) - a sketch of this overload follows after this list
you don't actually need to use CreateBatch here, especially since you're locking the database (though again, I suggest you don't need to do that); the purpose of CreateBatch is to make a sequence of commands contiguous, but I don't see that you need this here; you could just call _database.StringSetAsync for each item in turn, which has the added advantage that serialization (CPU bound) runs in parallel with the previous command being sent (IO bound), with no work required beyond deleting the CreateBatch call; it also means you don't monopolize the connection for other callers
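For illustration, here is a minimal sketch of that variadic overload; this is a sketch only, and note that the MSET form takes no expiration, so you'd need to apply the TimeSpan separately (e.g. via KeyExpire) if the keys still need one. It reuses the _redisJsonSettings field shown in the snippet below:

// assumes: using System.Linq;
var pairs = data.Select(item => new KeyValuePair<RedisKey, RedisValue>(
    item.Key,
    JsonConvert.SerializeObject(item.Value, Formatting.Indented, _redisJsonSettings))).ToArray();
// one variadic MSET instead of many SETs; this still occupies the connection for its duration
_database.StringSet(pairs);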
So, the first thing I would do is remove some code:
private static StackExchange.Redis.IDatabase _database;

static JsonSerializerSettings _redisJsonSettings = new JsonSerializerSettings
{
    ContractResolver = new SerializeAllContractResolver(),
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};

public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
    var list = new List<Task<bool>>();
    foreach (var item in data)
    {
        string serializedObject = JsonConvert.SerializeObject(
            item.Value, Formatting.Indented, _redisJsonSettings);
        list.Add(_database.StringSetAsync(item.Key, serializedObject, expiration));
    }
    Task.WhenAll(list.ToArray());
}
The second thing I would do is time the serialization separately from the redis work.
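For example, a rough sketch of splitting the two measurements (hypothetical helper code, using System.Diagnostics.Stopwatch and the _redisJsonSettings field from the snippet above):

var sw = Stopwatch.StartNew();
var payloads = new List<KeyValuePair<string, string>>(data.Count);
foreach (var item in data)
{
    payloads.Add(new KeyValuePair<string, string>(item.Key,
        JsonConvert.SerializeObject(item.Value, Formatting.Indented, _redisJsonSettings)));
}
long serializeMs = sw.ElapsedMilliseconds;

sw.Restart();
var list = new List<Task<bool>>(payloads.Count);
foreach (var pair in payloads)
{
    list.Add(_database.StringSetAsync(pair.Key, pair.Value, expiration));
}
Task.WhenAll(list.ToArray()).Wait(); // wait, so the redis timing includes completion rather than just issuing the commands
long redisMs = sw.ElapsedMilliseconds;
Console.WriteLine($"serialization: {serializeMs}ms, redis: {redisMs}ms");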
The third thing I would do is see if I can serialize to a MemoryStream instead, ideally one that I can re-use - to avoid the string allocation and UTF-8 encode:
var serializer = JsonSerializer.Create(_redisJsonSettings);
serializer.Formatting = Formatting.Indented;
using (var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data
        // JsonConvert.SerializeObject only produces strings, so write to the
        // stream through a JsonSerializer + StreamWriter instead (UTF-8 without BOM)
        using (var writer = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            serializer.Serialize(writer, item.Value);
        }
        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}

This second answer is kinda tangential, but based on the discussion it sounds as though the main cost is serialization:
The object in this context is big with huge infos in string props and many nested classes.
One thing you could do here is not store JSON. JSON is relatively large, and being text-based is relatively expensive to process both for serialization and deserialization. Unless you're using rejson, redis just treats your data as an opaque blob, so it doesn't care what the actual value is. As such, you can use more efficient formats.
I'm hugely biased, but we make use of protobuf-net in our redis storage. protobuf-net is optimized for:
small output (dense binary without redundant information)
fast binary processing (absurdly optimized with contextual IL emit, etc)
good cross-platform support (it implements Google's "protobuf" wire format, which is available on just about every platform)
designed to work well with existing C# code, not just brand new types generated from a .proto schema
I suggest protobuf-net rather than Google's own C# protobuf library because of the last bullet point, meaning: you can use it with the data you already have.
To illustrate why, I'll point to the serializer benchmark comparison at https://aloiskraus.wordpress.com/2017/04/23/the-definitive-serialization-performance-guide/:
Notice in particular that the output size of protobuf-net is half that of Json.NET (reducing the bandwidth cost), and the serialization time is less than one fifth (reducing local CPU cost).
You would need to add some attributes to your model to help protobuf-net out (as per How to convert existing POCO classes in C# to google Protobuf standard POCO), but then this would be just:
using (var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data
        ProtoBuf.Serializer.Serialize(ms, item.Value);
        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}
As you can see, the code change to your redis code is minimal. Obviously you would need to use Deserialize<T> when reading the data back.
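For illustration only, here's what a hypothetical model (standing in for your T) might look like once annotated, plus the read-back side; the type and member names here are made up:

[ProtoContract]
public class CustomerInfo   // hypothetical type standing in for T
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public string Notes { get; set; }
    [ProtoMember(3)] public List<OrderInfo> Orders { get; set; }
}

[ProtoContract]
public class OrderInfo
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public decimal Total { get; set; }
}

// reading a value back and deserializing it:
public T GetObject<T>(string key)
{
    byte[] blob = _database.StringGet(key);
    if (blob == null) return default(T);
    using (var ms = new MemoryStream(blob))
    {
        return ProtoBuf.Serializer.Deserialize<T>(ms);
    }
}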
If your data is dominated by text, you might also consider running the serialization output through GZipStream or DeflateStream; text compresses very well.
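A minimal sketch of that, layering GZipStream over the same re-used MemoryStream (remember to decompress before Deserialize<T> when reading back):

using (var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data
        using (var gzip = new GZipStream(ms, CompressionLevel.Fastest, leaveOpen: true))
        {
            ProtoBuf.Serializer.Serialize(gzip, item.Value);
        } // disposing the GZipStream flushes the compressed data into ms
        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}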

Related

C#. Fastest way to save list of Objects to Mongo

I have several lists of Objects
List<ObjectClass1> ObjectList1;
List<ObjectClass2> ObjectList2;
I would like to write all objects as JSON to Mongo at the end of the test run.
What is the fastest way to do this?
I am currently doing this:
IMongoClient client = new MongoClient();
IMongoDatabase db = client.GetDatabase("MyDB");

db.CreateCollection("ObjectList1");
var ObjectList1Collection = db.GetCollection<BsonDocument>("ObjectList1");
foreach (ObjectClass1 obj in ObjectList1)
{
    var document = BsonSerializer.Deserialize<BsonDocument>(MyJSONSerializer.Serialize(obj));
    ObjectList1Collection.InsertOneAsync(document);
}

db.CreateCollection("ObjectList2");
var ObjectList2Collection = db.GetCollection<BsonDocument>("ObjectList2");
foreach (ObjectClass2 obj in ObjectList2)
{
    var document = BsonSerializer.Deserialize<BsonDocument>(MyJSONSerializer.Serialize(obj));
    ObjectList2Collection.InsertOneAsync(document);
}
May I suggest you start with the following code:
IMongoClient client = new MongoClient();
IMongoDatabase db = client.GetDatabase("MyDB");

// create collection calls are not needed, MongoDB will do that for you
// db.CreateCollection("ObjectList1");

var objectList1Collection = db.GetCollection<ObjectClass1>("ObjectList1");
objectList1Collection.InsertMany(ObjectList1);
...and more or less the same for the second list of objects. This will simply run the insert in a bulk load fashion, i.e. avoid the overhead of calling MongoDB thousands of times and instead chunk up your list of objects into packages of 1000 documents and send them to MongoDB.
If that's not fast enough, there are various things that might make sense depending on your setup:
Profile what's going on! There's little point in optimizing as long as you don't know what the bottleneck is.
The serialization process (conversion of your entities to BsonDocuments) is pretty hefty in terms of CPU power needed, so you would want to do that bit in parallel (using multiple threads) - you would want a CPU with a lot of cores for that.
Then you'd want to use the async implementation of the InsertMany method mentioned above, so your CPU can continue working while it's waiting for the network/IO part after sending a chunk of documents off to MongoDB (see the sketch after this list).
You should try to keep your documents as tiny as possible if you're after raw performance - never underestimate the impact of that aspect!
You can invest in stronger hardware. That's always an option...
You can do various things around the MongoDB setup, including sharding to distribute the load to various systems if the I/O part is the culprit.
You can play around with write concern levels.
You can fiddle with the MongoDB storage engine.
...and probably a lot more dangerous stuff. ;)
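As a sketch of the async variant mentioned in that list (assuming this runs inside an async method, and reusing the collection names from above):

IMongoClient client = new MongoClient();
IMongoDatabase db = client.GetDatabase("MyDB");
var objectList1Collection = db.GetCollection<ObjectClass1>("ObjectList1");
var objectList2Collection = db.GetCollection<ObjectClass2>("ObjectList2");

// start both bulk inserts and let the driver overlap the network/IO work
Task insert1 = objectList1Collection.InsertManyAsync(ObjectList1);
Task insert2 = objectList2Collection.InsertManyAsync(ObjectList2);
await Task.WhenAll(insert1, insert2);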
You don't need to serialize it to JSON; you can just call:
ObjectList1Collection.InsertManyAsync(ObjectList1);
That should be the fastest way as far as I know.

Pass-Through Stream (not having to save to memory in the middle)

In my C# program, I am working with an object (third-party, so I have no way of changing its source code) that takes a stream in its constructor (var myObject = new MyObject(stream);).
My challenge is that I need to make some changes to some of the lines of my file prior to it being ready to be given to this object.
To do this, I wrote the following code (which does work):
using (var st = new MemoryStream())
using (var reader = new StreamReader(path))
{
    using (var writer = new StreamWriter(st))
    {
        while (!reader.EndOfStream)
        {
            var currentLine = reader.ReadLine().Replace(@"XXXX", @"YYYY");
            writer.Write(currentLine);
        }
        writer.Flush();
        st.Position = 0;

        var myObject = new MyObject(st);
    }
}
So, this does work, but it seems inefficient, since it is no longer really streaming the information: it stores everything in the memory stream before handing it to the object.
Is there a way to create a transform / "pass-through stream" that will:
Read in each small amount from the streamreader
Make the adjustment on that small amount
Stream that amount through
So there won't be a large bit of memory storage in the middle?
Thanks!!
You just create your own class that derives from Stream, and implements the required methods that your MyObject needs.
Implement all your Read/ReadAsync methods by calling the matching Read/ReadAsync methods on your StreamReader. You can then modify the data as it passes through.
You'd have a bit of work to do if the modification requires some sort of understanding of the data as you'll be working in unknown quantities of bytes at a time. You would need to buffer the data to an extent required to do your necessary transformations, but how you do that is very specific to the transformation of the stream that you want to achieve.
Unfortunately the design of the C# Stream class is loaded down with lots of baggage, so implementing the entire API of Stream is quite a bit of work, but chances are your MyObject only calls one or two methods on Stream so a bit of experimentation should soon get it working.
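To make this concrete, here is a minimal sketch of such a wrapper. The class name LineTransformStream is made up; it only implements the read path, pulls one line at a time from a TextReader, transforms it, and assumes MyObject only needs sequential Read calls (no seeking):

// assumes: using System; using System.IO; using System.Text;
class LineTransformStream : Stream
{
    private readonly TextReader _reader;
    private readonly Func<string, string> _transform;
    private byte[] _buffer = Array.Empty<byte>();
    private int _offset;

    public LineTransformStream(TextReader reader, Func<string, string> transform)
    {
        _reader = reader;
        _transform = transform;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // refill the internal buffer from the next source line when it is exhausted
        if (_offset == _buffer.Length)
        {
            string line = _reader.ReadLine();
            if (line == null) return 0; // end of the source
            _buffer = Encoding.UTF8.GetBytes(_transform(line) + Environment.NewLine);
            _offset = 0;
        }
        int copied = Math.Min(count, _buffer.Length - _offset);
        Array.Copy(_buffer, _offset, buffer, offset, copied);
        _offset += copied;
        return copied;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

// usage, matching the original example:
using (var reader = new StreamReader(path))
using (var stream = new LineTransformStream(reader, line => line.Replace("XXXX", "YYYY")))
{
    var myObject = new MyObject(stream);
}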

Saving game data to binary files in c#

I'm fairly new to c# and am getting into XNA a bit.
So far all is fairly simple and I can find info on it, but one thing that I've been struggling with is finding good tips/tutorials on how to create game save functionality.
I don't really want to use XML for saving either the configuration or the game data, since it makes changing the values too easy. So I decided to go for binary files, since that adds a layer of complexity.
Sadly, I wasn't able to find much information on how to do that.
I saw some posts suggesting users to create a structure, then saving it as a binary file.
This seems fairly simple (I can see that being done with the controls, for example, since there aren't that many variables), but I can't seem to find info on how to convert the actual
public struct controls
{
    public int forward;
    public int back;
}
structure ... well, to a binary file really.
Another question is saving game data.
Should I go for the same approach and create a structure that will hold variables like player health, position etc. and just load it up when I want to load the game?
I guess what I want to ask is - is it possible to "freeze" the game state (amount of enemies around, items dropped etc.) and load it up later?
Any tips, pushes and nods towards the right direction will be much appreciated.
Thank you very much!
Well, the simple answer is yes, you can store game state. But this mainly depends on the actual game implementation. You have to implement one or several data classes which will store the data vital for game state recreation. I don't think you can just dump your game memory to restore the state; you have to recreate the game scene using the values you saved earlier.
So you can use these simple methods to convert virtually any class marked with the [Serializable] attribute to a byte array:
public static byte[] ToBytes(object data)
{
    using (var ms = new MemoryStream())
    {
        // create a binary formatter:
        var bnfmt = new BinaryFormatter();
        // serialize the data to the memory stream
        bnfmt.Serialize(ms, data);
        // return a byte[] representation of the binary data
        // (ToArray rather than GetBuffer, so we don't return unused trailing bytes)
        return ms.ToArray();
    }
}

public static T FromBytes<T>(byte[] input)
{
    using (var ms = new MemoryStream(input))
    {
        var bnfmt = new BinaryFormatter();
        // deserialize the data from the memory stream
        var value = bnfmt.Deserialize(ms);
        return (T)value;
    }
}
Also, you must know the rules of binary serialization: which types can be serialized out of the box, and which need some workaround.
Then you can optionally apply encryption/decryption to that byte sequence and save/load it using System.IO.File.
// read
var data = File.ReadAllBytes("test.dat");

// write
using (var file = File.OpenWrite("test.dat"))
{
    file.Write(data, 0, data.Length);
}
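To show the optional encryption step in context, here is a minimal sketch using AES (System.Security.Cryptography); key/IV handling is up to you, and bear in mind that a key shipped inside the game is obfuscation rather than real protection. Decryption is the mirror image using CreateDecryptor. The gameState variable is hypothetical:

public static byte[] Encrypt(byte[] plain, byte[] key, byte[] iv)
{
    using (var aes = Aes.Create())
    using (var encryptor = aes.CreateEncryptor(key, iv))
    using (var ms = new MemoryStream())
    {
        using (var cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
        {
            cs.Write(plain, 0, plain.Length);
        } // disposing the CryptoStream flushes the final block into ms
        return ms.ToArray();
    }
}

// usage: File.WriteAllBytes("test.dat", Encrypt(ToBytes(gameState), key, iv));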
In this situation, there's no real "correct" answer. If you just want to "encrypt" the data, why not create the XML in memory and then apply your preferred cryptographic function to protect it before saving?
Surely, this is not a catch-all rule: saving game data in binary format results in less space occupied on disk, and maybe faster load times: a very long number, such as 123456789, can be stored using only 4 bytes. If you save it as XML, there's a lot of overhead from the XML tags, plus the conversion from string to int.
A good approach for your project is to create a helper library with serializers/deserializers. Every struct will have its own, and when called on a specific structure the function will convert the structure's fields into their binary representation, concatenate them, and write them to a file. This explains why every structure needs its own deserializer: it's up to you to choose the order of fields, binary encoding, etc.
Finally, the above problem can be solved in a more elegant way using an OOP approach, maybe with every "storable" class implementing a serializable interface and providing its own ad hoc serialization methods.
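For illustration, a minimal hand-written serializer for the controls struct from the question, using BinaryWriter/BinaryReader; the field order is your choice, but the reader must mirror the writer exactly:

public static class ControlsSerializer
{
    public static void Save(string path, controls value)
    {
        using (var file = File.Open(path, FileMode.Create))
        using (var writer = new BinaryWriter(file))
        {
            // write fields in a fixed order; the reader must use the same order
            writer.Write(value.forward);
            writer.Write(value.back);
        }
    }

    public static controls Load(string path)
    {
        using (var file = File.OpenRead(path))
        using (var reader = new BinaryReader(file))
        {
            return new controls
            {
                forward = reader.ReadInt32(),
                back = reader.ReadInt32()
            };
        }
    }
}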

How to read back appended objects using protobuf-net?

I'm appending real-time events to a file stream using protobuf-net serialization. How can I stream all saved objects back for analysis? I don't want to use an in-memory collection (because it would be huge).
private IEnumerable<Activity> Read()
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage))
    using (var sr = new StreamReader(iso))
    {
        while (!sr.EndOfStream)
        {
            yield return Serializer.Deserialize<Activity>(iso); // doesn't work
        }
    }
}

public void Append(Activity activity)
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage))
    {
        Serializer.Serialize(iso, activity);
    }
}
First, I need to discuss the protobuf format (via Google, not specific to protobuf-net). By design, it is appendable but with append===merge. For lists this means "append as new items", but for single objects this means "combine the members". Secondly, as a consequence of the above, the root object in protobuf is never terminated - the "end" is simply: when you run out of incoming data. Thirdly, and again as a direct consequence - fields are not required to be in any specific order, and generally will overwrite. So: if you just use Serialize lots of times, and then read the data back: you will have exactly one object, which will have basically the values from the last object on the stream.
What you want to do, though, is a very common scenario. So protobuf-net helps you out by including the SerializeWithLengthPrefix and DeserializeWithLengthPrefix methods. If you use these instead of Serialize / Deserialize, then it is possible to correctly parse individual objects. Basically, the length-prefix restricts the data so that only the exact amount per-object is read (rather than reading to the end of the file).
I strongly suggest (as parameters) using tag===field-number===1, and the base-128 prefix-style (an enum). As well as making the data fully protobuf compliant throughout (including the prefix data), this will make it easy to use an extra helper method: DeserializeItems. This exposes each consecutive object via an iterator-block, making it efficient to read huge files without needing everything in memory at once. It even works with LINQ.
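A minimal sketch of the question's Append/Read pair rewritten to use the length-prefixed API (field number 1, base-128 prefix style):

public void Append(Activity activity)
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage))
    {
        Serializer.SerializeWithLengthPrefix(iso, activity, PrefixStyle.Base128, 1);
    }
}

private IEnumerable<Activity> Read()
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage))
    {
        // lazily yields one object at a time; nothing is held in memory beyond the current item
        foreach (var activity in Serializer.DeserializeItems<Activity>(iso, PrefixStyle.Base128, 1))
        {
            yield return activity;
        }
    }
}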
There is also a way to use the API to selectively parse/skip different objects in the file - for example, to skip the first 532 records without processing the data. Let me know if you need an example of that.
If you already have lots of data that was already stored with Serialize rather than SerializeWithLengthPrefix - then it is probably still possible to decipher the data, by using ProtoReader to detect when the field-numbers loop back around : meaning, given fields "1, 2, 4, 5, 1, 3, 2, 5" - we can probably conclude there are 3 objects there and decipher accordingly. Again, let me know if you need a specific example.

.Net Deep cloning - what is the best way to do that?

I need to perform deep cloning on my complex object model. What do you think is the best way to do that in .Net?
I thought about serializing / deserializing.
Needless to say, MemberwiseClone is not good enough.
If you control the object model, then you can write code to do it, but it is a lot of maintenance. There are lots of problems, though, which mean that unless you need absolutely the fastest performance, then serialization is often the most manageable answer.
This is one of the cases where BinaryFormatter works acceptably; normally I'm not a fan (due to the issues with versioning etc) - but since the serialized data is for immediate consumption this isn't an issue.
If you want it a bit faster (but without your own code), then protobuf-net may help, but requires code changes (to add the necessary metadata etc). And it is tree-based (not graph-based).
Other serializers (XmlSerializer, DataContractSerializer) are also fine, but if it is just for clone, they may not offer much over BinaryFormatter (except perhaps that XmlSerializer doesn't need [Serializable]).
So really, it depends on your exact classes and the scenario.
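For reference, once the protobuf-net metadata ([ProtoContract]/[ProtoMember] attributes or equivalent) is in place, the clone itself is a one-liner:

// serializes to an in-memory buffer and deserializes a fresh copy of the tree
var copy = ProtoBuf.Serializer.DeepClone(original);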
If you are running code in a Partial Trust environment such as the Rackspace Cloud you will likely be restricted from using the BinaryFormatter. The XmlSerializer can be used instead.
public static T DeepClone<T>(T obj)
{
    using (var ms = new MemoryStream())
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        xs.Serialize(ms, obj);
        ms.Position = 0;
        return (T)xs.Deserialize(ms);
    }
}
Example of deep cloning from msdn magazine:
Object DeepClone(Object original)
{
    // Construct a temporary memory stream
    MemoryStream stream = new MemoryStream();

    // Construct a serialization formatter that does all the hard work
    BinaryFormatter formatter = new BinaryFormatter();

    // This line is explained in the "Streaming Contexts" section
    formatter.Context = new StreamingContext(StreamingContextStates.Clone);

    // Serialize the object graph into the memory stream
    formatter.Serialize(stream, original);

    // Seek back to the start of the memory stream before deserializing
    stream.Position = 0;

    // Deserialize the graph into a new set of objects
    // and return the root of the graph (deep copy) to the caller
    return (formatter.Deserialize(stream));
}
Please take a look at the really good article C# Object Clone Wars. I found a very interesting solution there: Copyable: A framework for copying or cloning .NET objects
The best way is probably to implement the System.ICloneable interface in your object and in all of its fields that also need custom deep-cloning capabilities. Then you implement the Clone method to return a deep copy of your object and its members.
You could try AltSerialize which in many cases is faster than the .Net serializer. It also provides caching and custom attributes to speed up serialization.
The best way is to implement this manually; it will be much faster than any generic method. Also, there are a lot of libraries for this operation (you can see a list with performance benchmarks here).
By the way, BinaryFormatter is very slow for this task and is really only good for testing.
