Encapsulate an old buffer in a new one - C#

So I have this sort of schema:
table Request {
a:Sample;
b:Sample;
}
Where table Sample has multiple string vectors and instances of it are pretty big.
I have a lot of files on my filesystem with Sample instances, which took me some time to create.
Now I want to take 2 files at random, read them into memory and create a new Request which encapsulates them.
I'm working with C# and these lines work:
var a = Sample.GetRootAsSample(new ByteBuffer(System.IO.File.ReadAllBytes(pathToA)));
var b = Sample.GetRootAsSample(new ByteBuffer(System.IO.File.ReadAllBytes(pathToB)));
but I can't seem to find a way to just reference them in a new Request instance.
I need some way of adding those buffers as-is to a new builder and then passing their offsets to the new Request, all in the same builder.
Building them all over again in a new builder wouldn't be efficient.
How can I achieve this?

There's currently no way to deep-copy a table automatically in C#. Since tables may refer to all sorts of locations in the buffer, this is not a trivial operation; it requires either special-purpose code generation or reflection.
There's a CopyTable in C++ using reflection. This could be ported to C# or called from C#.
An alternative is to include the existing buffers in the new table in binary form, i.e. make a and b vectors of ubytes. It means you have to call GetRootAs on them to access them, but this is all still very efficient.
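A minimal sketch of that approach, assuming the schema is changed to store raw bytes (e.g. table Request { a:[ubyte]; b:[ubyte]; }) and flatc has regenerated the CreateAVector/CreateBVector helpers for the new fields:

byte[] aBytes = System.IO.File.ReadAllBytes(pathToA);
byte[] bBytes = System.IO.File.ReadAllBytes(pathToB);

var builder = new FlatBufferBuilder(aBytes.Length + bBytes.Length + 64);
// copy the already-serialized Sample buffers into the new builder as plain byte vectors
var aVec = Request.CreateAVector(builder, aBytes);
var bVec = Request.CreateBVector(builder, bBytes);
Request.StartRequest(builder);
Request.AddA(builder, aVec);
Request.AddB(builder, bVec);
builder.Finish(Request.EndRequest(builder).Value);
byte[] requestBytes = builder.SizedByteArray();

On the reading side you would then do something like Sample.GetRootAsSample(new ByteBuffer(request.GetAArray())) to get each nested Sample back (the exact accessor name depends on the flatc version you generate with).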

Related

Best Design Pattern for Large Data processing methods

I have an application that I am refactoring and trying to follow some of the "Clean Code" principles. It reads data from multiple different data sources, manipulates/formats that data and inserts it into another database. I have a data layer with the associated DTOs, repositories, interfaces, and helpers for each data source, as well as a business layer with the matching entities, repositories and interfaces.
My question comes down to the Import Method. I basically have one method that systematically calls each Business logic method to read, process and save the data. There are a lot of calls that need to be made and even though the Import method itself is not manipulating the data at all, the method is still extremely large. Is there a better way to process this data?
ICustomer<Customer> sourceCustomerList = new CustomerRepository();
foreach (Customer customer in sourceCustomerList.GetAllCustomers())
{
    // Read Some Data
    DataObject object1 = iSourceDataType1.GetDataByCustomerID(customer.ID);
    // Format and save the Data
    iTargetDataType1.InsertDataType1(object1);
    // Read Some Data
    // Format the Data
    // Save the Data
    //...Rinse and repeat
}
You should look into the Task Parallel Library (TPL) and TPL Dataflow:
ICustomer<Customer> sourceCustomerList = new CustomerRepository();

var customersBuffer = new BufferBlock<Customer>();
var transformBlock = new TransformBlock<Customer, DataObject>(
    customer => iSourceDataType1.GetDataByCustomerID(customer.ID)
);

// Build your pipeline with TransformBlock, ActionBlock, and many more...
customersBuffer.LinkTo(transformBlock);
// Add all the blocks you need here....

// Then feed the first block or use a custom source
foreach (var c in sourceCustomerList.GetAllCustomers())
    customersBuffer.Post(c);
customersBuffer.Complete();
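To let the pipeline drain and finish cleanly, you would typically link the blocks with completion propagation and await the last one; a rough sketch, where the ActionBlock stands in for the OP's iTargetDataType1 insert step and the LinkTo calls replace the plain link above:

var insertBlock = new ActionBlock<DataObject>(
    obj => iTargetDataType1.InsertDataType1(obj),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });  // tune to taste

customersBuffer.LinkTo(transformBlock, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock.LinkTo(insertBlock, new DataflowLinkOptions { PropagateCompletion = true });

// after posting everything and calling customersBuffer.Complete():
await insertBlock.Completion;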
Your performance will be IO-bound, especially with the many accesses to the database(s) in each iteration. Therefore, you need to revise your architecture to minimise IO.
Is it possible to move all the records closer together (maybe in a temporary database) as a first pass, then do the record matching and formatting within the database as a second pass, before reading them out and saving them where they need to be?
(As a side note, sometimes we get carried away with DDD and OO, where everything "needs" to be an object. But that is not always the best approach.)

C#. Fastest way to save list of Objects to Mongo

I have several lists of Objects
List<ObjectClass1> ObjectList1;
List<ObjectClass2> ObjectList2;
I would like to write all objects as JSON to Mongo at the end of the test run.
What is the fastest way to do this?
I am currently doing this:
IMongoClient client = new MongoClient();
IMongoDatabase db = client.GetDatabase("MyDB");

db.CreateCollection("ObjectList1");
var ObjectList1Collection = db.GetCollection<BsonDocument>("ObjectList1");
foreach (ObjectClass1 obj in ObjectList1)
{
    var document = BsonSerializer.Deserialize<BsonDocument>(MyJSONSerializer.Serialize(obj));
    ObjectList1Collection.InsertOneAsync(document);
}

db.CreateCollection("ObjectList2");
var ObjectList2Collection = db.GetCollection<BsonDocument>("ObjectList2");
foreach (ObjectClass2 obj in ObjectList2)
{
    var document = BsonSerializer.Deserialize<BsonDocument>(MyJSONSerializer.Serialize(obj));
    ObjectList2Collection.InsertOneAsync(document);
}
May I suggest you start with the following code:
IMongoClient client = new MongoClient();
IMongoDatabase db = client.GetDatabase("MyDB");
// create collection calls are not needed, MongoDB will do that for you
// db.CreateCollection("ObjectList1");
var objectList1Collection = db.GetCollection<ObjectClass1>("ObjectList1");
objectList1Collection.InsertMany(ObjectList1);
...and more or less the same for the second list of objects. This will simply run the insert in a bulk load fashion, i.e. avoid the overhead of calling MongoDB thousands of times and instead chunk up your list of objects into packages of 1000 documents and send them to MongoDB.
If that's not fast enough, there are various things that might make sense depending on your setup:
Profile what's going on! There's little point in optimizing as long as you don't know what the bottleneck is.
The serialization process (conversion of your entities to BsonDocuments) is pretty hefty in terms of CPU power, so you would want to do that bit in parallel (using multiple threads); a CPU with a lot of cores helps here.
Then you'd want to use the async implementation of the InsertMany method mentioned above, so your CPU can continue working while it's waiting for the network/IO part after sending a chunk of documents off to MongoDB (see the sketch after this list).
You should try to keep your documents as tiny as possible if you're after raw performance - never underestimate the impact of that aspect!
You can invest into stronger hardware. That's always an option...
You can do various things around the MongoDB setup, including sharding to distribute the load to various systems if the I/O part is the culprit.
You can play around with write concern levels
You can fiddle with the MongoDB storage engine
...and probably a lot more dangerous stuff. ;)
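A rough sketch of the two suggestions above combined (parallel serialization, then awaiting the asynchronous bulk insert), reusing the MyJSONSerializer helper from the question:

// CPU-bound part: serialize the entities to BsonDocuments in parallel
var documents = ObjectList1
    .AsParallel()
    .Select(obj => BsonSerializer.Deserialize<BsonDocument>(MyJSONSerializer.Serialize(obj)))
    .ToList();

// IO-bound part: bulk insert asynchronously so the CPU is free while MongoDB works
var collection = db.GetCollection<BsonDocument>("ObjectList1");
await collection.InsertManyAsync(documents);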
You don't need to serialize it to JSON; you can just call:
ObjectList1Collection.InsertManyAsync(ObjectList1);
That should be the fastest way as far as I know.

Saving game data to binary files in C#

I'm fairly new to C# and am getting into XNA a bit.
So far all is fairly simple and I can find info on it, but one thing that I've been struggling with is finding good tips/tutorials on how to create game save functionality.
I don't really want to use XML for saving either the configuration or the game data, since it makes changing the values too easy. So, I decided to go for binary files, since it adds a layer of complexity.
Sadly I wasn't able to find much information on how to do that.
I saw some posts suggesting users to create a structure, then saving it as a binary file.
This seems fairly simple (I can see that being done with the controls, for example, since there aren't that many variables), but I can't seem to find info on how to convert the actual
public struct controls {
    public int forward;
    public int back;
}
structure ... well, to a binary file really.
Another question is saving game data.
Should I go for the same approach and create a structure that will hold variables like player health, position etc. and just load it up when I want to load the game?
I guess what I want to ask is - is it possible to "freeze" the game state (amount of enemies around, items dropped etc.) and load it up later?
Any tips, pushes and nods towards the right direction will be much appreciated.
Thank you very much!
Well, the simple answer is yes, you can store game state. But this mainly depends on the actual game implementation. You have to implement one or several data classes which will store the data vital for game state recreation. You can't just dump your game memory to restore the state; you have to recreate the game scene using the values you saved earlier.
So you can use these simple methods to convert virtually any class marked with the [Serializable] attribute to a byte array:
// requires: using System.IO; using System.Runtime.Serialization.Formatters.Binary;
public static byte[] ToBytes(object data)
{
    using (var ms = new MemoryStream())
    {
        // create a binary formatter:
        var bnfmt = new BinaryFormatter();
        // serialize the data to the memory stream:
        bnfmt.Serialize(ms, data);
        // return a byte[] representation of the binary data
        // (ToArray, not GetBuffer, so the array is exactly the written length):
        return ms.ToArray();
    }
}

public static T FromBytes<T>(byte[] input)
{
    using (var ms = new MemoryStream(input))
    {
        var bnfmt = new BinaryFormatter();
        // deserialize the data from the memory stream:
        var value = bnfmt.Deserialize(ms);
        return (T)value;
    }
}
Also, you must know the rules of binary serialization: which types can be serialized out-of-the-box and which need some workaround.
Then you can optionally apply encryption/decryption to that byte sequence and save/load it using System.IO.File.
// read
var data = File.ReadAllBytes("test.dat");

// write
using (var file = File.OpenWrite("test.dat"))
{
    file.Write(data, 0, data.Length);
}
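Putting the pieces together, a minimal usage sketch, assuming the controls struct from the question is marked [Serializable] (the field values here are just placeholders):

[Serializable]
public struct Controls
{
    public int forward;
    public int back;
}

// save
var controls = new Controls { forward = 1, back = 2 };
File.WriteAllBytes("controls.dat", ToBytes(controls));

// load
var restored = FromBytes<Controls>(File.ReadAllBytes("controls.dat"));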
In this situation, there's no real "correct" answer. If you just want to "encrypt" data, why not just create the XML in memory, and then apply your preferred cryptographic function to protect it before saving?
Surely, this is not a catch-all rule: saving game data in binary format results in less space occupied on disk, and maybe faster load times: a large number, such as 123456789, can be stored using only 4 bytes. If you want to save it in XML, there's a lot of overhead due to the XML tags, plus the conversion from string to int.
A good approach for your project is to create a helper library with serializers/deserializers. Every struct will have its own, and when called on a specific structure the function will convert the structure's fields into their binary representation, concatenate them and write them to file. This explains why every structure needs its own deserializer: it's up to you to choose the order of fields, the binary encoding, etc.
Finally, the above problem can be solved in a more elegant way using an OOP approach, maybe with every "storable" class implementing a serializable interface with its own ad hoc serialization methods.
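As a rough illustration of that per-structure idea (the interface and method names below are made up for this sketch, not taken from the answer), each type writes its own fields with a BinaryWriter and reads them back in the same order:

public interface IBinaryStorable
{
    void WriteTo(BinaryWriter writer);
    void ReadFrom(BinaryReader reader);
}

public struct Controls : IBinaryStorable
{
    public int forward;
    public int back;

    public void WriteTo(BinaryWriter writer)
    {
        writer.Write(forward);          // the write order defines the file layout
        writer.Write(back);
    }

    public void ReadFrom(BinaryReader reader)
    {
        forward = reader.ReadInt32();   // must read in exactly the same order
        back = reader.ReadInt32();
    }
}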

Stack Overflow, Redis, and Cache invalidation

Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and uses the more efficient clustered index instead perhaps). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .net cache (System.Runtime.Caching or System.Web.Caching) and going out and using redis. Or is Redis just hands down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache
I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory
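A rough sketch of that three-step lookup (TryGetLocal, TryGetRedis, SetLocal and SetRedis are hypothetical helpers, purely for illustration):

public T Get<T>(string key, Func<T> fetchFromSource)
{
    // 1) local in-process memory cache
    if (TryGetLocal(key, out T value)) return value;

    // 2) redis; on a hit, also populate the local cache
    if (TryGetRedis(key, out value))
    {
        SetLocal(key, value);
        return value;
    }

    // 3) the source of truth (database), then populate redis and local memory
    value = fetchFromSource();
    SetRedis(key, value);
    SetLocal(key, value);
    return value;
}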
This then, as you say, causes an issue of cache invalidation - although actually that isn't critical in most places. But for this - redis events (pub/sub) allow an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: next time it is needed we'll pick up the new copy from redis. Hence we broadcast the key-names that are changing against a single event channel name.
Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to invalidate the cache for a given key from one node (the one that knows the state has changed) immediately (pretty much) to all nodes.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();

            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });

            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
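As an illustration only of that "almost-a-singleton" idea (this is not BookSleeve's prescribed pattern), a lazily created, long-lived connection could look something like:

static class RedisHub
{
    // created once, on first use, and kept open for the lifetime of the process
    private static readonly Lazy<RedisConnection> connection =
        new Lazy<RedisConnection>(() =>
        {
            var conn = new RedisConnection("127.0.0.1");
            conn.Open();
            return conn;
        });

    public static RedisConnection Connection
    {
        get { return connection.Value; }
    }
}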

Recommended program structure

As a beginner, I have formulated some ideas, but I wanted to ask the community about the best way to implement the following program:
It decodes 8 different types of data file. They are all different, but most are similar (contain a lot of similar fields). In addition, there are 3 generations of system which can generate these files. Each is slightly different, but generates the same types of files.
I need to make a visual app which can read in any one of these, plot the data in a table (using datagridview via datatable at the moment) before plotting on a graph.
There is a bit more to it, but my question is regarding the basic structure.
I would love to learn more about making best use of object oriented techniques if that would suit well.
I am using c# (unless there are better recommendations) largely due to my lacking experience and quick development time.
I am currently using one class called 'log' that knows the generation/log type of the file that is open. It controls reading and exporting to a DataTable. A form can then give it a path, wait for it to process the file, and request the DataTable to display.
Any obvious improvements?
As you have realised there is a great deal of potential in creating a very elegant OOP application here.
Your basic needs - as much as I can see from the information you have shared - are:
1) A module that recognises the type of file
2) A module that can read the file and load the data into a common structure (is it going to be a common structure?); this consists of handlers
3) A module that can visualise the data
For the first one, I would recommend two patterns:
1a) Factory pattern: The file is passed to a common factory and parsed just enough for the factory to decide which handler to use.
1b) Chain-of-responsibility: The file is passed to each handler, which knows whether it can support the file or not. If it cannot, it passes it to the next one. In the end either one handler picks it up, or an error occurs if the last handler cannot process it.
For the second one, I recommend designing a common interface, with each handler implementing common tasks such as load, parse... If visualisation is different and specific to each handler, then you would have that set of methods as well.
Without knowing more about the data structure I cannot comment on the visualisation part.
Hope this helps.
UPDATE
This is the factory one - a very rough pseudocode:
Factory f = new Factory();
ILogParser parser = f.GetParser(fileName); // pass the file name so that factory inspects the content and returns appropriate handler
CommonDataStructure data = parser.Parse(fileName); // parse the file and return CommonDataStructure.
Visualiser v = new Visualiser(form1); // perhaps you want to pass the reference of your form
v.Visualise(data); // draw pretty stuff now!
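And a similarly rough sketch of the chain-of-responsibility alternative, reusing ILogParser and CommonDataStructure from the pseudocode above (CanParse is a made-up method name):

public interface ILogParser
{
    bool CanParse(string fileName);                 // e.g. sniff the header bytes or the extension
    CommonDataStructure Parse(string fileName);
}

public static CommonDataStructure ParseWithChain(string fileName, IEnumerable<ILogParser> parsers)
{
    foreach (var parser in parsers)
    {
        if (parser.CanParse(fileName))
            return parser.Parse(fileName);          // the first handler that supports the file wins
    }
    throw new NotSupportedException("No parser can handle " + fileName);
}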
Ok, first thing - make one class for every file structure type, as a parser. Use inheritance as needed to combine common functionality.
Every file parser should have a method to identify whether it can parse a file, so you can take a file name, and just ask the parsers which thinks it can handle the data.
.NET 4.0 and the Managed Extensibility Framework (MEF) allow dynamic integration of the parsers without a predetermined selection graph.
The rest depends mostly on how similar the data is etc.
Okay, so the basic concept of OOP is thinking of classes etc. as objects. Straight from the outset, object-oriented programming can be a tricky subject to pick up, but the more practice you get, the easier you will find it to implement programs using OOP.
Take a look here: http://msdn.microsoft.com/en-us/beginner/bb308750.aspx
So you can have a Decoder class and interface, something like this.
interface IDecoder
{
    void DecodeTypeA(string param1, int param2 /* etc. */);
    void DecodeTypeB(string param1, int param2 /* etc. */);
}

class FileDecoder : IDecoder
{
    public void DecodeTypeA(string param1, int param2 /* etc. */)
    {
        // Some Code Here
    }

    public void DecodeTypeB(string param1, int param2 /* etc. */)
    {
        // Some Code Here
    }
}
