We want to pass a forest - a dictionary whose values can be dictionaries, arrays, sets, numbers, strings, or byte buffers - between Objective-C and C# efficiently (time-wise; space is a lesser concern). Google's Protocol Buffers looked good, but they seem to handle only structured data, while ours is arbitrary. Ultimately we can write a binary (de)serialiser ourselves, but surely this has been done before and released as FOSS somewhere?
Have you considered using ASN.1? Since ASN.1 is independent of programming language and system architecture, it can be used efficiently whether you need C, C#, C++, or Java.
You create a description of the information you wish to exchange, and then use an ASN.1 tool to generate an encoder/decoder for your target programming language. ASN.1 also supports several different rules for transmitting the data, ranging from the efficient PER (Packed Encoding Rules) to the verbose but flexible XER (XML Encoding Rules).
To play with ASN.1 and see if it might work for you, try the free online ASN.1 compiler and encoder/decoder at http://asn1-playground.oss.com.
Related
I'm trying to find a "meta language" that can be used to define a structure and get/set code for members. The catch is that the structure already exists in code, and this "meta language" would serve as a bit-for-bit replacement of the original hand-coded structure, allowing the headers describing the structures to be generated. The point is that the structures are used as part of a protocol between a C# application and an embedded device (not Linux-based; think smaller and more constrained, like a PIC or CM0).
The meta language would:
Act as documentation for the structure members
Generate C# structs and implementation for get/set operations
Generate packed ANSI-C structs and get/set functions
The meta language would need to support
enumeration definitions (of a specified size, i.e. uint16_t, uint8_t, or smaller, as for multi-bit enumerations),
bit-arrays (of specified size, e.g. a 48-bit array is packed into 6 bytes),
bit-structure/enumeration arrays (of specified size, e.g. a 2-bit structure with 48 indexes is 12 bytes; see the sketch after this list),
specification of endianness and bit-order,
generation of binary structures that can be read directly by either the generated ANSI-C code or the C# code for the purpose of sending over a network.
It would also be nice to have some limited validation of the data when received.
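To make the bit-array requirement concrete, here is a rough sketch of what generated C# accessors could look like for the 2-bit enumeration array of 48 entries packed into 12 bytes. The enum, all names, and the LSB-first bit order are hypothetical stand-ins for whatever the meta language would actually specify:

// Hypothetical generated accessors: 48 entries x 2 bits = 96 bits = 12 bytes.
public enum ChannelState : byte { Off = 0, Standby = 1, Active = 2, Fault = 3 }

public struct ChannelMap
{
    private readonly byte[] _data;   // 12-byte packed buffer, as sent on the wire

    public ChannelMap(byte[] data) { _data = data; }

    public ChannelState Get(int index)
    {
        int byteIndex = index / 4;       // four 2-bit fields per byte
        int shift = (index % 4) * 2;     // LSB-first bit order (an assumption)
        return (ChannelState)((_data[byteIndex] >> shift) & 0x3);
    }

    public void Set(int index, ChannelState value)
    {
        int byteIndex = index / 4;
        int shift = (index % 4) * 2;
        _data[byteIndex] = (byte)((_data[byteIndex] & ~(0x3 << shift)) | ((int)value << shift));
    }
}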
So far I've looked at
BSON
Etch
Hessian
Avro
ICE
MessagePack
Protocol Buffers
Thrift
All of these are great for documentation and for building a new protocol, but when trying to maintain compatibility with an existing protocol they fall short, due to the type encoding inherent in their data marshaling.
I've also looked at ASN.1 with ECN encoding, but that seems too unreadable, causing issues with documentation.
I've looked at Generating C# structure from C structure but there wasn't a good option there.
Any recommendations?
What you want is a Program Transformation System.
These are tools that can read arbitrary computer language instances, and then transform those into other valid language instances, sometimes in the same language, sometimes into a different language. They are general in that you can provide them the description of the languages you want to manipulate, and they can then operate on those languages.
Good tools in this space let you write the code transformations in terms of the ("surface") syntax of the languages of interest, essentially in the form of "if you see this, replace it by that".
For OP's scenario, the essential transformations are "if you see this slot in a structure, replace it by corresponding getters and setters, and a replacement struct member for the target language".
In your case, you need to choose between 3 scenarios:
Define an abstract language for specifying data structures, and build program transforms that map from the specification language to both C# and C.
Decide that the C data declarations are the reference, and generate corresponding C# code.
Decide that the C# data declarations are the reference, and generate corresponding C code.
Then you'll have to sit down, define the languages to the tool (if they aren't already defined), and construct the transforms.
(Full disclosure: I build such a tool. See my bio).
I have been searching on this topic for a while now without finding any relevant answers, so I thought of taking it to Stack Overflow...
We are trying to encode a string in order to pass it over a TCP/IP connection. Since ASN.1 is the most popular way to do this, we are trying the various rules (BER, DER, PER, etc.) to find out which one we can use. Our application is .NET-based, and I was looking for a freely available library that does this.
Strangely, I could not find any free libraries, so I started looking in the .NET Framework itself. I found that there is only a 'BerConverter'. So I did a small example with it, taking an example string:
string str = "The BER format specifies a self-describing and self-delimiting format for encoding ASN.1 data structures. Each data element is encoded as a type identifier, a length description, the actual data elements, and, where necessary, an end-of-content marker. These types of encodings are commonly called type-length-value or TLV encodings. This format allows a receiver to decode the ASN.1 information from an incomplete stream, without requiring any pre-knowledge of the size, content, or semantic meaning of the data"
In UTF-8 or ASCII it shows as 512 bytes. I use the following code to encode it using BER:
using System.DirectoryServices.Protocols;

public static byte[] BerConvert(byte[] inputBytes)
{
    // "{o}" encodes the input as an OCTET STRING inside a SEQUENCE,
    // so tag and length bytes are added around the raw payload.
    byte[] output = BerConverter.Encode("{o}", inputBytes);
    return output;
}
I get a byte array of size 522. In some other cases I find that the byte size increases compared to the original text. I thought encoding would decrease the size. Why is it happening like this?
Apart from BER, are there other encoding rules, like PER or DER, which can be used to reduce the encoded size? Are there any examples, libraries, or support which will help in implementing these encoding rules?
When looking for ASN.1 Tools (free and commercial), a good place to start is the ITU-T web page http://www.itu.int/en/ITU-T/asn1/Pages/Tools.aspx that lists several. There are commercial tools listed there that support C#, but I do not see a free C# tool.
As for reduction of size of encodings, this depends significantly on the nature of your ASN.1 specification and the encoding rules used. If you are primarily sending text strings, BER and DER will not result in a reduction of the size of your message, while PER can significantly reduce the size of the message if you are able to produce a "permitted alphabet" constraint indicating a smaller set of characters permitted in the text you are sending.
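As a minimal sketch of why BER grows rather than shrinks a text payload, the snippet below (using the same BerConverter class as in the question; sizes in the comments assume a 512-byte input) encodes a payload and prints both sizes. Each TLV layer contributes a tag byte plus long-form length bytes:

using System;
using System.DirectoryServices.Protocols;
using System.Text;

class BerOverheadDemo
{
    static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes(new string('x', 512));

        // "{o}" wraps the payload in a SEQUENCE containing an OCTET STRING.
        // For a payload of 256..65535 bytes, each layer's header is a tag
        // byte plus three length bytes (0x82 and a two-byte length).
        byte[] encoded = BerConverter.Encode("{o}", payload);

        Console.WriteLine("payload: " + payload.Length + " bytes"); // 512
        Console.WriteLine("encoded: " + encoded.Length + " bytes"); // larger, never smaller
    }
}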
You can try various encoding rules and different constraints to see the effects of your changes at the free online ASN.1 encoder/decoder at http://asn1-playground.oss.com.
If you are beginning work on a new protocol, you may want to reevaluate your needs a bit.
As you probably know by now, ASN.1 comes with a bit of overhead—not just in the messaging, but in the engineering. A typical workflow involves writing a specification that describes the protocol, feeding it into a CASE tool that generates source code for an API, and then integrating the generated components into your application.
That said, some prefer a more ad-hoc approach. Microsoft has a BER converter class that you could try to use with C#: it may be suitable for your needs.
If compression is important, you may want to look into PER, as Paul said. But it's hard to produce valid PER encodings by hand because they rely on the specification to perform compression. (The permitted alphabet constraint is written into the specification and used to enumerate valid characters for shrinking the encoding.)
For more information on ASN.1 there are a number of tutorials online; you can also look at ITU-T standards X.680-X.695, which specify both the syntax notation and various encoding rules.
There are a few libraries on CodePlex, like this one: https://asn1.codeplex.com/SourceControl/latest#ObjectIdentifier.cs
I'll just leave this here: Asn1DerParser.NET. And thanks to the author for his work!
I am working on a system that has components written in the following languages:
C
C++
C#
PHP
Python
These components all use (infrequently changing) data that comes from the same source and can be cached in and accessed from memcache for performance reasons.
Because different data types may be stored differently by different languages' APIs to memcache, I am wondering if it would be better to store ALL data as strings (objects would be stored as JSON strings).
However, this may itself pose problems, as strings will (almost surely) have different internal representations across the different languages, so I'm wondering how wise that decision is.
As an aside, I am using the 1 writer, multiple readers 'pattern' so concurrency is not an issue.
Can anyone (preferably with ACTUAL experience of doing something similar) advise on the best format/way to store data in memcache so that it may be consumed by different programming languages?
memcached, I think, primarily understands only byte[], and the representation of a byte is the same in all languages. You can serialize your objects using Protocol Buffers or a similar library and consume them in any other language. I've done this in my projects.
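For instance, here is a minimal protobuf-net sketch in C# (the Person type and its field numbers are made-up examples; a matching .proto definition would let the other languages decode the same bytes):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Person
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public int Age { get; set; }
}

public static class CacheCodec
{
    // Serialize to the raw byte[] you would store under a memcached key.
    public static byte[] ToBytes(Person person)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, person);
            return stream.ToArray();
        }
    }

    // Decode bytes fetched from memcached back into an object.
    public static Person FromBytes(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return Serializer.Deserialize<Person>(stream);
        }
    }
}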
Regardless of the back-end chosen (memcached, mongodb, redis, mysql, carrier pigeon), the most speed-efficient way to store data in it would be a simple block of data, so the back-end has no knowledge of it. Whether that's a string, byte[], or BLOB is really all the same.
Each language will need an agreed mechanism to convert objects to a storable data format and back. You:
Shouldn't build your own mechanism; that's just reinventing the wheel.
Should think about whether 'invalid' objects might end up in the back-end. (either because of a bug in a writer, or because objects from a previous revision are still present)
When it comes to choosing a format, I'd recommend two: JSON or Protocol Buffers, because their encoded sizes and encode/decode speeds are among the smallest and fastest of all the available encodings.
Comparison
JSON:
Libraries available for dozens of languages, sometimes part of the standard library.
Very simple format - Human-readable when stored, human-writable!
No coordination required between different systems, just agreement on object structure.
No set-up needed in many languages, eg PHP: $data = json_encode($object); $object = json_decode($data);
No inherent schema, so readers need to validate decoded messages manually.
Takes more space than Protocol Buffers.
Protocol Buffers:
Generating tools provided for several languages.
Minimal size - difficult to beat.
Defined schema (externally) through .proto files.
Auto-generated interface objects for encoding/decoding, eg C++: person.SerializeToOstream(&output);
Support for differing versions of object schemas to add new optional members, so that existing objects aren't necessarily invalidated.
Not human-readable or writable, so possibly harder to debug.
Defined schema introduces some configuration management overhead.
Unicode
When it comes to Unicode support, both handle it without issues:
JSON: Will typically escape non-ASCII characters inside the string as \uXXXX, so no compatibility problem there. Depending on the library, it may also be possible to force UTF-8 encoding.
Protocol Buffers: Seem to use UTF-8, though I haven't found info in Google's documentation in 3-foot-high letters to that effect.
Summary
Which one you go with will depend on how exactly your system will behave, how often changes to the data structure occur, and how all the above points will affect you.
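For comparison with the PHP one-liner above, a C# round-trip using the popular Json.NET library looks like this (the CachedItem type is just an illustration):

using Newtonsoft.Json;

public class CachedItem
{
    public string Name { get; set; }
    public int[] Values { get; set; }
}

class JsonDemo
{
    static void Main()
    {
        var item = new CachedItem { Name = "sensor-1", Values = new[] { 1, 2, 3 } };

        string data = JsonConvert.SerializeObject(item);   // store this string in memcache
        var roundTrip = JsonConvert.DeserializeObject<CachedItem>(data);

        System.Console.WriteLine(roundTrip.Name);          // "sensor-1"
    }
}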
Not going to lie, you could do it in Redis. Redis is a key-value database written to be high-performance, and it allows the transfer of data between languages using a number of different client libraries. Here is an example in Java and Python.
Edit 1: Code is untested. If you spot an error please let me know :)
Edit 2: I know I didn't use the preferred Redis client for Java, but the point still stands.
Python
import redis
r = redis.Redis()
r.set('test','123')
Java
import org.jredis.RedisException;
import org.jredis.ri.alphazero.JRedisClient;
import static org.jredis.ri.alphazero.support.DefaultCodec.*;

class ExampleCode {
    public static void main(String[] args) throws RedisException {
        // Connect to a local Redis server with default settings.
        JRedisClient client = new JRedisClient();
        // get() returns the raw bytes; toStr() decodes them to a String.
        System.out.println(toStr(client.get("test")));
    }
}
When sending information from a java application to a C# application through sockets, is the byte order different? Or can I just send an integer from C# to a java application and read it as integer?
(And does the OS matter, or is it the same for Java/.NET no matter how the actual OS handles it?)
It all comes down to how you encode the data. If you are treating it only as a raw sequence of bytes, there is no conflict; the sequence is the same. The question of endianness arises when you interpret chunks of the data as (for example) integers.
Any serializer written with portability in mind will have defined endianness - for example, in protocol buffers (available for both Java and C#) little-endian is always used regardless of your local hardware.
If you are doing manual writing to the stream, using things like shift-based encoding (rather than direct memory copying) will give you defined endianness.
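As a minimal sketch of what shift-based encoding looks like in C#, here is an int written and read in big-endian (network) order, independent of the host's endianness; the class and method names are illustrative, and EOF handling is omitted for brevity:

using System.IO;

public static class NetworkOrder
{
    // Write the most significant byte first (big-endian / network order).
    public static void WriteInt32(Stream stream, int value)
    {
        stream.WriteByte((byte)(value >> 24));
        stream.WriteByte((byte)(value >> 16));
        stream.WriteByte((byte)(value >> 8));
        stream.WriteByte((byte)value);
    }

    // Reassemble the int in the same byte order; C# evaluates the
    // operands left to right, so the reads happen in stream order.
    public static int ReadInt32(Stream stream)
    {
        return (stream.ReadByte() << 24)
             | (stream.ReadByte() << 16)
             | (stream.ReadByte() << 8)
             |  stream.ReadByte();
    }
}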
If you use pre-canned platform serializers, you are at the mercy of the implementation. It might be endian-safe, it might not be (i.e. it might depend on the platform at both ends). For example, the .NET BitConverter class is not safe - it is usually assumed (incorrectly) to be little-endian, but on some platforms (and particularly in Mono on some hardware) it could be big-endian; hence the .IsLittleEndian property.
My advice would be to use a serializer that handles it all for you ;p
In Java, you can use a DataInputStream or DataOutputStream which read and write the high-byte first, as documented:
http://download.oracle.com/javase/6/docs/api/java/io/DataOutputStream.html#writeInt%28int%29
You should check corresponding C# documentation to see what it does (or maybe someone here can tell you).
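To answer the parenthetical: .NET's BinaryReader and BinaryWriter are documented as little-endian, which does not match DataOutputStream. One hedged sketch for bridging the two, assuming the usual little-endian .NET host, uses IPAddress.HostToNetworkOrder; the class and method names are made up:

using System.IO;
using System.Net;

public static class JavaStreamInterop
{
    // Writes an int that Java's DataInputStream.readInt() can read.
    public static void WriteJavaInt(BinaryWriter writer, int value)
    {
        // Flip to network (big-endian) order before the little-endian write;
        // on a little-endian host the two reversals cancel out correctly.
        writer.Write(IPAddress.HostToNetworkOrder(value));
    }

    // Reads an int written by Java's DataOutputStream.writeInt().
    public static int ReadJavaInt(BinaryReader reader)
    {
        return IPAddress.NetworkToHostOrder(reader.ReadInt32());
    }
}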
You also have, in Java, the option of using ByteBuffer:
http://download.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html
... which has the "order" method to allow you to specify a byte order for operations reading multi-byte primitive types.
Java uses Big Endian for some libraries like DataInput/OutputStream. IP protocols all use Big Endian which can lead people to use Big Endian as default for network protocols.
However, NIO's ByteBuffer allows you to specify BigEndian, LittleEndian, or NativeEndian (whatever the system uses by default).
x86 systems tend to use little endian and so many Microsoft/Linux applications use little endian by default but can support big-endian.
Yes, the byte order may be different. C# generally follows the platform's byte ordering (little-endian on x86), while Java tends to use big-endian. This has been discussed before on SO; see for example C# little endian or big endian?
Is there an easy way to serialize data in C++ (either to XML or binary), and then deserialize the data in C#?
I'm working with some remote WINNT machines that won't run .NET. My server app is written entirely in C#, so I want an easy way to share simple data (mostly key-value pairs, and maybe some representation of a SQL result set). I figure the best way is going to be to write the data to XML in some predefined format on the client, transfer the XML file to my server, and have a C# wrapper read the XML into a usable C# object.
The client and server are communicating over a TCP connection, and what I really want is to serialize the data in memory on the client, transfer the binary data over the socket to a C# memory stream, and deserialize it into a C# object (eliminating file creation, transfer, etc.), but I don't think anything like that exists. Feel free to enlighten me.
Edit
I know I can create a struct in the C++ app and define it in C#, and transfer data that way, but in my head that feels like I'm limiting what can be sent. I'd have to set predefined sizes for objects, etc.
Protocol Buffers might be useful to you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
.NET ports are available from Marc Gravell and Jon Skeet.
I checked out all the projects mentioned, like Protocol Buffers, JSON, XML, etc., but after I found BSON I have used it for the following reasons:
Easy to use API
Available in many languages (C, C++, Haskell, Go, Erlang, Perl, PHP, Python, Ruby, C#, ...)
Binary, therefore very space-efficient and fast (fewer bytes -> less time)
Consistent across platforms (no problems with endianness, etc.)
Hierarchical: the data model is comparable to JSON (as the name suggests), so most data-modelling tasks should be solvable.
No precompiler necessary
Widely used (MongoDB, many languages)
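A small C# sketch of the round-trip, here using the MongoDB.Bson library (one of several BSON implementations; the document contents are made up):

using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

class BsonDemo
{
    static void Main()
    {
        var doc = new BsonDocument
        {
            { "name", "sensor-1" },
            { "values", new BsonArray { 1, 2, 3 } }
        };

        byte[] bytes = doc.ToBson();   // compact binary form for the wire
        var decoded = BsonSerializer.Deserialize<BsonDocument>(bytes);

        Console.WriteLine(decoded["name"].AsString);   // "sensor-1"
    }
}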
C++ doesn't have structural introspection (you can't find out the fields of a class at runtime), so there is no general mechanism for writing out an arbitrary C++ object. You either have to adopt a convention and use code generation, or (more typically) write the serialisation yourself.
There are some libraries for standard formats such as ASN.1, HDF5, and so on which are implementation language neutral. There are proprietary libraries which serve the same purpose (eg protocol buffers).
If you're targeting a particular architecture and compiler, then you can also just dump the C++ object as raw bytes, and create a parser on the C# side.
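If you go the raw-bytes route, the C# side can be as simple as this sketch, which assumes the C++ sender wrote a packed little-endian struct { int32_t id; double value; } straight to the socket (field names and layout are hypothetical):

using System.IO;
using System.Text;

public struct Record
{
    public int Id;
    public double Value;

    // BinaryReader reads little-endian, matching x86 C++ compilers; this
    // breaks silently if either end changes architecture, padding, or order.
    public static Record ReadFrom(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            return new Record
            {
                Id = reader.ReadInt32(),
                Value = reader.ReadDouble()
            };
        }
    }
}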
Which is better depends on how tightly coupled you want your endpoints to be, and whether the data is mainly numerical (HDF5), trees and sequences (ASN.1), or simple plain data objects (directly writing the values in memory).
Other options would be:
creating a binary file that contains the data in the way you need it (not an easy & portable solution)
XML
YAML
plain text files
There are a lot of options you can choose from. Named pipes, shared memory, DDE, remoting... Depends on your particular need.
Quick googling gave the following:
Named pipes
Named Shared Memory
DDE
As mentioned already, Protocol Buffers are a good option.
If that option doesn't suit your needs, then I would look at sending the XML over to the client (you would have to prefix the message with its length so you know how much to read) and then using an implementation of IXmlSerializable, or the DataContract/DataMember attributes in conjunction with the DataContractSerializer, to get your representation in .NET.
I would recommend against using the marshaling attributes, as they aren't supported on things like List<T> and a number of other standard .NET classes which you would use normally.
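A minimal sketch of the DataContract route mentioned above (the KeyValue type and the length-prefixing convention are illustrative assumptions):

using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class KeyValue
{
    [DataMember] public string Key { get; set; }
    [DataMember] public string Value { get; set; }
}

public static class XmlWire
{
    // Produce the XML bytes to send; prefix with the length as noted above.
    public static byte[] Serialize(KeyValue item)
    {
        var serializer = new DataContractSerializer(typeof(KeyValue));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, item);
            return stream.ToArray();
        }
    }

    // Rebuild the object from the received XML bytes.
    public static KeyValue Deserialize(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(KeyValue));
        using (var stream = new MemoryStream(data))
        {
            return (KeyValue)serializer.ReadObject(stream);
        }
    }
}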