Generate truncated JSON using JSON.NET - C#

Given a .NET object, I would like to serialize it to a JSON string, but truncated to a specific length (e.g. 100 characters).
Is there an efficient way of doing that which does not involve serializing the entire object (which might be huge)?
Edited to make things clearer:
The result need not be a valid JSON string. It should be equivalent to:
JsonConvert.SerializeObject(obj).Substring(0, 100);
... but without traversing the entire object graph.

No serializer expects this scenario, because a serializer's job is usually to produce valid data that can be reliably parsed. However, many serializers can take a TextWriter (or, failing that, a Stream) as an output target. You could write a custom subclass of one of those which either silently discards data after the chosen amount (although the serializer will still walk the entire object graph), or deliberately throws an exception once the limit has been reached; the exception interrupts the serializer, avoiding most of the unnecessary work.
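As a sketch of the exception-based approach (the writer and exception names here are made up for illustration, not part of Json.NET), a TextWriter subclass can capture output and abort the serializer once the limit is hit:

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Thrown to interrupt the serializer once enough output has been captured.
class TruncationReachedException : Exception { }

class TruncatingTextWriter : TextWriter
{
    private readonly StringBuilder _buffer = new StringBuilder();
    private readonly int _limit;

    public TruncatingTextWriter(int limit) { _limit = limit; }

    public override Encoding Encoding => Encoding.Unicode;

    public override void Write(char value)
    {
        if (_buffer.Length >= _limit)
            throw new TruncationReachedException(); // stop the serializer early
        _buffer.Append(value);
    }

    public string Captured => _buffer.ToString();
}

static class TruncatedJson
{
    public static string SerializeTruncated(object obj, int limit)
    {
        var writer = new TruncatingTextWriter(limit);
        try
        {
            JsonSerializer.CreateDefault().Serialize(writer, obj);
        }
        catch (TruncationReachedException)
        {
            // Expected: the graph walk was cut short at the limit.
        }
        return writer.Captured;
    }
}
```

Note that, exactly like Substring(0, 100), the result may end mid-token and will usually not be valid JSON.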

Related

Is there a way to provide custom property formatting with Serilog?

I have a class BackgroundTask that I have set up to log events when things happen, such as when the task completes. For example, for the Success case, I call Log.Verbose("{Task} completed successfully", this); My default ToString() includes the progress, but for a completed task we know it's 100% so I would like to ignore it. I know that with numeric types you can pass in a custom format specifier string (like "{IntProperty:p5}", which would render my int as a percentage with 5 decimal places), and I would like to be able to do the same, such as Log.Information("{Task:Name}", this), which would pass in "Name" into my ToString() method.
I've tried many variations - extra ToString() overloads (taking nothing, a string, or a string plus an IFormatProvider), implementing IFormattable, forcing stringification, etc. - and nothing seems to work. I can see that Serilog is correctly parsing my format string (PropertyBinder.cs, ConstructNamedProperties(), line 111).
This calls ConstructProperty() which just ignores the Format property of the token, which explains why it's being ignored, but I was wondering if there was a way that would work that I haven't thought of.
PS
Yes I'm aware I have several options I could do, but I'd rather not do these:
Use a custom destructurer, or manually pull the properties out myself into an anonymous type - this essentially destroys the original object, which is exactly what I don't want (I still want the original value stored as a property). E.g. Log.Information("{Task}", new {Name = this.Name, Id = this.Id});
Manually call ToString() with my own format string - same as (1), this destroys the original and means it won't be stored with all its information. E.g. Log.Information("{Task}", this.ToString("Custom Format"));
Create a property on my object, like ToStringFormat, before passing it into Serilog - this seems like bad practice and just adds extra clutter, not to mention the concurrency issues. E.g. this.Format = "Custom Format"; Log.Information("{Task}", this);
This is due to the split between capturing and formatting in the Serilog pipeline.
In order for format strings like :X to be processed when rendering to a sink, the original object implementing IFormattable needs to be available.
But, because sinks often process events asynchronously, Serilog can't be sure that any given logged object is thread-safe, and so any unknown types are captured at the time/site of logging using ToString().
To get around this, you need to tell Serilog that your type is an (essentially immutable) scalar value, with:
.Destructure.AsScalar(typeof(BackgroundTask))
when the logger is configured. You can then implement IFormattable on the type and use format strings like {Task:Name} in templates.
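A minimal configuration sketch, assuming a BackgroundTask that implements IFormattable (the class and property names here mirror the question and are illustrative; Destructure.AsScalar and LoggerConfiguration are Serilog's documented API):

```csharp
using System;
using Serilog;

class BackgroundTask : IFormattable
{
    public string Name { get; set; }

    public string ToString(string format, IFormatProvider provider)
    {
        // "Name" arrives here from templates like {Task:Name}.
        return format == "Name" ? Name : base.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Capture BackgroundTask as a scalar, so the original IFormattable
        // instance survives to the sink and format strings are honoured.
        Log.Logger = new LoggerConfiguration()
            .Destructure.AsScalar(typeof(BackgroundTask))
            .WriteTo.Console()
            .CreateLogger();

        Log.Information("{Task:Name}", new BackgroundTask { Name = "Cleanup" });
    }
}
```

The trade-off is that the event now stores the object itself rather than a destructured property bag, which is exactly what the asker wanted.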

protobuf-net stream objects from disk

Consider that I have a very large collection (millions) of objects serialized according to the proto wire format. Is it possible to stream these items from the file? I tried serializing the objects as a List<T> and then deserializing a single T item, but it only read the very last item from the stream. I also tried serializing each instance individually to the stream, with the same effect: upon deserialization, only the last item was read.
I suspect the solution requires me to know the size of each serialized item, read that many bytes from the stream, and pass that span of bytes to the protobuf serializer for deserialization. I wanted to make sure there wasn't an easier mechanism - one that doesn't require knowing the length of each individual item, which may differ per instance - to accomplish this task.
Another thought I had was including the size of each upcoming object as its own object in the stream, for example:
0: meta-information for the first object, including type/length in bytes
1: object defined in 0
2: meta-information for the second object, including type/length in bytes
3: object defined in 2
4: ...etc
Version information:
I'm currently using dotnet core 3.1 and protobuf-net version 2.4.4
In protobuf, the root object is not terminated by default, with the intent being to allow "merge" === "append". This conflicts with the very common scenario you are describing. Fortunately, many libraries provide a mechanism to encode the length before each object for exactly this reason. What you are looking for is the SerializeWithLengthPrefix and DeserializeWithLengthPrefix methods.
If the data already exists as flat appends, and cannot be rewritten: there are still ways to recover it, by using the reader API. A bit more complex, but I've recovered such data in the past for people when needed.
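A hedged sketch of the length-prefix round trip (the Item type and file names are illustrative; SerializeWithLengthPrefix and DeserializeItems are protobuf-net's documented API):

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
class Item
{
    [ProtoMember(1)] public int Id { get; set; }
}

static class ItemStore
{
    public static void WriteAll(string path, IEnumerable<Item> items)
    {
        using (var file = File.Create(path))
        {
            foreach (var item in items)
            {
                // Each record is prefixed with its length (base-128, field 1),
                // so records stay distinct instead of merging on deserialization.
                Serializer.SerializeWithLengthPrefix(file, item, PrefixStyle.Base128, 1);
            }
        }
    }

    public static IEnumerable<Item> ReadAll(string path)
    {
        using (var file = File.OpenRead(path))
        {
            // DeserializeItems yields one record at a time via an iterator,
            // so the whole file is never held in memory.
            foreach (var item in Serializer.DeserializeItems<Item>(file, PrefixStyle.Base128, 1))
                yield return item;
        }
    }
}
```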

How to read back appended objects using protobuf-net?

I'm appending real-time events to a file stream using protobuf-net serialization. How can I stream all saved objects back for analysis? I don't want to use an in-memory collection (because it would be huge).
private IEnumerable<Activity> Read() {
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage))
    using (var sr = new StreamReader(iso)) {
        while (!sr.EndOfStream) {
            yield return Serializer.Deserialize<Activity>(iso); // doesn't work
        }
    }
}
public void Append(Activity activity) {
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage)) {
        Serializer.Serialize(iso, activity);
    }
}
First, I need to discuss the protobuf format (via Google, not specific to protobuf-net). By design, it is appendable but with append===merge. For lists this means "append as new items", but for single objects this means "combine the members". Secondly, as a consequence of the above, the root object in protobuf is never terminated - the "end" is simply: when you run out of incoming data. Thirdly, and again as a direct consequence - fields are not required to be in any specific order, and generally will overwrite. So: if you just use Serialize lots of times, and then read the data back: you will have exactly one object, which will have basically the values from the last object on the stream.
What you want to do, though, is a very common scenario. So protobuf-net helps you out by including the SerializeWithLengthPrefix and DeserializeWithLengthPrefix methods. If you use these instead of Serialize / Deserialize, then it is possible to correctly parse individual objects. Basically, the length-prefix restricts the data so that only the exact amount per-object is read (rather than reading to the end of the file).
I strongly suggest (as parameters) using tag===field-number===1, and the base-128 prefix-style (an enum). As well as making the data fully protobuf compliant throughout (including the prefix data), this will make it easy to use an extra helper method: DeserializeItems. This exposes each consecutive object via an iterator-block, making it efficient to read huge files without needing everything in memory at once. It even works with LINQ.
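Applied to the question's methods, a sketch might look like this (the StreamReader goes away, since protobuf is binary and the length prefixes now mark where each record ends):

```csharp
private IEnumerable<Activity> Read()
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.OpenOrCreate, FileAccess.Read, this.storage))
    {
        // Streams one Activity at a time until the data runs out.
        foreach (var activity in Serializer.DeserializeItems<Activity>(iso, PrefixStyle.Base128, 1))
            yield return activity;
    }
}

public void Append(Activity activity)
{
    using (var iso = new IsolatedStorageFileStream(storageFilename, FileMode.Append, FileAccess.Write, this.storage))
    {
        // Length-prefixed append: each record stays a distinct object.
        Serializer.SerializeWithLengthPrefix(iso, activity, PrefixStyle.Base128, 1);
    }
}
```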
There is also a way to use the API to selectively parse/skip different objects in the file - for example, to skip the first 532 records without processing the data. Let me know if you need an example of that.
If you already have lots of data that was already stored with Serialize rather than SerializeWithLengthPrefix - then it is probably still possible to decipher the data, by using ProtoReader to detect when the field-numbers loop back around : meaning, given fields "1, 2, 4, 5, 1, 3, 2, 5" - we can probably conclude there are 3 objects there and decipher accordingly. Again, let me know if you need a specific example.

Yet Another Take on Measuring Object Size

Searching Google and StackOverFlow comes up with a lot of references to this question. Including for example:
Ways to determine size of complex object in .NET?
How to get object size in memory?
So let me say at the start that I understand that it is not generally possible to get an accurate measurement. However I am not that concerned about that - I am looking for something that give me relative values rather than absolute. So if they are off a bit one way or the other it does not matter.
I have a complex object graph. It is made up of a single parent (T) with children that may themselves have children, and so on. All the objects in the graph derive from the same base class. The children are held in a List<T>.
I have tried both the serializing method and the unsafe method to calculate size. They give different answers but the 'relative' problem is the same in both cases.
I made an assumption that the size of a parent object would be larger than the sum of the sizes of its children. This has turned out not to be true. I calculated the size of the parent, then summed the sizes of the children. In some cases this appeared to make sense, but in others the sum of the children far exceeded the size determined for the parent.
So my question is: why can serializing a parent object result in a size that is less than the sum of its serialized children? The only answer I have come up with is that each serialized object has a fixed overhead (which I guess is self-evident), and the sum of these can exceed the 'own size' of the parent. If that is so, is there any way to determine what that overhead might be, so that I can take account of it?
Many thanks in advance for any suggestions.
EDIT
Sorry, I forgot to say that all objects are marked serializable; the serialization method is:
var bf = new BinaryFormatter();
var ms = new MemoryStream();
bf.Serialize(ms, testObject);
byte[] array = ms.ToArray();
return array.Length;
It will really depend on which serialization mechanism you use. It's possible that it's not serializing the child elements, which is one reason why you'd see the parent size smaller than the sum of the children (possibly even smaller than each individual child).
If you want to know the relative size of an object, make sure that you're serializing all the fields of all objects in your graph.
Edit: so, if you're using the binary formatter, then you must look at the specification for the format used by that serializer to understand the overhead. The format specification is public, and can be found at http://msdn.microsoft.com/en-us/library/cc236844(prot.20).aspx. It's not very easy to digest, but if you're willing to put the time to understand it, you'll find exactly how much overhead each object will have in its serialized form.
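As an illustrative sketch of that overhead (the Node type is hypothetical): each standalone BinaryFormatter stream repeats its header and type metadata, so the summed sizes of individually serialized children can exceed the parent's single stream, which writes that metadata only once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Node
{
    public List<Node> Children = new List<Node>();
}

class Program
{
    // Same measurement technique as in the question.
    static long SerializedSize(object obj)
    {
        var bf = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            bf.Serialize(ms, obj);
            return ms.Length; // stream header + type metadata + field data
        }
    }

    static void Main()
    {
        var parent = new Node();
        for (int i = 0; i < 3; i++)
            parent.Children.Add(new Node());

        long parentSize = SerializedSize(parent);
        long childSum = parent.Children.Sum(c => SerializedSize(c));

        // Each child stream repeats the fixed overhead, so childSum can
        // exceed parentSize even though the parent contains the children.
        Console.WriteLine($"parent: {parentSize}, sum of children: {childSum}");
    }
}
```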

Can anyone recommend a way to check if a class can be serialized as XML?

I have a generic class that takes an object of type T, serializes it as XML, then saves it to the filesystem. Currently, however, the serialize operation fails if the object is not serializable. That's not a problem as such, but I think it would be better to check in my class constructor whether an instance of T is serializable, and if it is not, throw an error at that point rather than later on.
Is there a way of checking whether an instance of T can be serialized as XML, other than simply instantiating it and trying to serialize it in a try/catch? It would be nice if I could interrogate the class T in some manner to discover whether it can be serialized as XML.
If it helps, the code can be seen here: http://winrtstoragehelper.codeplex.com/SourceControl/changeset/view/ac24e6e923cd#WinRtUtility%2fWinRtUtility%2fObjectStorageHelper.cs
Note that this code gets compiled against WinRT (i.e., it is for use in a Windows 8 app) however I think the question is relevant to any dialect of C#.
thanks in advance
Jamie
AFAIK, even if you check for the various attributes (Serializable, DataContract) or check Type.IsSerializable (which I believe is just a convenience method for checking the Serializable attribute's existence), it doesn't guarantee that the implementation actually is serializable. (EDIT: as mentioned, and as seen in the example code provided in the question, XmlSerializer does not depend on the Serializable attribute at all, so there's no sense in checking for these flags.)
In my experience, your best bet is to use unit tests that validate the various types used in your application, using try/catch to see whether each passes or fails. At runtime, use try/catch (rather than pre-checking each time) and log/handle the exception.
If you have a list of valid compatible types as a result of your unit-testing, you can have a pre-check of T against a compile-time list that you determined from testing previously and assume any other types are just no good. Might want to watch for subclasses of known valid types though as even if they inherit from a valid serializable type, their implementation may not be.
EDIT: Since this is for WinRT - while I don't have experience with that platform, I have worked with Silverlight, and in that case you can serialize objects even if they are not marked [Serializable] (in fact, that attribute doesn't even exist in Silverlight). The built-in XmlSerializer just works against all public properties, regardless of adornment. The only way to see if something is serializable is either to attempt a serialization and try/catch the failure, or to write an algorithm that inspects each property (recursing through child objects) and checks whether each type can be serialized.
EDITx2: Looking at your ObjectStorageHelper, I would suggest that you simply attempt serialization and catch any failures. You don't necessarily have to bubble up the exception directly. You could wrap with your own custom exception or have a returned results object that informs the API consumer of the pass/fail of the serialization and why it may have failed. Better to assume the caller is using a valid object rather than doing an expensive check each time.
EDITx3: Since you're doing a lot of other work in the save method, I would suggest rewriting your code like this:
public async Task SaveAsync(T Obj)
{
    if (Obj == null)
        throw new ArgumentNullException("Obj");
    StorageFile file = null;
    StorageFolder folder = GetFolder(storageType);
    file = await folder.CreateFileAsync(FileName(Obj), CreationCollisionOption.ReplaceExisting);
    IRandomAccessStream writeStream = await file.OpenAsync(FileAccessMode.ReadWrite);
    using (Stream outStream = Task.Run(() => writeStream.AsStreamForWrite()).Result)
    {
        try
        {
            serializer.Serialize(outStream, Obj);
        }
        catch (InvalidOperationException ex)
        {
            throw new TypeNotSerializableException(typeof(T), ex);
        }
        await outStream.FlushAsync();
    }
}
This way you catch the serialization issue specifically and can report to the API consumer very clearly that they have provided an invalid/non-serializable object; and if an exception is thrown from the I/O portions instead, it's clearer where the issue is. In fact, you may want to separate the serialization/deserialization aspects into their own discrete method/class, so you can feed in other serializers (or be clearer from the stack trace where the issue is, or simply make your methods do one thing only). But any further rewriting/refactoring is really a matter for code review, and not very relevant to the question at hand.
FYI, I also put a null check for your input object because if the user passes null, they would think that the save was successful when in fact, nothing happened and they might expect a value to be available for loading later when it doesn't exist. If you want to allow nulls as valid values, then don't bother with the check throwing an error.
It depends on what "can be serialized" means. Any class can be serialized with XmlSerializer. If what you mean by can be serialized is that no error occurs, then you'll have to try and catch exceptions to tell for sure.
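If a constructor-time pre-check is still wanted, one pragmatic sketch is to probe with a throwaway serialization (the helper name is made up for illustration; note that XmlSerializer can throw from its constructor as well as from Serialize for unsupported types):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

static class XmlProbe
{
    // Returns true if the sample serializes cleanly. Both the constructor
    // and Serialize can throw for unsupported shapes (e.g. types without a
    // parameterless constructor, or dictionary properties).
    public static bool CanXmlSerialize<T>(T sample)
    {
        try
        {
            new XmlSerializer(typeof(T)).Serialize(TextWriter.Null, sample);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }
}
```

As the answers above note, this is still fundamentally try/catch under the hood; it just moves the failure to a point of your choosing.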
