Newtonsoft Json deserializer not releasing memory - c#

I'm using a StreamReader with JsonTextReader to deserialize a large JSON file containing tens of thousands of small objects, and it's consuming far more memory than seems reasonable (and running out). I'm using what I understand to be the recommended pattern for reading large files.
Code simplified for expository purposes:
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    while (reader.Read() && reader.TokenType != JsonToken.EndArray)
    {
        JToken token = JToken.Load(reader);
        RawResult result = token.ToObject<RawResult>();
        results.Add(result);
    }
}
The VS2015 memory profiler is telling me that most of the memory is being consumed by Newtonsoft.Json.Linq.JValue objects, which is bizarre because once the current token has been converted via ToObject() there is no reason (as far as I can see) why it shouldn't just be discarded.
I'm assuming that the Newtonsoft library is retaining all of the JSON parsed so far in memory. I don't need it to do this and I think if I could prevent this my memory problems would go away.
What can be done?

It doesn't look like you need to be using JTokens as an intermediary; you could just deserialize directly to your RawResult class inside your loop.
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    while (reader.Read() && reader.TokenType != JsonToken.EndArray)
    {
        RawResult result = serializer.Deserialize<RawResult>(reader);
        results.Add(result);
    }
}
Also note that by adding your result items to a list, you are keeping them all in memory. If you can process them one at a time and write each result individually to your output (file, database, network stream, etc.) you can save memory that way also.
RawResult result = serializer.Deserialize<RawResult>(reader);
ProcessResult(result); // process result now instead of adding to a list
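If the rest of the pipeline can consume results lazily, the same pattern can be wrapped in an iterator so that only one RawResult is alive at a time. A minimal sketch, reusing the types from the question (ReadResults is a name invented here):
static IEnumerable<RawResult> ReadResults(Stream stream)
{
    using (var streamReader = new StreamReader(stream))
    using (var reader = new JsonTextReader(streamReader))
    {
        var serializer = new JsonSerializer();
        while (reader.Read() && reader.TokenType != JsonToken.EndArray)
        {
            // Deserialize the current array element directly; no JToken is built
            // and nothing is accumulated.
            yield return serializer.Deserialize<RawResult>(reader);
        }
    }
}

// Usage: each result becomes eligible for collection as soon as it is processed.
foreach (var result in ReadResults(stream))
    ProcessResult(result);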

Show an error message when uploading a file larger than 5 MB

I send a JSON file to the server and want to read that twice.
[HttpPost]
public ActionResult CreateCases(string fileFormat, Guid key)
{
    var file = Request.Files[0];
    Check(file);
    Create(file);
    return Json(new { success = true });
}
public object Check(HttpPostedFileBase file)
{
    var stream = file.InputStream;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        dynamic json = serializer.Deserialize(jsonTextReader);
        ...
    }
}
public object Create(HttpPostedFileBase file)
{
    var stream = file.InputStream;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        dynamic json = serializer.Deserialize(jsonTextReader);
        ...
    }
}
In the Check method, file.ContentLength has the right value.
In the Create method, file.ContentLength is 0 and the json variable comes out null.
What am I doing wrong?
Thanks in advance.
What am I doing wrong?
This:
I [...] want to read that [file] twice
Your client only sends the file to your web application once, so you should only read it once.
Sure, you can rewind the input stream and appear to solve the immediate problem, but that just introduces new problems, because now you have the entire file in memory at once - and your code can only continue once the entire request has been read.
You don't want to read the file twice.
If you want to validate and then process the JSON, read it once, store the result in a variable, and then validate and process that variable. Yes, this still requires you to read the entire request body, but then that's your requirement.
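A minimal sketch of that shape, assuming Check and Create can be refactored to take the already-deserialized object rather than the HttpPostedFileBase:
[HttpPost]
public ActionResult CreateCases(string fileFormat, Guid key)
{
    var file = Request.Files[0];
    dynamic json;
    using (var sr = new StreamReader(file.InputStream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        // Read the request body exactly once.
        json = new JsonSerializer().Deserialize(jsonTextReader);
    }

    Check(json);  // validate the in-memory object
    Create(json); // then process the same object
    return Json(new { success = true });
}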

Deserialize JSON from Stream depending on first ~16 characters in .NET Core

I am getting gzipped JSON which I check for a specific start string, and if it's correct I deserialize it into my object:
await using var decompressionStream = new GZipStream(externalStream, CompressionMode.Decompress);
var streamReader = new StreamReader(decompressionStream);
string eventString = streamReader.ReadToEnd();
if (!eventString.StartsWith("SomeString"))
{
    // Drop it because it can't be deserialized
    return;
}
MyObject item = Utf8Json.JsonSerializer.Deserialize<MyObject>(Encoding.UTF8.GetBytes(eventString));
This approach works, but it seems wasteful to turn the stream into a string and back into a byte array when Utf8Json can deserialize directly from a stream.
Because 3/4 of the incoming streams contain data of a different type that can't be deserialized, I don't want to put a try/catch block around it; that many exceptions would be too expensive.
Is there a way to peek into the first ~16 chars without consuming them?
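The decompressed stream isn't seekable, so a true peek isn't possible, but the prefix can be read manually and stitched back in front of the rest. A hedged sketch of that idea (it assumes the marker is ASCII, and matching payloads are still buffered in memory before deserializing, just without the string round-trip):
await using var decompressionStream = new GZipStream(externalStream, CompressionMode.Decompress);

// Read up to 16 bytes; loop because Read may return fewer bytes than requested.
var prefix = new byte[16];
int filled = 0;
while (filled < prefix.Length)
{
    int n = await decompressionStream.ReadAsync(prefix, filled, prefix.Length - filled);
    if (n == 0) break; // stream ended early
    filled += n;
}

if (!Encoding.UTF8.GetString(prefix, 0, filled).StartsWith("SomeString"))
{
    // Drop it without ever materializing the full payload as a string.
    return;
}

// Re-attach the consumed prefix in front of the remaining bytes and let
// Utf8Json deserialize from the stream.
using var stitched = new MemoryStream();
stitched.Write(prefix, 0, filled);
await decompressionStream.CopyToAsync(stitched);
stitched.Position = 0;
MyObject item = Utf8Json.JsonSerializer.Deserialize<MyObject>(stitched);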

How to Trace and Process Stream with Json.Net

I am writing an application which will make a REST HTTP call and receive JSON back in the response body. The response body could be large and its content varies, so I want to handle it as a stream and process the content to decide which parts to deserialize.
I am able to perform my parsing and deserialize the objects I want using a StreamReader (passing in the HttpResponseMessage.Content, read as a stream) and a JsonTextReader. However, as the JsonTextReader pulls data from the stream, I would also like to trace the raw JSON to a file so that we have the raw response recorded (for diagnostics).
For example, my code:
var serializer = new JsonSerializer();
using (var streamReader = new StreamReader(httpContentStream))
using (var textReader = new JsonTextReader(streamReader))
{
    textReader.SupportMultipleContent = true;
    while (textReader.Read())
    {
        // Code which looks at the tokens in the text reader and
        // figures out what to throw away, what to deserialize and
        // what the next type to deserialize will be (stored in someType)
        Type someType = null;

        // Deserialize an object that we're interested in.
        // This advances the textReader to the next token
        // after the end of this object (i.e. more than one token)
        var someObject = serializer.Deserialize(textReader, someType);
    }
}
Is there a way to also have Json.Net trace out the text it's reading as it pulls characters off the stream? I understand that this could generate a large file!
Thanks in advance for any responses.
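Json.Net doesn't expose the raw characters it consumes, but since JsonTextReader only sees a TextReader, one workaround is to wrap the StreamReader in a reader that echoes everything it serves to a trace writer. A sketch under that assumption (EchoTextReader is a name invented here, not part of Json.Net):
public sealed class EchoTextReader : TextReader
{
    private readonly TextReader _inner;
    private readonly TextWriter _trace;

    public EchoTextReader(TextReader inner, TextWriter trace)
    {
        _inner = inner;
        _trace = trace;
    }

    public override int Peek() => _inner.Peek(); // peeking consumes nothing, so no trace

    public override int Read()
    {
        int c = _inner.Read();
        if (c != -1) _trace.Write((char)c);
        return c;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int n = _inner.Read(buffer, index, count);
        if (n > 0) _trace.Write(buffer, index, n); // copy exactly what was read
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) { _inner.Dispose(); _trace.Flush(); }
        base.Dispose(disposing);
    }
}
It would slot in where the StreamReader is used today, e.g. new JsonTextReader(new EchoTextReader(streamReader, traceWriter)), with traceWriter being a StreamWriter over the diagnostics file.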

Fastest way to deserialize objects from Azure Blob storage?

After experiencing a ton of outages with our central RavenDb, we're looking to cache certain objects in Azure Blob Storage. Redis does not have the same SLA guarantees as ABS, so Redis has been ruled out.
Retrieval and deserialization of these objects happens every minute and needs to happen extremely quickly.
Here is the code we're trying to use to deserialize; however, it is about 5-6x slower than retrieving objects from Raven. Any way to optimize it? Object size is about 8 MB.
var blob = container.GetBlockBlobReference(entityId + ".json");
var serializer = new JsonSerializer
{
    ObjectCreationHandling = ObjectCreationHandling.Reuse,
    NullValueHandling = NullValueHandling.Include,
    ReferenceLoopHandling = ReferenceLoopHandling.Serialize,
    PreserveReferencesHandling = PreserveReferencesHandling.All,
    TypeNameAssemblyFormat = FormatterAssemblyStyle.Full,
    TypeNameHandling = TypeNameHandling.All
};
using (var stream = new MemoryStream())
{
    blob.DownloadToStream(stream);
    stream.Position = 0;
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        var accountOut = serializer.Deserialize<Account>(jsonTextReader);
    }
}
It turns out that using a single serializer object, without re-creating it on every cycle of the loop, was the fix. Once we started caching the JsonSerializer object and re-using it, deserialization from Blob Storage took about 50% of the time of retrieval from RavenDb.
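A minimal sketch of that fix, mirroring the question's settings: hoist the configured serializer into a single shared instance (Json.NET's JsonSerializer is generally safe to reuse across calls).
private static readonly JsonSerializer CachedSerializer = new JsonSerializer
{
    // Same settings as in the question; built once instead of per download.
    ObjectCreationHandling = ObjectCreationHandling.Reuse,
    NullValueHandling = NullValueHandling.Include,
    ReferenceLoopHandling = ReferenceLoopHandling.Serialize,
    PreserveReferencesHandling = PreserveReferencesHandling.All,
    TypeNameAssemblyFormat = FormatterAssemblyStyle.Full,
    TypeNameHandling = TypeNameHandling.All
};

// Each retrieval now reuses the cached instance instead of rebuilding it:
using (var stream = new MemoryStream())
{
    blob.DownloadToStream(stream);
    stream.Position = 0;
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        var accountOut = CachedSerializer.Deserialize<Account>(jsonTextReader);
    }
}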
Want speed and can't use Redis (for whatever reason)? There's only one right answer: CosmosDB.
It's very fast and it's backed by SSD storage.
It offers 99.99% availability within a single region (and geo-replication is a few clicks away in the Portal UI if you need it).

Stream adapter to provide write-to interface to write-out utility, but also supply read-out interface

I don't like using MemoryStream objects between stream interfaces. They are awkward, requiring you to re-seek to the start, and they can also spike memory usage in demanding situations.
Sometimes a utility will only work a certain way. Perhaps it will output byte[]s, or write to a stream, or is a stream in a pipeline that you read from, pulling the data through.
This Newtonsoft JSON serializer is a utility which can only write to a stream.
var js = new Newtonsoft.Json.JsonSerializer();
var sw = new StreamWriter(ps); // ps: the destination stream
js.Serialize(sw, o);           // o: the object graph to serialize
This is a problem for me, because I want to chain:
IEnumerable
JSON serialization
GZIP compression
HTTP to client
(Network)
HTTP from Server
GZIP decompression
JSON deserialization
IEnumerable
Apart from the difficulties getting the JSON deserializer to present a nice IEnumerable interface, the rest of the parts don't provide an interface suitable for pipelining. Even the GZIP compression side is the wrong way around.
Ideally, on the server-side I would be able to do:
IEnumerable<object> o = GetData();
var js = new Newtonsoft.Json.JsonSerialization(o);
var gz = new System.IO.Compression.GZipStream(js, System.IO.Compression.CompressionMode.Compress, true);
return new FileStreamResult(gz, "application/x-gzip");
I could possibly extend the Newtonsoft project to provide a pipeline implementation, and I may do so. But until then I need a solution, and I believe one is required for other utilities (including the BCL GZipStream).
Are there any solutions which allow one to join such utilities more efficiently?
Is there a library which contains an adapter for such situations?
I am working on such a library myself, as I don't expect one already exists.
The answer is the brand new StreamAdaptor project:
https://bitbucket.org/merarischroeder/alivate-stream-adaptor. It still needs a bit of work - it would be nice to package it as a NuGet package - but it's all there and tested.
So the interface will look a bit like this:
var data = GetData(); // Get the source data
var sa = new StreamAdaptor(); // This is what wraps the write-only utility source
sa.UpstreamSource((ps) => // ps is the dummy stream which does most of the magic
{
    // This anon. function is run on a separate thread and can therefore be blocked
    var sw = new StreamWriter(ps);
    sw.AutoFlush = true;
    var js = new Newtonsoft.Json.JsonSerializer();
    js.Serialize(sw, data); // This is the main component of the implementation
    sw.Flush();
});
var sa2 = new StreamAdaptor();
sa2.UpstreamSource((ps) =>
{
    using (var gz = new System.IO.Compression.GZipStream(ps, System.IO.Compression.CompressionMode.Compress, true))
        sa.CopyTo(gz);
});
The reverse process is easier, since a read-through pipeline is supported naturally:
var gz = new System.IO.Compression.GZipStream(sa2, System.IO.Compression.CompressionMode.Decompress);
var jsonTextReader = new JsonTextReader(new StreamReader(gz));
return TestA.Deserialize(jsonTextReader);
There I also demonstrate a workaround for the IEnumerable<> deserialization issue. It requires you to create your own deserializer leveraging JsonTextReader, but it works well.
The serializer supports IEnumerable natively. The GetData function above sets up the data source for the serializer using IEnumerable functions (among other things):
public static IEnumerable<TestB> GetTestBs()
{
    for (int i = 0; i < 2; i++)
    {
        var b = new TestB();
        b.A = "A";
        b.B = "B";
        b.C = TestB.GetCs();
        yield return b;
    }
}
It's deserialization that requires a workaround. Keep in mind that IEnumerable<> properties all need to come at the end of the JSON stream/objects, because enumeration is deferred while JSON deserialization is linear.
The deserialization entry point:
public static TestA Deserialize(JsonTextReader reader)
{
    TestA a = new TestA();
    reader.Read(); // StartObject
    reader.Read(); // PropertyName
    if (!reader.Value.Equals("TestBs"))
        throw new Exception("Expected property 'TestBs' first");
    reader.Read(); // Start array
    a.TestBs = DeserializeTestBs(reader); // IEnumerable property last
    return a;
}
One of the IEnumerable deserializer functions:
static IEnumerable<TestB> DeserializeTestBs(JsonTextReader reader)
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.EndArray)
            break;
        yield return TestB.Deserialize(reader);
    }
    reader.Read(); // End of object
}
This can of course be achieved with trial and error, although built-in support in JSON.NET is desirable.
