I have a problem deserializing a JSON file of about 1GB. When I run the following code I get an out of memory exception:
using (FileStream sr = new FileStream("myFile.json", FileMode.Open, FileAccess.Read))
{
    using (StreamReader reader = new StreamReader(sr))
    {
        using (JsonReader jsReader = new JsonTextReader(reader))
        {
            JsonSerializer serializer = new JsonSerializer();
            dataObject = serializer.Deserialize<T>(jsReader);
        }
    }
}
The exception is thrown by:
Newtonsoft.Json.Linq.JTokenWriter.WriteValue(Int64 value)
The serialization works well; here is the code I'm using:
using (StreamWriter writer = new StreamWriter("myFile.json"))
using (JsonTextWriter jsonWriter = new JsonTextWriter(writer) { Formatting = Formatting.Indented })
{
    JsonSerializer ser = new JsonSerializer();
    ser.Serialize(jsonWriter, dataObject, dataObject.GetType());
    jsonWriter.Flush();
}
Am I doing something wrong in the deserialization? Can you suggest a way to deserialize a big JSON object?
Thanks
According to the Newtonsoft.Json Performance Tips, your approach should work: you read through a stream, so the reader should process the file in portions rather than loading it all at once. I can't figure out why your code doesn't work.
But you can try another approach, described in the following article: Parsing Big Records with Json.NET.
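A minimal sketch of that kind of token-level reading, assuming the 1GB file is a single object whose bulk sits in one array property. The Item type, the BigRecordReader helper and the property name passed in are hypothetical; adapt them to the real data. Only the elements of the array are deserialized, one at a time, so the whole document never has to be materialized.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

// Hypothetical element type, used only for illustration.
public class Item
{
    public int Id { get; set; }
}

public static class BigRecordReader
{
    // Calls handle(item) for each element of the array stored under
    // arrayProperty and returns the number of elements seen.
    // Assumptions: the property name does not occur anywhere else outside
    // that array, and its value is a JSON array of objects.
    public static int ReadItems(TextReader text, string arrayProperty, Action<Item> handle)
    {
        var serializer = new JsonSerializer();
        int count = 0;
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName
                    && (string)reader.Value == arrayProperty)
                {
                    reader.Read(); // advance to the StartArray token
                    // Each Read() lands on the start of the next element
                    // or on EndArray when the array is exhausted.
                    while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                    {
                        handle(serializer.Deserialize<Item>(reader));
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

With a file, you would pass a StreamReader over a FileStream as the TextReader, e.g. BigRecordReader.ReadItems(new StreamReader("myFile.json"), "items", item => Save(item)), where Save is whatever you do with each element.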
I send a JSON file to the server and want to read it twice.
[HttpPost]
public ActionResult CreateCases(string fileFormat, Guid key)
{
    var file = Request.Files[0];
    Check(file);
    Create(file);
    return Json();
}
public object Check(HttpPostedFileBase file)
{
    var stream = file.InputStream;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        dynamic json = serializer.Deserialize(jsonTextReader);
        ...
    }
}
public object Create(HttpPostedFileBase file)
{
    var stream = file.InputStream;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        dynamic json = serializer.Deserialize(jsonTextReader);
        ...
    }
}
In the Check method, file.ContentLength has the right value.
In the Create method, file.ContentLength is 0 and the json variable is null.
What am I doing wrong?
Thanks in advance.
What am I doing wrong?
This:
I [...] want to read that [file] twice
Your client only sends the file to your web application once, so you should only read it once.
Sure, you can rewind the input stream and appear to solve the immediate problem, but that just introduces new problems, because now you have the entire file in memory at once - and your code can only continue once the entire request has been read.
You don't want to read the file twice.
If you want to validate and then process the JSON, read it once, store it in a variable, and then validate and process that variable. Yes, this still requires you to read the entire request body, but then that's your requirement.
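A rough sketch of the read-once idea, not the asker's actual code: the UploadHandling class and the bodies of Check and Create are hypothetical stand-ins that take the JSON text instead of the uploaded file. Note that disposing a StreamReader also closes the stream it wraps by default, so after the first read there is nothing left for a second reader anyway.

```csharp
using System;
using System.IO;
using System.Text;

public static class UploadHandling
{
    // Reads the whole request body into a string exactly once.
    public static string ReadBodyOnce(Stream input)
    {
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical validation step: works on the string, not the stream.
    public static bool Check(string json)
    {
        return !string.IsNullOrWhiteSpace(json);
    }

    // Hypothetical processing step: returns the text length as a placeholder.
    public static int Create(string json)
    {
        return json.Length;
    }

    public static int Handle(Stream input)
    {
        string json = ReadBodyOnce(input); // the stream is consumed and closed here
        if (!Check(json))
        {
            throw new InvalidDataException("Uploaded JSON failed validation.");
        }
        return Create(json);
    }
}
```

In the controller this would be called as UploadHandling.Handle(file.InputStream); both steps then see the same data because neither touches the stream again.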
Is there any JSON serializer library that works in both .NET Core and .NET 3.5?
I need to use a library in a multiplatform project. The problem is that, as far as I can tell, Newtonsoft's library works only in .NET Framework and System.Text.Json works only in .NET Core.
I tried Json.Net but no luck. I get this kind of error on all of the libraries:
You can use DataContractJsonSerializer
I am using it in my .net standard library to serialize and de-serialize my model objects to json.
public static string PrepareJsonString<T>(object objectToBeParsed)
{
    DataContractJsonSerializer dataContractSerializer = new DataContractJsonSerializer(typeof(T));
    string json = string.Empty;
    using (var ms = new MemoryStream())
    {
        dataContractSerializer.WriteObject(ms, (T)objectToBeParsed);
        ms.Position = 0;
        StreamReader sr = new StreamReader(ms);
        json = sr.ReadToEnd();
    }
    return json;
}
// Returns T rather than object so callers can assign the result without a cast.
public static T PrepareObjectFromString<T>(string json)
{
    DataContractJsonSerializer dataContractSerializer = new DataContractJsonSerializer(typeof(T));
    using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
    {
        return (T)dataContractSerializer.ReadObject(memoryStream);
    }
}
Sample usage of these functions:
List<MyModel> list = PrepareObjectFromString<List<MyModel>>(myJson);
or
MyModel model = PrepareObjectFromString<MyModel>(myJson);
string json = PrepareJsonString<MyModel>(myModelInstance);
Hope this helps.
Thank you.
I'm using a StreamReader with JsonTextReader to deserialize a large JSON file containing tens of thousands of small objects, and it's consuming far more memory than I think is reasonable (and running out). I'm using what I understand is the recommended pattern for reading large files.
Code simplified for expository purposes:
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    JToken token;
    while (reader.Read() && reader.TokenType != JsonToken.EndArray)
    {
        token = JToken.Load(reader);
        RawResult result = token.ToObject<RawResult>();
        results.Add(result);
    }
}
The VS2015 memory profiler is telling me that most of the memory is being consumed by Newtonsoft.Json.Linq.JValue objects, which is bizarre because once the current token has been converted ToObject() there is no reason (as far as I am concerned) why it shouldn't just be discarded.
I'm assuming that the Newtonsoft library is retaining all of the JSON parsed so far in memory. I don't need it to do this and I think if I could prevent this my memory problems would go away.
What can be done?
It doesn't look like you need to be using JTokens as an intermediary; you could just deserialize directly to your RawResult class inside your loop.
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    while (reader.Read() && reader.TokenType != JsonToken.EndArray)
    {
        RawResult result = serializer.Deserialize<RawResult>(reader);
        results.Add(result);
    }
}
Also note that by adding your result items to a list, you are keeping them all in memory. If you can process them one at a time and write each result individually to your output (file, database, network stream, etc.) you can save memory that way also.
RawResult result = serializer.Deserialize<RawResult>(reader);
ProcessResult(result); // process result now instead of adding to a list
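One way to package the one-at-a-time idea is an iterator, so the caller decides what to do with each item and nothing accumulates unless the caller keeps it. This is a sketch under assumptions: JsonArrayStreamer and ReadArray are hypothetical names, and the input is assumed to be a top-level array of objects. Checking for StartObject before deserializing means the StartArray and EndArray tokens are simply skipped.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonArrayStreamer
{
    // Lazily yields each object of a top-level JSON array.
    // Only the item currently being deserialized is held by this method.
    public static IEnumerable<T> ReadArray<T>(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // Deserialize consumes the whole object, including nested
                // objects, so only top-level items reach this check.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

Usage would look like foreach (RawResult r in JsonArrayStreamer.ReadArray<RawResult>(streamReader)) ProcessResult(r); with RawResult and ProcessResult as in the snippets above.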
Abstract
Hi, I'm working on a project that needs to send a potentially huge JSON representation of an object via HttpClient; 10-20 MB of JSON is a typical size. To do that efficiently I want to use streams, both with Json.Net to serialize the object and with HttpClient to post the data.
Problem
Here is the snippet for serialization with Json.net, in order to work with streams, Json.net expects a stream that it will write into:
public static void Serialize( object value, Stream writeOnlyStream )
{
    // Here Json.net expects the stream to be already created
    StreamWriter writer = new StreamWriter(writeOnlyStream);
    JsonTextWriter jsonWriter = new JsonTextWriter(writer);
    JsonSerializer ser = new JsonSerializer();
    ser.Serialize(jsonWriter, value );
    jsonWriter.Flush();
}
While HttpClient expects a stream that it will read from:
using (var client = new HttpClient())
{
    client.BaseAddress = new Uri("http://localhost:54359/");
    // The same thing here: HttpClient expects the stream to exist already
    var response = await client.PostAsync("/api/snapshot", new StreamContent(readOnlyStream));
    ...
}
So both classes expect the Stream to be created by someone else, but neither Json.Net nor HttpClient provides one. It seems the problem could be solved by implementing a stream that intercepts read requests made to the read-only stream and issues writes to the write-only stream on demand.
Question
Maybe someone has already run into this situation and found an existing solution to the problem. If so, please share it with me.
Thank you in advance!
If you define a subclass of HttpContent :
public class JsonContent:HttpContent
{
    public object SerializationTarget{get;private set;}
    public JsonContent(object serializationTarget)
    {
        SerializationTarget=serializationTarget;
        this.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    }
    protected override async Task SerializeToStreamAsync(Stream stream,
                                                         TransportContext context)
    {
        using(StreamWriter writer = new StreamWriter(stream))
        using(JsonTextWriter jsonWriter = new JsonTextWriter(writer))
        {
            JsonSerializer ser = new JsonSerializer();
            ser.Serialize(jsonWriter, SerializationTarget );
        }
    }
    protected override bool TryComputeLength(out long length)
    {
        //we don't know. can't be computed up-front
        length = -1;
        return false;
    }
}
then you can:
var someObj = new {a = 1, b = 2};
var client = new HttpClient();
var content = new JsonContent(someObj);
var responseMsg = await client.PostAsync("http://someurl",content);
and the serializer will write directly to the request stream.
Use PushStreamContent. Rather than have Web API "pull" from a stream, it lets you more intuitively "push" into one.
object value = ...;
PushStreamContent content = new PushStreamContent((stream, httpContent, transportContext) =>
{
    using (var tw = new StreamWriter(stream))
    {
        JsonSerializer ser = new JsonSerializer();
        ser.Serialize(tw, value);
    }
});
Note that JSON.NET doesn't support async during serialization so while this may be more memory efficient, it won't be very scalable.
I'd recommend trying to avoid such large JSON objects, though. Try to chunk it up, for instance, if you're sending over a large collection. Many clients/servers will flat out reject something so big without special handling.
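If the payload is a large collection, chunking could look roughly like this. The Chunking class, the InBatches helper and the batch size are hypothetical and only illustrate the idea; each batch would then be serialized and posted as its own, smaller request. On .NET 6 or later, Enumerable.Chunk covers the same ground.

```csharp
using System;
using System.Collections.Generic;

public static class Chunking
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    // The last batch may be smaller; an empty source yields no batches.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh list for the next batch
            }
        }

        if (batch.Count > 0)
        {
            yield return batch; // remaining items that did not fill a full batch
        }
    }
}
```

Because this is an iterator, the argument check runs when enumeration starts, not when InBatches is called.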
I can't get the DataContractJsonSerializer object to swallow my stream. When I execute the code with the commented-out line active, I get to see the text provided (and it is a parsable JSON object), so I know that the stream is working fine.
However, for some reason, the compiler complains that the streamReader I'm trying to shove down its throat in ReadObject isn't a Stream. Well, isn't it?!
Argument 1: cannot convert from 'System.IO.StreamReader' to 'System.IO.Stream'
What am I missing and how do I resolve it?
using (StreamReader streamReader = new StreamReader(...))
{
    //String responseText = reader.ReadToEnd();
    MyThingy thingy = new MyThingy();
    DataContractJsonSerializer serializer
        = new DataContractJsonSerializer(thingy.GetType());
    thingy = serializer.ReadObject(streamReader);
}
I'm adapting this example to work with my stream. Should I approach it from a different angle? If so - how?
You're passing a reader of a stream instead of the stream itself. Skip the using and take whatever hides behind the ellipsis (i.e. whatever you pass as an argument when you create the StreamReader); if that is a Stream, you can probably pass it straight to ReadObject.
Also, you'll run into a problem when reading the data, because ReadObject returns an instance of type Object and you'll need to convert it to MyThingy. Since MyThingy is (I'm assuming) a reference type and therefore nullable, you can use the as operator instead of an explicit cast.
MyThingy thingy = new MyThingy();
DataContractJsonSerializer serializer
    = new DataContractJsonSerializer(thingy.GetType());
Stream stream = ...;
thingy = serializer.ReadObject(stream) as MyThingy;
You could of course skip the next-to-last line and put the stream directly into the last line.
Courtesy of @JohanLarsson (all Swedes are great, especially those from Stockholm, like me):
In case you can't or don't want to omit the StreamReader declaration in your using statement, I'd suggest that you take a look at BaseStream property to get to it.
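A self-contained sketch of the BaseStream route, with assumptions spelled out: this MyThingy with a single name member is a hypothetical stand-in for the real class, and the BaseStreamExample wrapper exists only for the demo. One caveat that matters here: the StreamReader must not have read anything yet, because it buffers ahead and the underlying stream's position would no longer be at the start of the JSON.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical shape for MyThingy; the real class is not shown in the question.
[DataContract]
public class MyThingy
{
    [DataMember(Name = "name")]
    public string Name { get; set; }
}

public static class BaseStreamExample
{
    public static MyThingy ReadThingy(Stream source)
    {
        using (var streamReader = new StreamReader(source))
        {
            var serializer = new DataContractJsonSerializer(typeof(MyThingy));
            // ReadObject wants a Stream, so hand it the reader's underlying stream.
            // Nothing has been read through streamReader, so the position is untouched.
            return serializer.ReadObject(streamReader.BaseStream) as MyThingy;
        }
    }
}
```

Uncommenting something like the ReadToEnd line from the question before this call would drain the stream first and leave the serializer with nothing to read.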
You can try this:
using (StreamReader streamReader = new StreamReader(...))
{
    DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(MyThingy));
    MyThingy thingy = (MyThingy) serializer.ReadObject(streamReader.BaseStream);
}
I've always used this:
// get stuff here
String json = GetJSON();
List<T> result;
using (var ms = new MemoryStream(Encoding.Unicode.GetBytes(json)))
{
    var serializer = new DataContractJsonSerializer(typeof(List<T>));
    result = (List<T>)serializer.ReadObject(ms);
}