Deserializing large JSON Objects from Web Service (Out of Memory)

I have a program that deserializes large objects from a web service. After a web service call returns a 200, the code looks like this:
JsonConvert.DeserializeObject<List<T>>(resp.Content.ReadAsStringAsync().Result).ToList()
Sometimes while running this process I get an AggregateException whose inner exception is an OutOfMemoryException. I can't determine whether reading the JSON data into a string (which is probably very large) or the deserializing is causing this issue. What I would like to do is split up the response, pull each JSON object back individually, and then deserialize it. I am just having trouble finding a way to bring out only one JSON object at a time from the response. Any suggestions are greatly appreciated!

using System.IO;
using System.Net.Http;
using Newtonsoft.Json;

HttpClient client = new HttpClient();
using (Stream s = client.GetStreamAsync("http://www.test.com/large.json").Result)
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    JsonSerializer serializer = new JsonSerializer();
    // Read the JSON from the stream. The HTTP body is consumed a small piece at a
    // time as the serializer needs it, so it is never buffered into one big string
    // (the resulting Person object itself is still fully held in memory).
    Person p = serializer.Deserialize<Person>(reader);
}

https://learn.microsoft.com/en-us/xamarin/xamarin-forms/data-cloud/web-services/rest contains a warning:
Using the ReadAsStringAsync method to retrieve a large response can have a negative performance impact. In such circumstances the response should be directly deserialized to avoid having to fully buffer it.
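To go one step further and pull the list elements back one at a time, as the question asks, one option on .NET 6 or later (an assumption; the question does not state a runtime) is System.Text.Json's JsonSerializer.DeserializeAsyncEnumerable<T>, which yields the elements of a top-level JSON array as they are read from the stream. This swaps System.Text.Json in for Json.NET. A minimal sketch; the Item type and the StreamingReader, ProcessItemsAsync and ProcessUrlAsync names are hypothetical:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical element type; replace it with the T from the question.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class StreamingReader
{
    // Web services often send camelCase names, so match property names case-insensitively.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Reads a top-level JSON array and hands each element to the callback as soon
    // as it is deserialized; elements are not collected into a list here.
    public static async Task<int> ProcessItemsAsync(Stream utf8Json, Action<Item> handle)
    {
        int count = 0;
        await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<Item>(utf8Json, Options))
        {
            if (item is null) continue; // a literal null element in the array
            handle(item);
            count++;
        }
        return count;
    }

    // ResponseHeadersRead completes once the headers arrive, so the body is
    // read from the network as the elements are consumed, not buffered first.
    public static async Task<int> ProcessUrlAsync(HttpClient client, string url, Action<Item> handle)
    {
        using HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        await using Stream body = await response.Content.ReadAsStreamAsync();
        return await ProcessItemsAsync(body, handle);
    }
}
```

Peak memory then depends on the size of one element rather than the whole response, as long as the callback does not keep every item.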

Related

How to send a continuous stream of data with asp .net core web api?

I want to send a continuous stream of data to the client. Let's say I have a Web API service route [Route("/bla/continuous/data")]. The client expects a continuous stream of data over HTTP from that API; is there a way to do that? For example, this is the code I am trying in this route [Route("/bla/continuous/data")]:
System.Xml.Serialization.XmlSerializer serializeObj =
    new System.Xml.Serialization.XmlSerializer(typeof(CachedResponse));
MemoryStream memoryStream = new MemoryStream();
serializeObj.Serialize(memoryStream, events);
var xmlDoc = Encoding.ASCII.GetString(memoryStream.ToArray());
XDocument xDoc = XDocument.Parse(xmlDoc);
xDoc.StripNamespace();
var xmlReader = xDoc.CreateReader();
xmlReader.MoveToContent();
var response = Regex.Replace(xmlReader.ReadInnerXml(), @"\s+", "");
At the end, the response is a string, but I guess that isn't a continuous flow. I want to be able to send that response continuously, if that makes sense; I'm not sure I'm even explaining it the right way!

You need to use a pub/sub pattern; you can use any message broker for that.
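If a broker is not an option, a simpler pattern that may fit is writing each message to the HTTP response body as it becomes available and flushing after each one, so the client receives data before the request completes. A minimal sketch, assuming ASP.NET Core 3.0 or later and newline-delimited text messages; ContinuousWriter and WriteLinesAsync are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuousWriter
{
    // Writes each message as one line and flushes after every message, so the
    // bytes go out right away instead of when the response completes.
    // In an ASP.NET Core action, `body` would be HttpContext.Response.Body.
    public static async Task WriteLinesAsync(
        Stream body, IAsyncEnumerable<string> messages, CancellationToken token = default)
    {
        // leaveOpen: the framework owns the response stream, so do not close it.
        // await using disposes asynchronously; ASP.NET Core disallows
        // synchronous I/O on the response body by default.
        await using var writer = new StreamWriter(body, new UTF8Encoding(false), 1024, leaveOpen: true);
        await foreach (string message in messages.WithCancellation(token))
        {
            await writer.WriteLineAsync(message);
            await writer.FlushAsync(); // push this message to the client now
        }
    }
}
```

In a controller action you would set Response.ContentType and pass Response.Body together with your own IAsyncEnumerable<string> source (for example, a hypothetical feed of serialized events). The client then has to read the response incrementally rather than waiting for it to finish.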

Parsing HTTP response as it arrives

My Web API, AAA, calls another API, BBB, to retrieve a large JSON array (~500-1000 KB, with each object about 10 KB). It needs to parse the JSON array to apply some logic to it and forward the response to API CCC.
For optimization, I'd like my Web API AAA not to have to store the HTTP response containing the large JSON array, so that the array doesn't have to be stored on the LOH (Large Object Heap).
My idea for solving this: instead of waiting for the full JSON array to be downloaded, is it possible to parse the elements of the response as they arrive, apply the logic to each one, and forward the content to my API CCC?
That way my Web API never has to store the large JSON array in memory. Parsed one at a time, each object is small enough to be allocated in Gen 0 and collected quickly by the GC.
What I tried so far:
My API BBB looks like this (simplified):
[HttpGet("{id}")]
public IActionResult Get(int id)
{
    var text = System.IO.File.ReadAllText("C:\\Users\\John\\generated1000objects.json");
    var deserialized = JsonConvert.DeserializeObject<object[]>(text);
    return Ok(deserialized);
}
My code to query it:
var httpClient = new HttpClient();
using (var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:44328/api/values/4"))
using (var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
using (Stream stream = await response.Content.ReadAsStreamAsync())
using (StreamReader sr = new StreamReader(stream))
using (JsonReader reader = new JsonTextReader(sr))
{
    reader.SupportMultipleContent = true;
    while (true)
    {
        if (!reader.Read())
        {
            break;
        }
        JsonSerializer serializer = new JsonSerializer();
        var deserialize = serializer.Deserialize<object>(reader);
        Console.WriteLine(deserialize); // HERE it prints the whole JSON Array. I was expecting to deal with one object of the array
        Console.WriteLine("#################");
    }
}
My constraints:
I can't modify the API BBB that send the large JSON array.
My API CCC cannot directly call API BBB to retrieve the large JSON array
I'm on .NET Core with ASP.NET Core 2.2.

Looking at your solution, unless you expect this to grow substantially, I believe you may be attempting a micro-optimization that will make your process more fragile than simply processing the response the regular way.
You mention a record size of 10 KB and a response size of 500-1000 KB. This translates to a total of 50-100 records.
I believe you will have more difficulty parsing the response in chunks than you will suffer from having an object on the Large Object Heap. From what I can find in the various documentation, the ONLY way to parse a JSON document using a built-in library is to parse the whole document. Any chunking would need to be managed by yourself.
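For reference, with Json.NET that chunking can be managed on top of JsonTextReader. The loop in the question prints the whole array because the first token it reads is StartArray, and Deserialize called at that token consumes everything up to the matching EndArray. A sketch of one fix, assuming the root of the response is an array of objects: keep calling Read() and only deserialize when the reader is positioned on a StartObject token. ArrayStreamer, ForEachElement and the callback are hypothetical; the callback stands in for "apply the logic and forward to CCC".

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class ArrayStreamer
{
    // Walks a JSON array token by token and deserializes one element at a time.
    // Returns the number of elements handled.
    public static int ForEachElement(TextReader text, Action<JObject> handle)
    {
        var serializer = new JsonSerializer();
        int count = 0;
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // Deserialize consumes the whole object, nested objects included,
                // and leaves the reader on its EndObject token. With an array of
                // objects at the root, every StartObject seen here is one element.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    JObject element = serializer.Deserialize<JObject>(reader);
                    handle(element);
                    count++;
                }
            }
        }
        return count;
    }
}
```

In the question's code, passing the StreamReader sr to ForEachElement would replace the while loop; swapping JObject for a concrete type via Deserialize<T> works the same way.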

Proper way to read and write web api response stream

I'm having some trouble finding the right incantation that will let me write to a response stream and then later read the contents in a test. Currently I have this:
var res = new HttpResponseMessage(System.Net.HttpStatusCode.OK);
var ms = new MemoryStream();
res.Content = new StreamContent(ms);
using (var sw = new StreamWriter(ms, System.Text.Encoding.UTF8))
using (var csv = new CsvHelper.CsvWriter(sw))
    csv.WriteRecords(allData.ToList());
return res;
In my test I'm trying to read this response:
var controller = appContainer().Resolve<MyController>();
var res = (await controller.Get()) as HttpResponseMessage;
res.ShouldNotEqual(null);
var csv = await res.Content.ReadAsStringAsync();
The last line generates an error:
Error while copying content to a stream.
----> System.ObjectDisposedException : Cannot access a closed Stream.
So there are a couple of things here:
Why is this error happening and how can I prevent it properly in the test?
The use of MemoryStream doesn't sit right with me, shouldn't I be able to write directly to the content's stream? Isn't MemoryStream potentially hugely increasing my memory usage?

Just putting this out there, though it's not perfect. Using PushStreamContent does a lot of the job, but it comes with its own headaches: namely, any exceptions your anonymous method throws will get swallowed and be difficult to track down without a full repro of the problem. When the bomb goes off, you are well past the point in the pipeline where Web API's unhandled-exception handlers would come into effect, and XMLHttpRequest doesn't seem to recognize that the connection was closed.
For example, something like
HttpResponseMessage response = new HttpResponseMessage(HttpStatusCode.OK);
response.Content = new PushStreamContent((stream, content, context) =>
{
    // write your output here
});
return response;
will get you what you want, provided that internal method never slips up or goes wrong in any way.
PushStreamContent flushes your HTTP headers before the anonymous method is called, so the response is already committed as chunked and there is no way to reel it back in later.
You can add a try/catch in your anonymous method to leave yourself a note if something goes wrong, but in my experience XMLHttpRequest doesn't recognize when the remote server forcibly closes the connection, so it keeps on waiting. I only started to figure out what was going on when I put Fiddler in between, and Fiddler squawked.
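On the first part of the question: the ObjectDisposedException comes from the StreamWriter. Disposing it (directly, or through the CsvWriter that wraps it) also disposes the MemoryStream that StreamContent wraps, so the later read hits a closed stream. After writing, the stream's position is also at the end, and StreamContent starts reading from the current position. A minimal sketch of one fix, with the CsvHelper call replaced by plain WriteLine calls to keep it self-contained (CsvResponse and Create are hypothetical names): keep the MemoryStream open, rewind it, and only then wrap it in StreamContent.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;

public static class CsvResponse
{
    // Builds an OK response whose body is the given lines.
    public static HttpResponseMessage Create(IEnumerable<string> lines)
    {
        var ms = new MemoryStream();
        // leaveOpen: true stops the writer from closing the MemoryStream when it is
        // disposed; otherwise StreamContent later reads from a closed stream.
        using (var sw = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            foreach (string line in lines)
                sw.WriteLine(line);
        } // disposing the writer flushes its buffer into ms
        ms.Position = 0; // rewind so the content is read from the beginning
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(ms)
        };
    }
}
```

This still buffers the whole body in memory, which is the trade-off the second part of the question raises; PushStreamContent avoids that at the cost of the error-handling issues described above.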

Deserializing a local xml file using Rest Sharp

I have no problem deserializing an xml into my class while using the following code. I was wondering if it was possible to use the same code on a local file, as our source files are saved locally for archival purposes and are occasionally reprocessed.
This works for remote xml but not for local xml:
RestRequest request = new RestRequest();
var client = new RestClient();
// doesn't work
client.BaseUrl = directory;
request.Resource = file;
// works
client.BaseUrl = baseURL;
request.Resource = url2;
IRestResponse<T> response = client.Execute<T>(request);
return response.Data;
Is there a way to use RestSharp from a local file? I was going to try to use the same function regardless of whether the xml is local or remote and just pass it the location of the xml to read.

This is in fact possible using the built-in JsonDeserializer class, as below. I have used this method to stub API responses for testing.
// Read the file
string fileContents = string.Empty;
using (System.IO.StreamReader reader = new System.IO.StreamReader(@"C:\Path_to_File.txt"))
{
    fileContents = reader.ReadToEnd();
}
// Deserialize
RestResponse<T> restResponse = new RestResponse<T>();
restResponse.Content = fileContents;
RestSharp.Deserializers.JsonDeserializer deserializer = new RestSharp.Deserializers.JsonDeserializer();
T deserializedObject = deserializer.Deserialize<T>(restResponse);

This is not possible with standard functionality. For example, "file://" URLs do not work with RestSharp.
I would recommend using RestSharp to get the returned data from a Uri and having another function to deserialize this data into an object.
You can then use the same function to deserialize from file data.
RestSharp is a library to do REST calls, not to deserialize from arbitrary sources. Even if there is a way to make RestSharp believe it is talking to a website instead of a file, it would be a hack.
If you need it, you could still use the XmlDeserializer from RestSharp. It expects an IRestResponse object, but it only uses the Content property from it, so it should be easy to create. It still feels like a hack, though, and there are more than enough other XML serializers out there that will do a great job.
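A sketch of that separation: one function deserializes an XML string, whether it came from a RestSharp response's Content or from an archived local file. It swaps in the BCL's System.Xml.Serialization.XmlSerializer rather than RestSharp's own deserializer; XmlLoader and the Order type are hypothetical.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class XmlLoader
{
    // Deserializes XML text regardless of where it came from.
    public static T FromXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    // Local archived file: read the text and reuse the same deserializer.
    public static T FromFile<T>(string path) => FromXml<T>(File.ReadAllText(path));
}

// Hypothetical example type; XmlSerializer needs a public parameterless constructor.
public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
}
```

For a remote call, pass client.Execute(request).Content to XmlLoader.FromXml<T>; for reprocessing, call XmlLoader.FromFile<T>(path). By default XmlSerializer matches element names to property names exactly, so attributes such as [XmlElement] may be needed if the XML uses different names.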

MemoryStream data corruption issue

I have created a simple REST-based WCF service which runs on BasicHttpBinding. In one of my web methods, I return a Stream which points to a JSON response.
The method looks like this:
[OperationContract]
[FaultContract(typeof(ApplicationFault))]
[WebInvoke(Method = "POST", UriTemplate = "GetActiveCalls/{nurseid}")]
Stream GetActiveCalls(string nurseid);
In the body of GetActiveCalls, I create a MemoryStream and return it as the response. The code looks like this:
// Serialize the results as JSON
string jsonResult = new JavaScriptSerializer().Serialize(baseResponses);
// ContentType json
WebOperationContext.Current.OutgoingResponse.ContentType = "application/json";
WebOperationContext.Current.OutgoingResponse.Headers.Add("Cache-Control", "no-cache");
var bytes = Encoding.UTF8.GetBytes(jsonResult);
//Parse to memorystream
var ms = new MemoryStream(bytes);
ms.Seek(0, SeekOrigin.Begin);
ms.SetLength(bytes.LongLength);
return ms;
When I call this from the client, I get a result like:
{"LastEvents":[{"FormatValues":"Klic 2 3 4","Icon":null,"Color":"Red","Acknowledged":false,"EventID":28566}],"Message":"","Status":true}
But sometimes, after invoking the same method multiple times, I start getting this response:
{"LastEvents":[{"FormatValues":"Klic 2 3 4","Icon":null,"Color":"Red","Acknowledged":false,"EventID":28566}],"Message":"","Statu{"LastEv
You can see that after "Statu in the JSON response, the stream gets reset and starts returning data from the beginning.
It looks strange to me.
On the server side, when I set a breakpoint, the MemoryStream appears to contain the correct response.

Putting aside the question of whether to use a memory stream, I encountered a similar issue just recently, where the memory stream response appeared corrupted, seemingly at random. The solution was to remove the tracing sections from web.config, which I had turned on in dev mode. This may or may not be your issue, but it might be worth a look. It seems this problem is still present in .NET 4.5 as well.
