Deserialize objects one by one from a file in .NET - C#

I'm trying to deserialize a list of heavy objects from a JSON file. I don't want to deserialize it the classic way, directly into a list, because that exposes me to an OutOfMemory exception. So I'm looking for a way to handle the objects one by one, storing each in the database as it is read, while staying memory safe.
I already handle the serialization and it works well, but I'm having difficulties with deserialization.
Any ideas?
Thanks in advance.
// Serialization
using (var fileStream = new FileStream(DirPath + "/TPV.Json", FileMode.Create))
{
    using (var sw = new StreamWriter(fileStream))
    {
        using (var jw = new JsonTextWriter(sw))
        {
            jw.WriteStartArray();
            var ser = new JsonSerializer(); // create the serializer once, not once per object
            using (var _Database = new InspectionBatimentsDataContext(TheBrain.DBClient.ConnectionString))
            {
                foreach (var TPVId in TPVIds)
                {
                    var pic = (from p in _Database.TPV
                               where p.Release == TPVId.Release && p.InterventionId == TPVId.InterventionId
                               select p).FirstOrDefault();
                    ser.Serialize(jw, pic);
                    jw.Flush();
                }
            }
            jw.WriteEndArray();
        }
    }
}

I finally found a way to do it by using a custom separator between each object during serialization. Then, for deserialization, I simply read the JSON file as a string until I find my custom separator, deserialize the string I've read so far, and repeat in a loop. It's not the perfect answer because I'm breaking the JSON format in my files, but that's not a constraint in my case.
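For reference, since the serialization code above writes one well-formed JSON array, the same file could also be read back one object at a time without a custom separator, by letting Json.NET's JsonTextReader walk the stream and deserializing each element as it is reached. A minimal sketch, assuming the entity type is TPV and that SaveToDatabase is a hypothetical per-object persistence step:
using (var fileStream = new FileStream(DirPath + "/TPV.Json", FileMode.Open))
using (var sr = new StreamReader(fileStream))
using (var reader = new JsonTextReader(sr))
{
    var ser = new JsonSerializer();
    while (reader.Read())
    {
        // Only one array element is materialized in memory per iteration
        if (reader.TokenType == JsonToken.StartObject)
        {
            var pic = ser.Deserialize<TPV>(reader); // TPV: assumed entity type
            SaveToDatabase(pic);                    // hypothetical: persist, then let the object go
        }
    }
}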

Related

Get the JSON Schemas from a large OpenAPI document, or use Newtonsoft and resolve refs

I'm currently looking at extracting all of the JSON Schemas from a large OpenAPI spec. I've been using the following NuGet packages:
Microsoft.OpenApi v1.3.1
Microsoft.OpenApi.Readers v1.3.1
I was hoping to use these to parse a large OpenAPI spec and extract all of the JSON Schemas, which I can parse into Microsoft.OpenApi.Models.OpenApiSchema objects. But I can't seem to create a JSON Schema from these objects and write it to file.
As it stands at the moment I have the following:
using (FileStream fs = File.Open(file.FullName, FileMode.Open))
{
    var openApiDocument = new OpenApiStreamReader().Read(fs, out var diagnostic);
    foreach (var schema in openApiDocument.Components.Schemas)
    {
        var schemaName = schema.Key;
        var schemaContent = schema.Value;
        var outputDir = Path.Combine(outputDirectory.FullName, fileNameWithoutExtension);
        if (!Directory.Exists(outputDir))
        {
            Directory.CreateDirectory(outputDir);
        }
        var outputPath = Path.Combine(outputDir, schemaName + "-Schema.json");
        var outputString = schemaContent.Serialize(OpenApiSpecVersion.OpenApi3_0, OpenApiFormat.Json);
        using (TextWriter sw = new StreamWriter(outputPath, true))
        {
            sw.Write(outputString);
        }
    }
}
The schemaContent appears to have all of the relevant properties for the schema, but I can't identify the next step in getting from that object to a JSON Schema file. I'm sure I'm missing something simple, so any insight would be appreciated.
UPDATED
I had a bit of a think and took a slightly different approach, using Newtonsoft.Json instead.
var OpenApitext = File.ReadAllText(file.FullName, Encoding.UTF8);
var settings = new JsonSerializerSettings
{
    PreserveReferencesHandling = PreserveReferencesHandling.Objects,
    MetadataPropertyHandling = MetadataPropertyHandling.Ignore, // ignore $ref/$id metadata properties
    Formatting = Newtonsoft.Json.Formatting.Indented
};
dynamic openApiJson = JsonConvert.DeserializeObject<ExpandoObject>(OpenApitext, settings);
if (openApiJson?.components?.schemas != null)
{
    foreach (var schema in openApiJson.components.schemas)
    {
        var schemaString = JsonConvert.SerializeObject(schema, settings);
        var outputDir = Path.Combine(outputDirectory.FullName, fileNameWithoutExtension);
        if (!Directory.Exists(outputDir))
        {
            Directory.CreateDirectory(outputDir);
        }
        var outputPath = Path.Combine(outputDir, schema.Name + "-Schema.json");
        using (TextWriter sw = new StreamWriter(outputPath, true))
        {
            sw.Write(schemaString);
        }
    }
}
Now this allows me to create the JSON Schemas and write them to file, but it doesn't resolve references. Looking at the API spec, all references appear to be local to the spec. What do I need to do to resolve all the references in the OpenAPI spec before I cycle through the schemas and write them to file? I've done a bit of research, and a few people seem to build this capability out themselves, but they always rely on a class object to support it, which I can't do here.
I reached out through the microsoft/OpenAPI.NET GitHub repo in the end. By a bit of coincidence/happenstance, I got a response from the same person both there and here. So thank you, Darrel: you've helped me solve the above scenario, which I was getting rather confused over. In the end, I simply hadn't implemented it quite correctly.
For reference, the use case below was to take in a sizeable OpenAPI spec (JSON) and extract the JSON Schemas it references, while ensuring that the JSON Pointers ($ref, $id, etc.) were resolved when written out to file.
The reason I took this approach is that, given the size of the OpenAPI specs I had to work with, it was incredibly difficult to use pre-built tooling (such as Postman) that can extract schemas.
Here is the final code snippet for my implementation; it's a little rough in a couple of places, which I'll neaten up over the weekend.
Console.WriteLine($"Processing file: {file.FullName}");
var fileNameWithoutExtension = Path.GetFileNameWithoutExtension(file.FullName);
var fileExtension = Path.GetExtension(file.FullName);
var reader = new OpenApiStreamReader();
var result = await reader.ReadAsync(new FileStream(file.FullName, FileMode.Open));
foreach (var schemaEntry in result.OpenApiDocument.Components.Schemas)
{
var schemaFileName = schemaEntry.Key + ".json";
Console.WriteLine("Creating " + schemaFileName);
var outputDir = Path.Combine(outputDirectory.FullName, fileNameWithoutExtension);
if (!Directory.Exists(outputDir))
{
Directory.CreateDirectory(outputDir);
}
var outputPath = Path.Combine(outputDir, schemaFileName + "-Schema.json");
using FileStream? fileStream = new FileStream(outputPath, FileMode.CreateNew);
var writerSettings = new OpenApiWriterSettings() { InlineLocalReferences = true, InlineExternalReferences = true };
using var writer = new StreamWriter(fileStream);
schemaEntry.Value.SerializeAsV2WithoutReference(new OpenApiJsonWriter(writer, writerSettings));
}

Exception when writing data with CsvHelper

I'm trying to write data to a CSV-file using CsvHelper. However, I always get the following exception:
CsvHelper.Configuration.ConfigurationException: "Types that inherit
IEnumerable cannot be auto mapped. Did you accidentally call GetRecord
or WriteRecord which acts on a single record instead of calling
GetRecords or WriteRecords which acts on a list of records?"
This is my code (C#):
TextWriter outfile = new StreamWriter("blatest.csv");
List<string> test = new List<string>
{
    "hello",
    "world"
};
CsvWriter csv = new CsvWriter(outfile);
csv.WriteRecords(test);
I would like to write a List<string> or (ideally) a List<Dictionary<string, string>> to CSV. What would be the correct code for this? And how can I set the header row?
Any help is appreciated. I really can't wrap my head around this.
As for the error: it occurs because string implements IEnumerable (a string is an IEnumerable<char>). Generally, with WriteRecords you pass in an IEnumerable of custom objects.
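For illustration, a minimal sketch with a hypothetical Person class (not from the question): WriteRecords maps each public property to a column and generates the header row from the property names.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

var people = new List<Person>
{
    new Person { Name = "Alice", Age = 30 },
    new Person { Name = "Bob", Age = 25 }
};

using (var writer = new StreamWriter("people.csv"))
using (var csv = new CsvHelper.CsvWriter(writer))
{
    csv.WriteRecords(people); // header: Name,Age; then one row per Person
}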
You could try another way (example):
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var csvWriter = new CsvHelper.CsvWriter(writer))
{
    //csvWriter.Configuration.HasHeaderRecord = false;
    foreach (var s in test)
    {
        csvWriter.WriteField(s);
    }
    writer.Flush();
    stream.Position = 0;
    reader.ReadToEnd(); // dump it where you want
}
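For the List<Dictionary<string, string>> case from the question, CsvHelper has no automatic mapping, but you can emit the header from the keys and one record per dictionary using WriteField and NextRecord. A minimal sketch, assuming every dictionary has the same keys:
var rows = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["Name"] = "Alice", ["City"] = "Oslo" },
    new Dictionary<string, string> { ["Name"] = "Bob", ["City"] = "Bern" }
};

using (var writer = new StreamWriter("dictionaries.csv"))
using (var csv = new CsvHelper.CsvWriter(writer))
{
    // Header row from the first dictionary's keys
    foreach (var key in rows[0].Keys)
    {
        csv.WriteField(key);
    }
    csv.NextRecord();

    // One record per dictionary, fields in key order
    foreach (var row in rows)
    {
        foreach (var key in rows[0].Keys)
        {
            csv.WriteField(row[key]);
        }
        csv.NextRecord();
    }
}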

Deserializing large files using Json.NET

I am trying to process a very large amount of data (~1000 separate files, each ~30 MB) for use as input to the training phase of a machine learning algorithm. The raw data files are formatted as JSON, and I deserialize them using the JsonSerializer class of Json.NET. Towards the end of the program, Newtonsoft.Json.dll throws an OutOfMemoryException. Is there a way to reduce the data in memory, or do I have to change my whole approach (such as switching to a big-data framework like Spark) to handle this problem?
public static List<T> DeserializeJsonFiles<T>(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        return null;
    var jsonObjects = new List<T>();
    //var sw = new Stopwatch();
    try
    {
        //sw.Start();
        foreach (var filename in Directory.GetFiles(path))
        {
            using (var streamReader = new StreamReader(filename))
            using (var jsonReader = new JsonTextReader(streamReader))
            {
                jsonReader.SupportMultipleContent = true;
                var serializer = new JsonSerializer();
                while (jsonReader.Read())
                {
                    if (jsonReader.TokenType != JsonToken.StartObject)
                        continue;
                    var jsonObject = serializer.Deserialize<dynamic>(jsonReader);
                    var reducedObject = ApplyFiltering(jsonObject); // returns null if the filtering conditions are not met
                    if (reducedObject == null)
                        continue;
                    jsonObject = reducedObject;
                    jsonObjects.Add(jsonObject);
                }
            }
        }
        //sw.Stop();
        //Console.WriteLine($"Elapsed time: {sw.Elapsed}, Elapsed mili: {sw.ElapsedMilliseconds}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error: {ex}");
        return null;
    }
    return jsonObjects;
}
Thanks.
It's not really a problem with Newtonsoft. You are reading all of these objects into one big list in memory. It gets to a point where you ask the JsonSerializer to create another object and it fails.
You need to return IEnumerable<T> from your method, yield return each object, and deal with them in the calling code without storing them in memory. That means iterating the IEnumerable<T>, processing each item, and writing to disk or wherever they need to end up.
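A minimal sketch of that change, reusing the reading loop from the question (ApplyFiltering is the question's own helper). Note that yield return can't appear inside a try block that has a catch clause, so the error handling has to move out to the caller:
public static IEnumerable<T> DeserializeJsonFiles<T>(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        yield break; // empty sequence instead of null

    foreach (var filename in Directory.GetFiles(path))
    {
        using (var streamReader = new StreamReader(filename))
        using (var jsonReader = new JsonTextReader(streamReader))
        {
            jsonReader.SupportMultipleContent = true;
            var serializer = new JsonSerializer();
            while (jsonReader.Read())
            {
                if (jsonReader.TokenType != JsonToken.StartObject)
                    continue;
                var jsonObject = serializer.Deserialize<dynamic>(jsonReader);
                var reducedObject = ApplyFiltering(jsonObject);
                if (reducedObject != null)
                    yield return reducedObject; // hand one object at a time to the caller
            }
        }
    }
}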

Easiest way to write/read class

I'm new to C# and Windows Phone development, and I need to build an app for a school project.
I have a simple class, and I want to save it so I can access it later, on the next start of the application.
What is the best (and easiest) way to do that? A file, a database, or some other application storage?
Here is my class:
public class Place
{
    private string name;
    private string description;
    private int distance;
    private bool enabled;
    private GeoCoordinate coordinates;
}
I need to store multiple instances of the class.
There is no "best" way to do it; it depends on how you're going to use it.
A simple way is to use serialization, to XML, JSON, binary or whatever you want. I personally like JSON, as it's very lightweight and easy to read for a human. You can use the JSON.NET library to serialize objects to JSON.
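One caveat worth checking against your Json.NET version: by default, Json.NET serializes only public members, and the Place class above exposes only private fields, so a serialized Place would come out empty. A sketch of the class reshaped with public auto-properties, keeping the same members:
public class Place
{
    public string Name { get; set; }
    public string Description { get; set; }
    public int Distance { get; set; }
    public bool Enabled { get; set; }
    public GeoCoordinate Coordinates { get; set; }
}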
For instance, if you want to serialize a collection of Place to a file, you can do something like this:
static async Task SavePlacesAsync(ICollection<Place> places)
{
    var serializer = new JsonSerializer();
    var folder = ApplicationData.Current.LocalFolder;
    var file = await folder.CreateFileAsync("places.json", CreationCollisionOption.ReplaceExisting);
    using (var stream = await file.OpenStreamForWriteAsync())
    using (var writer = new StreamWriter(stream))
    {
        serializer.Serialize(writer, places);
    }
}
And to read it back from the file:
static async Task<ICollection<Place>> LoadPlacesAsync()
{
    try
    {
        var serializer = new JsonSerializer();
        var folder = ApplicationData.Current.LocalFolder;
        var file = await folder.GetFileAsync("places.json");
        using (var stream = await file.OpenStreamForReadAsync())
        using (var reader = new StreamReader(stream))
        using (var jReader = new JsonTextReader(reader))
        {
            return serializer.Deserialize<ICollection<Place>>(jReader);
        }
    }
    catch (FileNotFoundException)
    {
        return new List<Place>();
    }
}
I think the easiest way to persist your objects is to store them in a file. However, this is not the best approach, due to the time spent in IO operations, low security, etc.
Here you have a nice example:
How to quickly save/load class instance to file

XML serialisation best practices

I have been using the traditional way of serializing content, with the following code:
private void SaveToXml(IdentifiableEntity IE)
{
    try
    {
        XmlSerializer serializer = new XmlSerializer(IE.GetType());
        TextWriter textWriter = new StreamWriter(IE.FilePath);
        serializer.Serialize(textWriter, IE);
        textWriter.Close();
    }
    catch (Exception e)
    {
        Console.WriteLine("error: " + e);
    }
}

private T LoadFromXml<T>(string path)
{
    XmlSerializer deserializer = new XmlSerializer(typeof(T));
    TextReader textReader = new StreamReader(path);
    T entity = (T)deserializer.Deserialize(textReader);
    textReader.Close();
    return entity;
}
Though this approach does the trick, I find it a bit annoying that all my properties have to be public, that I sometimes need to tag them with [XmlAttribute], [XmlElement], or [XmlIgnore], and that it doesn't deal with dictionaries.
My question is: is there a better way of serializing objects in C#, one with less hassle, more modern, and easier to use?
First of all, I would suggest using "using" blocks in your code (sample code).
If my understanding is correct, you are looking for a fast way to build the model classes that you will use in your serialize/deserialize operations.
Every XML file is different, and I don't know of any generic way to serialize/deserialize them: at some point you have to know whether there will be an attribute, or elements, or whether an element can be null, etc.
Assuming you already have a sample XML file of a few lines that gives you a general view of how it will look, I would suggest using xsd (a miracle of a tool):
xsd yourXMLFileName.xml
xsd yourXMLFileName.xsd /classes
This tool will generate model classes for whatever XML file you want to work with. Then you can serialize and deserialize easily.
To deserialize (assuming you'll get a class named XXXX representing the root node in your XML):
XmlSerializer ser = new XmlSerializer(typeof(XXXX));
XXXX yourVariable;
using (XmlReader reader = XmlReader.Create(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
{
    yourVariable = (XXXX)ser.Deserialize(reader);
}
To serialize:
var serializer = new XmlSerializer(typeof(XXXX));
using (var writer = new StreamWriter(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
{
    serializer.Serialize(writer, yourVariable);
}
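As an aside on the dictionary complaint in the question: Json.NET, which several answers on this page already use, handles dictionaries out of the box. A minimal sketch:
var lookup = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };

// Serialize to a JSON object and back; no attributes or public-field contortions required
string json = JsonConvert.SerializeObject(lookup, Newtonsoft.Json.Formatting.Indented);
var roundTripped = JsonConvert.DeserializeObject<Dictionary<string, int>>(json);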
