Serialise nested list or alternatives - c#

I am trying to store a collection of lists (each containing over 20,000 ints) and was hoping to use a nested list for this, since each day a new list will be added.
Eventually I need to access the data in the following way:
"Take the first value of each list and compile a new list".
Ideally I'd like to serialise a List<List<int>>; however, this does not seem to work (I can serialise a List<int>). Is there a trick to doing this (preferably without any add-ons)?
If not, how would you advise me to store such data efficiently and quickly?
The way I try it now:
static void saveFunction(List<int> data, string name)
{
    using (Stream stream = File.Open(name + ".bin", FileMode.OpenOrCreate))
    {
        BinaryFormatter bin = new BinaryFormatter();
        if (stream.Length == 0)
        {
            List<List<int>> List = new List<List<int>>();
            List.Add(data);
            bin.Serialize(stream, List);
        }
        else
        {
            List<List<int>> List = (List<List<int>>)bin.Deserialize(stream);
            List.Add(data);
            bin.Serialize(stream, List);
        }
    }
}
Strangely, list.Count remains 1, and the number of ints in the list stays the same as well, while the file size keeps increasing.

You need to rewind the stream and clear the previous data between reading and writing:
static void saveFunction(List<int> data, string name)
{
    using (Stream stream = File.Open(name + ".bin", FileMode.OpenOrCreate))
    {
        BinaryFormatter bin = new BinaryFormatter();
        if (stream.Length == 0)
        {
            var List = new List<List<int>>();
            List.Add(data);
            bin.Serialize(stream, List);
        }
        else
        {
            var List = (List<List<int>>)bin.Deserialize(stream);
            List.Add(data);
            stream.SetLength(0); // Clear the old data from the file
            bin.Serialize(stream, List);
        }
    }
}
What you are doing now is appending the new list to the end of the file while leaving the old list as-is -- which BinaryFormatter will happily read as the (first) object in the file when it is re-opened.
As for your second question, "how would you advise me to store such data efficiently and quickly?": since your plan is to "take the first value of each list and compile a new list", it appears you will need to re-read the preceding lists when writing a new one. If that were not true, and each new list were independent of the preceding lists, BinaryFormatter does support writing multiple root objects to the same file. See here for details: Serializing lots of different objects into a single file
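For illustration, here is a minimal sketch of that multi-root approach, assuming each day's list really is independent (AppendDay and ReadDays are hypothetical helper names, and note that BinaryFormatter is considered obsolete and unsafe in modern .NET):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class MultiRootStorage
{
    // Append one day's list as an independent root object at the end of the file.
    public static void AppendDay(List<int> data, string path)
    {
        using (Stream stream = File.Open(path, FileMode.Append))
        {
            new BinaryFormatter().Serialize(stream, data);
        }
    }

    // Read the root objects back one at a time until the end of the file.
    public static IEnumerable<List<int>> ReadDays(string path)
    {
        using (Stream stream = File.Open(path, FileMode.Open))
        {
            var bin = new BinaryFormatter();
            while (stream.Position < stream.Length)
                yield return (List<int>)bin.Deserialize(stream);
        }
    }
}

Each call to AppendDay only writes the new list, so there is no need to deserialize and rewrite the whole file on every save.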

Related

Export to Excel with server side webapi and list

I am getting the response from an API as a generic list, IEnumerable<myClass> objClass.
Here I am trying to export the list to a CSV file using StreamWriter:
var serviceResponse = await services.GetProfileRepositoryAsync(requestDto, token);
if (requestDto.IsExportable)
{
    MemoryStream stream = new MemoryStream();
    StreamWriter writer = new StreamWriter(stream);
    writer.Write(serviceResponse.dto.NewSoftwareFileDto); // this is the problem line
    writer.Flush();
    stream.Position = 0;
    return File(stream, "text/csv", "filesname.csv");
}
Since serviceResponse.dto.NewSoftwareFileDto returns a list, writer.Write is not writing the content. I once used an ObjectResult with the writer.Write() method and it was working, but now I am not able to recollect how.
I want to avoid looping through the list and writing the data.
You can't send a DTO list directly to a MemoryStream using StreamWriter.Write(): none of its overloads knows how to format a collection, so passing a list just writes the result of its ToString() (the type name) rather than the contents. Since you want to avoid explicit loops (for or foreach), you can use LINQ to create a List<string> from the existing list of DTOs as comma-separated values and then write its contents into the stream:
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);

// Create a list of comma-separated strings from the DTO list
List<string> items = serviceResponse.dto.NewSoftwareFileDto.Select(x =>
    string.Join(",", x.Property1, x.Property2, ...)).ToList();

// Insert a newline between the lines (i.e. list items)
string combined = string.Join(Environment.NewLine, items);

// Write the combined string to the StreamWriter
writer.Write(combined);
writer.Flush();
stream.Position = 0;
return File(stream, "text/csv", "filesname.csv");
Note that Property1, Property2, etc. represent the properties of the DTO objects, in the order you want them as columns; the property names used in Select must exist on the items inside NewSoftwareFileDto.
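If the export also needs a header row, one possible extension (a sketch; Property1 and Property2 are the same hypothetical names as above) is to prepend a joined line of column captions before combining:

// Hypothetical column captions matching the properties selected above
List<string> items = serviceResponse.dto.NewSoftwareFileDto.Select(x =>
    string.Join(",", x.Property1, x.Property2)).ToList();
items.Insert(0, string.Join(",", "Property1", "Property2")); // header row

Be aware that this naive approach does not escape values: if a property value can contain a comma, quote, or newline, it needs to be wrapped in double quotes per the usual CSV conventions (RFC 4180).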

How to parse huge JSON file as stream in Json.NET?

I have a very, very large JSON file (1000+ MB) consisting of JSON objects that all share the same structure. For example:
[
  {
    "id": 1,
    "value": "hello",
    "another_value": "world",
    "value_obj": {
      "name": "obj1"
    },
    "value_list": [
      1,
      2,
      3
    ]
  },
  {
    "id": 2,
    "value": "foo",
    "another_value": "bar",
    "value_obj": {
      "name": "obj2"
    },
    "value_list": [
      4,
      5,
      6
    ]
  },
  {
    "id": 3,
    "value": "a",
    "another_value": "b",
    "value_obj": {
      "name": "obj3"
    },
    "value_list": [
      7,
      8,
      9
    ]
  },
  ...
]
Every single item in the root JSON list follows the same structure and thus would be individually deserializable. I already have the C# classes written to receive this data, and deserializing a JSON file containing a single object without the list works as expected.
At first, I tried to just directly deserialize my objects in a loop:
JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<MyObject>(reader);
    }
}
This didn't work; it threw an exception clearly stating that an object was expected, not a list. My understanding is that this call reads a single object at the root level of the JSON file, but since we have a list of objects, this is an invalid request.
My next idea was to deserialize as a C# List of objects:
JsonSerializer serializer = new JsonSerializer();
List<MyObject> o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<List<MyObject>>(reader);
    }
}
This does succeed. However, it only somewhat reduces the issue of high RAM usage. In this case it does look like the application is deserializing items one at a time, and so is not reading the entire JSON file into RAM, but we still end up with a lot of RAM usage because the C# List object now contains all of the data from the JSON file in RAM. This has only displaced the problem.
I then decided to simply try taking a single character off the beginning of the stream (to eliminate the [) by doing sr.Read() before going into the loop. The first object then does read successfully, but subsequent ones do not, with an exception of "unexpected token". My guess is this is the comma and space between the objects throwing the reader off.
Simply removing square brackets won't work since the objects do contain a primitive list of their own, as you can see in the sample. Even trying to use }, as a separator won't work since, as you can see, there are sub-objects within the objects.
What my goal is, is to be able to read the objects from the stream one at a time. Read an object, do something with it, then discard it from RAM, and read the next object, and so on. This would eliminate the need to load either the entire JSON string or the entire contents of the data into RAM as C# objects.
What am I missing?
This should resolve your problem. Basically it works just like your initial code, except it only deserializes an object when the reader hits a { character in the stream; otherwise it just skips ahead until it finds the next start-object token.
JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // deserialize only when there's a "{" (start of object) token in the stream
        if (reader.TokenType == JsonToken.StartObject)
        {
            o = serializer.Deserialize<MyObject>(reader);
        }
    }
}
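For reference, a MyObject shape matching the sample JSON above might look like this (a sketch inferred from the sample, not the asker's actual classes):

using System.Collections.Generic;
using Newtonsoft.Json;

public class ValueObj
{
    [JsonProperty("name")]
    public string Name { get; set; }
}

public class MyObject
{
    [JsonProperty("id")]
    public int Id { get; set; }

    [JsonProperty("value")]
    public string Value { get; set; }

    [JsonProperty("another_value")]
    public string AnotherValue { get; set; }

    [JsonProperty("value_obj")]
    public ValueObj ValueObj { get; set; }

    [JsonProperty("value_list")]
    public List<int> ValueList { get; set; }
}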
I think we can do better than the accepted answer, using more features of JsonReader to make a more generalized solution.
As a JsonReader consumes tokens from a JSON, the path is recorded in the JsonReader.Path property.
We can use this to precisely select deeply nested data from a JSON file, using regex to ensure that we're on the right path.
So, using the following extension method:
public static class JsonReaderExtensions
{
    public static IEnumerable<T> SelectTokensWithRegex<T>(
        this JsonReader jsonReader, Regex regex)
    {
        JsonSerializer serializer = new JsonSerializer();
        while (jsonReader.Read())
        {
            if (regex.IsMatch(jsonReader.Path)
                && jsonReader.TokenType != JsonToken.PropertyName)
            {
                yield return serializer.Deserialize<T>(jsonReader);
            }
        }
    }
}
The data you are concerned with lies on paths:
[0]
[1]
[2]
... etc
We can construct the following regex to precisely match this path:
var regex = new Regex(@"^\[\d+\]$");
It now becomes possible to stream objects out of your data (without fully loading or parsing the entire JSON) as follows:
IEnumerable<MyObject> objects = jsonReader.SelectTokensWithRegex<MyObject>(regex);
Or if we want to dig even deeper into the structure, we can be even more precise with our regex
var regex = new Regex(@"^\[\d+\]\.value$");
IEnumerable<string> objects = jsonReader.SelectTokensWithRegex<string>(regex);
to only extract value properties from the items in the array.
I've found this technique extremely useful for extracting specific data from huge (100 GiB) JSON dumps, directly from HTTP using a network stream (with low memory requirements and no intermediate storage required).
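To make that network scenario concrete, here is a sketch of wiring the extension method to an HTTP response stream (the URL is hypothetical; MyObject and SelectTokensWithRegex are the types defined above):

using System;
using System.IO;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using Newtonsoft.Json;

static class HttpStreamingExample
{
    public static async Task RunAsync()
    {
        var regex = new Regex(@"^\[\d+\]$");
        using (var http = new HttpClient())
        using (Stream s = await http.GetStreamAsync("https://example.com/bigdump.json")) // hypothetical URL
        using (var sr = new StreamReader(s))
        using (var reader = new JsonTextReader(sr))
        {
            // Objects are deserialized one at a time as bytes arrive;
            // nothing beyond the current item is buffered in memory.
            foreach (MyObject o in reader.SelectTokensWithRegex<MyObject>(regex))
            {
                Console.WriteLine(o.Id);
            }
        }
    }
}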
.NET 6
This is easily done with the System.Text.Json.JsonSerializer in .NET 6:
using (FileStream? fileStream = new FileStream("hugefile.json", FileMode.Open))
{
    IAsyncEnumerable<Person?> people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream);
    await foreach (Person? person in people)
    {
        Console.WriteLine($"Hello, my name is {person.Name}!");
    }
}
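The Person type is not shown in the answer; a minimal assumed shape could be the following (note that System.Text.Json matches property names case-sensitively by default, hence the explicit attribute for a lowercase JSON key):

using System.Text.Json.Serialization;

public record Person
{
    // maps a lowercase "name" key in the JSON to this property
    [JsonPropertyName("name")]
    public string? Name { get; init; }
}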
Here is another easy way to parse a large JSON file, using Cinchoo ETL, an open source library (it uses JSON.NET under the hood to parse the JSON in a streaming manner):
using (var r = ChoJSONReader<MyObject>.LoadText(json))
{
    foreach (var rec in r)
        Console.WriteLine(rec.Dump());
}
Sample fiddle: https://dotnetfiddle.net/i5qJ5R
Is this what you're looking for? Found on a previous question
The current version of Json.net does not allow you to use the accepted answer code. A current alternative is:
public static object DeserializeFromStream(Stream stream)
{
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(stream))
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        return serializer.Deserialize(jsonTextReader);
    }
}
Documentation: Deserialize JSON from a file stream

serialize/deserialize a list of objects using BinaryFormatter

I know there were already many discussions on that topic, like this one:
BinaryFormatter and Deserialization Complex objects
but this looks awfully complicated. What I'm looking for is an easier way to serialize and deserialize a generic List of objects into/from one file. This is what I've tried:
public void SaveFile(string fileName)
{
    List<object> objects = new List<object>();
    // Add all tree nodes
    objects.Add(treeView.Nodes.Cast<TreeNode>().ToList());
    // Add dictionary (Type: Dictionary<int, Tuple<List<string>, List<string>>>)
    objects.Add(dictionary);
    using (Stream file = File.Open(fileName, FileMode.Create))
    {
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(file, objects);
    }
}

public void LoadFile(string fileName)
{
    ClearAll();
    using (Stream file = File.Open(fileName, FileMode.Open))
    {
        BinaryFormatter bf = new BinaryFormatter();
        object obj = bf.Deserialize(file);
        // Error: ArgumentNullException in System.Core.dll
        TreeNode[] nodeList = (obj as IEnumerable<TreeNode>).ToArray();
        treeView.Nodes.AddRange(nodeList);
        dictionary = obj as Dictionary<int, Tuple<List<string>, List<string>>>;
    }
}
The serialization works, but the deserialization fails with an ArgumentNullException. Does anyone know how to pull the dictionary and the tree nodes out and cast them back, may be with a different approach, but also nice and simple? Thanks!
You have serialized a list of objects where the first item is a list of nodes and the second a dictionary. So when deserializing, you will get the same objects back.
The result from deserializing will be a List<object>, where the first element is a List<TreeNode> and the second element a Dictionary<int, Tuple<List<string>, List<string>>>
Something like this:
public static void LoadFile(string fileName)
{
    ClearAll();
    using (Stream file = File.Open(fileName, FileMode.Open))
    {
        BinaryFormatter bf = new BinaryFormatter();
        object obj = bf.Deserialize(file);
        var objects = obj as List<object>;
        // you may want to run some checks (objects is not null and contains 2 elements, for example)
        var nodes = objects[0] as List<TreeNode>;
        var dictionary = objects[1] as Dictionary<int, Tuple<List<string>, List<string>>>;
        // use nodes and dictionary
    }
}
You can give it a try on this fiddle.
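An alternative that avoids the positional List<object> altogether (a sketch, assuming you control both the save and load side; SaveData is a hypothetical name) is a small [Serializable] container class, so each part gets a typed slot:

using System;
using System.Collections.Generic;
using System.Windows.Forms;

[Serializable]
public class SaveData
{
    public List<TreeNode> Nodes { get; set; }
    public Dictionary<int, Tuple<List<string>, List<string>>> Dictionary { get; set; }
}

Saving then becomes bf.Serialize(file, new SaveData { Nodes = ..., Dictionary = ... }), and loading is a single cast, var data = (SaveData)bf.Deserialize(file), with no index bookkeeping.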

put xml into Array

I have an XML file, and I need to be able to get it into either a list or an array.
The XML:
<Restaurant>
  <name>test</name>
  <location>test</location>
</Restaurant>
<Restaurant>
  <name>test2</name>
  <location>test2</location>
</Restaurant>
All the Restaurants will have the same number of fields and the same names for the fields, but the number of <Restaurant></Restaurant> in a given xml file is unknown.
In other words, I need an array or list and be able to do this:
String name = restaurantArray[0].name;
String location = restaurantArray[0].location;
While I don't need that syntax obviously, this is the functionality I'm trying to accomplish.
If you are trying to get the names of the restaurants, and the Restaurant elements are direct children of the root element:
string[] names = xdoc.Root.Elements("Restaurant")
                     .Select(r => (string)r.Element("name"))
                     .ToArray();
EDIT: If you are trying to parse whole restaurant objects:
var restaurants = from r in xdoc.Root.Elements("Restaurant")
                  select new {
                      Name = (string)r.Element("name"),
                      Location = (string)r.Element("location")
                  };
Usage:
foreach (var restaurant in restaurants)
{
    // use restaurant.Name or restaurant.Location
}
You can create instances of some Restaurant class here instead of anonymous objects. You can also put the restaurants into an array with a simple restaurants.ToArray() call.
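Putting it together with a concrete class (a sketch; it assumes the fragments are wrapped in a single root element such as <Restaurants>, which the file needs anyway to be valid XML, and the file path is hypothetical):

using System.Linq;
using System.Xml.Linq;

public class Restaurant
{
    public string Name { get; set; }
    public string Location { get; set; }
}

class Program
{
    static void Main()
    {
        XDocument xdoc = XDocument.Load("restaurants.xml"); // hypothetical file path

        Restaurant[] restaurantArray = xdoc.Root.Elements("Restaurant")
            .Select(r => new Restaurant
            {
                Name = (string)r.Element("name"),
                Location = (string)r.Element("location")
            })
            .ToArray();

        // Matches the access pattern from the question:
        string name = restaurantArray[0].Name;
        string location = restaurantArray[0].Location;
    }
}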
The answer by Sergey is very clear, but if you want to load the data back from a saved file, I think this will be helpful for you. For loading an XML file into an array I used the method below; in my case it was a jagged array of doubles, and I have modified the code based on your Restaurant type:
private static Restaurant[][] LoadXML(string filePath)
{
    // Open the XML file
    System.IO.FileStream fs = new System.IO.FileStream(filePath, System.IO.FileMode.Open);
    // First create an XmlSerializer object for the jagged array type
    System.Xml.Serialization.XmlSerializer xmlSer = new System.Xml.Serialization.XmlSerializer(typeof(Restaurant[][]));
    Restaurant[][] restaurants = (Restaurant[][])xmlSer.Deserialize(fs);
    // Close the file stream
    fs.Close();
    return restaurants;
}
With this function you can read all your data like this:
Restaurant[][] res = LoadXML(@"YOUR FILE PATH");
Since the first and second fields of each Restaurant are the name and location, accessing them should now be easy for you.

c# List<myObject> myList.copyTo() keeps a reference?

I've got a List and I used the .CopyTo() method, which copies my List into a one-dimensional array.
I then loop over this array and add each myObject to another List, after which I change things in this new List.
After this I compare the new values in my second List with the old values in my first List, but there is never any difference. So I'm thinking that the CopyTo() method keeps a reference.
Are there other methods that don't keep a reference?
Yes, .CopyTo() performs a shallow copy, which means it copies the references. What you need is a deep copy, obtained by cloning each object.
The best way is to make your myObject class implement ICloneable:
[Serializable] // required for BinaryFormatter-based cloning
public class YourClass : ICloneable
{
    public object Clone()
    {
        // Serialize this instance to a memory stream and deserialize it back,
        // producing an independent deep copy of the object graph
        using (var ms = new MemoryStream())
        {
            var bf = new BinaryFormatter();
            bf.Serialize(ms, this);
            ms.Position = 0;
            return bf.Deserialize(ms);
        }
    }
}
Then you can call .Clone() on each object and add the result to a new List:
List<YourClass> originalItems = new List<YourClass>() { new YourClass() };
List<YourClass> newItemList = originalItems.Select(x => x.Clone() as YourClass).ToList();
If you've got a List of reference types and you use the CopyTo method to copy them to an array, only the references are copied across, so when you modify the objects through the array, you are still modifying the same objects on the heap that your List refers to.
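A tiny demonstration of this shallow-copy behavior (a sketch with a hypothetical MyObject class):

using System;
using System.Collections.Generic;

public class MyObject
{
    public int Value { get; set; }
}

class Demo
{
    static void Main()
    {
        var list = new List<MyObject> { new MyObject { Value = 1 } };

        var array = new MyObject[list.Count];
        list.CopyTo(array);               // copies references, not objects

        array[0].Value = 99;              // mutates the single shared instance

        Console.WriteLine(list[0].Value); // prints 99: list and array point to the same object
    }
}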
