Given 2 JSON strings:
[
{
"id":"BA",
"description":"BrandA",
"values":[
{
"id":"CategoryA",
"description":"CategoryA"
},
{
"id":"CategoryB",
"description":"CategoryB"
},
{
"id":"CategoryC",
"description":"CategoryC"
},
{
"id":"CategoryD",
"description":"CategoryD"
},
{
"id":"CategoryE",
"description":"CategoryE"
},
{
"id":"CategoryF",
"description":"CategoryF"
},
{
"id":"CategoryG",
"description":"CategoryG"
},
{
"id":"CategoryH",
"description":"CategoryH"
}
]
},
{
"id":"BB",
"description":"BrandB",
"values":[
{
"id":"CategoryA",
"description":"CategoryA"
},
{
"id":"CategoryB",
"description":"CategoryB"
},
{
"id":"CategoryC",
"description":"CategoryC"
}
]
}
]
AND
[
{
"id":"BA",
"description":"BrandA",
"values":[
{
"id":"CategoryA",
"description":"CategoryA"
},
{
"id":"CategoryC",
"description":"CategoryC"
}
]
},
{
"id":"BB",
"description":"BrandB",
"values":[
{
"id":"CategoryB",
"description":"CategoryB"
}
]
}
]
The first one is the original. The second one contains the values that I want to remove from the original. So basically, if there is a match on brand and category between the first and second JSON, regardless of the order of the elements, I want that match to be removed.
The expected result would be something like this:
[
{
"id":"BA",
"description":"BrandA",
"values":[
{
"id":"CategoryB",
"description":"CategoryB"
},
{
"id":"CategoryD",
"description":"CategoryD"
},
{
"id":"CategoryE",
"description":"CategoryE"
},
{
"id":"CategoryF",
"description":"CategoryF"
},
{
"id":"CategoryG",
"description":"CategoryG"
},
{
"id":"CategoryH",
"description":"CategoryH"
}
]
},
{
"id":"BB",
"description":"BrandB",
"values":[
{
"id":"CategoryA",
"description":"CategoryA"
},
{
"id":"CategoryC",
"description":"CategoryC"
}
]
}
]
Categories A and C were removed from Brand A, as well as Category B from Brand B.
Based on some research, I was using https://github.com/wbish/jsondiffpatch.net and tried to work with its functions, but so far I haven't managed to achieve the result I want. Processing the JSON directly is not a must, either: if there is a simpler solution, such as converting both documents to lists and using something like LINQ, that works for me as well (I tried that, but couldn't find a way to do this comparison).
Thanks in advance.
If performance does not matter, the JsonSerializer and LINQ can be used:
Model
public class JsonModel
{
public class ValueModel
{
public string Id { get; set; }
public string Description { get; set; }
}
public string Id { get; set; }
public string Description { get; set; }
public IEnumerable<ValueModel> Values { get; set; }
}
Deserialize and LINQ
string json1Str = @"[...]";
string json2Str = @"[...]";
var opt = new System.Text.Json.JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var json1 = System.Text.Json.JsonSerializer.Deserialize<List<JsonModel>>(json1Str, opt);
var json2 = System.Text.Json.JsonSerializer.Deserialize<List<JsonModel>>(json2Str, opt);
var result = json1
    .Join(json2, j1 => j1.Id, j2 => j2.Id, (j1, j2) => new JsonModel
    {
        Id = j1.Id,
        Description = j1.Description,
        // keep only the values whose Id does not appear under the same brand in the second JSON
        Values = j1.Values.Where(v => !j2.Values.Select(val => val.Id).Contains(v.Id))
    });
Console.WriteLine(System.Text.Json.JsonSerializer.Serialize(result, new System.Text.Json.JsonSerializerOptions { WriteIndented = true }));
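One thing to be aware of (not part of the original answer): Join is an inner join, so a brand that exists only in the first JSON would be dropped from the result. If that can happen in your data, a GroupJoin-based left join keeps such brands untouched; a rough sketch under the same JsonModel assumptions:

var result = json1
    .GroupJoin(json2, j1 => j1.Id, j2 => j2.Id, (j1, matches) => new JsonModel
    {
        Id = j1.Id,
        Description = j1.Description,
        // remove values that appear under the same brand in the second JSON, if that brand is present there at all
        Values = j1.Values.Where(v => !matches.SelectMany(m => m.Values).Any(val => val.Id == v.Id))
    });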
Result
[
{
"Id": "BA",
"Description": "BrandA",
"Values": [
{
"Id": "CategoryB",
"Description": "CategoryB"
},
{
"Id": "CategoryD",
"Description": "CategoryD"
},
{
"Id": "CategoryE",
"Description": "CategoryE"
},
{
"Id": "CategoryF",
"Description": "CategoryF"
},
{
"Id": "CategoryG",
"Description": "CategoryG"
},
{
"Id": "CategoryH",
"Description": "CategoryH"
}
]
},
{
"Id": "BB",
"Description": "BrandB",
"Values": [
{
"Id": "CategoryA",
"Description": "CategoryA"
},
{
"Id": "CategoryC",
"Description": "CategoryC"
}
]
}
]
I have a WebAPI method that returns JSON in a flexible structure that depends on the request.
Part of the problem is that there could be any number of columns, and they could be of any type. The two given below (Code and Count) are just one example.
This structure is based on the underlying classes but there could be any number of columns in the output. So, rather than the usual properties you might expect, these are objects in a collection with Name and Value properties.
The downside of this flexible approach is that it gives a non-standard format.
Is there a way to transform this into a more normalised shape? Are there maybe some attributes I can add to the class properties to change the way they are serialised?
For example, where there are two columns, Code (string) and Count (numeric):
Current JSON:
{
"Rows": [
{
"Columns": [
{
"Value": "1",
"Name": "Code"
},
{
"Value": 13,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "2",
"Name": "Code"
},
{
"Value": 12,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "9",
"Name": "Code"
},
{
"Value": 1,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "5",
"Name": "Code"
},
{
"Value": 2,
"Name": "Count"
}
]
}
]
}
Ideally I'd like to transform it to this:
{
"Rows": [
{
"Code": "1",
"Count": 13
},
{
"Code": "2",
"Count": 12
},
{
"Code": "9",
"Count": 1
},
{
"Code": "5",
"Count": 2
}
]
}
The controller method (C#)
public ReportResponse Get(ReportRequest request)
{
var result = ReportLogic.GetReport(request);
return result;
}
The output classes
public class ReportResponse
{
public List<ReportRow> Rows { get; set; }
public ReportResponse()
{
Rows = new List<ReportRow>();
}
}
public class ReportRow
{
public List<ReportColumn> Columns { get; set; }
public ReportRow()
{
Columns = new List<ReportColumn>();
}
}
public class ReportColumn<T> : ReportColumn
{
public T Value { get; set; }
public ReportColumn(string name)
{
Name = name;
}
}
public abstract class ReportColumn
{
public string Name { get; internal set; }
}
I think the easiest way would be to map each row to a dictionary before serializing. Something like:
var dictionaries = new List<Dictionary<string, object>>();
foreach (var row in result.Rows) // result is the ReportResponse returned by GetReport
{
    // one dictionary per row, keyed by column name;
    // Value is declared on ReportColumn<T>, so read it via dynamic from the abstract base
    dictionaries.Add(row.Columns.ToDictionary(c => c.Name, c => (object)((dynamic)c).Value));
}
Then serializing the dictionaries variable should do the trick.
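For example (a minimal sketch, assuming Json.NET, the Web API default serializer; the anonymous wrapper is only there to keep the Rows property):

var normalised = new { Rows = dictionaries };
string json = Newtonsoft.Json.JsonConvert.SerializeObject(normalised, Newtonsoft.Json.Formatting.Indented);
// produces: { "Rows": [ { "Code": "1", "Count": 13 }, { "Code": "2", "Count": 12 }, ... ] }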
If you're using the output in JavaScript, you could translate as follows:
var
data = {
"Rows": [
{
"Columns": [
{
"Value": "1",
"Name": "Code"
},
{
"Value": 13,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "2",
"Name": "Code"
},
{
"Value": 12,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "9",
"Name": "Code"
},
{
"Value": 1,
"Name": "Count"
}
]
},
{
"Columns": [
{
"Value": "5",
"Name": "Code"
},
{
"Value": 2,
"Name": "Count"
}
]
}
]
},
output = [
];
data.Rows.forEach(function (row)
{
var
newRow = {};
row.Columns.forEach(function (column)
{
newRow[column.Name] = column.Value;
});
output.push(newRow);
})
console.log(JSON.stringify(output));
I have variants and sub-variants, and the sub-variants are the entities that come under different tests.
Now I want to remove the duplicate sub-variants from the nested child objects of the JSON structure.
JSON structure:
[
{
"ParentVariantName": "Variant1",
"TestList": [
{
"TestId": 100,
"Version": 0,
"SubVariantsList": []
},
{
"TestId": 101,
"Version": 1.0,
"SubVariantsList": [
{
"SourceSubVariantModel": {
"Id": 69,
"Name": "Abc",
"DiffPerc": 100
},
"TargetSubVariantModel": {
"Id": 70,
"Name": "Pqr",
"DiffPerc": 200
}
},
{
"SourceSubVariantModel": {
"Id": 70,
"Name": "Pqr",
"SourceValue": 200
},
"TargetSubVariantModel": {
"Id": 71,
"Name": "Xyz",
"TargetValue":300
}
}
]
},
{
"TestId": 247,
"Version": 3.0,
"SubVariantsList": []
},
{
"TestId": 248,
"Version": 4.0,
"SubVariantsList": []
},
{
"TestId": 249,
"Version": 5.0,
"SubVariantsList": []
},
{
"TestId": 250,
"Version": 6.0,
"SubVariantsList": []
}
]
}
]
For example, in the above JSON structure, for TestId = 101:
Id 70 is repeated 2 times, so I want to merge the duplicate records and get output like this:
Expected Output:
[
{
"ParentVariantName": "Variant1",
"TestList": [
{
"TestId": 100,
"Version": 0,
"SubVariantsList": []
},
{
"TestId": 101,
"Version": 1.0,
"SubVariantsList": [
{
"Id": 69,
"Name": "Abc",
"DiffPerc": 100
},
{
"Id": 70,
"Name": "Pqr",
"DiffPerc": 200
},
{
"Id": 71,
"Name": "Xyz",
"DiffPerc": 300
}
]
},
{
"TestId": 247,
"Version": 3.0,
"SubVariantsList": []
},
{
"TestId": 248,
"Version": 4.0,
"SubVariantsList": []
},
{
"TestId": 249,
"Version": 5.0,
"SubVariantsList": []
},
{
"TestId": 250,
"Version": 6.0,
"SubVariantsList": []
}
]
}
]
This is what I am trying:
foreach (var item in data)
{
foreach (var testList in item.TestList)
{
foreach (var cdList in testList.SubVariantsList)
{
if (list.Count == 0)
{
list.Add
(
new
{
}
)
}
}
}
}
You need to define a class for your new SubVariantsList entries, if you haven't done so already, and instantiate it when populating a new list for the updated SubVariantsList.
In the sample below I defined Model as such a class. It also defines an UpsertToList() method to help fill the list without duplication.
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string json = @"
[
{
""ParentVariantName"": ""Variant1"",
..... Removed to save space. Insert your input JSON here with " changed to "" ....
";
var data = JsonConvert.DeserializeObject<dynamic>(json);
foreach (var item in data)
foreach (var test in item.TestList)
{
var list = new List<Model>();
foreach (var pair in test.SubVariantsList)
{
Model.UpsertToList(list, pair.SourceSubVariantModel);
Model.UpsertToList(list, pair.TargetSubVariantModel);
}
test.SubVariantsList = Newtonsoft.Json.Linq.JToken.FromObject(list);
}
Console.WriteLine(JsonConvert.SerializeObject(data, Formatting.Indented));
Console.ReadKey();
}
}
class Model
{
public int Id;
public string Name;
public int DiffPerc;
public static void UpsertToList(List<Model> list, dynamic sourceObject)
{
var item = list.FindLast(model => sourceObject.Id == model.Id);
if (item == null)
list.Add(new Model(sourceObject));
else
item.Update(sourceObject);
}
Model(dynamic sourceObject)
{
Id = sourceObject.Id;
Update(sourceObject);
}
void Update(dynamic sourceObject)
{
Name = sourceObject.Name;
var diffPerc = sourceObject["DiffPerc"];
if (diffPerc == null)
diffPerc = sourceObject["SourceValue"];
if (diffPerc == null)
diffPerc = sourceObject["TargetValue"];
if (diffPerc != null)
DiffPerc = diffPerc;
}
}
}
Note: I used Json.Net so you may need to Install-Package Newtonsoft.Json
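If you would rather not round-trip through dynamic and a dedicated Model class, the same merge can also be sketched with Json.NET's LINQ-to-JSON types (JArray/JObject). This is an untested alternative, not part of the original answer; it assumes the property names from the question's JSON and needs using Newtonsoft.Json.Linq and using System.Linq:

var root = JArray.Parse(json);
foreach (JObject test in root.SelectTokens("$[*].TestList[*]"))
{
    var merged = test["SubVariantsList"]
        // flatten each pair into its source and target models
        .SelectMany(p => new[] { (JObject)p["SourceSubVariantModel"], (JObject)p["TargetSubVariantModel"] })
        // one entry per Id; later occurrences win, mirroring UpsertToList
        .GroupBy(m => (int)m["Id"])
        .Select(g => new JObject(
            new JProperty("Id", g.Key),
            new JProperty("Name", (string)g.Last()["Name"]),
            new JProperty("DiffPerc",
                g.Select(m => m["DiffPerc"] ?? m["SourceValue"] ?? m["TargetValue"])
                 .LastOrDefault(v => v != null))));
    test["SubVariantsList"] = new JArray(merged);
}
Console.WriteLine(root.ToString(Newtonsoft.Json.Formatting.Indented));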
Is there something wrong with my NEST query? It returns all the data, but I need to filter by multiple terms.
The Elasticsearch result I want:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 0,
"hits": []
},
"aggregations": {
"log_query": {
"doc_count": 2,
"histogram_Log": {
"buckets": [
{
"key_as_string": "06/02/2015 12:00:00",
"key": 1423180800000,
"doc_count": 1
},
{
"key_as_string": "21/02/2015 12:00:00",
"key": 1424476800000,
"doc_count": 1
}
]
}
}
}
}
My Elasticsearch query:
{
"size": 0,
"aggs": {
"log_query": {
"filter": {
"bool": {
"must": [
{
"term": {
"cluster": "giauht1"
}
},
{
"term": {
"server": "hadoop0"
}
},
{
"term": {
"type": "Warn"
}
},
{
"range": {
"actionTime": {
"gte": "2015-02-01",
"lte": "2015-02-24"
}
}
}
]
}
},
"aggs": {
"histogram_Log": {
"date_histogram": {
"field": "actionTime",
"interval": "1d",
"format": "dd/MM/YYYY hh:mm:ss"
}
}
}
}
}
}
My NEST query:
Func<SearchDescriptor<LogInfoIndexView>, SearchDescriptor<LogInfoIndexView>> query =
que => que.Aggregations(aggs => aggs.Filter("log_query", fil =>
{
fil.Filter(fb => fb.Bool(fm => fm.Must(
ftm =>
{
ftm.Term(t => t.Cluster, cluster);
ftm.Term(t => t.Server, server);
ftm.Term(t => t.Type, logLevel);
ftm.Range(r => r.OnField("actionTime").GreaterOrEquals(from.Value).LowerOrEquals(to.Value));
return ftm;
}))).Aggregations(faggs => faggs.DateHistogram("histogram_Log", dr =>
{
dr.Field("actionTime");
dr.Interval("1d");
dr.Format("dd/MM/YYYY hh:mm:ss");
return dr;
}));
return fil;
})).Size(0).Type(new LogInfoIndexView().TypeName);
var result = client.Search(query);
My NEST result:
My model mapping:
{
"onef-sora": {
"mappings": {
"FPT.OneF.Api.Log": {
"properties": {
"actionTime": {
"type": "date",
"format": "dateOptionalTime"
},
"application": {
"type": "string",
"index": "not_analyzed"
},
"cluster": {
"type": "string",
"index": "not_analyzed"
},
"detail": {
"type": "string",
"index": "not_analyzed"
},
"iD": {
"type": "string"
},
"message": {
"type": "string",
"index": "not_analyzed"
},
"server": {
"type": "string",
"index": "not_analyzed"
},
"source": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "string",
"index": "not_analyzed"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"typeLog": {
"type": "string"
},
"typeName": {
"type": "string"
},
"url": {
"type": "string",
"index": "not_analyzed"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
The Must() condition passed to the Bool() filter takes a params Func<FilterDescriptor<T>, FilterContainer>[], but in your filter the Term() and Range() filters are chained onto the same filter instance. Unfortunately, this doesn't work as you might expect: the end result is actually an empty JSON object passed to the must clause in the query DSL for the filter, i.e. you end up with
{
"size": 0,
"aggs": {
"log_query": {
"filter": {
"bool": {
"must": [
{} /* where are the filters?! */
]
}
},
"aggs": {
"histogram_Log": {
"date_histogram": {
"field": "actionTime",
"interval": "1d",
"format": "dd/MM/YYYY hh:mm:ss"
}
}
}
}
}
}
The solution is to pass an array of Func<FilterDescriptor<T>, FilterContainer>. The following matches your query DSL:
void Main()
{
var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
var connection = new InMemoryConnection(settings);
var client = new ElasticClient(connection: connection);
DateTime? from = new DateTime(2015, 2,1);
DateTime? to = new DateTime(2015, 2, 24);
var docs = client.Search<LogInfoIndexView>(s => s
.Size(0)
.Type("type")
.Aggregations(a => a
.Filter("log_query", f => f
.Filter(ff => ff
.Bool(b => b
.Must(m => m
.Term(t => t.Cluster, "giauht1"),
m => m
.Term(t => t.Server, "hadoop0"),
m => m
.Term(t => t.Type, "Warn"),
m => m
.Range(r => r.OnField("actionTime").GreaterOrEquals(from.Value).LowerOrEquals(to.Value))
)
)
)
.Aggregations(aa => aa
.DateHistogram("histogram_Log", da => da
.Field("actionTime")
.Interval("1d")
.Format("dd/MM/YYYY hh:mm:ss")
)
)
)
)
);
Console.WriteLine(Encoding.UTF8.GetString(docs.RequestInformation.Request));
}
public class LogInfoIndexView
{
public string Cluster { get; set; }
public string Server { get; set; }
public string Type { get; set; }
public DateTime ActionTime { get; set; }
}
returning
{
"size": 0,
"aggs": {
"log_query": {
"filter": {
"bool": {
"must": [
{
"term": {
"cluster": "giauht1"
}
},
{
"term": {
"server": "hadoop0"
}
},
{
"term": {
"type": "Warn"
}
},
{
"range": {
"actionTime": {
"lte": "2015-02-24T00:00:00.000",
"gte": "2015-02-01T00:00:00.000"
}
}
}
]
}
},
"aggs": {
"histogram_Log": {
"date_histogram": {
"field": "actionTime",
"interval": "1d",
"format": "dd/MM/YYYY hh:mm:ss"
}
}
}
}
}
}
EDIT:
In answer to your comment: the difference between a filtered query filter and a filter aggregation is that the former applies the filtering to all documents at the start of the query phase, and those filters are generally cached, which improves performance on subsequent queries that use them. The latter applies only within the scope of the aggregation, filtering the documents in the current context into a single bucket. If your query exists only to perform the aggregation and you're likely to run it with the same filters, I think the filtered query filter should offer better performance.
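For comparison, the filtered query version of the same search might look roughly like this in NEST 1.x (an untested sketch that reuses the client, from and to from the example above; the date histogram then sits directly under the top-level aggregations, because the query itself already filters the documents):

var docs = client.Search<LogInfoIndexView>(s => s
    .Size(0)
    .Type("type")
    .Query(q => q
        .Filtered(fq => fq
            .Filter(f => f
                .Bool(b => b
                    .Must(
                        m => m.Term(t => t.Cluster, "giauht1"),
                        m => m.Term(t => t.Server, "hadoop0"),
                        m => m.Term(t => t.Type, "Warn"),
                        m => m.Range(r => r.OnField("actionTime").GreaterOrEquals(from.Value).LowerOrEquals(to.Value))
                    )
                )
            )
        )
    )
    .Aggregations(a => a
        .DateHistogram("histogram_Log", da => da
            .Field("actionTime")
            .Interval("1d")
            .Format("dd/MM/YYYY hh:mm:ss")
        )
    )
);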
I have a JSON response that I receive from an API call. It has several nested levels, as shown below (this is a snippet):
"Items": [
{
"Result": {
"Id": "191e24b8-887d-e111-96ec-000c29128cee",
"Name": "Name",
"StartDate": "2012-04-03T00:00:00+01:00",
"EndDate": null,
"Status": {
"Name": "Active",
"Value": 5
},
"Client": {
"Id": "35ea10da-b8d5-4ef8-bf23-c829ae90fe60",
"Name": "client Name",
"AdditionalItems": {}
},
"ServiceAgreement": {
"Id": "65216699-a409-44b0-8294-0e995eb05d9d",
"Name": "Name",
"AdditionalItems": {
"ScheduleBased": true,
"PayFrequency": {
"Id": "981acb72-8291-de11-98fa-005056c00008",
"Name": "Weekly",
"AdditionalItems": {}
},
"PayCycle": [
{
"Name": "Schedule Based",
"ScheduleBased": true,
"SelfBilling": false,
"Id": "a8a2ecc4-ff79-46da-a135-743b57808ec3",
"CreatedOn": "2011-09-16T23:32:19+01:00",
"CreatedBy": "System Administrator",
"ModifiedOn": "2011-09-16T23:32:19+01:00",
"ModifiedBy": "System Administrator",
"Archived": false
}
]
}
},
}
]
...
What I want to do is retrieve the data from the PayCycle node using LINQ. I can, for example, get the items with a value of true for Result.ServiceAgreement.AdditionalItems.ScheduleBased using the following LINQ in the controller:
var result = from p in data["Data"]["Items"].Children()
where (bool)p["Result"]["ServiceAgreement"]["AdditionalItems"]["ScheduleBased"] == true
select new
{
Name = (string)p["Result"]["Client"]["Name"],
Id = (string)p["Result"]["Client"]["Id"]
};
Now I need to get the Result.ServiceAgreement.AdditionalItems.PayCycle.ScheduleBased and SelfBilling properties. How do I do this given that PayCycle is also an array? How do I get its children, as I did with Data.Items in the LINQ above, so that I can have the where clause filter on both of these items?
You can deserialize the JSON into a dynamic object, and then use LINQ to Objects:
[TestMethod]
public void TestMethod1()
{
const string json = @"""Items"": [
{
""Result"": {
""Id"": ""191e24b8-887d-e111-96ec-000c29128cee"",
""Name"": ""Name"",
""StartDate"": ""2012-04-03T00:00:00+01:00"",
""EndDate"": null,
""Status"": {
""Name"": ""Active"",
""Value"": 5
},
""Client"": {
""Id"": ""35ea10da-b8d5-4ef8-bf23-c829ae90fe60"",
""Name"": ""client Name"",
""AdditionalItems"": {}
},
""ServiceAgreement"": {
""Id"": ""65216699-a409-44b0-8294-0e995eb05d9d"",
""Name"": ""Name"",
""AdditionalItems"": {
""ScheduleBased"": true,
""PayFrequency"": {
""Id"": ""981acb72-8291-de11-98fa-005056c00008"",
""Name"": ""Weekly"",
""AdditionalItems"": {}
},
""PayCycle"": [
{
""Name"": ""Schedule Based"",
""ScheduleBased"": true,
""SelfBilling"": false,
""Id"": ""a8a2ecc4-ff79-46da-a135-743b57808ec3"",
""CreatedOn"": ""2011-09-16T23:32:19+01:00"",
""CreatedBy"": ""System Administrator"",
""ModifiedOn"": ""2011-09-16T23:32:19+01:00"",
""ModifiedBy"": ""System Administrator"",
""Archived"": false
}
]
}
}
}
}
]";
dynamic data = System.Web.Helpers.Json.Decode("{" + json + "}");
var result = from i in (IEnumerable<dynamic>)data.Items
where i.Result.ServiceAgreement.AdditionalItems.ScheduleBased == true
select new
{
i.Result.Client.Name,
i.Result.Client.Id
};
Assert.AreEqual(1, result.Count());
Assert.AreEqual("client Name", result.First().Name);
Assert.AreEqual("35ea10da-b8d5-4ef8-bf23-c829ae90fe60", result.First().Id);
}
Note that I had to add braces { and } around your example JSON string, or else the .NET JSON parser doesn't like it.
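To also bring the PayCycle array into the filter (the second part of the question), the same dynamic style can be extended. A rough, untested sketch:

var withPayCycle = from i in (IEnumerable<dynamic>)data.Items
                   where i.Result.ServiceAgreement.AdditionalItems.ScheduleBased == true
                         // PayCycle is itself an array, so enumerate it the same way as Items
                         && ((IEnumerable<dynamic>)i.Result.ServiceAgreement.AdditionalItems.PayCycle)
                                .Any(pc => pc.ScheduleBased == true && pc.SelfBilling == false)
                   select new
                   {
                       i.Result.Client.Name,
                       i.Result.Client.Id
                   };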
To start off, I'm not familiar with writing LINQ/lambda expressions using this ["some"]["thing"] format.
My first step would be to create some classes/objects to house the data in, to ease writing any code afterwards.
e.g.
public class Result
{
public Guid Id { get; set; }
public string Name { get; set; }
public DateTime StartDate { get; set; }
//you get the idea
}
But possibly try the following?
var result = from p in data["Data"]["Items"].Children()
where (bool)p["Result"]["ServiceAgreement"]["AdditionalItems"]["ScheduleBased"] == true
&& p["Result"]["ServiceAgreement"]["AdditionalItems"]["PayCycle"].Children().Any(o => (bool)o["ScheduleBased"] == true)
select new
{
Name = (string)p["Result"]["Client"]["Name"],
Id = (string)p["Result"]["Client"]["Id"]
};