I have a JSON file that contains a list of fruit entries. The "fruits" key can map to a single fruit or to a collection of fruits.
For example:
[
{
"fruits": [
"banana"
]
},
{
"fruits": [
"apple"
]
},
{
"fruits": [
"orange",
"apple"
]
}
]
I was wondering: how can I determine which fruit(s) occur most often in my JSON structure? That is, how do I find out how often each value occurs and which one is ahead of the others?
Not sure if you're interested in having a class to deserialize into, but here's how you would do it. Feel free to skip the class and use dynamic deserialization:
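using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
public class FruitCollection
{
    // Must be public: Json.NET ignores non-public properties by default.
    public string[] Fruits { get; set; }
}
// The JSON root is an array, so deserialize into a list of FruitCollection.
var fruitColls = JsonConvert.DeserializeObject<List<FruitCollection>>(json);
var mostCommon = fruitColls
    .SelectMany(fc => fc.Fruits)      // flatten all "fruits" arrays into one sequence
    .GroupBy(f => f)                  // one group per distinct fruit
    .OrderByDescending(g => g.Count())
    .First()
    .Key;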
EDIT:
This question's pretty old, but I'll mention that the OrderByDescending/First combination is doing redundant work: you don't need to sort to get the maximum. It's an age-old lazy hack that people keep using because LINQ did not provide a MaxBy extension method (Enumerable.MaxBy was eventually added in .NET 6).
Usually your input is small enough, and the rest of the work adds enough overhead, that you don't really care, but the "correct" way (e.g. if you had billions of fruit types) would be to use a proper MaxBy extension method or hack something out of Aggregate. Finding the maximum is linear in the worst case, whereas sorting is O(n log n) in the worst case.
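To illustrate the single-pass alternative, here is a minimal sketch that replaces OrderByDescending + First with Enumerable.Aggregate. The entries list is hypothetical in-memory data standing in for the deserialized fruits; it is not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory data standing in for the deserialized JSON:
// one string array per "fruits" entry.
var entries = new List<string[]>
{
    new[] { "banana" },
    new[] { "apple" },
    new[] { "orange", "apple" },
};

// Count each distinct fruit (GroupBy is hash-based, so this is a linear pass).
var counts = entries
    .SelectMany(e => e)
    .GroupBy(f => f)
    .Select(g => (Fruit: g.Key, Count: g.Count()));

// Single-pass maximum with Aggregate instead of sorting.
// Ties keep the earlier group because only a strictly larger count replaces it.
// Aggregate without a seed throws InvalidOperationException on an empty sequence.
var best = counts.Aggregate((acc, next) => next.Count > acc.Count ? next : acc);

Console.WriteLine($"{best.Fruit}: {best.Count}"); // apple: 2
```

On .NET 6 or later, counts.MaxBy(c => c.Count) should give the same result in a single pass.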
If you use Json.NET, you can load your JSON using LINQ to JSON, then use SelectTokens to recursively find all "fruits" properties, then recursively collect all descendant string values (tokens of type JValue), group them by their string value, and sort the groups in descending order of count:
// Requires using System.Linq and using Newtonsoft.Json.Linq, plus the DescendantsAndSelf extension from the update below.
var token = JToken.Parse(jsonString);
var fruits = token.SelectTokens("..fruits")       // Recursively find all "fruits" properties
    .SelectMany(f => f.DescendantsAndSelf())      // Recursively find all string literals underneath each
    .OfType<JValue>()
    .GroupBy(f => (string)f)                      // Group by string value
    .OrderByDescending(g => g.Count())            // Descending order by count
    .ToList();
Or, if you prefer to put your results into an anonymous type for clarity:
var fruits = token.SelectTokens("..fruits")       // Recursively find all "fruits" properties
    .SelectMany(f => f.DescendantsAndSelf())      // Recursively find all string literals underneath each
    .OfType<JValue>()
    .GroupBy(f => (string)f)                      // Group by string value
    .Select(g => new { Fruit = g.Key, Count = g.Count() })
    .OrderByDescending(f => f.Count)              // Descending order by count
    .ToList();
Then afterwards:
Console.WriteLine(JsonConvert.SerializeObject(fruits, Formatting.Indented));
Produces:
[
{
"Fruit": "apple",
"Count": 2
},
{
"Fruit": "banana",
"Count": 1
},
{
"Fruit": "orange",
"Count": 1
}
]
** Update **
I forgot to include the following extension method, which the queries above rely on:
public static class JsonExtensions
{
public static IEnumerable<JToken> DescendantsAndSelf(this JToken node)
{
if (node == null)
return Enumerable.Empty<JToken>();
var container = node as JContainer;
if (container != null)
return container.DescendantsAndSelf();
else
return new [] { node };
}
}
The original question was a little vague about the precise structure of the JSON, which is why I suggested using LINQ to JSON rather than deserialization.
The deserialization class for this structure is simple (each element of the root JSON array is one of these):
public class RootObject
{
    public List<string> fruits { get; set; }
}
So to deserialize (the root of the JSON is an array, hence List<RootObject>):
var fruitListContainer = JsonConvert.DeserializeObject<List<RootObject>>(jsonString);
Then you can put all fruits in one list:
List<string> fruits = fruitListContainer.SelectMany(r => r.fruits).ToList();
Now you have all fruits in one list, and you can do whatever you want. For sorting, see the other answers.
Assuming that the data is in a file named fruits.json, that jq (http://stedolan.github.io/jq/) is on the PATH, and that you're using a Mac or Linux-style shell:
$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1)' fruits.json
{
"banana": 1,
"apple": 2,
"orange": 1
}
On Windows, the same thing will work if the quotation marks are suitably adjusted. Alternatively, if the one-line jq program is put in a file, say fruits.jq, the following command could be run in any supported environment:
jq -f fruits.jq fruits.json
If the data is coming from some other process, you can pipe it into jq, which reads from standard input when no input file is given:
jq -f fruits.jq
One way to find the maximum count is to add a couple of filters, e.g. as follows:
$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1) |
to_entries | max_by(.value)' fruits.json
{
"key": "apple",
"value": 2
}
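The jq reduce above is a fold: it starts from an empty object and increments one key per fruit. For readers working in C#, a minimal sketch of the same fold with a Dictionary, over a hypothetical pre-flattened list of fruit names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-flattened input: every value from every "fruits" array.
var fruits = new List<string> { "banana", "apple", "orange", "apple" };

// Same fold as the jq program: start from an empty map, add 1 per occurrence.
var counts = new Dictionary<string, int>();
foreach (var fruit in fruits)
{
    counts.TryGetValue(fruit, out var n); // n is 0 when the fruit is not in the map yet
    counts[fruit] = n + 1;
}

foreach (var (name, count) in counts)
    Console.WriteLine($"{name}: {count}");
```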
I have an array in the following format in a MongoDB collection:
"Students":
[
null,
{
"name": "Rahul",
"RegID": "A01"
},
{
"name": "Raj",
"RegID": "A012"
}
]
I want to display the names as a string (Rahul,Raj). This is what I'm trying:
var Namelist = students.Select(x => x?.name)?.ToList();
String names = string.Join(",", Namelist);
But the result also has a leading comma (,Rahul,Raj).
I can do this with a loop, but I want to write it as a LINQ query.
You just need to add a filter to ignore empty entries. You can either do that in the source query:
var Namelist = students.Where(s => s != null).Select(x =>x.name).ToList();
(Note that I've removed the null-conditional operators, since after the Where clause you can assume that none of the values are null.)
Or when you join the strings:
String names = string.Join(",", Namelist.Where(s => s != null));
It's generally odd to have a null value in a list of related data, however, so you might actually have a problem in the source data that you should fix.
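A small self-contained sketch of where the leading comma comes from and how the Where filter removes it. The anonymous-type array is a hypothetical stand-in for the MongoDB data; string.Join writes a null element as an empty string, which produces the extra comma.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the MongoDB array: a null entry followed by two students.
var students = new[]
{
    null,
    new { name = "Rahul", RegID = "A01" },
    new { name = "Raj", RegID = "A012" },
};

// The question's approach: x?.name maps the null entry to a null string, and
// string.Join writes null elements as "", which produces the leading comma.
string withNull = string.Join(",", students.Select(x => x?.name));

// Filtering out null entries first gives the expected result.
string names = string.Join(",", students.Where(s => s != null).Select(s => s.name));

Console.WriteLine(withNull); // ,Rahul,Raj
Console.WriteLine(names);    // Rahul,Raj
```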
I am new to LINQ queries and would like to know if what I am trying to achieve is possible via LINQ query.
So, I have a JSON doc as below.
I am trying to get every property named "$type" and return its path within the document along with its value.
I know an iterative way of doing this, but LINQ seems to be preferred and is supposed to make this easy.
{
"$type":"type1",
"title":"US version",
"_object1":[
{
"$type":"type2",
"rootModule":{
"id":"page",
"modules":[
{
"id":"header",
"$type":"module-header"
},
{
"id":"footer",
"$type":"module-footer"
}
]
}
},
{
"$type":"type2",
"_id":"ab134"
},
{
"$type":"type3",
"_id":"ab567"
}
],
"_object2":[
{
"$type":"module1",
"constraintsId":"page"
},
{
"name":"header1 1",
"nestedobject":{
"$type":"nestedobject-type",
"dataBinder":{
"id":"ab244"
}
}
}
]
}
Thanks guys,
I was able to get the list as below:
var root = (JContainer)JToken.FromObject(document, CommonSerializerSetting.GetCommonSerializer());
var descendant = "$type";
var query = root
// Recursively descend the JSON hierarchy
.DescendantsAndSelf()
// Select all properties named descendant
.OfType<JProperty>()
.Where(p => p.Name == descendant)
// Select their value
.Select(p => p.Value);
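The query above returns only the values. In Json.NET each JProperty, like any JToken, also has a Path property, so selecting new { p.Path, p.Value } would return the location as well. For comparison, a minimal sketch of the same recursive search using System.Text.Json (swapped in for Json.NET so it runs without a NuGet package) over a trimmed-down copy of the question's document. The Collect helper and its "$.a[0].b" path format are hypothetical, illustrative choices, not Json.NET's path syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Recursively collects (path, value) for every property with the given name.
// The "$.a[0].b" path format is an illustrative choice, not Json.NET's JToken.Path syntax.
static void Collect(JsonElement e, string path, string name, List<(string Path, string Value)> acc)
{
    switch (e.ValueKind)
    {
        case JsonValueKind.Object:
            foreach (var prop in e.EnumerateObject())
            {
                var childPath = $"{path}.{prop.Name}";
                if (prop.Name == name)
                    acc.Add((childPath, prop.Value.ToString()));
                Collect(prop.Value, childPath, name, acc); // descend into nested values
            }
            break;
        case JsonValueKind.Array:
            var i = 0;
            foreach (var item in e.EnumerateArray())
                Collect(item, $"{path}[{i++}]", name, acc);
            break;
    }
}

// A trimmed-down copy of the question's document.
const string json = @"{
  ""$type"": ""type1"",
  ""title"": ""US version"",
  ""_object1"": [
    { ""$type"": ""type2"",
      ""rootModule"": { ""id"": ""page"",
        ""modules"": [ { ""id"": ""header"", ""$type"": ""module-header"" } ] } },
    { ""$type"": ""type3"", ""_id"": ""ab567"" }
  ]
}";

using var doc = JsonDocument.Parse(json);
var results = new List<(string Path, string Value)>();
Collect(doc.RootElement, "$", "$type", results);

foreach (var (path, value) in results)
    Console.WriteLine($"{path} = {value}");
```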
The data:
The collection contains a list of audit records and I want to return the last modified items from the collection.
For example:
So the query needs to return only audits 1235 and 1237.
The following statement works in the Mongo shell and returns the data in under a millisecond. I also need to figure out how to return the entire collection item instead of just the Id.
db.Forms.aggregate(
{ $group: { _id: "$Id", lastModifiedId: { $last: "$_id" } } }
)
However, I need to convert this to the C# Driver's syntax.
I have the following at the moment, but it's not working and returns (for lack of a better term) weird data:
var results = collection.Aggregate()
.Group(new BsonDocument { { "_id", "$Id" }, { "lastModifiedId", new BsonDocument("$last", "_id") } })
.ToListAsync().Result.ToList();
My current solution gets the full collection back and then runs it through an extension method to get the latest records (where list is the full collection):
var lastModifiedOnlyList =
from listItem in list.OrderByDescending(_ => _.AuditId)
group listItem by listItem.Id into grp
select grp.OrderByDescending(listItem => listItem.AuditId)
.FirstOrDefault();
While this code works, it is EXTREMELY slow because of the sheer amount of data that is being returned from the collection, so I need to do the grouping on the list as part of the collection get/find.
Please let me know if I can provide any additional information.
Update: With Axel's help I managed to get it resolved:
var pipeline = new[] { new BsonDocument { { "$group", new BsonDocument { { "_id", "$Id" }, { "LastAuditId", new BsonDocument { { "$last", "$_id" } } } } } } };
var lastAuditIds = collection.Aggregate<Audit>(pipeline).ToListAsync().Result.ToList().Select(_=>_.LastAuditId);
I moved that to its own method and then use the IDs to get the collection items back, with my projection working as well:
var forLastAuditIds = ForLastAuditIds(collection);
var limitedList = (
projection != null
? collection.Find(forLastAuditIds & filter, new FindOptions()).Project(projection)
: collection.Find(forLastAuditIds & filter, new FindOptions())
).ToListAsync().Result.ToList();
"filter" in this case is either an Expression or a BsonDocument. The performance is great as well - sub-second for the whole thing. Thanks for the help, Axel!
I think you're doing an extra OrderBy; this should be enough:
var lastModifiedOnlyList =
from listItem in list
group listItem by listItem.Id into grp
select grp.OrderByDescending(listItem => listItem.AuditId)
.FirstOrDefault();
EDIT:
To gain performance in the query, you could use the Aggregate function differently:
var match = new BsonDocument
{
{
"$group",
new BsonDocument
{
{ "_id", "$Id" },
{ "lastModifiedId", new BsonDocument
{
{
"$last", "$_id"
}
}}
}
}
};
var pipeline = new[] { match };
var result = collection.Aggregate(pipeline);
That should be the equivalent of your Mongo Shell query.
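The in-memory version from the first part of this answer (group by Id, keep the record with the highest AuditId) can also be written without sorting each group. A minimal sketch, assuming hypothetical (Id, AuditId) tuples in place of the audit documents; the Aggregate used here is LINQ's Enumerable.Aggregate, not the MongoDB driver's.

```csharp
using System;
using System.Linq;

// Hypothetical audit records as (Id, AuditId) tuples; a higher AuditId means a newer audit.
var list = new[]
{
    (Id: "A", AuditId: 1),
    (Id: "A", AuditId: 2),
    (Id: "B", AuditId: 3),
    (Id: "B", AuditId: 4),
    (Id: "C", AuditId: 5),
};

// Group by Id, then keep the newest record in each group with a single pass
// (Enumerable.Aggregate) instead of sorting every group.
var lastModifiedOnly = list
    .GroupBy(r => r.Id)
    .Select(g => g.Aggregate((best, r) => r.AuditId > best.AuditId ? r : best))
    .ToList();

foreach (var rec in lastModifiedOnly)
    Console.WriteLine($"{rec.Id}: {rec.AuditId}"); // A: 2, B: 4, C: 5
```

This only helps once the records are already in memory; for the data volume described in the question, the server-side $group pipeline avoids transferring everything in the first place.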
I have a list of objects, and each object contains an array that holds multiple values. How can I search the arrays of all objects in the list for a particular value?
Example:
[
{
"ObjArray": ["1234", 123"],
"Property1": "60",
"Property2": "64"
},
{
"ObjArray": ["4321", 321"],
"Property1": "112",
"Property2": "22"
},
{
"ObjArray": ["9999"],
"Property1": "2",
"Property2": "2"
}
]
And I want to look for "9999" in all of the "ObjArray"s. How can I do that with LINQ?
EDIT
As Habib pointed out, I just needed a simple Contains clause. Working code looks like this:
var result = mainList.Where(r => r.ObjArray != null && r.ObjArray.Contains("9999", StringComparer.OrdinalIgnoreCase)).FirstOrDefault();
You can do:
var query = mainList.Where(r => r.ObjArray.Contains("9999"));
Or
var query = mainList.Where(r => r.ObjArray.Any(o => o == "9999"));
(Aside from that, your JSON appears to be invalid: the second value in the array needs an opening double quote.)
["1234", 123"]
//^^
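A runnable sketch of both forms from this answer, plus the null check from the question's edit. The tuples are a hypothetical, trimmed-down stand-in for the question's objects (Property2 omitted, and an entry with a null ObjArray added to show why the check matters).

```csharp
using System;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the question's objects (Property2 omitted).
// The last entry has a null ObjArray to show why the question's null check helps.
var mainList = new[]
{
    (ObjArray: new[] { "1234", "123" }, Property1: "60"),
    (ObjArray: new[] { "4321", "321" }, Property1: "112"),
    (ObjArray: new[] { "9999" }, Property1: "2"),
    (ObjArray: (string[])null, Property1: "0"),
};

// Contains: true if any element of the array equals "9999" (case-insensitive here).
var matches = mainList
    .Where(r => r.ObjArray != null && r.ObjArray.Contains("9999", StringComparer.OrdinalIgnoreCase))
    .ToList();

// Any with an explicit predicate; ?. plus == true treats a null array as "no match".
var first = mainList.FirstOrDefault(r => r.ObjArray?.Any(o => o == "9999") == true);

Console.WriteLine(matches.Count);   // 1
Console.WriteLine(first.Property1); // 2
```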
Background
What I'm Trying to Do
I have a list of vehicles.
I have an API (WebAPI v2) that takes in a list of filters for a make and models
A filter consists of one make and zero or more models (e.g. "Honda" and ["Civic", "Accord"]).
If a filter is passed in with a make and no models, I want it to match all models for that make.
If a filter is passed in with a make and models, I want it to match only those models for that make.
The Filter Object I'm using
public class MakeModelFilter : IMakeModelFilter
{
public string Make { get; set; }
public List<string> Models { get; set; }
}
What the entire API Call Looks Like
{
"MakeModelFilters": [
{"Make": "BMW", "Models": ["X3", "X5"]}
],
"TypeFilter": [],
"GenericColorFilter": [],
"FeaturesFilter": [],
"MaxMileage" : 100000,
"PriceRange": {"Min": 1, "Max": 1000000},
"SearchText": ""
}
The portion I'm concerned with is the MakeModelFilters list (the rest currently works as designed).
How I'm currently obtaining search results:
var vehicles = _esClient.Search<Vehicle>(s => s
.From(0).Size(10000)
.Query(q => q
.Filtered(fq => fq
.Filter(ff => ff
.Bool(b => b
.Must(m=> m.And(
m.Or(makeModelFilterList.ToArray()),
m.Or(featureFilters.ToArray()),
m.Or(typeFilters.ToArray()),
priceRangeFilter,
mileageFilter))
)
)
.Query(qq => qq
.QueryString(qs => qs.Query(criteria.SearchText))
)
)
)
);
The Problem
No matter how I structure the filter, it seems to filter out all documents -- not in our best interest. :) Something in my boolean logic is wrong.
Where I think the problem lies
The list of make and model filters that I OR together is generated by this method:
private List<FilterContainer> GenerateMakeModelFilter(List<MakeModelFilter> makeModelFilters)
{
var filterList = new List<FilterContainer>();
foreach (var filter in makeModelFilters)
{
filterList.Add(GenerateMakeModelFilter(filter));
}
return filterList;
}
This method calls the individual method to generate a bool for each make/model filter I have.
What I think the problem method is
The below method, as far as I'm aware, does the following:
If no make is passed in, throw exception
If only a make is passed in, return a bool for only that make.
If a make and models are passed in, return a bool of the make filter AND an OR of all the model terms, e.g. Make:BMW AND (model:X3 OR model:X5).
Code is below:
private FilterContainer GenerateMakeModelFilter(MakeModelFilter makeModelFilter)
{
if (string.IsNullOrWhiteSpace(makeModelFilter.Make)) { throw new ArgumentNullException(nameof(makeModelFilter));}
var makeFilter = new TermFilter { Field = Property.Path<Vehicle>(it => it.Make), Value = makeModelFilter.Make };
var boolMake = new BoolFilter { Must = new List<FilterContainer> { makeFilter } };
var modelFilters = GenerateFilterList(Property.Path<Vehicle>(it => it.Model), makeModelFilter.Models);
if (!modelFilters.Any())
{
// If it has a make but no model, generate boolFilter make only.
return boolMake;
}
var orModels = new OrFilter {Filters = modelFilters};
var boolModels = new BoolFilter {Must = new List<FilterContainer> {orModels}};
var boolMakeAndModels = new AndFilter {Filters = new List<FilterContainer> {boolMake, boolModels}};
return new BoolFilter {Must = new List<FilterContainer> {boolMakeAndModels}};
}
FYI, GenerateFilterList just creates a list of Term filters and returns the list.
FYI: Generated ElasticSearch JSON
This might be a clue to where I'm going wrong (though it's huge). I've just been staring at it so long that I can't see it I think.
{
"from": 0,
"size": 10000,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"and": {
"filters": [
{
"or": {
"filters": [
{
"bool": {
"must": [
{
"and": {
"filters": [
{
"bool": {
"must": [
{
"term": {
"make": "BMW"
}
}
]
}
},
{
"bool": {
"must": [
{
"or": {
"filters": [
{
"term": {
"model": "x3"
}
},
{
"term": {
"model": "x5"
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
},
{ },
{ },
{
"range": {
"sellingPriceUSD": {
"lte": "1000000",
"gte": "1"
}
}
},
{
"range": {
"miles": {
"lte": "100000"
}
}
}
]
}
}
]
}
}
}
}
}
Refactor 1: Move more Towards Bitwise operations
Per Martijn's answer and Zachary's post that he references, I've updated my GenerateFilterList to return a concatenated filterContainer:
private FilterContainer GenerateFilterList(PropertyPathMarker path, List<string> filter)
{
if (filter == null || filter.Count <= 0){ return null; }
FilterContainer returnFilter = null;
foreach (var aFilter in filter)
{
returnFilter |= new TermFilter {Field = path, Value = aFilter.ToLowerInvariant()};
}
return returnFilter;
}
And then for my GenerateMakeModelFilter, I perform an "and" against the "model filters", which should be a bitwise or based on the above code:
private FilterContainer GenerateMakeModelFilter(MakeModelFilter makeModelFilter)
{
if (string.IsNullOrWhiteSpace(makeModelFilter.Make)) { throw new ArgumentNullException(nameof(makeModelFilter)); }
var makeFilter = new TermFilter { Field = Property.Path<Vehicle>(it => it.Make), Value = makeModelFilter.Make };
var modelFilters = GenerateFilterList(Property.Path<Vehicle>(it => it.Model), makeModelFilter.Models);
return makeFilter && modelFilters;
}
This shortens the part that retrieves the query:
QueryContainer textQuery = new QueryStringQuery() {Query = criteria.SearchText };
FilterContainer boolFilter = makeModelFilter || featureFilter || typeFilter || priceRangeFilter || mileageFilter;
var vehicles = _esClient.Search<Vehicle>(s => s
.From(0).Size(10000) //TODO: Extract this into a constant or setting in case the inventory grows to 10k+. This prevents it from paging.
.Query(q => q
.Filtered(fq => fq
.Filter(filter => filter.Bool(bf => bf.Must(boolFilter)))
.Query(qq => textQuery)
)
)
);
return vehicles.Documents.ToList<IVehicle>();
...but I still have no documents returned. What the heck am I missing? If I have a Make of Honda with Models of "Civic" and "Accord", and a make of "BMW" with no models, I should receive all vehicles with honda + civic || honda + accord || bmw + (any model). I'll keep at it.
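The matching rule described in the question (make AND any listed model, a make-only filter matching every model of that make, and all filters OR-ed together) can be checked in memory before translating it to NEST. A minimal sketch over hypothetical (Make, Models) tuples; this is not NEST code, and the case-insensitive comparison is an assumption chosen to sidestep casing differences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical filters shaped like MakeModelFilter: one make, zero or more models.
var filters = new List<(string Make, List<string> Models)>
{
    ("Honda", new List<string> { "Civic", "Accord" }),
    ("BMW", new List<string>()), // no models: every BMW should match
};

// OR across filters; within one filter: make AND (no models OR model is listed).
bool Matches(string make, string model) =>
    filters.Any(f =>
        string.Equals(f.Make, make, StringComparison.OrdinalIgnoreCase) &&
        (f.Models.Count == 0 ||
         f.Models.Contains(model, StringComparer.OrdinalIgnoreCase)));

Console.WriteLine(Matches("Honda", "Civic")); // True
Console.WriteLine(Matches("Honda", "Fit"));   // False
Console.WriteLine(Matches("BMW", "X5"));      // True
```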
The and, or and not filters might not be doing what you want. They are special filter constructs that perform better when combining filters that do not operate on bitsets. A must-read on this topic:
https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets
Knowing when to use and/or/not filters vs bool filters can be quite confusing and with Elasticsearch 2.0 you can use the bool filter in ALL contexts and it will know how to best execute the filters/queries in its clauses. No more need for you to hint!
Furthermore, although the bool filter/query is named bool, it does a unary bool, whereas you might expect it to be a binary bool.
This is why the bool clauses are must/should/must_not vs and/or/not.
In NEST, if you use the &&, || and ! operators combined with parentheses, we will compose one or more bool queries so that they act in the binary bool fashion you write down in C#.
e.g:
.Query(q =>
    (q.Term("language", "php")
        && !q.Term("name", "Elastica")
    )
    ||
    q.Term("name", "NEST")
)
If you need a more dynamic list, you can use the assignment operators |= and &=:
private FilterContainer GenerateMakeModelFilter(List<MakeModelFilter> makeModelFilters)
{
    FilterContainer filter = null;
    foreach (var makeModelFilter in makeModelFilters)
    {
        // |= ORs each make/model filter into the accumulated filter
        filter |= GenerateMakeModelFilter(makeModelFilter);
    }
    return filter;
}
Similarly, if you refactor GenerateMakeModelFilter to take advantage of the C# boolean operator overloads, you'll end up with a query that is easier to read and debug, both in C# and in the query that gets sent to Elasticsearch.
Our documentation goes into this in more detail: http://nest.azurewebsites.net/nest/writing-queries.html
UPDATE
Awesome refactor! Now we can focus on mappings in Elasticsearch. When you index a JSON property, it goes through an analysis chain that takes the single string and tries to make one or more terms out of it, which are stored in Lucene's inverted index.
By default, Elasticsearch analyzes all string fields using the standard analyzer.
In your case, BMW goes through the standard analyzer, which splits text into words (on the word boundaries defined by Unicode Standard Annex #29, to be exact) and lowercases them.
So the term in the inverted index is bmw. In Elasticsearch, some queries are also analyzed at query time, so e.g. a match query for BMW is also analyzed and transformed to bmw before the inverted index is consulted, and will therefore find documents regardless of how BMW is cased at query time.
The term query/filter that you are using is not analyzed at query time, so it will try to find BMW in the inverted index, where the inverted index only has bmw. This is great if you only want exact term matches. If you set up your mapping so that a field is not analyzed, you could for instance do exact matches on New York without worrying that it's actually stored as two separate terms, new and york, and inadvertently also getting results for New New York.
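To make the analyzed vs not-analyzed distinction concrete, here is a toy in-memory inverted index. It only models the behavior described above and is not Elasticsearch code: the hypothetical Analyze helper splits on non-letter/non-digit characters and lowercases, a simplification of the standard analyzer, and the documents are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy analyzer: split on runs of non-letter/non-digit characters, then lowercase.
// A simplification of Elasticsearch's standard analyzer, for illustration only.
static IEnumerable<string> Analyze(string input) =>
    Regex.Split(input, @"[^\p{L}\p{N}]+")
        .Where(t => t.Length > 0)
        .Select(t => t.ToLowerInvariant());

// Hypothetical documents: id -> value of an analyzed string field.
var docs = new Dictionary<int, string> { [1] = "BMW", [2] = "Honda", [3] = "New York" };

// Inverted index built at index time: term -> ids of the documents containing it.
var index = new Dictionary<string, HashSet<int>>();
foreach (var (id, value) in docs)
{
    foreach (var term in Analyze(value))
    {
        if (!index.TryGetValue(term, out var postings))
            index[term] = postings = new HashSet<int>();
        postings.Add(id);
    }
}

// term-style lookup: the input is NOT analyzed, so it must equal an indexed term exactly.
IEnumerable<int> TermQuery(string term) =>
    index.TryGetValue(term, out var postings) ? postings : Enumerable.Empty<int>();

// match-style lookup: the input goes through the same analyzer before the lookup.
IEnumerable<int> MatchQuery(string text) =>
    Analyze(text).SelectMany(t => TermQuery(t)).Distinct();

Console.WriteLine(TermQuery("BMW").Count());  // 0, only "bmw" was indexed
Console.WriteLine(MatchQuery("BMW").Count()); // 1
```

The term lookup for BMW finds nothing because only bmw was indexed, while the match-style lookup analyzes its input first and finds the document.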