I'm working on an application where users can tag "components" as part of the workflow. In many cases, they end up with several tags that are synonyms of each other. They would like these to be grouped together so that when one tag is added to a component, the rest of the tags in the group can be added as well.
I decided to break up tag groups into two-way relationships between each pair of tags in the group. So if a group has tags 1 and 2, there are two records that look like this:
ID TagID RelatedTagID
1 1 2
2 2 1
Basically, a group is represented as the Cartesian product of its tags with themselves, minus the self-pairs. Extend that to 3 tags:
ID Name
1 MM
2 Managed Maintenance
3 MSP
Our relationships look like this:
ID TagID RelatedTagID
1 1 2
2 2 1
3 1 3
4 3 1
5 2 3
6 3 2
I have a couple of methods to group them together, but they're less than stellar. First, I wrote a view that lists each tag along with the list of tags in its group:
SELECT
TagKey AS ID,
STUFF
((SELECT ',' + cast(RelatedTagKey AS nvarchar)
FROM RelatedTags rt
WHERE rt.TagKey = t.TagKey
FOR XML PATH('')), 1, 1, '') AS RelatedTagKeys
FROM (
SELECT DISTINCT TagKey
FROM RelatedTags
) t
The problem with this is that each group appears in the results as many times as there are tags in it, which I wasn't able to think of a way to work around in a single query. So it gives me back:
ID RelatedTagKeys
1 2,3
2 1,3
3 1,2
Then in my back-end, I discard all groups that contain a key that occurs in another group. Tags aren't being added to multiple groups, so that works, but I don't like how much extraneous data I'm pulling down.
The second solution I came up with was this LINQ query. The key used to group the tags is a listing of the group itself. This is probably much worse than I originally thought.
from t in Tags.ToList()
where t.RelatedTags.Any()
group t by
string.Join(",", (new List<int> { t.ID })
.Concat(t.RelatedTags.Select(i => i.Tag.ID))
.OrderBy(i => i))
into g
select g.ToList()
I really hate grouping by the result of calling string.Join, but when I tried just grouping by the list of keys, it didn't group properly, putting each tag in a group by itself. Also, the SQL it generated is monstrous. I'm not going to paste it here, but LINQPad shows that it generates about 12,000 lines of individual SELECT statements on my test database (we have 1562 tags and 67 records in RelatedTags).
These solutions work, but they're pretty naive and inefficient. I don't know where else to go with this, though. Any ideas?
I suppose working with your data gets easier if you have a groupId for each of your tags, such that tags that are related share the same value of groupId.
To explain what I mean, I added a second set of related tags to your dataset:
INSERT INTO tags ([ID], [Name]) VALUES
(1, 'MM'),
(2, 'Managed Maintenance'),
(3, 'MSP'),
(4, 'UM'),
(5, 'Unmanaged Maintenance');
and
INSERT INTO relatedTags ([ID], [TagID], [RelatedTagID]) VALUES
(1, 1, 2),
(2, 2, 1),
(3, 1, 3),
(4, 3, 1),
(5, 2, 3),
(6, 3, 2),
(7, 4, 5),
(8, 5, 4);
Then, a table holding the following information should make a lot of other things easier (I first explain the content of the table and then how to get it using a query):
tagId | groupId
------|--------
1 | 1
2 | 1
3 | 1
4 | 4
5 | 4
The data comprises two groups of related tags, i.e. {1,2,3} and {4,5}. The table above therefore marks tags belonging to the same group with the same groupId, i.e. 1 for {1,2,3} and 4 for {4,5}.
To achieve such a view/table, you could use the following query:
with rt as
( (select r2.tagId, r2.relatedTagId
from relatedTags r1 join relatedTags r2 on r1.tagId = r2.relatedTagId)
union
(select r3.tagId, r3.tagId as relatedTagId from relatedTags r3)
)
select rt.tagId, min(rt.relatedTagId) as groupId from rt
group by tagId
Of course, instead of introducing a new table or view, you could also extend your primary tags table with a groupId column.
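For comparison, the same "smallest ID in the group becomes the groupId" idea can be sketched in memory with LINQ to Objects. This is only a sketch: the class and method names (TagGrouping, AssignGroupIds, ToGroups) are made up for illustration, and it assumes, as in the question, that every group is stored as all pairwise relations in both directions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagGrouping
{
    // Maps each tag ID to a group ID: the smallest ID among the tag itself
    // and its directly related tags. Assumption: every group is stored as a
    // complete set of pairwise relations, so direct neighbours are enough.
    public static Dictionary<int, int> AssignGroupIds(
        IEnumerable<(int TagId, int RelatedTagId)> relations)
    {
        return relations
            .GroupBy(r => r.TagId)
            .ToDictionary(
                g => g.Key,
                g => Math.Min(g.Key, g.Min(r => r.RelatedTagId)));
    }

    // Collects the members of each group, keyed by group ID.
    public static Dictionary<int, List<int>> ToGroups(Dictionary<int, int> groupIds)
    {
        return groupIds
            .GroupBy(kv => kv.Value, kv => kv.Key)
            .ToDictionary(g => g.Key, g => g.OrderBy(id => id).ToList());
    }
}
```

Feeding it the eight relatedTags rows from above should give one entry per group, so each group is returned once instead of once per member.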
Hope this helps.
I really don't understand the relationships; you didn't explain them very well. But I somehow got the same results. Not sure if I did it right.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication41
{
class Program
{
static void Main(string[] args)
{
Data.data = new List<Data>() {
new Data() { ID = 1, TagID = 1, RelatedTagID = 2},
new Data() { ID = 2, TagID = 2, RelatedTagID = 1},
new Data() { ID = 3, TagID = 1, RelatedTagID = 3},
new Data() { ID = 4, TagID = 3, RelatedTagID = 1},
new Data() { ID = 5, TagID = 2, RelatedTagID = 3},
new Data() { ID = 6, TagID = 3, RelatedTagID = 2}
};
var results = Data.data.GroupBy(x => x.RelatedTagID)
.OrderBy(x => x.Key)
.Select(x => new {
ID = x.Key,
RelatedTagKeys = x.Select(y => y.TagID).ToList()
}).ToList();
foreach (var result in results)
{
Console.WriteLine("ID = '{0}', RelatedTagKeys = '{1}'", result.ID, string.Join(",",result.RelatedTagKeys.Select(x => x.ToString())));
}
Console.ReadLine();
}
}
public class Data
{
public static List<Data> data { get; set; }
public int ID { get; set; }
public int TagID { get; set; }
public int RelatedTagID { get; set; }
}
}
I have a problem using the GroupBy operator in LINQ. I am using EF on .NET Core, so I am not sure whether there is a bug in it or whether I am doing something wrong. I am trying to group the records of a table. My grouping key is a non-anonymous type whose values come from navigating one level down the database structure.
In the following example, I am interested in grouping records present on the ResultTable, by the values given by the TypeId and OtherId properties.
ResultTable
- ResultId
- DetailId (FK)
DetailTable
- DetailId
- TypeId (maps to enum, not null)
- OtherId (FK, null)
My code to group is the following:
private IQueryable<IGrouping<Grouper, Result>> GetGroupedResults()
{
var results = MyContext.ResultSet.Include(r => r.Detail);
var groupedResults = results.GroupBy(r => new Grouper(r.Detail.TypeId, r.Detail.OtherId ?? 0));
return groupedResults ;
}
The definition of my grouper entity is as follows:
public class Grouper
{
public Grouper(Type type, int otherId)
{
Type= type;
OtherId = otherId;
}
public Type Type{ get; } // this is an enum
public int OtherId { get; }
public override bool Equals(object obj)
{
var p = obj as Grouper;
if (p == null)
{
return false;
}
return (Type == p.Type ) && (OtherId == p.OtherId);
}
public override int GetHashCode()
{
return (int)Type ;
}
}
What I would expect this to do is the following. Let's say that I have the following records:
ResultTable
ResultId: 1, DetailId: 1
ResultId: 2, DetailId: 2
ResultId: 3, DetailId: 3
ResultId: 4, DetailId: 4
DetailTable
DetailId: 1, TypeId: 1, OtherId: 1
DetailId: 2, TypeId: 1, OtherId: 1
DetailId: 3, TypeId: 2, OtherId: NULL
DetailId: 4, TypeId: 1, OtherId: 1
Considering that data I would expect two groups with the following keys and values
First group with key Grouper(Type: 1, OtherId: 1), values ResultId(1, 2, 4)
Second group with key Grouper(Type: 2, OtherId: 0), values ResultId(3)
However I am not getting this. I am instead getting three groups with keys and values as follows:
First group with key Grouper(Type: 1, OtherId: 1), values ResultId(1, 2, 4)
Second group with key Grouper(Type: 2, OtherId: 0), values ResultId(3)
Third group with key Grouper(Type: 1, OtherId: 1), values ResultId(4)
It seems as if the GroupBy operation were only capable of grouping records with consecutive IDs. Why is this happening?
Greetings
Luis.
EDIT: Using an anonymous type to perform the grouping results in correct groups. Could the problem be the way I am defining my Grouper class?
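For reference, the anonymous-type grouping mentioned in the EDIT can be sketched against in-memory data as follows. The entity shapes (Result, Detail, DetailType) and the helper name GroupResultIds are assumptions made for the sketch, and it runs as LINQ to Objects, so it says nothing about how EF Core translates or evaluates the query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities described in the question.
public enum DetailType { First = 1, Second = 2 }

public class Detail
{
    public int DetailId { get; set; }
    public DetailType TypeId { get; set; }
    public int? OtherId { get; set; }
}

public class Result
{
    public int ResultId { get; set; }
    public Detail Detail { get; set; }
}

public static class ResultGrouping
{
    // Groups results by (TypeId, OtherId ?? 0). Anonymous types compare
    // member by member, so equal key values land in the same group.
    public static Dictionary<(DetailType Type, int OtherId), List<int>> GroupResultIds(
        IEnumerable<Result> results)
    {
        return results
            .GroupBy(r => new { Type = r.Detail.TypeId, OtherId = r.Detail.OtherId ?? 0 })
            .ToDictionary(
                g => (g.Key.Type, g.Key.OtherId),
                g => g.Select(r => r.ResultId).ToList());
    }
}
```

With the four sample records above, this should produce the two expected groups: results 1, 2 and 4 under (Type 1, OtherId 1), and result 3 under (Type 2, OtherId 0).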
The data is as follows:
ID  Title         Category  About           Link        CategoryID
1   The Matrix    Sci-Fi    Text goes here  http://...  1
2   The Simpsons  Cartoon   Text goes here  http://...  2
3   Avengers      Action    Text goes here  http://...  3
4   The Matrix    Sci-Fi    Text goes here  http://...  1
5   The One       Sci-Fi    Text goes here  http://...  1
6   The Hobbit    Sci-Fi    Text goes here  http://...  1
I have a checkbox list containing the categories. The problem is that if the user selects 'Action' and 'Sci-Fi' as the categories to display, The Matrix will be displayed twice.
This is my try for getting unique rows in SQL Query.
select distinct title, about, link from mytable
inner join tableCategories on categoryID = tableCategoriesID
group by title, about, link
Using the LINQ,
(from table in movieTables
join x in categoryIDList
on table.CategoryID equals x
select table).Distinct()
Note that the categories are in a separate table linked by the categoryID.
Need help displaying unique or distinct rows in LINQ.
You can happily select your result into a list of whatever you want:
var v = from entry in tables
where matching_logic_here
select new {id = some_id, val=some_value};
and then you can run your distinct on that list (well, a ToList() on the above will make it one), based on your needs.
The following should illustrate what I mean (just paste it into LINQPad; if you're using VS, get rid of the .Dump() calls):
void Main()
{
var input = new List<mock_entry> {
new mock_entry {id = 1, name="The Matrix", cat= "Sci-Fi"},
new mock_entry {id = 2, name="The Simpsons" ,cat= "Cartoon"},
new mock_entry {id = 3, name="Avengers" ,cat= "Action"},
new mock_entry {id = 4, name="The Matrix", cat= "Sci-Fi"},
new mock_entry {id = 5, name="The One" ,cat= "Sci-Fi"},
new mock_entry {id = 6, name="The Hobbit",cat= "Sci-Fi"},
};
var v = input.Where(e=>e.cat == "Action" || e.cat =="Sci-Fi")
.Dump()
.Select(e => new {n = e.name, c =e.cat})
.Dump()
;
var d = v.Distinct()
.Dump()
;
}
// Define other methods and classes here
public struct mock_entry {
public int id {get;set;}
public string name {get;set;}
public string cat {get;set;}
}
Another option would be to use DistinctBy from MoreLINQ, as suggested in this question.
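To show what a DistinctBy-style operator does, here is a minimal hand-written sketch. It is not MoreLINQ's implementation; the names DistinctByKey and Movie are made up for illustration (the helper is deliberately not called DistinctBy, to avoid clashing with library methods of that name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the sample data.
public class Movie
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

public static class DistinctExtensions
{
    // Yields the first element seen for each key and skips later elements
    // whose key has already been seen.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
```

Usage would look like movies.Where(m => m.Category == "Action" || m.Category == "Sci-Fi").DistinctByKey(m => m.Name), which keeps the whole row, including its id, for the first occurrence of each title.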
Edit:
Even simpler, you can use GroupBy, and just select the first entry (you'll lose the id though, but up to you).
Here's an example that will work with the above:
var v = input.GroupBy (i => i.name)
.Select(e => e.First ())
.Dump()
.Where(e=>e.cat == "Action" || e.cat =="Sci-Fi")
.Dump()
;
will yield:
1 The Matrix Sci-Fi
3 Avengers Action
5 The One Sci-Fi
6 The Hobbit Sci-Fi
I am trying to count the VALUEs corresponding to a List<int[]> using Linq to Entity Framework.
I have a List<int[]> where each int[] in the List is of length 2.
I have a DB table VALUES which contains 3 columns, called ID, PARENT and VALUE, where each int[] in the List may correspond to a record in the table, index 0 being ID and index 1 being PARENT. Note that some arrays likely do not correspond to any existing record in the table.
Each combination of ID and PARENT corresponds to multiple DB records, with different VALUEs.
Several points that are important to note:
One of the problems is that I can't rely on ID alone - each value is defined/located according to both the ID and PARENT.
None of the int arrays repeat, though the value in each index may appear in several arrays, e.g.
List<int[]> myList = new List<int[]>();
myList.Add(new int[]{2, 1});
myList.Add(new int[]{3, 1}); //Notice - same "PARENT"
myList.Add(new int[]{4, 1}); //Notice - same "PARENT"
myList.Add(new int[]{3, 1}); //!!!! Cannot exist - already in the List
I can't seem to figure out how to request all of the VALUEs from the VALUES table that correspond to the ID, PARENT pairs in the List<int[]>.
I've tried several variations but keep arriving at the pitfall of attempting to compare an array in a LINQ statement... I can't seem to crack it without loading substantially more information than I actually need.
Probably the closest I've gotten is with the following line:
var myList = new List<int[]>();
// ... fill the list ...
var res = myContext.VALUES.Where(v => myList.Any(listItem => listItem[0] == v.ID && listItem[1] == v.PARENT));
Of course, this can't work, because the LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities.
@chris huber
I tried it out but it was unsuccessful.
2 things:
Where you created "myValues" I have a DB table entity, not a List.
Due to point number 1, I am using LINQ to Entities, as opposed to LINQ to Objects.
My code then comes to something like this:
var q2 = from value in myContext.VALUES where myList.Select(x => new { ID = x.ElementAt(0), Parent = x.ElementAt(1) }).Contains(new { ID = value.ID, Parent = value.PARENT }) select value;
This returns the following error message when run:
LINQ to Entities does not recognize the method 'Int32 ElementAt[Int32](System.Collections.Generic.IEnumerable`1[System.Int32],Int32)' method, and this method cannot be translated into a store expression.
@Ovidiu
I attempted your solution as well but the same problem as above:
As I am using LINQ to Entities, there are simply certain things that cannot be performed. In this case, the ToString() method is "not recognized". Removing the ToString() calls and attempting to simply write Int32 + "|" + Int32 gives me a whole other error about LINQ to Entities not being able to cast an Int32 to Object.
Use the following LINQ expression:
List<int[]> myList = new List<int[]>();
myList.Add(new int[] { 2, 1 });
myList.Add(new int[] { 3, 1 }); //Notice - same "PARENT"
myList.Add(new int[] { 4, 1 }); //Notice - same "PARENT"
myList.Add(new int[] { 3, 1 });
List<int[]> myValues = new List<int[]>();
myValues.Add(new int[] { 2, 1 , 1});
myValues.Add(new int[] { 3, 1 , 2}); //Notice - same "PARENT"
myValues.Add(new int[] { 4, 1 , 3}); //Notice - same "PARENT"
myValues.Add(new int[] { 3, 1, 4 });
myValues.Add(new int[] { 3, 2, 4 });
var q2 = from value in myValues where myList.Select(x => new { ID = x.ElementAt(0), Parent = x.ElementAt(1) }).Contains(new { ID = value.ElementAt(0), Parent = value.ElementAt(1) }) select value;
var list = q2.ToList();
You could use this workaround: create a new List<string> from your List<int[]> and compare the values in your table with the new list.
I didn't test this with EF, but it might work
List<string> strList = myList.Select(x => x[0].ToString() + "|" + x[1].ToString()).ToList();
var res = myContext.VALUES.Where(x => strList.Contains(SqlFunctions.StringConvert((double)x.ID).Trim() + "|" + SqlFunctions.StringConvert((double)x.PARENT).Trim()));
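A different workaround, sketched here under assumptions, avoids string building altogether: narrow the rows on the server by ID only, using Contains on a plain list of ints, and then match the exact (ID, PARENT) pairs in memory with a HashSet. The names ValueRow, PairFilter and MatchPairs are hypothetical, and the sketch is written against an IQueryable so it can be tried with an in-memory list via AsQueryable(); it has not been run against EF. The trade-off is that rows whose ID matches but whose PARENT does not are still fetched before being discarded.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity mirroring the VALUES table.
public class ValueRow
{
    public int ID { get; set; }
    public int PARENT { get; set; }
    public int VALUE { get; set; }
}

public static class PairFilter
{
    public static List<ValueRow> MatchPairs(IQueryable<ValueRow> values, List<int[]> pairs)
    {
        // Plain list of ints: the only thing the query provider has to handle.
        var ids = pairs.Select(p => p[0]).Distinct().ToList();

        // Exact (ID, PARENT) pairs, checked in memory after the rows are fetched.
        var wanted = new HashSet<(int, int)>(pairs.Select(p => (p[0], p[1])));

        return values
            .Where(v => ids.Contains(v.ID))                 // coarse filter, meant to run in the database
            .AsEnumerable()                                 // switch to LINQ to Objects
            .Where(v => wanted.Contains((v.ID, v.PARENT)))  // exact pair match
            .ToList();
    }
}
```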
I have a table like this:
ID Title Parentid
1 Level1 0
2 Level2 1
3 Level3 2
4 Level4 1
I want output in hierarchy model according to the parentid ,Id relationship as
Level1
->Level2->Level 3
-> Level4.
So far I am only able to achieve this:
level1
/\
level2 level4.
Here I am not getting level 3.
But I want the output as shown in the first example, using C#.
(Untested) Try:
;with RCTE as
(select id, cast(title as nvarchar(max)) full_path from MyTable where ParentID = 0
union all
select m.id, r.full_path + '->' + m.title full_path
from MyTable m, RCTE r
where m.parentid = r.id)
select full_path from RCTE
Are all the parents defined before the children?
If so, you can use a Dictionary(int, List(Item)) (sorry about the parentheses, can't seem to get the angle brackets to work) where, say,
public class Item {
public int Id { get; set;}
public int ParentId { get; set;}
public string Title {get; set;}
}
IDictionary<int, List<Item>> CreateTree(IEnumerable<Item> items){
var ret = new Dictionary<int, List<Item>>();
foreach (var item in items) {
if (!ret.ContainsKey(item.ParentId)) {
ret.Add(item.ParentId, new List<Item>());
}
ret[item.ParentId].Add(item);
}
return ret;
}
This will give (for the above data)
0 => level1
1 => level2, level4
2 => level3
If the parent ids are not guaranteed to come before the child ids, then you need to add some tweaking to allow for orphans and then process them at the end.
Hope this helps,
Alan.
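Building on the parent-to-children dictionary idea above, the indented hierarchy from the question could be produced with a lookup plus a short recursive walk. This is a sketch only: TreeItem, TreePrinter and Render are names invented here, root rows are assumed to have Parentid 0 as in the sample table, and the data is assumed to contain no cycles.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the ID / Title / Parentid table.
public class TreeItem
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public string Title { get; set; }
}

public static class TreePrinter
{
    // Returns one line per node, indented by two spaces per level.
    public static List<string> Render(IEnumerable<TreeItem> items, int rootParentId = 0)
    {
        // Lookup from parent ID to its children; a missing key yields an empty sequence.
        var childrenByParent = items.ToLookup(i => i.ParentId);
        var lines = new List<string>();

        void Visit(int parentId, int depth)
        {
            foreach (var child in childrenByParent[parentId])
            {
                lines.Add(new string(' ', depth * 2) + child.Title);
                Visit(child.Id, depth + 1); // descend before moving to the next sibling
            }
        }

        Visit(rootParentId, 0);
        return lines;
    }
}
```

Because the lookup is built from the whole list before the walk starts, this version does not depend on parents appearing before their children in the input.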
The recursion should be done inside of SQL Server using a Common Table Expression query (CTE). One query should be able to give the results and the "levels" which can then be parsed in C# without the need for recursion in code.
Here's a link with examples: (Mark's example also applies)
http://msdn.microsoft.com/en-us/library/ms186243.aspx
Say that I have the following CTE that returns the level of some tree data (adjacency model) that I have (taken from Hierarchical data in Linq - options and performance):
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(
SELECT id, parent_id, data, 0 AS lvl
FROM dbo.hierarchical_table
WHERE (parent_id IS NULL)
UNION ALL
SELECT t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
FROM dbo.hierarchical_table AS t1
INNER JOIN hierarchy_cte AS h ON t1.parent_id = h.id
)
SELECT id, parent_id, data, lvl
FROM hierarchy_cte AS result
I was wondering if there would be any performance increase by doing the recursion in C# instead of SQL. Can anyone show me how to perform the same work that the CTE does with a recursive C# function, assuming I have an IQueryable<Tree> where Tree is an entity representing an entry in the hierarchical table? Something along the lines of:
public void RecurseTree(IQueryable<Tree> tree, Guid userId, Guid parentId, int level)
{
...
currentNode.level = x
...
RecurseTree(tree... ,level + 1)
}
It would be cool to see whether this is easy to do using a lambda expression.
Recursion in SQL Server is horrendously slow by comparison, but it does work.
I'd have to say that T-SQL is somewhat limited, but it was never meant to do all those operations in the first place. I don't believe there is any way you can make this happen with an IQueryable if you intend to run this against your SQL Server instance, but you can do it in memory on the machine running the code using LINQ-to-Objects in a relatively compact manner.
Here's one way to do that:
class TreeNode
{
public int Id;
public int? ParentId;
}
static void Main(string[] args)
{
var list = new List<TreeNode>{
new TreeNode{ Id = 1 },
new TreeNode{ Id = 4, ParentId = 1 },
new TreeNode{ Id = 5, ParentId = 1 },
new TreeNode{ Id = 6, ParentId = 1 },
new TreeNode{ Id = 2 },
new TreeNode{ Id = 7, ParentId= 2 },
new TreeNode{ Id = 8, ParentId= 7 },
new TreeNode{ Id = 3 },
};
foreach (var item in Level(list, null, 0))
{
Console.WriteLine("Id={0}, Level={1}", item.Key, item.Value);
}
}
private static IEnumerable<KeyValuePair<int,int>> Level(List<TreeNode> list, int? parentId, int lvl)
{
return list
.Where(x => x.ParentId == parentId)
.SelectMany(x =>
new[] { new KeyValuePair<int, int>(x.Id, lvl) }.Concat(Level(list, x.Id, lvl + 1))
);
}
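The recursive Level method above rescans the whole list for every node. A non-recursive variation, offered only as a sketch, builds a parent-to-children lookup once and walks it breadth-first with a queue. The names TreeLevels and Compute are hypothetical, nodes are passed as (Id, ParentId) tuples, and root nodes are assumed to have a null ParentId as in the example list.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TreeLevels
{
    // Returns a map from node ID to its depth, with roots at level 0.
    public static Dictionary<int, int> Compute(IEnumerable<(int Id, int? ParentId)> nodes)
    {
        var list = nodes.ToList();

        // Parent ID -> child IDs, built once up front.
        var children = list
            .Where(n => n.ParentId.HasValue)
            .ToLookup(n => n.ParentId.Value, n => n.Id);

        var levels = new Dictionary<int, int>();
        var queue = new Queue<int>();

        // Seed the queue with the roots.
        foreach (var root in list.Where(n => !n.ParentId.HasValue))
        {
            levels[root.Id] = 0;
            queue.Enqueue(root.Id);
        }

        // Breadth-first walk: each child sits one level below its parent.
        while (queue.Count > 0)
        {
            var id = queue.Dequeue();
            foreach (var childId in children[id])
            {
                levels[childId] = levels[id] + 1;
                queue.Enqueue(childId);
            }
        }

        return levels;
    }
}
```

Nodes that cannot be reached from any root (orphans) are simply left out of the result.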
Genuinely recursive lambdas (and by inference, Expressions) are technically possible but pretty much insane. I would also expect any parser (L2S, EF, etc) except maybe LINQ-to-Objects to just go crazy trying to unpick that.
In short: you would do well to consider this an unsupported mechanism for Expression.
Finally, note that just because you are writing an Expression doesn't mean you are executing it in C# - in fact, quite probably the opposite: if you are actively writing an Expression (rather than a delegate or procedural code) I assume that it is going to a parser (unless you used .AsQueryable() to push it to LINQ-to-Objects).