Find Top X most similar users, based on Y boolean attributes - c#

Let's say I have 5 users, each with 5 boolean attributes, which could look like this:
       | A B C D E
------------------
User 0 | 1 1 0 1 0
User 1 | 0 1 0 1 0
User 2 | 0 0 1 0 1
User 3 | 1 1 0 0 0
User 4 | 0 0 0 1 0
Now, what would be the best approach to get a list of the top X users with the most "trues" in common? In the example above, the ranking should look like this:
Top 1: User 0 (most true attributes)
Top 2: Users 0 and 1 OR Users 0 and 3 (both pairs have 2 attributes in common)
Top 3: Users 0, 1 and 3
Top 4: Users 0, 1, 3 and 4
Top 5: Users 0, 1, 2, 3, 4
I know there are metrics and distance measures to tell how similar two users are, but I want a list of the most similar ones. Should I use some kind of clustering algorithm? If so, which one would consider multiple binary attributes, and how could I implement it (preferably in C#)?
Since I haven't taken any classes on data mining, the literature on this topic is kinda overwhelming, so any help is highly appreciated.

User mostTrueUser = Users
    .OrderByDescending(u => (u.A?1:0) + (u.B?1:0) + (u.C?1:0) + (u.D?1:0) + (u.E?1:0))
    .First();
var groups = Users.GroupBy(u => ((u.A && mostTrueUser.A)?1:0)
        +((u.B && mostTrueUser.B)?1:0)
        +((u.C && mostTrueUser.C)?1:0)
        +((u.D && mostTrueUser.D)?1:0)
        +((u.E && mostTrueUser.E)?1:0)
        ,u => u).OrderByDescending(g => g.Key);
foreach(var group in groups)
{
    Console.WriteLine("{0} // following have {0} 'true' in common with {1}",
        group.Key,
        mostTrueUser.ID);
    foreach(var g in group)
    {
        Console.WriteLine(" " + g.ID);
    }
}
This gives me the following:
3 // following have 3 'true' in common with 0
0
2 // following have 2 'true' in common with 0
1
3
1 // following have 1 'true' in common with 0
4
0 // following have 0 'true' in common with 0
2
Explanations
I used u.A?1:0 so true becomes 1 and false becomes 0.
I then got the User with most true using OrderByDescending([sum of trues]).
Then the GroupBy is used to group all Users on the number of true in common with the mostTrueUser.
Your ranking seems a little bit more complicated, but you can start with this to solve it.
I wrote a little tweak:
public class UserRank
{
    public User UserA{get;set;}
    public User UserB{get;set;}
    public int Compare{
        get{return ((UserA.A && UserB.A)?1:0)
            +((UserA.B && UserB.B)?1:0)
            +((UserA.C && UserB.C)?1:0)
            +((UserA.D && UserB.D)?1:0)
            +((UserA.E && UserB.E)?1:0);}
    }
}
and then:
List<UserRank> userRanks = new List<UserRank>();
for(int i=0;i<Users.Count;i++)
{
    for(int j=i;j<Users.Count;j++)
    {
        userRanks.Add(new UserRank
        {
            UserA = Users[i],
            UserB = Users[j]
        });
    }
}
var groups = userRanks.GroupBy(u => u.Compare, u => u).OrderByDescending(g => g.Key);
foreach(var group in groups)
{
    Console.WriteLine("{0} in common:",group.Key);
    foreach(var u in group)
    {
        Console.WriteLine(" {0}-{1}",u.UserA.ID,u.UserB.ID);
    }
}
gives me:
3 in common:
0-0
2 in common:
0-1
0-3
1-1
2-2
3-3
1 in common:
0-4
1-3
1-4
4-4
0 in common:
0-2
1-2
2-3
2-4
3-4
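For what it's worth, here is a rough sketch of the same pairwise idea for any number of attributes. It assumes each user is stored as a bool[]; that representation and the names SimilaritySketch, SharedTrues and RankPairs are made up for illustration and are not part of the code above. Unlike the loop above, it starts the inner loop at i + 1, so self-pairs such as 0-0 are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilaritySketch
{
    // Number of positions where both attribute arrays are true.
    public static int SharedTrues(bool[] a, bool[] b) =>
        a.Zip(b, (x, y) => x && y).Count(both => both);

    // All distinct pairs (i < j), ordered by shared trues, highest first.
    // OrderByDescending is stable, so ties keep their insertion order.
    public static List<(int A, int B, int Shared)> RankPairs(IReadOnlyList<bool[]> users)
    {
        var pairs = new List<(int A, int B, int Shared)>();
        for (int i = 0; i < users.Count; i++)
            for (int j = i + 1; j < users.Count; j++)
                pairs.Add((i, j, SharedTrues(users[i], users[j])));
        return pairs.OrderByDescending(p => p.Shared).ToList();
    }
}
```

With the five users from the question, the first two entries are the pairs 0-1 and 0-3, each with 2 attributes in common. Taking the first X entries of the list gives the X most similar pairs.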

Related

Jagged Array[][] find all selected elements in column

I have a jagged array:
int[][] loadData
a1
o1 | 3 1 5 4 3 3 1
o2 | 1 4 1 2 2 1 0
o3 | 4 4 5 4 4 3 1
o4 | 2 3 4 4 5 4 1
o5 | 3 3 5 2 5 5 1
o6 | 3 3 3 1 5 2 0
o7 | 2 5 3 5 1 2 1
o8 | 4 5 4 4 4 1 0 // this is my jagged array without o1 and a1 I use them for example
I want to find all elements in column a1 that have the value 3. I tried to adapt some code, but without success.
for example for 3 in column a1:
Dictionary<int, int?>[] matrix = new Dictionary<int, int?>[8];
matrix[0].Add(1, 3);
matrix[0].Add(5, 3);
matrix[0].Add(6, 3);
var x = Array.FindAll(loadData, a => Enumerable.Range(0, s)
    .Select(j => loadData[j][0]));
How to solve it?
The answer depends on what you mean by "find all."
If you want to count the rows that match, you can simply write:
var count = array.Count( a => a[0] == 3 );
If you want to output the row numbers, it's a little trickier, since you have to pass the row number through before you apply the Where portion, or else the original row number will be lost.
var indexes = array.Select
    (
        (a, i) =>
            new { RowNumber = i, Value = a[0] }
    )
    .Where( n => n.Value == 3 )
    .Select( r => r.RowNumber );
You could also just flatten the array:
var flatList = array.SelectMany
(
    // the lambda parameter must not reuse the name of the outer variable "array"
    (rowArray, row) =>
        rowArray.Select
        (
            (element, column) =>
                new { Row = row, Column = column, Value = element }
        )
);
...and then query it like a flat table:
var indexes = flatList.Where
    (
        element => element.Column == 0 && element.Value == 3
    )
    .Select( a => a.Row );
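Putting the row-number approach into one small helper might look like the sketch below. The class and method names (ColumnSearchSketch, RowsWhere) are hypothetical, and the length check is an extra assumption so that short rows in a jagged array are skipped rather than throwing.

```csharp
using System;
using System.Linq;

public static class ColumnSearchSketch
{
    // Zero-based indexes of the rows whose element in `column` equals `value`.
    public static int[] RowsWhere(int[][] data, int column, int value) =>
        data.Select((row, index) => new { row, index })
            // skip rows that are too short to have this column
            .Where(x => x.row.Length > column && x.row[column] == value)
            .Select(x => x.index)
            .ToArray();
}
```

For the loadData array in the question, RowsWhere(loadData, 0, 3) returns the zero-based indexes 0, 4 and 5, which correspond to rows o1, o5 and o6.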
I could be wrong, but I think what you are looking for is a LINQ .Where clause on a SelectMany statement. I posted a question that was somewhat similar, although I was converting the result into a string[][] array. https://stackoverflow.com/a/47784942/7813290

Creating all possible arrays without nested for loops [closed]

I would like to generate all the possible numbers that have length n and each digit of my number has a value from the set {1,2,...,n-1}, as an array. In other words, I would like to list all the base n numbers with length n that do not include 0.
Right now, the only way I can think to do it is by nesting n for loops, and assigning myArray[i] with the (i+1)th loop, i.e.
int n;
int[] myArray = new int[n];
for (int i1 = 1; i1 < n; i1++)
{
    myArray[0] = i1;
    for (int i2 = 1; i2 < n; i2++)
    {
        myArray[1] = i2;
        // and so on....
        for (int iN = 1; iN < n; iN++)
        {
            myArray[n - 1] = iN;
            foreach (var item in myArray)
                Console.Write(item.ToString());
            Console.Write(Environment.NewLine);
        }
    }
}
and then printing each array at the innermost loop. The obvious issue is that for each n, I need to manually write n for loops.
From what I've read, recursion seems to be the best way to replace nested for loops, but I can't seem to figure out how to write a general recursive method either.
EDIT
For example, if n=3, I would like to write out 1 1 1, 1 1 2, 1 2 1, 1 2 2, 2 1 1, 2 1 2, 2 2 1, 2 2 2.
We are not limited to n<11. For example, if n=11, we would output
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 2
1 1 1 1 1 1 1 1 1 1 3
...
1 1 1 1 1 1 1 1 1 1 10
1 1 1 1 1 1 1 1 1 2 1
1 1 1 1 1 1 1 1 1 2 2
1 1 1 1 1 1 1 1 1 2 3
...
1 1 1 1 1 1 1 1 1 9 10
1 1 1 1 1 1 1 1 1 10 1
1 1 1 1 1 1 1 1 1 10 2
1 1 1 1 1 1 1 1 1 10 3
...
10 10 10 10 10 10 10 10 10 9 10
10 10 10 10 10 10 10 10 10 10 1
10 10 10 10 10 10 10 10 10 10 2
...
10 10 10 10 10 10 10 10 10 10 10
So, a digit of a number may be any value between and including 1 and 10. The array myArray is simply used to get one of these numbers, then we print it, and go on to the next number and repeat.
As always, when thinking in recursive solutions, try to solve the problem using immutable structures; everything is much simpler to understand.
So first of all, let's build ourselves a fast little immutable stack that will help us keep track of the number we are currently generating (while not worrying about what other numbers are being generated in the recursive call... remember, immutable data can't change!):
// Namespaces needed by the snippets in this answer
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public class ImmutableStack<T>: IEnumerable<T>
{
    public static readonly ImmutableStack<T> Empty = new ImmutableStack<T>();
    private readonly T first;
    private readonly ImmutableStack<T> rest;
    public int Count { get; }

    private ImmutableStack()
    {
        Count = 0;
    }

    private ImmutableStack(T first, ImmutableStack<T> rest)
    {
        Debug.Assert(rest != null);
        this.first = first;
        this.rest = rest;
        Count = rest.Count + 1;
    }

    // Enumerates from the most recently pushed item down to the oldest one
    public IEnumerator<T> GetEnumerator()
    {
        var current = this;
        while (current != Empty)
        {
            yield return current.first;
            current = current.rest;
        }
    }

    public T Peek()
    {
        if (this == Empty)
            throw new InvalidOperationException("Can not peek an empty stack.");
        return first;
    }

    public ImmutableStack<T> Pop()
    {
        if (this == Empty)
            throw new InvalidOperationException("Can not pop an empty stack.");
        return rest;
    }

    public ImmutableStack<T> Push(T item) => new ImmutableStack<T>(item, this);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
That's easy. Note how the stack reuses data. How many empty immutable stacks will there be in our little program? Only one. And stacks containing the sequence 1->2->4? Yup, only one.
Now, we implement a recursive function that just keeps adding numbers to the stack until we reach our "bail out" condition. Which is? When the stack contains n elements. Easy peasy:
private static IEnumerable<int> generateNumbers(ImmutableStack<string> digits, IEnumerable<string> set, int length)
{
    if (digits.Count == length)
    {
        yield return int.Parse(string.Concat(digits));
    }
    else
    {
        foreach (var digit in set)
        {
            var newDigits = digits.Push(digit);
            foreach (var newNumber in generateNumbers(newDigits, set, length))
            {
                yield return newNumber;
            }
        }
    }
}
OK, and now we just need to tie it all together with our public method:
public static IEnumerable<int> GenerateNumbers(int length)
{
    if (length < 1)
        throw new ArgumentOutOfRangeException(nameof(length));
    return generateNumbers(ImmutableStack<string>.Empty,
        Enumerable.Range(1, length - 1).Select(d => d.ToString(CultureInfo.InvariantCulture)),
        length);
}
And sure enough, if we call this thing:
var ns = GenerateNumbers(3);
Console.WriteLine(string.Join(Environment.NewLine,
    ns.Select((n, index) => $"[{index + 1}]\t: {n}")));
We get the expected output:
[1] : 111
[2] : 211
[3] : 121
[4] : 221
[5] : 112
[6] : 212
[7] : 122
[8] : 222
Do note that the total number of values generated for a given length n is (n - 1) ^ n, which means that even relatively small lengths produce a huge number of results; n = 10 generates 3 486 784 401...
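As an alternative without recursion, here is a hedged sketch of an "odometer" style iterator. It yields each combination as an int[] rather than a parsed int, which matches the arrays asked for in the question and avoids two limits of the int-based version: 10-digit results can exceed int.MaxValue, and for n = 11 the digit 10 cannot be told apart from the digits 1 and 0 once concatenated. The names OdometerSketch and Generate are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class OdometerSketch
{
    // Yields every array of `length` digits, each digit in 1..length-1,
    // in the order shown in the question (last position changes fastest).
    public static IEnumerable<int[]> Generate(int length)
    {
        if (length < 2)
            yield break; // the digit set {1..length-1} is empty

        int max = length - 1;
        var digits = new int[length];
        for (int i = 0; i < length; i++)
            digits[i] = 1;

        while (true)
        {
            yield return (int[])digits.Clone(); // copy so callers can keep it

            // Increment like an odometer, carrying to the left.
            int pos = length - 1;
            while (pos >= 0 && digits[pos] == max)
            {
                digits[pos] = 1;
                pos--;
            }
            if (pos < 0)
                yield break; // every position wrapped around: done
            digits[pos]++;
        }
    }
}
```

Printing each array with string.Join(" ", digits) for Generate(3) gives 1 1 1, 1 1 2, 1 2 1, 1 2 2, 2 1 1, 2 1 2, 2 2 1, 2 2 2, the same order as in the question. Because the sequence is lazy, nothing is stored beyond the current array.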

Most efficient way to group SQL Server results into easily-parsable result set

I have the following data in a SQL Server database:
service_id  delivery_id  price_increase
----------  -----------  --------------
1           1            0.4
1           2            0.3
1           3            0.2
1           4            0.1
1           5            0
2           1            0.4
2           2            0.3
2           3            0.2
2           4            0
2           5            0
4           1            0.5
4           2            0.3
4           3            0.25
4           4            0.15
4           5            0
Some points:
all the service_id values will always have a full complement of delivery_id values (i.e., there's a requirement to have delivery_ids 1-5)
there doesn't have to be a full complement of service_id values (as you can see above, service_id 3 has no entry)
for the purposes of this question, there's no limit on the number of service_id entries
These values will be parsed into the following class hierarchy:
ServicePricing
- ServiceId
- IEnumerable<DeliveryPricing>
where
DeliveryPricing
- DeliveryId
- PriceIncrease
What's the easiest way to query these values from the DB and then use C# to parse them? I could do it in a fairly trivial but tedious manner, checking to see whether a service_id has already been declared in code and so on, but is there any way to group the results so that I can more easily loop through them and have a clear boundary at which to declare a new instance of either of the classes?
For example, is it possible to put all results from the same service_id into an individual result set?
[Just for clarification, I'm looking for a SQL-based suggestion, not how to parse a result set in C#.]
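select service_id, delivery_id, price_increase
from table order by service_id, delivery_id

int? serviceID = null;
ServicePricing sp = null;
List<ServicePricing> sps = new List<ServicePricing>();
while (rdr.Read())
{
    int currentId = rdr.GetInt32(0);
    if (currentId != serviceID)
    {
        // service_id changed: store the finished group and start a new one
        if (sp != null)
            sps.Add(sp);
        serviceID = currentId;
        sp = new ServicePricing(currentId);
    }
    sp.DeliveryPricings.Add(new DeliveryPricing(rdr.GetInt32(1), rdr.GetDecimal(2)));
}
// add the last group, if any rows were read at all
if (sp != null)
    sps.Add(sp);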
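If the rows have already been read into memory, LINQ's GroupBy gives the same grouping without tracking the previous service_id by hand. The following is only a sketch: the two classes are a guess at the hierarchy described in the question, and their constructors, the tuple row shape and the PricingGrouper name are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DeliveryPricing
{
    public int DeliveryId { get; }
    public decimal PriceIncrease { get; }

    public DeliveryPricing(int deliveryId, decimal priceIncrease)
    {
        DeliveryId = deliveryId;
        PriceIncrease = priceIncrease;
    }
}

public class ServicePricing
{
    public int ServiceId { get; }
    public List<DeliveryPricing> DeliveryPricings { get; }

    public ServicePricing(int serviceId, List<DeliveryPricing> deliveryPricings)
    {
        ServiceId = serviceId;
        DeliveryPricings = deliveryPricings;
    }
}

public static class PricingGrouper
{
    // Groups flat (service_id, delivery_id, price_increase) rows by service_id.
    // LINQ to Objects keeps groups in order of first appearance and keeps
    // the rows inside each group in their original order.
    public static List<ServicePricing> Group(
        IEnumerable<(int ServiceId, int DeliveryId, decimal PriceIncrease)> rows) =>
        rows.GroupBy(r => r.ServiceId)
            .Select(g => new ServicePricing(
                g.Key,
                g.Select(r => new DeliveryPricing(r.DeliveryId, r.PriceIncrease)).ToList()))
            .ToList();
}
```

The trade-off is that all rows are buffered before grouping, whereas the reader loop above builds the objects while streaming.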

Linq To Entities query returns wrong data (different than Management Studio Query)

As the title suggests, my problem is that I have a query/stored procedure that selects data from a view. It works just fine in Management Studio, but when I call it from my application (using LINQ to Entities) I get wrong data: a single row is repeated 10 times when the query should return 5 different rows/records.
Here is my management studio Query :
select * from dbo.v_RouteCardDetails_SizeInfo
where Trans_TransactionHeader = 0
AND Direction = 0
AND RoutGroupID = 1
AND Degree = '1st'
Result Returned:
Size  SizeQuantity  Trans_TransactionHeader  RoutGroupID  Direction  Degree
XS    10            0                        1            0          1st
S     2             0                        1            0          1st
M     0             0                        1            0          1st
L     5             0                        1            0          1st
XXL   2             0                        1            0          1st
and here is my Linq Query:
(from x in context.v_RouteCardDetails_SizeInfo
where x.Trans_TransactionHeader == 0
&& x.Direction == 0
&& x.RoutGroupID == 1
&& x.Degree.ToLower() == "1st"
select x).ToList<_Model.v_RouteCardDetails_SizeInfo>();
And the result returned is :
Size  SizeQuantity  Trans_TransactionHeader  RoutGroupID  Direction  Degree
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
XS    10            0                        1            0          1st
I've been trying to fix this for 2 days and would appreciate your help.
Thanks
Undoubtedly the fields that Entity Framework has guessed as primary key of the view are not unique in the view. Try to add fields to the PK in the edmx designer (or code-first mapping) until you've really got a unique combination.
EF just materializes identical rows for each identical key value it finds in the result set from the SQL query.
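To illustrate the mechanism, here is a toy model of key-based identity resolution. This is not Entity Framework's actual code, just a sketch of the idea; SizeRow, IdentityMapSketch and Materialize are hypothetical names, and the row class only carries a few of the view's columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SizeRow
{
    public int RoutGroupID { get; set; }
    public string Size { get; set; }
    public int SizeQuantity { get; set; }
}

public static class IdentityMapSketch
{
    // Mimics key-based identity resolution: the first row seen for a key
    // becomes the entity, and later rows with the same key reuse it.
    public static List<SizeRow> Materialize<TKey>(
        IEnumerable<SizeRow> rows, Func<SizeRow, TKey> key)
    {
        var tracked = new Dictionary<TKey, SizeRow>();
        var result = new List<SizeRow>();
        foreach (var row in rows)
        {
            TKey k = key(row);
            if (!tracked.TryGetValue(k, out var entity))
            {
                entity = row;          // first row with this key wins
                tracked.Add(k, entity);
            }
            result.Add(entity);        // one result entry per input row
        }
        return result;
    }
}
```

With RoutGroupID alone as the key, every row in the example shares the key 1, so the result contains the XS row over and over. With a key that is unique per row, such as (RoutGroupID, Size) in this toy data, the distinct rows come back.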
Because I can't reproduce your environment, I suggest you do the following:
Check in the debugger what exactly is in the list. The printed result suggests that you display the data returned from the database somewhere, and the error could be in that code.
Preview the LINQ query. You can use LINQPad for this.

Does someone know how to solve this C# Math Algorithm?

Does someone know how to solve this C# Math Algorithm?
The control number is calculated by multiplying each digit of a "social security number" alternately by 2 and 1 (starting with 2). The digits of each product are then added together.
The resulting sum must be evenly divisible by 10 for the number to be correct and pass.
Ex, 720310-1212 "Social security number"
7 * 2 = 14 --> 1+4
2 * 1 = 2  --> 2
0 * 2 = 0  --> 0
3 * 1 = 3  --> 3
1 * 2 = 2  --> 2
0 * 1 = 0  --> 0
1 * 2 = 2  --> 2
2 * 1 = 2  --> 2
1 * 2 = 2  --> 2
2 * 1 = 2  --> 2
Then add them 1+4+2+0+3+2+0+2+2+2+2 = 20
20/10 = 2 Pass!
You need:
a counter to accumulate the numbers,
a loop to iterate over the input string,
char.GetNumericValue to get the numeric value of each input character,
a boolean flag that is changed each iteration to indicate whether to multiply by 1 or 2,
the modulus operator % to calculate the remainder of the division by 10 at the end.
Should be simple enough. Homework?
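The ingredients listed above could be put together roughly like this. It is only a sketch: the class and method names (ControlNumberSketch, IsValid) are made up, and skipping every non-digit character is an assumption based on the '-' in the example.

```csharp
using System;

public static class ControlNumberSketch
{
    public static bool IsValid(string input)
    {
        int sum = 0;                // accumulates the digits of all products
        bool multiplyByTwo = true;  // the first digit is multiplied by 2

        foreach (char c in input)
        {
            if (!char.IsDigit(c))
                continue;           // skip separators such as '-'

            int product = (int)char.GetNumericValue(c) * (multiplyByTwo ? 2 : 1);
            sum += product / 10 + product % 10; // e.g. 14 contributes 1 + 4
            multiplyByTwo = !multiplyByTwo;     // alternate 2, 1, 2, 1, ...
        }

        return sum % 10 == 0;       // valid when the remainder is zero
    }
}
```

For "720310-1212" the sum is 20, so IsValid returns true. Changing the last digit to 3 gives a sum of 21 and the check fails.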
Edit
LINQ solution:
var valid = "720310-1212"
.Where(c => char.IsDigit(c))
.Select(c => (int)char.GetNumericValue(c))
.Select((x, i) => x * (2 - i % 2))
.Select(x => x % 10 + x / 10)
.Sum() % 10 == 0;
I think you're describing the Luhn algorithm (also known as mod 10). It's used to validate credit cards (and other things). There is a C# implementation at E-Commerce Tip: Programmatically Validate Credit Card Numbers.
