LINQ - AND, ANY and NOT Query - C#

I am fairly new to LINQ. I can do some basic things with it, but I am in need of an expert.
I am using Entity Framework and I have a table that has 3 columns.
public class Aspect
{
[Key, Column(Order = 0)]
public int AspectID { get; set; }
[Key, Column(Order = 1)]
public int AspectFieldID { get; set; }
public string Value { get; set; }
}
I have 3 lists of words from a user's input. One contains phrases or words that must all be in the Value field (AND), another contains phrases or words of which at least one must be in the Value field (ANY), and the last list contains phrases or words that cannot appear in the Value field (NOT).
I need to get every record that has all of the ALL words, at least one of the ANY words, and none of the NOT words.
Here are my objects.
public class SearchAllWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchAnyWord
{
public string Word { get; set; }
public bool includeSynonoyms { get; set; }
}
public class SearchNotWord
{
public string Word { get; set; }
}
What I have so far is this,
var aspectFields = getAspectFieldIDs().Where(fieldID => fieldID > 0).ToList();//retrieves a list of AspectFieldID's that match user input
var result = db.Aspects
.Where(p => aspectFields.Contains(p.AspectFieldID))
.ToList();
Any and all help is appreciated.

First let me say, if this is your requirement... your query will read every record in the database. This is going to be a slow operation.
IQueryable<Aspect> query = db.Aspects.AsQueryable();
//note, if AllWords is empty, query is not modified.
foreach(SearchAllWord x in AllWords)
{
//important, lambda should capture local variable instead of loop variable.
string word = x.Word;
query = query.Where(aspect => aspect.Value.Contains(word));
}
foreach(SearchNotWord x in NotWords)
{
string word = x.Word;
query = query.Where(aspect => !aspect.Value.Contains(word));
}
if (AnyWords.Any()) //haha!
{
List<string> words = AnyWords.Select(x => x.Word).ToList();
query =
from aspect in query
from word in words //does this work in EF?
where aspect.Value.Contains(word)
group aspect by aspect into g
select g.Key;
}
If you're sending this query to SQL Server, be aware of the ~2100 parameter limit. Each word is going to be sent as a parameter.
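If the double-from/group approach for the ANY words doesn't translate in your EF version, one common workaround is to build a single OR predicate with expression trees so the whole query stays server-side. This is an untested sketch; the helper name and the usage lines are mine:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class AspectSearch
{
    // Builds: aspect => aspect.Value.Contains(w1) || aspect.Value.Contains(w2) || ...
    public static Expression<Func<Aspect, bool>> AnyWordPredicate(IEnumerable<string> words)
    {
        var aspect = Expression.Parameter(typeof(Aspect), "aspect");
        var value = Expression.Property(aspect, nameof(Aspect.Value));
        var contains = typeof(string).GetMethod(nameof(string.Contains), new[] { typeof(string) });

        Expression body = null;
        foreach (var word in words)
        {
            var call = Expression.Call(value, contains, Expression.Constant(word));
            body = body == null ? call : Expression.OrElse(body, call);
        }

        // No ANY words supplied: match everything (or simply leave the query unmodified).
        return body == null
            ? (Expression<Func<Aspect, bool>>)(a => true)
            : Expression.Lambda<Func<Aspect, bool>>(body, aspect);
    }
}

// usage:
// if (AnyWords.Any())
//     query = query.Where(AspectSearch.AnyWordPredicate(AnyWords.Select(x => x.Word)));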

What you need are the set operators, specifically
Intersect
Any
Bundle up your "all" words into a string array (or some other enumerable) and then you can use intersect and count to check they are all present.
Here are two sets
var A = new string[] { "first", "second", "third" };
var B = new string[] { "second", "third" };
A is a superset of B?
var isSuperset = A.Intersect(B).Count() == B.Count();
A is disjoint with B?
var isDisjoint1 = !A.Intersect(B).Any();
var isDisjoint2 = !A.Any(a => B.Any(b => a == b)); //can be faster for small sets
Your objects are not strings so you will want the overload that allows you to supply a comparator function.
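For example, projecting the search objects down to their Word strings first and supplying a StringComparer keeps the set operations simple. A rough in-memory sketch (allWords, notWords and aspect are assumed locals, and the whitespace split is only illustrative):

var requiredWords = allWords.Select(w => w.Word).ToList();   // the ALL list
var wordsInValue = aspect.Value.Split(' ');                  // naive tokenisation, for illustration

bool hasAllWords = requiredWords
    .Intersect(wordsInValue, StringComparer.OrdinalIgnoreCase)
    .Count() == requiredWords.Count;

bool hasAnyNotWord = notWords
    .Select(w => w.Word)
    .Intersect(wordsInValue, StringComparer.OrdinalIgnoreCase)
    .Any();

Note this runs as LINQ to Objects, so it only applies once the Value strings are already in memory.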
And now some soapboxing.
Much as I love Linq2sql it is not available in ASP.NET Core and the EF team wants to keep it that way, probably because jerks like me keep saying "gross inefficiency X of EF doesn't apply to Linq2Sql".
Core is the future. Small, fast and cross platform, it lets you serve a Web API from a Raspberry Pi running Windows IoT or Linux -- or get ridiculously high performance on big hardware.
EF is not and probably never will be a high performance proposition because it takes control away from you while insisting on being platform agnostic, which prevents it from exploiting the platform.
In the absence of Linq2sql, the solution seems to be libraries like Dapper, which handle parameters when sending the query and map results into object graphs when the result arrives, but otherwise don't do much. This makes them more or less platform agnostic but still lets you exploit the platform - apart from parameter substitution your SQL is passthrough.
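For illustration, a minimal Dapper call in that pass-through style might look like the following (the connection string, table and column names are assumptions based on the Aspect class above):

using Dapper;
using Microsoft.Data.SqlClient;

using var cn = new SqlConnection(connectionString);

// Dapper sends @pattern as a parameter and maps result columns to Aspect properties by name.
var aspects = cn.Query<Aspect>(
    "SELECT AspectID, AspectFieldID, Value FROM Aspects WHERE Value LIKE @pattern",
    new { pattern = "%" + word + "%" }).ToList();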

Related

Build LINQ query for One-to-One and filtering by child

I have a very simple model with two entities related as one-to-one via reference navigation properties:
class Post //"Main"
{
public int RowId { get; set; }
public string SomeInfo { get; set; }
public FTSPost FTSPost { get; set; }
}
class FTSPost //"Child"
{
public int RowId { get; set; }
public Post Post { get; set; }
public string Content { get; set; }
public string Match { get; set; }
}
I'm not sure if it's important, but FTSPost represents a virtual table of SQLite FTS5 and I'm following this example, so FTSPost is used for the free-text-search capability. What I need to do is retrieve the whole rows of data from both tables based on the text search result, not just the text itself as in the example. I.e. I'm searching by the Content of FTSPost and need to get the respective SomeInfo of Post, not just the Content itself. Note: Match is the service property used for searching, and it's bound to the name of the FTSPosts table. It works, so I can retrieve just Content as in the example.
The obvious (for me) query doesn't work, it yields zero results:
//Doesn't work! -- zero results :(
results = _context.Posts.Where(m => m.FTSPost.Match == "text for search");
SELECT *
FROM "Posts" AS "m"
LEFT JOIN "FTSPosts" AS "f" ON "m"."RowId" = "f"."RowId"
WHERE "f"."FTSPosts" = "text for search"
However, the following raw SQL query works well, but I can't wrap my mind around how to write it in LINQ! I tried to repeat it as is, with two "from" clauses, but it translates to a CROSS JOIN and doesn't work either. Please help! P.S. I use EF Core 5.
//It works!!
SELECT *
FROM "Posts" AS "m", "FTSPosts" AS "f"
WHERE "f"."FTSPosts" = "text for search" AND "m"."RowId" = "f"."RowId"
The reason why "obvious" query didn't work was the way how the one-to-one relation was created. Even though the one-to-one seems to be mirrored, the way matters. After swaping Message and FTSMessage in the following code everything started to work. LEFT JOIN in the generated SQL query was replaced by INNER JOIN.
Correct code:
modelBuilder.Entity<Message>(
x =>
{
x.HasOne(m => m.FTSMessage)
.WithOne(f => f.Message)
.HasForeignKey<Message>(m => m.RowId);
});
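If you'd rather not depend on the mapping direction, an explicit join expresses the same INNER JOIN shape directly. A sketch, assuming the context exposes Posts and FTSPosts DbSets:

// Join Posts to FTSPosts on RowId and filter on the FTS side.
var results =
    from p in _context.Posts
    join f in _context.FTSPosts on p.RowId equals f.RowId
    where f.Match == "text for search"
    select p;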

Delete newer duplicate value based on two columns .NET Core

I have a huge transactions table in an Azure database, where we import files with over 1 million objects.
public class Transaction
{
[Key]
public int Id { get; set; }
public int TransactionId { get; set; }
public DateTime Date { get; set; }
public decimal Price { get; set; }
public int UserId { get; set; }
public string Product { get; set; }
public int ClientId { get; set; }
public int Uploaed { get; set; }
public string UniqueId { get; set; }
public string Custom1 { get; set; }
public string Custom2 { get; set; }
public string Custom3{ get; set; }
}
After importing all the new data I take all the new transaction IDs, and take all the transaction IDs for that client from the database.
// ids from import
string transactionsString = string.Join(",", transactionIdsCsv);
var result = await _transactionsDataRepository.GetByTransactionIdsAndClientId(transactionIdsCsv.ToArray(), clientId);
// ids from repository
string transactionsDBString = string.Join(",", result.ToList());
// remove rows in db where duplicate transactions ids and clientId=ClientId
but I am struggling to find the most efficient way. I wanted to do something like
delete from transactions where transactionId IN (transactionsDBString) and clientId = ClientID, but that would delete both values and I only want the new value to be deleted (and the old value to stay).
But would that be a good way? Even fetching var result = await _transactionsDataRepository... can take a lot of time since there are millions of rows.
I only want the new value to be deleted (and the old value to stay)
Since you already know how to identify the transaction IDs you want to delete you could delete the necessary rows while keeping the latest like so (you didn't mention it but I'm assuming you're using Entity Framework - given your use of the [Key] attribute - correct me if I'm wrong):
var transToRemove = dbContext.Transactions
.Where(t => t.ClientId == clientId && transIds.Contains(t.TransactionId))
.GroupBy(t => t.TransactionId, t => t) // Group transactions with the same TransactionId
.SelectMany(
group => group.OrderBy(t => t.Date) // Order the oldest first
.Skip(1) // Skip the oldest (we want to keep it)
);
dbContext.Transactions.RemoveRange(transToRemove);
dbContext.SaveChanges();
Edit: Included an example that should work for Dapper...
var cn = // Create your DbConnection
// This query should select all transactions you want to delete excluding
// those with the oldest Date. This is just like 'transToRemove' above
var selectQuery = @"
SELECT t1.Id FROM Transactions t1
INNER JOIN (
SELECT
MIN(tInner.Date) AS FirstTransDate,
tInner.TransactionId,
tInner.ClientId
FROM Transactions tInner
WHERE tInner.ClientId = @clientId
AND tInner.TransactionId IN @transIds
GROUP BY tInner.TransactionId, tInner.ClientId
) t2 ON t2.ClientId = t1.ClientId AND t2.TransactionId = t1.TransactionId
WHERE t1.Date != t2.FirstTransDate
";
var idsToDelete = cn.Query<int>(
selectQuery,
new { clientId, transIds }).ToList();
// Delete the whole list in one go
cn.Execute("DELETE FROM Transactions WHERE Id in #idsToDelete", new {idsToDelete});
(inspiration from here and here)
I haven't tested this using Dapper but the list of idsToDelete should be correct according to this fiddle I made. A couple things to note:
Depending on how long your list of transIds is (I believe those IDs are in result in your own example), you might want to repeat this in smaller batches instead of trying to delete the whole list in one go; see the sketch after these notes.
The SQL query above doesn't take into account two "duplicate" transactions having the same "oldest" Date. If that can happen in your table, this query will remove every other "duplicate" row but keep both of those.
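A sketch of the batching idea from the first note (the batch size is arbitrary; Chunk needs .NET 6+, otherwise page with Skip/Take):

const int batchSize = 1000;
foreach (var batch in idsToDelete.Chunk(batchSize))
{
    // Dapper expands the list parameter into IN (@p0, @p1, ...) for each batch.
    cn.Execute("DELETE FROM Transactions WHERE Id IN @batch", new { batch });
}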
Improvements
There are a couple of things that seem a little out of place with your setup that I think you should consider:
even fetching var result = await _transactionsDataRepository... can take a lot of time since there are millions of rows
Millions of rows should not be an issue for any decent database server to handle. It sounds like you are missing some indexes on your table. With proper indexes your queries should be pretty swift as long as you can keep them simple.
but would that be a good way?
Not quite sure what you are referring to as being good or bad here, but I'll interpret a little... Right now you are writing tons of rows to a table that seems to contain duplicate data. When I think of a transaction-based system, no two transactions should share the same ID. That means for two different ClientIds there should never be a case where t1.TransactionId == t2.TransactionId. Then you could avoid checking ClientId in my code snippet above.
Since you only want to keep one transaction for each TransactionId, will you ever need to have two transactions with the same TransactionId? If not, then you can go even further and make the TransactionId column unique, so two rows with the same TransactionId can never be inserted. You can use the Entity Framework [Index(IsUnique=true)] attribute to also create an index and speed up queries on that column/property.
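For reference, a sketch of what that could look like; the property-level [Index] attribute is the EF 6.x form, and the commented alternatives are the EF Core equivalents (which one applies depends on your setup):

// EF 6.x: data annotation directly on the property.
[Index(IsUnique = true)]
public int TransactionId { get; set; }

// EF Core 5+: attribute on the entity class instead.
// [Index(nameof(TransactionId), IsUnique = true)]
// public class Transaction { ... }

// Or fluent configuration in OnModelCreating:
// modelBuilder.Entity<Transaction>()
//     .HasIndex(t => t.TransactionId)
//     .IsUnique();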

How to populate objects with relationships from a DataTable?

I am having trouble designing an approach for taking data from a CSV into business objects. I'm starting by parsing the CSV and getting each row into a DataTable and that is where my mental block starts.
I've got the following classes where APDistribution is considered a child of Voucher with a 1:Many relationship:
public class Voucher
{
public string GPVoucherNumber { get; set; }
public string VendorID { get; set; }
public string TransactionDescription { get; set; }
public string Title { get; set; }
public string DocNumber { get; set; }
public DateTime DocDate { get; set; }
public decimal PurchaseAmount { get; set; }
public IEnumerable<APDistribution> Distributions { get; set; }
}
public class APDistribution
{
public string AccountNumber { get; set; }
public decimal Debit { get; set; }
public decimal Credit { get; set; }
public string DistributionReference { get; set; }
}
My CSV looks like this. Several fields can repeat, representing the Voucher transaction (Vendor, Title, Invoice Number, Invoice Amount, etc.), and some fields are the Distribution detail (Journal Account Code, Journal Amount).
I began by thinking I could use Linq to project onto my business objects but I'm failing to see how I can structure the query to do that in one pass. I find myself wondering if I can do one query to project into a Voucher collection, one to project into an APDistribution collection, and then some sort of code to properly associate them.
I started with the following where I am grouping by the fields that should uniquely define a Voucher, but that doesn't work because the projection is dealing with an anonymous type instead of the DataRow.
var vouchers =
from row in invoicesTable.AsEnumerable()
group row by new { vendor = row.Field<string>("Vendor Code"), invoice = row.Field<string>("Vendor Invoice Number") } into rowGroup
select new Voucher
{ VendorID = rowGroup.Field<string>("Vendor Code") };
Is this achievable without introducing complex Linq that a future developer (myself included) could have difficulty understanding/maintaining? Is there a simpler approach without Linq that I'm overlooking?
The general idea is:
invoicesTable
.AsEnumerable()
.GroupBy(x => new { Vendor = x.Field<string>("Vendor Code"), Invoice = x.Field<string>("Vendor Invoice Number") })
.Select(grouping =>
new Voucher
{
VendorID = grouping.First().Field<string>("Vendor Code"), /* and so on */
Distributions = grouping.Select(somerow => new APDistribution { AccountNumber = somerow.Field<string>("Journal Account Code") /* and so on */ }).ToList()
});
But this is not the most elegant way.
You are looking for a Linq join. See the documentation here for greater depth.
Where you appear to be running into trouble, however, is that your two objects need something for the query to compare against, like maybe adding public string VendorID { get; set; } to the APDistribution class, if possible. I would assume that the CSV file has something that ties an APDistribution back to a Voucher, so whatever it is, make sure it's in both classes so you can relate one to the other. The name doesn't need to be the same in both classes, but it should be. More importantly, you now have something that an equality comparer can use for the join operation.
Now personally, I don't like big gnarly queries if I can break them apart and make things easier. Too much to reason about all at once, and you've indicated that you agree. So my approach is to divide and conquer as follows.
First, run queries to project the CSV data into discrete objects, like so:
var voucherRows =
from row in invoicesTable.AsEnumerable()
select new Voucher {
VendorID = row.Field<string>("Vendor Code")
// other properties to populate
};
and
var distributionRows =
from row in distributionsTable.AsEnumerable()
select new APDistribution {
VendorID = row.Field<string>("Vendor Code"),
// other properties to populate
};
At this point you have 2 data sets that are related in domain terms but not yet associated in code. Now you can compose the queries together in the Join query and the join starts to look a lot easier, maybe something like:
var vouchers =
from row in voucherRows
join dist in distributionRows
on row.VendorID equals dist.VendorID
into distGroup
select new Voucher
{ VendorID = row.VendorID,
// other properties to populate
Distributions = distGroup.ToList()
};
You'll have to modify the queries to your needs, but this breaks them down into 3 distinct operations that are all designed to do 1 thing, thus easier to read, reason about, debug, and modify later. If you need to group the vouchers you can at this point, but this should get you moving. Also, if needed, you can add a validation step or other processing in between the initial CSV queries and the join and you don't have to rewrite your queries, with the exception of changing some input variable names on the join.
Also, disclaimer that I did NOT build these queries in an IDE before posting so you may have some typos or missed symbols to deal with, but I'm pretty sure I have it right. Sorry in advance if you find anything aggravating.
While Linq can be cool and add efficiencies, it doesn't add value if you can't be sure the code is correct today, and can't understand it tomorrow. Maybe using Linq in this case is Premature Optimization.
Start with a non-Linq solution that is verifiably accurate without being needlessly inefficient, and then optimize later if performance becomes a problem.
Here's how I might tackle this:
// Note: this assumes Voucher.Distributions is declared as a List<APDistribution>
// (an IEnumerable<APDistribution> property has no Add method).
var vouchers = new Dictionary<string, Voucher>();
foreach (DataRow row in invoicesTable.Rows)
{
string vendor = row.Field<string>("Vendor Code");
string invoice = row.Field<string>("Vendor Invoice Number");
string voucherKey = vendor + "|" + invoice;
if (!vouchers.ContainsKey(voucherKey))
{
vouchers.Add(voucherKey, new Voucher { VendorID = vendor, DocNumber = invoice, Distributions = new List<APDistribution>() });
}
vouchers[voucherKey].Distributions.Add(new APDistribution { AccountNumber = row.Field<string>("Journal Account Code") });
}
If this will be processing a large number of rows, you can tune this a bit by preallocating the Dictionary to an estimate of the number of unique vendors:
var vouchers = new Dictionary<string, Voucher>((int)(invoicesTable.Rows.Count * 0.8));

Efficient query involving count in subquery

Say I have this hypothetical many-to-many relationship:
public class Paper
{
public int Id { get; set; }
public string Title { get; set; }
public virtual ICollection<Author> Authors { get; set; }
}
public class Author
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Paper> Papers { get; set; }
}
I want to use LINQ to build a query that will give me the "popularity" of each author compared to other authors, which is the number of papers the author contributed to divided by the total number of author contributions in general across all papers. I've come up with a couple queries to achieve this.
Option 1:
var query1 = from author in db.Authors
let sum = (double)db.Authors.Sum(a => a.Papers.Count)
select new
{
Author = author,
Popularity = author.Papers.Count / sum
};
Option 2:
var temp = db.Authors.Select(a => new
{
Auth = a,
Contribs = a.Papers.Count
});
var query2 = temp.Select(a => new
{
Author = a,
Popularity = a.Contribs / (double)temp.Sum(a2 => a2.Contribs)
});
Basically, my question is this: which of these is more efficient, and are there other single queries that are more efficient? How do any of those compare to two separate queries, like this:
double sum = db.Authors.Sum(a => a.Papers.Count);
var query3 = from author in db.Authors
select new
{
Author = author,
Popularity = author.Papers.Count / sum
};
Well, first of all, you can try them out yourself and see which one takes the longest, for instance.
The first thing you should look for is that they translate perfectly into SQL, or as close as possible, so that the data doesn't all get loaded into memory just to apply those computations.
But I feel that option 2 might be your best shot, with one more optimization: cache the total count of papers contributed. This way you only make one call to the db to get the authors, which you need anyway; the rest will run in your code, and there you can parallelize and do whatever you need to make it fast.
So something like this (sorry, I prefer the Fluent style of writing Linq):
//here you can even load only the needed info if you don't need the whole entity.
//I imagine you might only need the Name and the Papers.Count, which you can use below; that would be another optimization.
var allAuthors = db.Authors.ToList();
var totalPaperCount = allAuthors.Sum(x => x.Papers.Count);
var theEndResult = allAuthors.Select(a => new
{
Author = a,
Popularity = a.Papers.Count / (double)totalPaperCount
});
Options 1 and 2 should generate the same SQL code. For readability I would go with option 1.
Option 3 will generate two SQL statements and be a little slower.
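If you want to verify this yourself, inspecting the generated SQL is the quickest check. A sketch, assuming EF Core 5+ for ToQueryString (in EF6 the query's ToString() returns the SQL instead):

// Print what each option actually sends to the server.
Console.WriteLine(query1.ToQueryString());
Console.WriteLine(query2.ToQueryString());

// EF6 equivalent:
// Console.WriteLine(query1.ToString());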

Pass Linq Expression to a function

I want to pass a list of properties of a class to a function. Within the function, based on that property list, I'm going to generate a query, with exactly the same functionality as the LINQ Select method.
Here I'm going to implement this for an Ingress database.
As an example, in the front end I want to run a select like the call shown further below.
My entity class is like this:
public class Customer
{
[System.Data.Linq.Mapping.ColumnAttribute(Name="Id",IsPrimaryKey=true)]
public string Id { get; set; }
[System.Data.Linq.Mapping.ColumnAttribute(Name = "Name")]
public string Name { get; set; }
[System.Data.Linq.Mapping.ColumnAttribute(Name = "Address")]
public string Address { get; set; }
[System.Data.Linq.Mapping.ColumnAttribute(Name = "Email")]
public string Email { get; set; }
[System.Data.Linq.Mapping.ColumnAttribute(Name = "Mobile")]
public string Mobile { get; set; }
}
I want to call a Select function like this:
var result = dataAccessService.Select<Customer>(C=>C.Name,C.Address);
Then, using result, I can get the values of the Name and Address properties.
I think my Select function should look like this
(I think this should be done using LINQ Expressions, but I'm not sure what the input parameter and return types should be.)
class DataAccessService
{
// I'm not sure about this return type and input types, generic types.
public TResult Select<TSource,TResult>(Expression<Func<TSource,TResult>> selector)
{
// Here I wanna Iterate through the property list, which is passed from the caller.
// Here using the property list,
// I can get the ColumnAttribute name value and I can generate a select query.
}
}
This is an attempt to create functionality like LINQ's, but I'm not an expert in LINQ Expressions.
There is a project called DbLinq from MIT, but it's a big project and I still couldn't pull anything helpful out of it.
Can someone please help me get started with this, or link me to some useful resources to read about it?
What you're trying to do is create a new anonymous type that consists of Name and Address. This is easily achievable via long-form LINQ (I made that term up, for lack of a better explanation). Here's a sample from Microsoft, link provided below:
public void Linq11()
{
List<Product> products = GetProductList();
var productInfos =
from p in products
select new { p.ProductName, p.Category, Price = p.UnitPrice };
Console.WriteLine("Product Info:");
foreach (var productInfo in productInfos)
{
Console.WriteLine("{0} is in the category {1} and costs {2} per unit.", productInfo.ProductName, productInfo.Category, productInfo.Price);
}
}
Details: Linq Select Samples
Update:
So are you trying to do something like this then?
var result = dataAccessService.Select<Customer>(c => c.Name, c => c.Address);
public object[] Select<TSource>(TSource source, params Expression<Func<TSource, object>>[] selectors)
{
// Note: a TSource instance is needed to evaluate the selectors, so it is taken as a parameter here.
var toReturn = new object[selectors.Length];
for (int i = 0; i < selectors.Length; i++)
{
var func = selectors[i].Compile();
//TODO: If you implement Select as a proper extension method, you can easily get the source from it
toReturn[i] = func(source);
}
return toReturn;
}
I don't understand why you're trying to implement Select as a function of DataAccessService. Are you trying to create this as an extension method instead?
If this is not what you mean, though, you need to rephrase your question big time and, as one commenter suggested, tell us what you need, not how you want us to design it.
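For the column-mapping part of the original question, here is an untested sketch of pulling the ColumnAttribute name out of each selector so a SELECT list can be generated; the unwrap of the UnaryExpression handles the boxing that Expression<Func<TSource, object>> introduces, and the usage comment assumes a selectors array like the one above:

using System;
using System.Data.Linq.Mapping;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

static class ColumnHelper
{
    // Returns the mapped column name for a selector such as c => c.Name.
    public static string GetColumnName<TSource>(Expression<Func<TSource, object>> selector)
    {
        // c => (object)c.Name arrives as Convert(MemberAccess); unwrap the Convert.
        var body = selector.Body is UnaryExpression unary ? unary.Operand : selector.Body;
        var member = (MemberExpression)body;
        var column = member.Member.GetCustomAttribute<ColumnAttribute>();
        return column?.Name ?? member.Member.Name;
    }
}

// e.g. building a SELECT list from the selectors:
// var columns = selectors.Select(s => ColumnHelper.GetColumnName(s)).ToArray();
// var sql = "SELECT " + string.Join(", ", columns) + " FROM Customer";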
