Hint for a C# and also an SQL Puzzle - c#

I was browsing SO careers and came across a job that had a pdf with a couple of puzzles they wanted applicants to send in.
Although I'm not interested in the job, I read the questions anyway and had a play in Visual Studio / SSMS. The first question was fairly easy to solve, although I couldn't think of any way to optimise it (I solved it in C#). For the second puzzle, only one obvious solution strikes me, and I can't think of any others.
I'm not sure if it's bad form to discuss these questions here, but if anyone can give me some hints, or perhaps suggest somewhere I can ask this without creating any grief, it'd be appreciated.
The questions are in here: http://www.debtx.com/doc/DebtX_Programming_Problems.pdf
I could let the first one slide but the second one has me stumped on other ways of solving it than the obvious. Shame there's no PM function on SO...
Boilerplate solution for the first part, in C#:
public static bool Compare(int[] num, int[] div)
{
    for (int i = 0; i < num.Length; i++)
    {
        for (int j = 0; j < div.Length; j++)
        {
            if (num[i] % div[j] == 0)
                return true;
        }
    }
    return false;
}
My SQL Solutions
select Table1.Key1, Table1.Key2 from Table1 inner join Table2 on Table1.Key1 = Table2.key2 where IsDeleted=0
select * from Table1 where key1 in(select Key2 from Table2 where IsDeleted=0)
It all seems so samey though

A couple of examples using pseudo SQL, so as not to give too much away:
Not In
SELECT * FROM Tbl1
WHERE Tbl1.Key1 NOT IN (
    SELECT Tbl2.Key1 FROM Tbl2
    WHERE Deleted = 0 AND Tbl2.Key2 = Tbl1.Key2
)
Not Exists
SELECT * FROM Tbl1
WHERE NOT EXISTS (
    SELECT * FROM Tbl2
    WHERE Deleted = 0 AND Tbl2.Key1 = Tbl1.Key1 AND Tbl2.Key2 = Tbl1.Key2
)
Outer Join Is Null
SELECT * FROM Tbl1
LEFT JOIN Tbl2
    ON Tbl2.Key1 = Tbl1.Key1 AND Tbl2.Key2 = Tbl1.Key2 AND Tbl2.Deleted = 0
WHERE Tbl2.Key1 IS NULL

One optimisation to the C# question is to sort the DIV array. You're more likely to find a match quickly starting with the smaller numbers.
EDIT: Another optimisation to the C# question may be to look at an approach similar to the prime-number Sieve of Eratosthenes, the general theory being that you pre-empt some results without having to perform the checks.
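As a sketch of the first optimisation (sort the divisors so the small, more-frequently-matching ones are tried first, and bail out on the first hit) - written in Python for brevity rather than C#, and purely illustrative:

```python
def any_divisible(nums, divs):
    """True if any value in nums is divisible by any divisor in divs.

    Sorting the divisors ascending means small divisors (which divide more
    numbers, so are more likely to match) are tried first; the generator
    short-circuits on the first hit.
    """
    ordered = sorted(d for d in divs if d != 0)  # guard against div-by-zero
    return any(n % d == 0 for n in nums for d in ordered)

print(any_divisible([7, 11, 9], [5, 3]))   # True  (9 % 3 == 0)
print(any_divisible([7, 11], [5, 3]))      # False
```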
As for the SQL question, the three obvious (common) ways are as others have stated, a JOIN, an IN and an EXISTS.

Well, which solution have you used already? Immediately I think it can be done using a subquery with IN, using a LEFT OUTER JOIN and filtering on NULL, or using EXISTS.

Spoiler alert!!!!!
SELECT T1.key1, T1.key2
FROM Table1 T1
WHERE NOT EXISTS
(
    SELECT *
    FROM Table2 T2
    WHERE T2.key1 = T1.key1
      AND T2.key2 = T1.key2
      AND COALESCE(T2.IsDeleted, 0) <> 1
)

SELECT T1.key1, T1.key2
FROM Table1 T1
LEFT OUTER JOIN Table2 T2
    ON T2.key1 = T1.key1
   AND T2.key2 = T1.key2
   AND COALESCE(T2.IsDeleted, 0) <> 1
WHERE T2.key1 IS NULL

SELECT T1.key1, T1.key2
FROM Table1 T1
WHERE
(
    SELECT COUNT(*)
    FROM Table2 T2
    WHERE T2.key1 = T1.key1
      AND T2.key2 = T1.key2
      AND COALESCE(T2.IsDeleted, 0) <> 1
) = 0


Entity Framework generates inefficient SQL for paged query

I have a simple paged linq query against one entity:
var data = (from t in ctx.ObjectContext.Widgets
            where t.CampaignId == campaignId &&
                  t.CalendarEventId == calendarEventId &&
                  (t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
            select t);
data = data.OrderBy(t => t.Id);
if (page > 0)
{
    data = data.Skip(rows * (page - 1)).Take(rows);
}
var l = data.ToList();
I expected it to generate SQL similar to:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).
However, the generated SQL is different. It contains a nested query as shown below:
SELECT TOP (50)
[Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId],
<redacted>,
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CampaignId] AS [CampaignId],
<redacted>
FROM [dbo].[Widgets] AS [Extent1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC
The Widgets table is enormous and the inner query returns 100000s of records, causing a timeout.
Is there anything I can do to change the generation? Anything I am doing wrong?
UPDATE
I finally managed to refactor my code to return the results relatively quickly:
var data = (from t in ctx.ObjectContext.Widgets
            where t.CampaignId == campaignId &&
                  t.CalendarEventId == calendarEventId &&
                  (t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
            select t).AsEnumerable().Select((item, index) => new { Index = index, Item = item });
data = data.OrderBy(t => t.Index);
if (page > 0)
{
    data = data.Where(t => t.Index >= (rows * (page - 1)));
}
data = data.Take(rows);
Note, the page > 0 logic is simply used to prevent an invalid parameter; it does no optimisation. In fact using page > 1, while valid, provides no noticeable optimisation for the first page, since the Where is not a slow operation.
Prior to SQL Server 2012, the generated SQL is the best way to perform paging. Yes, it is awful and very inefficient, but it is the best you can do even writing your own SQL script by hand. There are tons of digital ink about this on the net. Just google it.
For the first page this can be optimised by not doing Skip and just doing Take, but on any other page you are f***** up.
A workaround could be to generate your own row_number in persistence (an auto-identity could work) and just do where(widget.number > (page*rows)).Take(rows) in code. If there is a good index on your widget.number, the query should be very fast. But this breaks the dynamic orderBy.
However, I can see in your code that you are always ordering by widget.id; so, if dynamic orderBy is not essential, this could be a valid workaround.
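That persisted-number workaround amounts to keyset (seek) pagination: remember the last key of the previous page and filter on it instead of skipping rows. A rough sketch of the idea in Python, with an in-memory list standing in for the indexed table:

```python
def next_page(rows, last_seen_number, page_size):
    """Keyset pagination sketch: 'rows' is assumed sorted ascending by a
    persisted 'number' column. Instead of Skip(n), filter number > last seen
    and take one page; with an index on 'number', the SQL equivalent
    (WHERE number > @last ORDER BY number) is a cheap index seek.
    """
    page = [r for r in rows if r["number"] > last_seen_number]
    return page[:page_size]

widgets = [{"number": n, "name": f"w{n}"} for n in range(1, 11)]
print([r["number"] for r in next_page(widgets, 4, 3)])  # [5, 6, 7]
```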
"Will you take your own medicine?", you could ask me.
No, I will not. The best way to deal with this is having a persistence read-model in which you can even have one table per widget orderBy field, each with its own widget.number. The problem is that modelling a system with a persistence read-model just for this issue is too crazy. Having a read-model is part of the overall design of your system and needs to be taken into account from the very beginning of the design and development of a system.
The generated query is so complex and nested because you used the Skip method. In T-SQL, Take is easily achievable using just TOP, but that is not the case with Skip - to apply it you need row_number, and that is why there is a nested query: the inner one returns rows with row_number and the outer one filters them to get the proper number of rows. Your query:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
lacks the skipping of initial rows. To keep the query efficient, it would be best, instead of using Take and Skip, to page by a condition on Id, because you are already ordering your rows by that field:
var data = (from t in ctx.ObjectContext.Widgets
            where t.CampaignId == campaignId &&
                  t.CalendarEventId == calendarEventId &&
                  (t.RecurringingEventId == null || t.RecurringEventId == recurringEventId)
            select t);
var pageData = data
    .OrderBy(t => t.Id)
    .Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page)
    .ToList();
AFAIK you cannot change the query generated by Entity Framework. You can, however, force it to run a raw SQL query:
https://msdn.microsoft.com/en-us/data/jj592907.aspx
You can also use stored procedures:
https://msdn.microsoft.com/en-us/data/gg699321.aspx
Even if there were a way to change the generated query, IMO it would be spitting into the wind. I bet it will be easier to write the SQL query on your own.

sql inner join on blank table [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I have a complicated query for a simple thing, but I have managed to make it work. The query works when idclient is 1, but when idclient is 5 there is a problem.
The problem is that that client didn't order anything; he just paid some amount. So there isn't an a.price - practically the whole first table is blank - and I want a result like -1200,00, i.e. the paid amount in minus. Since the first part of the table doesn't exist, the inner join is impossible, and therefore the second part is also non-existent. Any suggestion for a "quick fix"? :)
SELECT SUM(a.price) - s.pay AS Remain
FROM (SELECT name,
( quantity * itprice ) * ( 100 - percent ) / 100 AS price,
idclient
FROM (SELECT order.iditem AS ID,
item.name,
SUM(order.quant) AS quantity,
order.percent,
item.itprice,
order.idclient
FROM item
inner join order
ON order.iditem = item.id
WHERE ( order.idclient = 1 )
GROUP BY order.iditem,
order.percent,
item.name,
item.itprice,
order.idclient) AS X) AS a
inner join (SELECT SUM(amount) AS Pay,
idcom
FROM payed
WHERE ( idcom = 1 )
GROUP BY idcom) AS s
ON a.idclient = s.idcom
GROUP BY s.idcom,
a.idclient,
s.pay
(There may be some typing errors in the code, but don't worry about them: I have translated my original code, so maybe some letter was lost in translation. The code itself is correct.)
Is this always just fetching one row? At least it looks like it, and if that's the case you could just use variables with something like this:
declare @price decimal(10,2) = 0, @payment decimal(10,2) = 0

SELECT
    -- ISNULL so a client with no orders still yields 0 rather than NULL
    @price = ISNULL(SUM(( order.quant * item.itprice ) * ( 100 - order.percent ) / 100), 0)
FROM
    item
    inner join order ON order.iditem = item.id
WHERE
    order.idclient = 1

SELECT
    @payment = ISNULL(SUM(amount), 0)
FROM
    payed
WHERE
    idcom = 1

select @price - @payment

Reproduce conditional LEFT JOIN behavior in WHERE?

I have a problem and right now I am out of ideas.
I am doing some optimisation for a database application. There is a method (Method_A) called a couple hundred times that does this kind of query:
SELECT
A.a,
ISNULL(A.b, 'Nothing') As alias_b,
ISNULL(B.a, 'N/A') as alias_c
FROM A
LEFT JOIN B on A.fk=B.fk AND B.a = 'SOME_KEY'
WHERE A.c = 'SOME_OTHER_KEY'
Resulting in one row looking like this:
[a ][alias_b][ alias_c ]
[ val_a ][ val_b ][val_c|'N/A']
What I want to do in order to optimise my code is to make a single request before Method_A, retrieving all the data, and have Method_A filter client-side. (It is coded in .NET; I am replacing an OleDbConnection with a DataSet.)
How could I achieve this kind of behaviour? Because if I filter with a client-side condition, instead of getting 'N/A' when the join returns nothing, I just get no row, and this is a problem since I still need val_a and val_b.
Thank you for your help!
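The client-side filtering being described - keeping the 'N/A' fallback when no B row matches - can be sketched like this (a Python illustration over rows-as-dicts, not the actual .NET DataSet code; all names are hypothetical):

```python
def merge_with_default(a_rows, b_rows, some_key="SOME_KEY", default="N/A"):
    """Emulate the conditional LEFT JOIN client-side: index the B rows that
    satisfy B.a == some_key by their fk, then look each A row up in that
    index with a default, so A rows without a match still come through."""
    b_by_fk = {b["fk"]: b["a"] for b in b_rows if b["a"] == some_key}
    return [
        {"a": a["a"],
         "alias_b": a["b"] if a["b"] is not None else "Nothing",
         "alias_c": b_by_fk.get(a["fk"], default)}
        for a in a_rows
    ]

rows_a = [{"a": "val_a", "b": None, "fk": 1}]
rows_b = []  # no matching B row at all
print(merge_with_default(rows_a, rows_b))  # alias_c falls back to 'N/A'
```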
You can try this: basically, split the two cases - where there are no matching B elements, and where there are matches (doing an inner join). Maybe it will have a cleaner execution plan:
(SELECT A.a,
    ISNULL(A.b, 'Nothing') As alias_b,
    'N/A' as alias_c
FROM A
WHERE A.c = 'SOME_OTHER_KEY' and
    not exists (select B.fk FROM B WHERE A.fk = B.fk AND B.a = 'SOME_KEY')
)
UNION ALL
(SELECT A.a,
    ISNULL(A.b, 'Nothing') As alias_b,
    B.a as alias_c
FROM A, B
WHERE A.c = 'SOME_OTHER_KEY' and
    A.fk = B.fk and
    B.a = 'SOME_KEY'
)
Note you need the following indexes:
A(c)
A(fk)
B(fk)
B(fk,a)
A(fk,c)

2 levels of joins LINQ

I have 3 objects that I need to link together
Parent: TblClients
This will have multiple children of Type TblBusinessLeads , the key between the two is ClientID
Type Lead will have multiple children of type TblFeeBreakouts , the key between the two is LeadID
I have written the following LINQ to get the data back, but it is not coming back (out of memory exception):
from t0 in TblClients
join t1 in TblBusinessLeads on t0.ClientID equals t1.ClientID into t1_join
from t1 in t1_join.DefaultIfEmpty()
join t3 in TblFeeBreakouts on t1.LeadID equals t3.LeadID into t3_join
from t3 in t3_join.DefaultIfEmpty()
orderby t0.ClientID, t1.LeadID
select new {
    client_data = t0,
    business_lead_data = t1_join,
    fee_breakout_data = t3_join
}
I am not sure of you can even do this, but the idea seems pretty common. Any help would be greatly appreciated. Thanks
EDIT:
Wow, lots of comments. Here go my answers:
I am trying to run the query in LINQPad; that's where the Out of Memory exception is occurring.
If I look at the SQL generated, it gives me:
SELECT [t0].[ClientID], [t0].[ClientName], [t0].[ClientDesc], [t0].[EditedBy], [t0].[EditedDate], [t0].[CreatedBy], [t0].[CreatedDate], [t3].[LeadID], [t3].[InitiativeName], [t3].[Description], [t3].[NewBusNeeds], [t3].[CreativeNeeds], [t3].[IdeationNeeds], [t3].[Comments], [t3].[LossReasons], [t3].[OriginDate], [t3].[DateReceivedAssignment], [t3].[DueDate], [t3].[TimelineNotes], [t3].[PendingCode], [t3].[EstStartDate], [t3].[EstEndDate], [t3].[ExeStartDate], [t3].[ExeEndDate], [t3].[Probable80Total], [t3].[Possible50Total], [t3].[Emerging25Total], [t3].[NoBudget0Total], [t3].[TotalBudget], [t3].[FinancialNotes], [t3].[DollarsRecordFor], [t3].[BizDevContactUserID], [t3].[BizDevContact2UserID], [t3].[SVPContactUserID], [t3].[ClientMgmtContactUserID], [t3].[CMAdditionalContactUserID], [t3].[AdditionalContactUserID], [t3].[CreatorUserID], [t3].[OfficeID], [t3].[ClientID] AS [ClientID2], [t3].[LeadTypeID], [t3].[ActionNeeded], [t3].[ActionDate], [t3].[NewBusDeliveryDate], [t3].[NewBusDesc], [t3].[CreativeDeliveryDate], [t3].[CreativeDesc], [t3].[IdeationDeliveryDate], [t3].[IdeationDesc], [t3].[AltMediaDeliveryDate], [t3].[AltMediaDesc], [t3].[MobileOpsDeliveryDate], [t3].[MobileOpsDesc], [t3].[EventsDeliveryDate], [t3].[EventsDesc], [t3].[Routing], [t3].[RoutingDate], [t3].[Deleted], [t3].[LeadSourceID], [t3].[NatureofLeadID], [t3].[NatureofLeadNotes], [t3].[EditedBy] AS [EditedBy2], [t3].[EditedDate] AS [EditedDate2], [t3].[CreatedBy] AS [CreatedBy2], [t3].[CreatedDate] AS [CreatedDate2], [t3].[ClientContactName], [t3].[ClientContactTitle], [t3].[ReportingYear], (
SELECT COUNT(*)
FROM [tblBusinessLead] AS [t4]
WHERE [t0].[ClientID] = [t4].[ClientID]
) AS [value], [t1].[LeadID] AS [LeadID2]
FROM [tblClient] AS [t0]
LEFT OUTER JOIN [tblBusinessLead] AS [t1] ON [t0].[ClientID] = [t1].[ClientID]
LEFT OUTER JOIN [tblFeeBreakout] AS [t2] ON [t1].[LeadID] = [t2].[LeadID]
LEFT OUTER JOIN [tblBusinessLead] AS [t3] ON [t0].[ClientID] = [t3].[ClientID]
ORDER BY [t0].[ClientID], [t1].[LeadID], [t2].[LeadID], [t2].[FeeTypeID], [t3]. [LeadID]
This returns like 1.2 million rows
There is no mapping in the model because the DB has no relationships (they are inferred; no foreign keys or anything like that).
The reason I am using t1_join and t3_join is that if I use t1 or t3, I get the single entity, not the IEnumerable of the object, hence I can't loop over it.
If you have more questions, please ask.
First of all, what possible use could a client have for 1.2 million rows? There is no real good use case for this, so your first step should be figuring out how to filter your results appropriately.
Second, I believe the reason your query is throwing an OutOfMemoryException is that LINQPad is doing a ToList() or something similar so that it can show the results of the query in the bottom pane. The ToList() is loading all 1.2 million rows into memory. If you ran the query in a regular .NET app, the following would return an IQueryable<> which would not be loaded into memory:
var query =
    from t0 in TblClients
    join t1 in TblBusinessLeads on t0.ClientID equals t1.ClientID into t1_join
    from t1 in t1_join.DefaultIfEmpty()
    join t3 in TblFeeBreakouts on t1.LeadID equals t3.LeadID into t3_join
    from t3 in t3_join.DefaultIfEmpty()
    orderby t0.ClientID, t1.LeadID
    select new {
        client_data = t0,
        business_lead_data = t1_join,
        fee_breakout_data = t3_join
    };
As stated above, it is probably a good idea to set up associations on these tables, which I did... The LINQ for the result after the associations was simple:
var clientList = (from a in tblClients
select a).ToList();
From there it was just accessing the properties.

LINQ to SQL complex query problem

I have 3 tables: Principal (Principal_ID, Scale), Frequency (Frequency_ID, Value) and Visit (Visit_ID, Principal_ID, Frequency_ID).
I need a query which returns all principals (in the Principal table), and for each record, query the capacity required for that principal, calculated as below:
Capacity = (Principal.Scale == 0 ? 0 : (Frequency.Value == 1 ? 1 : Frequency.Value * 1.8) / Principal.Scale)
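A quick worked version of that formula, as a Python sketch with made-up numbers, just to pin down the arithmetic per visit:

```python
def capacity(scale, frequency_values):
    """Capacity for one principal, per the formula above: each visit
    contributes (value if value == 1 else value * 1.8) / scale, and a
    zero scale short-circuits to 0 (avoiding the division)."""
    if scale == 0:
        return 0.0
    return sum((v if v == 1 else v * 1.8) / scale for v in frequency_values)

print(round(capacity(2, [1, 3]), 6))  # 3.2  (i.e. 1/2 + 3*1.8/2)
print(capacity(0, [1, 3]))            # 0.0
```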
I'm using LINQ to SQL, so here is the query:
from Principal p in ShopManagerDataContext.Instance.Principals
let cap =
(
from Visit v in p.Visits
let fqv = v.Frequency.Value
select (p.Scale != 0 ? ((fqv == 1.0f ? fqv : fqv * 1.8f) / p.Scale) : 0)
).Sum()
select new
{
p,
Capacity = cap
};
The generated TSQL:
SELECT [t0].[Principal_ID], [t0].[Name], [t0].[Scale], (
SELECT SUM(
    (CASE
        WHEN [t0].[Scale] <> @p0 THEN (
            (CASE
                WHEN [t2].[Value] = @p1 THEN [t2].[Value]
                ELSE [t2].[Value] * @p2
            END)) / (CONVERT(Real,[t0].[Scale]))
        ELSE @p3
    END))
FROM [Visit] AS [t1]
INNER JOIN [Frequency] AS [t2] ON [t2].[Frequency_ID] = [t1].[Frequency_ID]
WHERE [t1].[Principal_ID] = [t0].[Principal_ID]
) AS [Capacity]
FROM [Principal] AS [t0]
And the error I get:
SqlException: Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Any ideas how to solve this, if possible, in one query?
Thank you very much in advance!
Here are 2 ways to do this by changing up your approach:
Create a user-defined aggregate function using the SQL CLR. This may not be the right solution for you, but it's a perfect fit for the problem as stated. For one thing, this would move all of the logic into the data layer, so LINQ would be of limited value. With this approach you get efficiency, but there's a big impact on your architecture.
Load the Visit and Frequency tables into a typed DataSet and use LINQ to DataSets. This will probably work with your existing code, but I haven't tried it. With this approach your architecture is more or less preserved, but you could take a big efficiency hit if Visit and Frequency are large.
Based on the comment, I've an alternative suggestion. Since your error is coming from SQL, and you aren't using the new column as a filter, you can move your calculation to the client. For this to work, you'll need to pull all the relevant records (using DataLoadOptions.LoadWith<> on your context).
To further your desire for use with binding to a DataGrid, it'd probably be easiest to bury the complexity in a property of Principal.
partial class Principal
{
    public float Capacity
    {
        get
        {
            return this.Scale == 0 ? 0 : this.Visits.Select(v =>
                (v.Frequency.Value == 1 ? 1 : v.Frequency.Value * 1.8f) / this.Scale).Sum();
        }
    }
}
Then your retrieval gets really simple:
using (ShopManagerDataContext context = new ShopManagerDataContext())
{
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Principal>(p => p.Visits);
options.LoadWith<Visit>(v => v.Frequency);
context.LoadOptions = options;
return (from p in context.Principals
select p).ToList();
}
