Do I have to leave off the skip call in my query - c#

I have a big table from which I want to clear old records. The records have a field 'FilePath', and 'clear' here means setting 'FilePath' to null. Because the table has millions of records, updating them all at once is impossible: it exhausts memory. So my strategy is to fetch 2000 rows at a time, update them, and then continue with the next block.
My query:
int pageNumber = 0;
int pageSize = 2000;
bool hasHitEnd = false;
while (!hasHitEnd)
{
    var size = pageNumber * pageSize;
    var query = cdrContext.Mytable.Where(c => c.FacilityID == facilityID && c.FilePath != null && c.TimeStationOffHook < oldDate)
        .OrderBy(c => c.TimeStationOffHook)
        .Skip(size)
        .Take(pageSize)
        .Select(c => new { c.FilePath, c.FileName })
        .ToList();
    var q = cdrContext.Mytable.Where(c => c.FacilityID == facilityID && c.FilePath != null && c.TimeStationOffHook < oldDate)
        .OrderBy(c => c.TimeStationOffHook)
        .Skip(size)
        .Take(pageSize)
        .ToList();
    foreach (var y in q)
    {
        y.FilePath = null;
    }
    cdrContext.SaveChanges();
    if (query.Count() < pageSize)
    {
        hasHitEnd = true;
    }
    pageNumber++;
}
I am not confident in this code. After the update, FilePath is null for those rows, so on the next run the query may not point to the right block, because I skip one block.
Do I need to remove the Skip part?

You don't need to skip records, because after the update the next page becomes the first page (the updated items will no longer match your query filter on the next call).
// define the query, but don't execute it yet
var query = cdrContext.Mytable.Where(c => c.FacilityID == facilityID &&
                                          c.FilePath != null &&
                                          c.TimeStationOffHook < oldDate)
    .OrderBy(c => c.TimeStationOffHook)
    .Take(pageSize);
var itemsToUpdate = query.ToList(); // get the first N matching items
while (itemsToUpdate.Any()) // loop until no matching items remain
{
    foreach (var item in itemsToUpdate)
    {
        item.FilePath = null; // updated rows no longer match the filter
    }
    cdrContext.SaveChanges();
    itemsToUpdate = query.ToList(); // get the next N matching items
}

There is no need to skip records: the subsequent page will be your first page. You also don't need to query the database twice. I see you use both query and q, which is not necessary. Using only q will help performance a lot. You can remove the following code:
if (query.Count() < pageSize)
{
hasHitEnd = true;
}
Replace it with a check on the number of records in q: if q.Count() == 0, you can break out of the loop or set hasHitEnd = true;
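To see why Skip loses rows here, the loop can be simulated in memory. This is only a sketch: Record, BatchClearDemo and MakeTable are hypothetical stand-ins, and a plain List takes the place of the table and of SaveChanges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table row; only FilePath matters here.
class Record
{
    public int Id { get; set; }
    public string FilePath { get; set; }
}

static class BatchClearDemo
{
    // Clears FilePath in batches and returns how many rows still have a path.
    public static int ClearInBatches(List<Record> table, int pageSize, bool useSkip)
    {
        int pageNumber = 0;
        while (true)
        {
            var batch = table.Where(r => r.FilePath != null)
                             .OrderBy(r => r.Id)
                             .Skip(useSkip ? pageNumber * pageSize : 0)
                             .Take(pageSize)
                             .ToList();
            if (batch.Count == 0) break;

            // Stands in for the update + SaveChanges: these rows stop matching the filter.
            foreach (var r in batch) r.FilePath = null;
            pageNumber++;
        }
        return table.Count(r => r.FilePath != null);
    }

    public static List<Record> MakeTable(int rows) =>
        Enumerable.Range(1, rows)
                  .Select(i => new Record { Id = i, FilePath = "f" + i })
                  .ToList();

    static void Main()
    {
        // With Skip, every other block is jumped over and never cleared.
        Console.WriteLine(ClearInBatches(MakeTable(10), 2, useSkip: true));  // prints 4
        // Without Skip, the "first page" is always the next uncleared block.
        Console.WriteLine(ClearInBatches(MakeTable(10), 2, useSkip: false)); // prints 0
    }
}
```

With 10 rows and a page size of 2, the Skip version clears ids 1-2, 5-6 and 9-10 and then finds an empty page, leaving 4 rows untouched.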

Related

Linq where clause confusion

Good day, everyone!
I've written one query for my Automation test, but it's taking too long to execute, and I'm not sure how to optimize it effectively because I'm new to the Linq where clause.
Could someone please assist me with this?
var order = OrderRepositoryX.GetOrderByStatus(OrderStatusType.Dispatched, 4000)
.Where(x => x.siteId == 1 || x.siteId == 10 || x.siteId == 8 || x.siteId == 16 || x.siteId == 26 || x.siteId == 27)
.Where(x =>
{
var totalPrice = OrderRepository.GetOrderById(shared_parameters.testConfiguration, x.orderId).TotalPrice;
if (totalPrice < 500)
return false;
return true;
})
.Where(x =>
{
var cnt = ReturnOrderRepositoryX.CheckReturnOrderExists(x.orderId);
if (cnt > 0)
return false;
return true;
})
.Where(x =>
{
var cnt = OrderRepositoryX.CheckActiveOrderJobDetailsByOrderId(x.orderId);
if (cnt > 0)
return false;
return true;
})
.FirstOrDefault();
The biggest code smell here is that you are calling other repositories inside the Where clauses. Assuming those repositories actually hit the database, this means you query the database once for every item. Imagine that OrderRepositoryX.GetOrderByStatus(OrderStatusType.Dispatched, 4000) plus the first Where yields 1000 items: the second Where clause alone can then lead to up to 1000 database queries (and there are more repository calls in the subsequent Where clauses). All of this to get just one item (i.e. FirstOrDefault).
The usual approach is to avoid calling the database in loops (which is effectively what Where does here) and to rewrite the code so that a single SQL query runs against the database, performing all the filtering on the database side and returning only what is needed.
Please try this instead.
You can combine the chained Where clauses into a single predicate. Because && short-circuits, each repository call only runs for items that passed the checks before it.
var order = OrderRepositoryX.GetOrderByStatus(OrderStatusType.Dispatched, 4000)
    .FirstOrDefault(x =>
        (x.siteId == 1 || x.siteId == 10 || x.siteId == 8 || x.siteId == 16 ||
         x.siteId == 26 || x.siteId == 27)
        && OrderRepository.GetOrderById(shared_parameters.testConfiguration, x.orderId)
               .TotalPrice >= 500
        && ReturnOrderRepositoryX.CheckReturnOrderExists(x.orderId) <= 0
        && OrderRepositoryX.CheckActiveOrderJobDetailsByOrderId(x.orderId) <= 0);
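The advice about avoiding database calls in loops can be illustrated with a small in-memory simulation. Everything here is hypothetical (Order, FakeDb, OrderPicker and the sample data are not from the question); the counter only mimics round trips, and a real fix would push the filtering into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    public int OrderId { get; set; }
    public int SiteId { get; set; }
    public decimal TotalPrice { get; set; }
}

// Hypothetical in-memory stand-in for the repositories; counts simulated round trips.
static class FakeDb
{
    public static int RoundTrips;

    public static readonly List<Order> Orders = new List<Order>
    {
        new Order { OrderId = 1, SiteId = 1,  TotalPrice = 600m },
        new Order { OrderId = 2, SiteId = 10, TotalPrice = 700m },
        new Order { OrderId = 3, SiteId = 2,  TotalPrice = 900m },
        new Order { OrderId = 4, SiteId = 8,  TotalPrice = 100m },
    };

    static readonly HashSet<int> ReturnedOrderIds = new HashSet<int> { 1 };

    // One call per order: the pattern to avoid.
    public static bool ReturnExists(int orderId)
    {
        RoundTrips++;
        return ReturnedOrderIds.Contains(orderId);
    }

    // One call covering many orders.
    public static HashSet<int> ReturnsFor(IEnumerable<int> orderIds)
    {
        RoundTrips++;
        return new HashSet<int>(orderIds.Where(id => ReturnedOrderIds.Contains(id)));
    }
}

static class OrderPicker
{
    static readonly int[] Sites = { 1, 8, 10, 16, 26, 27 };

    public static Order PickBatched()
    {
        // Cheap filters first, then a single bulk lookup for the survivors.
        var candidates = FakeDb.Orders
            .Where(o => Sites.Contains(o.SiteId) && o.TotalPrice >= 500m)
            .ToList();
        var returned = FakeDb.ReturnsFor(candidates.Select(o => o.OrderId));
        return candidates.FirstOrDefault(o => !returned.Contains(o.OrderId));
    }

    static void Main()
    {
        var order = PickBatched();
        Console.WriteLine(order.OrderId + " found with " + FakeDb.RoundTrips + " round trip(s)");
    }
}
```

The per-item variant would call ReturnExists once for every candidate it inspects, so its round trips grow with the data, while the batched variant stays at one lookup per repository.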

LINQ to Entities does not recognize the method

This is my Code where I am fetching data.
var list = (from u in _dbContext.Users
where u.IsActive
&& u.IsVisible
&& u.IsPuller.HasValue
&& u.IsPuller.Value
select new PartsPullerUsers
{
AvatarCroppedAbsolutePath = u.AvatarCroppedAbsolutePath,
Bio = u.Bio,
CreateDateTime = u.CreationDate,
Id = u.Id,
ModifieDateTime = u.LastModificationDate,
ReviewCount = u.ReviewsReceived.Count(review => review.IsActive && review.IsVisible),
UserName = u.UserName,
Locations = (from ul in _dbContext.UserLocationRelationships
join l in _dbContext.Locations on ul.LocationId equals l.Id
where ul.IsActive && ul.UserId == u.Id
select new PartsPullerLocation
{
LocationId = ul.LocationId,
Name = ul.Location.Name
}),
Rating = u.GetPullerRating()
});
Now Here is my Extension.
public static int GetPullerRating(this User source)
{
var reviewCount = source.ReviewsReceived.Count(r => r.IsActive && r.IsVisible);
if (reviewCount == 0)
return 0;
var totalSum = source.ReviewsReceived.Where(r => r.IsActive && r.IsVisible).Sum(r => r.Rating);
var averageRating = totalSum / reviewCount;
return averageRating;
}
I have checked the post "LINQ to Entities does not recognize the method",
and I understand that I need to use
public System.Linq.Expressions.Expression<Func<Row52.Data.Entities.User, int>> GetPullerRatingtest
But how?
Thanks
You can use conditionals inside LINQ to Entities queries:
AverageRating = u.ReviewsReceived.Count(r => r.IsActive && r.IsVisible) > 0 ?
u.ReviewsReceived.Where(r => r.IsActive && r.IsVisible).Sum(r => r.Rating) /
u.ReviewsReceived.Count(r => r.IsActive && r.IsVisible)
: 0
This will be calculated by the server and returned as part of your list. Although with 10 million rows, as you said, I would do some serious filtering before executing this.
Code within a LINQ to Entities query is executed within the database, so you can't put arbitrary C# code there. You should either call user.GetPullerRating() after the user is retrieved, or create a property if you don't want to do the calculation every time.
You can also do:
foreach (var u in list)
    u.Rating = u.GetPullerRating();
By the way, why is it an extension method?
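As for the "But how?" part, here is a minimal sketch of the rating logic written as an Expression<Func<User, int>>, so that it stays an expression tree instead of an opaque method call. The User and Review classes below are simplified, hypothetical stand-ins, and AsQueryable() over a list takes the place of the DbSet. Whether a particular provider translates the expression is an assumption to verify, and combining it with other fields inside a larger select new needs extra expression composition that this sketch does not cover.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Simplified, hypothetical versions of the entities from the question.
class Review
{
    public bool IsActive { get; set; }
    public bool IsVisible { get; set; }
    public int Rating { get; set; }
}

class User
{
    public string UserName { get; set; }
    public List<Review> ReviewsReceived { get; set; } = new List<Review>();
}

static class PullerRating
{
    // Same logic as GetPullerRating, expressed as one conditional expression.
    public static readonly Expression<Func<User, int>> Rating = u =>
        u.ReviewsReceived.Count(r => r.IsActive && r.IsVisible) > 0
            ? u.ReviewsReceived.Where(r => r.IsActive && r.IsVisible).Sum(r => r.Rating)
              / u.ReviewsReceived.Count(r => r.IsActive && r.IsVisible)
            : 0;

    static void Main()
    {
        var rated = new User { UserName = "rated" };
        rated.ReviewsReceived.Add(new Review { IsActive = true, IsVisible = true, Rating = 4 });
        rated.ReviewsReceived.Add(new Review { IsActive = true, IsVisible = true, Rating = 5 });
        rated.ReviewsReceived.Add(new Review { IsActive = false, IsVisible = true, Rating = 1 });

        // AsQueryable() stands in for the DbSet; Select accepts the expression directly.
        var users = new List<User> { rated, new User { UserName = "unrated" } }.AsQueryable();
        foreach (var rating in users.Select(Rating))
            Console.WriteLine(rating);
    }
}
```

Note that the division is integer division, as in the original extension method, so ratings of 4 and 5 average to 4.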

Select top n rows in each group in EntityFramework

I'm trying to fetch recent contents of each type, currently I'm using something similar to the following code to fetch n records for each type
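int n = 10;
var contents = Entities.OrderByDescending(i => i.Date);
IQueryable<Content> query = null;
for (int i = 1; i <= 5; i++)
{
    // copy the loop variable: the query is deferred, so capturing i directly
    // would make every sub-query see its final value
    int typeIndex = i;
    if (query == null)
    {
        query = contents.Where(c => c.ContentTypeIndex == typeIndex).Take(n);
    }
    else
    {
        query = query.Concat(contents.Where(c => c.ContentTypeIndex == typeIndex).Take(n));
    }
}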
Another solution would be a stored procedure, but is it possible to do this with grouping in EF? If not, is there any cleaner solution?
contents.Where(c => c.ContentTypeIndex >= 1 && c.ContentTypeIndex <= 5)
.GroupBy(c => c.ContentTypeIndex)
.SelectMany(g => g.Take(n));
Note: if you want to select all types of indexes, then you don't need where filter here.
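The grouping approach can be tried out in memory with LINQ to Objects. This is a sketch with a hypothetical, simplified Content class; it sorts inside each group rather than relying on an ordering applied before GroupBy, which is an assumption about what is safest and not something the answer above states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified version of the entity from the question.
class Content
{
    public int ContentTypeIndex { get; set; }
    public DateTime Date { get; set; }
    public string Title { get; set; }
}

static class TopPerGroup
{
    // Returns the n most recent items of each content type.
    public static List<Content> Latest(IEnumerable<Content> items, int n) =>
        items.GroupBy(c => c.ContentTypeIndex)
             .SelectMany(g => g.OrderByDescending(c => c.Date).Take(n))
             .ToList();

    static void Main()
    {
        var items = new List<Content>
        {
            new Content { ContentTypeIndex = 1, Date = new DateTime(2020, 1, 1), Title = "a" },
            new Content { ContentTypeIndex = 1, Date = new DateTime(2020, 1, 2), Title = "b" },
            new Content { ContentTypeIndex = 1, Date = new DateTime(2020, 1, 3), Title = "c" },
            new Content { ContentTypeIndex = 2, Date = new DateTime(2020, 1, 1), Title = "d" },
        };

        foreach (var c in Latest(items, 2))
            Console.WriteLine(c.ContentTypeIndex + " " + c.Title);
    }
}
```

With n = 2 this keeps "c" and "b" for type 1 and "d" for type 2.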

Improving performance of linq query

I'm optimizing a method with a number of LINQ queries. So far the execution time is around 3 seconds and I'm trying to reduce it. There are quite a lot of operations and calculations happening in the method, but nothing too complex.
I would appreciate any suggestions and ideas on how the performance can be improved and the code optimized.
The whole code of the method (below I'll point out where I have the biggest delay):
public ActionResult DataRead([DataSourceRequest] DataSourceRequest request)
{
CTX.Configuration.AutoDetectChangesEnabled = false;
var repoKomfortaktion = new KomfortaktionRepository();
var komfortaktionen = CTX.Komfortaktionen.ToList();
var result = new List<AqGeplantViewModel>();
var gruppen = new HashSet<Guid?>(komfortaktionen.Select(c => c.KomfortaktionsGruppeId).ToList());
var hochgeladeneKomplettabzuege = CTX.Komplettabzug.Where(c => gruppen.Contains(c.KomfortaktionsGruppeId)).GroupBy(c => new { c.BetriebId, c.KomfortaktionsGruppeId }).Select(x => new { data = x.Key }).ToList();
var teilnehmendeBetriebe = repoKomfortaktion.GetTeilnehmendeBetriebe(CTX, gruppen);
var hochgeladeneSperrlistenPlz = CTX.SperrlistePlz.Where(c => gruppen.Contains(c.KomfortaktionsGruppeId) && c.AktionsKuerzel != null)
.GroupBy(c => new { c.AktionsKuerzel, c.BetriebId, c.KomfortaktionsGruppeId }).Select(x => new { data = x.Key }).ToList();
var hochgeladeneSperrlistenKdnr = CTX.SperrlisteKdnr.Where(c => gruppen.Contains(c.KomfortaktionsGruppeId) && c.AktionsKuerzel != null)
.GroupBy(c => new { c.AktionsKuerzel, c.BetriebId, c.KomfortaktionsGruppeId }).Select(x => new { data = x.Key }).ToList();
var konfigsProAktion = CTX.Order.GroupBy(c => new { c.Vfnr, c.AktionsId }).Select(c => new { count = c.Count(), c.Key.AktionsId, data = c.Key }).ToList();
foreach (var komfortaktion in komfortaktionen)
{
var item = new AqGeplantViewModel();
var zentraleTeilnehmer = teilnehmendeBetriebe.Where(c => c.TeilnahmeStatus.Any(x => x.KomfortaktionId == komfortaktion.Id && x.AktionsTypeId == 1)).ToList();
var lokaleTeilnehmer = teilnehmendeBetriebe.Where(c => c.TeilnahmeStatus.Any(x => x.KomfortaktionId == komfortaktion.Id && x.AktionsTypeId == 2)).ToList();
var hochgeladeneSperrlistenGesamt =
hochgeladeneSperrlistenPlz.Count(c => c.data.AktionsKuerzel == komfortaktion.Kuerzel && c.data.KomfortaktionsGruppeId == komfortaktion.KomfortaktionsGruppeId) +
hochgeladeneSperrlistenKdnr.Count(c => c.data.AktionsKuerzel == komfortaktion.Kuerzel && c.data.KomfortaktionsGruppeId == komfortaktion.KomfortaktionsGruppeId);
item.KomfortaktionId = komfortaktion.KomfortaktionId;
item.KomfortaktionName = komfortaktion.Aktionsname;
item.Start = komfortaktion.KomfortaktionsGruppe.StartAdressQualifizierung.HasValue ? komfortaktion.KomfortaktionsGruppe.StartAdressQualifizierung.Value.ToString("dd.MM.yyyy") : string.Empty;
item.LokalAngemeldet = lokaleTeilnehmer.Count();
item.ZentralAngemeldet = zentraleTeilnehmer.Count();
var anzHochgelandenerKomplettabzuege = hochgeladeneKomplettabzuege.Count(c => zentraleTeilnehmer.Count(x => x.BetriebId == c.data.BetriebId) == 1) +
hochgeladeneKomplettabzuege.Count(c => lokaleTeilnehmer.Count(x => x.BetriebId == c.data.BetriebId) == 1);
item.KomplettabzugOffen = (zentraleTeilnehmer.Count() + lokaleTeilnehmer.Count()) - anzHochgelandenerKomplettabzuege;
item.SperrlisteOffen = (zentraleTeilnehmer.Count() + lokaleTeilnehmer.Count()) - hochgeladeneSperrlistenGesamt;
item.KonfigurationOffen = zentraleTeilnehmer.Count() - konfigsProAktion.Count(c => c.AktionsId == komfortaktion.KomfortaktionId && zentraleTeilnehmer.Any(x => x.Betrieb.Vfnr == c.data.Vfnr));
item.KomfortaktionsGruppeId = komfortaktion.KomfortaktionsGruppeId;
result.Add(item);
}
return Json(result.ToDataSourceResult(request));
}
The first half (before the foreach) takes half a second, which is okay. The biggest delay is inside the foreach statement in the first iteration, and in particular in these lines: executing zentraleTeilnehmer takes 1.5 seconds the first time.
var zentraleTeilnehmer = teilnehmendeBetriebe.Where(c => c.TeilnahmeStatus.Any(x => x.KomfortaktionId == komfortaktion.Id && x.AktionsTypeId == 1)).ToList();
var lokaleTeilnehmer = teilnehmendeBetriebe.Where(c => c.TeilnahmeStatus.Any(x => x.KomfortaktionId == komfortaktion.Id && x.AktionsTypeId == 2)).ToList();
teilnehmendeBetriebe has over 800 items, and the TeilnahmeStatus property normally has around 4 items. So, at most 800*4 iterations, which is not a huge number after all...
Thus, I'm mostly interested in optimizing these lines, hoping to reduce the execution time to half a second or so.
What I tried:
Rewriting the LINQ as a foreach: didn't help, same time... probably not surprising, but it was worth a try.
foreach (var tb in teilnehmendeBetriebe) //836 items
{
foreach (var ts in tb.TeilnahmeStatus) //3377 items
{
if (ts.KomfortaktionId == komfortaktion.Id && ts.AktionsTypeId == 1)
{
testResult.Add(tb);
break;
}
}
}
Selecting particular columns for teilnehmendeBetriebe with .Select(): didn't help either.
Nor did the other small manipulations I tried.
What is interesting: while the first iteration of the foreach can take up to 2 seconds, the second and subsequent ones take just milliseconds, so .NET is apparently able to optimize or reuse the calculated data.
Any advice on what can be changed in order to improve performance is very welcome!
Edit:
TeilnahmeBetriebKomfortaktion.TeilnahmeStatus is loaded eagerly in the method GetTeilnehmendeBetriebe:
public List<TeilnahmeBetriebKomfortaktion> GetTeilnehmendeBetriebe(Connection ctx, HashSet<Guid?> gruppen)
{
return ctx.TeilnahmeBetriebKomfortaktion.Include(
c => c.TeilnahmeStatus).ToList();
}
Edit2:
The query which is sent when executing GetTeilnehmendeBetriebe:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[BetriebId] AS [BetriebId],
[Extent1].[MandantenId] AS [MandantenId],
[Extent1].[CreatedUser] AS [CreatedUser],
[Extent1].[UpdatedUser] AS [UpdatedUser],
[Extent1].[CreatedDate] AS [CreatedDate],
[Extent1].[UpdatedDate] AS [UpdatedDate],
[Extent1].[IsDeleted] AS [IsDeleted]
FROM [Semas].[TeilnahmeBetriebKomfortaktion] AS [Extent1]
WHERE [Extent1].[IsDeleted] <> cast(1 as bit)
My assumption is that TeilnahmeBetriebKomfortaktion.TeilnahmeStatus is a lazy-loaded collection, resulting in the N + 1 problem. You should eagerly fetch that collection to improve performance.
The subsequent iterations of the foreach loop are fast because, after the first iteration, those objects are no longer requested from the database server but are served from memory.
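The "slow first iteration, fast afterwards" pattern can be reproduced without a database. In this sketch, Lazy<T> stands in for a lazy-loaded navigation property and a static counter stands in for database round trips; the Betrieb class and its data are hypothetical and much simpler than the real entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity whose collection is loaded on first access, like a lazy navigation property.
class Betrieb
{
    public static int Loads; // counts simulated database round trips

    private readonly Lazy<List<int>> _status;

    public Betrieb(int id)
    {
        _status = new Lazy<List<int>>(() =>
        {
            Loads++; // one "query" per entity: the N in N + 1
            return new List<int> { id, id + 1 };
        });
    }

    public List<int> TeilnahmeStatus => _status.Value;
}

static class LazyLoadDemo
{
    public static int CountMatches(List<Betrieb> betriebe, int komfortaktionId) =>
        betriebe.Count(b => b.TeilnahmeStatus.Any(s => s == komfortaktionId));

    static void Main()
    {
        var betriebe = Enumerable.Range(0, 800).Select(i => new Betrieb(i)).ToList();

        CountMatches(betriebe, 5);          // first pass touches every collection
        Console.WriteLine(Betrieb.Loads);   // prints 800

        CountMatches(betriebe, 6);          // second pass is served from memory
        Console.WriteLine(Betrieb.Loads);   // still prints 800
    }
}
```

Eager loading corresponds to filling all the collections with one query up front, so that the first pass does not pay for 800 separate loads.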

LINQ to SQL dividing filters for better performance

I have the LINQ to SQL queries below:
var kayitlarFiltreli = from rows in db.TBLP1CARIs
orderby rows.ID descending
where rows.HESAPADI.ToLower().Contains(filter.ToLower()) ||
(rows.CARITURU == "Bireysel" ?
rows.B_ADSOYAD.ToLower().Contains(filter.ToLower()) :
rows.K_FIRMAADI.ToLower().Contains(filter.ToLower())) ||
rows.ID.ToString().Contains(filter)
select rows;
var kayitlarBakiyeli = from rows in kayitlarFiltreli
select new
{
HESAPNO = rows.ID,
HESAPADI = rows.HESAPADI,
CARIADI = (rows.CARITURU == "Bireysel" ? rows.B_ADSOYAD : rows.K_FIRMAADI),
Bakiye = get_bakiye(rows.ID, rows.LISTEPARABIRIMI)
};
var kayitlarSon = from rows in kayitlarBakiyeli
select new
{ rows.HESAPNO,
rows.HESAPADI,
rows.CARIADI,
Bakiye = rows.Bakiye.Contains(".") == true ?
rows.Bakiye.TrimEnd('0').TrimEnd('.') :
rows.Bakiye
};
I am having a performance problem: the queries take at least 15 seconds to respond, and when deployed to the website, the page that uses these queries takes at least 5 seconds to fill a GridView. get_bakiye(p1,p2,..) is a long method with a for, a foreach and a LINQ to SQL query in it. I think most of the time is spent in get_bakiye. I have already struggled with it and reduced the response time by about 2 seconds, but it is still slow, so I am trying to make the above queries work faster.
I tried
var kayitlarSirali = from rows in db.TBLP1CARIs
orderby rows.ID descending
select rows;
var kayitlarFiltreli = from rows in kayitlarSirali
where rows.HESAPADI.ToLower().Contains(filter.ToLower()) ||
(rows.CARITURU == "Bireysel" ?
rows.B_ADSOYAD.ToLower().Contains(filter.ToLower()) :
rows.K_FIRMAADI.ToLower().Contains(filter.ToLower())) ||
rows.ID.ToString().Contains(filter)
select rows;
And the rest is the same.
Basically I just separated the filtering part with Contains(), and I am not sure whether that helps much.
Is it good to separate the where clauses (the filters) when querying the database, and is it better for performance to query the database once, get the results into an in-memory IQueryable, and do the rest on that?
What do you recommend to make these queries work faster?
This is the get_bakiye() method, which I did not write entirely myself but which I am supposed to make faster.
public static string get_bakiye(int cari_id, string birim_kod)
{
return get_bakiye(cari_id, DAL.DAOCari.GetEntity(cari_id).LISTEPARABIRIMI, null,false);
}
public static string get_bakiye(int cari_id, string birim_kod, List<BAL.P_CariBakiyeTablosu> custom_rapor, bool borcluTespit)
{
VeriyazDBDataContext db = new VeriyazDBDataContext(); db.Connection.ConnectionString = System.Configuration.ConfigurationManager.ConnectionStrings["LocalSqlServer"].ConnectionString;
decimal return_bakiye = 0;
if (birim_kod == null || birim_kod.Trim() == "") //default
birim_kod = "TL";
//compute the carried-over balance:
List<BAL.P_CariBakiyeTablosu> bakiyeler = new List<BAL.P_CariBakiyeTablosu>();
if (custom_rapor == null)
bakiyeler = CariBakiyeRaporuOlustur(cari_id, true);
else
bakiyeler = custom_rapor;
bakiyeler.RemoveAt(0);
List<TBLP1DOVIZTANIMLARI> dovizTanimlariTumListe = DAL.DAOdoviztanimlari.SelectAll().ToList();
//when computing the carried-over balances, uses the default rate from the döviztanimlari table
for (int i = 0; i < bakiyeler.Count; i++)
{
if (bakiyeler[i].DOVIZ == birim_kod)
{
return_bakiye = return_bakiye + Convert.ToDecimal(bakiyeler[i].DEVIR);
}
else
{
decimal from_kur = 1;
from_kur = dovizTanimlariTumListe.Where(rows => rows.TBLP1DOVIZLER.KOD == bakiyeler[i].DOVIZ).FirstOrDefault().VARSAYILANKUR.GetValueOrDefault(1);
decimal to_kur = 1;
to_kur = dovizTanimlariTumListe.Where(rows => rows.TBLP1DOVIZLER.KOD == birim_kod).First().VARSAYILANKUR.GetValueOrDefault(1);
return_bakiye = return_bakiye + (Convert.ToDecimal(bakiyeler[i].DEVIR) * (from_kur / to_kur));
}
}
//compute the transaction balance:
var islemler = from rows in db.TBLP1ISLEMs
where
rows.CARI_ID == cari_id &&
rows.TEKLIF.GetValueOrDefault(false) == false &&
rows.SOZLESME.GetValueOrDefault(false) == false &&
(rows.SIPARISDURUMU == "İşlem Tamamlandı" ||
rows.SIPARISDURUMU == "Hazırlanıyor" ||
rows.SIPARISDURUMU == "" ||
rows.SIPARISDURUMU == null)
select rows;
//var dovizKuruOlanIslemler = from dovizKuruRow in db.TBLP1DOVIZKURUs
// select dovizKuruRow.ISLEM_ID;
foreach (var item in islemler)
{
decimal from_kur = 1;
decimal fromKurVarsayilan = 1;
//fetches the default rate of the specified currency
fromKurVarsayilan = dovizTanimlariTumListe.Where(rows => rows.TBLP1DOVIZLER.KOD == item.PARABIRIMI).FirstOrDefault().VARSAYILANKUR.GetValueOrDefault(1);
try
{
from_kur = item.KURDEGERI.Value;
//the line below used to fetch the transaction's rate from the dövizkuru table; after the KURDEGERİ column was added to the transaction table
//this was no longer needed, and the line above takes the transaction's rate from the transaction table.
//from_kur = item.TBLP1DOVIZKURUs.Where(rows => rows.DOVIZBIRIM == item.PARABIRIMI).FirstOrDefault().KUR.GetValueOrDefault();
}
catch
{
from_kur = fromKurVarsayilan;
}
//fetches the default rate of the account's currency
decimal to_kur = 1;
decimal toKurVarsayilan = 1;
toKurVarsayilan = dovizTanimlariTumListe.Where(rows => rows.TBLP1DOVIZLER.KOD == birim_kod).FirstOrDefault().VARSAYILANKUR.GetValueOrDefault(1);
to_kur = toKurVarsayilan;
if (item.CARIISLEMTURU == "BORC")
{
return_bakiye = return_bakiye + (Convert.ToDecimal(item.GENELTOPLAM) * (from_kur / to_kur));
}
if (item.CARIISLEMTURU == "ALACAK")
{
return_bakiye = return_bakiye - (Convert.ToDecimal(item.GENELTOPLAM) * (from_kur / to_kur));
}
}
string returnBakiyeParaFormatli = DAL.Format.ParaDuzenle.ParaFormatDuzenle(return_bakiye.ToString());
if (borcluTespit==true)
{
return return_bakiye.ToString();
}
if (returnBakiyeParaFormatli.Contains(".") == true)
{
return returnBakiyeParaFormatli.TrimEnd('0').TrimEnd('.') + " " + birim_kod;
}
else
{
return returnBakiyeParaFormatli + " " + birim_kod;
}
}
}
In general, I think you need to understand what causes LINQ to SQL to execute a query against your database. Extension methods such as ToList(), First(), FirstOrDefault() and Single() cause LINQ to SQL to execute a command against the database. One line that concerns me is:
List<TBLP1DOVIZTANIMLARI> dovizTanimlariTumListe = DAL.DAOdoviztanimlari.SelectAll().ToList();
This seems to fetch every row from the database table that DAOdoviztanimlari is mapped to. The result is then queried in memory.
This then happens for every record in the queries that call get_bakiye()!
Ultimately (in a perfect world) you want get_bakiye() to contain none of the extension methods I have mentioned and to return IQueryable<string>, and then let LINQ to SQL decide how it optimizes and executes the SQL.
First, profile your database and see how long the generated queries take to execute and how often they execute.
If the profiling shows that you are executing the same query against the database repeatedly, or that a query takes too long to execute, you should consider loading the values into memory and accessing them from there. You could also consider compiled queries as an alternative, if that's plausible.
I had a similar problem a little while ago that I solved by creating a separate class that loads the values I needed into a collection and updates them when necessary.
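A minimal sketch of such a caching class, under stated assumptions: RateCache, its loader delegate and the sample rates are all hypothetical, and the fallback rate of 1 mirrors the GetValueOrDefault(1) calls in the question. The class is not thread-safe; a web application would need locking or a concurrent collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: loads all default exchange rates once and serves lookups from memory.
class RateCache
{
    private readonly Func<Dictionary<string, decimal>> _load;
    private Dictionary<string, decimal> _rates;

    public int LoadCount { get; private set; }

    public RateCache(Func<Dictionary<string, decimal>> load)
    {
        _load = load;
    }

    public decimal GetRate(string currencyCode)
    {
        if (_rates == null) Refresh(); // the first lookup triggers the only load
        decimal rate;
        return _rates.TryGetValue(currencyCode, out rate) ? rate : 1m;
    }

    // Call when the underlying table has changed.
    public void Refresh()
    {
        _rates = _load();
        LoadCount++;
    }
}

static class RateCacheDemo
{
    public static decimal ConvertAmount(RateCache cache, decimal amount, string from, string to) =>
        amount * (cache.GetRate(from) / cache.GetRate(to));

    static void Main()
    {
        // The loader would normally run one database query; these rates are made up.
        var cache = new RateCache(() => new Dictionary<string, decimal>
        {
            ["TL"] = 1m,
            ["USD"] = 30m,
        });

        Console.WriteLine(ConvertAmount(cache, 10m, "USD", "TL"));
        Console.WriteLine(ConvertAmount(cache, 5m, "TL", "USD"));
        Console.WriteLine(cache.LoadCount); // the loader ran once for both conversions
    }
}
```

Sharing one such instance across all get_bakiye() calls would replace the per-call SelectAll().ToList() with a single load.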