Saving rows and columns in a database - C#

I am trying to save a large CSV file into the database. The file I am using has about 7,000 rows, and each row contains 14 columns. I have to generate and tag each column of every row with a topic ID that I pass in my API. After saving each item, I loop through the actual data and use the generated ID to save each value in another table. My problem is that I have nested foreach loops, and in the first loop I call db.SaveChanges() after taking each column in every row, so that I can reference the generated ID. That is A LOT of SaveChanges() calls made before the data is even processed.
For example:
public static void Save(TopicRequest req)
{
    using (var db = new DbContext())
    {
        foreach (var row in req.items)
        {
            var obj = new Entity1
            {
                topicId = req.topicId,
                year = req.year
            };
            db.Add(obj);
            db.SaveChanges(); // one round trip per row, just to get obj.id
            foreach (var col in row)
            {
                var newData = new Entity2
                {
                    TopicObjId = obj.id,
                    Value = col
                };
                db.TopicData.Add(newData);
            }
            db.SaveChanges(); // another round trip per row for its columns
        }
    }
}
So for a 7,000-row file with 14 columns, my first loop ends up calling the database to save 98,000 times (7,000 × 14). This is causing a timeout, and the file is not saved. How can I properly handle this much data in this way?

I suggest using AddRange to improve performance (see "Add vs AddRange").
Here's an example:
public async Task Save(TopicRequest req)
{
    using (var db = new DbContext())
    {
        // Keep each new parent paired with its source row, so the row's
        // columns can still be saved once the parent has an ID.
        var pairs = req.items
            .Select(row => new
            {
                Parent = new Entity1 { topicId = req.topicId, year = req.year },
                Row = row
            })
            .ToList();

        db.Topic.AddRange(pairs.Select(p => p.Parent));
        // One round trip for all parents. EF writes each generated identity
        // value back into Parent.id, so no reload is needed.
        await db.SaveChangesAsync();

        var list2 = new List<Entity2>();
        foreach (var pair in pairs)
        {
            foreach (var col in pair.Row)
            {
                list2.Add(new Entity2
                {
                    TopicObjId = pair.Parent.id, // the generated parent ID, not topicId
                    Value = col
                });
            }
        }
        db.TopicData.AddRange(list2);
        await db.SaveChangesAsync();
    }
}
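If a single SaveChangesAsync() over all ~98,000 child rows is still too slow or still times out, a common middle ground is to add and save the rows in fixed-size batches. The helper below is a minimal, EF-free sketch of the batching part only; the name InBatches and the batch size of 1,000 are assumptions to tune, and in the real Save method each batch would be passed to db.TopicData.AddRange(...) followed by await db.SaveChangesAsync().

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    // (.NET 6 and later also ship Enumerable.Chunk, which does the same job.)
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch, if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

With 98,000 child rows and a batch size of 1,000 this yields 98 batches, so 98 SaveChangesAsync() calls instead of one per row, while keeping each individual save small.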

Related

How can I ensure rows are not loaded twice with EF / LINQ

I created code to load definitions from an external API. The code iterates through a list of words, looks up a definition for each, and then uses EF to insert these into my SQL Server database.
However if I run this twice it will load the same definitions the second time. Is there a way that I could make it so that EF does not add the row if it already exists?
public IHttpActionResult LoadDefinitions()
{
var words = db.Words
.AsNoTracking()
.ToList();
foreach (var word in words)
{
HttpResponse<string> response = Unirest.get("https://wordsapiv1.p.mashape.com/words/" + word)
.header("X-Mashape-Key", "xxxx")
.header("Accept", "application/json")
.asJson<string>();
RootObject rootObject = JsonConvert.DeserializeObject<RootObject>(response.Body);
var results = rootObject.results;
foreach (var result in results)
{
var definition = new WordDefinition()
{
WordId = word.WordId,
Definition = result.definition
};
db.WordDefinitions.Add(definition);
}
db.SaveChanges();
}
return Ok();
}
I would also appreciate any suggestions on how I could better implement this loading.
foreach (var result in results)
{
if(!(from d in db.WordDefinitions where d.Definition == result.definition select d).Any())
{
var definition = new WordDefinition()
{
WordId = word.WordId,
Definition = result.definition
};
db.WordDefinitions.Add(definition);
}
}
You can first search for a row with the same Definition value:
var wd = db.WordDefinitions.FirstOrDefault(x => x.Definition == result.definition);
if (wd == null)
{
    var definition = new WordDefinition()
    {
        WordId = word.WordId,
        Definition = result.definition
    };
    db.WordDefinitions.Add(definition);
}
This way, if a WordDefinition with your value already exists, you get it back in wd and nothing new is added.
You can also include WordId in the same check, so that the same definition text under a different word is still added:
var wd = db.WordDefinitions.FirstOrDefault(x => x.WordId == word.WordId && x.Definition == result.definition);
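Both snippets above still send one query to the database per definition. If the word list is large, an alternative is to load the existing (WordId, Definition) pairs once into a HashSet and check against it in memory; the set also catches duplicates that arrive within the same API response. The sketch below shows only that in-memory filtering. The class and method names are hypothetical, WordId is assumed to be an int, and in the real code `existing` would come from a single query such as db.WordDefinitions.Select(d => new { d.WordId, d.Definition }).ToList(), converted to tuples.

```csharp
using System.Collections.Generic;

public static class DefinitionDedup
{
    // Returns only the (WordId, Definition) pairs that are not already stored
    // and have not appeared earlier in 'incoming'.
    public static List<(int WordId, string Definition)> FilterNew(
        IEnumerable<(int WordId, string Definition)> existing,
        IEnumerable<(int WordId, string Definition)> incoming)
    {
        // One in-memory set instead of one database query per definition.
        var seen = new HashSet<(int, string)>(existing);
        var result = new List<(int WordId, string Definition)>();

        foreach (var pair in incoming)
        {
            // HashSet.Add returns false when the pair is already present.
            if (seen.Add(pair))
                result.Add(pair);
        }
        return result;
    }
}
```

Each pair that FilterNew returns would then become a new WordDefinition, followed by one SaveChanges() per word (or per batch of words).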

Very slow runtime with Entity Framework nested loop (using nav properties)

Right now, I'm trying to write a method for a survey submission program that uses a highly normalized schema.
I have a method that is meant to generate a survey for a team of people, linking several different EF models together in the process. However, this method runs EXTREMELY slowly for anything but the smallest team sizes (11.2 seconds for a 4-person team, and a whopping 103.9 seconds for an 8-person team). After some analysis, I found that 75% of the runtime is taken up by the following block of code:
var TeamMembers = db.TeamMembers.Where(m => m.TeamID == TeamID && m.OnTeam).ToList();
foreach (TeamMember TeamMember in TeamMembers)
{
Employee employee = db.Employees.Find(TeamMember.EmployeeID);
SurveyForm form = new SurveyForm();
form.Submitter = employee;
form.State = "Not Submitted";
form.SurveyGroupID = surveygroup.SurveyGroupID;
db.SurveyForms.Add(form);
db.SaveChanges();
foreach (TeamMember peer in TeamMembers)
{
foreach (SurveySectionDetail SectionDetail in sectionDetails)
{
foreach (SurveyAttributeDetail AttributeDetail in attributeDetails.Where(a => a.SectionDetail.SurveySectionDetailID == SectionDetail.SurveySectionDetailID) )
{
SurveyAnswer answer = new SurveyAnswer();
answer.Reviewee = peer;
answer.SurveyFormID = form.SurveyFormID;
answer.Detail = AttributeDetail;
answer.SectionDetail = SectionDetail;
db.SurveyAnswers.Add(answer);
db.SaveChanges();
}
}
}
}
I'm really at a loss as to how I might cut down the runtime. Is this just the price I pay for having this many related entities? I know that joins are expensive operations, and that I've essentially got 3 ... Or is there some inefficiency that I'm overlooking?
Thanks for your help!
EDIT: As requested by Xiaoy312, here's how sectionDetails and attributeDetails are defined:
SurveyTemplate template = db.SurveyTemplates.Find(SurveyTemplateID);
List<SurveySectionDetail> sectionDetails = new List<SurveySectionDetail>();
List<SurveyAttributeDetail> attributeDetails = new List<SurveyAttributeDetail>();
foreach (SurveyTemplateSection section in template.SurveyTemplateSections)
{
SurveySectionDetail SectionDetail = new SurveySectionDetail();
SectionDetail.SectionName = section.SectionName;
SectionDetail.SectionOrder = section.SectionOrder;
SectionDetail.Description = section.Description;
SectionDetail.SurveyGroupID = surveygroup.SurveyGroupID;
db.SurveySectionDetails.Add(SectionDetail);
sectionDetails.Add(SectionDetail);
db.SaveChanges();
foreach (SurveyTemplateAttribute attribute in section.SurveyTemplateAttributes)
{
SurveyAttributeDetail AttributeDetail = new SurveyAttributeDetail();
AttributeDetail.AttributeName = attribute.AttributeName;
AttributeDetail.AttributeScale = attribute.AttributeScale;
AttributeDetail.AttributeType = attribute.AttributeType;
AttributeDetail.AttributeOrder = attribute.AttributeOrder;
AttributeDetail.SectionDetail = SectionDetail;
db.SurveyAttributeDetails.Add(AttributeDetail);
attributeDetails.Add(AttributeDetail);
db.SaveChanges();
}
}
There are several points you can improve:
Do not call SaveChanges() after each Add():
foreach (TeamMember TeamMember in TeamMembers)
{
    ...
    // db.SaveChanges();
    foreach (TeamMember peer in TeamMembers)
    {
        foreach (SurveySectionDetail SectionDetail in sectionDetails)
        {
            foreach (SurveyAttributeDetail AttributeDetail in attributeDetails.Where(a => a.SectionDetail.SurveySectionDetailID == SectionDetail.SurveySectionDetailID))
            {
                ...
                // db.SaveChanges();
                // Note: form.SurveyFormID is not generated until SaveChanges runs, so link
                // the answer to the form through a navigation property (if SurveyAnswer has one).
            }
        }
    }
    db.SaveChanges();
}
Consider reducing the number of round trips to the database. This can be done by:
using Include() to preload your navigation properties; or
caching part or all of a table with ToDictionary() or ToLookup() (these approaches are memory-intensive).
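As an illustration of the caching idea, ToLookup groups a list once, so each later lookup is a hash lookup instead of a scan of the whole list (the attributeDetails.Where(...) inside the section loop in the question scans every attribute for every section). AttributeRow and SectionIndex below are hypothetical stand-ins for SurveyAttributeDetail and your own code; the same ToLookup call works on a list already loaded from the database.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SurveyAttributeDetail: the section it belongs to and a name.
public record AttributeRow(int SectionId, string Name);

public static class SectionIndex
{
    // ToLookup runs immediately and builds the index in a single pass.
    // Afterwards, index[sectionId] is an average O(1) lookup instead of an
    // O(n) Where() scan per section, and a missing key simply yields an
    // empty sequence rather than throwing.
    public static ILookup<int, AttributeRow> BySection(IEnumerable<AttributeRow> attributes) =>
        attributes.ToLookup(a => a.SectionId);
}
```

ToDictionary is the better fit when each key maps to exactly one row (for example, employees by ID); ToLookup fits one-to-many groupings like attributes by section.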
Instead of Add(), use AddRange(), or even BulkInsert() from EntityFramework.BulkInsert if that fits your setup:
db.SurveyAnswers.AddRange(
TeamMembers.SelectMany(p =>
sectionDetails.SelectMany(s =>
attributeDetails.Where(a => a.SectionDetail.SurveySectionDetailID == s.SurveySectionDetailID)
.Select(a => new SurveyAnswer()
{
Reviewee = p,
SurveyFormID = form.SurveyFormID,
Detail = a,
SectionDetail = s,
}))));
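The SelectMany version above also makes it easy to see why the runtime grows so quickly. Every team member reviews every member (themselves included, as in the original loops) on every attribute, so the number of SurveyAnswer rows is members × members × attributes, and the original code issued one SaveChanges per row. The EF-free sketch below builds the same (reviewer, reviewee, attribute) combinations with SelectMany; the AnswerGrid name is hypothetical, and plain strings stand in for the entities.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnswerGrid
{
    // Mirrors the nested foreach loops: for each reviewer, for each reviewee,
    // for each attribute, produce one answer slot.
    public static List<(string Reviewer, string Reviewee, string Attribute)> Build(
        IReadOnlyList<string> members,
        IReadOnlyList<string> attributes) =>
        members
            .SelectMany(reviewer => members.SelectMany(reviewee =>
                attributes.Select(attribute =>
                    (Reviewer: reviewer, Reviewee: reviewee, Attribute: attribute))))
            .ToList();
}
```

With 3 attributes, a 4-person team gives 48 answers and an 8-person team gives 192: doubling the team quadruples the number of rows, and therefore the number of per-row SaveChanges calls in the original code.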
Use Include to avoid the SELECT N+1 problem:
SurveyTemplate template = db.SurveyTemplates.Include("SurveyTemplateSections")
.Include("SurveyTemplateSections.SurveyTemplateAttributes")
.First(x=> x.SurveyTemplateID == SurveyTemplateID);
Generate the whole object graph first, and then save it to the DB:
List<SurveySectionDetail> sectionDetails = new List<SurveySectionDetail>();
List<SurveyAttributeDetail> attributeDetails = new List<SurveyAttributeDetail>();
foreach (SurveyTemplateSection section in template.SurveyTemplateSections)
{
SurveySectionDetail SectionDetail = new SurveySectionDetail();
//Some code
sectionDetails.Add(SectionDetail);
foreach (SurveyTemplateAttribute attribute in section.SurveyTemplateAttributes)
{
SurveyAttributeDetail AttributeDetail = new SurveyAttributeDetail();
//some code
attributeDetails.Add(AttributeDetail);
}
}
db.SurveySectionDetails.AddRange(sectionDetails);
db.SurveyAttributeDetails.AddRange(attributeDetails);
db.SaveChanges();
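Building the whole graph before saving works because children can point at their parent object rather than at its not-yet-generated ID; EF fills in the foreign keys when it saves. The toy code below imitates that idea in memory so it can run without a database. ToySection, ToyAttribute and ToyContext.Save are hypothetical stand-ins for the section/attribute entities and a single SaveChanges(); this is only an illustration of the idea, not how EF is implemented internally.

```csharp
using System.Collections.Generic;

// Toy parent with an identity-style Id.
public class ToySection
{
    public int Id; // stays 0 until "saved"
}

// Toy child that refers to its parent by object reference,
// like an EF navigation property.
public class ToyAttribute
{
    public ToySection Section; // set while building the graph, before any Id exists
    public int SectionId;      // foreign key value, filled in at "save" time
}

public static class ToyContext
{
    // Stand-in for one SaveChanges(): give every parent an Id,
    // then copy it into each child's foreign key.
    public static void Save(IEnumerable<ToySection> sections, IEnumerable<ToyAttribute> attributes)
    {
        int nextId = 1;
        foreach (var section in sections)
            section.Id = nextId++;

        foreach (var attribute in attributes)
            attribute.SectionId = attribute.Section.Id;
    }
}
```

This is why the attribute details in the answer set AttributeDetail.SectionDetail (the object) rather than an ID: the single SaveChanges() at the end can then resolve all the keys at once.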
Load all the employees you need before the loop; this avoids a database query for every team member.
var teamMembers = db.TeamMembers.Where(m => m.TeamID == TeamID && m.OnTeam).ToList();
var employeeIds = teamMembers.Select(m => m.EmployeeID).ToList();
// ToDictionary runs the query once, so lookups inside the loop stay in memory.
// (Assumes Employee's key property is EmployeeID, as TeamMember.EmployeeID suggests.)
var employees = db.Employees.Where(e => employeeIds.Contains(e.EmployeeID))
                            .ToDictionary(e => e.EmployeeID);
Create a lookup of attributeDetails keyed by section detail ID, to avoid querying the list on every iteration:
var attributeDetailsBySection = attributeDetails.ToLookup(a => a.SectionDetail.SurveySectionDetailID);
Move saving of SurveyAnswers and SurveyForms outside of the loops:
List<SurveyForm> forms = new List<SurveyForm>();
List<SurveyAnswer> answers = new List<SurveyAnswer>();
foreach (TeamMember teamMember in teamMembers)
{
    Employee employee = employees[teamMember.EmployeeID];
    SurveyForm form = new SurveyForm();
    //some code
    forms.Add(form);
    foreach (TeamMember peer in teamMembers)
    {
        foreach (SurveySectionDetail SectionDetail in sectionDetails)
        {
            foreach (SurveyAttributeDetail AttributeDetail in
                     attributeDetailsBySection[SectionDetail.SurveySectionDetailID])
            {
                SurveyAnswer answer = new SurveyAnswer();
                //some code
                answers.Add(answer);
            }
        }
    }
}
db.SurveyAnswers.AddRange(answers);
db.SurveyForms.AddRange(forms);
db.SaveChanges();
Finally, if you want faster insertions, you can use EntityFramework.BulkInsert. With this extension, you can save the data like this:
db.BulkInsert(answers);
db.BulkInsert(forms);

Trying to access variable from outside foreach loop

The application I am building allows a user to upload a .csv file, which will ultimately fill in fields of an existing SQL table where the Ids match. First, I use LinqToCsv and a foreach loop to import the .csv into a temporary table. Then I have another foreach loop in which I try to copy the rows from the temporary table into the existing table where the Ids match.
Controller Action to complete this process:
[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
var inputFileDescription = new CsvFileDescription
{
SeparatorChar = ',',
FirstLineHasColumnNames = true
};
var cc = new CsvContext();
var filePath = uploadFile(csvFile.InputStream);
var model = cc.Read<Credit>(filePath, inputFileDescription);
try
{
var entity = new TestEntities();
var tc = new TemporaryCsvUpload();
foreach (var item in model)
{
tc.Id = item.Id;
tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
tc.CreditInvoiceDate = item.CreditInvoiceDate;
tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
tc.CreditDeniedDate = item.CreditDeniedDate;
tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
tc.CreditDeniedNotes = item.CreditDeniedNotes;
entity.TemporaryCsvUploads.Add(tc);
}
var idMatches = entity.Authorizations.ToList().Where(x => x.Id == tc.Id);
foreach (var number in idMatches)
{
number.CreditInvoiceDate = tc.CreditInvoiceDate;
number.CreditInvoiceNumber = tc.CreditInvoiceNumber;
number.CreditInvoiceAmount = tc.CreditInvoiceAmount;
number.CreditDeniedDate = tc.CreditDeniedDate;
number.CreditDeniedReasonId = tc.CreditDeniedReasonId;
number.CreditDeniedNotes = tc.CreditDeniedNotes;
}
entity.SaveChanges();
entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");
TempData["Success"] = "Updated Successfully";
}
catch (LINQtoCSVException)
{
TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
}
return View("Upload");
}
The issue in the above code is that tc is inside the first loop, but the matches are defined after the loop with var idMatches = entity.Authorizations.ToList().Where(x => x.Id == tc.Id);, so I am only getting the last item of the first loop.
So I would need to put var idMatches = entity.Authorizations.ToList().Where(x => x.Id == tc.Id); in the first loop, but then I can't access it in the second. If I nest the second loop, it is way too slow. Is there any way I could put the above statement in the first loop and still access it? Or any other ideas to accomplish the same thing? Thanks!
Instead of using multiple loops, keep track of processed IDs as you go and then exclude any duplicates.
[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
var inputFileDescription = new CsvFileDescription
{
SeparatorChar = ',',
FirstLineHasColumnNames = true
};
var cc = new CsvContext();
var filePath = uploadFile(csvFile.InputStream);
var model = cc.Read<Credit>(filePath, inputFileDescription);
try
{
var entity = new TestEntities();
var tcIdFound = new HashSet<string>(); // assumes Credit.Id is a string; use its actual type
foreach (var item in model)
{
// HashSet.Add returns false if this Id was already seen, so duplicates are skipped
if (!tcIdFound.Add(item.Id))
{
continue;
}
var tc = new TemporaryCsvUpload();
tc.Id = item.Id;
tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
tc.CreditInvoiceDate = item.CreditInvoiceDate;
tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
tc.CreditDeniedDate = item.CreditDeniedDate;
tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
tc.CreditDeniedNotes = item.CreditDeniedNotes;
entity.TemporaryCsvUploads.Add(tc);
}
entity.SaveChanges();
entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");
TempData["Success"] = "Updated Successfully";
}
catch (LINQtoCSVException)
{
TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
}
return View("Upload");
}
If you want to make sure you get the last value for any duplicate ids, then store each TemporaryCsvUpload record in a dictionary instead of using only a HashSet. Same basic idea though.
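A minimal sketch of the dictionary variant, which also covers the original goal of copying the values onto the matching Authorizations rows in a single pass with no nested loop. CreditRow, AuthorizationRow and CreditMerge are hypothetical stand-ins with only two of the columns, and the string Id type is an assumption. In the real action, the authorizations would be loaded once (for example with entity.Authorizations.Where(a => ids.Contains(a.Id)).ToList(), where ids is a hypothetical list of the CSV Ids), followed by a single SaveChanges().

```csharp
using System.Collections.Generic;

public class CreditRow
{
    public string Id;
    public string CreditInvoiceNumber;
    public decimal? CreditInvoiceAmount;
}

public class AuthorizationRow
{
    public string Id;
    public string CreditInvoiceNumber;
    public decimal? CreditInvoiceAmount;
}

public static class CreditMerge
{
    // Copies CSV values onto matching authorizations; returns how many were updated.
    public static int Apply(IEnumerable<CreditRow> csvRows, IEnumerable<AuthorizationRow> authorizations)
    {
        // Later CSV rows overwrite earlier ones, so the last value wins for duplicate Ids.
        var byId = new Dictionary<string, CreditRow>();
        foreach (var row in csvRows)
            byId[row.Id] = row;

        int updated = 0;
        foreach (var auth in authorizations)
        {
            // Average O(1) lookup instead of scanning the CSV rows for every authorization.
            if (byId.TryGetValue(auth.Id, out var row))
            {
                auth.CreditInvoiceNumber = row.CreditInvoiceNumber;
                auth.CreditInvoiceAmount = row.CreditInvoiceAmount;
                updated++;
            }
        }
        return updated;
    }
}
```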
Declare idMatches before the first loop (for example, as an empty list) so that you can use it inside both loops. If you just reassign it with a simple Where in each iteration, you'll still end up with only the matches from the last iteration, so you need to append each iteration's results to the existing list instead.
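For this second suggestion, "appending" means adding each iteration's matches to a list declared before the loop, rather than reassigning the variable. A minimal sketch with hypothetical in-memory data (plain string Ids stand in for the entities); note that it still scans all authorizations once per CSV row, which is the nested-loop cost the dictionary approach avoids.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MatchCollector
{
    // Collects every authorization Id that matches some CSV row Id.
    public static List<string> CollectMatches(IEnumerable<string> csvIds, IReadOnlyList<string> authorizationIds)
    {
        // Declared once, before the loop, so it survives across iterations.
        var idMatches = new List<string>();
        foreach (var id in csvIds)
        {
            // Append this iteration's matches instead of replacing the previous ones.
            idMatches.AddRange(authorizationIds.Where(x => x == id));
        }
        return idMatches;
    }
}
```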

How to get multiple inserted identity values in Entity Framework

I am inserting values into the table QueryList:
[QueryID]  [WorkItemID]  [RaisedBy]
1          123           xyz
2          234           abc
where QueryID is an identity column.
I am using a foreach loop and inserting more than one row at a time. My question is: how do I get all the newly inserted identity values in Entity Framework 3.5?
This is my code
using (TransactionScope currentScope = new TransactionScope())
{
Query newQuery = new Query();
foreach (long workItemId in workItemID)
{
newQuery = new Query();
...
currentScope.Complete();
success = true;
}
}
entityCollection.SaveChanges(true);
int QueryID = newQuery.QueryID; //It gives me last 1 Identity value
You have to track each newly created Query object separately. I suggest using a List<Query> for simplicity:
List<Query> newQueries = new List<Query>();
using (TransactionScope currentScope = new TransactionScope())
{
    foreach (long workItemId in workItemID)
    {
        Query newQuery = new Query();
        newQueries.Add(newQuery);
        ...
    }
    // Save inside the scope so the inserts belong to the transaction,
    // and complete the scope once, after all the work has succeeded.
    entityCollection.SaveChanges(true);
    currentScope.Complete();
    success = true;
}
// SaveChanges writes each generated identity value back to its Query object.
var queryIDs = newQueries.Select(q => q.QueryID).ToList();
Side note: in your code sample, you created a Query object outside of the foreach loop but didn't use it. This may just be because it's a sample, but if you don't actually use it or add it to your data context, don't create it.

Insert and update multiple items in the database at the same time

I have an empty database.
I want to add multiple records to the database.
While inserting records, I want to check whether my product was already inserted on the same date; if so, don't add it again (instead, I want to change some of its fields and update its content).
I used this code, but it only adds data to the database (it can't detect an existing product).
var AllData = ClsDataBase.Database.InsertProductTbls;
foreach (var item in AllData)
{
    //Update
    if (Exist(item.datefa))
    {
        var query = ClsDataBase.Database.CustomerProductTbls.SingleOrDefault
            (data => data.CustomerId == item.CustomerId);
        int? LastProductTotal = query.CustomerProducTtotal;
        query.CustomerProducTtotal = LastProductTotal + ClsInsertProduct._InsertProductNumber;
    }
    //Insert
    else
    {
        _CustomerProductTbl = new CustomerProductTbl();
        _CustomerProductTbl.CustomerId = item.CustomerId;
        _CustomerProductTbl.CustomerProductDateFa = item.datefa;
        ...
        ClsDataBase.Database.AddToCustomerProductTbls(_CustomerProductTbl);
    }
}
ClsDataBase.Database.SaveChanges();
If I call ClsDataBase.Database.SaveChanges() in both the update and the insert parts, I get this error:
An error occurred while starting a transaction on the provider connection. See the inner exception for details.
Please help.
I solved it by creating a new context (and therefore a new database connection) for each loop iteration:
foreach (var item in AllData)
{
    using (StorageEntities context = new StorageEntities())
    {
        //Update
        if (Exist(item.datefa))
        {
            var query = context.CustomerProductTbls.SingleOrDefault
                (data => data.CustomerId == item.CustomerId);
            int? LastProductTotal = query.CustomerProducTtotal;
            query.CustomerProducTtotal = LastProductTotal + ClsInsertProduct._InsertProductNumber;
        }
        //Insert
        else
        {
            _CustomerProductTbl = new CustomerProductTbl();
            _CustomerProductTbl.CustomerId = item.CustomerId;
            _CustomerProductTbl.CustomerProductDateFa = item.datefa;
            context.AddToCustomerProductTbls(_CustomerProductTbl);
        }
        // Save through the same per-iteration context that loaded or added the row.
        context.SaveChanges();
    }
}
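A commonly reported cause of this error is calling SaveChanges() while the results of a query are still being enumerated (the foreach over AllData keeps a data reader open). Materializing the source first, for example with AllData.ToList(), and calling SaveChanges() once after the loop is a frequently suggested alternative to creating a new context per item. The in-memory sketch below shows the upsert logic itself: existing rows are indexed once by (CustomerId, date), matches are updated, and everything else is inserted. ProductRow, ProductUpsert, the string date field and the choice of (CustomerId, date) as the key are all assumptions standing in for CustomerProductTbl.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for CustomerProductTbl.
public class ProductRow
{
    public int CustomerId;
    public string DateFa; // the question's CustomerProductDateFa, assumed to be a string
    public int Total;
}

public static class ProductUpsert
{
    // Adds each incoming quantity to the row with the same (CustomerId, DateFa),
    // inserting a new row when none exists yet. Returns the rows that were inserted.
    public static List<ProductRow> Apply(IEnumerable<ProductRow> existing, IEnumerable<ProductRow> incoming)
    {
        // Index the existing rows once instead of querying per item.
        var index = new Dictionary<(int, string), ProductRow>();
        foreach (var row in existing)
            index[(row.CustomerId, row.DateFa)] = row;

        var inserted = new List<ProductRow>();
        foreach (var item in incoming)
        {
            var key = (item.CustomerId, item.DateFa);
            if (index.TryGetValue(key, out var current))
            {
                current.Total += item.Total; // update branch
            }
            else
            {
                var fresh = new ProductRow { CustomerId = item.CustomerId, DateFa = item.DateFa, Total = item.Total };
                index[key] = fresh;  // later items with the same key update this new row
                inserted.Add(fresh); // insert branch
            }
        }
        return inserted;
    }
}
```

Keeping newly inserted rows in the same index means two incoming items for the same customer and date produce one row, not two.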
