C# run function parallel on batches of large dataset

Is there any way I can run a function in parallel over batches of a large dataset, on multiple threads, in C#?
I have a list of roughly 32,000 lines. The function below reads each line of the dataset and verifies it. The idea is to split the dataset into chunks of 5,000 and apply the function below to each chunk/batch concurrently.
private void AccountNumberCheck(List<Invoice> invoices, string VendorID)
{
try
{
using (var context = new ApplicationContext())
{
foreach (var invoice in invoices)
{
var invoiceDB = context.Invoices.Find(invoice.Id);
var accountNumber = context.Accounts.Where(m => m.Account_Number == invoice.Account_Number && m.VendorID == VendorID);
if (accountNumber.Count() > 0)
{
var activeAccount = accountNumber.Any(m => m.Active_Status == false);
if (activeAccount == true)
{
invoiceDB.ExceptionFlag = true;
invoiceDB.ExceptionComments = invoiceDB.ExceptionComments + "The Account Number is Inactive.";
}
else
{
invoiceDB.ExceptionFlag = false;
}
}
else
{
invoiceDB.ExceptionFlag = true;
invoiceDB.ExceptionComments = invoiceDB.ExceptionComments + "The Account Number does not exist. ";
}
context.Entry(invoiceDB).State = EntityState.Modified;
context.SaveChanges();
}
}
}
catch (Exception ex)
{
}
}
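A minimal sketch of the chunking described above, assuming a hypothetical wrapper method (CheckAllAccountNumbers) and an illustrative degree of parallelism. Because AccountNumberCheck creates its own ApplicationContext, no DbContext is shared across threads; the chunks themselves are disjoint, so each invoice is only touched by one thread.
// Hypothetical driver: split the invoices into chunks of 5000 and run
// AccountNumberCheck on each chunk in parallel. Requires System.Linq and
// System.Threading.Tasks.
private void CheckAllAccountNumbers(List<Invoice> invoices, string vendorId)
{
    const int chunkSize = 5000;

    // Group the list into consecutive chunks of chunkSize items.
    var chunks = invoices
        .Select((invoice, index) => new { invoice, index })
        .GroupBy(x => x.index / chunkSize)
        .Select(g => g.Select(x => x.invoice).ToList())
        .ToList();

    Parallel.ForEach(
        chunks,
        new ParallelOptions { MaxDegreeOfParallelism = 4 }, // illustrative value
        chunk => AccountNumberCheck(chunk, vendorId));
}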

Related

How to speed up workflow with database .Net

I have a simple controller that returns a list of products. It takes about 5-7 seconds to return the data the first time, and then 50-80 ms, but only if I call it again immediately (within 10-30 seconds) after the first request; if I call it again after 2-3 minutes it takes 5-7 seconds again.
I don't store this list in a cache, so I can't understand why only the first request takes 5 seconds and the rest take 50 ms.
Important: the test was not run on localhost, it was run on the production server. On localhost it takes 50 ms.
How can I speed this up?
Do I need to change my code, or the IIS configuration?
My code:
public List<PackageModel> GetPromotionOrDefaultPackageList(string domain)
{
List<DBProduct> DBPackageList = new List<DBProduct>();
List<PackageModel> BllPackageList = new List<PackageModel>();
try
{
using (CglDomainEntities _entities = new CglDomainEntities())
{
DBPackageList = _entities.DBProducts
.Where(x =>
x.ProductGroup.Equals(domain + "-package")
&& x.IsPromotionProduct == true)
.ToList();
if (DBPackageList.Count == 0)
{
DBPackageList = _entities.DBProducts
.Where(x =>
x.ProductGroup.Equals(domain + "-package")
&& x.IsBasicProduct == true)
.ToList();
}
}
PackageModel bllPackage = new PackageModel();
foreach (DBProduct dbProduct in DBPackageList)
{
bllPackage = new PackageModel();
bllPackage.Id = dbProduct.Id;
bllPackage.ProductId = dbProduct.ProductId;
bllPackage.IsPromotionProduct = dbProduct.IsPromotionProduct.HasValue ? dbProduct.IsPromotionProduct.Value : false;
bllPackage.ProductName = dbProduct.ProductName;
bllPackage.ProductDescription = dbProduct.ProductDescription;
bllPackage.Price = dbProduct.Price;
bllPackage.Currency = dbProduct.Currency;
bllPackage.CampaignId = dbProduct.CampaignId;
bllPackage.PathToProductImage = dbProduct.PathToProductImage;
bllPackage.PathToProductPromotionImage = dbProduct.PathToProductPromotionImage;
bllPackage.PathToProductInfo = dbProduct.PathToProductInfo;
bllPackage.CssClass = dbProduct.CssClass;
BllPackageList.Add(bllPackage);
}
return BllPackageList;
}
catch (Exception ex)
{
Logger.Error(ex.Message);
return BllPackageList;
}
}
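If the delay turns out to be the application warming up (for example the app pool recycling, or Entity Framework building its model on the first query), no code change will remove it. If repeated identical queries are the cost, one code-side option is to cache the result. A minimal sketch, assuming System.Runtime.Caching is available and a short expiry is acceptable; the method name, cache key scheme, and 10-minute expiry are all hypothetical:
using System.Runtime.Caching;

public List<PackageModel> GetPromotionOrDefaultPackageListCached(string domain)
{
    var cache = MemoryCache.Default;
    string cacheKey = "packages:" + domain; // hypothetical key scheme

    // Return the cached list if a previous request already built it.
    var cached = cache.Get(cacheKey) as List<PackageModel>;
    if (cached != null)
        return cached;

    var packages = GetPromotionOrDefaultPackageList(domain);
    cache.Set(cacheKey, packages, DateTimeOffset.UtcNow.AddMinutes(10)); // illustrative expiry
    return packages;
}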

ASP.NET Core Large Data Set Processing

I have an ASP.NET Core 2.1 application with a PostgreSQL database.
A large file (~100K lines) gets uploaded to the database and then each line is processed.
As each line is processed there are approximately 15 sub-queries per line, taking about 10 seconds in total per line. I have tried optimizing the queries and loading certain datasets into memory to reduce database calls, which gave a slight improvement. I have also tried reducing the batch size, but that had minimal effect.
Are there any other ways I can speed this up? The queries are written with LINQ lambda syntax.
The high-level function:
public void CDR_Insert(List<InputTable> inputTable, IQueryable<Costing__Table> CostingTable, List<Patterns> PatternsTable, IQueryable<Country_Calling_Codes> CallingCodesTable, IQueryable<Country> CountryTable, string voicemail_number)
{
foreach (var row in inputTable)
{
//if (!string.IsNullOrEmpty(row.finalCalledPartyPattern))
//{
var rowexists = _context.CDR.Where(m => m.InputId == row.id).Count();
if (rowexists == 0)
{
if (ValidateIfOneNumberIsForCountry(row.callingPartyNumber, row.finalCalledPartyNumber, PatternsTable, CountryTable) == true)
{
var CallRingTime = TimeSpan.FromSeconds(Ring_Time(row.dateTimeOrigination, row.dateTimeConnect));
var Call_Direction = CallDirection(row.callingPartyNumber, row.finalCalledPartyNumber, row.finalCalledPartyPattern, voicemail_number);
var CallReason = "Destination Out Of Order";
if (row.dateTimeConnect != "0")
{
CallReason = Call_Reason(row.callingPartyNumber, row.originalCalledPartyNumber, row.finalCalledPartyNumber, voicemail_number);
}
var CallOutcome = Outcome(row.duration, row.dateTimeOrigination, row.dateTimeConnect, row.callingPartyNumber, row.originalCalledPartyNumber, row.finalCalledPartyNumber, voicemail_number);
var CallCategory = GetCategory(row.callingPartyNumber, row.finalCalledPartyNumber, PatternsTable, CallingCodesTable, CountryTable);
var CallCost = 0.00;
if (CallOutcome != "Not Connected" && CallOutcome != "No answer")
{
CallCost = Costing(row.callingPartyNumber, row.finalCalledPartyNumber, row.dateTimeConnect, row.dateTimeDisconnect, row.finalCalledPartyPattern, row.duration, Call_Direction, CostingTable, PatternsTable, voicemail_number, CallingCodesTable, CountryTable);
}
CDR cDR = new CDR()
{
Date_Time = UnixTimeStampToDateTimeConvertor2(row.dateTimeOrigination),
Call_Direction = Call_Direction,
Calling_Party = row.callingPartyNumber_uri,
Calling_Digits = row.callingPartyNumber,
Called_Party = row.finalCalledPartyNumber_uri,
Called_Digits = row.finalCalledPartyNumber,
Duration = TimeSpan.FromSeconds(Convert.ToDouble(row.duration)),
Cost = CallCost,
Reason = CallReason,
Outcome = CallOutcome,
Country_Code = "MUS",//_context.Country.Where(o => o.CountryCode == _CDRExtension.CountryCodeId(_CDRExtension.Call_Type_Id(row.callingPartyNumber))).Select(o => o.CountryCode).FirstOrDefault(),
Second_Ring_Time = CallRingTime,
Category = CallCategory,
CiscoInputId = row.id
};
_context.CDR.Add(cDR);
}
row.Processed = true;
row.DateAdded = DateTime.Now;
_context.Cisco_Input.Update(row);
// }
_context.SaveChanges();
}
}
}
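One of the per-row round trips that can be pulled out of the loop is the existence check; a minimal sketch of that idea, assuming the existing CDR.InputId values fit comfortably in memory as ints and that flushing once per batch is acceptable (the batch size below is illustrative):
// Load the ids of rows that already have a CDR record once, instead of
// issuing one Count() query per input row.
var existingInputIds = new HashSet<int>(
    _context.CDR.Select(c => c.InputId)); // assumes InputId is an int

int pending = 0;
foreach (var row in inputTable)
{
    if (existingInputIds.Contains(row.id))
        continue; // already processed, same effect as the rowexists == 0 check

    // ... build and add the CDR entity exactly as in CDR_Insert ...

    row.Processed = true;
    row.DateAdded = DateTime.Now;
    _context.Cisco_Input.Update(row);

    if (++pending % 500 == 0)   // illustrative batch size
        _context.SaveChanges(); // flush periodically instead of once per row
}
_context.SaveChanges();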

Handle duplicate records when doing a EF SaveChanges

I have a web application where I parse a CSV file that can have over 200k records in it. I parse each line for information, verify that the key does not exist in the database, and then add it to the context. When the count reaches 10,000 records, the SaveChanges routine is called. The problem is that there can be duplicates in the context, and it errors out. This is running on an Azure VM communicating with an Azure SQL server.
Two questions: how do I handle the duplicate issue, and is there any way I can improve the speed, as it takes several hours to run?
using (LoanFileEntities db = new LoanFileEntities())
{
db.Configuration.AutoDetectChangesEnabled = false; // 1. this is a huge time saver
db.Configuration.ValidateOnSaveEnabled = false; // 2. this can also save time
while (parser.Read())
{
counter++;
int loan_code = 0;
string loan_code_string = parser["LoanId"];
string dateToParse = parser["PullDate"].Trim();
DateTime date_pulled;
try
{
date_pulled = DateTime.Parse(dateToParse, CultureInfo.InvariantCulture);
}
catch (Exception)
{
throw new Exception("No Pull Date for line " + counter);
}
string originationdate = parser["OriginationDate"].Trim();
DateTime date_originated;
try
{
date_originated = DateTime.Parse(originationdate, CultureInfo.InvariantCulture);
}
catch (Exception)
{
throw new Exception("No Origination Date for line " + counter);
}
dateToParse = parser["DueDate"].Trim();
DateTime date_due;
try
{
date_due = DateTime.Parse(dateToParse, CultureInfo.InvariantCulture);
}
catch (Exception)
{
throw new Exception("No Due Date for line " + counter);
}
string region = parser["Region"].Trim();
string source = parser["Channel"].Trim();
string password = parser["FilePass"].Trim();
decimal principalAmt = Convert.ToDecimal(parser["Principal"].Trim());
decimal totalDue = Convert.ToDecimal(parser["TotalDue"].Trim());
string vitaLoanId = parser["VitaLoanId"];
var toAdd =
db.dfc_LoanRecords.Any(
x => x.loan_code_string == loan_code_string);
if (!toAdd)
{
dfc_LoanRecords loan = new dfc_LoanRecords();
loan.loan_code = loan_code;
loan.loan_code_string = loan_code_string;
loan.loan_principal_amt = principalAmt;
loan.loan_due_date = date_due;
loan.date_pulled = date_pulled;
loan.date_originated = date_originated;
loan.region = region;
loan.source = source;
loan.password = password;
loan.loan_amt_due = totalDue;
loan.vitaLoanId = vitaLoanId;
loan.load_file = fileName;
loan.load_date = DateTime.Now;
switch (loan.region)
{
case "UK":
if (location.Equals("UK"))
{
//db.dfc_LoanRecords.Add(loan);
if (loan.source == "Online")
{
counter_new_uk_online++;
}
else
{
counter_new_uk_retail++;
}
}
break;
case "US":
if (location.Equals("US"))
{
db.dfc_LoanRecords.Add(loan);
if (loan.source == "Online")
{
counter_new_us_online++;
}
else
{
counter_new_us_retail++;
}
}
break;
case "Canada":
if (location.Equals("US"))
{
db.dfc_LoanRecords.Add(loan);
if (loan.source == "Online")
{
counter_new_cn_online++;
}
else
{
counter_new_cn_retail++;
}
}
break;
}
// delay save to speed up load. 3. also saves transactional time
if (counter % 10000 == 0)
{
db.SaveChanges();
}
}
} // end of parser read
db.SaveChanges();
}
}
}
I would suggest removing duplicates in the code before sending it over to .SaveChanges().
Instead of going into detail about duplicate removal, I've put together this list of links to existing questions and answers on StackOverflow that may help:
Delete duplicates using Lambda
Using DISTINCT on a subquery to remove duplicates in Entity Framework
Using LINQ to find / delete duplicates
Hope that helps!
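As a minimal sketch of that suggestion, assuming loan_code_string is the key that must be unique: tracking the keys already added in the current run avoids inserting a duplicate that the Any() database check cannot see, because pending entities in the context are not in the database yet.
// Keys already added to the context in this run.
var seenLoanCodes = new HashSet<string>();

// inside the parser loop, replacing the db.dfc_LoanRecords.Any(...) check:
bool existsInDb = db.dfc_LoanRecords.Any(x => x.loan_code_string == loan_code_string);
if (!existsInDb && seenLoanCodes.Add(loan_code_string)) // Add() returns false for a repeat key
{
    // build the dfc_LoanRecords entity and db.dfc_LoanRecords.Add(loan) as before
}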

OutOfMemory Exception for Parallel.ForEach

I have a console application which runs as a 32-bit process (we can't change it to 64-bit) and is throwing an OutOfMemoryException. We traced a minor memory leak (related to an Entity Framework repository) in a downstream process, but even with that the application should not cross 2 GB of memory.
One thing I want to understand is that sometimes the application processes 2,500 records while other times it fails at 100 (server memory utilization is under 60% while the application is running).
I am using Parallel.ForEach and limiting the number of threads to 4 using ParallelOptions. Can anybody suggest whether making the application single-threaded might resolve the problem?
(The application doesn't fail in any lower environment, only in production, and it is giving us a very hard time.)
Below is the code snippet.
Thanks in advance,
Rohit
protected override void Execute(IEnumerable<PlanPremiumDetails> argument)
{
if (argument.Any())
{
var EnrollmentRequests = argument.GroupBy(c => c.CaseNumber).ToList();
SelectedPlanPremiumDetails request;
EnrollmentResponse enrollmentResponse;
bool taskStatus;
Common.Status.TotalAutoEnrollRecords = EnrollmentRequests.Count;
Common.StartTimer();
Action<IGrouping<int,PlanPremiumDetails>> processRequest = eRequest =>
{
int caseNumber;
List<EnrolledIndividual> enrolledIndividuals;
try
{
string errorMessage;
caseNumber = eRequest.Key;
if (eRequest.Any(f => f.FailedMCOEnrollmentId > 0))
{
request = FailedMcoRequest(eRequest, caseNumber);
enrollmentResponse = InvokeEnrollment(request);
if (enrollmentResponse.OverallStatus == false && enrollmentResponse.ErrorInformationMA != null)
{
StringBuilder messages = new StringBuilder();
if (enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps != null)
{
messages.Append(enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.Message);
Exception innerExp = enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.InnerException;
if (innerExp != null)
{
// messages.Append(enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.InnerException.Message);
do
{
messages.Append(innerExp.Message);
innerExp = innerExp.InnerException;
}
while (innerExp != null);
}
}
else
{
if (enrollmentResponse.ErrorInformationMA != null && enrollmentResponse.ErrorInformationMA.InnerErrorInfo != null)
{
foreach (var msg in enrollmentResponse.ErrorInformationMA.InnerErrorInfo)
{
messages.Append(string.Format(@"ErrorCode: {0}, ErrorMessage: {1} ", msg.ErrorCode, msg.ErrorMessage));
}
}
}
errorMessage = Convert.ToString(messages);
Common.Errors.Add(new Exception(String.Format(ConfigurationManager.AppSettings["FailedEnrollErrorText"], caseNumber, errorMessage), enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps));
taskStatus = GenerateTask(eRequest, caseNumber, false, errorMessage);
if (taskStatus)
{
UpdateTriggerStatus(caseNumber, "Y");
}
else
{
UpdateTriggerStatus(caseNumber, "N");
Common.Errors.Add(new Exception(String.Format(ConfigurationManager.AppSettings["FailedEnrollErrorText"], caseNumber, "Task Creation")));
}
}
else
{
Common.Status.Success = (Common.Status.Success + 1);
UpdateTriggerStatus(caseNumber, "Y");
}
}
else
{
enrolledIndividuals = eRequest.Select(p => new EnrolledIndividual
{
IndividualId = p.IndividualId,
EdgTraceId = p.EdgTraceId,
EDGNumber = p.EDGNumber
}).ToList();
request = new SelectedPlanPremiumDetails()
{
EmployerId = 0,
CaseNumber = caseNumber,
CallBackId = Common.ApplicationName,
SourceSystem = EnrollmentLibrary.SourceSystem.MAAUTOASSIGN,
AutoAssignMCOIndividuals = enrolledIndividuals
};
enrollmentResponse = InvokeEnrollment(request);
if (enrollmentResponse.OverallStatus == false && enrollmentResponse.ErrorInformationMA != null)
{
StringBuilder messages = new StringBuilder();
if (enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps != null)
{
messages.Append(enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.Message);
Exception innerExp = enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.InnerException;
if (innerExp != null)
{
// messages.Append(enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps.InnerException.Message);
do
{
messages.Append(innerExp.Message);
innerExp = innerExp.InnerException;
}
while (innerExp != null);
}
}
else
{
if (enrollmentResponse.ErrorInformationMA != null && enrollmentResponse.ErrorInformationMA.InnerErrorInfo != null)
{
foreach (var msg in enrollmentResponse.ErrorInformationMA.InnerErrorInfo)
{
messages.Append(string.Format(@"ErrorCode: {0}, ErrorMessage: {1} ", msg.ErrorCode, msg.ErrorMessage));
}
}
}
errorMessage = Convert.ToString(messages);
Common.Errors.Add(new Exception(String.Format(ConfigurationManager.AppSettings["AutoEnrollErrorText"], caseNumber, errorMessage), enrollmentResponse.ErrorInformationMA.ErrorInformationPostMcaps));
}
else
{
// Update status to be saved in InterfaceSummary table.
Common.Status.Success = (Common.Status.Success + 1);
}
}
}
catch(Exception ex)
{
Common.Errors.Add(new Exception(String.Format(ConfigurationManager.AppSettings["AutoEnrollErrorText"], eRequest.Key, ex.Message), ex));
}
};
int dParallelism = Convert.ToInt32(ConfigurationManager.AppSettings["DegreeofParallelism"]);
Parallel.ForEach(EnrollmentRequests,
new ParallelOptions() { MaxDegreeOfParallelism = dParallelism > 0 ? dParallelism : 1 },
processRequest);
}
}
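For the single-threaded part of the question, the cheapest experiment is to run the same delegate sequentially and compare memory behaviour against the parallel runs; a sketch that reuses the processRequest delegate unchanged:
// Sequential variant for comparison: identical work, one case at a time.
// If memory still climbs past the 32-bit limit here, the problem is not a
// concurrency artifact of Parallel.ForEach.
foreach (var eRequest in EnrollmentRequests)
{
    processRequest(eRequest);
}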

MySQL OdbcCommand commands sometimes hangs?

I am running a loop in C# that reads files and makes updates to a MySQL database with the MySQL ODBC 5.1 driver in a Windows 8 64-bit environment.
The operations are simple:
Count +1
See if the file exists
Load the XML file (XDocument)
Fetch data from the XDocument
Open the OdbcConnection
Run a couple of stored procedures against the MySQL database to store the data
Close the OdbcConnection
The problem is that after a while it hangs, for example on an OdbcCommand.ExecuteNonQuery. It is not always the same stored procedure it hangs on.
This is a real problem: I need to loop over 60,000 files, but it only lasts around 1,000 at a time.
Edit 1:
The problem seems to occur here every time:
public bool addPublisherToGame(int inPublisherId, int inGameId)
{
string sqlStr;
OdbcCommand commandObj;
try
{
sqlStr = "INSERT INTO games_publisher_binder (gameId, publisherId) VALUE(?,?)";
commandObj = new OdbcCommand(sqlStr, mainConnection);
commandObj.Parameters.Add("@gameId", OdbcType.Int).Value = inGameId;
commandObj.Parameters.Add("@publisherId", OdbcType.Int).Value = inPublisherId;
if (Convert.ToInt32(executeNonQuery(commandObj)) > 0)
return true;
else
return false;
}
catch (Exception ex)
{
throw (loggErrorMessage(this.ToString(), "addPublisherToGame", ex, -1, "", ""));
}
finally
{
}
}
protected object executeNonQuery(OdbcCommand inCommandObj)
{
try
{
//FileStream file = new FileStream("d:\\test.txt", FileMode.Append, FileAccess.Write);
//System.IO.StreamWriter stream = new System.IO.StreamWriter(file);
//stream.WriteLine(DateTime.Now.ToString() + " - " + inCommandObj.CommandText);
//stream.Close();
//file.Close();
//mainConnection.Open();
return inCommandObj.ExecuteNonQuery();
}
catch (Exception ex)
{
throw (ex);
}
}
I can see that the input parameters are correct.
The connection is opened and closed in a top-level method for every loop iteration (with a finally block).
Edit 2:
This is the method that extracts the information and saves it to the database:
public Boolean addBoardgameToDatabase(XElement boardgame, GameFactory gameFactory)
{
int incomingGameId = -1;
XElement tmpElement;
string primaryName = string.Empty;
List<string> names = new List<string>();
GameStorage externalGameStorage;
int retry = 3;
try
{
if (boardgame.FirstAttribute != null &&
boardgame.FirstAttribute.Value != null)
{
while (retry > -1)
{
try
{
incomingGameId = int.Parse(boardgame.FirstAttribute.Value);
#region Find primary name
tmpElement = boardgame.Elements("name").Where(c => c.Attribute("primary") != null).FirstOrDefault(a => a.Attribute("primary").Value.Equals("true"));
if (tmpElement != null)
primaryName = tmpElement.Value;
else
return false;
#endregion
externalGameStorage = new GameStorage(incomingGameId,
primaryName,
string.Empty,
getDateTime("1/1/" + boardgame.Element("yearpublished").Value),
getInteger(boardgame.Element("minplayers").Value),
getInteger(boardgame.Element("maxplayers").Value),
boardgame.Element("playingtime").Value,
0, 0, false);
gameFactory.updateGame(externalGameStorage);
gameFactory.updateGameGrade(incomingGameId);
gameFactory.removeDesignersFromGame(externalGameStorage.id);
foreach (XElement designer in boardgame.Elements("boardgamedesigner"))
{
gameFactory.updateDesigner(int.Parse(designer.FirstAttribute.Value), designer.Value);
gameFactory.addDesignerToGame(int.Parse(designer.FirstAttribute.Value), externalGameStorage.id);
}
gameFactory.removePublishersFromGame(externalGameStorage.id);
foreach (XElement publisher in boardgame.Elements("boardgamepublisher"))
{
gameFactory.updatePublisher(int.Parse(publisher.FirstAttribute.Value), publisher.Value, string.Empty);
gameFactory.addPublisherToGame(int.Parse(publisher.FirstAttribute.Value), externalGameStorage.id);
}
foreach (XElement element in boardgame.Elements("name").Where(c => c.Attribute("primary") == null))
names.Add(element.Value);
gameFactory.removeGameNames(incomingGameId);
foreach (string name in names)
if (name != null && name.Length > 0)
gameFactory.addGameName(incomingGameId, name);
return true;
}
catch (Exception)
{
retry--;
if (retry < 0)
return false;
}
}
}
return false;
}
catch (Exception ex)
{
throw (new Exception(this.ToString() + ".addBoardgameToDatabase : " + ex.Message, ex));
}
}
One level higher, the method that triggers addBoardgameToDatabase:
private void StartThreadToHandleXmlFile(int gameId)
{
FileInfo fileInfo;
XDocument xmlDoc;
Boolean gameAdded = false;
GameFactory gameFactory = new GameFactory();
try
{
fileInfo = new FileInfo(_directory + "\\" + gameId.ToString() + ".xml");
if (fileInfo.Exists)
{
xmlDoc = XDocument.Load(fileInfo.FullName);
if (addBoardgameToDatabase(xmlDoc.Element("boardgames").Element("boardgame"), gameFactory))
{
gameAdded = true;
fileInfo.Delete();
}
else
return;
}
if (!gameAdded)
{
gameFactory.InactivateGame(gameId);
fileInfo.Delete();
}
}
catch (Exception)
{ throw; }
finally
{
if(gameFactory != null)
gameFactory.CloseConnection();
}
}
And finally the top level:
public void UpdateGames(string directory)
{
DirectoryInfo dirInfo;
FileInfo fileInfo;
Thread thread;
int gameIdToStartOn = 1;
dirInfo = new DirectoryInfo(directory);
if(dirInfo.Exists)
{
_directory = directory;
fileInfo = dirInfo.GetFiles("*.xml").OrderBy(c=> int.Parse(c.Name.Replace(".xml",""))).FirstOrDefault();
gameIdToStartOn = int.Parse(fileInfo.Name.Replace(".xml", ""));
for (int gameId = gameIdToStartOn; gameId < 500000; gameId++)
{
try
{ StartThreadToHandleXmlFile(gameId); }
catch(Exception){}
}
}
}
Use SQL connection pooling by adding "Pooling=true" to your connection string.
Make sure you properly close the connection AND the file.
You could also build one large query and execute it only once; I think that would be a lot faster than 60,000 separate queries.
Can you show a bit of your code?
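A minimal sketch of the first two suggestions, using System.Data.Odbc; the connection string is only a template (the exact keywords depend on the installed MySQL ODBC driver, and whether "Pooling=true" is honoured depends on the driver manager), and the using blocks guarantee the command and connection are disposed even when an exception is thrown.
// Template connection string; adjust driver name, server, and credentials.
const string connStr =
    "Driver={MySQL ODBC 5.1 Driver};Server=localhost;Database=games;" +
    "Uid=user;Pwd=secret;Pooling=true";

public bool AddPublisherToGame(int publisherId, int gameId)
{
    const string sql = "INSERT INTO games_publisher_binder (gameId, publisherId) VALUES (?, ?)";

    using (var connection = new OdbcConnection(connStr))
    using (var command = new OdbcCommand(sql, connection))
    {
        // ODBC parameters are positional; the names are labels only.
        command.Parameters.Add("@gameId", OdbcType.Int).Value = gameId;
        command.Parameters.Add("@publisherId", OdbcType.Int).Value = publisherId;

        connection.Open();
        return command.ExecuteNonQuery() > 0;
    } // command and connection are disposed here, even on error
}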
