I have a MS SQL table that I don't have any control over and I need to write to. This table has a int primary key that isn't automatically incremented. I can't use stored procs and I would like to use Linq to SQL since it makes other processing very easy.
My current solution is to read the last value, increment it, try to use it, if I get a clash, increment it again and retry.
Something along these lines:
var newEntity = new Log()
{
ID = dc.Logs.Max(l => l.ID) + 1,
Note = "Test"
};
dc.Logs.InsertOnSubmit(newEntity);
const int maxRetries = 10;
int retries = 0;
bool success = false;
while (!success && retries < maxRetries)
{
try
{
dc.SubmitChanges();
success = true;
}
catch (SqlException)
{
retries++;
newEntity.ID = dc.Logs.Max(l => l.ID);
}
}
if (retries >= maxRetries)
{
throw new Exception("Bummer...");
}
Does anyone have a better solution?
EDIT: Thanks to Jon, I simplified the max ID calculation. I was still in SQL thinking mode.
That looks like an expensive way to get the maximum ID. Have you already tried
var maxId = dc.Logs.Max(s => s.ID);
? Maybe it doesn't work for some reason, but I really hope it does...
(Admittedly it's more than possible that SQL Server optimises this appropriately.)
Other than that, it looks okay (smelly, but necessarily so) to me - but I'm not an expert on the matter...
You didn't indicate whether your app is the only one inserting into the table. If it is, then I'd fetch the max value once right after the start of the app/webapp and use Interlocked.Increment on it every time you need next ID (or simple addition if possible race conditions can be ruled out).
You could put the entire operation in a transaction, using a TransactionScope class, like below:
using (TransactionScope scope = new TransactionScope()){
var maxId = dc.Logs.Max(s => s.ID);
var newEntity = new Log(){
ID = maxId,
Note = "Test"
};
dc.Logs.InsertOnSubmit(newEntity);
dc.SubmitChanges();
scope.Complete();
}
By putting both the retrieval of the maximum ID and the insertion of the new records within the same transaction, you should be able to pull off an insert without having to retry in your manner.
One problem you might face with this method will be transaction deadlocks, especially if the table is heavily used. Do test it out to see if you require additional error-handling.
P.S. I included Jon Skeet's code to get the max ID in my code, because I'm pretty sure it will work correctly. :)
Make the id field auto incrementing and let the server handle id generation.
Otherwise, you will run into the problem liggett78 said. Nothing prevents another thread from reading the same id in between the reading and submitting of max id for this thread.
Related
We're investigating a performance issue where EF 6.1.3 is being painfully slow, and we cannot figure out what might be causing it.
The database context is initialized with:
Configuration.ProxyCreationEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
We have isolated the performance issue to the following method:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
var writer = session.Writer<T>();
writer.Attach(entity);
await writer.UpdatePropertyAsync(entity, changedProperties.ToArray()).ConfigureAwait(false);
}
return entity.Id;
}
There are two names in the changedProperties list, and EF correctly generated an update statement that updates just these two properties.
This method is called repeatedly (to process a collection of data items) and takes about 15-20 seconds to complete.
If we replace the method above with the following, execution time drops to 3-4 seconds:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
var sql = $"update {entity.TypeName()}s set";
var separator = false;
foreach (var property in changedProperties)
{
sql += (separator ? ", " : " ") + property + " = #" + property;
separator = true;
}
sql += " where id = #Id";
var parameters = (from parameter in changedProperties.Concat(new[] { "Id" })
let property = entity.GetProperty(parameter)
select ContextManager.CreateSqlParameter(parameter, property.GetValue(entity))).ToArray();
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
await session.UnderlyingDatabase.ExecuteSqlCommandAsync(sql, parameters).ConfigureAwait(false);
}
return entity.Id;
}
The UpdatePropertiesAsync method called on the writer (a repository implementation) looks like this:
public virtual async Task UpdatePropertyAsync(T entity, string[] changedPropertyNames, bool save = true)
{
if (changedPropertyNames == null || changedPropertyNames.Length == 0)
{
return;
}
Array.ForEach(changedPropertyNames, name => context.Entry(entity).Property(name).IsModified = true);
if (save)
await context.SaveChangesAsync().ConfigureAwait(false);
}
}
What is EF doing that completely kills performance? And is there anything we can do to work around it (short of using another ORM)?
By timing the code I was able to see that the additional time spent by EF was in the call to Attach the object to the context, and not in the actual query to update the database.
By eliminating all object references (setting them to null before attaching the object and restoring them after the update is complete) the EF code runs in "comparable times" (5 seconds, but with lots of logging code) to the hand-written solution.
So it looks like EF has a "bug" (some might call it a feature) causing it to inspect the attached object recursively even though change tracking and validation have been disabled.
Update: EF 7 appears to have addressed this issue by allowing you to pass in a GraphBehavior enum when calling Attach.
The problem with Entity framework is that when you call SaveChanges(), insert statements are sent to database one by one, that's how Entity works.
And actually there are 2 db hits per insert, first db hit is insert statement for a record, and the second one is select statement to get the id of inserted record.
So you have numOfRecords * 2 database trips * time for one database trip.
Write this in your code context.Database.Log = message => Debug.WriteLine(message); to log generated sql to console, and you will see what am I talking about.
You can use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/
Seeing as though you already have tried setting:
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
And you are not using an ordered lists, I think you are going to have to refactor your code and do some benchmarking.
I believe the bottleneck is coming from the foreach as the context is having to deal with a potentially large amounts of bulk data (not sure how many this is in your case).
Try and cut the items contained in your array down into smaller batches before calling the SaveChanges(); or SaveChangesAsync(); methods, and note the performance deviations as apposed to letting the context grow too large.
Also, if you are still not seeing further gains, try disposing of the context post SaveChanges(); and then creating a new one, depending on the size of your entities list, flushing out the context may yield even further improvements.
But this all depends on how many entities we are talking about and may only be noticeable in the hundreds and thousands of record scenarios.
Currently, I am struggling with an issue regarding Entity Framework (LINQ to Entities). Most of the time when I try to execute entity.SaveChanges() everything works fine but at some points entity.SaveChanges() takes too much and timesouts. I searched a lot but was unable to find out the answer.
(According to companies policy, I cannot copy code somewhere else. So, I do not have the exact code but I will try to layout the basic structure. I hope it helps you to figure out the problem but if i doesn't then let me know.)
Task:
My task is to scan the whole network for some specific files. Match content of each file with the content of database and based on the matching either insert or update the database with the content of the file. I have around 3000 files on the network.
Problem:
public void PerformAction()
{
DbTransaction tran = null;
entity.Connection.Open(); //entity is a global variable declared like myDatabaseEntity entity = new myDatabaseEntity();
tran = entity.Connection.BeginTransaction();
foreach(string path in listOfPaths)
{
//returns 1 - Multiple matching in database OR
// 2 - One matching file in database OR
// 3 - No Matching found.
int returnValue = SearchDatabase();
if(returnValue == 1)
DoSomething(); //All inserts/updates work perfectly. Save changes also works correctly.
else if(returnValue == 2)
DoSomething(); //Again, everything ok. SaveChanges works perfectly here.
else
{
//This function uses some XML file to generate all the queries dynamically
//Forexample INSERT INTO TABLEA(1,2,3);
GenerateInsertQueriesFromXML();
ExecuteQueries();
SaveChanges(); <---- Problem here. Sometimes take too much time.
}
//Transaction commit/rollback code here
}
}
public bool ExecuteQueries()
{
int result = 0;
foreach(string query in listOfInsertQueries)
{
result = entity.ExecuteStoreCommand(query); //Execute the insert queries
if(result <=0)
return false;
}
entity.TestEntityA a = new entity.TestEntityA();
a.PropertyA = 123;
a.PropertyB = 345;
//I have around 25 properties here
entity.AddToTestEntityA(a);
return true;
}
Found the issue.
The main table where i was inserting all the data had a trigger on INSERT and DELETE.
So, whenever i inserted some new data in the main table, the trigger was firing in the backend and was taking all the time.
Entity framework is FAST and INNOCENT :D
this is somewhat tricky to figure out I think, perhaps I am missing something.
I am a newbie trying to rig a database mapped via Linq-to-SQL to my server. There is a function called by clients which retrieves UserAccount from the database:
public static explicit operator Dictionary<byte, object>(UserAccount a)
{
Dictionary<byte, object> d = new Dictionary<byte, object>();
d.Add(0, a.AuthenticationDatas.Username);
int charCount = a.Characters.Count;
for (int i = 0; i < charCount; i++)
{
d.Add((byte)(i + 1), (Dictionary<byte, object>)a.Characters[i]);
}
return d;
}
What this actually does is convert a UserAccount type to my server datatype of Dictionary. UserAccount itself is retrieved from database then converted via this function.
However when I run this function, I get InvalidCastException on line:
int charCount = a.Characters.Count;
Moreover, when VS breakpoints # this line, I can wait a few seconds and proceed and the excpetion will be gone! It retrieves Characters.Count correctly after that.
Here is my Characters mapping:
[global::System.Data.Linq.Mapping.AssociationAttribute(Name="UserAccount_Character", Storage="_CharactersTBs", ThisKey="UID", OtherKey="UID")]
public EntitySet<Character> Characters
{
get
{
return this._Characters;
}
set
{
this._Characters.Assign(value);
}
}
I believe whats happening is that request is somehow executed on another thread then the one that interacts with database, and it errors out before database can actually retrieve Characters table. I am not quite sure...
Does anyone know what the problem might be and how can I syncronize it (without adding some gimp delay)?
EDIT:
Ok I narrowed down the problem. It has nothing to do with different threads networking or what not... Its just me being stupid. Here is a simple databse query which throws InvalidCastException # line int count = UA.Characters.Count;
static void Main(string[] args)
{
IEnumerable<UserAccount> query = from p in PBZGdb.Instance.AuthenticationDatas
where p.Username == "Misha" && p.Password == "123"
select p.UserAccount;
UserAccount UA = query.ElementAt(0);
int count = UA.Characters.Count;
Console.WriteLine(count);
Console.ReadKey();
}
(p.s.) UA is NOT null it indeed finds a correct instance of userAccount and it has 2 Characters. If I wait few seconds and try again exception goes away..
What am I doing wrong? This is the first time I really use a database in VS please help! :)
It looks like you are running in to a problem with the deferred execution of the EntitySet. A simple way to check this and potentially work around it will be to try calling the .Count() method, instead of accessing the .Count property.
You could have a look in the debugger as soon as you hit that line, and look at the value of a.Characters.IsDeferred also.
edit
Another thing you could try would be to force execution of the query by implicitly calling it's .GetEnumerator() (and associated .MoveNext()) by replacing your loop with a foreach:
int i = 0;
foreach (var character in a.Characters)
{
d.Add( /* ... */ );
++i;
}
double edit
removed commentary about
d.Add((byte)(i + 1), (Dictionary<byte, object>)a.Characters[i]);
after clarification in the comments below
Hey just want anyone having the same problem know, I figured it out. What happened was I manualy renamed LINQ .dbml file when I added it to my project after it was geneerated by sqlmetal. And of course I did it inconsistently (it was renamed in designer but not in its .cs file). I just re-generated a new .dbml file with sqlmetal with a correct name this time and everything works like butter!
Thanks guys!
Is there a "best practice" way of handling bulk inserts (via LINQ) but discard records that may already be in the table? Or I am going to have to either do a bulk insert into an import table then delete duplicates, or insert one record at a time?
08/26/2010 - EDIT #1:
I am looking at the Intersect and Except methods right now. I am gathering up data from separate sources, converting into a List, want to "compare" to the target DB then INSERT just the NEW records.
List<DTO.GatherACH> allACHes = new List<DTO.GatherACH>();
State.IState myState = null;
State.Factory factory = State.Factory.Instance;
foreach (DTO.Rule rule in Helpers.Config.Rules)
{
myState = factory.CreateState(rule.StateName);
List<DTO.GatherACH> stateACHes = myState.GatherACH();
allACHes.AddRange(stateACHes);
}
List<Model.ACH> newRecords = new List<Model.ACH>(); // Create a disconnected "record set"...
foreach (DTO.GatherACH record in allACHes)
{
var storeInfo = dbZach.StoreInfoes.Where(a => a.StoreCode == record.StoreCode && (a.TypeID == 2 || a.TypeID == 4)).FirstOrDefault();
Model.ACH insertACH = new Model.ACH
{
StoreInfoID = storeInfo.ID,
SourceDatabaseID = (byte)sourceDB.ID,
LoanID = (long)record.LoanID,
PaymentID = (long)record.PaymentID,
LastName = record.LastName,
FirstName = record.FirstName,
MICR = record.MICR,
Amount = (decimal)record.Amount,
CheckDate = record.CheckDate
};
newRecords.Add(insertACH);
}
The above code builds the newRecords list. Now, I am trying to get the records from this List that are not in the DB by comparing on the 3 field Unique Index:
AchExceptComparer myComparer = new AchExceptComparer();
var validRecords = dbZach.ACHes.Intersect(newRecords, myComparer).ToList();
The comparer looks like:
class AchExceptComparer : IEqualityComparer<Model.ACH>
{
public bool Equals(Model.ACH x, Model.ACH y)
{
return (x.LoanID == y.LoanID && x.PaymentID == y.PaymentID && x.SourceDatabaseID == y.SourceDatabaseID);
}
public int GetHashCode(Model.ACH obj)
{
return base.GetHashCode();
}
}
However, I am getting this error:
LINQ to Entities does not recognize the method 'System.Linq.IQueryable1[MisterMoney.LARS.ZACH.Model.ACH] Intersect[ACH](System.Linq.IQueryable1[MisterMoney.LARS.ZACH.Model.ACH], System.Collections.Generic.IEnumerable1[MisterMoney.LARS.ZACH.Model.ACH], System.Collections.Generic.IEqualityComparer1[MisterMoney.LARS.ZACH.Model.ACH])' method, and this method cannot be translated into a store expression.
Any ideas? And yes, this is completely inline with the original question. :)
You can't do bulk inserts with LINQ to SQL (I presume you were referring to LINQ to SQL when you said "LINQ"). However, based on what you're describing, I'd recommend checking out the new MERGE operator of SQL Server 2008.
Inserting, Updating, and Deleting Data by Using MERGE
Another example here.
I recommend you just write the SQL yourself to do the inserting, I find it is a lot faster and you can get it to work exactly how you want it to. When I did something similar to this (just a one-off program) I just used a Dictionary to hold the ID's I had inserted already, to avoid duplicates.
I find LINQ to SQL is good for one record or a small set that does its entire lifespan in the LINQ to SQL.
Or you can try to use SQL Server 2008's Bulk Insert .
One thing to watch out for is if you queue more than 2000 or so records without calling SubmitChanges() - TSQL has a limit on the number of statements per execution, so you cannot simply queue up every record and then call SubmitChanges() as this will throw an SqlException, you need to periodically clear the queue to avoid this.
My code is written with C# and the data layer is using LINQ to SQL that fill/load detached object classes.
I have recently changed the code to work with multi threads and i'm pretty sure my DAL isn't thread safe.
Can you tell me if PopCall() and Count() are thread safe and if not how do i fix them?
public class DAL
{
//read one Call item from database and delete same item from database.
public static OCall PopCall()
{
using (var db = new MyDataContext())
{
var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
OCall call = FillOCall(fc);
if (fc != null)
{
db.Calls.DeleteOnSubmit(fc);
db.SubmitChanges();
}
return call;
}
}
public static int Count()
{
using (var db = new MyDataContext())
{
return (from c in db.Calls select c.ID).Count();
}
}
private static OCall FillOCall(Model.Call c)
{
if (c != null)
return new OCall { ID = c.ID, Caller = c.Caller, Called = c.Called };
else return null;
}
}
Detached OCall class:
public class OCall
{
public int ID { get; set; }
public string Caller { get; set; }
public bool Called { get; set; }
}
Individually they are thread-safe, as they use isolated data-contexts etc. However, they are not an atomic unit. So it is not safe to check the count is > 0 and then assume that there is still something there to pop. Any other thread could be mutating the database.
If you need something like this, you can wrap in a TransactionScope which will give you (by default) the serializable isolation level:
using(var tran = new TransactionScope()) {
int count = OCall.Count();
if(count > 0) {
var call = Count.PopCall();
// TODO: something will call, assuming it is non-null
}
}
Of course, this introduces blocking. It is better to simply check the FirstOrDefault().
Note that PopCall could still throw exceptions - if another thread/process deletes the data between you obtaining it and calling SubmitChanges. The good thing about it throwing here is that you shouldn't find that you return the same record twice.
The SubmitChanges is transactional, but the reads aren't, unless spanned by a transaction-scope or similar. To make PopCall atomic without throwing:
public static OCall PopCall()
{
using(var tran = new TrasactionScope())
using (var db = new MyDataContext())
{
var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
OCall call = FillOCall(fc);
if (fc != null)
{
db.Calls.DeleteOnSubmit(fc);
db.SubmitChanges();
}
return call;
}
tran.Complete();
}
}
Now the FirstOrDefault is covered by the serializable isolation-level, so doing the read will take a lock on the data. It would be even better if we could explicitly issue an UPDLOCK here, but LINQ-to-SQL doesn't offer this.
Count() is thread-safe. Calling it twice at the same time, from two different threads will not harm anything. Now, another thread might change the number of items during the call, but so what? Another thread might change the number of item a microsecond after it returns, and there's nothing you can do about it.
PopCall on the other hand, does has a possibility of threading problems. One thread could read fc, then before it reaches the SubmitChanges(), another thread may interceded and do the read & delete, before returning to the first thread, which will attempt to delete the already deleted record. Then both calls will return the same object, even though it was your intension that a row only be returned once.
Unfortunately no amount of Linq-To-Sql trickery, nor SqlClient isolation levels, nor System.Transactions can make the PopCall() thread safe, where 'thread safe' really means 'concurrent safe' (ie. when concurrency occurs on the database server, outside the controll and scope of the client code/process). And neither is any sort of C# locking and synchronization going to help you. You just need to deeply internalize how a relational storage engine works in order to get this doen correctly. Using tables as queues (as you do it here) is notoriously tricky , deadlock prone, and really hard to get it correct.
Even less fortunate, your solution is going to have to be platform specific. I'm only going to explain the right way to do it with SQL Server, and that is to leverage the OUTPUT clause. If you want to get a bit more details why this is the case, read this article Using tables as Queues. Your Pop operation must occur atomically in the database with a calls like this:
WITH cte AS (
SELECT TOP(1) ...
FROM Calls WITH (READPAST)
WHERE Called = 0)
DELETE
FROM cte
OUTPUT DELETED.*;
Not only this, but the Calls table has to be organized with a leftmost clustered key on the Called column. Why this is the case, is again explained in the article I referenced before.
In this context the Count call is basically useless. Your only way to check correctly for an item available is to Pop, asking for Count is just going to put useless stress on the database to return a COUNT() value which means nothing under a concurrent environment.