Is there any way to insert multiple records without a loop if you already have a collection? Granted there is little performance benefit if you commit outside the loop.
I have a context and an ObjectSet<Type>. I also have a collection of Type which I want to swap in, concatenate, intersect, or whatever with what's in the table. In other words, I'd rather not do the following:
foreach (Type r in Collection)
{
Context.ObjectSet.Add(r);
}
Context.SaveChanges();
You must always use a loop. There cannot be any performance benefit from methods like AddCollection or AddRange, because such methods usually rely on an optimization where the internal collection is extended for the whole range at once and the items are simply copied in, instead of growing the collection on each Add call. AddObject does much more than pass data to some internal collection, so it still has to process entities one by one.
If you want to optimize the performance of the database inserts themselves, you must move to another solution, because EF doesn't have any batch or bulk data modifications. Each record is sent as a single INSERT in a separate round trip to the database.
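One alternative outside EF, named here purely as an illustration (the answer doesn't specify one), is ADO.NET's SqlBulkCopy. A minimal sketch, where the Record type and the dbo.Records table/columns are made up:
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class BulkInsertHelper
{
    // Record, its properties and the table/column names are illustrative assumptions.
    public static void BulkInsert(string connectionString, IEnumerable<Record> records)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Value", typeof(int));

        foreach (var record in records)
        {
            table.Rows.Add(record.Name, record.Value);
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Records";
            // One bulk operation instead of one INSERT round trip per row.
            bulk.WriteToServer(table);
        }
    }
}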
You could try creating an extension method to do the looping and adding.
see below.
public static class AddHelper
{
    public static void AddAll<T>(this ObjectSet<T> objectSet, IEnumerable<T> items)
        where T : class
    {
        foreach (var item in items)
        {
            objectSet.AddObject(item);
        }
    }
}
Then you could use the code below.
IEnumerable<EntityName> list = GetEntities(); //this would typically return your list
Context.ObjectSet.AddAll(list);
I'm looking for the same type of solution.
I have yet to find a way to do it directly with EF, but there is a workaround. If you create a stored procedure that accepts a DataTable as a parameter, you can execute the proc from the Entity context and pass the table as a parameter.
When you're dealing with thousands or millions of records, this is a lot better than looping through each record to be passed to DB.
Passing Table Valued Parameters to SQL Server Stored Procedures.
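A minimal sketch of that approach, assuming the context is an ObjectContext; the table type dbo.EntityNameTableType, the procedure dbo.BulkInsertEntities and the columns are assumptions (requires System.Data and System.Data.SqlClient):
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Value", typeof(int));
foreach (var item in list)
{
    table.Rows.Add(item.Name, item.Value);
}

var parameter = new SqlParameter("@Items", SqlDbType.Structured)
{
    TypeName = "dbo.EntityNameTableType",
    Value = table
};

// One round trip: the whole DataTable travels as a single table-valued parameter.
Context.ExecuteStoreCommand("EXEC dbo.BulkInsertEntities @Items", parameter);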
More of a concept question...
Is there any reason it would be a bad idea to make async calls to my DataContext within a foreach loop such as this?
private async Task ProcessItems(List<Item> items)
{
var modifiedItems = new List<ModifiedItem>();
foreach (var item in items)
{
// **edited to reflect link between items and properties**
// var properties = await _context.Properties
// .Where(p => p.Condition == true).ToListAsync();
var properties = await _context.Properties
.Where(p => p.Condition == item.Condition).ToListAsync();
foreach (var property in properties)
{
// do something to modify 'item'
// based on the value of 'property'
// save to variable 'modifiedItem'
modifiedItems.Add(modifiedItem);
}
}
await _context.ModifiedItems.AddRangeAsync(modifiedItems);
await _context.SaveChangesAsync();
}
Since the inner foreach loop is dependent upon the properties variable, does it not start until the properties variable has been fully instantiated?
Being that the modifiedItems variable is declared outside of the parent foreach loop, is it a bad idea to asynchronously add each modifiedItem to the modifiedItems list?
Are there any features of Entity Framework or LINQ that would be better suited for this kind of task? Any better idea than nested foreach loops?
(In case anyone wants some context... IRL, items is a list of readings from sensors. And properties are mathematical equations to convert the raw readings to meaningful data such as volume and weight in different units... then those calculated data points are being stored in a database.)
No, there is nothing wrong with that as such, but you are missing a few concepts here.
Since you use an async method, the ProcessItems method should be called ProcessItemsAsync and return a task.
this would be of use to you: Async/Await - Best Practices in Asynchronous Programming
Depending on your needs, it is recommended to add a CancellationToken and to consider exception handling as well; just be careful not to swallow exceptions.
There is no issue with using async here; once you await an async call, the returned object is the same.
You may want to rethink executing a DB call in a foreach if you can run it once outside the loop, and filter the data in memory to act on it. Each use case is different and you have to make sure the larger return set can be handled in memory.
Usually getting a 1000 rows from a DB once is faster than 100 rows 10 times.
In this specific case, I'd instead write it to load a single list of all properties for all sensors of interest (considering all items, and based on the macAddress/sensorKey properties you've mentioned) and store that in a list; we'll call it allProperties. We await that once and avoid making repeated database calls.
Then, use LINQ to Objects to join your items to the matching objects in allProperties. Iterate the results of that join and there is no await to do inside the loop.
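A minimal sketch of that approach, reusing the names from the question; matching on Condition (rather than the macAddress/sensorKey mentioned above) and the ModifiedItem constructor are assumptions:
private async Task ProcessItemsAsync(List<Item> items)
{
    var conditions = items.Select(i => i.Condition).Distinct().ToList();

    // One database round trip for all relevant properties.
    var allProperties = await _context.Properties
        .Where(p => conditions.Contains(p.Condition))
        .ToListAsync();

    // LINQ to Objects join; no awaits inside the loop.
    var modifiedItems = new List<ModifiedItem>();
    var pairs = from item in items
                join property in allProperties on item.Condition equals property.Condition
                select new { item, property };

    foreach (var pair in pairs)
    {
        // Build the modified item from pair.item and pair.property.
        modifiedItems.Add(new ModifiedItem(pair.item, pair.property));
    }

    await _context.ModifiedItems.AddRangeAsync(modifiedItems);
    await _context.SaveChangesAsync();
}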
Yes, this would be a bad idea.
The method you propose performs one database query per item in your sequence of items. Apart from being less efficient, this might also produce an incoherent result set: someone might have changed the database between your first and your last query, so the first returned object could belong to a different database state than the result of the last query.
A simplified example: Let's query persons, with the name of the company they work for:
Table of Companies:
Id   Name
1    "Better Bakery"
2    "Google Inc"

Table of Persons:
Id   Name     CompanyId
1    "John"   1
2    "Pete"   1
Suppose I want the Id / Name / CompanyName of the Persons with Id {1, 2}.
Between your first and your second query, someone changes the name of the bakery to "Best Bakery". Your result would be:
PersonId   Name     CompanyName
1          "John"   "Better Bakery"
2          "Pete"   "Best Bakery"
Suddenly John and Pete work for different companies?
And what would be the result if you'd ask for Persons with Id {1, 1}? The same person would work for two different companies?
Surely this is not what you want!
Besides the result being incorrect, your database can optimize its work if it knows that you want the results for several items at once. The database creates temporary tables to calculate the result; these tables would have to be generated only once.
So if you want one consistent set of data that represents the state of the data at the moment of your query, create your data using one query:
The same result in one database query:
var itemConditions = items.Select(item => item.Condition);
IQueryable<Property> queryProperties = dbContext.Properties
.Where(property => itemConditions.Contains(property.Condition));
In words: from every item in your input sequence of Items (in your case a list), take its Condition.
Then query the database table Properties. Keep only those rows in this table that have a Condition that is one of the Conditions in itemConditions.
Note that no query or enumeration has been performed yet. The database has not been accessed. Only the Expression in the IQueryable has been modified.
To perform the query:
List<Property> fetchedProperties = await queryProperties.ToListAsync();
If you want, you could concatenate these statements into one big LINQ statement. I doubt whether this would improve performance, it surely would deteriorate readability, and thus testability / maintainability.
Once you've fetched all desired properties in one database query, you can change them:
foreach (Property fetchedProperty in fetchedProperties)
{
// change some values
}
await dbContext.SaveChangesAsync();
As long as you keep your dbContext alive, it will remember the state of all fetched rows from all tables in dbContext.ChangeTracker.Entries. This is a sequence of DbEntityEntry objects, one per fetched row. Every DbEntityEntry holds the original fetched values and the current values; they are used to identify which items have changed and thus need to be saved in one database transaction during SaveChanges().
Hence it is seldom necessary to actively notify your DbContext that you changed a fetched object.
The only use case I see is when you want to change an object without first adding or fetching it, which might be dangerous, as some of the object's properties might have been changed by someone else.
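A minimal sketch of that use case; the Property entity, its Id key and the new value are assumptions:
// Attach a stub without fetching it and mark it Modified, so SaveChanges
// issues an UPDATE based on the key alone. Every mapped column is overwritten,
// which is exactly why this can be dangerous.
var stub = new Property { Id = knownId, Condition = newCondition };
dbContext.Properties.Attach(stub);
dbContext.Entry(stub).State = EntityState.Modified;
await dbContext.SaveChangesAsync();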
I have a Repository base class which has the following method:
public void AddRange(IEnumerable<TEntity> entities)
{
Context.Set<TEntity>().AddRange(entities);
foreach (var entity in entities)
{
if (Context.Entry(entity).State == EntityState.Detached)
{
Context.Entry(entity).State = EntityState.Added;
}
}
}
I call it like so:
uow.UserPermissions.AddRange(permissions);
where UserPermissions is a repository that inherits from the base.
I am seeing some strange behavior that I cannot explain when permissions is not a list. For example when I try something like this:
var permissions = permissionDtos.Select(dto => new UserPermission()
{
...
});
uow.UserPermissions.AddRange(permissions);
Entity Framework adds twice as many permissions to the db as there are permissionDtos. However, if I add a ToList() at the end of the Select statement, the strange behavior goes away. I also noticed that when I comment out the foreach loop (that modifies the context entry state) in the Repository.AddRange() method, the strange behavior goes away as well (even without adding the ToList()).
Thanks in advance.
You're doing it wrong; that is why you get this behaviour. When you call Context.Set<TEntity>().AddRange(entities), it adds the given collection of entities to the set underlying the context, with each entity put into the Added state so that it will be inserted into the database when SaveChanges is called. You can see it on MSDN.
So you don't need to do it again inside the foreach loop. When you remove either one, your method will be OK.
You can try as shown below.
public void AddRange(IEnumerable<TEntity> entities)
{
Context.Set<TEntity>().AddRange(entities);
}
To elaborate on why you get twice as many entries, it's because of how IQueryable<T> works.
Every time you enumerate a query, the query is re-executed. In fact, the query is not actually performed at all until it gets to the loop somewhere inside Context.Set<TEntity>().AddRange(entities);
Internally, AddRange is enumerating the query results, which in your case will select all rows in the table behind permissionDtos and project these into new UserPermission objects.
Then, in your foreach loop, you're executing the query a second time by enumerating it again, which again selects all rows and projects them as new UserPermission objects. Since you're projecting these results, the framework isn't able to reuse the objects it created during the first execution of the query. If you were directly enumerating a Set<TEntity> twice, it would still execute the query twice, but it would be able to re-use the already materialized objects.
In the case where you use ToList(), you're forcing the query and projection to happen immediately and you're no longer saving the query itself - permissions in this case is no longer a query but a list of your already-projected items.
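A minimal illustration of that re-execution, assuming permissionDtos is an IQueryable backed by a database table as described above:
var permissions = permissionDtos.Select(dto => new UserPermission { /* ... */ });

var first = permissions.ToList();   // query runs, new UserPermission objects are materialized
var second = permissions.ToList();  // query runs again, producing different instances

// Both lists represent the same rows, but the instances are distinct, so the
// context would track them as separate entities if both enumerations were added.
bool sameInstance = ReferenceEquals(first[0], second[0]); // false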
Sampath has already answered that your foreach loop is redundant - adding the entities to the context will mark them as added, and setting the state to added will add them to the context (the only difference between the two being in how related entities accessible through navigation properties are treated). Even in the List case, you're doing an unnecessary loop.
I'm making a complex app (for planning, involving articles, sales, clients, manufacturing, machines...) that uses the information provided by the SQL Server database of an ERP.
I use about 30 different related objects, each of which has its info stored in a table/view. Some of these tables have from 20k to 100k records.
I need to convert all these tables into C# objects for further processing that cannot be done in SQL. I do not need all the rows, but there isn't a way to determine exactly which ones I will need, as that depends on runtime events.
The question is about the best way to do this. I have tried the following approaches:
Retrieve all the data and store it in a DataSet using a SqlDataAdapter, which occupies about 300 MB of RAM. The first problem here is keeping it in sync, but that's acceptable since the data isn't going to change much during execution. Then I run through every row and convert it to a C# object, stored in static Dictionaries for fast access by key. The problem with this is that creating so many objects (millions) takes memory usage up to 1.4 GB, which is too much. Memory aside, data access is very fast.
So, since getting everything takes too much memory, I thought I needed some kind of lazy loading, so I tried:
Another option I have considered is to query the database directly through a SqlDataReader, filtering by the item I need only the first time it's required, and then storing it in the static dictionary. This way memory usage is minimal, but it is slow (on the order of minutes), as it means making something like a million separate queries, which the server doesn't seem to like (low performance).
Lastly, I tried an intermediate approach that kind of works, but I'm not sure it's optimal; I suspect it's not:
A third option would be to fill a DataSet containing all the info and store a local static copy, but not convert all the rows to objects, just do it on demand (lazy), something like this:
public class ProductoTerminado : Articulo {
private static Dictionary<string, ProductoTerminado> productosTerminados = new Dictionary<string, ProductoTerminado>();
public PinturaTipo pinturaTipo { get; set; }
public ProductoTerminado(string id)
: base(id) { }
public static ProductoTerminado Obtener(string idArticulo)
{
idArticulo = idArticulo.ToUpper();
if (productosTerminados.ContainsKey(idArticulo))
{
return productosTerminados[idArticulo];
}
else
{
ProductoTerminado productoTerminado = new ProductoTerminado(idArticulo);
//This is where I get new data from that static dataset
var fila = Datos.bd.Tables["articulos"].Select("IdArticulo = '" + idArticulo + "'").First();
//Then I fill the object and add it to the dictionary.
productoTerminado.descripcion = fila["Descripcion"].ToString();
productoTerminado.paletizacion = Convert.ToInt32(fila["CantidadBulto"]);
productoTerminado.pinturaTipo = PinturaTipo.Obtener(fila["PT"].ToString());
productosTerminados.Add(idArticulo, productoTerminado);
return productoTerminado;
}
}
}
So, is this a good way to proceed or should I look into Entity Framework or something like a strongly typed DataSet?
I use relations between about 30 different objects, each of which has its info stored in a table/view. Some of these tables have from 20k to 100k records.
I suggest making a different decision for different types of objects. Usually, the tables that have thousands of records are more likely to change; tables that have fewer records are less likely. In a project I was working on, the decision was to cache the objects that don't change in a List<T> at start-up. For a few hundred instances this should take well under a second.
If you are using linq-to-sql, have an object local in a List<T> and have correctly defined the FK constraints, you can do obj.Items to access the Items table filtered by obj's ID. (In this example obj is the PK and Items is the FK table).
This design will also give users the performance they expect. When working on small sets everything is instantaneous (cached). When working on larger sets but making small selects or inserts - performance is good (quick queries that use the PK). You only really suffer when you start doing queries that join multiple big tables; and in those cases, users will probably expect this (though I can't be certain without knowing more about the use case).
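A minimal sketch of the start-up cache idea mentioned above, assuming LINQ to SQL; ErpDataContext is a made-up DataContext name and PinturaTipo is borrowed from the question's code:
public static class ReferenceCache
{
    public static List<PinturaTipo> PinturaTipos { get; private set; }

    public static void Load(ErpDataContext db)
    {
        // One query at start-up; afterwards lookups are served from memory.
        PinturaTipos = db.GetTable<PinturaTipo>().ToList();
    }
}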
I use this method to get each page of data from EF:
public IEnumerable<MyObj> GetByPage(int page, int perPage)
{
return context.MyObj.Skip((page - 1) * perPage).Take(perPage);
}
I want to know: would this code fetch all rows of MyObj into memory and then Skip and Take, or will all of the above code be translated into a SQL command?
If it first loads everything into memory, how can I use LINQ to Entities so that Skip and Take don't need to load everything into memory?
As long as you're not materializing the query (i.e. calling ToList()/ToArray() etc., or iterating over it), your Skip and Take methods will be translated to SQL by the LINQ to Entities provider that is part of the Entity Framework.
So to answer your question: no, it won't fetch all data and load it into memory.
See this MSDN article for a full explanation.
Both Skip and Take are listed as supported LINQ to Entities methods, so they will be translated into proper SQL statements and only the necessary rows will be retrieved from the database.
And because your method returns IEnumerable<T> instead of IQueryable<T>, every enumeration of the query returned from that method will cause the query to execute again.
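A hedged variant of the method that keeps the query composable by returning IQueryable<T>; the ordering column (Id) is an assumption, and note that LINQ to Entities requires an OrderBy before Skip can be translated:
public IQueryable<MyObj> GetByPage(int page, int perPage)
{
    return context.MyObj
        .OrderBy(o => o.Id)
        .Skip((page - 1) * perPage)
        .Take(perPage);
}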
Skip and Take are composed into the query, so only the resulting page is stored in memory and the whole data set is never held there; the query itself is not run until it is enumerated. This is called deferred execution.
Ultimately it depends on your data store, but for SQL Server it will not pull any records not covered by that query. In T-SQL, constructs such as TOP, ROW_NUMBER or OFFSET/FETCH are used to page the records.
How can I do these 2 scenarios?
Currently I am doing something like this
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
        // save record ( dbcontext.SubmitChanges() )
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { productname = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
        // dbcontext.SubmitChanges();
    }
}
So now I am trying to load an XML file that has tons of records. First I validate the records one at a time. Then I want to insert them into the database, but instead of doing SubmitChanges() after each record I want to do one mass submit at the end.
So I have something like this
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { productname = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
    }

    public void SaveToDb()
    {
        dbcontext.SubmitChanges();
    }
}
Then in my service layer I would do something like:
for (int i = 0; i < 100; i++)
{
    validate();
    if (valid == true)
    {
        update();
        insert();
    }
}
SaveToDb();
So pretend my for loop has a count of all the records found in the XML file. I first validate each record. If it is valid, I have to update a table before I insert the record, and then I insert the record.
After that I want to save everything in one go.
I am not sure if I can do a mass save when updating, or if that has to happen after every change.
But I thought it would work for sure for the insert case.
Nothing seems to crash, and I am not sure how to check whether the records are being added to the dbcontext.
The simple answer is: you do not. Linq2Sql is a lot of things - it is not a replacement for bulk upload / bulk copy. You will be a LOT more efficient using the ETL route:
Generate a flat file (csv etc.) with the new data
Load it into the database using bulk load mechanisms
If the data is updating etc. - load it into temporary tables and use the MERGE command to merge it into the main table.
Linq2Sql will by design always suck in mass insert scenarios. ORMs just are not ETL tools.
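A minimal sketch of that ETL route using plain ADO.NET; the staging and target table names, the columns and the DataTable built from the XML file are all assumptions:
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // 1. Bulk load the new data into a staging table.
    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "dbo.Product_Staging";
        bulk.WriteToServer(productDataTable); // DataTable built from the XML file
    }

    // 2. Merge the staging rows into the main table in one statement.
    using (var command = connection.CreateCommand())
    {
        command.CommandText = @"
            MERGE dbo.Product AS target
            USING dbo.Product_Staging AS source ON target.Id = source.Id
            WHEN MATCHED THEN UPDATE SET target.ProductName = source.ProductName
            WHEN NOT MATCHED THEN INSERT (Id, ProductName) VALUES (source.Id, source.ProductName);";
        command.ExecuteNonQuery();
    }
}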
Linq2SQL (as has been noted) does not handle this well by default, but luckily there are some solutions out there. Here's one I used for a website when I wanted to do some bulk deletes. It worked well for me, and due to its use of extension methods it was basically indistinguishable from regular Linq2SQL methods.
I haven't really "released" this project yet, but it's a T4-based repository system that extends Linq To SQL and implements a bunch of batch operations (delete, update, create csv, etc.): http://code.google.com/p/grim-repo/. You can check out the source code and implement it however you see fit.
Also, this link has some great source code for batch operations: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx
And, also, I know it's tempting, but don't crap on the elderly. Try performing batch operations with DataAdapters/ADO.net: http://davidhayden.com/blog/dave/archive/2006/01/05/2665.aspx. It's faster, but inevitably hairier.
Finally, if you have an XML file, you can create a stored procedure that takes advantage of SQL server's built-in sproc, sp_xml_preparedocument. Check out how to use it here: http://msdn.microsoft.com/en-us/library/ms187367.aspx
Even when you add multiple records to the DataContext before calling SubmitChanges, LINQ2SQL will loop through and insert them one by one. You can verify this by implementing one of the partial insert methods generated on your DataContext (e.g. "InsertMyObject(MyObject instance)"); it will be called for each pending row individually.
I don't see anything wrong with your plan -- you say it works, but you just don't know how to verify it? Can't you simply look in the database to check if the records got added?
Another way to see which records are pending in the DataContext and have not yet been saved is to call GetChangeSet() on the data context and then look at the Inserts property of the returned object to get a list of rows that will be inserted when SubmitChanges is called.
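A minimal sketch of that check, using the ChangeSet type from System.Data.Linq and the dbcontext from the question:
ChangeSet changeSet = dbcontext.GetChangeSet();
Console.WriteLine("Inserts pending: " + changeSet.Inserts.Count);
Console.WriteLine("Updates pending: " + changeSet.Updates.Count);
Console.WriteLine("Deletes pending: " + changeSet.Deletes.Count);

dbcontext.SubmitChanges();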