I have a performance counter category. The counters in this category may change in my next release, so when the program starts I want to check whether the category exists and is the correct version; if not, create the new category. I can do this by storing a GUID in the help string, but that is obviously smelly. Is there a cleaner way to do this with the .NET API?
Existing smelly version...
if (PerformanceCounterCategory.Exists(CATEGORY_NAME))
{
    PerformanceCounterCategory c = new PerformanceCounterCategory(CATEGORY_NAME);
    if (c.CategoryHelp != CATEGORY_VERSION)
    {
        PerformanceCounterCategory.Delete(CATEGORY_NAME);
    }
}
if (!PerformanceCounterCategory.Exists(CATEGORY_NAME))
{
    // Create category
}
In our system, each time the application starts we do a check for the existing category. If it isn't found, we create the category. If it exists, we compare the existing category to what we expect and recreate it (delete, create) if there are missing values.
var missing = counters
    .Where(counter => !PerformanceCounterCategory.CounterExists(counter.Name, CategoryName))
    .Count();
if (missing > 0)
{
    PerformanceCounterCategory.Delete(CategoryName);
    PerformanceCounterCategory.Create(
        CategoryName,
        CategoryHelp,
        PerformanceCounterCategoryType.MultiInstance,
        new CounterCreationDataCollection(counters.Select(x => (CounterCreationData)x).ToArray()));
}
I don't think there is a better way, and IMHO it isn't a terrible solution.
I can do this in GitBash:
$ git diff --name-only v01...HEAD -- *.sql
which gives:
Components/1/Database/Stored Procedures/spDC1.sql
Components/1/Database/Stored Procedures/spDC2.sql
I can't see how I would do this in LibGit2Sharp.
Any ideas?
Thanks
Here is an example from one of my projects that gets an ICommitLog collection between two commits (the current HEAD vs. the master branch):
// git log HEAD..master --reverse
public ICommitLog StalkerList {
    get {
        var filter = new CommitFilter {
            SortBy = CommitSortStrategies.Reverse | CommitSortStrategies.Time,
            Since = master,
            Until = head.Tip,
        };
        return repo.Commits.QueryBy (filter);
    }
}
Once you have your ICommitLog collection of all the commits within the range you need, you can cycle through each commit to get a list of the files that were affected within that commit (of course, you would still need to filter the file names against your "*.sql" requirement; a sketch of that follows the code below):
public String[] FilesToMerge (Commit commit)
{
    var fileList = new List<String> ();
    foreach (var parent in commit.Parents) {
        foreach (TreeEntryChanges change in repo.Diff.Compare<TreeChanges>(parent.Tree, commit.Tree)) {
            fileList.Add (change.Path);
        }
    }
    return fileList.ToArray ();
}
I think SushiHangover's answer to this is pretty correct. Just a couple of amendments / updates. (P.S. Yeah I know this question's relatively old, but a complete answer would have helped me if I'd found it today, so putting one here for future peeps).
This bit should be an amendment comment, but I can't comment yet (low rep):
First, I think that in Sushi's example, master and head.Tip are the wrong way around. Until excludes a commit (and its ancestors) from the results, so if you put head.Tip in there it'll exclude basically the entire history tree.
So AFAIK it should read more like this:
// git log HEAD..master --reverse
public ICommitLog StalkerList {
    get {
        var filter = new CommitFilter {
            SortBy = CommitSortStrategies.Reverse | CommitSortStrategies.Time,
            Since = head.Tip,
            Until = master
        };
        return repo.Commits.QueryBy (filter);
    }
}
It's also SUPER important to realise that the order you give them in matters. If you just switch them around you'll get nothing back.
(Also note that Sushi's original example had a stray ',' after "head.Tip".)
This bit's the update:
Also worth noting that the LibGit2Sharp library has been updated recently, replacing Since and Until with IncludeReachableFrom and ExcludeReachableFrom respectively.
The names aren't particularly helpful until you realise that they're just way more verbose about what they're doing.
The comment for Exclude, for example, reads:
/// A pointer to a commit object or a list of pointers which will be excluded (along with ancestors) from the enumeration.
So the latest implementation would look more like:
using (Repository r = new Repository(@"[repo_location]"))
{
    CommitFilter cf = new CommitFilter
    {
        SortBy = CommitSortStrategies.Reverse | CommitSortStrategies.Time,
        ExcludeReachableFrom = r.Branches["master"].Tip,
        IncludeReachableFrom = r.Head.Tip
    };
    var results = r.Commits.QueryBy(cf);
    foreach (var result in results)
    {
        //Process commits here.
    }
}
Tested this out in LINQPad and it seems to work. Might have missed something though as it was a quick draft. If so let me know.
Things to note: master and Head are actually properties of a Repository; I couldn't see where they came from in the old example, but that may just be down to a difference from the older version.
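For reference, a minimal sketch of where those come from in current LibGit2Sharp (the repository path is a placeholder):
using (var repo = new Repository(@"[repo_location]"))
{
    Branch head = repo.Head;                  // the currently checked-out branch
    Branch master = repo.Branches["master"];  // the branch named "master"
    // head.Tip and master.Tip are the Commit objects used in the filters above.
}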
Under Neo4j v1.9.x, I used the following sort of code.
private Category CreateNodeCategory(Category cat)
{
    var node = client.Create(cat,
        new IRelationshipAllowingParticipantNode<Category>[0],
        new[]
        {
            new IndexEntry(NeoConst.IDX_Category)
            {
                { NeoConst.PRP_Name, cat.Name },
                { NeoConst.PRP_Guid, cat.Nguid.ToString() }
            }
        });
    cat.Nid = node.Id;
    client.Update<Category>(node, cat);
    return cat;
}
The reason is that the node id was auto-generated and I could use it later for quick look-ups, as START points in other queries, and so on. Like the following:
private Node<Category> CategoryGet(long nodeId)
{
    return client.Get<Category>((NodeReference<Category>)nodeId);
}
This enables the following which appeared to work well.
public Category CategoryAdd(Category cat)
{
    cat = CategoryFind(cat);
    if (cat.Nid != 0) { return cat; }
    return CreateNodeCategory(cat);
}

public Category CategoryFind(Category cat)
{
    if (cat.Nid != 0) { return cat; }
    var node = client.Cypher.Start(new {
            n = Node.ByIndexLookup(NeoConst.IDX_Category, NeoConst.PRP_Name, cat.Name) })
        .Return<Node<Category>>("n")
        .Results.FirstOrDefault();
    if (node != null) { cat = node.Data; }
    return cat;
}
Now the Cypher wiki, the examples, and the bad-habits guidance all recommend using .ExecuteWithoutResults() for all of the CRUD.
So my question is: how do you get an auto-increment value for the node id?
First up, for Neo4j 2 and onwards, you always need to start with the frame of reference "how would I do this in Cypher?". Then, and only then, do you worry about the C#.
Now, distilling your question, it sounds like your primary goal is to create a node, and then return a reference to it for further work.
You can do this in cypher with:
CREATE (myNode)
RETURN myNode
In C#, this would be:
var categoryNode = graphClient.Cypher
.Create("(category {cat})")
.WithParams(new { cat })
.Return(cat => cat.Node<Category>())
.Results
.Single();
However, this still isn't 100% what you were doing in your original CreateNodeCategory method. You are creating the node in the DB, getting Neo4j's internal identifier for it, then saving that identifier back into the same node. Basically, you're using Neo4j to generate auto-incrementing numbers for you. That's functional, but not really a good approach. I'll explain more ...
First up, the concept of Neo4j even giving you the node id back is going away. It's an internal identifier that actually happens to be a file offset on disk. It can change. It is low level. If you think about SQL for a second, do you use a SQL query to get the file byte offset of a row, then reference that for future updates? A: No; you write a query that finds and manipulates the row all in one hit.
Now, I notice that you already have an Nguid property on the nodes. Why can't you use that as the id? Or if the name is always unique, use that? (Domain relevant ids are always preferable to magic numbers.) If neither are appropriate, you might want to look at a project like SnowMaker to help you out.
Next, we need to look at indexing. The type of indexing that you're using is referred to in the 2.0 docs as "Legacy Indexing" and misses out on some of the cool Neo4j 2.0 features.
For the rest of this answer, I'm going to assume your Category class looks like this:
public class Category
{
    public Guid UniqueId { get; set; }
    public string Name { get; set; }
}
Let's start by creating our category node with a label:
var category = new Category { UniqueId = Guid.NewGuid(), Name = "Spanners" };
graphClient.Cypher
    .Create("(category:Category {category})")
    .WithParams(new { category })
    .ExecuteWithoutResults();
And, as a one-time operation, let's establish a schema-based index on the Name property of any nodes with the Category label:
graphClient.Cypher
    .Create("INDEX ON :Category(Name)")
    .ExecuteWithoutResults();
Now, we don't need to worry about manually keeping indexes up to date.
We can also introduce an index and unique constraint on UniqueId:
graphClient.Cypher
    .Create("CONSTRAINT ON (category:Category) ASSERT category.UniqueId IS UNIQUE")
    .ExecuteWithoutResults();
Querying is now very easy:
graphClient.Cypher
    .Match("(c:Category)")
    .Where((Category c) => c.UniqueId == someGuidVariable)
    .Return(c => c.As<Category>())
    .Results
    .Single();
Rather than looking up a category node, to then do another query, just do it all in one go:
var productsInCategory = graphClient.Cypher
    .Match("(c:Category)<-[:IN_CATEGORY]-(p:Product)")
    .Where((Category c) => c.UniqueId == someGuidVariable)
    .Return(p => p.As<Product>())
    .Results;
If you want to update a category, do that in one go as well:
graphClient.Cypher
    .Match("(c:Category)")
    .Where((Category c) => c.UniqueId == someGuidVariable)
    .Update("c = {category}")
    .WithParams(new { category })
    .ExecuteWithoutResults();
Finally, your CategoryAdd method currently 1) does one DB hit to find an existing node, 2) a second DB hit to create a new one, 3) a third DB hit to update the ID on it. Instead, you can compress all of this to a single call too using the MERGE keyword:
public Category GetOrCreateCategoryByName(string name)
{
    return graphClient.Cypher
        .WithParams(new {
            name,
            newIdIfRequired = Guid.NewGuid()
        })
        .Merge("(c:Category { Name: {name} })")
        .OnCreate("c")
        .Set("c.UniqueId = {newIdIfRequired}")
        .Return(c => c.As<Category>())
        .Results
        .Single();
}
Basically,
Don't use Neo4j's internal ids as a way to hack around managing your own identities. (But they may release some form of autonumbering in the future. Even if they do, domain identities like email addresses or SKUs or airport codes or ... are preferred. You don't even always need an id: you can often infer a node based on its position in the graph.)
Generally, Node<T> will disappear over time. If you use it now, you're just accruing legacy code.
Look into labels and schema-based indexing. They will make your life easier.
Try and do things in the one query. It will be much faster.
Hope that helps!
I am writing a fairly large service centered around Stanford's Folding#Home project. This portion of the project is a WCF service hosted inside of a Windows Service. With proper database indices and a dual core Core2Duo/7200rpm platter I am able to run approximately 1500 rows per second (SQL 2012 Datacenter instance). Each hour when I run this update, it takes a considerable amount of time to iterate through all 1.5 million users and add updates where necessary.
Looking at the performance profiler in SQL Server Management Studio 2012, I see that every user is being loaded via individual queries. Is there a way with EF to eagerly load a set of a given size of users, update them in memory, then save the updated users - using queries more elegant than single-select, single-update? I am currently using EF5, but if I need to move to 6 for improved performance, I will. The main source of delay on this process is waiting for database results.
Also, if there is anything I should change about the ForAll or pre-processing, feel free to mention it. The group pre-processing is very quick and dramatically increases the speed of the update by controlling each EF context's size - but if I can pre-process more and improve the overall time, I am more than willing to look into it!
private void DoUpdate(IEnumerable<Update> table)
{
    var t = table.ToList();
    var numberOfRowsInGroups = t.Count() / (Properties.Settings.Default.UpdatesPerContext); //Control each local context size. 120 works well on most systems I have.

    //Split work groups out of the table of updates.
    var groups = t.AsParallel()
        .Select((update, index) => new { Value = update, Index = index })
        .GroupBy(a => a.Index % numberOfRowsInGroups)
        .ToList();

    groups.AsParallel().ForAll(group =>
    {
        var ents = new FoldingDataEntities();
        ents.Configuration.AutoDetectChangesEnabled = false;
        ents.Configuration.LazyLoadingEnabled = true;
        ents.Database.Connection.Open();

        var count = 0;
        foreach (var a in group)
        {
            var update = a.Value;
            var data = UserData.GetUserData(update.Name, update.Team, ents); //(Name, Team) is a superkey; passing ents allows external context control
            if (data.TotalPoints < update.NewCredit)
            {
                data.addUpdate(update.NewCredit, update.Sum); //basic arithmetic, very quick - may attach a row to the UserData.Updates collection (does not SaveChanges here)
            }
        }
        ents.ChangeTracker.DetectChanges();
        ents.SaveChanges();
    });
}
//from the UserData class which wraps the EF code.
public static UserData GetUserData(string name, long team, FoldingDataEntities ents)
{
    return ents.Users.Local.FirstOrDefault(u => (u.Team == team && u.Name == name))
        ?? ents.Users.FirstOrDefault(u => (u.Team == team && u.Name == name))
        ?? ents.Users.Add(new User { Name = name, Team = team, StartDate = DateTime.Now, LastUpdate = DateTime.Now });
}
internal struct Update
{
    public string Name;
    public long NewCredit;
    public long Sum;
    public long Team;
}
EF is not the solution for raw performance... It's the "easy way" to do a Data Access Layer, or DAL, but comes with a fair bit of overhead. I'd highly recommend using Dapper or raw ADO.NET to do a bulk update... Would be a lot faster.
http://www.ormbattle.net/
Now, to answer your question, to do a batch update in EF, you'll need to download some extensions and third party plugins that will enable such abilities. See: Batch update/delete EF5
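If you do go the Dapper/ADO.NET route for the hot path, a rough sketch of the per-batch update might look like the following. The table and column names (Users, TotalPoints, Name, Team), the connection string parameter, and the DoUpdateWithDapper method are assumptions for illustration, not your actual schema:
// Sketch only: requires the Dapper NuGet package (using Dapper;), System.Linq and System.Data.SqlClient.
private static void DoUpdateWithDapper(IEnumerable<Update> updates, string connectionString)
{
    const string sql = @"
        UPDATE  Users
        SET     TotalPoints = @NewCredit
        WHERE   Name = @Name AND Team = @Team AND TotalPoints < @NewCredit;";

    // Project to an anonymous type because Dapper binds parameters from properties, not fields.
    var args = updates.Select(u => new { u.Name, u.Team, u.NewCredit });

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        {
            // Dapper executes the statement once per element of the sequence,
            // all on one connection/transaction and with no change-tracking overhead.
            conn.Execute(sql, args, transaction: tran);
            tran.Commit();
        }
    }
}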
My code is written in C# and the data layer uses LINQ to SQL to fill/load detached object classes.
I have recently changed the code to work with multiple threads, and I'm pretty sure my DAL isn't thread safe.
Can you tell me whether PopCall() and Count() are thread safe and, if not, how do I fix them?
public class DAL
{
    //read one Call item from database and delete same item from database.
    public static OCall PopCall()
    {
        using (var db = new MyDataContext())
        {
            var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
            OCall call = FillOCall(fc);
            if (fc != null)
            {
                db.Calls.DeleteOnSubmit(fc);
                db.SubmitChanges();
            }
            return call;
        }
    }

    public static int Count()
    {
        using (var db = new MyDataContext())
        {
            return (from c in db.Calls select c.ID).Count();
        }
    }

    private static OCall FillOCall(Model.Call c)
    {
        if (c != null)
            return new OCall { ID = c.ID, Caller = c.Caller, Called = c.Called };
        else
            return null;
    }
}
Detached OCall class:
public class OCall
{
    public int ID { get; set; }
    public string Caller { get; set; }
    public bool Called { get; set; }
}
Individually they are thread-safe, as they use isolated data-contexts etc. However, they are not an atomic unit. So it is not safe to check the count is > 0 and then assume that there is still something there to pop. Any other thread could be mutating the database.
If you need something like this, you can wrap in a TransactionScope which will give you (by default) the serializable isolation level:
using (var tran = new TransactionScope()) {
    int count = DAL.Count();
    if (count > 0) {
        var call = DAL.PopCall();
        // TODO: something with call, assuming it is non-null
    }
    tran.Complete();
}
Of course, this introduces blocking. It is better to simply rely on the FirstOrDefault() returning null.
Note that PopCall could still throw exceptions - if another thread/process deletes the data between you obtaining it and calling SubmitChanges. The good thing about it throwing here is that you shouldn't find that you return the same record twice.
The SubmitChanges is transactional, but the reads aren't, unless spanned by a transaction-scope or similar. To make PopCall atomic without throwing:
public static OCall PopCall()
{
    using (var tran = new TransactionScope())
    using (var db = new MyDataContext())
    {
        var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
        OCall call = FillOCall(fc);
        if (fc != null)
        {
            db.Calls.DeleteOnSubmit(fc);
            db.SubmitChanges();
        }
        tran.Complete();
        return call;
    }
}
Now the FirstOrDefault is covered by the serializable isolation-level, so doing the read will take a lock on the data. It would be even better if we could explicitly issue an UPDLOCK here, but LINQ-to-SQL doesn't offer this.
Count() is thread-safe. Calling it twice at the same time, from two different threads, will not harm anything. Now, another thread might change the number of items during the call, but so what? Another thread might change the number of items a microsecond after it returns, and there's nothing you can do about it.
PopCall, on the other hand, does have a possibility of threading problems. One thread could read fc, then before it reaches SubmitChanges(), another thread may intercede and do the read & delete before returning to the first thread, which will then attempt to delete the already-deleted record. Both calls will then return the same object, even though your intention was that a row only be returned once.
Unfortunately, no amount of LINQ-to-SQL trickery, nor SqlClient isolation levels, nor System.Transactions can make PopCall() thread safe, where 'thread safe' really means 'concurrency safe' (i.e. when concurrency occurs on the database server, outside the control and scope of the client code/process). And neither is any sort of C# locking and synchronization going to help you. You just need to deeply internalize how a relational storage engine works in order to get this done correctly. Using tables as queues (as you do here) is notoriously tricky, deadlock prone, and really hard to get right.
Even less fortunate, your solution is going to have to be platform specific. I'm only going to explain the right way to do it with SQL Server, and that is to leverage the OUTPUT clause. If you want a bit more detail on why this is the case, read the article Using tables as Queues. Your Pop operation must occur atomically in the database with a call like this:
WITH cte AS (
    SELECT TOP(1) ...
    FROM Calls WITH (READPAST)
    WHERE Called = 0)
DELETE
FROM cte
OUTPUT DELETED.*;
Not only this, but the Calls table has to be organized with a leftmost clustered key on the Called column. Why this is the case, is again explained in the article I referenced before.
In this context the Count call is basically useless. Your only way to check correctly for an item available is to Pop, asking for Count is just going to put useless stress on the database to return a COUNT() value which means nothing under a concurrent environment.
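For completeness, a minimal ADO.NET sketch of wiring that statement up from C#. The column list and types are assumptions based on the OCall class above; adjust them to your real schema:
using System.Data.SqlClient;

public static OCall PopCall(string connectionString)
{
    // Assumed columns: ID (int), Caller (nvarchar), Called (bit).
    const string sql = @"
        WITH cte AS (
            SELECT TOP(1) ID, Caller, Called
            FROM Calls WITH (READPAST)
            WHERE Called = 0)
        DELETE FROM cte
        OUTPUT DELETED.ID, DELETED.Caller, DELETED.Called;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            if (!reader.Read()) return null; // queue is empty
            return new OCall
            {
                ID = reader.GetInt32(0),
                Caller = reader.GetString(1),
                Called = reader.GetBoolean(2)
            };
        }
    }
}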
I have an MS SQL table that I don't have any control over and I need to write to. This table has an int primary key that isn't automatically incremented. I can't use stored procs, and I would like to use LINQ to SQL since it makes other processing very easy.
My current solution is to read the last value, increment it, try to use it, if I get a clash, increment it again and retry.
Something along these lines:
var newEntity = new Log()
{
    ID = dc.Logs.Max(l => l.ID) + 1,
    Note = "Test"
};
dc.Logs.InsertOnSubmit(newEntity);

const int maxRetries = 10;
int retries = 0;
bool success = false;
while (!success && retries < maxRetries)
{
    try
    {
        dc.SubmitChanges();
        success = true;
    }
    catch (SqlException)
    {
        retries++;
        newEntity.ID = dc.Logs.Max(l => l.ID) + 1; // key clash: grab the new max and try again
    }
}
if (retries >= maxRetries)
{
    throw new Exception("Bummer...");
}
Does anyone have a better solution?
EDIT: Thanks to Jon, I simplified the max ID calculation. I was still in SQL thinking mode.
That looks like an expensive way to get the maximum ID. Have you already tried
var maxId = dc.Logs.Max(s => s.ID);
? Maybe it doesn't work for some reason, but I really hope it does...
(Admittedly it's more than possible that SQL Server optimises this appropriately.)
Other than that, it looks okay (smelly, but necessarily so) to me - but I'm not an expert on the matter...
You didn't indicate whether your app is the only one inserting into the table. If it is, then I'd fetch the max value once right after the start of the app/web app and use Interlocked.Increment on it every time you need the next ID (or simple addition, if possible race conditions can be ruled out).
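A minimal sketch of that approach, assuming this process really is the only writer (the class and method names here are made up for illustration):
// Only safe if this process is the sole writer to the table.
public static class LogIdGenerator
{
    private static int _lastId;

    // Call once at startup, e.g. LogIdGenerator.Initialise(dc.Logs.Max(l => (int?)l.ID) ?? 0);
    public static void Initialise(int currentMaxId)
    {
        _lastId = currentMaxId;
    }

    public static int NextId()
    {
        // Atomic across threads within this process.
        return System.Threading.Interlocked.Increment(ref _lastId);
    }
}
The insert then just becomes newEntity.ID = LogIdGenerator.NextId(); with no retry loop needed for this app's own inserts.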
You could put the entire operation in a transaction, using a TransactionScope class, like below:
using (TransactionScope scope = new TransactionScope())
{
    var maxId = dc.Logs.Max(s => s.ID);
    var newEntity = new Log()
    {
        ID = maxId + 1,
        Note = "Test"
    };
    dc.Logs.InsertOnSubmit(newEntity);
    dc.SubmitChanges();
    scope.Complete();
}
By putting both the retrieval of the maximum ID and the insertion of the new records within the same transaction, you should be able to pull off an insert without having to retry in your manner.
One problem you might face with this method will be transaction deadlocks, especially if the table is heavily used. Do test it out to see if you require additional error-handling.
P.S. I included Jon Skeet's code to get the max ID in my code, because I'm pretty sure it will work correctly. :)
Make the id field auto-incrementing and let the server handle id generation.
Otherwise you will run into the problem liggett78 described: nothing prevents another thread from reading the same max id between this thread's read and its submit.
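If you ever do get the option to change the schema, a sketch of what the LINQ to SQL mapping could look like once the column is an IDENTITY (the Log class shape here is assumed from the question):
using System.Data.Linq.Mapping;

[Table(Name = "Logs")]
public partial class Log
{
    // IsDbGenerated tells LINQ to SQL the server assigns the value;
    // AutoSync.OnInsert reads it back into the entity after SubmitChanges.
    [Column(IsPrimaryKey = true, IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
    public int ID { get; set; }

    [Column]
    public string Note { get; set; }
}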