In my console application I have a couple of classes (let's call them MyClass1, MyClass2, ...), each with a method that checks for the existence of certain records in the database (different classes wait for different records) and returns only once the needed records exist. I currently have a simple implementation that uses an infinite loop and Thread.Sleep. This approach works, but it tends to cause a high CPU load. How can I make these methods more CPU-friendly?
public override void WaitForRecord()
{
    MyDatabaseRecord record = null;
    while (record == null)
    {
        Thread.Sleep(500);
        using (var dc = new MyDataContext())
        {
            record = dc.Documents
                .Where( /*condition*/)
                .SingleOrDefault();
        }
    }
    Logger.Info("Record with ID " + record.Id + " found at " + DateTime.Now);
}
The usage of these methods is pretty straightforward: the calling code creates a bunch of objects, launches each object's WaitForRecord method using Task.Factory.StartNew, periodically checks whether any tasks have finished execution, and prints the results in the console like this:
MyClass1 is still waiting for record...
MyClass2 has found the record...
...
Assuming that you're connecting to a SQL Server (2005 or later) database, you could look into SqlDependency. Here is an article on CodeProject about using SqlDependency with Entity Framework:
http://www.codeproject.com/Articles/496484/SqlDependency-with-Entity-Framework
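If SqlDependency isn't an option, the waiting itself can at least avoid tying up a thread per waiter. A minimal sketch, assuming the database check is wrapped in a `Func<Task<bool>>` (the `RecordWaiter` class and `WaitUntilAsync` method are hypothetical names): it polls with `await Task.Delay(...)`, which releases the thread between checks instead of blocking it like `Thread.Sleep`, and it supports cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RecordWaiter
{
    // Hypothetical helper: calls recordExistsAsync every `interval` until it returns true.
    // Task.Delay frees the thread between checks; Thread.Sleep would keep it blocked.
    public static async Task WaitUntilAsync(
        Func<Task<bool>> recordExistsAsync,
        TimeSpan interval,
        CancellationToken cancellationToken = default)
    {
        while (!await recordExistsAsync().ConfigureAwait(false))
        {
            // Throws OperationCanceledException if the caller stops waiting
            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Each MyClassN could return this task from an async version of WaitForRecord, and the caller could then use `Task.WhenAny` on the collection of tasks instead of checking their status periodically. This still queries the database at every interval; only a notification mechanism such as SqlDependency avoids the polling itself.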
Related
What I'm trying to accomplish is a C# application that reads logs from the Windows Event Logs and stores them somewhere else. This has to be fast, since some of the devices where it will be installed generate a high number of log entries per second.
I have tried three approaches so far:
Local WMI: it didn't work well; there were too many errors and exceptions caused by the size of the collections that need to be loaded.
EventLogReader: I thought this was the perfect solution, since it lets you query the event log however you like using XPath expressions. The problem is that getting the message content for each log entry (by calling FormatDescription()) takes far too much time for large collections.
For example, I can read 12k log entries in 0.11 s if I just iterate over them.
If I add a line to store the message for each entry, exactly the same operation takes nearly 6 minutes, which is crazy for such a small number of entries.
I don't know whether EventLogReader can be optimized to get the message faster; I couldn't find anything in the Microsoft documentation or elsewhere on the Internet.
I also found that you can read the log entries using a class called EventLog. However, it doesn't let you specify any kind of filter, so you basically have to load the entire list of entries into memory and then filter it according to your needs.
Here's an example:
EventLog eventLog = EventLog.GetEventLogs().FirstOrDefault(el => el.Log.Equals("Security", StringComparison.OrdinalIgnoreCase));
var newEntries = (from entry in eventLog.Entries.OfType<EventLogEntry>()
orderby entry.TimeWritten ascending
where entry.TimeWritten > takefrom
select entry);
Despite being faster at getting the message, its memory usage might be high, and I don't want to cause any issues on the devices where this solution will be deployed.
Can anybody help me with this? I cannot find any workarounds or approaches to achieve something like this.
Thank you!
You can give the EventLogReader class a try. See https://learn.microsoft.com/en-us/previous-versions/bb671200(v=vs.90).
It is better than the EventLog class because the EventLog.Entries collection has the nasty property that its count can change while you are reading from it. Even worse, the reading happens on an I/O thread-pool thread, where a failure surfaces as an unhandled exception that crashes your application. At least that was the case some years ago.
The EventLogReader also gives you the ability to supply a query string to filter for the events you are interested in. That is the way to go if you write a new application.
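To illustrate the filtering, here is a sketch that counts recent events with a given event ID, letting the event log service evaluate an XPath query instead of filtering in managed code. The class and method names are hypothetical, the event ID and time window in the usage are arbitrary, the code is Windows-only, and on .NET Core / .NET 5+ it needs the System.Diagnostics.EventLog package.

```csharp
using System;
using System.Diagnostics.Eventing.Reader;

public static class FilteredEventRead
{
    // Counts events in `logName` with the given event ID written within the last `hours` hours.
    // The XPath filter is evaluated by the event log service, so non-matching events are never materialized.
    public static int CountRecent(string logName, int eventId, int hours)
    {
        long ms = (long)TimeSpan.FromHours(hours).TotalMilliseconds;
        string xpath = $"*[System[(EventID={eventId}) and TimeCreated[timediff(@SystemTime) <= {ms}]]]";
        var query = new EventLogQuery(logName, PathType.LogName, xpath);

        int count = 0;
        using (var reader = new EventLogReader(query))
        {
            EventRecord record;
            while ((record = reader.ReadEvent()) != null)
            {
                count++;
                record.Dispose(); // EventRecord is IDisposable; release it promptly
            }
        }
        return count;
    }
}
```

For example, `FilteredEventRead.CountRecent("Application", 1000, 24)` counts event ID 1000 entries from the last day.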
Here is an application which shows how you can parallelize reading:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Eventing.Reader;
using System.Linq;
using System.Threading.Tasks;

namespace EventLogReading
{
    class Program
    {
        static void ParseEventsParallel()
        {
            var sw = Stopwatch.StartNew();
            var query = new EventLogQuery("Application", PathType.LogName, "*");
            const int BatchSize = 100;

            // Every BatchSize-th event is handed to the workers; its bookmark marks the start of a batch.
            var batchStarts = new BlockingCollection<EventRecord>();

            var readerTask = Task.Run(() =>
            {
                try
                {
                    using (EventLogReader reader = new EventLogReader(query))
                    {
                        EventRecord ev;
                        int count = 0;
                        while ((ev = reader.ReadEvent()) != null)
                        {
                            if (count % BatchSize == 0)
                            {
                                batchStarts.Add(ev);
                            }
                            count++;
                        }
                    }
                }
                finally
                {
                    // Lets the workers' GetConsumingEnumerable() end once the queue is drained
                    batchStarts.CompleteAdding();
                }
            });

            var eventsWithStrings = new ConcurrentQueue<KeyValuePair<string, EventRecord>>();

            Action conversion = () =>
            {
                // Each worker opens its own reader, so the FormatDescription() calls don't share one lock
                using (var reader = new EventLogReader(query))
                {
                    // Blocks until a batch is available instead of spinning; ends after CompleteAdding()
                    foreach (EventRecord batchStart in batchStarts.GetConsumingEnumerable())
                    {
                        reader.Seek(batchStart.Bookmark);
                        for (int i = 0; i < BatchSize; i++)
                        {
                            EventRecord ev = reader.ReadEvent();
                            if (ev == null)
                            {
                                break;
                            }
                            eventsWithStrings.Enqueue(new KeyValuePair<string, EventRecord>(ev.FormatDescription(), ev));
                        }
                    }
                }
            };

            Parallel.Invoke(Enumerable.Repeat(conversion, 8).ToArray());
            readerTask.Wait(); // surfaces any exception thrown by the reader task
            sw.Stop();
            Console.WriteLine($"Got {eventsWithStrings.Count} events with strings in {sw.Elapsed.TotalMilliseconds:N3}ms");
        }

        static void ParseEvents()
        {
            var sw = Stopwatch.StartNew();
            List<KeyValuePair<string, EventRecord>> parsedEvents = new List<KeyValuePair<string, EventRecord>>();
            using (EventLogReader reader = new EventLogReader(new EventLogQuery("Application", PathType.LogName, "*")))
            {
                EventRecord ev;
                while ((ev = reader.ReadEvent()) != null)
                {
                    parsedEvents.Add(new KeyValuePair<string, EventRecord>(ev.FormatDescription(), ev));
                }
            }
            sw.Stop();
            Console.WriteLine($"Got {parsedEvents.Count} events with strings in {sw.Elapsed.TotalMilliseconds:N3}ms");
        }

        static void Main(string[] args)
        {
            ParseEvents();
            ParseEventsParallel();
        }
    }
}
Got 20322 events with strings in 19,320.047ms
Got 20323 events with strings in 5,327.064ms
This gives a speedup of nearly a factor of 4. I needed some tricks to get there because, for some strange reason, the class ProviderMetadataCachedInformation is not thread-safe and internally uses a lock(this) around the Format method, which defeats parallel reading.
The key trick is to open the event log again in each conversion thread and then read a batch of the query's events there via the event bookmark API. That way each thread can format its strings independently.
Update1
I have landed a change in .NET 5 which increases performance by a factor of 3 to 20. See https://github.com/dotnet/runtime/issues/34568.
You can also copy the EventLogReader class from .NET Core and use this one instead which will give you the same speedup.
The full saga is described in my blog post: https://aloiskraus.wordpress.com/2020/07/20/ms-performance-hud-analyze-eventlog-reading-performance-in-realtime/
As discussed in the comments, you can read the existing entries of the Security log like this:
var eventLog = new EventLog("Security");
for (int i = 0; i < eventLog.Entries.Count; i++)
{
Console.WriteLine($"{eventLog.Entries[i].Message}");
}
This might not be the cleanest (performance-wise) way of doing it, but I doubt any other will be faster, as you yourself have already found out by trying out different techniques.
A small edit in response to Alois's post: out of the box, EventLogReader is not faster than EventLog. Especially with the for loop shown in the code block above, I think EventLog is faster: it only accesses the entries inside the loop by index (the Entries collection is just a reference), whereas EventLogReader first performs a query and then loops through the result, which should be slower. As I commented on Alois's post: if you don't need the query option, just use the EventLog variant. If you do need querying, use EventLogReader, as it can query at a lower level than EventLog allows (with EventLog you only have LINQ queries, which are of course slower than filtering while the lookup itself executes).
To prevent you from having this hassle again in the future, and because you said you are running a service, I'd use the EntryWritten event of the EventLog class:
var eventLog = new EventLog("Security")
{
EnableRaisingEvents = true
};
eventLog.EntryWritten += EventLog_EntryWritten;
// .. read existing logs or do other work ..
private static void EventLog_EntryWritten(object sender, EntryWrittenEventArgs e)
{
Console.WriteLine($"received new entry: {e.Entry.Message}");
}
Note that you must set EnableRaisingEvents to true for the event to fire whenever a new entry is logged. It is also good practice (performance-wise, too) to hand the actual work off to, for example, a Task, so that the system doesn't block while calls to your event handler queue up.
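One way to keep the handler cheap is a producer/consumer hand-off: the handler only enqueues, and a single background task does the slow work. A sketch, where `BackgroundProcessor<T>` is a hypothetical helper built on BlockingCollection, not a framework class:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: Post() is cheap enough to call from an event handler,
// while one long-running consumer processes the items in arrival order.
public sealed class BackgroundProcessor<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Task _worker;

    public BackgroundProcessor(Action<T> process)
    {
        // LongRunning hints that the consumer should get a dedicated thread
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (T item in _queue.GetConsumingEnumerable())
            {
                process(item);
            }
        }, TaskCreationOptions.LongRunning);
    }

    public void Post(T item) => _queue.Add(item);

    public void Dispose()
    {
        _queue.CompleteAdding(); // the consumer drains the remaining items, then exits
        _worker.Wait();
        _queue.Dispose();
    }
}
```

With the event log, the handler would become something like `eventLog.EntryWritten += (s, e) => processor.Post(e.Entry);`, and the storage logic lives in the `Action<EventLogEntry>` passed to the constructor.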
This approach works fine if you want to retrieve all newly created events. If you want to retrieve newly created events but filter them with a query, you can check out the EventLogWatcher class. In your case, since there are no constraints, I'd just use the EntryWritten event: you don't need filters, and it's plain and simple.
I am having the following problem with RethinkDB: the RunChangesAsync method runs once, and when called, it starts listening for changes on a given query. When the query changes, you are given a Cursor<Change<Class>>, which is a delta between the initial state and the actual state.
My question is how can I make this run continuously?
If I use:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
If changes happen where I pointed in the code, they are not caught by RunChangesAsync. The only changes caught are those that happen while RunChangesAsync is listening, not before it starts or after it retrieves the results.
So I tried wrapping RunChangesAsync in an observable, but it does not listen continuously for changes as I expected: it just retrieves 2 null items (garbage, I suppose) and ends.
Observable
public IObservable<Cursor<Change<UserStatus?>>> GetObservable() =>
r.Db(Constants.DB_NAME).Table(Constants.CLIENT_TABLE).RunChangesAsync<UserStatus?>(this.con,CancellationToken.None).ToObservable();
Observer
class PlayerSubscriber : IObserver<Cursor<Change<UserStatus?>>>
{
public void OnCompleted() => Console.WriteLine("Finished");
public void OnError(Exception error) => Console.WriteLine("error");
public void OnNext(Cursor<Change<UserStatus?>> value)
{
foreach (var item in value.BufferedItems)
Console.WriteLine(item);
}
}
Program
class Program
{
public static RethinkDB r = RethinkDB.R;
public static bool End = false;
static async Task Main(string[] args)
{
var address = new Address { Host = "127.0.0.1", Port = 28015 };
var con = await r.Connection().Hostname(address.Host).Port(address.Port).ConnectAsync();
var database = new Database(r, con);
var obs = database.GetObservable();
var sub = new PlayerSubscriber();
var disp = obs.Subscribe(sub);
Console.ReadKey();
Console.WriteLine("Hello World!");
}
}
As you can see when debugging, the OnNext method of the observer is executed only once (returning two null objects) and then it closes.
P.S.: Database is just a wrapper around RethinkDB queries. The only method used is GetObservable, which I posted above. UserStatus is a POCO.
When creating a change feed, you only need to create one change feed object: the Cursor<Change<T>> you get back from running .RunChangesAsync() is really all you need.
The cursor object you get back from query.RunChangesAsync() is your change feed object that you will use for the entire lifetime you want to receive changes.
In your example:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
Having .RunChangesAsync(); in a while loop is not the correct approach. You don't need to re-run the query again and get another Cursor<Change<T>>. I'll explain how this works at the end of this post.
Also, do not use cursor.BufferedItems on the cursor object. The cursor.BufferedItems property is not meant to be consumed by your code directly; it is only exposed for those special situations where you want to "peek ahead" inside the cursor object (client-side) at items that are ready to be consumed for your change feed query.
The proper way to consume items in your change feed is to enumerate over the cursor object itself as shown below:
var cursor = await query.RunChangesAsync(conn);
foreach (var item in cursor){
Console.WriteLine(item);
}
When the cursor runs out of items, it makes a request to the RethinkDB server for more items. Keep in mind that each iteration of the foreach loop can potentially be a blocking call. For example, the foreach loop can block indefinitely when 1) there are no items on the client side to be consumed (.BufferedItems.Count == 0) and 2) no items have changed on the server side according to your change feed query criteria. Under these circumstances, the foreach loop blocks until the RethinkDB server sends you an item that is ready to be consumed.
Documentation about using Reactive Extensions and RethinkDB in C#
There is a driver unit test that shows how .NET Reactive Extensions can work here.
Specifically, Lines 31 - 47 in this unit test set up a change feed with Reactive Extensions:
var changes = R.Db(DbName).Table(TableName)
//.changes()[new {include_states = true, include_initial = true}]
.Changes()
.RunChanges<JObject>(conn);
changes.IsFeed.Should().BeTrue();
var observable = changes.ToObservable();
//use a new thread if you want to continue,
//otherwise, subscription will block.
observable.SubscribeOn(NewThreadScheduler.Default)
.Subscribe(
x => OnNext(x),
e => OnError(e),
() => OnCompleted()
);
Additionally, here is a good example and explanation of what happens and how to consume a change feed with C#:
Hope that helps.
Thanks,
Brian
If you have an operation with the signature Task<int> ReadAsync(), then the way to set up polling is like this:
IObservable<int> PollRead(TimeSpan interval)
{
return
Observable
.Interval(interval)
.SelectMany(n => Observable.FromAsync(() => ReadAsync()));
}
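For comparison, the same polling idea can be written without Rx as an async stream (C# 8 / .NET Core 3.0 or later). A sketch, with `ReadAsync` passed in as a delegate; the `Polling` class name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class Polling
{
    // Waits `interval`, then yields the result of readAsync, forever (until cancelled).
    public static async IAsyncEnumerable<int> PollReadAsync(
        Func<Task<int>> readAsync,
        TimeSpan interval,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        while (true)
        {
            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
            yield return await readAsync().ConfigureAwait(false);
        }
    }
}
```

Unlike `Observable.Interval(...).SelectMany(...)`, this version never overlaps reads: the next delay starts only after the previous read has completed and the consumer has asked for the next value (for example with `await foreach`).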
I'd also caution against creating your own implementation of IObserver<T>; it's fraught with danger. If you need an observer that you hand around, use Observer.Create(...). Generally you don't even need that, since Subscribe accepts the OnNext/OnError/OnCompleted delegates directly.
I'm debugging an existing Windows service (written in C#) that needs to be restarted manually every few months because it keeps eating memory.
The service is not very complicated. It requests a JSON file containing products from an external server.
Next, it parses this JSON file into a list of products.
For each of these products, it checks whether the product already exists in the database. If not, it is added; if it does exist, its properties are updated.
The database is a PostgreSQL database and we use NHibernate v3.2.0 as ORM.
I've been using JetBrains DotMemory to profile the service when it runs:
The service starts, and after 30 s it starts doing its work. Snapshot #1 was taken before the first run.
Snapshot #6 was taken after the 5th run.
The other snapshots were also taken after a run.
As you can see, after each run the number of objects increases by approx. 60k and the memory used increases by a few MB.
A closer look at Snapshot #6 shows that the retained size is mostly taken up by NHibernate session objects:
Here's my OnStart code:
try
{
// Trying to fix certificate errors:
ServicePointManager.ServerCertificateValidationCallback += delegate
{
_logger.Debug("Cert validation work around");
return true;
};
_timer = new Timer(_interval)
{
AutoReset = false // makes it fire only once, restart when work is done to prevent multiple runs
};
_timer.Elapsed += DoServiceWork;
_timer.Start();
}
catch (Exception ex)
{
_logger.Error("Exception in OnStart: " + ex.Message, ex);
}
And my DoServiceWork:
try
{
// Call execute
var processor = new SAPProductProcessor();
processor.Execute();
}
catch (Exception ex)
{
_logger.Error("Error in DoServiceWork", ex);
}
finally
{
// Next round:
_timer.Start();
}
In SAPProductProcessor I use two database calls, both inside a loop.
I loop through all products from the JSON file and check whether each product is already in the table using its product code:
ProductDto dto;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction(IsolationLevel.ReadCommitted))
{
var criteria = session.CreateCriteria<ProductDto>();
criteria.Add(Restrictions.Eq("Code", code));
dto = criteria.UniqueResult<ProductDto>();
transaction.Commit();
}
}
return dto;
And when the productDto is updated I save it using:
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction(IsolationLevel.ReadCommitted))
{
session.SaveOrUpdate(item);
transaction.Commit();
}
}
I'm not sure how to change the code above to stop increasing the memory and the number of object.
I already tried using var session = SessionFactory.GetCurrentSession(); instead of using (var session = SessionFactory.OpenSession()) but that didn't stop the increase of memory.
Update
A MultiSessionFactoryProvider sessionFactoryProvider is injected into the constructor of my data access class, and the base class is called with : base(sessionFactoryProvider.GetFactory("data")). This base class has a method BeginSession:
ISession session = _sessionFactory.GetCurrentSession();
if (session == null)
{
session = _sessionFactory.OpenSession();
ThreadLocalSessionContext.Bind(session);
}
And an EndSession method:
ISession session = ThreadLocalSessionContext.Unbind(_sessionFactory);
if (session != null)
{
session.Close();
}
In my data access class I call base.BeginSession at the start and base.EndSession at the end.
The suggestion about the Singleton made me have a closer look at my data access class.
I thought that creating this class on every run would free the NHibernate memory once it went out of scope. I even added some Dispose calls in the class's destructor, but that didn't work, or more likely I wasn't doing it correctly.
I now keep my data access class in a static field and reuse it. Now my memory no longer increases and, more importantly, the number of objects stays the same. I just ran the service under DotMemory again for over an hour, calling the run around 150 times: the memory in the last snapshot is still around 105 MB, the number of objects is still 117k, and my SessionFactory dictionary is now just 4 MB instead of 150 × 4 MB.
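The fix boils down to building the expensive factory once and sharing it, while sessions stay short-lived (an NHibernate ISessionFactory is meant to be created once per application and shared across threads, whereas an ISession is cheap). A minimal sketch of that pattern with `Lazy<T>`; `ExpensiveFactory` and `FactoryHolder` are hypothetical stand-ins so the example runs without NHibernate:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an expensive, thread-safe object such as a session factory.
public sealed class ExpensiveFactory
{
    private static int _instancesCreated;
    public static int InstancesCreated => Volatile.Read(ref _instancesCreated);

    public ExpensiveFactory()
    {
        // The real thing would load mappings and configuration here, which allocates a lot
        Interlocked.Increment(ref _instancesCreated);
    }
}

public static class FactoryHolder
{
    // Built on first use, then shared for the lifetime of the process.
    // ExecutionAndPublication guarantees a single construction even under contention.
    private static readonly Lazy<ExpensiveFactory> _factory =
        new Lazy<ExpensiveFactory>(() => new ExpensiveFactory(), LazyThreadSafetyMode.ExecutionAndPublication);

    public static ExpensiveFactory Instance => _factory.Value;
}
```

Each run would then open and close its sessions from `FactoryHolder.Instance` instead of building a new factory.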
According to this MSDN article, you should be able to multithread a process with each thread enlisted in a single root transaction.
I created a sample based on that article in which I expect a transaction to be rolled back (the bool[] results should all be false in the foreach loop). Unfortunately, this is not the case, and the outcome is predictably unpredictable (run the example enough times and you will see every combination of bool values in the array).
In addition, I've tried both DependentCloneOption.BlockCommitUntilComplete and DependentCloneOption.RollbackIfNotComplete neither of which produce the expected result.
Secondly, I think ThreadPool.QueueUserWorkItem is ugly code at best, and it would be nice to see something like this using Parallel.ForEach instead.
And finally, my question :) Why the heck is this not working? What am I doing wrong? Is it just flat-out impossible to wrap multiple threads in a single transaction?
using System;
using System.Threading;
using System.Transactions;

namespace Playing
{
class Program
{
static bool[] results = new bool[] { false, false, false };
static void Main(string[] args)
{
try
{
using (var outer = new TransactionScope(
TransactionScopeOption.Required))
{
for (var i = 0; i < 3; i++ )
{
ThreadPool.QueueUserWorkItem(WorkerItem,
new Tuple<int, object>(
i, Transaction.Current.DependentClone(
DependentCloneOption.BlockCommitUntilComplete)));
}
outer.Complete();
}
}
catch { /* Suppress exceptions */ }
// Expect all to be false
foreach (var r in results)
Console.WriteLine(r);
}
private static void WorkerItem(object state)
{
var tup = (Tuple<int, object>)state;
var i = tup.Item1;
var dependent = (DependentTransaction)tup.Item2;
using (var inner = new TransactionScope(dependent))
{
// Intentionally throw exception to force roll-back
if (i == 2)
throw new Exception();
results[i] = true;
inner.Complete();
}
dependent.Complete();
}
}
}
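Your results[] members that have been set to true won't magically set themselves back to false (sadly). Undoing changes is what transaction managers do for the resources enlisted with them, and a plain array isn't enlisted. Look at the Transaction.EnlistXXX methods (such as EnlistVolatile and EnlistDurable) to get an idea of what's involved.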
Basically, you'll need to compensate in the event of a rollback. For example, you could subscribe to the root Transaction's TransactionCompleted event and check if the transaction was rolled back. If it was you'll need to restore the previous values for the child workers that completed.
You can also handle the TransactionAbortedException that is thrown and that you are currently suppressing, or handle it at the worker level (see an example of catching it on this page: http://msdn.microsoft.com/en-us/library/ms973865.aspx).
Typically, with in-memory "transactions" you are better off using the Task library to have the workers batch up results and then "commit" them in a continuation of a parent Task. It's easier than messing about with Transactions, which you only need to do if you are coordinating between memory and some other Transaction Manager (like SQL Server or other processes).
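A sketch of that batch-then-commit idea applied to the question's results array: each worker returns its value instead of writing to shared state, and the values are published only if every worker succeeded. `BatchCommit`, `Work` and `failingIndex` are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class BatchCommit
{
    // Hypothetical worker: computes its value without touching shared state.
    // The worker at `failingIndex` throws, mimicking the question's forced failure.
    static bool Work(int i, int failingIndex)
    {
        if (i == failingIndex)
        {
            throw new InvalidOperationException("worker " + i + " failed");
        }
        return true;
    }

    // Runs one worker per slot of `results` and writes to `results` only if all of them succeed.
    public static async Task<bool> RunAsync(bool[] results, int failingIndex)
    {
        Task<bool>[] workers = Enumerable.Range(0, results.Length)
            .Select(i => Task.Run(() => Work(i, failingIndex)))
            .ToArray();
        try
        {
            // The continuation after WhenAll is the "commit" point
            bool[] staged = await Task.WhenAll(workers).ConfigureAwait(false);
            Array.Copy(staged, results, staged.Length);
            return true;
        }
        catch (InvalidOperationException)
        {
            // "Rollback": nothing to undo, because shared state was never modified
            return false;
        }
    }
}
```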
I'm dealing with a curious scenario.
I'm using Entity Framework to save (insert/update) into a SQL database in a multithreaded environment. The problem is that I need to query the database to see whether a record with a particular key has already been created, in which case I set a field to one value (executing), or whether it's new, in which case I set it to a different value (pending). These records are identified by a unique GUID.
I've solved this problem with a lock, since I know an entity will not be present in any other process; in other words, I will never have the same GUID in different processes, and it seems to be working fine. It looks something like this:
static readonly object LockableObject = new object();

static void SaveElement(Entity e)
{
    lock (LockableObject)
    {
        Entity e2 = Repository.FindByKey(e);
        if (e2 == null)
        {
            // No record with this key yet: insert the new entity
            Repository.Insert(e);
        }
        else
        {
            // A record already exists: update it
            Repository.Update(e2);
        }
    }
}
But this means that when I have a huge number of requests to save, they will all be queued behind the one lock.
I wonder if there is something like that (please, take it just as an idea):
static void SaveElement(Entity e)
{
    // ThisWouldBeAClassToProtectBasedOnACondition is hypothetical: a lock scoped to e.UniqueId
    using (var protector = new ThisWouldBeAClassToProtectBasedOnACondition(e, x => x.UniqueId))
    {
        Entity e2 = Repository.FindByKey(e);
        if (e2 == null)
        {
            Repository.Insert(e);
        }
        else
        {
            Repository.Update(e2);
        }
    }
}
The idea is a kind of protection scoped by a condition, so that each entity e gets its own lock based on its e.UniqueId property.
Any idea?
Don't use application-locks where database transactions or constraints are needed.
The use of a lock to prevent duplicate entries in a database is not a good idea. It limits the scalability of your application by forcing only a single instance to ever exist that can add or update such records. Or worse, someone will eventually try to scale the application to multiple processes or servers, and it will cause data corruption (since locks are local to a single process).
What you should consider instead is using a combination of unique constraints in the database and transactions to ensure that no two attempts to add the same entry can both succeed. One will succeed - the other will be forced to rollback.
This might work for you: you can just lock on the instance of e:
lock (e)
{
    Entity e2 = Repository.FindByKey(e);
    if (e2 == null)
    {
        Repository.Insert(e);
    }
    else
    {
        Repository.Update(e2);
    }
}
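Note that lock(e) only serializes callers that share the same Entity instance; two requests that build separate Entity objects with the same UniqueId will not block each other. If you do want a per-key lock within a single process (the answer above explains why a database constraint is the more robust choice), a minimal sketch keyed on the ID could look like this; `KeyedLock<TKey>` is a hypothetical helper, not a framework class:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: one lock object per key, so callers with different keys never wait on each other.
public sealed class KeyedLock<TKey> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, object> _locks = new ConcurrentDictionary<TKey, object>();

    public void Run(TKey key, Action action)
    {
        // GetOrAdd may invoke the factory more than once under contention,
        // but every caller receives the same stored object for a given key.
        object gate = _locks.GetOrAdd(key, _ => new object());
        lock (gate)
        {
            action();
        }
    }
}
```

Usage would be `keyedLock.Run(e.UniqueId, () => { /* find, then insert or update */ });`. The dictionary never removes entries, so it grows with the number of distinct keys; a long-running service would need some eviction strategy.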