Do you know a Bulked/Batched Flows Library for C# - c#

I am working on a project with peek performance requirements, so we need to bulk (batch?) several operations (for example persisting the data to a database) for efficiency.
However, I want our code to maintain an easy to understand flow, like:
input = Read();
parsed = Parse(input);
if (parsed.Count > 10)
{
status = Persist(parsed);
ReportSuccess(status);
return;
}
ReportFailure();
The feature I'm looking for here is automatically have Persist() happen in bulks (and ergo asynchronously), but behave to its user as if it's synchronous (user should block until the bulk action completes). I want the implementor to be able to implement Persist(ICollection).
I looked into flow-based programming, with which I am not highly familiar. I saw one library for fbp in C# here, and played a bit with Microsoft's Workflow Foundation, but my impression is that both are overkill for what I need. What would you use to implement a bulked flow behavior?
Note that I would like to get code that is exactly like what I wrote (simple to understand & debug), so solutions that involve yield or configuration in order to connect flows to one another are inadequate for my purpose. Also, chaining
is not what I'm looking for - I don't want to first build a chain and then run it, I want code that looks as if it is a simple flow ("Do A, Do B, if C then do D").

Common problem - instead of calling Persist I usually load up commands (or smt along those lines) into a Persistor class then after the loop is finished I call Persistor.Persist to persist the batch.
Just a few pointers - If you're generating sql the commands you add to the persistor can represent your queries somehow (with built-in objects, custom objects or just query strings). If you're calling stored procedures you can use the commands to append stuff to a piece of xml tha will be passed down to the SP when you call the persist method.
hope it helps - Pretty sure there's a pattern for this but dunno the name :)

I don't know if this is what you need, because it's sqlserver based, but have you tried taking a look to SSIS and or DTS?

One simple thing that you can do is to create a MemoryBuffer where you push the messages which simply add them to a list and returns. This MemoryBuffer has a System.Timers.Timer which gets invoked periodically and do the "actual" updates.
One such implementation can be found in a Syslog Server (C#) at http://www.fantail.net.nz/wordpress/?p=5 in which the syslog messages gets logged to a SQL Server periodically in a batch.
This approach might not be good if the info being pushed to database is important, as if something goes wrong, you will lose the messages in MemoryBuffer.

How about using the BackgroundWorker class to persist each item asynchronously on a separate thread? For example:
using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading;
class PersistenceManager
{
public void Persist(ICollection persistable)
{
// initialize a list of background workers
var backgroundWorkers = new List<BackgroundWorker>();
// launch each persistable item in a background worker on a separate thread
foreach (var persistableItem in persistable)
{
var worker = new BackgroundWorker();
worker.DoWork += new DoWorkEventHandler(worker_DoWork);
backgroundWorkers.Add(worker);
worker.RunWorkerAsync(persistableItem);
}
// wait for all the workers to finish
while (true)
{
// sleep a little bit to give the workers a chance to finish
Thread.Sleep(100);
// continue looping until all workers are done processing
if (backgroundWorkers.Exists(w => w.IsBusy)) continue;
break;
}
// dispose all the workers
foreach (var w in backgroundWorkers) w.Dispose();
}
void worker_DoWork(object sender, DoWorkEventArgs e)
{
var persistableItem = e.Argument;
// TODO: add logic here to save the persistableItem to the database
}
}

Related

Do I have to call Application.ExitThread()?

using System.Windows.Forms;
public class App
{
[STAThread]
public static void Main()
{
string fname;
using (var d = new OpenFileDialog())
{
if (d.ShowDialog() != DialogResult.OK)
{
return;
}
fname = d.FileName;
}
//Application.ExitThread();
for (; ;)
;
}
}
The above code shows me a file dialog. Once I select a file and press open, the for loop is executed, but the (frozen) dialog remains.
Once I uncomment Application.ExitThread() the dialog disappears as expected.
Does that work as intended? Why doesn't using make the window disappear? Where can I find more info about this?
You have discovered the primary problem with single-threaded applications... long running operations freeze the user interface.
Your DoEvents() call essentially "pauses" your code and gives other operations, like the UI, a chance to run, then resumes. The problem is that your UI is now frozen again until you call DoEvents() again. Actually, DoEvents() is a very problematic approach (some call it evil). You really should not use it.
You have better options.
Putting your long running operation in another thread helps to ensure that the UI remains responsive and that your work is done as efficiently as possible. The processor is able to switch back and forth between the two threads to give the illusion of simultaneous execution without the difficulty of full-blown multi-processes.
One of the easier ways to accomplish this is to use a BackgroundWorker, though they have generally fallen out of favor (for reasons I'm not going to get into in this post: further reading). They are still part of .NET however and have a lower learning curve then other approaches, so I'd still suggest that new developers play around with them in hobby projects.
The best approach currently is .NET's Tasks library. If your long running operation is already in a thread (for example, it's a database query and you are just waiting for it to complete), and if the library supports it, then you could take advantage of Tasks using the async keyword and not have to think twice about it. Even if it's not already in a thread or in a supported library, you could still spin up a new Task and have it executed in a separate Thread via Task.Run(). .NET Tasks have the advantage of baked in language support and a lot more, like coordinating multiple Tasks and chaining Tasks together.
JDB already explained in his answer why (generally speaking) your code doesn't work as expected. Let me add a small bit to suggest a workaround (for your specific case and for when you just need to use a system dialog and then go on like it was a console application).
You're trying to use Application.DoEvents(), OK it seems to work and in your case you do not have re-entrant code. However are you sure that all relevant messages are correctly processed? How many times you should call Application.DoEvents()? Are you sure you correctly initialize everything (I'm talking about the ApplicationContext)? Second problem is more pragmatic, OpenFileDialog needs COM, COM (here) needs STAThread, STAThread needs a message pump. I can't tell you in which way it will fail but for sure it may fail.
First of all note that usually applications start main message loop using Application.Run(). You don't expect to see new MyWindow().ShowDialog(), right? Your example is not different, let Application.Run(Form) overload creates the ApplicationContext for you (and handle HandleDestroyed event when form closes which will finally call - surprise - Application.ExitThread()). Unfortunately OpenFileDialog does not inherit from Form then you have to host it inside a dummy form to use Application.Run().
You do not need to explicitly call dlg.Dispose() (let WinForms manage objects lifetime) if you add the dialog inside the form with the designer.
using System;
using System.Windows.Forms;
public class App
{
[STAThread]
public static void Main()
{
string fname = AskForFile();
if (fname == null)
return;
LongRunningProcess(fname);
}
private static string AskForFile()
{
string fileName = null;
var form = new Form() { Visible = false };
form.Load += (o, e) => {
using (var dlg = new OpenFileDialog())
{
if (dlg.ShowDialog() == DialogResult.OK)
fileName = dlg.FileName;
}
((Form)o).Close();
};
Application.Run(form);
return fileName;
}
}
No, you don't have to call Application.ExitThread().
Application.ExitThread() terminates the calling thread's message loop and forces the destruction of the frozen dialog. Although "that works", it's better to unfreeze the dialog if the cause of the freeze is known.
In this case pressing open seems to fire a close-event which doesn't have any chance to finish. Application.DoEvents() gives it that chance and makes the dialog disappear.

How to stop SSAS cube processing programatically?

We are using Microsoft.AnalysisServices library to access our SSAS database (on SQL Server 2014) and programatically process cubes.
The code looks like this:
using (var server = new Server())
{
server.Connect("someserver");
if (server.Connected)
{
var db = server.Databases.FindByName("somedb");
if (db != null)
{
db.Process(ProcessType.ProcessFull);
}
}
}
The problem is, full cube processing may take a long time (in our case over 1h). And we need a way to gracefully stop/cancel it if necessary (the code above is a part of complex Windows Service that sometimes needs to be restarted).
It is completely acceptable that interrupting the task results in cube not being processed.
Is there a way to write the code above so it is non-blocking? Or pass a callback somehow? I could not find anything relevant in MSDN documentation:
https://msdn.microsoft.com/en-us/library/microsoft.analysisservices.database.aspx
Can you try getting rid of the using block and making your serverobject a class level member. Then try this on the cancel thread:
server.CancelSession(server.SessionID);

Multiple users writing at the same file

I have a project which is a Web API project, my project is accessed by multiple users (i mean a really-really lot of users). When my project being accessed from frontend (web page using HTML 5), and user doing something like updating or retrieving data, the backend app (web API) will write a single log file (a .log file but the content is JSON).
The problem is, when being accessed by multiple users, the frontend became unresponsive (always loading). The problem is in writing process of the log file (single log file being accessed by a really-really lot of users). I heard that using a multi threading technique can solve the problem, but i don't know which method. So, maybe anyone can help me please.
Here is my code (sorry if typo, i use my smartphone and mobile version of stack overflow):
public static void JsonInputLogging<T>(T m, string methodName)
{
MemoryStream ms = new MemoryStream();
DataContractJsonSerializer ser = new
DataContractJsonSerializer(typeof(T));
ser.WriteObject(ms, m);
string jsonString = Encoding.UTF8.GetString(ms.ToArray());
ms.Close();
logging("MethodName: " + methodName + Environment.NewLine + jsonString.ToString());
}
public static void logging (string message)
{
string pathLogFile = "D:\jsoninput.log";
FileInfo jsonInputFile = new FileInfo(pathLogFile);
if (File.Exists(jsonInputFile.ToString()))
{
long fileLength = jsonInputFile.Length;
if (fileLength > 1000000)
{
File.Move(pathLogFile, pathLogFile.Replace(*some new path*);
}
}
File.AppendAllText(pathLogFile, *some text*);
}
You have to understand some internals here first. For each [x] users, ASP.Net will use a single worker process. One worker process holds multiple threads. If you're using multiple instances on the cloud, it's even worse because then you also have multiple server instances (I assume this ain't the case).
A few problems here:
You have multiple users and therefore multiple threads.
Multiple threads can deadlock each other writing the files.
You have multiple appdomains and therefore multiple processes.
Multiple processes can lock out each other
Opening and locking files
File.Open has a few flags for locking. You can basically lock files exclusively per process, which is a good idea in this case. A two-step approach with Exists and Open won't help, because in between another worker process might do something. Bascially the idea is to call Open with write-exclusive access and if it fails, try again with another filename.
This basically solves the issue with multiple processes.
Writing from multiple threads
File access is single threaded. Instead of writing your stuff to a file, you might want to use a separate thread to do the file access, and multiple threads that tell the thing to write.
If you have more log requests than you can handle, you're in the wrong zone either way. In that case, the best way to handle it for logging IMO is to simply drop the data. In other words, make the logger somewhat lossy to make life better for your users. You can use the queue for that as well.
I usually use a ConcurrentQueue for this and a separate thread that works away all the logged data.
This is basically how to do this:
// Starts the worker thread that gets rid of the queue:
internal void Start()
{
loggingWorker = new Thread(LogHandler)
{
Name = "Logging worker thread",
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};
loggingWorker.Start();
}
We also need something to do the actual work and some variables that are shared:
private Thread loggingWorker = null;
private int loggingWorkerState = 0;
private ManualResetEventSlim waiter = new ManualResetEventSlim();
private ConcurrentQueue<Tuple<LogMessageHandler, string>> queue =
new ConcurrentQueue<Tuple<LogMessageHandler, string>>();
private void LogHandler(object o)
{
Interlocked.Exchange(ref loggingWorkerState, 1);
while (Interlocked.CompareExchange(ref loggingWorkerState, 1, 1) == 1)
{
waiter.Wait(TimeSpan.FromSeconds(10.0));
waiter.Reset();
Tuple<LogMessageHandler, string> item;
while (queue.TryDequeue(out item))
{
writeToFile(item.Item1, item.Item2);
}
}
}
Basically this code enables you to work away all the items from a single thread using a queue that's shared across threads. Note that ConcurrentQueue doesn't use locks for TryDequeue, so clients won't feel any pain because of this.
Last thing that's needed is to add stuff to the queue. That's the easy part:
public void Add(LogMessageHandler l, string msg)
{
if (queue.Count < MaxLogQueueSize)
{
queue.Enqueue(new Tuple<LogMessageHandler, string>(l, msg));
waiter.Set();
}
}
This code will be called from multiple threads. It's not 100% correct because Count and Enqueue don't necessarily have to be called in a consistent way - but for our intents and purposes it's good enough. It also doesn't lock in the Enqueue and the waiter will ensure that the stuff is removed by the other thread.
Wrap all this in a singleton pattern, add some more logic to it, and your problem should be solved.
That can be problematic, since every client request handled by new thread by default anyway. You need some "root" object that is known across the project (don't think you can achieve this in static class), so you can lock on it before you access the log file. However, note that it will basically serialize the requests, and probably will have a very bad effect on performance.
No multi-threading does not solve your problem. How are multiple threads supposed to write to the same file at the same time? You would need to care about data consistency and I don't think that's the actual problem here.
What you search is asynchronous programming. The reason your GUI becomes unresponsive is, that it waits for the tasks to complete. If you know, the logger is your bottleneck then use async to your advantage. Fire the log method and forget about the outcome, just write the file.
Actually I don't really think your logger is the problem. Are you sure there is no other logic which blocks you?

What is a best approach for multithreading on SerialPort

as I am new in multithreaded application I would like to have some advice from more experienced people before starting to write the code...
I need to queue data received on serial port in serial port event for further processing.
So I have the following event handler:
void jmPort_ReceivedEvent(object source, SerialEventArgs e)
{
SetStatusLabel("Iddle...", lbStatus);
SetPicVisibility(ledNotReceiving, true);
SetPicVisibility(ledReceiving, false);
String st = jmPort.ReadLine();
if (st != null)
{
lines.Enqueue(st); //"lines" is the ConcurrentQueue<string> object
StartDataProcessing(lines); //???
SetStatusLabel("Receiving data...", lbStatus);
SetPicVisibility(ledNotReceiving, false);
SetPicVisibility(ledReceiving, true);
}
else
{
jmPort.Close();
jmPort.Open();
}
}
Within the StartDataProcessing I need to dequeue strings and update MANY UI controlls (using the InvokeRequired...this I already know :-)).
What is the best approach and colision free (without deadlock) approach to achieve this?
How to call StartDataProcessing method in more threads and safely dequeue (TryDequeue) the lines queue, make all needed computations and update UI controlls?
I have to appoint that the communication is very fast and that I am not using the standard SerialPort class. If I simply write all received strings without further processing to console window it works just well.
I am working in .NET 4.5.
Thank you for any advice...
Updated question: Ok, so what will be the best way to run the task from the datareceived event using TPL? Is it necessary to create another class (object) that will process data and use callbacks to update UI or it is possible to load some form method from the event? I'll could be very happy if someone can give me the direction what exactly to do within the datareceived event. What to do as the first step because studying all possible ways is not the solution I have time for. I need to begin with some particular way... There is so many different possible multithreading approaches and after reading about them I am still more confused and I don't know what will be the best a fastest solution... Usual Thread(s), BackgroundWorker, TPL, async-await...? :-( Because my application uses .NET 4.5 I would like to use some state-of-the-art solution :-) Thank you for any advice...
So after a lot of trying it is working to my satisfaction now.
Finally I've used the standard .NET SerialPort class as the third-party Serial class causes somae problems with higher baudrates (115200). It uses WinAPI directly so the finall code was mixed - managed and unmanaged. Now, even the standard .NET 4.5 SerialPort class works well (I've let my application successfully running through a whole night).
So, for everyone that need to deal with C#, SerialPort and higher rates (only for clarification - the device sending messages to PC is the STM32F407 /using USART 2/. I've tried it also with Arduino Due and it works as well) my datareceived event is in the following form now:
private void serialPort1_DataReceived(object sender, System.IO.Ports.SerialDataReceivedEventArgs e)
{
//the SetXXXXX functions are using the .InvokeRequired approach
//because the UI components are updated from another thread than
//the thread they were created in
SetStatusLabel("Iddle...", lbStatus);
SetPicVisibility(Form1.frm.ledNotReceiving, true);
SetPicVisibility(Form1.frm.ledReceiving, false);
String st = serialPort1.ReadLine();
if (st != null)
{
lines.Enqueue(st);
Task.Factory.StartNew(() => StartDataProcessing(lines)); // lines is global ConcurrentQueue object so in fact there is no need to pass it as parameter
SetStatusLabel("Receiving data...", lbStatus);
SetPicVisibility(Form1.frm.ledNotReceiving, false);
SetPicVisibility(Form1.frm.ledReceiving, true);
}
}
Within the StartDataProcessing function:
1. TryDequeue(lines, out str)
2. Use the ThreadPool.QueueUserWorkItem(lCallBack1, tmp); where tmp is needed part of the str (without EOF, without the message number etc.)
lCallBack1 = new WaitCallback(DisplayData);
Within the DisplayData function all the UI controls are updated
This approach mixes the ThreadPool and TPL ways but it is not a problem because the ThreadPool is used by TPL in background operation anyway.
Another working method I've tried was the following:
ThreadPool.QueueUserWorkItem(lCallBack, lines);
instead of :
Task.Factory.StartNew(() => StartDataProcessing(lines));
This method was working well but I've not tested it in over night run.
By my subjective perception the Task.... method updated the controls more smoothly but it can be only my personal feeling :-)
So, I hope this answer will help someone as I know from forums that many people are dealing with with unreliable communication based on the micocontroller <--> PC
My (surprising :-) ) conclusion is that the standard .NET SerialPort is able to handle messages even at higher baudrates. If you still run into troubles with buffer overrun then try to play with the SerialPort buffer size and SerialPort threshold. For me the settings 1024/500 are satisfactory (max size of the message send by microcontroller is 255 bytes so 500 bytes means that 2 messages are in buffer before the event is fired.)
You can also remove all SetXXXX calls from the datareceived event as they are not really needed and they can slow down the communication a little...
I am very close to real-time data capturing now and it is exactly what I've needed.
Good luck to everyone :-)
Within the StartDataProcessing I need to dequeue strings and update MANY UI controlls
No, you do not. You need to dequeue strings and then enqueue them again into the multiple queues for the different segments of the UI.
If you want to be fast, you scatter all operations and definitely the UI into separate windows that run their own separate message pumps and thus can update independently in separate UI threads.
The general process would be:
1 thread handles the serial port and takes the data and queues it.
Another one dequeues it and distributes it to separate processing threads from which
the data goes to multiple output queues all responsible for one part of the UI (depending on whether the UI Will turn a bottleneck).
There is no need to be thread safe in dequeuing. How serial is the data? Can you skip data when another update for the same piece arrives?
Read up on TPL and tasks - there are base libraries for parallel processing which come with a ton of documentation.

Good advices to use EF in a multithread program?

Have you got some good advices to use EF in a multithread program ?
I have 2 layers :
a EF layer to read/write into my database
a multithread service which uses my entities (read/write) and makes some computations (I use Task Parallel Library in the framework)
How can I synchronize my object contexts in each thread ?
Do you know a good pattern to make it work ?
Good advice is - just don't :-) EF barely manages to survive one thread - the nature of the beast.
If you absolutely have to use it, make the lightest DTO-s, close OC as soon as you have the data, repack data, spawn your threads just to do calculations and nothing else, wait till they are done, then create another OC and dump data back into DB, reconcile it etc.
If another "main" thread (the one that spawns N calculation threads via TPL) needs to know when some ther thread is done fire event, just set a flag in the other thread and then let it's code check the flag in it's loop and react by creating new OC and then reconciling data if it has to.
If your situation is more simple you can adapt this - the key is that you can only set a flag and let another thread react when it's ready. That means that it's in a stable state, has finished a round of whatever it was doing and can do things without risking race conditions. Reset the flag (an int) with interchaged operations and keep some timing data to make sure that your threads don't react again within some time T - otherwire they can spend their lifetime just querying DB.
This is how I implemented it my scenario.
var processing= new ConcurrentQueue<int>();
//possible multi threaded enumeration only processed non-queued records
Parallel.ForEach(dataEnumeration, dataItem=>
{
if(!processing.Contains(dataItem.Id))
{
processing.Enqueue(dataItem.Id);
var myEntityResource = new EntityResource();
myEntityResource.EntityRecords.Add(new EntityRecord
{
Field1="Value1",
Field2="Value2"
}
);
SaveContext(myEntityResource);
var itemIdProcessed = 0;
processing.TryDequeue(out itemIdProcessed );
}
}
public void RefreshContext(DbContext context)
{
var modifiedEntries = context.ChangeTracker.Entries()
.Where(e => e.State == EntityState.Modified || e.State == EntityState.Deleted);
foreach (var modifiedEntry in modifiedEntries)
{
modifiedEntry.Reload();
}
}
public bool SaveContext(DbContext context,out Exception error, bool reloadContextFirst = true)
{
error = null;
var saved = false;
try
{
if (reloadContextFirst)
this.RefreshContext(context);
context.SaveChanges();
saved = true;
}
catch (OptimisticConcurrencyException)
{
//retry saving on concurrency error
if (reloadContextFirst)
this.RefreshContext(context);
context.SaveChanges();
saved = true;
}
catch (DbEntityValidationException dbValEx)
{
var outputLines = new StringBuilder();
foreach (var eve in dbValEx.EntityValidationErrors)
{
outputLines.AppendFormat("{0}: Entity of type \"{1}\" in state \"{2}\" has the following validation errors:",
DateTime.Now, eve.Entry.Entity.GetType().Name, eve.Entry.State);
foreach (var ve in eve.ValidationErrors)
{
outputLines.AppendFormat("- Property: \"{0}\", Error: \"{1}\"", ve.PropertyName, ve.ErrorMessage);
}
}
throw new DbEntityValidationException(string.Format("Validation errors\r\n{0}", outputLines.ToString()), dbValEx);
}
catch (Exception ex)
{
error = new Exception("Error saving changes to the database.", ex);
}
return saved;
}
I think Craig might be right about your application no needing to have threads.. but you might look for the uses of ConcurrencyCheck in your models to make sure you don't "override" your changes
I don't know how much of your application is actually number crunching. If speed is the motivation for using multi-threading then it might pay off to take a step back and gather data about where the bottle next is.
In a lot of cases I have found that the limiting factor in applications using a database server is the speed of the I/O system for your storage. For example the speed of the hard drive disk(s) and their configuration can have a huge impact. A single hard drive disk with 7,200 RPM can handle about 60 transactions per second (ball park figure depending on many factors).
So my suggestion would be to first measure and find out where the bottle next is. Chances are you don't even need threads. That would make the code substantially easier to maintain and the quality is much higher in all likelihood.
"How can I synchronize my object contexts in each thread ?"
This is going to be tough. First of all SP or the DB queries can have parallel execution plan. So if you also have parallelism on object context you have to manually make sure that you have sufficient isolation but just enough that you dont hold lock too long that you cause deadlocks.
So I would say dont need to do it .
But that might not be the answer you want. So Can you explain a bit more what you want to achieve using this mutithreading. Is it more compute bound or IO bound. If it is IO bound long running ops then look at APM by Jeff Richter.
I think your question is more about synchronization between threads and EF is irrelevvant here. If I understand correctly you want to notify threads from one group when the main thread performed some operation - in this case "SaveChanges()" operation. The threads here are like client-server applications, where one thread is a server and other threads are clients and you want client-threads to react on server activity.
As someone noticed you probably do not need threads, but let's leave it as it is.
There is no fear of dead locks as long as you are going to use separate OC per thread.
I also assume that your client threads are long-running thread in some kind of loop. If you want your code to be executed on client thread you can't use C# events.
class ClientThread {
public bool SomethingHasChanged;
public MainLoop()
{
Loop {
if (SomethingHasChanged)
{
refresh();
SomethingHasChanged = false;
}
// your business logic here
} // End Loop
}
}
Now the question is how you will set the flag in all your client-threads? You could keep references to client threads in your main thread and loop through them and set all flags to true.
Back when I used EF, I simply had one ObjectContext, to which I synchronized all access.
This isn't ideal. Your database layer would effectively be singlethreaded. But, it did keep it thread-safe in a multithreaded environment. In my case, the heavy computation was not in the database code at all - this was a game server, so game logic was of course the primary resource hog. So, I didn't have any particular need for a multithreaded DB layer.

Categories

Resources