Self referencing loop while trying to use Hangfire - c#

I'm developing a Web API in ASP.NET Core and we found ourselves needing to run a background task for a large bulk insert operation. The model I'm inserting, however, contains a property of the Geometry type from the .NET Topology Suite:
public class GeometricData
{
    //...
    public Geometry Geometry { get; set; }
}
To do the bulk insert, I'm following a method I found here, and it performs quite well, but its implementation is beyond the scope of this question. Even though it is fast, a user could be inserting over one million records in one go, for instance, so we decided to move this processing to a background task. The Hangfire extension looked like something that could save us a lot of time at first, but it doesn't seem to handle the Geometry type well. In the code below, the BackgroundTask method might as well be an empty method:
public Task BulkInsert(IEnumerable<GeometricData> list)
{
    BackgroundJob.Enqueue(() => BackgroundTask(list));
    return Task.CompletedTask;
}
Just the act of passing a list of Geometry as a parameter to BackgroundTask in the action for Enqueue will throw the unfortunate error:
Self referencing loop detected for property 'CoordinateValue' with type 'NetTopologySuite.Geometries.Coordinate'. Path '[0].Geometry.Coordinates[0]'.
As a matter of fact, Coordinate (an NTS class) does reference itself:
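The property named in the error is of the coordinate's own type; roughly (paraphrased for illustration, not the actual NTS source), the shape is:

public class Coordinate
{
    public double X;
    public double Y;

    // A property whose type is the declaring class itself; a serializer that walks
    // properties without reference-loop handling will recurse here forever.
    public virtual Coordinate CoordinateValue
    {
        get { return this; }
        set { /* copies value.X, value.Y, ... */ }
    }
}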
No idea why they would do that, but they did. Regardless, everything worked just fine up until now, but unless I manage to find a solution to this (or maybe a way around it), I'm gonna be in a heap of trouble implementing a background worker from scratch (I'll be using the Worker Service, in case anyone is wondering). Any pointers?

Man, add JsonIgnoreAttribute above Geometry and Coordinates. Additionally, check the following link: newtonsoft.com/json/help/html/PropertyJsonIgnore.htm
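A minimal sketch of that suggestion, assuming Hangfire is serializing the job arguments with Newtonsoft.Json (which is where the "self referencing loop" error comes from):

using Newtonsoft.Json;
using NetTopologySuite.Geometries;

public class GeometricData
{
    //...
    [JsonIgnore]                          // excluded from job-argument serialization
    public Geometry Geometry { get; set; }
}

Keep in mind that an ignored property is simply not serialized, so the background job would not receive the geometry through its arguments; the data would have to reach BackgroundTask some other way (for example by re-reading it inside the job).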

Does the need to make the code simpler justify the use of wrong abstractions?

Suppose we have a CommandRunner class that runs Commands. When a Command is created it's kept in the processingQueue for processing. If the execution of the Command finishes with errors, the Command is moved to the faultedQueue for later processing, but when everything is OK the Command is moved to the archiveQueue; the archiveQueue is not going to be processed in any way.
The CommandRunner is something like this:
class CommandRunner
{
    public CommandRunner(IQueue<Command> processingQueue,
                         IQueue<Command> faultedQueue,
                         IQueue<Command> archiveQueue)
    {
        this.processingQueue = processingQueue;
        this.faultedQueue = faultedQueue;
        this.archiveQueue = archiveQueue;
    }

    public void RunCommands()
    {
        while (processingQueue.HasItems)
        {
            var current = processingQueue.Dequeue();
            var result = current.Run();

            if (result.HasError)
                current.MoveTo(faultedQueue);
            else
                current.MoveTo(archiveQueue);
            ...
        }
    }
}
The CommandRunner receives the three dependencies as a PersistentQueue. The PersistentQueue is responsible for the long-term storage of the Commands, and so we free the CommandRunner from handling this.
And the only purpose of the archiveQueue is to keep the design homogeneous, to keep the CommandRunner persistence-ignorant and with few dependencies.
For example, we can imagine a property like this:
IEnumerable<Command> AllCommands
{
    get
    {
        return Enumerate(archiveQueue).Union(processingQueue).Union(faultedQueue);
    }
}
Many portions of the class need to do so (handle the archive as a Queue to keep the code simpler, as shown above).
Does it make sense to use a Queue even if it's not the best abstraction, or do I have to use another abstraction for the archive concept?
What other alternatives are there to meet this requirement?
Keep in mind that code, especially running code, usually gets tangled and messy as time passes. To combat this, good names, good design, and meaningful comments come into play.
If you are not going to process the archiveQueue, and it's just storage for messages that have been successfully processed, you can always store it as a different type (list, collection, set, whatever suits your needs), and then choose one of the following two options:
Keep the name archiveQueue and change the underlying type. I would leave a comment where it's defined (or injected) saying: Notice that this might not be an actual queue. Name is for consistency reasons only.
Change the name to archiveRepository or something similar, while keeping the queue type. Obviously, since it's still a queue, you'll leave a comment saying: Notice, this is actually a queue.
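In code, the two options boil down to something like this (IQueue<Command> is the question's own interface; the List<Command> is just one possible replacement type):

// Option 1: keep the name, change the underlying type.
// Notice that this might not be an actual queue. Name is for consistency reasons only.
private readonly ICollection<Command> archiveQueue = new List<Command>();

// Option 2: keep the queue type, change the name.
// Notice, this is actually a queue.
private readonly IQueue<Command> archiveRepository;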
Another thing to keep in mind is that if you have n people working on your code base, you'll probably get n+1 different preferences about which way it should be done :)
A queue is a useful structure when you need to care about the order of the items inside it. If your command post-processing needs to care about the order in which the commands ran, then a queue can be a good choice.
If you don't need information about the order of the commands, maybe you can use a List (in the System.Collections.Generic namespace).
I think your choice is good; in the same case I'd use queues too. We have a good example in OS design principles: inside the OS (in the kernel), processes are queued for execution. The OS queues are clearly more complicated because they have other variables in mind, like priority and CPU utilization, but we can learn from the use of queues as data structures in process management.

Multiple calls wait on the same async task

Basically, I have a lot of places in my app where, in order to do something, I need to make sure a product image is downloaded. I want to centralize this call in one place so I make sure to only download it once. I'm trying to do it like this in my class:
private IAsyncOperation<bool> _downloadCoverTask;

internal IAsyncOperation<bool> DownloadCoverAsync()
{
    if (this._downloadCoverTask == null)
    {
        this._downloadCoverTask = this._cloudProduct.Image.DownloadAsync();
    }
    return this._downloadCoverTask;
}
_cloudProduct.Image.DownloadAsync() is a method which actually does the image downloading (it also happens to be in a library that I don't control).
So in my code where I need to download the image, I do
await product.DownloadCoverAsync();
// do stuff with the image
This works ok the first time I call it, but the second time it's called I get the exception "A delegate was assigned when not allowed." I don't get any stack trace or anything either.
This would get called a bunch of places, but I'm hoping that they would all wait on the same task, and then continue when it's complete. If it's already completed I hope it would just return right away. It's possible that I'm misunderstanding and that's just not how it works?
(Yes, I posted this before, but wasn't able to get an answer and I think this question summarizes the issue better. There is also this question, but that uses TPL and seems to have a much more complex goal.)
Try to use a Task<bool> instead of an IAsyncOperation<bool>. You can get a task using the AsTask extension method.
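Applied to the question's code, that would look roughly like this (a sketch; AsTask() is the standard WinRT-to-Task conversion extension, and the caching logic is otherwise unchanged):

private Task<bool> _downloadCoverTask;

internal Task<bool> DownloadCoverAsync()
{
    if (this._downloadCoverTask == null)
    {
        // AsTask() wraps the IAsyncOperation<bool> in a Task<bool>.
        // A Task can be awaited any number of times, so every caller after the
        // first awaits the same cached task (or gets its completed result immediately).
        this._downloadCoverTask = this._cloudProduct.Image.DownloadAsync().AsTask();
    }
    return this._downloadCoverTask;
}

If DownloadCoverAsync can be called from several threads at once, the null check is racy and the download could still start twice; wrapping the initialization in a lock or a Lazy<Task<bool>> would close that gap.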

How to break down large 'macro' classes?

One application I work on does only one thing, seen from the outside world: it takes a file as input and, after ~5 minutes, spits out another file.
What happens inside is actually a sequential series of action. The application is, in our opinion, structured well because each action is like a small box, without too many dependencies.
Usually some later actions use information from previous ones, and just a few could be executed in parallel; for the sake of simplicity we prefer to keep the execution sequential.
Now the problem is that the function that executes all these actions is like a batch file: a long list of calls to different functions with different arguments. So, looking at the code, it looks like this:
main
{
    try
    {
        result1 = Action1(inputFile);
        result2 = Action2(inputFile);
        result3 = Action3(result2.value);
        result4 = Action4(result1.value, inputFile);
        ... // You get the idea. There is no pattern to the passed parameters.
        resultN = ActionN(parameters);
        write output
    }
    catch
    {
        something went wrong, display the error
    }
}
How would you model the main function of this application so it's not just a long list of commands?
Not everything needs to fit a clever pattern. There are few more elegant ways to express a long series of imperative statements than as, well, a long series of imperative statements.
If there are certain kinds of flexibility you feel you are currently lacking, express them, and we can try to propose solutions.
If there are certain clusters of actions and results that are re-used often, you could pull them out into new functions and build "aggregate" actions from them.
You could look in to dataflow languages and libraries, but I expect the gain to be small.
Not sure if it's the best approach, but you could have an object that would store all the results and you would give it to each method in turn. Every method would read the parameters it needs and write its result there. You could then have a collection of actions (either as delegates or objects implementing an interface) and call them in a loop.
class Results
{
    public int Result1 { get; set; }
    public string Result2 { get; set; }
    …
}

var actions = new Action<Results>[] { Action1, Action2, … };

Results results = new Results();
foreach (var action in actions)
    action(results);
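For illustration, each individual action would then read what it needs from the shared object and write its own result back (the bodies below are just placeholders):

static void Action1(Results r)
{
    r.Result1 = 42;                     // placeholder computation
}

static void Action2(Results r)
{
    r.Result2 = r.Result1.ToString();   // later actions consume earlier results
}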
You could think of implementing a Sequential Workflow from Windows Workflow Foundation.
First of all, this solution is not bad at all. If the actions are disjoint (I mean there are no global parameters or other hidden dependencies between different actions, or between actions and the environment), it's a good solution. It is easy to maintain and read, and when you need to expand the functionality you just have to add new actions; when the "quantity" changes, you just have to add or remove lines from the macro sequence. If there's no need to change the process chain frequently: don't move!
If it's a system where the implementation of the actions doesn't change often, but their order and parameters do, you may design a simple script language and transform the macro class into that script. The script should be maintained by someone other than you, someone who is familiar with the problem domain at the level of your "actions". That way, he or she can assemble the application using the script language without your assistance.
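A very rough sketch of that idea; every name here (Context, Pipeline, the action keys, the script file) is invented for illustration:

using System;
using System.Collections.Generic;
using System.IO;

class Context
{
    public string InputFile;
    public object Result1, Result2;   // shared results the actions read and write
}

static class Pipeline
{
    static readonly Dictionary<string, Action<Context>> Actions =
        new Dictionary<string, Action<Context>>
        {
            { "Action1", c => { /* c.Result1 = Action1(c.InputFile); */ } },
            { "Action2", c => { /* c.Result2 = Action2(c.InputFile); */ } }
        };

    public static void Run(string scriptPath, Context ctx)
    {
        // The script file lists one action name per line; its order drives the run.
        foreach (var line in File.ReadAllLines(scriptPath))
        {
            var name = line.Trim();
            if (name.Length == 0 || name.StartsWith("#"))
                continue;               // skip blank lines and comments in the script
            Actions[name](ctx);
        }
    }
}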
One nice approach to that kind of problem splitting is dataflow programming (a.k.a. flow-based programming). In dataflow programming there are pre-written components. Components are black boxes (from the view of the application developer); they have consumer (input) and producer (output) ports, which can be connected to form a processing network, which is then the application. If there is a good set of components for a domain, many applications can be created without programming new components. Also, components can be built from other components (these are called composite components).
Wikipedia (good starting point):
http://en.wikipedia.org/wiki/Dataflow_programming
http://en.wikipedia.org/wiki/Flow-based_programming
JPM's site (book, wiki, everything):
http://jpaulmorrison.com/fbp/
I think bigger systems must have that split point you describe as "macro". Even games have that point: e.g. FPS games have a 3D engine and game logic scripts, and there's ScummVM, which is the same idea.

Lockless list help!

Hi, I'm trying to write a lockless list. I think I got the adding part working, but the code that extracts objects from the list does not work too well :(
Well, the list is not a normal list... I have the interface IWorkItem:
interface IWorkItem
{
    DateTime ExecuteTime { get; }
    bool Cancelled { get; }
    void Execute(DateTime now);
}
and I have a list where I can add these :P The idea is that when I run Get() on the list, it should loop until it finds an IWorkItem where
if (item.ExecuteTime < DateTime.Now)
and then remove it from the list and return it.
I have run tests with many threads on my dual-core CPU, and it seems that Add works (it has never failed so far), but the Get function loses some work items somewhere, and I have no idea what's wrong...
PS: if I get this working, anyone is free to use the code :) Well, you are anyway, but I don't see the point while it's bugged :P
The code is here http://www.easy-share.com/1903474734/LinkedList.zip and if you try to run it you will see that it sometimes won't be able to get as many work items as it put in the list...
Edit: I have got a lockless list working. It was faster than using the lock(obj) statement, but I have a lock object that uses Interlocked which was still outperforming the lockless list. I'm going to try to make a lockless ArrayList and see if I get the same results there; when I'm done I'll upload the results here.
The problem is your algorithm: Consider this sequence of events:
Thread 1 calls list.Add(workItem1), which completes fully.
Status is:
first=workItem1, workItem1.next = null
Then thread 1 calls list.Add(workItem2) and reaches the spot right before the second Replace (where you have the comment "//lets try").
Status is:
first=workItem1, workItem1.next = null, nextItem=workItem1
At this point thread 2 takes over and calls list.Get(). Assume workItem1's executionTime is now, so the call succeeds and returns workItem1.
After this status is:
first = null, workItem1.next = null
(and in the other thread, nextItem is still workItem1).
Now we get back to the first thread, and it completes the Add() by setting workItem1.next:=workItem2.
If we call list.Get() now, we will get null, even though the Add() completed successfully.
You should probably look up a real peer-reviewed lock-free linked list algorithm. I think the standard one is this by John Valois. There is a C++ implementation here. This article on lock-free priority queues might also be of use.
You can use a timestamping protocol for data structures just fine, mirroring this example from the database world:
Concurrency
But be clear that each item needs both a read and write timestamp, and be sure to follow the rules of the algorithm clearly.
There are some additional difficulties of implementing this on a linked list though, I think. The database example would be fine for a vector where you know the array index of what you want. However, in a linked list, you may need to walk down the pointers -- and the structure of the list could change while you are searching! I guess you could solve that by some sort of nuance (or if you just want to traverse the "new" list as it is, do nothing), but it poses a problem. Try to solve it without introducing some rollback condition that makes it worse than locking the list!
So are you sure that it needs to be lockless? Depending on your workload, the non-blocking solution can sometimes be slower. Check out this MSDN article for a little more. Also, proving that a lockless data structure is correct can be very difficult.
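For comparison, here is a minimal lock-based version (a sketch; it assumes the work items live in a plain LinkedList<IWorkItem> from System.Collections.Generic, guarded by a single lock), which is easy to reason about and often fast enough:

private readonly object sync = new object();
private readonly LinkedList<IWorkItem> items = new LinkedList<IWorkItem>();

public void Add(IWorkItem item)
{
    lock (sync)
    {
        items.AddLast(item);
    }
}

public IWorkItem Get()
{
    lock (sync)
    {
        for (var node = items.First; node != null; node = node.Next)
        {
            if (node.Value.ExecuteTime < DateTime.Now)
            {
                items.Remove(node);   // safe: the lock is still held
                return node.Value;
            }
        }
        return null;                  // nothing is due yet
    }
}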
I am in no way an expert on the subject, but as far as I can see, you need to either make the ExecuteTime field in the implementation of IWorkItem volatile (of course it might already be that), or insert a memory barrier either after you set ExecuteTime or before you read it.
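DateTime fields can't be marked volatile in C#, so one concrete way to get that visibility (a sketch only, not taken from the question's code; it assumes using System and System.Threading) is to store the ticks in a long and access them through Interlocked:

class WorkItem : IWorkItem
{
    private long executeTimeTicks;

    public DateTime ExecuteTime
    {
        // Interlocked.Read gives an atomic read with full-fence semantics.
        get { return new DateTime(Interlocked.Read(ref executeTimeTicks)); }
    }

    public void SetExecuteTime(DateTime value)
    {
        Interlocked.Exchange(ref executeTimeTicks, value.Ticks);
    }

    public bool Cancelled { get; private set; }

    public void Execute(DateTime now)
    {
        // the actual work goes here
    }
}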

using ThreadPools to search through object lists

I have these container objects (let's call them Container) in a list. Each of these Container objects in turn has a DataItem (or a derivate) in a list. In a typical scenario a user will have 15-20 Container objects with 1000-5000 DataItems each. Then there are some DataMatcher objects that can be used for different types of searches. These work mostly fine (since I have several hundred unit tests on them), but in order to make my WPF application feel snappy and responsive, I decided that I should use the ThreadPool for this task. Thus I have a DataItemCommandRunner which runs on a Container object, and basically performs each delegate in a list it takes as a parameter on each DataItem in turn; I use the ThreadPool to queue up one thread for each Container, so that the search in theory should be as efficient as possible on multi-core computers etc.
This is basically done in a DataItemUpdater class that looks something like this:
public class DataItemUpdater
{
    private Container ch;
    private IEnumerable<DataItemCommand> cmds;

    public DataItemUpdater(Container container, IEnumerable<DataItemCommand> commandList)
    {
        ch = container;
        cmds = commandList;
    }

    public void RunCommandsOnContainer(object useless)
    {
        Thread.CurrentThread.Priority = ThreadPriority.AboveNormal;

        foreach (DataItem di in ch.ItemList)
        {
            foreach (var cmd in cmds)
            {
                cmd(di);
            }
        }
        //Console.WriteLine("Done running for {0}", ch.DisplayName);
    }
}
(The useless object parameter for RunCommandsOnContainer is because I am experimenting with this with and without using threads, and one of them requires some parameter. Also, setting the priority to AboveNormal is just an experiment as well.)
This works fine for all but one scenario - when I use the AllWordsMatcher object type that will look for DataItem objects containing all words being searched for (as opposed to any words, exact phrase or regular expression for instance).
This is a pretty simple somestring.Contains(eachWord) based object, backed by unit tests. But herein lies some hairy strangeness.
When the RunCommandsOnContainer runs using ThreadPool threads, it will return insane results. Say I have a string like this:
var someString = "123123123 - just some numbers";
And I run this:
var res = someString.Contains("data");
When it runs, this will actually return true quite a lot - I have debugging information that shows it returning true for empty strings and other strings that simply do not contain the data. Also, it will sometimes return false even when the string actually contains the data being looked for.
The kicker in all this? Why do I suspect the ThreadPool and not my own code?
When I run the RunCommandsOnContainer() command for each Container in my main thread (i.e. locking the UI and everything), it works 100% correctly - every time! It never finds anything it shouldn't, and it never skips anything it should have found.
However, as soon as I use the ThreadPool, it starts finding a lot of items it shouldn't, while sometimes not finding items it should.
I realize this is a complex problem (it is painful trying to debug, that's for sure!), but any insight into why and how to fix this would be greatly appreciated!
Thanks!
Rune
It's a bit hard to see from the fragment you're posting, but judging by the symptoms I would look at the AllWordsMatcher (look for static state). If AllWordsMatcher is stateful you should also check that you're creating a new instance for each thread.
More generally I'd look at all the instances involved in the matching/searching process, specifically at the working objects being used when multithreaded. From past experience, the problem usually lies there. (It's easy to look too much at the object graph representing your business data Container/DataItem in this case)
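To make the failure mode concrete, this is the kind of shared state that produces exactly those symptoms; the class body is hypothetical (the real AllWordsMatcher isn't shown in the question), and it assumes using System.Collections.Generic:

class AllWordsMatcher
{
    // BAD: one shared field, written by every search on every ThreadPool thread.
    private static string currentText;

    public bool Matches(string text, IEnumerable<string> words)
    {
        currentText = text;                      // thread A stores its string...
        foreach (var word in words)
        {
            if (!currentText.Contains(word))     // ...but thread B may have overwritten it by now
                return false;
        }
        return true;
    }
}

Making currentText an instance field (and giving each Container's work item its own matcher instance), or simply using the text parameter directly, removes the cross-thread interference.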
