IgniteQueue in Apache Ignite.NET - C#

We are using Ignite.NET and don't have the option to use the Ignite Java API (team skills, technology affinity, etc.). We are looking to create a queuing mechanism so that we can process messages in a distributed fashion. The IgniteQueue data structure seems the most suitable, but it doesn't appear to be available in Ignite.NET. Could someone please suggest a solution for this scenario: multiple producers queue unique work items, each of which must be processed reliably by only one consumer at a time.
For example, producers P1 and P2 (on different machines) generate T1, T2, T3 on the queue, and we have consumers C1, C2, C3 (on different machines). T1 should be processed by ONLY one of C1, C2, C3, and similarly T2 and T3 should each be processed exactly once by a single consumer.

IgniteQueue is built on top of Ignite Cache, so yes, you can replicate the same functionality in .NET:
Create a cache
Use a Continuous Query as the consumer, calling ICache.Remove to ensure that every item is processed only once
Add data to the cache on the producers with data streamers, or just use ICache.Put / PutAll
Below is the code for the continuous query listener:
using System.Collections.Generic;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cache.Event;
using Apache.Ignite.Core.Resource;

class CacheEventListener<TK, TV> : ICacheEntryEventListener<TK, TV>
{
    private readonly string _cacheName;

    [InstanceResource]  // Injected automatically.
    private readonly IIgnite _ignite = null;

    private ICache<TK, TV> _cache;

    public CacheEventListener(string cacheName)
    {
        _cacheName = cacheName;
    }

    public void OnEvent(IEnumerable<ICacheEntryEvent<TK, TV>> events)
    {
        _cache = _cache ?? _ignite.GetCache<TK, TV>(_cacheName);

        foreach (var entryEvent in events)
        {
            // Remove returns true only for the node that wins the race,
            // so each entry is consumed exactly once.
            if (entryEvent.EventType == CacheEntryEventType.Created && _cache.Remove(entryEvent.Key))
            {
                // Run consumer logic here - use another thread for heavy processing.
                Consume(entryEvent.Value);
            }
        }
    }

    private void Consume(TV value)
    {
        // Actual consumer logic goes here.
    }
}
Then we deploy this to every node with a single call:
var consumer = new CacheEventListener<Guid, string>(cache.Name);
var continuousQuery = new ContinuousQuery<Guid, string>(consumer);
cache.QueryContinuous(continuousQuery);
As a result, OnEvent is called once per entry, on the primary node for that entry, so there is one consumer per Ignite node. We can increase the effective number of consumers per node by offloading the actual consumer logic to other threads, using a BlockingCollection, and so on.
And one last thing: we have to come up with a unique cache key for every new entry. The simplest option is Guid.NewGuid(), but we could also use an AtomicSequence.
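For completeness, the producer side might look like the following sketch, assuming an IIgnite instance named ignite; the "work-items" cache name is illustrative, and streamer method names can differ slightly between Ignite versions:

// Producer: every work item gets a unique key so nothing is overwritten
// before a consumer sees it.
var cache = ignite.GetOrCreateCache<Guid, string>("work-items");

// Plain puts are fine for moderate rates.
cache.Put(Guid.NewGuid(), "T1");

// A data streamer is more efficient for high-volume loading.
using (var streamer = ignite.GetDataStreamer<Guid, string>(cache.Name))
{
    streamer.AddData(Guid.NewGuid(), "T2");
    streamer.AddData(Guid.NewGuid(), "T3");
} // Dispose flushes any buffered entries.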

Azure EventHubs throws Exception: At least one receiver for the endpoint is created with epoch of '0', and so non-epoch receiver is not allowed

Introduction
Hello all, we're currently working on a microservice platform that uses Azure EventHubs and events to send data between the services.
Let's just name these services: CustomerService, OrderService and MobileBFF.
The CustomerService mainly sends updates (as events), which are then stored by the OrderService and MobileBFF so they can respond to queries without having to call the CustomerService for this data.
All 3 of these services, plus our developers on the DEV environment, use the same ConsumerGroup to connect to these event hubs.
We currently use only 1 partition but plan to expand to multiple later. (You can see our code is already written to read from multiple partitions.)
Exception
Every now and then we run into an exception, though (once it starts, it usually keeps throwing this error for an hour or so). So far we've only seen this error on the DEV/TEST environments.
The exception:
Azure.Messaging.EventHubs.EventHubsException(ConsumerDisconnected): At least one receiver for the endpoint is created with epoch of '0', and so non-epoch receiver is not allowed. Either reconnect with a higher epoch, or make sure all epoch receivers are closed or disconnected.
All consumers of the EventHub store their SequenceNumber in their own database. This allows each consumer to consume events separately and to store the last processed SequenceNumber in its own SQL database. When the service (re)starts, it loads the SequenceNumber from the db and then requests events from there onwards until no more events can be found. It then sleeps for 100ms and retries. Here's the (somewhat simplified) code:
var consumerGroup = EventHubConsumerClient.DefaultConsumerGroupName;
string[] allPartitions = null;

await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
{
    allPartitions = await consumer.GetPartitionIdsAsync(stoppingToken);
}

var allTasks = new List<Task>();
foreach (var partitionId in allPartitions)
{
    // This is required if you reuse variables inside a Task.Run();
    var partitionIdInternal = partitionId;
    allTasks.Add(Task.Run(async () =>
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
                {
                    EventPosition startingPosition;
                    EventHubInboxManager<T, EH> messageProcessor;
                    using (var testScope = _serviceProvider.CreateScope())
                    {
                        messageProcessor = testScope.ServiceProvider.GetService<EventHubInboxManager<T, EH>>();
                        // Obtains the starting position from the database, or sets it to "Earliest" or "Latest" based on configuration.
                        startingPosition = await messageProcessor.GetStartingPosition(_inboxOptions.InboxIdentifier, partitionIdInternal);
                    }

                    while (!stoppingToken.IsCancellationRequested)
                    {
                        bool processedSomething = false;
                        await foreach (PartitionEvent partitionEvent in consumer.ReadEventsFromPartitionAsync(partitionIdInternal, startingPosition, stoppingToken))
                        {
                            processedSomething = true;
                            startingPosition = await messageProcessor.Handle(partitionEvent);
                        }
                        if (processedSomething == false)
                        {
                            await Task.Delay(100, stoppingToken);
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                // Log error / delay / retry
            }
        }
    }));
}
The exception is thrown on the following line:
await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
More investigation
The code described above runs in the microservices (which are hosted as App Services in Azure).
Next to that, we're also running 1 Azure Function that also reads events from the EventHub (probably using the same consumer group).
According to the documentation here: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-features#consumer-groups it should be possible to have 5 consumers per consumer group. The docs seem to suggest having only one, but it's not clear to us what could happen if we don't follow this guidance.
We did some tests with manually spawning multiple instances of our service that reads events, and when there were more than 5 this resulted in a different error which stated quite clearly that there can only be 5 consumers per partition per consumer group (or something similar).
Furthermore, it seems (we're not 100% sure) that this issue started happening when we rewrote the code above to spawn one thread per partition (even though we only have 1 partition in the EventHub). Edit: we did some more log-digging and also found a few exceptions from before we merged in the code that spawns one thread per partition.
That exception indicates that there is another consumer configured to use the same consumer group and asserting exclusive access over the partition. Unless you're explicitly setting the OwnerLevel property in your client options, the likely candidate is that there is at least one EventProcessorClient running.
To remediate, you can:
Stop any event processors running against the same Event Hub and Consumer Group combination, and ensure that no other consumers are explicitly setting the OwnerLevel.
Run these consumers in a dedicated consumer group; this will allow them to co-exist with the exclusive consumer(s) and/or event processors.
Explicitly set the OwnerLevel to 1 or greater for these consumers; that will assert ownership and force any other consumers in the same consumer group to disconnect.
(note: depending on what the other consumer is, you may need to test different values here. The event processor types use 0, so anything above that will take precedence.)
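If you take the third option, wiring the OwnerLevel into the question's read loop might look like the following sketch (ReadEventOptions is the options type in Azure.Messaging.EventHubs; the value 1 is only an example):

// Sketch: pass an OwnerLevel so this reader asserts exclusive ownership.
// Event processors use 0 implicitly, so anything higher takes precedence.
var readOptions = new ReadEventOptions { OwnerLevel = 1 };

await foreach (PartitionEvent partitionEvent in consumer.ReadEventsFromPartitionAsync(
    partitionIdInternal, startingPosition, readOptions, stoppingToken))
{
    // ... handle the event as before ...
}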
To add to Jesse's answer, I think the exception message comes from the old SDK.
If you look into the docs, there are 3 types of receiving modes defined there:
Epoch
Epoch is a unique identifier (epoch value) that the service uses, to enforce partition/lease ownership.
The epoch feature provides users the ability to ensure that there is only one receiver on a consumer group at any point in time...
Non-epoch:
... There are some scenarios in stream processing where users would like to create multiple receivers on a single consumer group. To support such scenarios, we do have ability to create a receiver without epoch and in this case we allow upto 5 concurrent receivers on the consumer group.
Mixed:
... If there is a receiver already created with epoch e1 and is actively receiving events and a new receiver is created with no epoch, the creation of new receiver will fail. Epoch receivers always take precedence in the system.

How to handle changes from different containers in Cosmos DB through Change Feed

I have a number of containers in Cosmos DB, and the set of containers changes all the time. I need to provide some mechanism for reading all the changes from those containers.
I'm trying to implement a builder/factory for the Change Feed Processor (CFP). In my case, I have to create CFP instances dynamically for the different containers.
How I see the solution right now: I need a WebJob/Console Application that listens to a queue. When another application creates a new container in Cosmos DB, it also sends a message to the queue. The message contains all the information needed to create the new CFP (connection string, collection name, lease container name, etc.). The application creates the new CFP and runs it on a new thread in the background forever.
Here is the code for how I'm creating a new CFP:
private void StartNewProcessor()
{
    new List<Task>().Add(Task.Run(async () =>
    {
        var container = Database.GetContainer(ContainerName);
        var lease = Database.GetContainer(LeaseName);

        var changeFeedProcessor = container.GetChangeFeedProcessorBuilder<Item>(ProcessorName, ProcessData)
            .WithLeaseContainer(lease)
            .WithInstanceName(InstanceName)
            .Build();

        await changeFeedProcessor.StartAsync();
        Console.WriteLine($"Change Feed Processor: {ProcessorName} has been started");
        Console.ReadKey(true);
        await changeFeedProcessor.StopAsync();
    }));
}
The problem is that this is a bad approach, since there can be 100 or more collections in the future, so I'd need to create 100 extra background threads.
I'm looking for ideas about the application architecture and how to do all this the right way. It would be great if it were possible to handle changes for all containers in one application.

Lock cache with Apache Ignite.NET thin client

Currently we use the Apache Ignite.NET thin client to cache different sets of data. When a data request comes in, we check whether the data is already stored in the cache and, if not, request it from the database and put it into the cache.
I want to prevent several database requests when two data requests come in at the same time.
Is there any way to manually lock the cache before the first database request starts, so that the second data request waits until the first one is completed?
I cannot solve the task using .NET concurrency primitives, because the cache can be used by multiple client instances (load-balancing).
I've already found the ICache.Lock(TK key) method, but it seems that it locks only the specified rows in the cache, and it is supported only in self-hosted mode, not for the Ignite.NET thin client.
A small piece of code that illustrates the issue:
var key = "cache_key";
using (var ignite = Ignition.StartClient(new Core.Client.IgniteClientConfiguration { Host = "127.0.0.1" }))
{
    var cacheNames = ignite.GetCacheNames();
    if (cacheNames.Contains(key))
    {
        return ignite.GetCache<int, Employee>(key).AsCacheQueryable();
    }
    else
    {
        var data = RequestDataFromDatabase();
        var cache = ignite.CreateCache<int, Employee>(new CacheClientConfiguration(
            EmployeeCacheName, new QueryEntity(typeof(int), typeof(Employee))));
        cache.PutAll(data);
        return cache.AsCacheQueryable();
    }
}
The thin client doesn't have the required API.
If you don't need to check for individual records and only need to know whether the cache is available, you might just call CreateCache multiple times. For every invocation after the first, it should throw an exception saying that a cache with that name has already started.
try
{
    var cache = ignite.CreateCache<int, Employee>(new CacheClientConfiguration(
        EmployeeCacheName, new QueryEntity(typeof(int), typeof(Employee))));
    // Cache created by this call => add data here
}
catch (IgniteClientException e) when (e.Message.Contains("already started"))
{
    // Return existing cache, don't add data
}
Alexandr has provided a good and simple solution if you just need to initialize the cache once.
If you need more complex synchronization logic, atomic cache operations (PutIfAbsent, Replace) can often replace locks. For example, we could have a special cache to track the status of other caches:
var statusCache = Client.GetOrCreateCache<string, string>("status");

if (statusCache.PutIfAbsent("cache-name", "created"))
{
    // Just created, add data
    ...
    statusCache.Put("cache-name", "populated");
}
else
{
    // Already exists, wait for data
    while (statusCache["cache-name"] != "populated")
        Thread.Sleep(1000);
}

Multiple users writing to the same file

I have a Web API project that is accessed by multiple users (I mean a really, really large number of users). When my project is accessed from the frontend (a web page using HTML5) and a user does something like updating or retrieving data, the backend app (the Web API) writes to a single log file (a .log file whose content is JSON).
The problem is that, when accessed by multiple users, the frontend becomes unresponsive (always loading). The bottleneck is the writing process of the log file (a single log file being accessed by a really large number of users). I heard that a multi-threading technique can solve the problem, but I don't know which method to use. So maybe someone can help me, please.
Here is my code (sorry for any typos, I'm using my smartphone and the mobile version of Stack Overflow):
public static void JsonInputLogging<T>(T m, string methodName)
{
    MemoryStream ms = new MemoryStream();
    DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(T));
    ser.WriteObject(ms, m);
    string jsonString = Encoding.UTF8.GetString(ms.ToArray());
    ms.Close();
    logging("MethodName: " + methodName + Environment.NewLine + jsonString);
}

public static void logging(string message)
{
    string pathLogFile = @"D:\jsoninput.log";
    FileInfo jsonInputFile = new FileInfo(pathLogFile);
    if (File.Exists(jsonInputFile.ToString()))
    {
        long fileLength = jsonInputFile.Length;
        if (fileLength > 1000000)
        {
            File.Move(pathLogFile, pathLogFile.Replace(*some new path*));
        }
    }
    File.AppendAllText(pathLogFile, *some text*);
}
You have to understand some internals here first. For each [x] users, ASP.NET will use a single worker process. One worker process holds multiple threads. If you're using multiple instances in the cloud, it's even worse, because then you also have multiple server instances (I assume this ain't the case).
A few problems here:
You have multiple users and therefore multiple threads.
Multiple threads can deadlock each other writing the files.
You have multiple appdomains and therefore multiple processes.
Multiple processes can lock each other out.
Opening and locking files
File.Open has a few flags for locking. You can basically lock files exclusively per process, which is a good idea in this case. A two-step approach with Exists and Open won't help, because in between, another worker process might do something. Basically, the idea is to call Open with write-exclusive access and, if it fails, try again with another filename, as in the sketch below.
This basically solves the issue with multiple processes.
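A minimal sketch of that idea (the helper name and the numbered-sibling fallback scheme are illustrative):

// Try to open the log exclusively; if another process holds it,
// fall back to a numbered sibling file.
static FileStream OpenLogExclusive(string basePath)
{
    for (int i = 0; ; i++)
    {
        string path = i == 0 ? basePath : basePath + "." + i;
        try
        {
            // FileShare.None = write-exclusive: no other process or thread
            // can open the file while this stream is alive.
            return new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None);
        }
        catch (IOException)
        {
            // Locked by another worker process; try the next filename.
        }
    }
}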
Writing from multiple threads
File access is single threaded. Instead of writing your stuff to a file, you might want to use a separate thread to do the file access, and multiple threads that tell the thing to write.
If you have more log requests than you can handle, you're in the wrong zone either way. In that case, the best way to handle it for logging IMO is to simply drop the data. In other words, make the logger somewhat lossy to make life better for your users. You can use the queue for that as well.
I usually use a ConcurrentQueue for this and a separate thread that works away all the logged data.
This is basically how to do this:
// Starts the worker thread that gets rid of the queue:
internal void Start()
{
    loggingWorker = new Thread(LogHandler)
    {
        Name = "Logging worker thread",
        IsBackground = true,
        Priority = ThreadPriority.BelowNormal
    };
    loggingWorker.Start();
}
We also need something to do the actual work and some variables that are shared:
private Thread loggingWorker = null;
private int loggingWorkerState = 0;
private ManualResetEventSlim waiter = new ManualResetEventSlim();
private ConcurrentQueue<Tuple<LogMessageHandler, string>> queue =
    new ConcurrentQueue<Tuple<LogMessageHandler, string>>();

private void LogHandler(object o)
{
    Interlocked.Exchange(ref loggingWorkerState, 1);
    while (Interlocked.CompareExchange(ref loggingWorkerState, 1, 1) == 1)
    {
        waiter.Wait(TimeSpan.FromSeconds(10.0));
        waiter.Reset();

        Tuple<LogMessageHandler, string> item;
        while (queue.TryDequeue(out item))
        {
            writeToFile(item.Item1, item.Item2);
        }
    }
}
Basically this code enables you to work away all the items from a single thread using a queue that's shared across threads. Note that ConcurrentQueue doesn't use locks for TryDequeue, so clients won't feel any pain because of this.
Last thing that's needed is to add stuff to the queue. That's the easy part:
public void Add(LogMessageHandler l, string msg)
{
    if (queue.Count < MaxLogQueueSize)
    {
        queue.Enqueue(new Tuple<LogMessageHandler, string>(l, msg));
        waiter.Set();
    }
}
This code will be called from multiple threads. It's not 100% correct because Count and Enqueue don't necessarily have to be called in a consistent way - but for our intents and purposes it's good enough. It also doesn't lock in the Enqueue and the waiter will ensure that the stuff is removed by the other thread.
Wrap all this in a singleton pattern, add some more logic to it, and your problem should be solved.
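The singleton shell could be as small as this (a sketch; AsyncLogger is a hypothetical name wrapping the Start/Add members shown above):

internal sealed class AsyncLogger
{
    // Lazy<T> gives thread-safe, on-demand construction.
    private static readonly Lazy<AsyncLogger> instance =
        new Lazy<AsyncLogger>(() =>
        {
            var logger = new AsyncLogger();
            logger.Start(); // spin up the worker thread exactly once
            return logger;
        });

    public static AsyncLogger Instance => instance.Value;

    private AsyncLogger() { }

    // Start(), Add(...) and the worker fields shown above live here.
}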
That can be problematic, since every client request is handled by a new thread by default anyway. You need some "root" object that is known across the project (I don't think you can achieve this with a static class), so you can lock on it before you access the log file. However, note that this will basically serialize the requests, and it will probably have a very bad effect on performance.
No, multi-threading does not solve your problem. How are multiple threads supposed to write to the same file at the same time? You would need to take care of data consistency, and I don't think that's the actual problem here.
What you're looking for is asynchronous programming. The reason your GUI becomes unresponsive is that it waits for the tasks to complete. If you know the logger is your bottleneck, then use async to your advantage. Fire the log method and forget about the outcome; just write the file.
Actually, I don't really think your logger is the problem. Are you sure there is no other logic that blocks you?
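Taken literally, the fire-and-forget suggestion could look like this sketch, built on the question's logging method (it keeps the request thread from waiting on disk I/O, but it does not by itself remove contention on the file):

public static void LoggingFireAndForget(string message)
{
    // Hand the write to the thread pool and return immediately;
    // the request thread no longer waits for the disk.
    Task.Run(() => logging(message));
}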

Managing a long-running data-processing task using threaded queues

I have a database-synchronisation task that takes some time to process, as there are in the region of 120k leaf records, but they are remote and relatively slow to access.
Currently, my app does a fairly naive process of
Get list of all the local Contacts
For each local contact, get all the related data
Then get the matching remote contact
Compare the two and do stuff to bring them in sync
Step 1 returns data before it's finished, and step 4 doesn't involve comparisons between different contacts in the same set.
What I was hoping to do was use some sort of queue construct and start populating it in step 1, then immediately move on to step 2 and start processing items as they come in, using multiple threads.
The process then becomes:
Start populating the queue with contacts
While there are items in the queue
Start a thread and:
Take the front contact from the queue
Fetch the remote contact
Compare them
Perform the required updates
Am I correct in the assumption that I can create a new ConcurrentQueue, start populating it, then loop over it as I might a single-threaded simple collection?
(I've not put in any error-checking or the actual threading, to keep the example simple)
class Program
{
    static void Main(string[] args)
    {
        Processor p = new Processor();
        p.Process();
    }
}

class Processor
{
    bool FetchComplete = false;
    ConcurrentQueue<Contact> q = new ConcurrentQueue<Contact>();

    public void Process()
    {
        this.PopulateQueue(); // this will be fired off using QueueUserWorkItem, for example

        while (FetchComplete == false)
        {
            if (q.Count > 0)
            {
                Contact contact;
                q.TryDequeue(out contact);
                ProcessContact(contact); // this will also be in QueueUserWorkItem
            }
        }
    }

    // a long-running process that fills the queue with Contacts
    private void PopulateQueue()
    {
        this.FetchComplete = false;
        // foreach contact in database
        Contact contact = new Contact(); // contact will come from DB
        this.q.Enqueue(contact);
        // end foreach
        this.FetchComplete = true;
    }

    private void ProcessContact(Contact contact)
    {
        // do magic with contact
    }
}
You might be better off using BlockingCollection instead of ConcurrentQueue. The reason is that the former will block the thread calling Take until an item appears in the queue. This is useful when the thread processing the Contact instances clears out the queue before the fetching thread has retrieved them all.
In general, your strategy is pretty solid. I use it all the time. It is often referred to as the producer-consumer pattern. When there are more than 2 stages involved in the processing, it is called the pipeline pattern. In that case you would have 2 or more queues instead of the typical one. You can imagine scenarios where each stage forwards the work item on to the next stage via another queue.
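To make that concrete, here is a minimal sketch of the producer-consumer shape with BlockingCollection (FetchLocalContacts is a hypothetical stand-in for step 1; Contact and ProcessContact come from the question):

// Bound the queue so a fast producer can't outrun slow consumers.
var queue = new BlockingCollection<Contact>(boundedCapacity: 1000);

// Producer: step 1 fills the queue, then signals that adding is done.
var producer = Task.Run(() =>
{
    foreach (var contact in FetchLocalContacts()) // reads from the DB
        queue.Add(contact);
    queue.CompleteAdding(); // replaces the FetchComplete flag
});

// Consumers: GetConsumingEnumerable blocks until an item arrives and
// ends cleanly once adding completes and the queue drains.
var consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
{
    foreach (var contact in queue.GetConsumingEnumerable())
        ProcessContact(contact);
})).ToArray();

Task.WaitAll(consumers);
producer.Wait();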
