Delete Azure Data Factory pipeline after successful run - c#

Right now I have a WebAPI application that, after receiving a request, dynamically creates a specific pipeline in C# to do a specific task.
However, because the number of pipelines and datasets is limited to 5000, the application's requests will eventually hit this limit. I'm thinking about a way to automatically delete the pipeline and its datasets, but I'm not sure how. Manual deletion is out of the question, unfortunately.
Is there maybe a way to execute a "self-destruction" of a pipeline after completion? Or maybe a trigger that removes old pipelines periodically?

There's no built-in mechanism to clean up all the resources directly in ADF; however, you could use an Azure Function time trigger to do it on a schedule. Please refer to my thoughts:
1. Create a time-triggered Azure Function (for example, triggered every day) to query pipeline runs with the REST API or SDK.
2. Loop over the results and filter on Status == Succeeded and runEnd < today to get the list of pipeline names.
3. Delete them one by one from the name list using the Delete API (REST API: https://learn.microsoft.com/en-us/rest/api/datafactory/pipelines/delete).
4. Deleting the datasets is a little more trouble. Although you can get the pipeline name, the activities in each pipeline are not necessarily the same, so they reference different datasets. For example, if it is a Copy activity, you can get the referenceName in its inputs and outputs arrays. If it is feasible to clear all datasets and have them re-created, you can simply use the List Datasets API and delete them all.
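For illustration, here is a minimal sketch of such a function using the ADF .NET management SDK (Microsoft.Azure.Management.DataFactory). The resource group, factory, credential placeholders, and the 30-day query window are assumptions to replace with your own values:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.Rest.Azure.Authentication;

public static class PipelineCleanup
{
    // Runs every day at 01:00 UTC (CRON fields: sec min hour day month day-of-week).
    [FunctionName("PipelineCleanup")]
    public static async Task Run([TimerTrigger("0 0 1 * * *")] TimerInfo timer, ILogger log)
    {
        var creds = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenantId>", "<clientId>", "<clientSecret>");
        var client = new DataFactoryManagementClient(creds) { SubscriptionId = "<subscriptionId>" };

        // Query runs from the last 30 days that finished before today.
        var filter = new RunFilterParameters(
            lastUpdatedAfter: DateTime.UtcNow.AddDays(-30),
            lastUpdatedBefore: DateTime.UtcNow.Date);
        var runs = client.PipelineRuns.QueryByFactory("<resourceGroup>", "<factoryName>", filter);

        // Keep only pipelines whose runs succeeded and ended before today.
        var names = runs.Value
            .Where(r => r.Status == "Succeeded" && r.RunEnd < DateTime.UtcNow.Date)
            .Select(r => r.PipelineName)
            .Distinct();

        foreach (var name in names)
        {
            log.LogInformation($"Deleting pipeline {name}");
            client.Pipelines.Delete("<resourceGroup>", "<factoryName>", name);
        }
    }
}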

Related

Scheduled task in Azure

I am using Azure Cosmos DB, and I have several collections.
Now, I want to create a scheduled task which will retrieve all the data from the "CurrentDay" collection, do some calculations, and store the result in another collection, "datewise". Similarly, I will need to retrieve all the data from "datewise" and, based on some calculations, store the data in "monthwise" and then in "yearly".
I looked at the Scheduler option in the Azure portal and tried creating a scheduler, but it seems I don't have sufficient permissions/licensing to use that feature. Basically, I haven't used it before, so I am not sure it would work.
Had it been SQL Server, I could have done this with custom code in C#. The only option I currently have is to use REST API calls to fetch the data, calculate in C#, and post the results back to Azure Cosmos DB. Is there any better way of doing this?
Please let me know if I can provide any details.
I think using a scheduled task (on Azure) and getting the data via the REST API is probably what you want to do. There are several reasons why this isn't as bad as you might think:
Your server and your database are right next to each other in the data centre, so you don't need to pay for data transfer.
Latency is very low and bandwidth is very high, so you'll be limited by the database performance more than anything else (you can run parallel tasks in your scheduled task to make sure of this).
The REST API has a very well supported official C# client library.
Of course it depends on the scale of data we're talking about as to how you should provision your scheduled task.
I'd encapsulate your logic in an Azure WebJob method and mark the method with a TimerTrigger. The TimerTrigger will call your method on the schedule that you specify. This has fewer moving parts: if you were to go the Scheduler route, you would still have to have the Scheduler call some endpoint in order to perform the work. Packaging up your logic and schedule in a WebJob simplifies things a bit.
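For example, here is a minimal sketch of such a WebJob function; the method name, the CRON expression, and the aggregation comment are illustrative, and TimerTrigger assumes the Microsoft.Azure.WebJobs.Extensions timer support is registered:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class DailyAggregation
{
    // Runs every day at 00:30 UTC (CRON fields: sec min hour day month day-of-week).
    public static void AggregateCurrentDay(
        [TimerTrigger("0 30 0 * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation("Starting daily aggregation at {time}", DateTime.UtcNow);
        // Read documents from the "CurrentDay" collection with the Cosmos DB client,
        // perform the calculations, and write the results to "datewise".
    }
}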
On a side note, if all data lived in the same collection, I'd suggest writing a stored procedure to perform these calculations. But alas, stored procedures in Cosmos are bounded at the collection level.

NServiceBus batching a long running job

I'm working on a project that is using NSB. I really like it, but it's my first NSB solution, so I'm a bit of a noob. We have a job that needs to run every day that processes members. It is not expected to take long, as the work is simple, but it will potentially affect thousands of members, and in the future perhaps tens or hundreds of thousands.
Having it all happen in a single handler in one go feels wrong, but having a handler discover affected members and then fire separate events for each one sounds a bit too much in the opposite direction. I can think of a few other methods of doing it, but was wondering if there is an idiomatic way of dealing with this in NSB?
Edit to clarify: I'm using Schedule to send a command at 3am; the handler for that will query the SQL DB for a list of members who need to be processed. Processing will involve updating/inserting one or two rows per member. My question is about how to process that potentially large list of members within NSB.
Edit part 2: the job now needs to run monthly, not daily.
I would not use a saga for this. Sagas should be lightweight and are designed for orchestration rather than performing work. They are started by messages rather than scheduled.
You can achieve your ends by using the built-in scheduler. I've not used it, but it looks simple enough.
You could do something like:
configure a command message (e.g. StartJob) to be sent every day at 03:00.
StartJob handler will then query the DB to get the work.
Then, depending on your requirements:
If you need all the work done at once, create a single command with all the work in it, and send it to another endpoint for processing. If you use transactional MSMQ then this will succeed or fail as a unit.
If you don't care whether only some of the work succeeds, then create a command per unit of work and dispatch them to an endpoint for processing (see the sketch after this list). This has the benefit that you can scale out using the distributor if you need to.
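A rough sketch of the StartJob handler under that second option; the message types and the IMemberRepository abstraction are illustrative assumptions, not from the original post:

using System.Collections.Generic;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical messages for this example.
public class StartJob : ICommand { }
public class ProcessMember : ICommand
{
    public int MemberId { get; set; }
}

// Hypothetical data access abstraction.
public interface IMemberRepository
{
    Task<IEnumerable<int>> GetMembersToProcessAsync();
}

public class StartJobHandler : IHandleMessages<StartJob>
{
    private readonly IMemberRepository repository;

    public StartJobHandler(IMemberRepository repository) => this.repository = repository;

    public async Task Handle(StartJob message, IMessageHandlerContext context)
    {
        // Query the SQL DB for the members that need processing...
        foreach (var memberId in await repository.GetMembersToProcessAsync())
        {
            // ...and dispatch one small command per member, so each unit of
            // work succeeds, fails, and retries independently.
            await context.Send(new ProcessMember { MemberId = memberId });
        }
    }
}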
I'm working on a project that is using NSB... We have a job that needs to run every day...
Although you can use NSB for this kind of work, it's not really something I would do. There are many other approaches you could use. A SQL job or cron job would be the obvious one (and a hell of a lot quicker to develop, more performant, and simpler).
Even though it does support such use cases, NServiceBus is not really designed for scheduled batch processing. I would seriously question whether you should even use NSB for this task.
You mention a running process, and that sounds like a job for a saga (see https://docs.particular.net/nservicebus/sagas/). You can use saga data to persist checkpoints in different storage mediums (SQL, Mongo, etc.). But yes, having something long-running that dispatches messages from the saga to individual handlers is definitely something I would do as well.
Something else to consider is message deferral (timeout managers). So, for example, let's say you process x number of users but want to run the job again later. NServiceBus allows you to defer messages for a defined period, and the message will sit in the queue waiting to be dispatched.
Any more info, just shout, and I can update my answer.
A real NSB solution would be to get rid of the "batch" job that processes all those records in one run and find out what action(s) would cause each of these records to need processing after all.
When such an action is performed, you should publish an NSB event, and refactor the batch job into an NSB handler that subscribes to these events, so it can do the processing the moment the action is performed, running in parallel with the rest of your process.
This way there would be no need anymore for a scheduled 'start' message at 3 am, because all the work would already have been done.
Here is how I might model this idiomatically with NServiceBus: there might be a saga called PointsExpirationPolicy, which would be initiated at the moment that any points are awarded to a user. The saga would store the user ID, and number of points awarded, and also calculate the date/time the points should expire. Then it would request a timeout callback message to be sent at the date/time these points should expire. When that callback arrives, the saga sends a command to expire that number of points from the user's account. This would also give you some flexibility around the logic of exactly when and how points expire, and would eliminate the whole batch process.
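A skeleton of that saga might look something like the following; the message types, property names, and per-user correlation are my own illustrative assumptions layered on the answer's description:

using System;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical messages for this example.
public class PointsAwarded : IEvent
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
    public DateTime ExpiresAt { get; set; }
}
public class ExpirePoints : ICommand
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
}
public class PointsExpired { } // the timeout callback message

public class PointsExpirationData : ContainSagaData
{
    public Guid UserId { get; set; }
    public int Points { get; set; }
}

public class PointsExpirationPolicy :
    Saga<PointsExpirationData>,
    IAmStartedByMessages<PointsAwarded>,
    IHandleTimeouts<PointsExpired>
{
    protected override void ConfigureHowToFindSaga(
        SagaPropertyMapper<PointsExpirationData> mapper)
    {
        // Correlating on UserId alone means one saga per user; a real
        // design would correlate per award so each batch of points
        // expires on its own schedule.
        mapper.ConfigureMapping<PointsAwarded>(m => m.UserId).ToSaga(s => s.UserId);
    }

    public async Task Handle(PointsAwarded message, IMessageHandlerContext context)
    {
        Data.UserId = message.UserId;
        Data.Points = message.Points;
        // Ask NServiceBus to call us back when the points should expire.
        await RequestTimeout<PointsExpired>(context, message.ExpiresAt);
    }

    public async Task Timeout(PointsExpired state, IMessageHandlerContext context)
    {
        // Expire the points and end the saga.
        await context.Send(new ExpirePoints { UserId = Data.UserId, Points = Data.Points });
        MarkAsComplete();
    }
}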

Semaphore vs. SQL-Job when trying to remove expired SQL records

I'm using ASP.NET and C# to build a 'social network' web site.
When adding posts, there are two SQL columns that I fill: the date and time the post was added, and the date and time the post expires (it varies between kinds of posts).
I want some process that constantly checks the SQL database and removes posts whose expiry date and time have passed.
I've searched for a solution, and I understand that the two most suitable solutions are semaphores and SQL jobs (correct me if I'm wrong).
I hope you can give me a hint about what the best solution is; if it's not one of the two, what is it, and some info about the best solution as well.
Thanks!
Just hide posts that have expired based on the current time. For example
WHERE ExpiryDateTime > SYSUTCDATETIME()
Then you can clean old posts in the background at any frequency you like. Create a Windows Task Scheduler task that calls a special URL of your website. That URL should perform a database cleanup. This is a very simple and clearly correct solution.
If you don't like Windows Task Scheduler (and who really does like it...) you can use a scheduler lib such as Hangfire or Quartz.Net.
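With Hangfire, for instance, the recurring cleanup can be registered in a few lines; the job id and the DeleteExpired body are placeholders for your own cleanup logic:

using Hangfire;

public static class PostCleanup
{
    // Call once at application startup, after Hangfire itself is configured.
    public static void Register()
    {
        RecurringJob.AddOrUpdate(
            "delete-expired-posts",   // stable id so redeploys update, not duplicate, the job
            () => DeleteExpired(),
            Cron.Hourly());
    }

    public static void DeleteExpired()
    {
        // Your own data access here, e.g. execute:
        // DELETE FROM Posts WHERE ExpiryDateTime <= SYSUTCDATETIME()
    }
}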
Neither.
A semaphore is a way of controlling resource use between multiple threads
An SQL job is a somewhat blunt tool designed to allow db admins to schedule tasks
I would create a separate program, 'oldDataDeleter'. Code up your logic about what you want to delete or archive after how much time, and then apply that logic in an atomic way. I would run this as a Windows service with a timer, or as a console app run as a scheduled task.
The key is to ensure that the program can run concurrently with itself and only does small atomic changes on a small chunk of data at a time.
You can then fire up multiple instances of this program running at a high frequency.
This ensures you don't lock your database with large 'delete all from table X join table Y' statements, and that your data is constantly trimmed rather than building up into a big overnight job.
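A minimal sketch of such a small, atomic, repeatable delete; the table and column names are assumed from the question, and the TOP clause keeps each transaction short so concurrent instances don't block each other:

using System.Data.SqlClient;

public static class OldDataDeleter
{
    // Deletes expired posts in small batches until none remain.
    public static void TrimExpiredPosts(string connectionString)
    {
        const string sql = @"
            DELETE TOP (500) FROM Posts
            WHERE ExpiryDateTime <= SYSUTCDATETIME();";

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            int deleted;
            do
            {
                using (var cmd = new SqlCommand(sql, conn))
                {
                    // Each batch commits on its own, so locks are held only briefly.
                    deleted = cmd.ExecuteNonQuery();
                }
            } while (deleted > 0);
        }
    }
}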
Edit for 'all code must be in a single website project' restriction
There is another solution which in some ways is better, and it works with your (slightly odd and very much not best-practice) requirement.
That is to delete old entries whenever you make an insertion. So when your code adds a post ("insert into posts...") it also runs the delete ("delete from posts where...").
This ensures that your program is self-maintaining. However, you incur a performance hit when adding posts, and a large social media site would be continually adding posts and needs to scale with its users, so I don't recommend this solution.
However, for small projects which don't need to scale, it is neater.

Hints and tips for a Windows service I am creating in C# and Quartz.NET

I have a project ongoing at the moment, which is to create a Windows service that essentially moves files around multiple paths. A job may be to, every 60 seconds, get all files matching a regular expression from an FTP server and transfer them to a network path, and so on. These jobs are stored in a SQL database.
Currently, the service takes the form of a console application, for ease of development. Jobs are added using an ASP.NET page, and can be edited using another ASP.NET page.
I have some issues though, some relating to Quartz.NET and some general issues.
Quartz.NET:
1: This is the biggest issue I have. Seeing as I'm developing the application as a console application for the time being, I'm having to create a new Quartz.NET scheduler in all my files/pages. This is causing multiple confusing errors, and I just don't know how to instantiate the scheduler in one global file and access it from my ASP.NET pages (so I can get details into a grid view to edit, for example).
2: My manager suggested I could look into having multiple 'configurations' inside Quartz.NET. By this, I mean that at any given time an administrator can change the application's configuration so that only specifically chosen applications run. What would be the easiest way of doing this in Quartz.NET?
General:
1: One thing that's crucial in this application is assurance that a file has been moved and is actually on the target path (after the move, the original file is deleted, so it would be disastrous if the original were deleted when it hadn't actually been copied!). I also need to make sure that the file contents match on the initial path and the target path, for peace of mind that what has been copied is right. I'm currently doing this by MD5-hashing the initial file, copying the file, and, before deleting it, making sure the file exists on the server. Then I hash the file on the server and make sure the hashes match (a condensed sketch of this flow follows after these questions). Is there a simpler way of doing this? I'm concerned that the hashing may put strain on the system.
2: This relates to the above question, but isn't as important, as not even my manager has any idea how I'd do this, but I'd love to implement it. An issue would arise if a job executes while a file is being written to, which may mean a half-written file is transferred, making it totally useless; it would also be bad because the initial file would be destroyed while it's still being written to! Is there a way of checking for this?
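For reference, the verify-then-delete flow described in question 1 can be condensed to something like this; the paths and the exception choice are placeholders:

using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class SafeMover
{
    public static void MoveVerified(string sourcePath, string targetPath)
    {
        byte[] sourceHash = HashFile(sourcePath);
        File.Copy(sourcePath, targetPath);

        // Only delete the original once the copy exists and its hash matches.
        if (File.Exists(targetPath) && HashFile(targetPath).SequenceEqual(sourceHash))
            File.Delete(sourcePath);
        else
            throw new IOException($"Verification failed copying {sourcePath} to {targetPath}");
    }

    private static byte[] HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
            return md5.ComputeHash(stream);
    }
}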
As you've discovered, running the Quartz scheduler inside an ASP.NET application presents many problems. Check out Marko Lahma's response to your question about running the scheduler inside of an ASP.NET web app:
Quartz.Net scheduler works locally but not on remote host
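On the 'one global scheduler' point specifically, a common pattern is to hold a single shared instance behind a static provider; a rough sketch using the pre-3.0 synchronous Quartz.NET API (in 3.x these calls return Tasks):

using Quartz;
using Quartz.Impl;

public static class SchedulerProvider
{
    private static readonly object sync = new object();
    private static IScheduler scheduler;

    // One scheduler for the whole application, created and started on first use.
    public static IScheduler Scheduler
    {
        get
        {
            lock (sync)
            {
                if (scheduler == null)
                {
                    scheduler = new StdSchedulerFactory().GetScheduler();
                    scheduler.Start();
                }
                return scheduler;
            }
        }
    }
}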
As far as preventing race conditions between your jobs (eg. trying to delete a file that hasn't actually been copied to the file system yet), what you need to implement is some sort of job-chaining:
http://quartznet.sourceforge.net/faq.html#howtochainjobs
In the past I've used the TriggerListeners and JobListeners to do something similar to what you need. Basically, you register event listeners that wait to execute certain jobs until after another job is completed. It's important that you test out those listeners, and understand what's happening when those events are fired. You can easily find yourself implementing a solution that seems to work fine in development (false positive) and then fails to work in production, without understanding how and when the scheduler does certain things with regards to asynchronous job execution.
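A bare-bones example of that listener approach, again with the pre-3.0 synchronous API; the job names are illustrative, and the idea is that the 'delete' job fires only after the 'copy' job completes without error:

using Quartz;
using Quartz.Impl.Matchers;

// Triggers a follow-up job only after the watched job completed successfully.
public class ChainAfterSuccessListener : IJobListener
{
    public string Name { get { return "ChainAfterSuccessListener"; } }

    public void JobToBeExecuted(IJobExecutionContext context) { }
    public void JobExecutionVetoed(IJobExecutionContext context) { }

    public void JobWasExecuted(IJobExecutionContext context, JobExecutionException jobException)
    {
        if (jobException == null) // the copy job finished without error
        {
            context.Scheduler.TriggerJob(new JobKey("deleteOriginalJob"));
        }
    }
}

// Registration: only listen for the copy job.
// scheduler.ListenerManager.AddJobListener(
//     new ChainAfterSuccessListener(),
//     KeyMatcher<JobKey>.KeyEquals(new JobKey("copyFileJob")));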
Good luck! Schedulers are fun!

MS Access interop - Data Import

I am working on an exe to export SQL Server data to Access. We do not want to use DTS, as we have multiple clients each exporting different views, and the overhead of setting up and maintaining the DTS packages is too much.
*Edit: This process is automated for many clients every night, so the whole process has to be kicked off and controlled within a cursor in a stored procedure. This is because the data has to be filtered per project for the export.
I have tried many ways to get data out of SQL into Access and the most promising has been using Access interop and running a
doCmd.TransferDatabase(Access.AcDataTransferType.acImport...
I have hit a problem where I am importing from views: running the import manually, it seems the view does not start returning data fast enough, so Access pops up a MessageBox dialog to say it has timed out.
I think this is happening in interop as well, but because it is hidden the method never returns!
Is there any way for me to prevent this message from popping up, or increasing the timeout of the import command?
My current plan of attack is to flatten the view into a table, then import from that table, then drop the flattened table.
Happy for any suggestions on how to tackle this problem.
Edit:
Further info on what I am doing:
We have multiple clients which each have a standard data model. One of the 'modules' is an Access exporter (sproc). It reads the views to export from a parameter table, then exports. The views are filtered by project, and an Access file is created for each project (every view has a project field).
We are running SQL 2000 and are not moving to SQL 2005 quickly; we will probably jump to 2008 in quite a few months.
We then have a module execution job which executes the configured module on each database. There are many imports/exports/other jobs that run in this module execution, and the Access exporter must be able to fit into this framework. So I need a generic SQL -> Access exporter which can be configured through our parameter framework.
Currently the sproc calls an exe I have written, and my exe opens Access via interop. I know this is bad for a server, BUT the module execution is written so only a single module executes at a time, so the procedure will never be running more than one instance at a time.
Have you tried using VBA? You have more options configuring connections, and I'm sure I've used a timeout adjustment in that context in the past.
Also, I've generally found it simplest just to query a view directly (as long as you can either connect with a nolock, or tolerate however long it takes to transfer); this might be a good reason to create the intermediate temp table.
There might also be benefit to opening Access explicitly in single-user mode for this stuff.
We've done this using ADO to connect to both source and destination data. You can set connection and command timeout values as required and read/append to each recordset.
Not particularly quick, but we were able to leave it running overnight.
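In .NET terms, that approach might look roughly like the following; the connection strings, Jet provider, and column names are assumptions, and the point is that both commands expose a CommandTimeout you can raise:

using System.Data.OleDb;
using System.Data.SqlClient;

public static class ViewExporter
{
    public static void ExportView(string viewName, string sqlConnStr, string mdbPath)
    {
        var accessConnStr = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={mdbPath}";

        using (var source = new SqlConnection(sqlConnStr))
        using (var target = new OleDbConnection(accessConnStr))
        {
            source.Open();
            target.Open();

            // Raise the timeout so a slow view doesn't abort the read.
            var read = new SqlCommand($"SELECT Id, Name FROM {viewName}", source)
            {
                CommandTimeout = 600 // seconds
            };

            using (var reader = read.ExecuteReader())
            {
                while (reader.Read())
                {
                    var insert = new OleDbCommand(
                        "INSERT INTO Export (Id, Name) VALUES (?, ?)", target)
                    {
                        CommandTimeout = 600
                    };
                    insert.Parameters.AddWithValue("?", reader["Id"]);
                    insert.Parameters.AddWithValue("?", reader["Name"]);
                    insert.ExecuteNonQuery();
                }
            }
        }
    }
}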
I have settled on a way to do this.
http://support.microsoft.com/kb/317114 describes the basic steps to start the access process.
I have made the Process a class variable instead of a local variable of the ShellGetApp method. This way, when I call the Quit function for Access, if it doesn't close for whatever reason I can kill the process explicitly.
// Ask Access to close cleanly, saving everything...
app.Quit(Access.AcQuitOption.acQuitSaveAll);
// ...then make sure it actually went away.
if (!accessProcess.HasExited)
{
    Console.WriteLine("Access did not exit after being asked nicely, killing process manually");
    accessProcess.Kill();
}
I then used a method timeout function to give the Access call a timeout. If it times out, I can kill the Access process as well (the timeout could be due to a dialog window popping up, and I do not want the process to hang forever). I got the timeout method here:
Implement C# Generic Timeout
I'm glad you have a solution that works for you. For the benefit of others reading this, I'll mention that SSIS would have been a possible solution to this problem. Note that the difference between SSIS and DTS is pretty much night and day.
It is not difficult to parameterize the export process such that, for each client, you could export a different set of views. You could loop over the lines of a text file containing the view names, or use a query against a configuration database to get the list of views. Other parameters could come from the same configuration database, on a per-client and/or per-view basis.
If necessary, there would also be the option of performing per-client pre- and post-processing, by executing a child process or package, if such is configured.
