Synchronous and Scheduled Flows


This is a detailed topic in our support portal in the Using Hopp series and assumes that you have some prior knowledge or experience using Hopp. 


The pipelines described in the previous article can be combined to create job flows to execute part of the migration or indeed the full migration.


These flows can be executed synchronously, meaning that the jobs in the pipelines are created, submitted, and awaited as the script runs. If a synchronous flow terminates prematurely due to a faulted or cancelled job, the entire flow has to be rerun.


Instead of submitting the jobs in the pipelines as they are created, the alternative is to create all the jobs and pipelines in a so-called schedule and then submit the entire schedule for execution. The schedule retains the state of the jobs as they are submitted. If the schedule terminates due to a faulted or cancelled job, it can be re-submitted and will simply pick up where it left off.


Synchronous flows

The first alternative is simpler and primarily meant for use in an orchestration application, where each step in the orchestrated flow contains a minor part of the entire flow. If a job in a step fails, then the entire step will be rerun by the orchestration application.


Scheduled flows

The second alternative is primarily meant for interactive users. In order for a scheduled flow to be re-submitted, the state of the schedule has to be retained. This is easily achieved in an interactive PowerShell session, whereas few (if any) orchestration applications allow a step to save state between invocations.


Building a synchronous flow 

In short, a flow consists of parameterized jobs that are assembled into batches, and these batches are then submitted and awaited. 


A synchronous flow is characterized by the fact that the end of the pipeline is Submit-HpJob | Wait-HpJob, meaning that the jobs are submitted immediately and that further script execution will block until all jobs have stopped running.


Setup Engines

Here's a sample of the PowerShell script to include in an orchestration step to create a batch to run the setup jobs of both the source and target engine and await their completion:


exit @(
    (New-HpSetupSourceJob),
    (New-HpSetupTargetJob)
) | New-HpBatch | Submit-HpJob | Wait-HpJob

Here's a break-down of what is going on:

  1. Both calls to the New-Hp*Job cmdlets are enclosed in parentheses. This is important, as it causes PowerShell to execute the cmdlets and return their result (which in both cases is a parameterized job, ready to submit)

  2. The enclosing @( ... ) notation puts the two jobs inside an array

  3. The array is piped to the New-HpBatch cmdlet which in this case does nothing (more on this in the sample below)

  4. The array is piped on to the Submit-HpJob cmdlet, which submits the 2 jobs to run concurrently in the Hopp Runtime

  5. The pipeline ends with the 2 jobs being piped to the Wait-HpJob cmdlet to wait for their completion

  6. The return code from Wait-HpJob will be the exit code of the script. If any of the jobs in the batch faulted or was cancelled, the exit code will be 1; if all jobs completed successfully, the exit code will be 0 (see the sketch below)
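For illustration, here is how an orchestration step might invoke such a script and react to its exit code. This is a minimal sketch; the script name setup-engines.ps1 is a hypothetical placeholder, not part of Hopp.

pwsh -File ./setup-engines.ps1

if ($LASTEXITCODE -ne 0) {
    # A job in the batch faulted or was cancelled, so the orchestration
    # application should rerun this entire step from scratch
    throw "Setup engines step failed (exit code $LASTEXITCODE)"
}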


Load Source Tables and Source Valuesets

Here is another sample script to load all source tables in parallel with all the source valuesets:

exit @(
    (Get-HpSourceTableList | New-HpLoadSourceTableJob),
    (Get-HpValuesetList -engine "Source" | New-HpLoadValuesetJob)
) | New-HpBatch | Submit-HpJob | Wait-HpJob

Pipeline explained:

  1. The output from the cmdlet Get-HpSourceTableList is piped to the cmdlet New-HpLoadSourceTableJob
    1. New-HpLoadSourceTableJob will output an array of parameterized jobs, one for each source table

  2. The cmdlet Get-HpValuesetList is called with the -engine option "Source" and outputs a list of the dynamic valuesets in the source engine, which is piped to the New-HpLoadValuesetJob cmdlet
    1. New-HpLoadValuesetJob will output just 1 job to load all the valuesets

  3. The outputs from New-HpLoadSourceTableJob and New-HpLoadValuesetJob are put inside the @(...) array that is piped to the New-HpBatch cmdlet

  4. The input to New-HpBatch is thus an array with just 2 elements: the first element is itself an array of all the load source table jobs, and the second is the single job that loads all the valuesets
    1. New-HpBatch consolidates the two elements into one, unified array containing all the jobs

  5. The array containing all the jobs is then piped to Submit-HpJob in order to be submitted and run concurrently in the Hopp Runtime

  6. And finally, the jobs are awaited by Wait-HpJob and the return code used as the exit code for the script
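The same pipeline pattern also makes it easy to narrow a batch before it is submitted. As a minimal sketch, here is a variation that loads only a subset of the source tables; it assumes the objects returned by Get-HpSourceTableList expose a Name property, which is an assumption made for this example:

exit @(
    (Get-HpSourceTableList |
        Where-Object { $_.Name -like 'CUST*' } |  # assumed Name property on the table objects
        New-HpLoadSourceTableJob)
) | New-HpBatch | Submit-HpJob | Wait-HpJob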


Building a scheduled flow

While the synchronous flow above submits and awaits the jobs as they are created, the scheduled flow first creates all the jobs and stores them in a schedule. The complete schedule is then submitted.


Here are the same 2 samples as in the synchronous flow above, but now as a scheduled flow. You will notice that the main difference is the absence of calls to the Submit-HpJob and Wait-HpJob cmdlets.


$setupEngines = @(
  (New-HpSetupSourceJob),
  (New-HpSetupTargetJob)
) | New-HpBatch

$loadSourceTablesAndValuesets = @(
  (Get-HpSourceTableList | New-HpLoadSourceTableJob),
  (Get-HpValuesetList -engine "Source" | New-HpLoadValuesetJob)
) | New-HpBatch

$schedule = @(
  $setupEngines, 
  $loadSourceTablesAndValuesets
)

Submit-HpSchedule $schedule

A run-through:

  1. The batch containing the parameterized jobs to set up the source and target engines is stored in the variable $setupEngines

  2. Likewise, the batch containing the flattened list of table load jobs plus the valueset job ends up in the variable $loadSourceTablesAndValuesets

  3. These 2 batches of parameterized jobs are then combined into an array and stored in the $schedule variable

  4. Finally, the $schedule is handed to the Submit-HpSchedule cmdlet, and it is this cmdlet that handles submitting and awaiting the batches and jobs


Internally, the Submit-HpSchedule cmdlet executes the calls to Submit-HpJob and Wait-HpJob on the batches in much the same way as in the synchronous flow above.


In addition, the cmdlet stores information back into the $schedule when jobs are submitted. This enables the cmdlet to be executed again on the same $schedule and use it to query the Hopp Runtime for the state of previously submitted jobs. So, if the schedule is interrupted because of a faulted/cancelled job, it can simply be resumed by a new call. This will continue the $schedule from the previous point of interruption, restarting faulted/cancelled jobs as required.
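In practice, resuming an interrupted schedule is simply a matter of calling the cmdlet again with the same variable:

# First run: stops if a job faults or is cancelled
Submit-HpSchedule $schedule

# After addressing the cause of the failure, resubmit the same $schedule.
# Completed jobs are skipped; faulted or cancelled jobs are restarted
Submit-HpSchedule $schedule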


Summary

In summary, the synchronous flow is ideal for use inside a step of a 3rd party orchestration application. In this use-case, the orchestration application is in charge of step sequence and dependencies. If a step fails in an orchestrator, it typically means that the step has to be rerun from scratch, so it makes sense to keep the steps small and atomic.


On the other hand, the scheduled flow is there to help the individual who finds themselves again and again sitting in front of the Portal Operations job list, waiting for jobs to finish just to launch the next batch of jobs when ready.


The next articles contain guidance on building both synchronous and scheduled flows for an entire migration.

