Keys and Integrity - Identity Value Types : Hopp Support

This is a detailed topic in our support portal in the Using migFx series and assumes that you have some prior knowledge or experience using migFx. If you want to know more about hopp tech and migFx, our comprehensive solution for complex data migration, please visit our website hopp.tech.

Regardless of the implementation, data in the Target System will inevitably be tied together with foreign key relationships. In other words, data instances have unique ids (the primary key) and other data instances relate by storing the key of the instance to relate to (the foreign key).

As is indeed best practice, the key values are normally internal to the Target System and do not carry any information. Furthermore, the responsibility for the generation and assignment of the primary keys resides inside the Target System, and it is the Target System that guarantees the necessary uniqueness of the primary keys.

This article will give you an impression of how the Identity Value types of Hopp provide valuable assistance in handling the integrity of primary and foreign keys in data migration.

Loading Target System via API

Loading via an API typically means that the generation and assignment of the primary keys are done when the API is called for a data instance, and the API will return the value of the primary key assigned to this instance to be used as a foreign key for all related data instances.

For instance, when calling the Target System API to load a Project, the API assigns and returns a primary key for the Project, and this key has to be included as a foreign key when calling the API for all the Tasks in the Project.

This is a very common use case, especially when migrating into commercially available 'off-the-shelf' products like Oracle EBS, SAP, Salesforce, Microsoft Dynamics etc.

One normal draw-back of loading via an API is that it is quite often not very performant and while the actual transformation part of the data migration may be fast, loading via the API may take a long time.

Direct load to Target System

For larger migrations into bespoke Target Systems under the control of the Target Business, it is not unusual to load the migrated data directly into the underlying database tables. Typically, this allows the load to utilize very performant bulk insert facilities of the underlying database management system.

However, for this at all to be possible, the mechanism used by the Target System to generate and assign keys must be externalized in a way that makes it possible for the correct primary keys to be generated and assigned by the data migration itself with zero risk of colliding with keys generated by the live Target System.

We have come across a few different ways to accomplish this (there are maybe more out there):

The generation of primary keys is an externalized function in the Target System that can be called from the data migration
Intervals of primary keys can be reserved for the data migration (and thus not used by the live Target System) for the duration of the data migration
The nature of the primary keys is such that keys can be created directly by the data migration outside of the Target System without risking collisions with keys generated by the Target System itself. For instance, GUIDs (Globally Unique IDentifiers)

Identity Value Types

Whether you are loading via an API or loading directly, you must ensure complete referential integrity between Primary and Foreign Keys in the Target Data produced by Hopp.

In Hopp, the Value Types Identity and Guid (available from version 1.3) in Studio are specifically designed to help you accomplish this. The two are quite similar, the only difference is that the Value Type Identity returns a 64 bit integer while the Value Type Guid returns a 128 bit (36 character) GUID.

The Identity Value Types are only available in Studio when assigning values to Target Fields. At run time, the Target Engine has a complete track of the key value of the current Business Object (as expressed by the Interface Fields marked as Key on the Business Object and its ancestors) plus the mapping id of the current Target Object and current Target Field. The Target Engine passes this combined migration key on to the Director Runtime and asks for a key value (either integer or GUID). If the Director has already stored a value for this migration key, this value will be returned. If not, the Director will generate, store and return a new value.

This mechanism ensures that

An Identity is completely and globally unique for this exact Target Field on this exact Target Object Instance
- If an Identity Value is assigned to another Target Field on the same Target Object instance, it will be different from any other identity on the same Target Object instance
- If an Identity Value is assigned to the same Target Field on a different instance of the same Target Object, it will again be different
Identity values remain unaltered across iterations. If the same Business Object is re-iterated, all Identity Values assigned to Target Fields will be identical to the values from the last iteration

In other words, Hopp ensures that a given Identity Value:

is only assigned to 1 Target Field in the entire migration
remains the same for this Target Field no matter how many times you iterate

Integrity

Once a Target Field has been assigned an Identity value, it is simply business as usual to use the assigned value of this field to assign values to other Target Fields that act as Foreign Keys on their respective Target Objects:

If from within the same root Business Object hierarchy, use the Value Type Target to pick up the Identity Value assigned to the original Target Field
If from another root Business Object, use a Relationship Value to look up the other root Business Object and pick up the Identity Value assigned to the original Target Field

Example

As an example, consider a simple Business Object Account with a child Business Object Statement. The Business Object Statement has a relationship to a separate Business Object Customer, to identify the Customer that receives the Statement. For the sake of this example, in the Target System there are two Foreign Key relationships on the Statement:

A foreign key to the Account the Statement belongs to
A foreign key to the Customer receiving the Statement

In the figure above, the Business Object Statement is a child of the Account root Business Object. In addition, it has a relationship to the separate root Business Object Customer. The Target Fields AccountId on Account and the CustomerId on Customer are both mapped using an Identity Value. As described above, these 2 Target Fields will each be assigned a globally unique, repeatable value when the migration executes.

The foreign key relationships on Statement are established like so:

The AccountId is a foreign key to Account and is assigned a Target Value that picks up the value that was assigned to the Account AccountId
The ReceiverId is a foreign key to Customer and is assigned a Relationship Value picking up the Identity Value that was assigned to the Customer CustomerId.

This is how the corresponding mapping looks in Studio:

Iteration

Of course, Hopp automatically resolves the resulting dependency between Customer and Account and ensures that the Customers are migrated (and the CustomerId assigned its value) before the dependent Accounts (using the value assigned to CustomerId).

But imagine a large migration, where the migration of all Accounts may take considerable time. Let's say that an iteration of all Customers is desired to solve some minor issue internal to the Customers (for instance that Surnames are not starting with an upper case letter). Indeed a minor correction, and it would be wasteful to be forced to iterate all dependent Accounts.

Luckily, since the values assigned to the CustomerIds remain the same across all the iterations, the ReceiverId foreign keys on the Statements within the Accounts are still valid even though the Customers are iterated. So there is no need to iterate the Accounts.

In real-life migrations, with multi-level dependencies (CreditCards depends on Accounts depend on Customers), this cross-iteration persistence of the assigned Identity values potentially saves time and avoids the risk of unnecessary iterations.

Load and Key translation

As described above, the Identity Value Types provide an easy-to-use and robust way of maintaining complete referential integrity in the migrated data.

If you are so lucky, that your migration loads by inserting directly in the Target System, and your Target System uses GUID type primary key values that will accept Guid Identity Values from Hopp, you are home free. The Guid Identity values assigned by migFx can simply go directly into the Target System with no risk of collision on primary key values.

On the other hand, if the Target System is in control of the final assignment of primary key values via an API, the Identity Values produced by Hopp are not directly usable, and - as part of your delivery extension - it will be necessary to maintain a lookup between the Identity Values produced by migFx and the primary key values generated by the Target System.

However, due to the uniqueness of the Identity Values globally within Hopp, such a lookup will always be quite simple, with the migFx Identity value on one side and the Target Primary Key Value on the other.

When a data instance is loaded via the API and a primary key value is returned, this primary key value must be stored in the lookup along with the corresponding Hopp Identity value.

Whenever the same Hopp Identity value is encountered as a foreign key in the migrated data, it is simple to look up the corresponding primary key value and send it to the Target API.