web analytics
Press "Enter" to skip to content

DataOps: Straightforward Data Migration

Lars Fiedler 0

Business use cases sometimes demand the orchestration or migration of data from one Relational Database Management System (RDBMS) to another. For example, moving data from MySQL to Oracle, or from Oracle to DB2. In this tutorial, we will work through migration of data from an IBM DB2 instance to Microsoft SQL Server (MSSQL). This type of migration is very common across large enterprises that have multiple distributed data warehouses.
The typical workflow for data migration is as follows:

  1. Retrieve the source database schema and table definitions
  2. Build the schema in the target location
  3. Pull the data from source to target, either one-time or as a scheduled (recurring) job.

The Composable DataOps Platform is an enterprise-grade distributed integration engine that can be easily deployed to execute these data orchestration and migration tasks. Composable is able to directly query any number of data store technologies, and can therefore simplify the migration of large data sets across any number of systems.

To see this in action, let’s assume we want to migrate 100s of tables from DB2 to MSSQL, and walk through the steps above.

Retrieve the source database schema and table definitions

To view the structure of an IBM DB2 database, and more specifically of the table schemas, you can query the SYSIBM catalogs provided by DB2. The SYSIBM catalogs contain metadata about the database and its tables. Typically, you can use either SYSIBM.SYSTABLES or SYSIBM.SYSCOLUMNS to examine the table structure. SYSIBM.SYSTABLES provides generic metadata about the table itself, while SYSIBM.SYSCOLUMNS provides schema information.
We can use a simple query using SYSIBM.SYSCOLUMNS that will allow us to view the table structure and build a CREATE TABLE statement for MSSQL. The DB2 query is:

Here, we are selecting:

  • NAME – the column NAME
  • COLTYPE – the column datatype
  • LENGTH – the length of the datatype
  • SCALE – the scale of the datatype (e.g., precision for DECIMAL types)
  • COLNO – the order of columns as stored in the database

We are filtering by TBNAME (a specific table) and ordering the results by COLNO so that it is ordered according to the order in which these columns were created initially and stored in the database.

An example result looks as follows:

NAME COLTYPE LENGTH SCALE COLNO
OPID CHAR 30 0 0
DTTM_INSERTED TIMESTMP 10 6 1
R_NUMBER DECIMAL 10 0 2
LEAD_STATUS CHAR 2 0 3
LEAD_ID CHAR 15 0 4

With this information, we can now utilize a datatype mapper that maps datatypes between our two different RDBMSs. These tables are typically published by the technology vendors, see this page for example.

Now, we can start to retrieve our table definitions using Composable DataFlows.

First, we simply query DB2 for the table structure as follows:

The Property loader loads the database connection string from a stored Composable Key.

The String Formatter Module has our parametrized query:

The table name (TBNAME) is listed as a parameter ({0}), as we will soon be looping over 100s of tables in the source DB2 instance.

The ODBC Query Module takes our query and executes it, returning a table of the results, similar to the table above.

Build the schema in the target location

The next step is to use these table definitions to build the target tables in our target MSSQL instance. We do this in two steps. First, we use a custom Code Module to utilize the information retrieved above and generate the CREATE TABLE statement. Second, we push the CREATE TABLE query to our target MSSQL instance to execute it.

Custom Code Module for mapping DB2 datatypes to MSSQL datatypes

The custom Code Module executes the following code to map the datatypes and create the required CREATE TABLE statement.

In our example, the mapping code is:

The code here can be modified to included as many of the datatypes that are needed.
The generated CREATE TABLE statement is then pushed to the target MSSQL instance to execute the query and build the tables.
The entire DataFlow is shown below.

Composable DataFlow that loops across DB2 tables, retrieves the table definitions, generates CREATE TABLE statements and executes them on the target MSSQL instance.


In this DataFlow, the Table Editor Module contains a list of table names, with the Table ForEach module looping through the table names.

Pull the data from source to target, either one-time or as a scheduled (recurring) job.

Now that we have our target database built and mapped, migrating the data is extremely simple. The Composable DataFlow that can execute this is shown below.

The key elements of this DataFlow are:

  • A ForEach Module contains the source table names, and loops across these
  • (optional) An initial TRUNCATE TABLE statement on the target instance, to remove any previous data that may have been pulled
  • Streaming SQL Reader module to execute the SELECT statement on the source instance, and return the results in batches (e.g., 100000 rows per batch)
  • Streaming SQL Writer module that writes the batches
    (e.g., 1000 rows per batch) to the source

As with other Composable DataFlows, a Timer Module (not shown) can be inserted to allow the dataflow to be triggered at a specific time, and create a recurring job.

Resources

For this tutorial, you can download both DataFlows as json files that you can then import into your Composable instance.
Uncompress the zip file and retrieve both DataFlows as json files:

  • Build MSSQL Tables from DB2.json
  • DB2 to MSSQL Pull.json

Next, launch your Composable Instance, and for each json file, open up the DataFlow Designer, select import (underlined in red in the image below) and select each json file.

 

Lars Fiedler

Lars has comprehensive expertise building large complex software systems, and has served as a Software Engineer at MIT’s Lincoln Laboratory from 2010-2014, where he began developing Composable Analytics. Prior to joining Lincoln Laboratory, Lars worked as a Software Engineer at Microsoft Corporation from 2006 to 2010. Lars received his MS in Computer Science from Georgia Institute of Technology in 2004, and his BS in Computer Science from Georgia Tech in 2003.
Lars Fiedler