Open dialog

Open dialog contains a selection of articles, white papers and discussion papers written by Dialog people which you may find of interest. You are able to subscribe to this page. We would like your feedback on any article. Please email us at opendialog@dialog.com.au.

Open Dialog Article

Data Migration Bicycle

By Monica Woolmer, Catapult BI (a Dialog Group company)

We know that data can participate in structure, and we know that data can flow. Optimal data structures that support data flows lead to maximised business value. It’s what we call “Formation Data” (i.e. data in correct formation). Data is much like the parts of a bicycle: they can be assembled into something extremely useful and, dare we say it, fun!

A data migration project can vary from simple data conversion of a few data tables by a single employee to complex business transformations involving terabytes of data and hundreds of employees. As the complexity increases, so does the need for informed project planning and decision making to reduce the risk to the business. The data migration bicycle has historically proven difficult for businesses to ride.

The purpose of this paper is to provide insight into the major factors affecting the estimation of data migration projects. Estimates have a significant bearing on the greatest challenge for migration projects: meeting the requirements for specialist staff and infrastructure to support the migration effort.

Factors that Make Migration Difficult to Estimate

All too often a decision is made to upgrade a major application based on an over-optimistic view of simply populating the data in the new system from data held in source systems, with little or no concern for business needs, complexity, data quality or scale. This leads to errors in estimating the resources and time required to migrate the data with the obvious impact on project success.

Failed migration planning efforts are usually due to insufficient attention being given to one or more of the nine key factors outlined below.

Business Needs

Factor #1. Clear understanding of business needs

Proper articulation of the business needs allows the identification and designation of relevant data subject areas. Project costs can be minimised by being selective as to the scope of the data migration project.

Some data subject areas will be needed more urgently than others, perhaps because an application uses the data as part of a larger phased project. Understanding the scope and timing of deliverables reduces risk and improves the chances of project success.

Further, business needs are not expected to remain static throughout the life of the project. As the business users begin to develop greater awareness of the outcome, they will require changes. Anticipating the scale and timing of such changes during migration planning will enhance the chances of success considerably.

Transform Complexity

Three factors provide a guide to the level of difficulty of the major computing challenge in a data migration project: transforming extracted source data into the target data.

Factor #2. Number of source interfaces

Data is extracted from many source systems and presented to the transform process via several source interfaces. Every source system must be investigated and appraised to make an estimate of the effort required to complete data transformation, mapping and migration.

The more source interfaces requiring transformation, the greater the required effort. Effort can also grow much faster than the number of interfaces, because duplicate data must be checked for, and reference data harmonised, across multiple systems that model the same business entities.
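To illustrate how reconciliation effort can grow disproportionately, suppose (as an assumption for illustration, not a method from this article) that every pair of source systems modelling the same business entity has to be cross-checked for duplicates and harmonised reference data. For n systems the number of pairs is then n(n-1)/2, which this short Python sketch counts. The system names are hypothetical placeholders.

```python
from itertools import combinations


def reconciliation_pairs(source_systems):
    """Return every pair of source systems that would need to be cross-checked
    for duplicate records and harmonised reference data, assuming
    (hypothetically) that each pair is reconciled separately."""
    return list(combinations(source_systems, 2))


if __name__ == "__main__":
    for n in (2, 4, 8):
        systems = [f"source_{i}" for i in range(1, n + 1)]
        print(f"{n} systems -> {len(reconciliation_pairs(systems))} pairs")
```

Under this assumption, doubling the number of source systems from four to eight takes the pair count from 6 to 28, so the reconciliation workload rises much faster than the interface count itself.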

Factor #3. Number of target interfaces

Data produced by the transform process is processed by several customised loaders with many loader interfaces. Output from the custom load processes is then passed to more than one target system via the target system vendors’ proprietary target interfaces.

As with source interfaces, the greater the number of target interfaces, the greater the effort required to map and migrate the source data, since every target system interface must be closely scrutinised. The target system interface is the last stop for data before it is presented to the acceptance test teams, business users and customers, who will ultimately uncover any remaining defects.

Factor #4. Degree of business transformation required

Another factor that will impact estimates is the degree of business transformation. A migration of data from a source business data model that closely resembles the target business data model will be much simpler to estimate than one in which the business is significantly changing the way it operates, resulting in dissimilar source and target data models.

The business may be quite justified in seeking to transform its processes and realise the benefits of adopting a completely new application software platform that provides strong competitive advantage. However, such change can be painful for many areas of the business, and for the data migration team, which must grasp the intricacies of quite dissimilar data models and architect the migration process so as to ensure the referential integrity and validity of the target data.

Data Asset Quality

Many organisations run down the value of their data and information holdings through poor data governance. There are three data asset quality factors that impact migration planning estimates:

Factor #5. Quality of data in the source system

The most important phase in a data migration project involves the tasks needed to understand the data content. Poor quality data discovered late in a migration project can undermine its success. The data must be fit for use, especially if the target system imposes rules on the data that the source system does not. These rules may apply at the column, row or table level.

Relationships between data elements follow precise and sometimes complex logic. It is often the case that a load program’s relationship checks reject incoming data whose relationships were permitted in the source data model.
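The kind of rejection described here can be sketched as a pre-load check. The Python example below is a minimal illustration only: the `customers` and `accounts` tables, their fields and the three rules are all hypothetical, chosen to show one rule at each of the column, row and table levels. The table-level rule is the case above: a source system that allowed an account without a matching customer passes that data on, and a target loader that enforces the relationship rejects it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Rejection:
    row_index: int
    level: str  # "column", "row" or "table"
    reason: str


def check_before_load(customers, accounts):
    """Apply example target-system rules to source rows before loading.

    The rules are hypothetical illustrations:
      - column level: every account must have a non-empty account_id
      - row level: close_date, if present, must not precede open_date
      - table level: every account's customer_id must exist in customers
    """
    known_customers = {c["customer_id"] for c in customers}
    rejections = []
    for i, acc in enumerate(accounts):
        if not acc.get("account_id"):
            rejections.append(Rejection(i, "column", "missing account_id"))
        close_date = acc.get("close_date")
        if close_date is not None and close_date < acc["open_date"]:
            rejections.append(Rejection(i, "row", "close_date before open_date"))
        if acc.get("customer_id") not in known_customers:
            rejections.append(Rejection(i, "table", "customer_id not found in customers"))
    return rejections


if __name__ == "__main__":
    customers = [{"customer_id": "C1"}]
    accounts = [
        {"account_id": "A1", "customer_id": "C1", "open_date": date(2010, 1, 1), "close_date": None},
        {"account_id": "", "customer_id": "C1", "open_date": date(2011, 5, 1), "close_date": None},
        # An orphan account with inconsistent dates: a legacy system may have allowed this.
        {"account_id": "A3", "customer_id": "C9", "open_date": date(2012, 3, 1), "close_date": date(2011, 1, 1)},
    ]
    for rejection in check_before_load(customers, accounts):
        print(rejection)
```

Running checks like these against source extracts early, rather than waiting for the target loader to reject rows, is one way to surface data quality issues while there is still time to size the cleansing effort.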

Since cleansing the data and other remedial steps can take a significant amount of time to implement, early identification of these issues is essential to properly estimating the time and resources required for data migration.

Factor #6. Availability of source system subject matter and applications expertise

Staff with subject matter and applications expertise are a vital source of knowledge for the data architects and data mappers who need to discover the system behaviours, business rules and data relationships that exist in a source system.

Without access to such staff, project planning and execution can flounder. One of the problems with data migrations in business transformation projects is staff retention: as part of the upgrade arrangement, management of the target system may be passed to an outsourcer and legacy staff let go. The problem is exacerbated when these staff leave before the project’s data architects and mappers start their work.

Factor #7. Quality of source system documentation and metadata

In a large migration project various personnel will require access to information about the basic business processes and data models of both source and target systems. In big, lengthy projects with the inevitable change in personnel, it is even more essential that acquired system and metadata knowledge be available so as to reduce the time required to deal with new issues as they arise.

Documentation should not only exist but also be complete and reliable. Where documentation does not exist, an early task in the project planning stage is to produce documentation of sufficient quality to support the estimating process.

Scale

Size affects migration estimates, particularly in the area of infrastructure requirements. But size also affects project timing. The data scale of the project is measured by the number of transform staging area tables and the largest table size.

Factor #8. Number of tables

More data tables require more processing jobs, which in turn require more computing time and resources. Projects with more tables in their scope require a larger amount of data architect time to analyse and design migration solutions.

A migration with a hundred tables spread across two source systems is likely to produce many more test defects than one with ten tables in one source system. Also, defects from the larger data model are likely to require greater effort to resolve, because of the likelihood of more complex interrelationships between data elements.

With larger migrations, a greater number of test reruns is also likely, due to the greater complexity of defects.

Factor #9. Size of largest tables

When a business has large numbers of consumers or transactions, some of the data files can become extremely large.

In a migration environment, where it is usual to keep multiple copies of the data for analysis, development and system testing, there is a dramatic impact on physical infrastructure in terms of processor, disk storage and network capacity. The demands on migration infrastructure will far outweigh the application’s normal day-to-day system requirements.
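A rough storage calculation shows why the multiplication matters. The Python sketch below assumes, purely for illustration, that each environment named above holds one full copy of the source data and that the staging area needs about twice the source size per copy (extracted data plus transformed output). The 500 GB figure and the `staging_factor` of 2.0 are hypothetical, not measurements.

```python
def estimate_migration_storage_gb(source_size_gb, environments, staging_factor=2.0):
    """Rough disk estimate for a migration environment.

    Assumptions (hypothetical, for illustration only):
      - each environment holds one full copy of the source data
      - each copy needs `staging_factor` times the source size, to allow for
        the extracted data plus the transformed output in the staging area
    """
    return source_size_gb * len(environments) * staging_factor


if __name__ == "__main__":
    environments = ["analysis", "development", "system test"]
    needed = estimate_migration_storage_gb(500, environments)
    print(f"{needed:.0f} GB for migration vs 500 GB in production")
```

With these assumptions a 500 GB production database implies around 3000 GB of migration storage, before allowing for backups or additional test cycles.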

While the total size of a fully populated database in numbers of records is significant, it is the potential performance constraints on the sometimes unwieldy largest tables that need to be addressed at an early stage, particularly when the processing time for such tables during complex transformations is measured in hours or days rather than minutes and seconds.
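A back-of-envelope calculation shows how quickly a single large table reaches that range. The row count and throughput in this Python sketch are hypothetical figures chosen only for illustration; real throughput depends on the transformation logic, the hardware and the target loader.

```python
def processing_hours(row_count, rows_per_second):
    """Elapsed hours for one pass over a table at an assumed constant throughput."""
    return row_count / rows_per_second / 3600


if __name__ == "__main__":
    # Hypothetical: a two-billion-row transaction table at 20,000 rows per second.
    print(f"{processing_hours(2_000_000_000, 20_000):.1f} hours")
```

At these assumed rates one pass takes roughly 28 hours, and every test rerun of that table repeats the cost.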

A Ticket to Ride

Data migration and data integration are necessary projects for a business to undertake, but they are usually undertaken infrequently. One of the key tasks of a data migration project is to deconstruct the data model and the data of the source application(s), so that the data content can be understood before key migration decisions are made.

When planning a data migration or integration project, think of the nine factors explained above as bicycle parts strewn about the garage floor.

Put together as “Formation Data” they will produce fun rides on a working bicycle!

Such a goal can be achieved with a high level of confidence. Having knowledge about the key migration factors is like having a ticket to ride the data migration bicycle. Dialog and Catapult BI can assist you in gaining your ticket to ride.

Reference this article: Monica Woolmer, Data Migration Bicycle (2012-04-27) Open Dialog - Dialog Information Technology <http://www.dialog.com.au/open-dialog/data-migration-bicycle/>
