The Data Integration Factory

Much has been written about lean integration and the factory-based approach to Data Integration, especially by my knowledgeable former colleague, John Schmidt (http://en.wikipedia.org/wiki/Lean_integration). I would like to elaborate on it and share my perspective on past experiences with the Data Integration Factory, as well as why Eccella is uniquely positioned to succeed in a factory-based setting.

At a high level, a Data Integration (DI) Factory, like other factories, is built to repeatedly mass-produce a product. However, unlike a car factory that produces cars on an assembly line, a DI Factory produces logical DI assets. Those assets could be transformations, mappings, scripts, source/target definitions, etc. that facilitate DI needs such as moving data from source to target, enriching data, performing lookups, and others.

A DI Factory is used when there are large numbers of assets with similar characteristics, indicating that a mold/template can be created, or when there are existing specifications and/or metadata that can be leveraged to automate or accelerate the generation of such assets. Not sure where I am going with this? Let's go back to the car factory example.

Consider the two key points:

  • If all cars produced have the same chassis but different engines, wheels, colors, etc., much of the manufacturing process can be automated around the commonalities between the cars, and even fitting different engines and wheels can be done by the same assembly line. If you intend to build just one car, don't bother with the factory. But if you intend to build many, would you still want to build each one as if it were unique and had never been done before?
  • Similarly, if a car company acquires a different car company and wants to keep building some of the cars from the acquired company, should they try to create an assembly line from scratch with no prior knowledge of what was done before, or should they leverage existing information and perhaps systems to recreate the assembly line?

 

A DI Factory is very similar. If you need to create 10s, 100s or 1000s of mappings, transformations, sources, targets or other DI components, it very rarely makes sense to create everything from scratch, especially if multiple people are doing the work. It is best to identify repeatable patterns, define templates and create parameterized specifications that drive mass generation through a factory, rather than creating these components manually.
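To make the template-plus-specification idea concrete, here is a minimal sketch in Python. It is an illustration only, not any particular vendor's tooling: the template text, the table names and the `run_factory` function are all hypothetical, and a real factory would emit whatever asset format your DI platform expects rather than plain SQL.

```python
from string import Template

# Hypothetical mold: one template shared by every pass-through mapping.
MAPPING_TEMPLATE = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $columns\n"
    "FROM $source;"
)

# Hypothetical parameterized specifications: one entry per mapping to generate.
SPECS = [
    {"source": "stg_customer", "target": "dw_customer",
     "columns": ["id", "name", "email"]},
    {"source": "stg_order", "target": "dw_order",
     "columns": ["id", "customer_id", "total"]},
]


def generate_mapping(spec):
    """Fill the shared template with the values of a single specification."""
    return MAPPING_TEMPLATE.substitute(
        source=spec["source"],
        target=spec["target"],
        columns=", ".join(spec["columns"]),
    )


def run_factory(specs):
    """Generate one asset per specification, keyed by target name."""
    return {spec["target"]: generate_mapping(spec) for spec in specs}


if __name__ == "__main__":
    for name, sql in run_factory(SPECS).items():
        print(f"-- {name}\n{sql}\n")
```

Adding a hundred more mappings means adding a hundred more entries to the specification list; the generation code itself does not change.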

If there is already an existing DI implementation in place, it is uncommon for it to be documented in a well-defined, error-free specification that automation can be driven from. However, you can often build a factory that processes the existing metadata and leverages the integration rules that were already implemented.

Such a factory is of significant benefit for the following reasons:

  • Eliminate scale factors – implementing many mappings is just as cost-effective as implementing one. The same automation process is executed, generating as many mappings as the corresponding inputs define.
  • Support various degrees of complexity – from source-to-target pass-through mappings to complex mappings that include expressions, enrichment, error handling, etc.
  • Serve different user types – from business executives, users and analysts to expert programmers and application developers.
  • Fast change rollout – changes to the metadata can be applied to the created assets at the click of a button. It is very easy to make a change, regenerate the assets, and roll them out to the correct environment.
  • Uniformity – with mass code generation and maintenance, uniform constructs are of great importance. They are what allows easy resolution of issues and quick turnaround times.

Now that we have a basic understanding of what a DI Factory is, how would one go about actually building one? Expounding on John Schmidt's Lean Integration approach, here are my suggested key techniques for implementing a DI Factory:

Analyze Data - Collect information on input and output structures, volumes, and existing procedures such as mappings, error handling and data quality. This needs to be done as early as possible in the lifecycle, as it determines some aspects of the Factory's design and manufacturing procedures. I recommend that in your analysis you:

  • Look for patterns and commonalities
  • Identify different use-cases and parse-ability of the data
  • Determine equivalence to your intended product
  • Analyze complexities and their percentage of the total number of assets
  • Determine whether the ROI is worth it by estimating the amount of manual work required vs. use of automation
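The ROI estimate in the analysis above can be approximated with simple arithmetic. The sketch below is a rough illustration; the function names and the hour figures in the usage example are hypothetical placeholders, and you would substitute estimates from your own analysis.

```python
import math


def manual_cost(num_assets, hours_per_asset):
    """Total hours to build every asset by hand."""
    return num_assets * hours_per_asset


def factory_cost(num_assets, build_hours, hours_per_generated_asset):
    """One-off factory build-out plus the residual effort per generated asset."""
    return build_hours + num_assets * hours_per_generated_asset


def break_even_assets(build_hours, hours_per_asset, hours_per_generated_asset):
    """Smallest asset count at which the factory is cheaper than manual work.

    Returns None when generation saves nothing per asset.
    """
    saving = hours_per_asset - hours_per_generated_asset
    if saving <= 0:
        return None
    return math.floor(build_hours / saving) + 1


if __name__ == "__main__":
    # Hypothetical estimates: 400 hours to build the factory, 8 hours per
    # hand-built mapping, 0.5 hours of review per generated mapping.
    print(break_even_assets(400, 8, 0.5))
```

With these placeholder figures the factory pays for itself once the asset count passes the low fifties; below that, manual work is cheaper.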

Prototype - Prove the value of the implementation to yourself and to stakeholders, and demonstrate what the Factory end product will look like. Manually create a complete or partial single instance of the end result that you are happy with. Test it thoroughly to ensure it meets all design requirements, SLAs, etc. This will be your mold/template for future factory-produced assets.

Leverage Metadata and Specification - Use the underlying metadata of existing systems, or machine-generated or hand-written specifications. These are essentially a set of pseudo-instructions that help your Factory produce the result. Whether you tap into a repository, export metadata files, or simply rely on machine-generated or human-written specs, these inputs have to be consumed and converted into a set of instructions your factory assembly line can process, so that it can start the manufacturing process.
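As a rough sketch of what consuming exported metadata might look like, the Python below reads a CSV export and groups its rows into one instruction set per mapping. The column layout, the mapping names and the `PASS_THROUGH` default are assumptions made for illustration; a real repository export will have its own format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata export: one row per source-to-target column rule.
METADATA_CSV = """mapping,source_table,source_column,target_table,target_column,rule
m_customer,stg_customer,cust_id,dw_customer,id,
m_customer,stg_customer,cust_nm,dw_customer,name,TRIM
m_order,stg_order,ord_id,dw_order,id,
"""


def load_instructions(text):
    """Group exported rows into one instruction set per mapping."""
    instructions = defaultdict(
        lambda: {"source": None, "target": None, "columns": []}
    )
    for row in csv.DictReader(io.StringIO(text)):
        entry = instructions[row["mapping"]]
        entry["source"] = row["source_table"]
        entry["target"] = row["target_table"]
        # An empty rule is treated as a plain pass-through (an assumption).
        entry["columns"].append(
            (row["source_column"], row["target_column"],
             row["rule"] or "PASS_THROUGH")
        )
    return dict(instructions)


if __name__ == "__main__":
    for name, spec in load_instructions(METADATA_CSV).items():
        print(name, spec)
```

The resulting dictionaries are the "instructions" the assembly line works from; the generation step never needs to know where the metadata originally came from.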

Stub out parts of the generated asset - An assembly line might not be able to create a complete, fully functioning product. Whatever it can't or shouldn't build by automation, it needs to leave a clear placeholder for, so that in due time the stub can be replaced with an appropriate implementation. If you do create stubs, make sure they are informative and capture as much information as possible, so that whoever eventually replaces the stubs with an actual implementation has as much guidance as possible.
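One possible shape for an informative stub is sketched below in Python. The rule names, the `render_expression` function and the stub comment format are hypothetical; the point is only that the placeholder records which rule was skipped, for which columns, and where the original specification can be found.

```python
# Hypothetical set of rules the factory knows how to generate.
SUPPORTED_RULES = {
    "PASS_THROUGH": "{col}",
    "TRIM": "TRIM({col})",
    "UPPER": "UPPER({col})",
}


def render_expression(source_column, target_column, rule, spec_ref):
    """Return a generated expression, or an informative stub if unsupported."""
    template = SUPPORTED_RULES.get(rule)
    if template is not None:
        return template.format(col=source_column)
    # Stub: keep the asset valid, and record everything a developer needs
    # in order to replace the placeholder later.
    return (
        f"NULL /* STUB: rule '{rule}' for {source_column} -> {target_column} "
        f"not generated; see {spec_ref} */"
    )


if __name__ == "__main__":
    print(render_expression("cust_nm", "name", "TRIM", "spec row 12"))
    print(render_expression("amt", "total", "FX_CONVERT", "spec row 40"))
```

Searching the generated assets for the `STUB:` marker then gives a worklist of everything that still needs manual implementation.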

Change Management - Configurable assets that control change management, deployment and reporting are critical, so that changes can be pushed down and enhancements reapplied, preferably in small increments. Consider the following as part of your change management strategy:

  • Applying changes earlier in the assembly line might disrupt processes further down the line. Therefore, all changes should be prototyped and tested before the assembly line is changed.
  • Changes will need to be applied after the original product has been created and delivered. This takes careful planning to ensure that changes resulting in overwrites do not erase important customized logic that was added after the original product was created.
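One common way to keep regeneration from erasing customized logic is to fence hand-written sections with markers and carry them over on each regeneration. The Python sketch below assumes a hypothetical `-- BEGIN CUSTOM name` / `-- END CUSTOM name` marker convention; it illustrates the idea and is not a feature of any specific DI product.

```python
import re

# Hypothetical marker convention for hand-written logic in generated assets.
CUSTOM_BLOCK = re.compile(
    r"-- BEGIN CUSTOM (?P<name>\w+)\n(?P<body>.*?)-- END CUSTOM (?P=name)\n",
    re.DOTALL,
)


def extract_custom_blocks(deployed_text):
    """Collect the body of every named custom block in the deployed asset."""
    return {
        m.group("name"): m.group("body")
        for m in CUSTOM_BLOCK.finditer(deployed_text)
    }


def merge_custom_blocks(regenerated_text, deployed_text):
    """Copy custom blocks from the deployed asset into the regenerated one."""
    saved = extract_custom_blocks(deployed_text)

    def restore(match):
        name = match.group("name")
        # Fall back to the regenerated body if the block is new.
        body = saved.get(name, match.group("body"))
        return f"-- BEGIN CUSTOM {name}\n{body}-- END CUSTOM {name}\n"

    return CUSTOM_BLOCK.sub(restore, regenerated_text)


if __name__ == "__main__":
    deployed = (
        "SELECT id\n"
        "-- BEGIN CUSTOM filter\nWHERE region = 'EU'\n-- END CUSTOM filter\n"
    )
    regenerated = (
        "SELECT id, name\n"
        "-- BEGIN CUSTOM filter\n-- END CUSTOM filter\n"
    )
    print(merge_custom_blocks(regenerated, deployed))
```

Here the regenerated asset picks up the new `name` column while the hand-written filter survives the overwrite. Logic added outside the markers would still be lost, which is why the convention has to be agreed on before the first delivery.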

Accelerate/Automate - The factory needs to show clear ROI by either completely automating the end product or accelerating the creation of end-product assets. The 80-20 rule is a key consideration in the acceleration vs. automation discussion: 20% of the factory build-out time will normally address 80% of the cases, while the other 80% of the build-out time will be spent on the remaining 20% of the cases. If possible and investment-worthy, aim for full automation. However, acceleration is just as beneficial in most cases.

Standardize Platform - Using the same technology and people, with knowledge sharing, creates a robust and scalable implementation. Reusability is key - not just reusability of technology, but of knowledge and manpower too. It is best if the technology is standardized, mature and from the same provider, so that the different parts communicate and operate in harmony.

Establish Methodology - Having an established plan and practice allows for repeatability, increased effectiveness and consistently excellent execution. Every time a factory is reinvented, designed and built from scratch, and operated by different people, it becomes harder to deliver consistent ROI. Mature factories establish blueprints and guides for both factory build-out and operation.

In summary, these are all key points to consider as you embark on the journey to create one or more factories. Analysis, planning and standardization are key to success.