The Concept of Agile and Automated Data Warehousing
Thu 27 Feb 2020 | Weelin Lim
20 years ago, BI was the realm of developers, with technical tools that required development effort to create BI reports for business analysts and report consumers to use. Not much longer than 10 years ago, newer and more business-friendly BI tools appeared, led by Qlik and Tableau, that allowed business analysts to take ownership of BI report and dashboard building. These new tools allowed business departments to own the reporting and analytics space, work quickly and in an agile and flexible manner. This agile approach was soon picked up by the data preparation market, and business-focussed agile data preparation tools such as Alteryx appeared a few years later, to complement the agile BI tools, once people remembered that pretty dashboards are only as useful as the available data.
Data warehousing has not been an area where many people have targeted making more agile, due to its complexity. However, companies that have large requirements for data warehousing and therefore a large data warehouse development team with constant requests for new data sources to be added and existing data sources to be changed in BAU, will usually progress through a number of stages to make their development and delivery of their data warehouse functionality more efficient. The stage that a company gets to is very dependent on the appetite for investment into the data warehouse agile/automation concept, which is usually based on the amount of ongoing data warehousing development they need to carry out.
A typical evolution of this capability, something that I have seen, experienced and led, is outlined below:
Phase 1: Methodology
This phase involves:
- Defining the approaches for data pipeline building,
- The data modelling methodology to adhere to,
- A standard and modular data platform architecture
- And a list of the required functionality that must be built into each end-to-end process, from extracting the data all the way to delivering the analytics and standard guidance for qa.
The advantages of this phase include:
- Recognisable development patterns
- Developers following consistent approaches and understanding the functionality they must deliver.
Phase 2: Framework
This phase involves:
- Creating naming and coding conventions for development
- Standardises modelling patterns and conventions for data models,
- Data pipeline templates for different data layers, which may include ready-built functionality, such as pipeline auditing and logging, that developers can use to customise as required
- Reusable code for common tasks that can be called as appropriate,
- Standardised documentation templates
- Testing templates and reusable testing artefacts
The advantages of this phase include:
- More standardisation of development artefacts
- Greater efficiency from reusing code
- consistent documentation
- Reduction in key-man dependencies
- More efficient and robust QA
Phase 3: Framework Automation
This phase involves:
- Creating metadata about your data
- Creating metadata about your data processes
- Using metadata to help generate individual data pipeline code or whole modules that encompass the data processing required
- Creating standardised and automated QA tests for the generated modules
- Ability to add new data sources and extend your data warehouse model quickly
- Ability to automatically generate new data pipelines and data objects that follow the defined patterns and processes
- Reduced QA effort due to automation
- Ability to deliver standardised code and pipelines faster
- Reduction in support issues
Phase 4: Agile Data Warehouse Automation
The final evolution of this approach comes in the ability to deliver truly agile data warehouse automation. This phase should allow you to generate all data warehouse objects and data pipelines as per the previous phase, but also to make changes to your BAU data warehouse, including data objects and pipelines, whilst maintaining the existing data integrity.
Remodelling of existing data warehouses is a difficult and very involved process, as mentioned previously, so the ability to do this automatically is challenging. Taking this concept further, you can even create the ability to try data model changes and revert to a previous version if you so decide.
Having these agile abilities completely changes the effort, timescales, flexibility, cost and risk of data warehousing. However, the majority of data warehouse teams do not reach this phase, due to the effort and complexity involved.
Data Warehouse Automation:
Even if an organisation does not reach the final level of sophistication, there are ways to achieve this using tools and solutions in the market.
There has been a market for Data Warehouse Automation for over a decade now, with a few established players in that market and some new ones providing a fresh approach. Gartner have recently (Jan 2020) published an article specifically on this and highlighting the growing need for making data warehousing easier. In fact, Gartner predict that by 2022, organisations utilising metadata-driven data integration approaches will reduce time to data delivery by 30%.
The tools highlighted by Gartner aim to give a data warehouse team the ability to accelerate their delivery and are fully targeted at technical IT teams responsible for delivering data warehousing to an organisation, in the traditional setup, just quicker. This is a valid approach where data warehousing development is a major focus within organisations.
Where Data Warehouse Automation can help the Business:
We have seen a recent trend where data project responsibility and ownership lie with the Business which is natural considering they reap the direct benefits and bear the risk for a project’s success or failure. Businesses now want to be closer to such important projects, especially those that unlock the data value for them. It is vital that these do not become IT-owned projects.
However, even with data warehouse automation tools, it is especially difficult for a data warehouse project not to be owned by IT, owing to its complex and highly specialist nature and the target audience of the current DWA tools on the market.
Additionally, there are smaller or less data mature organisations that would benefit from a robust and properly architected data warehouse capability, offering strong data management principles such as data hygiene, data lineage and data security.
However, they do not want, or cannot afford to stand up a specialist team to firstly design and develop it, and secondly, to support, administer and extend it as requirements expand.
The Business ask is to have a platform that they can own going forwards, that they can extend as they identify new data sources and that does not need to have all requirements fully specified and documented up-front before working on it. after all, that is not how a business works. Businesses have some requirements, but they also have ambiguity, get new ideas and priorities and often cannot wait for typical data warehouse BAU change request cycles to obtain these changes in THEIR data warehouse platform.
The TrueCue platform aims to bridge these gaps…
This is the inspiration for TrueCue, a SaaS platform for agile data warehouse automation, driven by the experiences of decades of data warehouse development, linked to experiences with agile data preparation and analytics tools.
TrueCue is a business-focussed platform allowing business teams to create, own and manage the data warehouse which acts as their foundation for high value analytics. Our aim is to abstract all technical complexity away from creating and running a data warehouse, whilst ensuring that it follows best-practice architecture and modelling methodology, so that an organisation can be certain that the data warehouse platform is delivered to the same secure, governed and scalable standard that an enterprise IT team would do, freeing up that team to deliver other projects. With TrueCue taking care of the vital data management tasks, the business can focus on interrogating and analysing their data with confidence and certainty.
Which is why TrueCue provides all features and platform in a single product, including storage and compute in a single SaaS offering. TrueCue provides the ability to work in an agile manner, to develop your data warehouse from what you currently know and extend as you find out more, or even if you change your requirements, which is how businesses are used to working.
Our platform has the capability to bring truly agile data warehousing to every company. It integrates disparate data sources from systems, silos and external sources into a single source of the data truth, following proven data warehousing methodology and practice.
TrueCue will allow you to answer those company-wide questions that a single data source is not enough for. It will allow you to create your data warehouse from the information you currently know, even if you do not know everything. As an agile platform, you can change your mind and your data warehouse model as you learn more or as you discover more data sources to add.
The ability to achieve all of this is made available through a single, simple portal.