One of the major contributors to the improvements in quality (beyond the simple productivity benefits of farming) was that some of the freed-up farm workforce shifted to improving other areas of the food value chain, including storage, distribution, retail, and preparation.
The same opportunity exists in how companies manage data. It seems to me that too much effort is still spent “farming” the data. That is inefficient, and it takes away from the opportunity to redeploy that labor to more valuable areas such as analytics, ML, and AI.
Since the early days of computers and databases, Extract, Transform, and Load (ETL) code has been how developers move data. Look at the day-to-day workings of any large enterprise and you will find tens or hundreds of thousands of ETL jobs behind the scenes; they are the glue of how information flows.
One of the first steps of any technology project is to identify the upstream and downstream data needs and write the new ETL jobs needed to support that effort.
Writing ETL code is relatively simple. Plus, it’s darn useful for answering almost any business question (are my sales up or down, what are the trends in service calls, how does each manager’s spend compare to budget, etc.). Lastly, there are many popular database tool vendors with their own spin on ETL, and IT service vendors LOVE to provide ETL staff. On the surface, ETL looks easy and useful, and lots of vendors are eager to help.
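To make “relatively simple” concrete, here is a minimal sketch of the kind of hand-written ETL job that accumulates by the thousands. The database file, table names, columns, and cleanup rule are all hypothetical, purely for illustration:

```python
import sqlite3

# Hypothetical example: extract raw sales, apply a cleanup rule,
# and load the result into a reporting table.
conn = sqlite3.connect("warehouse.db")

# Extract: pull yesterday's raw sales rows.
rows = conn.execute(
    "SELECT region, amount FROM raw_sales "
    "WHERE sale_date = date('now', '-1 day')"
).fetchall()

# Transform: drop negative amounts (a business rule that lives
# only inside this script -- it is documented nowhere else).
totals = {}
for region, amount in rows:
    if amount >= 0:
        totals[region] = totals.get(region, 0) + amount

# Load: write the aggregates into the reporting table.
conn.executemany(
    "INSERT INTO daily_sales_by_region (region, total) VALUES (?, ?)",
    totals.items(),
)
conn.commit()
```

Notice that the business rule (drop negative amounts) exists only inside the script. Multiply that by tens of thousands of jobs and the problem becomes clear.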
But while ETL may seem appealing, over time it leads to serious problems:
All too often we hear companies tell us: “We have a legacy data warehouse with thousands of ETL jobs, no documentation, frustrated business users, no trust in the data, and high maintenance costs. We’d like to port, clean up, and retire it, but it’s a risky and costly effort to untangle all the spaghetti ETL code.”
Every company today would like to throw a hand grenade into its legacy data estate and start over with an enterprise data lake (EDL). Implemented correctly, an EDL addresses the issues above and makes it easier to pull accurate insights from across the breadth of the company.
However, if you choose to ETL data into the lake, the same problems will arise again. Lakes become swamps very quickly.
At Next Pathway we have adopted the mantra “no ETL” and have developed tools and processes that eliminate the need for an army of ETL developers.
Simply put, we do not ETL data.
Instead, we rely on metadata (literally, data about the data) and business-defined transformation rules to migrate data and code automatically.
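As an illustration of the general idea (this is a hypothetical sketch, not Next Pathway’s actual tooling), a metadata-driven approach keeps the transformation rules as data and generates the movement code from them. The rule format, table names, and column names below are all assumptions for the example:

```python
# Hypothetical sketch: transformation rules are declared as data,
# and the SQL that moves the data is generated from them.
rules = [
    {
        "source": "raw_sales",
        "target": "daily_sales_by_region",
        "columns": {"region": "region", "total": "SUM(amount)"},
        "filter": "amount >= 0",
        "group_by": "region",
        "reason": "Finance reporting: negative rows are refunds, excluded",
    },
]

def generate_sql(rule):
    """Render one declarative rule into an INSERT ... SELECT statement."""
    target_cols = ", ".join(rule["columns"].keys())
    select_exprs = ", ".join(rule["columns"].values())
    sql = (
        f"INSERT INTO {rule['target']} ({target_cols}) "
        f"SELECT {select_exprs} FROM {rule['source']} "
        f"WHERE {rule['filter']}"
    )
    if rule.get("group_by"):
        sql += f" GROUP BY {rule['group_by']}"
    return sql

for rule in rules:
    print(generate_sql(rule))
```

The rule, not a hand-written script, is the source of truth: the same record that drives the migration also documents what was moved, how, and why.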
The operational benefits in cost and time are immediate. And because we avoid manual ETL, data lineage and metadata are captured accurately as a by-product: the descriptions of how and why the data was moved are necessarily created and guaranteed to be accurate. Anyone can look back over time to understand where the data came from and what rules were applied, and the “data swamps” that accumulate from years of manual ETL never form. True enterprise data is achieved.
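Continuing the hypothetical sketch above, this is why the lineage cannot drift from reality: the lineage record is simply the rule that actually executed, plus a timestamp.

```python
import json
from datetime import datetime, timezone

def record_lineage(rule, log_path="lineage.jsonl"):
    """Append the executed rule, verbatim, as the lineage record.
    Because the rule is exactly what ran, the lineage is accurate
    by construction rather than by documentation discipline."""
    entry = {
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "rule": rule,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```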
Learn more about how you can accelerate the migration of your ETL pipelines to your cloud target, today.