Embarking on Big Data Initiatives
However, the challenge remains to align the understanding of the data accurately across sources, Big Data environments, and data consumers. This is where metadata becomes important.
A clear understanding of the data is critical in preventing your Data Lake from turning into a “data swamp.” Properly understanding data requires understanding its structure, meaning, and operational constraints. Building and maintaining this understanding is the main objective of the Metadata Management discipline.
Technical Metadata
Supporting the understanding of data from a structural perspective, technical metadata is used to capture details of physical structure and representation (e.g. in databases, files, or messages) in terms of:
In general, technical metadata takes the form of database catalogs, XML schemas, ETL job definitions, and similar artifacts. Data models and dictionaries are design-time representations of the technical metadata that also incorporate business-meaningful descriptions of individual data elements.
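As a small illustration of technical metadata living in a database catalog, the sketch below reads column-level structure from SQLite's built-in catalog. The `customer` table, its columns, and the `describe_table` helper are hypothetical and chosen only for this example; other databases expose comparable catalogs through different interfaces.

```python
import sqlite3


def describe_table(conn, table):
    """Return column-level technical metadata for one table (hypothetical helper)."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


# Illustrative in-memory table; the schema is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER NOT NULL PRIMARY KEY, "
    "email TEXT NOT NULL, "
    "signup_date TEXT)"
)

columns = describe_table(conn, "customer")
for column in columns:
    print(column)
```

The catalog describes only structure (names, types, constraints). What `email` or `signup_date` means to the business has to come from elsewhere.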
Business Metadata
The source of data element definitions is business metadata, which consists of glossaries of terms. Terms are associated with data elements to convey their meaning. To be effective, glossaries must be more than just simple lists of terms with their definitions. Glossaries need to employ a classification methodology that places terms in taxonomies. This helps ensure that concepts, including their descriptors and relationships, are consistently identified, independent of the lexical constructs of the business definitions.
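One way to picture a glossary that is more than a flat list is the minimal sketch below. It assumes a simple parent/child taxonomy and a direct association between terms and data elements. The `Term` class, the example terms, and the element names such as `crm.customer.id` are all hypothetical, not taken from any particular glossary tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    """A glossary term placed in a taxonomy (hypothetical structure)."""
    name: str
    definition: str
    parent: Optional["Term"] = None  # broader term in the taxonomy
    data_elements: List[str] = field(default_factory=list)

    def path(self) -> str:
        """Return the term's position in the taxonomy, broadest term first."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


def terms_for(element: str, glossary: List[Term]) -> List[str]:
    """Find the names of the terms associated with a given data element."""
    return [term.name for term in glossary if element in term.data_elements]


# Illustrative terms and data element names (assumptions for this sketch).
party = Term("Party", "A person or organization of interest to the business.")
customer = Term(
    "Customer",
    "A party that has purchased at least one product.",
    parent=party,
    data_elements=["crm.customer.id", "sales.orders.cust_id"],
)
glossary = [party, customer]

print(customer.path())
print(terms_for("sales.orders.cust_id", glossary))
```

Because both physical columns point at the same term, their shared meaning is identified through the taxonomy rather than through the wording of any one definition.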
Operational Metadata
To measure and improve the effectiveness of the data-related processes in the data lake, operational metadata is needed to quantitatively and qualitatively describe the
A proper Metadata Management practice needs to include all three perspectives and requires the concerted participation of the business, technology, and operations organizations in order to be successful.
Metadata is at the heart of any successful data project today, and the lack of importance placed on it is the reason many first-generation big data initiatives have failed.