1. Obtain transparency about existing data
Building ETLs is often a tedious and challenging process. Data engineering starts from the data that is available, yet in practice it is often unclear where the data needed to answer requests from the business departments actually lives. Perhaps it is already available in the data warehouse (DWH)?
The first step towards better collaboration on ETLs is therefore always to improve transparency. With a suitable data catalog, engineering teams get a complete, searchable overview of all data. Ideally, this overview covers not only the DWH but also the structures in the source systems and the tools in use. To keep this transparency sustainable, we always advise automating as much as possible: keeping the catalog up to date should require as little manual effort as possible. With some experience, most modern tools can be connected to and synchronized with a data catalog without much effort.
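To make "automate as much as possible" more concrete, here is a minimal sketch that harvests table and column names directly from a source system's own metadata instead of maintaining them by hand. It assumes an in-memory SQLite database as a stand-in for the DWH; `CatalogEntry`, `harvest_sqlite`, and `search` are hypothetical names for illustration, not the API of any particular catalog product.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One table in a hypothetical, minimal data catalog."""
    system: str                                   # source system the table lives in
    table: str
    columns: list[str] = field(default_factory=list)


def harvest_sqlite(system: str, conn: sqlite3.Connection) -> list[CatalogEntry]:
    """Read table and column names from SQLite's own metadata, no manual upkeep."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 holds the column name
        cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        entries.append(CatalogEntry(system, table, cols))
    return entries


def search(catalog: list[CatalogEntry], term: str) -> list[str]:
    """Return 'system.table' for every table whose name or columns contain term."""
    term = term.lower()
    return [
        f"{e.system}.{e.table}"
        for e in catalog
        if term in e.table.lower() or any(term in c.lower() for c in e.columns)
    ]


# Hypothetical example: a tiny DWH with two tables
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
dwh.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
catalog = harvest_sqlite("dwh", dwh)
print(search(catalog, "customer"))  # ['dwh.customers', 'dwh.orders']
```

Because the entries are rebuilt from the source on every run, rerunning the harvest (for example on a schedule) is what keeps the catalog current; a real setup would add one harvester per source system and tool.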
2. Assemble the ideal stack
After taking inventory (Best Practice 1), your data teams know which tools and capabilities are already in place and which are still missing. They also have a clear picture of where collaboration works well and where it needs improvement. This clarity is the prerequisite for adding the missing components to the stack and reaching the goal of better collaboration.
The next step is to critically evaluate which tools best fit each use case. Important criteria include:
Available interfaces to data sources
Available interfaces to data recipients
Scope and complexity of authorization management
Desired collaboration features (e.g., integrations with MS Teams or Slack, commenting capabilities)
If you have any questions about which stack is ideal for you, feel free to write to us!
3. Think beyond tool boundaries
Taking Best Practice 2 further, we recommend thinking "big" when assembling the stack. Collaboration doesn't happen in one isolated tool but across several. Two good examples of processes that enable cross-tool collaboration are documentation and a business glossary.
No one would deny that documentation is important, and yet missing documentation is a very common problem. At the interfaces between tools in particular, it helps to be able to see what happened upstream. Some tools can generate documentation automatically, which is a great help. In general, we recommend establishing a uniform, simple documentation structure.
A business glossary, usually part of a data catalog, provides the link between the business and technical worlds. This can be enormously helpful both for collaboration with specialist departments when building new pipelines and for traceability during pipeline repairs.
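As an illustration of how a glossary links the two worlds, the following sketch maps business terms to the technical columns that implement them and supports lookups in both directions: business to technical when building a new pipeline with a specialist department, technical to business when tracing a broken column during a repair. The class, function names, terms, and column paths are all hypothetical examples, not taken from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str                 # business-facing term used by specialist departments
    definition: str           # agreed plain-language definition
    columns: tuple[str, ...]  # technical assets ("system.table.column") implementing it


# Hypothetical glossary content for illustration only
GLOSSARY = [
    GlossaryTerm("Net revenue", "Order amount after discounts and returns.",
                 ("dwh.orders.amount",)),
    GlossaryTerm("Customer region", "Sales region a customer is assigned to.",
                 ("dwh.customers.region", "crm.accounts.region_code")),
]


def terms_for_column(column: str) -> list[str]:
    """Technical -> business: which glossary terms does this column feed?"""
    return [t.name for t in GLOSSARY if column in t.columns]


def columns_for_term(name: str) -> tuple[str, ...]:
    """Business -> technical: where is this term physically stored? (case-insensitive)"""
    for t in GLOSSARY:
        if t.name.lower() == name.lower():
            return t.columns
    return ()
```

In a real data catalog the glossary lives next to the harvested metadata, so both lookups stay valid as schemas change.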
4. Rely on automation
Many companies use a variety of tools to build ETLs. To collaborate seamlessly, even across departments, it is important to bring these tools together in one central location. Be sure to connect all solutions to your data catalog and keep them in sync automatically! When synchronization is manual, we frequently see problems go undetected for too long, or go untracked because tracking them takes too much effort.
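Automatic synchronization is largely about catching differences before anyone has to look for them. A minimal sketch of that idea, assuming the catalog's view and the live source are both available as mappings from table name to column names, compares the two and reports drift. `schema_drift` is a hypothetical helper; in practice it would run on a schedule and send its findings to the team (for example to a chat channel), which is not shown here.

```python
def schema_drift(catalog: dict[str, set[str]], live: dict[str, set[str]]) -> list[str]:
    """Compare the catalog's view of a system with its live schema.

    Both arguments map table name -> set of column names.
    Returns human-readable findings, sorted for stable output.
    """
    findings = []
    # Tables present in only one of the two views
    for table in sorted(live.keys() - catalog.keys()):
        findings.append(f"new table not in catalog: {table}")
    for table in sorted(catalog.keys() - live.keys()):
        findings.append(f"table dropped from source: {table}")
    # Column-level differences for tables present in both views
    for table in sorted(catalog.keys() & live.keys()):
        for col in sorted(live[table] - catalog[table]):
            findings.append(f"new column: {table}.{col}")
        for col in sorted(catalog[table] - live[table]):
            findings.append(f"column removed: {table}.{col}")
    return findings


# Hypothetical snapshot vs. current state of a source system
catalog_view = {"orders": {"order_id", "amount"}, "legacy": {"id"}}
live_view = {"orders": {"order_id", "amount", "currency"}, "customers": {"customer_id"}}
for finding in schema_drift(catalog_view, live_view):
    print(finding)
```

An empty result means catalog and source agree; any finding is a concrete, trackable item rather than a problem that surfaces weeks later in a broken pipeline.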