We as developers really do not like writing documentation. I am quite sure that if given a choice some developers would rather walk over broken glass than sit down and create documentation for their solutions. Sure there are some exceptions to this, but in general writing documentation is not something that we enjoy nearly as much as designing and developing solutions.
Documentation is sometimes done “after the fact” as a final task that needs to be delivered. Sometimes “agile” is taken way too far and no real documentation exists at all. Although this might work in certain cases there are certain documents that truly should not be gone without and which should be created before development starts. In BI solution one such document is the humble Source to Target Mapping (S2T).
What is a Source to Target Mapping document?
A S2T document is simply a document that contains the mapping of source system fields to the fields of a target system. In a data warehouse this will typically be a mapping from either source files or database tables to the different tables in your target data warehouse / data mart.
Think of the S2T document as being the blueprint of your ETL solution. In addition to containing the mapping of fields from source to target the document also captures the following important information:
- Loading frequency for the mapping described by the document.
- How source tables / files should be joined together in order to get the desired source dataset.
- Data types of both the source as well as the target fields.
- Any conversion logic that is applied to convert between data types.
- Any business rules that need to be applied.
- Any slowly changing dimension attributes and logic.
The S2T document is the main input into your ETL development efforts.
When should it be create?
The normal BI Solution design process roughly follows the following very high level steps.
- Investigate business area(s) and their reporting requirements.
- Investigate potential source systems.
- Design a flexible dimensional model to address the current and potential future requirements.
- Translate the dimensional model into a physical model.
- Develop the ETL solution.
- Develop the OLAP / Tabular solution.
- Develop the reports.
Our S2T documents should be created before we start with step 5. For some reason S2T documents are sometimes created while the ETL is being developed, this would be an example of “after the fact” documentation, mainly because in practice developers tend to first develop the ETL and then create the S2T documents. When done in this way S2T documents serve very little purpose other than to be system documentation for future maintenance. This approach also causes teams to forgo the main benefits of good S2T documentation.
What are the benefits?
Source to target mapping documents that are created during the design phase of the solution before embark on the actual ETL development have some of the following benefits.
- Ensure that the source data required to build the model exist in the way it is required for the designed solution.
- Highlight any complexities in ETL processes before development start.
- Ensures that the ETL development effort can be estimated more accurately. (Remember you cannot accurately estimate what you don’t understand or know.)
- Removes the bottleneck of multiple ETL developers needing to engage with the analyst(s) or source system owners all at the same time to identify how to get the source data required.
For projects with multiple ETL developers this last point is crucial. In most organisations the number of people that know a source system well enough to provide the information needed for a S2T mapping is very small. There is nothing that derails a project plan as badly as assuming that the ETL processes can be developed in parallel only to have all of your ETL developers struggle to get time with the one or two people that can give them the information they require.
What should it look like?
So now that you know why you should be creating S2T documents what should they look like? Well the honest answer is that it can take many different forms. Most of the time the documents end up being Excel spreadsheets which captures the information in an easy to read manner. There is no single correct way to do this, it is important to create a format that works for your team which contains all the necessary information listed earlier.
To get you started I have included an example which you can download. This S2T document is an example of loading a fictitious Product dimension from the AdventureWorks OLTP database directly from the source system into the data warehouse / data mart. In the real world you would have multiple stages e.g. Source to Stage, Stage to Warehouse etc. each having their own Source to Target mapping.