Beyond Gold: Why the Medallion Architecture Needs a Platinum Layer

March 02, 2026

Introduction

If you are involved in data engineering, you have likely heard of the Medallion Architecture. For those unfamiliar, it is a paradigm that segregates data into layers during an ETL process, providing separation of concerns and making data easier to troubleshoot, among other benefits. These layers are Bronze, Silver, and Gold. The Bronze layer holds raw data from the disparate source systems. The Silver layer is an enriched layer used to clean up data from Bronze. The Gold layer is the curated layer, structuring the data so that it is easy for end users to consume.

When I was first introduced to this architecture, it built upon a concept I had already worked out over my first few years in data. It was almost a relief to find a formal concept someone else had already thought of, one I could use to explain the idea to others who were struggling with what I had been trying to implement.

A few more years down the road, I found myself struggling with part of this architecture because everyone I talked to interpreted it a little differently. This ambiguity hindered productive conversations; people had to establish a baseline definition before they could discuss the concept in depth. The piece of the architecture causing the confusion is the Gold layer.

If you want to skip over me explaining my frustrations, feel free to jump to the section called "The Platinum Medallion Architecture". It defines my rules for this particular architecture. If you want to understand where this concept came from, feel free to continue reading.

Is the Gold Layer Enough?

After hearing about the Medallion Architecture, I started trying to fit my concepts of data into it to give them structure. I hoped that doing so would help me represent the concepts to other people: not only other data engineers, but also the business people I was working with.

Another challenge I ran into is that, depending on the technology, there were a few different ways to implement this architecture. In the past I mainly worked with Microsoft SQL Server for data warehouses. I then moved on to Azure Synapse for implementing lakehouses, which gave more flexibility by having a data lake involved. This is where I started struggling with the Gold concept. Now that there was a truly defined Gold layer, I couldn't quite settle on the best way to implement it. Shortly after Microsoft Fabric became generally available, we started building out solutions for clients, and this made the pattern even muddier to me.

The Gold layer is supposed to be a curated layer; it is literally the "gold" standard of the data. Just like the other layers, I had some well-defined rules for it, and this is where things fell apart for me.

Rules for the Gold layer:

| Rule | Description |
| --- | --- |
| 1 | Data in a table must be consistently the same level of granularity |
| 2 | Naming of tables and fields should be descriptive of the data they represent |
| 3 | Fields in tables should be in a data type that can be easily consumed by a semantic model or another analytics tool |
| 4 | Tables in Gold must derive their data from tables in Silver |
| 5 | No table in the Gold layer should depend on another Gold layer table as its source |

Rules 4 and 5 may seem like the same thing, but there is an important distinction between them. Under rule 4 alone, you could create a table in Gold from another table in Gold and argue that the data still ultimately comes from Silver. Rule 5 clears up the ambiguity that has bugged me so much by stating that a table in Gold must not depend on another Gold table as its source. This explicitly closes the gap on where Gold data can be loaded from, but it introduces a new problem: "How can I generate a table in Gold when I need curated data from another table in Gold?" This is where I started introducing the "Platinum Layer".
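To make rule 5 concrete, here is a minimal sketch in plain Python of a lineage check that flags any Gold table sourced from another Gold table. The table names and the shape of the lineage map are hypothetical, for illustration only:

```python
# A toy lineage map: table name -> (layer, list of source tables).
# The names and layers here are made up for this example.
lineage = {
    "silver.vbak":      ("silver", ["bronze.vbak"]),
    "silver.vbap":      ("silver", ["bronze.vbap"]),
    "gold.OrderHeader": ("gold",   ["silver.vbak"]),
    "gold.OrderDetail": ("gold",   ["silver.vbap"]),
    # This one violates rule 5: a Gold table fed by Gold tables.
    "gold.Orders":      ("gold",   ["gold.OrderHeader", "gold.OrderDetail"]),
}

def rule5_violations(lineage):
    """Return Gold tables that depend on another Gold table (rule 5)."""
    bad = []
    for table, (layer, sources) in lineage.items():
        if layer != "gold":
            continue
        if any(lineage.get(src, ("", []))[0] == "gold" for src in sources):
            bad.append(table)
    return bad

print(rule5_violations(lineage))  # ['gold.Orders']
```

A check like this can run as a unit test over pipeline metadata, so a rule 5 violation is caught at build time rather than discovered as a load-order bug in production.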

Introducing the Platinum Layer

The Platinum Layer is the new curated layer. This is where data should be consumed by semantic models or other analytics tools.

You may ask, "So what is the point of the Gold layer then?" I am glad you asked. This changes the purpose of the Gold layer to be more of a "normalized" layer. This will be easier to explain using a real-world system as an example.

There is a well-known piece of software called SAP S/4HANA, which most people just call SAP. It is ERP (Enterprise Resource Planning) software that supports company activities such as purchase orders, sales orders, deliveries, and accounting. It has a large and complex database with many tables, some of which work together for different purposes. In this example I am just going to look at sales orders and how you would promote the data through the medallion architecture.

First, we will define some tables that are commonly used. I know these are not the SAP descriptions of the tables, but they are close enough for the purposes of this example.

| Table | Description |
| --- | --- |
| VBAP | Line item data for sales orders |
| VBUP | Line item status for sales orders |
| VBAK | Header data for sales orders |
| VBUK | Header status for sales orders |

This is how the data would generally flow through the layers. The next section will clearly define the rules for these layers if you want to use it as a reference.

Bronze Layer

As our raw layer, in Bronze we would simply pull the four tables in by name so that we have an exact copy of the source tables.

Silver Layer

As our enriched layer, in Silver the tables would again be named as they are in the source system. At this point we declare data types for the columns and dedupe any data from the Bronze layer.

Gold Layer

As our normalized layer, in Gold we will have only two tables: [OrderDetail] and [OrderHeader].

VBAP and VBUP will combine to become [OrderDetail] and VBAK and VBUK will combine to become [OrderHeader].
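The [OrderDetail] merge can be sketched with plain Python dicts standing in for the Silver tables. The column names (VBELN for order number, POSNR for line item) follow SAP conventions, but the rows and the chosen columns are made up for illustration:

```python
# Silver-layer stand-ins: line item data (VBAP) and line item status (VBUP).
vbap = [
    {"VBELN": "0001", "POSNR": "10", "MATNR": "WIDGET-A", "KWMENG": 5},
]
vbup = [
    {"VBELN": "0001", "POSNR": "10", "LFSTA": "C"},  # delivery status
]

# Index the status rows by their key (order number + line item),
# then join each item row to its matching status row.
status_by_key = {(r["VBELN"], r["POSNR"]): r for r in vbup}

order_detail = []
for item in vbap:
    status = status_by_key.get((item["VBELN"], item["POSNR"]), {})
    extra = {k: v for k, v in status.items() if k not in ("VBELN", "POSNR")}
    order_detail.append({**item, **extra})

print(order_detail[0]["LFSTA"])  # "C"
```

The same pattern, keyed on VBELN alone, would combine VBAK and VBUK into [OrderHeader].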

Platinum Layer

As our curated layer, in Platinum we can now create tables to be consumed by different semantic models.

Option 1. We can create an orders table that has a granularity of order detail that repeats all the order header information on each line. This is a pretty typical use case.

Option 2. We can create a General Ledger table by bringing in the BSEG table from SAP and combining it with our new [OrderHeader] table, since most of the data in BSEG is at the order header level, not the order line detail level. If we combined BSEG with the [OrderDetail] table instead, the join would effectively multiply the rows and blow out the number of records in that table.
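The grain problem in Option 2 is easy to demonstrate: joining a header-level table to the detail grain repeats every header value once per detail line, so any sum over it is inflated. A toy sketch with made-up numbers:

```python
# One order header carrying a ledger amount, three detail lines (made-up data).
order_header = [{"order": "0001", "gl_amount": 300}]
order_detail = [{"order": "0001", "line": i} for i in (10, 20, 30)]

# Joining the header-level amount down to the detail grain repeats it
# on every line: summing the result now triple-counts the ledger.
blown_out = [{**h, **d} for h in order_header
             for d in order_detail if h["order"] == d["order"]]

print(len(blown_out))                          # 3 rows, not 1
print(sum(r["gl_amount"] for r in blown_out))  # 900, not 300
```

Keeping header-level facts joined to [OrderHeader], as Option 2 does, avoids this inflation entirely.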

Creating the tables for Option 1 and Option 2 in the Gold layer would have violated rule 5 of the Gold layer rules defined earlier: in both options, the new tables would rely on [OrderDetail] and/or [OrderHeader], which means a Gold layer table would rely on data from another Gold layer table.

Our new Platinum layer creates a clear path for that data to flow in a straight line, instead of turning into spaghetti code where you must track which table is processed first so that some other table in the same layer doesn't break.

Now that I have explained my reasoning for introducing a Platinum layer, I will use the next section of this post to set forth my rules for "The Platinum Medallion Architecture" as I have crafted them. I understand that these may not work for every situation and may be over-engineering in some cases. I just want a clearly defined set of rules, or perhaps guidelines, for people to follow if they decide to implement it.

The Platinum Medallion Architecture

This is the set of rules I like to follow when implementing this architecture. You can really look at them more as guidelines if you need to make some tweaks. If you do, feel free to message me on the contact page; I would love to discuss this topic.


Bronze Layer

The raw layer. This layer should contain exact copies of data that has been retrieved from disparate sources.

| Rule | Description |
| --- | --- |
| 1 | Data should be an exact copy of data coming from disparate systems |
| 2 | Each row should be tagged with a timestamp of when the data was imported into the lakehouse |
| 3 | All data should be converted to strings to prevent type errors |
| 4 | Tables should be grouped by the disparate system of origin |
| 5 | Any manually maintained CSV or Excel files live in the "Manual" folder or schema of the lakehouse |
| 6 | Loads should be incremental if possible to improve pipeline performance and reduce storage costs |
| 7 | Data should only ever be appended or inserted, never deleted |

Depending on the technology you use, rule 4 can be implemented in different ways. In Microsoft Fabric, I use folder structures to group the data in the lakehouse's Files section and schemas to group the tables in its Tables section.

An example of the file structure would be a folder hierarchy such as this:

System > Schema > Table > Year > Month > Day > Files
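The hierarchy above can be sketched as a small path builder. This is a minimal illustration, assuming a Fabric-style `Files/` root, date-partitioned folders, and a Parquet file name; all of those details are assumptions, not fixed conventions:

```python
from datetime import date

def bronze_path(system, schema, table, load_date=None, filename="data.parquet"):
    """Build a Bronze file path following the
    System > Schema > Table > Year > Month > Day hierarchy.
    Root folder, zero-padded date parts, and file name are illustrative."""
    d = load_date or date.today()
    return (f"Files/{system}/{schema}/{table}/"
            f"{d:%Y}/{d:%m}/{d:%d}/{filename}")

print(bronze_path("sap", "sapnpe", "vbak", date(2026, 3, 2)))
# Files/sap/sapnpe/vbak/2026/03/02/data.parquet
```

Deriving every landing path from one function keeps the folder grouping (rule 4) consistent across all ingestion pipelines.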

Table names would look something similar to the below.

sap.sapnpe_vbak

I have run into instances where it was necessary to run a full load of a table about once a week to keep the data in sync, because the source database used hard deletes instead of soft deletes and many tables did not have a date field for tracking when data changed.


Silver Layer

The enriched layer. Conceptually this is one of the simplest layers. The complexity comes from knowing which fields to use as a key when determining what counts as a duplicate row.

| Rule | Description |
| --- | --- |
| 1 | Remove duplicate or unneeded rows of data |
| 2 | Convert fields to a useful data type where applicable |
| 3 | Tables should retain the same naming convention as the source system |
| 4 | Tables in Silver must derive their data from tables in Bronze |
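The Silver dedupe can be sketched as "keep the latest Bronze row per business key", using the load timestamp tagged in Bronze. The column names, the `_loaded_at` field, and the single-column key are all illustrative assumptions:

```python
# Bronze stand-in: string-typed rows tagged with a load timestamp.
bronze_rows = [
    {"VBELN": "0001", "NETWR": "100.00", "_loaded_at": "2026-03-01T00:00:00"},
    {"VBELN": "0001", "NETWR": "120.00", "_loaded_at": "2026-03-02T00:00:00"},
    {"VBELN": "0002", "NETWR": "50.00",  "_loaded_at": "2026-03-01T00:00:00"},
]

# Rule 1: keep only the most recently loaded row per business key.
latest = {}
for row in bronze_rows:
    key = row["VBELN"]
    if key not in latest or row["_loaded_at"] > latest[key]["_loaded_at"]:
        latest[key] = row

# Rule 2: convert string fields to useful types on the way into Silver.
silver_rows = [{**r, "NETWR": float(r["NETWR"])} for r in latest.values()]
print(sorted(r["NETWR"] for r in silver_rows))  # [50.0, 120.0]
```

In a real pipeline the same logic is usually a window function (row_number over the key, ordered by load timestamp descending), but the idea is identical.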

Gold Layer

The normalized layer. In the Gold layer we want to create tables that are standardized and normalized. For example, if we are pulling in customer data from multiple disparate systems, we should create a single [Customer] table into which the values from both systems can load uniformly.

| Rule | Description |
| --- | --- |
| 1 | Data in a table must be consistently the same level of granularity |
| 2 | Naming of tables and fields should be descriptive of the data they represent |
| 3 | Fields in tables should be in a data type that can be easily consumed by a semantic model or another analytics tool |
| 4 | Tables in Gold must derive their data from tables in Silver |
| 5 | No table in the Gold layer should depend on another Gold layer table as its source |

Helpful Note:

This is also where I usually like to generate hash keys as ID fields. Specifically, I like to hash the values that were used to dedupe rows using an MD5 hash and then cast that value to an integer. It is very performant, and it reduces the lookups needed between tables. If you hash the customer number on the customer table and hash the same customer number on an order, the values will be the same. Storing that hashed value on both tables eliminates the need to look up an ID field on the customer table from the order table.
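A minimal sketch of that hash key, using Python's standard library. MD5 produces 128 bits, so this version truncates the digest to 8 bytes to fit a signed 64-bit integer column; the exact truncation and separator are my assumptions, not a fixed standard:

```python
import hashlib

def hash_key(*values) -> int:
    """Build a deterministic integer key from the dedupe columns.

    Joins the values with a separator, MD5-hashes the result, and folds
    the first 8 bytes of the digest into a signed 64-bit int (BIGINT-sized).
    """
    raw = "|".join(str(v) for v in values)
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# The same customer number hashed on two different tables yields the
# same key, so the tables join directly without a lookup step.
assert hash_key("CUST-0042") == hash_key("CUST-0042")
assert hash_key("CUST-0042") != hash_key("CUST-0043")
```

Whatever engine computes the key, the important property is that both sides derive it from identical inputs with identical formatting, otherwise the join silently fails.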


Platinum Layer

The curated layer. This is the layer that semantic models and analytics tools should be querying. Platinum inherits the same foundational data quality rules as Gold, with additional rules governing its sourcing and implementation.

| Rule | Description |
| --- | --- |
| 1 | Data in a table must be consistently the same level of granularity |
| 2 | Naming of tables and fields should be descriptive of the data they represent |
| 3 | Fields in tables should be in a data type that can be easily consumed by a semantic model or another analytics tool |
| 4 | Tables in Platinum must derive their data from tables in Gold |
| 5 | No table in the Platinum layer should depend on another Platinum layer table as its source |
| 6 | Tables such as dimensional tables should be virtual tables or views |
| 7 | Tables such as fact tables that are very large should be materialized for faster querying |

Helpful Note:

When building tables and views for a semantic model, I like to prefix each with the name of the semantic model, followed by the table name. For example, if I were building a "Sales" semantic model, I would use names similar to the pattern below.

Sales_Customer
Sales_InvoiceDate
Sales_OrderDate
Sales_Orders

Conclusion

The Medallion Architecture is a solid foundation for organizing data in a modern lakehouse environment. Through years of experience with different data platforms, I have identified a gap in it. This enhanced version of the Medallion Architecture provides a more streamlined way to keep data organized and, if implemented properly, should help prevent processes from becoming a tangled mess.

In my next post I plan on showing you how you can use this architecture in Microsoft Fabric to apply it to some of the new technologies that have been introduced there.

If you found this post useful I would love to hear your thoughts, and sharing it with others in the data community would be greatly appreciated.

I don't have a comments section yet, so feel free to send me feedback on this blog.


Author: Kevin Williams

Kevin is a data engineer and the Business Intelligence Practice Lead at Software Design Partners, specializing in data warehousing. He is a father, an occasional gamer, and a lover of many different types of music.


The opinions expressed on this site are my own and may not represent my employer's view.