Documentation | 8 min read | May 2026

How to Document dbt Models with schema.yml: A Practical Guide

Learn how to write complete, maintainable dbt documentation using schema.yml — covering model descriptions, column-level docs, tests, and doc blocks for large projects.


Why dbt Documentation Matters

dbt (data build tool) has become a de facto standard for analytics engineering, but documentation is often the last thing teams invest in. Undocumented models create onboarding friction, make debugging harder, and erode trust in data. A well-maintained schema.yml file is one of the highest-leverage ways to keep a dbt project maintainable at scale.

This guide covers everything you need to write complete, useful dbt documentation: model descriptions, column-level docs, doc blocks, and tests — with real examples you can copy into your project today.

1. The Anatomy of schema.yml

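Every dbt model should have a corresponding entry in a properties file, conventionally named schema.yml. dbt accepts any .yml file name under your model paths, but by convention the file lives in the same directory as the models it describes. It follows this structure: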

yaml

version: 2

models:
  - name: orders
    description: "One row per order placed on the platform. Excludes test and internal orders."
    columns:
      - name: order_id
        description: "Surrogate key for the order."
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to the customers model."
        tests:
          - not_null
          - relationships:
              to: ref('customers')
              field: customer_id
      - name: status
        description: "Current order status. One of: placed, shipped, delivered, returned, cancelled."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned', 'cancelled']

This single file gives you model-level context, column-level definitions, and data quality tests in one place.

2. Writing Good Model Descriptions

A model description should answer three questions:

1. **What is the grain?** (one row per order, per user per day, etc.)

2. **What is excluded?** (test accounts, deleted records, etc.)

3. **Where does it come from?** (upstream sources or models)

yaml

- name: fct_revenue
  description: >
    Daily revenue aggregated by product category and region.
    Grain: one row per (date, category, region) combination.
    Excludes refunded orders (status = 'returned') and internal test transactions.
    Source: built from stg_orders and dim_products.

Avoid vague descriptions like "revenue table" or "order data". Be specific about what the model contains and what it does not.

3. Column-Level Documentation Best Practices

Column descriptions should explain business meaning, not just repeat the column name:

| Column | Bad description | Good description |
|--------|-----------------|------------------|
| `ltv` | "LTV value" | "Predicted 12-month lifetime value in USD, calculated using the logistic regression model trained on 24 months of cohort data." |
| `is_active` | "Active flag" | "True if the user has logged in at least once in the last 90 days." |
| `created_at` | "Creation timestamp" | "UTC timestamp when the record was first inserted. Never updated after creation." |
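As a minimal sketch, the three good descriptions could be written in schema.yml as shown here. The model name `dim_users` is an assumption made for illustration; the descriptions themselves are the ones recommended in this section.

```yaml
version: 2

models:
  - name: dim_users  # hypothetical model name, used only for illustration
    columns:
      - name: ltv
        # Folded scalar (>) keeps a long description readable in the file
        description: >
          Predicted 12-month lifetime value in USD, calculated using the
          logistic regression model trained on 24 months of cohort data.
      - name: is_active
        description: "True if the user has logged in at least once in the last 90 days."
      - name: created_at
        description: "UTC timestamp when the record was first inserted. Never updated after creation."
```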

Reusing Descriptions with doc Blocks

For columns that appear across many models, define the description once as a doc block in a Markdown (.md) file in your project to avoid copy-paste:

markdown

{% docs customer_id %}
Surrogate key for the customer. Generated by dbt_utils.generate_surrogate_key(['email']).
Joins to dim_customers.customer_id.
{% enddocs %}

Then reference them in schema.yml:

yaml

columns:
  - name: customer_id
    description: "{{ doc('customer_id') }}"
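The `doc()` function is not limited to columns. It can fill any description field, which helps when a model description grows into several paragraphs. A sketch, assuming a hypothetical doc block named `orders_overview` has been defined in a .md file alongside the `customer_id` block:

```yaml
version: 2

models:
  - name: orders
    # `orders_overview` is a hypothetical doc block name, assumed to exist in a .md file
    description: "{{ doc('orders_overview') }}"
    columns:
      - name: customer_id
        # Reuses the customer_id doc block defined earlier
        description: "{{ doc('customer_id') }}"
```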

4. Adding Tests to Every Column

dbt ships with four built-in generic tests, which cover most use cases:

| Test | When to use |
|------|-------------|
| `unique` | Primary keys, surrogate keys |
| `not_null` | Any column that should never be NULL |
| `accepted_values` | Status fields, enums, categorical columns |
| `relationships` | Foreign keys to other models |

Run all tests with `dbt test`, or use `dbt test --select fct_orders` to validate a single model.
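One version detail is worth knowing: in dbt 1.8 the `tests:` key was renamed to `data_tests:`, and the older spelling is still accepted for backwards compatibility. A sketch of the four tests on `fct_orders`, assuming a project on dbt 1.8 or later and assuming the model has the columns shown (they mirror the `orders` example but are not confirmed for `fct_orders`):

```yaml
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id       # assumed primary key
        data_tests:
          - unique
          - not_null
      - name: customer_id    # assumed foreign key
        data_tests:
          - not_null
          - relationships:
              to: ref('customers')
              field: customer_id
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned', 'cancelled']
```

On projects older than 1.8, replace `data_tests:` with `tests:`.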

5. Documenting Sources

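Sources (raw tables) should also be documented, typically in a sources.yml file. Adding freshness thresholds lets `dbt source freshness` warn or fail when the newest value in `loaded_at_field` is too old: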

yaml

version: 2

sources:
  - name: raw_ecommerce
    description: "Raw tables loaded from the e-commerce platform via Fivetran."
    tables:
      - name: orders
        description: "Raw order events. Append-only, one row per event."
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        loaded_at_field: _fivetran_synced
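Source tables accept the same column-level descriptions and tests as models. A minimal sketch extending the `orders` source table, where `event_id` is a hypothetical column name assumed for illustration:

```yaml
version: 2

sources:
  - name: raw_ecommerce
    tables:
      - name: orders
        columns:
          # `event_id` is a hypothetical column, assumed to identify each raw event
          - name: event_id
            description: "Identifier of the raw order event. One value per row."
            tests:
              - unique
              - not_null
          - name: _fivetran_synced
            description: "Timestamp of the Fivetran sync that loaded the row. Used as loaded_at_field for freshness checks."
            tests:
              - not_null
```

Testing at the source catches bad raw data before it propagates into staging models.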

6. Generating and Serving the Docs Site

bash

dbt docs generate
dbt docs serve

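`dbt docs generate` writes the site's metadata (manifest.json and catalog.json) to the target directory, and `dbt docs serve` hosts it locally. The result is an interactive site with a DAG of all your models, full column-level documentation, and the tests defined on each model.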

7. Automating Documentation with SQL Querywise DocGen

Writing schema.yml entries manually for dozens of models is time-consuming. SQL Querywise DocGen generates complete schema.yml output directly from your SQL:

1. Paste your model SQL into DocGen

2. Select **dbt schema.yml** as the output format

3. Get a ready-to-commit `schema.yml` block with column descriptions, inferred types, and suggested tests

Summary

| Principle | Implementation |
|-----------|----------------|
| Grain first | State the grain in every model description |
| Business meaning | Column descriptions explain semantics, not syntax |
| DRY documentation | Use doc blocks for shared column descriptions |
| Tests everywhere | At minimum: unique + not_null on primary keys; not_null + relationships on foreign keys |
| Source freshness | Add freshness checks to all source tables |

*SQL Querywise DocGen generates complete dbt schema.yml documentation from your SQL models automatically — including column descriptions, data types, and suggested tests.*

