SQL Querywise Blog
Documentation · 8 min read · May 2026

How to Document dbt Models with schema.yml: A Practical Guide

Learn how to write complete, maintainable dbt documentation using schema.yml — covering model descriptions, column-level docs, tests, and doc blocks for large projects.

Tags: dbt, Data Documentation, schema.yml, Data Engineering, Analytics Engineering

Why dbt Documentation Matters

dbt (data build tool) has become a de facto standard for analytics engineering, but documentation is often the last thing teams invest in. Undocumented models create onboarding friction, make debugging harder, and erode trust in data. Keeping schema.yml well maintained is one of the most effective ways to keep a dbt project maintainable at scale.

This guide covers everything you need to write complete, useful dbt documentation: model descriptions, column-level docs, doc blocks, and tests — with real examples you can copy into your project today.


1. The Anatomy of schema.yml

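Every dbt model should have a corresponding entry in a YAML properties file, conventionally named schema.yml. dbt reads any .yml file under your model paths, so the name and location are a team convention; most projects keep the file in the same directory as the models it describes. An entry follows this structure: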

```yaml
version: 2

models:
  - name: orders
    description: "One row per order placed on the platform. Excludes test and internal orders."
    columns:
      - name: order_id
        description: "Surrogate key for the order."
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to the customers model."
        tests:
          - not_null
          - relationships:
              to: ref('customers')
              field: customer_id
      - name: status
        description: "Current order status. One of: placed, shipped, delivered, returned, cancelled."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned', 'cancelled']
```

This single file gives you: model-level context, column-level definitions, and data quality tests — all in one place.


2. Writing Good Model Descriptions

A model description should answer three questions:

1. **What is the grain?** (one row per order, per user per day, etc.)

2. **What is excluded?** (test accounts, deleted records, etc.)

3. **Where does it come from?** (upstream sources or models)

```yaml
models:
  - name: fct_revenue
    # Folded scalar (>): the lines below are joined into one paragraph.
    description: >
      Daily revenue aggregated by product category and region.
      Grain: one row per (date, category, region) combination.
      Excludes refunded orders (status = 'returned') and internal test transactions.
      Source: built from stg_orders and dim_products.
```

Avoid vague descriptions like "revenue table" or "order data". Be specific about what the model contains and what it does not.


3. Column-Level Documentation Best Practices

Column descriptions should explain business meaning, not just repeat the column name:

| Column | Bad description | Good description |
|--------|----------------|------------------|
| `ltv` | "LTV value" | "Predicted 12-month lifetime value in USD, calculated using the logistic regression model trained on 24 months of cohort data." |
| `is_active` | "Active flag" | "True if the user has logged in at least once in the last 90 days." |
| `created_at` | "Creation timestamp" | "UTC timestamp when the record was first inserted. Never updated after creation." |
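
As a rough sketch, the "good" descriptions from the table could be written in a properties file like this. The model name `dim_users` is hypothetical and used only for illustration:

```yaml
version: 2

models:
  # dim_users is a hypothetical model name, not part of this guide's project.
  - name: dim_users
    columns:
      - name: ltv
        # Folded scalar keeps a long description readable in the YAML file.
        description: >
          Predicted 12-month lifetime value in USD, calculated using the
          logistic regression model trained on 24 months of cohort data.
      - name: is_active
        description: "True if the user has logged in at least once in the last 90 days."
      - name: created_at
        description: "UTC timestamp when the record was first inserted. Never updated after creation."
```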

Reusing Descriptions with doc Blocks

For columns that appear across many models, use doc blocks to avoid copy-paste. Define each block in a Markdown (.md) file inside your project, for example models/docs.md:

```markdown
{% docs customer_id %}
Surrogate key for the customer. Generated by dbt_utils.generate_surrogate_key(['email']).
Joins to dim_customers.customer_id.
{% enddocs %}
```

Then reference them in schema.yml:

```yaml
columns:
  - name: customer_id
    description: "{{ doc('customer_id') }}"
```
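
The payoff comes when several models share the column. A minimal sketch, assuming `orders` and `dim_customers` both expose `customer_id` (the pairing of these two models is an assumption for illustration):

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        # Resolved from the customer_id doc block when the docs are generated.
        description: "{{ doc('customer_id') }}"

  - name: dim_customers
    columns:
      - name: customer_id
        # Same block, so both models always show identical wording.
        description: "{{ doc('customer_id') }}"
```

Editing the doc block once then updates the description everywhere it is referenced.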

4. Adding Tests to Every Column

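dbt's four built-in generic tests cover most use cases (since dbt 1.8 the YAML key is `data_tests`; `tests`, used in this guide, remains supported as an alias):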

| Test | When to use |
|------|-------------|
| `unique` | Primary keys, surrogate keys |
| `not_null` | Any column that should never be NULL |
| `accepted_values` | Status fields, enums, categorical columns |
| `relationships` | Foreign keys to other models |

Run tests with `dbt test`, or use `dbt test --select fct_orders` to validate a single model.
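
The built-in tests are usually applied to a single column, so a composite grain like the one stated for `fct_revenue` needs something more. One option, sketched here on the assumption that the dbt_utils package is installed and that the grain columns are named `revenue_date`, `product_category`, and `region` (hypothetical names), is that package's `unique_combination_of_columns` test:

```yaml
version: 2

models:
  - name: fct_revenue
    tests:
      # Model-level test from the dbt_utils package: fails if any
      # (date, category, region) combination appears more than once.
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - revenue_date       # assumed column name
            - product_category   # assumed column name
            - region             # assumed column name
```

This turns the grain written in the model description into a check that runs with `dbt test --select fct_revenue`.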


5. Documenting Sources

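Sources (raw tables) should be documented too, typically in a file such as sources.yml. The freshness thresholds in this example are checked when you run `dbt source freshness`: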

```yaml
version: 2

sources:
  - name: raw_ecommerce
    description: "Raw tables loaded from the e-commerce platform via Fivetran."
    tables:
      - name: orders
        description: "Raw order events. Append-only, one row per event."
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        loaded_at_field: _fivetran_synced
```
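
Source tables accept the same `columns`, `description`, and `tests` keys as models. A minimal sketch, assuming the raw `orders` table has an `order_id` column (the example above does not confirm this); the `columns` key is meant to be merged into the existing table entry, not declared as a second source:

```yaml
version: 2

sources:
  - name: raw_ecommerce
    tables:
      - name: orders
        columns:
          # order_id is an assumed column name in the raw table.
          - name: order_id
            description: "Order identifier from the e-commerce platform. Repeats across events for the same order."
            tests:
              - not_null   # no unique test: the table holds one row per event
          - name: _fivetran_synced
            description: "Timestamp of the most recent Fivetran sync for this row. Used as loaded_at_field for freshness checks."
            tests:
              - not_null
```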

6. Generating and Serving the Docs Site

```bash
dbt docs generate
dbt docs serve
```

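`dbt docs generate` writes the documentation artifacts (including manifest.json and catalog.json) to your target directory, and `dbt docs serve` hosts them on a local web server. The result is an interactive site with a DAG of all your models, full column-level documentation, and the tests defined on each model.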


7. Automating Documentation with SQL Querywise DocGen

Writing schema.yml entries manually for dozens of models is time-consuming. SQL Querywise DocGen generates complete schema.yml output directly from your SQL:

1. Paste your model SQL into DocGen

2. Select **dbt schema.yml** as the output format

3. Get a ready-to-commit `schema.yml` block with column descriptions, inferred types, and suggested tests


Summary

| Principle | Implementation |
|-----------|---------------|
| Grain first | State the grain in every model description |
| Business meaning | Column descriptions explain semantics, not syntax |
| DRY documentation | Use doc blocks for shared column descriptions |
| Tests everywhere | At minimum: unique + not_null on primary keys; not_null + relationships on foreign keys |
| Source freshness | Add freshness checks to all source tables |

*SQL Querywise DocGen generates complete dbt schema.yml documentation from your SQL models automatically — including column descriptions, data types, and suggested tests.*
