How to Document dbt Models with schema.yml: A Practical Guide
Learn how to write complete, maintainable dbt documentation using schema.yml — covering model descriptions, column-level docs, tests, and doc blocks for large projects.
Why dbt Documentation Matters
dbt (data build tool) has become a de facto standard for analytics engineering, but documentation is often the last thing teams invest in. Undocumented models create onboarding friction, make debugging harder, and erode trust in data. A well-maintained schema.yml file is one of the most effective ways to keep your dbt project maintainable at scale.
This guide covers everything you need to write complete, useful dbt documentation: model descriptions, column-level docs, doc blocks, and tests — with real examples you can copy into your project today.
1. The Anatomy of schema.yml
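Every dbt model should have a corresponding entry in a YAML properties file, conventionally named schema.yml. dbt reads any .yml file under your models directory, so the file usually lives alongside the models it describes. It follows this structure: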

    version: 2

    models:
      - name: orders
        description: "One row per order placed on the platform. Excludes test and internal orders."
        columns:
          - name: order_id
            description: "Surrogate key for the order."
            tests:
              - unique
              - not_null
          - name: customer_id
            description: "Foreign key to the customers model."
            tests:
              - not_null
              - relationships:
                  to: ref('customers')
                  field: customer_id
          - name: status
            description: "Current order status. One of: placed, shipped, delivered, returned, cancelled."
            tests:
              - accepted_values:
                  values: ['placed', 'shipped', 'delivered', 'returned', 'cancelled']

This single file gives you model-level context, column-level definitions, and data quality tests, all in one place.
2. Writing Good Model Descriptions
A model description should answer three questions:
1. **What is the grain?** (one row per order, per user per day, etc.)
2. **What is excluded?** (test accounts, deleted records, etc.)
3. **Where does it come from?** (upstream sources or models)

    - name: fct_revenue
      description: >
        Daily revenue aggregated by product category and region.
        Grain: one row per (date, category, region) combination.
        Excludes refunded orders (status = 'returned') and internal test transactions.
        Source: built from stg_orders and dim_products.

Avoid vague descriptions like "revenue table" or "order data". Be specific about what the model contains and what it does not.
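The three questions above can be turned into a rough lint. The sketch below is a keyword heuristic, not a dbt feature: the function name `description_gaps` and the cue phrases in `CUES` are assumptions for illustration and would need tuning to your team's wording.

```python
# Hypothetical cue phrases for each of the three questions (grain,
# exclusions, lineage). These are assumptions, not dbt conventions.
CUES = {
    "grain": ("grain", "one row per"),
    "exclusions": ("exclud",),
    "lineage": ("source", "built from", "loaded from"),
}


def description_gaps(description: str) -> list[str]:
    """Return the questions a model description does not appear to answer."""
    # Collapse whitespace so folded YAML scalars match multi-word cues.
    text = " ".join(description.lower().split())
    return [
        question
        for question, cues in CUES.items()
        if not any(cue in text for cue in cues)
    ]


# The orders description from section 1 states grain and exclusions
# but never says where the data comes from.
orders = "One row per order placed on the platform. Excludes test and internal orders."
print(description_gaps(orders))           # ['lineage']
print(description_gaps("revenue table"))  # ['grain', 'exclusions', 'lineage']
```

A check like this only catches missing phrases. It cannot tell whether the stated grain is actually correct, so it complements review rather than replacing it.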
3. Column-Level Documentation Best Practices
Column descriptions should explain business meaning, not just repeat the column name:
| Column | Bad description | Good description |
|--------|-----------------|------------------|
| `ltv` | "LTV value" | "Predicted 12-month lifetime value in USD, calculated using the logistic regression model trained on 24 months of cohort data." |
| `is_active` | "Active flag" | "True if the user has logged in at least once in the last 90 days." |
| `created_at` | "Creation timestamp" | "UTC timestamp when the record was first inserted. Never updated after creation." |
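Descriptions that merely restate the column name can be flagged mechanically. The following is a minimal sketch under stated assumptions: the helper `restates_name` and the `FILLER` word list are hypothetical, and the check is deliberately conservative.

```python
import re

# Hypothetical filler words that carry no business meaning by themselves.
FILLER = {"a", "an", "the", "of", "for", "value", "flag", "field", "column", "id"}


def restates_name(column: str, description: str) -> bool:
    """Return True when the description adds no words beyond the column name."""
    # Underscores are not matched, so "is_active" splits into "is" and "active".
    name_tokens = set(re.findall(r"[a-z0-9]+", column.lower()))
    desc_tokens = set(re.findall(r"[a-z0-9]+", description.lower()))
    return not (desc_tokens - name_tokens - FILLER)


print(restates_name("ltv", "LTV value"))          # True
print(restates_name("is_active", "Active flag"))  # True
print(restates_name(
    "is_active",
    "True if the user has logged in at least once in the last 90 days.",
))                                                # False
```

Because it compares exact tokens, this sketch misses paraphrases such as "Creation timestamp" for `created_at`. Treat a `True` result as a prompt to rewrite, not a `False` result as proof of quality.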
Reusing Descriptions with doc Blocks
For columns that appear across many models, define the description once in a doc block, stored in a Markdown (.md) file alongside your models, to avoid copy-paste:

    {% docs customer_id %}
    Surrogate key for the customer. Generated by dbt_utils.generate_surrogate_key(['email']).
    Joins to dim_customers.customer_id.
    {% enddocs %}

Then reference them in schema.yml:

    columns:
      - name: customer_id
        description: "{{ doc('customer_id') }}"

4. Adding Tests to Every Column
The four built-in dbt tests cover most use cases:
| Test | When to use |
|------|-------------|
| `unique` | Primary keys, surrogate keys |
| `not_null` | Any column that should never be NULL |
| `accepted_values` | Status fields, enums, categorical columns |
| `relationships` | Foreign keys to other models |
Run all tests with `dbt test`, or validate a single model with `dbt test --select fct_orders`.
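To see which models have no test of a given kind, one option is to inspect the `manifest.json` artifact that dbt writes to the `target/` directory. The sketch below assumes the manifest layout as I understand it (`nodes` keyed by unique ID, with `resource_type`, `test_metadata.name`, and `depends_on.nodes`). Field names can differ between dbt versions, and the function name `models_without_test` is hypothetical.

```python
def models_without_test(manifest: dict, test_name: str = "unique") -> list[str]:
    """List model names that no generic test called `test_name` depends on."""
    nodes = manifest.get("nodes", {})
    models = {
        uid for uid, node in nodes.items()
        if node.get("resource_type") == "model"
    }

    covered = set()
    for node in nodes.values():
        if node.get("resource_type") != "test":
            continue
        # Singular (SQL file) tests carry no test_metadata; skip them.
        metadata = node.get("test_metadata") or {}
        if metadata.get("name") != test_name:
            continue
        for dependency in node.get("depends_on", {}).get("nodes", []):
            if dependency in models:
                covered.add(dependency)

    return sorted(nodes[uid]["name"] for uid in models - covered)


# Tiny in-memory stand-in for json.load(open("target/manifest.json")).
example = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model", "name": "orders"},
        "model.shop.customers": {"resource_type": "model", "name": "customers"},
        "test.shop.unique_orders_order_id": {
            "resource_type": "test",
            "test_metadata": {"name": "unique"},
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}
print(models_without_test(example))  # ['customers']
```

The check works at model level only. It does not confirm that the test sits on the right column, which still needs a human look at the primary key.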
5. Documenting Sources
Sources (raw tables) should also be documented in sources.yml:

    version: 2

    sources:
      - name: raw_ecommerce
        description: "Raw tables loaded from the e-commerce platform via Fivetran."
        tables:
          - name: orders
            description: "Raw order events. Append-only, one row per event."
            freshness:
              warn_after: {count: 12, period: hour}
              error_after: {count: 24, period: hour}
            loaded_at_field: _fivetran_synced

6. Generating and Serving the Docs Site

    dbt docs generate
    dbt docs serve

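`dbt docs generate` builds the documentation artifacts, and `dbt docs serve` hosts them locally as an interactive site with a DAG of all your models, full column-level documentation, and the tests attached to each model.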
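The same artifacts can be used to track documentation coverage over time. The sketch below reads `target/manifest.json` and reports the share of models and declared columns that have a non-empty description. It assumes the manifest exposes `nodes`, `resource_type`, `description`, and `columns` as in recent dbt versions, and the function name `doc_coverage` is hypothetical.

```python
import json
from pathlib import Path


def doc_coverage(manifest: dict) -> dict:
    """Return the fraction of models and declared columns with a description."""
    models = [
        node for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
    columns = [col for model in models for col in model.get("columns", {}).values()]

    def share(items: list) -> float:
        if not items:
            return 1.0  # nothing to document counts as fully covered
        documented = sum(1 for item in items if (item.get("description") or "").strip())
        return documented / len(items)

    return {"models": share(models), "columns": share(columns)}


if __name__ == "__main__":
    # Assumed default artifact location; adjust if your target path differs.
    path = Path("target/manifest.json")
    if path.exists():
        print(doc_coverage(json.loads(path.read_text())))
```

Note that the manifest only lists columns declared in your YAML files, so a column that was never added to schema.yml does not lower the column score. A stricter check would compare against the warehouse catalog.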
7. Automating Documentation with SQL Querywise DocGen
Writing schema.yml entries manually for dozens of models is time-consuming. SQL Querywise DocGen generates complete schema.yml output directly from your SQL:
1. Paste your model SQL into DocGen
2. Select **dbt schema.yml** as the output format
3. Get a ready-to-commit `schema.yml` block with column descriptions, inferred types, and suggested tests
Summary
| Principle | Implementation |
|-----------|----------------|
| Grain first | State the grain in every model description |
| Business meaning | Column descriptions explain semantics, not syntax |
| DRY documentation | Use doc blocks for shared column descriptions |
| Tests everywhere | At minimum: `unique` + `not_null` on primary keys; `not_null` + `relationships` on foreign keys |
| Source freshness | Add freshness checks to all source tables |
*SQL Querywise DocGen generates complete dbt schema.yml documentation from your SQL models automatically — including column descriptions, data types, and suggested tests.*