SQL models
Related reference docs
Getting started
If you're new to dbt, we recommend that you read a quickstart guide to build your first dbt project with models.
dbt's Python capabilities are an extension of its capabilities with SQL models. If you're new to dbt, we recommend that you read this page first, before reading: "Python Models"
A SQL model is a select
statement. Models are defined in .sql
files (typically in your models
directory):
- Each
.sql
file contains one model /select
statement - The model name is inherited from the filename.
- We strongly recommend using underscores for model names, not dots. For example, use
models/my_model.sql
instead ofmodels/my.model.sql
. - Models can be nested in subdirectories within the
models
directory.
Refer to How we style our dbt models for details on how we recommend you name your models.
When you execute the dbt run
command, dbt will build this model data warehouseA data warehouse is a data management system used for data storage and computing that allows for analytics activities such as transforming and sharing data. by wrapping it in a create view as
or create table as
statement.
For example, consider this customers
model:
with customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from jaffle_shop.orders
group by 1
)
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from jaffle_shop.customers
left join customer_orders using (customer_id)
When you execute dbt run
, dbt will build this as a view named customers
in your target schema:
create view dbt_alice.customers as (
with customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from jaffle_shop.orders
group by 1
)
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from jaffle_shop.customers
left join customer_orders using (customer_id)
)
Why a view named dbt_alice.customers
? By default dbt will:
- Create models as viewsA view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse).
- Build models in a target schema you define
- Use your file name as the view or tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells. name in the database
You can use configurations to change any of these behaviors — more on that later.
FAQs
Configuring models
Configurations are "model settings" that can be set in your dbt_project.yml
file, and in your model file using a config
block. Some example configurations include:
- Changing the materializationThe exact Data Definition Language (DDL) that dbt will use when creating the model’s equivalent in a data warehouse. that a model uses — a materialization determines the SQL that dbt uses to create the model in your warehouse.
- Build models into separate schemas.
- Apply tags to a model.
Here's an example of model configuration:
name: jaffle_shop
config-version: 2
...
models:
jaffle_shop: # this matches the `name:`` config
+materialized: view # this applies to all models in the current project
marts:
+materialized: table # this applies to all models in the `marts/` directory
marketing:
+schema: marketing # this applies to all models in the `marts/marketing/`` directory
{{ config(
materialized="view",
schema="marketing"
) }}
with customer_orders as ...
It is important to note that configurations are applied hierarchically — a configuration applied to a subdirectory will override any general configurations.
You can learn more about configurations in the reference docs.
FAQs
Building dependencies between models
You can build dependencies between models by using the ref
function in place of table names in a query. Use the name of another model as the argument for ref
.
- Model
- Compiled code in dev
- Compiled code in prod
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
...
create view dbt_alice.customers as (
with customers as (
select * from dbt_alice.stg_customers
),
orders as (
select * from dbt_alice.stg_orders
),
...
)
...
create view analytics.customers as (
with customers as (
select * from analytics.stg_customers
),
orders as (
select * from analytics.stg_orders
),
...
)
...
dbt uses the ref
function to:
- Determine the order to run the models by creating a dependent acyclic graph (DAG).
- Manage separate environments — dbt will replace the model specified in the
ref
function with the database name for the tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells. (or view). Importantly, this is environment-aware — if you're running dbt with a target schema nameddbt_alice
, it will select from an upstream table in the same schema. Check out the tabs above to see this in action.
Additionally, the ref
function encourages you to write modular transformations, so that you can re-use models, and reduce repeated code.
Testing and documenting models
You can also document and test models — skip ahead to the section on testing and documentation for more information.