Running the Jaffle Shop dbt Project in a Containerized Environment
Chapter 1: Introduction to the Jaffle Shop Project
If you’re new to the data build tool (dbt), you may have come across the Jaffle Shop project, a self-contained example project designed for testing and experimentation. It models a fictional e-commerce store and demonstrates how raw data from an application database can be transformed into customer and order models that are ready for analytics.
For more information, check out the Jaffle Shop GitHub project.
A significant challenge with the Jaffle Shop project is its requirement for users—particularly those new to dbt—to configure and manage a local database for the dbt models to be applied. In this tutorial, I will guide you through creating a containerized version of the project utilizing Docker. This setup will enable the deployment of a Postgres instance and facilitate the configuration of the dbt project to interact with this database. Additionally, I’ll share a GitHub project link to help you quickly get all services operational.
To stay updated, subscribe to Data Pipeline, a newsletter focused on Data Engineering.
Section 1.1: Setting Up Docker
To kick things off, we need to define the services we intend to operate through Docker. We will begin by creating a docker-compose.yml file that outlines two services: one for the Postgres database and another for our custom service, which will be defined next using a Dockerfile.
version: "3.9"
services:
postgres:
container_name: postgres
image: postgres:15.2-alpine
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
ports:
- 5432
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
dbt:
container_name: dbt
build: .
image: dbt-jaffle-shop
volumes:
- ./:/usr/src/dbt
depends_on:
postgres:
condition: service_healthy
This file specifies the Docker Compose version (3.9) and outlines the settings for the postgres and dbt services. The postgres service is based on the official PostgreSQL Docker image, version 15.2-alpine. It designates the container name, exposes port 5432 (the standard PostgreSQL port), and sets environment variables for the username and password. The health check defines a command that verifies the container is ready to accept connections.
The dbt service uses a Docker image built from the current directory and mounts it as a volume within the container. It is configured to start only when the postgres service is healthy.
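If you want to confirm the health check is passing before dbt kicks off, you can inspect the Postgres container's health status directly (an optional sanity check, not required for the setup):

$ docker inspect --format '{{.State.Health.Status}}' postgres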
Section 1.2: Creating the Dockerfile
To containerize the Jaffle Shop project, we must establish a Dockerfile that installs the requisite dependencies for both Python and dbt, ensuring that the container remains active once the environment is initialized.
FROM --platform=linux/amd64 python:3.10-slim-buster
RUN apt-get update \
    && apt-get install -y --no-install-recommends
WORKDIR /usr/src/dbt
# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.2.0
RUN pip install pytz
# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir ./profiles && sleep infinity
In this Dockerfile, we start from a slim Python 3.10 base image, update the package manager, set the working directory, and install the dbt Postgres adapter (which also pulls in dbt-core). The final CMD installs the project's dbt package dependencies, runs dbt build against the profiles directory, and then sleeps indefinitely so the container stays active for further commands.
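The CMD above runs dbt deps, which installs whatever packages are listed in the project's packages.yml file. If you're adapting this setup for your own project and need an example, a minimal packages.yml might look like the following (the dbt_utils version shown is illustrative; pick one compatible with your dbt-core version):

packages:
  - package: dbt-labs/dbt_utils
    version: 0.9.2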
Chapter 2: Configuring dbt with Postgres
To work with dbt effectively, we will utilize the dbt Command Line Interface (CLI). A directory containing a dbt_project.yml file is identified as a dbt project by the CLI. We'll create this file to specify basic configurations, including the project name and the profile we will define in the next step.
name: 'jaffle_shop'

config-version: 2
version: '0.1'

profile: 'jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  jaffle_shop:
    materialized: table
    staging:
      materialized: view
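As a side note, the materializations defined above act as project-level defaults (tables for jaffle_shop models, views for the staging folder). An individual model can override them with a config block at the top of its SQL file. Here is an illustrative snippet (the file name is hypothetical; raw_customers refers to one of the project's seeds):

-- models/staging/stg_customers_example.sql (hypothetical file)
{{ config(materialized='view') }}

select * from {{ ref('raw_customers') }}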
Now we will create a profiles.yml file to store dbt profiles, which consist of targets specifying the connection details for the database or data warehouse.
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1
This file outlines a profile named jaffle_shop with connection settings for a Postgres database running in a Docker container named postgres.
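Once the services are up (see Chapter 3), you can verify these connection settings from inside the dbt container with dbt debug, which checks the profile, the dbt_project.yml file, and the database connection:

dbt debug --profiles-dir ./profiles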
Chapter 3: Building and Running the Docker Services
To set everything in motion, we will build and launch our Docker services using the following commands:
$ docker-compose build
$ docker-compose up
These commands will initiate a Postgres instance and construct the dbt resources for the Jaffle Shop project as specified in the repository. The containers will stay active, allowing you to query the Postgres database and the tables created from dbt models, as well as execute further dbt commands via the CLI.
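If you'd rather keep your terminal free, you can start the services in detached mode and follow the dbt container's logs to watch the build progress:

$ docker-compose up --build -d
$ docker-compose logs -f dbt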
Section 3.1: Executing dbt Commands
Once the dbt container has built the specified models, you can access it to execute dbt commands, whether for new or modified models. First, list all active containers:
$ docker ps
Copy the ID of the dbt container and open a shell inside it with the following command:
$ docker exec -it <container_id> /bin/bash
Now, you have shell access to the container, enabling you to run dbt commands.
# Install dbt dependencies (if necessary)
dbt deps
# Build seeds
dbt seed --profiles-dir profiles
# Build data models
dbt run --profiles-dir profiles
# Build snapshots
dbt snapshot --profiles-dir profiles
# Run tests
dbt test --profiles-dir profiles
Since we have mounted the local directory to the running container, any modifications in the local directory will reflect immediately in the container.
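For example, after editing a single model locally you don't need to rebuild the whole project; dbt's node selection syntax lets you target just that model (replace <model_name> with the model you changed):

# Build a single model
dbt run --select <model_name> --profiles-dir profiles

# Build the model along with everything downstream of it
dbt run --select <model_name>+ --profiles-dir profiles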
Section 3.2: Querying dbt Models
You can also query the Postgres database and the dbt models or snapshots created within it. To do this, enter the running Postgres container:
$ docker ps
$ docker exec -it <container_id> /bin/bash
Use psql, a terminal-based interface for PostgreSQL, to query the database:
$ psql -U postgres
Here are commands to list tables and views:
postgres=# \dt
postgres=# \dv
You can now execute SELECT queries on dbt models:
SELECT * FROM <model_name>;
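For instance, assuming the stock Jaffle Shop models have been built, you could inspect a few columns of the customers model (column names taken from the standard project; adjust if you've modified it):

SELECT customer_id, number_of_orders, customer_lifetime_value
FROM customers
LIMIT 5;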
Chapter 4: Conclusion
I've created a GitHub repository where you can clone the containerized version of the Jaffle Shop dbt project and run it locally. You can find the project along with the code shared in this tutorial at the following link.
In conclusion, dbt is rapidly becoming a vital component of modern data stacks. If you're just starting with dbt, I highly recommend exploring the Jaffle Shop project. It’s a self-contained project designed by dbt Labs for testing and experimentation.
Remember, dbt is widely used by data analysts and analytics engineers, and it typically requires a database connection. However, many analysts may find it challenging to set up and initialize a local database.
This guide aims to help you get your dbt project and database operational as swiftly as possible. If you encounter any issues while running the project, feel free to reach out in the comments, and I’ll do my best to assist you with debugging your code and configuration.
👉 Subscribe to Data Pipeline, a newsletter dedicated to Data Engineering.