
Running the Jaffle Shop dbt Project in a Containerized Environment


Chapter 1: Introduction to the Jaffle Shop Project

If you’re new to the data build tool (dbt), you may have encountered the Jaffle Shop project, dbt’s canonical example project. This fictional e-commerce platform lets users transform raw data from an application database into structured customer and order models, ready for analytics.

For more information, check out the Jaffle Shop GitHub project.

A significant challenge with the Jaffle Shop project is that it requires users, particularly those new to dbt, to configure and manage a local database before any dbt models can be applied. In this tutorial, I will guide you through creating a containerized version of the project using Docker. This setup deploys a Postgres instance and configures the dbt project to talk to that database. I’ll also share a GitHub project link to help you quickly get all services operational.

To stay updated, subscribe to Data Pipeline, a newsletter focused on Data Engineering.

Section 1.1: Setting Up Docker

To kick things off, we need to define the services we intend to operate through Docker. We will begin by creating a docker-compose.yml file that outlines two services: one for the Postgres database and another for our custom service, which will be defined next using a Dockerfile.

version: "3.9"

services:
  postgres:
    container_name: postgres
    image: postgres:15.2-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - "5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  dbt:
    container_name: dbt
    build: .
    image: dbt-jaffle-shop
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

This file pins the Docker Compose file format (3.9) and outlines the settings for the postgres and dbt services. The postgres service is based on the official PostgreSQL Docker image, version 15.2-alpine. It designates the container name, exposes port 5432 (the standard PostgreSQL port), and sets environment variables for the username and password. The health check runs pg_isready at regular intervals to verify that the database is accepting connections.

The dbt service uses a Docker image built from the current directory and mounts it as a volume within the container. It is configured to start only when the postgres service is healthy.

Section 1.2: Creating the Dockerfile

To containerize the Jaffle Shop project, we must establish a Dockerfile that installs the requisite dependencies for both Python and dbt, ensuring that the container remains active once the environment is initialized.

FROM --platform=linux/amd64 python:3.10-slim-buster

RUN apt-get update \
  && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.2.0
RUN pip install pytz

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir ./profiles && sleep infinity

In this Dockerfile, we start with a Python base image, update the package manager, and install required libraries. We also set the working directory and install the dbt Postgres adapter alongside any other necessary dependencies.
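If you want to iterate on the Dockerfile on its own, you can also build and inspect the image without Compose. A quick sketch, reusing the image tag from docker-compose.yml:

```shell
# Build the dbt image from the Dockerfile in the current directory
docker build -t dbt-jaffle-shop .

# Start a throwaway container to confirm dbt and its adapter are installed
docker run --rm dbt-jaffle-shop dbt --version
```

This is purely for debugging the image; in the normal workflow, docker-compose handles the build for you.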

Chapter 2: Configuring dbt with Postgres

To work with dbt effectively, we will utilize the dbt Command Line Interface (CLI). A directory containing a dbt_project.yml file is identified as a dbt project by the CLI. We'll create this file to specify basic configurations, including the project name and the profile we will define in the next step.

name: 'jaffle_shop'
config-version: 2
version: '0.1'
profile: 'jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  jaffle_shop:
    materialized: table
    staging:
      materialized: view

Now we will create a profiles.yml file to store dbt profiles, which consist of targets specifying the connection details for the database or data warehouse.

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1

This file outlines a profile named jaffle_shop with connection settings for a Postgres database running in a Docker container named postgres.
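Before building any models, you can confirm that dbt can actually reach the database with dbt's built-in connection check. A minimal sketch, assuming the container name dbt from the docker-compose.yml defined earlier:

```shell
# Run dbt's connection test inside the dbt container; it validates
# dbt_project.yml, profiles.yml, and the connection to the postgres host
docker exec dbt dbt debug --profiles-dir ./profiles
```

If the output ends with all checks passed, the profile and the Postgres service are wired up correctly.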


Chapter 3: Building and Running the Docker Services

To set everything in motion, we will build and launch our Docker services using the following commands:

$ docker-compose build

$ docker-compose up

These commands will initiate a Postgres instance and construct the dbt resources for the Jaffle Shop project as specified in the repository. The containers will stay active, allowing you to query the Postgres database and the tables created from dbt models, as well as execute further dbt commands via the CLI.
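If you prefer to keep your terminal free, the two steps can also be combined and detached; docker-compose logs then lets you follow the dbt build output as seeds, models, and tests run:

```shell
# Build images (if needed) and start both services in the background
docker-compose up --build -d

# Follow the dbt container's output to watch the build progress
docker-compose logs -f dbt
```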

Section 3.1: Executing dbt Commands

Once the dbt container has built the specified models, you can access it to execute dbt commands, whether for new or modified models. First, list all active containers:

$ docker ps

Copy the ID of the dbt container and enter it with the following command:

$ docker exec -it <container_id> /bin/bash

Now, you have shell access to the container, enabling you to run dbt commands.

# Install dbt dependencies (if necessary)

dbt deps

# Build seeds

dbt seed --profiles-dir profiles

# Build data models

dbt run --profiles-dir profiles

# Build snapshots

dbt snapshot --profiles-dir profiles

# Run tests

dbt test --profiles-dir profiles

Since we have mounted the local directory to the running container, any modifications in the local directory will reflect immediately in the container.
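As a concrete example of that workflow, after editing a model file locally you can rebuild just that model from inside the container. The model name customers below is one of the standard Jaffle Shop models:

```shell
# Rebuild a single model after editing it locally
dbt run --select customers --profiles-dir profiles

# Or rebuild the model plus everything downstream of it
dbt run --select customers+ --profiles-dir profiles
```

The --select flag accepts dbt's node selection syntax, so you can target individual models, folders, or tags instead of rebuilding the whole project.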

Section 3.2: Querying dbt Models

You can also query the Postgres database and the dbt models or snapshots created within it. To do this, enter the running Postgres container:

$ docker ps

$ docker exec -it <container_id> /bin/bash

Use psql, a terminal-based interface for PostgreSQL, to query the database:

$ psql -U postgres

Here are the psql meta-commands to list tables and views (note the leading backslash):

postgres=# \dt

postgres=# \dv

You can now execute SELECT queries on dbt models:

SELECT * FROM <model_name>;
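You can also skip the interactive session entirely and run a query in one step with psql's -c flag. A sketch, assuming the customers model from the Jaffle Shop project and the default public schema:

```shell
# Run a single query non-interactively inside the postgres container
docker exec -it postgres psql -U postgres -c "SELECT * FROM customers LIMIT 5;"
```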


Chapter 4: Conclusion

I've created a GitHub repository where you can clone the containerized version of the Jaffle Shop dbt project and run it locally. You can find the project along with the code shared in this tutorial at the following link.

In conclusion, dbt is rapidly becoming a vital component of modern data stacks. If you're just starting with dbt, I highly recommend exploring the Jaffle Shop project. It’s a self-contained project designed by dbt Labs for testing and experimentation.

Remember, dbt is widely used by data analysts and analytics engineers, and it typically requires a database connection. However, many analysts may find it challenging to set up and initialize a local database.

This guide aims to help you get your dbt project and database operational as swiftly as possible. If you encounter any issues while running the project, feel free to reach out in the comments, and I’ll do my best to assist you with debugging your code and configuration.

