Notes taken by Horeb S.

In this section, we’ll build ETL pipelines for the Yellow and Green Taxi data published by NYC’s Taxi and Limousine Commission (TLC).

Getting started

The Docker Compose file looks like this:

docker-compose.yml

volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local

services:
  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  kestra:
    image: kestra/kestra:develop
    pull_policy: always
    user: "root"
    command: server standalone
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          server:
            basicAuth:
              enabled: false
              username: "[email protected]" # it must be a valid email address
              password: kestra
          repository:
            type: postgres
          storage:
            type: local
            local:
              basePath: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmpDir:
              path: /tmp/kestra-wd/tmp
          url: http://localhost:8080/
    ports:
      - "8080:8080"
      - "8081:8081"
    depends_on:
      postgres:
        condition: service_started

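This docker-compose file defines two services: a PostgreSQL database and the Kestra server. The PostgreSQL service is configured with its credentials and a health check based on pg_isready. The Kestra service carries its own configuration: the database connection, basic authentication (disabled here), local storage, and a repository and queue that are both backed by PostgreSQL. Port 8080 is published for the web UI and API, and port 8081 for Kestra’s management endpoints (health and metrics). Note that depends_on uses service_started, so Kestra only waits for the PostgreSQL container to start, not for its health check to pass.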

Let's break down the key components:

Let’s take a closer look at the Kestra configuration

Now, let’s launch Kestra in detached mode with docker compose up -d.
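
Because the containers start in the background, the UI may take a little while to respond. A minimal sketch of a readiness check follows, assuming the UI is served at http://localhost:8080/ as in the port mapping above. The helper name wait_for_http and its default timings are hypothetical and not part of Kestra or Docker.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers with any HTTP response.

    Returns True as soon as a response arrives, or False once `timeout`
    seconds have passed without one. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered (even with a 4xx/5xx status), so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the service is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_http("http://localhost:8080/") right after docker compose up -d should return True once the Kestra UI starts answering requests. If it returns False, docker compose logs kestra is a reasonable next place to look.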

ETL Pipelines in Kestra: Detailed Walkthrough

Executing an introductory flow

Flow: 01_getting_started_data_pipeline.yaml