Structure:
Source: What, exactly, is dbt?(https://www.getdbt.com/blog/what-exactly-is-dbt/)
dbt (data build tool): A command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively.
Note: I use the dbt tool in PrimeHub Notebook.
Here, we will do the following things:
Step 1: Add the postgresql docker-compose yaml file:
Step 2: Start the docker
$ docker-compose up -d
Reference:
Example database Github link:
Database Information:
Step 1: Build and push the docker image:
FROM infuseai/docker-stacks:tensorflow-notebook-v2-5-0-63fdf50a
RUN pip install piperider dbt-postgres
Step 2: Open jupyter notebook and initial the dbt
$ dbt init
02:54:09 Running with dbt=1.1.1
Enter a name for your project (letters, digits, underscore): dbt_test
Which database would you like to use?
[1] postgres
(Don't see the one you want? <https://docs.getdbt.com/docs/available-adapters>)
Enter a number: 1
02:54:16
Your new dbt project "dbt_test" was created!
For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:
<https://docs.getdbt.com/docs/configure-your-profile>
One more thing:
Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:
<https://community.getdbt.com/>
Happy modeling!
Step 3: Modify the database connection information.
$ vim ~/.dbt/profiles.yml
dbt_test:
outputs:
dev:
type: postgres
threads: 1
host: [host]
port: 5432
user: world
pass: world123
dbname: world-db
schema: public
prod:
type: postgres
threads: [1 or more]
host: [host]
port: [port]
user: [prod_username]
pass: [prod_password]
dbname: [dbname]
schema: [prod_schema]
target: dev
Step 4: Check the connection:
$ dbt debug
Configuration:
profiles.yml file [OK found and valid]
dbt_project.yml file [OK found and valid]
Required dependencies:
- git [OK found]
Connection:
host: [host]
port: 5432
user: world
database: world-db
schema: public
search_path: None
keepalives_idle: 0
sslmode: None
Connection test: [OK connection ok]
All checks passed!
Step 5: Modify the files in models folder:
world_city.sql
SELECT name, country_code, district, population
FROM public.city
WHERE country_code = 'NLD'
schema.yml
version: 2
models:
- name: world_city
description: "A starter dbt model for world database"
columns:
- name: world_city
description: "The primary key for this table"
tests:
- unique
- not_null