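I have two Dockerfile examples: one works as expected and the other does not. The main difference between them is the base image.
Dockerfile for the simple Python-based image: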
# syntax = docker/dockerfile:experimental
FROM python:3.9-slim-bullseye
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN --mount=type=cache,mode=0777,target=/root/.cache/pypoetry poetry install
Dockerfile extending the Airflow base image:
# syntax = docker/dockerfile:experimental
FROM apache/airflow:2.3.3-python3.9
USER root
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
USER airflow
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN poetry config cache-dir /opt/airflow/.cache/pypoetry
RUN --mount=type=cache,uid=50000,mode=0777,target=/opt/airflow/.cache/pypoetry poetry install
Before building either Dockerfile, run poetry lock in the same folder as the pyproject.toml file!
pyproject.toml file:
[tool.poetry]
name = "Airflow-test"
version = "0.1.0"
description = ""
authors = ["Lorem ipsum"]
[tool.poetry.dependencies]
python = "~3.9"
apache-airflow = { version = "2.3.3", extras = ["amazon", "crypto", "celery", "postgres", "hive", "jdbc", "mysql", "ssh", "slack", "statsd"] }
prometheus_client = "^0.8.0"
isodate = "0.6.1"
dacite = "1.6.0"
sqlparse = "^0.3.1"
python3-openid = "^3.1.0"
flask-appbuilder = ">=3.4.3"
alembic = ">=1.7.7"
apache-airflow-providers-google = "^8.1.0"
apache-airflow-providers-databricks = "^3.0.0"
apache-airflow-providers-amazon = "^4.0.0"
pendulum = "^2.1.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
To build the image, this is the command I use:
DOCKER_BUILDKIT=1 docker build --progress=plain -t airflow-test -f Dockerfile .
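For both images, the first build has to download all the dependencies during poetry install. The interesting part is the second build: the Python-based image builds much faster because the dependencies are already cached, but the Airflow-based image tries to download all 200 dependencies again.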
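As I understand it, specifying --mount=type=cache stores that directory in the image repository so that it can be reused the next time the image is built. It also keeps the directory out of the final image, which trims the image size.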
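One thing I could check is whether the cache directory is actually writable and populated for the user running poetry install. Below is a minimal sketch of such a check; the script name check_cache.py and the function describe_cache_dir are hypothetical names of mine, and it assumes the script is copied into the image and run in a RUN step that uses the same --mount=type=cache options.

```python
import os
import stat
import sys


def describe_cache_dir(path):
    """Summarise ownership, permissions, and contents of a directory."""
    info = os.stat(path)
    # Count every regular file below the directory, at any depth.
    file_count = sum(len(files) for _, _, files in os.walk(path))
    return {
        "path": path,
        "owner_uid": info.st_uid,
        "mode": oct(stat.S_IMODE(info.st_mode)),
        "current_uid": os.getuid(),
        "writable_by_current_user": os.access(path, os.W_OK),
        "file_count": file_count,
    }


if __name__ == "__main__":
    # Pass the cache mount target as the first argument; defaults to ".".
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in describe_cache_dir(target).items():
        print(f"{key}: {value}")
```

For example, python check_cache.py /opt/airflow/.cache/pypoetry in the Airflow image. If file_count stays at 0 on the second build, or the directory is not writable by the current user, the cache is probably not being filled where Poetry looks for it.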
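How do the dependencies show up when the image is run? If I run docker run -it --user 50000 --entrypoint /bin/bash image, a simple Python import works on the Airflow image but not on the Python image. When and how are the dependencies re-attached to the image?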
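To see where an import resolves from inside each container, something like the following could be run. This is only a sketch: locate_module is a hypothetical helper name, and airflow is used as the default module to look up simply because it is the main dependency here.

```python
import importlib.util
import site
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Module names can be passed as arguments; defaults to "airflow".
    names = sys.argv[1:] or ["airflow"]
    print("interpreter:", sys.executable)
    print("user site:", site.getusersitepackages())
    for name in names:
        print(name, "->", locate_module(name))
```

Comparing the printed paths between the two images should show whether the packages live in the interpreter's site-packages, in the user site directory, or are missing entirely for the user the container runs as.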
If you want to try it out, here is a dummy project that can be cloned and played with locally: https://github.com/ioangrozea/Docker-dummy