I deployed Airflow on Kubernetes using the official Helm chart, with the KubernetesExecutor and git-sync.
I am using separate Docker images for my webserver and my workers; each DAG gets its own Docker image. I am running into DAG import errors on the Airflow home page. For example, if one of my DAGs uses pandas, I get:
Broken DAG: [/opt/airflow/dags/repo/dags/airflow_demo/ieso.py] Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/dags/repo/dags/project1/dag1.py", line 7, in <module>
    from pandas import read_parquet
ModuleNotFoundError: No module named 'pandas'
I don't have pandas installed on the webserver or scheduler Docker images because, if I understand correctly, you shouldn't install individual DAG dependencies on those. I get the same error when running airflow dags list-import-errors on the scheduler pod. I do have pandas installed on the worker pod, but the DAG never runs there, because these import errors prevent it from being discovered in the first place.
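The failure comes from how DAG discovery works: Airflow finds DAGs by importing each file in the DAGs folder, so every module-level statement, including "from pandas import read_parquet", runs on the machine doing the parsing (here the scheduler). The sketch below is a simplified stdlib simulation of that behaviour, not Airflow's actual parsing code. The module name hypothetical_worker_only_pkg is a made-up stand-in for a package that exists only on the worker image, and extract() is a hypothetical task function.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for a package (like pandas) that is installed on the
# worker image but NOT on the scheduler/webserver images.
MISSING = "hypothetical_worker_only_pkg"

# Import at module level: executes as soon as the file is imported/parsed.
TOP_LEVEL_IMPORT = f"""
from {MISSING} import read_parquet

def extract():
    return read_parquet("data.parquet")
"""

# Import inside the function: executes only when extract() is actually called.
DEFERRED_IMPORT = f"""
def extract():
    from {MISSING} import read_parquet
    return read_parquet("data.parquet")
"""


def try_parse(source: str, name: str) -> str:
    """Import a DAG-like file by executing its module body, as a parser would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # runs all top-level statements
        except ModuleNotFoundError as exc:
            return f"import error: {exc}"
        return "parsed OK"


top_level_result = try_parse(TOP_LEVEL_IMPORT, "dag_top_level")
deferred_result = try_parse(DEFERRED_IMPORT, "dag_deferred")
print(top_level_result)  # import error: No module named 'hypothetical_worker_only_pkg'
print(deferred_result)   # parsed OK
```

In this simulation the second file parses because its import only executes when extract() is called, which in a KubernetesExecutor setup would presumably happen inside the worker pod that has the package.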
How do I make Airflow discover this DAG without installing pandas on either the scheduler or the webserver? I know installing it on both would fix this, but I don't want to do it that way.