## What is DolphinScheduler? DolphinScheduler is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing. GitHub URL: https://github.com/apache/incubator-dolphinscheduler Official Website: https://dolphinscheduler.apache.org ![DolphinScheduler](https://dolphinscheduler.apache.org/img/hlogo_colorful.svg) [![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md) [![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md) ## Prerequisites - [Docker](https://docs.docker.com/engine/) 1.13.1+ - [Docker Compose](https://docs.docker.com/compose/) 1.11.0+ ## How to use this docker image #### You can start a dolphinscheduler by docker-compose (recommended) ``` $ docker-compose -f ./docker/docker-swarm/docker-compose.yml up -d ``` The default **postgres** user `root`, postgres password `root` and database `dolphinscheduler` are created in the `docker-compose.yml`. The default **zookeeper** is created in the `docker-compose.yml`. Access the Web UI: http://192.168.xx.xx:12345/dolphinscheduler The default username is `admin` and the default password is `dolphinscheduler123` > **Tip**: For quick start in docker, you can create a tenant named `ds` and associate the user `admin` with the tenant `ds` #### Or via Environment Variables **`DATABASE_HOST`** **`DATABASE_PORT`** **`DATABASE_DATABASE`** **`ZOOKEEPER_QUORUM`** You can specify **existing postgres and zookeeper service**. Example: ``` $ docker run -d --name dolphinscheduler \ -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \ -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ -p 12345:12345 \ apache/dolphinscheduler:latest all ``` Access the Web UI:http://192.168.xx.xx:12345/dolphinscheduler #### Or start a standalone dolphinscheduler server You can start a standalone dolphinscheduler server. * Create a **local volume** for resource storage, For example: ``` docker volume create dolphinscheduler-resource-local ``` * Start a **master server**, For example: ``` $ docker run -d --name dolphinscheduler-master \ -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \ -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ apache/dolphinscheduler:latest master-server ``` * Start a **worker server** (including **logger server**), For example: ``` $ docker run -d --name dolphinscheduler-worker \ -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \ -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ -e ALERT_LISTEN_HOST="dolphinscheduler-alert" \ -v dolphinscheduler-resource-local:/dolphinscheduler \ apache/dolphinscheduler:latest worker-server ``` * Start a **api server**, For example: ``` $ docker run -d --name dolphinscheduler-api \ -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \ -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ -v dolphinscheduler-resource-local:/dolphinscheduler \ -p 12345:12345 \ apache/dolphinscheduler:latest api-server ``` * Start a **alert server**, For example: ``` $ docker run -d --name dolphinscheduler-alert \ -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ apache/dolphinscheduler:latest alert-server ``` **Note**: You must be specify `DATABASE_HOST` `DATABASE_PORT` `DATABASE_DATABASE` `DATABASE_USERNAME` `DATABASE_PASSWORD` `ZOOKEEPER_QUORUM` when start a standalone dolphinscheduler server. ## How to build a docker image You can build a docker image in A Unix-like operating system, You can also build it in Windows operating system. In Unix-Like, Example: ```bash $ cd path/incubator-dolphinscheduler $ sh ./docker/build/hooks/build ``` In Windows, Example: ```bat C:\incubator-dolphinscheduler>.\docker\build\hooks\build.bat ``` Please read `./docker/build/hooks/build` `./docker/build/hooks/build.bat` script files if you don't understand ## Environment Variables The DolphinScheduler Docker container is configured through environment variables, and the default value will be used if an environment variable is not set. **`DATABASE_TYPE`** This environment variable sets the type for database. The default value is `postgresql`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_DRIVER`** This environment variable sets the type for database. The default value is `org.postgresql.Driver`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_HOST`** This environment variable sets the host for database. The default value is `127.0.0.1`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_PORT`** This environment variable sets the port for database. The default value is `5432`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_USERNAME`** This environment variable sets the username for database. The default value is `root`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_PASSWORD`** This environment variable sets the password for database. The default value is `root`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_DATABASE`** This environment variable sets the database for database. The default value is `dolphinscheduler`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`DATABASE_PARAMS`** This environment variable sets the database for database. The default value is `characterEncoding=utf8`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`. **`HADOOP_HOME`** This environment variable sets `HADOOP_HOME`. The default value is `/opt/soft/hadoop`. **`HADOOP_CONF_DIR`** This environment variable sets `HADOOP_CONF_DIR`. The default value is `/opt/soft/hadoop/etc/hadoop`. **`SPARK_HOME1`** This environment variable sets `SPARK_HOME1`. The default value is `/opt/soft/spark1`. **`SPARK_HOME2`** This environment variable sets `SPARK_HOME2`. The default value is `/opt/soft/spark2`. **`PYTHON_HOME`** This environment variable sets `PYTHON_HOME`. The default value is `/usr`. **`JAVA_HOME`** This environment variable sets `JAVA_HOME`. The default value is `/usr/lib/jvm/java-1.8-openjdk`. **`HIVE_HOME`** This environment variable sets `HIVE_HOME`. The default value is `/opt/soft/hive`. **`FLINK_HOME`** This environment variable sets `FLINK_HOME`. The default value is `/opt/soft/flink`. **`DATAX_HOME`** This environment variable sets `DATAX_HOME`. The default value is `/opt/soft/datax`. **`DOLPHINSCHEDULER_DATA_BASEDIR_PATH`** User data directory path, self configuration, please make sure the directory exists and have read write permissions. The default value is `/tmp/dolphinscheduler` **`DOLPHINSCHEDULER_OPTS`** This environment variable sets java options. The default value is empty. **`RESOURCE_STORAGE_TYPE`** This environment variable sets resource storage type for dolphinscheduler like `HDFS`, `S3`, `NONE`. The default value is `HDFS`. **`RESOURCE_UPLOAD_PATH`** This environment variable sets resource store path on HDFS/S3 for resource storage. The default value is `/dolphinscheduler`. **`FS_DEFAULT_FS`** This environment variable sets fs.defaultFS for resource storage like `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler`. The default value is `file:///`. **`FS_S3A_ENDPOINT`** This environment variable sets s3 endpoint for resource storage. The default value is `s3.xxx.amazonaws.com`. **`FS_S3A_ACCESS_KEY`** This environment variable sets s3 access key for resource storage. The default value is `xxxxxxx`. **`FS_S3A_SECRET_KEY`** This environment variable sets s3 secret key for resource storage. The default value is `xxxxxxx`. **`ZOOKEEPER_QUORUM`** This environment variable sets zookeeper quorum for `master-server` and `worker-serverr`. The default value is `127.0.0.1:2181`. **Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`. **`ZOOKEEPER_ROOT`** This environment variable sets zookeeper root directory for dolphinscheduler. The default value is `/dolphinscheduler`. **`MASTER_EXEC_THREADS`** This environment variable sets exec thread num for `master-server`. The default value is `100`. **`MASTER_EXEC_TASK_NUM`** This environment variable sets exec task num for `master-server`. The default value is `20`. **`MASTER_HEARTBEAT_INTERVAL`** This environment variable sets heartbeat interval for `master-server`. The default value is `10`. **`MASTER_TASK_COMMIT_RETRYTIMES`** This environment variable sets task commit retry times for `master-server`. The default value is `5`. **`MASTER_TASK_COMMIT_INTERVAL`** This environment variable sets task commit interval for `master-server`. The default value is `1000`. **`MASTER_MAX_CPULOAD_AVG`** This environment variable sets max cpu load avg for `master-server`. The default value is `100`. **`MASTER_RESERVED_MEMORY`** This environment variable sets reserved memory for `master-server`. The default value is `0.1`. **`MASTER_LISTEN_PORT`** This environment variable sets port for `master-server`. The default value is `5678`. **`WORKER_EXEC_THREADS`** This environment variable sets exec thread num for `worker-server`. The default value is `100`. **`WORKER_HEARTBEAT_INTERVAL`** This environment variable sets heartbeat interval for `worker-server`. The default value is `10`. **`WORKER_MAX_CPULOAD_AVG`** This environment variable sets max cpu load avg for `worker-server`. The default value is `100`. **`WORKER_RESERVED_MEMORY`** This environment variable sets reserved memory for `worker-server`. The default value is `0.1`. **`WORKER_LISTEN_PORT`** This environment variable sets port for `worker-server`. The default value is `1234`. **`WORKER_GROUPS`** This environment variable sets groups for `worker-server`. The default value is `default`. **`WORKER_HOST_WEIGHT`** This environment variable sets weight for `worker-server`. The default value is `100`. **`ALERT_LISTEN_HOST`** This environment variable sets the host of `alert-server` for `worker-server`. The default value is `127.0.0.1`. **`ALERT_PLUGIN_DIR`** This environment variable sets the alert plugin directory for `alert-server`. The default value is `lib/plugin/alert`. ## Initialization scripts If you would like to do additional initialization in an image derived from this one, add one or more environment variable under `/root/start-init-conf.sh`, and modify template files in `/opt/dolphinscheduler/conf/*.tpl`. For example, to add an environment variable `API_SERVER_PORT` in `/root/start-init-conf.sh`: ``` export API_SERVER_PORT=5555 ``` and to modify `/opt/dolphinscheduler/conf/application-api.properties.tpl` template file, add server port: ``` server.port=${API_SERVER_PORT} ``` `/root/start-init-conf.sh` will dynamically generate config file: ```sh echo "generate dolphinscheduler config" ls ${DOLPHINSCHEDULER_HOME}/conf/ | grep ".tpl" | while read line; do eval "cat << EOF $(cat ${DOLPHINSCHEDULER_HOME}/conf/${line}) EOF " > ${DOLPHINSCHEDULER_HOME}/conf/${line%.*} done ``` ## FAQ ### How to stop dolphinscheduler by docker-compose? Stop containers: ``` docker-compose stop ``` Stop containers and remove containers, networks and volumes: ``` docker-compose down -v ``` ### How to deploy dolphinscheduler on Docker Swarm? Assuming that the Docker Swarm cluster has been created (If there is no Docker Swarm cluster, please refer to [create-swarm](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/)) Start a stack named dolphinscheduler ``` docker stack deploy -c docker-stack.yml dolphinscheduler ``` Stop and remove the stack named dolphinscheduler ``` docker stack rm dolphinscheduler ``` ### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL? > Because of the commercial license, we cannot directly use the driver and client of MySQL. > > If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows. 1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`) 2. Create a new `Dockerfile` to add MySQL driver and client: ``` FROM apache/dolphinscheduler:latest COPY mysql-connector-java-5.1.49.jar /opt/dolphinscheduler/lib RUN apk add --update --no-cache mysql-client ``` 3. Build a new docker image including MySQL driver and client: ``` docker build -t apache/dolphinscheduler:mysql . ``` 4. Modify all `image` fields to `apache/dolphinscheduler:mysql` in `docker-compose.yml` > If you want to deploy dolphinscheduler on Docker Swarm, you need modify `docker-stack.yml` 5. Comment the `dolphinscheduler-postgresql` block in `docker-compose.yml` 6. Add `dolphinscheduler-mysql` service in `docker-compose.yml` (**Optional**, you can directly use a external MySQL database) 7. Modify all DATABASE environments in `docker-compose.yml` ``` DATABASE_TYPE: mysql DATABASE_DRIVER: com.mysql.jdbc.Driver DATABASE_HOST: dolphinscheduler-mysql DATABASE_PORT: 3306 DATABASE_USERNAME: root DATABASE_PASSWORD: root DATABASE_DATABASE: dolphinscheduler DATABASE_PARAMS: useUnicode=true&characterEncoding=UTF-8 ``` > If you have added `dolphinscheduler-mysql` service in `docker-compose.yml`, just set `DATABASE_HOST` to `dolphinscheduler-mysql` 8. Run a dolphinscheduler (See **How to use this docker image**) ### How to support MySQL datasource in `Datasource manage`? > Because of the commercial license, we cannot directly use the driver of MySQL. > > If you want to add MySQL datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows. 1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`) 2. Create a new `Dockerfile` to add MySQL driver: ``` FROM apache/dolphinscheduler:latest COPY mysql-connector-java-5.1.49.jar /opt/dolphinscheduler/lib ``` 3. Build a new docker image including MySQL driver: ``` docker build -t apache/dolphinscheduler:mysql-driver . ``` 4. Modify all `image` fields to `apache/dolphinscheduler:mysql-driver` in `docker-compose.yml` > If you want to deploy dolphinscheduler on Docker Swarm, you need modify `docker-stack.yml` 5. Run a dolphinscheduler (See **How to use this docker image**) 6. Add a MySQL datasource in `Datasource manage` ### How to support Oracle datasource in `Datasource manage`? > Because of the commercial license, we cannot directly use the driver of Oracle. > > If you want to add Oracle datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows. 1. Download the Oracle driver [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (such as `ojdbc8-19.9.0.0.jar`) 2. Create a new `Dockerfile` to add Oracle driver: ``` FROM apache/dolphinscheduler:latest COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib ``` 3. Build a new docker image including Oracle driver: ``` docker build -t apache/dolphinscheduler:oracle-driver . ``` 4. Modify all `image` fields to `apache/dolphinscheduler:oracle-driver` in `docker-compose.yml` > If you want to deploy dolphinscheduler on Docker Swarm, you need modify `docker-stack.yml` 5. Run a dolphinscheduler (See **How to use this docker image**) 6. Add a Oracle datasource in `Datasource manage` For more information please refer to the [incubator-dolphinscheduler](https://github.com/apache/incubator-dolphinscheduler.git) documentation.