DolphinScheduler is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing.
DolphinScheduler is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing.
The Dolphin Scheduler image uses several environment variables which are easy to miss. While none of the variables are required, they may significantly aid you in using the image.
The DolphinScheduler Docker container is configured through environment variables, and the default value will be used if an environment variable is not set.
**`DATABASE_TYPE`**
@ -168,9 +168,41 @@ This environment variable sets the database for database. The default value is `
**Note**: You must be specify it when start a standalone dolphinscheduler server. Like `master-server`, `worker-server`, `api-server`, `alert-server`.
**`DOLPHINSCHEDULER_ENV_PATH`**
**`HADOOP_HOME`**
This environment variable sets `HADOOP_HOME`. The default value is `/opt/soft/hadoop`.
**`HADOOP_CONF_DIR`**
This environment variable sets `HADOOP_CONF_DIR`. The default value is `/opt/soft/hadoop/etc/hadoop`.
**`SPARK_HOME1`**
This environment variable sets `SPARK_HOME1`. The default value is `/opt/soft/spark1`.
**`SPARK_HOME2`**
This environment variable sets `SPARK_HOME2`. The default value is `/opt/soft/spark2`.
**`PYTHON_HOME`**
This environment variable sets `PYTHON_HOME`. The default value is `/usr/bin/python`.
**`JAVA_HOME`**
This environment variable sets `JAVA_HOME`. The default value is `/usr/lib/jvm/java-1.8-openjdk`.
This environment variable sets the runtime environment for task. The default value is `/opt/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
**`HIVE_HOME`**
This environment variable sets `HIVE_HOME`. The default value is `/opt/soft/hive`.
**`FLINK_HOME`**
This environment variable sets `FLINK_HOME`. The default value is `/opt/soft/flink`.
**`DATAX_HOME`**
This environment variable sets `DATAX_HOME`. The default value is `/opt/soft/datax/bin/datax.py`.
**`DOLPHINSCHEDULER_DATA_BASEDIR_PATH`**
@ -268,7 +300,7 @@ This environment variable sets port for `worker-server`. The default value is `1
**`WORKER_GROUPS`**
This environment variable sets group for `worker-server`. The default value is `default`.
This environment variable sets groups for `worker-server`. The default value is `default`.
**`WORKER_WEIGHT`**
@ -308,3 +340,142 @@ EOF
" > ${DOLPHINSCHEDULER_HOME}/conf/${line%.*}
done
```
## FAQ
### How to stop dolphinscheduler by docker-compose?
Stop containers:
```
docker-compose stop
```
Stop containers and remove containers, networks and volumes:
```
docker-compose down -v
```
### How to deploy dolphinscheduler on Docker Swarm?
Assuming that the Docker Swarm cluster has been created (If there is no Docker Swarm cluster, please refer to [create-swarm](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/))
### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL?
> Because of the commercial license, we cannot directly use the driver and client of MySQL.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`)
2. Create a new `Dockerfile` to add MySQL driver and client:
> If you have added `dolphinscheduler-mysql` service in `docker-compose.yml`, just set `DATABASE_HOST` to `dolphinscheduler-mysql`
8. Run a dolphinscheduler (See **How to use this docker image**)
### How to support MySQL datasource in `Datasource manage`?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to add MySQL datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`)
[DolphinScheduler](https://dolphinscheduler.apache.org) is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing.
[DolphinScheduler](https://dolphinscheduler.apache.org) is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing.
## Introduction
This chart bootstraps a [DolphinScheduler](https://dolphinscheduler.apache.org) distributed deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
This chart bootstraps a [DolphinScheduler](https://dolphinscheduler.apache.org) distributed deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
## Prerequisites
- Helm 3.1.0+
- Kubernetes 1.12+
- [Helm](https://helm.sh/) 3.1.0+
- [Kubernetes](https://kubernetes.io/) 1.12+
- PV provisioner support in the underlying infrastructure
These commands deploy Dolphin Scheduler on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation.
To install the chart with a namespace named `test`:
```bash
$ helm install dolphinscheduler . -n test
```
> **Tip**: If a namespace named `test` is used, the option `-n test` needs to be added to the `helm` and `kubectl` command
These commands deploy DolphinScheduler on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation.
> **Tip**: List all releases using `helm list`
## Access DolphinScheduler UI
If `ingress.enabled` in `values.yaml` is set to `true`, you just access `http://${ingress.host}/dolphinscheduler` in browser.
> **Tip**: If there is a problem with ingress access, please contact the Kubernetes administrator and refer to the [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
Otherwise, you need to execute port-forward command like:
> **Note**: Deleting the PVC's will delete all data as well. Please be cautious before doing it.
## Configuration
The following tables lists the configurable parameters of the Dolphins Scheduler chart and their default values.
The Configuration file is `values.yaml`, and the following tables lists the configurable parameters of the DolphinScheduler chart and their default values.
| `timezone` | World time and date for cities in all time zones | `Asia/Shanghai` |
| `image.registry` | Docker image registry for the Dolphins Scheduler | `docker.io` |
| `image.repository` | Docker image repository for the Dolphins Scheduler | `dolphinscheduler` |
| `image.tag` | Docker image version for the Dolphins Scheduler | `1.2.1` |
| `image.imagePullPolicy` | Image pull policy. One of Always, Never, IfNotPresent | `IfNotPresent` |
| `image.pullSecres` | PullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images | `[]` |
| | | |
| `postgresql.enabled` | If not exists external PostgreSQL, by default, the Dolphins Scheduler will use a internal PostgreSQL | `true` |
| `image.registry` | Docker image registry for the DolphinScheduler | `docker.io` |
| `image.repository` | Docker image repository for the DolphinScheduler | `dolphinscheduler` |
| `image.tag` | Docker image version for the DolphinScheduler | `latest` |
| `image.pullPolicy` | Image pull policy. One of Always, Never, IfNotPresent | `IfNotPresent` |
| `image.pullSecrets`| Image pull secrets. An optional list of references to secrets in the same namespace to use for pulling any of the images | `[]` |
| | | |
| `postgresql.enabled` | If not exists external PostgreSQL, by default, the DolphinScheduler will use a internal PostgreSQL | `true` |
| `postgresql.postgresqlUsername` | The username for internal PostgreSQL | `root` |
| `postgresql.postgresqlPassword` | The password for internal PostgreSQL | `root` |
| `postgresql.postgresqlDatabase` | The database for internal PostgreSQL | `dolphinscheduler` |
| `postgresql.persistence.enabled` | Set `postgresql.persistence.enabled` to `true` to mount a new volume for internal PostgreSQL | `false` |
| `postgresql.persistence.storageClass` | PostgreSQL data Persistent Volume Storage Class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `externalDatabase.type` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database type will use it. | `postgresql` |
| `externalDatabase.driver` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database driver will use it. | `org.postgresql.Driver` |
| `externalDatabase.host` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database host will use it. | `localhost` |
| `externalDatabase.port` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database port will use it. | `5432` |
| `externalDatabase.username` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database username will use it. | `root` |
| `externalDatabase.password` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database password will use it. | `root` |
| `externalDatabase.database` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database database will use it. | `dolphinscheduler` |
| `externalDatabase.params` | If exists external PostgreSQL, and set `postgresql.enable` value to false. Dolphins Scheduler's database params will use it. | `characterEncoding=utf8` |
| | | |
| `zookeeper.enabled` | If not exists external Zookeeper, by default, the Dolphin Scheduler will use a internal Zookeeper | `true` |
| `zookeeper.taskQueue` | Specify task queue for `master` and `worker` | `zookeeper` |
| `externalDatabase.type` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database type will use it | `postgresql` |
| `externalDatabase.driver` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database driver will use it | `org.postgresql.Driver` |
| `externalDatabase.host` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database host will use it | `localhost` |
| `externalDatabase.port` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database port will use it | `5432` |
| `externalDatabase.username` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database username will use it | `root` |
| `externalDatabase.password` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database password will use it | `root` |
| `externalDatabase.database` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database database will use it | `dolphinscheduler` |
| `externalDatabase.params` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database params will use it | `characterEncoding=utf8` |
| | | |
| `zookeeper.enabled` | If not exists external Zookeeper, by default, the DolphinScheduler will use a internal Zookeeper | `true` |
| `zookeeper.fourlwCommandsWhitelist` | A list of comma separated Four Letter Words commands to use | `srvr,ruok,wchs,cons` |
| `zookeeper.service.port` | ZooKeeper port | `2181` |
| `zookeeper.persistence.enabled` | Set `zookeeper.persistence.enabled` to `true` to mount a new volume for internal Zookeeper | `false` |
| `zookeeper.persistence.storageClass` | Zookeeper data Persistent Volume Storage Class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `externalZookeeper.taskQueue` | If exists external Zookeeper, and set `zookeeper.enable` value to false. Specify task queue for `master` and `worker` | `zookeeper` |
| `externalZookeeper.zookeeperQuorum` | If exists external Zookeeper, and set `zookeeper.enable` value to false. Specify Zookeeper quorum | `127.0.0.1:2181` |
| `externalZookeeper.zookeeperRoot` | If exists external Zookeeper, and set `zookeeper.enable` value to false. Specify Zookeeper root path for `master` and `worker` | `dolphinscheduler` |
| | | |
| `common.configmap.DOLPHINSCHEDULER_ENV_PATH` | Extra env file path. | `/tmp/dolphinscheduler/env` |
| `common.configmap.DOLPHINSCHEDULER_DATA_BASEDIR_PATH` | File uploaded path of DS. | `/tmp/dolphinscheduler/files` |
| `common.configmap.RESOURCE_STORAGE_TYPE` | Resource Storate type, support type are: S3、HDFS、NONE. | `NONE` |
| `common.configmap.RESOURCE_UPLOAD_PATH` | The base path of resource. | `/ds` |
| `common.configmap.FS_DEFAULT_FS` | The default fs of resource, for s3 is the `s3a` prefix and bucket name. | `s3a://xxxx` |
| `common.configmap.FS_S3A_ENDPOINT` | If the resource type is `S3`, you should fill this filed, it's the endpoint of s3. | `s3.xxx.amazonaws.com` |
| `common.configmap.FS_S3A_ACCESS_KEY` | The access key for your s3 bucket. | `xxxxxxx` |
| `common.configmap.FS_S3A_SECRET_KEY` | The secret key for your s3 bucket. | `xxxxxxx` |
| `externalZookeeper.zookeeperQuorum` | If exists external Zookeeper, and set `zookeeper.enabled` value to false. Specify Zookeeper quorum | `127.0.0.1:2181` |
| `externalZookeeper.zookeeperRoot` | If exists external Zookeeper, and set `zookeeper.enabled` value to false. Specify dolphinscheduler root directory in Zookeeper | `/dolphinscheduler` |
| `common.configmap.DOLPHINSCHEDULER_DATA_BASEDIR_PATH` | User data directory path, self configuration, please make sure the directory exists and have read write permissions | `/tmp/dolphinscheduler` |
| `common.configmap.RESOURCE_UPLOAD_PATH` | Resource store on HDFS/S3 path, please make sure the directory exists on hdfs and have read write permissions | `/dolphinscheduler` |
| `common.configmap.FS_DEFAULT_FS` | Resource storage file system like `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler` | `file:///` |
| `common.configmap.FS_S3A_ENDPOINT` | S3 endpoint when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `s3.xxx.amazonaws.com` |
| `common.configmap.FS_S3A_ACCESS_KEY` | S3 access key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.FS_S3A_SECRET_KEY` | S3 secret key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.fsFileResourcePersistence.enabled` | Set `common.fsFileResourcePersistence.enabled` to `true` to mount a new file resource volume for `api` and `worker` | `false` |
| `common.fsFileResourcePersistence.accessModes` | `PersistentVolumeClaim` Access Modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
| `common.fsFileResourcePersistence.storageClassName` | Resource Persistent Volume Storage Class, must support the access mode: ReadWriteMany | `-` |
| `master.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| | | |
| `master.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `master.annotations` | The `annotations` for master server | `{}` |
| `master.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `master.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `master.tolerations` | If specified, the pod's tolerations | `{}` |
| `master.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `master.jvmOptions` | The JVM options for master server. | `""` |
| `master.resources` | The `resource` limit and request config for master server. | `{}` |
| `master.annotations` | The `annotations` for master server. | `{}` |
| `master.resources` | The `resource` limit and request config for master server | `{}` |
| `master.configmap.DOLPHINSCHEDULER_OPTS` | The java options for master server | `""` |
| `master.configmap.MASTER_EXEC_THREADS` | Master execute thread num | `100` |
| `master.configmap.MASTER_EXEC_TASK_NUM` | Master execute task number in parallel | `20` |
| `master.configmap.MASTER_MAX_CPULOAD_AVG` | Only less than cpu avg load, master server can work. default value : the number of cpu cores * 2 | `100` |
| `master.configmap.MASTER_RESERVED_MEMORY` | Only larger than reserved memory, master server can work. default value : physical memory * 1/10, unit is G | `0.1` |
| `master.configmap.MASTER_LISTEN_PORT` | Master listen port | `5678` |
| `master.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `master.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `master.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
@ -114,22 +158,22 @@ The following tables lists the configurable parameters of the Dolphins Scheduler
| `master.persistentVolumeClaim.storageClassName` | `Master` logs data Persistent Volume Storage Class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `worker.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| `worker.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `worker.annotations` | The `annotations` for worker server | `{}` |
| `worker.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `worker.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `worker.tolerations` | If specified, the pod's tolerations | `{}` |
| `worker.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `worker.jvmOptions` | The JVM options for worker server. | `""` |
| `worker.resources` | The `resource` limit and request config for worker server. | `{}` |
| `worker.annotations` | The `annotations` for worker server. | `{}` |
| `worker.resources` | The `resource` limit and request config for worker server | `{}` |
| `worker.configmap.DOLPHINSCHEDULER_OPTS` | The java options for worker server | `""` |
| `worker.configmap.WORKER_EXEC_THREADS` | Worker execute thread num | `100` |
| `worker.configmap.WORKER_FETCH_TASK_NUM` | Submit the number of tasks at a time | `3` |
| `worker.configmap.WORKER_MAX_CPULOAD_AVG` | Only less than cpu avg load, worker server can work. default value : the number of cpu cores * 2 | `100` |
| `worker.configmap.WORKER_RESERVED_MEMORY` | Only larger than reserved memory, worker server can work. default value : physical memory * 1/10, unit is G | `0.1` |
| `worker.configmap.DOLPHINSCHEDULER_DATA_BASEDIR_PATH` | User data directory path, self configuration, please make sure the directory exists and have read write permissions | `/tmp/dolphinscheduler` |
| `worker.persistentVolumeClaim.logsPersistentVolume.storageClassName` | `Worker` logs data Persistent Volume Storage Class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `alert.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `alert.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `alert.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
@ -194,16 +224,16 @@ The following tables lists the configurable parameters of the Dolphins Scheduler
| `alert.persistentVolumeClaim.storageClassName` | `Alert` logs data Persistent Volume Storage Class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `ingress.tls.secretName` | Ingress tls secret name | `dolphinscheduler-tls` |
For more information please refer to the [chart](https://github.com/apache/incubator-dolphinscheduler.git) documentation.
## FAQ
### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL?
> Because of the commercial license, we cannot directly use the driver and client of MySQL.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`)
2. Create a new `Dockerfile` to add MySQL driver and client:
3. Build a new docker image including MySQL driver and client:
```
docker build -t apache/dolphinscheduler:mysql .
```
4. Push the docker image `apache/dolphinscheduler:mysql` to a docker registry
5. Modify image `registry` and `repository`, and update `tag` to `mysql` in `values.yaml`
6. Modify postgresql `enabled` to `false`
7. Modify externalDatabase (especially modify `host`, `username` and `password`):
```
externalDatabase:
type: "mysql"
driver: "com.mysql.jdbc.Driver"
host: "localhost"
port: "3306"
username: "root"
password: "root"
database: "dolphinscheduler"
params: "useUnicode=true&characterEncoding=UTF-8"
```
8. Run a DolphinScheduler release in Kubernetes (See **Installing the Chart**)
### How to support MySQL datasource in `Datasource manage`?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to add MySQL datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (require `>=5.1.47`)