[1.3.6-prepare][Improvement][Docker/K8s] Support more configs, more service access, skywalking configs, improve image for python, update faq and add support matrix (#5158)
* [1.3.6-prepare][Improvement][Config] Update config
* [1.3.6-prepare][Improvement][Docker&K8s] Sync the latest config
* [1.3.6-prepare][Improvement][Docker] Fix mysql check and remove redundant config
* [1.3.6-prepare][Improvement][Config] Update config for common.properties
* [1.3.6-prepare][Improvement][Config] Add config.env for docker compose and swarm
* [1.3.6-prepare][Improvement][K8s] Add common properties and reduce duplication for K8s
* [1.3.6-prepare][Improvement][K8s] Support more service access like ClusterIP, NodePort and LoadBalancer
* [1.3.6-prepare][Improvement][K8s] Unify annotations, affinity, nodeSelector, tolerations, resources and probe in K8s
* [1.3.6-prepare][Improvement][Docker&K8s] Support skywalking config in docker & k8s
* [1.3.6-prepare][Improvement][Docker] Rename config.env to config.env.sh
* [1.3.6-prepare][Improvement][Docker] Replace alpine with debian:slim
* [1.3.6-prepare][Improvement][Docker] Remove postgresql-client and mysql-client
* [1.3.6-prepare][Improvement][Docker&K8s] Add faq for python, spark, hadoop, flink and datax
* [1.3.6-prepare][Improvement][Docker&K8s] Add support matrix for docker/k8s
resource store on HDFS/S3 path: resource files will be stored in this hadoop hdfs path, self configuration; please make sure the directory exists on hdfs and has read/write permissions. "/dolphinscheduler" is recommended
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>data.basedir.path</name>
<value>/tmp/dolphinscheduler</value>
<description>
user data local directory path, please make sure the directory exists and has read/write permissions
</description>
<on-ambari-upgrade add="true"/>
</property>
@@ -119,9 +129,7 @@
<name>fs.defaultFS</name>
<value>hdfs://mycluster:8020</value>
<description>
if resource.storage.type=S3, the value is like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to the conf dir
</description>
<on-ambari-upgrade add="true"/>
</property>
@@ -129,7 +137,7 @@
<name>fs.s3a.endpoint</name>
<value>http://host:9010</value>
<description>
s3 required, s3 endpoint
</description>
<on-ambari-upgrade add="true"/>
</property>
@@ -137,7 +145,7 @@
<name>fs.s3a.access.key</name>
<value>A3DXS30FO22544RE</value>
<description>
s3 required, s3 access key
</description>
<on-ambari-upgrade add="true"/>
</property>
@@ -145,14 +153,32 @@
<name>fs.s3a.secret.key</name>
<value>OloCLq3n+8+sdPHUhJ21XrSxTC+JK</value>
<description>
s3 required, s3 secret key
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>yarn.resourcemanager.ha.rm.ids</name>
<value>192.168.xx.xx,192.168.xx.xx</value>
<description>
if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
</description>
<on-ambari-upgrade add="true"/>
</property>
**Note**: You must specify `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD` and `ZOOKEEPER_QUORUM` when starting a standalone dolphinscheduler server.
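For instance, a minimal sketch of starting a standalone `master-server` container with these variables set (the addresses, credentials and image tag below are illustrative assumptions):
```
docker run -d --name dolphinscheduler-master \
  -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" \
  -e DATABASE_DATABASE="dolphinscheduler" \
  -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
  -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
  apache/dolphinscheduler:latest master-server
```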
| Spark-Local(client) | Indirect Yes | Refer to FAQ |
| Spark-YARN(cluster) | Indirect Yes | Refer to FAQ |
| Spark-Mesos(cluster) | Not Yet | |
| Spark-Standalone(cluster) | Not Yet | |
| Spark-Kubernetes(cluster) | Not Yet | |
| Flink-Local(local>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-YARN(yarn-cluster) | Indirect Yes | Refer to FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Mesos(default) | Not Yet | |
| Flink-Mesos(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Standalone(default) | Not Yet | |
| Flink-Standalone(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Kubernetes(default) | Not Yet | |
| Flink-Kubernetes(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| MapReduce | Indirect Yes | Refer to FAQ |
| Kerberos | Indirect Yes | Refer to FAQ |
| HTTP | Yes | |
| DataX | Indirect Yes | Refer to FAQ |
| Sqoop | Indirect Yes | Refer to FAQ |
| SQL-MySQL | Indirect Yes | Refer to FAQ |
| SQL-PostgreSQL | Yes | |
| SQL-Hive | Indirect Yes | Refer to FAQ |
| SQL-Spark | Indirect Yes | Refer to FAQ |
| SQL-ClickHouse | Indirect Yes | Refer to FAQ |
| SQL-Oracle | Indirect Yes | Refer to FAQ |
| SQL-SQLServer | Indirect Yes | Refer to FAQ |
| SQL-DB2 | Indirect Yes | Refer to FAQ |
## Environment Variables
The DolphinScheduler Docker container is configured through environment variables, and the default value will be used if an environment variable is not set.
### Database
**`DATABASE_TYPE`**
This environment variable sets the database type. The default value is `postgresql`.
@@ -174,13 +214,23 @@ This environment variable sets the database name. The default value is `
**Note**: You must specify it when starting a standalone dolphinscheduler server, like `master-server`, `worker-server`, `api-server`, `alert-server`.
### ZooKeeper
**`ZOOKEEPER_QUORUM`**
This environment variable sets the zookeeper quorum. The default value is `127.0.0.1:2181`.
**Note**: You must specify it when starting a standalone dolphinscheduler server, like `master-server`, `worker-server`, `api-server`.
**`ZOOKEEPER_ROOT`**
This environment variable sets the zookeeper root directory for dolphinscheduler. The default value is `/dolphinscheduler`.
### Common
**`DOLPHINSCHEDULER_OPTS`**
This environment variable sets jvm options for dolphinscheduler, suitable for `master-server`, `worker-server`, `api-server`, `alert-server`, `logger-server`. The default value is empty.
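As a sketch, extra JVM flags could be passed to every server through this variable (the flag below is an illustrative assumption):
```
export DOLPHINSCHEDULER_OPTS="-Duser.timezone=UTC"
```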
**`DATA_BASEDIR_PATH`**
@@ -210,6 +260,54 @@ This environment variable sets s3 access key for resource storage. The default v
This environment variable sets s3 secret key for resource storage. The default value is `xxxxxxx`.
**`HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE`**
This environment variable sets whether to start up kerberos. The default value is `false`.
**`JAVA_SECURITY_KRB5_CONF_PATH`**
This environment variable sets the java.security.krb5.conf path. The default value is `/opt/krb5.conf`.
**`LOGIN_USER_KEYTAB_USERNAME`**
This environment variable sets the keytab username of the login user. The default value is `hdfs@HADOOP.COM`.
**`LOGIN_USER_KEYTAB_PATH`**
This environment variable sets the keytab path of the login user. The default value is `/opt/hdfs.keytab`.
**`KERBEROS_EXPIRE_TIME`**
This environment variable sets the kerberos expire time; the unit is hour. The default value is `2`.
**`HDFS_ROOT_USER`**
This environment variable sets hdfs root user when resource.storage.type=HDFS. The default value is `hdfs`.
**`YARN_RESOURCEMANAGER_HA_RM_IDS`**
This environment variable sets yarn resourcemanager ha rm ids. The default value is empty.
**`YARN_APPLICATION_STATUS_ADDRESS`**
This environment variable sets yarn application status address. The default value is `http://ds1:8088/ws/v1/cluster/apps/%s`.
**`SKYWALKING_ENABLE`**
This environment variable sets whether to enable skywalking. The default value is `false`.
**`SW_AGENT_COLLECTOR_BACKEND_SERVICES`**
This environment variable sets agent collector backend services for skywalking. The default value is `127.0.0.1:11800`.
**`SW_GRPC_LOG_SERVER_HOST`**
This environment variable sets grpc log server host for skywalking. The default value is `127.0.0.1`.
**`SW_GRPC_LOG_SERVER_PORT`**
This environment variable sets grpc log server port for skywalking. The default value is `11800`.
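For example, a sketch of turning on the SkyWalking agent through these variables (the collector address is an illustrative assumption):
```
export SKYWALKING_ENABLE="true"
export SW_AGENT_COLLECTOR_BACKEND_SERVICES="oap-host:11800"
```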
**`HADOOP_HOME`**
This environment variable sets `HADOOP_HOME`. The default value is `/opt/soft/hadoop`.
@@ -232,7 +330,7 @@ This environment variable sets `PYTHON_HOME`. The default value is `/usr/bin/pyt
**`JAVA_HOME`**
This environment variable sets `JAVA_HOME`. The default value is `/usr/local/openjdk-8`.
**`HIVE_HOME`**
@@ -246,23 +344,27 @@ This environment variable sets `FLINK_HOME`. The default value is `/opt/soft/fli
This environment variable sets `DATAX_HOME`. The default value is `/opt/soft/datax`.
### Master Server
**`MASTER_SERVER_OPTS`**
This environment variable sets jvm options for `master-server`. The default value is `-Xms1g -Xmx1g -Xmn512m`.
**`MASTER_EXEC_THREADS`**
This environment variable sets exec thread number for `master-server`. The default value is `100`.
**`MASTER_EXEC_TASK_NUM`**
This environment variable sets exec task number for `master-server`. The default value is `20`.
**`MASTER_DISPATCH_TASK_NUM`**
This environment variable sets dispatch task number for `master-server`. The default value is `3`.
**`MASTER_HOST_SELECTOR`**
This environment variable sets host selector for `master-server`. Optional values include `Random`, `RoundRobin` and `LowerWeight`. The default value is `LowerWeight`.
**`MASTER_HEARTBEAT_INTERVAL`**
@@ -278,19 +380,21 @@ This environment variable sets task commit interval for `master-server`. The def
**`MASTER_MAX_CPULOAD_AVG`**
This environment variable sets max cpu load avg for `master-server`. The default value is `-1`.
**`MASTER_RESERVED_MEMORY`**
This environment variable sets reserved memory for `master-server`; the unit is G. The default value is `0.3`.
### Worker Server
**`WORKER_SERVER_OPTS`**
This environment variable sets jvm options for `worker-server`. The default value is `-Xms1g -Xmx1g -Xmn512m`.
**`WORKER_EXEC_THREADS`**
This environment variable sets exec thread number for `worker-server`. The default value is `100`.
**`WORKER_HEARTBEAT_INTERVAL`**
@@ -298,20 +402,22 @@ This environment variable sets heartbeat interval for `worker-server`. The defau
**`WORKER_MAX_CPULOAD_AVG`**
This environment variable sets max cpu load avg for `worker-server`. The default value is `-1`.
**`WORKER_RESERVED_MEMORY`**
This environment variable sets reserved memory for `worker-server`; the unit is G. The default value is `0.3`.
**`WORKER_GROUPS`**
This environment variable sets groups for `worker-server`. The default value is `default`.
### Alert Server
**`ALERT_SERVER_OPTS`**
This environment variable sets jvm options for `alert-server`. The default value is `-Xms512m -Xmx512m -Xmn256m`.
**`XLS_FILE_PATH`**
This environment variable sets xls file path for `alert-server`. The default value is `/tmp/xls`.
@@ -368,19 +474,31 @@ This environment variable sets enterprise wechat agent id for `alert-server`. Th
This environment variable sets enterprise wechat users for `alert-server`. The default value is empty.
### Api Server
**`API_SERVER_OPTS`**
This environment variable sets jvm options for `api-server`. The default value is `-Xms512m -Xmx512m -Xmn256m`.
### Logger Server
**`LOGGER_SERVER_OPTS`**
This environment variable sets jvm options for `logger-server`. The default value is `-Xms512m -Xmx512m -Xmn256m`.
## Initialization scripts
If you would like to do additional initialization in an image derived from this one, add one or more environment variables under `/root/start-init-conf.sh`, and modify template files in `/opt/dolphinscheduler/conf/*.tpl`.
For example, to add an environment variable `SECURITY_AUTHENTICATION_TYPE` in `/root/start-init-conf.sh`:
```
export SECURITY_AUTHENTICATION_TYPE=PASSWORD
```
and modify the `application-api.properties.tpl` template file to add `SECURITY_AUTHENTICATION_TYPE`, for example (a sketch; the property key below follows the `application-api.properties` naming):
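```
security.authentication.type=${SECURITY_AUTHENTICATION_TYPE}
```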
### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (requires `>=5.1.47`)
2. Create a new `Dockerfile` to add the MySQL driver, for example (a sketch; the `lib` target path is an assumption based on the default image layout):
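```
FROM apache/dolphinscheduler:latest
# copy the MySQL driver into DolphinScheduler's classpath (target path assumed)
COPY mysql-connector-java-5.1.49.jar /opt/dolphinscheduler/lib
```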
Check whether the task log contains the output like `Pi is roughly 3.146015`
6. Verify Spark under a Spark task
The file `spark-examples_2.11-2.4.7.jar` needs to be uploaded to the resources first; then create a Spark task with:
- Spark Version: `SPARK2`
- Main Class: `org.apache.spark.examples.SparkPi`
- Main Package: `spark-examples_2.11-2.4.7.jar`
- Deploy Mode: `local`
Similarly, check whether the task log contains the output like `Pi is roughly 3.146015`
7. Verify Spark on YARN
Spark on YARN (Deploy Mode is `cluster` or `client`) requires Hadoop support. Similar to Spark support, the operation of supporting Hadoop is almost the same as the previous steps.
Ensure that `$HADOOP_HOME` and `$HADOOP_CONF_DIR` exist.
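A quick sketch to verify both variables inside the running worker container (the container name is an illustrative assumption):
```
docker exec -it dolphinscheduler-worker bash -c 'echo $HADOOP_HOME $HADOOP_CONF_DIR'
```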
For more information, please refer to the [incubator-dolphinscheduler](https://github.com/apache/incubator-dolphinscheduler.git) documentation.
# user data local directory path, please make sure the directory exists and has read/write permissions
data.basedir.path=${DATA_BASEDIR_PATH}
# resource storage type: HDFS, S3, NONE
resource.storage.type=${RESOURCE_STORAGE_TYPE}
# resource store on HDFS/S3 path: resource files will be stored in this hadoop hdfs path, self configuration; please make sure the directory exists on hdfs and has read/write permissions. "/dolphinscheduler" is recommended
resource.upload.path=${RESOURCE_UPLOAD_PATH}
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
hdfs.root.user=${HDFS_ROOT_USER}
# if resource.storage.type=S3, the value is like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to the conf dir
# if resourcemanager HA is enabled or resourcemanager is not used, please keep the default value; if resourcemanager is single, you only need to replace ds1 with the actual resourcemanager hostname
@@ -41,7 +41,7 @@ If `ingress.enabled` in `values.yaml` is set to `true`, you just access `http://
> **Tip**: If there is a problem with ingress access, please contact the Kubernetes administrator and refer to the [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
Otherwise, when `api.service.type=ClusterIP` you need to execute a port-forward command, for example (a sketch; the service name and port are assumptions based on the chart defaults):
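```
kubectl port-forward svc/dolphinscheduler-api 12345:12345
```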
| Spark-Local(client) | Indirect Yes | Refer to FAQ |
| Spark-YARN(cluster) | Indirect Yes | Refer to FAQ |
| Spark-Mesos(cluster) | Not Yet | |
| Spark-Standalone(cluster) | Not Yet | |
| Spark-Kubernetes(cluster) | Not Yet | |
| Flink-Local(local>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-YARN(yarn-cluster) | Indirect Yes | Refer to FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Mesos(default) | Not Yet | |
| Flink-Mesos(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Standalone(default) | Not Yet | |
| Flink-Standalone(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Kubernetes(default) | Not Yet | |
| Flink-Kubernetes(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| MapReduce | Indirect Yes | Refer to FAQ |
| Kerberos | Indirect Yes | Refer to FAQ |
| HTTP | Yes | |
| DataX | Indirect Yes | Refer to FAQ |
| Sqoop | Indirect Yes | Refer to FAQ |
| SQL-MySQL | Indirect Yes | Refer to FAQ |
| SQL-PostgreSQL | Yes | |
| SQL-Hive | Indirect Yes | Refer to FAQ |
| SQL-Spark | Indirect Yes | Refer to FAQ |
| SQL-ClickHouse | Indirect Yes | Refer to FAQ |
| SQL-Oracle | Indirect Yes | Refer to FAQ |
| SQL-SQLServer | Indirect Yes | Refer to FAQ |
| SQL-DB2 | Indirect Yes | Refer to FAQ |
## Configuration
The configuration file is `values.yaml`, and the following tables list the configurable parameters of the DolphinScheduler chart and their default values.
@@ -105,7 +153,6 @@ The configuration file is `values.yaml`, and the following tables list the conf
| | | |
| `zookeeper.enabled` | If there is no external ZooKeeper, DolphinScheduler will use an internal ZooKeeper by default | `true` |
| `zookeeper.fourlwCommandsWhitelist` | A list of comma-separated Four Letter Words commands to use | `srvr,ruok,wchs,cons` |
| `zookeeper.persistence.enabled` | Set `zookeeper.persistence.enabled` to `true` to mount a new volume for internal Zookeeper | `false` |
| `zookeeper.persistence.storageClass` | Zookeeper data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
@@ -113,6 +160,7 @@ The configuration file is `values.yaml`, and the following tables list the conf
| `externalZookeeper.zookeeperQuorum` | If an external ZooKeeper exists and `zookeeper.enabled` is set to `false`, specify the ZooKeeper quorum | `127.0.0.1:2181` |
| `externalZookeeper.zookeeperRoot` | If an external ZooKeeper exists and `zookeeper.enabled` is set to `false`, specify the dolphinscheduler root directory in ZooKeeper | `/dolphinscheduler` |
| | | |
| `common.configmap.DOLPHINSCHEDULER_OPTS` | The jvm options for dolphinscheduler, suitable for all servers | `""` |
| `common.configmap.DATA_BASEDIR_PATH` | User data directory path, self configuration, please make sure the directory exists and has read/write permissions | `/tmp/dolphinscheduler` |
| `common.configmap.RESOURCE_UPLOAD_PATH` | Resource store on HDFS/S3 path, please make sure the directory exists on hdfs and has read/write permissions | `/dolphinscheduler` |
@@ -120,7 +168,27 @@ The configuration file is `values.yaml`, and the following tables list the conf
| `common.configmap.FS_S3A_ENDPOINT` | S3 endpoint when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `s3.xxx.amazonaws.com` |
| `common.configmap.FS_S3A_ACCESS_KEY` | S3 access key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.FS_S3A_SECRET_KEY` | S3 secret key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE` | Whether to start up kerberos | `false` |
| `common.configmap.JAVA_SECURITY_KRB5_CONF_PATH` | The java.security.krb5.conf path | `/opt/krb5.conf` |
| `common.configmap.LOGIN_USER_KEYTAB_USERNAME` | The keytab username of the login user | `hdfs@HADOOP.COM` |
| `common.configmap.LOGIN_USER_KEYTAB_PATH` | The keytab path of the login user | `/opt/hdfs.keytab` |
| `common.configmap.KERBEROS_EXPIRE_TIME` | The kerberos expire time, the unit is hour | `2` |
| `common.configmap.HDFS_ROOT_USER` | The HDFS root user who must have the permission to create directories under the HDFS root path | `hdfs` |
| `common.configmap.YARN_RESOURCEMANAGER_HA_RM_IDS` | If resourcemanager HA is enabled, please set the HA IPs | `nil` |
| `common.configmap.YARN_APPLICATION_STATUS_ADDRESS` | If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname, otherwise keep default | `http://ds1:8088/ws/v1/cluster/apps/%s` |
| `common.configmap.SKYWALKING_ENABLE` | Set whether to enable skywalking | `false` |
| `common.configmap.SW_AGENT_COLLECTOR_BACKEND_SERVICES` | Set agent collector backend services for skywalking | `127.0.0.1:11800` |
| `common.configmap.SW_GRPC_LOG_SERVER_HOST` | Set grpc log server host for skywalking | `127.0.0.1` |
| `common.configmap.SW_GRPC_LOG_SERVER_PORT` | Set grpc log server port for skywalking | `11800` |
| `common.configmap.HADOOP_HOME` | Set `HADOOP_HOME` for DolphinScheduler's task environment | `/opt/soft/hadoop` |
| `common.configmap.HADOOP_CONF_DIR` | Set `HADOOP_CONF_DIR` for DolphinScheduler's task environment | `/opt/soft/hadoop/etc/hadoop` |
| `common.configmap.SPARK_HOME1` | Set `SPARK_HOME1` for DolphinScheduler's task environment | `/opt/soft/spark1` |
| `common.configmap.SPARK_HOME2` | Set `SPARK_HOME2` for DolphinScheduler's task environment | `/opt/soft/spark2` |
| `common.configmap.PYTHON_HOME` | Set `PYTHON_HOME` for DolphinScheduler's task environment | `/usr/bin/python` |
| `common.configmap.JAVA_HOME` | Set `JAVA_HOME` for DolphinScheduler's task environment | `/usr/local/openjdk-8` |
| `common.configmap.HIVE_HOME` | Set `HIVE_HOME` for DolphinScheduler's task environment | `/opt/soft/hive` |
| `common.configmap.FLINK_HOME` | Set `FLINK_HOME` for DolphinScheduler's task environment | `/opt/soft/flink` |
| `common.configmap.DATAX_HOME` | Set `DATAX_HOME` for DolphinScheduler's task environment | `/opt/soft/datax` |
| `common.sharedStoragePersistence.enabled` | Set `common.sharedStoragePersistence.enabled` to `true` to mount a shared storage volume for Hadoop, Spark binaries, etc. | `false` |
| `common.sharedStoragePersistence.mountPath` | The mount path for the shared storage volume | `/opt/soft` |
| `common.sharedStoragePersistence.accessModes` | `PersistentVolumeClaim` access modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
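For instance, a sketch of overriding a few of these common values at install time (the release name and chart path are illustrative assumptions):
```
helm install dolphinscheduler . \
  --set common.configmap.RESOURCE_STORAGE_TYPE=S3 \
  --set common.configmap.FS_S3A_ENDPOINT=s3.xxx.amazonaws.com \
  --set common.configmap.SKYWALKING_ENABLE=true
```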
@@ -138,15 +206,16 @@ The configuration file is `values.yaml`, and the following tables list the conf
| `master.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `master.tolerations` | If specified, the pod's tolerations | `{}` |
| `master.resources` | The `resource` limit and request config for master server | `{}` |
| `master.configmap.MASTER_SERVER_OPTS` | The jvm options for master server | `-Xms1g -Xmx1g -Xmn512m` |
| `master.configmap.MASTER_EXEC_THREADS` | Master execute thread number | `100` |
| `master.configmap.MASTER_EXEC_TASK_NUM` | Master execute task number in parallel | `20` |
| `master.configmap.MASTER_DISPATCH_TASK_NUM` | Master dispatch task number | `3` |
| `master.configmap.MASTER_HOST_SELECTOR` | Master host selector to select a suitable worker, optional values include Random, RoundRobin, LowerWeight | `LowerWeight` |
| `master.configmap.MASTER_MAX_CPULOAD_AVG` | Master server can work only when the cpu avg load is lower than this value; the default `-1` means the number of cpu cores * 2 | `-1` |
| `master.configmap.MASTER_RESERVED_MEMORY` | Master server can work only when the available memory is larger than this reserved value; the unit is G | `0.3` |
| `master.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `master.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `master.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
@@ -171,13 +240,12 @@ The configuration file is `values.yaml`, and the following tables list the conf
| `worker.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `worker.tolerations` | If specified, the pod's tolerations | `{}` |
| `worker.resources` | The `resource` limit and request config for worker server | `{}` |
| `worker.configmap.LOGGER_SERVER_OPTS` | The jvm options for logger server (since `logger-server` is deployed with `worker-server`, it needs to be set separately) | `-Xms512m -Xmx512m -Xmn256m` |
| `worker.configmap.WORKER_SERVER_OPTS` | The jvm options for worker server | `-Xms1g -Xmx1g -Xmn512m` |
| `worker.configmap.WORKER_EXEC_THREADS` | Worker execute thread number | `100` |
| `worker.configmap.WORKER_MAX_CPULOAD_AVG` | Worker server can work only when the cpu avg load is lower than this value; the default `-1` means the number of cpu cores * 2 | `-1` |
| `worker.configmap.WORKER_RESERVED_MEMORY` | Worker server can work only when the available memory is larger than this reserved value; the unit is G | `0.3` |
| `worker.configmap.WORKER_GROUPS` | Worker groups | `default` |
| `worker.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `worker.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
@@ -210,7 +278,7 @@ The configuration file is `values.yaml`, and the following tables list the conf
| `alert.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `alert.tolerations` | If specified, the pod's tolerations | `{}` |
| `alert.resources` | The `resource` limit and request config for alert server | `{}` |
| `alert.configmap.ALERT_SERVER_OPTS` | The jvm options for alert server | `-Xms512m -Xmx512m -Xmn256m` |
| `api.persistentVolumeClaim.storageClassName` | `api` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `api.service.type` | `type` determines how the Service is exposed. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer | `ClusterIP` |
| `api.service.clusterIP` | `clusterIP` is the IP address of the service and is usually assigned randomly by the master | `nil` |
| `api.service.nodePort` | `nodePort` is the port on each node on which this service is exposed when type=NodePort | `nil` |
| `api.service.externalIPs` | `externalIPs` is a list of IP addresses for which nodes in the cluster will also accept traffic for this service | `[]` |
| `api.service.externalName` | `externalName` is the external reference that kubedns or equivalent will return as a CNAME record for this service | `nil` |
| `api.service.loadBalancerIP` | `loadBalancerIP` when service.type is LoadBalancer. LoadBalancer will get created with the IP specified in this field | `nil` |
| `api.service.annotations` | `annotations` may need to be set when service.type is LoadBalancer | `{}` |
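As an illustration, a sketch of exposing the API server via NodePort with these `api.service.*` values in `values.yaml` (the port value is an assumption):
```
api:
  service:
    type: NodePort
    nodePort: 30001
```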
@@ -279,29 +354,28 @@ The configuration file is `values.yaml`, and the following tables list the conf
### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows.
1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (requires `>=5.1.47`)
2. Create a new `Dockerfile` to add the MySQL driver (see the Dockerfile sketch in the Docker FAQ above):
Check whether the task log contains the output like `Pi is roughly 3.146015`
7. Verify Spark under a Spark task
The file `spark-examples_2.11-2.4.7.jar` needs to be uploaded to the resources first; then create a Spark task with:
- Spark Version: `SPARK2`
- Main Class: `org.apache.spark.examples.SparkPi`
- Main Package: `spark-examples_2.11-2.4.7.jar`
- Deploy Mode: `local`
Similarly, check whether the task log contains the output like `Pi is roughly 3.146015`
8. Verify Spark on YARN
Spark on YARN (Deploy Mode is `cluster` or `client`) requires Hadoop support. Similar to Spark support, the operation of supporting Hadoop is almost the same as the previous steps.
Ensure that `$HADOOP_HOME` and `$HADOOP_CONF_DIR` exist.
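A quick sketch to verify both variables inside a running worker pod (the pod name is an illustrative assumption):
```
kubectl exec -it dolphinscheduler-worker-0 -- bash -c 'echo $HADOOP_HOME $HADOOP_CONF_DIR'
```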
For more information, please refer to the [incubator-dolphinscheduler](https://github.com/apache/incubator-dolphinscheduler.git) documentation.
# user data local directory path, please make sure the directory exists and has read/write permissions
data.basedir.path=/tmp/dolphinscheduler
# resource storage type: HDFS, S3, NONE
resource.storage.type=NONE
# resource store on HDFS/S3 path: resource files will be stored in this hadoop hdfs path, self configuration; please make sure the directory exists on hdfs and has read/write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/dolphinscheduler
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
hdfs.root.user=hdfs
# if resource.storage.type=S3, the value is like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to the conf dir
# if resourcemanager HA is enabled or resourcemanager is not used, please keep the default value; if resourcemanager is single, you only need to replace ds1 with the actual resourcemanager hostname