This document explains the DolphinScheduler application configurations.
## Directory Structure
Currently, all the configuration files are under the [conf] directory.
Check the following simplified DolphinScheduler installation directory tree to get a direct view of where the [conf] directory sits and which configuration files it contains.
This document only describes the DolphinScheduler configurations; other topics are not covered here.
[Note: DolphinScheduler is hereinafter called 'DS'.]
The directory structure of DolphinScheduler is as follows:
```
├── LICENSE
```
## Configurations in Detail
|Serial Number|Service Classification|Config File|
|--|--|--|
|1|startup or shutdown DS application|dolphinscheduler-daemon.sh|

### datasource.properties [datasource config]
| Parameters | Default value | Description |
|--|--|--|
|spring.datasource.testWhileIdle|true|set whether the pool validates the allocated connection when a new connection request comes|
|spring.datasource.testOnBorrow|true|validity check when the program requests a new connection|
|spring.datasource.testOnReturn|false|validity check when the program returns a connection|
|spring.datasource.defaultAutoCommit|true|whether to auto commit|
|spring.datasource.keepAlive|true|runs the validationQuery SQL to keep a connection from being closed by the pool when it idles longer than minEvictableIdleTimeMillis|
|spring.datasource.poolPreparedStatements|true|open PSCache|
|spring.datasource.maxPoolPreparedStatementPerConnectionSize|20|specify the size of PSCache on each connection|
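
For reference, a minimal sketch of a datasource.properties fragment combining the validation options above; the keys come straight from the table, and the values shown are the documented defaults rather than tuned recommendations:

```properties
# validate idle connections so the pool evicts dead ones
spring.datasource.testWhileIdle=true
spring.datasource.testOnBorrow=true
spring.datasource.testOnReturn=false
# keep idle connections alive past minEvictableIdleTimeMillis
spring.datasource.keepAlive=true
# enable PSCache and cap its per-connection size
spring.datasource.poolPreparedStatements=true
spring.datasource.maxPoolPreparedStatementPerConnectionSize=20
```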
### common.properties [hadoop/s3/yarn related config]
Currently, common.properties mainly configures Hadoop- and S3-related settings.
| Parameters | Default value | Description |
|--|--|--|
data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files
resource.storage.type | NONE | type of resource files: HDFS, S3, NONE
resource.storage.upload.base.path | /dolphinscheduler | storage path of resource files
resource.aws.access.key.id | minioadmin | access key id of S3
resource.aws.secret.access.key | minioadmin | secret access key of S3
resource.aws.region |us-east-1 | region of S3
resource.aws.s3.bucket.name | dolphinscheduler | bucket name of S3
resource.aws.s3.endpoint | http://minio:9000 | endpoint of S3
resource.hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS
resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory
hadoop.security.authentication.startup.state | false | whether Hadoop grants Kerberos permission
login.user.keytab.path | /opt/hdfs.headless.keytab | Kerberos user keytab
kerberos.expire.time | 2 | Kerberos expire time; integer; the unit is hour
yarn.resourcemanager.ha.rm.ids | | specify the yarn resourcemanager url; if resourcemanager supports HA, input the HA IP addresses (separated by commas), or leave it empty for standalone
yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default if ResourceManager supports HA or is not used; otherwise replace ds1 with the corresponding hostname when ResourceManager runs in standalone mode
|development.state | false | specify whether in development state|
|resource.manager.httpaddress.port | 8088 | the port of resource manager|
|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn|
|datasource.encryption.enable | false | whether to enable datasource encryption|
|datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption|
|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality|
|support.hive.oneSession | false | specify whether hive SQL is executed in the same session|
|sudo.enable | true | whether to enable sudo|
|alert.rpc.port | 50052 | the RPC port of Alert Server|
|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
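
As an illustration, a common.properties fragment that switches resource storage from the NONE default to S3 could look like the sketch below; the endpoint, bucket, and credentials are placeholders taken from the table defaults, not values to keep in production:

```properties
# store resource files (e.g. SQL scripts) in S3 instead of disabling uploads
resource.storage.type=S3
resource.storage.upload.base.path=/dolphinscheduler
resource.aws.access.key.id=minioadmin
resource.aws.secret.access.key=minioadmin
resource.aws.region=us-east-1
resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://minio:9000
```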
### API-server related configuration
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|server.port|12345|api service communication port|
|security.authentication.ldap.user.identity.attribute|uid|LDAP user identity attribute|
|security.authentication.ldap.user.email.attribute|mail|LDAP user email attribute|
|security.authentication.ldap.user.not-exist-action|CREATE|action when the LDAP user does not exist. CREATE (default): automatically create the user; DENY: deny log-in if the user does not exist|
|traffic.control.global.switch|false|traffic control global switch|
|traffic.control.max-global-qps-rate|300|global max request number per second|
|traffic.control.tenant-switch|false|traffic control tenant switch|
|traffic.control.default-tenant-qps-rate|10|default tenant max request number per second|
|traffic.control.customize-tenant-qps-rate||customized tenant max request number per second|
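
The flat keys above map onto `api-server/conf/application.yaml`. A minimal sketch of the LDAP and traffic-control parts might look like the following, assuming Spring-style relaxed binding of the dotted keys; the exact nesting is an assumption, so verify it against the file shipped with your release:

```yaml
# hypothetical nesting; check your release's application.yaml
security:
  authentication:
    ldap:
      user:
        identity-attribute: uid
        email-attribute: mail
        not-exist-action: CREATE
traffic:
  control:
    global-switch: false
    max-global-qps-rate: 300
    tenant-switch: false
    default-tenant-qps-rate: 10
```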
### Master Server related configuration
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|master.listen-port|5678|master listen port|
|master.fetch-command-num|10|the number of commands fetched by master|
|master.pre-exec-threads|10|master prepare execute thread number to limit handle commands in parallel|
|master.exec-threads|100|master execute thread number to limit process instances in parallel|
|master.dispatch-task-number|3|master dispatch task number per batch|
|master.host-selector|lower_weight|master host selector to select a suitable worker; default value: lower_weight; optional values: random, round_robin, lower_weight|
|master.heartbeat-interval|10|master heartbeat interval, the unit is second|
|master.task-commit-interval|1000|master commit task interval, the unit is millisecond|
|master.state-wheel-interval|5|time to check status|
|master.max-cpu-load-avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2|
|master.reserved-memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G|
|master.failover-interval|10|failover interval, the unit is minute|
|master.kill-yarn-job-when-task-failover|true|whether to kill the yarn job when a task instance fails over|
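
Correspondingly, a sketch of the `master:` block in `master-server/conf/application.yaml` using the defaults from the table; the nesting is assumed from the flat keys, so double-check it against your release:

```yaml
master:
  listen-port: 5678
  fetch-command-num: 10
  pre-exec-threads: 10
  exec-threads: 100
  dispatch-task-number: 3
  host-selector: lower_weight
  heartbeat-interval: 10
  task-commit-interval: 1000
  max-cpu-load-avg: -1   # -1 means CPU cores * 2
  reserved-memory: 0.3   # in GB
  failover-interval: 10  # in minutes
```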
### Worker Server related configuration
Location: `worker-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|worker.exec-threads|100|worker-service execute thread number, used to limit the number of task instances in parallel|
|worker.heartbeat-interval|10|worker-service heartbeat interval, the unit is second|
|worker.host-weight|100|worker host weight to dispatch tasks|
|worker.tenant-auto-create|true|the tenant corresponds to a user of the system, which the worker uses to submit jobs; if the system does not have this user, it is created automatically when worker.tenant-auto-create is true|
|worker.max-cpu-load-avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2|
|worker.reserved-memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G|
|worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup|
|worker.alert-listen-host|localhost|the alert listen host of worker|
|worker.alert-listen-port|50052|the alert listen port of worker|
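
Likewise, a sketch of the `worker:` block in `worker-server/conf/application.yaml` with the table defaults; the nesting is assumed from the flat keys (for example, `groups` may be a YAML list in some releases), so verify against your release:

```yaml
worker:
  exec-threads: 100
  heartbeat-interval: 10
  host-weight: 100
  tenant-auto-create: true
  max-cpu-load-avg: -1    # -1 means CPU cores * 2
  reserved-memory: 0.3    # in GB
  groups: default         # comma-separated, e.g. default,test
  alert-listen-host: localhost
  alert-listen-port: 50052
```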
### Alert Server related configuration
Location: `alert-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|alert.type|EMAIL|alert type|
|mail.protocol|SMTP|mail server protocol|
|mail.server.host|xxx.xxx.com|mail server host|
|mail.server.port|25|mail server port|
|mail.sender|xxx@xxx.com|mail sender email|
|mail.user|xxx@xxx.com|mail sender email name|
|mail.passwd|111111|mail sender email password|
|mail.smtp.starttls.enable|true|specify whether mail opens TLS|
|mail.smtp.ssl.enable|false|specify whether mail opens SSL|
|mail.smtp.ssl.trust|xxx.xxx.com|specify the mail SSL trust list|
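
For instance, the mail-related keys above could be filled in as follows; the host, sender, and password are placeholders, and depending on the release these keys may live in alert.properties rather than application.yaml:

```properties
# illustrative SMTP settings for email alerts; all values are placeholders
alert.type=EMAIL
mail.protocol=SMTP
mail.server.host=smtp.example.com
mail.server.port=25
mail.sender=alert@example.com
mail.user=alert@example.com
mail.passwd=change-me
# STARTTLS and SSL should not both be enabled
mail.smtp.starttls.enable=true
mail.smtp.ssl.enable=false
mail.smtp.ssl.trust=smtp.example.com
```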
### quartz.properties [quartz config]
This part describes the quartz configs; configure them based on your practical situation and resources.
### install_config.conf [DS environment variables configuration script (install or start DS)]
install_config.conf is a bit complicated, and it is mainly used in the following two places.
* DS Cluster Auto Installation.
> When executing 'install.sh', the system loads the configs in install_config.conf and auto-configures the files below based on its content.
> Files such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
* Startup and Shutdown DS Cluster.
> The system will load the masters, workers, alert-server, API-servers and other parameters inside the file to start up or shut down the DS cluster.
#### File Content
```bash
# Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
# eg: `[` escape to `\[`
# Database type (DS currently only supports PostgreSQL and MySQL)
dbtype="mysql"
# DS installation path, such as '/data1_1T/dolphinscheduler'
installPath="/data1_1T/dolphinscheduler"
# Deployment user
# Note: Deployment user needs 'sudo' privilege and has rights to operate HDFS.
# Root directory must be created by the same user if using HDFS, otherwise permission related issues will be raised.
deployUser="dolphinscheduler"
# The following are alert-service configs
# Mail server host
mailServerHost="smtp.exmail.qq.com"
# Mail server port
mailServerPort="25"
# Mail sender
mailSender="xxxxxxxxxx"
# Mail user
mailUser="xxxxxxxxxx"
# Mail password
mailPassword="xxxxxxxxxx"
# Whether mail supports TLS
starttlsEnable="true"
# Whether mail supports SSL. Note: starttlsEnable and sslEnable cannot both be set to true.
sslEnable="false"
# Mail server host, same as mailServerHost
sslTrust="smtp.exmail.qq.com"
# Specify which storage type to use for uploaded resources, such as SQL files. Supported options are HDFS, S3 and NONE; NONE means the upload function is not used.
resourceStorageType="NONE"
# If using S3, write the S3 address, e.g. s3a://dolphinscheduler
# Note: if using S3, make sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# If parameter 'resourceStorageType' is S3, following configs are needed:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# If ResourceManager supports HA, input the master and standby node IPs or hostnames, e.g. '192.168.xx.xx,192.168.xx.xx'. If ResourceManager runs in standalone mode, or yarn is not used, set yarnHaIps="".
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If ResourceManager runs in standalone mode, set the ResourceManager node IP or hostname; otherwise keep the default.