This document explains the DolphinScheduler application configurations.
## Directory Structure
Currently, all the configuration files are under the `conf` directory. Check the following simplified DolphinScheduler installation directory tree to get a direct view of the position of the `conf` directory and the configuration files it contains. This document only describes DolphinScheduler configurations; other topics are not covered here.

[Note: DolphinScheduler is hereinafter referred to as 'DS'.]
```
├── LICENSE
├── ...
```
## Configurations in Details
The following sections describe each configuration file and the service it belongs to in detail.
### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
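The script starts or stops a single DS service on the machine it runs on. Typical usage, assuming the standard binary distribution layout (service names can vary slightly across versions):

```bash
# start the master service on this machine
./bin/dolphinscheduler-daemon.sh start master-server

# stop it again
./bin/dolphinscheduler-daemon.sh stop master-server
```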
### Database connection related configuration

|Parameters | Default value| Description|
|--|--|--|
|spring.datasource.hikari.initialization-fail-timeout|1|connection pool initialization fail timeout|
|spring.datasource.testWhileIdle|true|whether the pool validates an idle connection when a new connection request arrives|
|spring.datasource.testOnBorrow|true|validity check when the program requests a new connection|
|spring.datasource.testOnReturn|false|validity check when the program returns a connection|
|spring.datasource.defaultAutoCommit|true|whether to auto commit|
|spring.datasource.keepAlive|true|runs the validationQuery SQL to keep a connection from being closed by the pool once it idles longer than minEvictableIdleTimeMillis|
|spring.datasource.poolPreparedStatements|true|open PSCache|
|spring.datasource.maxPoolPreparedStatementPerConnectionSize|20|specify the size of PSCache on each connection|

Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
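As an illustration, a minimal database override via `bin/env/dolphinscheduler_env.sh` might look like the following sketch (variable names follow Spring's relaxed binding; the URL and credentials are placeholders):

```bash
# bin/env/dolphinscheduler_env.sh -- illustrative values only
export DATABASE=${DATABASE:-postgresql}
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler
```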
### Zookeeper related configuration

DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
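The default configuration is as follows; the sketch below shows the usual defaults under the `registry` block of `application.yaml` (exact keys and values may differ between versions, so verify against the file shipped with your release):

```yaml
registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: localhost:2181
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
```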
### common.properties [hadoop、s3、yarn config properties]

This file mainly configures Hadoop, S3 and YARN related properties.

|Parameters | Default value| Description|
|--|--|--|
|resource.aws.s3.bucket.name|dolphinscheduler|bucket name of S3|
|resource.aws.s3.endpoint|http://minio:9000|endpoint of S3|
|resource.hdfs.root.user|hdfs|configure users with corresponding permissions if the storage type is HDFS|
|resource.hdfs.fs.defaultFS|hdfs://mycluster:8020|if resource.storage.type=S3, the request url would be similar to 's3a://dolphinscheduler'; if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into the 'conf' directory|
|hadoop.security.authentication.startup.state|false|whether hadoop grants kerberos permission|
|login.user.keytab.path|/opt/hdfs.headless.keytab|kerberos user keytab|
|kerberos.expire.time|2|kerberos expire time, integer, the unit is hour|
|yarn.resourcemanager.ha.rm.ids|192.168.xx.xx,192.168.xx.xx|specify the yarn resourcemanager url; if resourcemanager supports HA, input the HA IP addresses (separated by comma), or input null for standalone|
|yarn.application.status.address|http://ds1:8088/ws/v1/cluster/apps/%s|keep the default if ResourceManager supports HA or is not used; otherwise replace ds1 with the hostname of the standalone ResourceManager|
|development.state|false|specify whether in development state|
|resource.manager.httpaddress.port|8088|the port of resource manager|
|yarn.job.history.status.address|http://ds1:19888/ws/v1/history/mapreduce/jobs/%s|job history status url of yarn|
|datasource.encryption.enable|false|whether to enable datasource encryption|
|datasource.encryption.salt|!@#$%^&*|the salt of the datasource encryption|
|data-quality.jar.name|dolphinscheduler-data-quality-dev-SNAPSHOT.jar|the jar of data quality|
|support.hive.oneSession|false|specify whether hive SQL is executed in the same session|
|sudo.enable|true|whether to enable sudo|
|alert.rpc.port|50052|the RPC port of Alert Server|
|zeppelin.rest.url|http://localhost:8080|the RESTful API url of zeppelin|
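For example, switching the resource center to S3-compatible storage touches the following `common.properties` entries (a sketch; bucket and endpoint are placeholders, and access credentials must be configured as well, with key names that may differ between versions):

```properties
# illustrative common.properties fragment
resource.storage.type=S3
resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://minio:9000
```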
### Api-server related configuration
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|server.port|12345|api service communication port|
|server.jetty.max-http-post-size|5000000|jetty maximum post size|
|spring.banner.charset|UTF-8|message encoding|
|spring.jackson.time-zone|UTC|time zone|
|security.authentication.type|PASSWORD|authentication type|
|security.authentication.ldap.user.admin|read-only-admin|admin user account when you log in with LDAP|
|security.authentication.ldap.user.identity-attribute|uid|LDAP user identity attribute|
|security.authentication.ldap.user.email-attribute|mail|LDAP user email attribute|
|security.authentication.ldap.user.not-exist-action|CREATE|action when the LDAP user does not exist. CREATE (default): automatically create the user; DENY: deny the log-in|
|traffic.control.global.switch|false|traffic control global switch|
|traffic.control.max-global-qps-rate|300|global max request number per second|
|traffic.control.tenant-switch|false|traffic control tenant switch|
|traffic.control.default-tenant-qps-rate|10|default tenant max request number per second|
|traffic.control.customize-tenant-qps-rate||customized tenant max request number per second|
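As a sketch, the LDAP keys above map onto the YAML tree of `api-server/conf/application.yaml` like this (a real setup also needs the LDAP server URL and base DN, which are omitted here):

```yaml
security:
  authentication:
    type: LDAP
    ldap:
      user:
        admin: read-only-admin
        identity-attribute: uid
        email-attribute: mail
        not-exist-action: CREATE
```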
### Master Server related configuration
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
|master.listen-port|5678|master listen port|
|master.fetch-command-num|10|the number of commands fetched by master|
|master.pre-exec-threads|10|master prepare execute thread number, to limit the commands handled in parallel|
|master.exec-threads|100|master execute thread number, to limit the process instances running in parallel|
|master.dispatch-task-number|3|master dispatch task number per batch|
|master.host-selector|lower_weight|master host selector, to select a suitable worker. Optional values: random, round_robin, lower_weight|
|master.heartbeat-interval|10|master heartbeat interval, the unit is second|
|master.task-commit-retry-times|5|master commit task retry times|
|master.task-commit-interval|1000|master commit task interval, the unit is millisecond|
|master.state-wheel-interval|5|time to check status|
|master.max-cpu-load-avg|-1|master max CPU load avg; the master can schedule only while the system CPU load average is below this value. The default value -1 means the number of CPU cores * 2|
|master.reserved-memory|0.3|master reserved memory; the master can schedule only while available system memory is above this value. The unit is G|
|master.failover-interval|10|failover interval, the unit is minute|
|master.kill-yarn-job-when-task-failover|true|whether to kill the yarn job when failing over a task instance|
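For example, lowering master parallelism and tightening the scheduling guards in `master-server/conf/application.yaml` would look like this sketch (the values are illustrative, not recommendations):

```yaml
master:
  exec-threads: 50      # fewer process instances in parallel
  max-cpu-load-avg: 8   # stop scheduling above this load average
  reserved-memory: 1    # stop scheduling below 1G available memory
```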
### Worker Server related configuration
Location: `worker-server/conf/application.yaml`

|Parameters | Default value| Description|
|--|--|--|
|worker.listen-port|1234|worker-service listen port|
|worker.exec-threads|100|worker-service execute thread number, used to limit the number of task instances running in parallel|
|worker.heartbeat-interval|10|worker-service heartbeat interval, the unit is second|
|worker.host-weight|100|worker host weight to dispatch tasks|
|worker.tenant-auto-create|true|the tenant corresponds to a user of the system, which is used by the worker to submit jobs. If the system does not have this user, it is created automatically when this parameter is true.|
|worker.tenant-distributed-user|false|used for distributed users, e.g. users created by FreeIPA and stored in LDAP. This parameter only applies to Linux; when it is true, worker.tenant-auto-create has no effect and tenants are not created automatically.|
|worker.max-cpu-load-avg|-1|worker max CPU load avg; tasks are dispatched to the worker only while the system CPU load average is below this value. The default value -1 means the number of CPU cores * 2|
|worker.reserved-memory|0.3|worker reserved memory; tasks are dispatched to the worker only while available system memory is above this value. The unit is G|
|worker.groups|default|worker groups separated by comma, e.g. 'worker.groups=default,test' <br> the worker joins the corresponding groups according to this config on startup|
|worker.alert-listen-host|localhost|the alert listen host of worker|
|worker.alert-listen-port|50052|the alert listen port of worker|

### alert.properties [alert-service config]

|Parameters | Default value| Description|
|--|--|--|
|mail.server.host|xxx.xxx.com|mail server host|
|mail.server.port|25|mail server port|
|mail.sender|xxx@xxx.com|mail sender email|
|mail.user|xxx@xxx.com|mail sender email name|
|mail.passwd|111111|mail sender email password|
|mail.smtp.starttls.enable|true|specify whether mail opens TLS|
|mail.smtp.ssl.enable|false|specify whether mail opens SSL|
|mail.smtp.ssl.trust|xxx.xxx.com|specify the mail SSL trust list|

### Quartz related configuration

This part describes quartz configs; configure them based on your practical situation and resources.
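In the properties-file based layout, quartz keys follow standard Quartz naming; a couple of commonly tuned entries are shown below with their usual defaults (verify against the quartz configuration shipped with your version):

```properties
# illustrative quartz overrides
org.quartz.scheduler.instanceName = DolphinScheduler
org.quartz.threadPool.threadCount = 25
org.quartz.jobStore.misfireThreshold = 60000
```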
### install_config.conf [DS environment variables configuration script, for installing or starting DS]
install_config.conf is a bit complicated and is mainly used in the following two places.
* DS Cluster Auto Installation.
> The system loads the configs inside install_config.conf and auto-configures the files below based on its content when executing 'install.sh'.
> Files such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
* Startup and Shutdown DS Cluster.
> The system loads masters, workers, alert-server, api-servers and other parameters inside the file to start up or shut down the DS cluster.
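For example, with the 1.3.x layout the whole cluster is typically brought up or down through the wrapper scripts that read these parameters (script names as in that layout):

```bash
# start every master, worker, alert and api service listed in install_config.conf
./bin/start-all.sh

# shut them all down
./bin/stop-all.sh
```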
#### File Content
```bash
# Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
# eg: `[` escape to `\[`
# Database type (DS currently only supports PostgreSQL and MySQL)
dbtype="mysql"
# DS installation path, such as '/data1_1T/dolphinscheduler'
installPath="/data1_1T/dolphinscheduler"
# Deployment user
# Note: Deployment user needs 'sudo' privilege and has rights to operate HDFS.
# Root directory must be created by the same user if using HDFS, otherwise permission related issues will be raised.
deployUser="dolphinscheduler"
# The following are alert-service configs
# Mail server host
mailServerHost="smtp.exmail.qq.com"
# Mail server port
mailServerPort="25"
# Mail sender
mailSender="xxxxxxxxxx"
# Mail user
mailUser="xxxxxxxxxx"
# Mail password
mailPassword="xxxxxxxxxx"
# Whether mail supports TLS
starttlsEnable="true"
# Whether mail supports SSL. Note: starttlsEnable and sslEnable cannot both be set to true.
sslEnable="false"
# Mail server host, same as mailServerHost
sslTrust="smtp.exmail.qq.com"
# Specify the resource storage type for the resource upload function (used e.g. for SQL files). Supported options are HDFS, S3 and NONE; NONE disables the upload function.
resourceStorageType="NONE"
# If S3, set the S3 address, e.g. s3a://dolphinscheduler.
# Note: when using S3, make sure the root directory /dolphinscheduler has been created.
defaultFS="hdfs://mycluster:8020"
# If parameter 'resourceStorageType' is S3, following configs are needed:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# If ResourceManager supports HA, input the master and standby node IPs or hostnames, e.g. '192.168.xx.xx,192.168.xx.xx'. If ResourceManager runs in standalone mode, or yarn is not used, set yarnHaIps="".
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If ResourceManager runs in standalone mode, set the ResourceManager node ip or hostname, or else keep the default.