
[Doc] update the configuration doc (#11113)

3.1.0-release
rickchengx 2 years ago committed by GitHub
parent
commit 39186b1a6d
GPG Key ID: 4AEE18F83AFDEB23
  1. 503
      docs/docs/en/architecture/configuration.md
  2. 498
      docs/docs/zh/architecture/configuration.md

503
docs/docs/en/architecture/configuration.md

@@ -4,15 +4,11 @@
 ## Preface
-This document explains the DolphinScheduler application configurations according to DolphinScheduler-1.3.x versions.
+This document explains the DolphinScheduler application configurations.
 ## Directory Structure
-Currently, all the configuration files are under [conf ] directory.
-Check the following simplified DolphinScheduler installation directories to have a direct view about the position of [conf] directory and configuration files it has.
-This document only describes DolphinScheduler configurations and other topics are not going into.
-[Note: the DolphinScheduler (hereinafter called the ‘DS’) .]
+The directory structure of DolphinScheduler is as follows:
 ```
 ├── LICENSE
@@ -100,27 +96,13 @@ This document only describes DolphinScheduler configurations and other topics ar
 ## Configurations in Details
-serial number| service classification| config file|
-|--|--|--|
-1|startup or shutdown DS application|dolphinscheduler-daemon.sh
-2|datasource config properties|datasource.properties
-3|ZooKeeper config properties|zookeeper.properties
-4|common-service[storage] config properties|common.properties
-5|API-service config properties|application-api.properties
-6|master-service config properties|master.properties
-7|worker-service config properties|worker.properties
-8|alert-service config properties|alert.properties
-9|quartz config properties|quartz.properties
-10|DS environment variables configuration script[install/start DS]|install_config.conf
-11|load environment variables configs <br /> [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
-12|services log config files|API-service log config : logback-api.xml <br /> master-service log config : logback-master.xml <br /> worker-service log config : logback-worker.xml <br /> alert-service log config : logback-alert.xml
-### dolphinscheduler-daemon.sh [startup or shutdown DS application]
-dolphinscheduler-daemon.sh is responsible for DS startup and shutdown.
+### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
+dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown.
 Essentially, start-all.sh or stop-all.sh startup and shutdown the cluster via dolphinscheduler-daemon.sh.
-Currently, DS just makes a basic config, remember to config further JVM options based on your practical situation of resources.
+Currently, DolphinScheduler just makes a basic config, remember to config further JVM options based on your practical situation of resources.
 Default simplified parameters are:
 ```bash
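The hunk above leaves JVM tuning to the operator, and the doc's note on the daemon script warns against `-XX:DisableExplicitGC` because DolphinScheduler communicates over Netty. A minimal pre-start guard could look like the sketch below. The function name `check_jvm_opts` is hypothetical and not part of DolphinScheduler; it assumes the options are held in the `DOLPHINSCHEDULER_OPTS` variable shown in the daemon-script defaults.

```shell
#!/bin/sh
# Hypothetical pre-start guard (not shipped with DolphinScheduler):
# returns 1 and prints a warning when the given JVM option string
# turns on DisableExplicitGC, which the configuration doc discourages.
check_jvm_opts() {
  case "$1" in
    # Match both the doc's spelling and the usual HotSpot "+" form.
    *"-XX:+DisableExplicitGC"*|*"-XX:DisableExplicitGC"*)
      echo "warning: DisableExplicitGC found in JVM options" >&2
      return 1
      ;;
  esac
  return 0
}

# Inspect whatever is currently exported (empty string if unset).
if check_jvm_opts "${DOLPHINSCHEDULER_OPTS:-}"; then
  echo "JVM options OK"
fi
```

The pattern deliberately does not match `-XX:-DisableExplicitGC`, which switches the flag off and leaves explicit GC enabled.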
@@ -137,321 +119,206 @@ export DOLPHINSCHEDULER_OPTS="
 "
 ```
-> "-XX:DisableExplicitGC" is not recommended due to may lead to memory link (DS dependent on Netty to communicate).
-### datasource.properties [datasource config properties]
-DS uses Druid to manage database connections and default simplified configs are:
+> "-XX:DisableExplicitGC" is not recommended due to may lead to memory link (DolphinScheduler dependent on Netty to communicate).
+### Database connection related configuration
+DolphinScheduler uses Spring Hikari to manage database connections, configuration file location:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+|Worker Server| `worker-server/conf/application.yaml`|
+|Alert Server| `alert-server/conf/application.yaml`|
+The default configuration is as follows:
 |Parameters | Default value| Description|
 |--|--|--|
-spring.datasource.driver-class-name||datasource driver
-spring.datasource.url||datasource connection url
-spring.datasource.username||datasource username
-spring.datasource.password||datasource password
-spring.datasource.initialSize|5| initial connection pool size number
-spring.datasource.minIdle|5| minimum connection pool size number
-spring.datasource.maxActive|5| maximum connection pool size number
-spring.datasource.maxWait|60000| max wait milliseconds
-spring.datasource.timeBetweenEvictionRunsMillis|60000| idle connection check interval
-spring.datasource.timeBetweenConnectErrorMillis|60000| retry interval
-spring.datasource.minEvictableIdleTimeMillis|300000| connections over minEvictableIdleTimeMillis will be collect when idle check
-spring.datasource.validationQuery|SELECT 1| validate connection by running the SQL
-spring.datasource.validationQueryTimeout|3| validate connection timeout[seconds]
-spring.datasource.testWhileIdle|true| set whether the pool validates the allocated connection when a new connection request comes
-spring.datasource.testOnBorrow|true| validity check when the program requests a new connection
-spring.datasource.testOnReturn|false| validity check when the program recalls a connection
-spring.datasource.defaultAutoCommit|true| whether auto commit
-spring.datasource.keepAlive|true| runs validationQuery SQL to avoid the connection closed by pool when the connection idles over minEvictableIdleTimeMillis
-spring.datasource.poolPreparedStatements|true| open PSCache
-spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| specify the size of PSCache on each connection
-### zookeeper.properties [zookeeper config properties]
+|spring.datasource.driver-class-name| org.postgresql.Driver |datasource driver|
+|spring.datasource.url| jdbc:postgresql://127.0.0.1:5432/dolphinscheduler |datasource connection url|
+|spring.datasource.username|root|datasource username|
+|spring.datasource.password|root|datasource password|
+|spring.datasource.hikari.connection-test-query|select 1|validate connection by running the SQL|
+|spring.datasource.hikari.minimum-idle| 5| minimum connection pool size number|
+|spring.datasource.hikari.auto-commit|true|whether auto commit|
+|spring.datasource.hikari.pool-name|DolphinScheduler|name of the connection pool|
+|spring.datasource.hikari.maximum-pool-size|50| maximum connection pool size number|
+|spring.datasource.hikari.connection-timeout|30000|connection timeout|
+|spring.datasource.hikari.idle-timeout|600000|Maximum idle connection survival time|
+|spring.datasource.hikari.leak-detection-threshold|0|Connection leak detection threshold|
+|spring.datasource.hikari.initialization-fail-timeout|1|Connection pool initialization failed timeout|
+Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
+### Zookeeper related configuration
+DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+|Worker Server| `worker-server/conf/application.yaml`|
+The default configuration is as follows:
 |Parameters | Default value| Description|
 |--|--|--|
-zookeeper.quorum|localhost:2181| ZooKeeper cluster connection info
-zookeeper.dolphinscheduler.root|/dolphinscheduler| DS is stored under ZooKeeper root directory
-zookeeper.session.timeout|60000| session timeout
-zookeeper.connection.timeout|30000| connection timeout
-zookeeper.retry.base.sleep|100| time to wait between subsequent retries
-zookeeper.retry.max.sleep|30000| maximum time to wait between subsequent retries
-zookeeper.retry.maxtime|10| maximum retry times
+|registry.zookeeper.namespace|dolphinscheduler|namespace of zookeeper|
+|registry.zookeeper.connect-string|localhost:2181| the connection string of zookeeper|
+|registry.zookeeper.retry-policy.base-sleep-time|60ms|time to wait between subsequent retries|
+|registry.zookeeper.retry-policy.max-sleep|300ms|maximum time to wait between subsequent retries|
+|registry.zookeeper.retry-policy.max-retries|5|maximum retry times|
+|registry.zookeeper.session-timeout|30s|session timeout|
+|registry.zookeeper.connection-timeout|30s|connection timeout|
+|registry.zookeeper.block-until-connected|600ms|waiting time to block until the connection succeeds|
+|registry.zookeeper.digest|~|digest of zookeeper|
+Note that DolphinScheduler also supports zookeeper related configuration through `bin/env/dolphinscheduler_env.sh`.
 ### common.properties [hadoop、s3、yarn config properties]
-Currently, common.properties mainly configures Hadoop,s3a related configurations.
-| Parameters | Default value | Description |
-|--|--|--|
-data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files
-resource.storage.type | NONE | type of resource files: HDFS, S3, NONE
-resource.storage.upload.base.path | /dolphinscheduler | storage path of resource files
-resource.aws.access.key.id | minioadmin | access key id of S3
-resource.aws.secret.access.key | minioadmin | secret access key of S3
-resource.aws.region |us-east-1 | region of S3
-resource.aws.s3.bucket.name | dolphinscheduler | bucket name of S3
-resource.aws.s3.endpoint | http://minio:9000 | endpoint of S3
-resource.hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS
-resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory
-hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission
-java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory
-login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username
-login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab
-kerberos.expire.time | 2 | kerberos expire time,integer,the unit is hour
-yarn.resourcemanager.ha.rm.ids | | specify the yarn resourcemanager url. if resourcemanager supports HA, input HA IP addresses (separated by comma), or input null for standalone
-yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode
-dolphinscheduler.env.path | env/dolphinscheduler_env.sh | load environment variables configs [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
-development.state | false | specify whether in development state
-task.resource.limit.state | false | specify whether in resource limit state
-### application-api.properties [API-service log config]
+Currently, common.properties mainly configures Hadoop,s3a related configurations. Configuration file location:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/common.properties`|
+|Api Server| `api-server/conf/common.properties`|
+|Worker Server| `worker-server/conf/common.properties`|
+|Alert Server| `alert-server/conf/common.properties`|
+The default configuration is as follows:
 | Parameters | Default value | Description |
 |--|--|--|
-server.port|12345|api service communication port
-server.servlet.session.timeout|7200|session timeout
-server.servlet.context-path|/dolphinscheduler | request path
-spring.servlet.multipart.max-file-size|1024MB| maximum file size
-spring.servlet.multipart.max-request-size|1024MB| maximum request size
-server.jetty.max-http-post-size|5000000| jetty maximum post size
-spring.messages.encoding|UTF-8| message encoding
-spring.jackson.time-zone|GMT+8| time zone
-spring.messages.basename|i18n/messages| i18n config
-security.authentication.type|PASSWORD| authentication type
-security.authentication.ldap.user.admin|read-only-admin|admin user account when you log-in with LDAP
-security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls
-security.authentication.ldap.base-dn|dc=example,dc=com|LDAP base dn
-security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP username
-security.authentication.ldap.password|password|LDAP password
-security.authentication.ldap.user.identity-attribute|uid|LDAP user identity attribute
-security.authentication.ldap.user.email-attribute|mail|LDAP user email attribute
-security.authentication.ldap.user.not-exist-action|CREATE|action when LDAP user is not exist. Default CREATE: automatically create user when user not exist, DENY: deny log-in when user not exist
-traffic.control.global.switch|false|traffic control global switch
-traffic.control.max-global-qps-rate|300|global max request number per second
-traffic.control.tenant-switch|false|traffic control tenant switch
-traffic.control.default-tenant-qps-rate|10|default tenant max request number per second
-traffic.control.customize-tenant-qps-rate||customize tenant max request number per second
-### master.properties [master-service log config]
+|data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files|
+|resource.storage.type | NONE | type of resource files: HDFS, S3, NONE|
+|resource.upload.path | /dolphinscheduler | storage path of resource files|
+|aws.access.key.id | minioadmin | access key id of S3|
+|aws.secret.access.key | minioadmin | secret access key of S3|
+|aws.region | us-east-1 | region of S3|
+|aws.s3.endpoint | http://minio:9000 | endpoint of S3|
+|hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS|
+|fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory|
+|hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission|
+|java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory|
+|login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username|
+|login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab|
+|kerberos.expire.time | 2 | kerberos expire time,integer,the unit is hour|
+|yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | specify the yarn resourcemanager url. if resourcemanager supports HA, input HA IP addresses (separated by comma), or input null for standalone|
+|yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode|
+|development.state | false | specify whether in development state|
+|resource.manager.httpaddress.port | 8088 | the port of resource manager|
+|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn|
+|datasource.encryption.enable | false | whether to enable datasource encryption|
+|datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption|
+|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality|
+|support.hive.oneSession | false | specify whether hive SQL is executed in the same session|
+|sudo.enable | true | whether to enable sudo|
+|alert.rpc.port | 50052 | the RPC port of Alert Server|
+|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
+### Api-server related configuration
+Location: `api-server/conf/application.yaml`
 |Parameters | Default value| Description|
 |--|--|--|
-master.listen.port|5678|master listen port
-master.exec.threads|100|master-service execute thread number, used to limit the number of process instances in parallel
-master.exec.task.num|20|defines the number of parallel tasks for each process instance of the master-service
-master.dispatch.task.num|3|defines the number of dispatch tasks for each batch of the master-service
-master.host.selector|LowerWeight|master host selector, to select a suitable worker to run the task, optional value: random, round-robin, lower weight
-master.heartbeat.interval|10|master heartbeat interval, the unit is second
-master.task.commit.retryTimes|5|master commit task retry times
-master.task.commit.interval|1000|master commit task interval, the unit is millisecond
-master.max.cpuload.avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2
-master.reserved.memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G
-### worker.properties [worker-service log config]
+|server.port|12345|api service communication port|
+|server.servlet.session.timeout|120m|session timeout|
+|server.servlet.context-path|/dolphinscheduler/ |request path|
+|spring.servlet.multipart.max-file-size|1024MB|maximum file size|
+|spring.servlet.multipart.max-request-size|1024MB|maximum request size|
+|server.jetty.max-http-post-size|5000000|jetty maximum post size|
+|spring.banner.charset|UTF-8|message encoding|
+|spring.jackson.time-zone|UTC|time zone|
+|spring.jackson.date-format|"yyyy-MM-dd HH:mm:ss"|time format|
+|spring.messages.basename|i18n/messages|i18n config|
+|security.authentication.type|PASSWORD|authentication type|
+|security.authentication.ldap.user.admin|read-only-admin|admin user account when you log-in with LDAP|
+|security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls|
+|security.authentication.ldap.base.dn|dc=example,dc=com|LDAP base dn|
+|security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP username|
+|security.authentication.ldap.password|password|LDAP password|
+|security.authentication.ldap.user.identity.attribute|uid|LDAP user identity attribute|
+|security.authentication.ldap.user.email.attribute|mail|LDAP user email attribute|
+### Master Server related configuration
+Location: `master-server/conf/application.yaml`
 |Parameters | Default value| Description|
 |--|--|--|
-worker.listen.port|1234|worker-service listen port
-worker.exec.threads|100|worker-service execute thread number, used to limit the number of task instances in parallel
-worker.heartbeat.interval|10|worker-service heartbeat interval, the unit is second
-worker.max.cpuload.avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2
-worker.reserved.memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
-worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup
-worker.tenant.auto.create|true|tenant corresponds to the user of the system, which is used by the worker to submit the job. If system does not have this user, it will be automatically created after the parameter worker.tenant.auto.create is true.
-worker.tenant.distributed.user|false|Scenes to be used for distributed users.For example,users created by FreeIpa are stored in LDAP.This parameter only applies to Linux, When this parameter is true, worker.tenant.auto.create has no effect and will not automatically create tenants.
-### alert.properties [alert-service log config]
+|master.listen-port|5678|master listen port|
+|master.fetch-command-num|10|the number of commands fetched by master|
+|master.pre-exec-threads|10|master prepare execute thread number to limit handle commands in parallel|
+|master.exec-threads|100|master execute thread number to limit process instances in parallel|
+|master.dispatch-task-number|3|master dispatch task number per batch|
+|master.host-selector|lower_weight|master host selector to select a suitable worker, default value: LowerWeight. Optional values include random, round_robin, lower_weight|
+|master.heartbeat-interval|10|master heartbeat interval, the unit is second|
+|master.task-commit-retry-times|5|master commit task retry times|
+|master.task-commit-interval|1000|master commit task interval, the unit is millisecond|
+|master.state-wheel-interval|5|time to check status|
+|master.max-cpu-load-avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2|
+|master.reserved-memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G|
+|master.failover-interval|10|failover interval, the unit is minute|
+|master.kill-yarn-job-when-task-failover|true|whether to kill yarn job when failover taskInstance|
+### Worker Server related configuration
+Location: `worker-server/conf/application.yaml`
 |Parameters | Default value| Description|
 |--|--|--|
-alert.type|EMAIL|alter type|
-mail.protocol|SMTP|mail server protocol
-mail.server.host|xxx.xxx.com|mail server host
-mail.server.port|25|mail server port
-mail.sender|xxx@xxx.com|mail sender email
-mail.user|xxx@xxx.com|mail sender email name
-mail.passwd|111111|mail sender email password
-mail.smtp.starttls.enable|true|specify mail whether open tls
-mail.smtp.ssl.enable|false|specify mail whether open ssl
-mail.smtp.ssl.trust|xxx.xxx.com|specify mail ssl trust list
-xls.file.path|/tmp/xls|mail attachment temp storage directory
-||following configure WeCom[optional]|
-enterprise.wechat.enable|false|specify whether enable WeCom
-enterprise.wechat.corp.id|xxxxxxx|WeCom corp id
-enterprise.wechat.secret|xxxxxxx|WeCom secret
-enterprise.wechat.agent.id|xxxxxxx|WeCom agent id
-enterprise.wechat.users|xxxxxxx|WeCom users
-enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|WeCom token url
-enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|WeCom push url
-enterprise.wechat.user.send.msg||send message format
-enterprise.wechat.team.send.msg||group message format
-plugin.dir|/Users/xx/your/path/to/plugin/dir|plugin directory
-### quartz.properties [quartz config properties]
-This part describes quartz configs and configure them based on your practical situation and resources.
+|worker.listen-port|1234|worker-service listen port|
+|worker.exec-threads|100|worker-service execute thread number, used to limit the number of task instances in parallel|
+|worker.heartbeat-interval|10|worker-service heartbeat interval, the unit is second|
+|worker.host-weight|100|worker host weight to dispatch tasks|
+|worker.tenant-auto-create|true|tenant corresponds to the user of the system, which is used by the worker to submit the job. If system does not have this user, it will be automatically created after the parameter worker.tenant.auto.create is true.|
+|worker.max-cpu-load-avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2|
+|worker.reserved-memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G|
+|worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup|
+|worker.alert-listen-host|localhost|the alert listen host of worker|
+|worker.alert-listen-port|50052|the alert listen port of worker|
+### Alert Server related configuration
+Location: `alert-server/conf/application.yaml`
 |Parameters | Default value| Description|
 |--|--|--|
-org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate |
-org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
-org.quartz.scheduler.instanceName | DolphinScheduler |
-org.quartz.scheduler.instanceId | AUTO |
-org.quartz.scheduler.makeSchedulerThreadDaemon | true |
-org.quartz.jobStore.useProperties | false |
-org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |
-org.quartz.threadPool.makeThreadsDaemons | true |
-org.quartz.threadPool.threadCount | 25 |
-org.quartz.threadPool.threadPriority | 5 |
-org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX |
-org.quartz.jobStore.tablePrefix | QRTZ_ |
-org.quartz.jobStore.isClustered | true |
-org.quartz.jobStore.misfireThreshold | 60000 |
-org.quartz.jobStore.clusterCheckinInterval | 5000 |
-org.quartz.jobStore.acquireTriggersWithinLock|true |
-org.quartz.jobStore.dataSource | myDs |
-org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider |
-### install_config.conf [DS environment variables configuration script[install or start DS]]
-install_config.conf is a bit complicated and is mainly used in the following two places.
-* DS Cluster Auto Installation.
-> System will load configs in the install_config.conf and auto-configure files below, based on the file content when executing 'install.sh'.
-> Files such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
-* Startup and Shutdown DS Cluster.
-> The system will load masters, workers, alert-server, API-servers and other parameters inside the file to startup or shutdown DS cluster.
-#### File Content
-```bash
-# Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
-# eg: `[` escape to `\[`
-# Database type (DS currently only supports PostgreSQL and MySQL)
-dbtype="mysql"
-# Database url and port
-dbhost="192.168.xx.xx:3306"
-# Database name
-dbname="dolphinscheduler"
-# Database username
-username="xx"
-# Database password
-password="xx"
-# ZooKeeper url
-zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
-# DS installation path, such as '/data1_1T/dolphinscheduler'
-installPath="/data1_1T/dolphinscheduler"
-# Deployment user
-# Note: Deployment user needs 'sudo' privilege and has rights to operate HDFS.
-# Root directory must be created by the same user if using HDFS, otherwise permission related issues will be raised.
-deployUser="dolphinscheduler"
-# Followings are alert-service configs
-# Mail server host
-mailServerHost="smtp.exmail.qq.com"
-# Mail server port
-mailServerPort="25"
-# Mail sender
-mailSender="xxxxxxxxxx"
-# Mail user
-mailUser="xxxxxxxxxx"
-# Mail password
-mailPassword="xxxxxxxxxx"
-# Whether mail supports TLS
-starttlsEnable="true"
-# Whether mail supports SSL. Note: starttlsEnable and sslEnable cannot both set true.
-sslEnable="false"
-# Mail server host, same as mailServerHost
-sslTrust="smtp.exmail.qq.com"
-# Specify which resource upload function to use for resources storage, such as sql files. And supported options are HDFS, S3 and NONE. HDFS for upload to HDFS and NONE for not using this function.
-resourceStorageType="NONE"
-# if S3, write S3 address. HA, for example: s3a://dolphinscheduler,
-# Note: s3 make sure to create the root directory /dolphinscheduler
-defaultFS="hdfs://mycluster:8020"
-# If parameter 'resourceStorageType' is S3, following configs are needed:
-s3Endpoint="http://192.168.xx.xx:9010"
-s3AccessKey="xxxxxxxxxx"
-s3SecretKey="xxxxxxxxxx"
-# If ResourceManager supports HA, then input master and standby node IP or hostname, eg: '192.168.xx.xx,192.168.xx.xx'. Or else ResourceManager run in standalone mode, please set yarnHaIps="" and "" for not using yarn.
-yarnHaIps="192.168.xx.xx,192.168.xx.xx"
-# If ResourceManager runs in standalone, then set ResourceManager node ip or hostname, or else remain default.
-singleYarnIp="yarnIp1"
-# Storage path when using HDFS/S3
-resourceUploadPath="/dolphinscheduler"
-# HDFS/S3 root user
-hdfsRootUser="hdfs"
-# Followings are Kerberos configs
-# Specify Kerberos enable or not
-kerberosStartUp="false"
-# Kdc krb5 config file path
-krb5ConfPath="$installPath/conf/krb5.conf"
-# Keytab username
-keytabUserName="hdfs-mycluster@ESZ.COM"
-# Username keytab path
-keytabPath="$installPath/conf/hdfs.headless.keytab"
-# API-service port
-apiServerPort="12345"
-# All hosts deploy DS
-ips="ds1,ds2,ds3,ds4,ds5"
-# Ssh port, default 22
-sshPort="22"
-# Master service hosts
-masters="ds1,ds2"
-# All hosts deploy worker service
-# Note: Each worker needs to set a worker group name and default name is "default"
-workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
-# Host deploy alert-service
-alertServer="ds3"
-# Host deploy API-service
-apiServers="ds1"
-```
+|server.port|50053|the port of Alert Server|
+|alert.port|50052|the port of alert|
+### Quartz related configuration
+This part describes quartz configs and configure them based on your practical situation and resources.
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+The default configuration is as follows:
+|Parameters | Default value|
+|--|--|
+|spring.quartz.properties.org.quartz.threadPool.threadPriority | 5|
+|spring.quartz.properties.org.quartz.jobStore.isClustered | true|
+|spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX|
+|spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO|
+|spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_|
+|spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock|true|
+|spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler|
+|spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool|
+|spring.quartz.properties.org.quartz.jobStore.useProperties | false|
+|spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true|
+|spring.quartz.properties.org.quartz.threadPool.threadCount | 25|
+|spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000|
+|spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true|
+|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
+|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
 ### dolphinscheduler_env.sh [load environment variables configs]
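The new database and ZooKeeper sections both note that settings can also be supplied through `bin/env/dolphinscheduler_env.sh`, but the diff does not say how those variables are named. Assuming they are picked up through Spring Boot's standard environment-variable binding, a property name converts by Spring Boot's documented canonical rule: replace dots with underscores, remove dashes, and upper-case the result. The helper `to_env_name` below is a hypothetical illustration of that rule, not a DolphinScheduler script; the shipped `dolphinscheduler_env.sh` is the authority on the names it actually exports.

```shell
#!/bin/sh
# Hypothetical helper: derive the environment-variable name that Spring Boot's
# canonical relaxed-binding rule maps to a given property name, e.g.
#   spring.datasource.url -> SPRING_DATASOURCE_URL
to_env_name() {
  printf '%s\n' "$1" \
    | sed -e 's/\./_/g' -e 's/-//g' \
    | tr '[:lower:]' '[:upper:]'
}

# Print the variable names for a few properties from the datasource table.
for prop in spring.datasource.url \
            spring.datasource.username \
            spring.datasource.hikari.maximum-pool-size; do
  printf '%s -> %s\n' "$prop" "$(to_env_name "$prop")"
done
```

Under that assumption, `export SPRING_DATASOURCE_URL=jdbc:postgresql://127.0.0.1:5432/dolphinscheduler` would take precedence over `spring.datasource.url` in `application.yaml`, since Spring Boot ranks OS environment variables above packaged application configuration.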
@@ -491,11 +358,11 @@ export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
 export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
 ```
-### Services logback configs
-Services name| logback config name |
---|--|
-API-service logback config |logback-api.xml|
-master-service logback config|logback-master.xml |
-worker-service logback config|logback-worker.xml |
-alert-service logback config|logback-alert.xml |
+### Log related configuration
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/logback-spring.xml`|
+|Api Server| `api-server/conf/logback-spring.xml`|
+|Worker Server| `worker-server/conf/logback-spring.xml`|
+|Alert Server| `alert-server/conf/logback-spring.xml`|
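The new registry table in the English file replaces the old millisecond integers with duration strings: a base sleep of `60ms`, a maximum sleep of `300ms` and `5` retries. The diff does not describe the backoff algorithm. Assuming the registry hands these values to Apache Curator's `ExponentialBackoffRetry` (an assumption about the implementation), which as we understand it sleeps `base * max(1, r)` for retry `n`, with `r` a random integer in `[0, 2^(n+1))`, and caps the result at the maximum sleep, the sketch below prints the worst-case sleep per retry.

```shell
#!/bin/sh
# Worst-case sleep per retry under Curator-style exponential backoff.
# The largest possible random multiplier for retry n is 2^(n+1) - 1,
# and each sleep is capped at max_sleep_ms.
base_ms=60        # registry.zookeeper.retry-policy.base-sleep-time
max_sleep_ms=300  # registry.zookeeper.retry-policy.max-sleep
max_retries=5     # registry.zookeeper.retry-policy.max-retries

total=0
n=0
while [ "$n" -lt "$max_retries" ]; do
  sleep_ms=$(( base_ms * ((1 << (n + 1)) - 1) ))
  if [ "$sleep_ms" -gt "$max_sleep_ms" ]; then
    sleep_ms=$max_sleep_ms
  fi
  echo "retry $n: up to ${sleep_ms}ms"
  total=$(( total + sleep_ms ))
  n=$(( n + 1 ))
done
echo "worst-case total sleep: ${total}ms"
```

With the new defaults this gives at most 60, 180, 300, 300 and 300 ms, so the cap is reached from the third retry and roughly 1.1 s is spent sleeping in the worst case, well inside the `30s` session and connection timeouts listed in the same table.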

498
docs/docs/zh/architecture/configuration.md

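@@ -1,14 +1,10 @@
 <!-- markdown-link-check-disable -->
 # Preface
-This document describes the dolphinscheduler configuration files; it applies to version dolphinscheduler-1.3.x.
+This document describes the dolphinscheduler configuration files
 # Directory Structure
-Currently, all dolphinscheduler configuration files are in the [conf ] directory.
-For a more direct view of where the [conf] directory is located and which configuration files it contains, see the simplified description of the dolphinscheduler installation directory below.
-This document mainly covers the dolphinscheduler configuration files. Other topics are not discussed here.
-[Note: dolphinscheduler is hereinafter abbreviated as DS.]
+The directory structure of DolphinScheduler is as follows:
 ```
 ├── LICENSE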
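@@ -96,26 +92,10 @@
 # Configuration Details
-No.| Service category | Configuration file|
-|--|--|--|
-1|Start/stop DS service script|dolphinscheduler-daemon.sh
-2|Database connection configuration | datasource.properties
-3|ZooKeeper connection configuration|zookeeper.properties
-4|Common [storage] configuration|common.properties
-5|API service configuration|application-api.properties
-6|Master service configuration|master.properties
-7|Worker service configuration|worker.properties
-8|Alert service configuration|alert.properties
-9|Quartz configuration|quartz.properties
-10|DS environment variable configuration script [used for DS installation/startup]|install_config.conf
-11|Environment variable file loaded by running scripts <br />[e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
-12|Log configuration file of each service|API service log configuration file: logback-api.xml <br /> Master service log configuration file: logback-master.xml <br /> Worker service log configuration file: logback-worker.xml <br /> Alert service log configuration file: logback-alert.xml
-## 1.dolphinscheduler-daemon.sh [Start/stop DS service script]
-The dolphinscheduler-daemon.sh script is responsible for starting and stopping DS.
+## dolphinscheduler-daemon.sh [Start/stop DolphinScheduler service script]
+The dolphinscheduler-daemon.sh script is responsible for starting and stopping DolphinScheduler.
 start-all.sh/stop-all.sh also start/stop the cluster through dolphinscheduler-daemon.sh in the end.
-Currently DS only provides a basic setup; set the JVM parameters according to the resources you actually have.
+Currently DolphinScheduler only provides a basic setup; set the JVM parameters according to the resources you actually have.
 The default simplified parameters are as follows:
 ```bash
@@ -132,313 +112,201 @@ export DOLPHINSCHEDULER_OPTS="
 "
 ```
-> Setting "-XX:+DisableExplicitGC" is not recommended: DS uses Netty for communication, and setting this parameter may cause memory leaks.
+> Setting "-XX:+DisableExplicitGC" is not recommended: DolphinScheduler uses Netty for communication, and setting this parameter may cause memory leaks.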
-## 2.datasource.properties [Database connection]
-DS uses Druid to manage database connections; the default simplified configuration is as follows.
-|Parameter | Default| Description|
-|--|--|--|
-spring.datasource.driver-class-name| |Database driver
-spring.datasource.url||Database connection URL
-spring.datasource.username||Database username
-spring.datasource.password||Database password
-spring.datasource.initialSize|5| Initial number of pooled connections
-spring.datasource.minIdle|5| Minimum number of pooled connections
-spring.datasource.maxActive|5| Maximum number of pooled connections
-spring.datasource.maxWait|60000| Maximum wait time
-spring.datasource.timeBetweenEvictionRunsMillis|60000| Connection check interval
-spring.datasource.timeBetweenConnectErrorMillis|60000| Retry interval
-spring.datasource.minEvictableIdleTimeMillis|300000| Minimum time a connection can stay idle before it is evicted
-spring.datasource.validationQuery|SELECT 1|SQL used to check whether a connection is valid
-spring.datasource.validationQueryTimeout|3| Timeout for checking whether a connection is valid [seconds]
-spring.datasource.testWhileIdle|true| Check when a connection is requested: if its idle time exceeds timeBetweenEvictionRunsMillis, run validationQuery to check whether it is still valid.
-spring.datasource.testOnBorrow|true| Run validationQuery to check the connection when it is requested
-spring.datasource.testOnReturn|false| Run validationQuery to check the connection when it is returned
-spring.datasource.defaultAutoCommit|true| Whether auto-commit is enabled
-spring.datasource.keepAlive|true| For connections within the minIdle count, a keepAlive operation is performed once their idle time exceeds minEvictableIdleTimeMillis.
-spring.datasource.poolPreparedStatements|true| Enable PSCache
-spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| Must be greater than 0 to enable PSCache; when it is greater than 0, poolPreparedStatements is automatically set to true.
-## 3.zookeeper.properties [ZooKeeper connection configuration]
+## Database connection configuration
+DolphinScheduler uses Spring Boot's Hikari connection pool to manage database connections. Configuration file locations:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+|Worker Server| `worker-server/conf/application.yaml`|
+|Alert Server| `alert-server/conf/application.yaml`|
+The default configuration is as follows:
 |Parameter | Default| Description|
 |--|--|--|
-zookeeper.quorum|localhost:2181| ZooKeeper cluster connection info
-zookeeper.dolphinscheduler.root|/dolphinscheduler| Root directory where DS stores its data in ZooKeeper
-zookeeper.session.timeout|60000| Session timeout
-zookeeper.connection.timeout|30000| Connection timeout
-zookeeper.retry.base.sleep|100| Base retry interval
-zookeeper.retry.max.sleep|30000| Maximum retry time
-zookeeper.retry.maxtime|10|Maximum number of retries
+|spring.datasource.driver-class-name| org.postgresql.Driver |Database driver|
+|spring.datasource.url| jdbc:postgresql://127.0.0.1:5432/dolphinscheduler |Database connection URL|
+|spring.datasource.username|root|Database username|
+|spring.datasource.password|root|Database password|
+|spring.datasource.hikari.connection-test-query|select 1|SQL used to check whether a connection is valid|
+|spring.datasource.hikari.minimum-idle| 5|Minimum number of idle connections in the pool|
+|spring.datasource.hikari.auto-commit|true|Whether to auto-commit|
+|spring.datasource.hikari.pool-name|DolphinScheduler|Connection pool name|
+|spring.datasource.hikari.maximum-pool-size|50|Maximum number of connections in the pool|
+|spring.datasource.hikari.connection-timeout|30000|Connection timeout|
+|spring.datasource.hikari.idle-timeout|600000|Maximum time an idle connection is kept alive|
+|spring.datasource.hikari.leak-detection-threshold|0|Connection leak detection threshold|
+|spring.datasource.hikari.initialization-fail-timeout|1|Timeout for connection pool initialization failure|
+DolphinScheduler can also configure the database connection through `bin/env/dolphinscheduler_env.sh`.
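The datasource settings can also be supplied from `bin/env/dolphinscheduler_env.sh`. A minimal sketch of such an excerpt follows, assuming Spring Boot's relaxed binding from environment variables (dots become underscores, dashes are dropped, names are upper-cased). The variable names are derived from that rule rather than copied from a specific release, so compare them with the file shipped with your version:

```bash
# Hypothetical excerpt for bin/env/dolphinscheduler_env.sh (names assume Spring Boot
# relaxed binding, e.g. spring.datasource.url -> SPRING_DATASOURCE_URL).
# A value already exported in the environment wins; otherwise the documented default is used.
export SPRING_DATASOURCE_URL="${SPRING_DATASOURCE_URL:-jdbc:postgresql://127.0.0.1:5432/dolphinscheduler}"
export SPRING_DATASOURCE_USERNAME="${SPRING_DATASOURCE_USERNAME:-root}"
export SPRING_DATASOURCE_PASSWORD="${SPRING_DATASOURCE_PASSWORD:-root}"
# spring.datasource.hikari.maximum-pool-size: the dashes disappear from the variable name.
export SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE="${SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE:-50}"
```

Because of the `${VAR:-default}` form, exporting a different value before the services start (for example in a container environment) overrides the default without editing the file.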
-## 4.common.properties [Hadoop, S3, YARN configuration]
-The common.properties file currently mainly holds Hadoop/s3a-related configuration.
+## ZooKeeper configuration
+DolphinScheduler uses ZooKeeper for cluster management, fault tolerance, event listening and similar functions. Configuration file locations:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+|Worker Server| `worker-server/conf/application.yaml`|
+The default configuration is as follows:
 |Parameter |Default| Description|
 |--|--|--|
-data.basedir.path | /tmp/dolphinscheduler | Local working directory for temporary files
-resource.storage.type | NONE | Resource file storage type: HDFS, S3, NONE
-resource.storage.upload.base.path | /dolphinscheduler | Resource file storage path
-resource.aws.access.key.id | minioadmin | S3 access key
-resource.aws.secret.access.key | minioadmin | S3 secret access key
-resource.aws.region | us-east-1 | S3 region
-resource.aws.s3.bucket.name | dolphinscheduler | S3 bucket name
-resource.aws.s3.endpoint | http://minio:9000 | S3 endpoint address
-resource.hdfs.root.user | hdfs | If the storage type is HDFS, configure a user with the corresponding operation permissions
-resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | Request address. If resource.storage.type=S3, the value looks like s3a://dolphinscheduler. If resource.storage.type=HDFS and Hadoop is configured with HA, copy core-site.xml and hdfs-site.xml into the conf directory
-hadoop.security.authentication.startup.state | false | Whether Hadoop has Kerberos authentication enabled
+|registry.zookeeper.namespace|dolphinscheduler|Namespace used in the ZooKeeper cluster|
+|registry.zookeeper.connect-string|localhost:2181| ZooKeeper cluster connection info|
+|registry.zookeeper.retry-policy.base-sleep-time|60ms|Base retry interval|
+|registry.zookeeper.retry-policy.max-sleep|300ms|Maximum retry sleep time|
+|registry.zookeeper.retry-policy.max-retries|5|Maximum number of retries|
+|registry.zookeeper.session-timeout|30s|Session timeout|
+|registry.zookeeper.connection-timeout|30s|Connection timeout|
+|registry.zookeeper.block-until-connected|600ms|How long to block while waiting for the connection to succeed|
+|registry.zookeeper.digest|~|Digest used by ZooKeeper|
+DolphinScheduler can also configure ZooKeeper through `bin/env/dolphinscheduler_env.sh`.
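The same environment-variable override applies to the `registry.zookeeper.*` keys. The helper below is a sketch (the function name `to_env_name` is hypothetical) that applies Spring Boot's documented rule for turning a property key into an environment-variable name, so any row of the registry table can be overridden from `dolphinscheduler_env.sh`:

```bash
# Hypothetical helper: derive the environment-variable name that Spring Boot's relaxed
# binding accepts for a property key (dots -> underscores, dashes removed, upper case).
to_env_name() {
  printf '%s\n' "$1" | sed -e 's/\./_/g' -e 's/-//g' | tr '[:lower:]' '[:upper:]'
}

# Example: point the services at a three-node ZooKeeper ensemble (placeholder hosts).
export "$(to_env_name registry.zookeeper.connect-string)=zk1:2181,zk2:2181,zk3:2181"
```

Spring Boot's environment binding also matches an older form that turns dashes into underscores (`REGISTRY_ZOOKEEPER_CONNECT_STRING`), so check which spelling your `dolphinscheduler_env.sh` already uses and stick to one.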
-java.security.krb5.conf.path | /opt/krb5.conf | Kerberos configuration file path
-login.user.keytab.username | hdfs-mycluster@ESZ.COM | Kerberos login user
-login.user.keytab.path | /opt/hdfs.headless.keytab | Keytab of the Kerberos login user
-kerberos.expire.time | 2 | Kerberos expiration time, an integer in hours
-yarn.resourcemanager.ha.rm.ids | | YARN ResourceManager addresses. If ResourceManager HA is enabled, enter the HA IP addresses (comma-separated); if ResourceManager is a single node, leave this empty
-yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | If ResourceManager HA is enabled or ResourceManager is not used, keep the default value. If ResourceManager is a single node, replace ds1 with the ResourceManager hostname
-dolphinscheduler.env.path | env/dolphinscheduler_env.sh | Environment variable file loaded by running scripts [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...]
-development.state | false | Whether development mode is enabled
-task.resource.limit.state | false | Whether resource limit mode is enabled
-## 5.application-api.properties [API service configuration]
+## common.properties [Hadoop, S3, YARN configuration]
+The common.properties file currently mainly holds Hadoop/S3/YARN-related configuration. Configuration file locations:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/common.properties`|
+|Api Server| `api-server/conf/common.properties`|
+|Worker Server| `worker-server/conf/common.properties`|
+|Alert Server| `alert-server/conf/common.properties`|
+The default configuration is as follows:
 | Parameter | Default | Description |
 |--|--|--|
-server.port|12345|API service communication port
-server.servlet.session.timeout|7200|Session timeout
-server.servlet.context-path|/dolphinscheduler |Request path
-spring.servlet.multipart.max-file-size|1024MB|Maximum upload file size
-spring.servlet.multipart.max-request-size|1024MB|Maximum request size
-server.jetty.max-http-post-size|5000000|Maximum HTTP POST size accepted by the Jetty server
-spring.messages.encoding|UTF-8|Request encoding
-spring.jackson.time-zone|GMT+8|Time zone
-spring.messages.basename|i18n/messages|i18n configuration
-security.authentication.type|PASSWORD|Authentication type
-security.authentication.ldap.user.admin|read-only-admin|System administrator account for LDAP login
-security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP URLs
-security.authentication.ldap.base-dn|dc=example,dc=com|LDAP base DN
-security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP account
-security.authentication.ldap.password|password|LDAP password
-security.authentication.ldap.user.identity-attribute|uid|LDAP attribute holding the user identity
-security.authentication.ldap.user.email-attribute|mail|LDAP attribute holding the email address
-security.authentication.ldap.user.not-exist-action|CREATE|Action when the LDAP user does not exist. CREATE: create the user automatically; DENY: deny the login
-traffic.control.global.switch|false|Global traffic control switch
-traffic.control.max-global-qps-rate|300|Global maximum requests per second
-traffic.control.tenant-switch|false|Per-tenant traffic control switch
-traffic.control.default-tenant-qps-rate|10|Default per-tenant maximum requests per second
-traffic.control.customize-tenant-qps-rate||Custom per-tenant maximum requests per second
-## 6.master.properties [Master service configuration]
+|data.basedir.path | /tmp/dolphinscheduler | Local working directory for temporary files|
+|resource.storage.type | NONE | Resource file storage type: HDFS, S3, NONE|
+|resource.upload.path | /dolphinscheduler | Resource file storage path|
+|aws.access.key.id | minioadmin | S3 access key|
+|aws.secret.access.key | minioadmin | S3 secret access key|
+|aws.region | us-east-1 | S3 region|
+|aws.s3.endpoint | http://minio:9000 | S3 endpoint address|
+|hdfs.root.user | hdfs | If the storage type is HDFS, configure a user with the corresponding operation permissions|
+|fs.defaultFS | hdfs://mycluster:8020 | Request address. If resource.storage.type=S3, the value looks like s3a://dolphinscheduler. If resource.storage.type=HDFS and Hadoop is configured with HA, copy core-site.xml and hdfs-site.xml into the conf directory|
+|hadoop.security.authentication.startup.state | false | Whether Hadoop has Kerberos authentication enabled|
+|java.security.krb5.conf.path | /opt/krb5.conf | Kerberos configuration file path|
+|login.user.keytab.username | hdfs-mycluster@ESZ.COM | Kerberos login user|
+|login.user.keytab.path | /opt/hdfs.headless.keytab | Keytab of the Kerberos login user|
+|kerberos.expire.time | 2 | Kerberos expiration time, an integer in hours|
+|yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | YARN ResourceManager addresses. If ResourceManager HA is enabled, enter the HA IP addresses (comma-separated); if ResourceManager is a single node, leave this empty|
+|yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | If ResourceManager HA is enabled or ResourceManager is not used, keep the default value. If ResourceManager is a single node, replace ds1 with the ResourceManager hostname|
+|development.state | false | Whether development mode is enabled|
+|resource.manager.httpaddress.port | 8088 | ResourceManager port|
+|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | YARN job history status URL|
+|datasource.encryption.enable | false | Whether datasource encryption is enabled|
+|datasource.encryption.salt | !@#$%^&* | Salt used for datasource encryption|
+|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | JAR used by the data quality feature|
+|support.hive.oneSession | false | Whether Hive SQL is executed in the same session|
+|sudo.enable | true | Whether sudo is enabled|
+|alert.rpc.port | 50052 | RPC port of the Alert Server|
+|zeppelin.rest.url | http://localhost:8080 | Zeppelin RESTful API address|
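`yarn.application.status.address` and `yarn.job.history.status.address` are URL templates whose `%s` placeholder is presumably filled with a YARN application id or a MapReduce job id. The sketch below (the helper `fill_template` and the application id are hypothetical) shows the URL that results on a single-ResourceManager cluster:

```bash
# URL template from the table above; ds1 stands for the ResourceManager hostname.
yarn_status_template='http://ds1:8088/ws/v1/cluster/apps/%s'

# Hypothetical helper: fill the %s placeholder of a template with an id.
fill_template() {
  # The template is intentionally used as the printf format string.
  printf "$1\n" "$2"
}

# application_1650000000000_0001 is a made-up id used only for illustration.
fill_template "$yarn_status_template" application_1650000000000_0001
```

The same helper works for the job history template with a job id; passing the resulting URL to `curl -s` would query the ResourceManager or JobHistory REST API directly.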
+## Api-server configuration
+Location: `api-server/conf/application.yaml`
 |Parameter |Default| Description|
 |--|--|--|
-master.listen.port|5678|Master listening port
-master.exec.threads|100|Number of master execution threads, used to limit the number of parallel process instances
-master.exec.task.num|20|Number of parallel tasks per process instance on the master
-master.dispatch.task.num|3|Number of tasks the master dispatches per batch
-master.host.selector|LowerWeight|Master host selector, used to choose a suitable worker to run a task. Options: Random, RoundRobin, LowerWeight
-master.heartbeat.interval|10|Master heartbeat interval, in seconds
-master.task.commit.retryTimes|5|Number of task retries
-master.task.commit.interval|1000|Task commit interval, in milliseconds
-master.max.cpuload.avg|-1|Maximum average CPU load of the master; the master can schedule tasks only while this value is higher than the system's average CPU load. The default -1 means cpu cores * 2
-master.reserved.memory|0.3|Memory reserved by the master; the master can schedule tasks only while this value is lower than the system's available memory. Unit: G
-## 7.worker.properties [Worker service configuration]
+|server.port|12345|API service communication port|
+|server.servlet.session.timeout|120m|Session timeout|
+|server.servlet.context-path|/dolphinscheduler/ |Request path|
+|spring.servlet.multipart.max-file-size|1024MB|Maximum upload file size|
+|spring.servlet.multipart.max-request-size|1024MB|Maximum request size|
+|server.jetty.max-http-post-size|5000000|Maximum HTTP POST size accepted by the Jetty server|
+|spring.banner.charset|UTF-8|Banner file encoding|
+|spring.jackson.time-zone|UTC|Time zone|
+|spring.jackson.date-format|"yyyy-MM-dd HH:mm:ss"|Date/time format|
+|spring.messages.basename|i18n/messages|i18n configuration|
+|security.authentication.type|PASSWORD|Authentication type|
+|security.authentication.ldap.user.admin|read-only-admin|System administrator account for LDAP login|
+|security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP URLs|
+|security.authentication.ldap.base.dn|dc=example,dc=com|LDAP base DN|
+|security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP account|
+|security.authentication.ldap.password|password|LDAP password|
+|security.authentication.ldap.user.identity.attribute|uid|LDAP attribute holding the user identity|
+|security.authentication.ldap.user.email.attribute|mail|LDAP attribute holding the email address|
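+## Master Server configuration
+Location: `master-server/conf/application.yaml`
 |Parameter |Default| Description|
 |--|--|--|
-worker.listen.port|1234|Worker listening port
-worker.exec.threads|100|Number of worker threads, used to limit the number of parallel task instances
-worker.heartbeat.interval|10|Worker heartbeat interval, in seconds
-worker.max.cpuload.avg|-1|Maximum average CPU load of the worker; tasks can be dispatched to the worker only while this value is higher than the system's average CPU load. The default -1 means cpu cores * 2
-worker.reserved.memory|0.3|Memory reserved by the worker; tasks can be dispatched to the worker only while this value is lower than the system's available memory. Unit: G
-worker.groups|default|Worker group configuration, comma-separated, e.g. 'worker.groups=default,test' <br> On startup, the worker automatically joins the groups listed here
-worker.tenant.auto.create|true|A tenant corresponds to a system user, and the worker submits jobs as that user. If the user does not exist on the system, it is created automatically when worker.tenant.auto.create is true.
-worker.tenant.distributed.user|false|For distributed users, e.g. users created with FreeIPA and stored in LDAP. This parameter only applies to Linux; when it is true, worker.tenant.auto.create has no effect and tenants are not created automatically
-## 8.alert.properties [Alert service configuration]
+|master.listen-port|5678|Master listening port|
+|master.fetch-command-num|10|Number of commands the master fetches|
+|master.pre-exec-threads|10|Number of master pre-execution threads, used to limit the number of parallel commands|
+|master.exec-threads|100|Number of master execution threads, used to limit the number of parallel process instances|
+|master.dispatch-task-number|3|Number of tasks the master dispatches per batch|
+|master.host-selector|lower_weight|Master host selector, used to choose a suitable worker to run a task. Options: random, round_robin, lower_weight|
+|master.heartbeat-interval|10|Master heartbeat interval, in seconds|
+|master.task-commit-retry-times|5|Number of task retries|
+|master.task-commit-interval|1000|Task commit interval, in milliseconds|
+|master.state-wheel-interval|5|Interval for polling and checking state|
+|master.max-cpu-load-avg|-1|Maximum average CPU load of the master; the master can schedule tasks only while this value is higher than the system's average CPU load. The default -1 means cpu cores * 2|
+|master.reserved-memory|0.3|Memory reserved by the master; the master can schedule tasks only while this value is lower than the system's available memory. Unit: G|
+|master.failover-interval|10|Failover interval, in minutes|
+|master.kill-yarn-job-when-task-failover|true|Whether to kill the YARN job when a task instance fails over|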
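The `-1` default of `master.max-cpu-load-avg` (and of `worker.max-cpu-load-avg`) is described as "cpu cores * 2", and scheduling proceeds only while that ceiling is above the system's average load. A rough sketch of the rule follows. It assumes the value compared is a plain load average such as the 1-minute figure; the exact metric DolphinScheduler samples may differ, and the function names are hypothetical:

```bash
# Resolve the effective ceiling: -1 means "number of CPU cores * 2".
effective_max_cpu_load() {
  configured=$1   # value of master.max-cpu-load-avg / worker.max-cpu-load-avg
  cores=$2        # number of CPU cores, e.g. from `nproc`
  if [ "$configured" = "-1" ]; then
    echo $((cores * 2))
  else
    echo "$configured"
  fi
}

# Exit status 0 when the current load is below the ceiling (scheduling allowed).
# awk handles the comparison because load averages are fractional.
below_ceiling() {
  awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 < max + 0) }'
}
```

On Linux, `below_ceiling "$(cut -d' ' -f1 /proc/loadavg)" "$(effective_max_cpu_load -1 "$(nproc)")"` would check the local host against the default ceiling.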
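+## Worker Server configuration
+Location: `worker-server/conf/application.yaml`
 |Parameter |Default| Description|
 |--|--|--|
-alert.type|EMAIL|Alert type|
-mail.protocol|SMTP| Mail server protocol
-mail.server.host|xxx.xxx.com|Mail server address
-mail.server.port|25|Mail server port
-mail.sender|xxx@xxx.com|Sender email address
-mail.user|xxx@xxx.com|Sender email account name
-mail.passwd|111111|Sender email password
-mail.smtp.starttls.enable|true|Whether the mailbox uses TLS
-mail.smtp.ssl.enable|false|Whether the mailbox uses SSL
-mail.smtp.ssl.trust|xxx.xxx.com|SSL trust whitelist for the mailbox
+|worker.listen-port|1234|Worker listening port|
+|worker.exec-threads|100|Number of worker threads, used to limit the number of parallel task instances|
+|worker.heartbeat-interval|10|Worker heartbeat interval, in seconds|
+|worker.host-weight|100|Weight of the worker host when tasks are dispatched|
+|worker.tenant-auto-create|true|A tenant corresponds to a system user, and the worker submits jobs as that user. If the user does not exist on the system, it is created automatically when worker.tenant-auto-create is true.|
+|worker.max-cpu-load-avg|-1|Maximum average CPU load of the worker; tasks can be dispatched to the worker only while this value is higher than the system's average CPU load. The default -1 means cpu cores * 2|
+|worker.reserved-memory|0.3|Memory reserved by the worker; tasks can be dispatched to the worker only while this value is lower than the system's available memory. Unit: G|
+|worker.groups|default|Worker group configuration, comma-separated, e.g. 'worker.groups=default,test' <br> On startup, the worker automatically joins the groups listed here|
+|worker.alert-listen-host|localhost|Alert listening host|
+|worker.alert-listen-port|50052|Alert listening port|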
-xls.file.path|/tmp/xls|Temporary working directory for email attachments
-||Enterprise WeChat settings below [optional]|
-enterprise.wechat.enable|false|Whether Enterprise WeChat is enabled
-enterprise.wechat.corp.id|xxxxxxx|
-enterprise.wechat.secret|xxxxxxx|
-enterprise.wechat.agent.id|xxxxxxx|
-enterprise.wechat.users|xxxxxxx|
-enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|
-enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|
-enterprise.wechat.user.send.msg||Message format for sending to users
-enterprise.wechat.team.send.msg||Message format for group sending
-plugin.dir|/Users/xx/your/path/to/plugin/dir|Plugin directory
-## 9.quartz.properties [Quartz configuration]
-This file mainly holds the Quartz configuration. Configure it according to your actual business scenarios and resources; it is not covered in detail here.
+## Alert Server configuration
+Location: `alert-server/conf/application.yaml`
 |Parameter |Default| Description|
 |--|--|--|
-org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate
-org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
+|server.port|50053|Alert Server listening port|
+|alert.port|50052|Alert listening port|
-org.quartz.scheduler.instanceName | DolphinScheduler
-org.quartz.scheduler.instanceId | AUTO
-org.quartz.scheduler.makeSchedulerThreadDaemon | true
-org.quartz.jobStore.useProperties | false
-org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool
-org.quartz.threadPool.makeThreadsDaemons | true
-org.quartz.threadPool.threadCount | 25
-org.quartz.threadPool.threadPriority | 5
-org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX
-org.quartz.jobStore.tablePrefix | QRTZ_
-org.quartz.jobStore.isClustered | true
-org.quartz.jobStore.misfireThreshold | 60000
-org.quartz.jobStore.clusterCheckinInterval | 5000
-org.quartz.jobStore.acquireTriggersWithinLock|true
-org.quartz.jobStore.dataSource | myDs
-org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider
-## 10.install_config.conf [DS environment variable configuration script [used for DS installation/startup]]
-The install_config.conf file is fairly involved; it is mainly used in two places.
-* 1. Automatic installation of the DS cluster.
-> Running the install.sh script automatically loads the settings in this file and uses them to fill in the configuration files described above.
-> For example: dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
-* 2. Starting and stopping the DS cluster.
->When the DS cluster starts or stops, it loads parameters such as masters, workers, alertServer and apiServers from this file to start/stop the cluster.
-The file content is as follows:
-```bash
-# Note: if this file contains special characters such as `.*[]^${}\+?|()@#&`, escape them,
-# for example, `[` is escaped as `\[`
-# Database type; currently only postgresql or mysql is supported
-dbtype="mysql"
-# Database address & port
-dbhost="192.168.xx.xx:3306"
-# Database name
-dbname="dolphinscheduler"
-# Database username
-username="xx"
-# Database password
-password="xx"
-# ZooKeeper address
-zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
-# Directory to install DS into, e.g. /data1_1T/dolphinscheduler
-installPath="/data1_1T/dolphinscheduler"
-# User used for deployment
-# Note: the deployment user needs sudo permission and must be able to operate HDFS.
-# If HDFS is used, the root directory must be created by this user; otherwise there will be permission problems.
-deployUser="dolphinscheduler"
-# Alert service settings below
-# Mail server address
-mailServerHost="smtp.exmail.qq.com"
-# Mail server port
-mailServerPort="25"
-# Sender
-mailSender="xxxxxxxxxx"
-# Sending user
-mailUser="xxxxxxxxxx"
-# Mailbox password
-mailPassword="xxxxxxxxxx"
+## Quartz configuration
+This mainly holds the Quartz configuration. Configure it according to your actual business scenarios and resources; it is not covered in detail here. Configuration file locations:
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/application.yaml`|
+|Api Server| `api-server/conf/application.yaml`|
+The default configuration is as follows:
+| Parameter | Default |
+|--|--|
+|spring.quartz.properties.org.quartz.threadPool.threadPriority | 5|
+|spring.quartz.properties.org.quartz.jobStore.isClustered | true|
+|spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX|
+|spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO|
+|spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_|
+|spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock|true|
+|spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler|
+|spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool|
+|spring.quartz.properties.org.quartz.jobStore.useProperties | false|
+|spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true|
+|spring.quartz.properties.org.quartz.threadPool.threadCount | 25|
+|spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000|
+|spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true|
+|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
+|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
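Compared with the removed `quartz.properties`, every Quartz key now sits under the `spring.quartz.properties.` prefix, which Spring Boot passes through to Quartz as additional scheduler properties. When carrying custom settings over from an old `quartz.properties`, a sketch like the following can print the prefixed keys. The function name is hypothetical, and it only rewrites `org.quartz.* = value` lines, silently skipping comments and blank lines:

```bash
# Hypothetical migration helper: read legacy quartz.properties lines on stdin and
# print them as spring.quartz.properties.org.quartz.<key>=<value>.
prefix_quartz_keys() {
  sed -n 's/^[[:space:]]*\(org\.quartz\.[^=[:space:]]*\)[[:space:]]*=[[:space:]]*\(.*\)$/spring.quartz.properties.\1=\2/p'
}

# Usage: prefix_quartz_keys < quartz.properties
```

The output uses flat `key=value` form; it still has to be placed into each service's `application.yaml`.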
-# Set to true for mailboxes that use TLS, otherwise false
-starttlsEnable="true"
-# Set to true for mailboxes that use SSL, otherwise false. Note: starttlsEnable and sslEnable cannot both be true
-sslEnable="false"
-# Mail service address, same value as mailServerHost
-sslTrust="smtp.exmail.qq.com"
-# Where resource files used by the business (such as SQL files) are uploaded: HDFS, S3 or NONE. To upload to HDFS, set HDFS; if you do not need resource uploads, choose NONE.
-resourceStorageType="NONE"
-# If S3, write the S3 address, HA, for example: s3a://dolphinscheduler
-# Note: for S3, be sure to create the root directory /dolphinscheduler
-defaultFS="hdfs://mycluster:8020"
-# Parameters required when resourceStorageType is S3:
-s3Endpoint="http://192.168.xx.xx:9010"
-s3AccessKey="xxxxxxxxxx"
-s3SecretKey="xxxxxxxxxx"
-# If ResourceManager is HA, set the active/standby IPs or hostnames of the ResourceManager nodes, e.g. "192.168.xx.xx,192.168.xx.xx"; for a single ResourceManager, or if YARN is not used at all, set yarnHaIps=""
-yarnHaIps="192.168.xx.xx,192.168.xx.xx"
-# For a single ResourceManager, set the ResourceManager node IP or hostname; otherwise keep the default value.
-singleYarnIp="yarnIp1"
-# Storage path of resource files on HDFS/S3
-resourceUploadPath="/dolphinscheduler"
-# HDFS/S3 operating user
-hdfsRootUser="hdfs"
-# Kerberos settings below
-# Whether Kerberos is enabled
-kerberosStartUp="false"
-# kdc krb5 config file path
-krb5ConfPath="$installPath/conf/krb5.conf"
-# keytab username
-keytabUserName="hdfs-mycluster@ESZ.COM"
-# username keytab path
-keytabPath="$installPath/conf/hdfs.headless.keytab"
-# API service port
-apiServerPort="12345"
-# Hostnames of all hosts on which DS is deployed
-ips="ds1,ds2,ds3,ds4,ds5"
-# SSH port, default 22
-sshPort="22"
-# Hosts running the master service
-masters="ds1,ds2"
-# Hosts running the worker service
-# Note: each worker needs a worker group name; the default is "default"
-workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
-# Host running the alert service
-alertServer="ds3"
-# Hosts running the API service
-apiServers="ds1"
-```
-## 11.dolphinscheduler_env.sh [Environment variable configuration]
-When tasks are submitted in a shell-like way, the environment variables in this file are loaded onto the host. It covers `JAVA_HOME`, the metadata database, the registry center and task-type settings; the task
-types mainly include Shell, Python, Spark, Flink and DataX tasks.
+## dolphinscheduler_env.sh [Environment variable configuration]
+When tasks are submitted in a shell-like way, the environment variables in this file are loaded onto the host. It covers `JAVA_HOME`, the metadata database, the registry center and task-type settings; the task types mainly include Shell, Python, Spark, Flink and DataX tasks.
 ```bash
 # JAVA_HOME, will use it to start DolphinScheduler server
@@ -473,10 +341,10 @@ export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
 export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
 ```
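The `PATH` line above only helps if each `*_HOME/bin` directory actually exists on the host that runs the tasks. A quick sanity check, sketched under the assumption that the relevant variables are exactly those on that line (the function name is hypothetical), prints every variable that is unset or whose `bin` directory is missing:

```bash
# Hypothetical check: list the *_HOME variables from the PATH line whose bin/
# directory is missing (an unset or empty variable also counts as missing).
missing_homes() {
  for name in JAVA_HOME HADOOP_HOME SPARK_HOME1 SPARK_HOME2 PYTHON_HOME HIVE_HOME FLINK_HOME DATAX_HOME; do
    eval "dir=\${$name:-}"   # indirect expansion that also works in POSIX sh
    if [ -z "$dir" ] || [ ! -d "$dir/bin" ]; then
      echo "$name"
    fi
  done
}
```

Run it after sourcing `bin/env/dolphinscheduler_env.sh`. An empty result means every directory was found; entries for task types you do not use can be ignored.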
-## 12. Log configuration files of each service
-Service name| Log file name |
-|--|--|--|
-API service log configuration file |logback-api.xml|
-Master service log configuration file|logback-master.xml |
-Worker service log configuration file|logback-worker.xml |
-Alert service log configuration file|logback-alert.xml |
+## Log configuration
+|Service| Configuration file |
+|--|--|
+|Master Server | `master-server/conf/logback-spring.xml`|
+|Api Server| `api-server/conf/logback-spring.xml`|
+|Worker Server| `worker-server/conf/logback-spring.xml`|
+|Alert Server| `alert-server/conf/logback-spring.xml`|
