[Doc] update the configuration doc (#11113)

3.1.0-release
rickchengx 2 years ago committed by GitHub
parent
commit
39186b1a6d
GPG Key ID: 4AEE18F83AFDEB23
  1. 505
      docs/docs/en/architecture/configuration.md
  2. 498
      docs/docs/zh/architecture/configuration.md

505
docs/docs/en/architecture/configuration.md

@ -4,15 +4,11 @@
## Preface
This document explains the DolphinScheduler application configurations according to DolphinScheduler-1.3.x versions.
This document explains the DolphinScheduler application configurations.
## Directory Structure
Currently, all the configuration files are under the [conf] directory.
Check the following simplified DolphinScheduler installation directory layout for a direct view of where the [conf] directory sits and which configuration files it contains.
This document only describes DolphinScheduler configurations; other topics are out of scope.
[Note: DolphinScheduler is hereinafter called 'DS'.]
The directory structure of DolphinScheduler is as follows:
```
├── LICENSE
@ -100,27 +96,13 @@ This document only describes DolphinScheduler configurations and other topics ar
## Configurations in Details
serial number| service classification| config file|
|--|--|--|
1|startup or shutdown DS application|dolphinscheduler-daemon.sh
2|datasource config properties|datasource.properties
3|ZooKeeper config properties|zookeeper.properties
4|common-service[storage] config properties|common.properties
5|API-service config properties|application-api.properties
6|master-service config properties|master.properties
7|worker-service config properties|worker.properties
8|alert-service config properties|alert.properties
9|quartz config properties|quartz.properties
10|DS environment variables configuration script[install/start DS]|install_config.conf
11|load environment variables configs <br /> [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
12|services log config files|API-service log config : logback-api.xml <br /> master-service log config : logback-master.xml <br /> worker-service log config : logback-worker.xml <br /> alert-service log config : logback-alert.xml
### dolphinscheduler-daemon.sh [startup or shutdown DS application]
dolphinscheduler-daemon.sh is responsible for DS startup and shutdown.
### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown.
Essentially, start-all.sh and stop-all.sh start up and shut down the cluster via dolphinscheduler-daemon.sh.
Currently, DS ships only a basic configuration; remember to tune the JVM options further based on the resources actually available.
Currently, DolphinScheduler ships only a basic configuration; remember to tune the JVM options further based on the resources actually available.
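As a rough illustration of tuning beyond the basic configuration, the sketch below sets `DOLPHINSCHEDULER_OPTS` for a hypothetical host with about 8 GB of RAM. The heap sizes and the dump path are assumptions chosen for the example, not DolphinScheduler defaults; only standard HotSpot flags are used.

```bash
# Hypothetical sizing for a host with about 8 GB of RAM. Every value here is
# an assumption for illustration, not a DolphinScheduler default.
export DOLPHINSCHEDULER_OPTS="
-server
-Xms4g
-Xmx4g
-Xmn2g
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=dump.hprof
"

# -XX:+DisableExplicitGC is intentionally left out, since this document
# advises against it for Netty-based communication.
echo "$DOLPHINSCHEDULER_OPTS"
```

Setting `-Xms` and `-Xmx` to the same value avoids heap resizing at runtime; whether that suits a given deployment depends on what else runs on the host.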
Default simplified parameters are:
```bash
@ -137,321 +119,206 @@ export DOLPHINSCHEDULER_OPTS="
"
```
> "-XX:+DisableExplicitGC" is not recommended because it may lead to a memory leak (DS depends on Netty for communication).
> "-XX:+DisableExplicitGC" is not recommended because it may lead to a memory leak (DolphinScheduler depends on Netty for communication).
### datasource.properties [datasource config properties]
### Database connection related configuration
DolphinScheduler uses Spring Hikari to manage database connections. Configuration file locations:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
|Alert Server| `alert-server/conf/application.yaml`|
The default configuration is as follows:
DS uses Druid to manage database connections and default simplified configs are:
|Parameters | Default value| Description|
|--|--|--|
spring.datasource.driver-class-name||datasource driver
spring.datasource.url||datasource connection url
spring.datasource.username||datasource username
spring.datasource.password||datasource password
spring.datasource.initialSize|5| initial connection pool size number
spring.datasource.minIdle|5| minimum connection pool size number
spring.datasource.maxActive|5| maximum connection pool size number
spring.datasource.maxWait|60000| max wait milliseconds
spring.datasource.timeBetweenEvictionRunsMillis|60000| idle connection check interval
spring.datasource.timeBetweenConnectErrorMillis|60000| retry interval
spring.datasource.minEvictableIdleTimeMillis|300000| connections over minEvictableIdleTimeMillis will be collect when idle check
spring.datasource.validationQuery|SELECT 1| validate connection by running the SQL
spring.datasource.validationQueryTimeout|3| validate connection timeout[seconds]
spring.datasource.testWhileIdle|true| set whether the pool validates the allocated connection when a new connection request comes
spring.datasource.testOnBorrow|true| validity check when the program requests a new connection
spring.datasource.testOnReturn|false| validity check when the program recalls a connection
spring.datasource.defaultAutoCommit|true| whether auto commit
spring.datasource.keepAlive|true| runs validationQuery SQL to avoid the connection closed by pool when the connection idles over minEvictableIdleTimeMillis
spring.datasource.poolPreparedStatements|true| open PSCache
spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| specify the size of PSCache on each connection
### zookeeper.properties [zookeeper config properties]
|spring.datasource.driver-class-name| org.postgresql.Driver |datasource driver|
|spring.datasource.url| jdbc:postgresql://127.0.0.1:5432/dolphinscheduler |datasource connection url|
|spring.datasource.username|root|datasource username|
|spring.datasource.password|root|datasource password|
|spring.datasource.hikari.connection-test-query|select 1|validate connection by running the SQL|
|spring.datasource.hikari.minimum-idle| 5| minimum connection pool size number|
|spring.datasource.hikari.auto-commit|true|whether auto commit|
|spring.datasource.hikari.pool-name|DolphinScheduler|name of the connection pool|
|spring.datasource.hikari.maximum-pool-size|50| maximum connection pool size number|
|spring.datasource.hikari.connection-timeout|30000|connection timeout|
|spring.datasource.hikari.idle-timeout|600000|Maximum idle connection survival time|
|spring.datasource.hikari.leak-detection-threshold|0|Connection leak detection threshold|
|spring.datasource.hikari.initialization-fail-timeout|1|Connection pool initialization failed timeout|
Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
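A minimal sketch of what such environment-based database configuration could look like. It relies on Spring Boot binding `SPRING_DATASOURCE_*` environment variables to the `spring.datasource.*` properties listed in the table; the exact variables your copy of `dolphinscheduler_env.sh` exports are an assumption here, so check the script before relying on these names. The values simply repeat the defaults from the table.

```bash
# Assumed variable names: Spring Boot maps SPRING_DATASOURCE_URL to
# spring.datasource.url, and likewise for username and password.
# Each line keeps a value already present in the environment and
# otherwise falls back to the default shown in the table above.
export SPRING_DATASOURCE_URL="${SPRING_DATASOURCE_URL:-jdbc:postgresql://127.0.0.1:5432/dolphinscheduler}"
export SPRING_DATASOURCE_USERNAME="${SPRING_DATASOURCE_USERNAME:-root}"
export SPRING_DATASOURCE_PASSWORD="${SPRING_DATASOURCE_PASSWORD:-root}"

# Print the effective URL so a misconfiguration is visible at startup.
echo "datasource url: $SPRING_DATASOURCE_URL"
```

Values set this way take precedence over the same keys in `application.yaml`, so one script can serve every server that sources it.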
### Zookeeper related configuration
DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
The default configuration is as follows:
|Parameters | Default value| Description|
|--|--|--|
zookeeper.quorum|localhost:2181| ZooKeeper cluster connection info
zookeeper.dolphinscheduler.root|/dolphinscheduler| DS is stored under ZooKeeper root directory
zookeeper.session.timeout|60000| session timeout
zookeeper.connection.timeout|30000| connection timeout
zookeeper.retry.base.sleep|100| time to wait between subsequent retries
zookeeper.retry.max.sleep|30000| maximum time to wait between subsequent retries
zookeeper.retry.maxtime|10| maximum retry times
|registry.zookeeper.namespace|dolphinscheduler|namespace of zookeeper|
|registry.zookeeper.connect-string|localhost:2181| the connection string of zookeeper|
|registry.zookeeper.retry-policy.base-sleep-time|60ms|time to wait between subsequent retries|
|registry.zookeeper.retry-policy.max-sleep|300ms|maximum time to wait between subsequent retries|
|registry.zookeeper.retry-policy.max-retries|5|maximum retry times|
|registry.zookeeper.session-timeout|30s|session timeout|
|registry.zookeeper.connection-timeout|30s|connection timeout|
|registry.zookeeper.block-until-connected|600ms|waiting time to block until the connection succeeds|
|registry.zookeeper.digest|~|digest of zookeeper|
Note that DolphinScheduler also supports zookeeper related configuration through `bin/env/dolphinscheduler_env.sh`.
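A hedged sketch of the environment-based alternative for the registry settings. The variable names `REGISTRY_TYPE` and `REGISTRY_ZOOKEEPER_CONNECT_STRING` are assumptions modelled on the `registry.zookeeper.connect-string` property above; confirm them against your own `bin/env/dolphinscheduler_env.sh`. The three-node connect string in the comment is a hypothetical example.

```bash
# Assumed variable names; verify against your dolphinscheduler_env.sh.
# Falls back to the single-node default from the table when nothing is set.
export REGISTRY_TYPE="${REGISTRY_TYPE:-zookeeper}"
export REGISTRY_ZOOKEEPER_CONNECT_STRING="${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}"

# For a ZooKeeper ensemble, list every node separated by commas, e.g.
#   REGISTRY_ZOOKEEPER_CONNECT_STRING="zk1:2181,zk2:2181,zk3:2181"
echo "registry: $REGISTRY_TYPE at $REGISTRY_ZOOKEEPER_CONNECT_STRING"
```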
### common.properties [hadoop, s3, yarn config properties]
Currently, common.properties mainly holds the Hadoop and s3a related configurations.
| Parameters | Default value | Description |
|--|--|--|
data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files
resource.storage.type | NONE | type of resource files: HDFS, S3, NONE
resource.storage.upload.base.path | /dolphinscheduler | storage path of resource files
resource.aws.access.key.id | minioadmin | access key id of S3
resource.aws.secret.access.key | minioadmin | secret access key of S3
resource.aws.region |us-east-1 | region of S3
resource.aws.s3.bucket.name | dolphinscheduler | bucket name of S3
resource.aws.s3.endpoint | http://minio:9000 | endpoint of S3
resource.hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS
resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory
hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission
java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory
login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username
login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab
kerberos.expire.time | 2 | kerberos expire time, integer, the unit is hour
yarn.resourcemanager.ha.rm.ids | | specify the yarn resourcemanager url. if resourcemanager supports HA, input HA IP addresses (separated by comma), or input null for standalone
yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode
dolphinscheduler.env.path | env/dolphinscheduler_env.sh | load environment variables configs [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
development.state | false | specify whether in development state
task.resource.limit.state | false | specify whether in resource limit state
### application-api.properties [API-service log config]
Currently, common.properties mainly holds the Hadoop and s3a related configurations. Configuration file locations:
|Parameters | Default value| Description|
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/common.properties`|
|Api Server| `api-server/conf/common.properties`|
|Worker Server| `worker-server/conf/common.properties`|
|Alert Server| `alert-server/conf/common.properties`|
The default configuration is as follows:
| Parameters | Default value | Description |
|--|--|--|
server.port|12345|api service communication port
server.servlet.session.timeout|7200|session timeout
server.servlet.context-path|/dolphinscheduler | request path
spring.servlet.multipart.max-file-size|1024MB| maximum file size
spring.servlet.multipart.max-request-size|1024MB| maximum request size
server.jetty.max-http-post-size|5000000| jetty maximum post size
spring.messages.encoding|UTF-8| message encoding
spring.jackson.time-zone|GMT+8| time zone
spring.messages.basename|i18n/messages| i18n config
security.authentication.type|PASSWORD| authentication type
security.authentication.ldap.user.admin|read-only-admin|admin user account when you log-in with LDAP
security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls
security.authentication.ldap.base-dn|dc=example,dc=com|LDAP base dn
security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP username
security.authentication.ldap.password|password|LDAP password
security.authentication.ldap.user.identity-attribute|uid|LDAP user identity attribute
security.authentication.ldap.user.email-attribute|mail|LDAP user email attribute
security.authentication.ldap.user.not-exist-action|CREATE|action when LDAP user is not exist. Default CREATE: automatically create user when user not exist, DENY: deny log-in when user not exist
traffic.control.global.switch|false|traffic control global switch
traffic.control.max-global-qps-rate|300|global max request number per second
traffic.control.tenant-switch|false|traffic control tenant switch
traffic.control.default-tenant-qps-rate|10|default tenant max request number per second
traffic.control.customize-tenant-qps-rate||customize tenant max request number per second
### master.properties [master-service log config]
|data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files|
|resource.storage.type | NONE | type of resource files: HDFS, S3, NONE|
|resource.upload.path | /dolphinscheduler | storage path of resource files|
|aws.access.key.id | minioadmin | access key id of S3|
|aws.secret.access.key | minioadmin | secret access key of S3|
|aws.region | us-east-1 | region of S3|
|aws.s3.endpoint | http://minio:9000 | endpoint of S3|
|hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS|
|fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory|
|hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission|
|java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory|
|login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username|
|login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab|
|kerberos.expire.time | 2 | kerberos expire time, integer, the unit is hour|
|yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | specify the yarn resourcemanager url. if resourcemanager supports HA, input HA IP addresses (separated by comma), or input null for standalone|
|yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode|
|development.state | false | specify whether in development state|
|resource.manager.httpaddress.port | 8088 | the port of resource manager|
|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn|
|datasource.encryption.enable | false | whether to enable datasource encryption|
|datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption|
|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality|
|support.hive.oneSession | false | specify whether hive SQL is executed in the same session|
|sudo.enable | true | whether to enable sudo|
|alert.rpc.port | 50052 | the RPC port of Alert Server|
|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
### Api-server related configuration
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
master.listen.port|5678|master listen port
master.exec.threads|100|master-service execute thread number, used to limit the number of process instances in parallel
master.exec.task.num|20|defines the number of parallel tasks for each process instance of the master-service
master.dispatch.task.num|3|defines the number of dispatch tasks for each batch of the master-service
master.host.selector|LowerWeight|master host selector, to select a suitable worker to run the task, optional value: random, round-robin, lower weight
master.heartbeat.interval|10|master heartbeat interval, the unit is second
master.task.commit.retryTimes|5|master commit task retry times
master.task.commit.interval|1000|master commit task interval, the unit is millisecond
master.max.cpuload.avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2
master.reserved.memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G
### worker.properties [worker-service log config]
|server.port|12345|api service communication port|
|server.servlet.session.timeout|120m|session timeout|
|server.servlet.context-path|/dolphinscheduler/ |request path|
|spring.servlet.multipart.max-file-size|1024MB|maximum file size|
|spring.servlet.multipart.max-request-size|1024MB|maximum request size|
|server.jetty.max-http-post-size|5000000|jetty maximum post size|
|spring.banner.charset|UTF-8|message encoding|
|spring.jackson.time-zone|UTC|time zone|
|spring.jackson.date-format|"yyyy-MM-dd HH:mm:ss"|time format|
|spring.messages.basename|i18n/messages|i18n config|
|security.authentication.type|PASSWORD|authentication type|
|security.authentication.ldap.user.admin|read-only-admin|admin user account when you log-in with LDAP|
|security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls|
|security.authentication.ldap.base.dn|dc=example,dc=com|LDAP base dn|
|security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP username|
|security.authentication.ldap.password|password|LDAP password|
|security.authentication.ldap.user.identity.attribute|uid|LDAP user identity attribute|
|security.authentication.ldap.user.email.attribute|mail|LDAP user email attribute|
### Master Server related configuration
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
worker.listen.port|1234|worker-service listen port
worker.exec.threads|100|worker-service execute thread number, used to limit the number of task instances in parallel
worker.heartbeat.interval|10|worker-service heartbeat interval, the unit is second
worker.max.cpuload.avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2
worker.reserved.memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup
worker.tenant.auto.create|true|tenant corresponds to the user of the system, which is used by the worker to submit the job. If system does not have this user, it will be automatically created after the parameter worker.tenant.auto.create is true.
worker.tenant.distributed.user|false|Scenes to be used for distributed users.For example,users created by FreeIpa are stored in LDAP.This parameter only applies to Linux, When this parameter is true, worker.tenant.auto.create has no effect and will not automatically create tenants.
### alert.properties [alert-service log config]
|master.listen-port|5678|master listen port|
|master.fetch-command-num|10|the number of commands fetched by master|
|master.pre-exec-threads|10|master prepare execute thread number to limit handle commands in parallel|
|master.exec-threads|100|master execute thread number to limit process instances in parallel|
|master.dispatch-task-number|3|master dispatch task number per batch|
|master.host-selector|lower_weight|master host selector, used to select a suitable worker to run the task. Default value: lower_weight. Optional values: random, round_robin, lower_weight|
|master.heartbeat-interval|10|master heartbeat interval, the unit is second|
|master.task-commit-retry-times|5|master commit task retry times|
|master.task-commit-interval|1000|master commit task interval, the unit is millisecond|
|master.state-wheel-interval|5|time to check status|
|master.max-cpu-load-avg|-1|master max CPU load average; the master server schedules only while the system CPU load average is below this value. The default value -1 means the number of CPU cores * 2|
|master.reserved-memory|0.3|master reserved memory; the master server schedules only while the available system memory is above this value. The default value is 0.3, the unit is G|
|master.failover-interval|10|failover interval, the unit is minute|
|master.kill-yarn-job-when-task-failover|true|whether to kill yarn job when failover taskInstance|
### Worker Server related configuration
Location: `worker-server/conf/application.yaml`
|Parameters | Default value| Description|
|--|--|--|
alert.type|EMAIL|alert type|
mail.protocol|SMTP|mail server protocol
mail.server.host|xxx.xxx.com|mail server host
mail.server.port|25|mail server port
mail.sender|xxx@xxx.com|mail sender email
mail.user|xxx@xxx.com|mail sender email name
mail.passwd|111111|mail sender email password
mail.smtp.starttls.enable|true|specify mail whether open tls
mail.smtp.ssl.enable|false|specify mail whether open ssl
mail.smtp.ssl.trust|xxx.xxx.com|specify mail ssl trust list
xls.file.path|/tmp/xls|mail attachment temp storage directory
||following configure WeCom[optional]|
enterprise.wechat.enable|false|specify whether enable WeCom
enterprise.wechat.corp.id|xxxxxxx|WeCom corp id
enterprise.wechat.secret|xxxxxxx|WeCom secret
enterprise.wechat.agent.id|xxxxxxx|WeCom agent id
enterprise.wechat.users|xxxxxxx|WeCom users
enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|WeCom token url
enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|WeCom push url
enterprise.wechat.user.send.msg||send message format
enterprise.wechat.team.send.msg||group message format
plugin.dir|/Users/xx/your/path/to/plugin/dir|plugin directory
### quartz.properties [quartz config properties]
|worker.listen-port|1234|worker-service listen port|
|worker.exec-threads|100|worker-service execute thread number, used to limit the number of task instances in parallel|
|worker.heartbeat-interval|10|worker-service heartbeat interval, the unit is second|
|worker.host-weight|100|worker host weight to dispatch tasks|
|worker.tenant-auto-create|true|a tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when worker.tenant-auto-create is true.|
|worker.max-cpu-load-avg|-1|worker max CPU load average; tasks are dispatched to the worker server only while the system CPU load average is below this value. The default value -1 means the number of CPU cores * 2|
|worker.reserved-memory|0.3|worker reserved memory; tasks are dispatched to the worker server only while the available system memory is above this value. The default value is 0.3, the unit is G|
|worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup|
|worker.alert-listen-host|localhost|the alert listen host of worker|
|worker.alert-listen-port|50052|the alert listen port of worker|
### Alert Server related configuration
Location: `alert-server/conf/application.yaml`
This part describes the Quartz configs; configure them based on your practical situation and resources.
|Parameters | Default value| Description|
|--|--|--|
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate |
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
org.quartz.scheduler.instanceName | DolphinScheduler |
org.quartz.scheduler.instanceId | AUTO |
org.quartz.scheduler.makeSchedulerThreadDaemon | true |
org.quartz.jobStore.useProperties | false |
org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |
org.quartz.threadPool.makeThreadsDaemons | true |
org.quartz.threadPool.threadCount | 25 |
org.quartz.threadPool.threadPriority | 5 |
org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX |
org.quartz.jobStore.tablePrefix | QRTZ_ |
org.quartz.jobStore.isClustered | true |
org.quartz.jobStore.misfireThreshold | 60000 |
org.quartz.jobStore.clusterCheckinInterval | 5000 |
org.quartz.jobStore.acquireTriggersWithinLock|true |
org.quartz.jobStore.dataSource | myDs |
org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider |
### install_config.conf [DS environment variables configuration script[install or start DS]]
install_config.conf is a bit complicated and is mainly used in the following two places.
* DS Cluster Auto Installation.
> System will load configs in the install_config.conf and auto-configure files below, based on the file content when executing 'install.sh'.
> Files such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
* Startup and Shutdown DS Cluster.
> The system will load masters, workers, alert-server, API-servers and other parameters inside the file to startup or shutdown DS cluster.
#### File Content
```bash
# Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
# eg: `[` escape to `\[`
# Database type (DS currently only supports PostgreSQL and MySQL)
dbtype="mysql"
# Database url and port
dbhost="192.168.xx.xx:3306"
# Database name
dbname="dolphinscheduler"
# Database username
username="xx"
|server.port|50053|the port of Alert Server|
|alert.port|50052|the port of alert|
# Database password
password="xx"
# ZooKeeper url
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
### Quartz related configuration
# DS installation path, such as '/data1_1T/dolphinscheduler'
installPath="/data1_1T/dolphinscheduler"
# Deployment user
# Note: Deployment user needs 'sudo' privilege and has rights to operate HDFS.
# Root directory must be created by the same user if using HDFS, otherwise permission related issues will be raised.
deployUser="dolphinscheduler"
# Followings are alert-service configs
# Mail server host
mailServerHost="smtp.exmail.qq.com"
# Mail server port
mailServerPort="25"
# Mail sender
mailSender="xxxxxxxxxx"
# Mail user
mailUser="xxxxxxxxxx"
# Mail password
mailPassword="xxxxxxxxxx"
# Whether mail supports TLS
starttlsEnable="true"
# Whether mail supports SSL. Note: starttlsEnable and sslEnable cannot both set true.
sslEnable="false"
# Mail server host, same as mailServerHost
sslTrust="smtp.exmail.qq.com"
# Specify which resource upload function to use for resources storage, such as sql files. And supported options are HDFS, S3 and NONE. HDFS for upload to HDFS and NONE for not using this function.
resourceStorageType="NONE"
# if S3, write S3 address. HA, for example: s3a://dolphinscheduler,
# Note: s3 make sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# If parameter 'resourceStorageType' is S3, following configs are needed:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# If ResourceManager supports HA, then input master and standby node IP or hostname, eg: '192.168.xx.xx,192.168.xx.xx'. Or else ResourceManager run in standalone mode, please set yarnHaIps="" and "" for not using yarn.
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If ResourceManager runs in standalone, then set ResourceManager node ip or hostname, or else remain default.
singleYarnIp="yarnIp1"
# Storage path when using HDFS/S3
resourceUploadPath="/dolphinscheduler"
# HDFS/S3 root user
hdfsRootUser="hdfs"
# Followings are Kerberos configs
# Specify Kerberos enable or not
kerberosStartUp="false"
# Kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# Keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# Username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# API-service port
apiServerPort="12345"
# All hosts deploy DS
ips="ds1,ds2,ds3,ds4,ds5"
# Ssh port, default 22
sshPort="22"
# Master service hosts
masters="ds1,ds2"
# All hosts deploy worker service
# Note: Each worker needs to set a worker group name and default name is "default"
workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
This part describes the Quartz configs; configure them based on your practical situation and resources.
# Host deploy alert-service
alertServer="ds3"
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
The default configuration is as follows:
|Parameters | Default value|
|--|--|
|spring.quartz.properties.org.quartz.threadPool.threadPriority | 5|
|spring.quartz.properties.org.quartz.jobStore.isClustered | true|
|spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX|
|spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO|
|spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_|
|spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock|true|
|spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler|
|spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool|
|spring.quartz.properties.org.quartz.jobStore.useProperties | false|
|spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true|
|spring.quartz.properties.org.quartz.threadPool.threadCount | 25|
|spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000|
|spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true|
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
# Host deploy API-service
apiServers="ds1"
```
### dolphinscheduler_env.sh [load environment variables configs]
@ -491,11 +358,11 @@ export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```
### Services logback configs
### Log related configuration
Services name| logback config name |
--|--|
API-service logback config |logback-api.xml|
master-service logback config|logback-master.xml |
worker-service logback config|logback-worker.xml |
alert-service logback config|logback-alert.xml |
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/logback-spring.xml`|
|Api Server| `api-server/conf/logback-spring.xml`|
|Worker Server| `worker-server/conf/logback-spring.xml`|
|Alert Server| `alert-server/conf/logback-spring.xml`|

498
docs/docs/zh/architecture/configuration.md

@ -1,14 +1,10 @@
<!-- markdown-link-check-disable -->
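# Preface
This document describes the dolphinscheduler configuration files, targeting dolphinscheduler-1.3.x versions.
This document describes the dolphinscheduler configuration files.
# Directory Structure
Currently, all dolphinscheduler configuration files are in the [conf] directory.
For a more direct view of where the [conf] directory is located and which configuration files it contains, see the simplified description of the dolphinscheduler installation directory below.
This document mainly covers the dolphinscheduler configuration files; other parts are not elaborated here.
[Note: dolphinscheduler is abbreviated as DS below.]
The directory structure of DolphinScheduler is as follows: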
```
├── LICENSE
@ -96,26 +92,10 @@
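# Configuration Files in Detail
No.| Service category | Configuration file|
|--|--|--|
1|Script to start/stop DS services|dolphinscheduler-daemon.sh
2|Database connection configuration | datasource.properties
3|zookeeper connection configuration|zookeeper.properties
4|Common [storage] configuration|common.properties
5|API service configuration|application-api.properties
6|Master service configuration|master.properties
7|Worker service configuration|worker.properties
8|Alert service configuration|alert.properties
9|Quartz configuration|quartz.properties
10|DS environment variable configuration script [used for DS installation/startup]|install_config.conf
11|Environment variable configuration file loaded by the run scripts <br />[e.g. JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
12|Log configuration files of each service|API service log configuration file : logback-api.xml <br /> master service log configuration file : logback-master.xml <br /> worker service log configuration file : logback-worker.xml <br /> alert service log configuration file : logback-alert.xml
## 1.dolphinscheduler-daemon.sh [script to start/stop DS services]
The dolphinscheduler-daemon.sh script is responsible for starting and stopping DS.
## dolphinscheduler-daemon.sh [script to start/stop DolphinScheduler services]
The dolphinscheduler-daemon.sh script is responsible for starting and stopping DolphinScheduler.
start-all.sh/stop-all.sh ultimately also start and stop the cluster through dolphinscheduler-daemon.sh.
Currently DS only applies a basic setup; set the JVM parameters yourself according to the resources actually available.
Currently DolphinScheduler only applies a basic setup; set the JVM parameters yourself according to the resources actually available.
The default simplified parameters are as follows: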
```bash
@ -132,313 +112,201 @@ export DOLPHINSCHEDULER_OPTS="
"
```
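> Setting "-XX:+DisableExplicitGC" is not recommended: DS uses Netty for communication, and setting this parameter may cause memory leaks.
> Setting "-XX:+DisableExplicitGC" is not recommended: DolphinScheduler uses Netty for communication, and setting this parameter may cause memory leaks.
## Database connection configuration
DolphinScheduler uses Spring Hikari to manage database connections. Configuration file locations:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
|Alert Server| `alert-server/conf/application.yaml`|
The default configuration is as follows:
## 2.datasource.properties [database connection]
DS uses Druid to manage database connections. The default simplified configuration is as follows.
|Parameter | Default value| Description|
|--|--|--|
spring.datasource.driver-class-name| |Database driver
spring.datasource.url||Database connection URL
spring.datasource.username||Database username
spring.datasource.password||Database password
spring.datasource.initialSize|5| Initial number of connections in the pool
spring.datasource.minIdle|5| Minimum number of connections in the pool
spring.datasource.maxActive|5| Maximum number of connections in the pool
spring.datasource.maxWait|60000| Maximum wait time
spring.datasource.timeBetweenEvictionRunsMillis|60000| Connection check interval
spring.datasource.timeBetweenConnectErrorMillis|60000| Retry interval
spring.datasource.minEvictableIdleTimeMillis|300000| Minimum time a connection stays idle without being evicted
spring.datasource.validationQuery|SELECT 1|SQL used to check whether a connection is valid
spring.datasource.validationQueryTimeout|3| Timeout for the connection validity check [seconds]
spring.datasource.testWhileIdle|true| Checked when a connection is requested: if the idle time exceeds timeBetweenEvictionRunsMillis, validationQuery is run to check whether the connection is valid.
spring.datasource.testOnBorrow|true| Run validationQuery when a connection is requested to check whether it is valid
spring.datasource.testOnReturn|false| Run validationQuery when a connection is returned to check whether it is valid
spring.datasource.defaultAutoCommit|true| Whether to enable auto-commit
spring.datasource.keepAlive|true| For connections within the minIdle count in the pool, a keepAlive operation is performed when the idle time exceeds minEvictableIdleTimeMillis.
spring.datasource.poolPreparedStatements|true| Enable PSCache
spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| To enable PSCache this must be greater than 0; when it is greater than 0, poolPreparedStatements is automatically set to true.
## 3.zookeeper.properties [ZooKeeper connection configuration]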
|spring.datasource.driver-class-name| org.postgresql.Driver |Database driver|
|spring.datasource.url| jdbc:postgresql://127.0.0.1:5432/dolphinscheduler |Database connection URL|
|spring.datasource.username|root|Database username|
|spring.datasource.password|root|Database password|
|spring.datasource.hikari.connection-test-query|select 1|SQL used to check whether a connection is valid|
|spring.datasource.hikari.minimum-idle| 5|Minimum number of idle connections in the pool|
|spring.datasource.hikari.auto-commit|true|Whether to auto-commit|
|spring.datasource.hikari.pool-name|DolphinScheduler|Connection pool name|
|spring.datasource.hikari.maximum-pool-size|50|Maximum number of connections in the pool|
|spring.datasource.hikari.connection-timeout|30000|Connection timeout|
|spring.datasource.hikari.idle-timeout|600000|Maximum time an idle connection is kept alive|
|spring.datasource.hikari.leak-detection-threshold|0|Connection leak detection threshold|
|spring.datasource.hikari.initialization-fail-timeout|1|Timeout for connection pool initialization failure|
DolphinScheduler also supports configuring the database connection through `bin/env/dolphinscheduler_env.sh`.
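A minimal sketch of that environment-based approach, as it might appear in `bin/env/dolphinscheduler_env.sh`. It assumes the servers are Spring Boot applications, so upper-case, underscore-separated environment variables are bound onto the matching `spring.datasource.*` keys. The variable names and values below are illustrative; check them against the script shipped with your release.

```bash
# Assumption: Spring Boot binds SPRING_DATASOURCE_* onto spring.datasource.*.
# Values mirror the defaults in the table above; replace them with your own.
# The ${VAR:-default} form keeps any value already set in the environment.
export SPRING_DATASOURCE_URL="${SPRING_DATASOURCE_URL:-jdbc:postgresql://127.0.0.1:5432/dolphinscheduler}"
export SPRING_DATASOURCE_USERNAME="${SPRING_DATASOURCE_USERNAME:-root}"
export SPRING_DATASOURCE_PASSWORD="${SPRING_DATASOURCE_PASSWORD:-root}"
```

Keeping these values in the environment script rather than in each `application.yaml` means a single file can carry the connection settings for all four servers.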
## Zookeeper configuration
DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring, and similar functions. Configuration file locations:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
|Worker Server| `worker-server/conf/application.yaml`|
The default configuration is as follows:
|Parameter |Default value| Description|
|--|--|--|
zookeeper.quorum|localhost:2181| ZooKeeper cluster connection information
zookeeper.dolphinscheduler.root|/dolphinscheduler| Root directory where DS stores its data in ZooKeeper
zookeeper.session.timeout|60000| Session timeout
zookeeper.connection.timeout|30000| Connection timeout
zookeeper.retry.base.sleep|100| Base sleep time between retries
zookeeper.retry.max.sleep|30000| Maximum sleep time between retries
zookeeper.retry.maxtime|10|Maximum number of retries
|registry.zookeeper.namespace|dolphinscheduler|Namespace used in the Zookeeper cluster|
|registry.zookeeper.connect-string|localhost:2181| Zookeeper cluster connection information|
|registry.zookeeper.retry-policy.base-sleep-time|60ms|Base sleep time between retries|
|registry.zookeeper.retry-policy.max-sleep|300ms|Maximum sleep time between retries|
|registry.zookeeper.retry-policy.max-retries|5|Maximum number of retries|
|registry.zookeeper.session-timeout|30s|Session timeout|
|registry.zookeeper.connection-timeout|30s|Connection timeout|
|registry.zookeeper.block-until-connected|600ms|Time to block while waiting for the connection to succeed|
|registry.zookeeper.digest|~|Digest used by Zookeeper|
DolphinScheduler also supports configuring Zookeeper through `bin/env/dolphinscheduler_env.sh`.
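A minimal sketch of the Zookeeper settings expressed as environment variables in `bin/env/dolphinscheduler_env.sh`. The variable names are assumptions, derived from the `registry.zookeeper.*` keys by upper-casing them and replacing dots and dashes with underscores; verify them against the script in your release before relying on them.

```bash
# Assumption: these variables are bound onto the registry.* keys listed above.
# REGISTRY_TYPE is a hypothetical selector for the registry implementation.
export REGISTRY_TYPE="${REGISTRY_TYPE:-zookeeper}"
# Connection string for the Zookeeper ensemble; the default matches the table.
export REGISTRY_ZOOKEEPER_CONNECT_STRING="${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}"
```

For a multi-node ensemble the connection string would be a comma-separated list of `host:port` pairs.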
## common.properties [hadoop, s3, yarn configuration]
The common.properties file currently mainly holds the hadoop/s3/yarn related settings. Configuration file locations:
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/common.properties`|
|Api Server| `api-server/conf/common.properties`|
|Worker Server| `worker-server/conf/common.properties`|
|Alert Server| `alert-server/conf/common.properties`|
The default configuration is as follows:
## 4.common.properties [hadoop, s3, yarn configuration]
The common.properties file currently mainly holds the hadoop/s3a related settings.
| Parameter | Default value | Description |
|--|--|--|
data.basedir.path | /tmp/dolphinscheduler | Local working directory, used to store temporary files
resource.storage.type | NONE | Resource file storage type: HDFS, S3, NONE
resource.storage.upload.base.path | /dolphinscheduler | Storage path for resource files
resource.aws.access.key.id | minioadmin | S3 access key
resource.aws.secret.access.key | minioadmin | S3 secret access key
resource.aws.region | us-east-1 | S3 region
resource.aws.s3.bucket.name | dolphinscheduler | S3 bucket name
resource.aws.s3.endpoint | http://minio:9000 | S3 endpoint address
resource.hdfs.root.user | hdfs | If the storage type is HDFS, configure a user with the corresponding operation permissions
resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | Request address. If resource.storage.type=S3, the value looks like s3a://dolphinscheduler. If resource.storage.type=HDFS and Hadoop is configured for HA, copy the core-site.xml and hdfs-site.xml files to the conf directory
hadoop.security.authentication.startup.state | false | Whether Kerberos is enabled for Hadoop
java.security.krb5.conf.path | /opt/krb5.conf | Kerberos configuration path
login.user.keytab.username | hdfs-mycluster@ESZ.COM | Kerberos login user
login.user.keytab.path | /opt/hdfs.headless.keytab | Keytab of the Kerberos login user
kerberos.expire.time | 2 | Kerberos expiration time, an integer, in hours
yarn.resourcemanager.ha.rm.ids | | YARN ResourceManager addresses. If ResourceManager HA is enabled, enter the HA IP addresses (comma-separated); if ResourceManager is a single node, leave this value empty
yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | If ResourceManager HA is enabled or ResourceManager is not used, keep the default value. If ResourceManager is a single node, replace ds1 with the hostname of the ResourceManager
dolphinscheduler.env.path | env/dolphinscheduler_env.sh | Environment variable file loaded when running scripts [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...]
development.state | false | Whether development mode is enabled
task.resource.limit.state | false | Whether resource limit mode is enabled
## 5.application-api.properties [API service configuration]
|Parameter |Default value| Description|
|--|--|--|
server.port|12345|API service communication port
server.servlet.session.timeout|7200|Session timeout
server.servlet.context-path|/dolphinscheduler |Request path
spring.servlet.multipart.max-file-size|1024MB|Maximum upload file size
spring.servlet.multipart.max-request-size|1024MB|Maximum request size
server.jetty.max-http-post-size|5000000|Maximum HTTP POST request size for the Jetty server
spring.messages.encoding|UTF-8|Request encoding
spring.jackson.time-zone|GMT+8|Time zone setting
spring.messages.basename|i18n/messages|i18n configuration
security.authentication.type|PASSWORD|Authentication type
security.authentication.ldap.user.admin|read-only-admin|System administrator account when logging in via LDAP
security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls
security.authentication.ldap.base-dn|dc=example,dc=com|LDAP base dn
security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP account
security.authentication.ldap.password|password|LDAP password
security.authentication.ldap.user.identity-attribute|uid|LDAP user identity attribute name
security.authentication.ldap.user.email-attribute|mail|LDAP email attribute name
security.authentication.ldap.user.not-exist-action|CREATE|Action to take when the LDAP user does not exist. CREATE: automatically create the user when it does not exist; DENY: reject the login when the user does not exist
traffic.control.global.switch|false|Global traffic control switch
traffic.control.max-global-qps-rate|300|Global maximum requests per second
traffic.control.tenant-switch|false|Tenant traffic control switch
traffic.control.default-tenant-qps-rate|10|Default per-tenant limit on requests per second
traffic.control.customize-tenant-qps-rate||Custom per-tenant limit on requests per second
## 6.master.properties [Master service configuration]
|data.basedir.path | /tmp/dolphinscheduler | Local working directory, used to store temporary files|
|resource.storage.type | NONE | Resource file storage type: HDFS, S3, NONE|
|resource.upload.path | /dolphinscheduler | Storage path for resource files|
|aws.access.key.id | minioadmin | S3 access key|
|aws.secret.access.key | minioadmin | S3 secret access key|
|aws.region | us-east-1 | S3 region|
|aws.s3.endpoint | http://minio:9000 | S3 endpoint address|
|hdfs.root.user | hdfs | If the storage type is HDFS, configure a user with the corresponding operation permissions|
|fs.defaultFS | hdfs://mycluster:8020 | Request address. If resource.storage.type=S3, the value looks like s3a://dolphinscheduler. If resource.storage.type=HDFS and Hadoop is configured for HA, copy the core-site.xml and hdfs-site.xml files to the conf directory|
|hadoop.security.authentication.startup.state | false | Whether Kerberos is enabled for Hadoop|
|java.security.krb5.conf.path | /opt/krb5.conf | Kerberos configuration path|
|login.user.keytab.username | hdfs-mycluster@ESZ.COM | Kerberos login user|
|login.user.keytab.path | /opt/hdfs.headless.keytab | Keytab of the Kerberos login user|
|kerberos.expire.time | 2 | Kerberos expiration time, an integer, in hours|
|yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | YARN ResourceManager addresses. If ResourceManager HA is enabled, enter the HA IP addresses (comma-separated); if ResourceManager is a single node, leave this value empty|
|yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | If ResourceManager HA is enabled or ResourceManager is not used, keep the default value. If ResourceManager is a single node, replace ds1 with the hostname of the ResourceManager|
|development.state | false | Whether development mode is enabled|
|resource.manager.httpaddress.port | 8088 | ResourceManager port|
|yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | URL for the YARN job history status|
|datasource.encryption.enable | false | Whether to enable datasource encryption|
|datasource.encryption.salt | !@#$%^&* | Salt used for datasource encryption|
|data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | Jar used for data quality|
|support.hive.oneSession | false | Whether Hive SQL is executed in the same session|
|sudo.enable | true | Whether to enable sudo|
|alert.rpc.port | 50052 | RPC port of the Alert Server|
|zeppelin.rest.url | http://localhost:8080 | Zeppelin RESTful API address|
## Api-server configuration
Location: `api-server/conf/application.yaml`
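|Parameter |Default value| Description|
|--|--|--|
master.listen.port|5678|Master listening port
master.exec.threads|100|Number of master worker threads, used to limit the number of process instances running in parallel
master.exec.task.num|20|Number of parallel tasks per process instance on the master
master.dispatch.task.num|3|Number of tasks the master dispatches per batch
master.host.selector|LowerWeight|Master host selector, used to choose a suitable worker to run the task. Options: Random, RoundRobin, LowerWeight
master.heartbeat.interval|10|Master heartbeat interval, in seconds
master.task.commit.retryTimes|5|Number of task retries
master.task.commit.interval|1000|Task commit interval, in milliseconds
master.max.cpuload.avg|-1|Maximum CPU load average for the master; the master can schedule tasks only when this value is higher than the system CPU load average. Default -1: cpu cores * 2
master.reserved.memory|0.3|Reserved memory for the master; the master can schedule tasks only when this value is lower than the system's available memory. Unit: G
## 7.worker.properties [Worker service configuration]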
|server.port|12345|API service communication port|
|server.servlet.session.timeout|120m|Session timeout|
|server.servlet.context-path|/dolphinscheduler/ |Request path|
|spring.servlet.multipart.max-file-size|1024MB|Maximum upload file size|
|spring.servlet.multipart.max-request-size|1024MB|Maximum request size|
|server.jetty.max-http-post-size|5000000|Maximum HTTP POST request size for the Jetty server|
|spring.banner.charset|UTF-8|Request encoding|
|spring.jackson.time-zone|UTC|Time zone setting|
|spring.jackson.date-format|"yyyy-MM-dd HH:mm:ss"|Date format setting|
|spring.messages.basename|i18n/messages|i18n configuration|
|security.authentication.type|PASSWORD|Authentication type|
|security.authentication.ldap.user.admin|read-only-admin|System administrator account when logging in via LDAP|
|security.authentication.ldap.urls|ldap://ldap.forumsys.com:389/|LDAP urls|
|security.authentication.ldap.base.dn|dc=example,dc=com|LDAP base dn|
|security.authentication.ldap.username|cn=read-only-admin,dc=example,dc=com|LDAP account|
|security.authentication.ldap.password|password|LDAP password|
|security.authentication.ldap.user.identity.attribute|uid|LDAP user identity attribute name|
|security.authentication.ldap.user.email.attribute|mail|LDAP email attribute name|
## Master Server configuration
Location: `master-server/conf/application.yaml`
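|Parameter |Default value| Description|
|--|--|--|
worker.listen.port|1234|Worker listening port
worker.exec.threads|100|Number of worker threads, used to limit the number of task instances running in parallel
worker.heartbeat.interval|10|Worker heartbeat interval, in seconds
worker.max.cpuload.avg|-1|Maximum CPU load average for the worker; tasks can be dispatched to the worker only when this value is higher than the system CPU load average. Default -1: cpu cores * 2
worker.reserved.memory|0.3|Reserved memory for the worker; tasks can be dispatched to the worker only when this value is lower than the system's available memory. Unit: G
worker.groups|default|Worker group configuration, comma-separated, e.g. 'worker.groups=default,test' <br> On startup the worker automatically joins the corresponding groups based on this setting
worker.tenant.auto.create|true|A tenant corresponds to a system user, under which the worker submits jobs. If the user does not exist on the system, it is created automatically when worker.tenant.auto.create is true.
worker.tenant.distributed.user|false|For distributed-user scenarios, e.g. users created with FreeIpa and stored in LDAP. This parameter applies only to Linux; when it is true, worker.tenant.auto.create has no effect and tenants are not created automatically
## 8.alert.properties [Alert service configuration]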
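|master.listen-port|5678|Master listening port|
|master.fetch-command-num|10|Number of commands the master fetches|
|master.pre-exec-threads|10|Number of threads the master uses to prepare execution, used to limit the number of commands handled in parallel|
|master.exec-threads|100|Number of master worker threads, used to limit the number of process instances running in parallel|
|master.dispatch-task-number|3|Number of tasks the master dispatches per batch|
|master.host-selector|lower_weight|Master host selector, used to choose a suitable worker to run the task. Options: random, round_robin, lower_weight|
|master.heartbeat-interval|10|Master heartbeat interval, in seconds|
|master.task-commit-retry-times|5|Number of task retries|
|master.task-commit-interval|1000|Task commit interval, in milliseconds|
|master.state-wheel-interval|5|Interval for polling state checks|
|master.max-cpu-load-avg|-1|Maximum CPU load average for the master; the master can schedule tasks only when this value is higher than the system CPU load average. Default -1: cpu cores * 2|
|master.reserved-memory|0.3|Reserved memory for the master; the master can schedule tasks only when this value is lower than the system's available memory. Unit: G|
|master.failover-interval|10|Failover interval, in minutes|
|master.kill-yarn-job-when-task-failover|true|Whether to kill the YARN job when a task instance fails over|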
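The CPU-load rule described for `master.max-cpu-load-avg` can be sketched as a small shell function. This is an illustration of the rule as the table states it, not DolphinScheduler's actual implementation; the function name `can_schedule` and its arguments are hypothetical.

```bash
# can_schedule LOAD_AVG CPU_CORES MAX_CPU_LOAD_AVG
# Returns 0 (success) when the configured ceiling is higher than the
# current system load average, i.e. when the master may schedule tasks.
can_schedule() {
  local load_avg="$1" cores="$2" max_load="$3"
  # -1 means "use the default", which the table gives as cpu cores * 2.
  if [ "$max_load" = "-1" ]; then
    max_load=$((cores * 2))
  fi
  # awk handles the floating-point comparison; "+ 0" forces numeric context.
  awk -v max="$max_load" -v load="$load_avg" 'BEGIN { exit !(max + 0 > load + 0) }'
}
```

With 4 cores and the default of -1 the ceiling is 8, so a load average of 3.5 passes the check and 9.0 does not. On Linux the current load average could be read from the first field of `/proc/loadavg` and the core count from `nproc`.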
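## Worker Server configuration
Location: `worker-server/conf/application.yaml`
|Parameter |Default value| Description|
|--|--|--|
alert.type|EMAIL|Alert type|
mail.protocol|SMTP| Mail server protocol
mail.server.host|xxx.xxx.com|Mail server address
mail.server.port|25|Mail server port
mail.sender|xxx@xxx.com|Sender email address
mail.user|xxx@xxx.com|Sender email user name
mail.passwd|111111|Sender email password
mail.smtp.starttls.enable|true|Whether TLS is enabled for the mailbox
mail.smtp.ssl.enable|false|Whether SSL is enabled for the mailbox
mail.smtp.ssl.trust|xxx.xxx.com|SSL trusted hosts for the mailbox
xls.file.path|/tmp/xls|Temporary working directory for mail attachments
||The following are Enterprise WeChat settings [optional]|
enterprise.wechat.enable|false|Whether Enterprise WeChat is enabled
enterprise.wechat.corp.id|xxxxxxx|
enterprise.wechat.secret|xxxxxxx|
enterprise.wechat.agent.id|xxxxxxx|
enterprise.wechat.users|xxxxxxx|
enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|
enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|
enterprise.wechat.user.send.msg||Message format for sending to a user
enterprise.wechat.team.send.msg||Message format for sending to a group
plugin.dir|/Users/xx/your/path/to/plugin/dir|Plugin directory
## 9.quartz.properties [Quartz configuration]
This section mainly contains the Quartz configuration; configure it according to your actual business scenario and resources. This document does not go into detail for now.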
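|worker.listen-port|1234|Worker listening port|
|worker.exec-threads|100|Number of worker threads, used to limit the number of task instances running in parallel|
|worker.heartbeat-interval|10|Worker heartbeat interval, in seconds|
|worker.host-weight|100|Weight of the worker host when tasks are dispatched|
|worker.tenant-auto-create|true|A tenant corresponds to a system user, under which the worker submits jobs. If the user does not exist on the system, it is created automatically when this parameter is true.|
|worker.max-cpu-load-avg|-1|Maximum CPU load average for the worker; tasks can be dispatched to the worker only when this value is higher than the system CPU load average. Default -1: cpu cores * 2|
|worker.reserved-memory|0.3|Reserved memory for the worker; tasks can be dispatched to the worker only when this value is lower than the system's available memory. Unit: G|
|worker.groups|default|Worker group configuration, comma-separated, e.g. 'worker.groups=default,test' <br> On startup the worker automatically joins the corresponding groups based on this setting|
|worker.alert-listen-host|localhost|Alert listening host|
|worker.alert-listen-port|50052|Alert listening port|
## Alert Server configuration
Location: `alert-server/conf/application.yaml`
|Parameter |Default value| Description|
|--|--|--|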
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.scheduler.instanceName | DolphinScheduler
org.quartz.scheduler.instanceId | AUTO
org.quartz.scheduler.makeSchedulerThreadDaemon | true
org.quartz.jobStore.useProperties | false
org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.makeThreadsDaemons | true
org.quartz.threadPool.threadCount | 25
org.quartz.threadPool.threadPriority | 5
org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.tablePrefix | QRTZ_
org.quartz.jobStore.isClustered | true
org.quartz.jobStore.misfireThreshold | 60000
org.quartz.jobStore.clusterCheckinInterval | 5000
org.quartz.jobStore.acquireTriggersWithinLock|true
org.quartz.jobStore.dataSource | myDs
org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider
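## 10.install_config.conf [DS environment variable configuration script [used for DS installation/startup]]
The install_config.conf configuration file is fairly involved. It is mainly used in two places.
* 1.Automatic installation of the DS cluster.
> Running the install.sh script automatically loads the settings in this file and uses them to fill in the configuration files described above,
> for example dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, and quartz.properties.
* 2.Starting and stopping the DS cluster.
>When the DS cluster is started or stopped, the masters, workers, alertServer, apiServers and other parameters in this file are loaded to start/stop the cluster.
The file content is as follows: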
```bash
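# Note: if this configuration file contains special characters such as `.*[]^${}\+?|()@#&`, escape them,
# Example: `[` is escaped as `\[`
# Database type; currently only postgresql or mysql is supported
dbtype="mysql"
# Database address & port
dbhost="192.168.xx.xx:3306"
# Database name
dbname="dolphinscheduler"
# Database username
username="xx"
# Database password
password="xx"
# Zookeeper address
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
# Directory to install DS into, e.g. /data1_1T/dolphinscheduler,
installPath="/data1_1T/dolphinscheduler"
# User used for deployment
# Note: the deployment user needs sudo privileges and must be able to operate on HDFS.
# If HDFS is used, the root directory must be created by this user; otherwise permission-related problems will occur.
deployUser="dolphinscheduler"
|server.port|50053|Alert Server listening port|
|alert.port|50052|Alert listening port|
# The following is the alert service configuration
# Mail server address
mailServerHost="smtp.exmail.qq.com"
# Mail server port
mailServerPort="25"
## Quartz configuration
This section mainly contains the Quartz configuration; configure it according to your actual business scenario and resources. This document does not go into detail for now. Configuration file locations:
# Sender
mailSender="xxxxxxxxxx"
|Service| Configuration file |
|--|--|
|Master Server | `master-server/conf/application.yaml`|
|Api Server| `api-server/conf/application.yaml`|
# Sending user
mailUser="xxxxxxxxxx"
The default configuration is as follows:
# Mailbox password
mailPassword="xxxxxxxxxx"
| Parameter | Default value |
|--|--|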
|spring.quartz.properties.org.quartz.threadPool.threadPriority | 5|
|spring.quartz.properties.org.quartz.jobStore.isClustered | true|
|spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX|
|spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO|
|spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_|
|spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock|true|
|spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler|
|spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool|
|spring.quartz.properties.org.quartz.jobStore.useProperties | false|
|spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true|
|spring.quartz.properties.org.quartz.threadPool.threadCount | 25|
|spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000|
|spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true|
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
# Set to true for mailboxes using the TLS protocol, otherwise false
starttlsEnable="true"
# Set to true for mailboxes with SSL enabled, otherwise false. Note: starttlsEnable and sslEnable cannot both be true
sslEnable="false"
## dolphinscheduler_env.sh [environment variable configuration]
# Mail service address, same value as mailServerHost
sslTrust="smtp.exmail.qq.com"
# Where to upload resource files used by the business, such as SQL files. Options: HDFS, S3, NONE. To upload to HDFS, set this to HDFS; if the resource upload feature is not needed, choose NONE.
resourceStorageType="NONE"
# If S3, write the S3 address, HA, for example: s3a://dolphinscheduler,
# Note: for S3, be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# If resourceStorageType is S3, the following parameters must be configured:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# If ResourceManager is HA, set this to the active/standby IPs or hostnames of the ResourceManager nodes, e.g. "192.168.xx.xx,192.168.xx.xx"; if there is a single ResourceManager or YARN is not used at all, set yarnHaIps=""
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If there is a single ResourceManager, set this to the ResourceManager node IP or hostname; otherwise keep the default value.
singleYarnIp="yarnIp1"
# Storage path for resource files on HDFS/S3
resourceUploadPath="/dolphinscheduler"
# HDFS/S3 operating user
hdfsRootUser="hdfs"
# The following is the Kerberos configuration
# Whether Kerberos is enabled
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# API service port
apiServerPort="12345"
# Hostnames of all hosts where DS is deployed
ips="ds1,ds2,ds3,ds4,ds5"
# SSH port, default 22
sshPort="22"
# Hosts where the master service is deployed
masters="ds1,ds2"
# Hosts where the worker service is deployed
# Note: each worker needs a worker group name; the default is "default"
workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
# Host where the alert service is deployed
alertServer="ds3"
# Hosts where the API service is deployed
apiServers="ds1"
```
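## 11.dolphinscheduler_env.sh [environment variable configuration]
When tasks are submitted in a shell-like way, the environment variables in this configuration file are loaded onto the host. It covers `JAVA_HOME`, the metadata database, the registry center, and task type configuration. The task
types mainly include Shell, Python, Spark, Flink, and Datax tasks, among others
When tasks are submitted in a shell-like way, the environment variables in this configuration file are loaded onto the host. It covers `JAVA_HOME`, the metadata database, the registry center, and task type configuration. The task types mainly include Shell, Python, Spark, Flink, and Datax tasks, among others.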
```bash
# JAVA_HOME, will use it to start DolphinScheduler server
@ -473,10 +341,10 @@ export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```
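## 12.Log configuration files of each service
Service name| Log file name |
|--|--|--|
API service log configuration file |logback-api.xml|
master service log configuration file|logback-master.xml |
worker service log configuration file|logback-worker.xml |
alert service log configuration file|logback-alert.xml |
## Log configuration
|Service| Configuration file |
|--|--|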
|Master Server | `master-server/conf/logback-spring.xml`|
|Api Server| `api-server/conf/logback-spring.xml`|
|Worker Server| `worker-server/conf/logback-spring.xml`|
|Alert Server| `alert-server/conf/logback-spring.xml`|
