Configuration

Preface

This document explains the DolphinScheduler application configurations.

Directory Structure

The directory structure of DolphinScheduler is as follows:

├── LICENSE
│
├── NOTICE
│
├── licenses                                    directory of licenses
│
├── bin                                         directory of DolphinScheduler application commands and configuration scripts
│   ├── dolphinscheduler-daemon.sh              script to start or shut down DolphinScheduler application
│   ├── env                                     directory of scripts to load environment variables
│     ├── dolphinscheduler_env.sh             script to export environment variables [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] when you start or stop a service using the script `dolphinscheduler-daemon.sh`
│     └── install_env.sh                      script to export environment variables for DolphinScheduler installation when you use the scripts `install.sh` `start-all.sh` `stop-all.sh` `status-all.sh`
│   ├── install.sh                              script to auto-setup services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   ├── remove-zk-node.sh                       script to clean up ZooKeeper caches
│   ├── scp-hosts.sh                            script to copy installation files to target hosts
│   ├── start-all.sh                            script to start all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   ├── status-all.sh                           script to check the status of all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│   └── stop-all.sh                             script to shut down all services when you deploy DolphinScheduler in `pseudo-cluster` mode or `cluster` mode
│
├── alert-server                                directory of DolphinScheduler alert-server commands, configuration scripts and libs
│   ├── bin
│     ├── start.sh                            script to start DolphinScheduler alert-server
│     └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler alert-server
│   ├── conf
│     ├── application.yaml                    configurations of alert-server
│     ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; normally you don't need to modify this
│     ├── common.properties                   configurations of common-service like storage, credentials, etc.
│     ├── dolphinscheduler_env.sh             script to load environment variables for alert-server
│     └── logback-spring.xml                  configurations of alert-service log
│   └── libs                                    directory of alert-server libs
│
├── api-server                                  directory of DolphinScheduler api-server commands, configuration scripts and libs
│   ├── bin
│     ├── start.sh                            script to start DolphinScheduler api-server
│     └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler api-server
│   ├── conf
│     ├── application.yaml                    configurations of api-server
│     ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; normally you don't need to modify this
│     ├── common.properties                   configurations of common-service like storage, credentials, etc.
│     ├── dolphinscheduler_env.sh             script to load environment variables for api-server
│     └── logback-spring.xml                  configurations of api-service log
│   ├── libs                                    directory of api-server libs
│   └── ui                                      directory of api-server related front-end web resources
│
├── master-server                               directory of DolphinScheduler master-server commands, configuration scripts and libs
│   ├── bin
│     ├── start.sh                            script to start DolphinScheduler master-server
│     └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler master-server
│   ├── conf
│     ├── application.yaml                    configurations of master-server
│     ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; normally you don't need to modify this
│     ├── common.properties                   configurations of common-service like storage, credentials, etc.
│     ├── dolphinscheduler_env.sh             script to load environment variables for master-server
│     └── logback-spring.xml                  configurations of master-service log
│   └── libs                                    directory of master-server libs
│
├── standalone-server                           directory of DolphinScheduler standalone-server commands, configuration scripts and libs
│   ├── bin
│     ├── start.sh                            script to start DolphinScheduler standalone-server
│     └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler standalone-server
│   ├── conf
│     ├── application.yaml                    configurations of standalone-server
│     ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; normally you don't need to modify this
│     ├── common.properties                   configurations of common-service like storage, credentials, etc.
│     ├── dolphinscheduler_env.sh             script to load environment variables for standalone-server
│     ├── logback-spring.xml                  configurations of standalone-service log
│     └── sql                                 .sql files to create or upgrade DolphinScheduler metadata
│   ├── libs                                    directory of standalone-server libs
│   └── ui                                      directory of standalone-server related front-end web resources
│  
├── tools                                       directory of DolphinScheduler metadata tools commands, configuration scripts and libs
│   ├── bin
│     └── upgrade-schema.sh                   script to initialize or upgrade DolphinScheduler metadata
│   ├── conf
│     ├── application.yaml                    configurations of tools
│     └── common.properties                   configurations of common-service like storage, credentials, etc.
│   ├── libs                                    directory of tool libs
│   └── sql                                     .sql files to create or upgrade DolphinScheduler metadata
│  
├── worker-server                               directory of DolphinScheduler worker-server commands, configuration scripts and libs
│   ├── bin
│     ├── start.sh                        script to start DolphinScheduler worker-server
│     └── jvm_args_env.sh                 script to set JVM args of DolphinScheduler worker-server
│   ├── conf
│     ├── application.yaml                configurations of worker-server
│     ├── bootstrap.yaml                  configurations for Spring Cloud bootstrap; normally you don't need to modify this
│     ├── common.properties               configurations of common-service like storage, credentials, etc.
│     ├── dolphinscheduler_env.sh         script to load environment variables for worker-server
│     └── logback-spring.xml              configurations of worker-service log
│   └── libs                                directory of worker-server libs
│
└── ui                                          directory of front-end web resources
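
For example, on a pseudo-cluster or cluster deployment the scripts under `bin` shown above can start, check and stop all services at once (a usage sketch; run them from the installation directory):

bash ./bin/start-all.sh
bash ./bin/status-all.sh
bash ./bin/stop-all.sh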

Configurations in Detail

dolphinscheduler-daemon.sh [start up or shut down DolphinScheduler application]

dolphinscheduler-daemon.sh is responsible for starting up and shutting down DolphinScheduler. Essentially, start-all.sh and stop-all.sh start up and shut down the cluster via dolphinscheduler-daemon.sh. Currently, DolphinScheduler ships only a basic JVM configuration, so remember to configure further JVM options based on the resources actually available in your environment.
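
For example, an individual service can be started or stopped through the daemon script (a usage sketch; replace master-server with api-server, worker-server or alert-server as needed):

# start and later stop the master service on this host
bash ./bin/dolphinscheduler-daemon.sh start master-server
bash ./bin/dolphinscheduler-daemon.sh stop master-server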

Default simplified parameters are:

export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"

"-XX:DisableExplicitGC" is not recommended due to may lead to memory link (DolphinScheduler dependent on Netty to communicate). If add "-Djava.net.preferIPv6Addresses=true" will use ipv6 address, if add "-Djava.net.preferIPv4Addresses=true" will use ipv4 address, if doesn't set the two parameter will use ipv4 or ipv6.

Database connection related configuration

DolphinScheduler uses Spring HikariCP to manage database connections. Configuration file locations:

| Service | Configuration file |
|---------|--------------------|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |
| Worker Server | worker-server/conf/application.yaml |
| Alert Server | alert-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value | Description |
|------------|---------------|-------------|
| spring.datasource.driver-class-name | org.postgresql.Driver | datasource driver |
| spring.datasource.url | jdbc:postgresql://127.0.0.1:5432/dolphinscheduler | datasource connection url |
| spring.datasource.username | root | datasource username |
| spring.datasource.password | root | datasource password |
| spring.datasource.hikari.connection-test-query | select 1 | validate connection by running the SQL |
| spring.datasource.hikari.minimum-idle | 5 | minimum connection pool size number |
| spring.datasource.hikari.auto-commit | true | whether auto commit |
| spring.datasource.hikari.pool-name | DolphinScheduler | name of the connection pool |
| spring.datasource.hikari.maximum-pool-size | 50 | maximum connection pool size number |
| spring.datasource.hikari.connection-timeout | 30000 | connection timeout |
| spring.datasource.hikari.idle-timeout | 600000 | maximum idle connection survival time |
| spring.datasource.hikari.leak-detection-threshold | 0 | connection leak detection threshold |
| spring.datasource.hikari.initialization-fail-timeout | 1 | connection pool initialization failed timeout |

Note that DolphinScheduler also supports database configuration through bin/env/dolphinscheduler_env.sh.
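
A hedged sketch of pointing all services at a MySQL instance through bin/env/dolphinscheduler_env.sh and then initializing the metadata with the tools described above (the variable names follow Spring Boot's relaxed binding and the file shipped with recent releases, so verify them against your own bin/env/dolphinscheduler_env.sh; the MySQL JDBC driver must also be present in each service's libs directory):

# in bin/env/dolphinscheduler_env.sh
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler

# then create or upgrade the DolphinScheduler metadata tables
bash ./tools/bin/upgrade-schema.sh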

Zookeeper related configuration

DolphinScheduler uses ZooKeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file locations:

| Service | Configuration file |
|---------|--------------------|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |
| Worker Server | worker-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value | Description |
|------------|---------------|-------------|
| registry.zookeeper.namespace | dolphinscheduler | namespace of zookeeper |
| registry.zookeeper.connect-string | localhost:2181 | the connection string of zookeeper |
| registry.zookeeper.retry-policy.base-sleep-time | 60ms | time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-sleep | 300ms | maximum time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-retries | 5 | maximum retry times |
| registry.zookeeper.session-timeout | 30s | session timeout |
| registry.zookeeper.connection-timeout | 30s | connection timeout |
| registry.zookeeper.block-until-connected | 600ms | waiting time to block until the connection succeeds |
| registry.zookeeper.digest | {username}:{password} | digest of zookeeper to access znode, works only when acl is enabled; for more details please check the [Apache Zookeeper doc](https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html) |

Note that DolphinScheduler also supports zookeeper related configuration through bin/env/dolphinscheduler_env.sh.
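
Similarly, a sketch of pointing the registry at an external ZooKeeper ensemble via bin/env/dolphinscheduler_env.sh (variable names as pre-defined in that file in recent releases; verify against your version):

export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-zk1:2181,zk2:2181,zk3:2181}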

common.properties [hadoop, s3, yarn config properties]

Currently, common.properties mainly contains Hadoop and S3A related configurations. Configuration file locations:

| Service | Configuration file |
|---------|--------------------|
| Master Server | master-server/conf/common.properties |
| Api Server | api-server/conf/common.properties |
| Worker Server | worker-server/conf/common.properties |
| Alert Server | alert-server/conf/common.properties |

The default configuration is as follows:

| Parameters | Default value | Description |
|------------|---------------|-------------|
| data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files |
| resource.storage.type | NONE | type of resource files: HDFS, S3, OSS, GCS, ABS, NONE |
| resource.upload.path | /dolphinscheduler | storage path of resource files |
| aws.access.key.id | minioadmin | access key id of S3 |
| aws.secret.access.key | minioadmin | secret access key of S3 |
| aws.region | us-east-1 | region of S3 |
| aws.s3.endpoint | http://minio:9000 | endpoint of S3 |
| hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS |
| fs.defaultFS | hdfs://mycluster:8020 | if resource.storage.type=S3, the request url would be similar to 's3a://dolphinscheduler'; if resource.storage.type=HDFS and Hadoop supports HA, copy core-site.xml and hdfs-site.xml into the 'conf' directory |
| hadoop.security.authentication.startup.state | false | whether Hadoop grants kerberos permission |
| java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory |
| login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username |
| login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab |
| kerberos.expire.time | 2 | kerberos expire time, integer, the unit is hour |
| yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | specify the yarn resourcemanager url; if resourcemanager supports HA, input HA IP addresses separated by commas, or input null for standalone |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default value if ResourceManager supports HA or is not used; otherwise replace ds1 with the corresponding hostname if ResourceManager runs in standalone mode |
| development.state | false | specify whether in development state |
| dolphin.scheduler.network.interface.preferred | NONE | display name of the network card which will be used |
| dolphin.scheduler.network.interface.restrict | docker0 | display name of the network card which shouldn't be used |
| dolphin.scheduler.network.priority.strategy | default | IP acquisition strategy, giving priority to the internal network or the external network |
| resource.manager.httpaddress.port | 8088 | the port of resource manager |
| yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn |
| datasource.encryption.enable | false | whether to enable datasource encryption |
| datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption |
| data-quality.jar.dir | | the directory of the data quality jar |
| support.hive.oneSession | false | specify whether hive SQL is executed in the same session |
| sudo.enable | true | whether to enable sudo |
| alert.rpc.port | 50052 | the RPC port of Alert Server |
| zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin |
| appId.collect | log | way to collect applicationId; to use aop instead, change the value from log to aop and uncomment the applicationId auto-collection related configuration in bin/env/dolphinscheduler_env.sh. Note: the aop way does not support submitting yarn jobs on a remote host in client mode (e.g. Beeline), and will fail if the applicationId collection related environment configuration in dolphinscheduler_env.sh is overridden |
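
As an illustration, a minimal common.properties excerpt for storing resource files in an S3-compatible object store such as MinIO might look like the following (a sketch assembled from the defaults above; keep the file consistent across the conf directories of all services):

resource.storage.type=S3
resource.upload.path=/dolphinscheduler
aws.access.key.id=minioadmin
aws.secret.access.key=minioadmin
aws.region=us-east-1
aws.s3.endpoint=http://minio:9000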

Api-server related configuration

Location: api-server/conf/application.yaml

| Parameters | Default value | Description |
|------------|---------------|-------------|
| server.port | 12345 | api service communication port |
| server.servlet.session.timeout | 120m | session timeout |
| server.servlet.context-path | /dolphinscheduler/ | request path |
| spring.servlet.multipart.max-file-size | 1024MB | maximum file size |
| spring.servlet.multipart.max-request-size | 1024MB | maximum request size |
| server.jetty.max-http-post-size | 5000000 | jetty maximum post size |
| spring.banner.charset | UTF-8 | message encoding |
| spring.jackson.time-zone | UTC | time zone |
| spring.jackson.date-format | "yyyy-MM-dd HH:mm:ss" | time format |
| spring.messages.basename | i18n/messages | i18n config |
| security.authentication.type | PASSWORD | authentication type |
| security.authentication.ldap.user.admin | read-only-admin | admin user account when you log in with LDAP |
| security.authentication.ldap.urls | ldap://ldap.forumsys.com:389/ | LDAP urls |
| security.authentication.ldap.base.dn | dc=example,dc=com | LDAP base dn |
| security.authentication.ldap.username | cn=read-only-admin,dc=example,dc=com | LDAP username |
| security.authentication.ldap.password | password | LDAP password |
| security.authentication.ldap.user.identity-attribute | uid | LDAP user identity attribute |
| security.authentication.ldap.user.email-attribute | mail | LDAP user email attribute |
| security.authentication.ldap.user.not-exist-action | CREATE | action when the LDAP user does not exist, default value: CREATE. Optional values include CREATE, DENY |
| security.authentication.ldap.ssl.enable | false | LDAP ssl switch |
| security.authentication.ldap.ssl.trust-store | ldapkeystore.jks | LDAP jks file absolute path |
| security.authentication.ldap.ssl.trust-store-password | password | LDAP jks password |
| security.authentication.casdoor.user.admin | | admin user account when you log in with Casdoor |
| casdoor.endpoint | | Casdoor server url |
| casdoor.client-id | | id in Casdoor |
| casdoor.client-secret | | secret in Casdoor |
| casdoor.certificate | | certificate in Casdoor |
| casdoor.organization-name | | organization name in Casdoor |
| casdoor.application-name | | application name in Casdoor |
| casdoor.redirect-url | | DolphinScheduler login url |
| api.traffic.control.global.switch | false | traffic control global switch |
| api.traffic.control.max-global-qps-rate | 300 | global max request number per second |
| api.traffic.control.tenant-switch | false | traffic control tenant switch |
| api.traffic.control.default-tenant-qps-rate | 10 | default tenant max request number per second |
| api.traffic.control.customize-tenant-qps-rate | | customize tenant max request number per second |

Master-server related configuration

Location: master-server/conf/application.yaml

| Parameters | Default value | Description |
|------------|---------------|-------------|
| master.listen-port | 5678 | master listen port |
| master.fetch-command-num | 10 | the number of commands fetched by master |
| master.pre-exec-threads | 10 | master prepare execute thread number, used to limit the number of commands handled in parallel |
| master.exec-threads | 100 | master execute thread number, used to limit the number of process instances run in parallel |
| master.dispatch-task-number | 3 | master dispatch task number per batch |
| master.host-selector | lower_weight | master host selector, used to select a suitable worker, default value: LowerWeight. Optional values include random, round_robin, lower_weight |
| master.max-heartbeat-interval | 10s | master max heartbeat interval |
| master.task-commit-retry-times | 5 | master commit task retry times |
| master.task-commit-interval | 1000 | master commit task interval, the unit is millisecond |
| master.state-wheel-interval | 5 | time to check status |
| master.server-load-protection.enabled | true | if set true, master overload protection is enabled |
| master.server-load-protection.max-cpu-usage-percentage-thresholds | 0.7 | master max cpu usage; the master can execute workflows only when its cpu usage is smaller than this value |
| master.server-load-protection.max-jvm-memory-usage-percentage-thresholds | 0.7 | master max JVM memory usage; the master can execute workflows only when its JVM memory usage is smaller than this value |
| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.7 | master max system memory usage; the master can execute workflows only when its system memory usage is smaller than this value |
| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.7 | master max disk usage; the master can execute workflows only when its disk usage is smaller than this value |
| master.failover-interval | 10 | failover interval, the unit is minute |
| master.kill-application-when-task-failover | true | whether to kill the yarn/k8s application when failing over a task instance |
| master.registry-disconnect-strategy.strategy | stop | used when the master disconnects from the registry, default value: stop. Optional values include stop, waiting |
| master.registry-disconnect-strategy.max-waiting-time | 100s | used when the master disconnects from the registry and the disconnect strategy is waiting; the master will keep trying to reconnect to the registry for this long, and stop itself if it still cannot connect after that time; if the value is 0s, the master will wait infinitely |
| master.worker-group-refresh-interval | 10s | the interval to refresh worker groups from db to memory |

Worker-server related configuration

Location: worker-server/conf/application.yaml

| Parameters | Default value | Description |
|------------|---------------|-------------|
| worker.listen-port | 1234 | worker-service listen port |
| worker.exec-threads | 100 | worker-service execute thread number, used to limit the number of task instances run in parallel |
| worker.max-heartbeat-interval | 10s | worker-service max heartbeat interval |
| worker.host-weight | 100 | worker host weight to dispatch tasks |
| worker.server-load-protection.enabled | true | if set true, worker overload protection is enabled |
| worker.server-load-protection.max-cpu-usage-percentage-thresholds | 0.7 | worker max cpu usage; the worker can execute tasks only when its cpu usage is smaller than this value |
| worker.server-load-protection.max-jvm-memory-usage-percentage-thresholds | 0.7 | worker max JVM memory usage; the worker can execute tasks only when its JVM memory usage is smaller than this value |
| worker.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.7 | worker max system memory usage; the worker can execute tasks only when its system memory usage is smaller than this value |
| worker.server-load-protection.max-disk-usage-percentage-thresholds | 0.7 | worker max disk usage; the worker can execute tasks only when its disk usage is smaller than this value |
| worker.registry-disconnect-strategy.strategy | stop | used when the worker disconnects from the registry, default value: stop. Optional values include stop, waiting |
| worker.registry-disconnect-strategy.max-waiting-time | 100s | used when the worker disconnects from the registry and the disconnect strategy is waiting; the worker will keep trying to reconnect to the registry for this long, and stop itself if it still cannot connect after that time; if the value is 0s, the worker will wait infinitely |
| worker.task-execute-threads-full-policy | REJECT | if REJECT, when the number of tasks waiting in the worker reaches exec-threads, it rejects the received task and the master redispatches it; if CONTINUE, it puts the task into the worker's execution queue and waits for a free thread to start execution |
| worker.tenant-config.auto-create-tenant-enabled | true | the tenant corresponds to a user of the system, which is used by the worker to submit the job; if the system does not have this user, it will be created automatically when this parameter is true |
| worker.tenant-config.distributed-tenant-enabled | false | when this parameter is true, auto-create-tenant-enabled has no effect and tenants will not be created automatically |
| worker.tenant-config.default-tenant-enabled | false | if set true, the worker bootstrap user is used as the tenant to execute tasks when the tenant is default |

Alert-server related configuration

Location: alert-server/conf/application.yaml

| Parameters | Default value | Description |
|------------|---------------|-------------|
| server.port | 50053 | the port of Alert Server |
| alert.port | 50052 | the port of alert |

Quartz related configuration

This part describes the Quartz configs; configure them based on your practical situation and resources. Configuration file locations:

| Service | Configuration file |
|---------|--------------------|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value |
|------------|---------------|
| spring.quartz.properties.org.quartz.jobStore.isClustered | true |
| spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX |
| spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO |
| spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_ |
| spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock | true |
| spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler |
| spring.quartz.properties.org.quartz.jobStore.useProperties | false |
| spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000 |
| spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true |
| spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
| spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000 |

The above configuration items are the same for the Master Server and the Api Server, but their Quartz Scheduler threadpool configurations differ.

The default quartz threadpool configuration in Master Server is as follows:

| Parameters | Default value |
|------------|---------------|
| spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true |
| spring.quartz.properties.org.quartz.threadPool.threadCount | 25 |
| spring.quartz.properties.org.quartz.threadPool.threadPriority | 5 |
| spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |

Since the Api Server does not start a Quartz Scheduler instance and only acts as a client, its threadpool is configured as QuartzZeroSizeThreadPool, which has zero threads. The default configuration is as follows:

| Parameters | Default value |
|------------|---------------|
| spring.quartz.properties.org.quartz.threadPool.class | org.apache.dolphinscheduler.scheduler.quartz.QuartzZeroSizeThreadPool |

dolphinscheduler_env.sh [load environment variable configs]

When using shell to commit tasks, DolphinScheduler will export environment variables from bin/env/dolphinscheduler_env.sh. The main configuration includes JAVA_HOME and other environment paths.

# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME=${SPARK_HOME:-/opt/soft/spark}
export PYTHON_LAUNCHER=${PYTHON_LAUNCHER:-/opt/soft/python/bin/python3}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_LAUNCHER=${DATAX_LAUNCHER:-/opt/soft/datax/bin/datax.py}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PYTHON_LAUNCHER:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_LAUNCHER:$PATH

# applicationId auto collection related configuration, the following configurations are unnecessary if setting appId.collect=log
export HADOOP_CLASSPATH=`hadoop classpath`:${DOLPHINSCHEDULER_HOME}/tools/libs/*
export SPARK_DIST_CLASSPATH=$HADOOP_CLASSPATH:$SPARK_DIST_CLASS_PATH
export HADOOP_CLIENT_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$HADOOP_CLIENT_OPTS
export SPARK_SUBMIT_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$SPARK_SUBMIT_OPTS
export FLINK_ENV_JAVA_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$FLINK_ENV_JAVA_OPTS

Log related configuration

Configuration file locations:

| Service | Configuration file |
|---------|--------------------|
| Master Server | master-server/conf/logback-spring.xml |
| Api Server | api-server/conf/logback-spring.xml |
| Worker Server | worker-server/conf/logback-spring.xml |
| Alert Server | alert-server/conf/logback-spring.xml |