
[doc] Migrate dev doc from website repository (#9291)

* [doc] Migrate dev doc from website repository

* Correct release language

* Fix license issue

* Add all images from website

* Delete unused images

* Add CI

* Fix CI

* Fix CI

* Remove unneeded file

* Add latest commit from website

* Correct img_utils.py script

* Remove unused images
Branch: 3.0.0/version-upgrade
Author: Jiajie Zhong (committed by GitHub)
Commit: 9f84dbbda0
Changed files (changed lines per file):

39  .dlc.json
52  .github/workflows/docs.yml
1  .licenserc.yaml
11  deploy/README.md
642  docs/configs/docsdev.js
127  docs/configs/index.md.jsx
396  docs/configs/site.js
79  docs/docs/en/about/glossary.md
48  docs/docs/en/about/hardware.md
19  docs/docs/en/about/introduction.md
42  docs/docs/en/architecture/cache.md
424  docs/docs/en/architecture/configuration.md
282  docs/docs/en/architecture/design.md
59  docs/docs/en/architecture/load-balance.md
193  docs/docs/en/architecture/metadata.md
1114  docs/docs/en/architecture/task-structure.md
708  docs/docs/en/faq.md
14  docs/docs/en/guide/alert/alert_plugin_user_guide.md
27  docs/docs/en/guide/alert/dingtalk.md
64  docs/docs/en/guide/alert/enterprise-webexteams.md
14  docs/docs/en/guide/alert/enterprise-wechat.md
42  docs/docs/en/guide/alert/telegram.md
39  docs/docs/en/guide/datasource/hive.md
6  docs/docs/en/guide/datasource/introduction.md
14  docs/docs/en/guide/datasource/mysql.md
13  docs/docs/en/guide/datasource/postgresql.md
13  docs/docs/en/guide/datasource/spark.md
248  docs/docs/en/guide/expansion-reduction.md
123  docs/docs/en/guide/flink-call.md
5  docs/docs/en/guide/homepage.md
39  docs/docs/en/guide/installation/cluster.md
754  docs/docs/en/guide/installation/kubernetes.md
201  docs/docs/en/guide/installation/pseudo-cluster.md
74  docs/docs/en/guide/installation/skywalking-agent.md
42  docs/docs/en/guide/installation/standalone.md
32  docs/docs/en/guide/monitor.md
69  docs/docs/en/guide/open-api.md
48  docs/docs/en/guide/parameter/built-in.md
66  docs/docs/en/guide/parameter/context.md
19  docs/docs/en/guide/parameter/global.md
19  docs/docs/en/guide/parameter/local.md
40  docs/docs/en/guide/parameter/priority.md
18  docs/docs/en/guide/project/project-list.md
11  docs/docs/en/guide/project/task-instance.md
114  docs/docs/en/guide/project/workflow-definition.md
62  docs/docs/en/guide/project/workflow-instance.md
165  docs/docs/en/guide/resource.md
151  docs/docs/en/guide/security.md
1024  docs/docs/en/guide/start/docker.md
62  docs/docs/en/guide/start/quick-start.md
36  docs/docs/en/guide/task/conditions.md
63  docs/docs/en/guide/task/datax.md
27  docs/docs/en/guide/task/dependent.md
60  docs/docs/en/guide/task/emr.md
69  docs/docs/en/guide/task/flink.md
47  docs/docs/en/guide/task/http.md
73  docs/docs/en/guide/task/map-reduce.md
19  docs/docs/en/guide/task/pigeon.md
55  docs/docs/en/guide/task/python.md
43  docs/docs/en/guide/task/shell.md
68  docs/docs/en/guide/task/spark.md
43  docs/docs/en/guide/task/sql.md
13  docs/docs/en/guide/task/stored-procedure.md
46  docs/docs/en/guide/task/sub-process.md
39  docs/docs/en/guide/task/switch.md
84  docs/docs/en/guide/upgrade.md
73  docs/docs/en/history-versions.md
58  docs/docs/zh/about/glossary.md
47  docs/docs/zh/about/hardware.md
12  docs/docs/zh/about/introduction.md
42  docs/docs/zh/architecture/cache.md
406  docs/docs/zh/architecture/configuration.md
287  docs/docs/zh/architecture/design.md
58  docs/docs/zh/architecture/load-balance.md
185  docs/docs/zh/architecture/metadata.md
1134  docs/docs/zh/architecture/task-structure.md
689  docs/docs/zh/faq.md
12  docs/docs/zh/guide/alert/alert_plugin_user_guide.md
26  docs/docs/zh/guide/alert/dingtalk.md
64  docs/docs/zh/guide/alert/enterprise-webexteams.md
13  docs/docs/zh/guide/alert/enterprise-wechat.md
41  docs/docs/zh/guide/alert/telegram.md
40  docs/docs/zh/guide/datasource/hive.md
6  docs/docs/zh/guide/datasource/introduction.md
13  docs/docs/zh/guide/datasource/mysql.md
13  docs/docs/zh/guide/datasource/postgresql.md
19  docs/docs/zh/guide/datasource/spark.md
245  docs/docs/zh/guide/expansion-reduction.md
150  docs/docs/zh/guide/flink-call.md
5  docs/docs/zh/guide/homepage.md
35  docs/docs/zh/guide/installation/cluster.md
755  docs/docs/zh/guide/installation/kubernetes.md
200  docs/docs/zh/guide/installation/pseudo-cluster.md
74  docs/docs/zh/guide/installation/skywalking-agent.md
42  docs/docs/zh/guide/installation/standalone.md
32  docs/docs/zh/guide/monitor.md
65  docs/docs/zh/guide/open-api.md
49  docs/docs/zh/guide/parameter/built-in.md
69  docs/docs/zh/guide/parameter/context.md
19  docs/docs/zh/guide/parameter/global.md

(Some files were not shown because too many files have changed in this diff.)

39
.dlc.json

@@ -0,0 +1,39 @@
{
  "ignorePatterns": [
    {
      "pattern": "^http://localhost"
    },
    {
      "pattern": "^https://hive.apache.org"
    },
    {
      "pattern": "^http://192"
    },
    {
      "pattern": "^https://img.shields.io/badge"
    }
  ],
  "replacementPatterns": [
    {
      "pattern": "^/en-us/download/download.html$",
      "replacement": "https://dolphinscheduler.apache.org/en-us/download/download.html"
    },
    {
      "pattern": "^/zh-cn/download/download.html$",
      "replacement": "https://dolphinscheduler.apache.org/zh-cn/download/download.html"
    },
    {
      "pattern": "^/img",
      "replacement": "{{BASEURL}}/docs/img"
    }
  ],
  "timeout": "10s",
  "retryOn429": true,
  "retryCount": 10,
  "fallbackRetryDelay": "1000s",
  "aliveStatusCodes": [
    200,
    401,
    0
  ]
}

52
.github/workflows/docs.yml

@@ -0,0 +1,52 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Docs

on:
  pull_request:

concurrency:
  group: doc-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  img-check:
    name: Image Check
    timeout-minutes: 15
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9
      - name: Run Image Check
        run: python img_utils.py -v check

  dead-link:
    name: Dead Link
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v2
      - run: sudo npm install -g markdown-link-check@3.10.0
      - run: |
          for file in $(find . -name "*.md"); do
            markdown-link-check -c .dlc.json -q "$file"
          done
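Both checks can be reproduced locally before opening a pull request. A minimal sketch, assuming a repository checkout with Python 3.9 and Node.js/npm available:

```bash
# Image check: the img-check job runs img_utils.py from the docs directory.
(cd docs && python img_utils.py -v check)

# Dead-link check: mirrors the dead-link job from the repository root,
# where .dlc.json lives (markdown-link-check is pinned to 3.10.0 in CI).
sudo npm install -g markdown-link-check@3.10.0
for file in $(find . -name "*.md"); do
  markdown-link-check -c .dlc.json -q "$file"
done
```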

1
.licenserc.yaml

@@ -33,6 +33,7 @@ header:
- .gitattributes
- '**/licenses/**/LICENSE-*'
- '**/*.md'
- '**/*.svg'
- '**/*.json'
- '**/*.iml'
- '**/*.ini'

11
deploy/README.md

@@ -1,11 +1,4 @@
# DolphinScheduler for Docker and Kubernetes
### QuickStart in Docker
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/docker-deployment.html)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/docker-deployment.html)
### QuickStart in Kubernetes
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/kubernetes-deployment.html)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/kubernetes-deployment.html)
* [Start Up DolphinScheduler with Docker](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/docker.html)
* [Start Up DolphinScheduler with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/kubernetes.html)

642
docs/configs/docsdev.js

@@ -0,0 +1,642 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*/
export default {
'en-us': {
sidemenu: [
{
title: 'About',
children: [
{
title: 'Introduction',
link: '/en-us/docs/dev/user_doc/about/introduction.html',
},
{
title: 'Hardware Environment',
link: '/en-us/docs/dev/user_doc/about/hardware.html',
},
{
title: 'Glossary',
link: '/en-us/docs/dev/user_doc/about/glossary.html',
},
],
},
{
title: 'Quick Start',
children: [
{
title: 'Quick Start',
link: '/en-us/docs/dev/user_doc/guide/start/quick-start.html',
},
{
title: 'Docker Deployment',
link: '/en-us/docs/dev/user_doc/guide/start/docker.html',
},
],
},
{
title: 'Installation',
children: [
{
title: 'Standalone Deployment',
link: '/en-us/docs/dev/user_doc/guide/installation/standalone.html',
},
{
title: 'Pseudo Cluster Deployment',
link: '/en-us/docs/dev/user_doc/guide/installation/pseudo-cluster.html',
},
{
title: 'Cluster Deployment',
link: '/en-us/docs/dev/user_doc/guide/installation/cluster.html',
},
{
title: 'Kubernetes Deployment',
link: '/en-us/docs/dev/user_doc/guide/installation/kubernetes.html',
},
],
},
{
title: 'Introduction to Functions',
children: [
{
title: 'Workflow Overview',
link: '/en-us/docs/dev/user_doc/guide/homepage.html',
},
{
title: 'Project',
children: [
{
title: 'Project List',
link: '/en-us/docs/dev/user_doc/guide/project/project-list.html',
},
{
title: 'Workflow Definition',
link: '/en-us/docs/dev/user_doc/guide/project/workflow-definition.html',
},
{
title: 'Workflow Instance',
link: '/en-us/docs/dev/user_doc/guide/project/workflow-instance.html',
},
{
title: 'Task Instance',
link: '/en-us/docs/dev/user_doc/guide/project/task-instance.html',
},
]
},
{
title: 'Task',
children: [
{
title: 'Shell',
link: '/en-us/docs/dev/user_doc/guide/task/shell.html',
},
{
title: 'SubProcess',
link: '/en-us/docs/dev/user_doc/guide/task/sub-process.html',
},
{
title: 'Dependent',
link: '/en-us/docs/dev/user_doc/guide/task/dependent.html',
},
{
title: 'Stored Procedure',
link: '/en-us/docs/dev/user_doc/guide/task/stored-procedure.html',
},
{
title: 'SQL',
link: '/en-us/docs/dev/user_doc/guide/task/sql.html',
},
{
title: 'Spark',
link: '/en-us/docs/dev/user_doc/guide/task/spark.html',
},
{
title: 'MapReduce',
link: '/en-us/docs/dev/user_doc/guide/task/map-reduce.html',
},
{
title: 'Python',
link: '/en-us/docs/dev/user_doc/guide/task/python.html',
},
{
title: 'Flink',
link: '/en-us/docs/dev/user_doc/guide/task/flink.html',
},
{
title: 'HTTP',
link: '/en-us/docs/dev/user_doc/guide/task/http.html',
},
{
title: 'DataX',
link: '/en-us/docs/dev/user_doc/guide/task/datax.html',
},
{
title: 'Pigeon',
link: '/en-us/docs/dev/user_doc/guide/task/pigeon.html',
},
{
title: 'Conditions',
link: '/en-us/docs/dev/user_doc/guide/task/conditions.html',
},
{
title: 'Switch',
link: '/en-us/docs/dev/user_doc/guide/task/switch.html',
},
{
title: 'Amazon EMR',
link: '/en-us/docs/dev/user_doc/guide/task/emr.html',
},
],
},
{
title: 'Parameter',
children: [
{
title: 'Built-in Parameter',
link: '/en-us/docs/dev/user_doc/guide/parameter/built-in.html',
},
{
title: 'Global Parameter',
link: '/en-us/docs/dev/user_doc/guide/parameter/global.html',
},
{
title: 'Local Parameter',
link: '/en-us/docs/dev/user_doc/guide/parameter/local.html',
},
{
title: 'Parameter Context',
link: '/en-us/docs/dev/user_doc/guide/parameter/context.html',
},
{
title: 'Parameter Priority',
link: '/en-us/docs/dev/user_doc/guide/parameter/priority.html',
},
],
},
{
title: 'Data Source',
children: [
{
title: 'Introduction',
link: '/en-us/docs/dev/user_doc/guide/datasource/introduction.html',
},
{
title: 'MySQL',
link: '/en-us/docs/dev/user_doc/guide/datasource/mysql.html',
},
{
title: 'PostgreSQL',
link: '/en-us/docs/dev/user_doc/guide/datasource/postgresql.html',
},
{
title: 'HIVE',
link: '/en-us/docs/dev/user_doc/guide/datasource/hive.html',
},
{
title: 'Spark',
link: '/en-us/docs/dev/user_doc/guide/datasource/spark.html',
},
],
},
{
title: 'Alert',
children: [
{
title: 'Alert Component User Guide ',
link: '/en-us/docs/dev/user_doc/guide/alert/alert_plugin_user_guide.html',
},
{
title: 'Telegram',
link: '/en-us/docs/dev/user_doc/guide/alert/telegram.html',
},
{
title: 'Ding Talk',
link: '/en-us/docs/dev/user_doc/guide/alert/dingtalk.html',
},
{
title: 'Enterprise Wechat',
link: '/en-us/docs/dev/user_doc/guide/alert/enterprise-wechat.html',
},
{
title: 'Enterprise Webexteams',
link: '/en-us/docs/dev/user_doc/guide/alert/enterprise-webexteams.html',
},
],
},
{
title: 'Resource',
link: '/en-us/docs/dev/user_doc/guide/resource.html',
},
{
title: 'Monitor',
link: '/en-us/docs/dev/user_doc/guide/monitor.html',
},
{
title: 'Security',
link: '/en-us/docs/dev/user_doc/guide/security.html',
},
{
title: 'Open API',
link: '/en-us/docs/dev/user_doc/guide/open-api.html',
},
{
title: 'Flink',
link: '/en-us/docs/dev/user_doc/guide/flink-call.html',
},
{
title: 'Upgrade',
link: '/en-us/docs/dev/user_doc/guide/upgrade.html',
},
{
title: 'Expansion and Reduction',
link: '/en-us/docs/dev/user_doc/guide/expansion-reduction.html',
},
],
},
{
title: 'Architecture Guide',
children: [
{
title: 'Architecture Design',
link: '/en-us/docs/dev/user_doc/architecture/design.html',
},
{
title: 'Metadata',
link: '/en-us/docs/dev/user_doc/architecture/metadata.html',
},
{
title: 'Configuration File',
link: '/en-us/docs/dev/user_doc/architecture/configuration.html',
},
{
title: 'Task Structure',
link: '/en-us/docs/dev/user_doc/architecture/task-structure.html',
},
{
title: 'Load Balance',
link: '/en-us/docs/dev/user_doc/architecture/load-balance.html',
},
{
title: 'Cache',
link: '/en-us/docs/dev/user_doc/architecture/cache.html',
},
],
},
{
title: 'Observability',
children: [
{
title: 'SkyWalking-Agent',
link: '/en-us/docs/dev/user_doc/guide/installation/skywalking-agent.html',
},
],
},
{
title: 'FAQ',
children: [
{
title: 'FAQ',
link: '/en-us/docs/release/faq.html',
},
],
},
{
title: 'Older Versions',
children: [
{
title: 'Older Versions',
link: '/en-us/docs/release/history-versions.html',
},
],
},
],
barText: 'Documentation',
},
'zh-cn': {
sidemenu: [
{
title: '关于Apache DolphinScheduler',
children: [
{
title: '简介',
link: '/zh-cn/docs/dev/user_doc/about/introduction.html',
},
{
title: '建议配置',
link: '/zh-cn/docs/dev/user_doc/about/hardware.html',
},
{
title: '名词解释',
link: '/zh-cn/docs/dev/user_doc/about/glossary.html',
},
],
},
{
title: '快速上手',
children: [
{
title: '快速上手',
link: '/zh-cn/docs/dev/user_doc/guide/start/quick-start.html',
},
{
title: 'Docker部署(Docker)',
link: '/zh-cn/docs/dev/user_doc/guide/start/docker.html',
},
],
},
{
title: '部署指南',
children: [
{
title: '单机部署(Standalone)',
link: '/zh-cn/docs/dev/user_doc/guide/installation/standalone.html',
},
{
title: '伪集群部署(Pseudo-Cluster)',
link: '/zh-cn/docs/dev/user_doc/guide/installation/pseudo-cluster.html',
},
{
title: '集群部署(Cluster)',
link: '/zh-cn/docs/dev/user_doc/guide/installation/cluster.html',
},
{
title: 'Kubernetes部署(Kubernetes)',
link: '/zh-cn/docs/dev/user_doc/guide/installation/kubernetes.html',
},
],
},
{
title: '功能介绍',
children: [
{
title: '指标总览',
link: '/zh-cn/docs/dev/user_doc/guide/homepage.html',
},
{
title: '项目管理',
children: [
{
title: '项目列表',
link: '/zh-cn/docs/dev/user_doc/guide/project/project-list.html',
},
{
title: '工作流定义',
link: '/zh-cn/docs/dev/user_doc/guide/project/workflow-definition.html',
},
{
title: '工作流实例',
link: '/zh-cn/docs/dev/user_doc/guide/project/workflow-instance.html',
},
{
title: '任务实例',
link: '/zh-cn/docs/dev/user_doc/guide/project/task-instance.html',
},
]
},
{
title: '任务类型',
children: [
{
title: 'Shell',
link: '/zh-cn/docs/dev/user_doc/guide/task/shell.html',
},
{
title: 'SubProcess',
link: '/zh-cn/docs/dev/user_doc/guide/task/sub-process.html',
},
{
title: 'Dependent',
link: '/zh-cn/docs/dev/user_doc/guide/task/dependent.html',
},
{
title: 'Stored Procedure',
link: '/zh-cn/docs/dev/user_doc/guide/task/stored-procedure.html',
},
{
title: 'SQL',
link: '/zh-cn/docs/dev/user_doc/guide/task/sql.html',
},
{
title: 'Spark',
link: '/zh-cn/docs/dev/user_doc/guide/task/spark.html',
},
{
title: 'MapReduce',
link: '/zh-cn/docs/dev/user_doc/guide/task/map-reduce.html',
},
{
title: 'Python',
link: '/zh-cn/docs/dev/user_doc/guide/task/python.html',
},
{
title: 'Flink',
link: '/zh-cn/docs/dev/user_doc/guide/task/flink.html',
},
{
title: 'HTTP',
link: '/zh-cn/docs/dev/user_doc/guide/task/http.html',
},
{
title: 'DataX',
link: '/zh-cn/docs/dev/user_doc/guide/task/datax.html',
},
{
title: 'Pigeon',
link: '/zh-cn/docs/dev/user_doc/guide/task/pigeon.html',
},
{
title: 'Conditions',
link: '/zh-cn/docs/dev/user_doc/guide/task/conditions.html',
},
{
title: 'Switch',
link: '/zh-cn/docs/dev/user_doc/guide/task/switch.html',
},
{
title: 'Amazon EMR',
link: '/zh-cn/docs/dev/user_doc/guide/task/emr.html',
},
],
},
{
title: '参数',
children: [
{
title: '内置参数',
link: '/zh-cn/docs/dev/user_doc/guide/parameter/built-in.html',
},
{
title: '全局参数',
link: '/zh-cn/docs/dev/user_doc/guide/parameter/global.html',
},
{
title: '本地参数',
link: '/zh-cn/docs/dev/user_doc/guide/parameter/local.html',
},
{
title: '参数传递',
link: '/zh-cn/docs/dev/user_doc/guide/parameter/context.html',
},
{
title: '参数优先级',
link: '/zh-cn/docs/dev/user_doc/guide/parameter/priority.html',
},
],
},
{
title: '数据源中心',
children: [
{
title: '简介',
link: '/zh-cn/docs/dev/user_doc/guide/datasource/introduction.html',
},
{
title: 'MySQL',
link: '/zh-cn/docs/dev/user_doc/guide/datasource/mysql.html',
},
{
title: 'PostgreSQL',
link: '/zh-cn/docs/dev/user_doc/guide/datasource/postgresql.html',
},
{
title: 'HIVE',
link: '/zh-cn/docs/dev/user_doc/guide/datasource/hive.html',
},
{
title: 'Spark',
link: '/zh-cn/docs/dev/user_doc/guide/datasource/spark.html',
},
],
},
{
title: '告警',
children: [
{
title: '告警组件向导',
link: '/zh-cn/docs/dev/user_doc/guide/alert/alert_plugin_user_guide.html',
},
{
title: 'Telegram',
link: '/zh-cn/docs/dev/user_doc/guide/alert/telegram.html',
},
{
title: '钉钉告警',
link: '/zh-cn/docs/dev/user_doc/guide/alert/dingtalk.html',
},
{
title: '企业微信',
link: '/zh-cn/docs/dev/user_doc/guide/alert/enterprise-wechat.html',
},
{
title: 'Webexteams',
link: '/zh-cn/docs/dev/user_doc/guide/alert/enterprise-webexteams.html',
},
],
},
{
title: '资源中心',
link: '/zh-cn/docs/dev/user_doc/guide/resource.html',
},
{
title: '监控中心',
link: '/zh-cn/docs/dev/user_doc/guide/monitor.html',
},
{
title: '安全中心',
link: '/zh-cn/docs/dev/user_doc/guide/security.html',
},
{
title: 'API调用',
link: '/zh-cn/docs/dev/user_doc/guide/open-api.html',
},
{
title: 'Flink调用',
link: '/zh-cn/docs/dev/user_doc/guide/flink-call.html',
},
{
title: '升级',
link: '/zh-cn/docs/dev/user_doc/guide/upgrade.html',
},
{
title: '扩/缩容',
link: '/zh-cn/docs/dev/user_doc/guide/expansion-reduction.html',
},
],
},
{
title: '架构设计',
children: [
{
title: '元数据文档',
link: '/zh-cn/docs/dev/user_doc/architecture/metadata.html',
},
{
title: '架构设计',
link: '/zh-cn/docs/dev/user_doc/architecture/design.html',
},
{
title: '配置文件',
link: '/zh-cn/docs/dev/user_doc/architecture/configuration.html',
},
{
title: '任务结构',
link: '/zh-cn/docs/dev/user_doc/architecture/task-structure.html',
},
{
title: '负载均衡',
link: '/zh-cn/docs/dev/user_doc/architecture/load-balance.html',
},
{
title: '缓存',
link: '/zh-cn/docs/dev/user_doc/architecture/cache.html',
},
],
},
{
title: '可观测性',
children: [
{
title: 'SkyWalking-Agent',
link: '/zh-cn/docs/dev/user_doc/guide/installation/skywalking-agent.html',
},
],
},
{
title: 'FAQ',
children: [
{
title: 'FAQ',
link: '/zh-cn/docs/release/faq.html',
},
],
},
{
title: '历史版本',
children: [
{
title: '历史版本',
link: '/zh-cn/docs/release/history-versions.html',
},
],
},
],
barText: '文档',
},
};

127
docs/configs/index.md.jsx

@@ -0,0 +1,127 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*/
import React from 'react';
import ReactDOM from 'react-dom';
import cookie from 'js-cookie';
import Language from '../../components/language';
import Header from '../../components/header';
import Footer from '../../components/footer';
import Md2Html from '../../components/md2html';
import Sidemenu from '../../components/sidemenu';
import siteConfig from '../../../site_config/site';
import docs120Config from '../../../site_config/docs1-2-0';
import docs121Config from '../../../site_config/docs1-2-1';
import docs131Config from '../../../site_config/docs1-3-1';
import docs132Config from '../../../site_config/docs1-3-2';
import docs133Config from '../../../site_config/docs1-3-3';
import docs134Config from '../../../site_config/docs1-3-4';
import docs135Config from '../../../site_config/docs1-3-5';
import docs136Config from '../../../site_config/docs1-3-6';
import docs138Config from '../../../site_config/docs1-3-8';
import docs139Config from '../../../site_config/docs1-3-9';
import docs200Config from '../../../site_config/docs2-0-0';
import docs201Config from '../../../site_config/docs2-0-1';
import docs202Config from '../../../site_config/docs2-0-2';
import docs203Config from '../../../site_config/docs2-0-3';
import docs205Config from '../../../site_config/docs2-0-5';
import docsDevConfig from '../../../site_config/docsdev';
const docsSource = {
'1.2.0': docs120Config,
'1.2.1': docs121Config,
'1.3.1': docs131Config,
'1.3.2': docs132Config,
'1.3.3': docs133Config,
'1.3.4': docs134Config,
'1.3.5': docs135Config,
'1.3.6': docs136Config,
'1.3.8': docs138Config,
'1.3.9': docs139Config,
'2.0.0': docs200Config,
'2.0.1': docs201Config,
'2.0.2': docs202Config,
'2.0.3': docs203Config,
'2.0.5': docs205Config,
dev: docsDevConfig,
};
const isValidVersion = version => version && docsSource.hasOwnProperty(version);
class Docs extends Md2Html(Language) {
render() {
const language = this.getLanguage();
let dataSource = {};
// from location path
let version = window.location.pathname.split('/')[3];
if (isValidVersion(version) || version === 'latest') {
cookie.set('docs_version', version);
}
// from rendering html
if (!version && this.props.subdir) {
version = this.props.subdir.split('/')[0];
}
if (isValidVersion(version)) {
dataSource = docsSource[version][language];
} else if (isValidVersion(cookie.get('docs_version'))) {
dataSource = docsSource[cookie.get('docs_version')][language];
} else if (isValidVersion(siteConfig.docsLatest)) {
dataSource = docsSource[siteConfig.docsLatest][language];
dataSource.sidemenu.forEach((menu) => {
menu.children.forEach((submenu) => {
if (!submenu.children) {
submenu.link = submenu.link.replace(`docs/${siteConfig.docsLatest}`, 'docs/latest');
} else {
submenu.children.forEach((menuLevel3) => {
menuLevel3.link = menuLevel3.link.replace(`docs/${siteConfig.docsLatest}`, 'docs/latest');
});
}
});
});
} else {
return null;
}
const __html = this.props.__html || this.state.__html;
return (
<div className="md2html docs-page">
<Header
currentKey="docs"
type="dark"
logo="/img/hlogo_white.svg"
language={language}
onLanguageChange={this.onLanguageChange}
/>
<section className="content-section">
<Sidemenu dataSource={dataSource.sidemenu} />
<div
className="doc-content markdown-body"
ref={(node) => { this.markdownContainer = node; }}
dangerouslySetInnerHTML={{ __html }}
/>
</section>
<Footer logo="/img/ds_gray.svg" language={language} />
</div>
);
}
}
document.getElementById('root') && ReactDOM.render(<Docs />, document.getElementById('root'));
export default Docs;

396
docs/configs/site.js

@@ -0,0 +1,396 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*/
// Some global configurations
export default {
rootPath: '',
port: 8080,
domain: 'dolphinscheduler.apache.org',
copyToDist: ['asset', 'img', 'file', '.asf.yaml', 'sitemap.xml', '.nojekyll', '.htaccess', 'googled0df7b96f277a143.html'],
docsLatest: '2.0.5',
defaultSearch: 'google', // default search engine
defaultLanguage: 'en-us',
'en-us': {
pageMenu: [
{
key: 'home',
text: 'HOME',
link: '/en-us/index.html',
},
{
key: 'docs',
text: 'DOCS',
link: '/en-us/docs/latest/user_doc/guide/quick-start.html',
children: [
{
key: 'docs0',
text: 'latest(2.0.5)',
link: '/en-us/docs/latest/user_doc/guide/quick-start.html',
},
{
key: 'docs1',
text: '1.3.9',
link: '/en-us/docs/1.3.9/user_doc/quick-start.html',
},
{
key: 'docsHistory',
text: 'Older Versions',
link: '/en-us/docs/release/history-versions.html',
},
{
key: 'pythonAPI',
text: 'PyDolphinScheduler',
link: '/python/index.html',
},
{
key: 'docsdev',
text: 'dev',
link: '/en-us/docs/dev/user_doc/about/introduction.html',
},
],
},
{
key: 'download',
text: 'DOWNLOAD',
link: '/en-us/download/download.html',
},
{ key: 'blog',
text: 'BLOG',
link: '/en-us/blog/index.html',
},
{
key: 'development',
text: 'DEVELOPMENT',
link: '/en-us/development/development-environment-setup.html',
},
{
key: 'community',
text: 'COMMUNITY',
link: '/en-us/community/team.html',
},
{
key: 'ASF',
text: 'ASF',
target: '_blank',
link: 'https://www.apache.org/',
children: [
{
key: 'Foundation',
text: 'Foundation',
target: '_blank',
link: 'https://www.apache.org/',
},
{
key: 'License',
text: 'License',
target: '_blank',
link: 'https://www.apache.org/licenses/',
},
{
key: 'Events',
text: 'Events',
target: '_blank',
link: 'https://www.apache.org/events/current-event',
},
{
key: 'Security',
text: 'Security',
target: '_blank',
link: 'https://www.apache.org/security/',
},
{
key: 'Sponsorship',
text: 'Sponsorship',
target: '_blank',
link: 'https://www.apache.org/foundation/sponsorship.html',
},
{
key: 'Thanks',
text: 'Thanks',
target: '_blank',
link: 'https://www.apache.org/foundation/thanks.html',
},
],
},
{
key: 'user',
text: 'USER',
link: '/en-us/user/index.html',
},
],
documentation: {
title: 'Documentation',
list: [
{
text: 'Overview',
link: '/en-us/development/architecture-design.html',
},
{
text: 'Quick start',
link: '/en-us/docs/latest/user_doc/guide/quick-start.html',
},
{
text: 'Developer guide',
link: '/en-us/development/development-environment-setup.html',
},
],
},
asf: {
title: 'ASF',
list: [
{
text: 'Foundation',
link: 'http://www.apache.org',
},
{
text: 'License',
link: 'http://www.apache.org/licenses/',
},
{
text: 'Events',
link: 'http://www.apache.org/events/current-event',
},
{
text: 'Sponsorship',
link: 'http://www.apache.org/foundation/sponsorship.html',
},
{
text: 'Thanks',
link: 'http://www.apache.org/foundation/thanks.html',
},
],
},
contact: {
title: 'About us',
content: 'Do you need feedback? Please contact us through the following ways.',
list: [
{
name: 'Email List',
img1: '/img/emailgray.png',
img2: '/img/emailblue.png',
link: '/en-us/community/development/subscribe.html',
},
{
name: 'Twitter',
img1: '/img/twittergray.png',
img2: '/img/twitterblue.png',
link: 'https://twitter.com/dolphinschedule',
},
{
name: 'Stack Overflow',
img1: '/img/stackoverflow.png',
img2: '/img/stackoverflow-selected.png',
link: 'https://stackoverflow.com/questions/tagged/apache-dolphinscheduler',
},
{
name: 'Slack',
img1: '/img/slack.png',
img2: '/img/slack-selected.png',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw',
},
],
},
copyright: 'Copyright © 2019-2021 The Apache Software Foundation. Apache DolphinScheduler, DolphinScheduler, and its feather logo are trademarks of The Apache Software Foundation.',
},
'zh-cn': {
pageMenu: [
{
key: 'home',
text: '首页',
link: '/zh-cn/index.html',
},
{
key: 'docs',
text: '文档',
link: '/zh-cn/docs/latest/user_doc/guide/quick-start.html',
children: [
{
key: 'docs0',
text: '最新版本latest(2.0.5)',
link: '/zh-cn/docs/latest/user_doc/guide/quick-start.html',
},
{
key: 'docs1',
text: '1.3.9',
link: '/zh-cn/docs/1.3.9/user_doc/quick-start.html',
},
{
key: 'docsHistory',
text: '历史版本',
link: '/zh-cn/docs/release/history-versions.html',
},
{
key: 'pythonAPI',
text: 'PyDolphinScheduler',
link: '/python/index.html',
},
{
key: 'docsdev',
text: 'dev',
link: '/zh-cn/docs/dev/user_doc/about/introduction.html',
},
],
},
{
key: 'download',
text: '下载',
link: '/zh-cn/download/download.html',
},
{
key: 'blog',
text: '博客',
link: '/zh-cn/blog/index.html',
},
{
key: 'development',
text: '开发者',
link: '/zh-cn/development/development-environment-setup.html',
},
{
key: 'community',
text: '社区',
link: '/zh-cn/community/team.html',
},
{
key: 'ASF',
text: 'ASF',
target: '_blank',
link: 'https://www.apache.org/',
children: [
{
key: 'Foundation',
text: 'Foundation',
target: '_blank',
link: 'https://www.apache.org/',
},
{
key: 'License',
text: 'License',
target: '_blank',
link: 'https://www.apache.org/licenses/',
},
{
key: 'Events',
text: 'Events',
target: '_blank',
link: 'https://www.apache.org/events/current-event',
},
{
key: 'Security',
text: 'Security',
target: '_blank',
link: 'https://www.apache.org/security/',
},
{
key: 'Sponsorship',
text: 'Sponsorship',
target: '_blank',
link: 'https://www.apache.org/foundation/sponsorship.html',
},
{
key: 'Thanks',
text: 'Thanks',
target: '_blank',
link: 'https://www.apache.org/foundation/thanks.html',
},
],
},
{
key: 'user',
text: '用户',
// link: '',
link: '/zh-cn/user/index.html',
},
],
documentation: {
title: '文档',
list: [
{
text: '概览',
link: '/zh-cn/development/architecture-design.html',
},
{
text: '快速开始',
link: '/zh-cn/docs/latest/user_doc/guide/quick-start.html',
},
{
text: '开发者指南',
link: '/zh-cn/development/development-environment-setup.html',
},
],
},
asf: {
title: 'ASF',
list: [
{
text: '基金会',
link: 'http://www.apache.org',
},
{
text: '证书',
link: 'http://www.apache.org/licenses/',
},
{
text: '事件',
link: 'http://www.apache.org/events/current-event',
},
{
text: '赞助',
link: 'http://www.apache.org/foundation/sponsorship.html',
},
{
text: '致谢',
link: 'http://www.apache.org/foundation/thanks.html',
},
],
},
contact: {
title: '联系我们',
content: '有问题需要反馈请通过以下方式联系我们',
list: [
{
name: '邮件列表',
img1: '/img/emailgray.png',
img2: '/img/emailblue.png',
link: '/zh-cn/community/development/subscribe.html',
},
{
name: 'Twitter',
img1: '/img/twittergray.png',
img2: '/img/twitterblue.png',
link: 'https://twitter.com/dolphinschedule',
},
{
name: 'Stack Overflow',
img1: '/img/stackoverflow.png',
img2: '/img/stackoverflow-selected.png',
link: 'https://stackoverflow.com/questions/tagged/apache-dolphinscheduler',
},
{
name: 'Slack',
img1: '/img/slack.png',
img2: '/img/slack-selected.png',
link: 'https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw',
},
],
},
copyright: 'Copyright © 2019-2021 The Apache Software Foundation. Apache DolphinScheduler, DolphinScheduler, and its feather logo are trademarks of The Apache Software Foundation.',
},
};

79
docs/docs/en/about/glossary.md

@@ -0,0 +1,79 @@
## System Architecture Design
Before explaining the architecture of the scheduling system, let's first go over the terms commonly used in scheduling systems.
### 1. Glossary
**DAG:** Directed Acyclic Graph, DAG for short. Tasks in a workflow are assembled as a directed acyclic graph, and topological traversal starts from the nodes with zero in-degree and continues until there are no successor nodes. Examples are as follows:
<p align="center">
<img src="/img/dag_examples_cn.jpg" alt="dag example" width="60%" />
<p align="center">
<em>dag example</em>
</p>
</p>
**Process definition**: A visual **DAG** formed by dragging task nodes onto the canvas and establishing the associations between them.

**Process instance**: An instantiation of a process definition, generated either by a manual start or by scheduled scheduling. Each run of a process definition produces one process instance.

**Task instance**: An instantiation of a task node in a process definition; it identifies the execution status of a specific task.

**Task type**: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT (dependency), with dynamic plug-in extension planned. Note: a **SUB_PROCESS** is itself a separate process definition that can be started and executed on its own.

**Scheduling method**: The system supports scheduled scheduling based on cron expressions as well as manual scheduling. Supported command types are: start workflow, start execution from current node, resume fault-tolerant workflow, resume paused process, start execution from failed node, complement, timing, rerun, pause, stop, and resume waiting thread. Among them, **resume fault-tolerant workflow** and **resume waiting thread** are used internally by the scheduler and cannot be called from the outside.

**Scheduled**: The system adopts the **quartz** distributed scheduler and supports visual generation of cron expressions.

**Dependency**: Besides the simple predecessor/successor dependencies in the **DAG**, the system also provides **task dependent** nodes, supporting dependencies **between processes**.

**Priority**: Supports priorities for process instances and task instances; if no priority is set, the default is first-in-first-out.

**Email alert**: Supports sending **SQL task** query results by email, as well as email alerts for process instance results and fault-tolerance alert notifications.

**Failure strategy**: For tasks running in parallel, two failure strategies are provided when a task fails. **Continue** means the tasks running in parallel keep running regardless of the failed task, until the end of the process. **End** means that once a failed task is found, the parallel running tasks are killed and the process fails and ends.

**Complement**: Backfills historical data. Two complement modes are supported: **interval parallel and serial**.
### 2. Module Introduction
- dolphinscheduler-alert: the alert module, provides the AlertServer service.
- dolphinscheduler-api: the web application module, provides the ApiServer service.
- dolphinscheduler-common: common constants, enumerations, utility classes, data structures, and base classes.
- dolphinscheduler-dao: provides operations such as database access.
- dolphinscheduler-remote: netty-based client and server.
- dolphinscheduler-server: the MasterServer and WorkerServer services.
- dolphinscheduler-service: the service module, including Quartz, ZooKeeper, and log client access services; easy to call from the server module and the api module.
- dolphinscheduler-ui: the front-end module.
### Summary
From the perspective of scheduling, this article gives a preliminary introduction to the architecture principles and implementation ideas of the big-data distributed workflow scheduling system DolphinScheduler. To be continued.

48
docs/docs/en/about/hardware.md

@@ -0,0 +1,48 @@
# Hardware Environment
DolphinScheduler, as an open-source distributed workflow task scheduling system, can be deployed and run smoothly in Intel-architecture server environments and mainstream virtualization environments, and supports mainstream Linux operating systems.
## Linux Operating System Version Requirements
| OS | Version |
| :----------------------- | :----------: |
| Red Hat Enterprise Linux | 7.0 and above |
| CentOS | 7.0 and above |
| Oracle Enterprise Linux | 7.0 and above |
| Ubuntu LTS | 16.04 and above |
> **Attention:**
> The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
## Recommended Server Configuration
DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architecture. The following shows the recommended server requirements in a production environment:
### Production Environment
| **CPU** | **MEM** | **HD** | **NIC** | **Num** |
| --- | --- | --- | --- | --- |
| 4 core+ | 8 GB+ | SAS | GbE | 1+ |
> **Attention:**
> - The above recommended configuration is the minimum configuration for deploying DolphinScheduler. Higher configuration is strongly recommended for production environments.
> - The recommended hard disk size is more than 50GB, with the system disk and data disk kept separate.
## Network Requirements
DolphinScheduler provides the following network port configurations for normal operation:
| Server | Port | Desc |
| --- | --- | --- |
| MasterServer | 5678 | not a communication port; only requires that the local port does not conflict |
| WorkerServer | 1234 | not a communication port; only requires that the local port does not conflict |
| ApiApplicationServer | 12345 | backend communication port |
> **Attention:**
> - MasterServer and WorkerServer do not need these ports opened for communication between machines; it is only required that the local ports do not conflict.
> - Administrators can adjust relevant ports on the network side and host-side according to the deployment plan of DolphinScheduler components in the actual environment.
## Browser Requirements
DolphinScheduler recommends Chrome or the latest Chromium-kernel browsers for accessing the front-end UI.

19
docs/docs/en/about/introduction.md

@@ -0,0 +1,19 @@
# About DolphinScheduler
Apache DolphinScheduler is a distributed, extensible, open-source visual DAG workflow task scheduling system. It solves the intricate dependencies of data R&D ETL and the inability to monitor the health status of tasks. DolphinScheduler assembles tasks as a streaming DAG, can monitor the execution status of tasks in time, and supports operations such as retry, recovering failure from specified nodes, pause, resume, and kill.
## Simple to Use
- DolphinScheduler provides DAG monitoring user interfaces, and users can customize DAGs by dragging and dropping. All process definitions are visualized, rich third-party system APIs are supported, and one-click deployment is available.
## High Reliability
- Decentralized multi-master and multi-worker architecture with HA support; queues can be selected to avoid overload.
## Rich Scenarios
- Supports multi-tenancy and suspend/resume operations to cope with big data scenarios, and supports many task types such as Spark, Flink, Hive, MR, Shell, Python, and sub_process.
## High Scalability
- Supports customized task types, distributed scheduling, and the overall scheduling capability increases linearly with the scale of the cluster.

42
docs/docs/en/architecture/cache.md

@@ -0,0 +1,42 @@
# Cache
## Purpose
The master-server performs a large number of database reads during scheduling, such as reading tables like `tenant`, `user`, and `processDefinition`. These operations put read pressure on the DB and slow down the entire core scheduling process.
Considering that this business data is a high-read, low-write scenario, a cache module is introduced to reduce the DB read pressure and speed up the core scheduling process.
## Cache Settings
```yaml
spring:
cache:
# default disable cache, you can enable by `type: caffeine`
type: none
cache-names:
- tenant
- user
- processDefinition
- processTaskRelation
- taskDefinition
caffeine:
spec: maximumSize=100,expireAfterWrite=300s,recordStats
```
The cache module uses [spring-cache](https://spring.io/guides/gs/caching/), so you can set cache options, such as whether to enable the cache (`none` disables it by default) and the cache type, directly in the Spring `application.yaml`.
Currently it implements the configuration of [caffeine](https://github.com/ben-manes/caffeine), so you can assign cache options such as cache size and expiration time.
## Cache Read
The cache module adopts the `@Cacheable` annotation from spring-cache; apply the annotation to the related mapper layer, referring to `TenantMapper` for an example.
## Cache Evict
Business data updates come from the api-server, while the cache sits in the master-server. Therefore the master-server must be notified of data updates made by the api-server (an aspect-oriented interceptor with `@CacheEvict` is used), and a `cacheEvictCommand` is sent to the master-server when a cache eviction needs to be processed.
Note: the final cache update strategy also depends on the expiration strategy configured in caffeine, so configure it according to your business scenario.
The sequence diagram shows below:
<img src="/img/cache-evict.png" alt="cache-evict" style="zoom: 67%;" />

424
docs/docs/en/architecture/configuration.md

@@ -0,0 +1,424 @@
<!-- markdown-link-check-disable -->
# Configuration
## Preface
This document explains the DolphinScheduler application configurations according to DolphinScheduler-1.3.x versions.
## Directory Structure
Currently, all the configuration files are under the [conf] directory.
Check the following simplified DolphinScheduler installation directories to get a direct view of the location of the [conf] directory and the configuration files it contains.
This document only describes the DolphinScheduler configurations; other topics are not covered here.
[Note: DolphinScheduler is hereinafter referred to as 'DS'.]
```
├─bin DS application commands directory
│ ├─dolphinscheduler-daemon.sh startup or shutdown DS application
│ ├─start-all.sh startup all DS services with configurations
│ ├─stop-all.sh shutdown all DS services with configurations
├─conf configurations directory
│ ├─application-api.properties API-service config properties
│ ├─datasource.properties datasource config properties
│ ├─zookeeper.properties ZooKeeper config properties
│ ├─master.properties master-service config properties
│ ├─worker.properties worker-service config properties
│ ├─quartz.properties quartz config properties
│ ├─common.properties common-service [storage] config properties
│ ├─alert.properties alert-service config properties
│ ├─config environment variables config directory
│ ├─install_config.conf DS environment variables configuration script [install or start DS]
│ ├─env load environment variables configs script directory
│ ├─dolphinscheduler_env.sh load environment variables configs [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
│ ├─org mybatis mapper files directory
│ ├─i18n i18n configs directory
│ ├─logback-api.xml API-service log config
│ ├─logback-master.xml master-service log config
│ ├─logback-worker.xml worker-service log config
│ ├─logback-alert.xml alert-service log config
├─sql .sql files to create or upgrade DS metadata
│ ├─create create SQL scripts directory
│ ├─upgrade upgrade SQL scripts directory
│ ├─dolphinscheduler_postgre.sql PostgreSQL database init script
│ ├─dolphinscheduler_mysql.sql MySQL database init script
│ ├─soft_version current DS version-id file
├─script DS services deployment, database create or upgrade scripts directory
│ ├─create-dolphinscheduler.sh DS database init script
│ ├─upgrade-dolphinscheduler.sh DS database upgrade script
│ ├─monitor-server.sh DS monitor-server start script
│ ├─scp-hosts.sh transfer installation files script
│ ├─remove-zk-node.sh cleanup ZooKeeper caches script
├─ui front-end web resources directory
├─lib DS .jar dependencies directory
├─install.sh auto-setup DS services script
```
## Configurations in Details
serial number| service classification| config file|
|--|--|--|
1|startup or shutdown DS application|dolphinscheduler-daemon.sh
2|datasource config properties|datasource.properties
3|ZooKeeper config properties|zookeeper.properties
4|common-service[storage] config properties|common.properties
5|API-service config properties|application-api.properties
6|master-service config properties|master.properties
7|worker-service config properties|worker.properties
8|alert-service config properties|alert.properties
9|quartz config properties|quartz.properties
10|DS environment variables configuration script[install/start DS]|install_config.conf
11|load environment variables configs <br /> [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
12|services log config files|API-service log config : logback-api.xml <br /> master-service log config : logback-master.xml <br /> worker-service log config : logback-worker.xml <br /> alert-service log config : logback-alert.xml
### dolphinscheduler-daemon.sh [startup or shutdown DS application]
dolphinscheduler-daemon.sh is responsible for DS startup and shutdown.
Essentially, start-all.sh and stop-all.sh start up and shut down the cluster via dolphinscheduler-daemon.sh.
Currently, DS only ships with a basic configuration; remember to configure further JVM options based on your actual resources.
Default simplified parameters are:
```bash
export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```
> "-XX:DisableExplicitGC" is not recommended due to may lead to memory link (DS dependent on Netty to communicate).
### datasource.properties [datasource config properties]
DS uses Druid to manage database connections and default simplified configs are:
|Parameters | Default value| Description|
|--|--|--|
spring.datasource.driver-class-name||datasource driver
spring.datasource.url||datasource connection url
spring.datasource.username||datasource username
spring.datasource.password||datasource password
spring.datasource.initialSize|5| initial connection pool size number
spring.datasource.minIdle|5| minimum connection pool size number
spring.datasource.maxActive|5| maximum connection pool size number
spring.datasource.maxWait|60000| max wait milliseconds
spring.datasource.timeBetweenEvictionRunsMillis|60000| idle connection check interval
spring.datasource.timeBetweenConnectErrorMillis|60000| retry interval
spring.datasource.minEvictableIdleTimeMillis|300000| connections over minEvictableIdleTimeMillis will be collect when idle check
spring.datasource.validationQuery|SELECT 1| validate connection by running the SQL
spring.datasource.validationQueryTimeout|3| validate connection timeout[seconds]
spring.datasource.testWhileIdle|true| set whether the pool validates the allocated connection when a new connection request comes
spring.datasource.testOnBorrow|true| validity check when the program requests a new connection
spring.datasource.testOnReturn|false| validity check when the program recalls a connection
spring.datasource.defaultAutoCommit|true| whether auto commit
spring.datasource.keepAlive|true| runs validationQuery SQL to avoid the connection closed by pool when the connection idles over minEvictableIdleTimeMillis
spring.datasource.poolPreparedStatements|true| open PSCache
spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| specify the size of PSCache on each connection
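For instance, a minimal MySQL-backed datasource built from the parameters above might look like the following sketch (the driver class, host, and credentials are placeholders, not values shipped with DS):

```bash
# Write a minimal conf/datasource.properties for MySQL; pool-tuning parameters
# not listed here keep the defaults from the table above.
cat > conf/datasource.properties <<'EOF'
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://192.168.xx.xx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=xx
spring.datasource.password=xx
EOF
```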
### zookeeper.properties [zookeeper config properties]
|Parameters | Default value| Description|
|--|--|--|
zookeeper.quorum|localhost:2181| ZooKeeper cluster connection info
zookeeper.dolphinscheduler.root|/dolphinscheduler| DS is stored under ZooKeeper root directory
zookeeper.session.timeout|60000| session timeout
zookeeper.connection.timeout|30000| connection timeout
zookeeper.retry.base.sleep|100| time to wait between subsequent retries
zookeeper.retry.max.sleep|30000| maximum time to wait between subsequent retries
zookeeper.retry.maxtime|10| maximum retry times
### common.properties [hadoop, s3, yarn config properties]
Currently, common.properties mainly contains Hadoop and s3a related configurations.
|Parameters | Default value| Description|
|--|--|--|
data.basedir.path|/tmp/dolphinscheduler| local directory used to store temp files
resource.storage.type|NONE| type of resource files: HDFS, S3, NONE
resource.upload.path|/dolphinscheduler| storage path of resource files
hadoop.security.authentication.startup.state|false| whether hadoop grant kerberos permission
java.security.krb5.conf.path|/opt/krb5.conf|kerberos config directory
login.user.keytab.username|hdfs-mycluster@ESZ.COM|kerberos username
login.user.keytab.path|/opt/hdfs.headless.keytab|kerberos user keytab
kerberos.expire.time|2|kerberos expire time,integer,the unit is hour
resource.view.suffixs| txt,log,sh,conf,cfg,py,java,sql,hql,xml,properties| file types supported by resource center
hdfs.root.user|hdfs| configure users with corresponding permissions if storage type is HDFS
fs.defaultFS|hdfs://mycluster:8020|If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory
fs.s3a.endpoint||s3 endpoint url
fs.s3a.access.key||s3 access key
fs.s3a.secret.key||s3 secret key
yarn.resourcemanager.ha.rm.ids||specify the yarn resourcemanager url. if resourcemanager supports HA, input HA IP addresses (separated by comma), or input null for standalone
yarn.application.status.address|http://ds1:8088/ws/v1/cluster/apps/%s|keep default if ResourceManager supports HA or not use ResourceManager, or replace ds1 with corresponding hostname if ResourceManager in standalone mode
dolphinscheduler.env.path|env/dolphinscheduler_env.sh|load environment variables configs [eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
development.state|false| specify whether in development state
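As an illustration only (the endpoint and keys below are placeholders), switching the resource center from the default NONE to S3 touches the following entries from the table above:

```bash
# Append the S3-related entries to conf/common.properties; this assumes the
# keys are not already set elsewhere in the file.
cat >> conf/common.properties <<'EOF'
resource.storage.type=S3
resource.upload.path=/dolphinscheduler
fs.defaultFS=s3a://dolphinscheduler
fs.s3a.endpoint=http://192.168.xx.xx:9010
fs.s3a.access.key=xxxxxxxxxx
fs.s3a.secret.key=xxxxxxxxxx
EOF
```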
### application-api.properties [API-service config properties]
|Parameters | Default value| Description|
|--|--|--|
server.port|12345|api service communication port
server.servlet.session.timeout|7200|session timeout
server.servlet.context-path|/dolphinscheduler | request path
spring.servlet.multipart.max-file-size|1024MB| maximum file size
spring.servlet.multipart.max-request-size|1024MB| maximum request size
server.jetty.max-http-post-size|5000000| jetty maximum post size
spring.messages.encoding|UTF-8| message encoding
spring.jackson.time-zone|GMT+8| time zone
spring.messages.basename|i18n/messages| i18n config
security.authentication.type|PASSWORD| authentication type
### master.properties [master-service config properties]
|Parameters | Default value| Description|
|--|--|--|
master.listen.port|5678|master listen port
master.exec.threads|100|master-service execute thread number, used to limit the number of process instances in parallel
master.exec.task.num|20|defines the number of parallel tasks for each process instance of the master-service
master.dispatch.task.num|3|defines the number of dispatch tasks for each batch of the master-service
master.host.selector|LowerWeight|master host selector, to select a suitable worker to run the task, optional value: random, round-robin, lower weight
master.heartbeat.interval|10|master heartbeat interval, the unit is second
master.task.commit.retryTimes|5|master commit task retry times
master.task.commit.interval|1000|master commit task interval, the unit is millisecond
master.max.cpuload.avg|-1|master max CPU load avg, only higher than the system CPU load average, master server can schedule. default value -1: the number of CPU cores * 2
master.reserved.memory|0.3|master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G
### worker.properties [worker-service config properties]
|Parameters | Default value| Description|
|--|--|--|
worker.listen.port|1234|worker-service listen port
worker.exec.threads|100|worker-service execute thread number, used to limit the number of task instances in parallel
worker.heartbeat.interval|10|worker-service heartbeat interval, the unit is second
worker.max.cpuload.avg|-1|worker max CPU load avg, only higher than the system CPU load average, worker server can be dispatched tasks. default value -1: the number of CPU cores * 2
worker.reserved.memory|0.3|worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
worker.groups|default|worker groups separated by comma, e.g., 'worker.groups=default,test' <br> worker will join corresponding group according to this config when startup
### alert.properties [alert-service config properties]
|Parameters | Default value| Description|
|--|--|--|
alert.type|EMAIL|alert type|
mail.protocol|SMTP|mail server protocol
mail.server.host|xxx.xxx.com|mail server host
mail.server.port|25|mail server port
mail.sender|xxx@xxx.com|mail sender email
mail.user|xxx@xxx.com|mail sender email name
mail.passwd|111111|mail sender email password
mail.smtp.starttls.enable|true|specify mail whether open tls
mail.smtp.ssl.enable|false|specify mail whether open ssl
mail.smtp.ssl.trust|xxx.xxx.com|specify mail ssl trust list
xls.file.path|/tmp/xls|mail attachment temp storage directory
||following configure WeCom[optional]|
enterprise.wechat.enable|false|specify whether enable WeCom
enterprise.wechat.corp.id|xxxxxxx|WeCom corp id
enterprise.wechat.secret|xxxxxxx|WeCom secret
enterprise.wechat.agent.id|xxxxxxx|WeCom agent id
enterprise.wechat.users|xxxxxxx|WeCom users
enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|WeCom token url
enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|WeCom push url
enterprise.wechat.user.send.msg||send message format
enterprise.wechat.team.send.msg||group message format
plugin.dir|/Users/xx/your/path/to/plugin/dir|plugin directory
### quartz.properties [quartz config properties]
This part describes the quartz configs; configure them based on your actual situation and resources.
|Parameters | Default value| Description|
|--|--|--|
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate |
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
org.quartz.scheduler.instanceName | DolphinScheduler |
org.quartz.scheduler.instanceId | AUTO |
org.quartz.scheduler.makeSchedulerThreadDaemon | true |
org.quartz.jobStore.useProperties | false |
org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |
org.quartz.threadPool.makeThreadsDaemons | true |
org.quartz.threadPool.threadCount | 25 |
org.quartz.threadPool.threadPriority | 5 |
org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX |
org.quartz.jobStore.tablePrefix | QRTZ_ |
org.quartz.jobStore.isClustered | true |
org.quartz.jobStore.misfireThreshold | 60000 |
org.quartz.jobStore.clusterCheckinInterval | 5000 |
org.quartz.jobStore.acquireTriggersWithinLock|true |
org.quartz.jobStore.dataSource | myDs |
org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider |
### install_config.conf [DS environment variables configuration script (install or start DS)]
install_config.conf is a bit complicated and is mainly used in the following two places.
* DS Cluster Auto Installation.
> The system loads the configs in install_config.conf and auto-configures the files below based on its content when executing 'install.sh'.
> Files such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.
* Startup and Shutdown of the DS Cluster.
> The system loads the masters, workers, alert-server, API-servers and other parameters from the file to start up or shut down the DS cluster.
#### File Content
```bash
# Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
# eg: `[` escape to `\[`
# Database type (DS currently only supports PostgreSQL and MySQL)
dbtype="mysql"
# Database url and port
dbhost="192.168.xx.xx:3306"
# Database name
dbname="dolphinscheduler"
# Database username
username="xx"
# Database password
password="xx"
# ZooKeeper url
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
# DS installation path, such as '/data1_1T/dolphinscheduler'
installPath="/data1_1T/dolphinscheduler"
# Deployment user
# Note: the deployment user needs 'sudo' privilege and rights to operate HDFS.
# If using HDFS, the root directory must be created by this same user, otherwise permission-related issues will be raised.
deployUser="dolphinscheduler"
# Followings are alert-service configs
# Mail server host
mailServerHost="smtp.exmail.qq.com"
# Mail server port
mailServerPort="25"
# Mail sender
mailSender="xxxxxxxxxx"
# Mail user
mailUser="xxxxxxxxxx"
# Mail password
mailPassword="xxxxxxxxxx"
# Whether mail supports TLS
starttlsEnable="true"
# Whether mail supports SSL. Note: starttlsEnable and sslEnable cannot both be set to true.
sslEnable="false"
# Mail server host, same as mailServerHost
sslTrust="smtp.exmail.qq.com"
# Specify which storage type to use for resource files (such as SQL files). Supported options are HDFS, S3 and NONE; HDFS uploads to HDFS and NONE disables this function.
resourceStorageType="NONE"
# defaultFS: if using HDFS, write the NameNode address; if using S3, write the S3 address, e.g., s3a://dolphinscheduler
# Note: for S3, make sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# If parameter 'resourceStorageType' is S3, following configs are needed:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# If ResourceManager supports HA, input the active and standby node IPs or hostnames, e.g., '192.168.xx.xx,192.168.xx.xx'; if ResourceManager runs in standalone mode, set yarnHaIps="" (also use "" when not using yarn).
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# If ResourceManager runs in standalone mode, set the ResourceManager node IP or hostname; otherwise keep the default.
singleYarnIp="yarnIp1"
# Storage path when using HDFS/S3
resourceUploadPath="/dolphinscheduler"
# HDFS/S3 root user
hdfsRootUser="hdfs"
# Followings are Kerberos configs
# Specify Kerberos enable or not
kerberosStartUp="false"
# Kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# Keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# Username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# API-service port
apiServerPort="12345"
# All hosts deploy DS
ips="ds1,ds2,ds3,ds4,ds5"
# Ssh port, default 22
sshPort="22"
# Master service hosts
masters="ds1,ds2"
# All hosts deploy worker service
# Note: Each worker needs to set a worker group name and default name is "default"
workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
# Host deploy alert-service
alertServer="ds3"
# Host deploy API-service
apiServers="ds1"
```
### dolphinscheduler_env.sh [load environment variables configs]
When submitting tasks via shell, DS loads the environment variables in dolphinscheduler_env.sh on the host.
Types of tasks involved are: Shell, Python, Spark, Flink, DataX, etc.
```bash
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
```
### Services logback configs
Service name| Logback config file |
--|--|
API-service logback config |logback-api.xml|
master-service logback config|logback-master.xml |
worker-service logback config|logback-worker.xml |
alert-service logback config|logback-alert.xml |
docs/docs/en/architecture/design.md
# System Architecture Design
## System Structure
### System Architecture Diagram
<p align="center">
<img src="/img/architecture-1.3.0.jpg" alt="System architecture diagram" width="70%" />
<p align="center">
<em>System architecture diagram</em>
</p>
</p>
### Start Process Activity Diagram
<p align="center">
<img src="/img/process-start-flow-1.3.0.png" alt="Start process activity diagram" width="70%" />
<p align="center">
<em>Start process activity diagram</em>
</p>
</p>
### Architecture Description
* **MasterServer**
MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault tolerance by monitoring changes of the ZooKeeper temporary nodes.
MasterServer provides monitoring services based on netty.
#### The Service Mainly Includes:
- **Distributed Quartz** distributed scheduling component, mainly responsible for starting and stopping scheduled tasks. When Quartz triggers a task, a thread pool inside the Master handles the follow-up processing of the task.
- **MasterSchedulerThread** is a scanning thread that regularly scans the **command** table in the database and runs different business operations according to the **command type**.
- **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different command types.
- **MasterTaskExecThread** is mainly responsible for task persistence.
* **WorkerServer**
WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
When the WorkerServer service starts, it registers a temporary node with ZooKeeper and maintains a heartbeat.
WorkerServer provides monitoring services based on netty.
#### The Service Mainly Includes:
- **FetchTaskThread** is mainly responsible for continuously getting tasks from the **Task Queue** and calling the corresponding **TaskScheduleThread** executor according to the task type.
* **ZooKeeper**
ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
* **Task Queue**
Provides task queue operations; the current queue is also implemented on ZooKeeper. Since only a small amount of information is stored in the queue, there is no need to worry about excessive data. In fact, we have tested storing millions of entries in the queue, with no impact on system stability or performance.
* **Alert**
Provides alarm-related interfaces, mainly covering the storage, query and notification of two types of alarm data. Notification channels include **email notification** and **SNMP (not yet implemented)**.
* **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to external callers.
Interfaces include workflow creation, definition, query, modification, release, logoff, manual start, stop, pause, resume, start execution from specific node, etc.
* **UI**
The front-end page of the system provides various visual operation interfaces of the system, see more at [Introduction to Functions](../guide/homepage.md) section.
### Architecture Design Ideas
#### Decentralization VS Centralization
##### Centralized Thinking
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave character" width="50%" />
</p>
- The role of the master is mainly responsible for task distribution and monitoring the health status of the slave, and can dynamically balance the task to the slave, so that the slave node won't be in a "busy dead" or "idle dead" state.
- The role of Worker is mainly responsible for task execution and heartbeat maintenance to the Master, so that Master can assign tasks to Slave.
Problems in centralized thought design:
- Once there is a problem with the Master, the cluster is left without a commander and may collapse entirely. To solve this problem, most Master/Slave architectures adopt an active/standby Master design, which can be hot or cold standby, with automatic or manual switching. More and more new systems are able to automatically elect and switch the Master to improve availability.
- Another problem is that if the Scheduler is on the Master, although it can support different tasks in a DAG running on different machines, it will cause the Master to be overloaded. If the Scheduler is on the slave, all tasks in a DAG can only submit jobs on a certain machine. When there are more parallel tasks, the pressure on the slave may be greater.
##### Decentralized
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="Decentralization" width="50%" />
</p>
- In the decentralized design, there is usually no concept of Master or Slave: all roles are the same and have equal status. The global Internet is a typical decentralized distributed system; if any node connected to the network goes down, only a small range of functions is affected.
- The core of decentralized design is that there is no distinct "manager" different from the other nodes in the entire distributed system, so there is no single point of failure. However, because there is no "manager" node, each node needs to communicate with other nodes to obtain the necessary machine information, and the unreliability of distributed communication greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture, the managers in the cluster are dynamically elected rather than preset, and when the cluster fails, the nodes automatically hold "meetings" to elect new "managers" to preside over the work. Typical cases are ZooKeeper, and Etcd implemented in Go.
- The decentralization of DolphinScheduler means that Masters and Workers register in ZooKeeper to implement a center-less Master cluster and Worker cluster, and a ZooKeeper distributed lock is used to elect one Master or Worker as the "manager" to perform a task.
#### Distributed Lock Practice
DolphinScheduler uses a ZooKeeper distributed lock to ensure that only one Master executes the Scheduler at a time and only one Worker handles task submission at a time.
1. The following shows the core process algorithm for acquiring distributed locks:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png" alt="Obtain distributed lock process" width="50%" />
</p>
2. Flow diagram of implementation of Scheduler thread distributed lock in DolphinScheduler:
<p align="center">
<img src="/img/distributed_lock_procss.png" alt="Obtain distributed lock process" width="50%" />
</p>
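To make the lock usage above concrete, here is a minimal sketch using Apache Curator's `InterProcessMutex` recipe (the connection string and lock path are placeholders; DolphinScheduler's own lock implementation may differ in detail):
```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MasterLockSketch {
    public static void main(String[] args) throws Exception {
        // connect to the ZooKeeper quorum (address is a placeholder)
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "192.168.xx.xx:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // hypothetical lock znode; the real lock paths live under /dolphinscheduler
        InterProcessMutex lock = new InterProcessMutex(client, "/dolphinscheduler/lock/masters/demo");
        lock.acquire();                 // blocks until this node becomes the "manager"
        try {
            // critical section: only one Master runs the scheduling logic here at a time
            System.out.println("acquired the distributed lock, scheduling commands...");
        } finally {
            lock.release();             // always release so other Masters can proceed
        }
        client.close();
    }
}
```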
#### Insufficient Thread Loop Waiting Problem
- If there is no sub-process in a DAG, then when the number of Commands exceeds the threshold of the thread pool, the process either waits or fails directly.
- If a large DAG nests many sub-processes, a "deadlock" state is produced, as in the following figure:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png" alt="Insufficient threads waiting loop problem" width="50%" />
</p>
In the above figure, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread in the thread pool, so the entire DAG process can never finish and the threads can never be released. A child-parent loop-waiting state is formed, and unless a new Master is started to add threads and break this "stalemate", the scheduling cluster becomes unusable.
It seems unsatisfactory to start a new Master just to break the deadlock, so we proposed the following three solutions to reduce this risk:
1. Calculate the sum of all Master threads, and then calculate the number of threads required for each DAG, that is, pre-calculate before the DAG process executes. Because it is a multi-master thread pool, the total number of threads is unlikely to be obtained in real time.
2. Check whether the single-master thread pool is full, and fail the thread directly when it is.
3. Add a Command type for insufficient resources: if the thread pool is insufficient, suspend the main process. When new threads become available in the thread pool, the process suspended due to insufficient resources is woken up to execute again.
Note: The Master Scheduler thread acquires Commands in FIFO order.
So we chose the third way to solve the problem of insufficient threads.
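The sketch below illustrates the idea behind the chosen solution, i.e., turning an unschedulable process into a persisted recovery command instead of blocking a thread (all names are illustrative, not DolphinScheduler's internal API):
```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the "suspend on insufficient threads" idea; all names are illustrative.
public class InsufficientThreadSketch {

    // stands in for the database command table holding "recover waiting thread" commands
    static final Queue<Integer> WAITING_THREAD_COMMANDS = new ConcurrentLinkedQueue<>();

    static void submit(ThreadPoolExecutor pool, int processInstanceId, Runnable dagExecThread) {
        boolean saturated =
                pool.getActiveCount() + pool.getQueue().size() >= pool.getMaximumPoolSize();
        if (saturated) {
            // no thread available: suspend the process instance by recording a recovery
            // command; a Master with spare threads later picks it up (FIFO) and resumes it
            WAITING_THREAD_COMMANDS.add(processInstanceId);
            return;
        }
        pool.submit(dagExecThread);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        submit(pool, 1, () -> sleep(500));   // occupies the single master exec thread
        Thread.sleep(100);                   // give the pool a moment to start the first DAG (demo only)
        submit(pool, 2, () -> sleep(500));   // suspended instead of waiting for a free thread
        System.out.println("suspended process instances: " + WAITING_THREAD_COMMANDS);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```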
#### Fault-Tolerant Design
Fault tolerance is divided into service downtime fault tolerance and task retry; service downtime fault tolerance is further divided into master fault tolerance and worker fault tolerance.
##### Downtime Fault Tolerance
The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and the implementation principle is shown in the figure:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png" alt="DolphinScheduler fault-tolerant design" width="40%" />
</p>
Among them, the Master monitors the directories of other Masters and Workers. If a remove event is triggered, fault tolerance of the process instance or task instance is performed according to the specific business logic.
- Master fault tolerance:
<p align="center">
<img src="/img/failover-master.jpg" alt="failover-master" width="50%" />
</p>
Fault tolerance scope: from the host perspective, the Master's fault tolerance covers its own host and any node host that no longer exists in the registry; the entire fault-tolerance process is locked;
Fault-tolerant content: the Master's fault tolerance covers process instances and task instances. Before fault tolerance, the instance start time is compared with the server start-up time, and fault tolerance is skipped if the instance started after the server start time;
Fault-tolerant post-processing: after the ZooKeeper Master fault tolerance completes, re-scheduling is done by the Scheduler thread in DolphinScheduler, which traverses the DAG to find the "running" and "submit successful" tasks. For "running" tasks it monitors the status of their task instances; for "submit successful" tasks it checks whether an entry already exists in the task queue: if it exists, the status of the task instance is monitored, otherwise the task instance is resubmitted.
- Worker fault tolerance:
<p align="center">
<img src="/img/failover-worker.jpg" alt="failover-worker" width="50%" />
</p>
Fault tolerance scope: from the process instance perspective, each Master is only responsible for fault tolerance of its own process instances; it acquires the lock only in `handleDeadServer`;
Fault-tolerant content: when the remove event of a Worker node is received, the Master only fault-tolerates task instances. Before fault tolerance, the instance start time is compared with the server start-up time, and fault tolerance is skipped if the instance started after the server start time;
Fault-tolerant post-processing: once the Master Scheduler thread finds that a task instance is in the "fault-tolerant" state, it takes over the task and resubmits it.
Note: Due to "network jitter", a node may lose its heartbeat with ZooKeeper for a short period of time and a remove event for the node may occur. For this situation, we use the simplest way: once a connection timeout between a node and ZooKeeper occurs, the Master or Worker service is stopped directly.
##### Task Failed and Try Again
Here we must first distinguish the concepts of task failure retry, process failure recovery, and process failure re-run:
- Task failure retry is at the task level and is automatically performed by the schedule system. For example, if a Shell task is set to retry 3 times, it will be re-run up to 3 times after it fails.
- Process failure recovery is at the process level and is performed manually. Recovery can only be performed **from the failed node** or **from the current node**.
- Process failure re-run is also at the process level and is performed manually; a re-run starts from the beginning node.
Back to the main point: we divide the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing command, such as a Shell node, MR node, Spark node, or dependent node.
- The other is a logical node, which does not run an actual script or command but only does logical processing for the whole process flow, such as the sub-process node.
Each **business node** can configure the number of failed retries. When the task node fails, it automatically retries until it succeeds or exceeds the configured retry times. A **logical node** does not support failure retry itself, but the tasks inside it do.
If a task in the workflow fails and reaches its maximum retry times, the workflow fails and stops; the failed workflow can then be manually re-run or recovered.
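To make the task-level retry semantics concrete, here is a minimal sketch (the names and the in-place loop are illustrative; the real scheduler persists task state between attempts rather than looping in a single thread):
```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Minimal illustration of task-level failure retry; names are illustrative only.
public class TaskRetrySketch {

    /** Runs a business task up to 1 + maxRetryTimes attempts, pausing retryIntervalSec between tries. */
    static boolean runWithRetry(Callable<Boolean> task, int maxRetryTimes, long retryIntervalSec)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetryTimes; attempt++) {
            try {
                if (task.call()) {
                    return true;                               // task succeeded
                }
            } catch (Exception e) {
                System.out.println("attempt " + attempt + " failed: " + e.getMessage());
            }
            if (attempt < maxRetryTimes) {
                TimeUnit.SECONDS.sleep(retryIntervalSec);      // wait before the next attempt
            }
        }
        return false;                                          // retries exhausted: the task (and workflow) fails
    }

    public static void main(String[] args) throws InterruptedException {
        // e.g. a Shell task configured with 3 retries and a 1-second retry interval
        boolean ok = runWithRetry(() -> { throw new RuntimeException("shell exit code 1"); }, 3, 1);
        System.out.println("task finished successfully: " + ok);
    }
}
```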
#### Task Priority Design
In the early schedule design, if there was no priority design and fair scheduling was used, a task submitted first might complete at the same time as a task submitted later, invalidating the priority of processes and tasks. So we re-designed this, and the following is our current design:
- Tasks are submitted in order from highest to lowest priority: **priority of different process instances** takes precedence over **priority of tasks within the same process instance**, which takes precedence over **the submission order of tasks within the same process**.
- The specific implementation is to parse the priority from the JSON of the task instance and then save the **process instance priority_process instance id_task priority_task id** key to the ZooKeeper task queue. When reading from the task queue, the highest-priority task is obtained by comparing these strings (see the sketch below).
- The priority of the process definition exists because some processes need to be handled before others; it is configured when the process is started or scheduled. There are 5 levels in total: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744784-eb351b14-c94a-4ed6-8ba4-5132c2a3d116.png" alt="Process priority configuration" width="40%" />
</p>
- Task priority is also divided into 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744830-5eac611f-5933-4f53-a0c6-31613c283708.png" alt="Task priority configuration" width="35%" />
</p>
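The following sketch illustrates the queue-key idea referenced above (a simplified illustration; the real key layout and the ZooKeeper-backed queue differ in detail):
```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Simplified illustration of priority keys of the form
// processInstancePriority_processInstanceId_taskPriority_taskId; lower numbers mean higher priority.
public class TaskPriorityQueueSketch {

    static String buildKey(int processInstancePriority, long processInstanceId,
                           int taskPriority, long taskId) {
        return String.format("%d_%d_%d_%d",
                processInstancePriority, processInstanceId, taskPriority, taskId);
    }

    public static void main(String[] args) {
        // compare the priority fields explicitly to avoid plain string-comparison pitfalls ("10" < "2")
        Comparator<String> byPriority = Comparator.comparing((String key) -> {
            String[] parts = key.split("_");
            return Long.parseLong(parts[0]) * 1_000_000L + Long.parseLong(parts[2]);
        });
        PriorityQueue<String> queue = new PriorityQueue<>(byPriority);

        queue.add(buildKey(2, 101, 0, 7));   // MEDIUM process, HIGHEST task
        queue.add(buildKey(0, 102, 3, 8));   // HIGHEST process, LOW task
        queue.add(buildKey(0, 102, 1, 9));   // HIGHEST process, HIGH task

        while (!queue.isEmpty()) {
            System.out.println(queue.poll()); // 0_102_1_9, 0_102_3_8, 2_101_0_7
        }
    }
}
```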
#### Logback and Netty Implement Log Access
- Since the Web (UI) and the Worker are not necessarily on the same machine, viewing logs is not like querying a local file. There are two options:
  - Put logs on the ES search engine.
  - Obtain remote log information through netty communication.
- To keep DolphinScheduler as lightweight as possible, the netty-based RPC approach (the second option) is used to access remote log information.
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc remote access" width="50%" />
</p>
- We use Logback's customizable FileAppender and Filter to make each task instance generate its own log file.
- The following is the FileAppender implementation:
```java
/**
* task log appender
*/
public class TaskLogAppender extends FileAppender<ILoggingEvent> {
...
@Override
protected void append(ILoggingEvent event) {
if (currentlyActiveFile == null){
currentlyActiveFile = getFile();
}
String activeFile = currentlyActiveFile;
// thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
String threadName = event.getThreadName();
String[] threadNameArr = threadName.split("-");
// logId = processDefineId_processInstanceId_taskInstanceId
String logId = threadNameArr[1];
...
super.subAppend(event);
}
}
```
Logs are generated under a path of the form `/process definition id/process instance id/task instance id.log`.
- Filter to match the thread name starting with TaskLogInfo:
- The following shows the TaskLogFilter implementation:
```java
/**
* task log filter
*/
public class TaskLogFilter extends Filter<ILoggingEvent> {
@Override
public FilterReply decide(ILoggingEvent event) {
if (event.getThreadName().startsWith("TaskLogInfo-")){
return FilterReply.ACCEPT;
}
return FilterReply.DENY;
}
}
```
## Sum Up
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation ideas of the big data distributed workflow scheduling system: DolphinScheduler. To be continued.
docs/docs/en/architecture/load-balance.md
# Load Balance
Load balancing refers to the reasonable allocation of server pressure through routing algorithms (usually in cluster environments) to achieve the maximum optimization of server performance.
## DolphinScheduler-Worker Load Balancing Algorithms
DolphinScheduler-Master allocates tasks to workers, and by default provides three algorithms:
- Weighted random (random)
- Smoothing polling (round-robin)
- Linear load (lower weight)
The default configuration is the linear load.
Since routing happens on the client side, i.e., the master service, you can change master.host.selector in master.properties to configure the algorithm.
e.g. master.host.selector=random (case-insensitive)
## Worker Load Balancing Configuration
The configuration file is worker.properties
### Weight
All the load algorithms above are weighted based on weights, which affect the routing outcome. You can set different weights for different machines by modifying the `worker.weight` value.
### Preheating
Considering JIT optimization, the worker runs at low power for a period of time after startup so that it can gradually reach its optimal state; we call this process preheating. If you are interested, you can read some articles about JIT.
So the worker gradually reaches its maximum weight over time after startup (ten minutes by default; there is currently no configuration for the preheating duration, so please submit a PR if you need to change it).
## Load Balancing Algorithm in Details
### Random (Weighted)
This algorithm is relatively simple: a worker is selected at random, with the weight affecting the selection probability.
### Smoothed Polling (Weighted)
An obvious drawback of the weighted polling algorithm is that under particular weight configurations it generates an imbalanced sequence of instances, and this unsmooth load may cause some instances to experience transient high load, with a risk of crashing. To address this flaw, we provide a smooth weighted polling algorithm.
Each worker has two weight parameters: weight (which remains constant after warm-up completes) and current_weight (which changes dynamically). For every route, each worker's current_weight is increased by its weight and the fixed weights of all workers are summed up as total_weight; the worker with the largest current_weight is then selected for this task, and that worker's current_weight is decreased by total_weight.
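A compact sketch of the smooth weighted polling selection described above (illustrative; worker names and weights are arbitrary):
```java
import java.util.Arrays;
import java.util.List;

// Smooth weighted round-robin selection, following the steps described above.
public class SmoothWeightedRoundRobin {

    static class Worker {
        final String host;
        final int weight;          // fixed weight (after warm-up)
        int currentWeight = 0;     // changes on every selection
        Worker(String host, int weight) { this.host = host; this.weight = weight; }
    }

    static Worker select(List<Worker> workers) {
        int totalWeight = 0;
        Worker best = null;
        for (Worker w : workers) {
            w.currentWeight += w.weight;               // step 1: grow every current_weight
            totalWeight += w.weight;                   // step 2: sum the fixed weights
            if (best == null || w.currentWeight > best.currentWeight) {
                best = w;                              // step 3: pick the largest current_weight
            }
        }
        best.currentWeight -= totalWeight;             // step 4: penalize the chosen worker
        return best;
    }

    public static void main(String[] args) {
        List<Worker> workers = Arrays.asList(
                new Worker("worker-1", 5), new Worker("worker-2", 1), new Worker("worker-3", 1));
        for (int i = 0; i < 7; i++) {
            System.out.print(select(workers).host + " ");
        }
        // prints the smooth sequence: worker-1 worker-1 worker-2 worker-1 worker-3 worker-1 worker-1
    }
}
```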
### Linear Weighting (Default Algorithm)
With this algorithm, each worker reports its own load information to the registry at regular intervals, and the routing decision is based on two main pieces of information:
- load average (default is the number of CPU cores * 2)
- available physical memory (default is 0.3, in G)
If either threshold is violated (load average above the configured maximum, or available memory below the reserved memory), the worker will not participate in the load balancing and no traffic will be allocated to it.
You can customise the configuration by changing the following properties in worker.properties
- worker.max.cpuload.avg=-1 (worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2)
- worker.reserved.memory=0.3 (worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G)
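The sketch below shows how these two thresholds act as a filter before the weighting step (illustrative only; the heartbeat record and helper names are hypothetical, mirroring the properties above):
```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative filter for the linear-load strategy: exclude workers whose reported load
// breaks either threshold, then route only among the remaining (weighted) candidates.
public class LinearLoadFilterSketch {

    record WorkerHeartbeat(String host, double loadAverage, double availableMemoryGb, int weight) {}

    static List<WorkerHeartbeat> eligible(List<WorkerHeartbeat> workers,
                                          double maxCpuLoadAvg, double reservedMemoryGb) {
        return workers.stream()
                .filter(w -> w.loadAverage() < maxCpuLoadAvg)          // worker.max.cpuload.avg
                .filter(w -> w.availableMemoryGb() > reservedMemoryGb) // worker.reserved.memory
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<WorkerHeartbeat> heartbeats = Arrays.asList(
                new WorkerHeartbeat("worker-1", 3.2, 8.0, 100),
                new WorkerHeartbeat("worker-2", 9.5, 4.0, 100),   // overloaded: filtered out
                new WorkerHeartbeat("worker-3", 1.1, 0.2, 100));  // low memory: filtered out
        // assuming 4 CPU cores (max load avg 4 * 2 = 8) and 0.3 GB reserved memory
        System.out.println(eligible(heartbeats, 8.0, 0.3));       // only worker-1 remains
    }
}
```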
docs/docs/en/architecture/metadata.md
# MetaData
## DolphinScheduler DB Table Overview
| Table Name | Comment |
| :---: | :---: |
| t_ds_access_token | token for access DolphinScheduler backend |
| t_ds_alert | alert detail |
| t_ds_alertgroup | alert group |
| t_ds_command | command detail |
| t_ds_datasource | data source |
| t_ds_error_command | error command detail |
| t_ds_process_definition | process definition |
| t_ds_process_instance | process instance |
| t_ds_project | project |
| t_ds_queue | queue |
| t_ds_relation_datasource_user | datasource related to user |
| t_ds_relation_process_instance | sub process |
| t_ds_relation_project_user | project related to user |
| t_ds_relation_resources_user | resource related to user |
| t_ds_relation_udfs_user | UDF functions related to user |
| t_ds_relation_user_alertgroup | alert group related to user |
| t_ds_resources | resource center file |
| t_ds_schedules | process definition schedule |
| t_ds_session | user login session |
| t_ds_task_instance | task instance |
| t_ds_tenant | tenant |
| t_ds_udfs | UDF resource |
| t_ds_user | user detail |
| t_ds_version | DolphinScheduler version |
---
## E-R Diagram
### User Queue DataSource
![image.png](/img/metadata-erd/user-queue-datasource.png)
- One tenant can own multiple users.
- The queue field in the t_ds_user table stores the queue_name from the t_ds_queue table, while t_ds_tenant stores queue information using the queue_id column. During the execution of a process definition, the user queue has the highest priority; if the user queue is null, the tenant queue is used.
- The user_id field in the t_ds_datasource table indicates the user who created the data source, while the user_id in t_ds_relation_datasource_user indicates a user who has permission to use the data source.
### Project Resource Alert
![image.png](/img/metadata-erd/project-resource-alert.png)
- A user can have multiple projects; project authorization binds the relationship using the project_id and user_id columns in the t_ds_relation_project_user table.
- The user_id in the t_ds_project table represents the user who created the project, and the user_id in the t_ds_relation_project_user table represents a user who has permission to the project.
- The user_id in the t_ds_resources table represents the user who created the resource, and the user_id in t_ds_relation_resources_user represents a user who has permission to the resource.
- The user_id in the t_ds_udfs table represents the user who created the UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who has permission to the UDF.
### Command Process Task
![image.png](/img/metadata-erd/command.png)<br />![image.png](/img/metadata-erd/process-task.png)
- A project has multiple process definitions, a process definition can generate multiple process instances, and a process instance can generate multiple task instances.
- The t_ds_schedules table stores the schedule information of process definitions.
- The data stored in the t_ds_relation_process_instance table is used to handle the sub-processes of a process definition: the parent_process_instance_id field represents the id of the main process instance containing child processes, the process_instance_id field represents the id of the sub-process instance, and the parent_task_instance_id field represents the task instance id of the sub-process node.
- The process instance table and the task instance table correspond to the t_ds_process_instance table and the t_ds_task_instance table, respectively.
---
## Core Table Schema
### t_ds_process_definition
| Field | Type | Comment |
| --- | --- | --- |
| id | int | primary key |
| name | varchar | process definition name |
| version | int | process definition version |
| release_state | tinyint | process definition release state:0:offline,1:online |
| project_id | int | project id |
| user_id | int | process definition creator id |
| process_definition_json | longtext | process definition JSON content |
| description | text | process definition description |
| global_params | text | global parameters |
| flag | tinyint | whether process available: 0 not available, 1 available |
| locations | text | Node location information |
| connects | text | Node connection information |
| receivers | text | receivers |
| receivers_cc | text | carbon copy list |
| create_time | datetime | create time |
| timeout | int | timeout |
| tenant_id | int | tenant id |
| update_time | datetime | update time |
| modify_by | varchar | define user modify the process |
| resource_ids | varchar | resource id set |
### t_ds_process_instance
| Field | Type | Comment |
| --- | --- | --- |
| id | int | primary key |
| name | varchar | process instance name |
| process_definition_id | int | process definition id |
| state | tinyint | process instance Status: 0 successful commit, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
| recovery | tinyint | process instance failover flag:0: normal,1: failover instance needs restart |
| start_time | datetime | process instance start time |
| end_time | datetime | process instance end time |
| run_times | int | process instance run times |
| host | varchar | process instance host |
| command_type | tinyint | command type:0 start ,1 start from the current node,2 resume a fault-tolerant process,3 resume from pause process, 4 execute from the failed node,5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
| command_param | text | JSON command parameters |
| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
| max_try_times | tinyint | max try times |
| failure_strategy | tinyint | failure strategy, 0: end the process when node failed,1: continue run the other nodes when failed |
| warning_type | tinyint | warning type 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
| warning_group_id | int | warning group id |
| schedule_time | datetime | schedule time |
| command_start_time | datetime | command start time |
| global_params | text | global parameters |
| process_instance_json | longtext | process instance JSON |
| flag | tinyint | whether process instance is available: 0 not available, 1 available |
| update_time | timestamp | update time |
| is_sub_process | int | whether the process is sub process: 1 sub-process, 0 not sub-process |
| executor_id | int | executor id |
| locations | text | node location information |
| connects | text | node connection information |
| history_cmd | text | history commands, record all the commands to a instance |
| dependence_schedule_times | text | depend schedule estimate time |
| process_instance_priority | int | process instance priority. 0 highest,1 high,2 medium,3 low,4 lowest |
| worker_group | varchar | worker group who assign the task |
| timeout | int | timeout |
| tenant_id | int | tenant id |
### t_ds_task_instance
| Field | Type | Comment |
| --- | --- | --- |
| id | int | primary key |
| name | varchar | task name |
| task_type | varchar | task type |
| process_definition_id | int | process definition id |
| process_instance_id | int | process instance id |
| task_json | longtext | task content JSON |
| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete |
| submit_time | datetime | task submit time |
| start_time | datetime | task start time |
| end_time | datetime | task end time |
| host | varchar | host of task running on |
| execute_path | varchar | task execute path in the host |
| log_path | varchar | task log path |
| alert_flag | tinyint | whether alert |
| retry_times | int | task retry times |
| pid | int | pid of task |
| app_link | varchar | Yarn app id |
| flag | tinyint | task instance is available : 0 not available, 1 available |
| retry_interval | int | retry interval when task failed |
| max_retry_times | int | max retry times |
| task_instance_priority | int | task instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
| worker_group | varchar | worker group who assign the task |
### t_ds_schedules
| Field | Type | Comment |
| --- | --- | --- |
| id | int | primary key |
| process_definition_id | int | process definition id |
| start_time | datetime | schedule start time |
| end_time | datetime | schedule end time |
| crontab | varchar | crontab expression |
| failure_strategy | tinyint | failure strategy: 0 end,1 continue |
| user_id | int | user id |
| release_state | tinyint | release status: 0 not yet released,1 released |
| warning_type | tinyint | warning type: 0: no warning, 1: warning if process success, 2: warning if process failed, 3: warning whatever results |
| warning_group_id | int | warning group id |
| process_instance_priority | int | process instance priority:0 highest,1 high,2 medium,3 low,4 lowest |
| worker_group | varchar | worker group who assign the task |
| create_time | datetime | create time |
| update_time | datetime | update time |
### t_ds_command
| Field | Type | Comment |
| --- | --- | --- |
| id | int | primary key |
| command_type | tinyint | command type: 0 start workflow, 1 start execution from current node, 2 resume fault-tolerant workflow, 3 resume pause process, 4 start execution from failed node, 5 complement, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume waiting thread |
| process_definition_id | int | process definition id |
| command_param | text | JSON command parameters |
| task_depend_type | tinyint | node dependency type: 0 current node, 1 forward, 2 backward |
| failure_strategy | tinyint | failed policy: 0 end, 1 continue |
| warning_type | tinyint | alarm type: 0 no alarm, 1 alarm if process success, 2: alarm if process failed, 3: warning whatever results |
| warning_group_id | int | warning group id |
| schedule_time | datetime | schedule time |
| start_time | datetime | start time |
| executor_id | int | executor id |
| dependence | varchar | dependence column |
| update_time | datetime | update time |
| process_instance_priority | int | process instance priority: 0 highest,1 high,2 medium,3 low,4 lowest |
| worker_group_id | int | worker group who assign the task |
docs/docs/en/architecture/task-structure.md (diff not shown: file too large)
docs/docs/en/faq.md
<!-- markdown-link-check-disable -->
## Q: What's the name of this project?
A: DolphinScheduler
---
## Q: DolphinScheduler service introduction and recommended running memory
A: DolphinScheduler consists of 5 services: MasterServer, WorkerServer, ApiServer, AlertServer and LoggerServer, plus the UI.
| Service | Description |
| ------------------------- | ------------------------------------------------------------ |
| MasterServer | Mainly responsible for DAG segmentation and task status monitoring |
| WorkerServer/LoggerServer | Mainly responsible for the submission, execution and update of task status. LoggerServer is used for Rest Api to view logs through RPC |
| ApiServer | Provides the Rest Api service for the UI to call |
| AlertServer | Provide alarm service |
| UI | Front page display |
Note: **Due to the large number of services, a single-machine deployment of at least 4 cores and 16 GB of memory is recommended.**
---
## Q: Which mailboxes does the system support?
A: Most mailboxes are supported: QQ, 163, 126, 139, Outlook, Aliyun, etc. Both TLS and SSL protocols are supported and can be configured in alert.properties.
---
## Q: What are the common system variable time parameters and how do I use them?
A: Please refer to 'System parameter' in the system-manual
---
## Q: `pip install kazoo` gives an error. Is it necessary to install?
A: This is needed for Python to connect to ZooKeeper; it is used to delete the master/worker temporary node info in ZooKeeper, so you can ignore the error on a first install. Since version 1.3.0, kazoo is no longer needed; a built-in program replaces what kazoo did.
---
## Q: How to specify the machine that runs a task
A: For version 1.2 and before, use **the administrator** account to create a Worker group, then **specify the Worker group** when the **process definition starts** or **on the task node**. If not specified, the Default group is used; **Default means one of all the workers in the cluster is selected for task submission and execution.**
Since version 1.3, you can set the worker group on the worker itself.
---
## Q: Priority of the task
A: We also support **the priority of processes and tasks**. There are five priority levels: **HIGHEST, HIGH, MEDIUM, LOW and LOWEST**. **You can set the priority between different process instances, or set the priority of different task instances within the same process instance.** For details, please refer to the task priority design in the architecture-design.
---
## Q: dolphinscheduler-grpc gives an error
A: Execute in the root directory: mvn -U clean package assembly:assembly -Dmaven.test.skip=true, then refresh the entire project.
Version 1.3 does not use gRPC; we use netty directly.
---
## Q: Does DolphinScheduler support running on windows?
A: In theory, **only the Worker needs to run on Linux**. Other services can run normally on Windows. But it is still recommended to deploy on Linux.
---
## Q: UI compiles node-sass prompt in linux: Error: EACCESS: permission denied, mkdir xxxx
A: Run **npm install node-sass --unsafe-perm** separately, then run **npm install**.
---
## Q: UI cannot log in normally.
A: 1. If it is started via node, check whether the .env API_BASE configuration under dolphinscheduler-ui points to the Api Server service address.
2. If it is started via nginx and installed through **install-dolphinscheduler-ui.sh**, check whether the proxy_pass configuration in **/etc/nginx/conf.d/dolphinscheduler.conf** points to the Api Server service address.
3. If the above configuration is correct, check whether the Api Server service is normal:
curl http://localhost:12345/dolphinscheduler/users/get-user-info, and check the Api Server log.
If it prints cn.dolphinscheduler.api.interceptor.LoginHandlerInterceptor:[76] - session info is null, the Api Server service is working normally.
4. If there is no problem above, check whether the **server.context-path and server.port configuration** in **application.properties** are correct.
---
## Q: After the process definition is manually started or scheduled, no process instance is generated.
A: 1. First **check whether the MasterServer service exists through jps**, or directly check from the service monitoring whether a master service is registered in zk.
2. If the master service exists, check **the command status statistics** or whether new records have been added to **t_ds_error_command**. If so, **please check the message field.**
---
## Q: The task status always stays in the "submit success" state
A: 1. First **check whether the WorkerServer service exists through jps**, or directly check from the service monitoring whether a worker service is registered in zk.
2. If the **WorkerServer** service is normal, **check whether the MasterServer puts the task into the zk queue; check the MasterServer log and the zk queue to see whether the task is blocked.**
3. If there is no problem above, locate whether a Worker group was specified while **none of the machines in that worker group is online**.
---
## Q: Is there a Docker image and a Dockerfile?
A: A Docker image and a Dockerfile are provided.
Docker image address: https://hub.docker.com/r/escheduler/escheduler_images
Dockerfile address: https://github.com/qiaozhanwei/escheduler_dockerfile/tree/master/docker_escheduler
---
## Q: What needs attention in install.sh
A: 1. If a replacement variable contains special characters, **escape them with the \ character**.
2, installPath="/data1_1T/dolphinscheduler", **this directory can not be the same as the install.sh directory currently installed with one click.**
3, deployUser = "dolphinscheduler", **the deployment user must have sudo privileges**, because the worker is executed by sudo -u tenant sh xxx.command
4, monitorServerState = "false", whether the service monitoring script is started, the default is not to start the service monitoring script. **If the service monitoring script is started, the master and worker services are monitored every 5 minutes, and if the machine is down, it will automatically restart.**
5, hdfsStartupSate="false", whether to enable HDFS resource upload function. The default is not enabled. **If it is not enabled, the resource center cannot be used.** If enabled, you need to configure the configuration of fs.defaultFS and yarn in conf/common/hadoop/hadoop.properties. If you use namenode HA, you need to copy core-site.xml and hdfs-site.xml to the conf root directory.
Note: **The 1.0.x version does not automatically create the hdfs root directory, you need to create it yourself, and you need to deploy the user with hdfs operation permission.**
---
## Q : Process definition and process instance offline exception
A : For **versions prior to 1.0.4**, modify the code under the escheduler-api cn.escheduler.api.quartz package.
```java
public boolean deleteJob(String jobName, String jobGroupName) {
    lock.writeLock().lock();
    try {
        JobKey jobKey = new JobKey(jobName, jobGroupName);
        if (scheduler.checkExists(jobKey)) {
            logger.info("try to delete job, job name: {}, job group name: {},", jobName, jobGroupName);
            return scheduler.deleteJob(jobKey);
        } else {
            return true;
        }
    } catch (SchedulerException e) {
        logger.error(String.format("delete job : %s failed", jobName), e);
    } finally {
        lock.writeLock().unlock();
    }
    return false;
}
```
---
## Q: Can the tenant created before the HDFS startup use the resource center normally?
A: No. Because the tenant was created before HDFS was started, the tenant directory was not registered in HDFS, so the resource center will report an error.
---
## Q: In a multi-master and multi-worker setup, how is fault tolerance handled when a service is lost?
A: **Note:** **The Master monitors both Master and Worker services.**
1. If a Master service is lost, another Master will take over the processes of the failed Master and continue to monitor the Worker task status.
2. If a Worker service is lost, the Master will detect that the Worker service is gone; if there is a Yarn task, it will kill the Yarn task and retry.
Please see the fault-tolerant design in the architecture for details.
---
## Q: Fault tolerance for a machine running both Master and Worker
A: Version 1.0.3 only implements fault tolerance for the Master startup process, not Worker fault tolerance. That is, if the Worker hangs while no Master exists, the running process will have problems. We will add Master and Worker startup fault tolerance in version **1.1.0** to fix this. If you want to work around this manually, you need to **set the tasks that were running on the dropped Worker, and the process instances that were running across the restart, to the failed state**, and then resume the process from the failed node.
---
## Q: The schedule is easily set to execute every second by mistake
A: Pay attention when setting the timing: if the first digit (* * * * * ? *) is set to *, it means execution every second. **We will add a list of recently scheduled times in version 1.1.0.** You can check the next 5 run times of a cron expression online at http://cron.qqe2.com/
---
## Q: Is there a valid time range for timing?
A: Yes, **if the timing start and end time is the same time, then this timing will be invalid timing. If the end time of the start and end time is smaller than the current time, it is very likely that the timing will be automatically deleted.**
---
## Q : There are several implementations of task dependencies
A: 1. Task dependencies within a **DAG** are resolved by segmenting the DAG **from its zero in-degree nodes**.
2. There are also **dependent task nodes**, which can achieve cross-process task or process dependencies; please refer to the (DEPENDENT) node design in the system-manual.
Note: **Cross-project process or task dependencies are not supported.**
---
## Q: There are several ways to start the process definition.
A: 1. In **the process definition list**, click the **Start** button.
2. **Add a schedule to the process definition** in the process definition list to start it on schedule.
3. On the process definition **view or edit** DAG page, right-click any **task node** and start the process definition.
4. You can edit the DAG of a process definition and set the run flag of some tasks to **prohibit running**; when the process definition is started, the connections of those nodes are removed from the DAG.
---
## Q : Python task setting Python version
A: 1,**for the version after 1.0.3** only need to modify PYTHON_HOME in conf/env/.dolphinscheduler_env.sh
```
export PYTHON_HOME=/bin/python
```
Note: **PYTHON_HOME** here is the absolute path of the python command, not a conventional PYTHON_HOME directory. Also note that when exporting PATH, you need to append it directly:
```
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH
```
2,For versions prior to 1.0.3, the Python task only supports the Python version of the system. It does not support specifying the Python version.
---
## Q: A Worker task generates child processes through sudo -u tenant sh xxx.command; will they be killed when the task is killed?
A: We will add task kill in 1.0.4, which kills all the child processes generated by the task.
---
## Q : How to use the queue in DolphinScheduler, what does the user queue and tenant queue mean?
A: The queue in DolphinScheduler can be configured on the user or the tenant. **The queue specified on the user has higher priority than the tenant queue.** For example, to specify a queue for an MR task, the queue is specified by mapreduce.job.queuename.
Note: when specifying the queue with the above method, MR tasks use the following approach:
```
Configuration conf = new Configuration();
GenericOptionsParser optionParser = new GenericOptionsParser(conf, args);
String[] remainingArgs = optionParser.getRemainingArgs();
```
For a Spark task, specify the queue with the --queue option.
---
## Q : Master or Worker reports the following alarm
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs/images/master_worker_lack_res.png" width="60%" />
</p>
A: Change **master.reserved.memory** in conf/master.properties to a smaller value, say 0.1, or change **worker.reserved.memory** in conf/worker.properties to a smaller value, say 0.1.
---
## Q: The hive version is 1.1.0+cdh5.15.0, and the Hive SQL task connection reports an error.
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs/images/cdh_hive_error.png" width="60%" />
</p>
A: Change the hive-jdbc dependency in the pom from
```
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.0</version>
</dependency>
```
to
```
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0</version>
</dependency>
```
---
## Q: How to add a worker server
A: 1. Create the deployment user and hosts mapping; please refer to section 1.3 of [cluster deployment](https://dolphinscheduler.apache.org/en-us/docs/laster/user_doc/installation/cluster.html)
2. Configure hosts mapping and ssh access and modify directory permissions; please refer to section 1.4 of [cluster deployment](https://dolphinscheduler.apache.org/en-us/docs/laster/user_doc/installation/cluster.html)
3. Copy the deployment directory from a worker server that has already been deployed
4. Go to the bin dir, then start the worker server and logger server
```
./dolphinscheduler-daemon.sh start worker-server
./dolphinscheduler-daemon.sh start logger-server
```
---
## Q: When DolphinScheduler releases a new version, what changes between the current and the latest version, how to upgrade, and what is the version number specification?
A: 1. The release process of an Apache Project happens on the mailing list. You can subscribe to DolphinScheduler's mailing list and you will receive release emails while a release is in process. Please follow this [introduction](https://github.com/apache/dolphinscheduler#get-help) to subscribe to DolphinScheduler's mailing list.
2. When a new version is published, there is a release note describing the change log, and there is also an upgrade document for moving from the previous version to the new one.
3. The version number is x.y.z: an increase of x represents a new architecture version; an increase of y means it is incompatible with the previous y version and needs to be upgraded by script or other manual processing; an increase of z represents a bug fix and the upgrade is fully compatible, with no additional processing required. One remaining problem: the 1.0.2 upgrade is not compatible with 1.0.1 and requires an upgrade script.
---
## Q: Can subsequent tasks execute even if the preceding task failed?
A: When starting the workflow, you can set the task failure strategy: continue or end.
![set task failure strategy](https://user-images.githubusercontent.com/15833811/80368215-ee378080-88be-11ea-9074-01a33d012b23.png)
---
## Q : Workflow template DAG, workflow instance, work task and what is the relationship among them? A DAG supports a maximum concurrency of 100, does it mean that 100 workflow instances are generated and run concurrently? A task node in a DAG also has a concurrent number configuration. Does it mean that tasks can run concurrently with multiple threads? Is the maximum number 100?
A:
1.2.1 version
```
master.properties
Control the max parallel number of master node workflows
master.exec.threads=100
Control the max number of parallel tasks in each workflow
master.exec.task.number=20
worker.properties
Control the max parallel number of worker node tasks
worker.exec.threads=100
```
---
## Q: The worker group management page shows no buttons
<p align="center">
<img src="https://user-images.githubusercontent.com/39816903/81903776-d8cb9180-95f4-11ea-98cb-94ca1e6a1db5.png" width="60%" />
</p>
A: In version 1.3.0 we want to support k8s, where IPs change constantly, so worker groups can no longer be configured on the UI; instead, a worker configures its group name in worker.properties.
---
## Q : Why not add mysql jdbc connector to docker image
A: The license of the mysql jdbc connector is not compatible with the Apache v2 license, so it can't be included in the docker image.
---
## Q: It always fails when a task instance submits multiple yarn applications
<p align="center">
<img src="https://user-images.githubusercontent.com/16174111/81312485-476e9380-90b9-11ea-9aad-ed009db899b1.png" width="60%" />
</p>
A: This bug has been fixed in dev and is tracked in the Requirement/TODO list.
---
## Q: Master server and worker server stop abnormally after running for a few days
<p align="center">
<img src="https://user-images.githubusercontent.com/18378986/81293969-c3101680-90a0-11ea-87e5-ac9f0dd53f5e.png" width="60%" />
</p>
A: Session timeout is too short, only 0.3 seconds. Change the config item in zookeeper.properties:
```
zookeeper.session.timeout=60000
zookeeper.connection.timeout=30000
```
---
## Q: Starting with the default docker-compose configuration shows zookeeper errors
<p align="center">
<img src="https://user-images.githubusercontent.com/42579056/80374318-13c98780-88c9-11ea-8d5f-53448b957f02.png" width="60%" />
</p>
A: This problem is solved in dev-1.3.0 by this [pr](https://github.com/apache/dolphinscheduler/pull/2595); brief change log:
```
1. add zookeeper environment variable ZOO_4LW_COMMANDS_WHITELIST in docker-compose.yml file.
2. change the data type of minLatency, avgLatency and maxLatency from int to float.
```
---
## Q: The interface shows some tasks running all the time when the DB is delayed, and the log shows the task instance is null
<p align="center">
<img src="https://user-images.githubusercontent.com/51871547/80302626-b1478d00-87dd-11ea-97d4-08aa2244a6d0.jpg" width="60%" />
</p>
A: This [bug](https://github.com/apache/dolphinscheduler/issues/1477) describes the problem in detail, and it has been solved in version 1.2.1.
For versions below 1.2.1, some tips for this situation:
```
1. clear the task queue in zk for path: /dolphinscheduler/task_queue
2. change the state of the task to failed( integer value: 6).
3. run the work flow by recover from failed
```
---
## Q: The Zookeeper masters znode lists IP address 127.0.0.1 instead of the wanted eth0 or eth1 IP, and the task log may not be viewable
A: Bug fix:
```
1, confirm hostname
$hostname
hadoop1
2, hostname -i
127.0.0.1 10.3.57.15
3, edit /etc/hosts,delete hadoop1 from 127.0.0.1 record
$cat /etc/hosts
127.0.0.1 localhost
10.3.57.15 ds1 hadoop1
4, hostname -i
10.3.57.15
```
The hostname command returns the server hostname, and hostname -i returns all matching IPs configured in /etc/hosts. So I deleted the hostname entry matched with 127.0.0.1 and kept only the internal IP resolution, instead of removing the whole 127.0.0.1 resolution record. As long as the hostname command resolves to the correct internal IP configured in /etc/hosts, this bug is fixed. DolphinScheduler uses the first record returned by the hostname -i command. In my opinion, DS should not use hostname -i to get the IP, because in many companies the devops team configures the server name; we suggest using an IP configured in a configuration file or znode instead of /etc/hosts.
---
## Q: Setting a per-second-frequency task causes the system to crash
A: The scheduling system does not support per-second-frequency tasks.
---
## Q: Compiling the front-end code (dolphinscheduler-ui) shows an error: cannot download "https://github.com/sass/node-sass/releases/download/v4.13.1/darwin-x64-72_binding.node"
A: 1. cd into dolphinscheduler-ui and delete the node_modules directory
```
sudo rm -rf node_modules
```
2, install node-sass through npm.taobao.org
```
sudo npm uninstall node-sass
sudo npm i node-sass --sass_binary_site=https://npm.taobao.org/mirrors/node-sass/
```
3. If the 2nd step fails, please refer to the [frontend development guide](https://dolphinscheduler.apache.org/en-us/development/frontend-development.html)
```
sudo npm rebuild node-sass
```
Once this problem is solved, if you don't want to download this node binding every time, you can set the system environment variable SASS_BINARY_PATH=/xxx/xxx/xxx/xxx.node.
---
## Q : How to configure MySQL as the database instead of PostgreSQL
A: 1, Edit the Maven config file in the project root directory and remove the `test` scope so that the MySQL driver can be loaded.
```
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
<scope>test</scope>
</dependency>
```
2, Edit the `application-dao.properties` and `quartz.properties` config files to use the MySQL driver (see the sketch below).
The default is the PostgreSQL driver because of the license problem.
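A hedged sketch of the JDBC entries in `application-dao.properties` (the host, database and credentials below are placeholders; apply the equivalent driver and URL change in `quartz.properties` as well):
```
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler
```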
---
## Q : How does a shell task run
A: 1, Where is the task executed? To run the task on a specific worker, create a worker group in the Security Center; the task can then be sent to that particular worker. If a worker group has multiple servers, which server actually executes the task is determined by the scheduler and is random.
2, If it is a shell file at a path on the server, how to point to that path? Referring to a shell file on the server involves permission issues and is not recommended. It is recommended to use the storage function of the resource center and then reference the resource in the shell editor; the system will download the script to the execution directory for you. If the task depends on resource center files, the worker uses "hdfs dfs -get" to fetch the resource files from HDFS and then runs the task in /tmp/escheduler/exec/process; this path can be customized when installing DolphinScheduler.
3, Which user executes the task? The task is run by the tenant through "sudo -u ${tenant}"; the tenant is a Linux user (see the sketch below).
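Conceptually, the worker switches to the tenant user before executing the generated task script, roughly like the sketch below (the script path is hypothetical and only illustrates the execution directory mentioned above):
```shell
# illustrative only: run the generated task script as the tenant's Linux user
sudo -u ${tenant} sh /tmp/escheduler/exec/process/<project>/<process-instance>/<task>_node.sh
```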
---
## Q : What’s the best deploy mode you suggest in production env
A: 1, I suggest using 3 nodes for stability if you don't have too many tasks to run, and it is better to deploy the Master and Worker servers on different nodes. If you only have one node, you can of course only deploy them together. How many machines you need is determined by your business; the DolphinScheduler system itself does not use many resources. Test more, and you'll find the right number of machines.
---
## Q : DEPENDENT Task Node
A: 1, The DEPENDENT task node does not actually have a script; it is used to configure data-cycle dependency logic, and task nodes added after it realize cross-cycle task dependencies.
---
## Q : How to change the boot port of the master
<p align="center">
<img src="https://user-images.githubusercontent.com/8263441/62352160-0f3e9100-b53a-11e9-95ba-3ae3dde49c72.png" width="60%" />
</p>
A: 1, modify application_master.properties, for example: server.port=12345.
---
## Q : Scheduled tasks cannot be online
A: 1, We can successfully create a scheduled task and a record is added to the t_scheduler_schedules table, but when clicking "online" the front page shows no reaction and the table t_scheduler_schedules is locked. As a test, setting the field release_state to 1 in t_scheduler_schedules makes the task display the online state. For DS versions above 1.2 the table name is t_ds_schedules; for other versions the table name is t_scheduler_schedules.
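As a last-resort workaround sketch (assuming a MySQL metadata database and DS 1.2+, so the table is `t_ds_schedules`; the connection details and schedule id are placeholders, and you should back up the data first):
```shell
# manually set the schedule online (release_state = 1) for schedule id 1
mysql -h 127.0.0.1 -u dolphinscheduler -p dolphinscheduler \
  -e "UPDATE t_ds_schedules SET release_state = 1 WHERE id = 1;"
```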
---
## Q : What is the address of swagger ui
A: 1, For version 1.2+ it is http://apiServerIp:apiServerPort/dolphinscheduler/doc.html; for other versions it is http://apiServerIp:apiServerPort/escheduler/doc.html.
---
## Q : Front-end installation package is missing files
<p align="center">
<img src="https://user-images.githubusercontent.com/41460919/61437083-d960b080-a96e-11e9-87f1-297ba3aca5e3.png" width="60%" />
</p>
<p align="center">
<img src="https://user-images.githubusercontent.com/41460919/61437218-1b89f200-a96f-11e9-8e48-3fac47eb2389.png" width="60%" />
</p>
A: 1, The user changed the api server config file and the item
![apiServerContextPath](https://user-images.githubusercontent.com/41460919/61678323-1b09a680-ad35-11e9-9707-3ba68bbc70d6.png), which led to the problem. After restoring the default value, the problem was solved.
---
## Q : Upload a relatively large file blocked
<p align="center">
<img src="https://user-images.githubusercontent.com/21357069/58231400-805b0e80-7d69-11e9-8107-7f37b06a95df.png" width="60%" />
</p>
A: 1, Edit the nginx config file and raise the upload max size, e.g. `client_max_body_size 1024m` (see the snippet below).
2, If the version of Google Chrome is old, update the browser to the latest version.
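A minimal sketch of the nginx change (the surrounding server block is a placeholder; only `client_max_body_size` matters here), after which reload nginx:
```
server {
    listen 8888;
    # raise the upload limit for resource files
    client_max_body_size 1024m;
}
```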
---
## Q : Creating a Spark data source and clicking "Test Connection" makes the system fall back to the login page
A: 1, Edit the nginx config file /etc/nginx/conf.d/escheduler.conf and increase the proxy timeouts:
```
proxy_connect_timeout 300s;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
```
---
## Q : How to subscribe to the DolphinScheduler development mailing list
A: In the process of using DolphinScheduler, if you have any questions, ideas or suggestions, you can participate in building the DolphinScheduler community through the Apache mailing list. Sending a subscription email is very simple; the steps are as follows:
1, Send an email to dev-subscribe@dolphinscheduler.apache.org with your own email address, subject and content.
2, Receive the confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if not received, please check whether the email was automatically classified as spam, promotion, subscription, etc.). Then reply directly to that email, or click the link in the email to reply quickly; the subject and content are arbitrary.
3, Receive a welcome email. After completing the above steps, you will receive a welcome email with the subject WELCOME to dev@dolphinscheduler.apache.org, and you have successfully subscribed to the Apache DolphinScheduler mailing list.
---
## Q : Workflow Dependency
A: 1, It is currently judged according to natural days. "End of last month": checks whether workflow A has an instance whose start_time/scheduler_time is between '2019-05-31 00:00:00' and '2019-05-31 23:59:59'. "Last month": checks that there is a completed A instance for every day from the 1st to the end of last month. "Last week": there must be completed A instances on all 7 days of last week. "The first two days": judging yesterday and the day before yesterday, there must be a completed A instance on both days.
---
## Q : DS Backend Interface Document
A: 1, http://106.75.43.194:8888/dolphinscheduler/doc.html?language=en.
## During the operation of DolphinScheduler, the IP address is obtained incorrectly
When the master service and worker service are registered with zookeeper, relevant information will be created in the form of ip:port
If the ip address is obtained incorrectly, please check the network information. For example, in the Linux system, use the `ifconfig` command to view the network information. The following figure is an example:
<p align="center">
<img src="/img/network/network_config.png" width="60%" />
</p>
You can use the three strategies provided by dolphinscheduler to get the available ip:
* default: First use the internal network card to obtain the IP address, then the external network card. If both fail, use the address of the first available network card.
* inner: Use the internal network card to obtain the IP address; if it fails, throw an exception.
* outer: Use the external network card to obtain the IP address; if it fails, throw an exception.
Modify the configuration in `common.properties`:
```shell
# network IP gets priority, default: inner outer
# dolphin.scheduler.network.priority.strategy=default
```
After the configuration is modified, restart the service for it to take effect.
If the IP address is still wrong, please download [dolphinscheduler-netutils.jar](/asset/dolphinscheduler-netutils.jar) to the machine, execute the following commands and feed the output back to the community developers:
```shell
java -jar target/dolphinscheduler-netutils.jar
```
## Configure sudo to be password-free, to solve the problem that the default sudo authority is too broad or root authority cannot be obtained
Configure the sudo permission of the dolphinscheduler account to act as an ordinary-user manager limited to a set of ordinary users, and restrict specified users to run certain commands on specified hosts. For detailed configuration, please see sudo rights management.
For example, the sudo permission management configuration below allows the dolphinscheduler OS account to operate only as users userA, userB and userC (users userA, userB and userC are used for multi-tenant job submission to the big data cluster):
```shell
echo 'dolphinscheduler ALL=(userA,userB,userC) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
```
---
## Q : Deploy for multiple YARN clusters
A: Deploy different workers in different YARN clusters; the steps are as follows (e.g. AWS EMR):
1. Deploying the worker server on the master node of the EMR cluster
2. Change `yarn.application.status.address` in `conf/common.properties` to the current EMR cluster's YARN URL (see the sketch after this list)
3. Execute command `bin/dolphinscheduler-daemon.sh start worker-server` and `bin/dolphinscheduler-daemon.sh start logger-server` to start worker-server and logger-server
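For step 2, a hedged sketch of the `conf/common.properties` entry (assuming the default property format; the ResourceManager host below is a placeholder for the EMR master node):
```shell
# YARN ResourceManager address of the EMR cluster this worker submits to
yarn.application.status.address=http://<emr-master-node>:8088/ws/v1/cluster/apps/%s
```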
---
## Q : Update process definition error: Duplicate key TaskDefinition
A: Before DS 2.0.4 (and after 2.0.0-alpha), duplicate TaskDefinition keys may appear due to version switching, which can cause workflow updates to fail. You can refer to the following SQL to delete the duplicate data, taking MySQL as an example (Note: be sure to back up the original data before operating; the SQL is from pr [#8408](https://github.com/apache/dolphinscheduler/pull/8408)):
```SQL
DELETE FROM t_ds_process_task_relation_log WHERE id IN
(
SELECT
x.id
FROM
(
SELECT
aa.id
FROM
t_ds_process_task_relation_log aa
JOIN
(
SELECT
a.process_definition_code
,MAX(a.id) as min_id
,a.pre_task_code
,a.pre_task_version
,a.post_task_code
,a.post_task_version
,a.process_definition_version
,COUNT(*) cnt
FROM
t_ds_process_task_relation_log a
JOIN (
SELECT
code
FROM
t_ds_process_definition
GROUP BY code
)b ON b.code = a.process_definition_code
WHERE 1=1
GROUP BY a.pre_task_code
,a.post_task_code
,a.pre_task_version
,a.post_task_version
,a.process_definition_code
,a.process_definition_version
HAVING COUNT(*) > 1
)bb ON bb.process_definition_code = aa.process_definition_code
AND bb.pre_task_code = aa.pre_task_code
AND bb.post_task_code = aa.post_task_code
AND bb.process_definition_version = aa.process_definition_version
AND bb.pre_task_version = aa.pre_task_version
AND bb.post_task_version = aa.post_task_version
AND bb.min_id != aa.id
)x
)
;
DELETE FROM t_ds_task_definition_log WHERE id IN
(
SELECT
x.id
FROM
(
SELECT
a.id
FROM
t_ds_task_definition_log a
JOIN
(
SELECT
code
,name
,version
,MAX(id) AS min_id
FROM
t_ds_task_definition_log
GROUP BY code
,name
,version
HAVING COUNT(*) > 1
)b ON b.code = a.code
AND b.name = a.name
AND b.version = a.version
AND b.min_id != a.id
)x
)
;
```
---
## We will collect more FAQ later

14
docs/docs/en/guide/alert/alert_plugin_user_guide.md

@ -0,0 +1,14 @@
# Alert Component User Guide
## How to Create Alert Plugins and Alert Groups
In version 2.0.0, users need to create alert instances and then associate them with alert groups. An alert group can use multiple alert instances and will notify through them one by one.
First, go to the Security Center page. Select Alarm Group Management, click Alarm Instance Management on the left and create an alarm instance. Select the corresponding alarm plug-in and fill in the relevant alarm parameters.
Then select Alarm Group Management, create an alarm group, and choose the corresponding alarm instance.
<img src="/img/alert/alert_step_1.png">
<img src="/img/alert/alert_step_2.png">
<img src="/img/alert/alert_step_3.png">
<img src="/img/alert/alert_step_4.png">

27
docs/docs/en/guide/alert/dingtalk.md

@ -0,0 +1,27 @@
# DingTalk
If you need to use `DingTalk` for alerting, create an alert instance in the alert instance management and select the DingTalk plugin.
The following shows the `DingTalk` configuration example:
![dingtalk-plugin](/img/alert/dingtalk-plugin.png)
## Parameter Configuration
* Webhook
> The format is: https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
* Keyword
> Custom keywords for security settings
* Secret
> Signature of security settings
* MessageType
> Support both text and markdown types
When a custom bot sends a message, you can specify the "@person list" by mobile phone number. The selected people in the "@person list" will receive a `@` message reminder; the reminder is delivered even in `Do not disturb` mode, and "someone @ you" appears in the message.
* @Mobiles
> The mobile phone number of the "@person"
* @UserIds
> The user ID by "@person"
* @All
> @Everyone
[DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)

64
docs/docs/en/guide/alert/enterprise-webexteams.md

@ -0,0 +1,64 @@
# Webex Teams
If you need to use `Webex Teams` to alert, create an alert instance in the alert instance management, and choose the WebexTeams plugin.
You can pick private alert or room group chat alert.
The following is the `WebexTeams` configuration example:
![enterprise-webexteams-plugin](/img/alert/enterprise-webexteams-plugin.png)
## Parameter Configuration
* botAccessToken
> The robot's access token
* roomID
> The ID of the room that receives message (only support one room ID)
* toPersonId
> The person ID of the recipient when sending a private 1:1 message
* toPersonEmail
> The email address of the recipient when sending a private 1:1 message
* atSomeoneInRoom
> If the message destination is a room, the emails of the people to be @; use `,` (English commas) to separate multiple emails
* destination
> The destination of the message (one message only support one destination)
## Create a Bot
To create a bot, visit [Official Website My-Apps](https://developer.webex.com/my-apps), click `Create a New APP` and select `Create a Bot`, fill in the bot information, and acquire the `bot username` and `bot ID` for further usage.
![enterprise-webexteams-bot-info](/img/alert/enterprise-webexteams-bot.png)
## Create a Room
To create a room, visit [Official Website for Developer APIs](https://developer.webex.com/docs/api/v1/rooms/create-a-room), create a new room, fill in the room name, and acquire the `id` (room ID) and `creatorId` for further usage.
![enterprise-webexteams-room-info](/img/alert/enterprise-webexteams-room.png)
### Invite Bot to the Room
Invite the bot to the room by its bot email (bot username).
## Send Private Message
In this way, you can send a private message to a person by `User Email` or `UserId` in a private conversation. Fill in the `To Person Id` or `To Person Email` (recommended) and the `Bot Access Token`, and select `Destination` `personEmail` or `personId`.
The `user Email` is the user's registration email.
The `userId` can be acquired from the `creatorId` returned by the create-a-room API.
![enterprise-webexteams-private-message-form](/img/alert/enterprise-webexteams-private-form.png)
### Private Alert Message Example
![enterprise-webexteams-private-message-example](/img/alert/enterprise-webexteams-private-msg.png)
## Send Group Room Message
In this way, you can send a group room message to a room by `Room ID`. Fill in the `Room Id` and `Bot Access Token` and select `Destination` `roomId`.
The `Room ID` can be acquired from the `id` returned by the create-a-room API.
![enterprise-webexteams-room](/img/alert/enterprise-webexteams-group-form.png)
### Group Room Alert Message Example
![enterprise-webexteams-room-message-example](/img/alert/enterprise-webexteams-room-msg.png)
[WebexTeams Application Bot Guide](https://developer.webex.com/docs/bots)
[WebexTeams Message Guide](https://developer.webex.com/docs/api/v1/messages/create-a-message)

14
docs/docs/en/guide/alert/enterprise-wechat.md

@ -0,0 +1,14 @@
# Enterprise WeChat
If you need to use `Enterprise WeChat` to alert, create an alert instance in the alert instance management, and choose the WeChat plugin.
The following is the `WeChat` configuration example:
![enterprise-wechat-plugin](/img/alert/enterprise-wechat-plugin.png)
The parameter `send.type` corresponds to app and group chat respectively:
APP: https://work.weixin.qq.com/api/doc/90000/90135/90236
Group Chat: https://work.weixin.qq.com/api/doc/90000/90135/90248
The parameter `user.send.msg` corresponds to the `content` in the document, and the corresponding variable is `{msg}`.

42
docs/docs/en/guide/alert/telegram.md

@ -0,0 +1,42 @@
# Telegram
If you need `Telegram` to alert, create an alert instance in the alert instance management, and choose the `Telegram` plugin.
The following shows the `Telegram` configuration example:
![telegram-plugin](/img/alert/telegram-plugin.png)
## Parameter Configuration
* WebHook:
> The WebHook of Telegram used when the robot sends messages
* botToken
> The robot's access token
* chatId
> The chat ID of the subscribed Telegram channel
* parseMode
> Message parse type (support txt, markdown, markdownV2, html)
* EnableProxy
> Enable the proxy server
* Proxy
> the proxy address of the proxy server
* Port
> the proxy port of Proxy-Server
* User
> Authentication(Username) for the proxy server
* Password
> Authentication(Password) for the proxy server
**NOTICE**: The webhook needs to be able to receive and use the same HTTP POST JSON body that DolphinScheduler constructs; the following shows the JSON body:
```json
{
"text": "[{\"projectId\":1,\"projectName\":\"p1\",\"owner\":\"admin\",\"processId\":35,\"processDefinitionCode\":4928367293568,\"processName\":\"s11-3-20220324084708668\",\"taskCode\":4928359068928,\"taskName\":\"s1\",\"taskType\":\"SHELL\",\"taskState\":\"FAILURE\",\"taskStartTime\":\"2022-03-24 08:47:08\",\"taskEndTime\":\"2022-03-24 08:47:09\",\"taskHost\":\"192.168.1.103:1234\",\"logPath\":\"\"}]",
"chat_id": "chat id number"
}
```
References:
- [Telegram Application Bot Guide](https://core.telegram.org/bots)
- [Telegram Bots Api](https://core.telegram.org/bots/api)
- [Telegram SendMessage Api](https://core.telegram.org/bots/api#sendmessage)

39
docs/docs/en/guide/datasource/hive.md

@ -0,0 +1,39 @@
# HIVE
## Use HiveServer2
![hive](/img/new_ui/dev/datasource/hive.png)
- Datasource: select `HIVE`
- Datasource name: enter the name of the DataSource
- Description: enter a description of the DataSource
- IP/Host Name: enter the HIVE service IP
- Port: enter the HIVE service port
- Username: set the username for HIVE connection
- Password: set the password for HIVE connection
- Database name: enter the database name of the HIVE connection
- Jdbc connection parameters: parameter settings for HIVE connection, in JSON format
> NOTICE: If you wish to execute multiple HIVE SQL in the same session, you could set `support.hive.oneSession = true` in `common.properties`.
> It is helpful when you try to set env variables before running HIVE SQL. Default value of `support.hive.oneSession` is `false` and multi-SQLs run in different sessions.
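For example, a minimal sketch of that `common.properties` entry with the non-default value:
```conf
# run multiple HIVE SQL statements of one task in the same session
support.hive.oneSession=true
```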
## Use HiveServer2 HA ZooKeeper
![hive-server2](/img/new_ui/dev/datasource/hiveserver2.png)
NOTICE: If Kerberos is disabled, ensure the parameter `hadoop.security.authentication.startup.state` is `false` and the parameter `java.security.krb5.conf.path` is left empty.
If **Kerberos** is enabled, you need to set the following parameters in `common.properties`:
```conf
# whether to startup kerberos
hadoop.security.authentication.startup.state=true
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
```

6
docs/docs/en/guide/datasource/introduction.md

@ -0,0 +1,6 @@
# DataSource
DataSource supports MySQL, PostgreSQL, Hive/Impala, Spark, ClickHouse, Oracle, SQL Server and other DataSources.
- Click "Data Source Center -> Create Data Source" to create different types of DataSources according to requirements.
- Click "Test Connection" to test whether the DataSource can connect successfully.

14
docs/docs/en/guide/datasource/mysql.md

@ -0,0 +1,14 @@
# MySQL
![mysql](/img/new_ui/dev/datasource/mysql.png)
- Datasource: select MYSQL
- Datasource name: enter the name of the DataSource
- Description: enter a description of the DataSource
- IP/Host Name: enter the MYSQL service IP
- Port: enter the MYSQL service port
- Username: set the username for MYSQL connection
- Password: set the password for MYSQL connection
- Database name: enter the database name of the MYSQL connection
- Jdbc connection parameters: parameter settings for MYSQL connection, in JSON format

13
docs/docs/en/guide/datasource/postgresql.md

@ -0,0 +1,13 @@
# PostgreSQL
![postgresql](/img/new_ui/dev/datasource/postgresql.png)
- Datasource: select POSTGRESQL
- Datasource name: enter the name of the DataSource
- Description: enter a description of the DataSource
- IP/Host Name: enter the PostgreSQL service IP
- Port: enter the PostgreSQL service port
- Username: set the username for PostgreSQL connection
- Password: set the password for PostgreSQL connection
- Database name: enter the database name of the PostgreSQL connection
- Jdbc connection parameters: parameter settings for PostgreSQL connection, in JSON format

13
docs/docs/en/guide/datasource/spark.md

@ -0,0 +1,13 @@
# Spark
![sparksql](/img/new_ui/dev/datasource/sparksql.png)
- Datasource: select Spark
- Datasource name: enter the name of the DataSource
- Description: enter a description of the DataSource
- IP/Host Name: enter the Spark service IP
- Port: enter the Spark service port
- Username: set the username for Spark connection
- Password: set the password for Spark connection
- Database name: enter the database name of the Spark connection
- Jdbc connection parameters: parameter settings for Spark connection, in JSON format

248
docs/docs/en/guide/expansion-reduction.md

@ -0,0 +1,248 @@
# DolphinScheduler Expansion and Reduction
## Expansion
This article describes how to add a new master service or worker service to an existing DolphinScheduler cluster.
```
Attention: There cannot be more than one master service process or worker service process on a physical machine.
If the physical machine hosting the expansion master or worker node already has the scheduling service installed, check the [Modify Configuration] section, edit the configuration file `conf/config/install_config.conf` on all nodes, add the masters or workers parameter, and restart the scheduling cluster.
```
### Basic software installation
* [required] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (version 1.8+): must be installed; configure the `JAVA_HOME` and `PATH` variables under `/etc/profile` (see the sketch after the note below)
* [optional] If the expansion is a worker node, you need to consider whether to install an external client, such as Hadoop, Hive, Spark Client.
```markdown
Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
```
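A minimal sketch of the `/etc/profile` entries for the JDK mentioned above (the install path is only an example):
```shell
# append to /etc/profile, then run: source /etc/profile
export JAVA_HOME=/opt/soft/java
export PATH=$JAVA_HOME/bin:$PATH
```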
### Get Installation Package
- Check the version of DolphinScheduler used in your existing environment and get the installation package of the corresponding version; if the versions differ, there may be compatibility problems.
- Confirm the unified installation directory of the other nodes. This article assumes that DolphinScheduler is installed in the `/opt/` directory, with the full path `/opt/dolphinscheduler`.
- Please download the corresponding version of the installation package to the server installation directory, uncompress it, rename it to `dolphinscheduler`, and store it in the `/opt` directory.
- Add the database dependency package: this document uses a MySQL database, so add the `mysql-connector-java` driver package to the `/opt/dolphinscheduler/lib` directory (see the sketch after the commands below).
```shell
# create the installation directory, please do not create the installation directory in /root, /home and other high privilege directories
mkdir -p /opt
cd /opt
# decompress
tar -zxvf apache-dolphinscheduler-1.3.8-bin.tar.gz -C /opt
cd /opt
mv apache-dolphinscheduler-1.3.8-bin dolphinscheduler
```
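A hedged sketch of adding the driver mentioned above (assuming the connector jar has already been downloaded to the current directory; the version is only an example):
```shell
# put the MySQL JDBC driver on DolphinScheduler's classpath
cp ./mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```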
```markdown
Attention: You can copy the installation package directly from an existing environment to an expanded physical machine.
```
### Create Deployment Users
- Create a deployment user on **all** expansion machines, and make sure to configure password-free sudo. If we plan to deploy the scheduler on four expansion machines, ds1, ds2, ds3 and ds4, creating a deployment user on each machine is a prerequisite.
```shell
# to create a user, you need to log in with root and set the deployment user name, modify it by yourself, the following take `dolphinscheduler` as an example:
useradd dolphinscheduler;
# set the user password, please change it by yourself, the following take `dolphinscheduler123` as an example
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler
# configure sudo password-free
echo 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
```
```markdown
Attention:
- Since tasks are run with `sudo -u {linux-user}` to switch between different Linux users for multi-tenant jobs, the deployment user needs sudo privileges and must be password-free.
- If you find the line `Defaults requiretty` in the `/etc/sudoers` file, please also comment it out.
- If you need to use resource uploads, you also need to grant the deployment user read and write permissions on `HDFS or MinIO`.
```
### Modify Configuration
- From an existing node such as `Master/Worker`, copy the configuration directory directly to replace the configuration directory in the new node. After finishing the file copy, check whether the configuration items are correct.
```markdown
Highlights:
datasource.properties: database connection information
zookeeper.properties: information for connecting zk
common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
env/dolphinscheduler_env.sh: environment variables
```
- Modify the `dolphinscheduler_env.sh` environment variables in the `conf/env` directory according to the machine configuration (the following example assumes all the used software is installed under `/opt/soft`)
```shell
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
# export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
```
Attention: This step is very important; for example, `JAVA_HOME` and `PATH` must be configured. Entries for software that is not used can be ignored or commented out.
- Soft link the `JDK` to `/usr/bin/java` (still using `JAVA_HOME=/opt/soft/java` as an example)
```shell
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
```
- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
* To add a new master node, you need to modify the IPs and masters parameters.
* To add a new worker node, modify the IPs and workers parameters.
```shell
# which machines to deploy DS services on, separated by commas between multiple physical machines
ips="ds1,ds2,ds3,ds4"
# ssh port,default 22
sshPort="22"
# which machine the master service is deployed on
masters="existing master01,existing master02,ds1,ds2"
# the worker service is deployed on which machine, and specify the worker belongs to which worker group, the following example of "default" is the group name
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```
- If the expansion is for worker nodes, you need to set the worker group; refer to [Worker grouping](./security.md) in the Security guide.
- On all new nodes, change the directory permissions so that the deployment user has access to the DolphinScheduler directory
```shell
sudo chown -R dolphinscheduler:dolphinscheduler dolphinscheduler
```
### Restart the Cluster and Verify
- Restart the cluster
```shell
# stop command:
bin/stop-all.sh # stop all services
sh bin/dolphinscheduler-daemon.sh stop master-server # stop master service
sh bin/dolphinscheduler-daemon.sh stop worker-server # stop worker service
sh bin/dolphinscheduler-daemon.sh stop api-server # stop api service
sh bin/dolphinscheduler-daemon.sh stop alert-server # stop alert service
# start command:
bin/start-all.sh # start all services
sh bin/dolphinscheduler-daemon.sh start master-server # start master service
sh bin/dolphinscheduler-daemon.sh start worker-server # start worker service
sh bin/dolphinscheduler-daemon.sh start api-server # start api service
sh bin/dolphinscheduler-daemon.sh start alert-server # start alert service
```
```
Attention: When using `stop-all.sh` or `start-all.sh`, if the machine executing the command is not configured for password-free ssh to all machines, you will be prompted to enter passwords
```
- After completing the script, use the `jps` command to see if every node service is started (`jps` comes with the `Java JDK`)
```
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
```
After successful startup, you can view the logs, which are stored in the `logs` folder.
```Log Path
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
```
If the above services start normally and the scheduling system page is normal, check whether there is an expanded Master or Worker service in the [Monitor] of the web system. If it exists, the expansion is complete.
-----------------------------------------------------------------------------
## Reduction
Reduction means removing master or worker services from an existing DolphinScheduler cluster.
There are two steps; after performing both, the reduction is complete.
### Stop the Service on the Scaled-Down Node
* If you are scaling down the master node, identify the physical machine where the master service is located, and stop the master service on the physical machine.
* If you are scaling down a worker node, identify the physical machine where the worker service to be removed is located, and stop the worker service on that machine.
```shell
# stop command:
bin/stop-all.sh # stop all services
sh bin/dolphinscheduler-daemon.sh stop master-server # stop master service
sh bin/dolphinscheduler-daemon.sh stop worker-server # stop worker service
sh bin/dolphinscheduler-daemon.sh stop api-server # stop api service
sh bin/dolphinscheduler-daemon.sh stop alert-server # stop alert service
# start command:
bin/start-all.sh # start all services
sh bin/dolphinscheduler-daemon.sh start master-server # start master service
sh bin/dolphinscheduler-daemon.sh start worker-server # start worker service
sh bin/dolphinscheduler-daemon.sh start api-server # start api service
sh bin/dolphinscheduler-daemon.sh start alert-server # start alert service
```
```
Attention: When using `stop-all.sh` or `start-all.sh`, if the machine executing the command is not configured for password-free ssh to all machines, you will be prompted to enter passwords
```
- After the script is completed, use the `jps` command to see if every node service was successfully shut down (`jps` comes with the `Java JDK`)
```
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
```
If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
### Modify the Configuration File
- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
* To scale down the master node, modify the IPs and masters parameters.
* To scale down worker nodes, modify the IPs and workers parameters.
```shell
# which machines to deploy DS services on, "localhost" for this machine
ips="ds1,ds2,ds3,ds4"
# ssh port,default: 22
sshPort="22"
# which machine the master service is deployed on
masters="existing master01,existing master02,ds1,ds2"
# The worker service is deployed on which machine, and specify which worker group this worker belongs to, the following example of "default" is the group name
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```

123
docs/docs/en/guide/flink-call.md

@ -0,0 +1,123 @@
# Flink Calls Operating Steps
## Create a Queue
1. Log in to the scheduling system, click `Security`, then click `Queue manage` on the left, and click `Create queue` to create a queue.
2. Fill in the name and value of the queue, and click "Submit"
<p align="center">
<img src="/img/api/create_queue.png" width="80%" />
</p>
## Create a Tenant
```
1. The tenant corresponds to a Linux user, which the worker uses to submit jobs. If the Linux OS environment does not have this user, the worker will create this user when executing the script.
2. Both the tenant and the tenant code are unique and cannot be repeated, just like a person only has one name and one ID number.
3. After creating a tenant, there will be a folder in the HDFS relevant directory.
```
<p align="center">
<img src="/img/api/create_tenant.png" width="80%" />
</p>
## Create a User
<p align="center">
<img src="/img/api/create_user.png" width="80%" />
</p>
## Create a Token
1. Log in to the scheduling system, click `Security`, then click `Token manage` on the left, and click `Create token` to create a token.
<p align="center">
<img src="/img/token-management-en.png" width="80%" />
</p>
2. Select the `Expiration time` (token validity time), select `User` (choose the specified user to perform the API operation), click "Generate token", copy the `Token` string, and click "Submit".
<p align="center">
<img src="/img/create-token-en1.png" width="80%" />
</p>
## Token Usage
1. Open the API documentation page
> Address: http://{api server ip}:12345/dolphinscheduler/doc.html?language=en_US&lang=en
<p align="center">
<img src="/img/api-documentation-en.png" width="80%" />
</p>
2. Select a test API, the API selected for this test is `queryAllProjectList`
> projects/query-project-list
3. Open `Postman`, fill in the API address, enter the `Token` in `Headers`, and then send the request to view the result (a curl alternative is sketched after the screenshot below):
```
token: The Token just generated
```
<p align="center">
<img src="/img/test-api.png" width="80%" />
</p>
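As a hedged alternative to Postman, a curl sketch for the same API (the host and token values are placeholders):
```shell
# call queryAllProjectList with the generated token in the request header
curl -H "token: <the-token-just-generated>" \
  "http://<api-server-ip>:12345/dolphinscheduler/projects/query-project-list"
```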
## User Authorization
<p align="center">
<img src="/img/api/user_authorization.png" width="80%" />
</p>
## User Login
```
http://192.168.1.163:12345/dolphinscheduler/ui/#/monitor/servers/master
```
<p align="center">
<img src="/img/api/user_login.png" width="80%" />
</p>
## Upload the Resource
<p align="center">
<img src="/img/api/upload_resource.png" width="80%" />
</p>
## Create a Workflow
<p align="center">
<img src="/img/api/create_workflow1.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow2.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow3.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow4.png" width="80%" />
</p>
## View the Execution Result
<p align="center">
<img src="/img/api/execution_result.png" width="80%" />
</p>
## View Log
<p align="center">
<img src="/img/api/log.png" width="80%" />
</p>

5
docs/docs/en/guide/homepage.md

@ -0,0 +1,5 @@
# Home Page
The home page contains task status statistics, process status statistics, and workflow definition statistics for all projects of the user.
![homepage](/img/new_ui/dev/homepage/homepage.png)

39
docs/docs/en/guide/installation/cluster.md

@ -0,0 +1,39 @@
# Cluster Deployment
Cluster deployment is to deploy the DolphinScheduler on multiple machines for running massive tasks in production.
If you are new and want to experience DolphinScheduler functions, we recommend you follow the [Standalone deployment](standalone.md). If you want to experience more complete functions and schedule massive tasks, we recommend you follow the [pseudo-cluster deployment](pseudo-cluster.md). If you want to deploy DolphinScheduler in production, we recommend you follow the [cluster deployment](cluster.md) or the [Kubernetes deployment](kubernetes.md).
## Deployment Steps
Cluster deployment uses the same scripts and configuration files as [pseudo-cluster deployment](pseudo-cluster.md), so the preparation and deployment steps are the same as pseudo-cluster deployment. The difference is that [pseudo-cluster deployment](pseudo-cluster.md) is for one machine, while cluster deployment (Cluster) is for multiple machines. And steps of "Modify Configuration" are quite different between pseudo-cluster deployment and cluster deployment.
### Prerequisites and DolphinScheduler Startup Environment Preparations
Configure every machine as described in [pseudo-cluster deployment](pseudo-cluster.md), except for the sections `Prerequisites`, `Start ZooKeeper` and `Initialize the Database` of the `DolphinScheduler Startup Environment`.
### Modify Configuration
This step differs quite a lot from [pseudo-cluster deployment](pseudo-cluster.md), because the deployment script transfers the resources required for installation to each deployment machine using `scp`. So we only need to modify the configuration on the machine that runs the `install.sh` script, and the configuration will be dispatched to the cluster by `scp`. The configuration file is under the path `conf/config/install_config.conf`; here we only need to modify the sections **INSTALL MACHINE**, **DolphinScheduler ENV, Database, Registry Server** and keep the other sections the same as [pseudo-cluster deployment](pseudo-cluster.md). The following describes the parameters that must be modified:
```shell
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# Using IP or machine hostname for the server going to deploy master, worker, API server, the IP of the server
# If you using a hostname, make sure machines could connect each other by hostname
# As below, the hostname of the machine deploying DolphinScheduler is ds1, ds2, ds3, ds4, ds5, where ds1, ds2 install the master server, ds3, ds4, and ds5 installs worker server, the alert server is installed in ds4, and the API server is installed in ds5
ips="ds1,ds2,ds3,ds4,ds5"
masters="ds1,ds2"
workers="ds3:default,ds4:default,ds5:default"
alertServer="ds4"
apiServers="ds5"
```
## Start and Login DolphinScheduler
Same as [pseudo-cluster deployment](pseudo-cluster.md).
## Start and Stop Server
Same as [pseudo-cluster deployment](pseudo-cluster.md).

754
docs/docs/en/guide/installation/kubernetes.md

@ -0,0 +1,754 @@
# QuickStart in Kubernetes
Kubernetes deployment is DolphinScheduler deployment in a Kubernetes cluster, which can schedule massive tasks and can be used in production.
If you are new and want to experience DolphinScheduler functions, we recommend you follow the [Standalone deployment](standalone.md). If you want to experience more complete functions and schedule massive tasks, we recommend you follow the [pseudo-cluster deployment](pseudo-cluster.md). If you want to deploy DolphinScheduler in production, we recommend you follow the [cluster deployment](cluster.md) or the [Kubernetes deployment](kubernetes.md).
## Prerequisites
- [Helm](https://helm.sh/) version 3.1.0+
- [Kubernetes](https://kubernetes.io/) version 1.12+
- PV provisioner support in the underlying infrastructure
## Install DolphinScheduler
Please download the source code package `apache-dolphinscheduler-1.3.8-src.tar.gz`, download address: [download address](/en-us/download/download.html)
To publish a release named `dolphinscheduler`, please execute the following commands:
```
$ tar -zxvf apache-dolphinscheduler-1.3.8-src.tar.gz
$ cd apache-dolphinscheduler-1.3.8-src/docker/kubernetes/dolphinscheduler
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm dependency update .
$ helm install dolphinscheduler . --set image.tag=1.3.8
```
To publish the release named `dolphinscheduler` to the `test` namespace:
```bash
$ helm install dolphinscheduler . -n test
```
> **Tip**: If a namespace named `test` is used, the optional parameter `-n test` needs to be added to the `helm` and `kubectl` commands.
These commands are used to deploy DolphinScheduler on the Kubernetes cluster by default. The [Appendix-Configuration](#appendix-configuration) section lists the parameters that can be configured during installation.
> **Tip**: List all releases using `helm list`
The **PostgreSQL** (with username `root`, password `root` and database `dolphinscheduler`) and **ZooKeeper** services will start by default.
## Access DolphinScheduler UI
If `ingress.enabled` in `values.yaml` is set to `true`, you could access `http://${ingress.host}/dolphinscheduler` in browser.
> **Tip**: If there is a problem with ingress access, please contact the Kubernetes administrator and refer to the [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/).
Otherwise, when `api.service.type=ClusterIP` you need to execute `port-forward` commands:
```bash
$ kubectl port-forward --address 0.0.0.0 svc/dolphinscheduler-api 12345:12345
$ kubectl port-forward --address 0.0.0.0 -n test svc/dolphinscheduler-api 12345:12345 # with test namespace
```
> **Tip**: If the error of `unable to do port forwarding: socat not found` appears, you need to install `socat` first.
Access the web: `http://localhost:12345/dolphinscheduler` (Modify the IP address if needed).
Or when `api.service.type=NodePort` you need to execute the command:
```bash
NODE_IP=$(kubectl get no -n {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
NODE_PORT=$(kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}")
echo http://$NODE_IP:$NODE_PORT/dolphinscheduler
```
Access the web: `http://$NODE_IP:$NODE_PORT/dolphinscheduler`.
The default username is `admin` and the default password is `dolphinscheduler123`.
Please refer to the `Quick Start` in the chapter [Quick Start](../start/quick-start.md) to explore how to use DolphinScheduler.
## Uninstall the Chart
To uninstall or delete the `dolphinscheduler` deployment:
```bash
$ helm uninstall dolphinscheduler
```
The command removes all the Kubernetes components (except PVC) associated with the `dolphinscheduler` and deletes the release.
Run the command below to delete the PVC's associated with `dolphinscheduler`:
```bash
$ kubectl delete pvc -l app.kubernetes.io/instance=dolphinscheduler
```
> **Note**: Deleting the PVC's will delete all data as well. Please be cautious before doing it.
## Configuration
The configuration file is `values.yaml`, and the [Appendix-Configuration](#appendix-configuration) tables lists the configurable parameters of the DolphinScheduler and their default values.
## Support Matrix
| Type | Support | Notes |
| ------------------------------------------------------------ | ------------ | ------------------------------------- |
| Shell | Yes | |
| Python2 | Yes | |
| Python3 | Indirect Yes | Refer to FAQ |
| Hadoop2 | Indirect Yes | Refer to FAQ |
| Hadoop3 | Not Sure | Not tested |
| Spark-Local(client) | Indirect Yes | Refer to FAQ |
| Spark-YARN(cluster) | Indirect Yes | Refer to FAQ |
| Spark-Standalone(cluster) | Not Yet | |
| Spark-Kubernetes(cluster) | Not Yet | |
| Flink-Local(local>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-YARN(yarn-cluster) | Indirect Yes | Refer to FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Standalone(default) | Not Yet | |
| Flink-Standalone(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Kubernetes(default) | Not Yet | |
| Flink-Kubernetes(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| MapReduce | Indirect Yes | Refer to FAQ |
| Kerberos | Indirect Yes | Refer to FAQ |
| HTTP | Yes | |
| DataX | Indirect Yes | Refer to FAQ |
| Sqoop | Indirect Yes | Refer to FAQ |
| SQL-MySQL | Indirect Yes | Refer to FAQ |
| SQL-PostgreSQL | Yes | |
| SQL-Hive | Indirect Yes | Refer to FAQ |
| SQL-Spark | Indirect Yes | Refer to FAQ |
| SQL-ClickHouse | Indirect Yes | Refer to FAQ |
| SQL-Oracle | Indirect Yes | Refer to FAQ |
| SQL-SQLServer | Indirect Yes | Refer to FAQ |
| SQL-DB2 | Indirect Yes | Refer to FAQ |
## FAQ
### How to View the Logs of a Pod Container?
List all pods (aka `po`):
```
kubectl get po
kubectl get po -n test # with test namespace
```
View the logs of a pod container named `dolphinscheduler-master-0`:
```
kubectl logs dolphinscheduler-master-0
kubectl logs -f dolphinscheduler-master-0 # follow log output
kubectl logs --tail 10 dolphinscheduler-master-0 -n test # show last 10 lines from the end of the logs
```
### How to Scale API, master and worker on Kubernetes?
List all deployments (aka `deploy`):
```
kubectl get deploy
kubectl get deploy -n test # with test namespace
```
Scale api to 3 replicas:
```
kubectl scale --replicas=3 deploy dolphinscheduler-api
kubectl scale --replicas=3 deploy dolphinscheduler-api -n test # with test namespace
```
List all stateful sets (aka `sts`):
```
kubectl get sts
kubectl get sts -n test # with test namespace
```
Scale master to 2 replicas:
```
kubectl scale --replicas=2 sts dolphinscheduler-master
kubectl scale --replicas=2 sts dolphinscheduler-master -n test # with test namespace
```
Scale worker to 6 replicas:
```
kubectl scale --replicas=6 sts dolphinscheduler-worker
kubectl scale --replicas=6 sts dolphinscheduler-worker -n test # with test namespace
```
### How to Use MySQL as the DolphinScheduler's Database Instead of PostgreSQL?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image by following the instructions below:
1. Download the MySQL driver [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar).
2. Create a new `Dockerfile` to add MySQL driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
3. Build a new docker image including MySQL driver:
```
docker build -t apache/dolphinscheduler:mysql-driver .
```
4. Push the docker image `apache/dolphinscheduler:mysql-driver` to a docker registry.
5. Modify image `repository` and update `tag` to `mysql-driver` in `values.yaml`.
6. Modify postgresql `enabled` to `false` in `values.yaml`.
7. Modify externalDatabase (especially modify `host`, `username` and `password`) in `values.yaml`:
```yaml
externalDatabase:
type: "mysql"
driver: "com.mysql.jdbc.Driver"
host: "localhost"
port: "3306"
username: "root"
password: "root"
database: "dolphinscheduler"
params: "useUnicode=true&characterEncoding=UTF-8"
```
8. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
### How to Support MySQL Datasource in `Datasource manage`?
> Because of the commercial license, we cannot directly use the driver of MySQL.
>
> If you want to add a MySQL datasource, you can build a new image based on the `apache/dolphinscheduler` image by following the instructions below:
1. Download the MySQL driver [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar).
2. Create a new `Dockerfile` to add MySQL driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
3. Build a new docker image including MySQL driver:
```
docker build -t apache/dolphinscheduler:mysql-driver .
```
4. Push the docker image `apache/dolphinscheduler:mysql-driver` to a docker registry.
5. Modify image `repository` and update `tag` to `mysql-driver` in `values.yaml`.
6. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
7. Add a MySQL datasource in `Datasource manage`.
### How to Support Oracle Datasource in `Datasource manage`?
> Because of the commercial license, we cannot directly use the driver of Oracle.
>
> If you want to add an Oracle datasource, you can build a new image based on the `apache/dolphinscheduler` image by following the instructions below:
1. Download the Oracle driver [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (such as `ojdbc8-19.9.0.0.jar`)
2. Create a new `Dockerfile` to add Oracle driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
3. Build a new docker image including Oracle driver:
```
docker build -t apache/dolphinscheduler:oracle-driver .
```
4. Push the docker image `apache/dolphinscheduler:oracle-driver` to a docker registry.
5. Modify image `repository` and update `tag` to `oracle-driver` in `values.yaml`.
6. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
7. Add an Oracle datasource in `Datasource manage`.
### How to Support Python 2 pip and Custom requirements.txt?
1. Create a new `Dockerfile` to install pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
pip install --no-cache-dir -r /tmp/requirements.txt && \
rm -rf /var/lib/apt/lists/*
```
The command will install the default **pip 18.1**. If you want to upgrade pip, just add the following command.
```
pip install --no-cache-dir -U pip && \
```
2. Build a new docker image including pip:
```
docker build -t apache/dolphinscheduler:pip .
```
3. Push the docker image `apache/dolphinscheduler:pip` to a docker registry.
4. Modify image `repository` and update `tag` to `pip` in `values.yaml`.
5. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
6. Verify pip under a new Python task.
### How to Support Python 3?
1. Create a new `Dockerfile` to install Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*
```
The command will install the default **Python 3.7.3**. If you also want to install **pip3**, just replace `python3` with `python3-pip` like:
```
apt-get install -y --no-install-recommends python3-pip && \
```
2. Build a new docker image including Python 3:
```
docker build -t apache/dolphinscheduler:python3 .
```
3. Push the docker image `apache/dolphinscheduler:python3` to a docker registry.
4. Modify image `repository` and update `tag` to `python3` in `values.yaml`.
5. Modify `PYTHON_HOME` to `/usr/bin/python3` in `values.yaml`.
6. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
7. Verify Python 3 under a new Python task.
### How to Support Hadoop, Spark, Flink, Hive or DataX?
Take Spark 2.4.7 as an example:
1. Download the Spark 2.4.7 release binary `spark-2.4.7-bin-hadoop2.7.tgz`.
2. Ensure that `common.sharedStoragePersistence.enabled` is turned on.
3. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
4. Copy the Spark 2.4.7 release binary into the Docker container.
```bash
kubectl cp spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
kubectl cp -n test spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace
```
Because the volume `sharedStoragePersistence` is mounted on `/opt/soft`, all files in `/opt/soft` will not be lost.
5. Attach the container and ensure that `SPARK_HOME2` exists.
```bash
kubectl exec -it dolphinscheduler-worker-0 bash
kubectl exec -n test -it dolphinscheduler-worker-0 bash # with test namespace
cd /opt/soft
tar zxf spark-2.4.7-bin-hadoop2.7.tgz
rm -f spark-2.4.7-bin-hadoop2.7.tgz
ln -s spark-2.4.7-bin-hadoop2.7 spark2 # or just mv
$SPARK_HOME2/bin/spark-submit --version
```
The last command will print the Spark version if everything goes well.
6. Verify Spark under a Shell task.
```
$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.11-2.4.7.jar
```
Check whether the task log contains the output like `Pi is roughly 3.146015`.
7. Verify Spark under a Spark task.
The file `spark-examples_2.11-2.4.7.jar` needs to be uploaded to the resources first, and then create a Spark task with:
- Spark Version: `SPARK2`
- Main Class: `org.apache.spark.examples.SparkPi`
- Main Package: `spark-examples_2.11-2.4.7.jar`
- Deploy Mode: `local`
Similarly, check whether the task log contains the output like `Pi is roughly 3.146015`.
8. Verify Spark on YARN.
Spark on YARN (Deploy Mode is `cluster` or `client`) requires Hadoop support. Similar to Spark support, the operation of supporting Hadoop is almost the same as the previous steps.
Ensure that `$HADOOP_HOME` and `$HADOOP_CONF_DIR` exists.
### How to Support Spark 3?
In fact, the way to submit applications with `spark-submit` is the same, regardless of Spark 1, 2 or 3. In other words, the semantics of `SPARK_HOME2` is the second `SPARK_HOME` instead of `SPARK2`'s `HOME`, so just set `SPARK_HOME2=/path/to/spark3`.
Take Spark 3.1.1 as an example:
1. Download the Spark 3.1.1 release binary `spark-3.1.1-bin-hadoop2.7.tgz`.
2. Ensure that `common.sharedStoragePersistence.enabled` is turned on.
3. Run a DolphinScheduler release in Kubernetes (See **Install DolphinScheduler**).
4. Copy the Spark 3.1.1 release binary into the Docker container.
```bash
kubectl cp spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
kubectl cp -n test spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace
```
5. Attach the container and ensure that `SPARK_HOME2` exists.
```bash
kubectl exec -it dolphinscheduler-worker-0 bash
kubectl exec -n test -it dolphinscheduler-worker-0 bash # with test namespace
cd /opt/soft
tar zxf spark-3.1.1-bin-hadoop2.7.tgz
rm -f spark-3.1.1-bin-hadoop2.7.tgz
ln -s spark-3.1.1-bin-hadoop2.7 spark2 # or just mv
$SPARK_HOME2/bin/spark-submit --version
```
The last command will print the Spark version if everything goes well.
6. Verify Spark under a Shell task.
```
$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.12-3.1.1.jar
```
Check whether the task log contains the output like `Pi is roughly 3.146015`.
### How to Support Shared Storage Between Master, Worker and Api Server?
For example, Master, Worker and API server may use Hadoop at the same time.
1. Modify the following configurations in `values.yaml`
```yaml
common:
sharedStoragePersistence:
    enabled: true
mountPath: "/opt/soft"
accessModes:
- "ReadWriteMany"
storageClassName: "-"
storage: "20Gi"
```
Modify `storageClassName` and `storage` to actual environment values.
> **Note**: `storageClassName` must support the access mode: `ReadWriteMany`.
2. Copy the Hadoop into the directory `/opt/soft`.
3. Ensure that `$HADOOP_HOME` and `$HADOOP_CONF_DIR` are correct.
### How to Support Local File Resource Storage Instead of HDFS and S3?
Modify the following configurations in `values.yaml`:
```yaml
common:
configmap:
RESOURCE_STORAGE_TYPE: "HDFS"
RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
FS_DEFAULT_FS: "file:///"
fsFileResourcePersistence:
enabled: true
accessModes:
- "ReadWriteMany"
storageClassName: "-"
storage: "20Gi"
```
Modify `storageClassName` and `storage` to actual environment values.
> **Note**: `storageClassName` must support the access mode: `ReadWriteMany`.
### How to Support S3 Resource Storage Like MinIO?
Take MinIO as an example: Modify the following configurations in `values.yaml`:
```yaml
common:
configmap:
RESOURCE_STORAGE_TYPE: "S3"
RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
FS_DEFAULT_FS: "s3a://BUCKET_NAME"
FS_S3A_ENDPOINT: "http://MINIO_IP:9000"
FS_S3A_ACCESS_KEY: "MINIO_ACCESS_KEY"
FS_S3A_SECRET_KEY: "MINIO_SECRET_KEY"
```
Modify `BUCKET_NAME`, `MINIO_IP`, `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` to actual environment values.
> **Note**: `MINIO_IP` can only use IP instead of the domain name, because DolphinScheduler currently doesn't support S3 path style access.
### How to Configure SkyWalking?
Modify SkyWalking configurations in `values.yaml`:
```yaml
common:
configmap:
SKYWALKING_ENABLE: "true"
SW_AGENT_COLLECTOR_BACKEND_SERVICES: "127.0.0.1:11800"
SW_GRPC_LOG_SERVER_HOST: "127.0.0.1"
SW_GRPC_LOG_SERVER_PORT: "11800"
```
## Appendix-Configuration
| Parameter | Description | Default |
| --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------- |
| `timezone` | World time and date for cities in all time zones | `Asia/Shanghai` |
| | | |
| `image.repository` | Docker image repository for the DolphinScheduler | `apache/dolphinscheduler` |
| `image.tag` | Docker image version for the DolphinScheduler | `latest` |
| `image.pullPolicy` | Image pull policy. Options: Always, Never, IfNotPresent | `IfNotPresent` |
| `image.pullSecret` | Image pull secret. An optional reference to secret in the same namespace to use for pulling any of the images | `nil` |
| | | |
| `postgresql.enabled` | If there is no external PostgreSQL, DolphinScheduler uses an internal PostgreSQL by default | `true` |
| `postgresql.postgresqlUsername` | The username for internal PostgreSQL | `root` |
| `postgresql.postgresqlPassword` | The password for internal PostgreSQL | `root` |
| `postgresql.postgresqlDatabase` | The database for internal PostgreSQL | `dolphinscheduler` |
| `postgresql.persistence.enabled` | Set `postgresql.persistence.enabled` to `true` to mount a new volume for internal PostgreSQL | `false` |
| `postgresql.persistence.size` | `PersistentVolumeClaim` size | `20Gi` |
| `postgresql.persistence.storageClass` | PostgreSQL data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `externalDatabase.type` | The type of the external database (used when `postgresql.enabled` is set to `false`) | `postgresql` |
| `externalDatabase.driver` | The JDBC driver of the external database (used when `postgresql.enabled` is set to `false`) | `org.postgresql.Driver` |
| `externalDatabase.host` | The host of the external database (used when `postgresql.enabled` is set to `false`) | `localhost` |
| `externalDatabase.port` | The port of the external database (used when `postgresql.enabled` is set to `false`) | `5432` |
| `externalDatabase.username` | The username of the external database (used when `postgresql.enabled` is set to `false`) | `root` |
| `externalDatabase.password` | The password of the external database (used when `postgresql.enabled` is set to `false`) | `root` |
| `externalDatabase.database` | The database name of the external database (used when `postgresql.enabled` is set to `false`) | `dolphinscheduler` |
| `externalDatabase.params` | The connection parameters of the external database (used when `postgresql.enabled` is set to `false`) | `characterEncoding=utf8` |
| | | |
| `zookeeper.enabled` | If there is no external ZooKeeper, DolphinScheduler uses an internal ZooKeeper by default | `true` |
| `zookeeper.fourlwCommandsWhitelist` | A list of comma separated Four Letter Words commands to use | `srvr,ruok,wchs,cons` |
| `zookeeper.persistence.enabled` | Set `zookeeper.persistence.enabled` to `true` to mount a new volume for internal ZooKeeper | `false` |
| `zookeeper.persistence.size` | `PersistentVolumeClaim` size | `20Gi` |
| `zookeeper.persistence.storageClass` | ZooKeeper data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `zookeeper.zookeeperRoot` | Specify dolphinscheduler root directory in ZooKeeper | `/dolphinscheduler` |
| `externalZookeeper.zookeeperQuorum` | The ZooKeeper quorum of the external ZooKeeper (used when `zookeeper.enabled` is set to `false`) | `127.0.0.1:2181` |
| `externalZookeeper.zookeeperRoot` | The DolphinScheduler root directory in the external ZooKeeper (used when `zookeeper.enabled` is set to `false`) | `/dolphinscheduler` |
| | | |
| `common.configmap.DOLPHINSCHEDULER_OPTS` | The jvm options for dolphinscheduler, suitable for all servers | `""` |
| `common.configmap.DATA_BASEDIR_PATH` | User data directory path (user-configurable); make sure the directory exists and has read/write permissions | `/tmp/dolphinscheduler` |
| `common.configmap.RESOURCE_STORAGE_TYPE` | Resource storage type: HDFS, S3, NONE | `HDFS` |
| `common.configmap.RESOURCE_UPLOAD_PATH` | Resource storage path on HDFS/S3; make sure the directory exists and has read/write permissions | `/dolphinscheduler` |
| `common.configmap.FS_DEFAULT_FS` | Resource storage file system like `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler` | `file:///` |
| `common.configmap.FS_S3A_ENDPOINT` | S3 endpoint when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `s3.xxx.amazonaws.com` |
| `common.configmap.FS_S3A_ACCESS_KEY` | S3 access key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.FS_S3A_SECRET_KEY` | S3 secret key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE` | Whether to startup kerberos | `false` |
| `common.configmap.JAVA_SECURITY_KRB5_CONF_PATH` | The java.security.krb5.conf path | `/opt/krb5.conf` |
| `common.configmap.LOGIN_USER_KEYTAB_USERNAME` | The login user from keytab username | `hdfs@HADOOP.COM` |
| `common.configmap.LOGIN_USER_KEYTAB_PATH` | The login user from keytab path | `/opt/hdfs.keytab` |
| `common.configmap.KERBEROS_EXPIRE_TIME` | The kerberos expire time, the unit is hour | `2` |
| `common.configmap.HDFS_ROOT_USER` | The HDFS root user who must have the permission to create directories under the HDFS root path | `hdfs` |
| `common.configmap.RESOURCE_MANAGER_HTTPADDRESS_PORT` | Set resource manager httpaddress port for yarn | `8088` |
| `common.configmap.YARN_RESOURCEMANAGER_HA_RM_IDS` | If resourcemanager HA is enabled, please set the HA IPs | `nil` |
| `common.configmap.YARN_APPLICATION_STATUS_ADDRESS` | If there is a single ResourceManager, replace `ds1` with the actual ResourceManager hostname; otherwise keep the default | `http://ds1:%s/ws/v1/cluster/apps/%s` |
| `common.configmap.SKYWALKING_ENABLE` | Set whether to enable skywalking | `false` |
| `common.configmap.SW_AGENT_COLLECTOR_BACKEND_SERVICES` | Set agent collector backend services for skywalking | `127.0.0.1:11800` |
| `common.configmap.SW_GRPC_LOG_SERVER_HOST` | Set grpc log server host for skywalking | `127.0.0.1` |
| `common.configmap.SW_GRPC_LOG_SERVER_PORT` | Set grpc log server port for skywalking | `11800` |
| `common.configmap.HADOOP_HOME` | Set `HADOOP_HOME` for DolphinScheduler's task environment | `/opt/soft/hadoop` |
| `common.configmap.HADOOP_CONF_DIR` | Set `HADOOP_CONF_DIR` for DolphinScheduler's task environment | `/opt/soft/hadoop/etc/hadoop` |
| `common.configmap.SPARK_HOME1` | Set `SPARK_HOME1` for DolphinScheduler's task environment | `/opt/soft/spark1` |
| `common.configmap.SPARK_HOME2` | Set `SPARK_HOME2` for DolphinScheduler's task environment | `/opt/soft/spark2` |
| `common.configmap.PYTHON_HOME` | Set `PYTHON_HOME` for DolphinScheduler's task environment | `/usr/bin/python` |
| `common.configmap.JAVA_HOME` | Set `JAVA_HOME` for DolphinScheduler's task environment | `/usr/local/openjdk-8` |
| `common.configmap.HIVE_HOME` | Set `HIVE_HOME` for DolphinScheduler's task environment | `/opt/soft/hive` |
| `common.configmap.FLINK_HOME` | Set `FLINK_HOME` for DolphinScheduler's task environment | `/opt/soft/flink` |
| `common.configmap.DATAX_HOME` | Set `DATAX_HOME` for DolphinScheduler's task environment | `/opt/soft/datax` |
| `common.sharedStoragePersistence.enabled` | Set `common.sharedStoragePersistence.enabled` to `true` to mount a shared storage volume for Hadoop, Spark binaries, etc. | `false` |
| `common.sharedStoragePersistence.mountPath` | The mount path for the shared storage volume | `/opt/soft` |
| `common.sharedStoragePersistence.accessModes` | `PersistentVolumeClaim` access modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
| `common.sharedStoragePersistence.storageClassName` | Shared Storage persistent volume storage class, must support the access mode: ReadWriteMany | `-` |
| `common.sharedStoragePersistence.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `common.fsFileResourcePersistence.enabled` | Set `common.fsFileResourcePersistence.enabled` to `true` to mount a new file resource volume for `api` and `worker` | `false` |
| `common.fsFileResourcePersistence.accessModes` | `PersistentVolumeClaim` access modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
| `common.fsFileResourcePersistence.storageClassName` | Resource persistent volume storage class, must support the access mode: ReadWriteMany | `-` |
| `common.fsFileResourcePersistence.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `master.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| `master.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `master.annotations` | The `annotations` for master server | `{}` |
| `master.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `master.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `master.tolerations` | If specified, the pod's tolerations | `{}` |
| `master.resources` | The `resource` limit and request config for master server | `{}` |
| `master.configmap.MASTER_SERVER_OPTS` | The jvm options for master server | `-Xms1g -Xmx1g -Xmn512m` |
| `master.configmap.MASTER_EXEC_THREADS` | Master execute thread number to limit process instances | `100` |
| `master.configmap.MASTER_EXEC_TASK_NUM` | Master execute task number in parallel per process instance | `20` |
| `master.configmap.MASTER_DISPATCH_TASK_NUM` | Master dispatch task number per batch | `3` |
| `master.configmap.MASTER_HOST_SELECTOR` | Master host selector to select a suitable worker, optional values include Random, RoundRobin, LowerWeight | `LowerWeight` |
| `master.configmap.MASTER_HEARTBEAT_INTERVAL` | Master heartbeat interval, the unit is second | `10` |
| `master.configmap.MASTER_TASK_COMMIT_RETRYTIMES` | Master commit task retry times | `5` |
| `master.configmap.MASTER_TASK_COMMIT_INTERVAL` | master commit task interval, the unit is second | `1` |
| `master.configmap.MASTER_MAX_CPULOAD_AVG` | Master max CPU load average; the master only schedules when the system CPU load average is lower than this value | `-1` (`the number of cpu cores * 2`) |
| `master.configmap.MASTER_RESERVED_MEMORY` | Master reserved memory; the master only schedules when the system available memory is higher than this value, the unit is G | `0.3` |
| `master.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `master.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `master.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `master.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `master.livenessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `master.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `master.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `master.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `master.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `master.readinessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `master.persistentVolumeClaim.enabled` | Set `master.persistentVolumeClaim.enabled` to `true` to mount a new volume for `master` | `false` |
| `master.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `master.persistentVolumeClaim.storageClassName` | `Master` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `master.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `worker.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| `worker.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `worker.annotations` | The `annotations` for worker server | `{}` |
| `worker.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `worker.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `worker.tolerations` | If specified, the pod's tolerations | `{}` |
| `worker.resources` | The `resource` limit and request config for worker server | `{}` |
| `worker.configmap.WORKER_SERVER_OPTS` | The jvm options for worker server | `-Xms1g -Xmx1g -Xmn512m` |
| `worker.configmap.WORKER_EXEC_THREADS` | Worker execute thread number to limit task instances | `100` |
| `worker.configmap.WORKER_HEARTBEAT_INTERVAL` | Worker heartbeat interval, the unit is second | `10` |
| `worker.configmap.WORKER_MAX_CPULOAD_AVG` | Worker max CPU load average; the worker is only dispatched tasks when the system CPU load average is lower than this value | `-1` (`the number of cpu cores * 2`) |
| `worker.configmap.WORKER_RESERVED_MEMORY` | Worker reserved memory; the worker is only dispatched tasks when the system available memory is higher than this value, the unit is G | `0.3` |
| `worker.configmap.WORKER_GROUPS` | Worker groups | `default` |
| `worker.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `worker.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `worker.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `worker.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `worker.livenessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `worker.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `worker.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `worker.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `worker.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `worker.readinessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `worker.persistentVolumeClaim.enabled` | Set `worker.persistentVolumeClaim.enabled` to `true` to enable `persistentVolumeClaim` for `worker` | `false` |
| `worker.persistentVolumeClaim.dataPersistentVolume.enabled` | Set `worker.persistentVolumeClaim.dataPersistentVolume.enabled` to `true` to mount a data volume for `worker` | `false` |
| `worker.persistentVolumeClaim.dataPersistentVolume.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `worker.persistentVolumeClaim.dataPersistentVolume.storageClassName` | `Worker` data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `worker.persistentVolumeClaim.dataPersistentVolume.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `worker.persistentVolumeClaim.logsPersistentVolume.enabled` | Set `worker.persistentVolumeClaim.logsPersistentVolume.enabled` to `true` to mount a logs volume for `worker` | `false` |
| `worker.persistentVolumeClaim.logsPersistentVolume.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `worker.persistentVolumeClaim.logsPersistentVolume.storageClassName` | `Worker` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `worker.persistentVolumeClaim.logsPersistentVolume.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `alert.replicas` | Replicas is the desired number of replicas of the given Template | `1` |
| `alert.strategy.type` | Type of deployment. Can be "Recreate" or "RollingUpdate" | `RollingUpdate` |
| `alert.strategy.rollingUpdate.maxSurge` | The maximum number of pods that can be scheduled above the desired number of pods | `25%` |
| `alert.strategy.rollingUpdate.maxUnavailable` | The maximum number of pods that can be unavailable during the update | `25%` |
| `alert.annotations` | The `annotations` for alert server | `{}` |
| `alert.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `alert.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `alert.tolerations` | If specified, the pod's tolerations | `{}` |
| `alert.resources` | The `resource` limit and request config for alert server | `{}` |
| `alert.configmap.ALERT_SERVER_OPTS` | The jvm options for alert server | `-Xms512m -Xmx512m -Xmn256m` |
| `alert.configmap.XLS_FILE_PATH` | XLS file path | `/tmp/xls` |
| `alert.configmap.MAIL_SERVER_HOST` | Mail `SERVER HOST` | `nil` |
| `alert.configmap.MAIL_SERVER_PORT` | Mail `SERVER PORT` | `nil` |
| `alert.configmap.MAIL_SENDER` | Mail `SENDER` | `nil` |
| `alert.configmap.MAIL_USER` | Mail `USER` | `nil` |
| `alert.configmap.MAIL_PASSWD` | Mail `PASSWORD` | `nil` |
| `alert.configmap.MAIL_SMTP_STARTTLS_ENABLE` | Mail `SMTP STARTTLS` enable | `false` |
| `alert.configmap.MAIL_SMTP_SSL_ENABLE` | Mail `SMTP SSL` enable | `false` |
| `alert.configmap.MAIL_SMTP_SSL_TRUST` | Mail `SMTP SSL TRUST` | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_ENABLE` | `Enterprise Wechat` enable | `false` |
| `alert.configmap.ENTERPRISE_WECHAT_CORP_ID` | `Enterprise Wechat` corp id | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_SECRET` | `Enterprise Wechat` secret | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_AGENT_ID` | `Enterprise Wechat` agent id | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_USERS` | `Enterprise Wechat` users | `nil` |
| `alert.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `alert.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `alert.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `alert.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `alert.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `alert.livenessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `alert.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `alert.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `alert.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `alert.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `alert.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `alert.readinessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `alert.persistentVolumeClaim.enabled` | Set `alert.persistentVolumeClaim.enabled` to `true` to mount a new volume for `alert` | `false` |
| `alert.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `alert.persistentVolumeClaim.storageClassName` | `Alert` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `alert.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `api.replicas` | Replicas is the desired number of replicas of the given Template | `1` |
| `api.strategy.type` | Type of deployment. Can be "Recreate" or "RollingUpdate" | `RollingUpdate` |
| `api.strategy.rollingUpdate.maxSurge` | The maximum number of pods that can be scheduled above the desired number of pods | `25%` |
| `api.strategy.rollingUpdate.maxUnavailable` | The maximum number of pods that can be unavailable during the update | `25%` |
| `api.annotations` | The `annotations` for api server | `{}` |
| `api.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `api.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `api.tolerations` | If specified, the pod's tolerations | `{}` |
| `api.resources` | The `resource` limit and request config for api server | `{}` |
| `api.configmap.API_SERVER_OPTS` | The jvm options for api server | `-Xms512m -Xmx512m -Xmn256m` |
| `api.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `api.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `api.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `api.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `api.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `api.livenessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `api.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `api.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `api.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `api.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `api.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe | `3` |
| `api.readinessProbe.successThreshold` | Minimum consecutive successes for the probe | `1` |
| `api.persistentVolumeClaim.enabled` | Set `api.persistentVolumeClaim.enabled` to `true` to mount a new volume for `api` | `false` |
| `api.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `api.persistentVolumeClaim.storageClassName` | `api` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `api.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `api.service.type` | `type` determines how the Service is exposed. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer | `ClusterIP` |
| `api.service.clusterIP` | `clusterIP` is the IP address of the service and is usually assigned randomly by the master | `nil` |
| `api.service.nodePort` | `nodePort` is the port on each node on which this service is exposed when type=NodePort | `nil` |
| `api.service.externalIPs` | `externalIPs` is a list of IP addresses for which nodes in the cluster will also accept traffic for this service | `[]` |
| `api.service.externalName` | `externalName` is the external reference that kubedns or equivalent will return as a CNAME record for this service | `nil` |
| `api.service.loadBalancerIP` | `loadBalancerIP` when service.type is LoadBalancer. LoadBalancer will get created with the IP specified in this field | `nil` |
| `api.service.annotations` | `annotations` may need to be set when service.type is LoadBalancer | `{}` |
| | | |
| `ingress.enabled` | Enable ingress | `false` |
| `ingress.host` | Ingress host | `dolphinscheduler.org` |
| `ingress.path` | Ingress path | `/dolphinscheduler` |
| `ingress.tls.enabled` | Enable ingress tls | `false` |
| `ingress.tls.secretName` | Ingress tls secret name | `dolphinscheduler-tls` |
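Any of the parameters above can also be overridden at install or upgrade time without editing `values.yaml`, for example with `--set` flags. A sketch (it assumes the release is named `dolphinscheduler` and the command is run from the directory that contains the Helm chart):
```bash
helm upgrade --install dolphinscheduler . \
  --set timezone="Asia/Shanghai" \
  --set master.replicas=3 \
  --set worker.replicas=3 \
  --set api.service.type=NodePort
```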

201
docs/docs/en/guide/installation/pseudo-cluster.md

@ -0,0 +1,201 @@
# Pseudo-Cluster Deployment
The purpose of the pseudo-cluster deployment is to deploy the DolphinScheduler service on a single machine. In this mode, the master, the worker, and the API server all run on the same machine.
If you are new to DolphinScheduler and just want to try it out, we recommend the [Standalone deployment](standalone.md). If you want to experience more complete functions and schedule a large number of tasks, we recommend the [pseudo-cluster deployment](pseudo-cluster.md). If you want to use DolphinScheduler in production, we recommend the [cluster deployment](cluster.md) or the [Kubernetes deployment](kubernetes.md).
## Preparation
Pseudo-cluster deployment of DolphinScheduler requires external software support:
* JDK: Download [JDK][jdk] (1.8+), and configure the `JAVA_HOME` and `PATH` environment variables. You can skip this step if they already exist in your environment.
* Binary package: Download the DolphinScheduler binary package at the [download page](https://dolphinscheduler.apache.org/en-us/download/download.html)
* Database: [PostgreSQL](https://www.postgresql.org/download/) (8.2.15+) or [MySQL](https://dev.mysql.com/downloads/mysql/) (5.7+); choose either one. Note that MySQL requires JDBC Driver 8.0.16
* Registry Center: [ZooKeeper](https://zookeeper.apache.org/releases.html) (3.4.6+), [download link][zookeeper]
* Process tree analysis
  * `pstree` for macOS
  * `psmisc` for Fedora/Red Hat/CentOS/Ubuntu/Debian
> **_Note:_** DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but if you need to run tasks that depend on them, you need to have the corresponding environment support.
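A quick way to sanity-check these prerequisites on the target machine, as a sketch (adjust the package manager to your distribution):
```shell
java -version                  # should report 1.8 or later
sudo yum install -y psmisc     # or: sudo apt-get install -y psmisc
pstree -V                      # pstree is provided by psmisc on Linux
```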
## DolphinScheduler Startup Environment
### Configure User Exemption and Permissions
Create a deployment user, and configure passwordless `sudo` for it. The following example creates the user `dolphinscheduler`:
```shell
# To create a user, login as root
useradd dolphinscheduler
# Add password
echo "dolphinscheduler" | passwd --stdin dolphinscheduler
# Configure sudo without password
sed -i '$adolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
# Modify directory permissions and grant permissions for user you created above
chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-*-bin
```
> **_NOTICE:_**
>
> * Because DolphinScheduler switches to the tenant user with the command `sudo -u {linux-user}` when running multi-tenant tasks, the deployment user needs passwordless `sudo` privileges. If you are new to this, you can ignore this point for now.
> * If you find the line `Defaults requiretty` in the `/etc/sudoers` file, comment it out.
### Configure Machine SSH Password-Free Login
Since resources need to be sent to different machines during installation, SSH password-free login is required between each machine. The following shows the steps to configure password-free login:
```shell
su dolphinscheduler
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```
> **_Notice:_** After the configuration is complete, run `ssh localhost` to test it. If you can log in without entering a password, the configuration succeeded.
### Start ZooKeeper
Go to the ZooKeeper installation directory, copy the configuration file `conf/zoo_sample.cfg` to `conf/zoo.cfg`, and change the value of `dataDir` in `conf/zoo.cfg` to `dataDir=./tmp/zookeeper`.
```shell
# Start ZooKeeper
./bin/zkServer.sh start
```
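To confirm ZooKeeper started correctly before moving on, you can check its status. A sketch, run from the ZooKeeper installation directory:
```shell
./bin/zkServer.sh status   # a single-node setup should report "Mode: standalone"
```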
<!--
Modify the database configuration and initialize
```properties
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
# Modify it if you are not using dolphinscheduler/dolphinscheduler as your username and password
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler
```
After modifying and saving, execute the following command to create database tables and init basic data.
```shell
sh script/create-dolphinscheduler.sh
```
-->
## Modify Configuration
After completing the preparation of the basic environment, you need to modify the configuration file according to your environment. The configuration file is in the path of `conf/config/install_config.conf`. Generally, you just need to modify the **INSTALL MACHINE, DolphinScheduler ENV, Database, Registry Server** part to complete the deployment, the following describes the parameters that must be modified:
```shell
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# Due to the master, worker, and API server being deployed on a single node, the IP of the server is the machine IP or localhost
ips="localhost"
masters="localhost"
workers="localhost:default"
alertServer="localhost"
apiServers="localhost"
# DolphinScheduler installation path, it will auto-create if not exists
installPath="~/dolphinscheduler"
# Deploy user, use the user you create in section **Configure machine SSH password-free login**
deployUser="dolphinscheduler"
# ---------------------------------------------------------
# DolphinScheduler ENV
# ---------------------------------------------------------
# The path of JAVA_HOME, which JDK install path in section **Preparation**
javaHome="/your/java/home/here"
# ---------------------------------------------------------
# Database
# ---------------------------------------------------------
# Database type, username, password, IP, port, metadata. For now `dbtype` supports `mysql` and `postgresql`
dbtype="mysql"
dbhost="localhost:3306"
# Need to modify if you are not using `dolphinscheduler/dolphinscheduler` as your username and password
username="dolphinscheduler"
password="dolphinscheduler"
dbname="dolphinscheduler"
# ---------------------------------------------------------
# Registry Server
# ---------------------------------------------------------
# Registration center address, the address of ZooKeeper service
registryServers="localhost:2181"
```
## Initialize the Database
DolphinScheduler stores its metadata in a relational database; PostgreSQL and MySQL are currently supported. If you use MySQL, you need to manually download the [mysql-connector-java driver][mysql] (8.0.16) and move it to the `lib` directory of DolphinScheduler. The following takes MySQL as an example of how to initialize the database:
```shell
mysql -uroot -p
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
# Change {user} and {password} by requests
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> flush privileges;
```
After completing the above steps, you have created a new database for DolphinScheduler. Now run the shell script to initialize the database schema:
```shell
sh script/create-dolphinscheduler.sh
```
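If the script finishes without errors, the DolphinScheduler tables should exist. A quick check, as a sketch (assuming the MySQL account created above):
```shell
mysql -u dolphinscheduler -p -D dolphinscheduler -e "SHOW TABLES;" | head
```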
## Start DolphinScheduler
Use the **deployment user** you created above to run the following command and complete the deployment; the server logs are stored in the `logs` folder.
```shell
sh install.sh
```
> **_Note:_** On first deployment, the message `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear several times in the terminal; this is harmless and can be ignored.
## Login DolphinScheduler
Open `http://localhost:12345/dolphinscheduler` in your browser to log in to the DolphinScheduler UI. The default username and password are **admin/dolphinscheduler123**.
## Start or Stop Server
```shell
# Stop all DolphinScheduler servers
sh ./bin/stop-all.sh
# Start all DolphinScheduler servers
sh ./bin/start-all.sh
# Start or stop DolphinScheduler Master
sh ./bin/dolphinscheduler-daemon.sh stop master-server
sh ./bin/dolphinscheduler-daemon.sh start master-server
# Start or stop DolphinScheduler Worker
sh ./bin/dolphinscheduler-daemon.sh start worker-server
sh ./bin/dolphinscheduler-daemon.sh stop worker-server
# Start or stop DolphinScheduler Api
sh ./bin/dolphinscheduler-daemon.sh start api-server
sh ./bin/dolphinscheduler-daemon.sh stop api-server
# Start or stop Alert
sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
```
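You can also check which DolphinScheduler JVM processes are running with `jps`; the process names below are the usual ones for this release and are listed here as an assumption:
```shell
jps | grep -E 'MasterServer|WorkerServer|ApiApplicationServer|AlertServer|LoggerServer'
```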
> **_Note:_** Please refer to the "System Architecture Design" section for details about each service.
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html
[mysql]: https://downloads.MySQL.com/archives/c-j/
[issue]: https://github.com/apache/dolphinscheduler/issues/6597

74
docs/docs/en/guide/installation/skywalking-agent.md

@ -0,0 +1,74 @@
SkyWalking Agent Deployment
=============================
The `dolphinscheduler-skywalking` module provides [SkyWalking](https://skywalking.apache.org/) monitor agent for the DolphinScheduler project.
This document describes how to enable SkyWalking 8.4+ support with this module (SkyWalking 8.5.0 is recommended).
## Installation
The following configuration is used to enable the SkyWalking agent.
### Through Environment Variable Configuration (for Docker Compose)
Modify SkyWalking environment variables in `docker/docker-swarm/config.env.sh`:
```
SKYWALKING_ENABLE=true
SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
SW_GRPC_LOG_SERVER_HOST=127.0.0.1
SW_GRPC_LOG_SERVER_PORT=11800
```
And run:
```shell
$ docker-compose up -d
```
### Through Environment Variable Configuration (for Docker)
```shell
$ docker run -d --name dolphinscheduler \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e SKYWALKING_ENABLE="true" \
-e SW_AGENT_COLLECTOR_BACKEND_SERVICES="your.skywalking-oap-server.com:11800" \
-e SW_GRPC_LOG_SERVER_HOST="your.skywalking-log-reporter.com" \
-e SW_GRPC_LOG_SERVER_PORT="11800" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 all
```
### Through install_config.conf Configuration (for DolphinScheduler install.sh)
Add the following configurations to `${workDir}/conf/config/install_config.conf`.
```properties
# SkyWalking config
# note: enable SkyWalking tracking plugin
enableSkywalking="true"
# note: configure SkyWalking backend service address
skywalkingServers="your.skywalking-oap-server.com:11800"
# note: configure SkyWalking log reporter host
skywalkingLogReporterHost="your.skywalking-log-reporter.com"
# note: configure SkyWalking log reporter port
skywalkingLogReporterPort="11800"
```
## Usage
### Import Dashboard
#### Import DolphinScheduler Dashboard to SkyWalking Server
Copy the `${dolphinscheduler.home}/ext/skywalking-agent/dashboard/dolphinscheduler.yml` file into `${skywalking-oap-server.home}/config/ui-initialized-templates/` directory, and restart SkyWalking oap-server.
#### View DolphinScheduler Dashboard
If you have opened the SkyWalking dashboard with a browser before, you need to clear the browser cache.
![img1](/img/skywalking/import-dashboard-1.jpg)

42
docs/docs/en/guide/installation/standalone.md

@ -0,0 +1,42 @@
# Standalone
Standalone mode is only for quickly experiencing DolphinScheduler.
If you are new to DolphinScheduler and just want to try it out, we recommend the [Standalone deployment](standalone.md). If you want to experience more complete functions and schedule a large number of tasks, we recommend the [pseudo-cluster deployment](pseudo-cluster.md). If you want to use DolphinScheduler in production, we recommend the [cluster deployment](cluster.md) or the [Kubernetes deployment](kubernetes.md).
> **_Note:_** Standalone mode is recommended only for fewer than 20 workflows, because it uses an H2 database and a ZooKeeper testing server; too many tasks may cause instability.
## Preparation
* JDK: Download [JDK][jdk] (1.8+), and configure the `JAVA_HOME` and `PATH` environment variables. You can skip this step if they already exist in your environment.
* Binary package: download the DolphinScheduler binary package at [download page](https://dolphinscheduler.apache.org/en-us/download/download.html).
## Start DolphinScheduler Standalone Server
### Extract and Start DolphinScheduler
The binary package contains a standalone startup script, so DolphinScheduler can be started quickly after extraction. Switch to a user with `sudo` permission and run the script:
```shell
# Extract and start Standalone Server
tar -xvzf apache-dolphinscheduler-*-bin.tar.gz
cd apache-dolphinscheduler-*-bin
sh ./bin/dolphinscheduler-daemon.sh start standalone-server
```
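Once the daemon reports that it started, the web UI should answer over HTTP after a short warm-up. A rough check, as a sketch (the exact status line may be a redirect depending on the version):
```shell
curl -sI http://localhost:12345/dolphinscheduler/ | head -n 1
```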
### Login DolphinScheduler
Open `http://localhost:12345/dolphinscheduler` in your browser to log in to the DolphinScheduler UI. The default username and password are **admin/dolphinscheduler123**.
### Start or Stop Server
The script `./bin/dolphinscheduler-daemon.sh` can be used not only to quickly start the standalone server, but also to stop it. Here are all the commands:
```shell
# Start Standalone Server
sh ./bin/dolphinscheduler-daemon.sh start standalone-server
# Stop Standalone Server
sh ./bin/dolphinscheduler-daemon.sh stop standalone-server
```
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html

32
docs/docs/en/guide/monitor.md

@ -0,0 +1,32 @@
# Monitor
## Service Management
- Service management is mainly to monitor and display the health status and basic information of each service in the system.
## Monitor Master Server
- Mainly related to master information.
![master](/img/new_ui/dev/monitor/master.png)
## Monitor Worker Server
- Mainly related to worker information.
![worker](/img/new_ui/dev/monitor/worker.png)
## Monitor DB
- Mainly the health status of the DB.
![db](/img/new_ui/dev/monitor/db.png)
## Statistics Management
![statistics](/img/new_ui/dev/monitor/statistics.png)
- Number of commands waiting to be executed: statistics of the `t_ds_command` table data.
- Number of failed commands: statistics of the `t_ds_error_command` table data.
- Number of tasks waiting to run: count of the `task_queue` data in ZooKeeper.
- Number of tasks waiting to be killed: count of the `task_kill` data in ZooKeeper.

69
docs/docs/en/guide/open-api.md

@ -0,0 +1,69 @@
# Open API
## Background
Projects and workflows are usually created through the web UI, but integrating with third-party systems requires calling the API to manage projects and workflows.
## The Operation Steps of DolphinScheduler API Calls
### Create a Token
1. Log in to the scheduling system, click "Security", then click "Token manage" on the left, and click "Create token" to create a token.
<p align="center">
<img src="/img/token-management-en.png" width="80%" />
</p>
2. Select the "Expiration time" (Token validity time), select "User" (choose the specified user to perform the API operation), click "Generate token", copy the `Token` string, and click "Submit".
<p align="center">
<img src="/img/create-token-en1.png" width="80%" />
</p>
### Token Usage
1. Open the API documentation page
> Address: http://{API server ip}:12345/dolphinscheduler/doc.html?language=en_US&lang=en
<p align="center">
<img src="/img/api-documentation-en.png" width="80%" />
</p>
2. Select a test API. The API selected for this test is `queryAllProjectList`:
> projects/query-project-list
3. Open `Postman`, fill in the API address, enter the `Token` in `Headers`, and then send the request to view the result:
```
token: The Token just generated
```
<p align="center">
<img src="/img/test-api.png" width="80%" />
</p>
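The same request can also be issued from the command line. A sketch with `curl` (the host, port and token are placeholders; the path follows the `queryAllProjectList` endpoint named above):
```shell
curl -H "token: <the-token-you-generated>" \
  "http://localhost:12345/dolphinscheduler/projects/query-project-list"
```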
### Create a Project
Here is an example of creating a project named "wudl-flink-test":
<p align="center">
<img src="/img/api/create_project1.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_project2.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_project3.png" width="80%" />
</p>
The returned `msg` information is "success", indicating that we have successfully created the project through API.
If you are interested in the source code of creating a project, please continue to read the following:
### Appendix: The Source Code of Creating a Project
<p align="center">
<img src="/img/api/create_source1.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_source2.png" width="80%" />
</p>

48
docs/docs/en/guide/parameter/built-in.md

@ -0,0 +1,48 @@
# Built-in Parameter
## Basic Built-in Parameter
<table>
<tr><th>Variable</th><th>Declaration Method</th><th>Meaning</th></tr>
<tr>
<td>system.biz.date</td>
<td>${system.biz.date}</td>
<td>The day before the schedule time of the daily scheduling instance, the format is yyyyMMdd</td>
</tr>
<tr>
<td>system.biz.curdate</td>
<td>${system.biz.curdate}</td>
<td>The schedule time of the daily scheduling instance, the format is yyyyMMdd</td>
</tr>
<tr>
<td>system.datetime</td>
<td>${system.datetime}</td>
<td>The schedule time of the daily scheduling instance, the format is yyyyMMddHHmmss</td>
</tr>
</table>
## Extended Built-in Parameter
- Custom variables are supported in the code; declare them as `${variable name}`. Refer to "System Parameter".
- Benchmark variable defines as `$[...]` format, time format `$[yyyyMMddHHmmss]` can be decomposed and combined arbitrarily, such as: `$[yyyyMMdd]`, `$[HHmmss]`, `$[yyyy-MM-dd]`, etc.
- Or define them in the following two ways:
  1. Use the add_months(yyyyMMdd, offset) function to add or subtract a number of months.
     The first parameter of this function is [yyyyMMdd], which represents the time format;
     the second parameter is the offset, the number of months the user wants to add or subtract.
     * Next N years: $[add_months(yyyyMMdd,12*N)]
     * N years before: $[add_months(yyyyMMdd,-12*N)]
     * Next N months: $[add_months(yyyyMMdd,N)]
     * N months before: $[add_months(yyyyMMdd,-N)]
  2. Add or subtract numbers directly after the time format (see the usage sketch after this list):
     * Next N weeks: $[yyyyMMdd+7*N]
     * N weeks before: $[yyyyMMdd-7*N]
     * Next N days: $[yyyyMMdd+N]
     * N days before: $[yyyyMMdd-N]
     * Next N hours: $[HHmmss+N/24]
     * N hours before: $[HHmmss-N/24]
     * Next N minutes: $[HHmmss+N/24/60]
     * N minutes before: $[HHmmss-N/24/60]
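As a usage sketch, suppose a shell task defines a local parameter named `dt` (a hypothetical name) in [Custom Parameters] with the value `$[yyyyMMdd-1]` (yesterday); the scheduler then substitutes `${dt}` in the script before it runs:
```shell
# ${dt} is replaced with yesterday's date (format yyyyMMdd) before execution
echo "processing partition dt=${dt}"
```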

66
docs/docs/en/guide/parameter/context.md

@ -0,0 +1,66 @@
# Refer to Parameter Context
DolphinScheduler allows parameters to refer to each other, including local parameters referring to global parameters, and parameter passing from upstream tasks to downstream tasks. Because of these references, parameter priority matters when parameter names are the same; see also [Parameter Priority](priority.md).
## Local Task Refers to Global Parameter
The premise of local tasks referring global parameters is that you have already defined [Global Parameter](global.md). The usage is similar to the usage in [local parameters](local.md), but the value of the parameter needs to be configured as the key of the global parameter.
![parameter-call-global-in-local](/img/global_parameter.png)
As the figure above shows, `${biz_date}` and `${curdate}` are examples of local parameters that refer to global parameters. Observe the last line of the above figure, `local_param_bizdate` uses `${global_bizdate}` to refer to the global parameter. In the shell script, you can use `${local_param_bizdate}` to refer to the value of the global variable `global_bizdate`, or set the value of `local_param_bizdate` directly through JDBC. Similarly, `local_param` refers to the global parameters defined in the previous section through `${local_param}`. `biz_date`, `biz_curdate`, `system.datetime` are all user-defined parameters, which are assigned value via `${global parameters}`.
## Pass Parameter From Upstream Task to Downstream
DolphinScheduler allows parameter transfer between tasks. Currently, transfer direction only supports one-way transfer from upstream to downstream. The task types that support this feature are:
* [Shell](../task/shell.md)
* [SQL](../task/sql.md)
* [Procedure](../task/stored-procedure.md)
When defining an upstream node, if you need to pass the result of that node to a downstream node that depends on it, set a parameter with the `OUT` direction in the [Custom Parameters] of the [Current Node Settings]. At present, we mainly focus on the SQL and shell nodes for passing parameters downstream.
### SQL
`prop` is user-specified; set the direction to `OUT`. A parameter is exported only when its direction is `OUT`. Choose the data type according to the scenario, and leave the value part blank.
If the result of the SQL node has only one row and one or more fields, the name of the `prop` needs to be the same as the field name. Any data type except `LIST` can be chosen; the parameter is assigned the value of the column with the same name in the SQL query result.
If the result of the SQL node has multiple rows and one or more fields, the name of the `prop` needs to be the same as the field name. Choose the data type `LIST`; the SQL query result is converted to `LIST<VARCHAR>` and then to JSON, which becomes the parameter value.
Let's make an example of the SQL node process in the above picture:
The following defines the [createParam1] node in the above figure:
![png05](/img/globalParam/image-20210723104957031.png)
The following defines the [createParam2] node:
![png06](/img/globalParam/image-20210723105026924.png)
Find the value of the variable in the [Workflow Instance] page corresponding to the node instance.
The following shows the Node instance [createparam1]:
![png07](/img/globalParam/image-20210723105131381.png)
Here, the value of "id" is 12.
Let's see the case of the node instance [createparam2].
![png08](/img/globalParam/image-20210723105255850.png)
There is only the "id" value. Although the user-defined SQL queries both the "id" and "database_name" fields, only the `OUT` parameter `id` is set, because only "id" is defined for output. The result list is truncated to a length of 10 for display reasons.
### SHELL
`prop` is user-specified and the direction is `OUT`. The output is defined as an export parameter only when the direction is `OUT`. Choose data structures for data type according to the scenario, and leave the value part blank.
To pass a parameter from a shell script, write an output statement in the format `${setValue(key=value)}`, where `key` is the `prop` of the corresponding parameter and `value` is the value to pass (a sketch follows the figures below).
For example, in the figure below:
![png09](/img/globalParam/image-20210723101242216.png)
When the log detects the `${setValue(key=value1)}` format in the shell node definition, it will assign value1 to the key, and downstream nodes can use the variable key directly. Similarly, you can find the corresponding node instance on the [Workflow Instance] page to see the value of the variable.
![png10](/img/globalParam/image-20210723102522383.png)
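A minimal sketch of an upstream shell task script, assuming (as described above) that the worker scans the task output for the `${setValue(...)}` pattern:
```shell
bizdate=$(date +%Y%m%d)
# Escape the leading "$" so bash prints the literal ${setValue(...)} marker
# instead of trying to expand it; DolphinScheduler picks it up from the task log.
echo "\${setValue(bizdate=${bizdate})}"
```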

19
docs/docs/en/guide/parameter/global.md

@ -0,0 +1,19 @@
# Global Parameter
## Scope
The parameters defined on the process definition page can apply to all the scope of the process tasks.
## Usage
To use global parameters: on the process definition page, click the '+' beside 'Set global', fill in the key and value, and save:
<p align="center">
<img src="/img/supplement_global_parameter_en.png" width="80%" />
</p>
<p align="center">
<img src="/img/local_parameter_en.png" width="80%" />
</p>
The `global_bizdate` parameter defined here can be referenced by the local parameters of any other task node; its value is set by referencing the system parameter `system.biz.date`.

19
docs/docs/en/guide/parameter/local.md

@ -0,0 +1,19 @@
# Local Parameter
## Scope
Parameters configured on the task definition page are scoped to that task only. However, if configured as described in [Refer to Parameter Context](context.md), they can be passed to downstream tasks.
## Usage
Usage of local parameters is: at the task define page, click the '+' beside the 'Custom Parameters' and fill in the key and value to save:
<p align="center">
<img src="/img/supplement_local_parameter_en.png" width="80%" />
</p>
<p align="center">
<img src="/img/global_parameter_en.png" width="80%" />
</p>
If you want to call a [built-in parameter](built-in.md) in a local parameter, fill in the value of the built-in parameter in `value`, as with `${biz_date}` and `${curdate}` in the figure above.

40
docs/docs/en/guide/parameter/priority.md

@ -0,0 +1,40 @@
# Parameter Priority
DolphinScheduler has three parameter types:
* [Global Parameter](global.md): parameters defined at the workflow define page
* [Parameter Context](context.md): parameters passed by upstream task nodes
* [Local Parameter](local.md): parameters belong to its node, which is the parameters defined by the user in [Custom Parameters]. The user can define part of the parameters when creating workflow definitions.
Because a parameter value can come from multiple sources, priority issues arise when parameter names are the same. The priority of DolphinScheduler parameters from high to low is: `Local Parameter > Parameter Context > Global Parameter`
When upstream tasks can pass parameters to downstream tasks, multiple upstream tasks may pass parameters with the same name:
* Downstream nodes prefer to use parameters with non-empty values
* If there are multiple parameters with non-empty values, select the value from the upstream task with the earliest completion time
## Example
The following examples show the task parameter priority rules:
1: Use shell nodes to explain the first case.
![png01](/img/globalParam/image-20210723102938239.png)
The [useParam] node can use the parameters set in the [createParam] node. The [useParam] node cannot obtain the parameters from the [noUseParam] node because there is no dependency between them. Other task node types follow the same usage rules as the Shell example here.
![png02](/img/globalParam/image-20210723103316896.png)
The [createParam] node can use parameters directly. In addition, the node creates two parameters named "key" and "key1"; "key1" has the same name as the parameter passed by the upstream node and is assigned the value "12" locally. Due to the priority rules, the local value "12" is used and the value from the upstream node is discarded.
2: Use SQL nodes to explain another case.
![png03](/img/globalParam/image-20210723103937052.png)
The following shows the definition of the [use_create] node:
![png04](/img/globalParam/image-20210723104411489.png)
"status" is the own parameters of the node set by the current node. However, the user also sets the "status" parameter (global parameter) when saving the process definition and assign its value to -1. Then the value of status will be 2 with higher priority when the SQL executes. The global parameter value is discarded.
The "ID" here is the parameter set by the upstream node. The user sets the parameters of the same parameter name "ID" for the [createparam1] node and [createparam2] node. And the [use_create] node uses the value of [createParam1] which is finished first.

18
docs/docs/en/guide/project/project-list.md

@ -0,0 +1,18 @@
# Project Management
## Create Project
- Click "Project Management" to enter the project management page, click the "Create Project" button, enter the project name, project description, and click "Submit" to create a new project.
![project-list](/img/new_ui/dev/project/project-list.png)
## Project Home
- Click the project name link on the project management page to enter the project home page, as shown in the figure below, the project home page contains the task status statistics, process status statistics, and workflow definition statistics of the project. The introduction for those metrics:
- Task status statistics: within the specified time range, count the number of task instances in each status: submitted successfully, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, need fault tolerance, killed, and waiting for thread
- Process status statistics: within the specified time range, count the number of workflow instances in each status: submitted successfully, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, need fault tolerance, killed, and waiting for thread
- Workflow definition statistics: count the workflow definitions created by this user and granted by the administrator
![project-overview](/img/new_ui/dev/project/project-overview.png)

11
docs/docs/en/guide/project/task-instance.md

@ -0,0 +1,11 @@
## Task Instance
- Click Project Management -> Workflow -> Task Instance to enter the task instance page, as shown in the figure below. Click a workflow instance name to jump to the workflow instance DAG chart and view the task status.
<p align="center">
<img src="/img/task-list-en.png" width="80%" />
</p>
- View log: Click the "view log" button in the operation column to view the task execution log.
<p align="center">
<img src="/img/task-log2-en.png" width="80%" />
</p>

114
docs/docs/en/guide/project/workflow-definition.md

@ -0,0 +1,114 @@
# Workflow Definition
## Create workflow definition
- Click Project Management -> Workflow -> Workflow Definition, enter the workflow definition page, and click the "Create Workflow" button to enter the **workflow DAG edit** page, as shown in the following figure:
<p align="center">
<img src="/img/dag5.png" width="80%" />
</p>
- Drag from the toolbar <img src="/img/tasks/icons/shell.png" width="15"/> to the canvas, to add a shell task to the canvas, as shown in the figure below:
![demo-shell-simple](/img/tasks/demo/shell.jpg)
- **Add parameter settings for shell task:**
1. Fill in the "Node Name", "Description" and "Script" fields;
2. Check “Normal” for “Run Flag”. If “Prohibit Execution” is checked, the task will not execute when the workflow runs;
3. Select "Task Priority": when the number of worker threads is insufficient, high priority tasks will execute first in the execution queue, and tasks with the same priority will execute in the order of first in, first out;
4. Timeout alarm (optional): check the timeout alarm and timeout failure options and fill in the "timeout period". When the task execution time exceeds the **timeout period**, an alert email will be sent and the task fails due to timeout;
5. Resources (optional). Resources are files created or uploaded on the Resource Center -> File Management page. For example, if the file name is `test.sh`, the command to call the resource in the script is `sh test.sh`;
6. Customize parameters (optional);
7. Click the "Confirm Add" button to save the task settings.
- **Set dependencies between tasks:** Click the icon in the upper right corner <img src="/img/line.png" width="35"/> to connect tasks. As shown in the figure below, task 2 and task 3 execute in parallel: when task 1 finishes, tasks 2 and 3 execute simultaneously.
<p align="center">
<img src="/img/dag6.png" width="80%" />
</p>
- **Delete dependencies:** Click the "arrow" icon in the upper right corner <img src="/img/arrow.png" width="35"/>, select the connection line, and click the "Delete" icon in the upper right corner <img src="/img/delete.png" width="35"/> to delete the dependency between tasks.
<p align="center">
<img src="/img/dag7.png" width="80%" />
</p>
- **Save workflow definition:** Click the "Save" button, and the "Set DAG chart name" window pops up, as shown in the figure below. Enter the workflow definition name, workflow definition description, and set global parameters (optional, refer to [global parameters](../parameter/global.md)), click the "Add" button to finish workflow definition creation.
<p align="center">
<img src="/img/dag8.png" width="80%" />
</p>
> For other types of tasks, please refer to [Task Node Type and Parameter Settings](#TaskParamers). <!-- markdown-link-check-disable-line -->
## Workflow Definition Operation Function
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, as shown below:
<p align="center">
<img src="/img/work_list_en.png" width="80%" />
</p>
The following are the operation functions of the workflow definition list:
- **Edit:** Only "Offline" workflow definitions can be edited. Workflow DAG editing is the same as [Create Workflow Definition](#creatDag) <!-- markdown-link-check-disable-line -->
- **Online:** When the workflow status is "Offline", used to make workflow online. Only the workflow in the "Online" state can run, but cannot edit
- **Offline:** When the workflow status is "Online", used to make workflow offline. Only the workflow in the "Offline" state can be edited, but cannot run
- **Run:** Only workflow in the online state can run. See [2.3.3 Run Workflow](#run-the-workflow) for the operation steps.
- **Timing:** Timing can only be set on online workflows, and the system automatically schedules the workflow to run on time. The status after creating a timing setting is "offline", and the timing must be set online on the timing management page to take effect. See [2.3.4 Workflow Timing](#workflow-timing) for timing operation steps
- **Timing Management:** The timing management page can edit, online or offline and delete timing
- **Delete:** Delete the workflow definition
- **Download:** Download workflow definition to local
- **Tree Diagram:** Display the task node type and task status in a tree structure, as shown in the figure below:
<p align="center">
<img src="/img/tree_en.png" width="80%" />
</p>
## Run the Workflow
- Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, as shown in the figure below, and click the "Go Online" button <img src="/img/online.png" width="35"/> to bring the workflow online.
<p align="center">
<img src="/img/work_list_en.png" width="80%" />
</p>
- Click the "Run" button to pop up the startup parameter setting window, as shown in the figure below, set the startup parameters, click the "Run" button in the pop-up box, the workflow starts running, and the workflow instance page generates a workflow instance.
<p align="center">
<img src="/img/run_work_en.png" width="80%" />
</p>
Description of workflow operating parameters:
* Failure strategy: determines how other parallel task nodes behave when a task node fails. "Continue" means other task nodes execute normally after a task fails; "End" means all running tasks are terminated and the entire process ends
* Notification strategy: when the process ends, send a process execution result notification email according to the process status; the options are: do not send, send on success, send on failure, send regardless of the result
* Process priority: the priority of process execution, divided into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, higher priority processes execute first in the execution queue, and processes with the same priority execute in first-in, first-out order
* Worker group: the process can only be executed in the specified worker machine group. The default is `Default`, which means the process can execute on any worker
* Notification group: when the notification strategy, timeout alarm, or fault tolerance is triggered, process result information or alarm emails are sent to all members of the notification group
* Recipient: when the notification strategy, timeout alarm, or fault tolerance is triggered, process result information or alarm emails are sent to the recipient list
* Cc: when the notification strategy, timeout alarm, or fault tolerance is triggered, process result information or alarm emails are copied to the CC list
* Startup parameter: set or overwrite global parameter values when starting a new process instance
* Complement: two modes, serial complement and parallel complement. Serial complement: within the specified time range, the complement runs from the start date to the end date and generates N process instances in turn; parallel complement: within the specified time range, multiple days are complemented at the same time to generate N process instances
* You can select a complement time range (only continuous dates are supported) when executing a timed workflow definition. For example, to backfill data from 1st May to 10th May, as shown in the figure below:
<p align="center">
<img src="/img/complement_en1.png" width="80%" />
</p>
> Serial mode: the complement executes sequentially from 1st May to 10th May, and the process instance page generates 10 process instances;
> Parallel mode: the tasks from 1st May to 10th May execute simultaneously, and the process instance page generates 10 process instances;
## Workflow Timing
- Create timing: Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, bring the workflow online, and click the "timing" button <img src="/img/timing.png" width="35"/>; the timing parameter setting dialog box pops up, as shown in the figure below:
<p align="center">
<img src="/img/time_schedule_en.png" width="80%" />
</p>
- Choose the start and end time. Within the time range, the workflow runs on schedule; outside the time range, no scheduled workflow instances are generated.
- Add a timing that executes once a day at 5 AM, as shown in the following figure:
<p align="center">
<img src="/img/timer-en.png" width="80%" />
</p>
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
- Click the "Create" button to create the timing. Now the timing status is "**Offline**" and the timing needs to be **Online** to make effect.
- Timing online: Click the "Timing Management" button <img src="/img/timeManagement.png" width="35"/>, enter the timing management page, click the "online" button, the timing status will change to "online", as shown in the below figure, the workflow makes effect regularly.
<p align="center">
<img src="/img/time-manage-list-en.png" width="80%" />
</p>
## Import Workflow
Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, then click the "Import Workflow" button to import a local workflow file; the workflow definition list displays the imported workflow with an offline status.

62
docs/docs/en/guide/project/workflow-instance.md

@ -0,0 +1,62 @@
# Workflow Instance
## View Workflow Instance
- Click Project Management -> Workflow -> Workflow Instance, enter the Workflow Instance page, as shown in the figure below:
<p align="center">
<img src="/img/instance-list-en.png" width="80%" />
</p>
- Click the workflow name to enter the DAG view page, and check the task execution status, as shown in the figure below:
<p align="center">
<img src="/img/instance-runs-en.png" width="80%" />
</p>
## View Task Log
- Enter the workflow instance page, click the workflow name, enter the DAG view page, double-click the task node, as shown in the figure below:
<p align="center">
<img src="/img/instanceViewLog-en.png" width="80%" />
</p>
- Click "View Log", a log window pops up, as shown in the figure below, you can also view the task log on the task instance page, refer to [Task View Log](./task-instance.md)
<p align="center">
<img src="/img/task-log-en.png" width="80%" />
</p>
## View Task History
- Click Project Management -> Workflow -> Workflow Instance, enter the workflow instance page, and click the workflow name to enter the workflow DAG page;
- Double-click the task node, as shown in the figure below, and click "View History" to jump to the task instance page, which displays the list of task instances run by this workflow instance
<p align="center">
<img src="/img/task_history_en.png" width="80%" />
</p>
## View Operation Parameters
- Click Project Management -> Workflow -> Workflow Instance, enter the workflow instance page, and click the workflow name to enter the workflow DAG page;
- Click the icon in the upper left corner <img src="/img/run_params_button.png" width="35"/> to view the startup parameters of the workflow instance; click the icon <img src="/img/global_param.png" width="35"/> to view the global and local parameters of the workflow instance, as shown in the following figure:
<p align="center">
<img src="/img/run_params_en.png" width="80%" />
</p>
## Workflow Instance Operation Function
Click Project Management -> Workflow -> Workflow Instance, enter the workflow instance page, as shown in the figure below:
<p align="center">
<img src="/img/instance-list-en.png" width="80%" />
</p>
- **Edit:** only terminated processes can be edited. Click the "Edit" button or the workflow instance name to enter the DAG edit page. After editing, click the "Save" button to confirm, as shown in the figure below. In the pop-up box, check "Whether to update to workflow definition" to update the workflow definition; if not checked, the workflow definition is not affected
<p align="center">
<img src="/img/editDag-en.png" width="80%" />
</p>
- **Rerun:** re-execute the terminated process
- **Recovery failed:** for failed processes, you can perform failure recovery operations, starting from the failed node
- **Stop:** **stop** the running process; the background code first sends a `kill` to the worker process, and then executes a `kill -9`
- **Pause:** perform a **pause** operation on the running process; the system status changes to **waiting for execution**, waits for the running task to finish, and pauses the next task in the sequence
- **Resume pause:** resume the paused process, starting directly from the **paused node**
- **Delete:** delete the workflow instance and the task instance under the workflow instance
- **Gantt chart:** the vertical axis of the Gantt chart is the topological sorting of task instances of the workflow instance, and the horizontal axis is the running time of the task instances, as shown in the figure below:
<p align="center">
<img src="/img/gantt-en.png" width="80%" />
</p>

165
docs/docs/en/guide/resource.md

@ -0,0 +1,165 @@
# Resource Center
If you want to use the resource upload function, you can set a local file directory as the upload directory for a single machine (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need a Hadoop (2.6+), MinIO, or other related environment.
> **_Note:_**
>
> * If you want to use the resource upload function, the deployment user in [installation and deployment](installation/standalone.md) must have relevant operation authority.
> * If you are using a Hadoop cluster with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `/opt/dolphinscheduler/conf`; otherwise, skip this copy step.
## HDFS Resource Configuration
- To upload resource files and UDF functions, all uploaded files and resources will be stored on HDFS, which requires the following configurations:
```
conf/common.properties
# user who has permission to create directories under the HDFS root path
hdfs.root.user=hdfs
# resource upload base path: resource files are stored under this HDFS path. Make sure the directory exists on HDFS and has read/write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/dolphinscheduler
# resource storage type: HDFS, S3, NONE
resource.storage.type=HDFS
# whether kerberos is enabled
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# loginUserFromKeytab user
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# loginUserFromKeytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# if resource.storage.type is HDFS and your Hadoop cluster NameNode has HA enabled, put core-site.xml and hdfs-site.xml into the installPath/conf directory (in this example /opt/soft/dolphinscheduler/conf) and configure the NameNode cluster name; if the NameNode is not HA, set this to a specific IP or host name.
# if resource.storage.type is S3, write the S3 address, for example: s3a://dolphinscheduler
# note: for S3, be sure to create the root directory /dolphinscheduler first
fs.defaultFS=hdfs://mycluster:8020
# resourcemanager HA: configure the resourcemanager IPs here; leave it empty for a single resourcemanager
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# if it is a single resourcemanager, only configure one host name; if resourcemanager HA is enabled, the default configuration is fine
yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
```
## File Management
> The File Management page manages various resource files, including creating basic files of type `txt/log/sh/conf/py/java`, uploading jar packages and other file types, and performing edit, rename, download, delete and other operations on them.
![file-manage](/img/new_ui/dev/resource/file-manage.png)
- Create a file
> The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties.
![create-file](/img/new_ui/dev/resource/create-file.png)
- upload files
> Upload file: Click the "Upload File" button to upload, drag the file to the upload area, the file name will be automatically completed with the uploaded file name.
![upload-file](/img/new_ui/dev/resource/upload-file.png)
- File View
> For the files that can be viewed, click the file name to view the file details.
<p align="center">
<img src="/img/file_detail_en.png" width="80%" />
</p>
- Download file
> Click the "Download" button in the file list to download the file or click the "Download" button in the upper right corner of the file details to download the file.
- File rename
![rename-file](/img/new_ui/dev/resource/rename-file.png)
- delete
> File list -> Click the "Delete" button to delete the specified file.
- Re-upload file
> Re-upload file: Click the "Re-upload File" button to upload a new file to replace the old one, or drag the file to the re-upload area; the file name is automatically filled in with the newly uploaded file name.
<p align="center">
<img src="/img/reupload_file_en.png" width="80%" />
</p>
## UDF Management
### Resource Management
> The resource management and file management functions are similar. The difference is that resource management is for uploading UDF resources, while file management is for uploading user programs, scripts and configuration files.
> Operation function: rename, download, delete.
- Upload UDF resources
> Same as uploading files.
### Function Management
- Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary UDF functions for Hive are supported.
- UDF function name: enter the name of the UDF function.
- Package name Class name: enter the full path of the UDF function.
- UDF resource: set the resource file corresponding to the created UDF function.
![create-udf](/img/new_ui/dev/resource/create-udf.png)
## Task Group Settings
The task group is mainly used to control the concurrency of task instances and to limit the pressure on other resources (it can also limit the pressure on a Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task within the task group.
### Task Group Configuration
#### Create Task Group
The user clicks [Resources] - [Task Group Management] - [Task Group option] - [Create Task Group]
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information inside the picture:
- Task group name: the name displayed of the task group
- Project name: the scope of projects in which the task group takes effect. This item is optional; if not selected, all projects in the whole system can use this task group.
- Resource pool size: The maximum number of concurrent task instances allowed.
#### View Task Group Queue
![view-queue](/img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
![view-queue](/img/new_ui/dev/resource/view-groupQueue.png)
#### Use of Task Groups
**Note**: Task groups only apply to tasks executed by workers; node types executed by the master, such as [switch] nodes, [condition] nodes and [sub_process] nodes, are not controlled by the task group. Let's take the shell node as an example:
![use-queue](/img/new_ui/dev/resource/use-queue.png)
Regarding the configuration of the task group, all you need to do is to configure these parts in the red box:
- Task group name: The task group name displayed on the task group configuration page. Here you can only see the task groups that the project has permission to access (a project was selected when creating the task group) or task groups with global scope (no project was selected when creating the task group).
- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value of this part, the higher the priority.
### Implementation Logic of Task Group
#### Get Task Group Resources:
When distributing a task, the master checks whether the task is configured with a task group. If not, the task is sent to the worker to run as usual. If a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to the worker: if decrementing the resource pool by 1 succeeds, the task continues to run; if not, the task exits distribution and waits for other tasks to wake it up.
#### Release and Wake Up:
When a task that occupies task group resources finishes, the task group resources are released. After the release, the system checks whether there are tasks waiting in the current task group. If there are, it marks the waiting task with the highest priority to run and creates a new executable event. The event stores the ID of the task marked to acquire the resource, and the task then acquires the task group resources and runs.
#### Task Group Flowchart
<p align="center">
<img src="/img/task_group_process.png" width="80%" />
</p>

151
docs/docs/en/guide/security.md

@ -0,0 +1,151 @@
# Security (Authorization System)
* Only the administrator account can operate in the security center. It provides functions such as queue management, tenant management, user management, alarm group management, worker group management, token management, etc. In the user management module, resources, data sources, projects, etc. can be authorized to users.
* Log in as the administrator; the default username and password are `admin/dolphinscheduler123`
## Create Queue
- Configure `queue` parameter to execute programs such as Spark and MapReduce.
- The administrator enters the `Security Center->Queue Management` page and clicks the "Create Queue" button to create a new queue.
![create-queue](/img/new_ui/dev/security/create-queue.png)
## Add Tenant
- The tenant corresponds to a Linux user, which the worker uses to submit jobs. The task will fail if this user does not exist on Linux. You can set the parameter `worker.tenant.auto.create` to `true` in the configuration file `worker.properties`; DolphinScheduler will then create the user if it does not exist. The property `worker.tenant.auto.create=true` requires that the worker can run `sudo` commands without a password.
- Tenant Code: **the tenant code is a unique Linux user and cannot be repeated**
- The administrator enters the `Security Center->Tenant Management` page and clicks the `Create Tenant` button to create a tenant.
![create-tenant](/img/new_ui/dev/security/create-tenant.png)
## Create Normal User
- Users are divided into **administrator users** and **normal users**
- The administrator has authorization and user management authorities but does not have the authority to create projects or operate workflow definitions.
- Normal users can create projects and create, edit and execute workflow definitions.
- **Note**: If a user switches tenants, all resources under the user's former tenant will be copied to the new tenant.
- The administrator enters the `Security Center -> User Management` page and clicks the `Create User` button to create a user.
![create-user](/img/new_ui/dev/security/create-user.png)
> **Edit user information**
- The administrator enters the `Security Center->User Management` page and clicks the `Edit` button to edit user information.
- After a normal user logs in, click the user information in the username drop-down box to enter the user information page, and click the `Edit` button to edit the user information.
> **Modify user password**
- The administrator enters the `Security Center->User Management` page and clicks the `Edit` button. When editing user information, enter the new password to modify the user password.
- After a normal user logs in, click the user information in the username drop-down box to enter the password modification page, enter the new password, confirm it, and click the `Edit` button; the password is then changed successfully.
## Create Alarm Group
- The alarm group is a parameter set at startup. After the process ends, the status of the process and other information will be sent to the alarm group by email.
* The administrator enters the `Security Center -> Alarm Group Management` page and clicks the `Create Alarm Group` button to create an alarm group.
![create-alarmInstance](/img/new_ui/dev/security/create-alarmInstance.png)
## Token Management
> Since the back-end interfaces require a login check, token management provides a way to perform various operations on the system by calling the interfaces.
- The administrator enters the `Security Center -> Token Management` page, clicks the `Create Token` button, selects the expiration time and user, clicks the `Generate Token` button, and clicks the `Submit` button to create a token for the selected user.
![create-token](/img/new_ui/dev/security/create-token.png)
- After a normal user logs in, click the user information in the username drop-down box, enter the token management page, select the expiration time, click the `Generate Token` button, and click the `Submit` button to create a token.
- Call example:
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * test token
 */
public void doPOSTParam() throws Exception {
    // create HttpClient
    CloseableHttpClient httpclient = HttpClients.createDefault();

    // create an http post request and carry the token in the header
    HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create");
    httpPost.setHeader("token", "123");
    // set form parameters
    List<NameValuePair> parameters = new ArrayList<NameValuePair>();
    parameters.add(new BasicNameValuePair("projectName", "qzw"));
    parameters.add(new BasicNameValuePair("desc", "qzw"));
    UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(parameters);
    httpPost.setEntity(formEntity);
    CloseableHttpResponse response = null;
    try {
        // execute the request
        response = httpclient.execute(httpPost);
        // response status code 200 means success
        if (response.getStatusLine().getStatusCode() == 200) {
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    } finally {
        if (response != null) {
            response.close();
        }
        httpclient.close();
    }
}
```
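For a quick test without writing Java, the same call can be issued with `curl`; the endpoint, parameters, and token value below simply mirror the Java example above and are illustrative only:

```shell
# Create a project through the API using a generated token (illustrative values)
curl -X POST "http://127.0.0.1:12345/escheduler/projects/create" \
     -H "token: 123" \
     -d "projectName=qzw" \
     -d "desc=qzw"
```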
## Granted Permissions
* Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
* The administrator can authorize projects, resources, data sources and UDF functions to normal users who did not create them. Because the way to authorize projects, resources, data sources and UDF functions to users is the same, we take project authorization as an example.
* Note: Users have all permissions for the projects they create themselves, so those projects are not displayed in the project list or the selected project list when authorizing.
- The administrator enters the `Security Center -> User Management` page and clicks the `Authorize` button of the user who needs to be authorized, as shown in the figure below:
<p align="center">
<img src="/img/auth-en.png" width="80%" />
</p>
- Select the project and authorize the project.
<p align="center">
<img src="/img/auth-project-en.png" width="80%" />
</p>
- Resources, data sources, and UDF function authorization are the same as project authorization.
## Worker Grouping
Each worker node will belong to its own worker group, and the default group is "default".
When executing a task, the task can be assigned to the specified worker group, and the task will be executed by the worker node in the group.
> Add or update worker group
- Open the `conf/worker.properties` configuration file on the worker node where you want to configure the groups and modify the `worker.groups` parameter.
- The `worker.groups` parameter is followed by the name of the group corresponding to the worker node, which is `default`.
- If the worker node corresponds to more than one group, they are separated by commas.
```conf
worker.groups=default,test
```
- You can also change the worker group for a worker at runtime; if the modification succeeds, the worker will use the new group and ignore the configuration in `worker.properties`. The steps to modify the worker group are: `Security Center -> Worker Group Management -> click 'New Worker Group' -> enter 'Group Name' -> select existing workers -> click 'Submit'`.
## Environmental Management
* Configure the Worker operating environment online. A Worker can specify multiple environments, and each environment is equivalent to the `dolphinscheduler_env.sh` file.
* The default environment is the `dolphinscheduler_env.sh` file.
* When executing a task, the task can be assigned to the specified worker group, and select the corresponding environment according to the worker group. Finally, the worker node executes the environment first and then executes the task.
> Add or update environment
- The environment configuration is equivalent to the configuration in the `dolphinscheduler_env.sh` file.
![create-environment](/img/new_ui/dev/security/create-environment.png)
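As a rough illustration, an environment created here can contain the same kind of export statements as `dolphinscheduler_env.sh`; the paths below are assumptions for demonstration only:

```shell
# Possible environment content (paths are illustrative)
export HADOOP_HOME=/opt/soft/hadoop
export SPARK_HOME=/opt/soft/spark
export JAVA_HOME=/opt/soft/java
export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin:$PATH
```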
> Usage environment
- Create a task node in the workflow definition, select the worker group and the environment corresponding to the worker group. When executing the task, the Worker will execute the environment first before executing the task.
![use-environment](/img/new_ui/dev/security/use-environment.png)

1024
docs/docs/en/guide/start/docker.md

File diff suppressed because it is too large

62
docs/docs/en/guide/start/quick-start.md

@ -0,0 +1,62 @@
# Quick Start
* Watch the Apache DolphinScheduler Quick Start Tutorial here:
[![image](/img/video_cover/quick-use.png)](https://www.youtube.com/watch?v=nrF20hpCkug)
* Administrator user login
> Address: http://localhost:12345/dolphinscheduler, username and password: `admin/dolphinscheduler123`
![login](/img/new_ui/dev/quick-start/login.png)
* Create a queue
![create-queue](/img/new_ui/dev/quick-start/create-queue.png)
* Create a tenant
![create-tenant](/img/new_ui/dev/quick-start/create-tenant.png)
* Create Ordinary Users
![create-user](/img/new_ui/dev/quick-start/create-user.png)
* Create an alarm instance
![create-alarmInstance](/img/new_ui/dev/quick-start/create-alarmInstance.png)
* Create an alarm group
![create-alarmGroup](/img/new_ui/dev/quick-start/create-alarmGroup.png)
* Create a worker group
![create-workerGroup](/img/new_ui/dev/quick-start/create-workerGroup.png)
* Create environment
![create-environment](/img/new_ui/dev/quick-start/create-environment.png)
* Create a token
![create-token](/img/new_ui/dev/quick-start/create-token.png)
* Login with regular users
> Click the username in the upper right corner to "Exit", then log in again as the normal user.
* `Project Management -> Create Project -> Click on Project Name`
![project](/img/new_ui/dev/quick-start/project.png)
* `Click Workflow Definition -> Create Workflow Definition -> Online Process Definition`
<p align="center">
<img src="/img/process_definition_en.png" width="60%" />
</p>
* `Running Process Definition -> Click Workflow Instance -> Click Process Instance Name -> Double-click Task Node -> View Task Execution Log`
<p align="center">
<img src="/img/log_en.png" width="60%" />
</p>

36
docs/docs/en/guide/task/conditions.md

@ -0,0 +1,36 @@
# Conditions
Conditions is a conditional node that determines which downstream task should run based on the status of the upstream tasks. Currently, the Conditions task supports multiple upstream tasks but only two downstream tasks. When there is more than one upstream task, complex upstream dependencies can be achieved through the `and` and `or` operators.
## Create Task
Drag from the toolbar <img src="/img/conditions.png" width="20"/> task node to canvas to create a new Conditions task, as shown in the figure below:
<p align="center">
<img src="/img/condition_dag_en.png" width="80%" />
</p>
<p align="center">
<img src="/img/condition_task_en.png" width="80%" />
</p>
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- Downstream tasks selection: supports two branches, success and failure.
- Success: When the upstream task runs successfully, run the success branch.
- Failure: When the upstream task fails, run the failure branch.
- Upstream condition selection: can select one or more upstream tasks for conditions.
- Add an upstream dependency: the first parameter is to choose a specified task name, and the second parameter is to choose the upstream task status to trigger conditions.
- Select upstream task relationship: use `and` and `or` operators to handle the complex relationship of upstream when there are multiple upstream tasks for conditions.
## Related Task
[switch](switch.md): The Conditions task mainly executes the corresponding branch based on the execution status (success, failure) of the upstream nodes. The [Switch](switch.md) task node mainly executes the corresponding branch based on the value of a [global parameter](../parameter/global.md) and the result of a user-written expression.

63
docs/docs/en/guide/task/datax.md

@ -0,0 +1,63 @@
# DataX
## Overview
DataX task type, for executing DataX programs. For DataX nodes, the worker executes `${DATAX_HOME}/bin/datax.py` to parse and run the input JSON job file.
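In other words, for a DataX node the worker ends up running something roughly like the command below; this is only a sketch, as the actual JSON job file path is generated by DolphinScheduler at execution time:

```shell
# Roughly what the worker executes for a DataX node (job.json stands for the generated job file)
python ${DATAX_HOME}/bin/datax.py job.json
```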
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/datax.png" width="15"/> task node to canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Custom template**: Customize the content of the DataX node's JSON profile when the default DataSource provided does not meet the requirements.
- **JSON**: JSON configuration file for DataX synchronization.
- **Custom parameters**: The custom parameters of the SQL task type are ordered, and the way to set the parameter type and data type is the same as for the stored procedure task type. The difference is that the custom parameters of the SQL task type replace `${variable}` in the SQL statement.
- **Data source**: Select the data source to extract data from.
- **SQL statement**: The SQL statement used to extract data from the target database; the SQL query column names are automatically parsed when executing the node and mapped to the column names of the target table to synchronize. When the column names of the source table and the target table are inconsistent, they can be converted by using a column alias (`as`).
- **Target library**: Select the target library for data synchronization.
- **Pre-SQL**: Pre-SQL executes before the SQL statement (executed by the target database).
- **Post-SQL**: Post-SQL executes after the SQL statement (executed by the target database).
- **Stream limit (number of bytes)**: Limit the number of bytes for a query.
- **Limit flow (number of records)**: Limit the number of records for a query.
- **Running memory**: Set the minimum and maximum memory required, which can be set according to the actual production environment.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Example
This example demonstrates how to import data from Hive into MySQL.
### Configure the DataX environment in DolphinScheduler
If you are using the DataX task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
![datax_task01](/img/tasks/demo/datax_task01.png)
After finishing the environment configuration, you need to restart DolphinScheduler.
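For reference, the relevant entries in `dolphinscheduler_env.sh` might look like the sketch below; the installation paths are assumptions for illustration:

```shell
# Possible dolphinscheduler_env.sh entries for DataX (paths are illustrative)
export DATAX_HOME=/opt/soft/datax
export PYTHON_HOME=/usr/bin/python
export PATH=$DATAX_HOME/bin:$PATH
```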
### Configure DataX Task Node
As the default DataSource does not support reading data from Hive, a custom JSON configuration is required; refer to [HDFS Writer](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md). Note: partition directories exist on the HDFS path; when importing data in real-world situations, it is recommended to pass the partition as a parameter, using custom parameters.
After finishing the required JSON file, you can configure the node by following the steps in the diagram below:
![datax_task02](/img/tasks/demo/datax_task02.png)
### View Execution Result
![datax_task03](/img/tasks/demo/datax_task03.png)
### Notice
If the default DataSource provided does not meet your needs, you can configure the writer and reader of the DataX according to the actual usage environment in the custom template options, available at [DataX](https://github.com/alibaba/DataX).

27
docs/docs/en/guide/task/dependent.md

@ -0,0 +1,27 @@
# Dependent Node
- Dependent nodes are **dependency check nodes**. For example, process A depends on the successful execution of process B from yesterday, and the dependent node checks whether process B ran successfully yesterday.
> Drag from the toolbar ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_DEPENDENT.png) task node to the canvas, as shown in the figure below:
<p align="center">
<img src="/img/dependent-nodes-en.png" width="80%" />
</p>
> The dependent node provides a logical judgment function, such as checking whether the B process was successful yesterday, or whether the C process was executed successfully.
<p align="center">
<img src="/img/depend-node-en.png" width="80%" />
</p>
> For example, process A is a weekly report task, processes B and C are daily tasks, and task A requires tasks B and C to be successfully executed every day of the last week, as shown in the figure:
<p align="center">
<img src="/img/depend-node1-en.png" width="80%" />
</p>
> If the weekly report A also needs to be executed successfully last Tuesday:
<p align="center">
<img src="/img/depend-node3-en.png" width="80%" />
</p>

60
docs/docs/en/guide/task/emr.md

@ -0,0 +1,60 @@
# Amazon EMR
## Overview
Amazon EMR task type, for creating EMR clusters on AWS and running computing tasks. The background code uses [aws-java-sdk](https://aws.amazon.com/cn/sdk-for-java/) to convert the JSON parameters into a [RunJobFlowRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/model/RunJobFlowRequest.html) object and submit it to AWS.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- JSON: JSON corresponding to the [RunJobFlowRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/model/RunJobFlowRequest.html) object, for details refer to [API_RunJobFlow_Examples](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html#API_RunJobFlow_Examples).
## JSON example
```json
{
"Name": "SparkPi",
"ReleaseLabel": "emr-5.34.0",
"Applications": [
{
"Name": "Spark"
}
],
"Instances": {
"InstanceGroups": [
{
"Name": "Primary node",
"InstanceRole": "MASTER",
"InstanceType": "m4.xlarge",
"InstanceCount": 1
}
],
"KeepJobFlowAliveWhenNoSteps": false,
"TerminationProtected": false
},
"Steps": [
{
"Name": "calculate_pi",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"/usr/lib/spark/bin/run-example",
"SparkPi",
"15"
]
}
}
],
"JobFlowRole": "EMR_EC2_DefaultRole",
"ServiceRole": "EMR_DefaultRole"
}
```

69
docs/docs/en/guide/task/flink.md

@ -0,0 +1,69 @@
# Flink Node
## Overview
Flink task type for executing Flink programs. For Flink nodes, the worker submits the task by using the Flink command `flink run`. See [flink cli](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/) for more details.
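For intuition, the command the worker assembles for a YARN cluster-mode Java/Scala job looks roughly like the sketch below; the exact flags depend on the Flink version and on the node parameters described later, so treat the values as illustrative:

```shell
# Rough shape of the submission for a cluster-mode Java/Scala job (illustrative values)
flink run -m yarn-cluster \
  -yjm 1G -ytm 2G -ys 1 -p 2 \
  -c org.apache.flink.examples.java.wordcount.WordCount \
  WordCount.jar --input /tmp/words.txt --output /tmp/wordcount-result
```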
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/flink.png" width="15"/> task node to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Program type**: Supports Java, Scala and Python.
- **The class of main function**: The **full path** of Main Class, the entry point of the Flink program.
- **Main jar package**: The jar package of the Flink program (upload by Resource Center).
- **Deployment mode**: Support 2 deployment modes: cluster and local.
- **Flink version**: Select version according to the execution env.
- **Task name** (optional): Flink task name.
- **JobManager memory size**: Used to set the size of jobManager memories, which can be set according to the actual production environment.
- **Number of slots**: Used to set the number of slots, which can be set according to the actual production environment.
- **TaskManager memory size**: Used to set the size of taskManager memories, which can be set according to the actual production environment.
- **Number of TaskManager**: Used to set the number of taskManagers, which can be set according to the actual production environment.
- **Parallelism**: Used to set the degree of parallelism for executing Flink tasks.
- **Main program parameters**: Set the input parameters for the Flink program and support the substitution of custom parameter variables.
- **Optional parameters**: Supports the `--jar`, `--files`, `--archives`, and `--conf` formats.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **Custom parameter**: It is a local user-defined parameter for Flink, and will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Example
### Execute the WordCount Program
This is a common introductory case in the big data ecosystem, often applied to computational frameworks such as MapReduce, Flink and Spark. Its main purpose is to count the number of identical words in the input text. (Flink releases ship with this example job.)
#### Configure the flink environment in DolphinScheduler
If you are using the flink task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
![demo-flink-simple](/img/tasks/demo/flink_task01.png)
#### Upload the Main Package
When using the Flink task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping.
![resource_upload](/img/tasks/demo/upload_jar.png)
#### Configure Flink Nodes
Configure the required content according to the parameter descriptions above.
![demo-flink-simple](/img/tasks/demo/flink_task02.png)
## Notice
JAVA and Scala are only used for identification, and there is no difference between them. If you use Python to develop Flink jobs, there is no main function class, and the rest is the same.

47
docs/docs/en/guide/task/http.md

@ -0,0 +1,47 @@
# HTTP Node
## Overview
This node is used to perform http type tasks such as the common POST and GET request types, and also supports http request validation and other functions.
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Drag the <img src="/img/tasks/icons/http.png" width="15"/> from the toolbar to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- **Descriptive information**: describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, they are executed in order from high to low, and when the priority is the same, they are executed according to the first-in first-out principle.
- **Worker grouping**: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
- **Environment Name**: Configure the environment name in which to run the script.
- **Number of failed retry attempts**: The number of times the task failed to be resubmitted.
- **Failed retry interval**: The time interval (unit: minute) for resubmitting the task after a failure.
- **Delayed execution time**: The time (unit: minute) by which task execution is delayed.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
- **Request address**: HTTP request URL.
- **Request type**: Supports GET, POST, HEAD, PUT, and DELETE.
- **Request parameters**: Support Parameter, Body, Headers.
- **Verification conditions**: support default response code, custom response code, content included, content not included.
- **Verification content**: Required when the verification condition is set to custom response code, content included, or content not included.
- **Custom parameter**: It is a user-defined parameter of http part, which will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task.
## Example
HTTP defines the different methods of interacting with the server, the most basic methods are GET, POST, PUT and DELETE. Here we use the http task node to demonstrate the use of POST to send a request to the system's login page to submit data.
The main configuration parameters are as follows:
- URL: Address to access the target resource. Here is the system's login page.
- HTTP Parameters:
- userName: Username
- userPassword: User login password
![http_task](/img/tasks/demo/http_task01.png)
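The request configured above is roughly equivalent to the following `curl` call; the host, port, endpoint path, and credentials are assumptions for illustration:

```shell
# Rough curl equivalent of the HTTP task above (illustrative endpoint and credentials)
curl -X POST "http://localhost:12345/dolphinscheduler/login" \
     -d "userName=admin" \
     -d "userPassword=dolphinscheduler123"
```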
## Notice
None

73
docs/docs/en/guide/task/map-reduce.md

@ -0,0 +1,73 @@
# MapReduce Node
## Overview
MapReduce(MR) task type used for executing MapReduce programs. For MapReduce nodes, the worker submits the task by using the Hadoop command `hadoop jar`. See [Hadoop Command Manual](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#jar) for more details.
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/mr.png" width="15"/> to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Resource**: Refers to the list of resource files that called in the script, and upload or create files by the Resource Center file management.
- **Custom parameters**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
### JAVA or SCALA Program
- **Program type**: Select JAVA or SCALA program.
- **The class of the main function**: The **full path** of Main Class, the entry point of the MapReduce program.
- **Main jar package**: The jar package of the MapReduce program.
- **Task name** (optional): MapReduce task name.
- **Command line parameters**: Set the input parameters of the MapReduce program and support the substitution of custom parameter variables.
- **Other parameters**: support `-D`, `-files`, `-libjars`, `-archives` format.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **User-defined parameter**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
### Python Program
- **Program type**: Select Python language.
- **Main jar package**: The Python jar package for running MapReduce.
- **Other parameters**: supports the `-D`, `-mapper`, `-reducer`, `-input` and `-output` formats, and you can set the input of user-defined parameters, such as (a full assembled command is sketched after this list):
- `-mapper "mapper.py 1"` `-file mapper.py` `-reducer reducer.py` `-file reducer.py` `-input /journey/words.txt` `-output /journey/out/mr/\${currentTimeMillis}`
- The `mapper.py 1` after `-mapper` contains two parameters: the first parameter is `mapper.py`, and the second parameter is `1`.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **User-defined parameter**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
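Putting the arguments above together, the worker ends up submitting something like the command below; the streaming jar path and HDFS directories are assumptions for illustration:

```shell
# Rough shape of the assembled Hadoop streaming submission (paths are illustrative)
hadoop jar /opt/soft/hadoop/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.name=wordcount \
  -mapper "mapper.py 1" -file mapper.py \
  -reducer reducer.py -file reducer.py \
  -input /journey/words.txt \
  -output /journey/out/mr/${currentTimeMillis}
```

Here `${currentTimeMillis}` is the custom parameter referenced in the example arguments above.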
## Task Example
### Execute the WordCount Program
This example is a common introductory type of MapReduce application, which is used to count the number of identical words in the input text.
#### Configure the MapReduce Environment in DolphinScheduler
If you are using the MapReduce task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
![mr_configure](/img/tasks/demo/mr_task01.png)
#### Upload the Main Package
When using the MapReduce task node, you need to use the Resource Centre to upload the jar package for the execution. Refer to the [resource centre](../resource.md).
After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping.
![resource_upload](/img/tasks/demo/upload_jar.png)
#### Configure MapReduce Nodes
Configure the required content according to the parameter descriptions above.
![demo-mr-simple](/img/tasks/demo/mr_task02.png)

19
docs/docs/en/guide/task/pigeon.md

@ -0,0 +1,19 @@
# Pigeon
Pigeon is a task type used to trigger remote tasks and acquire logs or status by calling a remote WebSocket service. It is the way DolphinScheduler calls tasks through a remote WebSocket service.
## Create
Drag from the toolbar <img src="/img/pigeon.png" width="20"/> to the canvas to create a new Pigeon task.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- Target task name: Target task name of this Pigeon node.

55
docs/docs/en/guide/task/python.md

@ -0,0 +1,55 @@
# Python Node
## Overview
Use `Python Task` to create a python-type task and execute python scripts. When the worker executes `Python Task`,
it generates a temporary python script and executes it as the Linux user with the same name as the tenant.
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Drag <img src="/img/tasks/icons/python.png" width="15"/> from the toolbar to the canvas.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Environment Name: Configure the environment name in which to run the script.
- Number of failed retry attempts: The failure task resubmitting times. It supports drop-down and hand-filling.
- Failed retry interval: The time interval for resubmitting the task after a failed task. It supports drop-down and hand-filling.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail.
- Script: Python program developed by the user.
- Resource: Refers to the list of resource files that need to be called in the script, and the files uploaded or created by the resource center-file management.
- Custom parameters: The user-defined parameters of Python, which will replace the content with `${variable}` in the script.
## Task Example
### Simply Print
This example simulates a common task that runs by a simple command. The example is to print one line in the log file, as shown in the following figure:
"This is a demo of python task".
![demo-python-simple](/img/tasks/demo/python_ui_next.jpg)
```python
print("This is a demo of python task")
```
### Custom Parameters
This example simulates a custom parameter task. We use parameters to reuse existing tasks as templates or to cope with dynamic tasks. In this case,
we declare a custom parameter named "param_key" with the value "param_val". Then we use `print` to output the parameter "${param_key}" we just declared.
After running this example, we would see "param_val" printed in the log.
![demo-python-custom-param](/img/tasks/demo/python_custom_param_ui_next.jpg)
```python
print("${param_key}")
```
## Notice
None

43
docs/docs/en/guide/task/shell.md

@ -0,0 +1,43 @@
# Shell
## Overview
Shell task type, used to create a shell task and execute a series of shell scripts. When the worker runs a shell task, a temporary shell script is generated and executed as the Linux user with the same name as the tenant.
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/shell.png" width="15"/> to the canvas.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Environment Name: Configure the environment name in which to run the script.
- Times of failed retry attempts: The number of times to resubmit the task after failure. You can select from the drop-down menu or fill in a number.
- Failed retry interval: The time interval for resubmitting the task after failure. You can select from the drop-down menu or fill in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs longer than the "timeout", an alarm email will be sent and the task execution will fail.
- Script: Shell program developed by users.
- Resource: The list of resource files referenced in the script; upload or create these files in Resource Center - File Management.
- Custom parameters: User-defined local parameters of the shell task that replace the `${variable}` placeholders in the script.
- Predecessor task: Selecting a predecessor task for the current task sets the selected task as upstream of the current task.
## Task Example
### Simply Print
This example simulates a common task that runs a single command. The example prints one line in the log file, as shown in the following figure:
"This is a demo of shell task".
![demo-shell-simple](/img/tasks/demo/shell.jpg)
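The script content used in this example is just a single command, for instance:
```shell
echo "This is a demo of shell task"
```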
### Custom Parameters
This example simulates a custom parameter task. Parameters are useful for reusing existing tasks as templates or coping with dynamic tasks. In this case,
we declare a custom parameter named "param_key", with the value "param_val". Then we use `echo` to print the parameter "${param_key}" we just declared.
After running this example, we would see "param_val" printed in the log.
![demo-shell-custom-param](/img/tasks/demo/shell_custom_param.jpg)
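The corresponding script content is, for example:
```shell
echo "${param_key}"
```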

68
docs/docs/en/guide/task/spark.md

@ -0,0 +1,68 @@
# Spark Node
## Overview
The Spark task type is used to execute Spark programs. For Spark nodes, the worker submits the task by using the `spark-submit` command. See [spark-submit](https://spark.apache.org/docs/3.2.1/submitting-applications.html#launching-applications-with-spark-submit) for more details.
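For orientation, the command the worker assembles from the node parameters is conceptually similar to a plain `spark-submit` invocation like the sketch below; the class name, resource sizes and jar path are placeholders, and the real values come from the task parameters described in the following sections:
```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --driver-cores 1 \
  --driver-memory 512M \
  --num-executors 2 \
  --executor-memory 2G \
  /path/to/spark-examples.jar 100
```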
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/spark.png" width="15"/> to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which to run the script.
- **Times of failed retry attempts**: The number of times to resubmit the task after failure.
- **Failed retry interval**: The time interval (unit: minute) for resubmitting the task after failure.
- **Delayed execution time**: The time (unit: minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs longer than the "timeout", an alarm email will be sent and the task execution will fail.
- **Program type**: Supports Java, Scala and Python.
- **Spark version**: Supports Spark1 and Spark2.
- **The class of main function**: The **full path** of the Main Class, the entry point of the Spark program.
- **Main jar package**: The Spark jar package (uploaded via the Resource Center).
- **Deployment mode**: Supports three deployment modes: yarn-cluster, yarn-client and local.
- **Task name** (optional): Spark task name.
- **Driver core number**: Set the number of Driver cores according to the actual production environment.
- **Driver memory size**: Set the size of Driver memory according to the actual production environment.
- **Number of Executors**: Set the number of Executors according to the actual production environment.
- **Executor memory size**: Set the size of Executor memory according to the actual production environment.
- **Main program parameters**: Set the input parameters of the Spark program and support the substitution of custom parameter variables.
- **Optional parameters**: Supports `--jars`, `--files`, `--archives` and `--conf` options.
- **Resource**: Appoint resource files in the `Resource` field if parameters refer to them.
- **Custom parameter**: A user-defined local parameter of the Spark task that replaces the `${variable}` placeholders in the script.
- **Predecessor task**: Selecting a predecessor task for the current task sets the selected task as upstream of the current task.
## Task Example
### Execute the WordCount Program
This is a common introductory case in the big data ecosystem, often applied to computational frameworks such as MapReduce, Flink and Spark. The main purpose is to count the number of identical words in the input text. (The Spark distribution ships this example job.)
#### Configure the Spark Environment in DolphinScheduler
If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
![spark_configure](/img/tasks/demo/spark_task01.png)
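As a sketch, the Spark-related entries in `dolphinscheduler_env.sh` usually look like the following; the installation paths below are placeholders and must match your environment:
```shell
export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PATH=$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PATH
```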
#### Upload the Main Package
When using the Spark task node, you need to upload the jar package to the Resource Center for the execution; refer to the [resource center](../resource.md).
After finishing the Resource Center configuration, upload the required target files directly by dragging and dropping.
![resource_upload](/img/tasks/demo/upload_jar.png)
#### Configure Spark Nodes
Configure the required content according to the parameter descriptions above.
![demo-spark-simple](/img/tasks/demo/spark_task02.png)
## Notice
JAVA and Scala are only used for identification, and there is no difference between them. If you use Python to develop the Spark program, there is no class of the main function and the rest is the same.

43
docs/docs/en/guide/task/sql.md

@ -0,0 +1,43 @@
# SQL
## Overview
The SQL task type is used to connect to databases and execute SQL statements.
## Create DataSource
Refer to [DataSource](../datasource/introduction.md)
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/sql.png" width="25"/> to the canvas.
## Task Parameter
- Data source: Select the corresponding DataSource.
- SQL type: Supports query and non-query. A query is a `select`-type statement that returns a result set; you can specify one of three templates for email notification: form, attachment, or form attachment. A non-query returns no result set and covers three types of operations: update, delete and insert.
- SQL parameter: The input parameter format is `key1=value1;key2=value2...`.
- SQL statement: SQL statement.
- UDF function: For Hive DataSources, you can refer to UDF functions created in the resource center; other DataSources do not support UDF functions.
- Custom parameters: The custom parameters of the SQL task set the parameter type and data type in the same way as the stored procedure task type. The difference is that the custom parameters of the SQL task type replace the `${variable}` placeholders in the SQL statement.
- Pre-SQL: Pre-SQL executes before the SQL statement.
- Post-SQL: Post-SQL executes after the SQL statement.
## Task Example
### Create a Temporary Table in Hive and Write Data
This example creates a temporary table `tmp_hello_world` in Hive and writes a row of data. Before creating the temporary table, we need to ensure that the table does not exist, so we use a custom parameter to obtain the current date as the suffix of the table name each time the task runs; this way the task can run on different days. The format of the created table name is: `tmp_hello_world_{yyyyMMdd}`.
![hive-sql](/img/tasks/demo/hive-sql.png)
### After Running the Task Successfully, Query the Results in Hive
Log in to the bigdata cluster and use the `hive` command, `beeline`, JDBC or other methods to connect to Apache Hive for the query. The query SQL is `select * from tmp_hello_world_{yyyyMMdd}`; please replace `{yyyyMMdd}` with the date of the running day. The following shows the query screenshot:
![hive-sql](/img/tasks/demo/hive-result.png)
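For example, from a shell on the cluster the query can be issued with `beeline`; the JDBC URL below is a placeholder for your HiveServer2 address:
```shell
beeline -u jdbc:hive2://<hiveserver2-host>:10000 \
  -e "select * from tmp_hello_world_$(date +%Y%m%d);"
```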
## Notice
Pay attention to the selection of the SQL type. If it is an insert operation, you need to change the type to "Non-Query".

13
docs/docs/en/guide/task/stored-procedure.md

@ -0,0 +1,13 @@
# Stored Procedure
- Execute the stored procedure according to the selected DataSource.
> Drag from the toolbar ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PROCEDURE.png) task node into the canvas, as shown in the figure below:
<p align="center">
<img src="/img/procedure-en.png" width="80%" />
</p>
- DataSource: The DataSource type of the stored procedure supports MySQL and PostgreSQL; select the corresponding DataSource.
- Method: The method name of the stored procedure.
- Custom parameters: The custom parameter types of the stored procedure support `IN` and `OUT`, and the data types support: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN.

46
docs/docs/en/guide/task/sub-process.md

@ -0,0 +1,46 @@
# SubProcess Node
## Overview
The sub-process node executes an external workflow definition as a task node.
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/sub_process.png" width="15"/> task node to canvas to create a new SubProcess task.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Environment Name: Configure the environment name in which to run the script.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs longer than the "timeout", an alarm email will be sent and the task execution will fail.
- Sub-node: It is the workflow definition of the selected sub-process. Enter the sub-node in the upper right corner to jump to the workflow definition of the selected sub-process.
- Predecessor task: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Example
This example simulates a common task type; here we use a sub-process task to invoke the [Shell](shell.md) task and print out "hello". That is, we execute a shell task as a sub-node.
### Create a Shell task
Create a shell task to print "hello" and define the workflow as `test_dag01`.
![subprocess_task01](/img/tasks/demo/subprocess_task01.png)
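The script of this shell task can be as simple as:
```shell
echo "hello"
```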
### Create the Sub_process task
To use the sub_process, you need to create the sub-node task, which is the shell task we created in the first step. After that, as shown in the diagram below, select the corresponding sub-node in position ⑤.
![subprocess_task02](/img/tasks/demo/subprocess_task02.png)
After creating the sub_process, create a corresponding shell task for printing "world" and link both together. Save the current workflow and run it to get the expected result.
![subprocess_task03](/img/tasks/demo/subprocess_task03.png)
## Notice
When using `sub_process` to invoke a sub-node task, you need to ensure that the defined sub-node is in online status; otherwise, the sub_process workflow will not work properly.

39
docs/docs/en/guide/task/switch.md

@ -0,0 +1,39 @@
# Switch
The switch is a conditional judgment node that decides which branch to execute according to the value of a [global variable](../parameter/global.md) and the result of the expression written by the user.
## Create
Drag from the toolbar <img src="/img/switch.png" width="20"/> task node to canvas to create a task.
**Note**: After creating a switch task, you must first configure its upstream and downstream tasks, and then configure the parameters of the task branches.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times to resubmit the task after failure. You can select from the drop-down menu or fill in a number.
- Failed retry interval: The time interval for resubmitting the task after failure. You can select from the drop-down menu or fill in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs longer than the "timeout", an alarm email will be sent and the task execution will fail.
- Condition: You can configure multiple conditions for the switch task. When a condition is satisfied, the configured branch executes. You can configure multiple different conditions to cover different business scenarios.
- Branch flow: The default branch flow, when all the conditions are not satisfied, execute this branch flow.
## Detail
Here we have four tasks; the dependencies are `A -> B -> [C, D]`, where `task_a` is a shell task and `task_b` is a switch task.
- In task A, a global variable named `id` is defined through the [global variable](../parameter/global.md), declared as `${setValue(id=1)}` (see the snippet after this list).
- Task B adds conditions and uses the global variable declared upstream for conditional judgment (note: the switch can read a global variable's value as long as a direct or indirect upstream task has already assigned it before the switch acquires it). We want to execute task C when `id = 1`, and otherwise run task D.
- Configure task C to run when the global variable is `id=1`: edit `${id} == 1` in the condition of task B and select `C` as the branch flow.
- Otherwise, select `D` as the default branch flow.
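As a minimal sketch, task A can declare the global variable by echoing the `${setValue(...)}` expression from its shell script (the single quotes keep the shell from interpreting the expression):
```shell
# declare the global variable `id` with value 1 for downstream tasks
echo '${setValue(id=1)}'
```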
The following shows the switch task configuration:
![task-switch-configure](/img/switch_configure.jpg)
## Related Task
The [Condition](conditions.md) task mainly executes the corresponding branch based on the execution result status (success, failure) of the upstream node.
The [Switch](switch.md) task mainly executes the corresponding branch based on the value of the [global parameter](../parameter/global.md) and the judgment expression result written by the user.

84
docs/docs/en/guide/upgrade.md

@ -0,0 +1,84 @@
# DolphinScheduler Upgrade Documentation
## Back-Up Previous Version's Files and Database
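Before upgrading, back up both the metadata database and the previous installation directory. For example, if MySQL stores the metadata, the backup might look like the following sketch, where the host, user and paths are placeholders for your environment:
```shell
# back up the metadata database (MySQL assumed)
mysqldump -h <db-host> -u <db-user> -p dolphinscheduler > dolphinscheduler-db-backup-$(date +%Y%m%d).sql
# back up the previous installation directory
tar -czf dolphinscheduler-install-backup-$(date +%Y%m%d).tar.gz <previous-install-path>
```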
## Stop All Services of DolphinScheduler
`sh ./script/stop-all.sh`
## Download the Newest Version Installation Package
- [download](/en-us/download/download.html) the latest version of the installation packages.
- The following upgrade operations need to be performed in the new version's directory.
## Database Upgrade
- Modify the following properties in `conf/datasource.properties`.
- If you use MySQL as the database to run DolphinScheduler, please comment out the PostgreSQL related configurations, add the MySQL connector jar into the lib directory (here we download `mysql-connector-java-8.0.16.jar`), and then correctly configure the database connection information. You can download the MySQL connector jar from [here](https://downloads.MySQL.com/archives/c-j/). Alternatively, if you use PostgreSQL as the database, you just need to comment out the MySQL related configurations and correctly configure the database connection information.
```properties
# postgre
#spring.datasource.driver-class-name=org.postgresql.Driver
#spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
# mysql
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://xxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=xxx
spring.datasource.password=xxx
```
- Execute database upgrade script:
`sh ./script/upgrade-dolphinscheduler.sh`
## Backend Service Upgrade
### Modify the Content in `conf/config/install_config.conf` File
- Standalone Deployment please refer to the [Standalone-Deployment](./installation/standalone.md).
- Cluster Deployment please refer to the [Cluster-Deployment](./installation/cluster.md).
#### Notes for Masters
Worker group creation has a different design since version 1.3.1:
- Before version 1.3.1, worker groups could be created through the UI.
- Since version 1.3.1, worker groups are created by modifying the worker configuration.
#### Keeping the Worker Group Configuration Consistent with the Previous Version When Upgrading from a Version Before 1.3.1 to 1.3.2
1. Go to the backup database and check the records in the `t_ds_worker_group` table, focusing on the `id`, `name` and `ip_list` columns.
| id | name | ip_list |
| :--- | :---: | ---: |
| 1 | service1 | 192.168.xx.10 |
| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
2. Modify the worker configuration in `conf/config/install_config.conf` file.
Assume below are the machines where the worker services will be deployed:
| hostname | ip |
| :--- | :---: |
| ds1 | 192.168.xx.10 |
| ds2 | 192.168.xx.11 |
| ds3 | 192.168.xx.12 |
To keep worker group config consistent with the previous version, we need to modify workers configuration as below:
```shell
#worker service is deployed on which machine, and also specify which worker group this worker belongs to.
workers="ds1:service1,ds2:service2,ds3:service2"
```
#### The Worker Group has Been Enhanced in Version 1.3.2
Workers in 1.3.1 cannot belong to more than one worker group, but since 1.3.2 this is supported. For example, `workers="ds1:service1,ds1:service2"` is not supported in 1.3.1 but is supported in 1.3.2.
### Execute Deploy Script
```shell
sh install.sh
```

73
docs/docs/en/history-versions.md

@ -0,0 +1,73 @@
<!-- markdown-link-check-disable -->
# Older Versions:
#### Setup instructions are available for each stable version of Apache DolphinScheduler below:
### Versions: 2.0.5
#### Links: [2.0.5 Document](../2.0.5/user_doc/guide/quick-start.md)
### Versions: 2.0.3
#### Links: [2.0.3 Document](../2.0.3/user_doc/guide/quick-start.md)
### Versions: 2.0.2
#### Links: [2.0.2 Document](../2.0.2/user_doc/guide/quick-start.md)
### Versions: 2.0.1
#### Links: [2.0.1 Document](../2.0.1/user_doc/guide/quick-start.md)
### Versions: 2.0.0
#### Links: [2.0.0 Document](../2.0.0/user_doc/guide/quick-start.md)
### Versions: 1.3.9
#### Links: [1.3.9 Document](../1.3.9/user_doc/quick-start.md)
### Versions: 1.3.8
#### Links: [1.3.8 Document](../1.3.8/user_doc/quick-start.md)
### Versions: 1.3.6
#### Links: [1.3.6 Document](../1.3.6/user_doc/quick-start.md)
### Versions: 1.3.5
#### Links: [1.3.5 Document](../1.3.5/user_doc/quick-start.md)
### Versions: 1.3.4
#### Links: [1.3.4 Document](../1.3.4/user_doc/quick-start.md)
### Versions: 1.3.3
#### Links: [1.3.3 Document](../1.3.4/user_doc/quick-start.md)
### Versions: 1.3.2
#### Links: [1.3.2 Document](../1.3.2/user_doc/quick-start.md)
### Versions: 1.3.1
#### Links: [1.3.1 Document](../1.3.1/user_doc/quick-start.md)
### Versions: 1.2.1
#### Links: [1.2.1 Document](../1.2.1/user_doc/quick-start.md)
### Versions: 1.2.0
#### Links: [1.2.0 Document](../1.2.0/user_doc/quick-start.md)
### Versions: 1.1.0
#### Links: [1.1.0 Document](../1.2.0/user_doc/quick-start.md)
### Versions: Dev
#### Links: [Dev Document](../dev/user_doc/about/introduction.md)

58
docs/docs/zh/about/glossary.md

@ -0,0 +1,58 @@
## 名词解释
在对Apache DolphinScheduler了解之前,我们先来认识一下调度系统常用的名词
### 1.名词解释
**DAG:** 全称Directed Acyclic Graph,简称DAG。工作流中的Task任务以有向无环图的形式组装起来,从入度为零的节点进行拓扑遍历,直到无后继节点为止。举例如下图:
<p align="center">
<img src="/img/dag_examples_cn.jpg" alt="dag示例" width="60%" />
<p align="center">
<em>dag示例</em>
</p>
</p>
**流程定义**:通过拖拽任务节点并建立任务节点的关联所形成的可视化**DAG**
**流程实例**:流程实例是流程定义的实例化,可以通过手动启动或定时调度生成,流程定义每运行一次,产生一个流程实例
**任务实例**:任务实例是流程定义中任务节点的实例化,标识着具体的任务执行状态
**任务类型**:目前支持有SHELL、SQL、SUB_PROCESS(子流程)、PROCEDURE、MR、SPARK、PYTHON、DEPENDENT(依赖),同时计划支持动态插件扩展,注意:其中子 **SUB_PROCESS**
也是一个单独的流程定义,是可以单独启动执行的
**调度方式**:系统支持基于cron表达式的定时调度和手动调度。命令类型支持:启动工作流、从当前节点开始执行、恢复被容错的工作流、恢复暂停流程、从失败节点开始执行、补数、定时、重跑、暂停、停止、恢复等待线程。
其中 **恢复被容错的工作流** 和 **恢复等待线程** 两种命令类型是由调度内部控制使用,外部无法调用
**定时调度**:系统采用 **quartz** 分布式调度器,并同时支持cron表达式可视化的生成
**依赖**:系统不单单支持 **DAG** 简单的前驱和后继节点之间的依赖,同时还提供**任务依赖**节点,支持**流程间的自定义任务依赖**
**优先级** :支持流程实例和任务实例的优先级,如果流程实例和任务实例的优先级不设置,则默认是先进先出
**邮件告警**:支持 **SQL任务** 查询结果邮件发送,流程实例运行结果邮件告警及容错告警通知
**失败策略**:对于并行运行的任务,如果有任务失败,提供两种失败策略处理方式,**继续**是指不管并行运行任务的状态,直到流程失败结束。**结束**是指一旦发现失败任务,则同时Kill掉正在运行的并行任务,流程失败结束
**补数**:补历史数据,支持**区间并行和串行**两种补数方式
### 2.模块介绍
- dolphinscheduler-alert 告警模块,提供 AlertServer 服务。
- dolphinscheduler-api web应用模块,提供 ApiServer 服务。
- dolphinscheduler-common 通用的常量枚举、工具类、数据结构或者基类
- dolphinscheduler-dao 提供数据库访问等操作。
- dolphinscheduler-remote 基于 netty 的客户端、服务端
- dolphinscheduler-server MasterServer 和 WorkerServer 服务
- dolphinscheduler-service service模块,包含Quartz、Zookeeper、日志客户端访问服务,便于server模块和api模块调用
- dolphinscheduler-ui 前端模块

47
docs/docs/zh/about/hardware.md

@ -0,0 +1,47 @@
# 软硬件环境建议配置
DolphinScheduler 作为一款开源分布式工作流任务调度系统,可以很好地部署和运行在 Intel 架构服务器及主流虚拟化环境下,并支持主流的Linux操作系统环境
## 1. Linux 操作系统版本要求
| 操作系统 | 版本 |
| :----------------------- | :----------: |
| Red Hat Enterprise Linux | 7.0 及以上 |
| CentOS | 7.0 及以上 |
| Oracle Enterprise Linux | 7.0 及以上 |
| Ubuntu LTS | 16.04 及以上 |
> **注意:**
>以上 Linux 操作系统可运行在物理服务器以及 VMware、KVM、XEN 主流虚拟化环境上
## 2. 服务器建议配置
DolphinScheduler 支持运行在 Intel x86-64 架构的 64 位通用硬件服务器平台。对生产环境的服务器硬件配置有以下建议:
### 生产环境
| **CPU** | **内存** | **硬盘类型** | **网络** | **实例数量** |
| --- | --- | --- | --- | --- |
| 4核+ | 8 GB+ | SAS | 千兆网卡 | 1+ |
> **注意:**
> - 以上建议配置为部署 DolphinScheduler 的最低配置,生产环境强烈推荐使用更高的配置
> - 硬盘大小配置建议 50GB+ ,系统盘和数据盘分开
## 3. 网络要求
DolphinScheduler正常运行提供如下的网络端口配置:
| 组件 | 默认端口 | 说明 |
| --- | --- | --- |
| MasterServer | 5678 | 非通信端口,只需本机端口不冲突即可 |
| WorkerServer | 1234 | 非通信端口,只需本机端口不冲突即可 |
| ApiApplicationServer | 12345 | 提供后端通信端口 |
> **注意:**
> - MasterServer 和 WorkerServer 不需要开启网络间通信,只需本机端口不冲突即可
> - 管理员可根据实际环境中 DolphinScheduler 组件部署方案,在网络侧和主机侧开放相关端口
## 4. 客户端 Web 浏览器要求
DolphinScheduler 推荐 Chrome 以及使用 Chromium 内核的较新版本浏览器访问前端可视化操作界面

12
docs/docs/zh/about/introduction.md

@ -0,0 +1,12 @@
# 关于DolphinScheduler
Apache DolphinScheduler是一个分布式易扩展的可视化DAG工作流任务调度开源系统。解决数据研发ETL 错综复杂的依赖关系,不能直观监控任务健康状态等问题。DolphinScheduler以DAG流式的方式将Task组装起来,可实时监控任务的运行状态,同时支持重试、从指定节点恢复失败、暂停及Kill任务等操作
# 简单易用
DAG监控界面,所有流程定义都是可视化,通过拖拽任务定制DAG,通过API方式与第三方系统对接, 一键部署
# 高可靠性
去中心化的多Master和多Worker, 自身支持HA功能, 采用任务队列来避免过载,不会造成机器卡死
# 丰富的使用场景
支持暂停恢复操作.支持多租户,更好的应对大数据的使用场景. 支持更多的任务类型,如 spark, hive, mr, python, sub_process, shell
# 高扩展性
支持自定义任务类型,调度器使用分布式调度,调度能力随集群线性增长,Master和Worker支持动态上下线

42
docs/docs/zh/architecture/cache.md

@ -0,0 +1,42 @@
### 缓存
#### 缓存目的
由于在master-server调度过程中,会产生大量的数据库读取操作,如tenant,user,processDefinition等,一方面对DB产生很大的读压力,另一方面则会使整个核心调度流程变得缓慢;
考虑到这部分业务数据是读多写少的场景,故引入了缓存模块,以减少DB读压力,加快核心调度流程;
#### 缓存设置
```yaml
spring:
cache:
# default enable cache, you can disable by `type: none`
type: none
cache-names:
- tenant
- user
- processDefinition
- processTaskRelation
- taskDefinition
caffeine:
spec: maximumSize=100,expireAfterWrite=300s,recordStats
```
缓存模块采用[spring-cache](https://spring.io/guides/gs/caching/)机制,可直接在spring配置文件中配置是否开启缓存(默认`none`关闭), 缓存类型;
目前采用[caffeine](https://github.com/ben-manes/caffeine)进行缓存管理,可自由设置缓存相关配置,如缓存大小、过期时间等;
#### 缓存读取
缓存采用spring-cache的注解,配置在相关的mapper层,可参考如:`TenantMapper`.
#### 缓存更新
业务数据的更新来自于api-server, 而缓存端在master-server, 故需要对api-server的数据更新做监听(aspect切面拦截`@CacheEvict`),当需要进行缓存驱逐时会通知master-server,master-server接收到cacheEvictCommand后进行缓存驱逐;
需要注意的是:缓存更新的兜底策略来自于用户在caffeine中的过期策略配置,请结合业务进行配置;
时序图如下图所示:
<img src="/img/cache-evict.png" alt="cache-evict" style="zoom: 67%;" />

406
docs/docs/zh/architecture/configuration.md

@ -0,0 +1,406 @@
<!-- markdown-link-check-disable -->
# 前言
本文档为dolphinscheduler配置文件说明文档,针对版本为 dolphinscheduler-1.3.x 版本.
# 目录结构
目前dolphinscheduler 所有的配置文件都在 [conf ] 目录中.
为了更直观的了解[conf]目录所在的位置以及包含的配置文件,请查看下面dolphinscheduler安装目录的简化说明.
本文主要讲述dolphinscheduler的配置文件.其他部分先不做赘述.
[注:以下 dolphinscheduler 简称为DS.]
```
├─bin DS命令存放目录
│ ├─dolphinscheduler-daemon.sh 启动/关闭DS服务脚本
│ ├─start-all.sh 根据配置文件启动所有DS服务
│ ├─stop-all.sh 根据配置文件关闭所有DS服务
├─conf 配置文件目录
│ ├─application-api.properties api服务配置文件
│ ├─datasource.properties 数据库配置文件
│ ├─zookeeper.properties zookeeper配置文件
│ ├─master.properties master服务配置文件
│ ├─worker.properties worker服务配置文件
│ ├─quartz.properties quartz服务配置文件
│ ├─common.properties 公共服务[存储]配置文件
│ ├─alert.properties alert服务配置文件
│ ├─config 环境变量配置文件夹
│ ├─install_config.conf DS环境变量配置脚本[用于DS安装/启动]
│ ├─env 运行脚本环境变量配置目录
│ ├─dolphinscheduler_env.sh 运行脚本加载环境变量配置文件[如: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
│ ├─org mybatis mapper文件目录
│ ├─i18n i18n配置文件目录
│ ├─logback-api.xml api服务日志配置文件
│ ├─logback-master.xml master服务日志配置文件
│ ├─logback-worker.xml worker服务日志配置文件
│ ├─logback-alert.xml alert服务日志配置文件
├─sql DS的元数据创建升级sql文件
│ ├─create 创建SQL脚本目录
│ ├─upgrade 升级SQL脚本目录
│ ├─dolphinscheduler_postgre.sql postgre数据库初始化脚本
│ ├─dolphinscheduler_mysql.sql mysql数据库初始化脚本
│ ├─soft_version 当前DS版本标识文件
├─script DS服务部署,数据库创建/升级脚本目录
│ ├─create-dolphinscheduler.sh DS数据库初始化脚本
│ ├─upgrade-dolphinscheduler.sh DS数据库升级脚本
│ ├─monitor-server.sh DS服务监控启动脚本
│ ├─scp-hosts.sh 安装文件传输脚本
│ ├─remove-zk-node.sh 清理zookeeper缓存文件脚本
├─ui 前端WEB资源目录
├─lib DS依赖的jar存放目录
├─install.sh 自动安装DS服务脚本
```
# 配置文件详解
序号| 服务分类 | 配置文件|
|--|--|--|
1|启动/关闭DS服务脚本|dolphinscheduler-daemon.sh
2|数据库连接配置 | datasource.properties
3|zookeeper连接配置|zookeeper.properties
4|公共[存储]配置|common.properties
5|API服务配置|application-api.properties
6|Master服务配置|master.properties
7|Worker服务配置|worker.properties
8|Alert 服务配置|alert.properties
9|Quartz配置|quartz.properties
10|DS环境变量配置脚本[用于DS安装/启动]|install_config.conf
11|运行脚本加载环境变量配置文件 <br />[如: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]|dolphinscheduler_env.sh
12|各服务日志配置文件|api服务日志配置文件 : logback-api.xml <br /> master服务日志配置文件 : logback-master.xml <br /> worker服务日志配置文件 : logback-worker.xml <br /> alert服务日志配置文件 : logback-alert.xml
## 1.dolphinscheduler-daemon.sh [启动/关闭DS服务脚本]
dolphinscheduler-daemon.sh脚本负责DS的启动&关闭.
start-all.sh/stop-all.sh最终也是通过dolphinscheduler-daemon.sh对集群进行启动/关闭操作.
目前DS只是做了一个基本的设置,JVM参数请根据各自资源的实际情况自行设置.
默认简化参数如下:
```bash
export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```
> 不建议设置"-XX:DisableExplicitGC" , DS使用Netty进行通讯,设置该参数,可能会导致内存泄漏.
## 2.datasource.properties [数据库连接]
在DS中使用Druid对数据库连接进行管理,默认简化配置如下.
|参数 | 默认值| 描述|
|--|--|--|
spring.datasource.driver-class-name| |数据库驱动
spring.datasource.url||数据库连接地址
spring.datasource.username||数据库用户名
spring.datasource.password||数据库密码
spring.datasource.initialSize|5| 初始连接池数量
spring.datasource.minIdle|5| 最小连接池数量
spring.datasource.maxActive|5| 最大连接池数量
spring.datasource.maxWait|60000| 最大等待时长
spring.datasource.timeBetweenEvictionRunsMillis|60000| 连接检测周期
spring.datasource.timeBetweenConnectErrorMillis|60000| 重试间隔
spring.datasource.minEvictableIdleTimeMillis|300000| 连接保持空闲而不被驱逐的最小时间
spring.datasource.validationQuery|SELECT 1|检测连接是否有效的sql
spring.datasource.validationQueryTimeout|3| 检测连接是否有效的超时时间[seconds]
spring.datasource.testWhileIdle|true| 申请连接的时候检测,如果空闲时间大于timeBetweenEvictionRunsMillis,执行validationQuery检测连接是否有效。
spring.datasource.testOnBorrow|true| 申请连接时执行validationQuery检测连接是否有效
spring.datasource.testOnReturn|false| 归还连接时执行validationQuery检测连接是否有效
spring.datasource.defaultAutoCommit|true| 是否开启自动提交
spring.datasource.keepAlive|true| 连接池中的minIdle数量以内的连接,空闲时间超过minEvictableIdleTimeMillis,则会执行keepAlive操作。
spring.datasource.poolPreparedStatements|true| 开启PSCache
spring.datasource.maxPoolPreparedStatementPerConnectionSize|20| 要启用PSCache,必须配置大于0,当大于0时,poolPreparedStatements自动触发修改为true。
## 3.zookeeper.properties [zookeeper连接配置]
|参数 |默认值| 描述|
|--|--|--|
zookeeper.quorum|localhost:2181| zk集群连接信息
zookeeper.dolphinscheduler.root|/dolphinscheduler| DS在zookeeper存储根目录
zookeeper.session.timeout|60000| session 超时
zookeeper.connection.timeout|30000| 连接超时
zookeeper.retry.base.sleep|100| 基本重试时间差
zookeeper.retry.max.sleep|30000| 最大重试时间
zookeeper.retry.maxtime|10|最大重试次数
## 4.common.properties [hadoop、s3、yarn配置]
common.properties配置文件目前主要是配置hadoop/s3a相关的配置.
|参数 |默认值| 描述|
|--|--|--|
data.basedir.path|/tmp/dolphinscheduler|本地工作目录,用于存放临时文件
resource.storage.type|NONE|资源文件存储类型: HDFS,S3,NONE
resource.upload.path|/dolphinscheduler|资源文件存储路径
hadoop.security.authentication.startup.state|false|hadoop是否开启kerberos权限
java.security.krb5.conf.path|/opt/krb5.conf|kerberos配置目录
login.user.keytab.username|hdfs-mycluster@ESZ.COM|kerberos登录用户
login.user.keytab.path|/opt/hdfs.headless.keytab|kerberos登录用户keytab
kerberos.expire.time|2|kerberos过期时间,整数,单位为小时
resource.view.suffixs| txt,log,sh,conf,cfg,py,java,sql,hql,xml,properties|资源中心支持的文件格式
hdfs.root.user|hdfs|如果存储类型为HDFS,需要配置拥有对应操作权限的用户
fs.defaultFS|hdfs://mycluster:8020|请求地址如果resource.storage.type=S3,该值类似为: s3a://dolphinscheduler. 如果resource.storage.type=HDFS, 如果 hadoop 配置了 HA,需要复制core-site.xml 和 hdfs-site.xml 文件到conf目录
fs.s3a.endpoint||s3 endpoint地址
fs.s3a.access.key||s3 access key
fs.s3a.secret.key||s3 secret key
yarn.resourcemanager.ha.rm.ids||yarn resourcemanager 地址, 如果resourcemanager开启了HA, 输入HA的IP地址(以逗号分隔),如果resourcemanager为单节点, 该值为空即可
yarn.application.status.address|http://ds1:8088/ws/v1/cluster/apps/%s|如果resourcemanager开启了HA或者没有使用resourcemanager,保持默认值即可. 如果resourcemanager为单节点,你需要将ds1 配置为resourcemanager对应的hostname
dolphinscheduler.env.path|env/dolphinscheduler_env.sh|运行脚本加载环境变量配置文件[如: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
development.state|false|是否处于开发模式
## 5.application-api.properties [API服务配置]
|参数 |默认值| 描述|
|--|--|--|
server.port|12345|api服务通讯端口
server.servlet.session.timeout|7200|session超时时间
server.servlet.context-path|/dolphinscheduler |请求路径
spring.servlet.multipart.max-file-size|1024MB|最大上传文件大小
spring.servlet.multipart.max-request-size|1024MB|最大请求大小
server.jetty.max-http-post-size|5000000|jetty服务最大发送请求大小
spring.messages.encoding|UTF-8|请求编码
spring.jackson.time-zone|GMT+8|设置时区
spring.messages.basename|i18n/messages|i18n配置
security.authentication.type|PASSWORD|权限校验类型
## 6.master.properties [Master服务配置]
|参数 |默认值| 描述|
|--|--|--|
master.listen.port|5678|master监听端口
master.exec.threads|100|master工作线程数量,用于限制并行的流程实例数量
master.exec.task.num|20|master每个流程实例的并行任务数量
master.dispatch.task.num|3|master每个批次的派发任务数量
master.host.selector|LowerWeight|master host选择器,用于选择合适的worker执行任务,可选值: Random, RoundRobin, LowerWeight
master.heartbeat.interval|10|master心跳间隔,单位为秒
master.task.commit.retryTimes|5|任务重试次数
master.task.commit.interval|1000|任务提交间隔,单位为毫秒
master.max.cpuload.avg|-1|master最大cpuload均值,只有高于系统cpuload均值时,master服务才能调度任务. 默认值为-1: cpu cores * 2
master.reserved.memory|0.3|master预留内存,只有低于系统可用内存时,master服务才能调度任务,单位为G
## 7.worker.properties [Worker服务配置]
|参数 |默认值| 描述|
|--|--|--|
worker.listen.port|1234|worker监听端口
worker.exec.threads|100|worker工作线程数量,用于限制并行的任务实例数量
worker.heartbeat.interval|10|worker心跳间隔,单位为秒
worker.max.cpuload.avg|-1|worker最大cpuload均值,只有高于系统cpuload均值时,worker服务才能被派发任务. 默认值为-1: cpu cores * 2
worker.reserved.memory|0.3|worker预留内存,只有低于系统可用内存时,worker服务才能被派发任务,单位为G
worker.groups|default|worker分组配置,逗号分隔,例如'worker.groups=default,test' <br> worker启动时会根据该配置自动加入对应的分组
## 8.alert.properties [Alert 告警服务配置]
|参数 |默认值| 描述|
|--|--|--|
alert.type|EMAIL|告警类型|
mail.protocol|SMTP| 邮件服务器协议
mail.server.host|xxx.xxx.com|邮件服务器地址
mail.server.port|25|邮件服务器端口
mail.sender|xxx@xxx.com|发送人邮箱
mail.user|xxx@xxx.com|发送人邮箱名称
mail.passwd|111111|发送人邮箱密码
mail.smtp.starttls.enable|true|邮箱是否开启tls
mail.smtp.ssl.enable|false|邮箱是否开启ssl
mail.smtp.ssl.trust|xxx.xxx.com|邮箱ssl白名单
xls.file.path|/tmp/xls|邮箱附件临时工作目录
||以下为企业微信配置[选填]|
enterprise.wechat.enable|false|企业微信是否启用
enterprise.wechat.corp.id|xxxxxxx|
enterprise.wechat.secret|xxxxxxx|
enterprise.wechat.agent.id|xxxxxxx|
enterprise.wechat.users|xxxxxxx|
enterprise.wechat.token.url|https://qyapi.weixin.qq.com/cgi-bin/gettoken? <br /> corpid=$corpId&corpsecret=$secret|
enterprise.wechat.push.url|https://qyapi.weixin.qq.com/cgi-bin/message/send? <br /> access_token=$token|
enterprise.wechat.user.send.msg||发送消息格式
enterprise.wechat.team.send.msg||群发消息格式
plugin.dir|/Users/xx/your/path/to/plugin/dir|插件目录
## 9.quartz.properties [Quartz配置]
这里面主要是quartz配置,请结合实际业务场景&资源进行配置,本文暂时不做展开.
|参数 |默认值| 描述|
|--|--|--|
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.scheduler.instanceName | DolphinScheduler
org.quartz.scheduler.instanceId | AUTO
org.quartz.scheduler.makeSchedulerThreadDaemon | true
org.quartz.jobStore.useProperties | false
org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.makeThreadsDaemons | true
org.quartz.threadPool.threadCount | 25
org.quartz.threadPool.threadPriority | 5
org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.tablePrefix | QRTZ_
org.quartz.jobStore.isClustered | true
org.quartz.jobStore.misfireThreshold | 60000
org.quartz.jobStore.clusterCheckinInterval | 5000
org.quartz.jobStore.acquireTriggersWithinLock|true
org.quartz.jobStore.dataSource | myDs
org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider
## 10.install_config.conf [DS环境变量配置脚本[用于DS安装/启动]]
install_config.conf这个配置文件比较繁琐,这个文件主要有两个地方会用到.
* 1.DS集群的自动安装.
> 调用install.sh脚本会自动加载该文件中的配置.并根据该文件中的内容自动配置上述的配置文件中的内容.
> 比如:dolphinscheduler-daemon.sh、datasource.properties、zookeeper.properties、common.properties、application-api.properties、master.properties、worker.properties、alert.properties、quartz.properties 等文件.
* 2.DS集群的启动&关闭.
>DS集群在启动&关闭的时候,会加载该配置文件中的masters,workers,alertServer,apiServers等参数,启动/关闭DS集群.
文件内容如下:
```bash
# 注意: 该配置文件中如果包含特殊字符,如: `.*[]^${}\+?|()@#&`, 请转义,
# 示例: `[` 转义为 `\[`
# 数据库类型, 目前仅支持 postgresql 或者 mysql
dbtype="mysql"
# 数据库 地址 & 端口
dbhost="192.168.xx.xx:3306"
# 数据库 名称
dbname="dolphinscheduler"
# 数据库 用户名
username="xx"
# 数据库 密码
password="xx"
# Zookeeper地址
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
# 将DS安装到哪个目录,如: /data1_1T/dolphinscheduler,
installPath="/data1_1T/dolphinscheduler"
# 使用哪个用户部署
# 注意: 部署用户需要sudo 权限, 并且可以操作 hdfs .
# 如果使用hdfs的话,根目录必须使用该用户进行创建.否则会有权限相关的问题.
deployUser="dolphinscheduler"
# 以下为告警服务配置
# 邮件服务器地址
mailServerHost="smtp.exmail.qq.com"
# 邮件服务器 端口
mailServerPort="25"
# 发送者
mailSender="xxxxxxxxxx"
# 发送用户
mailUser="xxxxxxxxxx"
# 邮箱密码
mailPassword="xxxxxxxxxx"
# TLS协议的邮箱设置为true,否则设置为false
starttlsEnable="true"
# 开启SSL协议的邮箱配置为true,否则为false。注意: starttlsEnable和sslEnable不能同时为true
sslEnable="false"
# 邮件服务地址值,同 mailServerHost
sslTrust="smtp.exmail.qq.com"
#业务用到的比如sql等资源文件上传到哪里,可以设置:HDFS,S3,NONE。如果想上传到HDFS,请配置为HDFS;如果不需要资源上传功能请选择NONE。
resourceStorageType="NONE"
# if S3,write S3 address,HA,for example :s3a://dolphinscheduler,
# Note,s3 be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# 如果resourceStorageType 为S3 需要配置的参数如下:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# 如果ResourceManager是HA,则配置为ResourceManager节点的主备ip或者hostname,比如"192.168.xx.xx,192.168.xx.xx",否则如果是单ResourceManager或者根本没用到yarn,请配置yarnHaIps=""即可,如果没用到yarn,配置为""
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# 如果是单ResourceManager,则配置为ResourceManager节点ip或主机名,否则保持默认值即可。
singleYarnIp="yarnIp1"
# 资源文件在 HDFS/S3 存储路径
resourceUploadPath="/dolphinscheduler"
# HDFS/S3 操作用户
hdfsRootUser="hdfs"
# 以下为 kerberos 配置
# kerberos是否开启
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# api 服务端口
apiServerPort="12345"
# 部署DS的所有主机hostname
ips="ds1,ds2,ds3,ds4,ds5"
# ssh 端口 , 默认 22
sshPort="22"
# 部署master服务主机
masters="ds1,ds2"
# 部署 worker服务的主机
# 注意: 每一个worker都需要设置一个worker 分组的名称,默认值为 "default"
workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
# 部署alert服务主机
alertServer="ds3"
# 部署api服务主机
apiServers="ds1"
```
## 11.dolphinscheduler_env.sh [环境变量配置]
通过类似shell方式提交任务的的时候,会加载该配置文件中的环境变量到主机中.
涉及到的任务类型有: Shell任务、Python任务、Spark任务、Flink任务、Datax任务等等
```bash
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
```
## 12.各服务日志配置文件
对应服务名称 | 日志文件名 |
|--|--|
api服务日志配置文件 |logback-api.xml|
master服务日志配置文件|logback-master.xml |
worker服务日志配置文件|logback-worker.xml |
alert服务日志配置文件|logback-alert.xml |

287
docs/docs/zh/architecture/design.md

@ -0,0 +1,287 @@
## 系统架构设计
### 2.系统架构
#### 2.1 系统架构图
<p align="center">
<img src="/img/architecture-1.3.0.jpg" alt="系统架构图" width="70%" />
<p align="center">
<em>系统架构图</em>
</p>
</p>
#### 2.2 启动流程活动图
<p align="center">
<img src="/img/process-start-flow-1.3.0.png" alt="启动流程活动图" width="70%" />
<p align="center">
<em>启动流程活动图</em>
</p>
</p>
#### 2.3 架构说明
* **MasterServer**
MasterServer采用分布式无中心设计理念,MasterServer主要负责 DAG 任务切分、任务提交监控,并同时监听其它MasterServer和WorkerServer的健康状态。
MasterServer服务启动时向Zookeeper注册临时节点,通过监听Zookeeper临时节点变化来进行容错处理。
MasterServer基于netty提供监听服务。
##### 该服务内主要包含:
- **Distributed Quartz**分布式调度组件,主要负责定时任务的启停操作,当quartz调起任务后,Master内部会有线程池具体负责处理任务的后续操作
- **MasterSchedulerThread**是一个扫描线程,定时扫描数据库中的 **command** 表,根据不同的**命令类型**进行不同的业务操作
- **MasterExecThread**主要是负责DAG任务切分、任务提交监控、各种不同命令类型的逻辑处理
- **MasterTaskExecThread**主要负责任务的持久化
* **WorkerServer**
WorkerServer也采用分布式无中心设计理念,WorkerServer主要负责任务的执行和提供日志服务。
WorkerServer服务启动时向Zookeeper注册临时节点,并维持心跳。
WorkerServer基于netty提供监听服务。
##### 该服务包含:
- **FetchTaskThread**主要负责不断从**Task Queue**中领取任务,并根据不同任务类型调用**TaskScheduleThread**对应执行器。
* **ZooKeeper**
ZooKeeper服务,系统中的MasterServer和WorkerServer节点都通过ZooKeeper来进行集群管理和容错。另外系统还基于ZooKeeper进行事件监听和分布式锁。
我们也曾经基于Redis实现过队列,不过我们希望DolphinScheduler依赖到的组件尽量地少,所以最后还是去掉了Redis实现。
* **Task Queue**
提供任务队列的操作,目前队列也是基于Zookeeper来实现。由于队列中存的信息较少,不必担心队列里数据过多的情况,实际上我们压测过百万级数据存队列,对系统稳定性和性能没影响。
* **Alert**
提供告警相关接口,接口主要包括**告警**两种类型的告警数据的存储、查询和通知功能。其中通知功能又有**邮件通知**和**SNMP(暂未实现)**两种。
* **API**
API接口层,主要负责处理前端UI层的请求。该服务统一提供RESTful api向外部提供请求服务。
接口包括工作流的创建、定义、查询、修改、发布、下线、手工启动、停止、暂停、恢复、从该节点开始执行等等。
* **UI**
系统的前端页面,提供系统的各种可视化操作界面。
#### 2.3 架构设计思想
##### 一、去中心化vs中心化
###### 中心化思想
中心化的设计理念比较简单,分布式集群中的节点按照角色分工,大体上分为两种角色:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave角色" width="50%" />
</p>
- Master的角色主要负责任务分发并监督Slave的健康状态,可以动态的将任务均衡到Slave上,以致Slave节点不至于“忙死”或”闲死”的状态。
- Worker的角色主要负责任务的执行工作并维护和Master的心跳,以便Master可以分配任务给Slave。
中心化思想设计存在的问题:
- 一旦Master出现了问题,则群龙无首,整个集群就会崩溃。为了解决这个问题,大多数Master/Slave架构模式都采用了主备Master的设计方案,可以是热备或者冷备,也可以是自动切换或手动切换,而且越来越多的新系统都开始具备自动选举切换Master的能力,以提升系统的可用性。
- 另外一个问题是如果Scheduler在Master上,虽然可以支持一个DAG中不同的任务运行在不同的机器上,但是会产生Master的过负载。如果Scheduler在Slave上,则一个DAG中所有的任务都只能在某一台机器上进行作业提交,则并行任务比较多的时候,Slave的压力可能会比较大。
###### 去中心化
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="去中心化" width="50%" />
</p>
- 在去中心化设计里,通常没有Master/Slave的概念,所有的角色都是一样的,地位是平等的,全球互联网就是一个典型的去中心化的分布式系统,联网的任意节点设备down机,都只会影响很小范围的功能。
- 去中心化设计的核心设计在于整个分布式系统中不存在一个区别于其他节点的”管理者”,因此不存在单点故障问题。但由于不存在” 管理者”节点所以每个节点都需要跟其他节点通信才得到必须要的机器信息,而分布式系统通信的不可靠性,则大大增加了上述功能的实现难度。
- 实际上,真正去中心化的分布式系统并不多见。反而动态中心化分布式系统正在不断涌出。在这种架构下,集群中的管理者是被动态选择出来的,而不是预置的,并且集群在发生故障的时候,集群的节点会自发的举行"会议"来选举新的"管理者"去主持工作。最典型的案例就是ZooKeeper及Go语言实现的Etcd。
- DolphinScheduler的去中心化是Master/Worker注册到Zookeeper中,实现Master集群和Worker集群无中心,并使用Zookeeper分布式锁来选举其中的一台Master或Worker为“管理者”来执行任务。
##### 二、分布式锁实践
DolphinScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Master执行Scheduler,或者只有一台Worker执行任务的提交。
1. 获取分布式锁的核心流程算法如下
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png" alt="获取分布式锁流程" width="50%" />
</p>
2. DolphinScheduler中Scheduler线程分布式锁实现流程图:
<p align="center">
<img src="/img/distributed_lock_procss.png" alt="获取分布式锁流程" width="50%" />
</p>
##### 三、线程不足循环等待问题
- 如果一个DAG中没有子流程,则如果Command中的数据条数大于线程池设置的阈值,则直接流程等待或失败。
- 如果一个大的DAG中嵌套了很多子流程,如下图则会产生“死等”状态:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png" alt="线程不足循环等待问题" width="50%" />
</p>
上图中MainFlowThread等待SubFlowThread1结束,SubFlowThread1等待SubFlowThread2结束, SubFlowThread2等待SubFlowThread3结束,而SubFlowThread3等待线程池有新线程,则整个DAG流程不能结束,从而其中的线程也不能释放。这样就形成的子父流程循环等待的状态。此时除非启动新的Master来增加线程来打破这样的”僵局”,否则调度集群将不能再使用。
对于启动新Master来打破僵局,似乎有点差强人意,于是我们提出了以下三种方案来降低这种风险:
1. 计算所有Master的线程总和,然后对每一个DAG需要计算其需要的线程数,也就是在DAG流程执行之前做预计算。因为是多Master线程池,所以总线程数不太可能实时获取。
2. 对单Master线程池进行判断,如果线程池已经满了,则让线程直接失败。
3. 增加一种资源不足的Command类型,如果线程池不足,则将主流程挂起。这样线程池就有了新的线程,可以让资源不足挂起的流程重新唤醒执行。
注意:Master Scheduler线程在获取Command的时候是FIFO的方式执行的。
于是我们选择了第三种方式来解决线程不足的问题。
##### 四、容错设计
容错分为服务宕机容错和任务重试,服务宕机容错又分为Master容错和Worker容错两种情况
###### 1. 宕机容错
服务容错设计依赖于ZooKeeper的Watcher机制,实现原理如图:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png" alt="DolphinScheduler容错设计" width="40%" />
</p>
其中Master监控其他Master和Worker的目录,如果监听到remove事件,则会根据具体的业务逻辑进行流程实例容错或者任务实例容错。
- Master容错流程:
<p align="center">
<img src="/img/failover-master.jpg" alt="容错流程" width="50%" />
</p>
容错范围:从host的维度来看,Master的容错范围包括:自身host+注册中心上不存在的节点host,容错的整个过程会加锁;
容错内容:Master的容错内容包括:容错工作流实例和任务实例,在容错前会比较实例的开始时间和服务节点的启动时间,在服务启动时间之后的则跳过容错;
容错后处理:ZooKeeper Master容错完成之后则重新由DolphinScheduler中Scheduler线程调度,遍历 DAG 找到”正在运行”和“提交成功”的任务,对”正在运行”的任务监控其任务实例的状态,对”提交成功”的任务需要判断Task Queue中是否已经存在,如果存在则同样监控任务实例的状态,如果不存在则重新提交任务实例。
- Worker容错流程:
<p align="center">
<img src="/img/failover-worker.jpg" alt="容错流程" width="50%" />
</p>
容错范围:从工作流实例的维度看,每个Master只负责容错自己的工作流实例;只有在`handleDeadServer`时会加锁;
容错内容:当发送Worker节点的remove事件时,Master只容错任务实例,在容错前会比较实例的开始时间和服务节点的启动时间,在服务启动时间之后的则跳过容错;
容错后处理:Master Scheduler线程一旦发现任务实例为” 需要容错”状态,则接管任务并进行重新提交。
注意:由于” 网络抖动”可能会使得节点短时间内失去和ZooKeeper的心跳,从而发生节点的remove事件。对于这种情况,我们使用最简单的方式,那就是节点一旦和ZooKeeper发生超时连接,则直接将Master或Worker服务停掉。
###### 2.任务失败重试
这里首先要区分任务失败重试、流程失败恢复、流程失败重跑的概念:
- 任务失败重试是任务级别的,是调度系统自动进行的,比如一个Shell任务设置重试次数为3次,那么在Shell任务运行失败后会自己再最多尝试运行3次
- 流程失败恢复是流程级别的,是手动进行的,恢复是从只能**从失败的节点开始执行**或**从当前节点开始执行**
- 流程失败重跑也是流程级别的,是手动进行的,重跑是从开始节点进行
接下来说正题,我们将工作流中的任务节点分了两种类型。
- 一种是业务节点,这种节点都对应一个实际的脚本或者处理语句,比如Shell节点,MR节点、Spark节点、依赖节点等。
- 还有一种是逻辑节点,这种节点不做实际的脚本或语句处理,只是整个流程流转的逻辑处理,比如子流程节点等。
每一个**业务节点**都可以配置失败重试的次数,当该任务节点失败,会自动重试,直到成功或者超过配置的重试次数。**逻辑节点**不支持失败重试。但是逻辑节点里的任务支持重试。
如果工作流中有任务失败达到最大重试次数,工作流就会失败停止,失败的工作流可以手动进行重跑操作或者流程恢复操作
##### 五、任务优先级设计
在早期调度设计中,如果没有优先级设计,采用公平调度设计的话,会遇到先行提交的任务可能会和后继提交的任务同时完成的情况,而不能做到设置流程或者任务的优先级,因此我们对此进行了重新设计,目前我们设计如下:
- 按照**不同流程实例优先级**优先于**同一个流程实例优先级**优先于**同一流程内任务优先级**优先于**同一流程内任务**提交顺序依次从高到低进行任务处理。
- 具体实现是根据任务实例的json解析优先级,然后把**流程实例优先级_流程实例id_任务优先级_任务id**信息保存在ZooKeeper任务队列中,当从任务队列获取的时候,通过字符串比较即可得出最需要优先执行的任务
- 其中流程定义的优先级是考虑到有些流程需要先于其他流程进行处理,这个可以在流程启动或者定时启动时配置,共有5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="流程优先级配置" width="40%" />
</p>
- 任务的优先级也分为5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="任务优先级配置" width="35%" />
</p>
##### 六、Logback和netty实现日志访问
- 由于Web(UI)和Worker不一定在同一台机器上,所以查看日志不能像查询本地文件那样。有两种方案:
- 将日志放到ES搜索引擎上
- 通过netty通信获取远程日志信息
- 介于考虑到尽可能的DolphinScheduler的轻量级性,所以选择了gRPC实现远程访问日志信息。
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc远程访问" width="50%" />
</p>
- 我们使用自定义Logback的FileAppender和Filter功能,实现每个任务实例生成一个日志文件。
- FileAppender主要实现如下:
```java
/**
* task log appender
*/
public class TaskLogAppender extends FileAppender<ILoggingEvent> {
...
@Override
protected void append(ILoggingEvent event) {
if (currentlyActiveFile == null){
currentlyActiveFile = getFile();
}
String activeFile = currentlyActiveFile;
// thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
String threadName = event.getThreadName();
String[] threadNameArr = threadName.split("-");
// logId = processDefineId_processInstanceId_taskInstanceId
String logId = threadNameArr[1];
...
super.subAppend(event);
}
}
```
以/流程定义id/流程实例id/任务实例id.log的形式生成日志
- 过滤匹配以TaskLogInfo开始的线程名称:
- TaskLogFilter实现如下:
```java
/**
* task log filter
*/
public class TaskLogFilter extends Filter<ILoggingEvent> {
@Override
public FilterReply decide(ILoggingEvent event) {
if (event.getThreadName().startsWith("TaskLogInfo-")){
return FilterReply.ACCEPT;
}
return FilterReply.DENY;
}
}
```
### 总结
本文从调度出发,初步介绍了大数据分布式工作流调度系统--DolphinScheduler的架构原理及实现思路。未完待续

58
docs/docs/zh/architecture/load-balance.md

@ -0,0 +1,58 @@
### 负载均衡
负载均衡即通过路由算法(通常是集群环境),合理的分摊服务器压力,达到服务器性能的最大优化。
### DolphinScheduler-Worker 负载均衡算法
DolphinScheduler-Master 分配任务至 worker,默认提供了三种算法:
加权随机(random)
平滑轮询(roundrobin)
线性负载(lowerweight)
默认配置为线性加权负载。
由于路由是在客户端做的,即 master 服务,因此你可以更改 master.properties 中的 master.host.selector 来配置你所想要的算法。
eg:master.host.selector=random(不区分大小写)
### Worker 负载均衡配置
配置文件 worker.properties
#### 权重
上述所有的负载算法都是基于权重来进行加权分配的,权重影响分流结果。你可以在 worker.properties 中修改 worker.weight 的值来给不同的机器设置不同的权重。
#### 预热
考虑到 JIT 优化,我们会让 worker 在启动后低功率的运行一段时间,使其逐渐达到最佳状态,这段过程我们称之为预热。感兴趣的同学可以去阅读 JIT 相关的文章。
因此 worker 在启动后,他的权重会随着时间逐渐达到最大(默认十分钟,我们没有提供配置项,如果需要,你可以修改并提交相关的 PR)。
### 负载均衡算法细述
#### 随机(加权)
该算法比较简单,即在符合的 worker 中随机选取一台(权重会影响他的比重)。
#### 平滑轮询(加权)
加权轮询算法有一个明显的缺陷,即在某些特殊的权重下,加权轮询调度会生成不均匀的实例序列,这种不平滑的负载可能会使某些实例出现瞬时高负载的现象,导致系统存在宕机的风险。为了解决这个调度缺陷,我们提供了平滑加权轮询算法。
每台 worker 都有两个权重,即 weight(预热完成后保持不变)和 current_weight(动态变化)。每次路由时,都会遍历所有的 worker,使其 current_weight+weight,同时累加所有 worker 的 weight,计为 total_weight,然后挑选 current_weight 最大的作为本次执行任务的 worker,与此同时,将这台 worker 的 current_weight-total_weight。
#### 线性加权(默认算法)
该算法每隔一段时间会向注册中心上报自己的负载信息。我们主要根据两个信息来进行判断
* load 平均值(默认是 CPU 核数 *2)
* 可用物理内存(默认是 0.3,单位是 G)
如果两者任何一个低于配置项,那么这台 worker 将不参与负载。(即不分配流量)
你可以在 worker.properties 修改下面的属性来自定义配置
* worker.max.cpuload.avg=-1 (worker最大cpuload均值,只有高于系统cpuload均值时,worker服务才能被派发任务. 默认值为-1: cpu cores * 2)
* worker.reserved.memory=0.3 (worker预留内存,只有低于系统可用内存时,worker服务才能被派发任务,单位为G)

185
docs/docs/zh/architecture/metadata.md

@ -0,0 +1,185 @@
# Dolphin Scheduler 1.3元数据文档
<a name="25Ald"></a>
### 表概览
| 表名 | 表信息 |
| :---: | :---: |
| t_ds_access_token | 访问ds后端的token |
| t_ds_alert | 告警信息 |
| t_ds_alertgroup | 告警组 |
| t_ds_command | 执行命令 |
| t_ds_datasource | 数据源 |
| t_ds_error_command | 错误命令 |
| t_ds_process_definition | 流程定义 |
| t_ds_process_instance | 流程实例 |
| t_ds_project | 项目 |
| t_ds_queue | 队列 |
| t_ds_relation_datasource_user | 用户关联数据源 |
| t_ds_relation_process_instance | 子流程 |
| t_ds_relation_project_user | 用户关联项目 |
| t_ds_relation_resources_user | 用户关联资源 |
| t_ds_relation_udfs_user | 用户关联UDF函数 |
| t_ds_relation_user_alertgroup | 用户关联告警组 |
| t_ds_resources | 资源文件 |
| t_ds_schedules | 流程定时调度 |
| t_ds_session | 用户登录的session |
| t_ds_task_instance | 任务实例 |
| t_ds_tenant | 租户 |
| t_ds_udfs | UDF资源 |
| t_ds_user | 用户 |
| t_ds_version | ds版本信息 |
<a name="VNVGr"></a>
### 用户 队列 数据源
![image.png](/img/metadata-erd/user-queue-datasource.png)
- 一个租户下可以有多个用户<br />
- t_ds_user中的queue字段存储的是队列表中的queue_name信息,t_ds_tenant下存的是queue_id,在流程定义执行过程中,用户队列优先级最高,用户队列为空则采用租户队列<br />
- t_ds_datasource表中的user_id字段表示创建该数据源的用户,t_ds_relation_datasource_user中的user_id表示,对数据源有权限的用户<br />
<a name="HHyGV"></a>
### 项目 资源 告警
![image.png](/img/metadata-erd/project-resource-alert.png)
- 一个用户可以有多个项目,用户项目授权通过t_ds_relation_project_user表完成project_id和user_id的关系绑定<br />
- t_ds_projcet表中的user_id表示创建该项目的用户,t_ds_relation_project_user表中的user_id表示对项目有权限的用户<br />
- t_ds_resources表中的user_id表示创建该资源的用户,t_ds_relation_resources_user中的user_id表示对资源有权限的用户<br />
- t_ds_udfs表中的user_id表示创建该UDF的用户,t_ds_relation_udfs_user表中的user_id表示对UDF有权限的用户<br />
<a name="Bg2Sn"></a>
### 命令 流程 任务
![image.png](/img/metadata-erd/command.png)<br />![image.png](/img/metadata-erd/process-task.png)
- 一个项目有多个流程定义,一个流程定义可以生成多个流程实例,一个流程实例可以生成多个任务实例<br />
- t_ds_schedulers表存放流程定义的定时调度信息<br />
- t_ds_relation_process_instance表存放的数据用于处理流程定义中含有子流程的情况,parent_process_instance_id表示含有子流程的主流程实例id,process_instance_id表示子流程实例的id,parent_task_instance_id表示子流程节点的任务实例id,流程实例表和任务实例表分别对应t_ds_process_instance表和t_ds_task_instance表
<a name="Pv25P"></a>
### 核心表Schema
<a name="32Jzd"></a>
#### t_ds_process_definition
| 字段 | 类型 | 注释 |
| --- | --- | --- |
| id | int | 主键 |
| name | varchar | 流程定义名称 |
| version | int | 流程定义版本 |
| release_state | tinyint | 流程定义的发布状态:0 未上线  1已上线 |
| project_id | int | 项目id |
| user_id | int | 流程定义所属用户id |
| process_definition_json | longtext | 流程定义json串 |
| description | text | 流程定义描述 |
| global_params | text | 全局参数 |
| flag | tinyint | 流程是否可用:0 不可用,1 可用 |
| locations | text | 节点坐标信息 |
| connects | text | 节点连线信息 |
| receivers | text | 收件人 |
| receivers_cc | text | 抄送人 |
| create_time | datetime | 创建时间 |
| timeout | int | 超时时间 |
| tenant_id | int | 租户id |
| update_time | datetime | 更新时间 |
| modify_by | varchar | 修改用户 |
| resource_ids | varchar | 资源id集 |
<a name="e6jfz"></a>
#### t_ds_process_instance
| 字段 | 类型 | 注释 |
| --- | --- | --- |
| id | int | 主键 |
| name | varchar | 流程实例名称 |
| process_definition_id | int | 流程定义id |
| state | tinyint | 流程实例状态:0 提交成功,1 正在运行,2 准备暂停,3 暂停,4 准备停止,5 停止,6 失败,7 成功,8 需要容错,9 kill,10 等待线程,11 等待依赖完成 |
| recovery | tinyint | 流程实例容错标识:0 正常,1 需要被容错重启 |
| start_time | datetime | 流程实例开始时间 |
| end_time | datetime | 流程实例结束时间 |
| run_times | int | 流程实例运行次数 |
| host | varchar | 流程实例所在的机器 |
| command_type | tinyint | 命令类型:0 启动工作流,1 从当前节点开始执行,2 恢复被容错的工作流,3 恢复暂停流程,4 从失败节点开始执行,5 补数,6 调度,7 重跑,8 暂停,9 停止,10 恢复等待线程 |
| command_param | text | 命令的参数(json格式) |
| task_depend_type | tinyint | 节点依赖类型:0 当前节点,1 向前执行,2 向后执行 |
| max_try_times | tinyint | 最大重试次数 |
| failure_strategy | tinyint | 失败策略 0 失败后结束,1 失败后继续 |
| warning_type | tinyint | 告警类型:0 不发,1 流程成功发,2 流程失败发,3 成功失败都发 |
| warning_group_id | int | 告警组id |
| schedule_time | datetime | 预期运行时间 |
| command_start_time | datetime | 开始命令时间 |
| global_params | text | 全局参数(固化流程定义的参数) |
| process_instance_json | longtext | 流程实例json(copy的流程定义的json) |
| flag | tinyint | 是否可用,1 可用,0不可用 |
| update_time | timestamp | 更新时间 |
| is_sub_process | int | 是否是子工作流 1 是,0 不是 |
| executor_id | int | 命令执行用户 |
| locations | text | 节点坐标信息 |
| connects | text | 节点连线信息 |
| history_cmd | text | 历史命令,记录所有对流程实例的操作 |
| dependence_schedule_times | text | 依赖节点的预估时间 |
| process_instance_priority | int | 流程实例优先级:0 Highest,1 High,2 Medium,3 Low,4 Lowest |
| worker_group | varchar | 任务指定运行的worker分组 |
| timeout | int | 超时时间 |
| tenant_id | int | 租户id |
<a name="IvHEc"></a>
#### t_ds_task_instance
| 字段 | 类型 | 注释 |
| --- | --- | --- |
| id | int | 主键 |
| name | varchar | 任务名称 |
| task_type | varchar | 任务类型 |
| process_definition_id | int | 流程定义id |
| process_instance_id | int | 流程实例id |
| task_json | longtext | 任务节点json |
| state | tinyint | 任务实例状态:0 提交成功,1 正在运行,2 准备暂停,3 暂停,4 准备停止,5 停止,6 失败,7 成功,8 需要容错,9 kill,10 等待线程,11 等待依赖完成 |
| submit_time | datetime | 任务提交时间 |
| start_time | datetime | 任务开始时间 |
| end_time | datetime | 任务结束时间 |
| host | varchar | 执行任务的机器 |
| execute_path | varchar | 任务执行路径 |
| log_path | varchar | 任务日志路径 |
| alert_flag | tinyint | 是否告警 |
| retry_times | int | 重试次数 |
| pid | int | 进程pid |
| app_link | varchar | yarn app id |
| flag | tinyint | 是否可用:0 不可用,1 可用 |
| retry_interval | int | 重试间隔 |
| max_retry_times | int | 最大重试次数 |
| task_instance_priority | int | 任务实例优先级:0 Highest,1 High,2 Medium,3 Low,4 Lowest |
| worker_group | varchar | 任务指定运行的worker分组 |
<a name="pPQkU"></a>
#### t_ds_schedules
| 字段 | 类型 | 注释 |
| --- | --- | --- |
| id | int | 主键 |
| process_definition_id | int | 流程定义id |
| start_time | datetime | 调度开始时间 |
| end_time | datetime | 调度结束时间 |
| crontab | varchar | crontab 表达式 |
| failure_strategy | tinyint | 失败策略: 0 结束,1 继续 |
| user_id | int | 用户id |
| release_state | tinyint | 状态:0 未上线,1 上线 |
| warning_type | tinyint | 告警类型:0 不发,1 流程成功发,2 流程失败发,3 成功失败都发 |
| warning_group_id | int | 告警组id |
| process_instance_priority | int | 流程实例优先级:0 Highest,1 High,2 Medium,3 Low,4 Lowest |
| worker_group | varchar | 任务指定运行的worker分组 |
| create_time | datetime | 创建时间 |
| update_time | datetime | 更新时间 |
<a name="TkQzn"></a>
#### t_ds_command
| 字段 | 类型 | 注释 |
| --- | --- | --- |
| id | int | 主键 |
| command_type | tinyint | 命令类型:0 启动工作流,1 从当前节点开始执行,2 恢复被容错的工作流,3 恢复暂停流程,4 从失败节点开始执行,5 补数,6 调度,7 重跑,8 暂停,9 停止,10 恢复等待线程 |
| process_definition_id | int | 流程定义id |
| command_param | text | 命令的参数(json格式) |
| task_depend_type | tinyint | 节点依赖类型:0 当前节点,1 向前执行,2 向后执行 |
| failure_strategy | tinyint | 失败策略:0结束,1继续 |
| warning_type | tinyint | 告警类型:0 不发,1 流程成功发,2 流程失败发,3 成功失败都发 |
| warning_group_id | int | 告警组 |
| schedule_time | datetime | 预期运行时间 |
| start_time | datetime | 开始时间 |
| executor_id | int | 执行用户id |
| dependence | varchar | 依赖字段 |
| update_time | datetime | 更新时间 |
| process_instance_priority | int | 流程实例优先级:0 Highest,1 High,2 Medium,3 Low,4 Lowest |
| worker_group | varchar | 任务指定运行的worker分组 |

1134
docs/docs/zh/architecture/task-structure.md

File diff suppressed because it is too large Load Diff

689
docs/docs/zh/faq.md

@ -0,0 +1,689 @@
<!-- markdown-link-check-disable -->
## Q:项目的名称是?
A:DolphinScheduler
---
## Q:DolphinScheduler 服务介绍及建议运行内存
A:DolphinScheduler 由 MasterServer、WorkerServer、ApiServer、AlertServer、LoggerServer 5 个服务和 UI 组成。
| 服务 | 说明 |
| ------------------------- | ------------------------------------------------------------ |
| MasterServer | 主要负责 **DAG** 的切分和任务状态的监控 |
| WorkerServer/LoggerServer | 主要负责任务的提交、执行和任务状态的更新。LoggerServer 用于 Rest Api 通过 **RPC** 查看日志 |
| ApiServer | 提供 Rest Api 服务,供 UI 进行调用 |
| AlertServer | 提供告警服务 |
| UI | 前端页面展示 |
注意:**由于服务比较多,建议单机部署最好是 4 核 16G 以上**
---
## Q:系统支持哪些邮箱?
A:支持绝大多数邮箱,qq、163、126、139、outlook、aliyun 等皆支持。支持 **TLS 和 SSL** 协议,可以在 alert.properties 中选择性配置
---
## Q:常用的系统变量时间参数有哪些,如何使用?
A:请参考[使用手册](https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/parameter/built-in.html) 第8小节
---
## Q:pip install kazoo 这个安装报错。是必须安装的吗?
A: 这个是 python 连接 Zookeeper 需要使用到的,用于删除Zookeeper中的master/worker临时节点信息。所以如果是第一次安装,就可以忽略错误。在1.3.0之后,kazoo不再需要了,我们用程序来代替kazoo所做的
---
## Q:怎么指定机器运行任务
A:使用 **管理员** 创建 Worker 分组,在 **流程定义启动** 的时候可**指定Worker分组**或者在**任务节点上指定Worker分组**。如果不指定,则使用 Default,**Default默认是使用的集群里所有的Worker中随机选取一台来进行任务提交、执行**
---
## Q:任务的优先级
A:我们同时 **支持流程和任务的优先级**。优先级我们有 **HIGHEST、HIGH、MEDIUM、LOW 和 LOWEST** 五种级别。**可以设置不同流程实例之间的优先级,也可以设置同一个流程实例中不同任务实例的优先级**。详细内容请参考任务优先级设计 https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1
----
## Q:dolphinscheduler-grpc 报错
A:在 1.2 及以前版本中,在根目录下执行:mvn -U clean package assembly:assembly -Dmaven.test.skip=true,然后刷新下整个项目就好,1.3版本中不再使用 GRPC 进行通信了
----
## Q:DolphinScheduler 支持 windows 上运行么
A: 理论上只有 **Worker 是需要在 Linux 上运行的**,其它的服务都是可以在 windows 上正常运行的。但是还是建议最好能在 linux 上部署使用
-----
## Q:UI 在 linux 编译 node-sass 提示:Error:EACCESS:permission denied,mkdir xxxx
A:单独安装 **npm install node-sass --unsafe-perm**,之后再 **npm install**
---
## Q:UI 不能正常登陆访问
A: 1,如果是 node 启动的查看 dolphinscheduler-ui 下的 .env 文件里的 API_BASE 配置是否是 Api Server 服务地址
2,如果是 nginx 启动的并且是通过 **install-dolphinscheduler-ui.sh** 安装的,查看
**/etc/nginx/conf.d/dolphinscheduler.conf** 中的 proxy_pass 配置是否是 Api Server 服务地址
3,如果以上配置都是正确的,那么请查看 Api Server 服务是否是正常的,
curl http://localhost:12345/dolphinscheduler/users/get-user-info 查看 Api Server 日志,
如果提示 cn.dolphinscheduler.api.interceptor.LoginHandlerInterceptor:[76] - session info is null,则证明 Api Server 服务是正常的
4,如果以上都没有问题,需要查看一下 **application.properties** 中的 **server.context-path 和 server.port 配置**是否正确
注意:1.3 版本直接使用 Jetty 进行前端代码的解析,无需再安装配置 nginx 了
---
## Q:流程定义手动启动或调度启动之后,没有流程实例生成
A: 1,首先通过 **jps 查看MasterServer服务是否存在**,或者从服务监控直接查看 zk 中是否存在 master 服务
2,如果存在 master 服务,查看 **命令状态统计** 或者 **t_ds_error_command** 中是否增加的新记录,如果增加了,**请查看 message 字段定位启动异常原因**
---
## Q:任务状态一直处于提交成功状态
A: 1,首先通过 **jps 查看 WorkerServer 服务是否存在**,或者从服务监控直接查看 zk 中是否存在 worker 服务
2,如果 **WorkerServer** 服务正常,需要 **查看 MasterServer 是否把 task 任务放到 zk 队列中** ,**需要查看 MasterServer 日志及 zk 队列中是否有任务阻塞**
3,如果以上都没有问题,需要定位是否指定了 Worker 分组,但是 **Worker 分组的机器不是在线状态**
---
## Q:install.sh 中需要注意问题
A: 1,如果替换变量中包含特殊字符,**请用 \ 转移符进行转移**
2,installPath="/data1_1T/dolphinscheduler",**这个目录不能和当前要一键安装的 install.sh 目录是一样的**
3,deployUser="dolphinscheduler",**部署用户必须具有 sudo 权限**,因为 worker 是通过 sudo -u 租户 sh xxx.command 进行执行的
4,monitorServerState="false",服务监控脚本是否启动,默认是不启动服务监控脚本的。**如果启动服务监控脚本,则每 5 分钟定时来监控 master 和 worker 的服务是否 down 机,如果 down 机则会自动重启**
5,hdfsStartupSate="false",是否开启 HDFS 资源上传功能。默认是不开启的,**如果不开启则资源中心是不能使用的**。如果开启,需要 conf/common/hadoop/hadoop.properties 中配置 fs.defaultFS 和 yarn 的相关配置,如果使用 namenode HA,需要将 core-site.xml 和 hdfs-site.xml 复制到conf根目录下
注意:**1.0.x 版本是不会自动创建 hdfs 根目录的,需要自行创建,并且需要部署用户有hdfs的操作权限**
---
## Q:流程定义和流程实例下线异常
A : 对于 **1.0.4 以前的版本**,修改 dolphinscheduler-api cn.dolphinscheduler.api.quartz 包下的代码即可
```
public boolean deleteJob(String jobName, String jobGroupName) {
lock.writeLock().lock();
try {
JobKey jobKey = new JobKey(jobName,jobGroupName);
if(scheduler.checkExists(jobKey)){
logger.info("try to delete job, job name: {}, job group name: {},", jobName, jobGroupName);
return scheduler.deleteJob(jobKey);
}else {
return true;
}
} catch (SchedulerException e) {
logger.error(String.format("delete job : %s failed",jobName), e);
} finally {
lock.writeLock().unlock();
}
return false;
}
```
---
## Q:HDFS 启动之前创建的租户,能正常使用资源中心吗
A: 不能。因为在未启动 HDFS 时创建的租户,不会在 HDFS 中注册租户目录,所以上传资源会报错
## Q:多 Master 和多 Worker 状态下,服务掉了,怎么容错
A: **注意:Master 监控 Master 及 Worker 服务。**
1,如果 Master 服务掉了,其它的 Master 会接管挂掉的 Master 的流程,继续监控 Worker task 状态
2,如果 Worker 服务掉了,Master 会监控到 Worker 服务掉了,如果存在 Yarn 任务,Kill Yarn 任务之后走重试
具体请看容错设计:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1
---
## Q:对于 Master 和 Worker 一台机器伪分布式下的容错
A : 1.0.3 版本只实现了 Master 启动流程容错,不走 Worker 容错。也就是说如果 Worker 挂掉的时候,没有 Master 存在。这流程将会出现问题。我们会在 **1.1.0** 版本中增加 Master 和 Worker 启动自容错,修复这个问题。如果想手动修改这个问题,需要针对 **跨重启正在运行流程** **并且已经掉的正在运行的 Worker 任务,需要修改为失败**,**同时跨重启正在运行流程设置为失败状态**。然后从失败节点进行流程恢复即可
---
## Q:定时容易设置成每秒执行
A : 设置定时的时候需要注意,如果第一位(* * * * * ? *)设置成 \* ,则表示每秒执行。**我们将会在 1.1.0 版本中加入显示最近调度的时间列表** ,使用 http://cron.qqe2.com/ 可以在线看近 5 次运行时间
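下面用两个 Quartz cron 表达式做对比(仅作示意):
```shell
# Quartz cron 表达式共 7 位:秒 分 时 日 月 周 年
# 秒位为 * :每秒触发一次,应避免这样配置
* * * * * ? *
# 秒位固定为 0 :每天凌晨 1 点触发一次
0 0 1 * * ? *
```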
## Q:定时有有效时间范围吗
A:有的,**如果定时的起止时间是同一个时间,那么此定时将是无效的定时**。**如果起止时间的结束时间比当前的时间小,很有可能定时会被自动删除**
## Q:任务依赖有几种实现
A: 1,**DAG** 之间的任务依赖关系,是从 **入度为零** 进行 DAG 切分的
2,有 **任务依赖节点** ,可以实现跨流程的任务或者流程依赖,具体请参考 依赖(DEPENDENT)节点:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C.html#%E4%BB%BB%E5%8A%A1%E8%8A%82%E7%82%B9%E7%B1%BB%E5%9E%8B%E5%92%8C%E5%8F%82%E6%95%B0%E8%AE%BE%E7%BD%AE
## Q:流程定义有几种启动方式
A: 1,在 **流程定义列表**,点击 **启动** 按钮
2,**流程定义列表添加定时器**,调度启动流程定义
3,流程定义 **查看或编辑** DAG 页面,任意 **任务节点右击** 启动流程定义
4,可以对流程定义 DAG 编辑,设置某些任务的运行标志位 **禁止运行**,则在启动流程定义的时候,该节点及其连线将从 DAG 中去掉
## Q:Python 任务设置 Python 版本
A: 只需要修改 conf/env/dolphinscheduler_env.sh 中的 PYTHON_HOME
```
export PYTHON_HOME=/bin/python
```
注意:这里的 **PYTHON_HOME** 是 python 命令的绝对路径,而不是单纯的 PYTHON_HOME,还需要注意的是,export PATH 的时候,需要直接写成
```
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH
```
## Q:Worker Task 通过 sudo -u 租户 sh xxx.command 会产生子进程,在 kill 的时候,是否会杀掉
A: 我们会在 1.0.4 中增加 kill 任务同时,kill 掉任务产生的各种所有子进程
## Q:DolphinScheduler 中的队列怎么用,用户队列和租户队列是什么意思
A : DolphinScheduler 中的队列可以在用户或者租户上指定队列,**用户指定的队列优先级是高于租户队列的优先级的**。例如:对 MR 任务指定队列,是通过 mapreduce.job.queuename 来指定队列的。
注意:MR 在用以上方法指定队列的时候,传递参数请使用如下方式:
```
Configuration conf = new Configuration();
GenericOptionsParser optionParser = new GenericOptionsParser(conf, args);
String[] remainingArgs = optionParser.getRemainingArgs();
```
如果是 Spark 任务,则通过 --queue 参数指定队列,示例见下文
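例如(仅作示意,其中队列名 etl_queue、主类与 jar 包名均为假设值):
```shell
# Spark 任务通过 --queue 指定 Yarn 队列
spark-submit --master yarn \
  --queue etl_queue \
  --class com.example.Main \
  app.jar
```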
## Q:Master 或者 Worker 报如下告警
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_worker_lack_res.png" width="60%" />
</p>
A : 修改 conf 下的 master.properties **master.reserved.memory** 的值为更小的值,比如说 0.1 或者
worker.properties **worker.reserved.memory** 的值为更小的值,比如说 0.1
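修改示意如下:
```shell
# conf/master.properties
master.reserved.memory=0.1
# conf/worker.properties
worker.reserved.memory=0.1
```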
## Q:hive 版本是 1.1.0+cdh5.15.0,SQL hive 任务连接报错
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/cdh_hive_error.png" width="60%" />
</p>
A: 将 hive pom
```
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.0</version>
</dependency>
```
修改为
```
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0</version>
</dependency>
```
---
## Q:如何增加一台工作服务器
A: 1,参考官网[部署文档](https://dolphinscheduler.apache.org/zh-cn/docs/laster/user_doc/installation/cluster.html) 1.3 小节,创建部署用户和 hosts 映射
2,参考官网[部署文档](https://dolphinscheduler.apache.org/zh-cn/docs/laster/user_doc/installation/cluster.html) 1.4 小节,配置 hosts 映射和 ssh 打通及修改目录权限.
1.4 小节的最后一步是在当前新增机器上执行的,即需要给部署用户授予部署目录的操作权限
3,复制正在运行的服务器上的部署目录到新机器的同样的部署目录下
4,到 bin 下,启动 worker server 和 logger server
```
./dolphinscheduler-daemon.sh start worker-server
./dolphinscheduler-daemon.sh start logger-server
```
---
## Q:DolphinScheduler 什么时候发布新版本,同时新旧版本区别,以及如何升级,版本号规范
A:1,Apache 项目的发版流程是通过邮件列表完成的。 你可以订阅 DolphinScheduler 的邮件列表,订阅之后如果有发版,你就可以收到邮件。请参照这篇[指引](https://github.com/apache/dolphinscheduler#get-help)来订阅 DolphinScheduler 的邮件列表。
2,当项目发版的时候,会有发版说明告知具体的变更内容,同时也会有从旧版本升级到新版本的升级文档。
3,版本号为 x.y.z, 当 x 增加时代表全新架构的版本。当 y 增加时代表与 y 版本之前的不兼容需要升级脚本或其他人工处理才能升级。当 z 增加代表是 bug 修复,升级完全兼容。无需额外处理。之前有个问题 1.0.2 的升级不兼容 1.0.1 需要升级脚本。
---
## Q:后续任务在前置任务失败情况下仍旧可以执行
A:在启动工作流的时候,你可以设置失败策略:继续还是失败。
![设置任务失败策略](https://user-images.githubusercontent.com/15833811/80368215-ee378080-88be-11ea-9074-01a33d012b23.png)
---
## Q:工作流模板 DAG、工作流实例、工作任务及实例之间是什么关系?一个 dag 支持最大并发 100,是指产生 100 个工作流实例并发运行吗?一个 dag 中的任务节点,也有并发数的配置,是指任务也可以并发多个线程运行吗?最大数 100 吗?
A:
1.2.1 version
```
master.properties
设置 master 节点并发执行的最大工作流数
master.exec.threads=100
Control the number of parallel tasks in each workflow
设置每个工作流可以并发执行的最大任务数
master.exec.task.number=20
worker.properties
设置 worker 节点并发执行的最大任务数
worker.exec.threads=100
```
---
## Q:工作组管理页面没有展示按钮
<p align="center">
<img src="https://user-images.githubusercontent.com/39816903/81903776-d8cb9180-95f4-11ea-98cb-94ca1e6a1db5.png" width="60%" />
</p>
A:1.3.0 版本中,为了支持 k8s,worker ip 一直变动,因此不能再在 UI 界面上配置工作组,改为在 worker.properties 中配置工作组名称,示例见下文。
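例如(示意,键名以实际版本的 worker.properties 为准):
```shell
# conf/worker.properties
# 该 worker 所属的分组,多个分组用逗号分隔
worker.groups=default,etl
```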
---
## Q:为什么不把 mysql 的 jdbc 连接包添加到 docker 镜像里面
A:Mysql jdbc 连接包的许可证和 apache v2 的许可证不兼容,因此它不能被加入到 docker 镜像里面。
---
## Q:当一个任务提交多个 yarn 程序的时候经常失败
<p align="center">
<img src="https://user-images.githubusercontent.com/16174111/81312485-476e9380-90b9-11ea-9aad-ed009db899b1.png" width="60%" />
</p>
A:这个 Bug 在 dev 分支已修复,并加入到需求/待做列表。
---
## Q:Master 服务和 Worker 服务在运行几天之后停止了
<p align="center">
<img src="https://user-images.githubusercontent.com/18378986/81293969-c3101680-90a0-11ea-87e5-ac9f0dd53f5e.png" width="60%" />
</p>
A:会话超时时间太短了,只有 0.3 秒,修改 zookeeper.properties 的配置项:
```
zookeeper.session.timeout=60000
zookeeper.connection.timeout=30000
```
---
## Q:使用 docker-compose 默认配置启动,显示 zookeeper 错误
<p align="center">
<img src="https://user-images.githubusercontent.com/42579056/80374318-13c98780-88c9-11ea-8d5f-53448b957f02.png" width="60%" />
</p>
A:这个问题在 dev-1.3.0 版本解决了。这个 [pr](https://github.com/apache/dolphinscheduler/pull/2595) 已经解决了这个 bug,主要的改动点:
```
在docker-compose.yml文件中增加zookeeper的环境变量ZOO_4LW_COMMANDS_WHITELIST。
把minLatency,avgLatency and maxLatency的类型从int改成float。
```
---
## Q:界面上显示任务一直运行,结束不了,从日志上看任务实例为空
<p align="center">
<img src="https://user-images.githubusercontent.com/51871547/80302626-b1478d00-87dd-11ea-97d4-08aa2244a6d0.jpg" width="60%" />
</p>
<p align="center">
<img src="https://user-images.githubusercontent.com/51871547/80302626-b1478d00-87dd-11ea-97d4-08aa2244a6d0.jpg" width="60%" />
</p>
A:这个 [bug](https://github.com/apache/dolphinscheduler/issues/1477) 描述了问题的详情。这个问题在 1.2.1 版本已经被修复了。
对于 1.2.1 以下的版本,这种情况的一些提示:
```
1,清空 zk 下这个路径的任务:/dolphinscheduler/task_queue
2,修改任务状态为失败(int 值 6)
3,运行工作流来从失败中恢复
```
---
## Q:zk 中注册的 master 信息 ip 地址是 127.0.0.1,而不是配置的域名所对应或者解析的 ip 地址,可能导致不能查看任务日志
A:修复 bug:
```
1、confirm hostname
$hostname
hadoop1
2、hostname -i
127.0.0.1 10.3.57.15
3、edit /etc/hosts,delete hadoop1 from 127.0.0.1 record
$cat /etc/hosts
127.0.0.1 localhost
10.3.57.15 ds1 hadoop1
4、hostname -i
10.3.57.15
```
hostname 命令返回服务器主机名,hostname -i 返回的是服务器主机名在 /etc/hosts 中所有匹配的ip地址。所以我把 /etc/hosts 中 127.0.0.1 中的主机名删掉,只保留内网 ip 的解析就可以了,没必要把 127.0.0.1 整条注释掉, 只要 hostname 命令返回值在 /etc/hosts 中对应的内网 ip 正确就可以,ds 程序取了第一个值,我理解上 ds 程序不应该用 hostname -i 取值这样有点问题,因为好多公司服务器的主机名都是运维配置的,感觉还是直接取配置文件的域名解析的返回 ip 更准确,或者 znode 中存域名信息而不是 /etc/hosts。
---
## Q:调度系统设置了一个秒级的任务,导致系统挂掉
A:调度系统不支持秒级任务。
---
## Q:编译前后端代码 (dolphinscheduler-ui) 报错不能下载"https://github.com/sass/node-sass/releases/download/v4.13.1/darwin-x64-72_binding.node"
A:1,cd dolphinscheduler-ui 然后删除 node_modules 目录
```
sudo rm -rf node_modules
```
2,通过 npm.taobao.org 下载 node-sass
```
sudo npm uninstall node-sass
sudo npm i node-sass --sass_binary_site=https://npm.taobao.org/mirrors/node-sass/
```
3,如果步骤 2 报错,请重新构建 node-saas [参考链接](https://dolphinscheduler.apache.org/en-us/development/frontend-development.html)
```
sudo npm rebuild node-sass
```
当问题解决之后,如果你不想每次编译都下载这个 node,你可以设置系统环境变量:SASS_BINARY_PATH= /xxx/xxx/xxx/xxx.node。
---
## Q:当使用 mysql 作为 ds 数据库需要如何配置
A:1,修改项目根目录 maven 配置文件,移除 scope 的 test 属性,这样 mysql 的包就可以在其它阶段被加载
```
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
<scope>test</scope>
</dependency>
```
2,修改 application-dao.properties 和 quartz.properties 来使用 mysql 驱动
默认驱动是 postgres 主要由于许可证原因。
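以数据源配置为例,使用 MySQL 驱动的配置示意如下(键名以实际版本的 application-dao.properties/datasource.properties 为准,连接信息均为假设值):
```shell
# 使用 MySQL 驱动的数据源配置示意
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dolphinscheduler
spring.datasource.password=xxxxxx
```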
---
## Q:shell 任务是如何运行的
A:1,被执行的服务器在哪里配置,以及实际执行的服务器是哪台? 要指定在某个 worker 上去执行,可以在 worker 分组中配置,固定 IP,这样就可以把路径写死。如果配置的 worker 分组有多个 worker,实际执行的服务器由调度决定的,具有随机性。
2,如果是服务器上某个路径的一个 shell 文件,怎么指向这个路径?服务器上某个路径下的 shell 文件,涉及到权限问题,不建议这么做。建议你可以使用资源中心的存储功能,然后在 shell 编辑器里面使用资源引用就可以,系统会帮助你把脚本下载到执行目录下。如果以 hdfs 作为资源中心,在执行的时候,调度器会把依赖的 jar 包,文件等资源拉到 worker 的执行目录上,我这边是 /tmp/escheduler/exec/process,该配置可以在 install.sh 中进行指定。
3,以哪个用户来执行任务?执行任务的时候,调度器会采用 sudo -u 租户的方式去执行,租户是一个 linux 用户。
---
## Q:生产环境部署方式有推荐的最佳实践吗
A:1,如果没有很多任务要运行,出于稳定性考虑我们建议使用 3 个节点,并且最好把 Master/Worker 服务部署在不同的节点。如果你只有一个节点,当然只能把所有的服务部署在同一个节点!通常来说,需要多少节点取决于你的业务,海豚调度系统本身不需要很多的资源。充分测试之后,你们将找到使用较少节点的合适的部署方式。
---
## Q:DEPENDENT 节点
A:1,DEPENDENT 节点实际是没有执行体的,是专门用来配置数据周期依赖逻辑,然后再把执行节点挂载后面,来实现任务间的周期依赖。
---
## Q:如何改变 Master 服务的启动端口
<p align="center">
<img src="https://user-images.githubusercontent.com/8263441/62352160-0f3e9100-b53a-11e9-95ba-3ae3dde49c72.png" width="60%" />
</p>
A:1,修改 application_master.properties 配置文件,例如:server.port=12345。
---
## Q:调度任务不能上线
A:1,我们可以成功创建调度任务,并且表 t_scheduler_schedules 中也成功加入了一条记录,但当我点击上线后,前端页面无反应且会把 t_scheduler_schedules 这张表锁定,我测试过将 t_scheduler_schedules 中的 RELEASE_state 字段手动更新为 1 这样前端会显示为上线状态。DS 版本 1.2+ 表名是 t_ds_schedules,其它版本表名是 t_scheduler_schedules。
---
## Q:请问 swagger ui 的地址是什么
A:1,1.2+ 版本地址是:http://apiServerIp:apiServerPort/dolphinscheduler/doc.html?language=zh_CN&lang=cn,其它版本是 http://apiServerIp:apiServerPort/escheduler/doc.html?language=zh_CN&lang=cn。
---
## Q:前端安装包缺少文件
<p align="center">
<img src="https://user-images.githubusercontent.com/41460919/61437083-d960b080-a96e-11e9-87f1-297ba3aca5e3.png" width="60%" />
</p>
<p align="center">
<img src="https://user-images.githubusercontent.com/41460919/61437218-1b89f200-a96f-11e9-8e48-3fac47eb2389.png" width="60%" />
</p>
A: 1,用户修改了 api server 配置文件中的![apiServerContextPath](https://user-images.githubusercontent.com/41460919/61678323-1b09a680-ad35-11e9-9707-3ba68bbc70d6.png)配置项,导致了这个问题,恢复成默认配置之后问题解决。
---
## Q:上传比较大的文件卡住
<p align="center">
<img src="https://user-images.githubusercontent.com/21357069/58231400-805b0e80-7d69-11e9-8107-7f37b06a95df.png" width="60%" />
</p>
A:1,编辑 nginx 配置文件 vi /etc/nginx/nginx.conf,更改上传大小 client_max_body_size 1024m。
2,更新 google chrome 版本到最新版本。
---
## Q:创建 spark 数据源,点击“测试连接”,系统回退回到登入页面
A:1,edit /etc/nginx/conf.d/escheduler.conf
```
proxy_connect_timeout 300s;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
```
---
## Q:欢迎订阅 DolphinScheduler 开发邮件列表
A:在使用 DolphinScheduler 的过程中,如果您有任何问题或者想法、建议,都可以通过 Apache 邮件列表参与到 DolphinScheduler 的社区建设中来。
发送订阅邮件也非常简单,步骤如下:
1,用自己的邮箱向 dev-subscribe@dolphinscheduler.apache.org 发送一封邮件,主题和内容任意。
2, 接收确认邮件并回复。 完成步骤1后,您将收到一封来自 dev-help@dolphinscheduler.apache.org 的确认邮件(如未收到,请确认邮件是否被自动归入垃圾邮件、推广邮件、订阅邮件等文件夹)。然后直接回复该邮件,或点击邮件里的链接快捷回复即可,主题和内容任意。
3, 接收欢迎邮件。 完成以上步骤后,您会收到一封主题为 WELCOME to dev@dolphinscheduler.apache.org 的欢迎邮件,至此您已成功订阅 Apache DolphinScheduler的邮件列表。
---
## Q:工作流依赖
A:1,目前是按照自然天来判断,上月末:判断时间是工作流 A start_time/scheduler_time between '2019-05-31 00:00:00' and '2019-05-31 23:59:59'。上月:是判断上个月从 1 号到月末每天都要有完成的A实例。上周: 上周 7 天都要有完成的 A 实例。前两天: 判断昨天和前天,两天都要有完成的 A 实例。
---
## Q:DS 后端接口文档
A:1,http://106.75.43.194:8888/dolphinscheduler/doc.html?language=zh_CN&lang=zh。
## dolphinscheduler 在运行过程中,ip 地址获取错误的问题
master 服务、worker 服务在 zookeeper 注册时,会以 ip:port 的形式创建相关信息
如果 ip 地址获取错误,请检查网络信息,如 Linux 系统通过 `ifconfig` 命令查看网络信息,以下图为例:
<p align="center">
<img src="/img/network/network_config.png" width="60%" />
</p>
可以使用 dolphinscheduler 提供的三种策略,获取可用 ip:
* default: 优先使用内网网卡获取 ip 地址,其次使用外网网卡获取 ip 地址,在前两项失效的情况下,使用第一块可用网卡的地址
* inner: 使用内网网卡获取 ip地址,如果获取失败抛出异常信息
* outer: 使用外网网卡获取 ip地址,如果获取失败抛出异常信息
配置方式是在 `common.properties` 中修改相关配置:
```shell
# network IP gets priority, default: inner outer
# dolphin.scheduler.network.priority.strategy=default
```
以上配置修改后重启服务生效
如果 ip 地址获取依然错误,请下载 [dolphinscheduler-netutils.jar](/asset/dolphinscheduler-netutils.jar) 到相应机器,执行以下命令以进一步排障,并反馈给社区开发人员:
```shell
java -jar target/dolphinscheduler-netutils.jar
```
## 配置 sudo 免密,用于解决默认配置 sudo 权限过大或不能申请 root 权限的使用问题
配置 dolphinscheduler OS 账号的 sudo 权限为部分普通用户范围内的一个普通用户管理者,限制指定用户在指定主机上运行某些命令,详细配置请看 sudo 权限管理
例如 sudo 权限管理配置 dolphinscheduler OS 账号只能操作用户 userA,userB,userC 的权限(其中用户 userA,userB,userC 用于多租户向大数据集群提交作业)
```shell
echo 'dolphinscheduler ALL=(userA,userB,userC) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
```
---
## Q:Yarn多集群支持
A:将Worker节点分别部署至多个Yarn集群,步骤如下(例如AWS EMR):
1. 将 Worker 节点部署至 EMR 集群的 Master 节点
2. 将 `conf/common.properties` 中的 `yarn.application.status.address` 修改为当前集群的 Yarn 信息,配置示意见下文
3. 通过 `bin/dolphinscheduler-daemon.sh start worker-server` 和 `bin/dolphinscheduler-daemon.sh start logger-server` 分别启动 worker-server 和 logger-server
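其中第 2 步 `conf/common.properties` 的修改示意如下(主机名、端口与占位符请以实际版本的配置文件为准):
```shell
# conf/common.properties:将 ds1 替换为当前 EMR 集群 ResourceManager 的主机名
yarn.application.status.address=http://ds1:8088/ws/v1/cluster/apps/%s
```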
---
## Q:Update process definition error: Duplicate key TaskDefinition
A:在DS 2.0.4之前(2.0.0-alpha之后),可能存在版本切换的重复键问题,导致更新工作流失败;可参考如下SQL进行重复数据的删除,以MySQL为例:(注意:操作前请务必备份原数据,SQL来源于pr [#8408](https://github.com/apache/dolphinscheduler/pull/8408))
```SQL
DELETE FROM t_ds_process_task_relation_log WHERE id IN
(
SELECT
x.id
FROM
(
SELECT
aa.id
FROM
t_ds_process_task_relation_log aa
JOIN
(
SELECT
a.process_definition_code
,MAX(a.id) as min_id
,a.pre_task_code
,a.pre_task_version
,a.post_task_code
,a.post_task_version
,a.process_definition_version
,COUNT(*) cnt
FROM
t_ds_process_task_relation_log a
JOIN (
SELECT
code
FROM
t_ds_process_definition
GROUP BY code
)b ON b.code = a.process_definition_code
WHERE 1=1
GROUP BY a.pre_task_code
,a.post_task_code
,a.pre_task_version
,a.post_task_version
,a.process_definition_code
,a.process_definition_version
HAVING COUNT(*) > 1
)bb ON bb.process_definition_code = aa.process_definition_code
AND bb.pre_task_code = aa.pre_task_code
AND bb.post_task_code = aa.post_task_code
AND bb.process_definition_version = aa.process_definition_version
AND bb.pre_task_version = aa.pre_task_version
AND bb.post_task_version = aa.post_task_version
AND bb.min_id != aa.id
)x
)
;
DELETE FROM t_ds_task_definition_log WHERE id IN
(
SELECT
x.id
FROM
(
SELECT
a.id
FROM
t_ds_task_definition_log a
JOIN
(
SELECT
code
,name
,version
,MAX(id) AS min_id
FROM
t_ds_task_definition_log
GROUP BY code
,name
,version
HAVING COUNT(*) > 1
)b ON b.code = a.code
AND b.name = a.name
AND b.version = a.version
AND b.min_id != a.id
)x
)
;
```
---
我们会持续收集更多的 FAQ。

12
docs/docs/zh/guide/alert/alert_plugin_user_guide.md

@ -0,0 +1,12 @@
## 如何创建告警插件以及告警组
在 2.0.0 版本中,用户需要创建告警实例,然后同告警组进行关联,一个告警组可以使用多个告警实例,我们会逐一进行告警通知。
首先进入安全中心,选择告警组管理,点击左侧的告警实例管理,创建一个告警实例,选择对应的告警插件并填写相关告警参数。
然后选择告警组管理,创建告警组,选择相应的告警实例即可。
<img src="/img/alert/alert_step_1.png">
<img src="/img/alert/alert_step_2.png">
<img src="/img/alert/alert_step_3.png">
<img src="/img/alert/alert_step_4.png">

26
docs/docs/zh/guide/alert/dingtalk.md

@ -0,0 +1,26 @@
# 钉钉
如果您需要使用到钉钉进行告警,请在告警实例管理里创建告警实例,选择 DingTalk 插件。钉钉的配置样例如下:
![dingtalk-plugin](/img/alert/dingtalk-plugin.png)
参数配置
* Webhook
> 格式如下:https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
* Keyword
> 安全设置的自定义关键词
* Secret
> 安全设置的加签
* 消息类型
> 支持 text 和 markdown 两种类型
自定义机器人发送消息时,可以通过手机号码指定“被@人列表”。在“被@人列表”里面的人员收到该消息时,会有@消息提醒。免打扰会话仍然通知提醒,首屏出现“有人@你”
* @Mobiles
> 被@人的手机号
* @UserIds
> 被@人的用户userid
* @All
> 是否@所有人
[钉钉自定义机器人接入开发文档](https://open.dingtalk.com/document/robots/custom-robot-access)

64
docs/docs/zh/guide/alert/enterprise-webexteams.md

@ -0,0 +1,64 @@
# WebexTeams
如果您需要使用到Webex Teams进行告警,请在告警实例管理里创建告警实例,选择 WebexTeams 插件。
你可以选择机器人私聊通知或聊天室通知。
WebexTeams的配置样例如下:
![enterprise-webexteams-plugin](/img/alert/enterprise-webexteams-plugin.png)
## 参数配置
* botAccessToken
> 在创建机器人时,获得的访问令牌
* roomID
> 接收消息的 room ID(只支持一个 ID)
* toPersonId
> 接收消息的用户 ID(只支持一个 ID)
* toPersonEmail
> 接收消息的用户邮箱(只支持一个邮箱)
* atSomeoneInRoom
> 如果消息目的地为room,被@人的用户邮箱,多个邮箱用英文逗号分隔
* destination
> 消息目的地,一条消息只支持一个目的地
## 创建一个机器人
访问[官网My-Apps](https://developer.webex.com/docs/api/v1/rooms/create-a-room)来创建一个机器人,点击`Create a New APP` 然后选择 `Create a Bot`,填入机器人信息后获取`bot username` 和 `bot ID`以备以下步骤使用。
![enterprise-webexteams-bot-info](/img/alert/enterprise-webexteams-bot.png)
## 创建一个房间
访问[官网开发者APIs](https://developer.webex.com/docs/api/v1/rooms/create-a-room)来创建一个房间,填入房间名称后获取`id`(room ID) 和 `creatorId`以备以下步骤使用。
![enterprise-webexteams-room-info](/img/alert/enterprise-webexteams-room.png)
### 邀请机器人到房间
通过机器人的Email(bot username)将机器人添加至房间。
## 发送私聊消息
通过这种方式,你可以通过`用户邮箱`或`用户`对一个用户私聊窗口发送告警,填入`用户`或`用户邮箱`(推荐)和`访问令牌`,并选择`描述`为 `personEmail` 或 `personId`
`用户邮箱`是用户注册Email地址。
`用户`我们可以从新建房间返回的`creatorId`中获取。
![enterprise-webexteams-private-message-form](/img/alert/enterprise-webexteams-private-form.png)
### 私聊告警样例
![enterprise-webexteams-private-message-example](/img/alert/enterprise-webexteams-private-msg.png)
## 发送群聊消息
通过这种方式,你可以通过`房间`向一个房间发送告警,填入`房间`和`访问令牌`,并选择`描述`为 `roomId`
`房间`我们可以从新建房间API返回的`id`中获取。
![enterprise-webexteams-group-form](/img/alert/enterprise-webexteams-group-form.png)
### 群聊告警消息样例
![enterprise-webexteams-room-message-example](/img/alert/enterprise-webexteams-room-msg.png)
[WebexTeams申请机器人文档](https://developer.webex.com/docs/bots)
[WebexTeamsMessage开发文档](https://developer.webex.com/docs/api/v1/messages/create-a-message)

13
docs/docs/zh/guide/alert/enterprise-wechat.md

@ -0,0 +1,13 @@
# 企业微信
如果您需要使用到企业微信进行告警,请在告警实例管理里创建告警实例,选择 WeChat 插件。企业微信的配置样例如下
![enterprise-wechat-plugin](/img/alert/enterprise-wechat-plugin.png)
其中 send.type 分别对应企微文档:
应用:https://work.weixin.qq.com/api/doc/90000/90135/90236
群聊:https://work.weixin.qq.com/api/doc/90000/90135/90248
user.send.msg 对应文档中的 content,与此相对应的值的变量为 {msg}

41
docs/docs/zh/guide/alert/telegram.md

@ -0,0 +1,41 @@
# Telegram
如果您需要使用`Telegram`进行告警,请在告警实例管理模块创建告警实例,选择`Telegram`插件。
`Telegram`的配置样例如下:
![telegram-plugin](/img/alert/telegram-plugin.png)
参数配置:
* WebHook:
> 使用 Telegram 的机器人,发送消息的WebHook。
* botToken
> 创建 Telegram 的机器人,获取的访问令牌。
* chatId
> 订阅的 Telegram 频道
* parseMode
> 消息解析类型, 支持: txt、markdown、markdownV2、html
* EnableProxy
> 开启代理
* Proxy
> 代理地址
* Port
> 代理端口
* User
> 代理鉴权用户
* Password
> 代理鉴权密码
**注意**:用户配置的WebHook需要能够接收和使用与DolphinScheduler构造的HTTP POST请求BODY相同的结构,JSON结构如下:
```json
{
"text": "[{\"projectId\":1,\"projectName\":\"p1\",\"owner\":\"admin\",\"processId\":35,\"processDefinitionCode\":4928367293568,\"processName\":\"s11-3-20220324084708668\",\"taskCode\":4928359068928,\"taskName\":\"s1\",\"taskType\":\"SHELL\",\"taskState\":\"FAILURE\",\"taskStartTime\":\"2022-03-24 08:47:08\",\"taskEndTime\":\"2022-03-24 08:47:09\",\"taskHost\":\"192.168.1.103:1234\",\"logPath\":\"\"}]",
"chat_id": "chat id number"
}
```
[Telegram 如何申请机器人,如何创建频道](https://core.telegram.org/bots)
[Telegram 机器人开发文档](https://core.telegram.org/bots/api)
[Telegram SendMessage 接口文档](https://core.telegram.org/bots/api#sendmessage)

40
docs/docs/zh/guide/datasource/hive.md

@ -0,0 +1,40 @@
# HIVE数据源
## 使用HiveServer2
![hive](/img/new_ui/dev/datasource/hive.png)
- 数据源:选择 HIVE
- 数据源名称:输入数据源的名称
- 描述:输入数据源的描述
- IP 主机名:输入连接 HIVE 的 IP
- 端口:输入连接 HIVE 的端口
- 用户名:设置连接 HIVE 的用户名
- 密码:设置连接 HIVE 的密码
- 数据库名:输入连接 HIVE 的数据库名称
- Jdbc 连接参数:用于 HIVE 连接的参数设置,以 JSON 形式填写
> 注意:如果您希望在同一个会话中执行多个 HIVE SQL,您可以修改配置文件 `common.properties` 中的配置,设置 `support.hive.oneSession = true`
> 这对运行 HIVE SQL 前设置环境变量的场景会很有帮助。参数 `support.hive.oneSession` 默认值为 `false`,多条 SQL 将在不同的会话中运行。
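修改示意如下:
```shell
# conf/common.properties:允许在同一会话中执行多条 HIVE SQL
support.hive.oneSession=true
```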
## 使用 HiveServer2 HA Zookeeper
![hive-server2](/img/new_ui/dev/datasource/hiveserver2.png)
注意:如果没有开启 kerberos,请保证参数 `hadoop.security.authentication.startup.state` 值为 `false`,
参数 `java.security.krb5.conf.path` 值为空。若开启了 **kerberos**,则需要在 `common.properties` 中配置以下参数:
```conf
# whether to startup kerberos
hadoop.security.authentication.startup.state=true
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
```

6
docs/docs/zh/guide/datasource/introduction.md

@ -0,0 +1,6 @@
# 数据源
数据源中心支持MySQL、POSTGRESQL、HIVE/IMPALA、SPARK、CLICKHOUSE、ORACLE、SQLSERVER等数据源。
- 点击“数据源中心->创建数据源”,根据需求创建不同类型的数据源。
- 点击“测试连接”,测试数据源是否可以连接成功。

13
docs/docs/zh/guide/datasource/mysql.md

@ -0,0 +1,13 @@
# MySQL 数据源
![mysql](/img/new_ui/dev/datasource/mysql.png)
- 数据源:选择 MYSQL
- 数据源名称:输入数据源的名称
- 描述:输入数据源的描述
- IP 主机名:输入连接 MySQL 的 IP
- 端口:输入连接 MySQL 的端口
- 用户名:设置连接 MySQL 的用户名
- 密码:设置连接 MySQL 的密码
- 数据库名:输入连接 MySQL 的数据库名称
- Jdbc 连接参数:用于 MySQL 连接的参数设置,以 JSON 形式填写

13
docs/docs/zh/guide/datasource/postgresql.md

@ -0,0 +1,13 @@
# POSTGRESQL 数据源
![postgresql](/img/new_ui/dev/datasource/postgresql.png)
- 数据源:选择 POSTGRESQL
- 数据源名称:输入数据源的名称
- 描述:输入数据源的描述
- IP 主机名:输入连接 POSTGRESQL 的 IP
- 端口:输入连接 POSTGRESQL 的端口
- 用户名:设置连接 POSTGRESQL 的用户名
- 密码:设置连接 POSTGRESQL 的密码
- 数据库名:输入连接 POSTGRESQL 的数据库名称
- Jdbc 连接参数:用于 POSTGRESQL 连接的参数设置,以 JSON 形式填写

19
docs/docs/zh/guide/datasource/spark.md

@ -0,0 +1,19 @@
# Spark数据源
![sparksql](/img/new_ui/dev/datasource/sparksql.png)
- 数据源:选择 Spark
- 数据源名称:输入数据源的名称
- 描述:输入数据源的描述
- IP/主机名:输入连接Spark的IP
- 端口:输入连接Spark的端口
- 用户名:设置连接Spark的用户名
- 密码:设置连接Spark的密码
- 数据库名:输入连接Spark的数据库名称
- Jdbc连接参数:用于Spark连接的参数设置,以JSON形式填写
注意:如果开启了**kerberos**,则需要填写 **Principal**
<p align="center">
<img src="/img/sparksql_kerberos.png" width="80%" />
</p>

245
docs/docs/zh/guide/expansion-reduction.md

@ -0,0 +1,245 @@
# DolphinScheduler扩容/缩容 文档
## 1. DolphinScheduler扩容文档
本文扩容是针对现有的DolphinScheduler集群添加新的master或者worker节点的操作说明.
```
注意: 一台物理机上不能存在多个master服务进程或者worker服务进程.
如果扩容master或者worker节点所在的物理机已经安装了调度的服务,请直接跳到 [1.4.修改配置]. 编辑 ** 所有 ** 节点上的配置文件 `conf/config/install_config.conf`. 新增masters或者workers参数,重启调度集群即可.
```
### 1.1. 基础软件安装(必装项请自行安装)
* [必装] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (1.8+) : 必装,请安装好后在/etc/profile下配置 JAVA_HOME 及 PATH 变量
* [可选] 如果扩容的是worker类型的节点,需要考虑是否要安装外部客户端,比如Hadoop、Hive、Spark 的Client.
```markdown
注意:DolphinScheduler本身不依赖Hadoop、Hive、Spark,仅是会调用他们的Client,用于对应任务的提交。
```
### 1.2. 获取安装包
- 确认现有环境使用的DolphinScheduler是哪个版本,获取对应版本的安装包,如果版本不同,可能存在兼容性的问题.
- 确认其他节点的统一安装目录,本文假设DolphinScheduler统一安装在 /opt/ 目录中,安装全路径为/opt/dolphinscheduler.
- 请下载对应版本的安装包至服务器安装目录,解压并重命名为 dolphinscheduler 存放在 /opt 目录中.
- 添加数据库依赖包,本文使用Mysql数据库,添加mysql-connector-java驱动包到/opt/dolphinscheduler/lib目录中
```shell
# 创建安装目录,安装目录请不要创建在/root、/home等高权限目录
mkdir -p /opt
cd /opt
# 解压缩
tar -zxvf apache-dolphinscheduler-1.3.8-bin.tar.gz -C /opt
cd /opt
mv apache-dolphinscheduler-1.3.8-bin dolphinscheduler
```
```markdown
注意:安装包可以从现有的环境直接复制到扩容的物理机上使用.
```
### 1.3. 创建部署用户
- 在**所有**扩容的机器上创建部署用户,并且一定要配置sudo免密。假如我们计划在ds1,ds2,ds3,ds4这四台扩容机器上部署调度,首先需要在每台机器上都创建部署用户
```shell
# 创建用户需使用root登录,设置部署用户名,请自行修改,后面以dolphinscheduler为例
useradd dolphinscheduler;
# 设置用户密码,请自行修改,后面以dolphinscheduler123为例
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler
# 配置sudo免密
echo 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
```
```markdown
注意:
- 因为是以 sudo -u {linux-user} 切换不同linux用户的方式来实现多租户运行作业,所以部署用户需要有 sudo 权限,而且是免密的。
- 如果发现/etc/sudoers文件中有"Defaults requiretty"这行,也请注释掉
- 如果用到资源上传的话,还需要在`HDFS或者MinIO`上给该部署用户分配读写的权限
```
### 1.4. 修改配置
- 从现有的节点比如Master/Worker节点,直接拷贝conf目录替换掉新增节点中的conf目录.拷贝之后检查一下配置项是否正确.
```markdown
重点检查:
datasource.properties 中的数据库连接信息.
zookeeper.properties 中的连接zk的信息.
common.properties 中关于资源存储的配置信息(如果设置了hadoop,请检查是否存在core-site.xml和hdfs-site.xml配置文件).
env/dolphinscheduler_env.sh 中的环境变量
```
- 根据机器配置,修改 conf/env 目录下的 `dolphinscheduler_env.sh` 环境变量(以相关用到的软件都安装在/opt/soft下为例)
```shell
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
#export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
```
`注: 这一步非常重要,例如 JAVA_HOME 和 PATH 是必须要配置的,没有用到的可以忽略或者注释掉`
- 将jdk软链到/usr/bin/java下(仍以 JAVA_HOME=/opt/soft/java 为例)
```shell
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
```
- 修改 **所有** 节点上的配置文件 `conf/config/install_config.conf`, 同步修改以下配置.
* 新增的master节点, 需要修改 ips 和 masters 参数.
* 新增的worker节点, 需要修改 ips 和 workers 参数.
```shell
#在哪些机器上新增部署DS服务,多个物理机之间用逗号隔开.
ips="ds1,ds2,ds3,ds4"
#ssh端口,默认22
sshPort="22"
#master服务部署在哪台机器上
masters="现有master01,现有master02,ds1,ds2"
#worker服务部署在哪台机器上,并指定此worker属于哪一个worker组,下面示例的default即为组名
workers="现有worker01:default,现有worker02:default,ds3:default,ds4:default"
```
- 如果扩容的是worker节点,需要设置worker分组.请参考安全中心[创建worker分组](./security.md)
- 在所有的新增节点上,修改目录权限,使得部署用户对dolphinscheduler目录有操作权限
```shell
sudo chown -R dolphinscheduler:dolphinscheduler dolphinscheduler
```
### 1.5. 重启集群&验证
- 重启集群
```shell
停止命令:
bin/stop-all.sh 停止所有服务
sh bin/dolphinscheduler-daemon.sh stop master-server 停止 master 服务
sh bin/dolphinscheduler-daemon.sh stop worker-server 停止 worker 服务
sh bin/dolphinscheduler-daemon.sh stop api-server 停止 api 服务
sh bin/dolphinscheduler-daemon.sh stop alert-server 停止 alert 服务
启动命令:
bin/start-all.sh 启动所有服务
sh bin/dolphinscheduler-daemon.sh start master-server 启动 master 服务
sh bin/dolphinscheduler-daemon.sh start worker-server 启动 worker 服务
sh bin/dolphinscheduler-daemon.sh start api-server 启动 api 服务
sh bin/dolphinscheduler-daemon.sh start alert-server 启动 alert 服务
```
```
注意: 使用 stop-all.sh 或者 start-all.sh 的时候,如果执行该命令的物理机没有配置到所有机器的 ssh 免密登录的话,会提示输入密码
```
- 脚本完成后,使用`jps`命令查看各个节点服务是否启动(`jps`为`java JDK`自带)
```
MasterServer ----- master服务
WorkerServer ----- worker服务
ApiApplicationServer ----- api服务
AlertServer ----- alert服务
```
启动成功后,可以进行日志查看,日志统一存放于logs文件夹内
```日志路径
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
```
如果以上服务都正常启动且调度系统页面正常,在web系统的[监控中心]查看是否有扩容的Master或者Worker服务.如果存在,则扩容完成
-----------------------------------------------------------------------------
## 2. 缩容
缩容是针对现有的DolphinScheduler集群减少master或者worker服务,
缩容一共分两个步骤,执行完以下两步,即可完成缩容操作.
### 2.1 停止缩容节点上的服务
* 如果缩容master节点,要确定要缩容master服务所在的物理机,并在物理机上停止该master服务.
* 如果缩容worker节点,要确定要缩容worker服务所在的物理机,并在物理机上停止worker服务.
```shell
停止命令:
bin/stop-all.sh 停止所有服务
sh bin/dolphinscheduler-daemon.sh stop master-server 停止 master 服务
sh bin/dolphinscheduler-daemon.sh stop worker-server 停止 worker 服务
sh bin/dolphinscheduler-daemon.sh stop api-server 停止 api 服务
sh bin/dolphinscheduler-daemon.sh stop alert-server 停止 alert 服务
启动命令:
bin/start-all.sh 启动所有服务
sh bin/dolphinscheduler-daemon.sh start master-server 启动 master 服务
sh bin/dolphinscheduler-daemon.sh start worker-server 启动 worker 服务
sh bin/dolphinscheduler-daemon.sh start api-server 启动 api 服务
sh bin/dolphinscheduler-daemon.sh start alert-server 启动 alert 服务
```
```
注意: 使用 stop-all.sh 或者 start-all.sh 的时候,如果执行该命令的机器没有配置到所有机器的 ssh 免密登录的话,会提示输入密码
```
- 脚本完成后,使用`jps`命令查看各个节点服务是否成功关闭(`jps`为`java JDK`自带)
```
MasterServer ----- master服务
WorkerServer ----- worker服务
ApiApplicationServer ----- api服务
AlertServer ----- alert服务
```
如果对应的master服务或者worker服务不存在,则代表master/worker服务成功关闭.
### 2.2 修改配置文件
- 修改 **所有** 节点上的配置文件 `conf/config/install_config.conf`, 同步修改以下配置.
* 缩容master节点, 需要修改 ips 和 masters 参数.
* 缩容worker节点, 需要修改 ips 和 workers 参数.
```shell
#在哪些机器上部署DS服务,本机选localhost
ips="ds1,ds2,ds3,ds4"
#ssh端口,默认22
sshPort="22"
#master服务部署在哪台机器上
masters="现有master01,现有master02,ds1,ds2"
#worker服务部署在哪台机器上,并指定此worker属于哪一个worker组,下面示例的default即为组名
workers="现有worker01:default,现有worker02:default,ds3:default,ds4:default"
```

150
docs/docs/zh/guide/flink-call.md

@ -0,0 +1,150 @@
# 调用 flink 操作步骤
### 创建队列
1. 登录调度系统,点击 "安全中心",再点击左侧的 "队列管理",点击 "队列管理" 创建队列
2. 填写队列名称和队列值,然后点击 "提交"
<p align="center">
<img src="/img/api/create_queue.png" width="80%" />
</p>
### 创建租户
```
1.租户对应的是 linux 用户, 是 worker 提交作业所使用的用户, 如果 linux 没有这个用户, worker 会在执行脚本的时候创建这个用户
2.租户和租户编码都是唯一不能重复,好比一个人有名字有身份证号。
3.创建完租户会在 hdfs 对应的目录上有相关的文件夹。
```
<p align="center">
<img src="/img/api/create_tenant.png" width="80%" />
</p>
### 创建用户
<p align="center">
<img src="/img/api/create_user.png" width="80%" />
</p>
### 创建 Token
1. 登录调度系统,点击 "安全中心",再点击左侧的 "令牌管理",点击 "令牌管理" 创建令牌
<p align="center">
<img src="/img/token-management.png" width="80%" />
</p>
2. 选择 "失效时间" (Token有效期),选择 "用户" (以指定的用户执行接口操作),点击 "生成令牌" ,拷贝 Token 字符串,然后点击 "提交"
<p align="center">
<img src="/img/create-token.png" width="80%" />
</p>
### 使用 Token
1. 打开 API文档页面
> 地址:http://{api server ip}:12345/dolphinscheduler/doc.html?language=zh_CN&lang=cn
<p align="center">
<img src="/img/api-documentation.png" width="80%" />
</p>
2. 选一个测试的接口,本次测试选取的接口是:查询所有项目
> projects/query-project-list
3. 打开 Postman,填写接口地址,并在 Headers 中填写 Token,发送请求后即可查看结果
```
token: 刚刚生成的 Token
```
<p align="center">
<img src="/img/test-api.png" width="80%" />
</p>
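除了 Postman,也可以用 curl 携带 token 调用接口(仅作示意,分页参数请以 API 文档为准):
```shell
# 在请求头中携带上一步生成的 token
curl -H "token: <刚刚生成的 Token>" \
  "http://{api server ip}:12345/dolphinscheduler/projects/query-project-list?pageNo=1&pageSize=10"
```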
### 用户授权
<p align="center">
<img src="/img/api/user_authorization.png" width="80%" />
</p>
### 用户登录
```
http://192.168.1.163:12345/dolphinscheduler/ui/#/monitor/servers/master
```
<p align="center">
<img src="/img/api/user_login.png" width="80%" />
</p>
### 资源上传
<p align="center">
<img src="/img/api/upload_resource.png" width="80%" />
</p>
### 创建工作流
<p align="center">
<img src="/img/api/create_workflow1.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow2.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow3.png" width="80%" />
</p>
<p align="center">
<img src="/img/api/create_workflow4.png" width="80%" />
</p>
### 查看执行结果
<p align="center">
<img src="/img/api/execution_result.png" width="80%" />
</p>
### 查看日志结果
<p align="center">
<img src="/img/api/log.png" width="80%" />
</p>

5
docs/docs/zh/guide/homepage.md

@ -0,0 +1,5 @@
# 首页
首页包含用户所有项目的任务状态统计、流程状态统计、工作流定义统计。
![homepage](/img/new_ui/dev/homepage/homepage.png)

35
docs/docs/zh/guide/installation/cluster.md

@ -0,0 +1,35 @@
# 集群部署(Cluster)
集群部署目的是在多台机器部署 DolphinScheduler 服务,用于运行大量任务情况。
如果你是新手,想要体验 DolphinScheduler 的功能,推荐使用[Standalone](standalone.md)方式体验。如果你想体验更完整的功能,或者更大的任务量,推荐使用[伪集群部署](pseudo-cluster.md)。如果你是在生产中使用,推荐使用[集群部署](cluster.md)或者[kubernetes](kubernetes.md)
## 部署步骤
集群部署(Cluster)使用的脚本和配置文件与[伪集群部署](pseudo-cluster.md)中的配置一样,所以所需要的步骤也与[伪集群部署](pseudo-cluster.md)大致一样。区别就是[伪集群部署](pseudo-cluster.md)针对的是一台机器,而集群部署(Cluster)需要针对多台机器,且两者“修改相关配置”步骤区别较大
### 前置准备工作 && 准备 DolphinScheduler 启动环境
在[伪集群部署](pseudo-cluster.md)的“前置准备工作”和“准备启动环境”中,除了“启动zookeeper”以及“初始化数据库”外,别的都需要在每台机器中进行配置
### 修改相关配置
这个是与[伪集群部署](pseudo-cluster.md)差异较大的一步,因为部署脚本会通过 `scp` 的方式将安装需要的资源传输到各个机器上,所以这一步我们仅需要修改运行`install.sh`脚本的所在机器的配置即可。配置文件在路径在`conf/config/install_config.conf`下,此处我们仅需修改**INSTALL MACHINE**,**DolphinScheduler ENV、Database、Registry Server**与[伪集群部署](pseudo-cluster.md)保持一致,下面对必须修改参数进行说明
```shell
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# 需要配置master、worker、API server,所在服务器的IP均为机器IP或者localhost
# 如果是配置hostname的话,需要保证机器间可以通过hostname相互链接
# 如下所示,部署 DolphinScheduler 机器的 hostname 为 ds1,ds2,ds3,ds4,ds5,其中 ds1,ds2 安装 master 服务,ds3,ds4,ds5 安装 worker 服务,alert server 安装在 ds4 中,api server 安装在 ds5 中
ips="ds1,ds2,ds3,ds4,ds5"
masters="ds1,ds2"
workers="ds3:default,ds4:default,ds5:default"
alertServer="ds4"
apiServers="ds5"
```
## 启动 DolphinScheduler && 登录 DolphinScheduler && 启停服务
与[伪集群部署](pseudo-cluster.md)保持一致

755
docs/docs/zh/guide/installation/kubernetes.md

@ -0,0 +1,755 @@
# 快速试用 Kubernetes 部署
Kubernetes部署目的是在Kubernetes集群中部署 DolphinScheduler 服务,能调度大量任务,可用于在生产中部署。
如果你是新手,想要体验 DolphinScheduler 的功能,推荐使用[Standalone](standalone.md)方式体验。如果你想体验更完整的功能,或者更大的任务量,推荐使用[伪集群部署](pseudo-cluster.md)。如果你是在生产中使用,推荐使用[集群部署](cluster.md)或者[kubernetes](kubernetes.md)
## 先决条件
- [Helm](https://helm.sh/) 3.1.0+
- [Kubernetes](https://kubernetes.io/) 1.12+
- PV 供应(需要基础设施支持)
## 安装 dolphinscheduler
请下载源码包 apache-dolphinscheduler-1.3.8-src.tar.gz,下载地址: [下载](/zh-cn/download/download.html)
发布一个名为 `dolphinscheduler` 的版本(release),请执行以下命令:
```
$ tar -zxvf apache-dolphinscheduler-1.3.8-src.tar.gz
$ cd apache-dolphinscheduler-1.3.8-src/docker/kubernetes/dolphinscheduler
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm dependency update .
$ helm install dolphinscheduler . --set image.tag=1.3.8
```
将名为 `dolphinscheduler` 的版本(release) 发布到 `test` 的命名空间中:
```bash
$ helm install dolphinscheduler . -n test
```
> **提示**: 如果名为 `test` 的命名空间被使用, 选项参数 `-n test` 需要添加到 `helm` 和 `kubectl` 命令中
这些命令以默认配置在 Kubernetes 集群上部署 DolphinScheduler,[附录-配置](#appendix-configuration)部分列出了可以在安装过程中配置的参数 <!-- markdown-link-check-disable-line -->
> **提示**: 列出所有已发布的版本,使用 `helm list`
**PostgreSQL** (用户 `root`, 密码 `root`, 数据库 `dolphinscheduler`) 和 **ZooKeeper** 服务将会默认启动
## 访问 DolphinScheduler 前端页面
如果 `values.yaml` 文件中的 `ingress.enabled` 被设置为 `true`, 在浏览器中访问 `http://${ingress.host}/dolphinscheduler` 即可
> **提示**: 如果 ingress 访问遇到问题,请联系 Kubernetes 管理员并查看 [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/)
否则,当 `api.service.type=ClusterIP` 时,你需要执行 port-forward 端口转发命令:
```bash
$ kubectl port-forward --address 0.0.0.0 svc/dolphinscheduler-api 12345:12345
$ kubectl port-forward --address 0.0.0.0 -n test svc/dolphinscheduler-api 12345:12345 # 使用 test 命名空间
```
> **提示**: 如果出现 `unable to do port forwarding: socat not found` 错误, 需要先安装 `socat`
访问前端页面:http://localhost:12345/dolphinscheduler,如果有需要请修改成对应的 IP 地址
或者当 `api.service.type=NodePort` 时,你需要执行命令:
```bash
NODE_IP=$(kubectl get no -n {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
NODE_PORT=$(kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}")
echo http://$NODE_IP:$NODE_PORT/dolphinscheduler
```
然后访问前端页面: http://localhost:12345/dolphinscheduler
默认的用户是`admin`,默认的密码是`dolphinscheduler123`
请参考用户手册章节的[快速上手](../start/quick-start.md)查看如何使用DolphinScheduler
## 卸载 dolphinscheduler
卸载名为 `dolphinscheduler` 的版本(release),请执行:
```bash
$ helm uninstall dolphinscheduler
```
该命令将删除与 `dolphinscheduler` 相关的所有 Kubernetes 组件(但PVC除外),并删除版本(release)
要删除与 `dolphinscheduler` 相关的PVC,请执行:
```bash
$ kubectl delete pvc -l app.kubernetes.io/instance=dolphinscheduler
```
> **注意**: 删除PVC也会删除所有数据,请谨慎操作!
## 配置
配置文件为 `values.yaml`,[附录-配置](#appendix-configuration) 表格列出了 DolphinScheduler 的可配置参数及其默认值 <!-- markdown-link-check-disable-line -->
## 支持矩阵
| Type | 支持 | 备注 |
| ------------------------------------------------------------ | ------- | --------------------- |
| Shell | 是 | |
| Python2 | 是 | |
| Python3 | 间接支持 | 详见 FAQ |
| Hadoop2 | 间接支持 | 详见 FAQ |
| Hadoop3 | 尚未确定 | 尚未测试 |
| Spark-Local(client) | 间接支持 | 详见 FAQ |
| Spark-YARN(cluster) | 间接支持 | 详见 FAQ |
| Spark-Standalone(cluster) | 尚不 | |
| Spark-Kubernetes(cluster) | 尚不 | |
| Flink-Local(local>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
| Flink-YARN(yarn-cluster) | 间接支持 | 详见 FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
| Flink-Standalone(default) | 尚不 | |
| Flink-Standalone(remote>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
| Flink-Kubernetes(default) | 尚不 | |
| Flink-Kubernetes(remote>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
| MapReduce | 间接支持 | 详见 FAQ |
| Kerberos | 间接支持 | 详见 FAQ |
| HTTP | 是 | |
| DataX | 间接支持 | 详见 FAQ |
| Sqoop | 间接支持 | 详见 FAQ |
| SQL-MySQL | 间接支持 | 详见 FAQ |
| SQL-PostgreSQL | 是 | |
| SQL-Hive | 间接支持 | 详见 FAQ |
| SQL-Spark | 间接支持 | 详见 FAQ |
| SQL-ClickHouse | 间接支持 | 详见 FAQ |
| SQL-Oracle | 间接支持 | 详见 FAQ |
| SQL-SQLServer | 间接支持 | 详见 FAQ |
| SQL-DB2 | 间接支持 | 详见 FAQ |
## FAQ
### 如何查看一个 pod 容器的日志?
列出所有 pods (别名 `po`):
```
kubectl get po
kubectl get po -n test # with test namespace
```
查看名为 dolphinscheduler-master-0 的 pod 容器的日志:
```
kubectl logs dolphinscheduler-master-0
kubectl logs -f dolphinscheduler-master-0 # 跟随日志输出
kubectl logs --tail 10 dolphinscheduler-master-0 -n test # 显示倒数10行日志
```
### 如何在 Kubernetes 上扩缩容 api, master 和 worker?
列出所有 deployments (别名 `deploy`):
```
kubectl get deploy
kubectl get deploy -n test # with test namespace
```
扩缩容 api 至 3 个副本:
```
kubectl scale --replicas=3 deploy dolphinscheduler-api
kubectl scale --replicas=3 deploy dolphinscheduler-api -n test # with test namespace
```
列出所有 statefulsets (别名 `sts`):
```
kubectl get sts
kubectl get sts -n test # with test namespace
```
扩缩容 master 至 2 个副本:
```
kubectl scale --replicas=2 sts dolphinscheduler-master
kubectl scale --replicas=2 sts dolphinscheduler-master -n test # with test namespace
```
扩缩容 worker 至 6 个副本:
```
kubectl scale --replicas=6 sts dolphinscheduler-worker
kubectl scale --replicas=6 sts dolphinscheduler-worker -n test # with test namespace
```
### 如何用 MySQL 替代 PostgreSQL 作为 DolphinScheduler 的数据库?
> 由于商业许可证的原因,我们不能直接使用 MySQL 的驱动包.
>
> 如果你要使用 MySQL, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建.
1. 下载 MySQL 驱动包 [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar)
2. 创建一个新的 `Dockerfile`,用于添加 MySQL 的驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
3. 构建一个包含 MySQL 驱动包的新镜像:
```
docker build -t apache/dolphinscheduler:mysql-driver .
```
4. 推送 docker 镜像 `apache/dolphinscheduler:mysql-driver` 到一个 docker registry 中
5. 修改 `values.yaml` 文件中 image 的 `repository` 字段,并更新 `tag` 为 `mysql-driver`
6. 修改 `values.yaml` 文件中 postgresql 的 `enabled` 为 `false`
7. 修改 `values.yaml` 文件中的 externalDatabase 配置 (尤其修改 `host`, `username` 和 `password`)
```yaml
externalDatabase:
type: "mysql"
driver: "com.mysql.jdbc.Driver"
host: "localhost"
port: "3306"
username: "root"
password: "root"
database: "dolphinscheduler"
params: "useUnicode=true&characterEncoding=UTF-8"
```
8. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
### 如何在数据源中心支持 MySQL 数据源?
> 由于商业许可证的原因,我们不能直接使用 MySQL 的驱动包.
>
> 如果你要添加 MySQL 数据源, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建.
1. 下载 MySQL 驱动包 [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar)
2. 创建一个新的 `Dockerfile`,用于添加 MySQL 驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
3. 构建一个包含 MySQL 驱动包的新镜像:
```
docker build -t apache/dolphinscheduler:mysql-driver .
```
4. 推送 docker 镜像 `apache/dolphinscheduler:mysql-driver` 到一个 docker registry 中
5. 修改 `values.yaml` 文件中 image 的 `repository` 字段,并更新 `tag` 为 `mysql-driver`
6. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
7. 在数据源中心添加一个 MySQL 数据源
### 如何在数据源中心支持 Oracle 数据源?
> 由于商业许可证的原因,我们不能直接使用 Oracle 的驱动包.
>
> 如果你要添加 Oracle 数据源, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建.
1. 下载 Oracle 驱动包 [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (例如 `ojdbc8-19.9.0.0.jar`)
2. 创建一个新的 `Dockerfile`,用于添加 Oracle 驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
3. 构建一个包含 Oracle 驱动包的新镜像:
```
docker build -t apache/dolphinscheduler:oracle-driver .
```
4. 推送 docker 镜像 `apache/dolphinscheduler:oracle-driver` 到一个 docker registry 中
5. 修改 `values.yaml` 文件中 image 的 `repository` 字段,并更新 `tag` 为 `oracle-driver`
6. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
7. 在数据源中心添加一个 Oracle 数据源
### 如何支持 Python 2 pip 以及自定义 requirements.txt?
1. 创建一个新的 `Dockerfile`,用于安装 pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
pip install --no-cache-dir -r /tmp/requirements.txt && \
rm -rf /var/lib/apt/lists/*
```
这个命令会安装默认的 **pip 18.1**. 如果你想升级 pip, 只需添加一行
```
pip install --no-cache-dir -U pip && \
```
2. 构建一个包含 pip 的新镜像:
```
docker build -t apache/dolphinscheduler:pip .
```
3. 推送 docker 镜像 `apache/dolphinscheduler:pip` 到一个 docker registry 中
4. 修改 `values.yaml` 文件中 image 的 `repository` 字段,并更新 `tag` 为 `pip`
5. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
6. 在一个新 Python 任务下验证 pip
### 如何支持 Python 3?
1. 创建一个新的 `Dockerfile`,用于安装 Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*
```
这个命令会安装默认的 **Python 3.7.3**. 如果你也想安装 **pip3**, 将 `python3` 替换为 `python3-pip` 即可
```
apt-get install -y --no-install-recommends python3-pip && \
```
2. 构建一个包含 Python 3 的新镜像:
```
docker build -t apache/dolphinscheduler:python3 .
```
3. 推送 docker 镜像 `apache/dolphinscheduler:python3` 到一个 docker registry 中
4. 修改 `values.yaml` 文件中 image 的 `repository` 字段,并更新 `tag` 为 `python3`
5. 修改 `values.yaml` 文件中的 `PYTHON_HOME` 为 `/usr/bin/python3`
6. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
7. 在一个新 Python 任务下验证 Python 3
### 如何支持 Hadoop, Spark, Flink, Hive 或 DataX?
以 Spark 2.4.7 为例:
1. 下载 Spark 2.4.7 发布的二进制包 `spark-2.4.7-bin-hadoop2.7.tgz`
2. 确保 `common.sharedStoragePersistence.enabled` 开启
3. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
4. 复制 Spark 2.4.7 二进制包到 Docker 容器中
```bash
kubectl cp spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
kubectl cp -n test spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace
```
因为存储卷 `sharedStoragePersistence` 被挂载到 `/opt/soft`, 因此 `/opt/soft` 中的所有文件都不会丢失
5. 登录到容器并确保 `SPARK_HOME2` 存在
```bash
kubectl exec -it dolphinscheduler-worker-0 bash
kubectl exec -n test -it dolphinscheduler-worker-0 bash # with test namespace
cd /opt/soft
tar zxf spark-2.4.7-bin-hadoop2.7.tgz
rm -f spark-2.4.7-bin-hadoop2.7.tgz
ln -s spark-2.4.7-bin-hadoop2.7 spark2 # or just mv
$SPARK_HOME2/bin/spark-submit --version
```
如果一切执行正常,最后一条命令将会打印 Spark 版本信息
6. 在一个 Shell 任务下验证 Spark
```
$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.11-2.4.7.jar
```
检查任务日志是否包含输出 `Pi is roughly 3.146015`
7. 在一个 Spark 任务下验证 Spark
文件 `spark-examples_2.11-2.4.7.jar` 需要先被上传到资源中心,然后创建一个 Spark 任务并设置:
- Spark版本: `SPARK2`
- 主函数的Class: `org.apache.spark.examples.SparkPi`
- 主程序包: `spark-examples_2.11-2.4.7.jar`
- 部署方式: `local`
同样地, 检查任务日志是否包含输出 `Pi is roughly 3.146015`
8. 验证 Spark on YARN
Spark on YARN (部署方式为 `cluster` 或 `client`) 需要 Hadoop 支持. 类似于 Spark 支持, 支持 Hadoop 的操作几乎和前面的步骤相同
确保 `$HADOOP_HOME` 和 `$HADOOP_CONF_DIR` 存在
### 如何支持 Spark 3?
事实上,使用 `spark-submit` 提交应用的方式是相同的, 无论是 Spark 1, 2 或 3. 换句话说,`SPARK_HOME2` 的语义是第二个 `SPARK_HOME`, 而非 `SPARK2` 的 `HOME`, 因此只需设置 `SPARK_HOME2=/path/to/spark3` 即可
以 Spark 3.1.1 为例:
1. 下载 Spark 3.1.1 发布的二进制包 `spark-3.1.1-bin-hadoop2.7.tgz`
2. 确保 `common.sharedStoragePersistence.enabled` 开启
3. 部署 dolphinscheduler (详见**安装 dolphinscheduler**)
4. 复制 Spark 3.1.1 二进制包到 Docker 容器中
```bash
kubectl cp spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
kubectl cp -n test spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace
```
5. 登录到容器并确保 `SPARK_HOME2` 存在
```bash
kubectl exec -it dolphinscheduler-worker-0 bash
kubectl exec -n test -it dolphinscheduler-worker-0 bash # with test namespace
cd /opt/soft
tar zxf spark-3.1.1-bin-hadoop2.7.tgz
rm -f spark-3.1.1-bin-hadoop2.7.tgz
ln -s spark-3.1.1-bin-hadoop2.7 spark2 # or just mv
$SPARK_HOME2/bin/spark-submit --version
```
如果一切执行正常,最后一条命令将会打印 Spark 版本信息
6. 在一个 Shell 任务下验证 Spark
```
$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.12-3.1.1.jar
```
检查任务日志是否包含输出 `Pi is roughly 3.146015`
### 如何在 Master、Worker 和 Api 服务之间支持共享存储?
例如, Master、Worker 和 Api 服务可能同时使用 Hadoop
1. 修改 `values.yaml` 文件中下面的配置项
```yaml
common:
sharedStoragePersistence:
enabled: false
mountPath: "/opt/soft"
accessModes:
- "ReadWriteMany"
storageClassName: "-"
storage: "20Gi"
```
`storageClassName` 和 `storage` 需要被修改为实际值
> **注意**: `storageClassName` 必须支持访问模式: `ReadWriteMany`
2. 将 Hadoop 复制到目录 `/opt/soft`
3. 确保 `$HADOOP_HOME` 和 `$HADOOP_CONF_DIR` 正确
### 如何支持本地文件存储而非 HDFS 和 S3?
修改 `values.yaml` 文件中下面的配置项
```yaml
common:
configmap:
RESOURCE_STORAGE_TYPE: "HDFS"
RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
FS_DEFAULT_FS: "file:///"
fsFileResourcePersistence:
enabled: true
accessModes:
- "ReadWriteMany"
storageClassName: "-"
storage: "20Gi"
```
`storageClassName` 和 `storage` 需要被修改为实际值
> **注意**: `storageClassName` 必须支持访问模式: `ReadWriteMany`
### 如何支持 S3 资源存储,例如 MinIO?
以 MinIO 为例: 修改 `values.yaml` 文件中下面的配置项
```yaml
common:
configmap:
RESOURCE_STORAGE_TYPE: "S3"
RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
FS_DEFAULT_FS: "s3a://BUCKET_NAME"
FS_S3A_ENDPOINT: "http://MINIO_IP:9000"
FS_S3A_ACCESS_KEY: "MINIO_ACCESS_KEY"
FS_S3A_SECRET_KEY: "MINIO_SECRET_KEY"
```
`BUCKET_NAME`, `MINIO_IP`, `MINIO_ACCESS_KEY``MINIO_SECRET_KEY` 需要被修改为实际值
> **注意**: `MINIO_IP` 只能使用 IP 而非域名, 因为 DolphinScheduler 尚不支持 S3 路径风格访问 (S3 path style access)
### 如何配置 SkyWalking?
修改 `values.yaml` 文件中的 SKYWALKING 配置项
```yaml
common:
configmap:
SKYWALKING_ENABLE: "true"
SW_AGENT_COLLECTOR_BACKEND_SERVICES: "127.0.0.1:11800"
SW_GRPC_LOG_SERVER_HOST: "127.0.0.1"
SW_GRPC_LOG_SERVER_PORT: "11800"
```
## 附录-配置
| Parameter | Description | Default |
| --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------- |
| `timezone` | World time and date for cities in all time zones | `Asia/Shanghai` |
| | | |
| `image.repository` | Docker image repository for the DolphinScheduler | `apache/dolphinscheduler` |
| `image.tag` | Docker image version for the DolphinScheduler | `latest` |
| `image.pullPolicy` | Image pull policy. One of Always, Never, IfNotPresent | `IfNotPresent` |
| `image.pullSecret` | Image pull secret. An optional reference to secret in the same namespace to use for pulling any of the images | `nil` |
| | | |
| `postgresql.enabled` | If not exists external PostgreSQL, by default, the DolphinScheduler will use a internal PostgreSQL | `true` |
| `postgresql.postgresqlUsername` | The username for internal PostgreSQL | `root` |
| `postgresql.postgresqlPassword` | The password for internal PostgreSQL | `root` |
| `postgresql.postgresqlDatabase` | The database for internal PostgreSQL | `dolphinscheduler` |
| `postgresql.persistence.enabled` | Set `postgresql.persistence.enabled` to `true` to mount a new volume for internal PostgreSQL | `false` |
| `postgresql.persistence.size` | `PersistentVolumeClaim` size | `20Gi` |
| `postgresql.persistence.storageClass` | PostgreSQL data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `externalDatabase.type` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database type will use it | `postgresql` |
| `externalDatabase.driver` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database driver will use it | `org.postgresql.Driver` |
| `externalDatabase.host` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database host will use it | `localhost` |
| `externalDatabase.port` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database port will use it | `5432` |
| `externalDatabase.username` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database username will use it | `root` |
| `externalDatabase.password` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database password will use it | `root` |
| `externalDatabase.database` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database database will use it | `dolphinscheduler` |
| `externalDatabase.params` | If exists external PostgreSQL, and set `postgresql.enabled` value to false. DolphinScheduler's database params will use it | `characterEncoding=utf8` |
| | | |
| `zookeeper.enabled` | If not exists external Zookeeper, by default, the DolphinScheduler will use a internal Zookeeper | `true` |
| `zookeeper.fourlwCommandsWhitelist` | A list of comma separated Four Letter Words commands to use | `srvr,ruok,wchs,cons` |
| `zookeeper.persistence.enabled` | Set `zookeeper.persistence.enabled` to `true` to mount a new volume for internal Zookeeper | `false` |
| `zookeeper.persistence.size` | `PersistentVolumeClaim` size | `20Gi` |
| `zookeeper.persistence.storageClass` | Zookeeper data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `zookeeper.zookeeperRoot` | Specify dolphinscheduler root directory in Zookeeper | `/dolphinscheduler` |
| `externalZookeeper.zookeeperQuorum` | If exists external Zookeeper, and set `zookeeper.enabled` value to false. Specify Zookeeper quorum | `127.0.0.1:2181` |
| `externalZookeeper.zookeeperRoot` | If exists external Zookeeper, and set `zookeeper.enabled` value to false. Specify dolphinscheduler root directory in Zookeeper | `/dolphinscheduler` |
| | | |
| `common.configmap.DOLPHINSCHEDULER_OPTS` | The jvm options for dolphinscheduler, suitable for all servers | `""` |
| `common.configmap.DATA_BASEDIR_PATH` | User data directory path, self configuration, please make sure the directory exists and have read write permissions | `/tmp/dolphinscheduler` |
| `common.configmap.RESOURCE_STORAGE_TYPE` | Resource storage type: HDFS, S3, NONE | `HDFS` |
| `common.configmap.RESOURCE_UPLOAD_PATH` | Resource store on HDFS/S3 path, please make sure the directory exists on hdfs and have read write permissions | `/dolphinscheduler` |
| `common.configmap.FS_DEFAULT_FS` | Resource storage file system like `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler` | `file:///` |
| `common.configmap.FS_S3A_ENDPOINT` | S3 endpoint when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `s3.xxx.amazonaws.com` |
| `common.configmap.FS_S3A_ACCESS_KEY` | S3 access key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.FS_S3A_SECRET_KEY` | S3 secret key when `common.configmap.RESOURCE_STORAGE_TYPE` is set to `S3` | `xxxxxxx` |
| `common.configmap.HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE` | Whether to startup kerberos | `false` |
| `common.configmap.JAVA_SECURITY_KRB5_CONF_PATH` | The java.security.krb5.conf path | `/opt/krb5.conf` |
| `common.configmap.LOGIN_USER_KEYTAB_USERNAME` | The login user from keytab username | `hdfs@HADOOP.COM` |
| `common.configmap.LOGIN_USER_KEYTAB_PATH` | The login user from keytab path | `/opt/hdfs.keytab` |
| `common.configmap.KERBEROS_EXPIRE_TIME` | The kerberos expire time, the unit is hour | `2` |
| `common.configmap.HDFS_ROOT_USER` | The HDFS root user who must have the permission to create directories under the HDFS root path | `hdfs` |
| `common.configmap.RESOURCE_MANAGER_HTTPADDRESS_PORT` | Set resource manager httpaddress port for yarn | `8088` |
| `common.configmap.YARN_RESOURCEMANAGER_HA_RM_IDS` | If resourcemanager HA is enabled, please set the HA IPs | `nil` |
| `common.configmap.YARN_APPLICATION_STATUS_ADDRESS` | If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname, otherwise keep default | `http://ds1:%s/ws/v1/cluster/apps/%s` |
| `common.configmap.SKYWALKING_ENABLE` | Set whether to enable skywalking | `false` |
| `common.configmap.SW_AGENT_COLLECTOR_BACKEND_SERVICES` | Set agent collector backend services for skywalking | `127.0.0.1:11800` |
| `common.configmap.SW_GRPC_LOG_SERVER_HOST` | Set grpc log server host for skywalking | `127.0.0.1` |
| `common.configmap.SW_GRPC_LOG_SERVER_PORT` | Set grpc log server port for skywalking | `11800` |
| `common.configmap.HADOOP_HOME` | Set `HADOOP_HOME` for DolphinScheduler's task environment | `/opt/soft/hadoop` |
| `common.configmap.HADOOP_CONF_DIR` | Set `HADOOP_CONF_DIR` for DolphinScheduler's task environment | `/opt/soft/hadoop/etc/hadoop` |
| `common.configmap.SPARK_HOME1` | Set `SPARK_HOME1` for DolphinScheduler's task environment | `/opt/soft/spark1` |
| `common.configmap.SPARK_HOME2` | Set `SPARK_HOME2` for DolphinScheduler's task environment | `/opt/soft/spark2` |
| `common.configmap.PYTHON_HOME` | Set `PYTHON_HOME` for DolphinScheduler's task environment | `/usr/bin/python` |
| `common.configmap.JAVA_HOME` | Set `JAVA_HOME` for DolphinScheduler's task environment | `/usr/local/openjdk-8` |
| `common.configmap.HIVE_HOME` | Set `HIVE_HOME` for DolphinScheduler's task environment | `/opt/soft/hive` |
| `common.configmap.FLINK_HOME` | Set `FLINK_HOME` for DolphinScheduler's task environment | `/opt/soft/flink` |
| `common.configmap.DATAX_HOME` | Set `DATAX_HOME` for DolphinScheduler's task environment | `/opt/soft/datax` |
| `common.sharedStoragePersistence.enabled` | Set `common.sharedStoragePersistence.enabled` to `true` to mount a shared storage volume for Hadoop, Spark binary and etc | `false` |
| `common.sharedStoragePersistence.mountPath` | The mount path for the shared storage volume | `/opt/soft` |
| `common.sharedStoragePersistence.accessModes` | `PersistentVolumeClaim` access modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
| `common.sharedStoragePersistence.storageClassName` | Shared Storage persistent volume storage class, must support the access mode: ReadWriteMany | `-` |
| `common.sharedStoragePersistence.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `common.fsFileResourcePersistence.enabled` | Set `common.fsFileResourcePersistence.enabled` to `true` to mount a new file resource volume for `api` and `worker` | `false` |
| `common.fsFileResourcePersistence.accessModes` | `PersistentVolumeClaim` access modes, must be `ReadWriteMany` | `[ReadWriteMany]` |
| `common.fsFileResourcePersistence.storageClassName` | Resource persistent volume storage class, must support the access mode: ReadWriteMany | `-` |
| `common.fsFileResourcePersistence.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `master.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| `master.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `master.annotations` | The `annotations` for master server | `{}` |
| `master.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `master.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `master.tolerations` | If specified, the pod's tolerations | `{}` |
| `master.resources` | The `resource` limit and request config for master server | `{}` |
| `master.configmap.MASTER_SERVER_OPTS` | The jvm options for master server | `-Xms1g -Xmx1g -Xmn512m` |
| `master.configmap.MASTER_EXEC_THREADS` | Master execute thread number to limit process instances | `100` |
| `master.configmap.MASTER_EXEC_TASK_NUM` | Master execute task number in parallel per process instance | `20` |
| `master.configmap.MASTER_DISPATCH_TASK_NUM` | Master dispatch task number per batch | `3` |
| `master.configmap.MASTER_HOST_SELECTOR` | Master host selector to select a suitable worker, optional values include Random, RoundRobin, LowerWeight | `LowerWeight` |
| `master.configmap.MASTER_HEARTBEAT_INTERVAL` | Master heartbeat interval, the unit is second | `10` |
| `master.configmap.MASTER_TASK_COMMIT_RETRYTIMES` | Master commit task retry times | `5` |
| `master.configmap.MASTER_TASK_COMMIT_INTERVAL` | master commit task interval, the unit is second | `1` |
| `master.configmap.MASTER_MAX_CPULOAD_AVG` | Master max cpuload avg, only higher than the system cpu load average, master server can schedule | `-1` (`the number of cpu cores * 2`) |
| `master.configmap.MASTER_RESERVED_MEMORY` | Master reserved memory, only lower than system available memory, master server can schedule, the unit is G | `0.3` |
| `master.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `master.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `master.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `master.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `master.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `master.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `master.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `master.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `master.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `master.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `master.persistentVolumeClaim.enabled` | Set `master.persistentVolumeClaim.enabled` to `true` to mount a new volume for `master` | `false` |
| `master.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `master.persistentVolumeClaim.storageClassName` | `Master` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `master.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `worker.podManagementPolicy` | PodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down | `Parallel` |
| `worker.replicas` | Replicas is the desired number of replicas of the given Template | `3` |
| `worker.annotations` | The `annotations` for worker server | `{}` |
| `worker.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `worker.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `worker.tolerations` | If specified, the pod's tolerations | `{}` |
| `worker.resources` | The `resource` limit and request config for worker server | `{}` |
| `worker.configmap.WORKER_SERVER_OPTS` | The jvm options for worker server | `-Xms1g -Xmx1g -Xmn512m` |
| `worker.configmap.WORKER_EXEC_THREADS` | Worker execute thread number to limit task instances | `100` |
| `worker.configmap.WORKER_HEARTBEAT_INTERVAL` | Worker heartbeat interval, the unit is second | `10` |
| `worker.configmap.WORKER_MAX_CPULOAD_AVG` | Worker max CPU load average; tasks can be dispatched to the worker server only when the current system CPU load average is lower than this value | `-1` (`the number of cpu cores * 2`) |
| `worker.configmap.WORKER_RESERVED_MEMORY` | Worker reserved memory; tasks can be dispatched to the worker server only when the system's available memory is higher than this value, the unit is G | `0.3` |
| `worker.configmap.WORKER_GROUPS` | Worker groups | `default` |
| `worker.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `worker.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `worker.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `worker.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `worker.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `worker.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `worker.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `worker.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `worker.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `worker.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `worker.persistentVolumeClaim.enabled` | Set `worker.persistentVolumeClaim.enabled` to `true` to enable `persistentVolumeClaim` for `worker` | `false` |
| `worker.persistentVolumeClaim.dataPersistentVolume.enabled` | Set `worker.persistentVolumeClaim.dataPersistentVolume.enabled` to `true` to mount a data volume for `worker` | `false` |
| `worker.persistentVolumeClaim.dataPersistentVolume.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `worker.persistentVolumeClaim.dataPersistentVolume.storageClassName` | `Worker` data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `worker.persistentVolumeClaim.dataPersistentVolume.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `worker.persistentVolumeClaim.logsPersistentVolume.enabled` | Set `worker.persistentVolumeClaim.logsPersistentVolume.enabled` to `true` to mount a logs volume for `worker` | `false` |
| `worker.persistentVolumeClaim.logsPersistentVolume.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `worker.persistentVolumeClaim.logsPersistentVolume.storageClassName` | `Worker` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `worker.persistentVolumeClaim.logsPersistentVolume.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `alert.replicas` | Replicas is the desired number of replicas of the given Template | `1` |
| `alert.strategy.type` | Type of deployment. Can be "Recreate" or "RollingUpdate" | `RollingUpdate` |
| `alert.strategy.rollingUpdate.maxSurge` | The maximum number of pods that can be scheduled above the desired number of pods | `25%` |
| `alert.strategy.rollingUpdate.maxUnavailable` | The maximum number of pods that can be unavailable during the update | `25%` |
| `alert.annotations` | The `annotations` for alert server | `{}` |
| `alert.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `alert.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `alert.tolerations` | If specified, the pod's tolerations | `{}` |
| `alert.resources` | The `resource` limit and request config for alert server | `{}` |
| `alert.configmap.ALERT_SERVER_OPTS` | The jvm options for alert server | `-Xms512m -Xmx512m -Xmn256m` |
| `alert.configmap.XLS_FILE_PATH` | XLS file path | `/tmp/xls` |
| `alert.configmap.MAIL_SERVER_HOST` | Mail `SERVER HOST` | `nil` |
| `alert.configmap.MAIL_SERVER_PORT` | Mail `SERVER PORT` | `nil` |
| `alert.configmap.MAIL_SENDER` | Mail `SENDER` | `nil` |
| `alert.configmap.MAIL_USER` | Mail `USER` | `nil` |
| `alert.configmap.MAIL_PASSWD` | Mail `PASSWORD` | `nil` |
| `alert.configmap.MAIL_SMTP_STARTTLS_ENABLE` | Mail `SMTP STARTTLS` enable | `false` |
| `alert.configmap.MAIL_SMTP_SSL_ENABLE` | Mail `SMTP SSL` enable | `false` |
| `alert.configmap.MAIL_SMTP_SSL_TRUST` | Mail `SMTP SSL TRUST` | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_ENABLE` | `Enterprise Wechat` enable | `false` |
| `alert.configmap.ENTERPRISE_WECHAT_CORP_ID` | `Enterprise Wechat` corp id | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_SECRET` | `Enterprise Wechat` secret | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_AGENT_ID` | `Enterprise Wechat` agent id | `nil` |
| `alert.configmap.ENTERPRISE_WECHAT_USERS` | `Enterprise Wechat` users | `nil` |
| `alert.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `alert.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `alert.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `alert.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `alert.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `alert.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `alert.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `alert.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `alert.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `alert.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `alert.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `alert.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `alert.persistentVolumeClaim.enabled` | Set `alert.persistentVolumeClaim.enabled` to `true` to mount a new volume for `alert` | `false` |
| `alert.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `alert.persistentVolumeClaim.storageClassName` | `Alert` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `alert.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| | | |
| `api.replicas` | Replicas is the desired number of replicas of the given Template | `1` |
| `api.strategy.type` | Type of deployment. Can be "Recreate" or "RollingUpdate" | `RollingUpdate` |
| `api.strategy.rollingUpdate.maxSurge` | The maximum number of pods that can be scheduled above the desired number of pods | `25%` |
| `api.strategy.rollingUpdate.maxUnavailable` | The maximum number of pods that can be unavailable during the update | `25%` |
| `api.annotations` | The `annotations` for api server | `{}` |
| `api.affinity` | If specified, the pod's scheduling constraints | `{}` |
| `api.nodeSelector` | NodeSelector is a selector which must be true for the pod to fit on a node | `{}` |
| `api.tolerations` | If specified, the pod's tolerations | `{}` |
| `api.resources` | The `resource` limit and request config for api server | `{}` |
| `api.configmap.API_SERVER_OPTS` | The jvm options for api server | `-Xms512m -Xmx512m -Xmn256m` |
| `api.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `api.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `30` |
| `api.livenessProbe.periodSeconds` | How often to perform the probe | `30` |
| `api.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `api.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `api.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `api.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `api.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `30` |
| `api.readinessProbe.periodSeconds` | How often to perform the probe | `30` |
| `api.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `api.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed | `3` |
| `api.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful | `1` |
| `api.persistentVolumeClaim.enabled` | Set `api.persistentVolumeClaim.enabled` to `true` to mount a new volume for `api` | `false` |
| `api.persistentVolumeClaim.accessModes` | `PersistentVolumeClaim` access modes | `[ReadWriteOnce]` |
| `api.persistentVolumeClaim.storageClassName` | `api` logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | `-` |
| `api.persistentVolumeClaim.storage` | `PersistentVolumeClaim` size | `20Gi` |
| `api.service.type` | `type` determines how the Service is exposed. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer | `ClusterIP` |
| `api.service.clusterIP` | `clusterIP` is the IP address of the service and is usually assigned randomly by the master | `nil` |
| `api.service.nodePort` | `nodePort` is the port on each node on which this service is exposed when type=NodePort | `nil` |
| `api.service.externalIPs` | `externalIPs` is a list of IP addresses for which nodes in the cluster will also accept traffic for this service | `[]` |
| `api.service.externalName` | `externalName` is the external reference that kubedns or equivalent will return as a CNAME record for this service | `nil` |
| `api.service.loadBalancerIP` | `loadBalancerIP` when service.type is LoadBalancer. LoadBalancer will get created with the IP specified in this field | `nil` |
| `api.service.annotations` | `annotations` may need to be set when service.type is LoadBalancer | `{}` |
| | | |
| `ingress.enabled` | Enable ingress | `false` |
| `ingress.host` | Ingress host | `dolphinscheduler.org` |
| `ingress.path` | Ingress path | `/dolphinscheduler` |
| `ingress.tls.enabled` | Enable ingress tls | `false` |
| `ingress.tls.secretName` | Ingress tls secret name | `dolphinscheduler-tls` |
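
As a hedged sketch (not an official recipe), any of the values in the table above can be overridden when installing or upgrading the chart. The release name `dolphinscheduler`, the chart path `.`, and the hostname below are placeholders for your environment:

```shell
# Install the chart with a handful of overrides from the table above
helm install dolphinscheduler . \
  --set master.replicas=5 \
  --set worker.replicas=5 \
  --set api.service.type=NodePort \
  --set ingress.enabled=true \
  --set ingress.host=dolphinscheduler.example.com

# The same overrides can also be kept in a values file and applied with:
#   helm install dolphinscheduler . -f values-override.yaml
```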

200
docs/docs/zh/guide/installation/pseudo-cluster.md

@@ -0,0 +1,200 @@
# Pseudo-Cluster Deployment

The purpose of the pseudo-cluster deployment is to deploy the DolphinScheduler services on a single machine; in this mode the master, worker, and api server all run on the same host.

If you are a newcomer who wants to try out DolphinScheduler, the [Standalone](standalone.md) deployment is recommended. If you want to experience more complete features or handle a larger workload, the [pseudo-cluster deployment](pseudo-cluster.md) is recommended. For production use, the [cluster deployment](cluster.md) or [Kubernetes](kubernetes.md) deployment is recommended.

## Prerequisites

Pseudo-cluster deployment of DolphinScheduler requires the following external software:

* JDK: download [JDK][jdk] (1.8+), configure `JAVA_HOME`, and add it to the `PATH` variable. You can skip this step if it already exists in your environment.
* Binary package: download the DolphinScheduler binary package from the [download page](https://dolphinscheduler.apache.org/zh-cn/download/download.html)
* Database: [PostgreSQL](https://www.postgresql.org/download/) (8.2.15+) or [MySQL](https://dev.mysql.com/downloads/mysql/) (5.7+); either one will do. For MySQL, JDBC Driver 8.0.16 is required.
* Registry center: [ZooKeeper](https://zookeeper.apache.org/releases.html) (3.4.6+), [download link][zookeeper]
* Process tree analysis
  * `pstree` on macOS
  * `psmisc` on Fedora/Red Hat/CentOS/Ubuntu/Debian

> **_Note:_** DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if the tasks you run require them, the corresponding environments must be available.
## Prepare the DolphinScheduler Startup Environment

### Configure a Deployment User with Passwordless sudo

Create a deployment user, and be sure to configure passwordless `sudo` for it. Taking the creation of a user named dolphinscheduler as an example:

```shell
# You need to log in as root to create the user
useradd dolphinscheduler

# Set the password
echo "dolphinscheduler" | passwd --stdin dolphinscheduler

# Configure passwordless sudo
sed -i '$adolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# Change directory ownership so the deployment user can operate on the extracted apache-dolphinscheduler-*-bin directory
chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-*-bin
```

> **_Note:_**
>
> * The task execution service runs jobs as different Linux users via `sudo -u {linux-user}` to achieve multi-tenancy, so the deployment user must have passwordless sudo privileges. Beginners who do not understand this can safely ignore it for now.
> * If a line "Defaults requiretty" exists in `/etc/sudoers`, please comment it out as well.
### Configure Passwordless SSH Login Between Machines

Since resources need to be sent to different machines during installation, passwordless SSH login between the machines is required. The steps to configure it are as follows:

```shell
su dolphinscheduler

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

> **_Note:_** After the configuration is complete, you can verify it by running `ssh localhost`; if you can log in via SSH without entering a password, the configuration succeeded.
### Start ZooKeeper

Go to the ZooKeeper installation directory, copy the `zoo_sample.cfg` configuration file to `conf/zoo.cfg`, and change the value of `dataDir` in `conf/zoo.cfg` to `dataDir=./tmp/zookeeper`.
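
A hedged sketch of this copy-and-edit step (run from the ZooKeeper installation directory; the `sed` expression assumes a stock `zoo_sample.cfg`):

```shell
# Copy the sample configuration and point dataDir at a local directory
cp conf/zoo_sample.cfg conf/zoo.cfg
sed -i 's|^dataDir=.*|dataDir=./tmp/zookeeper|' conf/zoo.cfg
```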
```shell
# Start ZooKeeper
./bin/zkServer.sh start
```
<!--
Modify the database configuration and initialize it

```properties
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
# Modify the username and password if they are not dolphinscheduler/dolphinscheduler
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler
```

After modifying and saving, run the table-creation and base-data-import script under the script directory

```shell
sh script/create-dolphinscheduler.sh
```
-->
## Modify the Configuration

After preparing the basic environment, you need to modify the configuration file according to your environment before running the deployment command. The configuration file is located at `conf/config/install_config.conf`. For a typical deployment, you only need to modify the **INSTALL MACHINE, DolphinScheduler ENV, Database, and Registry Server** sections. The parameters that must be modified are described below:

```shell
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# Because the master, worker, and API server are deployed on a single node, the server addresses are all the machine IP or localhost
ips="localhost"
masters="localhost"
workers="localhost:default"
alertServer="localhost"
apiServers="localhost"

# DolphinScheduler installation path; it will be created if it does not exist
installPath="~/dolphinscheduler"

# Deployment user; use the user created in **Configure a Deployment User with Passwordless sudo**
deployUser="dolphinscheduler"

# ---------------------------------------------------------
# DolphinScheduler ENV
# ---------------------------------------------------------
# The path of JAVA_HOME, i.e. the JAVA_HOME of the JDK installed in **Prerequisites**
javaHome="/your/java/home/here"

# ---------------------------------------------------------
# Database
# ---------------------------------------------------------
# Database type, username, password, IP, port, and metadata database name. dbtype currently supports mysql and postgresql
dbtype="mysql"
dbhost="localhost:3306"
# Modify the username and password if they are not dolphinscheduler/dolphinscheduler
username="dolphinscheduler"
password="dolphinscheduler"
dbname="dolphinscheduler"

# ---------------------------------------------------------
# Registry Server
# ---------------------------------------------------------
# Registry center address, i.e. the address of the ZooKeeper service
registryServers="localhost:2181"
```
## Initialize the Database

DolphinScheduler stores its metadata in a relational database; PostgreSQL and MySQL are currently supported. When using MySQL, you need to manually download the [mysql-connector-java driver][mysql] (8.0.16) and move it into DolphinScheduler's lib directory. The following shows how to initialize the database, taking MySQL as an example:

```shell
mysql -uroot -p

mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

# Replace {user} and {password} with the username and password you want
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';

mysql> flush privileges;
```

After completing the steps above, you have created a new database for DolphinScheduler, and you can now initialize it with a quick shell script:

```shell
sh script/create-dolphinscheduler.sh
```
## Start DolphinScheduler

Run the following command as the **deployment user** created above to complete the deployment. The runtime logs will be stored in the logs folder.

```shell
sh install.sh
```

> **_Note:_** On the first deployment, messages like `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear up to 5 times. They are not important and can be safely ignored.

## Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler in your browser to access the system UI. The default username and password are **admin/dolphinscheduler123**.
## Start or Stop Services

```shell
# Stop all cluster services with one command
sh ./bin/stop-all.sh

# Start all cluster services with one command
sh ./bin/start-all.sh

# Start or stop the Master
sh ./bin/dolphinscheduler-daemon.sh stop master-server
sh ./bin/dolphinscheduler-daemon.sh start master-server

# Start or stop the Worker
sh ./bin/dolphinscheduler-daemon.sh start worker-server
sh ./bin/dolphinscheduler-daemon.sh stop worker-server

# Start or stop the Api
sh ./bin/dolphinscheduler-daemon.sh start api-server
sh ./bin/dolphinscheduler-daemon.sh stop api-server

# Start or stop the Alert
sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
```

> **_Note:_** For the purpose of each service, please refer to the "System Architecture Design" section.

[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html
[mysql]: https://downloads.MySQL.com/archives/c-j/
[issue]: https://github.com/apache/dolphinscheduler/issues/6597

74
docs/docs/zh/guide/installation/skywalking-agent.md

@@ -0,0 +1,74 @@
SkyWalking Agent Deployment
=============================

The dolphinscheduler-skywalking module provides a [SkyWalking](https://skywalking.apache.org/) monitoring agent for the DolphinScheduler project.

This document describes how to integrate with SkyWalking 8.4+ (8.5.0 is recommended) through this module.

# Installation

The following configurations are used to enable the SkyWalking agent.

### By Configuring Environment Variables (when deploying with Docker Compose)

Modify the SKYWALKING environment variables in the `docker/docker-swarm/config.env.sh` file:

```
SKYWALKING_ENABLE=true
SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
SW_GRPC_LOG_SERVER_HOST=127.0.0.1
SW_GRPC_LOG_SERVER_PORT=11800
```

And run

```shell
$ docker-compose up -d
```

### By Configuring Environment Variables (when deploying with Docker)

```shell
$ docker run -d --name dolphinscheduler \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e SKYWALKING_ENABLE="true" \
-e SW_AGENT_COLLECTOR_BACKEND_SERVICES="your.skywalking-oap-server.com:11800" \
-e SW_GRPC_LOG_SERVER_HOST="your.skywalking-log-reporter.com" \
-e SW_GRPC_LOG_SERVER_PORT="11800" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 all
```

### By Configuring install_config.conf (when deploying with DolphinScheduler install.sh)

Add the following configuration to `${workDir}/conf/config/install_config.conf`.

```properties
# skywalking config
# note: enable the skywalking tracking plugin
enableSkywalking="true"
# note: configure the skywalking backend service address
skywalkingServers="your.skywalking-oap-server.com:11800"
# note: configure the skywalking log reporter host
skywalkingLogReporterHost="your.skywalking-log-reporter.com"
# note: configure the skywalking log reporter port
skywalkingLogReporterPort="11800"
```

# Usage

### Import Dashboards

#### Import Dashboards into the SkyWalking Server

Copy the `${dolphinscheduler.home}/ext/skywalking-agent/dashboard/dolphinscheduler.yml` file into the `${skywalking-oap-server.home}/config/ui-initialized-templates/` directory, and restart the SkyWalking oap-server.
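
A hedged sketch of that copy step (`DOLPHINSCHEDULER_HOME` and `SKYWALKING_OAP_HOME` are placeholder variables for your DolphinScheduler and SkyWalking OAP installation directories):

```shell
# Copy the bundled dashboard template into the OAP server's UI template directory, then restart oap-server
cp "$DOLPHINSCHEDULER_HOME/ext/skywalking-agent/dashboard/dolphinscheduler.yml" \
   "$SKYWALKING_OAP_HOME/config/ui-initialized-templates/"
```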
#### View the DolphinScheduler Dashboards

If you have opened SkyWalking in the browser before, you need to clear the browser cache.

![img1](/img/skywalking/import-dashboard-1.jpg)

42
docs/docs/zh/guide/installation/standalone.md

@@ -0,0 +1,42 @@
# Standalone Quick Experience

Standalone is only suitable for quickly trying out DolphinScheduler.

If you are a newcomer who wants to try out DolphinScheduler, the [Standalone](standalone.md) deployment is recommended. If you want to experience more complete features or handle a larger workload, the [pseudo-cluster deployment](pseudo-cluster.md) is recommended. For production use, the [cluster deployment](cluster.md) or [Kubernetes](kubernetes.md) deployment is recommended.

> **_Note:_** Standalone is recommended only for fewer than 20 workflows, because it uses an H2 database and a ZooKeeper testing server, and too many tasks may cause instability.

## Prerequisites

* JDK: download [JDK][jdk] (1.8+), configure `JAVA_HOME`, and add it to the `PATH` variable. You can skip this step if it already exists in your environment.
* Binary package: download the DolphinScheduler binary package from the [download page](https://dolphinscheduler.apache.org/zh-cn/download/download.html)

## Start the DolphinScheduler Standalone Server

### Extract and Start DolphinScheduler

The binary package contains a startup script for standalone mode, so it can be started quickly after extraction. Switch to a user with sudo privileges and run the script:

```shell
# Extract and run the Standalone Server
tar -xvzf apache-dolphinscheduler-*-bin.tar.gz
cd apache-dolphinscheduler-*-bin
sh ./bin/dolphinscheduler-daemon.sh start standalone-server
```

### Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler in your browser to access the system UI. The default username and password are **admin/dolphinscheduler123**.

## Start or Stop Services

Besides quickly starting standalone mode, the `./bin/dolphinscheduler-daemon.sh` script can also stop the service. The full commands are:

```shell
# Start the Standalone Server
sh ./bin/dolphinscheduler-daemon.sh start standalone-server
# Stop the Standalone Server
sh ./bin/dolphinscheduler-daemon.sh stop standalone-server
```

[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html

32
docs/docs/zh/guide/monitor.md

@@ -0,0 +1,32 @@
# Monitor Center

## Service Management

- Service management mainly monitors and displays the health status and basic information of each service in the system.

### Master Monitoring

- Mainly master-related information.

![master](/img/new_ui/dev/monitor/master.png)

### Worker Monitoring

- Mainly worker-related information.

![worker](/img/new_ui/dev/monitor/worker.png)

### DB Monitoring

- Mainly the health status of the DB.

![db](/img/new_ui/dev/monitor/db.png)

## Statistics Management

![statistics](/img/new_ui/dev/monitor/statistics.png)

- Number of commands waiting to be executed: statistics from the t_ds_command table
- Number of commands that failed to execute: statistics from the t_ds_error_command table
- Number of tasks waiting to run: statistics of the task_queue data in ZooKeeper
- Number of tasks waiting to be killed: statistics of the task_kill data in ZooKeeper
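
As an illustrative sketch (the connection parameters are placeholders; the table names come from the list above), the first two statistics correspond roughly to simple row counts in the metadata database:

```shell
# Hypothetical manual check against the metadata database; adjust user, host and database to your deployment.
# The -p flag prompts for the password interactively.
mysql -u dolphinscheduler -p dolphinscheduler \
  -e "SELECT COUNT(*) AS commands_to_run FROM t_ds_command;"
mysql -u dolphinscheduler -p dolphinscheduler \
  -e "SELECT COUNT(*) AS failed_commands FROM t_ds_error_command;"
```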

65
docs/docs/zh/guide/open-api.md

@@ -0,0 +1,65 @@
# API Calls

## Background

Projects, workflows, and so on are usually created through the web UI, but integrating with third-party systems requires calling the API to manage projects and workflows.

## Steps

### Create a Token

1. Log in to the scheduling system, click "Security", then click "Token Manage" on the left, and click the create token button to create a token.

<p align="center">
  <img src="/img/token-management.png" width="80%" />
</p>

2. Select an "Expiration time" (the validity period of the token), select a "User" (the user on whose behalf the API operations will be performed), click "Generate token", copy the token string, and then click "Submit".

<p align="center">
  <img src="/img/create-token.png" width="80%" />
</p>

### Use the Token

1. Open the API documentation page

> Address: http://{api server ip}:12345/dolphinscheduler/doc.html?language=zh_CN&lang=cn

<p align="center">
  <img src="/img/api-documentation.png" width="80%" />
</p>

2. Pick an API for testing; the API chosen for this test is: query all projects

> projects/query-project-list

3. Open Postman, fill in the API address, put the token in the Headers, and send the request to view the result

```
token: the token just generated
```

<p align="center">
  <img src="/img/test-api.png" width="80%" />
</p>
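
As a hedged sketch of the same call from the command line (the host, port path prefix, and token value are placeholders; check the API documentation page above for the exact path and parameters of your version):

```shell
# Hypothetical example: call the "query all projects" API with the token passed as a request header
curl -H "token: <your-token-string>" \
  "http://<api-server-ip>:12345/dolphinscheduler/projects/query-project-list"
```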
### Create a Project

Here is an example of creating a project named "wudl-flink-test":

<p align="center">
  <img src="/img/api/create_project1.png" width="80%" />
</p>

<p align="center">
  <img src="/img/api/create_project2.png" width="80%" />
</p>

<p align="center">
  <img src="/img/api/create_project3.png" width="80%" />
</p>

The returned msg is "success", indicating that we have successfully created the project through the API.

If you are interested in the source code for creating a project, feel free to continue reading below.

### Appendix: Source Code for Creating a Project

<p align="center">
  <img src="/img/api/create_source1.png" width="80%" />
</p>

<p align="center">
  <img src="/img/api/create_source2.png" width="80%" />
</p>

49
docs/docs/zh/guide/parameter/built-in.md

@@ -0,0 +1,49 @@
# Built-in Parameters

## Basic Built-in Parameters

<table>
    <tr><th>Variable</th><th>Declaration Method</th><th>Meaning</th></tr>
    <tr>
        <td>system.biz.date</td>
        <td>${system.biz.date}</td>
        <td>The day before the scheduled time of the routine scheduling instance, in the format yyyyMMdd</td>
    </tr>
    <tr>
        <td>system.biz.curdate</td>
        <td>${system.biz.curdate}</td>
        <td>The scheduled time of the routine scheduling instance, in the format yyyyMMdd</td>
    </tr>
    <tr>
        <td>system.datetime</td>
        <td>${system.datetime}</td>
        <td>The scheduled time of the routine scheduling instance, in the format yyyyMMddHHmmss</td>
    </tr>
</table>

## Derived Built-in Parameters

- Custom variable names are supported in code, declared as ${variable name}; they can refer to the "system parameters" above.
- We define these base variables in the \$[...] format. \$[yyyyMMddHHmmss] can be decomposed and combined arbitrarily, for example \$[yyyyMMdd], \$[HHmmss], \$[yyyy-MM-dd], etc.
- The following two methods can also be used:

  1. Use the add_months() function, which adds or subtracts months.
     The first argument is [yyyyMMdd], the format of the returned time;
     the second argument is the month offset, i.e. how many months to add or subtract.
      * N years later: $[add_months(yyyyMMdd,12*N)]
      * N years earlier: $[add_months(yyyyMMdd,-12*N)]
      * N months later: $[add_months(yyyyMMdd,N)]
      * N months earlier: $[add_months(yyyyMMdd,-N)]

  2. Add or subtract numbers directly.
     Append "+/-" and a number directly after the custom format.
      * N weeks later: $[yyyyMMdd+7*N]
      * N weeks earlier: $[yyyyMMdd-7*N]
      * N days later: $[yyyyMMdd+N]
      * N days earlier: $[yyyyMMdd-N]
      * N hours later: $[HHmmss+N/24]
      * N hours earlier: $[HHmmss-N/24]
      * N minutes later: $[HHmmss+N/24/60]
      * N minutes earlier: $[HHmmss-N/24/60]
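
As a minimal sketch (the parameter name `dt` is hypothetical and is assumed to be declared as a local parameter of the task with the value `$[yyyyMMdd-1]`), a shell task could consume a derived parameter like this:

```shell
# Hypothetical shell task body: "dt" is a local parameter declared on the task
# with the value $[yyyyMMdd-1]; the scheduler substitutes it before execution
echo "processing partition ${dt}"
```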

69
docs/docs/zh/guide/parameter/context.md

@@ -0,0 +1,69 @@
# Parameter Context

DolphinScheduler allows parameters to reference each other, including local parameters referencing global parameters and passing parameters from upstream to downstream tasks. Because of these references, there is a question of parameter priority when parameter names collide; see [Parameter Priority](priority.md).

## Local Tasks Referencing Global Parameters

The premise of referencing a global parameter from a local task is that you have already defined a [global parameter](global.md). The usage is similar to that of [local parameters](local.md), but the value of the parameter needs to be configured as the key of the global parameter.

![parameter-call-global-in-local](/img/global_parameter.png)

As in the figure above, `${biz_date}` and `${curdate}` are examples of local parameters referencing global parameters. Look at the last row of the figure: local_param_bizdate references the global parameter via \${global_bizdate}; in a shell script you can then reference the value of the global variable global_bizdate via \${local_param_bizdate}, or set the value of local_param_bizdate directly through JDBC. Similarly, local_param references the global parameter defined in the previous section via ${local_param}. biz_date, biz_curdate, and system.datetime are user-defined parameters that are assigned values via ${global parameter}.

## Passing Parameters from Upstream to Downstream Tasks

DolphinScheduler allows parameter passing between tasks; currently, passing is only supported in one direction, from upstream to downstream. The task types that currently support this feature are:

* [Shell](../task/shell.md)
* [SQL](../task/sql.md)
* [Procedure](../task/stored-procedure.md)

When defining an upstream node, if its result needs to be passed to a dependent downstream node, you need to set a variable whose direction is OUT in the [Custom Parameters] of [Current Node Settings]. At present, downstream parameter passing is mainly implemented for SQL and SHELL nodes.

### SQL

prop is specified by the user; the direction must be OUT, and only when the direction is OUT is it defined as a variable output; the data type can be chosen as needed; the value part does not need to be filled in.

If the result of the SQL node is a single row with one or more fields, the name of prop must match the field name. The data type can be any type except LIST. The variable takes the value of the column in the SQL query result whose column name matches the variable name.

If the result of the SQL node is multiple rows with one or more fields, the name of prop must match the field name and the data type must be LIST. After the SQL query result is obtained, the corresponding column is converted to LIST<VARCHAR>, which is then converted to JSON and used as the value of the variable.

Let's take the workflow containing the SQL node in the figure above as an example:

The node [createParam1] in the figure above is defined as follows:

<img src="/img/globalParam/image-20210723104957031.png" alt="image-20210723104957031" style="zoom:50%;" />

The node [createParam2] is defined as follows:

<img src="/img/globalParam/image-20210723105026924.png" alt="image-20210723105026924" style="zoom:50%;" />

You can find the corresponding node instance on the [Workflow Instance] page to view the value of the variable.

The node instance [createParam1] is shown below:

<img src="/img/globalParam/image-20210723105131381.png" alt="image-20210723105131381" style="zoom:50%;" />

Here, the value of "id" is, of course, 12.

Now let's look at the node instance [createParam2].

<img src="/img/globalParam/image-20210723105255850.png" alt="image-20210723105255850" style="zoom:50%;" />

There is only the value of "id". Although the user-defined SQL queries two fields, "id" and "database_name", only one OUT variable, "id", is defined, so only one variable is set. For display reasons, the length of the list, which is 10, has already been looked up for you.

### SHELL

prop is specified by the user; the direction must be OUT, and only when the direction is OUT is it defined as a variable output; the data type can be chosen as needed; the value part does not need to be filled in.

To pass a parameter, the shell script needs to output a statement in the format ${setValue(key=value)}, where key is the prop of the corresponding parameter and value is the value of that parameter.

For example, in the figure below:

<img src="/img/globalParam/image-20210723101242216.png" alt="image-20210723101242216" style="zoom:50%;" />

When the log of the shell node detects the ${setValue(key=value1)} format, value1 is assigned to key, and downstream nodes can use the value of the variable key directly. Likewise, you can find the corresponding node instance on the [Workflow Instance] page to view the value of the variable.

<img src="/img/globalParam/image-20210723102522383.png" alt="image-20210723102522383" style="zoom:50%;" />

19
docs/docs/zh/guide/parameter/global.md

@@ -0,0 +1,19 @@
# Global Parameters

## Scope

Parameters configured on the workflow definition page apply to all tasks in that workflow.

## Usage

Global parameters are configured as follows: on the workflow definition page, click the plus sign to the right of "Set global", fill in the variable name and its value, and save.

<p align="center">
  <img src="/img/supplement_global_parameter.png" width="80%" />
</p>

<p align="center">
  <img src="/img/local_parameter.png" width="80%" />
</p>

The global_bizdate parameter defined here can be referenced by the local parameters of any other task, and the value of global_bizdate is obtained by referencing the system parameter system.biz.date.
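
As a minimal sketch (assuming a task defines a local parameter named `local_bizdate` whose value is set to `${global_bizdate}`, as described in the parameter context documentation), a shell task could then consume the global value like this:

```shell
# Hypothetical shell task body; local_bizdate is a local parameter whose
# value references the global parameter global_bizdate
echo "running for business date ${local_bizdate}"
```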
