
[Doc][Style] Fix doc format once for all (#12006)

Author: labbomb
Committed by: Eric Gao, 2 years ago (via GitHub)
Commit: a44817fc46
Changed files (number of changed lines in parentheses):

1. .github/PULL_REQUEST_TEMPLATE.md (2)
2. CONTRIBUTING.md (2)
3. README.md (15)
4. README_zh_CN.md (3)
5. deploy/README.md (1)
6. docs/docs/en/DSIP.md (1)
7. docs/docs/en/about/features.md (1)
8. docs/docs/en/about/glossary.md (1)
9. docs/docs/en/about/hardware.md (6)
10. docs/docs/en/architecture/configuration.md (13)
11. docs/docs/en/architecture/design.md (7)
12. docs/docs/en/architecture/load-balance.md (1)
13. docs/docs/en/architecture/metadata.md (5)
14. docs/docs/en/architecture/task-structure.md (268)
15. docs/docs/en/contribute/api-standard.md (23)
16. docs/docs/en/contribute/api-test.md (2)
17. docs/docs/en/contribute/architecture-design.md (26)
18. docs/docs/en/contribute/backend/mechanism/global-parameter.md (1)
19. docs/docs/en/contribute/backend/mechanism/overview.md (2)
20. docs/docs/en/contribute/backend/mechanism/task/switch.md (1)
21. docs/docs/en/contribute/backend/spi/alert.md (7)
22. docs/docs/en/contribute/backend/spi/registry.md (1)
23. docs/docs/en/contribute/frontend-development.md (69)
24. docs/docs/en/contribute/have-questions.md (3)
25. docs/docs/en/contribute/join/DS-License.md (2)
26. docs/docs/en/contribute/join/code-conduct.md (3)
27. docs/docs/en/contribute/join/issue.md (1)
28. docs/docs/en/contribute/join/pull-request.md (2)
29. docs/docs/en/contribute/join/review.md (15)
30. docs/docs/en/contribute/join/submit-code.md (7)
31. docs/docs/en/contribute/join/subscribe.md (1)
32. docs/docs/en/contribute/join/unit-test.md (3)
33. docs/docs/en/contribute/log-specification.md (1)
34. docs/docs/en/contribute/release/release-prepare.md (1)
35. docs/docs/en/contribute/release/release.md (4)
36. docs/docs/en/guide/alert/dingtalk.md (3)
37. docs/docs/en/guide/alert/email.md (1)
38. docs/docs/en/guide/alert/enterprise-webexteams.md (3)
39. docs/docs/en/guide/alert/enterprise-wechat.md (2)
40. docs/docs/en/guide/alert/feishu.md (1)
41. docs/docs/en/guide/alert/http.md (2)
42. docs/docs/en/guide/alert/script.md (2)
43. docs/docs/en/guide/alert/telegram.md (3)
44. docs/docs/en/guide/data-quality.md (44)
45. docs/docs/en/guide/datasource/athena.md (3)
46. docs/docs/en/guide/datasource/clickhouse.md (2)
47. docs/docs/en/guide/datasource/db2.md (2)
48. docs/docs/en/guide/datasource/hive.md (2)
49. docs/docs/en/guide/datasource/mysql.md (2)
50. docs/docs/en/guide/datasource/oracle.md (2)
51. docs/docs/en/guide/datasource/postgresql.md (2)
52. docs/docs/en/guide/datasource/presto.md (3)
53. docs/docs/en/guide/datasource/redshift.md (2)
54. docs/docs/en/guide/datasource/spark.md (2)
55. docs/docs/en/guide/datasource/sqlserver.md (2)
56. docs/docs/en/guide/expansion-reduction.md (12)
57. docs/docs/en/guide/healthcheck.md (1)
58. docs/docs/en/guide/howto/datasource-setting.md (5)
59. docs/docs/en/guide/installation/pseudo-cluster.md (5)
60. docs/docs/en/guide/integration/rainbond.md (5)
61. docs/docs/en/guide/metrics/metrics.md (2)
62. docs/docs/en/guide/monitor.md (2)
63. docs/docs/en/guide/parameter/built-in.md (4)
64. docs/docs/en/guide/project/project-list.md (2)
65. docs/docs/en/guide/project/task-definition.md (2)
66. docs/docs/en/guide/project/task-instance.md (2)
67. docs/docs/en/guide/project/workflow-definition.md (6)
68. docs/docs/en/guide/project/workflow-instance.md (8)
69. docs/docs/en/guide/resource/file-manage.md (1)
70. docs/docs/en/guide/security.md (7)
71. docs/docs/en/guide/start/docker.md (1)
72. docs/docs/en/guide/task/java.md (1)
73. docs/docs/en/guide/upgrade/incompatible.md (1)
74. docs/docs/en/guide/upgrade/upgrade.md (4)
75. docs/docs/en/history-versions.md (1)
76. docs/docs/zh/DSIP.md (1)
77. docs/docs/zh/about/features.md (1)
78. docs/docs/zh/about/glossary.md (1)
79. docs/docs/zh/about/hardware.md (10)
80. docs/docs/zh/architecture/configuration.md (17)
81. docs/docs/zh/architecture/design.md (19)
82. docs/docs/zh/architecture/load-balance.md (2)
83. docs/docs/zh/architecture/metadata.md (6)
84. docs/docs/zh/architecture/task-structure.md (311)
85. docs/docs/zh/contribute/api-standard.md (24)
86. docs/docs/zh/contribute/api-test.md (4)
87. docs/docs/zh/contribute/architecture-design.md (37)
88. docs/docs/zh/contribute/backend/mechanism/overview.md (2)
89. docs/docs/zh/contribute/backend/mechanism/task/switch.md (1)
90. docs/docs/zh/contribute/backend/spi/alert.md (9)
91. docs/docs/zh/contribute/backend/spi/registry.md (2)
92. docs/docs/zh/contribute/e2e-test.md (1)
93. docs/docs/zh/contribute/frontend-development.md (71)
94. docs/docs/zh/contribute/have-questions.md (1)
95. docs/docs/zh/contribute/join/DS-License.md (10)
96. docs/docs/zh/contribute/join/code-conduct.md (3)
97. docs/docs/zh/contribute/join/commit-message.md (5)
98. docs/docs/zh/contribute/join/contribute.md (1)
99. docs/docs/zh/contribute/join/issue.md (4)
100. docs/docs/zh/contribute/join/microbench.md (6)

Some files were not shown because too many files have changed in this diff.

2
.github/PULL_REQUEST_TEMPLATE.md

@ -1,6 +1,5 @@
<!--Thanks very much for contributing to Apache DolphinScheduler. Please review https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html before opening a pull request.-->
## Purpose of the pull request
<!--(For example: This pull request adds checkstyle plugin).-->
@ -10,6 +9,7 @@
<!--*(for example:)*
- *Add maven-checkstyle-plugin to root pom.xml*
-->
## Verify this pull request
<!--*(Please pick either of the following options)*-->

2
CONTRIBUTING.md

@ -40,7 +40,6 @@ There will be two repositories at this time: origin (your own warehouse) and ups
Get/update remote repository code (already the latest code, skip it).
```sh
git fetch upstream
```
@ -91,7 +90,6 @@ After submitting changes to your remote repository, you should click on the new
<img src = "http://geek.analysys.cn/static/upload/221/2019-04-02/90f3abbf-70ef-4334-b8d6-9014c9cf4c7f.png" width ="60%"/>
</p>
Select the modified local branch and the branch to merge past to create a pull request.
<p align = "center">

15
README.md

@ -1,6 +1,6 @@
Dolphin Scheduler Official Website
[dolphinscheduler.apache.org](https://dolphinscheduler.apache.org)
============
==================================================================
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev)
@ -8,9 +8,6 @@ Dolphin Scheduler Official Website
[![Twitter Follow](https://img.shields.io/twitter/follow/dolphinschedule.svg?style=social&label=Follow)](https://twitter.com/dolphinschedule)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/dolphinscheduler-slack)
[![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler)
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)
@ -45,11 +42,11 @@ scale of the cluster
## What's in DolphinScheduler
Stability | Accessibility | Features | Scalability |
--------- | ------------- | -------- | ------------|
Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance.  |  Support pause, recover operation | Support customized task types
support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment.
Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
| Stability | Accessibility | Features | Scalability |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance.  |  Support pause, recover operation | Support customized task types |
| support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment. |
| Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
## User Interface Screenshots

3
README_zh_CN.md

@ -1,12 +1,11 @@
Dolphin Scheduler Official Website
[dolphinscheduler.apache.org](https://dolphinscheduler.apache.org)
============
==================================================================
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=apache-dolphinscheduler&metric=alert_status)](https://sonarcloud.io/dashboard?id=apache-dolphinscheduler)
[![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md)

1
deploy/README.md

@ -2,3 +2,4 @@
* [Start Up DolphinScheduler with Docker](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/start/docker.html)
* [Start Up DolphinScheduler with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/kubernetes.html)

1
docs/docs/en/DSIP.md

@ -89,3 +89,4 @@ closed and transfer from [current DSIPs][current-DSIPs] to [past DSIPs][past-DSI
[github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose
[mail-to-dev]: mailto:dev@dolphinscheduler.apache.org
[DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407

1
docs/docs/en/about/features.md

@ -17,3 +17,4 @@
## High Scalability
- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported.

1
docs/docs/en/about/glossary.md

@ -71,4 +71,3 @@ process fails and ends
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation
ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued

6
docs/docs/en/about/hardware.md

@ -7,7 +7,7 @@ This section briefs about the hardware requirements for DolphinScheduler. Dolphi
The Linux operating systems specified below can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
| Operating System | Version |
| :----------------------- | :----------: |
|:-------------------------|:---------------:|
| Red Hat Enterprise Linux | 7.0 and above |
| CentOS | 7.0 and above |
| Oracle Enterprise Linux | 7.0 and above |
@ -23,7 +23,7 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu
### Production Environment
| **CPU** | **MEM** | **HD** | **NIC** | **Num** |
| --- | --- | --- | --- | --- |
|---------|---------|--------|---------|---------|
| 4 core+ | 8 GB+ | SAS | GbE | 1+ |
> **Note:**
@ -35,7 +35,7 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu
DolphinScheduler provides the following network port configurations for normal operation:
| Server | Port | Desc |
| --- | --- | --- |
|----------------------|-------|----------------------------------------------------------------------|
| MasterServer | 5678 | not the communication port; just ensure these local ports do not conflict |
| WorkerServer | 1234 | not the communication port; just ensure these local ports do not conflict |
| ApiApplicationServer | 12345 | backend communication port |

13
docs/docs/en/architecture/configuration.md

@ -101,8 +101,6 @@ The directory structure of DolphinScheduler is as follows:
## Configurations in Details
### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown.
@ -110,6 +108,7 @@ Essentially, start-all.sh or stop-all.sh startup and shutdown the cluster via do
Currently, DolphinScheduler only provides a basic config; remember to configure further JVM options based on your actual resource situation.
Default simplified parameters are:
```bash
export DOLPHINSCHEDULER_OPTS="
-server
@ -157,8 +156,8 @@ The default configuration is as follows:
Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
### Zookeeper related configuration
DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
|Service| Configuration file |
|--|--|
@ -226,8 +225,8 @@ The default configuration is as follows:
|alert.rpc.port | 50052 | the RPC port of Alert Server|
|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
### Api-server related configuration
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
@ -257,6 +256,7 @@ Location: `api-server/conf/application.yaml`
|traffic.control.customize-tenant-qps-rate||customize tenant max request number per second|
### Master Server related configuration
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
@ -278,8 +278,8 @@ Location: `master-server/conf/application.yaml`
|master.registry-disconnect-strategy.strategy|stop|Used when the master disconnect from registry, default value: stop. Optional values include stop, waiting|
|master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnects from the registry and the disconnect strategy is waiting. The master will wait to reconnect to the registry within the given time; if it still cannot connect to the registry after that time, it will stop itself. If the value is 0s, the Master will wait infinitely|
### Worker Server related configuration
Location: `worker-server/conf/application.yaml`
|Parameters | Default value| Description|
@ -298,6 +298,7 @@ Location: `worker-server/conf/application.yaml`
|worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnects from the registry and the disconnect strategy is waiting. The worker will wait to reconnect to the registry within the given time; if it still cannot connect to the registry after that time, it will stop itself. If the value is 0s, the worker will wait infinitely |
### Alert Server related configuration
Location: `alert-server/conf/application.yaml`
|Parameters | Default value| Description|
@ -305,7 +306,6 @@ Location: `alert-server/conf/application.yaml`
|server.port|50053|the port of Alert Server|
|alert.port|50052|the port of alert|
### Quartz related configuration
This part describes the Quartz configs; configure them based on your actual situation and resources.
@ -335,7 +335,6 @@ The default configuration is as follows:
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
### dolphinscheduler_env.sh [load environment variables configs]
When using shell to commit tasks, DolphinScheduler will export environment variables from `bin/env/dolphinscheduler_env.sh`. The

7
docs/docs/en/architecture/design.md

@ -84,6 +84,7 @@
##### Centralized Thinking
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave character" width="50%" />
</p>
@ -120,8 +121,6 @@ The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and
</p>
Among them, the Master monitors the directories of other Masters and Workers. If the remove event is triggered, perform fault tolerance of the process instance or task instance according to the specific business logic.
- Master fault tolerance:
<p align="center">
@ -172,13 +171,14 @@ In the early schedule design, if there is no priority design and use the fair sc
- Tasks are submitted for processing from highest to lowest priority: **the priority of different process instances** takes precedence over **the priority of the same process instance**, which takes precedence over **the priority of tasks within the same process**, which takes precedence over **the submission order of tasks within the same process**.
- The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtaining from the task queue, we can get the highest-priority task by comparing the strings (see the sketch below).
- The priority of the process definition takes into account that some processes need to be processed before other processes. It can be configured when the process starts or is scheduled to start. There are 5 levels in total: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744784-eb351b14-c94a-4ed6-8ba4-5132c2a3d116.png" alt="Process priority configuration" width="40%" />
</p>
- The priority of the task is also divided into 5 levels, in order: HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/10797147/146744830-5eac611f-5933-4f53-a0c6-31613c283708.png" alt="Task priority configuration" width="35%" />
</p>
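For illustration only, here is a minimal sketch (not the actual DolphinScheduler code; class and field names are assumptions) of how such a composite priority key can be built so that a plain string comparison orders tasks correctly:

```java
// Hypothetical sketch: compose "process instance priority_process instance id_task priority_task id"
// with zero-padded fields so that lexicographic order equals priority order.
public class TaskPriorityKey implements Comparable<TaskPriorityKey> {

    private final int processInstancePriority; // 0 = HIGHEST ... 4 = LOWEST
    private final long processInstanceId;
    private final int taskPriority;
    private final long taskId;

    public TaskPriorityKey(int processInstancePriority, long processInstanceId,
                           int taskPriority, long taskId) {
        this.processInstancePriority = processInstancePriority;
        this.processInstanceId = processInstanceId;
        this.taskPriority = taskPriority;
        this.taskId = taskId;
    }

    /** Zero-padded so that comparing the strings compares the priorities. */
    public String asQueueKey() {
        return String.format("%02d_%019d_%02d_%019d",
                processInstancePriority, processInstanceId, taskPriority, taskId);
    }

    @Override
    public int compareTo(TaskPriorityKey other) {
        return this.asQueueKey().compareTo(other.asQueueKey());
    }
}
```

With keys of this shape, the entry that sorts first in the queue is always the highest-priority task.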
@ -188,7 +188,6 @@ In the early schedule design, if there is no priority design and use the fair sc
- Since Web (UI) and Worker are not always on the same machine, viewing the logs cannot be done like querying a local file. There are two options:
- Put logs on the ES search engine.
- Obtain remote log information through netty communication.
- To keep DolphinScheduler as lightweight as possible, gRPC was chosen to achieve remote access to log information.
<p align="center">

1
docs/docs/en/architecture/load-balance.md

@ -57,3 +57,4 @@ You can customise the configuration by changing the following properties in work
- worker.max.cpuload.avg=-1 (worker max cpuload avg; the worker server can be dispatched tasks only when this value is higher than the system CPU load average. Default value -1: the number of CPU cores * 2)
- worker.reserved.memory=0.3 (worker reserved memory; the worker server can be dispatched tasks only when the system available memory is higher than this value. Default value 0.3, the unit is G)
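For illustration, a hedged sketch of the dispatch check these two properties describe; the class and method names are assumptions, not the actual worker code:

```java
// Hypothetical sketch: a worker only accepts new tasks when the current CPU load
// average is below worker.max.cpuload.avg and the available memory (in GB) is
// above worker.reserved.memory.
public class WorkerCapacityCheck {

    public static boolean canDispatch(double currentLoadAvg, double availableMemoryGb,
                                      double maxCpuLoadAvg, double reservedMemoryGb) {
        if (maxCpuLoadAvg < 0) {
            // default -1 means: number of CPU cores * 2
            maxCpuLoadAvg = Runtime.getRuntime().availableProcessors() * 2;
        }
        return currentLoadAvg < maxCpuLoadAvg && availableMemoryGb > reservedMemoryGb;
    }
}
```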

5
docs/docs/en/architecture/metadata.md

@ -1,8 +1,8 @@
# MetaData
## Table Schema
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
---
@ -26,6 +26,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- The `user_id` in the `t_ds_udfs` table represents the user who create the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
### Project - Tenant - ProcessDefinition - Schedule
![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
- A project can have multiple process definitions, and each process definition belongs to only one project.
@ -33,8 +34,10 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- A workflow definition can have one or more schedules.
### Process Definition Execution
![image.png](../../../img/metadata-erd/process_definition.png)
- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0.
- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.

268
docs/docs/en/architecture/task-structure.md

@ -6,28 +6,28 @@ All tasks in DolphinScheduler are saved in the `t_ds_process_definition` table.
The following shows the `t_ds_process_definition` table structure:
No. | field | type | description
-------- | ---------| -------- | ---------
1|id|int(11)|primary key
2|name|varchar(255)|process definition name
3|version|int(11)|process definition version
4|release_state|tinyint(4)|release status of process definition: 0 not released, 1 released
5|project_id|int(11)|project id
6|user_id|int(11)|user id of the process definition
7|process_definition_json|longtext|process definition JSON
8|description|text|process definition description
9|global_params|text|global parameters
10|flag|tinyint(4)|specify whether the process is available: 0 is not available, 1 is available
11|locations|text|node location information
12|connects|text|node connectivity info
13|receivers|text|receivers
14|receivers_cc|text|CC receivers
15|create_time|datetime|create time
16|timeout|int(11) |timeout
17|tenant_id|int(11) |tenant id
18|update_time|datetime|update time
19|modify_by|varchar(36)|specify the user that made the modification
20|resource_ids|varchar(255)|resource ids
| No. | field | type | description |
|-----|-------------------------|--------------|------------------------------------------------------------------------------|
| 1 | id | int(11) | primary key |
| 2 | name | varchar(255) | process definition name |
| 3 | version | int(11) | process definition version |
| 4 | release_state | tinyint(4) | release status of process definition: 0 not released, 1 released |
| 5 | project_id | int(11) | project id |
| 6 | user_id | int(11) | user id of the process definition |
| 7 | process_definition_json | longtext | process definition JSON |
| 8 | description | text | process definition description |
| 9 | global_params | text | global parameters |
| 10 | flag | tinyint(4) | specify whether the process is available: 0 is not available, 1 is available |
| 11 | locations | text | node location information |
| 12 | connects | text | node connectivity info |
| 13 | receivers | text | receivers |
| 14 | receivers_cc | text | CC receivers |
| 15 | create_time | datetime | create time |
| 16 | timeout | int(11) | timeout |
| 17 | tenant_id | int(11) | tenant id |
| 18 | update_time | datetime | update time |
| 19 | modify_by | varchar(36) | specify the user that made the modification |
| 20 | resource_ids | varchar(255) | resource ids |
The `process_definition_json` field is the core field, which defines the task information in the DAG diagram, and it is stored in JSON format.
@ -40,6 +40,7 @@ No. | field | type | description
4|timeout|int|timeout
Data example:
```bash
{
"globalParams":[
@ -238,38 +239,38 @@ No.|parameter name||type|description |note
**The following shows the node data structure:**
No.|parameter name||type|description |notes
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| task Id|
2|type ||String |task type |SPARK
3| name| |String|task name |
4| params| |Object|customized parameters |JSON format
5| |mainClass |String | main class
6| |mainArgs | String| execution arguments
7| |others | String| other arguments
8| |mainJar |Object | application jar package
9| |deployMode |String |deployment mode |local,client,cluster
10| |driverCores | String| driver cores
11| |driverMemory | String| driver memory
12| |numExecutors |String | executor count
13| |executorMemory |String | executor memory
14| |executorCores |String | executor cores
15| |programType | String| program type|JAVA,SCALA,PYTHON
16| | sparkVersion| String| Spark version| SPARK1 , SPARK2
17| | localParams| Array|customized local parameters
18| | resourceList| Array|resource files
19|description | |String|description | |
20|runFlag | |String |execution flag| |
21|conditionResult | |Object|condition branch| |
22| | successNode| Array|jump to node if success| |
23| | failedNode|Array|jump to node if failure|
24| dependence| |Object |task dependency |mutual exclusion with params
25|maxRetryTimes | |String|max retry times | |
26|retryInterval | |String |retry interval| |
27|timeout | |Object|timeout | |
28| taskInstancePriority| |String|task priority | |
29|workerGroup | |String |Worker group| |
30|preTasks | |Array|preposition tasks| |
| No. | parameter name || type | description | notes |
|-----|----------------------|----------------|--------|-----------------------------|------------------------------|
| 1 | id | | String | task Id |
| 2 | type || String | task type | SPARK |
| 3 | name | | String | task name |
| 4 | params | | Object | customized parameters | JSON format |
| 5 | | mainClass | String | main class |
| 6 | | mainArgs | String | execution arguments |
| 7 | | others | String | other arguments |
| 8 | | mainJar | Object | application jar package |
| 9 | | deployMode | String | deployment mode | local,client,cluster |
| 10 | | driverCores | String | driver cores |
| 11 | | driverMemory | String | driver memory |
| 12 | | numExecutors | String | executor count |
| 13 | | executorMemory | String | executor memory |
| 14 | | executorCores | String | executor cores |
| 15 | | programType | String | program type | JAVA,SCALA,PYTHON |
| 16 | | sparkVersion | String | Spark version | SPARK1 , SPARK2 |
| 17 | | localParams | Array | customized local parameters |
| 18 | | resourceList | Array | resource files |
| 19 | description | | String | description | |
| 20 | runFlag | | String | execution flag | |
| 21 | conditionResult | | Object | condition branch | |
| 22 | | successNode | Array | jump to node if success | |
| 23 | | failedNode | Array | jump to node if failure |
| 24 | dependence | | Object | task dependency | mutual exclusion with params |
| 25 | maxRetryTimes | | String | max retry times | |
| 26 | retryInterval | | String | retry interval | |
| 27 | timeout | | Object | timeout | |
| 28 | taskInstancePriority | | String | task priority | |
| 29 | workerGroup | | String | Worker group | |
| 30 | preTasks | | Array | preposition tasks | |
**Node data example:**
@ -336,31 +337,31 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
No.|parameter name||type|description |notes
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| task Id|
2|type ||String |task type |MR
3| name| |String|task name |
4| params| |Object|customized parameters |JSON format
5| |mainClass |String | main class
6| |mainArgs | String|execution arguments
7| |others | String|other arguments
8| |mainJar |Object | application jar package
9| |programType | String|program type|JAVA,PYTHON
10| | localParams| Array|customized local parameters
11| | resourceList| Array|resource files
12|description | |String|description | |
13|runFlag | |String |execution flag| |
14|conditionResult | |Object|condition branch| |
15| | successNode| Array|jump to node if success| |
16| | failedNode|Array|jump to node if failure|
17| dependence| |Object |task dependency |mutual exclusion with params
18|maxRetryTimes | |String|max retry times | |
19|retryInterval | |String |retry interval| |
20|timeout | |Object|timeout | |
21| taskInstancePriority| |String|task priority| |
22|workerGroup | |String |Worker group| |
23|preTasks | |Array|preposition tasks| |
| No. | parameter name || type | description | notes |
|-----|----------------------|--------------|--------|-----------------------------|------------------------------|
| 1 | id | | String | task Id |
| 2 | type || String | task type | MR |
| 3 | name | | String | task name |
| 4 | params | | Object | customized parameters | JSON format |
| 5 | | mainClass | String | main class |
| 6 | | mainArgs | String | execution arguments |
| 7 | | others | String | other arguments |
| 8 | | mainJar | Object | application jar package |
| 9 | | programType | String | program type | JAVA,PYTHON |
| 10 | | localParams | Array | customized local parameters |
| 11 | | resourceList | Array | resource files |
| 12 | description | | String | description | |
| 13 | runFlag | | String | execution flag | |
| 14 | conditionResult | | Object | condition branch | |
| 15 | | successNode | Array | jump to node if success | |
| 16 | | failedNode | Array | jump to node if failure |
| 17 | dependence | | Object | task dependency | mutual exclusion with params |
| 18 | maxRetryTimes | | String | max retry times | |
| 19 | retryInterval | | String | retry interval | |
| 20 | timeout | | Object | timeout | |
| 21 | taskInstancePriority | | String | task priority | |
| 22 | workerGroup | | String | Worker group | |
| 23 | preTasks | | Array | preposition tasks | |
**Node data example:**
@ -493,36 +494,36 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
No.|parameter name||type|description |notes
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String|task Id|
2|type ||String |task type|FLINK
3| name| |String|task name|
4| params| |Object|customized parameters |JSON format
5| |mainClass |String |main class
6| |mainArgs | String|execution arguments
7| |others | String|other arguments
8| |mainJar |Object |application jar package
9| |deployMode |String |deployment mode |local,client,cluster
10| |slot | String| slot count
11| |taskManager |String | taskManager count
12| |taskManagerMemory |String |taskManager memory size
13| |jobManagerMemory |String | jobManager memory size
14| |programType | String| program type|JAVA,SCALA,PYTHON
15| | localParams| Array|local parameters
16| | resourceList| Array|resource files
17|description | |String|description | |
18|runFlag | |String |execution flag| |
19|conditionResult | |Object|condition branch| |
20| | successNode| Array|jump node if success| |
21| | failedNode|Array|jump node if failure|
22| dependence| |Object |task dependency |mutual exclusion with params
23|maxRetryTimes | |String|max retry times| |
24|retryInterval | |String |retry interval| |
25|timeout | |Object|timeout | |
26| taskInstancePriority| |String|task priority| |
27|workerGroup | |String |Worker group| |
38|preTasks | |Array|preposition tasks| |
| No. | parameter name || type | description | notes |
|-----|----------------------|-------------------|--------|-------------------------|------------------------------|
| 1 | id | | String | task Id |
| 2 | type || String | task type | FLINK |
| 3 | name | | String | task name |
| 4 | params | | Object | customized parameters | JSON format |
| 5 | | mainClass | String | main class |
| 6 | | mainArgs | String | execution arguments |
| 7 | | others | String | other arguments |
| 8 | | mainJar | Object | application jar package |
| 9 | | deployMode | String | deployment mode | local,client,cluster |
| 10 | | slot | String | slot count |
| 11 | | taskManager | String | taskManager count |
| 12 | | taskManagerMemory | String | taskManager memory size |
| 13 | | jobManagerMemory | String | jobManager memory size |
| 14 | | programType | String | program type | JAVA,SCALA,PYTHON |
| 15 | | localParams | Array | local parameters |
| 16 | | resourceList | Array | resource files |
| 17 | description | | String | description | |
| 18 | runFlag | | String | execution flag | |
| 19 | conditionResult | | Object | condition branch | |
| 20 | | successNode | Array | jump node if success | |
| 21 | | failedNode | Array | jump node if failure |
| 22 | dependence | | Object | task dependency | mutual exclusion with params |
| 23 | maxRetryTimes | | String | max retry times | |
| 24 | retryInterval | | String | retry interval | |
| 25 | timeout | | Object | timeout | |
| 26 | taskInstancePriority | | String | task priority | |
| 27 | workerGroup | | String | Worker group | |
| 38 | preTasks | | Array | preposition tasks | |
**Node data example:**
@ -588,30 +589,30 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
No.|parameter name||type|description |notes
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String|task Id|
2|type ||String |task type|HTTP
3| name| |String|task name|
4| params| |Object|customized parameters |JSON format
5| |url |String |request url
6| |httpMethod | String|http method|GET,POST,HEAD,PUT,DELETE
7| | httpParams| Array|http parameters
8| |httpCheckCondition | String|validation of HTTP code status|default code 200
9| |condition |String |validation conditions
10| | localParams| Array|customized local parameters
11|description | |String|description| |
12|runFlag | |String |execution flag| |
13|conditionResult | |Object|condition branch| |
14| | successNode| Array|jump node if success| |
15| | failedNode|Array|jump node if failure|
16| dependence| |Object |task dependency |mutual exclusion with params
17|maxRetryTimes | |String|max retry times | |
18|retryInterval | |String |retry interval| |
19|timeout | |Object|timeout | |
20| taskInstancePriority| |String|task priority| |
21|workerGroup | |String |Worker group| |
22|preTasks | |Array|preposition tasks| |
| No. | parameter name || type | description | notes |
|-----|----------------------|--------------------|--------|--------------------------------|------------------------------|
| 1 | id | | String | task Id |
| 2 | type || String | task type | HTTP |
| 3 | name | | String | task name |
| 4 | params | | Object | customized parameters | JSON format |
| 5 | | url | String | request url |
| 6 | | httpMethod | String | http method | GET,POST,HEAD,PUT,DELETE |
| 7 | | httpParams | Array | http parameters |
| 8 | | httpCheckCondition | String | validation of HTTP code status | default code 200 |
| 9 | | condition | String | validation conditions |
| 10 | | localParams | Array | customized local parameters |
| 11 | description | | String | description | |
| 12 | runFlag | | String | execution flag | |
| 13 | conditionResult | | Object | condition branch | |
| 14 | | successNode | Array | jump node if success | |
| 15 | | failedNode | Array | jump node if failure |
| 16 | dependence | | Object | task dependency | mutual exclusion with params |
| 17 | maxRetryTimes | | String | max retry times | |
| 18 | retryInterval | | String | retry interval | |
| 19 | timeout | | Object | timeout | |
| 20 | taskInstancePriority | | String | task priority | |
| 21 | workerGroup | | String | Worker group | |
| 22 | preTasks | | Array | preposition tasks | |
**Node data example:**
@ -1112,3 +1113,4 @@ No.|parameter name||type|description |notes
]
}
```

23
docs/docs/en/contribute/api-standard.md

@ -1,9 +1,11 @@
# API design standard
A standardized and unified API is the cornerstone of project design. The API of DolphinScheduler follows the RESTful standard. RESTful is currently the most popular Internet software architecture style; it has a clear structure, conforms to standards, and is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a RESTful API.
## 1. URI design
REST is "Representational State Transfer".The design of Restful URI is based on resources.The resource corresponds to an entity on the network, for example: a piece of text, a picture, and a service. And each resource corresponds to a URI.
+ One Kind of Resource: expressed in the plural, such as `task-instances`、`groups` ;
@ -12,36 +14,43 @@ REST is "Representational State Transfer".The design of Restful URI is based on
+ A Sub Resource:`/instances/{instanceId}/tasks/{taskId}`;
## 2. Method design
We need to locate a certain resource by URI, and then use Method or declare actions in the path suffix to reflect the operation of the resource.
### ① Query - GET
Use URI to locate the resource, and use GET to indicate query.
+ When the URI is a type of resource, it means to query a type of resource. For example, the following example indicates paging query `alter-groups`.
```
Method: GET
/dolphinscheduler/alert-groups
```
+ When the URI is a single resource, it means to query this resource. For example, the following example means to query the specified `alter-group`.
```
Method: GET
/dolphinscheduler/alter-groups/{id}
```
+ In addition, we can also express query sub-resources based on URI, as follows:
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paging query. If we need to query all data, we need to add `/list` after the URI to distinguish. Do not mix the same API for both paged query and query.**
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Use URI to locate the resource, use POST to indicate create, and then return the created id to requester.
+ create an `alter-group`
@ -52,35 +61,42 @@ Method: POST
```
+ create sub-resources is also the same as above.
```
Method: POST
/dolphinscheduler/alter-groups/{alterGroupId}/tasks
```
### ③ Modify - PUT
Use URI to locate the resource, use PUT to indicate modify.
+ modify an `alert-group`
```
Method: PUT
/dolphinscheduler/alter-groups/{alterGroupId}
```
### ④ Delete -DELETE
Use URI to locate the resource, use DELETE to indicate delete.
+ delete an `alert-group`
```
Method: DELETE
/dolphinscheduler/alter-groups/{alterGroupId}
```
+ batch deletion: to batch delete an array of ids, we should use POST. **(Do not use the DELETE method, because the body of a DELETE request has no semantic meaning, and some gateways, proxies, and firewalls may directly strip the request body after receiving a DELETE request.)**
```
Method: POST
/dolphinscheduler/alter-groups/batch-delete
```
### ⑤ Partial Modifications -PATCH
Use URI to locate the resource, use PATCH to partial modifications.
```
@ -89,20 +105,27 @@ Method: PATCH
```
### ⑥ Others
In addition to creating, deleting, modifying and querying, we can also locate the corresponding resource through the URL and then append the operation to it after the path, such as:
```
/dolphinscheduler/alert-groups/verify-name
/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. Parameter design
There are two types of parameters: request parameters and path parameters. Parameters must be named in lower camelCase.
In the case of paging, if the page number entered by the user is less than 1, the front end should automatically set it to 1, indicating that the first page is requested; when the backend finds that the page number entered by the user is greater than the total number of pages, it should directly return the last page.
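As a loose illustration of that rule, here is a minimal sketch; the helper below is hypothetical, not DolphinScheduler's actual code:

```java
// Hypothetical helper: clamp a requested page number to a valid range.
public final class PageUtils {

    private PageUtils() {
    }

    public static int normalizePageNo(int requestedPageNo, long totalCount, int pageSize) {
        int totalPages = (int) Math.max(1, (totalCount + pageSize - 1) / pageSize);
        if (requestedPageNo < 1) {
            return 1; // the front end should already have corrected this to the first page
        }
        return Math.min(requestedPageNo, totalPages); // larger requests fall back to the last page
    }
}
```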
## 4. Others design
### base URL
The URI of the project needs to use `/<project_name>` as the base path, so as to identify that these APIs are under this project.
```
/dolphinscheduler
```

2
docs/docs/en/contribute/api-test.md

@ -10,7 +10,6 @@ In contrast, API testing focuses on whether a complete operation chain can be co
For example, the API test of the tenant management interface focuses on whether users can log in normally; if the login fails, whether the error message is displayed correctly. After logging in, tenant management operations can be performed through the sessionId carried in subsequent requests.
## API Test
### API-Pages
@ -49,7 +48,6 @@ In addition, during the testing process, the interface are not requested directl
On the login page, only the input parameter specification of the interface request is defined. For the output parameters of the interface request, only the unified basic response structure is defined. The data actually returned by the interface is checked in the actual test cases, which verify whether the input and output of the main test interfaces meet the requirements of the test cases.
### API-Cases
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation.
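For context, a rough sketch of the request flow such a tenant test exercises: log in, read the session id, and carry it on subsequent tenant requests. The endpoint paths, parameter names, and the `sessionId` header used here are placeholders, not the real test framework API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of an API test flow: login first, then call a tenant endpoint.
public class TenantApiTestSketch {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:12345/dolphinscheduler"; // assumed base URL

        // 1. log in and read the session id from the response (parsing simplified)
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create(base + "/login?userName=admin&userPassword=admin"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> loginResp = client.send(login, HttpResponse.BodyHandlers.ofString());
        String sessionId = extractSessionId(loginResp.body());

        // 2. use the session id for a tenant management request
        HttpRequest listTenants = HttpRequest.newBuilder()
                .uri(URI.create(base + "/tenants?pageNo=1&pageSize=10"))
                .header("sessionId", sessionId)
                .GET()
                .build();
        HttpResponse<String> tenants = client.send(listTenants, HttpResponse.BodyHandlers.ofString());
        System.out.println(tenants.body());
    }

    private static String extractSessionId(String loginResponseBody) {
        return loginResponseBody; // placeholder: a real test parses the JSON response
    }
}
```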

26
docs/docs/en/contribute/architecture-design.md

@ -1,4 +1,5 @@
## Architecture Design
Before explaining the architecture of the scheduling system, let us first understand the common terminology of the scheduling system.
### 1.Noun Interpretation
@ -34,11 +35,10 @@ Before explaining the architecture of the schedule system, let us first understa
**Complement**: Complement historical data, support **interval parallel and serial** two complement methods
### 2.System architecture
#### 2.1 System Architecture Diagram
<p align="center">
<img src="../../../img/architecture.jpg" alt="System Architecture Diagram" />
<p align="center">
@ -46,8 +46,6 @@ Before explaining the architecture of the schedule system, let us first understa
</p>
</p>
#### 2.2 Architectural description
* **MasterServer**
@ -55,8 +53,6 @@ Before explaining the architecture of the schedule system, let us first understa
MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
##### The service mainly contains:
- **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
@ -67,8 +63,6 @@ Before explaining the architecture of the schedule system, let us first understa
- **MasterTaskExecThread** is mainly responsible for task persistence
* **WorkerServer**
- WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
@ -76,7 +70,6 @@ Before explaining the architecture of the schedule system, let us first understa
##### This service contains:
- **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
- **ZooKeeper**
The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
@ -99,8 +92,6 @@ Before explaining the architecture of the schedule system, let us first understa
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
#### 2.3 Architectural Design Ideas
##### I. Decentralized vs centralization
@ -130,7 +121,6 @@ Problems in the design of centralized :
- In a decentralized design, there is usually no Master/Slave concept: all roles are the same and their status is equal. The global Internet is a typical decentralized distributed system, where any networked node going down only affects a small range of functionality.
- The core design of decentralized design is that there is no "manager" that is different from other nodes in the entire distributed system, so there is no single point of failure problem. However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliable line of distributed system communication greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture, the managers in the cluster are dynamically selected rather than preset, and when the cluster fails, the nodes of the cluster spontaneously hold "meetings" to elect new "managers" to preside over the work. The most typical cases are ZooKeeper and the Go implementation, Etcd.
- Decentralization of DolphinScheduler is the registration of Master/Worker to ZooKeeper. The Master Cluster and the Worker Cluster are not centered, and the Zookeeper distributed lock is used to elect one Master or Worker as the “manager” to perform the task.
##### II. Distributed lock practice
@ -184,8 +174,6 @@ Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The impl
The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic.
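As a hedged sketch of that Watcher mechanism using the plain ZooKeeper client (DolphinScheduler's real implementation differs; the registry path and handling below are assumptions):

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch: watch a workers directory and react when a node is removed.
public class WorkerRemovedWatcher implements Watcher {

    private static final String WORKERS_PATH = "/dolphinscheduler/nodes/worker"; // assumed path

    private final ZooKeeper zooKeeper;

    public WorkerRemovedWatcher(ZooKeeper zooKeeper) throws Exception {
        this.zooKeeper = zooKeeper;
        // register this watcher on the children of the workers directory
        zooKeeper.getChildren(WORKERS_PATH, this);
    }

    @Override
    public void process(WatchedEvent event) {
        try {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged
                    || event.getType() == Watcher.Event.EventType.NodeDeleted) {
                // a worker disappeared: trigger task-instance fault tolerance here
                System.out.println("registry change on " + event.getPath() + ", run fault tolerance");
            }
            // ZooKeeper watches are one-shot, so re-register to keep listening
            zooKeeper.getChildren(WORKERS_PATH, this);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```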
- Master fault tolerance flow chart:
<p align="center">
@ -194,8 +182,6 @@ The Master monitors the directories of other Masters and Workers. If the remove
After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "Running" and "Submit Successful" tasks, and monitors the status of its task instance for the "Running" task. You need to determine whether the Task Queue already exists. If it exists, monitor the status of the task instance. If it does not exist, resubmit the task instance.
- Worker fault tolerance flow chart:
<p align="center">
@ -214,8 +200,6 @@ Here we must first distinguish between the concept of task failure retry, proces
- Process failure recovery is at the process level and is done manually; recovery can only be performed **from the failed node** or **from the current node**
- Process failure rerun is also at the process level and is done manually; the rerun starts from the start node
Next, back to the topic: we divide the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on.
@ -225,16 +209,12 @@ Each **service node** can configure the number of failed retries. When the task
If a task failure in the workflow reaches the maximum number of retries, the workflow will fail and stop, and the failed workflow can be manually rerun or resumed.
##### V. Task priority design
In the early scheduling design, if there is no priority design and fair scheduling design, it will encounter the situation that the task submitted first may be completed simultaneously with the task submitted subsequently, but the priority of the process or task cannot be set. We have redesigned this, and we are currently designing it as follows:
- Tasks are processed from high to low priority: **different process instance priority** takes precedence over **same process instance priority**, which takes precedence over **task priority within the same process**, which takes precedence over the **commit order within the same process**.
- The specific implementation is to resolve the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information in the ZooKeeper task queue; when obtaining from the task queue, string comparison yields the task that needs to be executed first.
- The priority of the process definition is that some processes need to be processed before other processes. This can be configured at the start of the process or at the time of scheduled start. There are 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<p align="center">
@ -308,8 +288,6 @@ Public class TaskLogFilter extends Filter<ILoggingEvent> {
}
```
### summary
Starting from scheduling, this article has introduced the architecture principles and implementation ideas of the big data distributed workflow scheduling system DolphinScheduler. To be continued

1
docs/docs/en/contribute/backend/mechanism/global-parameter.md

@ -59,3 +59,4 @@ Assign the parameters with matching values to varPool (List, which contains the
* Format the varPool as json and pass it to master.
* The parameters that are OUT would be written into the localParam after the master has received the varPool.
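To make that handoff concrete, a hypothetical sketch (the `Property` shape and field names are assumptions) of copying OUT parameters from the received varPool into a task's local parameters:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parameters carry a name, a direction (IN/OUT) and a value.
public class VarPoolMerger {

    public static class Property {
        public String name;
        public String direction; // "IN" or "OUT"
        public String value;
    }

    /** Copy the values of OUT parameters from the varPool into the task's local parameters. */
    public static void mergeOutParams(List<Property> varPool, List<Property> localParams) {
        Map<String, String> outValues = new HashMap<>();
        for (Property p : varPool) {
            if ("OUT".equals(p.direction)) {
                outValues.put(p.name, p.value);
            }
        }
        for (Property local : localParams) {
            if (outValues.containsKey(local.name)) {
                local.value = outValues.get(local.name);
            }
        }
    }
}
```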

2
docs/docs/en/contribute/backend/mechanism/overview.md

@ -1,6 +1,6 @@
# Overview
<!-- TODO Since the side menu does not support multiple levels, add new page to keep all sub page here -->
* [Global Parameter](global-parameter.md)
* [Switch Task type](task/switch.md)

1
docs/docs/en/contribute/backend/mechanism/task/switch.md

@ -6,3 +6,4 @@ Switch task workflow step as follows
* `SwitchTaskExecThread` processes the expressions defined in `switch` from top to bottom, obtains the values of the variables from `varPool`, and evaluates each expression through `javascript`. If an expression returns true, it stops checking and records the position of that expression, referred to here as `resultConditionLocation` (see the sketch after this list). The task of `SwitchTaskExecThread` is then over.
* After the `switch` task runs, if there is no error (more commonly, the user-defined expression is out of specification or there is a problem with the parameter name), then `MasterExecThread.submitPostNode` will obtain the downstream node of the `DAG` to continue execution.
* If it is found in `DagHelper.parsePostNodes` that the current node (the node that has just completed the work) is a `switch` node, the `resultConditionLocation` will be obtained, and all branches except `resultConditionLocation` in the SwitchParameters will be skipped. In this way, only the branches that need to be executed are left
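For context, a hedged sketch of evaluating the switch branch expressions with the JDK script engine, assuming a JavaScript engine (such as Nashorn) is available; the placeholder substitution and names are assumptions, not the actual implementation:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: walk the switch conditions top-down and return the index
// of the first expression that evaluates to true (the resultConditionLocation).
public class SwitchConditionEvaluator {

    private final ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");

    public int resolveConditionLocation(List<String> conditions, Map<String, Object> varPool)
            throws ScriptException {
        for (int i = 0; i < conditions.size(); i++) {
            String expression = conditions.get(i);
            // substitute ${var} placeholders with values taken from varPool
            for (Map.Entry<String, Object> entry : varPool.entrySet()) {
                expression = expression.replace("${" + entry.getKey() + "}",
                        String.valueOf(entry.getValue()));
            }
            Object result = engine.eval(expression);
            if (Boolean.TRUE.equals(result)) {
                return i; // first matching branch wins
            }
        }
        return -1; // no branch matched
    }
}
```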

7
docs/docs/en/contribute/backend/spi/alert.md

@ -26,8 +26,8 @@ If you don't care about its internal design, but simply want to know how to deve
This module is currently a plug-in provided by us, and now we have supported dozens of plug-ins, such as Email, DingTalk, Script, etc.
#### Alert SPI Main class information.
AlertChannelFactory
Alarm plug-in factory interface. All alarm plug-ins need to implement this interface. This interface is used to define the name of the alarm plug-in and the required parameters. The create method is used to create a specific alarm plug-in instance.
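As a rough sketch of what such a factory interface might look like (simplified; the method signatures here are assumptions, so check the actual SPI in the source tree):

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shape of the alert plug-in SPI described above.
interface AlertChannel {
    /** Send one alert; returns true on success. */
    boolean process(String title, String content, Map<String, String> params);
}

public interface AlertChannelFactory {

    /** The plug-in name shown when users pick an alert channel. */
    String name();

    /** The parameters the plug-in requires, rendered as a form in the UI. */
    List<String> params();

    /** Create a concrete alert channel instance that actually sends alerts. */
    AlertChannel create();
}
```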
@ -77,15 +77,19 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* SMS
SMS alerts
* FeiShu
FeiShu alert notification
* Slack
Slack alert notification
* PagerDuty
PagerDuty alert notification
* WebexTeams
WebexTeams alert notification
@ -101,3 +105,4 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* Http
We have implemented an Http script for alerting. Since calling most alerting plug-ins ends up being an Http request, if we do not support your alert plug-in yet, you can use Http to realize your alert logic. You are also welcome to contribute your common plug-ins to the community :)

1
docs/docs/en/contribute/backend/spi/registry.md

@ -6,6 +6,7 @@ Make the following configuration (take zookeeper as an example)
* Registry plug-in configuration, take Zookeeper as an example (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181

69
docs/docs/en/contribute/frontend-development.md

@ -1,6 +1,7 @@
# Front-end development documentation
### Technical selection
```
Vue mvvm framework
@ -17,10 +18,16 @@ Lodash high performance JavaScript utility library
### Development environment
- #### Node installation
-
#### Node installation
Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
- #### Front-end project construction
-
#### Front-end project construction
Use the command line to `cd` into the `dolphinscheduler-ui` project directory and execute `npm install` to pull the project dependency packages.
> If `npm install` is very slow, you can set the taobao mirror
@ -36,13 +43,16 @@ npm config set registry http://registry.npm.taobao.org/
API_BASE = http://127.0.0.1:12345
```
> ##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
```bash
npm install node-sass --unsafe-perm #Install node-sass dependency separately
```
- #### Development environment operation
-
#### Development environment operation
- `npm start` project development environment (after startup address http://localhost:8888)
#### Front-end project release
@ -140,6 +150,7 @@ Public module and utill `src/js/module`
Home => `http://localhost:8888/#/home`
Project Management => `http://localhost:8888/#/projects/list`
```
| Project Home
| Workflow
@ -149,6 +160,7 @@ Project Management => `http://localhost:8888/#/projects/list`
```
Resource Management => `http://localhost:8888/#/resource/file`
```
| File Management
| udf Management
@ -159,6 +171,7 @@ Resource Management => `http://localhost:8888/#/resource/file`
Data Source Management => `http://localhost:8888/#/datasource/list`
Security Center => `http://localhost:8888/#/security/tenant`
```
| Tenant Management
| User Management
@ -174,16 +187,19 @@ User Center => `http://localhost:8888/#/user/account`
The project `src/js/conf/home` is divided into
`pages` => route to page directory
```
The page file corresponding to the routing address
```
`router` => route management
```
vue router, the entry file index.js in each page will be registered. Specific operations: https://router.vuejs.org/zh/
```
`store` => status management
```
The page corresponding to each route has a state management file divided into:
@ -201,9 +217,13 @@ Specific action:https://vuex.vuejs.org/zh/
```
## specification
## Vue specification
##### 1.Component name
Component names consist of multiple words connected with a hyphen (-), which avoids conflicts with HTML tags and keeps the structure clearer.
```
// positive example
export default {
@ -212,7 +232,9 @@ export default {
```
##### 2.Component files
Common components inside `src/js/module/components` use a folder name that is the same as the file name. Subcomponents and util tools split out of a common component are placed in the component's internal `_source` folder.
```
└── components
├── header
@ -228,6 +250,7 @@ The internal common component of the `src/js/module/components` project writes t
```
##### 3.Prop
When you define a Prop, always name it in camelCase, and use the hyphenated form (-) when assigning the value in the parent component.
This follows the characteristics of each language: HTML attributes are case-insensitive, so hyphenated names are friendlier there, while camelCase is more natural in JavaScript.
@ -270,7 +293,9 @@ props: {
```
##### 4.v-for
When performing v-for traversal, you should always bring a key value to make rendering more efficient when updating the DOM.
```
<ul>
<li v-for="item in list" :key="item.id">
@ -280,6 +305,7 @@ When performing v-for traversal, you should always bring a key value to make ren
```
Avoid using v-for on the same element as v-if (`for example: <li>`), because v-for has a higher priority than v-if. To avoid unnecessary calculation and rendering, move the v-if up to the container's parent element.
```
<ul v-if="showList">
<li v-for="item in list" :key="item.id">
@ -289,7 +315,9 @@ v-for should be avoided on the same element as v-if (`for example: <li>`) becaus
```
##### 5.v-if / v-else-if / v-else
If the elements controlled by the same set of v-if logic are structurally identical, Vue reuses the shared parts for more efficient element switching, `such as: value`. To avoid the side effects of this reuse, add a key to such elements for identification.
```
<div v-if="hasData" key="mazey-data">
<span>{{ mazeyData }}</span>
@ -300,12 +328,15 @@ If the elements in the same set of v-if logic control are logically identical, V
```
##### 6.Instruction abbreviation
To unify the specification, directive shorthand is always used. Writing `v-bind` and `v-on` in full is not wrong; this is only a unified convention.
```
<input :value="mazeyUser" @click="verifyUser">
```
##### 7.Top-level element order of single file components
Styles are bundled into one file, so a style defined in a single vue file will also take effect on elements with the same class name in other files. Therefore, every component should be given a top-level class name before it is created.
Note: the sass plugin has been added to the project, so sass syntax can be written directly in a single vue file.
For uniformity and ease of reading, they should be placed in the order of `<template>`, `<script>`, `<style>`.
@ -357,25 +388,31 @@ For uniformity and ease of reading, they should be placed in the order of `<tem
## JavaScript specification
##### 1.var / let / const
It is recommended to stop using var; use let / const instead, preferring const. Every variable must be declared before it is used, except for functions defined with function, which can be placed anywhere.
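As a minimal sketch of this rule (the variable names below are invented for illustration):
```
// prefer const; use let only when the value must be reassigned
const maxRetry = 3
let retryCount = 0
if (retryCount < maxRetry) {
  retryCount += 1
}

// avoid: var retryCount = 0
```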
##### 2.quotes
```
const foo = 'after division'
const bar = `${foo}, front-end engineer`
```
##### 3.function
Anonymous functions uniformly use arrow functions. When there are multiple parameters/return values, prefer object destructuring assignment.
```
function getPersonInfo ({name, sex}) {
// ...
return {name, sex}
}
```
Function names uniformly use camelCase. A name starting with a capital letter is a constructor; names starting with a lowercase letter are ordinary functions, and the new operator should not be used to call ordinary functions.
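A small illustrative sketch of this naming rule (the names below are hypothetical):
```
// constructor: capitalized and called with new
function Person (name) {
  this.name = name
}
const tom = new Person('tom')

// ordinary function: camelCase, never called with new
function getUserName (user) {
  return user.name
}
getUserName(tom)
```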
##### 4.object
```
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
@ -393,7 +430,9 @@ for (let [key, value] of myMap.entries()) {
```
##### 5.module
Unified management of project modules using import / export.
```
// lib.js
export default {}
@ -411,13 +450,16 @@ If the module has only one output value, use `export default`,otherwise no.
##### 1.Label
Do not write the type attribute when referencing external CSS or JavaScript files. HTML5 defaults to text/css and text/javascript, so there is no need to specify them.
```
<link rel="stylesheet" href="//www.test.com/css/test.css">
<script src="//www.test.com/js/test.js"></script>
```
##### 2.Naming
Class and ID names should be semantic, so that their purpose can be seen from the name; multiple words are connected with a hyphen.
```
// positive example
.test-header{
@ -426,6 +468,7 @@ The naming of Class and ID should be semantic, and you can see what you are doin
```
##### 3.Attribute abbreviation
Use CSS shorthand properties as much as possible to improve the efficiency and readability of the code.
```
@ -439,6 +482,7 @@ border: 1px solid #ccc;
```
##### 4.Document type
The HTML5 standard should always be used.
```
@ -446,7 +490,9 @@ The HTML5 standard should always be used.
```
##### 5.Notes
A block comment should be written to a module file.
```
/**
* @module mazey/api
@ -458,6 +504,7 @@ A block comment should be written to a module file.
## interface
##### All interfaces are returned as Promise
Note that a non-zero code indicates an error and is caught in catch.
```
@ -477,6 +524,7 @@ test.then(res => {
```
Normal return
```
{
code:0,
@ -486,6 +534,7 @@ Normal return
```
Error return
```
{
code:10000,
@ -493,8 +542,10 @@ Error return
msg:'failed'
}
```
If the interface is a POST request, the Content-Type defaults to application/x-www-form-urlencoded; if the Content-Type is changed to application/json,
the interface parameters need to be passed in the following way
```
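// with emulateJSON: false the payload is sent as application/json rather than form-urlencoded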
io.post('url', payload, null, null, { emulateJSON: false }, res => {
resolve(res)
@ -524,6 +575,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(1) First place the icon of the node in the `src/js/conf/home/pages/dag/img` folder, named `toolbar_${the English node type name defined in the backend, for example: SHELL}.png`
(2) Find the `tasksType` object in `src/js/conf/home/pages/dag/_source/config.js` and add the new node type to it.
```
'DEPENDENT': { // The background definition node type English name is used as the key value
desc: 'DEPENDENT', // tooltip desc
@ -532,6 +584,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
(3) Add a `${node type (lowercase)}.vue` file in `src/js/conf/home/pages/dag/_source/formModel/tasks`. The contents of the components related to the current node are written here. Every node component must have a `_verification ()` function; after the verification succeeds, the relevant data of the current component is passed to the parent component.
```
/**
* Verification
@ -566,6 +619,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(4) Common components used inside the node component are under `_source`, and `commcon.js` is used to configure public data.
##### 2.Increase the status type
(1) Find the `tasksState` object in `src/js/conf/home/pages/dag/_source/config.js` and add the new status type to it.
```
@ -579,7 +633,9 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
##### 3.Add the action bar tool
(1) Find the `toolOper` object in `src/js/conf/home/pages/dag/_source/config.js` and add the new tool to it.
```
{
code: 'pointer', // tool identifier
@ -599,13 +655,12 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
`util.js` => belongs to the `plugIn` tool class
The operation is handled in the `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` event.
##### 4.Add a routing page
(1) First add a routing address in route management: `src/js/conf/home/router/index.js`
```
routing address{
path: '/test', // routing address
@ -621,10 +676,10 @@ routing address{
This will give you direct access to `http://localhost:8888/#/test`
##### 5.Increase the preset mailbox
Find `src/lib/localData/email.js`; the preset email addresses here are automatically matched in the drop-down for the startup and scheduled email address inputs.
```
export default ["test@analysys.com.cn","test1@analysys.com.cn","test3@analysys.com.cn"]
```

3
docs/docs/en/contribute/have-questions.md

@ -21,8 +21,9 @@ Some quick tips when using email:
- Tagging the subject line of your email will help you get a faster response, e.g. [api-server]: How to get open api interface?
- Tags may help identify a topic by:
- Component: MasterServer, ApiServer, WorkerServer, AlertServer, etc.
- Level: Beginner, Intermediate, Advanced
- Scenario: Debug, How-to
- For error logs or long code examples, please use [GitHub gist](https://gist.github.com/) and include only a few lines of the pertinent code / log within the email.

2
docs/docs/en/contribute/join/DS-License.md

@ -20,7 +20,6 @@ Moreover, when we intend to refer a new software ( not limited to 3rd party jar,
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
For example, we should contain the NOTICE file (every open-source project has NOTICE file, generally under root directory) of ZooKeeper in our project when we are using ZooKeeper. As the Apache explains, "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work.
We are not going to dive into every 3rd party open-source license policy; you may look them up if interested.
@ -40,3 +39,4 @@ We need to follow the following steps when we need to add new jars or external r
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
* [ASF 3RD PARTY LICENSE POLICY](https://apache.org/legal/resolved.html)

3
docs/docs/en/contribute/join/code-conduct.md

@ -3,6 +3,7 @@
The following Code of Conduct is based on full compliance with the [Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
## Development philosophy
- **Consistent** code style, naming, and usage are consistent.
- **Easy to read** code is obvious, easy to read and understand; when debugging, one knows the intent of the code.
- **Neat** agree with the concepts of《Refactoring》and《Code Cleanliness》and pursue clean and elegant code.
@ -63,6 +64,6 @@ The following Code of Conduct is based on full compliance with the [Apache Softw
- Use accurate assertions; try not to use the `not` or `containsString` assertions.
- The true value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
- Classes and Methods with `@Test` labels do not require javadoc.
- Public specifications.
- Each line is no longer than `200` characters, ensuring that each line is semantically complete for easy understanding.

1
docs/docs/en/contribute/join/issue.md

@ -1,6 +1,7 @@
# Issue Notice
## Preface
The Issues function is used to track various Features, Bugs, Functions, etc. Project maintainers can organize the tasks to be completed through issues.
Issue is an important step in drawing out a feature or bug,

2
docs/docs/en/contribute/join/pull-request.md

@ -1,6 +1,7 @@
# Pull Request Notice
## Preface
Pull Request is a way of software cooperation, which is a process of bringing code involving different functions into the trunk. During this process, the code can be discussed, reviewed, and modified.
In Pull Request, we try not to discuss the implementation of the code. The general implementation of the code and its logic should be determined in Issue. In the Pull Request, we only focus on the code format and code specification, so as to avoid wasting time caused by different opinions on implementation.
@ -75,3 +76,4 @@ see [Code Style](../development-environment-setup.md#code-style) for details.
the second is that multiple issues have subtle differences.
In this scenario, the responsibilities of each issue can be clearly divided. The type of each issue is marked as Sub-Task, and then these sub task type issues are associated with one issue.
And each Pull Request submitted should be associated with only one sub-task issue.

15
docs/docs/en/contribute/join/review.md

@ -28,7 +28,7 @@ go to section [review Pull Requests](#pull-requests).
Reviewing Issues means discussing [Issues][all-issues] in GitHub and giving suggestions on them. This includes but is not limited to the following situations
| Situation | Reason | Label | Action |
| ------ | ------ | ------ | ------ |
|-------------------------|-------------------------------|------------------------------------------------------|---------------------------------------------------------------------|
| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close the Issue and inform the creator of the fixed version if it has already been released |
| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close the issue and inform the creator of the link to the same issue |
| Description not clear | Without detailed reproduce steps | [need more information][label-need-more-information] | Ask the creator to add more description |
@ -37,7 +37,7 @@ In addition give suggestion, add label for issue is also important during review
better, which is convenient for further processing. An issue can have more than one label. Common issue categories are:
| Label | Meaning |
| ------ | ------ |
|------------------------------------------|--------------------------------|
| [UI][label-UI] | UI and front-end related |
| [security][label-security] | Security Issue |
| [user experience][label-user-experience] | User experience Issue |
@ -55,7 +55,7 @@ Beside classification, label could also set the priority of Issues. The higher t
in the community, the easier it is to be fixed or implemented. The priority labels are as follows
| Label | priority |
| ------ | ------ |
|------------------------------------------|-----------------|
| [priority:high][label-priority-high] | High priority |
| [priority:middle][label-priority-middle] | Middle priority |
| [priority:low][label-priority-low] | Low priority |
@ -75,7 +75,7 @@ Before reading following content, please make sure you have labeled the Issue.
When an Issue needs a Pull Request, you can also label it with one of the labels below.
| Label | Mean |
| ------ | ------ |
|--------------------------------------------|---------------------------------------------|
| [Chore][label-Chore] | Chore for project |
| [Good first issue][label-good-first-issue] | Good first issue for new contributor |
| [easy to fix][label-easy-to-fix] | Easy to fix, harder than `Good first issue` |
@ -90,14 +90,14 @@ When an Issue need to create Pull Requests, you could also labeled it from below
<!-- markdown-link-check-disable -->
Reviewing Pull Requests means discussing [Pull Requests][all-PRs] in GitHub and giving suggestions on them. DolphinScheduler's
Pull Request reviewing is the same as [GitHub's reviewing changes in pull requests][gh-review-pr]. You can give your
suggestions in Pull Requests
suggestions in Pull Reque-->
* When you think the Pull Request is OK to be merged, you can agree to the Pull Request according to the "Approve" process
in [GitHub's reviewing changes in pull requests][gh-review-pr].
* When you think the Pull Request needs to be changed, you can comment on it according to the "Comment" process in
[GitHub's reviewing changes in pull requests][gh-review-pr]. And when you think there are issues that must be fixed before it is
merged, please follow "Request changes" in [GitHub's reviewing changes in pull requests][gh-review-pr] to ask the contributor to
modify it.
<!-- markdown-link-check-enable -->
Labeling Pull Requests is an important part. Reasonable classification can save a lot of time for reviewers. The good news
@ -108,7 +108,7 @@ and [priority:high][label-priority-high].
Pull Requests have some unique labels of their own
| Label | Mean |
| ------ | ------ |
|--------------------------------------------------------|----------------------------------------------------------|
| [miss document][label-miss-document] | Pull Request is missing documentation and should add it |
| [first time contributor][label-first-time-contributor] | Pull Request submitted by a first-time contributor |
| [don't merge][label-do-not-merge] | Pull Request has some problems and should not be merged |
@ -151,3 +151,4 @@ Pull Requests have some unique labels of it own
[all-issues]: https://github.com/apache/dolphinscheduler/issues
[all-PRs]: https://github.com/apache/dolphinscheduler/pulls
[gh-review-pr]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/about-pull-request-reviews

7
docs/docs/en/contribute/join/submit-code.md

@ -3,19 +3,16 @@
* First, fork a copy of the code from the remote repository *https://github.com/apache/dolphinscheduler.git* into your own repository
* There are currently three branches in the remote repository:
* master normal delivery branch
After a stable release, the code from the stable branch is merged into master.
* dev daily development branch
The dev branch is the daily development branch; newly submitted code can be pull-requested to this branch.
* Clone your repository to your local machine
`git clone https://github.com/apache/dolphinscheduler.git`
* Add remote repository address, named upstream
`git remote add upstream https://github.com/apache/dolphinscheduler.git`
* View repository
    `git remote -v`
@ -39,6 +36,7 @@ git push --set-upstream origin dev-1.0
```
* Create new branch
```
git checkout -b xxx origin/dev
```
@ -60,4 +58,3 @@ Make sure that the branch `xxx` is building successfully on the latest code of t
* Finally, congratulations, you have become an official contributor to dolphinscheduler!

1
docs/docs/en/contribute/join/subscribe.md

@ -21,3 +21,4 @@ Unsubscribe from the mailing list steps are as follows:
2. Receive confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if not received, please confirm whether the email is automatically classified as spam, promotion email, subscription email, etc.) . Then reply directly to the email, or click on the link in the email to reply quickly, the subject and content are arbitrary.
3. Receive a goodbye email. After completing the above steps, you will receive a goodbye email with the subject GOODBYE from dev@dolphinscheduler.apache.org. You have successfully unsubscribed from the Apache DolphinScheduler mailing list, and you will no longer receive emails from dev@dolphinscheduler.apache.org.

3
docs/docs/en/contribute/join/unit-test.md

@ -12,8 +12,10 @@
- Pay attention to boundary conditions.
- Unit tests should be well designed as well as avoiding useless code.
- When you find a `method` is difficult to write unit test, and if you confirm that the `method` is `bad code`, then refactor it with the developer.
<!-- markdown-link-check-disable -->
- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
<!-- markdown-link-check-enable -->
- TDD(option): When you start writing a new feature, you can try writing test cases first.
@ -100,6 +102,7 @@ The test will fail when the code in the unit test throws an exception. Therefore
}
}
```
You should do this:
```java

1
docs/docs/en/contribute/log-specification.md

@ -50,3 +50,4 @@ That is, the workflow instance ID and task instance ID are injected in the print
- Printing a log across multiple lines is prohibited. The contents of a log need to be associated with the relevant information in the log format; printing them on separate lines causes the log contents to not match the time and other information, and causes logs to be interleaved in environments with a large number of logs, which makes log retrieval more difficult.
- The use of the "+" operator for splicing log content is prohibited. Use placeholders for formatting logs for printing to improve memory usage efficiency.
- When the log content includes object instances, you need to make sure to override the toString() method to prevent printing meaningless hashcode.

1
docs/docs/en/contribute/release/release-prepare.md

@ -29,3 +29,4 @@ For example, to release `x.y.z`, the following updates are required:
- Add new history version
- `docs/docs/en/history-versions.md` and `docs/docs/zh/history-versions.md`: Add the new version and link for `x.y.z`
- `docs/configs/docsdev.js`: change `/dev/` to `/x.y.z/`

4
docs/docs/en/contribute/release/release.md

@ -210,7 +210,7 @@ git push origin --tags
> Note1: In this step, you should use github token for password because native password no longer supported, you can see
> https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token for more
> detail about how to create token about it.
>
> Note2: After the command is done, it will auto-create the `release.properties` file and `*.Backup` files; they will be needed
> in the following command, so DO NOT DELETE THEM
@ -293,6 +293,7 @@ cd ~/ds_svn/dev/dolphinscheduler
svn add *
svn --username="${A_USERNAME}" commit -m "release ${VERSION}"
```
## Check Release
### Check sha512 hash
@ -538,3 +539,4 @@ DolphinScheduler Resources:
- Mailing list: dev@dolphinscheduler.apache.org
- Documents: https://dolphinscheduler.apache.org/zh-cn/docs/<VERSION>/user_doc/about/introduction.html
```

3
docs/docs/en/guide/alert/dingtalk.md

@ -9,7 +9,7 @@ The following shows the `DingTalk` configuration example:
## Parameter Configuration
| **Parameter** | **Description** |
| --- | --- |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Warning Type | Alert on success or failure or both. |
| WebHook | The format is: [https://oapi.dingtalk.com/robot/send?access\_token=XXXXXX](https://oapi.dingtalk.com/robot/send?access_token=XXXXXX) |
| Keyword | Custom keywords for security settings. |
@ -25,3 +25,4 @@ The following shows the `DingTalk` configuration example:
## Reference
- [DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)

1
docs/docs/en/guide/alert/email.md

@ -1,4 +1,5 @@
# Email
If you need to use `Email` for alerting, create an alert instance in the alert instance management and select the Email plugin.
The following shows the `Email` configuration example:

3
docs/docs/en/guide/alert/enterprise-webexteams.md

@ -8,7 +8,7 @@ The following is the `WebexTeams` configuration example:
## Parameter Configuration
| **Parameter** | **Description** |
| --- | --- |
|-----------------|-------------------------------------------------------------------------------------------------------------------------|
| botAccessToken | The access token of robot. |
| roomID | The ID of the room that receives message (only support one room ID). |
| toPersonId | The person ID of the recipient when sending a private 1:1 message. |
@ -59,3 +59,4 @@ The `Room ID` we can acquire it from the `id` of creating a new group chat room
- [WebexTeams Application Bot Guide](https://developer.webex.com/docs/bots)
- [WebexTeams Message Guide](https://developer.webex.com/docs/api/v1/messages/create-a-message)

2
docs/docs/en/guide/alert/enterprise-wechat.md

@ -40,7 +40,6 @@ The following is the `query userId` API example:
APP: https://work.weixin.qq.com/api/doc/90000/90135/90236
### Group Chat
The Group Chat send type means notifying alert results via a group chat created by the Enterprise WeChat API, sending the message to all members of the group; specifying individual users is not supported.
@ -69,3 +68,4 @@ The following is the `create new group chat` API and `query userId` API example:
## Reference
- Group Chat:https://work.weixin.qq.com/api/doc/90000/90135/90248

1
docs/docs/en/guide/alert/feishu.md

@ -10,6 +10,7 @@ The following shows the `Feishu` configuration example:
## Parameter Configuration
* Webhook
> Copy the robot webhook URL shown below:
![alert-feishu-webhook](../../../../img/new_ui/dev/alert/alert_feishu_webhook.png)

2
docs/docs/en/guide/alert/http.md

@ -5,7 +5,7 @@ If you need to use `Http script` for alerting, create an alert instance in the a
## Parameter Configuration
| **Parameter** | **Description** |
| --- | --- |
|---------------|-----------------------------------------------------------------------------------------------------|
| URL | The `Http` request URL needs to contain protocol, host, path and parameters if the method is `GET`. |
| Request Type | Select the request type from `POST` or `GET`. |
| Headers | The headers of the `Http` request in JSON format. |

2
docs/docs/en/guide/alert/script.md

@ -8,7 +8,7 @@ The following shows the `Script` configuration example:
## Parameter Configuration
| **Parameter** | **Description** |
| --- | --- |
|---------------|--------------------------------------------------|
| User Params | User defined parameters will pass to the script. |
| Script Path | The file location path in the server. |
| Type | Support `Shell` script. |

3
docs/docs/en/guide/alert/telegram.md

@ -8,7 +8,7 @@ The following shows the `Telegram` configuration example:
## Parameter Configuration
| **Parameter** | **Description** |
| --- | --- |
|---------------|---------------------------------------------------------------|
| WebHook | The WebHook of Telegram when use robot to send message. |
| botToken | The access token of robot. |
| chatId | Sub Telegram Channel. |
@ -35,3 +35,4 @@ The webhook needs to be able to receive and use the same JSON body of HTTP POST
- [Telegram Application Bot Guide](https://core.telegram.org/bots)
- [Telegram Bots Api](https://core.telegram.org/bots/api)
- [Telegram SendMessage Api](https://core.telegram.org/bots/api#sendmessage)

44
docs/docs/en/guide/data-quality.md

@ -1,4 +1,5 @@
# Data Quality
## Introduction
The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0; other versions have not been verified, and users can verify them by themselves.
@ -28,12 +29,12 @@ data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
## Detailed Inspection Logic
| **Parameter** | **Description** |
| ----- | ---- |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
| CheckFormula | <ul><li>Expected-Actual</li><li>Actual-Expected</li><li>(Actual/Expected)x100%</li><li>(Expected-Actual)/Expected x100%</li></ul> |
| Operator | =, >, >=, <, <=, != |
| ExpectedValue | <ul><li>FixValue</li><li>DailyAvg</li><li>WeeklyAvg</li><li>MonthlyAvg</li><li>Last7DayAvg</li><li>Last30DayAvg</li><li>SrcTableTotalRows</li><li>TargetTableTotalRows</li></ul> |
| Example |<ul><li>CheckFormula:Expected-Actual</li><li>Operator:></li><li>Threshold:0</li><li>ExpectedValue:FixValue=9</li></ul>
| Example | <ul><li>CheckFormula:Expected-Actual</li><li>Operator:></li><li>Threshold:0</li><li>ExpectedValue:FixValue=9</li></ul> |
In the example, assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 - 9 > 0 is true, which means that the number of rows with null values in the column has exceeded the threshold, and the task is judged to fail.
@ -50,7 +51,6 @@ The goal of the null value check is to check the number of empty rows in the spe
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
- The SQL to calculate the total number of rows in the table is as follows:
```sql
@ -62,7 +62,7 @@ The goal of the null value check is to check the number of empty rows in the spe
![dataquality_null_check](../../../img/tasks/demo/null_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -75,7 +75,9 @@ The goal of the null value check is to check the number of empty rows in the spe
| Expected value type | Select the desired type from the drop-down menu. |
## Timeliness Check of Single Table Check
### Inspection Introduction
The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail.
### Interface Operation Guide
@ -83,9 +85,9 @@ The timeliness check is used to check whether the data is processed within the e
![dataquality_timeliness_check](../../../img/tasks/demo/timeliness_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc.
| Source data source | The corresponding data source under the source data type.
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions  | As the name suggests; they will also be used when counting the total number of rows in the table. Optional.                                                                                                                                    |
| Src table check column | Drop-down to select check column name. |
@ -101,6 +103,7 @@ The timeliness check is used to check whether the data is processed within the e
## Field Length Check for Single Table Check
### Inspection Introduction
The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail.
### Interface Operation Guide
@ -108,7 +111,7 @@ The goal of field length verification is to check whether the length of the sele
![dataquality_length_check](../../../img/tasks/demo/field_length_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -125,6 +128,7 @@ The goal of field length verification is to check whether the length of the sele
## Uniqueness Check for Single Table Check
### Inspection Introduction
The goal of the uniqueness check is to check whether the fields are duplicated. It is generally used to check whether the primary key is duplicated. If there are duplicates and the threshold is reached, the check task will be judged to be failed.
### Interface Operation Guide
@ -132,7 +136,7 @@ The goal of the uniqueness check is to check whether the fields are duplicated.
![dataquality_uniqueness_check](../../../img/tasks/demo/uniqueness_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -147,6 +151,7 @@ The goal of the uniqueness check is to check whether the fields are duplicated.
## Regular Expression Check for Single Table Check
### Inspection Introduction
The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
### Interface Operation Guide
@ -154,7 +159,7 @@ The goal of regular expression verification is to check whether the format of th
![dataquality_regex_check](../../../img/tasks/demo/regexp_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -168,7 +173,9 @@ The goal of regular expression verification is to check whether the format of th
| Expected value type | Select the desired type from the drop-down menu. |
## Enumeration Value Validation for Single Table Check
### Inspection Introduction
The goal of enumeration value verification is to check whether the value of a field is within the range of the enumeration value. If there is data that is not in the range of the enumeration value and exceeds the threshold, the task will be judged to fail.
### Interface Operation Guide
@ -176,7 +183,7 @@ The goal of enumeration value verification is to check whether the value of a fi
![dataquality_enum_check](../../../img/tasks/demo/enumeration_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -192,6 +199,7 @@ The goal of enumeration value verification is to check whether the value of a fi
## Table Row Number Verification for Single Table Check
### Inspection Introduction
The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
### Interface Operation Guide
@ -199,7 +207,7 @@ The goal of table row number verification is to check whether the number of rows
![dataquality_count_check](../../../img/tasks/demo/table_count_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
@ -218,7 +226,7 @@ The goal of table row number verification is to check whether the number of rows
![dataquality_custom_sql_check](../../../img/tasks/demo/custom_sql_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
@ -232,12 +240,14 @@ The goal of table row number verification is to check whether the number of rows
| Expected value type | Select the desired type from the drop-down menu. |
## Accuracy Check of Multi-table
### Inspection Introduction
Accuracy checks are performed by comparing the accuracy differences of data records for selected fields between two tables, examples are as follows
- table test1
| c1 | c2 |
| :---: | :---: |
|:--:|:--:|
| a | 1 |
| b | 2 |
@ -255,7 +265,7 @@ If you compare the data in c1 and c21, the tables test1 and test2 are exactly th
![dataquality_multi_table_accuracy_check](../../../img/tasks/demo/multi_table_accuracy_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
@ -271,7 +281,9 @@ If you compare the data in c1 and c21, the tables test1 and test2 are exactly th
| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
## Comparison of the values checked by the two tables
### Inspection Introduction
Two-table value comparison allows users to customize different SQL statistics for two tables and compare the corresponding values. For example, calculate sum1 as the total of a certain column in source table A and sum2 as the total of a certain column in the target table, then compare sum1 and sum2 to determine the check result.
### Interface Operation Guide
@ -279,7 +291,7 @@ Two-table value comparison allows users to customize different SQL statistics fo
![dataquality_multi_table_comparison_check](../../../img/tasks/demo/multi_table_comparison_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | The table where the data is to be verified. |

3
docs/docs/en/guide/datasource/athena.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|----------------------------|-----------------------------------------------------------|
| Datasource | Select ATHENA. |
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |
@ -20,3 +20,4 @@
- No. Read the example in the `DataSource Center` section of [datasource-setting](../howto/datasource-setting.md) to activate this datasource.
- JDBC driver configuration reference document [athena-connect-with-jdbc](https://docs.amazonaws.cn/athena/latest/ug/connect-with-jdbc.html)
- Driver download link [SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar](https://s3.cn-north-1.amazonaws.com.cn/athena-downloads-cn/drivers/JDBC/SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar)

2
docs/docs/en/guide/datasource/clickhouse.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|---------------------------------------------------------------|
| Datasource | Select CLICKHOUSE. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |

2
docs/docs/en/guide/datasource/db2.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|--------------------------------------------------------|
| Datasource | Select DB2. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |

2
docs/docs/en/guide/datasource/hive.md

@ -7,7 +7,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|----------------------------|---------------------------------------------------------|
| Datasource | Select HIVE. |
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |

2
docs/docs/en/guide/datasource/mysql.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|----------------------------|----------------------------------------------------------|
| Datasource | Select MYSQL. |
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |

2
docs/docs/en/guide/datasource/oracle.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|-----------------------------------------------------------|
| Datasource | Select Oracle. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |

2
docs/docs/en/guide/datasource/postgresql.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|----------------------------|---------------------------------------------------------------|
| Datasource | Select POSTGRESQL. |
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |

3
docs/docs/en/guide/datasource/presto.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|-----------------------------------------------------------|
| Datasource | Select Presto. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |
@ -16,7 +16,6 @@
| Database Name | Enter the database name of the Presto connection. |
| jdbc connect parameters | Parameter settings for Presto connection, in JSON format. |
## Native Supported
Yes, you can use this datasource by default.

2
docs/docs/en/guide/datasource/redshift.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|-------------------------------------------------------------|
| Datasource | Select Redshift. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |

2
docs/docs/en/guide/datasource/spark.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|----------------------------|----------------------------------------------------------|
| Datasource | Select Spark. |
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |

2
docs/docs/en/guide/datasource/sqlserver.md

@ -5,7 +5,7 @@
## Datasource Parameters
| **Datasource** | **Description** |
| --- | --- |
|-------------------------|--------------------------------------------------------------|
| Datasource | Select SQLSERVER. |
| Datasource Name | Enter the name of the datasource. |
| Description | Enter a description of the datasource. |

12
docs/docs/en/guide/expansion-reduction.md

@ -14,7 +14,6 @@ This article describes how to add a new master service or worker service to an e
* [required] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (version 1.8+): must install, install and configure `JAVA_HOME` and `PATH` variables under `/etc/profile`
* [optional] If the expansion is a worker node, you need to consider whether to install an external client, such as Hadoop, Hive, Spark Client.
```markdown
Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
```
@ -74,8 +73,7 @@ sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
zookeeper.properties: information for connecting zk
common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
dolphinscheduler_env.sh: environment Variables
````
```
- Modify the `dolphinscheduler_env.sh` environment variables in the `bin/env/dolphinscheduler_env.sh` file according to the machine configuration (the following example assumes all the software used is installed under `/opt/soft`)
```shell
@ -94,15 +92,12 @@ sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
`Attention: This step is very important. For example, `JAVA_HOME` and `PATH` must be configured; variables that are not used can simply be ignored or commented out.`
- Soft link the `JDK` to `/usr/bin/java` (still using `JAVA_HOME=/opt/soft/java` as an example)
```shell
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
```
- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
* To add a new master node, you need to modify the IPs and masters parameters.
* To add a new worker node, modify the IPs and workers parameters.
@ -120,6 +115,7 @@ masters="existing master01,existing master02,ds1,ds2"
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```
- If the expansion is for worker nodes, you need to set the worker group, refer to the security of the [Worker grouping](./security.md)
- On all new nodes, change the directory permissions so that the deployment user has access to the DolphinScheduler directory
@ -222,13 +218,12 @@ bash bin/dolphinscheduler-daemon.sh start alert-server # start alert service
ApiApplicationServer ----- api service
AlertServer ----- alert service
```
If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
### Modify the Configuration File
- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
* to scale down the master node, modify the IPs and masters parameters.
* to scale down worker nodes, modify the IPs and workers parameters.
@ -246,3 +241,4 @@ masters="existing master01,existing master02,ds1,ds2"
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```

1
docs/docs/en/guide/healthcheck.md

@ -39,3 +39,4 @@ curl --request GET 'http://localhost:50053/actuator/health'
```
> Notice: If you modify the default service port and address, you need to modify the IP+Port to the modified value.

5
docs/docs/en/guide/howto/datasource-setting.md

@ -5,7 +5,7 @@
We here use MySQL as an example to illustrate how to configure an external database:
> NOTE: If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler
which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
> which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
* First of all, follow the instructions in the `Pseudo-Cluster/Cluster Initialize the Database` section of [datasource-setting](datasource-setting.md) to create and initialize the database
* Set the following environment variables in your terminal with your database address, username and password for `{address}`, `{user}` and `{password}`:
@ -27,7 +27,6 @@ DolphinScheduler stores metadata in `relational database`. Currently, we support
> If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
For mysql 5.6 / 5.7
```shell
@ -58,6 +57,7 @@ mysql> FLUSH PRIVILEGES;
```
For PostgreSQL:
```shell
# Use psql-tools to login PostgreSQL
psql
@ -128,3 +128,4 @@ like Docker.
> But if you want to use MySQL as the metadata database of DolphinScheduler, it only supports version [8.0.16 and above](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar).
[mysql]: https://downloads.MySQL.com/archives/c-j/

5
docs/docs/en/guide/installation/pseudo-cluster.md

@ -154,7 +154,7 @@ bash ./bin/install.sh
```
> **_Note:_** For a first-time deployment, `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear up to five times in the terminal;
this is unimportant information that you can ignore.
> this is unimportant information that you can ignore.
## Login DolphinScheduler
@ -190,7 +190,7 @@ bash ./bin/dolphinscheduler-daemon.sh stop alert-server
> for micro-services need. It means that you could start all servers by command `<service>/bin/start.sh` with different
> environment variable from `<service>/conf/dolphinscheduler_env.sh`. But it will use file `bin/env/dolphinscheduler_env.sh` overwrite
> `<service>/conf/dolphinscheduler_env.sh` if you start server with command `/bin/dolphinscheduler-daemon.sh start <service>`.
>
> **_Note2:_**: Please refer to the section of "System Architecture Design" for service usage. Python gateway service is
> started along with the api-server, and if you do not want to start the Python gateway service please disable it by changing
> the yaml config `python-gateway.enabled : false` in api-server's configuration path `api-server/conf/application.yaml`
@ -198,3 +198,4 @@ bash ./bin/dolphinscheduler-daemon.sh stop alert-server
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html
[issue]: https://github.com/apache/dolphinscheduler/issues/6597

5
docs/docs/en/guide/integration/rainbond.md

@ -15,7 +15,7 @@ This section describes the one-click deployment of high availability DolphinSche
* Click `install` on the right side of DolphinScheduler to go to the installation page. Fill in the corresponding information and click `OK` to start the installation. You will get automatically redirected to the application view.
| Select item | Description |
| ------------ | ------------------------------------ |
|--------------|-------------------------------------|
| Team name    | User workspace, isolated by namespace |
| Cluster name | Select the Kubernetes cluster         |
| Select app   | Select the application                |
@ -42,6 +42,7 @@ Take `worker` as an example: enter the `component -> Telescopic` page, and set t
To verify `worker` node, enter `DolphinScheduler UI -> Monitoring -> Worker` page to view detailed node information.
![](../../../../img/rainbond/monitor-dolphinscheduler.png)
## Configuration file
API and Worker Services share the configuration file `/opt/dolphinscheduler/conf/common.properties`. To modify the configurations, you only need to modify that of the API service.
@ -61,4 +62,6 @@ Take `DataX` as an example:
* LOCK_PATH:/opt/soft
3. Update the component; the plug-in `DataX` will be downloaded automatically and decompressed to `/opt/soft`
![](../../../../img/rainbond/plugin.png)
---

2
docs/docs/en/guide/metrics/metrics.md

@ -78,7 +78,6 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- ds.task.execution.count: (counter) the number of executed tasks
- ds.task.execution.duration: (histogram) duration of task executions
### Workflow Related Metrics
- ds.workflow.create.command.count: (counter) the number of commands created and inserted by workflows
@ -175,3 +174,4 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- system.load.average.1m: the total number of runnable entities queued to available processors and runnable entities running on the available processors averaged over a period
- logback.events: the number of events that made it to the logs grouped by the tag `level`
- http.server.requests: total number of http requests

2
docs/docs/en/guide/monitor.md

@ -29,7 +29,7 @@
![statistics](../../../img/new_ui/dev/monitor/statistics.png)
| **Parameter** | **Description** |
| ----- | ----- |
|----------------------------------------|----------------------------------------------------|
| Number of commands waiting to be executed | Statistics of the `t_ds_command` table data.       |
| The number of failed commands             | Statistics of the `t_ds_error_command` table data. |
| Number of tasks waiting to run            | Count of the `task_queue` data in ZooKeeper.       |

4
docs/docs/en/guide/parameter/built-in.md

@ -3,7 +3,7 @@
## Basic Built-in Parameter
| Variable | Declaration Method | Meaning |
| ---- | ---- | -----------------------------|
|--------------------|-------------------------|---------------------------------------------------------------------------------------------|
| system.biz.date | `${system.biz.date}` | The day before the schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
| system.biz.curdate | `${system.biz.curdate}` | The schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
| system.datetime | `${system.datetime}` | The schedule time of the daily scheduling instance, the format is `yyyyMMddHHmmss` |
@ -22,7 +22,6 @@
- N years before:`$[add_months(yyyyMMdd,-12*N)]`
- Next N months:`$[add_months(yyyyMMdd,N)]`
- N months before:`$[add_months(yyyyMMdd,-N)]`
2. Add or subtract numbers directly after the time format.
- Next N weeks:`$[yyyyMMdd+7*N]`
- First N weeks:`$[yyyyMMdd-7*N]`
@ -32,3 +31,4 @@
- First N hours:`$[HHmmss-N/24]`
- Next N minutes:`$[HHmmss+N/24/60]`
- First N minutes:`$[HHmmss-N/24/60]`

2
docs/docs/en/guide/project/project-list.md

@ -3,7 +3,7 @@
This page describes details regarding Project screen in Apache DolphinScheduler. Here, you will see all the functions which can be handled in this screen. The following table explains commonly used terms in Apache DolphinScheduler:
| Glossary | description |
| ------ |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DAG | Tasks in a workflow are assembled in form of Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. |
| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |

2
docs/docs/en/guide/project/task-definition.md

@ -1,6 +1,7 @@
# Task Definition
## Batch Task Definition
Task definition allows you to modify or operate tasks at the task level rather than editing them inside a workflow definition.
We already have a workflow-level task editor in [workflow definition](workflow-definition.md): you can click a specific
workflow and then edit its task definition. It is frustrating when you want to edit a task definition but only remember its
@ -14,6 +15,7 @@ name but forget which workflow it belongs to. It is also supported query by the
`Workflow Name`
## Stream Task Definition
Stream task definitions are created in the workflow definition, and can be modified and executed.
![task-definition](../../../../img/new_ui/dev/project/stream-task-definition.png)

2
docs/docs/en/guide/project/task-instance.md

@ -1,6 +1,7 @@
# Task Instance
## Batch Task Instance
### Create Task Instance
Click `Project Management -> Workflow -> Task Instance` to enter the task instance page, as shown in the figure below. Click the name of a workflow instance to jump to its DAG diagram and view the task status.
@ -21,3 +22,4 @@ Click the `View Log` button in the operation column to view the log of the task
- SavePoint: Click the `SavePoint` button in the operation column to take a savepoint of the stream task.
- Stop: Click the `Stop` button in the operation column to stop the stream task.

6
docs/docs/en/guide/project/workflow-definition.md

@ -35,6 +35,7 @@ Click the plus sign on the right of the task node to connect the task; as shown
![workflow-dependent](../../../../img/new_ui/dev/project/workflow-dependent.png)
### Dependencies with stream task
If the DAG contains stream tasks, the relationship between stream tasks is displayed as a dotted line, and the execution of stream tasks will be skipped when the workflow instance is executed.
![workflow-dependent](../../../../img/new_ui/dev/project/workflow-definition-with-stream-task.png)
@ -103,7 +104,6 @@ The following are the operation functions of the workflow definition list:
* Cc: when the selected notification policy is triggered, a timeout alarm fires, or fault tolerance occurs, the process result information or warning email will also be copied to the CC list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
* Serial complement: within the specified date range, complements run from the start date to the end date and generate the process instances in turn. Click Run workflow and select the serial complement mode: for example, from July 9 to July 10, the runs execute in sequence and two process instances are generated one after another on the process instance page.
![workflow-serial](../../../../img/new_ui/dev/project/workflow-serial.png)
@ -143,6 +143,7 @@ The following are the operation functions of the workflow definition list:
![workflow-configuredTiming](../../../../img/new_ui/dev/project/workflow-configuredTiming.png)
![workflow-configuredTimingResult](../../../../img/new_ui/dev/project/workflow-configuredTimingResult.png)
## Run the task alone
- Right-click the task and click the `Start` button (only online tasks can be clicked to run).
@ -160,12 +161,15 @@ The following are the operation functions of the workflow definition list:
![workflow-time01](../../../../img/new_ui/dev/project/workflow-time01.png)
- Select a start and end time. Within the start and end time range, the workflow is run regularly; outside the start and end time range, no timed workflow instance will be generated.
- Add a schedule that executes once every 5 minutes, as shown in the following figure:
![workflow-time02](../../../../img/new_ui/dev/project/workflow-time02.png)
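- Under the hood the schedule is expressed as a crontab expression (Quartz-style, with a leading seconds field), so a run every 5 minutes would typically look something like `0 */5 * * * ? *`.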
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
- Click the "Create" button to create the schedule. The schedule status is now "**Offline**" and it needs to be brought **Online** to take effect.
- Schedule online: Click the `Timing Management` button <img src="../../../../img/timeManagement.png" width="35"/> to enter the timing management page, then click the `online` button. The schedule status will change to `online`, as shown in the figure below, and the workflow will then run on schedule.
![workflow-time03](../../../../img/new_ui/dev/project/workflow-time03.png)

8
docs/docs/en/guide/project/workflow-instance.md

@ -43,15 +43,23 @@ Click `Project Management -> Workflow -> Workflow Instance`, enter the workflow
![workflow-instance](../../../../img/new_ui/dev/project/workflow-instance.png)
- **Edit:** Only processes with success/failed/stop status can be edited. Click the "Edit" button or the workflow instance name to enter the DAG edit page. After the edit, click the "Save" button to confirm, as shown in the figure below. In the pop-up box, check "Whether to update the workflow definition", after saving, the information modified by the instance will be updated to the workflow definition; if not checked, the workflow definition would not be updated.
<p align="center">
<img src="../../../../img/editDag-en.png" width="80%" />
</p>
- **Rerun:** Re-execute the terminated process
- **Recovery Failed:** For failed processes, you can perform failure recovery operations, starting from the failed node
- **Stop:** **Stop** the running process, the background code will first `kill` the worker process, and then execute `kill -9` operation
- **Pause:** **Pause** the running process, the system status will change to **waiting for execution**, it will wait for the task to finish, and pause the next sequence task.
- **Resume pause:** Resume the paused process, start running directly from the **paused node**
- **Delete:** Delete the workflow instance and the task instance under the workflow instance
- **Gantt Chart:** The vertical axis of the Gantt chart is the topological sorting of task instances of the workflow instance, and the horizontal axis is the running time of the task instances, as shown in the figure:
![instance-gantt](../../../../img/new_ui/dev/project/instance-gantt.png)

1
docs/docs/en/guide/resource/file-manage.md

@ -59,6 +59,7 @@ In the workflow definition module of project Manage, create a new workflow using
- Script: 'sh hello.sh'
- Resource: Select 'hello.sh'
> Notice: When using a resource file in the script, reference it by the full path of the selected resource.
> For example: if the resource path is `/resource/hello.sh`, you need to use `/resource/hello.sh` in the script.
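
For instance, a minimal sketch of the Script field for the example above, referencing the resource by its full path:

```shell
sh /resource/hello.sh
```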

7
docs/docs/en/guide/security.md

@ -164,10 +164,10 @@ Create a task node in the workflow definition, select the worker group and the e
## Cluster Management
> Add or update cluster
- Each process can be related to zero or several clusters to support multiple environment, now just support k8s.
> - Each process can be related to zero or several clusters to support multiple environment, now just support k8s.
>
> Usage cluster
- After creation and authorization, k8s namespaces and processes will associate clusters. Each cluster will have separate workflows and task instances running independently.
> - After creation and authorization, k8s namespaces and processes will associate clusters. Each cluster will have separate workflows and task instances running independently.
![create-cluster](../../../img/new_ui/dev/security/create-cluster.png)
@ -183,4 +183,3 @@ Create a task node in the workflow definition, select the worker group and the e
![create-environment](../../../img/new_ui/dev/security/create-namespace.png)

1
docs/docs/en/guide/start/docker.md

@ -71,7 +71,6 @@ $ docker-compose --profile all up -d
[Using docker-compose to start server](#using-docker-compose-to-start-server) will create a new database and a ZooKeeper
container when it starts up. You can start the DolphinScheduler servers separately if you want to reuse your existing services.
```shell
$ DOLPHINSCHEDULER_VERSION=<version>
# Initialize the database, make sure database <DATABASE> already exists

1
docs/docs/en/guide/task/java.md

@ -9,6 +9,7 @@ This node is for executing java-type tasks and supports using files and jar pack
- Drag the toolbar's Java task node to the palette.
# Task Parameters
| **Parameter** | **Description** |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Node Name | The name of the task. Node names within a workflow definition must be unique. |

1
docs/docs/en/guide/upgrade/incompatible.md

@ -7,3 +7,4 @@ This document records the incompatible updates between each version. You need to
## 3.0.0
* Copy and import workflow without 'copy' suffix [#10607](https://github.com/apache/dolphinscheduler/pull/10607)

4
docs/docs/en/guide/upgrade/upgrade.md

@ -61,7 +61,7 @@ The architecture of worker group is different between version before version 1.3
* Check the backup database, search the records in the `t_ds_worker_group` table, and focus mainly on three columns: `id`, `name` and `ip_list`.
| id | name | ip_list |
| :--- | :---: | ---: |
|:---|:--------:|----------------------------:|
| 1 | service1 | 192.168.xx.10 |
| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
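
A sketch of pulling those columns out of the backup database would be a query along the lines of `SELECT id, name, ip_list FROM t_ds_worker_group;`, run against wherever the backup was restored.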
@ -70,7 +70,7 @@ The architecture of worker group is different between version before version 1.3
Assume below are the machines where the worker service is to be deployed:
| hostname | ip |
| :--- | :---: |
|:---------|:-------------:|
| ds1 | 192.168.xx.10 |
| ds2 | 192.168.xx.11 |
| ds3 | 192.168.xx.12 |

1
docs/docs/en/history-versions.md

@ -79,3 +79,4 @@
### Versions:Dev
#### Links:[Dev Document](../dev/user_doc/about/introduction.md)

1
docs/docs/zh/DSIP.md

@ -83,3 +83,4 @@ integer in [All DSIPs][all-DSIPs] issues.
[github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose
[mail-to-dev]: mailto:dev@dolphinscheduler.apache.org
[DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407

1
docs/docs/zh/about/features.md

@ -17,3 +17,4 @@
## High Scalability
- **高扩展性**: 支持多租户和在线资源管理。支持每天10万个数据任务的稳定运行。

1
docs/docs/zh/about/glossary.md

@ -50,4 +50,3 @@
- dolphinscheduler-ui 前端模块

10
docs/docs/zh/about/hardware.md

@ -5,7 +5,7 @@ DolphinScheduler 作为一款开源分布式工作流任务调度系统,可以
## 1. Linux 操作系统版本要求
| 操作系统 | 版本 |
| :----------------------- | :----------: |
|:-------------------------|:---------:|
| Red Hat Enterprise Linux | 7.0 及以上 |
| CentOS | 7.0 及以上 |
| Oracle Enterprise Linux | 7.0 及以上 |
@ -15,29 +15,29 @@ DolphinScheduler 作为一款开源分布式工作流任务调度系统,可以
> 以上 Linux 操作系统可运行在物理服务器以及 VMware、KVM、XEN 主流虚拟化环境上
## 2. 服务器建议配置
DolphinScheduler 支持运行在 Intel x86-64 架构的 64 位通用硬件服务器平台。对生产环境的服务器硬件配置有以下建议:
### 生产环境
| **CPU** | **内存** | **硬盘类型** | **网络** | **实例数量** |
| --- | --- | --- | --- | --- |
|---------|--------|----------|--------|----------|
| 4核+ | 8 GB+ | SAS | 千兆网卡 | 1+ |
> **注意:**
> - 以上建议配置为部署 DolphinScheduler 的最低配置,生产环境强烈推荐使用更高的配置
> - 硬盘大小配置建议 50GB+ ,系统盘和数据盘分开
## 3. 网络要求
DolphinScheduler正常运行提供如下的网络端口配置:
| 组件 | 默认端口 | 说明 |
| --- | --- | --- |
|----------------------|-------|-------------------|
| MasterServer | 5678 | 非通信端口,只需本机端口不冲突即可 |
| WorkerServer | 1234 | 非通信端口,只需本机端口不冲突即可 |
| ApiApplicationServer | 12345 | 提供后端通信端口 |
> **注意:**
> - MasterServer 和 WorkerServer 不需要开启网络间通信,只需本机端口不冲突即可
> - 管理员可根据实际环境中 DolphinScheduler 组件部署方案,在网络侧和主机侧开放相关端口

17
docs/docs/zh/architecture/configuration.md

@ -1,9 +1,11 @@
<!-- markdown-link-check-disable -->
# 前言
本文档为dolphinscheduler配置文件说明文档。
# 目录结构
DolphinScheduler的目录结构如下:
```
@ -98,11 +100,13 @@ DolphinScheduler的目录结构如下:
# 配置文件详解
## dolphinscheduler-daemon.sh [启动/关闭DolphinScheduler服务脚本]
dolphinscheduler-daemon.sh脚本负责DolphinScheduler的启动&关闭.
start-all.sh/stop-all.sh最终也是通过dolphinscheduler-daemon.sh对集群进行启动/关闭操作.
目前DolphinScheduler只是做了一个基本的设置,JVM参数请根据各自资源的实际情况自行设置.
默认简化参数如下:
```bash
export DOLPHINSCHEDULER_OPTS="
-server
@ -120,6 +124,7 @@ export DOLPHINSCHEDULER_OPTS="
> 不建议设置"-XX:DisableExplicitGC" , DolphinScheduler使用Netty进行通讯,设置该参数,可能会导致内存泄漏.
## 数据库连接相关配置
在DolphinScheduler中使用Spring Hikari对数据库连接进行管理,配置文件位置:
|服务名称| 配置文件 |
@ -149,8 +154,8 @@ export DOLPHINSCHEDULER_OPTS="
DolphinScheduler同样可以通过设置环境变量进行数据库连接相关的配置, 将以上小写字母转成大写并把`.`换成`_`作为环境变量名, 设置值即可。
## Zookeeper相关配置
DolphinScheduler使用Zookeeper进行集群管理、容错、事件监听等功能,配置文件位置:
|服务名称| 配置文件 |
|--|--|
@ -175,6 +180,7 @@ DolphinScheduler使用Zookeeper进行集群管理、容错、事件监听等功
DolphinScheduler同样可以通过`bin/env/dolphinscheduler_env.sh`进行Zookeeper相关的配置。
## common.properties [hadoop、s3、yarn配置]
common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置,配置文件位置:
|服务名称| 配置文件 |
|--|--|
@ -217,6 +223,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|zeppelin.rest.url | http://localhost:8080 | zeppelin RESTful API 接口地址|
## Api-server相关配置
位置:`api-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@ -245,6 +252,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|traffic.control.customize-tenant-qps-rate||自定义租户最大请求数/秒限制|
## Master Server相关配置
位置:`master-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@ -266,6 +274,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|master.registry-disconnect-strategy.max-waiting-time|100s|当Master与注册中心失联之后重连时间, 之后当strategy为waiting时,该值生效。 该值表示当Master与注册中心失联时会在给定时间之内进行重连, 在给定时间之内重连失败将会停止自己,在重连时,Master会丢弃目前正在执行的工作流,值为0表示会无限期等待 |
## Worker Server相关配置
位置:`worker-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@ -282,16 +291,16 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|worker.registry-disconnect-strategy.strategy|stop|当Worker与注册中心失联之后采取的策略, 默认值是: stop. 可选值包括: stop, waiting|
|worker.registry-disconnect-strategy.max-waiting-time|100s|当Worker与注册中心失联之后重连时间, 之后当strategy为waiting时,该值生效。 该值表示当Worker与注册中心失联时会在给定时间之内进行重连, 在给定时间之内重连失败将会停止自己,在重连时,Worker会丢弃kill正在执行的任务。值为0表示会无限期等待 |
## Alert Server相关配置
位置:`alert-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
|server.port|50053|Alert Server监听端口|
|alert.port|50052|alert监听端口|
## Quartz相关配置
这里面主要是quartz配置,请结合实际业务场景&资源进行配置,本文暂时不做展开,配置文件位置:
|服务名称| 配置文件 |
@ -319,7 +328,6 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
## dolphinscheduler_env.sh [环境变量配置]
通过类似shell方式提交任务的的时候,会加载该配置文件中的环境变量到主机中。涉及到的 `JAVA_HOME` 任务类型的环境配置,其中任务类型主要有: Shell任务、Python任务、Spark任务、Flink任务、Datax任务等等。
@ -342,6 +350,7 @@ export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:
```
## 日志相关配置
|服务名称| 配置文件 |
|--|--|
|Master Server | `master-server/conf/logback-spring.xml`|

19
docs/docs/zh/architecture/design.md

@ -3,6 +3,7 @@
## 系统架构
### 系统架构图
<p align="center">
<img src="../../../img/architecture-1.3.0.jpg" alt="系统架构图" width="70%" />
<p align="center">
@ -11,6 +12,7 @@
</p>
### 启动流程活动图
<p align="center">
<img src="../../../img/process-start-flow-1.3.0.png" alt="启动流程活动图" width="70%" />
<p align="center">
@ -47,6 +49,7 @@
WorkerServer也采用分布式无中心设计理念,WorkerServer主要负责任务的执行和提供日志服务。
WorkerServer服务启动时向Zookeeper注册临时节点,并维持心跳。
WorkerServer基于netty提供监听服务。
##### 该服务包含:
- **WorkerManagerThread**主要负责任务队列的提交,不断从任务队列中领取任务,提交到线程池处理;
@ -79,6 +82,7 @@
##### 中心化思想
中心化的设计理念比较简单,分布式集群中的节点按照角色分工,大体上分为两种角色:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave角色" width="50%" />
</p>
@ -86,14 +90,13 @@
- Master的角色主要负责任务分发并监督Slave的健康状态,可以动态的将任务均衡到Slave上,以致Slave节点不至于“忙死”或”闲死”的状态。
- Worker的角色主要负责任务的执行工作并维护和Master的心跳,以便Master可以分配任务给Slave。
中心化思想设计存在的问题:
- 一旦Master出现了问题,则群龙无首,整个集群就会崩溃。为了解决这个问题,大多数Master/Slave架构模式都采用了主备Master的设计方案,可以是热备或者冷备,也可以是自动切换或手动切换,而且越来越多的新系统都开始具备自动选举切换Master的能力,以提升系统的可用性。
- 另外一个问题是如果Scheduler在Master上,虽然可以支持一个DAG中不同的任务运行在不同的机器上,但是会产生Master的过负载。如果Scheduler在Slave上,则一个DAG中所有的任务都只能在某一台机器上进行作业提交,则并行任务比较多的时候,Slave的压力可能会比较大。
##### 去中心化
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="去中心化" width="50%" />
</p>
@ -101,10 +104,10 @@
- 在去中心化设计里,通常没有Master/Slave的概念,所有的角色都是一样的,地位是平等的,全球互联网就是一个典型的去中心化的分布式系统,联网的任意节点设备down机,都只会影响很小范围的功能。
- 去中心化设计的核心设计在于整个分布式系统中不存在一个区别于其他节点的”管理者”,因此不存在单点故障问题。但由于不存在” 管理者”节点所以每个节点都需要跟其他节点通信才得到必须要的机器信息,而分布式系统通信的不可靠性,则大大增加了上述功能的实现难度。
- 实际上,真正去中心化的分布式系统并不多见。反而动态中心化分布式系统正在不断涌出。在这种架构下,集群中的管理者是被动态选择出来的,而不是预置的,并且集群在发生故障的时候,集群的节点会自发的举行"会议"来选举新的"管理者"去主持工作。最典型的案例就是ZooKeeper及Go语言实现的Etcd。
- DolphinScheduler的去中心化是Master/Worker注册心跳到Zookeeper中,Master基于slot处理各自的Command,通过selector分发任务给worker,实现Master集群和Worker集群无中心。
#### 二、容错设计
容错分为服务宕机容错和任务重试,服务宕机容错又分为Master容错和Worker容错两种情况
##### 宕机容错
@ -160,37 +163,35 @@
如果工作流中有任务失败达到最大重试次数,工作流就会失败停止,失败的工作流可以手动进行重跑操作或者流程恢复操作。
#### 四、任务优先级设计
在早期调度设计中,如果没有优先级设计,采用公平调度设计的话,会遇到先行提交的任务可能会和后继提交的任务同时完成的情况,而不能做到设置流程或者任务的优先级,因此我们对此进行了重新设计,目前我们设计如下:
- 按照**不同流程实例优先级**优先于**同一个流程实例优先级**优先于**同一流程内任务优先级**优先于**同一流程内任务**提交顺序依次从高到低进行任务处理。
- 具体实现是根据任务实例的json解析优先级,然后把**流程实例优先级_流程实例id_任务优先级_任务id**信息保存在ZooKeeper任务队列中,当从任务队列获取的时候,通过字符串比较即可得出最需要优先执行的任务
- 其中流程定义的优先级是考虑到有些流程需要先于其他流程进行处理,这个可以在流程启动或者定时启动时配置,共有5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="流程优先级配置" width="40%" />
</p>
- 任务的优先级也分为5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="任务优先级配置" width="35%" />
</p>
#### 五、Logback和netty实现日志访问
- 由于Web(UI)和Worker不一定在同一台机器上,所以查看日志不能像查询本地文件那样。有两种方案:
- 将日志放到ES搜索引擎上
- 通过netty通信获取远程日志信息
- 介于考虑到尽可能的DolphinScheduler的轻量级性,所以选择了gRPC实现远程访问日志信息。
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc远程访问" width="50%" />
</p>
- 详情可参考Master和Worker的logback配置,如下示例:
```xml
@ -217,6 +218,6 @@
```
## 总结
本文从调度出发,初步介绍了大数据分布式工作流调度系统--DolphinScheduler的架构原理及实现思路。未完待续
本文从调度出发,初步介绍了大数据分布式工作流调度系统--DolphinScheduler的架构原理及实现思路。未完待续

2
docs/docs/zh/architecture/load-balance.md

@ -1,4 +1,5 @@
### 负载均衡
负载均衡即通过路由算法(通常是集群环境),合理的分摊服务器压力,达到服务器性能的最大优化。
### DolphinScheduler-Worker 负载均衡算法
@ -56,3 +57,4 @@ eg:master.host.selector=random(不区分大小写)
* worker.max.cpuload.avg=-1 (worker最大cpuload均值,只有高于系统cpuload均值时,worker服务才能被派发任务. 默认值为-1: cpu cores * 2)
* worker.reserved.memory=0.3 (worker预留内存,只有低于系统可用内存时,worker服务才能被派发任务,单位为G)

6
docs/docs/zh/architecture/metadata.md

@ -1,11 +1,13 @@
# DolphinScheduler 元数据文档
## 表Schema
详见`dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`目录下的sql文件
## E-R图
### 用户 队列 数据源
![image.png](../../../img/metadata-erd/user-queue-datasource.png)
- 一个租户下可以有多个用户;<br />
@ -13,6 +15,7 @@
- `t_ds_datasource`表中的`user_id`字段表示创建该数据源的用户,`t_ds_relation_datasource_user`中的`user_id`表示对数据源有权限的用户;<br />
### 项目 资源 告警
![image.png](../../../img/metadata-erd/project-resource-alert.png)
- 一个用户可以有多个项目,用户项目授权通过`t_ds_relation_project_user`表完成project_id和user_id的关系绑定;<br />
@ -21,6 +24,7 @@
- `t_ds_udfs`表中的`user_id`表示创建该UDF的用户,`t_ds_relation_udfs_user`表中的`user_id`表示对UDF有权限的用户;<br />
### 项目 - 租户 - 工作流定义 - 定时
![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
- 一个项目可以有多个工作流定义,每个工作流定义只属于一个项目;<br />
@ -28,10 +32,10 @@
- 一个工作流定义可以有一个或多个定时的配置;<br />
### 工作流定义和执行
![image.png](../../../img/metadata-erd/process_definition.png)
- 一个工作流定义对应多个任务定义,通过`t_ds_process_task_relation`进行关联,关联的key是`code + version`,当任务的前置节点为空时,对应的`pre_task_node`和`pre_task_version`为0;
- 一个工作流定义可以有多个工作流实例`t_ds_process_instance`,一个工作流实例对应一个或多个任务实例`t_ds_task_instance`;
- `t_ds_relation_process_instance`表存放的数据用于处理流程定义中含有子流程的情况,`parent_process_instance_id`表示含有子流程的主流程实例id,`process_instance_id`表示子流程实例的id,`parent_task_instance_id`表示子流程节点的任务实例id,流程实例表和任务实例表分别对应`t_ds_process_instance`表和`t_ds_task_instance`表;

311
docs/docs/zh/architecture/task-structure.md

@ -1,32 +1,31 @@
# 任务总体存储结构
在dolphinscheduler中创建的所有任务都保存在t_ds_process_definition 表中.
该数据库表结构如下表所示:
序号 | 字段 | 类型 | 描述
-------- | ---------| -------- | ---------
1|id|int(11)|主键
2|name|varchar(255)|流程定义名称
3|version|int(11)|流程定义版本
4|release_state|tinyint(4)|流程定义的发布状态:0 未上线 , 1已上线
5|project_id|int(11)|项目id
6|user_id|int(11)|流程定义所属用户id
7|process_definition_json|longtext|流程定义JSON
8|description|text|流程定义描述
9|global_params|text|全局参数
10|flag|tinyint(4)|流程是否可用:0 不可用,1 可用
11|locations|text|节点坐标信息
12|connects|text|节点连线信息
13|receivers|text|收件人
14|receivers_cc|text|抄送人
15|create_time|datetime|创建时间
16|timeout|int(11) |超时时间
17|tenant_id|int(11) |租户id
18|update_time|datetime|更新时间
19|modify_by|varchar(36)|修改用户
20|resource_ids|varchar(255)|资源ids
| 序号 | 字段 | 类型 | 描述 |
|----|-------------------------|--------------|-------------------------|
| 1 | id | int(11) | 主键 |
| 2 | name | varchar(255) | 流程定义名称 |
| 3 | version | int(11) | 流程定义版本 |
| 4 | release_state | tinyint(4) | 流程定义的发布状态:0 未上线 , 1已上线 |
| 5 | project_id | int(11) | 项目id |
| 6 | user_id | int(11) | 流程定义所属用户id |
| 7 | process_definition_json | longtext | 流程定义JSON |
| 8 | description | text | 流程定义描述 |
| 9 | global_params | text | 全局参数 |
| 10 | flag | tinyint(4) | 流程是否可用:0 不可用,1 可用 |
| 11 | locations | text | 节点坐标信息 |
| 12 | connects | text | 节点连线信息 |
| 13 | receivers | text | 收件人 |
| 14 | receivers_cc | text | 抄送人 |
| 15 | create_time | datetime | 创建时间 |
| 16 | timeout | int(11) | 超时时间 |
| 17 | tenant_id | int(11) | 租户id |
| 18 | update_time | datetime | 更新时间 |
| 19 | modify_by | varchar(36) | 修改用户 |
| 20 | resource_ids | varchar(255) | 资源ids |
其中process_definition_json 字段为核心字段, 定义了 DAG 图中的任务信息.该数据以JSON 的方式进行存储.
@ -39,6 +38,7 @@
4|timeout|int|超时时间
数据示例:
```bash
{
"globalParams":[
@ -58,6 +58,7 @@
# 各任务类型存储结构详解
## Shell节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
@ -81,7 +82,6 @@
18|workerGroup | |String |Worker 分组| |
19|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -131,8 +131,8 @@
```
## SQL节点
通过 SQL对指定的数据源进行数据查询、更新操作.
**节点数据结构如下:**
@ -168,7 +168,6 @@
28|workerGroup | |String |Worker 分组| |
29|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -230,47 +229,47 @@
}
```
## PROCEDURE[存储过程]节点
**节点数据结构如下:**
**节点数据样例:**
## SPARK节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| 任务编码|
2|type ||String |类型 |SPARK
3| name| |String|名称 |
4| params| |Object| 自定义参数 |Json 格式
5| |mainClass |String | 运行主类
6| |mainArgs | String| 运行参数
7| |others | String| 其他参数
8| |mainJar |Object | 程序 jar 包
9| |deployMode |String | 部署模式 |local,client,cluster
10| |driverCores | String| driver核数
11| |driverMemory | String| driver 内存数
12| |numExecutors |String | executor数量
13| |executorMemory |String | executor内存
14| |executorCores |String | executor核数
15| |programType | String| 程序类型|JAVA,SCALA,PYTHON
16| | sparkVersion| String| Spark 版本| SPARK1 , SPARK2
17| | localParams| Array|自定义参数
18| | resourceList| Array|资源文件
19|description | |String|描述 | |
20|runFlag | |String |运行标识| |
21|conditionResult | |Object|条件分支 | |
22| | successNode| Array|成功跳转节点| |
23| | failedNode|Array|失败跳转节点 |
24| dependence| |Object |任务依赖 |与params互斥
25|maxRetryTimes | |String|最大重试次数 | |
26|retryInterval | |String |重试间隔| |
27|timeout | |Object|超时控制 | |
28| taskInstancePriority| |String|任务优先级 | |
29|workerGroup | |String |Worker 分组| |
30|preTasks | |Array|前置任务 | |
**节点数据结构如下:**
| 序号 | 参数名 || 类型 | 描述 | 描述 |
|----|----------------------|----------------|--------|------------|----------------------|
| 1 | id | | String | 任务编码 |
| 2 | type || String | 类型 | SPARK |
| 3 | name | | String | 名称 |
| 4 | params | | Object | 自定义参数 | Json 格式 |
| 5 | | mainClass | String | 运行主类 |
| 6 | | mainArgs | String | 运行参数 |
| 7 | | others | String | 其他参数 |
| 8 | | mainJar | Object | 程序 jar 包 |
| 9 | | deployMode | String | 部署模式 | local,client,cluster |
| 10 | | driverCores | String | driver核数 |
| 11 | | driverMemory | String | driver 内存数 |
| 12 | | numExecutors | String | executor数量 |
| 13 | | executorMemory | String | executor内存 |
| 14 | | executorCores | String | executor核数 |
| 15 | | programType | String | 程序类型 | JAVA,SCALA,PYTHON |
| 16 | | sparkVersion | String | Spark 版本 | SPARK1 , SPARK2 |
| 17 | | localParams | Array | 自定义参数 |
| 18 | | resourceList | Array | 资源文件 |
| 19 | description | | String | 描述 | |
| 20 | runFlag | | String | 运行标识 | |
| 21 | conditionResult | | Object | 条件分支 | |
| 22 | | successNode | Array | 成功跳转节点 | |
| 23 | | failedNode | Array | 失败跳转节点 |
| 24 | dependence | | Object | 任务依赖 | 与params互斥 |
| 25 | maxRetryTimes | | String | 最大重试次数 | |
| 26 | retryInterval | | String | 重试间隔 | |
| 27 | timeout | | Object | 超时控制 | |
| 28 | taskInstancePriority | | String | 任务优先级 | |
| 29 | workerGroup | | String | Worker 分组 | |
| 30 | preTasks | | Array | 前置任务 | |
**节点数据样例:**
@ -333,38 +332,35 @@
}
```
## MapReduce(MR)节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| 任务编码|
2|type ||String |类型 |MR
3| name| |String|名称 |
4| params| |Object| 自定义参数 |Json 格式
5| |mainClass |String | 运行主类
6| |mainArgs | String| 运行参数
7| |others | String| 其他参数
8| |mainJar |Object | 程序 jar 包
9| |programType | String| 程序类型|JAVA,PYTHON
10| | localParams| Array|自定义参数
11| | resourceList| Array|资源文件
12|description | |String|描述 | |
13|runFlag | |String |运行标识| |
14|conditionResult | |Object|条件分支 | |
15| | successNode| Array|成功跳转节点| |
16| | failedNode|Array|失败跳转节点 |
17| dependence| |Object |任务依赖 |与params互斥
18|maxRetryTimes | |String|最大重试次数 | |
19|retryInterval | |String |重试间隔| |
20|timeout | |Object|超时控制 | |
21| taskInstancePriority| |String|任务优先级 | |
22|workerGroup | |String |Worker 分组| |
23|preTasks | |Array|前置任务 | |
**节点数据结构如下:**
| 序号 | 参数名 || 类型 | 描述 | 描述 |
|----|----------------------|--------------|--------|-----------|-------------|
| 1 | id | | String | 任务编码 |
| 2 | type || String | 类型 | MR |
| 3 | name | | String | 名称 |
| 4 | params | | Object | 自定义参数 | Json 格式 |
| 5 | | mainClass | String | 运行主类 |
| 6 | | mainArgs | String | 运行参数 |
| 7 | | others | String | 其他参数 |
| 8 | | mainJar | Object | 程序 jar 包 |
| 9 | | programType | String | 程序类型 | JAVA,PYTHON |
| 10 | | localParams | Array | 自定义参数 |
| 11 | | resourceList | Array | 资源文件 |
| 12 | description | | String | 描述 | |
| 13 | runFlag | | String | 运行标识 | |
| 14 | conditionResult | | Object | 条件分支 | |
| 15 | | successNode | Array | 成功跳转节点 | |
| 16 | | failedNode | Array | 失败跳转节点 |
| 17 | dependence | | Object | 任务依赖 | 与params互斥 |
| 18 | maxRetryTimes | | String | 最大重试次数 | |
| 19 | retryInterval | | String | 重试间隔 | |
| 20 | timeout | | Object | 超时控制 | |
| 21 | taskInstancePriority | | String | 任务优先级 | |
| 22 | workerGroup | | String | Worker 分组 | |
| 23 | preTasks | | Array | 前置任务 | |
**节点数据样例:**
@ -420,8 +416,8 @@
}
```
## Python节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
@ -445,7 +441,6 @@
18|workerGroup | |String |Worker 分组| |
19|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -494,43 +489,40 @@
}
```
## Flink节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| 任务编码|
2|type ||String |类型 |FLINK
3| name| |String|名称 |
4| params| |Object| 自定义参数 |Json 格式
5| |mainClass |String | 运行主类
6| |mainArgs | String| 运行参数
7| |others | String| 其他参数
8| |mainJar |Object | 程序 jar 包
9| |deployMode |String | 部署模式 |local,client,cluster
10| |slot | String| slot数量
11| |taskManager |String | taskManager数量
12| |taskManagerMemory |String | taskManager内存数
13| |jobManagerMemory |String | jobManager内存数
14| |programType | String| 程序类型|JAVA,SCALA,PYTHON
15| | localParams| Array|自定义参数
16| | resourceList| Array|资源文件
17|description | |String|描述 | |
18|runFlag | |String |运行标识| |
19|conditionResult | |Object|条件分支 | |
20| | successNode| Array|成功跳转节点| |
21| | failedNode|Array|失败跳转节点 |
22| dependence| |Object |任务依赖 |与params互斥
23|maxRetryTimes | |String|最大重试次数 | |
24|retryInterval | |String |重试间隔| |
25|timeout | |Object|超时控制 | |
26| taskInstancePriority| |String|任务优先级 | |
27|workerGroup | |String |Worker 分组| |
38|preTasks | |Array|前置任务 | |
**节点数据结构如下:**
| 序号 | 参数名 || 类型 | 描述 | 描述 |
|----|----------------------|-------------------|--------|----------------|----------------------|
| 1 | id | | String | 任务编码 |
| 2 | type || String | 类型 | FLINK |
| 3 | name | | String | 名称 |
| 4 | params | | Object | 自定义参数 | Json 格式 |
| 5 | | mainClass | String | 运行主类 |
| 6 | | mainArgs | String | 运行参数 |
| 7 | | others | String | 其他参数 |
| 8 | | mainJar | Object | 程序 jar 包 |
| 9 | | deployMode | String | 部署模式 | local,client,cluster |
| 10 | | slot | String | slot数量 |
| 11 | | taskManager | String | taskManager数量 |
| 12 | | taskManagerMemory | String | taskManager内存数 |
| 13 | | jobManagerMemory | String | jobManager内存数 |
| 14 | | programType | String | 程序类型 | JAVA,SCALA,PYTHON |
| 15 | | localParams | Array | 自定义参数 |
| 16 | | resourceList | Array | 资源文件 |
| 17 | description | | String | 描述 | |
| 18 | runFlag | | String | 运行标识 | |
| 19 | conditionResult | | Object | 条件分支 | |
| 20 | | successNode | Array | 成功跳转节点 | |
| 21 | | failedNode | Array | 失败跳转节点 |
| 22 | dependence | | Object | 任务依赖 | 与params互斥 |
| 23 | maxRetryTimes | | String | 最大重试次数 | |
| 24 | retryInterval | | String | 重试间隔 | |
| 25 | timeout | | Object | 超时控制 | |
| 26 | taskInstancePriority | | String | 任务优先级 | |
| 27 | workerGroup | | String | Worker 分组 | |
| 38 | preTasks | | Array | 前置任务 | |
**节点数据样例:**
@ -593,33 +585,33 @@
```
## HTTP节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
1|id | |String| 任务编码|
2|type ||String |类型 |HTTP
3| name| |String|名称 |
4| params| |Object| 自定义参数 |Json 格式
5| |url |String | 请求地址
6| |httpMethod | String| 请求方式|GET,POST,HEAD,PUT,DELETE
7| | httpParams| Array|请求参数
8| |httpCheckCondition | String| 校验条件|默认响应码200
9| |condition |String | 校验内容
10| | localParams| Array|自定义参数
11|description | |String|描述 | |
12|runFlag | |String |运行标识| |
13|conditionResult | |Object|条件分支 | |
14| | successNode| Array|成功跳转节点| |
15| | failedNode|Array|失败跳转节点 |
16| dependence| |Object |任务依赖 |与params互斥
17|maxRetryTimes | |String|最大重试次数 | |
18|retryInterval | |String |重试间隔| |
19|timeout | |Object|超时控制 | |
20| taskInstancePriority| |String|任务优先级 | |
21|workerGroup | |String |Worker 分组| |
22|preTasks | |Array|前置任务 | |
**节点数据结构如下:**
| 序号 | 参数名 || 类型 | 描述 | 描述 |
|----|----------------------|--------------------|--------|-----------|--------------------------|
| 1 | id | | String | 任务编码 |
| 2 | type || String | 类型 | HTTP |
| 3 | name | | String | 名称 |
| 4 | params | | Object | 自定义参数 | Json 格式 |
| 5 | | url | String | 请求地址 |
| 6 | | httpMethod | String | 请求方式 | GET,POST,HEAD,PUT,DELETE |
| 7 | | httpParams | Array | 请求参数 |
| 8 | | httpCheckCondition | String | 校验条件 | 默认响应码200 |
| 9 | | condition | String | 校验内容 |
| 10 | | localParams | Array | 自定义参数 |
| 11 | description | | String | 描述 | |
| 12 | runFlag | | String | 运行标识 | |
| 13 | conditionResult | | Object | 条件分支 | |
| 14 | | successNode | Array | 成功跳转节点 | |
| 15 | | failedNode | Array | 失败跳转节点 |
| 16 | dependence | | Object | 任务依赖 | 与params互斥 |
| 17 | maxRetryTimes | | String | 最大重试次数 | |
| 18 | retryInterval | | String | 重试间隔 | |
| 19 | timeout | | Object | 超时控制 | |
| 20 | taskInstancePriority | | String | 任务优先级 | |
| 21 | workerGroup | | String | Worker 分组 | |
| 22 | preTasks | | Array | 前置任务 | |
**节点数据样例:**
@ -677,8 +669,6 @@
}
```
## DataX节点
**节点数据结构如下:**
@ -714,11 +704,8 @@
28|workerGroup | |String |Worker 分组| |
29|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
{
"type":"DATAX",
@ -798,9 +785,6 @@
22|workerGroup | |String |Worker 分组| |
23|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -869,7 +853,6 @@
15|workerGroup | |String |Worker 分组| |
16|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -912,8 +895,8 @@
}
```
## 子流程节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
@ -935,7 +918,6 @@
16|workerGroup | |String |Worker 分组| |
17|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -972,9 +954,8 @@
}
```
## 依赖(DEPENDENT)节点
**节点数据结构如下:**
序号|参数名||类型|描述 |描述
-------- | ---------| ---------| -------- | --------- | ---------
@ -1000,7 +981,6 @@
20|workerGroup | |String |Worker 分组| |
21|preTasks | |Array|前置任务 | |
**节点数据样例:**
```bash
@ -1132,3 +1112,4 @@
]
}
```

24
docs/docs/zh/contribute/api-standard.md

@ -1,9 +1,11 @@
# API 设计规范
规范统一的 API 是项目设计的基石。DolphinScheduler 的 API 遵循 REST ful 标准,REST ful 是目前最流行的一种互联网软件架构,它结构清晰,符合标准,易于理解,扩展方便。
本文以 DolphinScheduler 项目的接口为样例,讲解如何构造具有 Restful 风格的 API。
## 1. URI 设计
REST 即为 Representational State Transfer 的缩写,即“表现层状态转化”。
“表现层”指的就是“资源”。资源对应网络上的一种实体,例如:一段文本,一张图片,一种服务。且每种资源都对应一个特定的 URI。
@ -15,36 +17,43 @@ Restful URI 的设计基于资源:
+ 子资源下的单个资源:`/instances/{instanceId}/tasks/{taskId}`;
## 2. Method 设计
我们需要通过 URI 来定位某种资源,再通过 Method,或者在路径后缀声明动作来体现对资源的操作。
### ① 查询操作 - GET
通过 URI 来定位要资源,通过 GET 表示查询。
+ 当 URI 为一类资源时表示查询一类资源,例如下面样例表示分页查询 `alter-groups`
```
Method: GET
/dolphinscheduler/alert-groups
```
+ 当 URI 为单个资源时表示查询此资源,例如下面样例表示查询对应的 `alter-group`
```
Method: GET
/dolphinscheduler/alter-groups/{id}
```
+ 此外,我们还可以根据 URI 来表示查询子资源,如下:
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**上述的关于查询的方式都表示分页查询,如果我们需要查询全部数据的话,则需在 URI 的后面加 `/list` 来区分。分页查询和查询全部不要混用一个 API。**
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② 创建操作 - POST
通过 URI 来定位要创建的资源类型,通过 POST 表示创建动作,并且将创建后的 `id` 返回给请求者。
+ 下面样例表示创建一个 `alter-group`
@ -55,57 +64,72 @@ Method: POST
```
+ 创建子资源也是类似的操作:
```
Method: POST
/dolphinscheduler/alter-groups/{alterGroupId}/tasks
```
### ③ 修改操作 - PUT
通过 URI 来定位某一资源,通过 PUT 指定对其修改。
```
Method: PUT
/dolphinscheduler/alter-groups/{alterGroupId}
```
### ④ 删除操作 -DELETE
通过 URI 来定位某一资源,通过 DELETE 指定对其删除。
+ 下面例子表示删除 `alterGroupId` 对应的资源:
```
Method: DELETE
/dolphinscheduler/alter-groups/{alterGroupId}
```
+ 批量删除:对传入的 id 数组进行批量删除,使用 POST 方法。**(这里不要用 DELETE 方法,因为 DELETE 请求的 body 在语义上没有任何意义,而且有可能一些网关,代理,防火墙在收到 DELETE 请求后会把请求的 body 直接剥离掉。)**
```
Method: POST
/dolphinscheduler/alter-groups/batch-delete
```
### ⑤ 部分更新操作 -PATCH
通过 URI 来定位某一资源,通过 PATCH 指定对其部分更新。
+ 下面例子表示部分更新 `alterGroupId` 对应的资源:
```
Method: PATCH
/dolphinscheduler/alter-groups/{alterGroupId}
```
### ⑥ 其他操作
除增删改查外的操作,我们同样也通过 `url` 定位到对应的资源,然后再在路径后面追加对其进行的操作。例如:
```
/dolphinscheduler/alert-groups/verify-name
/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. 参数设计
参数分为两种,一种是请求参数(Request Param 或 Request Body),另一种是路径参数(Path Param)。
参数变量必须用小驼峰表示,并且在分页场景中,用户输入的参数小于 1,则前端需要返给后端 1 表示请求第一页;当后端发现用户输入的参数大于总页数时,直接返回最后一页。
## 4. 其他设计
### 基础路径
整个项目的 URI 需要以 `/<project_name>` 作为基础路径,从而标识这类 API 都是项目下的,即:
```
/dolphinscheduler
```

4
docs/docs/zh/contribute/api-test.md

@ -1,4 +1,5 @@
# DolphinScheduler — API 测试
## 前置知识:
### API 测试与单元测试的区别
@ -47,10 +48,8 @@ public final class LoginPage {
在登陆页面(LoginPage)只定义接口请求的入参规范,对于接口请求出参只定义统一的基础响应结构,接口实际返回的data数据则再实际的测试用例中测试。主要测试接口的输入和输出是否能够符合测试用例的要求。
### API-Cases
下面以租户管理测试为例,前文已经说明,我们使用 docker-compose 进行部署,所以每个测试案例,都需要以注解的形式引入对应的文件。
使用 OkHttpClient 框架来进行 HTTP 请求。在每个测试案例开始之前都需要进行一些准备工作。比如:登录用户、创建对应的租户(根据具体的测试案例而定)。
@ -83,7 +82,6 @@ public final class LoginPage {
https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-api-test/dolphinscheduler-api-test-case/src/test/java/org/apache/dolphinscheduler/api.test/cases
## 补充
在本地运行的时候,首先需要启动相应的本地服务,可以参考该页面: [环境搭建](./development-environment-setup.md)

37
docs/docs/zh/contribute/architecture-design.md

@ -1,7 +1,9 @@
## 系统架构设计
在对调度系统架构说明之前,我们先来认识一下调度系统常用的名词
### 1.名词解释
**DAG:** 全称Directed Acyclic Graph,简称DAG。工作流中的Task任务以有向无环图的形式组装起来,从入度为零的节点进行拓扑遍历,直到无后继节点为止。举例如下图:
<p align="center">
@ -36,6 +38,7 @@
### 2.系统架构
#### 2.1 系统架构图
<p align="center">
<img src="../../../img/architecture.jpg" alt="系统架构图" />
<p align="center">
@ -63,9 +66,10 @@
* **WorkerServer**
WorkerServer也采用分布式无中心设计理念,WorkerServer主要负责任务的执行和提供日志服务。WorkerServer服务启动时向Zookeeper注册临时节点,并维持心跳。
##### 该服务包含:
- **FetchTaskThread**主要负责不断从**Task Queue**中领取任务,并根据不同任务类型调用**TaskScheduleThread**对应执行器。
- **FetchTaskThread**主要负责不断从**Task Queue**中领取任务,并根据不同任务类型调用**TaskScheduleThread**对应执行器。
* **ZooKeeper**
ZooKeeper服务,系统中的MasterServer和WorkerServer节点都通过ZooKeeper来进行集群管理和容错。另外系统还基于ZooKeeper进行事件监听和分布式锁。
@ -95,6 +99,7 @@
###### 中心化思想
中心化的设计理念比较简单,分布式集群中的节点按照角色分工,大体上分为两种角色:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave角色" width="50%" />
</p>
@ -102,16 +107,13 @@
- Master的角色主要负责任务分发并监督Slave的健康状态,可以动态的将任务均衡到Slave上,以致Slave节点不至于“忙死”或”闲死”的状态。
- Worker的角色主要负责任务的执行工作并维护和Master的心跳,以便Master可以分配任务给Slave。
中心化思想设计存在的问题:
- 一旦Master出现了问题,则群龙无首,整个集群就会崩溃。为了解决这个问题,大多数Master/Slave架构模式都采用了主备Master的设计方案,可以是热备或者冷备,也可以是自动切换或手动切换,而且越来越多的新系统都开始具备自动选举切换Master的能力,以提升系统的可用性。
- 另外一个问题是如果Scheduler在Master上,虽然可以支持一个DAG中不同的任务运行在不同的机器上,但是会产生Master的过负载。如果Scheduler在Slave上,则一个DAG中所有的任务都只能在某一台机器上进行作业提交,则并行任务比较多的时候,Slave的压力可能会比较大。
###### 去中心化
<p align="center"
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="去中心化" width="50%" />
</p>
@ -119,25 +121,23 @@
- 在去中心化设计里,通常没有Master/Slave的概念,所有的角色都是一样的,地位是平等的,全球互联网就是一个典型的去中心化的分布式系统,联网的任意节点设备down机,都只会影响很小范围的功能。
- 去中心化设计的核心设计在于整个分布式系统中不存在一个区别于其他节点的”管理者”,因此不存在单点故障问题。但由于不存在” 管理者”节点所以每个节点都需要跟其他节点通信才得到必须要的机器信息,而分布式系统通信的不可靠性,则大大增加了上述功能的实现难度。
- 实际上,真正去中心化的分布式系统并不多见。反而动态中心化分布式系统正在不断涌出。在这种架构下,集群中的管理者是被动态选择出来的,而不是预置的,并且集群在发生故障的时候,集群的节点会自发的举行"会议"来选举新的"管理者"去主持工作。最典型的案例就是ZooKeeper及Go语言实现的Etcd。
- DolphinScheduler的去中心化是Master/Worker注册到Zookeeper中,实现Master集群和Worker集群无中心,并使用Zookeeper分布式锁来选举其中的一台Master或Worker为“管理者”来执行任务。
##### 二、分布式锁实践
DolphinScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Master执行Scheduler,或者只有一台Worker执行任务的提交。
1. 获取分布式锁的核心流程算法如下
<p align="center">
<img src="../../../img/architecture-design/distributed_lock.png" alt="获取分布式锁流程" width="70%" />
</p>
2. DolphinScheduler中Scheduler线程分布式锁实现流程图:
<p align="center">
<img src="../../../img/architecture-design/distributed_lock_procss.png" alt="获取分布式锁流程" />
</p>
##### 三、线程不足循环等待问题
- 如果一个DAG中没有子流程,则如果Command中的数据条数大于线程池设置的阈值,则直接流程等待或失败。
@ -158,8 +158,8 @@ DolphinScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Mast
于是我们选择了第三种方式来解决线程不足的问题。
##### 四、容错设计
容错分为服务宕机容错和任务重试,服务宕机容错又分为Master容错和Worker容错两种情况
###### 1. 宕机容错
@ -171,8 +171,6 @@ DolphinScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Mast
</p>
其中Master监控其他Master和Worker的目录,如果监听到remove事件,则会根据具体的业务逻辑进行流程实例容错或者任务实例容错。
- Master容错流程图:
<p align="center">
@ -180,8 +178,6 @@ DolphinScheduler使用ZooKeeper分布式锁来实现同一时刻只有一台Mast
</p>
ZooKeeper Master容错完成之后则重新由DolphinScheduler中Scheduler线程调度,遍历 DAG 找到”正在运行”和“提交成功”的任务,对”正在运行”的任务监控其任务实例的状态,对”提交成功”的任务需要判断Task Queue中是否已经存在,如果存在则同样监控任务实例的状态,如果不存在则重新提交任务实例。
- Worker容错流程图:
<p align="center">
@ -200,8 +196,6 @@ Master Scheduler线程一旦发现任务实例为” 需要容错”状态,则
- 流程失败恢复是流程级别的,是手动进行的,恢复是从只能**从失败的节点开始执行**或**从当前节点开始执行**
- 流程失败重跑也是流程级别的,是手动进行的,重跑是从开始节点进行
接下来说正题,我们将工作流中的任务节点分了两种类型。
- 一种是业务节点,这种节点都对应一个实际的脚本或者处理语句,比如Shell节点,MR节点、Spark节点、依赖节点等。
@ -212,38 +206,35 @@ Master Scheduler线程一旦发现任务实例为” 需要容错”状态,则
如果工作流中有任务失败达到最大重试次数,工作流就会失败停止,失败的工作流可以手动进行重跑操作或者流程恢复操作
##### 五、任务优先级设计
在早期调度设计中,如果没有优先级设计,采用公平调度设计的话,会遇到先行提交的任务可能会和后继提交的任务同时完成的情况,而不能做到设置流程或者任务的优先级,因此我们对此进行了重新设计,目前我们设计如下:
- 按照**不同流程实例优先级**优先于**同一个流程实例优先级**优先于**同一流程内任务优先级**优先于**同一流程内任务**提交顺序依次从高到低进行任务处理。
- 具体实现是根据任务实例的json解析优先级,然后把**流程实例优先级_流程实例id_任务优先级_任务id**信息保存在ZooKeeper任务队列中,当从任务队列获取的时候,通过字符串比较即可得出最需要优先执行的任务
- 其中流程定义的优先级是考虑到有些流程需要先于其他流程进行处理,这个可以在流程启动或者定时启动时配置,共有5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="流程优先级配置" width="40%" />
</p>
- 任务的优先级也分为5级,依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="任务优先级配置" width="35%" />
</p>
##### 六、Logback和gRPC实现日志访问
- 由于Web(UI)和Worker不一定在同一台机器上,所以查看日志不能像查询本地文件那样。有两种方案:
- 将日志放到ES搜索引擎上
- 通过gRPC通信获取远程日志信息
- 介于考虑到尽可能的DolphinScheduler的轻量级性,所以选择了gRPC实现远程访问日志信息。
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc远程访问" width="60%" />
</p>
- 我们使用自定义Logback的FileAppender和Filter功能,实现每个任务实例生成一个日志文件。
- FileAppender主要实现如下:
@ -273,7 +264,6 @@ Master Scheduler线程一旦发现任务实例为” 需要容错”状态,则
}
```
以/流程定义id/流程实例id/任务实例id.log的形式生成日志
- 过滤匹配以TaskLogInfo开始的线程名称:
@ -297,5 +287,6 @@ public class TaskLogFilter extends Filter<ILoggingEvent> {
```
### 总结
本文从调度出发,初步介绍了大数据分布式工作流调度系统--DolphinScheduler的架构原理及实现思路。未完待续

2
docs/docs/zh/contribute/backend/mechanism/overview.md

@ -1,6 +1,6 @@
# 综述
<!-- TODO 由于 side menu 不支持多个等级,所以新建了一个leading page存放 -->
* [全局参数](global-parameter.md)
* [switch任务类型](task/switch.md)

1
docs/docs/zh/contribute/backend/mechanism/task/switch.md

@ -6,3 +6,4 @@ Switch任务类型的工作流程如下
* SwitchTaskExecThread从上到下(用户在页面上定义的表达式顺序)处理switch中定义的表达式,从varPool中获取变量的值,通过js解析表达式,如果表达式返回true,则停止检查,并且记录该表达式的顺序,这里我们记录为resultConditionLocation。SwitchTaskExecThread的任务便结束了。
* 当switch节点运行结束之后,如果没有发生错误(较为常见的是用户定义的表达式不合规范或参数名有问题),这个时候MasterExecThread.submitPostNode会获取DAG的下游节点继续执行。
* DagHelper.parsePostNodes中如果发现当前节点(刚刚运行完成功的节点)是switch节点的话,会获取resultConditionLocation,将SwitchParameters中除了resultConditionLocation以外的其他分支全部skip掉。这样留下来的就只有需要执行的分支了。

9
docs/docs/zh/contribute/backend/spi/alert.md

@ -26,7 +26,6 @@ DolphinScheduler 正在处于微内核 + 插件化的架构更改之中,所有
该模块是目前我们提供的插件,目前我们已经支持数十种插件,如 Email、DingTalk、Script等。
#### Alert SPI 主要类信息:
AlertChannelFactory
@ -64,31 +63,39 @@ alert_spi 具体设计可见 issue:[Alert Plugin Design](https://github.com/ap
钉钉群聊机器人告警
相关参数配置可以参考钉钉机器人文档。
* EnterpriseWeChat
企业微信告警通知
相关参数配置可以参考企业微信机器人文档。
* Script
我们实现了 Shell 脚本告警,我们会将相关告警参数透传给脚本,你可以在 Shell 中实现你的相关告警逻辑,如果你需要对接内部告警应用,这是一种不错的方法。
* FeiShu
飞书告警通知
* Slack
Slack告警通知
* PagerDuty
PagerDuty告警通知
* WebexTeams
WebexTeams告警通知
相关参数配置可以参考WebexTeams文档。
* Telegram
Telegram告警通知
相关参数配置可以参考Telegram文档。
* Http
我们实现了Http告警,调用大部分的告警插件最终都是Http请求,如果我们没有支持你常用插件,可以使用Http来实现你的告警需求,同时也欢迎将你常用插件贡献到社区。

2
docs/docs/zh/contribute/backend/spi/registry.md

@ -6,6 +6,7 @@
* 注册中心插件配置, 以Zookeeper 为例 (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
@ -19,6 +20,7 @@
`dolphinscheduler-registry-api` 定义了实现插件的标准,当你需要扩展插件的时候只需要实现 `org.apache.dolphinscheduler.registry.api.RegistryFactory` 即可。
`dolphinscheduler-registry-plugin` 模块下是我们目前所提供的注册中心插件。
#### FAQ
1:registry connect timeout

1
docs/docs/zh/contribute/e2e-test.md

@ -1,4 +1,5 @@
# DolphinScheduler — E2E 自动化测试
## 一、前置知识:
### 1、E2E 测试与单元测试的区别

71
docs/docs/zh/contribute/frontend-development.md

@ -1,6 +1,7 @@
# 前端开发文档
### 技术选型
```
Vue mvvm 框架
@ -17,10 +18,16 @@ Lodash 高性能的 JavaScript 实用工具库
### 开发环境搭建
- #### Node安装
-
#### Node安装
Node包下载 (注意版本 v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
- #### 前端项目构建
-
#### 前端项目构建
用命令行模式 `cd` 进入 `dolphinscheduler-ui`项目目录并执行 `npm install` 拉取项目依赖包
> 如果 `npm install` 速度非常慢,你可以设置淘宝镜像
@ -36,13 +43,16 @@ npm config set registry http://registry.npm.taobao.org/
API_BASE = http://127.0.0.1:12345
```
> ##### !!!这里特别注意 项目如果在拉取依赖包的过程中报 " node-sass error " 错误,请在执行完后再次执行以下命令
##### !!!这里特别注意 项目如果在拉取依赖包的过程中报 " node-sass error " 错误,请在执行完后再次执行以下命令
```bash
npm install node-sass --unsafe-perm #单独安装node-sass依赖
```
- #### 开发环境运行
-
#### 开发环境运行
- `npm start` 项目开发环境 (启动后访问地址 http://localhost:8888)
#### 前端项目发布
@ -140,6 +150,7 @@ npm install node-sass --unsafe-perm #单独安装node-sass依赖
首页 => `http://localhost:8888/#/home`
项目管理 => `http://localhost:8888/#/projects/list`
```
| 项目首页
| 工作流
@ -149,6 +160,7 @@ npm install node-sass --unsafe-perm #单独安装node-sass依赖
```
资源管理 => `http://localhost:8888/#/resource/file`
```
| 文件管理
| UDF管理
@ -159,6 +171,7 @@ npm install node-sass --unsafe-perm #单独安装node-sass依赖
数据源管理 => `http://localhost:8888/#/datasource/list`
安全中心 => `http://localhost:8888/#/security/tenant`
```
| 租户管理
| 用户管理
@ -174,16 +187,19 @@ npm install node-sass --unsafe-perm #单独安装node-sass依赖
项目 `src/js/conf/home` 下分为
`pages` => 路由指向页面目录
```
路由地址对应的页面文件
```
`router` => 路由管理
```
vue的路由器,在每个页面的入口文件index.js 都会注册进来 具体操作:https://router.vuejs.org/zh/
```
`store` => 状态管理
```
每个路由对应的页面都有一个状态管理的文件 分为:
@ -201,9 +217,13 @@ state => mapState => 详情:https://vuex.vuejs.org/zh/guide/state.html
```
## 规范
## Vue规范
##### 1.组件名
组件名为多个单词,并且用连接线(-)连接,避免与 HTML 标签冲突,并且结构更加清晰。
```
// 正例
export default {
@ -212,7 +232,9 @@ export default {
```
##### 2.组件文件
`src/js/module/components`项目内部公共组件书写文件夹名与文件名同名,公共组件内部所拆分的子组件与util工具都放置组件内部 `_source`文件夹里。
```
└── components
├── header
@ -228,6 +250,7 @@ export default {
```
##### 3.Prop
定义 Prop 的时候应该始终以驼峰格式(camelCase)命名,在父组件赋值的时候使用连接线(-)。
这里遵循每个语言的特性,因为在 HTML 标记中对大小写是不敏感的,使用连接线更加友好;而在 JavaScript 中更自然的是驼峰命名。
@ -270,7 +293,9 @@ props: {
```
##### 4.v-for
在执行 v-for 遍历的时候,总是应该带上 key 值使更新 DOM 时渲染效率更高。
```
<ul>
<li v-for="item in list" :key="item.id">
@ -280,6 +305,7 @@ props: {
```
v-for 应该避免与 v-if 在同一个元素(`例如:<li>`)上使用,因为 v-for 的优先级比 v-if 更高,为了避免无效计算和渲染,应该尽量将 v-if 放到容器的父元素之上。
```
<ul v-if="showList">
<li v-for="item in list" :key="item.id">
@ -289,7 +315,9 @@ v-for 应该避免与 v-if 在同一个元素(`例如:<li>`)上使用,
```
##### 5.v-if / v-else-if / v-else
若同一组 v-if 逻辑控制中的元素逻辑相同,Vue 为了更高效的元素切换,会复用相同的部分,`例如:value`。为了避免复用带来的不合理效果,应该在同种元素上加上 key 做标识。
```
<div v-if="hasData" key="mazey-data">
<span>{{ mazeyData }}</span>
@ -300,12 +328,15 @@ v-for 应该避免与 v-if 在同一个元素(`例如:<li>`)上使用,
```
##### 6.指令缩写
为了统一规范始终使用指令缩写,使用`v-bind`,`v-on`并没有什么不好,这里仅为了统一规范。
```
<input :value="mazeyUser" @click="verifyUser">
```
##### 7.单文件组件的顶级元素顺序
样式后续都是打包在一个文件里,所有在单个vue文件中定义的样式,在别的文件里同类名的样式也是会生效的所有在创建一个组件前都会有个顶级类名
注意:项目内已经增加了sass插件,单个vue文件里可以直接书写sass语法
为了统一和便于阅读,应该按 `<template>`、`<script>`、`<style>`的顺序放置。
@ -357,25 +388,31 @@ v-for 应该避免与 v-if 在同一个元素(`例如:<li>`)上使用,
## JavaScript规范
##### 1.var / let / const
建议不再使用 var,而使用 let / const,优先使用 const。任何一个变量的使用都要提前申明,除了 function 定义的函数可以随便放在任何位置。
##### 2.引号
```
const foo = '后除'
const bar = `${foo},前端工程师`
```
##### 3.函数
匿名函数统一使用箭头函数,多个参数/返回值时优先使用对象的结构赋值。
```
function getPersonInfo ({name, sex}) {
// ...
return {name, gender}
}
```
函数名统一使用驼峰命名,以大写字母开头申明的都是构造函数,使用小写字母开头的都是普通函数,也不该使用 new 操作符去操作普通函数。
##### 4.对象
```
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
@ -393,7 +430,9 @@ for (let [key, value] of myMap.entries()) {
```
##### 5.模块
统一使用 import / export 的方式管理项目的模块。
```
// lib.js
export default {}
@ -406,18 +445,21 @@ import 统一放在文件顶部。
如果模块只有一个输出值,使用 `export default`,否则不用。
## HTML / CSS
###### 1.标签
在引用外部 CSS 或 JavaScript 时不写 type 属性。HTML5 默认 type 为 `text/css``text/javascript` 属性,所以没必要指定。
```
<link rel="stylesheet" href="//www.test.com/css/test.css">
<script src="//www.test.com/js/test.js"></script>
```
##### 2.命名
Class 和 ID 的命名应该语义化,通过看名字就知道是干嘛的;多个单词用连接线 - 连接。
```
// 正例
.test-header{
@ -426,6 +468,7 @@ Class 和 ID 的命名应该语义化,通过看名字就知道是干嘛的;
```
##### 3.属性缩写
CSS 属性尽量使用缩写,提高代码的效率和方便理解。
```
@ -439,6 +482,7 @@ border: 1px solid #ccc;
```
##### 4.文档类型
应该总是使用 HTML5 标准。
```
@ -446,7 +490,9 @@ border: 1px solid #ccc;
```
##### 5.注释
应该给一个模块文件写一个区块注释。
```
/**
* @module mazey/api
@ -458,6 +504,7 @@ border: 1px solid #ccc;
## 接口
##### 所有的接口都以 Promise 形式返回
注意非0都为错误走catch
```
@ -477,6 +524,7 @@ test.then(res => {
```
正常返回
```
{
code:0,
@ -486,6 +534,7 @@ test.then(res => {
```
错误返回
```
{
code:10000,
@ -493,8 +542,10 @@ test.then(res => {
msg:'失败'
}
```
接口如果是post请求,Content-Type默认为application/x-www-form-urlencoded;如果Content-Type改成application/json,
接口传参需要改成下面的方式
```
io.post('url', payload, null, null, { emulateJSON: false } res => {
resolve(res)
@ -524,6 +575,7 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
(1) 先将节点的icon小图标放置`src/js/conf/home/pages/dag/img`文件夹内,注意 `toolbar_${后台定义的节点的英文名称 例如:SHELL}.png`
(2) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `tasksType` 对象,往里增加
```
'DEPENDENT': { // 后台定义节点类型英文名称用作key值
desc: 'DEPENDENT', // tooltip desc
@ -532,6 +584,7 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
```
(3) 在 `src/js/conf/home/pages/dag/_source/formModel/tasks` 增加一个 `${节点类型(小写)}`.vue 文件,跟当前节点相关的组件内容都在这里写。 属于节点组件内的必须拥有一个函数 `_verification()` 验证成功后将当前组件的相关数据往父组件抛。
```
/**
* 验证
@ -568,6 +621,7 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
##### 2.增加状态类型
(1) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `tasksState` 对象,往里增加
```
'WAITTING_DEPEND': { //后端定义状态类型 前端用作key值
id: 11, // 前端定义id 后续用作排序
@ -579,7 +633,9 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
```
##### 3.增加操作栏工具
(1) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `toolOper` 对象,往里增加
```
{
code: 'pointer', // 工具标识
@ -599,13 +655,12 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
`util.js` => 属于 `plugIn` 工具类
操作则在 `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` 事件中处理。
##### 3.增加一个路由页面
(1) 首先在路由管理增加一个路由地址`src/js/conf/home/router/index.js`
```
{
path: '/test', // 路由地址
@ -621,10 +676,10 @@ dag 相关接口 `src/js/conf/home/store/dag/actions.js`
这样就可以直接访问 `http://localhost:8888/#/test`
##### 4.增加预置邮箱
找到`src/lib/localData/email.js`启动和定时邮箱地址输入可以自动下拉匹配。
```
export default ["test@analysys.com.cn","test1@analysys.com.cn","test3@analysys.com.cn"]
```

1
docs/docs/zh/contribute/have-questions.md

@ -24,3 +24,4 @@
- 级别:Beginner、Intermediate、Advanced
- 场景相关:Debug,、How-to
- 如果内容包括错误日志或长代码,请使用 [GitHub gist](https://gist.github.com/),并在邮件中只附加相关代码/日志的几行。

10
docs/docs/zh/contribute/join/DS-License.md

@ -21,7 +21,6 @@
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
以Apache为例,当我们使用了ZooKeeper,那么ZooKeeper的NOTICE文件(每个开源项目都会有NOTICE文件,一般位于根目录)则必须在我们的项目中体现,用Apache的话来讲,就是"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work.
@ -37,7 +36,9 @@ copyright notice that is included in or attached to the work.
* 在dolphinscheduler-dist/release-docs/LICENSE中添加相关的maven仓库地址。
* 在dolphinscheduler-dist/release-docs/NOTICE中追加相关的NOTICE文件,此文件请务必和原代码仓库地址中的NOTICE文件一致。
* 在dolphinscheduler-dist/release-docs/license/下添加相关源代码的协议,文件命名为license+文件名.txt。
#### check dependency license fail
```
--- /dev/fd/63 2020-12-03 03:08:57.191579482 +0000
+++ /dev/fd/62 2020-12-03 03:08:57.191579482 +0000
@ -49,13 +50,16 @@ copyright notice that is included in or attached to the work.
+mchange-commons-java-0.2.11.jar
Error: Process completed with exit code 1.
```
一般来讲,添加一个jar的工作往往不会如此轻易的结束,因为它往往依赖了其它各种各样的jar,这些jar我们同样需要添加相应的license。
这种情况下,我们会在check里面得到 check dependency license fail的错误信息,如上,我们缺少了HikariCP-java6-2.3.13、c3p0等的license声明,
按照添加jar的步骤补充即可,提示还是蛮友好的(哈哈)。
### 附件
<!-- markdown-link-check-disable -->
附件:新jar的邮件格式
附件:新jar的邮-->
```
[VOTE][New Jar] jetcd-core(registry plugin support etcd3 )
@ -96,9 +100,11 @@ https://mvnrepository.com/artifact/io.etcd/jetcd-core
https://mvnrepository.com/artifact/io.etcd/jetcd-launcher
```
<!-- markdown-link-check-enable -->
### 参考文章:
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
* [ASF 3RD PARTY LICENSE POLICY](https://apache.org/legal/resolved.html)

3
docs/docs/zh/contribute/join/code-conduct.md

@ -3,6 +3,7 @@
以下行为准则以完全遵循[Apache软件基金会行为准则](https://www.apache.org/foundation/policies/conduct.html)为前提。
## 开发理念
- **一致** 代码风格、命名以及使用方式保持一致。
- **易读** 代码无歧义,易于阅读和理解而非调试手段才知晓代码意图。
- **整洁** 认同《重构》和《代码整洁之道》的理念,追求整洁优雅代码。
@ -63,6 +64,6 @@
- 精确断言,尽量不使用`not`,`containsString`断言。
- 测试用例的真实值应名为为actualXXX,期望值应命名为expectedXXX。
- 测试类和`@Test`标注的方法无需javadoc。
- 公共规范
- 每行长度不超过`200`个字符,保证每一行语义完整以便于理解。

5
docs/docs/zh/contribute/join/commit-message.md

@ -1,6 +1,7 @@
# Commit Message 须知
### 前言
一个好的 commit message 是能够帮助其他的开发者(或者未来的开发者)快速理解相关变更的上下文,同时也可以帮助项目管理人员确定该提交是否适合包含在发行版中。但当我们在查看了很多开源项目的 commit log 后,发现一个有趣的问题,一部分开发者,代码质量很不错,但是 commit message 记录却比较混乱,当其他贡献者或者学习者在查看代码的时候,并不能通过 commit log 很直观的了解
该提交前后变更的目的,正如 Peter Hutterer 所言:Re-establishing the context of a piece of code is wasteful. We can’t avoid it completely, so our efforts should go to reducing it as much as possible. Commit messages can do exactly that and as a result, a commit message shows whether a developer is a good collaborator. 因此,DolphinScheduler 结合其他社区以及 Apache 官方文档制定了该规约。
@ -21,6 +22,7 @@ commit message 应该明确说明该提交解决了哪些问题(bug 修复、
Commit message 应该包括三个部分:Header,Body 和 Footer。其中,Header 是必需的,Body 和 Footer 可以省略。
##### header
Header 部分只有一行,包括三个字段:type(必需)、scope(可选)和 subject(必需)。
[DS-ISSUE编号][type] subject
@ -57,7 +59,6 @@ Body 部分需要注意以下几点:
* 语句最后不需要 ‘.’ (句号) 结尾
##### Footer
Footer只适用于两种情况
@ -71,6 +72,7 @@ Footer只适用于两种情况
如果当前 commit 针对某个issue,那么可以在 Footer 部分关闭这个 issue,也可以一次关闭多个 issue 。
##### 举个例子
[DS-001][docs-zh] add commit message
* commit message RIP
@ -82,6 +84,7 @@ and clarify the optimization in the version iteration
This closes #001
### 参考文档
[提交消息格式](https://cwiki.apache.org/confluence/display/GEODE/Commit+Message+Format)
[On commit messages-Peter Hutterer](http://who-t.blogspot.com/2009/12/on-commit-messages.html)

1
docs/docs/zh/contribute/join/contribute.md

@ -27,7 +27,6 @@
参考[参与贡献 Issue 需知](./issue.md),[参与贡献 Pull Request 需知](./pull-request.md),[参与贡献 CommitMessage 需知](./commit-message.md)
### 3. 如何领取 Issue,提交 Pull Request
如果你想实现某个 Feature 或者修复某个 Bug。请参考以下内容:

4
docs/docs/zh/contribute/join/issue.md

@ -1,6 +1,7 @@
# Issue 须知
## 前言
Issues 功能被用来追踪各种特性,Bug,功能等。项目维护者可以通过 Issues 来组织需要完成的任务。
Issue 是引出一个 Feature 或 Bug 等的重要步骤,在单个
@ -181,6 +182,7 @@ Priority分为四级: Critical、Major、Minor、Trivial
* 尽量列出其他调度已经具备的类似功能。商用与开源软件均可。
以下是 **Feature 的 Markdown 内容模板**,请按照该模板填写 issue 内容。
```shell
**标题**
标题格式: [Feature][Priority] feature标题
@ -197,7 +199,6 @@ Priority分为四级: Critical、Major、Minor、Trivial
```
### Contributor
除一些特殊情况之外,在开始完成
@ -215,3 +216,4 @@ Pull Request review 阶段针对实现思路的意见不同或需要重构而导
确实存在大多数提出 Issue 用户不清楚这个 Issue 是属于哪个模块的,其实这在很多开源社区都是很常见的。在这种情况下,其实
committer/contributor 是知道这个 Issue 影响的模块的,如果之后这个 Issue 被 committer 和 contributor approve
确实有价值,那么 committer 就可以按照 Issue 涉及到的具体的模块去修改 Issue 标题,或者留言给提出 Issue 的用户去修改成对应的标题。

6
docs/docs/zh/contribute/join/microbench.md

@ -22,7 +22,6 @@ JMH,即Java MicroBenchmark Harness,是专门用于代码微基准测试的
* 3:对比一个函数的多种实现方式
DolphinScheduler-MicroBench提供了AbstractBaseBenchmark,你可以在其基础上继承,编写你的基准测试代码,AbstractMicroBenchmark能保证以JUnit的方式运行。
### 定制运行参数
@ -39,8 +38,8 @@ DolphinScheduler-MicroBench提供了AbstractBaseBenchmark,你可以在其基础
### DolphinScheduler-MicroBench 介绍
通常并不建议跑测试时,用较少的循环次数,但是较少的次数有助于确认基准测试时工作的,在确认结束后,再运行大量的基准测试。
```java
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 4, time = 1)
@ -49,6 +48,7 @@ public class EnumBenchMark extends AbstractBaseBenchmark {
}
```
这可以以方法级别或者类级别来运行基准测试,命令行的参数会覆盖annotation上的参数。
```java
@ -72,7 +72,9 @@ Iteration 2: 0.004 us/op
Iteration 3: 0.004 us/op
Iteration 4: 0.004 us/op
```
在经过预热后,我们通常会得到如下结果
```java
Benchmark (testNum) Mode Cnt Score Error Units
EnumBenchMark.simpleTest 101 thrpt 8 428750972.826 ± 66511362.350 ops/s

Some files were not shown because too many files have changed in this diff.
