<!--Thanks very much for contributing to Apache DolphinScheduler. Please review https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html before opening a pull request.-->
## Purpose of the pull request
<!--(For example: This pull request adds checkstyle plugin).-->
<!--*(for example:)*
- *Add maven-checkstyle-plugin to root pom.xml*
-->
## Verify this pull request
<!--*(Please pick either of the following options)*-->
| Stability | Accessibility | Features | Scalability |
|-----------|---------------|----------|-------------|
| Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, and visual variables, at a glance. | Supports pause and recover operations | Supports customized task types |
| Supports HA | Visualization of all workflow operations, dragging tasks to draw DAGs, and configuring data sources and resources; provides API-mode operations for third-party systems. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationships through tenants and Hadoop users, which is very important for scheduling big data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability increases linearly with the scale of the cluster. Master and Worker support dynamic adjustment. |
| Overload processing: the task queue mechanism allows the number of schedulable tasks on a single machine to be flexibly configured; machine jams are avoided with high tolerance for the number of tasks cached in the task queue. | One-click deployment | Supports traditional shell tasks as well as big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Procedure, Sub_Process | |
| Key | Default | Description |
|-----|---------|-------------|
|master.registry-disconnect-strategy.strategy|stop|Used when the master disconnects from the registry. Default value: stop. Optional values are stop and waiting.|
|master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnects from the registry and the disconnect strategy is waiting. The master will wait for the given time to reconnect to the registry; if it still cannot connect after that time, it will stop itself. If the value is 0s, the master will wait infinitely.|
|worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnects from the registry and the disconnect strategy is waiting. The worker will wait for the given time to reconnect to the registry; if it still cannot connect after that time, it will stop itself. If the value is 0s, the worker will wait infinitely.|
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
Among them, the Master monitors the directories of other Masters and Workers. If a remove event is triggered, it performs fault tolerance for the process instance or task instance according to the specific business logic.
- Master fault tolerance:
<p align="center">
- Tasks are processed according to **the priority of different process instances**, then **the priority of the same process instance**, then **the priority of tasks within the same process**, and finally **the submission order of tasks within the same process**, from highest to lowest.
- The specific implementation is to parse the priority from the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtaining tasks from the queue, we can get the highest-priority task by string comparison (see the sketch after this list).
- The priority of the process definition takes into account that some processes need to be processed before others. The priority can be configured when the process starts or is scheduled to start. There are 5 levels in total: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
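To make the queue-key idea concrete, here is a minimal, self-contained sketch (illustrative only: the class is invented for this document, and the zero-padding is just one way to make string comparison agree with numeric order):

```java
import java.util.TreeSet;

public class TaskPriorityQueueDemo {
    // Key layout: processInstancePriority_processInstanceId_taskPriority_taskId.
    // Ids are zero-padded so plain string comparison matches numeric order,
    // and the lowest key (highest priority) sits at the head of the sorted set.
    static String key(int processPriority, long processInstanceId, int taskPriority, long taskId) {
        return String.format("%d_%010d_%d_%010d", processPriority, processInstanceId, taskPriority, taskId);
    }

    public static void main(String[] args) {
        TreeSet<String> queue = new TreeSet<>();
        queue.add(key(1, 2, 3, 4));  // a HIGH-priority process instance
        queue.add(key(0, 9, 0, 1));  // a HIGHEST-priority (0) process instance wins regardless of ids
        System.out.println(queue.first()); // -> 0_0000000009_0_0000000001
    }
}
```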
You can customise the configuration by changing the following properties in `worker.properties`:
- worker.max.cpuload.avg=-1 (worker max cpuload avg: tasks can be dispatched to the worker server only when this value is higher than the system CPU load average; the default value -1 means the number of CPU cores * 2)
- worker.reserved.memory=0.3 (worker reserved memory: tasks can be dispatched to the worker server only when the system available memory is higher than this value; default value 0.3, the unit is G)
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
---
- The `user_id` in the `t_ds_udfs` table represents the user who created the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to use the UDF.
- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0.
- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.
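For instance, the sub-process relation can be read straight off that table (an illustrative query built only from the columns named above; the id `1` is an arbitrary example):

```sql
-- Find all sub-process instances spawned by main process instance 1,
-- together with the sub-process task node that spawned each of them.
SELECT r.process_instance_id     AS sub_process_instance_id,
       r.parent_task_instance_id AS spawning_task_instance_id
FROM t_ds_relation_process_instance r
WHERE r.parent_process_instance_id = 1;
```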
A standardized and unified API is the cornerstone of project design. The DolphinScheduler API follows the RESTful standard. RESTful is currently the most popular Internet software architectural style: it has a clear structure, conforms to standards, and is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a RESTful API.
## 1. URI design
REST stands for "Representational State Transfer". The design of a RESTful URI is based on resources; a resource corresponds to an entity on the network, for example: a piece of text, a picture, or a service. Each resource corresponds to a URI.
+ One Kind of Resource: expressed in the plural, such as `task-instances`, `groups`;
+ A Sub Resource: `/instances/{instanceId}/tasks/{taskId}`;
## 2. Method design
We need to locate a certain resource by URI, and then use Method or declare actions in the path suffix to reflect the operation of the resource.
### ① Query - GET
Use URI to locate the resource, and use GET to indicate query.
+ When the URI is a type of resource, it means to query a type of resource. For example, the following example indicates a paging query of `alert-groups`.
```
Method: GET
/dolphinscheduler/alert-groups
```
+ When the URI is a single resource, it means to query this resource. For example, the following example means to query the specified `alert-group`.
```
Method: GET
/dolphinscheduler/alert-groups/{id}
```
+ In addition, we can also express query sub-resources based on URI, as follows:
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paging queries. If we need to query all data, we need to add `/list` after the URI to distinguish it. Do not use the same API for both paged queries and full-list queries.**
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Use URI to locate the resource, use POST to indicate create, and then return the created id to requester.
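+ For example, create an `alert-group` (by analogy with the other examples in this section; the created id is returned in the response):
```
Method: POST
/dolphinscheduler/alert-groups
```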
### ③ Modify - PUT
Use URI to locate the resource, use PUT to indicate modification.
+ modify an `alert-group`
```
Method: PUT
/dolphinscheduler/alert-groups/{alertGroupId}
```
### ④ Delete - DELETE
Use URI to locate the resource, use DELETE to indicate delete.
+ delete an `alert-group`
```
Method: DELETE
/dolphinscheduler/alert-groups/{alertGroupId}
```
+ batch deletion: to batch delete an array of ids, we should use POST. **(Do not use the DELETE method: the body of a DELETE request has no semantic meaning, and some gateways, proxies, and firewalls may strip the request body off after receiving a DELETE request.)**
```
Method: POST
/dolphinscheduler/alert-groups/batch-delete
```
### ⑤ Partial Modifications - PATCH
Use URI to locate the resource, use PATCH to indicate partial modification.
```
Method: PATCH
/dolphinscheduler/alert-groups/{alertGroupId}
```
### ⑥ Others
In addition to creating, deleting, modifying, and querying, we can also locate the corresponding resource through the URI and then append an operation to the path suffix.
## 3. Parameter design
There are two types of parameters: request parameters and path parameters. Parameters must be named in lower camelCase.
In the case of paging, if the page number entered by the user is less than 1, the front end should automatically set it to 1, indicating that the first page is requested; when the backend finds that the page number entered by the user is greater than the total number of pages, it should directly return the last page.
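A minimal sketch of that clamping rule (illustrative only; the names are not taken from the DolphinScheduler code base):

```java
public final class PageClamp {
    // Clamp a requested page number into the valid range [1, totalPages].
    static int clamp(int requestedPage, int totalPages) {
        if (requestedPage < 1) {
            return 1;                                             // below range: first page
        }
        return Math.min(requestedPage, Math.max(totalPages, 1)); // above range: last page
    }

    public static void main(String[] args) {
        System.out.println(clamp(0, 10));  // -> 1
        System.out.println(clamp(42, 10)); // -> 10
    }
}
```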
## 4. Other designs
### base URL
The URI of the project needs to use `/<project_name>` as the base path, so as to identify that these APIs are under this project.
In contrast, API testing focuses on whether a complete operation chain can be completed.
For example, the API test of the tenant management interface focuses on whether users can log in normally; if the login fails, whether the error message is displayed correctly; and whether, after logging in, tenant management operations can be performed with the sessionId carried by the user.
## API Test
### API-Pages
In addition, during the testing process, the interfaces are not requested directly.
On the login page, only the input parameter specification of the interface request is defined; for the output of the interface request, only the unified basic response structure is defined. The data actually returned by the interface is checked in the concrete test cases, which verify whether the input and output of the main interfaces meet the requirements of the test cases.
### API-Cases
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation.
#### 2.2 Architectural description
* **MasterServer**
MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
##### The service mainly contains:
- **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
- **MasterTaskExecThread** is mainly responsible for task persistence
* **WorkerServer**
- WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
##### This service contains:
- **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
- **ZooKeeper**
The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
#### 2.3 Architectural Design Ideas
##### I. Decentralization vs. centralization
- In the decentralized design, there is usually no Master/Slave concept: all roles are the same and their status is equal. The global Internet is a typical decentralized distributed system; any networked node going down only affects a small range of features.
- The core of the decentralized design is that there is no "manager" distinct from the other nodes in the entire distributed system, so there is no single point of failure. However, since there is no "manager" node, each node needs to communicate with other nodes to obtain the necessary machine information, and the unreliability of distributed-system communication greatly increases the difficulty of implementing the functions above.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture, the manager in the cluster is elected dynamically rather than preset, and when the cluster fails, the nodes spontaneously hold "meetings" to elect a new "manager" to preside over the work. The most typical cases are ZooKeeper and Etcd, which is implemented in Go.
- The decentralization of DolphinScheduler is the registration of Masters/Workers in ZooKeeper: the Master cluster and the Worker cluster have no center, and a ZooKeeper distributed lock is used to elect one Master or Worker as the "manager" to perform the task.
##### II. Distributed lock practice
Service fault-tolerance design relies on ZooKeeper's Watcher mechanism.
The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic.
- Master fault tolerance flow chart:
<p align="center">
After ZooKeeper Master fault tolerance completes, the process is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "running" and "submitted successfully" tasks. For "running" tasks, it monitors the status of their task instances; for "submitted successfully" tasks, it determines whether they already exist in the Task Queue: if so, it likewise monitors the status of the task instance; if not, it resubmits the task instance.
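The sketch below is purely illustrative; the type and method names are invented and do not come from the DolphinScheduler code base:

```java
import java.util.List;
import java.util.Set;

public class FaultTolerancePass {
    enum State { RUNNING, SUBMITTED_SUCCESS, OTHER }
    record TaskInstance(long id, State state) {}

    // One recovery pass over the DAG of a process instance taken over from a dead Master.
    static void recover(List<TaskInstance> dagTasks, Set<Long> taskQueue) {
        for (TaskInstance task : dagTasks) {
            switch (task.state()) {
                case RUNNING -> monitor(task);          // keep watching its state
                case SUBMITTED_SUCCESS -> {
                    if (taskQueue.contains(task.id())) {
                        monitor(task);                  // already queued: just watch it
                    } else {
                        resubmit(task);                 // lost with the dead Master: resubmit
                    }
                }
                default -> { /* other states are left to normal scheduling */ }
            }
        }
    }

    static void monitor(TaskInstance t)  { System.out.println("monitor task "  + t.id()); }
    static void resubmit(TaskInstance t) { System.out.println("resubmit task " + t.id()); }
}
```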
- Worker fault tolerance flow chart:
<p align="center">
Here we must first distinguish between the concepts of task failure retry, process failure recovery, and process failure rerun:
- Process failure recovery is process-level and is done manually; recovery can only be performed **from the failed node** or **from the current node**.
- Process failure rerun is also process-level and is done manually; a rerun starts from the start node.
Next, back to the topic: we divide the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on.
Each **business node** can configure the number of failed retries; when a task node fails, it automatically retries until it succeeds or exceeds the configured retry count.
If a task in the workflow fails and reaches the maximum number of retries, the workflow fails and stops; the failed workflow can then be manually rerun or recovered.
##### V. Task priority design
In the early scheduling design, if there was no priority design and fair scheduling was used, a task submitted first could finish at the same time as a task submitted later, with no way to set the priority of processes or tasks. So we redesigned this; our current design is as follows:
- Tasks are processed from high to low according to **the priority of different process instances**, then **the priority of the same process instance**, then **the priority of tasks within the same process**, and finally **the submission order of tasks within the same process**.
- The specific implementation is to parse the priority from the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information in the ZooKeeper task queue; when obtaining a task from the queue, string comparison yields the task that needs to be executed first.
- The priority of the process definition takes into account that some processes need to be processed before others. This can be configured when the process starts or at the time of a scheduled start. There are 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<p align="center">
```java
public class TaskLogFilter extends Filter<ILoggingEvent> {
    // ... filter implementation elided ...
}
```
### summary
Starting from scheduling, this article introduces the architecture principles and implementation ideas of DolphinScheduler, a distributed workflow scheduling system for big data. To be continued.
The switch task workflow steps are as follows:
* `SwitchTaskExecThread` processes the expressions defined in `switch` from top to bottom, obtains the values of variables from `varPool`, and evaluates each expression through `javascript`. If an expression returns true, it stops checking and records the position of that expression, here recorded as `resultConditionLocation`. The work of `SwitchTaskExecThread` is then over.
* After the `switch` task runs, if no error occurred (most commonly, a user-defined expression that is out of specification or a problem with a parameter name), `MasterExecThread.submitPostNode` obtains the downstream nodes of the `DAG` to continue execution.
* If `DagHelper.parsePostNodes` finds that the current node (the node that has just completed its work) is a `switch` node, it obtains `resultConditionLocation` and skips all branches in the SwitchParameters except `resultConditionLocation`. In this way, only the branches that need to be executed remain.
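To make the control flow concrete, here is a minimal self-contained sketch of that top-to-bottom evaluation (illustrative only: `conditions` and `varPool` are stand-ins, and the toy predicate replaces the real javascript evaluation):

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class SwitchBranchDemo {
    // Walk the branch conditions from top to bottom and remember the first one
    // that evaluates to true, as SwitchTaskExecThread does; -1 means no match.
    static int resultConditionLocation(List<String> conditions,
                                       Map<String, Object> varPool,
                                       BiPredicate<String, Map<String, Object>> evaluate) {
        for (int i = 0; i < conditions.size(); i++) {
            if (evaluate.test(conditions.get(i), varPool)) {
                return i; // the first true branch wins; the rest are skipped
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> conditions = List.of("${flag} == 1", "${flag} == 2");
        Map<String, Object> varPool = Map.of("flag", 2);
        int loc = resultConditionLocation(conditions, varPool,
                (expr, pool) -> expr.endsWith(String.valueOf(pool.get("flag"))));
        System.out.println(loc); // -> 1
    }
}
```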
This module is currently a plug-in provided by us, and we now support dozens of plug-ins, such as Email, DingTalk, Script, etc.
#### Alert SPI Main class information.
`AlertChannelFactory`
The alarm plug-in factory interface. All alarm plug-ins need to implement this interface. It defines the name of the alarm plug-in and the required parameters, and its create method is used to create a concrete alarm plug-in instance.
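A rough sketch of that contract (method and helper-type names are paraphrased from the description above, not copied from the DolphinScheduler code base):

```java
import java.util.List;

// Stand-ins for the other SPI types referenced by the factory.
interface AlertChannel { /* sends one alert */ }
class PluginParams { /* describes one parameter the plug-in requires */ }

public interface AlertChannelFactory {
    String name();                // the alert plug-in's name
    List<PluginParams> params();  // the parameters the plug-in requires
    AlertChannel create();        // create a concrete alert plug-in instance
}
```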
The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]. The alert plug-ins currently provided include:
* SMS
SMS alerts
* FeiShu
FeiShu alert notification
* Slack
Slack alert notification
* PagerDuty
PagerDuty alert notification
* WebexTeams
WebexTeams alert notification
* Http
We have implemented an HTTP script for alerting, and calls to most alerting plug-ins end up as HTTP requests. If we do not support your alert plug-in yet, you can use HTTP to implement your alert logic. You are also welcome to contribute your common plug-ins to the community :)
### Development environment
#### Node installation
Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
#### Front-end project construction
Use the command line to `cd` into the `dolphinscheduler-ui` project directory and execute `npm install` to pull the project dependency packages.
> If `npm install` is very slow, you can set the taobao mirror
```
npm config set registry http://registry.npm.taobao.org/
```
```
API_BASE = http://127.0.0.1:12345
```
> ##### ! ! ! Special attention here. If the project reports a "node-sass error" while pulling the dependency packages, execute the following command again.
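The usual workaround is to reinstall node-sass on its own (the standard node-sass fix; run it inside `dolphinscheduler-ui`):
```
npm install node-sass --unsafe-perm
```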
Data Source Management => `http://localhost:8888/#/datasource/list`
Security Center => `http://localhost:8888/#/security/tenant`
```
| Tenant Management
| User Management
```
User Center => `http://localhost:8888/#/user/account`
The project `src/js/conf/home` is divided into
`pages` => route to page directory
```
The page file corresponding to the routing address
```
`router` => route management
```
vue router, the entry file index.js in each page will be registered. Specific operations: https://router.vuejs.org/zh/
```
`store` => status management
```
The page corresponding to each route has a state management file divided into:
Specific action: https://vuex.vuejs.org/zh/
```
## specification
## Vue specification
##### 1.Component name
Components are named with multiple words connected by hyphens (-), to avoid conflicts with HTML tags and to keep the structure clearer.
```
// positive example
export default {
  name: 'my-component-name'  // (illustrative) multiple words, hyphen-connected
}
```
##### 2.Component files
The internal common components of the `src/js/module/components` project use a folder name identical to the file name. Subcomponents and util tools split out inside a common component are placed in the component's internal `_source` folder.
```
└── components
├── header
    │   └── ...
```
##### 3.Prop
When you define a Prop, you should always name it in camelCase and use kebab-case (-) when assigning values in the parent component.
This follows the characteristics of each language: HTML tag attributes are case-insensitive, so hyphenated names are friendlier there, while camelCase is more natural in JavaScript.
```
props: {
  mazeyUser: Object  // camelCase in JS; written as mazey-user in templates
}
```
##### 4.v-for
When performing v-for traversal, you should always provide a key value to make rendering more efficient when updating the DOM.
```
<ul>
  <li v-for="item in list" :key="item.id">
  </li>
</ul>
```
v-for should be avoided on the same element as v-if (`for example: <li>`), because v-for has a higher priority than v-if. To avoid invalid computation and rendering, v-if should be placed on the container's parent element.
```
<ul v-if="showList">
  <li v-for="item in list" :key="item.id">
  </li>
</ul>
```
##### 5.v-if / v-else-if / v-else
If the elements in the same group of v-if logic controls are logically identical, Vue reuses the same parts for more efficient element switching (`such as: value`). To avoid unreasonable reuse effects, you should add a key to identical elements for identification.
```
<div v-if="hasData" key="mazey-data">
<span>{{ mazeyData }}</span>
</div>
```
##### 6.Instruction abbreviation
In order to unify the specification, the directive shorthand is always used. Using `v-bind` and `v-on` is not bad in itself; this is only a unified convention.
```
<input :value="mazeyUser" @click="verifyUser">
```
##### 7.Top-level element order of single file components
Styles are packaged within a file: all styles defined in a single Vue file also take effect on same-named classes in other files, so every component should have a top-level class name before creating styles.
Note: the sass plugin has been added to the project, and sass syntax can be written directly in a single Vue file.
For uniformity and ease of reading, the top-level elements should be placed in the order `<template>`, `<script>`, `<style>`.
## JavaScript specification
##### 1.var / let / const
It is recommended to no longer use var; use let / const instead, preferring const. Every variable must be declared before use, except functions defined with function, which may be placed anywhere.
##### 2.quotes
```
const foo = 'after division'
const bar = `${foo}, front-end engineer`
```
##### 3.function
Anonymous functions uniformly use arrow functions. When there are multiple parameters/return values, object destructuring assignment is preferred.
```
function getPersonInfo ({ name, sex }) {
  // ...
  return { name, sex }
}
```
Function names are uniformly camelCase. Names beginning with a capital letter are constructors; names beginning with a lowercase letter are ordinary functions, and the new operator should not be used to call ordinary functions.
##### 4.object
```
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
for (let [key, value] of myMap.entries()) {
  // ...
}
```
##### 5.module
Unified management of project modules using import / export.
```
// lib.js
export default {}
```
If the module has only one output value, use `export default`; otherwise, don't.
##### 1.Label
Do not write the type attribute when referencing external CSS or JavaScript: HTML5 defaults to text/css and text/javascript, so there is no need to specify them.
Class and ID naming should be semantic, so you can see what an element does just by looking at its name; multiple words are connected by hyphens.
```
// positive example
.test-header{
  /* ... */
}
```
##### 3.Attribute abbreviation
CSS attributes use abbreviations as much as possible to improve the efficiency and ease of understanding of the code.
```
border: 1px solid #ccc;
```
##### 4.Document type
The HTML5 standard should always be used.
```
<!DOCTYPE html>
```
##### 5.Notes
A block comment should be written at the top of a module file.
```
/**
* @module mazey/api
 */
```
## interface
##### All interfaces are returned as Promise
Note that a non-zero `code` indicates an error and is handled in catch.
```
test.then(res => {
  // code === 0: success, handle res.data
}).catch(err => {
  // non-zero code: handle the error message
})
```
Normal return
```
{
code:0,
  msg: 'success'
}
```
Error return
```
{
code:10000,
msg:'failed'
}
```
If the interface is a POST request, the Content-Type defaults to application/x-www-form-urlencoded; if the Content-Type is changed to application/json, the interface parameters need to be passed as a JSON string in the request body instead.
(1) First place the node's icon in the `src/js/conf/home/pages/dag/img` folder, named `toolbar_${node type English name defined in the backend, for example: SHELL}.png`.
(2) Find the `tasksType` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
'DEPENDENT': { // The background definition node type English name is used as the key value
desc: 'DEPENDENT', // tooltip desc
  // ...
}
```
(3) Add a `${node type (lowercase)}.vue` file in `src/js/conf/home/pages/dag/_source/formModel/tasks`. The contents of the components related to the current node are written here. Every node component must have a `_verification()` function; after verification succeeds, the relevant data of the current component is emitted to the parent component.
```
/**
* Verification
 */
```
(4) Common components used inside the node component are under `_source`, and `commcon.js` is used to configure public data.
##### 2.Increase the status type
(1) Find the `tasksState` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
  // ... add the new status entry here, same shape as the existing ones
```
##### 3.Add the action bar tool
(1) Find the `toolOper` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
{
code: 'pointer', // tool identifier
  // ...
}
```
`util.js` => belongs to the `plugIn` tool class
The operation is handled in the `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` event.
##### 3.Add a routing page
(1) First add a routing address in route management: `src/js/conf/home/router/index.js`
```
{
path: '/test', // routing address
  // ...
}
```
This will give you direct access to `http://localhost:8888/#/test`
##### 4.Increase the preset mailbox
Find `src/lib/localData/email.js`: the preset email addresses defined there are automatically matched in the drop-down of the startup and scheduled email-address inputs.
- For error logs or long code examples, please use [GitHub gist](https://gist.github.com/) and include only a few lines of the pertinent code / log within the email.
Moreover, when we intend to refer to new software (not limited to a 3rd-party jar, text, pictures, etc.) in our project, we should pay attention to its license.
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
For example, we must include the NOTICE file of ZooKeeper in our project when we are using ZooKeeper (every open-source project has a NOTICE file, generally under the root directory). As Apache explains, "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work.
We are not going to dive into every 3rd-party open-source license policy; you may look them up if interested.
We need to follow these steps when we add new jars or external resources:
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
* [ASF 3RD PARTY LICENSE POLICY](https://apache.org/legal/resolved.html)
The following Code of Conduct is based on full compliance with the [Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
## Development philosophy
- **Consistent** code style, naming, and usage are consistent.
- **Easy to read** code is obvious, easy to read and understand, when debugging one knows the intent of the code.
- **Neat** agree with the concepts of *Refactoring* and *Clean Code* and pursue clean and elegant code.
- Use accurate assertions; try not to use `not` or `containsString` assertions.
- The actual value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
- Classes and Methods with `@Test` labels do not require javadoc.
- Public specifications.
- Each line is no longer than `200` characters, ensuring that each line is semantically complete for easy understanding.
Pull Request is a way of software cooperation, which is a process of bringing code involving different functions into the trunk. During this process, the code can be discussed, reviewed, and modified.
In Pull Request, we try not to discuss the implementation of the code. The general implementation of the code and its logic should be determined in Issue. In the Pull Request, we only focus on the code format and code specification, so as to avoid wasting time caused by different opinions on implementation.
See [Code Style](../development-environment-setup.md#code-style) for details.
The second case is that multiple issues have subtle differences.
In this scenario, the responsibilities of each issue can be clearly divided. The type of each issue is marked as Sub-Task, and then these sub task type issues are associated with one issue.
Each submitted Pull Request should be associated with only one sub-task issue.
| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close the issue and inform the creator of the fixed version if it has already been released |
| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close the issue and give the creator the link to the same issue |
| Description not clear | Without detailed reproduction steps | [need more information][label-need-more-information] | Ask the creator to add more description |
In addition to giving suggestions, adding labels for issues is also important during review: properly labeled issues are easier to process further. An issue can have more than one label. Common issue categories are:
The steps to unsubscribe from the mailing list are as follows:
2. Receive a confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if it is not received, please check whether the email has been automatically classified as spam, promotion, or subscription email). Then reply directly to that email, or click the link in the email to reply quickly; the subject and content are arbitrary.
3. Receive a goodbye email. After completing the above steps, you will receive a goodbye email with the subject GOODBYE from dev@dolphinscheduler.apache.org. You have then successfully unsubscribed from the Apache DolphinScheduler mailing list and will no longer receive emails from dev@dolphinscheduler.apache.org.
- Unit tests should be well designed and should avoid useless code.
- When you find a `method` that is difficult to unit test, and you confirm that the `method` is `bad code`, refactor it together with the developer.
<!-- markdown-link-check-disable -->
- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
<!-- markdown-link-check-enable -->
- TDD (optional): When you start writing a new feature, you can try writing test cases first.
That is, the workflow instance ID and task instance ID are injected into the printed logs.
- Branch printing of logs is prohibited. The contents of a log line need to stay associated with the related information in the log format; printing them on separate lines causes the contents to lose their association with the time and other fields, and, in an environment with a large number of logs, mixes the lines together and makes log retrieval more difficult.
- Using the "+" operator to splice log content is prohibited. Use placeholders to format logs for printing, which improves memory efficiency.
- When the log content includes an object instance, make sure to override the toString() method to avoid printing a meaningless hashcode.
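A minimal sketch of these rules in practice (SLF4J-style placeholders; the class and message are illustrative):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TaskDispatchLogger {
    private static final Logger logger = LoggerFactory.getLogger(TaskDispatchLogger.class);

    void onDispatch(long workflowInstanceId, long taskInstanceId, String host) {
        // One self-contained line with placeholder formatting -- no "+" splicing --
        // carrying the ids that keep the line associated with its context.
        logger.info("workflow instance {} task instance {} dispatched to host {}",
                workflowInstanceId, taskInstanceId, host);
    }
}
```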
The Group Chat send type means the alert results are notified via a group chat created through the Enterprise WeChat API, sending the message to all members of the group; specifying individual users is not supported.
## Reference
- Group Chat: https://work.weixin.qq.com/api/doc/90000/90135/90248
The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy checking, and two-table value comparison. The running environment of the data quality task is Spark 2.4.0; other versions have not been verified, and users can verify them by themselves.
| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
| Example |<ul><li>CheckFormula:Expected-Actual</li><li>Operator:></li><li>Threshold:0</li><li>ExpectedValue:FixValue=9</li></ul> |
In the example, assuming that the actual value is 10, the operator is >, and the expected value is 9, the result 10 - 9 > 0 is true, which means that the number of rows with an empty column value has exceeded the threshold, and the task is judged to fail.
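Expressed as code, that judgement looks roughly like this (a sketch of the semantics only, following the 10 - 9 arithmetic of the worked example, not the actual checking engine):

```java
public class CheckFormulaDemo {
    public static void main(String[] args) {
        double actual = 10;    // e.g. the number of rows with an empty column value
        double expected = 9;   // ExpectedValue: FixValue=9
        double threshold = 0;  // Threshold: 0
        boolean failed = (actual - expected) > threshold; // Operator: >
        System.out.println(failed ? "check failed -> run failure strategy" : "check passed");
    }
}
```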
## Null Value Check for Single Table Check
### Inspection Introduction
The goal of the null value check is to check the number of empty rows in the specified column.
- The SQL statement that counts null values is as follows:
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
- The SQL to calculate the total number of rows in the table is as follows:
```sql
SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
```
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Expected value type | Select the desired type from the drop-down menu. |
## Timeliness Check of Single Table Check
### Inspection Introduction
The timeliness check is used to check whether the data is processed within the expected time. A start time and an end time can be specified to define the time range; if the amount of data within the time range does not reach the set threshold, the check task is judged as failed.
### Interface Operation Guide
## Field Length Check for Single Table Check
### Inspection Introduction
The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail.
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
## Uniqueness Check for Single Table Check
### Inspection Introduction
The goal of the uniqueness check is to check whether the fields are duplicated. It is generally used to check whether the primary key is duplicated. If there are duplicates and the threshold is reached, the check task will be judged to be failed.
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
## Regular Expression Check for Single Table Check
### Inspection Introduction
The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Expected value type | Select the desired type from the drop-down menu. |
## Enumeration Value Validation for Single Table Check
### Inspection Introduction
The goal of enumeration value verification is to check whether the value of a field is within the range of the enumeration value. If there is data that is not in the range of the enumeration value and exceeds the threshold, the task will be judged to fail.
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
## Table Row Number Verification for Single Table Check
### Inspection Introduction
The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
### Interface Operation Guide
| **Parameter** | **Description** |
|---|---|
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
If you compare the data in c1 and c21, the tables test1 and test2 are exactly the same.

| **Parameter** | **Description** |
|---|---|
| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
## Two-Table Value Comparison
### Inspection Introduction
Two-table value comparison allows users to customize different SQL statistics for two tables and compare the corresponding values. For example, for source table A, calculate the total of a certain column to obtain sum1; for target table B, calculate the total of a certain column to obtain sum2; then compare sum1 and sum2 to determine the check result.
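For instance, the two custom statistics might look like this (illustrative SQL; the `amount` column is invented, while `test1`/`test2` follow the example tables above):

```sql
-- statistics on source table A
SELECT SUM(amount) AS sum1 FROM test1;
-- statistics on target table B
SELECT SUM(amount) AS sum2 FROM test2;
-- the check then compares sum1 with sum2
```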
### Interface Operation Guide
- Driver download link [SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar](https://s3.cn-north-1.amazonaws.com.cn/athena-downloads-cn/drivers/JDBC/SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar)
This article describes how to add a new master service or worker service to an existing cluster.
* [required] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (version 1.8+): must be installed; install it and configure the `JAVA_HOME` and `PATH` variables under `/etc/profile`
* [optional] If the expansion is a worker node, you need to consider whether to install an external client, such as Hadoop, Hive, Spark Client.
```markdown
Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
```
```
zookeeper.properties: information for connecting zk
common.properties: configuration information about the resource store (if Hadoop is set up, check whether the core-site.xml and hdfs-site.xml configuration files exist)
dolphinscheduler_env.sh: environment variables
```
- Modify the `dolphinscheduler_env.sh` environment variable file in the `bin/env` directory according to the machine configuration (the following example assumes all the used software is installed under `/opt/soft`)
```shell
# ... other exports elided ...
export JAVA_HOME=/opt/soft/java
```
> Attention: this step is very important; for example, `JAVA_HOME` and `PATH` must be configured. Variables that are not used can simply be ignored or commented out.
- Soft link the `JDK` to `/usr/bin/java` (still using `JAVA_HOME=/opt/soft/java` as an example)
```shell
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
```
- Modify the configuration file `conf/config/install_config.conf` on the **all** nodes, synchronizing the following configuration.
* To add a new master node, you need to modify the IPs and masters parameters.
* To add a new worker node, modify the IPs and workers parameters.
Here we use MySQL as an example to illustrate how to configure an external database:
> NOTE: If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler
> which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
* First of all, follow the instructions in the [datasource-setting](datasource-setting.md) `Pseudo-Cluster/Cluster Initialize the Database` section to create and initialize the database.
* Set the following environment variables in your terminal with your database address, username and password for `{address}`, `{user}` and `{password}`:
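A sketch of those exports (these are the standard Spring datasource variables; the exact set may differ slightly between versions):

```shell
export DATABASE=mysql
export SPRING_DATASOURCE_URL="jdbc:mysql://{address}/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME={user}
export SPRING_DATASOURCE_PASSWORD={password}
```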
DolphinScheduler stores metadata in a `relational database`. Currently, we support MySQL and PostgreSQL.
> If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
For MySQL 5.6 / 5.7:
```shell
# ... database creation and GRANT statements elided ...
mysql> FLUSH PRIVILEGES;
```
For PostgreSQL:
```shell
# Use psql-tools to login PostgreSQL
psql
```
> But if you want to use MySQL as the metadata database of DolphinScheduler, it only supports version [8.0.16 and above](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar).
> **_Note:_** For a first-time deployment, the message `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear five times in the terminal;
> this is unimportant information that you can ignore.
This section describes the one-click deployment of high-availability DolphinScheduler.
* Click `install` on the right side of DolphinScheduler to go to the installation page. Fill in the corresponding information and click `OK` to start the installation. You will be automatically redirected to the application view.
API and Worker Services share the configuration file `/opt/dolphinscheduler/conf/common.properties`. To modify the configurations, you only need to modify that of the API service.
Take `DataX` as an example:
* LOCK_PATH: /opt/soft
3. Update the component; the plug-in `DataX` will be downloaded automatically and decompressed to `/opt/soft`.
For example, you can get the master metrics with `curl` against the master's actuator endpoint.
- ds.task.execution.count: (counter) the number of executed tasks
- ds.task.execution.duration: (histogram) duration of task executions
### Workflow Related Metrics
- ds.workflow.create.command.count: (counter) the number of commands created and inserted by workflows
- system.load.average.1m: the total number of runnable entities queued to available processors and runnable entities running on the available processors averaged over a period
- logback.events: the number of events that made it to the logs grouped by the tag `level`
- http.server.requests: total number of http requests
This page describes details regarding the Project screen in Apache DolphinScheduler. Here, you will see all the functions which can be handled on this screen. The following table explains commonly used terms in Apache DolphinScheduler:
| Term | Description |
|------|-------------|
| DAG | Tasks in a workflow are assembled in the form of a Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero in-degree until there are no subsequent nodes. |
| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |
Click `Project Management -> Workflow -> Task Instance` to enter the task instance page, as shown in the figure below, click the name of the workflow instance to jump to the DAG diagram of the workflow instance to view the task status.
Click the `View Log` button in the operation column to view the log of the task.
- SavePoint: Click the `SavePoint` button in the operation column to do stream task savepoint.
- Stop: Click the `Stop` button in the operation column to stop the stream task.
If the DAG contains stream tasks, the relationship between stream tasks is displayed as a dotted line, and the execution of stream tasks will be skipped when the workflow instance is executed.
* Cc: if the notification policy, a timeout alarm, or fault tolerance is triggered, the process result information or warning email is also copied to the CC list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
* Serial complement: within the specified time range, the complement executes from the start date to the end date, generating multiple process instances in turn. For example, click Run workflow and select the serial complement mode from July 9 to July 10: the workflow executes in sequence, generating two process instances in sequence on the process instance page.
- Select a start and end time. Within the start and end time range, the workflow is run regularly; outside the start and end time range, no timed workflow instance will be generated.
- Add a schedule that executes once every 5 minutes, as shown in the following figure:
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
- Click the "Create" button to create the schedule. The schedule status is now "**Offline**" and the schedule needs to be brought **Online** to take effect.
- Schedule online: Click the `Timing Management` button <img src="../../../../img/timeManagement.png" width="35"/> to enter the timing management page, then click the `online` button; the timing status will change to `online`, as shown in the figure below, and the workflow takes effect periodically.
- **Edit:** Only processes with success/failed/stop status can be edited. Click the "Edit" button or the workflow instance name to enter the DAG edit page; after editing, click the "Save" button to confirm, as shown in the figure below. In the pop-up box, check "Whether to update the workflow definition": if checked, the modifications made on the instance are saved back to the workflow definition; if not, the workflow definition is not updated.
- **Recovery Failed:** For failed processes, a failure-recovery operation can be performed, starting from the failed node.
- **Stop:** **Stop** the running process; the background code will first `kill` the worker process and then execute the `kill -9` operation.
- **Pause:** **Pause** the running process; the system status changes to **waiting for execution**, waiting for the currently running task to finish and pausing the next task to be executed.
- **Resume pause:** Resume the paused process, starting directly from the **paused node**.
- **Delete:** Delete the workflow instance and the task instances under it.
- **Gantt Chart:** The vertical axis of the Gantt chart is the topological sorting of task instances of the workflow instance, and the horizontal axis is the running time of the task instances, as shown in the figure:
Create a task node in the workflow definition, then select the worker group and the environment corresponding to that worker group.
## Cluster Management
> Add or update cluster
> - Each process can be related to zero or several clusters to support multiple environments; currently only k8s is supported.
>
> Usage cluster
> - After creation and authorization, k8s namespaces and processes will be associated with clusters. Each cluster will have separate workflows and task instances running independently.
After ZooKeeper Master fault tolerance is completed, scheduling is taken over again by the Scheduler thread in DolphinScheduler: it traverses the DAG to find the "running" and "submitted successfully" tasks, monitors the status of the task instances of the "running" tasks, and for the "submitted successfully" tasks determines whether they already exist in the Task Queue; if so, it likewise monitors the task instance status, and if not, it resubmits the task instance.
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
Taking Apache as an example: when we use ZooKeeper, the NOTICE file of ZooKeeper (every open-source project has a NOTICE file, generally under the root directory) must be reflected in our project. In Apache's words, "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work.
The purpose of recording the context of changes in a commit is, as Peter Hutterer put it: "Re-establishing the context of a piece of code is wasteful. We can't avoid it completely, so our efforts should go to reducing it as much as possible. Commit messages can do exactly that and, as a result, a commit message shows whether a developer is a good collaborator." Therefore, DolphinScheduler has formulated this convention by drawing on other communities as well as the official Apache documentation.