<!--Thanks very much for contributing to Apache DolphinScheduler. Please review https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html before opening a pull request.-->
## Purpose of the pull request
<!--(For example: This pull request adds checkstyle plugin).-->
@ -8,8 +7,9 @@
## Brief change log
<!--*(for example:)*
- *Add maven-checkstyle-plugin to root pom.xml*
- *Add maven-checkstyle-plugin to root pom.xml*
-->
## Verify this pull request
<!--*(Please pick either of the following options)*-->
@ -25,9 +25,9 @@ This pull request is already covered by existing tests, such as *(please describ
This change added tests and can be verified as follows:
<!--*(example:)*
- *Added dolphinscheduler-dao tests for end-to-end.*
- *Added CronUtilsTest to verify the change.*
- *Manually verified the change by testing locally.* -->
- *Added dolphinscheduler-dao tests for end-to-end.*
- *Added CronUtilsTest to verify the change.*
- *Manually verified the change by testing locally.* -->
@ -21,35 +18,35 @@ Dolphin Scheduler Official Website
DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available `out of the box`.
Its main objectives are as follows:
- Highly Reliable,
- Highly Reliable,
DolphinScheduler adopts a decentralized multi-master and multi-worker architecture design, which naturally supports easy expansion and high availability (not restricted by a single point of bottleneck), and its performance increases linearly with the increase of machines
- High performance, supporting tens of millions of tasks every day
- Support multi-tenant.
- Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also
- High performance, supporting tens of millions of tasks every day
- Support multi-tenant.
- Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also
supports Kubernetes, Docker deployment and custom task types, distributed
scheduling, with overall scheduling capability increased linearly with the
scale of the cluster
- Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc.
- Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc.
- Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time.
- WYSIWYG online editing tasks
- Support the priority of workflows & tasks, task failover, and task timeout alarm or failure.
- Support workflow global parameters and node customized parameter settings.
- Support online upload/download/management of resource files, etc. Support online file creation and editing.
- Support task log online viewing and scrolling and downloading, etc.
- Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics.
- Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow.
- Support back-filling data.
- Support internationalization.
- More features waiting for partners to explore...
- Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc.
- Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc.
- Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time.
- WYSIWYG online editing tasks
- Support the priority of workflows & tasks, task failover, and task timeout alarm or failure.
- Support workflow global parameters and node customized parameter settings.
- Support online upload/download/management of resource files, etc. Support online file creation and editing.
- Support task log online viewing and scrolling and downloading, etc.
- Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics.
- Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow.
- Support back-filling data.
- Support internationalization.
- More features waiting for partners to explore...
## What's in DolphinScheduler
Stability | Accessibility | Features | Scalability |
Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance. | Support pause, recover operation | Support customized task types
support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment.
Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
| Stability | Accessibility | Features | Scalability |
| Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance.| Support pause, recover operation | Support customized task types |
| support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment. |
| Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
> - The above recommended configuration is the minimum configuration for deploying DolphinScheduler. Higher configuration is strongly recommended for production environments.
@ -4,4 +4,4 @@ Apache DolphinScheduler provides a distributed and easy to expand visual workflo
Apache DolphinScheduler aims to solve complex big data task dependencies and to trigger relationships in data OPS orchestration for various big data applications. Solves the intricate dependencies of data R&D ETL and the inability to monitor the health status of tasks. DolphinScheduler assembles tasks in the Directed Acyclic Graph (DAG) streaming mode, which can monitor the execution status of tasks in time, and supports operations like retry, recovery failure from specified nodes, pause, resume, and kill tasks, etc.
|master.registry-disconnect-strategy.strategy|stop|Used when the master disconnect from registry, default value: stop. Optional values include stop, waiting|
|master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnect from registry, and the disconnect strategy is waiting, this config means the master will waiting to reconnect to registry in given times, and after the waiting times, if the master still cannot connect to registry, will stop itself, if the value is 0s, the Master will waitting infinitely|
|worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnect from registry, and the disconnect strategy is waiting, this config means the worker will waiting to reconnect to registry in given times, and after the waiting times, if the worker still cannot connect to registry, will stop itself, if the value is 0s, will waitting infinitely |
MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper.
MasterServer provides monitoring services based on netty.
MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper.
MasterServer provides monitoring services based on netty.
#### The Service Mainly Includes:
- **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
#### The Service Mainly Includes:
- **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
- **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
- **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
- **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
- **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
- **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
- **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances;
- **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
- **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;
- **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances;
- **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
- **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;
* **WorkerServer**
- **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
* **WorkerServer**
When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat.
WorkerServer provides monitoring services based on netty.
#### The Service Mainly Includes:
WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
- **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat.
WorkerServer provides monitoring services based on netty.
- **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
#### The Service Mainly Includes:
- **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status;
- **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
* **ZooKeeper**
- **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
- **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status;
We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
* **ZooKeeper**
* **AlertServer**
ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
* **AlertServer**
Provides alarm services, and implements rich alarm methods through alarm plugins.
* **API**
* **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external.
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external.
* **UI**
* **UI**
The front-end page of the system provides various visual operation interfaces of the system, see more at [Introduction to Functions](../guide/homepage.md) section.
@ -84,6 +84,7 @@
##### Centralized Thinking
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
@ -120,8 +121,6 @@ The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and
</p>
Among them, the Master monitors the directories of other Masters and Workers. If the remove event is triggered, perform fault tolerance of the process instance or task instance according to the specific business logic.
- Master fault tolerance:
<palign="center">
@ -146,7 +145,7 @@ Fault-tolerant content: When sending the remove event of the Worker node, the Ma
Fault-tolerant post-processing: Once the Master Scheduler thread finds that the task instance is in the "fault-tolerant" state, it takes over the task and resubmits it.
Note: Due to "network jitter", the node may lose heartbeat with ZooKeeper in a short period of time, and the node's remove event may occur. For this situation, we use the simplest way, that is, once the node and ZooKeeper timeout connection occurs, then directly stop the Master or Worker service.
Note: Due to "network jitter", the node may lose heartbeat with ZooKeeper in a short period of time, and the node's remove event may occur. For this situation, we use the simplest way, that is, once the node and ZooKeeper timeout connection occurs, then directly stop the Master or Worker service.
##### Task Failed and Try Again
@ -170,26 +169,26 @@ If there is a task failure in the workflow that reaches the maximum retry times,
In the early schedule design, if there is no priority design and use the fair scheduling, the task submitted first may complete at the same time with the task submitted later, thus invalid the priority of process or task. So we have re-designed this, and the following is our current design:
- According to **the priority of different process instances** prior over **priority of the same process instance** prior over **priority of tasks within the same process** prior over **tasks within the same process**, process task submission order from highest to Lowest.
- The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtain from the task queue, we can get the highest priority task by comparing string.
- According to **the priority of different process instances** prior over **priority of the same process instance** prior over **priority of tasks within the same process** prior over **tasks within the same process**, process task submission order from highest to Lowest.
- The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtain from the task queue, we can get the highest priority task by comparing string.
- The priority of the process definition is to consider that some processes need to process before other processes. Configure the priority when the process starts or schedules. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
- The priority of the process definition is to consider that some processes need to process before other processes. Configure the priority when the process starts or schedules. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
@ -57,3 +57,4 @@ You can customise the configuration by changing the following properties in work
- worker.max.cpuload.avg=-1 (worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2)
- worker.reserved.memory=0.3 (worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G)
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
---
@ -15,7 +15,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- One tenant can own Multiple users.
- The queue field in the `t_ds_user` table stores the `queue_name` information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using `queue_id` column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
- The `user_id` field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
@ -26,6 +26,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- The `user_id` in the `t_ds_udfs` table represents the user who create the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0.
- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.
- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.
A standardized and unified API is the cornerstone of project design.The API of DolphinScheduler follows the REST ful standard. REST ful is currently the most popular Internet software architecture. It has a clear structure, conforms to standards, is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a Restful API.
## 1. URI design
REST is "Representational State Transfer".The design of Restful URI is based on resources.The resource corresponds to an entity on the network, for example: a piece of text, a picture, and a service. And each resource corresponds to a URI.
+ One Kind of Resource: expressed in the plural, such as `task-instances`、`groups` ;
@ -12,36 +14,43 @@ REST is "Representational State Transfer".The design of Restful URI is based on
+ A Sub Resource:`/instances/{instanceId}/tasks/{taskId}`;
## 2. Method design
We need to locate a certain resource by URI, and then use Method or declare actions in the path suffix to reflect the operation of the resource.
### ① Query - GET
Use URI to locate the resource, and use GET to indicate query.
+ When the URI is a type of resource, it means to query a type of resource. For example, the following example indicates paging query `alter-groups`.
```
Method: GET
/dolphinscheduler/alert-groups
```
+ When the URI is a single resource, it means to query this resource. For example, the following example means to query the specified `alter-group`.
```
Method: GET
/dolphinscheduler/alter-groups/{id}
```
+ In addition, we can also express query sub-resources based on URI, as follows:
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paging query. If we need to query all data, we need to add `/list` after the URI to distinguish. Do not mix the same API for both paged query and query.**
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Use URI to locate the resource, use POST to indicate create, and then return the created id to requester.
Use URI to locate the resource, use PUT to indicate modify.
+ modify an `alert-group`
```
Method: PUT
/dolphinscheduler/alter-groups/{alterGroupId}
```
### ④ Delete -DELETE
Use URI to locate the resource, use DELETE to indicate delete.
+ delete an `alert-group`
```
Method: DELETE
/dolphinscheduler/alter-groups/{alterGroupId}
```
+ batch deletion: batch delete the id array,we should use POST. **(Do not use the DELETE method, because the body of the DELETE request has no semantic meaning, and it is possible that some gateways, proxies, and firewalls will directly strip off the request body after receiving the DELETE request.)**
```
Method: POST
/dolphinscheduler/alter-groups/batch-delete
```
### ⑤ Partial Modifications -PATCH
Use URI to locate the resource, use PATCH to partial modifications.
```
@ -89,20 +105,27 @@ Method: PATCH
```
### ⑥ Others
In addition to creating, deleting, modifying and quering, we also locate the corresponding resource through url, and then append operations to it after the path, such as:
There are two types of parameters, one is request parameter and the other is path parameter. And the parameter must use small hump.
In the case of paging, if the parameter entered by the user is less than 1, the front end needs to automatically turn to 1, indicating that the first page is requested; When the backend finds that the parameter entered by the user is greater than the total number of pages, it should directly return to the last page.
## 4. Others design
### base URL
The URI of the project needs to use `/<project_name>` as the base path, so as to identify that these APIs are under this project.
@ -10,7 +10,6 @@ In contrast, API testing focuses on whether a complete operation chain can be co
For example, the API test of the tenant management interface focuses on whether users can log in normally; If the login fails, whether the error message can be displayed correctly. After logging in, you can perform tenant management operations through the sessionid you carry.
## API Test
### API-Pages
@ -49,7 +48,6 @@ In addition, during the testing process, the interface are not requested directl
On the login page, only the input parameter specification of the interface request is defined. For the output parameter of the interface request, only the unified basic response structure is defined. The data actually returned by the interface is tested in the actual test case. Whether the input and output of main test interfaces can meet the requirements of test cases.
### API-Cases
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation.
Before explaining the architecture of the schedule system, let us first understand the common nouns of the schedule system.
### 1.Noun Interpretation
@ -12,7 +13,7 @@ Before explaining the architecture of the schedule system, let us first understa
</p>
</p>
**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes
**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes
**Process instance**: A process instance is an instantiation of a process definition, which can be generated by manual startup or scheduling. The process definition runs once, a new process instance is generated
@ -34,11 +35,10 @@ Before explaining the architecture of the schedule system, let us first understa
**Complement**: Complement historical data, support **interval parallel and serial** two complement methods
@ -46,60 +46,51 @@ Before explaining the architecture of the schedule system, let us first understa
</p>
</p>
#### 2.2 Architectural description
* **MasterServer**
* **MasterServer**
MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
#####The servicemainly contains:
##### The service mainly contains:
- **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
- **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
- **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types**
- **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types**
- **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types
- **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types
- **MasterTaskExecThread** is mainly responsible for task persistence
- **MasterTaskExecThread** is mainly responsible for task persistence
* **WorkerServer**
-WorkerServeralsoadopts a distributed, non-central design concept.WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
* **WorkerServer**
##### This service contains:
- WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
- **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
- **ZooKeeper**
##### This service contains:
The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
- **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
- **Task Queue**
- **ZooKeeper**
The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance.
The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
- **Alert**
- **Task Queue**
Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance.
- **API**
- **Alert**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally.
Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
- **UI**
- **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally.
Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
- **UI**
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
#### 2.3 Architectural Design Ideas
@ -130,10 +121,9 @@ Problems in the design of centralized :
- In the decentralized design, there is usually no Master/Slave concept, all roles are the same, the status is equal, the global Internet is a typical decentralized distributed system, networked arbitrary node equipment down machine , all will only affect a small range of features.
- The core design of decentralized design is that there is no "manager" that is different from other nodes in the entire distributed system, so there is no single point of failure problem. However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliable line of distributed system communication greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamic centralized distributed systems are constantly emerging. Under this architecture, the managers in the cluster are dynamically selected, rather than preset, and when the cluster fails, the nodes of the cluster will spontaneously hold "meetings" to elect new "managers". Go to preside over the work. The most typical case is the Etcd implemented in ZooKeeper and Go.
- Decentralization of DolphinScheduler is the registration of Master/Worker to ZooKeeper. The Master Cluster and the Worker Cluster are not centered, and the Zookeeper distributed lock is used to elect one Master or Worker as the “manager” to perform the task.
##### 二、Distributed lock practice
##### 二、Distributed lock practice
DolphinScheduler uses ZooKeeper distributed locks to implement only one Master to execute the Scheduler at the same time, or only one Worker to perform task submission.
@ -184,8 +174,6 @@ Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The impl
The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic.
- Master fault tolerance flow chart:
<palign="center">
@ -194,8 +182,6 @@ The Master monitors the directories of other Masters and Workers. If the remove
After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "Running" and "Submit Successful" tasks, and monitors the status of its task instance for the "Running" task. You need to determine whether the Task Queue already exists. If it exists, monitor the status of the task instance. If it does not exist, resubmit the task instance.
- Worker fault tolerance flow chart:
<palign="center">
@ -204,7 +190,7 @@ After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler
Once the Master Scheduler thread finds the task instance as "need to be fault tolerant", it takes over the task and resubmits.
Note: Because the "network jitter" may cause the node to lose the heartbeat of ZooKeeper in a short time, the node's remove event occurs. In this case, we use the easiest way, that is, once the node has timeout connection with ZooKeeper, it will directly stop the Master or Worker service.
Note: Because the "network jitter" may cause the node to lose the heartbeat of ZooKeeper in a short time, the node's remove event occurs. In this case, we use the easiest way, that is, once the node has timeout connection with ZooKeeper, it will directly stop the Master or Worker service.
###### 2. Task failure retry
@ -214,8 +200,6 @@ Here we must first distinguish between the concept of task failure retry, proces
- Process failure recovery is process level, is done manually, recovery can only be performed **from the failed node** or **from the current node**
- Process failure rerun is also process level, is done manually, rerun is from the start node
Next, let's talk about the topic, we divided the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on.
@ -225,16 +209,12 @@ Each **service node** can configure the number of failed retries. When the task
If there is a task failure in the workflow that reaches the maximum number of retries, the workflow will fail to stop, and the failed workflow can be manually rerun or process resumed.
##### V. Task priority design
In the early scheduling design, if there is no priority design and fair scheduling design, it will encounter the situation that the task submitted first may be completed simultaneously with the task submitted subsequently, but the priority of the process or task cannot be set. We have redesigned this, and we are currently designing it as follows:
- According to **different process instance priority** prioritizes **same process instance priority** prioritizes **task priority within the same process** takes precedence over **same process** commit order from high Go to low for task processing.
- The specific implementation is to resolve the priority according to the json of the task instance, and then save the **process instance priority _ process instance id_task priority _ task id** information in the ZooKeeper task queue, when obtained from the task queue, Through string comparison, you can get the task that needs to be executed first.
- The priority of the process definition is that some processes need to be processed before other processes. This can be configured at the start of the process or at the time of scheduled start. There are 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<palign="center">
@ -308,8 +288,6 @@ Public class TaskLogFilter extends Filter<ILoggingEvent> {
}
```
### summary
Starting from the scheduling, this paper introduces the architecture principle and implementation ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued
@ -6,3 +6,4 @@ Switch task workflow step as follows
* `SwitchTaskExecThread` processes the expressions defined in `switch` from top to bottom, obtains the value of the variable from `varPool`, and parses the expression through `javascript`. If the expression returns true, stop checking and record The order of the expression, here we record as resultConditionLocation. The task of SwitchTaskExecThread is over
* After the `switch` task runs, if there is no error (more commonly, the user-defined expression is out of specification or there is a problem with the parameter name), then `MasterExecThread.submitPostNode` will obtain the downstream node of the `DAG` to continue execution.
* If it is found in `DagHelper.parsePostNodes` that the current node (the node that has just completed the work) is a `switch` node, the `resultConditionLocation` will be obtained, and all branches except `resultConditionLocation` in the SwitchParameters will be skipped. In this way, only the branches that need to be executed are left
@ -6,7 +6,7 @@ DolphinScheduler is undergoing a microkernel + plug-in architecture change. All
For alarm-related codes, please refer to the `dolphinscheduler-alert-api` module. This module defines the extension interface of the alarm plug-in and some basic codes. When we need to realize the plug-inization of related functions, it is recommended to read the code of this block first. Of course, it is recommended that you read the document. This will reduce a lot of time, but the document There is a certain degree of lag. When the document is missing, it is recommended to take the source code as the standard (if you are interested, we also welcome you to submit related documents). In addition, we will hardly make changes to the extended interface (excluding new additions) , Unless there is a major structural adjustment, there is an incompatible upgrade version, so the existing documents can generally be satisfied.
We use the native JAVA-SPI, when you need to extend, in fact, you only need to pay attention to the extension of the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface, the underlying logic such as plug-in loading, and other kernels have been implemented, Which makes our development more focused and simple.
We use the native JAVA-SPI, when you need to extend, in fact, you only need to pay attention to the extension of the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface, the underlying logic such as plug-in loading, and other kernels have been implemented, Which makes our development more focused and simple.
In additional, the `AlertChannelFactory` extends from `PrioritySPI`, this means you can set the plugin priority, when you have two plugin has the same name, you can customize the priority by override the `getIdentify` method. The high priority plugin will be load, but if you have two plugin with the same name and same priority, the server will throw `IllegalArgumentException` when load the plugin.
@ -26,8 +26,8 @@ If you don't care about its internal design, but simply want to know how to deve
This module is currently a plug-in provided by us, and now we have supported dozens of plug-ins, such as Email, DingTalk, Script, etc.
#### Alert SPI Main class information.
AlertChannelFactory
Alarm plug-in factory interface. All alarm plug-ins need to implement this interface. This interface is used to define the name of the alarm plug-in and the required parameters. The create method is used to create a specific alarm plug-in instance.
@ -56,36 +56,40 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* Email
Email alert notification
Email alert notification
* DingTalk
Alert for DingTalk group chat bots
Related parameter configuration can refer to the DingTalk robot document.
Alert for DingTalk group chat bots
Related parameter configuration can refer to the DingTalk robot document.
* EnterpriseWeChat
EnterpriseWeChat alert notifications
EnterpriseWeChat alert notifications
Related parameter configuration can refer to the EnterpriseWeChat robot document.
Related parameter configuration can refer to the EnterpriseWeChat robot document.
* Script
We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
* SMS
SMS alerts
SMS alerts
* FeiShu
FeiShu alert notification
* Slack
Slack alert notification
* PagerDuty
PagerDuty alert notification
* WebexTeams
WebexTeams alert notification
@ -95,9 +99,10 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* Telegram
Telegram alert notification
Related parameter configuration can refer to the Telegram document.
* Http
We have implemented a Http script for alerting. And calling most of the alerting plug-ins end up being Http requests, if we not support your alert plug-in yet, you can use Http to realize your alert login. Also welcome to contribute your common plug-ins to the community :)
For specific configuration information, please refer to the parameter information provided by the specific plug-in, for example zk: `org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java`
@ -14,4 +14,4 @@ In additional, the `TaskChannelFactory` extends from `PrioritySPI`, this means y
Since the task plug-in involves the front-end page, the front-end SPI has not yet been implemented, so you need to implement the front-end page corresponding to the plug-in separately.
If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem.
If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem.
@ -77,31 +77,31 @@ In addition, during the testing process, the elements are not manipulated direct
The SecurityPage provides goToTab methods to test the corresponding sidebar jumps, mainly including TenantPage, UserPage, WorkerGroupPage and QueuePage. These pages are implemented in the same way, mainly to test whether the input, add and delete buttons of the form can return to the corresponding page.
```java
public <TextendsSecurityPage.Tab> T goToTab(Class<T> tab) {
if (tab == TenantPage.class) {
WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
@ -146,14 +146,14 @@ The following is an example of a tenant management test. As explained earlier, w
The browser is loaded using the RemoteWebDriver provided with Selenium. Before each test case is started there is some preparation work that needs to be done. For example: logging in the user, jumping to the corresponding page (depending on the specific test case).
```java
@BeforeAll
public static void setup() {
new LoginPage(browser)
.login("admin", "dolphinscheduler123")
.goToNav(SecurityPage.class)
.goToTab(TenantPage.class)
;
}
@BeforeAll
public static void setup() {
new LoginPage(browser)
.login("admin", "dolphinscheduler123")
.goToNav(SecurityPage.class)
.goToTab(TenantPage.class)
;
}
```
When the preparation is complete, it is time for the formal test case writing. We use a form of @Order() annotation for modularity, to confirm the order of the tests. After the tests have been run, assertions are used to determine if the tests were successful, and if the assertion returns true, the tenant creation was successful. The following code can be used as a reference:
@ -176,14 +176,14 @@ The rest are similar cases and can be understood by referring to the specific so
@ -17,10 +18,16 @@ Lodash high performance JavaScript utility library
### Development environment
- #### Node installation
Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
-
#### Node installation
Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
-
#### Front-end project construction
- #### Front-end project construction
Use the command line mode `cd` enter the `dolphinscheduler-ui` project directory and execute `npm install` to pull the project dependency package.
> If `npm install` is very slow, you can set the taobao mirror
@ -36,13 +43,16 @@ npm config set registry http://registry.npm.taobao.org/
API_BASE = http://127.0.0.1:12345
```
> ##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
Data Source Management => `http://localhost:8888/#/datasource/list`
Security Center => `http://localhost:8888/#/security/tenant`
```
| Tenant Management
| User Management
@ -174,16 +187,19 @@ User Center => `http://localhost:8888/#/user/account`
The project `src/js/conf/home` is divided into
`pages` => route to page directory
```
The page file corresponding to the routing address
The page file corresponding to the routing address
```
`router` => route management
```
vue router, the entry file index.js in each page will be registered. Specific operations: https://router.vuejs.org/zh/
```
`store` => status management
```
The page corresponding to each route has a state management file divided into:
@ -201,9 +217,13 @@ Specific action:https://vuex.vuejs.org/zh/
```
## specification
## Vue specification
##### 1.Component name
The component is named multiple words and is connected with a wire (-) to avoid conflicts with HTML tags and a clearer structure.
```
// positive example
export default {
@ -212,7 +232,9 @@ export default {
```
##### 2.Component files
The internal common component of the `src/js/module/components` project writes the folder name with the same name as the file name. The subcomponents and util tools that are split inside the common component are placed in the internal `_source` folder of the component.
```
└── components
├── header
@ -228,6 +250,7 @@ The internal common component of the `src/js/module/components` project writes t
```
##### 3.Prop
When you define Prop, you should always name it in camel format (camelCase) and use the connection line (-) when assigning values to the parent component.
This follows the characteristics of each language, because it is case-insensitive in HTML tags, and the use of links is more friendly; in JavaScript, the more natural is the hump name.
@ -270,7 +293,9 @@ props: {
```
##### 4.v-for
When performing v-for traversal, you should always bring a key value to make rendering more efficient when updating the DOM.
```
<ul>
<liv-for="item in list":key="item.id">
@ -280,6 +305,7 @@ When performing v-for traversal, you should always bring a key value to make ren
```
v-for should be avoided on the same element as v-if (`for example: <li>`) because v-for has a higher priority than v-if. To avoid invalid calculations and rendering, you should try to use v-if Put it on top of the container's parent element.
```
<ulv-if="showList">
<liv-for="item in list":key="item.id">
@ -289,7 +315,9 @@ v-for should be avoided on the same element as v-if (`for example: <li>`) becaus
```
##### 5.v-if / v-else-if / v-else
If the elements in the same set of v-if logic control are logically identical, Vue reuses the same part for more efficient element switching, `such as: value`. In order to avoid the unreasonable effect of multiplexing, you should add key to the same element for identification.
```
<divv-if="hasData"key="mazey-data">
<span>{{ mazeyData }}</span>
@ -300,12 +328,15 @@ If the elements in the same set of v-if logic control are logically identical, V
```
##### 6.Instruction abbreviation
In order to unify the specification, the instruction abbreviation is always used. Using `v-bind`, `v-on` is not bad. Here is only a unified specification.
```
<input:value="mazeyUser"@click="verifyUser">
```
##### 7.Top-level element order of single file components
Styles are packaged in a file, all the styles defined in a single vue file, the same name in other files will also take effect. All will have a top class name before creating a component.
Note: The sass plugin has been added to the project, and the sas syntax can be written directly in a single vue file.
For uniformity and ease of reading, they should be placed in the order of `<template>`、`<script>`、`<style>`.
@ -357,25 +388,31 @@ For uniformity and ease of reading, they should be placed in the order of `<tem
## JavaScript specification
##### 1.var / let / const
It is recommended to no longer use var, but use let / const, prefer const. The use of any variable must be declared in advance, except that the function defined by function can be placed anywhere.
##### 2.quotes
```
const foo = 'after division'
const bar = `${foo},ront-end engineer`
```
##### 3.function
Anonymous functions use the arrow function uniformly. When multiple parameters/return values are used, the object's structure assignment is used first.
```
function getPersonInfo ({name, sex}) {
// ...
return {name, gender}
}
```
The function name is uniformly named with a camel name. The beginning of the capital letter is a constructor. The lowercase letters start with ordinary functions, and the new operator should not be used to operate ordinary functions.
##### 4.object
```
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
@ -393,7 +430,9 @@ for (let [key, value] of myMap.entries()) {
```
##### 5.module
Unified management of project modules using import / export.
```
// lib.js
export default {}
@ -411,13 +450,16 @@ If the module has only one output value, use `export default`,otherwise no.
##### 1.Label
Do not write the type attribute when referencing external CSS or JavaScript. The HTML5 default type is the text/css and text/javascript properties, so there is no need to specify them.
The naming of Class and ID should be semantic, and you can see what you are doing by looking at the name; multiple words are connected by a link.
```
// positive example
.test-header{
@ -426,6 +468,7 @@ The naming of Class and ID should be semantic, and you can see what you are doin
```
##### 3.Attribute abbreviation
CSS attributes use abbreviations as much as possible to improve the efficiency and ease of understanding of the code.
```
@ -439,6 +482,7 @@ border: 1px solid #ccc;
```
##### 4.Document type
The HTML5 standard should always be used.
```
@ -446,7 +490,9 @@ The HTML5 standard should always be used.
```
##### 5.Notes
A block comment should be written to a module file.
```
/**
* @module mazey/api
@ -457,7 +503,8 @@ A block comment should be written to a module file.
## interface
##### All interfaces are returned as Promise
##### All interfaces are returned as Promise
Note that non-zero is wrong for catching catch
```
@ -477,6 +524,7 @@ test.then(res => {
```
Normal return
```
{
code:0,
@ -486,6 +534,7 @@ Normal return
```
Error return
```
{
code:10000,
@ -493,8 +542,10 @@ Error return
msg:'failed'
}
```
If the interface is a post request, the Content-Type defaults to application/x-www-form-urlencoded; if the Content-Type is changed to application/json,
Interface parameter transfer needs to be changed to the following way
@ -524,6 +575,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(1) First place the icon icon of the node in the `src/js/conf/home/pages/dag/img `folder, and note the English name of the node defined by the `toolbar_${in the background. For example: SHELL}.png`
(2) Find the `tasksType` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
'DEPENDENT': { // The background definition node type English name is used as the key value
desc: 'DEPENDENT', // tooltip desc
@ -532,6 +584,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
(3) Add a `${node type (lowercase)}`.vue file in `src/js/conf/home/pages/dag/_source/formModel/tasks`. The contents of the components related to the current node are written here. Must belong to a node component must have a function _verification () After the verification is successful, the relevant data of the current component is thrown to the parent component.
```
/**
* Verification
@ -566,6 +619,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(4) Common components used inside the node component are under` _source`, and `commcon.js` is used to configure public data.
##### 2.Increase the status type
(1) Find the `tasksState` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
@ -579,7 +633,9 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
##### 3.Add the action bar tool
(1) Find the `toolOper` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
{
code: 'pointer', // tool identifier
@ -599,13 +655,12 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
`util.js` => belongs to the `plugIn` tool class
The operation is handled in the `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` event.
##### 3.Add a routing page
(1) First add a routing address`src/js/conf/home/router/index.js` in route management
```
routing address{
path: '/test', // routing address
@ -619,12 +674,12 @@ routing address{
(2)Create a `test` folder in `src/js/conf/home/pages` and create an `index.vue `entry file in the folder.
This will give you direct access to`http://localhost:8888/#/test`
This will give you direct access to`http://localhost:8888/#/test`
##### 4.Increase the preset mailbox
Find the `src/lib/localData/email.js` startup and timed email address input to automatically pull down the match.
- For error logs or long code examples, please use [GitHub gist](https://gist.github.com/) and include only a few lines of the pertinent code / log within the email.
@ -20,7 +20,6 @@ Moreover, when we intend to refer a new software ( not limited to 3rd party jar,
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
For example, we should contain the NOTICE file (every open-source project has NOTICE file, generally under root directory) of ZooKeeper in our project when we are using ZooKeeper. As the Apache explains, "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work.
We are not going to dive into every 3rd party open-source license policy, you may look up them if interested.
@ -40,3 +39,4 @@ We need to follow the following steps when we need to add new jars or external r
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
* [ASF 3RD PARTY LICENSE POLICY](https://apache.org/legal/resolved.html)
@ -8,4 +8,4 @@ In Dolphinscheduler community, if a committer who have earned even more merit, c
One thing that is sometimes hard to understand when you are new to the open development process used at the ASF, is that we value the community more than the code. A strong and healthy community will be respectful and be a fun and rewarding place. More importantly, a diverse and healthy community can continue to support the code over the longer term, even as individual companies come and go from the field.
More details could be found [here](https://community.apache.org/contributors/).
More details could be found [here](https://community.apache.org/contributors/).
The following Code of Conduct is based on full compliance with the [Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
## Development philosophy
- **Consistent** code style, naming, and usage are consistent.
- **Easy to read** code is obvious, easy to read and understand, when debugging one knows the intent of the code.
- **Neat** agree with the concepts of《Refactoring》and《Code Cleanliness》and pursue clean and elegant code.
- **Abstract** hierarchy is clear and the concepts are refined and reasonable. Keep methods, classes, packages, and modules at the same level of abstraction.
- **Heart** Maintain a sense of responsibility and continue to be carved in the spirit of artisans.
- **Consistent** code style, naming, and usage are consistent.
- **Easy to read** code is obvious, easy to read and understand, when debugging one knows the intent of the code.
- **Neat** agree with the concepts of《Refactoring》and《Code Cleanliness》and pursue clean and elegant code.
- **Abstract** hierarchy is clear and the concepts are refined and reasonable. Keep methods, classes, packages, and modules at the same level of abstraction.
- **Heart** Maintain a sense of responsibility and continue to be carved in the spirit of artisans.
## Development specifications
- Executing `mvn -U clean package -Prelease` can compile and test through all test cases.
- The test coverage tool checks for no less than dev branch coverage.
- In the root directory, use Checkstyle to check your code for special reasons for violating validation rules. The template location is located at ds_check_style.xml.
- Follow the coding specifications.
- Executing `mvn -U clean package -Prelease` can compile and test through all test cases.
- The test coverage tool checks for no less than dev branch coverage.
- In the root directory, use Checkstyle to check your code for special reasons for violating validation rules. The template location is located at ds_check_style.xml.
- Follow the coding specifications.
## Coding specifications
- Use linux line breaks.
- Indentation (including empty lines) is consistent with the last line.
- An empty line is required between the class declaration and the following variable or method.
- There should be no meaningless empty lines.
- Classes, methods, and variables should be named as the name implies and abbreviations should be avoided.
- Return value variables are named after `result`; `each` is used in loops to name loop variables; and `entry` is used in map instead of `each`.
- The cached exception is called `e`; Catch the exception and do nothing, and the exception is named `ignored`.
- Configuration Files are named in camelCase, and file names are lowercase with uppercase initial/starting letter.
- Code that requires comment interpretation should be as small as possible and interpreted by method name.
- `equals` and `==` In a conditional expression, the constant is left, the variable is on the right, and in the expression greater than less than condition, the variable is left and the constant is right.
- In addition to the abstract classes used for inheritance, try to design the class as `final`.
- Nested loops are as much a method as possible.
- The order in which member variables are defined and the order in which parameters are passed is consistent across classes and methods.
- Priority is given to the use of guard statements.
- Classes and methods have minimal access control.
- The private method used by the method should follow the method, and if there are multiple private methods, the writing private method should appear in the same order as the private method in the original method.
- Method entry and return values are not allowed to be `null`.
- The return and assignment statements of if else are preferred with the tri-objective operator.
- Priority is given to `LinkedList` and only use `ArrayList` if you need to get element values in the collection through the index.
- Collection types such as `ArrayList`,`HashMap` that may produce expansion must specify the initial size of the collection to avoid expansion.
- Logs and notes are always in English.
- Comments can only contain `javadoc`, `todo` and `fixme`.
- Exposed classes and methods must have javadoc, other classes and methods and methods that override the parent class do not require javadoc.
- Use linux line breaks.
- Indentation (including empty lines) is consistent with the last line.
- An empty line is required between the class declaration and the following variable or method.
- There should be no meaningless empty lines.
- Classes, methods, and variables should be named as the name implies and abbreviations should be avoided.
- Return value variables are named after `result`; `each` is used in loops to name loop variables; and `entry` is used in map instead of `each`.
- The cached exception is called `e`; Catch the exception and do nothing, and the exception is named `ignored`.
- Configuration Files are named in camelCase, and file names are lowercase with uppercase initial/starting letter.
- Code that requires comment interpretation should be as small as possible and interpreted by method name.
- `equals` and `==` In a conditional expression, the constant is left, the variable is on the right, and in the expression greater than less than condition, the variable is left and the constant is right.
- In addition to the abstract classes used for inheritance, try to design the class as `final`.
- Nested loops are as much a method as possible.
- The order in which member variables are defined and the order in which parameters are passed is consistent across classes and methods.
- Priority is given to the use of guard statements.
- Classes and methods have minimal access control.
- The private method used by the method should follow the method, and if there are multiple private methods, the writing private method should appear in the same order as the private method in the original method.
- Method entry and return values are not allowed to be `null`.
- The return and assignment statements of if else are preferred with the tri-objective operator.
- Priority is given to `LinkedList` and only use `ArrayList` if you need to get element values in the collection through the index.
- Collection types such as `ArrayList`,`HashMap` that may produce expansion must specify the initial size of the collection to avoid expansion.
- Logs and notes are always in English.
- Comments can only contain `javadoc`, `todo` and `fixme`.
- Exposed classes and methods must have javadoc, other classes and methods and methods that override the parent class do not require javadoc.
## Unit test specifications
- Test code and production code are subject to the same code specifications.
- Unit tests are subject to AIR (Automatic, Independent, Repeatable) Design concept.
- Automatic: Unit tests should be fully automated, not interactive. Manual checking of output results is prohibited, `System.out`, `log`, etc. are not allowed, and must be verified with assertions.
- Independent: It is prohibited to call each other between unit test cases and to rely on the order of execution. Each unit test can be run independently.
- Repeatable: Unit tests cannot be affected by the external environment and can be repeated.
- Unit tests are subject to BCDE(Border, Correct, Design, Error) Design principles.
- Border (Boundary value test): The expected results are obtained by entering the boundaries of loop boundaries, special values, data order, etc.
- Correct (Correctness test): The expected results are obtained with the correct input.
- Design (Rationality Design): Design high-quality unit tests in combination with production code design.
- Error (Fault tolerance test): The expected results are obtained through incorrect input such as illegal data, abnormal flow, etc.
- If there is no special reason, the test needs to be fully covered.
- Each test case needs to be accurately asserted.
- Prepare the environment for code separation from the test code.
- Only jUnit `Assert`,hamcrest `CoreMatchers`,Mockito Correlation can use static import.
- Single-data assertions should use `assertTrue`,`assertFalse`,`assertNull` and `assertNotNull`.
- Multi-data assertions should use `assertThat`.
- Accurate assertion, try not to use `not`,`containsString` assertion.
- The true value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
- Classes and Methods with `@Test` labels do not require javadoc.
- Test code and production code are subject to the same code specifications.
- Unit tests are subject to AIR (Automatic, Independent, Repeatable) Design concept.
- Automatic: Unit tests should be fully automated, not interactive. Manual checking of output results is prohibited, `System.out`, `log`, etc. are not allowed, and must be verified with assertions.
- Independent: It is prohibited to call each other between unit test cases and to rely on the order of execution. Each unit test can be run independently.
- Repeatable: Unit tests cannot be affected by the external environment and can be repeated.
- Unit tests are subject to BCDE(Border, Correct, Design, Error) Design principles.
- Border (Boundary value test): The expected results are obtained by entering the boundaries of loop boundaries, special values, data order, etc.
- Correct (Correctness test): The expected results are obtained with the correct input.
- Design (Rationality Design): Design high-quality unit tests in combination with production code design.
- Error (Fault tolerance test): The expected results are obtained through incorrect input such as illegal data, abnormal flow, etc.
- If there is no special reason, the test needs to be fully covered.
- Each test case needs to be accurately asserted.
- Prepare the environment for code separation from the test code.
- Only jUnit `Assert`,hamcrest `CoreMatchers`,Mockito Correlation can use static import.
- Single-data assertions should use `assertTrue`,`assertFalse`,`assertNull` and `assertNotNull`.
- Multi-data assertions should use `assertThat`.
- Accurate assertion, try not to use `not`,`containsString` assertion.
- The true value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
- Classes and Methods with `@Test` labels do not require javadoc.
- Public specifications.
- Each line is no longer than `200` in length, ensuring that each line is semantically complete for easy understanding.
- Public specifications.
- Each line is no longer than `200` in length, ensuring that each line is semantically complete for easy understanding.
@ -13,8 +13,8 @@ We encourage any form of participation in the community that will eventually bec
* Help promote DolphinScheduler, participate in technical conferences or meetup, sharing and more.
Welcome to the contributing team and join open source starting with submitting your first PR.
- For example, add code comments or find "easy to fix" tags or some very simple issue (misspellings, etc.) and so on, first familiarize yourself with the submission process through the first simple PR.
- For example, add code comments or find "easy to fix" tags or some very simple issue (misspellings, etc.) and so on, first familiarize yourself with the submission process through the first simple PR.
Note: Contributions are not limited to PR Only, but contribute to the development of the project.
I'm sure you'll benefit from open source by participating in DolphinScheduler!
@ -37,4 +37,4 @@ If you want to implement a Feature or fix a Bug. Please refer to the following:
* You should create a new branch to start your work, to get the name of the branch refer to the [Submit Guide-Pull Request Notice](./pull-request.md). For example, if you want to complete the feature and submit Issue 111, your branch name should be feature-111. The feature name can be determined after discussion with the instructor.
* When you're done, send a Pull Request to dolphinscheduler, please refer to the《[Submit Guide-Submit Pull Request Process](./submit-code.md)》
If you want to submit a Pull Request to complete a Feature or fix a Bug, it is recommended that you start with the `good first issue`, `easy-to-fix` issues, complete a small function to submit, do not change too many files at a time, changing too many files will also put a lot of pressure on Reviewers, it is recommended to submit them through multiple Pull Requests, not all at once.
If you want to submit a Pull Request to complete a Feature or fix a Bug, it is recommended that you start with the `good first issue`, `easy-to-fix` issues, complete a small function to submit, do not change too many files at a time, changing too many files will also put a lot of pressure on Reviewers, it is recommended to submit them through multiple Pull Requests, not all at once.
Issues function is used to track various Features, Bugs, Functions, etc. The project maintainer can organize the tasks to be completed through issues.
Issue is an important step in drawing out a feature or bug,
@ -129,8 +130,8 @@ The main purpose of this is to avoid wasting time caused by different opinions o
- How to deal with the user who raises an issue does not know the module corresponding to the issue.
It is true that most users when raising issue do not know which module the issue belongs to.
In fact, this is very common in many open source communities. In this case, the committer / contributor actually knows the module affected by the issue.
If the issue is really valuable after being approved by committer and contributor, then the committer can modify the issue title according to the specific module involved in the issue,
or leave a message to the user who raises the issue to modify it into the corresponding title.
It is true that most users when raising issue do not know which module the issue belongs to.
In fact, this is very common in many open source communities. In this case, the committer / contributor actually knows the module affected by the issue.
If the issue is really valuable after being approved by committer and contributor, then the committer can modify the issue title according to the specific module involved in the issue,
or leave a message to the user who raises the issue to modify it into the corresponding title.
Pull Request is a way of software cooperation, which is a process of bringing code involving different functions into the trunk. During this process, the code can be discussed, reviewed, and modified.
In Pull Request, we try not to discuss the implementation of the code. The general implementation of the code and its logic should be determined in Issue. In the Pull Request, we only focus on the code format and code specification, so as to avoid wasting time caused by different opinions on implementation.
@ -62,8 +63,8 @@ Please refer to the commit message section.
### Pull Request Code Style
DolphinScheduler uses `Spotless` to automatically fix code style and formatting errors,
see [Code Style](../development-environment-setup.md#code-style) for details.
DolphinScheduler uses `Spotless` to automatically fix code style and formatting errors,
see [Code Style](../development-environment-setup.md#code-style) for details.
### Question
@ -74,4 +75,5 @@ see [Code Style](../development-environment-setup.md#code-style) for details.
Usually, there are two solutions to this scenario: the first is to merge multiple issues with into the same issue, and then close the other issues;
the second is multiple issues have subtle differences.
In this scenario, the responsibilities of each issue can be clearly divided. The type of each issue is marked as Sub-Task, and then these sub task type issues are associated with one issue.
And each Pull Request is submitted should be associated with only one issue of a sub task.
And each Pull Request is submitted should be associated with only one issue of a sub task.
@ -10,7 +10,7 @@ from the community to review them. You could see detail in [mail][mail-review-wa
in [GitHub Discussion][discussion-result-review-wanted].
> Note: It is only users mentioned in the [GitHub Discussion][discussion-result-review-wanted] can review Issues or Pull
> Requests, Community advocates **Anyone is encouraged to review Issues and Pull Requests**. Users in
> Requests, Community advocates **Anyone is encouraged to review Issues and Pull Requests**. Users in
> [GitHub Discussion][discussion-result-review-wanted] show their willing to review when we collect in the mail thread.
> The advantage of this list is when the community has discussion, in addition to the mention Members in [team](/us-en/community/community.html),
> you can also find some help in [GitHub Discussion][discussion-result-review-wanted] people. If you want to join the
@ -27,43 +27,43 @@ go to section [review Pull Requests](#pull-requests).
Review Issues means discuss [Issues][all-issues] in GitHub and give suggestions on it. Include but are not limited to the following situations
| Situation | Reason | Label | Action |
| ------ | ------ | ------ | ------ |
| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close Issue, inform creator the fixed version if it already release |
| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close issue, inform creator the link of same issue |
| Description not clearly | Without detail reproduce step | [need more information][label-need-more-information] | Inform creator add more description |
| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close Issue, inform creator the fixed version if it already release |
| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close issue, inform creator the link of same issue |
| Description not clearly | Without detail reproduce step | [need more information][label-need-more-information] | Inform creator add more description |
In addition give suggestion, add label for issue is also important during review. The labeled issues can be retrieved
better, which convenient for further processing. An issue can with more than one label. Common issue categories are:
| Label | Meaning |
| ------ | ------ |
| [UI][label-UI] | UI and front-end related |
| [security][label-security] | Security Issue |
| [user experience][label-user-experience] | User experience Issue |
| [development][label-development] | Development Issue |
| [Python][label-Python] | Python Issue |
| [plug-in][label-plug-in] | Plug-in Issue |
| [document][label-document] | Document Issue |
| [docker][label-docker] | Docker Issue |
| [need verify][label-need-verify] | Need verify Issue |
| [e2e][label-e2e] | E2E Issue |
| [win-os][label-win-os] | windows operating system Issue |
| [suggestion][label-suggestion] | Give suggestion to us |
@ -21,3 +21,4 @@ Unsubscribe from the mailing list steps are as follows:
2. Receive confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if not received, please confirm whether the email is automatically classified as spam, promotion email, subscription email, etc.) . Then reply directly to the email, or click on the link in the email to reply quickly, the subject and content are arbitrary.
3. Receive a goodbye email. After completing the above steps, you will receive a goodbye email with the subject GOODBYE from dev@dolphinscheduler.apache.org, and you have successfully unsubscribed to the Apache DolphinScheduler mailing list, and you will not receive emails from dev@dolphinscheduler.apache.org.
- Unit tests help everyone to get into the details of the code and understand how it works.
- Through test cases we can find bugs and submit robust code.
- The test case is also a demo usage of the code.
- Unit tests help everyone to get into the details of the code and understand how it works.
- Through test cases we can find bugs and submit robust code.
- The test case is also a demo usage of the code.
### 2. Some design principles for unit test cases
- The steps, granularity and combination of conditions should be carefully designed.
- Pay attention to boundary conditions.
- Unit tests should be well designed as well as avoiding useless code.
- When you find a `method` is difficult to write unit test, and if you confirm that the `method` is `badcode`, then refactor it with the developer.
- The steps, granularity and combination of conditions should be carefully designed.
- Pay attention to boundary conditions.
- Unit tests should be well designed as well as avoiding useless code.
- When you find a `method` is difficult to write unit test, and if you confirm that the `method` is `badcode`, then refactor it with the developer.
<!-- markdown-link-check-disable -->
- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
<!-- markdown-link-check-enable -->
- TDD(option): When you start writing a new feature, you can try writing test cases first.
- TDD(option): When you start writing a new feature, you can try writing test cases first.
### 3. Test coverage setpoint
- At this stage, the default value for test coverage of Delta change codes is >= 60%, the higher the better.
- We can see the test reports on this page: https://codecov.io/gh/apache/dolphinscheduler
- At this stage, the default value for test coverage of Delta change codes is >= 60%, the higher the better.
- We can see the test reports on this page: https://codecov.io/gh/apache/dolphinscheduler
## Fundamental guidelines for unit test
@ -64,13 +66,13 @@ Invalid assertions make the test itself meaningless, it has little to do with wh
There are several types of invalid assertions:
1. Different types of comparisons.
1. Different types of comparisons.
2. Determines that an object or variable with a default value is not null.
2. Determines that an object or variable with a default value is not null.
This seems meaningless. Therefore, when making the relevant judgements you should pay attention to whether it contains a default value itself.
This seems meaningless. Therefore, when making the relevant judgements you should pay attention to whether it contains a default value itself.
3. Assertions should be affirmative rather than negative if possible. Assertions should be within a range of predicted results, or exact values, whenever possible (otherwise you may end up with something that doesn't match your actual expectations but passes the assertion) unless your code only cares about whether it is empty or not.
3. Assertions should be affirmative rather than negative if possible. Assertions should be within a range of predicted results, or exact values, whenever possible (otherwise you may end up with something that doesn't match your actual expectations but passes the assertion) unless your code only cares about whether it is empty or not.
### 8. Some points to note for unit tests
@ -90,17 +92,18 @@ For example @Ignore("see #1").
The test will fail when the code in the unit test throws an exception. Therefore, there is no need to use try-catch to catch exceptions.
@ -49,4 +49,5 @@ That is, the workflow instance ID and task instance ID are injected in the print
- The use of printStackTrace() is prohibited for exception handling. This method prints the exception stack information to the standard error output.
- Branch printing of logs is prohibited. The contents of the logs need to be associated with the relevant information in the log format, and printing them in separate lines will cause the contents of the logs to not match the time and other information, and cause the logs to be mixed in a large number of log environments, which will make log retrieval more difficult.
- The use of the "+" operator for splicing log content is prohibited. Use placeholders for formatting logs for printing to improve memory usage efficiency.
- When the log content includes object instances, you need to make sure to override the toString() method to prevent printing meaningless hashcode.
- When the log content includes object instances, you need to make sure to override the toString() method to prevent printing meaningless hashcode.
Compared with the last release, the `release-docs` of the current release needs to be updated to the latest, if there are dependencies and versions changes
- `dolphinscheduler-dist/release-docs/LICENSE`
- `dolphinscheduler-dist/release-docs/NOTICE`
- `dolphinscheduler-dist/release-docs/licenses`
- `dolphinscheduler-dist/release-docs/LICENSE`
- `dolphinscheduler-dist/release-docs/NOTICE`
- `dolphinscheduler-dist/release-docs/licenses`
## Update Version
@ -29,3 +29,4 @@ For example, to release `x.y.z`, the following updates are required:
- Add new history version
- `docs/docs/en/history-versions.md` and `docs/docs/zh/history-versions.md`: Add the new version and link for `x.y.z`
- `docs/configs/docsdev.js`: change `/dev/` to `/x.y.z/`
| Warning Type | Alert on success or failure or both. |
| WebHook | The format is: [https://oapi.dingtalk.com/robot/send?access\_token=XXXXXX](https://oapi.dingtalk.com/robot/send?access_token=XXXXXX) |
| Keyword | Custom keywords for security settings. |
| Secret | Signature of security settings |
| Msg Type | Message parse type (support txt, markdown, markdownV2, html). |
| At User Mobile | When a custom bot sends a message, you can specify the "@person list" by their mobile phone number. When the selected people in the "@people list" receive the message, there will be a `@` message reminder. `No disturb` mode always receives reminders, and "someone @ you" appears in the message. The "At User Mobile" represents mobile phone number of the "@person" |
| At User Ids | The user ID by "@person" |
| Proxy | The proxy address of the proxy server. |
| Port | The proxy port of Proxy-Server. |
| User | Authentication(Username) for the proxy server. |
| Password | Authentication(Password) for the proxy server. |
| At User Ids | The user ID by "@person" |
| Proxy | The proxy address of the proxy server. |
| Port | The proxy port of Proxy-Server. |
| User | Authentication(Username) for the proxy server. |
| Password | Authentication(Password) for the proxy server. |
## Reference
- [DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)
- [DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)
The Group Chat send type means to notify the alert results via group chat created by Enterprise WeChat API, sending messages to all members of the group and specified users are not supported.
@ -68,4 +67,5 @@ The following is the `create new group chat` API and `query userId` API example:
## Reference
- Group Chat:https://work.weixin.qq.com/api/doc/90000/90135/90248
- Group Chat:https://work.weixin.qq.com/api/doc/90000/90135/90248
The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
@ -7,9 +8,9 @@ The execution logic of the data quality task is as follows:
- The user defines the task in the interface, and the user input value is stored in `TaskParam`.
- When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
- Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine.
- Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine.
- The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user.
`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user.
- If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
| Example |<ul><li>CheckFormula:Expected-Actual</li><li>Operator:></li><li>Threshold:0</li><li>ExpectedValue:FixValue=9</li></ul>
| Example |<ul><li>CheckFormula:Expected-Actual</li><li>Operator:></li><li>Threshold:0</li><li>ExpectedValue:FixValue=9</li></ul> |
In the example, assuming that the actual value is 10, the operator is >, and the expected value is 9, then the result 10 -9 > 0 is true, which means that the row data in the empty column has exceeded the threshold, and the task is judged to fail.
# Task Operation Guide
@ -50,7 +51,6 @@ The goal of the null value check is to check the number of empty rows in the spe
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
- The SQL to calculate the total number of rows in the table is as follows:
```sql
@ -61,155 +61,163 @@ The goal of the null value check is to check the number of empty rows in the spe
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Timeliness Check of Single Table Check
### Inspection Introduction
The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Field Length Check for Single Table Check
### Inspection Introduction
The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Uniqueness Check for Single Table Check
### Inspection Introduction
The goal of the uniqueness check is to check whether the fields are duplicated. It is generally used to check whether the primary key is duplicated. If there are duplicates and the threshold is reached, the check task will be judged to be failed.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Regular Expression Check for Single Table Check
### Inspection Introduction
The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Enumeration Value Validation for Single Table Check
### Inspection Introduction
The goal of enumeration value verification is to check whether the value of a field is within the range of the enumeration value. If there is data that is not in the range of the enumeration value and exceeds the threshold, the task will be judged to fail.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Table Row Number Verification for Single Table Check
### Inspection Introduction
The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Custom SQL Check for Single Table Check
@ -217,36 +225,38 @@ The goal of table row number verification is to check whether the number of rows
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Actual value name | Alias in SQL for statistical value calculation, such as max_num. |
|Actual value calculation SQL | SQL for outputting actual values. Note:<ul><li>The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.</li><li>Select max(a) as max_num from ${src_table}, the table name must be filled like this.</li></ul> |
| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Actual value name | Alias in SQL for statistical value calculation, such as max_num. |
|Actual value calculation SQL | SQL for outputting actual values. Note:<ul><li>The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.</li><li>Select max(a) as max_num from ${src_table}, the table name must be filled like this.</li></ul> |
| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Accuracy Check of Multi-table
### Inspection Introduction
Accuracy checks are performed by comparing the accuracy differences of data records for selected fields between two tables, examples are as follows
- table test1
| c1 | c2 |
|:---:|:---:|
| a | 1 |
| b | 2 |
|:--:|:--:|
| a | 1 |
| b | 2 |
- table test2
| c21 | c22 |
|:---:|:---:|
| a | 1 |
| b | 3 |
|:---:|:---:|
| a | 1 |
| b | 3 |
If you compare the data in c1 and c21, the tables test1 and test2 are exactly the same. If you compare c2 and c22, the data in table test1 and table test2 are inconsistent.
@ -254,45 +264,47 @@ If you compare the data in c1 and c21, the tables test1 and test2 are exactly th
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the source data type. |
| Target data table | Drop-down to select the table where the data to be verified is located. |
| Target filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Check column | Fill in the source data column, operator and target data column respectively. |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, ! = |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li><ul> |
| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the source data type. |
| Target data table | Drop-down to select the table where the data to be verified is located. |
| Target filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
| Check column | Fill in the source data column, operator and target data column respectively. |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, ! = |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li><ul> |
| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
## Comparison of the values checked by the two tables
### Inspection Introduction
Two-table value comparison allows users to customize different SQL statistics for two tables and compare the corresponding values. For example, for the source table A, the total amount of a certain column is calculated, and for the target table, the total amount of a certain column is calculated. value sum2, compare sum1 and sum2 to determine the check result.
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | The table where the data is to be verified. |
| Actual value name | Calculate the alias in SQL for the actual value, such as max_age1. |
| Actual value calculation SQL | SQL for outputting actual values. Note: <ul><li>The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.</li><li>Select max(age) as max_age1 from ${src_table} The table name must be filled like this.</li></ul> |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the source data type. |
| Target data table | The table where the data is to be verified. |
| Expected value name | Calculate the alias in SQL for the expected value, such as max_age2. |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | The table where the data is to be verified. |
| Actual value name | Calculate the alias in SQL for the actual value, such as max_age1. |
| Actual value calculation SQL | SQL for outputting actual values. Note: <ul><li>The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.</li><li>Select max(age) as max_age1 from ${src_table} The table name must be filled like this.</li></ul> |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the source data type. |
| Target data table | The table where the data is to be verified. |
| Expected value name | Calculate the alias in SQL for the expected value, such as max_age2. |
| Expected value calculation SQL | SQL for outputting expected value. Note: <ul><li>The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.</li><li>Select max(age) as max_age2 from ${target_table} The table name must be filled like this.</li></ul> |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, ! = |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, ! = |
| Failure strategy | <ul><li>Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
## Task result view
@ -306,4 +318,4 @@ Two-table value comparison allows users to customize different SQL statistics fo
- Driver download link [SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar](https://s3.cn-north-1.amazonaws.com.cn/athena-downloads-cn/drivers/JDBC/SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar)
| Datasource name | Enter the name of the DataSource. |
| Description | Enter a description of the DataSource. |
| IP/Host Name | Enter the HIVE service IP. |
| Port | Enter the HIVE service port. |
| Username | Set the username for HIVE connection. |
| Password | Set the password for HIVE connection. |
| Database name | Enter the database name of the HIVE connection. |
| Jdbc connection parameters | Parameter settings for HIVE connection, in JSON format. |
> NOTICE: If you wish to execute multiple HIVE SQL in the same session, you could set `support.hive.oneSession = true` in `common.properties`.
> NOTICE: If you wish to execute multiple HIVE SQL in the same session, you could set `support.hive.oneSession = true` in `common.properties`.
> It is helpful when you try to set env variables before running HIVE SQL. Default value of `support.hive.oneSession` is `false` and multi-SQLs run in different sessions.
NOTICE: If Kerberos is disabled, ensure the parameter `hadoop.security.authentication.startup.state` is false, and parameter `java.security.krb5.conf.path` value sets null.
If **Kerberos** is enabled, needs to set the following parameters in `common.properties`:
NOTICE: If Kerberos is disabled, ensure the parameter `hadoop.security.authentication.startup.state` is false, and parameter `java.security.krb5.conf.path` value sets null.
If **Kerberos** is enabled, needs to set the following parameters in `common.properties`:
This article describes how to add a new master service or worker service to an existing DolphinScheduler cluster.
```
Attention: There cannot be more than one master service process or worker service process on a physical machine.
If the physical machine which locate the expansion master or worker node has already installed the scheduled service, check the [1.4 Modify configuration] and edit the configuration file `conf/config/install_config.conf` on ** all ** nodes, add masters or workers parameter, and restart the scheduling cluster.
Attention: There cannot be more than one master service process or worker service process on a physical machine.
If the physical machine which locate the expansion master or worker node has already installed the scheduled service, check the [1.4 Modify configuration] and edit the configuration file `conf/config/install_config.conf` on ** all ** nodes, add masters or workers parameter, and restart the scheduling cluster.
```
### Basic software installation
@ -14,16 +14,15 @@ This article describes how to add a new master service or worker service to an e
* [required] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (version 1.8+): must install, install and configure `JAVA_HOME` and `PATH` variables under `/etc/profile`
* [optional] If the expansion is a worker node, you need to consider whether to install an external client, such as Hadoop, Hive, Spark Client.
```markdown
Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
```
### Get Installation Package
- Check the version of DolphinScheduler used in your existing environment, and get the installation package of the corresponding version, if the versions are different, there may be compatibility problems.
- Confirm the unified installation directory of other nodes, this article assumes that DolphinScheduler is installed in `/opt/` directory, and the full path is `/opt/dolphinscheduler`.
- Please download the corresponding version of the installation package to the server installation directory, uncompress it and rename it to `dolphinscheduler` and store it in the `/opt` directory.
- Please download the corresponding version of the installation package to the server installation directory, uncompress it and rename it to `dolphinscheduler` and store it in the `/opt` directory.
- Add database dependency package, this document uses Mysql database, add `mysql-connector-java` driver package to `/opt/dolphinscheduler/lib` directory.
Attention: You can copy the installation package directly from an existing environment to an expanded physical machine.
Attention: You can copy the installation package directly from an existing environment to an expanded physical machine.
```
### Create Deployment Users
@ -58,53 +57,49 @@ sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
```
```markdown
Attention:
- Since it is `sudo -u {linux-user}` to switch between different Linux users to run multi-tenant jobs, the deploying user needs to have sudo privileges and be password free.
- If you find the line `Default requiretty` in the `/etc/sudoers` file, please also comment it out.
- If have needs to use resource uploads, you also need to assign read and write permissions to the deployment user on `HDFS or MinIO`.
Attention:
- Since it is `sudo -u {linux-user}` to switch between different Linux users to run multi-tenant jobs, the deploying user needs to have sudo privileges and be password free.
- If you find the line `Default requiretty` in the `/etc/sudoers` file, please also comment it out.
- If have needs to use resource uploads, you also need to assign read and write permissions to the deployment user on `HDFS or MinIO`.
```
### Modify Configuration
- From an existing node such as `Master/Worker`, copy the configuration directory directly to replace the configuration directory in the new node. After finishing the file copy, check whether the configuration items are correct.
```markdown
Highlights:
datasource.properties: database connection information
zookeeper.properties: information for connecting zk
common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
dolphinscheduler_env.sh: environment Variables
````
```markdown
Highlights:
datasource.properties: database connection information
zookeeper.properties: information for connecting zk
common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
dolphinscheduler_env.sh: environment Variables
```
- Modify the `dolphinscheduler_env.sh` environment variable in the `bin/env/dolphinscheduler_env.sh` directory according to the machine configuration (the following is the example that all the used software install under `/opt/soft`)
Attention: When using `stop-all.sh` or `stop-all.sh`, if the physical machine execute the command is not configured to be ssh-free on all machines, it will prompt to enter the password
Attention: When using `stop-all.sh` or `stop-all.sh`, if the physical machine execute the command is not configured to be ssh-free on all machines, it will prompt to enter the password
```
- After completing the script, use the `jps` command to see if every node service is started (`jps` comes with the `Java JDK`)
```
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
```
After successful startup, you can view the logs, which are stored in the `logs` folder.
```Log Path
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
```
If the above services start normally and the scheduling system page is normal, check whether there is an expanded Master or Worker service in the [Monitor] of the web system. If it exists, the expansion is complete.
@ -187,9 +183,9 @@ There are two steps for shrinking. After performing the following two steps, the
### Stop the Service on the Scaled-Down Node
* If you are scaling down the master node, identify the physical machine where the master service is located, and stop the master service on the physical machine.
* If scale down the worker node, determine the physical machine where the worker service scale down and stop the worker services on the physical machine.
* If you are scaling down the master node, identify the physical machine where the master service is located, and stop the master service on the physical machine.
* If scale down the worker node, determine the physical machine where the worker service scale down and stop the worker services on the physical machine.
Attention: When using `stop-all.sh` or `stop-all.sh`, if the machine without the command is not configured to be ssh-free for all machines, it will prompt to enter the password
Attention: When using `stop-all.sh` or `stop-all.sh`, if the machine without the command is not configured to be ssh-free for all machines, it will prompt to enter the password
```
- After the script is completed, use the `jps` command to see if every node service was successfully shut down (`jps` comes with the `Java JDK`)
```
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
MasterServer ----- master service
WorkerServer ----- worker service
ApiApplicationServer ----- api service
AlertServer ----- alert service
```
If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
### Modify the Configuration File
- modify the configuration file `conf/config/install_config.conf` on the **all** nodes, synchronizing the following configuration.
* to scale down the master node, modify the IPs and masters parameters.
* to scale down worker nodes, modify the IPs and workers parameters.
- modify the configuration file `conf/config/install_config.conf` on the **all** nodes, synchronizing the following configuration.
* to scale down the master node, modify the IPs and masters parameters.
* to scale down worker nodes, modify the IPs and workers parameters.
```shell
# which machines to deploy DS services on, "localhost" for this machine
We here use MySQL as an example to illustrate how to configure an external database:
> NOTE: If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler
which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
> which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
* First of all, follow the instructions in [datasource-setting](datasource-setting.md) `Pseudo-Cluster/Cluster Initialize the Database` section to create and initialize database
* Set the following environment variables in your terminal or modify the `bin/env/dolphinscheduler_env.sh` with your database username and password for `{user}` and `{password}`:
@ -26,7 +26,6 @@ DolphinScheduler stores metadata in `relational database`. Currently, we support
> If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
For mysql 5.6 / 5.7
```shell
@ -54,9 +53,10 @@ mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;
```
```
For PostgreSQL:
For PostgreSQL:
```shell
# Use psql-tools to login PostgreSQL
psql
@ -75,6 +75,7 @@ pg_ctl reload
Then, modify `./bin/env/dolphinscheduler_env.sh`, change {user} and {password} to what you set in the previous step.
> But if you want to use MySQL as the metabase of DolphinScheduler, it only supports [8.0.16 and above](https:/ /repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) version.
> **_Note:_** For the first time deployment, there maybe occur five times of `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` in the terminal,
this is non-important information that you can ignore.
> this is non-important information that you can ignore.
@ -5,7 +5,7 @@ Standalone only for quick experience for DolphinScheduler.
If you are a new hand and want to experience DolphinScheduler functions, we recommend you install follow Standalone deployment. If you want to experience more complete functions and schedule massive tasks, we recommend you install follow [pseudo-cluster deployment](pseudo-cluster.md). If you want to deploy DolphinScheduler in production, we recommend you follow [cluster deployment](cluster.md) or [Kubernetes deployment](kubernetes.md).
> **_Note:_** Standalone only recommends the usage of fewer than 20 workflows, because it uses in-memory H2 Database in default, ZooKeeper Testing Server, too many tasks may cause instability.
> When Standalone stops or restarts, in-memory H2 database will clear up. To use Standalone with external databases like mysql or postgresql, please see [`Database Configuration`](#database-configuration).
> When Standalone stops or restarts, in-memory H2 database will clear up. To use Standalone with external databases like mysql or postgresql, please see [`Database Configuration`](#database-configuration).
@ -6,7 +6,7 @@ This section describes the one-click deployment of high availability DolphinSche
* Available Rainbond cloud native application management platform is a prerequisite,please refer to the official `Rainbond` documentation [Rainbond Quick install](https://www.rainbond.com/docs/quick-start/quick-install)
## DolphinScheduler Cluster One-click Deployment
## DolphinScheduler Cluster One-click Deployment
* Logging in and accessing the built-in open source app store, search the keyword `dolphinscheduler` to find the DolphinScheduler App.
@ -14,12 +14,12 @@ This section describes the one-click deployment of high availability DolphinSche
* Click `install` on the right side of DolphinScheduler to go to the installation page. Fill in the corresponding information and click `OK` to start the installation. You will get automatically redirected to the application view.
API and Worker Services share the configuration file `/opt/dolphinscheduler/conf/common.properties`. To modify the configurations, you only need to modify that of the API service.
@ -60,5 +61,7 @@ Take `DataX` as an example:
* FILE_PATH:/opt/soft
* LOCK_PATH:/opt/soft
3. Update component, the plug-in `Datax` will be downloaded automatically and decompress to `/opt/soft`
Apache DolphinScheduler exports metrics for system observability. We use [Micrometer](https://micrometer.io/) as application metrics facade.
Currently, we only support `Prometheus Exporter` but more are coming soon.
## Quick Start
## Quick Start
- We enable Apache DolphinScheduler to export metrics in `standalone` mode to help users get hands dirty easily.
- We enable Apache DolphinScheduler to export metrics in `standalone` mode to help users get hands dirty easily.
- After triggering tasks in `standalone` mode, you could access metrics list by visiting url `http://localhost:12345/dolphinscheduler/actuator/metrics`.
- After triggering tasks in `standalone` mode, you could access `prometheus-format` metrics by visiting url `http://localhost:12345/dolphinscheduler/actuator/prometheus`.
- For a better experience with `Prometheus` and `Grafana`, we have prepared the out-of-the-box `Grafana` configurations for you, you could find the `Grafana` dashboards
at `dolphinscheduler-meter/resources/grafana` and directly import these dashboards to your `Grafana` instance.
at `dolphinscheduler-meter/resources/grafana` and directly import these dashboards to your `Grafana` instance.
- If you want to try with `docker`, you can use the following command to start the out-of-the-box `Prometheus` and `Grafana`:
```shell
@ -17,12 +17,12 @@ cd dolphinscheduler-meter/src/main/resources/grafana-demo
docker compose up
```
then access the `Grafana` by the url: `http://localhost/3001` for dashboards.
then access the `Grafana` by the url: `http://localhost/3001` for dashboards.
- If you prefer to have some experiments in `cluster` mode, please refer to the [Configuration](#configuration) section below:
## Configuration
@ -48,7 +48,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
### Prometheus
- all dots mapped to underscores
- metric name starting with number added with prefix `m_`
- metric name starting with number added with prefix `m_`
- COUNTER: add `_total` suffix if not ending with it
- LONG_TASK_TIMER: `_timer_seconds` suffix added if not ending with them
- GAUGE: `_baseUnit` suffix added if not ending with it
@ -56,7 +56,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
## Dolphin Scheduler Metrics Cheatsheet
- We categorize metrics by dolphin scheduler components such as `master server`, `worker server`, `api server` and `alert server`.
- Although task / workflow related metrics exported by `master server` and `worker server`, we categorize them separately for users to query them more conveniently.
- Although task / workflow related metrics exported by `master server` and `worker server`, we categorize them separately for users to query them more conveniently.
### Task Related Metrics
@ -66,19 +66,18 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- success: the number of successful tasks
- fail: the number of failed tasks
- stop: the number of stopped tasks
- retry: the number of retried tasks
- retry: the number of retried tasks
- submit: the number of submitted tasks
- failover: the number of task fail-overs
- ds.task.dispatch.count: (counter) the number of tasks dispatched to worker
- ds.task.dispatch.failure.count: (counter) the number of tasks failed to dispatch, retry failure included
- ds.task.dispatch.error.count: (counter) the number of task dispatch errors
- ds.task.execution.count.by.type: (counter) the number of task executions grouped by tag `task_type`
- ds.task.running: (gauge) the number of running tasks
- ds.task.prepared: (gauge) the number of tasks prepared for task queue
- ds.task.execution.count: (counter) the number of executed tasks
- ds.task.running: (gauge) the number of running tasks
- ds.task.prepared: (gauge) the number of tasks prepared for task queue
- ds.task.execution.count: (counter) the number of executed tasks
- ds.task.execution.duration: (histogram) duration of task executions
### Workflow Related Metrics
- ds.workflow.create.command.count: (counter) the number of commands created and inserted by workflows
@ -88,14 +87,14 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- timeout: the number of timeout workflow instances
- finish: the number of finished workflow instances, both successes and failures included
- success: the number of successful workflow instances
- fail: the number of failed workflow instances
- stop: the number of stopped workflow instances
- fail: the number of failed workflow instances
- stop: the number of stopped workflow instances
- failover: the number of workflow instance fail-overs
### Master Server Metrics
- ds.master.overload.count: (counter) the number of times the master overloaded
- ds.master.consume.command.count: (counter) the number of commands consumed by master
- ds.master.consume.command.count: (counter) the number of commands consumed by master
- ds.master.scheduler.failover.check.count: (counter) the number of scheduler (master) fail-over checks
- ds.master.scheduler.failover.check.time: (histogram) the total time cost of scheduler (master) fail-over checks
- ds.master.quartz.job.executed: the total number of quartz jobs executed
@ -111,7 +110,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
### Api Server Metrics
- Currently, we have not embedded any metrics in Api Server.
- Currently, we have not embedded any metrics in Api Server.
### Alert Server Related
@ -124,7 +123,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- hikaricp.connections: the total number of connections
- hikaricp.connections.creation: connection creation time (max, count, sum included)
- hikaricp.connections.acquire: connection acquirement time (max, count, sum included)
- hikaricp.connections.acquire: connection acquirement time (max, count, sum included)
- hikaricp.connections.usage: connection usage time (max, count, sum included)
- hikaricp.connections.max: the max number of connections
- hikaricp.connections.min: the min number of connections
@ -175,3 +174,4 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- system.load.average.1m: the total number of runnable entities queued to available processors and runnable entities running on the available processors averaged over a period
- logback.events: the number of events that made it to the logs grouped by the tag `level`
- http.server.requests: total number of http requests
| system.biz.date | `${system.biz.date}` | The day before the schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
| system.biz.curdate | `${system.biz.curdate}` | The schedule time of the daily scheduling instance, the format is `yyyyMMdd`|
| system.datetime | `${system.datetime}` | The schedule time of the daily scheduling instance, the format is `yyyyMMddHHmmss` |
## Extended Built-in Parameter
@ -16,19 +16,19 @@
- Or define by the following two ways:
1. Use add_month(yyyyMMdd, offset) function to add or minus number of months.
The first parameter of this function is [yyyyMMdd], represents the time format and the second parameter is offset, represents the number of months the user wants to add or minus.
- Next N years:`$[add_months(yyyyMMdd,12*N)]`
- N years before:`$[add_months(yyyyMMdd,-12*N)]`
- Next N months:`$[add_months(yyyyMMdd,N)]`
- N months before:`$[add_months(yyyyMMdd,-N)]`
2. Add or minus numbers directly after the time format.
- Next N weeks:`$[yyyyMMdd+7*N]`
- First N weeks:`$[yyyyMMdd-7*N]`
- Next N days:`$[yyyyMMdd+N]`
- N days before:`$[yyyyMMdd-N]`
- Next N hours:`$[HHmmss+N/24]`
- First N hours:`$[HHmmss-N/24]`
- Next N minutes:`$[HHmmss+N/24/60]`
- First N minutes:`$[HHmmss-N/24/60]`
1. Use add_month(yyyyMMdd, offset) function to add or minus number of months.
The first parameter of this function is [yyyyMMdd], represents the time format and the second parameter is offset, represents the number of months the user wants to add or minus.
- Next N years:`$[add_months(yyyyMMdd,12*N)]`
- N years before:`$[add_months(yyyyMMdd,-12*N)]`
- Next N months:`$[add_months(yyyyMMdd,N)]`
- N months before:`$[add_months(yyyyMMdd,-N)]`
2.Addorminusnumbers directly after the time format.
@ -49,7 +49,7 @@ When the SHELL task is completed, we can use the output passed upstream as the q
> Note: If the result of the SQL node has only one row, one or multiple fields, the name of the `prop` needs to be the same as the field name. The data type can choose structure except `LIST`. The parameter assigns the value according to the same column name in the SQL query result.
>
>If the result of the SQL node has multiple rows, one or more fields, the name of the `prop` needs to be the same as the field name. Choose the data type structure as `LIST`, and the SQL query result will be converted to `LIST<VARCHAR>`, and forward to convert to JSON as the parameter value.
>If the result of the SQL node has multiple rows, one or more fields, the name of the `prop` needs to be the same as the field name. Choose the data type structure as `LIST`, and the SQL query result will be converted to `LIST<VARCHAR>`, and forward to convert to JSON as the parameter value.
#### Save the workflow and set the global parameters
This page describes details regarding Project screen in Apache DolphinScheduler. Here, you will see all the functions which can be handled in this screen. The following table explains commonly used terms in Apache DolphinScheduler:
| DAG | Tasks in a workflow are assembled in form of Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. |
| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |
| Workflow Relation | Shows dynamic status of all the workflows in a project. |
| Task | Task is a discrete action in a Workflow. Apache DolphinScheduler supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT ( depends), and plans to support dynamic plug-in expansion, (SUB_PROCESS). It is also a separate process definition that can be started and executed separately. |
| Task Instance | Instantiation of the task node in the process definition, which identifies the specific task execution status. |
| DAG | Tasks in a workflow are assembled in form of Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. |
| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |
| Workflow Relation | Shows dynamic status of all the workflows in a project. |
| Task | Task is a discrete action in a Workflow. Apache DolphinScheduler supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT ( depends), and plans to support dynamic plug-in expansion, (SUB_PROCESS). It is also a separate process definition that can be started and executed separately. |
| Task Instance | Instantiation of the task node in the process definition, which identifies the specific task execution status. |
Click `Project Management -> Workflow -> Task Instance` to enter the task instance page, as shown in the figure below, click the name of the workflow instance to jump to the DAG diagram of the workflow instance to view the task status.
@ -21,3 +22,4 @@ Click the `View Log` button in the operation column to view the log of the task
- SavePoint: Click the `SavePoint` button in the operation column to do stream task savepoint.
- Stop: Click the `Stop` button in the operation column to stop the stream task.
@ -29,15 +29,16 @@ Drag from the toolbar <img src="../../../../img/tasks/icons/shell.png" width="15
7. Click the `Confirm Add` button to save the task settings.
### Set dependencies between tasks
Click the plus sign on the right of the task node to connect the task; as shown in the figure below, task Node_B and task Node_C execute in parallel, When task Node_A finished execution, tasks Node_B and Node_C will execute simultaneously.
If the DAG contains stream tasks, the relationship between stream tasks is displayed as a dotted line, and the execution of stream tasks will be skipped when the workflow instance is executed.
**Delete dependencies:** Using your mouse to select the connection line, and click the "Delete" icon in the upper right corner <imgsrc="../../../../img/delete.png"width="35"/>, delete dependencies between tasks.
@ -57,15 +58,15 @@ Click `Project Management -> Workflow -> Workflow Definition` to enter the workf
Workflow running parameter description:
* **Failure strategy**: When a task node fails to execute, other parallel task nodes need to execute the strategy. "Continue" means: After a task fails, other task nodes execute normally; "End" means: Terminate all tasks being executed, and terminate the entire process.
* **Notification strategy**: When the process ends, send process execution information notification emails according to the process status, including no status, success, failure, success or failure.
* **Process priority**: the priority of process operation, divided into five levels: the highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), the lowest (LOWEST). When the number of master threads is insufficient, processes with higher levels will be executed first in the execution queue, and processes with the same priority will be executed in the order of first-in, first-out.
* **Worker grouping**: This process can only be executed in the specified worker machine group. The default is Default, which can be executed on any worker.
* **Notification Group**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or emails will be sent to all members in the notification group.
* **Recipient**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or alarm email will be sent to the recipient list.
* **Cc**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, the process information or alarm email will be copied to the Cc list.
* **Startup parameters**: Set or override the value of global parameters when starting a new process instance.
* **Complement**: There are 2 modes of serial complement and parallel complement. Serial complement: within the specified time range, perform complements in sequence from the start date to the end date, and generate N process instances in turn; parallel complement: within the specified time range, perform multiple complements at the same time, and generate N process instances at the same time .
* **Failure strategy**: When a task node fails to execute, other parallel task nodes need to execute the strategy. "Continue" means: After a task fails, other task nodes execute normally; "End" means: Terminate all tasks being executed, and terminate the entire process.
* **Notification strategy**: When the process ends, send process execution information notification emails according to the process status, including no status, success, failure, success or failure.
* **Process priority**: the priority of process operation, divided into five levels: the highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), the lowest (LOWEST). When the number of master threads is insufficient, processes with higher levels will be executed first in the execution queue, and processes with the same priority will be executed in the order of first-in, first-out.
* **Worker grouping**: This process can only be executed in the specified worker machine group. The default is Default, which can be executed on any worker.
* **Notification Group**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or emails will be sent to all members in the notification group.
* **Recipient**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or alarm email will be sent to the recipient list.
* **Cc**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, the process information or alarm email will be copied to the Cc list.
* **Startup parameters**: Set or override the value of global parameters when starting a new process instance.
* **Complement**: There are 2 modes of serial complement and parallel complement. Serial complement: within the specified time range, perform complements in sequence from the start date to the end date, and generate N process instances in turn; parallel complement: within the specified time range, perform multiple complements at the same time, and generate N process instances at the same time .
* **Complement**: Execute the workflow definition of the specified date, you can select the time range of the supplement (currently only supports the supplement for consecutive days), for example, the data from May 1st to May 10th needs to be supplemented, as shown in the following figure:
The following are the operation functions of the workflow definition list:
@ -91,58 +92,58 @@ The following are the operation functions of the workflow definition list:
- Click the `Run` button to pop up the startup parameter setting window, as shown in the figure below, set the startup parameters, click the `Run` button in the pop-up box, the workflow starts running, and the workflow instance page generates a workflow instance.
* Failure strategy: When a task node fails to execute, other parallel task nodes need to execute this strategy. "Continue" means: after a certain task fails, other task nodes execute normally; "End" means: terminate all tasks execution, and terminate the entire process.
* Notification strategy: When the process is over, send the process execution result notification email according to the process status, options including no send, send if sucess, send of failure, send whatever result.
* Process priority: The priority of process operation, divide into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high priority processes will execute first in the execution queue, and processes with the same priority will execute in the order of first in, first out.
* Worker group: The process can only be executed in the specified worker machine group. The default is `Default`, which can execute on any worker.
* Notification group: select notification strategy||timeout alarm||when fault tolerance occurs, process result information or email will send to all members in the notification group.
* Recipient: select notification policy||timeout alarm||when fault tolerance occurs, process result information or alarm email will be sent to the recipient list.
* Cc: select notification policy||timeout alarm||when fault tolerance occurs, the process result information or warning email will be copied to the CC list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
* Serial complement: within the specified time range, complement is executed from the start date to the end date, and multiple process instances are generated in turn; Click Run workflow and select the serial complement mode: for example, from July 9 to July 10, execute in sequence, and generate two process instances in sequence on the process instance page.
* Parallel Replenishment: within the specified time range, replenishment is performed simultaneously for multiple days, and multiple process instances are generated at the same time. Enter date manually: manually enter a date in the comma separated date format of 'yyyy MM DD hh:mm:ss'.Click Run workflow and select the parallel complement mode: for example, execute the workflow definition from July 9 to July 10 at the same time, and generate two process instances on the process instance page at the same time.
* Concurrency: refers to the maximum number of instances executed in parallel in the parallel complement mode.For example, if tasks from July 6 to July 10 are executed at the same time, and the concurrency is 2, then the process instance is:
* Dependency mode: whether to trigger the replenishment of workflow instances that downstream dependent nodes depend on the current workflow (the timing status of workflow instances that require the current replenishment is online, which will only trigger the replenishment of downstream directly dependent on the current workflow).
* Relationship between complement and timing configuration:
1. Unconfigured timing: When there is no timing configuration, the daily replenishment will be performed by default according to the selected time range. For example, the workflow scheduling date is July 7 to July 10. If timing is not configured, the process instance is:
2. Configured timing: If there is a timing configuration, it will be supplemented according to the selected time range in combination with the timing configuration. For example, the workflow scheduling date is July 7 to July 10, and the timing is configured (running every 5 a.m.). The process example is:
* Failure strategy: When a task node fails to execute, other parallel task nodes need to execute this strategy. "Continue" means: after a certain task fails, other task nodes execute normally; "End" means: terminate all tasks execution, and terminate the entire process.
* Notification strategy: When the process is over, send the process execution result notification email according to the process status, options including no send, send if sucess, send of failure, send whatever result.
* Process priority: The priority of process operation, divide into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high priority processes will execute first in the execution queue, and processes with the same priority will execute in the order of first in, first out.
* Worker group: The process can only be executed in the specified worker machine group. The default is `Default`, which can execute on any worker.
* Notification group: select notification strategy||timeout alarm||when fault tolerance occurs, process result information or email will send to all members in the notification group.
* Recipient: select notification policy||timeout alarm||when fault tolerance occurs, process result information or alarm email will be sent to the recipient list.
* Cc: select notification policy||timeout alarm||when fault tolerance occurs, the process result information or warning email will be copied to the CC list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
* Serial complement: within the specified time range, complement is executed from the start date to the end date, and multiple process instances are generated in turn; Click Run workflow and select the serial complement mode: for example, from July 9 to July 10, execute in sequence, and generate two process instances in sequence on the process instance page.
*ParallelReplenishment: within the specified time range, replenishment is performed simultaneously for multiple days, and multiple process instances are generated at the same time. Enter date manually: manually enter a date in the comma separated date format of 'yyyy MM DD hh:mm:ss'.Click Run workflow and select the parallel complement mode: for example, execute the workflow definition from July 9 to July 10 at the same time, and generate two process instances on the process instance page at the same time.
*Concurrency:refers to the maximum number of instances executed in parallel in the parallel complement mode.For example, if tasks from July 6 to July 10 are executed at the same time, and the concurrency is 2, then the process instance is:
*Dependencymode: whether to trigger the replenishment of workflow instances that downstream dependent nodes depend on the current workflow (the timing status of workflow instances that require the current replenishment is online, which will only trigger the replenishment of downstream directly dependent on the current workflow).
*Relationshipbetween complement and timing configuration:
1. Unconfigured timing: When there is no timing configuration, the daily replenishment will be performed by default according to the selected time range. For example, the workflow scheduling date is July 7 to July 10. If timing is not configured, the process instance is:
2.Configuredtiming:Ifthere is a timingconfiguration, it will be supplemented according to the selected time range in combination with the timing configuration. For example, the workflow scheduling date is July 7 to July 10, and the timing is configured (running every 5 a.m.). The process example is:
- Select a start and end time. Within the start and end time range, the workflow is run regularly; outside the start and end time range, no timed workflow instance will be generated.
- Add a timing that execute 5 minutes once, as shown in the following figure:
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
- Click the "Create" button to create the timing. Now the timing status is "**Offline**" and the timing needs to be **Online** to make effect.
- Schedule online: Click the `Timing Management` button <imgsrc="../../../../img/timeManagement.png"width="35"/>, enter the timing management page, click the `online` button, the timing status will change to `online`, as shown in the below figure, the workflow makes effect regularly.
@ -30,7 +30,7 @@ Double-click the task node, click `View History` to jump to the task instance pa
## View Running Parameters
Click `Project Management -> Workflow -> Workflow Instance` to enter the workflow instance page, click the workflow name to enter the workflow DAG page;
Click `Project Management -> Workflow -> Workflow Instance` to enter the workflow instance page, click the workflow name to enter the workflow DAG page;
Click the icon in the upper left corner <imgsrc="../../../../img/run_params_button.png"width="35"/> to view the startup parameters of the workflow instance; click the icon <imgsrc="../../../../img/global_param.png"width="35"/> to view the global parameters and local parameters of the workflow instance, as shown in the following figure:
@ -43,15 +43,23 @@ Click `Project Management -> Workflow -> Workflow Instance`, enter the workflow
- **Edit:** Only processes with success/failed/stop status can be edited. Click the "Edit" button or the workflow instance name to enter the DAG edit page. After the edit, click the "Save" button to confirm, as shown in the figure below. In the pop-up box, check "Whether to update the workflow definition", after saving, the information modified by the instance will be updated to the workflow definition; if not checked, the workflow definition would not be updated.
- **Recovery Failed:** For failed processes, you can perform failure recovery operations, starting from the failed node
- **Stop:****Stop** the running process, the background code will first `kill` the worker process, and then execute `kill -9` operation
- **Pause:****Pause** the running process, the system status will change to **waiting for execution**, it will wait for the task to finish, and pause the next sequence task.
- **Resume pause:** Resume the paused process, start running directly from the **paused node**
- **Delete:** Delete the workflow instance and the task instance under the workflow instance
- **Gantt Chart:** The vertical axis of the Gantt chart is the topological sorting of task instances of the workflow instance, and the horizontal axis is the running time of the task instances, as shown in the figure:
The Resource Center is typically used for uploading files, UDF functions, and task group management. For a stand-alone
environment, you can select the local file directory as the upload folder (**this operation does not require Hadoop or HDFS deployment**).
Of course, you can also choose to upload to Hadoop or MinIO cluster. In this case, you need to have Hadoop (2.6+) or MinIOn and other related environments.
Of course, you can also choose to upload to Hadoop or MinIO cluster. In this case, you need to have Hadoop (2.6+) or MinIOn and other related environments.
The task group is mainly used to control the concurrency of task instances, and is designed to control the pressure of other resources (it can also control the pressure of the Hadoop cluster, the cluster will have queue control it). When creating a new task definition, you can configure the corresponding task group and configure the priority of the task running in the task group.
The task group is mainly used to control the concurrency of task instances, and is designed to control the pressure of other resources (it can also control the pressure of the Hadoop cluster, the cluster will have queue control it). When creating a new task definition, you can configure the corresponding task group and configure the priority of the task running in the task group.
You need to enter the information inside the picture:
@ -18,39 +18,39 @@ You need to enter the information inside the picture:
- **Project name**: The project that the task group functions, this item is optional, if not selected, all the projects in the whole system can use this task group.
- **Resource pool size**: The maximum number of concurrent task instances allowed.
**Note**: The use of task groups is applicable to tasks executed by workers, such as `switch` nodes, `condition` nodes, `sub_process` and other node types executed by the master are not controlled by the task group.
Regarding the configuration of the task group, all you need to do is to configure these parts in the red box:
- Task group name: The task group name is displayed on the task group configuration page. Here you can only see the task group that the project has permission to access (the project is selected when creating a task group) or the task group that scope globally (no project is selected when creating a task group).
- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value of this part, the higher the priority.
- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value of this part, the higher the priority.
## Implementation Logic of Task Group
## Implementation Logic of Task Group
### Get Task Group Resources
The master judges whether the task is configured with a task group when distributing the task. If the task is not configured, it is normally thrown to the worker to run; if a task group is configured, it checks whether the remaining size of the task group resource pool meets the current task operation before throwing it to the worker for execution. , if the resource pool -1 is satisfied, continue to run; if not, exit the task distribution and wait for other tasks to wake up.
The master judges whether the task is configured with a task group when distributing the task. If the task is not configured, it is normally thrown to the worker to run; if a task group is configured, it checks whether the remaining size of the task group resource pool meets the current task operation before throwing it to the worker for execution. , if the resource pool -1 is satisfied, continue to run; if not, exit the task distribution and wait for other tasks to wake up.
### Release and Wake Up
When the task that has occupied the task group resource is finished, the task group resource will be released. After the release, it will check whether there is a task waiting in the current task group. If there is, mark the task with the best priority to run, and create a new executable event. The event stores the task ID that is marked to acquire the resource, and then the task obtains the task group resource and run.
When the task that has occupied the task group resource is finished, the task group resource will be released. After the release, it will check whether there is a task waiting in the current task group. If there is, mark the task with the best priority to run, and create a new executable event. The event stores the task ID that is marked to acquire the resource, and then the task obtains the task group resource and run.
- You can add new worker groups for the workers during runtime regardless of the configurations in `application.yaml` as below:
`Security Center` -> `Worker Group Manage` -> `Create Worker Group` -> fill in `Group Name` and `Worker Addresses` -> click `confirm`.
- You can add new worker groups for the workers during runtime regardless of the configurations in `application.yaml` as below:
`Security Center` -> `Worker Group Manage` -> `Create Worker Group` -> fill in `Group Name` and `Worker Addresses` -> click `confirm`.
## Environmental Management
@ -164,10 +164,10 @@ Create a task node in the workflow definition, select the worker group and the e
## Cluster Management
> Add or update cluster
- Each process can be related to zero or several clusters to support multiple environment, now just support k8s.
> - Each process can be related to zero or several clusters to support multiple environment, now just support k8s.
>
> Usage cluster
- After creation and authorization, k8s namespaces and processes will associate clusters. Each cluster will have separate workflows and task instances running independently.
> - After creation and authorization, k8s namespaces and processes will associate clusters. Each cluster will have separate workflows and task instances running independently.
| Node Name | The name of the set task. The node name in a workflow definition is unique. |
| Run Flag | Indicates whether the node is scheduled properly and turns on the kill switch, if not needed. |
| Description | Describes the functionality of the node. |
| Task Priority | When the number of worker threads is insufficient, the worker executes tasks according to the priority. When the priority is the same, the worker executes tasks by order. |
| Worker Group | The group of machines who execute the tasks. If selecting `Default`, DolphinScheduler will randomly choose a worker machine to execute the task. |
| Environment Name | Configure the environment in which the task runs. |
| Number Of Failed Retries | Number of resubmitted tasks that failed. You can choose the number in the drop-down menu or fill it manually. |
| Failed Retry Interval | the interval between the failure and resubmission of a task. You can choose the number in the drop-down menu or fill it manually. |
| Delayed Execution Time | the amount of time a task is delayed, in units. |
| Timeout Alarm | Check timeout warning, timeout failure, when the task exceeds the“Timeout length”, send a warning message and the task execution fails. |
| Module Path | pick Java 9 + 's modularity feature, put all resources into-module-path, and require that the JDK version in your worker supports modularity. |
| Main Parameter | Java program main method entry parameter. |
| Java VM Parameters | JVM startup parameters. |
| Script | You need to write Java code if you use the Java run type. The public class must exist in the code without writing a package statement. |
| Resources | External JAR packages or other resource files that are added to the classpath or module path and can be easily retrieved in your JAVA script. |
| Custom parameter | A user-defined parameter that is part of HTTP and replaces `${ variable }` in the script . |
| Pre Tasks | Selects a pre-task for the current task and sets the pre-task as the upstream of the current task. |
## Example
Java type tasks have two modes of execution, here is a demonstration of executing tasks in Java mode.
@ -28,13 +28,13 @@ Change configuration in `./bin/env/dolphinscheduler_env.sh` ({user} and {passwor
Using MySQL as an example, change the value if you use other databases. Please manually download the [mysql-connector-java driver jar](https://downloads.MySQL.com/archives/c-j/)
jar package and add it to the `./tools/libs` directory, then change `./bin/ env/dolphinscheduler_env.sh` file
- If you deploy with Pseudo-Cluster deployment, change it according to [Pseudo-Cluster](../installation/pseudo-cluster.md) section "Modify Configuration".
- If you deploy with Cluster deployment, change it according to [Cluster](../installation/cluster.md) section "Modify Configuration".
And them run command `sh ./bin/start-all.sh` to start all services.
And them run command `sh ./bin/start-all.sh` to start all services.
## Notice
@ -54,26 +54,26 @@ And them run command `sh ./bin/start-all.sh` to start all services.
The architecture of worker group is different between version before version 1.3.1 until version 2.0.0
- Before version 1.3.1(include itself) worker group can be created through UI interface.
- Since version 1.3.1 and before version 2.0.0, worker group can be created by modifying the worker configuration.
- Since version 1.3.1 and before version 2.0.0, worker group can be created by modifying the worker configuration.
#### How Can I Do When I Upgrade from 1.3.1 to version before 2.0.0
* Check the backup database, search records in table `t_ds_worker_group` table and mainly focus on three columns: `id, name and IP`.
| id | name | ip_list |
| :--- | :---: | ---: |
| 1 | service1 | 192.168.xx.10 |
| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
| id | name | ip_list |
|:---|:--------:|----------------------------:|
| 1 | service1 | 192.168.xx.10 |
| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
* Modify worker related configuration in `bin/env/install_config.conf`.
Assume bellow are the machine worker service to be deployed:
| hostname | ip |
| :--- | :---: |
| ds1 | 192.168.xx.10 |
| ds2 | 192.168.xx.11 |
| ds3 | 192.168.xx.12 |
| hostname | ip |
|:---------|:-------------:|
| ds1 | 192.168.xx.10 |
| ds2 | 192.168.xx.11 |
| ds3 | 192.168.xx.12 |
To keep worker group config consistent with the previous version, we need to modify workers configuration as below: