This page describes the Project screen in Apache DolphinScheduler and the functions available on it. The following table explains commonly used terms in Apache DolphinScheduler:
| Term | Description |
| ------ | -------- |
| DAG | Tasks in a workflow are assembled in the form of a Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with an in-degree of zero until there are no subsequent nodes. |
| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
| Workflow Instance | Instantiation of a workflow definition, generated either by a manual start or by scheduled execution. Each time a workflow definition runs, a workflow instance is generated. |
| Workflow Relation | Shows dynamic status of all the workflows in a project. |
| Task | A task is a discrete action in a workflow. Apache DolphinScheduler supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON and DEPENDENT task types, and plans to support dynamic plug-in expansion. Note that a SUB_PROCESS is also a separate workflow definition that can be started and executed separately. |
| Task Instance | Instantiation of the task node in the process definition, which identifies the specific task execution status. |
## Create Project
- Click "Project Management" to enter the project management page, click the "Create Project" button, enter the project name, project description, and click "Submit" to create a new project.
- Click `Project Management` to enter the project management page, click the `Create Project` button, enter the project name and project description, and click `Submit` to create a new project.
- Click the project name link on the project management page to enter the project home page, as shown in the figure below. The project home page contains the task status statistics, process status statistics, and workflow definition statistics of the project. These metrics are introduced below:
- Task status statistics: Within the specified time range, count the number of task instances whose status is: submitted successfully, running, ready to pause, pause, ready to stop, stop, failure, success, need fault tolerance, kill, or waiting for thread.
- Process status statistics: Within the specified time range, count the number of workflow instances whose status is: submitted successfully, running, ready to pause, pause, ready to stop, stop, failure, success, need fault tolerance, kill, or waiting for thread.
- Workflow definition statistics: Count the workflow definitions created by this user and those granted to them by the administrator.
- Click `Project Management -> Workflow -> Task Instance` to enter the `Task Instance` page, as shown in the figure below. Click a workflow instance name to jump to the workflow instance DAG chart and view the task status.
- Click `Project Management -> Workflow -> Workflow Definition`, enter the `Workflow Definition` page, and click the `Create Workflow` button to enter the **workflow DAG edit** page, as shown in the following figure:
1. Fill in the "Node Name", "Description" and "Script" fields;
2. Check “Normal” for “Run Flag”. If “Prohibit Execution” is checked, the task will not execute when the workflow runs;
3. Select "Task Priority": when the number of worker threads is insufficient, high priority tasks will execute first in the execution queue, and tasks with the same priority will execute in the order of first in, first out;
1. Fill in the `Node Name`, `Description` and `Script` fields;
2. Check `Normal` for `Run Flag`. If `Prohibit Execution` is checked, the task will not execute when the workflow runs;
3. Select `Task Priority`: when the number of worker threads is insufficient, high priority tasks will execute first in the execution queue, and tasks with the same priority will execute in the order of first in, first out;
4. Timeout alarm (optional): check timeout alarm and timeout failure, and fill in the "timeout period". When the task execution time exceeds the **timeout period**, an alert email will be sent and the task will fail due to timeout;
5. Resources (optional). Resources are files created or uploaded on the `Resource Center -> File Management` page. For example, if the file name is `test.sh`, the command to call the resource in the script is `sh test.sh` (see the sketch after this list);
6. Customize parameters (optional);
7. Click the "Confirm Add" button to save the task settings.
7. Click the `Confirm Add` button to save the task settings.
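A minimal sketch of what the `Script` field of a SHELL task might contain, assuming the resource file `test.sh` from step 5 is attached to the node and a custom parameter named `bizdate` was added in step 6 (the parameter name and the script body are illustrative, not part of the product):

```shell
#!/bin/bash
# Call the attached resource file; it is available in the task's working directory.
sh test.sh

# Reference the node's custom parameter; DolphinScheduler substitutes ${bizdate}
# with the configured value before the script runs.
echo "processing data for ${bizdate}"
```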
- **Set dependencies between tasks:** Click the plus sign on the right of a task node to connect it to the next task; as shown in the figure below, task Node_B and task Node_C execute in parallel: when task Node_A finishes execution, tasks Node_B and Node_C will execute simultaneously.
- **Save workflow definition:** Click the "Save" button, and the "Set DAG chart name" window pops up, as shown in the figure below. Enter the workflow definition name, workflow definition description, and set global parameters (optional, refer to [global parameters](../parameter/global.md)), click the "Add" button to finish workflow definition creation.
- **Save workflow definition:** Click the `Save` button, and the "Set DAG chart name" window pops up, as shown in the figure below. Enter the workflow definition name, workflow definition description, and set global parameters (optional, refer to [global parameters](../parameter/global.md)), click the `Add` button to finish workflow definition creation.
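A global parameter set in this dialog can be referenced by any task in the workflow using the same `${name}` placeholder syntax as node-level parameters. A minimal sketch, assuming a global parameter named `dt` was added in the save dialog (the name and value are illustrative):

```shell
#!/bin/bash
# ${dt} is replaced with the value of the global parameter configured in the
# "Set DAG chart name" dialog before the task runs.
echo "workflow-level business date: ${dt}"
```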
The following are the operation functions of the workflow definition list:
- **Edit:** Only "Offline" workflow definitions can be edited. Workflow DAG editing is the same as [Create Workflow Definition](#creatDag) <!-- markdown-link-check-disable-line -->
- **Online:** When the workflow status is "Offline", use this to bring the workflow online. Only a workflow in the "Online" state can run, but it cannot be edited.
- **Offline:** When the workflow status is "Online", use this to take the workflow offline. Only a workflow in the "Offline" state can be edited, but it cannot run.
- **Run:** Only workflow in the online state can run. See [2.3.3 Run Workflow](#run-the-workflow) for the operation steps.
- **Timing:** Timing can only be set on online workflows, and the system automatically schedules the workflow to run on time. The status after creating a timing is "offline", and the timing must be brought online on the timing management page to take effect. See [2.3.4 Workflow Timing](#workflow-timing) for timing operation steps.
- **Timing Management:** On the timing management page you can edit, online/offline, and delete timings.
- **Delete:** Delete the workflow definition. In the same project, only the workflow definition created by yourself can be deleted, and the workflow definition of other users cannot be deleted. If you need to delete it, please contact the user who created it or the administrator.
- **Download:** Download the workflow definition to your local machine.
- **Tree Diagram:** Display the task node type and task status in a tree structure, as shown in the figure below:
- Click `Project Management -> Workflow -> Workflow Definition` to enter the workflow definition page, as shown in the figure below, then click the "Go Online" button <img src="../../../../img/online.png" width="35"/> to bring the workflow online.
- Click the "Run" button to pop up the startup parameter setting window, as shown in the figure below, set the startup parameters, click the "Run" button in the pop-up box, the workflow starts running, and the workflow instance page generates a workflow instance.
- Click the `Run` button to pop up the startup parameter setting window, as shown in the figure below. Set the startup parameters and click the `Run` button in the pop-up box; the workflow starts running and a workflow instance is generated on the workflow instance page.
* Failure strategy: When a task node fails to execute, this strategy determines how the other parallel task nodes behave. "Continue" means that after a task fails, the other task nodes execute normally; "End" means that all task execution is terminated and the entire process is terminated.
* Notification strategy: When the process is over, send the process execution result notification email according to the process status; the options are: do not send, send on success, send on failure, send regardless of result.
* Process priority: The priority of process execution, divided into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high priority processes will execute first in the execution queue, and processes with the same priority will execute in the order of first in, first out.
* Worker group: The process can only be executed in the specified worker machine group. The default is `Default`, which can execute on any worker.
* Notification group: When the notification strategy, timeout alarm, or fault tolerance is triggered, the process result information or email will be sent to all members of the notification group.
* Recipient: When the notification strategy, timeout alarm, or fault tolerance is triggered, the process result information or alarm email will be sent to the recipient list.
* Cc: When the notification strategy, timeout alarm, or fault tolerance is triggered, the process result information or alarm email will be copied to the CC list.
* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
* Complement: includes serial complement and parallel complement, and supports both manual date input and date selection. Serial complement means that, within the specified time range, the complement runs sequentially from the start date to the end date and N process instances are generated in sequence. Parallel complement means that, within the specified time range, the complements for multiple days run concurrently and N process instances are generated. Manual date input means manually entering dates in the format `yyyy-MM-dd HH:mm:ss`, separated by commas (see the example after this list). Date selection means selecting dates via the UI.
* Dependent Mode: Trigger the complement of downstream workflow instances that depend on the current workflow (this requires that the schedule of the currently complemented workflow is online, and only downstream workflows that directly depend on the current workflow will be complemented).
* You can select the complement time range when executing a scheduled workflow definition (when the schedule configuration is not online, the complement runs daily by default within the selected time range; if the schedule configuration is online, the complement follows the schedule configuration within the selected time range). For example, to backfill the data from 1st May to 10th May, as shown in the figure below:
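For the manual date input mode mentioned above, the dates follow the `yyyy-MM-dd HH:mm:ss` format and are separated by commas; the concrete values below are only illustrative:

```
2022-05-01 00:00:00,2022-05-02 00:00:00,2022-05-03 00:00:00
```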
- Create timing: Click `Project Management->Workflow->Workflow Definition`, enter the workflow definition page, bring the workflow online, and click the "timing" button <img src="../../../../img/timing.png" width="35"/>; the timing parameter setting dialog box pops up, as shown in the figure below:
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
- Click the "Create" button to create the timing. Now the timing status is "**Offline**" and the timing needs to be **Online** to make effect.
- Timing online: Click the "Timing Management" button <imgsrc="../../../../img/timeManagement.png"width="35"/>, enter the timing management page, click the "online" button, the timing status will change to "online", as shown in the below figure, the workflow makes effect regularly.
- Timing online: Click the `Timing Management` button <img src="../../../../img/timeManagement.png" width="35"/> to enter the timing management page, then click the `online` button; the timing status will change to `online`, as shown in the figure below, and the workflow will then run on schedule.
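As a hedged illustration, assuming the timing dialog accepts a Quartz-style crontab expression (seconds, minutes, hours, day of month, month, day of week, year), a run at 05:00 every day within the chosen start/end window might be expressed as follows; the exact fields offered by the dialog may differ:

```
0 0 5 * * ? *
```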
Click `Project Management -> Workflow -> Workflow Definition` to enter the workflow definition page, then click the `Import Workflow` button to import a local workflow file; the workflow definition list displays the imported workflow and its status is offline.
- Click `Project Management -> Workflow -> Workflow Instance`, enter the workflow instance page, and click the workflow name to enter the workflow DAG page;
- Double-click the task node, as shown in the figure below, and click `View History` to jump to the task instance page, which displays a list of task instances run by the workflow instance.
- Click `Project Management -> Workflow -> Workflow Instance`, enter the workflow instance page, and click the workflow name to enter the workflow DAG page;
- Click the icon <img src="../../../../img/run_params_button.png" width="35"/> in the upper left corner to view the startup parameters of the workflow instance; click the icon <img src="../../../../img/global_param.png" width="35"/> to view the global and local parameters of the workflow instance, as shown in the following figure: