Browse Source

[doc] Change tasks doc (#10639)

3.1.0-release
sneh-wha 2 years ago committed by GitHub
parent
commit
65ebdbef98
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. 36
      docs/docs/en/guide/task/conditions.md
  2. 58
      docs/docs/en/guide/task/datax.md
  3. 3
      docs/docs/en/guide/task/dependent.md
  4. 29
      docs/docs/en/guide/task/emr.md
  5. 66
      docs/docs/en/guide/task/flink.md
  6. 43
      docs/docs/en/guide/task/http.md
  7. 54
      docs/docs/en/guide/task/jupyter.md
  8. 44
      docs/docs/en/guide/task/kubernetes.md
  9. 74
      docs/docs/en/guide/task/map-reduce.md
  10. 125
      docs/docs/en/guide/task/mlflow.md
  11. 69
      docs/docs/en/guide/task/openmldb.md
  12. 32
      docs/docs/en/guide/task/pigeon.md
  13. 37
      docs/docs/en/guide/task/python.md
  14. 3
      docs/docs/en/guide/task/shell.md
  15. 65
      docs/docs/en/guide/task/spark.md
  16. 27
      docs/docs/en/guide/task/sql.md
  17. 21
      docs/docs/en/guide/task/stored-procedure.md
  18. 2
      docs/docs/en/guide/task/sub-process.md
  19. 47
      docs/docs/en/guide/task/switch.md
  20. 32
      docs/docs/en/guide/task/zeppelin.md

36
docs/docs/en/guide/task/conditions.md

@ -4,25 +4,25 @@ Condition is a conditional node, that determines which downstream task should ru
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the "`Create Workflow`" button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/conditions.png" width="20"/> task node to canvas.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- Downstream tasks selection: Depending on the status of the predecessor task, you can jump to the corresponding branch, currently two branches are supported: success, failure
- Success: When the upstream task runs successfully, run the success branch.
- Failure: When the upstream task runs failed, run the failure branch.
- Upstream condition selection: can select one or more upstream tasks for conditions.
- Add an upstream dependency: the first parameter is to choose a specified task name, and the second parameter is to choose the upstream task status to trigger conditions.
- Select upstream task relationship: use `and` and `or` operators to handle the complex relationship of upstream when there are multiple upstream tasks for conditions.
## Task Parameters
| **Parameter** | **Description** |
| -------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Downstream tasks selection | Depending on the status of the predecessor task, you can jump to the corresponding branch, currently two branches are supported: success, failure <ul><li>Success: When the upstream task runs successfully, run the success branch.</li><li>Failure: When the upstream task runs failed, run the failure branch.</li></ul></li></ul> |
| Upstream condition selection | Can select one or more upstream tasks for conditions.<ul><li>Add an upstream dependency: the first parameter is to choose a specified task name, and the second parameter is to choose the upstream task status to trigger conditions.</li><li>Select upstream task relationship: use `and` and `or` operators to handle the complex relationship of upstream when there are multiple upstream tasks for conditions.</li></ul></li></ul> |
## Related Task
@ -51,7 +51,7 @@ After you finish creating the workflow, you can run the workflow online. You can
In the above figure, the task status marked with a green check mark is the successfully executed task node.
## Notice
## Note
- The Conditions task supports multiple upstream tasks, but only two downstream tasks.
- The Conditions task and the workflow that contain it do not support copy operations.

58
docs/docs/en/guide/task/datax.md

@ -6,35 +6,37 @@ DataX task type for executing DataX programs. For DataX nodes, the worker will e
## Create Task
- Click Project Management -> Project Name -> Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag the <img src="../../../../img/tasks/icons/datax.png" width="15"/> from the toolbar to the drawing board.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- **Descriptive information**: describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, they are executed in order from high to low, and when the priority is the same, they are executed according to the first-in first-out principle.
- **Worker grouping**: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
- **Environment Name**: Configure the environment name in which to run the script.
- **Number of failed retry attempts**: The number of times the task failed to be resubmitted.
- **Failed retry interval**: The time, in cents, interval for resubmitting the task after a failed task.
- **Cpu quota**: Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- **Max memory**:Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- **Delayed execution time**: The time, in cents, that a task is delayed in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
- **Custom template**: Custom the content of the DataX node's json profile when the default data source provided does not meet the required requirements.
- **json**: json configuration file for DataX synchronization.
- **Custom parameters**: SQL task type, and stored procedure is a custom parameter order to set values for the method. The custom parameter type and data type are the same as the stored procedure task type. The difference is that the SQL task type custom parameter will replace the \${variable} in the SQL statement.
- **Data source**: Select the data source from which the data will be extracted.
- **sql statement**: the sql statement used to extract data from the target database, the sql query column name is automatically parsed when the node is executed, and mapped to the target table synchronization column name. When the source table and target table column names are inconsistent, they can be converted by column alias.
- **Target library**: Select the target library for data synchronization.
- **Pre-sql**: Pre-sql is executed before the sql statement (executed by the target library).
- **Post-sql**: Post-sql is executed after the sql statement (executed by the target library).
- **Stream limit (number of bytes)**: Limits the number of bytes in the query.
- **Limit flow (number of records)**: Limit the number of records for a query.
- **Running memory**: the minimum and maximum memory required can be configured to suit the actual production environment.
- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node name | The node name in a workflow definition is unique. |
| Run flag | Identifies whether this node schedules normally, if it does not need to execute, select the prohibition execution. |
| Task priority | When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order. |
| Description | Describe the function of the node. |
| Worker group | Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution. |
| Environment Name | Configure the environment name in which run the script. |
| Number of failed retries | The number of times the task failed to resubmit. |
| Failed retry interval | The time interval (unit minute) for resubmitting the task after a failed task. |
| Cpu quota | Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md) |
| Max memory | Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md) |
| Delayed execution time | The time, in cents, that a task is delayed in execution. |
| Timeout alarm | Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail. |
| Custom template | Custom the content of the DataX node's json profile when the default data source provided does not meet the required requirements. |
| json | json configuration file for DataX synchronization. |
| Custom parameters | SQL task type, and stored procedure is a custom parameter order to set values for the method. The custom parameter type and data type are the same as the stored procedure task type. The difference is that the SQL task type custom parameter will replace the \${variable} in the SQL statement. |
| Data source | Select the data source from which the data will be extracted. |
| sql statement | the sql statement used to extract data from the target database, the sql query column name is automatically parsed when the node is executed, and mapped to the target table synchronization column name. When the source table and target table column names are inconsistent, they can be converted by column alias. |
| Target library | Select the target library for data synchronization. |
| Pre-sql | Pre-sql is executed before the sql statement (executed by the target library). |
| Post-sql | Post-sql is executed after the sql statement (executed by the target library). |
| Stream limit (number of bytes) | Limits the number of bytes in the query. |
| Limit flow (number of records) | Limit the number of records for a query. |
| Running memory | the minimum and maximum memory required can be configured to suit the actual production environment. |
| Predecessor task | Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task. |
## Task Example
@ -60,6 +62,6 @@ After writing the required json file, you can configure the node content by foll
![datax_task03](../../../../img/tasks/demo/datax_task03.png)
### Notice
### Note
If the default data source provided does not meet your needs, you can configure the writer and reader of DataX according to the actual usage environment in the custom template option, available at https://github.com/alibaba/DataX.

3
docs/docs/en/guide/task/dependent.md

@ -26,7 +26,8 @@ Dependent nodes are **dependency check nodes**. For example, process A depends o
| Delayed execution time | The time (unit minute) that a task delays in execution. |
| Pre task | Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task. |
## Examples
## Task Examples
The Dependent node provides a logical judgment function, which can detect the execution of the dependent node according to the logic.

29
docs/docs/en/guide/task/emr.md

@ -4,17 +4,24 @@
Amazon EMR task type, for creating EMR clusters on AWS and running computing tasks. Using [aws-java-sdk](https://aws.amazon.com/cn/sdk-for-java/) in the background code, to transfer JSON parameters to [RunJobFlowRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/model/RunJobFlowRequest.html) object and submit to AWS.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- JSON: JSON corresponding to the [RunJobFlowRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/model/RunJobFlowRequest.html) object, for details refer to [API_RunJobFlow_Examples](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html#API_RunJobFlow_Examples).
## Create Task
* Click `Project Management -> Project Name -> Workflow Definition`, click the "`Create Workflow`" button to enter the DAG editing page.
* Drag `AmazonEMR` task from the toolbar to the artboard to complete the creation.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node name | The node name in a workflow definition is unique. |
| Run flag | Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.|
| Description | Describe the function of the node. |
| Task priority | When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order. |
| Worker grouping | Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution. |
| Times of failed retry attempts | The number of times the task failed to resubmit. You can select from drop-down or fill-in a number. |
| Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number. |
| Timeout alarm | Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail. |
| JSON | JSON corresponding to the [RunJobFlowRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/model/RunJobFlowRequest.html) object, for details refer to [API_RunJobFlow_Examples](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html#API_RunJobFlow_Examples). |
## JSON example

66
docs/docs/en/guide/task/flink.md

@ -10,39 +10,41 @@ Flink task type, used to execute Flink programs. For Flink nodes:
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/flink.png" width="15"/>task node to canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Program type**: Support Java, Scala, Python and SQL four languages.
- **The class of main function**: The **full path** of Main Class, the entry point of the Flink program.
- **Main jar package**: The jar package of the Flink program (upload by Resource Center).
- **Deployment mode**: Support 2 deployment modes: cluster and local.
- **Initialization script**: Script file to initialize session context.
- **Script**: The sql script file developed by the user that should be executed.
- **Flink version**: Select version according to the execution env.
- **Task name** (optional): Flink task name.
- **JobManager memory size**: Used to set the size of jobManager memories, which can be set according to the actual production environment.
- **Number of slots**: Used to set the number of slots, which can be set according to the actual production environment.
- **TaskManager memory size**: Used to set the size of taskManager memories, which can be set according to the actual production environment.
- **Number of TaskManager**: Used to set the number of taskManagers, which can be set according to the actual production environment.
- **Parallelism**: Used to set the degree of parallelism for executing Flink tasks.
- **Main program parameters**: Set the input parameters for the Flink program and support the substitution of custom parameter variables.
- **Optional parameters**: Support `--jar`, `--files`,` --archives`, `--conf` format.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **Custom parameter**: It is a local user-defined parameter for Flink, and will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node name | The node name in a workflow definition is unique. |
| Run flag | Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`. |
| Description | Describe the function of the node. |
| Task priority | When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order. |
| Worker grouping | Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution. |
| Environment Name | Configure the environment name in which run the script. |
| Times of failed retry attempts | The number of times the task failed to resubmit. |
| Failed retry interval | The time interval (unit minute) for resubmitting the task after a failed task. |
| Delayed execution time | The time (unit minute) that a task delays in execution. |
| Timeout alarm | Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail. |
| Program type | Support Java, Scala, Python and SQL four languages. |
| Class of main function**: The **full path** of Main Class, the entry point of the Flink program. |
| Main jar package | The jar package of the Flink program (upload by Resource Center). |
| Deployment mode | Support 2 deployment modes: cluster and local. |
| Initialization script | Script file to initialize session context. |
| Script | The sql script file developed by the user that should be executed. |
| Flink version | Select version according to the execution environment. |
| Task name | Flink task name. |
| JobManager memory size | Used to set the size of jobManager memories, which can be set according to the actual production environment. |
| Number of slots | Used to set the number of slots, which can be set according to the actual production environment. |
| TaskManager memory size | Used to set the size of taskManager memories, which can be set according to the actual production environment. |
| Number of TaskManager | Used to set the number of taskManagers, which can be set according to the actual production environment. |
| Parallelism | Used to set the degree of parallelism for executing Flink tasks. |
| Main program parameters | Set the input parameters for the Flink program and support the substitution of custom parameter variables. |
| Optional parameters | Support `--jar`, `--files`,` --archives`, `--conf` format. |
| Resource | Appoint resource files in the `Resource` if parameters refer to them. |
| Custom parameter | It is a local user-defined parameter for Flink, and will replace the content with `${variable}` in the script. |
| Predecessor task | Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task. |
## Task Example
@ -76,7 +78,7 @@ Configure the required content according to the parameter descriptions above.
![demo-flink-sql-simple](../../../../img/tasks/demo/flink_sql_test.png)
## Notice
## Note
- JAVA and Scala only used for identification, there is no difference. If use Python to develop Flink, there is no class of the main function and the rest is the same.

43
docs/docs/en/guide/task/http.md

@ -6,28 +6,28 @@ This node is used to perform http type tasks such as the common POST and GET req
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the "`Create Workflow`" button to enter the DAG editing page.
- Drag the <img src="../../../../img/tasks/icons/http.png" width="15"/> from the toolbar to the drawing board.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- **Descriptive information**: describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, they are executed in order from high to low, and when the priority is the same, they are executed according to the first-in first-out principle.
- **Worker grouping**: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine will be randomly selected for execution.
- **Environment Name**: Configure the environment name in which to run the script.
- **Number of failed retry attempts**: The number of times the task failed to be resubmitted.
- **Failed retry interval**: The time, in cents, interval for resubmitting the task after a failed task.
- **Delayed execution time**: the time, in cents, that a task is delayed in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will be sent and the task execution will fail.
- **Request address**: HTTP request URL.
- **Request type**: Support GET, POSt, HEAD, PUT, DELETE.
- **Request parameters**: Support Parameter, Body, Headers.
- **Verification conditions**: support default response code, custom response code, content included, content not included.
- **Verification content**: When the verification condition selects a custom response code, the content contains, and the content does not contain, the verification content is required.
- **Custom parameter**: It is a user-defined parameter of http part, which will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Request address | HTTP request URL. |
| Request type | Supports GET, POSt, HEAD, PUT, DELETE. || Request parameters |Supports Parameter, Body, Headers. || Verification conditions | Supports default response code, custom response code, content included, content not included.|
| Verification content | When the verification condition selects a custom response code, the content contains, and the content does not contain, the verification content is required. |
| Custom parameter | It is a user-defined parameter of http part, which will replace the content with `${variable}` in the script. |
| Pre tasks | Selecting a predecessor task for the current task will set the selected predecessor task as upstream of the current task. |
## Example
@ -42,6 +42,3 @@ The main configuration parameters are as follows:
![http_task](../../../../img/tasks/demo/http_task01.png)
## Notice
None

54
docs/docs/en/guide/task/jupyter.md

@ -11,7 +11,7 @@ it will use `papermill` to evaluate jupyter notes. Click [here](https://papermil
Click [here](https://docs.conda.io/en/latest/) for more information about `conda`.
- `conda.path` is set to `/opt/anaconda3/etc/profile.d/conda.sh` by default. If you have no idea where your `conda` is, simply run `conda info | grep -i 'base environment'`.
> NOTICE: `Jupyter Task Plugin` uses `source` command to activate conda environment.
> NOTE: `Jupyter Task Plugin` uses `source` command to activate conda environment.
> If your tenant does not have permission to use `source`, `Jupyter Task Plugin` will not function.
@ -28,7 +28,7 @@ Click [here](https://docs.conda.io/en/latest/) for more information about `conda
2. Upload packed conda environment to `resource center`.
3. Select your packed conda environment as `resource` in your `jupyter task`, e.g. `jupyter_env.tar.gz`.
> **_Note:_** Make sure you follow the [Conda-Pack](https://conda.github.io/conda-pack/) official instructions.
> NOTE: Make sure you follow the [Conda-Pack](https://conda.github.io/conda-pack/) official instructions.
> If you unpack your packed conda environment, the directory structure should be the same as below:
```
@ -42,36 +42,40 @@ Click [here](https://docs.conda.io/en/latest/) for more information about `conda
└── ssl
```
> NOTICE: Please follow the `conda pack` instructions above strictly, and DO NOT modify `bin/activate`.
> NOTE: Please follow the `conda pack` instructions above strictly, and DO NOT modify `bin/activate`.
> `Jupyter Task Plugin` uses `source` command to activate your packed conda environment.
> If you are concerned about using `source`, choose other options to manage your python dependency.
## Create Task
- Click Project Management-Project Name-Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management-Project Name-Workflow Definition`, and click the "`Create Workflow`" button to enter the DAG editing page.
- Drag <img src="../../../../img/tasks/icons/jupyter.png" width="15"/> from the toolbar to the canvas.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Number of failed retry attempts: The failure task resubmitting times. It supports drop-down and hand-filling.
- Failed retry interval: The time interval for resubmitting the task after a failed task. It supports drop-down and hand-filling.
- Cpu quota: Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- Max memory:Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- Timeout alarm: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail.
- Conda Env Name: Name of conda environment or packed conda environment tarball.
- Input Note Path: Path of input jupyter note template.
- Out Note Path: Path of output note.
- Jupyter Parameters: Parameters in json format used for jupyter note parameterization.
- Kernel: Jupyter notebook kernel.
- Engine: Engine to evaluate jupyter notes.
- Jupyter Execution Timeout: Timeout set for each jupyter notebook cell.
- Jupyter Start Timeout: Timeout set for jupyter notebook kernel.
- Others: Other command options for papermill.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Cpu quota | Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md). |
| Max memory | Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md). |
| Timeout alarm | Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail. |
| Conda Env Name | Name of conda environment or packed conda environment tarball. |
|Input Note Path | Path of input jupyter note template. |
| Out Note Path | Path of output note. |
| Jupyter Parameters | Parameters in json format used for jupyter note parameterization. |
| Kernel | Jupyter notebook kernel. |
| Engine | Engine to evaluate jupyter notes. |
| Jupyter Execution Timeout | Timeout set for each jupyter notebook cell. |
| Jupyter Start Timeout | Timeout set for jupyter notebook kernel. |
| Others | Other command options for papermill. |
## Task Example

44
docs/docs/en/guide/task/kubernetes.md

@ -6,27 +6,31 @@ K8S task type used to execute a batch task. In this task, the worker submits the
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/kubernetes.png" width="15"/> to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which to run the task.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Namespace**::the namespace for running k8s task
- **Min CPU**:min CPU requirement for running k8s task
- **Min Memory**:min memory requirement for running k8s task
- **Image**:the registry url for image
- **Custom parameter**: It is a local user-defined parameter for K8S task, these params will pass to container as environment variables.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Namespace | The namespace for running k8s task. |
| Min CPU | Minimum CPU requirement for running k8s task. |
| Min Memory | Minimum memory requirement for running k8s task. |
| Image | The registry url for image. |
| Custom parameter | It is a local user-defined parameter for K8S task, these params will pass to container as environment variables. |
| Predecessor task | Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task. |
## Task Example
### Configure the K8S Environment in DolphinScheduler
@ -39,6 +43,6 @@ Configure the required content according to the parameter descriptions above.
![K8S](../../../../img/tasks/demo/kubernetes-task-en.png)
## Notice
## Note
Task name contains only lowercase alphanumeric characters or '-'

74
docs/docs/en/guide/task/map-reduce.md

@ -6,45 +6,51 @@ MapReduce(MR) task type used for executing MapReduce programs. For MapReduce nod
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/mr.png" width="15"/> to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Resource**: Refers to the list of resource files that called in the script, and upload or create files by the Resource Center file management.
- **Custom parameters**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Parameters
### General
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Resource | Refers to the list of resource files that called in the script, and upload or create files by the Resource Center file management. |
| Custom parameters | It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script. |
| Predecessor task | Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task. |
### JAVA or SCALA Program
- **Program type**: Select JAVA or SCALA program.
- **The class of the main function**: The **full path** of Main Class, the entry point of the MapReduce program.
- **Main jar package**: The jar package of the MapReduce program.
- **Task name** (optional): MapReduce task name.
- **Command line parameters**: Set the input parameters of the MapReduce program and support the substitution of custom parameter variables.
- **Other parameters**: support `-D`, `-files`, `-libjars`, `-archives` format.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **User-defined parameter**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
## Python Program
- **Program type**: Select Python language.
- **Main jar package**: The Python jar package for running MapReduce.
- **Other parameters**: support `-D`, `-mapper`, `-reducer,` `-input` `-output` format, and you can set the input of user-defined parameters, such as:
- `-mapper "mapper.py 1"` `-file mapper.py` `-reducer reducer.py` `-file reducer.py` `–input /journey/words.txt` `-output /journey/out/mr/\${currentTimeMillis}`
- The `mapper.py 1` after `-mapper` is two parameters, the first parameter is `mapper.py`, and the second parameter is `1`.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **User-defined parameter**: It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script.
| **Parameter** | **Description** |
| ------- | ---------- |
| Program type | Select JAVA or SCALA program. |
| The class of the main function | The **full path** of Main Class, the entry point of the MapReduce program. |
| Main jar package | The jar package of the MapReduce program. |
| Task name | MapReduce task name. |
| Command line parameters | Set the input parameters of the MapReduce program and support the substitution of custom parameter variables. |
| Other parameters | Support `-D`, `-files`, `-libjars`, `-archives` format. |
| Resource | Appoint resource files in the `Resource` if parameters refer to them. |
| User-defined parameter | It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script. |
### Python Program
| **Parameter** | **Description** |
| ------- | ---------- |
| Program type | Select Python language. |
| Main jar package | The Python jar package for running MapReduce. |
| Other parameters | Support `-D`, `-mapper`, `-reducer,` `-input` `-output` format, and you can set the input of user-defined parameters, such as:<ul><li>`-mapper "mapper.py 1"` `-file mapper.py` `-reducer reducer.py` `-file reducer.py` `–input /journey/words.txt` `-output /journey/out/mr/${currentTimeMillis}`</li><li>The `mapper.py 1` after `-mapper` is two parameters, the first parameter is `mapper.py`, and the second parameter is `1`. </li></ul> |
| Resource | Appoint resource files in the `Resource` if parameters refer to them. |
| User-defined parameter | It is a local user-defined parameter for MapReduce, and will replace the content with `${variable}` in the script. |
## Task Example

125
docs/docs/en/guide/task/mlflow.md

@ -31,35 +31,26 @@ The MLflow plugin currently supports and will support the following:
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the
DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/mlflow.png" width="15"/> task node to canvas.
## Task Example
First, introduce some general parameters of DolphinScheduler:
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high
to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected,
randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm
email will send and the task execution will fail.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
upstream of the current task.
Here are some specific parameters for the MLFlow component:
- **MLflow Tracking Server URI**: MLflow Tracking Server URI, default http://localhost:5000.
- **Experiment Name**: Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
## Task Parameters and Example
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Predecessor task | Selecting the predecessor task of the current task will set the selected predecessor task as the upstream of the current task. |
| MLflow Tracking Server URI | MLflow Tracking Server URI, default http://localhost:5000. |
| Experiment Name | Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow. |
### MLflow Projects
@ -67,71 +58,54 @@ Here are some specific parameters for the MLFlow component:
![mlflow-conda-env](../../../../img/tasks/demo/mlflow-basic-algorithm.png)
**Task Parameter**
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **Model Name**: The registered model name is added to the original model version and registered as
Production.
- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameter when initializing the algorithm/AutoML model, which can be empty. For example
parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; ' shards
each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
sign to get the corresponding parameter value through `python eval()`.
- [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
- [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
- [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
- [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
- **Algorithm**:The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
on [scikit-learn](https://scikit-learn.org/) form.
- **Parameter Search Space**: Parameter search space when running the corresponding algorithm, which can be
empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention
will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name,
and using the name after the equal sign to get the corresponding parameter value through `python eval()`.
**Task Parameters**
| **Parameter** | **Description** |
| ------- | ---------- |
| Register Model | Register the model or not. If register is selected, the following parameters are expanded. |
| Model Name | The registered model name is added to the original model version and registered as Production. |
| Data Path | The absolute path of the file or folder. Ends with .csv for file or contain train.csv and test.csv for folder(In the suggested way, users should build their own test sets for model evaluation. |
| Parameters | Parameter when initializing the algorithm/AutoML model, which can be empty. For example, parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; ' shards each parameter, using the name before the equal sign as the parameter name, and using the name after the equal sign to get the corresponding parameter value through `python eval()`. <ul><li>[Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)</li><li>[SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)</li><li>[lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)</li><li>[xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)</li></ul> |
| Algorithm |The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based on [scikit-learn](https://scikit-learn.org/) form. |
| Parameter Search Space | Parameter search space when running the corresponding algorithm, which can be empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name, and using the name after the equal sign to get the corresponding parameter value through `python eval()`. |
#### AutoML
![mlflow-automl](../../../../img/tasks/demo/mlflow-automl.png)
**Task Parameter**
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **model name**: The registered model name is added to the original model version and registered as
Production.
- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameter when initializing the algorithm/AutoML model, which can be empty. For example
parameters `n_estimators=200;learning_rate=0.2` for flaml. The convention will be passed with '; 'shards
each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
sign to get the corresponding parameter value through `python eval()`. The detailed parameter list is as follows:
- [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
- [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
- **AutoML tool**: The AutoML tool used, currently
supports [autosklearn](https://github.com/automl/auto-sklearn)
and [flaml](https://github.com/microsoft/FLAML).
| **Parameter** | **Description** |
| ------- | ---------- |
| Register Model | Register the model or not. If register is selected, the following parameters are expanded. |
| model name | The registered model name is added to the original model version and registered as Production. |
| Data Path | The absolute path of the file or folder. Ends with .csv for file or contain train.csv and test.csv for folder(In the suggested way, users should build their own test sets for model evaluation). |
| Parameters | Parameter when initializing the algorithm/AutoML model, which can be empty. For example, parameters `n_estimators=200;learning_rate=0.2` for flaml. The convention will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name, and using the name after the equal sign to get the corresponding parameter value through `python eval()`. The detailed parameter list is as follows: <ul><li>[flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)</li><li>[autosklearn](https://automl.github.io/auto-sklearn/master/api.html)</li></ul> |
| AutoML tool | The AutoML tool used, currently supports [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML). |
#### Custom projects
![mlflow-custom-project.png](../../../../img/tasks/demo/mlflow-custom-project.png)
**Task Parameter**
- **parameters**: `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`.
- **Repository**: Repository url of MLflow Project,Support git address and directory on worker. If it's in a subdirectory,We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`.
- **Project Version**: Version of the project,default master.
| **Parameter** | **Description** |
| ------- | ---------- |
| parameters | `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`. |
| Repository | Repository url of MLflow Project,Support git address and directory on worker. If it's in a subdirectory,We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`. |
| Project Version | Version of the project,default master. |
You can now use this feature to run all MLFlow projects on Github (For example [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples) ). You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to use your library with one click.
### MLflow Models
General Parameters:
**General Parameters**
- **Model-URI**: Model-URI of MLflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
- **Port**: The port to listen on.
| **Parameter** | **Description** |
| ------- | ---------- |
| Model-URI | Model-URI of MLflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores |
| Port | The port to listen on. |
#### MLFLOW
#### MLflow
![mlflow-models-mlflow](../../../../img/tasks/demo/mlflow-models-mlflow.png)
@ -143,12 +117,9 @@ General Parameters:
![mlflow-models-docker-compose](../../../../img/tasks/demo/mlflow-models-docker-compose.png)
- **Max Cpu Limit**: For example `1.0` or `0.5`, the same as docker compose.
- **Max Memory Limit**: For example `1G` or `500M`, the same as docker compose.
## Environment to prepare
## Environment to Prepare
### Conda env
### Conda environment
You need to enter the admin account to configure a conda environment variable(Please
install [anaconda](https://docs.continuum.io/anaconda/install/)
@ -161,7 +132,7 @@ Conda environment.
![mlflow-set-conda-env](../../../../img/tasks/demo/mlflow-set-conda-env.png)
### Start the mlflow service
### Start the MLflow Service
Make sure you have installed MLflow, using 'pip install mlflow'.

69
docs/docs/en/guide/task/openmldb.md

@ -9,65 +9,54 @@ OpenMLDB task plugin used to execute tasks on OpenMLDB cluster.
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the
DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/openmldb.png" width="15"/> task node to canvas.
## Task Example
First, introduce some general parameters of DolphinScheduler
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high
to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected,
randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm
email will send and the task execution will fail.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
upstream of the current task.
### OpenMLDB Parameters
**Task Parameter**
- **zookeeper** :OpenMLDB cluster zookeeper address, e.g. 127.0.0.1:2181.
- **zookeeper path** : OpenMLDB cluster zookeeper path, e.g. /openmldb.
- **Execute Mode** :determine the init mode, offline or online. You can switch it in sql statement.
- **SQL statement** :SQL statement.
- Custom parameters: It is the user-defined parameters of Python, which will replace the content with \${variable} in the script.
Here are some examples:
#### Load data
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Predecessor task | Selecting the predecessor task of the current task will set the selected predecessor task as the upstream of the current task. |
| zookeeper | OpenMLDB cluster zookeeper address, e.g. 127.0.0.1:2181. |
| zookeeper path | OpenMLDB cluster zookeeper path, e.g. /openmldb. |
| Execute Mode | Determine the init mode, offline or online. You can switch it in sql statement. |
| SQL statement | SQL statement. |
| Custom parameters | It is the user-defined parameters of Python, which will replace the content with \${variable} in the script. |
## Task Examples
### Load data
![load data](../../../../img/tasks/demo/openmldb-load-data.png)
We use `LOAD DATA` to load data into OpenMLDB cluster. We select `offline` here, so it will load to offline storage.
#### Feature extraction
### Feature extraction
![fe](../../../../img/tasks/demo/openmldb-feature-extraction.png)
We use `SELECT INTO` to do feature extraction. We select `offline` here, so it will run sql on offline engine.
## Environment to prepare
### Environment to Prepare
### Start the OpenMLDB cluster
#### Start the OpenMLDB Cluster
You should create an OpenMLDB cluster first. If in production env, please check [deploy OpenMLDB](https://openmldb.ai/docs/en/v0.5/deploy/install_deploy.html).
You can follow [run OpenMLDB in docker](https://openmldb.ai/docs/zh/v0.5/quickstart/openmldb_quickstart.html#id11)
to a quick start.
### Python env
#### Python Environment
The OpenMLDB task will use OpenMLDB Python SDK to connect OpenMLDB cluster. So you should have the Python env.

32
docs/docs/en/guide/task/pigeon.md

@ -1,19 +1,27 @@
# Pigeon
## Overview
Pigeon is a task used to trigger remote tasks, acquire logs or status by calling remote WebSocket service. It is DolphinScheduler uses a remote WebSocket service to call tasks.
## Create
## Create Task
Drag from the toolbar <img src="../../../../img/pigeon.png" width="20"/> to the canvas to create a new Pigeon task.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/pigeon.png" width="20"/> to the canvas to create a new Pigeon task.
## Parameter
## Task Parameters
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- Target task name: Target task name of this Pigeon node.
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Target task name | Target task name of this Pigeon node. |
| Predecessor task | Selecting the predecessor task of the current task will set the selected predecessor task as the upstream of the current task. |

37
docs/docs/en/guide/task/python.md

@ -7,25 +7,29 @@ it will generate a temporary python script, and executes the script by the Linux
## Create Task
- Click Project Management-Project Name-Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag <img src="../../../../img/tasks/icons/python.png" width="15"/> from the toolbar to the canvas.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Environment Name: Configure the environment name in which to run the script.
- Number of failed retry attempts: The failure task resubmitting times. It supports drop-down and hand-filling.
- Failed retry interval: The time interval for resubmitting the task after a failed task. It supports drop-down and hand-filling.
- Cpu quota: Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- Max memory:Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md)
- Timeout alarm: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail.
- Script: Python program developed by the user.
- Resource: Refers to the list of resource files that need to be called in the script, and the files uploaded or created by the resource center-file management.
- Custom parameters: It is the user-defined parameters of Python, which will replace the content with \${variable} in the script.
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Cpu quota | Assign the specified CPU time quota to the task executed. Takes a percentage value. Default -1 means unlimited. For example, the full CPU load of one core is 100%,and that of 16 cores is 1600%. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md). |
| Max memory | Assign the specified max memory to the task executed. Exceeding this limit will trigger oom to be killed and will not automatically retry. Takes an MB value. Default -1 means unlimited. This function is controlled by [task.resource.limit.state](../../architecture/configuration.md). |
| Timeout alarm | Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail. |
| Script | Python program developed by the user. |
| Resource | Refers to the list of resource files that need to be called in the script, and the files uploaded or created by the resource center-file management. |
| Custom parameters | It is the user-defined parameters of Python, which will replace the content with \${variable} in the script. |
## Task Example
@ -52,6 +56,3 @@ After running this example, we would see "param_val" print in the log.
print("${param_key}")
```
## Notice
None

3
docs/docs/en/guide/task/shell.md

@ -9,6 +9,9 @@ Shell task type, used to create a shell type task and execute a series of shell
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/shell.png" width="15"/> to the canvas.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |

65
docs/docs/en/guide/task/spark.md

@ -10,38 +10,39 @@ Spark task type for executing Spark application. When executing the Spark task,
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/spark.png" width="15"/> to the canvas.
## Task Parameter
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Descriptive information**: Describe the function of the node.
- **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- **Environment Name**: Configure the environment name in which run the script.
- **Times of failed retry attempts**: The number of times the task failed to resubmit.
- **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- **Delayed execution time**: The time (unit minute) that a task delays in execution.
- **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- **Program type**: Supports Java, Scala, Python and SQL.
- **Spark version**: Support Spark1 and Spark2.
- **The class of main function**: The **full path** of Main Class, the entry point of the Spark program.
- **Main jar package**: The Spark jar package (upload by Resource Center).
- **SQL scripts**: SQL statements in .sql files that Spark sql runs.
- **Deployment mode**: (1) spark submit supports three modes: yarn-clusetr, yarn-client and local.
(2) spark sql supports yarn-client and local modes.
- **Task name** (optional): Spark task name.
- **Driver core number**: Set the number of Driver core, which can be set according to the actual production environment.
- **Driver memory size**: Set the size of Driver memories, which can be set according to the actual production environment.
- **Number of Executor**: Set the number of Executor, which can be set according to the actual production environment.
- **Executor memory size**: Set the size of Executor memories, which can be set according to the actual production environment.
- **Main program parameters**: Set the input parameters of the Spark program and support the substitution of custom parameter variables.
- **Optional parameters**: support `--jars`, `--files`,` --archives`, `--conf` format.
- **Resource**: Appoint resource files in the `Resource` if parameters refer to them.
- **Custom parameter**: It is a local user-defined parameter for Spark, and will replace the content with `${variable}` in the script.
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Program type | Supports Java, Scala, Python, and SQL. |
| Spark version | Support Spark1 and Spark2. |
| The class of main function | The **full path** of Main Class, the entry point of the Spark program. |
| Main jar package | The Spark jar package (upload by Resource Center). |
| SQL scripts | SQL statements in .sql files that Spark sql runs. |
| Deployment mode | <ul><li>spark submit supports three modes: yarn-clusetr, yarn-client and local.</li><li>spark sql supports yarn-client and local modes.</li></ul> |
| Task name | Spark task name. |
| Driver core number | Set the number of Driver core, which can be set according to the actual production environment. |
| Driver memory size | Set the size of Driver memories, which can be set according to the actual production environment. |
| Number of Executor | Set the number of Executor, which can be set according to the actual production environment. |
| Executor memory size | Set the size of Executor memories, which can be set according to the actual production environment. |
| Main program parameters | Set the input parameters of the Spark program and support the substitution of custom parameter variables. |
| Optional parameters | Support `--jars`, `--files`,` --archives`, `--conf` format. |
| Resource | Appoint resource files in the `Resource` if parameters refer to them. |
| Custom parameter | It is a local user-defined parameter for Spark, and will replace the content with `${variable}` in the script. |
| Predecessor task | Selecting a predecessor task for the current task, will set the selected predecessor task as upstream of the current task. |
## Task Example
@ -71,7 +72,7 @@ Configure the required content according to the parameter descriptions above.
![demo-spark-simple](../../../../img/tasks/demo/spark_task02.png)
### spark sql
### Spark sql
#### Execute DDL and DML statements
@ -79,7 +80,7 @@ This case is to create a view table terms and write three rows of data and a tab
![spark_sql](../../../../img/tasks/demo/spark_sql.png)
## Notice
## Note
JAVA and Scala are only used for identification, and there is no difference when you use the Spark task. If your application is developed by Python, you could just ignore the parameter **Main Class** in the form. Parameter **SQL scripts** is only for SQL type and could be ignored in JAVA, Scala and Python.

27
docs/docs/en/guide/task/sql.md

@ -10,24 +10,21 @@ Refer to [DataSource](../datasource/introduction.md)
## Create Task
- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/sql.png" width="25"/> to the canvas.
## Task Parameter
- Data source: Select the corresponding DataSource.
- SQL type: Supports query and non-query.
- Query: supports `DML select` type commands, which return a result set. You can specify three templates for email notification as form, attachment or form attachment;
- Non-query: support `DDL` all commands and `DML update, delete, insert` three types of commands;
- Segmented execution symbol: When the data source does not support executing multiple SQL statements at a time, the symbol for splitting SQL statements is provided to call the data source execution method multiple times.
Example: 1. When the Hive data source is selected as the data source, this parameter does not need to be filled in. Because the Hive data source itself supports executing multiple SQL statements at one time;
2. When the MySQL data source is selected as the data source, and multi-segment SQL statements are to be executed, this parameter needs to be filled in with a semicolon `;`. Because the MySQL data source does not support executing multiple SQL statements at one time;
- SQL parameter: The input parameter format is `key1=value1;key2=value2...`.
- SQL statement: SQL statement.
- UDF function: For Hive DataSources, you can refer to UDF functions created in the resource center, but other DataSource do not support UDF functions.
- Custom parameters: SQL task type, and stored procedure is a custom parameter order, to set customized parameter type and data type for the method is the same as the stored procedure task type. The difference is that the custom parameter of the SQL task type replaces the `${variable}` in the SQL statement.
- Pre-SQL: Pre-SQL executes before the SQL statement.
- Post-SQL: Post-SQL executes after the SQL statement.
| **Parameter** | **Description** |
| ------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data source | Select the corresponding DataSource. |
| SQL type | Supports query and non-query. <ul><li>Query: supports `DML select` type commands, which return a result set. You can specify three templates for email notification as form, attachment or form attachment;</li><li>Non-query: support `DDL` all commands and `DML update, delete, insert` three types of commands;<ul><li>Segmented execution symbol: When the data source does not support executing multiple SQL statements at a time, the symbol for splitting SQL statements is provided to call the data source execution method multiple times. Example: 1. When the Hive data source is selected as the data source, this parameter does not need to be filled in. Because the Hive data source itself supports executing multiple SQL statements at one time; 2. When the MySQL data source is selected as the data source, and multi-segment SQL statements are to be executed, this parameter needs to be filled in with a semicolon `;. Because the MySQL data source does not support executing multiple SQL statements at one time.</li></ul></li></ul> |
| SQL parameter | The input parameter format is `key1=value1;key2=value2...`. |
| SQL statement | SQL statement. |
| UDF function | For Hive DataSources, you can refer to UDF functions created in the resource center, but other DataSource do not support UDF functions. |
| Custom parameters | SQL task type, and stored procedure is a custom parameter order, to set customized parameter type and data type for the method is the same as the stored procedure task type. The difference is that the custom parameter of the SQL task type replaces the `${variable}` in the SQL statement. |
| Pre-SQL | Pre-SQL executes before the SQL statement. |
| Post-SQL | Post-SQL executes after the SQL statement. |
## Task Example
@ -51,7 +48,7 @@ Table created in the Pre-SQL, after use in the SQL statement, cleaned in the Pos
![pre_post_sql](../../../../img/tasks/demo/pre_post_sql.png)
## Notice
## Note
Pay attention to the selection of SQL type. If it is an insert operation, need to change to "Non-Query" type.

21
docs/docs/en/guide/task/stored-procedure.md

@ -8,6 +8,21 @@
<img src="../../../../img/procedure-en.png" width="80%" />
</p>
- DataSource: The DataSource type of the stored procedure supports MySQL and POSTGRESQL, select the corresponding DataSource.
- Method: The method name of the stored procedure.
- Custom parameters: The custom parameter types of the stored procedure support `IN` and `OUT`, and the data types support: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| DataSource | The DataSource type of the stored procedure supports MySQL and POSTGRESQL, select the corresponding DataSource. |
| Method | The method name of the stored procedure. |
| Custom parameters | The custom parameter types of the stored procedure support `IN` and `OUT`, and the data types support: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN. |
| Predecessor task | Selecting the predecessor task of the current task will set the selected predecessor task as the upstream of the current task. |

2
docs/docs/en/guide/task/sub-process.md

@ -6,7 +6,7 @@ The sub-process node is to execute an external workflow definition as a task nod
## Create Task
- Click `Project Management-> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/tasks/icons/sub_process.png" width="15"/> task node to canvas to create a new SubProcess task.
## Task Parameter

47
docs/docs/en/guide/task/switch.md

@ -1,29 +1,34 @@
# Switch
## Overview
The switch is a conditional judgment node, decide the branch executes according to the value of [global variable](../parameter/global.md) and the expression result written by the user.
**Note** Execute expressions using javax.script.ScriptEngine.eval.
**Note**: Execute expressions using javax.script.ScriptEngine.eval.
## Create Task
Click Project -> Management-Project -> Name-Workflow Definition, and click the Create Workflow button to enter the DAG editing page.
Drag from the toolbar <img src="../../../../img/switch.png" width="20"/> task node to canvas to create a task.
**Note** After created a switch task, you must first configure the upstream and downstream, then configure the parameter of task branches.
## Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Environment name: The environment in Security, if not configured, it will not be used.
- Task group name: The group in Resources, if not configured, it will not be used.
- Times of failed retry attempts: The number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
- Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
- Delay execution time: Task delay execution time.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm email will send and the task execution will fail.
- Condition: You can configure multiple conditions for the switch task. When the conditions are satisfied, execute the configured branch. You can configure multiple different conditions to satisfy different businesses.
- Branch flow: The default branch flow, when all the conditions are not satisfied, execute this branch flow.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag from the toolbar <img src="../../../../img/switch.png" width="20"/> task node to canvas to create a task.
**Note**: After created a switch task, you must first configure the upstream and downstream, then configure the parameter of task branches.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Delay execution time | Task delay execution time. |
| Condition | You can configure multiple conditions for the switch task. When the conditions are satisfied, execute the configured branch. You can configure multiple different conditions to satisfy different businesses. |
| Branch flow | The default branch flow, when all the conditions are not satisfied, execute this branch flow. |
## Task Example

32
docs/docs/en/guide/task/zeppelin.md

@ -7,22 +7,26 @@ it will call `Zeppelin Client API` to trigger zeppelin notebook paragraph. Click
## Create Task
- Click Project Management-Project Name-Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag <img src="../../../../img/tasks/icons/zeppelin.png" width="15"/> from the toolbar to the canvas.
## Task Parameter
- Node name: The node name in a workflow definition is unique.
- Run flag: Identifies whether this node can be scheduled normally, if it does not need to be executed, you can turn on the prohibition switch.
- Descriptive information: Describe the function of the node.
- Task priority: When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
- Worker grouping: Assign tasks to the machines of the worker group to execute. If `Default` is selected, randomly select a worker machine for execution.
- Number of failed retry attempts: The failure task resubmitting times. It supports drop-down and hand-filling.
- Failed retry interval: The time interval for resubmitting the task after a failed task. It supports drop-down and hand-filling.
- Timeout alarm: Check the timeout alarm and timeout failure. When the task exceeds the "timeout period", an alarm email will send and the task execution will fail.
- Zeppelin Note ID: The unique note id for a zeppelin notebook note.
- Zeppelin Paragraph ID: The unique paragraph id for a zeppelin notebook paragraph. If you want to schedule a whole note at a time, leave this field blank.
- Zeppelin Parameters: Parameters in json format used for zeppelin dynamic form.
## Task Parameters
| **Parameter** | **Description** |
| ------- | ---------- |
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If it is not necessary to execute, you can turn on the prohibiting execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, they are executed in order from high to low according to the priority, and they are executed according to the first-in, first-out principle when the priority is the same. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine will be randomly selected for execution. |
| Task group name | The group in Resources, if not configured, it will not be used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. It supports drop-down and manual filling. |
| Failure Retry Interval | The time interval for resubmitting the task if the task fails. It supports drop-down and manual filling. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email will be sent and the task execution will fail. |
| Zeppelin Note ID | The unique note id for a zeppelin notebook note. |
| Zeppelin Paragraph ID | The unique paragraph id for a zeppelin notebook paragraph. If you want to schedule a whole note at a time, leave this field blank. |
| Zeppelin Parameters | Parameters in json format used for zeppelin dynamic form. |
## Task Example

Loading…
Cancel
Save