@ -30,23 +30,10 @@ The follow shows the DolphinScheduler DataSync task plugin features:
## Task Example
First, here are some general DolphinScheduler parameters:
- **Node name**: The name of the task. Node names within the same workflow must be unique.
- **Run flag**: Indicates whether the task is scheduled. If you do not need to execute the task, turn on the `Prohibition execution` switch.
- **Description**: Describes the function of this node.
- **Task priority**: When there are not enough worker threads, the worker executes tasks in priority order. Tasks with the same priority are executed on a first-come, first-served basis.
- **Worker group**: The machines that execute the task. If you choose `default`, the scheduler sends the task to a random worker.
- **Task group name**: Resource group of tasks. It will not take effect if not configured.
- **Environment name**: Environment to execute the task.
- **Number of failed retries**: The number of times a failed task is retried. Select a value from the drop-down menu or enter it manually.
- **Failure retry interval**: The interval between retries of a failed task. Select a value from the drop-down menu or enter it manually.
- **CPU quota**: Assigns a CPU time quota to the task, expressed as a percentage. The default, -1, means unlimited. For example, one fully loaded core is 100%, and 16 fully loaded cores are 1600%. You can configure it via [task.resource.limit.state](../../architecture/configuration.md).
- **Max memory**: Assigns a maximum amount of memory to the task, in MB. The default, -1, means unlimited. A task that exceeds this limit is killed by the OOM killer and is not retried automatically. You can configure it via [task.resource.limit.state](../../architecture/configuration.md).
- **Timeout alarm**: Alarm for task timeout. When the task runs longer than the "timeout threshold", an alarm email is sent.
- **Delayed execution time**: The time, in minutes, by which the execution of the task is delayed.
- **Resources**: Resources which your task node uses.
- **Predecessor task**: The upstream task of the current task node.
[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md#default-task-parameters)`Default Task Parameters` section for default parameters.)
- Please refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md) for default parameters.
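
The CPU quota rule described above (100% per fully loaded core) is simple arithmetic, and the sketch below shows it in shell. The function name `cpu_quota_percent` is a hypothetical helper for illustration only; it is not part of DolphinScheduler and merely computes the value you would enter in the **CPU quota** field.

```shell
# Hypothetical helper: convert a number of CPU cores into the percentage
# value expected by the "CPU quota" field (100% per fully loaded core).
cpu_quota_percent() {
  # $1 = number of cores the task may fully use
  echo $(( $1 * 100 ))
}

cpu_quota_percent 1    # prints 100
cpu_quota_percent 16   # prints 1600
```

A value of -1 is not produced by this helper; it remains the way to express "unlimited".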
Here are some specific parameters for the DataSync plugin:
| Node Name | The name of the task. Node names within a workflow definition must be unique. |
| Run Flag | Indicates whether the node is scheduled normally. Turn on the kill switch if the node does not need to run. |
| Description | Describes the functionality of the node. |
| Task Priority | When the number of worker threads is insufficient, the worker executes tasks according to their priority. When priorities are equal, the worker executes tasks in order. |
| Worker Group | The group of machines that execute the task. If you select `Default`, DolphinScheduler randomly chooses a worker machine to execute the task. |
| Environment Name | Configure the environment in which the task runs. |
| Number Of Failed Retries | The number of times a failed task is resubmitted. You can choose the number in the drop-down menu or fill it in manually. |
| Failed Retry Interval | The interval between the failure and the resubmission of a task. You can choose the interval in the drop-down menu or fill it in manually. |
| Delayed Execution Time | The amount of time by which a task is delayed, in minutes. |
| Timeout Alarm | Enables the timeout warning and timeout failure checks. When the task runs longer than the "Timeout length", a warning message is sent and the task execution fails. |
| Module Path | Uses the modularity feature of Java 9+ and puts all resources on the `--module-path`. Requires that the JDK version on your worker supports modularity. |
| Main Parameter | Java program main method entry parameter. |
| Java VM Parameters | JVM startup parameters. |
| Script | The Java code to run when you use the Java run type. The code must contain a public class and must not include a package statement. |
| Resources | External JAR packages or other resource files that are added to the classpath or module path and can be easily retrieved in your Java script. |
| Custom parameter | A user-defined parameter whose value replaces `${variable}` in the script. |
| Pre Tasks | Selects a pre-task for the current task and sets the pre-task as the upstream of the current task. |
[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md#default-task-parameters)`Default Task Parameters` section for default parameters.)
- Please refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md) for default parameters.
@ -100,6 +103,7 @@ You can now use this feature to run all MLFlow projects on Github (For example [
## Environment to Prepare
### Conda Environment
Please install [anaconda](https://docs.continuum.io/anaconda/install/) or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing) in advance.
**Method A:**
@ -113,7 +117,6 @@ Add the following content to the file:
export PATH=/opt/anaconda3/bin:$PATH
```
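
If the file is sourced more than once, the `export` above adds the same directory to `PATH` again each time. A minimal sketch of a guard is shown below; the function name `prepend_path` is hypothetical, and `/opt/anaconda3/bin` is simply the path used above, so adjust it to your installation.

```shell
# Hypothetical helper: put a directory at the front of PATH
# unless it is already listed somewhere in PATH.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH untouched
    *) PATH="$1:$PATH" ;;     # not present, prepend it
  esac
  export PATH
}

prepend_path /opt/anaconda3/bin

# Print the resolved conda executable if one is found, otherwise a hint.
command -v conda || echo "conda not found on PATH"
```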
**Method B:**
You need to log in with the admin account to configure the conda environment variable.
@ -12,16 +12,19 @@ Click [here](https://seatunnel.apache.org/) for more information about `Apache S
## Task Parameter
- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md#default-task-parameters) for default parameters.
[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md#default-task-parameters)`Default Task Parameters` section for default parameters.)
- Please refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md) for default parameters.
- Engine: Supports FLINK and SPARK
  - FLINK
    - Run model: supports `run` and `run-application` modes
    - Option parameters: additional parameters for the Flink engine, such as `-m yarn-cluster -ynm seatunnel`
  - SPARK
    - Deployment mode: the deployment mode, one of `cluster`, `client`, or `local`
    - Master: the `Master` mode, one of `yarn`, `local`, `spark`, or `mesos`. For `spark` and `mesos`, you must also specify the `Master` service address, for example: 127.0.0.1:7077

    > Click [here](https://seatunnel.apache.org/docs/2.1.2/command/usage) for more information on the usage of the `Apache SeaTunnel command`.
- Custom Configuration: Supports entering a custom configuration or selecting a configuration file from the Resource Center
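
To illustrate the `Master` rule for the SPARK engine described above, here is a minimal shell sketch. It is not the plugin's actual implementation; the function name `spark_master_url` is hypothetical, and the sketch assumes the address is combined with the mode in Spark's usual `spark://host:port` and `mesos://host:port` master URL forms.

```shell
# Hypothetical helper: build a Spark master value from the "Master" choice
# and an optional service address (required for spark and mesos).
spark_master_url() {
  master="$1"
  address="${2:-}"
  case "$master" in
    yarn|local)
      # These modes need no service address.
      echo "$master" ;;
    spark|mesos)
      if [ -z "$address" ]; then
        echo "master '$master' requires a service address" >&2
        return 1
      fi
      echo "${master}://${address}" ;;
    *)
      echo "unsupported master: $master" >&2
      return 1 ;;
  esac
}

spark_master_url yarn                   # prints yarn
spark_master_url spark 127.0.0.1:7077   # prints spark://127.0.0.1:7077
```

Calling the helper with `spark` or `mesos` and no address returns a non-zero status, which mirrors the requirement that these two modes need a service address.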