- Task State Statistics: It refers to the statistics of the number of tasks to be run, failed, running, completed and succeeded in a given time frame.
- Process State Statistics: It refers to the statistics of the number of waiting, failing, running, completing and succeeding process instances in a specified time range.
- Process Definition Statistics: The process definition created by the user and the process definition granted by the administrator to the user are counted.
- Go to the project home page, click "Process definitions" and enter the list page of process definition.
- Click "Create process" to create a new process definition.
- Drag the "SHELL" node to the canvas and add a shell task.
- Fill in the Node Name, Description, and Script fields.
- Selecting "task priority" will give priority to high-level tasks in the execution queue. Tasks with the same priority will be executed in the first-in-first-out order.
- Timeout alarm. Fill in "Overtime Time". When the task execution time exceeds the overtime, it can alarm and fail over time.
- Increase the order of execution between nodes: click "line connection". As shown, task 1 and task 3 are executed in parallel. When task 1 is executed, task 2 and task 3 are executed simultaneously.
- Delete dependencies: Click on the arrow icon to "drag nodes and select items", select the connection line, click on the delete icon to delete dependencies between nodes.
> Before setting workflow offline, the timed tasks in timed management should be offline, so that the definition of workflow can be set offline successfully.
* Failure strategy:**When a task node fails to execute, other parallel task nodes need to execute the strategy**。”Continue "Representation: Other task nodes perform normally", "End" Representation: Terminate all ongoing tasks and terminate the entire process.
* Process priority: The priority of process running is divided into five levels:the highest, the high, the medium, the low, and the lowest . High-level processes are executed first in the execution queue, and processes with the same priority are executed first in first out order.
* Worker group: This process can only be executed in a specified machine group. Default, by default, can be executed on any worker.
* Notification group: When the process ends or fault tolerance occurs, process information is sent to all members of the notification group by mail.
* Recipient: Enter the mailbox and press Enter key to save. When the process ends and fault tolerance occurs, an alert message is sent to the recipient list.
* Cc: Enter the mailbox and press Enter key to save. When the process is over and fault-tolerant occurs, alarm messages are copied to the copier list.
* Complement: To implement the workflow definition of a specified date, you can select the time range of the complement (currently only support for continuous days), such as the data from May 1 to May 10, as shown in the figure:
> Complement execution mode includes serial execution and parallel execution. In serial mode, the complement will be executed sequentially from May 1 to May 10. In parallel mode, the tasks from May 1 to May 10 will be executed simultaneously.
* Editor: You can edit the terminated process. When you save it after editing, you can choose whether to update the process definition or not.
* Rerun: A process that has been terminated can be re-executed.
* Recovery failure: For a failed process, a recovery failure operation can be performed, starting at the failed node.
* Stop: Stop the running process, the background will `kill` he worker process first, then `kill -9` operation.
* Pause:The running process can be **suspended**, the system state becomes **waiting to be executed**, waiting for the end of the task being executed, and suspending the next task to be executed.
* Restore pause: **The suspended process** can be restored and run directly from the suspended node
* Delete: Delete process instances and task instances under process instances
* Gantt diagram: The vertical axis of Gantt diagram is the topological ordering of task instances under a process instance, and the horizontal axis is the running time of task instances, as shown in the figure:
> It is the management of various resource files, including creating basic txt/log/sh/conf files, uploading jar packages and other types of files, editing, downloading, deleting and other operations.
> Upload Files: Click the Upload button to upload, drag the file to the upload area, and the file name will automatically complete the uploaded file name.
> You can download a file by clicking the download button in the top right corner of the file details, or by downloading the file under the download button after the file list.
> File List - > Click the Delete button to delete the specified file
#### Resource management
> Resource management and file management functions are similar. The difference is that resource management is the UDF function of uploading, and file management uploads user programs, scripts and configuration files.
* Upload UDF resources
> The same as uploading files.
#### Function management
* Create UDF Functions
> Click "Create UDF Function", enter parameters of udf function, select UDF resources, and click "Submit" to create udf function.
>
>
>
> Currently only temporary udf functions for HIVE are supported
>
>
>
> - UDF function name: name when entering UDF Function
> - Package Name: Full Path of Input UDF Function
> - Parameter: Input parameters used to annotate functions
> - Database Name: Reserved Field for Creating Permanent UDF Functions
> - UDF Resources: Set up the resource files corresponding to the created UDF
- The security has the functions of queue management, tenant management, user management, warning group management, worker group manager, token manage and other functions. It can also authorize resources, data sources, projects, etc.
- The tenant corresponds to the account of Linux, which is used by the worker server to submit jobs. If Linux does not have this user, the worker would create the account when executing the task.
- Tenant Code:**the tenant code is the only account on Linux that can't be duplicated.**
* The alarm group is a parameter set at start-up. After the process is finished, the status of the process and other information will be sent to the alarm group by mail.
- Worker group provides a mechanism for tasks to run on a specified worker. Administrators create worker groups, which can be specified in task nodes and operation parameters. If the specified grouping is deleted or no grouping is specified, the task will run on any worker.
- Multiple IP addresses within a worker group (**aliases can not be written**), separated by **commas in English**
- Granting permissions includes project permissions, resource permissions, datasource permissions, UDF Function permissions.
> Administrators can authorize projects, resources, data sources and UDF Functions that are not created by ordinary users. Because project, resource, data source and UDF Function are all authorized in the same way, the project authorization is introduced as an example.
> Note:For projects created by the user himself, the user has all the permissions. The list of items and the list of selected items will not be reflected
- 1.Click on the authorization button of the designated person as follows:
- The shell node, when the worker executes, generates a temporary shell script, which is executed by a Linux user with the same name as the tenant.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SHELL.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Node name: The node name in a process definition is unique
- Run flag: Identify whether the node can be scheduled properly, and if it does not need to be executed, you can turn on the forbidden execution switch.
- Description : Describes the function of the node
- Number of failed retries: Number of failed task submissions, support drop-down and manual filling
- Failure Retry Interval: Interval between tasks that fail to resubmit tasks, support drop-down and manual filling
- Script: User-developed SHELL program
- Resources: A list of resource files that need to be invoked in a script
- Custom parameters: User-defined parameters that are part of SHELL replace the contents of scripts with ${variables}
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SUB_PROCESS.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Node name: The node name in a process definition is unique
- Run flag: Identify whether the node is scheduled properly
- Description: Describes the function of the node
- Sub-node: The process definition of the selected sub-process is selected, and the process definition of the selected sub-process can be jumped to by entering the sub-node in the upper right corner.
### DEPENDENT
- Dependent nodes are **dependent checking nodes**. For example, process A depends on the successful execution of process B yesterday, and the dependent node checks whether process B has a successful execution instance yesterday.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_DEPENDENT.png) ask node in the toolbar onto the palette and double-click the task node as follows:
> Dependent nodes provide logical judgment functions, such as checking whether yesterday's B process was successful or whether the C process was successfully executed.
> For example, process A is a weekly task and process B and C are daily tasks. Task A requires that task B and C be successfully executed every day of the week, as shown in the figure:
- The procedure is executed according to the selected data source.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PROCEDURE.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Datasource: The data source type of stored procedure supports MySQL and POSTGRRESQL, and chooses the corresponding data source.
- Method: The method name of the stored procedure
- Custom parameters: Custom parameter types of stored procedures support IN and OUT, and data types support nine data types: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN.
- Executing the query SQL function, you can choose to send mail in the form of tables and attachments to the designated recipients.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SQL.png) task node in the toolbar onto the palette and double-click the task node as follows:
- sql type: support query and non-query, query is select type query, there is a result set returned, you can specify mail notification as table, attachment or table attachment three templates. Non-query is not returned by result set, and is for update, delete, insert three types of operations
- sql parameter: input parameter format is key1 = value1; key2 = value2...
- sql statement: SQL statement
- UDF function: For HIVE type data sources, you can refer to UDF functions created in the resource center, other types of data sources do not support UDF functions for the time being.
- Custom parameters: SQL task type, and stored procedure is to customize the order of parameters to set values for methods. Custom parameter type and data type are the same as stored procedure task type. The difference is that the custom parameter of the SQL task type replaces the ${variable} in the SQL statement.
### SPARK
- Through SPARK node, SPARK program can be directly executed. For spark node, worker will use `spark-submit` mode to submit tasks.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SPARK.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Class of the main function: The full path of Main Class, the entry to the Spark program
- Master jar package: It's Spark's jar package
- Deployment: support three modes: yarn-cluster, yarn-client, and local
- Driver Kernel Number: Driver Kernel Number and Memory Number can be set
- Executor Number: Executor Number, Executor Memory Number and Executor Kernel Number can be set
- Command Line Parameters: Setting the input parameters of Spark program to support the replacement of custom parameter variables.
- Other parameters: support - jars, - files, - archives, - conf format
- Resource: If a resource file is referenced in other parameters, you need to select the specified resource.
- Custom parameters: User-defined parameters in MR locality that replace the contents in scripts with ${variables}
Note: JAVA and Scala are just used for identification, no difference. If it's a Spark developed by Python, there's no class of the main function, and everything else is the same.
### MapReduce(MR)
- Using MR nodes, MR programs can be executed directly. For Mr nodes, worker submits tasks using `hadoop jar`
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_MR.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Among them, mapper. py 1 after - mapper is two parameters, the first parameter is mapper. py, and the second parameter is 1.
- Resource: If a resource file is referenced in other parameters, you need to select the specified resource.
- Custom parameters: User-defined parameters in MR locality that replace the contents in scripts with ${variables}
### Python
- With Python nodes, Python scripts can be executed directly. For Python nodes, worker will use `python ** `to submit tasks.
> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PYTHON.png) task node in the toolbar onto the palette and double-click the task node as follows:
- Resource: A list of resource files that need to be invoked in a script
- Custom parameters: User-defined parameters that are part of Python that replace the contents in the script with ${variables}
### System parameter
<table>
<tr><th>variable</th><th>meaning</th></tr>
<tr>
<td>${system.biz.date}</td>
<td>The timing time of routine dispatching instance is one day before, in yyyyyMMdd format. When data is supplemented, the date + 1</td>
</tr>
<tr>
<td>${system.biz.curdate}</td>
<td> Daily scheduling example timing time, format is yyyyyMMdd, when supplementing data, the date + 1</td>
</tr>
<tr>
<td>${system.datetime}</td>
<td>Daily scheduling example timing time, format is yyyyyMMddHmmss, when supplementing data, the date + 1</td>
</tr>
</table>
### Time Customization Parameters
> Support code to customize the variable name, declaration: ${variable name}. It can refer to "system parameters" or specify "constants".
> When we define this benchmark variable as $[...], [yyyyMMddHHmmss] can be decomposed and combined arbitrarily, such as:$[yyyyMMdd], $[HHmmss], $[yyyy-MM-dd] ,etc.
> Can also do this:
>
>
- Later N years: $[add_months (yyyyyyMMdd, 12*N)]
- The previous N years: $[add_months (yyyyyyMMdd, -12*N)]
- Later N months: $[add_months (yyyyyMMdd, N)]
- The first N months: $[add_months (yyyyyyMMdd, -N)]
- Later N weeks: $[yyyyyyMMdd + 7*N]
- The first N weeks: $[yyyyyMMdd-7*N]
- The day after that: $[yyyyyyMMdd + N]
- The day before yesterday: $[yyyyyMMdd-N]
- Later N hours: $[HHmmss + N/24]
- First N hours: $[HHmmss-N/24]
- After N minutes: $[HHmmss + N/24/60]
- First N minutes: $[HHmmss-N/24/60]
### User-defined parameters
> User-defined parameters are divided into global parameters and local parameters. Global parameters are the global parameters passed when the process definition and process instance are saved. Global parameters can be referenced by local parameters of any task node in the whole process.
> In tasks, local_param_bizdate refers to global parameters by ${global_bizdate} for scripts, the value of variable local_param_bizdate can be referenced by${local_param_bizdate}, or the value of local_param_bizdate can be set directly by JDBC.