
[doc] Modified dq, monitor, security, resources (#10715)

3.1.0-release
sneh-wha 2 years ago committed by GitHub
parent commit 91b59dee54
1. docs/configs/docsdev.js (8 changes)
2. docs/docs/en/guide/data-quality.md (497 changes)
3. docs/docs/en/guide/monitor.md (12 changes)
4. docs/docs/en/guide/resource/configuration.md (58 changes)
5. docs/docs/en/guide/resource/intro.md (5 changes)
6. docs/docs/en/guide/resource/task-group.md (14 changes)
7. docs/docs/en/guide/resource/udf-manage.md (16 changes)
8. docs/docs/en/guide/security.md (18 changes)
9. docs/docs/zh/guide/resource/intro.md (4 changes)

docs/configs/docsdev.js (8 changes)

```diff
@@ -265,6 +265,10 @@ export default {
         {
             title: 'Resource',
             children: [
+                {
+                    title: 'Introduction',
+                    link: '/en-us/docs/dev/user_doc/guide/resource/intro.html'
+                },
                 {
                     title: 'Configuration',
                     link: '/en-us/docs/dev/user_doc/guide/resource/configuration.html'
@@ -657,6 +661,10 @@ export default {
         {
             title: '资源中心',
             children: [
+                {
+                    title: '简介',
+                    link: '/zh-cn/docs/dev/user_doc/guide/resource/intro.html'
+                },
                 {
                     title: '配置详情',
                     link: '/zh-cn/docs/dev/user_doc/guide/resource/configuration.html'
```

docs/docs/en/guide/data-quality.md (497 changes)
@@ -1,56 +1,45 @@

# Data Quality

## Introduction

The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0; other versions have not been verified, and users can verify them by themselves.

The execution flow of the data quality task is as follows:

- The user defines the task in the interface, and the user input value is stored in `TaskParam`.
- When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send them to `Worker`.
- `Worker` runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine.
- The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`.
- `Worker` sends the task result to `Master`. After `Master` receives `TaskResponse`, it judges whether the task type is `DataQualityTask`; if so, it reads the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then judges the result according to the check mode, operator and threshold configured by the user.
- If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
- Add config: `<server-name>/conf/common.properties`

```properties
# Change to specific version if you not use dev branch
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
```

- Please fill in `data-quality.jar.name` according to the actual package name.
- If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name`.
- If you upgrade from an old version, you need to execute the `sql` update script to initialize the database before running.
- If you want to use `MySQL` data, you need to comment out the `scope` of `MySQL` in `pom.xml`.
- Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested; other data sources have not been tested yet.
- `Spark` needs to be configured to read `Hive` metadata; `Spark` does not use `jdbc` to read `Hive`.
## Detailed Inspection Logic

| **Parameter** | **Description** |
| ----- | ---- |
| CheckMethod | [CheckFormula][Operator][Threshold]; if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
| CheckFormula | <ul><li>Expected-Actual</li><li>Actual-Expected</li><li>(Actual/Expected)x100%</li><li>(Expected-Actual)/Expected x100%</li></ul> |
| Operator | =, >, >=, <, <=, != |
| ExpectedValue | <ul><li>FixValue</li><li>DailyAvg</li><li>WeeklyAvg</li><li>MonthlyAvg</li><li>Last7DayAvg</li><li>Last30DayAvg</li><li>SrcTableTotalRows</li><li>TargetTableTotalRows</li></ul> |
| Example | <ul><li>CheckFormula: Expected-Actual</li><li>Operator: ></li><li>Threshold: 0</li><li>ExpectedValue: FixValue=9</li></ul> |

In the example, assuming that the actual value is 10, the operator is >, and the expected value is 9, the result 10 - 9 > 0 is true, which means that the number of rows with empty values in the column has exceeded the threshold, and the task is judged to have failed.
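A minimal sketch of how this check resolves, with the example's numbers hard-coded as SQL literals purely for illustration:

```sql
-- Evaluate the example check: actual value 10, expected value 9,
-- operator '>', threshold 0. A true result means the data misses
-- expectations and the failure strategy is executed.
SELECT CASE
         WHEN (10 - 9) > 0 THEN 'check failed'
         ELSE 'check passed'
       END AS check_result;
```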
# Task Operation Guide

## Null Value Check for Single Table Check

### Inspection Introduction

The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or with a specified threshold. If it is greater than a certain threshold, the check is judged as a failure.

- Calculate the SQL statement that the specified column is empty as follows:

@@ -64,247 +53,253 @@ The goal of the null value check is to check the number of empty rows in the spe

```sql
SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
```

### Interface Operation Guide

![dataquality_null_check](../../../img/tasks/demo/null_check.png)
| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
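A sketch of the kind of null-counting statement this check performs, shown with a hypothetical table `orders` and check column `email`; the resulting count is what gets compared against the threshold or the total row count:

```sql
-- Count rows whose check column is null or empty; 'orders', 'email'
-- and the filter are placeholder names for illustration only.
SELECT COUNT(*) AS nulls
FROM orders
WHERE (email IS NULL OR email = '')
  AND (create_date >= '2022-01-01');
```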
## Timeliness Check of Single Table Check

### Inspection Introduction

The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task is judged as failed.

### Interface Operation Guide

![dataquality_timeliness_check](../../../img/tasks/demo/timeliness_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Start time | The start time of the time range. |
| End time | The end time of the time range. |
| Time Format | Set the corresponding time format. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
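A sketch of the windowed count behind a timeliness check, assuming a hypothetical table `orders` and time column `update_time`; the count is what gets compared against the threshold:

```sql
-- Count rows processed inside the configured start/end window;
-- table, column and times are placeholders for illustration only.
SELECT COUNT(*) AS total
FROM orders
WHERE update_time >= '2022-07-01 00:00:00'
  AND update_time <= '2022-07-01 23:59:59';
```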
## Field Length Check for Single Table Check

### Inspection Introduction

The goal of the field length check is to verify whether the length of the selected field meets expectations. If there is data that does not meet the requirement and the number of such rows exceeds the threshold, the task is judged to have failed.

### Interface Operation Guide

![dataquality_length_check](../../../img/tasks/demo/field_length_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Logical operators | =, >, >=, <, <=, != |
| Field length limit | As the title suggests. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
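A sketch of the length test this check applies, assuming a hypothetical table `users`, check column `phone`, logical operator `<=` and field length limit 11; rows violating the limit are counted against the threshold:

```sql
-- Count rows whose column length violates the configured limit;
-- names and the limit are placeholders for illustration only.
SELECT COUNT(*) AS invalids
FROM users
WHERE NOT (CHAR_LENGTH(phone) <= 11);
```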
## Uniqueness Check for Single Table Check

### Inspection Introduction

The goal of the uniqueness check is to check whether the field is duplicated. It is generally used to check whether the primary key is duplicated. If there is duplication and the threshold is reached, the check task is judged to have failed.

### Interface Operation Guide

![dataquality_uniqueness_check](../../../img/tasks/demo/uniqueness_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
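A sketch of the duplicate detection behind a uniqueness check, assuming a hypothetical table `users` and check column `user_id`; the number of duplicated values is compared against the threshold:

```sql
-- Count check-column values that appear more than once;
-- names are placeholders for illustration only.
SELECT COUNT(*) AS duplicates
FROM (
  SELECT user_id
  FROM users
  GROUP BY user_id
  HAVING COUNT(*) > 1
) dup;
```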
## Regular Expression Check for Single Table Check

### Inspection Introduction

The goal of the regular expression check is to verify whether the format of a field's value meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and the number of such rows exceeds the threshold, the task is judged as failed.

### Interface Operation Guide

![dataquality_regex_check](../../../img/tasks/demo/regexp_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Regular expression | As the title suggests. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
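A sketch of the pattern test behind a regular expression check, using MySQL's REGEXP operator and a hypothetical table `users` with a date-formatted column; non-matching rows count against the threshold:

```sql
-- Count rows whose value does not match the expected format
-- (here a yyyy-MM-dd date); names and pattern are placeholders.
SELECT COUNT(*) AS invalids
FROM users
WHERE birthday NOT REGEXP '^[0-9]{4}-[0-9]{2}-[0-9]{2}$';
```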
## Enumeration Value Validation for Single Table Check

### Inspection Introduction

The goal of the enumeration value check is to verify whether the value of a field is within the range of enumeration values. If there is data that is not in the range of enumeration values and the number of such rows exceeds the threshold, the task is judged to have failed.

### Interface Operation Guide

![dataquality_enum_check](../../../img/tasks/demo/enumeration_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src table filter conditions | As the title suggests; also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| List of enumeration values | Separated by commas. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
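A sketch of the membership test behind an enumeration value check, assuming a hypothetical table `orders`, check column `status`, and enumeration list; out-of-range rows count against the threshold:

```sql
-- Count rows whose value falls outside the configured enumeration list;
-- names and values are placeholders for illustration only.
SELECT COUNT(*) AS invalids
FROM orders
WHERE status NOT IN ('CREATED', 'PAID', 'SHIPPED', 'CLOSED');
```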
## Table Row Number Verification for Single Table Check

### Inspection Introduction

The goal of the table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task is judged as failed.

### Interface Operation Guide

![dataquality_count_check](../../../img/tasks/demo/table_count_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the validation data is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Src table check column | Drop-down to select the check column name. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
## Custom SQL Check for Single Table Check

### Interface Operation Guide

![dataquality_custom_sql_check](../../../img/tasks/demo/custom_sql_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Actual value name | Alias in the SQL that computes the statistical value, such as max_num. |
| Actual value calculation SQL | SQL for outputting the actual value. Note:<ul><li>The SQL must be statistical SQL, such as counting the number of rows or calculating the maximum or minimum value.</li><li>`select max(a) as max_num from ${src_table}`; the table name must be filled in like this.</li></ul> |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Check method | <ul><li>[Expected-Actual]</li><li>[Actual-Expected]</li><li>[Actual/Expected]x100%</li><li>[(Expected-Actual)/Expected]x100%</li></ul> |
| Check operators | =, >, >=, <, <=, != |
| Threshold | The value used in the formula for comparison. |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type from the drop-down menu. |
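A complete example of an "actual value calculation SQL" following the note above; the alias must match the configured actual value name, and `${src_table}` is the placeholder form the task substitutes at run time:

```sql
-- A single statistical value aliased to the actual value name (max_num).
SELECT MAX(a) AS max_num FROM ${src_table};
```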
## Accuracy Check of Multi-table

### Inspection Introduction

Accuracy checks are performed by comparing the differences in data records for selected fields between two tables. Examples are as follows:

- table test1

| c1 | c2 |
| :---: | :---: |
| a | 1 |
| b | 2 |

- table test2

| c21 | c22 |
| :---: | :---: |
| a | 1 |
| b | 3 |

If you compare the data in c1 and c21, tables test1 and test2 are exactly the same. If you compare c2 and c22, the data in table test1 and table test2 are inconsistent.

### Interface Operation Guide

![dataquality_multi_table_accuracy_check](../../../img/tasks/demo/multi_table_accuracy_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | Drop-down to select the table where the data to be verified is located. |
| Src filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the target data type. |
| Target data table | Drop-down to select the table where the data to be verified is located. |
| Target filter conditions | As the title suggests; it is also used when counting the total number of rows in the table. Optional. |
| Check column | Fill in the source data column, operator and target data column respectively. |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, != |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
| Expected value type | Select the desired type in the drop-down menu; only `SrcTableTotalRows`, `TargetTableTotalRows` and fixed value are suitable for selection here. |
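A sketch of the row-matching comparison behind an accuracy check, using the example tables above (test1/test2, column pairs c1/c21 and c2/c22); this is an illustration of the idea, not the exact SQL the task generates:

```sql
-- Count rows of test1 that have no exact counterpart in test2 on the
-- selected column pairs; a non-zero count signals inconsistent data.
SELECT COUNT(*) AS mismatches
FROM test1 t1
LEFT JOIN test2 t2
  ON t1.c1 = t2.c21 AND t1.c2 = t2.c22
WHERE t2.c21 IS NULL;
```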
## Comparison of the Values Checked by the Two Tables

### Inspection Introduction

Two-table value comparison allows users to customize different SQL statistics for two tables and compare the corresponding values. For example, for source table A the total amount of a certain column is calculated as sum1, and for the target table the total amount of a certain column is calculated as sum2; sum1 and sum2 are then compared to determine the check result.

### Interface Operation Guide

![dataquality_multi_table_comparison_check](../../../img/tasks/demo/multi_table_comparison_check.png)

| **Parameter** | **Description** |
| ----- | ---- |
| Source data type | Select MySQL, PostgreSQL, etc. |
| Source data source | The corresponding data source under the source data type. |
| Source data table | The table where the data to be verified is located. |
| Actual value name | Alias in the SQL that computes the actual value, such as max_age1. |
| Actual value calculation SQL | SQL for outputting the actual value. Note: <ul><li>The SQL must be statistical SQL, such as counting the number of rows or calculating the maximum or minimum value.</li><li>`select max(age) as max_age1 from ${src_table}`; the table name must be filled in like this.</li></ul> |
| Target data type | Choose MySQL, PostgreSQL, etc. |
| Target data source | The corresponding data source under the target data type. |
| Target data table | The table where the data to be verified is located. |
| Expected value name | Alias in the SQL that computes the expected value, such as max_age2. |
| Expected value calculation SQL | SQL for outputting the expected value. Note: <ul><li>The SQL must be statistical SQL, such as counting the number of rows or calculating the maximum or minimum value.</li><li>`select max(age) as max_age2 from ${target_table}`; the table name must be filled in like this.</li></ul> |
| Verification method | Select the desired verification method. |
| Operators | =, >, >=, <, <=, != |
| Failure strategy | <ul><li>Alert: The data quality task fails, the DolphinScheduler task result is successful, and an alert is sent.</li><li>Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.</li></ul> |
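A sketch of the two statistics a two-table value comparison runs, using the aliases from the table above; `${src_table}` and `${target_table}` are substituted at run time, and the two results are compared with the chosen operator:

```sql
-- Actual value: aliased to the configured actual value name.
SELECT MAX(age) AS max_age1 FROM ${src_table};

-- Expected value: aliased to the configured expected value name.
SELECT MAX(age) AS max_age2 FROM ${target_table};
```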
## Task Result View

![dataquality_result](../../../img/tasks/demo/result.png)

## Rule View

### List of Rules

![dataquality_rule_list](../../../img/tasks/demo/rule_list.png)

### Rule Details

![dataquality_rule_detail](../../../img/tasks/demo/rule_detail.png)

docs/docs/en/guide/monitor.md (12 changes)
@@ -28,14 +28,16 @@

![statistics](../../../img/new_ui/dev/monitor/statistics.png)

| **Parameter** | **Description** |
| ----- | ----- |
| Number of commands waiting to be executed | Statistics of the `t_ds_command` table data. |
| Number of failed commands | Statistics of the `t_ds_error_command` table data. |
| Number of tasks waiting to run | Count of the `task_queue` data in ZooKeeper. |
| Number of tasks waiting to be killed | Count of the `task_kill` data in ZooKeeper. |

### Audit Log

The audit log records who accesses the system, the operations made to the system, and the related time, which strengthens the security and maintenance of the system.

![audit-log](../../../img/new_ui/dev/monitor/audit-log.jpg)

docs/docs/en/guide/resource/configuration.md (58 changes)
@@ -1,6 +1,6 @@

# HDFS Resource Configuration

When it is necessary to use the Resource Center to create or upload relevant files, all files and resources will be stored on HDFS. Therefore the following configuration is required.

## Local File Resource Configuration

@@ -13,13 +13,9 @@ Configure the file in the following paths: `api-server/conf/common.properties` a

- Change `data.basedir.path` to the local directory path. Please make sure the user who deploys dolphinscheduler has read and write permissions, such as: `data.basedir.path=/tmp/dolphinscheduler`. The directory you configure will be auto-created if it does not exist.
- Modify the following two parameters: `resource.storage.type=HDFS` and `resource.hdfs.fs.defaultFS=file:///`.

## Configuring the common.properties

After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, you will need to configure the following paths: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. This can be found as follows.

```properties
#
@@ -42,12 +38,13 @@ After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from
# user data local directory path, please make sure the directory exists and have read write permissions
data.basedir.path=/tmp/dolphinscheduler

# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# resource storage type: HDFS, S3, NONE
resource.storage.type=NONE

# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=/dolphinscheduler

# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin

@@ -61,10 +58,9 @@ resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://localhost:9000

# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs

# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://mycluster:8020

# whether to startup kerberos
hadoop.security.authentication.startup.state=false

@@ -80,18 +76,16 @@ login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2

# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088

# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx

# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s

# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://ds1:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false

@@ -109,8 +103,7 @@ data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
# Whether hive SQL is executed in the same session
support.hive.oneSession=false

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true

# network interface preferred like eth0, default: empty
@@ -120,17 +113,26 @@ sudo.enable=true
#dolphin.scheduler.network.priority.strategy=default

# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh

# development state
development.state=false

# rpc port
alert.rpc.port=50052

# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080

# set path of conda.sh
conda.path=/opt/anaconda3/etc/profile.d/conda.sh

# Task resource limit state
task.resource.limit.state=false
```

> **Note:**
>
> * If only the `api-server/conf/common.properties` file is configured, resource uploading is enabled, but you cannot use the resources in tasks. If you want to use or execute the files in the workflow, you need to configure `worker-server/conf/common.properties` too.
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant operation authority.
> * If you are using a Hadoop cluster with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `worker-server/conf` and `api-server/conf`; otherwise skip this copy step.

docs/docs/en/guide/resource/intro.md (5 changes)
@@ -0,0 +1,5 @@

# Resource Center Introduction

The Resource Center is typically used for uploading files, UDF functions, and task group management. For a stand-alone environment, you can select a local file directory as the upload folder (**this operation does not require Hadoop or HDFS deployment**). Of course, you can also choose to upload to a Hadoop or MinIO cluster; in this case, you need to have Hadoop (2.6+) or MinIO and other related environments.

docs/docs/en/guide/resource/task-group.md (14 changes)
@@ -2,9 +2,9 @@

The task group is mainly used to control the concurrency of task instances, and is designed to control the pressure of other resources (it can also control the pressure of the Hadoop cluster, though the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and configure the priority of the task running in the task group.

## Task Group Configuration

### Create Task Group

![create-taskGroup](../../../../img/new_ui/dev/resource/create-taskGroup.png)

@@ -20,7 +20,7 @@ You need to enter the information inside the picture:

- Resource pool size: The maximum number of concurrent task instances allowed.

### View Task Group Queue

![view-queue](../../../../img/new_ui/dev/resource/view-queue.png)

@@ -28,7 +28,7 @@ Click the button to view task group usage information:

![view-queue](../../../../img/new_ui/dev/resource/view-groupQueue.png)

### Use of Task Groups

**Note**: The usage of task groups is applicable to tasks executed by workers; [switch] nodes, [condition] nodes, [sub_process] and other node types executed by the master are not controlled by the task group. Let's take the shell node as an example:

@@ -40,13 +40,13 @@ Regarding the configuration of the task group, all you need to do is to configur

- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value, the higher the priority.

## Implementation Logic of Task Group

### Get Task Group Resources

When distributing a task, the master judges whether the task is configured with a task group. If it is not, the task is sent to the worker to run normally; if a task group is configured, the master checks whether the remaining size of the task group resource pool is sufficient for the current task before sending it to the worker for execution. If decrementing the resource pool by one succeeds, the task continues to run; if not, the task exits distribution and waits for other tasks to wake it up.

### Release and Wake Up

When a task that occupies a task group resource finishes running, the task group resource is released. After the release, the system checks whether there is a task waiting in the current task group. If there is, it marks the task with the highest priority to run and creates a new executable event. The event stores the task ID that was marked to acquire the resource; the task then obtains the task group resource and runs.

docs/docs/en/guide/resource/udf-manage.md (16 changes)
@ -2,20 +2,18 @@
The resource management and file management functions are similar. The difference is that the resource management is the UDF upload function, and the file management uploads the user programs, scripts and configuration files. Operation function: rename, download, delete. The resource management and file management functions are similar. The difference is that the resource management is the UDF upload function, and the file management uploads the user programs, scripts and configuration files. Operation function: rename, download, delete.
- Upload UDF resources: Same as uploading files.

## Function Management

- Create UDF function

> Click `Create UDF Function`, enter the UDF function parameters, select the UDF resource, and click `Submit` to create the UDF function.
> Currently, only temporary UDF functions of `HIVE` are supported.

- UDF function name: Enter the name of the UDF function.
- Package name Class name: Enter the full path of the UDF function.
- UDF resource: Set the resource file corresponding to the created UDF function.
![create-udf](../../../../img/new_ui/dev/resource/create-udf.png)
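To illustrate the fields above, here is a minimal sketch of a Hive temporary UDF; the package and class names (`com.example.udf.ToLowerUdf`) are hypothetical. After packaging this class into a jar and uploading the jar as a UDF resource, the `Package name Class name` field would be the full path `com.example.udf.ToLowerUdf`.

```java
package com.example.udf; // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A minimal Hive UDF that lower-cases its string input.
public class ToLowerUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULL
        }
        return new Text(input.toString().toLowerCase());
    }
}
```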

18
docs/docs/en/guide/security.md

@ -1,7 +1,7 @@
# Security (Authorization System)
- Only the administrator account in the security center has the authority to operate. It provides functions such as queue management, tenant management, user management, alarm group management, worker group management, and token management. In the user management module, the administrator can grant users access to resources, data sources, projects, etc.
- Administrator login: the default username and password are `admin/dolphinscheduler123`.
## Create Queue
@ -50,7 +50,7 @@
## Token Management
Since the back-end interface has a login check, token management provides a way to perform various operations on the system by calling interfaces.
- The administrator enters the `Security Center -> Token Management` page, clicks the `Create Token` button, selects the expiration time and user, clicks the `Generate Token` button, and then clicks the `Submit` button to create the selected user's token.
@ -66,7 +66,6 @@
public void doPOSTParam() throws Exception {
    // create HttpClient
    CloseableHttpClient httpclient = HttpClients.createDefault();
    // create http post request
    HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create");
    httpPost.setHeader("token", "123");
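The hunk above cuts the example off mid-method. For reference, a self-contained sketch of calling an interface with the token header might look like the following; the endpoint, token value, and form parameter names are placeholders modeled on the snippet, not working credentials.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class TokenApiExample {

    public static void main(String[] args) throws Exception {
        doPOSTParam();
    }

    public static void doPOSTParam() throws Exception {
        // create HttpClient
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            // create http post request and authenticate via the token header
            HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create");
            httpPost.setHeader("token", "123");
            // set form parameters (names here are placeholders)
            List<NameValuePair> params = new ArrayList<>();
            params.add(new BasicNameValuePair("projectName", "demo-project"));
            httpPost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
            // execute the request and print the response body
            try (CloseableHttpResponse response = httpclient.execute(httpPost)) {
                System.out.println(EntityUtils.toString(response.getEntity(), "UTF-8"));
            }
        }
    }
}
```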
@ -96,9 +95,9 @@
## Granted Permissions
- Granted permissions include project permissions, resource permissions, data source permissions, and UDF function permissions.
- The administrator can authorize projects, resources, data sources, and UDF functions to normal users who did not create them. Because the way to authorize projects, resources, data sources, and UDF functions to users is the same, we take project authorization as an example.
- Note: Users have all permissions for the projects they create themselves, so those projects are not displayed in the project list or the selected project list.
- The administrator enters the `Security Center -> User Management` page and clicks the `Authorize` button of the user who needs to be authorized, as shown in the figure below:
<p align="center">
@ -145,7 +144,6 @@ worker.groups=default,test
![create-environment](../../../img/new_ui/dev/security/create-environment.png)
> Usage environment
- Create a task node in the workflow definition, then select the worker group and the environment corresponding to the worker group. When executing the task, the worker sets up the environment first and then executes the task.
![use-environment](../../../img/new_ui/dev/security/use-environment.png)
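For reference, an environment here is typically a short list of shell statements that the worker sources before launching the task, for example `export JAVA_HOME=/opt/java/jdk8` and `export PATH=$JAVA_HOME/bin:$PATH` (the paths are purely illustrative).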
@ -153,11 +151,9 @@ worker.groups=default,test
## Cluster Management
> Add or update cluster
- Each process can be related to zero or several clusters to support multiple environments; currently only k8s is supported.
> Usage cluster
- After creation and authorization, k8s namespaces and processes can be associated with clusters. Each cluster has separate workflows and task instances running independently.
![create-cluster](../../../img/new_ui/dev/security/create-cluster.png)
@ -173,3 +169,5 @@ worker.groups=default,test
- After creation and authorization, you can select the namespace from the drop-down list when editing a k8s task. If the k8s cluster name is `ds_null_k8s`, it indicates test mode, which does not actually operate on the cluster.
![create-namespace](../../../img/new_ui/dev/security/create-namespace.png)

4
docs/docs/zh/guide/resource/intro.md

@ -0,0 +1,4 @@
# Resource Center Introduction
The Resource Center is typically used to upload files, UDF functions, and manage task groups. For a standalone environment, you can choose a local file directory as the upload folder (this operation does not require Hadoop deployment). Alternatively, you can choose to upload to a Hadoop or MinIO cluster, in which case you need a related environment such as Hadoop (2.6+) or MinIO.