
[Feature][doc] Refactor and separate the Resource Center Document (#9658)

Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
3.0.0/version-upgrade
QuakeWang 2 years ago committed by GitHub
parent commit 691e8ab538
1. docs/configs/docsdev.js (38)
2. docs/docs/en/guide/resource.md (165)
3. docs/docs/en/guide/resource/configuration.md (116)
4. docs/docs/en/guide/resource/file-manage.md (64)
5. docs/docs/en/guide/resource/task-group.md (56)
6. docs/docs/en/guide/resource/udf-manage.md (45)
7. docs/docs/en/guide/task/flink.md (2)
8. docs/docs/en/guide/task/map-reduce.md (2)
9. docs/docs/en/guide/task/spark.md (2)
10. docs/docs/zh/guide/resource.md (168)
11. docs/docs/zh/guide/resource/configuration.md (117)
12. docs/docs/zh/guide/resource/file-manage.md (69)
13. docs/docs/zh/guide/resource/task-group.md (57)
14. docs/docs/zh/guide/resource/udf-manage.md (53)
15. docs/docs/zh/guide/task/flink.md (2)
16. docs/docs/zh/guide/task/map-reduce.md (2)
17. docs/docs/zh/guide/task/spark.md (2)
18. docs/img/file_detail.png (BIN)
19. docs/img/file_detail_en.png (BIN)
20. docs/img/new_ui/dev/resource/demo/file-demo01.png (BIN)
21. docs/img/new_ui/dev/resource/demo/file-demo02.png (BIN)
22. docs/img/new_ui/dev/resource/demo/file-demo03.png (BIN)
23. docs/img/new_ui/dev/resource/demo/udf-demo01.png (BIN)
24. docs/img/new_ui/dev/resource/demo/udf-demo02.png (BIN)
25. docs/img/new_ui/dev/resource/demo/udf-demo03.png (BIN)
26. docs/img/tasks/demo/file_detail.png (BIN)

38
docs/configs/docsdev.js

@@ -261,7 +261,24 @@ export default {
},
{
title: 'Resource',
link: '/en-us/docs/dev/user_doc/guide/resource.html',
children: [
{
title: 'Configuration',
link: '/en-us/docs/dev/user_doc/guide/resource/configuration.html'
},
{
title: 'File Manage',
link: '/en-us/docs/dev/user_doc/guide/resource/file-manage.html'
},
{
title: 'UDF Manage',
link: '/en-us/docs/dev/user_doc/guide/resource/udf-manage.html'
},
{
title: 'Task Group Manage',
link: '/en-us/docs/dev/user_doc/guide/resource/task-manage.html'
},
],
},
{
title: 'Monitor',
@@ -591,7 +608,24 @@ export default {
},
{
title: '资源中心',
link: '/zh-cn/docs/dev/user_doc/guide/resource.html',
children: [
{
title: '配置详情',
link: '/zh-cn/docs/dev/user_doc/guide/resource/configuration.html'
},
{
title: '文件管理',
link: '/zh-cn/docs/dev/user_doc/guide/resource/file-manage.html'
},
{
title: 'UDF 管理',
link: '/zh-cn/docs/dev/user_doc/guide/resource/udf-manage.html'
},
{
title: '任务组管理',
link: '/zh-cn/docs/dev/user_doc/guide/resource/task-manage.html'
},
],
},
{
title: '监控中心',

165
docs/docs/en/guide/resource.md

@@ -1,165 +0,0 @@
# Resource Center
If you want to use the resource upload function, you can use a local file directory as the upload directory for a single machine (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need a Hadoop (2.6+), MinIO, or other related environment.
> **_Note:_**
>
> * If you want to use the resource upload function, the deployment user in [installation and deployment](installation/standalone.md) must have relevant operation authority.
> * If you are using a Hadoop cluster with NameNode HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `/opt/dolphinscheduler/conf`; otherwise, skip this copy step.
## HDFS Resource Configuration
- To upload resource files and UDF functions, all uploaded files and resources will be stored on HDFS, so the following configuration items are required:
```
conf/common.properties
# Users who have permission to create directories under the HDFS root path
hdfs.root.user=hdfs
# resource upload base dir; resource files will be stored under this HDFS path. Please make sure the directory exists on HDFS and has read/write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/dolphinscheduler
# resource storage type : HDFS,S3,NONE
resource.storage.type=HDFS
# whether kerberos starts
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# loginUserFromKeytab user
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# loginUserFromKeytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# if resource.storage.type is HDFS and your Hadoop cluster NameNode has HA enabled, you need to put core-site.xml and hdfs-site.xml in the installPath/conf directory (in this example /opt/soft/dolphinscheduler/conf) and configure the NameNode cluster name; if the NameNode is not HA, modify it to a specific IP or hostname.
# if resource.storage.type is S3, write the S3 address, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
fs.defaultFS=hdfs://mycluster:8020
# if resourcemanager HA is enabled, set the HA IPs; leave this empty for a single resourcemanager
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# If it is a single resourcemanager, you only need to configure one host name. If it is resourcemanager HA, the default configuration is fine
yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
```
## File Management
> File management covers various resource files, including creating basic `txt/log/sh/conf/py/java` files, uploading jar packages and other file types, and editing, renaming, downloading, and deleting files.
![file-manage](/img/new_ui/dev/resource/file-manage.png)
- Create a file
> The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties.
![create-file](/img/new_ui/dev/resource/create-file.png)
- Upload files
> Click the "Upload File" button to upload, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name.
![upload-file](/img/new_ui/dev/resource/upload-file.png)
- File View
> For the files that can be viewed, click the file name to view the file details.
<p align="center">
<img src="/img/file_detail_en.png" width="80%" />
</p>
- Download file
> Click the "Download" button in the file list to download the file or click the "Download" button in the upper right corner of the file details to download the file.
- File rename
![rename-file](/img/new_ui/dev/resource/rename-file.png)
- Delete
> File list -> Click the "Delete" button to delete the specified file.
- Re-upload file
> Click the "Re-upload File" button to upload a new file to replace the old one, or drag the file into the re-upload area; the file name is automatically filled in with the new file's name.
<p align="center">
<img src="/img/reupload_file_en.png" width="80%" />
</p>
## UDF Management
### Resource Management
> Resource management is similar to file management; the difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts, and configuration files.
> Supported operations: rename, download, delete.
- Upload UDF resources
> Same as uploading files.
### Function Management
- Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported.
- UDF function name: enter the name of the UDF function.
- Package and class name: enter the fully qualified class name of the UDF function.
- UDF resource: set the resource file corresponding to the created UDF function.
![create-udf](/img/new_ui/dev/resource/create-udf.png)
## Task Group Settings
The task group is mainly used to control the concurrency of task instances and is designed to limit the pressure on other resources (it can also limit pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task within that task group.
### Task Group Configuration
#### Create Task Group
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
The user clicks [Resources] - [Task Group Management] - [Task Group option] - [Create Task Group]
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information inside the picture:
- Task group name: the name displayed for the task group.
- Project name: the project scope in which the task group applies. This item is optional; if no project is selected, all projects in the system can use this task group.
- Resource pool size: The maximum number of concurrent task instances allowed.
#### View Task Group Queue
![view-queue](/img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
![view-queue](/img/new_ui/dev/resource/view-groupQueue.png)
#### Use of Task Groups
**Note**: Task groups apply only to tasks executed by workers; node types executed by the master, such as [switch], [condition], and [sub_process] nodes, are not controlled by task groups. Let's take the shell node as an example:
![use-queue](/img/new_ui/dev/resource/use-queue.png)
To configure the task group, you only need to configure the parts in the red box:
- Task group name: the task group name displayed on the task group configuration page. Here you can only see the task groups that the project has permission to access (a project was selected when the task group was created) or task groups with global scope (no project was selected when the task group was created).
- Priority: when tasks are waiting for resources, the master distributes higher-priority tasks to workers first. The larger the value, the higher the priority.
### Implementation Logic of Task Group
#### Get Task Group Resources:
When distributing a task, the master checks whether the task is configured with a task group. If not, the task is sent to the worker to run as usual; if a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to the worker. If so, the pool size is decremented by 1 and the task continues to run; if not, task distribution is aborted and the task waits until another task finishes and wakes it up.
#### Release and Wake Up:
When a task that occupies a task group resource finishes, the resource is released. After the release, the system checks whether any task is waiting in the task group. If so, it marks the waiting task with the highest priority as runnable and creates a new executable event. The event stores the ID of the task marked to acquire the resource; that task then obtains the task group resource and runs.
#### Task Group Flowchart
<p align="center">
<img src="/img/task_group_process.png" width="80%" />
</p>

116
docs/docs/en/guide/resource/configuration.md

@@ -0,0 +1,116 @@
# Configuration
The Resource Center is usually used for operations such as uploading files, managing UDF functions, and task group management. You can use a local file directory as the upload directory for a single machine (this does not require deploying Hadoop), or you can upload to a Hadoop or MinIO cluster, in which case you need a Hadoop (2.6+), MinIO, or other related environment.
## HDFS Resource Configuration
When the Resource Center is used to create or upload files, all files and resources are stored on HDFS, so the following configuration is required.
### Configuring the common.properties
Since version 3.0.0-alpha, if you want the Resource Center to upload resources to HDFS or S3, you need to configure the following files: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. A reference configuration is shown below.
```properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# user data local directory path, please make sure the directory exists and have read write permissions
data.basedir.path=/tmp/dolphinscheduler
# resource storage type: HDFS, S3, NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration,
# please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/tmp/dolphinscheduler
# whether to startup kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2
# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
fs.defaultFS=hdfs://localhost:8020
aws.access.key.id=minioadmin
aws.secret.access.key=minioadmin
aws.region=us-east-1
aws.endpoint=http://localhost:9000
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value;
# If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://localhost:%s/ds/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://localhost:19888/ds/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*
# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data
# Network IP gets priority, default inner outer
# Whether hive SQL is executed in the same session
support.hive.oneSession=false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
# if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default
# system env path
#dolphinscheduler.env.path=env/dolphinscheduler_env.sh
# development state
development.state=false
# rpc port
alert.rpc.port=50052
```
> **_Note:_**
>
> * If only `api-server/conf/common.properties` is configured, resource uploading is enabled, but resources cannot be used in tasks. If you want to use or execute the uploaded files in a workflow, you also need to configure `worker-server/conf/common.properties`.
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant operation permissions.
> * If you are using a Hadoop cluster with NameNode HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `/opt/dolphinscheduler/conf`; otherwise, skip this copy step.
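For a quick sanity check of the storage settings above, a small standalone Hadoop client program can confirm that the configured `fs.defaultFS` is reachable and that `resource.upload.path` exists. This is only an illustrative sketch and not part of DolphinScheduler; the class name, the `hadoop-client` dependency, and the hard-coded values are assumptions taken from the sample configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResourcePathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // same value as fs.defaultFS in common.properties (assumes a local pseudo-distributed HDFS)
        conf.set("fs.defaultFS", "hdfs://localhost:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // same value as resource.upload.path in common.properties
            Path uploadPath = new Path("/tmp/dolphinscheduler");
            if (!fs.exists(uploadPath)) {
                // the path must exist and be writable by the user configured as hdfs.root.user
                System.out.println("Upload path missing, creating: " + uploadPath);
                fs.mkdirs(uploadPath);
            }
            System.out.println("Resource upload path is reachable: " + fs.getFileStatus(uploadPath));
        }
    }
}
```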

64
docs/docs/en/guide/resource/file-manage.md

@@ -0,0 +1,64 @@
# File Management
When third-party jars are needed in the scheduling process, or user-defined scripts are required, they can be created or uploaded from this page. The types of files that can be created include `txt`, `log`, `sh`, `conf`, `py`, `java`, and so on. Files can be edited, renamed, downloaded, and deleted.
![file-manage](/img/new_ui/dev/resource/file-manage.png)
- Create a file
> The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties.
![create-file](/img/new_ui/dev/resource/create-file.png)
- Upload files
> Click the "Upload File" button to upload, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name.
![upload-file](/img/new_ui/dev/resource/upload-file.png)
- File View
> For the files that can be viewed, click the file name to view the file details.
![file_detail](/img/tasks/demo/file_detail.png)
- Download file
> Click the "Download" button in the file list to download the file or click the "Download" button in the upper right corner of the file details to download the file.
- File rename
![rename-file](/img/new_ui/dev/resource/rename-file.png)
- Delete
> File list -> Click the "Delete" button to delete the specified file.
- Re-upload file
> Click the "Re-upload File" button to upload a new file to replace the old one, or drag the file into the re-upload area; the file name is automatically filled in with the new file's name.
![reuplod_file](/img/reupload_file_en.png)
## Example
This example uses a simple shell script to demonstrate how to use Resource Center files in workflow definitions. The same applies to tasks such as MR and Spark that require jar packages.
### Create a shell file
Create a shell file that prints `hello world`.
![create-shell](/img/new_ui/dev/resource/demo/file-demo01.png)
### Create a workflow to execute the shell file
In the workflow definition module of Project Management, create a new workflow using a shell task.
- Script: `sh hello.sh`
- Resource: select `hello.sh`
![use-shell](/img/new_ui/dev/resource/demo/file-demo02.png)
### View the results
You can view the log of the node run in the workflow instance, as shown below:
![log-shell](/img/new_ui/dev/resource/demo/file-demo03.png)

56
docs/docs/en/guide/resource/task-group.md

@@ -0,0 +1,56 @@
# Task Group Settings
The task group is mainly used to control the concurrency of task instances and is designed to limit the pressure on other resources (it can also limit pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task within that task group.
### Task Group Configuration
#### Create Task Group
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
The user clicks [Resources] - [Task Group Management] - [Task Group option] - [Create Task Group]
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information inside the picture:
- Task group name: the name displayed for the task group.
- Project name: the project scope in which the task group applies. This item is optional; if no project is selected, all projects in the system can use this task group.
- Resource pool size: The maximum number of concurrent task instances allowed.
#### View Task Group Queue
![view-queue](/img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
![view-queue](/img/new_ui/dev/resource/view-groupQueue.png)
#### Use of Task Groups
**Note**: Task groups apply only to tasks executed by workers; node types executed by the master, such as [switch], [condition], and [sub_process] nodes, are not controlled by task groups. Let's take the shell node as an example:
![use-queue](/img/new_ui/dev/resource/use-queue.png)
To configure the task group, you only need to configure the parts in the red box:
- Task group name: the task group name displayed on the task group configuration page. Here you can only see the task groups that the project has permission to access (a project was selected when the task group was created) or task groups with global scope (no project was selected when the task group was created).
- Priority: when tasks are waiting for resources, the master distributes higher-priority tasks to workers first. The larger the value, the higher the priority.
### Implementation Logic of Task Group
#### Get Task Group Resources:
When distributing a task, the master checks whether the task is configured with a task group. If not, the task is sent to the worker to run as usual; if a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to the worker. If so, the pool size is decremented by 1 and the task continues to run; if not, task distribution is aborted and the task waits until another task finishes and wakes it up.
#### Release and Wake Up:
When a task that occupies a task group resource finishes, the resource is released. After the release, the system checks whether any task is waiting in the task group. If so, it marks the waiting task with the highest priority as runnable and creates a new executable event. The event stores the ID of the task marked to acquire the resource; that task then obtains the task group resource and runs.
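The acquire and release behaviour described above can be summarised with a simplified sketch. This is illustrative only and does not reflect the actual DolphinScheduler master classes; the class and method names below are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TaskGroupPool {
    public record WaitingTask(int taskInstanceId, int priority) {}

    private final int capacity; // the "Resource pool size" configured for the task group
    private int used;           // slots currently occupied by running task instances
    // parked tasks, highest in-group priority first
    private final PriorityQueue<WaitingTask> waiting =
            new PriorityQueue<>(Comparator.comparingInt(WaitingTask::priority).reversed());

    public TaskGroupPool(int capacity) {
        this.capacity = capacity;
    }

    /** Called before dispatching a task; returns false if the task has to wait. */
    public synchronized boolean tryAcquire(WaitingTask task) {
        if (used < capacity) {
            used++;              // "resource pool - 1": the task is dispatched to a worker
            return true;
        }
        waiting.offer(task);     // distribution is aborted; the task waits to be woken up
        return false;
    }

    /** Called when a task holding a slot finishes; returns the best-priority waiter to wake, if any. */
    public synchronized WaitingTask release() {
        used--;
        return waiting.poll();   // the master would fire an event carrying this task's id
    }
}
```

In the real system the woken task is resumed through an executable event rather than a direct method call, as described above.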
#### Task Group Flowchart
![task_group](/img/task_group_process.png)

45
docs/docs/en/guide/resource/udf-manage.md

@@ -0,0 +1,45 @@
# UDF Manage
Resource management is similar to file management; the difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts, and configuration files. Supported operations: rename, download, delete.
- Upload UDF resources
> Same as uploading files.
## Function Management
- Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported.
- UDF function name: enter the name of the UDF function.
- Package and class name: enter the fully qualified class name of the UDF function.
- UDF resource: set the resource file corresponding to the created UDF function.
![create-udf](/img/new_ui/dev/resource/create-udf.png)
## Example
### Write UDF functions
You can customize UDF functions based on actual production requirements. Here we write a function that appends "HelloWorld" to the end of any string, as shown below:
![code-udf](/img/new_ui/dev/resource/demo/udf-demo01.png)
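For reference, the screenshot above corresponds to a UDF along the following lines. This is only a sketch: the package name is hypothetical, and it assumes the classic Hive `UDF` base class; the class name `HwUdf` matches the one used in the SQL example below.

```java
package com.example.udf; // hypothetical package; use your own

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Appends "HelloWorld" to the end of any input string.
public class HwUdf extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString() + "HelloWorld");
    }
}
```

Package the class into a jar (for example with `mvn package`) and upload the jar through resource management as described in the next step.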
### Configure the UDF function
Before configuring the UDF function, upload the UDF function's jar package through resource management. Then go to function management and configure the related information, as shown below:
![conf-udf](/img/new_ui/dev/resource/demo/udf-demo02.png)
### Use UDF functions
When using UDF functions, you only need to write the function itself and upload and configure it through the Resource Center; the system automatically generates the corresponding `create function` statement. For details, see [SqlTask](https://github.com/apache/dolphinscheduler/blob/923f3f38e3271d7f1d22b3abc3497cecb6957e4a/dolphinscheduler-task-plugin/dolphinscheduler-task-sql/src/main/java/org/apache/dolphinscheduler/plugin/task/sql/SqlTask.java#L507-L531).
Enter the workflow and define an SQL node. Set the data source type to HIVE and the data source instance type to HIVE/IMPALA.
- SQL statement: `select HwUdf("abc");` This function is used in the same way as built-in functions and can be accessed directly using the function name.
- UDF function: select the one configured in the Resource Center.
![use-udf](/img/new_ui/dev/resource/demo/udf-demo03.png)

2
docs/docs/en/guide/task/flink.md

@@ -52,7 +52,7 @@ If you are using the flink task type in a production environment, it is necessary
#### Upload the Main Package
When using the Flink task node, you need to upload the jar package to the Resource Center for the execution, refer to the [resource center](../resource.md).
When using the Flink task node, you need to upload the jar package to the Resource Center for the execution, refer to the [resource center](../resource/configuration.md).
After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping.

2
docs/docs/en/guide/task/map-reduce.md

@@ -60,7 +60,7 @@ If you are using the MapReduce task type in a production environment, it is necessary
#### Upload the Main Package
When using the MapReduce task node, you need to use the Resource Centre to upload the jar package for the execution. Refer to the [resource centre](../resource.md).
When using the MapReduce task node, you need to use the Resource Centre to upload the jar package for the execution. Refer to the [resource centre](../resource/configuration.md).
After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping.

2
docs/docs/en/guide/task/spark.md

@@ -59,7 +59,7 @@ If you are using the Spark task type in a production environment, it is necessary
##### Upload the Main Package
When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource.md).
When using the Spark task node, you need to upload the jar package to the Resource Centre for the execution, refer to the [resource center](../resource/configuration.md).
After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping.

168
docs/docs/zh/guide/resource.md

@@ -1,168 +0,0 @@
# Resource Center
If you want to use the resource upload function, you can use a local file directory as the upload directory for a single machine (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need a Hadoop (2.6+), MinIO, or other related environment.
> **_Note:_**
>
> * If you want to use the resource upload function, the deployment user in [installation and deployment](installation/standalone.md) must have the relevant operation permissions.
> * If the NameNode of the Hadoop cluster is configured with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `/opt/dolphinscheduler/conf`; if NameNode HA is not used, skip this step.
## HDFS Resource Configuration
- To upload resource files and UDF functions, all uploaded files and resources will be stored on HDFS, so the following configuration items are required:
```
conf/common.properties
# Users who have permission to create directories under the HDFS root path
hdfs.root.user=hdfs
# resource upload base dir; resource files will be stored under this HDFS path. Please make sure the directory exists on HDFS and has read/write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/dolphinscheduler
# resource storage type : HDFS,S3,NONE
resource.storage.type=HDFS
# whether kerberos starts
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# loginUserFromKeytab user
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# loginUserFromKeytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# if resource.storage.type is HDFS and your Hadoop cluster NameNode has HA enabled, you need to put core-site.xml and hdfs-site.xml in the installPath/conf directory (in this example /opt/soft/dolphinscheduler/conf) and configure the NameNode cluster name; if the NameNode is not HA, modify it to a specific IP or hostname.
# if resource.storage.type is S3, write the S3 address, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
fs.defaultFS=hdfs://mycluster:8020
# if resourcemanager HA is enabled, set the HA IPs; leave this empty for a single resourcemanager
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# If it is a single resourcemanager, you only need to configure one host name. If it is resourcemanager HA, the default configuration is fine
yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
```
## File Management
> File management covers various resource files, including creating basic `txt/log/sh/conf/py/java` files, uploading jar packages and other file types, and editing, renaming, downloading, and deleting files.
![file-manage](/img/new_ui/dev/resource/file-manage.png)
* Create a file
> The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties
![create-file](/img/new_ui/dev/resource/create-file.png)
* Upload files
> Click the "Upload File" button to upload, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name
![upload-file](/img/new_ui/dev/resource/upload-file.png)
* File view
> For viewable file types, click the file name to view the file details
<p align="center">
<img src="/img/file_detail.png" width="80%" />
</p>
* Download file
> Click the "Download" button in the file list, or click the "Download" button in the upper right corner of the file details, to download the file
* File rename
![rename-file](/img/new_ui/dev/resource/rename-file.png)
* Delete
> File list -> click the "Delete" button to delete the specified file
* Re-upload file
> Click the "Re-upload File" button in the file list to upload a new file, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name
<p align="center">
<img src="/img/reupload_file_en.png" width="80%" />
</p>
## UDF Management
### Resource Management
> Resource management is similar to file management; the difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts, and configuration files
> Supported operations: rename, download, delete.
* Upload UDF resources
> Same as uploading files.
### Function Management
* Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported
- UDF function name: the name of the UDF function
- Package and class name: the fully qualified class name of the UDF function
- UDF resource: the resource file corresponding to the created UDF function
![create-udf](/img/new_ui/dev/resource/create-udf.png)
## Task Group Settings
The task group is mainly used to control the concurrency of task instances and is designed to limit the pressure on other resources (it can also limit pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task within that task group.
### Task Group Configuration
#### Create Task Group
![taskGroup](/img/new_ui/dev/resource/taskGroup.png)
The user clicks [Resource Center] - [Task Group Management] - [Task Group Configuration] - Create Task Group
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information shown in the picture:
- Task group name: the name displayed when the task group is used
- Project name: the project in which the task group applies. This item is optional; if no project is selected, all projects in the system can use this task group.
- Resource pool size: the maximum number of concurrent task instances allowed
#### View Task Group Queue
![view-queue](/img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
![view-queue](/img/new_ui/dev/resource/view-groupQueue.png)
#### Use of Task Groups
Note: task groups apply only to tasks executed by workers; node types executed by the master, such as [switch], [condition], and [sub_process] nodes, are not controlled by task groups.
Let's take the shell node as an example:
![use-queue](/img/new_ui/dev/resource/use-queue.png)
To configure the task group, you only need to configure the parts in the red box:
- Task group name: the task group name displayed on the task group configuration page. Here you can only see the task groups that the project has permission to access (a project was selected when the task group was created) or task groups with global scope (no project was selected when the task group was created).
- Priority: when tasks are waiting for resources, the master distributes higher-priority tasks to workers first. The larger the value, the higher the priority.
### Implementation Logic of Task Groups
#### Get Task Group Resources:
When distributing a task, the master checks whether the task is configured with a task group. If not, the task is sent to the worker to run as usual; if a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to the worker. If so, the pool size is decremented by 1 and the task continues to run; if not, task distribution is aborted and the task waits until another task finishes and wakes it up.
#### Release and Wake Up:
When a task that occupies a task group resource finishes, the resource is released. After the release, the system checks whether any task is waiting in the task group. If so, it marks the waiting task with the highest priority as runnable and creates a new executable event. The event stores the ID of the task marked to acquire the resource; that task then obtains the task group resource and runs.
#### Task Group Flowchart
<p align="center">
<img src="/img/task_group_process.png" width="80%" />
</p>

117
docs/docs/zh/guide/resource/configuration.md

@@ -0,0 +1,117 @@
# Resource Center Configuration
The Resource Center is usually used for operations such as uploading files, managing UDF functions, and task group management. For a single-machine environment, you can use a local file directory as the upload directory (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need a Hadoop (2.6+), MinIO, or other related environment.
## HDFS Resource Configuration
When the Resource Center is used to create or upload files, all files and resources are stored on HDFS, so the following configuration is required:
### Configuring the common.properties File
Since version 3.0.0-alpha, if you want the Resource Center to upload resources to HDFS or S3, you need to configure the following files: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. A reference configuration is shown below:
```properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# user data local directory path, please make sure the directory exists and have read write permissions
data.basedir.path=/tmp/dolphinscheduler
# resource storage type: HDFS, S3, NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration,
# please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/tmp/dolphinscheduler
# whether to startup kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2
# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
fs.defaultFS=hdfs://localhost:8020
aws.access.key.id=minioadmin
aws.secret.access.key=minioadmin
aws.region=us-east-1
aws.endpoint=http://localhost:9000
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value;
# If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://localhost:%s/ds/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://localhost:19888/ds/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*
# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data
# Network IP gets priority, default inner outer
# Whether hive SQL is executed in the same session
support.hive.oneSession=false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
# if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default
# system env path
#dolphinscheduler.env.path=env/dolphinscheduler_env.sh
# development state
development.state=false
# rpc port
alert.rpc.port=50052
```
> **Note**
>
> * If only `api-server/conf/common.properties` is configured, resource uploading is enabled, but that alone is not enough for normal use. If you want to execute the uploaded files in a workflow, you also need to configure `worker-server/conf/common.properties`.
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant operation permissions.
> * If the NameNode of the Hadoop cluster is configured with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `/opt/dolphinscheduler/conf`; if NameNode HA is not used, skip this step.
>

69
docs/docs/zh/guide/resource/file-manage.md

@@ -0,0 +1,69 @@
# File Management
When third-party jars are needed in the scheduling process, or user-defined scripts are required, they can be created or uploaded from this page. The types of files that can be created include `txt/log/sh/conf/py/java` and so on. Files can be edited, renamed, downloaded, and deleted.
![file-manage](/img/new_ui/dev/resource/file-manage.png)
* Create a file
> The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties
![create-file](/img/new_ui/dev/resource/create-file.png)
* Upload files
> Click the "Upload File" button to upload, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name
![upload-file](/img/new_ui/dev/resource/upload-file.png)
* File view
> For viewable file types, click the file name to view the file details
![file_detail](/img/tasks/demo/file_detail.png)
* Download file
> Click the "Download" button in the file list, or click the "Download" button in the upper right corner of the file details, to download the file
* File rename
![rename-file](/img/new_ui/dev/resource/rename-file.png)
* Delete
> File list -> click the "Delete" button to delete the specified file
* Re-upload file
> Click the "Re-upload File" button in the file list to upload a new file, or drag the file into the upload area; the file name is automatically filled in with the uploaded file's name
![reuplod_file](/img/reupload_file_en.png)
## Task Example
This example uses a simple shell script to demonstrate how to use Resource Center files in workflow definitions. The same applies to tasks such as MR and Spark that require jar packages.
### Create a shell file
Create a shell file that prints "hello world".
![create-shell](/img/new_ui/dev/resource/demo/file-demo01.png)
### Create a workflow to execute the file
In the workflow definition module of Project Management, create a new workflow using a shell task.
- Script: `sh hello.sh`
- Resource: select `hello.sh`
![use-shell](/img/new_ui/dev/resource/demo/file-demo02.png)
### View the results
You can view the log of the node run in the workflow instance, as shown below:
![log-shell](/img/new_ui/dev/resource/demo/file-demo03.png)

57
docs/docs/zh/guide/resource/task-group.md

@@ -0,0 +1,57 @@
# Task Group Management
The task group is mainly used to control the concurrency of task instances and is designed to limit the pressure on other resources (it can also limit pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task within that task group.
### Task Group Configuration
#### Create Task Group
![taskGroup](/img/new_ui/dev/resource/taskGroup.png)
The user clicks [Resource Center] - [Task Group Management] - [Task Group Configuration] - Create Task Group
![create-taskGroup](/img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information shown in the picture:
- Task group name: the name displayed when the task group is used
- Project name: the project in which the task group applies. This item is optional; if no project is selected, all projects in the system can use this task group.
- Resource pool size: the maximum number of concurrent task instances allowed
#### View Task Group Queue
![view-queue](/img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
![view-queue](/img/new_ui/dev/resource/view-groupQueue.png)
#### Use of Task Groups
Note: task groups apply only to tasks executed by workers; node types executed by the master, such as [switch], [condition], and [sub_process] nodes, are not controlled by task groups.
Let's take the shell node as an example:
![use-queue](/img/new_ui/dev/resource/use-queue.png)
To configure the task group, you only need to configure the parts in the red box:
- Task group name: the task group name displayed on the task group configuration page. Here you can only see the task groups that the project has permission to access (a project was selected when the task group was created) or task groups with global scope (no project was selected when the task group was created).
- Priority: when tasks are waiting for resources, the master distributes higher-priority tasks to workers first. The larger the value, the higher the priority.
### Implementation Logic of Task Groups
#### Get Task Group Resources:
When distributing a task, the master checks whether the task is configured with a task group. If not, the task is sent to the worker to run as usual; if a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to the worker. If so, the pool size is decremented by 1 and the task continues to run; if not, task distribution is aborted and the task waits until another task finishes and wakes it up.
#### Release and Wake Up:
When a task that occupies a task group resource finishes, the resource is released. After the release, the system checks whether any task is waiting in the task group. If so, it marks the waiting task with the highest priority as runnable and creates a new executable event. The event stores the ID of the task marked to acquire the resource; that task then obtains the task group resource and runs.
#### Task Group Flowchart
![task_group](/img/task_group_process.png)

53
docs/docs/zh/guide/resource/udf-manage.md

@@ -0,0 +1,53 @@
# UDF Manage
- Resource management is similar to file management; the difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts, and configuration files.
- It mainly includes the following operations: rename, download, delete, and so on.
* Upload UDF resources
> Same as uploading files.
## Function Management
* Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported
- UDF function name: the name of the UDF function
- Package and class name: the fully qualified class name of the UDF function
- UDF resource: the resource file corresponding to the created UDF function
![create-udf](/img/new_ui/dev/resource/create-udf.png)
## Task Example
### Write UDF functions
You can customize the UDF functions you need based on actual production requirements. Here we write a function that appends "HelloWorld" to the end of any string, as shown below:
![code-udf](/img/new_ui/dev/resource/demo/udf-demo01.png)
### Configure the UDF function
Before configuring the UDF function, upload the required function jar package through resource management. Then go to function management and configure the related information, as shown below:
![conf-udf](/img/new_ui/dev/resource/demo/udf-demo02.png)
### Use UDF functions
When using UDF functions, you only need to write the function itself and upload and configure it through the Resource Center; the system automatically generates the corresponding `create function` statement. For details, see [SqlTask](https://github.com/apache/dolphinscheduler/blob/923f3f38e3271d7f1d22b3abc3497cecb6957e4a/dolphinscheduler-task-plugin/dolphinscheduler-task-sql/src/main/java/org/apache/dolphinscheduler/plugin/task/sql/SqlTask.java#L507-L531)
Enter the workflow and define an SQL node; set the data source type to HIVE and the data source instance type to HIVE/IMPALA.
- SQL statement: `select HwUdf("abc");` This function is used in the same way as built-in functions and can be accessed directly using the function name.
- UDF function: select the one configured in the Resource Center.
![use-udf](/img/new_ui/dev/resource/demo/udf-demo03.png)

2
docs/docs/zh/guide/task/flink.md

@@ -52,7 +52,7 @@ The Flink task type is used to execute Flink programs. For the Flink node, the worker
#### Upload the Main Package
When using the Flink task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource.md).
When using the Flink task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource/configuration.md).
After finishing the Resource Center configuration, upload the required target files directly by dragging and dropping.

2
docs/docs/zh/guide/task/map-reduce.md

@@ -60,7 +60,7 @@ The MapReduce (MR) task type is used to execute MapReduce programs. For MapReduce
#### Upload the Main Package
When using the MapReduce task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource.md).
When using the MapReduce task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource/configuration.md).
After finishing the Resource Center configuration, upload the required target files directly by dragging and dropping.

2
docs/docs/zh/guide/task/spark.md

@@ -60,7 +60,7 @@ The Spark task type is used to execute Spark applications. For the Spark node, the worker supports
##### Upload the Main Package
When using the Spark task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource.md).
When using the Spark task node, you need to use the Resource Center to upload the jar package of the program to execute; refer to the [Resource Center](../resource/configuration.md).
After finishing the Resource Center configuration, upload the required target files directly by dragging and dropping.

BIN docs/img/file_detail.png (binary file not shown; before: 60 KiB)
BIN docs/img/file_detail_en.png (binary file not shown; before: 483 KiB)
BIN docs/img/new_ui/dev/resource/demo/file-demo01.png (binary file not shown; after: 58 KiB)
BIN docs/img/new_ui/dev/resource/demo/file-demo02.png (binary file not shown; after: 117 KiB)
BIN docs/img/new_ui/dev/resource/demo/file-demo03.png (binary file not shown; after: 264 KiB)
BIN docs/img/new_ui/dev/resource/demo/udf-demo01.png (binary file not shown; after: 16 KiB)
BIN docs/img/new_ui/dev/resource/demo/udf-demo02.png (binary file not shown; after: 82 KiB)
BIN docs/img/new_ui/dev/resource/demo/udf-demo03.png (binary file not shown; after: 132 KiB)
BIN docs/img/tasks/demo/file_detail.png (binary file not shown; after: 55 KiB)