
[Feature][MLops] Support MLflow Models to deploy model service (MLflow models serve and Docker) (#10150)

* Add backend of MLflow Models component

* Add front end of MLflow Models component

* [DOC] update mlflow document

* revert unnecessary

* fix doc image url

* Update docs/docs/en/guide/task/mlflow.md

Discard the lr abbreviation

Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>

Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
3.1.0-release
JieguangZhou committed 2 years ago, via GitHub
commit e11373b963
  1. docs/docs/en/guide/task/mlflow.md (197 lines changed)
  2. docs/docs/zh/guide/task/mlflow.md (157 lines changed)
  3. docs/img/tasks/demo/mlflow-automl.png (BIN)
  4. docs/img/tasks/demo/mlflow-basic-algorithm.png (BIN)
  5. docs/img/tasks/demo/mlflow-models-docker.png (BIN)
  6. docs/img/tasks/demo/mlflow-models-mlflow.png (BIN)
  7. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowConstants.java (47 lines changed)
  8. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowParameters.java (77 lines changed)
  9. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTask.java (112 lines changed)
  10. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/test/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTaskTest.java (78 lines changed)
  11. dolphinscheduler-ui/src/locales/modules/en_US.ts (4 lines changed)
  12. dolphinscheduler-ui/src/locales/modules/zh_CN.ts (4 lines changed)
  13. dolphinscheduler-ui/src/views/projects/task/components/node/fields/index.ts (2 lines changed)
  14. dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-models.ts (80 lines changed)
  15. dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-projects.ts (255 lines changed)
  16. dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow.ts (140 lines changed)
  17. dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts (5 lines changed)
  18. dolphinscheduler-ui/src/views/projects/task/components/node/tasks/use-mlflow.ts (5 lines changed)
  19. dolphinscheduler-ui/src/views/projects/task/components/node/types.ts (4 lines changed)

197
docs/docs/en/guide/task/mlflow.md

@@ -5,8 +5,29 @@
[MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
reproducibility, deployment, and a central model registry.
Mlflow task is used to perform mlflow project tasks, which include basic algorithmic and autoML capabilities (
User-defined MLFlow project task execution will be supported in the near future)
The MLflow task plugin is used to execute MLflow tasks. It currently supports MLflow Projects and MLflow Models. (Model Registry support is coming soon.)
- Mlflow Projects: Package data science code in a format to reproduce runs on any platform.
- MLflow Models: Deploy machine learning models in diverse serving environments.
- Model Registry: Store, annotate, discover, and manage models in a central repository.
The MLflow plugin currently supports, or plans to support, the following:

- [ ] MLflow Projects
  - [x] BasicAlgorithm: contains LR, SVM, LightGBM and XGBoost
  - [x] AutoML: AutoML tools, including autosklearn and flaml
  - [ ] Custom projects: support for running your own MLflow projects
- [ ] MLflow Models
  - [x] MLFLOW: use `mlflow models serve` to deploy a model service
  - [x] Docker: build a Docker image, then run the container
  - [ ] Docker Compose: use Docker Compose to run the container; will replace the `docker run` above
  - [ ] Seldon Core: use Seldon Core to deploy models to a Kubernetes cluster
  - [ ] k8s: deploy containers directly to Kubernetes
  - [ ] mlflow deployments: MLflow's built-in deployment modules, such as the built-in deployment to SageMaker
- [ ] Model Registry
  - [ ] Register Model: allows artifacts (including the model and its related parameters and metrics) to be registered directly in the model registry
## Create Task
@@ -14,68 +35,110 @@ User-defined MLFlow project task execution will be supported in the near future)
DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/mlflow.png" width="15"/> task node to canvas.
## Task Parameter
- DolphinScheduler common parameters
  - **Node name**: the node name in a workflow definition must be unique.
  - **Run flag**: indicates whether this node is scheduled normally; if it does not need to execute, select `prohibition execution`.
  - **Descriptive information**: describes the function of the node.
  - **Task priority**: when the number of worker threads is insufficient, tasks execute in order of priority from high to low; tasks with the same priority execute first-in, first-out.
  - **Worker grouping**: assigns the task to a machine in the selected worker group; if `Default` is selected, a worker machine is chosen at random.
  - **Environment Name**: the environment in which the script runs.
  - **Times of failed retry attempts**: the number of times a failed task is resubmitted.
  - **Failed retry interval**: the interval (in minutes) before a failed task is resubmitted.
  - **Delayed execution time**: the delay (in minutes) before the task executes.
  - **Timeout alarm**: check timeout alarm and timeout failure; when the task runs longer than the configured timeout, an alarm email is sent and the task execution fails.
  - **Custom parameter**: user-defined parameters local to the mlflow task, which replace `${variable}` placeholders in the script.
  - **Predecessor task**: selecting a predecessor task sets it as upstream of the current task.
- MLflow task specific parameters
  - **mlflow server tracking uri**: the MLflow server URI, default `http://localhost:5000`.
  - **experiment name**: the experiment in which the task runs; created if it does not exist.
  - **register model**: whether to register the model. If selected, the following parameters expand.
    - **model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
  - **job type**: the type of task to run, currently including basic algorithms and AutoML. (User-defined MLflow Project task execution will be supported in the near future.)
    - BasicAlgorithm-specific parameters
      - **algorithm**: the selected algorithm, currently supporting `LR`, `SVM`, `LightGBM` and `XGboost` in [scikit-learn](https://scikit-learn.org/) form.
      - **Parameter search space**: the parameter search space for the chosen algorithm, which may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value.
    - AutoML-specific parameters
      - **AutoML tool**: the AutoML tool to use, currently supporting [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML).
    - Parameters common to BasicAlgorithm and AutoML
      - **data path**: the absolute path of a file or folder. A file must end with `.csv`; a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
      - **parameters**: parameters for initializing the algorithm/AutoML model, which may be empty. For example, `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. The detailed parameter lists are as follows:
        - BasicAlgorithm
          - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
          - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
          - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
          - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
        - AutoML
          - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
          - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
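The `;`-separated, `eval()`-based parameter convention described above can be sketched in Python. This is a minimal illustration of the documented convention, not the plugin's actual parser:

```python
# Minimal sketch of the documented parameter convention: parameters are
# separated by ";", the text before "=" is the parameter name, and the
# text after "=" is evaluated with Python eval() to obtain the value.
def parse_params(spec: str) -> dict:
    params = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, raw = item.partition("=")
        # eval turns "[5, 10]" into a list, "30" into an int, etc.
        params[name.strip()] = eval(raw)
    return params

print(parse_params("time_budget=30;estimator_list=['lgbm']"))
# {'time_budget': 30, 'estimator_list': ['lgbm']}
```

The same function handles the search-space form, e.g. `max_depth=[5, 10];n_estimators=[100, 200]` yields two list-valued entries.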
## Task Example
### Preparation
First, introduce some general parameters of DolphinScheduler:

- **Node name**: the node name in a workflow definition must be unique.
- **Run flag**: indicates whether this node is scheduled normally; if it does not need to execute, select `prohibition execution`.
- **Descriptive information**: describes the function of the node.
- **Task priority**: when the number of worker threads is insufficient, tasks execute in order of priority from high to low; tasks with the same priority execute first-in, first-out.
- **Worker grouping**: assigns the task to a machine in the selected worker group; if `Default` is selected, a worker machine is chosen at random.
- **Environment Name**: the environment in which the script runs.
- **Times of failed retry attempts**: the number of times a failed task is resubmitted.
- **Failed retry interval**: the interval (in minutes) before a failed task is resubmitted.
- **Delayed execution time**: the delay (in minutes) before the task executes.
- **Timeout alarm**: check timeout alarm and timeout failure; when the task runs longer than the configured timeout, an alarm email is sent and the task execution fails.
- **Predecessor task**: selecting a predecessor task sets it as upstream of the current task.
### MLflow Projects
#### BasicAlgorithm
![mlflow-conda-env](/img/tasks/demo/mlflow-basic-algorithm.png)
**Task Parameters**

- **mlflow server tracking uri**: the MLflow server URI, default `http://localhost:5000`.
- **job type**: the type of task to run, currently including basic algorithms and AutoML. (User-defined MLflow Project task execution will be supported in the near future.)
- **experiment name**: the experiment in which the task runs; created if it does not exist.
- **register model**: whether to register the model. If selected, the following parameters expand.
  - **model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
- **data path**: the absolute path of a file or folder. A file must end with `.csv`; a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
- **parameters**: parameters for initializing the algorithm model, which may be empty. For example, `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. The detailed parameter lists are as follows:
  - [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
  - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
  - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
  - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
- **algorithm**: the selected algorithm, currently supporting `LR`, `SVM`, `LightGBM` and `XGboost` in [scikit-learn](https://scikit-learn.org/) form.
- **Parameter search space**: the parameter search space for the chosen algorithm, which may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value.
#### AutoML
![mlflow-automl](/img/tasks/demo/mlflow-automl.png)
**Task Parameters**

- **mlflow server tracking uri**: the MLflow server URI, default `http://localhost:5000`.
- **job type**: the type of task to run, currently including basic algorithms and AutoML. (User-defined MLflow Project task execution will be supported in the near future.)
- **experiment name**: the experiment in which the task runs; created if it does not exist.
- **register model**: whether to register the model. If selected, the following parameters expand.
  - **model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
- **data path**: the absolute path of a file or folder. A file must end with `.csv`; a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
- **parameters**: parameters for initializing the AutoML trainer, which may be empty. For example, `n_estimators=200;learning_rate=0.2` for flaml. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. The detailed parameter lists are as follows:
  - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
  - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
- **AutoML tool**: the AutoML tool to use, currently supporting [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML).
### MLflow Models
#### MLFLOW
![mlflow-models-mlflow](/img/tasks/demo/mlflow-models-mlflow.png)
**Task Parameters**

- **mlflow server tracking uri**: the MLflow server URI, default `http://localhost:5000`.
- **model-uri**: the MLflow model URI, supporting the `models:/<model_name>/suffix` and `runs:/` formats. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
- **Port**: the port to listen on.
#### Docker
![mlflow-models-docker](/img/tasks/demo/mlflow-models-docker.png)
**Task Parameters**

- **mlflow server tracking uri**: the MLflow server URI, default `http://localhost:5000`.
- **model-uri**: the MLflow model URI, supporting the `models:/<model_name>/suffix` and `runs:/` formats. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
- **Port**: the port to listen on.
## Environment to prepare
### Conda env
You need to log in as the admin account to configure a conda environment variable (please first
install [anaconda](https://docs.continuum.io/anaconda/install/)
@@ -88,7 +151,7 @@ Conda environment.
![mlflow-set-conda-env](/img/tasks/demo/mlflow-set-conda-env.png)
#### Start the mlflow service
### Start the mlflow service
Make sure MLflow is installed; you can install it with `pip install mlflow`.
@@ -102,16 +165,6 @@ mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite://
After running, an MLflow service is started
### Run BasicAlgorithm task
The following example shows how to create an MLflow BasicAlgorithm task.
![mlflow-basic-algorithm](/img/tasks/demo/mlflow-basic-algorithm.png)
After this, you can visit the MLFlow service (`http://localhost:5000`) page to view the experiments and models.
![mlflow-server](/img/tasks/demo/mlflow-server.png)
### Run AutoML task
![mlflow-automl](/img/tasks/demo/mlflow-automl.png)

157
docs/docs/zh/guide/task/mlflow.md

@@ -4,60 +4,114 @@
[MLflow](https://mlflow.org) is an excellent open-source project in the MLops field, used to manage the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
The MLflow task is used to execute MLflow Project tasks, which include preset basic-algorithm and AutoML capabilities (user-defined MLflow Project task execution will be supported in the near future).
The MLflow component is used to execute MLflow tasks; it currently includes MLflow Projects and MLflow Models. (Model Registry will be supported in the near future.)
- Mlflow Projects: package code so that it can be run on any platform.
- MLflow Models: deploy machine learning models in diverse serving environments.
- Model Registry: store, annotate, discover, and manage models in a central repository (you can also register models yourself inside your MLflow project).
The MLflow component currently supports, or plans to support, the following:
- [ ] MLflow Projects
  - [x] BasicAlgorithm: basic algorithms, including LR, SVM, LightGBM and XGBoost.
  - [x] AutoML: AutoML tools, including autosklearn and flaml.
  - [ ] Custom projects: support for running your own MLflow Projects.
- [ ] MLflow Models
  - [x] MLFLOW: deploy the model directly with `mlflow models serve`.
  - [x] Docker: deploy the model after building a Docker image.
  - [ ] Docker Compose: deploy the model with Docker Compose, which will replace the Docker deployment above.
  - [ ] Seldon Core: after building the image, deploy it to a Kubernetes cluster with Seldon Core, which provides production model-management capabilities.
  - [ ] k8s: after building the image, deploy it to a Kubernetes cluster.
  - [ ] mlflow deployments: MLflow's built-in deployment modules, such as the built-in deployment to SageMaker.
- [ ] Model Registry
  - [ ] Register Model: register related artifacts (the model along with its parameters and metrics) in the model registry.
## Create Task
- Click Project Management -> Project Name -> Workflow Definition, then click the "Create Workflow" button to enter the DAG editing page;
- Drag the <img src="/img/tasks/icons/mlflow.png" width="15"/> task node from the toolbar onto the canvas.
## Task Parameters
- DolphinScheduler common parameters
  - **Node name**: the name of the task. The node name in a workflow definition must be unique.
  - **Run flag**: indicates whether this node can be scheduled normally; if it does not need to execute, turn on the prohibition-execution switch.
  - **Description**: describes the function of the node.
  - **Task priority**: when the number of worker threads is insufficient, tasks execute in order of priority from high to low; tasks with the same priority execute first-in, first-out.
  - **Worker grouping**: assigns the task to a machine in the worker group; if Default is selected, a worker machine is chosen at random.
  - **Environment Name**: the environment in which the script runs.
  - **Times of failed retry attempts**: the number of times a failed task is resubmitted.
  - **Failed retry interval**: the interval, in minutes, before a failed task is resubmitted.
  - **Delayed execution time**: the delay, in minutes, before the task executes.
  - **Timeout alarm**: check timeout alarm and timeout failure; when the task runs longer than the configured timeout, an alarm email is sent and the task execution fails.
  - **Custom parameter**: user-defined parameters local to the mlflow task, which replace `${variable}` placeholders in the script.
  - **Predecessor task**: selecting a predecessor task sets it as upstream of the current task.
- MLflow task specific parameters
  - **mlflow server tracking uri**: the MLflow server URI, default http://localhost:5000.
  - **experiment name**: the experiment in which the task runs; created if it does not exist.
  - **register model**: whether to register the model. If selected, the following parameters expand.
    - **registered model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
  - **job type**: the type of task to run, currently including basic algorithms and AutoML; user-defined MLflow Project tasks will be supported later.
    - BasicAlgorithm-specific parameters
      - **algorithm**: the selected algorithm, currently supporting `lr`, `svm`, `lightgbm` and `xgboost` in [scikit-learn](https://scikit-learn.org/) form.
      - **Parameter search space**: the parameter search space for the chosen algorithm, which may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm performs the corresponding search. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value.
    - AutoML-specific parameters
      - **AutoML tool**: the AutoML tool to use, currently supporting [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML).
    - Parameters common to BasicAlgorithm and AutoML
      - **data path**: the absolute path of a file or folder. A file must end with `.csv` (the train/test split is done automatically); a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
      - **parameters**: parameters for initializing the model/AutoML trainer, which may be empty. For example, `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. The detailed parameter lists are as follows:
        - BasicAlgorithm
          - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
          - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
          - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
          - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
        - AutoML
          - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
          - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
## Task Example
### Preparation
First, introduce some general parameters of DolphinScheduler:
#### Conda environment configuration
- **Node name**: the name of the task. The node name in a workflow definition must be unique.
- **Run flag**: indicates whether this node can be scheduled normally; if it does not need to execute, turn on the prohibition-execution switch.
- **Description**: describes the function of the node.
- **Task priority**: when the number of worker threads is insufficient, tasks execute in order of priority from high to low; tasks with the same priority execute first-in, first-out.
- **Worker grouping**: assigns the task to a machine in the worker group; if Default is selected, a worker machine is chosen at random.
- **Environment Name**: the environment in which the script runs.
- **Times of failed retry attempts**: the number of times a failed task is resubmitted.
- **Failed retry interval**: the interval, in minutes, before a failed task is resubmitted.
- **Delayed execution time**: the delay, in minutes, before the task executes.
- **Timeout alarm**: check timeout alarm and timeout failure; when the task runs longer than the configured timeout, an alarm email is sent and the task execution fails.
- **Predecessor task**: selecting a predecessor task sets it as upstream of the current task.
### MLflow Projects
#### BasicAlgorithm
![mlflow-conda-env](/img/tasks/demo/mlflow-basic-algorithm.png)
**Task Parameters**
- **mlflow server tracking uri**: the MLflow server URI, default http://localhost:5000.
- **job type**: the type of task to run, currently including basic algorithms and AutoML; user-defined MLflow Project tasks will be supported later.
- **experiment name**: the experiment in which the task runs; created if it does not exist.
- **register model**: whether to register the model. If selected, the following parameters expand.
  - **registered model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
- **data path**: the absolute path of a file or folder. A file must end with `.csv` (the train/test split is done automatically); a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
The detailed parameter lists are as follows:
  - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
  - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
  - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
  - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
- **algorithm**: the selected algorithm, currently supporting `lr`, `svm`, `lightgbm` and `xgboost` in [scikit-learn](https://scikit-learn.org/) form.
- **Parameter search space**: the parameter search space for the chosen algorithm, which may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm performs the corresponding search. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value.
#### AutoML
![mlflow-automl](/img/tasks/demo/mlflow-automl.png)
**Task Parameters**
- **mlflow server tracking uri**: the MLflow server URI, default http://localhost:5000.
- **job type**: the type of task to run, currently including basic algorithms and AutoML; user-defined MLflow Project tasks will be supported later.
- **experiment name**: the experiment in which the task runs; created if it does not exist.
- **register model**: whether to register the model. If selected, the following parameters expand.
  - **registered model name**: the name under which the model is registered; a new version is added to the existing model and registered as Production.
- **data path**: the absolute path of a file or folder. A file must end with `.csv` (the train/test split is done automatically); a folder must contain `train.csv` and `test.csv` (the suggested way, since users should build their own test sets for model evaluation).
- **parameters**: parameters for initializing the AutoML trainer, which may be empty. For example, `time_budget=30;estimator_list=['lgbm']` for flaml. By convention, parameters are separated by `;`; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. The detailed parameter lists are as follows:
  - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
  - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
- **AutoML tool**: the AutoML tool to use, currently supporting [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML).
### MLflow Models
#### MLFLOW
![mlflow-models-mlflow](/img/tasks/demo/mlflow-models-mlflow.png)
**Task Parameters**
- **mlflow server tracking uri**: the MLflow server URI, default http://localhost:5000.
- **model uri for deployment**: the URI of the model in the mlflow service, supporting the `models:/<model_name>/suffix` and `runs:/` formats.
- **deployment port**: the port on which the service is deployed.
#### Docker
![mlflow-models-docker](/img/tasks/demo/mlflow-models-docker.png)
- **mlflow server tracking uri**: the MLflow server URI, default http://localhost:5000.
- **model uri for deployment**: the URI of the model in the mlflow service, supporting the `models:/<model_name>/suffix` and `runs:/` formats.
- **deployment port**: the port on which the service is deployed.
## Environment preparation
### Conda environment configuration
You need to log in as the admin account to configure a conda environment variable (please first install [anaconda](https://docs.continuum.io/anaconda/install/)
or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing))
@@ -68,13 +122,13 @@ The MLflow task is used to execute MLflow Project tasks, which include preset ba
![mlflow-set-conda-env](/img/tasks/demo/mlflow-set-conda-env.png)
#### Starting the mlflow service
### Starting the mlflow service
Make sure MLflow is installed; you can install it with `pip install mlflow`.
Create a folder where you want to save your experiments and models, then start the mlflow service:
```
```sh
mkdir mlflow
cd mlflow
mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite:///mlflow.db
@@ -82,16 +136,7 @@ mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite://
After running, an mlflow service is started.
### Run the BasicAlgorithm task
The following example shows how to create an mlflow BasicAlgorithm task.
![mlflow-basic-algorithm](/img/tasks/demo/mlflow-basic-algorithm.png)
After execution, you can visit the mlflow service (`http://localhost:5000`) page to view the experiments and models.
You can visit the mlflow service (`http://localhost:5000`) page to view the experiments and models.
![mlflow-server](/img/tasks/demo/mlflow-server.png)
### Run the AutoML task
![mlflow-automl](/img/tasks/demo/mlflow-automl.png)

BIN
docs/img/tasks/demo/mlflow-automl.png (85 KiB → 28 KiB)

BIN
docs/img/tasks/demo/mlflow-basic-algorithm.png (137 KiB → 32 KiB)

BIN
docs/img/tasks/demo/mlflow-models-docker.png (new, 20 KiB)

BIN
docs/img/tasks/demo/mlflow-models-mlflow.png (new, 20 KiB)

47
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowConstants.java

@@ -38,5 +38,50 @@ public class MlflowConstants {
public static final String RUN_PROJECT_AUTOML_SCRIPT = "run_mlflow_automl_project.sh";
public static final String MLFLOW_TASK_TYPE_PROJECTS = "MLflow Projects";
}
public static final String MLFLOW_TASK_TYPE_MODELS = "MLflow Models";
public static final String MLFLOW_MODELS_DEPLOY_TYPE_MLFLOW = "MLFLOW";
public static final String MLFLOW_MODELS_DEPLOY_TYPE_DOCKER = "DOCKER";
/**
* mlflow command
*/
public static final String EXPORT_MLFLOW_TRACKING_URI_ENV = "export MLFLOW_TRACKING_URI=%s";
public static final String SET_DATA_PATH = "data_path=%s";
public static final String SET_REPOSITORY = "repo=%s";
public static final String MLFLOW_RUN_BASIC_ALGORITHM = "mlflow run $repo " +
"-P algorithm=%s " +
"-P data_path=$data_path " +
"-P params=\"%s\" " +
"-P search_params=\"%s\" " +
"-P model_name=\"%s\" " +
"--experiment-name=\"%s\" " +
"--version=main ";
public static final String MLFLOW_RUN_AUTOML_PROJECT = "mlflow run $repo " +
"-P tool=%s " +
"-P data_path=$data_path " +
"-P params=\"%s\" " +
"-P model_name=\"%s\" " +
"--experiment-name=\"%s\" " +
"--version=main ";
public static final String MLFLOW_MODELS_SERVE = "mlflow models serve -m %s --port %s -h 0.0.0.0";
public static final String MLFLOW_BUILD_DOCKER = "mlflow models build-docker -m %s -n %s --enable-mlserver";
public static final String DOCKER_RREMOVE_CONTAINER = "docker rm -f %s";
public static final String DOCKER_RUN = "docker run --name=%s -p=%s:8080 %s";
}
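The command templates added above are filled in with Java's `String.format`. As a quick illustration, here is a Python sketch (using `%` formatting as a stand-in) of the shell commands a Docker deployment would produce; the model URI, image, container name, and port are hypothetical examples:

```python
# Sketch: filling in the Docker-deploy command templates from MlflowConstants.
# These template strings are copied from the diff above; the concrete values
# below are illustrative only.
MLFLOW_BUILD_DOCKER = "mlflow models build-docker -m %s -n %s --enable-mlserver"
DOCKER_REMOVE_CONTAINER = "docker rm -f %s"
DOCKER_RUN = "docker run --name=%s -p=%s:8080 %s"

model_uri = "models:/mymodel/Production"        # hypothetical model URI
image = "mlflow/mymodel:Production"             # derived image name
container = "mlflow-mymodel-Production"         # derived container name
port = "7000"

commands = [
    MLFLOW_BUILD_DOCKER % (model_uri, image),   # build the serving image
    DOCKER_REMOVE_CONTAINER % container,        # remove any stale container
    DOCKER_RUN % (container, port, image),      # run, mapping port -> 8080
]
print("\n".join(commands))
```

The plugin joins such commands with newlines into a single shell script, so the remove/run pair makes redeployment idempotent.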

77
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowParameters.java

@@ -29,7 +29,7 @@ public class MlflowParameters extends AbstractParameters {
private String params = "";
private String mlflowJobType = "BasicAlgorithm";
private String mlflowJobType = "";
/**
* AutoML parameters
@@ -51,12 +51,23 @@ public class MlflowParameters extends AbstractParameters {
* mlflow parameters
*/
private String mlflowTaskType = "";
private String experimentName;
private String modelName = "";
private String mlflowTrackingUri = "http://127.0.0.1:5000";
/**
* mlflow models deploy parameters
*/
private String deployType;
private String deployModelKey;
private String deployPort;
public void setAlgorithm(String algorithm) {
this.algorithm = algorithm;
@@ -90,6 +101,13 @@ public class MlflowParameters extends AbstractParameters {
return dataPath;
}
public void setMlflowTaskType(String mlflowTaskType) {
this.mlflowTaskType = mlflowTaskType;
}
public String getMlflowTaskType() {
return mlflowTaskType;
}
public void setExperimentNames(String experimentName) {
this.experimentName = experimentName;
@@ -131,17 +149,43 @@ public class MlflowParameters extends AbstractParameters {
return automlTool;
}
public void setDeployType(String deployType) {
this.deployType = deployType;
}
public String getDeployType() {
return deployType;
}
public void setDeployModelKey(String deployModelKey) {
this.deployModelKey = deployModelKey;
}
public String getDeployModelKey() {
return deployModelKey;
}
public void setDeployPort(String deployPort) {
this.deployPort = deployPort;
}
public String getDeployPort() {
return deployPort;
}
@Override
public boolean checkParameters() {
Boolean checkResult = experimentName != null && mlflowTrackingUri != null;
if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM)) {
checkResult &= dataPath != null;
} else if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_AUTOML)) {
checkResult &= dataPath != null;
checkResult &= automlTool != null;
} else {
}
Boolean checkResult = true;
// Boolean checkResult = mlflowTrackingUri != null;
// if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM)) {
// checkResult &= dataPath != null;
// checkResult &= experimentName != null;
// } else if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_AUTOML)) {
// checkResult &= dataPath != null;
// checkResult &= automlTool != null;
// checkResult &= experimentName != null;
// } else {
// }
return checkResult;
}
@@ -188,4 +232,17 @@ public class MlflowParameters extends AbstractParameters {
return scriptPath;
}
public String getModelKeyName(String tag) throws IllegalArgumentException{
String imageName;
if (deployModelKey.startsWith("runs:")) {
imageName = deployModelKey.replace("runs:/", "");
} else if (deployModelKey.startsWith("models:")) {
imageName = deployModelKey.replace("models:/", "");
} else {
throw new IllegalArgumentException("model key must start with runs:/ or models:/ ");
}
imageName = imageName.replace("/", tag);
return imageName;
}
};
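The `getModelKeyName` helper above strips the `runs:/` or `models:/` prefix and joins the remaining path with a tag separator, which is how the Docker image tag (tag `":"`) and container name fragment (tag `"-"`) are derived. A Python sketch of the same logic:

```python
# Python sketch of MlflowParameters.getModelKeyName from the diff above:
# turn a model URI into a Docker image tag or container-name fragment.
def model_key_name(deploy_model_key: str, tag: str) -> str:
    if deploy_model_key.startswith("runs:"):
        name = deploy_model_key.replace("runs:/", "")
    elif deploy_model_key.startswith("models:"):
        name = deploy_model_key.replace("models:/", "")
    else:
        raise ValueError("model key must start with runs:/ or models:/")
    return name.replace("/", tag)

print(model_key_name("models:/mymodel/Production", ":"))  # mymodel:Production
print(model_key_name("runs:/a1b2c3/model", "-"))          # a1b2c3-model
```

Anything not prefixed with `runs:/` or `models:/` is rejected, mirroring the `IllegalArgumentException` in the Java version.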

112
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTask.java

@@ -22,16 +22,18 @@ import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.EXIT_COD
import org.apache.dolphinscheduler.plugin.task.api.AbstractTaskExecutor;
import org.apache.dolphinscheduler.plugin.task.api.ShellCommandExecutor;
import org.apache.dolphinscheduler.plugin.task.api.TaskExecutionContext;
import org.apache.dolphinscheduler.plugin.task.api.model.Property;
import org.apache.dolphinscheduler.plugin.task.api.model.TaskResponse;
import org.apache.dolphinscheduler.plugin.task.api.parameters.AbstractParameters;
import org.apache.dolphinscheduler.plugin.task.api.parser.ParamUtils;
import org.apache.dolphinscheduler.plugin.task.api.utils.MapUtils;
import org.apache.dolphinscheduler.plugin.task.api.parser.ParameterUtils;
import org.apache.dolphinscheduler.spi.utils.JSONUtils;
import java.nio.file.Files;
import java.nio.file.Path;
import java.io.*;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* shell task
@@ -62,9 +64,7 @@ public class MlflowTask extends AbstractTaskExecutor {
super(taskExecutionContext);
this.taskExecutionContext = taskExecutionContext;
this.shellCommandExecutor = new ShellCommandExecutor(this::logHandle,
taskExecutionContext,
logger);
this.shellCommandExecutor = new ShellCommandExecutor(this::logHandle, taskExecutionContext, logger);
}
@Override
@@ -101,24 +101,94 @@ public class MlflowTask extends AbstractTaskExecutor {
shellCommandExecutor.cancelApplication();
}
public String buildCommand(){
String command = "";
if (mlflowParameters.getMlflowTaskType().equals(MlflowConstants.MLFLOW_TASK_TYPE_PROJECTS)) {
command = buildCommandForMlflowProjects();
} else if (mlflowParameters.getMlflowTaskType().equals(MlflowConstants.MLFLOW_TASK_TYPE_MODELS)) {
command = buildCommandForMlflowModels();
}
logger.info("mlflow task command: \n{}", command);
return command;
}
/**
* create command
*
* @return file name
* @throws Exception exception
*/
private String buildCommand() throws Exception {
private String buildCommandForMlflowProjects() {
Map<String, Property> paramsMap = getParamsMap();
List<String> args = new ArrayList<>();
args.add(String.format(MlflowConstants.EXPORT_MLFLOW_TRACKING_URI_ENV, mlflowParameters.getMlflowTrackingUri()));
String runCommand;
if (mlflowParameters.getMlflowJobType().equals(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM)) {
args.add(String.format(MlflowConstants.SET_DATA_PATH, mlflowParameters.getDataPath()));
args.add(String.format(MlflowConstants.SET_REPOSITORY, MlflowConstants.PRESET_BASIC_ALGORITHM_PROJECT));
runCommand = MlflowConstants.MLFLOW_RUN_BASIC_ALGORITHM;
runCommand = String.format(runCommand, mlflowParameters.getAlgorithm(), mlflowParameters.getParams(), mlflowParameters.getSearchParams(), mlflowParameters.getModelName(), mlflowParameters.getExperimentName());
} else if (mlflowParameters.getMlflowJobType().equals(MlflowConstants.JOB_TYPE_AUTOML)) {
args.add(String.format(MlflowConstants.SET_DATA_PATH, mlflowParameters.getDataPath()));
args.add(String.format(MlflowConstants.SET_REPOSITORY, MlflowConstants.PRESET_AUTOML_PROJECT));
runCommand = MlflowConstants.MLFLOW_RUN_AUTOML_PROJECT;
runCommand = String.format(runCommand, mlflowParameters.getAutomlTool(), mlflowParameters.getParams(), mlflowParameters.getModelName(), mlflowParameters.getExperimentName());
} else {
runCommand = String.format("Can not support mlflow job type %s", mlflowParameters.getMlflowJobType());
}
args.add(runCommand);
String command = ParameterUtils.convertParameterPlaceholders(String.join("\n", args), ParamUtils.convert(paramsMap));
return command;
}
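The method above joins an environment export, a few variable assignments, and one `mlflow run` line into a single multi-line shell script. A hypothetical Python mirror of that shape (the exact `MlflowConstants` templates are not shown in this diff, so the strings below are illustrative assumptions, not the real constants):

```python
def build_projects_command(tracking_uri: str, data_path: str,
                           repository: str, run_command: str) -> str:
    # Mirrors buildCommandForMlflowProjects(): collect lines, join with "\n".
    args = [
        f"export MLFLOW_TRACKING_URI={tracking_uri}",  # EXPORT_MLFLOW_TRACKING_URI_ENV
        f"data_path={data_path}",                      # SET_DATA_PATH (assumed form)
        f"repo={repository}",                          # SET_REPOSITORY (assumed form)
        run_command,                                   # the formatted `mlflow run` line
    ]
    return "\n".join(args)

script = build_projects_command(
    "http://127.0.0.1:5000",
    "/tmp/iris.csv",
    "file:///tmp/basic-algorithm-project",
    "mlflow run $repo -P algorithm=xgboost --experiment-name demo",
)
print(script)
```

The joined string is later handed to the shell command executor as one script, which is why each argument is a complete shell statement.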
protected String buildCommandForMlflowModels() {
Map<String, Property> paramsMap = getParamsMap();
List<String> args = new ArrayList<>();
args.add(String.format(MlflowConstants.EXPORT_MLFLOW_TRACKING_URI_ENV, mlflowParameters.getMlflowTrackingUri()));
String deployModelKey = mlflowParameters.getDeployModelKey();
if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_MLFLOW)) {
args.add(String.format(MlflowConstants.MLFLOW_MODELS_SERVE, deployModelKey, mlflowParameters.getDeployPort()));
} else if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER)) {
String imageName = "mlflow/" + mlflowParameters.getModelKeyName(":");
String containerName = "mlflow-" + mlflowParameters.getModelKeyName("-");
args.add(String.format(MlflowConstants.MLFLOW_BUILD_DOCKER, deployModelKey, imageName));
args.add(String.format(MlflowConstants.DOCKER_RREMOVE_CONTAINER, containerName));
args.add(String.format(MlflowConstants.DOCKER_RUN, containerName, mlflowParameters.getDeployPort(), imageName));
}
String command = ParameterUtils.convertParameterPlaceholders(String.join("\n", args), ParamUtils.convert(paramsMap));
return command;
}
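The two deploy branches above wrap the MLflow CLI: `MLFLOW` serves the model directly, `DOCKER` builds an image, removes any stale container, and runs a fresh one. A hedged sketch of both branches (the image/container naming and option spellings are assumptions; `getModelKeyName` is not shown in this diff):

```python
def build_models_command(deploy_type: str, model_uri: str, port: str) -> str:
    """Sketch of buildCommandForMlflowModels(); command templates are assumptions."""
    args = ["export MLFLOW_TRACKING_URI=http://127.0.0.1:5000"]
    if deploy_type == "MLFLOW":
        # serve the model in-process via the MLflow CLI
        args.append(f"mlflow models serve -m {model_uri} --port {port}")
    elif deploy_type == "DOCKER":
        # derive image/container names from the model URI (assumed scheme)
        key = model_uri.replace("runs:/", "").replace("/", "-")
        image, container = f"mlflow/{key}", f"mlflow-{key}"
        args.append(f"mlflow models build-docker -m {model_uri} -n {image}")
        args.append(f"docker rm -f {container}")  # drop any stale container first
        args.append(f"docker run --name {container} -p {port}:8080 -d {image}")
    return "\n".join(args)

print(build_models_command("DOCKER", "runs:/a272ec279fc34a8995121ae04281585f/model", "7000"))
```

Removing the old container before `docker run` makes redeploying the same model idempotent instead of failing on a name clash.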
private Map<String, Property> getParamsMap() {
// replace placeholder, and combining local and global parameters
Map<String, Property> paramsMap = ParamUtils.convert(taskExecutionContext, getParameters());
if (MapUtils.isEmpty(paramsMap)) {
paramsMap = new HashMap<>();
}
if (MapUtils.isNotEmpty(taskExecutionContext.getParamsMap())) {
paramsMap.putAll(taskExecutionContext.getParamsMap());
}
return paramsMap;
}
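getParamsMap() above merges node-level parameters with the parameters carried by the task execution context, and the context values win on key collisions. A minimal sketch of that precedence:

```python
def merge_params(node_params, context_params):
    # MapUtils.isEmpty(paramsMap) -> fall back to an empty map
    merged = dict(node_params or {})
    # putAll: task-execution-context values overwrite node-level ones
    merged.update(context_params or {})
    return merged

print(merge_params({"lr": "0.1", "epochs": "10"}, {"lr": "0.01"}))
# → {'lr': '0.01', 'epochs': '10'}
```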
@Override
@@ -126,14 +196,4 @@ public class MlflowTask extends AbstractTaskExecutor {
return mlflowParameters;
}
private String parseScript(String script) {
return ParameterUtils.convertParameterPlaceholders(script, mlflowParameters.getParamsMap());
}
public static String loadRunScript(String scriptPath) throws IOException {
Path path = Paths.get(scriptPath);
byte[] data = Files.readAllBytes(path);
String result = new String(data);
return result;
}
}

dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/test/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTaskTest.java

@@ -22,6 +22,7 @@ import java.util.UUID;
import org.apache.dolphinscheduler.plugin.task.api.TaskExecutionContext;
import org.apache.dolphinscheduler.plugin.task.api.TaskExecutionContextCacheManager;
import org.apache.dolphinscheduler.plugin.task.mlflow.MlflowConstants;
import org.apache.dolphinscheduler.plugin.task.mlflow.MlflowParameters;
import org.apache.dolphinscheduler.plugin.task.mlflow.MlflowTask;
import org.junit.Assert;
@@ -56,7 +57,7 @@ public class MlflowTaskTest {
PowerMockito.mockStatic(PropertyUtils.class);
}
public TaskExecutionContext createContext(MlflowParameters mlflowParameters) {
String parameters = JSONUtils.toJsonString(mlflowParameters);
TaskExecutionContext taskExecutionContext = Mockito.mock(TaskExecutionContext.class);
Mockito.when(taskExecutionContext.getTaskParams()).thenReturn(parameters);
@@ -77,36 +78,43 @@ public class MlflowTaskTest {
}
@Test
public void testInitBasicAlgorithmTask() throws Exception {
MlflowTask mlflowTask = initTask(createBasicAlgorithmParameters());
String command = mlflowTask.buildCommand();
}
@Test
public void testInitAutoMLTask() throws Exception {
MlflowTask mlflowTask = initTask(createAutoMLParameters());
String command = mlflowTask.buildCommand();
}
@Test
public void testModelsDeployMlflow() throws Exception {
MlflowTask mlflowTask = initTask(createModelDeployMlflowParameters());
String command = mlflowTask.buildCommand();
}
@Test
public void testModelsDeployDocker() throws Exception {
MlflowTask mlflowTask = initTask(createModelDeployDockerParameters());
String command = mlflowTask.buildCommand();
}
private MlflowTask initTask(MlflowParameters mlflowParameters) {
TaskExecutionContext taskExecutionContext = createContext(mlflowParameters);
MlflowTask mlflowTask = new MlflowTask(taskExecutionContext);
mlflowTask.init();
mlflowTask.getParameters().setVarPool(taskExecutionContext.getVarPool());
return mlflowTask;
}
private MlflowParameters createBasicAlgorithmParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
mlflowParameters.setMlflowTaskType(MlflowConstants.MLFLOW_TASK_TYPE_PROJECTS);
mlflowParameters.setMlflowJobType(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM);
mlflowParameters.setAlgorithm("xgboost");
mlflowParameters.setDataPaths("xxxxxxxxxx");
mlflowParameters.setExperimentNames("asbbb");
@@ -116,7 +124,8 @@ public class MlflowTaskTest {
private MlflowParameters createAutoMLParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
mlflowParameters.setMlflowTaskType(MlflowConstants.MLFLOW_TASK_TYPE_PROJECTS);
mlflowParameters.setMlflowJobType(MlflowConstants.JOB_TYPE_AUTOML);
mlflowParameters.setAutomlTool("autosklearn");
mlflowParameters.setParams("time_left_for_this_task=30");
mlflowParameters.setDataPaths("xxxxxxxxxxx");
@@ -126,4 +135,23 @@ public class MlflowTaskTest {
return mlflowParameters;
}
private MlflowParameters createModelDeployMlflowParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
mlflowParameters.setMlflowTaskType(MlflowConstants.MLFLOW_TASK_TYPE_MODELS);
mlflowParameters.setDeployType(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_MLFLOW);
mlflowParameters.setMlflowTrackingUris("http://127.0.0.1:5000");
mlflowParameters.setDeployModelKey("runs:/a272ec279fc34a8995121ae04281585f/model");
mlflowParameters.setDeployPort("7000");
return mlflowParameters;
}
private MlflowParameters createModelDeployDockerParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
mlflowParameters.setMlflowTaskType(MlflowConstants.MLFLOW_TASK_TYPE_MODELS);
mlflowParameters.setDeployType(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER);
mlflowParameters.setMlflowTrackingUris("http://127.0.0.1:5000");
mlflowParameters.setDeployModelKey("runs:/a272ec279fc34a8995121ae04281585f/model");
mlflowParameters.setDeployPort("7000");
return mlflowParameters;
}
}

dolphinscheduler-ui/src/locales/modules/en_US.ts

@@ -989,6 +989,10 @@ const project = {
' mlflow server tracking uri can not be empty',
mlflow_jobType: 'job type',
mlflow_automlTool: 'AutoML tool',
mlflow_taskType: 'MLflow Task Type',
mlflow_deployType: 'Deploy Mode',
mlflow_deployModelKey: 'model-uri',
mlflow_deployPort: 'Port',
send_email: 'Send Email',
log_display: 'Log display',
rows_of_result: 'rows of result',

dolphinscheduler-ui/src/locales/modules/zh_CN.ts

@@ -972,6 +972,10 @@ const project = {
mlflow_mlflowTrackingUri_error_tips: ' mlflow server tracking uri 不能为空',
mlflow_jobType: '任务类型',
mlflow_automlTool: 'AutoML工具',
mlflow_taskType: 'MLflow 任务类型',
mlflow_deployType: '部署类型',
mlflow_deployModelKey: '部署的模型uri',
mlflow_deployPort: '监听端口',
send_email: '发送邮件',
log_display: '日志显示',
rows_of_result: '行查询结果',

dolphinscheduler-ui/src/views/projects/task/components/node/fields/index.ts

@@ -66,3 +66,5 @@ export { useNamespace } from './use-namespace'
export { useK8s } from './use-k8s'
export { useJupyter } from './use-jupyter'
export { useMlflow } from './use-mlflow'
export { useMlflowProjects } from './use-mlflow-projects'
export { useMlflowModels } from './use-mlflow-models'

dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-models.ts

@@ -0,0 +1,80 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { useI18n } from 'vue-i18n'
import type { IJsonItem } from '../types'
import { watch, ref } from 'vue'
export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
const { t } = useI18n()
const deployTypeSpan = ref(0)
const deployModelKeySpan = ref(0)
const deployPortSpan = ref(0)
const setFlag = () => {
model.isModels = model.mlflowTaskType === 'MLflow Models'
}
const resetSpan = () => {
deployTypeSpan.value = model.isModels ? 12 : 0
deployModelKeySpan.value = model.isModels ? 24 : 0
deployPortSpan.value = model.isModels ? 12 : 0
}
watch(
() => [model.mlflowTaskType, model.registerModel],
() => {
setFlag()
resetSpan()
}
)
setFlag()
resetSpan()
return [
{
type: 'select',
field: 'deployType',
name: t('project.node.mlflow_deployType'),
span: deployTypeSpan,
options: DEPLOY_TYPE
},
{
type: 'input',
field: 'deployModelKey',
name: t('project.node.mlflow_deployModelKey'),
span: deployModelKeySpan
},
{
type: 'input',
field: 'deployPort',
name: t('project.node.mlflow_deployPort'),
span: deployPortSpan
}
]
}
const DEPLOY_TYPE = [
{
label: 'MLFLOW',
value: 'MLFLOW'
},
{
label: 'DOCKER',
value: 'DOCKER'
}
]

dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-projects.ts

@@ -0,0 +1,255 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { useI18n } from 'vue-i18n'
import type { IJsonItem } from '../types'
import { watch, ref } from 'vue'
export function useMlflowProjects(model: {
[field: string]: any
}): IJsonItem[] {
const { t } = useI18n()
const experimentNameSpan = ref(0)
const registerModelSpan = ref(0)
const modelNameSpan = ref(0)
const mlflowJobTypeSpan = ref(0)
const dataPathSpan = ref(0)
const paramsSpan = ref(0)
const setFlag = () => {
model.isProjects = model.mlflowTaskType === 'MLflow Projects'
}
const resetSpan = () => {
experimentNameSpan.value = model.isProjects ? 12 : 0
registerModelSpan.value = model.isProjects ? 6 : 0
mlflowJobTypeSpan.value = model.isProjects ? 12 : 0
paramsSpan.value = model.isProjects ? 24 : 0
dataPathSpan.value = model.isProjects ? 24 : 0
}
watch(
() => [model.mlflowTaskType, model.registerModel],
() => {
setFlag()
resetSpan()
}
)
watch(
() => [model.registerModel],
() => {
modelNameSpan.value = model.isProjects && model.registerModel ? 6 : 0
}
)
setFlag()
resetSpan()
return [
{
type: 'select',
field: 'mlflowJobType',
name: t('project.node.mlflow_jobType'),
span: mlflowJobTypeSpan,
options: MLFLOW_JOB_TYPE
},
{
type: 'input',
field: 'experimentName',
name: t('project.node.mlflow_experimentName'),
span: experimentNameSpan,
props: {
placeholder: t('project.node.mlflow_experimentName_tips')
},
validate: {
trigger: ['input', 'blur'],
required: false
}
},
{
type: 'switch',
field: 'registerModel',
name: t('project.node.mlflow_registerModel'),
span: registerModelSpan
},
{
type: 'input',
field: 'modelName',
name: t('project.node.mlflow_modelName'),
span: modelNameSpan,
props: {
placeholder: t('project.node.mlflow_modelName_tips')
},
validate: {
trigger: ['input', 'blur'],
required: false
}
},
{
type: 'input',
field: 'dataPath',
name: t('project.node.mlflow_dataPath'),
span: dataPathSpan,
props: {
placeholder: t('project.node.mlflow_dataPath_tips')
}
},
{
type: 'input',
field: 'params',
name: t('project.node.mlflow_params'),
span: paramsSpan,
props: {
placeholder: t('project.node.mlflow_params_tips')
},
validate: {
trigger: ['input', 'blur'],
required: false
}
},
...useBasicAlgorithm(model),
...useAutoML(model)
]
}
export function useBasicAlgorithm(model: {
[field: string]: any
}): IJsonItem[] {
const { t } = useI18n()
const algorithmSpan = ref(0)
const searchParamsSpan = ref(0)
const setFlag = () => {
model.isBasicAlgorithm =
model.mlflowJobType === 'BasicAlgorithm' &&
model.mlflowTaskType === 'MLflow Projects'
}
const resetSpan = () => {
algorithmSpan.value = model.isBasicAlgorithm ? 12 : 0
searchParamsSpan.value = model.isBasicAlgorithm ? 24 : 0
}
watch(
() => [model.mlflowTaskType, model.mlflowJobType],
() => {
setFlag()
resetSpan()
}
)
return [
{
type: 'select',
field: 'algorithm',
name: t('project.node.mlflow_algorithm'),
span: algorithmSpan,
options: ALGORITHM
},
{
type: 'input',
field: 'searchParams',
name: t('project.node.mlflow_searchParams'),
props: {
placeholder: t('project.node.mlflow_searchParams_tips')
},
span: searchParamsSpan,
validate: {
trigger: ['input', 'blur'],
required: false
}
}
]
}
export function useAutoML(model: { [field: string]: any }): IJsonItem[] {
const { t } = useI18n()
const automlToolSpan = ref(0)
const setFlag = () => {
model.isAutoML =
model.mlflowJobType === 'AutoML' &&
model.mlflowTaskType === 'MLflow Projects'
}
const resetSpan = () => {
automlToolSpan.value = model.isAutoML ? 12 : 0
}
watch(
() => [model.mlflowTaskType, model.mlflowJobType],
() => {
setFlag()
resetSpan()
}
)
return [
{
type: 'select',
field: 'automlTool',
name: t('project.node.mlflow_automlTool'),
span: automlToolSpan,
options: AutoMLTOOL
}
]
}
export const MLFLOW_JOB_TYPE = [
{
label: 'BasicAlgorithm',
value: 'BasicAlgorithm'
},
{
label: 'AutoML',
value: 'AutoML'
}
]
export const ALGORITHM = [
{
label: 'svm',
value: 'svm'
},
{
label: 'lr',
value: 'lr'
},
{
label: 'lightgbm',
value: 'lightgbm'
},
{
label: 'xgboost',
value: 'xgboost'
}
]
export const AutoMLTOOL = [
{
label: 'autosklearn',
value: 'autosklearn'
},
{
label: 'flaml',
value: 'flaml'
}
]

dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow.ts

@@ -15,62 +15,22 @@
* limitations under the License.
*/
import { useI18n } from 'vue-i18n'
import type { IJsonItem } from '../types'
import { useMlflowProjects, useMlflowModels } from '.'
export const MLFLOW_TASK_TYPE = [
{
label: 'MLflow Models',
value: 'MLflow Models'
},
{
label: 'MLflow Projects',
value: 'MLflow Projects'
}
]
export function useMlflow(model: { [field: string]: any }): IJsonItem[] {
const { t } = useI18n()
return [
{
@@ -93,92 +53,14 @@ export function useMlflow(model: { [field: string]: any }): IJsonItem[] {
}
}
},
{
type: 'select',
field: 'mlflowTaskType',
name: t('project.node.mlflow_taskType'),
span: 12,
options: MLFLOW_TASK_TYPE
},
...useMlflowProjects(model),
...useMlflowModels(model)
]
}

dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts

@@ -337,7 +337,6 @@ export function formatParams(data: INodeData): {
}
if (data.taskType === 'MLFLOW') {
taskParams.algorithm = data.algorithm
taskParams.params = data.params
taskParams.searchParams = data.searchParams
@@ -348,6 +347,10 @@ export function formatParams(data: INodeData): {
taskParams.mlflowJobType = data.mlflowJobType
taskParams.automlTool = data.automlTool
taskParams.registerModel = data.registerModel
taskParams.mlflowTaskType = data.mlflowTaskType
taskParams.deployType = data.deployType
taskParams.deployPort = data.deployPort
taskParams.deployModelKey = data.deployModelKey
}
if (data.taskType === 'PIGEON') {

dolphinscheduler-ui/src/views/projects/task/components/node/tasks/use-mlflow.ts

@@ -41,8 +41,11 @@ export function useMlflow({
failRetryInterval: 1,
failRetryTimes: 0,
workerGroup: 'default',
algorithm: 'svm',
mlflowTrackingUri: 'http://127.0.0.1:5000',
mlflowTaskType: 'MLflow Models',
deployType: 'MLFLOW',
deployPort: '7000',
mlflowJobType: 'AutoML',
automlTool: 'flaml',
delayTime: 0,

dolphinscheduler-ui/src/views/projects/task/components/node/types.ts

@@ -329,6 +329,10 @@ interface ITaskParams {
mlflowJobType?: string
automlTool?: string
registerModel?: boolean
mlflowTaskType?: string
deployType?: string
deployPort?: string
deployModelKey?: string
}
interface INodeData
