
[Feature] [MLOps] support mlflow deploy with docker compose (#10217)

* [Feature] [MLOps] support mlflow deploy with docker compose

fix doc

Update docs/docs/en/guide/task/mlflow.md

fix doc

Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>

revert cancel modification

fix ENV name and docker compose command

* fix doc image link

* fix testModelsDeployDockerCompose

* add docker compose container health check and fix mlflow bug

* update docker compose healthcheck timeout
3.1.0-release
JieguangZhou 2 years ago committed by GitHub
commit 3258438f6e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. docs/docs/en/guide/task/mlflow.md | 104
  2. docs/docs/zh/guide/task/mlflow.md | 78
  3. docs/img/tasks/demo/mlflow-automl.png | BIN
  4. docs/img/tasks/demo/mlflow-basic-algorithm.png | BIN
  5. docs/img/tasks/demo/mlflow-custom-project-template.png | BIN
  6. docs/img/tasks/demo/mlflow-custom-project.png | BIN
  7. docs/img/tasks/demo/mlflow-models-docker-compose.png | BIN
  8. docs/img/tasks/demo/mlflow-models-docker.png | BIN
  9. docs/img/tasks/demo/mlflow-models-mlflow.png | BIN
  10. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowConstants.java | 27
  11. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowParameters.java | 44
  12. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTask.java | 21
  13. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/docker-compose.yml | 39
  14. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/run_mlflow_automl_project.sh | 25
  15. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/run_mlflow_basic_algorithm_project.sh | 25
  16. dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/test/java/org/apache/dolphinler/plugin/task/mlflow/MlflowTaskTest.java | 37
  17. dolphinscheduler-ui/src/locales/en_US/project.ts | 33
  18. dolphinscheduler-ui/src/locales/zh_CN/project.ts | 8
  19. dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-models.ts | 28
  20. dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts | 2
  21. dolphinscheduler-ui/src/views/projects/task/components/node/tasks/use-mlflow.ts | 2
  22. dolphinscheduler-ui/src/views/projects/task/components/node/types.ts | 2

104
docs/docs/en/guide/task/mlflow.md

@@ -5,13 +5,13 @@
 [MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
 reproducibility, deployment, and a central model registry.
-MLflow task plugin used to execute MLflow tasks,Currently contains Mlflow Projects and MLflow Models.(Model Registry will soon be rewarded for support)
+MLflow task plugin used to execute MLflow tasks,Currently contains MLflow Projects and MLflow Models. (Model Registry will soon be rewarded for support)
-- Mlflow Projects: Package data science code in a format to reproduce runs on any platform.
+- MLflow Projects: Package data science code in a format to reproduce runs on any platform.
 - MLflow Models: Deploy machine learning models in diverse serving environments.
 - Model Registry: Store, annotate, discover, and manage models in a central repository.
-The Mlflow plugin currently supports and will support the following:
+The MLflow plugin currently supports and will support the following:
 - [x] MLflow Projects
 - [x] BasicAlgorithm: contains LogisticRegression, svm, lightgbm, xgboost
@@ -20,10 +20,10 @@ The Mlflow plugin currently supports and will support the following:
 - [ ] MLflow Models
 - [x] MLFLOW: Use `MLflow models serve` to deploy a model service
 - [x] Docker: Run the container after packaging the docker image
-- [ ] Docker Compose: Use docker compose to run the container, Will replace the docker run above
+- [x] Docker Compose: Use docker compose to run the container, it will replace the docker run above
 - [ ] Seldon core: Use Selcon core to deploy model to k8s cluster
 - [ ] k8s: Deploy containers directly to K8S
-- [ ] mlflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc
+- [ ] MLflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc
 - [ ] Model Registry
 - [ ] Register Model: Allows artifacts (Including model and related parameters, indicators) to be registered directly into the model center
@@ -37,7 +37,7 @@ The Mlflow plugin currently supports and will support the following:
 ## Task Example
-First, introduce some general parameters of DolphinScheduler
+First, introduce some general parameters of DolphinScheduler:
 - **Node name**: The node name in a workflow definition is unique.
 - **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
@@ -56,6 +56,11 @@ First, introduce some general parameters of DolphinScheduler
 - **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
 upstream of the current task.
+Here are some specific parameters for the MLFlow component:
+- **MLflow Tracking Server URI**: MLflow Tracking Server URI, default http://localhost:5000.
+- **Experiment Name**: Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
 ### MLflow Projects
 #### BasicAlgorithm
@@ -64,24 +69,22 @@ First, introduce some general parameters of DolphinScheduler
 **Task Parameter**
-- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
-- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
-- **register model** :Register the model or not. If register is selected, the following parameters are expanded.
-- **model name** : The registered model name is added to the original model version and registered as
+- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
+- **Model Name**: The registered model name is added to the original model version and registered as
 Production.
-- **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
-test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)
+- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
+test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
-- **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
-parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; 'shards
+- **Parameters**: Parameter when initializing the algorithm/AutoML model, which can be empty. For example
+parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; ' shards
 each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
 sign to get the corresponding parameter value through `python eval()`.
 - [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
 - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
 - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
 - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
-- **algorithm** :The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
+- **Algorithm**:The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
 on [scikit-learn](https://scikit-learn.org/) form.
-- **Parameter search space** : Parameter search space when running the corresponding algorithm, which can be
+- **Parameter Search Space**: Parameter search space when running the corresponding algorithm, which can be
 empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention
 will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name,
 and using the name after the equal sign to get the corresponding parameter value through `python eval()`.
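Both the `Parameters` and the `Parameter Search Space` fields use the same string convention: entries are separated by `;`, the text before `=` is the parameter name, and the text after it is a Python expression that the task evaluates with `eval()`. The Java sketch below illustrates only the splitting step. `ParamStringParser` is a hypothetical name, not the plugin's code, and because evaluation happens on the Python side it keeps every value as an unevaluated expression string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "name=expression;name=expression" convention.
 * Values are kept as raw Python expression strings; evaluating them is out of scope here.
 */
final class ParamStringParser {

    private ParamStringParser() {
    }

    static Map<String, String> parse(String params) {
        // LinkedHashMap keeps the parameters in the order the user typed them.
        Map<String, String> result = new LinkedHashMap<>();
        if (params == null || params.trim().isEmpty()) {
            return result; // both fields may be left empty
        }
        for (String part : params.split(";")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected name=value, got: " + entry);
            }
            // Split on the first '=' only, so the expression part is kept whole.
            result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("max_depth=[5, 10];n_estimators=[100, 200]"));
        // prints {max_depth=[5, 10], n_estimators=[100, 200]}
    }
}
```

This sketch splits on every `;`, so a Python literal that itself contains `;` (for example inside a quoted string) would be cut in two.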
@@ -92,63 +95,56 @@ First, introduce some general parameters of DolphinScheduler
 **Task Parameter**
-- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
-- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
-- **register model** :Register the model or not. If register is selected, the following parameters are expanded.
-- **model name** : The registered model name is added to the original model version and registered as
+- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
+- **model name**: The registered model name is added to the original model version and registered as
 Production.
-- **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
-test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)。
+- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
+test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
-- **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
-parameters `n_estimators=200;learning_rate=0.2` for flaml The convention will be passed with '; 'shards
+- **Parameters**: Parameter when initializing the algorithm/AutoML model, which can be empty. For example
+parameters `n_estimators=200;learning_rate=0.2` for flaml. The convention will be passed with '; 'shards
 each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
 sign to get the corresponding parameter value through `python eval()`. The detailed parameter list is as follows:
 - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
 - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
-- **AutoML tool** : The AutoML tool used, currently
+- **AutoML tool**: The AutoML tool used, currently
 supports [autosklearn](https://github.com/automl/auto-sklearn)
-and [flaml](https://github.com/microsoft/FLAML)
+and [flaml](https://github.com/microsoft/FLAML).
 #### Custom projects
-![mlflow-custom-project-template.png](../../../../img/tasks/demo/mlflow-custom-project-template.png)
+![mlflow-custom-project.png](../../../../img/tasks/demo/mlflow-custom-project.png)
 **Task Parameter**
-- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
-- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
-- **parameters** : `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`
-- **Repository** : Repository url of MLflow Project,Support git address and directory on worker. If it's in a subdirectory,We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
-- **Project Version** : Version of the project,default master
+- **parameters**: `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`.
+- **Repository**: Repository url of MLflow Project,Support git address and directory on worker. If it's in a subdirectory,We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`.
+- **Project Version**: Version of the project,default master.
-You can now use this feature to run all mlFlow projects on Github (For example [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples) )了。You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to use your library with one click.
-The actual interface is as follows
-![mlflow-custom-project.png](../../../../img/tasks/demo/mlflow-custom-project.png)
+You can now use this feature to run all MLFlow projects on Github (For example [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples) ). You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to use your library with one click.
 ### MLflow Models
+General Parameters:
+- **Model-URI**: Model-URI of MLflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
+- **Port**: The port to listen on.
 #### MLFLOW
 ![mlflow-models-mlflow](../../../../img/tasks/demo/mlflow-models-mlflow.png)
-**Task Parameter**
-- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
-- **model-uri** :Model-uri of mlflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
-- **Port** :The port to listen on
 #### Docker
 ![mlflow-models-docker](../../../../img/tasks/demo/mlflow-models-docker.png)
-**Task Parameter**
-- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
-- **model-uri** :Model-uri of mlflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
-- **Port** :The port to listen on
+#### DOCKER COMPOSE
+![mlflow-models-docker-compose](../../../../img/tasks/demo/mlflow-models-docker-compose.png)
+- **Max Cpu Limit**: For example `1.0` or `0.5`, the same as docker compose.
+- **Max Memory Limit**: For example `1G` or `500M`, the same as docker compose.
 ## Environment to prepare
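The new `Max Cpu Limit` and `Max Memory Limit` fields take values in Docker Compose's own notation (`1.0`, `0.5`, `1G`, `500M`). As a hedged sketch, a pre-flight check for those strings could look like the following. `ResourceLimits` and its simplified patterns are assumptions for illustration; they are not the plugin's validation logic and not a full implementation of the Compose specification's value grammar.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical pre-flight check for the DOCKER COMPOSE resource limit fields.
 * The patterns are a simplified assumption, not the Compose specification's full grammar.
 */
final class ResourceLimits {

    // CPU limit: a decimal number such as "1.0" or "0.5".
    private static final Pattern CPU = Pattern.compile("\\d+(\\.\\d+)?");

    // Memory limit: a number with an optional unit b, k/kb, m/mb or g/gb (case-insensitive),
    // such as "1G" or "500M".
    private static final Pattern MEMORY = Pattern.compile("(?i)\\d+(\\.\\d+)?(b|kb?|mb?|gb?)?");

    private ResourceLimits() {
    }

    static boolean isValidCpuLimit(String value) {
        if (value == null || !CPU.matcher(value.trim()).matches()) {
            return false;
        }
        // This sketch only accepts an explicit, strictly positive limit.
        return Double.parseDouble(value.trim()) > 0;
    }

    static boolean isValidMemoryLimit(String value) {
        return value != null && MEMORY.matcher(value.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidCpuLimit("0.5"));       // true
        System.out.println(isValidMemoryLimit("500M"));   // true
        System.out.println(isValidMemoryLimit("500MiB")); // false with this simplified pattern
    }
}
```

Because the patterns are simplified, they may reject some spellings that Docker Compose itself accepts.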
@@ -156,7 +152,7 @@ The actual interface is as follows
 You need to enter the admin account to configure a conda environment variable(Please
 install [anaconda](https://docs.continuum.io/anaconda/install/)
-or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing ) in advance )
+or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing ) in advance).
 ![mlflow-conda-env](../../../../img/tasks/demo/mlflow-conda-env.png)
@@ -167,9 +163,9 @@ Conda environment.
 ### Start the mlflow service
-Make sure you have installed MLflow, using 'PIP Install MLFlow'.
+Make sure you have installed MLflow, using 'pip install mlflow'.
-Create a folder where you want to save your experiments and models and start mlFlow service.
+Create a folder where you want to save your experiments and models and start MLflow service.
 ```sh
 mkdir mlflow
@@ -177,8 +173,8 @@ cd mlflow
 mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite:///mlflow.db
 ```
-After running, an MLflow service is started
+After running, an MLflow service is started.
-After this, you can visit the MLFlow service (`http://localhost:5000`) page to view the experiments and models.
+After this, you can visit the MLflow service (`http://localhost:5000`) page to view the experiments and models.
 ![mlflow-server](../../../../img/tasks/demo/mlflow-server.png)
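The `Model-URI` parameter of the MLflow Models task accepts two MLflow URI forms: `models:/<model_name>/<suffix>`, where the suffix is a model version or a stage such as `Production`, and `runs:/<run_id>/<artifact_path>`. The hypothetical `ModelUri` class below is a minimal sketch that splits such a URI into its parts; it is not part of DolphinScheduler or MLflow, and it deliberately ignores any other URI forms MLflow may accept.

```java
/**
 * Hypothetical helper that splits an MLflow model URI into its parts.
 * Only the two forms named in this document are handled:
 *   models:/<model_name>/<version or stage>
 *   runs:/<run_id>/<artifact_path>
 */
final class ModelUri {

    final String scheme; // "models" or "runs"
    final String id;     // registered model name, or run id
    final String suffix; // version or stage, or artifact path inside the run

    private ModelUri(String scheme, String id, String suffix) {
        this.scheme = scheme;
        this.id = id;
        this.suffix = suffix;
    }

    static ModelUri parse(String uri) {
        String scheme;
        String body;
        if (uri.startsWith("models:/")) {
            scheme = "models";
            body = uri.substring("models:/".length());
        } else if (uri.startsWith("runs:/")) {
            scheme = "runs";
            body = uri.substring("runs:/".length());
        } else {
            throw new IllegalArgumentException("Unsupported model URI: " + uri);
        }
        // Split at the first '/', so nested artifact paths such as "artifacts/model" stay whole.
        int slash = body.indexOf('/');
        if (slash <= 0 || slash == body.length() - 1) {
            throw new IllegalArgumentException("Expected <id>/<suffix> after the scheme: " + uri);
        }
        return new ModelUri(scheme, body.substring(0, slash), body.substring(slash + 1));
    }

    public static void main(String[] args) {
        ModelUri m = parse("models:/iris_model/Production");
        System.out.println(m.scheme + " " + m.id + " " + m.suffix); // models iris_model Production
    }
}
```

The model name `iris_model` and run id `abc123` used in the examples are placeholders.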

78
docs/docs/zh/guide/task/mlflow.md

@@ -4,25 +4,25 @@
 [MLflow](https://mlflow.org) is an excellent open source project in the MLOps field, used to manage the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
 The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow Models. (Model Registry will be supported in the near future.)
-- Mlflow Projects: Package code so that it can run on any platform.
+- MLflow Projects: Package code so that it can run on any platform.
 - MLflow Models: Deploy machine learning models in different serving environments.
-- Model Registry: Store, annotate, discover, and manage models in a central repository (you can also register models yourself inside your mlflow project).
+- Model Registry: Store, annotate, discover, and manage models in a central repository (you can also register models yourself inside your MLflow project).
 The Mlflow component currently supports, and will support, the following:
 - [x] MLflow Projects
 - [x] BasicAlgorithm: basic algorithms, including LogisticRegression, svm, lightgbm, xgboost
 - [x] AutoML: AutoML tools, including autosklean, flaml
 - [x] Custom projects: supports running your own MLflow Projects
 - [ ] MLflow Models
-- [x] MLFLOW: Deploy the model directly with `MLflow models serve`
+- [x] MLFLOW: Deploy the model directly with `mlflow models serve`.
 - [x] Docker: Deploy the model after packaging a DOCKER image
-- [ ] Docker Compose: Deploy the model with Docker Compose; this will replace the Docker deployment above
+- [x] Docker Compose: Deploy the model with Docker Compose; this will replace the Docker deployment above
 - [ ] Seldon core: After building the image, deploy it to a k8s cluster with Seldon Core, which makes Seldon Core's production model management capabilities available
 - [ ] k8s: After building the image, deploy it to a k8s cluster
-- [ ] mlflow deployments: Built-in MLflow deployment modules, such as the built-in deployment to Sagemaker
+- [ ] MLflow deployments: Built-in MLflow deployment modules, such as the built-in deployment to Sagemaker
 - [ ] Model Registry
 - [ ] Register Model: Register the related artifacts (the model and its parameters and metrics) to the model center
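@@ -48,6 +48,12 @@ The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow
 - **Timeout alarm**: If timeout alarm and timeout failure are checked, an alarm email is sent and the task fails once the task exceeds the "timeout duration".
 - **Predecessor task**: Select the predecessor task of the current task; the selected predecessor task is set as the upstream of the current task.
+Here are some common parameters of the MLflow component
+- **MLflow Tracking Server URI**: The connection to the MLflow Tracking Server, default http://localhost:5000.
+- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the experiment name is empty, it is set to `Default`, the same as MLflow.
 ### MLflow Projects
 #### BasicAlgorithm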
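@@ -56,8 +62,6 @@ The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow
 **Task parameters**
-- **mlflow server tracking uri**: The connection to the MLflow server, default http://localhost:5000.
-- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the experiment name is empty, it is set to `Default`, the same as MLflow.
 - **Register model**: Whether to register the model. If registration is selected, the following parameters are expanded.
 - **Registered model name**: The registered model name; a model version is added on top of the original, and the model is registered as Production.
 - **Data path**: The absolute path of a file or folder. A file must end with .csv (the training and test sets are split automatically); a folder must contain train.csv and test.csv (the recommended way; users should build their own test set for model evaluation).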
@@ -66,7 +70,7 @@ The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow
 - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
 - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
 - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
-- **Algorithm**: The selected algorithm; `lr`, `svm`, `lightgbm`, `xgboost` are currently supported in [scikit-learn](https://scikit-learn.org/) form.
+- **Algorithm**: The selected algorithm; `lr`, `svm`, `lightgbm`, `xgboost` are currently supported in [scikit-learn](https://scikit-learn.org/) form
 - **Parameter search space**: The parameter search space for running the corresponding algorithm; it can be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm triggers the corresponding search. By convention, the input is split into parameters on ;, the name before the equal sign is used as the parameter name, and the text after the equal sign is executed with python eval to obtain the corresponding parameter value
 #### AutoML
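@@ -75,8 +79,6 @@ The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow
 **Task parameters**
-- **mlflow server tracking uri**: The connection to the MLflow server, default http://localhost:5000.
-- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the experiment name is empty, it is set to `Default`, the same as MLflow.
 - **Register model**: Whether to register the model. If registration is selected, the following parameters are expanded.
 - **Registered model name**: The registered model name; a model version is added on top of the original, and the model is registered as Production.
 - **Data path**: The absolute path of a file or folder. A file must end with .csv (the training and test sets are split automatically); a folder must contain train.csv and test.csv (the recommended way; users should build their own test set for model evaluation).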
@@ -84,65 +86,61 @@ The MLflow component is used to execute MLflow tasks. It currently contains Mlflow Projects and MLflow
 - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
 - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
 - **AutoML tool**: The AutoML tool used; currently supports [autosklearn](https://github.com/automl/auto-sklearn)
 , [flaml](https://github.com/microsoft/FLAML)
 #### Custom projects
-![mlflow-custom-project-template.png](../../../../img/tasks/demo/mlflow-custom-project-template.png)
+![mlflow-custom-project.png](../../../../img/tasks/demo/mlflow-custom-project.png)
 **Task parameters**
-- **mlflow server tracking uri**: The connection to the MLflow server, default http://localhost:5000.
-- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the experiment name is empty, it is set to `Default`, the same as MLflow.
 - **Parameters**: The --param-list of `mlflow run`, e.g. `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`
-- **Repository**: The repository address of the MLflow Project, which can be a github address or a directory on the worker. If the Mlflow project is in a subdirectory, separate it with `#`, e.g. `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
+- **Repository**: The repository address of the MLflow Project, which can be a github address or a directory on the worker. If the MLflow project is in a subdirectory, separate it with `#`, e.g. `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
 - **Project version**: The version in the project's git version control, default master
 You can now use this feature to run all MLflow Projects on github (e.g. [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples)). You can also create your own machine learning library to reuse your research results, and then use DolphinScheduler to run your algorithm library with one click.
-The actual interface is as follows
-![mlflow-custom-project.png](../../../../img/tasks/demo/mlflow-custom-project.png)
 ### MLflow Models
+Common parameters:
+- **Model URI to deploy**: The URI of the model in the MLflow service; supports the `models:/<model_name>/suffix` and `runs:/` formats.
+- **Listening port**: The port used when deploying the service.
 #### MLFLOW
 ![mlflow-models-mlflow](../../../../img/tasks/demo/mlflow-models-mlflow.png)
-**Task parameters**
-- **mlflow server tracking uri**: The connection to the MLflow server, default http://localhost:5000.
-- **Model uri to deploy**: The uri of the model in the mlflow service; supports the `models:/<model_name>/suffix` and `runs:/` formats.
-- **Deploy port**: The port used when deploying the service.
 #### Docker
 ![mlflow-models-docker](../../../../img/tasks/demo/mlflow-models-docker.png)
-- **mlflow server tracking uri**: The connection to the MLflow server, default http://localhost:5000.
-- **Model uri to deploy**: The uri of the model in the mlflow service; supports the `models:/<model_name>/suffix` and `runs:/` formats.
-- **Deploy port**: The port used when deploying the service.
+#### DOCKER COMPOSE
+![mlflow-models-docker-compose](../../../../img/tasks/demo/mlflow-models-docker-compose.png)
+- **Max CPU limit**: e.g. `1.0` or `0.5`, the same as docker compose.
+- **Max memory limit**: e.g. `1G` or `500M`, the same as docker compose.
 ## Environment preparation
 ### conda environment configuration
 You need to enter the admin account to configure a conda environment variable (please [install anaconda](https://docs.continuum.io/anaconda/install/)
 or [install miniconda](https://docs.conda.io/en/latest/miniconda.html#installing) in advance)
 ![mlflow-conda-env](../../../../img/tasks/demo/mlflow-conda-env.png)
 When configuring tasks later, select the conda environment created above as the environment; otherwise the program cannot find the conda environment
 ![mlflow-set-conda-env](../../../../img/tasks/demo/mlflow-set-conda-env.png)
-### Start the mlflow service
+### Start the MLflow service
-Make sure you have installed mlflow; you can install it with `pip install mlflow`
+Make sure you have installed MLflow; you can install it with `pip install mlflow`
 Create a folder where you want to save experiments and models, then start the mlflow service
 ```sh
 mkdir mlflow
@@ -150,9 +148,9 @@ cd mlflow
 mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite:///mlflow.db
 ```
-An mlflow service is started after running
+An MLflow service is started after running.
-You can view the experiments and models by visiting the mlflow service (`http://localhost:5000`) page
+You can view the experiments and models by visiting the MLflow service (`http://localhost:5000`) page
 ![mlflow-server](../../../../img/tasks/demo/mlflow-server.png)
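The Repository parameter of Custom projects follows the `mlflow run` convention of appending `#<subdirectory>` to the repository location. A minimal sketch of that split, using a hypothetical `ProjectUri` class that is neither the plugin's code nor MLflow's own parser:

```java
/**
 * Hypothetical helper mirroring the '#' convention of `mlflow run`: everything before the
 * first '#' is the repository (git URL or directory on the worker), everything after it is
 * the subdirectory that holds the MLflow project.
 */
final class ProjectUri {

    final String repository;
    final String subdirectory; // empty when the project sits at the repository root

    private ProjectUri(String repository, String subdirectory) {
        this.repository = repository;
        this.subdirectory = subdirectory;
    }

    static ProjectUri parse(String uri) {
        int hash = uri.indexOf('#');
        if (hash < 0) {
            return new ProjectUri(uri, "");
        }
        return new ProjectUri(uri.substring(0, hash), uri.substring(hash + 1));
    }

    public static void main(String[] args) {
        ProjectUri p = parse("https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native");
        System.out.println(p.repository);   // https://github.com/mlflow/mlflow
        System.out.println(p.subdirectory); // examples/xgboost/xgboost_native
    }
}
```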

BIN
docs/img/tasks/demo/mlflow-automl.png

Binary file not shown. Before: 28 KiB. After: 32 KiB.

BIN
docs/img/tasks/demo/mlflow-basic-algorithm.png

Binary file not shown. Before: 32 KiB. After: 33 KiB.

BIN
docs/img/tasks/demo/mlflow-custom-project-template.png

Binary file not shown. Before: 65 KiB.

BIN
docs/img/tasks/demo/mlflow-custom-project.png

Binary file not shown. Before: 82 KiB. After: 35 KiB.

BIN
docs/img/tasks/demo/mlflow-models-docker-compose.png

Binary file not shown. After: 24 KiB.

BIN
docs/img/tasks/demo/mlflow-models-docker.png

Binary file not shown. Before: 20 KiB. After: 17 KiB.

BIN
docs/img/tasks/demo/mlflow-models-mlflow.png

Binary file not shown. Before: 20 KiB. After: 17 KiB.

27
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowConstants.java

@@ -36,19 +36,21 @@ public class MlflowConstants {
     public static final String PRESET_BASIC_ALGORITHM_PROJECT = PRESET_REPOSITORY + "#Project-BasicAlgorithm";
-    public static final String RUN_PROJECT_BASIC_ALGORITHM_SCRIPT = "run_mlflow_basic_algorithm_project.sh";
-    public static final String RUN_PROJECT_AUTOML_SCRIPT = "run_mlflow_automl_project.sh";
     public static final String MLFLOW_TASK_TYPE_PROJECTS = "MLflow Projects";
     public static final String MLFLOW_TASK_TYPE_MODELS = "MLflow Models";
     public static final String MLFLOW_MODELS_DEPLOY_TYPE_MLFLOW = "MLFLOW";
     public static final String MLFLOW_MODELS_DEPLOY_TYPE_DOCKER = "DOCKER";
+    public static final String MLFLOW_MODELS_DEPLOY_TYPE_DOCKER_COMPOSE = "DOCKER COMPOSE";
+    /**
+     * template file
+     */
+    public static final String TEMPLATE_DOCKER_COMPOSE = "docker-compose.yml";
     /**
      * mlflow command
@ -86,9 +88,22 @@ public class MlflowConstants {
public static final String MLFLOW_BUILD_DOCKER = "mlflow models build-docker -m %s -n %s --enable-mlserver"; public static final String MLFLOW_BUILD_DOCKER = "mlflow models build-docker -m %s -n %s --enable-mlserver";
public static final String DOCKER_RREMOVE_CONTAINER = "docker rm -f %s"; public static final String DOCKER_RREMOVE_CONTAINER = "docker rm -f %s";
public static final String DOCKER_RUN = "docker run --name=%s -p=%s:8080 %s"; public static final String DOCKER_RUN = "docker run --name=%s -p=%s:8080 %s";
public static final String DOCKER_COMPOSE_RUN = "docker-compose up -d";
public static final String SET_DOCKER_COMPOSE_ENV = "export DS_TASK_MLFLOW_IMAGE_NAME=%s\n" +
"export DS_TASK_MLFLOW_CONTAINER_NAME=%s\n" +
"export DS_TASK_MLFLOW_DEPLOY_PORT=%s\n" +
"export DS_TASK_MLFLOW_CPU_LIMIT=%s\n" +
"export DS_TASK_MLFLOW_MEMORY_LIMIT=%s";
public static final String DOCKER_HEALTH_CHECK_COMMAND = "for i in $(seq 1 300); " +
"do " +
"[ $(docker inspect --format \"{{json .State.Health.Status }}\" %s) = '\"healthy\"' ] " +
"&& exit 0 && break;sleep 1; " +
"done; docker-compose down; exit 1";
}
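The two new command templates above are plain `java.util.Formatter` strings. The sketch below copies both constants verbatim and formats them with the values used in `testModelsDeployDockerCompose`. It shows that only `%s` is substituted, while `{{json ...}}` and `$(seq 1 300)` reach the shell unchanged. The class and method names are illustrative only and are not part of the plugin. One side note: in the health-check loop, `&& break` after `exit 0` can never run.

```java
// Illustrative sketch: two constants copied from this commit's MlflowConstants,
// expanded with String.format using the values from testModelsDeployDockerCompose.
public class ComposeCommandFormatDemo {

    // Copied from MlflowConstants.SET_DOCKER_COMPOSE_ENV.
    static final String SET_DOCKER_COMPOSE_ENV = "export DS_TASK_MLFLOW_IMAGE_NAME=%s\n" +
            "export DS_TASK_MLFLOW_CONTAINER_NAME=%s\n" +
            "export DS_TASK_MLFLOW_DEPLOY_PORT=%s\n" +
            "export DS_TASK_MLFLOW_CPU_LIMIT=%s\n" +
            "export DS_TASK_MLFLOW_MEMORY_LIMIT=%s";

    // Copied from MlflowConstants.DOCKER_HEALTH_CHECK_COMMAND. Only "%s" is a format
    // specifier; "{{json ...}}" and "$(seq 1 300)" pass through String.format unchanged.
    static final String DOCKER_HEALTH_CHECK_COMMAND = "for i in $(seq 1 300); " +
            "do " +
            "[ $(docker inspect --format \"{{json .State.Health.Status }}\" %s) = '\"healthy\"' ] " +
            "&& exit 0 && break;sleep 1; " +
            "done; docker-compose down; exit 1";

    // Positional order: image, container, port, CPU limit, memory limit.
    static String envCommand(String image, String container, String port, String cpu, String memory) {
        return String.format(SET_DOCKER_COMPOSE_ENV, image, container, port, cpu, memory);
    }

    // The container name is the only value substituted into the polling loop.
    static String healthCheckCommand(String containerName) {
        return String.format(DOCKER_HEALTH_CHECK_COMMAND, containerName);
    }

    public static void main(String[] args) {
        System.out.println(envCommand("mlflow/22222:1", "ds-mlflow-22222-1", "7000", "0.5", "200m"));
        System.out.println(healthCheckCommand("ds-mlflow-22222-1"));
    }
}
```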

44
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowParameters.java

@ -76,6 +76,10 @@ public class MlflowParameters extends AbstractParameters {
private String deployPort;
private String cpuLimit;
private String memoryLimit;
public void setAlgorithm(String algorithm) {
this.algorithm = algorithm;
}
@ -196,6 +200,22 @@ public class MlflowParameters extends AbstractParameters {
return deployPort;
}
public void setCpuLimit(String cpuLimit) {
this.cpuLimit = cpuLimit;
}
public String getCpuLimit() {
return cpuLimit;
}
public void setMemoryLimit(String memoryLimit) {
this.memoryLimit = memoryLimit;
}
public String getMemoryLimit() {
return memoryLimit;
}
@Override
public boolean checkParameters() {
Boolean checkResult = true;
@ -242,19 +262,6 @@ public class MlflowParameters extends AbstractParameters {
paramsMap.put("repo_version", MlflowConstants.PRESET_REPOSITORY_VERSION);
}
public String getScriptPath() {
String projectScript;
if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM)) {
projectScript = MlflowConstants.RUN_PROJECT_BASIC_ALGORITHM_SCRIPT;
} else if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_AUTOML)) {
projectScript = MlflowConstants.RUN_PROJECT_AUTOML_SCRIPT;
} else {
throw new IllegalArgumentException();
}
String scriptPath = MlflowTask.class.getClassLoader().getResource(projectScript).getPath();
return scriptPath;
}
public String getModelKeyName(String tag) throws IllegalArgumentException {
String imageName;
if (deployModelKey.startsWith("runs:")) {
@ -268,4 +275,15 @@ public class MlflowParameters extends AbstractParameters {
return imageName;
}
public String getDockerComposeEnvCommand() {
String imageName = "mlflow/" + getModelKeyName(":");
String env = String.format(MlflowConstants.SET_DOCKER_COMPOSE_ENV, imageName, getContainerName(), deployPort, cpuLimit, memoryLimit);
return env;
}
public String getContainerName(){
String containerName = "ds-mlflow-" + getModelKeyName("-");
return containerName;
}
};
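The body of `getModelKeyName` is elided in this diff, so the helper below is hypothetical. It is a sketch that only reproduces the names the tests expect:

- `runs:/<run_id>/model` becomes `mlflow/<run_id>:model` (image) and `ds-mlflow-<run_id>-model` (container).
- `models:/22222/1` becomes `mlflow/22222:1` and `ds-mlflow-22222-1`.

The separators differ for a reason. `:` in the image name separates the repository from the tag. Docker container names do not accept `:`, so `-` is used there instead.

```java
// Hypothetical helper, NOT the plugin's getModelKeyName(): it only reproduces the image and
// container names that MlflowTaskTest expects for "runs:/..." and "models:/..." URIs.
public class ModelKeyNameSketch {

    // Drops the URI scheme and joins "<id>/<name>" as "<id><sep><name>".
    static String modelKeyName(String deployModelKey, String sep) {
        String rest;
        if (deployModelKey.startsWith("runs:/")) {
            rest = deployModelKey.substring("runs:/".length());
        } else if (deployModelKey.startsWith("models:/")) {
            rest = deployModelKey.substring("models:/".length());
        } else {
            throw new IllegalArgumentException("unsupported model URI: " + deployModelKey);
        }
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected <id>/<name> after the scheme: " + deployModelKey);
        }
        return rest.substring(0, slash) + sep + rest.substring(slash + 1);
    }

    // ':' separates image repository and tag.
    static String imageName(String deployModelKey) {
        return "mlflow/" + modelKeyName(deployModelKey, ":");
    }

    // Container names may not contain ':', so '-' is used instead.
    static String containerName(String deployModelKey) {
        return "ds-mlflow-" + modelKeyName(deployModelKey, "-");
    }

    public static void main(String[] args) {
        System.out.println(imageName("models:/22222/1"));
        System.out.println(containerName("models:/22222/1"));
    }
}
```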

21
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowTask.java

@ -101,7 +101,7 @@ public class MlflowTask extends AbstractTaskExecutor {
shellCommandExecutor.cancelApplication();
}
public String buildCommand(){ public String buildCommand() {
String command = "";
if (mlflowParameters.getMlflowTaskType().equals(MlflowConstants.MLFLOW_TASK_TYPE_PROJECTS)) {
command = buildCommandForMlflowProjects();
@ -146,8 +146,7 @@ public class MlflowTask extends AbstractTaskExecutor {
runCommand = MlflowConstants.MLFLOW_RUN_CUSTOM_PROJECT;
runCommand = String.format(runCommand, mlflowParameters.getParams(), mlflowParameters.getExperimentName(), mlflowParameters.getMlflowProjectVersion());
} } else {
else {
runCommand = String.format("Cant not Support %s", mlflowParameters.getMlflowJobType());
}
@ -173,11 +172,19 @@ public class MlflowTask extends AbstractTaskExecutor {
} else if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER)) {
String imageName = "mlflow/" + mlflowParameters.getModelKeyName(":");
String containerName = "mlflow-" + mlflowParameters.getModelKeyName("-"); String containerName = mlflowParameters.getContainerName();
args.add(String.format(MlflowConstants.MLFLOW_BUILD_DOCKER, deployModelKey, imageName));
args.add(String.format(MlflowConstants.DOCKER_RREMOVE_CONTAINER, containerName));
args.add(String.format(MlflowConstants.DOCKER_RUN, containerName, mlflowParameters.getDeployPort(), imageName));
} else if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER_COMPOSE)) {
String templatePath = getTemplatePath(MlflowConstants.TEMPLATE_DOCKER_COMPOSE);
args.add(String.format("cp %s %s", templatePath, taskExecutionContext.getExecutePath()));
String imageName = "mlflow/" + mlflowParameters.getModelKeyName(":");
args.add(String.format(MlflowConstants.MLFLOW_BUILD_DOCKER, deployModelKey, imageName));
args.add(mlflowParameters.getDockerComposeEnvCommand());
args.add(MlflowConstants.DOCKER_COMPOSE_RUN);
args.add(String.format(MlflowConstants.DOCKER_HEALTH_CHECK_COMMAND, mlflowParameters.getContainerName()));
} }
String command = ParameterUtils.convertParameterPlaceholders(String.join("\n", args), ParamUtils.convert(paramsMap));
@ -197,9 +204,15 @@ public class MlflowTask extends AbstractTaskExecutor {
}
@Override
public AbstractParameters getParameters() {
return mlflowParameters;
}
public String getTemplatePath(String template) {
String templatePath = MlflowTask.class.getClassLoader().getResource(template).getPath();
return templatePath;
}
}
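`getTemplatePath` resolves `docker-compose.yml` with `getResource(template).getPath()`, and the task passes that path to `cp`. This works when resources are loaded from a directory on disk, as in the unit test. When the file sits inside a packaged jar, however, the URL path typically has the form `file:/.../plugin.jar!/docker-compose.yml`, which `cp` cannot open.

One alternative is to stream the resource into the task's execute path with `Files.copy`. The sketch below shows that approach; it is not what this commit does, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Alternative sketch (not this commit's approach): materialize a classpath template into
// the task's execute directory by streaming it, which also works when it lives in a jar.
public class TemplateCopySketch {

    // Writes `in` to targetDir/fileName, replacing any earlier copy, and returns the path.
    static Path copyTemplate(InputStream in, Path targetDir, String fileName) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Looks `name` up relative to the classpath root of the anchor class's loader.
    static Path copyFromClasspath(Class<?> anchor, String name, Path targetDir) throws IOException {
        ClassLoader loader = anchor.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("template not found on classpath: " + name);
            }
            return copyTemplate(in, targetDir, name);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with an in-memory template; in the task, the target directory would be
        // the execute path and the stream would come from copyFromClasspath.
        Path dir = Files.createTempDirectory("ds-mlflow-");
        InputStream demo = new ByteArrayInputStream("version: \"3\"\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(copyTemplate(demo, dir, "docker-compose.yml"));
    }
}
```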

39
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/docker-compose.yml

@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
version: "3"
services:
mlflow-model:
image: "${DS_TASK_MLFLOW_IMAGE_NAME}"
container_name: "${DS_TASK_MLFLOW_CONTAINER_NAME}"
ports:
- "${DS_TASK_MLFLOW_DEPLOY_PORT}:8080"
deploy:
resources:
limits:
cpus: "${DS_TASK_MLFLOW_CPU_LIMIT}"
memory: "${DS_TASK_MLFLOW_MEMORY_LIMIT}"
environment:
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: python
healthcheck:
test: ["CMD", "curl", "http://127.0.0.1:8080/ping"]
interval: 5s
timeout: 5s
retries: 5

25
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/run_mlflow_automl_project.sh

@ -1,25 +0,0 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
data_path=${data_path}
export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
echo $data_path
repo=${repo}
mlflow run $repo -P tool=${automl_tool} -P data_path=$data_path -P params="${params}" -P model_name="${model_name}" --experiment-name="${experiment_name}" --version="${repo_version}"
echo "training finish"

25
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/resources/run_mlflow_basic_algorithm_project.sh

@ -1,25 +0,0 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
data_path=${data_path}
export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
echo $data_path
repo=${repo}
mlflow run $repo -P algorithm=${algorithm} -P data_path=$data_path -P params="${params}" -P search_params="${search_params}" -P model_name="${model_name}" --experiment-name="${experiment_name}" --version="${repo_version}"
echo "training finish"

37
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/test/java/org/apache/dolphinler/plugin/task/mlflow/MlflowTaskTest.java

@ -135,18 +135,37 @@ public class MlflowTaskTest {
}
@Test
public void testModelsDeployDocker() throws Exception { public void testModelsDeployDocker() {
MlflowTask mlflowTask = initTask(createModelDeplyDockerParameters());
Assert.assertEquals(mlflowTask.buildCommand(),
"export MLFLOW_TRACKING_URI=http://127.0.0.1:5000\n" +
"mlflow models build-docker -m runs:/a272ec279fc34a8995121ae04281585f/model " +
"-n mlflow/a272ec279fc34a8995121ae04281585f:model " +
"--enable-mlserver\n" +
"docker rm -f mlflow-a272ec279fc34a8995121ae04281585f-model\n" + "docker rm -f ds-mlflow-a272ec279fc34a8995121ae04281585f-model\n" +
"docker run --name=mlflow-a272ec279fc34a8995121ae04281585f-model " + "docker run --name=ds-mlflow-a272ec279fc34a8995121ae04281585f-model " +
"-p=7000:8080 mlflow/a272ec279fc34a8995121ae04281585f:model");
}
@Test
public void testModelsDeployDockerCompose() throws Exception{
MlflowTask mlflowTask = initTask(createModelDeplyDockerComposeParameters());
Assert.assertEquals(mlflowTask.buildCommand(),
"export MLFLOW_TRACKING_URI=http://127.0.0.1:5000\n" +
"cp " + mlflowTask.getTemplatePath(MlflowConstants.TEMPLATE_DOCKER_COMPOSE) +
" /tmp/dolphinscheduler_test\n" +
"mlflow models build-docker -m models:/22222/1 -n mlflow/22222:1 --enable-mlserver\n" +
"export DS_TASK_MLFLOW_IMAGE_NAME=mlflow/22222:1\n" +
"export DS_TASK_MLFLOW_CONTAINER_NAME=ds-mlflow-22222-1\n" +
"export DS_TASK_MLFLOW_DEPLOY_PORT=7000\n" +
"export DS_TASK_MLFLOW_CPU_LIMIT=0.5\n" +
"export DS_TASK_MLFLOW_MEMORY_LIMIT=200m\n" +
"docker-compose up -d\n" +
"for i in $(seq 1 300); do " +
"[ $(docker inspect --format \"{{json .State.Health.Status }}\" ds-mlflow-22222-1) = '\"healthy\"' ] && exit 0 && break;sleep 1; " +
"done; docker-compose down; exit 1");
}
private MlflowTask initTask(MlflowParameters mlflowParameters) {
TaskExecutionContext taskExecutionContext = createContext(mlflowParameters);
MlflowTask mlflowTask = new MlflowTask(taskExecutionContext);
@ -213,4 +232,16 @@ public class MlflowTaskTest {
mlflowParameters.setDeployPort("7000");
return mlflowParameters;
}
private MlflowParameters createModelDeplyDockerComposeParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
mlflowParameters.setMlflowTaskType(MlflowConstants.MLFLOW_TASK_TYPE_MODELS);
mlflowParameters.setDeployType(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER_COMPOSE);
mlflowParameters.setMlflowTrackingUris("http://127.0.0.1:5000");
mlflowParameters.setDeployModelKey("models:/22222/1");
mlflowParameters.setDeployPort("7000");
mlflowParameters.setCpuLimit("0.5");
mlflowParameters.setMemoryLimit("200m");
return mlflowParameters;
}
}

33
dolphinscheduler-ui/src/locales/en_US/project.ts

@ -608,9 +608,6 @@ export default {
zeppelin_paragraph_id: 'zeppelinParagraphId',
zeppelin_paragraph_id_tips:
'Please enter the paragraph id of your zeppelin paragraph',
zeppelin_parameters: 'parameters',
zeppelin_parameters_tips:
'Please enter the parameters for zeppelin dynamic form',
jupyter_conda_env_name: 'condaEnvName',
jupyter_conda_env_name_tips:
'Please enter the conda environment name of papermill',
@ -634,36 +631,38 @@ export default {
jupyter_others: 'others',
jupyter_others_tips:
'Please enter the other options you need for papermill',
mlflow_algorithm: 'algorithm', mlflow_algorithm: 'Algorithm',
mlflow_algorithm_tips: 'svm',
mlflow_params: 'parameters', mlflow_params: 'Parameters',
mlflow_params_tips: ' ',
mlflow_searchParams: 'Parameter search space', mlflow_searchParams: 'Parameter Search Space',
mlflow_searchParams_tips: ' ',
mlflow_isSearchParams: 'Search parameters', mlflow_isSearchParams: 'Search Parameters',
mlflow_dataPath: 'data path', mlflow_dataPath: 'Data Path',
mlflow_dataPath_tips:
' The absolute path of the file or folder. Ends with .csv for file or contain train.csv and test.csv for folder',
mlflow_dataPath_error_tips: ' data data can not be empty ',
mlflow_experimentName: 'experiment name', mlflow_experimentName: 'Experiment Name',
mlflow_experimentName_tips: 'experiment_001',
mlflow_registerModel: 'register model', mlflow_registerModel: 'Register Model',
mlflow_modelName: 'model name', mlflow_modelName: 'Model Name',
mlflow_modelName_tips: 'model_001',
mlflow_mlflowTrackingUri: 'mlflow server tracking uri', mlflow_mlflowTrackingUri: 'MLflow Tracking Server URI',
mlflow_mlflowTrackingUri_tips: 'http://127.0.0.1:5000',
mlflow_mlflowTrackingUri_error_tips:
' mlflow server tracking uri cant not be empty', 'MLflow Tracking Server URI can not be empty',
mlflow_jobType: 'job type', mlflow_jobType: 'Job Type',
mlflow_automlTool: 'AutoML tool', mlflow_automlTool: 'AutoML Tool',
mlflow_taskType: 'MLflow Task Type',
mlflow_deployType: 'Deploy Mode',
mlflow_deployModelKey: 'model-uri', mlflow_deployModelKey: 'Model-URI',
mlflow_deployPort: 'Port',
mlflowProjectRepository: 'Repository',
mlflowProjectRepository_tips: 'github respository or path on worker',
mlflowProjectVersion: 'Project Version',
mlflowProjectVersion_tips: 'git version',
mlflow_cpuLimit: 'Max Cpu Limit',
mlflow_memoryLimit: 'Max Memory Limit',
openmldb_zk_address: 'zookeeper address',
openmldb_zk_address_tips: 'Please enter the zookeeper address',
openmldb_zk_path: 'zookeeper path',
@ -694,4 +693,4 @@ export default {
'Please enter threshold number is needed',
please_enter_comparison_title: 'please select comparison title'
}
}

8
dolphinscheduler-ui/src/locales/zh_CN/project.ts

@ -637,19 +637,21 @@ export default {
mlflow_registerModel: '注册模型',
mlflow_modelName: '注册的模型名称',
mlflow_modelName_tips: 'model_001',
mlflow_mlflowTrackingUri: 'mlflow server tracking uri', mlflow_mlflowTrackingUri: 'MLflow Tracking Server URI',
mlflow_mlflowTrackingUri_tips: 'http://127.0.0.1:5000',
mlflow_mlflowTrackingUri_error_tips: ' mlflow server tracking uri 不能为空', mlflow_mlflowTrackingUri_error_tips: ' MLflow Tracking Server URI 不能为空',
mlflow_jobType: '任务类型',
mlflow_automlTool: 'AutoML工具',
mlflow_taskType: 'MLflow 任务类型',
mlflow_deployType: '部署类型',
mlflow_deployModelKey: '部署的模型uri', mlflow_deployModelKey: '部署的模型URI',
mlflow_deployPort: '监听端口',
mlflowProjectRepository: '运行仓库',
mlflowProjectRepository_tips: '可以为github仓库或worker上的路径',
mlflowProjectVersion: '项目版本',
mlflowProjectVersion_tips: '项目git版本',
mlflow_cpuLimit: '最大cpu限制',
mlflow_memoryLimit: '最大内存限制',
openmldb_zk_address: 'zookeeper地址',
openmldb_zk_address_tips: '请输入zookeeper地址',
openmldb_zk_path: 'zookeeper路径',

28
dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-mlflow-models.ts

@ -23,6 +23,8 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
const deployTypeSpan = ref(0)
const deployModelKeySpan = ref(0)
const deployPortSpan = ref(0)
const cpuLimitSpan = ref(0)
const memoryLimitSpan = ref(0)
const setFlag = () => {
model.isModels = model.mlflowTaskType === 'MLflow Models' ? true : false
@ -35,13 +37,21 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
}
watch(
() => [model.mlflowTaskType, model.registerModel], () => [model.mlflowTaskType],
() => {
setFlag()
resetSpan()
}
)
watch(
() => [model.deployType],
() => {
cpuLimitSpan.value = model.deployType === "DOCKER COMPOSE" ? 12 : 0
memoryLimitSpan.value = model.deployType === "DOCKER COMPOSE" ? 12 : 0
}
)
setFlag()
resetSpan()
@ -64,6 +74,18 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
field: 'deployPort',
name: t('project.node.mlflow_deployPort'),
span: deployPortSpan
},
{
type: 'input',
field: 'cpuLimit',
name: t('project.node.mlflow_cpuLimit'),
span: cpuLimitSpan
},
{
type: 'input',
field: 'memoryLimit',
name: t('project.node.mlflow_memoryLimit'),
span: memoryLimitSpan
}
]
}
@ -76,5 +98,9 @@ const DEPLOY_TYPE = [
{
label: 'DOCKER',
value: 'DOCKER'
},
{
label: 'DOCKER COMPOSE',
value: 'DOCKER COMPOSE'
}
]

2
dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts

@ -355,6 +355,8 @@ export function formatParams(data: INodeData): {
taskParams.deployModelKey = data.deployModelKey
taskParams.mlflowProjectRepository = data.mlflowProjectRepository
taskParams.mlflowProjectVersion = data.mlflowProjectVersion
taskParams.cpuLimit = data.cpuLimit
taskParams.memoryLimit = data.memoryLimit
}
if (data.taskType === 'OPENMLDB') {

2
dolphinscheduler-ui/src/views/projects/task/components/node/tasks/use-mlflow.ts

@ -49,6 +49,8 @@ export function useMlflow({
mlflowJobType: 'CustomProject',
mlflowProjectVersion: 'master',
automlTool: 'flaml',
cpuLimit: '0.5',
memoryLimit: '500M',
mlflowCustomProjectParameters: [],
delayTime: 0,
timeout: 30,

2
dolphinscheduler-ui/src/views/projects/task/components/node/types.ts

@ -336,6 +336,8 @@ interface ITaskParams {
deployType?: string
deployPort?: string
deployModelKey?: string
cpuLimit?: string
memoryLimit?: string
zk?: string
zkPath?: string
executeMode?: string
