[MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
reproducibility, deployment, and a central model registry.
The MLflow task plugin is used to execute MLflow tasks. It currently supports MLflow Projects and MLflow Models. (Model Registry support will be added soon.)
- MLflow Projects: Package data science code in a format to reproduce runs on any platform.
- MLflow Models: Deploy machine learning models in diverse serving environments.
- Model Registry: Store, annotate, discover, and manage models in a central repository.
The MLflow plugin currently supports and will support the following:
- [ ] MLflow Models
- [x] MLFLOW: Use `mlflow models serve` to deploy a model service
- [x] Docker: Run the container after packaging the docker image
- [x] Docker Compose: Use `docker compose` to run the container; it will replace the `docker run` above
- [ ] Seldon Core: Use Seldon Core to deploy the model to a Kubernetes cluster
- [ ] k8s: Deploy the container directly to a Kubernetes cluster
- [ ] MLflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc.
- [ ] Model Registry
- [ ] Register Model: Allows artifacts (including the model and related parameters and metrics) to be registered directly in the model registry; a sketch of the underlying MLflow call follows below
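
Model Registry support is still on the roadmap, but the operation it would wrap is MLflow's standard registration API. A minimal sketch using that API directly, where `run_id` and `my_model` are hypothetical placeholders:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumed tracking server address

# Placeholder: the ID of an existing run that has logged a model artifact.
run_id = "0123456789abcdef"

# Register the run's model as a new version under a hypothetical model name.
result = mlflow.register_model(f"runs:/{run_id}/model", "my_model")
print(result.name, result.version)
```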
## Task Example
First, introduce some general parameters of DolphinScheduler:
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select the `prohibition execution`.
- **Predecessor task**: Selecting a predecessor task for the current task will set the selected predecessor task as
upstream of the current task.
Here are some specific parameters for the MLflow component:
- **MLflow Tracking Server URI**: The URI of the MLflow Tracking Server; defaults to http://localhost:5000.
- **Experiment Name**: The experiment in which the task runs; it is created if it does not exist. If the name is empty, it is set to `Default`, the same as MLflow.
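
These two parameters map onto MLflow's standard tracking settings. A minimal sketch of the behavior they describe, assuming a tracking server at the default address and a hypothetical experiment name:

```python
import mlflow

# Point the client at the tracking server (the plugin's default URI).
mlflow.set_tracking_uri("http://localhost:5000")

# Creates the experiment if it does not exist
# (the plugin passes "Default" when the name is empty).
mlflow.set_experiment("my_experiment")  # "my_experiment" is a hypothetical name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.2)
    mlflow.log_metric("accuracy", 0.9)
```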
### MLflow Projects
#### BasicAlgorithm
**Task Parameter**
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **Model Name**: The registered model name. A new version is added to the model and registered as Production.
- **Data Path**: The absolute path of a file or folder. A file must end with .csv; a folder must contain train.csv and
  test.csv (in the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameters used when initializing the algorithm/AutoML model, which can be empty. For example,
  the parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated
  by `;`: the name before the equals sign is the parameter name, and the value after it is evaluated through
  Python `eval()` (see the sketch after this list).
- **Algorithm**: The selected algorithm. Currently supports `LR`, `SVM`, `LightGBM` and `XGboost`, based
  on [scikit-learn](https://scikit-learn.org/).
- **Parameter Search Space**: The parameter search space when running the corresponding algorithm, which can be
  empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. The same `;`
  convention applies: the name before the equals sign is the parameter name, and the value after it is evaluated
  through Python `eval()`.
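
Below is a minimal, illustrative sketch of the `;` convention described above; `parse_params` is a hypothetical helper, not the plugin's actual implementation:

```python
def parse_params(param_str: str) -> dict:
    """Split on ';' and evaluate each value with eval(), per the convention above."""
    params = {}
    for pair in param_str.split(";"):
        if not pair.strip():
            continue
        name, _, value = pair.partition("=")
        # The name before '=' is the parameter name; the value string is evaluated.
        params[name.strip()] = eval(value)
    return params

print(parse_params("time_budget=30;estimator_list=['lgbm']"))
# -> {'time_budget': 30, 'estimator_list': ['lgbm']}
print(parse_params("max_depth=[5, 10];n_estimators=[100, 200]"))
# -> {'max_depth': [5, 10], 'n_estimators': [100, 200]}
```

Because values go through `eval()`, they accept any Python literal or expression, which is why lists work for search spaces.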
#### AutoML
**Task Parameter**
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **Model Name**: The registered model name. A new version is added to the model and registered as Production.
- **Data Path**: The absolute path of a file or folder. A file must end with .csv; a folder must contain train.csv and
  test.csv (in the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameters used when initializing the algorithm/AutoML model, which can be empty. For example,
  the parameters `n_estimators=200;learning_rate=0.2` for flaml. By convention, parameters are separated by `;`:
  the name before the equals sign is the parameter name, and the value after it is evaluated through
  Python `eval()`. The detailed parameter list is as follows:
#### Custom Project

**Task Parameter**
- **Parameters**: `--param-list` in `mlflow run`. For example: `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`.
- **Repository**: Repository URL of the MLflow Project; supports both a git address and a directory on the worker. If the project is in a subdirectory, append `#` and the path, the same as `mlflow run`, for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`.
- **Project Version**: Version of the project; defaults to master.
You can now use this feature to run all MLflow Projects on GitHub (for example, [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples)). You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to run your library with one click.
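
Under the hood this corresponds to `mlflow run`. A rough sketch of an equivalent call through MLflow's Python API, reusing the example repository above with hypothetical parameter values:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumed tracking server address

# Run a project straight from GitHub; '#' selects a subdirectory,
# the same convention as `mlflow run`.
mlflow.run(
    uri="https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native",
    version="master",  # the Project Version field (a git branch here)
    parameters={"learning_rate": 0.2, "colsample_bytree": 0.8, "subsample": 0.9},
)
```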
### MLflow Models

- **Model-URI**: The Model-URI of MLflow, supporting the `models:/<model_name>/suffix` format and the `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
- **Port**: The port to listen on.
- **Max Cpu Limit**: For example, `1.0` or `0.5`; the same as Docker Compose.
- **Max Memory Limit**: For example, `1G` or `500M`; the same as Docker Compose.
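
For the MLFLOW deployment mode, the component wraps the `mlflow models serve` CLI. A minimal sketch invoking the same command from Python, where `sample_model` is a hypothetical registered model name:

```python
import subprocess

# Hypothetical registered model, promoted to the Production stage.
model_uri = "models:/sample_model/Production"

# Equivalent of running `mlflow models serve -m <model-uri> --port 7000` directly.
subprocess.run(
    ["mlflow", "models", "serve", "-m", model_uri, "--port", "7000"],
    check=True,
)
```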
## Environment to Prepare
You need to enter the admin account to configure a conda environment variable (please install Anaconda or Miniconda in advance).