From 136a1830187509c465236274b2e9e423ee13075f Mon Sep 17 00:00:00 2001 From: Jiajie Zhong Date: Thu, 5 May 2022 10:57:46 +0800 Subject: [PATCH] [doc] Separate and correct getting start by docker (#9862) * [doc] Separate and correct getting start by docker * Correct content of starting docker * Remove all FAQ in docker.md and only keep build image by users themself in faq.md * Separate how to use incompatible incompatible datasource from docker.md to directory datasource * correct disable flag * fix download link * fix wrong anchor * remove not support anchor for zh docs --- docs/docs/en/faq.md | 28 + docs/docs/en/guide/datasource/hive.md | 4 + docs/docs/en/guide/datasource/introduction.md | 22 +- docs/docs/en/guide/datasource/mysql.md | 4 + docs/docs/en/guide/datasource/postgresql.md | 4 + docs/docs/en/guide/datasource/spark.md | 4 + docs/docs/en/guide/start/docker.md | 1101 ++--------------- docs/docs/zh/faq.md | 34 + docs/docs/zh/guide/datasource/hive.md | 3 + docs/docs/zh/guide/datasource/introduction.md | 19 +- docs/docs/zh/guide/datasource/mysql.md | 4 + docs/docs/zh/guide/datasource/postgresql.md | 4 + docs/docs/zh/guide/datasource/spark.md | 4 + docs/docs/zh/guide/start/docker.md | 1091 ++-------------- 14 files changed, 333 insertions(+), 1993 deletions(-) diff --git a/docs/docs/en/faq.md b/docs/docs/en/faq.md index 50eaf32fb4..adcb64b297 100644 --- a/docs/docs/en/faq.md +++ b/docs/docs/en/faq.md @@ -706,6 +706,34 @@ After version 3.0.0-alpha, Python gateway server integrate into API server, and start API server. If you want disabled when Python gateway service you could change API server configuration in path `api-server/conf/application.yaml` and change attribute `python-gateway.enabled : false`. 
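As a sketch of the setting described above (assuming the default `api-server/conf/application.yaml` layout, where the dotted property `python-gateway.enabled` maps to nested YAML keys), the relevant fragment would look like:

```yaml
# api-server/conf/application.yaml (fragment) -- disable the Python gateway
python-gateway:
  enabled: false
```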
+## How to Build a Custom Docker Image
+
+DolphinScheduler publishes Docker images alongside every release, and you can find them on Docker Hub. If you want to
+package your own Docker image for customized requirements, the best practice is to write a Dockerfile based on the
+corresponding DolphinScheduler image:
+
+```Dockerfile
+FROM dolphinscheduler-standalone-server
+RUN apt update ; \
+    apt install -y
+```
+
+If you want to modify the source code and then package and distribute your own image, run the following command after
+your changes are complete:
+
+```shell
+./mvnw -B clean deploy \
+    -Dmaven.test.skip \
+    -Dmaven.javadoc.skip \
+    -Dmaven.checkstyle.skip \
+    -Dmaven.deploy.skip \
+    -Ddocker.tag=latest \
+    -Pdocker,release
+```
+
+If you need to modify not only the source code but also the dependencies packaged into the Docker image, modify the
+Dockerfile definitions together with the source code. You can find all the Dockerfile files by running the following
+command in the root directory of the source project:
+
+```shell
+find . -iname 'Dockerfile'
+```
+
 ---
 
 ## We will collect more FAQ later
\ No newline at end of file
diff --git a/docs/docs/en/guide/datasource/hive.md b/docs/docs/en/guide/datasource/hive.md
index 9eff6013f6..6ac14fa479 100644
--- a/docs/docs/en/guide/datasource/hive.md
+++ b/docs/docs/en/guide/datasource/hive.md
@@ -37,3 +37,7 @@ login.user.keytab.username=hdfs-mycluster@ESZ.COM
 # login user from keytab path
 login.user.keytab.path=/opt/hdfs.headless.keytab
 ```
+
+## Native Supported
+
+Yes, you can use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/introduction.md b/docs/docs/en/guide/datasource/introduction.md
index 5095cd685b..1f8121ca16 100644
--- a/docs/docs/en/guide/datasource/introduction.md
+++ b/docs/docs/en/guide/datasource/introduction.md
@@ -2,5 +2,23 @@
 DataSource supports MySQL, PostgreSQL, Hive/Impala, Spark, ClickHouse, Oracle, SQL Server and other DataSources.
 
-- Click "Data Source Center -> Create Data Source" to create different types of DataSources according to requirements.
-- Click "Test Connection" to test whether the DataSource can connect successfully.
\ No newline at end of file
+- Click the "Data Source Center -> Create Data Source" button to create a new datasource.
+- Click "Test Connection" to test whether the DataSource can connect successfully (the datasource can be saved only if the
+  connection test passes).
+
+## Using Datasources Incompatible with the Apache License V2
+
+Some datasources are natively supported by DolphinScheduler, while others require users to download the JDBC driver
+package manually, because those JDBC drivers are incompatible with the Apache License V2. For this reason we have to
+release DolphinScheduler's distribution package without those drivers, even though this makes things more complicated
+for users. Datasources such as MySQL, Oracle and SQL Server are examples, but we have a solution for this.
+
+### Example
+
+For example, if you want to use the MySQL datasource, you need to download the correct JDBC driver from the [mysql maven repository](https://repo1.maven.org/maven2/mysql/mysql-connector-java),
+and move it into the directories `api-server/libs` and `worker-server/libs`. After that, you can activate the MySQL
+datasource by restarting `api-server` and `worker-server`. If you use a container runtime like Docker, mount the driver
+into the container volume at the same paths and restart the container.
+
+> Note: If you only want to use MySQL in the datasource center, there is no requirement for the version of the MySQL JDBC driver.
+> But if you want to use MySQL as the metabase of DolphinScheduler, only versions [8.0.16 and above](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) are supported.
diff --git a/docs/docs/en/guide/datasource/mysql.md b/docs/docs/en/guide/datasource/mysql.md
index 0ac7cda7a1..1648204db4 100644
--- a/docs/docs/en/guide/datasource/mysql.md
+++ b/docs/docs/en/guide/datasource/mysql.md
@@ -12,3 +12,7 @@
 - Database name: enter the database name of the MYSQL connection
 - Jdbc connection parameters: parameter settings for MYSQL connection, in JSON format
 
+## Native Supported
+
+No, read the example section in [introduction](introduction.md) to activate this datasource.
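The driver installation described above can be sketched as a short shell session. The paths are relative to an assumed DolphinScheduler installation directory, and an empty placeholder file stands in for the jar downloaded from the mysql maven repository, so the sketch is self-contained:

```shell
# Placeholder for the driver jar downloaded from the mysql maven repository (illustrative name)
DRIVER_JAR=mysql-connector-java-8.0.16.jar
touch "${DRIVER_JAR}"
# Copy the driver into the lib directory of both servers, then restart api-server and worker-server
for d in api-server/libs worker-server/libs; do
  mkdir -p "${d}"
  cp "${DRIVER_JAR}" "${d}/"
done
```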
+
diff --git a/docs/docs/en/guide/datasource/postgresql.md b/docs/docs/en/guide/datasource/postgresql.md
index fe05169d5f..3083e859c4 100644
--- a/docs/docs/en/guide/datasource/postgresql.md
+++ b/docs/docs/en/guide/datasource/postgresql.md
@@ -11,3 +11,7 @@
 - Password: set the password for PostgreSQL connection
 - Database name: enter the database name of the PostgreSQL connection
 - Jdbc connection parameters: parameter settings for PostgreSQL connection, in JSON format
+
+## Native Supported
+
+Yes, you can use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/spark.md b/docs/docs/en/guide/datasource/spark.md
index 61e2c7dfed..f481966fc0 100644
--- a/docs/docs/en/guide/datasource/spark.md
+++ b/docs/docs/en/guide/datasource/spark.md
@@ -11,3 +11,7 @@
 - Password: set the password for Spark connection
 - Database name: enter the database name of the Spark connection
 - Jdbc connection parameters: parameter settings for Spark connection, in JSON format
+
+## Native Supported
+
+Yes, you can use this datasource by default.
diff --git a/docs/docs/en/guide/start/docker.md b/docs/docs/en/guide/start/docker.md
index 5265fa3ce5..6d7966da63 100644
--- a/docs/docs/en/guide/start/docker.md
+++ b/docs/docs/en/guide/start/docker.md
@@ -1,1024 +1,135 @@
-# Quick Trial Docker Deployment
+# Docker Quick Start
 
-## Pre-conditions
+There are three ways to start DolphinScheduler with Docker. [Standalone-server](#using-standalone-server-docker-image) is the way
+to go if you just want to start and try DolphinScheduler as a beginner. [docker-compose](#using-docker-compose-to-start-server) is for
+those who want to deploy DolphinScheduler for small or even medium scale workflows in their daily work.
+[Using existing PostgreSQL and ZooKeeper servers](#using-exists-postgresql-zookeeper) is for users who want to reuse a database
+or ZooKeeper server that already exists.
+ +## Prepare - [Docker](https://docs.docker.com/engine/install/) 1.13.1+ - [Docker Compose](https://docs.docker.com/compose/) 1.11.0+ -## How to use docker image? - -There are 3 ways to quickly try DolphinScheduler. - -### I. Start DolphinScheduler as docker-compose (recommended) - -This method requires the installation of [docker-compose](https://docs.docker.com/compose/). The installation of docker-compose is widely available online, so please install it yourself. - -For Windows 7-10 versions, you can install [Docker Toolbox](https://github.com/docker/toolbox/releases). For Windows 10 64-bit, you can install [Docker Desktop](https://docs.docker.com/docker-for-windows/install/) and note the [system requirements](https://docs.docker.com/ docker-for-windows/install/#system-requirements). - -#### 0. Please allocate at least 4GB of memory - -For Mac users, click on `Docker Desktop -> Preferences -> Resources -> Memory`. - -For Windows Docker Toolbox users, there are two items that need to be configured. +## Start Server -- **Memory**: Open Oracle VirtualBox Manager and if you double click on Docker Quickstart Terminal and run Docker Toolbox successfully, you will see a virtual machine named `default`. Click on `Settings -> System -> Motherboard -> Memory Size`. -- **Port Forwarding**: Click `Settings -> Network -> Advanced -> Port Forwarding -> Add`. `Name`, fill in `12345` for both `Host Port` and `Subsystem Port`, leave out the `Host IP` and `Subsystem IP`. +### Using standalone-server Docker Image -For Windows Docker Desktop users. 
-- **Hyper-V Mode**: Click `Docker Desktop -> Settings -> Resources -> Memory`
-- **WSL 2 Mode**: Reference [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig)
-
-#### 1.Download the source code package
-
-Please download the source package apache-dolphinscheduler-x.x.x-src.tar.gz from: [download](/en-us/download/download.html)
-
-#### 2.Pull the image and start the service
-
-> For Mac and Linux user, open **Terminal**
-> For Windows Docker Toolbox user, open **Docker Quickstart Terminal**
-> For Windows Docker Desktop user, open **Windows PowerShell**
+Starting DolphinScheduler with the standalone-server Docker image is the easiest way to experience and explore it. In this way,
+you can learn DolphinScheduler's concepts and usage with minimal cost.
+```shell
+$ DOLPHINSCHEDULER_VERSION=
+$ docker run --name dolphinscheduler-standalone-server -p 12345:12345 -p 25333:25333 -d apache/dolphinscheduler-standalone-server:"${DOLPHINSCHEDULER_VERSION}"
+```
-```
-$ tar -zxvf apache-dolphinscheduler--src.tar.gz
-$ cd apache-dolphinscheduler--src/deploy/docker
-$ docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:
-$ docker tag apache/dolphinscheduler: apache/dolphinscheduler:latest
-$ docker-compose up -d
-```
-> PowerShell should use `cd apache-dolphinscheduler--src\deploy\docker`
-
-**PostgreSQL** (user `root`, password `root`, database `dolphinscheduler`) and **ZooKeeper** services will be started by default
-#### 3.Login system
+> Note: Do not use the apache/dolphinscheduler-standalone-server Docker image in production; it is only meant for trying
+> DolphinScheduler for the first time. Not only does it run all services in one single process, it also uses H2 as
+> its database, which loses its metadata after it stops (this can be changed to another database to avoid that).
In addition,
+> apache/dolphinscheduler-standalone-server only contains the DolphinScheduler core services; some tasks, such as Spark and Flink,
+> require external components or an environment to run.
+
+### Using docker-compose to Start Server
+
-To access the front-end page: http://localhost:12345/dolphinscheduler, please change to the corresponding IP address if necessary
-
-The default user is `admin` and the default password is `dolphinscheduler123`.
+The difference between starting services with docker-compose and with standalone-server is whether the servers run in one
+single process or not. Services started with docker-compose run in separate containers, and therefore in separate processes.
+Metadata can be stored on disk after you adjust the docker-compose configuration, which makes it robust and stable for anyone
+who wants to run DolphinScheduler over the long term. You have to install [docker-compose](https://docs.docker.com/compose/install/)
+before you start the servers.
+
-![login](/img/new_ui/dev/quick-start/login.png)
-
-Please refer to the user manual chapter [Quick Start] (... /start/quick-start.md) to see how to use DolphinScheduler.
-
-### II. By specifying the existing PostgreSQL and ZooKeeper services
-
-This method requires the installation of [docker](https://docs.docker.com/engine/install/). The installation of docker is well documented on the web, so please install it yourself.
-
-#### 1.Basic software installation (please install it yourself)
-
-- [PostgreSQL](https://www.postgresql.org/download/) (8.2.15+)
-- [ZooKeeper](https://zookeeper.apache.org/releases.html) (3.4.6+)
-- [Docker](https://docs.docker.com/engine/install/) (1.13.1+)
-
-#### 2. Please login to the PostgreSQL database and create a database named `dolphinscheduler`.
-
-#### 3. Initialize the database and import `sql/dolphinscheduler_postgre.sql` to create tables and import the base data
+After installing docker-compose, it is recommended to modify some configurations for a better experience.
We highly recommend
+increasing the memory available to Docker to at least 4 GB.
+
-#### 4. Download the DolphinScheduler image
+- Mac: Click `Docker Desktop -> Preferences -> Resources -> Memory` to modify it
+- Windows Docker Desktop:
+  - Hyper-V mode: Click `Docker Desktop -> Settings -> Resources -> Memory` to modify it
+  - WSL 2 mode: see [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig) for more detail.
+
-We have uploaded the DolphinScheduler images for users to the docker repository. Instead of building the image locally, users can pull the image from the docker repository by running the following command.
+After completing the configuration, we can get the `docker-compose.yaml` file from the [download page](/en-us/download/download.html)
+in its source package; make sure you get the right version. After downloading the package, you can run the commands below.
+
+```shell
+$ DOLPHINSCHEDULER_VERSION=
+$ tar -zxf apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src.tar.gz
+# Go to docker-compose's location
+# For Mac or Linux users
+$ cd apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src/deploy/docker
+# For Windows users
+$ cd apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src\deploy\docker
+$ docker-compose up -d
+```
-```
-docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:
-```
-
-#### 5. Run a DolphinScheduler instance
-
-```
-$ docker run -d --name dolphinscheduler \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
--p 12345:12345 \
-apache/dolphinscheduler: all
-```
-
-Note: The database user test and password test need to be replaced with the actual PostgreSQL user and password. 192.168.x.x needs to be replaced with the host IP of PostgreSQL and ZooKeeper.
-#### 6.
Login system
+> NOTES: It will not only start the DolphinScheduler servers but also some other necessary services, such as PostgreSQL (with `root`
+> as user, `root` as password and `dolphinscheduler` as database) and ZooKeeper, when starting with docker-compose.
-As above.
+### Using Exists PostgreSQL ZooKeeper
-### III. Running a standalone service in DolphinScheduler
+[Using docker-compose to start server](#using-docker-compose-to-start-server) will create a new database and a new ZooKeeper
+container when it comes up. You can start the DolphinScheduler servers separately if you want to reuse your existing services.
-When the container is started, the following services are automatically started.
-```
- MasterServer ----- master service
- WorkerServer ----- worker service
- ApiApplicationServer ----- api service
- AlertServer ----- alert service
-```
-
-If you just want to run some of the services in dolphinscheduler. You can run some of the services in dolphinscheduler by executing the following command.
-
-* Start a **master server**, as follows:
-
-```
+```shell
+$ DOLPHINSCHEDULER_VERSION=
+# Initialize the database; make sure the database already exists.
+# Use "com.mysql.cj.jdbc.Driver" as the driver class name if you use MySQL.
+$ docker run -d --name dolphinscheduler-tools \
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  apache/dolphinscheduler-tools:"${DOLPHINSCHEDULER_VERSION}" bin/create-schema.sh
+# Start the DolphinScheduler services
$ docker run -d --name dolphinscheduler-master \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-apache/dolphinscheduler: master-server
-```
-
-* Start a **worker server**, as follows:
-
-```
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  -d apache/dolphinscheduler-master:"${DOLPHINSCHEDULER_VERSION}"
$ docker run -d --name dolphinscheduler-worker \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-apache/dolphinscheduler: worker-server
-```
-
-* Start a **api server**, as follows:
-
-```
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  -d
apache/dolphinscheduler-worker:"${DOLPHINSCHEDULER_VERSION}" $ docker run -d --name dolphinscheduler-api \ --e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ --e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ --e ZOOKEEPER_QUORUM="192.168.x.x:2181" \ --p 12345:12345 \ -apache/dolphinscheduler: api-server -``` - -* Start a **alter server**, as follows: - -``` -$ docker run -d --name dolphinscheduler-alert \ --e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \ --e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \ -apache/dolphinscheduler: alert-server -``` - -**NOTE**: When you run some of the services in dolphinscheduler, you must specify these environment variables `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_ PASSWORD`, `ZOOKEEPER_QUORUM`. - -## Environment variables - -Docker containers are configured via environment variables, appendix-environment-variables lists the configurable environment variables for DolphinScheduler and their default values - -In particular, in Docker Compose and Docker Swarm, this can be configured via the environment variable configuration file `config.env.sh`. 
- -## Support Matrix - -| Type | Support | Notes | -| ------------------------------------------------------------ | ------- | --------------------- | -| Shell | Yes | Yes -| Python2 | Yes | Yes -| Python3 | Indirect support | See FAQ | -| Hadoop2 | Indirect support | See FAQ | -| Hadoop3 | Not yet determined | Not yet tested | -| Spark-Local(client) | Indirect support | See FAQ | -| Spark-YARN(cluster) | Indirect support | See FAQ | -| Spark-Standalone(cluster) | not yet | | -| Spark-Kubernetes(cluster) | not yet | -| Flink-Local(local>=1.11) | Not yet | Generic CLI mode is not yet supported -| Flink-YARN(yarn-cluster) | indirectly supported | see FAQ | -| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not yet | Generic CLI mode is not yet supported | -| Flink-Standalone(default) | not yet | -| Flink-Standalone(remote>=1.11) | Not yet | Generic CLI mode is not yet supported | -| Flink-Kubernetes(default) | not yet | | Flink-Kubernetes(default) | not yet | -| Flink-Kubernetes(remote>=1.11) | Not yet | Generic CLI mode is not yet supported | -| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not yet | Generic CLI mode is not yet supported | -| MapReduce | Indirectly supported | See FAQ | -| Kerberos | Indirectly supported | See FAQ | -| HTTP | Yes | Yes -| DataX | Indirect support | See FAQ | Yes -| Sqoop | Indirect Support | See FAQ | -| SQL-MySQL | Indirect Support | See FAQ | -| SQL-PostgreSQL | Yes | Yes -| SQL-Hive | Indirect Support | See FAQ | -| SQL-Spark | Indirect support | See FAQ | -| SQL-ClickHouse | Indirect Support | See FAQ | -| SQL-Oracle | Indirect Support | See FAQ | -| SQL-SQLServer | Indirect Support | See FAQ | -| SQL-DB2 | Indirect Support | See FAQ | - -## FAQ - -### How to manage DolphinScheduler via docker-compose? 
- -Start, restart, stop or list all containers: - -``` -docker-compose start -docker-compose restart -docker-compose stop -docker-compose ps -``` - -Stop all containers and remove all containers, networks: - -``` -docker-compose down -``` - -Stop all containers and remove all containers, networks and storage volumes: - -``` -docker-compose down -v -``` - -### How do I check the logs of a container? - -Lists all running containers: - -``` -docker ps -docker ps --format "{{.Names}}" # Show container name only -``` - -View the logs of the container named docker-swarm_dolphinscheduler-api_1: - -``` -docker logs docker-swarm_dolphinscheduler-api_1 -docker logs -f docker-swarm_dolphinscheduler-api_1 # Follow the latest logs -docker logs --tail 10 docker-swarm_dolphinscheduler-api_1 # Follow the latest ten lines of logs -``` - -### How to scale master and worker with docker-compose? - -Scaling master to 2 instances: - -``` -docker-compose up -d --scale dolphinscheduler-master=2 dolphinscheduler-master -``` - -Scaling worker to 3 instances: - -``` -docker-compose up -d --scale dolphinscheduler-worker=3 dolphinscheduler-worker -``` - -### How to deploy DolphinScheduler on Docker Swarm? - -Assuming the Docker Swarm cluster has been deployed (see [create-swarm](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/) if a Docker Swarm cluster has not been created yet). - -Start a stack called dolphinscheduler: - -``` -docker stack deploy -c docker-stack.yml dolphinscheduler -``` - -List all services of the stack named dolphinscheduler: - -``` -docker stack services dolphinscheduler -``` - -Stop and remove the stack named dolphinscheduler: - -``` -docker stack rm dolphinscheduler -``` - -Remove all storage volumes from the stack named dolphinscheduler: - -``` -docker volume rm -f $(docker volume ls --format "{{.Name}}" | grep -e "^dolphinscheduler") -``` - -### How to scale up and down master and worker on Docker Swarm? 
- -Scaling up the master of a stack named dolphinscheduler to 2 instances: - -``` -docker service scale dolphinscheduler_dolphinscheduler-master=2 -``` - -Scaling up the workers of the stack named dolphinscheduler to 3 instances: - -``` -docker service scale dolphinscheduler_dolphinscheduler-worker=3 -``` - -### How to build a Docker image? - -#### Build from source (requires Maven 3.3+ & JDK 1.8+) - -In a Unix-like system, execute in Terminal: - -```bash -$ bash ./docker/build/hooks/build -``` - -In a Windows system, execute in cmd or PowerShell: - -```bat -C:\dolphinscheduler-src>.\docker\build\hooks\build.bat -``` - -If you don't understand `. /docker/build/hooks/build` `. /docker/build/hooks/build.bat` these scripts, please read the contents. - -#### Build from binary packages (Maven 3.3+ & JDK 1.8+ not required) - -Please download the binary package apache-dolphinscheduler--bin.tar.gz from: [download](/zh-cn/download/download.html). Then put apache-dolphinscheduler--bin.tar.gz into the `apache-dolphinscheduler--src/docker/build` directory and execute it in Terminal or PowerShell: - -``` -$ cd apache-dolphinscheduler--src/docker/build -$ docker build --build-arg VERSION= -t apache/dolphinscheduler: . -``` - -> PowerShell should use `cd apache-dolphinscheduler--src/docker/build` - -#### Building images for multi-platform architectures - -Currently supports building images for `linux/amd64` and `linux/arm64` platform architectures, requiring - -1. support for [docker buildx](https://docs.docker.com/engine/reference/commandline/buildx/) -2. have push permissions for https://hub.docker.com/r/apache/dolphinscheduler (**Be careful**: the build command automatically pushes multi-platform architecture images to the apache/dolphinscheduler docker hub by default) - -Execute : - -```bash -$ docker login # login, use to push apache/dolphinscheduler -$ bash ./docker/build/hooks/build x -``` - -### How to add an environment variable to Docker? 
- -If you want to add additional actions and environment variables at compile time or run time, you can do so in the `/root/start-init-conf.sh` file, and if you need to change the configuration file, please change the corresponding configuration file in `/opt/dolphinscheduler/conf/*.tpl`. configuration file - -For example, add an environment variable `SECURITY_AUTHENTICATION_TYPE` to `/root/start-init-conf.sh`. - -``` -export SECURITY_AUTHENTICATION_TYPE=PASSWORD -``` - -After adding the above environment variables, you should add this environment variable configuration to the corresponding template file `application-api.properties.tpl`: - -``` -security.authentication.type=${SECURITY_AUTHENTICATION_TYPE} -``` - -`/root/start-init-conf.sh` will dynamically generate a configuration file based on the template file. - -```sh -echo "generate dolphinscheduler config" -ls ${DOLPHINSCHEDULER_HOME}/conf/ | grep ".tpl" | while read line; do -eval "cat << EOF -$(cat ${DOLPHINSCHEDULER_HOME}/conf/${line}) -EOF -" > ${DOLPHINSCHEDULER_HOME}/conf/${line%.*} -done -``` - -### How to replace PostgreSQL with MySQL as the database for DolphinScheduler? - -> Due to commercial licensing, we cannot use MySQL driver packages directly. -> -> If you want to use MySQL, you can build it based on the official image `apache/dolphinscheduler`. - -1. Download the MySQL driver package [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) - -2. Create a new `Dockerfile` to add the MySQL driver package: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib -``` - -3. Build a new image containing the MySQL driver package: - -``` -docker build -t apache/dolphinscheduler:mysql-driver -``` - -4. Modify all the image fields in the `docker-compose.yml` file to `apache/dolphinscheduler:mysql-driver`. 
- -> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`. - -5. Comment out the `dolphinscheduler-postgresql` block in the `docker-compose.yml` file. - -6. Add the `dolphinscheduler-mysql` service to the `docker-compose.yml` file (**optional**, you can use an external MySQL database directly). - -7. Modify the DATABASE environment variable in the `config.env.sh` file. - -``` -DATABASE_TYPE=mysql -DATABASE_DRIVER=com.mysql.jdbc.Driver -DATABASE_HOST=dolphinscheduler-mysql -DATABASE_PORT=3306 -DATABASE_USERNAME=root -DATABASE_PASSWORD=root -DATABASE_DATABASE=dolphinscheduler -DATABASE_PARAMS=useUnicode=true&characterEncoding=UTF-8 -``` - -> If you have already added the `dolphinscheduler-mysql` service, set `DATABASE_HOST` to `dolphinscheduler-mysql`. - -8. Run dolphinscheduler (see **How to use a docker image** for details). - -### How do I support MySQL data sources in the Data Source Centre? - -> Due to commercial licensing, we cannot use MySQL's driver packages directly. -> -> If you want to add a MySQL datasource, you can build it based on the official image `apache/dolphinscheduler`. - -1. Download MySQL driver package [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) - -2. Create a new `Dockerfile` to add the MySQL driver package: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib -``` - -3. Build a new image containing the MySQL driver package: - -``` -docker build -t apache/dolphinscheduler:mysql-driver . -``` - -4. Change all `image` fields in the `docker-compose.yml` file to `apache/dolphinscheduler:mysql-driver`. - -> If you want to deploy dolphinscheduler on Docker Swarm, you will need to modify `docker-stack.yml`. - -5. Run dolphinscheduler (see **How to use a docker image** for details). - -6. Add a MySQL data source to the data source centre. 
- -### How to support Oracle data sources in the Data Source Centre? - -> Due to commercial licensing, we cannot use Oracle's driver packages directly. -> -> If you want to add an Oracle datasource, you can build it based on the official image `apache/dolphinscheduler`. - -1. Download the Oracle driver package [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (such as `ojdbc8-19.9.0.0.jar`). - -2. Create a new `Dockerfile` to add the Oracle driver package: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib -``` - -3. Build a new image containing the Oracle driver package: - -``` -docker build -t apache/dolphinscheduler:oracle-driver . -``` - -4. Change all `image` fields in the `docker-compose.yml` file to `apache/dolphinscheduler:oracle-driver`. - -> If you want to deploy dolphinscheduler on Docker Swarm, you will need to modify `docker-stack.yml`. - -5. Run dolphinscheduler (see **How to use a docker image** for details). - -6. Add an Oracle data source to the data source centre. - -### How to support Python 2 pip and custom requirements.txt? - -1. Create a new `Dockerfile` for installing pip: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY requirements.txt /tmp -RUN apt-get update && \ - apt-get install -y --no-install-recommends python-pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt && \ - rm -rf /var/lib/apt/lists/* -``` - -This command will install the default **pip 18.1**. If you want to upgrade pip, just add a line. - - -``` - pip install --no-cache-dir -U pip && \ -``` - -2. Build a new image containing pip. - -``` -docker build -t apache/dolphinscheduler:pip . -``` - -3. Change all `image` fields in the `docker-compose.yml` file to `apache/dolphinscheduler:pip`. - -> If you want to deploy dolphinscheduler on Docker Swarm, you will need to modify `docker-stack.yml`. - -4. 
Run dolphinscheduler (see **How to use docker images** for details). - -5. Verify pip under a new Python task. - -### How do I support Python 3? - -1. Create a new `Dockerfile` for installing Python 3: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -RUN apt-get update && \ - apt-get install -y --no-install-recommends python3 && \ - rm -rf /var/lib/apt/lists/* -``` - -This command will install the default **Python 3.7.3**. If you also want to install **pip3**, replace `python3` with `python3-pip` and you're done. - -``` - apt-get install -y --no-install-recommends python3-pip && \ -``` - -2. Build a new image containing Python 3: - -``` -docker build -t apache/dolphinscheduler:python3 . -``` - -3. Change all `image` fields in the `docker-compose.yml` file to `apache/dolphinscheduler:python3`. - -> If you want to deploy dolphinscheduler on Docker Swarm, you will need to modify `docker-stack.yml`. - -4. Modify `PYTHON_HOME` to `/usr/bin/python3` in the `config.env.sh` file. 
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  -d apache/dolphinscheduler-api:"${DOLPHINSCHEDULER_VERSION}"
+$ docker run -d --name dolphinscheduler-alert-server \
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  -d apache/dolphinscheduler-alert-server:"${DOLPHINSCHEDULER_VERSION}"
+```
+
+> Note: If you start DolphinScheduler this way and do not already have those services, you should install and start
+> [PostgreSQL](https://www.postgresql.org/download/) (8.2.15+) and [ZooKeeper](https://zookeeper.apache.org/releases.html) (3.4.6+) yourself.
+
+## Login DolphinScheduler
+
+You can access the DolphinScheduler web UI at [http://localhost:12345/dolphinscheduler/ui](http://localhost:12345/dolphinscheduler/ui)
+and use `admin` as the default username and `dolphinscheduler123` as the default password on the login page.

-5. Run dolphinscheduler (see **How to use docker images** for details).
-
-6. Verify Python 3 under a new Python task.
-
-### How to support Hadoop, Spark, Flink, Hive or DataX?
-
-Take Spark 2.4.7 as an example:
-
-1. Download the Spark 2.4.7 release binary package `spark-2.4.7-bin-hadoop2.7.tgz`.
-
-2. Run dolphinscheduler (see **How to use a docker image** for details).
-
-3. Copy the Spark 2.4.7 binary package to the Docker container.
- -```bash -docker cp spark-2.4.7-bin-hadoop2.7.tgz docker-swarm_dolphinscheduler-worker_1:/opt/soft -``` - -Because the storage volume `dolphinscheduler-shared-local` is mounted to `/opt/soft`, all files in `/opt/soft` will not be lost. - -4. Login to the container and make sure `SPARK_HOME2` exists. - -```bash -docker exec -it docker-swarm_dolphinscheduler-worker_1 bash -cd /opt/soft -tar zxf spark-2.4.7-bin-hadoop2.7.tgz -rm -f spark-2.4.7-bin-hadoop2.7.tgz -ln -s spark-2.4.7-bin-hadoop2.7 spark2 # or mv -$SPARK_HOME2/bin/spark-submit --version -``` - -If everything executes correctly, the last command will print the Spark version information. - -5. Verify Spark in a shell task. - -``` -$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.11-2.4.7.jar -``` - -Check if the task log contains the output `Pi is roughly 3.146015`. - -6. Verifying Spark in a Spark Task - -The file `spark-examples_2.11-2.4.7.jar` needs to be uploaded to the Resource Center first, then a Spark task created and set up: - -- Spark version: `SPARK2` -- Class of main function: `org.apache.spark.examples.SparkPi` -- Main package: `spark-examples_2.11-2.4.7.jar` -- Deployment method: `local` - -Similarly, check if the task log contains output `Pi is roughly 3.146015` - -7. Verifying Spark on YARN - -Spark on YARN (deployed as a `cluster` or `client`) requires Hadoop support. Similar to Spark support, supporting Hadoop is almost identical to the previous steps. - -Make sure `$HADOOP_HOME` and `$HADOOP_CONF_DIR` are present. - -### How is Spark 3 supported? - -In fact, submitting an application using `spark-submit` is the same, whether it is Spark 1, 2 or 3. In other words, the semantics of `SPARK_HOME2` is a second `SPARK_HOME`, not the `HOME` of `SPARK2`, so simply setting `SPARK_HOME2=/path/to/ spark3`. - -Let's take Spark 3.1.1 as an example: - -1. Download the Spark 3.1.1 release binary package `spark-3.1.1-bin-hadoop2.7.tgz`. 
- -2. Run dolphinscheduler (see **How to use a docker image** for details). - -3. Copy the Spark 3.1.1 binary package to the Docker container - -```bash -docker cp spark-3.1.1-bin-hadoop2.7.tgz docker-swarm_dolphinscheduler-worker_1:/opt/soft -``` - -4. log in to the container and ensure that `SPARK_HOME2` exists - -```bash -docker exec -it docker-swarm_dolphinscheduler-worker_1 bash -cd /opt/soft -tar zxf spark-3.1.1-bin-hadoop2.7.tgz -rm -f spark-3.1.1-bin-hadoop2.7.tgz -ln -s spark-3.1.1-bin-hadoop2.7 spark2 # or mv -$SPARK_HOME2/bin/spark-submit --version -``` - -If everything executes correctly, the last command will print the Spark version information. - -5. Verify Spark in a shell task. - -``` -$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.12-3.1.1.jar -``` - -Check if the task log contains the output `Pi is roughly 3.146015`. - -### How to support shared storage between Master, Worker and Api services? - -> **Note**: If you are deploying via docker-compose on a single machine, steps 1 and 2 can be skipped and you can execute commands like `docker cp hadoop-3.2.2.tar.gz docker-swarm_dolphinscheduler-worker_1:/opt/soft ' Place Hadoop in the container under the shared directory /opt/soft. - -For example, the Master, Worker and Api services may use Hadoop at the same time. - -1. Modify the `dolphinscheduler-shared-local` storage volume in the `docker-compose.yml` file to support nfs. - -> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`. - -```yaml -volumes: - dolphinscheduler-shared-local: - driver_opts: - type: "nfs" - o: "addr=10.40.0.199,nolock,soft,rw" - device: ":/path/to/shared/dir" -``` - - -2. Put Hadoop into nfs. - -3. Make sure `$HADOOP_HOME` and `$HADOOP_CONF_DIR` are correct. - -### How to support local file storage instead of HDFS and S3? 
- -> **Note**: If you are deploying on a standalone machine via docker-compose, you can skip step 2. - -1. modify the following environment variables in the `config.env.sh` file: - -``` -RESOURCE_STORAGE_TYPE=HDFS -FS_DEFAULT_FS=file:/// -``` - -2. Modify the `dolphinscheduler-resource-local` storage volume in the `docker-compose.yml` file to support nfs. - -> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`. - -```yaml -volumes: - dolphinscheduler-resource-local: - driver_opts: - type: "nfs" - o: "addr=10.40.0.199,nolock,soft,rw" - device: ":/path/to/resource/dir" -``` - -### How do I support S3 resource stores such as MinIO? - -Take MinIO as an example: Modify the following environment variables in the `config.env.sh` file. - -``` -RESOURCE_STORAGE_TYPE=S3 -RESOURCE_UPLOAD_PATH=/dolphinscheduler -FS_DEFAULT_FS=s3a://BUCKET_NAME -FS_S3A_ENDPOINT=http://MINIO_IP:9000 -FS_S3A_ACCESS_KEY=MINIO_ACCESS_KEY -FS_S3A_SECRET_KEY=MINIO_SECRET_KEY -``` - -`BUCKET_NAME`, `MINIO_IP`, `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` need to be changed to actual values. - -> **NOTE**: `MINIO_IP` can only use IPs and not domain names, as DolphinScheduler does not yet support S3 path style access. - -### How to configure SkyWalking? - -Modify the SKYWALKING environment variables in the `config.env.sh` file. - -``` -SKYWALKING_ENABLE=true -SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800 -SW_GRPC_LOG_SERVER_HOST=127.0.0.1 -SW_GRPC_LOG_SERVER_PORT=11800 -``` - -## Appendix - Environment Variables - -### Database - -**`DATABASE_TYPE`** - -Configure the `TYPE` of the `database`, default value `postgresql`. - -**NOTE**: This environment variable must be specified when running the `master-server`, `worker-server`, `api-server`, and `alert-server` services in `dolphinscheduler`, so that you can build distributed services better. - -**`DATABASE_DRIVER`** - -Configure `DRIVER` for `database`, default value `org.postgresql.Driver`. 
- -**NOTE**: This environment variable must be specified when running the `master-server`, `worker-server`, `api-server`, and `alert-server` services in `dolphinscheduler`, so that you can better build distributed services. - -**`DATABASE_HOST`** - -Configure the `HOST` of `database`, default value `127.0.0.1`. - -**NOTE**: This environment variable must be specified when running `master-server`, `worker-server`, `api-server`, `alert-server` services in `dolphinscheduler` so that you can build distributed services better. - -**`DATABASE_PORT`** - -Configure `PORT` for `database`, default value `5432`. - -**NOTE**: This environment variable must be specified when running `master-server`, `worker-server`, `api-server`, `alert-server` services in `dolphinscheduler` so that you can build distributed services better. - -**`DATABASE_USERNAME`** - -Configure the `USERNAME` of `database`, default value `root`. - -**NOTE**: This environment variable must be specified when running `master-server`, `worker-server`, `api-server`, `alert-server` services in `dolphinscheduler`, so that you can build distributed services better. - -**`DATABASE_PASSWORD`** - -Configure `PASSWORD` for `database`, default value `root`. - -**NOTE**: This environment variable must be specified when running `master-server`, `worker-server`, `api-server`, `alert-server` services in `dolphinscheduler`, so that you can better build distributed services. - -**`DATABASE_DATABASE`** - -Configure `DATABASE` for `database`, default value `dolphinscheduler`. - -**NOTE**: This environment variable must be specified when running the `master-server`, `worker-server`, `api-server`, and `alert-server` services in `dolphinscheduler`, so that you can better build distributed services. - -**`DATABASE_PARAMS`** - -Configure `PARAMS` for `database`, default value `characterEncoding=utf8`. 
- -**NOTE**: This environment variable must be specified when running `master-server`, `worker-server`, `api-server`, `alert-server` services in `dolphinscheduler`, so that you can build distributed services better. - -### ZooKeeper - -**`ZOOKEEPER_QUORUM`** - -Configure the `Zookeeper` address for `dolphinscheduler`, default value `127.0.0.1:2181`. - -**NOTE**: This environment variable must be specified when running the `master-server`, `worker-server`, `api-server` services in `dolphinscheduler`, so that you can build distributed services better. - -**`ZOOKEEPER_ROOT`** - -Configure `dolphinscheduler` as the root directory for data storage in `zookeeper`, default value `/dolphinscheduler`. - -### General - -**`DOLPHINSCHEDULER_OPTS`** - -Configure `jvm options` for `dolphinscheduler`, for `master-server`, `worker-server`, `api-server`, `alert-server`, default `""`, - -**`DATA_BASEDIR_PATH`** - -User data directory, user configured, make sure it exists and user read/write access, default value `/tmp/dolphinscheduler`. - -**`RESOURCE_STORAGE_TYPE`** - -Configure the resource storage type for `dolphinscheduler`, options are `HDFS`, `S3`, `NONE`, default `HDFS`. - -**`RESOURCE_UPLOAD_PATH`** - -Configure the resource storage path on `HDFS/S3`, default value `/dolphinscheduler`. - -**`FS_DEFAULT_FS`** - -Configure the file system protocol for the resource store, e.g. `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler`, default value `file:///`. - -**`FS_S3A_ENDPOINT`** - -When `RESOURCE_STORAGE_TYPE=S3`, the access path to `S3` needs to be configured, default value `s3.xxx.amazonaws.com`. - -**`FS_S3A_ACCESS_KEY`** - -When `RESOURCE_STORAGE_TYPE=S3`, you need to configure the `s3 access key` of `S3`, default value `xxxxxxx`. - -**`FS_S3A_SECRET_KEY`** - -When `RESOURCE_STORAGE_TYPE=S3`, you need to configure `s3 secret key` for `S3`, default value `xxxxxxx`. 
- -**`HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE`** - -Configure whether `dolphinscheduler` is kerberos enabled, default value `false`. - -**`JAVA_SECURITY_KRB5_CONF_PATH`** - -Configure the path to java.security.krb5.conf for `dolphinscheduler`, default value `/opt/krb5.conf`. - -**`LOGIN_USER_KEYTAB_USERNAME`** - -Configure the keytab username for the `dolphinscheduler` login user, default value `hdfs@HADOOP.COM`. - -**`LOGIN_USER_KEYTAB_PATH`** - -Configure the keytab path for the `dolphinscheduler` login user, default value `/opt/hdfs.keytab`. - -**`KERBEROS_EXPIRE_TIME`** - -Configure the kerberos expiration time for `dolphinscheduler`, in hours, default value `2`. - -**`HDFS_ROOT_USER`** - -Configure the root user name of hdfs for `dolphinscheduler` when `RESOURCE_STORAGE_TYPE=HDFS`, default value `hdfs`. - -**`RESOURCE_MANAGER_HTTPADDRESS_PORT`** - -Configure the resource manager httpaddress port for `dolphinscheduler`, default value `8088`. - -**`YARN_RESOURCEMANAGER_HA_RM_IDS`** - -Configure `dolphinscheduler`'s yarn resourcemanager ha rm ids, default value `null`. - -**`YARN_APPLICATION_STATUS_ADDRESS`** - -Configure the yarn application status address for `dolphinscheduler`, default value `http://ds1:%s/ws/v1/cluster/apps/%s`. - -**`SKYWALKING_ENABLE`** - -Configure whether `skywalking` is enabled or not. Default value `false`. - -**`SW_AGENT_COLLECTOR_BACKEND_SERVICES`** - -Configure the collector back-end address for `skywalking`. Default value `127.0.0.1:11800`. - -**`SW_GRPC_LOG_SERVER_HOST`** - -Configure the grpc service host or IP for `skywalking`. Default value `127.0.0.1`. - -**`SW_GRPC_LOG_SERVER_PORT`** - -Configure the grpc service port for `skywalking`. Default value `11800`. - -**`HADOOP_HOME`** - -Configure `HADOOP_HOME` for `dolphinscheduler`, default value `/opt/soft/hadoop`. - -**`HADOOP_CONF_DIR`** - -Configure `HADOOP_CONF_DIR` for `dolphinscheduler`, default value `/opt/soft/hadoop/etc/hadoop`. 
- -**`SPARK_HOME1`** - -Configure `SPARK_HOME1` for `dolphinscheduler`, default value `/opt/soft/spark1`. - -**`SPARK_HOME2`** - -Configure `SPARK_HOME2` for `dolphinscheduler`, default value `/opt/soft/spark2`. - -**`PYTHON_HOME`** - -Configure `PYTHON_HOME` for `dolphinscheduler`, default value `/usr/bin/python`. - -**`JAVA_HOME`** - -Configure `JAVA_HOME` for `dolphinscheduler`, default value `/usr/local/openjdk-8`. - -**`HIVE_HOME`** - -Configure `HIVE_HOME` for `dolphinscheduler`, default value `/opt/soft/hive`. - -**`FLINK_HOME`** - -Configure `FLINK_HOME` for `dolphinscheduler`, default value `/opt/soft/flink`. - -**`DATAX_HOME`** - -Configure `DATAX_HOME` for `dolphinscheduler`, default value `/opt/soft/datax`. - -### Master Server - -**`MASTER_SERVER_OPTS`** - -Configure `jvm options` for `master-server`, default value `-Xms1g -Xmx1g -Xmn512m`. - -**`MASTER_EXEC_THREADS`** - -Configure the number of threads to be executed in `master-server`, default value `100`. - -**`MASTER_EXEC_TASK_NUM`** - -Configure the number of tasks to be executed in `master-server`, default value `20`. - -**`MASTER_DISPATCH_TASK_NUM`** - -Configure the number of tasks to be dispatched in `master-server`, default value `3`. - -**`MASTER_HOST_SELECTOR`** - -Configure the selector for the worker host when dispatching tasks in `master-server`, optional values are `Random`, `RoundRobin` and `LowerWeight`, default value `LowerWeight`. - -**`MASTER_HEARTBEAT_INTERVAL'** - -Configure the heartbeat interaction time in `master-server`, default value `10`. - -**`MASTER_TASK_COMMIT_RETRYTIMES`** - -Configure the number of task commit retries in `master-server`, default value `5`. - -**`MASTER_TASK_COMMIT_INTERVAL`** - -Configure the task commit interaction time in `master-server`, default value `1`. - -**`MASTER_MAX_CPULOAD_AVG`** - -Configure the `load average` value in the CPU in `master-server`, default value `-1`. 
- -**`MASTER_RESERVED_MEMORY`** - -Configure the reserved memory in G for `master-server`, default value `0.3`. - -### Worker Server - -**`WORKER_SERVER_OPTS`** - -Configure `jvm options` for `worker-server`, default value `-Xms1g -Xmx1g -Xmn512m`. - -**`WORKER_EXEC_THREADS`** - -Configure the number of threads to be executed in `worker-server`, default value `100`. - -**`WORKER_HEARTBEAT_INTERVAL`** - -Configure the heartbeat interaction time in `worker-server`, default value `10`. - -**`WORKER_MAX_CPULOAD_AVG`** - -Configure the maximum `load average` value in the CPU in `worker-server`, default value `-1`. - -**`WORKER_RESERVED_MEMORY`** - -Configure the reserved memory in G for `worker-server`, default value `0.3`. - -**`WORKER_GROUPS`** - -Configure the grouping of `worker-server`, default value `default`. - -### Alert Server - -**`ALERT_SERVER_OPTS`** - -Configure `jvm options` for `alert-server`, default value `-Xms512m -Xmx512m -Xmn256m`. - -**`XLS_FILE_PATH`** - -Configure the path to store `XLS` files for the `alert-server`, default value `/tmp/xls`. - -**`MAIL_SERVER_HOST`** - -Configure the mail service address for `alert-server`, default value `empty`. - -**`MAIL_SERVER_PORT`** - -Configure the mail service port for `alert-server`, default value `empty`. - -**`MAIL_SENDER`** - -Configure the mail sender for `alert-server`, default value `empty`. - -**`MAIL_USER`** - -Configure the user name of the mail service for `alert-server`, default value `empty`. - -**`MAIL_PASSWD`** - -Configure the mail service user password for `alert-server`, default value `empty`. - -**`MAIL_SMTP_STARTTLS_ENABLE`** - -Configure whether TLS is enabled for `alert-server`'s mail service, default value `true`. - -**`MAIL_SMTP_SSL_ENABLE`** - -Configure whether the mail service of `alert-server` is SSL enabled or not, default value `false`. - -**`MAIL_SMTP_SSL_TRUST`** - -Configure the trusted address for SSL for `alert-server`'s mail service, default value `null`. 
-
-**`ENTERPRISE_WECHAT_ENABLE`**
-
-Configure whether the mail service of `alert-server` has Enterprise Wechat enabled, default value `false`.
-
-**`ENTERPRISE_WECHAT_CORP_ID`**
-
-Configures the Enterprise Wechat `ID` of the mail service for `alert-server`, default value `null`.
-
-**`ENTERPRISE_WECHAT_SECRET`**
-
-Configure the mail service enterprise wechat `SECRET` for `alert-server`, default value `Empty`.
-
-**`ENTERPRISE_WECHAT_AGENT_ID`**
-
-Configure `AGENT_ID` of the mail service enterprise wechat for `alert-server`, default value `Empty`.
-
-**`ENTERPRISE_WECHAT_USERS`**
-
-Configure `USERS` for the mail service enterprise microsoft for `alert-server`, default value `empty`.
+![login](/img/new_ui/dev/quick-start/login.png)

-### Api Server
+> Note: If you start the services the [using existing PostgreSQL and ZooKeeper](#using-exists-postgresql-zookeeper) way and
+> run them across multiple machines, you should change the URL domain from `localhost` to the IP or hostname of the machine the API server runs on.

-**`API_SERVER_OPTS`**
+## Change Environment Variable

-Configure `jvm options` for `api-server`, default value `-Xms512m -Xmx512m -Xmn256m`.
+You can modify some environment variables to change configurations when you start the servers through Docker. The section
+[using existing PostgreSQL and ZooKeeper](#using-exists-postgresql-zookeeper) shows an example of changing the database and ZooKeeper configurations;
+you can find all environment variables in [all environment variables](https://github.com/apache/dolphinscheduler/blob//script/env/dolphinscheduler_env.sh)
+and change them as needed.
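Since the configurations above are plain `-e KEY=VALUE` pairs passed to `docker run`, it is easy to sanity-check the assembled values in the shell before starting a container. A minimal sketch (the host, port, and database name below are hypothetical placeholders, not values from this guide):

```shell
# Hypothetical connection settings -- replace with your own values.
DB_HOST="192.168.1.100"
DB_PORT="5432"
DB_NAME="dolphinscheduler"

# Assemble the JDBC URL in the shape the DolphinScheduler containers expect.
SPRING_DATASOURCE_URL="jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "${SPRING_DATASOURCE_URL}"   # prints jdbc:postgresql://192.168.1.100:5432/dolphinscheduler

# Print the docker command instead of running it, so the flags can be
# reviewed even on a machine without Docker installed.
echo docker run -d --name dolphinscheduler-api \
  -e DATABASE="postgresql" \
  -e SPRING_DATASOURCE_URL="${SPRING_DATASOURCE_URL}" \
  apache/dolphinscheduler-api:latest
```

Dropping the leading `echo` runs the command for real once the values look right.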
diff --git a/docs/docs/zh/faq.md b/docs/docs/zh/faq.md
index e57d87ff01..ca7fea50c1 100644
--- a/docs/docs/zh/faq.md
+++ b/docs/docs/zh/faq.md
@@ -682,4 +682,38 @@ update t_ds_version set version='2.0.1';
 
 ---
 
+## 在二进制分发包中找不到 python-gateway-server 文件夹
+
+在 3.0.0-alpha 版本之后,Python gateway server 集成到 api server 中,当您启动 api server 后,Python gateway server 将随之启动。
+如果您不想在 api server 启动的时候启动 Python gateway server,您可以修改 api server 中的配置文件 `api-server/conf/application.yaml`,
+并将可选项 `python-gateway.enabled` 的值设置为 `false`。
+
+## 如何构建自定义的 Docker 镜像
+
+DolphinScheduler 每次发版都会同时发布 Docker 镜像,你可以在 docker hub 中找到这些镜像。如果你因为个性化需求想要自己打包 docker 镜像,最佳实践是基于 DolphinScheduler 对应镜像编写 Dockerfile 文件
+
+```Dockerfile
+FROM dolphinscheduler-standalone-server
+RUN apt update ; \
+    apt install -y
+```
+
+如果你想基于源码进行改造,打包并分发你的镜像,可以在代码改造完成后运行
+
+```shell
+./mvnw -B clean deploy \
+    -Dmaven.test.skip \
+    -Dmaven.javadoc.skip \
+    -Dmaven.checkstyle.skip \
+    -Dmaven.deploy.skip \
+    -Ddocker.tag=latest \
+    -Pdocker,release
+```
+
+如果你不仅需要改造源码,还想要自定义 Docker 镜像打包的依赖,可以在修改源码的同时修改 Dockerfile 的定义,你可以在源码项目根目录中运行以下命令找到所有的 Dockerfile 文件
+
+```shell
+find . 
-iname 'Dockerfile'
+```
+
 我们会持续收集更多的 FAQ。
\ No newline at end of file
diff --git a/docs/docs/zh/guide/datasource/hive.md b/docs/docs/zh/guide/datasource/hive.md
index c7df33542e..6598ba81b5 100644
--- a/docs/docs/zh/guide/datasource/hive.md
+++ b/docs/docs/zh/guide/datasource/hive.md
@@ -38,3 +38,6 @@ login.user.keytab.username=hdfs-mycluster@ESZ.COM
 login.user.keytab.path=/opt/hdfs.headless.keytab
 ```
 
+## 是否原生支持
+
+是,数据源不需要任何附加操作即可使用。
diff --git a/docs/docs/zh/guide/datasource/introduction.md b/docs/docs/zh/guide/datasource/introduction.md
index 77e91bc55c..f5cd2d6db0 100644
--- a/docs/docs/zh/guide/datasource/introduction.md
+++ b/docs/docs/zh/guide/datasource/introduction.md
@@ -2,5 +2,20 @@
 
 数据源中心支持MySQL、POSTGRESQL、HIVE/IMPALA、SPARK、CLICKHOUSE、ORACLE、SQLSERVER等数据源。
 
-- 点击“数据源中心->创建数据源”,根据需求创建不同类型的数据源。
-- 点击“测试连接”,测试数据源是否可以连接成功。
\ No newline at end of file
+- 点击"数据源中心->创建数据源",根据需求创建不同类型的数据源
+- 点击"测试连接",测试数据源是否可以连接成功(只有当数据源通过连接性测试后才能保存数据源)。
+
+## 使用不兼容 Apache LICENSE V2 许可的数据库
+
+数据源中心里,DolphinScheduler 对部分数据源有原生的支持,但是部分数据源需要用户下载对应的 JDBC 驱动包并放置到正确的位置才能正常使用。
+这会增加用户的使用成本,但是我们不得不这么做,因为这部分数据源的 JDBC 驱动和 Apache LICENSE V2 不兼容,所以我们无法在
+DolphinScheduler 分发的二进制包中包含它们。这部分数据源主要包括 MySQL,Oracle,SQL Server 等,幸运的是我们为这部分数据源的支持给出了解决方案。
+
+### 样例
+
+我们以 MySQL 为例,如果你想要使用 MySQL 数据源,你需要先在 [mysql maven 仓库](https://repo1.maven.org/maven2/mysql/mysql-connector-java)
+中下载对应版本的 JDBC 驱动,将其移入 `api-server/libs` 以及 `worker-server/libs` 文件夹中,最后重启 `api-server` 和 `worker-server`
+服务,即可使用 MySQL 数据源。如果你使用容器启动 DolphinScheduler,同样也是将 JDBC 驱动挂载放到以上两个服务的对应路径下后,重启服务即可。
+
+> 注意:如果你只是想要在数据源中心使用 MySQL,则对 MySQL JDBC 驱动的版本没有要求,如果你想要将 MySQL 作为 DolphinScheduler 的元数据库,
+> 则仅支持 [8.0.16 及以上](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar)的版本。
diff --git a/docs/docs/zh/guide/datasource/mysql.md b/docs/docs/zh/guide/datasource/mysql.md
index 0e9ef4a07f..3727dac78e 100644
--- a/docs/docs/zh/guide/datasource/mysql.md
+++ b/docs/docs/zh/guide/datasource/mysql.md
@@ 
-11,3 +11,7 @@
 - 密码:设置连接 MySQL 的密码
 - 数据库名:输入连接 MySQL 的数据库名称
 - Jdbc 连接参数:用于 MySQL 连接的参数设置,以 JSON 形式填写
+
+## 是否原生支持
+
+否,使用前请参考[简介](introduction.md)中的 "样例" 章节激活数据源。
diff --git a/docs/docs/zh/guide/datasource/postgresql.md b/docs/docs/zh/guide/datasource/postgresql.md
index ec3fd11a31..a0a8143641 100644
--- a/docs/docs/zh/guide/datasource/postgresql.md
+++ b/docs/docs/zh/guide/datasource/postgresql.md
@@ -11,3 +11,7 @@
 - 密码:设置连接 POSTGRESQL 的密码
 - 数据库名:输入连接 POSTGRESQL 的数据库名称
 - Jdbc 连接参数:用于 POSTGRESQL 连接的参数设置,以 JSON 形式填写
+
+## 是否原生支持
+
+是,数据源不需要任何附加操作即可使用。
diff --git a/docs/docs/zh/guide/datasource/spark.md b/docs/docs/zh/guide/datasource/spark.md
index 9b9702890d..946ff01101 100644
--- a/docs/docs/zh/guide/datasource/spark.md
+++ b/docs/docs/zh/guide/datasource/spark.md
@@ -17,3 +17,7 @@

+
+## 是否原生支持
+
+是,数据源不需要任何附加操作即可使用。
diff --git a/docs/docs/zh/guide/start/docker.md b/docs/docs/zh/guide/start/docker.md
index 1b365d4f2a..717d5cf65c 100644
--- a/docs/docs/zh/guide/start/docker.md
+++ b/docs/docs/zh/guide/start/docker.md
@@ -1,1023 +1,126 @@
-# 快速试用 Docker 部署
+# Docker 快速使用教程
 
-## 先决条件
+本教程使用三种不同的方式通过 Docker 完成 DolphinScheduler 的部署。如果你想要快速体验,推荐使用 standalone-server 镜像;
+如果你想要体验比较完整的服务,推荐使用 docker-compose 启动服务;如果你已经有自己的数据库或者 Zookeeper 服务,
+并且想要沿用这些基础服务,可以参考沿用已有的 PostgreSQL 和 ZooKeeper 服务完成部署。
+
+## 前置条件
 
 - [Docker](https://docs.docker.com/engine/install/) 1.13.1+
 - [Docker Compose](https://docs.docker.com/compose/) 1.11.0+
 
-## 如何使用 Docker 镜像
-
-有 3 种方式可以快速试用 DolphinScheduler
-
-### 一、以 docker-compose 的方式启动 DolphinScheduler (推荐)
-
-这种方式需要先安装 [docker-compose](https://docs.docker.com/compose/), docker-compose 的安装网上已经有非常多的资料,请自行安装即可
-
-对于 Windows 7-10,你可以安装 [Docker Toolbox](https://github.com/docker/toolbox/releases)。对于 Windows 10 64-bit,你可以安装 [Docker Desktop](https://docs.docker.com/docker-for-windows/install/),并且注意[系统要求](https://docs.docker.com/docker-for-windows/install/#system-requirements)
-
-#### 0、请配置内存不少于 4GB
-
-对于 Mac 用户,点击 `Docker Desktop -> Preferences -> Resources -> Memory`
-
-对于 Windows Docker Toolbox 用户,有两项需要配置:
+## 启动服务
 
-- **内存**:打开 Oracle VirtualBox Manager,如果你双击 Docker Quickstart Terminal 并成功运行 Docker Toolbox,你将会看到一个名为 `default` 的虚拟机. 点击 `设置 -> 系统 -> 主板 -> 内存大小`
-- **端口转发**:点击 `设置 -> 网络 -> 高级 -> 端口转发 -> 添加`. 
`名称`,`主机端口` 和 `子系统端口` 都填写 `12345`,不填 `主机IP` 和 `子系统IP`
+### 使用 standalone-server 镜像
 
-对于 Windows Docker Desktop 用户
-- **Hyper-V 模式**:点击 `Docker Desktop -> Settings -> Resources -> Memory`
-- **WSL 2 模式**:参考 [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig)
-
-#### 1、下载源码包
-
-请下载源码包 apache-dolphinscheduler--src.tar.gz,下载地址: [下载](/zh-cn/download/download.html)
-
-#### 2、拉取镜像并启动服务
-
-> 对于 Mac 和 Linux 用户,打开 **Terminal**
-> 对于 Windows Docker Toolbox 用户,打开 **Docker Quickstart Terminal**
-> 对于 Windows Docker Desktop 用户,打开 **Windows PowerShell**
+使用 standalone-server 镜像启动一个 DolphinScheduler standalone-server 容器应该是最快体验 DolphinScheduler 的方法。通过这个方式
+你可以最快速地体验到 DolphinScheduler 的大部分功能,了解主要的概念和内容。

+```shell
+$ DOLPHINSCHEDULER_VERSION=
+$ docker run --name dolphinscheduler-standalone-server -p 12345:12345 -p 25333:25333 -d apache/dolphinscheduler-standalone-server:"${DOLPHINSCHEDULER_VERSION}"
```
-$ tar -zxvf apache-dolphinscheduler--src.tar.gz
-$ cd apache-dolphinscheduler--src/deploy/docker
-$ docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:
-$ docker tag apache/dolphinscheduler: apache/dolphinscheduler:latest
-$ docker-compose up -d
-```
-
-> PowerShell 应该使用 `cd apache-dolphinscheduler--src\deploy\docker`
 
-**PostgreSQL** (用户 `root`, 密码 `root`, 数据库 `dolphinscheduler`) 和 **ZooKeeper** 服务将会默认启动
+> 注意:请不要将 apache/dolphinscheduler-standalone-server 镜像作为生产镜像,它应该仅仅作为快速体验 DolphinScheduler 功能的途径。
+> 除了因为它将全部服务运行在一个进程中外,还因为其使用内存数据库 H2 储存元数据,当服务停止时内存数据库中的数据将会被清空。另外,
+> apache/dolphinscheduler-standalone-server 仅包含 DolphinScheduler 核心服务,部分任务组件(如 Spark 和 Flink 等)、
+> 告警组件(如 Telegram 和 Dingtalk 等)需要外部的组件或对应的配置后才能使用。
 
-#### 3、登录系统
+### 使用 docker-compose 启动服务
 
-访问前端页面:[http://localhost:12345/dolphinscheduler](http://localhost:12345/dolphinscheduler) ,如果有需要请修改成对应的 IP 地址
+使用 docker-compose 启动服务相比 standalone-server 的优点是 DolphinScheduler 的各个服务是独立的容器和进程,相互影响降到最小,且能够在
+服务重启的时候保留元数据(如需挂载到本地路径,需要显式指定)。它更健壮,能保证用户体验更加完整的
DolphinScheduler 服务。这种方式需要先安装
+[docker-compose](https://docs.docker.com/compose/install/),链接适用于 Mac,Linux,Windows。
 
-默认的用户是`admin`,默认的密码是`dolphinscheduler123`
+安装完成 docker-compose 后我们需要修改部分配置以便更好地体验 DolphinScheduler 服务,需要配置不少于 4GB 的内存:
 
-![login](/img/new_ui/dev/quick-start/login.png)
-
-请参考用户手册章节的[快速上手](../start/quick-start.md)查看如何使用DolphinScheduler
-
-### 二、通过指定已存在的 PostgreSQL 和 ZooKeeper 服务
-
-这种方式需要先安装 [docker](https://docs.docker.com/engine/install/), docker 的安装网上已经有非常多的资料,请自行安装即可
-
-#### 1、基础软件安装 (请自行安装)
-
-- [PostgreSQL](https://www.postgresql.org/download/) (8.2.15+)
-- [ZooKeeper](https://zookeeper.apache.org/releases.html) (3.4.6+)
-- [Docker](https://docs.docker.com/engine/install/) (1.13.1+)
-
-#### 2、请登录 PostgreSQL 数据库,创建名为 `dolphinscheduler` 数据库
-
-#### 3、初始化数据库,导入 `sql/dolphinscheduler_postgre.sql` 进行创建表及基础数据导入
+- Mac:点击 `Docker Desktop -> Preferences -> Resources -> Memory` 调整内存大小
+- Windows Docker Desktop:
+  - Hyper-V 模式:点击 `Docker Desktop -> Settings -> Resources -> Memory` 调整内存大小
+  - WSL 2 模式:参考 [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig) 调整内存大小
 
-#### 4、下载 DolphinScheduler 镜像
+配置完成后我们需要获取 `docker-compose.yaml` 文件,通过[下载页面](/zh-cn/download/download.html)下载对应版本源码包可能是最快的方法,
+源码包对应的值为 "Total Source Code"。下载完源码后就可以运行命令进行部署了。
 
-我们已将面向用户的 DolphinScheduler 镜像上传至 docker 仓库,用户无需在本地构建镜像,直接执行以下命令从 docker 仓库 pull 镜像:
-
-```
-docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:
-```
-
-#### 5、运行一个 DolphinScheduler 实例
-
-```
-$ docker run -d --name dolphinscheduler \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
--p 12345:12345 \
-apache/dolphinscheduler: all
+```shell
+$ DOLPHINSCHEDULER_VERSION=
+$ tar -zxf apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src.tar.gz
+# Mac 和 Linux 用户
+$ cd 
apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src/deploy/docker
+# Windows 用户
+$ cd apache-dolphinscheduler-"${DOLPHINSCHEDULER_VERSION}"-src\deploy\docker
+$ docker-compose up -d
+```
 
-注:数据库用户 test 和密码 test 需要替换为实际的 PostgreSQL 用户和密码,192.168.x.x 需要替换为 PostgreSQL 和 ZooKeeper 的主机 IP
+> 提醒:通过 docker-compose 启动服务时,除了会启动 DolphinScheduler 对应的服务外,还会启动必要的依赖服务,如数据库 PostgreSQL(用户
+> `root`,密码 `root`,数据库 `dolphinscheduler`)和服务发现 ZooKeeper。
 
-#### 6、登录系统
+### 沿用已有的 PostgreSQL 和 ZooKeeper 服务
 
-同上
+使用 docker-compose 启动服务会新启动数据库以及 ZooKeeper 服务。如果你已经有在运行中的数据库或者
+ZooKeeper,且不想启动新的服务,可以使用这个方式分别启动 DolphinScheduler 容器。
 
-### 三、运行 DolphinScheduler 中的独立服务
-
-在容器启动时,会自动启动以下服务:
-
-```
- MasterServer ----- master 服务
- WorkerServer ----- worker 服务
- ApiApplicationServer ----- api 服务
- AlertServer ----- alert 服务
-```
-
-如果你只是想运行 dolphinscheduler 中的部分服务,你可以够通执行以下命令来运行 dolphinscheduler 中的部分服务
-
-* 启动一个 **master server**, 如下:
-
-```
+```shell
+$ DOLPHINSCHEDULER_VERSION=
+# 初始化数据库,请确保数据库 已经存在
+# 如果使用 MySQL,驱动类名则为 "com.mysql.cj.jdbc.Driver"
+$ docker run -d --name dolphinscheduler-tools \
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  apache/dolphinscheduler-tools:"${DOLPHINSCHEDULER_VERSION}" bin/create-schema.sh
+# 启动 DolphinScheduler 对应的服务
$ docker run -d --name dolphinscheduler-master \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-apache/dolphinscheduler: master-server
-```
-
-* 启动一个 **worker server**, 如下:
-
-```
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e 
SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  apache/dolphinscheduler-master:"${DOLPHINSCHEDULER_VERSION}"
$ docker run -d --name dolphinscheduler-worker \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-apache/dolphinscheduler: worker-server
-```
-
-* 启动一个 **api server**, 如下:
-
-```
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  apache/dolphinscheduler-worker:"${DOLPHINSCHEDULER_VERSION}"
$ docker run -d --name dolphinscheduler-api \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
--e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
--p 12345:12345 \
-apache/dolphinscheduler: api-server
-```
+  -p 12345:12345 \
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  apache/dolphinscheduler-api:"${DOLPHINSCHEDULER_VERSION}"
+$ docker run -d --name dolphinscheduler-alert-server \
+  -e DATABASE="postgresql" \
+  -e SPRING_DATASOURCE_DRIVER_CLASS_NAME="org.postgresql.Driver" \
+  -e SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler" \
+  -e SPRING_DATASOURCE_USERNAME="" \
+  -e SPRING_DATASOURCE_PASSWORD="" \
+  -e REGISTRY_ZOOKEEPER_CONNECT_STRING="localhost:2181" \
+  apache/dolphinscheduler-alert-server:"${DOLPHINSCHEDULER_VERSION}"
+```
+
+> 
注意:如果你本地还没有对应的数据库和 ZooKeeper 服务,但是想要尝试这个启动方式,可以先安装并启动
+> [PostgreSQL](https://www.postgresql.org/download/)(8.2.15+) 以及 [ZooKeeper](https://zookeeper.apache.org/releases.html)(3.4.6+)
+
+## 登录系统
+
+不管你是用哪种方式启动服务,只要服务启动后,你都可以通过 [http://localhost:12345/dolphinscheduler/ui](http://localhost:12345/dolphinscheduler/ui)
+访问 DolphinScheduler。访问上述链接后会跳转到登录页面,DolphinScheduler 默认的用户和密码分别为 `admin` 和 `dolphinscheduler123`。
+想要了解更多操作请参考用户手册[快速上手](../start/quick-start.md)。
 
-* 启动一个 **alert server**, 如下:
-
-```
-$ docker run -d --name dolphinscheduler-alert \
--e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
--e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-apache/dolphinscheduler: alert-server
-```
+![login](/img/new_ui/dev/quick-start/login.png)
 
-**注意**: 当你运行 dolphinscheduler 中的部分服务时,你必须指定这些环境变量 `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD`, `ZOOKEEPER_QUORUM`。
+> 注意:如果你使用「沿用已有的 PostgreSQL 和 ZooKeeper 服务」方式启动服务,且服务分布在多台机器中,
+> 请将上述的地址改成 API 容器所在机器的 hostname 或者 IP。
 
 ## 环境变量
 
-Docker 容器通过环境变量进行配置,[附录-环境变量](#appendix-environment-variables) 列出了 DolphinScheduler 的可配置环境变量及其默认值
-
-特别地,在 Docker Compose 和 Docker Swarm 中,可以通过环境变量配置文件 `config.env.sh` 进行配置
-
-## 支持矩阵
-
-| Type | 支持 | 备注 |
-| ------------------------------------------------------------ | ------- | --------------------- |
-| Shell | 是 | |
-| Python2 | 是 | |
-| Python3 | 间接支持 | 详见 FAQ |
-| Hadoop2 | 间接支持 | 详见 FAQ |
-| Hadoop3 | 尚未确定 | 尚未测试 |
-| Spark-Local(client) | 间接支持 | 详见 FAQ |
-| Spark-YARN(cluster) | 间接支持 | 详见 FAQ |
-| Spark-Standalone(cluster) | 尚不 | |
-| Spark-Kubernetes(cluster) | 尚不 | |
-| Flink-Local(local>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
-| Flink-YARN(yarn-cluster) | 间接支持 | 详见 FAQ |
-| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
-| Flink-Standalone(default) | 尚不 | |
-| Flink-Standalone(remote>=1.11) | 尚不 | Generic CLI 模式尚未支持 |
-| Flink-Kubernetes(default) | 尚不 | |
-| 
Flink-Kubernetes(remote>=1.11) | 尚不 | Generic CLI 模式尚未支持 | -| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | 尚不 | Generic CLI 模式尚未支持 | -| MapReduce | 间接支持 | 详见 FAQ | -| Kerberos | 间接支持 | 详见 FAQ | -| HTTP | 是 | | -| DataX | 间接支持 | 详见 FAQ | -| Sqoop | 间接支持 | 详见 FAQ | -| SQL-MySQL | 间接支持 | 详见 FAQ | -| SQL-PostgreSQL | 是 | | -| SQL-Hive | 间接支持 | 详见 FAQ | -| SQL-Spark | 间接支持 | 详见 FAQ | -| SQL-ClickHouse | 间接支持 | 详见 FAQ | -| SQL-Oracle | 间接支持 | 详见 FAQ | -| SQL-SQLServer | 间接支持 | 详见 FAQ | -| SQL-DB2 | 间接支持 | 详见 FAQ | - -## FAQ - -### 如何通过 docker-compose 管理 DolphinScheduler? - -启动、重启、停止或列出所有容器: - -``` -docker-compose start -docker-compose restart -docker-compose stop -docker-compose ps -``` - -停止所有容器并移除所有容器、网络: - -``` -docker-compose down -``` - -停止所有容器并移除所有容器、网络和存储卷: - -``` -docker-compose down -v -``` - -### 如何查看一个容器的日志? - -列出所有运行的容器: - -``` -docker ps -docker ps --format "{{.Names}}" # 只打印名字 -``` - -查看名为 docker-swarm_dolphinscheduler-api_1 的容器的日志: - -``` -docker logs docker-swarm_dolphinscheduler-api_1 -docker logs -f docker-swarm_dolphinscheduler-api_1 # 跟随日志输出 -docker logs --tail 10 docker-swarm_dolphinscheduler-api_1 # 显示倒数10行日志 -``` - -### 如何通过 docker-compose 扩缩容 master 和 worker? - -扩缩容 master 至 2 个实例: - -``` -docker-compose up -d --scale dolphinscheduler-master=2 dolphinscheduler-master -``` - -扩缩容 worker 至 3 个实例: - -``` -docker-compose up -d --scale dolphinscheduler-worker=3 dolphinscheduler-worker -``` - -### 如何在 Docker Swarm 上部署 DolphinScheduler? 
- -假设 Docker Swarm 集群已经部署(如果还没有创建 Docker Swarm 集群,请参考 [create-swarm](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/)) - -启动名为 dolphinscheduler 的 stack: - -``` -docker stack deploy -c docker-stack.yml dolphinscheduler -``` - -列出名为 dolphinscheduler 的 stack 的所有服务: - -``` -docker stack services dolphinscheduler -``` - -停止并移除名为 dolphinscheduler 的 stack: - -``` -docker stack rm dolphinscheduler -``` - -移除名为 dolphinscheduler 的 stack 的所有存储卷: - -``` -docker volume rm -f $(docker volume ls --format "{{.Name}}" | grep -e "^dolphinscheduler") -``` - -### 如何在 Docker Swarm 上扩缩容 master 和 worker? - -扩缩容名为 dolphinscheduler 的 stack 的 master 至 2 个实例: - -``` -docker service scale dolphinscheduler_dolphinscheduler-master=2 -``` - -扩缩容名为 dolphinscheduler 的 stack 的 worker 至 3 个实例: - -``` -docker service scale dolphinscheduler_dolphinscheduler-worker=3 -``` - -### 如何构建一个 Docker 镜像? - -#### 从源码构建 (需要 Maven 3.3+ & JDK 1.8+) - -类 Unix 系统,在 Terminal 中执行: - -```bash -$ bash ./docker/build/hooks/build -``` - -Windows 系统,在 cmd 或 PowerShell 中执行: - -```bat -C:\dolphinscheduler-src>.\docker\build\hooks\build.bat -``` - -如果你不理解 `./docker/build/hooks/build` `./docker/build/hooks/build.bat` 这些脚本,请阅读里面的内容 - -#### 从二进制包构建 (不需要 Maven 3.3+ & JDK 1.8+) - -请下载二进制包 apache-dolphinscheduler--bin.tar.gz,下载地址: [下载](/zh-cn/download/download.html). 然后将 apache-dolphinscheduler--bin.tar.gz 放到 `apache-dolphinscheduler--src/docker/build` 目录里,在 Terminal 或 PowerShell 中执行: - -``` -$ cd apache-dolphinscheduler--src/docker/build -$ docker build --build-arg VERSION= -t apache/dolphinscheduler: . -``` - -> PowerShell 应该使用 `cd apache-dolphinscheduler--src/docker/build` - -#### 构建多平台架构镜像 - -目前支持构建 `linux/amd64` 和 `linux/arm64` 平台架构的镜像,要求: - -1. 支持 [docker buildx](https://docs.docker.com/engine/reference/commandline/buildx/) -2. 
具有 https://hub.docker.com/r/apache/dolphinscheduler 的 push 权限(**务必谨慎**: 构建命令默认会自动将多平台架构镜像推送到 apache/dolphinscheduler 的 docker hub) - -执行: - -```bash -$ docker login # 登录, 用于推送 apache/dolphinscheduler -$ bash ./docker/build/hooks/build x -``` - -### 如何为 Docker 添加一个环境变量? - -如果你想在编译的时候或者运行的时候附加一些其它的操作及新增一些环境变量,你可以在`/root/start-init-conf.sh`文件中进行修改,同时如果涉及到配置文件的修改,请在`/opt/dolphinscheduler/conf/*.tpl`中修改相应的配置文件 - -例如,在`/root/start-init-conf.sh`添加一个环境变量`SECURITY_AUTHENTICATION_TYPE`: - -``` -export SECURITY_AUTHENTICATION_TYPE=PASSWORD -``` - -当添加以上环境变量后,你应该在相应的模板文件`application-api.properties.tpl`中添加这个环境变量配置: - -``` -security.authentication.type=${SECURITY_AUTHENTICATION_TYPE} -``` - -`/root/start-init-conf.sh` 将根据模板文件动态的生成配置文件: - -```sh -echo "generate dolphinscheduler config" -ls ${DOLPHINSCHEDULER_HOME}/conf/ | grep ".tpl" | while read line; do -eval "cat << EOF -$(cat ${DOLPHINSCHEDULER_HOME}/conf/${line}) -EOF -" > ${DOLPHINSCHEDULER_HOME}/conf/${line%.*} -done -``` - -### 如何用 MySQL 替代 PostgreSQL 作为 DolphinScheduler 的数据库? - -> 由于商业许可证的原因,我们不能直接使用 MySQL 的驱动包. -> -> 如果你要使用 MySQL, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建. - -1. 下载 MySQL 驱动包 [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) - -2. 创建一个新的 `Dockerfile`,用于添加 MySQL 的驱动包: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib -``` - -3. 构建一个包含 MySQL 驱动包的新镜像: - -``` -docker build -t apache/dolphinscheduler:mysql-driver . -``` - -4. 修改 `docker-compose.yml` 文件中的所有 image 字段为 `apache/dolphinscheduler:mysql-driver` - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -5. 注释 `docker-compose.yml` 文件中的 `dolphinscheduler-postgresql` 块 - -6. 在 `docker-compose.yml` 文件中添加 `dolphinscheduler-mysql` 服务(**可选**,你可以直接使用一个外部的 MySQL 数据库) - -7. 
修改 `config.env.sh` 文件中的 DATABASE 环境变量 - -``` -DATABASE_TYPE=mysql -DATABASE_DRIVER=com.mysql.jdbc.Driver -DATABASE_HOST=dolphinscheduler-mysql -DATABASE_PORT=3306 -DATABASE_USERNAME=root -DATABASE_PASSWORD=root -DATABASE_DATABASE=dolphinscheduler -DATABASE_PARAMS=useUnicode=true&characterEncoding=UTF-8 -``` - -> 如果你已经添加了 `dolphinscheduler-mysql` 服务,设置 `DATABASE_HOST` 为 `dolphinscheduler-mysql` 即可 - -8. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -### 如何在数据源中心支持 MySQL 数据源? - -> 由于商业许可证的原因,我们不能直接使用 MySQL 的驱动包. -> -> 如果你要添加 MySQL 数据源, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建. - -1. 下载 MySQL 驱动包 [mysql-connector-java-8.0.16.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) - -2. 创建一个新的 `Dockerfile`,用于添加 MySQL 驱动包: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib -``` - -3. 构建一个包含 MySQL 驱动包的新镜像: - -``` -docker build -t apache/dolphinscheduler:mysql-driver . -``` - -4. 将 `docker-compose.yml` 文件中的所有 `image` 字段修改为 `apache/dolphinscheduler:mysql-driver` - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -5. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -6. 在数据源中心添加一个 MySQL 数据源 - -### 如何在数据源中心支持 Oracle 数据源? - -> 由于商业许可证的原因,我们不能直接使用 Oracle 的驱动包. -> -> 如果你要添加 Oracle 数据源, 你可以基于官方镜像 `apache/dolphinscheduler` 进行构建. - -1. 下载 Oracle 驱动包 [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (such as `ojdbc8-19.9.0.0.jar`) - -2. 创建一个新的 `Dockerfile`,用于添加 Oracle 驱动包: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib -``` - -3. 构建一个包含 Oracle 驱动包的新镜像: - -``` -docker build -t apache/dolphinscheduler:oracle-driver . -``` - -4. 将 `docker-compose.yml` 文件中的所有 `image` 字段修改为 `apache/dolphinscheduler:oracle-driver` - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -5. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -6. 
在数据源中心添加一个 Oracle 数据源 - -### 如何支持 Python 2 pip 以及自定义 requirements.txt? - -1. 创建一个新的 `Dockerfile`,用于安装 pip: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -COPY requirements.txt /tmp -RUN apt-get update && \ - apt-get install -y --no-install-recommends python-pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt && \ - rm -rf /var/lib/apt/lists/* -``` - -这个命令会安装默认的 **pip 18.1**. 如果你想升级 pip, 只需添加一行 - -``` - pip install --no-cache-dir -U pip && \ -``` - -2. 构建一个包含 pip 的新镜像: - -``` -docker build -t apache/dolphinscheduler:pip . -``` - -3. 将 `docker-compose.yml` 文件中的所有 `image` 字段修改为 `apache/dolphinscheduler:pip` - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -4. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -5. 在一个新 Python 任务下验证 pip - -### 如何支持 Python 3? - -1. 创建一个新的 `Dockerfile`,用于安装 Python 3: - -``` -FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler: -RUN apt-get update && \ - apt-get install -y --no-install-recommends python3 && \ - rm -rf /var/lib/apt/lists/* -``` - -这个命令会安装默认的 **Python 3.7.3**. 如果你也想安装 **pip3**, 将 `python3` 替换为 `python3-pip` 即可 - -``` - apt-get install -y --no-install-recommends python3-pip && \ -``` - -2. 构建一个包含 Python 3 的新镜像: - -``` -docker build -t apache/dolphinscheduler:python3 . -``` - -3. 将 `docker-compose.yml` 文件中的所有 `image` 字段修改为 `apache/dolphinscheduler:python3` - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -4. 修改 `config.env.sh` 文件中的 `PYTHON_HOME` 为 `/usr/bin/python3` - -5. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -6. 在一个新 Python 任务下验证 Python 3 - -### 如何支持 Hadoop, Spark, Flink, Hive 或 DataX? - -以 Spark 2.4.7 为例: - -1. 下载 Spark 2.4.7 发布的二进制包 `spark-2.4.7-bin-hadoop2.7.tgz` - -2. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -3. 
复制 Spark 2.4.7 二进制包到 Docker 容器中 - -```bash -docker cp spark-2.4.7-bin-hadoop2.7.tgz docker-swarm_dolphinscheduler-worker_1:/opt/soft -``` - -因为存储卷 `dolphinscheduler-shared-local` 被挂载到 `/opt/soft`, 因此 `/opt/soft` 中的所有文件都不会丢失 - -4. 登录到容器并确保 `SPARK_HOME2` 存在 - -```bash -docker exec -it docker-swarm_dolphinscheduler-worker_1 bash -cd /opt/soft -tar zxf spark-2.4.7-bin-hadoop2.7.tgz -rm -f spark-2.4.7-bin-hadoop2.7.tgz -ln -s spark-2.4.7-bin-hadoop2.7 spark2 # 或者 mv -$SPARK_HOME2/bin/spark-submit --version -``` - -如果一切执行正常,最后一条命令将会打印 Spark 版本信息 - -5. 在一个 Shell 任务下验证 Spark - -``` -$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.11-2.4.7.jar -``` - -检查任务日志是否包含输出 `Pi is roughly 3.146015` - -6. 在一个 Spark 任务下验证 Spark - -文件 `spark-examples_2.11-2.4.7.jar` 需要先被上传到资源中心,然后创建一个 Spark 任务并设置: - -- Spark版本: `SPARK2` -- 主函数的 Class: `org.apache.spark.examples.SparkPi` -- 主程序包: `spark-examples_2.11-2.4.7.jar` -- 部署方式: `local` - -同样地, 检查任务日志是否包含输出 `Pi is roughly 3.146015` - -7. 验证 Spark on YARN - -Spark on YARN (部署方式为 `cluster` 或 `client`) 需要 Hadoop 支持. 类似于 Spark 支持, 支持 Hadoop 的操作几乎和前面的步骤相同 - -确保 `$HADOOP_HOME` 和 `$HADOOP_CONF_DIR` 存在 - -### 如何支持 Spark 3? - -事实上,使用 `spark-submit` 提交应用的方式是相同的, 无论是 Spark 1, 2 或 3. 换句话说,`SPARK_HOME2` 的语义是第二个 `SPARK_HOME`, 而非 `SPARK2` 的 `HOME`, 因此只需设置 `SPARK_HOME2=/path/to/spark3` 即可 - -以 Spark 3.1.1 为例: - -1. 下载 Spark 3.1.1 发布的二进制包 `spark-3.1.1-bin-hadoop2.7.tgz` - -2. 运行 dolphinscheduler (详见**如何使用docker镜像**) - -3. 复制 Spark 3.1.1 二进制包到 Docker 容器中 - -```bash -docker cp spark-3.1.1-bin-hadoop2.7.tgz docker-swarm_dolphinscheduler-worker_1:/opt/soft -``` - -4. 登录到容器并确保 `SPARK_HOME2` 存在 - -```bash -docker exec -it docker-swarm_dolphinscheduler-worker_1 bash -cd /opt/soft -tar zxf spark-3.1.1-bin-hadoop2.7.tgz -rm -f spark-3.1.1-bin-hadoop2.7.tgz -ln -s spark-3.1.1-bin-hadoop2.7 spark2 # 或者 mv -$SPARK_HOME2/bin/spark-submit --version -``` - -如果一切执行正常,最后一条命令将会打印 Spark 版本信息 - -5. 
在一个 Shell 任务下验证 Spark - -``` -$SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.12-3.1.1.jar -``` - -检查任务日志是否包含输出 `Pi is roughly 3.146015` - -### 如何在 Master、Worker 和 Api 服务之间支持共享存储? - -> **注意**: 如果是在单机上通过 docker-compose 部署,则步骤 1 和 2 可以直接跳过,并且执行命令如 `docker cp hadoop-3.2.2.tar.gz docker-swarm_dolphinscheduler-worker_1:/opt/soft` 将 Hadoop 放到容器中的共享目录 /opt/soft 下 - -例如, Master、Worker 和 Api 服务可能同时使用 Hadoop - -1. 修改 `docker-compose.yml` 文件中的 `dolphinscheduler-shared-local` 存储卷,以支持 nfs - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -```yaml -volumes: - dolphinscheduler-shared-local: - driver_opts: - type: "nfs" - o: "addr=10.40.0.199,nolock,soft,rw" - device: ":/path/to/shared/dir" -``` - -2. 将 Hadoop 放到 nfs - -3. 确保 `$HADOOP_HOME` 和 `$HADOOP_CONF_DIR` 正确 - -### 如何支持本地文件存储而非 HDFS 和 S3? - -> **注意**: 如果是在单机上通过 docker-compose 部署,则步骤 2 可以直接跳过 - -1. 修改 `config.env.sh` 文件中下面的环境变量: - -``` -RESOURCE_STORAGE_TYPE=HDFS -FS_DEFAULT_FS=file:/// -``` - -2. 修改 `docker-compose.yml` 文件中的 `dolphinscheduler-resource-local` 存储卷,以支持 nfs - -> 如果你想在 Docker Swarm 上部署 dolphinscheduler,你需要修改 `docker-stack.yml` - -```yaml -volumes: - dolphinscheduler-resource-local: - driver_opts: - type: "nfs" - o: "addr=10.40.0.199,nolock,soft,rw" - device: ":/path/to/resource/dir" -``` - -### 如何支持 S3 资源存储,例如 MinIO? - -以 MinIO 为例: 修改 `config.env.sh` 文件中下面的环境变量 - -``` -RESOURCE_STORAGE_TYPE=S3 -RESOURCE_UPLOAD_PATH=/dolphinscheduler -FS_DEFAULT_FS=s3a://BUCKET_NAME -FS_S3A_ENDPOINT=http://MINIO_IP:9000 -FS_S3A_ACCESS_KEY=MINIO_ACCESS_KEY -FS_S3A_SECRET_KEY=MINIO_SECRET_KEY -``` - -`BUCKET_NAME`, `MINIO_IP`, `MINIO_ACCESS_KEY` 和 `MINIO_SECRET_KEY` 需要被修改为实际值 - -> **注意**: `MINIO_IP` 只能使用 IP 而非域名, 因为 DolphinScheduler 尚不支持 S3 路径风格访问 (S3 path style access) - -### 如何配置 SkyWalking? 
- -修改 `config.env.sh` 文件中的 SKYWALKING 环境变量 - -``` -SKYWALKING_ENABLE=true -SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800 -SW_GRPC_LOG_SERVER_HOST=127.0.0.1 -SW_GRPC_LOG_SERVER_PORT=11800 -``` - -## 附录-环境变量 - -### 数据库 - -**`DATABASE_TYPE`** - -配置`database`的`TYPE`, 默认值 `postgresql`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_DRIVER`** - -配置`database`的`DRIVER`, 默认值 `org.postgresql.Driver`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_HOST`** - -配置`database`的`HOST`, 默认值 `127.0.0.1`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_PORT`** - -配置`database`的`PORT`, 默认值 `5432`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_USERNAME`** - -配置`database`的`USERNAME`, 默认值 `root`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_PASSWORD`** - -配置`database`的`PASSWORD`, 默认值 `root`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_DATABASE`** - -配置`database`的`DATABASE`, 默认值 `dolphinscheduler`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -**`DATABASE_PARAMS`** - -配置`database`的`PARAMS`, 默认值 `characterEncoding=utf8`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`、`alert-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 - -### ZooKeeper - -**`ZOOKEEPER_QUORUM`** - -配置`dolphinscheduler`的`Zookeeper`地址, 默认值 `127.0.0.1:2181`。 - -**注意**: 当运行`dolphinscheduler`中`master-server`、`worker-server`、`api-server`这些服务时,必须指定这个环境变量,以便于你更好的搭建分布式服务。 
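上述数据库环境变量最终会拼接进服务的数据库连接配置中。下面用一个纯 shell 的小例子,按附录中列出的默认值演示这些变量如何组成一条 JDBC 连接串(仅作示意,并非 DolphinScheduler 源码中的实际拼接逻辑):

```shell
# 使用附录中列出的默认值(仅演示各环境变量的含义,非实际源码逻辑)
DATABASE_TYPE=postgresql
DATABASE_HOST=127.0.0.1
DATABASE_PORT=5432
DATABASE_DATABASE=dolphinscheduler
DATABASE_PARAMS=characterEncoding=utf8

# 按「类型://主机:端口/库名?参数」的形式拼接出数据库连接串
JDBC_URL="jdbc:${DATABASE_TYPE}://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DATABASE}?${DATABASE_PARAMS}"
echo "${JDBC_URL}"
# 输出: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler?characterEncoding=utf8
```

只要按同样的形式替换主机、端口和账号,就能指向你自己的外部数据库。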
- -**`ZOOKEEPER_ROOT`** - -配置`dolphinscheduler`在`zookeeper`中数据存储的根目录,默认值 `/dolphinscheduler`。 - -### 通用 - -**`DOLPHINSCHEDULER_OPTS`** - -配置`dolphinscheduler`的`jvm options`,适用于`master-server`、`worker-server`、`api-server`、`alert-server`,默认值 `""`、 - -**`DATA_BASEDIR_PATH`** - -用户数据目录, 用户自己配置, 请确保这个目录存在并且用户读写权限, 默认值 `/tmp/dolphinscheduler`。 - -**`RESOURCE_STORAGE_TYPE`** - -配置`dolphinscheduler`的资源存储类型,可选项为 `HDFS`、`S3`、`NONE`,默认值 `HDFS`。 - -**`RESOURCE_UPLOAD_PATH`** - -配置`HDFS/S3`上的资源存储路径,默认值 `/dolphinscheduler`。 - -**`FS_DEFAULT_FS`** - -配置资源存储的文件系统协议,如 `file:///`, `hdfs://mycluster:8020` or `s3a://dolphinscheduler`,默认值 `file:///`。 - -**`FS_S3A_ENDPOINT`** - -当`RESOURCE_STORAGE_TYPE=S3`时,需要配置`S3`的访问路径,默认值 `s3.xxx.amazonaws.com`。 - -**`FS_S3A_ACCESS_KEY`** - -当`RESOURCE_STORAGE_TYPE=S3`时,需要配置`S3`的`s3 access key`,默认值 `xxxxxxx`。 - -**`FS_S3A_SECRET_KEY`** - -当`RESOURCE_STORAGE_TYPE=S3`时,需要配置`S3`的`s3 secret key`,默认值 `xxxxxxx`。 - -**`HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE`** - -配置`dolphinscheduler`是否启用kerberos,默认值 `false`。 - -**`JAVA_SECURITY_KRB5_CONF_PATH`** - -配置`dolphinscheduler`的java.security.krb5.conf路径,默认值 `/opt/krb5.conf`。 - -**`LOGIN_USER_KEYTAB_USERNAME`** - -配置`dolphinscheduler`登录用户的keytab用户名,默认值 `hdfs@HADOOP.COM`。 - -**`LOGIN_USER_KEYTAB_PATH`** - -配置`dolphinscheduler`登录用户的keytab路径,默认值 `/opt/hdfs.keytab`。 - -**`KERBEROS_EXPIRE_TIME`** - -配置`dolphinscheduler`的kerberos过期时间,单位为小时,默认值 `2`。 - -**`HDFS_ROOT_USER`** - -当`RESOURCE_STORAGE_TYPE=HDFS`时,配置`dolphinscheduler`的hdfs的root用户名,默认值 `hdfs`。 - -**`RESOURCE_MANAGER_HTTPADDRESS_PORT`** - -配置`dolphinscheduler`的resource manager httpaddress 端口,默认值 `8088`。 - -**`YARN_RESOURCEMANAGER_HA_RM_IDS`** - -配置`dolphinscheduler`的yarn resourcemanager ha rm ids,默认值 `空`。 - -**`YARN_APPLICATION_STATUS_ADDRESS`** - -配置`dolphinscheduler`的yarn application status地址,默认值 `http://ds1:%s/ws/v1/cluster/apps/%s`。 - -**`SKYWALKING_ENABLE`** - -配置`skywalking`是否启用. 
默认值 `false`。 - -**`SW_AGENT_COLLECTOR_BACKEND_SERVICES`** - -配置`skywalking`的collector后端地址. 默认值 `127.0.0.1:11800`。 - -**`SW_GRPC_LOG_SERVER_HOST`** - -配置`skywalking`的grpc服务主机或IP. 默认值 `127.0.0.1`。 - -**`SW_GRPC_LOG_SERVER_PORT`** - -配置`skywalking`的grpc服务端口. 默认值 `11800`。 - -**`HADOOP_HOME`** - -配置`dolphinscheduler`的`HADOOP_HOME`,默认值 `/opt/soft/hadoop`。 - -**`HADOOP_CONF_DIR`** - -配置`dolphinscheduler`的`HADOOP_CONF_DIR`,默认值 `/opt/soft/hadoop/etc/hadoop`。 - -**`SPARK_HOME1`** - -配置`dolphinscheduler`的`SPARK_HOME1`,默认值 `/opt/soft/spark1`。 - -**`SPARK_HOME2`** - -配置`dolphinscheduler`的`SPARK_HOME2`,默认值 `/opt/soft/spark2`。 - -**`PYTHON_HOME`** - -配置`dolphinscheduler`的`PYTHON_HOME`,默认值 `/usr/bin/python`。 - -**`JAVA_HOME`** - -配置`dolphinscheduler`的`JAVA_HOME`,默认值 `/usr/local/openjdk-8`。 - -**`HIVE_HOME`** - -配置`dolphinscheduler`的`HIVE_HOME`,默认值 `/opt/soft/hive`。 - -**`FLINK_HOME`** - -配置`dolphinscheduler`的`FLINK_HOME`,默认值 `/opt/soft/flink`。 - -**`DATAX_HOME`** - -配置`dolphinscheduler`的`DATAX_HOME`,默认值 `/opt/soft/datax`。 - -### Master Server - -**`MASTER_SERVER_OPTS`** - -配置`master-server`的`jvm options`,默认值 `-Xms1g -Xmx1g -Xmn512m`。 - -**`MASTER_EXEC_THREADS`** - -配置`master-server`中的执行线程数量,默认值 `100`。 - -**`MASTER_EXEC_TASK_NUM`** - -配置`master-server`中的执行任务数量,默认值 `20`。 - -**`MASTER_DISPATCH_TASK_NUM`** - -配置`master-server`中的派发任务数量,默认值 `3`。 - -**`MASTER_HOST_SELECTOR`** - -配置`master-server`中派发任务时worker host的选择器,可选值为`Random`, `RoundRobin`和`LowerWeight`,默认值 `LowerWeight`。 - -**`MASTER_HEARTBEAT_INTERVAL`** - -配置`master-server`中的心跳交互时间,默认值 `10`。 - -**`MASTER_TASK_COMMIT_RETRYTIMES`** - -配置`master-server`中的任务提交重试次数,默认值 `5`。 - -**`MASTER_TASK_COMMIT_INTERVAL`** - -配置`master-server`中的任务提交交互时间,默认值 `1`。 - -**`MASTER_MAX_CPULOAD_AVG`** - -配置`master-server`中的CPU中的`load average`值,默认值 `-1`。 - -**`MASTER_RESERVED_MEMORY`** - -配置`master-server`的保留内存,单位为G,默认值 `0.3`。 - -### Worker Server - -**`WORKER_SERVER_OPTS`** - -配置`worker-server`的`jvm options`,默认值 `-Xms1g -Xmx1g -Xmn512m`。 - 
-**`WORKER_EXEC_THREADS`**
-
-配置`worker-server`中的执行线程数量,默认值 `100`。
-
-**`WORKER_HEARTBEAT_INTERVAL`**
-
-配置`worker-server`中的心跳交互时间,默认值 `10`。
-
-**`WORKER_MAX_CPULOAD_AVG`**
-
-配置`worker-server`中的CPU中的最大`load average`值,默认值 `-1`。
-
-**`WORKER_RESERVED_MEMORY`**
-
-配置`worker-server`的保留内存,单位为G,默认值 `0.3`。
-
-**`WORKER_GROUPS`**
-
-配置`worker-server`的分组,默认值 `default`。
-
-### Alert Server
-
-**`ALERT_SERVER_OPTS`**
-
-配置`alert-server`的`jvm options`,默认值 `-Xms512m -Xmx512m -Xmn256m`。
-
-**`XLS_FILE_PATH`**
-
-配置`alert-server`的`XLS`文件的存储路径,默认值 `/tmp/xls`。
-
-**`MAIL_SERVER_HOST`**
-
-配置`alert-server`的邮件服务地址,默认值 `空`。
-
-**`MAIL_SERVER_PORT`**
-
-配置`alert-server`的邮件服务端口,默认值 `空`。
-
-**`MAIL_SENDER`**
-
-配置`alert-server`的邮件发送人,默认值 `空`。
-
-**`MAIL_USER=`**
-
-配置`alert-server`的邮件服务用户名,默认值 `空`。
-
-**`MAIL_PASSWD`**
-
-配置`alert-server`的邮件服务用户密码,默认值 `空`。
-
-**`MAIL_SMTP_STARTTLS_ENABLE`**
-
-配置`alert-server`的邮件服务是否启用TLS,默认值 `true`。
-
-**`MAIL_SMTP_SSL_ENABLE`**
-
-配置`alert-server`的邮件服务是否启用SSL,默认值 `false`。
-
-**`MAIL_SMTP_SSL_TRUST`**
-
-配置`alert-server`的邮件服务SSL的信任地址,默认值 `空`。
-
-**`ENTERPRISE_WECHAT_ENABLE`**
-
-配置`alert-server`的邮件服务是否启用企业微信,默认值 `false`。
-
-**`ENTERPRISE_WECHAT_CORP_ID`**
-
-配置`alert-server`的邮件服务企业微信`ID`,默认值 `空`。
-
-**`ENTERPRISE_WECHAT_SECRET`**
-
-配置`alert-server`的邮件服务企业微信`SECRET`,默认值 `空`。
-
-**`ENTERPRISE_WECHAT_AGENT_ID`**
-
-配置`alert-server`的邮件服务企业微信`AGENT_ID`,默认值 `空`。
-
-**`ENTERPRISE_WECHAT_USERS`**
-
-配置`alert-server`的邮件服务企业微信`USERS`,默认值 `空`。
-
-### Api Server
-
-**`API_SERVER_OPTS`**
-
-配置`api-server`的`jvm options`,默认值 `-Xms512m -Xmx512m -Xmn256m`。
+可以通过环境变量来修改 Docker 运行时的配置。我们在「沿用已有的 PostgreSQL 和 ZooKeeper 服务」一节中,就是通过环境变量修改了 Docker 的数据库配置和
+注册中心配置。关于全部可配置的环境变量,可以查看[全部的配置文件](https://github.com/apache/dolphinscheduler/blob//script/env/dolphinscheduler_env.sh) 了解。
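作为示意,可以在 `docker run` 时通过 `-e` 传入这些环境变量来覆盖默认配置。以下命令仅为一个假设性示例:其中的数据库地址、端口、账号均为占位值,镜像标签 `<tag>` 需替换为实际版本,具体变量名以上述配置文件中的定义为准:

```shell
# 示例:启动容器时通过 -e 覆盖数据库与注册中心相关的环境变量
# (地址与账号均为占位值,<tag> 为待替换的镜像版本号)
docker run -d --name dolphinscheduler-standalone-server \
  -p 12345:12345 \
  -e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" \
  -e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
  -e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
  apache/dolphinscheduler-standalone-server:<tag>
```

容器启动后,即可按「登录系统」一节访问 [http://localhost:12345/dolphinscheduler/ui](http://localhost:12345/dolphinscheduler/ui)。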