DolphinScheduler

Distributed scheduling framework.

What is DolphinScheduler?

DolphinScheduler is a distributed, easily extensible, visual DAG workflow scheduling system. It is dedicated to solving complex dependencies in data processing, so that the scheduling system works out of the box for data processing.

GitHub URL: https://github.com/apache/incubator-dolphinscheduler

Official Website: https://dolphinscheduler.apache.org

Prerequisites

Docker 1.13.1+
Docker Compose 1.11.0+

How to use this docker image

You can start DolphinScheduler with docker-compose (recommended):

$ docker-compose -f ./docker/docker-swarm/docker-compose.yml up -d

The docker-compose.yml creates a default PostgreSQL user `root` with password `root` and a database named `dolphinscheduler`.

It also creates a default ZooKeeper service.
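The stack defined by that compose file can be inspected and stopped with the standard docker-compose subcommands `ps`, `logs` and `down`. A minimal sketch, assuming it is run from the repository root so the relative compose path above resolves; the `ds_compose` function and its `DRY_RUN` switch are hypothetical helpers, not part of DolphinScheduler:

```shell
# Hypothetical wrapper around the compose file used above.
# Set DRY_RUN=1 to print the docker-compose command instead of running it.
COMPOSE_FILE="./docker/docker-swarm/docker-compose.yml"

ds_compose() {
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
  ${DRY_RUN:+echo} docker-compose -f "$COMPOSE_FILE" "$@"
}

# Examples (not executed here):
#   ds_compose ps        # list the containers and their state
#   ds_compose logs -f   # follow the logs of all services
#   ds_compose down      # stop and remove the containers
```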

Access the Web UI: http://192.168.xx.xx:12345/dolphinscheduler

The default username is `admin` and the default password is `dolphinscheduler123`.
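Right after startup the Web UI may not answer immediately. A minimal polling sketch, assuming `curl` is installed on the host; `ds_wait_for_ui` is a hypothetical helper, and the exact HTTP status the page returns is not documented here (with `-f`, curl only treats 4xx/5xx responses as failures):

```shell
# Hypothetical helper: poll URL until curl gets a non-error HTTP response,
# trying up to ATTEMPTS times (default 30) with INTERVAL seconds between
# tries (default 5). Returns 0 when reachable, 1 otherwise.
ds_wait_for_ui() {
  url=$1
  attempts=${2:-30}
  interval=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet, -f: fail on HTTP >= 400, --max-time: cap each request
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Web UI is up: $url"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "Web UI not reachable after $attempts attempt(s): $url" >&2
  return 1
}

# Usage (replace the placeholder address with your host):
#   ds_wait_for_ui http://192.168.xx.xx:12345/dolphinscheduler
```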

Tip: For a quick start in Docker, you can create a tenant named `ds` and associate the user `admin` with that tenant.

Or via Environment Variables `DATABASE_HOST`, `DATABASE_PORT`, `ZOOKEEPER_QUORUM`

You can also use existing PostgreSQL and ZooKeeper services. For example:

$ docker run -d --name dolphinscheduler \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-p 12345:12345 \
apache/dolphinscheduler:latest all
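Instead of repeating the `-e` flags, the same settings can be kept in a Docker env file (one `VAR=value` per line, read literally) and passed with `docker run --env-file`. A minimal sketch, assuming the six variables are already set in your shell; `ds_write_env_file` and the file name `dolphinscheduler.env` are hypothetical:

```shell
# Hypothetical helper: write the connection settings from the current shell
# into a Docker env file. The file holds the database password, so it is
# created empty and restricted to the owner before anything is written.
ds_write_env_file() {
  file=$1
  : > "$file"
  chmod 600 "$file"
  for var in DATABASE_HOST DATABASE_PORT DATABASE_DATABASE \
             DATABASE_USERNAME DATABASE_PASSWORD ZOOKEEPER_QUORUM; do
    eval "value=\${$var:-}"   # read the variable named by $var
    printf '%s=%s\n' "$var" "$value" >> "$file"
  done
}

# Usage:
#   ds_write_env_file dolphinscheduler.env
#   docker run -d --name dolphinscheduler --env-file dolphinscheduler.env \
#     -p 12345:12345 apache/dolphinscheduler:latest all
```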

Access the Web UI: http://192.168.xx.xx:12345/dolphinscheduler

Or start a standalone dolphinscheduler server

You can also run each DolphinScheduler server (master, worker, API and alert) in its own container.

Create a local volume for resource storage, for example:

$ docker volume create dolphinscheduler-resource-local

Start a master server, for example:

$ docker run -d --name dolphinscheduler-master \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
apache/dolphinscheduler:latest master-server

Start a worker server (including the logger server), for example:

$ docker run -d --name dolphinscheduler-worker \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-v dolphinscheduler-resource-local:/dolphinscheduler \
apache/dolphinscheduler:latest worker-server

Start an API server, for example:

$ docker run -d --name dolphinscheduler-api \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-v dolphinscheduler-resource-local:/dolphinscheduler \
-p 12345:12345 \
apache/dolphinscheduler:latest api-server

Start an alert server, for example:

$ docker run -d --name dolphinscheduler-alert \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
apache/dolphinscheduler:latest alert-server
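The four commands above differ only in the container name, the server role and a few extra flags. A minimal sketch that factors out the shared part, assuming the six connection variables are already set in your shell; `ds_run` and `DRY_RUN` are hypothetical. Unlike the alert example above, it passes `ZOOKEEPER_QUORUM` to every container, in line with the note that follows:

```shell
# Hypothetical helper, not part of DolphinScheduler: start one standalone
# server container. ROLE is master, worker, api or alert; any further
# arguments (-v, -p, ...) are passed to docker run before the image name.
# Set DRY_RUN=1 to print the docker command instead of running it.
ds_run() {
  role=$1
  shift
  ${DRY_RUN:+echo} docker run -d --name "dolphinscheduler-$role" \
    -e DATABASE_HOST="$DATABASE_HOST" -e DATABASE_PORT="$DATABASE_PORT" \
    -e DATABASE_DATABASE="$DATABASE_DATABASE" \
    -e DATABASE_USERNAME="$DATABASE_USERNAME" -e DATABASE_PASSWORD="$DATABASE_PASSWORD" \
    -e ZOOKEEPER_QUORUM="$ZOOKEEPER_QUORUM" \
    "$@" apache/dolphinscheduler:latest "$role-server"
}

# Equivalent of the commands above (volume and port flags as shown there):
#   ds_run master
#   ds_run worker -v dolphinscheduler-resource-local:/dolphinscheduler
#   ds_run api -v dolphinscheduler-resource-local:/dolphinscheduler -p 12345:12345
#   ds_run alert
```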

Note: You must specify `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD` and `ZOOKEEPER_QUORUM` when starting a standalone DolphinScheduler server.
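To catch a forgotten variable before a container starts with an incomplete configuration, a pre-flight check can be run first. A minimal sketch; `ds_check_env` is a hypothetical helper that treats an empty value the same as an unset one:

```shell
# Hypothetical pre-flight check (not part of the image): report every
# required variable that is unset or empty and return 1 if any is missing.
ds_check_env() {
  missing=""
  for var in DATABASE_HOST DATABASE_PORT DATABASE_DATABASE \
             DATABASE_USERNAME DATABASE_PASSWORD ZOOKEEPER_QUORUM; do
    eval "value=\${$var:-}"   # read the variable named by $var
    if [ -z "$value" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
  return 0
}

# Usage:
#   ds_check_env && docker run -d --name dolphinscheduler-master ... master-server
```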

How to build a docker image

You can build the Docker image on a Unix-like operating system or on Windows.

On Unix-like systems, for example:

$ cd path/incubator-dolphinscheduler
$ sh ./docker/build/hooks/build

On Windows, for example:

C:\incubator-dolphinscheduler>.\docker\build\hooks\build.bat

If anything is unclear, read the script files ./docker/build/hooks/build and ./docker/build/hooks/build.bat.
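For the Unix-like case, the build hook is invoked with a relative path, so it must be started from the repository root. A minimal sketch that checks this before building; `ds_build` and `DRY_RUN` are hypothetical, and the check only looks for the hook file, not for a complete checkout:

```shell
# Hypothetical wrapper for the Unix-like build: fail early with a clear
# message if REPO (default: current directory) does not contain the build
# hook, then run the hook from REPO. Set DRY_RUN=1 to print the command.
ds_build() {
  repo=${1:-.}
  if [ ! -f "$repo/docker/build/hooks/build" ]; then
    echo "build hook not found under '$repo'; pass the incubator-dolphinscheduler root" >&2
    return 1
  fi
  # Subshell so the caller's working directory is left unchanged.
  (cd "$repo" && ${DRY_RUN:+echo} sh ./docker/build/hooks/build)
}

# Usage:
#   ds_build path/incubator-dolphinscheduler
```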

Support Matrix

Type	Support	Notes
Shell	Yes
Python2	Yes
Python3	Indirect Yes	Refer to FAQ
Hadoop2	Indirect Yes	Refer to FAQ
Hadoop3	Not Sure	Not tested
Spark-Local(client)	Indirect Yes	Refer to FAQ
Spark-YARN(cluster)	Indirect Yes	Refer to FAQ
Spark-Mesos(cluster)	Not Yet
Spark-Standalone(cluster)	Not Yet
Spark-Kubernetes(cluster)	Not Yet
Flink-Local(local>=1.11)	Not Yet	Generic CLI mode is not yet supported
Flink-YARN(yarn-cluster)	Indirect Yes	Refer to FAQ
Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11)	Not Yet	Generic CLI mode is not yet supported
Flink-Mesos(default)	Not Yet
Flink-Mesos(remote>=1.11)	Not Yet	Generic CLI mode is not yet supported
Flink-Standalone(default)	Not Yet
Flink-Standalone(remote>=1.11)	Not Yet	Generic CLI mode is not yet supported
Flink-Kubernetes(default)	Not Yet
Flink-Kubernetes(remote>=1.11)	Not Yet	Generic CLI mode is not yet supported
Flink-NativeKubernetes(kubernetes-session/application>=1.11)	Not Yet	Generic CLI mode is not yet supported
MapReduce	Indirect Yes	Refer to FAQ
Kerberos	Indirect Yes	Refer to FAQ
HTTP	Yes
DataX	Indirect Yes	Refer to FAQ
Sqoop	Indirect Yes	Refer to FAQ
SQL-MySQL	Indirect Yes	Refer to FAQ
SQL-PostgreSQL	Yes
SQL-Hive	Indirect Yes	Refer to FAQ
SQL-Spark	Indirect Yes	Refer to FAQ
SQL-ClickHouse	Indirect Yes	Refer to FAQ
SQL-Oracle	Indirect Yes	Refer to FAQ
SQL-SQLServer	Indirect Yes	Refer to FAQ
SQL-DB2	Indirect Yes	Refer to FAQ

Environment Variables

The DolphinScheduler Docker container is configured through environment variables; if a variable is not set, its default value is used.
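The "use the default when not set" rule can be illustrated with POSIX parameter expansion. A minimal sketch; `ds_env_or_default` is hypothetical, any default passed to it is a placeholder rather than the image's real default, and whether the image's entrypoint treats an empty value as unset is an assumption (here, empty but set counts as set):

```shell
# Hypothetical illustration: print the value of variable NAME if it is set,
# otherwise print DEFAULT. Returns 2 for an invalid variable name so that
# the eval below never runs arbitrary input.
ds_env_or_default() {
  name=$1
  default=$2
  case $name in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      echo "invalid variable name: $name" >&2
      return 2 ;;
  esac
  eval "is_set=\${$name+x}"   # "x" if NAME is set (even to ""), else empty
  if [ -n "$is_set" ]; then
    eval "value=\$$name"
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

# Usage (5432 is only an example value):
#   port=$(ds_env_or_default DATABASE_PORT 5432)
```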