Apache DolphinScheduler
About
Apache DolphinScheduler is the modern data orchestration platform for building high-performance workflows quickly with low code. It also provides a powerful user interface, is dedicated to solving complex task dependencies in data pipelines, and offers many types of jobs out of the box.
The key features of DolphinScheduler are as follows:
- Easy to deploy, with four deployment options: Standalone, Cluster, Docker, and Kubernetes.
- Easy to use, with four ways to create and manage workflows: the Web UI, the Python SDK, YAML files, and the Open API.
- Highly reliable and highly available, with a decentralized multi-master and multi-worker architecture and native support for horizontal scaling.
- High performance, running N times faster than other orchestration platforms and supporting tens of millions of tasks per day.
- Cloud native, with support for orchestrating workflows across multiple clouds and data centers and for custom task types.
- Versioning of both workflows and workflow instances (including tasks).
- Flexible state control for workflows and tasks, which can be paused, stopped, or recovered at any time.
- Multi-tenancy support.
- Other features, such as backfill support (native in the Web UI) and permission control covering projects, resources, and data sources.
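To make the idea of task dependencies concrete, here is a minimal sketch in plain Python of how a workflow's DAG determines execution order. It is not DolphinScheduler code and does not use the Python SDK; the task names and the `execution_waves` helper are hypothetical, and it relies only on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"clean"},
    "notify": {"aggregate", "report"},
}


def execution_waves(dependencies):
    """Group tasks into waves. Every task in a wave has all of its
    upstream tasks finished, so the tasks in one wave could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(workflow)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

With the sample workflow above, `aggregate` and `report` land in the same wave because both depend only on `clean`, while `notify` waits for both of them.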
QuickStart
- For a quick experience
- To start with Standalone
- To start with Docker
- For Kubernetes
Stability | Accessibility | Features | Scalability |
---|---|---|---|
Decentralized multi-master and multi-worker architecture | Key workflow information, such as task status, task type, retry count, the machine a task ran on, and variables, is visualized at a glance. | Supports pause and recover operations | Supports customized task types |
Supports HA | All workflow operations are visual: drag tasks to draw DAGs and configure data sources and resources. API-based operation is also provided for third-party systems. | Through tenants, DolphinScheduler users can be mapped many-to-one or one-to-one to Hadoop users, which is very important for scheduling big data jobs. | The scheduler supports distributed scheduling, and overall scheduling capacity increases linearly with cluster size. Masters and Workers support dynamic adjustment. |
Overload processing: the task queue mechanism lets the number of schedulable tasks on a single machine be configured flexibly. The queue tolerates a large number of cached tasks, which helps avoid machine jams. | One-click deployment | Supports traditional shell tasks and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Procedure, Sub_Process | |
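The overload-processing idea in the table can be illustrated with a small simulation. This is only a sketch under assumptions, not DolphinScheduler's implementation: `schedule_rounds` and `max_running` are hypothetical names standing in for a per-machine limit on concurrently running tasks.

```python
from collections import deque


def schedule_rounds(task_names, max_running):
    """Simulate one machine that runs at most `max_running` tasks at a time.

    Tasks beyond the limit wait in a FIFO queue instead of overloading
    the machine. Returns the batches in the order they would start.
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")
    pending = deque(task_names)  # the task queue
    rounds = []
    while pending:
        batch_size = min(max_running, len(pending))
        batch = [pending.popleft() for _ in range(batch_size)]
        rounds.append(batch)
    return rounds


rounds = schedule_rounds(["t1", "t2", "t3", "t4", "t5"], max_running=2)
for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {batch}")
```

Raising `max_running` in this sketch shortens the number of rounds, which mirrors the trade-off of allowing more schedulable tasks per machine at the cost of higher load.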
User Interface Screenshots
- Homepage: Project and workflow overview, including statistics on the latest workflow instance and task instance statuses.
- Workflow Definition: Create and manage workflows by drag and drop. Complex workflows are easy to build and maintain, with a large set of task types supported out of the box.
- Workflow Tree View: An abstract tree structure gives a clearer understanding of the relationships between tasks.
- Data Source: Manage multiple external data sources, with unified data access for MySQL, PostgreSQL, Hive, Trino, and others.
- Monitor: View the status of the master, worker, and database in real time, including server resource usage and load, and run quick health checks without logging in to the server.
Suggestions & Bug Reports
Follow this guide to report your suggestions or bugs.
Contributing
The community welcomes contributions from everyone. To find out more, refer to this page: How to contribute. If you are new to DolphinScheduler, find a good first issue here.
Community
You are welcome to join the Apache DolphinScheduler community by:
- Joining the DolphinScheduler Slack to keep in touch with the community
- Following the DolphinScheduler Twitter account to get the latest news
- Subscribing to the DolphinScheduler mailing lists: users@dolphinscheduler.apache.org for users and dev@dolphinscheduler.apache.org for developers
Landscapes
DolphinScheduler enriches the CNCF Cloud Native Landscape.