
update markdown docs which cannot display images normally (#806)

Authored by fancyChuan 5 years ago, committed by dailidong
commit 642f3093a4
1. docs/en_US/architecture-design.md (24 lines changed)
2. docs/zh_CN/系统使用手册.md (2 lines changed)
3. docs/zh_CN/系统架构设计.md (26 lines changed)

docs/en_US/architecture-design.md (24 lines changed)
@@ -6,7 +6,7 @@ Before explaining the architecture of the scheduling system, let us first understand
**DAG:** Full name Directed Acyclic Graph, abbreviated as DAG. Tasks in a workflow are assembled as a directed acyclic graph, which is traversed topologically from the nodes with zero in-degree until no successor nodes remain. For example, the following picture:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/dag_examples_cn.jpg" alt="dag示例" width="60%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/dag_examples_cn.jpg" alt="dag示例" width="60%" />
<p align="center">
<em>dag example</em>
</p>
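The zero-in-degree traversal described above is Kahn's algorithm. A minimal sketch follows (Java 17 syntax; the adjacency-list shape and class name are assumptions for illustration, not EasyScheduler's actual DAG implementation):

```java
import java.util.*;

// A minimal sketch of the zero-in-degree (Kahn) traversal described above.
// The Map<String, List<String>> adjacency list is an assumption for
// illustration, not EasyScheduler's actual DAG representation.
public class DagTraversal {
    static List<String> topologicalOrder(Map<String, List<String>> edges) {
        Map<String, Integer> indegree = new HashMap<>();
        edges.forEach((from, tos) -> {
            indegree.putIfAbsent(from, 0);
            tos.forEach(to -> indegree.merge(to, 1, Integer::sum));
        });

        Deque<String> ready = new ArrayDeque<>();            // nodes with in-degree zero
        indegree.forEach((node, d) -> { if (d == 0) ready.add(node); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String next : edges.getOrDefault(node, List.of())) {
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        return order;  // shorter than indegree.size() would mean the graph has a cycle
    }

    public static void main(String[] args) {
        // A -> B, A -> C, B -> D, C -> D
        Map<String, List<String>> dag = Map.of(
                "A", List.of("B", "C"), "B", List.of("D"),
                "C", List.of("D"), "D", List.of());
        System.out.println(topologicalOrder(dag));  // e.g. [A, B, C, D] or [A, C, B, D]
    }
}
```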
@@ -111,7 +111,7 @@ Before explaining the architecture of the scheduling system, let us first understand
The centralized design concept is relatively simple: the nodes of the distributed cluster are divided into two roles by responsibility:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/master_slave.png" alt="master-slave role" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave role" width="50%" />
</p>
- The Master is mainly responsible for distributing tasks and supervising the health of the Slaves, and can dynamically rebalance tasks across the Slaves so that no Slave node is either overloaded or left idle.
@@ -125,7 +125,7 @@ Problems in the centralized design:
###### Decentralization
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/decentralization.png" alt="decentralized" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="decentralized" width="50%" />
</p>
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and have equal status. The global Internet is a typical decentralized distributed system; when any networked node goes down, only a small range of functionality is affected.
@@ -141,13 +141,13 @@ EasyScheduler uses ZooKeeper distributed locks to ensure that only one Master executes
1. The core flow for acquiring a distributed lock is as follows (a code sketch follows this list):
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/distributed_lock.png" alt="Get Distributed Lock Process" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png" alt="Get Distributed Lock Process" width="50%" />
</p>
2. Flow chart of the Scheduler thread's distributed-lock handling in EasyScheduler:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/distributed_lock_procss.png" alt="Get Distributed Lock Process" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock_procss.png" alt="Get Distributed Lock Process" width="50%" />
</p>
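The flow charts above are images; as a hedged illustration of the same idea in code, here is a minimal sketch using Apache Curator's `InterProcessMutex`, a standard implementation of the ZooKeeper lock recipe. The connection string and znode path are assumptions, and this is not EasyScheduler's actual lock code:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MasterLockSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One znode path per critical section; "/easyscheduler/lock/masters" is hypothetical.
        InterProcessMutex lock = new InterProcessMutex(client, "/easyscheduler/lock/masters");
        lock.acquire();                 // blocks until this process holds the lock
        try {
            // Only one Master reaches this point at a time: safe to run the Scheduler loop.
        } finally {
            lock.release();             // always release so waiting nodes can proceed
        }
        client.close();
    }
}
```

Under the hood the recipe creates an ephemeral sequential znode and waits on the next-lower sequence number, which is the queueing behavior the first flow chart describes.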
##### Third, the insufficient-thread loop-waiting problem
@@ -156,7 +156,7 @@ EasyScheduler uses ZooKeeper distributed locks to ensure that only one Master executes
- If many sub-processes are nested inside a large DAG, a "dead wait" state can arise, as in the following figure:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/lack_thread.png" alt="Thread is not enough to wait for loop" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png" alt="Thread is not enough to wait for loop" width="50%" />
</p>
In the figure above, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread from the thread pool, so the whole DAG can never finish and none of its threads can be released. Parent and child flows end up waiting on each other in a loop; unless a new Master is started to add threads and break the deadlock, the scheduling cluster becomes unusable.
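This "dead wait" is classic thread-pool starvation: every pooled thread blocks on work that can only run in the same bounded pool. A self-contained illustration (not EasyScheduler code):

```java
import java.util.concurrent.*;

// Illustration of the starvation described above: a bounded pool whose tasks
// block on sub-tasks submitted to the same pool. With pool size 1 the parent
// occupies the only thread while waiting, so the child can never start.
public class PoolStarvation {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<?> parent = pool.submit(() -> {
            Future<?> child = pool.submit(() -> System.out.println("child ran"));
            try {
                child.get();            // parent holds the sole thread while waiting: deadlock
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        // Demonstrate the hang with a timeout instead of blocking forever.
        try {
            parent.get(2, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("dead wait: no free thread for the sub-flow");
            pool.shutdownNow();
        }
    }
}
```

With a pool of size N the same hang appears once sub-flows nest N levels deep, which matches the MainFlowThread / SubFlowThread chain in the figure.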
@@ -180,7 +180,7 @@ Fault tolerance is divided into service fault tolerance and task retry. Service
Service fault tolerance relies on ZooKeeper's Watcher mechanism; the implementation principle is shown below:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant.png" alt="EasyScheduler Fault Tolerant Design" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png" alt="EasyScheduler Fault Tolerant Design" width="40%" />
</p>
The Master watches the directories of the other Masters and the Workers; if a remove event is detected, it performs process-instance or task-instance fault tolerance according to the specific business logic.
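As a sketch of how such a remove-event watch can look with Apache Curator's `PathChildrenCache` recipe (superseded by `CuratorCache` in newer Curator releases; the registry path is an assumption, and EasyScheduler's actual listener code may differ):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RemoveEventWatcher {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Watch the children of a registry path; "/easyscheduler/workers" is hypothetical.
        PathChildrenCache cache = new PathChildrenCache(client, "/easyscheduler/workers", true);
        cache.getListenable().addListener((c, event) -> {
            if (event.getType() == PathChildrenCacheEvent.Type.CHILD_REMOVED) {
                // A registered znode vanished => that node's session died.
                String path = event.getData().getPath();
                System.out.println("node down, start fault tolerance for " + path);
            }
        });
        cache.start();
        Thread.sleep(Long.MAX_VALUE); // keep the demo process alive to receive events
    }
}
```

Typically nodes register themselves as ephemeral znodes, so a crashed node's session expiry itself generates the remove event the Master reacts to.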
@@ -190,7 +190,7 @@ The Master watches the directories of the other Masters and the Workers; if a remove
- Master fault-tolerance flow chart:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant_master.png" alt="Master Fault Tolerance Flowchart" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_master.png" alt="Master Fault Tolerance Flowchart" width="40%" />
</p>
After ZooKeeper Master fault tolerance completes, the Scheduler thread in EasyScheduler reschedules: it traverses the DAG to find the "Running" and "Submitted Successfully" tasks. For "Running" tasks it monitors the state of their task instances; for "Submitted Successfully" tasks it checks whether they already exist in the Task Queue, monitoring the task instance's state if so and resubmitting the task instance if not.
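A compact sketch of that recovery rule (Java 17 syntax; the class, method, and state names are illustrative stand-ins, not EasyScheduler's actual API):

```java
import java.util.*;

// Sketch of the recovery rule described above. All names here are
// illustrative stand-ins, not EasyScheduler's actual classes.
public class MasterFailover {
    enum State { RUNNING, SUBMIT_SUCCESS, FINISHED }
    record Task(String id, State state) {}

    // Ids currently present in the Task Queue (hypothetical content).
    static final Set<String> taskQueue = Set.of("t2");

    public static void main(String[] args) {
        List<Task> dagInTopologicalOrder = List.of(
                new Task("t1", State.RUNNING),
                new Task("t2", State.SUBMIT_SUCCESS),
                new Task("t3", State.SUBMIT_SUCCESS));

        for (Task t : dagInTopologicalOrder) {
            switch (t.state()) {
                case RUNNING -> monitor(t);                     // keep watching its instance state
                case SUBMIT_SUCCESS -> {
                    if (taskQueue.contains(t.id())) monitor(t); // already queued: just watch it
                    else resubmit(t);                           // lost before queueing: resubmit
                }
                default -> { /* finished tasks are left as-is */ }
            }
        }
    }

    static void monitor(Task t)  { System.out.println("monitor "  + t.id()); }
    static void resubmit(Task t) { System.out.println("resubmit " + t.id()); }
}
```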
@@ -200,7 +200,7 @@ After ZooKeeper Master fault tolerance completes, the task is rescheduled by the Scheduler
- Worker fault-tolerance flow chart:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant_worker.png" alt="Worker Fault Tolerance Flowchart" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_worker.png" alt="Worker Fault Tolerance Flowchart" width="40%" />
</p>
Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state, it takes over the task and resubmits it.
@@ -239,13 +239,13 @@ In the early scheduling design, if there is no priority design and fair scheduling
- Process-definition priority exists because some processes need to be handled before others; it can be configured when the process is started or scheduled to start. There are five levels, in order HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/process_priority.png" alt="Process Priority Configuration" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="Process Priority Configuration" width="40%" />
</p>
- Task priority is likewise divided into five levels, in order HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below (a queueing sketch follows this list):
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/task_priority.png" alt="task priority configuration" width="35%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="task priority configuration" width="35%" />
</p>
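One common way to realize five-level priority with fairness inside a level is a priority queue ordered by (priority, submission time). A minimal sketch with hypothetical names, not EasyScheduler's actual queue implementation:

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Minimal sketch of five-level priority dispatch; the names are illustrative.
public class PrioritySketch {
    enum Priority { HIGHEST, HIGH, MEDIUM, LOW, LOWEST }     // ordinal 0..4, lower runs sooner
    record QueuedTask(String name, Priority priority, long submitTime) {}

    public static void main(String[] args) {
        // Order by priority first; equal priorities fall back to submission time,
        // giving the fair, FIFO-within-level behavior described above.
        PriorityBlockingQueue<QueuedTask> queue = new PriorityBlockingQueue<>(11,
                Comparator.comparingInt((QueuedTask t) -> t.priority().ordinal())
                          .thenComparingLong(QueuedTask::submitTime));

        queue.add(new QueuedTask("report", Priority.LOW, 1));
        queue.add(new QueuedTask("etl",    Priority.HIGHEST, 2));
        queue.add(new QueuedTask("backup", Priority.LOW, 3));

        while (!queue.isEmpty()) System.out.println(queue.poll().name());
        // => etl, report, backup
    }
}
```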
##### VI. Logback and gRPC for log access
@@ -256,7 +256,7 @@ In the early scheduling design, if there is no priority design and fair scheduling
- To keep EasyScheduler as lightweight as possible, gRPC was chosen for remote access to log information.
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/grpc.png" alt="grpc remote access" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc remote access" width="50%" />
</p>
- We use a custom Logback FileAppender and Filter to generate a separate log file for each task instance (a hedged sketch follows).
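The custom appender itself is not shown in the doc. As an assumption-labeled sketch, a Logback `Filter` that passes only events tagged with one task instance's id via the MDC could look like this (the MDC key `taskInstanceId` is hypothetical, not necessarily the project's key):

```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;

// Sketch of a task-scoped Logback filter: accept only events carrying the
// current task instance id in the MDC. The key "taskInstanceId" is an
// assumption for illustration, not EasyScheduler's actual implementation.
public class TaskLogFilter extends Filter<ILoggingEvent> {
    private final String taskInstanceId;

    public TaskLogFilter(String taskInstanceId) {
        this.taskInstanceId = taskInstanceId;
    }

    @Override
    public FilterReply decide(ILoggingEvent event) {
        String id = event.getMDCPropertyMap().get("taskInstanceId");
        return taskInstanceId.equals(id) ? FilterReply.ACCEPT : FilterReply.DENY;
    }
}
```

A worker thread would call `MDC.put("taskInstanceId", id)` before launching the task, and a FileAppender wired with this filter would then collect exactly that task's lines into its own file.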

docs/zh_CN/系统使用手册.md (2 lines changed)

@@ -110,7 +110,7 @@
> Click a task instance node, then click **View History** to see the list of runs of that task instance within the workflow instance.
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/task_history.png" width="60%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_history.png" width="60%" />
</p>

docs/zh_CN/系统架构设计.md (26 lines changed)

@@ -5,7 +5,7 @@
**DAG:** Full name Directed Acyclic Graph, abbreviated as DAG. Tasks in a workflow are assembled as a directed acyclic graph, traversed topologically from the nodes with zero in-degree until no successor nodes remain. For example, the following picture:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/dag_examples_cn.jpg" alt="dag示例" width="60%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/dag_examples_cn.jpg" alt="dag示例" width="60%" />
<p align="center">
<em>dag example</em>
</p>
@@ -37,7 +37,7 @@
#### 2.1 System architecture diagram
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/architecture.jpg" alt="系统架构图" width="70%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/architecture.jpg" alt="系统架构图" width="70%" />
<p align="center">
<em>System architecture diagram</em>
</p>
@@ -98,7 +98,7 @@
The centralized design concept is relatively simple: the nodes of the distributed cluster are divided into two roles by responsibility:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/master_slave.png" alt="master-slave角色" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave角色" width="50%" />
</p>
- The Master is mainly responsible for distributing tasks and supervising the health of the Slaves, and can dynamically rebalance tasks across the Slaves so that no Slave node is either overloaded or left idle.
@@ -115,7 +115,7 @@
###### Decentralization
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/decentralization.png" alt="去中心化" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="去中心化" width="50%" />
</p>
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and have equal status. The global Internet is a typical decentralized distributed system; when any networked node goes down, only a small range of functionality is affected.
@@ -131,12 +131,12 @@
EasyScheduler uses ZooKeeper distributed locks to ensure that at any moment only one Master executes the Scheduler, or only one Worker performs task submission.
1. The core flow for acquiring a distributed lock is as follows:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/distributed_lock.png" alt="获取分布式锁流程" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png" alt="获取分布式锁流程" width="50%" />
</p>
2. Flow chart of the Scheduler thread's distributed-lock handling in EasyScheduler:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/distributed_lock_procss.png" alt="获取分布式锁流程" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock_procss.png" alt="获取分布式锁流程" width="50%" />
</p>
@@ -146,7 +146,7 @@ EasyScheduler uses ZooKeeper distributed locks to ensure that at any moment only one Master
- If many sub-processes are nested inside a large DAG, a "dead wait" state can arise, as in the following figure:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/lack_thread.png" alt="线程不足循环等待问题" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png" alt="线程不足循环等待问题" width="50%" />
</p>
In the figure above, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread from the thread pool, so the whole DAG can never finish and none of its threads can be released. Parent and child flows end up waiting on each other in a loop; unless a new Master is started to add threads and break the deadlock, the scheduling cluster becomes unusable.
@@ -169,7 +169,7 @@ EasyScheduler uses ZooKeeper distributed locks to ensure that at any moment only one Master
Service fault tolerance relies on ZooKeeper's Watcher mechanism; the implementation principle is shown below:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant.png" alt="EasyScheduler容错设计" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png" alt="EasyScheduler容错设计" width="40%" />
</p>
The Master watches the directories of the other Masters and the Workers; if a remove event is detected, it performs process-instance or task-instance fault tolerance according to the specific business logic.
@@ -178,7 +178,7 @@ EasyScheduler uses ZooKeeper distributed locks to ensure that at any moment only one Master
- Master fault-tolerance flow chart:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant_master.png" alt="Master容错流程图" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_master.png" alt="Master容错流程图" width="40%" />
</p>
After ZooKeeper Master fault tolerance completes, the Scheduler thread in EasyScheduler reschedules: it traverses the DAG to find the "Running" and "Submitted Successfully" tasks. For "Running" tasks it monitors the state of their task instances; for "Submitted Successfully" tasks it checks whether they already exist in the Task Queue, monitoring the task instance's state if so and resubmitting the task instance if not.
@@ -187,7 +187,7 @@ After ZooKeeper Master fault tolerance completes, the Scheduler thread in EasyScheduler
- Worker fault-tolerance flow chart:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/fault-tolerant_worker.png" alt="Worker容错流程图" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_worker.png" alt="Worker容错流程图" width="40%" />
</p>
Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state, it takes over the task and resubmits it.
@@ -224,12 +224,12 @@ Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state
- Process-definition priority exists because some processes need to be handled before others; it can be configured when the process is started or scheduled to start. There are five levels, in order HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/process_priority.png" alt="流程优先级配置" width="40%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="流程优先级配置" width="40%" />
</p>
- Task priority is likewise divided into five levels, in order HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below:
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/task_priority.png" alt="任务优先级配置" width="35%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="任务优先级配置" width="35%" />
</p>
@@ -242,7 +242,7 @@ Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state
- To keep EasyScheduler as lightweight as possible, gRPC was chosen for remote access to log information.
<p align="center">
-<img src="https://analysys.github.io/EasyScheduler/zh_CN/images/grpc.png" alt="grpc远程访问" width="50%" />
+<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc远程访问" width="50%" />
</p>
