
Merge remote-tracking branch 'upstream/dev' into dev

pull/2/head
lenboo · 6 years ago · commit db2ac7f74a
Changed files (changed line counts in parentheses):
1. README.md (19)
2. docs/zh_CN/1.0.3-release.md (45)
3. docs/zh_CN/1.0.4-release.md (37)
4. escheduler-dao/src/main/java/cn/escheduler/dao/ProcessDao.java (13)
5. escheduler-dao/src/main/java/cn/escheduler/dao/mapper/ProcessDefinitionMapperProvider.java (4)
6. escheduler-dao/src/main/resources/dao/data_source.properties (6)
7. escheduler-server/src/main/java/cn/escheduler/server/worker/task/AbstractCommandExecutor.java (7)

README.md (19 changed lines)

@@ -31,21 +31,21 @@ Its main objectives are as follows:
  | EasyScheduler | Azkaban | Airflow
-- | -- | -- | --
**Stability** |   |   |  
-Single point of failure | Decentralized multi-master and multi-worker | Yes Single Web and Scheduler Combination Node | Yes. Single Scheduler
+Single point of failure | Decentralized multi-master and multi-worker | Yes <br/> Single Web and Scheduler Combination Node | Yes <br/> Single Scheduler
Additional HA requirements | Not required (HA is supported by itself) | DB | Celery / Dask / Mesos + Load Balancer + DB
Overload processing | Task queue mechanism: the number of schedulable tasks on a single machine can be configured flexibly, and excess tasks are cached in the task queue, so the machine does not jam. | Jams the server when there are too many tasks | Jams the server when there are too many tasks
**Easy to use** |   |   |  
-DAG Monitoring Interface | Visualization process defines key information such as task status, task type, retry times, task running machine, visual variables and so on at a glance. | Only task status can be seen | Can't visually distinguish task types
-Visual process definition | Yes All process definition operations are visualized, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, the api mode operation is provided. | No DAG and custom upload via custom DSL | No DAG is drawn through Python code, which is inconvenient to use, especially for business people who can't write code.
-Quick deployment | One-click deployment | Complex clustering deployment | Complex clustering deployment
+DAG Monitoring Interface | Visualization process defines key information such as task status, task type, retry times, task running machine, visual variables and so on at a glance. | Only task status can be seen | Can't visually distinguish task types
+Visual process definition | Yes <br/> All process definition operations are visualized, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, the api mode operation is provided. | No <br/> DAG and custom upload via custom DSL | No <br/> DAG is drawn through Python code, which is inconvenient to use, especially for business people who can't write code.
+Quick deployment | One-click deployment | Complex clustering deployment | Complex clustering deployment
**Features** |   |   |  
-Suspend and resume | Support pause, recover operation | No Can only kill the workflow first and then re-run | No Can only kill the workflow first and then re-run
+Suspend and resume | Support pause, recover operation | No <br/> Can only kill the workflow first and then re-run | No <br/> Can only kill the workflow first and then re-run
Whether to support multiple tenants | Users on easyscheduler can achieve many-to-one or one-to-one mapping relationships through tenants and Hadoop users, which is very important for scheduling big data jobs. | No | No
Task type | Supports traditional shell tasks, and also supports big data platform task scheduling: MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Procedure, Sub_Process | shell, gobblin, hadoopJava, java, hive, pig, spark, hdfsToTeradata, teradataToHdfs | BashOperator, DummyOperator, MySqlOperator, HiveOperator, EmailOperator, HTTPOperator, SqlOperator
Compatibility | Supports scheduling big data jobs such as Spark, Hive, and MR, and its multi-tenant support makes it a better fit for big data business. | Without multi-tenant support, it is not flexible enough for business use on a big data platform. | Without multi-tenant support, it is not flexible enough for business use on a big data platform.
**Scalability** |   |   |  
Whether to support custom task types | Yes | Yes | Yes
-Is Cluster Extension Supported? | Yes The scheduler uses distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic online and offline. | Yes, but complicated Executor horizontal extend | Yes, but complicated Executor horizontal extend
+Is Cluster Extension Supported? | Yes <br/> The scheduler uses distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic online and offline. | Yes <br/> but complicated Executor horizontal extend | Yes <br/> but complicated Executor horizontal extend
@@ -84,10 +84,13 @@ https://github.com/analysys/EasyScheduler/blob/master/CONTRIBUTING.md
### Thanks
Easy Scheduler uses a lot of excellent open source projects, such as google guava, guice, grpc, netty, ali bonecp, quartz, and many open source projects of apache, etc.
-It is because of the shoulders of these open source projects that the birth of the Easy Scheduler is possible. We are very grateful for all the open source software used! We also hope that we will not only be the beneficiaries of open source, but also be open source contributors, so we decided to contribute to easy scheduling and promised long-term updates. I also hope that partners who have the same passion and conviction for open source will join in and contribute to open source!
+It is because of the shoulders of these open source projects that the birth of the Easy Scheduler is possible. We are very grateful for all the open source software used! We also hope that we will not only be the beneficiaries of open source, but also be open source contributors, so we decided to contribute to easy scheduling and promised long-term updates. We also hope that partners who have the same passion and conviction for open source will join in and contribute to open source!
-### Help
+### Get Help
The fastest way to get a response from our developers is to submit issues, or add our WeChat: 510570367
### License
Please refer to [LICENSE](https://github.com/analysys/EasyScheduler/blob/dev/LICENSE) file.

docs/zh_CN/1.0.3-release.md (45 changed lines)

@@ -2,45 +2,22 @@ Easy Scheduler Release 1.0.3
===
Easy Scheduler 1.0.3 is the fourth release in the 1.x series.
New features:
===
- [[EasyScheduler-254](https://github.com/analysys/EasyScheduler/issues/254)] Delete and batch-delete process definitions
- [[EasyScheduler-347](https://github.com/analysys/EasyScheduler/issues/347)] Add "today" to task dependencies
- [[EasyScheduler-273](https://github.com/analysys/EasyScheduler/issues/273)] Add a title to SQL tasks
- [[EasyScheduler-247](https://github.com/analysys/EasyScheduler/issues/247)] Online API documentation
- [[EasyScheduler-319](https://github.com/analysys/EasyScheduler/issues/319)] Single-machine fault tolerance
- [[EasyScheduler-253](https://github.com/analysys/EasyScheduler/issues/253)] Add process definition statistics and running process instance statistics to projects
- [[EasyScheduler-292](https://github.com/analysys/EasyScheduler/issues/292)] Send mail through SSL-enabled mail servers
- [[EasyScheduler-77](https://github.com/analysys/EasyScheduler/issues/77)] Add delete to schedule management and workflow definitions
- [[EasyScheduler-380](https://github.com/analysys/EasyScheduler/issues/380)] Service monitoring
- [[EasyScheduler-380](https://github.com/analysys/EasyScheduler/issues/382)] Add process definition statistics and running process instance statistics to projects
Enhancements:
===
- [[EasyScheduler-192](https://github.com/analysys/EasyScheduler/issues/192)] Validate the tenant and its resources before deleting a tenant
- [[EasyScheduler-376](https://github.com/analysys/EasyScheduler/issues/294)] Tasks in the corresponding zookeeper queue were not deleted when an instance was deleted
- [[EasyScheduler-185](https://github.com/analysys/EasyScheduler/issues/185)] Workflow definitions still exist after their project is deleted
- [[EasyScheduler-206](https://github.com/analysys/EasyScheduler/issues/206)] Optimize deployment and improve Docker support
- [[EasyScheduler-381](https://github.com/analysys/EasyScheduler/issues/381)] The front-end one-click deployment script supports Ubuntu
-- [[EasyScheduler-482](https://github.com/analysys/EasyScheduler/issues/482)] Support custom variables in the email title of SQL tasks
-- [[EasyScheduler-483](https://github.com/analysys/EasyScheduler/issues/483)] Mark a SQL task as failed when sending its email fails
-- [[EasyScheduler-484](https://github.com/analysys/EasyScheduler/issues/484)] Change the replacement rules for custom variables in SQL tasks to support multiple single and double quotes
-- [[EasyScheduler-485](https://github.com/analysys/EasyScheduler/issues/485)] Verify whether a resource file already exists on HDFS when creating it
Fixes:
===
- [[EasyScheduler-255](https://github.com/analysys/EasyScheduler/issues/255)] Global variable override between parent and child processes: a child process inherits the parent's global variables and can override them
- [[EasyScheduler-256](https://github.com/analysys/EasyScheduler/issues/256)] Parent/child process parameters are displayed incorrectly
- [[EasyScheduler-186](https://github.com/analysys/EasyScheduler/issues/186)] Entering just % in any query returns all data
- [[EasyScheduler-185](https://github.com/analysys/EasyScheduler/issues/185)] Workflow definitions still exist after their project is deleted
- [[EasyScheduler-266](https://github.com/analysys/EasyScheduler/issues/266)] Stopping a process returns: process definition 1 not on line
- [[EasyScheduler-300](https://github.com/analysys/EasyScheduler/issues/300)] Time unit of timeout alerts
- [[EasyScheduler-235](https://github.com/analysys/EasyScheduler/issues/235)] Fix nginx connection timeout
- [[EasyScheduler-272](https://github.com/analysys/EasyScheduler/issues/272)] Administrators cannot generate tokens
- [[EasyScheduler-272](https://github.com/analysys/EasyScheduler/issues/277)] Error when saving global parameters
- [[EasyScheduler-183](https://github.com/analysys/EasyScheduler/issues/183)] Error when creating a Worker group with a Chinese name
- [[EasyScheduler-377](https://github.com/analysys/EasyScheduler/issues/377)] Renaming a resource file while changing only its description reports a "name already exists" error
- [[EasyScheduler-235](https://github.com/analysys/EasyScheduler/issues/235)] The system falls back to the login page after clicking "Test Connection" when creating a Spark data source
- [[EasyScheduler-83](https://github.com/analysys/EasyScheduler/issues/83)] Error when starting the API server in version 1.0.1
- [[EasyScheduler-379](https://github.com/analysys/EasyScheduler/issues/379)] Wrong time parameters when a scheduled task is resumed across days
- [[EasyScheduler-383](https://github.com/analysys/EasyScheduler/issues/383)] Leading blank lines are not shown in SQL emails
-- [[EasyScheduler-198](https://github.com/analysys/EasyScheduler/issues/198)] Sort the process definition list by schedule status and update time
-- [[EasyScheduler-419](https://github.com/analysys/EasyScheduler/issues/419)] Fix: creating a file online returned success even though the HDFS file was not created
-- [[EasyScheduler-481](https://github.com/analysys/EasyScheduler/issues/481)] Fix: a schedule could not be taken offline when its job did not exist
-- [[EasyScheduler-425](https://github.com/analysys/EasyScheduler/issues/425)] Also kill a task's child processes when killing the task
-- [[EasyScheduler-422](https://github.com/analysys/EasyScheduler/issues/422)] Fix: update time and size were not updated when updating a resource file
-- [[EasyScheduler-431](https://github.com/analysys/EasyScheduler/issues/431)] Fix: deleting a tenant failed when HDFS was not started
-- [[EasyScheduler-486](https://github.com/analysys/EasyScheduler/issues/486)] Wait for the yarn state to become terminal before judging the result when the shell process exits
Thanks:
===

docs/zh_CN/1.0.4-release.md (37 changed lines)

@@ -2,30 +2,27 @@ Easy Scheduler Release 1.0.4
===
Easy Scheduler 1.0.4 is the fifth release in the 1.x series.
-Enhancements:
-===
-- [[EasyScheduler-482](https://github.com/analysys/EasyScheduler/issues/482)] Support custom variables in the email title of SQL tasks
-- [[EasyScheduler-483](https://github.com/analysys/EasyScheduler/issues/483)] Mark a SQL task as failed when sending its email fails
-- [[EasyScheduler-484](https://github.com/analysys/EasyScheduler/issues/484)] Change the replacement rules for custom variables in SQL tasks to support multiple single and double quotes
-- [[EasyScheduler-485](https://github.com/analysys/EasyScheduler/issues/485)] Verify whether a resource file already exists on HDFS when creating it
-- [[EasyScheduler-486](https://github.com/analysys/EasyScheduler/issues/486)] Wait for the yarn state to become terminal before judging the result when the shell process exits
+**Fixes**:
+- [[EasyScheduler-198](https://github.com/analysys/EasyScheduler/issues/198)] Sort the process definition list by schedule status and update time
+- [[EasyScheduler-419](https://github.com/analysys/EasyScheduler/issues/419)] Fix: creating a file online returned success even though the HDFS file was not created
+- [[EasyScheduler-481](https://github.com/analysys/EasyScheduler/issues/481)] Fix: a schedule could not be taken offline when its job did not exist
+- [[EasyScheduler-425](https://github.com/analysys/EasyScheduler/issues/425)] Also kill a task's child processes when killing the task
+- [[EasyScheduler-422](https://github.com/analysys/EasyScheduler/issues/422)] Fix: update time and size were not updated when updating a resource file
+- [[EasyScheduler-431](https://github.com/analysys/EasyScheduler/issues/431)] Fix: deleting a tenant failed when HDFS was not started
+- [[EasyScheduler-486](https://github.com/analysys/EasyScheduler/issues/486)] Wait for the yarn state to become terminal before judging the result when the shell process exits
-Fixes
-===
-- [[EasyScheduler-198](https://github.com/analysys/EasyScheduler/issues/198)] Sort the process definition list by schedule status and update time
-- [[EasyScheduler-419](https://github.com/analysys/EasyScheduler/issues/419)] Fix: creating a file online returned success even though the HDFS file was not created
-- [[EasyScheduler-481](https://github.com/analysys/EasyScheduler/issues/481)] Fix: a schedule could not be taken offline when its job did not exist
-- [[EasyScheduler-425](https://github.com/analysys/EasyScheduler/issues/425)] Also kill a task's child processes when killing the task
-- [[EasyScheduler-422](https://github.com/analysys/EasyScheduler/issues/422)] Fix: update time and size were not updated when updating a resource file
-- [[EasyScheduler-431](https://github.com/analysys/EasyScheduler/issues/431)] Fix: deleting a tenant failed when HDFS was not started
+**Enhancements**:
+- [[EasyScheduler-482](https://github.com/analysys/EasyScheduler/issues/482)] Support custom variables in the email title of SQL tasks
+- [[EasyScheduler-483](https://github.com/analysys/EasyScheduler/issues/483)] Mark a SQL task as failed when sending its email fails
+- [[EasyScheduler-484](https://github.com/analysys/EasyScheduler/issues/484)] Change the replacement rules for custom variables in SQL tasks to support multiple single and double quotes
+- [[EasyScheduler-485](https://github.com/analysys/EasyScheduler/issues/485)] Verify whether a resource file already exists on HDFS when creating it
Thanks:
===
-Last but most important: without the contributions of the following partners there would be no new release:
-Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, feloxx, coding-now, hymzcn, nysyxxg, chgxtony, gj-zhang, xianhu, sunnyingit,
-zhengqiangtan
+Last but most important: without the contributions of the following partners there would be no new release (in no particular order):
-And the many enthusiastic partners in the WeChat group. Many thanks to all of you!
+Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879,
+feloxx, coding-now, hymzcn, nysyxxg, chgxtony, lfyee, Crossoverrr, gj-zhang, sunnyingit, xianhu, zhengqiangtan
+And the many enthusiastic partners in the WeChat/DingTalk groups. Many thanks to all of you!

escheduler-dao/src/main/java/cn/escheduler/dao/ProcessDao.java (13 changed lines)

@@ -931,6 +931,9 @@ public class ProcessDao extends AbstractBaseDao {
        cmdParam.put(CMDPARAM_COMPLEMENT_DATA_START_DATE, startTime);
        processMapStr = JSONUtils.toJson(cmdParam);
    }
+    updateSubProcessDefinitionByParent(parentProcessInstance, childDefineId);
    Command command = new Command();
    command.setWarningType(parentProcessInstance.getWarningType());
    command.setWarningGroupId(parentProcessInstance.getWarningGroupId());
@@ -945,6 +948,16 @@
    logger.info("sub process command created: {} ", command.toString());
}

+private void updateSubProcessDefinitionByParent(ProcessInstance parentProcessInstance, int childDefinitionId) {
+    ProcessDefinition fatherDefinition = this.findProcessDefineById(parentProcessInstance.getProcessDefinitionId());
+    ProcessDefinition childDefinition = this.findProcessDefineById(childDefinitionId);
+    if (childDefinition != null && fatherDefinition != null) {
+        childDefinition.setReceivers(fatherDefinition.getReceivers());
+        childDefinition.setReceiversCc(fatherDefinition.getReceiversCc());
+        processDefineMapper.update(childDefinition);
+    }
+}

/**
 * submit task to mysql
 * @param taskInstance
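The new helper makes a sub-process definition inherit the alert receivers of its parent before the sub-process command is created. A minimal sketch of that rule in isolation, using hypothetical stand-in types rather than the real escheduler-dao classes (Definition, STORE and inherit are illustrative names, not project code):

```java
import java.util.HashMap;
import java.util.Map;

public class InheritReceiversSketch {
    // Hypothetical stand-in for ProcessDefinition's receiver fields.
    static class Definition {
        String receivers;   // alert "to" list
        String receiversCc; // alert "cc" list
    }

    // Stand-in for the definition table, keyed by definition id.
    static final Map<Integer, Definition> STORE = new HashMap<>();

    // Mirrors updateSubProcessDefinitionByParent: copy receivers from the
    // parent definition to the child only when both can be found.
    static void inherit(int parentId, int childId) {
        Definition parent = STORE.get(parentId);
        Definition child = STORE.get(childId);
        if (parent != null && child != null) {
            child.receivers = parent.receivers;
            child.receiversCc = parent.receiversCc; // persisted via processDefineMapper.update in the real code
        }
    }

    public static void main(String[] args) {
        Definition parent = new Definition();
        parent.receivers = "ops@example.com";
        parent.receiversCc = "lead@example.com";
        STORE.put(1, parent);
        STORE.put(2, new Definition());
        inherit(1, 2);
        // Prints ops@example.com: the sub process now alerts the same recipients.
        System.out.println(STORE.get(2).receivers);
    }
}
```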

escheduler-dao/src/main/java/cn/escheduler/dao/mapper/ProcessDefinitionMapperProvider.java (4 changed lines)

@@ -55,6 +55,8 @@ public class ProcessDefinitionMapperProvider {
VALUES("`connects`", "#{processDefinition.connects}");
VALUES("`create_time`", "#{processDefinition.createTime}");
VALUES("`update_time`", "#{processDefinition.updateTime}");
+VALUES("`receivers`", "#{processDefinition.receivers}");
+VALUES("`receivers_cc`", "#{processDefinition.receiversCc}");
VALUES("`timeout`", "#{processDefinition.timeout}");
VALUES("`tenant_id`", "#{processDefinition.tenantId}");
VALUES("`flag`", EnumFieldUtil.genFieldStr("processDefinition.flag", ReleaseState.class));
@@ -102,6 +104,8 @@
SET("`global_params`=#{processDefinition.globalParams}");
SET("`create_time`=#{processDefinition.createTime}");
SET("`update_time`=#{processDefinition.updateTime}");
+SET("`receivers`=#{processDefinition.receivers}");
+SET("`receivers_cc`=#{processDefinition.receiversCc}");
SET("`timeout`=#{processDefinition.timeout}");
SET("`tenant_id`=#{processDefinition.tenantId}");
SET("`flag`=" + EnumFieldUtil.genFieldStr("processDefinition.flag", Flag.class));
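These VALUES/SET calls follow MyBatis' org.apache.ibatis.jdbc.SQL builder API, so each added line contributes one column to the generated INSERT or UPDATE; note the backing table needs matching receivers and receivers_cc columns for the new statements to succeed. A minimal sketch of how the builder renders the new columns (the table name is an assumption for illustration, not taken from this diff):

```java
import org.apache.ibatis.jdbc.SQL;

public class ReceiversSqlSketch {
    public static void main(String[] args) {
        // Build a statement the same way the provider does; #{...} are
        // MyBatis placeholders bound from the processDefinition parameter.
        String insert = new SQL() {{
            INSERT_INTO("t_escheduler_process_definition"); // assumed table name
            VALUES("`receivers`", "#{processDefinition.receivers}");
            VALUES("`receivers_cc`", "#{processDefinition.receiversCc}");
        }}.toString();
        // INSERT INTO t_escheduler_process_definition (`receivers`, `receivers_cc`)
        // VALUES (#{processDefinition.receivers}, #{processDefinition.receiversCc})
        System.out.println(insert);
    }
}
```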

escheduler-dao/src/main/resources/dao/data_source.properties (6 changed lines)

@@ -1,9 +1,9 @@
# base spring data source configuration
spring.datasource.type=com.alibaba.druid.pool.DruidDataSource
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
-spring.datasource.url=jdbc:mysql://192.168.220.188:3306/escheduler_new?characterEncoding=UTF-8
-spring.datasource.username=root
-spring.datasource.password=root@123
+spring.datasource.url=jdbc:mysql://192.168.xx.xx:3306/escheduler?characterEncoding=UTF-8
+spring.datasource.username=xx
+spring.datasource.password=xx
# connection configuration
spring.datasource.initialSize=5
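This change scrubs a concrete host and real credentials from the sample configuration, replacing them with xx placeholders. To smoke-test such settings outside Spring, a minimal sketch using Druid's programmatic setters (the values are the placeholders from the file above and must be substituted before running):

```java
import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;

public class DataSourceSmokeTest {
    public static void main(String[] args) throws Exception {
        // Mirrors data_source.properties; the xx values are placeholders.
        DruidDataSource ds = new DruidDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://192.168.xx.xx:3306/escheduler?characterEncoding=UTF-8");
        ds.setUsername("xx");
        ds.setPassword("xx");
        ds.setInitialSize(5); // matches spring.datasource.initialSize

        try (Connection conn = ds.getConnection()) {
            System.out.println("connected: " + !conn.isClosed());
        } finally {
            ds.close();
        }
    }
}
```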

escheduler-server/src/main/java/cn/escheduler/server/worker/task/AbstractCommandExecutor.java (7 changed lines)

@@ -162,7 +162,12 @@ public abstract class AbstractCommandExecutor {
    exitStatusCode = updateState(processDao, exitStatusCode, pid, taskInstId);
} else {
    cancelApplication();
+    TaskInstance taskInstance = processDao.findTaskInstanceById(taskInstId);
+    if (taskInstance == null) {
+        logger.error("task instance id:{} not exist", taskInstId);
+    } else {
+        ProcessUtils.kill(taskInstance);
+    }
    exitStatusCode = -1;
    logger.warn("process timeout, work dir:{}, pid:{}", taskDir, pid);
}
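With this change the timeout branch looks up the task instance and passes it to ProcessUtils.kill, so the spawned shell and its children are terminated instead of being left running after cancelApplication(). ProcessUtils itself is not part of this diff; below is a rough sketch of one common way to kill a whole process tree on POSIX systems, an assumption for illustration rather than the project's actual implementation:

```java
import java.io.IOException;

public class KillTreeSketch {
    // Send SIGKILL to the process group of pid. Assumes the task was started
    // as a process-group leader (e.g. via setsid), so addressing the negative
    // pid reaches every child in the tree. Linux/macOS only.
    static void killProcessGroup(int pid) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sh", "-c", "kill -9 -- -" + pid)
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            System.err.println("kill exited with " + exit + " for process group " + pid);
        }
    }

    public static void main(String[] args) throws Exception {
        killProcessGroup(Integer.parseInt(args[0]));
    }
}
```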
