From 7e8b5c693cf73ed47204931ea49637e71206dd9e Mon Sep 17 00:00:00 2001 From: bao liang <29528966+lenboo@users.noreply.github.com> Date: Tue, 6 Aug 2019 11:14:27 +0800 Subject: [PATCH] add en-us documents (#672) * add quick start document * update test * add backend deployment document * add frontend deployment document * add system manual * Supplementary translation Translated untranslated places * 1.0.1-release.md add 1.0.1-release document * 1.0.2-release.md add 1.0.2-release document * 1.0.3-release.md add 1.0.3-release document * 1.1.0-release.md add 1.1.0-release document * EasyScheduler-FAQ.md add FAQ document * Backend development documentation.md add backend development documentation * Upgrade documentation.md add Upgrade documentation * Frontend development documentation.md add frontend development documentation --- docs/en_us/1.0.1-release.md | 16 + docs/en_us/1.0.2-release.md | 49 ++ docs/en_us/1.0.3-release.md | 30 + docs/en_us/1.1.0-release.md | 55 ++ docs/en_us/Backend Deployment Document.md | 223 ++++++ .../Backend development documentation.md | 48 ++ docs/en_us/EasyScheduler-FAQ.md | 285 +++++++ docs/en_us/Frontend Deployment Document.md | 106 +++ .../Frontend development documentation.md | 650 ++++++++++++++++ docs/en_us/Quick Start.md | 52 ++ docs/en_us/System manual.md | 715 ++++++++++++++++++ docs/en_us/Upgrade documentation.md | 38 + 12 files changed, 2267 insertions(+) create mode 100644 docs/en_us/1.0.1-release.md create mode 100644 docs/en_us/1.0.2-release.md create mode 100644 docs/en_us/1.0.3-release.md create mode 100644 docs/en_us/1.1.0-release.md create mode 100644 docs/en_us/Backend Deployment Document.md create mode 100644 docs/en_us/Backend development documentation.md create mode 100644 docs/en_us/EasyScheduler-FAQ.md create mode 100644 docs/en_us/Frontend Deployment Document.md create mode 100644 docs/en_us/Frontend development documentation.md create mode 100644 docs/en_us/Quick Start.md create mode 100644 docs/en_us/System manual.md create mode 100644 docs/en_us/Upgrade documentation.md diff --git a/docs/en_us/1.0.1-release.md b/docs/en_us/1.0.1-release.md new file mode 100644 index 0000000000..9bebfcca9b --- /dev/null +++ b/docs/en_us/1.0.1-release.md @@ -0,0 +1,16 @@ +Easy Scheduler Release 1.0.1 +=== +Easy Scheduler 1.0.2 is the second version in the 1.x series. The update is as follows: + +- 1,outlook TSL email support +- 2,servlet and protobuf jar conflict resolution +- 3,create a tenant and establish a Linux user at the same time +- 4,the re-run time is negative +- 5,stand-alone and cluster can be deployed with one click of install.sh +- 6,queue support interface added +- 7,escheduler.t_escheduler_queue added create_time and update_time fields + + + + + diff --git a/docs/en_us/1.0.2-release.md b/docs/en_us/1.0.2-release.md new file mode 100644 index 0000000000..502dbf8f9b --- /dev/null +++ b/docs/en_us/1.0.2-release.md @@ -0,0 +1,49 @@ +Easy Scheduler Release 1.0.2 +=== +Easy Scheduler 1.0.2 is the third version in the 1.x series. This version adds scheduling open interfaces, worker grouping (the machine group for which the specified task runs), task flow and service monitoring, and support for oracle, clickhouse, etc., as follows: + +New features: +=== +- [[EasyScheduler-79](https://github.com/analysys/EasyScheduler/issues/79)] scheduling the open interface through the token mode, which can be operated through the api. 
+- [[EasyScheduler-138](https://github.com/analysys/EasyScheduler/issues/138)] can specify the machine (group) where the task runs. +- [[EasyScheduler-139](https://github.com/analysys/EasyScheduler/issues/139)] task Process Monitoring and Master, Worker, Zookeeper Operation Status Monitoring +- [[EasyScheduler-140](https://github.com/analysys/EasyScheduler/issues/140)] workflow Definition - Increase Process Timeout Alarm +- [[EasyScheduler-134](https://github.com/analysys/EasyScheduler/issues/134)] task type supports Oracle, CLICKHOUSE, SQLSERVER, IMPALA +- [[EasyScheduler-136](https://github.com/analysys/EasyScheduler/issues/136)] sql task node can independently select CC mail users +- [[EasyScheduler-141](https://github.com/analysys/EasyScheduler/issues/141)] user Management—Users can bind queues. The user queue level is higher than the tenant queue level. If the user queue is empty, look for the tenant queue. + + + +Enhanced: +=== +- [[EasyScheduler-154](https://github.com/analysys/EasyScheduler/issues/154)] Tenant code allows encoding of pure numbers or underscores + + +Repair: +=== +- [[EasyScheduler-135](https://github.com/analysys/EasyScheduler/issues/135)] Python task can specify python version + +- [[EasyScheduler-125](https://github.com/analysys/EasyScheduler/issues/125)] The mobile phone number in the user account does not recognize the opening of Unicom's latest number 166 + +- [[EasyScheduler-178](https://github.com/analysys/EasyScheduler/issues/178)] Fix subtle spelling mistakes in ProcessDao + +- [[EasyScheduler-129](https://github.com/analysys/EasyScheduler/issues/129)] Tenant code, underlined and other special characters cannot pass the check. + + +Thank: +=== +Last but not least, no new version was born without the contributions of the following partners: + +Baoqi , chubbyjiang , coreychen , chgxtony, cmdares , datuzi , dingchao, fanguanqun , 风清扬, gaojun416 , googlechorme, hyperknob , hujiang75277381 , huanzui , kinssun, ivivi727 ,jimmy, jiangzhx , kevin5210 , lidongdai , lshmouse , lenboo, lyf198972 , lgcareer , lzy305 , moranrr , millionfor , mazhong8808, programlief, qiaozhanwei , roy110 , swxchappy , sherlock111 , samz406 , swxchappy, qq389401879 , lzy305, vkingnew, William-GuoWei , woniulinux, yyl861, zhangxin1988, yangjiajun2014, yangqinlong, yangjiajun2014, zhzhenqin, zhangluck, zhanghaicheng1, zhuyizhizhi + +And many enthusiastic partners in the WeChat group! Thank you very much! + + + + + + + + + + diff --git a/docs/en_us/1.0.3-release.md b/docs/en_us/1.0.3-release.md new file mode 100644 index 0000000000..b87f894011 --- /dev/null +++ b/docs/en_us/1.0.3-release.md @@ -0,0 +1,30 @@ +Easy Scheduler Release 1.0.3 +=== +Easy Scheduler 1.0.3 is the fourth version in the 1.x series. + +Enhanced: +=== +- [[EasyScheduler-482]](https://github.com/analysys/EasyScheduler/issues/482)sql task mail header added support for custom variables +- [[EasyScheduler-483]](https://github.com/analysys/EasyScheduler/issues/483)sql task failed to send mail, then this sql task is failed +- [[EasyScheduler-484]](https://github.com/analysys/EasyScheduler/issues/484)modify the replacement rule of the custom variable in the sql task, and support the replacement of multiple single quotes and double quotes. 
+- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/485) when creating a resource file, add a check for whether the resource file already exists on hdfs
+
+Repair:
+===
+- [[EasyScheduler-198]](https://github.com/analysys/EasyScheduler/issues/198) the process definition list is sorted by timing status and update time
+- [[EasyScheduler-419]](https://github.com/analysys/EasyScheduler/issues/419) fixed online file creation where the hdfs file was not created but success was still returned
+- [[EasyScheduler-481]](https://github.com/analysys/EasyScheduler/issues/481) fixed the problem that the job does not exist at the same time
+- [[EasyScheduler-425]](https://github.com/analysys/EasyScheduler/issues/425) also kill a task's child processes when killing the task
+- [[EasyScheduler-422]](https://github.com/analysys/EasyScheduler/issues/422) fixed an issue where the update time and size were not updated when updating resource files
+- [[EasyScheduler-431]](https://github.com/analysys/EasyScheduler/issues/431) fixed an issue where deleting a tenant failed if hdfs had not been started when the tenant was deleted
+- [[EasyScheduler-486]](https://github.com/analysys/EasyScheduler/issues/486) when the shell process exits, wait for the yarn state to be final before judging the task state
+
+Thank:
+===
+Last but not least, no new version was born without the contributions of the following partners:
+
+Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879,
+feloxx, coding-now, hymzcn, nysyxxg, chgxtony
+
+And many enthusiastic partners in the WeChat group! Thank you very much!
+
diff --git a/docs/en_us/1.1.0-release.md b/docs/en_us/1.1.0-release.md
new file mode 100644
index 0000000000..c9ebe71503
--- /dev/null
+++ b/docs/en_us/1.1.0-release.md
@@ -0,0 +1,55 @@
+Easy Scheduler Release 1.1.0
+===
+Easy Scheduler 1.1.0 is the first release in the 1.1.x series.
+
+New features:
+===
+- [[EasyScheduler-391](https://github.com/analysys/EasyScheduler/issues/391)] run a process under a specified tenant user
+- [[EasyScheduler-288](https://github.com/analysys/EasyScheduler/issues/288)] feature/qiye_weixin (enterprise WeChat)
+- [[EasyScheduler-189](https://github.com/analysys/EasyScheduler/issues/189)] security support such as Kerberos
+- [[EasyScheduler-398](https://github.com/analysys/EasyScheduler/issues/398)] an administrator with a tenant (install.sh sets the default tenant) can create resources, projects and data sources (limited to one administrator)
+- [[EasyScheduler-293](https://github.com/analysys/EasyScheduler/issues/293)] the parameters selected when running a process can now be viewed and saved (previously there was no place to view them)
+- [[EasyScheduler-401](https://github.com/analysys/EasyScheduler/issues/401)] timing is easily mis-set to run every second; after a timing is created, the next trigger times can be displayed on the page
+- [[EasyScheduler-493](https://github.com/analysys/EasyScheduler/pull/493)] add datasource kerberos auth, FAQ updates, and resource upload to s3
+
+
+Enhanced:
+===
+- [[EasyScheduler-227](https://github.com/analysys/EasyScheduler/issues/227)] upgrade spring-boot to 2.1.x and spring to 5.x
+- [[EasyScheduler-434](https://github.com/analysys/EasyScheduler/issues/434)] the number of worker nodes in zk and mysql is inconsistent
+- [[EasyScheduler-435](https://github.com/analysys/EasyScheduler/issues/435)] validation of the mailbox format
+- [[EasyScheduler-441](https://github.com/analysys/EasyScheduler/issues/441)] prohibit running nodes from joining the completed-node detection
+- [[EasyScheduler-400](https://github.com/analysys/EasyScheduler/issues/400)] home page: queue statistics are inconsistent and command statistics have no data
+- [[EasyScheduler-395](https://github.com/analysys/EasyScheduler/issues/395)] for fault-tolerant recovery processes, the status cannot be **running**
+- [[EasyScheduler-529](https://github.com/analysys/EasyScheduler/issues/529)] optimize polling tasks from zookeeper
+- [[EasyScheduler-242](https://github.com/analysys/EasyScheduler/issues/242)] worker-server node task-fetching performance problem
+- [[EasyScheduler-352](https://github.com/analysys/EasyScheduler/issues/352)] worker grouping queue consumption problem
+- [[EasyScheduler-461](https://github.com/analysys/EasyScheduler/issues/461)] when viewing data source parameters, the account and password information need to be encrypted
+- [[EasyScheduler-396](https://github.com/analysys/EasyScheduler/issues/396)] Dockerfile optimization, and link the Dockerfile with github to build images automatically
+- [[EasyScheduler-389](https://github.com/analysys/EasyScheduler/issues/389)] service monitor cannot detect master/worker changes
+- [[EasyScheduler-511](https://github.com/analysys/EasyScheduler/issues/511)] support recovering a process from stopped/killed nodes
+- [[EasyScheduler-399](https://github.com/analysys/EasyScheduler/issues/399)] HadoopUtils performs operations as the specified user instead of the **deployment user**
+
+Repair:
+===
+- [[EasyScheduler-394](https://github.com/analysys/EasyScheduler/issues/394)] when the master & worker are deployed on the same machine and the master & worker services are restarted, previously scheduled tasks can no longer be scheduled
+- [[EasyScheduler-469](https://github.com/analysys/EasyScheduler/issues/469)] fix naming errors on the monitor page
+- [[EasyScheduler-392](https://github.com/analysys/EasyScheduler/issues/392)] feature request: fix email regex check
+- [[EasyScheduler-405](https://github.com/analysys/EasyScheduler/issues/405)] on the timing add/edit page, the start time and end time cannot be the same
+- [[EasyScheduler-517](https://github.com/analysys/EasyScheduler/issues/517)] complement - sub-workflow - time parameter
+- [[EasyScheduler-532](https://github.com/analysys/EasyScheduler/issues/532)] python node does not execute
+- [[EasyScheduler-543](https://github.com/analysys/EasyScheduler/issues/543)] optimize datasource connection params safety
+- [[EasyScheduler-569](https://github.com/analysys/EasyScheduler/issues/569)] timed tasks cannot really be stopped
+- [[EasyScheduler-463](https://github.com/analysys/EasyScheduler/issues/463)] mailbox verification does not support mailboxes with unusual suffixes
+
+
+
+
+Thank:
+===
+Last but not least, no new version was born without the contributions of the following partners:
+
+Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, chgxtony, Stanfan, lfyee, thisnew, hujiang75277381, sunnyingit, lgbo-ustc, ivivi, lzy305, JackIllkid, telltime, lipengbo2018, wuchunfu, telltime
+
+And many enthusiastic partners in the WeChat group! Thank you very much!
+
diff --git a/docs/en_us/Backend Deployment Document.md b/docs/en_us/Backend Deployment Document.md
new file mode 100644
index 0000000000..5d0299f55d
--- /dev/null
+++ b/docs/en_us/Backend Deployment Document.md
@@ -0,0 +1,223 @@
+# Backend Deployment Document
+
+There are two deployment modes for the backend:
+
+- 1. automatic deployment
+- 2. compile the source code and then deploy
+
+## 1、Preparations
+
+Download the latest version of the installation package, download address: [gitee download](https://gitee.com/easyscheduler/EasyScheduler/attach_files/) , download escheduler-backend-x.x.x.tar.gz (the back end, referred to as escheduler-backend) and escheduler-ui-x.x.x.tar.gz (the front end, referred to as escheduler-ui)
+
+
+
+#### Preparations 1: Installation of basic software (self-installation of required items)
+
+ * [Mysql](http://geek.analysys.cn/topic/124) (5.5+) : Mandatory
+ * [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (1.8+) : Mandatory
+ * [ZooKeeper](https://www.jianshu.com/p/de90172ea680) (3.4.6+) : Mandatory
+ * [Hadoop](https://blog.csdn.net/Evankaka/article/details/51612437) (2.6+) : Optional; needed if you use the resource upload function or submit MapReduce tasks (uploaded resource files are currently stored on Hdfs)
+ * [Hive](https://staroon.pro/2017/12/09/HiveInstall/) (1.2.1) : Optional, required for hive task submission
+ * Spark (1.x, 2.x) : Optional, required for Spark task submission
+ * PostgreSQL (8.2.15+) : Optional, required for PostgreSQL stored procedure tasks
+
+```
+ Note: Easy Scheduler itself does not rely on Hadoop, Hive, Spark or PostgreSQL, but only calls their clients to run the corresponding tasks.
+```
+
+#### Preparations 2: Create deployment users
+
+- Deployment users are created on all machines that require deployment scheduling. Because the worker service executes jobs with sudo -u {linux-user}, the deployment user needs sudo privileges and must be password-free.
+
+```Deployment account
+vi /etc/sudoers
+
+# For example, the deployment user is an escheduler account
+escheduler  ALL=(ALL)  NOPASSWD: ALL
+
+# And you need to comment out the Defaults requiretty line
+#Defaults    requiretty
+```
+
+#### Preparations 3: SSH Secret-Free Configuration
+Configure SSH password-free login from the deployment machine to the other installation machines. If you want to install easyscheduler on the deployment machine itself, you also need to configure password-free login to the machine itself.
+
+
+
+- [Connect the host and other machines SSH](http://geek.analysys.cn/topic/113)
+
+
+#### Preparations 4: database initialization
+
+* Create databases and accounts
+
+    Enter the mysql command line with the following MySQL command:
+
+    > mysql -h {host} -u {user} -p{password}
+
+    Then execute the following commands to create the database and account:
+
+    ```sql
+    CREATE DATABASE escheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
+    GRANT ALL PRIVILEGES ON escheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
+    GRANT ALL PRIVILEGES ON escheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
+    flush privileges;
+    ```
+
+* Versions 1.0.0 and 1.0.1: create tables and import basic data
+    Instructions: the table creation scripts are escheduler-backend/sql/escheduler.sql and quartz.sql
+
+    ```sql
+    mysql -h {host} -u {user} -p{password} -D {db} < escheduler.sql
+
+    mysql -h {host} -u {user} -p{password} -D {db} < quartz.sql
+    ```
+
+* Version 1.0.2 and later (including 1.0.2): create tables and import basic data
+    Modify the following attributes in conf/dao/data_source.properties
+
+    ```
+        spring.datasource.url
+        spring.datasource.username
+        spring.datasource.password
+    ```
+    Execute the script for creating tables and importing basic data
+    ```
+    sh ./script/create_escheduler.sh
+    ```
+
+#### Preparations 5: Modify the deployment directory permissions and operation parameters
+
+Let's first get a general idea of the role of the files (folders) in the escheduler-backend directory after decompression.
+
+```directory
+bin : Basic service startup scripts
+conf : Project configuration files
+lib : Jar packages the project relies on, including individual module jars and third-party jars
+script : Cluster start, stop and service monitor start/stop scripts
+sql : SQL files the project relies on
+install.sh : One-click deployment script
+```
+
+- Modify permissions (change deployUser to the corresponding deployment user) so that the deployment user has operational privileges on the escheduler-backend directory
+
+    `sudo chown -R deployUser:deployUser escheduler-backend`
+
+- Modify the `.escheduler_env.sh` environment variable file in the conf/env/ directory
+
+- Modify deployment parameters (depending on your server and business situation):
+
+  - Modify the parameters in **install.sh** to replace the values required by your business
+  - MonitorServerState switch variable, added in version 1.0.3, controls whether to start the self-start script (it monitors the master and worker status and restarts them automatically if they go offline). The default value "false" means the self-start script is not started; change it to "true" if you need it.
+  - hdfsStartupSate switch variable controls whether to start hdfs
+      The default value "false" means hdfs is not started
+      If you need to start hdfs, change it to "true", and you need to create the hdfs root path by yourself, that is, hdfsPath in install.sh.
+ + - If you use hdfs-related functions, you need to copy**hdfs-site.xml** and **core-site.xml** to the conf directory + + +## 2、Deployment +Automated deployment is recommended, and experienced partners can use source deployment as well. + +### 2.1 Automated Deployment + +- Install zookeeper tools + + `pip install kazoo` + +- Switch to deployment user, one-click deployment + + `sh install.sh` + +- Use the jps command to see if the service is started (jps comes with Java JDK) + +```aidl + MasterServer ----- Master Service + WorkerServer ----- Worker Service + LoggerServer ----- Logger Service + ApiApplicationServer ----- API Service + AlertServer ----- Alert Service +``` +If there are more than five services, the automatic deployment is successful + + +After successful deployment, the log can be viewed and stored in a specified folder. + +```log path + logs/ + ├── escheduler-alert-server.log + ├── escheduler-master-server.log + |—— escheduler-worker-server.log + |—— escheduler-api-server.log + |—— escheduler-logger-server.log +``` + +### 2.2 Compile source code to deploy + +After downloading the release version of the source package, unzip it into the root directory + +* Execute the compilation command: + +``` + mvn -U clean package assembly:assembly -Dmaven.test.skip=true +``` + +* View directory + +After normal compilation, target/escheduler-{version}/ is generated in the current directory + + + + + +### 2.3 Start-and-stop services commonly used in systems (for service purposes, please refer to System Architecture Design for details) + +* stop all services in the cluster at one click + + ` sh ./bin/stop_all.sh` + +* one click to open all services in the cluster + + ` sh ./bin/start_all.sh` + +* start and stop Master + +```start master +sh ./bin/escheduler-daemon.sh start master-server +sh ./bin/escheduler-daemon.sh stop master-server +``` + +* start and stop Worker + +```start worker +sh ./bin/escheduler-daemon.sh start worker-server +sh ./bin/escheduler-daemon.sh stop worker-server +``` + +* start and stop Api + +```start Api +sh ./bin/escheduler-daemon.sh start api-server +sh ./bin/escheduler-daemon.sh stop api-server +``` +* start and stop Logger + +```start Logger +sh ./bin/escheduler-daemon.sh start logger-server +sh ./bin/escheduler-daemon.sh stop logger-server +``` +* start and stop Alert + +```start Alert +sh ./bin/escheduler-daemon.sh start alert-server +sh ./bin/escheduler-daemon.sh stop alert-server +``` + +## 3、Database Upgrade +Database upgrade is a function added in version 1.0.2. The database can be upgraded automatically by executing the following commands + +```upgrade +sh ./script/upgrade_escheduler.sh +``` + + diff --git a/docs/en_us/Backend development documentation.md b/docs/en_us/Backend development documentation.md new file mode 100644 index 0000000000..49381fd6b5 --- /dev/null +++ b/docs/en_us/Backend development documentation.md @@ -0,0 +1,48 @@ +# Backend development documentation + +## Environmental requirements + + * [Mysql](http://geek.analysys.cn/topic/124) (5.5+) : Must be installed + * [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (1.8+) : Must be installed + * [ZooKeeper](https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper)(3.4.6+) :Must be installed + * [Maven](http://maven.apache.org/download.cgi)(3.3+) :Must be installed + +Because the escheduler-rpc module in EasyScheduler uses Grpc, you need to use Maven to compile the generated classes. 
+For those who are not familiar with Maven, please refer to: [maven in five minutes](http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html)(3.3+) + +http://maven.apache.org/install.html + +## Project compilation +After importing the EasyScheduler source code into the development tools such as Idea, first convert to the Maven project (right click and select "Add Framework Support") + +* Execute the compile command: + +``` + mvn -U clean package assembly:assembly -Dmaven.test.skip=true +``` + +* View directory + +After normal compilation, it will generate target/escheduler-{version}/ in the current directory. + +``` + bin + conf + lib + script + sql + install.sh +``` + +- Description + +``` +bin : basic service startup script +conf : project configuration file +lib : the project depends on the jar package, including the various module jars and third-party jars +script : cluster start, stop, and service monitoring start and stop scripts +sql : project depends on sql file +install.sh : one-click deployment script +``` + + diff --git a/docs/en_us/EasyScheduler-FAQ.md b/docs/en_us/EasyScheduler-FAQ.md new file mode 100644 index 0000000000..bbff613e26 --- /dev/null +++ b/docs/en_us/EasyScheduler-FAQ.md @@ -0,0 +1,285 @@ +## Q: EasyScheduler service introduction and recommended running memory + +A: EasyScheduler consists of 5 services, MasterServer, WorkerServer, ApiServer, AlertServer, LoggerServer and UI. + +| Service | Description | +| ------------------------- | ------------------------------------------------------------ | +| MasterServer | Mainly responsible for DAG segmentation and task status monitoring | +| WorkerServer/LoggerServer | Mainly responsible for the submission, execution and update of task status. LoggerServer is used for Rest Api to view logs through RPC | +| ApiServer | Provides the Rest Api service for the UI to call | +| AlertServer | Provide alarm service | +| UI | Front page display | + +Note:**Due to the large number of services, it is recommended that the single-machine deployment is preferably 4 cores and 16G or more.** + +--- + +## Q: Why can't an administrator create a project? + +A: The administrator is currently "**pure management**". There is no tenant, that is, there is no corresponding user on linux, so there is no execution permission, **so there is no project, resource and data source,** so there is no permission to create. **But there are all viewing permissions**. If you need to create a business operation such as a project, **use the administrator to create a tenant and a normal user, and then use the normal user login to operate**. We will release the administrator's creation and execution permissions in version 1.1.0, and the administrator will have all permissions. + +--- + +## Q: Which mailboxes does the system support? + +A: Support most mailboxes, qq, 163, 126, 139, outlook, aliyun, etc. are supported. Support TLS and SSL protocols, optionally configured in alert.properties + +--- + +## Q: What are the common system variable time parameters and how do I use them? + +A: Please refer to https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C.html#%E7%B3%BB%E7%BB%9F%E5%8F%82%E6%95%B0 + +--- + +## Q: pip install kazoo This installation gives an error. Is it necessary to install? 
+ +A: This is the python connection zookeeper needs to use, must be installed + +--- + +## Q: How to specify the machine running task + +A: Use **the administrator** to create a Worker group, **specify the Worker group** when the **process definition starts**, or **specify the Worker group on the task node**. If not specified, use Default, **Default is to select one of all the workers in the cluster to use for task submission and execution.** + +--- + +## Q: Priority of the task + +A: We also support t**he priority of processes and tasks**. Priority We have five levels of **HIGHEST, HIGH, MEDIUM, LOW and LOWEST**. **You can set the priority between different process instances, or you can set the priority of different task instances in the same process instance.** For details, please refer to the task priority design https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1 + +---- + +## Q: Escheduler-grpc gives an error + +A: Execute in the root directory: mvn -U clean package assembly:assembly -Dmaven.test.skip=true , then refresh the entire project + +---- + +## Q: Does EasyScheduler support running on windows? + +A: In theory, **only the Worker needs to run on Linux**. Other services can run normally on Windows. But it is still recommended to deploy on Linux. + +----- + +## Q: UI compiles node-sass prompt in linux: Error: EACCESS: permission denied, mkdir xxxx + +A: Install **npm install node-sass --unsafe-perm** separately, then **npm install** + +--- + +## Q: UI cannot log in normally. + +A: 1, if it is node startup, check whether the .env API_BASE configuration under escheduler-ui is the Api Server service address. + + 2, If it is nginx booted and installed via **install-escheduler-ui.sh**, check if the proxy_pass configuration in **/etc/nginx/conf.d/escheduler.conf** is the Api Server service. address + + 3, if the above configuration is correct, then please check if the Api Server service is normal, curl http://192.168.xx.xx:12345/escheduler/users/get-user-info, check the Api Server log, if Prompt cn.escheduler.api.interceptor.LoginHandlerInterceptor:[76] - session info is null, which proves that the Api Server service is normal. + + 4, if there is no problem above, you need to check if **server.context-path and server.port configuration** in **application.properties** is correct + +--- + +## Q: After the process definition is manually started or scheduled, no process instance is generated. + +A: 1, first **check whether the MasterServer service exists through jps**, or directly check whether there is a master service in zk from the service monitoring. + + 2,If there is a master service, check **the command status statistics** or whether new records are added in **t_escheduler_error_command**. If it is added, **please check the message field.** + +--- + +## Q : The task status is always in the successful submission status. + +A: 1, **first check whether the WorkerServer service exists through jps**, or directly check whether there is a worker service in zk from the service monitoring. + + 2,If the **WorkerServer** service is normal, you need to **check whether the MasterServer puts the task task in the zk queue. 
You need to check whether the task is blocked in the MasterServer log and the zk queue.** + + 3, if there is no problem above, you need to locate whether the Worker group is specified, but **the machine grouped by the worker is not online**.** + +--- + +## Q: Is there a Docker image and a Dockerfile? + +A: Provide Docker image and Dockerfile. + +Docker image address: https://hub.docker.com/r/escheduler/escheduler_images + +Dockerfile address: https://github.com/qiaozhanwei/escheduler_dockerfile/tree/master/docker_escheduler + +------ + +## Q : Need to pay attention to the problem in install.sh + +A: 1, if the replacement variable contains special characters, **use the \ transfer character to transfer** + + 2, installPath="/data1_1T/escheduler", **this directory can not be the same as the install.sh directory currently installed with one click.** + + 3, deployUser = "escheduler", **the deployment user must have sudo privileges**, because the worker is executed by sudo -u tenant sh xxx.command + + 4, monitorServerState = "false", whether the service monitoring script is started, the default is not to start the service monitoring script. **If the service monitoring script is started, the master and worker services are monitored every 5 minutes, and if the machine is down, it will automatically restart.** + + 5, hdfsStartupSate="false", whether to enable HDFS resource upload function. The default is not enabled. **If it is not enabled, the resource center cannot be used.** If enabled, you need to configure the configuration of fs.defaultFS and yarn in conf/common/hadoop/hadoop.properties. If you use namenode HA, you need to copy core-site.xml and hdfs-site.xml to the conf root directory. + + Note: **The 1.0.x version does not automatically create the hdfs root directory, you need to create it yourself, and you need to deploy the user with hdfs operation permission.** + +--- + +## Q : Process definition and process instance offline exception + +A : For **versions prior to 1.0.4**, modify the code under the escheduler-api cn.escheduler.api.quartz package. + +``` +public boolean deleteJob(String jobName, String jobGroupName) { + lock.writeLock().lock(); + try { + JobKey jobKey = new JobKey(jobName,jobGroupName); + if(scheduler.checkExists(jobKey)){ + logger.info("try to delete job, job name: {}, job group name: {},", jobName, jobGroupName); + return scheduler.deleteJob(jobKey); + }else { + return true; + } + + } catch (SchedulerException e) { + logger.error(String.format("delete job : %s failed",jobName), e); + } finally { + lock.writeLock().unlock(); + } + return false; + } +``` + +--- + +## Q: Can the tenant created before the HDFS startup use the resource center normally? + +A: No. Because the tenant created by HDFS is not started, the tenant directory will not be registered in HDFS. So the last resource will report an error. + +## Q: In the multi-master and multi-worker state, the service is lost, how to be fault-tolerant + +A: **Note:** **Master monitors Master and Worker services.** + + 1,If the Master service is lost, other Masters will take over the process of the hanged Master and continue to monitor the Worker task status. + + 2,If the Worker service is lost, the Master will monitor that the Worker service is gone. If there is a Yarn task, the Kill Yarn task will be retried. 
+ +Please see the fault-tolerant design for details:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.html#%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1 + +--- + +## Q : Fault tolerance for a machine distributed by Master and Worker + +A: The 1.0.3 version only implements the fault tolerance of the Master startup process, and does not take the Worker Fault Tolerance. That is to say, if the Worker hangs, no Master exists. There will be problems with this process. We will add Master and Worker startup fault tolerance in version **1.1.0** to fix this problem. If you want to manually modify this problem, you need to **modify the running task for the running worker task that is running the process across the restart and has been dropped. The running process is set to the failed state across the restart**. Then resume the process from the failed node. + +--- + +## Q : Timing is easy to set to execute every second + +A : Note when setting the timing. If the first digit (* * * * * ? *) is set to *, it means execution every second. **We will add a list of recently scheduled times in version 1.1.0.** You can see the last 5 running times online at http://cron.qqe2.com/ + + + +## Q: Is there a valid time range for timing? + +A: Yes, **if the timing start and end time is the same time, then this timing will be invalid timing. If the end time of the start and end time is smaller than the current time, it is very likely that the timing will be automatically deleted.** + + + +## Q : There are several implementations of task dependencies + +A: 1, the task dependency between **DAG**, is **from the zero degree** of the DAG segmentation + + 2, there are **task dependent nodes**, you can achieve cross-process tasks or process dependencies, please refer to the (DEPENDENT) node:https://analysys.github.io/easyscheduler_docs_cn/%E7%B3%BB%E7%BB%9F%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C.html#%E4%BB%BB%E5%8A%A1%E8%8A%82%E7%82%B9%E7%B1%BB%E5%9E%8B%E5%92%8C%E5%8F%82%E6%95%B0%E8%AE%BE%E7%BD%AE + + Note: **Cross-project processes or task dependencies are not supported** + +## Q: There are several ways to start the process definition. + +A: 1, in **the process definition list**, click the **Start** button. + + 2, **the process definition list adds a timer**, scheduling start process definition. + + 3, process definition **view or edit** the DAG page, any **task node right click** Start process definition. + + 4, you can define DAG editing for the process, set the running flag of some tasks to **prohibit running**, when the process definition is started, the connection of the node will be removed from the DAG. + + + +## Q : Python task setting Python version + +A: 1,**for the version after 1.0.3** only need to modify PYTHON_HOME in conf/env/.escheduler_env.sh + +``` +export PYTHON_HOME=/bin/python +``` + +Note: This is **PYTHON_HOME** , which is the absolute path of the python command, not the simple PYTHON_HOME. Also note that when exporting the PATH, you need to directly + +``` +export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH +``` + + 2,For versions prior to 1.0.3, the Python task only supports the Python version of the system. It does not support specifying the Python version. + +## Q:Worker Task will generate a child process through sudo -u tenant sh xxx.command, will kill when kill + +A: We will add the kill task in 1.0.4 and kill all the various child processes generated by the task. 
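+
+For reference, on Linux a task and all of the child processes it spawned can be cleaned up together by signalling the task's process group instead of a single PID. A minimal sketch (the script name `xxx.command` is just the placeholder from the question above, and this is only an illustration, not the exact code EasyScheduler uses):
+
+```
+# find the pid of the task script (placeholder name)
+TASK_PID=$(pgrep -f "xxx.command" | head -n 1)
+
+# look up the process group id of that task
+PGID=$(ps -o pgid= -p "$TASK_PID" | tr -d ' ')
+
+# signal the whole group so child processes started by the script are terminated too
+kill -TERM -- "-$PGID"
+```
+
+The exact lookup and signal may differ; the point is that the whole process group, not just the single PID, has to be signalled.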
+ + + +## Q : How to use the queue in EasyScheduler, what does the user queue and tenant queue mean? + +A : The queue in the EasyScheduler can be configured on the user or the tenant. **The priority of the queue specified by the user is higher than the priority of the tenant queue.** For example, to specify a queue for an MR task, the queue is specified by mapreduce.job.queuename. + +Note: When using the above method to specify the queue, the MR uses the following methods: + +``` + Configuration conf = new Configuration(); + GenericOptionsParser optionParser = new GenericOptionsParser(conf, args); + String[] remainingArgs = optionParser.getRemainingArgs(); +``` + + + +If it is a Spark task --queue mode specifies the queue + + + +## Q : Master or Worker reports the following alarm + +
+ +
+ + + +A : Change the value of master.properties **master.reserved.memory** under conf to a smaller value, say 0.1 or the value of worker.properties **worker.reserved.memory** is a smaller value, say 0.1 + +## Q: The hive version is 1.1.0+cdh5.15.0, and the SQL hive task connection is reported incorrectly. + ++ +
+ + + +A : Will hive pom + +``` ++ +
+ +* Create queue + ++ +
+ + * Create tenant ++ +
+ + * Creating Ordinary Users ++ +
+ + * Create an alarm group + ++ +
+ + * Log in with regular users + > Click on the user name in the upper right corner to "exit" and re-use the normal user login. + + * Project Management - > Create Project - > Click on Project Name ++ +
+ + * Click Workflow Definition - > Create Workflow Definition - > Online Process Definition + ++ +
+ + * Running Process Definition - > Click Workflow Instance - > Click Process Instance Name - > Double-click Task Node - > View Task Execution Log + ++ +
+ + diff --git a/docs/en_us/System manual.md b/docs/en_us/System manual.md new file mode 100644 index 0000000000..c297eb2714 --- /dev/null +++ b/docs/en_us/System manual.md @@ -0,0 +1,715 @@ +# System Use Manual + + +## Quick Start + + > Refer to[ Quick Start ]( Quick-Start.md) + +## Operational Guidelines + + - Administrator accounts can only be managed in terms of authority, do not participate in specific business, can not create projects, and can not perform related operations on process definition. + - The following operations can only be performed by using ordinary user login system. + +### Create a project + + - Click "Project - > Create Project", enter project name, description, and click "Submit" to create a new project. + - Click on the project name to enter the project home page. ++ +
+ +> Project Home Page contains task status statistics, process status statistics, process definition statistics, queue statistics, command statistics. + + - Task State Statistics: It refers to the statistics of the number of tasks to be run, failed, running, completed and succeeded in a given time frame. + - Process State Statistics: It refers to the statistics of the number of waiting, failing, running, completing and succeeding process instances in a specified time range. + - Process Definition Statistics: The process definition created by the user and the process definition granted by the administrator to the user are counted. + - Queue statistics: Worker performs queue statistics, the number of tasks to be performed and the number of tasks to be killed + - Command Status Statistics: Statistics of the Number of Commands Executed + +### Creating Process definitions + - Go to the project home page, click "Process definitions" and enter the list page of process definition. + - Click "Create process" to create a new process definition. + - Drag the "SHELL" node to the canvas and add a shell task. + - Fill in the Node Name, Description, and Script fields. + - Selecting "task priority" will give priority to high-level tasks in the execution queue. Tasks with the same priority will be executed in the first-in-first-out order. + - Timeout alarm. Fill in "Overtime Time". When the task execution time exceeds the overtime, it can alarm and fail over time. + - Fill in "Custom Parameters" and refer to [Custom Parameters](#用户自定义参数) ++ +
+ - Increase the order of execution between nodes: click "line connection". As shown, task 1 and task 3 are executed in parallel. When task 1 is executed, task 2 and task 3 are executed simultaneously. + ++ +
+ + - Delete dependencies: Click on the arrow icon to "drag nodes and select items", select the connection line, click on the delete icon to delete dependencies between nodes. ++ +
+ + - Click "Save", enter the name of the process definition, the description of the process definition, and set the global parameters. + ++ +
+ + - For other types of nodes, refer to [task node types and parameter settings](#task node types and parameter settings) + +### Execution process definition + - **The process definition of the off-line state can be edited, but not run**, so the on-line workflow is the first step. + > Click on the Process definition, return to the list of process definitions, click on the icon "online", online process definition. + + > Before offline process, it is necessary to offline timed management before offline process can be successfully defined. + > + > + + - Click "Run" to execute the process. Description of operation parameters: + * Failure strategy:**When a task node fails to execute, other parallel task nodes need to execute the strategy**。”Continue "Representation: Other task nodes perform normally" and "End" Representation: Terminate all ongoing tasks and terminate the entire process. + * Notification strategy:When the process is over, send process execution information notification mail according to the process status. + * Process priority: The priority of process running is divided into five levels:the highest , the high , the medium , the low , and the lowest . High-level processes are executed first in the execution queue, and processes with the same priority are executed first in first out order. + * Worker group This process can only be executed in a specified machine group. Default, by default, can be executed on any worker. + * Notification group: When the process ends or fault tolerance occurs, process information is sent to all members of the notification group by mail. + * Recipient: Enter the mailbox and press Enter key to save. When the process ends and fault tolerance occurs, an alert message is sent to the recipient list. + * Cc: Enter the mailbox and press Enter key to save. When the process is over and fault-tolerant occurs, alarm messages are copied to the copier list. ++ +
+ + * Complement: To implement the workflow definition of a specified date, you can select the time range of the complement (currently only support for continuous days), such as the data from May 1 to May 10, as shown in the figure: ++ +
+ +> SComplement execution mode includes serial execution and parallel execution. In serial mode, the complement will be executed sequentially from May 1 to May 10. In parallel mode, the tasks from May 1 to May 10 will be executed simultaneously. + +### Timing Process Definition + - Create Timing: "Process Definition - > Timing" + - Choose start-stop time, in the start-stop time range, regular normal work, beyond the scope, will not continue to produce timed workflow instances. ++ +
+ + - Add a timer to be executed once a day at 5:00 a.m. as shown below: ++ +
+ + - Timely online,**the newly created timer is offline. You need to click "Timing Management - >online" to work properly.** + +### View process instances + > Click on "Process Instances" to view the list of process instances. + + > Click on the process name to see the status of task execution. + ++ +
+ + > Click on the task node, click "View Log" to view the task execution log. + ++ +
+ + > Click on the task instance node, click **View History** to view the list of task instances that the process instance runs. + ++ +
+ + + > Operations on workflow instances: + ++ +
+ + * Editor: You can edit the terminated process. When you save it after editing, you can choose whether to update the process definition or not. + * Rerun: A process that has been terminated can be re-executed. + * Recovery failure: For a failed process, a recovery failure operation can be performed, starting at the failed node. + * Stop: Stop the running process, the background will `kill` he worker process first, then `kill -9` operation. + * Pause:The running process can be **suspended**, the system state becomes **waiting to be executed**, waiting for the end of the task being executed, and suspending the next task to be executed. + * Restore pause: **The suspended process** can be restored and run directly from the suspended node + * Delete: Delete process instances and task instances under process instances + * Gantt diagram: The vertical axis of Gantt diagram is the topological ordering of task instances under a process instance, and the horizontal axis is the running time of task instances, as shown in the figure: ++ +
+ +### View task instances + > Click on "Task Instance" to enter the Task List page and query the performance of the task. + > + > + ++ +
+ + > Click "View Log" in the action column to view the log of task execution. + ++ +
+ +### Create data source + > Data Source Center supports MySQL, POSTGRESQL, HIVE and Spark data sources. + +#### Create and edit MySQL data source + + - Click on "Datasource - > Create Datasources" to create different types of datasources according to requirements. +- Datasource: Select MYSQL +- Datasource Name: Name of Input Datasource +- Description: Description of input datasources +- IP: Enter the IP to connect to MySQL +- Port: Enter the port to connect MySQL +- User name: Set the username to connect to MySQL +- Password: Set the password to connect to MySQL +- Database name: Enter the name of the database connecting MySQL +- Jdbc connection parameters: parameter settings for MySQL connections, filled in as JSON + ++ +
+ + > Click "Test Connect" to test whether the data source can be successfully connected. + > + > + +#### Create and edit POSTGRESQL data source + +- Datasource: Select POSTGRESQL +- Datasource Name: Name of Input Data Source +- Description: Description of input data sources +- IP: Enter IP to connect to POSTGRESQL +- Port: Input port to connect POSTGRESQL +- Username: Set the username to connect to POSTGRESQL +- Password: Set the password to connect to POSTGRESQL +- Database name: Enter the name of the database connecting to POSTGRESQL +- Jdbc connection parameters: parameter settings for POSTGRESQL connections, filled in as JSON + ++ +
+ +#### Create and edit HIVE data source + +1.Connect with HiveServer 2 + ++ +
+ + - Datasource: Select HIVE +- Datasource Name: Name of Input Datasource +- Description: Description of input datasources +- IP: Enter IP to connect to HIVE +- Port: Input port to connect to HIVE +- Username: Set the username to connect to HIVE +- Password: Set the password to connect to HIVE +- Database Name: Enter the name of the database connecting to HIVE +- Jdbc connection parameters: parameter settings for HIVE connections, filled in in as JSON + +2.Connect using Hive Server 2 HA Zookeeper mode + ++ +
+ + +Note: If **kerberos** is turned on, you need to fill in **Principal ** ++ +
+ + + + +#### Create and Edit Datasource + ++ +
+ +- Datasource: Select Spark +- Datasource Name: Name of Input Datasource +- Description: Description of input datasources +- IP: Enter the IP to connect to Spark +- Port: Input port to connect Spark +- Username: Set the username to connect to Spark +- Password: Set the password to connect to Spark +- Database name: Enter the name of the database connecting to Spark +- Jdbc Connection Parameters: Parameter settings for Spark Connections, filled in as JSON + + + +Note: If **kerberos** If Kerberos is turned on, you need to fill in **Principal** + ++ +
+ +### Upload Resources + - Upload resource files and udf functions, all uploaded files and resources will be stored on hdfs, so the following configuration items are required: + +``` +conf/common/common.properties + -- hdfs.startup.state=true +conf/common/hadoop.properties + -- fs.defaultFS=hdfs://xxxx:8020 + -- yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx + -- yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s +``` + +#### File Manage + + > It is the management of various resource files, including creating basic txt/log/sh/conf files, uploading jar packages and other types of files, editing, downloading, deleting and other operations. + > + > + >+ > + >
+ + * Create file + > File formats support the following types:txt、log、sh、conf、cfg、py、java、sql、xml、hql + ++ +
+ + * Upload Files + +> Upload Files: Click the Upload button to upload, drag the file to the upload area, and the file name will automatically complete the uploaded file name. + ++ +
+ + + * File View + +> For viewable file types, click on the file name to view file details + ++ +
+ + * Download files + +> You can download a file by clicking the download button in the top right corner of the file details, or by downloading the file under the download button after the file list. + + * File rename + ++ +
+ +#### Delete +> File List - > Click the Delete button to delete the specified file + +#### Resource management + > Resource management and file management functions are similar. The difference is that resource management is the UDF function of uploading, and file management uploads user programs, scripts and configuration files. + + * Upload UDF resources + > The same as uploading files. + +#### Function management + + * Create UDF Functions + > Click "Create UDF Function", enter parameters of udf function, select UDF resources, and click "Submit" to create udf function. + > + > + > + > Currently only temporary udf functions for HIVE are supported + > + > + > + > - UDF function name: name when entering UDF Function + > - Package Name: Full Path of Input UDF Function + > - Parameter: Input parameters used to annotate functions + > - Database Name: Reserved Field for Creating Permanent UDF Functions + > - UDF Resources: Set up the resource files corresponding to the created UDF + > + > + ++ +
+ +## Security (Privilege System) + + - The security has the functions of queue management, tenant management, user management, warning group management, worker group manager, token manage and other functions. It can also authorize resources, data sources, projects, etc. +- Administrator login, default username password: admin/escheduler 123 + + + +Create queues + + + + - Queues are used to execute spark, mapreduce and other programs, which require the use of "queue" parameters. +- Security - > Queue Manage - > Creat Queue ++ +
+ + +### Create Tenants + - The tenant corresponds to the user of Linux, which is used by the worker to submit jobs. If Linux does not have this user, the worker creates the user when executing the script. + - Tenant Code:**the tenant code is the only user on Linux that can't be duplicated.** + ++ +
+ +### Create Ordinary Users + - Users are divided into **administrator users** and **ordinary users**. + * Administrators have only **authorization and user management** privileges, and no privileges to **create project and process-defined operations**. + * Ordinary users can **create projects and create, edit, and execute process definitions**. + * Note: **If the user switches the tenant, all resources under the tenant will be copied to the switched new tenant.** ++ +
+ +### Create alarm group + * The alarm group is a parameter set at start-up. After the process is finished, the status of the process and other information will be sent to the alarm group by mail. + * New and Editorial Warning Group ++ +
+ +### Create Worker Group + - Worker grouping provides a mechanism for tasks to run on a specified worker. Administrators set worker groups, and each task node can set worker groups for the task to run. If the task-specified groups are deleted or no groups are specified, the task will run on the worker specified by the process instance. +- Multiple IP addresses within a worker group (**no aliases can be written**), separated by **commas in English** + ++ +
+ +### Token manage + - Because the back-end interface has login check and token management, it provides a way to operate the system by calling the interface. +- Call examples: + +```令牌调用示例 + /** + * test token + */ + public void doPOSTParam()throws Exception{ + // create HttpClient + CloseableHttpClient httpclient = HttpClients.createDefault(); + + // create http post request + HttpPost httpPost = new HttpPost("http://127.0.0.1:12345/escheduler/projects/create"); + httpPost.setHeader("token", "123"); + // set parameters + List+ +
+ +- 2.Select the project button to authorize the project + ++ + + + +
+ +### Monitor center + - Service management is mainly to monitor and display the health status and basic information of each service in the system. + +#### Master monitor + - Mainly related information about master. ++ +
+ +#### Worker monitor + - Mainly related information of worker. + + + ++ +
+ +#### Zookeeper monitor + - Mainly the configuration information of each worker and master in zookpeeper. + ++ +
+ +#### Mysql monitor + - Mainly the health status of mysql + ++ +
+ +## Task Node Type and Parameter Setting + +### Shell + + - The shell node, when the worker executes, generates a temporary shell script, which is executed by a Linux user with the same name as the tenant. +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SHELL.png) task node in the toolbar onto the palette and double-click the task node as follows: + ++ +
` + +- Node name: The node name in a process definition is unique +- Run flag: Identify whether the node can be scheduled properly, and if it does not need to be executed, you can turn on the forbidden execution switch. +- Description : Describes the function of the node +- Number of failed retries: Number of failed task submissions, support drop-down and manual filling +- Failure Retry Interval: Interval between tasks that fail to resubmit tasks, support drop-down and manual filling +- Script: User-developed SHELL program +- Resources: A list of resource files that need to be invoked in a script +- Custom parameters: User-defined parameters that are part of SHELL replace the contents of scripts with ${variables} + +### SUB_PROCESS + - The sub-process node is to execute an external workflow definition as its own task node. +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SUB_PROCESS.png) task node in the toolbar onto the palette and double-click the task node as follows: + ++ +
+ +- Node name: The node name in a process definition is unique +- Run flag: Identify whether the node is scheduled properly +- Description: Describes the function of the node +- Sub-node: The process definition of the selected sub-process is selected, and the process definition of the selected sub-process can be jumped to by entering the sub-node in the upper right corner. + +### DEPENDENT + + - Dependent nodes are **dependent checking nodes**. For example, process A depends on the successful execution of process B yesterday, and the dependent node checks whether process B has a successful execution instance yesterday. + +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_DEPENDENT.png) ask node in the toolbar onto the palette and double-click the task node as follows: + ++ +
+ + > Dependent nodes provide logical judgment functions, such as checking whether yesterday's B process was successful or whether the C process was successfully executed. + ++ +
+ + > For example, process A is a weekly task and process B and C are daily tasks. Task A requires that task B and C be successfully executed every day of the week, as shown in the figure: + ++ +
+ + > If weekly A also needs to be implemented successfully on Tuesday: + > + > + ++ +
+ +### PROCEDURE + - The procedure is executed according to the selected data source. +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PROCEDURE.png) task node in the toolbar onto the palette and double-click the task node as follows: + ++ +
+ +- Datasource: The data source type of stored procedure supports MySQL and POSTGRRESQL, and chooses the corresponding data source. +- Method: The method name of the stored procedure +- Custom parameters: Custom parameter types of stored procedures support IN and OUT, and data types support nine data types: VARCHAR, INTEGER, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN. + +### SQL + - Execute non-query SQL functionality ++ +
+ + - Executing the query SQL function, you can choose to send mail in the form of tables and attachments to the designated recipients. +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SQL.png) task node in the toolbar onto the palette and double-click the task node as follows: + ++ +
+ +- Datasource: Select the corresponding datasource +- sql type: support query and non-query, query is select type query, there is a result set returned, you can specify mail notification as table, attachment or table attachment three templates. Non-query is not returned by result set, and is for update, delete, insert three types of operations +- sql parameter: input parameter format is key1 = value1; key2 = value2... +- sql statement: SQL statement +- UDF function: For HIVE type data sources, you can refer to UDF functions created in the resource center, other types of data sources do not support UDF functions for the time being. +- Custom parameters: SQL task type, and stored procedure is to customize the order of parameters to set values for methods. Custom parameter type and data type are the same as stored procedure task type. The difference is that the custom parameter of the SQL task type replaces the ${variable} in the SQL statement. + + + +### SPARK + + - Through SPARK node, SPARK program can be directly executed. For spark node, worker will use `spark-submit` mode to submit tasks. + +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_SPARK.png) task node in the toolbar onto the palette and double-click the task node as follows: +> +> + ++ +
+ +- Program Type: Support JAVA, Scala and Python +- Class of the main function: The full path of Main Class, the entry to the Spark program +- Master jar package: It's Spark's jar package +- Deployment: support three modes: yarn-cluster, yarn-client, and local +- Driver Kernel Number: Driver Kernel Number and Memory Number can be set +- Executor Number: Executor Number, Executor Memory Number and Executor Kernel Number can be set +- Command Line Parameters: Setting the input parameters of Spark program to support the replacement of custom parameter variables. +- Other parameters: support - jars, - files, - archives, - conf format +- Resource: If a resource file is referenced in other parameters, you need to select the specified resource. +- Custom parameters: User-defined parameters in MR locality that replace the contents in scripts with ${variables} + +Note: JAVA and Scala are just used for identification, no difference. If it's a Spark developed by Python, there's no class of the main function, and everything else is the same. + +### MapReduce(MR) + - Using MR nodes, MR programs can be executed directly. For Mr nodes, worker submits tasks using `hadoop jar` + + +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_MR.png) task node in the toolbar onto the palette and double-click the task node as follows: + + 1. JAVA program + ++ +
+ +- Class of the main function: The full path of the MR program's entry Main Class +- Program Type: Select JAVA Language +- Master jar package: MR jar package +- Command Line Parameters: Setting the input parameters of MR program to support the replacement of custom parameter variables +- Other parameters: support - D, - files, - libjars, - archives format +- Resource: If a resource file is referenced in other parameters, you need to select the specified resource. +- Custom parameters: User-defined parameters in MR locality that replace the contents in scripts with ${variables} + +2. Python program + ++ +
+ +- Program Type: Select Python Language +- Main jar package: Python jar package running MR +- Other parameters: support - D, - mapper, - reducer, - input - output format, where user-defined parameters can be set, such as: +- mapper "mapper.py 1" - file mapper.py-reducer reducer.py-file reducer.py-input/journey/words.txt-output/journey/out/mr/${current TimeMillis} +- Among them, mapper. py 1 after - mapper is two parameters, the first parameter is mapper. py, and the second parameter is 1. +- Resource: If a resource file is referenced in other parameters, you need to select the specified resource. +- Custom parameters: User-defined parameters in MR locality that replace the contents in scripts with ${variables} + +### Python + - With Python nodes, Python scripts can be executed directly. For Python nodes, worker will use `python ** `to submit tasks. + + + + +> Drag the ![PNG](https://analysys.github.io/easyscheduler_docs_cn/images/toolbar_PYTHON.png) task node in the toolbar onto the palette and double-click the task node as follows: + ++ +
+ +- Script: User-developed Python program +- Resource: A list of resource files that need to be invoked in a script +- Custom parameters: User-defined parameters that are part of Python that replace the contents in the script with ${variables} + +### System parameter + +variable | meaning |
---|---|
${system.biz.date} | +The timing time of routine dispatching instance is one day before, in yyyyyMMdd format. When data is supplemented, the date + 1 | +
${system.biz.curdate} | +Daily scheduling example timing time, format is yyyyyMMdd, when supplementing data, the date + 1 | +
${system.datetime} | +Daily scheduling example timing time, format is yyyyyMMddHmmss, when supplementing data, the date + 1 | +
+ +
+ +> global_bizdate is a global parameter, referring to system parameters. + ++ +
+ +> In tasks, local_param_bizdate refers to global parameters by ${global_bizdate} for scripts, the value of variable local_param_bizdate can be referenced by${local_param_bizdate}, or the value of local_param_bizdate can be set directly by JDBC. + + + diff --git a/docs/en_us/Upgrade documentation.md b/docs/en_us/Upgrade documentation.md new file mode 100644 index 0000000000..b66e6c2584 --- /dev/null +++ b/docs/en_us/Upgrade documentation.md @@ -0,0 +1,38 @@ + +# EasyScheduler upgrade documentation + +## 1. Back up the previous version of the file and database + +## 2. Stop all services of escheduler + + `sh ./script/stop_all.sh` + +## 3. Download the new version of the installation package + +- [gitee](https://gitee.com/easyscheduler/EasyScheduler/attach_files), download the latest version of the front and rear installation package (backend referred to as escheduler-backend, front end referred to as escheduler-ui) +- The following upgrade operations need to be performed in the new version of the directory + +## 4. Database upgrade +- Modify the following properties in conf/dao/data_source.properties + +``` + spring.datasource.url + spring.datasource.username + spring.datasource.password +``` + +- Execute database upgrade script + +`sh ./script/upgrade_escheduler.sh` + +## 5. Backend service upgrade + +- Modify the contents of the install.sh configuration and execute the upgrade script + + `sh install.sh` + +## 6. Frontend service upgrade +- Overwrite the previous version of the dist directory +- Restart the nginx service + + `systemctl restart nginx` \ No newline at end of file