.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

Tutorial
========
This tutorial shows you the basic concepts of *PyDolphinScheduler* and tells you
everything you should know before you submit or run your first workflow. If you
have not yet installed *PyDolphinScheduler* and started Apache DolphinScheduler,
see :ref:`how to get started with PyDolphinScheduler <start:getting started>`.

Overview of Tutorial
--------------------
Here is an overview of the tutorial. It may look a little complex, but do not
worry: we explain each part of this example below in as much detail as possible.

.. literalinclude:: ../../examples/tutorial.py
   :start-after: [start tutorial]
   :end-before: [end tutorial]

Import Necessary Module
-----------------------
First of all, we import the modules we will use later, just as with any other
Python package. We only create a minimal demo here, so we only import
:class:`pydolphinscheduler.core.process_definition` and
:class:`pydolphinscheduler.tasks.shell`.

.. literalinclude:: ../../examples/tutorial.py
   :start-after: [start package_import]
   :end-before: [end package_import]

If you want to use other task types, you can
:doc:`see all tasks we support <tasks/index>`.

Process Definition Declaration
------------------------------
After importing the objects in `import necessary module`_, we instantiate them.
Here we declare the basic arguments of the process definition (also known as a
workflow). We create it with a `Python context manager`_ and define its name, which is
**the only required argument** of the process definition object. Besides that, we also
declare three arguments: `schedule` and `start_time`, which set the workflow's schedule
interval and schedule start time, and `tenant`, which sets the user that runs the
workflow's tasks on the worker. The :ref:`tenant section <concept:tenant>` of the
*PyDolphinScheduler* :doc:`concept` page has more detailed information.

.. literalinclude:: ../../examples/tutorial.py
   :start-after: [start workflow_declare]
   :end-before: [end workflow_declare]

You can find more details about the process definition in
:ref:`concept about process definition <concept:process definition>` if you are interested.
All arguments of the process definition object are listed in the
:class:`pydolphinscheduler.core.process_definition` API documentation.

Task Declaration
----------------
Here we declare four tasks. All of them are simple tasks of
:class:`pydolphinscheduler.tasks.shell` that run an `echo` command in the terminal.
Besides the argument `command`, we also need to set the argument `name` for each task
*(not only for shell tasks: `name` is required for every type of task)*.

.. literalinclude:: ../../examples/tutorial.py
   :dedent: 0
   :start-after: [start task_declare]
   :end-before: [end task_declare]

Besides shell tasks, *PyDolphinScheduler* supports many other task types, which you can
find in :doc:`tasks/index`.

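
The process definition is created with a Python context manager. As a rough
illustration of how a context manager can tie objects created inside its `with` block
to the enclosing workflow, here is a minimal, self-contained sketch. The `Workflow` and
`Task` classes and the `echo` commands below are hypothetical stand-ins written for this
illustration. They are not the *PyDolphinScheduler* implementation and do not need the
library to run.

```python
class Workflow:
    """Hypothetical stand-in for a workflow object used as a context manager."""

    _current = None  # the workflow whose ``with`` block is currently active

    def __init__(self, name):
        self.name = name
        self.tasks = []

    def __enter__(self):
        Workflow._current = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Workflow._current = None
        return False  # do not swallow exceptions raised inside the block


class Task:
    """Hypothetical task that registers itself with the active workflow."""

    def __init__(self, name, command):
        self.name = name
        self.command = command
        if Workflow._current is not None:
            Workflow._current.tasks.append(self)


with Workflow(name="tutorial") as workflow:
    Task(name="task_parent", command="echo parent")
    Task(name="task_child_one", command="echo child one")

task_names = [task.name for task in workflow.tasks]
print(task_names)
```

In this sketch, a task created outside any `with` block is simply not attached to a
workflow, which is why the declaration order and indentation matter.
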
Setting Task Dependence
-----------------------
After declaring both the process definition and the tasks, we have one workflow with
four tasks, but all tasks are independent, so they would run in parallel.
We should set the order of the tasks and the dependencies between them. This is useful
when we need to run a preparation task before the actual task, or when tasks must run
according to a specific rule. We support both the methods `set_downstream` and
`set_upstream` and the bitwise operators `>>` and `<<`.

In this example, we set task `task_parent` as the upstream task of tasks
`task_child_one` and `task_child_two`, and task `task_union` as the downstream
task of both of these tasks.

.. literalinclude:: ../../examples/tutorial.py
   :dedent: 0
   :start-after: [start task_relation_declare]
   :end-before: [end task_relation_declare]

Note that we can group tasks and set their dependencies together if they share the
same downstream or upstream task. Here we declare tasks `task_child_one` and
`task_child_two` as a group named `task_group` and set task `task_parent` as the
upstream of both of them. You can find more details in the
:ref:`concept:Tasks Dependence` section of the concept documentation.

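
To make the relation between `set_downstream`, `set_upstream`, `>>` and `<<` more
concrete, the following self-contained sketch shows one way such an interface can be
built with Python operator overloading. The `Task` class here is a hypothetical stand-in
written for this illustration, not the *PyDolphinScheduler* class, and it only records
task names so that the resulting relations can be inspected.

```python
class Task:
    """Hypothetical task that records its upstream and downstream neighbours."""

    def __init__(self, name):
        self.name = name
        self.upstream = set()
        self.downstream = set()

    @staticmethod
    def _as_list(tasks):
        # Accept either a single task or a list of tasks (a "task group").
        return tasks if isinstance(tasks, (list, tuple)) else [tasks]

    def set_downstream(self, tasks):
        # Every task in ``tasks`` runs after ``self``.
        for task in self._as_list(tasks):
            self.downstream.add(task.name)
            task.upstream.add(self.name)

    def set_upstream(self, tasks):
        # Every task in ``tasks`` runs before ``self``.
        for task in self._as_list(tasks):
            task.set_downstream(self)

    def __rshift__(self, other):
        # ``self >> other`` is shorthand for ``self.set_downstream(other)``.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # ``self << other`` is shorthand for ``self.set_upstream(other)``.
        self.set_upstream(other)
        return other


task_parent = Task("task_parent")
task_child_one = Task("task_child_one")
task_child_two = Task("task_child_two")
task_union = Task("task_union")

# Group the two children and wire them between the parent and the union task.
task_group = [task_child_one, task_child_two]
task_parent.set_downstream(task_group)
task_union << task_group

print(sorted(task_parent.downstream))
print(sorted(task_union.upstream))
```

Both calls accept a list, so a group of tasks that shares one upstream or downstream
task can be wired in a single statement.
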
Submit Or Run Workflow
----------------------
We have now finished our workflow definition, including the tasks and their
dependencies, but all of this exists only locally. We still need to tell the Apache
DolphinScheduler daemon how our workflow is defined. So the last thing we have to do is
submit our workflow to the Apache DolphinScheduler daemon.

In this example we use the `ProcessDefinition` method `run` to submit the workflow
to the daemon and set the schedule time we declared in `process definition declaration`_.

Now we can run the Python code like any other Python script. For basic usage, run
:code:`python tutorial.py` to trigger and run it.

.. literalinclude:: ../../examples/tutorial.py
   :dedent: 0
   :start-after: [start submit_or_run]
   :end-before: [end submit_or_run]

If you have not started your Apache DolphinScheduler server, you can find out how in
:ref:`start:start Python gateway server`, which has more details about starting the
related servers. Besides the method `run`, the `ProcessDefinition` object also has the
method `submit`, which only submits the workflow to the daemon without setting the
schedule information. For more details, see :ref:`concept:process definition`.

DAG Graph After Tutorial Run
----------------------------
After running the tutorial code, you can log in to the Apache DolphinScheduler web UI
and open the `DolphinScheduler project page`_. There is a new process definition
named "Tutorial", created by *PyDolphinScheduler*, and its DAG graph is shown below:

.. literalinclude:: ../../examples/tutorial.py
   :language: text
   :lines: 24-28

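
The dependencies declared in this tutorial imply an execution order. As a small
cross-check, the sketch below derives that order with `graphlib.TopologicalSorter` from
the Python standard library (Python 3.9 or later). The `upstream` dictionary is written
by hand from the task relations described in this tutorial. This only illustrates the
ordering a DAG implies and is not how Apache DolphinScheduler schedules tasks
internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each task to the set of tasks it depends on (its upstream tasks).
upstream = {
    "task_parent": set(),
    "task_child_one": {"task_parent"},
    "task_child_two": {"task_parent"},
    "task_union": {"task_child_one", "task_child_two"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

# Collect the tasks stage by stage: tasks in the same stage do not depend
# on each other and could run in parallel.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
    stages.append(ready)
    sorter.done(*ready)

print(stages)
```

This prints `[['task_parent'], ['task_child_one', 'task_child_two'], ['task_union']]`:
the parent runs first, the two children can run in parallel, and the union task runs
last.
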
.. _`DolphinScheduler project page`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project.html
.. _`Python context manager`: https://docs.python.org/3/library/stdtypes.html#context-manager-types