@ -18,129 +18,202 @@
Tutorial
========
This tutorial show you the basic concept of *PyDolphinScheduler* and tell all
This tutorial shows you the basic concepts of *PyDolphinScheduler* and tells you all the
things you should know before you submit or run your first workflow. If you
still not install *PyDolphinScheduler* and start Apache DolphinScheduler, you
could go and see :ref: `how to getting start PyDolphinScheduler <start:getting started>`
have not yet installed *PyDolphinScheduler* and started DolphinScheduler, you
should first see :ref: `how to get started with PyDolphinScheduler <start:getting started>` .
Overview of Tutorial
--------------------
Here have an overview of our tutorial, and it look a little complex but do not
worry about that because we explain this example below as detailed as possible.
Here is an overview of our tutorial. It looks a little complex, but do not
worry about that, because we explain this example below in as much detail as possible.
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:start-after: [start tutorial]
:end-before: [end tutorial]
There are two types of tutorials: traditional and task decorator.
- **Traditional Way** : More general. It supports many :doc: `built-in task types <tasks/index>` and is convenient
when you start building your workflow.
- **Task Decorator** : A Python decorator that allows you to wrap your function into a pydolphinscheduler task. It is
less versatile than the traditional way because it only supports Python functions and not the built-in tasks.
But it is helpful if your workflow is built entirely with Python or if you already have some Python
workflow code and want to migrate it to pydolphinscheduler.
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start tutorial]
:end-before: [end tutorial]
.. tab :: Task Decorator
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start tutorial]
:end-before: [end tutorial]
Import Necessary Module
-----------------------
First of all, we should importing necessary module which we would use later just
like other Python package. We just create a minimum demo here, so we just import
:class: `pydolphinscheduler.core.process_definition` and
:class: `pydolphinscheduler.tasks.shell` .
First of all, we import the modules we will use later, just as with any other Python package.
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:start-after: [start package_import]
:end-before: [end package_import]
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start package_import]
:end-before: [end package_import]
If you want to use other task type you could click and
:doc: `see all tasks we support <tasks/index>`
In the traditional tutorial we import :class: `pydolphinscheduler.core.process_definition.ProcessDefinition` and
:class: `pydolphinscheduler.tasks.shell.Shell` .
If you want to use another task type, see :doc: `all the tasks we support <tasks/index>` .
.. tab :: Task Decorator
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start package_import]
:end-before: [end package_import]
In the task decorator tutorial we import :class: `pydolphinscheduler.core.process_definition.ProcessDefinition` and
:func: `pydolphinscheduler.tasks.func_wrap.task` .
Process Definition Declaration
------------------------------
We should instantiate object after we import them from `import necessary module`_ .
Here we declare basic arguments for process definition(aka, workflow). We define
the name of process definition, using `Python context manager`_ and it
**the only required argument** for object process definition. Beside that we also
declare three arguments named `schedule` , `start_time` which setting workflow schedule
interval and schedule start_time, and argument `tenant` which changing workflow's
task running user in the worker, :ref: `section tenant <concept:tenant>` in *PyDolphinScheduler*
:doc: `concept` page have more detail information.
We should instantiate a :class: `pydolphinscheduler.core.process_definition.ProcessDefinition` object after we
import it in `import necessary module`_ . Here we declare the basic arguments for the process definition (aka workflow).
We define the name of the :code: `ProcessDefinition` using a `Python context manager`_ , and the name is **the only required argument**
for `ProcessDefinition` . Besides, we also declare three more arguments: :code: `schedule` and :code: `start_time` ,
which set the workflow schedule interval and the schedule start time, and :code: `tenant` , which defines which tenant
runs the tasks in the DolphinScheduler worker. See :ref: `section tenant <concept:tenant>` in
*PyDolphinScheduler* :doc: `concept` for more information.
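To make the context manager idea concrete, here is a minimal pure-Python sketch of how a ``with`` block can collect the tasks declared inside it. It is only an illustration: ``ToyWorkflow`` and ``ToyTask`` are hypothetical names invented for this example and do not reflect how *PyDolphinScheduler* is implemented. The argument names ``name``, ``schedule``, ``start_time`` and ``tenant`` are the ones from the text above, and the tenant value is a placeholder.

```python
class ToyWorkflow:
    """Hypothetical stand-in for a workflow object (not the real ProcessDefinition)."""

    _current = None  # the workflow whose ``with`` block is currently active

    def __init__(self, name, schedule=None, start_time=None, tenant=None):
        # ``name`` is the only required argument, mirroring the text above.
        self.name = name
        self.schedule = schedule
        self.start_time = start_time
        self.tenant = tenant
        self.tasks = []

    def __enter__(self):
        ToyWorkflow._current = self
        return self

    def __exit__(self, exc_type, exc, tb):
        ToyWorkflow._current = None
        return False  # never swallow exceptions raised inside the block


class ToyTask:
    """Hypothetical task that registers itself with the active workflow."""

    def __init__(self, name):
        self.name = name
        if ToyWorkflow._current is not None:
            ToyWorkflow._current.tasks.append(self)


# "tenant_exists" is a placeholder value chosen for this sketch.
with ToyWorkflow(name="tutorial", tenant="tenant_exists") as workflow:
    ToyTask(name="task_parent")
    ToyTask(name="task_child_one")

task_names = [task.name for task in workflow.tasks]
print(task_names)
```

The point of the pattern is that tasks created inside the block do not need to be attached to the workflow by hand.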
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:start-after: [start workflow_declare]
:end-before: [end workflow_declare]
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start workflow_declare]
:end-before: [end workflow_declare]
.. tab :: Task Decorator
We could find more detail about process definition in
:ref: `concept about process definition <concept:process definition>` if you interested in it.
For all arguments of object process definition, you could find in the
:class: `pydolphinscheduler.core.process_definition` api documentation.
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start workflow_declare]
:end-before: [end workflow_declare]
You can find more detail about :code: `ProcessDefinition` in :ref: `concept about process definition <concept:process definition>`
if you are interested. All arguments of the process definition object are listed in the
:class: `pydolphinscheduler.core.process_definition` API documentation.
Task Declaration
----------------
Here we declare four tasks, and bot of them are simple task of
:class: `pydolphinscheduler.tasks.shell` which running `echo` command in terminal.
Beside the argument `command` , we also need setting argument `name` for each task *(not
only shell task, `name` is required for each type of task)*.
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start task_declare]
:end-before: [end task_declare]
We declare four tasks to show how to create tasks, and all of them are simple tasks of
:class: `pydolphinscheduler.tasks.shell` which run the `echo` command in the terminal. Besides the argument
`command` holding the :code: `echo` command, we also need to set the argument `name` for each task
*(not only for shell tasks; `name` is required for every type of task)* .
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start task_declare]
:end-before: [end task_declare]
Besides the shell task, *PyDolphinScheduler* supports many other task types, which you can find in :doc: `tasks/index` .
.. tab :: Task Decorator
Beside shell task, *PyDolphinScheduler* support multiple tasks and you could
find in :doc: `tasks/index` .
We declare four tasks to show how to create tasks, and all of them are created by the task decorator
:func: `pydolphinscheduler.tasks.func_wrap.task` . All we have to do is add a decorator named
:code: `@task` to an existing Python function, and then use it inside :class: `pydolphinscheduler.core.process_definition` .
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start task_declare]
:end-before: [end task_declare]
It makes our workflow more Pythonic, but be careful: using task decorator mode means we only use
Python functions as tasks and, in most cases, cannot use the :doc: `built-in tasks <tasks/index>` .
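As a rough illustration of the decorator idea, the following pure-Python sketch wraps a plain function so that calling it yields a task-like object instead of running the body. The names ``toy_task`` and ``ToyFuncTask`` are hypothetical and invented for this example. The real ``@task`` decorator may work differently internally.

```python
import functools


class ToyFuncTask:
    """Hypothetical task object that remembers the wrapped function."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self):
        # Execute the wrapped function body on demand.
        return self.func()


def toy_task(func):
    """Hypothetical decorator: calling the decorated function builds a task."""

    @functools.wraps(func)  # keep the original function name and docstring
    def wrapper():
        return ToyFuncTask(name=func.__name__, func=func)

    return wrapper


@toy_task
def task_parent():
    return "echo hello pydolphinscheduler"


parent = task_parent()  # builds the task; the body has not run yet
print(parent.name)
print(parent.run())
```

In this sketch the task name is taken from the function name, so an existing function needs no change beyond the added decorator line.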
Setting Task Dependence
-----------------------
After we declare both process definition and task, we have one workflow with
four tasks, both all tasks is independent so that they would run in parallel.
We should reorder the sort and the dependence of tasks. It useful when we need
run prepare task before we run actual task or we need tasks running is specific
rule. We both support attribute `set_downstream` and `set_upstream` , or bitwise
operators `>>` and `<<` .
After we declare both the process definition and the tasks, we have four tasks that are independent and would run
in parallel. If you want one task to start only after another task has finished, you have to set a dependence
between those tasks.
In this example, we set task `task_parent` is the upstream task of task
`task_child_one` and `task_child_two` , and task `task_union` is the downstream
task of both these two task.
Setting task dependence is quite easy: use the task attributes :code: `set_downstream` and :code: `set_upstream` , or the
bitwise operators :code: `>>` and :code: `<<` .
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start task_relation_declare]
:end-before: [end task_relation_declare]
In this tutorial, task `task_parent` is the leading task of the whole workflow, and tasks `task_child_one` and
`task_child_two` are its downstream tasks. Task `task_union` will not run until both `task_child_one`
and `task_child_two` are done, because both are upstream tasks of `task_union` .
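The resulting run order can be checked with a small sketch that uses only the Python standard library (``graphlib``, available from Python 3.9). It models the four tasks as a mapping from each task to its upstream tasks and groups them into "waves" that can run in parallel. This is an illustration of the dependence rules only, not of how the DolphinScheduler worker actually schedules tasks.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
upstream_of = {
    "task_parent": set(),
    "task_child_one": {"task_parent"},
    "task_child_two": {"task_parent"},
    "task_union": {"task_child_one", "task_child_two"},
}

sorter = TopologicalSorter(upstream_of)
sorter.prepare()

waves = []  # each wave holds the tasks that may run at the same time
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # sorted only for stable output
    waves.append(ready)
    sorter.done(*ready)  # mark the whole wave as finished

print(waves)
```

The two child tasks land in the same wave, and ``task_union`` appears only after both of them are marked as done.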
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start task_relation_declare]
:end-before: [end task_relation_declare]
Please notice that we could grouping some tasks and set dependence if they have
same downstream or upstream. We declare task `task_child_one` and `task_child_two`
as a group here, named as `task_group` and set task `task_parent` as upstream of
both of them. You could see more detail in :ref: `concept:Tasks Dependence` section in concept
documentation.
.. tab :: Task Decorator
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start task_relation_declare]
:end-before: [end task_relation_declare]
.. note ::
We can set task dependence in batch mode for tasks that share the same downstream or upstream by declaring
them as a task group. In the tutorial, we declare tasks `task_child_one` and `task_child_two` as a task group named
`task_group` , then set `task_group` as downstream of task `task_parent` . See
:ref: `concept:Tasks Dependence` for more detail about how to set task dependence.
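For readers curious how ``>>`` and ``<<`` can express dependence, including with a list used as a task group, here is a minimal pure-Python sketch based on operator overloading. ``ToyTask`` and ``_as_list`` are hypothetical names for this example. Only ``set_downstream``, ``set_upstream``, the two operators and the task names come from the text above, and the real implementation may differ.

```python
def _as_list(tasks):
    """Accept either a single task or a list/tuple of tasks (a task group)."""
    return list(tasks) if isinstance(tasks, (list, tuple)) else [tasks]


class ToyTask:
    """Hypothetical task that records upstream and downstream task names."""

    def __init__(self, name):
        self.name = name
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        for task in _as_list(other):
            self.downstream.add(task.name)
            task.upstream.add(self.name)

    def set_upstream(self, other):
        for task in _as_list(other):
            task.set_downstream(self)

    def __rshift__(self, other):   # self >> other
        self.set_downstream(other)
        return other

    def __lshift__(self, other):   # self << other
        self.set_upstream(other)
        return other

    def __rrshift__(self, other):  # [a, b] >> self
        self.set_upstream(other)
        return self

    def __rlshift__(self, other):  # [a, b] << self
        self.set_downstream(other)
        return self


task_parent = ToyTask("task_parent")
task_child_one = ToyTask("task_child_one")
task_child_two = ToyTask("task_child_two")
task_union = ToyTask("task_union")

task_group = [task_child_one, task_child_two]
task_parent.set_downstream(task_group)  # parent runs before both children
task_union << task_group                # both children run before union

print(sorted(task_union.upstream))
```

The reflected methods ``__rrshift__`` and ``__rlshift__`` are what let a plain list appear on the left side of an operator in this sketch.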
Submit Or Run Workflow
----------------------
Now we finish our workflow definition, with task and task dependence, but all
these things are in local, we should let Apache DolphinScheduler daemon know what we
define our workflow. So the last thing we have to do here is submit our workflow to
Apache DolphinScheduler daemon.
After that, we have finished our workflow definition, with four tasks and their dependence, but all of it is
still local. We should let the DolphinScheduler daemon know the definition of the workflow, so the last thing we
have to do is submit the workflow to the DolphinScheduler daemon.
We here in the example using `ProcessDefinition` attribute `run` to submit workflow
to the daemon, and set the schedule time we just declare in `process definition declaration`_ .
Fortunately, we have a convenient method to submit the workflow: the `ProcessDefinition` attribute :code: `run` , which
creates the workflow definition as well as the workflow schedule.
Now, we could run the Python code like other Python script, for the basic usage run
:code: `python tutorial.py` to trigger and run it.
.. tab :: Tradition
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start submit_or_run]
:end-before: [end submit_or_run]
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:dedent: 0
:start-after: [start submit_or_run]
:end-before: [end submit_or_run]
.. tab :: Task Decorator
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
:dedent: 0
:start-after: [start submit_or_run]
:end-before: [end submit_or_run]
At last, we can execute this workflow code in the terminal like any other Python script, running
:code: `python tutorial.py` to trigger and execute it.
.. note ::
If you not start your Apache DolphinScheduler server, you could find the way in
:ref: `start:start Python gateway service` and it would have more detail about related server
start. Beside attribute `run` , we have attribute `submit` for object `ProcessDefinition`
and it just submit workflow to the daemon but not setting the schedule information. For
more detail you could see :ref: `concept:process definition` .
If you have not started your DolphinScheduler API server, you can find how to start it in
:ref: `start:start Python gateway service` . Besides the attribute :code: `run` , object `ProcessDefinition` has the attribute
:code: `submit` , which just submits the workflow to the daemon but does not set
the workflow schedule information. For more detail, see :ref: `concept:process definition` .
DAG Graph After Tutorial Run
----------------------------
After we run the tutorial code, you could login Apache DolphinScheduler web UI,
go and see the `DolphinScheduler project page`_ . they is a new process definition be
created and named "Tutorial". It create by *PyDolphinScheduler* and the DAG graph as below
After we run the tutorial code, you can log in to the DolphinScheduler web UI and see the
`DolphinScheduler project page`_ . There is a new process definition created by *PyDolphinScheduler* ,
named "tutorial" or "tutorial_decorator". The task graph of the workflow looks like below:
.. literalinclude :: ../../src/pydolphinscheduler/examples/tutorial.py
:language: text