DolphinScheduler/dolphinscheduler-python/pydolphinscheduler/examples/task_datax_example.py

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

"""
An example workflow for the DataX task.

This example creates a workflow named `task_datax_1` that defines and runs
two tasks, `task_datax` and `task_custom_datax`.
Create the data sources `first_mysql` and `second_mysql` through the UI first.
Each task synchronizes data from the source database to the target database.
"""


from pydolphinscheduler.core.process_definition import ProcessDefinition
from pydolphinscheduler.tasks.datax import CustomDataX, DataX

# datax json template
JSON_TEMPLATE = {
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "usr",
                        "password": "pwd",
                        "column": [
                            "id",
                            "name",
                            "code",
                            "description"
                        ],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "source_table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/source_db"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "usr",
                        "password": "pwd",
                        "column": [
                            "id",
                            "name",
                            "code",
                            "description"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/target_db",
                                "table": [
                                    "target_table"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
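

# A minimal sketch, not part of the original example: DataX maps reader
# columns to writer columns by position, so comparing the two list lengths
# before submitting can catch a mismatched template early. The helper name
# `column_counts_match` is hypothetical, and the check assumes every content
# entry lists its columns explicitly.
def column_counts_match(template: dict) -> bool:
    """Return True if each content entry has equally long column lists."""
    return all(
        len(entry["reader"]["parameter"]["column"])
        == len(entry["writer"]["parameter"]["column"])
        for entry in template["job"]["content"]
    )


# Possible usage: column_counts_match(JSON_TEMPLATE)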

with ProcessDefinition(
    name="task_datax_1",
    tenant="tenant_exists",
) as pd:
    # This task synchronizes the data in `source_table` of the `first_mysql`
    # data source to `target_table` of the `second_mysql` data source.
    task1 = DataX(
        name="task_datax",
        datasource_name="first_mysql",
        datatarget_name="second_mysql",
        sql="select id, name, code, description from source_table",
        target_table="target_table",
    )

    # You can also supply a custom DataX JSON template to sync data.
    # This task creates the same job as task1.
    task2 = CustomDataX(name="task_custom_datax", json=str(JSON_TEMPLATE))
    pd.run()
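

# A minimal sketch, not part of the original example: `str()` on a dict
# returns a Python repr with single-quoted strings, while `json.dumps`
# returns strict JSON. Whether the repr form is accepted depends on the JSON
# parser on the receiving side, so `json.dumps` is the safer assumption when
# building the `json` argument of `CustomDataX`. The helper name
# `to_datax_json` is hypothetical.
import json


def to_datax_json(template: dict) -> str:
    """Serialize a DataX job template to a strict JSON string."""
    return json.dumps(template)


# Possible usage: CustomDataX(name="task_custom_datax", json=to_datax_json(JSON_TEMPLATE))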