Merge branch 'dev' into pr/6703
hailin0 committed May 14, 2024
2 parents 8c0c846 + 4b6c13e commit 8ed173d
Showing 405 changed files with 20,627 additions and 2,612 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/documents.yml
@@ -45,11 +45,12 @@ jobs:
- uses: actions/setup-node@v2
with:
node-version: 14
node-version: 16.19.0

- name: Run docusaurus build
run: |
cd seatunnel-website
npm set strict-ssl false
npm install
npm run build
6 changes: 5 additions & 1 deletion docs/en/concept/JobEnvConfig.md
@@ -1,4 +1,4 @@
# JobEnvConfig
# Job Env Config

This document describes the env configuration. The common parameters can be used in all engines. To better distinguish engine-specific parameters, the additional parameters of each engine must carry a prefix.
In the Flink engine, we use `flink.` as the prefix. In the Spark engine, we do not add any prefix, because the official Spark parameters already start with `spark.`
@@ -29,6 +29,10 @@ In `STREAMING` mode, checkpoints is required, if you do not set it, it will be o

This parameter configures the parallelism of source and sink.

### job.retry.times

Controls the default number of retries when a job fails. The default value is 3. This parameter only takes effect in the Zeta engine.

### shade.identifier

Specifies the encryption method. If you do not need to encrypt or decrypt config files, this option can be ignored.
125 changes: 125 additions & 0 deletions docs/en/concept/config.md
@@ -12,6 +12,8 @@ configure the Config file.
The main format of the Config file is `hocon`. For more details about this format, refer to [HOCON-GUIDE](https://github.com/lightbend/config/blob/main/HOCON.md).
We also support the `json` format, but note that the name of the config file must end with `.json`.

We also support the `SQL` format. For details, refer to [SQL configuration](sql-config.md).

## Example

Before you read on, you can find config file
@@ -64,6 +66,19 @@ sink {
}
```

#### multi-line support

In `hocon`, multiline strings are supported, which lets you include long passages of text without worrying about newline characters or special formatting. To write one, enclose the text in triple quotes **`"""`**. For example:

```
var = """
Apache SeaTunnel is a
next-generation high-performance,
distributed, massive data integration tool.
"""
sql = """ select * from "table" """
```

### json

```json
@@ -193,6 +208,116 @@ configured with these two parameters, because in SeaTunnel, there is a default c
parameters are not configured, then the generated data from the last module of the previous node will be used.
This is much more convenient when there is only one source.

## Config variable substitution

In a config file, we can define variables and replace them at run time. **This is only supported for `hocon` format files.**

```hocon
env {
job.mode = "BATCH"
job.name = ${jobName}
parallelism = 2
}
source {
FakeSource {
result_table_name = ${resName}
row.num = ${rowNum}
string.template = ${strTemplate}
int.template = [20, 21]
schema = {
fields {
name = ${nameType}
age = "int"
}
}
}
}
transform {
sql {
source_table_name = "fake"
result_table_name = "sql"
query = "select * from "${resName}" where name = '"${nameVal}"' "
}
}
sink {
Console {
source_table_name = "sql"
username = ${username}
password = ${password}
blankSpace = ${blankSpace}
}
}
```

In the above config, we define some variables, like `${rowNum}`, `${resName}`.
We can replace those parameters with this shell command:

```shell
./bin/seatunnel.sh -c <this_config_file> \
-i jobName='st var job' \
-i resName=fake \
-i rowNum=10 \
-i strTemplate=['abc','d~f','h i'] \
-i nameType=string \
-i nameVal=abc \
-i username=seatunnel=2.3.1 \
-i password='$a^b%c.d~e0*9(' \
-i blankSpace='2023-12-26 11:30:00' \
-e local
```

Then the final submitted config is:

```hocon
env {
job.mode = "BATCH"
job.name = "st var job"
parallelism = 2
}
source {
FakeSource {
result_table_name = "fake"
row.num = 10
string.template = ["abc","d~f","h i"]
int.template = [20, 21]
schema = {
fields {
name = "string"
age = "int"
}
}
}
}
transform {
sql {
source_table_name = "fake"
result_table_name = "sql"
query = "select * from fake where name = 'abc' "
}
}
sink {
Console {
source_table_name = "sql"
username = "seatunnel=2.3.1"
password = "$a^b%c.d~e0*9("
blankSpace = "2023-12-26 11:30:00"
}
}
```

Some notes:
- Quote the value with `'` if it contains a space ` ` or a special character (like `(`).
- If the variable to be replaced sits inside a `"` or `'` string, like `resName` and `nameVal`, you need to add `"` around the placeholder.
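
The quoting rules in the notes above can be illustrated with a small Python sketch. This is not SeaTunnel's implementation: `parse_variables` is a hypothetical helper, `shlex.split` only approximates POSIX shell quoting, and splitting on the first `=` is an assumption that happens to match the `username=seatunnel=2.3.1` example.

```python
import shlex


def parse_variables(command_line):
    """Collect the key=value pairs that follow each -i flag (hypothetical helper)."""
    tokens = shlex.split(command_line)  # approximates POSIX shell quoting rules
    variables = {}
    for flag, pair in zip(tokens, tokens[1:]):
        if flag != "-i":
            continue
        # Assumption: only the first '=' separates the name from the value.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value after -i, got {pair!r}")
        variables[key] = value
    return variables


command = "-i jobName='st var job' -i username=seatunnel=2.3.1 -i password='$a^b%c.d~e0*9('"
variables = parse_variables(command)
for name, value in variables.items():
    print(f"{name} -> {value}")
```

The single quotes keep `st var job` together as one value, and everything after the first `=` in `username=seatunnel=2.3.1` stays part of the value.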

## What's More

If you want to know the details of this format configuration, Please
189 changes: 189 additions & 0 deletions docs/en/concept/sql-config.md
@@ -0,0 +1,189 @@
# SQL Configuration File

## Structure of SQL Configuration File

The `SQL` configuration file appears as follows.

### SQL

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/

CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties'= '{
useSSL = false,
rewriteBatchedStatements = true
}'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);

INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

## Explanation of `SQL` Configuration File

### General Configuration in SQL File

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/
```

In the `SQL` file, common configuration sections are defined using `/* config */` comments. Inside, common configurations like `env` can be defined using `HOCON` format.
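
To show how a `/* config */` comment separates the `HOCON` section from the SQL statements, here is a minimal Python sketch. It is an illustration only, not SeaTunnel's parser: the `split_sql_config` function and its regular expression are assumptions, and it handles just the first such comment in the file.

```python
import re

# Assumption: the config section is a block comment whose first word is "config".
CONFIG_COMMENT = re.compile(r"/\*\s*config\b(.*?)\*/", re.DOTALL)


def split_sql_config(text):
    """Return (hocon_text, sql_text) for a SQL config file (hypothetical helper)."""
    match = CONFIG_COMMENT.search(text)
    if match is None:
        return "", text.strip()
    hocon = match.group(1).strip()
    # Everything outside the comment is treated as SQL.
    sql = (text[:match.start()] + text[match.end():]).strip()
    return hocon, sql


sample = """/* config
env {
  parallelism = 1
  job.mode = "BATCH"
}
*/

INSERT INTO sink_table SELECT source_table;
"""

hocon_part, sql_part = split_sql_config(sample)
print(hocon_part)
print(sql_part)
```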

### SOURCE SQL Syntax

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
);
```

* Using `CREATE TABLE ... WITH (...)` syntax creates a mapping for the source table. The `TABLE` name is the name of the source-mapped table, and the `WITH` syntax contains source-related configuration parameters.
* There are two fixed parameters in the `WITH` syntax: `connector` and `type`, representing connector plugin name (such as `jdbc`, `FakeSource`, etc.) and source type (fixed as `source`), respectively.
* Other parameter names can reference relevant configuration parameters of the corresponding connector plugin, but the format needs to be changed to `'key' = 'value',`.
* If `'value'` is a sub-configuration, you can directly use a string in `HOCON` format. Note: if using a sub-configuration in `HOCON` format, the internal property items must be separated by `,`, like this:

```sql
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
```

* If using `'` within `'value'`, it needs to be escaped with `''`, like this:

```sql
'query' = 'select * from source where name = ''Joy Ding'''
```
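
The escaping rule above is easy to automate when generating such files. The following Python sketch assumes a hypothetical helper named `sql_config_value`; it only doubles embedded single quotes and wraps the result, which is what the rule describes.

```python
def sql_config_value(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


query = "select * from source where name = 'Joy Ding'"
line = f"'query' = {sql_config_value(query)}"
print(line)
```

Applied to the query from the example, the helper reproduces the escaped `'query'` line shown earlier.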

### SINK SQL Syntax

```sql
CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);
```

* Using `CREATE TABLE ... WITH (...)` syntax creates a mapping for the target table. The `TABLE` name is the name of the target-mapped table, and the `WITH` syntax contains sink-related configuration parameters.
* There are two fixed parameters in the `WITH` syntax: `connector` and `type`, representing connector plugin name (such as `jdbc`, `console`, etc.) and target type (fixed as `sink`), respectively.
* Other parameter names can reference relevant configuration parameters of the corresponding connector plugin, but the format needs to be changed to `'key' = 'value',`.
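
The `'key' = 'value'` conversion described in the points above can be sketched in Python. This is a hedged illustration rather than a SeaTunnel tool: `render_create_table` and `quote` are hypothetical names, and the option values are taken from the sink example.

```python
def quote(value):
    """Double embedded single quotes, then wrap the text in single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def render_create_table(table, connector, kind, options):
    """Render a CREATE TABLE ... WITH (...) mapping (hypothetical helper)."""
    if kind not in ("source", "sink"):
        raise ValueError("kind must be 'source' or 'sink'")
    # The two fixed parameters come first, followed by connector options.
    pairs = {"connector": connector, "type": kind, **options}
    body = ",\n".join(f"  {quote(k)} = {quote(v)}" for k, v in pairs.items())
    return f"CREATE TABLE {table} WITH (\n{body}\n);"


statement = render_create_table(
    "sink_table",
    "jdbc",
    "sink",
    {
        "url": "jdbc:mysql://localhost:3306/seatunnel",
        "database": "seatunnel",
        "table": "sink",
    },
)
print(statement)
```

Keeping `connector` and `type` as explicit arguments makes it harder to forget the two fixed parameters.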

### INSERT INTO SELECT Syntax

```sql
INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

* The `SELECT FROM` part is the table name of the source-mapped table.
* The `INSERT INTO` part is the table name of the target-mapped table.
* Note: This syntax does **not support** specifying fields in `INSERT`, like this: `INSERT INTO sink_table (id, name, age, email) SELECT id, name, age, email FROM source_table;`

### INSERT INTO SELECT TABLE Syntax

```sql
INSERT INTO sink_table SELECT source_table;
```

* The `SELECT` part directly uses the name of the source-mapped table, indicating that all data from the source table will be inserted into the target table.
* Using this syntax does not generate related `transform` configurations. This syntax is generally used in multi-table synchronization scenarios. For example:

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type' = 'source',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'table_list' = '[
{
table_path = "source.table1"
},
{
table_path = "source.table2",
query = "select * from source.table2"
}
]'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type' = 'sink',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'sink'
);

INSERT INTO sink_table SELECT source_table;
```

### CREATE TABLE AS Syntax

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;
```

* This syntax creates a temporary table with the result of a `SELECT` query, used for `INSERT INTO` operations.
* For the syntax of the `SELECT` part, refer to the `query` configuration item of [SQL-transform](../transform-v2/sql.md).

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;

INSERT INTO sink_table SELECT * FROM temp1;
```

## Example of SQL Configuration File Submission

```bash
./bin/seatunnel.sh --config ./config/sample.sql
```
