Support bidirectional data replication between two TiDB clusters #251

Open
shenli opened this issue Dec 23, 2019 · 23 comments · May be fixed by #253

@shenli

shenli commented Dec 23, 2019

Gravity can already replicate data between two MySQL clusters bidirectionally and can read TiDB's binlog, so I guess it would not be hard to support bidirectional data replication between two TiDB clusters.

shenli changed the title from "Support bidirectional data replication between two TiDB Clusters" to "Support bidirectional data replication between two TiDB clusters" on Dec 23, 2019
Ryan-Git self-assigned this on Dec 23, 2019
@Ryan-Git
Collaborator

Ryan-Git commented Dec 23, 2019

@shenli
Yes. Could you provide a Docker image that starts a TiDB cluster and Kafka with the proper configuration (pump, drainer, etc.)? It would be appreciated if the binlog could be subscribed to from it directly.

@WangXiangUSTC

WangXiangUSTC commented Dec 23, 2019

You can use docker-compose to start a TiDB cluster with binlog enabled: see https://github.com/pingcap/tidb-docker-compose and use the docker-compose-binlog.yml file. You will also need to update drainer's config file so that it sends the binlog to Kafka.

@IANTHEREAL

@Ryan-Git is everything going well?

@Ryan-Git
Collaborator

Ryan-Git commented Dec 25, 2019

@GregoryIan I've roughly finished the change. Need more time for testing.

btw, the messages contain no unique key info. This will cause problems when writing to the target in parallel.

@WangXiangUSTC

@Ryan-Git do you mean you found a bug in Gravity when a table has no unique key?

@Ryan-Git
Collaborator

Ryan-Git commented Dec 25, 2019

@WangXiangUSTC

For MySQL, we treat the new and old values of both primary keys and unique keys as "data dependencies". A message that shares a dependency with in-flight messages is delayed until those have finished. It works just like a lock/latch mechanism in a database.

For example, say there is one row with pk = 1 and uk = 1, and two transactions happen in the following sequence:

delete from t where uk = 1;
insert into t(pk, uk) values (2, 1);

They must be executed in that order. But if we only know the primary key, the two statements touch different keys (1 and 2), so Gravity sees no dependency and is allowed to output them in arbitrary order.
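
To illustrate the point, here is a minimal Python sketch of the dependency check. It is not Gravity's actual code: the helper names `dependency_keys` and `conflicts` are hypothetical, and representing a row as a dict of column name to value is an assumption made only for this example.

```python
def dependency_keys(table, key_columns, old_row, new_row):
    """Collect one dependency key per (index, row image) pair.

    key_columns maps an index name to its column tuple, for example
    {"PRIMARY": ("pk",), "uk": ("uk",)}. old_row and new_row are dicts,
    or None when the row image does not exist (insert or delete).
    """
    keys = set()
    for index_name, columns in key_columns.items():
        for row in (old_row, new_row):
            if row is not None:
                keys.add((table, index_name, tuple(row[c] for c in columns)))
    return keys


def conflicts(keys_a, keys_b):
    """Two messages must be serialized if they share any dependency key."""
    return not keys_a.isdisjoint(keys_b)


# The row (pk=1, uk=1) is deleted, then (pk=2, uk=1) is inserted.
deleted = {"pk": 1, "uk": 1}
inserted = {"pk": 2, "uk": 1}

pk_only = {"PRIMARY": ("pk",)}
pk_and_uk = {"PRIMARY": ("pk",), "uk": ("uk",)}

# Knowing only the primary key, the two messages look independent.
pk_conflict = conflicts(
    dependency_keys("t", pk_only, deleted, None),
    dependency_keys("t", pk_only, None, inserted),
)

# With the unique key included, both messages touch ("t", "uk", (1,)).
uk_conflict = conflicts(
    dependency_keys("t", pk_and_uk, deleted, None),
    dependency_keys("t", pk_and_uk, None, inserted),
)

print(pk_conflict, uk_conflict)  # False True
```

With only the primary key the scheduler has no reason to delay the insert, which is exactly the reordering risk described above.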

@july2993

Can we get the schema from downstream?
Another option is to add the unique key info to the messages to address this problem.

@Ryan-Git
Collaborator

Ryan-Git commented Dec 25, 2019

Getting the schema from downstream might work, but:

  • Logically, data dependency is determined by the source data store, not the target. Although the two look identical in this specific scenario, relying on the target is still error-prone.

  • In practice, the internal message structure is built by the input component. By the time a message reaches the output, it has already been scheduled, which is too late. And connecting to the output data store from the input component would be ridiculous.

One way around this is to connect to the source data store. But the message and the schema may not match unless we keep a schema history. That's another story though (#174). I think adding the unique key info to the messages is a better solution for TiDB.

Temporarily, we can set the number of workers to one and handle the messages sequentially, just like drainer outputting to only one partition. But throughput would be a problem.

btw, do we need to support DDL?

@july2993

I think I can add the unique key info to the messages later.

Is DDL supported for other sources like MySQL?

@Ryan-Git
Collaborator

Yes, partially supported.

@Ryan-Git
Collaborator

Ryan-Git commented Dec 27, 2019

@july2993
One more question: how can we describe a composite key? I think the ColumnInfo type supports only a single-column primary key.

@july2993

For a composite key, all columns that belong to the PK are marked with IsPrimaryKey.
I will add the unique key info in pingcap/tidb-tools#310. After that, I think you can ignore the IsPrimaryKey flag. Could you take a look?
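
As a rough illustration of the flag-based approach, the following Python sketch rebuilds a composite primary key by collecting every flagged column. The dict layout, the table with columns a, b and c, and the helper name `primary_key_columns` are all assumptions for this example; they only mirror the idea of a per-column primary key flag and are not the generated protobuf API.

```python
def primary_key_columns(column_info):
    """Return the names of all columns flagged as part of the primary key."""
    return [col["name"] for col in column_info if col.get("is_primary_key")]


# Hypothetical table with the composite primary key (a, b).
columns = [
    {"name": "a", "mysql_type": "int", "is_primary_key": True},
    {"name": "b", "mysql_type": "int", "is_primary_key": True},
    {"name": "c", "mysql_type": "varchar"},
]

pk = primary_key_columns(columns)
print(pk)  # ['a', 'b']
```

A per-column flag like this can describe one composite primary key, but it has no way to express additional unique keys, which is what the separate unique key info is meant to cover.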

@Ryan-Git
Collaborator

ok

@Ryan-Git
Collaborator

Ryan-Git commented Dec 27, 2019

@july2993
Now that the proto PR has been merged, how about drainer and tidb-docker-compose?
I'll do some testing over the weekend, with the old primary key flag first.

@july2993

The related drainer PR, pingcap/tidb-binlog#858, is not merged yet. I will merge it ASAP, and then you can use the added unique key info. If you are only testing the old primary key flag, I think you can just use the docker-compose provided by @WangXiangUSTC first.

@Ryan-Git
Collaborator

@WangXiangUSTC
There are a few errors with the binlog docker-compose. Could you help?
drainer:

[2019/12/30 03:22:23.297 +00:00] [FATAL] [main.go:39] ["verifying flags failed, See 'drainer --help'."] [error="component drainer's config file /drainer.toml contained unknown configuration options: syncer.disable-dispatch"] [errorVerbose="component drainer's config file /drainer.toml contained unknown configuration options: syncer.disable-dispatch\ngithub.com/pingcap/tidb-binlog/pkg/util.StrictDecodeFile\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/pkg/util/util.go:205\ngithub.com/pingcap/tidb-binlog/drainer.(*Config).configFromFile\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/config.go:230\ngithub.com/pingcap/tidb-binlog/drainer.(*Config).Parse\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/drainer/config.go:173\nmain.main\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/cmd/drainer/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"] [stack="github.com/pingcap/log.Fatal\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/pkg/mod/github.com/pingcap/[email protected]/global.go:59\nmain.main\n\t/home/jenkins/agent/workspace/build_tidb_binlog_master/go/src/github.com/pingcap/tidb-binlog/cmd/drainer/main.go:39\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"]

tidb: (I can't find config txn-total-size-limit in doc)

invalid config txn-total-size-limit should be less than 104857600 with binlog enabled

@july2993

@Ryan-Git
You can try these two config changes. I am testing step by step to make sure the docker-compose works, and will let you know once I've confirmed it does.

diff --git a/config/drainer.toml b/config/drainer.toml
index b2190e3..bf4f064 100644
--- a/config/drainer.toml
+++ b/config/drainer.toml
@@ -40,7 +40,7 @@ txn-batch = 20
 # to get higher throughput by higher concurrent write to the downstream
 worker-count = 16

- disable-dispatch = false
+ # disable-dispatch = false

  # safe mode will split update to delete and insert
 safe-mode = false
diff --git a/config/tidb.toml b/config/tidb.toml
index 9f881b4..a7d3fc1 100644
--- a/config/tidb.toml
+++ b/config/tidb.toml
@@ -113,6 +113,8 @@ metrics-addr = "pushgateway:9091"
 metrics-interval = 15

 [performance]
+txn-total-size-limit = 104857599
+
 # Max CPUs to use, 0 use number of CPUs in the machine.
 max-procs = 0
 # StmtCountLimit limits the max count of statement inside a transaction.

@Ryan-Git
Collaborator

@july2993
There are still some problems with drainer and Kafka: drainer exits silently with code 0, and Kafka can't connect to ZooKeeper.

@july2993

@Ryan-Git

Here is the full change: july2993/tidb-docker-compose@4fe618e
Or you can just check out this branch: https://github.com/july2993/tidb-docker-compose/tree/binlog_test
The ZOO_SERVERS format changed as of 3.5, so we can pin the 3.4 version.

Also, the newest version of TiDB-Binlog includes the added unique key info, so you may need to re-pull:

docker-compose -f docker-compose-binlog.yml pull

After that, all services are expected to start with:

docker-compose -f docker-compose-binlog.yml up -d --force-recreate

An example:

mysql> create table t(a1 int primary key, a2 int, a3 int, unique key(a2,a3));
Query OK, 0 rows affected (0.08 sec)

mysql> insert into t(a1, a2, a3) values(1,2,3);
Query OK, 1 row affected (0.03 sec)

mysql>

Here I use the print tool to consume the messages and simply log them.
If you run it locally, you may need to add the following entries to /etc/hosts:

127.0.0.1 kafka0
127.0.0.1 kafka1
127.0.0.1 kafka2

we can get message from Kafka:

➜  print git:(master) ✗ ./print -topic=6776431877390009174_obinlog -offset -2
[2019/12/31 11:03:26.046 +08:00] [INFO] [print.go:51] [recv] [message="type:DDL commit_ts:413600582227984385 ddl_data:<schema_name:\"test\" table_name:\"\" ddl_query:\"CREATE DATABASE IF NOT EXISTS test\" > "]
[2019/12/31 11:03:26.046 +08:00] [INFO] [print.go:51] [recv] [message="type:DDL commit_ts:413600602768801799 ddl_data:<schema_name:\"test\" table_name:\"t\" ddl_query:\"create table t(a1 int primary key, a2 int, a3 int, unique key(a2,a3))\" > "]
[2019/12/31 11:03:26.046 +08:00] [INFO] [print.go:51] [recv] [message="commit_ts:413600605764845571 dml_data:<tables:<schema_name:\"test\" table_name:\"t\" column_info:<name:\"a1\" mysql_type:\"int\" is_primary_key:true > column_info:<name:\"a2\" mysql_type:\"int\" > column_info:<name:\"a3\" mysql_type:\"int\" > mutations:<type:Insert row:<columns:<int64_value:1 > columns:<int64_value:2 > columns:<int64_value:3 > > > unique_keys:<name:\"PRIMARY\" column_names:\"a1\" > unique_keys:<name:\"a2\" column_names:\"a2\" column_names:\"a3\" > > > "]
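
To make the unique_keys field of the last message concrete, here is a small Python sketch that maps each unique key to its values for the inserted row. It is not part of the print tool or Gravity: the dict is a hand-written stand-in for the decoded message (the field names follow the log line above, the dict form itself is an assumption), and `unique_key_values` is a hypothetical helper.

```python
# Hand-written stand-in for the table part of the DML message above.
table = {
    "schema_name": "test",
    "table_name": "t",
    "column_info": [{"name": "a1"}, {"name": "a2"}, {"name": "a3"}],
    "unique_keys": [
        {"name": "PRIMARY", "column_names": ["a1"]},
        {"name": "a2", "column_names": ["a2", "a3"]},
    ],
}

# Column values of the inserted row, in column_info order.
row = [1, 2, 3]


def unique_key_values(table, row):
    """Map each unique key name to the tuple of its column values in row."""
    position = {col["name"]: i for i, col in enumerate(table["column_info"])}
    return {
        key["name"]: tuple(row[position[name]] for name in key["column_names"])
        for key in table["unique_keys"]
    }


values = unique_key_values(table, row)
print(values)  # {'PRIMARY': (1,), 'a2': (2, 3)}
```

A consumer could use these tuples as the per-row dependency keys when deciding which messages may be applied in parallel.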

The default topic name is <cluster-id>_obinlog; you can change it by setting syncer.to.topic-name in drainer.toml if needed.

Let me know if there's anything more I can help.

@Ryan-Git
Collaborator

@july2993 Thanks. I'll try this.

Ryan-Git linked a pull request on Jan 3, 2020 that will close this issue
@Ryan-Git
Collaborator

Ryan-Git commented Jan 3, 2020

@july2993
I've done some basic testing and am trying to set up CI for this scenario. You can do further testing with https://github.com/moiot/gravity/tree/tidb-bidirection

btw, the default value of pump's stop-write-at-available-space seems too big for testing.

@Ryan-Git
Collaborator

Ryan-Git commented Jan 13, 2020

@july2993 any problems? If not, I'm planning to merge #253.

@july2993

july2993 commented Jan 13, 2020

@Ryan-Git
Sorry, it looks good to me, but I haven't done further testing yet. Can you go ahead, and we can do further testing later if needed?
