-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
executable file
·179 lines (136 loc) · 7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
Overview
--------
Wombat is a program that replicates data from MySQL to Apache Kafka. It replicates row-level data in near real-time.
It uses the MySQL replication protocol to speak to a MySQL server and act as a MySQL replication follower. Wombat receives a stream of row-level changes, and writes each change to Kafka. If a row is inserted/updated, Wombat sends the row's new data to Kafka. If a row is deleted, Wombat sends a delete marker to Kafka.
Key features of Wombat:
* Works with any MySQL/MariaDB-compatible server (tested against MariaDB 5.5 and MariaDB 10.0 (haven't done a lot of testing).
* Safe. It replicates from the master MySQL database with just read-only, replication-only permissions. (need to test this)
* Low impact on performance. It replicates asynchronously.
* Transactionally consistent output. Writes only appear if they have been successfully committed to MySQL; aborted transactions do not emit messages. Writes appear in the same order as they were committed to MySQL.
* Fault-tolerant. If Wombat crashes, it will resume replication at the point where it failed. It will not lose any data.
Demo
----
[See demo_video.mov file]
Building
--------
To compile wombat:
$ ./gradlew build fatJar
To build the docker quickstart images:
$ docker build -f replicator/Dockerfile.quickstart -t wombat-quickstart replicator
$ docker build -t mariadb-quickstart mariadb-quickstart
Runtime requirements
--------------------
MySQL configuration requirements:
* row-based binlog replication enabled (https://dev.mysql.com/doc/refman/5.1/en/replication-howto-masterbaseconfig.html)
* --binlog-row-image=full (http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_row_image)
* all tables must have primary keys
* Only replicates file/offset replication, does not replicate systems using GTIDs.
Quickstart
----------
The easiest way to start playing with Wombat is to use the pre-built and pre-configured Docker images. They have a hardcoded configuration and are not intended for production use.
$ docker run -d --name zookeeper --hostname zookeeper confluent/zookeeper
$ docker run -d --name kafka --hostname kafka --link zookeeper:zookeeper --env KAFKA_LOG_CLEANUP_POLICY=compact confluent/kafka
$ docker run -d --name mariadb mariadb-quickstart
# run wombat
$ docker run --rm -it --link kafka:kafka --link zookeeper:zookeeper --link mariadb:mariadb --name wombat wombat-quickstart
# interact. consume from kafka.
$ docker run -it --rm --link zookeeper:zookeeper --link kafka:kafka confluent/tools /bin/bash -c '/usr/bin/kafka-console-consumer --zookeeper ${ZOOKEEPER_PORT_2181_TCP_ADDR}:${ZOOKEEPER_PORT_2181_TCP_PORT} --topic prefix.test.users --property print.key=true --from-beginning'
# insert some data
$ docker run --rm -it --link mariadb:mariadb mariadb:5.5 sh -c 'mysql --user=root --password=${MARIADB_ENV_MYSQL_ROOT_PASSWORD} --host=${MARIADB_PORT_3306_TCP_ADDR} --port=${MARIADB_PORT_3306_TCP_PORT}'
MariaDB [(none)]> create database test;
MariaDB [(none)]> create table test.users (userId int auto_increment primary key, name char(128));
MariaDB [(none)]> insert into test.users (name) values ("James");
MariaDB [(none)]> update test.users set name = "James Cheng" where name = "James";
MariaDB [(none)]> delete from test.users where name = "James Cheng";
# clean up
$ docker rm -f wombat mariadb kafka zookeeper
Consuming
---------
Wombat encodes the row data using JSON, and then wraps it in an HTTP envelope. The envelope headers are used for monitoring replication progress.
Each table is replicated to a different Kafka topic.
Given the following table structure:
CREATE TABLE users (userId INT AUTO_INCREMENT PRIMARY KEY, name CHAR(128));
Inserts:
INSERT INTO users (name) VALUES ("James");
BL/1 143 27
ServerId: mysqlhost.domainname
BinlogEventStartPosition: 8288
BinlogFilename: mariadb-bin.000008
BinlogTransactionStartPosition: 8173
{"name":"James","userId":1}
Updates:
UPDATE users SET name = "James Cheng" WHERE name="James";
BL/1 143 33
ServerId: mysqlhost.domainname
BinlogEventStartPosition: 8470
BinlogFilename: mariadb-bin.000008
BinlogTransactionStartPosition: 8355
{"name":"James Cheng","userId":1}
Deletes:
DELETE FROM users WHERE name = "James Cheng";
null
Deletes are represented as a message with a null payload.
Status
------
Wombat is pre-alpha software. It has only been tested against a limit set of inputs.
A lot of thought has been put into its design, but not everything is implemented. I am looking for feedback to help validate the design and guide the implementation.
Caveats/issues
--------------
* MySQL schema changes are not handled.
* Not all MySQL data types are handled properly.
* Not tolerant to disk failure (it saves its MySQL replication position into a local file).
* Errors are not handled. Since it is still in development mode, it prints errors to stderr and System.exit()'s whenever it encounters an error.
Inspiration
-----------
Databus - https://github.com/linkedin/databus
License
-------
Copyright 2015 TiVo Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License in the enclosed file called LICENSE.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Contributing
------------
See CONTRIBUTING.md
Running manually
----------------
How to run it manually, without docker.
Wombat takes configuration from two config files that are passed in at startup.
$ cat config.properties
MYSQL_SOURCE_IDENTIFIER=myServerId
MYSQL_SOURCE_HOST=192.168.1.1
MYSQL_SOURCE_PORT=3306
MYSQL_SOURCE_USER=replication-user
MYSQL_SOURCE_PASSWORD=replication-password
KAFKA_BROKER=192.168.1.1
KAFKA_PORT=9092
KAFKA_TOPIC_BASE=topic.prefix
$ cat checkpoint.txt
BINLOG_FILENAME=mariadb-bin.000003
BINLOG_POSITION=4
$ export CONFIG_FILE=config.properties
$ export CHECKPOINT_FILE=checkpoint.txt
$ java -jar replication/build/libs/replication.jar
Configuration
-------------
MYSQL_SOURCE_IDENTIFIER
An arbitrary string that will be added to each message that originates from this MySQL server
MYSQL_SOURCE_HOST
Hostname or IP address of the MySQL server to replicate from.
MYSQL_SOURCE_PORT
The port that MySQL listens on
MYSQL_SOURCE_USER
The username used to connect to MySQL
MYSQL_SOURCE_PASSWORD
The password used to connect to MySQL
SERVER_ID
MySQL's server id setting
KAFKA_BROKER
IP/Hostname of one of the Kafka brokers
KAFKA_PORT
Port of the Kafka broker
KAFKA_TOPIC_BASE
An arbitrary string. A MySQL database and table nane are
appended to this string to determine to which topic that
table's rows should be sent. For example, a prefix of
"server1" could be chosen and a table db.users would be
written to the topic "server1.db.users"