Active/Active WAN-based Replication in Apache Geode (incubating)

Guest post by Apache Geode (incubating) committer Bruce Schuchardt.

To learn more about the WAN Gateway feature in Apache Geode, join the next Geode Clubhouse virtual meeting on June 8 at 9AM Pacific.

This past February 2016 Pivotal contributed its Disaster Recovery (WAN) feature to Apache Geode. This feature has been incorporated into the Apache Geode 1.0.0-incubating.M2 release. This article describes how this feature benefits Geode users.

Ensuring your systems run smoothly even when your data center has a hiccup, or a real disaster strikes is critical for many companies to survive when hardships befall them. As we enter the age of the zettabyte, seamless disaster recovery has become even more critical and difficult. There is more data than we have ever handled before, and most of it is very, very big.

Most disaster recovery (DR) sites are in standby mode—assets sitting idle, waiting for their turn. The sites are either holding data copied through a storage area network (SAN) or using other data replication mechanisms to propagate information from a live site to a standby site. When disaster strikes, clients are redirected to the standby site where they’re greeted with a polite “please wait” while the site spins up.

At best, the DR site is a hot standby that is ready to go on short notice. DNS redirects clients to the DR site and they’re good to go.

What about all the machines at the DR site? With active/passive replication you can probably do queries on the slave site, but what if you want to make full use of all of that expensive gear and go active/active? The challenge is in the data replication technology. Most current data replication architectures are one-way. If it’s not one-way, it can come with restrictions—for example, you need to avoid opening files with exclusive access.

Comparing Replication for MySQL, Oracle and Geode

The open source database MySQL only supports one-way replication for DR. It is limited to active/passive replication to a standby server (for more on how this works see How does MySQL Replication really work?). Replication is single threaded and based on a binary log that is written by the database. While it’s a slave, you can query the data. When you fail over, the slave database needs to be made into master and web servers need to be brought online before the site is usable.

At least one Oracle replication service offers bi-directional active/active support but the cost is high as it involves a lot of database writes to disk back and forth and there are contention points that slow it down and can cause gaps in replication during a failover.

At the other end of the spectrum, Geode supports full, in-memory, bi-directional (or N-way) replication. A no-SAN, shared-nothing configuration like Geode’s WAN gateway lets data flow in both directions at high speed and scale. You can even hook all of your sites together in a ring to limit the amount of work a site has to do to replicate its changes. In this type of topology, changes are replicated to a neighboring site that does its own store-and-forward replication.

Parallel, Contention-free Replication

The replication service in Geode is transparent to applications and does not affect normal use of the distributed big data grid. Setup is easy too as there is nothing to change in your configuration other than enabling the WAN service and providing the WAN endpoint(s) to use for replication. Geode automatically sets up a parallel asynchronous replication system across your machines that reaches out to the remote site and efficiently batches changes asynchronously across the WAN. This is markedly simpler than what you get with most other systems, like http://gemfire.docs.pivotal.io/docs-gemfire/latest/managing/disk_storage/operation_logs.htmlEhcache, where you are responsible for setting up major components like a JMS messaging bus between sites.

The type of replication it uses is parallel replication. Parallel replication spreads the work across servers and eliminates contention, or communication bottlenecks between data stores. Data partitioning breaks the data and work apart across many servers, building in horizontal scalability. Each partition has a replication queue that has one or more redundant backups in case of failure and streams data to a disk store for recovery. To ensure data remains persistent across a shut-down or fail-over, Geode uses Oplogging, similar to Apache Kafka, for high performance instead of the random-access messaging of traditional databases that is so slow.

Parallel replication allows Geode’s solution to keep up with the blinding speed of Geode’s data grid. Prior to having this feature the WAN gateway had higher latency and could not keep up as easily with the busy data grid.

With the innovation of parallel replication, the replication mechanism itself is as elastic as the entire Geode data grid. If you add more capacity by adding more cache nodes, WAN replication capacity is also expanded at the same time. Communications between sites is so distributed that bottlenecks can be nearly eliminated and latency stays very low.

Dealing with Active/Active conflicts—A Detailed Example

Contention happens as the database prevents data from being updated when two or more processes try to modify the same data. Essentially, the first process to the data will lock the data down until it is done with it. In an active/active configuration, data flows back and forth and the times when the same data could be modified simultaneously increases. However, Geode allows the simultaneous updates and automatically detects the conflict and retains the latest data. You can also add your own GatewayConflictResolver to handle the problem. The conflict resolution approach is similar to how most active/active technologies work.

GatewayConflictResolver is handed both pieces of information along with details about the operation being performed and when the changes happened. It can choose to keep the incoming change, reject it, merge it, or do something completely different. The product comes with an example of active/active replication showing how to do it. For example, let’s say you have a collection of Items that were modified at the same time by two sites. One adds the item “Nike Fuel” to the collection and the other adds “Jawbone UP” to it. The GatewayConflictResolver would see this in its onEvent callback:

TimestampedEntryEvent ( Key = WishList103499 oldSystemId = 1 newSystemId = 2 oldTimestamp = 1363885242046 (milli clock) newTimestamp = 1363888231022 oldValue = ( “Fit Bit” “Nike Fuel” ) newValue = ( “Fit Bit” “Jawbone UP” ) )

The resolver can then merge the two collections together into (“Fit Bit” “Nike Fuel” “Jawbone UP”) and use the “helper” that is passed to it via onEvent to change the event value. Both systems would see a conflict, and their respective conflict resolver plugins would be responsible for attaining a consistent result.

As mentioned, Geode comes with an active/active example–most of the code is already there and ready to be copied and pasted into your own resolver. It even shows you how to resolve a conflict on compact PDX serialized data to keep CPU cost low when resolving conflicts. Note, PDX is a high-speed serialization mechanism that comes with Geode.

Putting data into a Geode cache is easy. Geode allows you to carve up the datagrid into maps that it calls Regions. Each Region is a key/value store that can either be replicated on other nodes and/or partitioned across the grid. When you create a Region, you give it a name, and Geode uses this to reach out to other nodes in the grid and inform them of your participation in the grid. Then you can pull data out of the Region using Map, ConcurrentMap interfaces and the extensions added by Geode’s Region interface including queries and transactions.

Here are some good examples of using Geode with Spring Data—all you have to do is start up some Geode data nodes with a Gateway hub, give the hub the address of your other data centers, then add a Geode Repository to your app.

If you just need to replicate session state between sites for fast failover, Geode has plug-in modules for HTTP session management with tc Server and webservers similar to Apache Tomcat, cache for Hibernate, Memcached. Bi-directional WAN replication lets you use both sites and gives continuous availability of data. If one of the sites goes down, clients can be distributed among remaining sites. Data not yet transmitted by the failed site is persisted on disk on the down site—the data will be missing until one of the redundant WAN replication queues is brought back up. So this issue must be dealt with, but the issue is the same for any replication technology. Of course, there is always a lag in synchronization while the data is being carted from one system to the other. With Geode, the lag is very small, because the data is queued for replication as soon as it’s put in the cache.

About the Author:

Bruce Schuchardt is a senior engineer at The Pivotal Initiative and has worked on Geode since its inception (as GemFire) in 2002. He is responsible for the replication, consistency and membership systems. Bruce holds an MS in computer science from University of Massachusetts