Apache Atlas uses and interacts with a variety of systems to provide metadata management and data lineage to data administrators. By choosing and configuring these dependencies appropriately, it is possible to achieve a good degree of service availability with Atlas. This document describes the state of high availability support in Atlas, including its capabilities and current limitations, and the configuration required to achieve this level of high availability.
The architecture page in the wiki gives an overview of the various components that make up Atlas. The options described below for these components derive context from that page, so it is worth reviewing it before proceeding.
Currently, the Atlas Web Service has a limitation: only one active instance can run at a time. Therefore, if the host running the service fails, a new Atlas Web Service instance should be brought up and clients pointed to it. In future versions of the system, we plan to provide full High Availability of the service, thereby enabling hot failover. To minimize service loss, we recommend keeping a backup instance ready on a separate host and fronting the service with a proxy such as HAProxy, for example:
listen atlas
  bind <proxy hostname>:<proxy port>
  balance roundrobin
  server inst1 <atlas server hostname>:<port> check
  server inst2 <atlas backup server hostname>:<port> check backup
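Clients and hooks can then be pointed at the proxy address rather than at an individual server. As a minimal sketch, assuming the proxy host and port used above, the corresponding entry in atlas-application.properties would be:

  # Point clients and hooks at the proxy rather than an individual Atlas server
  atlas.rest.address=http://<proxy hostname>:<proxy port>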
As described above, Atlas uses Titan to store the metadata it manages. By default, Titan uses BerkeleyDB as an embedded backing store. However, this option would result in loss of data if the node running the Atlas server fails. To provide HA for the metadata store, we recommend configuring Atlas to use HBase as the backing store for Titan, so that the metadata store inherits the HA guarantees HBase provides. To configure Atlas to use HBase in HA mode, do the following:
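As a minimal sketch (the exact steps depend on the Atlas and Titan versions in use), the relevant entries in atlas-application.properties would look roughly like the following, assuming the HA-enabled HBase cluster is coordinated by the listed ZooKeeper quorum:

  # Use HBase as the Titan storage backend
  atlas.graph.storage.backend=hbase
  # ZooKeeper quorum managing the HA-enabled HBase cluster
  atlas.graph.storage.hostname=<comma separated list of ZooKeeper servers used by HBase>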
As described above, Atlas indexes metadata through Titan to support full-text search queries. To provide HA for the index store, we recommend configuring Atlas to use Solr as the backing index store for Titan. To configure Atlas to use Solr in HA mode, do the following:
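As a minimal sketch, assuming Solr runs in SolrCloud mode coordinated by a ZooKeeper ensemble, the relevant entries in atlas-application.properties would look roughly like the following (the exact backend value depends on the Atlas/Titan version in use):

  # Use Solr (in SolrCloud mode) as the Titan index backend
  atlas.graph.index.search.backend=solr5
  atlas.graph.index.search.solr.mode=cloud
  atlas.graph.index.search.solr.zookeeper-url=<comma separated list of ZooKeeper host:port entries used by SolrCloud>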
Metadata notification events from Hooks are sent to Atlas by writing them to a Kafka topic called ATLAS_HOOK. Similarly, events from Atlas to other integrating components, such as Ranger, are written to a Kafka topic called ATLAS_ENTITIES. Since Kafka persists these messages, the events are not lost even if a consumer is down at the time they are sent. In addition, we recommend setting up Kafka itself for fault tolerance so that it provides stronger availability guarantees. To configure Atlas to use Kafka in HA mode, do the following:
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper <list of zookeeper host:port entries> --topic ATLAS_HOOK --replication-factor <numReplicas> --partitions 1
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper <list of zookeeper host:port entries> --topic ATLAS_ENTITIES --replication-factor <numReplicas> --partitions 1

Here KAFKA_HOME points to the Kafka installation directory.
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=<comma separated list of servers forming Zookeeper quorum used by Kafka>
atlas.kafka.bootstrap.servers=<comma separated list of Kafka broker endpoints in host:port form> - Give at least 2 for redundancy.
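To confirm that the topics were created with the intended replication, the same Kafka tooling used above can describe them; for example:

  $KAFKA_HOME/bin/kafka-topics.sh --describe --zookeeper <list of zookeeper host:port entries> --topic ATLAS_HOOK
  $KAFKA_HOME/bin/kafka-topics.sh --describe --zookeeper <list of zookeeper host:port entries> --topic ATLAS_ENTITIES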