HBase model includes the following types:
HBase entities are created and de-duped in Atlas using the unique attribute qualifiedName, whose value should be formatted as detailed below. Note that namespaceName, tableName and columnFamilyName should be in lower case.
hbase_namespace.qualifiedName:     <namespaceName>@<clusterName>
hbase_table.qualifiedName:         <namespaceName>:<tableName>@<clusterName>
hbase_column_family.qualifiedName: <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>
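The three formats above can be sketched as small helper functions. This is only an illustration of the naming rules: the function names are hypothetical and not part of Atlas, the default cluster name "primary" is taken from the atlas.cluster.name default, and the cluster name is left unchanged because the lower-case requirement is stated only for the namespace, table and column family names.

```python
def namespace_qualified_name(namespace: str, cluster: str = "primary") -> str:
    """Build <namespaceName>@<clusterName> with a lower-cased namespace."""
    return f"{namespace.lower()}@{cluster}"


def table_qualified_name(namespace: str, table: str, cluster: str = "primary") -> str:
    """Build <namespaceName>:<tableName>@<clusterName>."""
    return f"{namespace.lower()}:{table.lower()}@{cluster}"


def column_family_qualified_name(
    namespace: str, table: str, column_family: str, cluster: str = "primary"
) -> str:
    """Build <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>."""
    return f"{namespace.lower()}:{table.lower()}.{column_family.lower()}@{cluster}"


if __name__ == "__main__":
    # Hypothetical names, used only to show the resulting strings.
    print(namespace_qualified_name("Sales"))
    print(table_qualified_name("Sales", "Orders"))
    print(column_family_qualified_name("Sales", "Orders", "Details", cluster="prod"))
```

Because Atlas de-dupes on qualifiedName, two callers that build the same string for the same HBase object end up referring to the same entity.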
The Atlas HBase hook registers with the HBase master as a co-processor. On detecting changes to HBase namespaces, tables or column families, the hook updates the metadata in Atlas via Kafka notifications. Follow the instructions below to set up the Atlas hook in HBase, starting with registering the co-processor in hbase-site.xml:
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>
The following properties in atlas-application.properties control the thread pool and notification details:
# whether to run the hook synchronously. false recommended to avoid delays in HBase operations. Default: false
atlas.hook.hbase.synchronous=false
# number of retries for notification failure. Default: 3
atlas.hook.hbase.numRetries=3
# queue size for the threadpool. Default: 10000
atlas.hook.hbase.queueSize=10000
# clusterName to use in qualifiedName of entities. Default: primary
atlas.cluster.name=primary
# Zookeeper connect URL for Kafka. Example: localhost:2181
atlas.kafka.zookeeper.connect=
# Zookeeper connection timeout. Default: 30000
atlas.kafka.zookeeper.connection.timeout.ms=30000
# Zookeeper session timeout. Default: 60000
atlas.kafka.zookeeper.session.timeout.ms=60000
# Zookeeper sync time. Default: 20
atlas.kafka.zookeeper.sync.time.ms=20
Other configurations for the Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.". For the list of configurations supported by the Kafka producer, refer to the Kafka Producer Configs documentation.
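The prefix convention can be illustrated with a short sketch that takes a set of properties and returns the entries carrying the "atlas.kafka." prefix, with the prefix removed. This is not how Atlas itself is implemented; the function name and the sample values are hypothetical, and "acks" is used only as an example of a standard Kafka producer setting.

```python
from typing import Dict

KAFKA_PREFIX = "atlas.kafka."


def kafka_settings(properties: Dict[str, str]) -> Dict[str, str]:
    """Return the entries that start with 'atlas.kafka.', keyed without the prefix."""
    return {
        key[len(KAFKA_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(KAFKA_PREFIX)
    }


if __name__ == "__main__":
    # Sample values for illustration only.
    props = {
        "atlas.hook.hbase.numRetries": "3",
        "atlas.cluster.name": "primary",
        "atlas.kafka.acks": "all",
        "atlas.kafka.zookeeper.connect": "localhost:2181",
    }
    print(kafka_settings(props))
```

In this sketch, a producer setting named acks would be written in atlas-application.properties as atlas.kafka.acks, while properties outside the prefix are ignored.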
Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas. This utility can be used to initialize Apache Atlas with the namespaces and tables present in an Apache HBase cluster. It supports importing metadata of a specific table, of the tables in a specific namespace, or of all tables.
Usage 1: <atlas package>/hook-bin/import-hbase.sh
Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>]
Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>]
    File Format:
        namespace1:tbl1
        namespace1:tbl2
        namespace2:tbl1
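Before passing a file to the -f option, it can help to check that every line follows the namespace:table format shown above. The following is a minimal sketch of such a check, not part of the import utility. The function name is hypothetical, and skipping blank lines and rejecting lines without a namespace are assumptions of this sketch; how import-hbase.sh itself treats such lines is not covered here.

```python
from typing import List, Tuple


def parse_table_list(text: str) -> List[Tuple[str, str]]:
    """Parse 'namespace:table' lines into (namespace, table) pairs."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        namespace, separator, table = line.partition(":")
        if not separator or not namespace or not table:
            raise ValueError(f"expected <namespace>:<table>, got {line!r}")
        entries.append((namespace, table))
    return entries


if __name__ == "__main__":
    sample = "namespace1:tbl1\nnamespace1:tbl2\nnamespace2:tbl1\n"
    for namespace, table in parse_table_list(sample):
        print(namespace, table)
```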