Sqoop Model
The default Sqoop model includes the following types:
- Entity types:
  - sqoop_process
    - super-types: Process
    - attributes: name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName
  - sqoop_dbdatastore
    - super-types: DataSet
    - attributes: name, dbStoreType, storeUse, storeUri, source, description, ownerName
- Enum types:
  - sqoop_operation_type
    - values: IMPORT, EXPORT, EVAL
  - sqoop_dbstore_usage
    - values: TABLE, QUERY, PROCEDURE, OTHER
The entities are created and de-duplicated using a unique qualified name. The qualified name provides a namespace and can also be used for querying:
- sqoop_process.qualifiedName - dbStoreType-storeUri-endTime
- sqoop_dbdatastore.qualifiedName - dbStoreType-storeUri-source
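As a minimal sketch, the two qualified-name patterns above can be read as a plain hyphen-joined string. The function names and the sample values (store type, JDBC URI, table name, timestamp) are hypothetical and chosen only for illustration; the exact string the hook emits may differ in detail, so treat this as a reading of the pattern rather than the hook's implementation.

```python
def sqoop_process_qualified_name(db_store_type: str, store_uri: str, end_time: int) -> str:
    """Join the parts in the order listed above: dbStoreType-storeUri-endTime."""
    return f"{db_store_type}-{store_uri}-{end_time}"


def sqoop_dbdatastore_qualified_name(db_store_type: str, store_uri: str, source: str) -> str:
    """Join the parts in the order listed above: dbStoreType-storeUri-source."""
    return f"{db_store_type}-{store_uri}-{source}"


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real deployment.
    uri = "jdbc:mysql://db.example.com/sales"
    print(sqoop_process_qualified_name("mysql", uri, 1700000000000))
    print(sqoop_dbdatastore_qualified_name("mysql", uri, "orders"))
```

Because the process name includes endTime, every job run yields a distinct sqoop_process entity, while repeated imports from the same store and source resolve to the same sqoop_dbdatastore entity.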
Sqoop Hook
Sqoop added a SqoopJobDataPublisher that publishes data to Atlas after an import job completes. Currently, SqoopHook supports only hiveImport. The hook uses the published data to add entities to Atlas according to the model described above.
Follow the instructions below to set up the Atlas hook in Sqoop:
- Set up the Atlas hook in <sqoop-conf>/sqoop-site.xml by adding the following property:
<property>
<name>sqoop.job.data.publish.class</name>
<value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
- Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/
- Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory
Refer to Configuration for notification-related settings.
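The copy and link steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name install_sqoop_hook is hypothetical, the four directories are passed in explicitly because their real locations depend on the installation, and the jars are linked with symbolic links (the instructions only say "link"). It does not edit sqoop-site.xml.

```python
import shutil
from pathlib import Path


def install_sqoop_hook(atlas_conf, atlas_hook_dir, sqoop_conf, sqoop_lib):
    """Copy the Atlas client properties and symlink the hook jars for Sqoop.

    All four arguments are directories supplied by the caller, for example
    <atlas-conf>, <atlas-home>/hook/sqoop, <sqoop-conf> and the Sqoop lib
    directory. Returns the list of symlinks that were created.
    """
    atlas_conf, atlas_hook_dir = Path(atlas_conf), Path(atlas_hook_dir)
    sqoop_conf, sqoop_lib = Path(sqoop_conf), Path(sqoop_lib)

    # Copy atlas-application.properties into the Sqoop conf directory.
    props = "atlas-application.properties"
    shutil.copy2(atlas_conf / props, sqoop_conf / props)

    # Link every hook jar into the Sqoop lib directory.
    links = []
    for jar in sorted(atlas_hook_dir.glob("*.jar")):
        link = sqoop_lib / jar.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link left by an earlier run
        link.symlink_to(jar.resolve())
        links.append(link)
    return links
```

Symlinking rather than copying means the links follow the Atlas installation as long as the jar file names stay the same; if an upgrade renames the jars, rerun the function and remove any stale links.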
NOTES
- Currently, the Sqoop hook captures only the following Sqoop operation: hiveImport