Sqoop is a tool for efficient bi-directional bulk data transfer between HDFS and relational databases (RDBMS).
Features:
- Internally uses JDBC for importing and exporting the data.
- Offers a direct mode that uses database-native bulk copy utilities for use cases that require fast data transfers.
- Supports various file formats: text, SequenceFile, and Avro.
- Supports imports into Hive and HBase.
- Provides a metastore for saving jobs.
- Supports incremental imports (RDBMS to HDFS).
- Is easily extensible.
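To make the features above concrete, here is a minimal sketch that assembles the argument list for a `sqoop import` call. The helper `build_import_command`, the host `db.example.com`, and the table and directory names are hypothetical and chosen only for illustration. The flags themselves (`--connect`, `--table`, `--target-dir`, `--direct`, `--as-textfile`, `--as-sequencefile`, `--as-avrodatafile`, `--incremental`, `--check-column`, `--last-value`) are standard Sqoop 1 import options. Password handling is left out on purpose.

```python
from typing import List, Optional

# Maps a short format name to the Sqoop flag that selects that file format.
FORMAT_FLAGS = {
    "text": "--as-textfile",
    "sequence": "--as-sequencefile",
    "avro": "--as-avrodatafile",
}


def build_import_command(
    connect: str,
    table: str,
    target_dir: str,
    file_format: str = "text",
    direct: bool = False,
    check_column: Optional[str] = None,
    last_value: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) the argv for a `sqoop import` invocation."""
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unsupported file format: {file_format}")

    cmd = [
        "sqoop", "import",
        "--connect", connect,        # JDBC URL of the source database
        "--table", table,            # table to import
        "--target-dir", target_dir,  # HDFS directory for the output
        FORMAT_FLAGS[file_format],
    ]
    if direct:
        # Direct mode hands the transfer to the database's bulk copy utility.
        cmd.append("--direct")
    if check_column is not None:
        # Incremental import: fetch only rows whose check column
        # exceeds the last imported value.
        cmd += ["--incremental", "append", "--check-column", check_column]
        if last_value is not None:
            cmd += ["--last-value", last_value]
    return cmd


# Hypothetical example: incremental import of new rows from an "orders" table.
example = build_import_command(
    "jdbc:mysql://db.example.com/shop",
    "orders",
    "/data/orders",
    check_column="id",
    last_value="100",
)
print(" ".join(example))
```

The resulting list can be passed to `subprocess.run` on a machine where Sqoop is installed. Building it as a list rather than as a single string avoids shell quoting problems.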
Database Systems Supported by Sqoop:
MySQL, being an open source database, has always been the main focus of the Apache community, and the best connector that Sqoop packages is the one for MySQL.
In all, Sqoop supports the following databases:
- MySQL (direct mode support as well)
- Oracle
- SQL Server
- PostgreSQL (direct mode support as well)
- DB2
- HSQLDB
- Any other database, through a generic JDBC connector (with limited functionality).
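Each of these databases is reached through a JDBC connection string passed to `--connect`. The sketch below shows the URL shapes commonly used by each vendor's JDBC driver. The helper names, hosts, and database names are hypothetical, and the ports are the usual vendor defaults, so treat them as assumptions rather than values Sqoop requires. For the generic fallback, Sqoop's `--driver` option names the JDBC driver class explicitly.

```python
from typing import List

# Commonly used JDBC URL shapes; the Oracle entry uses the thin-driver SID form.
JDBC_URL_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "oracle": "jdbc:oracle:thin:@{host}:{port}:{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "db2": "jdbc:db2://{host}:{port}/{database}",
    "hsqldb": "jdbc:hsqldb:hsql://{host}:{port}/{database}",
}

# Usual default ports (assumptions; a given installation may differ).
DEFAULT_PORTS = {
    "mysql": 3306,
    "postgresql": 5432,
    "oracle": 1521,
    "sqlserver": 1433,
    "db2": 50000,
    "hsqldb": 9001,
}

# Databases listed above as having direct mode support.
DIRECT_MODE_DATABASES = {"mysql", "postgresql"}


def connect_args(db: str, host: str, database: str, direct: bool = False) -> List[str]:
    """Return the connection-related Sqoop arguments for a known database."""
    if db not in JDBC_URL_TEMPLATES:
        raise ValueError(f"no built-in template for {db}; use generic_connect_args")
    if direct and db not in DIRECT_MODE_DATABASES:
        raise ValueError(f"direct mode is not listed as supported for {db}")
    url = JDBC_URL_TEMPLATES[db].format(
        host=host, port=DEFAULT_PORTS[db], database=database
    )
    args = ["--connect", url]
    if direct:
        args.append("--direct")
    return args


def generic_connect_args(url: str, driver_class: str) -> List[str]:
    """Generic JDBC fallback: the caller supplies the URL and driver class."""
    return ["--connect", url, "--driver", driver_class]


print(connect_args("postgresql", "db.example.com", "sales", direct=True))
```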
Third-party extensions:
One of Sqoop's main strengths is its extensibility. A number of third-party companies ship database-specific connectors:
| RDBMS | Developed by | Link |
|---|---|---|
| Teradata | Cloudera | View |
| Netezza | Cloudera | View |
| Oracle | Quest | View |
| Microsoft SQL Server | Microsoft | View |
| Microsoft PDW | Microsoft | |
| Couchbase | Couchbase | View |
| VoltDB | VoltDB | Blog |
History:
Sqoop was initially developed and maintained by Cloudera. It entered incubation at Apache on 23 July 2011, and since then the Apache committee has managed the releases. The following versions were released while Sqoop was under incubation:
| Version | Download | Docs | Release Manager |
|---|---|---|---|
| Sqoop-1.4.0-incubating | 1.4.0-incubating | 1.4.0-incubating | Bilung Lee |
| Sqoop-1.4.1-incubating | 1.4.1-incubating | 1.4.1-incubating | Jarek Jarcec Cecho |
In March 2012, Sqoop graduated to a Top-Level Project in Apache. Releases since then:
| Version | Download | Docs | Release Manager |
|---|---|---|---|
| Sqoop-1.4.2 | 1.4.2 | 1.4.2 | Abhijeet Gaikwad (mentored by Jarek Jarcec Cecho) |
Arvind Prabhakar's blog provides an excellent account of Sqoop's graduation and its versions.
Sqoop 2:
A few limitations in Sqoop led to the experimental development of an entirely new Sqoop 2. The disadvantages of the existing design and the proposed new design are covered in the Sqoop 2 design proposal.
The first release in this branch:
| Version | Download | Docs | Release Manager |
|---|---|---|---|
| Sqoop-1.99.1 | 1.99.1 | 1.99.1 | Jarek Jarcec Cecho |
Jarcec proposed the version name 1.99.1 because it is well away from the current stable 1.4 line yet close to 2.0. It is the first release in the 2.0 series, and the version will move to 2.0 once the code is more stable. All developers who voted accepted the proposal.