[hadoop@hadoop000 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
# As the output suggests, use `sqoop help COMMAND` to get detailed help for a specific command.
2.list-databases
# View help for the list-databases command
[hadoop@hadoop000 ~]$ sqoop help list-databases
usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                        Specify JDBC connect string
   --connection-manager <class-name>           Specify connection manager class name
   --connection-param-file <properties-file>   Specify connection parameters file
   --driver <class-name>                       Manually specify JDBC driver class to use
   --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
   --help                                      Print usage instructions
   -P                                          Read password from console
   --password <password>                       Set authentication password
   --password-alias <password-alias>           Credential provider password alias
   --password-file <password-file>             Set authentication password file path
   --relaxed-isolation                         Use read-uncommitted isolation for imports
   --skip-dist-cache                           Skip copying jars to distributed cache
   --username <username>                       Set authentication username
   --verbose                                   Print more information while working

# Basic usage
[hadoop@oradb3 ~]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456

# Result
information_schema
mysql
performance_schema
slow_query_log
sys
test
3.list-tables
# Command help
[hadoop@hadoop000 ~]$ sqoop help list-tables
usage: sqoop list-tables [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                        Specify JDBC connect string
   --connection-manager <class-name>           Specify connection manager class name
   --connection-param-file <properties-file>   Specify connection parameters file
   --driver <class-name>                       Manually specify JDBC driver class to use
   --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
   --help                                      Print usage instructions
   -P                                          Read password from console
   --password <password>                       Set authentication password
   --password-alias <password-alias>           Credential provider password alias
   --password-file <password-file>             Set authentication password file path
   --relaxed-isolation                         Use read-uncommitted isolation for imports
   --skip-dist-cache                           Skip copying jars to distributed cache
   --username <username>                       Set authentication username
   --verbose                                   Print more information while working

# Usage
[hadoop@hadoop000 ~]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456

# Result
t_order
test0001
test_1013
test_dyc
test_tb
4.Importing from MySQL into HDFS (import)
(By default the data lands under the current user's home directory: /user/<username>/<table name>)
A small side note here:
- hadoop fs -ls lists the current user's home directory, i.e. /user/hadoop
- hadoop fs -ls / lists the HDFS root directory
# View the command help
[hadoop@hadoop000 ~]$ sqoop help import

# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students
At this point you may well hit the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
To fix it, download the java-json.jar package and add java-json.jar to the ../sqoop/lib directory.
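For example, assuming the jar was downloaded to the home directory and Sqoop is installed at /home/hadoop/app/sqoop-1.4.6-cdh5.7.0 (the path used elsewhere in this article), copying it into place looks roughly like this; adjust the paths to your environment:

# Assumed paths: java-json.jar in the home directory, Sqoop under /home/hadoop/app
[hadoop@hadoop000 ~]$ cp ~/java-json.jar /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/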
# Run the import again
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students
18/07/04 13:28:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 13:28:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 13:28:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 13:28:35 INFO tool.CodeGenTool: Beginning code generation
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 13:28:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/3024b8df04f623e8c79ed9b5b30ace75/students.jar
18/07/04 13:28:37 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/07/04 13:28:37 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/07/04 13:28:37 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/07/04 13:28:37 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/07/04 13:28:37 INFO mapreduce.ImportJobBase: Beginning import of students
18/07/04 13:28:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 13:28:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/07/04 13:28:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/07/04 13:28:41 INFO db.DBInputFormat: Using read commited transaction isolation
18/07/04 13:28:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `students`
18/07/04 13:28:41 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1001 to: 1003
18/07/04 13:28:41 INFO mapreduce.JobSubmitter: number of splits:3
18/07/04 13:28:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530598609758_0015
18/07/04 13:28:42 INFO impl.YarnClientImpl: Submitted application application_1530598609758_0015
18/07/04 13:28:42 INFO mapreduce.Job: The url to track the job: http://oradb3:8088/proxy/application_1530598609758_0015/
18/07/04 13:28:42 INFO mapreduce.Job: Running job: job_1530598609758_0015
18/07/04 13:28:52 INFO mapreduce.Job: Job job_1530598609758_0015 running in uber mode : false
18/07/04 13:28:52 INFO mapreduce.Job:  map 0% reduce 0%
18/07/04 13:28:58 INFO mapreduce.Job:  map 33% reduce 0%
18/07/04 13:28:59 INFO mapreduce.Job:  map 67% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job:  map 100% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job: Job job_1530598609758_0015 completed successfully
18/07/04 13:29:00 INFO mapreduce.Job: Counters: 30
...
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 21.3156 seconds (1.8766 bytes/sec)
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Retrieved 3 records.
# Make sure you read and understand the log output above
# Check the files on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 4 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 13:28 /user/hadoop/students/_SUCCESS
-rw-r--r--   1 hadoop supergroup         13 2018-07-04 13:28 /user/hadoop/students/part-m-00000
-rw-r--r--   1 hadoop supergroup         13 2018-07-04 13:28 /user/hadoop/students/part-m-00001
-rw-r--r--   1 hadoop supergroup         14 2018-07-04 13:28 /user/hadoop/students/part-m-00002
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
We can also add other parameters to make the import process more controllable:
-m                       Number of map tasks to launch; the default is 4
--delete-target-dir      Delete the target directory if it already exists
--mapreduce-job-name     Name to give the MapReduce job
--target-dir             Import into the specified directory
--fields-terminated-by   Field delimiter to use between columns
--null-string            For string columns, replace NULL values with the given character
--null-non-string        For non-string columns, replace NULL values with the given character
--columns                Import only the listed columns of the table
--where                  Import only the rows matching a condition
--query                  Import the result of a SQL statement; when --query is used, --table and --columns cannot be used
--options-file           Read the command arguments from a file
# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table students \
> -m 1

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 13:53 /user/hadoop/students/_SUCCESS
-rw-r--r--   1 hadoop supergroup         40 2018-07-04 13:53 /user/hadoop/students/part-m-00000
[hadoop@oradb3 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
# Using the --where parameter
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --mapreduce-job-name FromMySQL2HDFS2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<1002'

# Result in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_WHERE/"part*"
lodd
# Using the --query parameter
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS3 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --target-dir STU_COLUMN_QUERY \
> --query "select * from students where id>1001 and \$CONDITIONS"

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_QUERY/"part*"
1002    sdfs    21
1003    sdfsa   24
# Using the --options-file parameter (each option and its value go on their own line)
[hadoop@hadoop000 ~]$ vi sqoop-import-hdfs.txt
import
--connect
jdbc:mysql://localhost:3306/test
--username
root
--password
123456
--table
students
--target-dir
STU_option_file

# Run the import
[hadoop@hadoop000 ~]$ sqoop --options-file /home/hadoop/sqoop-import-hdfs.txt

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_option_file/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
5.eval
The help output describes this command as "Evaluate a SQL statement and display the results", that is, it runs a SQL statement and shows the result.
# View the command help
[hadoop@hadoop000 ~]$ sqoop help eval
usage: sqoop eval [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                        Specify JDBC connect string
   --connection-manager <class-name>           Specify connection manager class name
   --connection-param-file <properties-file>   Specify connection parameters file
   --driver <class-name>                       Manually specify JDBC driver class to use
   --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
   --help                                      Print usage instructions
   -P                                          Read password from console
   --password <password>                       Set authentication password
   --password-alias <password-alias>           Credential provider password alias
   --password-file <password-file>             Set authentication password file path
   --relaxed-isolation                         Use read-uncommitted isolation for imports
   --skip-dist-cache                           Skip copying jars to distributed cache
   --username <username>                       Set authentication username
   --verbose                                   Print more information while working

SQL evaluation arguments:
   -e,--query <statement>                      Execute 'statement' in SQL and exit

# Run it
[hadoop@hadoop000 ~]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --query "select * from students"
18/07/04 14:28:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:28:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:28:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| id          | name       | age                   |
----------------------------------------------------
| 1001        | lodd       | 23                    |
| 1002        | sdfs       | 21                    |
| 1003        | sdfsa      | 24                    |
----------------------------------------------------
6.export (exporting HDFS data to MySQL, or moving data from Hive into MySQL)
Common parameters:
--table                        Name of the table to export into
--input-fields-terminated-by   Field delimiter of the files on HDFS; the default is a comma
--export-dir                   HDFS directory containing the data to export
--columns                      Columns to export
The target table must be created in MySQL before running the export (otherwise the export fails):
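For example, a minimal table matching the three fields of the HDFS data could be created like this; the column types here are assumptions based on the sample data, so adjust them to your needs:

mysql> use test;
mysql> create table students_demo (
    ->   id   int,
    ->   name varchar(50),
    ->   age  int
    -> );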
# The original file on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/part-m-00000
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24

# Export to MySQL
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> --export-dir /user/hadoop/students/
18/07/04 14:46:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:46:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:46:20 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 14:46:20 INFO tool.CodeGenTool: Beginning code generation
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 14:46:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fc7b53dd6eef701c0731c7a7c4a4b340/students_demo.jar
18/07/04 14:46:24 INFO mapreduce.ExportJobBase: Beginning export of students_demo
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
...
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Transferred 672 bytes in 29.3122 seconds (22.9256 bytes/sec)
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Exported 3 records.

# Check in MySQL
mysql> select * from students_demo;
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
| 1001 | lodd  | 23   |
| 1002 | sdfs  | 21   |
| 1003 | sdfsa | 24   |
+------+-------+------+
3 rows in set (0.00 sec)
If you run the export again, the rows are appended to the table.
# Adding the --columns parameter
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo2 \
> --export-dir /user/hadoop/students/ \
> --columns id,name

# Result in MySQL
mysql> select * from students_demo2;
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
| 1001 | lodd  | NULL |
| 1002 | sdfs  | NULL |
| 1003 | sdfsa | NULL |
+------+-------+------+
3 rows in set (0.00 sec)
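The --input-fields-terminated-by parameter from the list above works the same way. For instance, the tab-separated files produced by the earlier --query import could in principle be exported back into students_demo like this; this is just a sketch that was not run for this article, and it would append rows to the existing table:

[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> --export-dir /user/hadoop/STU_COLUMN_QUERY \
> --input-fields-terminated-by '\t'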
7.Importing data from MySQL into Hive
Common parameters:
--create-hive-table      Create the target table; fails if the table already exists
--hive-database          Hive database to use
--hive-import            Import into Hive (without this option the data only goes to HDFS)
--hive-overwrite         Overwrite existing data in the table
--hive-table             Name of the Hive table; if omitted, the source table name is used
--hive-partition-key     Partition column of the Hive table
--hive-partition-value   Partition value to import into
The first import may fail with the following error:
18/07/04 15:06:26 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/07/04 15:06:26 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Fix: copy a few jars from Hive's lib directory into Sqoop's lib directory and the problem goes away.
# Fix for the error
[hadoop@hadoop000 lib]$ pwd
/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib
[hadoop@hadoop000 lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop000 lib]$ cp hive-shims* /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
# Run the import after fixing the error
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HIVE \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0

# Check in Hive
hive> show tables;
OK
stu_import
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select * from stu_import;
OK
1001    lodd    23
1002    sdfs    21
1003    sdfsa   24
Time taken: 0.969 seconds, Fetched: 3 row(s)
Recommendation: when importing into Hive, avoid the --create-hive-table parameter and create the Hive table in advance instead, because the column types of an auto-created table may not be what you want.
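For instance, a pre-created table matching the import above might look roughly like this; the table name stu_import_pre and the column types are only illustrative assumptions:

hive> create table hive.stu_import_pre (
    >   id   int,
    >   name string,
    >   age  int
    > )
    > row format delimited fields terminated by '\t';

You would then run the same sqoop import as above, pointing --hive-table at stu_import_pre and dropping --create-hive-table.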
# Adding the partition parameters
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import2 \
> --mapreduce-job-name FromMySQL2HIVE2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"

# Check in Hive
hive> select * from stu_import2;
OK
1001    lodd    23    2018-08-08
1002    sdfs    21    2018-08-08
1003    sdfsa   24    2018-08-08
Time taken: 0.192 seconds, Fetched: 3 row(s)
8.Using sqoop job
sqoop job turns a Sqoop command into a saved job; the command is not executed when the job is created. You can inspect a job, run it at any time, and delete it, which makes task scheduling much easier.
--create <job-id>   Create a new job
--delete <job-id>   Delete a job
--exec <job-id>     Execute a job
--show <job-id>     Show a job's parameters
--list              List all saved jobs
# Create a job
[hadoop@hadoop000 ~]$ sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> -m 1 \
> --delete-target-dir

# List jobs
[hadoop@hadoop000 ~]$ sqoop job --list
Available jobs:
  person_job1

# Execute the job; you will be prompted for the MySQL root password
[hadoop@hadoop000 ~]$ sqoop job --exec person_job1

# Check in HDFS
[hadoop@hadoop000 lib]$ hadoop fs -ls /user/hadoop/students_demo
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 15:34 /user/hadoop/students_demo/_SUCCESS
-rw-r--r--   1 hadoop supergroup         40 2018-07-04 15:34 /user/hadoop/students_demo/part-m-00000
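The remaining options from the list above work the same way; for example, inspecting and removing the saved job looks like this (output omitted here):

# Show the saved parameters of a job
[hadoop@hadoop000 ~]$ sqoop job --show person_job1

# Delete a job that is no longer needed
[hadoop@hadoop000 ~]$ sqoop job --delete person_job1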
Notice that executing person_job1 requires typing the database password. How can we avoid entering it every time?
Configuring sqoop-site.xml solves this.
# Uncomment the sqoop.metastore.client.record.password property (or add it if it is missing)
[hadoop@hadoop000 conf]$ pwd
/home/hadoop/app/sqoop-1.4.6-cdh5.7.0/conf
[hadoop@hadoop000 conf]$ vi sqoop-site.xml
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.
  </description>
</property>
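Alternatively, the --password-file option listed in the common arguments earlier avoids both the prompt and a plain-text password on the command line. A rough sketch, where the file name and location are arbitrary choices for illustration:

# Store the password in a file readable only by the owner (echo -n avoids a trailing newline)
[hadoop@hadoop000 ~]$ echo -n "123456" > /home/hadoop/.mysql.password
[hadoop@hadoop000 ~]$ hadoop fs -put /home/hadoop/.mysql.password /user/hadoop/
[hadoop@hadoop000 ~]$ hadoop fs -chmod 400 /user/hadoop/.mysql.password

# Reference the file instead of passing --password
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password-file /user/hadoop/.mysql.password \
> --table students \
> -m 1 \
> --delete-target-dir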