An Introduction to Basic Sqoop Syntax
Published: 2019-06-19


Introduction:

This article covers Sqoop's basic syntax and simple usage.

1. Viewing command help

[hadoop@hadoop000 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
# As the output suggests, run sqoop help COMMAND to get detailed help for a specific command

2. list-databases

# View help for the list-databases command
[hadoop@hadoop000 ~]$ sqoop help list-databases
usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                          Specify JDBC connect string
   --connection-manager <class-name>             Specify connection manager class name
   --connection-param-file <properties-file>     Specify connection parameters file
   --driver <class-name>                         Manually specify JDBC driver class to use
   --hadoop-home <hdir>                          Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                    Override $HADOOP_MAPRED_HOME_ARG
   --help                                        Print usage instructions
   -P                                            Read password from console
   --password <password>                         Set authentication password
   --password-alias <password-alias>             Credential provider password alias
   --password-file <password-file>               Set authentication password file path
   --relaxed-isolation                           Use read-uncommitted isolation for imports
   --skip-dist-cache                             Skip copying jars to distributed cache
   --username <username>                         Set authentication username
   --verbose                                     Print more information while working

# Simple usage
[hadoop@oradb3 ~]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456

# Results
information_schema
mysql
performance_schema
slow_query_log
sys
test

3. list-tables

# Command help
[hadoop@hadoop000 ~]$ sqoop help list-tables
usage: sqoop list-tables [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                          Specify JDBC connect string
   --connection-manager <class-name>             Specify connection manager class name
   --connection-param-file <properties-file>     Specify connection parameters file
   --driver <class-name>                         Manually specify JDBC driver class to use
   --hadoop-home <hdir>                          Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                    Override $HADOOP_MAPRED_HOME_ARG
   --help                                        Print usage instructions
   -P                                            Read password from console
   --password <password>                         Set authentication password
   --password-alias <password-alias>             Credential provider password alias
   --password-file <password-file>               Set authentication password file path
   --relaxed-isolation                           Use read-uncommitted isolation for imports
   --skip-dist-cache                             Skip copying jars to distributed cache
   --username <username>                         Set authentication username
   --verbose                                     Print more information while working

# Usage
[hadoop@hadoop000 ~]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456

# Results
t_order
test0001
test_1013
test_dyc
test_tb

4. Importing MySQL data into HDFS (import)

(By default the data is imported under the current user's home directory: /user/<username>/<table name>)

A small side note here:

  • hadoop fs -ls lists the current user's home directory, i.e. /user/hadoop
  • hadoop fs -ls / lists the HDFS root directory (see the quick check below)
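
A quick check of the two commands (the actual listing depends on what already exists in your cluster):

[hadoop@hadoop000 ~]$ hadoop fs -ls        # equivalent to: hadoop fs -ls /user/hadoop
[hadoop@hadoop000 ~]$ hadoop fs -ls /      # lists the HDFS root, e.g. /tmp, /user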
# View command help
[hadoop@hadoop000 ~]$ sqoop help import

# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students

At this point you may well hit the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
Here we need the java-json.jar package; adding java-json.jar to the ../sqoop/lib directory fixes the problem.
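
For example, assuming the jar has already been downloaded to the current user's home directory and Sqoop lives under /home/hadoop/app/sqoop-1.4.6-cdh5.7.0 (as elsewhere in this article), the copy would look roughly like this:

# Copy the missing dependency into Sqoop's lib directory (the source path is an assumption)
[hadoop@hadoop000 ~]$ cp ~/java-json.jar /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/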

# Run the import again
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students
18/07/04 13:28:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 13:28:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 13:28:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 13:28:35 INFO tool.CodeGenTool: Beginning code generation
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 13:28:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/3024b8df04f623e8c79ed9b5b30ace75/students.jar
18/07/04 13:28:37 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/07/04 13:28:37 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/07/04 13:28:37 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/07/04 13:28:37 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/07/04 13:28:37 INFO mapreduce.ImportJobBase: Beginning import of students
18/07/04 13:28:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 13:28:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/07/04 13:28:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/07/04 13:28:41 INFO db.DBInputFormat: Using read commited transaction isolation
18/07/04 13:28:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `students`
18/07/04 13:28:41 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1001 to: 1003
18/07/04 13:28:41 INFO mapreduce.JobSubmitter: number of splits:3
18/07/04 13:28:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530598609758_0015
18/07/04 13:28:42 INFO impl.YarnClientImpl: Submitted application application_1530598609758_0015
18/07/04 13:28:42 INFO mapreduce.Job: The url to track the job: http://oradb3:8088/proxy/application_1530598609758_0015/
18/07/04 13:28:42 INFO mapreduce.Job: Running job: job_1530598609758_0015
18/07/04 13:28:52 INFO mapreduce.Job: Job job_1530598609758_0015 running in uber mode : false
18/07/04 13:28:52 INFO mapreduce.Job:  map 0% reduce 0%
18/07/04 13:28:58 INFO mapreduce.Job:  map 33% reduce 0%
18/07/04 13:28:59 INFO mapreduce.Job:  map 67% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job:  map 100% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job: Job job_1530598609758_0015 completed successfully
18/07/04 13:29:00 INFO mapreduce.Job: Counters: 30
...
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 21.3156 seconds (1.8766 bytes/sec)
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Retrieved 3 records.
# Take the time to read and understand this log output

# View the files on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 4 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 13:28 /user/hadoop/students/_SUCCESS
-rw-r--r--   1 hadoop supergroup         13 2018-07-04 13:28 /user/hadoop/students/part-m-00000
-rw-r--r--   1 hadoop supergroup         13 2018-07-04 13:28 /user/hadoop/students/part-m-00001
-rw-r--r--   1 hadoop supergroup         14 2018-07-04 13:28 /user/hadoop/students/part-m-00002
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24

We can also add some other parameters to make the import process more controllable:

-m  number of map tasks to launch; the default is 4

--delete-target-dir  delete the target directory if it already exists
--mapreduce-job-name  name to give the MapReduce job
--target-dir  import into the specified directory
--fields-terminated-by  field delimiter to use between columns
--null-string  for string columns, replace NULL values with the specified string
--null-non-string  for non-string columns, replace NULL values with the specified string
--columns  import only the listed columns of the table
--where  import only rows matching a condition
--query  import using a SQL statement; when --query is used, --table and --columns cannot be used
--options-file  read the options from a file

# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table students \
> -m 1

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 13:53 /user/hadoop/students/_SUCCESS
-rw-r--r--   1 hadoop supergroup         40 2018-07-04 13:53 /user/hadoop/students/part-m-00000
[hadoop@oradb3 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24

# Using the --where parameter
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --mapreduce-job-name FromMySQL2HDFS2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<1002'

# Result in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_WHERE/"part*"
lodd

# Using the --query parameter
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS3 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --target-dir STU_COLUMN_QUERY \
> --query "select * from students where id>1001 and \$CONDITIONS"

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_QUERY/"part*"
1002    sdfs    21
1003    sdfsa   24

# Using the --options-file parameter
[hadoop@hadoop000 ~]$ vi sqoop-import-hdfs.txt
import
--connect
jdbc:mysql://localhost:3306/test
--username
root
--password
123456
--table
students
--target-dir
STU_option_file

# Run the import
[hadoop@hadoop000 ~]$ sqoop --options-file /home/hadoop/sqoop-import-hdfs.txt

# Check in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_option_file/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24

5. eval

The help output describes this command as "Evaluate a SQL statement and display the results", i.e. it runs a SQL statement and shows the result set.

# View command help
[hadoop@hadoop000 ~]$ sqoop help eval
usage: sqoop eval [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                          Specify JDBC connect string
   --connection-manager <class-name>             Specify connection manager class name
   --connection-param-file <properties-file>     Specify connection parameters file
   --driver <class-name>                         Manually specify JDBC driver class to use
   --hadoop-home <hdir>                          Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                    Override $HADOOP_MAPRED_HOME_ARG
   --help                                        Print usage instructions
   -P                                            Read password from console
   --password <password>                         Set authentication password
   --password-alias <password-alias>             Credential provider password alias
   --password-file <password-file>               Set authentication password file path
   --relaxed-isolation                           Use read-uncommitted isolation for imports
   --skip-dist-cache                             Skip copying jars to distributed cache
   --username <username>                         Set authentication username
   --verbose                                     Print more information while working

SQL evaluation arguments:
-e,--query <statement>                           Execute 'statement' in SQL and exit

# Run it
[hadoop@hadoop000 ~]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --query "select * from students"
18/07/04 14:28:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:28:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:28:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| id | name | age |
----------------------------------------------------
| 1001 | lodd | 23 |
| 1002 | sdfs | 21 |
| 1003 | sdfsa | 24 |
----------------------------------------------------

6. export (exporting HDFS data, or data previously imported into Hive, back to MySQL)

Common parameters:

--table  name of the target table to export into

--input-fields-terminated-by  field delimiter of the files on HDFS; the default is a comma
--export-dir  HDFS directory containing the data to export
--columns  columns to export

Before running the export statement, the table must already exist in MySQL (otherwise the export fails):
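
For example, a minimal sketch of the target table, with column types guessed from the sample data (adjust to your real schema):

-- Hypothetical DDL for the export target; types and lengths are assumptions
mysql> CREATE TABLE students_demo (
    ->   id   INT,
    ->   name VARCHAR(50),
    ->   age  INT
    -> );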

# The source file on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/part-m-00000
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24

# Export to MySQL
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> --export-dir /user/hadoop/students/
18/07/04 14:46:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:46:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:46:20 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 14:46:20 INFO tool.CodeGenTool: Beginning code generation
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 14:46:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fc7b53dd6eef701c0731c7a7c4a4b340/students_demo.jar
18/07/04 14:46:24 INFO mapreduce.ExportJobBase: Beginning export of students_demo
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
...
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Transferred 672 bytes in 29.3122 seconds (22.9256 bytes/sec)
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Exported 3 records.

# Check in MySQL
mysql> select * from students_demo;
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
| 1001 | lodd  |   23 |
| 1002 | sdfs  |   21 |
| 1003 | sdfsa |   24 |
+------+-------+------+
3 rows in set (0.00 sec)

If the export is run again, the rows are simply appended to the table, so you end up with duplicates.
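
If you would rather have re-runs update existing rows than pile up duplicates, Sqoop export also supports --update-key and --update-mode; a sketch, assuming id is a primary or unique key on the MySQL table:

# Sketch: update rows matched on id, and insert rows that do not exist yet
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> --export-dir /user/hadoop/students/ \
> --update-key id \
> --update-mode allowinsert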

# Adding the --columns parameter
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo2 \
> --export-dir /user/hadoop/students/ \
> --columns id,name

# Result in MySQL
mysql> select * from students_demo2;
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
| 1001 | lodd  | NULL |
| 1002 | sdfs  | NULL |
| 1003 | sdfsa | NULL |
+------+-------+------+
3 rows in set (0.00 sec)
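
The --input-fields-terminated-by parameter from the list above is needed when the HDFS files are not comma-delimited. For instance, exporting the tab-delimited STU_COLUMN_QUERY output from section 4 would look roughly like this (students_demo3 is a hypothetical, pre-created target table):

# Sketch: exporting tab-delimited HDFS data; students_demo3 is a hypothetical table
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo3 \
> --export-dir STU_COLUMN_QUERY \
> --input-fields-terminated-by '\t'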

7. Importing data from MySQL into Hive

Common parameters:

--create-hive-table  create the target table; fails if the table already exists

--hive-database  Hive database to import into
--hive-import  import into Hive (without this option the data only goes to HDFS)
--hive-overwrite  overwrite existing data in the Hive table
--hive-table  name of the Hive table; defaults to the name of the imported table
--hive-partition-key  Hive partition column
--hive-partition-value  partition value to import into

The first import may fail with the following error:

18/07/04 15:06:26 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/07/04 15:06:26 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Fix: copy a few jars from Hive's lib directory into Sqoop's lib directory and the problem goes away.

# Fix for the error
[hadoop@hadoop000 lib]$ pwd
/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib
[hadoop@hadoop000 lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop000 lib]$ cp hive-shims* /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
# Run the import again after fixing the error
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HIVE \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0

# Check in Hive
hive> show tables;
OK
stu_import
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select * from stu_import;
OK
1001    lodd    23
1002    sdfs    21
1003    sdfsa   24
Time taken: 0.969 seconds, Fetched: 3 row(s)

Tip: when importing into Hive, avoid the --create-hive-table parameter and create the Hive table beforehand, because the column types of an auto-created table may not be what you want.
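
For example, the table for the import above could be pre-created like this (a minimal sketch; the column types and the tab delimiter are assumptions matching the sample data and the --fields-terminated-by '\t' setting used above):

hive> CREATE TABLE stu_import (id INT, name STRING, age INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';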

# Adding the partition parameters
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import2 \
> --mapreduce-job-name FromMySQL2HIVE2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"

# Check in Hive
hive> select * from stu_import2;
OK
1001    lodd    23      2018-08-08
1002    sdfs    21      2018-08-08
1003    sdfsa   24      2018-08-08
Time taken: 0.192 seconds, Fetched: 3 row(s)

8. Using sqoop job

sqoop job saves a Sqoop command as a job; the command is not executed when the job is created. You can inspect the job, run it at any time, and delete it, which makes it easy to schedule tasks.

--create <job-id>  create a new job

--delete <job-id>  delete a job
--exec <job-id>  execute a job
--show <job-id>  show a job's parameters
--list  list all jobs

# Create a job
[hadoop@hadoop000 ~]$ sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> -m 1 \
> --delete-target-dir

# List jobs
[hadoop@hadoop000 ~]$ sqoop job --list
Available jobs:
  person_job1

# Execute the job (you will be prompted for the MySQL root password)
[hadoop@hadoop000 ~]$ sqoop job --exec person_job1

# Check in HDFS
[hadoop@hadoop000 lib]$ hadoop fs -ls /user/hadoop/students_demo
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-07-04 15:34 /user/hadoop/students_demo/_SUCCESS
-rw-r--r--   1 hadoop supergroup         40 2018-07-04 15:34 /user/hadoop/students_demo/part-m-00000
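
The --show and --delete options listed above work the same way; for example (output omitted):

# Show the saved parameters of a job, then delete it
[hadoop@hadoop000 ~]$ sqoop job --show person_job1
[hadoop@hadoop000 ~]$ sqoop job --delete person_job1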

Notice that executing person_job1 requires typing in the database password. How can we avoid entering the password every time?

Configuring sqoop-site.xml solves this.

# Uncomment the sqoop.metastore.client.record.password property (or add it if it is missing)
[hadoop@hadoop000 conf]$ pwd
/home/hadoop/app/sqoop-1.4.6-cdh5.7.0/conf
[hadoop@hadoop000 conf]$ vi sqoop-site.xml

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.</description>
</property>


Reposted from: https://blog.51cto.com/10814168/2136156
