This article walks through the steps for installing and configuring Snappy compression in Hadoop 2.x.
[1] Test environment
- CentOS 6.3 (64-bit)
- Hadoop 2.6.0
- JDK 1.7.0_75
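[2] Building and installing Snappy
2.1 Download the source
Download the source from the project site at http://code.google.com/p/snappy/ or from https://github.com/google/snappy. The current version at the time of writing is 1.1.1.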
2.2 Build and install
Unpack the archive with tar -zxvf snappy-1.1.1.tar.gz, then run the standard three-step build and install as root:

    ./configure
    make
    make install

By default the libraries are installed into /usr/local/lib. Listing that directory shows:

    [hadoop@micmiu ~]$ ls -lh /usr/local/lib |grep snappy
    -rw-r--r-- 1 root root 229K Mar 10 11:28 libsnappy.a
    -rwxr-xr-x 1 root root  953 Mar 10 11:28 libsnappy.la
    lrwxrwxrwx 1 root root   18 Mar 10 11:28 libsnappy.so -> libsnappy.so.1.2.0
    lrwxrwxrwx 1 root root   18 Mar 10 11:28 libsnappy.so.1 -> libsnappy.so.1.2.0
    -rwxr-xr-x 1 root root 145K Mar 10 11:28 libsnappy.so.1.2.0

If the installation finished without errors and the shared libraries above are present, Snappy has most likely been built and installed correctly.
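The manual check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name check_snappy_lib is made up for illustration, and it only tests that a file matching libsnappy.so* exists in the given directory (defaulting to /usr/local/lib), not that the library is usable.

```shell
# Hypothetical helper: succeed if the directory contains a libsnappy.so* file.
# $1: library directory (defaults to /usr/local/lib, the default install location)
check_snappy_lib() {
    lib_dir="${1:-/usr/local/lib}"
    # Matches libsnappy.so, libsnappy.so.1, libsnappy.so.1.2.0, ...
    for f in "$lib_dir"/libsnappy.so*; do
        # -e follows symlinks, so a dangling link does not count as found
        if [ -e "$f" ]; then
            echo "found: $f"
            return 0
        fi
    done
    echo "no libsnappy.so* in $lib_dir" >&2
    return 1
}
```

Calling check_snappy_lib with no argument checks /usr/local/lib; pass another directory if Snappy was installed elsewhere.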
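[3] Installing and configuring Hadoop Snappy
3.1 Rebuild the Hadoop native library with Snappy support
For building the Hadoop native library, see the earlier posts "Hadoop2.2.0源码编译" (building Hadoop 2.2.0 from source) and "Hadoop2.x在Ubuntu系统中编译源码" (building Hadoop 2.x from source on Ubuntu). The only change is to add -Drequire.snappy to the final build command:

    mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy

Then replace the existing Hadoop native libraries with the rebuilt ones.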
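The replacement step is not spelled out in the text, so here is one possible sketch that keeps a backup of the old libraries first. The function name replace_native_libs is hypothetical, and the source directory shown in the usage note (hadoop-dist/target/hadoop-2.6.0/lib/native inside the Hadoop source tree) is an assumption about where the build places its output; verify it against your own build.

```shell
# Hypothetical helper: replace native libraries with rebuilt ones, keeping a backup.
# $1: directory holding the rebuilt libraries (assumed build output location)
# $2: native library directory of the installed Hadoop
replace_native_libs() {
    src="$1"
    dst="$2"
    [ -d "$src" ] || { echo "missing source dir: $src" >&2; return 1; }
    [ -d "$dst" ] || { echo "missing target dir: $dst" >&2; return 1; }
    # Timestamped copy of the current libraries, next to the target directory
    backup="$dst.bak.$(date +%Y%m%d%H%M%S)"
    cp -a "$dst" "$backup" || return 1
    # -a preserves symlinks such as libhadoop.so -> libhadoop.so.1.0.0
    cp -a "$src"/. "$dst"/ || return 1
    echo "backup: $backup"
}
```

A call might look like replace_native_libs hadoop-dist/target/hadoop-2.6.0/lib/native "$HADOOP_HOME/lib/native", run on each node with Hadoop stopped.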
3.2 Download hadoop-snappy
The project site is http://code.google.com/p/hadoop-snappy/. It does not currently offer a packaged release, so the source has to be checked out with svn:

    svn checkout http://hadoop-snappy.googlecode.com/svn/trunk/ hadoop-snappy

3.3 Build hadoop-snappy

    mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]

Note: if Snappy was installed to the default location above, i.e. /usr/local/lib, the [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR] option can be omitted, or given explicitly as -Dsnappy.prefix=/usr/local/lib.
After a successful build, copy hadoop-snappy-0.0.1-SNAPSHOT.jar from the target directory to $HADOOP_HOME/lib, and copy the generated native libraries to $HADOOP_HOME/lib/native/ (here HADOOP_SNAPPY_CODE_HOME stands for the hadoop-snappy checkout directory):

    cp -r $HADOOP_SNAPPY_CODE_HOME/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64 $HADOOP_HOME/lib/native/

3.4 Common build errors
① Missing third-party dependencies
The official documentation lists these build prerequisites: gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, Maven 3.
② Error message:

    [exec] libtool: link: gcc -shared src/org/apache/hadoop/io/compress/snappy/.libs/SnappyCompressor.o src/org/apache/hadoop/io/compress/snappy/.libs/SnappyDecompressor.o -L/usr/local/lib -ljvm -ldl -m64 -Wl,-soname -Wl,libhadoopsnappy.so.0 -o .libs/libhadoopsnappy.so.0.0.1
    [exec] /usr/bin/ld: cannot find -ljvm
    [exec] collect2: ld returned 1 exit status
    [exec] make: *** [libhadoopsnappy.la] Error 1

or

    [exec] /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -fPIC -O2 -m64 -g -O2 -version-info 0:1:0 -L/usr/local/lib -o libhadoopsna/usr/bin/ld: cannot find -ljvm
    [exec] collect2: ld returned 1 exit status
    [exec] make: *** [libhadoopsnappy.la] Error 1
    [exec] ppy.la -rpath /usr/local/lib src/org/apache/hadoop/io/compress/snappy/SnappyCompressor.lo src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.lo -ljvm -ldl
    [exec] libtool: link: gcc -shared src/org/apache/hadoop/io/compress/snappy/.libs/SnappyCompressor.o src/org/apache/hadoop/io/compress/snappy/.libs/SnappyDecompressor.o -L/usr/local/lib -ljvm -ldl -m64 -Wl,-soname -Wl,libhadoopsnappy.so.0 -o .libs/libhadoopsnappy.so.0.0.1
    [ant] Exiting /home/hadoop/codes/hadoop-snappy/maven/build-compilenative.xml.

This error occurs because the JVM's libjvm.so has not been linked into /usr/local/lib. On an amd64 system, the following command fixes it:

    ln -s /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so /usr/local/lib/

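The libjvm.so path in the command above is specific to this JDK install. As a rough sketch for other layouts, the helper below searches a JDK directory for the file. The function name find_libjvm is hypothetical, and it simply returns the first match that find reports, which may not be the variant you want if a JDK ships more than one libjvm.so.

```shell
# Hypothetical helper: print the first libjvm.so found under a JDK directory.
# $1: JDK directory (defaults to $JAVA_HOME when set)
find_libjvm() {
    java_home="${1:-${JAVA_HOME:-}}"
    [ -d "$java_home" ] || { echo "not a directory: $java_home" >&2; return 1; }
    path=$(find "$java_home" -name libjvm.so 2>/dev/null | head -n 1)
    [ -n "$path" ] || { echo "libjvm.so not found under $java_home" >&2; return 1; }
    echo "$path"
}
```

With this in place, the link could be created as ln -s "$(find_libjvm /usr/java/jdk1.7.0_75)" /usr/local/lib/ (as root).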
[4] Hadoop configuration changes
4.1 Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/

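One detail worth noting about the export line: if LD_LIBRARY_PATH is empty beforehand, the result starts with a colon, and the dynamic linker treats an empty entry as the current directory. The sketch below shows one way to avoid that. The function name append_path is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: append a directory to a colon-separated path list
# without creating an empty entry or a duplicate.
# $1: current value (may be empty), $2: directory to append
append_path() {
    current="$1"
    dir="${2%/}"   # drop one trailing slash so comparisons are consistent
    case ":$current:" in
        *":$dir:"*) echo "$current" ;;              # already present, unchanged
        *) echo "${current:+$current:}$dir" ;;      # add ":" only if non-empty
    esac
}
```

In hadoop-env.sh this could be used as export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$HADOOP_HOME/lib/native/Linux-amd64-64")", with the function defined earlier in the same file.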
4.2 Edit $HADOOP_HOME/etc/hadoop/core-site.xml:

    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,
          org.apache.hadoop.io.compress.BZip2Codec,
          org.apache.hadoop.io.compress.SnappyCodec
        </value>
    </property>

4.3 Edit the compression properties in $HADOOP_HOME/etc/hadoop/mapred-site.xml to test Snappy:

    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

[5] Testing and verification
Once everything is configured (every node in the cluster needs the native libraries copied and the configuration changed), restart the Hadoop cluster and run the bundled wordcount example. If the MapReduce job completes without errors, Snappy compression is installed and configured correctly.
Hadoop also provides its own check for native libraries, hadoop checknative:

    [hadoop@micmiu ~]$ hadoop checknative
    15/03/17 22:57:59 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
    15/03/17 22:57:59 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    Native library checking:
    hadoop: true /usr/local/share/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0
    zlib: true /lib64/libz.so.1
    snappy: true /usr/local/share/hadoop/lib/native/Linux-amd64-64/libsnappy.so.1
    lz4: true revision:99
    bzip2: true /lib64/libbz2.so.1
    openssl: true /usr/lib64/libcrypto.so

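When checking many nodes, reading this output by eye gets tedious. A small sketch of an automated check follows. The function name snappy_is_native is hypothetical, and it assumes the output keeps the "snappy: true ..." line format shown above.

```shell
# Hypothetical helper: read `hadoop checknative` output on stdin and
# succeed only if the snappy line reports "true".
snappy_is_native() {
    grep -Eq '^snappy:[[:space:]]+true([[:space:]]|$)'
}
```

For example: hadoop checknative 2>&1 | snappy_is_native && echo "snappy native library loaded".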
—————– EOF @Michael Sun —————–
Original article. When reposting, please credit: micmiu – 软件开发+生活点滴 [ http://www.micmiu.com/ ]
Permalink: http://www.micmiu.com/bigdata/hadoop/hadoop-snappy-install-config/
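Thanks for the effort, but I wonder whether you have looked at the source of hadoop-snappy and Hadoop 2.x. Hadoop 2.x already integrates Snappy compression in its own source, so building and installing hadoop-snappy is entirely redundant. Installing the Snappy native library and rebuilding the Hadoop native library is enough:
mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/usr/local/snappy-1.1.3/lib -Dbundle.snappy
I'll try this next time.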