Contents
- Introduction
- Experiment environment
- Experiment procedure
- Demo
[1] Introduction
Back in the early Hadoop 2.x days I wrote "hadoop 2.2.0 集群模式安装配置和测试", which covered the most basic steps of a distributed installation plus a running demo, but did not experiment with HA. This article walks through a Hadoop 2 distributed deployment in detail, with HA configured for both the NameNode and the ResourceManager.
[2] Experiment environment
1. Nodes and role assignment
The experiment uses a five-node cluster, with roles assigned as follows:
hostname   | NameNode    | DataNode | JournalNode | Zookeeper | ZKFC | ResourceManager |
nn1.hadoop | √ (Active)  |          | √           | √         | √    | √               |
nn2.hadoop | √ (Standby) |          | √           | √         | √    | √               |
dn1.hadoop |             | √        | √           | √         |      |                 |
dn2.hadoop |             | √        |             |           |      |                 |
dn3.hadoop |             | √        |             |           |      |                 |
2. System and software versions
- CentOS 6.3, 64-bit
- Java 1.7.0_75
- Hadoop 2.6.0
- zookeeper 3.4.6
3. Install the JDK (on all nodes)
# list the installed OpenJDK packages
rpm -qa | grep java
# remove OpenJDK
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps tzdata-java-2012c-1.el6.noarch
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
Download the 64-bit JDK from Oracle (jdk-7u75-linux-x64.rpm) and install it:
rpm -ivh jdk-7u75-linux-x64.rpm
The default install path is /usr/java/jdk1.7.0_75.
4. Configure hosts (on all nodes)
172.17.225.61  nn1.hadoop zk1.hadoop
172.17.225.121 nn2.hadoop zk2.hadoop
172.17.225.72  dn1.hadoop zk3.hadoop
172.17.225.76  dn2.hadoop
172.17.225.19  dn3.hadoop
5. Make sure sshd is installed and running (on all nodes)
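A quick way to check this on CentOS 6 (a minimal sketch using the stock service/chkconfig tools; it is not part of the original article's steps):

rpm -qa | grep openssh-server   # confirm the package is installed
service sshd status             # confirm the daemon is running
chkconfig sshd on               # make sure it starts on boot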
6. Configure clock synchronization
Option 1 (on all nodes): have every node sync against a public NTP server:
$ cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
$ ntpdate us.pool.ntp.org
$ crontab -e
# add this entry in the crontab editor:
0-59/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org | logger -t NTP
Option 2: set up an NTP server on one node and have the other nodes sync against it.
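A minimal sketch of option 2, assuming nn1.hadoop is picked as the internal NTP server (the restrict/server lines are generic ntpd settings, not taken from the original article):

# on nn1.hadoop, in /etc/ntp.conf: allow the cluster subnet and fall back to the local clock
restrict 172.17.225.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0        # local clock as a last-resort time source
fudge  127.127.1.0 stratum 10
# then start the service
service ntpd start
chkconfig ntpd on

# on every other node: sync against nn1.hadoop instead of us.pool.ntp.org
/usr/sbin/ntpdate nn1.hadoop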
7. Create a dedicated user (on all nodes)
For example, create a hadoop user with the initial password also set to hadoop; all of the Hadoop deployment and configuration below is done as this user.
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
Edit the hadoop user's environment variables with vi ~/.bash_profile:

export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH="$JAVA_HOME/bin:$PATH"
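A small sanity check (not part of the original steps) to pick up the new variables and confirm the Oracle JDK is now the one on the PATH:

source ~/.bash_profile
java -version    # should report java version "1.7.0_75"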
8. Passwordless SSH login
Configure the NameNode nodes so they can ssh to all other nodes without a password. One-way trust is enough, though configuring it both ways does no harm. For details on passwordless SSH, see: Linux(Centos)配置OpenSSH无密码登陆
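A minimal sketch of the one-way setup as the hadoop user (a DSA key is used here because the fencing configuration later points at ~/.ssh/id_dsa; the exact key type is otherwise an assumption):

# on nn1.hadoop and nn2.hadoop: generate a passphrase-less key
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# push the public key to every node in the cluster (repeat for each host)
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@nn2.hadoop
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@dn1.hadoop
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@dn2.hadoop
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@dn3.hadoop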
[3] Experiment procedure
1. Building Hadoop 2
On any one of the nodes, download the Hadoop 2.6.0 source, install and configure Java and Maven, then build from source with mvn package -Pdist,native -DskipTests -Dtar.
For details, see the source-build references at the end of this article.
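After a successful build the binary distribution ends up under hadoop-dist/target; a sketch of unpacking it into the location assumed by the rest of this article (the symlink from /usr/local/share/hadoop to the versioned directory is an assumption consistent with the HADOOP_HOME used below):

tar -zxf hadoop-dist/target/hadoop-2.6.0.tar.gz -C /usr/local/share/
ln -s /usr/local/share/hadoop-2.6.0 /usr/local/share/hadoop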
2. ZooKeeper installation and configuration
Download the latest stable release (3.4.6) and deploy it on each ZK node, then edit the environment variables with vi ~/.bash_profile:

export ZOOKEEPER_HOME=/usr/local/share/zookeeper
export PATH="$ZOOKEEPER_HOME/bin:$PATH"
Edit the configuration file:

cd $ZOOKEEPER_HOME
cp conf/zoo_sample.cfg conf/zoo.cfg
vi conf/zoo.cfg
Change it to the following:

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/bigdata/hadoop/zookeeper/zkdata
dataLogDir=/bigdata/hadoop/zookeeper/zklogs
server.1=zk1.hadoop:2888:3888
server.2=zk2.hadoop:2888:3888
server.3=zk3.hadoop:2888:3888
The directories referenced in the configuration must be created beforehand and be readable and writable by the hadoop user (a sketch of creating them follows the list below). Each ZK node gets its own myid:
- on zk1.hadoop run: echo 1 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk2.hadoop run: echo 2 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk3.hadoop run: echo 3 > /bigdata/hadoop/zookeeper/zkdata/myid
The value in myid must match the server.N entries in zoo.cfg.
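A minimal sketch of preparing the data directories on each ZK node (run as root; the chown is only needed if the directories were not created by the hadoop user):

mkdir -p /bigdata/hadoop/zookeeper/zkdata /bigdata/hadoop/zookeeper/zklogs
chown -R hadoop:hadoop /bigdata/hadoop/zookeeper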
3. Hadoop installation and configuration (edit on all nodes)
3.1. Edit the environment variables with vi ~/.bash_profile:

export HADOOP_HOME=/usr/local/share/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
3.2. Edit $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>mycluster is the logical name of the HA cluster;
    it must match dfs.nameservices in hdfs-site.xml</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/bigdata/hadoop/temp</value>
    <description>Default base directory where the NameNode, DataNode, JournalNode, etc. store their data.
    Each kind of data can also be given its own directory. This directory tree has to be created beforehand.</description>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
    <description>Address and port of each node in the ZK ensemble.
    Note: the number of entries must be odd and must match zoo.cfg.</description>
  </property>
</configuration>
3.3. Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Replication factor</description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/bigdata/hadoop/dfs/name</value>
  <description>NameNode metadata directory</description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/bigdata/hadoop/dfs/data</value>
  <description>DataNode block data directory</description>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name of the HA nameservice; it can be anything,
  but fs.defaultFS in core-site.xml must reference it</description>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
  <description>Logical names of the NameNodes in this nameservice</description>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.hadoop:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.hadoop:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>nn1.hadoop:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>nn2.hadoop:50070</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.hadoop:53310</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>nn2.hadoop:53310</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.mycluster</name>
  <value>true</value>
  <description>Whether to fail over automatically on failure</description>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://nn1.hadoop:8485;nn2.hadoop:8485;dn1.hadoop:8485/hadoop-journal</value>
  <description>JournalNode configuration, made up of three parts:
  1. the qjournal prefix names the protocol;
  2. the host/ip:port of the three JournalNode machines, separated by semicolons;
  3. the trailing hadoop-journal is the journal's namespace and can be any name.
  </description>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/bigdata/hadoop/dfs/journal/</value>
  <description>Local storage directory for JournalNode data</description>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  <description>Class that performs failover for mycluster when a NameNode fails</description>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
  <description>Fence the failed NameNode over ssh during failover</description>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_dsa</value>
  <description>Private key used for the ssh connection when fencing over ssh</description>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>1000</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>10</value>
</property>
3.4. Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>clusterrm</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>nn1.hadoop</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>nn2.hadoop</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
</property>
PS: the HA-related settings in yarn-site.xml follow the same pattern as the HA settings in hdfs-site.xml.
3.5. Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
</property>
3.6. Edit $HADOOP_HOME/etc/hadoop/slaves
dn1.hadoop
dn2.hadoop
dn3.hadoop
4. Startup steps in detail
4.1. Start ZooKeeper
On every ZK node run: zkServer.sh start
You can then run zkServer.sh status on each node to see which ZK node is the leader and which are followers.
4.2. Format ZK (first time only)
On any one of the ZK nodes run: hdfs zkfc -formatZK
[hadoop@nn1 micmiu]$ hdfs zkfc -formatZK
15/02/02 16:54:24 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn1.hadoop/172.17.225.61:53310
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:host.name=nn1.hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_75/jre
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/......
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/share/hadoop-2.6.0/lib/native
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/share/hadoop-2.6.0
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1e884ca9
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1.hadoop/172.17.225.61:2181. Will not attempt to authenticate using SASL (unknown error)
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Socket connection established to nn1.hadoop/172.17.225.61:2181, initiating session
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Session establishment complete on server nn1.hadoop/172.17.225.61:2181, sessionid = 0x14b496d55810000, negotiated timeout = 5000
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Session connected.
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Session: 0x14b496d55810000 closed
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: EventThread shut down
4.3. Start ZKFC
ZKFC (ZooKeeperFailoverController) monitors NameNode health and coordinates active/standby failover, so it only needs to be started on the two NameNode nodes.
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn1.hadoop.out
[hadoop@nn1 micmiu]$
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
4.4. Start the JournalNodes
The JournalNodes form the shared storage that keeps edit-log metadata in sync between the active and standby NNs. Start one on each JN node:
# JN node 1
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn1.hadoop.out
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8895 Jps
8837 JournalNode
# JN node 2
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8252 Jps
# JN node 3
[hadoop@dn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-dn1.hadoop.out
[hadoop@dn1 ~]$ jps
748 QuorumPeerMain
1008 JournalNode
1063 Jps
4.5. Format and start the primary NN
Format it with: hdfs namenode -format
Note: formatting is only needed the very first time the cluster is brought up; do not format it again.
[hadoop@nn1 micmiu]$ hdfs namenode -format
15/02/02 17:03:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nn1.hadoop/172.17.225.61
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/share/hadoop/common/lib/.......
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG: java = 1.7.0_75
************************************************************/
15/02/02 17:03:05 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:03:05 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:03:05 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 02 17:03:05
15/02/02 17:03:05 INFO util.GSet: Computing capacity for map BlocksMap
15/02/02 17:03:05 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:05 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/02 17:03:05 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: defaultReplication = 3
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplication = 512
15/02/02 17:03:05 INFO blockmanagement.BlockManager: minReplication = 1
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/02/02 17:03:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
15/02/02 17:03:05 INFO namenode.FSNamesystem: supergroup = supergroup
15/02/02 17:03:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
15/02/02 17:03:05 INFO namenode.FSNamesystem: HA Enabled: true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Append Enabled: true
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map INodeMap
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/02/02 17:03:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^18 = 262144 entries
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^15 = 32768 entries
15/02/02 17:03:06 INFO namenode.NNConf: ACLs enabled? false
15/02/02 17:03:06 INFO namenode.NNConf: XAttrs enabled? true
15/02/02 17:03:06 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/02/02 17:03:07 INFO namenode.FSImage: Allocated new BlockPoolId: BP-711086735-172.17.225.61-1422867787014
15/02/02 17:03:07 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:03:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/02 17:03:07 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:03:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1.hadoop/172.17.225.61
************************************************************/
[hadoop@nn1 micmiu]$
On the primary NN node, start the NameNode with: hadoop-daemon.sh start namenode
Compare the processes on the NN node before and after starting:
# before starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8988 Jps
8837 JournalNode
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn1.hadoop.out
# after starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
9134 Jps
8771 DFSZKFailoverController
8837 JournalNode
9017 NameNode
4.6. On the standby NN, sync the primary NN's metadata
hdfs namenode -bootstrapStandby
[hadoop@nn2 ~]$ hdfs namenode -bootstrapStandby
15/02/02 17:04:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nn2.hadoop/172.17.225.121
STARTUP_MSG: args = [-bootstrapStandby]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0......
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG: java = 1.7.0_75
************************************************************/
15/02/02 17:04:43 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:04:43 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: mycluster
Other Namenode ID: nn1
Other NN's HTTP address: http://nn1.hadoop:50070
Other NN's IPC address: nn1.hadoop/172.17.225.61:53310
Namespace ID: 263802668
Block pool ID: BP-711086735-172.17.225.61-1422867787014
Cluster ID: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
Layout version: -60
=====================================================
15/02/02 17:04:44 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:04:45 INFO namenode.TransferFsImage: Opening connection to http://nn1.hadoop:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:263802668:0:CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:04:45 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/02/02 17:04:45 INFO namenode.TransferFsImage: Transfer took 0.00s at 0.00 KB/s
15/02/02 17:04:45 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 352 bytes.
15/02/02 17:04:45 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:04:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn2.hadoop/172.17.225.121
************************************************************/
[hadoop@nn2 ~]$
4.7. Start the standby NN
On the standby NN run: hadoop-daemon.sh start namenode
[hadoop@nn2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn2.hadoop.out
[hadoop@nn2 ~]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8394 NameNode
8491 Jps
4.8. Set and confirm the active NN
Since this setup uses automatic failover, ZK has already elected one node as the active NN, so this step can be skipped; just check the state of each node:
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
standby
If manual failover were configured instead, this step would be required, because the system would not yet know which NN is active and both NNs would be in the standby state. The command to manually make nn1 active is: hdfs haadmin -transitionToActive nn1
4.9. Start the DataNodes from the active NN
Start all DataNodes with: hadoop-daemons.sh start datanode
Note the difference between hadoop-daemons.sh and hadoop-daemon.sh: the singular form starts the daemon only on the local node, while the plural form reads the slaves file and starts it on every slave node.
[hadoop@nn1 ~]$ hadoop-daemons.sh start datanode
dn3.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn3.hadoop.out
dn1.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn1.hadoop.out
dn2.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn2.hadoop.out
[hadoop@nn1 ~]$
4.10. Start YARN
Method 1: start the ResourceManager and the NodeManagers in one go with: start-yarn.sh
Method 2: start the ResourceManager and NodeManagers separately:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
(use yarn-daemons.sh when there are multiple DataNodes; see the sketch below for the HA start sequence)
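Because the ResourceManager is HA here, note that start-yarn.sh only starts the ResourceManager on the node it is run from; the second ResourceManager has to be started by hand on the other RM node. A sketch of one way to bring everything up (the split between nodes is an assumption consistent with the role table above):

# on nn1.hadoop: starts the local ResourceManager plus the NodeManagers on the slaves
start-yarn.sh
# on nn2.hadoop: start the second ResourceManager manually
yarn-daemon.sh start resourcemanager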
The ResourceManager is also configured for HA; check the state of each RM with:
yarn rmadmin -getServiceState <serviceid>
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@nn1 ~]$
4.11. Start the MR JobHistory Server
Run the JobHistory Server on dn1.hadoop: mr-jobhistory-daemon.sh start historyserver
# before starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14625 Jps
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
[hadoop@dn1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/share/hadoop-2.6.0/logs/mapred-hadoop-historyserver-dn1.hadoop.out
# after starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14745 JobHistoryServer
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
14786 Jps
4.12. Verify that NameNode and ResourceManager HA actually work
Kill the relevant process on the current active node and watch the state of each node switch over.
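A minimal sketch of one such check for the NameNode, assuming nn1 is currently active (the pid placeholder is whatever jps reports on your machine):

# on nn1.hadoop (currently active): find and kill the NameNode process
[hadoop@nn1 ~]$ jps | grep NameNode
[hadoop@nn1 ~]$ kill -9 <NameNode pid>
# then confirm that nn2 has taken over
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
# expected: active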
4.13. Verify that NN HA is transparent to clients
Check that hdfs dfs -ls / and hdfs dfs -ls hdfs://mycluster/ return the same result:
[hadoop@nn1 ~]$ hdfs dfs -ls /
Found 2 items
drwx------ - hadoop supergroup 0 2015-02-02 23:42 /tmp
drwxr-xr-x - hadoop supergroup 0 2015-02-02 23:39 /user
[hadoop@nn1 ~]$ hdfs dfs -ls hdfs://mycluster/
Found 2 items
drwx------ - hadoop supergroup 0 2015-02-02 23:42 hdfs://mycluster/tmp
drwxr-xr-x - hadoop supergroup 0 2015-02-02 23:39 hdfs://mycluster/user
[hadoop@nn1 ~]$
[4] Running the wordcount demo
For the demo itself, see the wordcount walkthrough in "hadoop 2.2.0 集群模式安装配置和测试"; the steps are not repeated here.
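For completeness, a minimal sketch of running wordcount against this cluster (the input and output paths under /user/hadoop are illustrative, not taken from the original demo):

hdfs dfs -mkdir -p /user/hadoop/wc-in
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/wc-in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/wc-in /user/hadoop/wc-out
hdfs dfs -cat /user/hadoop/wc-out/part-r-00000 | head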
References:
- CentOS 卸载自带的OpenJDK
- Linux(Centos)配置OpenSSH无密码登陆
- Hadoop2.2.0源码编译
- Hadoop2.x在Ubuntu系统中编译源码
- hadoop 2.2.0 集群模式安装配置和测试
- http://hadoop.apache.org/docs/r2.6.0/
- 【甘道夫】Hadoop2.2.0 NN HA详细配置+Client透明性试验【完整版】
—————– EOF @Michael Sun —————–
Original article. When reposting, please credit: micmiu – 软件开发+生活点滴 [ http://www.micmiu.com/ ]
Permalink: http://www.micmiu.com/bigdata/hadoop/hadoop2-cluster-ha-setup/