After adding an NS to HDFS, restarting the DNs fails with a clusterID mismatch error

Date: 2014-12-09 23:06:30

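In the test environment I was preparing to test FastCopy. Since there was only one NS, I decided to add a second NS to make testing easier. Once everything was set up I restarted the DNs, but they simply could not connect to the new NN and reported the following error: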

java.io.IOException: Incompatible clusterIDs in /data0/hadoop/dfs/data: namenode clusterID = CID-79c6e55b-5897-4a30-b278-149827ac200f; datanode clusterID = CID-1561e550-a7b9-4886-8a9a-cc2328b82912
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:944)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:915)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
        at java.lang.Thread.run(Thread.java:745)
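When many DNs report this error, it can help to pull both IDs out of the log line programmatically. The following is a minimal sketch, assuming the message keeps exactly the format shown in the log above. The class `ClusterIdMismatch` and its fields are hypothetical helpers for illustration and are not part of Hadoop.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Hadoop): parses the DN error line shown above. */
final class ClusterIdMismatch {
    // Assumes the message format from the log above; other Hadoop versions may differ.
    private static final Pattern MISMATCH = Pattern.compile(
        "Incompatible clusterIDs in (\\S+): namenode clusterID = (\\S+); datanode clusterID = (\\S+)");

    final String storageDir;
    final String namenodeClusterId;
    final String datanodeClusterId;

    private ClusterIdMismatch(String storageDir, String namenodeClusterId, String datanodeClusterId) {
        this.storageDir = storageDir;
        this.namenodeClusterId = namenodeClusterId;
        this.datanodeClusterId = datanodeClusterId;
    }

    /** Returns the parsed IDs, or an empty Optional if the line is not a mismatch error. */
    static Optional<ClusterIdMismatch> parse(String logLine) {
        Matcher m = MISMATCH.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ClusterIdMismatch(m.group(1), m.group(2), m.group(3)));
    }
}
```

The `datanodeClusterId` extracted this way is the value the DN has stored on disk, which is the ID the new NN needs to share.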

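The error says that the DN's clusterID does not match the NN's clusterID. A colleague pointed out that the fix is to format the newly added NN with the clusterID the DNs already have (CID-1561e550-a7b9-4886-8a9a-cc2328b82912). Run the following on one of the new NN nodes: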

hdfs namenode -format -clusterid CID-1561e550-a7b9-4886-8a9a-cc2328b82912

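After formatting the NN and the JNs as prompted, start that NN. The other newly added NN does not need to be formatted; running the following command on it is enough to copy all of the metadata from the NN that is already running into its own directory: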

hdfs namenode -bootstrapStandby

Once the sync finishes, start this NN as well and then restart all the DNs. The NNs for both NS1 and NS2 now list every DN.

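Now a few words about what the clusterID is and what it does.

The clusterID is the cluster's unique ID. Its purpose is to ensure that only trusted DNs connect to the cluster. A DN obtains its clusterID from the NN the first time it starts: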

  private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with NN - get the namespace
    // info.
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    
    // Verify that this matches the other NN in this HA pair.
    // This also initializes our block pool in the DN if we are
    // the first NN connection for this BP.
    bpos.verifyAndSetNamespaceInfo(nsInfo);
    
    // Second phase of the handshake with the NN.
    register();
  }

  NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
      try {
        nsInfo = bpNamenode.versionRequest();
        LOG.debug(this + " received versionRequest response: " + nsInfo);
        break;
      } catch(SocketTimeoutException e) {  // namenode is busy
        LOG.warn("Problem connecting to server: " + nnAddr);
      } catch(IOException e ) {  // namenode is not available
        LOG.warn("Problem connecting to server: " + nnAddr);
      }
      
      // try again in a second
      sleepAndLogInterrupts(5000, "requesting version info from NN");
    }
    
    if (nsInfo != null) {
      checkNNVersion(nsInfo);
    } else {
      throw new IOException("DN shut down before block pool connected");
    }
    return nsInfo;
  }

  void initBlockPool(BPOfferService bpos) throws IOException {
    NamespaceInfo nsInfo = bpos.getNamespaceInfo();
    if (nsInfo == null) {
      throw new IOException("NamespaceInfo not found: Block pool " + bpos
          + " should have retrieved namespace info before initBlockPool.");
    }
    
    // Register the new block pool with the BP manager.
    blockPoolManager.addBlockPool(bpos);

    setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
    
    // In the case that this is the first block pool to connect, initialize
    // the dataset, block scanners, etc.
    initStorage(nsInfo);
    initPeriodicScanners(conf);
    
    data.addBlockPool(nsInfo.getBlockPoolID(), conf);
  }

The DN then persists the clusterID in the VERSION file under each of its local storage directories:

cat /data0/hadoop/dfs/data/current/VERSION

#Thu Oct 23 14:06:21 CST 2014
storageID=DS-35e3967e-51e4-4a6c-a3da-d2be044c8522
clusterID=CID-1561e550-a7b9-4886-8a9a-cc2328b82912
cTime=0
datanodeUuid=1327c11f-984c-4c07-a44a-70ba5e84621c
storageType=DATA_NODE
layoutVersion=-55
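The VERSION file shown above has the shape of a Java properties file, so the stored clusterID can be read with `java.util.Properties`. The sketch below assumes that layout and the `current/VERSION` location from the `cat` command above. The class `VersionFileReader` and its method names are hypothetical and not part of Hadoop.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper (not part of Hadoop): reads clusterID from a DN VERSION file. */
final class VersionFileReader {

    /** Parses VERSION file content and returns the clusterID entry. */
    static String clusterIdOf(String versionText) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(versionText)) {
            props.load(reader); // lines starting with '#' are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String clusterId = props.getProperty("clusterID");
        if (clusterId == null) {
            throw new IllegalArgumentException("no clusterID entry found");
        }
        return clusterId;
    }

    /** Reads <storageDir>/current/VERSION, the location used in the example above. */
    static String readClusterId(Path storageDir) throws IOException {
        Path version = storageDir.resolve("current").resolve("VERSION");
        byte[] bytes = Files.readAllBytes(version);
        return clusterIdOf(new String(bytes, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass one or more DN storage directories, e.g. /data0/hadoop/dfs/data
        for (String dir : args) {
            System.out.println(dir + " -> " + readClusterId(Paths.get(dir)));
        }
    }
}
```

Running it against every configured storage directory is a quick way to confirm that they all carry the same clusterID before formatting a new NN.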

So when adding an NS to an HDFS cluster that already has one, the new NN must be formatted with the existing clusterID. Only then can the DNs connect to the new NN.
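The rule can be modeled in a few lines. This is a simplified sketch of the comparison behind the error, not Hadoop's actual `DataStorage` code. The class `ClusterIdCheck` is hypothetical, and treating a `null` stored ID as "fresh DN that adopts the NN's ID" is an assumption based on the description above.

```java
import java.io.IOException;

/** Simplified, hypothetical model of the DN-side clusterID check (not Hadoop source). */
final class ClusterIdCheck {

    /** A DN with no stored ID (assumed: first start) accepts any NN; otherwise IDs must match. */
    static boolean isCompatible(String storedClusterId, String namenodeClusterId) {
        return storedClusterId == null || storedClusterId.equals(namenodeClusterId);
    }

    /** Throws with a message shaped like the one in the log when the IDs differ. */
    static void check(String storageDir, String storedClusterId, String namenodeClusterId)
            throws IOException {
        if (!isCompatible(storedClusterId, namenodeClusterId)) {
            throw new IOException("Incompatible clusterIDs in " + storageDir
                + ": namenode clusterID = " + namenodeClusterId
                + "; datanode clusterID = " + storedClusterId);
        }
    }

    public static void main(String[] args) {
        String dnId = "CID-1561e550-a7b9-4886-8a9a-cc2328b82912";
        String freshNnId = "CID-79c6e55b-5897-4a30-b278-149827ac200f";
        // New NN formatted without -clusterid gets its own ID: the DN rejects it.
        System.out.println(isCompatible(dnId, freshNnId)); // prints false
        // New NN formatted with the existing ID: the DN accepts it.
        System.out.println(isCompatible(dnId, dnId));      // prints true
    }
}
```

Because a DN serves every NS through the same storage directories, one stored clusterID has to satisfy all NNs, which is why the new NN has to adopt the existing ID rather than the other way around.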

Abbreviations:

DN: DataNode

NN: NameNode

JN: JournalNode

NS: NameService

Original post: http://blog.csdn.net/bigdatahappy/article/details/41831599
