Choosing a Method for Importing Data into HBase

Date: 2016-04-12 11:08:39

Choosing the Right Import Method

If the data is already in an HBase table:

To move the data from one HBase cluster to another, take a snapshot and use either the clone_snapshot command or the ExportSnapshot utility; alternatively, use the CopyTable utility.
To move the data from one HBase cluster to another without downtime on either cluster, use replication.
To migrate data between HBase versions that are not wire compatible, such as from CDH 4 to CDH 5, see Importing HBase Data From CDH 4 to CDH 5.
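
As a rough illustration of the cluster-to-cluster options above, the following sketch only assembles the command lines for ExportSnapshot and CopyTable; it does not run them. The snapshot name, table name, destination HDFS URL, and ZooKeeper quorum string are hypothetical placeholders, and the mapper count is an arbitrary assumption.

```python
from typing import List


def export_snapshot_cmd(snapshot: str, dest_root: str, mappers: int = 4) -> List[str]:
    """Build the argv that copies an existing snapshot to another cluster's HBase root dir."""
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,      # name of a snapshot already taken on the source cluster
        "-copy-to", dest_root,      # HBase root directory on the destination cluster
        "-mappers", str(mappers),   # number of parallel copy tasks (assumed value)
    ]


def copy_table_cmd(table: str, peer_zk: str) -> List[str]:
    """Build the argv for CopyTable; peer_zk is 'quorum:port:znode' of the target cluster."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
        f"--peer.adr={peer_zk}",
        table,
    ]


if __name__ == "__main__":
    # Hypothetical names and addresses, for illustration only.
    print(" ".join(export_snapshot_cmd("orders_snap", "hdfs://dest-nn:8020/hbase")))
    print(" ".join(copy_table_cmd("orders", "zk1,zk2,zk3:2181:/hbase")))
```

The resulting lists can be passed to `subprocess.run` on a host where the `hbase` launcher is on the PATH. With ExportSnapshot, the snapshot still has to be restored or cloned on the destination cluster afterwards.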

If the data currently exists outside HBase:

If possible, write the data to HFile format, and use a BulkLoad to import it into HBase. The data is immediately available to HBase and you can bypass the normal write path, increasing efficiency.
If you prefer not to use bulk loads, and you are using a tool such as Pig, you can use it to import your data.
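
One possible way to follow the bulk-load route is sketched below, assuming the source data is a tab-separated file on HDFS. The excerpt does not prescribe a tool for producing HFiles; ImportTsv is used here as an assumption, followed by LoadIncrementalHFiles to hand the files to HBase. The class names match HBase releases of the CDH 5 era and may differ in later versions. The table name, column mapping, and paths are hypothetical.

```python
from typing import List, Sequence


def import_tsv_cmd(table: str, columns: Sequence[str],
                   input_path: str, hfile_dir: str) -> List[str]:
    """Build the argv that makes ImportTsv write HFiles instead of issuing Puts."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),   # one entry per TSV field
        "-Dimporttsv.bulk.output=" + hfile_dir,       # presence of this option enables HFile output
        table,
        input_path,
    ]


def complete_bulk_load_cmd(hfile_dir: str, table: str) -> List[str]:
    """Build the argv that moves the generated HFiles into the table's regions."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir,
        table,
    ]


if __name__ == "__main__":
    # Hypothetical table, columns, and paths, for illustration only.
    cols = ["HBASE_ROW_KEY", "cf:name", "cf:price"]
    print(" ".join(import_tsv_cmd("orders", cols, "/data/orders.tsv", "/tmp/orders_hfiles")))
    print(" ".join(complete_bulk_load_cmd("/tmp/orders_hfiles", "orders")))
```

The two commands run in sequence: the first is a MapReduce job that writes HFiles, and the second hands those files to the region servers without going through the normal write path.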

If you need to stream live data to HBase instead of importing it in bulk:

Write a Java client using the Java API, or use the Apache Thrift Proxy API to write a client in a language supported by Thrift.
Stream data directly into HBase using the REST Proxy API in conjunction with an HTTP client such as wget or curl.
Use Flume or Spark.
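
To make the REST option more concrete, here is a minimal sketch of writing one row through the HBase REST server from Python's standard library instead of wget or curl. It assumes the JSON representation used by the REST server, in which row keys, column names, and values are Base64-encoded. The base URL, table name, and cell contents are hypothetical, and `put_row` is not exercised here because it needs a running REST server.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict


def _b64(text: str) -> str:
    """Base64-encode a UTF-8 string, as the REST JSON format expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_row_payload(row_key: str, cells: Dict[str, str]) -> bytes:
    """Build the JSON body for a single row; cells maps 'family:qualifier' to a value."""
    row = {
        "key": _b64(row_key),
        "Cell": [{"column": _b64(col), "$": _b64(val)} for col, val in cells.items()],
    }
    return json.dumps({"Row": [row]}).encode("utf-8")


def put_row(base_url: str, table: str, row_key: str, cells: Dict[str, str]) -> int:
    """PUT one row to the REST server and return the HTTP status code."""
    req = urllib.request.Request(
        url=f"{base_url}/{table}/{urllib.parse.quote(row_key, safe='')}",
        data=build_row_payload(row_key, cells),
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Prints the request body only; no network call is made.
    print(build_row_payload("row1", {"cf:name": "widget"}).decode("utf-8"))
```

Calling `put_row("http://rest-host:8080", "orders", "row1", {"cf:name": "widget"})` would send the request; the host and port here are placeholders and depend on how the REST server is deployed.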

Most likely, at least one of these methods works in your situation. If not, you can use MapReduce directly. Test the most feasible methods with a subset of your data to determine which one is optimal.

Excerpted from: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_import.html


Original post: http://www.cnblogs.com/admln/p/5381774.html
