HBase Python API
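HBase supports multi-language programming through its Thrift gateway. Clients exchange messages with the Thrift server over a network port, so any language with a Thrift library can talk to HBase, and Python is a convenient choice.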
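To make "messages over a port" concrete, here is a minimal sketch (Python 3) of the bytes a Thrift client writes when it calls a method with no arguments, such as `getTableNames`. It assumes the strict binary encoding, which the Python `TBinaryProtocol` uses by default, and a plain buffered (unframed) transport as in the client code of this post. `encode_no_arg_call` is a hypothetical helper for illustration only; in practice the generated `Hbase.Client` does this for you.

```python
import struct

VERSION_1 = 0x80010000  # strict-mode version marker of the Thrift binary protocol
CALL = 1                # TMessageType.CALL


def encode_no_arg_call(name, seqid):
    """Encode a Thrift binary-protocol CALL message for a method with no arguments."""
    name_bytes = name.encode("utf-8")
    # Message header: (version | message type), then the method name, then the sequence id
    message = struct.pack(">I", VERSION_1 | CALL)
    message += struct.pack(">i", len(name_bytes)) + name_bytes
    message += struct.pack(">i", seqid)
    # The argument struct is empty, so it consists only of the STOP marker (type byte 0)
    return message + b"\x00"


frame = encode_no_arg_call("getTableNames", 0)
print(frame.hex())
```

These 26 bytes are what would travel to the server's port (9090 by default); the server replies with a REPLY message encoded the same way.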
A gripe
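I tried to set up HBase on my Mac, but ZooKeeper kept throwing errors, while on an Ubuntu virtual machine it took ten minutes... However, writing Java code inside the VM without an IDE is inconvenient, so the idea of connecting to the VM from the Mac host was born. That way I can happily use the host's IDE again.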
1. Start the HBase Thrift RPC server (server side)
There are many ways to start HBase, which I won't repeat here. After HBase is running on Ubuntu, start the Thrift server:
hbase-daemon.sh start thrift
The default service port is 9090.
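Before writing client code, it can help to check from the host that the Thrift port is reachable from outside the VM. A minimal sketch, assuming Python 3; `thrift_port_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port, not that it is the HBase Thrift server.

```python
import socket


def thrift_port_open(host, port=9090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, DNS failures and timeouts
        return False
```

For example, `thrift_port_open("10.211.55.7")` checks the VM address used by the client code below; `False` usually means the Thrift server is not running or a firewall blocks the port.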
2. Install the dependencies (client side)
sudo pip install thrift
sudo pip install hbase-thrift
3. Write the client code
# coding=utf-8
from thrift.transport import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import ColumnDescriptor
from hbase.ttypes import Mutation
class HBaseClient(object):
    def __init__(self, ip, port=9090):
        """
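        Open a connection to the Thrift server
        """
        # Thrift server address and port, wrapped in a buffered transport
        self.__transport = TBufferedTransport(TSocket.TSocket(ip, port))
        # Binary wire protocol
        protocol = TBinaryProtocol.TBinaryProtocol(self.__transport)
        # Client stub generated from HBase's Thrift IDL
        self.__client = Hbase.Client(protocol)
        # Open the connection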
        self.__transport.open()
    def __del__(self):
        self.__transport.close()
    def get_tables(self):
        """
        Get all tables
        :return: list of table names
        """
        return self.__client.getTableNames()
    def create_table(self, table, *columns):
        """
        Create a table
        :param table: table name
        :param columns: column family names
        """
        # Build a list (not a lazy map) of column family descriptors
        column_families = [ColumnDescriptor(name=col) for col in columns]
        self.__client.createTable(table, column_families)
    def put(self, table, row, columns):
        """
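        Insert or update cells in one row
        :param table: table name
        :param row: row key
        :param columns: dict mapping column names ("family:qualifier") to values
        """
        # One Mutation per cell; a list comprehension works on Python 2 and 3
        mutations = [Mutation(column=k, value=v) for k, v in columns.items()]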
        self.__client.mutateRow(table, row, mutations)
    def delete(self, table, row, column):
        """
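        Delete all cells of one column in a row
        :param table: table name
        :param row: row key
        :param column: column name in "family:qualifier" form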
        """
        self.__client.deleteAll(table, row, column)
    def scan(self, table, start_row="", columns=None):
        """
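        Scan records
        :param table: table name
        :param start_row: row key to start scanning from
        :param columns: columns or column families to return (None returns all)
        """
        scanner = self.__client.scannerOpen(table, start_row, columns)
        try:
            while True:
                # scannerGet returns a list with at most one TRowResult
                r = self.__client.scannerGet(scanner)
                if not r:
                    break
                # Map each column name to its cell value
                yield dict((k, v.value) for k, v in r[0].columns.items())
        finally:
            # Release the scanner held on the server side
            self.__client.scannerClose(scanner)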
if __name__ == '__main__':
    client = HBaseClient("10.211.55.7")
    # client.create_table('student', 'name', 'course')
    client.put("student", "1",
               {"name:": "Jack",
                "course:art": "88",
                "course:math": "12"})
    client.put("student", "2",
               {"name:": "Tom", "course:art": "90",
                "course:math": "100"})
    client.put("student", "3",
               {"name:": "Jerry"})
    client.delete('student', '1', 'course:math')
    for v in client.scan('student'):
        print(v)
4. Test results
{'course:art': '88', 'name:': 'Jack'}
{'course:art': '90', 'name:': 'Tom', 'course:math': '100'}
{'name:': 'Jerry'}
5. Summary
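With a Python interface, writing simple task scripts becomes very convenient. Much of the credit goes to the RPC mechanism, which cleanly decouples the client from the server and makes it easier for developers to work together.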
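One practical side of this decoupling: the script only relies on a handful of `Hbase.Client` methods, so its logic can be exercised without a running HBase by substituting a stub. The sketch below uses a hypothetical in-memory `FakeHbaseClient` and namedtuple stand-ins for `Mutation`, `TCell` and `TRowResult`; its semantics are simplified assumptions (one version per cell, no timestamps, the column filter is ignored), not HBase's actual behaviour in every case. It replays the puts and the delete from the `__main__` block and mirrors the scan loop of `HBaseClient.scan`.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated structs in hbase.ttypes
Mutation = namedtuple("Mutation", ["column", "value"])
TCell = namedtuple("TCell", ["value"])
TRowResult = namedtuple("TRowResult", ["row", "columns"])


class FakeHbaseClient(object):
    """Hypothetical in-memory stub exposing a few Hbase.Client method names."""

    def __init__(self):
        self.tables = {}    # table -> {row key -> {column -> value}}
        self.scanners = {}  # scanner id -> iterator of (row key, cells)
        self.next_id = 0

    def mutateRow(self, table, row, mutations):
        cells = self.tables.setdefault(table, {}).setdefault(row, {})
        for m in mutations:
            cells[m.column] = m.value

    def deleteAll(self, table, row, column):
        self.tables.get(table, {}).get(row, {}).pop(column, None)

    def scannerOpen(self, table, start_row, columns):
        rows = self.tables.get(table, {})
        # Rows come back in lexicographic row-key order; rows with no cells are skipped
        keys = sorted(k for k in rows if k >= start_row and rows[k])
        self.next_id += 1
        self.scanners[self.next_id] = iter([(k, dict(rows[k])) for k in keys])
        return self.next_id

    def scannerGet(self, scanner):
        item = next(self.scanners[scanner], None)
        if item is None:
            return []  # an empty list signals the end of the scan
        row, cells = item
        return [TRowResult(row, {c: TCell(v) for c, v in cells.items()})]

    def scannerClose(self, scanner):
        del self.scanners[scanner]


def scan(client, table, start_row=""):
    """Mirrors HBaseClient.scan, with the client passed in and the scanner closed at the end."""
    scanner = client.scannerOpen(table, start_row, None)
    try:
        while True:
            r = client.scannerGet(scanner)
            if not r:
                break
            yield {k: v.value for k, v in r[0].columns.items()}
    finally:
        client.scannerClose(scanner)


# Replay the calls made by the __main__ block
client = FakeHbaseClient()
client.mutateRow("student", "1", [Mutation("name:", "Jack"),
                                  Mutation("course:art", "88"),
                                  Mutation("course:math", "12")])
client.mutateRow("student", "2", [Mutation("name:", "Tom"),
                                  Mutation("course:art", "90"),
                                  Mutation("course:math", "100")])
client.mutateRow("student", "3", [Mutation("name:", "Jerry")])
client.deleteAll("student", "1", "course:math")
results = list(scan(client, "student"))
for v in results:
    print(v)
```

The printed rows contain the same cells as the test results above: row 1 has lost `course:math` because of the delete, and row 3 only has `name:`.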
