
Define different Jieba objects in one Python file

Posted: 2019-10-17 18:01:12

I have three different vocab.txt files (glove, tencent.ai, fasttext).

Goal: use these vocab.txt files to initialize jieba objects in a single Python file.

Method: three different jieba objects need three different cache files, so the problem is how to pass a different cache directory to each object. To do that, edit /home/user/anaconda3/envs/py36/lib/python3.6/site-packages/jieba/__init__.py and add a tmp_dir parameter to Tokenizer.__init__():

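
class Tokenizer(object):

    # tmp_dir is the new parameter; stock jieba defines __init__(self, dictionary=DEFAULT_DICT).
    # Because tmp_dir now comes first, a positional Tokenizer("dict.txt") call sets tmp_dir,
    # so pass dictionary= by keyword.
    def __init__(self, tmp_dir=None, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = tmp_dir  # directory where this tokenizer's cache file is written
        self.cache_file = None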
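Why separate directories are needed: as far as I can tell from recent jieba versions (0.39+), Tokenizer.initialize() names the cache file only from cache_file and the main dictionary, and puts it in tmp_dir, falling back to the system temp directory when tmp_dir is unset. All three objects here use the default main dictionary (the log further down says "Building prefix dict from the default dictionary"), so without a tmp_dir they would all share one jieba.cache. The helper below is a hypothetical re-implementation of that naming logic for illustration only; resolve_cache_path is not a jieba function, and DEFAULT_DICT = None is an assumption about how newer releases mark the bundled dictionary.

```python
import os
import tempfile
from hashlib import md5

# Assumption: recent jieba releases use None to mean "the bundled dict.txt".
DEFAULT_DICT = None


def resolve_cache_path(dictionary=DEFAULT_DICT, tmp_dir=None, cache_file=None):
    """Hypothetical helper mirroring how Tokenizer.initialize() picks its cache path."""
    if cache_file:
        name = cache_file                      # an explicit cache_file attribute wins
    elif dictionary == DEFAULT_DICT:
        name = "jieba.cache"                   # bundled main dictionary
    else:
        # custom main dictionary: file name derived from the md5 of its path
        name = "jieba.u%s.cache" % md5(dictionary.encode("utf-8", "replace")).hexdigest()
    # with no tmp_dir, every tokenizer falls back to the same system temp directory
    return os.path.join(tmp_dir or tempfile.gettempdir(), name)


if __name__ == "__main__":
    # Without tmp_dir, three default-dictionary tokenizers resolve to one shared file
    shared = {resolve_cache_path() for _ in range(3)}
    print(len(shared))  # 1
    # With a per-model tmp_dir, each gets its own jieba.cache
    for model in ("glove.model", "tencent.model", "fb.model"):
        print(resolve_cache_path(tmp_dir=os.path.join("data", model)))
```

Since tmp_dir and cache_file are ordinary instance attributes read when the tokenizer initializes, assigning them on an unpatched Tokenizer before its first use may give the same result without editing site-packages; check this against your installed jieba version.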

 

Result:

import os

# jieba is already importable from site-packages, so no sys.path change is needed
from jieba import Tokenizer

DATA_DIR = "/home/user/models/serving_embedding_torch/model_path/torch/data"


class Jieba(object):
    """Wraps one jieba Tokenizer with its own user dictionary and cache directory."""

    def __init__(self, vocab_path, model_path):
        super(Jieba, self).__init__()
        # tmp_dir is the parameter added to the patched Tokenizer.__init__();
        # each instance writes its own cache file under DATA_DIR/model_path
        self.jieba = Tokenizer(tmp_dir=os.path.join(DATA_DIR, model_path))
        # add this model's vocabulary on top of the default dictionary
        self.jieba.load_userdict(vocab_path)

    def seg(self, text):
        print(list(self.jieba.cut(text, cut_all=False)))


a = Jieba("glove.model/vocab.txt", "glove.model")
b = Jieba("tencent.model/vocab.txt", "tencent.model")
c = Jieba("fb.model/vocab.txt", "fb.model")
text = "区块链是一个好方向海派青年公寓龙爪槐"
a.seg(text)
b.seg(text)
c.seg(text)
(py36) user@big-001:~/models/serving_embedding_torch/model_path/torch/data$  python3 peel.py
Building prefix dict from the default dictionary ...
2019-10-17 17:14:20,745 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/glove.model/jieba.cache
2019-10-17 17:14:21,575 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/glove.model/jieba.cache
Loading model cost 0.899 seconds.
2019-10-17 17:14:21,644 DEBUG: Loading model cost 0.899 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:14:21,644 DEBUG: Prefix dict has been built succesfully.
Building prefix dict from the default dictionary ...
2019-10-17 17:14:26,352 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/tencent.model/jieba.cache
2019-10-17 17:14:27,101 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/tencent.model/jieba.cache
Loading model cost 0.805 seconds.
2019-10-17 17:14:27,158 DEBUG: Loading model cost 0.805 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:14:27,159 DEBUG: Prefix dict has been built succesfully.
Building prefix dict from the default dictionary ...
2019-10-17 17:18:41,279 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/fb.model/jieba.cache
2019-10-17 17:18:42,045 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/fb.model/jieba.cache
Loading model cost 0.822 seconds.
2019-10-17 17:18:42,101 DEBUG: Loading model cost 0.822 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:18:42,102 DEBUG: Prefix dict has been built succesfully.
['区块', '链是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐']
['区块链', '是', '一个', '好方向', '海派青年公寓', '龙爪槐']
['区块链', '是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐']
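The three outputs differ only in which words each tokenizer's frequency table contains: load_userdict() adds the words from that model's vocab.txt on top of the same default dictionary. The sketch below is a simplified, dependency-free illustration of how jieba picks a segmentation in precise mode: build a DAG of every dictionary word starting at each position, then choose the maximum log-probability path from right to left. It assumes HMM=False (jieba's default HMM=True also re-segments runs of unknown single characters, which this omits), and the word frequencies are made-up toy values, not taken from any of the three vocab files.

```python
import math


def build_prefix_dict(word_freqs):
    """Return (freq, total): every word, plus prefixes that are not words with frequency 0."""
    freq = {}
    for word, count in word_freqs.items():
        freq[word] = count
        for i in range(1, len(word)):
            freq.setdefault(word[:i], 0)
    return freq, sum(word_freqs.values())


def get_dag(sentence, freq):
    """Map each start index k to the end indices i where sentence[k:i+1] is a word."""
    dag = {}
    n = len(sentence)
    for k in range(n):
        ends = []
        i = k
        frag = sentence[k]
        # keep extending while the fragment is at least a prefix of some word
        while i < n and frag in freq:
            if freq[frag]:
                ends.append(i)
            i += 1
            frag = sentence[k:i + 1]
        dag[k] = ends or [k]  # no word starts here: fall back to one character
    return dag


def cut(sentence, word_freqs):
    """Precise-mode segmentation without HMM: max log-probability path through the DAG."""
    freq, total = build_prefix_dict(word_freqs)
    dag = get_dag(sentence, freq)
    n = len(sentence)
    log_total = math.log(total)
    route = {n: (0.0, 0)}
    # right to left: route[idx] = (best score for sentence[idx:], end index of its first word)
    for idx in range(n - 1, -1, -1):
        route[idx] = max(
            (math.log(freq.get(sentence[idx:x + 1]) or 1) - log_total + route[x + 1][0], x)
            for x in dag[idx]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1] + 1
        words.append(sentence[i:j])
        i = j
    return words


if __name__ == "__main__":
    # toy frequencies, made up for illustration
    base = {"区块": 5, "链": 2, "是": 20, "一个": 10, "好": 8, "方向": 6}
    text = "区块链是一个好方向"
    print(cut(text, base))  # ['区块', '链', '是', '一个', '好', '方向']
    # simulate a vocab.txt that contains 区块链 and 好方向
    print(cut(text, {**base, "区块链": 3, "好方向": 1}))  # ['区块链', '是', '一个', '好方向']
```

Adding two words to the toy dictionary is enough to change the chosen path, which is the same kind of difference seen between the three output lines above.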

 


Original: https://www.cnblogs.com/wang2825/p/11693653.html
