How does LibShortText handle Chinese documents?

Posted: 2014-11-06 19:43:25

LibShortText is another major piece of work from Prof. Chih-Jen Lin (林智仁), following libsvm and liblinear. Its main features are:

It is more efficient than general text-mining packages. On a typical computer, processing and training 10 million short texts takes only around half an hour.
The fast training and testing is built upon the linear classifier LIBLINEAR.
Default options often work well without tedious tuning.
An interactive tool for error analysis is included. Based on the property that each short text contains few words, LibShortText provides details in predicting each text.

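So how can a tool like this be used for Chinese text processing?
I tried letting it generate unigram features for Chinese text automatically and found that Chinese characters were not counted as unigrams at all =.=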
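One possible preprocessing step, which neither the author nor the LibShortText documentation suggests, is to put whitespace around every Chinese character before handing the text to LibShortText, so that each character can stand on its own as a unigram. The sketch below rests on an unverified assumption: that LibShortText's default tokenizer splits on whitespace. If it instead discards non-ASCII characters, this step will not help. `space_out_cjk` and `char_unigrams` are hypothetical helpers, and the U+4E00 to U+9FFF range is only an approximation of "Chinese characters", since it leaves out the extension blocks and CJK punctuation.

```python
import re

# Approximate "Chinese character" as the CJK Unified Ideographs block U+4E00..U+9FFF.
# Assumption: extension blocks and full-width punctuation are not matched.
CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def space_out_cjk(text: str) -> str:
    """Surround each CJK character with spaces (hypothetical helper), so that a
    whitespace-splitting tokenizer sees every character as a separate token."""
    spaced = CJK_CHAR.sub(r" \1 ", text)
    # Collapse the extra whitespace introduced by the substitution.
    return " ".join(spaced.split())


def char_unigrams(text: str) -> dict:
    """Count unigram tokens after spacing out CJK characters (hypothetical helper)."""
    counts = {}
    for token in space_out_cjk(text).split():
        counts[token] = counts.get(token, 0) + 1
    return counts


if __name__ == "__main__":
    print(space_out_cjk("LibShortText处理中文"))
    print(char_unigrams("中文文档"))
```

Runs of non-CJK text such as "LibShortText" stay intact as single tokens. Only the Chinese characters are split apart. Whether LibShortText then builds features from these tokens is untested here.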

So I emailed the author, and he replied:

Unfortunately I don't think our code can now support Chinese documents.
Chih-Jen

Original post: http://www.cnblogs.com/zklidd/p/4079668.html
