
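Analyzing《三国演义》(Romance of the Three Kingdoms) with Python: who is the book's true protagonist? (Includes a way to count specified characters)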

Posted: 2021-05-19 01:05:34

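In an earlier post I counted the characters in Jin Yong's novel《倚天屠龙记》(The Heaven Sword and Dragon Saber) and sorted them by number of appearances: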

https://www.cnblogs.com/becks/p/11421214.html

After that, I used pyecharts to analyze the bullet comments (danmaku) of a Bilibili video and draw them as a word cloud:

https://www.cnblogs.com/becks/p/14743080.html

 

This time, let's count how often each character appears in《三国演义》, draw the result as a word cloud, and see who the real protagonist is.

 

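First, we need to enumerate the characters who appear in the book. We only want statistics on characters, so words that are not character names should not be counted.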

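# 角色 = "roles": the set of character names to count
角色 = {"刘备", "诸葛亮", "关羽", "张飞", "刘禅", "孙权", "赵云", "司马懿", "周瑜", "曹操", "袁绍", "马超", "魏延",
        "黄忠", "姜维", "马岱", "庞德", "孟获", "刘表", "董卓", "孙策",
        "鲁肃", "司马昭", "夏侯渊", "王平", "刘璋", "袁术", "吕蒙", "甘宁", "邓艾", "曹仁",
        "陆逊", "许褚", "庞统", "曹洪", "李典", "曹丕", "廖化", "曹真", "吕布"}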

Next, read the text of《三国演义》, prepared in advance as a txt file, and segment its contents with the jieba library.

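# -*- coding: utf-8 -*-

import jieba   # import the jieba library
import os
import sys
from collections import Counter  # word-frequency counting after segmentation
from pyecharts.charts import WordCloud  # word cloud

path = os.path.abspath(os.path.dirname(sys.argv[0]))
# read the text of 三国演义 from the script's directory
with open(os.path.join(path, "171182.txt"), "r", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)   # segment the text into a list of tokens with jieba
counts = {}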

Then count how many times each of the specified character names occurs.

for word in words:
    if len(word) <= 1:   # skip single-character tokens
        continue
    elif word in 角色:
        counts[word] = counts.get(word, 0) + 1
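
The script imports `Counter` but tallies with a plain dict. As a minimal sketch, the same count could be written with `collections.Counter`. The short token list below is a hypothetical stand-in for the output of `jieba.lcut`, and `count_roles` is an illustrative name that does not appear in the original script.

```python
from collections import Counter


def count_roles(tokens, roles):
    """Count only the tokens that appear in the given set of role names."""
    return Counter(token for token in tokens if token in roles)


# Hypothetical tokens standing in for jieba.lcut(txt); not real book data.
sample_tokens = ["曹操", "大笑", "刘备", "曰", "曹操", "乃", "关羽"]
sample_roles = {"曹操", "刘备", "关羽", "张飞"}

role_counts = count_roles(sample_tokens, sample_roles)
print(role_counts.most_common())  # (name, count) pairs, largest count first
```

A `Counter` returns 0 for names that never occur, so no `get(word, 0)` default is needed when reading it.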

Draw the word cloud:

items = list(counts.items())  # convert the dict to a list of (name, count) pairs
wordcloud = WordCloud()
wordcloud.add("", items, word_size_range=[15, 80], rotate_step=30, shape="cardioid")
wordcloud.render(os.path.join(path, "wordcloud.html"))

After running the script, open the generated file.


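The two characters 曹操 (Cao Cao) are displayed largest, which means this name occurs most often in the whole book. That can't be right: Luo Guanzhong was a fan of 刘备 (Liu Bei).

Thinking it over, in the Three Kingdoms era addressing someone directly by their given name was an insult, a put-down. The so-called upright characters all go by respectful forms of address, such as 卧龙 or 诸葛.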
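
One quick way to test this hunch before touching the script is to compare raw substring counts of a given name against a courtesy name. The following is only a sketch: `mention_counts` is an illustrative helper, and the sample sentence is made up for the example rather than quoted from the book.

```python
def mention_counts(text, names):
    """Count non-overlapping substring occurrences of each name in text."""
    return {name: text.count(name) for name in names}


# Hypothetical sample sentence, not a quotation from the book.
sample_text = "玄德曰:吾乃刘备也。孔明笑而不答,玄德再拜。"

sample_mentions = mention_counts(sample_text, ["刘备", "玄德", "孔明"])
print(sample_mentions)
```

Because `str.count` works on the raw text, it does not depend on how jieba splits the sentence, though it also counts a name that occurs inside a longer one (for example 玄德 inside 刘玄德).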

So I changed the code to match these respectful names as well.

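# each set holds the alternative names that should be counted for that character
刘备 = {"玄德", "玄德曰", "先主", "刘豫州", "刘皇叔", "刘玄德", "刘使君"}
诸葛亮 = {"孔明", "孔明曰", "卧龙", "卧龙先生", "诸葛先生", "孔明先生", "诸葛丞相", "诸葛"}
关羽 = {"关公", "云长", "汉寿亭侯", "关云长"}
曹操 = {"孟德", "曹孟德", "曹操"}
张飞 = {"张翼德", "翼德"}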

The counting part was adjusted accordingly.

for word in words:    # filter the segmented tokens
    if len(word) <= 1:   # a token counts as a word only if it has more than one character, so skip the rest
        continue
    elif word in 刘备:    # fold every alias into the character's canonical name
        counts["刘备"] = counts.get("刘备", 0) + 1
    elif word in 诸葛亮:
        counts["诸葛亮"] = counts.get("诸葛亮", 0) + 1
    elif word in 曹操:
        counts["曹操"] = counts.get("曹操", 0) + 1
    elif word in 关羽:
        counts["关羽"] = counts.get("关羽", 0) + 1
    elif word in 张飞:
        counts["张飞"] = counts.get("张飞", 0) + 1
    elif word in 其他:    # 其他 = "others": characters counted under their own name only
        counts[word] = counts.get(word, 0) + 1
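
The `elif` chain grows by one branch per character. A minimal sketch of an alternative is a single lookup table from each alias to a canonical name. `build_alias_map` and `count_with_aliases` are illustrative names, and the token list is a hypothetical stand-in for `jieba.lcut` output. Unlike the alias sets for 刘备, 诸葛亮, 关羽 and 张飞 in this post, the sketch also maps each canonical name to itself. That is an assumption about the intended behavior, and applying it to the real text would change the totals.

```python
def build_alias_map(aliases_by_role):
    """Flatten {canonical: aliases} into a single {alias: canonical} lookup."""
    alias_map = {}
    for canonical, aliases in aliases_by_role.items():
        alias_map[canonical] = canonical  # assumption: count direct mentions too
        for alias in aliases:
            alias_map[alias] = canonical
    return alias_map


def count_with_aliases(tokens, alias_map):
    """Tally tokens under their canonical name; ignore unknown tokens."""
    counts = {}
    for token in tokens:
        canonical = alias_map.get(token)
        if canonical is not None:
            counts[canonical] = counts.get(canonical, 0) + 1
    return counts


# Shortened alias sets and hypothetical tokens, for illustration only.
aliases_by_role = {
    "刘备": {"玄德", "刘皇叔"},
    "诸葛亮": {"孔明", "卧龙"},
}
alias_map = build_alias_map(aliases_by_role)
sample_tokens = ["玄德", "曰", "孔明", "刘备", "卧龙", "诸葛亮", "孔明"]

merged = count_with_aliases(sample_tokens, alias_map)
print(merged)
```

With this layout, adding a character or a new alias means editing the dictionary only; the counting loop stays the same.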

 

Run it again: 诸葛亮 (Zhuge Liang) comes out on top, with 1350 appearances in total, while 刘备 (Liu Bei) appears 1271 times in total.
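
To read totals like these without opening the word cloud, the counts can be sorted in descending order and printed. This is a minimal sketch: `rank_counts` is an illustrative helper name, and the two values are simply the totals reported in this post, standing in for the script's `counts` dict.

```python
def rank_counts(counts, top_n=10):
    """Return (name, count) pairs sorted by count, largest first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# The two totals reported in the post, used here in place of the real counts dict.
reported = {"刘备": 1271, "诸葛亮": 1350}

for name, total in rank_counts(reported):
    print(name, total)
```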


The full code is attached below.

# -*- coding: utf-8 -*-

import jieba   # import the jieba library
import os
import sys
from collections import Counter  # word-frequency counting after segmentation
from pyecharts.charts import WordCloud  # word cloud


path = os.path.abspath(os.path.dirname(sys.argv[0]))
# read the text of the book from the script's directory
with open(os.path.join(path, "三国演义.txt"), "r", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)   # segment the text into a list of tokens with jieba
counts = {}

# Alias sets. The names 刘备, 诸葛亮, 关羽 and 张飞 themselves are in none of
# the sets below, so only their aliases are counted for those four characters.
刘备 = {"玄德", "玄德曰", "先主", "刘豫州", "刘皇叔", "刘玄德", "刘使君"}
诸葛亮 = {"孔明", "孔明曰", "卧龙", "卧龙先生", "诸葛先生", "孔明先生", "诸葛丞相", "诸葛"}
关羽 = {"关公", "云长", "汉寿亭侯", "关云长"}
刘禅 = {"后主"}   # defined here but not used in the loop below
曹操 = {"孟德", "曹孟德", "曹操"}
张飞 = {"张翼德", "翼德"}

# 其他 = "others": characters counted under their own name only
其他 = {"孙权", "赵云", "司马懿", "周瑜", "刘禅", "袁绍", "马超", "魏延", "黄忠", "姜维", "马岱", "庞德", "孟获", "刘表", "董卓", "孙策",
        "鲁肃", "司马昭", "夏侯渊", "王平", "刘璋", "袁术", "吕蒙", "甘宁", "邓艾", "曹仁", "陆逊", "许褚", "庞统", "曹洪", "李典", "曹丕", "廖化", "曹真", "吕布"}

for word in words:    # filter the segmented tokens
    if len(word) <= 1:   # a token counts as a word only if it has more than one character, so skip the rest
        continue
    elif word in 刘备:    # fold every alias into the character's canonical name
        counts["刘备"] = counts.get("刘备", 0) + 1
    elif word in 诸葛亮:
        counts["诸葛亮"] = counts.get("诸葛亮", 0) + 1
    elif word in 曹操:
        counts["曹操"] = counts.get("曹操", 0) + 1
    elif word in 关羽:
        counts["关羽"] = counts.get("关羽", 0) + 1
    elif word in 张飞:
        counts["张飞"] = counts.get("张飞", 0) + 1
    elif word in 其他:
        counts[word] = counts.get(word, 0) + 1

items = list(counts.items())  # convert the dict to a list of (name, count) pairs

wordcloud = WordCloud()
wordcloud.add("", items, word_size_range=[15, 80], rotate_step=30, shape="cardioid")
wordcloud.render(os.path.join(path, "wordcloud.html"))

 


Original post: https://www.cnblogs.com/becks/p/14783258.html
