Topic Extraction Method Based on SLDTM

Guo Xiaoli, Zhou Zilan

Journal of Northeast Electric Power University, 2017, Vol. 37, Issue 5: 80-86.
Section: Information · Computer · Automation


Abstract

In topic extraction, existing LDA-based models have difficulty determining the number of topics and the key time points, and the topics they produce are hard to interpret accurately. To address these problems, this paper proposes SLDTM, which integrates an improved clustering algorithm into the Dynamic Topic Model (DTM) and applies supervised learning with tag information on each subset of the corpus. In SLDTM the size of the sliding window varies with the characteristics of the topic distribution, which yields a more reasonable segmentation of the text collection; the number of topics is also variable, and the resulting topics are easier to understand. Experimental results show that, compared with previous topic models, the topics extracted by SLDTM better reflect important changes in content and have clearer semantics.
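The abstract states only that the window size follows the characteristics of the topic distribution; it does not give the rule. The Python sketch below illustrates one plausible reading under explicit assumptions: each time slice is summarised by its average document-topic distribution, and a window is closed when the next slice's Jensen-Shannon divergence from the window's mean distribution exceeds a threshold. The divergence criterion, the `threshold` and `min_size` parameters, and the helper names `js_divergence` and `adaptive_windows` are all hypothetical and are not the SLDTM procedure itself.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only terms with a > 0 contribute; there b >= a / 2 > 0, so no division by zero.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def adaptive_windows(slice_topics, threshold=0.1, min_size=1):
    """Hypothetical helper: group consecutive time slices into variable-size windows.

    slice_topics: shape (T, K); row t is the average topic distribution of slice t.
    A window is closed when the next slice's JS divergence from the window's mean
    distribution exceeds `threshold` and the window already holds >= min_size slices.
    Returns (start, end) index pairs with `end` exclusive.
    """
    slice_topics = np.asarray(slice_topics, dtype=float)
    if len(slice_topics) == 0:
        return []
    windows = []
    start = 0
    for t in range(1, len(slice_topics)):
        # Mean topic distribution of the slices currently in the open window.
        window_mean = slice_topics[start:t].mean(axis=0)
        if t - start >= min_size and js_divergence(window_mean, slice_topics[t]) > threshold:
            windows.append((start, t))
            start = t
    windows.append((start, len(slice_topics)))
    return windows


# Toy data (invented for illustration): the dominant topic shifts twice.
slices = [
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10],
    [0.15, 0.75, 0.10],
    [0.10, 0.10, 0.80],
]
print(adaptive_windows(slices, threshold=0.1))  # [(0, 2), (2, 4), (4, 5)]
```

With this toy input the two shifts in the dominant topic produce three windows of sizes 2, 2 and 1. Raising `threshold` yields fewer, longer windows, and lowering it splits the corpus more finely. Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this divergence (the JS distance), so a threshold tuned for one is not directly reusable with the other.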

Key words

Topic Extraction / Topic Model / Tag / Text Processing

Cite this article

Guo Xiaoli, Zhou Zilan. Topic Extraction Method Based on SLDTM. Journal of Northeast Electric Power University, 2017, 37(5): 80-86.
