python - Clustering Using Latent Semantic Analysis -
Suppose I have a corpus of documents and I run the LSA algorithm on it. How can I use the final matrix obtained after applying SVD to semantically cluster the words appearing in the corpus? Wikipedia says LSA can be used to find relations between terms. Is there a library available in Python that can help me accomplish this task of semantically clustering words based on LSA?
Try gensim (http://radimrehurek.com/gensim/index.html); install it by following these instructions: http://radimrehurek.com/gensim/install.html
Then here is a code sample:
from gensim import corpora, models

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once] for text in texts]

dictionary = corpora.Dictionary(texts)
corp = [dictionary.doc2bow(text) for text in texts]

# extract 400 LSI topics; use the default one-pass algorithm
lsi = models.lsimodel.LsiModel(corpus=corp, id2word=dictionary, num_topics=400)

# print the most contributing words (both positively and negatively) for each of the first ten topics
lsi.print_topics(10)