Perform Latent Dirichlet Allocation (LDA) on a document-term matrix.

lda(dtm, topics = 2L, iter = 1000L, alpha = 0.1, beta = 0.1)

# S3 method for dtm
lda(dtm, topics = 2L, iter = 1000L, alpha = 0.1,
  beta = 0.1)

Arguments

dtm

An object of class document_term_matrix as returned by document_term_matrix.

topics, iter

Number of topics to fit and number of iterations to run.

alpha

Dirichlet distribution hyperparameter for the topic distribution of each document. alpha < 1 yields a sparse topic mixture for each document (a few topics dominate); alpha > 1 yields a more uniform topic mixture for each document.

beta

Dirichlet distribution hyperparameter for the word distribution of each topic. beta < 1 yields a sparse word mixture for each topic (a few words dominate); beta > 1 yields a more uniform word mixture for each topic.
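
The effect of these hyperparameters can be illustrated without the package. The sketch below uses base R only and draws from a symmetric Dirichlet distribution; rdirichlet_sym is a hypothetical helper written for this illustration and is not part of this package.

```r
# Hypothetical helper (not part of this package): draw n samples from a
# symmetric Dirichlet(alpha, ..., alpha) over k categories by normalising
# independent Gamma(alpha, 1) draws.
rdirichlet_sym <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1), nrow = n, ncol = k)
  g / rowSums(g)  # divide each row by its own sum
}

set.seed(42)
sparse_mix  <- rdirichlet_sym(2000, k = 5, alpha = 0.1)  # alpha < 1
uniform_mix <- rdirichlet_sym(2000, k = 5, alpha = 10)   # alpha > 1

# Every row is a probability vector over the 5 categories.
range(rowSums(sparse_mix))

# Average weight of the largest component per draw: near 1 when most of the
# mass sits on one category, nearer 1/k when the mixture is even.
mean(apply(sparse_mix, 1, max))
mean(apply(uniform_mix, 1, max))
```

The same intuition applies to both alpha (topics per document) and beta (words per topic): smaller values concentrate the mass on fewer categories.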

Value

A list containing:

  • ntopics_nwords: an ntopics * nwords sparse matrix of probabilities s.t. \(sum(ntopics_nwords, 1) == 1\).

  • ntopics_ndocs: an ntopics * ndocs dense matrix of probabilities s.t. \(sum(ntopics_ndocs, 1) == 1\).
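
As a rough illustration of how the document-topic matrix might be read, the sketch below uses a made-up 2 x 4 matrix in place of real output. The values are hypothetical, and the sketch assumes that the normalisation condition above refers to column sums, so that each column holds one document's topic mixture.

```r
# Hypothetical stand-in for the ntopics_ndocs element: 2 topics (rows) by
# 4 documents (columns). The values are invented for illustration.
theta <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4,
    0.3, 0.7),
  nrow = 2,  # filled column by column, so each pair above is one document
  dimnames = list(paste0("topic", 1:2), paste0("doc", 1:4))
)

# Assumed reading of the constraint: each column sums to 1.
stopifnot(isTRUE(all.equal(unname(colSums(theta)), rep(1, ncol(theta)))))

# Most probable topic for each document.
dominant_topic <- rownames(theta)[apply(theta, 2, which.max)]
names(dominant_topic) <- colnames(theta)
dominant_topic
```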

Examples

# NOT RUN {
init_textanalysis()

# build documents
doc1 <- string_document("First document. Another sentence")
doc2 <- string_document("Some example written here.")
doc3 <- string_document("This is a string document")
doc4 <- string_document("Yet another document for the corpus.")

crps <- corpus(doc1, doc2, doc3, doc4)

update_lexicon(crps)

m <- document_term_matrix(crps)
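# fit 2 topics; the result is stored under a name that does not mask lda()
fit <- lda(m, topics = 2L, iter = 1000L, alpha = 0.1, beta = 0.1)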
# }