The lexicon of a corpus consists of all the terms that occur in any document in the corpus. The lexical frequency of a term tells us how often that term occurs across all of the documents. Often the most interesting words in a document are those whose frequency within the document is higher than their frequency in the corpus as a whole.

lexicon(corpus)

update_lexicon(corpus)

# S3 method for corpus
lexicon(corpus)

# S3 method for corpus
update_lexicon(corpus)

Arguments

corpus

A corpus, as returned by corpus.

Examples

# NOT RUN {
init_textanalysis()

# build documents
doc1 <- string_document("First document.")
doc2 <- string_document("Second document.")

# do not automatically update
corpus <- corpus(doc1, doc2, update_lexicon = FALSE)

update_lexicon(corpus)
lexicon(corpus)
# }
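
The description's idea, that interesting words are those more frequent within a document than in the corpus as a whole, can be sketched in base R. This is only an illustration under stated assumptions: it does not call the package, the helper `tokenize` is hypothetical, and the package's own tokenization and frequency counts may differ.

```r
# Base-R sketch only; none of this is the textanalysis API.
docs <- c(doc1 = "First document.", doc2 = "Second document.")

# Hypothetical helper: lowercase, strip punctuation, split on whitespace.
tokenize <- function(text) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]
  words[nzchar(words)]
}

tokens <- lapply(docs, tokenize)
all_tokens <- unlist(tokens, use.names = FALSE)

# Lexicon: every distinct term that occurs in any document.
lexicon_terms <- sort(unique(all_tokens))

# Relative frequency of each term across the whole corpus.
corpus_freq <- as.vector(table(factor(all_tokens, levels = lexicon_terms))) /
  length(all_tokens)
names(corpus_freq) <- lexicon_terms

# Relative frequency of each term within each document (terms x documents).
doc_freq <- sapply(tokens, function(tk) {
  as.vector(table(factor(tk, levels = lexicon_terms))) / length(tk)
})
rownames(doc_freq) <- lexicon_terms

# A ratio above 1 marks a term that is over-represented in that document.
ratio <- doc_freq / corpus_freq
print(ratio)
```

In this toy corpus, "document" appears in both documents at the same rate as in the corpus overall, so its ratio is 1, while "first" and "second" each stand out in the one document that contains them.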