documents.Rd
The basic unit of text analysis is a document. The textanalysis package allows one to work with documents stored in a variety of formats.
file_document(path)
string_document(text)
token_document(text)
ngram_document(text, ...)
Argument | Description |
---|---|
path | The path to the file. |
text | The text as a character string, or tokens, or ngrams as a list. |
... | Other positional arguments. |
An object of class document.
directory_corpus to read a directory of files as a corpus, and to_documents to parse a vector or data.frame to documents.
# NOT RUN {
init_textanalysis()

doc <- "This is a document."
string_document(doc)
ngram_document(doc, 2L)
# }
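The example above builds documents from an in-memory string. As a complement, here is a minimal sketch of reading a document from disk with file_document. It assumes the textanalysis package and its backend are installed and initialised. The temporary file and its contents are made up for illustration, and the exact printed form of the result is not shown because it may vary.

```r
# Sketch only: assumes textanalysis is installed and can be initialised.
library(textanalysis)

init_textanalysis()

# Create a small throwaway text file using base R (illustrative content).
path <- tempfile(fileext = ".txt")
writeLines("This is a document.", path)

# Build a document from the file path instead of from a string.
doc <- file_document(path)

# Per the Value section, this should be an object of class document.
class(doc)

# Remove the temporary file.
unlink(path)
```

Passing a path keeps the text on disk until it is needed, which may be preferable when working with many or large files. For whole directories, directory_corpus is the documented route.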