The basic unit of text analysis is a document. The textanalysis package can build a document from a file, a character string, a list of tokens, or a list of n-grams.

file_document(path)

string_document(text)

token_document(text)

ngram_document(text, ...)

Arguments

path

The path to the file.

text

The text as a character string, or a list of tokens or n-grams.

...

Other positional arguments.

Value

An object of class document.

See also

directory_corpus to read a directory of files as a corpus, and to_documents to parse a vector or data.frame into documents.

Examples

# NOT RUN {
init_textanalysis()

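doc <- "This is a document."

# Build a document from a character string
string_document(doc)

# Build an n-gram document; 2L is passed through ... (here, bigrams)
ngram_document(doc, 2L)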
# }
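
To make the difference between a string, its tokens, and its n-grams concrete, here is a minimal sketch in base R. It does not call textanalysis and is not how the package tokenizes internally. The lowercasing and punctuation stripping are assumptions made only for illustration.

```r
# Base R only: an illustration of tokens and bigrams, not the package's method.
text <- "This is a document."

# Lowercase and strip punctuation (an assumed normalisation step),
# then split on whitespace to obtain tokens.
clean <- gsub("[[:punct:]]", "", tolower(text))
tokens <- strsplit(clean, "[[:space:]]+")[[1]]

# Pair each token with its successor to form bigrams.
bigrams <- paste(head(tokens, -1), tail(tokens, -1))

print(tokens)
print(bigrams)
```

A string document holds the raw text, a token document holds something like `tokens`, and an n-gram document with n = 2 holds something like `bigrams`.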