Remove corrupt UTF-8 characters that might cause issues. Running this is recommended.

remove_corrupt_utf8(text)

# S3 method for corpus
remove_corrupt_utf8(text)

# S3 method for documents
remove_corrupt_utf8(text)

# S3 method for document
remove_corrupt_utf8(text)

Arguments

text

An object inheriting from class document or corpus.

Examples

# NOT RUN {
init_textanalysis()

# build document
doc <- string_document("this document is clean")

# modifies doc in place; no reassignment needed
remove_corrupt_utf8(doc)
# }
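
To illustrate what "corrupt UTF-8" means, here is a minimal base R sketch that does not use this package. It shows one common way to drop bytes that are not valid UTF-8, using `iconv()` with `sub = ""`. The helper name `strip_invalid_utf8` is hypothetical, and this is not necessarily how `remove_corrupt_utf8()` works internally. The exact behaviour of `iconv()` can also vary with the platform's iconv implementation.

```r
# Hypothetical helper (not part of the package): drop any bytes that
# cannot be interpreted as UTF-8 by converting UTF-8 to UTF-8 and
# substituting an empty string for non-convertible bytes.
strip_invalid_utf8 <- function(x) {
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
}

# Build a string with a stray byte: 0xff is never valid in UTF-8.
bad <- rawToChar(as.raw(c(0x63, 0x6c, 0x65, 0xff, 0x61, 0x6e)))

# validUTF8() (base R >= 3.3.0) reports whether a string is valid UTF-8.
validUTF8(bad)   # FALSE

# After stripping, the string should validate.
clean <- strip_invalid_utf8(bad)
validUTF8(clean)
```

Unlike the base R sketch, which returns a new string, `remove_corrupt_utf8()` is documented above as operating on a document or corpus object in place.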