Setup

Initialisation functions.

init_textanalysis() init_stringanalysis() switch_backend() install_textanalysis() install_stringanalysis()

Initialise Session

Objects

Create documents and corpora.

file_document() string_document() token_document() ngram_document()

Document

remove_case()

Remove Uppercase

remove_corrupt_utf8()

Remove Corrupt UTF8

remove_words()

Remove Specific Words

corpus() directory_corpus()

Corpus

to_documents()

Create Multiple Documents

standardize()

Standardize

inverse_index() update_inverse_index()

Inverse Index

lexicon() update_lexicon()

Lexicon

Metadata

Get metadata on documents and corpora.

title_() language_() author_() timestamp_()

Document Metadata

titles_() languages_() authors_() timestamps_()

Corpus Metadata

get_ngrams()

Extract NGrams

get_text()

Extract Text

get_tokens()

Extract Tokens

Preprocessing

Clean documents and corpora.

prepare()

Preprocess Document

strip_articles()

Strip Articles

strip_definite_articles()

Strip Definite Articles

strip_frequent_terms()

Strip Frequent Terms

strip_html_tags()

Strip HTML Tags

strip_indefinite_articles()

Strip Indefinite Articles

strip_non_letters()

Strip Non-Letters

strip_numbers()

Strip Numbers

strip_preprositions()

Strip Prepositions

strip_pronouns()

Strip Pronouns

strip_punctuation()

Strip Punctuation

strip_sparse_terms()

Strip Sparse Terms

strip_stopwords()

Strip Stopwords

stem_words()

Stem

Term Frequency

Term frequency-related functions.

tf()

Term Frequency

tf_idf()

Term Frequency Inverse Document Frequency
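
For orientation, the weighting named above can be sketched in a few lines of Python. This is a standalone illustration, not this package's implementation: it assumes the common variant tf = count / document length and idf = log(N / df), and `tf_idf()` here may normalise differently. The helper name `tf_idf_weights` is hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus of two tokenised documents (illustrative data only).
docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
weights = tf_idf_weights(docs)
```

A term that occurs in every document, such as "the" in the toy corpus, gets weight zero under this variant, because its idf is log(1).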

bm_25()

Okapi BM-25
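
As a rough guide to what a BM25 score measures, here is a minimal Python sketch of the usual Okapi BM25 formula. It is independent of this package: the helper name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the non-negative idf variant log((N - n + 0.5) / (n + 0.5) + 1) are all assumptions, and `bm_25()` may differ in each of them.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query terms.

    Assumes at least one non-empty document, so the average length is > 0.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = 0.0
        for term in query:
            freq = counts[term]
            if freq == 0:
                continue  # absent terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalisation: longer-than-average documents are penalised.
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (illustrative data only).
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["a", "cat", "and", "a", "dog"],
]
scores = bm25_scores(["cat"], docs)
```

With this toy corpus, the first and third documents each contain "cat" once, but the shorter first document scores higher because of the length normalisation controlled by `b`.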

document_term_matrix()

DocumentTermMatrix

document_term_vector()

Document Term Vector

dtm_matrix()

Sparse Matrix

Features

Models and text metrics.

coom()

Co-occurrence Matrix

create_hash_function() hash()

Hash Trick
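
The idea behind the hash trick (feature hashing) can be sketched as follows in Python. This is an illustration under assumptions, not this package's code: the helper name `hashed_features`, the bucket count, and the use of MD5 as the hash are all hypothetical choices; `create_hash_function()` and `hash()` may work differently.

```python
import hashlib


def hashed_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector without storing a vocabulary."""
    vector = [0] * n_buckets
    for token in tokens:
        # A stable hash is used so the same token always lands in the same bucket
        # (Python's built-in hash() is salted per process for strings).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        vector[index] += 1
    return vector


vec = hashed_features(["the", "cat", "sat", "the"])
```

The vector length is fixed in advance, so unseen words never change the feature dimension; the cost is that distinct words can collide in the same bucket.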

init_naive_classifer()

Naive Bayes Classifier

lda()

Latent Dirichlet Allocation

lsa()

Latent Semantic Analysis

lexical_frequency()

Lexical Frequency

lexicon_size()

Lexicon Size

ngram_complexity()

Determine NGram Complexity

sentiment()

Sentiment Analyzer

summarizer()

Summarize

train_naive_classifier()

Train Naive Bayes Classifier

predict_class()

Predict Class

Miscellaneous

Utility functions.

corpus_to_tibble()

Convert Corpus

set_seed()

Set Seed in Julia