Setup

This document walks you through the general concepts of textanalysis and demonstrates the broad workflow of the package. First, you need to install the package; instructions are on the homepage. Once installed, the package can be loaded.

library(textanalysis)

Because the package depends on Julia, we must initialise the session.

init_textanalysis()
#> Julia version 1.1.1 at location /home/jp/Downloads/julia-1.1.1-linux-x86_64/julia-1.1.1/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
#> ✔ textanalysis initialised.

Document

Once the package is installed and loaded, and the session initialised, we can start using it. The most basic object type in textanalysis is the document. It can be created with any of the *_document functions, though you will likely only need string_document, which creates a document from a character string.

str <- "This is a very simple document!"
(doc <- string_document(str))
#> ℹ A document.

You can always get the content of the document with get_text.

get_text(doc)
#> [1] "This is a very simple document!"

Turning a character string into a document makes it easy to clean it, or "prepare" it in textanalysis jargon. There are a multitude of ways to clean text in textanalysis; they are detailed in the preprocessing vignettes. Here we use the straightforward prepare function, leaving all arguments at their defaults.

prepare(doc)
#> ⚠ This function changes `document` in place!

Notice the warning: the prepare function changes the object doc in place. We did not assign the (nonexistent) output of prepare to a new object, yet the object doc changed. Let’s demonstrate.

get_text(doc) # see the document changed!
#> [1] "    simple document"

This is somewhat puzzling for us R users, who are used to functions returning a modified copy, but it happens for a good reason: textanalysis does not need to make a copy of the object, which allows processing more data.
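For readers who have not met in-place modification in R before, the contrast can be sketched with base R alone. This is only an analogy under stated assumptions, not a description of how textanalysis is implemented: the helper names lower_value and lower_reference are hypothetical, and an environment stands in for the document object because environments are the base R type that is modified by reference rather than copied.

```r
# Base R only: an analogy for in-place modification, not textanalysis internals.

# A list follows R's usual copy-on-modify rule: the function works on a
# local copy, so the caller's object is left untouched.
lower_value <- function(x) {
  x$text <- tolower(x$text)
  invisible(NULL)
}

# An environment has reference semantics: the function modifies the very
# object the caller holds, much like prepare() does with a document.
lower_reference <- function(env) {
  env$text <- tolower(env$text)
  invisible(NULL)
}

as_list <- list(text = "A Simple Document")
as_env <- new.env()
as_env$text <- "A Simple Document"

lower_value(as_list)
lower_reference(as_env)

as_list$text # unchanged, the function only altered its own copy
#> [1] "A Simple Document"
as_env$text # changed in place, without any assignment
#> [1] "a simple document"
```

If you need the original text after calling prepare, keep the character string itself (str in the example above) rather than relying on the document.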

However, you will rarely use the package in this manner, since you will usually have multiple documents to process at once. You can do so with the to_documents function, which creates multiple documents from a vector or data.frame.

texts <- c(
  str,
  "This is another document"
)

(docs <- to_documents(texts))
#> ℹ 2 documents.