get_started.Rmd
This document walks you through the general concepts of textanalysis and demonstrates the broad workflow of the package. You first need to install the package; instructions are on the homepage. Once installed, the package can be loaded.
library(textanalysis)
Because the package depends on Julia, we must initialise the session.
init_textanalysis()
#> Julia version 1.1.1 at location /home/jp/Downloads/julia-1.1.1-linux-x86_64/julia-1.1.1/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
#> ✔ textanalysis initialised.
Once the package is installed and loaded, and the session initialised, we can start using it. The most basic object type in textanalysis is the document; it can be created with any of the *_document functions, though you will likely only need string_document to create a document from a character string.
str <- "This is a very simple document!"
(doc <- string_document(str))
#> ℹ A document.
You can always get the content of the document with get_text.
get_text(doc)
#> [1] "This is a very simple document!"
Turning a character string into a document makes it easy to clean it, or prepare it in textanalysis jargon. There are a multitude of ways to clean text in textanalysis; they are detailed in the preprocessing vignettes. Here we use the straightforward prepare function, leaving all arguments at their defaults.
prepare(doc)
#> ⚠ This function changes `document` in place!
Notice the warning: the prepare function changes the object doc in place. We did not assign the (nonexistent) output of prepare to a new object, and yet the object doc changed. Let’s demonstrate.
get_text(doc) # see the document changed!
#> [1] " simple document"
This is somewhat puzzling for us R users, but there is a good reason for it: textanalysis does not need to make a copy of the object, which allows processing more data.
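The difference between copying and modifying in place can be illustrated in base R alone. The following is a minimal sketch and says nothing about how textanalysis is implemented internally: `shout_copy` and `shout_in_place` are hypothetical helpers written for this illustration, and an environment stands in for an object with reference semantics.

```r
# Ordinary R values are copied on modification:
# the caller's object is left untouched.
shout_copy <- function(x) {
  x <- toupper(x) # only the local copy changes
  invisible(NULL)
}

s <- "a very simple document"
shout_copy(s)
s # still lower case

# Environments have reference semantics:
# changes made inside the function are visible to the caller.
# `shout_in_place` is a hypothetical helper, not part of textanalysis.
shout_in_place <- function(env) {
  env$text <- toupper(env$text) # the shared object changes
  invisible(NULL)
}

e <- new.env()
e$text <- "a very simple document"
shout_in_place(e)
e$text # now upper case, although nothing was assigned
```

As with prepare above, the second helper returns nothing useful; its effect is only visible by inspecting the object afterwards.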
However, you will rarely process a single document like this, as you will usually have multiple documents to process at once. You can do so with the to_documents function, which creates multiple documents from a vector or data.frame.
texts <- c(
  str,
  "This is another document"
)
(docs <- to_documents(texts))
#> ℹ 2 documents.