Digitized Articles of Estonia is searchable through the web interface https://dea.digar.ee/ and accessible as a dataset. The overview of the data is on a separate page.
Access to the data is provided in the cloud, in a JupyterHub environment where you can run code and write Jupyter Notebooks in R or Python.
The JupyterHub environment gives access to the full texts and metadata, and lets you write your own analyses and download your findings. The data is open for anyone to use.
To use the environment, you need a user account in ETAIS. To get one, please contact data@digar.ee.
To make access to the data easier, the R package digar.txts helps you form subsets of the collection and run full-text searches on them.
To process the data, you can use your own code, build on the example case studies, or export your search results as a table.
Access to the files is supported by the R package digar.txts, which provides a few simple commands to 1) get an overview of the data and the associated files, 2) form subsets of the data, 3) perform text searches on them, and 4) extract the immediate context of the matches. The search results can, for example, be stored in a table and downloaded as a smaller collection for offline use.
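To make the four steps concrete, here is a toy sketch in base R that runs on a small made-up table instead of the real collection. It does not use digar.txts: the columns (id, type, year, txt), the texts, and the helper kwic() are hypothetical, and the package's own functions may work differently (for example, they read and write files). The context window in this sketch is measured in characters.

```r
# Toy illustration of the four steps on made-up data. It does NOT call
# digar.txts; all names, columns and texts below are hypothetical.

# 1) Overview: one row per item, with hypothetical metadata columns
overview <- data.frame(
  id   = c("a1", "a2", "a3", "b1"),
  type = c("NEWSPAPER", "NEWSPAPER", "NEWSPAPER", "BOOK"),
  year = c(1885, 1912, 1950, 1900),
  txt  = c("Lurich won the match in Tartu.",
           "News from abroad: lurich and Hackenschmidt.",
           "Lurich was remembered years later.",
           "A book that mentions Lurich."),
  stringsAsFactors = FALSE
)

# 2) Subset: newspapers from 1881-1939
news <- overview[overview$type == "NEWSPAPER" &
                 overview$year > 1880 & overview$year < 1940, ]

# 3) Search: keep the texts that match the pattern ([Ll] covers both cases)
pattern <- "[Ll]urich"
hits <- news[grepl(pattern, news$txt), ]

# 4) Context: up to `width` characters before and after every match
kwic <- function(id, txt, pattern, width = 10) {
  m <- gregexpr(pattern, txt)[[1]]          # start positions of all matches
  if (m[1] == -1) return(NULL)              # no match in this text
  len <- attr(m, "match.length")
  data.frame(
    id     = id,
    before = substring(txt, pmax(1, m - width), m - 1),
    match  = substring(txt, m, m + len - 1),
    after  = substring(txt, m + len, m + len - 1 + width),
    stringsAsFactors = FALSE
  )
}
concs <- do.call(rbind, Map(kwic, hits$id, hits$txt, pattern))
print(concs)
```

Because kwic() uses gregexpr() and substring(), a text with several matches gets one row per match.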
These commands are:
Any R commands or packages can be used for further processing. The JupyterHub environment supports both R and Python, but each Notebook is usually written in only one of them.
#Install package remotes, if needed. JupyterLab should have it.
#install.packages("remotes")
#Since the JupyterLab that we use does not have write access to
#all the files, we specify a local folder for our packages.
dir.create("~/R_pckg", showWarnings=FALSE)
remotes::install_github("peeter-t2/digar.txts", lib="~/R_pckg/", upgrade="never")
library(digar.txts, lib.loc="~/R_pckg/")
#1) Get an overview of the collection
all_issues <- get_digar_overview()
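library(tidyverse)
#2) Form a subset: newspaper issues with keyid "postimeesew" from 1881-1939
subset <- all_issues %>%
  filter(DocumentType=="NEWSPAPER") %>%
  filter(year>1880&year<1940) %>%
  filter(keyid=="postimeesew")
#Get the metadata for the subset
subset_meta <- get_subset_meta(subset)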
#potentially write to file, for easier access if returning to it
#readr::write_tsv(subset_meta,"subset_meta_postimeesew1.tsv")
#subset_meta <- readr::read_tsv("subset_meta_postimeesew1.tsv")
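#3) Search the subset for "lurich" and write the results to lurich1.txt
do_subset_search(searchterm="lurich", searchfile="lurich1.txt", subset)
#fread() comes from data.table; the two columns are renamed to id and txt
library(data.table)
texts <- fread("lurich1.txt", header=FALSE)[, .(id=V1, txt=V2)]
#4) Extract the context around each match (before=30, after=30)
concs <- get_concordances(searchterm="[Ll]urich", texts=texts, before=30, after=30, txt="txt", id="id")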
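As an example of further processing, here is a minimal base-R sketch that counts matches per year and saves the counts as a tab-separated file for download. It uses a made-up table: the columns id, year and match are assumptions, not the documented output of get_concordances(). With real results you would need to attach the years yourself, for example by joining with the metadata, assuming it has a year field. The file is written to a temporary folder here; in practice you would pick a path in your home directory.

```r
# Toy concordance table; the columns id, year and match are hypothetical
# and need not match what get_concordances() actually returns.
concs_toy <- data.frame(
  id    = c("a1", "a1", "a2", "a3"),
  year  = c(1885, 1885, 1912, 1925),
  match = c("Lurich", "lurich", "Lurich", "Lurich"),
  stringsAsFactors = FALSE
)

# Count the matches per year (aggregate() sorts the groups by year)
per_year <- aggregate(match ~ year, data = concs_toy, FUN = length)
names(per_year)[2] <- "n_matches"
print(per_year)

# Save as a tab-separated file that can be downloaded from JupyterHub
out_file <- file.path(tempdir(), "lurich_per_year.tsv")
write.table(per_year, out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading the file back gives the same counts
per_year2 <- read.delim(out_file)
```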
Note: to use the Ctrl+Shift+M keyboard shortcut for the %>% pipe in Jupyter, add the following code in Settings -> Advanced Settings Editor… -> Keyboard Shortcuts, in the User Preferences box on the right-hand panel's left side labelled User Preferences.
{
  "shortcuts": [
    {
      "command": "notebook:replace-selection",
      "selector": ".jp-Notebook",
      "keys": ["Ctrl Shift M"],
      "args": {"text": "%>% "}
    }
  ]
}