We will show how researchers in the humanities can access and use the cloud-based corpus at the National Library of Norway from Jupyter Notebook. We cover the following topics:
A problem for many researchers is the use of copyrighted material. However, the full text is often not required; some of its features may suffice, such as a bag of words, a participle count or a character model. None of these features challenges the rights of the copyright holder. A centralized repository of copyrighted material can therefore provide feature sets that suffice for many kinds of analysis.
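As a rough illustration of what such a feature set looks like, the sketch below reduces a text to a bag of words: token counts with word order discarded, so the running text cannot be read back from the result. The function name `bag_of_words` and the simple letter-run tokenizer are assumptions for illustration, not the library's own tooling.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return lowercase token counts for a text; word order is discarded."""
    # Match runs of Unicode letters only (digits and underscores excluded),
    # so Norwegian letters such as æ, ø and å are kept inside tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    features = bag_of_words("Det var en gang. Det var en konge.")
    print(features.most_common(3))
```

A repository could share `features` with a researcher instead of the sentence itself; counts like these support frequency-based analysis while withholding the text.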
An API can serve researchers without programming skills as well as programmers. The latter need documentation of the low-level interface to the cloud, that is, the actual API, while the former want an accessible interface for corpus analysis. Jupyter Notebook provides such an interface through top-level functions and commands written in a programming language such as Python or R.
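The split between a low-level API and top-level notebook functions can be sketched as two layers: one that turns parameters into an HTTP request, and a friendly function that hides it. Everything here is hypothetical: the base URL `https://api.example.org/corpus`, the `concordance` endpoint and its parameters are placeholders, not the National Library's actual API. The `transport` argument lets the request step be swapped out, for example in tests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace with the documented API address.
API_BASE = "https://api.example.org/corpus"


def build_url(endpoint: str, params: dict) -> str:
    """Low-level layer: encode parameters into a request URL."""
    # Sorting the parameters gives a stable, reproducible URL.
    return f"{API_BASE}/{endpoint}?{urlencode(sorted(params.items()))}"


def http_get_json(url: str):
    """Low-level layer: fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def concordance(word: str, corpus_id: str, window: int = 5, transport=http_get_json):
    """Top-level helper a notebook user could call without seeing HTTP details."""
    url = build_url("concordance", {"word": word, "corpus": corpus_id, "window": window})
    return transport(url)
```

In a notebook cell a user would only write something like `concordance("fjell", "my_corpus")`; the programmer-facing layer (`build_url`, `http_get_json`) is where the low-level API documentation matters.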
Ready-made library metadata, such as Dewey decimal codes or topic words, can be integrated and used to build corpora. We will show how metadata can be used to build, select and compare corpora. Participants will be able to build a corpus and analyse it.
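A minimal sketch of metadata-driven corpus work follows, assuming each document record carries a Dewey code, a list of topic words and a bag of words. The `Record` structure, the helper names and the toy records are invented for illustration; they are not the library's metadata schema. Corpora are selected by Dewey prefix or topic word and compared by the ratio of each word's relative frequency in one corpus to that in another.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    urn: str        # document identifier (toy values below)
    dewey: str      # Dewey decimal code, kept as a string to preserve prefixes
    topics: List[str]
    counts: Counter  # bag of words for the document


def select_by_dewey(records: List[Record], prefix: str) -> List[Record]:
    """Build a corpus from all records whose Dewey code starts with prefix."""
    return [r for r in records if r.dewey.startswith(prefix)]


def select_by_topic(records: List[Record], word: str) -> List[Record]:
    """Build a corpus from all records tagged with the given topic word."""
    return [r for r in records if word in r.topics]


def aggregate(corpus: List[Record]) -> Counter:
    """Sum the bags of words of all documents in a corpus."""
    total = Counter()
    for record in corpus:
        total.update(record.counts)
    return total


def compare(a: List[Record], b: List[Record]) -> Dict[str, float]:
    """Relative frequency of each word in corpus a divided by that in corpus b.

    Only words present in both corpora are scored, to avoid division by zero.
    """
    freq_a, freq_b = aggregate(a), aggregate(b)
    n_a, n_b = sum(freq_a.values()), sum(freq_b.values())
    return {
        w: (c / n_a) / (freq_b[w] / n_b)
        for w, c in freq_a.items()
        if freq_b[w] > 0
    }


# Toy records invented for illustration only.
records = [
    Record("toy-1", "839.82", ["roman"], Counter({"hav": 3, "og": 5})),
    Record("toy-2", "839.83", ["roman"], Counter({"fjell": 1, "og": 3})),
    Record("toy-3", "551.4", ["geologi"], Counter({"fjell": 4, "og": 4})),
]

fiction = select_by_dewey(records, "839")
geology = select_by_topic(records, "geologi")
ratios = compare(fiction, geology)
```

Words with a ratio well above 1 are over-represented in the first corpus. Dropping words missing from either corpus is the simplest choice; a smoothed estimate would keep them at the cost of an extra parameter.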
Participants will experiment with the API and get hands-on experience with the tools.
Link to this page: Jupyter and corpus tutorial dhn2019