In the following I want to introduce some very useful tools for working with historical data.
Tools
Voyant Tools
Voyant Tools is an open-source web application for text analysis. It supports the scholarly reading and interpretation of texts or corpora, particularly by scholars in the digital humanities, but also by students and the general public. It can be used to analyze texts that are already online or that users upload.
Its interface is made up of panels, each of which performs a particular analytical task. These panels can also be embedded in external web pages (for example, a web article could include a Voyant panel that creates a word cloud from the article's own text).
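To give a rough idea of what a word cloud panel is built from, the following sketch counts word frequencies in a short text. It is not how Voyant Tools itself is implemented; the function name, the tiny stop-word list, and the sample sentence are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words, ignoring case and stop words."""
    # [^\W\d_]+ matches runs of letters only (including accented letters).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Invented sample sentence, not taken from a real source.
sample = (
    "The letter of the council and the letter of the king "
    "reached the council in March."
)
print(word_frequencies(sample, top_n=2))
```

A word cloud then simply draws the most frequent words in larger type.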
Geoffrey Rockwell's book Hermeneutica: Computer-Assisted Interpretation in the Humanities.
Techniques
Apart from computer tools, there is a series of techniques that must be mastered in order to work with historical texts properly.
For example, you have to know how to:
- preserve texts
- digitize texts
- transcribe texts
- organize and tokenize texts
- archive texts
- interpret texts
Methods for transcription
The transcription of a text begins with finding the text or, if necessary, creating it.
Digitize books
Some of the most important aspects when digitizing a text are:
- Copyright
- Optical character recognition (OCR)
- The file format
Some digital libraries offer books and historical texts that have already been digitized, which is quite useful for the subsequent transcription and analysis.
Google Books
Many of the books available on Google Books have already been processed with OCR and offer a download option in .txt format.
You can read a step-by-step guide here.
Voice to text transcription
Google offers a fairly powerful tool for transcribing oral heritage recordings, although its free use is limited. There is an English manual here.
Manuscript transcription
One of the biggest challenges is the reading and transcription of manuscripts. Who hasn't found that they could no longer read their own notes after a while? Handwritten texts from other eras or cultural contexts create even greater difficulties.
Download Transcript.
Text traceability
Another great challenge is traceability. When we transcribe a text from a manuscript, it is sometimes difficult to find the exact place in the manuscript that corresponds to each transcribed passage.
A good practice is to display the transcription page by page, overlaid on photos of the original, similar to what Google does in Google Books.
Another example is found in the virtual edition of Alfred Escher's letters.
Alfred Escher Briefedition.
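One possible way to keep a transcription traceable is to store, next to each transcribed line, a reference to the page, line, and image it came from. The sketch below is only an illustration of that idea: the class, the field names, the image file names, and the sample text are all hypothetical and are not taken from any of the editions mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed line together with its place in the manuscript."""
    text: str
    page: int        # page number in the manuscript
    line: int        # line number on that page
    image_file: str  # hypothetical file name of the page photo


def locate(segments, phrase):
    """Return (page, line, image_file) for every segment containing phrase."""
    needle = phrase.lower()
    return [
        (s.page, s.line, s.image_file)
        for s in segments
        if needle in s.text.lower()
    ]


# Invented sample transcription.
segments = [
    Segment("Dear friend, I received your letter", 1, 1, "ms_p001.jpg"),
    Segment("and I thank you for the news.", 1, 2, "ms_p001.jpg"),
    Segment("The contract will follow next week.", 2, 1, "ms_p002.jpg"),
]

print(locate(segments, "contract"))
```

With such a structure, every search hit in the transcription leads straight back to the photo of the page where the passage appears.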
Tokenization
There are a number of tools for dividing texts into smaller units. The central question is where and how a text should be divided.
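The question of where to divide can be illustrated with a minimal sketch that uses only Python's standard library. The two functions and the sample text are assumptions for illustration; libraries such as the Natural Language Toolkit offer more robust tokenizers.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Word tokens, keeping internal apostrophes and hyphens together."""
    return re.findall(r"\w+(?:['’-]\w+)*", sentence)


# Invented sample text.
text = "Dr. Escher wrote on 3 May. He didn't mention the well-known contract."

for sentence in split_sentences(text):
    print(split_words(sentence))
```

Note that the naive rule also splits after the abbreviation "Dr.", which shows why the choice of splitting rules matters, especially for historical texts with their own abbreviations and punctuation habits.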
Besides Voyant Tools, you can find a set of tools here:
Online Tools.
Voyant Tools.
Natural Language Toolkit.
Tabea Hirzel
Saturday, January 23, 2021