The Reading tab comprises a set of contextual sections in support of a first reading of a text. These include bibliographical information about the text (title, author, themes, genres, headnote, references) and a choice between a text view that excludes most paratexts and a document view that represents all of the text in the source edition.
Depending on the form, structure, and content of the text, other available contextual sections include a table of contents, a summative view of the poetic form (metre, stanza form, syllabic pattern, rhyme type and scheme), bibliographic information about the source of the text, a statement of editorial principles applied, any text-specific secondary literature, an introductory essay, other versions of the text in ECPA, related works, and a list of other works by the same author.
ECPA supports the reading process through an extensible contextual reading aid function. Hovering over any word displays a set of word properties, including standard spelling, lemma, part of speech, word class, and pronunciation in modern RP according to the International Phonetic Alphabet (IPA). Identical word tokens and lemmas are also highlighted throughout the poem. Additionally, the selected word can be looked up in a number of external dictionaries, thesauri, gazetteers, and reference works.
ECPA makes it easy to add notes or queries to any part of the poetic text by simply clicking on the word or line in question and filling in the annotations form with your details. If the note or query applies not to a word or line, but to an entire stanza or paragraph, or to a piece of paratext, please adjust the context of the annotation settings. Please note that all contributions will be submitted to the editor in the first instance for review. Once peer reviewed, the contribution will be made publicly available under a Creative Commons BY-NC-SA License. This license protects your contributions, but at the same time lets others re-use and build upon them.
We welcome user-contributed content to the Eighteenth-Century Poetry Archive. Throughout the website, you can make contributions to augment an existing piece of information, make a correction to an obvious error (no transcription or edition is without error!), or submit your own notes, glosses, observations, suggestions, readings and interpretations. These contributions help make the resource better for everyone and we thank you for them in advance! These places of peer participation are marked with an icon and are subject to a peer-review process. If you would like to be acknowledged for your contribution, please fill in your contact details in the lower half of the form.
The Analysis tab comprises results from a number of computationally-assisted analytical processes on five core linguistic levels. These analytical layers can be studied individually (hence their containers are collapsible and sortable) when focusing on a particular aspect, for example the relation between verse line and syntax. However, they should be considered as connected and interrelated when studying the poem as a whole, as each layer interacts with the others at any given point as well as over time. They require the reader to pay attention to them simultaneously, and to take into account literary as well as sociological features while doing so.
The analysis results thus represent a means of assisting the reader in the task of analysing a poem on a number of interrelated levels. We understand computer-assisted reading as enabling us to realize the full scope of phenomena on phonological, morphological, syntactic, semantic, and pragmatic levels that can be expressed and analysed algorithmically. The key task of fine-tuning the contextual data around every word in a text, i.e. assuming unique significance of every occurrence and its relations to other words, is at the heart of creating the inventory of analytical results. Any of these findings, once verified, are potentially useful, but it is through selection based on relevance and weighting in the context of an individual poem that this potential is fully realized and new insights can be won.
The verse line serves as the basic reference point for all analytical results. Visually as well as rhythmically accentuated, it provides an accessible level of granularity between smaller units such as phonemes, morphemes, and lexemes, and bigger building blocks, such as sentences, stanzas, and ultimately the text as a whole. When hovering over any line the analytical results for that line are dynamically updated on the right. In addition, an integrated view of all analytical layers is displayed when hovering over any word/token (identical words/tokens and lemmas are also highlighted throughout the poem).
As any computationally-assisted analysis of poetic texts that transcends basic quantitative levels is notoriously complex, frequently ambiguous and always error-prone, a word of caution must preface any such undertaking. Despite our best efforts to present useful and interesting results and highlight potential avenues for further investigation, we are aware of the pitfalls and many potential sources of errors, which we will continue to address as ECPA evolves and matures. The main issue of the use of natural language processing (NLP) on language that is both poetic and historically distant can only be addressed through ongoing work to improve domain appropriation, the training of tools on historical corpora, and the contribution of human knowledge in the form of textual notes, glosses, corrections and queries, interpretations, and models.
The long-term aim of this applied computational criticism is to make it possible, analytically, to "zoom" into a single phoneme, morpheme, or lexeme and to seamlessly "zoom" out to the word, the line, the stanza, the whole poem, or even a cluster of poems, thus supplementing the current focus on close reading of individual texts with an analysis of the historical and cultural functions of poetic form on a larger scale. At the functional level, it should be possible to enhance the website with new tools and let them operate on the underlying texts. Analysis thus becomes an act not separate from, but integrated with, the act of reading. Integrating the tools with the corpus, rather than keeping them as a separate entity, turns the texts into "research objects".
This layer is concerned with elements traditionally associated with the musical aspects of lyric poems. These include metre, rhythm, rhetorical figures such as alliteration, assonance, and consonance, and, of course, rhyme. In the context of the long tradition of oral transmission of poetry, these patterns of repetition and composition (sonic, rhythmic, or otherwise), contribute to making poems more cohesive and memorizable. More generally, the sound schemata in this layer interact, frequently supporting, countering, or playing with the components of the other layers.
Throughout the 18th century, the dominant prosodic mode is accentual-syllabic, which is based on recurrent units (feet) comprised of any combination of stressed and unstressed syllables in an invariant sequence. This sequence, which is abstracted from the observed combination of stressed and unstressed syllables, is assigned to the poem as its metrical pattern or metre. Distinct from metre, we define realisation, narrowly, as the actual verse pattern of stressed and unstressed syllables (line rhythm) that contrasts or coincides with the metrical pattern.*
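The distinction between metre and realisation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the notation ("x" for unstressed, "/" for stressed), the function name, and the sample line rhythm are invented for this example, and each metrical position is assumed to hold exactly one syllable, which real verse does not guarantee.

```python
# Hypothetical notation: "x" = unstressed / non-prominent, "/" = stressed / prominent.
def deviations(metre: str, realisation: str) -> list[tuple[int, str]]:
    """Return (syllable position, kind) wherever line rhythm departs from the metre."""
    if len(metre) != len(realisation):
        raise ValueError("patterns must cover the same number of syllables")
    found = []
    for position, (expected, actual) in enumerate(zip(metre, realisation), start=1):
        if expected == actual:
            continue
        if actual == "/":
            kind = "stress in non-prominent position"
        else:
            kind = "no stress in prominent position"
        found.append((position, kind))
    return found


iambic_pentameter = "x/" * 5
# An invented line rhythm that opens with a stressed syllable.
line_rhythm = "/xx/x/x/x/"
print(deviations(iambic_pentameter, line_rhythm))
```

Comparing the two patterns position by position is what makes phenomena such as a stressed syllable in a metrically non-prominent position easy to point out.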
The properties highlighted for end rhymes include the rhyme label, the position in the rhyme pattern and stanza type, properties related to stress patterns and across word boundaries, as well as the type(s) of similarity with matching rhymes. Related rhymes are highlighted when hovering over the rhyme words. Some of the global properties of rhyme are summarized in the Poetic Form section in the reading-view.
The rhetorical figures currently detected in the phonological domain are alliteration, paroemion, assonance, and consonance. The components of each figure are highlighted when hovering over the constituent words. Individual patterns of each figure will be highlighted globally when hovering over the identified pattern. Freezing the line currently selected will make it easy to investigate particular sound patterns beyond the confines of the current line, thus highlighting their movement in space and time.
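As a rough idea of how a figure such as alliteration might be located, the following Python sketch groups the words of one line by their initial consonant sound. It is not ECPA's detection method: the hand-written table of initial sounds, the vowel set, and the function name are assumptions made for this example, and a fuller approach would work on a complete phonemic transcription and take stress into account.

```python
from collections import defaultdict

# Toy table of initial sounds (IPA), written by hand for this example only.
INITIAL_SOUND = {
    "the": "ð", "plowman": "p", "homeward": "h", "plods": "p",
    "his": "h", "weary": "w", "way": "w",
}
VOWELS = set("aeiouæɑɒɔəɛɪʊʌ")


def alliteration_candidates(words):
    """Group words by initial consonant sound; keep sounds that occur at least twice."""
    groups = defaultdict(list)
    for word in words:
        sound = INITIAL_SOUND.get(word.lower())
        if sound and sound not in VOWELS:
            groups[sound].append(word)
    return {sound: members for sound, members in groups.items() if len(members) >= 2}


line = "The plowman homeward plods his weary way".split()
print(alliteration_candidates(line))
```

The result is only a list of candidates; whether a repeated sound counts as a figure in a given line remains a matter of judgement.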
The phonemic display button switches the text display between the text of the work and its phonemic transcription in modern RP according to the International Phonetic Alphabet (IPA). Further functionalities, such as highlighting of vowels (short, neutralized, long, diphthong), consonants (plosive, affricate, fricative, nasal, approximant), and vowel and consonant groups, are currently in development.
* As such, realisation must be confused neither with a view that considers metre as variant, nor with a rendition of the line, which can take many different forms depending on the speaker and the historical context. Of course, there is no 1:1 relation between stress and metrical prominence, and a metrical position can be filled by more than one syllable. Realisation is here merely used as a simplified way of recording stress and metrical patterns, but still allows us to highlight interesting phenomena, such as stressed syllables in metrically non-prominent positions etc.
This layer is concerned with the internal structure of words as well as word formation mechanisms. It focuses on the poem as written language, as an act of composition, selecting and arranging morphemes into words and phrases. It examines features such as word formation and choice, syllabic phenomena, composition, and word origins.
Syllables, morphemes, and words are independent units of structure, i.e. there are morphemes that are not syllables, syllables that are not morphemes, words that just consist of a single syllable or morpheme. The number of syllables per line is displayed here and any variations given.
The brief morphological table lists every word in the selected line, giving the word, number of syllables, lemma, word class, part of speech, indication of upper case, word frequency in the poem, and a KWIC list. The plan is to expand the current view to a full morphological analysis in the future.
The rhetorical figures currently detected in the morphological domain are polyptoton, epizeuxis, diacope, anaphora, epistrophe, aphaeresis, apocope, syncope, and synalepha. The components of each figure are highlighted when hovering over the constituent words. Individual patterns of each figure will be highlighted globally when hovering over the identified pattern. Freezing the line currently selected will make it easy to investigate particular morphological patterns beyond the confines of the current line.
This includes the number of words in the text and the number of unique word forms, as well as the resulting vocabulary density. There is also a list of most frequent words in the poem.
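These counts can be illustrated with a short Python sketch. It assumes that vocabulary density means the number of unique word forms divided by the total number of words; the tokenisation rule, the function name, and the sample text are likewise assumptions made for this example and may differ from what ECPA does.

```python
import re
from collections import Counter


def vocabulary_stats(text: str, top: int = 3) -> dict:
    """Count words and unique word forms, and derive a simple density ratio."""
    # Naive tokenisation: lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    density = len(counts) / len(tokens) if tokens else 0.0
    return {
        "words": len(tokens),
        "unique": len(counts),
        "density": density,
        "most_frequent": counts.most_common(top),
    }


print(vocabulary_stats("The sea, the sky, the sea again", top=2))
```

A higher ratio indicates a more varied vocabulary, although the figure is sensitive to text length and so is best compared between poems of similar size.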
This layer is concerned with the analysis of syntactic structures, the position and arrangement of words, interesting phenomena such as parataxis and hypotaxis, and the relationship between verse line, metrical and rhythmic structures, and syntactic units, such as phrases, clauses, and sentences.
The relationship between verse line and syntactic structure is complex and ever changing as the poetic text is played out in time and space. Metrical pattern, rhythm, rhyme, and syntactic structure need to be studied as closely connected and interrelated. The stanza form is an important indicator for further analysis and will be identified first.
A computationally facilitated analysis of the syntactic structure is presented in the form of a syntactic dependency parse of the selected sentence. Dependency as a syntactic theory is based on the idea that all linguistic units are connected to each other by directed links named dependencies. A syntactic dependency parse thus connects linguistic units according to their relationships. The result is a tree diagram, to be read from left to right and top to bottom, that serves as a visual representation of syntactic structure, in which the grammatical hierarchy is graphically displayed. The verb is taken to be the structural centre of the clause structure. All other syntactic units are either directly or indirectly connected to the verb in terms of the directed links. Technically speaking, each vertex in the tree represents a lexical unit, child nodes are units that are dependent on the parent, and edges are labelled by the relationship between the words.
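The tree structure described here can be sketched as a small Python data structure. The class and function names are hypothetical, the parse of the sample clause is written by hand rather than produced by a parser, and the relation labels (nsubj, obj, det) follow the style of Universal Dependencies, which is an assumption and not necessarily the label set used in ECPA.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    relation: str  # label of the edge from the parent; "root" for the verb
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the tree as indented lines, one lexical unit per line."""
    lines = [f"{'  ' * depth}{node.relation}: {node.word}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hand-built parse of "The curfew tolls the knell", with the verb as the root.
tree = Node("tolls", "root", [
    Node("curfew", "nsubj", [Node("The", "det")]),
    Node("knell", "obj", [Node("the", "det")]),
])
print("\n".join(render(tree)))
```

Indentation stands in for the directed links: every unit sits below the unit it depends on, and all of them lead back, directly or indirectly, to the verb.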
The major word classes, nouns, pronouns, verbs, adjectives, and adverbs, are highlighted with a different colour scheme each to make it easier to process the information and to relate it to the verse lines. Hovering over the nodes and edges highlights the node properties as well as the types of dependencies. Closer integration of the syntactic parse with the poetic text is planned for a future update. As a start, a sentencing button has been introduced that highlights syntactic units in the poetic text. The number of sentences in the text and the average number of words per sentence is also displayed.
This layer is concerned with the creation of "literal" meaning on the word and sentence level. The meaning of a sentence is a function of the meaning of its component words (paradigmatic associations) and the way they are combined (syntagmatic associations). The study of the function of this selection and arrangement of words is at the centre of this layer.
For the purpose of a computationally assisted analysis of meaning, frame semantics offers an attractive model as it relativizes word meanings to a finite set of semantic frames. Semantic frames are schematic representations of the conceptual structures and patterns that provide the foundation for meaningful interaction in a given speech community. Thus, semantic frames are linked by linguistic conventions to the meanings of linguistic units (lexical items) constructing a schematic representation of a situation, object, event, or relation and providing the background structure against which words are understood. Each semantic frame identifies a set of frame elements, i.e. participants in the frame (semantic role labels), which in turn are linked to individual lexical units (words). We currently use the SEMAFOR v3.0.4 alpha [pre-trained] software to generate the frame semantic parses and the frameviz.js software to visualize them.
Just like the syntactic dependency parse, the frame semantic parse is meant to be read from left to right and top to bottom. At the top are the lexical units of the selected sentence. Below the lexical units are the names of the evoked frames, a complete list of which can be found at FrameNet. The subsequent rows indicate the frame elements for each frame. Frame element spans are indicated with blue bars. For the future, the plan is both to adapt the frames for the narrower context of eighteenth-century writing by supplementing the frame associations with historical contextual information, such as contemporary dictionaries and historical gazetteers, and to integrate the results of this analysis more closely with the other analytical layers.
NB: Meaning is notoriously difficult to establish computationally, and the analysis presented here should be considered experimental. Frames may be evoked erroneously when lexical units are not mapped, or are wrongly mapped, to frame elements. This can result from changes in word meaning, a loss of meaning altogether, a change of associations brought about by the historical distance between speaker and recipient, or insufficient domain appropriation.
The rhetorical figures currently detected in the semantic domain are simile and homophonic paronomasia. The components of each figure are highlighted when hovering over the constituent words. Individual patterns of each figure will be highlighted globally when hovering over the identified pattern. Please be aware that semantic figures of speech are complex entities and provide a considerable challenge for computationally facilitated detection.
Unlike semantics, which is concerned with the creation of meaning on the level of the word and sentence, this layer is concerned with the creation of meaning in context. The poem is considered as spoken language, i.e. as an act of internal communication between speaker and addressee in the text, and external communication between the poet and the reader and in a wider sense with society itself. This layer comprises elements such as discourse, intention, argumentative structure, internal and external contexts, themes and genre, and real-world references (e.g. named entities).
The rhetorical figures currently detected in the pragmatic domain are ecphonesis, apostrophe, and pysma. The components of each figure are highlighted when hovering over the constituent parts. Individual patterns of each figure will be highlighted globally when hovering over the identified pattern.
Among the various types of referring expressions, references to named entities are crucial to the communication process in a specific context. For the purpose of identifying these references to named entities, ECPA uses a hand-curated and domain customized gazetteer that has been constructed from the results of four NERC engines (Morphadorner, Stanford NER, MITIE, and OpenNLP). We have supplemented the gazetteer with automated PoS-based named entity recognition (classifier), and the source of the detected named entity will be identified from one of these two sources.
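The combination of a curated gazetteer with a PoS-based fallback might look roughly like the following Python sketch. It is not ECPA's implementation: the gazetteer entries, the function name, the entity labels, and the use of the Penn Treebank tag "NNP" for proper nouns are all assumptions made for illustration.

```python
# Hypothetical gazetteer entries; a real gazetteer would be far larger and curated.
GAZETTEER = {"thames": "place", "pope": "person"}


def find_named_entities(tagged_tokens):
    """Return (token, label, source) triples for a list of (token, PoS tag) pairs."""
    entities = []
    for token, pos in tagged_tokens:
        label = GAZETTEER.get(token.lower())
        if label is not None:
            # Known entity: take the label from the gazetteer.
            entities.append((token, label, "gazetteer"))
        elif pos == "NNP":
            # Unknown proper noun: fall back to the PoS-based guess.
            entities.append((token, "unknown", "classifier"))
    return entities


tagged = [("Thames", "NNP"), ("flows", "VBZ"), ("past", "IN"), ("Twickenham", "NNP")]
print(find_named_entities(tagged))
```

Recording the source alongside each entity keeps hand-curated identifications distinguishable from automated guesses, which matters when the results are later reviewed or corrected.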
As an enabling technology, Named Entity Recognition and Classification is often a pre-processing step for relation extraction, knowledge base generation (taxonomies, ontologies, thesauri), question answering, semantic search, and proper noun and pronominal coreference resolution. All of these areas may offer valuable avenues of enquiry in future updates.
The Visualization tab comprises a set of original and adapted visualizations intended to support the analysis/interpretation of the poems in the Eighteenth-Century Poetry Archive. By shedding light on the texts from different visualization perspectives (presentational/disseminative, operational/observational, analytical/interactive, creative/inventive) and with varying foci on one or more analytical layers, the visualizations serve as a toolbox for researchers interested in visual interpretation.
The Visualization home page will list and briefly introduce (i.e. provide information about the visualization perspectives and analytical foci of) all visualizations available for the chosen poem. You can return to this home page at any time by following the Visualization Home-link, which can be found near the top left corner in all visualizations. When first accessing the visualization tab, the evenly split layout of the workspace will be changed slightly in favour of the visualization space; the text of the poem will, however, remain visible at all times.1
The list of available visualizations for the poem under consideration is generated automatically. Some visualizations will not be available for prose poems, or will have a limited number of display options.2 You can learn more about the chosen visualization by clicking on the help-symbol after the visualization's logo in the top left corner. As new visualizations are added over time, the list will ultimately cover all analytical layers and will offer a variety of visualization perspectives. Please do contact the editor with suggestions for additional visualizations or feedback about the usability and usefulness of the visualizations offered. Thank you!
1 The vertical divider can be dragged to adjust the layout to suit individual preference.
2 Please also be advised that some visualizations can be slow to render when the number of lines in the poem exceeds a few hundred.
Phonemia visualizes the phonemic makeup of a poem (in modern RP). It visualizes vowel and consonant groups and distributions using easy-to-read colour highlighting. It draws attention to sonic patterns, points of contact and opposition, hotspots, and the use of sound devices for particular effects. Phonemia stands in a long tradition of quantitative as well as qualitative studies of the phonemic composition of poems.1
Phonemia is comprised of three2 related displays, which are intended to support shifting perspectives as the exploratory close reading process evolves over time. At the centre of all displays is a complete phonemic transcription of the whole poem (or of a subsection at a time).
- Phonemic transcription / distribution: The distribution display presents a tabular overview of the distribution of vowel and consonant groups in each line of the poem. Each of the phonemic units can be selected individually in the control panel on the left. This panel also lists vowels, consonants, and all phonemes sorted by frequency. In addition, sonic devices such as alliteration or end rhymes are highlighted (sound devices are visualized in both the poetic text and its phonemic transcription).
- Phonemic transcription / metre: Phonemia's second display focuses on the metrical and rhythmic qualities of the poem by presenting a complete scansion of the poem. A metrical and a syllable pattern are assigned manually to every poem as part of the editorial process in ECPA, and this pattern is visualized in the scansion result. When differences between the syllabic/metrical pattern and its realisation cannot be resolved, we employ the ZeuScansion tool to suggest a scansion. Based on the quality of the available evidence, a confidence rating is assigned to each scanned line.
- Phonemic transcription / word classes: The word classes display visualizes the relation of parts of speech to phonological properties. It acknowledges the role word classes play when considering phonemic properties, including but not limited to varieties of stress, rhythmic qualities, word and syllable lengths, and voicing. The study of the distribution of word classes, their co-occurrence, and the significance of their verse and sentence positions can be augmented by this display.
Phonemia highlights not only the repetition of certain phonemes in close proximity, but also of groups of phonemes that share a number of features and can thus produce an accumulative effect. To make these more transparent, a warm colour scheme has been chosen for vowel groups and a cold colour scheme has been applied to consonant groups. The effectiveness of these choices is subject to review and may change if feedback suggests other more appropriate visual representations.
Future work may include supplementing modern RP with the phonology of eighteenth-century English, evidence of which can be found in numerous pronunciation dictionaries produced in the later eighteenth century. One of them, Thomas Spence's Grand Repository of the English Language (1775), contains a detailed phonetic script illustrating "correct" pronunciation.3
1 An early example of phonemic analysis is James J. Lynch's "The Tonality of Lyric Poetry: An Experiment in Method", Word 9(3) (1953): 211-224. Phonemia's distribution display is indebted to Marc R. Plamondon's English Poetic Phonemics-project (2008/9).
2 Only the word classes display is available for prose poems.
3 See Joan Beal, English Pronunciation in the Eighteenth Century: Thomas Spence's 'Grand Repository of the English Language', Oxford: OUP, 1999. See also Charles Jones, English Pronunciation in the Eighteenth and Nineteenth Centuries, Houndmills: Palgrave Macmillan, 2006.
- 0:00 Project introduction and participants
- 0:26 Close reading context
- 2:24 Poetic variables
- 3:31 Rule-based visual mappings
- 5:34 Visualization design
- 6:36 Visualization interface
- Comparing Three Designs of Macro-Glyphs for Poetry Visualization. A. Abdul-Rahman, E. Maguire, and M. Chen. In Proceedings of EuroVis 2014, Short Paper, 2014. [DOI]
- Rule-based Visual Mappings — with a Case Study on Poetry Visualization. A. Abdul-Rahman, J. Lein, K. Coles, E. Maguire, M. Meyer, M. Wynne, C. R. Johnson, A. Trefethen, and M. Chen. In Computer Graphics Forum 32(3) (2013): 381-390. (Presented in EuroVis 2013.) [DOI]
- Freedom and Flow: A New Approach to Visualizing Poetry. A. Abdul-Rahman, K. Coles, J. Lein and M. Wynne. In Digital Humanities 2013, Lincoln, Nebraska, 2013. [Abstract]
This re-implementation of Double Tree includes several of the enhancements mentioned in Culy and Lyding, 2010, including the abilities to sort the branches by a variety of properties, to filter the branches by different properties, and to search the words in the Double Tree. It also is designed to work with various types of structured data.
DoubleTreeJS is © 2012-2016, Chris Culy. Used under a BSD license.
- Culy, C., M. Passarotti, and U. König-Cardanobile. 2014. "A Compact Interactive Visualization of Dependency Treebank Query Results". Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. May 26-31, 2014. N. Calzolari et al. (eds.). ELRA.
- Culy, C. and V. Lyding. 2010. "Double Tree: An Advanced KWIC Visualization for Expert Users". Information Visualization, Proceedings of IV 2010, 2010 14th International Conference Information Visualization, 98-103.
In the Eighteenth-Century Poetry Archive, the term modelling refers to the act of formal ontological modelling.1 Consequently, model refers to a realization or result of such an act by a modeller, drawing on the concepts and relationships defined in one or more published ontologies, and expressed in a formal language. Modelling facilitates the creation and exploration of external representations to make sense of what is being modelled. Formal models also support computational processing and reasoning, which can develop alongside our own thinking about our objects of study.2 In this conceptualization, models are a form of knowledge representation, visualization, and preservation.
Modelling complements the reading, analysis, and visualization views, as it allows for the reflected and systematic representation of insights won in a more experiential manner in the other views.
Motivation and purpose
As external representations or abstractions (sometimes referred to as "purposeful simplifications"), models are characterized by a situational perspective, guided by a focus on the aspects of the modelled reality that are of particular interest in a given research question, perspective, or theoretical approach.3 Each text or other modelled reality can have multiple models, all different in their construction of the modelled and each focusing on different aspects, but all collectively modelling the object of study and all arriving at ever more sophisticated and adequate models. It is through formalization and expression of these models in a language suitable for the description of the abstraction that interoperability between them is achieved.
It is important to note that these models are not experiential or idiosyncratic, but are expressed using shared conceptualizations, which makes them useful beyond the original context and purpose of their creation. Models co-exist to shed light on the modelled reality from a variety of angles, and because of their formalization can be shared, re-used, adapted (forked), enhanced, aggregated, and even developed collaboratively.4 This is not an attempt to limit expressivity or freedom. Instead, it is a plea for adopting common practices where those are possible: to use beneficial formalisms, to express ourselves with clarity and precision, and to collaboratively create and collectively benefit from improved knowledge models. A pre-requisite for this collaborative effort is a shared vision of our domain and an agreement on the theoretical underpinnings of our concepts and modelled representations. These formalized shared conceptualizations are provided by ontologies.
Ontologies and the Semantic Web
Ontologies are conceptual data models expressed in a formal language. We can think of them as a kind of shared vocabulary (of concepts and relationships) with clearly defined semantics. In the ontologies used in this project, we find concepts (also known as classes) such as work, manuscript, book, author, period, event, production, document, time-span, place, language, material, apparatus, line, foot, rhyme, stanza, and hundreds more. We also find properties and relations between these classes with varying degrees of specificity, such as mentions, is related to, analyses, has type, consists of, was motivated by, has dimensions, has current location, has title, occurs at, is derived from, is mentioned in, is irregular, and many more. Ontologies thus allow us to capture, encode, organise, and preserve our conceptualizations as knowledge representations.5
In ECPA, we employ the CIDOC CRM family of ontologies6 as reference ontologies for the cultural heritage domain, and POSTDATA as a domain ontology specifically to model poetic texts. We use the CRM as our primary reference model due to its stability and longevity (ISO standard), cross-domain robustness and expressivity, and adaptability and modular extensibility.7 CRM offers hundreds of classes and properties, and although not specifically designed with our application in mind, it is flexible and extensible enough to serve as a starting point for modelling textual history, analysis, and scholarly argumentation more generally. With its many extension ontologies, it also caters for describing archival and library resources, provenance, and material aspects. It thus offers a feasible way of modelling scholarly discourse in the literary domain, from the detailed description of poetic form and function (POSTDATA) to argumentation and knowledge representation more generally (CRM).
CRM is an event-based ontology. It is conducive to the types of historical analysis we engage in as literary scholars. The publication of a book, for example, is conceived of as a publication event, the outcome of which is said book, and in which an author, publisher, and printer participated. It took place at a specific place and a specific time, etc. It is through these events that we connect people, places, time-spans, concepts, ideas, objects, and much more. These events become the basic units of description, and at the same time, as a network, provide the context (e.g. historical, sociological, economic, literary, material) for the model. Put simply, whatever we talk about, describe, refer to, or more generally are interested in becomes an instance of one of the abovementioned classes conceptualised in our ontologies, and it is by being talked about that these instances come into existence.
Ontological modelling is based on the idea of making statements about these instances in expressions of the form subject – predicate – object, known as triples. The subject denotes the instance we want to talk about. The predicate denotes an aspect or property of that instance and expresses a relationship between the subject and the object. The object denotes the value or the other instance to which the subject is related. It is through these building blocks, which are as free as possible from implicit knowledge and unreflected assumptions, that we can formulate and express, explicitly and unambiguously, our understanding of and argumentation about our objects of study. Modelling as an activity does not presuppose a certain size, complexity, or depth. Models can be narrow, specific, and small and be as valuable as larger, more complex models.
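A very small model can be sketched in Python to show how triples work in practice. This is an illustration only: the identifiers (publication_event_1, book_1, person_1, place_1) and the title are placeholders, the predicates are plain-language labels rather than actual CIDOC CRM or POSTDATA property identifiers, and the labels "resulted in" and "carried out by" are invented for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# A tiny event-centred model: the event links the book, a person, and a place.
model = [
    Triple("publication_event_1", "has type", "publication"),
    Triple("publication_event_1", "resulted in", "book_1"),
    Triple("publication_event_1", "carried out by", "person_1"),
    Triple("publication_event_1", "occurs at", "place_1"),
    Triple("book_1", "has title", "An Example Title"),
]


def describe(triples, subject):
    """Return every (predicate, object) statement made about one subject."""
    return [(t.predicate, t.object) for t in triples if t.subject == subject]


print(describe(model, "publication_event_1"))
print(describe(model, "book_1"))
```

Because every statement has the same three-part shape, models created by different people can be merged simply by combining their lists of triples, provided they use the same identifiers and vocabulary.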
As knowledge is encoded, preserved, and brought in contact with other knowledge representations, increasingly sophisticated knowledge graphs can evolve. They can help underpin and support our own reasoning processes and, possibly most beneficially, show up areas that will require further investigation and research.
The modelling view is accessed via the Modelling-tab on any of the poems in the Eighteenth-Century Poetry Archive. Its landing page comprises a brief introduction followed by a list of available models to view for the selected poem as well as the option to create a new model. We have endeavoured to make it as easy as possible for you to get modelling: all you will need is a valid e-mail address (no registration or signing up for an account is necessary).
Telling BIGGER stories
For modelling to succeed and become the stepping stone to interpretation we believe it will be, it needs to be a participatory methodology to produce and organize shared knowledge, thus transcending our individualistic analytical tradition. By making our research reproducible, extensible, and, thanks to the ontological underpinnings, machine-understandable, we embrace the machine's capacity for reasoning as an important way of producing new insights.9
Modelling is a means of making our understanding compatible with the machine's capacity for reasoning, a combinatorial negotiation implicit in all modelling and simulation. This prototype hopes to hint at the viability (or imaginative fruitfulness) of the approach rather than establishing proof-of-concept. Once a big enough set of models has been created, we can use computers to analyse them, reason upon them, and help us to not only store these knowledge bases, but build on them to discover new knowledge. Ultimately, we expect these models to constitute a dataset that will be useful for the computationally-assisted interpretation of the poems.
1 In the context and for the purpose of this project, the modelling view is not concerned with the underlying issues of data modelling, i.e. with turning our objects of study into digital versions (e.g. TEI/XML) in the first place. For an extensive discussion of these issues, see The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources, ed. Julia Flanders and Fotis Jannidis. London and New York: Routledge, 2019.
2 In this research, I am not interested in computers as hypothesis-testing or fact-finding machines, or indeed in attempting to teach a computer to read and analyse a text like a human reader would. Long-term, I am interested in how computers read, analyse, and ultimately interpret texts, and in understanding the adaptations, mediations, and negotiations necessary to activate this potential for our own understanding and investigation of new territory. I consider modelling and, alongside it, simulation as crucial steps towards this aim.
3 Models acknowledge their situational, perspectival, and research-led character, but are not limited to a specific time-frame or a singular purpose, and are not idiosyncratic. They make their assumptions, conceptualizations, contextualizations, and other forms of implicit knowledge as explicit as possible and thus aim to contribute to a common (adequate and appropriate) understanding of the domain.
4 The multiplicity of models in the modelling view is therefore not a collection of static views of the same modelled reality, but the result of many dynamic engagements with models, modellers, and modelled aspects of those realities. Models are performative in nature, inviting manipulation and experimentation. As changes in perspective are acted out (simulated), new insights are won. And as models mature, they represent both the what and the how of the knowledge created.
5 This is not to suggest that agreement about semantics is achievable in every instance; it implies, however, a willingness to embrace the core humanistic ideal of unambiguous formulation and expression in order to make humanistic concepts and methods computationally operational.
6 As of v1.1 of ECPA, the versions of the CIDOC CRM family of ontologies used are: CIDOC CRM v6.2.1 (draft) encoded in RDFS, CRMinf 0.7 encoded in RDFS, and FRBRoo v2.4 (draft) harmonised with CIDOC CRM v6.2.1 encoded in RDFS. These are currently the latest versions for which RDFS representations exist. For the purposes of simplification, we use a subset or "reduced CRM-compatible form" of the CIDOC CRM.
7 Our domain ontology is based on an open world assumption, which accepts that, as in the humanities more generally, our knowledge is frequently incomplete, provisional, and underspecified, i.e. there will be exceptions, limitations, omissions in our conceptualisations. It is important to realize that it will be easier to model some aspects in our domain than others. We are keen to learn about situations where it feels "impossible" to express something you want to model.
[Please note: Unfortunately, the POSTDATA project was unable to provide a version of their ontology in time for inclusion in this prototype. We will be working towards inclusion of the ontology in one of the next updates to the modelling platform.]
8 Models are serialized (i.e. stored as files) as N-Triples (a very simple, easy-to-parse, line-based format) for long-term use and preservation. Models can also be imported into the modelling view using a variety of ingest formats (currently not exposed through the user interface), of which Turtle is among the easiest to read and write.
9 Formal models can be expressed as a system of logical propositions, which can be reasoned upon.
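Note 8 describes N-Triples as a simple, line-based format. The sketch below illustrates what that means in practice: one statement per line, IRIs in angle brackets, literals in double quotes, and a closing full stop. It is an illustrative serializer only, not the code ECPA uses, and the `example.org` IRIs are hypothetical.

```python
def escape_literal(text):
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(statements):
    """Serialize (subject IRI, predicate IRI, object, object_is_literal) tuples,
    writing one statement per line."""
    lines = []
    for subject, predicate, obj, is_literal in statements:
        obj_term = f'"{escape_literal(obj)}"' if is_literal else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "".join(line + "\n" for line in lines)


# Hypothetical IRIs under example.org, used purely for illustration.
statements = [
    ("http://example.org/model/poem_1", "http://example.org/vocab/has_title",
     'An "Example" Poem', True),
    ("http://example.org/model/poem_1", "http://example.org/vocab/created_by",
     "http://example.org/model/author_1", False),
]

print(to_ntriples(statements), end="")
```

Because every line is a complete statement, such files can be concatenated, split, or compared line by line, which is part of what makes the format suitable for long-term preservation.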
The ECPA modelling view has been designed in a way that is easy to grasp, learn, and perform. Its layout consists of two components: a header and the body of the model. The former comprises the title of the model and a number of workflow buttons (e.g. fork, close, save, download, publish), while the latter comprises the model as a graph representation via Cytoscape.js and a control panel, which provides information about the model, a representation of the model's components in text form, and an editing tool.
Our tool is at the same time a modelling and a publishing platform. As such, it offers a number of workflow management buttons that support the entire modelling process, from inception through to publication.
Viewing an existing model
Possibly the easiest way to familiarize yourself with the modelling view is to spend some time exploring an existing model. The default view will show the model in its graphical representation next to a set of metadata about the model in the control panel to the right of the graph. You can explore the model in more detail either by traversing the knowledge graph, or by looking at its components via the View-tab in the control panel. You will notice that the Edit-tab is not accessible when viewing somebody else's graph.
Forking an existing model
Forking existing models is at the heart of collaborative digital scholarship. When forking a model, the model becomes the starting point of a collaborative effort.1 To fork a model, simply click the Fork-button and fill in the required metadata fields. The header will indicate the collaborative nature of the model created. Once a model has been forked, it can be edited in the same way as a newly created model.2
Creating a new model
Creating a new model is easy. Simply fill in the required metadata fields and you are ready to go. As your model will be empty to start with, the default view will take you directly to the Edit-tab in the control panel, where you can start creating your model.2
Saving (and resuming work on) a model
You can pause and save your progress at any point in the modelling process by saving and closing your model. Before doing so, it may be convenient to bookmark your model's Web address in your browser so that you can easily find it when you are ready to resume your work.
Downloading a model (and its graph)
You can download your model at any point in the modelling process. Your model will be saved to your computer in the N-Triples format. Please note that it is currently not possible to upload models into the ECPA modelling interface via the frontend. In exceptional circumstances, we can upload a model for you.
You can also export an image of the graph of your model (to be precise, the part of the graph visible in the current viewport) as a PNG image file. In addition, you can export the Cytoscape graph as a whole as a JSON file, which enables further processing at the level of the graph alone.
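As an illustration of the kind of further processing a JSON export allows, the sketch below counts how many edges touch each node. It assumes the exported file follows the layout that Cytoscape.js produces with `cy.json()` (an `elements` object holding `nodes` and `edges`, each carrying a `data` object); the actual ECPA export may differ in detail, and the node ids are hypothetical.

```python
import json
from collections import Counter

# A tiny stand-in for an exported file; a real export would be read from disk.
# The layout assumed here (elements -> nodes/edges, each with a "data" object)
# is an assumption about the export, and the ids are hypothetical.
exported = json.loads("""
{
  "elements": {
    "nodes": [
      {"data": {"id": "author_1"}},
      {"data": {"id": "poem_1"}},
      {"data": {"id": "place_1"}}
    ],
    "edges": [
      {"data": {"id": "e1", "source": "author_1", "target": "poem_1"}},
      {"data": {"id": "e2", "source": "author_1", "target": "place_1"}}
    ]
  }
}
""")


def node_degrees(graph):
    """Count how many edges touch each node (incoming plus outgoing)."""
    elements = graph.get("elements", {})
    degrees = Counter({node["data"]["id"]: 0 for node in elements.get("nodes", [])})
    for edge in elements.get("edges", []):
        degrees[edge["data"]["source"]] += 1
        degrees[edge["data"]["target"]] += 1
    return dict(degrees)


print(node_degrees(exported))
```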
Publishing a model
All models are private by default, which means that you can work on them for as long as you need to. It is only when you submit a model for publication that the editor will review your model primarily to ensure its integrity and to resolve any issues you may have encountered. You will not be able to work on your model during this process. Once the review stage is completed, your model will be published on the ECPA Website.
1 The two main purposes of forking a model are: 1. the ability to build upon each other's work, and 2. to offer changes (hopefully, improvements) back to the original modeller. Forking facilitates a mode of thinking, learning, and researching that is profoundly social in nature.
2 All models are private by default. You can continue working on your model in private until you are happy for it to be shared with others.
The control panel
As of the current prototype of the modelling interface, the control panel is the main mechanism by which to view and build a model.1 The control panel consists of three tabs: About this model, View, and Edit.
About this model
The About this model-tab is the default landing page when viewing a model. It provides a set of metadata essential for initial orientation in the model. Apart from basic information about the model's creator and provenance, it should explain the motivation and rationale of the model.
The View-tab comprises three scrollable sections, namely classes, i.e. the ontological concepts used in the model, instances (or individuals), i.e. the individual instantiations of those classes, and finally statements, i.e. the incoming and outgoing statements that define and relate the instances among themselves and connect them to external conceptualizations.
The number and types of classes reflect the type of argument and discourse the model instantiates. A definition and the ontological source of a class will be displayed when hovering over it. Choose any class from the list to view a list of its instances in the model.
The instances represent the objects of interest and study in the model. Choose any of the instances from the list to view the incoming and outgoing statements that define its properties and relate it to other internal and external individuals and concepts. When you choose an instance, the graph display will focus on it.1
In the statements section you can inspect the defining properties of the chosen instance. The statements are also displayed in tooltips, when hovering over any instance, i.e. node, in the graph.
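The three sections of the View-tab can be thought of as three ways of slicing the same set of triples. The following sketch, which uses hypothetical instances and a hypothetical `is_a` predicate to stand in for class membership, groups instances by class and separates the outgoing from the incoming statements of one instance. It illustrates the idea and is not the platform's implementation.

```python
from collections import defaultdict

# Hypothetical statements; "is_a" stands in for the typing property that links
# an instance to its class.
triples = [
    ("author_1", "is_a", "Person"),
    ("poem_1", "is_a", "Work"),
    ("place_1", "is_a", "Place"),
    ("author_1", "created", "poem_1"),
    ("author_1", "was_born_in", "place_1"),
]


def instances_by_class(triples):
    """Group instances under the class they are declared to belong to."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "is_a":
            grouped[obj].append(subject)
    return dict(grouped)


def statements_for(triples, instance):
    """Split the statements touching an instance into outgoing and incoming."""
    outgoing = [t for t in triples if t[0] == instance]
    incoming = [t for t in triples if t[2] == instance]
    return {"outgoing": outgoing, "incoming": incoming}


print(instances_by_class(triples))
print(statements_for(triples, "poem_1"))
```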
The Edit-tab comprises three scrollable/expandable sections, namely templates, shortcuts, and taxonomy, which provide increasingly granular modelling options.
The easiest way to start modelling is to use the templates provided for a number of common classes (currently, persons, works, objects, events, places, concepts, and argumentation). It is recommended to first search Wikidata for a usable record of the object of interest. If a record is found on Wikidata, you can simply copy and paste its item ID (Q...) into the template's Wikidata ID input field and press the TAB key to autofill the form. You can also select the already customized templates provided for the author and work instances for the selected text.
If no (suitable) Wikidata record can be found, you may want to enhance an existing record or create one if none exists (and then use its item ID in the above manner). Alternatively, you can fill in the template form fields by hand. All templates have been designed to require only the bare minimum of metadata necessary to create an instance. You can enhance the detail of any of the instances created at any time during the modelling process.
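To give a rough idea of what a Wikidata-based autofill involves, the sketch below validates an item ID, builds the address of Wikidata's public JSON export for that item (`Special:EntityData`), and picks a label and description out of a downloaded record. How ECPA itself performs the lookup is not documented here, so this is an assumption-laden illustration: the returned field names are hypothetical, and the sample record is a hand-written placeholder so that the example runs without network access.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def entity_data_url(qid):
    """Build the URL of Wikidata's JSON export for one item."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def template_fields(entity_data, qid, language="en"):
    """Pick a label and description out of an already-downloaded record.

    The returned keys are hypothetical field names, not ECPA's actual form fields.
    """
    entity = entity_data["entities"][qid]
    label = entity.get("labels", {}).get(language, {}).get("value", "")
    description = entity.get("descriptions", {}).get(language, {}).get("value", "")
    return {"wikidata_id": qid, "label": label, "description": description}


# A hand-written placeholder record (not real Wikidata content), so that the
# sketch runs offline. The item ID is a placeholder as well.
sample = {
    "entities": {
        "Q999999999": {
            "labels": {"en": {"language": "en", "value": "Example person"}},
            "descriptions": {"en": {"language": "en", "value": "placeholder description"}},
        }
    }
}

print(entity_data_url("Q999999999"))
print(template_fields(sample, "Q999999999"))
```

Missing labels or descriptions fall back to empty strings, mirroring the option of completing the remaining form fields by hand.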
The required instances and properties of each of the common classes will be created automatically and added to the knowledge graph on form submission. As a result you will see both the graph representation and the overview of the created instances in the View-tab. Whenever new instances or properties are created the graph will be redrawn and focused on the newly added part of the knowledge graph.1
The shortcuts in the second section provide a convenient way of connecting common classes in standardized ways. Shortcuts are currently provided for things, places, actors, events/time, and concepts. These shortcuts are based on the CIDOC CRM's idea of using "Fundamental Categories" (FC) and "Fundamental Relationships" (FR) for querying CRM-based repositories.2
To use the shortcuts, simply choose the class of the instance you want to connect from, e.g. a person (actor), by clicking on the appropriate arrow symbol to expand the tree. Next, choose the class of the instance you want to connect to, e.g. a place (again, by clicking on the arrow symbol). Then select the available category that best represents the type of connection you are trying to make, e.g. actor refers to place. You will be presented with one or more shortcuts, which connect the chosen classes in a meaningful way.
Each shortcut takes the form of a path of one or more statements (triples) that follow on from each other. You can hover over each of the components of the path to get a definition of the constituents of the shortcut. Choose one of the numbered (bold) Shortcut links to bring up the shortcuts form and start filling in the instances to be connected. As instances are added to the knowledge base, these will be used to populate data lists that can be used to speed up the process of filling in the forms.
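The idea of a shortcut as a path of statements that follow on from each other can be sketched as follows: each step's object becomes the next step's subject, with intermediate instances created along the way. The predicate labels, the instance ids, and the particular path chosen for "actor refers to place" are hypothetical illustrations, not the shortcuts defined in the platform.

```python
from itertools import count


def expand_shortcut(path, start, end, new_id):
    """Turn a path of predicates into chained triples from start to end.

    Every step but the last ends in a fresh intermediate node, which then
    becomes the subject of the next step.
    """
    triples = []
    subject = start
    for position, predicate in enumerate(path):
        is_last = position == len(path) - 1
        obj = end if is_last else new_id()
        triples.append((subject, predicate, obj))
        subject = obj
    return triples


# Hypothetical shortcut "actor refers to place": the actor took part in an
# event, and that event took place at the place. Predicate names are illustrative.
counter = count(1)
triples = expand_shortcut(
    ["participated_in", "took_place_at"],
    start="author_1",
    end="place_1",
    new_id=lambda: f"event_{next(counter)}",
)
print(triples)
```

The intermediate event is what gives the connection its meaning: the actor is related to the place through something that happened there.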
Once familiar with the basics of creating triples (statements), the taxonomy tree display of the classes in our ontologies provides the most granular modelling option. It is advisable to spend some time familiarizing yourself with the class hierarchy and the concepts represented. Simply click on the arrow symbols to expand or collapse the tree. You can call up the scope notes for each of the classes by hovering over it.
To select a class, simply click on its label. This will take you to the triple creation interface. The chosen class will become the subject of your statement. Next, choose a predicate from the lists of direct and inverse properties. As you choose from the available options, the current statement is created for you at the top of the panel. Finally, click the Create-button to insert the triple into the knowledge graph as usual.
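The distinction between direct and inverse properties can be illustrated with a small sketch: an inverse property states the same relationship as its direct counterpart, read in the opposite direction. The property labels below are hypothetical, and whether the platform stores inverse statements in normalised form is an assumption made only for the purpose of this example.

```python
# Hypothetical pairs of direct and inverse property labels.
INVERSE_OF = {
    "created": "was_created_by",
    "took_place_at": "witnessed",
}
DIRECT_OF = {inverse: direct for direct, inverse in INVERSE_OF.items()}


def normalise(subject, predicate, obj):
    """Rewrite a statement made with an inverse property into its direct form."""
    if predicate in DIRECT_OF:
        return (obj, DIRECT_OF[predicate], subject)
    return (subject, predicate, obj)


def create_triple(graph, subject, predicate, obj):
    """Add the statement to the graph (a set), ignoring exact duplicates."""
    graph.add(normalise(subject, predicate, obj))
    return graph


graph = set()
create_triple(graph, "author_1", "created", "poem_1")
create_triple(graph, "poem_1", "was_created_by", "author_1")  # same statement, inverse form
print(graph)
```

Both calls yield the same statement, so the graph contains a single triple.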
It is important to stress that the best results are often achieved by using the three approaches (templates, shortcuts, and taxonomy) in tandem. As always, please feel free to contact us with any questions, problems, or ideas for improvements. As the modelling prototype is the most recent addition to ECPA, we are committed to developing and enhancing it further based on user feedback.
1 In future updates of the platform, we will experiment with a graphical modelling interface (i.e. directly powered by Cytoscape); for now, however, the graph is primarily a visual and exploratory aid. Please note: the graph is currently redrawn from scratch every time an addition is made; any manual adjustments of the layout of the nodes and edges will therefore not be preserved beyond the next edit. Manual changes to the layout may, however, be useful before exporting an image of the graph.
2 Katerina Tzompanaki and Martin Doerr, "Fundamental Categories and Relationships for intuitive querying CIDOC‐CRM based repositories". Technical Report ICS‐FORTH/TR‐429. Heraklion: Institute of Computer Science, FORTH, April 2012.