Digital corpus of Czech Church Slavonic texts

The project focuses on the digitization of Czech Church Slavonic texts, which (with varying degrees of probability) originated in the 10th and 11th centuries on Czech territory. The corpus serves as a source for further philological research. So far, the studies based on it have dealt with phonetic changes reflected in Czech Church Slavonic texts and with stylometric analysis (currently in press). Texts are transcribed from scholarly editions as well as from images of manuscripts. Digitization takes place in cooperation with the Slavonic Institute of the Academy of Sciences of the Czech Republic, and some of the project's activities are supported by internal grants from the Faculty of Arts of Palacký University in Olomouc. The corpus is not yet publicly available online, but on request it can be made available to other interested researchers for use in their work.

(Miroslav Vepřek)

Literary Cartography and Quantitative Models of Czech Novels from the 19th Century

The project has two basic goals. The first is to present a database of literary and cartographic models of the fictional topographies of Czech prose works that significantly thematise Prague. The database makes available map models of Prague’s fictional topographies in Czech prose from the second half of the 19th century onwards, so that researchers can analyse and compare structural and morphological correspondences and differences in the ways Prague’s topography is configured in literary works. The insights that emerge from such comparisons should contribute to a more systematic understanding of the complex question of the literary representation of Prague. The second goal is to present selected quantitative and statistical analyses, which add another important dimension to the literary cartographic models by providing specific information about the texts that map models do not contain. Combining the two main methodologies (mapping and quantitative analysis) can yield original and otherwise hard-to-obtain information, with the potential to become a significant source of new insight into how Czech literature has represented Prague since that period.

Future goals:

In addition to the continuous building of the corpus itself, the following steps are planned:

Design a sentiment analysis tool based on a thematic lexicon.

Design a tool for modelling verbs of motion based on a thematic lexicon.

Develop the corpus database and add new analysis tools to create a representative infrastructure (sustainability of the project).
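
As an illustration of what a lexicon-based sentiment tool might do, the following minimal sketch averages the polarity values of all words in a passage that appear in a lexicon. The lexicon entries, their values and the function name are hypothetical and chosen only for the example; the project's actual thematic lexicon and scoring method are not described here.

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity in [-1, 1].
# A real thematic lexicon would be compiled for the corpus and its language.
TOY_LEXICON = {
    "beautiful": 1.0,
    "bright": 0.5,
    "dark": -0.5,
    "gloomy": -1.0,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in the text.

    Returns 0.0 when no word of the text is in the lexicon.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0
```

A passage such as "The gloomy, dark alley" contains two lexicon words, so its score is the mean of their two values. A tool for verbs of motion could follow the same lookup pattern with a different lexicon.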

(Richard Změlík)

Database of the Czech and Slovak Americans in the Service during WWII

This project aims to document the life and role of Americans of Czech and Slovak origin who fought in World War II. It consists of a database of the soldiers themselves and a number of documents, photos and comments that accompany and explain the position and role of Czech and Slovak Americans during WWII. Links to affiliated and topic-related webpages and projects will also be added shortly. We hope that this project will help its users better understand not only the life, faith and sometimes fate of the men in uniform, but also the life of the whole Czech and Slovak American community. It should also serve as a tool for tracing family members, relatives or even just random members of the community, and is therefore open to wide collaboration. As American citizens, they often experienced the unusual destiny of liberating the homeland of their parents or forefathers, often still speaking the same language as the people of Czechoslovakia themselves. But there were others, too, who fought in the Pacific, in Asia or Africa, in the Air Force as well as in the Navy. They still remembered their Czech heritage, but gradually also became more and more true Americans, shaped and hardened by the war.

(Vladimír Polách)

Research in AI #1 (the future plan)

The aim of the research will be to compare how AI (specifically ChatGPT 3.5 and ChatGPT 4) responds to a series of questions on selected topics, e.g. in the fields of medicine, law or theology.


  • A set of basic questions will be formulated in the area of the selected disciplines. The purpose of these is to obtain answers that will be understandable, especially to the layman.
  • The questions will reflect the basic queries that a layman might put to experts in the selected disciplines.
  • The questions will be given to a group of experts from the selected disciplines, who will be asked to formulate their answers as clearly as possible, taking into account that the recipient is a layman and keeping within a scope defined by a number of tokens.
  • The same queries will then be given to the AI, with its answers limited by the same number of tokens.
  • Subsequently, the responses of the "live" respondents and the AI-generated responses will be quantitatively analysed.
  • The quantitative analysis will look at the following aspects:
    a) verbal richness, text extensiveness, text concentration,
    b) TTR,
    c) Sentence length (average and median – boxplots),
    d) Motif richness (the number of subtopics in which the main theme, i.e. the question, is realized),
    e) Entropy (degree of originality or stereotypicality of the text).
  • These values will be measured against the given topics – questions and presented in graphical summaries.
  • The following relationships will be monitored:
    a) how human and AI answers differ,
    b) how the answers differ between GPT 3.5 and GPT 4,
    c) how human answers differ from those of GPT 3.5 and from those of GPT 4.
  • At the same time, a qualitative interpretation of the responses from both types will be made.
  • The output will be a comparison of how a human (domain expert) and an AI, in its different versions, respond to a set of queries, and an assessment of the extent to which the AI is able to replace human work.
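
Some of the measures listed above (TTR, sentence length, entropy) can be sketched in a few lines of Python. This is only an illustrative sketch under simple assumptions, not the planned implementation: tokens are taken to be runs of word characters, sentences are split naively at ".", "!" or "?" followed by whitespace, and entropy is the Shannon entropy of the word-frequency distribution. All function names are hypothetical.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w+ is Unicode-aware in Python 3.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # abbreviations such as "e.g." would be split incorrectly).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def type_token_ratio(tokens):
    # Number of distinct word forms divided by the number of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def shannon_entropy(tokens):
    # Entropy in bits of the relative word-frequency distribution.
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def describe(text):
    # Collect the measures for one answer (human or AI-generated).
    tokens = tokenize(text)
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    return {
        "tokens": len(tokens),
        "ttr": type_token_ratio(tokens),
        "mean_sentence_length": statistics.mean(lengths) if lengths else 0.0,
        "median_sentence_length": statistics.median(lengths) if lengths else 0.0,
        "entropy_bits": shannon_entropy(tokens),
    }
```

Calling describe() on each human and each AI answer would give comparable rows of values that can then be plotted, for instance as boxplots of sentence length. Because TTR depends on text length, limiting all answers to the same number of tokens, as planned above, helps keep the values comparable.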

(Richard Změlík – Miroslav Vepřek)

Research in AI #2 (the future plan)
  • The goal will be to try to train AI on a corpus of literary texts.
  • The AI will then be given the task of writing, for example, a Romantic or a realist short story.
  • Main research question: which of the structural phenomena characterizing the given genre and poetics appear repeatedly in the AI-generated texts (motifs, themes, narrative techniques, types of narrators, narrative method, plots, character traits, etc.).
  • The aim is to find out which segments and structural relationships the AI takes into account when generating literary texts, and to what extent it is able to compete with human-written fiction.

(Richard Změlík)