In April 2022, the German National Library (Deutsche Nationalbibliothek, DNB) put a new automatic subject cataloguing system into operation. The so-called “cataloguing machine” is initially being used to assign subject search terms automatically, based on the content of German-language e-books, electronic journal articles and printed university publications.
The system currently assigns DDC (Dewey Decimal Classification) Subject Categories and descriptors from the Integrated Authority File (GND) to German-language publications, as well as DDC Short Numbers in the subject category Medicine to German and English-language publications.
All components are integrated as services within the system’s modular architecture. The system is based on Annif, an open-source tool for automatic classification and indexing that was developed at the National Library of Finland in Helsinki. Annif is language-independent and combines various text mining and machine learning algorithms.
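To illustrate what integration as a service can look like, here is a minimal sketch of a client that asks an Annif instance for subject suggestions through Annif's REST API (the `/v1/projects/{project_id}/suggest` endpoint). The base URL, the project ID `gnd-de-demo` and the helper function names are hypothetical and chosen for illustration only; they do not describe the DNB's actual configuration or code.

```python
import json
from urllib import parse, request


def build_suggest_request(base_url, project_id, text, limit=10, threshold=0.0):
    """Build a POST request for Annif's suggest endpoint (form-encoded body)."""
    url = f"{base_url.rstrip('/')}/v1/projects/{parse.quote(project_id)}/suggest"
    body = parse.urlencode({"text": text, "limit": limit, "threshold": threshold})
    return request.Request(url, data=body.encode("utf-8"), method="POST")


def parse_suggestions(payload):
    """Turn the JSON response into (label, uri, score) tuples, best score first."""
    results = json.loads(payload).get("results", [])
    rows = [(r["label"], r["uri"], r["score"]) for r in results]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def suggest_subjects(base_url, project_id, text, **kwargs):
    """Send the text to the Annif service and return the parsed suggestions."""
    req = build_suggest_request(base_url, project_id, text, **kwargs)
    with request.urlopen(req, timeout=30) as response:
        return parse_suggestions(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumption: an Annif instance is running locally and has a project
    # with the hypothetical ID "gnd-de-demo".
    for label, uri, score in suggest_subjects(
        "http://localhost:5000", "gnd-de-demo",
        "Herzinsuffizienz: Diagnostik und Therapie", limit=5,
    ):
        print(f"{score:.3f}  {label}  <{uri}>")
```

Because the client only depends on the HTTP interface, the project behind the endpoint could be retrained or replaced by a different algorithm without changing the calling code.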
Implementing each of the new processes as a service makes it easier to combine, replace and extend functions and workflows in the future. You will find more information in our blog (in German) and in the Annif users group forum (in English).
Photo: Claudia Grote, DNB, CC BY-SA 3.0 DE