2023

Improving the Web Archiving Infrastructure of the National Széchényi Library and the Bibliothèque Nationale du Luxembourg

The aims and objectives of the activity

The aim of the cooperation was to improve the web archiving infrastructure of the National Széchényi Library and the Bibliothèque Nationale du Luxembourg, to renew certain workflows and to share knowledge. On the Luxembourg side, this involved transferring their API and deduplication module for the Browsertrix Crawler software, assisting with its implementation, and sharing their knowledge and software for the Camunda workflow management system. On the National Széchényi Library side, it meant providing advice on operating the SolrWayback program and sharing information about the design of the planned database and automated workflow.

Overview 

From 11 to 15 September 2023, staff of the National Széchényi Library (NSZL), Gyula Kalcsó, Eszter Mihály and Kata Szűcs, visited the National Library of Luxembourg. During the visit, they held technical discussions, participated in workshops and got to know the workflows of the Luxembourg web archive. In the workshops, the colleagues gained new experience, learned new procedures and agreed to share certain configurations and source code. Opportunities for academic cooperation were also explored: the staff of the two web archives discussed with academic researchers what research the archives make possible and how to broaden it. In addition to planning the technical details (how the national libraries share source code and documentation and how they make them openly accessible), the participants also discussed activities related to, and going beyond, the call for proposals. The delegation also met Claude D. Conter, Director of the National Library of Luxembourg.

Between 28 and 30 November 2023, the colleagues from Luxembourg (Ben Els, László Tóth) visited Budapest. The main purpose of the visit was to participate in the seventh annual web archiving event of the NSZL Digital Humanities Centre, the 404 Not Found – Who Preserves the Internet? conference, with presentations and workshops. The event focused on the renewal of the Hungarian national library’s web archiving activities, new tools, technologies and new institutional collaborations. The DHC presented the latest activities related to the web archive, targeted data retrieval with scraping, the use of artificial intelligence, etc.

Main speakers of the 404 Not Found – Who Preserves the Internet? conference: Ben Els, László Tóth, László Drótos, Eszter Simon, Gyula Kalcsó.

As part of the visit, the colleagues from Luxembourg met with the delegation of the Hungarian Linguistics Research Centre and discussed the possibilities of automatic language recognition of the content of the multilingual Luxembourgish web archive.

 

The outcomes of the project

Our main objective, to improve the infrastructure of the two web archives, has been fully achieved. We have shared a great deal of knowledge and good practice that we can incorporate into both institutions’ web archiving activities. We have also delivered on our commitment to code-sharing by exchanging and open-sourcing important code that will contribute significantly to the web archiving activities of both national libraries. During the visits, we built working partnerships that promise to go well beyond the scope of the grant cooperation.

Quotes from participants

  • László Tóth, Web Archiving Software Engineer, National Library of Luxembourg:

This was a very fruitful collaboration, characterized by a broad and varied transfer of knowledge in several areas. The 404 Conference in Budapest was a highly stimulating experience for everyone involved, and our Hungarian colleagues’ visit to the BNL was a successful and thought-provoking exchange of ideas. In conclusion, a rewarding experience for all parties.

  • Gyula Kalcsó, Web Archiving Team Leader, National Széchényi Library:

This cooperation was essential for us, because we were able to learn about tried and tested processes that we are still in the process of implementing. We were able to see them in action and try them out. We also learned a lot from the presentations and workshops of our colleagues from Luxembourg. It is worth noting that the cooperation has also led to new contacts and further opportunities for collaboration.

Web links

Links to the web archives of the National Széchényi Library and the Bibliothèque Nationale du Luxembourg:
