Developing a bilingual controlled vocabulary for heritage science

Scientific responsibility:

  • Loïc Bertrand
  • Caroline Corbières
  • Sophie David

Partnership :


Funding :


Project ID: IDF-DIM-PAMIR-2023-A-004

Summary:

The objective of this internship is to build a bilingual controlled vocabulary for heritage science that an automated process can query to generate bibliographic search queries and, subsequently, to support bibliographic monitoring. The vocabulary will be built from keywords extracted from the titles, abstracts and keywords of the corpus developed as part of the OBISPA project, and from interviews with heritage science specialists. The intern's tasks are as follows:

  • to take stock of existing vocabularies in heritage science;
  • to build a bilingual controlled vocabulary of keywords in heritage science;
  • to propose a processing chain that uses the controlled vocabulary for bibliographic monitoring;
  • to evaluate the results of the processing chain;
  • to write a report on the results obtained.
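
As an illustration of how a processing chain might use the vocabulary, the following minimal Python sketch stores each concept with its labels in two languages and turns a selection of concepts into a Boolean search string. Everything in it is an assumption made for illustration: the `Concept` class, the `build_query` function, the choice of English and French as the two languages, and the example terms are all hypothetical and do not come from the OBISPA project. The exact query syntax also differs between bibliographic databases, so the output would need adapting to each one.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One entry of a hypothetical bilingual controlled vocabulary."""
    concept_id: str
    en: str  # preferred English label (assumed language pair: English/French)
    fr: str  # preferred French label
    en_variants: list[str] = field(default_factory=list)
    fr_variants: list[str] = field(default_factory=list)

    def labels(self) -> list[str]:
        """Return all labels, lowercased, preferred labels first, no duplicates."""
        seen: dict[str, None] = {}
        for label in [self.en, self.fr, *self.en_variants, *self.fr_variants]:
            seen.setdefault(label.lower(), None)
        return list(seen)


def build_query(concepts: list[Concept]) -> str:
    """Join the labels of each concept with OR, and the concepts with AND."""
    groups = []
    for concept in concepts:
        quoted = " OR ".join(f'"{label}"' for label in concept.labels())
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# Invented example entries, for illustration only.
vocabulary = [
    Concept("c1", "pigment", "pigment", en_variants=["colorant"]),
    Concept("c2", "X-ray fluorescence", "fluorescence X", en_variants=["XRF"]),
]

query = build_query(vocabulary)
print(query)
```

Grouping synonyms and translations under one concept means a single query can retrieve publications in both languages, which is the main reason for keeping the vocabulary bilingual at the concept level rather than as two separate word lists.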

This work will rely on the tools used and developed by the members of the OBISPA project, namely:

  • Zotero to access the corpus;
  • the Python and JavaScript code developed by the librarians, in the form of Jupyter notebooks, to analyse the augmented corpus (lexical analysis and citation network analysis);
  • Jupyter notebooks to develop Python code that queries the controlled vocabulary;
  • Git to store the developed code;
  • the bibliographic databases Google Scholar, HAL, Scopus and Web of Science to carry out bibliographic queries.
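
The keyword extraction step described in the summary could begin with a simple frequency count, sketched below in Python. This is a minimal sketch under stated assumptions, not the project's actual notebooks: the `records` list, its field names, and the `candidate_keywords` function are hypothetical, and the sample data is invented. It only uses the keyword field of each record. Extracting terms from titles and abstracts would require additional lexical processing.

```python
from collections import Counter

# Invented records, shaped like a simplified bibliographic export (assumption).
records = [
    {"title": "Record A", "keywords": ["Pigments", "XRF", "Raman spectroscopy"]},
    {"title": "Record B", "keywords": ["pigments", "Raman spectroscopy "]},
    {"title": "Record C", "keywords": ["Dendrochronology", "XRF"]},
]


def candidate_keywords(records, min_count=2):
    """Return (keyword, record count) pairs for keywords found in at least
    `min_count` records, most frequent first, ties in alphabetical order."""
    counts = Counter()
    for record in records:
        # Normalise case and whitespace, and count each keyword once per record.
        unique = {kw.strip().lower() for kw in record.get("keywords", []) if kw.strip()}
        counts.update(unique)
    kept = [(kw, n) for kw, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


candidates = candidate_keywords(records)
for keyword, n in candidates:
    print(f"{keyword}: {n}")
```

The resulting list is only a set of candidates. Deciding which terms enter the controlled vocabulary, and how they map across the two languages, would still depend on the interviews with specialists.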