Arquivo.pt, formerly known as the Portuguese Web Archive, is a web archive that preserves content dating back to 1996.[1] It is a service of the Fundação para a Ciência e a Tecnologia (FCT) and was founded at the Fundação para a Computação Científica Nacional by Daniel Coelho Gomes.

The mission of Arquivo.pt is to archive web content of national interest, preserving information of historical significance for future generations. In addition to preserving scientific and historical knowledge, Arquivo.pt also enables individuals to preserve their personal memories.[2][3]

Arquivo.pt provides a free preservation service for Portuguese web authors and serves as a research resource. Researchers have used it, for example, to measure the accessibility of the Portuguese web for people with disabilities.[4] Any Internet user can also suggest websites to be archived using a form on the Arquivo.pt website.[5]

In 2017, Arquivo.pt celebrated 10 years since the project's inception.[6] That year the service received 100,000 users, of whom 90% were new users and 53% were from outside Portugal, reflecting growing use as the service became more widely known.

Features

Arquivo.pt offers a range of features for preserving and accessing web content. These include URL search, full-text search, image search, site suggestion, and various types of web crawls.

Online Accessibility

Arquivo.pt is fully accessible online, so users worldwide can consult the archived content without geographical limitations.[7] Some other web archives require users to visit physical facilities: the French web archive (Archives de l'internet) and the Czech web archive (Webarchiv), for example, can only be consulted at specific locations.[8][9][10] Online access allows Arquivo.pt to serve a broader audience, including international researchers and the general public.[11]

URL Search

Arquivo.pt allows users to search for archived web pages by URL. Entering the address of a specific web page lists its archived versions, giving a historical view of the page and making it easier to track changes over time.[12] This capability is particularly useful for researchers and historians who need to analyze the evolution of web content.[13]
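To illustrate what addressing "different versions of a web page over time" can look like, the following is a minimal sketch. It assumes the widely used Wayback-style convention in which each capture is identified by a 14-digit timestamp (YYYYMMDDhhmmss) followed by the original URL. The base path `https://arquivo.pt/wayback/` and all helper names are assumptions made for illustration and are not confirmed by this article; the service's own documentation should be consulted before relying on them.

```python
from datetime import datetime
from typing import Iterable

# Assumed replay base path (hypothetical here; verify against the service docs).
ASSUMED_REPLAY_BASE = "https://arquivo.pt/wayback/"

# 14-digit capture stamp: YYYYMMDDhhmmss, treated as naive UTC in this sketch.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def format_capture_timestamp(moment: datetime) -> str:
    """Render a datetime as a 14-digit capture timestamp."""
    return moment.strftime(STAMP_FORMAT)


def parse_capture_timestamp(stamp: str) -> datetime:
    """Parse a 14-digit capture timestamp back into a datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def build_replay_url(original_url: str, moment: datetime) -> str:
    """Build a Wayback-style address for one capture of a page."""
    return f"{ASSUMED_REPLAY_BASE}{format_capture_timestamp(moment)}/{original_url}"


def closest_capture(stamps: Iterable[str], target: datetime) -> str:
    """Pick the capture timestamp nearest to a target date."""
    return min(stamps, key=lambda s: abs(parse_capture_timestamp(s) - target))
```

Given a list of capture timestamps for one URL, `closest_capture` selects the version nearest to a date of interest, and `build_replay_url` turns that choice into an address under the assumed base path.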

Full-Text Search

Arquivo.pt offers full-text search, allowing users to search for specific terms or phrases within the archived content. This is particularly useful for researchers and historians who need to find specific information within a vast amount of archived data.[14][15] Many other web archives offer only URL-based search.[16]
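A full-text query restricted to a date range could be assembled along the following lines. This is a sketch under stated assumptions: the endpoint `https://arquivo.pt/textsearch` and the parameter names `q`, `from`, `to` and `maxItems` are assumed here to match the public API cited above, but they are not described in this article and should be verified against the API documentation. The function only builds the request URL and performs no network access.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint and parameter names (hypothetical here; verify before use).
ASSUMED_TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_fulltext_query(
    terms: str,
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
    max_items: int = 10,
) -> str:
    """Build a full-text search URL, optionally limited to a range of years."""
    params = {"q": terms, "maxItems": max_items}
    if from_year is not None:
        # Start of the first year, as a 14-digit timestamp.
        params["from"] = f"{from_year:04d}0101000000"
    if to_year is not None:
        # Last second of the final year, as a 14-digit timestamp.
        params["to"] = f"{to_year:04d}1231235959"
    # urlencode percent-encodes values and joins them as key=value pairs.
    return f"{ASSUMED_TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_fulltext_query("Expo 98", from_year=1996, to_year=1999, max_items=5))
```

Keeping URL construction separate from the HTTP request makes the query logic easy to check offline; the resulting URL can then be fetched with any HTTP client.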

Image Search

In March 2021, Arquivo.pt introduced an image search feature, known as Dionisius. This tool allows users to search for images archived from the web, dating back to 1996. Users can find images that are no longer available on the live web and locate the original web pages where these images were published.[17][18] Image search is uncommon among web archives and is especially valuable for users interested in visual content.[19]

Site Suggestion

Arquivo.pt encourages user participation by allowing anyone to suggest websites for archiving. This helps preserve important web content that might otherwise be overlooked. Users submit suggestions through a form on the Arquivo.pt website, contributing to the archive's comprehensiveness.[20]

Types of Crawls

Arquivo.pt performs several types of web crawls to capture and archive web content. Broad crawls cover a wide range of websites, while selective crawls focus on specific themes or events. The archive conducts three to four crawls per year, so that a significant portion of the web is regularly archived.[21][22] This approach helps keep the archive of Portuguese web content up to date and comprehensive.

Awards and Recognitions

Arquivo.pt has received numerous awards and recognitions for its contributions to digital preservation and web archiving, including the following.

  • The catalogue of digital preservation tools developed by Arquivo.pt was a finalist for The National Archives (UK) Award for Safeguarding the Digital Legacy (Digital Preservation Coalition Awards 2024).[23]
  • In 2023, Arquivo.pt was ranked among the top 3 government digital services in Portugal.[24]
  • The archive won the Best Digital Service award in 2022.[25]
  • In 2022, Arquivo.pt was included in the honour roll for security in Portugal according to the Portuguese Observatory of Internet Technologies.[26]
  • The archive received the Best Paper Award for its work on measuring the Portuguese web at the Ibero-American IADIS WWW/Internet 2008 conference.[27]

References

  1. ^ Gomes, Daniel (2022-11-14). "Web archives as research infrastructure for digital societies: the case study of Arquivo.pt". Archeion. 123: 46–85. doi:10.4467/26581264arc.22.012.16665. ISSN 2658-1264.
  2. ^ Teszelszky, Kees (November 2021). "Introduction: digital humanities and the use of web archives". International Journal of Digital Humanities. 2 (1–3): 1–4. doi:10.1007/s42803-021-00040-5. ISSN 2524-7832.
  3. ^ Gomes, Daniel; Costa, Miguel (April 2014). "The Importance of Web Archives for Humanities". International Journal of Humanities and Arts Computing. 8 (1): 106–123. doi:10.3366/ijhac.2014.0122. ISSN 1753-8548.
  4. ^ Lopes, Rui; Gomes, Daniel; Carriço, Luís (2010-04-26). "Web not for all: A large scale study of web accessibility". Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A). ACM. pp. 1–4. doi:10.1145/1805986.1806001. ISBN 978-1-4503-0045-2.
  5. ^ "Suggest websites to be preserved". Retrieved 2024-09-29.
  6. ^ Gomes, Daniel; Nogueira, André; Miranda, João; Costa, Miguel (September 2009). "Introducing the Portuguese web archive initiative". 8th International Web Archiving Workshop.
  7. ^ "General informations – sobre.arquivo.pt". Retrieved 2024-09-29.
  8. ^ "Frequently Asked Questions". Retrieved 2024-09-29.
  9. ^ "How to use the digital library — National Library of the Czech Republic". Retrieved 2024-09-29.
  10. ^ "Bibliothèque nationale de France - IIPC". Retrieved 2024-09-29.
  11. ^ "Arquivo.pt". Retrieved 2024-09-29.
  12. ^ "Page search – sobre.arquivo.pt". Retrieved 2024-09-29.
  13. ^ "Arquivo.pt API (Full-text & URL search)". Retrieved 2024-09-29.
  14. ^ Gomes, Daniel (2022-11-14). "Web archives as research infrastructure for digital societies: the case study of Arquivo.pt". Archeion. 123: 46–85. doi:10.4467/26581264arc.22.012.16665. ISSN 2658-1264.
  15. ^ Teszelszky, Kees (November 2021). "Introduction: digital humanities and the use of web archives". International Journal of Digital Humanities. 2 (1–3): 1–4. doi:10.1007/s42803-021-00040-5. ISSN 2524-7832.
  16. ^ "Arquivo.pt API (Full-text & URL search)". Retrieved 2024-09-29.
  17. ^ SAPO. "Arquivo.pt tem mais de mil milhões de imagens históricas da internet pesquisáveis online". SAPO Tek (in Portuguese). Retrieved 2021-05-05.
  18. ^ Mourão, André; Gomes, Daniel (2023-10-09). "Searching images in a web archive". 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA). IEEE. pp. 1–10. doi:10.1109/DSAA60987.2023.10302607. ISBN 979-8-3503-4503-2. Retrieved 2024-08-27.
  19. ^ "Image search – sobre.arquivo.pt". Retrieved 2024-09-29.
  20. ^ "Suggest websites to be preserved". Retrieved 2024-09-29.
  21. ^ "Crawling web content – sobre.arquivo.pt". Retrieved 2024-09-29.
  22. ^ "Creating a searchable web archive (Technical Report)" (PDF). Retrieved 2024-09-29.
  23. ^ "The National Archives (UK). The National Archives (UK) Award for Safeguarding the Digital Legacy". Retrieved 2024-08-22.
  24. ^ Fevereiro, Sara. "Quem são os líderes da transformação digital do país?". Expresso (in Portuguese). Retrieved 2024-08-20.
  25. ^ "Os Melhores & As Maiores do Portugal Tecnológico 2022: conheça os vencedores" (in Portuguese). 2022-11-30. Retrieved 2024-08-20.
  26. ^ "ISOC Portugal lança o Observatório da Internet portuguesa" (in Portuguese). Retrieved 2024-08-20.
  27. ^ Gomes, Daniel; Miranda, João (December 2008). "Arquivo e Medição da Web Portuguesa". Ibero-Americana IADIS WWW/Internet 2008.