Computing Cultural Heritage in the Cloud Derivative Datasets

Welcome to data.labs.loc.gov, an experimental sandbox for sharing data packages. Right now, it only features data packages compiled as part of LC Labs' Mellon Foundation-funded Computing Cultural Heritage in the Cloud (CCHC) initiative. These data packages will be used as part of an invitation-only event in October 2022 to investigate access and engagement with large public domain datasets using cloud services.

LC Labs continues to seek feedback from users on the information presented in this space; please send us your comments and questions using the contact details under Questions and Feedback below.

Data Packages

  • historic stereograph image of woman peering through stereoview

    Stereograph Card Images Data Package

    About: The Stereograph Card dataset consists of 39,526 stereograph card images from the 1850s through 1924, a subset of what was available online in the collection on loc.gov in August 2022.

    Source collection: The image files in this dataset are derived from the Stereograph Cards collection on loc.gov. For more information about the source material, please contact the Prints & Photographs Division.

    Link to data: To access the dataset and documentation, please visit the Stereograph Cards Data Package cover page.

    Link to image on the left: https://www.loc.gov/item/2003674057/ .

  • historic map of 19th century Austria-Hungary

    Spezialkarte der österreichisch-ungarischen Monarchie ("Austro-Hungarian map set") Data Package

    About: This experimental dataset contains 4,998 images in TIFF format representing non-georeferenced map sheets, together with corresponding GeoTIFF-formatted images that are georeferenced and have had the map collars (the non-map portions at the edge of each sheet) removed.

    Source collection: The historical maps contained in this dataset were initially prepared and issued by the Austro-Hungarian Monarchy's Militärgeographisches Institut beginning around 1875. After the dissolution of the Austro-Hungarian Empire in 1918, parts of the set were continued by successive governments. For more information about the source material, please contact the Geography & Maps Division.

    Link to data: To access the dataset and documentation, please visit the Austro-Hungarian Map Data Package cover page.

    Link to image on the left: https://www.loc.gov/item/2018588019/ .

  • illustration of man working a printing press

    Selected Digitized Books Data Package

    About: This dataset comprises 166,218 .txt and JSON files containing full text from 90,414 books in the Selected Digitized Books collection on loc.gov. The text was created as part of digitization workflows using Optical Character Recognition (OCR) technologies.

    Source collection: The Selected Digitized Books collection is a growing collection of selected books and other materials from the Library of Congress General Collections that have been made openly available. Most of the materials in this collection were published in the United States prior to the 1930s and are in English. The collection features thousands of works of fiction, including books intended for children, young adults, and other audiences. There are also some materials in foreign languages that were published in other countries.

    Link to data: To access the dataset and documentation, please visit the Digitized Books Data Package cover page.

    Link to image on the left: https://www.loc.gov/resource/gdcmassbookdig.littleadventures00alls/?sp=25 .
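
After downloading a data package, a quick inventory by file type can help confirm that the local copy matches the counts given in the descriptions above (for example, the mix of .txt and JSON files in the Selected Digitized Books package). The following is a minimal sketch only: the function name and the folder name "digitized_books" are hypothetical, and the actual directory layout of each package is described on its cover page, not here.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(package_dir):
    """Return a Counter mapping lowercase file extension to file count."""
    counts = Counter()
    # Walk the folder recursively so nested subfolders are included.
    for path in Path(package_dir).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # "digitized_books" is a hypothetical local folder name (an assumption).
    for ext, n in sorted(count_files_by_extension("digitized_books").items()):
        print(f"{ext}\t{n}")
```

Extensions are lowercased before counting, so ".TXT" and ".txt" are tallied together.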

Questions and Feedback

For curatorial questions about the source collection or technical questions about the dataset formats and composition, please contact the appropriate content specialist via the Library's Ask a Librarian service at https://ask.loc.gov/ .

For questions about download and access, documentation, or how a dataset was created, or to share more about your experience using a data package, please email the LC Labs team at [email protected]