Top of page

Access Exploratory Data Packages

Discover and access the Library's free-to-use exploratory data packages, a data publishing format that combines normalized metadata, metadata enrichments, media files, standardized documentation, narrative context, code samples, and other usability features.

  • Sanborn Maps Data Package

    The dataset contains metadata records for 50,600 maps from the Sanborn Fire Insurance Maps collection and their corresponding 440,048 images . The Sanborn collection at Library of Congress includes over fifty thousand editions of fire insurance maps comprising almost seven hundred thousand individual sheets. The Library of Congress holdings represent the largest extant collection of maps produced by the Sanborn Map Company.

    Metadata Metadata formats Data files
    50,600 records .csv, .json 440,048 .jpg images
  • United States Elections, Web Archives Data Package

    The data package is comprised of 396,117 CDX index files from the United States Elections Web Archive , which includes campaign websites and related web content documenting presidential, congressional, and gubernatorial elections that were archived weekly during general election seasons. The data package currently includes years 2000 – 2016.

    Metadata Metadata formats Data files
    396,117 index files, 1 descriptive metadata file .cdx.gz, .csv Web archived documents are not included within the data package, but the CDX files provide pointers for download.
  • Free to Use and Reuse Data Package

    This dataset contains metadata records and images for 2,610 curated selections featured in the Library of Congress' Free to Use and Reuse Sets as well as links to images for the full digital objects represented in the sets. This dataset includes only those items which are accessible via the Library of Congress' API.

    Metadata Metadata formats Data files
    2,610 records .csv, .json 2,610 .jpg images
  • National Jukebox Data Package

    This dataset contains metadata records and audio files for 5,882 audio recordings in the National Jukebox collection . The records range in date from 1900-1922.

    Metadata Metadata formats Data files
    5,882 records .csv, .json 5,882 .mp3 audio files
  • Digitized Telephone Directories, 1891-1988 Data Package

    This dataset contains metadata records for a subset of 3,513 reels of US telephone directories, digitized from microfilm, from the Digitized Telephone Directories collection . Full text OCR files are also included for a subset of the records when they exist.

    Metadata Metadata formats Full text OCR files
    3,511 records .csv, .json 486 .txt files
  • Selected Dot Gov Media Types, Web Archives Data Package

    The Dot Gov Datasets are the result of exploratory work conducted by the Library's Web Archiving Program to make the Web Archives more widely accessible and usable. This data package consists of seven datasets, each containing information related to 1,000 or more files of related media types selected from .gov domains in the Library's Web Archives (i.e. audio, CSV, image, PDF, Powerpoint, TSV, and XLS data formats).

    Metadata Metadata format Data Files
    7,000 records .csv 7,000 files in multiple formats
  • Stereograph Card Images Data Package

    The Stereograph Card dataset consists of 39,526 stereograph card images from the 1850s through 1924, a subset of what was available online in the collection on loc.gov in August 2022.

    Metadata Metadata formats Data files
    39,532 records .csv, json 39,526 .jpg files
  • Directory Holdings Data Package

    The Directory Holdings Data Package consists of metadata describing the Library of Congress inventoried holdings of United States Telephone Directories, City Directories, and Criss-cross directories. It is based on the inventory tables listed on the Library's United States: City and Telephone Directories and Directories By Address: Inventories of Library Collections Library Guides . The data is presented two ways: by Directory Type and by state/region.

    Metadata Metadata formats
    250,318 records .csv, .json
  • Selected Digitized Books Data Package

    This dataset comprises 84,058 files containing full text from 90,414 books in the Selected Digitized Books collection on loc.gov. The text was created as part of digitization workflows using Optical Character Recognition (OCR) technologies.

    Metadata Metadata formats Data files
    90,414 records .csv, .json 84,058 full text files (.txt, .json)
  • Spezialkarte der österreichisch-ungarischen Monarchie (“Austro-Hungarian map set”) Data Package

    This experimental dataset contains 4,998 images in TIFF format representing non-georeferenced map sheets and corresponding GeoTIFF formatted images that are georeferenced, and have had the map collars (non-map portions of the image at the edge of the sheet) removed. The data comprises scans and georeferenced images of maps surveying the Austro-Hungarian empire from approximately 1875 to shortly after its collapse in 1918.

    Data files
    4,998 Georeferenced Map sheets (GeoTIFF, .tif)
  • General Collections Assessment Data Package

    The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area (based on the Library's Collections Policy Statements ) (CPS). As part of this project, the Library is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments.

    Metadata Metadata format
    894,692 records .csv
Back to top