Access Exploratory Data Packages

Discover and access the Library's free-to-use exploratory data packages, a data publishing format that combines normalized metadata, metadata enrichments, media files, standardized documentation, narrative context, code samples, and other usability features.

An index map of an insurance map of Amarillo, Texas containing an index of maps as a table of text underneath a color-coded and numbered map with streets and building numbers.

Sanborn Maps Data Package

The dataset contains metadata records for 50,600 maps from the Sanborn Fire Insurance Maps collection and their corresponding 440,048 images. The Sanborn collection at the Library of Congress includes over fifty thousand editions of fire insurance maps comprising almost seven hundred thousand individual sheets. The Library of Congress holdings represent the largest extant collection of maps produced by the Sanborn Map Company.

Metadata	Metadata formats	Data files
50,600 records	.csv, .json	440,048 .jpg images

Screenshot of sample CDX file opened in a text editor

United States Elections, Web Archives Data Package

The data package comprises 396,117 CDX index files from the United States Elections Web Archive, which includes campaign websites and related web content documenting presidential, congressional, and gubernatorial elections that were archived weekly during general election seasons. The data package currently covers the years 2000 to 2016.

Metadata	Metadata formats	Data files
396,117 index files, 1 descriptive metadata file	.cdx.gz, .csv	Web archived documents are not included within the data package, but the CDX files provide pointers for download.
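Because the web documents themselves are not in the package, each CDX line has to be turned into a pointer before anything can be downloaded. The following is a minimal sketch of that step, not the Library's own tooling. It assumes the common space-delimited CDX layout in which the first line of the file names the fields; the field-letter meanings in `FIELD_NAMES`, the sample line, and the replay URL pattern in `replay_url` are assumptions to check against the package documentation and the header of each file.

```python
import gzip
import io

# Assumed meanings for common CDX field letters. The header line of each
# file is authoritative and may list a different set or order.
FIELD_NAMES = {
    "N": "urlkey",
    "b": "timestamp",
    "a": "original_url",
    "m": "mime_type",
    "s": "status",
    "k": "digest",
    "r": "redirect",
    "M": "meta_tags",
    "S": "compressed_size",
    "V": "offset",
    "g": "filename",
}


def parse_cdx(stream):
    """Yield one dict per capture from a text stream of CDX lines."""
    header = next(stream).rstrip("\n")
    delimiter = header[0]  # the header's first character is the delimiter
    letters = header[1:].split(delimiter)[1:]  # drop the leading "CDX" token
    keys = [FIELD_NAMES.get(letter, letter) for letter in letters]
    for line in stream:
        line = line.rstrip("\n")
        if line:
            yield dict(zip(keys, line.split(delimiter)))


def read_cdx_gz(path):
    """Parse a gzip-compressed CDX file (.cdx.gz) from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_cdx(handle)


def replay_url(record, base="https://webarchive.loc.gov/all"):
    """Build a replay-style URL. The base and pattern are assumptions."""
    return f"{base}/{record['timestamp']}/{record['original_url']}"


# Made-up sample data standing in for one file from the package.
SAMPLE = (
    " CDX N b a m s k r M S V g\n"
    "example.com/ 20001106000000 http://example.com/ text/html 200 "
    "ABC123 - - 512 1024 sample.arc.gz\n"
)
records = list(parse_cdx(io.StringIO(SAMPLE)))
pointer = replay_url(records[0])
```

Reading the field order from the header rather than hard-coding it keeps the sketch usable if files from different election years list their fields differently.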

A woman wearing a red bandana, holding a hand drill, and drilling into a metal panel

Free to Use and Reuse Data Package

This dataset contains metadata records and images for 2,610 curated selections featured in the Library of Congress' Free to Use and Reuse Sets as well as links to images for the full digital objects represented in the sets. This dataset includes only those items which are accessible via the Library of Congress' API.

Metadata	Metadata formats	Data files
2,610 records	.csv, .json	2,610 .jpg images

A photograph of a vinyl record with a blue label bearing a gold musical note and white text that reads: Columbia Records

National Jukebox Data Package

This dataset contains metadata records and audio files for 5,882 audio recordings in the National Jukebox collection. The recordings date from 1900 to 1922.

Metadata	Metadata formats	Data files
5,882 records	.csv, .json	5,882 .mp3 audio files

A black and white page within a telephone directory with the title Telephone Directory and four advertisements above and below the title

Digitized Telephone Directories, 1891-1988 Data Package

This dataset contains metadata records for a subset of 3,513 reels of US telephone directories, digitized from microfilm, from the Digitized Telephone Directories collection. Full-text OCR files are also included for the subset of records where they exist.

Metadata	Metadata formats	Full text OCR files
3,511 records	.csv, .json	486 .txt files

A collage of many small, colorful, and low-resolution graphics that look like they are in the style of 1990s-era web graphics

Selected Dot Gov Media Types, Web Archives Data Package

The Dot Gov Datasets are the result of exploratory work conducted by the Library's Web Archiving Program to make the Web Archives more widely accessible and usable. This data package consists of seven datasets, each containing information related to 1,000 or more files of related media types selected from .gov domains in the Library's Web Archives (i.e., audio, CSV, image, PDF, PowerPoint, TSV, and XLS data formats).

Metadata	Metadata format	Data files
7,000 records	.csv	7,000 files in multiple formats

A stereographic image of a woman viewing stereographs in her home. She is sitting in front of a fireplace with a cabinet for stereographs on her right.

Stereograph Card Images Data Package

The Stereograph Card dataset consists of 39,526 stereograph card images from the 1850s through 1924, a subset of what was available online in the collection on loc.gov in August 2022.

Metadata	Metadata formats	Data files
39,532 records	.csv, .json	39,526 .jpg files
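Record and file counts in a package need not match (here, 39,532 records against 39,526 images), so it can be useful to check which metadata records have no corresponding file before processing. The sketch below shows one way to do that with the standard library. The `item_id` column name, the sample rows, and the assumption that each image is named after its record identifier are all hypothetical; the package's own documentation describes the real schema.

```python
import csv
import io
from pathlib import PurePosixPath


def records_without_files(metadata_csv, file_names, id_column="item_id"):
    """Return identifiers of metadata rows that have no matching file.

    `id_column` is a hypothetical column name, and files are assumed to be
    named "<identifier>.<extension>".
    """
    stems = {PurePosixPath(name).stem for name in file_names}
    reader = csv.DictReader(metadata_csv)
    return [row[id_column] for row in reader if row[id_column] not in stems]


# Made-up sample data standing in for the metadata CSV and image folder.
SAMPLE_CSV = io.StringIO(
    "item_id,title\n"
    "s001,Card one\n"
    "s002,Card two\n"
    "s003,Card three\n"
)
SAMPLE_FILES = ["s001.jpg", "s003.jpg"]

missing = records_without_files(SAMPLE_CSV, SAMPLE_FILES)
```

With a real package, `file_names` could come from listing the image directory, and `metadata_csv` from opening the .csv file with `newline=""` as the `csv` module recommends.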

A black and white photograph of a telephone and directory in 1940

Directory Holdings Data Package

The Directory Holdings Data Package consists of metadata describing the Library of Congress' inventoried holdings of United States telephone directories, city directories, and criss-cross directories. It is based on the inventory tables listed on the Library's United States: City and Telephone Directories and Directories By Address: Inventories of Library Collections Library Guides. The data is presented in two ways: by directory type and by state/region.

Metadata	Metadata formats
250,318 records	.csv, .json

Illustration of a man using a printing press with caption: Pulling the Great Archimedean Lever

Selected Digitized Books Data Package

This dataset comprises 84,058 files containing full text from 90,414 books in the Selected Digitized Books collection on loc.gov. The text was created as part of digitization workflows using Optical Character Recognition (OCR) technologies.

Metadata	Metadata formats	Data files
90,414 records	.csv, .json	84,058 full text files (.txt, .json)

A map of the Austrian Monarchy from 1854 with regions hand-colored

Spezialkarte der österreichisch-ungarischen Monarchie (“Austro-Hungarian map set”) Data Package

This experimental dataset contains 4,998 images in TIFF format representing non-georeferenced map sheets and corresponding GeoTIFF-formatted images that are georeferenced and have had the map collars (the non-map portions of the image at the edge of the sheet) removed. The data comprises scans and georeferenced images of maps surveying the Austro-Hungarian empire from approximately 1875 to shortly after its collapse in 1918.

Data files
4,998 Georeferenced Map sheets (GeoTIFF, .tif)

Poster shows an illustration of a young woman, wearing a green coat and hat and sitting in a blue chair, reading a list of names for gifts she has to buy.

General Collections Assessment Data Package

The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials, and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area, based on the Library's Collections Policy Statements (CPS). As part of this project, the Library is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments.

Metadata	Metadata format
894,692 records	.csv