Top of page
Data for Exploration Access Exploratory Data Packages
Discover and access the Library's free-to-use exploratory data packages, a data publishing format that combines normalized metadata, metadata enrichments, media files, standardized documentation, narrative context, code samples, and other usability features.
The dataset contains metadata records for 50,600 maps from the Sanborn Fire Insurance Maps collection and their corresponding 440,048 images . The Sanborn collection at Library of Congress includes over fifty thousand editions of fire insurance maps comprising almost seven hundred thousand individual sheets. The Library of Congress holdings represent the largest extant collection of maps produced by the Sanborn Map Company.
Metadata | Metadata formats | Data files |
---|---|---|
50,600 records | .csv, .json | 440,048 .jpg images |
This dataset contains metadata records and images for 2,610 curated selections featured in the Library of Congress' Free to Use and Reuse Sets as well as links to images for the full digital objects represented in the sets. This dataset includes only those items which are accessible via the Library of Congress' API.
Metadata | Metadata formats | Data files |
---|---|---|
2,610 records | .csv, .json | 2,610 .jpg images |
This dataset contains metadata records and audio files for 5,882 audio recordings in the National Jukebox collection . The records range in date from 1900-1922.
Metadata | Metadata formats | Data files |
---|---|---|
5,882 records | .csv, .json | 5,882 .mp3 audio files |
This dataset contains metadata records for a subset of 3,513 reels of US telephone directories, digitized from microfilm, from the Digitized Telephone Directories collection . Full text OCR files are also included for a subset of the records when they exist.
Metadata | Metadata formats | Full text OCR files |
---|---|---|
3,511 records | .csv, .json | 486 .txt files |
The Dot Gov Datasets are the result of exploratory work conducted by the Library's Web Archiving Program to make the Web Archives more widely accessible and usable. This data package consists of seven datasets, each containing information related to 1,000 or more files of related media types selected from .gov domains in the Library's Web Archives (i.e. audio, CSV, image, PDF, Powerpoint, TSV, and XLS data formats).
Metadata | Metadata format | Data Files |
---|---|---|
7,000 records | .csv | 7,000 files in multiple formats |
The Stereograph Card dataset consists of 39,526 stereograph card images from the 1850s through 1924, a subset of what was available online in the collection on loc.gov in August 2022.
Metadata | Metadata formats | Data files |
---|---|---|
39,532 records | .csv, json | 39,526 .jpg files |
The Directory Holdings Data Package consists of metadata describing the Library of Congress inventoried holdings of United States Telephone Directories, City Directories, and Criss-cross directories. It is based on the inventory tables listed on the Library's United States: City and Telephone Directories and Directories By Address: Inventories of Library Collections Library Guides . The data is presented two ways: by Directory Type and by state/region.
Metadata | Metadata formats |
---|---|
250,318 records | .csv, .json |
This dataset comprises 84,058 files containing full text from 90,414 books in the Selected Digitized Books collection on loc.gov. The text was created as part of digitization workflows using Optical Character Recognition (OCR) technologies.
Metadata | Metadata formats | Data files |
---|---|---|
90,414 records | .csv, .json | 84,058 full text files (.txt, .json) |
This experimental dataset contains 4,998 images in TIFF format representing non-georeferenced map sheets and corresponding GeoTIFF formatted images that are georeferenced, and have had the map collars (non-map portions of the image at the edge of the sheet) removed. The data comprises scans and georeferenced images of maps surveying the Austro-Hungarian empire from approximately 1875 to shortly after its collapse in 1918.
Data files |
---|
4,998 Georeferenced Map sheets (GeoTIFF, .tif) |
The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area (based on the Library's Collections Policy Statements ) (CPS). As part of this project, the Library is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments.
Metadata | Metadata format |
---|---|
894,692 records | .csv |