
Digitized Telephone Directories, 1891-1988 Data Package

This dataset contains metadata records for a subset of 3,513 reels of US telephone directories, digitized from microfilm, from the Digitized Telephone Directories collection. Full text OCR files are also included for the subset of records where they exist.

[Image: Florida - White and Yellow Pages - Belle Glade - Pahokee - 12/1938 thru 8/1953. A black-and-white directory page titled Telephone Directory, with four advertisements above and below the title.]

About this dataset

This experimental dataset was created as part of an LC Labs experiment in collaboration with AVP to understand the benefits, risks, quality benchmarks, workflows, compilation methods, transformations, and documentation practices required to assemble datasets for public use in the cloud.


Metadata: 3,511 records
Metadata formats: .csv, .json
Full text OCR files: 486 .txt files

Data package documentation

Included in this data package is comprehensive documentation of the source data or collection provenance, the contents of the data package, and how the data package was created.

View the documentation


How to access and use this data package

There are two main options for accessing and using this data package: (1) downloading files directly from this page and (2) using Python for more advanced tasks.

Direct downloads

The following list outlines the contents of this data package. Many of the individual files in the data package are linked directly on this page, so you can download and use them immediately. Zipped files are available for bulk download of all or part of the data package.

Download everything
  • telephone.zip (904.7 MB) - Full data package which includes metadata files and 486 .txt files, zipped
Sample the data
  • sample-data.zip (23.6 MB) - Sample data: 100 randomly selected items from the 3,511-item set, with their corresponding full text files where available. Also included are metadata.csv, metadata.json, and manifest.json.
  • sample-data/metadata.json (604.7 KB) - A JSON file containing the metadata for the 100 sample items
  • sample-data/metadata.csv (417.0 KB) - A CSV transformation of the sample JSON metadata
  • sample-data/manifest.html - A simple page for downloading individual full text files, listing each text's file ID, item ID, MD5 hash (base64), file size, and URL
  • sample-data/manifest.json (3.3 KB) - A JSON file listing each text's file ID, item ID, MD5 hash (base64), file size, and URL
Download the documentation
  • README.html - An overview of the source data or collection provenance, the contents of the data package, and how the data package was created.
  • README.md (29.3 KB) - README as a Markdown text file
  • README.pdf (30.6 KB) - README as a PDF file
Download the metadata
Download the full text files
  • data.zip (901.1 MB) - All 486 text files, zipped
  • manifest.html - A simple page for downloading individual text files, listing each text's file ID, item ID, MD5 hash (base64), file size, and URL. For bulk downloads, refer to the Using Python section below.
  • manifest.txt (75.5 KB) - A text file listing each text's file ID, item ID, MD5 hash (base64), file size, and URL
  • manifest.json (82.2 KB) - A JSON file listing each text's file ID, item ID, MD5 hash (base64), file size, and URL
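Because the manifests give each file's MD5 hash in base64 rather than the more common hexadecimal form, a downloaded file can be checked by base64-encoding its raw MD5 digest and comparing strings. The following is a minimal sketch using only the Python standard library; `md5_base64` and `verify_file` are hypothetical helper names, not part of this data package, and reading the expected hash out of a manifest is left to you.

```python
import base64
import hashlib
from pathlib import Path


def md5_base64(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's MD5 digest encoded as base64 (not hex)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large text files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Encode the raw 16-byte digest, which is how a base64 MD5 is formed.
    return base64.b64encode(digest.digest()).decode("ascii")


def verify_file(path: Path, expected_md5_b64: str) -> bool:
    """Compare a downloaded file against a base64 MD5 copied from a manifest."""
    return md5_base64(path) == expected_md5_b64.strip()
```

For example, `verify_file(Path("output/telephone/some-file.txt"), hash_from_manifest)` returns False if the download was truncated or corrupted (the path and variable here are placeholders).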

Using Python

While direct downloads are more convenient for most activities, users familiar with Python can perform more advanced and complex tasks programmatically.

For your convenience, we have developed several Jupyter notebooks to help you get started.

View the Python notebook for this data package
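The notebook is the more complete starting point. For a quick first look at the sample metadata without it, the following minimal sketch loads metadata.csv with Python's built-in csv module. It assumes you have unzipped sample-data.zip locally and that the file is UTF-8 encoded. No column names are assumed: they come from the file's own header row. `load_metadata` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_metadata(csv_path: Path) -> list[dict[str, str]]:
    """Read a metadata.csv file into a list of row dictionaries.

    Keys are taken from the CSV header row; none are hard-coded here.
    """
    # newline="" lets the csv module handle quoted fields containing line breaks.
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Usage: `rows = load_metadata(Path("sample-data/metadata.csv"))`, then inspect `len(rows)` and `rows[0].keys()` to see how many records were read and which fields the CSV provides.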

Bulk downloads using Python

For bulk downloads, refer to this Python script for downloading files in bulk. Sample commands for this data package:

Download all OCR'd text files in this package

python bulk_download.py --package "https://data.labs.loc.gov/telephone/" --out "output/telephone/"
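The bulk_download.py script above is the supported route. To illustrate the general idea only, here is a minimal standard-library sketch that downloads a list of file URLs into an output folder and skips files already on disk, so an interrupted run can be restarted. It assumes you have already extracted the URLs from manifest.json, whose field layout is not described on this page. `download_all` and `_fetch` are hypothetical names, not part of bulk_download.py. The sketch does not retry or verify hashes.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import urlparse


def _fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def download_all(
    urls: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = _fetch,
) -> list[Path]:
    """Download each URL into out_dir, skipping files that already exist.

    The local file name is the last component of the URL path, so two URLs
    ending in the same name would map to the same local file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        name = Path(urlparse(url).path).name
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        target = out_dir / name
        if not target.exists():  # existing files are kept, making reruns resumable
            target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

The `fetch` parameter is injectable so the download logic can be tested without network access, or swapped for a client with retries.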

Dataset details

Source collection

U.S. Telephone Directory Collection

The Library of Congress makes available to the public an extensive collection of past and present city, telephone, and reverse telephone (criss-cross) directories for the United States and many foreign countries. These directories are available in a variety of formats and locations. The collection spans most of the 20th century and includes directories from Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, the District of Columbia, Florida, Georgia, Hawaii, Iowa, Maryland, Pennsylvania, and the city of Chicago. All the directories and their metadata records are in English. Metadata records and directories do not correspond one-to-one, because some microfilm reels contained multiple directories when they were digitized. The directories include a mix of white pages and yellow pages, and some may be missing pages due to damage.

Rights statement: All white pages are in the public domain, as are any pre-1964 yellow pages that were not registered and renewed for copyright. For more information, see https://www.loc.gov/collections/united-states-telephone-directory-collection/about-this-collection/rights-and-access/.
Date created: 2023-05-05
Date updated: 2024-03-29
Creators & contributors
Creator:
AVP
Contributors:
LC Labs
History & Genealogy Section
Cite this dataset
Chicago citation style:
Library of Congress. Digitized Telephone Directories, 1891-1988 Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. https://data.labs.loc.gov/telephone/.
APA citation style:
Library of Congress. (2023). Digitized Telephone Directories, 1891-1988 Data Package. [Washington, D.C.: Library of Congress] [Software, E-Resource] Retrieved from the Library of Congress, https://data.labs.loc.gov/telephone/.
MLA citation style:
Library of Congress. Digitized Telephone Directories, 1891-1988 Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. Retrieved from the Library of Congress, <https://data.labs.loc.gov/telephone/>.
Curatorial questions: For curatorial questions about the content of the collection or technical questions about the dataset formats and composition, please contact the History and Genealogy Section via the Library's Ask a Librarian service at https://ask.loc.gov/genealogy-local-history/.
Access questions: For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs team at [email protected].