Directory Holdings Data Package README

Version information

Version 1.2 | Last updated 2024-04-17

CONTENT ADVISORY

Please note that terminology in historical materials and in Library descriptions does not always match the language preferred by members of the communities depicted, and may include negative stereotypes or words that offend.

About the source data or collection

Brief description & background of collection

City and telephone directories provide specific information on a locality’s residents and addresses, and may also include occupation or telephone number. Family historians often use these directories to fill in the gaps between censuses. Since they are updated regularly—often at least once a year—directories help to pinpoint where and with whom ancestors lived each year, which in turn helps researchers to know where to look for more records. Directories are also a terrific resource for house history, business history, and local history in general. The Library of Congress makes available to the public an extensive collection of past and present city, telephone, and reverse telephone (criss-cross) directories for the United States and many foreign countries. These directories are available in a variety of formats and locations.

City directories are among the most important sources of information about urban areas and their inhabitants. They provide personal and professional information about a city's residents as well as information about its business, civic, social, religious, charitable, and literary institutions. City directories often provide additional information about individuals such as place of employment and name of spouse. The entries are arranged alphabetically by last name and also by address and telephone number. City directories are compiled through door-to-door surveys and are published at irregular intervals. The Library of Congress has an unmatched collection of United States city directories. They are available in a variety of formats: paper, microfiche, microfilm, and electronic.

City Directories of the United States is a large, self-service collection of city directories on microfilm. More than 1200 cities, towns, and counties are represented; the years of coverage are primarily 1861 through 1960. The directories are arranged by city (not by state) in large cabinets in the Microform and Electronic Resources Center. Paper copies of the guide to city directories can be found in the Microform and Electronic Resources Center and at the Local History and Genealogy Reference Desk in the Main Reading Room. Directories for many localities include listings for other nearby communities. When ranges of years are indicated, there may be gaps of as many as five years.

Telephone directory inventories are organized by state and detail the available directories by city/town and year.
* Telephone directories for many U.S. cities and towns from 1976 through 1995 can be accessed through phonefiche housed in the Microform and Electronic Resources Center (LJ 139B).
* Pre-1976 directories for fourteen states (Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Iowa, Maryland, Pennsylvania), the District of Columbia, and the city of Chicago have been microfilmed and digitized by the Library of Congress. They can be requested in the Microform and Electronic Resources Center or viewed online in the Library's Digital Collections.
* Telephone directories for New York City and the surrounding area published from 1878 through 1959 can be accessed through the self-service microfilm collection in the Microform and Electronic Resources Center.
* Microfilmed telephone directories that have not been digitized and are not part of the New York City collection may be requested in the Microform and Electronic Resources Center.
* Any directories in this index that are not specified as microfilm, phonefiche, or digital are available to view in printed form. Print copies of telephone directories are served in the Science & Business Reading Room on the 5th floor of the John Adams Building.

Criss-cross directories contain information arranged by address: first by street name and then by house number. In more recent years, criss-cross directories also contain listings organized by telephone number or telokey. The inventory of these directories at the Library of Congress is arranged alphabetically by the name of the city or town and primarily begins in the 1960s.

Original format

Paper, microfiche (including phonefiche), microfilm, and electronic formats.

Library of Congress reading room

Contact

For questions or more information about this material, please contact Local History & Genealogy Reference Services or Business Reference Services staff through the Ask a Librarian service.

Metadata type

Library Guide inventory

Scale of description

Directory holdings are listed by coverage only in tables on the Library Guides listed below. These listings represent only a portion of all directories held by the Library of Congress.

About this exploratory data package

This dataset contains localities represented in three types of directories held by the Library: City Directories, Telephone Directories, and Criss-cross Directories (also called reverse telephone directories). Family historians often use these directories to fill in the gaps between censuses. Since they are updated regularly, often at least once a year, directories help to pinpoint where and with whom ancestors lived each year, which in turn helps researchers to know where to look for more records. Directories are also a terrific resource for house history, business history, and local history in general. Criss-cross directories contain information arranged by address: first by street name and then by house number. In more recent years, criss-cross directories also contain listings organized by telephone number or telokey.

The dataset is based on the inventory tables listed on two Library Guides: "United States: City and Telephone Directories" and "Directories By Address: Inventories of Library Collections". While the original tables list state/locality and date ranges of coverage (including missing years), the datasets break this coverage down by year, with one row per state/locality and year of coverage. Because of the large volume of data, datasets are offered by type of directory and combined by state. For telephone directories, localities have been cross-referenced with the Digitized Telephone Directories dataset to provide links to corresponding digitized copies, if available.

Library Guide inventories were usually compiled by recording the information listed on the spine of each directory, which may not include all localities represented within. Also, inventories were not created for all states held by the Library, so this dataset should not be interpreted as a comprehensive list of all directories held by the Library.

This dataset was created as part of an LC Labs experiment in collaboration with AVP to understand the benefits, risks, quality benchmarks, workflows, compilation methods, transformations, and documentation practices required to assemble datasets for public use in the cloud.

The goal of this dataset in particular is to explore ways of bringing together curated lists of items selected by subject specialists or other knowledgeable staff into a single dataset. Through the creation of this dataset for the experiment, LC Labs and AVP hope to learn:
- How to compile datasets from curated lists, such as LibGuides, or recommendations from subject specialists.
- What interactions are important to ensuring as many relevant resources are included as possible and that original context is communicated to dataset users.
- How these curated lists can be maintained for specialists to continue to update.

What's included?

The data package contains:

Computational readiness and possible uses

The data in this dataset have been selected, structured, standardized, and enriched to make the dataset more easily comprehensible and computable through a range of methods and in a variety of environments. Enrichment of locations with coordinates and structured addresses could support plotting records in mapping interfaces. Enrichment and standardization of dates could enable sequencing in timelines.
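As one illustration of the timeline use mentioned above, the sketch below collapses per-year rows back into contiguous coverage ranges for each locality, which makes gaps in coverage visible. The `coverage_ranges` helper and the sample rows are hypothetical; only the `Locality` and `Date` field names come from this README, and `Date` is assumed here to hold a plain four-digit year.

```python
from collections import defaultdict


def coverage_ranges(rows):
    """Group rows by Locality and collapse their years into (start, end) ranges."""
    years_by_locality = defaultdict(set)
    for row in rows:
        years_by_locality[row["Locality"]].add(int(row["Date"]))

    ranges = {}
    for locality, years in years_by_locality.items():
        spans = []
        for year in sorted(years):
            if spans and year == spans[-1][1] + 1:
                # Year continues the current span, so extend its end.
                spans[-1] = (spans[-1][0], year)
            else:
                # A gap (or the first year) starts a new span.
                spans.append((year, year))
        ranges[locality] = spans
    return ranges


# Hypothetical rows, shaped like records in this dataset.
sample_rows = [
    {"Locality": "Pendleton City", "Date": "1937"},
    {"Locality": "Pendleton City", "Date": "1938"},
    {"Locality": "Pendleton City", "Date": "1941"},
]
print(coverage_ranges(sample_rows))
```

A locality with more than one span has at least one gap in its listed coverage.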

How was it created?

This dataset was created through a four-stage process including data extraction, mapping and standardization to a specified schema, enrichment of certain fields with additional data, and packaging for access and use.

Extraction

This dataset was created by first extracting tables from the online Library Guides and converting them into spreadsheets. The data was reshaped to split date coverage into one row per year for each locality represented. This process yielded 250319 rows across all directory types.

Standardization

Entries in the tables were then split into multiple rows by parsing dates/ranges and lists of localities so that each row contained a unique combination of a locality and year of coverage. For example, the City Directory entry 'Oregon, Pendleton City/Umatilla County' for the years 1937-1960 was split into a set of rows for each year within the date range for Pendleton City and another set of rows for Umatilla County. Because of inconsistencies in the locality listings, manual cleanup needed to be performed before splitting the rows, so there will be some instances of human error in the output. A separate column was created for state and additional columns were created with boilerplate values, such as Directory_type, Language, and Type_of_resource.
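The splitting step can be sketched roughly as follows. This is not the script used to build the dataset; `expand_entry` is a hypothetical helper that assumes localities are separated by "/" and that coverage is written as comma-separated four-digit years or year ranges. Real inventory entries were less regular than this and needed manual cleanup first.

```python
import re


def expand_entry(state, localities, years_text):
    """Return one row per (locality, year) for a single inventory entry."""
    years = []
    for part in years_text.split(","):
        part = part.strip()
        match = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            years.extend(range(start, end + 1))  # ranges are inclusive
        elif re.fullmatch(r"\d{4}", part):
            years.append(int(part))
        # Anything else (free text, partial dates) is skipped in this sketch.

    rows = []
    for locality in localities.split("/"):
        for year in years:
            rows.append({
                "State_region": state,
                "Locality": locality.strip(),
                "Date": str(year),
                "Date_text": years_text,
            })
    return rows


rows = expand_entry("Oregon", "Pendleton City/Umatilla County", "1937-1960")
print(len(rows))  # 24 years x 2 localities = 48 rows
print(rows[0])
```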

Enrichment

After standardization, some data fields were enriched to bring additional value for potential use cases. In this dataset, locations were concatenated into "city/town/village, state" strings, queried in OpenStreetMap, and enriched with structured location data, geocoordinates, and URLs to the structured data record in OpenStreetMap. Results were filtered to include only place types with one of the following values: "hamlet", "town", "city", "village", "county", "state", "province", "locality", "country", "suburb", "borough". If there was more than one result, the script chose the first result to encode in the output data. This approach may have produced inaccuracies in the enriched data due to idiosyncrasies in recorded locations or misspellings in the original metadata.
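The filter-then-take-first rule described above might look something like the sketch below. It makes no network calls: `candidates` is hypothetical geocoder output, and the `type` and `display_name` keys are assumptions about the result shape rather than a description of the actual enrichment script.

```python
ALLOWED_PLACE_TYPES = {
    "hamlet", "town", "city", "village", "county", "state",
    "province", "locality", "country", "suburb", "borough",
}


def pick_location(results):
    """Return the first result whose place type is allowed, or None."""
    for result in results:
        if result.get("type") in ALLOWED_PLACE_TYPES:
            return result
    return None


# Hypothetical results for the query string "Pendleton, Oregon".
candidates = [
    {"type": "station", "display_name": "Pendleton Station"},
    {"type": "city", "display_name": "Pendleton, Umatilla County, Oregon, United States"},
    {"type": "town", "display_name": "Pendleton, Another County"},
]
chosen = pick_location(candidates)
print(chosen["display_name"])
```

Because the first acceptable result wins, a same-named place elsewhere can be selected when it happens to rank higher, which is one source of the inaccuracies noted above.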

The telephone directory holdings were then cross-referenced with the Digitized U.S. Telephone Directories, 1891-1988 (Metadata only) Dataset to determine whether a given location and date combination had been digitized. This cross-referencing was done by searching the Digitized U.S. Telephone Directories for the State, Locality, and Date of each item in this dataset. If one or more matches were found, the item was marked as Digitized and the URL(s) of the matching item(s), retrieved from Digitized U.S. Telephone Directories, were added.
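A minimal sketch of this kind of cross-referencing is shown below, using a lookup keyed on state, locality, and year. The record shapes, the case-insensitive matching, and the example.org URLs are all assumptions made for illustration; the field names of the digitized dataset are not documented in this README.

```python
from collections import defaultdict


def build_index(digitized_records):
    """Map (state, locality, year) to the URLs of matching digitized items."""
    index = defaultdict(list)
    for record in digitized_records:
        key = (record["State"].lower(), record["Locality"].lower(), record["Date"])
        index[key].append(record["Url"])
    return index


def mark_digitized(holding, index):
    """Return a copy of a holdings row with Digitized and Url filled in."""
    key = (holding["State_region"].lower(), holding["Locality"].lower(), holding["Date"])
    urls = index.get(key, [])
    return {**holding, "Digitized": bool(urls), "Url": urls}


# Hypothetical digitized records and holdings rows.
digitized = [
    {"State": "Alabama", "Locality": "Mobile", "Date": "1950", "Url": "https://example.org/item/1"},
]
index = build_index(digitized)
hit = mark_digitized({"State_region": "Alabama", "Locality": "Mobile", "Date": "1950"}, index)
miss = mark_digitized({"State_region": "Alabama", "Locality": "Mobile", "Date": "1951"}, index)
print(hit["Digitized"], hit["Url"])
print(miss["Digitized"], miss["Url"])
```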

Packaging

After enrichment, the dataset was output in JSON, JSONL (JSON lines), and CSV formats. CSV files were flattened from JSON using the following rules:
- Arrays were flattened to strings, with array items delimited by the pipe character.
- Enriched locations (Location) are included in their original JSON form in the Location column. The Location.Full_name is included in a Location_full_name column with multiple locations delimited by the pipe character. Coordinates from Location.Coordinates are listed as lat,long pairs in the Coordinates column in the same order as in Location_full_name. State_region and County columns were added for easier grouping and filtering of the data.
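The flattening rules above can be approximated as in the sketch below. It assumes that Location is a list of objects carrying `Full_name` and `Coordinates`, and that every other array holds plain strings; the sample record, including its coordinates, is hypothetical, and the added State_region and County columns are left out.

```python
import csv
import io
import json


def flatten_record(record):
    """Flatten one JSON record into a dict of CSV-ready values."""
    locations = record.get("Location", [])
    row = {}
    for field, value in record.items():
        if field == "Location":
            row[field] = json.dumps(value)  # kept in its JSON form
        elif isinstance(value, list):
            row[field] = "|".join(str(item) for item in value)
        else:
            row[field] = value
    row["Location_full_name"] = "|".join(loc["Full_name"] for loc in locations)
    row["Coordinates"] = "|".join(
        ",".join(str(part) for part in loc["Coordinates"]) for loc in locations
    )
    return row


# Hypothetical record; the values are illustrative only.
record = {
    "Locality": "Pendleton City",
    "Date": ["1937"],
    "Notes": ["note a", "note b"],
    "Location": [
        {"Full_name": "Pendleton, Oregon", "Coordinates": [45.67, -118.79]},
    ],
}
row = flatten_record(record)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```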

Because of the large volume of data, directory holdings data was grouped in two ways: all directory types combined and grouped by State_region, and each directory type grouped as a single dataset (with telephone directory CSVs split into multiple files, small enough to be opened in spreadsheet programs).

Dataset field descriptions

The data fields that follow were compiled from a "General Purpose" schema designed for the Data Transformation Services experiment and supplemented with additional fields specific to this collection and/or anticipated uses of the data. Values have been sourced from the Library Guide tables or templated with static values where necessary (denoted by "Supplied"). These mappings are indicated in the "Data source" column in the table below. Some values have been standardized for consistency across the dataset or interoperability with other datasets using similar data structures, standards, or controlled vocabularies. Types and descriptions of standardizations are listed above in the "How was it created" section and indicated in the "Standardization" column of the table below. Enrichments are also described in the "How was it created" section and are indicated below.

The data fields that follow are directly translated from the metadata.json file. The JSON file is nested in nature, and that nested structure is not strictly carried over into the CSV. When JSON fields have been flattened or otherwise altered to fit a CSV field, the transformation is described below.

Each of the fields described below appears for an object or row in the dataset. Please note that not all elements appear for each result. The number and percentage of results populated for each field are indicated in the table below as well as in a summary.csv file in this package.

| Field | Datatype | Definition | Requirement | Repeatability | Data Source | Percent Populated |
| --- | --- | --- | --- | --- | --- | --- |
| Coordinates | Text | (CSV only) The coordinate pair (lat, long) of a location matched through the OpenStreetMap location enrichment. Multiple coordinate sets are delimited with the pipe character. Corresponds to the location string in the same position in the Location_full_name column. | Optional | Y | Enrichment: Location | |
| Date | Date (EDTF) | A year in which the locality has coverage by the directory. | Optional | Y | Library Guide tables | 100% |
| Date_text | Text | Free text years or date ranges in which the locality has coverage by the directory. | Optional | Y | Library Guide tables | 100% |
| Digitized | Boolean | (Telephone Directories only) Whether or not the year covered is also represented by a digitized directory. | Optional | N | Enrichment: Telephone Directory cross-referencing | 77.38% |
| Directory_type | Text | The type of directory: City Directory, Telephone Directory, or Criss-cross Directory. | Required | N | Library Guide tables | 100% |
| Genre | Text | A genre for the resource. | Optional | Y | Supplied | 100% |
| Language | Text | The language(s) of the content of the resource. | Optional | Y | Supplied | 100% |
| Locality | Text | The locality covered by a directory. | Required | N | Library Guide tables | 99.99% |
| Location | Object | Structured representation of a location, including parent administrative divisions, where applicable, and geocoordinates. | Optional | Y | Enrichment: Location | 92.45% |
| Location_temp | Text | Temporary field for storing extracted location data for enrichment stage. This is removed during the enrichment stage. | Optional | Y | Library Guide tables | N/A |
| Location_text | Text | Textual representations of a location, copied directly from the source record. | Optional | Y | Library Guide tables | 99.99% |
| Location_full_name | Text | (CSV only) The full location string of a location matched through the OpenStreetMap location enrichment. Multiple locations are delimited with the pipe character. Corresponds to the coordinate pair in the same position in the Coordinates column. | Optional | Y | Enrichment: Location | |
| Location.Address | Text | The administrative divisions that make up the full address of the location. | Required | N | Enrichment: Location | |
| Location.Address.[place_type] | Text | An administrative division of the location, e.g. Country, State, or City. [place_type] is replaced by the address type from the address block in the OpenStreetMap data. All parent administrative divisions as well as the type of location are included. | Required | N | Enrichment: Location | |
| Location.Coordinates | List | The coordinates (lat, long) of the location. | Required | N | Enrichment: Location | |
| Location.Full_name | Text | The full display name of the location, taken from OpenStreetMap. | Required | N | Enrichment: Location | |
| Location.Osm_url | URL | The OpenStreetMap URL for the location. | Required | N | Enrichment: Location | |
| Location.Short_name | Text | The name of the lowest level administrative division of the location. | Required | N | Enrichment: Location | |
| Notes | Text | Additional information about the content, context, or physical description of the resource. | Optional | Y | Library Guide tables | 100% |
| Original_format | Text | The format the resource was digitized from. | Optional | Y | Library Guide tables | 48.82% |
| Repository | Text | The repository that holds the physical or digital resource. | Optional | N | Supplied | 100% |
| Shelf_id | Text | An identifier for finding the original physical resource. | Optional | Y | Library Guide tables | 0.93% |
| Source_collection | Text | The collection the resource belongs to. | Optional | Y | Supplied | 100% |
| State_region | Text | The state or region of the locality covered by a directory. | Required | N | Library Guide tables | 100% |
| Telephone_directory_type | Text | (Telephone Directories only) The type of telephone directory: White Pages, Yellow Pages, or White and Yellow Pages. | Required | N | Supplied | 74.03% |
| Type_of_resource | Text | A term that specifies the characteristics and general type of content of the resource, such as "Still image" or "Text." Based on MODS 3.7 enumerated list of values for typeOfResource: https://www.loc.gov/standards/mods/userguide/typeofresource.html | Required | Y | Supplied | 100% |
| Url | URL | (Telephone Directories only) The Digital Collections URL for the resource, if there is a corresponding digitized version of the telephone directory. | Optional | N | Enrichment: Telephone Directory cross-referencing | 2.68% |
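When working from the CSV files, the paired Coordinates and location name columns can be read back into (name, (lat, long)) tuples along the lines of the sketch below. `parse_locations` is a hypothetical helper, the sample row is made up, and the column name `Location_full_name` follows the Packaging section above.

```python
def parse_locations(row):
    """Pair each location name with its (lat, long) tuple from one CSV row."""
    names = row.get("Location_full_name", "")
    coords = row.get("Coordinates", "")
    if not names or not coords:
        return []
    pairs = []
    # Both columns are pipe-delimited and share the same ordering.
    for name, coord in zip(names.split("|"), coords.split("|")):
        lat, lon = (float(part) for part in coord.split(","))
        pairs.append((name, (lat, lon)))
    return pairs


# Hypothetical CSV row, as it might be produced by csv.DictReader.
sample_row = {
    "Location_full_name": "Pendleton, Oregon|Umatilla County, Oregon",
    "Coordinates": "45.67,-118.79|45.59,-118.74",
}
for name, (lat, lon) in parse_locations(sample_row):
    print(name, lat, lon)
```

Location names contain commas themselves, so the split is done on the pipe character first and only the coordinate pairs are split on commas.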

Rights Statement

Please note that terminology in historical materials and in Library descriptions does not always match the language preferred by members of the communities depicted, and may include negative stereotypes or words that offend.

Creator and contributor information

Creator: AVP

Contributors: LC Labs

Contact information

Please contact [email protected] with any questions or suggestions!