General Collections Assessment Data Package

The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials, and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area, based on the Library's Collections Policy Statements (CPS). As part of this project, the Library is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments.

About this dataset

The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials, and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area, based on the Library's Collections Policy Statements (CPS). Goals for the program include allowing the Library to assess its effectiveness in meeting its collecting mandate; providing action steps to address any issues identified through the assessments; and building a data-gathering process to support ongoing and future assessment. As part of this project, the Library is making the datasets publicly available here.

The General Collections Assessment is being conducted in segments, using the Library's Collections Policy Statements (CPS) as a framework for structuring the analysis and reporting. A total of 45 segment assessments are planned. The Collection Development Office is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments. The list below contains links to datasets and the relevant CPS for each segment. The project is ongoing, and more segment datasets will be added to this page as they are completed.

Methodology: Each segment assessment analyzes the bibliographic data of books and serials in the subject area covered by a Collections Policy Statement. For more information on how the data were cleaned, analyzed, and standardized, as well as the Dataset Field Descriptions, please see the README.

View source collection

Metadata: 894,692 records
Metadata format: .csv

Data package documentation

Included in this data package is comprehensive documentation of source data or collection provenance, the contents of the data package, and how the data package was created. Here are some particular sections of interest as well as a link to the full documentation:

View the documentation

How to access and use this data package

There are two main options for accessing and using this data package: (1) downloading files directly from this page and (2) using Python for more advanced work.

Direct downloads

The following list outlines the contents of this data package. Many of the individual files in the data package are linked directly on this page, so you can download and use them immediately. Zipped files are available for bulk download of all or part of the data package.

Download everything
Sample the data
  • sample-data.zip (65.5 KB) - 100 randomly selected items from each of the three assessments
  • sample-data/chi.csv (31.3 KB) - A CSV file containing the metadata for the 100 sample items from the Children's Literature assessment.
  • sample-data/localhistory_us.csv (31.1 KB) - A CSV file containing the metadata for the 100 sample items from the Local History assessment (U.S. scope only).
  • sample-data/philosophy.csv (30.0 KB) - A CSV file containing the metadata for the 100 sample items from the Philosophy assessment.
Download the documentation
  • README.html - An overview of the source data or collection provenance, the contents of the data package, and how the data package was created.
  • README.md (10.1 KB) - README as a Markdown text file
  • README.pdf (16.7 KB) - README as a PDF file
Download the metadata
  • chi.csv (100.0 MB) - Bibliographic data for 331,146 collection items used in the Children's Literature assessment.
  • localhistory_us.csv (97.5 MB) - Bibliographic data for 321,869 collection items used in the Local History assessment (U.S. scope only).
  • philosophy.csv (70.7 MB) - Bibliographic data for 241,677 collection items used in the Philosophy assessment.
  • README.html#dataset-field-descriptions - Metadata field descriptions
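
Because the full metadata files range from about 70 MB to 100 MB, it can be useful to read them row by row rather than opening them in a spreadsheet. The following is a minimal sketch using only the Python standard library. It assumes the file has already been downloaded to the working directory, that it is UTF-8 encoded, and that its first row is a header; none of this is confirmed on this page, so check the README first. The function name `summarize_csv` is illustrative and not part of the data package.

```python
import csv
from pathlib import Path


def summarize_csv(path, encoding="utf-8-sig"):
    """Return (field_names, record_count) for a CSV file, reading row by row.

    The encoding is an assumption; "utf-8-sig" also tolerates a leading
    byte-order mark if one is present.
    """
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        # The header row is consumed the first time fieldnames is accessed.
        field_names = list(reader.fieldnames or [])
        # Count the remaining rows without holding them in memory.
        record_count = sum(1 for _ in reader)
    return field_names, record_count


if __name__ == "__main__":
    # Assumes chi.csv was downloaded from this page into the current folder.
    target = Path("chi.csv")
    if target.exists():
        fields, count = summarize_csv(target)
        print(f"{count:,} records, {len(fields)} fields")
        print(fields)
```

If each record occupies one CSV row, the count returned for each file should match the item count listed above for that file, which gives a quick check that a download completed.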

Using Python

While direct downloads are more convenient for most activities, users who are comfortable writing Python can perform more advanced and complex tasks programmatically.

For your convenience, we have developed a number of Jupyter Notebooks to help you get started.

View the Python notebook for this data package
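
As a rough illustration of the kind of task Python makes easy (this is a sketch, not a substitute for the notebook above), the code below reads one sample CSV straight out of the downloaded zip file and tallies the most common values in a single column. The helper names `read_csv_from_zip` and `tally_column` are hypothetical. The member path inside the archive and the UTF-8 encoding are assumptions, and the column name must be replaced with a real field taken from the README's Dataset Field Descriptions.

```python
import csv
import io
import zipfile
from collections import Counter


def read_csv_from_zip(zip_path, member, encoding="utf-8-sig"):
    """Return the rows of one CSV file inside a zip archive as a list of dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))


def tally_column(rows, column, top=10):
    """Return the `top` most common non-empty values in one column."""
    counts = Counter(
        row[column].strip()
        for row in rows
        if (row.get(column) or "").strip()  # skip missing or blank cells
    )
    return counts.most_common(top)
```

A possible usage, assuming the archive keeps the paths shown in the download list, would be `rows = read_csv_from_zip("sample-data.zip", "sample-data/chi.csv")` followed by `tally_column(rows, "<field name from the README>")`. Calling `zipfile.ZipFile("sample-data.zip").namelist()` first shows the actual member paths.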

Dataset details

Source collection

General Collections

These data were produced as part of an ongoing program to assess the Library's approximately 22 million books, bound serials and other materials classified under the General Collections.

Rights statement: These data are free to reuse.
Date created: 2023-11-28
Date updated: 2024-04-01
Creators & contributors
Creator: Collection Development Office, Library of Congress, based on existing bibliographic records
Cite this dataset
Chicago citation style:
Library Of Congress. General Collections Assessment Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. https://data.labs.loc.gov/gen-coll-assessment/.
APA citation style:
Library Of Congress. (2023) General Collections Assessment Data Package. [Washington, D.C.: Library of Congress] [Software, E-Resource] Retrieved from the Library of Congress, https://data.labs.loc.gov/gen-coll-assessment/.
MLA citation style:
Library Of Congress. General Collections Assessment Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. Retrieved from the Library of Congress, <data.labs.loc.gov/gen-coll-assessment/>.
Curatorial questions: Please email any questions or comments to [email protected].
Access questions: For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs Team at [email protected].