General Collections Assessment Data Package

Poster shows an illustration of a young woman, wearing a green coat and hat and sitting in a blue chair, reading a list of names for gifts she has to buy. — Why not books? for birthdays, weddings, graduations, holidays.

About this dataset

The General Collections Assessment is an ongoing program to assess the Library's approximately 22 million books, bound serials, and other materials classified under the General Collections. Assessments will be completed in segments divided by subject area, based on the Library's Collections Policy Statements (CPS). Goals for the program include allowing the Library to assess its effectiveness in meeting its collecting mandate; providing action steps to address any issues identified through the assessments; and building a data-gathering process to support ongoing and future assessment. As part of this project, the Library is making the datasets publicly available here.

The General Collections Assessment is being conducted in segments, using the Library's Collections Policy Statements (CPS) as a framework for structuring the analysis and reporting. A total of 45 segment assessments are planned. The Collection Development Office is making available for exploration the underlying bibliographic datasets used as the primary data sources for the collection assessments. The list below contains links to datasets and the relevant CPS for each segment. The project is ongoing, and segment datasets will be added to this page as they are completed.

Methodology: Each segment assessment analyzes the bibliographic data of books and serials in the subject area covered by a Collections Policy Statement. For more information on how the data were cleaned, analyzed, and standardized, as well as the Dataset Field Descriptions, please see the README.

View source collection

Metadata	Metadata format
894,692 records	.csv

Data package documentation

Included in this data package is comprehensive documentation of source data or collection provenance, the contents of the data package, and how the data package was created. Here are some particular sections of interest as well as a link to the full documentation:

View the documentation

How to access and use this data package

There are two main options for accessing and using this data package: (1) downloading files directly from this page and (2) using Python for more advanced usage.

Direct downloads

The following list outlines the contents of this data package. Many of the individual files inside the data package are linked directly on this page, so you can download them and use them immediately. Zipped files are available for bulk download of all or part of the data package.

Download everything	gen-coll-assessment.zip (46.3 MB) - Full data package which includes documentation and metadata files, zipped
Sample the data	sample-data.zip (65.5 KB) - 100 randomly selected items from each of the three assessments
	sample-data/chi.csv (31.3 KB) - A CSV file containing the metadata for the 100 sample items from the Children's Literature assessment.
	sample-data/localhistory_us.csv (31.1 KB) - A CSV file containing the metadata for the 100 sample items from the Local History assessment (U.S. scope only).
	sample-data/philosophy.csv (30.0 KB) - A CSV file containing the metadata for the 100 sample items from the Philosophy assessment.
Download the documentation	README.html - An overview of the source data or collection provenance, the contents of the data package, and how the data package was created.
	README.md (10.1 KB) - README as a Markdown text file
	README.pdf (16.7 KB) - README as a PDF file
Download the metadata	chi.csv (100.0 MB) - Bibliographic data for 331,146 collection items used in the Children's Literature assessment.
	localhistory_us.csv (97.5 MB) - Bibliographic data for 321,869 collection items used in the Local History assessment (U.S. scope only).
	philosophy.csv (70.7 MB) - Bibliographic data for 241,677 collection items used in the Philosophy assessment.
	README.html#dataset-field-descriptions - Metadata field descriptions
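
Because the full metadata files are large (roughly 70 to 100 MB each), it can help to stream a downloaded file rather than open it in a spreadsheet. The following is a minimal sketch, not an official tool: it uses only the Python standard library, reads column names from the file's header row instead of assuming any field names, and compares the row count with the item counts listed above. The UTF-8 encoding, the presence of a header row, and the local file path are assumptions; the counts are copied from this page and may not match exactly if a file is later updated.

```python
import csv
from typing import Dict, List, TextIO, Tuple

# Item counts as stated on this page; used only as a sanity check.
PUBLISHED_COUNTS: Dict[str, int] = {
    "chi.csv": 331_146,
    "localhistory_us.csv": 321_869,
    "philosophy.csv": 241_677,
}


def summarize_csv(handle: TextIO) -> Tuple[List[str], int]:
    """Stream a CSV file and return (column names, number of data rows).

    Rows are consumed one at a time, so the whole file is never held
    in memory. Assumes the first row is a header row.
    """
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count


def matches_published_count(file_name: str, row_count: int) -> bool:
    """Return True if row_count equals the count listed for file_name."""
    return PUBLISHED_COUNTS.get(file_name) == row_count


# Example usage (the path is a placeholder for wherever you saved the file;
# the encoding is an assumption):
#
# with open("philosophy.csv", newline="", encoding="utf-8") as handle:
#     columns, rows = summarize_csv(handle)
# print(columns)
# print(rows, matches_published_count("philosophy.csv", rows))
```

The three listed counts sum to 894,692, which agrees with the record total given near the top of this page.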

Using Python

While direct downloads are more convenient for most activities, users who are familiar with Python can perform more advanced and complex tasks programmatically.

For your convenience, we have developed a number of Jupyter Notebooks to help you get started.

View the Python notebook for this data package
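
As a starting point independent of the notebook, the sketch below reads a few rows from a CSV file inside an already-downloaded zip archive (for example sample-data.zip) without extracting it to disk. It is an illustration under assumptions rather than the notebook's own method: the helper names `list_csv_members` and `read_csv_from_zip` are hypothetical, the UTF-8 encoding is assumed, and the internal member path should be discovered with `list_csv_members` rather than taken for granted.

```python
import csv
import io
import itertools
import zipfile
from typing import Dict, List


def list_csv_members(zip_source) -> List[str]:
    """Return the names of all .csv members in a zip archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def read_csv_from_zip(zip_source, member_name: str,
                      limit: int = 5) -> List[Dict[str, str]]:
    """Return up to `limit` rows (as dicts) from one CSV inside the archive.

    The member is decoded as UTF-8 (an assumption) and parsed using its
    own header row, so no field names are hard-coded here.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return list(itertools.islice(reader, limit))


# Example usage (the path is a placeholder for your local download):
#
# members = list_csv_members("sample-data.zip")
# for row in read_csv_from_zip("sample-data.zip", members[0], limit=3):
#     print(row)
```

Consult the Dataset Field Descriptions in the README for the meaning of each column before filtering or aggregating.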

Dataset details

Source collection	General Collections
	These data were produced as part of an ongoing program to assess the Library's approximately 22 million books, bound serials and other materials classified under the General Collections.
Rights statement	These data are free to reuse.
Date created	2023-11-28
Date updated	2024-04-01
Creators & contributors	Creator: Collection Development Office, Library of Congress based on existing bibliographic records
Cite this dataset	Chicago citation style: Library of Congress. General Collections Assessment Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. https://data.labs.loc.gov/gen-coll-assessment/.
	APA citation style: Library of Congress. (2023) General Collections Assessment Data Package. [Washington, D.C.: Library of Congress] [Software, E-Resource] Retrieved from the Library of Congress, https://data.labs.loc.gov/gen-coll-assessment/.
	MLA citation style: Library of Congress. General Collections Assessment Data Package. [Washington, D.C.: Library of Congress, 2023] Software, E-Resource. Retrieved from the Library of Congress, <data.labs.loc.gov/gen-coll-assessment/>.
Curatorial questions	Please email any questions or comments to [email protected].
Access questions	For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs Team at [email protected].