General Collections Assessment Data Package
The General Collections Assessment is an ongoing program to assess the
Library's approximately 22 million books, bound serials and other materials
classified under the General Collections. Assessments will be completed in
segments divided by subject area, based on the Library's Collections Policy
Statements (CPS). Goals for the program include allowing the Library to assess its
effectiveness in meeting its collecting mandate; providing action steps to
address any issues identified through the assessments; and building a
data-gathering process to support ongoing and future assessment. As part of this
project, the Library is making the datasets publicly available here.
The General Collections Assessment is being conducted in segments, using the
Library's Collections Policy Statements (CPS) as a framework for structuring
the analysis and reporting. A total of 45 segment assessments are planned.
The Collections Development Office is making available for exploration the
underlying bibliographic datasets used as the primary data sources for the
collection assessments. The list below contains links to datasets and the
relevant CPS for each segment. The project is ongoing, and as more segment
datasets are completed, they will be added to this page.
Methodology:
Each segment assessment analyzes the bibliographic data of books and serials
in the subject area covered by a Collections Policy Statement. For more
information on how the data were cleaned, analyzed, and standardized as well
as the Dataset Field Descriptions, please see the README.
Metadata | Metadata format |
---|---|
894,692 records | .csv |
Included in this data package is comprehensive documentation of source data or collection provenance, the contents of the data package, and how the data package was created. Here are some particular sections of interest as well as a link to the full documentation:
There are two main options for accessing and using this data package: (1) downloading files directly from this page and (2) using Python for more advanced work.
The following list outlines the contents of this data package. Many of the individual files in the data package are linked directly on this page, so you can download and use them immediately. Zipped files are available for bulk download of all or part of the data package.
Download everything | |
---|---|
Sample the data | |
Download the documentation | |
Download the metadata | |
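For the zipped bulk downloads, CSV files can be inspected without unpacking the archive first. The following is a minimal sketch using only the Python standard library. The archive name `data_package.zip` and the helper `list_csv_headers` are hypothetical placeholders, not names taken from this data package, and the UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def list_csv_headers(archive):
    """Map each CSV file inside a zip archive to its header row.

    `archive` may be a file path or a binary file-like object.
    """
    headers = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip documentation and other non-CSV members
            with zf.open(name) as raw:
                # Assumption: files are UTF-8; utf-8-sig also tolerates a byte-order mark.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers


# Usage (hypothetical archive name):
# for name, columns in list_csv_headers("data_package.zip").items():
#     print(name, columns)
```

Reading only the header row keeps this quick even for large files, and the column names it prints can be checked against the Dataset Field Descriptions in the README.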
While direct downloads are more convenient for most activities, users familiar with Python can perform more advanced and complex tasks programmatically.
For your convenience, we developed a number of Jupyter Notebooks to help you get started.
View the Python notebook for this data package
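As a starting point separate from the notebook, the sketch below streams a metadata CSV row by row, so a large file need not be loaded into memory at once, and tallies the values in one column. The file name `metadata.csv`, the column name `language`, and the helper `tally_column` are hypothetical; the actual field names are described in the README.

```python
import csv
from collections import Counter


def tally_column(rows, column):
    """Count records and tally the values of one column.

    `rows` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(rows)
    total = 0
    counts = Counter()
    for record in reader:
        total += 1
        # Blank or missing values are grouped under an empty string.
        counts[(record.get(column) or "").strip()] += 1
    return total, counts


# Usage (hypothetical file and column names):
# with open("metadata.csv", newline="", encoding="utf-8") as f:
#     total, counts = tally_column(f, "language")
#     print(total, counts.most_common(10))
```

Comparing the returned total with the record count listed on this page is a simple way to confirm that a download is complete.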
Source collection | These data were produced as part of an ongoing program to assess the Library's approximately 22 million books, bound serials and other materials classified under the General Collections. |
---|---|
Rights statement | These data are free to reuse. |
Date created | 2023-11-28 |
Date updated | 2024-04-01 |
Creators & contributors | |
Cite this dataset | |
Curatorial questions | Please email any questions or comments to [email protected]. |
Access questions | For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs Team at [email protected]. |