National Jukebox Data Package
This dataset contains metadata records and audio files for 5,882 audio recordings in the National Jukebox collection. The recordings date from 1900 to 1922.
This dataset was created as part of an LC Labs experiment in collaboration with AVP to explore methods for creating a "general purpose" dataset, a format that is easily repeatable across collections that use the LOC API.
Metadata | Metadata formats | Data files |
---|---|---|
5,882 records | .csv, .json | 5,882 .mp3 audio files |
Included in this data package is comprehensive documentation of source data or collection provenance, the contents of the data package, and how the data package was created. Here are some particular sections of interest as well as a link to the full documentation:
There are two main options for accessing and using this data package: (1) downloading files directly from this page and (2) using Python for more advanced tasks.
The following list outlines the contents of this data package. Many of the individual files inside the data package are linked directly on this page, so you can download and use them immediately. Zipped files are available for bulk download of all or part of the data package.
Sample the data | |
---|---|
Download the documentation | |
Download the metadata | |
Download the audio files | |
While direct downloads are more convenient for most activities, users who are familiar with Python can perform more advanced and complex tasks programmatically.
For your convenience we developed a number of Jupyter Notebooks to help get you started.
View the Python notebook for this data package
For bulk downloads, refer to this Python script for downloading files in bulk. Sample commands for this data package:
Download all audio files
python bulk_download.py --package "https://data.labs.loc.gov/jukebox/" --out "output/jukebox/"
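Once the metadata has been downloaded, a first exploratory step might be to count recordings per year. The following is a minimal sketch only, not part of the data package or its notebooks: the function name `count_by_year`, the column name `date`, and the sample rows are all assumptions made for illustration, so check the package documentation for the actual field names before adapting it.

```python
import csv
import io
from collections import Counter


def count_by_year(csv_text, date_field="date"):
    """Count records per year in CSV text.

    Assumes (hypothetically) that `date_field` holds values that begin
    with a four-digit year, such as "1905" or "1905-03-01".
    Rows with a missing or non-numeric year are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(date_field) or "").strip()
        year = value[:4]
        if year.isdigit():
            counts[int(year)] += 1
    # Return a plain dict ordered by year for easy reading.
    return dict(sorted(counts.items()))


# Made-up sample rows standing in for the real metadata file.
sample = (
    "title,date\n"
    "Example A,1905-03-01\n"
    "Example B,1905\n"
    "Example C,1921-11-15\n"
    "Example D,\n"
)
print(count_by_year(sample))
```

To try this against the downloaded .csv file, read the file's text (for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved the metadata) and pass it to `count_by_year` with the appropriate column name.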
Source collection | Recordings in the National Jukebox come from the Recorded Sound Section of the Library of Congress, the University of California Santa Barbara, and a private collection. The recordings selected for this dataset, however, all come from the Recorded Sound Section of the Library of Congress. (This dataset does not include the roughly 8,000 recordings from the other two repositories from the selected time period.) All recordings included in the Jukebox were issued on record labels now owned by Sony Music Entertainment, which granted the Library of Congress a license to make the recordings available online. All of the recordings digitized so far under this license, those included in this dataset, were made by the Victor Talking Machine Company. These recordings were originally made on wax discs. In cases where multiple copies of the same recording were available, the disc in the best condition was selected for digitization. |
---|---|
Rights statement | All recordings published before January 1, 1923, entered the public domain on January 1, 2022, under the Music Modernization Act of 2018. Based on the dates in the item metadata, all recordings included in this dataset are assumed to be in the public domain. |
Date created | 2023-05-05 |
Date updated | 2024-03-28 |
Creators & contributors | |
Cite this dataset | |
Curatorial questions | For curatorial questions about the content of the collection or technical questions about the dataset formats and composition, please contact the Recorded Sound Section via the Library's Ask a Librarian service at https://ask.loc.gov/recorded-sound. |
Access questions | For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs Team at [email protected]. |