United States Elections, Web Archives Data Package
This data package consists of 396,117 CDX index files from the United States Elections Web Archive, which includes campaign websites and related web content documenting presidential, congressional, and gubernatorial elections, archived weekly during general election seasons. The data package currently covers the years 2000–2016.
This data package is maintained by the Library of Congress's Web Archiving Program. It updates a time-limited dataset first released in 2022 and described in more detail in its accompanying blog post. Some of the data in this collection (metadata.csv) is publicly available through the loc.gov API. The CDX files are available only through this data package.
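The loc.gov JSON API mentioned above can be queried with only the Python standard library. This is a minimal sketch. It assumes that the collection's loc.gov page, whose slug is taken from the rights-statement URL on this page, accepts the standard `fo=json` format parameter, with `c` for results per page and `sp` for the page number. This page does not document how the API's fields map to the columns of metadata.csv, so compare the two before relying on either. `build_search_url` and `fetch_page` are hypothetical helper names, and the User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumption: collection slug taken from the rights-statement URL on this page.
COLLECTION_URL = "https://www.loc.gov/collections/united-states-elections-web-archive/"


def build_search_url(page: int = 1, per_page: int = 25) -> str:
    """Return a loc.gov API URL for one page of collection results.

    fo=json asks for JSON, c sets the number of results per page,
    and sp selects the page (1-based).
    """
    query = urllib.parse.urlencode({"fo": "json", "c": per_page, "sp": page})
    return f"{COLLECTION_URL}?{query}"


def fetch_page(page: int = 1, per_page: int = 25) -> dict:
    """Download and decode one page of results (needs network access)."""
    request = urllib.request.Request(
        build_search_url(page, per_page),
        # Hypothetical User-Agent string identifying the script.
        headers={"User-Agent": "us-elections-data-package-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_page(1, 5).get("results", [])` should return the item records for the first five results. loc.gov rate-limits heavy API use, so pause between page requests.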
Contents | File formats | Notes |
---|---|---|
396,117 CDX index files, 1 descriptive metadata file | .cdx.gz, .csv | The archived web documents themselves are not included in the data package, but the CDX files provide pointers to the archived captures for download. |
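Each CDX file is a plain-text index with one line per capture, compressed here with gzip. This sketch assumes the files follow the common Internet Archive CDX convention: a first line such as ` CDX N b a m s k r M S V g`, whose letters name the space-separated fields on every following line. The package documentation is the authority on the actual field set. The readable field names, the helper names, and the default base URL in `replay_url` (a Wayback-style pattern on webarchive.loc.gov) are assumptions for illustration.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# Readable names for common CDX legend letters (Internet Archive convention).
# Assumption: these files use this legend; letters not listed are kept as-is.
FIELD_NAMES = {
    "N": "urlkey",        # canonicalized URL key
    "b": "timestamp",     # capture time, YYYYMMDDhhmmss
    "a": "original_url",
    "m": "mimetype",
    "s": "status",        # HTTP status code
    "k": "digest",        # content checksum
    "r": "redirect",
    "M": "meta_tags",
    "S": "length",        # compressed record size
    "V": "offset",        # byte offset in the (W)ARC file
    "g": "filename",      # (W)ARC file containing the capture
}


def parse_header(line: str) -> List[str]:
    """Turn a ' CDX N b a ...' legend line into a list of field names."""
    tokens = line.split()
    if not tokens or tokens[0] != "CDX":
        raise ValueError(f"not a CDX header line: {line!r}")
    return [FIELD_NAMES.get(letter, letter) for letter in tokens[1:]]


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per capture; the first non-blank line must be the legend."""
    fields = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if fields is None:
            fields = parse_header(line)
            continue
        yield dict(zip(fields, line.split()))


def read_cdx_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream capture records from a gzip-compressed .cdx.gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from parse_lines(handle)


def replay_url(record: Dict[str, str], base: str = "https://webarchive.loc.gov/all/") -> str:
    """Build a Wayback-style replay URL (hypothetical default base)."""
    return f"{base}{record['timestamp']}/{record['original_url']}"
```

For example, `next(read_cdx_gz("path/to/file.cdx.gz"))` returns the first capture in a file as a dict. The path is a placeholder.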
The data package includes documentation of the source data and collection provenance, the contents of the data package, and how the data package was created.
There are two main ways to access and use this data package: (1) downloading files directly from this page, and (2) using Python for more advanced work.
The following list outlines the contents of this data package. Many individual files in the data package are linked directly on this page, so you can download and use them immediately. Zipped files are available for bulk download of all or part of the data package.
Contents |
---|
Sample the data |
Download the documentation |
Download the descriptive metadata |
Download the CDX index files |
Browse CDX index files by year |
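After you download the descriptive metadata file, a quick way to see what it contains is to read it with Python's standard `csv` module. This page does not list the column names of metadata.csv, so the sketch reports them instead of assuming any. It also assumes the file is UTF-8 text with a header row. `read_metadata`, `load_metadata`, and `count_values` are hypothetical helper names.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_metadata(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse CSV text lines; return (column names, rows as dicts)."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def load_metadata(path: str = "metadata.csv") -> Tuple[List[str], List[Dict[str, str]]]:
    """Open the metadata file (assumed UTF-8, possibly with a BOM) and parse it."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return read_metadata(handle)


def count_values(rows: List[Dict[str, str]], column: str) -> Counter:
    """Tally the values of one column; missing values count as ''."""
    return Counter(row.get(column) or "" for row in rows)
```

For example, running `columns, rows = load_metadata("metadata.csv")` and then `count_values(rows, columns[0]).most_common(5)` shows the five most frequent values in the first column.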
Direct downloads are the most convenient option for most activities, but users familiar with Python can perform more advanced and complex tasks programmatically.
For your convenience, we have developed several Jupyter notebooks to help you get started.
View the Python notebook for this data package
For bulk downloads, refer to this Python script for downloading files in bulk. Sample commands for this data package:
Download all CDX files from 2004:
python bulk_download.py --package "https://data.labs.loc.gov/us-elections/by-year/2004/" --out "output/2004/"
Download all CDX files from 2016:
python bulk_download.py --package "https://data.labs.loc.gov/us-elections/by-year/2016/" --out "output/2016/"
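The two sample commands differ only in the year, so you can generate commands for other years instead of typing them. This is a minimal sketch. It assumes that every year folder follows the same `by-year/<year>/` pattern as the samples and that `bulk_download.py` is in the working directory. Check which year folders actually exist against the by-year listing. The sketch uses `sys.executable` instead of a bare `python` so that the script runs under the same interpreter. By default it does a dry run that only prints the commands. `bulk_download_args` and `download_years` are hypothetical names.

```python
import shlex
import subprocess
import sys
from typing import Iterable, List

# Assumption: other years follow the same pattern as the 2004 and 2016 samples.
BY_YEAR_URL = "https://data.labs.loc.gov/us-elections/by-year/{year}/"


def bulk_download_args(year: int, script: str = "bulk_download.py") -> List[str]:
    """Build the argument list for downloading one year's CDX files."""
    return [
        sys.executable, script,
        "--package", BY_YEAR_URL.format(year=year),
        "--out", f"output/{year}/",
    ]


def download_years(years: Iterable[int], dry_run: bool = True) -> None:
    """Print (dry run) or execute the bulk download command for each year."""
    for year in years:
        args = bulk_download_args(year)
        if dry_run:
            print(shlex.join(args))  # show the command without running it
        else:
            subprocess.run(args, check=True)  # stop on the first failure
```

For example, `download_years(range(2000, 2017, 4))` prints the commands for the presidential election years 2000 through 2016. Pass `dry_run=False` to run them.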
Source collection | United States Elections Web Archive: The United States Elections Web Archive includes campaign websites documenting presidential, congressional, and gubernatorial elections, archived weekly during general election seasons from 2000 to the present. Before the 2020 election, the archived sites often included web-harvested social media content, which was captured with varying success, to give a fuller picture of how candidates presented themselves to the electorate online. In the early years of the collection, it also included websites of political parties, government bodies, advocacy groups, bloggers, and other individuals and groups producing election-related content. These sites have generally since been moved into the Public Policy Topics Web Archive or into the general web archives. However, because they were originally collected alongside candidate websites, they remain in the CDX index files in this data package. This data package reflects the scope of the United States Elections Web Archive at the time of collection in each general election year. Because that scope has evolved, the scope of the CDX index files also varies over time. |
---|---|
Rights statement | Rights for this data package: The README, CDX files, and metadata.csv in this data package have no known copyright restrictions and are free to use and reuse. Rights for the source United States Elections Web Archive collection: See the full Rights & Access statement for the United States Elections Web Archive at https://www.loc.gov/collections/united-states-elections-web-archive/about-this-collection/rights-and-access/ |
Date created | 2024-11-15 |
Creators & contributors | |
Cite this dataset | |
Curatorial questions | Please direct curatorial questions to the Web Archiving Program at [email protected]. |
Access questions | For questions and technical issues about download and access, please submit a ticket on GitHub or email the LC Labs Team at [email protected]. |
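The CDX files for early years also index party, advocacy, and other non-candidate sites, so it can help to see which hosts a file covers before analyzing it. This sketch counts captures per host. It finds the original-URL column from the file's ` CDX ...` legend line, using the letter `a` from the common Internet Archive convention; check this assumption against the package documentation. The data package does not label hosts as candidate or non-candidate sites, so any such classification is up to you. `host_counts` and `host_counts_gz` are hypothetical names.

```python
import gzip
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def host_counts(lines: Iterable[str]) -> Counter:
    """Count captures per host in CDX text lines.

    Assumes a ' CDX ...' legend line precedes the records and that the
    legend letter 'a' marks the original URL column.
    """
    counts: Counter = Counter()
    url_idx = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "CDX":
            # Position of 'a' in the legend, minus 1 for the 'CDX' token itself.
            url_idx = parts.index("a") - 1
            continue
        if url_idx is None or len(parts) <= url_idx:
            continue  # no legend seen yet, or a malformed line
        # urlsplit lowercases the hostname; non-HTTP records (e.g. dns:) give ''.
        host = urlsplit(parts[url_idx]).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org as one host
        counts[host] += 1
    return counts


def host_counts_gz(path: str) -> Counter:
    """Count captures per host in a gzip-compressed .cdx.gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return host_counts(handle)
```

For example, `host_counts_gz("path/to/file.cdx.gz").most_common(20)` lists the 20 most-captured hosts in one file. The path is a placeholder.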