Austro-Hungarian Maps Dataset Data Processing Plan

Section A: General

A1: Goals of experiment

To make an experimental project from the 2015 Library of Congress GIS Research Fellows program publicly available in a format that is both comprehensive and easy to reuse, computationally or otherwise.

A2: Describe the scope of the intended workflow or pipeline.

This dataset was originally developed as part of a 2015 project of the Library of Congress GIS Research Fellows program. Paper materials were digitized and georeferenced, and a derivative shapefile was generated for quick display of each sheet's georeferenced bounding box. The set was then lightly refined for public dissemination in 2022. Details of methods can be found below and in the dataset's README.md file.

A3: Data delivery format and specifications for data generated in the experiment.

The following file formats are used in this dataset:

A4: Description of intended use.

The data package will be made publicly available through an S3/CloudFront distribution on data.labs.loc.gov.

Section B: Data Documentation

B1: Description of dataset.

Title of dataset

Austro-Hungarian Maps Dataset

Composition

Provenance

This dataset has a layered provenance, outlined below.

Compilation methods

For the dataset preparation, the source data was taken directly from the output of the 2015 Library of Congress GIS Research Fellows program. No sampling methods were used.

Preprocessing steps

In preparation for the public release of the dataset in 2022:

  1. Filenames were standardized for clarity.
  2. The shapefile attribute table was updated with standardized filenames and re-exported using Esri's arcgis Python library.
  3. The "sheet_img" files were cropped and deskewed (rotated) for clarity of presentation.
  4. 104 images in the "geo" directory were removed because they were not GeoTIFFs and instead relied on approximate transformations stored in Esri .tfwx and .aux files.
  5. A metadata.csv file was generated, based on metadata in filenames.
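
As a rough illustration of how the images described in step 4 might be identified, the sketch below lists TIFF files that have a .tfwx or .aux file alongside them. It is not the procedure used for the 2022 release. The function name, the ".tif" extension, and the assumption that a sidecar file shares the image's base name (for example "sheet.tif" with "sheet.tfwx") are all hypothetical.

```python
from pathlib import Path

# Assumption: sidecar files share the image's base name,
# e.g. "sheet.tif" alongside "sheet.tfwx" or "sheet.aux".
SIDECAR_SUFFIXES = (".tfwx", ".aux")


def find_sidecar_georeferenced(geo_dir):
    """Return sorted TIFF paths in geo_dir that have a .tfwx or .aux sidecar."""
    flagged = []
    for tif in sorted(Path(geo_dir).glob("*.tif")):
        # with_suffix swaps ".tif" for the sidecar extension being checked.
        if any(tif.with_suffix(suffix).exists() for suffix in SIDECAR_SUFFIXES):
            flagged.append(tif)
    return flagged
```

The function only reports candidates; reviewing the list before deleting anything keeps the removal step auditable.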

Additional details of the filenaming scheme and of the original methods used for the 2015 project can be found in the dataset's README.md file.
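
Step 5 above, deriving metadata.csv from filenames, can be sketched as follows. This is a minimal illustration only: the filename pattern, the column names, and the function names are hypothetical stand-ins, and the actual naming scheme is the one documented in README.md.

```python
import csv
import re
from pathlib import Path

# Hypothetical filename pattern, for illustration only; the real scheme
# is described in the dataset's README.md.
# Example of a matching name: "ahm_sheet-3755_1914.tif"
FILENAME_PATTERN = re.compile(
    r"^(?P<series>[a-z0-9]+)_sheet-(?P<sheet>\d+)_(?P<year>\d{4})$"
)
FIELDNAMES = ["filename", "series", "sheet", "year"]


def filename_to_row(path):
    """Parse one filename into a metadata row, or return None if it does not match."""
    match = FILENAME_PATTERN.match(Path(path).stem)
    if match is None:
        return None
    return {"filename": Path(path).name, **match.groupdict()}


def write_metadata(paths, out_csv):
    """Write one CSV row per parseable filename; return the names that were skipped."""
    skipped = []
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for path in paths:
            row = filename_to_row(path)
            if row is None:
                skipped.append(Path(path).name)
            else:
                writer.writerow(row)
    return skipped
```

Returning the skipped names, rather than silently dropping them, makes it easy to spot files that do not follow the expected naming scheme.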

Potential risk to people, communities, and organizations and strategies for risk mitigation.

The data package provides the following Content Advisory, found in the README.md and cover-sheet.md files:

Maps, like other archival materials, are produced from the particular socio-political perspective of their creators, expressed through the use of labels, choice of language, drawing of borders, and other visual symbology. Maps can contain contested land claims, offensive place names, and partial perspectives on political and cultural conflict. The historical maps contained in this dataset were initially prepared and issued by the Austro-Hungarian Monarchy's Militärgeographisches Institut beginning around 1875. After the dissolution of the Austro-Hungarian Empire in 1918, parts of the set were continued by successive governments. Some portions of this set were produced by Third Reich Germany in the 1930s and 1940s during and after annexing areas to its south and east. Most of this set is produced by various governmental military mapping agencies.

The regions covered by this set of maps are linguistically and culturally diverse. Generally, place names in the set are identified by their German cognates, but they are also known to sometimes be in Hungarian or Slavic languages.

How will the experiment team address outdated or potentially offensive terms or elements of data that may be harmful if encountered by human users?

The content of the original maps has not been altered except to crop for georeferenced display. Please also refer to the Content Advisory above.

The Library of Congress is unaware of any copyright or other restrictions in this map set. Absent any such restrictions, these materials are free to use and reuse. The determination of the status of an item ultimately rests with the person desiring to reproduce or use the item.