README FILE FOR The London Stage Information Bank
http://minds.wisconsin.edu/handle/1793/71768

Created by:

Mattie Burkert
Dept. of English
University of Wisconsin-Madison
600 North Park Street
Madison, WI 53706
burkert@wisc.edu, madelaine.burkert@gmail.com

with support from:

Brianna Marshall
Digital Curation Coordinator
Chair, Research Data Services
University of Wisconsin-Madison
Madison, WI 53706
brianna.marshall@wisc.edu

William Daland
Designer and Developer of the London Stage Project software, LSIB's first phase, which implemented both the data entry system and the data processing software for producing files suitable for information retrieval, as well as its information retrieval program.
4917 Brentwood Rd.
Durham, NC 27713
wdaland@mindspring.com

---------------------------------------
FILE LIST

This folder contains:

./LSP_data.zip - raw data files
./CONCATENATED.txt - a single file containing all of the data from LSP_data.zip, combined and saved as plain text
./CODE_SCANS.zip - raw image files of code printouts
./CODE_SCANS_COMBINED.pdf - a single PDF containing all of the raw image files of code printouts
./README_BURKERT.txt - the current file, explaining the dataset and its components.

---------------------------------------
FILE INFORMATION

I. Dataset history

The data files were retrieved from the Lawrence University Archives in Appleton, WI on September 15, 2014 by Mattie Burkert, with the assistance of University Archivist Erin Dix. The code pages were scanned and emailed to Mattie by programmer Will Daland over the course of several months in 2015.

The files contained here represent a database constructed at Lawrence University between 1970 and 1978, under the direction of Professor Ben Ross Schneider, Jr. That database was titled The London Stage Information Bank.
It was in turn based on an 11-volume, 8000-page reference series published by Southern Illinois University Press between 1960 and 1968, which was called _The London Stage, 1660-1800: A Calendar of Plays, Entertainments & Afterpieces, Together with Casts, Box-Receipts and Contemporary Comment. Compiled From the Playbills, Newspapers and Theatrical Diaries of the Period_ [http://catalog.hathitrust.org/Record/000200105].

In 1970, Schneider was approached to create a computer index of the entire reference series. After 9 years, he produced The Index to The London Stage (SIU Press, 1979) [http://catalog.hathitrust.org/Record/000299859]. In the process, he and an extensive team of typists, editors, and programmers produced the Information Bank, a version of which is represented in these files.

II. Specifics of data

The workflow for the creation of the data was as follows:

* Schneider and several graduate student editors used colored pencils to mark up pages of the reference book, identifying particular types of entities (headers, cast lists, extraneous information) that were to be coded in specific ways.
* The pages were sent to China Data Systems in Hong Kong, where typists transcribed the marked-up text in an OCR-friendly font called OCRB (with some modifications to the IBM Selectric typeballs). The typists also coded the editorial markups, and standardized certain elements such as punctuation.
* The corrected and standardized text was sent to Information Control Incorporated in Kansas City, where it underwent Optical Character Recognition. The results were stored on magnetic tapes in EBCDIC suitable for an IBM System/360 using the OS/360 Operating System.
* Once the tapes containing the OCR’d text were returned to Appleton, a program called ICIFIX (created by Will Daland) was used to perform additional correction and standardization. This was a simple program that ran interactively using the OS/360 operator's console.
* Later, undergraduate student editors used a word-processing program developed specifically for this project (SITAR) to edit the resulting text, reducing errors and inconsistencies more efficiently in an iterative process.

To give the data the structure needed to produce the Index, it was run through several other programs developed by Daland in PL/1, which were combined into a system stored on an IBM 2311 disk called GWSJR1 (named after George Winchester Stone Jr., one of the five authors of the London Stage volumes, who contributed money to pay for the disk).

The Information Bank was intended to be maintained at Lawrence University. In 1983, when Lawrence ceased to maintain a computer capable of reading the tapes on which the Information Bank was stored, the tapes were transferred to the Harvard Theater Collection. The HTC gave the tapes to their IT department to be migrated onto a new medium, at which point they were lost.

The present data files were found stored on 3.5” floppy discs that appeared to have been written in 1990 at Lawrence University. No documentation of the specific provenance of these discs accompanied them, nor did a search through the IT department or Archive logs turn up evidence of the chain of transmission.

As of 6/3/2015 it is thought that the data available in the present files may have been run through the LSP System. However, it is possible that this data is essentially an ASCII version of the raw data as it arrived from ICI as EBCDIC data on magnetic tapes, or it may be anything in between.

File extensions: The files in LSP_data.zip have idiosyncratic extensions (.LSP, .NPK), but can be opened in plain text editors like Notepad.

Markup conventions: Documentation retrieved from the Lawrence Archives decodes the “box codes” used by the typists to demarcate particular kinds of information, which may be useful in understanding the present files.
For instance, *p is placed at the beginning of the performance header, which includes the title of the mainpiece and its cast list. *a is placed before the afterpiece title and cast list. *d corresponds to dancing, *s to singing, *m to music, and *e to entertainment. Each of these “boxes” includes a list of performers and the pieces performed (e.g. singer – song). *c is the code for additional comments from the editors of the volumes. Additional documentation of the data’s syntax can be accessed through the Lawrence University Archives.

The source code of the LSP software was not preserved in ASCII or EBCDIC text files, but was saved by Will Daland in printed form. From March to August 2015, Will scanned his hard copies as electronic image files using an HP 4620 printer/scanner. The scans here represent code pages 22-97, 100-121 and documentation pages 1-48, 54; the remaining pages were not deemed useful or relevant by Will for recovering the software. The formats of the files vary, as do the color and brightness settings, because Will Daland and Mattie Burkert were attempting to identify the best combination for Optical Character Recognition.

The individually scanned pages are named according to conventions defined by Will:

d09 = documentation page 9
p021 = code page 21

For code pages, additional parameters are listed in this order: color mode, resolution, brightness, contrast, file format. So, the file p025c300dpib2c75c.tif represents code page 25, in color, resolution downsampled to 300 dpi, brightness 2, contrast 75, saved as a TIFF image file. Not all file names contain all of this information. If no color mode is indicated, the scan is assumed to be in color. If no brightness is specified, it is assumed to be 2.

As of 8/13/15, Will Daland and Mattie Burkert are working on restoring this software to a runnable form.

III. Contributors to data

Compilers and editors of the original reference series, and Advisory Board of the London Stage Information Bank: William Van Lennep, Emmett L. Avery, Arthur H. Scouten, George Winchester Stone Jr., and Charles Beecher Hogan

Additional Advisory Board Members: Allardyce Nicoll, Sybil Rosenfeld, Cecil Price, Philip Highfill, Kalman Burnim, Carl Stratman, John Robinson, William Armstrong

Head of the London Stage Information Bank project: Ben Ross Schneider, Jr.

Graduate student markup editors: Leonard Leff, Marchia Heinemann, Muriel Friedman, and Mark Auburn

Additional non-specialist markup editors: Devon Schneider, Ben Schneider III, Dorothy Church

Typists at China Data Systems: names unknown

Programmers: Will Daland (devised markup conventions and developed the data processing programs), Reid Watts (developed word-processing software for editing the data, i.e. SITAR), Walter Brown, Nick Schneider

Undergraduate Research Assistant: Cynthia Persak (executed the data entry and data processing programs, starting with the magnetic tapes from ICIFIX, and produced files suitable for executing the information retrieval program)

Undergraduate editors: Catherine Boggs, Catherine Steiner, Marc Weinberger, Joseph Jacobs, Ruth Steiner, Connie Hansen, Sarah Larsen, Laurie Johnson, Sue Kock, Peter Pretkel, Lynn Seifert, Louise Freiberg, Elizabeth O’Brien, Jan Surkamp, Mark Burrows, Kathy Rosner

Other contributors: Debbie Watts, Mackay Taylor Schneider, Scott Farnsworth, Marc Weinberger, Suzanne Fusso, Melinda Young

Funders: The project cost approximately $200,000 to complete. Funding was provided by the National Endowment for the Humanities, the American Council of Learned Societies, the American Philosophical Society, the Andrew Mellon Foundation, the United States Steel Foundation, the Billy Rose Foundation, Lawrence University, and individual gifts from Mrs. John A. Logan, Charles Beecher Hogan, Faith Bradford, Dr. and Mrs. J.
Merrill Knapp Jr., and an anonymous Friend of Lawrence University.

IV. Books and Articles published from or relevant to the data

Listed in chronological order:

* Ben R. Schneider Jr. and Will Daland. “The ‘London Stage’ Information Bank.” Computers and the Humanities 5.4 (1971): 209-214.
* Ben R. Schneider. “The Production of Machine-Readable Text: Some of the Variables.” Computers and the Humanities 6.1 (1971): 39-47.
* Ben Ross Schneider Jr. Travels in Computerland; or, Incompatibilities and Interfaces. A Full and True Account of the Implementation of the London Stage Information Bank (Reading, Mass: Addison-Wesley Publishing Company, 1974).
* Ben R. Schneider Jr., ed. The Index to The London Stage (Carbondale: Southern Illinois University Press, 1979).
* Ben Ross Schneider Jr. “The London Stage Project: Its Status and Future.” Data Bases in the Humanities and Social Sciences. Ed. Joseph Raben and Gregory Marks (Amsterdam: North Holland Publishing Company, 1980).
* Ben Ross Schneider Jr. My Personal Computer and Other Family Crises; or, Ahab and Alice in Microland (New York: Macmillan, 1984).

---------------------------------------
COPYRIGHT & LICENSING INFORMATION

Southern Illinois University Press holds the copyright to all volumes of The London Stage as well as the Index to the London Stage. The HathiTrust scans accessible at the links above are available for use under a Creative Commons Attribution Noncommercial 3.0 license.

In 1970, Southern Illinois University Press gave Lawrence University exclusive rights to create and maintain an electronic database of The London Stage.

In 2015, Southern Illinois University Press granted Mattie Burkert non-exclusive permission to deposit the data files in a secure, university-supported repository at the University of Wisconsin. Lawrence University concurred in allowing Burkert to do so.
In 2015, Will Daland granted Mattie Burkert permission to deposit the code scans in a secure, university-supported repository at the University of Wisconsin.

For permission to use this data, please contact Angela Moore-Swafford at SIU Press (angmoore@siu.edu) and Erin Dix at Lawrence University Archives (erin.k.dix@lawrence.edu).

For permission to use the code, please contact Will Daland (wdaland@mindspring.com) and Mattie Burkert (burkert@wisc.edu, madelaine.burkert@gmail.com).

---------------------------------------
LIMITATIONS

There are a number of possible issues with the data files. For example, the dates seem to be missing or damaged in many of the performance headers. Numerals appear to have been replaced with special characters. This appears to be a character conversion issue and may have happened when the data were migrated to floppies or at another point. Investigation into correcting the data is ongoing. Use these data with caution.
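
One way to begin investigating the character conversion issue is to tally which non-alphanumeric characters occur inside performance headers. The following Python sketch is illustrative only and is not part of the LSP software. It assumes that the box codes (*p, *a, *d, *s, *m, *e, *c) appear in the files literally as an asterisk followed by a lowercase letter, which should be checked against the documentation held at the Lawrence University Archives. The function names, the sample string, and the choice of latin-1 for reading the file are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: box codes appear literally as "*" plus one of these letters.
BOX_CODE = re.compile(r"\*([padsmec])")


def performance_headers(text):
    """Return the text of each *p box, up to the next box code or end of text."""
    # With a capturing group, split() yields [before, code, body, code, body, ...].
    pieces = BOX_CODE.split(text)
    return [body for code, body in zip(pieces[1::2], pieces[2::2]) if code == "p"]


def tally_unusual(text):
    """Count characters that are not ASCII letters, ASCII digits, or whitespace."""
    return Counter(
        ch for ch in text
        if not (ch.isspace() or (ch.isascii() and ch.isalnum()))
    )


def survey_file(path):
    """Tally unusual characters across all performance headers in one file."""
    # ASSUMPTION: latin-1 is used only because it can decode any byte sequence.
    with open(path, encoding="latin-1") as handle:
        headers = performance_headers(handle.read())
    return tally_unusual("".join(headers))


# Invented sample for demonstration; NOT taken from the data files.
sample = "*p!@ dl Hamlet. Hamlet-Betterton.*aThe Cheats.*cBenefit."
print(tally_unusual("".join(performance_headers(sample))).most_common())
```

Calling survey_file("CONCATENATED.txt") would give a first impression of which special characters cluster where dates are expected. Whether any of them correspond one-to-one to the missing numerals remains to be established.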