Open Notebook Science Archiving Project


Objective

The aim of this wiki is to provide a central place to brainstorm and carry out the archiving of Open Notebook Science (ONS) projects so that specific versions of documents can be archived and cited. Libraries would be ideal participants, acting as objective third-party curators.

Background

The advent of new forms of digital scholarship is both an opportunity and a challenge for researchers and librarians.(Rusbridge 09 Digital Curation Blog) It allows researchers to share work that could not be communicated with traditional publication tools. The challenge for libraries is that this new scholarship often does not lend itself to archiving and curation as readily as traditional sources do.

Although repository management systems such as DSpace have been created to cope with new formats, they cannot handle the full diversity of these formats. ONS is one such format, and overcoming the obstacles to archiving ONS could provide a template for a wider range of new forms of scholarship.

Open Notebook Science

The term "Open Notebook Science" was first used in 2006 to describe the public sharing of laboratory notebooks and all associated raw data in as close to real time as possible.(Bradley 06 Drexel CoAS E-Learning Blog) The approach was presented as a way to communicate science much more quickly and more comprehensively, since failed as well as successful experiments are shared. Two large ONS projects are currently run in the Bradley laboratory at Drexel University: the UsefulChem project, which focuses mainly on synthesizing new anti-malarial compounds, and the Open Notebook Science Challenge (ONSC), which measures non-aqueous solubility.

Citing ONS documents

Both the UsefulChem and ONSC projects make extensive use of wikis, blogs and Google Spreadsheets and, to a lesser extent, Flickr, YouTube and other miscellaneous formats. Wikis have the advantage that previous versions of a page can be accessed on the server running the wiki. Google Spreadsheets offer similar functionality, to a more limited extent.

Documents from both ONS projects have been successfully cited in peer-reviewed publications as well as in the blogosphere.(Bradley 09 UsefulChem Blog) This is encouraging, since it should promote broader use of new forms of digital scholarship. For example, when a wiki page from a laboratory notebook is cited, it is possible to view the history of that page to find out its status at the time of citation.(Bradley 08 UsefulChem Blog) However, the same must be done separately for every page and document linked from that page. For some types of documents, such as images on Flickr, there is no way even to view previous versions.

The Snapshot Archive

It would be most beneficial to be able to link to the entire project as it stood on a particular day, in addition to the most recent version of a particular document. Other researchers could then interpret the exact context of a citation. Over time there may be additions, corrections and re-interpretations, but other researchers should still be able to reconstruct the state of knowledge at a given point in time.

An archive consisting of snapshot views of the entire knowledge base will meet this requirement. When navigating through a snapshot view, links will interconnect documents exactly as they were on the date when the citation was made.
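Scheduled daily runs may occasionally be missed, so resolving a citation to a snapshot means finding the most recent snapshot taken on or before the citation date. A minimal Python sketch of that lookup, assuming the archive keeps its snapshot dates in a sorted list; `snapshot_for_citation` is a hypothetical helper, not part of the existing ONSPreserver tools:

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional


def snapshot_for_citation(snapshot_dates: List[date], cited_on: date) -> Optional[date]:
    """Return the latest snapshot taken on or before the citation date.

    Hypothetical helper: assumes snapshot_dates is sorted in ascending order.
    Returns None when the citation predates the first snapshot.
    """
    # bisect_right returns the insertion point after any snapshot equal to cited_on
    index = bisect_right(snapshot_dates, cited_on)
    if index == 0:
        return None
    return snapshot_dates[index - 1]


# Example: no snapshot exists for 2009-04-12 (e.g. a missed scheduled run)
snapshots = [date(2009, 4, 10), date(2009, 4, 11), date(2009, 4, 13)]
print(snapshot_for_citation(snapshots, date(2009, 4, 12)))  # 2009-04-11
```

A binary search keeps the lookup fast even after years of daily snapshots.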

Next Steps

Level 1

To investigate the feasibility of this project quickly and with minimal resources, set up a publicly accessible folder on a library server and use Windows Scheduler to run the SolSumArchive program daily at 3:00 AM ET. This will create a daily snapshot of the summary file containing the solubility data for the ONSC project. The program archives the SolSum Google Spreadsheet as an Excel document, which preserves all the calculations and hyperlinks in the sheet. Each Excel file name begins with the date, so the files can be sorted chronologically.(Bradley 09 UsefulChem Blog)
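Prefixing file names with the date gives chronological order under a plain alphabetical sort only if the date is written most-significant part first, as in ISO 8601 (YYYY-MM-DD). A minimal Python sketch of such a naming scheme; the underscore separator, the `dated_filename` helper and the file name `SolSum.xls` are illustrative assumptions, not necessarily what SolSumArchive produces:

```python
from datetime import date


def dated_filename(original_name: str, snapshot_day: date) -> str:
    """Prefix a file name with an ISO 8601 date, e.g. 2009-04-12_SolSum.xls.

    Hypothetical naming scheme; the underscore separator is an assumption.
    """
    return f"{snapshot_day.isoformat()}_{original_name}"


names = [
    dated_filename("SolSum.xls", date(2009, 12, 1)),
    dated_filename("SolSum.xls", date(2009, 4, 12)),
    dated_filename("SolSum.xls", date(2010, 1, 5)),
]
# With year-month-day prefixes, alphabetical order equals chronological order
print(sorted(names))

# By contrast, month-day-year prefixes sort December 2009 before April 2009
print(sorted(["4-12-2009_SolSum.xls", "12-1-2009_SolSum.xls"]))
```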

Level 2

A further simple step is to run the ONSPreserver program once a day using Windows Scheduler. This will create an archive of the key files listed in the ONSbackup Google Spreadsheet. Using this sheet, we can ensure that files in various formats are stored properly. Selected files can be added to the backup sheet manually, and the sheet can serve as a template for archiving other, similar digital scholarly projects.
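In essence, this step reads a list of files from a sheet and saves each one under a dated name. A minimal Python sketch, assuming the backup sheet has been exported as CSV with two hypothetical columns, `url` and `filename`; the actual layout of the ONSbackup sheet and the internals of ONSPreserver may differ, and the example.org URLs are placeholders:

```python
import csv
import io
from datetime import date
from pathlib import Path
from typing import List, Tuple
from urllib.request import urlretrieve


def backup_plan(sheet_csv: str, day: date) -> List[Tuple[str, str]]:
    """Turn rows of the (assumed) url,filename CSV into (URL, dated name) pairs."""
    plan = []
    for row in csv.DictReader(io.StringIO(sheet_csv)):
        url = (row.get("url") or "").strip()
        name = (row.get("filename") or "").strip()
        if not url or not name:
            continue  # skip blank or incomplete rows
        plan.append((url, f"{day.isoformat()}_{name}"))
    return plan


def run_backup(plan: List[Tuple[str, str]], archive_dir: str) -> None:
    """Download each listed file into the archive folder (needs network access)."""
    folder = Path(archive_dir)
    folder.mkdir(parents=True, exist_ok=True)
    for url, dated_name in plan:
        urlretrieve(url, str(folder / dated_name))


sheet = """url,filename
http://example.org/solsum.xls,SolSum.xls
,
http://example.org/spectra/sample1.jdx,sample1.jdx
"""
print(backup_plan(sheet, date(2009, 4, 12)))
```

Separating the plan from the download keeps the parsing testable offline; a scheduled task would call `run_backup(backup_plan(...), ...)` against the real sheet export.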

Level 3

The next step is to run the Challenge.exe program once a day using Windows Scheduler. This will automatically spider the ONSChallenge wiki and its linked files and create a snapshot.
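Spidering a wiki is essentially a breadth-first traversal of its link graph, restricted to the wiki's own host so the crawl does not wander across the web. A minimal Python sketch of that idea, not a description of how Challenge.exe works; the `fetch` function is injected so the crawl can be tested offline, and the example site and host name are hypothetical:

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start_url: str, fetch: Callable[[str], str], max_pages: int = 500) -> Dict[str, str]:
    """Breadth-first crawl of pages on the same host as start_url.

    Assumes fetch returns HTML text; a real crawler would check the content
    type and store non-HTML files (spreadsheets, spectra) without parsing them.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so each page is fetched once
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Offline example with a hypothetical three-page wiki and one external link
site = {
    "http://wiki.example.org/home": '<a href="/page1">1</a> <a href="http://other.example.com/x">ext</a>',
    "http://wiki.example.org/page1": '<a href="home#top">back</a> <a href="page2">2</a>',
    "http://wiki.example.org/page2": "no links here",
}
pages = spider("http://wiki.example.org/home", site.__getitem__)
print(sorted(pages))
```

Against a live wiki, `fetch` could be something like `lambda u: urlopen(u).read().decode("utf-8", "replace")` using `urllib.request.urlopen`.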

Level 4

The final step, which requires a program that has not yet been written, is to take a full snapshot of both the ONSC and UsefulChem projects while rewriting the hyperlinks in the downloaded documents so that all navigation stays within that day's archive. Programs for viewing specialized files, such as JCAMP-DX spectra, will also be included so that all the datasets in a given snapshot can be navigated and visualized completely offline.
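The link rewriting could work roughly as in this small Python sketch: every link whose target was captured in the same day's snapshot is redirected to the local copy, while links to anything not archived are left alone. It assumes a mapping from absolute URLs to local file names built during the crawl; `rewrite_links` is hypothetical, and the regular expression only handles double-quoted `href`/`src` attributes in HTML, so links inside Excel files or other formats would need format-specific handling:

```python
import re
from typing import Dict
from urllib.parse import urldefrag, urljoin

# Simplification: only double-quoted href/src attributes are recognized
LINK_ATTR = re.compile(r'(href|src)="([^"]*)"')


def rewrite_links(html: str, page_url: str, local_paths: Dict[str, str]) -> str:
    """Point links at archived copies when their targets were captured.

    local_paths maps absolute URLs to file names inside the same day's
    archive folder; links to anything not archived are left unchanged.
    """

    def replace(match: re.Match) -> str:
        attr, target = match.group(1), match.group(2)
        absolute, fragment = urldefrag(urljoin(page_url, target))
        local = local_paths.get(absolute)
        if local is None:
            return match.group(0)  # not archived: keep the original link
        suffix = f"#{fragment}" if fragment else ""
        return f'{attr}="{local}{suffix}"'

    return LINK_ATTR.sub(replace, html)


local_paths = {
    "http://wiki.example.org/home": "home.html",
    "http://wiki.example.org/spectrum1.jdx": "spectrum1.jdx",
}
page = '<a href="/home#top">Home</a> <a href="spectrum1.jdx">NMR</a> <a href="http://other.example.com/">ext</a>'
print(rewrite_links(page, "http://wiki.example.org/page1", local_paths))
```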

ONS Projects to be Archived

ONS Solubility Challenge
UsefulChem

Resources

Presentation to Columbia Library in May 2009
A description of the status of the project on April 12, 2009
Instructions for setting up Windows Scheduler

Software

ONSPreserverLite (Andy Lang) (Level 1)

This creates a copy of the main spreadsheet, SolubilitesSum, and prepends the date to the file name (see Level 1 above).

ONSPreserver (Andy Lang) (Levels 2 and 3)

This creates a copy of every file listed in the Backup Spreadsheet and prepends the date to the file name. The same archive also contains an executable, Challenge.exe, that automatically spiders and then backs up a significant portion of the ONSChallenge wiki files not listed in the Backup Spreadsheet.