Available online at www.sciencedirect.com

Procedia Computer Science 4 (2011) 658–667

International Conference on Computational Science, ICCS 2011

Paper Mâché: Creating Dynamic Reproducible Science

Grant R. Brammer, Ralph W. Crosby, Suzanne J. Matthews, and Tiffani L. Williams
[grb,rwc,sjm,tlw]@cse.tamu.edu
Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843-3112

Abstract

For centuries, the research paper has been the main vehicle for scientific progress. From the paper, readers in the scientific community are expected to extract all the relevant information necessary to reproduce and validate the results presented by the paper’s authors. However, the increased use of computer software in science makes reproducing scientific results increasingly difficult. The research paper in its current state is no longer sufficient to fully reproduce, validate, or review a paper’s experimental results and conclusions. This impedes scientific progress. To remedy these concerns, we introduce Paper Mâché, a new system for creating dynamic, executable research papers. The key novelty of Paper Mâché is its use of virtual machines, which lets readers and reviewers easily view and interact with a paper, and reproduce key experimental results. For authors, the Paper Mâché workbench provides an easy-to-use interface to build an executable paper. By transforming the static research paper into a dynamic and interactive entity, Paper Mâché brings the presentation of scientific results into the 21st century. We believe that Paper Mâché will become indispensable to the scientific process, and increase the visibility of key findings among members and non-members of the scientific community.

Keywords: executable paper, virtual machines, scientific reproducibility, abstract management, reviewing

1. Introduction

Scientific progress depends on the effective dissemination and reproducibility of existing research.
For centuries, scientific papers (as well as scientific books) have been the primary mechanism for disseminating scientific results. However, such a mechanism is based on the reader having access to the materials needed to validate the results discussed in the scientific paper. The increased use of computer software in science makes reproducibility of results quite difficult, especially since many scientists publish neither the source code nor the data needed to reproduce their results. As a result, the hypotheses and results discussed in a scientific paper are not validated, since it is too difficult for the reader to recreate the authors’ experimental environment. Given that our current dissemination practices impede scientific progress, how can we make scientific contributions more easily accessible (or executable) for the scientific community and the public at large?

We introduce Paper Mâché, a novel paper management system under development that allows users to explore research papers interactively, reproduce results, and test hypotheses of their own. That is, Paper Mâché supports the notion of an executable paper. Our expertise in high-performance computing and bioinformatics has provided the motivation for developing our system for a wide variety of users. More specifically, Paper Mâché divides the scientific community into three groups: authors, reviewers, and readers. For authors, our Paper Mâché system offers a simple interface to build an interactive and executable paper. The novel use of virtual machines in Paper Mâché allows authors to easily reconstruct their experimental environment, which in turn allows reviewers and readers to explore a paper interactively, reproduce and validate experimental results, and test their own hypotheses.

1877–0509 © 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Prof. Mitsuhisa Sato and Prof. Satoshi Matsuoka. doi:10.1016/j.procs.2011.04.069

1.1. Existing paper management systems

Current systems for author-reviewer interaction let authors submit static scientific papers to a server, where the papers are then assigned to reviewers. The reviewers then review the paper by submitting their comments back to the server. This is usually done with the assistance of abstract management software (usually bundled with a conference management system). The purpose of these systems is to lessen the administrative workload of handling the large volumes of submitted scientific abstracts and articles.

Within the computer science community, EasyChair [1] is by far the most popular conference management system, because it is easy to use and free. In 2010, there were 3,306 computer science conferences managed using EasyChair [1]. Popular commercial options include START [2] and Linklings [3], the latter of which is used by computing conferences such as SuperComputing (SC) and Grace Hopper Celebration of Women in Computing (GHC). Professional organizations for computing such as ACM and IEEE use ScholarOne Manuscripts [4] (formerly Manuscript Central) as their abstract management software.

However, the key limitation of these systems is that they only permit the inclusion of the research paper itself. Experimental data and source code are not uploaded to these systems. If an author wishes to have these elements available for reviewers and readers, the author must find a way to host the source code and data during the review process. The reviewer then has to download the source code and spend time figuring out how to execute it, which can be a nontrivial process.
As a result, the current techniques for managing papers make it very hard to reproduce the experimental results in the paper, which is key to validating the claims the authors make in their scientific paper.

In addition, the forums in which authors can interact with readers are quite limited. At conference talks, audience members have a limited time span in which to directly interact with the author. In some scientific journals (e.g. Systematic Biology), readers can respond to authors and their work in the form of a “Points of View” section. However, this form of communication between authors and readers is not immediate, since it can take months before the point of view appears in the journal, assuming the reviewers reading the point of view feel that the viewpoint is worthy of publication. Web-based systems like CiteULike [5] promote reader interaction within the scientific community through “social bookmarking”. Here, users share references to papers they enjoy, and they can also see who likes the same papers they do. However, there does not seem to be a way for these readers to interact directly with the authors themselves. Such interaction would be very valuable, since authors can use the feedback as a way to improve their work.

1.2. Summary and software availability

We believe that Paper Mâché is indispensable for future scientific research, and provides a mechanism for increasing the visibility and accessibility of scientific findings to everyone. Currently, our proposed system is under development and is a finalist in the Executable Paper Grand Challenge sponsored by Elsevier. For more information regarding this challenge, please visit http://www.executablepapers.com/index.html. A demonstration of our Paper Mâché system will take place at the International Conference on Computational Science (ICCS’11) in June 2011.
Thus, this paper discusses the underlying design of Paper Mâché, considers sample use cases, describes how our system will be evaluated, and explores the future capabilities of Paper Mâché.

2. Background: Virtual Machines

The novelty of Paper Mâché is the use of virtual machines (VMs) [6] to implement an executable paper that allows authors, reviewers, and readers to interact. A virtual machine is a system that performs software-level emulation of a system different from the host machine. While virtual machines have existed since the 1960s, they have been used in recent years as a mechanism to evaluate new operating systems, act as test environments for software, back up a system, and run code on virtualized “clones” of legacy machines.

A virtual machine consists of two main components: the hypervisor and (one or more) guest operating systems. The hypervisor is installed on the native operating system of the host machine, and is used to run one or more guest operating systems. The guest operating system is stored on the host machine in the form of a logical image, which is a disk file representing the one or more physical disks that the guest requires to execute. This guest image contains the full installation of the guest operating system and software applications. When instantiated by the hypervisor, the guest operating system executes as if it is running directly on the host machine’s hardware. In addition to commercial virtual machine software such as VMware [7] and Parallels [8], fully-functioning open source alternatives such as VirtualBox [9] and Xen [10] are available.

Figure 1: A diagram showing how authors, reviewers, and readers interact in Paper Mâché during the life cycle of a research paper. Before publication, authors submit their paper to the reviewers for review. After discussing the paper amongst themselves, the reviewers submit their reviews to the author. If requested, the authors can then resubmit the revised paper, and receive further revisions from the reviewers. After publication, the research paper is available to the community of readers. These readers can discuss the paper amongst themselves as well as comment on the paper. The Paper Mâché system will facilitate reader comments on a paper and expedite author responses.

For Paper Mâché, the advantages of virtual machines that have led to their prolific use are now being applied to improve the quality of research papers and the interactions between authors, reviewers and readers. Virtual machines allow authors to create a snapshot in time of their experimental system, allowing them to easily package their results and data with their research paper in a single entity. Thus, the results of the executable paper can easily be reproduced in the future as a virtualized clone of the original experimental platform.

Another added benefit of creating executable papers as virtual machine images is that it is easy to enforce added levels of security. For example, during the reviewing process, the VM image of the paper can be locked as “read-only”. This allows reviewers to view unpublished source code and data, while preventing them from separating unpublished material from the package and co-opting it for personal benefit. Security controls will increase author confidence in the reviewing process, and help prevent plagiarism. Once the paper enters the public domain, some of these restrictions may be lifted. Thus, the use of virtual machines in the creation of the executable paper within Paper Mâché simultaneously allows readers and reviewers to interact with a research paper and its experimental results, while protecting the author’s sensitive data and source code during the pre-publication phase of the paper.

3. The Paper Mâché System

Paper Mâché is designed to support the requirements of the authors, reviewers and readers that comprise the scientific community. As illustrated in Figure 1, Paper Mâché is intended to support the needs of the full life cycle of a research paper. Moreover, Paper Mâché does not replace, but in fact augments, the capabilities of scientists working to create new research.

Before a paper is published, authors are responsible for creating the paper and submitting the paper to reviewers. At the core of this process is the Paper Mâché package (.pm) file. This file is the artifact that represents the “executable paper” and is the container for all the various elements of the paper. For a particular paper, a single .pm file is created. Users interact primarily with the Paper Mâché Workbench, which allows the user to create, update, manage and access the .pm files. While authors use the Paper Mâché Workbench to create, update and manage the .pm files, readers and reviewers use the workbench to view and access the .pm files. During the pre-publication phase, reviewers evaluate the paper and communicate with the author. Once the paper has been published, readers discuss the paper and send their comments to the authors via the Paper Mâché Workbench. The author, in turn, can respond to the comments. These interactions are explored in more detail in Sections 3.1 and 3.2.

[Figure 2 diagram labels: Author (Create/Edit); Paper Mâché package containing VM (Source Code, Executables, Data, Libraries, Dependencies), Paper (Text, Figures, Audio, Video), and Comments (Ratings, Reviews, Discussion); Reader/Reviewer (Execute, Comment, Read)]

Figure 2: An overview of the Paper Mâché system showing the interaction between the actors and the major components of the system. The author is responsible for creating the paper, virtual machine (VM), and all underlying content. These sections are uploaded through the Paper Mâché workbench, which creates a .pm file. Once the components are uploaded, a web-based comments section is made available to readers or reviewers. Online users can download and execute the VM, comment on different aspects of the system, and read the paper.

As mentioned earlier, Paper Mâché uses virtual machines to replicate the environment in which code and scientific experiments were run. Image files containing virtual machines (.vm files) will be packaged within the .pm Paper Mâché package. In order to work with Paper Mâché virtual image files, all participants in the process (authors, reviewers and readers) will need to download a Paper Mâché hypervisor appropriate to their environment. Once downloaded, the hypervisor will be usable with any .vm files contained in .pm Paper Mâché packages. Authors will also need to do a one-time download of the Paper Mâché image tool, which will help automate the process of creating virtual images for their research.

3.1. Creating an executable paper or .pm file

Here, we describe how an author (or authors) of a scientific paper uses Paper Mâché to create an executable version (or .pm file) of the paper. We note that Paper Mâché does not replace the conventional research leading to publishable scientific results. Prior to working with Paper Mâché, it is assumed that the author(s) of the paper have performed all of the necessary research and written the paper.

First, the primary author logs into the Paper Mâché workbench, creates an empty .pm file (the Paper Mâché wrapper in Figure 2) and identifies any additional authors authorized to edit the .pm file. Additional metadata (description, keywords, etc.) may be entered at this time or at any future point in the process. At this point, the empty .pm file will be in an “editable” status, indicating that the contents may be freely changed by the authors. To create the contents of the .pm file, the authors will upload the text of the paper (e.g., LaTeX or .doc(x) format) and also upload associated files (e.g., audio/video, graphics, figures) in their native formats. This is represented by the Paper section in Figure 2. While there will be no particular order required by the upload process, dependency checking (e.g. the existence of referenced figures in the document) may be requested and will be required prior to review.

To create the virtual machine (.vm) files associated with the paper, the virtual machine image tool will be run on a test machine (or machines) to create one or more .vm files for machines that host any applications referenced in the text. As part of the virtual image, the authors will define the scripts or commands necessary for a reader to recreate the tests referenced in the text. These instructions will be packaged as metadata associated with the .vm file. Authors will be responsible for testing and adjusting the generated .vm files using a previously downloaded Paper Mâché hypervisor appropriate to their environment. When completed, the .vm files will be uploaded into the .pm file package, creating the VM section shown in Figure 2. We note that many of the steps listed under the “Traditional” column of Table 2 will need to be performed by the author. However, the use of virtual machines will eliminate this process for reviewers and readers.

Once satisfied with the contents of the .pm file, the authors will transition the file to a “submitted” status. In this state, the .pm file will be locked, preventing updates. Additionally, the file will only be visible to the authors and to the reviewers, who are assigned by a conference program chair or journal editor. For simplicity, it is best to think of journal editors and conference program chairs as “super reviewers” within Paper Mâché.
As super reviewers, they have the power to assign papers to reviewers as well as to decide whether a paper is accepted for publication. Once the reviewing process has ended, authors will receive their reviews as well as the decision on whether their paper has been accepted for publication. If changes are required prior to publication, the journal editor or program chair will change the .pm file state to allow the authors to make any changes requested.

When the package is approved for publication, the editor or program chair will change the status to “published” and the .pm file becomes publicly available. In this state, the base .pm file will be locked against changes. However, the authors will still be able to contribute updates (e.g., updates to source code), and all such updates will be tracked separately from the base .pm file so that readers can easily view the original contents of the file. Finally, we note that there are unfortunate situations where a published paper has to be retracted or formally corrected. Paper Mâché will be able to change the status of such published papers and make the new status clearly visible to readers.

Once completed, the authoring process will have generated a .pm file package containing everything necessary not only to understand the research but also to duplicate it and experiment with it further, far into the future.

3.2. Reading an executable paper or .pm file

The following discussion is focused on the readers of the executable paper, but all operations are equally appropriate to reviewers. The only difference between reviewers and readers is the visibility of the materials. During pre-publication, only reviewers and the paper authors will be able to access the paper. Comments entered by the reviewers (and authors) will only be visible to the authors and reviewers during this period.
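The status changes described in Section 3.1 can be summarized as a small state machine. The following is a minimal Python sketch under our own assumptions, not the workbench implementation: the names `Status`, `ALLOWED` and `change_status`, the role strings, and the "retracted" status name are hypothetical. The text only names the "editable", "submitted" and "published" statuses and states that the status of a published paper can be changed if it is retracted or corrected.

```python
from enum import Enum


class Status(Enum):
    EDITABLE = "editable"      # authors may freely change the contents
    SUBMITTED = "submitted"    # locked; visible to authors and reviewers only
    PUBLISHED = "published"    # publicly available; base package locked
    RETRACTED = "retracted"    # hypothetical name for a retracted paper


# (current status, new status) -> role allowed to make the change.
# "editor" stands for a journal editor or program chair ("super reviewer").
ALLOWED = {
    (Status.EDITABLE, Status.SUBMITTED): "author",
    (Status.SUBMITTED, Status.EDITABLE): "editor",    # changes requested
    (Status.SUBMITTED, Status.PUBLISHED): "editor",   # approved for publication
    (Status.PUBLISHED, Status.RETRACTED): "editor",   # assumption: editor retracts
}


def change_status(current, new, role):
    """Return the new status, or raise if the change is not permitted."""
    required_role = ALLOWED.get((current, new))
    if required_role is None:
        raise ValueError(f"no transition from {current.value} to {new.value}")
    if role != required_role:
        raise PermissionError(f"{role} may not move a package to {new.value}")
    return new


status = change_status(Status.EDITABLE, Status.SUBMITTED, "author")
status = change_status(status, Status.PUBLISHED, "editor")
print(status.value)
```

Keeping the permitted transitions in one table makes it easy to see that, in this sketch, authors can only submit, while every later change of status goes through the editor or program chair.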
A reader starts their interaction with the Paper Mâché Workbench by logging into the web application and searching for a paper of interest. They will be able to read the abstract as part of the search results. If they decide to study the paper further, they will be able to click on the paper and open the .pm file in a web page. After opening the .pm package, the reader will be able to read the paper as well as view multimedia, figures and other content available (again represented by the Paper section of Figure 2). Hyperlinks help users navigate and view relevant portions of the paper, figures, charts and graphics associated with the package.

At any point, the reader will be able to enter comments and ratings for the paper or specific elements of the paper. Comments will be available to the authors in the .pm package itself (see Figure 2), and shown on the web page associated with the .pm file. Authors will also be able to review and respond to comments from readers through the web page.

Clicking on the .vm file name on the web page will initiate the download of the virtual machine image onto the reader’s computer. Once downloaded, the .vm file is executed on the previously downloaded Paper Mâché hypervisor. The reader is able to sign into the virtual machine and recreate the authors’ experiments using the scripts and commands packaged within the .vm file. Since the .vm file is a fully functioning virtual machine, readers can easily adjust parameters and try running the source with different data.

An example. Figure 3 shows a sample prototype of a Paper Mâché virtual machine executing a published paper describing a MapReduce-inspired algorithm called MrsRF [11]. Here, the executable paper (or Matthews2010.pm) is executing on a reader’s desktop. Within the virtual machine window, the reader has executed an application from the command line and generated a graph.
The source code and build files will be packaged within the .vm file, and the reader is able to view and modify the source code to experiment further with the application. Such changes may be saved in the reader’s local copy of the virtual machine but will not be saved in the web-based package.

Figure 3: A prototype of the Paper Mâché system. Readers and reviewers download the Matthews2010.pm file to their desktop. Using a hypervisor to execute the .pm file opens up a new window, displaying the guest operating system (in this case, Ubuntu). In the context of this guest operating system, readers and reviewers can reproduce the experimental results discussed in the paper. When readers and reviewers are done interacting with the experimental environment, they can simply close the window.

Executing the paper within Paper Mâché is much easier than having each reviewer and reader set up the experimental environment on their own. In this example, the source code for MrsRF is available publicly from the web. The MrsRF source code takes advantage of Phoenix [12], an underlying MapReduce framework which was originally designed for the Solaris operating system. We modified Phoenix to get it working on some versions of Linux (e.g., Ubuntu and CentOS). However, if the readers and reviewers do not have access to those versions of Linux (or the correct version of the GNU C Library [13]), then they cannot run our software properly. GNU C Library (glibc) incompatibilities are especially difficult to deal with, since making haphazard updates to a system’s glibc installation can lead to disastrous results. In the past, we had a real-life case involving a reader at another university who had Linux and glibc incompatibility issues with MrsRF.
Despite the code being freely available on the web, and despite open, continuous communication between the authors and the user, the situation was only fully resolved when the reader ended up using a virtual machine to execute the MrsRF code. This finally allowed her to reproduce the results found in the paper. As a result, we believe that the integration of virtual machine files in Paper Mâché will allow users to quickly and easily recreate a paper’s experimental framework and reproduce results.

Paper Mâché readers and reviewers will be able to easily interact with research. As a result of the .vm file, all components of the research (source code, libraries, etc.) will be exposed. Thus, if the reader desires to create an executable version outside of the virtual image (e.g., execute the paper on their operating system of choice), it will be far easier to construct such an environment with the working model that is available from within Paper Mâché.

3.3. The Paper Mâché file (.pm) and Workbench

An executable paper will be represented as a single .pm file within the Paper Mâché system. This file will be structured as a set of subdirectories as shown in Table 1, similar to the structure of a Java .jar file. The file will be built, updated and maintained by the Paper Mâché Workbench. It is not intended that the authors be directly responsible for updating the .pm files. Within the .pm file, there will be a set of directories corresponding to the sections of the executable paper as shown in Figure 2. The paper subdirectory holds the text of the paper as well as any figures referenced and any additional multimedia files. The comments subdirectory contains the comments and ratings entered by readers and reviewers as well as the responses of the authors. The metadata subdirectory holds additional information (authors’ names and contact information, dates updated, etc.) associated with the paper. The .vm file will also be contained within the .pm file.
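To illustrate the package layout, the sketch below builds a small archive with the directory structure of Table 1. It assumes, purely for illustration, that a .pm file is a ZIP archive (as Java .jar files are); the text only says the structure is similar to a .jar file. The helper names `build_package` and `list_section` and all file names and contents are hypothetical placeholders.

```python
import io
import zipfile

# Directories of a package, following Table 1.
SECTIONS = ("paper/text/", "paper/figures/", "paper/media/",
            "comments/", "metadata/")


def build_package(files, vm_name, vm_bytes):
    """Pack section files plus one VM image into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for path, data in files.items():
            if not path.startswith(SECTIONS):
                raise ValueError(f"{path} is outside the known sections")
            package.writestr(path, data)
        package.writestr(vm_name, vm_bytes)   # the .vm file sits at the top level
    return buffer.getvalue()


def list_section(package_bytes, section):
    """Return the sorted member names stored under one section."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(n for n in package.namelist() if n.startswith(section))


pm = build_package(
    {"paper/text/paper.pdf": b"placeholder text of the paper",
     "paper/figures/fig1.png": b"placeholder figure",
     "metadata/authors.txt": b"placeholder metadata"},
    vm_name="MyPaper.vm",
    vm_bytes=b"placeholder image",
)
print(list_section(pm, "paper/"))
```

Rejecting paths outside the known sections mirrors the idea that the workbench, rather than the authors, is responsible for keeping the package in its standard structure.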
While readers and reviewers will primarily interact with the paper using the web-based Paper Mâché workbench, it will be possible to download the entire .pm file from the web if desired. The .pm file may then be expanded into a set of directories on the user’s machine. In this mode, any changes to the contents of the package will not be recorded in the web-based copy of the .pm file. The only required elements of the .vm file (in addition to an operating system) will be the source code to the applications, any dependencies required to build and run the applications, and scripts or instructions for reproducing the experiments referenced in the paper. Including in the .vm file other portions of the paper (e.g. pdf files) that are already contained in the .pm package will be optional.

\Matthews2010.pm
    \paper          Files associated with the paper portion of the executable document
        \text       Text files such as html and pdf representations of the paper
        \figures    Figures and data referenced in the paper
        \media      Multimedia content (e.g. video, audio)
    \comments       Comments and ratings for the paper
    \metadata       Metadata associated with the package
    MyPaper.vm      Virtual machine image

Table 1: Contents of the Matthews2010.pm file shown in Figure 3. The .pm file will have a standard directory structure similar to Java .jar files. The paper, comments and .vm sections correspond to sections in Figure 2. The metadata section contains various information about the .pm package and its contents.

The web-based Paper Mâché Workbench will act as the point of contact for all users and provide robust capabilities for the management of .pm packages. The system will be implemented using contemporary web design techniques (CSS, AJAX, etc.). The workbench will be developed using Ruby on Rails hosted on an Apache web server and interfacing with a MySQL database.
The workbench is organized around the actions associated with the various roles (author, reviewer, reader) an individual performs in a scientific community. For example, a casual reader just browsing a set of papers will only be able to view those elements that the author (and publisher) have enabled for viewing. Similarly, only authors can view the comments left to them by reviewers. Individual elements within the .pm file will also be secured. For example, to protect intellectual property, essential data may not be viewable until the paper is actually published.

All operations and functions within the workbench will be secure. A standard role-based security model (e.g. author, reviewer) will be used in conjunction with an overall state for the .pm file (e.g. under construction, in review, published) to allow security to be varied depending on where the paper is in the publishing cycle. Each element within the .pm file will have an Access Control List (ACL) to provide highly granular control over security. Additionally, the workbench provides revision control on the contents of the package to allow those with appropriate permissions to view and revert changes to elements within the package. Whenever possible, elements within the package will be watermarked with codes that allow the elements to be traced back to the original package, to reduce plagiarism.

To create and execute virtual machines, the workbench will provide a downloadable tool that will assist the author in creating a virtual image from an existing machine. This image may then be uploaded. Moreover, the workbench will provide the ability to execute the virtual image, providing a remote desktop to the user (VNC, Windows Remote Desktop). Through the workbench, the reader will be able to interact with the paper and participate in public discussions (see Figure 2).
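The combination of roles, package state, and per-element ACLs described above can be sketched as follows. This is only an illustration of the idea, not the workbench design: the `Element` class, the `can_view` function, the file paths, and both example policies are hypothetical; the role and state names are taken from the examples in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One item inside a package, with its own access control list."""
    path: str
    # ACL: package state -> set of roles allowed to view the element in that state.
    acl: dict = field(default_factory=dict)


def can_view(element, role, state):
    """A role may view an element only if the ACL lists it for this state."""
    return role in element.acl.get(state, frozenset())


# Hypothetical policy: raw data stays with the authors until publication.
data = Element("paper/figures/raw_data.csv", acl={
    "under construction": {"author"},
    "in review": {"author"},
    "published": {"author", "reviewer", "reader"},
})

# Hypothetical policy: review comments are never shown to general readers.
reviews = Element("comments/reviews.txt", acl={
    "in review": {"author", "reviewer"},
    "published": {"author", "reviewer"},
})

print(can_view(data, "reviewer", "in review"))
print(can_view(data, "reader", "published"))
print(can_view(reviews, "reader", "published"))
```

Because the ACL is keyed by package state, the same element can become more or less visible as the paper moves through the publishing cycle without changing the role definitions themselves.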
By taking advantage of the security features allowed by the Paper Mâché hypervisor, authors will be able to “lock down” portions of the virtual machine to prevent copying of unpublished research. Authors using the workbench can create new packages and modify packages they own. They will also be able to view and respond to comments, whether from readers (public) or from reviewers (private). Lastly, reviewers can use the workbench to interact with the paper and leave comments for the authors.

4. Evaluation of Paper Mâché

We will evaluate the performance of our Paper Mâché system based on two metrics: speed and understanding. Our first experiment will measure the time it takes for a reader/reviewer to interact with the science described in a paper. Consider Table 2. The traditional approach requires seven steps to interact with the science in a paper. Of course, this assumes that the software is available publicly or directly from the author, that readers and reviewers can get the source code compiled on their experimental platform, and that the data is also available for experimentation. For Paper Mâché, Table 2 shows that three steps are needed to recreate the experiments discussed in the scientific paper of interest. For our performance evaluation, we will select a set of papers from various conference and journal publications, and measure the average amount of time that the traditional mechanism requires for the reader/reviewer to obtain the science discussed in a paper. We will compare that number to the time required under Paper Mâché. Given that Paper Mâché is a new system, we will ask for volunteers to store their scientific results on the system so that we have a large enough sample for comparison with the traditional approach.
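To make the comparison concrete, the two totals can be written out as follows. The symbols are introduced here purely for illustration and are not part of the evaluation protocol: $t_i^{(j)}$ is the time a reader spends on traditional step $i$ of Table 2 for paper $j$, $t_I$ is the one-time cost of installing the VM framework, and $t_D^{(j)}$ and $t_R^{(j)}$ are the times to download the VM and to run the experiments for paper $j$, for a reader who examines $k$ papers.

```latex
\begin{align*}
T_{\mathrm{trad}}(k) &= \sum_{j=1}^{k} \sum_{i=1}^{7} t_i^{(j)}, \\
T_{\mathrm{PM}}(k)   &= t_I + \sum_{j=1}^{k} \left( t_D^{(j)} + t_R^{(j)} \right), \\
\frac{T_{\mathrm{PM}}(k)}{k} &= \frac{t_I}{k} + \frac{1}{k} \sum_{j=1}^{k} \left( t_D^{(j)} + t_R^{(j)} \right).
\end{align*}
```

Under these assumptions, the installation cost is spread over the $k$ papers a reader examines, so its per-paper share shrinks as $k$ grows. Whether the measured totals actually favor Paper Mâché is what the experiment is meant to determine.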
Secondly, we will measure the amount of time required by an author to use Paper Mâché to package the executable portion of their paper. The traditional approach places little overhead on the author in terms of making the executable portion of their paper available. Instead, the overhead is placed on each reviewer and reader, who must spend the time required to recreate the experimental environment used by the author (as shown in the Traditional column of Table 2). Clearly, there will be overhead placed on the author in order to use Paper Mâché. However, the time spent by the author is time saved for each reviewer and reader that accesses the paper. We are interested in the amount of time required by the author to share their executable environment in Paper Mâché. Our hope is that the additional time required by the author is minimal when compared to the savings gained by reviewers and readers when they interact with the science described in a scientific paper.

Finally, we will experiment with assessing the improvement in understanding a scientific paper as a result of interacting with it through Paper Mâché. Improved understanding means a better experience interacting with a paper for readers and reviewers. Certainly, user studies will be conducted as a way to measure understanding. However, we will also consider other types of experiments. For example, one way to measure understanding is to measure the level of engagement with a scientific paper. We could measure the amount of time spent reading a traditional paper compared to the amount of time spent with a Paper Mâché package. Our hypothesis is that engaged readers will have better comprehension of the scientific content. We could also compare the frequency of comments on papers with and without an attached virtual machine as one metric for assessing interest and, to some degree, understanding.
Furthermore, it will be interesting to measure whether techniques (such as Paper Mâché) that increase participants’ engagement in a paper positively affect the impact factor of scientific journals.

5. Extending the Capabilities of Paper Mâché

Section 3 provides a description of the primary features that are the focus of the current development of our Paper Mâché system. However, Paper Mâché can be extended in many ways in order to improve the experience of authors, readers, and reviewers. For example, cloud computing has become a major topic of interest and one that could provide further utility to Paper Mâché. By hosting the hypervisor in the cloud, users can execute papers from a web browser. This would offload the computational requirements from the user to the host server. With enough computing power on the server side, we could enable users to test and interact with supercomputing-scale research from their commodity hardware.

Community features can significantly add to the reader’s experience. Imagine two copies of the same paper: one fresh off the printer and the other annotated by a graduate student who has pored over the research. Wouldn’t both copies be helpful in understanding the work? While it might be preferable to first read the paper as it was published before turning to an annotated copy, those notes and comments could be invaluable to understanding complicated passages, figures, or algorithms. One of the goals of Paper Mâché is to allow readers to pick up right where the authors left off. Furthermore, the ability to share detailed comments and annotations allows readers to experience the paper from different perspectives.

Science does not remain static. New experiments are performed, tweaks to code are made, and different data sets are tested. Since research does not stop once a paper is published, it would be useful if these advances were represented in the executable paper.
By combining Paper Mâché with version control systems, executable papers could be made to reflect the ever-advancing nature of research. The beauty of version control systems is that they track changes. Hence, the original work can always be available while also making it easy to obtain the most recent version of the software. Small changes to projects do not often warrant a new publication. However, even something as simple as a bug fix could have a great impact on readers and researchers interested in the research. Thus, version control systems are an important step in keeping the community up to date on the most recent iteration of a project.

Traditional                                                     Paper Mâché
1. Obtain source code                                           1. Install VM framework*
2. Resolve OS / platform dependencies (32-bit vs. 64-bit)       2. Download VM
3. Resolve library dependencies                                 3. Run experiments
4. Compile with proper flags
5. Obtain data set used in paper
6. Obtain the commands used to run experiments in the paper
7. Run experiments

Table 2: A comparison of the steps required to execute the science in a scientific paper using the traditional approach and Paper Mâché. The * denotes that this step is only required the first time a user uses Paper Mâché.

6. Conclusions

In this paper, we introduce Paper Mâché, a novel system for creating dynamic, executable research papers. While virtual machines are widely used to maintain controlled, reproducible environments for software development and testing, Paper Mâché extends the use of virtual machines to facilitate the reproduction of scientific research. By allowing authors, reviewers, and readers to interact with not just the text but also its programs and data in a virtual machine environment, the scientific paper becomes a dynamic, executable entity. Short- and long-term compatibility is assured through the use of virtual machines.
The programs and data associated with the paper will be runnable even if the actual source code no longer compiles in modern environments. Virtual machines allow for easy, instant execution, providing a quick method for validating the programs and data.

The robust security model associated with Paper Mâché packages provides an ideal method for managing copyright and licensing issues. The capabilities of any user (or role) can be managed based on the licensing requirements of operating systems and applications. The cloud environment and the ability to work seamlessly with the author’s host environment will make it possible to work with large-scale systems and large files. By providing a single point of management (the Paper Mâché Workbench), it becomes possible to track the provenance of individual elements within the papers.

However, the benefits of Paper Mâché extend beyond the scientific community. The interactive aspects of our Paper Mâché system encourage interest in science and the scientific process among the general public, thanks to an increase in the visibility and accessibility of current research. The increase in accessibility to current findings changes the way that scientific research is performed and communicated. We believe paper management systems such as Paper Mâché have the ability to pave the way for more scientific collaborations, increase the communication and understanding of core concepts, and consequently allow for earlier adoption of critical findings into existing research. Thus, our Paper Mâché system provides a bridge that allows everyone to actively participate in the scientific process.

7. Acknowledgements

This publication is based in part on work supported by Award No. KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Science Foundation under grants DEB-0629849, IIS-0713618, and IIS-1018785.