Procedia Computer Science 4 (2011) 627–636

International Conference on Computational Science, ICCS 2011

A-R-E: The Author-Review-Execute Environment

Wolfgang Müller a,*, Isabel Rojas a, Andreas Eberhart b, Peter Haase b, Michael Schmidt b

a SDBV group, HITS, 69118 Heidelberg, Germany
b fluid Operations, 69190 Walldorf, Germany

* Corresponding author. Tel.: +49-6221-533-231; fax: +49-6221-533-231. E-mail address: wolfgang.mueller@h-its.org.

Abstract

The Author-Review-Execute (A-R-E) is an innovative concept offering, under a single principle and platform, an environment that supports the life cycle of an (executable) paper: the authoring of the paper, its submission, the reviewing process, the author's revisions, its publication, and finally the study (reading/interaction) of the paper as well as extensions (follow-ups) of the paper. It combines Semantic Wiki technology, a resolver that resolves links between parts of documents as well as links to executable code or data, an anonymizing component to support the authoring and reviewing tasks, and web services providing link perennity.

Keywords: Semantic Wiki, Linked Data, Extended links

1. Introduction

The main goal of an executable paper is to increase the comprehension, reproducibility and sustainability of electronic publications. We take a data-driven, loosely coupled, and distributed approach to support the life cycle of an (executable) paper: authoring, reviewing, publication and study. The main objective of the A-R-E system is to provide an environment in which a publication is a structured, complex entity enriched with features that support further exploration of the facts and hypotheses stated in the paper, as well as of related information from external sources.

We present the concept and features of the A-R-E based on the types of users of the system, which we have defined as: author(s), reviewer(s), publisher and final reader(s). For each of these we consider (i) the desired user experience and (ii) the technical needs for the necessary functionality.

The authors of an executable paper require an environment that supports them in providing, enriching and linking content. It has to be simple and flexible; otherwise, no one will enhance the papers as needed. To increase adoption, the authoring environment needs to preserve the authors' freedom to choose their tools, e.g. the data analysis tools to be used. It is hard to imagine that scientists will commit to one platform that restricts the way they do experiments or produce data, so the possibility to export and import text files and data is crucial. Linking information should be possible at different levels of detail and to different types of information source. Links from named entities, figures or tables into Semantic Web content (web-based databases, web-enabled articles, etc.) are necessary features that need to be supported (such links are now widely available, see Attwood et al. [1]).
Furthermore, it would be desirable to allow the author to link sections (chunks) of information to sections within external (referenced) articles. Apart from the authoring aspects per se, the system should support the author in the submission as well as in the revision process (the interaction with the reviewer). The author needs to be able to include additional information supporting the work presented in a paper, such as files with the raw data from which a diagram is generated, a program used to process the data, or even links to specific parts of a referenced publication (and not just to the publication as a whole). Supporting these needs would offer the authors an integrated platform for the management of their executable paper and its parts, covering the processes of writing, referencing, annotation, proofing, submission, and revision of the paper.

From the reviewers' point of view, the executable paper environment should facilitate the understanding and verification of the paper. The reviewer needs access to supplementary files such as data and executable code. Easy navigation through and commenting on the paper's content using its structure is also a desired feature that helps the reviewer in his or her tasks. References from one section to another allow the reviewer to concentrate on a certain aspect of the paper; for example, to support following up on a topic mentioned in the abstract, the author can link the information in the abstract to the related section, allowing the reviewer to use this link during the review. One of the main tasks in the reviewing process is the verification of the paper by consulting related work or information, which has to be supported by the system. The reviewer should be able to write his or her review using references to the content of the paper, to external references, to his or her own supplementary content, or to the paper's supplementary files.

For a reader, resolving a reference often means not only obtaining the referenced paper B, but also finding out which paragraph of B provides the piece of information pertinent to paper A. Similar considerations apply to figures in a paper. One would like to be able to navigate from data points in a plot to the items in the raw experimental data that led to these points, say by being pointed to a column with experimental data in an Excel sheet. Furthermore, one would like to be able to navigate into the program code that led to a given aggregation of data. Evidently, one will not be concerned with programming details, but rather with the implementation of the main pieces of code leading to a plot, graph, or other figure. Given the chance, one would also like to explore the data further, either by changing the code applied or the data analyzed.

It is important that the reviewing environment facilitates communication between reviewer and author and between reviewer and publisher, albeit in a secure and (in the author-reviewer case) anonymous manner. Publishers need to control the anonymous peer review process, e.g. give the reviewers the right to view the document and supplementary data before acceptance and then open up the publication of the data as needed. By facilitating the interlinking among its papers, the system can improve the quality of research as well as the repeatability of associated experiments.

The final reader can be viewed as a reviewer with limited authorization.
The system should offer the reader the same facilities for the navigation and exploration of the paper as it offers the reviewer, but these will be restricted by the access rights granted by the paper's authors to the supplementary files of the paper. If authorized by the publisher, the system will allow the reader to attach his or her personal "notes" to the paper, as annotations and complementary information.

In the last few years there have been multiple efforts towards augmenting the information provided by a publication, in order to facilitate its comprehension as well as to extend the knowledge it provides. Several Web-based tools allow the identification of terms in the paper against a set of ontologies or databases, adding relevant hyperlinks to target pages. Attwood et al. [1] provide a broad and detailed overview of these techniques and efforts.

Taking into account the features that we consider to be required by the different types of users of an executable paper environment, the rest of this document is structured as follows. First we define our concept of the A-R-E system, highlighting the main goals that we aim to achieve. We then present the main components of the A-R-E system. After this we exemplify how an author and a reviewer could use the system, highlighting the role of each of the A-R-E components.

2. Concept

Our mission is to support data-driven navigation, analysis, visualization, and annotation of the publication in the different stages of the lifetime of a paper: creation, revision, publication, analysis and extensions (follow-ups).

Figure 1: Architecture of the A-R-E system

We consider that the author of a document is often behind a firewall, is not always root on his or her machine, and does not necessarily want to go through extended administrative motions just to share some data with a reviewer. Furthermore, we need to respect the reviewer's anonymity during the review. Finally, we need to resolve URLs specifying document regions using an appropriate component. As a consequence, the A-R-E system consists of the Information Workbench (IWB) [7], a Semantic Wiki-based tool for authoring and linking content, plus a proxy server component; both reside at the publisher's location. The proxy server takes care of anonymization and firewall circumvention. In addition, there is a local component running at the author's location that interacts with the proxy server component. This base architecture is depicted in Fig. 1 and will be further explained below.

Figure 2 below illustrates a simplified diagram of a paper's life cycle, indicating for each state the transition actions and the type of user that executes them. Each state normally comprises more than one task, which in turn can be iterative, so there may be multiple cycles in the writing process before the final draft of the paper is submitted.

Figure 2: State flow diagram of the phases in the life cycle of the (executable) paper

Although the paper will go through different phases and probably undergo modifications or additions in these phases, we start from a general intrinsic structure, which defines the functionalities of the system. Figure 3 shows how an executable paper is represented in our system. At its core, the authors model the structure of their publication.
Typically, the structure would be defined according to the sections and subsections contained in the paper, such as Abstract, Introduction, and the other chapters. When importing a paper from an existing document (e.g., a Word file), the system can propose a structure according to the content of the document.

Figure 3: Structure of an executable paper in our A-R-E system

Apart from some basic constraints enforced by the publisher (e.g., the presence of an Abstract), the authors are completely free in defining the structure of their publication, and all sections/subsections are semantically linked to each other. Following the paradigms implemented in the Information Workbench, these semantic links are stored in the form of RDF data [8], the W3C standard for representing and exchanging semantic information (a minimal sketch of such a representation is given below). The semantic links connecting the components of the paper can be of benefit within the reviewing process, during publishing, and also once the paper has been published: given that parts of the paper are treated as first-class citizens in our model, the publisher can easily generate a table of contents, extract abstracts, or interlink related sections (even across different publications). As another example, once the paper has been published, readers can annotate individual sections with comments, additional information, or related work.

As also shown in the figure, each section/subsection is associated with a Semantic Wiki page. These wiki pages can be collaboratively edited by the authors and, later on, processed by the reviewers. Such Semantic Wikis, which have recently gained attention not only in the Semantic Web community (see e.g. [14]), differ from traditional wikis in that they allow embedding widgets that build upon an underlying semantic database, making it possible to create dynamic charts and dashboards that are filled according to the content of that database. The Information Workbench offers built-in support for importing data given in common formats such as tabular data, relational data, or RDF data. In addition, authors can choose to integrate both local data and public data from global repositories. With the proliferation of executable papers, we may also expect it to become common practice that authors publish associated data on the publisher's side, in a globally accessible data repository. Using the Information Workbench, the authors can then embed dynamic charts, dashboards, or other visualizations of public global data and local data directly into the Semantic Wiki pages, possibly combining data from multiple sources into a single dashboard. Furthermore, as another central feature, authors can use the Semantic Wiki to add semantic links to executable code, which can later on be verified and run by the reviewers (we will discuss this issue in more detail later in this section).

We want to enable the linking of information in a paper to information within the paper as well as to external resources. These sources can either be data files or allow the processing of data. In addition, we want to enable the linking of parts of documents to parts of documents as an afterthought. Linking at the data level is the basis for the success of the Semantic Web. HTML allows linking from marked regions in documents to other documents and even to anchors in documents.
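To illustrate the RDF representation of a paper's structure mentioned above, the following is a minimal sketch using Python and the rdflib library. The vocabulary and resource names (are:Paper, are:hasSection, the example.org URIs, etc.) are our own illustrative assumptions, not the actual A-R-E schema:

    # Minimal sketch of a paper's structure as RDF, using rdflib.
    # The "are:" vocabulary below is hypothetical, for illustration only.
    from rdflib import Graph, Namespace, Literal, RDF

    ARE = Namespace("http://example.org/are/")          # assumed vocabulary
    PAPER = Namespace("http://example.org/papers/p42/") # assumed paper URIs

    g = Graph()
    g.bind("are", ARE)

    # The paper and its sections are first-class resources ...
    g.add((PAPER.paper, RDF.type, ARE.Paper))
    g.add((PAPER.abstract, RDF.type, ARE.Section))
    g.add((PAPER.introduction, RDF.type, ARE.Section))
    g.add((PAPER.paper, ARE.hasSection, PAPER.abstract))
    g.add((PAPER.paper, ARE.hasSection, PAPER.introduction))
    g.add((PAPER.abstract, ARE.title, Literal("Abstract")))

    # ... so sections can be interlinked, e.g. from the abstract to the
    # section that elaborates on a topic the abstract mentions.
    g.add((PAPER.abstract, ARE.elaboratedIn, PAPER.introduction))

    # Each section is associated with a Semantic Wiki page.
    g.add((PAPER.abstract, ARE.wikiPage, Literal("wiki/Paper42/Abstract")))

    print(g.serialize(format="turtle"))

Because sections are plain RDF resources, operations such as extracting a table of contents or interlinking related sections across publications reduce to simple queries over this graph.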
Such HTML anchors, however, must have been prepared beforehand by the author of a document. In other words, the author of a document decides how the reader is to read the document, which parts can be referenced, and what the minimal granularity of a reference is. Similarly, PDFs allow linking to pages or to named destinations that have to be created by the author. This poverty of deep region-to-region linking possibilities is in stark contrast to the fact that there are quite a number of languages for specifying locations in general text, XML [9], spreadsheets, or even program code. For text, there are numerous examples, such as the command language of the VI text editor (http://www.vim.org/). XPath [10] enables specifying regions in XML. In spreadsheets, there is an implicit de facto standard across spreadsheet software (such as Excel, Open Office and others) on how to specify sheet regions in formulas. Aspect-Oriented Programming [2] is about specifying regions of code (so-called pointcuts) to be affected by code changes (so-called advice). All these region specification approaches can be used for enabling deep linking. Within our A-R-E concept, we follow a pragmatic way to enable deep linking without having to change or adapt web standards: we have chosen to enable linking via the use of a proxy server/anonymizer component, which is described in detail below. This notion of extended linking is complementary to efforts to join Excel and ontologies for improved use of Excel data [3,4,5].

Another important point that needs to be taken into account when defining the system is the provenance of information and the tracking of revisions. This is a key factor in maintaining the integrity of an executable paper. The system will implicitly track changes and the origin of information, and check for lost links and modifications to referenced files. Here we plan to use techniques similar to those used in the SysMO-DB SEEK [6], i.e. detecting file changes by generating, shipping and comparing cryptographic hashes (see the sketch below).

Complementing the previous components, last but not least, each publication has associated metadata. This metadata includes information such as the title, the authors, categorization information, keywords, associated proceedings information, etc. It is filled in by the authors using predefined forms when submitting the paper and will be aligned with common ontologies for publication metadata such as Dublin Core (http://dublincore.org/), to increase the reusability of the metadata description. Hence, as a major benefit, the publisher can directly publish all its metadata in a semantic data format such as RDF, to make it available to the scientific community. Related tasks such as the metadata annotation and data publishing processes are supported by the Information Workbench out of the box.

Our conceptual view of the executable paper makes no reference to the location of the files (data, executables, or other supplementary files) that can be linked from the executable paper. The idea behind this is that file location should be more or less seamless for the reviewers and readers, and controllable by the author. Inherently, the A-R-E is a distributed system, with distributed file management and distributed execution of tasks (albeit limited for the time being).
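As a minimal sketch of the hash-based change detection mentioned above (our assumption of how a SEEK-like mechanism could look; the function names and the on-disk registry format are illustrative, not the actual SEEK or A-R-E API):

    # Sketch: detect modifications to referenced files by comparing
    # cryptographic hashes, similar in spirit to SysMO-DB SEEK.
    import hashlib
    import json
    from pathlib import Path

    def file_digest(path: Path) -> str:
        """SHA-256 digest of a file, read in chunks for large data files."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def register(files, registry=Path("hashes.json")):
        """Record the current digests of all referenced files."""
        digests = {str(p): file_digest(Path(p)) for p in files}
        registry.write_text(json.dumps(digests, indent=2))

    def changed_files(registry=Path("hashes.json")):
        """Return the referenced files whose content no longer matches."""
        digests = json.loads(registry.read_text())
        return [p for p, d in digests.items()
                if not Path(p).exists() or file_digest(Path(p)) != d]

Shipping only the digests (rather than the files themselves) keeps the check cheap even when the referenced data are too big to move by wire.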
In addition to the conceptual view of the executable paper, we need to cater for the concepts supporting the distributed nature of the system, namely free choice of storage location, assurance of author security and reviewer anonymity, and, later, virtual machines for execution.

Free choice of storage location (data local, remote, or in the cloud): This functionality is closely linked to the import functionality. In modern science, data can reside in many places: the workstation at the desk, the computing center of the institution, the cloud, and other big data centers. Some of the data may be too big to ship by wire. Furthermore, we assume that most data analysis (e.g. on protein sequences, seismic data, etc.) is carried out with specialized tools outside the system, and that within the paper the author can reference the data, the tools, and the derived data, even while these reside outside the system. An executable paper can refer to (or contain) the tools/programs that were used to process a given dataset or to obtain certain results. These elements (e.g. applications, datasets, and results) can be stored in different locations, e.g. on the author's server, on the publisher's server, or on another (cloud) server. The system will provide the mechanisms to keep track of the referenced elements and the relations between them (e.g. that a certain file is the result of applying a certain tool/method to a certain set of data). Apart from defining his or her own tools for data processing, authors, reviewers, and readers will also be able to apply (and refer to) tools and applications supplied by the system for a wide palette of data types, such as geographical data or protein data. Furthermore, authors will be able to create new widgets to incorporate (new) data analysis and processing tools into the A-R-E system, making them directly available to other users of the system.

Assuring author security and reviewer anonymity: Blind peer review is still the prevalent way of evaluating papers. However, consider the following scenario: an executable paper has been submitted, and as part of reading it the reviewer accesses a data file residing on a machine controlled by the author. In doing so, the reviewer leaves an IP address on the author's server. The IP address will allow the author to find out the reviewer's institution and (given a sufficiently small research domain) may enable the author to find out the reviewer's identity. At the same time, in case of a hacking attack by the reviewer, the author would like to know who accessed his or her data and when. Both reviewer anonymity and author security are best achieved using an anonymizing proxy server under the control of the editor, as further described below.

Virtual machines for reproducibility: Providing a virtual machine that reproduces the conditions under which a certain application was run goes beyond the scope of the first prototype that we aim to build. However, our prototype will provide the basis for the implementation of such features in future versions of our system. We will create the appropriate metadata description to allow the author to specify the hardware and software requirements for the execution of the application, as well as metadata on the result files describing the conditions under which these results were obtained.
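As a rough sketch of what such an execution-environment description could look like, again as RDF via rdflib; the requirement properties (are:operatingSystem, are:minMemoryGB, etc.) are illustrative assumptions rather than a finalized vocabulary:

    # Sketch: hypothetical metadata describing the environment required
    # to reproduce a result; the vocabulary is assumed, not final.
    from rdflib import Graph, Namespace, Literal, RDF

    ARE = Namespace("http://example.org/are/")
    RUN = Namespace("http://example.org/papers/p42/runs/")

    g = Graph()
    g.add((RUN.run1, RDF.type, ARE.ExecutionEnvironment))
    g.add((RUN.run1, ARE.operatingSystem, Literal("Linux 2.6, x86_64")))
    g.add((RUN.run1, ARE.software, Literal("MATLAB R2010b")))
    g.add((RUN.run1, ARE.minMemoryGB, Literal(8)))
    g.add((RUN.run1, ARE.producedResult, RUN.resultfileA))

    print(g.serialize(format="turtle"))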
3. Components of an A-R-E System

In this section we describe the architecture of the system implemented to support the concepts described above. The A-R-E consists in principle of two main components, namely the central authoring/reviewing/executing environment and the resolver/anonymizer. In addition, there is a component that works mainly on the author's local computer (the author's local environment), which is normally behind a firewall in his or her organisation. Readers may want to access resources from outside that firewall, whether these reside at the publisher's location, at the author's location, or at other locations such as other servers or clouds.

The central authoring/reviewing/executing environment is maintained at the publisher's location. The core component of this environment is the Information Workbench (IWB) [7], a generic platform for building Linked Data applications, which has been developed by fluid Operations and is already used productively in fluid Operations' product portfolio. On demand, the user can also download a local copy of the Information Workbench, to author the paper in his or her local environment prior to uploading the content. The IWB comes with a built-in semantic database and provides full Semantic Wiki functionality, allowing users to author free-text sections, interlink such sections, and establish connections to integrated semantic data. Terms can be used as linking points within documents or document parts, and in turn these terms can be organized in graphs, supporting navigation and discovery by following paths of terms through the graph. The IWB design follows a self-service application development paradigm, i.e. it makes it easy to use and define widgets for searching, exploring, and processing data. In addition, it includes predefined components for the analysis and visualization of data (e.g. in the form of charts or dashboards) and supports the collaborative knowledge acquisition process, thus facilitating collaborative work on publications. With its built-in semantic database, it also makes it easy to attach meta information to executable papers, e.g. to categorize publications or to establish links between papers, authors, and conferences.

To build our A-R-E system on top of the Information Workbench we had to extend it with some novel features. Being designed as a platform for self-service application development, the IWB comes with APIs that allow new modules to be integrated seamlessly and the platform to be coupled with other systems. Among the major changes was support for editorial workflows in the Information Workbench, which enabled us to implement the authoring-reviewing-publishing process (cf. Figure 2 and the associated discussion). In addition, we had to integrate the IWB with the resolver/anonymizer component, and we added some new widgets supporting the submission and reviewing process. Most of the other tasks, such as editing, metadata annotation, visualization support, and data export and import, could be realized out of the box with features already in place.

The author's local environment refers to the machine residing at the author's location, or to the machine on which the author stores his or her data or executes his or her programs. The authors should be allowed to manage the files (of diverse nature) associated with the paper. The local environment enables the following distributed functionality:
1. Authors can create executable papers locally, if they so wish, and then export their data for import into another instance of the Information Workbench.
2. Authors can provide (restricted) links to data that they do not want to upload; furthermore, they can provide the possibility to execute software runs in their local environment.
3. Authors can control multiple local environments, e.g. the cloud for big-data experiments and their local workstation for more iterative, less data-intensive tasks, such as grouping results into plots.
4. The author must be enabled to share data with reviewers even before publication. Depending on the organisation, setting up a restricted environment on the institute's server can involve considerable hassle, and sharing data on the author's machine is often blocked by firewalls.

This means that the author's local environment comprises an installation of the Information Workbench, as well as a tool that provides access to the data that are to remain outside the Information Workbench but should be shared from the author's machine. This component enables users to drag and drop data files into a shared area, to link files to each other, and to create URLs that make the objects accessible from the outside. The reader can then explore relations such as "result file A was generated from data file B using Matlab module M" and access the files in question. Obviously, this information is also sufficient to run analyses using the shared data files as input, yielding result files that can be shown to the reader/reviewer.

While the above caters for the authoring/reviewing/executing needs, we have not yet addressed firewalls. One possibility for circumventing firewalls involves a proxy server component. Let us consider as a general case a server A behind a firewall (which forbids incoming connections to A but allows outgoing connections from A), a proxy server B outside the firewall, and a client C (behind another firewall that forbids C to run a server but allows C to build outgoing connections to servers) who wants to request data from A and is blocked by the firewall. As a consequence, C cannot be served by A directly. How the challenges of this scenario can be resolved is illustrated by the interplay between the author's local environment and the resolver/anonymizer.

The resolver/anonymizer component has the following functions:
1. It helps resolving A-R-E URLs that designate document regions.
2. It shields reviewer data requests from the data provider, thus acting as an anonymizer.
3. It provides some security for the data provider, as in case of need the owner of the anonymizer can trace who accessed data via the anonymizer.
4. It can also be useful as a proxy in a firewall-piercing scenario.

For our example, we consider the hypothetical web locations author.org, resolver.com, reviewer.org, and thirdparty.org. Imagine a document at http://publisher.com/executablePapers/document.jsp linking to http://thirdparty.org/paper.pdf, page 1, words 20 to 50, and to http://author.org/anotherPaper.pdf, page 4, words 41 to 57. Furthermore, consider that author.org is hidden behind a firewall, so it cannot be accessed directly. For use in the A-R-E system, one wraps such URLs as follows: http://resolver.com/resolve?link="http://author.org/anotherPaper.pdf";page=4;words="41-57".
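As a minimal sketch of the two sides of this interplay: how a wrapped URL could be built, and how the author-side component could learn of pending requests by polling. Only the /resolve URL shape is taken from the example above; the /pending and /upload endpoints are our illustrative assumptions, not the actual resolver API:

    # Sketch of the resolver interplay described in the text.
    # /pending and /upload are hypothetical endpoints for illustration.
    import time
    import urllib.parse
    import urllib.request

    RESOLVER = "http://resolver.com"

    def wrap_deep_link(target: str, page: int, words: str) -> str:
        """Wrap a region-of-document link into a resolver URL."""
        quoted = urllib.parse.quote(f'"{target}"', safe="")
        return f'{RESOLVER}/resolve?link={quoted};page={page};words="{words}"'

    def author_poll_loop():
        """Author-side component: poll the resolver for pending requests
        (outgoing connections only, so no firewall hole is needed)."""
        while True:
            with urllib.request.urlopen(f"{RESOLVER}/pending") as resp:
                for name in resp.read().decode().splitlines():
                    # Push the requested file to the resolver, which marks
                    # it up and forwards it to the (anonymous) requester.
                    data = open(name, "rb").read()
                    req = urllib.request.Request(
                        f"{RESOLVER}/upload?file={urllib.parse.quote(name)}",
                        data=data, method="POST")
                    urllib.request.urlopen(req)
            time.sleep(60)  # poll period

    print(wrap_deep_link("http://author.org/anotherPaper.pdf", 4, "41-57"))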
When such a wrapped URL is requested, the following steps are performed:
1. resolver.com receives the request, checks the requester's credentials, and makes sure that the requester has the right to see the intended document. It further determines to what extent the user may see the document.
2. The local component on author.org periodically contacts resolver.com. As a result of its latest poll, author.org learns that anotherPaper.pdf is requested.
3. author.org sends anotherPaper.pdf to resolver.com.
4. resolver.com marks up the PDF of anotherPaper.pdf; in particular, it highlights words 41 through 57 on page 4. According to the rights of the requesting user, a corresponding representation is handed out. For example, we could imagine that some users receive PDFs that allow cut-and-paste, while others receive only GIFs that enable reading but no further processing.
5. resolver.com forwards the enriched PDF to reviewer.org.

Note that in this five-step process (i) the link to a PDF region has been resolved, (ii) reviewer.org got the requested document, (iii) author.org did not learn who requested the document, and (iv) author.org learned of the request by polling, i.e. without listening on a socket, thus avoiding the most frequent firewall restrictions. As can be seen in the example, the resolver is a RESTful web service.

4. A Walk Through A-R-E's Features

Figure 4: Representing linked content in the Information Workbench

To demonstrate the main features of the A-R-E system we have defined a case study scenario in the area of biochemical pathway analysis, where the HITS partner has experience in the processing of publications. However, the demo version of the system includes some examples from other areas, in order to demonstrate the generality of our solution and to present features that can be presented better or more easily in different contexts. The prototype covers all phases of an executable paper's life cycle, namely the authoring of the paper, its submission, the reviewing process, the authors' revision, the publication, and finally the study (reading/interaction) of the paper as well as extensions (follow-ups) of the paper.

To write the publication the author can use a text processing program of his or her choice (this is not part of the system). The author can then import the draft of the paper into the system. The import facility will recognize the main sections of the paper and define these as the structure of the paper; this structure can of course be edited and modified by the author. The author can then use the A-R-E to explicitly reference related work (other executable and non-executable papers) and use the Semantic Wiki facilities to find, and eventually link the paper to, external sources or linked sources within the A-R-E system (see Figure 4).

Figure 5: Exploring and integrating existing life science data with the IWB

The A-R-E system already contains a wide palette of life science databases from publicly available Linked Data repositories. These data can be explored and integrated with a single click of a button (see Figure 5).
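To illustrate what integrating such a Linked Data source involves under the hood, the following is a minimal sketch using rdflib to load an external vocabulary and look up terms to link; the vocabulary URL and the search phrase are placeholders, not the actual repositories bundled with the system:

    # Sketch: load an external Linked Data vocabulary and find terms
    # matching a phrase from the paper, so the phrase can be turned
    # into a semantic link. The URL below is a placeholder.
    from rdflib import Graph
    from rdflib.namespace import RDFS

    g = Graph()
    g.parse("http://example.org/vocabularies/pathways.rdf")  # placeholder

    phrase = "glycolysis"
    for term, _, label in g.triples((None, RDFS.label, None)):
        if phrase.lower() in str(label).lower():
            print(term, label)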
In our demo, we will show how the user can add new linked databases, such as the Systems Biology Ontology (SBO) [15], to make references from the article to terms in the SBO. The demo will then show the system's support for referencing related data, regions in related data, and program and application files, which can be on the author's local server or imported into the publisher's server. We will show how the author can also use applications offered by the system, such as data analysis and visualization tools, and include the results in the paper. If desired, associated data can also be loaded directly into the A-R-E's internal database (or accessed dynamically from external Web sources), to be visualized on demand using internal charting and reporting widgets.

Once a paper is ready for submission, it can be "proof executed" to guarantee its integrity (all links are functional and complete, references to external and system resources satisfy the necessary requirements, etc.). Once submitted, the author can no longer modify the paper. The paper will be made visible and executable (but not changeable) to the reviewers. Here the system supports the reviewing process by offering the same search and linking facilities offered to the authors, as well as allowing the application of analysis tools and other types of software to data supplied by the author or to other data sets. The reviewers can use the results of their analysis of the paper to create their reviews, referencing the results they obtained and parts of the paper. To demonstrate the functionality of the anonymizing proxy, we will show a case where the reviewer needs to access data in the author's local environment. The author-reviewer cycle can be carried out several times (as many as allowed or necessary) and, assuming success, the paper will at some point be ready for publication. The management of the phases of the paper, once a paper has been registered in the system, is coordinated by the publisher.

Finally, the publication of the executable paper brings the paper into an environment similar to that of the reviewer. In this step, the publisher may limit the type of access to elements related to the paper, such as the code of applications, which could have been open for inspection by the reviewers (possibly on request) but may not be accessible to the "general public". The potential reader of the paper benefits from the executable paper in several regards: improved search and exploration facilities may help to find relevant content and data more efficiently; precise links to parts of other documents facilitate the understanding of the paper and its relation to related work; executable parts of the paper can be reproduced and verified with little effort; and data export functionality may give direct access to the data associated with the paper, allowing readers to integrate the data into their own experiments. If desired, the publisher can choose to grant the author ongoing write access to the paper, allowing him or her to correct mistakes and extend previous results once the paper has been published.

5. Conclusions

We have developed the A-R-E environment, which implements our concept of an executable paper.
Our prototype is based on a solid Semantic Wiki system, augmented with a resolver/anonymizer component for the resolution of links among (parts of) documents and for the control of the security/anonymity issues involved in the reviewing process and in the sharing of data. Third, there is a component (or multiple such components, where needed) at the author's location enabling simple through-firewall sharing, as well as the execution of analyses on systems controlled by the author. Together, these components form an environment that facilitates the reading and comprehension of papers by making it easier to find what matters, to see how things fit together, and to link them to outside sources of knowledge. We blend established Semantic Web techniques with extended links that enable linking into regions of documents without anchors having to be created in those documents beforehand. We have created a demonstrator to show the capabilities of the system, choosing the area of biochemical pathways as the demo scenario. We see several possible extensions, including better support for virtual machines in order to execute papers in small-data scenarios at arbitrary locations, and support for data streaming/multi-resolution approaches for big-data scenarios.

Acknowledgements

The authors wish to thank Dr. Ulrike Wittig and Renate Kania for discussions, and the SysMO-DB and SysMO-LAB projects for ongoing fruitful collaboration.

References

1. T. K. Attwood, D. B. Kell, P. McDermott, J. Marsh, S. R. Pettifer, D. Thorne: Calling International Rescue: knowledge lost in literature and data landslide! Biochem. J. 424 (2009) 317–333. doi:10.1042/BJ20091474
2. G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. V. Lopes, J.-M. Loingtier, J. Irwin: Aspect-Oriented Programming. Proceedings of the European Conference on Object-Oriented Programming (ECOOP), Springer-Verlag LNCS 1241, June 1997
3. The ISA infrastructure: http://isatab.sourceforge.net/isacreator.html
4. SysMO-DB RightField: http://www.sysmo-db.org/rightfield
5. Anzo for Excel: http://www.cambridgesemantics.com/products/anzo_for_excel
6. The SysMO-DB project: http://www.sysmo-db.org
7. P. Haase, A. Eberhart, S. Godelet, T. Mathäß, T. Tran, G. Ladwig, A. Wagner: The Information Workbench – Interacting with the Web of Data. Technical Report, fluid Operations & AIFB, October 2009. http://iwb.fluidops.com/
8. RDF Primer. W3C Recommendation, Feb 10, 2004. http://www.w3.org/TR/rdf-syntax/
9. XML (Extensible Markup Language). W3C. http://www.w3.org/XML/
10. XPath (XML Path Language). W3C Recommendation, Nov 16, 1999. http://www.w3.org/TR/xpath/
11. V. Novácek, S. Handschuh: Biomedical Publication Knowledge Acquisition, Processing and Dissemination with CORAAL. OTM Conferences (2) 2010: 1126–1144
12. T. Groza, S. Handschuh, G. Bordea: Towards automatic extraction of epistemic items from scientific publications. SAC 2010: 1341–1348
13. V. Novácek, T. Groza, S. Handschuh, S. Decker: CORAAL – Dive into publications, bathe in the knowledge. J. Web Sem. 8(2–3): 176–181 (2010)
14. M. Krötzsch, D. Vrandečić, M. Völkel: Semantic MediaWiki. International Semantic Web Conference 2006: 935–942
15. N. Le Novère: Model storage, exchange and integration. BMC Neuroscience 7(Suppl 1):S11 (2006). http://www.ebi.ac.uk/sbo/main/