Procedia Computer Science 4 (2011) 608–617
doi:10.1016/j.procs.2011.04.064

Executable Paper Grand Challenge / International Conference on Computational Science, ICCS 2011

The Collage Authoring Environment

Piotr Nowakowski a,*, Eryk Ciepiela a, Daniel Harężlak a, Joanna Kocot a, Marek Kasztelnik a, Tomasz Bartyński a, Jan Meizner a, Grzegorz Dyk a, Maciej Malawski b,c

a ACC CYFRONET AGH, ul. Nawojki 11, 30-950 Kraków, Poland
b Institute of Computer Science AGH, al. Mickiewicza 30, 30-059 Kraków, Poland
c Center for Research Computing, University of Notre Dame, USA
* Corresponding author. Tel.: +48-600-280-105. E-mail address: p.nowakowski@cyfronet.pl.

Abstract

The Collage Authoring Environment is a software infrastructure which enables domain scientists to collaboratively develop and publish their work in the form of executable papers. It responds to recent developments in both e-Science and computational technologies which call for a novel publishing paradigm. As part of this paradigm, static content (such as traditional scientific publications) should be supplemented with elements of interactivity, enabling reviewers and readers to reexamine the reported results by executing parts of the software on which such results are based, and to access primary scientific data. Taking this rationale into account, we propose an environment which enables authors to seamlessly embed chunks of executable code (called assets) in scientific publications and allows repeated execution of such assets on underlying computing and data storage resources, as required by scientists who wish to build upon the presented results. The Collage Authoring Environment can be deployed on arbitrary resources, including those belonging to high performance computing centers, scientific e-Infrastructures and resources contributed by the scientists themselves. The environment provides access to static content, primary datasets (where exposed by authors) and executable assets. Execution features are provided by a dedicated engine (called the Collage Server) and embedded in an interactive view delivered to readers, which resembles a traditional research publication but is interactive and collaborative in scope. Along with a textual description of the Collage environment, the authors also present a prototype implementation which supports the features described in this paper. The functionality of this prototype is discussed along with the theoretical assumptions underpinning the proposed system.

Keywords: executable paper; high performance computing; e-Science; scientific publishing

1. Introduction

Many disciplines of modern computational science – high-energy physics, molecular biology, social research and material studies, to name just a few – face an obvious need to enrich and enhance the current state of scientific publications. Given traditional publishing methods, data sets, code and actionable software are absent when research is recorded and preserved as a journal article, book chapter or any other paper-based publication.
Furthermore, the actual data which is the result of (and frequently the basis for) published research is not preserved and made accessible in a coherent manner – thus, the reader of a scientific paper must often take the author’s word that the professed results and conclusions are, in fact, valid. Clearly, the scientific paper itself no longer conveys sufficient information to enable reviewers and other readers to judge it on its own merits. This phenomenon endangers the scientific process and carries serious implications for further progress in the computational sciences, which necessarily build upon previously published results.

The Collage Authoring Environment is an attempt at resolving this issue: just as the Web has come alive through the use of mashups and content embedding technologies, so too can research papers benefit from interactivity and dynamic content generation. The authors propose a system which enables a scientific publisher to deploy an infrastructure for the storage and provisioning of executable papers, consisting of static text (much like traditional scientific publications) along with access to primary datasets and embedded assets facilitating the execution of author-supplied code by publication readers.

This paper is organized as follows: in Section 2 we present the motivation and objectives of our work. Section 3 reviews the current state of the art in scientific publishing and outlines some ongoing initiatives which aim to extend the traditional publication model with the capabilities offered by modern computing and networking tools. Section 4 focuses on a more in-depth discussion of the Collage environment design, while Section 5 outlines the features implemented as part of the first prototype of the tool. The paper ends with conclusions and prospects for future development.

2. Motivation and Objectives

Even though the past two decades have witnessed the development and spread of computer-aided research techniques (e-Science), the mechanism by which scientific advances are communicated to the general public has remained unchanged for more than a century: it is the scientific paper. The shortcomings associated with this mode of publication are well known: scientific papers do not lend themselves to rapid verification, reproducibility and reuse of research achievements. This is particularly troublesome given that data management and access technologies have made great strides in recent years, yet such progress is not reflected in the procedures and traditions associated with publishing scientific research. The pressing need for new solutions is illustrated – for instance – by the requirements imposed by certain publishers upon reproducibility of results and the provision of data which would enable such reproducibility. Rather than providing a simple means by which textual information is conveyed, much like displaying a motionless clock, the scientific publication should instead become a vessel for the enactment of the algorithms used to generate results, whereby the internal workings of the published research – its “movement” – can be directly studied in action.

On the basis of this goal we can derive some specific objectives which should – in the authors’ view – be met by any infrastructure purporting to realize the executable paper vision. Accordingly, the following assumptions can be treated as a starting point for the framework proposed in this paper:

• Executability: The executable paper is necessarily interactive, i.e.
it must enable the execution of arbitrary computations (as determined by the authors) on underlying computing resources. There must be a way to embed such interactive elements in the paper’s predetermined structure (which follows the natural flow of the scientific publication – from the problem statement and initial assumptions, through details of the proposed algorithms, to validation of results and conclusions);

• Compatibility: The infrastructure should be compatible with a wide array of data sources and computational platforms, including modern Cloud environments. In addition, the environment should be sufficiently extensible to enable integration with potential future computing solutions;

• Validation: The infrastructure should support tracing and validating research results, particularly by reviewers of scientific papers who are familiar with the given discipline;

• Licensing: The authors of the scientific paper should be free to expose only such data and computations as they are entitled to; moreover the authors, in conjunction with the publishers, should exercise control over who is allowed access to particular research papers;

• Computational access: The environment should be structured in such a way as to enable computations to be executed on high-performance computers, where available;

• Data access: The executable paper should also facilitate access to primary data – namely, data which is the basis for the results presented in a given paper. Such data may assume various guises and be represented by databases or flat files. Thus, the environment must support embedded links to data elements as part of the executable paper and retrieval of said data with the use of a dedicated engine, along with extensions for various storage mechanisms;

• Collaborative development support: The environment must be collaborative in the sense that the research paper can be co-authored and amended by a community of authors, each contributing to its content;

• Multi-actor environment: The environment must provide facilities for authors of scientific publications as well as for readers (including reviewers). We assume that authors will have a higher degree of control over the content which is being displayed, while readers will be presented with a more constrained view (which will, nevertheless, still enable them to interact with the paper’s content in a meaningful way);

• Evolutionary approach: The executable paper, while necessarily being rendered by a computer (in contrast to a printed publication), should still formally resemble a traditional research paper so that its structure appears familiar to readers. Executable content should be embedded in the text as frames;

• Security: Security mechanisms must be implemented to ensure that authors can limit access to their research data, and to prevent malicious code from being injected and executed in the infrastructure.

Fig. 1: Conceptual view of the executable paper. Static content (the body of the publication) is extended by interactive elements. Readers can access primary data and reenact computations in order to validate the presented conclusions or navigate result spaces. Subject to the authors’ approval, readers can also obtain access to the underlying code of the experiments presented in the publication. The infrastructure is Web-based and can be integrated with the Publisher’s portal.
A cursory depiction of the structure of the executable paper, as envisioned by the authors, can be seen in Figure 1. Of note is the fact that we intend to preserve the traditional “look and feel” of the publication, and not replace it with an entirely new interface. We believe that the way towards wide-scale adoption of novel publishing paradigms is not to eschew the existing model, but rather to extend it with additional functionality. In Sections 4 and 5 we explain how we intend to pursue the objectives stated above and bring this vision closer to reality.

3. State of the Art in Scientific Publishing

Much effort is afoot to address the development of richer publication formats and better integration between research data and the final published article (see http://users.emulab.net/trac/archive10/wiki/PublishStandard for an overview of funding agencies and reports pertaining to such increased integration, developed for the NSF’s 2010 “Workshop on Archiving Experiments to Raise Scientific Standards”, https://www.protogeni.net/trac/archive10/). At the same time it becomes apparent that browsing and citing scientific papers is an activity that can largely benefit from social networking tools and websites. For example, CiteULike (http://www.citeulike.com) can be used to post bibliographic references to papers into a personal database with one click in the Web browser. Moreover, the references can be shared within a group writing an article, which greatly helps in finding related papers and forming communities which share common interests. Zotero (http://www.zotero.org) is a similar tool which integrates with the Web browser, while desktop tools such as Mendeley (http://www.mendeley.com/) can process local PDF files and integrate with text processing suites like Word or LaTeX.

Linking articles with primary data is pursued by the UTOPIA Documents project (http://getutopia.com) [1], which aims to bring PDFs to life by linking to live resources on the Web and turning static data into live interactive content, such as curated database entries, molecular structures, sequence and alignment data and plots. myExperiment (http://www.myexperiment.org) is a Web 2.0-based Virtual Research Environment and repository for the storage, social curation and community contribution of scientific workflows, supporting the sharing, reuse and execution of workflows through close integration with virtual laboratory frameworks. Research on new ways of utilizing the possibilities offered by the Internet and social networks for scientific publications has been pursued in EU-funded projects like SciX (http://www.scix.net), PEER (http://www.peerproject.eu/) and LiquidPub (http://liquidpub.org), and also gains attention from professional collaborations such as the Concept Web Alliance (http://conceptweblog.wordpress.com/declaration/). An interesting example of how Web 2.0 technologies may be leveraged in collaborative scientific endeavors may also be found in [2].

Several useful tools are being developed in the life sciences to help identify and enrich entities in biology.
As an example, EBI’s Reflect tool (http://www.reflect.ws), winner of the Elsevier Grand Challenge (http://www.elseviergrandchallenge.com), offers a commodity Firefox plugin that performs entity recognition of any biological (and, increasingly, non-biological) entities that appear in one of the ontologies linked to the database. On mouse-over, the tool pulls in information from various content sources and presents it as tabs on a pop-up menu. A more expansive entity identification interface was recently launched by PubMedCentral (http://beta.ukpmc.ac.uk/), which affords users the ability to highlight terms from the Gene Ontology and/or protein thesauri. Clicking on an entity brings the user to the corresponding EBI page on a specific biological entity.

Recently, Web 2.0 technologies have enjoyed considerable success in entertainment, journalism and social media, enabling users to add value to Web applications [3]. The benefit of Web 2.0 for science has also been recognized [4]. For example, Giustini [5] describes the influence of blogs, wikis and RSS feeds on medicine. There are some Web 2.0 social network and resource sharing platforms for research, wherein researchers can establish and maintain contacts, build communities, organize work spaces for collaborative projects, or share and reuse scientific content resources, including collaborative curation through tagging and comments [6]. However, naïve approaches to sharing and reusing are insufficient and inappropriate in the realm of science [7]. Care must be taken to protect intellectual property, ensure appropriate ownership, credit and attribution, and support the scientists’ activities within the scientific community’s culture of reward and reputation. myExperiment (http://myexperiment.org) [8] is cited in [6] as an example of the second generation of social networking and sharing sites that are mindful of the specific needs of scientists. Originally designed to share scientific workflows that are expected to be reused and adapted, it has a sophisticated model of Versioning, Ownership, Sharing, Credit, Attribution and Permissions for individuals, groups and networks, under the direct control of the members.

Lastly, experimental scientific papers convince readers of their core claims by using data: research data represented in figures and tables, as well as equations, chemical formulae and the like. There is a large move afoot to integrate, store, annotate and employ such data as an enhancement to scientific papers; this is an important, challenging and exciting development. In this light, the wide adoption of the Linked Data philosophy seems very promising: here, pages are rendered either in HTML for human consumption, or in RDF for computer consumption. In pharmacology and therapeutics, there are ongoing efforts to enable greater “connectivity between data silos (...) to connect drug and clinical trials related data sources” [9]. In computer science, a recent plea [10] was for authors and publishers to “provide mechanisms for publishing software, inputs, and experimental data as metadata for the publications that report these experiments in a deep and rigorous manner.”

While the projects mentioned above aim at addressing the problems of linking and referencing scientific data, publications, services and even executable workflows, they do not provide a unified infrastructure which can link
together the computing and data e-infrastructure, execution engines and the resulting executable paper, which is the ultimate objective of Collage.

4. The Collage Authoring Environment: Concept and Design

The Collage Authoring Environment operates by presenting domain scientists with a framework which can be used to develop, test and run experiments in computational science, expressed in a variety of programming languages. The environment is based on the GridSpace2 platform, developed as the end-user interface for the PL-Grid HPC infrastructure. While it provides all the features traditionally expected of a programming environment, Collage also enables the developer to publish and expose fragments of the developed experiments (called assets) as external, embeddable entities which can subsequently be visualized in a digital edition of a research paper. Such assets support interactive visualization of research results (backed by the underlying computing resources), enabling reviewers and readers to enter arbitrary input data and witness the published scientific algorithms in action, and to efficiently browse data sets which would be difficult to publish by traditional means (e.g. as attachments to standalone research papers). Moreover, the framework itself is capable of interfacing with popular scientific software suites, such as Matlab, Mathematica, Packmol, GAMESS and many others.

Collage is aimed at two principal user groups, namely authors of scientific papers (i.e. anyone who has a hand in carrying out a scientific experiment or authoring a publication which is based on its results) and readers, i.e. all users not directly involved in preparing the paper itself, but interested in its content and the scientific data it presents. Clearly, the latter group also includes reviewers of scientific papers.

4.1. External interfaces

The view presented by the Collage environment to end users (i.e. authors and readers) assumes the form of an HTML document in which static content is interspersed with dynamically generated forms, representing the “executable” part of the paper. While the document may superficially resemble a traditional research publication, it also contains embedded assets, each of which presents a visual interface to the end user (typically in the form of a diagram, graph, figure etc.) and is capable of requesting the underlying infrastructure to repeat a given part of the experiment or redisplay its results depending on the parameters specified by the reader. Initially this interface assumes a predefined default state, as dictated by the experiment; however, it may change following execution (at the reader’s request). While the executable paper is being loaded, each asset refers to a predetermined data piece provided by the Collage server (which – if needed – can forward execution requests to underlying computing and storage resources). As results arrive, placeholders in the paper view are replaced with the actual results of computations.

The authors of the paper are thus provided with two distinct user interfaces: a Web environment where they can code their experiments and determine the extent to which input data can be manipulated by the end user (called the Experimentation UI), as well as a separate interface which enables them to develop the actual executable paper as a Web document with embedded assets (called the Authoring UI).
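To give a flavour of the author-side workflow, the listing below sketches the kind of snippet an author might register through the Experimentation UI. It is purely illustrative: the file names, the JSON-based parameter exchange and the choice of Python are assumptions made for this example and are not prescribed by Collage (which also supports other scripting languages such as Ruby and Perl).

# Hypothetical author-side snippet, as it might be entered into the
# Experimentation UI. File names and the JSON-based parameter exchange
# are illustrative assumptions, not part of the Collage specification.
import json
import math

# Read reader-supplied input, e.g. produced by an input form asset;
# fall back to a default value when the paper is first rendered.
try:
    with open("input.json") as f:
        params = json.load(f)
except FileNotFoundError:
    params = {}
frequency = float(params.get("frequency", 1.0))

# Reenact a (deliberately trivial) computation underlying a published result.
samples = [math.sin(2 * math.pi * frequency * t / 100.0) for t in range(100)]

# Store the output where a visualization asset could pick it up for rendering.
with open("output.json", "w") as f:
    json.dump({"frequency": frequency, "samples": samples}, f)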
Both types of interfaces are further discussed in Section 5. Since generating results on the fly might prove infeasible, assets can also access local storage (flat files and databases) where results of previous runs are cached.

4.2. Collage assets

Three types of assets are envisioned in the infrastructure. All assets belonging to a given type share a common rendering widget by which they are represented in the executable paper. They also share a set of common actions which can be performed on their content. The asset types foreseen in Collage are:

• input forms – the goal of this asset is to visualize an input form in the executable paper view. The form can be used by the reader to feed input data into the running experiment. This type of asset directly implements the interactivity aspects of the executable paper, as the user is able to browse large result spaces with the aid of input forms. Upon submission of an input form, the Collage server receives the input data and may further apply it in the course of processing the experiment, possibly generating further assets. An example of this functionality would be an interactive graph which can be manipulated by the user through an input form. Each time the input form is filled and submitted, the server reruns the required computations and generates a fresh graph (which is rendered as a visualization – see below). Should computations become too complex to be performed in real time, the relevant results may instead be read from data sources (databases and/or flat files), according to the input specified by the reader. Note that this type of asset can also be used to upload data files to the infrastructure for processing;

• visualizations – the goal of this asset is to render an experiment result which can be directly visualized in the research paper. Typically, this would be a figure, diagram or chart, although the environment itself does not impose restrictions on the nature of the visualization. In the case of inherently static visualizations (such as images), either the Publisher server or the client browser may issue periodic requests to the Collage server to determine whether the payload of a given visualization has changed (which may often be the case – for instance as partial results are returned by the computing backend). Should this be the case, the contents of the visualization asset will be automatically updated in the client browser. Furthermore, the Collage server can also detect that a given visualization is not yet available (e.g. if the underlying computations are still in progress) and notify the Publisher server (or client browser) so that an appropriate message may be displayed as the client awaits results;

• code snippets – these assets embed an editable view of the code which enacts a specific computation and may be used to generate additional assets. The purpose of this type of asset is to enable the reader to exercise more in-depth control over the experiment execution process, and also to review the inner workings of the executable paper for the purposes of validation and/or reuse. Subject to the author’s control, the code of experiment snippets may be stored and the experiment reenacted.

Together, the three asset types mentioned above cover the full spectrum of interactivity which is required by the executable paper, as presented in Section 2.
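As a minimal illustration of how these three asset types relate to one another, the sketch below models them as plain data structures together with the round trip triggered by an input form (reader input, recomputation or cache lookup, refreshed visualization). All class, field and function names are hypothetical and do not come from the Collage code base.

# Illustrative-only model of the three Collage asset types and of the round
# trip triggered by an input form. All names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class InputFormAsset:
    fields: Dict[str, str]          # widget names and their default values


@dataclass
class VisualizationAsset:
    payload_url: Optional[str]      # None while the computation is in progress


@dataclass
class CodeSnippetAsset:
    language: str                   # e.g. "python", "ruby", "perl"
    source: str                     # editable code shown to the reader


def handle_form_submission(form: InputFormAsset,
                           reader_input: Dict[str, str],
                           rerun: Callable[[Dict[str, str]], str]) -> VisualizationAsset:
    """Feed reader input into the experiment and return a fresh visualization.

    `rerun` stands in for the server-side call that re-executes the relevant
    snippet (or looks up cached results) and returns a payload URL.
    """
    merged = {**form.fields, **reader_input}
    return VisualizationAsset(payload_url=rerun(merged))


# Example use with a stubbed re-execution callback:
form = InputFormAsset(fields={"temperature": "300"})
viz = handle_form_submission(form, {"temperature": "350"},
                             rerun=lambda values: "/assets/hypothetical-run/plot.png")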
The authors can prepare and deploy assets (including executable code) while readers can interact with them by means of input forms. Results are displayed via visualization assets and can be refreshed by the server as more output data becomes available. The system may also visualize external (non-Collage) assets referenced by static URLs.

4.3. Conformance with stated objectives

In order to better justify the proposed architecture it may be beneficial to refer back to Section 2 and explain how Collage matches each of the objectives described there. This is done in Table 1, which lists each of the presented objectives and explains where Collage fits in.

Table 1. How Collage addresses the objectives stated in Section 2

• Executability – The Collage environment supports exposure of executable code, which can be embedded in the structure of the publication with the help of suitable assets. The user may enter input data into a form, whereupon the system proceeds with computations and retrieves results as an embedded fragment of the paper.

• Compatibility – The Collage server is capable of submitting computations to a variety of computational resources, as well as retrieving data from networked data storage elements. Moreover, as the server is capable of executing arbitrary code in a number of scripting languages (including Ruby, Python and Perl), any resources which can be interfaced with the use of these languages are automatically available to Collage assets.

• Validation – The readers and reviewers of scientific papers are capable of rerunning experiments (where supported by the given publication) and validating their results, as well as reviewing the computations which produce these results by accessing the underlying code. Subject to the authors’ approval, the environment may also facilitate modifications in code snippets and repeated execution of experiments with the use of the updated code.

• Licensing – As Collage is served through a Web-based environment, it can be easily integrated with the Web portal of any scientific publisher, whereupon the Publisher’s licensing schemes become applicable to Collage papers. The authors of scientific papers may additionally control access to primary data by supplying custom user credentials to be applied by Collage when accessing such data.

• Computational access – The Collage engine can interface computing infrastructures, including existing Grid infrastructures operated across the EU and in other areas of the world. Support for Cloud computing is also foreseen. In addition, the authors of scientific papers may schedule computations on their own resources contributed to the executable paper (for instance, a public Web Service operated by a scientific institution).

• Data access – Data access is provided by means of custom assets, which may be mediated by the Collage server or refer to data elements directly. The only requirement here is that a given resource is available under a known URL.

• Collaborative development support – The Authoring UI (described in Section 5) enables multiple authors to collaborate on the development of a single paper. Moreover, the Collage environment supports decomposition of virtual experiments into snippets, each of which may be coded by a different person and use different programming tools.
• Multi-actor environment – Collage is specifically designed to offer different features to authors of scientific papers (who, in addition to authoring the paper itself, may use the environment to further develop and execute their virtual experiments) and to readers, who interact with the paper by means of assets specified by the authors.

• Evolutionary approach – The environment emulates a traditional paper view by enabling authors to compose their publications as HTML documents, embedding executable assets where necessary.

• Security – Collage provides secure login features guarding access to sensitive data or computations. Moreover, as the environment is Web-based and may integrate with the Web resources of a scientific publisher, it is also possible to extend the access control mechanisms applied therein to Collage features.

On the basis of this comparison, we can conclude that the Collage environment constitutes a good match for the stated goals of an executable paper platform, as described in Section 2.

5. Implementation Details

The authors have substantial experience in developing computing solutions for domain scientists, as evidenced by their longstanding involvement in the development of environments for collaborative applications [11]. Among the outcomes of our work is the GridSpace virtual laboratory and workbench, on which the Collage environment is partly based [12]. This section is intended as a broad overview of the design and operation of the Collage environment, including technical aspects of its implementation.

5.1. Interaction with end users

As already mentioned, two user roles can be distinguished in the Collage environment: the author, i.e. the person responsible for preparing the paper and its associated assets, and the reader, who interacts with the paper once it has been prepared and served by the Collage infrastructure. Each paper can have multiple authors and, likewise, multiple readers (some of whom may act as reviewers, although this issue is not relevant within the scope of this paper). This situation is depicted in Figure 2, which lists the major building blocks of the Collage infrastructure along with pairwise interactions between actors and/or software modules. It should be noted that the architecture restricts reader access to the Presentation UI exposed by the Publisher server, while the author may additionally access a dedicated Experimentation UI and Authoring UI, used – respectively – to deploy executable code and set up the actual structure of the target publication.

Fig. 2. Interaction of user groups with the Collage environment. Authors of scientific publications can set up executable content by exploiting the Experimentation UI and Authoring UI as appropriate. Readers have access to a dedicated Presentation UI, which visualizes the publications and permits interaction with Collage assets.

The Publisher server refers to the Collage server via a separate API (the Execution API) whenever an asset needs to be visualized or updated. This is further explained in the following subsection. In turn, the Collage server may delegate computations to the Computing Backend (comprising computational and/or data storage resources, either operated by the Publisher or supplied by the publication authors). The Publisher server commands the Collage server to execute the entire experiment, which may consist of multiple snippets.
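To make this delegation more concrete, the sketch below shows how a Publisher server might invoke the Execution API over HTTP. The endpoint path, the JSON payload layout and the response fields are assumptions introduced for illustration only; they are not the actual Collage interface.

# Hypothetical sketch of the Publisher server delegating an experiment
# (a sequence of snippets) to the Collage server via the Execution API.
# Endpoint, payload layout and response fields are illustrative assumptions.
import json
import urllib.request

COLLAGE_SERVER = "https://collage.example.org"   # placeholder address

experiment = {
    "paper_id": "example-paper",
    "snippets": ["prepare-data", "run-simulation", "render-figure"],
    "inputs": {"frequency": 1.0},
}

request = urllib.request.Request(
    COLLAGE_SERVER + "/execute",
    data=json.dumps(experiment).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    reply = json.load(response)

# The reply is assumed to carry a unique run identifier and the URLs of the
# assets which the Publisher server embeds into the paper view (e.g. as
# IFrame sources); readers' browsers then fetch results from these URLs.
run_id = reply["run_id"]
asset_urls = reply["assets"]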
The Collage server is responsible for the management of snippets and can externalize their execution according to the experiment workflow (represented by a snippet sequence). Results are retrieved from the Computing Backend by the Publisher server for direct visualization by means of the Presentation UI.

5.2. Rendering executable papers

In accordance with the above schema, the core functionality of Collage will be provided by the execution environment, which is interfaced by the Publisher Server (PS) – essentially a Web server capable of delegating execution requests to the underlying Collage Server (CS) – as depicted in Figure 3. The user requests a document, triggering an HTTP/S GET request which is dispatched to PS. PS serves the contents of the document (HTML) and may issue an embedded experiment execution request to CS. The request contains the input data necessary for running the experiment; CS executes the experiment asynchronously, returning a unique run identifier (RUNID) to PS. Whenever “live” links to assets appear, PS inserts RUNIDs into the links and then embeds the links themselves as separate IFrame elements. The resulting page is presented to the user. As the browser renders the document, each link issues further HTTP/S requests directly to CS.

Fig. 3. The Executable Paper rendered by Collage. Static elements of the document are served by the Publisher Server (PS) while interactive elements are delegated to the Computing Backend (CB) via the Collage Server (CS). Results are visualized in the user’s browser.

Each link requests CS (via HTTP/S GET) to provide a specific asset (e.g. an image, text file, etc.). Parsing the presented RUNID enables CS to determine whether a given computation has concluded. From the reader’s perspective, parts of the document representing specific assets may remain in a wait state until results are available and can be visualized. The end result resembles a successive sequence of updates, akin to the “notebook” function provided by Mathematica.

In order to enable users to run experiments with arbitrary input data, CS serves data input assets as HTML forms which the users may fill in and send back to CS via HTTP/S POST. Such a form consists of predefined widgets which may be specified by the authors with the use of the Experimentation UI. The input data is then processed and can be fed into the experiment code in order to generate further assets (which are again forwarded to the browser for rendering). The detailed sequence of actions is as follows:

• (1a) The Publisher Server (PS) requests the Collage Server (CS) to execute an experiment which generates the paper. In response, CS returns a list of assets which are provided by the paper, along with the URLs at which these assets are expected to be found;

• (1b) CS deploys an instance of the experiment on the Computing Backend (CB);

• (1c) PS serves the stub of the paper to the user’s Browser (B).
The Browser uses IFrames to represent interactive assets, initially populating them with placeholder content pending reception of the expected assets from CB;

• (2a) Static elements may be served by CS directly (without waiting for pending execution requests to complete);

• (2b) Static elements are forwarded to the Browser as a mashup, without passing through PS;

• (3a) For dynamic assets, CS may serve an input form (if input data is required) or output data (in the case of a visualization or snippet asset), which is then rendered within an IFrame directly in the Browser;

• (3b) Data requested and collected from input forms does not pass through PS – this enables users to retrieve large data sets and does not introduce undue load on PS;

• (3c) User input is fed into the experiment as required;

• (4a) The experiment generates results;

• (4b) CS forwards the results from CB to B, to be rendered as a visualization asset in a separate frame belonging to the paper;

• (4c) The result does not pass through PS – instead, it is downloaded directly from the Computing Backend by means of a dedicated URL (containing a specific RUNID).

As already mentioned, the browser uses IFrames to render the output of assets. The advantages of this approach are threefold. First, the asset payload can be retrieved and updated in an asynchronous manner, irrespective of the static content of the paper. Second, each IFrame may host code (for instance, a JavaScript library) which is specific to the type of asset in question and facilitates its proper visualization in the asset window. Such code is provided by the Publisher server upon instantiation of the executable paper. Finally, the assets may be represented by URLs, which means that they may directly point to the resources included in the paper (particularly datasets), even if such resources are external to the Collage framework.

5.3. Interfacing computing and data storage resources

Collage is intended as a generic environment which does not force users to adopt a specific programming language or hardware platform. Owing to the functionality already present in GridSpace [13], we can execute computations on a variety of resources, from local machines, through Web Services, to PBS queues and Cloud systems [14]. The system supports a variety of scripting languages, including Ruby, Python and Perl, as well as direct shell programming on the Computing Backend. Likewise, the developer of a virtual experiment is not constrained in the choice of data storage solutions that can be interfaced. In most cases it should be sufficient to paste existing code into the Experimentation UI as separate snippets, registering assets and instructing the Publisher server to embed them in the executable paper as IFrame links. With the aid of browser mechanisms, the executable paper may periodically query the assets which are listed as part of the publication, replacing placeholder content with the proper visualization payload as results become available.

6. Summary and Conclusions

We believe that our experience with issues pertaining to e-Science and the tools developed in support of computerized research leave us uniquely poised to meet the goals set forth in Section 2. Much of the functionality on which Collage is based is already present in the GridSpace platform [11, 12]; we therefore consider it a good starting point towards the development of a targeted solution which would be easy to interact with and to maintain.
At the time of preparation of this paper a working demo of Collage was being developed, based on actual research papers, showing how the solutions outlined above are applied in practice and describing in detail the tools available both to the author of an executable paper and to its readers. While work on implementing the Collage Environment currently focuses on its core engine and data/computation access features, as well as on preparing suitable demonstrations, in the future we intend to progress to further development of the end-user interfaces, particularly the Authoring UI and the Presentation UI (discussed in Section 5), thus ensuring that Collage becomes a marketable solution.

Acknowledgements

The authors wish to thank the PL-Grid project, developed with the support of the European Union within the European Regional Development Fund program no. POIG.02.03.00-00-007/08-00. The authors would also like to thank Tomasz Gubała of ACC CYFRONET AGH for his valuable contributions.

References

1. T. K. Attwood, D. B. Kell, P. McDermott, J. Marsh, S. R. Pettifer and D. Thorne, “Utopia Documents: linking scholarly literature with research data”, in Proceedings of the 9th European Conference on Computational Biology, Ghent, Belgium, September 2010.
2. G. Allen, F. Löffler, T. Radke, E. Schnetter and E. Seidel, “Integrating Web 2.0 technologies with scientific simulation codes for real-time collaboration”, CLUSTER 2009, pp. 1-10.
3. T. O'Reilly, “What is Web 2.0: Design patterns and business models for the next generation of software”, O'Reilly Media, 2005. www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
4. D. DeRoure, “e-Science and the Web”, IEEE Computer, 43(5):90-93, 2010.
5. D. Giustini, “How Web 2.0 is changing medicine”, BMJ 2006; 333, pp. 1283-1284.
6. V. Gewin, “The new networking nexus”, Nature 451, 1024-1025, 2008.
7. D. Crotty, “Why Web 2.0 is Failing in Biology”, Cold Spring Harbor Protocols, Feb 14, 2008; http://www.cshblogs.org/chsprotocols/2008/12/14/why-web-20-is-failing-in-biology.
8. D. DeRoure, C. Goble and R. Stevens, “Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows”, Third IEEE International Conference on e-Science and Grid Computing (e-Science 2007), Bangalore, India, 10-13 December 2007, pp. 603-610.
9. A. Jentzsch, B. Andersson, O. Hassanzadeh, S. Stephens and C. Bizer, “Enabling Tailored Therapeutics with Linked Data”, Proceedings of Linked Data on the Web (LDOW2009), April 20, 2009, Madrid, Spain.
10. M. Hall, D. Padua and K. Pingali, “Compiler Research: The Next 50 Years”, Communications of the ACM, Vol. 52, No. 2, February 2009, pp. 60-67.
11. M. Bubak, M. Malawski, T. Gubala, M. Kasztelnik, P. Nowakowski, D. Harezlak, T. Bartynski, J. Kocot, E. Ciepiela, W. Funika, D. Krol, B. Balis, M. Assel and A. Tirado-Ramos, “Virtual Laboratory for Collaborative Applications”, in: M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
12. E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski and M. Bubak, “Exploratory Programming in the Virtual Laboratory”, in Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628 (best paper award).
13. M. Malawski, T. Bartynski and M. Bubak, “Invocation of operations from script-based grid applications”, Future Generation Computer Systems, Volume 26, Issue 1, January 2010, pp. 138-146. Available: http://dx.doi.org/10.1016/j.future.2009.05.012.
14. J. T. Dudley and A. J. Butte, “In silico research in the era of cloud computing”, Nature Biotechnology, vol. 28, no. 11, pp. 1181-1185, 2010.