The Getting Started Guide introduces the main components and functionality of Kepler and contains step-by-step instructions for using, modifying, and creating your own scientific workflows. The Guide provides a brief introduction to the application interface as well as to application-specific terminology and concepts. Once you are familiar with the general principles of Kepler, we recommend that you work through a couple of the sample workflows covered in later sections to get a feel for how easy it is to use and modify workflow components and how components can be combined to form powerful workflows.
Here is a sample workflow (figure 1.1) that fetches the information for an 'Accession' number from the DNA Data Bank of Japan (DDBJ) by using their REST service.
Figure 1.1: Example showing the use of REST service to fetch information related to an accession number.
The results of running the above workflow, namely the header information and the gene sequence, are shown in figure 1.2.
Figure 1.2: Header and gene sequence for 'Accession' number AB000101.
1.1. What is Kepler?
Kepler is a software application for the analysis and modeling of scientific data. Kepler simplifies the effort required to create executable models by using a visual representation of these processes. These representations, or "scientific workflows," display the flow of data among modeling components (figure 1.x).
Kepler allows scientists to create their own executable scientific workflows by simply dragging and dropping components onto a workflow creation area and connecting the components to construct a specific data flow, creating a visual model of the analytical portion of their research. Kepler represents the overall workflow visually so that it is easy to understand how data flow from one component to another. The resulting workflow can be saved in a text format, emailed to colleagues, and/or published for sharing with colleagues worldwide.
1.2. What are Scientific Workflows?
Scientific workflows are a flexible tool for accessing scientific data (metagenomics, streaming sensor data, medical and satellite images, simulation output, observational data, etc.) and executing complex analyses on the retrieved data.
Each workflow consists of analytical steps that may involve database access and querying, data analysis and mining, and intensive computations performed on high performance cluster computers. Each workflow step is represented by an "actor", which is a processing component that can be dragged and dropped into a workflow via Kepler's visual interface. Connected actors (and a few other components that we'll discuss in later sections) form a workflow, allowing scientists to inspect and display data on the fly as it is computed, make parameter changes as necessary, and re-run and reproduce experimental results.
2. Basic Components in Kepler
Scientific workflows consist of customizable components--directors, actors, and parameters--as well as relations and ports, which facilitate communication among the components.
2.1. Director and Actors
Kepler uses a director/actor metaphor to visually represent the various components of a workflow. A director controls (or directs) the execution of a workflow, just as a film director oversees a cast and crew. The actors take their execution instructions from the director. In other words, actors specify what processing occurs while the director specifies when it occurs.
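The division of labor between actors and a director can be sketched in a few lines of plain Python. This is only an illustration of the metaphor, not Kepler code: the class names Actor and SequentialDirector and the two example actions are hypothetical.

```python
from typing import Callable, List


class Actor:
    """A processing step: it defines WHAT happens when it is fired."""

    def __init__(self, name: str, action: Callable[[List[str]], None]):
        self.name = name
        self.action = action

    def fire(self, log: List[str]) -> None:
        self.action(log)


class SequentialDirector:
    """Decides WHEN actors fire: in list order, a fixed number of times."""

    def __init__(self, iterations: int):
        self.iterations = iterations

    def run(self, actors: List[Actor], log: List[str]) -> None:
        for _ in range(self.iterations):
            for actor in actors:
                actor.fire(log)


log: List[str] = []
actors = [
    Actor("read", lambda out: out.append("read data")),
    Actor("analyze", lambda out: out.append("analyze data")),
]
# The same actors could be scheduled differently by a different director.
SequentialDirector(iterations=2).run(actors, log)
print(log)
```

The actors never decide their own order or repetition count; swapping the director changes the schedule without touching the actors.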
Every workflow must have a director that controls the execution of the workflow using a particular model of computation. Each model of computation in Kepler is represented by its own director. For example, workflow execution can be synchronous, with processing
Composite actors are collections or sets of actors bundled together to perform more complex operations. Composite actors can be used in workflows, where they essentially act as nested workflows or sub-workflows (see figure 2.1). An entire workflow can be represented as a composite actor and included as a component within an encapsulating workflow. In more complex workflows, it is possible to have different directors at different levels. Kepler provides a large set of actors for creating and editing scientific workflows. Actors can be added to Kepler for an individual's exclusive use and/or can be made available to others.
Figure 2.1: Schematic representation of the nested workflow (composite actor).
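The nesting idea can be illustrated with a minimal Python sketch. It is not Kepler code: CompositeActor, add_one, and double are hypothetical names, and each "actor" is reduced to a function from one number to another.

```python
from typing import Callable, List

Step = Callable[[float], float]


class CompositeActor:
    """Bundles several steps and exposes them as a single step."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def __call__(self, value: float) -> float:
        # Pass the value through each inner step in order.
        for step in self.steps:
            value = step(value)
        return value


def add_one(x: float) -> float:
    return x + 1.0


def double(x: float) -> float:
    return x * 2.0


# The inner composite plays the role of a sub-workflow.
inner = CompositeActor([add_one, double])
# The outer workflow uses the composite exactly like any other actor.
outer = CompositeActor([inner, add_one])
result = outer(3.0)
print(result)
```

Because a composite has the same calling shape as a single step, it can be nested to any depth.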
Each actor in a workflow can contain one or more ports used to consume or produce data and communicate with other actors in the workflow. Actors are connected in a workflow via their ports. The link that represents data flow between one actor port and another actor port is called a channel. Ports are categorized into three types:
• input port – for data consumed by the actor;
• output port – for data produced by the actor; and
• input/output port – for data both consumed and produced by the actor.
Each port is configured to be either a "single" or a "multiple" port. A single input port can be connected to only a single channel, whereas a multiple input port can be connected to multiple channels. Single ports are designated with a dark triangle; multiple ports use a hollow triangle.
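The difference between single and multiple input ports can be sketched as a small Python class. This is an illustrative model only; InputPort, its multi flag, and the channel names are assumptions made for the example and are not part of Kepler.

```python
from typing import List


class InputPort:
    """Input port that accepts one channel, or many if multi is True."""

    def __init__(self, name: str, multi: bool = False):
        self.name = name
        self.multi = multi
        self.channels: List[str] = []

    def connect(self, channel: str) -> None:
        # A single port rejects a second channel; a multiple port accepts it.
        if not self.multi and self.channels:
            raise ValueError(f"port '{self.name}' accepts only one channel")
        self.channels.append(channel)


single = InputPort("input")
single.connect("channel A")

multiple = InputPort("inputs", multi=True)
multiple.connect("channel A")
multiple.connect("channel B")
print(len(single.channels), len(multiple.channels))
```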
Relations allow users to "branch" a data flow. Branched data can be sent to multiple places in the workflow. For example, a scientist might wish to direct the output of an operational actor to another operational actor for further processing, and to a display actor to display the data at that specific reference point. By placing a Relation in the output data channel, the user can direct the information to both places simultaneously.
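As a rough Python analogy, a Relation behaves like a fan-out point that hands every data token to each linked destination. The Relation class and the two receiving lists below are hypothetical and only model the branching described above.

```python
from typing import Callable, List


class Relation:
    """Fan-out point: every token sent is delivered to all linked receivers."""

    def __init__(self) -> None:
        self.receivers: List[Callable[[float], None]] = []

    def link(self, receiver: Callable[[float], None]) -> None:
        self.receivers.append(receiver)

    def send(self, token: float) -> None:
        for receiver in self.receivers:
            receiver(token)


further_processing: List[float] = []  # stands in for a second operational actor
display: List[float] = []             # stands in for a display actor

relation = Relation()
relation.link(further_processing.append)
relation.link(display.append)

# One upstream output reaches both destinations.
for token in (1.5, 2.5):
    relation.send(token)
print(further_processing, display)
```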
Parameters are configurable values that can be attached to a workflow or to individual directors or actors. For example, the Display actor has parameters called rowsDisplayed, columnsDisplayed, and title. The parameters rowsDisplayed and columnsDisplayed have default values of 10 and 40, respectively, while title is empty, as shown in figure 2.2. Alternatively, the user can configure the Display actor by supplying suitable values for these parameters.
Figure 2.2: Screenshot of the interface used for setting the 'Display' actor parameters.
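The idea of defaults that a user may override can be sketched in Python as follows. The parameter names and default values are the ones given for the Display actor above; the configure helper itself is a hypothetical illustration and not a Kepler API.

```python
from typing import Any, Dict

# Defaults as described for the Display actor in the text above.
DISPLAY_DEFAULTS: Dict[str, Any] = {
    "rowsDisplayed": 10,
    "columnsDisplayed": 40,
    "title": "",
}


def configure(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return the defaults with user-supplied values applied on top."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# Override two parameters and keep the default for the third.
settings = configure(DISPLAY_DEFAULTS, rowsDisplayed=20, title="Results")
print(settings)
```

Rejecting unknown names mirrors the fact that an actor exposes a fixed set of parameters.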
Unlike those of the Display actor, the parameters of most actors do not have default values and must be configured by the user. For example, the parameters of the SOAP Service actors are configured by supplying the URL of a WSDL file. The number of workflow iterations is controlled by director parameters and the relevant iteration criteria.
The next sections provide an overview of the interface and step-by-step examples of how to open, edit, and run different scientific workflows.
3. Kepler Interface
Scientific workflows are edited and built in Kepler's easily navigated, drag-and-drop interface. The major sections of the Kepler application window (Figure 3.1) consist of the following:
• Menu bar – provides access to all Kepler functions.
• Toolbar – provides access to the most commonly used Kepler functions.
• Components and Data Access area – consists of a Components tab and Data tab. Both tabs contain a search function and display the library of available components and/or search results.
• Workflow canvas – provides space for displaying and creating workflows.
• Navigation area – displays the full workflow. Click a section of the workflow displayed in the Navigation area to select and display that section on the workflow canvas.
Figure 3.1: Screenshot of the Kepler graphical user interface with its components annotated (annotation is not done yet).
3.1. The Toolbar
The Kepler toolbar contains the most commonly used Kepler functions, as shown in figure 3.2.
Figure 3.2: Annotated (annotation is to be added) toolbar components
The main sections of the toolbar include:
• Viewing – zoom in, reset, fit, and zoom out of the workflow on the Workflow canvas
• Run – run, pause, and stop the workflow without opening the Run window
• Ports – add single (black) or multi (white) input and output ports to workflows; add Relations to workflows
4. Basic Operations in Kepler
This section covers the basic operations in Kepler: opening and running an existing workflow, and some techniques for editing, designing, and creating your own workflows.
4.1. Creating a Basic Scientific Workflow
One of the strengths of Kepler is the ability to design, create, and save your own executable workflows. The general steps in creating a workflow are as follows:
1. Create a conceptual (paper or other medium) model of your scientific workflow.
2. Open the Kepler application.
3. Map the data and actor components available in Kepler to your conceptual model.
4. Select a director for your workflow and drag it to the Workflow canvas. For more information about choosing a director, please see Chapter 5 of the Kepler User Manual.
5. Drag the desired workflow components to the Workflow canvas.
6. Connect the workflow components.
7. Save the workflow.
The examples in this section illustrate how to begin to create your own workflows. The first example is the classic 'Hello World' workflow that demonstrates how easy it is to create a functioning workflow in Kepler. The second example is more practical and shows how to use your desktop data in a workflow.
4.1.1. Example 1: Creating a 'Hello World' Workflow
To create the 'Hello World' workflow, begin by thinking about the type of data used (e.g., text or string data); the type of output desired (e.g., textual or image display); and the type of director needed to execute this model (e.g., synchronous or parallel). The 'Hello World' workflow requires a String Constant actor, a text display actor, and an SDF director. With an SDF director, the data will flow through the actors in the order defined by the workflow, and by default the workflow will run continuously.
1. Open Kepler. A blank Workflow canvas will open.
2. In the Components and Data Access area, select the Components tab, then navigate to the "/Components/Director/" directory.
3. Drag the SDF Director to the top of the Workflow canvas. Double-click it and change the iterations value from 0 to 1 in the dialog box, as shown in figure 4.1.
Figure 4.1: Screenshot of the interface for setting the SDF director's parameters.
4. In the Components tab, search for "String Constant" and select the String Constant actor.
5. Drag the String Constant actor onto the Workflow canvas and place it a little below the SDF Director. Configure the String Constant actor by right-clicking the actor and selecting Configure Actor from the menu (see figure 4.2).
Figure 4.2: Screenshot of actor configuration popup menu.
Figure 4.3: Screenshot showing the editable parameters of the 'String Constant'.
6. In the "Edit parameters for String Constant" dialog window (figure 4.3), type Hello World (without quotes) in the value field, set firingCountLimit to 1, and click Commit to save your changes. Hello World is a string value. Although string values entered as expressions in Kepler must be surrounded by quotes, the value field of the String Constant actor takes the text as typed, so no quotes are needed here.
7. In the Components and Data Access area, search for "Display" and select the Display actor found under "Textual Output."
8. Drag the Display actor to the Workflow canvas.
9. Connect the output port of the String Constant actor to the input port of the Display actor (figure 4.4).
Figure 4.4: Screenshot of the 'Hello World' workflow.
10. Run the workflow (Figure 4.5).
Figure 4.5: Screenshot of the Display actor upon execution of the 'Hello World' workflow.
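For readers who think in code, the data flow built in the steps above can be mimicked in plain Python. This is only an analogy under simplifying assumptions, not Kepler or Ptolemy code: string_constant, run_workflow, and firing_count_limit are hypothetical stand-ins for the String Constant actor, the SDF Director with its iterations value, and the firingCountLimit parameter.

```python
from typing import Iterator, List


def string_constant(value: str, firing_count_limit: int) -> Iterator[str]:
    """Stand-in for the String Constant actor: emits value a limited number of times."""
    for _ in range(firing_count_limit):
        yield value


def run_workflow(iterations: int, source: Iterator[str]) -> List[str]:
    """Stand-in for the director: pulls one token per iteration into the display."""
    displayed: List[str] = []
    for _ in range(iterations):
        token = next(source, None)
        if token is None:  # the source has reached its firing limit
            break
        displayed.append(token)
    return displayed


# iterations = 1 and a firing limit of 1, as configured in the steps above.
shown = run_workflow(1, string_constant("Hello World", 1))
print("\n".join(shown))
```

With both limits set to 1, the text is displayed once, which corresponds to the single line shown in figure 4.5.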