Project

General

Profile

Creating a D-Net Workflow » History » Version 4

Alessia Bardi, 06/02/2015 05:31 PM

1 1 Alessia Bardi
h1. Creating a D-Net Workflow
2
3
Typologies of workflows:
4
* collection workflows: operate the systematic collection of metadata or files from data sources;
5
* processing workflows: implement the sequence of steps needed to stage collected metadata (or files) and form the target Information Spaces
6
7
h2. Defining a collection workflow
8
9
COMING SOON
10
11
h2. Definition of a processing workflow
12
13
The definition of a processing workflow basically consists in the creation of (at least) 2 D-Net profiles:
14 3 Alessia Bardi
# One (or more) [[Creating a D-Net Workflow#Workflow definition | _workflow_]] (@WorkflowDSResourceType@) : it defines the sequence of steps to execute (in sequence or in parallel).
15 4 Alessia Bardi
# One [[Creating a D-Net Workflow#Meta-Workflow definition | _meta-workflow_]] (@MetaWorkflowDSResourceType@): it describes a set of related workflows that can run in parallel or in sequence to accomplish a high-level functionality. 
16 1 Alessia Bardi
#* For example, in order to export the Information Space via OAI-PMH we have first to feed the OAI-PMH back-end (workflow 1: "OAI store feed"), then we can configure the back-end with the needed indexes (workflow 2: "OAI Post feed").
17 3 Alessia Bardi
 
18 1 Alessia Bardi
h3. Workflow definition
19
20
Let's see a simple example of a workflow with only one step that performs a sleep:
21
<pre>
22
<?xml version="1.0" encoding="UTF-8"?>
23
<RESOURCE_PROFILE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
24
    <HEADER>
25
        <RESOURCE_IDENTIFIER value="wfUUID_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl"/>
26
        <RESOURCE_TYPE value="WorkflowDSResourceType"/>
27
        <RESOURCE_KIND value="WorkflowDSResources"/>
28
        <RESOURCE_URI value=""/>
29
        <DATE_OF_CREATION value="2015-02-06T11:13:51.0Z"/>
30
    </HEADER>
31
    <BODY>
32
        <WORKFLOW_NAME>Sleeping Beauty</WORKFLOW_NAME>
33
        <WORKFLOW_TYPE>Tutorial</WORKFLOW_TYPE>
34
        <WORKFLOW_PRIORITY>30</WORKFLOW_PRIORITY>
35
        <CONFIGURATION start="manual">
36
            <NODE name="sleep" isStart="true" type="Sleep" >
37
                <DESCRIPTION>Sleep for a while</DESCRIPTION>
38
                <PARAMETERS>
39
                    <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
40
                </PARAMETERS>
41
                <ARCS>
42
                    <ARC to="success"/>
43
                </ARCS>
44
            </NODE>
45
        </CONFIGURATION>
46
        <STATUS/>
47
    </BODY>
48
</RESOURCE_PROFILE>
49
</pre> 
50
51
The @CONFIGURATION@ element of the profile tells us that the workflow is composed of one step.
52
The step is defined by the @NODE@ element:
53
* @name="sleep"@: the name that will appear in the UI when showing the workflow
54
* @isStart="true"@:  marks the node as starting node. Each workflow must have at least one start node
55
* @type="Sleep"@: the type of the node. Through this value we refer to a Java bean with name @wfNodeSleep@, whose class implements the business logic. Therefore we expect a bean like the following to be defined in an application-context XML:
56
<pre>
57
<bean id="wfNodeSleep" class="eu.dnetlib.msro.tutorial.SleepJobNode" scope="prototype"/>
58
</pre>
59
Note that the attribute @scope="prototype"@ is important because it tells the Spring framework to create a new bean instance of the object every time a request for that specific bean is made. 
60
We'll see how the actual class @SleepJobNode@ in a while.
61
* @DESCRIPTION@: brief description of the logics of the node. It will appear in the UI when showing the workflow
62
* @PARAMETERS@: parameters needed by the node. A parameter (@PARAM@), has the follwoing attributes:
63
** @name@ and @type@: name and type of the param. The Java class must define a property (accessible via public getter/setter) with the same name and type.
64
** @required@: tells if the param is required or not. If it is, then the workflow can be run only if the param is set.
65
** @managedBy@: who's in charge of setting the value. Allowed values are:
66
*** @user@: the UI will enable user to set the value
67
*** @system@: the UI will not enable the user to set the value, as the system is expected to set the parameter automatically (more on this in a while)
68
*** More attributes are available. TODO: add other optional attributes for parameter (e.g. @function@)
69
* @ARCS@: which node must be executed after this node. In the example, we simply go to the @success@ node, a pre-defined node that completes the workflow successfully. If there is an exception in one node, the pre-defined @failure@ node is executed automatically. 
70
71
Now let's see how the business logic of the node is implemented in the Java class @eu.dnetlib.msro.tutorial.SleepJobNode@:
72
<pre>
73
public class SleepJobNode extends SimpleJobNode{
74
   private int sleepTimeMs;
75
   
76
   @Override
77
   protected String execute(final NodeToken token) throws Exception {
78
      //removed exception handling for readability
79
      Thread.sleep(sleepTimeMs);
80
      return Arc.DEFAULT_ARC;
81
   }
82
83
   public int getSleepTimeMs(){ return sleepTimeMs;}
84
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
85
}
86
</pre>
87
88
@SleepJobNode@ extends the class @eu.dnetlib.msro.workflows.nodes.SimpleJobNode@, which is a D-Net class extending Sarasvati @com.googlecode.sarasvati.mem.MemNode@. The method to override is
89
@abstract protected String execute(final NodeToken token) throws Exception;@ 
90
that must return the name of the arc to follow. 
91
Through this mechanism you can guide the workflow to execute a path or another depending on what's happening during the execution of the current step.
92
93
For example, let's suppose we want to: 
94
* sleep and go to success if sleepTimeMs < 2000 
95
* sleep and go to failure if sleepTimeMs >= 2000
96
97
Our class should change to something like:
98
<pre>
99
public class SleepJobNode extends SimpleJobNode{
100
   private int sleepTimeMs;
101
   
102
   @Override
103
   protected String execute(final NodeToken token) throws Exception {
104
      //removed exception handling for readability
105
      Thread.sleep(sleepTimeMs);
106
      if(sleepTimeMs < 2000) return "ok";
107
      else return "oops";
108
   }
109
110
   public int getSleepTimeMs(){ return sleepTimeMs;}
111
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
112
}
113
</pre>
114
115
and the workflow should be adapted to consider the two different arcs: "ok" and "oops"
116
<pre>
117
<NODE name="sleep" isStart="true" type="Sleep" >
118
   <DESCRIPTION>Sleep for a while</DESCRIPTION>
119
   <PARAMETERS>
120
      <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
121
   </PARAMETERS>
122
   <ARCS>
123
      <ARC name="ok" to="success"/>
124
      <ARC name="oops" to="failure"/>
125
   </ARCS>
126
</NODE>
127
</pre>
128
129
D-Net offers three JobNode super classes to choose from:
130
* @eu.dnetlib.msro.workflows.nodes.SimpleJobNode@: sync execution. It runs the @execute@ method in the main thread.
131
* @eu.dnetlib.msro.workflows.nodes.AsyncJobNode@: async execution. It runs the @execute@ method in a separate thread. 
132
* @eu.dnetlib.msro.workflows.nodes.BlackboardJobNode@: use this class when your node must perform a blackboard request to a D-Net service
133
134
h4. Extending BlackboardJobNode
135
136
D-Net services can implement the BlackBoard (BB) protocol that allows them to receive messages.
137
If a message is known to a service, the service will execute the corresponding action and notify the caller about its execution status by updating the blackboard in the service profile. The management of reading/writing BB messages is implemented in the @BlackboardJobNode@ class, so that you only have to:
138
* Specify the profile id of the service you want to call by overriding the method
139
@abstract protected String obtainServiceId(NodeToken token);@
140
* Prepare the BB message by overriding the method 
141
@abstract protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception;@
142
143
Let's suppose we have a service accepting the BB message "SLEEP" that requires a parameter named @sleepTimeMs@.
144
Our @SleepJobNode@ class becomes:
145
<pre>
146
public class SleepJobNode extends BlackboardJobNode{
147
   private String xqueryForServiceId;
148
149
   private int sleepTimeMs;
150
151
   @Override
152
   protected String obtainServiceId(final NodeToken token) {
153
       List<String> serviceIds = getServiceLocator().getService(ISLookUpService.class).quickSearchProfile(xqueryForServiceId);
154
       if (serviceIds.size() < 1) throw new RuntimeException("Service id not found using query: " + xqueryForServiceId);
155
       return serviceIds.get(0);
156
   }   
157
158
   @Override
159
   protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception {
160
      job.setAction("SLEEP");
161
      job.getParameters().put("sleepTimeMs", sleepTimeMs);
162
   }
163
164
   public int getSleepTimeMs(){ return sleepTimeMs;}
165
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
166
   public String getXqueryForServiceId() { return xqueryForServiceId;}
167
   public void setXqueryForServiceId(final String xqueryForServiceId) { this.xqueryForServiceId = xqueryForServiceId;}
168
}
169
</pre>
170
171
And the workflow node must declare a new parameter for the @xqueryForServiceId@:
172
<pre>
173
<NODE name="sleep" isStart="true" type="Sleep">
174
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
175
    <PARAMETERS>
176
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
177
        <PARAM name="xqueryForServiceId" managedBy="user" type="string" required="true">/RESOURCE_PROFILE[.//RESOURCE_TYPE/@value='SleepServiceResourceType']/HEADER/RESOURCE_IDENTIFIER/@value/string()</PARAM>
178
    </PARAMETERS>
179
    <ARCS>
180
        <ARC to="success"/>
181
    </ARCS>
182
</NODE>
183
</pre>
184
185
But what if one of your params shuld not be set by an end-user but it should rather be picked from the workflow env?
186
Let's suppose this is the case for the xquery. Then the SleepJobNode class should expect to receive the name of the env parameter where to look for the actual xquery to use:
187
<pre>
188
public class SleepJobNode extends BlackboardJobNode{
189
   private String xqueryForServiceIdParam;
190
   . . . 
191
   @Override
192
   protected String obtainServiceId(final NodeToken token) {
193
       final String xquery = token.getEnv().getAttribute(getXqueryForServiceIdParam());
194
       List<String> serviceIds = getServiceLocator().getService(ISLookUpService.class).quickSearchProfile(xquery);
195
       . . .
196
   }   
197
. . . 
198
</pre>
199
200
<pre>
201
<NODE name="sleep" isStart="true" type="Sleep">
202
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
203
    <PARAMETERS>
204
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
205
        <PARAM managedBy="system" name="xqueryForServiceIdParam" required="true" type="string">xqueryForServiceId_in_env</PARAM>
206
    </PARAMETERS>
207
    <ARCS>
208
        <ARC to="success"/>
209
    </ARCS>
210
</NODE>
211
</pre>
212
213
h3. Special parameters
214
215
Note that in case of parameter name clashes, the order is @sysparams < envparams < params@, thus XML node params override params in the env, which in turns override sys params.
216
217
h4. envParams
218
219 2 Claudio Atzori
@envParams@ is a special parameter that allows you to map one value of the wf env to another parameter of the same wf env, allowing to wave parameters from node to node.
220 1 Alessia Bardi
This can be useful when different nodes of the same workflow want to use the same value but they are coded to read it from different env parameters.
221
Example:
222
<pre>
223
<PARAM required="true" type="string" name="envParams" managedBy="system">
224
  { 
225
     'path' : 'rottenRecordsPath'
226
  }
227
</PARAM>
228
</pre>
229
The value of @envParams@ is a list of parameter mappings. In the example we have 1 mapping:
230
# Reads the env parameter named "rottenRecordsPath" and copy its value into another env parameter named "path"
231
Given the above configuration, a node will be able to find the value it needs in the expected env parameter ("path").
232
233
h4. sysParams
234
235
@sysParam@ is a special parameter that allows you to set a parameter with a value coming from a system property (those you set in *.properties files).
236
Example:
237
<pre>
238
<PARAM required="true" type="string" name="sysParams" managedBy="system">
239
{ 	
240
   'hbase.mapred.inputtable' : 'hbase.mapred.datatable', 
241
   'hbase.mapreduce.inputtable' : 'hbase.mapred.datatable'
242
}
243
</PARAM>
244
</pre>
245
The value of @sysParams@ is a list of parameter mappings. In the example we have 2 mapping: the value of the system property "hbase.mapred.datatable" will be copied into two new parameters: "hbase.mapred.inputtable" and "hbase.mapreduce.inputtable".
246 3 Alessia Bardi
247
h3. Meta-Workflow definition