Project

General

Profile

Creating a D-Net Workflow » History » Version 10

Alessia Bardi, 06/02/2015 06:03 PM

1 1 Alessia Bardi
h1. Creating a D-Net Workflow
2
3
Typologies of workflows:
4
* collection workflows: operate the systematic collection of metadata or files from data sources;
5
* processing workflows: implement the sequence of steps needed to stage collected metadata (or files) and form the target Information Spaces
6
7
h2. Defining a collection workflow
8
9
COMING SOON
10
11
h2. Definition of a processing workflow
12
13
The definition of a processing workflow basically consists in the creation of (at least) 2 D-Net profiles:
14 5 Alessia Bardi
# One (or more) [[Creating a D-Net Workflow#Workflow definition|workflow]] (@WorkflowDSResourceType@) : it defines the sequence of steps to execute (in sequence or in parallel).
15
# One [[Creating a D-Net Workflow#Meta-Workflow definition|meta-workflow]] (@MetaWorkflowDSResourceType@): it describes a set of related workflows that can run in parallel or in sequence to accomplish a high-level functionality. 
16 1 Alessia Bardi
#* For example, in order to export the Information Space via OAI-PMH we have first to feed the OAI-PMH back-end (workflow 1: "OAI store feed"), then we can configure the back-end with the needed indexes (workflow 2: "OAI Post feed").
17 3 Alessia Bardi
 
18 1 Alessia Bardi
h3. Workflow definition
19
20
Let's see a simple example of a workflow with only one step that performs a sleep:
21
<pre>
22
<?xml version="1.0" encoding="UTF-8"?>
23
<RESOURCE_PROFILE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
24
    <HEADER>
25
        <RESOURCE_IDENTIFIER value="wfUUID_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl"/>
26
        <RESOURCE_TYPE value="WorkflowDSResourceType"/>
27
        <RESOURCE_KIND value="WorkflowDSResources"/>
28
        <RESOURCE_URI value=""/>
29
        <DATE_OF_CREATION value="2015-02-06T11:13:51.0Z"/>
30
    </HEADER>
31
    <BODY>
32
        <WORKFLOW_NAME>Sleeping Beauty</WORKFLOW_NAME>
33
        <WORKFLOW_TYPE>Tutorial</WORKFLOW_TYPE>
34
        <WORKFLOW_PRIORITY>30</WORKFLOW_PRIORITY>
35
        <CONFIGURATION start="manual">
36
            <NODE name="sleep" isStart="true" type="Sleep" >
37
                <DESCRIPTION>Sleep for a while</DESCRIPTION>
38
                <PARAMETERS>
39
                    <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
40
                </PARAMETERS>
41
                <ARCS>
42
                    <ARC to="success"/>
43
                </ARCS>
44
            </NODE>
45
        </CONFIGURATION>
46
        <STATUS/>
47
    </BODY>
48
</RESOURCE_PROFILE>
49
</pre> 
50
51
The @CONFIGURATION@ element of the profile tells us that the workflow is composed of one step.
52
The step is defined by the @NODE@ element:
53
* @name="sleep"@: the name that will appear in the UI when showing the workflow
54
* @isStart="true"@:  marks the node as starting node. Each workflow must have at least one start node
55
* @type="Sleep"@: the type of the node. Through this value we refer to a Java bean with name @wfNodeSleep@, whose class implements the business logic. Therefore we expect a bean like the following to be defined in an application-context XML:
56
<pre>
57
<bean id="wfNodeSleep" class="eu.dnetlib.msro.tutorial.SleepJobNode" scope="prototype"/>
58
</pre>
59
Note that the attribute @scope="prototype"@ is important because it tells the Spring framework to create a new bean instance of the object every time a request for that specific bean is made. 
60
We'll see how the actual class @SleepJobNode@ in a while.
61
* @DESCRIPTION@: brief description of the logics of the node. It will appear in the UI when showing the workflow
62
* @PARAMETERS@: parameters needed by the node. A parameter (@PARAM@), has the follwoing attributes:
63
** @name@ and @type@: name and type of the param. The Java class must define a property (accessible via public getter/setter) with the same name and type.
64
** @required@: tells if the param is required or not. If it is, then the workflow can be run only if the param is set.
65
** @managedBy@: who's in charge of setting the value. Allowed values are:
66
*** @user@: the UI will enable user to set the value
67
*** @system@: the UI will not enable the user to set the value, as the system is expected to set the parameter automatically (more on this in a while)
68
*** More attributes are available. TODO: add other optional attributes for parameter (e.g. @function@)
69
* @ARCS@: which node must be executed after this node. In the example, we simply go to the @success@ node, a pre-defined node that completes the workflow successfully. If there is an exception in one node, the pre-defined @failure@ node is executed automatically. 
70
71
Now let's see how the business logic of the node is implemented in the Java class @eu.dnetlib.msro.tutorial.SleepJobNode@:
72
<pre>
73
public class SleepJobNode extends SimpleJobNode{
74
   private int sleepTimeMs;
75
   
76
   @Override
77
   protected String execute(final NodeToken token) throws Exception {
78
      //removed exception handling for readability
79
      Thread.sleep(sleepTimeMs);
80
      return Arc.DEFAULT_ARC;
81
   }
82
83
   public int getSleepTimeMs(){ return sleepTimeMs;}
84
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
85
}
86
</pre>
87
88 6 Alessia Bardi
@SleepJobNode@ extends the class <pre>eu.dnetlib.msro.workflows.nodes.SimpleJobNode</pre>, which is a D-Net class extending Sarasvati <pre>com.googlecode.sarasvati.mem.MemNode</pre>. The method to override is
89
<pre>abstract protected String execute(final NodeToken token) throws Exception;</pre> 
90 1 Alessia Bardi
that must return the name of the arc to follow. 
91
Through this mechanism you can guide the workflow to execute a path or another depending on what's happening during the execution of the current step.
92
93
For example, let's suppose we want to: 
94
* sleep and go to success if sleepTimeMs < 2000 
95
* sleep and go to failure if sleepTimeMs >= 2000
96
97
Our class should change to something like:
98
<pre>
99
public class SleepJobNode extends SimpleJobNode{
100
   private int sleepTimeMs;
101
   
102
   @Override
103
   protected String execute(final NodeToken token) throws Exception {
104
      //removed exception handling for readability
105
      Thread.sleep(sleepTimeMs);
106
      if(sleepTimeMs < 2000) return "ok";
107
      else return "oops";
108
   }
109
110
   public int getSleepTimeMs(){ return sleepTimeMs;}
111
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
112
}
113
</pre>
114
115
and the workflow should be adapted to consider the two different arcs: "ok" and "oops"
116
<pre>
117
<NODE name="sleep" isStart="true" type="Sleep" >
118
   <DESCRIPTION>Sleep for a while</DESCRIPTION>
119
   <PARAMETERS>
120
      <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
121
   </PARAMETERS>
122
   <ARCS>
123
      <ARC name="ok" to="success"/>
124
      <ARC name="oops" to="failure"/>
125
   </ARCS>
126
</NODE>
127
</pre>
128
129
D-Net offers three JobNode super classes to choose from:
130 6 Alessia Bardi
* <pre>eu.dnetlib.msro.workflows.nodes.SimpleJobNode</pre>: sync execution. It runs the @execute@ method in the main thread.
131
* <pre>eu.dnetlib.msro.workflows.nodes.AsyncJobNode</pre>: async execution. It runs the @execute@ method in a separate thread. 
132
* <pre>eu.dnetlib.msro.workflows.nodes.BlackboardJobNode</pre>: use this class when your node must perform a blackboard request to a D-Net service
133 1 Alessia Bardi
134
h4. Extending BlackboardJobNode
135
136 6 Alessia Bardi
D-Net services can implement the BlackBoard (BB) protocol, which allows them to receive messages.
137 1 Alessia Bardi
If a message is known to a service, the service will execute the corresponding action and notify the caller about its execution status by updating the blackboard in the service profile. The management of reading/writing BB messages is implemented in the @BlackboardJobNode@ class, so that you only have to:
138
* Specify the profile id of the service you want to call by overriding the method
139 6 Alessia Bardi
<pre>abstract protected String obtainServiceId(NodeToken token);</pre>
140 1 Alessia Bardi
* Prepare the BB message by overriding the method 
141 6 Alessia Bardi
<pre>abstract protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception;</pre>
142 1 Alessia Bardi
143
Let's suppose we have a service accepting the BB message "SLEEP" that requires a parameter named @sleepTimeMs@.
144
Our @SleepJobNode@ class becomes:
145
<pre>
146
public class SleepJobNode extends BlackboardJobNode{
147
   private String xqueryForServiceId;
148
149
   private int sleepTimeMs;
150
151
   @Override
152
   protected String obtainServiceId(final NodeToken token) {
153
       List<String> serviceIds = getServiceLocator().getService(ISLookUpService.class).quickSearchProfile(xqueryForServiceId);
154
       if (serviceIds.size() < 1) throw new RuntimeException("Service id not found using query: " + xqueryForServiceId);
155
       return serviceIds.get(0);
156
   }   
157
158
   @Override
159
   protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception {
160
      job.setAction("SLEEP");
161
      job.getParameters().put("sleepTimeMs", sleepTimeMs);
162
   }
163
164
   public int getSleepTimeMs(){ return sleepTimeMs;}
165
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
166
   public String getXqueryForServiceId() { return xqueryForServiceId;}
167
   public void setXqueryForServiceId(final String xqueryForServiceId) { this.xqueryForServiceId = xqueryForServiceId;}
168
}
169
</pre>
170
171
And the workflow node must declare a new parameter for the @xqueryForServiceId@:
172
<pre>
173
<NODE name="sleep" isStart="true" type="Sleep">
174
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
175
    <PARAMETERS>
176
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
177
        <PARAM name="xqueryForServiceId" managedBy="user" type="string" required="true">/RESOURCE_PROFILE[.//RESOURCE_TYPE/@value='SleepServiceResourceType']/HEADER/RESOURCE_IDENTIFIER/@value/string()</PARAM>
178
    </PARAMETERS>
179
    <ARCS>
180
        <ARC to="success"/>
181
    </ARCS>
182
</NODE>
183
</pre>
184
185
But what if one of your params shuld not be set by an end-user but it should rather be picked from the workflow env?
186
Let's suppose this is the case for the xquery. Then the SleepJobNode class should expect to receive the name of the env parameter where to look for the actual xquery to use:
187
<pre>
188
public class SleepJobNode extends BlackboardJobNode{
189
   private String xqueryForServiceIdParam;
190
   . . . 
191
   @Override
192
   protected String obtainServiceId(final NodeToken token) {
193
       final String xquery = token.getEnv().getAttribute(getXqueryForServiceIdParam());
194
       List<String> serviceIds = getServiceLocator().getService(ISLookUpService.class).quickSearchProfile(xquery);
195
       . . .
196
   }   
197
. . . 
198
</pre>
199
200
<pre>
201
<NODE name="sleep" isStart="true" type="Sleep">
202
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
203
    <PARAMETERS>
204
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
205
        <PARAM managedBy="system" name="xqueryForServiceIdParam" required="true" type="string">xqueryForServiceId_in_env</PARAM>
206
    </PARAMETERS>
207
    <ARCS>
208
        <ARC to="success"/>
209
    </ARCS>
210
</NODE>
211
</pre>
212
213
h3. Special parameters
214
215
Note that in case of parameter name clashes, the order is @sysparams < envparams < params@, thus XML node params override params in the env, which in turns override sys params.
216
217
h4. envParams
218
219 2 Claudio Atzori
@envParams@ is a special parameter that allows you to map one value of the wf env to another parameter of the same wf env, allowing to wave parameters from node to node.
220 1 Alessia Bardi
This can be useful when different nodes of the same workflow want to use the same value but they are coded to read it from different env parameters.
221
Example:
222
<pre>
223
<PARAM required="true" type="string" name="envParams" managedBy="system">
224
  { 
225
     'path' : 'rottenRecordsPath'
226
  }
227
</PARAM>
228
</pre>
229 7 Alessia Bardi
The value of @envParams@ is a list of parameter mappings. In the example we have 1 mapping: the value in the env parameter named "rottenRecordsPath" is copied into another env parameter named "path".
230 1 Alessia Bardi
Given the above configuration, a node will be able to find the value it needs in the expected env parameter ("path").
231
232
h4. sysParams
233
234 8 Alessia Bardi
@sysParams@ is a special parameter that allows you to set a parameter with a value coming from a system property (those you set in *.properties files).
235 1 Alessia Bardi
Example:
236
<pre>
237
<PARAM required="true" type="string" name="sysParams" managedBy="system">
238
{ 	
239
   'hbase.mapred.inputtable' : 'hbase.mapred.datatable', 
240
   'hbase.mapreduce.inputtable' : 'hbase.mapred.datatable'
241
}
242
</PARAM>
243
</pre>
244
The value of @sysParams@ is a list of parameter mappings. In the example we have 2 mapping: the value of the system property "hbase.mapred.datatable" will be copied into two new parameters: "hbase.mapred.inputtable" and "hbase.mapreduce.inputtable".
245 3 Alessia Bardi
246
h3. Meta-Workflow definition
247 9 Alessia Bardi
248
A meta-workflow implements a high-level functionality by composing workflows.
249
In the simplest scenario, one meta-workflow refers to only one workflow, as follows:
250
251
<pre>
252
<RESOURCE_PROFILE>
253
    <HEADER>
254
        <RESOURCE_IDENTIFIER
255
            value="metaWF-UUID_TWV0YVdvcmtmbG93RFNSZXNvdXJjZXMvTWV0YVdvcmtmbG93RFNSZXNvdXJjZVR5cGU="/>
256
        <RESOURCE_TYPE value="MetaWorkflowDSResourceType"/>
257
        <RESOURCE_KIND value="MetaWorkflowDSResources"/>
258
        <RESOURCE_URI value=""/>
259
        <DATE_OF_CREATION value="2006-05-04T18:13:51.0Z"/>
260
    </HEADER>
261
    <BODY>
262 10 Alessia Bardi
        <METAWORKFLOW_NAME family="tutorial">Sleeping</METAWORKFLOW_NAME>
263
        <METAWORKFLOW_DESCRIPTION>A sample meta-wf</METAWORKFLOW_DESCRIPTION>
264
        <METAWORKFLOW_SECTION>Tutorial</METAWORKFLOW_SECTION>
265 9 Alessia Bardi
        <ADMIN_EMAIL>wf-admin-1@wf.com,wf-admin-2@wf.com</ADMIN_EMAIL>
266
        <CONFIGURATION status="EXECUTABLE">
267
            <WORKFLOW
268
                id="wf-UUID_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl"
269
                name="Sleeping Beauty"/>
270
        </CONFIGURATION>
271
        <SCHEDULING enabled="false">
272
            <CRON>29 5 22 ? * *</CRON>
273
            <MININTERVAL>10080</MININTERVAL>
274
        </SCHEDULING>
275
    </BODY>
276
</RESOURCE_PROFILE>
277
</pre>
278
279
Let's have a look at the information in the @BODY@ element:
280
* @METAWORKFLOW_NAME@, @family@, @METAWORKFLOW_DESCRIPTION@, and @METAWORKFLOW_SECTION@, : guide the UI for a correct visualization
281
* @ADMIN_EMAIL@: comma separated list of emails. Put here the emails of people that want to receive emails about the failure and completion of the workflows included in this meta-workflow.
282
* @WORKFLOW@: reference to an existing workflow
283
* @SCHEDULING@: 
284
** @enabled@: set to true if you want the meta-workflow to run automatically according to the given schedule
285
** @CRON@: cron expression to schedule the meta-workflow execution
286
** @MININTERVAL@: milliseconds (?) between two subsequent runs of the meta-workflow.
287
288
More complex meta-workflows may include different workflows running in sequence, in parallel, or a mix of them.
289
The data provision meta-workflow of OpenAIRE is a good example of a complex meta-workflow. It composes a total of 10 different workflows, where workflows @uuid2@, @uuid3@, @uuid4@, @uuid5@ runs in parallel. Note that the parallel/sequence composition is defined by the level of nesting of the @WORKFLOW@ XML element.
290
<pre>
291
<CONFIGURATION status="EXECUTABLE">
292
    <WORKFLOW id="uuid1_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="reset">
293
        <WORKFLOW id="uuid2_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="db2hbase"/>
294
        <WORKFLOW id="uuid3_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="oaf2hbase"/>
295
        <WORKFLOW id="uuid4_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="odf2hbase"/>
296
        <WORKFLOW id="uuid5_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="actions2hbase">
297
            <WORKFLOW id="uuid6_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="deduplication organizations">
298
                <WORKFLOW id="uuid7_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="deduplication persons">
299
                    <WORKFLOW id="uuid8_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="deduplication results">
300
                        <WORKFLOW id="uuid9_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="update provision">
301
                            <WORKFLOW id="uuid10_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl" name="prepare pre-public info space"/>
302
                        </WORKFLOW>
303
                    </WORKFLOW>
304
                </WORKFLOW>
305
            </WORKFLOW>
306
        </WORKFLOW>
307
    </WORKFLOW>
308
</CONFIGURATION>
309
</pre>