Project

General

Profile

Creating a D-Net Workflow » History » Version 16

Claudio Atzori, 09/02/2015 11:21 AM

1 1 Alessia Bardi
h1. Creating a D-Net Workflow
2
3
Typologies of workflows:
4
* collection workflows: operate the systematic collection of metadata or files from data sources;
5
* processing workflows: implement the sequence of steps needed to stage collected metadata (or files) and form the target Information Spaces
6
7 16 Claudio Atzori
h2. Prerequisites
8 1 Alessia Bardi
9 16 Claudio Atzori
In order to implement workflow nodes you need to depend on the maven artifact @dnet-msro-service@ providing the building blocks. Currently the latest release is @[3.0.0]@, and this module is work in progress.
10 1 Alessia Bardi
11 16 Claudio Atzori
<pre>
12
<dependency>
13
	<groupId>eu.dnetlib</groupId>
14
	<artifactId>dnet-msro-service</artifactId>
15
	<version>[3.0.0,4.0.0)</version>
16
</dependency>
17
</pre>
18 1 Alessia Bardi
19 16 Claudio Atzori
h2. Definition of workflow
20
21 1 Alessia Bardi
The definition of a processing workflow basically consists in the creation of (at least) 2 D-Net profiles:
22 5 Alessia Bardi
# One (or more) [[Creating a D-Net Workflow#Workflow definition|workflow]] (@WorkflowDSResourceType@) : it defines the sequence of steps to execute (in sequence or in parallel).
23
# One [[Creating a D-Net Workflow#Meta-Workflow definition|meta-workflow]] (@MetaWorkflowDSResourceType@): it describes a set of related workflows that can run in parallel or in sequence to accomplish a high-level functionality. 
24 1 Alessia Bardi
#* For example, in order to export the Information Space via OAI-PMH we have first to feed the OAI-PMH back-end (workflow 1: "OAI store feed"), then we can configure the back-end with the needed indexes (workflow 2: "OAI Post feed").
25 3 Alessia Bardi
 
26 1 Alessia Bardi
h3. Workflow definition
27
28
Let's see a simple example of a workflow with only one step that performs a sleep:
29
<pre>
30
<?xml version="1.0" encoding="UTF-8"?>
31
<RESOURCE_PROFILE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
32
    <HEADER>
33
        <RESOURCE_IDENTIFIER value="wfUUID_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl"/>
34
        <RESOURCE_TYPE value="WorkflowDSResourceType"/>
35
        <RESOURCE_KIND value="WorkflowDSResources"/>
36
        <RESOURCE_URI value=""/>
37
        <DATE_OF_CREATION value="2015-02-06T11:13:51.0Z"/>
38
    </HEADER>
39
    <BODY>
40
        <WORKFLOW_NAME>Sleeping Beauty</WORKFLOW_NAME>
41
        <WORKFLOW_TYPE>Tutorial</WORKFLOW_TYPE>
42
        <WORKFLOW_PRIORITY>30</WORKFLOW_PRIORITY>
43
        <CONFIGURATION start="manual">
44
            <NODE name="sleep" isStart="true" type="Sleep" >
45
                <DESCRIPTION>Sleep for a while</DESCRIPTION>
46
                <PARAMETERS>
47
                    <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
48
                </PARAMETERS>
49
                <ARCS>
50
                    <ARC to="success"/>
51
                </ARCS>
52
            </NODE>
53
        </CONFIGURATION>
54
        <STATUS/>
55
    </BODY>
56
</RESOURCE_PROFILE>
57
</pre> 
58
59
The @CONFIGURATION@ element of the profile tells us that the workflow is composed of one step.
60
The step is defined by the @NODE@ element:
61
* @name="sleep"@: the name that will appear in the UI when showing the workflow
62
* @isStart="true"@:  marks the node as starting node. Each workflow must have at least one start node
63
* @type="Sleep"@: the type of the node. Through this value we refer to a Java bean with name @wfNodeSleep@, whose class implements the business logic. Therefore we expect a bean like the following to be defined in an application-context XML:
64
<pre>
65
<bean id="wfNodeSleep" class="eu.dnetlib.msro.tutorial.SleepJobNode" scope="prototype"/>
66
</pre>
67
Note that the attribute @scope="prototype"@ is important because it tells the Spring framework to create a new bean instance of the object every time a request for that specific bean is made. 
68
We'll see how the actual class @SleepJobNode@ in a while.
69
* @DESCRIPTION@: brief description of the logics of the node. It will appear in the UI when showing the workflow
70
* @PARAMETERS@: parameters needed by the node. A parameter (@PARAM@), has the follwoing attributes:
71
** @name@ and @type@: name and type of the param. The Java class must define a property (accessible via public getter/setter) with the same name and type.
72
** @required@: tells if the param is required or not. If it is, then the workflow can be run only if the param is set.
73
** @managedBy@: who's in charge of setting the value. Allowed values are:
74
*** @user@: the UI will enable user to set the value
75
*** @system@: the UI will not enable the user to set the value, as the system is expected to set the parameter automatically (more on this in a while)
76
*** More attributes are available. TODO: add other optional attributes for parameter (e.g. @function@)
77
* @ARCS@: which node must be executed after this node. In the example, we simply go to the @success@ node, a pre-defined node that completes the workflow successfully. If there is an exception in one node, the pre-defined @failure@ node is executed automatically. 
78
79 16 Claudio Atzori
h3. JobNode types
80 1 Alessia Bardi
81 16 Claudio Atzori
The D-Net workflow engine is based on "Sarasvati":https://code.google.com/p/sarasvati and offers three JobNode super classes to choose from, depending on the specific workflow requirements. Every jobNode extends the Sarasvati class 
82 1 Alessia Bardi
83 16 Claudio Atzori
<pre>com.googlecode.sarasvati.mem.MemNode</pre>
84
85
and defines the following abstract method to override
86
87
<pre>abstract protected String execute(final NodeToken token) throws Exception;</pre>
88
89 1 Alessia Bardi
that must return the name of the arc to follow. 
90
Through this mechanism you can guide the workflow to execute a path or another depending on what's happening during the execution of the current step.
91
92 16 Claudio Atzori
The D-Net workflow engine offers three JobNode super classes to choose from, depending on the specific workflow requirements:
93
94
<pre>eu.dnetlib.msro.workflows.nodes.SimpleJobNode</pre>
95
96
Basic workflow node implementation. The execution is synchronous, and it runs the @execute@ method in the main thread. The use of this kind of job nodes is suited for short lasting synchronous service calls, e.g. isLookup, mapped resultsets. This node is synchronous with sarasvati, which won't be able to execute in parallel more than one instance of this nodes at the same time. To overcome this limitation, use the following node type.
97
98
<pre>eu.dnetlib.msro.workflows.nodes.AsyncJobNode</pre>
99
100
The AsyncJob nodes delegates the execution of the @execute@ method in a separated thread. This implies that the jobNode is decoupled from sarasvati, allowing parallel execution of different AsyncJobNodes at the same time. The workflow doesn't advance until the delegated thread running the @execute@ method doesn't complete.
101
102
<pre>eu.dnetlib.msro.workflows.nodes.BlackboardJobNode</pre>
103
104
The BlackboardJobNode allows to send blackboard messages to D-Net services implementing the blackboard protocol, and differently from the previous two, exposes a different signature:
105
106
<pre>
107
abstract protected String obtainServiceId(NodeToken token);
108
109
abstract protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception;
110
</pre>
111
112
h3. JobNode examples
113
114
Now let's see how the business logic of the node is implemented in the Java class @eu.dnetlib.msro.tutorial.SleepJobNode@:
115
116
Let's suppose we want to: 
117 1 Alessia Bardi
* sleep and go to success if sleepTimeMs < 2000 
118
* sleep and go to failure if sleepTimeMs >= 2000
119
120
<pre>
121
public class SleepJobNode extends SimpleJobNode{
122
   private int sleepTimeMs;
123
   
124
   @Override
125
   protected String execute(final NodeToken token) throws Exception {
126
      //removed exception handling for readability
127
      Thread.sleep(sleepTimeMs);
128 16 Claudio Atzori
      if(sleepTimeMs < 2000) 
129
         return "ok";
130
      else 
131
         return "oops";
132 1 Alessia Bardi
   }
133
134
   public int getSleepTimeMs(){ return sleepTimeMs;}
135
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
136
}
137
</pre>
138
139
and the workflow should be adapted to consider the two different arcs: "ok" and "oops"
140
<pre>
141
<NODE name="sleep" isStart="true" type="Sleep" >
142
   <DESCRIPTION>Sleep for a while</DESCRIPTION>
143
   <PARAMETERS>
144
      <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
145
   </PARAMETERS>
146
   <ARCS>
147 6 Alessia Bardi
      <ARC name="ok" to="success"/>
148
      <ARC name="oops" to="failure"/>
149
   </ARCS>
150 1 Alessia Bardi
</NODE>
151
</pre>
152
153 6 Alessia Bardi
h4. Extending BlackboardJobNode
154 1 Alessia Bardi
155
D-Net services can implement the BlackBoard (BB) protocol, which allows them to receive messages.
156 6 Alessia Bardi
If a message is known to a service, the service will execute the corresponding action and notify the caller about its execution status by updating the blackboard in the service profile. The management of reading/writing BB messages is implemented in the @BlackboardJobNode@ class, so that you only have to:
157 1 Alessia Bardi
* Specify the profile id of the service you want to call by overriding the method
158 6 Alessia Bardi
<pre>abstract protected String obtainServiceId(NodeToken token);</pre>
159 1 Alessia Bardi
* Prepare the BB message by overriding the method 
160
<pre>abstract protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception;</pre>
161
162
Let's suppose we have a service accepting the BB message "SLEEP" that requires a parameter named @sleepTimeMs@.
163
Our @SleepJobNode@ class becomes:
164
<pre>
165
public class SleepJobNode extends BlackboardJobNode{
166
   private String xqueryForServiceId;
167
168
   private int sleepTimeMs;
169
170
   @Override
171
   protected String obtainServiceId(final NodeToken token) {
172 16 Claudio Atzori
       return getServiceLocator().getService(ISLookUpService.class).getResourceProfileByQuery(xqueryForServiceId);
173 1 Alessia Bardi
   }   
174
175
   @Override
176
   protected void prepareJob(final BlackboardJob job, final NodeToken token) throws Exception {
177
      job.setAction("SLEEP");
178
      job.getParameters().put("sleepTimeMs", sleepTimeMs);
179
   }
180
181
   public int getSleepTimeMs(){ return sleepTimeMs;}
182
   public void setSleepTimeMs(int time){ sleepTimeMs = time;}
183
   public String getXqueryForServiceId() { return xqueryForServiceId;}
184
   public void setXqueryForServiceId(final String xqueryForServiceId) { this.xqueryForServiceId = xqueryForServiceId;}
185
}
186
</pre>
187
188
And the workflow node must declare a new parameter for the @xqueryForServiceId@:
189
<pre>
190
<NODE name="sleep" isStart="true" type="Sleep">
191
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
192
    <PARAMETERS>
193
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
194
        <PARAM name="xqueryForServiceId" managedBy="user" type="string" required="true">/RESOURCE_PROFILE[.//RESOURCE_TYPE/@value='SleepServiceResourceType']/HEADER/RESOURCE_IDENTIFIER/@value/string()</PARAM>
195
    </PARAMETERS>
196
    <ARCS>
197
        <ARC to="success"/>
198
    </ARCS>
199
</NODE>
200
</pre>
201
202
But what if one of your params shuld not be set by an end-user but it should rather be picked from the workflow env?
203
Let's suppose this is the case for the xquery. Then the SleepJobNode class should expect to receive the name of the env parameter where to look for the actual xquery to use:
204
<pre>
205
public class SleepJobNode extends BlackboardJobNode{
206
   private String xqueryForServiceIdParam;
207
   . . . 
208
   @Override
209
   protected String obtainServiceId(final NodeToken token) {
210
       final String xquery = token.getEnv().getAttribute(getXqueryForServiceIdParam());
211 16 Claudio Atzori
       return getServiceLocator().getService(ISLookUpService.class).getResourceProfileByQuery(xquery);
212 1 Alessia Bardi
   }   
213
. . . 
214
</pre>
215
216
<pre>
217
<NODE name="sleep" isStart="true" type="Sleep">
218
    <DESCRIPTION>Sleep for a while</DESCRIPTION>
219
    <PARAMETERS>
220
        <PARAM name="sleepTimeMs" managedBy="user" type="int" required="true">1000</PARAM>
221
        <PARAM managedBy="system" name="xqueryForServiceIdParam" required="true" type="string">xqueryForServiceId_in_env</PARAM>
222
    </PARAMETERS>
223
    <ARCS>
224
        <ARC to="success"/>
225
    </ARCS>
226
</NODE>
227
</pre>
228
229
h3. Special parameters
230
231
Note that in case of parameter name clashes, the order is @sysparams < envparams < params@, thus XML node params override params in the env, which in turns override sys params.
232
233
h4. envParams
234
235 2 Claudio Atzori
@envParams@ is a special parameter that allows you to map one value of the wf env to another parameter of the same wf env, allowing to wave parameters from node to node.
236 1 Alessia Bardi
This can be useful when different nodes of the same workflow want to use the same value but they are coded to read it from different env parameters.
237
Example:
238
<pre>
239
<PARAM required="true" type="string" name="envParams" managedBy="system">
240
  { 
241
     'path' : 'rottenRecordsPath'
242
  }
243
</PARAM>
244
</pre>
245 7 Alessia Bardi
The value of @envParams@ is a list of parameter mappings. In the example we have 1 mapping: the value in the env parameter named "rottenRecordsPath" is copied into another env parameter named "path".
246 1 Alessia Bardi
Given the above configuration, a node will be able to find the value it needs in the expected env parameter ("path").
247
248
h4. sysParams
249
250 8 Alessia Bardi
@sysParams@ is a special parameter that allows you to set a parameter with a value coming from a system property (those you set in *.properties files).
251 1 Alessia Bardi
Example:
252
<pre>
253
<PARAM required="true" type="string" name="sysParams" managedBy="system">
254
{ 	
255
   'hbase.mapred.inputtable' : 'hbase.mapred.datatable', 
256
   'hbase.mapreduce.inputtable' : 'hbase.mapred.datatable'
257
}
258
</PARAM>
259
</pre>
260
The value of @sysParams@ is a list of parameter mappings. In the example we have 2 mapping: the value of the system property "hbase.mapred.datatable" will be copied into two new parameters: "hbase.mapred.inputtable" and "hbase.mapreduce.inputtable".
261 3 Alessia Bardi
262 14 Alessia Bardi
h3. Re-using existing job node beans
263 13 Alessia Bardi
264
Often you will need generic nodes in your workflow that have already been implemented and used.
265
before you implement a new node for a generic feature, it is better to check here on "where is my bean":http://ci.research-infrastructures.eu/public/whereismybean/ and filter by "wfNode".
266 15 Alessia Bardi
267
"where is my bean":http://ci.research-infrastructures.eu/public/whereismybean/ will tell you which module and which application context defines the bean you need.
268
Consider that the service looks into the whole @dnet40/modules@ svn directory, hence you will usually find several entries for the same bean.
269
270
Typically, you'll need beans located in @dnet-msro-service@ and @dnet-*-workflows@ (e.g. @dnet-opnaireplus-workflows@ for the OpenAIRE project). 
271
272 13 Alessia Bardi
CNR suggests you to check out those modules from svn so you can quickly refer to node implementations, if you need it.
273 15 Alessia Bardi
274
275 13 Alessia Bardi
276 3 Alessia Bardi
h3. Meta-Workflow definition
277 9 Alessia Bardi
278
A meta-workflow implements a high-level functionality by composing workflows.
279
In the simplest scenario, one meta-workflow refers to only one workflow, as follows:
280
281
<pre>
282
<RESOURCE_PROFILE>
283
    <HEADER>
284
        <RESOURCE_IDENTIFIER
285
            value="metaWF-UUID_TWV0YVdvcmtmbG93RFNSZXNvdXJjZXMvTWV0YVdvcmtmbG93RFNSZXNvdXJjZVR5cGU="/>
286
        <RESOURCE_TYPE value="MetaWorkflowDSResourceType"/>
287
        <RESOURCE_KIND value="MetaWorkflowDSResources"/>
288
        <RESOURCE_URI value=""/>
289
        <DATE_OF_CREATION value="2006-05-04T18:13:51.0Z"/>
290
    </HEADER>
291
    <BODY>
292 10 Alessia Bardi
        <METAWORKFLOW_NAME family="tutorial">Sleeping</METAWORKFLOW_NAME>
293
        <METAWORKFLOW_DESCRIPTION>A sample meta-wf</METAWORKFLOW_DESCRIPTION>
294
        <METAWORKFLOW_SECTION>Tutorial</METAWORKFLOW_SECTION>
295 9 Alessia Bardi
        <ADMIN_EMAIL>wf-admin-1@wf.com,wf-admin-2@wf.com</ADMIN_EMAIL>
296
        <CONFIGURATION status="EXECUTABLE">
297
            <WORKFLOW
298 11 Alessia Bardi
                id="wf-profileID" name="Sleeping Beauty"/>
299 9 Alessia Bardi
        </CONFIGURATION>
300
        <SCHEDULING enabled="false">
301
            <CRON>29 5 22 ? * *</CRON>
302
            <MININTERVAL>10080</MININTERVAL>
303
        </SCHEDULING>
304
    </BODY>
305 1 Alessia Bardi
</RESOURCE_PROFILE>
306
</pre>
307
308
Let's have a look at the information in the @BODY@ element:
309 9 Alessia Bardi
* @METAWORKFLOW_NAME@, @family@, @METAWORKFLOW_DESCRIPTION@, and @METAWORKFLOW_SECTION@, : guide the UI for a correct visualization
310
* @ADMIN_EMAIL@: comma separated list of emails. Put here the emails of people that want to receive emails about the failure and completion of the workflows included in this meta-workflow.
311
* @WORKFLOW@: reference to an existing workflow
312
* @SCHEDULING@: 
313
** @enabled@: set to true if you want the meta-workflow to run automatically according to the given schedule
314
** @CRON@: cron expression to schedule the meta-workflow execution
315
** @MININTERVAL@: milliseconds (?) between two subsequent runs of the meta-workflow.
316
317
More complex meta-workflows may include different workflows running in sequence, in parallel, or a mix of them.
318
The data provision meta-workflow of OpenAIRE is a good example of a complex meta-workflow. It composes a total of 10 different workflows, where workflows @profileID2@, @profileID3@, @profileID4@, @profileID5@ runs in parallel. Note that the parallel/sequence composition is defined by the level of nesting of the @WORKFLOW@ XML element.
319
<pre>
320
<CONFIGURATION status="EXECUTABLE">
321
    <WORKFLOW id="profileID1" name="reset">
322 12 Alessia Bardi
        <WORKFLOW id="profileID2" name="db2hbase"/>
323 1 Alessia Bardi
        <WORKFLOW id="profileID3" name="oaf2hbase"/>
324 9 Alessia Bardi
        <WORKFLOW id="profileID4" name="odf2hbase"/>
325 11 Alessia Bardi
        <WORKFLOW id="profileID5" name="actions2hbase">
326
            <WORKFLOW id="profileID6" name="deduplication organizations">
327
                <WORKFLOW id="profileID7" name="deduplication persons">
328
                    <WORKFLOW id="profileID8" name="deduplication results">
329
                        <WORKFLOW id="profileID9" name="update provision">
330
                            <WORKFLOW id="profileID10" name="prepare pre-public info space"/>
331
                        </WORKFLOW>
332
                    </WORKFLOW>
333
                </WORKFLOW>
334
            </WORKFLOW>
335 9 Alessia Bardi
        </WORKFLOW>
336
    </WORKFLOW>
337
</CONFIGURATION>
338
</pre>
339 16 Claudio Atzori
340
h2. Workflow use case: metadata aggregation
341
342
COMING SOON