General notes
====================

    
Oozie-installer is a utility for building, uploading and running Oozie workflows. In practice, it creates a `*.tar.gz` package that contains the resources defining a workflow, along with some helper scripts. See the `icm-iis-core-examples` project for usage examples.

    
This module is automatically executed when running:

    
`mvn package -Poozie-package -Dworkflow.source.dir=classpath/to/parent/directory/of/oozie_app`

    
on a module that declares the following parent:

    
    <parent>
        <groupId>eu.dnetlib</groupId>
        <artifactId>icm-iis-parent-container</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    
in the `pom.xml` file. The `oozie-package` profile initializes Oozie workflow packaging, and the `workflow.source.dir` property points to the workflow (note: this is a classpath to the directory, not a relative filesystem path).
 
The outcome of this packaging is an `oozie-package.tar.gz` file containing all the resources required to run the Oozie workflow:

    
- jar packages
- workflow definitions
- job properties
- maintenance scripts
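
To check what ended up in the package, the archive can be listed without extracting it. The snippet below is only a minimal sketch: the `target/` location is an assumption based on Maven's default build directory, and `list_package` is a hypothetical helper name, not something oozie-installer provides.

```shell
# Hypothetical helper: list the members of a package archive, or report that it is missing.
list_package() {
    package="$1"
    if [ -f "$package" ]; then
        # -t lists the archive members, -z handles gzip compression.
        tar -tzf "$package"
    else
        echo "package not found: $package" >&2
        return 1
    fi
}

# Assumed location of the generated package; adjust to your build directory.
list_package "target/oozie-package.tar.gz" || true
```

If the packaging step has not been run yet, the helper prints a "package not found" message instead of failing the surrounding script.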

    
Required properties
====================

    
To include the proper workflow in the package, the `workflow.source.dir` property has to be set. It can be provided with the `-Dworkflow.source.dir=some/job/dir` Maven parameter.

    
Other placeholders used in shell scripts (`*.sh`), along with their default values defined in the `pom.xml` file:

    
    property name                 | default value
    ---------------------------------------------------
    iis.hadoop.frontend.host.name | localhost
    iis.hadoop.master.host.name   | localhost
    iis.hadoop.frontend.user.name | ${user.name} (Maven property holding the current user name)
    iis.hadoop.frontend.home.dir  | /mnt/tmp
    sandboxName                   | generated by a dedicated plugin, based on `workflow.source.dir`
    sandboxDir                    | /user/${iis.hadoop.frontend.user.name}/${sandboxName}
    workingDir                    | ${sandboxDir}/working_dir
    oozieAppDir                   | oozie_app
    oozieServiceLoc               | http://${iis.hadoop.master.host.name}:11000/oozie
	
This list can be supplemented with the `job.properties` default values defined in the `pom.xml` file:

    
    property name | default value
    ---------------------------------------------------
    nameNode      | hdfs://${iis.hadoop.master.host.name}:8020
    jobTracker    | ${iis.hadoop.master.host.name}:8021
    queueName     | default
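
To make the nesting of the defaults easier to follow, the sketch below expands them by hand with plain shell variables. It only illustrates how the placeholders compose; in a real build Maven performs the substitution. The user name `jdoe` and the sandbox name `example_workflow` are hypothetical values (the real `sandboxName` is generated by a plugin).

```shell
# Inputs. The first two values are illustrative assumptions.
frontend_user="jdoe"                 # stands in for iis.hadoop.frontend.user.name
sandbox_name="example_workflow"      # hypothetical; really generated from workflow.source.dir
master_host="localhost"              # default of iis.hadoop.master.host.name

# Derived values, following the default expressions listed in the tables.
sandbox_dir="/user/${frontend_user}/${sandbox_name}"
working_dir="${sandbox_dir}/working_dir"
oozie_service_loc="http://${master_host}:11000/oozie"
name_node="hdfs://${master_host}:8020"
job_tracker="${master_host}:8021"

# printf reuses the format string for each name/value pair.
printf '%s=%s\n' \
    sandboxDir "$sandbox_dir" \
    workingDir "$working_dir" \
    oozieServiceLoc "$oozie_service_loc" \
    nameNode "$name_node" \
    jobTracker "$job_tracker"
```

Changing `master_host` alone is enough to move `oozieServiceLoc`, `nameNode` and `jobTracker` to another machine, which is why the host names are the usual properties to override.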

    
All values will be overridden by the ones from `job.properties` and, if present, `job-override.properties` stored in the module's main folder. Values can also be provided as Maven command-line `-D` arguments.

    
To override properties from `job.properties`, create a `job-override.properties` file in the main module directory (the one containing the `pom.xml` file) and define there the properties that should replace the existing ones. Alternatively, these properties can be provided one by one as command-line arguments.

    
Properties are overridden in the following order:

    
1. `pom.xml` defined properties (located in the project root dir)
2. `~/.m2/settings.xml` defined properties
3. `${workflow.source.dir}/job.properties`
4. `job-override.properties` (located in the project root dir)
5. Maven command-line arguments (`-Dparam=value`)

    
where each source overrides the ones listed before it, so the Maven `-Dparam` value takes precedence over all the others.
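
The precedence rule can be illustrated with a small shell sketch that merges `key=value` files so that later files win. This is not how oozie-installer is implemented; `merge_properties` and the demo file names (other than `job.properties` and `job-override.properties`) are hypothetical, the `settings.xml` level is left out for brevity, and the last file merely stands in for `-D` arguments.

```shell
# Merge key=value files; a key found in a later file replaces the earlier value.
merge_properties() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        {
            eq = index($0, "=")
            if (eq == 0) next                    # ignore lines without a separator
            values[substr($0, 1, eq - 1)] = substr($0, eq + 1)
        }
        END { for (key in values) print key "=" values[key] }
    ' "$@" | sort
}

# Demo fixture in a scratch directory.
demo_dir="$(mktemp -d)"
printf 'queueName=default\nnameNode=hdfs://localhost:8020\n' > "$demo_dir/pom-defaults.properties"
printf 'queueName=workflow_queue\n' > "$demo_dir/job.properties"
printf 'queueName=override_queue\n' > "$demo_dir/job-override.properties"
printf 'nameNode=hdfs://example-master:8020\n' > "$demo_dir/cli.properties"

# Lowest precedence first, highest precedence last.
merge_properties \
    "$demo_dir/pom-defaults.properties" \
    "$demo_dir/job.properties" \
    "$demo_dir/job-override.properties" \
    "$demo_dir/cli.properties"
```

In this demo `queueName` resolves to `override_queue` (from `job-override.properties`) and `nameNode` to `hdfs://example-master:8020` (from the file standing in for the command line).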

    
Workflow definition requirements
====================

    
The `workflow.source.dir` property should point to the following directory structure:

    
    [${workflow.source.dir}]
        |
        |-job.properties (optional)
        |
        \-[oozie_app]
            |
            \-workflow.xml

    
This property can be set using the Maven `-D` switch.

    
`[oozie_app]` is the default directory name; however, it can be set to any value as long as the `oozieAppDir` property is provided with the directory name as its value.

    
Subworkflows are supported as well; subworkflow directories should be nested within the `[oozie_app]` directory.
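
The layout described in this section can be scaffolded with a few shell commands. The sketch below builds it in a scratch directory; the path `my/example/workflow`, the `queueName` entry and the placeholder `workflow.xml` (a workflow that starts and immediately ends) are illustrative assumptions, not content required by oozie-installer.

```shell
# Scratch location for the demo; in a real module the directory lives on the classpath.
base_dir="$(mktemp -d)"
workflow_source_dir="$base_dir/my/example/workflow"   # hypothetical path

mkdir -p "$workflow_source_dir/oozie_app"

# Optional job.properties next to the oozie_app directory.
cat > "$workflow_source_dir/job.properties" <<'EOF'
queueName=default
EOF

# Placeholder workflow definition; replace it with real actions.
cat > "$workflow_source_dir/oozie_app/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example-workflow">
    <start to="end"/>
    <end name="end"/>
</workflow-app>
EOF

# Show the resulting files.
find "$workflow_source_dir" -type f | sort
```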

    
Creating oozie installer step-by-step
=====================================

    
The automated oozie-installer steps are the following:

    
1. creating jar packages: `*.jar` and `*tests.jar`, along with copying all dependencies to `target/dependencies`
2. reading properties from Maven, `job.properties` and `job-override.properties`
3. invoking the priming mechanism linking resources from the `import.txt` file (currently resolving subworkflow resources)
4. assembling shell scripts for preparing the Hadoop filesystem, uploading the Oozie application and starting the workflow
5. copying the whole `${workflow.source.dir}` content to `target/${oozie.package.file.name}`
6. generating an updated `job.properties` file in `target/${oozie.package.file.name}` based on Maven, `job.properties` and `job-override.properties`
7. creating a lib directory (or multiple directories for subworkflows, one for each nested directory) and copying the jar packages created in step (1) to each of them
8. bundling the whole `${oozie.package.file.name}` directory into a single tar.gz package
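
Steps 5, 7 and 8 can be imitated by hand to see what the resulting archive looks like. The following is a rough sketch on dummy files, not the actual plugin logic: placing a `lib` directory next to every `workflow.xml` is an assumption about how "each nested directory" is interpreted, and the names `sub_a`, `example.jar` and `dep.jar` are made up.

```shell
work="$(mktemp -d)"
src="$work/workflow-src"        # stands in for ${workflow.source.dir}
pkg_name="oozie-package"        # stands in for ${oozie.package.file.name}
target="$work/target"

# Fixture: a main workflow, one nested subworkflow and two dummy jars.
mkdir -p "$src/oozie_app/sub_a" "$target/dependencies"
touch "$src/oozie_app/workflow.xml" "$src/oozie_app/sub_a/workflow.xml"
touch "$target/example.jar" "$target/dependencies/dep.jar"

# Step 5: copy the whole workflow source directory into the package directory.
mkdir -p "$target/$pkg_name"
cp -R "$src/." "$target/$pkg_name/"

# Step 7: create a lib directory next to every workflow.xml and copy the jars into it.
find "$target/$pkg_name" -name workflow.xml | while IFS= read -r wf; do
    lib_dir="$(dirname "$wf")/lib"
    mkdir -p "$lib_dir"
    cp "$target"/*.jar "$target"/dependencies/*.jar "$lib_dir/"
done

# Step 8: bundle the package directory into a single tar.gz archive and list it.
tar -czf "$target/$pkg_name.tar.gz" -C "$target" "$pkg_name"
tar -tzf "$target/$pkg_name.tar.gz" | sort
```

The listing shows the same jars duplicated under `oozie_app/lib` and `oozie_app/sub_a/lib`, which is the practical consequence of step 7 for workflows with subworkflows.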

    
Uploading oozie package and running workflow on cluster
=======================================================

    
To simplify the deployment and execution process, four dedicated profiles were introduced:

    
- deploy-local
- run-local
- deploy
- run

    
to be used along with the `oozie-package` profile, e.g. by providing the `-Poozie-package,deploy-local,run-local` or `-Poozie-package,deploy,run` Maven parameters.

    
The `deploy-local` profile supplements the packaging process with:
1) extracting the oozie package to the `target/local-upload` directory
2) uploading the oozie package content to the local Hadoop cluster

    
The `run-local` profile introduces:
1) executing the workflow uploaded to the HDFS cluster by the `deploy-local` profile

    
The `deploy` profile supplements the packaging process with:
1) uploading the oozie package via scp to the `/mnt/tmp/${user.name}/oozie-package-${timestamp}` directory on the `${iis.hadoop.frontend.host.name}` machine
2) extracting the uploaded package
3) uploading the oozie content to the Hadoop cluster

    
The `run` profile introduces:
1) executing the workflow uploaded to the HDFS cluster by the `deploy` profile
2) removing the uploaded files

    
Note: ssh access to the frontend machine has to be configured at the system level, and key-based authentication is preferable because it simplifies remote operations.
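
As a convenience, the two profile combinations can be wrapped in a small helper that prints the Maven command to run. This is a sketch under assumptions: `oozie_mvn_command` and its `local`/`remote` arguments are hypothetical names, the `oozie-package` profile name and the `package` goal are taken from the packaging command in the general notes, and the workflow path is a placeholder. The helper only prints the command so that it can be reviewed before being executed.

```shell
# Print the Maven command for a deployment target and a workflow classpath.
oozie_mvn_command() {
    target="$1"
    workflow_dir="$2"
    case "$target" in
        local)  profiles="oozie-package,deploy-local,run-local" ;;
        remote) profiles="oozie-package,deploy,run" ;;
        *)      echo "unknown target: $target" >&2; return 1 ;;
    esac
    echo "mvn package -P${profiles} -Dworkflow.source.dir=${workflow_dir}"
}

# Hypothetical workflow location; replace with a real classpath directory.
oozie_mvn_command local  "my/example/workflow"
oozie_mvn_command remote "my/example/workflow"
```

Dropping the `run-local` or `run` profile from the list should leave a deploy-only invocation, since each profile adds its own steps to the packaging process.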

    
Other tips
==========

    
It is good practice to define all Hadoop cluster related properties in the local `~/.m2/settings.xml` file.
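
A possible shape of such a settings file is sketched below. To avoid overwriting an existing configuration, the script writes the fragment to a scratch file that can then be merged into `~/.m2/settings.xml` by hand. The profile id `iis-cluster`, the host names and the user name are hypothetical; only the property names come from the tables in the required properties section.

```shell
# Write an example settings fragment to a scratch file (not to ~/.m2/settings.xml).
example_settings="$(mktemp -d)/settings-example.xml"

cat > "$example_settings" <<'EOF'
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile id and values; adjust to your cluster. -->
      <id>iis-cluster</id>
      <properties>
        <iis.hadoop.frontend.host.name>frontend.example.org</iis.hadoop.frontend.host.name>
        <iis.hadoop.master.host.name>master.example.org</iis.hadoop.master.host.name>
        <iis.hadoop.frontend.user.name>jdoe</iis.hadoop.frontend.user.name>
        <iis.hadoop.frontend.home.dir>/mnt/tmp</iis.hadoop.frontend.home.dir>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>iis-cluster</activeProfile>
  </activeProfiles>
</settings>
EOF

echo "wrote $example_settings"
```

Listing the profile under `activeProfiles` makes its properties available to every build without an extra `-P` switch.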