# D-Net Software Toolkit

This is a minimal instance of the D-Net software toolkit, a software framework for building aggregative data infrastructures.

Official website: http://www.d-net.research-infrastructures.eu/

Need support? Contact us via email at: dnet-team@isti.cnr.it

This webapp contains the minimal set of services needed to provide the following features:

- Collection of metadata records in oai_dc format via OAI-PMH, FTP, local file system, or HTTP.

- Transformation of the collected metadata records into an internal format named DMF (Driver Metadata Format).

- Indexing of DMF records in a Solr full-text index.

- OAI-PMH export of aggregated metadata records in DMF and oai_dc formats. More formats can be added at runtime by providing a dedicated XSLT from DMF to the desired target format.
# Installation requirements

This minimal instance can run on a single machine as a web application deployed in a Tomcat container.

## Hardware requirements

Suggested minimal hardware requirements:

- Operating system: almost anything but Windows
- Hard disk space: depends mostly on the number and size of the records you are going to collect. A couple of GB should be fine for a small repository (<10K metadata records). See also the note on MongoDB disk usage below.

## Software requirements

Software required:

* Apache Tomcat 7: the webapp container
* MongoDB >= 2.4: used to store the collected and transformed metadata records. Each collected record is stored in three separate "versions" (original, transformed, pmh-ready), so make sure enough disk space is available for MongoDB.
* Solr 4.9.x or 4.10.x: used to make the documents searchable. The Solr server should be started with the option <code>-DzkRun</code>, which instructs Solr to start its embedded ZooKeeper server.

Note that Tomcat, Solr and MongoDB can be installed on the same machine or on dedicated nodes, although the latter requires changing some default system properties.
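
Because each record is kept in three versions, a rough lower bound for the MongoDB data size is three times the raw size of the collected records. The sketch below only does that arithmetic: the function name and the average record size are illustrative assumptions, and the result ignores MongoDB's own storage overhead and indexes.

```shell
# Rough lower bound for MongoDB disk usage, in bytes.
# Assumption: each of the three stored versions is about as large as the original record.
estimate_mdstore_bytes() {
    records=$1           # number of metadata records to collect
    avg_record_bytes=$2  # average record size in bytes (hypothetical, measure your own data)
    echo $(( records * avg_record_bytes * 3 ))
}

# Example: 10,000 records of about 4 KB each
estimate_mdstore_bytes 10000 4096   # prints 122880000
```
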
# Running the D-Net web app with Maven

## Maven settings

Whether you want to run the D-Net web app with the Tomcat7 plugin for Maven or build the .war file to deploy on a running Tomcat, you need Maven 3 and you must add the following repository to your <code>settings.xml</code>:

```
<repository>
  <id>dnet-bootstrap-releases</id>
  <name>D-Net Bootstrap Releases</name>
  <url>http://maven.research-infrastructures.eu/nexus/content/repositories/dnet4-bootstrap-release/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
  <layout>default</layout>
</repository>
```

We also suggest adding the Tomcat plugin to the plugin groups at the bottom of the same file:

```
<pluginGroups>
    <pluginGroup>org.apache.tomcat.maven</pluginGroup>
</pluginGroups>
```
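
In a Maven <code>settings.xml</code>, repositories are declared inside a profile, and that profile has to be active for the repository to be used. As a sketch of how the two fragments above might fit together, the following writes a standalone settings file. The output path <code>settings-dnet-example.xml</code> and the profile id <code>dnet</code> are hypothetical names chosen for this example, so that an existing <code>~/.m2/settings.xml</code> is left untouched.

```shell
# Write a standalone Maven settings file combining the repository and the plugin group.
# SETTINGS_FILE and the profile id "dnet" are illustrative assumptions.
SETTINGS_FILE="${SETTINGS_FILE:-./settings-dnet-example.xml}"

cat > "$SETTINGS_FILE" <<'EOF'
<settings>
  <profiles>
    <profile>
      <id>dnet</id>
      <repositories>
        <repository>
          <id>dnet-bootstrap-releases</id>
          <name>D-Net Bootstrap Releases</name>
          <url>http://maven.research-infrastructures.eu/nexus/content/repositories/dnet4-bootstrap-release/</url>
          <releases>
            <enabled>true</enabled>
          </releases>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <layout>default</layout>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <!-- A profile only contributes its repositories when it is active -->
  <activeProfiles>
    <activeProfile>dnet</activeProfile>
  </activeProfiles>
  <pluginGroups>
    <pluginGroup>org.apache.tomcat.maven</pluginGroup>
  </pluginGroups>
</settings>
EOF

echo "Wrote $SETTINGS_FILE"
```

Maven can then be pointed at this file with its <code>-s</code> option, for example <code>mvn -s settings-dnet-example.xml package</code>.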
## Testing on a local machine

The D-Net software is developed in Java using Maven. You can try out the D-Net web app on your local machine with the tomcat7 plugin, provided that a MongoDB server and a Solr server are also running on localhost and listening on their standard ports.

Please note that the Solr client used in D-Net needs to interact with the ZooKeeper server. For simplicity we suggest using the embedded ZooKeeper instance provided with the Solr distribution. By default Solr listens on port 8983 and its embedded ZooKeeper server on port 9983.

To override properties, you can modify <code>dnet-basic-aggregator/src/main/resources/eu/dnetlib/cnr-site.properties</code>. Please check the section "D-Net configuration" and the PROPERTIES.md file for more information about D-Net properties.

```
> cd dnet-basic-aggregator
> mvn tomcat7:run
```

When you see a log line like:

```
52665 [Thread-7] INFO  eu.dnetlib.enabling.is.store.TestContentInitializerJob  - INITIALIZED
```

the webapp should be ready and running at http://localhost:8280/app , where 'app' is the value of the property <code>container.context</code> ('app' is the default).
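
If the application does not come up, one thing worth checking is whether the backing services are reachable. The following is a minimal sketch, assuming the <code>nc</code> (netcat) utility is installed and supports the <code>-z</code> and <code>-w</code> options. 27017 is MongoDB's standard port, while 8983 and 9983 are the Solr and ZooKeeper defaults mentioned above. The function name is illustrative.

```shell
# Return success if something accepts TCP connections on host:port.
# Assumption: a netcat variant that supports -z (connect only) and -w (timeout in seconds).
port_open() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

for entry in "MongoDB:27017" "Solr:8983" "ZooKeeper:9983"; do
    name=${entry%%:*}   # text before the colon
    port=${entry##*:}   # text after the colon
    if port_open localhost "$port"; then
        echo "$name is listening on port $port"
    else
        echo "$name is NOT reachable on port $port"
    fi
done
```
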
# Deployment on a Tomcat instance

In this distribution you will find a ready-to-deploy war package.

Copy the war file into the Tomcat 7 <code>webapps</code> directory, make sure you have overridden the properties as explained in the "D-Net configuration" section, and restart Tomcat.

When you see a log line like:

```
52665 [Thread-7] INFO  eu.dnetlib.enabling.is.store.TestContentInitializerJob  - INITIALIZED
```

the webapp should be ready and running at:

```
http://${container.hostname}:${container.port}/${container.context}
```
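
To check for the initialization log line from a script rather than by eye, something like the following could be used. It is only a sketch: the function name is illustrative and the log file location is an assumption (Tomcat commonly writes to <code>logs/catalina.out</code> under its base directory, but this depends on the installation).

```shell
# Succeeds if the given log file contains the D-Net initialization message.
dnet_initialized() {
    grep -q 'TestContentInitializerJob *- INITIALIZED' "$1" 2>/dev/null
}

# LOG_FILE is an assumed location; adjust it to your Tomcat installation.
LOG_FILE="${LOG_FILE:-/var/lib/tomcat7/logs/catalina.out}"

if dnet_initialized "$LOG_FILE"; then
    echo "D-Net is initialized"
else
    echo "D-Net is not initialized yet (or $LOG_FILE is not readable)"
fi
```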
If you want to build the web app yourself, then keep reading...

## Building the D-Net web app

The D-Net software is developed in Java with Maven.

To build the war to use in a Tomcat 7 web app container:

```
> cd dnet-basic-aggregator
> mvn package
```

The <code>.war</code> file is then created in the <code>target</code> directory.
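
Since <code>container.context</code> is the name of the war file, the built archive can be copied into Tomcat's <code>webapps</code> directory under the context name. A minimal sketch of that step, where the function name and the example paths are assumptions:

```shell
# Copy a built war into Tomcat's webapps directory, named after the web app context.
deploy_war() {
    war_file=$1     # the .war produced under target/ by "mvn package"
    tomcat_home=$2  # Tomcat home directory, e.g. /var/lib/tomcat7
    context=$3      # value of container.context
    cp "$war_file" "$tomcat_home/webapps/${context}.war"
}

# Hypothetical usage, to be followed by a Tomcat restart:
#   deploy_war target/<built-file>.war /var/lib/tomcat7 app
```
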
# D-Net configuration

Before you start the web application, you need to configure at least the following properties.
For the full list of available properties and their values, check PROPERTIES.md.

Create a file named <code>cnr.override.properties</code> in <code>$yourTomcatHomeDirectory$/common/classes</code> (<code>$yourTomcatHomeDirectory$</code> will likely be something similar to <code>/var/lib/tomcat7</code>) and set the properties below in it.

- <code>container.hostname</code>: the host name where the web app will be running. Default value is <code>localhost</code>. The default value should *only* be used in local development scenarios.
<br/>Example: <code>container.hostname = dnet-host.dnet.eu</code>
- <code>container.port</code>: the port where the web app will be running. Default is 8280.
<br/>Example: <code>container.port = 8080</code>
- <code>container.context</code>: the name of the web app (i.e. the name of the war file). Default is "app". The default value should *only* be used in local development scenarios.
<br/>Example: <code>container.context = is</code>
- <code>dnet.data.path</code>: path to the directory where all D-Net related resources will be saved. An embedded existDB will be automatically installed in this directory during the first start-up. The directory must be writable by the user running Tomcat. Default value is <code>/tmp/dnet</code>. The default value should *only* be used in local development scenarios.
<br/>Example: <code>dnet.data.path = /var/lib/dnet</code>
- <code>services.aggregator.country</code>: your country code. Default is <code>EU</code> (Europe).
<br/>Example: <code>services.aggregator.country = IT</code>
- <code>services.aggregator.name</code>: the name of your aggregator. Default is "D-NET".
<br/>Example: <code>services.aggregator.name = TEST_Aggregator</code>
- <code>services.mdstore.mongodb.host</code>: the machine hosting MongoDB for the storage of metadata records (M[eta]D[ata]Store). Default is <code>localhost</code>.
<br/>Example: <code>services.mdstore.mongodb.host = mongodb.dnet.eu</code>
- <code>services.mdstore.mongodb.db</code>: name of the MongoDB database to be used for the storage of metadata records. Default is <code>mdstore_minimal</code>.
<br/>Example: <code>services.mdstore.mongodb.db = mdstore_1</code>
- <code>dnet.logger.mongo.host</code>: the machine hosting MongoDB for the storage of workflow logs. Default is <code>localhost</code>.
<br/>Example: <code>dnet.logger.mongo.host = mongo.dnet.eu</code>
- <code>dnet.logger.mongo.db</code>: name of the MongoDB database to be used for the storage of workflow logs. Default is "dnet_logs_minimal".
<br/>Example: <code>dnet.logger.mongo.db = dnet_logs_1</code>
- <code>services.oai.publisher.repo.name</code>: name of the OAI-PMH Publisher, as it will appear in the OAI Identify response. Default is "D-Net OAI-PMH Publisher".
<br/>Example: <code>services.oai.publisher.repo.name = TEST_Aggregator OAI-PMH Publisher</code>
- <code>services.oai.publisher.repo.email</code>: email of the OAI-PMH Publisher administrator, as it will appear in the OAI Identify response. Default is "dnet-admin@mock.it". The default *must not* be used in beta or production systems because it is a mock email address.
<br/>Example: <code>services.oai.publisher.repo.email = name.surname@valid.mail.com</code>
- <code>dnet.admin.password</code>: MD5 hash of the password that allows the user "admin" to log in to the D-Net Admin UI. To generate the hash of a new password: <code>echo -n "thePassword" | md5sum</code>. The default password is "dnet-minimal" (without double quotes). The default value *should always be overridden*.
<br/>Example: <code>dnet.admin.password = 9003d1df22eb4d3820015070385194c8</code>, where 9003d1df22eb4d3820015070385194c8 is the MD5 hash of the string "pwd" obtained via the command <code>echo -n "pwd" | md5sum</code>.
- <code>service.solr.index.jsonConfiguration</code>: information about the Solr instance to be used to create full-text indices on the aggregated metadata records. The default value assumes a local Solr instance. Specifically:
<code>
{"id":"solr", "address":"localhost:9983", "port":"8983", "webContext":"solr", "numShards":"1", "replicationFactor":"1", "host":"localhost", "feedingShutdownTolerance":"30000", "feedingBufferFlushThreshold":"1000", "feedingSimulationMode":"false", "luceneMatchVersion":"4.9", "serverLibPath":"../../../../contrib/extraction/lib", "filterCacheSize":"512", "filterCacheInitialSize":"512", "queryCacheSize":"512", "queryCacheInitialSize":"512", "documentCacheSize":"512", "documentCacheInitialSize":"512", "ramBufferSizeMB":"960", "mergeFactor":"40", "autosoftcommit":"-1", "autocommit":"15000", "termIndexInterval":"1024", "maxIndexingThreads":"8", "queryResultWindowSize":"20", "queryResultMaxDocCached":"200"}
</code>

If you are not running the Solr service on the same machine where Tomcat runs, then you need to override the above configuration according to your Solr server installation.
Typically, changing <code>address</code> and <code>host</code> is enough if your Solr server is not configured for sharding and replication.
For more details refer to the Solr documentation.
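
As an illustration, the following sketch generates a <code>cnr.override.properties</code> file from the example values listed above and computes the admin password hash in the same step. The output path, the function name and the password are placeholders, and the <code>md5sum</code> utility is assumed to be available. <code>printf '%s'</code> is used in place of <code>echo -n</code> because it behaves the same way across shells.

```shell
# Print the MD5 hex digest of a string (no trailing newline is hashed).
md5_of() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# OVERRIDE_FILE is a placeholder; the real file belongs in Tomcat's common/classes directory.
OVERRIDE_FILE="${OVERRIDE_FILE:-./cnr.override.properties}"
# ADMIN_PASSWORD is a placeholder; choose your own.
ADMIN_PASSWORD="${ADMIN_PASSWORD:-thePassword}"

# The values below are the examples from the property list, not recommendations.
cat > "$OVERRIDE_FILE" <<EOF
container.hostname = dnet-host.dnet.eu
container.port = 8080
container.context = is
dnet.data.path = /var/lib/dnet
services.aggregator.country = IT
services.aggregator.name = TEST_Aggregator
services.mdstore.mongodb.host = mongodb.dnet.eu
services.mdstore.mongodb.db = mdstore_1
dnet.logger.mongo.host = mongo.dnet.eu
dnet.logger.mongo.db = dnet_logs_1
services.oai.publisher.repo.name = TEST_Aggregator OAI-PMH Publisher
services.oai.publisher.repo.email = name.surname@valid.mail.com
dnet.admin.password = $(md5_of "$ADMIN_PASSWORD")
EOF

echo "Wrote $OVERRIDE_FILE"
```
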
# Using D-Net

Under the root folder of the project you can find the folder <code>mock-repository-content</code>.
It contains 150 oai_dc metadata records you can use to test the functionality of the D-Net software with a mock datasource.

* Place the folder in a location that is readable by Tomcat
* Start the container
* Access the Admin UI (http://${container.hostname}:${container.port}/${container.context}/mvc/ui/index.do)
    * If you are running via the Maven Tomcat plugin with the default properties, the URL is: http://localhost:8280/app/mvc/ui/index.do
* Go to Datasource Management --> Overview and search for "mock"
* Click on "Add metaworkflow" and select the "Collection and Transformation" meta-workflow. This action will associate a meta-workflow (i.e., a workflow of workflows) with the datasource and will create all needed metadata stores.
* Click on the "access params" button on the top right and change the base URL to the location where you saved the sample folder (e.g. file:///dnet/test/mock-repository-content)
* Click on the meta-workflow "Collection and Transformation" and configure its workflows with the missing parameter for the transformation rule
    * Click on the yellow "parameters" button of the transformation workflow and select the rule <code>dc2dmf_DRIVER</code>
* Ensure the launch mode is set to "Auto" for each workflow
* Click on the Launch button of the first workflow ("collect")
* Wait for all the workflows to complete: collect, transform, index, oai, and oaiPostFeed
* Verify that the records have been transformed and indexed: click on MD Inspectors --> D-Net content checker and perform some queries
* Verify that the aggregated records are correctly exposed via the built-in OAI-PMH publisher at:
    * http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=dmf for the DMF metadata format
    * http://${container.hostname}:${container.port}/${container.context}/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=oai_dc for the OAI_DC metadata format
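
The URLs used in the steps above are all derived from the same three properties. The sketch below builds them in one place and hints at how the OAI-PMH endpoints might be queried with <code>curl</code>. The function names are illustrative, and the <code>curl</code> call is left as a comment because it needs a running instance.

```shell
# Base URL of the web app, built from container.hostname, container.port and container.context.
dnet_base_url() {
    echo "http://$1:$2/$3"
}

# OAI-PMH ListRecords URL for a given metadata prefix (dmf or oai_dc).
dnet_oai_list_records_url() {
    echo "$(dnet_base_url "$1" "$2" "$3")/mvc/oai/oai.do?verb=ListRecords&metadataPrefix=$4"
}

# Default values when running via the Maven Tomcat plugin.
echo "Admin UI: $(dnet_base_url localhost 8280 app)/mvc/ui/index.do"
echo "OAI DMF:  $(dnet_oai_list_records_url localhost 8280 app dmf)"

# With a running instance, the exported records could be fetched with:
#   curl -s "$(dnet_oai_list_records_url localhost 8280 app oai_dc)"
```
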
# Need support?

Do not hesitate to contact dnet-team@isti.cnr.it