<?xml version="1.0" encoding="UTF-8"?>
<de:comments xmlns:de="http://de.tukl.softech.agileReview">
  <de:author name="mafju"/>
  <de:files>
    <de:project name="icm-iis-export-hbase">
      <de:folder name="src">
        <de:folder name="main">
          <de:folder name="resources">
            <de:folder name="eu">
              <de:folder name="dnetlib">
                <de:folder name="iis">
                  <de:folder name="export">
                    <de:folder name="hbase">
                      <de:folder name="oozie_app">
                        <de:file name="workflow.xml">
                          <de:comment id="c0" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T11:22:35.759+01:00" last-modified="2013-03-11T15:06:14.902+01:00" priority="0" recipient="mhorst" status="1" revision="0">
                            <de:text>I would personally use more descriptive names for data stores, e.g. "document_with_inferenced_data" instead of the less readable "doc_with_inf_data". The abbreviation "inf" is especially ambiguous because it could also mean, e.g., "infinite" or "inferior".</de:text>
                            <de:replies>
                              <de:reply author="mhorst" creation-date="2013-03-11T14:21:52.972+01:00">I will rename it with full names.</de:reply>
                              <de:reply author="mhorst" creation-date="2013-03-11T15:03:58.209+01:00">Renamed, closing.</de:reply>
                            </de:replies>
                          </de:comment>
                        </de:file>
                      </de:folder>
                      <de:file name="job.properties">
                        <de:comment id="c4" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T12:59:58.747+01:00" last-modified="2013-03-11T15:07:27.033+01:00" priority="0" recipient="mhorst" status="1" revision="0">
                          <de:text>There seems to be a convention of marking data that is not available with "x". This is not elegant, since the special marker "x" is easily confused with a real directory.</de:text>
                          <de:replies>
                            <de:reply author="mhorst" creation-date="2013-03-08T20:25:22.582+01:00">Come on, this was clearly not intended to be considered as a final solution. Just a quick and dirty 'blank' to get things working asap. Obviously, creating an 'x' directory in the current folder and placing JSON files in it would cause errors, but I was confident I would not do this ;)

I am not sure what would be the best 'blank' value anyway, an empty string seems to cause errors.</de:reply>
                            <de:reply author="mafju" creation-date="2013-03-11T13:25:17.949+01:00">> Come on, this was clearly not intended to be considered as a final solution. Just a quick and dirty 'blank' to get things working asap.

:D How would I know that? I thought this was also something I should review; business as usual :) Maybe a "FIXME" comment or something similar would be in order?

> I am not sure what would be the best 'blank' value anyway, an empty string seems to cause errors.

Maybe something that could not possibly be a valid path in the file system, e.g. "*"?</de:reply>
                            <de:reply author="mhorst" creation-date="2013-03-11T14:19:30.057+01:00">I am not sure how '*' would be handled as a directory path; what I would like to achieve is assurance that an empty directory without any files is provided.

I've got a simple idea: at the prepare stage, create an export_data_gen/empty directory and provide this path as the default input :)</de:reply>
                            <de:reply author="mhorst" creation-date="2013-03-11T14:51:12.980+01:00">Confirmed: creating an empty dir and setting the values to:

${workingDir}/export_data_gen/empty/

works as expected.</de:reply>
                            <de:reply author="mafju" creation-date="2013-03-11T15:07:27.033+01:00">Nice</de:reply>
                          </de:replies>
                        </de:comment>
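                        <!--
A minimal sketch of the fix agreed in comment c4 above, assuming the empty
directory is created by an Oozie fs action that runs before the generator.
The action and transition names (prepare_empty_input, generate, fail) and the
nameNode property are hypothetical, and workingDir is assumed to be an
absolute HDFS path; only the export_data_gen/empty path comes from the thread.

```xml
<action name="prepare_empty_input">
    <fs>
        <mkdir path="${nameNode}${workingDir}/export_data_gen/empty"/>
    </fs>
    <ok to="generate"/>
    <error to="fail"/>
</action>
```

Inputs for which no JSON files are supplied could then default to
${workingDir}/export_data_gen/empty/ in job.properties.
                        -->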
                        <de:comment id="c5" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T13:04:17.272+01:00" last-modified="2013-03-11T13:26:05.253+01:00" priority="0" recipient="mhorst" status="1" revision="0">
                          <de:text>There's a hardcoded path. Is this done intentionally?</de:text>
                          <de:replies>
                            <de:reply author="mhorst" creation-date="2013-03-08T20:29:15.900+01:00">Obviously, yes. As I wrote in another comment: this module and all of its configuration were intended to be used by me only, so it does not follow conventions and can be ugly.

If it is prepared for use by others, all hardcoded values will be replaced by placeholders or proper default values.</de:reply>
                            <de:reply author="mafju" creation-date="2013-03-11T13:25:50.147+01:00">Ok</de:reply>
                          </de:replies>
                        </de:comment>
                      </de:file>
                    </de:folder>
                  </de:folder>
                </de:folder>
              </de:folder>
            </de:folder>
          </de:folder>
          <de:folder name="java">
            <de:folder name="eu">
              <de:folder name="dnetlib">
                <de:folder name="iis">
                  <de:folder name="export">
                    <de:folder name="hbase">
                      <de:folder name="generator">
                        <de:file name="JsonBasedInferencedDataGenerator.java">
                          <de:comment id="c1" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T12:14:23.711+01:00" last-modified="2013-03-11T13:18:37.576+01:00" priority="0" recipient="mhorst" status="1" revision="0">
                            <de:text>The logic of the code in this class is quite complicated; it's not clear what it does at a glance. If it is going to be used by other people, maybe some javadoc would be appropriate, or even better: simplifying the logic (if possible)?</de:text>
                            <de:replies>
                              <de:reply author="mhorst" creation-date="2013-03-08T19:31:34.656+01:00">It wasn't intended for other people; I am the one struggling with export, so I was mainly thinking about myself when developing this module.</de:reply>
                              <de:reply author="mafju" creation-date="2013-03-11T13:07:23.015+01:00">Fair enough, although maybe in the future someone would want to use it.</de:reply>
                            </de:replies>
                          </de:comment>
                          <de:comment id="c2" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T12:27:46.775+01:00" last-modified="2013-03-11T14:14:05.659+01:00" priority="0" recipient="mhorst" status="1" revision="0">
                            <de:text>This is not clear to me. The name of the "modeToPathMap" object suggests that we're extracting from the "parameters" an element whose key/name is equal to some path. This is probably not something that we're really doing?

BTW, we're passing here some paths to files through parameters of the workflow node, but a standard way to do it is to pass such information through input ports. Why do we use parameters? Maybe it would be more appropriate (and more explicit) to use the input ports?</de:text>
                            <de:replies>
                              <de:reply author="mhorst" creation-date="2013-03-08T20:07:43.964+01:00">> This is not clear to me. The name of the "modeToPathMap" object suggests that we're extracting from the "parameters" an element whose key/name is equal to some path. This is probably not something that we're really doing?

This is required to associate each export mode with its proper input path and to make it easier to handle individual modes with a single handleMode() method in the loop.

> BTW, we're passing here some paths to files through parameters of the workflow node, but a standard way to do it is to pass such information through input ports. Why do we use parameters? Maybe it would be more appropriate (and more explicit) to use the input ports?

The main reason for using parameters over ports was their less restrictive nature: a given path does not have to be provided explicitly, and in most cases only a single datastore will be generated. When using ports, all of them have to be provided in the configuration. I will switch back to ports because I had to provide all placeholders in job.properties anyway, so, due to configuration restrictions, I gained nothing.</de:reply>
                              <de:reply author="mafju" creation-date="2013-03-11T13:17:24.538+01:00">> The main reason for using parameters over ports was their less restrictive nature: a given path does not have to be provided explicitly, and in most cases only a single datastore will be generated. When using ports, all of them have to be provided in the configuration. I will switch back to ports because I had to provide all placeholders in job.properties anyway, so, due to configuration restrictions, I gained nothing.

Ok. If it wouldn't make much difference, you could also generate all possible datastores every time this workflow node is run and then use only some of them in the next workflow node.</de:reply>
                              <de:reply author="mhorst" creation-date="2013-03-11T14:14:05.659+01:00">True, but the main problem was not the requirement to generate output for all 4 ports but the necessity of providing JSONs for all 4 input ports. In most cases I would like to provide JSONs for a single port only (e.g. similarities when testing similarity export).</de:reply>
                            </de:replies>
                          </de:comment>
                          <de:comment id="c3" author="mafju" reviewID="2013-03-08 export data generator" creation-date="2013-03-08T12:36:39.418+01:00" last-modified="2013-03-11T13:18:30.472+01:00" priority="0" recipient="mhorst" status="2" revision="0">
                            <de:text>Is doing this check necessary? Don't we assume here that the "modeToPathMap" contains all possible "ExportModes"? If so, double checking it here does not seem to be necessary.</de:text>
                            <de:replies>
                              <de:reply author="mhorst" creation-date="2013-03-08T20:14:06.373+01:00">This might seem excessive, but when someone changes the list of iterable modes without filling in the modeToPathMap, this check will trigger a meaningful error message.

I have now realized we could simply iterate over the modeToPathMap keys and skip this condition :)</de:reply>
                            </de:replies>
                          </de:comment>
                        </de:file>
                      </de:folder>
                    </de:folder>
                  </de:folder>
                </de:folder>
              </de:folder>
            </de:folder>
          </de:folder>
        </de:folder>
      </de:folder>
    </de:project>
  </de:files>
</de:comments>