cleaning up mapper code
Adding property flag for dedup
Added inheritance in classes for mappers
removed limes dependency
Adding fields used for tokenizing as properties
removed redis pass from code and updated getStats action to parse args correctly
Final update: fixed distance algs, added props in wf
Updating similarity comparators
Added support for composite keys, cleaned up code
Fixes on old code
Reverted changes to old code
Reduced max num of records
Downgraded to Java 7
Added Java 8 support; updated blocking, fixed issue with token generation
Moved code for block creation inside blocking class
Testing with name only as blocking key
Fixed issue in comparison
Added TreeMap in token blocking; fixed issue with loop
Updated token blocking to use only specified fields and use composite keys
Changed token blocking to accept year as a token
cleaned up build
resolved conflicts to stable state
Added DatasetComparator class
adding new branches
Final updates and cleanup before benchmarking
Cleaning up code in Frequency counter
Refactor Redis utils. Cleaned up classes
Fixed issue with leftover Redis connections in Build mappers
A
fixed parsing
finished verification
Trimmed trailing \t in redis records.
Added M/R step for verification
Cleaned up packages, Added check in token blocking for only numeric tokens.
Refactored and cleaned up packages.
Trying storing target records on map in interlinking
fixed custom comparator
updating
updating tests
testing
updates
Updates for CR
Final working build for linkage.
Fixed delims/parsing in DatasetComparator
removed <> from property names in reducer.
added reducer class
Added new M/R in build for testing lazy write on records
Cleaned up code for building
Cleaned up code for building ; fixed error in blocking
Refactored Build according to new parsing.
Cleaned up mappers/reducers
Finished target and source parsing; cleaning up build phase.
Finished target and source parsing
Finished target parsing
Finished source parsing
Cleaned up mappers/reducers; added a step for counting word frequencies in titles.
Cleaned up Limes Reducer
Restored limes reducer
renamed reducers
added new classes
updates with new output format, optimized code, custom comparator
updates with new output format
clea
fixed stats and paths
cleaned up blocking
moved
Trying out custom output format
Fixed error in Preprocessing
added pruning of fields and entities according to mappings
added optimized cache implementation
fixed dataset comparison
Added batch read from Redis in Linkage Reduce phase