Multi-view Image Exploration, Release 1.1
=========================================
This is the source code to the image exploration tool used in the following
publications:

[1] V. Ferrari, T. Tuytelaars, L. Van Gool:
Simultaneous Object Recognition and Segmentation by Image Exploration,
ECCV 2004

[2] V. Ferrari, T. Tuytelaars, L. Van Gool:
Integrating Multiple Model Views for Object Recognition,
CVPR, Vol. 2, pp. 143-153, 2004

[3] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, B. Schiele, L. Van Gool:
Towards Multi-View Object Class Detection,
CVPR, Vol. 2, pp. 1589-1596, 2006.

[4] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, B. Schiele, L. Van Gool:
Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's
Attention,
International Journal of Robotics Research, 2009,
doi:10.1177/0278364909340444

The original code (a mix of Matlab and C(++)) was programmed by T. Tuytelaars
and V. Ferrari, and was ported to a full C++ implementation by A. Thomas.

This source code is released "as-is". It may only be used for research
purposes or personal use. For any other use, you should contact the
authors. There is no warranty for fitness for a particular purpose. This code
comes without support: if you have any problems compiling or installing it,
you'll have to fix them yourself.

If you have any questions besides compilation errors, you can contact the
authors at:
  alexander.thomas@esat.kuleuven.be
  ferrari@vision.ee.ethz.ch

Version history:
1.0 (2007/01): Initial release
1.1 (2009/07): Fixed some issues with recent compilers


INSTALLATION:
-------------
Read the 'INSTALL' file.


USAGE:
------
Once installed, the binaries reside in the 'bin' directory inside the
installation directory. The main program is 'learnMultiViewMosaic'. Most of
the other binaries perform subparts of the process described in [1, 2, 3].
Most of the programs will give short usage information when run without
arguments. Some of the programs may be useless test programs, because we
didn't really clean up everything before releasing this. Similarly, the source
code may be a bit messy in some parts.

'learnMultiViewMosaic' will search for dense multi-view correspondences as
described in [3]. The arguments should be a set of images, containing
different views of the same object(s). File names should be of the form
objectName##suffix, where suffix doesn't contain numbers, and ## is the view
ID. You can process multiple objects at the same time by using different
prefixes. For instance, file names could be:

object1-001.png
object1-002.png
object1-003.png
object2-00.png
object2-01.png
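
The naming convention above can be illustrated with a small C++ sketch. It
is not taken from the released code: the struct and function names are
hypothetical, and it assumes the view ID is the last run of digits in the
file name, which follows from the rule that the suffix contains no numbers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type for illustration; not part of the released code.
struct ViewFileName {
    std::string objectName;  // everything before the view ID digits
    int viewID = 0;          // numeric value of the ## digits
    std::string suffix;      // everything after the view ID (contains no digits)
};

// Splits "objectName##suffix" at the last run of digits in `name`.
// Returns false when the name contains no digits at all.
bool splitViewFileName(const std::string& name, ViewFileName& out) {
    // Skip the digit-free suffix from the right to find the end of the ID.
    std::size_t end = name.size();
    while (end > 0 && !std::isdigit(static_cast<unsigned char>(name[end - 1]))) {
        --end;
    }
    if (end == 0) return false;  // no digits, so no view ID

    // Walk back over the digit run to find where the view ID starts.
    std::size_t begin = end;
    while (begin > 0 && std::isdigit(static_cast<unsigned char>(name[begin - 1]))) {
        --begin;
    }

    out.objectName = name.substr(0, begin);
    out.viewID = std::stoi(name.substr(begin, end - begin));
    out.suffix = name.substr(end);
    return true;
}
```

With this sketch, 'object1-001.png' splits into the prefix 'object1-', view
ID 1 and suffix '.png'. Whether the program itself keeps or strips the
trailing '-' when naming its output is not specified here.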

Explicit segmentation masks can be provided in a subfolder 'maps' in the same
directory as the images. Masks must not be given as arguments to the program;
they are detected automatically. Masks must be in PNG format, either 8-bit
grayscale or RGB color, no alpha channels. For an image called 'image.ext',
the mask must be called 'image-map.png', and the dimensions must
match. Foreground is indicated by white in the mask image, background by
black. If no segmentation mask is found, the program will try to detect a
uniform background color and use it to segment figure from background.
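
As a quick check of where a mask is expected, the following C++ sketch
derives the mask path from an image path according to the rule above. The
function name is hypothetical, and the sketch assumes '/' as the path
separator; it is not the lookup code used by the program.

```cpp
#include <cstddef>
#include <string>

// Returns the expected mask location for an image, following the rule
// "<dir>/image.ext" -> "<dir>/maps/image-map.png".
// Hypothetical helper; assumes '/' separates directories.
std::string maskPathFor(const std::string& imagePath) {
    // Split into directory (with trailing '/') and file name.
    const std::size_t slash = imagePath.find_last_of('/');
    const std::string dir =
        (slash == std::string::npos) ? "" : imagePath.substr(0, slash + 1);
    const std::string file =
        (slash == std::string::npos) ? imagePath : imagePath.substr(slash + 1);

    // Drop the extension, if there is one.
    const std::size_t dot = file.find_last_of('.');
    const std::string stem =
        (dot == std::string::npos) ? file : file.substr(0, dot);

    return dir + "maps/" + stem + "-map.png";
}
```

For example, the mask for 'data/object1-001.png' would be looked up as
'data/maps/object1-001-map.png'.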

The program will produce a lot of output files, most of which are intermediate
results that can be discarded after processing is completed. When aborted and
re-started, the program will check for intermediate results and use them
instead of recalculating (unless the -f option is used). It is assumed that
consecutive view IDs are adjacent, and by default only the 2 nearest views in
each direction of the current view will be matched (this can be changed with
the -s option). A few other options are available to control parameters of the
process: see the usage information printed when the program is run without
arguments. The defaults should be good for images in which the object occupies
an area of about 500x500 pixels.
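
To get a feeling for how many view pairs are matched, the sketch below lists
the pairs for a given neighbourhood size. It is only an illustration under
assumptions: 'span' stands in for the number of neighbouring views per
direction (2 by default), neighbours are taken by position in the sorted list
of view IDs, and the sequence is assumed not to wrap around. The function
name is hypothetical and the program's own pairing logic may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Lists the unordered view pairs that get matched when every view is compared
// with its `span` nearest neighbours in each direction.
// ASSUMPTIONS: adjacency is by position in the sorted ID list, no wrap-around.
std::vector<std::pair<int, int> > viewPairsToMatch(std::vector<int> viewIDs,
                                                   std::size_t span) {
    std::sort(viewIDs.begin(), viewIDs.end());
    std::vector<std::pair<int, int> > pairs;
    for (std::size_t i = 0; i < viewIDs.size(); ++i) {
        // Only look forward; the backward neighbours were paired earlier.
        const std::size_t last = std::min(viewIDs.size() - 1, i + span);
        for (std::size_t j = i + 1; j <= last; ++j) {
            pairs.push_back(std::make_pair(viewIDs[i], viewIDs[j]));
        }
    }
    return pairs;
}
```

Under these assumptions, four views with a span of 2 give five pairs, so the
amount of matching work grows roughly linearly with the number of views
rather than quadratically.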

The final output is provided in a 'model_objectName.tracks' file, one such
file per object. This file contains a list of all the 'tracks' found on the
object, where a track is a set of regions that correspond across the different
views. The structure of a .tracks file is as follows:

MVT [version number]
<object name>
[number of views N] [number of tracks M]
<space-separated list of N view IDs from filenames>
<Start of region listing = M sets of maximum N+1 lines, containing the following:>
  [int trackLength]
  [int viewID] [int type] [double[2] pt0] [double[2] pt1] [double[2] pt2] [double[2] maj] [double[2] min]
  [int viewID] [int ....
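
A minimal C++ sketch of a reader for this layout is given here for
convenience. It is not part of the released code: all type and function names
are hypothetical, and it assumes that fields are separated by whitespace,
that the object name occupies one line, and that each region line holds
exactly the twelve values listed above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types mirroring the fields of a .tracks file.
struct Region {
    int viewID = 0;
    int type = 0;
    double pt0[2], pt1[2], pt2[2], majAxis[2], minAxis[2];
};

struct TracksModel {
    std::string version;
    std::string objectName;
    std::vector<int> viewIDs;
    std::vector<std::vector<Region> > tracks;  // one vector of regions per track
};

// Reads one 2D point (two doubles).
static bool readPoint(std::istream& in, double p[2]) {
    return static_cast<bool>(in >> p[0] >> p[1]);
}

// Parses the layout described above; returns false on malformed input.
bool readTracks(std::istream& in, TracksModel& model) {
    std::string magic;
    if (!(in >> magic >> model.version) || magic != "MVT") return false;
    in >> std::ws;  // skip the line break before the object name
    if (!std::getline(in, model.objectName)) return false;

    std::size_t numViews = 0, numTracks = 0;
    if (!(in >> numViews >> numTracks)) return false;
    model.viewIDs.assign(numViews, 0);
    for (std::size_t v = 0; v < numViews; ++v) {
        if (!(in >> model.viewIDs[v])) return false;
    }

    model.tracks.clear();
    for (std::size_t t = 0; t < numTracks; ++t) {
        std::size_t trackLength = 0;
        if (!(in >> trackLength)) return false;
        std::vector<Region> track(trackLength);
        for (std::size_t r = 0; r < trackLength; ++r) {
            Region& reg = track[r];
            if (!(in >> reg.viewID >> reg.type)) return false;
            if (!readPoint(in, reg.pt0) || !readPoint(in, reg.pt1) ||
                !readPoint(in, reg.pt2) || !readPoint(in, reg.majAxis) ||
                !readPoint(in, reg.minAxis)) {
                return false;
            }
        }
        model.tracks.push_back(track);
    }
    return true;
}

// Convenience wrapper for parsing text that is already in memory.
bool readTracksFromText(const std::string& text, TracksModel& model) {
    std::istringstream in(text);
    return readTracks(in, model);
}
```

To read a model from disk, pass a std::ifstream opened on the
'model_objectName.tracks' file to readTracks.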

A track consists of at least two regions, where viewID is the ID of the image
in which the region exists. 'type' is a number indicating the type of region,
which will always be 21 (ellipse), unless the parallelogram regions were
activated. The remaining data describe the shape of the region: 'pt0' is the
center of the region; 'pt1' and 'pt2' are the transformed coordinates of the
points (1,0) and (0,1) if the region is considered an affine transform of the
unit circle. 'maj' and 'min' are intended to be points on the long and short
axes of the ellipse, respectively, but these values are not calculated in the
program, so they should be ignored.
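
Because 'maj' and 'min' are unusable, the ellipse has to be reconstructed
from 'pt0', 'pt1' and 'pt2'. The C++ sketch below samples points on the
ellipse boundary by mapping the unit circle through the affine transform that
sends (0,0) to pt0, (1,0) to pt1 and (0,1) to pt2. The function name is
hypothetical, and the sketch assumes that pt1 and pt2 are stored as absolute
image coordinates rather than as offsets from the center.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Samples `count` points on the boundary of an elliptical region.
// A unit-circle point (cos t, sin t) is mapped to
//   pt0 + cos(t) * (pt1 - pt0) + sin(t) * (pt2 - pt0).
// ASSUMPTION: pt1 and pt2 are absolute image coordinates.
std::vector<std::pair<double, double> > ellipseBoundary(const double pt0[2],
                                                        const double pt1[2],
                                                        const double pt2[2],
                                                        int count) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<std::pair<double, double> > points;
    for (int k = 0; k < count; ++k) {
        const double t = twoPi * k / count;
        const double c = std::cos(t);
        const double s = std::sin(t);
        points.push_back(std::make_pair(
            pt0[0] + c * (pt1[0] - pt0[0]) + s * (pt2[0] - pt0[0]),
            pt0[1] + c * (pt1[1] - pt0[1]) + s * (pt2[1] - pt0[1])));
    }
    return points;
}
```

The first sample (t = 0) coincides with pt1 and the sample at a quarter turn
coincides with pt2, which is a simple way to check the interpretation against
an actual .tracks file.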

There is no binary for the object recognition procedure described in [1, 2],
because it was not needed in [3]. Because it uses the same building blocks as
the image exploration procedure, it should be straightforward to implement,
using learnmvmosaic.cpp as a starting point.
