3D HUMAN BODY POSE ESTIMATION BY SUPERQUADRICS

Ilya Afanasyev, Massimo Lunardelli, Nicolo' Biasi, Luca Baglivo, Mattia Tavernini, Francesco Setti and Mariolino De Cecco Department of Mechanical and Structural Engineering (DIMS), University of Trento, via Mesiano, 77, Trento, Italy {ilya.afanasyev, mariolino.dececco}@unitn.it

Keywords:

Superquadrics, RANSAC fitting, Human Body Pose Estimation, 3D object localization.

Abstract:

This paper presents a method for 3D Human Body pose estimation using a multi-camera system. The pose is estimated by a RANSAC-based object search with robust least-squares fitting of 3D points to SuperQuadric (SQ) models of the searched object. The solution is verified by evaluating the matching score between the SQ object model and real 3D data captured by a multi-camera system and segmented by a dedicated preprocessing algorithm. The method can be used for 3D recognition, localization and pose estimation of the human body.

1 INTRODUCTION

The recovery of 3D human pose is an important problem in computer vision, with many potential applications in robotics, motion analysis (sports, medicine), animation, interactive games, and security surveillance. The task of 3D object recognition and pose estimation from unstructured real 3D data obtained by a multi-camera system can be solved in different ways. Hofmann and Gavrila (2009) analyze single-frame poses from individual camera views and then generate shape hypotheses by hierarchical shape matching against a 3D upper-body model built from tapered SQs. Lee and Cohen (2004) recover the 3D upper human body from 2D static images by building a generative image model and searching the solution space within an MCMC (Markov Chain Monte Carlo) framework; for this purpose they employ an articulated human model (7 joints, 10 body parts and 21 degrees of freedom), a probabilistic shape model, and clothing models. Objects can also be recovered with superquadric models using the recover-and-select paradigm (Jaklic et al., 2000; Leonardis et al., 1997): the range image is filled with a set of seeds (small SQ models), the seeds are grown iteratively, and the suitable models are then selected. This approach was tested on a wooden mannequin (Jaklic et al., 2000; Leonardis et al., 1997).

Since the human body and limbs can be modeled a priori in metric dimensions, we propose a RANSAC-based model-fitting technique with a composite superquadric model. SQ models describe objects of complex geometry with few parameters and lead to a simple minimization function for estimating the object pose. The logic of our 3D Human Body pose estimation algorithm is shown in the block scheme of Figure 1. Pose estimation starts with preprocessing of the "cloud of points" captured by multiple cameras. The preprocessing stage segments the human body into 9 parts (body, arms, forearms, hips and legs). The algorithm first recovers the position of the body as the largest object ("Body Pose Search") and then uses the body position to recover the positions of the limbs ("Limbs Pose Search"). To cope with measurement noise and outliers, the object pose is estimated by the RANSAC-SQ-fitting technique. The quality of fitting is controlled by body and limb thresholds, each defined as the ratio of the number of inliers of the best fit to the number of points of the corresponding body part or limb. Tests showed that the Body Pose Search can return a hypothesis with a slightly wrong body position that satisfies the body threshold but prevents the limb fits from reaching their thresholds. For this reason, whenever the number of limb inliers is below a limb threshold, we restart the Body Pose Search stage until RANSAC-SQ-fitting gives suitable results for every limb.

Figure 1: Block scheme of the 3D Human Body Pose Estimation algorithm.

2 OBJECT MODEL IN SUPERQUADRICS

Figure 2: Representation of the Human Body by 9 blocks: B – body, LA/RA – Left/Right Arm, LF/RF – Left/Right Forearm, LH/RH – Left/Right Hip, LL/RL – Left/Right Leg. Other abbreviations: LS – Left Shoulder, E – Elbow, ηLA – angular position of the Left Shoulder, LHJ – Left Hip Joint, K – Knee, etc.

2.1 SuperQuadric parameters

It is known (Jaklic et al., 2000; Leonardis et al., 1997) that the explicit equation of superquadrics, which is usually used for SQ representation and visualization, is:

$$
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix}
a_1\,\mathrm{sgn}(\cos\eta)\,|\cos\eta|^{\varepsilon_1}\,\mathrm{sgn}(\cos\omega)\,|\cos\omega|^{\varepsilon_2} \\
a_2\,\mathrm{sgn}(\cos\eta)\,|\cos\eta|^{\varepsilon_1}\,\mathrm{sgn}(\sin\omega)\,|\sin\omega|^{\varepsilon_2} \\
a_3\,\mathrm{sgn}(\sin\eta)\,|\sin\eta|^{\varepsilon_1}
\end{bmatrix}
\tag{1}
$$

where x, y, z are the superquadric system coordinates; a1, a2, a3 are the scaling parameters of the object; ε1, ε2 are the shape parameters of the object; η, ω are spherical coordinates. The object under investigation is a Human Body, which consists of 9 superquadrics, namely superellipsoids (Figure 2), with shape parameters ε1 = ε2 = 0.5 and the following scaling parameters for the different body parts:
- Body: a1 = 0.095, a2 = 0.18, a3 = 0.275 (m).
- Arms: a1 = a3 = 0.055, a2 = 0.15 (m).
- Forearms: a1 = a3 = 0.045, a2 = 0.13 (m).
- Hips: a1 = a2 = 0.075, a3 = 0.2 (m).
- Legs: a1 = a2 = 0.05, a3 = 0.185 (m).

The scaling parameters are given in the metric superquadric coordinate systems.

2.2 Human Body in SQ

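The position of the Human Body is defined by the following sequence of translation and rotations of the Body superquadric:
1. Translation of the center of the BODY (xc, yc, zc) along the x, y, z coordinates.
2. Rotation α about x (clockwise).
3. Rotation β about y (clockwise).
4. Rotation γ about z (clockwise).

The rotation matrix $R_{BODY}$ of the BODY is:

$$
R_{BODY} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\beta & 0 & \sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\tag{2}
$$

The transformation matrix $T_{BODY}$ for the BODY is:

$$
T_{BODY} = R_{BODY} \cdot
\begin{bmatrix} 1 & 0 & 0 & x_c \\ 0 & 1 & 0 & y_c \\ 0 & 0 & 1 & z_c \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{3}
$$

2.3 Human Arms and Forearms in SQ

Consider the transformation equations for the Left Arm and Forearm. The position of the Left Shoulder relative to the center of the body coordinate system (Figure 2) is obtained from the explicit SQ equation (1), where $a_2$, $a_3$ are the scaling parameters of the body:

$$
P_S^B = P\left(\eta = \eta_{LA},\ \omega = \tfrac{\pi}{2}\right) =
\begin{bmatrix}
0 \\
a_2\,\mathrm{sgn}(\cos\eta_{LA})\,|\cos\eta_{LA}|^{\varepsilon_1} \\
a_3\,\mathrm{sgn}(\sin\eta_{LA})\,|\sin\eta_{LA}|^{\varepsilon_1}
\end{bmatrix}.
\tag{4}
$$

Taking into account (4), the transformation Body - Left Shoulder (B-LS) is:

$$
T_{LS}^{B} =
\begin{bmatrix} 1 & 0 & 0 & P_{S,x}^{B} \\ 0 & 1 & 0 & P_{S,y}^{B} \\ 0 & 0 & 1 & P_{S,z}^{B} \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{5}
$$

The transformation Left Shoulder - Left Arm (LS-LA) is expressed by the following sequence of rotations and translation:
1. Rotation α about x (clockwise).
2. Rotation β about z (anticlockwise).
3. Rotation γ about y (clockwise).
4. Translation of the SQ center by a distance $a_2$ along y.

$$
T_{LA}^{LS} = T_{LA}^{LS}(\alpha = \alpha_{LA}, \beta = \beta_{LA}, \gamma = \gamma_{LA}) = R_{LA} \cdot
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\tag{6}
$$

where $R_{LA}$ is the rotation matrix of the Left Arm:

$$
R_{LA} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\beta & \sin\beta & 0 & 0 \\ -\sin\beta & \cos\beta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\gamma & 0 & \sin\gamma & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\gamma & 0 & \cos\gamma & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{7}
$$

The transformation Left Arm - Elbow (LA-E) is

$$
T_{E}^{LA} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{8}
$$

The transformation Elbow - Left Forearm (E-LF) is created by
1. Rotation $\delta_{LF}$ about x (clockwise).
2. Translation of the SQ center by $-a_2$ along y.

$$
T_{LF}^{E} = T_{LF}^{E}(\delta = \delta_{LF}) =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\delta & \sin\delta & -a_2 \\ 0 & -\sin\delta & \cos\delta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}.
\tag{9}
$$

In (6) and (8), $a_2$ is the scaling parameter of the arm; in (9) it is that of the forearm.

Finally, taking into account equations (4)-(9), the full transformation for every point of the "Body - Left Forearm" (B-LF) system can be calculated as:

$$
P_B = T_{LS}^{B} \cdot T_{LA}^{LS} \cdot T_{E}^{LA} \cdot T_{LF}^{E} \cdot P_{LF},
\qquad
P_{LF} = \left(T_{LS}^{B} \cdot T_{LA}^{LS} \cdot T_{E}^{LA} \cdot T_{LF}^{E}\right)^{-1} \cdot P_B,
\tag{10}
$$

where $P_B$, $P_{LF}$ are the coordinates of Body and Left Forearm points, respectively (Figure 2). The equations for the Right Arm and Forearm are derived in the same way.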
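To make the composition in (4)-(10) concrete, the following Python sketch (standard library only) builds the homogeneous transforms for the Left Arm and Forearm and maps a point from forearm coordinates to body coordinates and back. It is an illustration under stated assumptions, not the authors' MATLAB code: the helper names (`spow`, `sq_point`, `rot_x`, `rigid_inv`, `left_forearm_to_body`, ...) are hypothetical, matrices are plain nested lists, the rotation signs follow the matrices written in (2), (7) and (9) (the "anticlockwise" z rotation of (7) is realised as `rot_z(-beta)`), $T_{LF}^{E}$ is obtained as the inverse written in (9), and the joint angles in the example are arbitrary.

```python
import math

# Scaling parameters (a1, a2, a3) in metres from Section 2.1; eps1 = eps2 = 0.5.
BODY = (0.095, 0.18, 0.275)
ARM = (0.055, 0.15, 0.055)
FOREARM = (0.045, 0.13, 0.045)
EPS = 0.5

def spow(base, exp):
    """Signed power sgn(base) * |base|**exp used in equation (1)."""
    return math.copysign(abs(base) ** exp, base)

def sq_point(a, e1, e2, eta, omega):
    """Point of the superquadric surface for angles (eta, omega), equation (1)."""
    a1, a2, a3 = a
    return (a1 * spow(math.cos(eta), e1) * spow(math.cos(omega), e2),
            a2 * spow(math.cos(eta), e1) * spow(math.sin(omega), e2),
            a3 * spow(math.sin(eta), e1))

def matmul(A, B):
    """Product of two 4x4 homogeneous matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def chain(*Ts):
    """Left-to-right product T1 · T2 · ... · Tn."""
    M = Ts[0]
    for T in Ts[1:]:
        M = matmul(M, T)
    return M

def trans(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rigid_inv(T):
    """Closed-form inverse [R^T | -R^T t] of a rotation + translation matrix."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0, 0, 0, 1]]

def apply(T, p):
    """Map a 3D point through a homogeneous matrix."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[i][k] * v[k] for k in range(4)) for i in range(3))

def left_forearm_to_body(eta_la, alpha, beta, gamma, delta):
    """T_LS^B · T_LA^LS · T_E^LA · T_LF^E from equations (4)-(10)."""
    shoulder = sq_point(BODY, EPS, EPS, eta_la, math.pi / 2)    # (4)
    T_ls_b = trans(*shoulder)                                    # (5)
    R_la = chain(rot_x(alpha), rot_z(-beta), rot_y(gamma))      # (7)
    T_la_ls = matmul(R_la, trans(0, ARM[1], 0))                 # (6)
    T_e_la = trans(0, ARM[1], 0)                                 # (8)
    M = matmul(trans(0, -FOREARM[1], 0), rot_x(-delta))         # matrix inverted in (9)
    T_lf_e = rigid_inv(M)                                        # (9)
    return chain(T_ls_b, T_la_ls, T_e_la, T_lf_e)                # (10)

T = left_forearm_to_body(math.radians(60), 0.3, 0.2, -0.1, 0.5)
p_lf = (0.01, 0.02, -0.03)
p_b = apply(T, p_lf)                # forward part of (10)
p_back = apply(rigid_inv(T), p_b)   # inverse part of (10)
```

As a sanity check, with all angles set to zero the forearm center lands at y ≈ 0.18 + 0.15 + 0.15 + 0.13 = 0.61 m from the body center: the shoulder offset from (4), plus $a_2$ of the arm twice, plus $a_2$ of the forearm once. The closed-form rigid inverse avoids a general 4x4 matrix inversion, which is all that the inverses in (9) and (10) require.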

2.4 Human Hips and Legs in SQ

Analogously to the equations of Section 2.3, the full transformation for every point of the "Body - Left Leg" (B-LL) system is calculated as:

$$
P_B = T_{LHJ}^{B} \cdot T_{LH}^{LHJ} \cdot T_{K}^{LH} \cdot T_{LL}^{K} \cdot P_{LL},
\qquad
P_{LL} = \left(T_{LHJ}^{B} \cdot T_{LH}^{LHJ} \cdot T_{K}^{LH} \cdot T_{LL}^{K}\right)^{-1} \cdot P_B,
\tag{11}
$$

where $P_B$, $P_{LL}$ are the coordinates of Body and Left Leg points, respectively (Figure 2), and the T are the corresponding transformations (12)-(15). The transformation Body - Left Hip Joint (B-LHJ) is exactly the same as $T_{LS}^{B}$ from equation (5), except that the angle $\eta_{LL}$ is used in equation (4) to calculate the Left Hip position.

The transformation Left Hip Joint - Left Hip (LHJ-LH) uses a different rotation sequence and translation compared with equations (6) and (7):
1. Rotation α about x (clockwise).
2. Rotation β about y (anticlockwise).
3. Rotation γ about z (clockwise).
4. Translation of the SQ center by a distance $-a_3$ along z.

$$
T_{LH}^{LHJ} = T_{LH}^{LHJ}(\alpha = \alpha_{LH}, \beta = \beta_{LH}, \gamma = \gamma_{LH}) = R_{LH} \cdot
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -a_3 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\tag{12}
$$

where $R_{LH}$ is the rotation matrix of the Left Hip:

$$
R_{LH} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\beta & 0 & -\sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{13}
$$

The transformation Left Hip - Knee (LH-K) is

$$
T_{K}^{LH} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -a_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\tag{14}
$$

The transformation Knee - Left Leg (K-LL) is created by
1. Rotation $\delta_{LL}$ about y (clockwise).
2. Translation of the SQ center by $a_3$ along z.

$$
T_{LL}^{K} = T_{LL}^{K}(\delta = \delta_{LL}) =
\begin{bmatrix} \cos\delta & 0 & -\sin\delta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\delta & 0 & \cos\delta & a_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}.
\tag{15}
$$

The transformations for the Right Hip and Leg are described by almost the same equations.

3 3D OBJECT LOCALIZATION ALGORITHM

3.1 About sensors, object and data

The 3D cloud of points is captured with a multi-camera system developed at the University of Trento in the framework of the VERITAS project (De Cecco, Paludet et al., 2010). The multi-camera system for capturing range images consists of 8 pairs of cameras forming a multiple stereo system, which measures a 3D surface with superimposed colored markers, similar to the multi-camera system described by De Cecco, Pertile et al. (2010).

3.2 Preprocessing: Segmentation

The segmentation of the 3D cloud of points of a human body is performed automatically, based on clothing analysis. We extract the human body clusters (the body and eight limbs: left/right arms, forearms, hips and legs) according to special marks on the garment (Figure 6). These marks generate color structures, which serve as predefined clothing models. The result of the clothing analysis is a segmentation matrix whose elements assign every data point to the body or to a specific limb. Experimental results show that this clothing segmentation is able to extract the limbs of the human body from range images under variations in background and lighting conditions. The segmentation is completed with RANSAC fitting: markers near the body joints have an uncertain association with limbs, and this uncertainty is resolved by RANSAC-SQ-fitting.

3.3 RANSAC algorithm

We use the RANSAC (RANdom SAmple Consensus) algorithm for SQ model fitting in Body and Limb Pose Estimation. As a reminder of the basic concept, the pseudocode of the RANSAC algorithm is given in Figure 3. The number of iterations k performed by RANSAC can be determined from the following formula (Kovesi, 2008):

$$
k = \frac{\log(1 - p)}{\log\left(1 - \left(\dfrac{inliers}{n}\right)^{s}\right)}
\tag{16}
$$

where p is the desired probability of choosing at least one sample free from outliers (in most applications p = 0.99); s is the number of points required to fit the model; inliers is the number of inliers of the best model found so far; and n is the number of data points. The success of RANSAC depends on choosing the models correctly. In our case this means a correct choice of the SQ parameters (the anthropometric sizes of the human body and limbs) and of the recognition logic (i.e. the sequence of body/limb fitting). An attempt to robustly recognize all parts of the human body simultaneously (in one stage) fails because of the large number of outliers. Tests showed that an acceptable quality of RANSAC-SQ-fitting is achievable with a two-stage human body pose estimation (Figure 1).
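Plugging numbers into (16) shows how few trials this application needs. The sketch below assumes p = 0.99 and an inlier ratio of 0.65, borrowed as an illustrative value from the roughly 65% of inliers reported in Section 4; the helper name `ransac_iterations` is hypothetical, and k is rounded up because the loop in Figure 3 runs while k > iter.

```python
import math

def ransac_iterations(p, inlier_ratio, s):
    """Number of RANSAC trials k from equation (16), rounded up."""
    return math.ceil(math.log(1 - p) / math.log(1 - inlier_ratio ** s))

k_body = ransac_iterations(0.99, 0.65, 6)   # s = 6 points for Body fitting
k_limb = ransac_iterations(0.99, 0.65, 3)   # s = 3 points for Limb fitting
print(k_body, k_limb)                        # prints "59 15"
```

If that inlier ratio held for each stage, Body fitting would need 59 trials and Limb fitting 15, far below the cap Trials = 1000 used in Figure 4; the smaller sample size s is what makes the limb stage so much cheaper.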

3.4 RANSAC model fitting

The logic of the RANSAC-SQ-fitting algorithm is explained by the pseudocode in Figures 4 and 5. We fit a model described by the superquadric implicit equation to the 3D data of a known object (selected by segmentation), starting with a RANSAC-based object search to find a pose hypothesis (i.e. 6 variables for the Body Pose Search: 3 rotation angles and 3 translation coordinates in the world coordinate system; or 4 variables for the Limbs Pose Search: 4 rotation angles). Each RANSAC sample calculation starts by picking a set of random points (s = 6 points for Body fitting and s = 3 for Limb fitting). We then fit the SQ model to this random subset by minimizing an inside-outside function of the distance to the SQ surface (applying the Trust-Region or the Levenberg-Marquardt algorithm for nonlinear least-squares minimization), and evaluate inliers by comparing the distances between the 3D points and the SQ model with an assigned threshold. To speed up the fitting process, the initial point of the minimization in world coordinates can be placed at the center of gravity of the segmented data points. SQ models allow an object to be recovered from a "cloud of points" using a limited number of 3D data points, regardless of whether they belong to object edges and corners. The minimization with the Trust-Region or Levenberg-Marquardt algorithm is stable and does not require excessive complexity or computation time. Figures 6-9 show the results of fitting by the RANSAC-SQ-fitting algorithm.
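The quantities minimized and thresholded in Figures 4 and 5 can be sketched as follows. The code assumes that points are already expressed in the superquadric frame (in Figure 4 this is done with $T_{BODY}^{-1}$). It reads the inside-outside function and the distance $d_i = a_1 a_2 a_3 (F_w^{\varepsilon_1} - 1)^2$ from the pseudocode and uses `abs()` so that fractional powers stay real. The function names are hypothetical, and the centroid helper only illustrates the suggested starting point for $(x_c, y_c, z_c)$. The nonlinear minimization itself (Trust-Region or Levenberg-Marquardt) is not shown.

```python
# Body superquadric of Section 2.1: (a1, a2, a3) in metres, eps1 = eps2 = 0.5.
BODY = (0.095, 0.18, 0.275)

def inside_outside(p, a, e1, e2):
    """Inside-outside function F_w: < 1 inside, = 1 on the surface, > 1 outside.

    p = (x, y, z) must already be expressed in the superquadric frame.
    """
    x, y, z = p
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2 / e1)

def sq_distance(p, a, e1, e2):
    """d_i = a1*a2*a3*(F_w**e1 - 1)**2, the value compared with the threshold t."""
    a1, a2, a3 = a
    return a1 * a2 * a3 * (inside_outside(p, a, e1, e2) ** e1 - 1.0) ** 2

def centroid(points):
    """Center of gravity of segmented points: a starting guess for (xc, yc, zc)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

on_surface = sq_distance((0.095, 0.0, 0.0), BODY, 0.5, 0.5)   # point at x = a1
at_center = sq_distance((0.0, 0.0, 0.0), BODY, 0.5, 0.5)      # SQ center
start = centroid([(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.1, 0.3, 1.0)])
```

A point on the surface gives $F_w = 1$ and therefore $d_i = 0$, while the SQ center gives $F_w = 0$ and $d_i = a_1 a_2 a_3$. The factor $a_1 a_2 a_3$ is the squared form of the $\sqrt{a_1 a_2 a_3}$ weighting common in superquadric recovery (Jaklic et al., 2000).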

Algorithm RANSAC (x, fittingfn, distfn, s, t, Trials)
  % x         - a dataset xn of n observations
  % fittingfn - a function that fits a model to x
  % distfn    - a function that computes the distance from a model to a data point
  % s         - the minimum number of data points needed to fit the model M
  % t         - a threshold on the distance datapoint - model
  % Trials    - the maximum number of iterations
  iter := 0        % count of iterations
  bestM := 0       % the best model
  inliers := 0     % inliers of the best model
  score := 0       % number of inliers of the best model
  p := 0.99        % probability of drawing at least one sample without outliers
  k := Trials      % number of iterations, re-estimated by (16)
  while k > iter
    xs := random(xn, s)        % s randomly selected observations
    M := fittingfn(xs)         % model parameters fitted to xs
    inl := {}                  % inliers of M
    for all xi from xn
      if distfn(M, xi) < t
        add xi to inl
      end if
    end for
    m := length(inl)           % number of inliers of M
    if m > score               % is M better than the best model so far?
      score := m
      inliers := inl
      bestM := M
      k := log(1 - p) / log(1 - (score/n)^s)
    end if
    increment iter
    if iter > Trials
      break
    end if
  end while
  return bestM, inliers

Figure 3: Pseudocode of RANSAC algorithm.
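The loop of Figure 3 can be made runnable as the Python sketch below, with k re-estimated by (16) whenever a better model is found. To keep it self-contained, the demonstration fits a 2D line (s = 2) instead of a superquadric: `fit_line`, `line_dist` and the toy data are hypothetical stand-ins for fittingfn and distfn. For the SQ case, fittingfn would wrap a nonlinear least-squares solver such as Trust-Region or Levenberg-Marquardt.

```python
import math
import random

def ransac(x, fittingfn, distfn, s, t, trials, p=0.99, rng=random):
    """Generic RANSAC loop after Figure 3; k is re-estimated with equation (16)."""
    n = len(x)
    best_model, best_inliers = None, []
    k = float(trials)                  # no estimate yet: allow all trials
    it = 0
    while it < k and it < trials:
        sample = rng.sample(x, s)      # s distinct random observations
        model = fittingfn(sample)
        inliers = [xi for xi in x if distfn(model, xi) < t]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
            w = len(inliers) / n       # current inlier ratio
            if w >= 1.0:
                k = 0.0                # every point is an inlier: stop
            else:
                denom = math.log(1.0 - w ** s)
                if denom < 0.0:        # skip the update if w**s underflows
                    k = min(float(trials), math.log(1.0 - p) / denom)
        it += 1
    return best_model, best_inliers

def fit_line(pts):
    """Hypothetical fittingfn: line a*x + b*y + c = 0 through two points."""
    (x1, y1), (x2, y2) = pts
    return y2 - y1, x1 - x2, x2 * y1 - x1 * y2

def line_dist(model, pt):
    """Hypothetical distfn: perpendicular distance from pt to the line."""
    a, b, c = model
    return abs(a * pt[0] + b * pt[1] + c) / math.hypot(a, b)

rng = random.Random(0)
data = [(float(i), 2.0 * i + 1.0) for i in range(20)]                # 20 inliers on y = 2x + 1
data += [(0.0, 50.0), (5.0, -40.0), (10.0, 80.0), (15.0, -60.0), (3.0, 100.0)]  # 5 outliers
model, inliers = ransac(data, fit_line, line_dist, s=2, t=0.1, trials=1000, rng=rng)
```

Once a sample of two true inliers is drawn, the inlier ratio becomes 20/25 = 0.8 and (16) gives k ≈ 4.5, so the loop ends at the next check. The `trials` cap only matters while no good model has been found.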

Algorithm RANSAC_Body_Fitting (x, t)
  s = 6;             % min number of points to fit a SQ
  t = 2·10^-2;       % threshold on the distance datapoint - SQ (2 cm)
  Trials = 1000;     % max number of iterations
  % x - a dataset xn of n points of the body, each a vector of world coordinates (xw, yw, zw);
  %     p_wi = [xwi, ywi, zwi, 1]^T; Trans(tx, ty, tz) is a homogeneous translation matrix

  % fittingfn - a function that defines the SQ position from the s sampled points xs
  function fittingfn (xs)
    set x0 = 0;      % initial values of α, β, γ, xc, yc, zc
    set SQ_BODY parameters a1-a3, ε1, ε2
    for all xi from xs
      T_BODY = R_BODY(α, β, γ) · Trans(xc, yc, zc);                    % equations (2)-(3)
      [xsi, ysi, zsi, 1]^T = T_BODY^-1 · p_wi;
      Fwi = ((xsi/a1)^(2/ε2) + (ysi/a2)^(2/ε2))^(ε2/ε1) + (zsi/a3)^(2/ε1);
    end for
    calculate α, β, γ, xc, yc, zc by minimizing Σi (Fwi^ε1 - 1)^2
    return T_BODY

  % distfn - a function that selects points by their distance to the SQ
  function distfn (T_BODY, x)
    for all xi from xn
      [xsi, ysi, zsi, 1]^T = T_BODY^-1 · p_wi;
      Fwi = ((xsi/a1)^(2/ε2) + (ysi/a2)^(2/ε2))^(ε2/ε1) + (zsi/a3)^(2/ε1);
      di = a1·a2·a3·(Fwi^ε1 - 1)^2;
      if di < t then add xi to inliers
    end for
    return inliers

  start RANSAC (x, fittingfn, distfn, s, t, Trials)
  return bestT_BODY, bestinliers

Figure 4: Pseudocode of RANSAC Body Fitting algorithm.

Algorithm RANSAC_Limb_Fitting (x, t)
  s = 3;             % min number of points to fit a SQ
  t = 2·10^-2;       % threshold on the distance datapoint - SQ (2 cm)
  set T_BODY;        % from the Body Fitting algorithm
  % x - a dataset xn of n points of a limb, each a vector of world coordinates (xw, yw, zw);
  %     p_wi = [xwi, ywi, zwi, 1]^T

  % fittingfn - a function that defines the SQ position from the s sampled points xs
  function fittingfn (xs, T_BODY)
    set x0 = 0;      % initial values of α, β, γ, δ
    set a1-a3, ε1, ε2 for SQ_LA and SQ_LF
    set η_LA;        % angular position of the Shoulder
    for all xi from xs
      T_LS^BODY(η_LA) = Trans(P_S^B(η_LA));               % equations (4)-(5), body a2, a3
      T_LA^LS = R_LA(α, β, γ) · Trans(0, a2_LA, 0);       % equations (6)-(7)
      T_E^LA = Trans(0, a2_LA, 0);                        % equation (8)
      T_LF^E = T_LF^E(δ = δ_LF);                          % equation (9), with a2_LF
      T_LIMB^LA = T_BODY · T_LS^BODY · T_LA^LS;
      T_LIMB^LF = T_BODY · T_LS^BODY · T_LA^LS · T_E^LA · T_LF^E;
      [xsi, ysi, zsi, 1]^T (LA) = (T_LIMB^LA)^-1 · p_wi;
      [xsi, ysi, zsi, 1]^T (LF) = (T_LIMB^LF)^-1 · p_wi;
      Fwi(LA), Fwi(LF) = ((xsi/a1)^(2/ε2) + (ysi/a2)^(2/ε2))^(ε2/ε1) + (zsi/a3)^(2/ε1),
                         with the LA and LF coordinates and parameters, respectively;
    end for
    calculate α, β, γ, δ by minimizing Σi [(Fwi(LA)^ε1_LA - 1) · (Fwi(LF)^ε1_LF - 1)]^2
    return α, β, γ, δ

  % distfn - a function that selects points by their distance to the SQs
  function distfn (x, T_BODY, α, β, γ, δ)
    for all xi from xn
      compute the LA and LF coordinates of xi as in fittingfn;
      Fwi = ((xsi/a1)^(2/ε2) + (ysi/a2)^(2/ε2))^(ε2/ε1) + (zsi/a3)^(2/ε1);
      di = a1·a2·a3·(Fwi^ε1 - 1)^2;
      if di < t then add xi to inliers
    end for
    return inliers

Figure 5: Pseudocode of RANSAC Limb Fitting algorithm on example of Left Arm (LA) and Forearm (LF) Limbs.
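The limb objective in Figure 5 multiplies the residuals $F_w^{\varepsilon_1} - 1$ of the arm and forearm superquadrics. One reading of this choice is that a sample point only has to be explained by one of the two parts: its term vanishes as soon as it lies on either surface. The sketch below evaluates that term for a single point. It assumes the point has already been mapped into both the LA and LF frames (via $(T_{LIMB}^{LA})^{-1}$ and $(T_{LIMB}^{LF})^{-1}$, not shown) and uses the Section 2.1 dimensions with ε1 = ε2 = 0.5. The function names are hypothetical.

```python
# Arm and forearm superquadrics of Section 2.1: (a1, a2, a3) in metres.
ARM = (0.055, 0.15, 0.055)
FOREARM = (0.045, 0.13, 0.045)

def inside_outside(p, a, e1, e2):
    """Superquadric inside-outside function for p = (x, y, z) in SQ coordinates."""
    x, y, z = p
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2 / e1)

def limb_residual(p_la, p_lf, a_la=ARM, a_lf=FOREARM, e1=0.5, e2=0.5):
    """Squared product of the LA and LF residuals F**e1 - 1 minimized in Figure 5.

    p_la and p_lf are the same world point expressed in the LA and LF frames.
    """
    r_la = inside_outside(p_la, a_la, e1, e2) ** e1 - 1.0
    r_lf = inside_outside(p_lf, a_lf, e1, e2) ** e1 - 1.0
    return (r_la * r_lf) ** 2

on_arm = limb_residual((0.055, 0.0, 0.0), (0.2, 0.3, 0.0))      # on the arm surface
on_forearm = limb_residual((0.3, 0.0, 0.0), (0.0, 0.13, 0.0))   # on the forearm surface
at_centers = limb_residual((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))    # deep inside both
```

Points on either surface contribute zero, while a point deep inside both volumes contributes (−1 · −1)² = 1. This makes a single least-squares problem in α, β, γ, δ cover both segments of the limb at once.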

Figure 6: Illustration of RANSAC Limb Fitting algorithm. At the top: left – a pose of a human in the garment, right – “cloud of points”. At the bottom: left – the result of RANSAC-fitting to 3D data, right – final pose estimation.

Figure 7: Illustration of RANSAC Limb Fitting algorithm. At the top: left – a pose of a human in the garment, right – “cloud of points”. At the bottom: left – the result of RANSAC-fitting to 3D data, right – final pose estimation.

Figure 8: Illustration of RANSAC Limb Fitting algorithm. At the top: left – a pose of a human in the garment, right – “cloud of points”. At the bottom: left – the result of RANSAC-fitting to 3D data, right – final pose estimation.

Figure 9: Illustration of RANSAC Limb Fitting algorithm. At the top: left – a pose of a human in the garment, right – “cloud of points”. At the bottom: left – the result of RANSAC-fitting to 3D data, right – final pose estimation.

4 RESULTS

Figures 6-9 demonstrate that the RANSAC-SQ-fitting algorithm works for the task of Human Body Pose Estimation. For the presented examples, the number of inliers is about 65% of approximately 2200 points of 3D raw data. The algorithm has been developed in MATLAB. The 3D data were captured by the multi-camera system and then processed offline. The described pose estimation technique has been tested on 119 frames of 3D video of human body movement, with encouraging results.

5 CONCLUSIONS

This paper describes a method for Human Body pose estimation from 3D data distributed in space as a "cloud of points". The proposed method is based on a RANSAC object search with robust least-squares fitting of SuperQuadric (SQ) models to 3D data. The solution is verified by evaluating the matching score between the SQ object model and real 3D data (acquired by multiple cameras and segmented by a dedicated clothing analysis). The method can be useful for applications dealing with 3D Human Body recognition, localization and pose estimation.

ACKNOWLEDGEMENTS

The work of Ilya Afanasyev on algorithms for 3D object recognition, localization and reconstruction was supported by a grant from the EU FP7 Marie Curie COFUND Trentino postdoc program, 2010-2013. The work on 3D data acquisition and segmentation was carried out in the framework of the VERITAS project, funded by the EU FP7. The authors are very grateful to colleagues from the Mechatronics department, namely Alberto Fornaser, for his help and support with 3D data acquisition.

REFERENCES

De Cecco M., Pertile M., Baglivo L., Lunardelli M., Setti F., Tavernini M., 2010. A unified framework for uncertainty, compatibility analysis, and data fusion for multi-stereo 3-D shape estimation. IEEE Transactions on Instrumentation and Measurement, 59(11).

De Cecco M., Paludet A., Setti F., Lunardelli M., Bini R., Tavernini M., Baglivo L., Kirchner M., Da Lio M., 2010. VERITAS poster at SIAMOC congress, Italy. http://veritas-project.eu/2010/10/veritas-presented-atsiamoc-congress/

Hofmann M., Gavrila D.M., 2009. Multi-view 3D Human Pose Estimation combining Single-frame Recovery, Temporal Integration and Model Adaptation. In CVPR: 2214-2221.

Jaklic A., Leonardis A., Solina F., 2000. Segmentation and Recovery of Superquadrics. Computational Imaging and Vision 20, Kluwer, Dordrecht.

Kovesi P., 2008. RANSAC software in MATLAB. www.csse.uwa.edu.au/~pk/research/matlabfns/

Lee M.W., Cohen I., 2004. Human Upper Body Pose Estimation in Static Images. In ECCV (2): 126-138.

Leonardis A., Jaklic A., Solina F., 1997. Superquadrics for Segmenting and Modeling Range Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11): 1289-1295. DOI: 10.1109/34.632988.

Ilya Afanasyev, Massimo Lunardelli, Nicolo' Biasi, Luca Baglivo, Mattia Tavernini, Francesco Setti and Mariolino De Cecco Department of Mechanical and Structural Engineering (DIMS), University of Trento, via Mesiano, 77, Trento, Italy {ilya.afanasyev, mariolino.dececco}@unitn.it

Keywords:

Superquadrics, RANSAC fitting, Human Body Pose Estimation, 3D object localization.

Abstract:

This paper presents a method for 3D Human Body pose estimation by using a multi-camera system. The pose is estimated by RANSAC-object search with a robust least square fitting of 3D points to SuperQuadric (SQ) models of the searched object. The solution is verified by evaluating the matching score between the SQ object model and 3D real data captured by a multi-camera system and segmented by a special preprocessing algorithm. This method can be used for 3D object recognition, localization and pose estimation of Human Body.

1

INTRODUCTION

The recovery of 3D human pose is an important problem in computer vision with many potential applications in robotics, motion analysis (sports, medical), animation, interactive games, and security surveillance. The task of 3D object recognition and pose estimation from unstructured 3D real data obtained by a multi-camera system can be solved different ways. Some authors propose to analyze a single-frame poses from individual camera views with subsequent generation of a shape hypothesis with hierarchical shape matching to 3D upper body model based on tapered SQ (Hofmann and Gavrila, 2009). Other authors use the idea of recovery of 3D upper human body from 2D static images by building an image generative model and using the MCMC (Markov Chain Monte Carlo) framework to search the solution space (Mun Wai Lee, 2004). For this purpose they employ an articulated structure human model (from 7 joints, 10 body parts and 21 degree of freedom), a probabilistic shape model, and clothing models (Mun Wai Lee, 2004). There is also a method of recovering an object by superquadric models with the recover-and-select paradigm (Jaklic, 2000 and Leonardis, 1997). For this purpose the authors fill a range images with a set of seeds (small SQ models) and then increase these seeds using growth iteration approach with

following selection of the suitable models. This approach was tried out on a wooden mannequin (Jaklic, 2000 and Leonardis, 1997). As far as a human body and limbs (in metric dimensions) can be modeled a priori, we propose to use a RANSAC-based model-fitting technique with a composite superquadric model. It is known that SQ models permit to describe complex-geometry objects with few parameters and generate simple minimization function to estimate an object pose. The logic of our 3D Human Body pose estimation algorithm is represented by the block-scheme in Figure 1. The object pose estimation starts with a preprocessing of “cloud of points” captured by multiple cameras. The preprocessing stage realizes segmentation of the Human Body into 9 parts (body, arms, forearms, hips and legs). Initially the algorithm recovers a position of a body as the largest object (“Body Pose Search”) and then uses the information about body position to restore other positions of human limbs (“Limbs Pose Search”). To cope with measurement noise and outliers, the object pose is estimated by RANSAC-SQ-fitting technique. We can control the quality of fitting by setting limbs (body) thresholds, which are a ratio of optimal quantity of inliers to a number of corresponding limb (body) points. The tests showed that as a result of the Body Pose Search we can obtain a hypothesis with a slightly wrong body position, which can satisfy to a body threshold, but can’t be applicable to overcome limb thresholds. For this reason, when a

number of limb inliers less than a limb threshold, we restart the Body Pose Search stage until finding suitable results of RANSAC-SQ-fitting for every

ε1

ε2

x a1 ⋅ signum(cosη ) ⋅ cosη ⋅ signum(cos ω ) ⋅ cos ω y = a ⋅ signum(cosη ) ⋅ cosη ε1 ⋅ signum(sin ω ) ⋅ sin ω ε 2 2 ε z a3 ⋅ signum(sin η ) ⋅ sin η 1

(1)

where x, y, z – superquadric system coordinates; a1, a2, a3 – parameters of object scaling; ε1, ε2 – parameters of object shape; η, ω – spherical coordinates. The object under investigation is a Human Body, which consists of 9 superquadrics – superellipsoids (Figure 2) with the shape parameters ε1 = ε2 = 0.5 and scaling parameters for different body parts: - Body: a1 = 0.095, a2 = 0.18, a3 = 0.275 (m). - Arms: a1 = a3 = 0.055, a2 = 0.15 (m). - Forearms: a1 = a3 = 0.045, a2 = 0.13 (m). - Hips: a1 = a2 = 0.075, a3 = 0.2 (m). - Legs: a1 = a2 = 0.05, a3 = 0.185 (m). The scaling parameters of SQ are presented in the metrical superquadric coordinate systems.

limbs. Figure 1: The block-scheme of 3D Human Body Pose Estimation algorithm.

2

OBJECT MODEL IN SUPERQUADRICS

Figure 2: Presentation of Human Body in 9 blocks: B – body, LA/RA – Left/Right Arms, LF/RF – Left/Right Forearms, LH/RH – Left/Right Hips, LL/RL – Left/Right Legs. Other abbreviations: LS – Left Shoulder, E – Elbow, ηLA – angle position of Left Shoulder, LHJ – Left Hip Joint, K – Knee, etc.

2.2 2.1

Human Body in SQ

SuperQuadric parameters

It is known (Jaklic, 2000 and Leonardis, 1997) that the explicit equation of superquadrics, which is usually used for SQ representation and visualization, is:

The position of Human Body is defined by the following rotation & translation sequences of the Body Superquadrics: 1. Translation of center of BODY (xc, yc, zc), along x, y, z-coordinates.

The rotation matrix of BODY RBODY is: 0 1 0 0 cos(α) − sin(α) RBODY= 0 sin(α) cos(α) 0 0 0

0 cos(β) 0 0 ⋅ 0 − sin(β) 1 0

0 cos(β) sin(β ) 0 0 cos(γ ) 0 − sin(β) cos(β) 0 0 0 ⋅ ⋅ 0 0 0 1 0 − sin(γ ) 1 0 0 0 1 0

0 1 0 0 cos(α) − sin(α) RLA = 0 sin(α) cos(α) 0 0 0

2. Rotation α among x (clockwise). 3. Rotation β among y (clockwise). 4. Rotation γ among z (clockwise).

0 sin(β) 0 cos(γ ) − sin(γ ) 1 0 0 sin(γ ) cos(γ ) ⋅ 0 cos(β) 0 0 0 0 0 1 0 0

0 sin(γ ) 0 1 0 0 . 0 cos(γ ) 0 0 0 1

(7)

The transformation Left Arm - Elbow (LA-E) is 0 0 0 0 1 0 0 1

1 0 TELA = 0 0

(2)

0 0 0 1 0 a2 . 0 1 0 0 0 1

(8)

The transformation matrix TBODY for the BODY is: 1 0 TBODY = RBODY ⋅ 0 0

0 0 xc 1 0 yc . 0 1 zc 0 0 1

(3)

The transformation Elbow - Left Forearm (E-LF) is created by 1. Rotation δLF among x (clockwise). 2. Translation of SQ center on -a2 along y. −1

2.3

0 0 0 1 0 cos(δ ) sin(δ ) − a 2 E E TLF = TLF (δ = δLF ) = . 0 − sin(δ ) cos(δ ) 0 0 0 1 0

Human Arms and Forearms in SQ

Let’s consider the transformation equations for Left Arm and Forearm. The position of Left Shoulder according to the center of the body coordinate system (Figure 2) is estimated by SQ explicit equation (1): 0 π ε P = Pη = η LA , ω = = a 2 ⋅ signum(cosη ) ⋅ cosη 1 2 a 3 ⋅ signum(sin η ) ⋅ sin η ε1 B S

1 0 T = 0 0 B LS

0 0 L 1 0 PSB . 0 1 L 0 0 1

(5)

0 0 0 1 0 a2 , 0 1 0 0 0 1

where RLA is the rotation matrix of Left Arm:

(

(4)

We can express the transformation: Left Shoulder - Left Arm (LS-LA) by the following rotation & translation sequences: 1. Rotation α among x (clockwise). 2. Rotation β among z (anticlockwise). 3. Rotation γ among y (clockwise). 4. Translation of SQ center on distance a2 along y. 1 0 TLALS = TLALS (α = αLA, β = βLA,γ = γ LA) = RLA ⋅ 0 0

Finally, taking into account equations (4)-(9), the full transformation for every point of system “Body - Left Forearm” (B-LF) can be calculated this way: PB = TLSB ⋅ TLALS ⋅ TELA ⋅ TLFE ⋅ PLF ,

Taking into account (4), the transformation Body - Left Shoulder (B-LS) will be:

(6)

(9)

)

−1

PLF = TLSB ⋅ TLALS ⋅ TELA ⋅ TLFE ⋅ PB.

(10)

where PB, PLF - coordinates of Body and Left Forearm points correspondingly (Figure 2). The main equations for Right Arm and Forearm are calculated the same way.

2.4

Human Hips and Legs in SQ

Analogically with previous equations (Section 2.3), the full transformation for every point of system “Body - Left Leg” (B-LL) is calculated this way: B LHJ PB = TLHJ ⋅ TLH ⋅ TKLH ⋅ TLLK ⋅ PLL,

(

)

−1

B LHJ PLL = TLHJ ⋅ TLH ⋅ TKLH ⋅ TLLK ⋅ PB.

(11)

where PB, PLL – coordinates of Body and Left Leg points respectively (Figure 2); T – corresponding transformations (12)-(15). The transformation Body – Left Hip Joint (BLHJ) is absolutely the same as TLSB from equation (5), except using the angle ηLL in the equation (4) for calculation of the Left Hip position.

The transformation Left Hip Joint – Left Hip (LHJ -LH) uses other rotation sequences and translation if compare with equations (6) and (7): 1. Rotation α among x (clockwise). 2. Rotation β among y (anticlockwise). 3. Rotation γ among z (clockwise). 4. Translation of SQ center on distance -a3 along z. 1 0 LHJ LHJ (α = αLH, β = βLH, γ = γ LH ) = RLH ⋅ TLH = TLH 0 0

0 1 0 0 , 0 1 − a3 0 0 1 0 0

(12)

where RLH is the rotation matrix of Left Hip: 0 cos(β) 0 0 ⋅ 0 sin(β) 1 0

0 1 0 0 cos(α) − sin(α) RLH = 0 sin(α) cos(α) 0 0 0

0 − sin(β) 0 cos(γ ) − sin(γ ) 1 0 0 sin(γ ) cos(γ ) ⋅ ( ) 0 cosβ 0 0 0 0 0 1 0 0

0 0 0 0 . 1 0 0 1

(13)

The transformation Left Hip - Knee (LH-K) is

LH K

T

1 0 = 0 0

0 0 . 0 1 − a3 0 0 1 0 0 1 0

(14)

The transformation Knee – Left Leg (K-LL) is created by:
1. Rotation δ_LL about y (clockwise).
2. Translation of the SQ center by a3 along z.

$$T^{K}_{LL}=T^{K}_{LL}(\delta=\delta_{LL})=\begin{pmatrix}\cos\delta&0&-\sin\delta&0\\0&1&0&0\\\sin\delta&0&\cos\delta&a_3\\0&0&0&1\end{pmatrix}.\tag{15}$$

Similar transformations for the Right Hip and Right Leg are described by almost identical equations.

3

3D OBJECT LOCALIZATION ALGORITHM

3.1

About sensors, object and data

The 3D cloud of points is captured with a multi-camera system developed at the University of Trento within the VERITAS project (De Cecco, Paludet et al., 2010). The system captures range images with 8 pairs of cameras that form a multiple stereo system measuring a 3D surface with superimposed colored markers, similar to the multi-camera system described in De Cecco, Pertile et al. (2010).

3.2

Preprocessing: Segmentation

The 3D cloud of points of the human body is segmented automatically on the basis of clothing analysis. We extract the clusters of the human body (the body and eight limbs: left/right arms, forearms, hips and legs) according to the special clothing marks on the garment (Figure 6). These marks generate color structures, which serve as predefined clothing models. The result of the clothing analysis is a segmentation matrix whose elements assign every data point to the body or to a particular limb. Experimental results show that this clothing-based segmentation is able to extract the limbs of the human body from range images under varying backgrounds and lighting conditions. The segmentation is completed by RANSAC fitting: markers near the body joints have an uncertain association with the limbs, and this uncertainty can be resolved by RANSAC-SQ fitting.

3.3

RANSAC algorithm

We use the RANSAC (RANdom SAmple Consensus) algorithm for SQ model fitting in body and limb pose estimation. For reference, the pseudocode of RANSAC is given in Figure 3. The number of iterations k performed by RANSAC can be determined from the following formula (Kovesi, 2008):

$$k=\frac{\log(1-p)}{\log\left(1-\left(\frac{inliers}{n}\right)^{s}\right)},\tag{16}$$

where p is the desired probability of choosing at least one sample free from outliers (in most applications p = 0.99), s is the number of points required to fit the model, and inliers/n is the fraction of inliers among the n data points. The success of RANSAC depends on choosing the models correctly. In our case this means a correct choice of the SQ parameters (the anthropometric sizes of the human body and limbs) and of the recognition logic (the sequence in which the body and limbs are fitted). An attempt to robustly recognize all parts of the human body simultaneously (in one stage) fails because of the large number of outliers. The test showed that acceptable RANSAC-SQ-fitting quality can be achieved with a two-stage human body pose estimation (Figure 1).
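Equation (16) can be evaluated directly. The helper below is a sketch with a hypothetical name; it rounds k up to an integer and, for illustration, uses an inlier fraction of 0.65 (close to the share reported in the results) with the sample sizes s = 6 (body) and s = 3 (limb).

```python
import math

def ransac_iterations(p: float, inlier_fraction: float, s: int) -> int:
    """Number of RANSAC trials k from equation (16), rounded up.

    p               -- desired probability that at least one sample is outlier-free
    inlier_fraction -- inliers / n, the expected fraction of inliers in the data
    s               -- number of points drawn per sample
    """
    if not 0.0 < inlier_fraction <= 1.0:
        raise ValueError("inlier_fraction must be in (0, 1]")
    p_clean = inlier_fraction ** s      # probability that one sample has no outliers
    if p_clean >= 1.0:
        return 1                        # every sample is clean
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_clean))

print(ransac_iterations(0.99, 0.65, 6))  # body fitting, s = 6  -> 59
print(ransac_iterations(0.99, 0.65, 3))  # limb fitting, s = 3  -> 15
```

The smaller sample size of the limb stage needs far fewer trials for the same confidence, which is one practical benefit of splitting the search into two stages.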

3.4

RANSAC model fitting

The logic of the RANSAC-SQ-fitting algorithm is given as pseudocode in Figures 4 and 5. We fit a model described by the superquadric implicit equation to the 3D data of a known object (selected by segmentation), starting with a RANSAC-based object search that produces pose hypotheses: 6 variables for the Body Pose Search (3 rotation angles and 3 translation coordinates in the world coordinate system) or 4 variables for the Limb Pose Search (4 rotation angles). Each RANSAC iteration starts by picking a random sample of points (s = 6 for Body fitting and s = 3 for Limb fitting). The SQ model is then fitted to this sample by minimizing the inside-outside function, which measures the distance to the SQ surface, with a nonlinear least-squares method (the Trust-Region or the Levenberg-Marquardt algorithm), and the inliers are identified by comparing the distances between the 3D points and the SQ model with an assigned threshold. To speed up fitting, the initial position of the minimization in world coordinates can be set to the center of gravity of the segmented data points. SQ models allow an object to be recovered from a "cloud of points" using a limited number of 3D points, regardless of whether these points lie on object edges and corners. The minimization with the Trust-Region or Levenberg-Marquardt algorithm is stable and is neither overly complex nor time-consuming. Figures 6-9 show results of the RANSAC-SQ-fitting algorithm.
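The inside-outside function and the per-point cost used for the inlier test in Figures 4 and 5 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the points are assumed to be already transformed into the SQ frame (the T^{-1} step of the figures is omitted), the nonlinear minimization itself is not shown, and absolute values are taken so that fractional exponents of negative coordinates stay real, which the paper's formula does not write explicitly.

```python
def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """SQ inside-outside function F_w for a point (x, y, z) in the SQ frame.

    F_w < 1 inside the SQ, F_w = 1 on its surface, F_w > 1 outside.
    abs() keeps fractional powers real (an assumption, see text).
    """
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)


def sq_cost(point, a, e1, e2):
    """Per-point cost d_i = a1*a2*a3*(F_w**e1 - 1)**2 as written in Figures 4 and 5."""
    a1, a2, a3 = a
    fw = inside_outside(*point, a1, a2, a3, e1, e2)
    return a1 * a2 * a3 * (fw ** e1 - 1.0) ** 2


def select_inliers(points, a, e1, e2, t):
    """Points (already in the SQ frame) whose cost d_i is below the threshold t."""
    return [p for p in points if sq_cost(p, a, e1, e2) < t]


# Ellipsoid (e1 = e2 = 1) with semi-axes 0.1, 0.1, 0.3 m; t = 2e-2 as in the figures.
pts = [(0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(select_inliers(pts, (0.1, 0.1, 0.3), 1.0, 1.0, 2e-2))  # -> [(0.1, 0.0, 0.0)]
```

The first point lies exactly on the ellipsoid surface (F_w = 1, cost 0), while the second lies far outside and is rejected.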

Algorithm RANSAC (x, fittingfn, distfn, s, t, Trials)
% x - a dataset of n observations x_1 ... x_n
% fittingfn - a function that fits a model M to a sample of x
% distfn - a function that computes the distance from a model M to a data point
% s - minimum number of data points required to fit the model M
% t - threshold on the distance data point - model
% Trials - maximum number of iterations
iter := 0        % count of iterations
bestM := 0       % the best model
inliers := {}    % accumulator for inliers
score := 0       % number of inliers of the best model
p := 0.99        % probability of drawing at least one sample without outliers
k := 1           % required number of iterations (dummy value, updated below)
while k > iter
  xs := random(x, s)          % randomly selected s points of the data
  M := fittingfn(xs)          % model parameters fitted to xs
  inl := {}
  for all x_i in x
    if distfn(M, x_i) < t
      add x_i to inl
    end if
  end for
  m := length(inl)            % number of inliers
  if m > score                % test how good the model is
    score := m
    inliers := inl
    bestM := M
    k := log(1 - p) / log(1 - (m / n)^s)
  end if
  increment iter
  if iter > Trials
    break
  end if
end while
return bestM, inliers

Figure 3: Pseudocode of RANSAC algorithm.
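Figure 3 translates almost line by line into Python. The sketch below is a generic RANSAC loop with the adaptive iteration count of equation (16). To keep it self-contained and testable it is demonstrated on a hypothetical 2D line-fitting problem rather than on SQ models, and all names are illustrative. The clamping of 1 − (m/n)^s is an added guard against log(0) that Figure 3 does not show.

```python
import math
import random


def ransac(data, fittingfn, distfn, s, t, trials, p=0.99, rng=None):
    """Generic RANSAC following Figure 3.

    Keeps the model with the most inliers and re-estimates the required
    number of iterations k with equation (16) whenever the best model improves.
    """
    rng = rng or random.Random()
    n = len(data)
    best_model, best_inliers, score = None, [], 0
    k, it = 1.0, 0                          # k: dummy start value, updated below
    while k > it:
        model = fittingfn(rng.sample(data, s))
        if model is not None:               # skip degenerate samples
            inl = [x for x in data if distfn(model, x) < t]
            if len(inl) > score:
                score, best_inliers, best_model = len(inl), inl, model
                # 1 - (inliers/n)^s, clamped so that log() stays finite
                q = min(max(1.0 - (score / n) ** s, 1e-12), 1.0 - 1e-12)
                k = math.log(1.0 - p) / math.log(q)    # equation (16)
        it += 1
        if it > trials:
            break
    return best_model, best_inliers


def fit_line(pts):
    """Hypothetical model: line a*x + b*y + c = 0 through two points, (a, b) unit."""
    (x1, y1), (x2, y2) = pts
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None                         # identical points: no unique line
    return (a / norm, b / norm, -(a * x1 + b * y1) / norm)


def line_dist(model, pt):
    """Perpendicular distance from pt to the (normalized) line."""
    a, b, c = model
    return abs(a * pt[0] + b * pt[1] + c)


rng = random.Random(0)
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]        # on y = 2x + 1
points += [(2.0, 40.0), (5.0, -30.0), (9.0, 60.0), (14.0, 0.0), (17.0, 90.0)]
model, inliers = ransac(points, fit_line, line_dist, s=2, t=0.1, trials=1000, rng=rng)
print(len(inliers))  # -> 20
```

Swapping `fit_line` and `line_dist` for an SQ fitting function and an SQ distance function gives the structure of Figures 4 and 5; the loop itself does not change.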

Algorithm RANSAC_Body_Fitting (x, t)
s = 6;         % min number of points to fit a SQ
t = 2·10^-2;   % threshold on the distance data point - SQ (2 cm)
% Trials - number of iterations in the algorithm
% x - a dataset of n body points x_i, each a vector of world coordinates (x_w, y_w, z_w)

% fittingfn - function to define the SQ position from a sample xs of s points
function fittingfn (xs)
  set x0 = 0;   % initial values of α, β, γ, x_c, y_c, z_c
  set SQ_BODY parameters a1, a2, a3, ε1, ε2
  for all x_i in xs

$$T_{BODY}=\begin{pmatrix}1&0&0&0\\0&\cos\alpha&-\sin\alpha&0\\0&\sin\alpha&\cos\alpha&0\\0&0&0&1\end{pmatrix}\cdot\begin{pmatrix}\cos\beta&0&-\sin\beta&0\\0&1&0&0\\\sin\beta&0&\cos\beta&0\\0&0&0&1\end{pmatrix}\cdot\begin{pmatrix}\cos\gamma&-\sin\gamma&0&0\\\sin\gamma&\cos\gamma&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\cdot\begin{pmatrix}1&0&0&x_c\\0&1&0&y_c\\0&0&1&z_c\\0&0&0&1\end{pmatrix};$$

$$F_{s,i}(x_{s,i},y_{s,i},z_{s,i})=T_{BODY}^{-1}\cdot\begin{pmatrix}x_{w,i}\\y_{w,i}\\z_{w,i}\\1\end{pmatrix};$$

$$F_{w,i}=\left[\left(\frac{F_s(x_{s,i})}{a_1}\right)^{2/\varepsilon_2}+\left(\frac{F_s(y_{s,i})}{a_2}\right)^{2/\varepsilon_2}\right]^{\varepsilon_2/\varepsilon_1}+\left(\frac{F_s(z_{s,i})}{a_3}\right)^{2/\varepsilon_1};$$

  end for
  calculate α, β, γ, x_c, y_c, z_c by minimizing

$$\min\sum_{i=1}^{n}F_i(x_i)^2=\min\sum_{i=1}^{n}\left(F_{w,i}^{\varepsilon_1}-1\right)^2$$

  return T_BODY

% distfn - function to select the distances from the SQ to x
function distfn (T_BODY, x)
  for all x_i in x
    compute F_s,i and F_w,i as in fittingfn

$$d_i=a_1\cdot a_2\cdot a_3\cdot\left(F_{w,i}^{\varepsilon_1}-1\right)^2$$

    if d_i < t then x_i ∈ inliers
  end for
  return inliers

Trials = 1000;   % number of iterations
start RANSAC (x, fittingfn, distfn, s, t, Trials)
return bestT_BODY, bestinliers

Figure 4: Pseudocode of RANSAC Body Fitting algorithm.

Algorithm RANSAC_Limb_Fitting (x, t)
s = 3;         % min number of points to fit a SQ
t = 2·10^-2;   % threshold on the distance data point - SQ (2 cm)
set T_BODY;    % from the Body Fitting algorithm
% x - a dataset of n limb points x_i, each a vector of world coordinates (x_w, y_w, z_w)

% fittingfn - function to define the SQ positions from s points of x
function fittingfn (x, T_BODY)
  set x0 = 0;   % initial values of α, β, γ, δ
  set a1-a3, ε1, ε2 for SQ_LA and SQ_LF
  set η_LA;     % angular position of the Shoulder
  for all x_i in x

$$T^{BODY}_{LS}(\eta_{LA})=\begin{pmatrix}1&0&0&0\\0&1&0&a_{B2}\cos\eta_{LA}\\0&0&1&a_{B3}\sin\eta_{LA}\\0&0&0&1\end{pmatrix};$$

    $T^{LS}_{LA}=R_{LA}\cdot T_{LA}$, where $R_{LA}$ is the product of the elementary rotations by α, β and γ about the coordinate axes and $T_{LA}$ is a translation by $a^{LA}$;
    $T^{LA}_{E}$ is a translation by $a^{LA}$ from the arm SQ to the elbow;
    $T^{E}_{LF}=T^{E}_{LF}(\delta=\delta_{LF})$ combines the rotation δ about x with the offset $-a^{LF}$ of the forearm SQ;

$$T^{LA}_{LIMB}=T_{BODY}\cdot T^{BODY}_{LS}\cdot T^{LS}_{LA};\qquad T^{LF}_{LIMB}=T_{BODY}\cdot T^{BODY}_{LS}\cdot T^{LS}_{LA}\cdot T^{LA}_{E}\cdot T^{E}_{LF};$$

$$F^{LA}_{s,i}(x_{s,i},y_{s,i},z_{s,i})=\left(T^{LA}_{LIMB}\right)^{-1}\cdot\begin{pmatrix}x_{w,i}\\y_{w,i}\\z_{w,i}\\1\end{pmatrix};\qquad F^{LF}_{s,i}(x_{s,i},y_{s,i},z_{s,i})=\left(T^{LF}_{LIMB}\right)^{-1}\cdot\begin{pmatrix}x_{w,i}\\y_{w,i}\\z_{w,i}\\1\end{pmatrix};$$

    compute F_w,i(LA) and F_w,i(LF) with the SQ inside-outside function of Figure 4, using the parameters of SQ_LA and SQ_LF
  end for
  calculate α, β, γ, δ by minimizing

$$\min\sum_{i=1}^{n}F_i(x_i)^2=\min\sum_{i=1}^{n}\left[\left(F_{w,i}(LA)^{\varepsilon_1^{LA}}-1\right)\left(F_{w,i}(LF)^{\varepsilon_1^{LF}}-1\right)\right]^2$$

  return α, β, γ, δ

% distfn - function to select the distances from the SQs to x
function distfn (x, T_BODY, α, β, γ, δ)
  for all x_i in x
    compute F^LA_s,i, F^LF_s,i and F_w,i as in fittingfn

$$d_i=a_1\cdot a_2\cdot a_3\cdot\left(F_{w,i}^{\varepsilon_1}-1\right)^2$$

    if d_i < t then x_i ∈ inliers
  end for
  return inliers

Figure 5: Pseudocode of RANSAC Limb Fitting algorithm on the example of the Left Arm (LA) and Forearm (LF) limbs.
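In the limb fitting of Figure 5 the residual of a point is the product of the arm and forearm inside-outside terms, so a point lying on either SQ contributes zero. The sketch below illustrates only that property, under stated assumptions: both SQs are placed by pure translations (a hypothetical stand-in for the (T_LIMB)^{-1} transforms), the dictionary layout, names and sizes are illustrative, and absolute values keep the fractional powers real.

```python
def inside_outside(p, a, e1, e2):
    """SQ inside-outside function F_w of a point p given in the SQ frame."""
    x, y, z = p
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2 / e1)


def to_sq_frame(p, center):
    """Hypothetical stand-in for (T_LIMB)^-1: a pure translation, no rotation."""
    return tuple(pi - ci for pi, ci in zip(p, center))


def limb_residual(p, arm, forearm):
    """F_i = (F_w(LA)**e1_LA - 1) * (F_w(LF)**e1_LF - 1), as in Figure 5."""
    f_la = inside_outside(to_sq_frame(p, arm["center"]), arm["a"], arm["e1"], arm["e2"])
    f_lf = inside_outside(to_sq_frame(p, forearm["center"]), forearm["a"],
                          forearm["e1"], forearm["e2"])
    return (f_la ** arm["e1"] - 1.0) * (f_lf ** forearm["e1"] - 1.0)


# Illustrative ellipsoidal arm and forearm laid out along y (made-up sizes in metres).
arm = {"center": (0.0, 0.25, 0.0), "a": (0.05, 0.25, 0.05), "e1": 1.0, "e2": 1.0}
forearm = {"center": (0.0, 0.75, 0.0), "a": (0.05, 0.25, 0.05), "e1": 1.0, "e2": 1.0}

print(limb_residual((0.05, 0.25, 0.0), arm, forearm))  # on the arm surface
print(limb_residual((0.0, 1.0, 0.0), arm, forearm))    # on the forearm tip
print(limb_residual((0.5, 0.5, 0.5), arm, forearm))    # far from both SQs
```

Because of the product, the two-SQ limb model does not need to know in advance which of the two segments a point belongs to, which fits the uncertain point-to-limb association near the joints mentioned in the segmentation step.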

Figure 6: Illustration of the RANSAC Limb Fitting algorithm. Top: left, the pose of a person wearing the garment; right, the "cloud of points". Bottom: left, the result of RANSAC fitting to the 3D data; right, the final pose estimate.

Figure 7: Illustration of the RANSAC Limb Fitting algorithm. Top: left, the pose of a person wearing the garment; right, the "cloud of points". Bottom: left, the result of RANSAC fitting to the 3D data; right, the final pose estimate.

Figure 8: Illustration of the RANSAC Limb Fitting algorithm. Top: left, the pose of a person wearing the garment; right, the "cloud of points". Bottom: left, the result of RANSAC fitting to the 3D data; right, the final pose estimate.

Figure 9: Illustration of the RANSAC Limb Fitting algorithm. Top: left, the pose of a person wearing the garment; right, the "cloud of points". Bottom: left, the result of RANSAC fitting to the 3D data; right, the final pose estimate.

4

RESULTS

Figures 6-9 show the workability of the RANSAC-SQ-fitting algorithm (Figure 6) for human body pose estimation. In the presented examples, inliers make up about 65% of the approximately 2200 points of raw 3D data. The algorithm has been developed in MATLAB. The 3D data have been captured with a multi-camera system and then processed offline. The described pose estimation technique has been tested on 119 frames of 3D video of human body movement, with encouraging results.

5

CONCLUSIONS

This paper describes a method for human body pose estimation from 3D data distributed in space as a "cloud of points". The proposed method is based on a RANSAC object search with robust least-squares fitting of SuperQuadric (SQ) models to the 3D data. The solution is verified by evaluating the matching score between the SQ object model and the real 3D data (acquired by multiple cameras and segmented by a special clothing analysis). The method can be useful for applications involving 3D human body recognition, localization and pose estimation.

ACKNOWLEDGEMENTS

This work of Ilya Afanasyev on algorithms for 3D object recognition, localization and reconstruction was supported by a grant of the EU FP7 Marie Curie COFUND - Trentino postdoc program, 2010-2013. The work on 3D data acquisition and segmentation was carried out within the VERITAS project, funded by the EU FP7. The authors are very grateful to colleagues from the Mechatronics department, namely Alberto Fornaser, for his help and support with 3D data acquisition.

REFERENCES

De Cecco M., Pertile M., Baglivo L., Lunardelli M., Setti F., Tavernini M., 2010. A unified framework for uncertainty, compatibility analysis, and data fusion for multi-stereo 3-D shape estimation. IEEE Transactions on Instrumentation and Measurement, Vol. 59, No. 11.

De Cecco M., Paludet A., Setti F., Lunardelli M., Bini R., Tavernini M., Baglivo L., Kirchner M., Da Lio M., 2010. VERITAS poster at SIAMOC congress, Italy. http://veritas-project.eu/2010/10/veritas-presented-atsiamoc-congress/

Hofmann M., Gavrila D.M., 2009. Multi-view 3D Human Pose Estimation combining Single-frame Recovery, Temporal Integration and Model Adaptation. In CVPR, pp. 2214-2221.

Jaklic A., Leonardis A., Solina F., 2000. Segmentation and Recovery of Superquadrics. Computational Imaging and Vision 20, Kluwer, Dordrecht.

Kovesi P., 2008. RANSAC software in MATLAB. www.csse.uwa.edu.au/~pk/research/matlabfns/

Leonardis A., Jaklic A., Solina F., 1997. Superquadrics for Segmenting and Modeling Range Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11), pp. 1289-1295. DOI: 10.1109/34.632988.

Mun Wai Lee, Cohen I., 2004. Human Upper Body Pose Estimation in Static Images. In ECCV (2), pp. 126-138.