

QoE Prediction Model and its Application in Video Quality Adaptation Over UMTS Networks
Asiya Khan, Lingfen Sun, and Emmanuel Ifeachor

Abstract—The primary aim of this paper is to present a new content-based, non-intrusive quality of experience (QoE) prediction model for low bitrate, low resolution (QCIF) H.264-encoded videos and to illustrate its application in video quality adaptation over Universal Mobile Telecommunication Systems (UMTS) networks. The success of video applications over UMTS networks depends heavily on meeting the QoE requirements of users. Thus, it is highly desirable to be able to predict and, if appropriate, control video quality to meet such requirements. Video quality is affected by distortions caused both by the encoder and by the UMTS access network. The impact of these distortions is content dependent, but this feature is not widely used in non-intrusive video quality prediction models. In the new model, we chose four key parameters that can impact video quality and hence the QoE: content type, sender bitrate, block error rate and mean burst length. The video quality was predicted in terms of the mean opinion score (MOS). Subjective quality tests were carried out to develop and evaluate the model. The performance of the model was evaluated with an unseen dataset, with good prediction accuracy. The model also performed well with the LIVE database, which was recently made available to the research community. We illustrate the application of the new model in a novel QoE-driven adaptation scheme at the pre-encoding stage in a UMTS network. Simulation results in NS2 demonstrate the effectiveness of the proposed adaptation scheme, especially at the UMTS access network, which is a bottleneck. An advantage of the model is that it is lightweight (so it can be implemented for real-time monitoring) and that it provides a measure of user-perceived quality without requiring time-consuming subjective tests. The model has potential applications in several other areas, including QoE control and optimization in network planning and content provisioning for network/service providers.
Index Terms—Content types, MOS, non-intrusive model, NS2, QoE, UMTS, video quality prediction and adaptation.



Manuscript received January 05, 2011; revised May 31, 2011 and October 29, 2011; accepted November 01, 2011. Date of publication November 16, 2011; date of current version March 21, 2012. This work was supported in part by the EU FP7 ADAMANTIUM project (contract No. 214751). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. James E. Fowler. The authors are with the Centre for Signal Processing and Multimedia Communication, School of Computing and Mathematics, University of Plymouth, Plymouth PL4 8AA, U.K. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2011.2176324

I. INTRODUCTION

TRANSMISSION of video content over Universal Mobile Telecommunication Systems (UMTS) networks is growing exponentially and gaining popularity. Digital videos are now available everywhere, from handheld devices to personal computers. However, due to the bandwidth constraints of UMTS networks, quality of experience (QoE) remains a concern. Low video quality leads to poor QoE, which in turn leads to reduced usage of the applications/services and hence reduced revenues. User expectations of video quality on handheld and mobile terminals are rising towards broadcast level [1], [2]. In order to meet the QoE requirements of users, there is a need to predict, monitor and, if necessary, control video quality. Non-intrusive models provide an effective and practical way to achieve this [3]. However, research on video quality modeling is still limited. ITU-T Study Group 9 [4] is working on the standardization of non-intrusive video quality modeling and has recently produced a draft test plan for hybrid perceptual/bitstream models for IPTV and mobile video streaming applications. Existing non-intrusive models do not take into account the impact of several important parameters, such as the encoder and the video content, which affects the video quality achievable under the same network conditions. Video content features (such as blurriness and brightness) can be extracted from the video signal before encoding. Existing video quality prediction algorithms tend to consider video content features, distortions caused by the encoder, or network impairments, but rarely all three. In addition, they are restricted to IP networks. However, with the growth of video services over wireless access networks, it is important to take into account impairments that occur in the access network. Video quality can be measured in an intrusive (full reference) or non-intrusive (reference free) way. Intrusive prediction models require access to the source, whereas non-intrusive models do not and hence are preferred for online monitoring, prediction and control.
Video quality prediction models presented in [5]–[8] consider encoder-based distortions only, and those in [9] and [10] are based on video content features. The model presented in [9], called the MOVIE index, is a full reference model developed from the spatio-temporal features of the video, whereas the model presented in [10] is reference free. Full reference video quality prediction models are difficult to implement for real-time monitoring due to their complexity. Work presented in [11] uses bitstream measurements only to monitor video quality. The model presented in [12] combines video content features with the distortions caused by the encoder. Work in [13] presents a metric that measures temporal quality degradation caused by regular and irregular frame loss. Several models [14], [15] have also been developed to predict video quality over IP networks from network distortions only, e.g., packet loss, delay and jitter. Work presented in [16] proposes a metric that models multiple packet losses in H.264 videos using reduced reference methods, but does not consider either encoder-based distortions or video content features.

1520-9210/$26.00 © 2011 IEEE

Work presented in [17] uses neural networks to assess video quality based on a combination of network and encoder parameters, but this is restricted to IP networks with limited content. In [18], a model for videophone applications is proposed. The model uses two encoder parameters (sender bitrate and frame rate) and packet loss from the network. The model is for videophone services only (i.e., content with head and shoulder movement) with high sender bitrates over IP networks. In [19], a video quality estimation model is proposed for IPTV applications; it combines network and encoder parameters. Recent work on video quality assessment [20]–[22] has shown that video quality is affected by parameters associated with the encoder (e.g., sender bitrate) and the network (e.g., packet loss). In [23], the evolution of video quality metrics is reviewed and the state of the art is discussed; a metric that combines network losses with video bitstream information is also presented. In a previous paper [24], we showed that video quality is impacted by distortions caused by the encoder and the access network, and that these distortions are very much content dependent. Work presented in [25] also concluded that content type was the second most important QoS parameter after encoder type and has a significant impact on video quality. In [26], we proposed a video quality prediction model over UMTS networks that combines parameters associated with the encoder and the UMTS access network, but it is based on a peak-signal-to-noise-ratio (PSNR) to MOS conversion, which may not adequately reflect visual quality. What is needed is a model based on subjective tests, so that the quality predicted by the model is closely linked to user-perceived quality.
The model should be efficient, lightweight and suitable for all types of video content, so that it can be implemented at the receiver side to monitor and predict quality and, if appropriate, control the end-to-end perceived quality. Thus, the focus of this study is to develop a new content-based, non-intrusive video quality prediction model that takes into account distortions caused by both the UMTS access network and the encoder. Our work focuses on low bitrate videos encoded with the H.264 codec and transmitted over a UMTS access network, taking content type into account. The most significant content types were identified in a previous study [27] and were classified into groups with a cluster analysis tool [28], using a combination of temporal (movement) and spatial (edges, blurriness, brightness) features. As part of the model development, we conducted subjective tests with different test scenarios. The test cases were prepared by considering the distortions introduced by the encoder and the UMTS access network for different types of content (from head and shoulders to fast-moving sports). To demonstrate its application, the new model was used to control and optimize QoE. This application is of interest in its own right, as the optimization of QoE is crucial for mobile multimedia design and delivery. The idea is to move away from using individual network parameters, such as block loss or delay, to control performance, and instead to move towards perceptual-based video quality control in order to achieve the best possible end-to-end video quality [10], [29], [30]. In this study, we used fuzzy techniques to perform adaptation, as they provide a natural way to perform control and have been used successfully in network adaptation schemes for video applications [31]–[33]. In particular, they have been used in adaptive feedback for packet loss rate and congestion notification from routers [31], adaptive control of video bit rate [32], and control algorithms for variable bit rate applications [33].

The main contributions of the paper are twofold:
• A new and efficient model to predict video quality over UMTS networks non-intrusively. The model uses a combination of parameters associated with the encoder and the UMTS access network for different types of content.
• Application of the new model in QoE control using a new sender bitrate adaptation scheme at the pre-encoding stage.

The rest of the paper is organized as follows. Section II presents the new model. In Section III, we demonstrate the application of the proposed model in QoE control. Section IV concludes the paper and highlights areas of future work.

II. QoE PREDICTION MODEL FOR H.264 VIDEO

In this section, we present the development of the non-intrusive, content-based QoE prediction model for low bitrate H.264 video for mobile streaming applications. Subsection A describes the generation of the data sets used to develop the model, and subsection B describes the subjective tests. Data analysis and modeling are presented in subsection C, and model validation with external databases is presented in subsection D.

A. Data Set Generation

The content of the source clips and the choice of codec were chosen to be representative of a typical scenario for watching video on mobile devices. The test material comprises six clips: three chosen for model training and three for validation. The video sequences range from low to high spatio-temporal (ST) activity, as classified in a previous study [27].
The videos were encoded with the H.264/AVC codec [34], as it is the recommended codec for video transmission over UMTS networks. The frame structure is IPPP for all the sequences, since the use of periodic IDR frames (Instantaneous Decoding Refresh frames in H.264) could cause sudden data rate increases and delays. The video content, description and duration of the clips are summarized in Table I. The video frame size was QCIF (176 × 144). QCIF was specifically chosen (instead of, for example, CIF or larger sizes) as it is the recommended size for mobile phones and small handheld terminals, which are the target application areas of the study. However, newer smartphones have a resolution of 320 × 240, or even higher on, e.g., iPads. Thus, the QCIF resolution may seem small, and in this sense our study reflects the worst-case scenario. All the generated video clips can be downloaded from [35]. Snapshots of the video clips are depicted in Fig. 1.

Fig. 1. Snapshot of the training and validation sequences.

A combination of parameters associated with the encoder and the UMTS access network for different content types was chosen (see Table II). We chose six content types (CTs): three for model training and three for validation. The Akiyo, Foreman and Stefan sequences were used for training, and the Suzie, Carphone and Football sequences for validation of the model. The frame rate (FR) was fixed at 10 fps, as this is typical in mobile video streaming [2], [6], [12]. We found that SBR has a greater impact on quality than FR, and hence SBR was chosen as the encoder parameter. The videos were encoded at three constant bitrate (CBR) SBR values of 48, 88 and 128 kbps for the training videos and at 90 and 130 kbps for the validation videos. The UMTS access network parameters chosen were block error rate (BLER) and mean burst length (MBL). In the access network, we found that BLER is the most important parameter [24]; MBL was chosen to account for more bursty scenarios. The videos were then sent over an OPNET [36] simulated UMTS network to create conditions with BLER of 1%, 5%, 10%, 15% and 20%. A BLER of 20% corresponds to an IP loss of 2%–3%, so the quality was not degraded beyond 20% BLER. We found that a BLER of 1%–5% corresponded to no IP loss. OPNET was used to analyze the specific impact of UMTS error conditions on perceived video quality because of its accurate implementation of the radio link control (RLC) not-in-order delivery mechanism. Errors simulated in the physical layer (BLER) were used to generate losses at the link layer, modeled with a 2-state Markov model [37] with variable MBLs [38] to depict the various UMTS scenarios. The 2-state Markov model is depicted in Fig. 2.

Fig. 2. Two-state Markov loss model.
According to this model, the network is either in a good (G) state, where all packets are correctly delivered, or in a bad (B) state, where all packets are lost. Transitions between the states occur with probability $1-p$ (from G to B) and $1-q$ (from B to G), where $p$ and $q$ are the probabilities of remaining in G and in B, respectively. The average block error rate and mean burst length can then be expressed as $\mathrm{BLER} = \frac{1-p}{2-p-q}$ and $\mathrm{MBL} = \frac{1}{1-q}$. If $q = 0$ (i.e., MBL = 1), this reduces to a random error model, with the only difference that the loss of two consecutive packets is not allowed. One MBL value is selected based on the mean error burst length found in [38] for typical roaming scenarios from real-world UMTS measurements, a larger MBL is used for a scenario where more bursty errors are found, and MBL = 1 corresponds to the random uniform error model.
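The relationship between the transition probabilities and the two access network parameters can be checked numerically. The following is a minimal Python sketch, assuming the standard two-state parameterization in which, writing P(G→B) and P(B→G) for the two transition probabilities, BLER = P(G→B)/(P(G→B) + P(B→G)) and MBL = 1/P(B→G). The function names and the BLER/MBL values used below are hypothetical illustrations; this is not the OPNET configuration used in the study.

```python
import random


def transition_probs(bler: float, mbl: float) -> tuple[float, float]:
    """Return (p_gb, p_bg) = (P(G->B), P(B->G)) giving the target BLER and MBL.

    Uses MBL = 1 / p_bg and BLER = p_gb / (p_gb + p_bg) (steady-state share of B).
    """
    if not 0.0 < bler < 1.0 or mbl < 1.0:
        raise ValueError("need 0 < BLER < 1 and MBL >= 1")
    p_bg = 1.0 / mbl                   # a burst ends with probability p_bg per block
    p_gb = bler * p_bg / (1.0 - bler)  # solve BLER = p_gb / (p_gb + p_bg) for p_gb
    if p_gb > 1.0:
        raise ValueError("this BLER/MBL pair is not reachable")
    return p_gb, p_bg


def simulate_losses(n_blocks: int, bler: float, mbl: float, seed: int = 0) -> list[bool]:
    """Simulate per-block loss flags (True = block lost) from the 2-state chain."""
    p_gb, p_bg = transition_probs(bler, mbl)
    rng = random.Random(seed)
    bad = rng.random() < bler  # start from the stationary distribution
    losses = []
    for _ in range(n_blocks):
        losses.append(bad)  # state B: block lost, state G: block delivered
        if bad:
            bad = rng.random() >= p_bg  # stay in B unless we jump back to G
        else:
            bad = rng.random() < p_gb  # jump from G to B with probability p_gb
    return losses


def measure(losses: list[bool]) -> tuple[float, float]:
    """Empirical BLER and mean burst length of a loss trace."""
    lost = sum(losses)
    bursts = sum(1 for i, x in enumerate(losses) if x and (i == 0 or not losses[i - 1]))
    return lost / len(losses), (lost / bursts if bursts else 0.0)


if __name__ == "__main__":
    trace = simulate_losses(200_000, bler=0.10, mbl=2.0, seed=1)
    print(measure(trace))  # should be close to (0.10, 2.0) for a long trace
```

With MBL = 1 the chain always leaves B after one block, so no two consecutive blocks are lost, which matches the special case described in the text.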

Fig. 3. Experimental setup for data generation.

All the chosen test conditions were sent over the simulated network to generate test data with network impairments. This was done specifically because only limited work in the literature on video quality assessment takes network errors into account. A contribution of the paper is to examine the combined effects of distortions caused by both the encoder and access network impairments on end-to-end quality. The video test conditions are described in Table II, and the experimental setup is shown in Fig. 3. In total, 81 sequences were generated for training and 54 for model validation.

B. Subjective Tests

The subjective quality assessment experiment follows ITU-T Recommendations [39] and was conducted using the single-stimulus absolute category rating (ACR) method with a five-point quality scale [39]. Each processed video is presented one at a time and rated individually. The presentation order was randomized so that each viewer saw the test sequences in a different order. The videos were rated independently on a discrete five-level scale from “bad” (1) to “excellent” (5). The ratings for each test clip were then averaged over all subjects to obtain a mean opinion score (MOS) [40]. The voting period was not time-limited. After choosing their quality rating, assessors had to confirm their choice using the “submit” button, which gave them the possibility to change their mind before committing to their final vote. Viewing distance was not fixed: participants were allowed to adjust to their most comfortable viewing distance, although they were instructed to keep their back in contact with the chair. The laboratory had a calibrated 20-inch LCD computer monitor (Philips 200 WB7) to display the video sequences, with a native resolution of 1280 × 1024 pixels and color quality set to the highest setting (32 bit). The room had a white background. The sequences were displayed in their original size with a grey border. Participants provided their ratings electronically using the computer mouse. A total of 20 naïve viewers participated in the experiment: 11 males and 9 females. This conforms to the minimum number of viewers specified by ITU-T Recommendations [39].




Fig. 4. Histogram of subjective MOS (dashed line represents median at 3.7).

Of the participants, 14 were aged between 18 and 25, 4 between 25 and 30, and 2 were over 35. Participants were recruited from within the University. They were first presented with three training sequences that were different from the test sequences. The experiments were divided into two sessions with a 10–15 min comfort break between them, over three days. This adhered to the ITU-T recommendation that a session should not exceed half an hour. An informal survey conducted after the tests regarding the length of the study, fatigue during the tests, etc. indicated that none of the participants experienced fatigue or discomfort during the tests. All subjects assessed all degraded video sequences in the test. The MOS data obtained from the test were scanned for unreliable and inconsistent results. We used the ITU-T [39] criteria for screening subjective ratings, which led to three subjects being rejected. The scores from the remaining subjects were averaged to compute the overall MOS for each test condition. Fig. 4 shows the histogram of subjective quality ratings for both the training and validation datasets and indicates that the MOS distribution is biased towards high MOS values. This is because subjects gave higher votes to sequences with low to medium movement. Also, in the test conditions the data were not degraded beyond 3% IP loss, so no sequence was rated as “bad” (see later Section IV).

C. Data Analysis and Modeling

We analyzed the impact on end-to-end video quality of the four chosen parameters that affect QoE: sender bitrate, content type, block error rate and mean burst length. This enabled us to establish a relationship between these four parameters and MOS, which is used in the regression modeling (see later).
We performed a 4-way repeated-measures analysis of variance (ANOVA) [28] on our training dataset to determine whether the MOS means differ when grouped by the four QoE parameters (i.e., to assess the impact of all four parameters on MOS). ANOVA also enabled us to understand the interactions between the four variables and hence their relationships in the regression modeling.
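As a simplified illustration of the F statistic reported in an ANOVA table, the sketch below computes a one-way ANOVA F value for MOS scores grouped by a single factor (for example, content type). It is not the 4-way repeated-measures analysis used in the study, which also models interactions between factors; the function name and the example MOS values are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds the MOS values for each level of one factor,
    e.g., one list per content type.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more scores than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each score around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical MOS values for three content types (slow, medium, fast motion).
    mos_by_ct = [[4.1, 4.0, 3.9], [3.6, 3.4, 3.5], [2.9, 3.1, 2.6]]
    f_stat, df1, df2 = one_way_anova_f(mos_by_ct)
    print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

The corresponding p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f.sf`) can evaluate.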

Table III shows the results of the analysis. The fourth column shows the F statistic and the fifth column gives the p-value, which is derived from the cumulative distribution function (cdf) of F [28]. A small p-value indicates that MOS is significantly affected by the corresponding parameter. From the results, we conclude that MOS is significantly affected by CT, SBR and BLER, whereas MBL is less significant. However, there were interaction effects between each pair of parameters. The two-way interaction between CT and SBR has a greater impact than that between CT and BLER. This is because only a limited range of BLER values was considered in our data, and BLER had an impact only on fast-moving content; as long as the IP packet error probability remains unchanged, the impact of link layer losses on end-to-end video quality is negligible. Among the three-way interactions, the combinations of CT, SBR and MBL and of SBR, BLER and MBL have a more significant impact than the other two. The two-way interactions capture the three-way impact as well. We can summarize our findings as follows.
1) The most important QoE parameter in the application layer is content type. Therefore, an accurate video quality prediction model should consider all content types.
2) The optimal SBR that gives the best quality is very much content dependent and varies from sequence to sequence. We found that for slow-moving content, a relatively low SBR gave acceptable quality [40]. However, as the spatio-temporal activity of the content increased, quality became unacceptable even with no network impairment. Hence, the choice of SBR is very much dependent on the type of content.
3) The most important QoE parameter in the UMTS access network is BLER. Therefore, an accurate video quality prediction model should consider the impact of access network parameters in addition to encoder parameters.
4) The impact of the access network parameters MBL and BLER varies with the type of content. For slow-moving content, a BLER of 20% gave acceptable quality. However, for fast-moving content, the quality at the same BLER may not be acceptable. Therefore, the impact of access network QoE parameters is very much content dependent.



Fig. 5. Pie chart representation of the ranking of QoE parameters.

Fig. 6. Functional block of proposed regression-based model.

Based on our findings, we ranked the four QoE parameters in order of importance, with the most important ranked as 1 and the least important as 4. The ranking is based on the results of the ANOVA analysis carried out earlier. Our analysis showed that CT is the most important QoE parameter, as the impacts of SBR and BLER are very much content dependent, and that MBL is the least important. The degree of importance of each QoE parameter is depicted as a pie chart in Fig. 5. We note from Fig. 5 that BLER (25%) and MBL (22%) are close in importance, as are BLER (25%) and SBR (24%). The pie chart takes as its input the interactions of the four QoE parameters. The ANOVA analysis in Table III confirmed that there are interactions between the four QoE parameters. This enabled us to carry out a nonlinear regression analysis on the datasets using different polynomial and rational equations to capture these effects in a model. The functional block of the proposed model is shown in Fig. 6. The application layer parameters considered are content type (CT) and sender bitrate (SBR). The UMTS access network parameter in the physical layer is BLER, modeled with a 2-state Markov model with varying MBLs. We plotted 2-D graphs to analyze the relationships between MOS, sender bitrate, content type, block error rate and mean burst length [see Fig. 7(a)–(c)]. Based on the relationships of the QoE parameters, we established the following function for estimating the overall video quality:
(1)

Fig. 7. Relationships of the QoS parameters with MOS.

where (1) comprises a constant, a component measured in terms of SBR and CT, and a component measured in terms of BLER and MBL for the 2-state Markov model.





In our previous work [27], we extracted the temporal feature of sum of absolute differences (SAD) and the spatial features of edge, blurriness and brightness, thus giving CT as a function of these features, as shown in (2):
$\mathrm{CT} = f(\mathrm{SAD}, \mathrm{Edge}, \mathrm{Blurriness}, \mathrm{Brightness})$ (2)
CT is then predicted using cluster analysis [28] as one of three discrete values: 0.1 (Suzie), 0.5 (Carphone) and 0.9 (Football). See Section III for details.

1) Content Type and Encoder: From the ANOVA analysis and Fig. 7(a), we established the relationship between MOS, SBR and CT. We found a logarithmic relationship between SBR and MOS and a linear relationship between MOS and CT. Also, from the ANOVA analysis (Table III), we found that the combined impact of SBR and CT is significant. Mathematically, this is shown by (3):
(3)

2) UMTS Access Network: Similarly, in the UMTS access network, we found that the relationships between MOS and BLER and MBL, shown in Fig. 7(b) and (c), can be modeled as a polynomial function, as shown in (4):
(4)

Based on the relationships obtained for the four chosen QoE parameters from the ANOVA analysis and Fig. 7(a)–(c), we found the following rational model from nonlinear regression analysis of the subjective data using MATLAB:

(5)
The values of the coefficients of the model are listed in Table IV. Equation (5) can be readily extended to include FR as a variable to account for the impact of frame rate on quality. The coefficients for the QoE prediction model are given in Table V along with the correlation coefficient and root mean squared error (RMSE). Fig. 8 shows the resulting scatter plot of subjective data against model prediction. We achieved a correlation coefficient of around 93% for the validation dataset and 95.6% for the training dataset, with 81 test conditions for training and 54 for validation.
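The two accuracy figures used to report model performance, the correlation coefficient and the RMSE between subjective and predicted MOS, can be computed as in the minimal sketch below. It assumes a plain Pearson correlation with no nonlinear mapping applied to the predictions first; the function name and the sample MOS values are hypothetical.

```python
from math import sqrt


def accuracy(mos_subjective: list[float], mos_predicted: list[float]) -> tuple[float, float]:
    """Return (Pearson correlation, RMSE) between subjective and predicted MOS."""
    if len(mos_subjective) != len(mos_predicted) or len(mos_subjective) < 2:
        raise ValueError("need two equally long lists with at least two entries")
    n = len(mos_subjective)
    mean_s = sum(mos_subjective) / n
    mean_p = sum(mos_predicted) / n
    pairs = list(zip(mos_subjective, mos_predicted))
    # Pearson correlation: covariance normalized by both standard deviations.
    cov = sum((s - mean_s) * (p - mean_p) for s, p in pairs)
    var_s = sum((s - mean_s) ** 2 for s in mos_subjective)
    var_p = sum((p - mean_p) ** 2 for p in mos_predicted)
    r = cov / sqrt(var_s * var_p)
    # RMSE: typical size of the prediction error on the MOS scale.
    rmse = sqrt(sum((s - p) ** 2 for s, p in pairs) / n)
    return r, rmse


if __name__ == "__main__":
    subjective = [4.2, 3.8, 3.1, 2.5, 3.6]  # hypothetical MOS values
    predicted = [4.0, 3.9, 3.3, 2.4, 3.4]   # hypothetical model outputs
    r, rmse = accuracy(subjective, predicted)
    print(f"r = {r:.3f}, RMSE = {rmse:.3f}")
```

A correlation near 1 says the model ranks conditions like the viewers do, while the RMSE says how far, in MOS points, individual predictions typically fall from the subjective scores.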

Fig. 8. Scatter plot of subjective video quality against quality prediction from model.

D. Validation of the Proposed Model

This section presents the validation of the proposed model against the external MOS databases in [25] and [41]. The database in [25] considers H.264 encoder-based distortions only (i.e., no network-dependent distortions). The data are for H.264-encoded QCIF videos with the encoder parameters SBR and FR, covering five video sequences. We chose the encoder-only parameters and CT to show that if network losses are taken as zero (BLER = 0 in (5)), the QoE model gives a correlation coefficient of 78%. In this case, (5) reduces to (6). Fig. 9(a) shows the model validation results on the external database from [25]:
(6)

The LIVE database in [41] considers H.264 encoder-based distortions along with network packet losses for ten types of video sequences. We used the LIVE database, which has been made available to the research community. The subjective quality measure in this database is the Degradation MOS (DMOS) as opposed to MOS. The data in Fig. 9(b) are from LIVE, for H.264 videos of size 768 × 480 with high SBRs and packet losses. Packet losses are taken as BLER, and SBR values are taken according to the LIVE database. CT is calculated from (2) for the ten video sequences. The frame rate was fixed at 30 fps. The correlation coefficient achieved for the data from [41] is listed in Table V, and Fig. 9(b) shows the model validation results on the LIVE database. Fig. 9(b) shows that our model overestimates when DMOS is low (10–20) and underestimates when DMOS is between 35 and 45. A DMOS of 0–15 would be equivalent to a MOS of 5, and a DMOS of 16–32 to a MOS of 4. The model overestimates because the MOS dataset used to develop the proposed model was biased towards high MOS values. This seems to be a limitation of the existing dataset and will be addressed in future work. Similarly, a DMOS of 33–48 represents a MOS of 3. Between DMOS 35 and 45, the model predictions show little variation, for some points only. This could be due to the little variation in content type in the LIVE dataset. The model performance is good between – . The content classifier classified most of the LIVE content as slow to medium activity, which also produced the two clusters in Fig. 9(b). Further, the LIVE wireless video database considered videos encoded at maximum frame rates and SBRs, whereas our model is derived from subjective data where the test conditions included only one fixed frame rate (10 fps) and low sender bitrates. There is evidence in the literature [7] that subjective quality varies with frame rate and that this is a result of the interaction between temporal and spatial dimensions. At present, the model does not take this aspect into account, as it is not the focus of this study, and this may explain the bias at certain points in the LIVE wireless video database. We did not validate against the VQM FRTV1 dataset [42], as it is limited to the MPEG-4 and H.263 codecs and does not take into account the impact of network losses. Table V summarizes the correlation coefficient and root mean squared error of our model against the external databases of [25] and [41].

Fig. 9. (a) Validation of proposed model from subjective data in [25]. (b) Validation of proposed model from subjective data in [41].

III. APPLICATION OF THE PROPOSED MODEL: QoE-BASED SENDER BITRATE ADAPTATION

In this section, we demonstrate the application of the new model in sender bitrate adaptation at the sender side. The optimization of QoE is crucial for mobile multimedia design and delivery. The conceptual diagram of the proposed QoE-driven sender bitrate adaptation is given in Fig. 10: the QoE prediction model derived earlier is used in the QoE-based sender bitrate adaptive control mechanism. Subsection A describes the content classification method, subsection B describes the QoE-driven adaptation scheme, and subsection C describes the evaluation setup of the proposed scheme. Results are discussed in subsection D.

Fig. 10. Conceptual diagram to illustrate QoE-driven adaptation.

Fig. 11. Content classification method.

A. Content Classification

The video content classification [27] is carried out on the degraded videos at the receiver side by extracting spatial and temporal features and applying a well-known multivariate statistical technique called cluster analysis [28]. This technique is used because it groups samples with various characteristics into similar groups. The spatio-temporal metrics have low complexity and can thus be extracted from the videos in real time. The spatial features extracted are edge, blurriness and brightness, whereas the temporal feature extracted is the sum of absolute differences (SAD). Based on the extracted spatio-temporal features, a cluster analysis based on the Euclidean distance of the data is performed to determine the content type, so that video clips in one cluster have similar content complexity. The content classifier takes the extracted features of each new video as input and predicts its most likely type. Once the CT is predicted, it is used as an input to the QoE model given by (5). Details of our content classification design are given in [27]. The block diagram of the video content classification function is given in Fig. 11. For simplicity, content classification was done offline in this paper.
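The final assignment step can be pictured as a nearest-centroid rule: a new clip's feature vector is assigned to the cluster whose centroid is closest in Euclidean distance, and that cluster's CT value is returned. The sketch below assumes the four features have already been extracted and normalized to [0, 1]; the centroid coordinates are hypothetical placeholders (the real clusters come from the training data in [27]), and feature extraction is not shown.

```python
from math import dist

# Hypothetical cluster centroids in a normalized feature space
# (SAD, edge, blurriness, brightness), keyed by the CT value of each cluster.
CENTROIDS = {
    0.1: (0.10, 0.30, 0.40, 0.60),  # slow movement, e.g., head and shoulders
    0.5: (0.40, 0.50, 0.40, 0.50),  # medium movement
    0.9: (0.80, 0.70, 0.30, 0.50),  # fast movement, e.g., sports
}


def classify_content(features: tuple[float, float, float, float]) -> float:
    """Return the CT value of the centroid nearest to `features` (Euclidean)."""
    return min(CENTROIDS, key=lambda ct: dist(features, CENTROIDS[ct]))


if __name__ == "__main__":
    ct = classify_content((0.12, 0.28, 0.42, 0.58))
    print(ct)  # the CT value that would be passed to the QoE model
```

Nearest-centroid assignment is only one simple way to realize Euclidean cluster membership; the classifier in [27] may differ in how clusters are formed and compared.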



The content classifier can be extended to predict CT as a continuous value from 0 to 1 based on, e.g., neural networks. For longer video clips or movies, the input will be segmented by segment analysis of the extracted content features.

B. QoE-Driven Adaptation Scheme

The fuzzy logic algorithm [43] implemented at the sender side processes the feedback information and decides, using fuzzy logic control, the optimum number of layers to be sent. Layered encoding is used to adapt the video streams to the network dynamics: video streams are encoded in layers so that every additional layer increases the perceived quality of the stream. Base layers are encoded at a very low rate to accommodate the UMTS access network conditions, and additional layers are added or dropped to adapt the video stream to the content type and network conditions. We used stream switching in Evalvid-RA [44], where video streams are encoded at different sender bitrates. We use the model proposed in (5) for MOS prediction; it is lightweight and easy to implement. The MOS predicted by (5), together with network QoS parameters, is then used in the QoE-driven adaptation scheme to adapt the sender bitrate, as shown in Fig. 10. RTCP is used to exchange feedback on the quality of the data distribution through reports between the sender and the receiver. Feedback is sent every second through extended RTCP reports [45], carrying QoS information such as loss rate, delay and jitter from the core network, which indicates the network congestion level. The network congestion level is calculated from the packet loss ratio, referred to here as the BLER: the total number of blocks lost over the total number of blocks sent.
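The two inputs of the adaptor can be computed directly from the feedback and from the model output, and then combined by a fuzzy controller. The following is a minimal Python sketch under stated assumptions: LR = BL/BS, D = 4.2 minus the predicted MOS, and the level boundaries (1%, 3% and 6% of blocks lost for LR; 0.25, 0.7 and 1.2 for D) follow the definitions given in this subsection, whereas the peaks of the triangular membership functions, the rule table (the paper's own rule base is in Table VI) and the weighted-average (zero-order Sugeno-style) defuzzification are hypothetical stand-ins, not the authors' controller.

```python
"""Sketch of the sender-side SBR adaptor inputs and a fuzzy decision.

LR and D follow the text; the membership peaks, the rule table and
the defuzzification method are hypothetical illustrations.
"""

MOS_MAX = 4.2                   # maximum achievable MOS (no blocks lost)
LR_BOUNDS = (0.01, 0.03, 0.06)  # LR level boundaries: 1%, 3%, 6% of blocks lost
D_BOUNDS = (0.25, 0.7, 1.2)     # D level boundaries (MOS points)

# Hypothetical peaks of the four triangular sets for each input.
LR_PEAKS = (0.0, 0.02, 0.045, 0.08)
D_PEAKS = (0.0, 0.45, 0.95, 1.6)

# Hypothetical rule table: SBR change in kbps, rows = LR set, columns = D set.
RULES = (
    (10, 20, 30, 40),      # negligible loss: quality limited by the encoder
    (0, 10, 10, 20),
    (-10, -10, -20, -20),
    (-20, -30, -40, -40),  # heavy loss: back off to relieve congestion
)


def loss_rate(blocks_lost: int, blocks_sent: int) -> float:
    """LR = BL / BS over one feedback interval (0 = no congestion)."""
    return blocks_lost / blocks_sent if blocks_sent else 0.0


def degradation(mos_predicted: float) -> float:
    """D = MOS_max - predicted MOS, clipped to [0, 3.2] (MOS scale is 1..5)."""
    return min(max(MOS_MAX - mos_predicted, 0.0), MOS_MAX - 1.0)


def level(x: float, bounds: tuple[float, ...]) -> int:
    """Crisp level index (0..3): how many boundaries x exceeds."""
    return sum(x > b for b in bounds)


def memberships(x: float, peaks: tuple[float, ...]) -> list[float]:
    """Triangular membership degrees; neighbouring sets overlap and sum to 1."""
    mu = [0.0] * len(peaks)
    if x <= peaks[0]:
        mu[0] = 1.0   # left shoulder
    elif x >= peaks[-1]:
        mu[-1] = 1.0  # right shoulder
    else:
        for i in range(len(peaks) - 1):
            lo, hi = peaks[i], peaks[i + 1]
            if lo <= x <= hi:
                w = (x - lo) / (hi - lo)
                mu[i], mu[i + 1] = 1.0 - w, w
                break
    return mu


def sbr_change(lr: float, d: float) -> float:
    """Fire every rule with min() as AND and return the weighted-average output."""
    mu_lr, mu_d = memberships(lr, LR_PEAKS), memberships(d, D_PEAKS)
    num = den = 0.0
    for i, row in enumerate(RULES):
        for j, out in enumerate(row):
            w = min(mu_lr[i], mu_d[j])
            num += w * out
            den += w
    return num / den


if __name__ == "__main__":
    lr = loss_rate(blocks_lost=25, blocks_sent=1000)  # 2.5% of blocks lost
    d = degradation(mos_predicted=3.6)                 # hypothetical model output
    print(level(lr, LR_BOUNDS), level(d, D_BOUNDS), round(sbr_change(lr, d), 1))
```

Because the overlapping triangles always give some rule a non-zero firing strength, the weighted average is always defined; a Mamdani controller with centroid defuzzification over output membership functions, as suggested by the output sets in Fig. 12, would replace `sbr_change` in a fuller implementation.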
We use BLER rather than packet loss because, in UMTS networks, the physical layer passes the transport blocks to the medium access control (MAC) layer together with the error indication from the cyclic redundancy check (CRC), so the output of the physical layer can be characterized by the overall block error probability (BLER). Accordingly, an error model based on a 2-state Markov model [37] of block errors was used in the simulation. Considering only access-network losses is, however, also a limitation of the current adaptation scheme, and in future we aim to consider losses in the core network in addition to the access network. We define the loss rate (LR), computed from [45], as the number of blocks lost (BL) divided by the total number of blocks sent (BS) within an interval. The loss rate is therefore given by (7) as

$$LR = \frac{BL}{BS} \qquad (7)$$

LR ranges from 0 to 1, with 0 meaning no congestion and 1 meaning a fully congested network. LR was partitioned into four levels, chosen to correspond to 1%, 3% and 6% of blocks lost, with the fourth level covering higher losses. LR is an input to the decision algorithm for SBR adaptation. The second input to the decision algorithm is the degradation (D), calculated as the difference between the maximum achievable MOS and the MOS predicted by the QoE prediction model given in (5).

Fig. 12. Membership functions for the two inputs and the output, and the output SBR adaptor surface.

The maximum achievable MOS, obtained when no blocks are lost, is set to 4.2. D is therefore given by (8) as

$$D = \mathrm{MOS}_{\max} - \mathrm{MOS}_{\mathrm{pred}} = 4.2 - \mathrm{MOS}_{\mathrm{pred}} \qquad (8)$$

where $\mathrm{MOS}_{\mathrm{pred}}$ is the MOS predicted by (5). The maximum value of D is 3.2, indicating maximum degradation, and the minimum value is 0, indicating no degradation at all. D has been split into four levels: 0–0.25, 0.25–0.7, 0.7–1.2 and above 1.2. The splits in D were chosen as a change of 0.25 in MOS. The levels of D are chosen such that MOS ranges over 3.8–4.2, 3.5–3.8, 3.0–3.5 and below 3.0, and each level is then linked with an SBR level. D and LR are used as the inputs to the fuzzy logic sender bitrate adaptor. The membership functions for the two inputs (linguistic input variables) and the output (SBRchange) are shown in Fig. 12; triangular functions are chosen for their simplicity. The SBRchange (output) surface, also given in Fig. 12, shows the overall behavior of the SBR adaptor. The first linguistic variable (LV), LR, is the network loss rate and ranges from 0 to 1. The second LV, D, is the degradation calculated from the QoE model and ranges from 0 to 3.2. The fuzzy SBR adaptor processes the two linguistic variables using the predefined if-then rule statements (rule base) shown in Table VI and derives the linguistic output variable SBRchange, which is defined for every possible combination of inputs. An example of a fuzzy rule is:


If the loss rate is large (L) and the degradation is medium (M), then SBRchange is a big change (BC). The linguistic values of the output in Table VI are given by the output membership functions in Fig. 12: no change (NC), very small change (VSC), small change (SC) and big change (BC). The linguistic values of the two inputs in Table VI are zero (Z), small (S), medium (M) and large (L). The defuzzified output can then be used to determine the next SBR level as given by (9):



(9)

Each value of SBRchange corresponds to a layer of the encoded video bitstream. The defuzzified output lies between 0 and 1, as shown in Fig. 12. Thus, a gradual increase in SBR is allowed when bandwidth is available and the loss rate is zero or reduced, whereas quick action is taken to reduce the SBR when the loss rate is severe.

C. Evaluation Set-Up
The network topology is modeled in the UMTS extension for NS2 [46], namely the Enhanced UMTS Radio Access Network Extension (EURANE) [47], integrated with Evalvid-RA [44] modified for H.264 video streaming. The H.264 codec was chosen because it is the recommended codec for low bitrate transmission, offers better efficiency and more control at the encoder, and is still evolving. The results of our proposed adaptive scheme are compared with the well-known TCP-Friendly Rate Control (TFRC) [48]. TFRC calculates the sending rate as a function of the packet loss rate measured at the receiver and the round-trip time; the sender calculates the rate according to [48]. With the Evalvid-RA [44] framework, it is possible to simulate pure TFRC transport directly on top of the network layer using stream switching. Constant bit rate (CBR) videos are used in the simulation as a proof of concept, with three compression settings (104 kbps, 88 kbps and 44 kbps, as given in Table VII) used for stream switching. Mismatch at the switching points is handled by switching frames, and the switching is simulated in Evalvid-RA. The Foreman video used is 30 s long and comprises three equal sequences of 10 s (300 frames each at 30 fps). The adaptive algorithm adjusts the compression rate until all packets get through. Our technique can also easily be extended to variable bit rate (VBR) videos. The evaluation model is given in Fig. 13; it consists of a streaming client and a server.
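The fuzzy SBR adaptor described in subsection B can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The triangular membership peaks are placed loosely at the LR and D level boundaries given in the text. Table VI is not reproduced here, so the rule base is a hypothetical one that outputs the more severe of the two input levels; this agrees with the quoted example rule (LR large and D medium gives BC). Defuzzification uses a weighted average of the output peaks (height method), which may differ from the method actually used.

```python
MOS_MAX = 4.2   # maximum achievable MOS, from the text
D_MAX = 3.2     # maximum degradation, from the text

# Hypothetical triangular membership peaks for the input terms Z, S, M, L.
# The first and last sets are shoulders (flat beyond their peak).
LR_PEAKS = (0.0, 0.01, 0.03, 0.06)
D_PEAKS = (0.0, 0.25, 0.7, 1.2)
# Hypothetical peaks of the output terms NC, VSC, SC, BC on [0, 1].
OUT_PEAKS = (0.0, 1 / 3, 2 / 3, 1.0)

# Hypothetical rule base standing in for Table VI: the output term is the
# more severe of the two input terms, so (LR = L, D = M) -> BC.
RULES = [[max(i, j) for j in range(4)] for i in range(4)]

def degradation(mos_pred):
    """D = MOS_max - MOS_pred, clipped to [0, D_MAX] as in (8)."""
    return min(max(MOS_MAX - mos_pred, 0.0), D_MAX)

def memberships(x, peaks):
    """Degrees of membership of x in adjacent triangular sets centred on `peaks`."""
    mu = [0.0] * len(peaks)
    if x <= peaks[0]:
        mu[0] = 1.0
    elif x >= peaks[-1]:
        mu[-1] = 1.0
    else:
        for k in range(len(peaks) - 1):
            lo, hi = peaks[k], peaks[k + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                mu[k], mu[k + 1] = 1.0 - t, t
                break
    return mu

def sbr_change(lr, d):
    """Fire every rule (min for AND, max to aggregate) and defuzzify by a
    weighted average of the output peaks; returns a value in [0, 1]."""
    mu_lr = memberships(lr, LR_PEAKS)
    mu_d = memberships(d, D_PEAKS)
    strength = [0.0] * len(OUT_PEAKS)
    for i, a in enumerate(mu_lr):
        for j, b in enumerate(mu_d):
            k = RULES[i][j]
            strength[k] = max(strength[k], min(a, b))
    return sum(s * c for s, c in zip(strength, OUT_PEAKS)) / sum(strength)

if __name__ == "__main__":
    print(sbr_change(0.0, degradation(4.2)))  # no loss, no degradation: 0.0 (NC)
    print(sbr_change(0.5, 0.7))               # LR large, D medium: 1.0 (BC)
```

Because adjacent triangles sum to one, at least one rule always fires with strength of 0.5 or more, so the final division never hits zero.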
In the evaluation, the user equipment (UE) is a streaming client and a fixed host in the Internet is the video streaming server. The addressed scenario comprises a UMTS radio cell covered by a Node B connected to a radio network controller (RNC), with the UE connected to a downlink dedicated physical channel (DPCH). As the main aim of the evaluation was to investigate the impact of the radio interface (UMTS access network) on the quality of streamed H.264 video and to adapt the SBR accordingly, no packet losses occur in either the Internet or the UMTS core network (e.g., SGSN, GGSN). No adaptation is carried

Fig. 13. UMTS network topology.

out if the quality stays above a MOS of 4.2, because frequent adaptation is annoying to viewers. In Fig. 13, each link between two nodes is labeled with its bitrate (in bits per second) and delay (in milliseconds). Each link capacity was chosen so that the radio channel is the connection bottleneck. Consequently, the Serving GPRS Support Node (SGSN) and Gateway GPRS Support Node (GGSN) were abstracted and modeled as traditional ns nodes, since they are wired nodes and in many ways mimic the behavior of an IP router. No header compression technique is currently supported in the Packet Data Convergence Protocol (PDCP) layer. The 3GPP recommendations [49] state that for video streaming services, such as VOD or unicast IPTV, a client should support the H.264 (AVC) Baseline Profile up to Level 1.2 [34]. As the video was transmitted to mobile handsets, all video sequences are encoded at QCIF resolution. The frame structure is IPPP for all sequences, since extensive use of I frames could saturate the available data channel. Intra-refresh macroblocks (IR MBs) were not included in the P-pictures of this IPPP structure, so the loss of a P-frame causes error propagation and a quality drop that IR MBs would otherwise mitigate. Nevertheless, our adaptation mechanism with the IPPP frame structure achieves acceptable quality, and using IR MBs to combat error propagation would further enhance the performance of the adaptation mechanism and improve the delivered video quality. The results validated



Fig. 14. Comparison of end user quality with TFRC and no adaptation.

the use of our control mechanism, and loss concealment and error-propagation combating algorithms would further add to the performance of our algorithm. Therefore, our results represent a lower bound on the expected video quality. Based on these considerations, the encoding settings were chosen as shown in Table VII. The implemented link loss model is the 2-state Markov model given in Fig. 2, with three MBL values of 1, 1.75 and 2.5 chosen to represent a range of UMTS scenarios. We ran ten simulations for each MBL and calculated the PSNR for each scenario; the results and analysis are given in subsection D.

D. Results and Analysis
To study the effect of link bandwidth on the MOS (the QoE of the user), we ran scenarios with one user and then up to five UMTS users receiving streaming video over an NS2-simulated UMTS network. We conducted experiments with the Foreman content type and assessed the performance of our QoE-driven adaptation scheme over simulated NS2 [46] UMTS networks in terms of average PSNR. NS2 was chosen for its flexibility and based on the characteristics of the link bandwidth. PSNR values are compared with the non-adaptive scheme and TFRC in Fig. 14, which shows that the QoE-based fuzzy adaptive scheme successfully adapts the sender bitrate to network congestion. The proposed scheme gradually reduces the sender bitrate according to the network conditions while maintaining acceptable quality. TFRC recovers more aggressively after network congestion and increases its transmission rate faster, causing significant degradation of end-user quality as measured by average PSNR. Fig. 15 gives insight into the capacity of the proposed QoE-based fuzzy adaptation scheme in terms of the number of UMTS users that a video streaming server can support, taking into account the bottleneck link bandwidth.
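A back-of-the-envelope check helps when reading the capacity results for the three downlink bandwidths (128, 256 and 384 kbps). The Python sketch below assumes, as a simplification that is ours and not part of the simulation set-up, that the bottleneck is shared equally, that protocol overhead is ignored, and that each user must receive at least the lowest CBR switching rate of Table VII (44 kbps). Under these assumptions the user count per link follows from integer division.

```python
# Figures from the text; the equal-sharing model is an assumption of this sketch.
LINK_BANDWIDTHS_KBPS = (128, 256, 384)  # UMTS downlink bandwidths (Fig. 15)
LOWEST_STREAM_KBPS = 44                 # lowest CBR switching rate (Table VII)
USERS = 5                               # identical users in Fig. 15

def fair_share_kbps(link_kbps, users):
    """Per-user bandwidth if the bottleneck is shared equally (no overhead)."""
    return link_kbps / users

def max_users(link_kbps, stream_kbps=LOWEST_STREAM_KBPS):
    """Largest number of users that can each receive at least `stream_kbps`."""
    return link_kbps // stream_kbps

if __name__ == "__main__":
    for lbw in LINK_BANDWIDTHS_KBPS:
        print(f"{lbw} kbps: equal share for {USERS} users = "
              f"{fair_share_kbps(lbw, USERS):.1f} kbps, "
              f"users at {LOWEST_STREAM_KBPS} kbps = {max_users(lbw)}")
```

With these idealized numbers, 128 kbps carries only two users at 44 kbps, while 256 kbps gives each of five users 51.2 kbps. This is consistent with the trend reported for Fig. 15. Real links also carry header and signalling overhead, so actual capacity would tend to be somewhat lower than this count.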
The figure shows the performance of the proposed QoE-driven adaptive scheme at three UMTS downlink bandwidths of 128 kbps, 256 kbps and 384 kbps compared to TFRC. The adaptive

Fig. 15. Average PSNR versus number of active users.

scheme outperforms TFRC at all three link bandwidths with five users. Fig. 15 also depicts the quality (average PSNR) experienced by multiple identical users (five in total) with the same connection characteristics, with respect to the bandwidth of the bottleneck link. The dashed lines indicate the acceptable-quality threshold for average PSNR, taken from the literature. When the link bandwidth (LBW) is high enough (384 kbps) to sustain the aggregated video transmission rate, all users are supported by the video streaming server at equal quality levels. Even at the bottleneck LBW of 256 kbps, all users can be supported at the minimum acceptable level. We have shown that our scheme maintains acceptable quality for five users; since most networks have admission control, this should allow more users to be admitted with the proposed scheme if the cost of implementation is justified by the additional revenue. However, at the lowest LBW (128 kbps), only two users can be supported before the quality falls below the acceptable threshold. Similarly, Fig. 16 compares the adaptive video quality over UMTS with the non-adaptive one at LBWs of 128 kbps, 256 kbps and 384 kbps. Again, we observe an improvement in quality for the Foreman content type: at a bottleneck bandwidth of 128 kbps, adaptive “Foreman” gives an average PSNR of 31 dB compared with 24 dB without adaptation. The adaptive video scheme therefore gracefully adapts the delivered video quality to the available downlink bandwidth.

IV. CONCLUSIONS AND FUTURE WORK
In this paper we have presented a novel content-based, non-intrusive video quality prediction model for low bitrate H.264 videos over UMTS networks. The model can also easily be extended to Wireless Local Area Network (WLAN) access networks. The model was developed using a combination of parameters associated with the encoder and the UMTS access network for different content types.
We demonstrated an application of the model in a new QoE-driven SBR adaptation scheme. The model was evaluated with an unseen dataset (different




Fig. 16. Video quality results for different bottleneck bandwidth over UMTS network.

video clips within the same content type) with good prediction accuracy. It was also validated with the recent LIVE database [41]. The model has potential applications in several other areas, including QoE control and optimization in network planning and content provisioning for network/service providers. We found that the impact of combined encoder and network distortions on video quality is strongly content dependent, as reflected in the subjective scores. For example, sequences with slow movement and high losses were rated higher than sequences with fast movement. In our subjective tests, none of the subjects rated the video sequences as “bad”, because BLER was restricted to 20%. In future work, a study will be undertaken on mobile handsets with higher packet losses to cover the whole MOS range (i.e., from “bad” to “excellent”). In this study, QCIF videos were specifically chosen because the target application is mobile, but in general the size of the video (spatial resolution) has an impact on overall quality [25]. With the increase in available bandwidth, the QCIF resolution may seem small; newer smartphones more commonly use a resolution of 320 × 240. Future studies will consider subjective tests using mobile handsets with higher resolutions and hence investigate the impact of resolution on quality. The methodology presented in this paper should apply equally to higher resolution phones/terminals. Future studies will also take into account the impact of core network losses, in addition to access network losses, on video quality. Further, the results from the adaptation scheme will be validated using subjective tests.

ACKNOWLEDGMENT
The authors would like to thank Mr. J. O. Fajardo for his help in the generation of the dataset and Dr. E. Jammeh for the discussions on fuzzy logic.

[1] S. Jeong and H. Ahn, “Mobile IPTV QoS/QoE monitoring system based on OMA DM protocol,” in Proc. Int. Conf. Information and Communication Technology Convergence (ICTC), Jeju, Korea, Nov. 17–19, 2010, pp. 99–100.
[2] S. Jumisko, V. Ilvonen, A. Kaisa, and V. Mattila, “Effect of TV content in subjective assessment of video quality on mobile devices,” in Proc. IS&T/SPIE Conf. Mobile Multimedia, San Jose, CA, 2005, pp. 243–254.
[3] D. S. Hands, “Video QoS enhancement using perceptual quality metrics,” BT Technol. J., vol. 23, no. 2, pp. 208–216, Apr. 2005.
[4] ITU-T SG 9, Q 12/9, Hybrid Perceptual/Bitstream Models, 2011. [Online]. Available: http://www.itu.int/itu-t/workprog/wp_item.aspx?isn=6299.
[5] H. Koumaras, A. Kourtis, C. Lin, and C. Shieh, “A theoretical framework for end-to-end video quality prediction of MPEG-based sequences,” in Proc. 3rd Int. Conf. Networking and Services, Jun. 19–25, 2007, pp. 62–65.
[6] A. Eden, “No-reference image quality analysis for compressed video sequences,” IEEE Trans. Broadcast., vol. 54, no. 3, pp. 691–697, Sep. 2008.
[7] Q. Huynh-Thu and M. Ghanbari, “Temporal aspect of perceived quality in mobile video broadcasting,” IEEE Trans. Broadcast., vol. 54, no. 3, pp. 641–651, Sep. 2008.
[8] R. Feghali, F. Speranza, D. Wang, and A. Vincent, “Video quality metric for bitrate control via joint adjustment of quantization and frame rate,” IEEE Trans. Broadcast., vol. 53, no. 1, pp. 441–446, Mar. 2007.
[9] K. Seshadrinathan and A. Bovik, “Motion tuned spatio-temporal quality assessment of natural videos,” IEEE Trans. Image Process., vol. 19, no. 2, pp. 335–350, Feb. 2010.
[10] G. Zhai, J. Cai, W. Lin, X. Yang, and W. Zhang, “Three dimensional scalable video adaptation via user-end perceptual quality assessment,” IEEE Trans. Broadcast., Special Issue on Quality Issues in Multimedia Broadcasting, vol. 54, no. 3, pp. 719–727, Sep. 2008.
[11] A. R. Reibman, V. A. Vaishampayan, and Y. Sermadevi, “Quality monitoring of video over a packet network,” IEEE Trans. Multimedia, vol. 6, no. 2, pp. 327–334, Apr. 2004.
[12] M. Ries, O. Nemethova, and M. Rupp, “Video quality estimation for mobile H.264/AVC video streaming,” J. Commun., vol. 3, no. 1, pp. 41–50, Jan. 2008.
[13] K.-C. Yang, C. C. Guest, K. El-Maleh, and P. K. Das, “Perceptual temporal quality metric for compressed video,” IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1528–1535, Nov. 2007.
[14] P. Calyam, E. Ekicio, C. Lee, M. Haffner, and N. Howes, “A gap-model based framework for online VVoIP QoE measurement,” J. Commun. Netw., vol. 9, no. 4, pp. 446–456, Dec. 2007.
[15] S. Tao, J. Apostolopoulos, and R. Guerin, “Real-time monitoring of video quality in IP networks,” IEEE/ACM Trans. Netw., vol. 16, no. 5, pp. 1052–1065, Oct. 2008.
[16] S. Kanumuri, S. G. Subramanian, P. C. Cosman, and A. R. Reibman, “Predicting H.264 packet loss visibility using a generalized linear model,” in Proc. IEEE Int. Conf. Image Processing, Oct. 8–11, 2006, pp. 2245–2248.
[17] S. Mohamed and G. Rubino, “A study of real-time packet video quality using random neural networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1071–1083, Dec. 2002.
[18] K. Yamagishi and T. Hayashi, “Opinion model for estimating video quality of videophone services,” in Proc. IEEE Globecom, San Francisco, CA, Nov. 27–Dec. 1, 2006, pp. 1–5.
[19] K. Yamagishi, T. Kawano, and T. Hayashi, “Hybrid video-quality-estimation model for IPTV services,” in Proc. IEEE Globecom, Honolulu, HI, Nov. 30–Dec. 4, 2009, pp. 1–5.
[20] G. W. Cermak, “Subjective video quality as a function of bit rate, frame rate, packet loss rate and codec,” in Proc. 1st Int. Workshop Quality of Multimedia Experience (QoMEX), Jul. 29–31, 2009, pp. 41–46.
[21] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, “Wireless video quality assessment: A study of subjective scores and objective algorithms,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 513–516, Apr. 2010.
[22] K. Seshadrinathan, R. Soundararajan, A. Bovik, and L. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010.
[23] S. Winkler and P. Mohandas, “The evolution of video quality measurement: From PSNR to hybrid metrics,” IEEE Trans. Broadcast., vol. 54, no. 3, pp. 660–668, Sep. 2008.



[24] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, and F. Liberal, “Video quality prediction models based on video content dynamics for H.264 video over UMTS networks,” Int. J. Digit. Multimedia Broadcast., Special Issue on IP and Broadcasting Systems Convergence (IPBSC), vol. 2010, 2010, 17 pp.
[25] G. Zhai, J. Cai, W. Lin, X. Yang, W. Zhang, and M. Etoh, “Cross-dimensional perceptual quality assessment for low bitrate videos,” IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1316–1324, Nov. 2008.
[26] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, and F. Liberal, “Video quality prediction model for H.264 video over UMTS networks and their application in mobile video streaming,” in Proc. IEEE ICC, Cape Town, South Africa, May 23–27, 2010, pp. 1–5.
[27] A. Khan, L. Sun, and E. Ifeachor, “Content clustering-based video quality prediction model for MPEG4 video streaming over wireless networks,” in Proc. IEEE ICC, Dresden, Germany, Jun. 14–18, 2009, pp. 1–5.
[28] G. W. Snedecor and W. G. Cochran, Statistical Methods, 8th ed. Ames: Iowa State Univ. Press, 1989.
[29] G. Muntean, P. Perry, and L. Murphy, “A new adaptive multimedia streaming system for all-IP multi-service networks,” IEEE Trans. Broadcast., vol. 50, no. 1, pp. 1–10, Mar. 2004.
[30] B. Ciubotaru, G. Muntean, and G. Ghinea, “Objective assessment of region of interest-aware adaptive multimedia streaming quality,” IEEE Trans. Broadcast., vol. 55, no. 2, pp. 202–212, Jun. 2009.
[31] P. Antoniou, V. Vassiliou, and A. Pitsillides, “Delivering adaptive scalable video over the wireless networks,” in Proc. 1st ERCIM Workshop eMobility, Coimbra, Portugal, May 21, 2007, pp. 23–34.
[32] E. Jammeh, M. Fleury, and M. Ghanbari, “Fuzzy logic congestion control of transcoded video streaming without packet loss feedback,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 387–393, Mar. 2008.
[33] M. Rezaei, M. Hannuksela, and M. Gabbouj, “Semi-fuzzy rate controller for variable bit rate video,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 633–645, May 2008.
[34] JM H.264 Software. [Online]. Available: http://iphome.hhi.de/suehring/tml/.
[35] Degraded Video Clips. [Online]. Available: http://www.tech.plym.ac.uk/spmc/staff/akhan/degraded_video.html.
[36] OPNET for Research. [Online]. Available: http://www.opnet.com.
[37] E. N. Gilbert, “Capacity of a burst-noise channel,” Bell Syst. Tech. J., vol. 39, pp. 1253–1265, Sep. 1960.
[38] W. Karner, O. Nemethova, P. Svoboda, and M. Rupp, “Link error analysis and modelling for video streaming cross-layer design in mobile communication networks,” ETRI J., vol. 29, no. 5, pp. 569–595, Oct. 2007.
[39] BT-500-11: Methodology for Subjective Assessment of the Quality of Television Picture, International Telecommunication Union.
[40] Subjective Video Quality Assessment Methods for Multimedia Applications, International Telecommunication Union, 2008, ITU-T Rec. P.910.
[41] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, LIVE Wireless Video Quality Assessment Database, 2009. [Online]. Available: http://live.ece.utexas.edu/research/quality/live_wireless_video.html.
[42] Final Report From the Video Quality Experts Group on the Validation of Objective Quality Metrics for Video Quality Assessment, Video Quality Experts Group (VQEG), Multimedia Group Test Plan, 2008. [Online]. Available: http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseI.
[43] H. Takagi, “Application of neural networks and fuzzy logic to consumer products,” IEEE Technol. Updates Series: Fuzzy Logic Technol. Appl., vol. 1, pp. 8–12, 1994.
[44] A. Lie and J. Klaue, “Evalvid-RA: Trace driven simulation of rate adaptive MPEG4 VBR video,” Multimedia Syst., vol. 14, no. 1, pp. 33–50, 2008.
[45] T. Friedman, R. Caceres, and A. Clark, RTP Control Protocol Extended Reports (RTCP XR), 2003.
[46] NS2. [Online]. Available: http://www.isi.edu/nsnam/ns/.

[47] Enhanced UMTS Radio Access Network Extensions for ns-2 (E.U.R.A.N.E). [Online]. Available: http://eurane.ti-wmc.nl/eurane/.
[48] M. Handley, S. Floyd, J. Widmer, and J. Padhye, RFC 3448: TCP-Friendly Rate Control (TFRC): Protocol Specification, 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3448.txt.
[49] Third Generation Partnership Project: Technical Specification Group Access Network; Radio Link Control (RLC) Specification (Release 5), 3GPP TS 25.322.

Asiya Khan received the B.Eng. (Hons) degree in electrical and electronic engineering from the University of Glasgow, Glasgow, U.K., in 1992, the M.Sc. degree in communication, control, and digital signal processing from Strathclyde University, Glasgow, in 1993, and the Ph.D. degree in multimedia communication from the University of Plymouth, Plymouth, U.K. She worked with British Telecommunications plc from 1993 to 2002 in a management capacity, developing various products and seeing them from inception through to launch. She has been a Research Assistant in Perceived QoS Control for New and Emerging Multimedia Services (VoIP and IPTV) for the FP7 ADAMANTIUM project at the University of Plymouth. She has published several papers in international journals and conferences. Her research interests include video quality of service over wireless networks, adaptation, perceptual modeling, and content-based analysis. Dr. Khan received the “Best Paper Award” at ICAS 2009.

Lingfen Sun received the B.Eng. degree in telecommunication engineering and the M.Sc. degree in communication and electronics system from the Institute of Communication Engineering, Nanjing, China, in 1985 and 1988, respectively, and the Ph.D. degree in computing and communications from the University of Plymouth, Plymouth, U.K., in 2004. She is currently an Associate Professor (Reader) in Multimedia Communications and Networks in the School of Computing and Mathematics, University of Plymouth. She has been involved in several European and industry-funded projects related to multimedia QoE. She has published 60 peer-refereed technical papers since 2000 and filed one patent. Her current research interests include multimedia (voice/video/audiovisual) quality assessment, QoS/QoE management and control, VoIP, and network performance characterization. Dr. Sun is the Chair of the QoE Interest Group of IEEE MMTC for 2010–2012, Publicity Co-Chair of IEEE ICME 2011, and Post & Demo Co-Chair of IEEE Globecom 2010.

Emmanuel Ifeachor received the M.Sc. degree in communication engineering from Imperial College, London, U.K., and the Ph.D. degree in medical electronics from the University of Plymouth, Plymouth, U.K. He is a Professor of Intelligent Electronic Systems and Head of Signal Processing and Multimedia Communications research at the University of Plymouth. His primary research interests are in information processing and computational intelligence techniques and their application to problems in communications and biomedicine. His current research includes user-perceived QoS and QoE prediction and control for real-time multimedia services, biosignals analysis for personalized healthcare, and ICT for health. He has published extensively in these areas.
