Operations Research and Applications : An International Journal (ORAJ), Vol.4, No.3/4, November 2017
DOI: 10.5121/oraj.2017.4401

RESIDUALS AND INFLUENCE IN NONLINEAR REGRESSION FOR REPEATED MEASUREMENT DATA

Munsir Ali, Yu Feng, Ali Choo, Zamir Ali
Nanjing University of Science and Technology, P.R. China

ABSTRACT
Not all observations have equal significance in regression analysis. Diagnosing observations is an important aspect of model building. In this paper, we use diagnostic methods to detect outlying residuals and influential points in nonlinear regression for repeated measurement data.
Cook's distance and the Gauss-Newton method are used to identify outliers and to estimate parameters in nonlinear regression analysis. Most of these techniques are based on graphical representations of residuals, the hat matrix and case-deletion measures. The results show the detection of single and multiple outlier cases in repeated measurement data. We use these techniques to explore the performance of residuals and influence in the nonlinear regression model.

KEY WORDS: Hat matrix, Cook's distance, Residuals, Nonlinear regression models.
Mathematics Subject Classification: 62J20, 62J02, 62G05, 62J05, 62J99.

1. INTRODUCTION

Data consisting of repeated measurements taken on each of a number of individuals arise frequently in biomedical and biological applications. Modeling such data generally involves characterizing the relationship between the measured response $y$ and a measurement factor, or covariate, $x$ [11]. In many applications, the relationship between $y$ and $x$ is nonlinear in the unknown parameters of interest.
The analysis of repeated measurements on an individual requires particular care in characterizing the random variation in the data. It is important to recognize both random variation among measurements within a given individual and random variation among individuals. Inferential methods accommodate these different variance components within the framework of an appropriate hierarchical statistical model. When the relationship between $x$ and $y$ is linear in the unknown parameters, the framework is that of the classical linear mixed effects model [10].
In this case, Bayesian inference is provided by a suitable hierarchical linear model [14]. There is a substantial literature on hierarchical linear models; see McCulloch, Casella, and Searle (1992). Linear modeling methods for repeated measurement data are well developed and well documented in the statistical literature; see Crowder and Hand (1990), Lindsey (1993), and Diggle, Liang and Zeger (1994).
In this work, we aim to identify outlying data points in nonlinear regression for repeated measurement data and to estimate the model parameters. We use Cook's distance and the Gauss-Newton method, and we explore some examples of parameter estimation and outlier detection. The paper is organized as follows: Section 2 gives the model and the parameter estimation; Section 3 deals with diagnostic methods for detecting single and multiple outliers by scatterplots, together with parameter estimation and some applicable examples; Section 4 concludes the paper.

2. THE MODEL AND THE PARAMETER ESTIMATION

We introduce the hierarchical nonlinear model that forms the basis for inference and discuss the available techniques for the analysis of repeated measurement data. As in the linear case, intra-individual and inter-individual variation are accommodated within a two-stage model. The first stage is characterized by a nonlinear regression model together with a model for the intra-individual covariance structure; inter-individual variability is represented in the second stage through individual-specific regression parameters.

Let $y_{ij}$ denote the $j$th response, $j = 1, \dots, n_i$, for the $i$th individual, $i = 1, \dots, m$, taken at a set of conditions summarized by the vector of covariates $x_{ij}$, so that a total of $N = \sum_{i=1}^{m} n_i$ responses have been observed. The vector $x_{ij}$ collects the variables that describe these conditions.

Suppose that, for individual $i$, the $j$th response obeys the model

$$y_{ij} = f(x_{ij}, \theta_i) + e_{ij}, \tag{1}$$

where $e_{ij}$ is a random error term reflecting uncertainty in the response, given the $i$th individual, with $E(e_{ij} \mid \theta_i) = 0$. Collecting the responses and errors for the $i$th individual into the $(n_i \times 1)$ vectors $y_i = (y_{i1}, \dots, y_{in_i})'$ and $e_i = (e_{i1}, \dots, e_{in_i})'$, respectively, and defining the $(n_i \times 1)$ vector $f_i(x_i, \theta_i) = (f(x_{i1}, \theta_i), \dots, f(x_{in_i}, \theta_i))'$, we may write

$$y_i = f_i(x_i, \theta_i) + e_i, \tag{2}$$

where $E(e_i \mid \theta_i) = 0$.

The model given in (1) and (2) describes the systematic and random variation associated with measurements on the $i$th individual.

Suppose that, for the nonlinear regression, $e_i \sim N(0, \Sigma_i)$. Writing $\theta$ for the parameter and $L(\theta)$ for the log-likelihood of $y$, we denote the score function by $\dot L(\theta)$, the observed information matrix by $-\ddot L(\theta)$, and the Fisher information matrix by $I(\theta)$. Computing nonlinear least squares estimates requires an iterative numerical algorithm. The estimate $\hat\theta$ solves $\dot L(\hat\theta) = 0$, and a Taylor expansion at a point $\theta^0$ gives

$$\dot L(\hat\theta) = \dot L(\theta^0) + \ddot L(\theta^0)(\hat\theta - \theta^0) + o(\|\hat\theta - \theta^0\|) = 0,$$

which leads to the iteration

$$\theta^{i+1} = \theta^{i} + [-\ddot L(\theta^{i})]^{-1} \dot L(\theta^{i}), \quad i = 1, 2, \dots, \tag{3}$$

repeated until $\|\theta^{i+1} - \theta^{i}\| < \delta$, where $\delta$ is a tolerance fixed in advance. The Gauss-Newton method replaces $-\ddot L(\theta^{i})$ in (3) by the Fisher information $I(\theta^{i})$, so it requires only first derivatives of $f$.

3. STATISTICAL DIAGNOSTICS FOR NONLINEAR MODELS WITH REPEATED MEASUREMENT DATA

In statistics, Cook's distance is often used to identify the influential points of a data set [12]. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. For the $j$th observation on the $i$th individual, Cook's distance is

$$D_{ij} = D_{ij}(U^T U, p\hat\sigma^2) = \frac{(\hat\theta_{(ij)} - \hat\theta)^T (U^T U) (\hat\theta_{(ij)} - \hat\theta)}{p\hat\sigma^2}, \tag{4}$$

where $U = \partial f(x, \theta)/\partial\theta^T$, $\hat\theta_{(ij)}$ is the estimate computed with that observation deleted, $p$ is the number of parameters and $\hat\sigma^2$ is the estimated error variance. Cook's distance gives the squared distance from $\hat\theta$ to $\hat\theta_{(ij)}$ relative to the fixed geometry of $U^T U$. The values of $D_{ij}(U^T U, p\hat\sigma^2)$ can be converted to a familiar probability scale by comparing the calculated values with the $F(p, n - p)$ distribution.

Cook's distance in multiple cases, where the cases indexed by $i$ are deleted together:

$$D_{i} = D_{i}(U^T U, p\hat\sigma^2) = \frac{(\hat\theta_{(i)} - \hat\theta)^T (U^T U) (\hat\theta_{(i)} - \hat\theta)}{p\hat\sigma^2}. \tag{5}$$

$D_i$ can be expressed in terms of multidimensional analogues of the residual $r_i$ and the hat-matrix diagonal $v_{ii}$. The results are obtained by first expressing $\hat\theta_{(i)}$ as a function of $\hat\theta$. In the linear approximation, with $Y$ the response vector, a subscript $(i)$ marking quantities with the cases deleted and a subscript $i$ marking the rows of the deleted cases,

$$\hat\theta_{(i)} = (U_{(i)}^T U_{(i)})^{-1} U_{(i)}^T Y_{(i)} = (U^T U - U_i^T U_i)^{-1} (U^T Y - U_i^T Y_i). \tag{6}$$

Expanding the inverse in (6), with $V_i = U_i (U^T U)^{-1} U_i^T$,

$$\hat\theta_{(i)} = \left[(U^T U)^{-1} + (U^T U)^{-1} U_i^T (I - V_i)^{-1} U_i (U^T U)^{-1}\right] (U^T Y - U_i^T Y_i)$$

$$\hat\theta_{(i)} = \hat\theta + (U^T U)^{-1} U_i^T \left[(I - V_i)^{-1} U_i \hat\theta - \left(I + (I - V_i)^{-1} V_i\right) Y_i\right], \tag{7}$$

so that, since $I + (I - V_i)^{-1} V_i = (I - V_i)^{-1}$ and $e_i = Y_i - U_i \hat\theta$,

$$\hat\theta_{(i)} = \hat\theta - (U^T U)^{-1} U_i^T (I - V_i)^{-1} e_i. \tag{8}$$

Substituting (8) into (5) leads to the form

$$D_i = \frac{e_i^T (I - V_i)^{-1} V_i (I - V_i)^{-1} e_i}{p\hat\sigma^2}. \tag{9}$$

Single-case Cook's distance. A one-step approximation to the case-deleted estimate is

$$\hat\theta_{(ij)} = \hat\theta + I_{(ij)}^{-1}(\hat\theta)\, \dot L_{(ij)}(\hat\theta). \tag{10}$$

In this case, $I(\theta) = U^T \Sigma^{-1} U$ and $\dot L(\theta) = U^T \Sigma^{-1} e$. Replacing these in (10), the change in the estimate that enters (4) takes the form

$$\hat\theta_{(ij)} - \hat\theta = (U_{(ij)}^T \Sigma_{(ij)}^{-1} U_{(ij)})^{-1} U_{(ij)}^T \Sigma_{(ij)}^{-1} e_{(ij)}. \tag{11}$$

Multiple-case Cook's distance. In the same way,

$$\hat\theta_{(i)} = \hat\theta + I_{(i)}^{-1}(\hat\theta)\, \dot L_{(i)}(\hat\theta),$$

and the change in the estimate that enters (5) takes the form

$$\hat\theta_{(i)} - \hat\theta = (U_{(i)}^T \Sigma_{(i)}^{-1} U_{(i)})^{-1} U_{(i)}^T \Sigma_{(i)}^{-1} e_{(i)}. \tag{12}$$

Example 1: We consider the data in Table I, taken from a study reported by Kwan et al. (1976) of the pharmacokinetics of indomethacin following bolus intravenous injection of the same dose in six human volunteers. For each subject, plasma concentrations of indomethacin were measured at 11 time points ranging from 15 minutes to 8 hours post-injection [11].
Table I: Plasma concentrations (µg/ml) following intravenous injection of indomethacin for six human volunteers.

We consider two examples of the Gauss-Newton method for the model

$$y = \theta_1 \exp(-\theta_2 x) + \theta_3 \exp(-\theta_4 x), \quad \theta_1, \dots, \theta_4 > 0. \tag{13}$$

The Gauss-Newton iteration is

$$\theta^{i+1} = \theta^{i} + (U^T \Sigma^{-1} U)^{-1} U^T \Sigma^{-1} e,$$

using MATLAB's convention for representing the Jacobian matrix $U = \partial f(\theta)/\partial\theta^T$, where $\Sigma = I_n$ in the known case and $e = y - f(\theta)$. We chose the initial values $\theta^0 = (0.7, 0.6, 0.54, 0.5)$; after 5 iterations we obtained $\hat\theta = (0.75, 0.65, 0.50, 0.45)$, which satisfies the stopping condition $\|\theta^{i+1} - \theta^{i}\| < 10^{-4}$.

Example 2: We consider another example of computing with the Gauss-Newton method.
The parameter estimates are based on the 11 responses for the fifth subject given in Table I, and the Gauss-Newton method is computed in MATLAB. We chose the initial values $\theta^0 = (1.0000, 1.2000, -1.1000, -1.2000)$ and then used the Gauss-Newton method to estimate $\theta$. After 5 iterations we obtained $\hat\theta = (1.2715, 1.0408, -1.2327, -1.5069)$, which satisfies the stopping condition $\|\theta^{i+1} - \theta^{i}\| < 10^{-4}$.
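The Gauss-Newton iteration used in Examples 1 and 2 can be sketched in Python as follows, assuming $\Sigma = I_n$ and the biexponential model (13). This is an illustration, not the authors' MATLAB code: Table I is not reproduced here, so the time grid, the "true" parameter values and the noise-free responses are hypothetical, and the step-halving safeguard is an addition that the paper does not describe.

```python
import numpy as np

def biexp(theta, x):
    """Model (13): theta1*exp(-theta2*x) + theta3*exp(-theta4*x)."""
    t1, t2, t3, t4 = theta
    return t1 * np.exp(-t2 * x) + t3 * np.exp(-t4 * x)

def jacobian(theta, x):
    """Analytic Jacobian U = d f / d theta^T, one row per observation."""
    t1, t2, t3, t4 = theta
    e2, e4 = np.exp(-t2 * x), np.exp(-t4 * x)
    return np.column_stack([e2, -t1 * x * e2, e4, -t3 * x * e4])

def gauss_newton(x, y, theta0, tol=1e-4, max_iter=100):
    """Gauss-Newton with Sigma = I: theta <- theta + (U^T U)^{-1} U^T e."""
    theta = np.asarray(theta0, dtype=float)
    sse = np.sum((y - biexp(theta, x)) ** 2)
    for it in range(1, max_iter + 1):
        e = y - biexp(theta, x)
        U = jacobian(theta, x)
        # Least-squares solve of U * step = e gives (U^T U)^{-1} U^T e.
        step, *_ = np.linalg.lstsq(U, e, rcond=None)
        # Safeguard (assumption, not in the paper): halve the step
        # until the residual sum of squares does not increase.
        lam = 1.0
        while lam > 1e-8:
            trial = theta + lam * step
            trial_sse = np.sum((y - biexp(trial, x)) ** 2)
            if trial_sse <= sse:
                break
            lam *= 0.5
        change = np.linalg.norm(trial - theta)
        theta, sse = trial, trial_sse
        if change < tol:  # stopping rule ||theta^{i+1} - theta^i|| < tol
            return theta, it
    raise RuntimeError("Gauss-Newton did not converge")

# Hypothetical data: 11 time points between 15 minutes and 8 hours,
# responses generated without noise from assumed parameter values.
x = np.array([0.25, 0.5, 0.75, 1.0, 1.25, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
theta_true = np.array([2.0, 2.0, 0.5, 0.2])
y = biexp(theta_true, x)

theta_hat, n_iter = gauss_newton(x, y, theta0=[1.9, 1.9, 0.55, 0.22])
print(theta_hat, n_iter)
```

Solving the linearized problem with a least-squares routine avoids forming $(U^T U)^{-1}$ explicitly, which matters because the two exponential columns of $U$ can be nearly collinear.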
Example 3: We consider Table I and focus on the fifth subject to detect a single-case outlier, where $U = \partial f(\theta)/\partial\theta^T$, $\Sigma = I_n$, and $e = y - f(\theta)$ is the unobservable error.

Fig. 1. Scatter plot for Table I (fifth individual) under model (11).

In the above scatterplot, we computed Cook's distance and found an outlier among the predicted values. The first observation of the data set is an outlier, as indicated in Fig. 1.
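A single-case Cook's distance of the kind used in Example 3 can be sketched in Python from the residuals $e$ and the Jacobian $U$ at $\hat\theta$, using the single-case form of (9), $D_j = e_j^2 v_{jj} / \{p\hat\sigma^2 (1 - v_{jj})^2\}$, where $v_{jj}$ is the $j$th diagonal element of $U (U^T U)^{-1} U^T$. The data below are synthetic with one planted outlier and are not the paper's Table I. As a simplifying assumption, the two rate constants are treated as known, so the model is linear in the two amplitudes and the formula can be checked exactly against brute-force case deletion; for a fully nonlinear fit the same function gives the linear approximation.

```python
import numpy as np

def cooks_distance(U, e):
    """Single-case Cook's distance from Jacobian U (n x p) and residuals e."""
    n, p = U.shape
    sigma2 = e @ e / (n - p)          # residual variance estimate
    Q, _ = np.linalg.qr(U)            # thin QR; leverages are row norms of Q
    v = np.sum(Q ** 2, axis=1)        # v_jj = diag(U (U^T U)^{-1} U^T)
    return e ** 2 * v / (p * sigma2 * (1.0 - v) ** 2)

def cooks_distance_by_deletion(U, y):
    """Definition (4) for a model linear in theta: refit without each case."""
    n, p = U.shape
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)
    e = y - U @ theta
    sigma2 = e @ e / (n - p)
    UtU = U.T @ U
    D = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j
        theta_j, *_ = np.linalg.lstsq(U[keep], y[keep], rcond=None)
        d = theta_j - theta
        D[j] = d @ UtU @ d / (p * sigma2)
    return D

# Synthetic illustration (hypothetical values throughout).
x = np.array([0.25, 0.5, 0.75, 1.0, 1.25, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
U = np.column_stack([np.exp(-2.0 * x), np.exp(-0.2 * x)])  # known rates
rng = np.random.default_rng(0)
y = U @ np.array([2.0, 0.5]) + 0.005 * rng.standard_normal(x.size)
y[0] += 0.5                                                 # planted outlier

theta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)
e = y - U @ theta_hat
D = cooks_distance(U, e)
D_check = cooks_distance_by_deletion(U, y)
print(np.argmax(D), np.round(D, 3))
```

Plotting `D` against the observation index gives a Cook's distance plot in which the planted case stands apart from the rest.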
Example 4: We consider another example to detect multiple outlier cases.

Fig. 2. Scatter plot for Table I under model (12).

We computed Cook's distance and found four values that fall far from the other data points. We therefore consider these points (23, 56, 45, 12) to be outliers among the 66 observations. The outliers are marked in the Cook's distance plot (Fig. 2).

4. CONCLUSION

It is well understood that not all observations in a data set play the same role in the result of a regression analysis. For example, the character of the regression line may be determined by only a few observations, while most of the data are largely ignored. Observations that highly influence the results of the analysis are called influential observations. It is important, for many reasons, to be able to detect influential observations.
In this paper, we established the Gauss-Newton method for parameter estimation and extended a robust version of Cook's distance to single and multiple cases to detect outlying data points in repeated measurement data.

REFERENCES

[1] Ayinde, K., Lukman, A.F. and Arowolo, O. (2015) "Robust Regression Diagnostics of Influential Observations in Linear Regression Model", Open Journal of Statistics, vol. 5, pp. 273-283.
[2] Altman, N. and Krzywinski, M. (2016) "Analyzing outliers: influential or nuisance", Nature Methods, vol. 13, pp. 281-282.
[3] Law, M. and Jackson, D. (2017) "Residual plots for linear models with censored outcome data: A refined method for visualizing residual uncertainty", Communications in Statistics - Simulation and Computation, vol. 46, pp. 3159-3171.
[4] Cook, R.D. and Tsai, C.L. (1985) "Residuals in nonlinear regression", Biometrika, vol. 72, no. 1, pp. 23-29.
[5] Cook, R.D. (1979) "Influential observations in linear regression", J. Amer. Statist. Assoc., vol. 74, pp. 169-174.
[6] Cook, R.D. and Prescott, P. (1981) "Approximate significance levels for detecting outliers in linear regression", Technometrics, vol. 23, pp. 59-64.
[7] Ellenberg, J.H. (1976) "Testing for a single outlier from a general regression model", Biometrics, vol. 32, pp. 637-645.
[8] Vonesh, E.F. (1992) "Nonlinear models for the analysis of longitudinal data", Statistics in Medicine, vol. 11, pp. 1929-1954.
[9] Solomon, P.J. and Cox, D.R. (1992) "Nonlinear component of variance models", Biometrika, vol. 79, pp. 1-11.
[10] Cook, R.D. (1979) "Influential observations in linear regression", J. Amer. Statist. Assoc., vol. 74, pp. 169-174.
[11] Diggle, P.J. (1988) "An approach to the analysis of repeated measurements", Biometrics, vol. 44, pp. 959-971.
[12] Pregibon, D. (1981) "Logistic regression diagnostics", Annals of Statistics, vol. 9, pp. 705-724.
[13] Anscombe, F.J. (1961) "Examination of residuals", Proc. Fourth Berkeley Symp., vol. 1, pp. 1-36.
[14] Davidian, M. and Giltinan, D.M. (March 1995) "Nonlinear models for repeated measurement data".

AUTHOR

Munsir Ali, School of Science, Department of Statistics, Nanjing University of Science and Technology, P.R. China.