In an effort to make clinical databases easier to analyze, we have developed an extensive library of flexible procedures that create a variety of statistical reports. Since investigators want to look at many outcome measures, these procedures operate on lists of variables, looping through each variable to run the analysis. Eliminating much of the tedious programming usually required to analyze clinical databases, these procedures can save hours of programming time. "Let the computer do your work for you."
Conditional Inference Methods for Incomplete Poisson Data With Endogenous Time-Varying Covariates
Jason Roy
We investigate the effect of protease inhibitors (PIs) on the rate of emergency room (ER) visits among HIV-infected women from a longitudinal cohort study. One strategy to account for serial correlation in longitudinal studies is to assume observations are independent, conditional on unit-specific nuisance parameters. It is possible to estimate these models using unconditional maximum likelihood, where the nuisance parameters are assigned a parametric distribution and integrated out of the likelihood. Alternately, we can proceed using conditional inference, where we eliminate the nuisance parameters from the likelihood by conditioning on a sufficient statistic for these parameters. An advantage of conditional inference methods over parametric random effects models is all patient-level time-invariant factors (both measured and unmeasured) are accounted for in the analysis. A limitation is standard conditional inference methods assume missing data are missing completely at random and do not allow endogenous time-varying covariates (i.e., ER visits in the past cannot predict future PI use). Both assumptions are unlikely to be met for these data, because one would expect `sicker' patients would be more likely to receive treatment and/or drop out from the study. We develop new estimation strategies that allow endogenous time-varying covariates and missing at random dropouts. The analysis shows that PI use reduces the rate of ER visits among patients whose CD4 cell count was
On the Density of the Solution to a Random System of Equations
Anthony Almudevar
Paradoxical Association of a Group of Atherosclerosis-related Genotypes with Reduced Rate of Coronary Events After Myocardial Infarction
David Oakes
Local Polynomial Density Estimation With Interval Censored Data
Derick R. Peterson and Mark J. van der Laan
A survival time is interval censored if only its current status, an indicator of whether the event has occurred, is observed at a possibly random number of monitoring times. We provide estimators with pointwise confidence limits for all derivatives of the distribution of the time till event, assuming that the observed monitoring times are independent of the time of interest. Our estimator is a standard local polynomial regression smoother applied to the pooled sample of dependent current status observations. We show that the proposed estimator has a normal limiting distribution identical to that of a smoother applied to independent current status observations. Thus local bandwidth selection techniques and pointwise confidence limit procedures for standard nonparametric regression perform properly, despite the dependence in the pooled sample.
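The smoother described above can be sketched as follows. This is only an illustration under assumptions not stated in the abstract: a local linear (degree one) fit, an Epanechnikov kernel, and a fixed user-chosen bandwidth. The function name `local_linear` is hypothetical. Regressing the current status indicator on the monitoring time estimates F(t0) = P(T <= t0).

```python
def local_linear(times, status, t0, bandwidth):
    """Local linear estimate of F(t0) from pooled current status data.

    times  : monitoring times (all subjects pooled)
    status : 1 if the event had occurred by that monitoring time, else 0
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for c, d in zip(times, status):
        u = (c - t0) / bandwidth
        if abs(u) >= 1.0:
            continue  # outside the kernel window
        w = 0.75 * (1.0 - u * u)  # Epanechnikov weight (an assumed choice)
        s0 += w
        s1 += w * (c - t0)
        s2 += w * (c - t0) ** 2
        r0 += w * d
        r1 += w * (c - t0) * d
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few monitoring times near t0 for this bandwidth")
    # Intercept of the weighted least-squares line, i.e. the fit at t0.
    return (s2 * r0 - s1 * r1) / det
```

The slope of the same weighted fit would estimate the density, which is how a local polynomial fit yields derivative estimates; higher degrees extend the idea.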
Pre-limit Theorems and Their Applications
Lev Klebanov
Finitely many empirical observations can never justify any tail behavior, thus they cannot justify the applicability of classical limit theorems in probability theory. In this paper we attempt to show that instead of relying on limit theorems, one may use the so-called pre-limit theorems explained later. The applicability of our pre-limit theorem relies not on the tail but on the 'central section' ('body') of the distributions and as a result, instead of a limiting behavior (when $n$, the number of i.i.d. observations, tends to infinity), the pre-limit theorem should provide an approximation for distribution functions in case $n$ is 'large' but not too 'large'. Our pre-limiting approach seems to be more realistic for practical applications.
p-Values-Only-Based Stepwise Procedures for Multiple Testing and Their Optimality Properties
Alexander Gordon
Modeling Cancer Screening: Further Thoughts and Results
Andrei Yakovlev
Over the years, many large-scale randomized trials have been conducted to evaluate the effects of breast cancer screening. These trials have failed to provide conclusive evidence for significant survival benefits of mammographic screening because of certain pitfalls in their design and lack of statistical power. However, such studies represent a rich source of information on the natural history of breast cancer, thereby opening up the way to evaluate potential benefits of breast cancer screening through using realistic mathematical models of cancer development and detection. We propose a biologically motivated model of breast cancer development and detection allowing for arbitrary screening schedules and the effects of clinical covariates recorded at the time of diagnosis on post-treatment survival. Biologically meaningful parameters of the model are estimated by the method of maximum likelihood from the data on age and tumor size at detection that resulted from two randomized trials known as the Canadian National Breast Screening Studies. When properly calibrated, the model provides a good description of the U.S. national trends in breast cancer incidence and mortality. The model was validated by predicting (without any further calibration or tuning) certain quantitative characteristics obtained from the SEER data. In particular, the model provides an excellent prediction of the size-specific age-adjusted incidence of invasive breast cancer as a function of calendar time for the period 1975-1999. Predictive properties of the model are also illustrated with an application to the dynamics of age-specific incidence and stage-specific age-adjusted incidence over the period 1975-1999.
Iterated Birth and Death Markov Process and its Biological Applications
Leonid Hanin
We solve, under realistic biological assumptions, the following long-standing problem in radiation biology: to find the distribution of the number of clonogenic tumor cells surviving a given arbitrary schedule of fractionated radiation. Mathematically, this leads to the problem of computing the distribution of the state N(t) of an iterated birth and death Markov process at any time t counted from the end of exposure. We show that the distribution of the random variable N(t) belongs to the class of generalized negative binomial distributions, find an explicit computationally feasible formula for this distribution, and identify its limiting forms. In particular, for t = 0, the limiting distribution turns out to be Poisson, and an estimate of the rate of convergence in the total variation metric that generalizes the classical Law of Rare Events is obtained.
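As background for the last sentence, the classical Law of Rare Events can be checked numerically. The sketch below is not the generalized result of the talk: it assumes the simplest setting, in which each of n cells survives independently with the same small probability p, so the survivor count is Binomial(n, p). Le Cam's inequality then bounds the total variation distance to Poisson(np) by n p^2. The function names are illustrative.

```python
import math


def binomial_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)


def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    abs_diff = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        q = poisson_pmf(lam, k)
        abs_diff += abs(binomial_pmf(n, p, k) - q)
        poisson_mass += q
    # Poisson mass above n has no binomial counterpart.
    return 0.5 * (abs_diff + (1.0 - poisson_mass))


# Example: 100 cells, each surviving with probability 0.02.
distance = total_variation(100, 0.02)
le_cam_bound = 100 * 0.02**2
```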
Statistical Methods of Translating Microarray Data into Clinically Relevant Diagnostic Information in Colorectal Cancer
Byung Soo Kim
The aim of the study is twofold. First, we identify a set of differentially expressed (DE) genes in colorectal cancer, compared with normal colorectal tissues, to rank genes for the development of biomarkers for population screening of colorectal cancer. Second, we detect a set of DE genes for subtypes of colorectal cancer which can be classified with respect to stage, location and carcino-embryonic antigen (CEA) level. The cancer and normal tissues were obtained from 87 colorectal cancer patients who underwent surgery at Severance Hospital, Yonsei Cancer Center, Yonsei University College of Medicine, from May to December of 2002. We originally attempted to extract total RNAs from tumor and normal tissues from 87 patients. From each of 36 patients we had RNA specimens both for tumor and normal tissues. However, from 19 (32) patients RNA specimens for normal tissues (tumor) only were available. Thus, we have a matched pair sample of size 36 and two independent samples of sizes 19 and 32. We conducted a cDNA microarray experiment using a common reference design with 17K human cDNA microarrays. We pooled eleven cancer cell lines from various origins and used the pool as the common reference. We used M=log2(R/G) for the evaluation of relative intensity. As a means of utilizing the whole data set we first use the matched pair data set as a training set from which we detect a set of DE genes between the normal tissue and the tumor. Then we use the pool of two independent data sets of "tumor only" and "normal only" as the test set for the validation. We employ four procedures for detecting a set of DE genes from the matched pair sample of size 36: the paired t test and Dudoit et al.'s maxT procedure; Tusher et al.'s SAM procedure; Lönnstedt and Speed's empirical Bayes procedure; Hotelling's T2 statistic. We employ the diagonal quadratic discriminant analysis for the classification of the test set.
We modify standard methods for the data at hand and propose a t-based statistic, say t3, which combines three data types for the detection of DE genes. We also extend Pepe et al.'s ROC approach of ranking genes for the purpose of biomarker development for our mixed data type (Pepe et al., 2003 Biometrics). We note that only a few genes are required to achieve 0% test error in discriminating the normal tissue from the colorectal cancer. For the subtype analyses various approaches failed to identify DE genes with respect to colon cancer versus rectum cancer and stage B versus stage C. We employed a regression approach to detect a few genes that correlated well with CEA.
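To make the first of the four procedures concrete, here is a minimal sketch of the log-ratio M = log2(R/G) and an ordinary paired t statistic for one gene on the matched tumor/normal pairs. It covers only the textbook paired t statistic; the maxT adjustment and the proposed t3 statistic are not reproduced, and the function names are hypothetical.

```python
import math


def log_ratio(red, green):
    """M = log2(R/G): sample (R) intensity relative to the common reference (G)."""
    return math.log2(red / green)


def paired_t(tumor_m, normal_m):
    """Paired t statistic for one gene from matched M values."""
    diffs = [t - n for t, n in zip(tumor_m, normal_m)]
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    return mean / math.sqrt(var / k)
```

With 36 matched pairs the statistic would be referred to a t distribution with 35 degrees of freedom before any multiplicity adjustment.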
Fall 2003 Biostatistics Brown Bag Seminar Abstracts
Biomarker Measurement Error: A Bayesian Approach with Application to Lung Cancer
Sally W. Thurston
Molecular biologists have identified specific cellular changes, called biomarkers, which enable them to better understand the pathway from chemical exposure to initiation of some cancers. In lung cancer, one such biomarker is the number of DNA adducts in lung tissue. Adducts are formed from the binding of cigarette carcinogens to DNA, and this adduct formation plays a central role in lung cancer initiation from smoking.
The goal of this work is to incorporate knowledge of such underlying biological mechanisms into a useful statistical framework to improve cancer risk estimates. The model considers adducts in the blood to be a surrogate measure of lung adducts. Lung adducts can never be measured in controls. The model is developed on a subset of the data, a small portion of which has biomarker measurements, and is used to predict cancer risk for the remaining data which do not have biomarker measurements. These predictions are compared to those from a traditional model, and to observed case/control status. Although the biomarker model compares favorably with the traditional approach, model diagnostics suggest that better predictions could be made from an expanded model which allows for measurement error in lung adducts.
Functional Response Models and their Applications
Xin M. Tu
I will discuss a new class of semi-parametric (distribution-free) regression models with functional responses. This class of functional response models (FRM) generalizes the traditional regression models by defining the response variable as a function of several responses from multiple subjects. By using such multiple-subjects-based responses, the FRM integrates many popular non- and semi-parametric approaches within a unified modeling framework. For example, under the proposed framework, we can derive regression models to perform inferences for two-way contingency tables and to estimate variance components by identifying them as model parameters. The FRM also provides a theoretical platform for developing new models that address limitations of existing non- and semi-parametric models. For example, we can develop FRMs to generalize ANOVA so that we can compare not only the means but also the variances of the multiple groups, and to derive and extend the Mann-Whitney-Wilcoxon (MWW) rank-based tests to more than two groups. For inferences, we discuss a novel approach that integrates U-statistic theory with generalized estimating equations. The talk is illustrated with examples from biomedical and psychosocial research.
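A small illustration of a response defined on several subjects at once: the two-sample MWW statistic averages a kernel over all cross-group pairs, which makes it a U-statistic. This sketch shows only that pairwise construction, with ties counted as one half (an assumed convention); it is not the FRM estimating-equation machinery, and the function name is hypothetical.

```python
def mww_probability(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) by averaging over all (x_i, y_j) pairs."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5  # ties counted as one half (assumed convention)
    return total / (len(x) * len(y))
```

A value near 0.5 is what one expects when the two groups have the same distribution.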
Biomedical Modeling, Prediction and Simulation
Hulin Wu
In this brown-bag seminar, I am going to give a brief introduction to several on-going projects in my research group. Our research projects include
1) Nonparametric smoothing/regression methods for longitudinal data with applications to long-term HIV dynamic modeling
2) Mechanism-based modeling of longitudinal data with applications to AIDS treatment response modeling
a) Hierarchical Bayesian approach
b) Mixed-effects state-space model approach
3) Nonlinear and time-varying coefficient state-space models and particle filter techniques with applications to SARS epidemics
4) Clinical trial modeling and simulations
In summary, we are trying to combine the models and techniques from biomathematics, engineering, computer science and statistics to solve important biomedical problems. The multi-discipline feature of these projects will be further enhanced in the next several years. Currently the research faculty and postdoc fellows who are involved in these projects include Drs. Yangxin Huang, Jianwei Chen, Haihong Zhu and Dacheng Liu as well as other external collaborators.
A Discussion on Intent-To-Treat Principle for Blood Transfusion Trials
Hongwei Zhao
Spring 2003 Biostatistics Brown Bag Seminar Abstracts
Statistical Analysis of Skewed Data
Hongkun Wang and Hongwei Zhao
This talk is motivated by an example where the dependent variable has a lot of zero values and a very skewed distribution, and the interest is to find a relationship between several covariates and this variable. We will briefly examine some of the current literature that deals with this problem. We will also discuss the interpretation of the parameters for some of the proposed models. In the end we will present the results of the data analysis of our example.
Inference on multi-type cell systems using clonal data and application to oligodendrocytes development in cell culture
Ollivier Hyrien
Fall 2002 Biostatistics Brown Bag Seminar Abstracts
Designing and Analyzing a Small Bernoulli-Trial Experiment, with Application to a Recent Cardiological Device Trial
Jack Hall
In the recent `WEARIT' trial, the success of a wearable defibrillator in preventing death from a heart attack in patients awaiting a heart transplant was evaluated. A trial design was called for that would meet certain requirements on error probabilities, that would make a decision -- for or against the device -- within a specified maximum number (n = 15) of heart attack incidents in a group of recruited patients, and hopefully would terminate after many fewer incidents. We will use this setting to review single and double sampling plans, curtailed sampling, and various other sequential sampling plans that might be used for such a trial, along with the associated methodology for inference about the implicit Bernoulli parameter -- the success rate in resuscitating patients after a heart attack. We present this in the context of the WEARIT trial. You may be surprised how many statistical issues arise in an inference problem associated with observation of a few Bernoulli trials!
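Curtailed sampling, one of the plans mentioned above, can be sketched with exact arithmetic. The plan below is a generic illustration, not the WEARIT design: it decides for the device once c successes are seen and against it once n - c + 1 failures are seen, so it never needs more than n incidents. Only n = 15 comes from the abstract; the threshold c, the success rate p and the function name are hypothetical.

```python
def curtailed_plan(n, c, p):
    """Return (P(decide for the device), expected number of incidents observed).

    Stops at the c-th success (decide for) or the (n - c + 1)-th failure
    (decide against), whichever comes first.
    """
    max_fail = n - c + 1
    state = {(0, 0): 1.0}  # (successes, failures) -> probability, not yet stopped
    accept = 0.0
    expected = 0.0
    for trial in range(1, n + 1):
        nxt = {}
        for (s, f), prob in state.items():
            for ds, weight in ((1, p), (0, 1.0 - p)):
                s2, f2 = s + ds, f + (1 - ds)
                mass = prob * weight
                if s2 == c:            # decision for the device
                    accept += mass
                    expected += trial * mass
                elif f2 == max_fail:   # decision against the device
                    expected += trial * mass
                else:
                    nxt[(s2, f2)] = nxt.get((s2, f2), 0.0) + mass
        state = nxt
    return accept, expected


# Example with hypothetical values c = 12 and p = 0.8.
prob_for, mean_incidents = curtailed_plan(15, 12, 0.8)
```

Curtailing leaves the decision probabilities of the fixed-n plan unchanged and only shortens the trial, which is why the expected number of incidents is the quantity of interest.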
Using Local Correlation in Kernel-Based Smoothers for Dependent Data
Derick Peterson
This is joint work with Hongwei Zhao and Sara Eapen.
Informative Prior Specification for Linear Regression Models using Parameter Decompositions
Sally Thurston
I will motivate this work by discussing a dataset for which the intended Bayesian analysis requires an informative prior, due to interactions for which the data likelihood has no direct information. I will then present a method of obtaining informative priors for a linear regression model, based on information elicited from a subject matter expert. This method relies on a decomposition, novel in the multivariate case, of regression coefficients, their covariance matrix, and the residual variance of the regression. The only quantities which the expert needs to specify are the population means, variances, and pairwise correlations. Finally, I will discuss how I used the information elicited from the expert to obtain a proper informative prior for this example. This is joint work with Joe Ibrahim and Susan Korrick.
Topology, DNA Topology and Some Probabilistic Models of Nucleic Acids
Eva Culakova
This is an informative talk based on already known results. The presentation was inspired by my effort to understand the book by A D Bates and A Maxwell "DNA Topology". First I will introduce a classical result about "Hopf Map" in order to give my appreciation to the field of topology. Next I will give an example of a situation where topology can help to understand DNA recombination. At the end I will briefly introduce a probabilistic model that is used to distinguish if two nucleic acids or protein sequences are related or not. This part is based on the book by Durbin, Eddy, Krogh and Mitchison "Biological Sequence Analysis".
Li-Shan Huang
I will discuss the paper Doksum, K., and Samarov, A. (1995). Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression. Annals of Statistics, 23, 1443-1473, and propose new ideas for a nonparametric coefficient of determination.
An Overview of Multiple Imputation
Michael McDermott
Spring 2002 Biostatistics Brown Bag Seminar Abstracts
MADIT-II: A Recently Completed Sequential Clinical Trial
Jack Hall
The Multicenter Automatic Defibrillator Implantation Trial #2, administered here at the UR Medical Center with 1232 heart-disease patients enrolled through 76 hospital centers, came to a favorable conclusion in November by reaching a pre-specified sequential stopping criterion for efficacy. The statistical work was, and continues to be, carried out here, including statistical design of the study, weekly analyses of the survivorship data, chairing of the Monitoring Committee, and final analyses of efficacy and side-effects data, with cost analyses still to come. This talk will give an overview, focusing on the statistical aspects of designing, monitoring and analyzing such trial data.
Combining Stratified and Unstratified Log-Rank Tests for Correlated Survival Data
Changyong Feng
The log-rank test is the most widely used nonparametric method for testing treatment differences in survival-analysis-based clinical trials due to its efficiency under proportional hazards. Most previous work on the log-rank test has assumed that the samples from the two treatment groups are independent. However, in multicenter clinical trials, survival times of patients in the same medical center may be correlated due to some factors specific to each center; or studies may utilize pairing of patients or response units, resulting in dependence. For such data we can construct stratified and unstratified log-rank tests (call them SLRT and ULRT respectively). These two tests address somewhat different features of the data. An appropriate linear combination of these two tests may give a more powerful test than either individual test. Under a matched-pair frailty model, we obtain closed-form asymptotic local alternative distributions and the correlation coefficient of SLRT and ULRT. Based on these results we construct an optimal linear combination of the two test statistics. Simulation studies with the Hougaard model confirm our construction. Our approach is illustrated with data from the Diabetic Retinopathy Study (Huster et al., 1989). We extend our work to the cases of stratum size > 2 and of variable (but upper bounded) stratum sizes.
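The two statistics can be sketched with the standard log-rank formulas: observed minus expected events in group 1 at each event time, with the hypergeometric variance. Passing all data as one stratum gives ULRT; passing one tuple per pair or center gives SLRT. The `combine` helper only standardizes a weighted sum of two unit-variance statistics with correlation rho. The optimal weight and the closed-form correlation from the talk are not reproduced here, so `w` and `rho` are hypothetical inputs, as are the function names.

```python
import math


def logrank_terms(times, events, groups):
    """Sum of (O - E) for group 1 and its variance within one stratum.

    events: 1 = event observed, 0 = censored; groups: 1 or 0.
    """
    num = var = 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n, n1 = len(at_risk), sum(at_risk)
        dead = [g for u, e, g in zip(times, events, groups) if e and u == t]
        d, d1 = len(dead), sum(dead)
        num += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num, var


def logrank_z(strata):
    """Standardized log-rank statistic; strata is a list of (times, events, groups)."""
    num = var = 0.0
    for times, events, groups in strata:
        a, b = logrank_terms(times, events, groups)
        num += a
        var += b
    return num / math.sqrt(var)


def combine(z_s, z_u, w, rho):
    """Standardized combination w * SLRT + (1 - w) * ULRT, given correlation rho."""
    return (w * z_s + (1 - w) * z_u) / math.sqrt(
        w * w + (1 - w) ** 2 + 2 * w * (1 - w) * rho
    )
```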
Non-Sexual Household Transmission of HCV Infection
Fenyuan Xiao
Objective: This study was designed to determine the prevalence and the incidence of HCV infection among non-sexual household contacts of HCV-infected women and to describe the association between HCV infection and potential household risk factors in order to examine whether non-sexual household contact is a route of HCV transmission.
Methods: A baseline prevalence survey included 409 non-sexual household contacts of 241 HCV-infected index women in the Houston area from 1994 to 1997. A total of 470 non-sexual household contacts with no evidence of HCV infection at baseline investigation were re-assessed approximately three years after baseline enrollment. Information on potential risk factors was collected through face-to-face interviews and blood samples were tested for anti-HCV with ELISA-2 and Matrix / RIBA-2. The relationships between HCV infection and potential risk factors were examined by using univariate and multivariate logistic regression analyses.
Results: The overall prevalence of anti-HCV positivity among 409 non-sexual household contacts was 4.4%. The highest prevalence of anti-HCV was found in parents (19.5%), followed by siblings (8.1%) and other relatives (5.6%); the children had the lowest prevalence of anti-HCV (1.2%). The univariate analysis showed that IDU, blood transfusion, tattoos, sexual contact with injecting drug users, more than 3 sexual partners in a lifetime, history of a STD, incarceration, previous hepatitis, and contact with hepatitis patients were significantly associated with HCV infection; however, sharing razors, nail clippers, toothbrushes, gum, food or beds with HCV-infected women, and history of dialysis, health care job, body piercing, and homosexual activities were not. Multivariate analysis found that IDU (OR = 221.7 with 95% CI of 22.8 to 2155.7) and history of a STD (OR = 11.7 with 95% CI of 1.2 to 113.1) were the only variables significantly associated with HCV infection. No such associations remained for other risk factors. The three-year cumulative incidence of anti-HCV among 352 non-sexual household contacts of HCV-infected women was zero.
Conclusion: This study has provided no evidence that non-sexual household contact is a likely route of transmission for HCV infection. The risk of sharing razors, nail clippers, toothbrushes, gum, food and/or beds with HCV-infected women is not evident and has not been shown to be the likely mode for HCV spread among family members. This study does suggest that IDU is the likely route of transmission for most HCV infection. Association also has been shown independently with a history of STD. The prevalence of anti-HCV among non-sexual household contacts was low. Exposure to common parenteral risk factors and sexual transmission between sexual partners may account for HCV spread among household members of HCV-infected persons.
Parameter Estimation in Bivariate Copula Models
Antai Wang
Many models have been proposed for multivariate failure-time data $(T_{1},T_{2})$ arising in reliability and other applications. A bivariate survivor function $S(t_{1},t_{2})$ is said to be generated by an Archimedean copula if it can be expressed in the form $S(t_{1},t_{2})=p[q\{S_{1}(t_{1})\}+q\{S_{2}(t_{2})\}]$ for some convex, decreasing function $q$ defined on $(0,1]$. Here $p$ is the inverse function of $q$. Usually, $p$ is specified as some function of an unknown parameter. Given a sample from $S(t_{1},t_{2})$, the distribution function of $V=S(T_{1},T_{2})$, called the Kendall distribution, can be expressed simply in terms of $q$. We use the score function from the log-likelihood of the V's to estimate the unknown parameter. Although the V's are unknown, they can be estimated empirically. Interestingly, our estimates based on the empirical V's are much more precise than the estimates based on the true and unknown V's. We also investigate an alternative procedure based on iteratively estimating the V's using the assumed copula structure. We discuss the asymptotic theory for both methods and present some illustrative examples. I will also briefly cover the recent development of a new method to estimate the parameter for bivariate data subject to random right censoring.
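For a concrete instance of the construction, the sketch below assumes the Clayton family, with generator q(u) = u^(-theta) - 1 for theta > 0 and inverse p(s) = (1 + s)^(-1/theta). This family is chosen here only for illustration; the abstract does not name one. For a bivariate Archimedean copula the Kendall distribution is K(v) = v - q(v)/q'(v), which for this generator reduces to v + v(1 - v^theta)/theta.

```python
def q(u, theta):
    """Clayton generator: convex and decreasing on (0, 1], with q(1) = 0."""
    return u ** (-theta) - 1.0


def p(s, theta):
    """Inverse of the generator."""
    return (1.0 + s) ** (-1.0 / theta)


def joint_survivor(s1, s2, theta):
    """S(t1, t2) = p[q{S1(t1)} + q{S2(t2)}], written in terms of the marginal values."""
    return p(q(s1, theta) + q(s2, theta), theta)


def kendall_cdf(v, theta):
    """K(v) = P(V <= v) = v - q(v) / q'(v) for the Clayton generator."""
    return v + v * (1.0 - v ** theta) / theta
```

Differentiating `kendall_cdf` in v gives the density of V, and hence a log-likelihood in theta of the kind whose score function the abstract refers to.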
Microarray Analyses
Li-Shan Huang
Bayesian inference of phylogeny
John Huelsenbeck
A Few Remarks on Partial Correlation
Heng Li
A Generalization of ROC Curves
Michael McDermott
Fall 2001 Biostatistics Brown Bag Seminar Abstracts
Use of placebo-controls vs. active-controls in clinical trials evaluating new treatments
Mike McDermott
Using Measurement Error Models w/o and w/ Interactions to Assess Effects of Prenatal and Postnatal Methylmercury Exposure in the Seychelles Child Development Study at age 66-months
Li-Shan Huang
Overrunning in Sequential Clinical Trials
Jack Hall
Most large-scale clinical trials these days have sequential stopping rules that permit early termination of the trial when clear superiority of a treatment is firmly established early in the trial. Once a stopping boundary has been reached, statistical methods allow computation of p-values and estimates of treatment effects which recognize the sequential stopping rule. Typically, however, additional `lagged' data become available after the boundary has been reached. Earlier methods of accommodating such `overrunning' have serious defects. Two new methods (one joint with Aiyi Liu, the other joint with Keyue Ding) will be described, and illustrated with data from the MADIT trial of an implanted defibrillator (New England Journal of Medicine, 335:1933-40, 1996).
A Second Look at Some Statistical Ideas Via Geometric Projection
Heng Li
Geometric concepts have always been useful in statistics. Consider, for example, the number of situations in which the idea of orthogonal projection plays a crucial role. We will discuss a closely related geometric operation, to be called orthogonal cross projection, and point out some of its manifestations in statistics (e.g., covariance). PowerPoint technology will be used in the presentation, provided that all the equipment is functional and is not too sophisticated for the presenter to operate.
Two-Period Designs: Part II
David Oakes
On Two Consistent Tests of Bivariate Independence and Some Applications
Greg Wilding
The use of the correlation coefficient for testing bivariate independence, although most common, has serious limitations. In this talk I will discuss Hoeffding's (1948) test of bivariate independence, and its asymptotic equivalent due to Blum, Kiefer and Rosenblatt (1961), which are well known to be consistent against all dependence alternatives. Specifically, I will describe the status of its null distribution and compare its power using a variety of copulas, including those due to Morgenstern, Gumbel, Plackett, Marshall and Olkin, Raftery, Clayton, and Frank. I will also show how the test of bivariate independence can be used for constructing simple goodness-of-fit tests.
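The idea behind these tests can be sketched as a Cramér-von Mises type distance between the joint empirical distribution function and the product of the marginal ones, summed over the observed points. This is written in the spirit of the Blum-Kiefer-Rosenblatt statistic; normalizing constants differ between references, and no null distribution or critical values are given here. The function name is hypothetical.

```python
def independence_statistic(x, y):
    """Sum over data points of (H_n - F_n * G_n)^2, using empirical CDFs."""
    n = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        f = sum(1 for a in x if a <= xi) / n                      # marginal of X
        g = sum(1 for b in y if b <= yi) / n                      # marginal of Y
        h = sum(1 for a, b in zip(x, y) if a <= xi and b <= yi) / n  # joint
        total += (h - f * g) ** 2
    return total
```

Unlike the correlation coefficient, a distance of this kind responds to any departure of the joint distribution from the product of its marginals, not only to linear association.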
Smoothing Longitudinal Data: A Work in Progress
Derick R. Peterson, Hongwei Zhao, Sara Eapen
We consider the general problem of smoothing longitudinal data to estimate the nonparametric marginal mean function, where a random but bounded number of measurements are available for each independent subject. In stark contrast to recent work in this area, we show that not only can consistent estimators use the correlation structure of the data but that ignoring this correlation structure necessarily results in inefficiency, just as in the parametric setting. The class of local polynomial kernel-based estimating equations considered by Lin & Carroll (JASA 2000) is shown to be too small, such that it cannot properly make use of the correlation structure; this explains the problem with their general message that it is best to assume working independence, while also providing insight into why penalized likelihood-based correlated smoothing splines can be expected to be efficient. We propose a class of simple, explicit ad hoc estimators which, although not efficient, can improve upon the working independence local polynomial modeling approach by making use of the local correlation structure to dramatically improve the precision even for moderate sample sizes.
Spring 2001 Biostatistics Brown Bag Seminar Abstracts
30th Anniversary of the Biplot
Ruben Gabriel
A Review of Nonparametric Survival Estimation with Bivariate Right-Censored Data
Derick Peterson
The problem of nonparametric estimation of the survival function with censored data has an elegant and efficient solution in the one-dimensional case: the Kaplan-Meier estimator. In higher dimensions, with multiple, possibly correlated, survival times, however, the task is much more formidable. Several authors have proposed ad hoc estimators in this model, and in 1996 van der Laan proposed a theoretically efficient estimator, while also analyzing inefficient estimators previously proposed by Dabrowska, Prentice and Cai, and Pruitt. I will review these estimators and explain why the NPMLE is not, in general, consistent for the bivariate survival function. Unlike in the one-dimensional case, some sort of smoothing is required for efficient estimation. Bandwidth selection remains an open problem in this context, thus contributing to the slow uptake of van der Laan's estimator.
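For reference, the one-dimensional solution mentioned in the first sentence fits in a few lines. This is a minimal sketch of the Kaplan-Meier product-limit estimator for right-censored data; the bivariate estimators reviewed in the talk are not attempted, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = event observed, 0 = right-censored.
    """
    curve = []
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, surv))
    return curve
```

A censored subject leaves the risk set without forcing a drop in the curve; this step has no equally simple counterpart in two dimensions.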
Confidence Intervals: Equal-Tail, Shortest or Unbiased?
Jack Hall
Various criteria for choosing confidence intervals have been considered in the literature. We focus on three, named in the title. When based on a pivot with a symmetric distribution, the three coincide, but in `small-sample' applications this covers little more than confidence intervals for normal population means, contrasts among such means, and rank procedures about a center of symmetry. Of course, from a large-sample perspective, a maximum likelihood estimate minus parameter, standardized by a standard error estimate, is such a pivot, and this covers many applications.
We review the pros and cons of the three competitors, largely in the context of confidence intervals for the variance when sampling from a normal population, and similarly for variance ratios in analysis of variance. However, our motivation is for dealing with confidence intervals for the hazard ratio after a sequential clinical trial: What kind of interval should be preferred?
Your opinions will be invited....
Analysis of Chicago Ozone Data 1981-1991
Li-Shan Huang
Ozone concentrations are affected by precursor emissions and by meteorological conditions. It is of interest to analyze trends in ozone after adjusting for meteorological influences. We will discuss the following 4 approaches to analyze Chicago Ozone data 1981-91:
- Nonlinear Regression, by Bloomfield, Royle, Steinberg and Yang (1996)
- Logistic Models, by Smith and Huang (1993)
- Semi-parametric modeling, by Gao, Sacks and Welch (1996)
- Tree regression & empirical Bayes, by Huang and Smith (1999)
A Test for Equality of Ordered Inverse Gaussian Means
Lili Tian
The inverse Gaussian (IG) distribution, called the fraternal twin of the Gaussian distribution, has been widely used in applied fields due to the facts that it is ideally suited for modeling positively skewed data and that its inference theory is well known to be analogous to that of the Gaussian distribution in numerous ways. For example, Weiss (1982, 1983, 1984) demonstrated that the distribution of circulation times of drug molecules through the body can be approximated by the IG distribution. We propose a test procedure to assess trends in the IG response variable (e.g., in animal toxicity studies). This approach, based on combining independent tests using classical methods, can be easily extended to a spectrum of order constraints. It is also shown that this procedure is intriguingly analogous to that for the Gaussian distribution. The power properties are examined by simulation.
Correlation Between Variables When Each Is Subject to Sets of Exchangeable Measurements: An Approach Based on Group Invariance
Heng Li
An analytical procedure is developed for a type of data structure suitable for modelling the situation in which multiple measurements are made on each of a set of variables, and the measurements can be divided into exchangeable subsets. The procedure is based on the pattern in covariance matrix corresponding to the group invariance inherent in the data structure, from which a closed-form expression of Gaussian likelihood can be found. Sufficient statistics in the form of sums of squares and cross products and their distributions are obtained, leading to methods of statistical inference for a variety of practical purposes from correction for attenuation to estimation of reliability coefficients. The closed-form expression of the likelihood function is also helpful for implementing likelihood-based computation, such as the EM algorithm for handling missing data, and for Bayesian inference. The latter can be a very effective tool in dealing with some inferential problems that do not have standard solutions in the traditional framework. Examples include guaranteeing the nonnegative definiteness of an estimated disattenuated correlation matrix and combining information on association parameters from a main study and a reliability, reproducibility, or repeatability study. No originality is claimed and nothing presented will be beyond what is intuitively obvious and/or what has already been in the literature, although the procedure is readily adaptable for variations on the basic structure. The main objective is to illustrate the application of group invariance in modelling and analysis, which is the topic of almost all my previous lunch presentations. The current presentation, however, involves a data structure that has not been discussed in the previous presentations.
On Kendall's Process and an Associated Estimation Procedure
David Oakes
If X is a continuous univariate random variable with distribution function F(x) then it is well-known that F(X) = pr (X
This talk will explore the use of a bivariate analog of the probability integral transform in estimating the parameters governing the dependence structure in a bivariate distribution. We will present and explain some simulation results that at first sight seemed somewhat surprising.
(This is joint work with Antai Wang and will form the basis for his upcoming qualifying paper)
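As a rough illustration of the bivariate analog of the probability integral transform, the sketch below computes one standard empirical version: for each observation, the proportion of the other observations that lie strictly below it in both coordinates. It also uses the identity tau = 4 E[V] - 1, which recovers Kendall's tau from these values when there are no ties. The function names are hypothetical, and this is not necessarily the estimation procedure discussed in the talk.

```python
def bivariate_pit(xs, ys):
    """Empirical bivariate probability integral transform: for each i, the
    fraction of the other n - 1 points strictly smaller in both coordinates."""
    n = len(xs)
    return [
        sum(1 for j in range(n) if xs[j] < xs[i] and ys[j] < ys[i]) / (n - 1)
        for i in range(n)
    ]

def kendall_tau_from_pit(v):
    """Kendall's tau via tau = 4 * mean(V) - 1 (exact when there are no ties)."""
    return 4.0 * sum(v) / len(v) - 1.0
```

The empirical distribution function of the values returned by `bivariate_pit` is the usual starting point for studying Kendall's process.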
Bootstrap variations: random weighting
Derick Peterson
A review of treatment allocation methods in clinical trials
Hongwei Zhao
Randomized-Withdrawal and Randomized-Start Designs
Jack Hall
Randomized-withdrawal and randomized-start designs have recently been introduced in the neurological clinical trials literature as designs which facilitate detection of long-term (`neuroprotective') effects as distinguished from short-term (`symptomatic') effects of a treatment relative to a placebo. Models and analyses for such designs will be described, along with various advantages and limitations. Factorial versions will also be considered.
Fall 2000 Biostatistics Brown Bag Seminar Abstracts
A Roughness-Penalty View of Kernel Smoothing
Li-Shan Huang
It has been shown that a smoothing spline estimate is an equivalent kernel estimate. In this paper, we show that both the Nadaraya-Watson and local linear kernel estimators are equivalent penalized estimators.
Algebraic Rationales for Some Statistical Procedures: Possibilities for Unification and Generalization
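For readers unfamiliar with the two estimators named above, here is a minimal sketch of each at a single evaluation point. It assumes a Gaussian kernel and a fixed bandwidth `h`; both are illustrative choices that the abstract does not specify. The code only defines the estimators and does not demonstrate the penalized-estimator equivalence.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Locally constant fit: kernel-weighted average of the responses."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(x0, xs, ys, h):
    """Locally linear fit: intercept of a kernel-weighted least squares line
    in (x - x0), obtained from the 2x2 normal equations."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    d = [x - x0 for x in xs]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, ys))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, ys))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
```

On exactly linear data the local linear fit reproduces the line even at the boundary, whereas the Nadaraya-Watson fit is pulled toward the interior responses.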
Heng Li
Many common procedures in statistics have algebraic interpretations. We will discuss a series of examples beginning with the most basic ones. It will be shown how algebraic rules extracted from simple cases can be applied to tackle some non-trivial problems. Possibilities for a general framework will also be discussed.
A Simulation Study of Frailty Effects in Censored Bivariate Survival Data
Sara Eapen
Multivariate censored survival data typically have correlated failure times. The correlation can be a consequence of the observational design, for example with clustered sampling and matching, or it can be a focus of interest, as in genetic studies, longitudinal studies of recurrent events, and other studies involving multiple measurements. The correlation between failure times can be accounted for by fixed or random effects. A simulation study was designed to compare the performance of the mixture likelihood approach to estimating the model with these frailty effects in censored bivariate survival data. It is found that the mixture method is surprisingly robust to misspecification of the frailty distribution.
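To make the data structure concrete, the sketch below generates censored bivariate survival data with a shared frailty. Every modelling choice in it is an assumption made for illustration and is not taken from the study: a gamma frailty with mean 1 and variance `theta`, exponential failure times given the frailty, and a fixed administrative censoring time. It covers data generation only, not the mixture likelihood estimation.

```python
import random

def simulate_shared_gamma_frailty(n_pairs, theta, base_rate, censor_time, seed=0):
    """Return a list of pairs; each pair holds two (observed_time, event)
    tuples that share one gamma frailty (hypothetical simulation design)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        # gammavariate(shape, scale): mean 1, variance theta
        z = rng.gammavariate(1.0 / theta, theta)
        pair = []
        for _ in range(2):
            # Given the frailty, the failure time is exponential with rate base_rate * z
            t = rng.expovariate(base_rate * z)
            pair.append((min(t, censor_time), t <= censor_time))
        data.append(tuple(pair))
    return data
```

Larger values of `theta` give stronger dependence between the two failure times in a pair, because both depend on the same draw of the frailty.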
Profile Likelihood and the EM-algorithm
David Oakes
A Review of the Case-Crossover Design & Applications
Jack Hall
The case-crossover design -- a case-control study in which the subject serves as his own control -- was formally introduced by the epidemiologist Malcolm Maclure in 1991. He described it as `a method for studying transient effects on the risk of acute events'. The design will be described and discussed in the context of several published applications (including participation and Robert Tibshirani), evaluating the questions: Are MI's more likely following (i) sexual activity? (ii) coffee drinking? (iii) episodes of anger? Are auto accidents more likely while using a cell phone?
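In its simplest form, with one hazard window and one control window per subject, a case-crossover analysis reduces to a matched-pair comparison. The conditional maximum likelihood estimate of the odds ratio is then the ratio of the two discordant counts. The sketch below assumes that simple 1:1 layout; the function name and record format are hypothetical, and the published applications may use more elaborate control-period schemes.

```python
def case_crossover_odds_ratio(records):
    """records: (exposed_in_hazard_window, exposed_in_control_window) per case.
    Returns the matched-pair odds ratio from the discordant subjects only."""
    records = list(records)
    hazard_only = sum(1 for h, c in records if h and not c)
    control_only = sum(1 for h, c in records if c and not h)
    if control_only == 0:
        raise ValueError("no subjects exposed only in the control window")
    return hazard_only / control_only
```

Subjects exposed in both windows or in neither contribute nothing to the estimate, which is how the design removes stable between-subject confounders.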
Exploring Multivariate Data with Density Trees
Richard Raubertas
Classification trees are widely used as rules for assigning observations to classes based on their attributes or features. A classification tree is equivalent to a partition of the feature space into rectangular regions, with a constant estimate of class probabilities in each region. Density trees are proposed as a variation on this idea, designed to examine the multivariate distribution of the features themselves. A tree-structured approach is used to partition the feature space into low- and high-density regions; that is, regions with especially low or especially high numbers of observations relative to an arbitrary reference distribution. This results in a nonparametric, piecewise-constant estimate of the joint distribution of the features. Because the regions are defined by simple inequalities on individual features, density trees can provide a direct and interpretable description of multivariate structure. In addition, they may be useful for identifying regions where prediction models derived from the data are poorly supported by observations.
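The abstract does not give the splitting criterion, so the following is only a sketch of what a single split step might look like. It assumes features scaled to the unit interval, a uniform reference distribution, and a likelihood-ratio deviance that compares observed counts with the counts expected under the reference. All names are hypothetical, and the recursive tree building is omitted.

```python
import math

def split_deviance(n_left, n_right, frac_left):
    """Likelihood-ratio deviance of observed counts on the two sides of a cut
    against the fractions expected under the reference distribution."""
    n = n_left + n_right
    dev = 0.0
    for count, frac in ((n_left, frac_left), (n_right, 1.0 - frac_left)):
        if count > 0:
            dev += 2.0 * count * math.log(count / (n * frac))
    return dev

def best_density_split(points, cuts):
    """Search every feature and candidate cut (each strictly between 0 and 1)
    for the split whose counts deviate most from the uniform reference.
    Returns (deviance, feature_index, cut)."""
    best = None
    for feature in range(len(points[0])):
        for cut in cuts:
            n_left = sum(1 for p in points if p[feature] < cut)
            # Under a uniform reference on [0, 1], the expected left fraction is the cut
            dev = split_deviance(n_left, len(points) - n_left, cut)
            if best is None or dev > best[0]:
                best = (dev, feature, cut)
    return best
```

Applying the same search recursively within each resulting region would give a piecewise-constant density estimate of the kind the abstract describes.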
Nonparametric regression for longitudinal data
Hongwei Zhao
My talk is motivated by an applied example where it is desirable to fit a nonparametric regression model to data that were obtained longitudinally. Even though the theory of nonparametric regression for independent data has been well developed, there are still questions that need to be answered before nonparametric methods can be applied to longitudinal data. Simulations are conducted to compare some currently available methods as well as some new ones. These methods are also applied to a real example.
Generalized Nonlinear Regression
Christopher Cox