
Statistica Sinica

DOUBLY ROBUST ESTIMATION FOR CONDITIONAL TREATMENT EFFECT: A STUDY ON ASYMPTOTICS

Chuyun Ye¹, Keli Guo² and Lixing Zhu¹,²

¹ Beijing Normal University, Beijing, China

² Hong Kong Baptist University, Hong Kong

Abstract: In this paper, we apply the doubly robust approach to estimate the conditional average treatment effect given some covariates, under parametric, semiparametric and nonparametric structures of the nuisance propensity score and outcome regression models. We then conduct a systematic study of the asymptotic distributions of nine estimators with different combinations of estimated propensity score and outcome regressions. The study covers the asymptotic properties when all models are correctly specified, when either the propensity score or the outcome regressions are locally or globally misspecified, and when all models are locally or globally misspecified. The asymptotic variances are compared, and asymptotic bias correction under model misspecification is discussed. We also explore the phenomenon that the asymptotic variance under model misspecification can sometimes be even smaller than that with all models correctly specified. A numerical study is conducted to examine the theoretical results.

arXiv:2009.05711v1 [math.ST] 12 Sep 2020

Key words and phrases: Asymptotic variance, Conditional average treatment effect, Doubly robust estimation.

1. Introduction

The conditional average treatment effect (CATE), which conditions on some covariates of interest, is useful for exploring the heterogeneity of a treatment effect and for revealing the causality of a treatment under Rubin’s potential outcome framework (Rosenbaum and Rubin (1983)); see Abrevaya et al. (2015) for an example. Shi et al. (2019) showed that the existence of an optimal individualized treatment regime (OITR) is closely connected with CATE.
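For concreteness, the estimand can be written as follows. This is the standard definition in this literature (e.g., Abrevaya et al. (2015)). The symbols are assumed here because the paper's own notation is only introduced in Section 2: $Y(1)$ and $Y(0)$ are the potential outcomes, $D$ is the treatment indicator, $X$ is the full covariate vector, and $X_1$ is the subvector of covariates of interest. Under unconfoundedness and overlap, CATE can be expressed through either the outcome regressions $\mu_d(x) = E(Y \mid D = d, X = x)$ or the propensity score $p(x) = P(D = 1 \mid X = x)$:

$$\tau(x_1) = E\{Y(1) - Y(0) \mid X_1 = x_1\} = E\{\mu_1(X) - \mu_0(X) \mid X_1 = x_1\} = E\left\{\frac{DY}{p(X)} - \frac{(1-D)Y}{1-p(X)} \,\middle|\, X_1 = x_1\right\}.$$

The middle expression motivates the OR-based estimators and the last one motivates the PS-based (IPW) estimators discussed next.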

Several standard approaches to estimating CATE are available in the literature. When the propensity score function, the outcome regression functions, or both are unknown, they must first be estimated before the CATE function can be estimated. We regard these functions as nuisance models. Abrevaya et al. (2015) used propensity score-based (PS-based) estimation under parametric (P-IPW) and nonparametric (N-IPW) structures, and showed that N-IPW is asymptotically more efficient than P-IPW. Zhou and Zhu (2020) suggested PS-based estimation under a semiparametric dimension reduction structure (S-IPW) to show the advantage of semiparametric estimation. Li et al. (2020) considered outcome regression-based (OR-based) estimation under parametric (P-OR), semiparametric (S-OR) and nonparametric (N-OR) structures, derived the asymptotic properties of these estimators, and also suggested the use of the semiparametric method.

Together, these works provide an efficiency comparison between PS-based and OR-based estimators. Li et al. (2020) established a clear asymptotic efficiency ranking when the propensity score and outcome regression models are all correctly specified and the underlying nonparametric models are sufficiently smooth. In that setting, with carefully selected bandwidths and kernel functions, the nonparametric estimators achieve sufficiently fast rates of convergence:

$$\overbrace{\text{O-OR} \cong \text{P-OR} \succ \text{S-OR} \succ \text{N-OR}}^{\text{OR-based estimators}} \cong \overbrace{\text{N-IPW} \succ \text{S-IPW} \succ \text{P-IPW} \cong \text{O-IPW}}^{\text{PS-based CATE estimators}} \tag{1.1}$$

where $A \succ B$ means that A has an asymptotic efficiency advantage over B (a smaller asymptotic variance), and $A \cong B$ means that A and B are asymptotically equally efficient. O-OR and O-IPW denote the OR-based and PS-based estimators, respectively, computed with the nuisance models known, so that no estimation of them is needed.
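To make the two families concrete, here is a minimal sketch in plain Python of how a PS-based (IPW) and an OR-based CATE estimate can be formed once nuisance estimates are available. A pseudo-outcome is computed for each unit and then smoothed over $X_1$. Several choices are illustrative assumptions and not the exact estimators of the cited papers: the Nadaraya-Watson smoother with a Gaussian kernel, the scalar $X_1$, and all function names (`nw_smooth`, `cate_ipw`, `cate_or`, etc.). The nuisance estimates `p_hat`, `m1_hat` and `m0_hat` are taken as given, whether they come from parametric, semiparametric or nonparametric fits.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_smooth(x1_obs, values, x1, bandwidth):
    """Nadaraya-Watson estimate of E[value | X1 = x1] for a scalar X1."""
    weights = [gaussian_kernel((xi - x1) / bandwidth) for xi in x1_obs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def ipw_pseudo_outcome(d, y, p):
    """PS-based pseudo-outcome D*Y/p(X) - (1-D)*Y/(1-p(X))."""
    return d * y / p - (1 - d) * y / (1.0 - p)


def cate_ipw(x1_obs, d_obs, y_obs, p_hat, x1, bandwidth):
    """PS-based (IPW) CATE estimate at x1 from propensity score estimates p_hat."""
    pseudo = [ipw_pseudo_outcome(d, y, p) for d, y, p in zip(d_obs, y_obs, p_hat)]
    return nw_smooth(x1_obs, pseudo, x1, bandwidth)


def cate_or(x1_obs, m1_hat, m0_hat, x1, bandwidth):
    """OR-based CATE estimate at x1 from outcome regression estimates."""
    pseudo = [m1 - m0 for m1, m0 in zip(m1_hat, m0_hat)]
    return nw_smooth(x1_obs, pseudo, x1, bandwidth)


# Toy usage with made-up nuisance values (hypothetical, not data from the paper).
x1_obs = [0.0, 0.5, 1.0, 1.5]
print(cate_ipw(x1_obs, [1, 0, 1, 0], [2.0, 1.0, 3.0, 1.5], [0.5, 0.5, 0.6, 0.4], 0.75, 0.5))
print(cate_or(x1_obs, [2.0, 2.5, 3.0, 3.5], [1.0, 1.0, 1.0, 1.0], 0.75, 0.5))
```

Supplying the true $p$ or $\mu_d$ in place of the estimates corresponds to the oracle versions O-IPW and O-OR in (1.1).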

As is well known, the doubly robust (DR) method was first suggested in the form of the augmented inverse probability weighting (AIPW) estimator proposed by Robins et al. (1994). Later developments established estimation consistency (Scharfstein et al. (1999)) for more general doubly robust estimators, not restricted to AIPW, even when one of the two involved models is misspecified. For further discussion of DR estimation, readers can refer to, for example, Seaman and Vansteelandt (2018). Like Abrevaya et al. (2015), Lee et al. (2017) proposed a two-step AIPW estimator of CATE, also under a parametric structure. For cases with high-dimensional covariates, Fan et al. (2019) and Zimmert and Lechner (2019) combined such an estimator with statistical learning.
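The two-step DR idea can be sketched by using the AIPW pseudo-outcome, which combines both nuisance models, and then smoothing it over $X_1$. This is a minimal illustration that assumes a scalar $X_1$ and a Gaussian-kernel Nadaraya-Watson smoother. The function names are hypothetical, and the code is not claimed to be the exact estimator of Lee et al. (2017) or of this paper.

```python
import math


def aipw_pseudo_outcome(d, y, p, m1, m0):
    """AIPW pseudo-outcome for one unit.

    d: treatment indicator (0 or 1), y: observed outcome,
    p: estimated propensity score, m1/m0: estimated outcome regressions.
    """
    return (m1 - m0
            + d * (y - m1) / p
            - (1 - d) * (y - m0) / (1.0 - p))


def dr_cate(x1_obs, d_obs, y_obs, p_hat, m1_hat, m0_hat, x1, bandwidth):
    """Two-step DR estimate of CATE at x1 (Nadaraya-Watson, Gaussian kernel)."""
    # Step 1: AIPW pseudo-outcomes built from the nuisance estimates.
    pseudo = [aipw_pseudo_outcome(d, y, p, a, b)
              for d, y, p, a, b in zip(d_obs, y_obs, p_hat, m1_hat, m0_hat)]
    # Step 2: kernel-weighted local average of the pseudo-outcomes around x1.
    # The Gaussian normalizing constant cancels in the ratio, so it is omitted.
    weights = [math.exp(-0.5 * ((xi - x1) / bandwidth) ** 2) for xi in x1_obs]
    return sum(w * v for w, v in zip(weights, pseudo)) / sum(weights)
```

The term `m1 - m0` is the OR-based pseudo-outcome, and the two inverse-weighted residual terms correct it. This structure is why the estimate stays consistent if either the propensity score or the outcome regressions are correctly specified.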

In the current paper, we focus on the asymptotic efficiency comparisons among nine doubly robust estimators under parametric, semiparametric dimension reduction and nonparametric structures. To this end, we give a systematic study that provides insight into which combinations may have merit in an asymptotic sense and which ones are worth recommending in practice. We also consider the asymptotic efficiency when the nuisance models are globally or locally misspecified, as defined later. Roughly speaking, local misspecification means that a misspecified model converges, at a certain rate, to the corresponding correctly specified model as the sample size $n$ goes to infinity, whereas a globally misspecified model does not. Denote by $c_n$, $d_{1n}$ and $d_{0n}$, respectively, the degrees of departure of the working models from the corresponding correctly specified models. Denote by $V_i(x_1)$, $i = 1, 2, 3, 4$, the asymptotic variance functions of $x_1$ for the nine estimators in different scenarios; these are specified in Theorems 1, 2, 3 and 5, respectively. Here $V_1(x_1)$ is the asymptotic variance when all models are correctly specified, which serves as a benchmark for comparison. We have $V_1(x_1) \le V_3(x_1)$, whereas $V_2(x_1)$ and $V_4(x_1)$ are not necessarily larger than $V_1(x_1)$. The main findings of this paper are as follows.
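As a rough illustration only (the paper's precise definitions are given later and may take a different form), local and global misspecification of, say, the propensity score can be pictured as follows. Here $p$ is the true propensity score and $\tilde p$ is the working model; the direction function $a(x)$ and the fixed alternative $q(x)$ are hypothetical:

$$\text{local: } \tilde p_n(x) = p(x) + c_n\, a(x), \quad c_n \to 0 \text{ as } n \to \infty; \qquad \text{global: } \tilde p(x) = q(x), \quad q \neq p \text{ not depending on } n.$$

Under the local form, the departure vanishes at rate $c_n$. Under the global form, it does not vanish however large the sample.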

• When all nuisance models are correctly specified and the tuning parameters, including the bandwidths in the nonparametric estimations, are carefully selected, the asymptotic variances all equal $V_1(x_1)$. Writing DRCATE for any of the DR estimators and combining this with (1.1), the asymptotic efficiency ranking is

$$\overbrace{\text{O-OR} \cong \text{P-OR} \succ \text{S-OR} \succ \text{N-OR}}^{\text{OR-based estimators}} \cong \text{DRCATE} \cong \overbrace{\text{N-IPW} \succ \text{S-IPW} \succ \text{P-IPW} \cong \text{O-IPW}}^{\text{PS-based CATE estimators}}$$

• If only one type of nuisance model, either the propensity score or the outcome regressions, is misspecified, the estimators remain asymptotically unbiased, as expected. However, globally misspecified outcome regressions or a globally misspecified propensity score change the asymptotic variance. We give examples with the propensity score showing that the variance can be even smaller than that with correctly specified models. In contrast, when the nuisance models are only locally misspecified, the asymptotic efficiency remains the same as that with no misspecification.

• When all nuisance models are globally misspecified, the estimation bias must be taken into account. When the misspecifications are all local and the rates $c_n d_{1n}$ and $c_n d_{0n}$ are faster than the convergence rate of the nonparametric estimation, which will be specified later, the asymptotic distributions remain unchanged.
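The second finding can be checked informally with a small simulation. The sketch below is an illustration under assumed conditions, not the paper's numerical study.

- It uses a hypothetical data-generating process with a binary covariate of interest $X_1$. CATE at each level is therefore a plain subgroup mean of AIPW pseudo-outcomes, and no smoothing is needed.
- It deliberately misspecifies either the propensity score (replaced by the constant 0.5) or the outcome regressions (replaced by zero).
- The working models are fixed functions rather than estimates.

```python
import math
import random


# Hypothetical data-generating process (an assumption for illustration):
# X1 ~ Bernoulli(0.5) is the covariate of interest, X2 ~ Uniform(-1, 1),
# unit-variance noise, and true CATE tau(x1) = 1 + 2 * x1.
def true_ps(x1, x2):
    return 1.0 / (1.0 + math.exp(-(x2 + 0.3 * x1)))


def true_mu1(x1, x2):
    return x2 + 1.0 + 2.0 * x1


def true_mu0(x1, x2):
    return x2


def simulate(n, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = rng.uniform(-1.0, 1.0)
        d = 1 if rng.random() < true_ps(x1, x2) else 0
        mean = true_mu1(x1, x2) if d == 1 else true_mu0(x1, x2)
        data.append((x1, x2, d, mean + rng.gauss(0.0, 1.0)))
    return data


def aipw_pseudo_outcome(d, y, p, m1, m0):
    return m1 - m0 + d * (y - m1) / p - (1 - d) * (y - m0) / (1.0 - p)


def dr_cate_and_var(data, ps, mu1, mu0, x1_value):
    """Subgroup mean (CATE estimate) and sample variance of the pseudo-outcomes."""
    vals = [aipw_pseudo_outcome(d, y, ps(x1, x2), mu1(x1, x2), mu0(x1, x2))
            for x1, x2, d, y in data if x1 == x1_value]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    return mean, var


# Deliberately misspecified working models.
def constant_ps(x1, x2):
    return 0.5


def zero_or(x1, x2):
    return 0.0


data = simulate(40000, seed=1)
results = {
    "both correct": {x: dr_cate_and_var(data, true_ps, true_mu1, true_mu0, x) for x in (0, 1)},
    "PS misspecified": {x: dr_cate_and_var(data, constant_ps, true_mu1, true_mu0, x) for x in (0, 1)},
    "OR misspecified": {x: dr_cate_and_var(data, true_ps, zero_or, zero_or, x) for x in (0, 1)},
}
for name, res in results.items():
    print(name, {x: round(m, 3) for x, (m, _) in res.items()})
```

All three scenarios should give estimates close to the true values $\tau(0) = 1$ and $\tau(1) = 3$. In this toy design, replacing the true propensity score by 0.5 while keeping the correct outcome regressions even lowers the conditional variance of the pseudo-outcome. With unit noise variances, that variance falls from $1/[p(X)\{1 - p(X)\}]$ to 4. This is a simple instance of the phenomenon that misspecification does not necessarily enlarge the variance.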

For quick access to the results on the asymptotic variances, we summarize them in Table 1. PS(P), PS(N) and PS(S) denote estimators with parametrically, nonparametrically and semiparametrically estimated PS functions, respectively. OR(P), OR(N) and OR(S) denote estimators with parametrically, nonparametrically and semiparametrically estimated OR functions, respectively. Dark cells indicate that no such combination exists.

The remainder of this article is organized as follows. Section 2 describes Rubin’s potential outcome framework and the relevant notation. Section 3 presents a general two-step estimation of CATE, and Section 4 describes the corresponding asymptotic properties under different situations. Section 5 presents the results of Monte Carlo simulations, and Section 6 contains some concluding remarks. We would like to point out that such comparisons do not mean that the estimators that are of
asymptotic efficiency advantage are always worthwhile to recommend be-

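Table 1: Asymptotic variance result summary

| Combination | All correctly specified | Globally misspecified PS | Locally misspecified PS | Globally misspecified OR | Locally misspecified OR |
|---|---|---|---|---|---|
| PS(P) + OR(P) | $V_1(x_1)$ | $V_2(x_1)$ (not necessarily enlarged) | $V_1(x_1)$ | $V_3(x_1)$ (enlarged) | $V_1(x_1)$ |
| PS(P) + OR(N) | $V_1(x_1)$ | $V_1(x_1)$ | $V_1(x_1)$ | (dark) | (dark) |
| PS(N) + OR(P) | $V_1(x_1)$ | (dark) | (dark) | $V_1(x_1)$ | $V_1(x_1)$ |
| PS(N) + OR(N) | $V_1(x_1)$ | (dark) | (dark) | (dark) | (dark) |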

PS(P ) +OR(S) V1(x1)

V2(x1)

(Not necessarily

enla