Table of Contents

Preface

xi

Chapter 1

Introduction to Biostatistics

1

1.1

What is Biostatistics?

1

1.2

Populations, Samples, and Statistics

2

1.2.1

The Basic Biostatistical Terminology

3

1.2.2

Biomedical Studies

5

1.2.3

Observational Studies versus Experiments

7

1.3

Clinical Trials

9

1.3.1

Safety and Ethical Considerations in a Clinical Trial

9

1.3.2

Types of Clinical Trials

10

1.3.3

The Phases of a Clinical Trial

11

1.4

Data Set Descriptions

12

1.4.1

Birth Weight Data Set

12

1.4.2

Body Fat Data Set

12

1.4.3

Coronary Heart Disease Data Set

12

1.4.4

Prostate Cancer Study Data Set

13

1.4.5

Intensive Care Unit Data Set

14

1.4.6

Mammography Experience Study Data Set

14

1.4.7

Benign Breast Disease Study

15

 

Glossary

17

 

Exercises

18

Chapter 2

Describing Populations

23

2.1

Populations and Variables

23

2.1.1

Qualitative Variables

24

2.1.2

Quantitative Variables

25

2.1.3

Multivariate Data

27

2.2

Population Distributions and Parameters

28

2.2.1

Distributions

29

2.2.2

Describing a Population with Parameters

33

2.2.3

Proportions and Percentiles

33

2.2.4

Parameters Measuring Centrality

35

2.2.5

Measures of Dispersion

38

2.2.6

The Coefficient of Variation

41

2.2.7

Parameters for Bivariate Populations

43

2.3

Probability

46

2.3.1

Basic Probability Rules

46

2.3.2

Conditional Probability

50

2.3.3

Independence

52

2.4

Probability Models

53

2.4.1

The Binomial Probability Model

54

2.4.2

The Normal Probability Model

57

2.4.3

Z Scores

63

 

Glossary

64

 

Exercises

65

Chapter 3

Random Sampling

76

3.1

Obtaining Representative Data

76

3.1.1

The Sampling Plan

78

3.1.2

Probability Samples

78

3.2

Commonly Used Sampling Plans

80

3.2.1

Simple Random Sampling

80

3.2.2

Stratified Random Sampling

84

3.2.3

Cluster Sampling

86

3.2.4

Systematic Sampling

88

3.3

Determining the Sample Size

89

3.3.1

The Sample Size for a Simple Random Sample

89

3.3.2

The Sample Size for a Stratified Random Sample

93

3.3.3

Determining the Sample Size in a Systematic Random Sample

99

 

Glossary

100

 

Exercises

102

Chapter 4

Summarizing Random Samples

109

4.1

Samples and Inferential Statistics

109

4.2

Inferential Graphical Statistics

110

4.2.1

Bar and Pie Charts

111

4.2.2

Boxplots

114

4.2.3

Histograms

120

4.2.4

Normal Probability Plots

126

4.3

Numerical Statistics for Univariate Data Sets

129

4.3.1

Estimating Population Proportions

129

4.3.2

Estimating Population Percentiles

136

4.3.3

Estimating the Mean, Median, and Mode

137

4.3.4

Estimating the Variance and Standard Deviation

143

4.3.5

Linear Transformations

148

4.3.6

The Plug-In Rule for Estimation

151

4.4

Statistics for Multivariate Data Sets

153

4.4.1

Graphical Statistics for Bivariate Data Sets

154

4.4.2

Numerical Summaries for Bivariate Data Sets

156

4.4.3

Fitting Lines to Scatterplots

161

 

Glossary

163

 

Exercises

166

Chapter 5

Measuring the Reliability of Statistics

181

5.1

Sampling Distributions

181

5.1.1

Unbiased Estimators

183

5.1.2

Measuring the Accuracy of an Estimator

184

5.1.3

The Bound on the Error of Estimation

186

5.2

The Sampling Distribution of a Sample Proportion

187

5.2.1

The Mean and Standard Deviation of the Sampling Distribution of p̂

187

5.2.2

Determining the Sample Size for a Prespecified Value of the Bound on the Error of Estimation

190

5.2.3

The Central Limit Theorem for p̂

191

5.2.4

Some Final Notes on the Sampling Distribution of p̂

192

5.3

The Sampling Distribution of x̄

193

5.3.1

The Mean and Standard Deviation of the Sampling Distribution of x̄

193

5.3.2

Determining the Sample Size for a Prespecified Value of the Bound on the Error of Estimation

196

5.3.3

The Central Limit Theorem for x̄

197

5.3.4

The t Distribution

199

5.3.5

Some Final Notes on the Sampling Distribution of x̄

201

5.4

Comparisons Based on Two Samples

202

5.4.1

Comparing Two Population Proportions

203

5.4.2

Comparing Two Population Means

209

5.5

Bootstrapping the Sampling Distribution of a Statistic

215

 

Glossary

218

 

Exercises

219

Chapter 6

Confidence Intervals

229

6.1

Interval Estimation

229

6.2

Confidence Intervals

230

6.3

Single Sample Confidence Intervals

232

6.3.1

Confidence Intervals for Proportions

233

6.3.2

Confidence Intervals for a Mean

236

6.3.3

Large Sample Confidence Intervals for μ

237

6.3.4

Small Sample Confidence Intervals for μ

238

6.3.5

Determining the Sample Size for a Confidence Interval for the Mean

241

6.4

Bootstrap Confidence Intervals

243

6.5

Two Sample Comparative Confidence Intervals

244

6.5.1

Confidence Intervals for Comparing Two Proportions

244

6.5.2

Confidence Intervals for the Relative Risk

249

 

Glossary

252

 

Exercises

253

Chapter 7

Testing Statistical Hypotheses

265

7.1

Hypothesis Testing

265

7.1.1

The Components of a Hypothesis Test

265

7.1.2

P-Values and Significance Testing

272

7.2

Testing Hypotheses About Proportions

276

7.2.1

Single Sample Tests of a Population Proportion

276

7.2.2

Comparing Two Population Proportions

282

7.2.3

Tests of Independence

287

7.3

Testing Hypotheses About Means

295

7.3.1

t-Tests

295

7.3.2

t-Tests for the Mean of a Population

298

7.3.3

Paired Comparison t-Tests

302

7.3.4

Two Independent Sample t-Tests

307

7.4

Some Final Comments on Hypothesis Testing

313

 

Glossary

314

 

Exercises

315

Chapter 8

Simple Linear Regression

333

8.1

Bivariate Data, Scatterplots, and Correlation

333

8.1.1

Scatterplots

333

8.1.2

Correlation

336

8.2

The Simple Linear Regression Model

340

8.2.1

The Simple Linear Regression Model

341

8.2.2

Assumptions of the Simple Linear Regression Model

343

8.3

Fitting a Simple Linear Regression Model

344

8.4

Assessing the Assumptions and Fit of a Simple Linear Regression Model

347

8.4.1

Residuals

348

8.4.2

Residual Diagnostics

348

8.4.3

Estimating σ and Assessing the Strength of the Linear Relationship

355

8.5

Statistical Inferences Based on a Fitted Model

358

8.5.1

Inferences About β0

359

8.5.2

Inferences About β1

360

8.6

Inferences About the Response Variable

363

8.6.1

Inferences About μY|X

363

8.6.2

Inferences for Predicting Values of Y

365

8.7

Some Final Comments on Simple Linear Regression

366

 

Glossary

369

 

Exercises

371

Chapter 9

Multiple Regression

383

9.1

Investigating Multivariate Relationships

385

9.2

The Multiple Linear Regression Model

387

9.2.1

The Assumptions of a Multiple Regression Model

388

9.3

Fitting a Multiple Linear Regression Model

390

9.4

Assessing the Assumptions of a Multiple Linear Regression Model

390

9.4.1

Residual Diagnostics

394

9.4.2

Detecting Multivariate Outliers and Influential Observations

399

9.5

Assessing the Adequacy of Fit of a Multiple Regression Model

401

9.5.1

Estimating σ

401

9.5.2

The Coefficient of Determination

401

9.5.3

Multiple Regression Analysis of Variance

403

9.6

Statistical Inferences Based on a Multiple Regression Model

406

9.6.1

Inferences About the Regression Coefficients

406

9.6.2

Inferences About the Response Variable

408

9.7

Comparing Multiple Regression Models

410

9.8

Multiple Regression Models with Categorical Variables

413

9.8.1

Regression Models with Dummy Variables

415

9.8.2

Testing the Importance of Categorical Variables

418

9.9

Variable Selection Techniques

421

9.9.1

Model Selection Using Maximum R²adj

422

9.9.2

Model Selection Using BIC

424

9.10

Some Final Comments on Multiple Regression

425

 

Glossary

427

 

Exercises

429

Chapter 10

Logistic Regression

446

10.1

Odds and Odds Ratios

447

10.2

The Logistic Regression Model

450

10.2.1

Assumptions of the Logistic Regression Model

452

10.3

Fitting a Logistic Regression Model

454

10.4

Assessing the Fit of a Logistic Regression Model

456

10.4.1

Checking the Assumptions of a Logistic Regression Model

456

10.4.2

Testing for the Goodness of Fit of a Logistic Regression Model

458

10.4.3

Model Diagnostics

459

10.5

Statistical Inferences Based on a Logistic Regression Model

465

10.5.1

Inferences About the Logistic Regression Coefficients

465

10.5.2

Comparing Models

467

10.6

Variable Selection

470

10.7

Some Final Comments on Logistic Regression

473

 

Glossary

474

 

Exercises

476

Chapter 11

Design of Experiments

487

11.1

Experiments versus Observational Studies

487

11.2

The Basic Principles of Experimental Design

490

11.2.1

Terminology

490

11.2.2

Designing an Experiment

491

11.3

Experimental Designs

493

11.3.1

The Completely Randomized Design

495

11.3.2

The Randomized Block Design

498

11.4

Factorial Experiments

500

11.4.1

Two-Factor Experiments

502

11.4.2

Three-Factor Experiments

504

11.5

Models for Designed Experiments

506

11.5.1

The Model for a Completely Randomized Design

506

11.5.2

The Model for a Randomized Block Design

508

11.5.3

Models for Experimental Designs with a Factorial Treatment Structure

509

11.6

Some Final Comments on Designed Experiments

511

 

Glossary

511

 

Exercises

513

Chapter 12

Analysis of Variance

520

12.1

Single-Factor Analysis of Variance

521

12.1.1

Partitioning the Total Experimental Variation

523

12.1.2

The Model Assumptions

524

12.1.3

The F-Test

527

12.1.4

Comparing Treatment Means

528

12.2

Randomized Block Analysis of Variance

533

12.2.1

The ANOVA Table for the Randomized Block Design

534

12.2.2

The Model Assumptions

536

12.2.3

The F-Test

538

12.2.4

Separating the Treatment Means

539

12.3

Multifactor Analysis of Variance

542

12.3.1

Two-Factor Analysis of Variance

542

12.3.2

Three-Factor Analysis of Variance

550

12.4

Selecting the Number of Replicates in Analysis of Variance

555

12.4.1

Determining the Number of Replicates from the Power

555

12.4.2

Determining the Number of Replicates from D

556

12.5

Some Final Comments on Analysis of Variance

557

 

Glossary

558

 

Exercises

559

Chapter 13

Survival Analysis

575

13.1

The Kaplan–Meier Estimate of the Survival Function

576

13.2

The Proportional Hazards Model

582

13.3

Logistic Regression and Survival Analysis

586

13.4

Some Final Comments on Survival Analysis

588

 

Glossary

589

 

Exercises

590

 

References

599

 

Appendix A

605

 

Problem Solutions

613

 

Index

643