hidden jargon in statistics

Randomization, replication, repeated measurement, blocking

Randomization means that both the allocation of the experimental material and the order in which the individual runs of the experiment are performed are determined randomly. Randomization helps satisfy the requirement of many statistical methods that the observations (or errors) be independently distributed random variables. By randomizing, the effects of extraneous factors are averaged out.
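As a minimal sketch of randomizing run order (the factor names and levels are hypothetical), all factor combinations are listed and the order in which the runs are performed is shuffled, so that time-related extraneous effects average out instead of confounding a factor:

```python
import random

temperatures = [150, 160]   # assumed levels of factor A
pressures = [1.0, 1.5]      # assumed levels of factor B

# Every factor combination appears once ...
run_order = [(t, p) for t in temperatures for p in pressures]

random.seed(0)              # fixed seed only so the sketch is reproducible
random.shuffle(run_order)   # ... and the run order is randomized
print(run_order)
```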

Replication means an independent repeat run of each factor combination. First, it allows the experimenter to obtain an estimate of the experimental error; this estimate of error becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean is used to estimate the true mean response for one of the factor levels in the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter.

Repeated measurement involves measuring the same subject multiple times; replication involves running the same study on different subjects under identical conditions (the same factor combination).

Blocking is a design technique used to improve the precision with which comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, which may influence the experimental response but in which we are not directly interested.
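As a sketch of how blocking restricts randomization (the block and treatment names here are hypothetical), a randomized complete block design runs every treatment once within each block, e.g. each raw-material batch, and randomizes order only within a block, so batch-to-batch variability does not inflate the treatment comparison:

```python
import random

treatments = ["A", "B", "C"]                 # assumed treatments of interest
blocks = ["batch1", "batch2", "batch3"]      # assumed nuisance-factor levels

random.seed(1)
design = {}
for block in blocks:
    order = treatments[:]
    random.shuffle(order)   # randomization restricted to within the block
    design[block] = order

for block, order in design.items():
    print(block, order)
```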

standard deviation and standard error

  • Standard deviation describes the variability within a single sample, while standard error describes the variability across multiple samples of the same population
  • Standard deviation is a descriptive statistic that can be calculated directly from sample data, while standard error is an inferential statistic that can only be estimated from a sample

Standard deviation involves the population standard deviation \(\sigma\) and the sample standard deviation. The sample standard deviation comes in two forms: the biased estimator of \(\sigma\), \(s_n=\sqrt{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2}\), and the corrected estimator \(s_{n-1}=\sqrt{\frac{n}{n-1}s_n^2}=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2}\), whose square is an unbiased estimator of \(\sigma^2\).
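As a quick check of the relation \(s_{n-1}=\sqrt{\frac{n}{n-1}}\,s_n\), here is a minimal sketch on a made-up sample:

```python
import math

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample data
n = len(x)
mean = sum(x) / n                               # x-bar = 5.0 here

# Biased estimator s_n (divide by n) vs corrected s_{n-1} (divide by n-1)
s_n = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
s_n1 = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

print(s_n)    # 2.0 for this sample
print(s_n1)   # sqrt(n/(n-1)) * s_n, about 2.138
```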

Standard error is the standard deviation of an estimator \(\hat{\theta}\); in other words, it is the standard deviation of the sampling distribution of that estimator. The standard error of the mean \(\bar{x}\) (SEM) is equal to the population standard deviation divided by the square root of the sample size \(n\).

Suppose \(X_1,X_2,\cdots,X_n\) are a random sample from an \(N(\mu,\sigma^2)\) distribution. Then the standard deviation of \(\bar{X}\) is the standard error: \(SD(\bar{X})=\sigma/\sqrt{n}\). Usually \(\sigma\) is unknown and is estimated by \(s_n\) or \(s_{n-1}\). Therefore, given the observed sample standard deviation \(s_n\) or \(s_{n-1}\), the standard error of the mean \(\bar{X}\) can be estimated as \(SE(\bar{X})=\frac{s_n}{\sqrt{n}}\) or \(SE(\bar{X})=\frac{s_{n-1}}{\sqrt{n}}\).
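A minimal sketch of estimating \(SE(\bar{X})=s_{n-1}/\sqrt{n}\) from an observed sample (the data values are made up for illustration):

```python
import math

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up observed sample
n = len(x)
mean = sum(x) / n

# Corrected sample standard deviation s_{n-1}
s_n1 = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# Estimated standard error of the mean: s_{n-1} / sqrt(n)
sem = s_n1 / math.sqrt(n)
print(sem)
```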

Confidence interval and credible interval

A great explanation of the difference between a confidence interval and a credible interval can be found in the Q&A What's the difference between a confidence interval and a credible interval?

Sufficient statistic

Here is a useful Q&A explaining the definition of a sufficient statistic:

Understanding Sufficient statistic.

Power, sample size, and effect size

Here is a great blog post explaining the definition of power and how sample size and effect size are calculated:

power and sample size determination
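As an illustration of the kind of sample size calculation such references describe, here is a sketch using the normal-approximation formula \(n = 2\left(\frac{z_{1-\alpha/2}+z_{1-\beta}}{d}\right)^2\) per group for a two-sided two-sample test, where \(d\) is the standardized effect size (Cohen's d). The function name and defaults are my own choices, and the normal approximation slightly understates the n that an exact t-based calculation gives:

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_sample(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided two-sample z-test
    with standardized effect size `effect_size` (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g. 0.84 for power = 0.8
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(sample_size_two_sample(0.5))   # medium effect: 63 per group
```

Note the familiar trade-off: a larger effect size needs fewer subjects per group, while higher power or a smaller alpha needs more.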

Reference