
The p.d.f. of the log-normal distribution is \[ f_X(x)=\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{\ln x-\mu}{\sigma})^2} \]

The log-normal distribution is a right-skewed continuous probability distribution, meaning it has a long tail towards the right. It is used for modelling various natural phenomena, such as income distributions, the length of chess games, or the time to repair a maintainable system.

If the random variable \(X\) is log-normally distributed, then \(Y=\ln(X)\) has a normal distribution. We can apply the same rule from Density of transformed random variable to prove this statement. \[ \begin{gather*} X\sim LogNormal(\mu,\sigma^2)\\ Y=\ln X\\ X=\psi(Y)=e^Y\\ \Rightarrow g_Y(y)=f(\psi(y))|\frac{dx}{dy}|\\ =\frac{1}{e^y\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y-\mu}{\sigma})^2}e^y\\ =\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y-\mu}{\sigma})^2}\\ \Rightarrow Y\sim N(\mu,\sigma^2) \end{gather*} \]
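As a quick sanity check (a sketch, not part of the derivation above; the values of mu and sigma are arbitrary), we can simulate from a log-normal distribution in R and confirm that the log of the samples behaves like \(N(\mu,\sigma^2)\).

set.seed(1)
mu <- 1; sigma <- 0.5
x <- rlnorm(1e5, meanlog = mu, sdlog = sigma)
y <- log(x)
c(mean(y), sd(y))        # should be close to mu = 1 and sigma = 0.5
qqnorm(y); qqline(y)     # points fall on the line if log(X) is normal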

Reference

A good interpretation of the Beta distribution can be found in the thread What is the intuition behind beta distribution.

The short version is that the Beta distribution can be understood as representing a distribution of probabilities, that is, it represents all the possible values of a probability when we don't know what that probability is.

For \(\theta\sim Beta(\alpha,\beta)\) \[ \begin{gather*} p(\theta)=\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\\ =\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}\\ \theta\in[0,1] \end{gather*} \] The expectation and variance of the Beta distribution are \[ \begin{gather*} E(\theta)=\frac{\alpha}{\alpha+\beta}\\ var(\theta)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \end{gather*} \] When \(\alpha=\beta=1\), the Beta distribution is equivalent to the uniform distribution \[ \begin{gather*} p(\theta)=\frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\theta^0(1-\theta)^0\\ =\frac{1}{1\cdot 1}\\ =1\\ F_{\Theta}(x)=\int_0^x 1\,d\theta=x,\quad x\in[0,1]\\ \Rightarrow \theta\sim U(0,1) \end{gather*} \]
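As a small illustration (a sketch with arbitrarily chosen parameters, not from the original post), we can check the Beta(1,1) = U(0,1) equivalence and the mean/variance formulas numerically in R.

curve(dbeta(x, 1, 1), 0, 1, ylim = c(0, 2), ylab = "density")   # flat density equal to 1
curve(dunif(x), 0, 1, lty = 2, add = TRUE)                      # overlaps exactly
a <- 2; b <- 5
theta <- rbeta(1e5, a, b)
c(mean(theta), a / (a + b))                                     # E(theta) = alpha/(alpha+beta)
c(var(theta), a * b / ((a + b)^2 * (a + b + 1)))                # var(theta)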

Reference

Q value

From Wikipedia, the q-value is described as follows

In statistical hypothesis testing, specifically multiple hypothesis testing, the q-value provides a means to control the positive false discovery rate (pFDR) (John D. Storey, 2003). Just as the p-value gives the expected false positive rate obtained by rejecting the null hypothesis for any result with an equal or smaller p-value, the q-value gives the expected pFDR obtained by rejecting the null hypothesis for any result with an equal or smaller q-value.

From Statistics and Data Analysis: From Elementary to Intermediate, we can find the explicit definition of the p-value

Simply rejecting or not rejecting \(H_0\) at a specified \(\alpha\) does not fully convey the information in the data. It is more useful to report the smallest \(\alpha\)-level at which the observed test result is significant. This smallest \(\alpha\)-level is called the observed level of significance or the P-value. The smaller the P-value, the more significant is the test result. Once the P-value is computed, a test at any specified \(\alpha\) can be conducted by rejecting \(H_0\) if P-value \(<\alpha\). An alternative definition of the P-value is that it is the probability, under the null hypothesis, of obtaining a test statistic at least as "extreme" as the observed value.

We can provide an example to enhance the understanding of p-values and q-values. Assume we run \(1000\) tests, each producing a p-value, and order these p-values from smallest to largest. A p-value threshold of \(0.05\) means that, if all null hypotheses were true, about \(5\%\) of all tests, i.e. \(50\) tests, would be expected to give false positive results. A q-value threshold of \(0.05\) means that all tests with q-values less than \(0.05\) are called significant, and about \(5\%\) of those significant tests are expected to be false positives.

One of the methods that adjusts p-values to q-values is the Benjamini-Hochberg (BH) method. The BH-adjusted p-values are defined as \[ p_{(i)}^{BH}=\min\{\underset{j\geq i}{\min}\{\frac{mp_{(j)}}{j}\},1\} \] The process is as follows.

  1. Order all p-values in an ascending sequence. Then multiply each p-value by the total number of tests \(m\) and divide by its rank order
  2. For the first (smallest) p-value, find the minimum value of \(\frac{mp_{(j)}}{j}\) for \(j=1,2,\cdots,m\) and compare it with \(1\); if it is larger than \(1\), assign the q-value as \(1\). This value is the smallest q-value among all tests. For the second p-value, find the minimum value of \(\frac{mp_{(j)}}{j}\) for \(j=2,\cdots,m\); this q-value must be greater than or equal to the first q-value. We continue this procedure until all p-values have been adjusted to q-values

We can implement the BH method in R

> BH <- function(p.values)
+ {
+ p <- p.values
+ lp <- length(p)
+ i <- lp:1L
+ # key1: order that sorts the p-values in decreasing order
+ o <- order(p,decreasing = TRUE)
+ # key2: inverse permutation of key1, used to restore the original order
+ ro <- order(o)
+ pmin(1,cummin(lp/i * p[o]))[ro]
+ }
> BH(c(0.05,0.02,0.2,0.4,0.5))
[1] 0.1250000 0.1000000 0.3333333 0.5000000 0.5000000
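As a cross-check, base R's p.adjust with method = "BH" implements the same adjustment and should reproduce the values above:

p.adjust(c(0.05, 0.02, 0.2, 0.4, 0.5), method = "BH")   # should match the BH() output above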

We can visualize this process

p.value                 | 0.05 | 0.02 | 0.2 | 0.4 | 0.5
p.value position (key1) | 1    | 2    | 3   | 4   | 5

\[ \downarrow \]

p.value position (key1) | 5   | 4   | 3   | 1     | 2
q.value                 | 0.5 | 0.5 | 1/3 | 0.125 | 0.1
position of key (key2)  | 1   | 2   | 3   | 4     | 5

\[ \downarrow \]

position of key (key2)  | 4     | 5    | 3   | 2   | 1
p.value position (key1) | 1     | 2    | 3   | 4   | 5
p.value                 | 0.05  | 0.02 | 0.2 | 0.4 | 0.5
q.value                 | 0.125 | 0.1  | 1/3 | 1/2 | 1/2

Reference

Suppose we are given the distribution of a continuous random variable \(X\). Let \(Y=\phi(X)\), where \(\phi\) is an invertible (one-to-one) function. Let \(\psi=\phi^{-1}\) so that \(X=\psi(Y)\). Furthermore, suppose \(\psi\) is differentiable. Then the p.d.f. of \(Y\) is \[ \begin{gather} g(y)=f(\psi(y))|\frac{dx}{dy}|\\ \text{where }x=\psi(y) \end{gather} \] From the equation, we can see that \(|\frac{dx}{dy}|\) accounts for the change in dispersion when the random variable \(X\) is transformed into \(Y\). Thus one cannot simply replace \(x\) with \(\psi(y)\) to derive the density of \(Y\) without considering the difference in dispersion between \(X\) and \(Y\).

Proof:

Let \(G(y)\) represent the c.d.f. of \(Y\). First suppose \(\phi\) is a strictly increasing function. Then \(\psi=\phi^{-1}\) is also a strictly increasing function and \(d\psi(y)/dy>0\). Therefore \[ \begin{gather} G(y)=P[Y\leq y]=P[\phi(X)\leq y]\\ =P[X\leq \phi^{-1}(y)]=F(\psi(y)) \end{gather} \] Differentiating \(G(y)\) by using the chain rule, it follows that \[ \begin{gather} g(y)=\frac{dG(y)}{dy}=\frac{dF(\psi(y))}{dy}\\ =f(\psi(y))\frac{d\psi(y)}{dy}=f(\psi(y))|\frac{dx}{dy}| \end{gather} \] Next suppose \(\phi\) is a strictly decreasing function. Then \(\psi=\phi^{-1}\) is also a strictly decreasing function and \(d\psi(y)/dy<0\). Therefore \[ G(y)=P[Y\leq y] = P[\phi(X)\leq y]=P[X\geq\phi^{-1}(y)]=1-F(\psi(y)) \] Differentiating \(G(y)\) by using the chain rule, it follows that \[ \begin{gather} g(y)=\frac{dG(y)}{dy} = -\frac{dF(\psi(y))}{dy}\\ =-f(\psi(y))\frac{d\psi(y)}{dy}\\ =f(\psi(y))|\frac{dx}{dy}| \end{gather} \]

Example

Suppose \(Y=\ln X\), where \(X\sim Beta(a,b)\).

  • Then we have p.d.f. of \(X\)

\[ \begin{gather} f_X(x)=\frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\\ \text{where }B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}\\ x\in[0,1] \end{gather} \]

\[ \begin{gather} Y=\phi(X)=\ln X\\ Y\in(-\infty,0]\\ X=\psi(Y)=e^Y\\ X\in[0,1] \end{gather} \]

  • Then we have p.d.f. of \(Y\)

\[ \begin{gather} g_Y(y)=f_X(\psi(y))|\frac{dx}{dy}|\\ =\frac{e^{y(a-1)}(1-e^y)^{b-1}}{B(a,b)}e^y\ (\frac{dx}{dy}=\frac{de^y}{dy}=e^y)\\ =\frac{e^{ay}(1-e^y)^{b-1}}{B(a,b)} \end{gather} \]

  • Next we derive the m.g.f. (moment generating function) of \(Y\)

\[ \begin{gather} M_Y(t)=E[e^{tY}]\\ =\int_{-\infty}^0e^{ty}g_Y(y)dy\\ =\int_{-\infty}^0\frac{e^{ay}(1-e^y)^{b-1}}{B(a,b)}e^{ty}dy\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_{-\infty}^0(e^y)^{a+t}(1-e^y)^{b-1}dy\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^1x^{a+t}(1-x)^{b-1}d\ln x\ (e^y=e^{\ln x}=x)\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^1x^{a+t}(1-x)^{b-1}\frac{1}{x}dx\\(\frac{dy}{dx}=\frac{d\ln x}{dx}=\frac{1}{x}\Rightarrow d\ln x=\frac{1}{x}dx)\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^1x^{a+t-1}(1-x)^{b-1}dx\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}B(a+t,b)\int_0^1\frac{x^{a+t-1}(1-x)^{b-1}}{B(a+t,b)}dx\\ =\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}B(a+t,b)\\ (\int_0^1\frac{x^{a+t-1}(1-x)^{b-1}}{B(a+t,b)}dx\text{ is the integration of distribution }Beta(a+t,b)\text{ which is 1})\\ =\frac{\Gamma(a+b)\Gamma(a+t)\Gamma(b)}{\Gamma(a)\Gamma(b)\Gamma(a+b+t)}\\ =\frac{\Gamma(a+b)\Gamma(a+t)}{\Gamma(a+b+t)\Gamma(a)} \end{gather} \]
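Since \(Y=\ln X\) implies \(M_Y(t)=E[X^t]\), we can sanity-check the closed form numerically in R (a sketch; the values of a, b, and t below are arbitrary):

a <- 2; b <- 3; t <- 0.7
mc <- mean(rbeta(1e6, a, b)^t)                                   # Monte Carlo estimate of E[X^t]
exact <- gamma(a + b) * gamma(a + t) / (gamma(a + b + t) * gamma(a))
c(mc, exact)                                                     # the two values should agree closely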

Types of Statistical studies

  • Study with different purposes

    • Comparative (analytical) study

      A study whose purpose is to compare two or more alternative methods or groups distinguished by some attribute is called a comparative (analytical) study

    • Noncomparative (descriptive) study

      A study whose purpose is to learn about the characteristics of a group, but not necessarily to make comparisons, is called a noncomparative (descriptive) study

  • Study with or without intervention

    • Observational study

      The researcher is a passive observer who documents the process

    • Experimental study

      The researcher actively intervenes to control the study conditions and records the response

    Remark: Both study types can be either comparative or noncomparative

Confounding variable

If the effects of predictor variables cannot be separated from the effects of the uncontrolled variables, those uncontrolled variables are called confounding variables. A variable must meet two conditions to be a confounder:

  • It must be correlated with the independent variable. This may be a causal relationship, but it does not have to be.
  • It must be causally related to the dependent variable

Sometimes the presence of these variables is not recognized until after the fact, in which case they are referred to as lurking variables

graph TD
A[Confounding variable]-.->B[predictor variable];
B -->C[response variable];
A -->C;

See more info

Comparative study

The goal of the simplest and most common comparative studies is to evaluate how a "change" from a "baseline" or "normal" condition affects a response variable.

  • The changed condition is called a treatment or an intervention and the normal condition is called a control

  • The treatment and control groups should be similar in all other respects except for the changed condition to provide a valid comparison

Observational study

  • A sample survey provides a snapshot of the population based on a sample observed at a point in time
  • A prospective study follows a sample forward in time
  • A retrospective study follows a sample backward in time

Sample survey

  • A numerical characteristic of a population defined for a specific variable is called a parameter
  • A numerical function of the sample data, called a statistic, is used to estimate the unknown population parameter
  • It is possible to list all units in a finite population; a list of all units in a finite population is called a sampling frame;
  • It is impossible to list all units of an infinite population as there is no upper limit on the number of units in a population
  • The population of interest is called the target population; sometimes the target population is difficult to study, so sampling is done using another easily accessible population, called the sampled population
  • The deviation between a sample estimate and the true parameter value is called the sampling error; nonsampling errors often cause bias, which is a systematic deviation between the sample estimate and the population parameter; bias is a more serious problem than sampling error because it does not disappear as the sample size increases. For example, measurement bias affects all measurements. Nonsampling error includes other types such as self-selection bias and response bias

Basic sampling designs

  • Simple random sample(SRS)

    Draw a sample of size \(n\) from a population of size \(N\) without replacement

  • Stratified Random sampling

    Divide a population into homogeneous subpopulations and draw SRS from each one

  • Multistage cluster sampling

    Draw an SRS of clusters at each stage, from the top level down to the bottom

  • Systematic sampling

    A 1-in-\(k\) systematic sample consists of selecting one unit at random from the first \(k\) units and selecting every \(k\)th unit thereafter; it is mainly useful for cases such as an assembly line or cars entering a toll booth on a highway
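The following R sketch illustrates the four designs on a made-up population of 1000 units split into 4 groups (all names and sizes are arbitrary, not from the original post):

N <- 1000
pop <- data.frame(id = 1:N, group = rep(1:4, each = 250))

srs <- pop[sample(N, 50), ]                                   # simple random sample, n = 50
strat <- do.call(rbind, lapply(split(pop, pop$group),
                 function(s) s[sample(nrow(s), 10), ]))       # stratified: SRS within each group
chosen <- sample(unique(pop$group), 2)                        # multistage: stage 1 picks clusters,
clus <- pop[pop$group %in% chosen, ]
clus_sample <- clus[sample(nrow(clus), 20), ]                 # stage 2 draws an SRS within them
k <- 20; start <- sample(k, 1)
sys <- pop[seq(start, N, by = k), ]                           # 1-in-k systematic sample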

Experimental studies

  • Treatment factors are controlled in an experiment; their effects on the response variable are of primary interest
  • Nuisance factors, or noise factors are all the other factors that might affect the response variable
  • The different possible values of a factor are called levels, and each treatment is a particular combination of the levels of different treatment factors
  • Subjects or items receiving treatments are called experimental units; all experimental units receiving the same treatment form a treatment group
  • A run is an observation made on an experimental unit under a particular treatment condition; a replicate is another independent run carried out under identical conditions; repeat measurements of the same response are not replicates

Strategies to reduce experimental error variation

Main components of experimental error:

  • Systematic error

    Caused by differences between experimental units; the nuisance factors on which the experimental units differ are said to confound or bias the treatment comparisons

  • Random error

    Caused by the inherent variability in the response of similar experimental units given the same treatment

  • Measurement error

    Caused by imprecise measurement instruments

  • Strategies to reduce the systematic error with respect to type of nuisance factor

    • Blocking: nuisance factor is controllable
    • Regression Analysis: some nuisance factors cannot be used as blocking factors because they are not controllable; such nuisance factors are called covariates. A covariate should be measured before the treatments are applied to the experimental units because it could also be affected by the treatment
    • Randomization: the experimental units may well differ on additional known or unknown nuisance factors, thus biasing the results. Randomization does not make the experimental units equal for each treatment, but it ensures that no one treatment is systematically favored

    Note: A covariate can be an independent variable (i.e. of direct interest) or it can be an unwanted, confounding variable. Adding a covariate to a model can increase the accuracy of the results. See more info

    In summary: Block over those nuisance factors that can be easily controlled, randomize over the rest

  • Strategy to reduce the random error

    Replicating the experimental runs; making multiple independent runs under identical treatment conditions

  • Strategy to reduce the measurement error

    Repeat measurements, including having different subjects make the measurements;

    Variation within each group of measurements can be used to estimate random and measurement errors.

Basic experimental designs

  • Completely randomized design (CRD)

    All experimental units are assigned at random to the treatments

    Batch 1 | Batch 2 | Batch 3 | Batch 4 | Batch 5
    A       | C       | D       | D       | B
    B       | A       | A       | C       | D
    C       | D       | B       | A       | C
    B       | C       | D       | B       | A

    Randomly assign the \(20\) samples to the \(4\) treatments \(ABCD\) without considering the batches they are from

  • Randomized block design (RBD)

    Forming blocks of units which are similar (in terms of chosen blocking factors)

    Two special cases:

    • Matched pairs design: Match subjects on the nuisance factors (controllable)
    • Cross-over design: each subject receives the treatments in sequence; the order in which the methods are administered may introduce bias due to a learning effect, so it should be balanced or randomized
    Batch 1 | Batch 2 | Batch 3 | Batch 4 | Batch 5
    A       | D       | C       | D       | B
    C       | A       | B       | C       | C
    B       | C       | D       | B       | A
    D       | B       | A       | A       | D

    Randomly assign the \(4\) treatments to the experimental units within each block (both layouts are sketched in R below)
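Both layouts can be generated in R; this is only an illustrative sketch of the randomization, not a reproduction of the tables above:

treatments <- c("A", "B", "C", "D")
# CRD: assign the 20 units to the 4 treatments completely at random, ignoring batches
crd <- matrix(sample(rep(treatments, 5)), nrow = 4,
              dimnames = list(NULL, paste("Batch", 1:5)))
# RBD: randomize the 4 treatments separately within each batch (block)
rbd <- sapply(1:5, function(b) sample(treatments))
colnames(rbd) <- paste("Batch", 1:5)
crd
rbd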

The coefficient of determination \(R^2\)

To begin with, we need to define several terms

  • SST: the total sum of squares, the sum of squared differences between the observed values of the dependent variable and their mean \[ \sum(y_i-\overline{y})^2 \]

  • SSR/ESS: the sum of squares due to regression, or explained sum of squares, the sum of squared differences between the predicted values and the mean of the dependent variable \[ \sum(\hat{y}_i-\overline{y})^2=\sum(\hat{y}_i-\overline{\hat{y}})^2 \]

  • SSE/RSS: the sum of squared errors, or residual sum of squares, the sum of squared differences between the observed and predicted values; this is the unexplained variability \[ \sum(y_i-\hat{y}_i)^2=\sum e_i^2 \]

The coefficient of determination \(R^2\) shows how much of the variation of the dependent variable, \(Var(y)\), can be explained by the model. In OLS estimation, \(R^2\) can also be calculated as the ratio of the explained sum of squares (ESS) to the total sum of squares (SST) \[ R^2=\frac{ESS}{SST}=1-\frac{SSE}{SST}=1-\frac{\sum e_i^2}{\sum(y_i-\overline{y})^2} \]

Formulate \(R^2\)

Suppose \(y_i=\hat{y}_i+e_i\) \[ \begin{gather} Var(y)=Var(\hat{y})+Var(e)+2Cov(\hat{y},e)\\ =Var(\hat{y})+Var(e)\ (Cov(\hat{y},e)=0)\\ \Rightarrow\sum(y_i-\overline{y})^2=\sum(\hat{y}_i-\overline{\hat{y}})^2+\sum(e_i-\overline{e})^2\\ =\sum(\hat{y}_i-\overline{\hat{y}})^2+\sum e_i^2\ (\overline{e}=0)\\ =\sum(\hat{y}_i-\overline{y})^2+\sum e_i^2\ (\overline{\hat{y}}=\overline{y}) \end{gather} \]
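We can verify this decomposition numerically (a quick sketch) with the iris model used later in this post:

m <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris)
SST <- sum((iris$Petal.Width - mean(iris$Petal.Width))^2)
SSR <- sum((fitted(m) - mean(iris$Petal.Width))^2)
SSE <- sum(residuals(m)^2)
c(SST, SSR + SSE)          # the two totals match
SSR / SST                  # equals the Multiple R-squared reported by summary(m)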

Relationship between \(R^2\) and Pearson correlation coefficient

\[ \begin{gather} r^2_{y,\hat{y}}=(\frac{Cov(y,\hat{y})}{\sqrt{Var(y)Var(\hat{y})}})^2 =\frac{Cov(y,\hat{y})Cov(y,\hat{y})}{Var(y)Var(\hat{y})}\\ =\frac{Cov(\hat{y}+e,\hat{y})^2}{Var(y)Var(\hat{y})} =\frac{Cov(\hat{y},\hat{y})^2}{Var(y)Var(\hat{y})}\ (Cov(\hat{y},e)=0)\\ =\frac{Var(\hat{y})}{Var(y)} =\frac{\sum(\hat{y}_i-\overline{\hat{y}})^2}{\sum(y_i-\overline{y})^2}\\ =\frac{SSR}{SST}\\ =R^2 \end{gather} \]

Conclusion: \(R^2\) is equal to the squared Pearson correlation coefficient between observed response variable \(y\) and predicted response variable \(\hat{y}\)

> iris_model <- lm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length,data = iris)
> summary(iris_model)

Call:
lm(formula = Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
data = iris)

Residuals:
Min 1Q Median 3Q Max
-0.60959 -0.10134 -0.01089 0.09825 0.60685

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.24031 0.17837 -1.347 0.18
Sepal.Length -0.20727 0.04751 -4.363 2.41e-05 ***
Sepal.Width 0.22283 0.04894 4.553 1.10e-05 ***
Petal.Length 0.52408 0.02449 21.399 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.192 on 146 degrees of freedom
Multiple R-squared: 0.9379, Adjusted R-squared: 0.9366
F-statistic: 734.4 on 3 and 146 DF, p-value: < 2.2e-16

> (cor(iris_model$fitted.values,iris$Petal.Width))^2
[1] 0.9378503

In particular, for simple linear regression, the coefficient of determination also equals the square of the sample correlation coefficient \(r_{xy}\) between \(x\) and \(y\)

Assume \(y_i=\beta_0+\beta_1x_i+\epsilon_i\); then the best linear unbiased estimates of \(\beta_0\) and \(\beta_1\) are \[ \begin{gather*} \hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}\\ \hat{\beta_1}=\frac{S_{xy}}{S_{xx}} \end{gather*} \] where \[ \begin{gather*} S_{xy}=\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})\\ S_{xx}=\sum_{i=1}^n(x_i-\bar{x})^2\\ S_{yy}=\sum_{i=1}^n(y_i-\bar{y})^2 \end{gather*} \] Then we can easily obtain the relationship between \(R^2\) and \(r_{xy}\) as \[ \begin{gather*} R^2 = \frac{ESS}{SST}=\frac{\hat{\beta_1}^2S_{xx}}{S_{yy}}\\ =\frac{S_{xy}^2}{S_{xx}S_{yy}}\\ =r^2_{xy} \end{gather*} \] We can also verify this with R code

> model <- lm(Petal.Width ~ Sepal.Length,data = iris)
> summary(model)

Call:
lm(formula = Petal.Width ~ Sepal.Length, data = iris)

Residuals:
Min 1Q Median 3Q Max
-0.96671 -0.35936 -0.01787 0.28388 1.23329

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.20022 0.25689 -12.46 <2e-16 ***
Sepal.Length 0.75292 0.04353 17.30 <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.44 on 148 degrees of freedom
Multiple R-squared: 0.669, Adjusted R-squared: 0.6668
F-statistic: 299.2 on 1 and 148 DF, p-value: < 2.2e-16

> cor(model$fitted.values,iris$Petal.Width)^2
[1] 0.6690277
> cor(iris$Petal.Width,iris$Sepal.Length)^2
[1] 0.6690277

Reference

Proof that the mean of the fitted values in OLS regression equals the mean of the observed values

In OLS estimation, we can write each response \(y\) as \[ y_i=\hat{y}_i+e_i \] where the errors are assumed to follow a normal distribution \(N(0,\sigma^2)\). For a model fitted with an intercept, the normal equations force the residuals to sum to zero, \[ \sum_{i=1}^n e_i=0\Rightarrow\overline{e}=0 \] thereby we have \[ \sum_{i=1}^ny_i=\sum_{i=1}^n(\hat{y}_i+e_i)\\ =\sum_{i=1}^n\hat{y}_i \] and dividing both sides by \(n\) gives \(\overline{y}=\overline{\hat{y}}\)

> iris_model <- lm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length,data = iris)
> summary(iris_model)

Call:
lm(formula = Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
data = iris)

Residuals:
Min 1Q Median 3Q Max
-0.60959 -0.10134 -0.01089 0.09825 0.60685

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.24031 0.17837 -1.347 0.18
Sepal.Length -0.20727 0.04751 -4.363 2.41e-05 ***
Sepal.Width 0.22283 0.04894 4.553 1.10e-05 ***
Petal.Length 0.52408 0.02449 21.399 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.192 on 146 degrees of freedom
Multiple R-squared: 0.9379, Adjusted R-squared: 0.9366
F-statistic: 734.4 on 3 and 146 DF, p-value: < 2.2e-16

> mean(iris_model$fitted.values)
[1] 1.199333
> mean(iris$Petal.Width)
[1] 1.199333

Definition

The model is called “linear” not because it is linear in \(x\), but rather because it is linear in the parameters.

The following are the examples of linear models:

\[ \begin{gather*} y =\beta_0+\beta_1x_1+\beta_2x_2+\epsilon\\ y =\beta_0+\beta_1x_1+\beta_2x_2^2+\epsilon\\ y =\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_1^2+\beta_4x_2^2+\beta_5x_1x_2+\epsilon \end{gather*} \]

A model that can be transformed so that it becomes linear in its unknown parameters is called intrinsically linear; otherwise it is called a nonlinear model

The following are nonlinear models

\[ \begin{gather*} y =\beta_0+\beta_1x_1+\beta_2x_2^{\beta_3}+\epsilon\\ y = \alpha_0z_1^{\alpha_1}z_2^{\alpha_2}\eta \end{gather*} \]

For example, the second nonlinear model above can be made linear in its parameters by taking logarithms

\[ \begin{gather*} \log y=\log\alpha_0+\alpha_1\log z_1+\alpha_2\log z_2+\log\eta\\ \Rightarrow y^*=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon \end{gather*} \]
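A small simulated example (a sketch; all parameter values below are invented for illustration) shows lm recovering the parameters after the log transformation:

set.seed(42)
n <- 200
z1 <- runif(n, 1, 10); z2 <- runif(n, 1, 10)
eta <- exp(rnorm(n, sd = 0.1))                    # multiplicative error term
y <- 2 * z1^0.5 * z2^1.5 * eta                    # y = alpha0 * z1^alpha1 * z2^alpha2 * eta
coef(lm(log(y) ~ log(z1) + log(z2)))              # approximately log(2), 0.5, 1.5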

There are four key assumptions for Simple Linear regression

  • Linear relationship between dependent variable and independent variable

  • The variance of the residuals is constant (constant variance errors, homoscedasticity)

  • Independence of observations (no autocorrelation). Simply put, the model assumes that the residuals are independent; otherwise the sample errors are dependent (autocorrelated)

  • Normally distributed errors

In addition to the four assumptions listed above, there is one more for Multiple Linear regression.

  • The data should not show multicollinearity.
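One common way to check this assumption is the variance inflation factor, \(VIF_j=1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the remaining predictors. A minimal hand-rolled sketch (the function name vif_manual is made up for illustration):

vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]    # predictor columns, intercept dropped
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)                                  # VIF of the j-th predictor
  })
}
vif_manual(lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris))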

Useful blogs

There is an interesting post about Multicollinearity that I'd like to mention here.

Is it necessary to correct collinearity when square terms are in a model?

  • Question: I had a regression model where one of the explanatory variables is "age". I added an "age-squared" variable since the distribution of age was in a quadratic form. It is obvious that the "age" and "age-squared" variables will be highly correlated. In that case, is it really necessary to deal with the collinearity problem in the model?

Several answers are great:

  • Multicollinearity is NOT a problem in your case. Multicollinearity has to be checked and problems have to be solved when you want to estimate the independent effect of two variables which happen to be correlated by chance. This is NOT your problem with your age and age-squared variables, since you should never be interested in evaluating the effect of changing age without changing age-squared. So do not care about multicollinearity between one variable and a second variable which is a deterministic nonlinear function of the first one, except in the case of perfect multicollinearity, which would be the case if you had only two different values for your age variable.
  • Age and age squared are correlated. However, one is not a linear transformation of the other by definition.
  • Another possibility is to run a regression between Age and Age2 to see that the R2 is close to zero. This indicates, as mentioned before by other colleagues, that collinearity is very low and can be overlooked from an econometric perspective.

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

https://stats.stackexchange.com/questions/310003/why-in-box-cox-method-we-try-to-make-x-and-y-normally-distributed-but-thats-no?noredirect=1&lq=1

Reference

Principal Component analysis

PCA derivation using spectral decomposition

Suppose we have a random vector \[ \begin{gather} X = \begin{pmatrix}X_1\\X_2\\ \vdots\\ X_p\end{pmatrix}\\ var(X) = \Sigma=\begin{bmatrix}\sigma_{1}^2 & \sigma_{12}& \cdots & \sigma_{1p}\\ \sigma_{21}& \sigma_{2}^2 & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1}&\sigma_{p2}& \cdots & \sigma_{p}^2 \end{bmatrix}\\ \end{gather} \] We want to maximize the spread of the projected data, so we consider linear combinations \[ Y_i = e_{i1}X_1+e_{i2}X_2+\cdots+e_{ip}X_p=\mathbf{e}_i^TX,\quad \mathbf{e}_i^T\mathbf{e}_i=1 \] and maximize the variance of each linear combination \[ \begin{gather} var(Y_i)=E[(\mathbf{e}_i^TX-E(\mathbf{e}_i^TX))^2]\\ =E[(\mathbf{e}_i^TX-\mathbf{e}_i^TE(X))(\mathbf{e}_i^TX-\mathbf{e}_i^TE(X))^T]\ (\text{the variance is a scalar})\\ =E[\mathbf{e}_i^T(X-E(X))(X-E(X))^T\mathbf{e}_i]\\ =\mathbf{e}_i^TE[(X-E(X))(X-E(X))^T]\mathbf{e}_i\\ =\mathbf{e}_i^T\Sigma \mathbf{e}_i \end{gather} \] Maximizing \(\mathbf{e}_i^T\Sigma\mathbf{e}_i\) subject to \(\mathbf{e}_i^T\mathbf{e}_i=1\) yields the eigenvector equation of the spectral decomposition, \(\Sigma \mathbf{e}_i=\lambda_i\mathbf{e}_i\), so \[ var(Y_i)=\mathbf{e}_i^T\lambda_i\mathbf{e}_i=\lambda_i\mathbf{e}_i^T\mathbf{e}_i=\lambda_i \]
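We can confirm this numerically in R: the eigenvalues of the sample covariance matrix are the component variances and the eigenvectors are the loadings, matching prcomp up to sign (a sketch using the iris measurements as example data):

X <- as.matrix(iris[, 1:4])
eig <- eigen(cov(X))                            # spectral decomposition of the covariance matrix
pc <- prcomp(X)                                 # PCA on the same (centered, unscaled) data
rbind(eig$values, pc$sdev^2)                    # lambda_i equals the variance of each component
max(abs(abs(eig$vectors) - abs(pc$rotation)))   # loadings agree up to sign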

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment

Reference