Corrected AIC for X-ray model fitting

The Akaike information criterion (AIC) is a widely used classical method for model comparison (see the book by Ivezić et al. 2020, or the original paper, Akaike 1974).

The motivation for using AIC is simple: we need a tool for comparing different models and for identifying which of them are not substantially supported by the data.

What is AIC?

The AIC is defined as follows,

AIC \equiv - 2 \ln (L^{0} (M)) + 2 k,

where $L^{0} (M)$ is the maximum value of the likelihood of model $M$ and $k$ is the number of parameters of model $M$. The second term, $2 k$, is required to make AIC an (asymptotically) unbiased estimator of the Kullback–Leibler discrepancy, which quantifies the discrepancy between a selected model and the real mechanism (more details and definitions can be found in Cavanaugh & Neath 2019). Additionally, $2 k$ can be regarded simply as a penalty term for the number of parameters. Adopting a more complicated model, i.e., a model with larger $k$, will usually result in a smaller $- 2 \ln (L^{0} (M))$ but a larger second term $2 k$, so there is a trade-off. The smaller the AIC, the better the model, and the model with the smallest AIC is the best model. Note that AIC cannot guarantee that the best model describes the real mechanism well; it only indicates that this model performs better than the other models considered.
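The trade-off described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any fitting package: the function name `aic` and the two sets of fit results are hypothetical values chosen only to show how the penalty term can outweigh a small improvement in the likelihood.

```python
def aic(neg2_ln_lmax: float, k: int) -> float:
    """Return AIC = -2 ln(L0) + 2k.

    neg2_ln_lmax : value of -2 ln(L0), i.e. minus twice the maximum log-likelihood
    k            : number of parameters of the model
    """
    return neg2_ln_lmax + 2 * k


# Hypothetical fit results, for illustration only.
simple_model = aic(120.0, 3)    # fewer parameters, slightly worse likelihood
complex_model = aic(117.5, 5)   # two extra parameters, slightly better likelihood

print(simple_model, complex_model)  # 126.0 127.5
```

Here the more complicated model lowers $- 2 \ln (L^{0} (M))$ by 2.5 but pays 4 more in the penalty term, so the simpler model ends up with the smaller AIC.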

What is corrected AIC?

For small samples, such as $N / k ≲ 40$, where $N$ is the number of data points, an additional penalty term for the number of parameters is needed (Sugiura 1978), which gives

{AIC}_{c} \equiv - 2 \ln (L^{0} (M)) + 2 k + \frac{2 k (k + 1)}{N - k - 1} = AIC + \frac{2 k (k + 1)}{N - k - 1} .

The last term is a second-order effect that should be taken into consideration when the sample size is small but becomes negligible when $N$ is very large. A model whose number of parameters is close to the number of data points will be rejected, since the last term tends to infinity as $k$ approaches $N - 1$. If the last term is omitted, a model with a large number of parameters may not be rejected for a small sample. Therefore, ${AIC}_{c}$ is preferable to $AIC$ for small samples. Since $N / k ≲ 40$ is usually satisfied in the context of X-ray model fitting, ${AIC}_{c}$ is always used here.
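The behaviour of the correction term can be checked numerically. The sketch below is illustrative only: the function names `aicc_penalty` and `aicc` are hypothetical, and raising an error when $N - k - 1 ≤ 0$ is a design choice made here because the formula is not meaningful in that regime.

```python
def aicc_penalty(k: int, n: int) -> float:
    """Second-order correction 2k(k+1)/(N-k-1)."""
    if n - k - 1 <= 0:
        # Assumption of this sketch: refuse to evaluate the formula
        # when the model has at least N - 1 parameters.
        raise ValueError("AICc requires N > k + 1")
    return 2 * k * (k + 1) / (n - k - 1)


def aicc(neg2_ln_lmax: float, k: int, n: int) -> float:
    """AICc = -2 ln(L0) + 2k + 2k(k+1)/(N-k-1)."""
    return neg2_ln_lmax + 2 * k + aicc_penalty(k, n)


# The correction matters for small N and fades for large N (k = 5 throughout).
for n in (8, 20, 200, 2000):
    print(n, aicc_penalty(5, n))
```

For $k = 5$ the correction is 30 at $N = 8$, about 4.3 at $N = 20$, and well below 0.1 at $N = 2000$, in line with the statement that it is negligible for large samples.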

How to calculate ${AIC}_{c}$ when different statistics are adopted?

$χ^{2}$ statistics

For statistics based on a Gaussian likelihood, the likelihood is

L (M) = \prod_{i} \frac{1}{\sqrt{2 π} σ_{i}} \exp (\frac{- (D_{i} - M_{i})^{2}}{2 σ_{i}^{2}}),

where $L (M)$ is the likelihood for the model $M$, $D_{i}$ is the data in the $i$th bin, $M_{i}$ denotes the model prediction in the $i$th bin, and $σ_{i}$ is the error for the $i$th bin. $χ^{2}$ is defined as follows,

χ^{2} \equiv \sum_{i} \frac{(D_{i} - M_{i})^{2}}{σ_{i}^{2}},

which follows a chi-squared distribution. The relation between $χ^{2}$ and $L (M)$ is straightforward,

χ^{2} = - 2 \ln L (M) + constant .
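The constant can be written out explicitly by taking the logarithm of the Gaussian likelihood defined above and multiplying by $- 2$. This is a standard intermediate step, added here only to make the relation concrete:

- 2 \ln L (M) = \sum_{i} \frac{(D_{i} - M_{i})^{2}}{σ_{i}^{2}} + \sum_{i} \ln (2 π σ_{i}^{2}) = χ^{2} + \sum_{i} \ln (2 π σ_{i}^{2}) .

The second sum depends only on the errors $σ_{i}$ and not on the model prediction $M_{i}$, so it takes the same value for every model fitted to the same data.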

Since only the relative value of ${AIC}_{c}$ between different models is important, the constant can be dropped, and the first term of ${AIC}_{c}$ becomes the minimum value of $χ^{2}$ for a given model, namely,

{AIC}_{c} = χ_{\min}^{2} + 2 k + \frac{2 k (k + 1)}{N - k - 1} \quad \text{for } χ^{2} \text{ statistics} .
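As a minimal sketch of this formula, the Python snippet below computes $χ^{2}$ and the corresponding ${AIC}_{c}$ for a toy data set. Everything in it is hypothetical: the helper names, the six data points, and the assumption that `best_fit_model` already holds the prediction of a two-parameter model at its best-fit parameters (so that the resulting value is $χ_{\min}^{2}$).

```python
def chi_square(data, model, sigma):
    """chi^2 = sum_i (D_i - M_i)^2 / sigma_i^2."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))


def aicc_from_stat(stat_min: float, k: int, n: int) -> float:
    """AICc = stat_min + 2k + 2k(k+1)/(N-k-1), with stat_min the minimized fit statistic."""
    return stat_min + 2 * k + 2 * k * (k + 1) / (n - k - 1)


# Toy values, for illustration only.
data = [10.2, 11.9, 14.1, 15.8, 18.1, 20.0]
sigma = [0.5] * len(data)
best_fit_model = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # assumed best-fit prediction
k = 2  # e.g. slope and intercept of a straight line

chi2_min = chi_square(data, best_fit_model, sigma)
print(chi2_min, aicc_from_stat(chi2_min, k, len(data)))
```

With only six data points the two penalty terms (4 each) dominate the result, which illustrates why the correction cannot be ignored for small $N$.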

C-statistics

For statistics based on a Poisson likelihood, the likelihood is

L (M) = \prod_{i} \frac{M_{i}^{D_{i}}}{D_{i}!} \exp (- M_{i})

Taking its logarithm and multiplying by $- 2$, we obtain

- 2 \ln L (M) = 2 \sum_{i} (M_{i} - D_{i} \ln M_{i} + \ln (D_{i}!)) .

Omitting the factorial term, which does not depend on the model, we obtain the Cash statistic (Cash 1979):

\tilde{C} = 2 \sum_{i} (M_{i} - D_{i} \ln M_{i}) .

If the factorial term is instead approximated by Stirling's formula, that is,

\ln (D_{i}!) \approx D_{i} \ln D_{i} - D_{i},

a modified version of the original Cash statistic, the C-statistic, is obtained:

C = 2 \sum_{i} (M_{i} - D_{i} \ln M_{i} + D_{i} \ln D_{i} - D_{i}),

which is implemented in some popular fitting packages such as XSPEC (Arnaud 1996), SHERPA (Freeman et al. 2001), and SPEX (Kaastra et al. 1996). $\tilde{C}$ differs from $C$ only by the model-independent constant $2 \sum_{i} (D_{i} \ln D_{i} - D_{i})$. $C$ is non-negative, and it is equal to $0$ if and only if all the $M_{i}$ are equal to $D_{i}$. Since the count rate is usually low for X-ray observations, it is better to use C-statistic fitting than $χ^{2}$ fitting, i.e., to obtain the best-fit parameters by minimizing $C$ instead of $χ^{2}$. The corresponding ${AIC}_{c}$ is

{AIC}_{c} = C_{\min} + 2 k + \frac{2 k (k + 1)}{N - k - 1} \quad \text{for C-statistics} .

As for the calculation of ${AIC}_{c}$ , the first term should be $χ_{\min}^{2}$ if $χ^{2}$ fitting is used, while it should be $C_{\min}$ if C-statistics fitting is implemented (Peca et al. 2023; Ng et al. 2022; Rogantini et al. 2019; Igoshev et al. 2018).
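The C-statistic and its ${AIC}_{c}$ can be sketched in Python as follows. This is an illustration rather than the implementation used by any of the packages mentioned above. The function names and the toy counts are hypothetical, and the treatment of empty bins (using the limit $D_{i} \ln D_{i} \to 0$ as $D_{i} \to 0$, so that such a bin contributes $2 M_{i}$) is an assumption of this sketch.

```python
import math


def cstat(data, model):
    """C = 2 * sum_i (M_i - D_i ln M_i + D_i ln D_i - D_i).

    Rewritten per bin as 2 * (M_i - D_i + D_i ln(D_i / M_i)).
    Bins with D_i = 0 contribute 2 * M_i (assumption: D ln D -> 0 as D -> 0).
    """
    total = 0.0
    for d, m in zip(data, model):
        if m <= 0:
            raise ValueError("model prediction must be positive in every bin")
        total += m - d
        if d > 0:
            total += d * math.log(d / m)
    return 2.0 * total


def aicc_from_stat(stat_min: float, k: int, n: int) -> float:
    """AICc = stat_min + 2k + 2k(k+1)/(N-k-1)."""
    return stat_min + 2 * k + 2 * k * (k + 1) / (n - k - 1)


# Toy low-count spectrum and an assumed best-fit prediction (illustrative values).
counts = [3, 0, 5, 2, 1, 4, 0, 2]
best_fit_model = [2.5, 0.4, 4.6, 2.5, 1.2, 3.8, 0.3, 1.9]
k = 2

c_min = cstat(counts, best_fit_model)
print(c_min, aicc_from_stat(c_min, k, len(counts)))
```

The per-bin rewriting makes the stated properties easy to verify: `cstat(counts, counts)` with all counts positive returns exactly 0, and any other positive model gives a strictly positive value.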

For a given model, $Δ {AIC}_{c}$ is defined as the difference between its ${AIC}_{c}$ and the minimum ${AIC}_{c}$ among all the models considered, which reads,

Δ {AIC}_{c} \equiv {AIC}_{c} - {AIC}_{c, \min} .

The model with the minimum ${AIC}_{c}$ is the most preferred model, but the remaining models cannot simply be ruled out. Generally, a model with $Δ {AIC}_{c} > 2$ is not substantially supported by the observational data and may not be retained (Burnham et al. 2002). The following table (Burnham et al. 2002) shows the level of empirical support for different values of $Δ {AIC}_{c}$.

A model whose $Δ {AIC}_{c}$ is within 2 is substantially supported by the data, i.e., it is not significantly different from the best model and cannot be ruled out. A model whose $Δ {AIC}_{c}$ is larger than 2 is considerably less supported by the data, i.e., it is probably not the best model and can be ruled out. A model whose $Δ {AIC}_{c}$ is larger than 10 has essentially no support from the data and should be discarded.

$Δ {AIC}_{c}$	Level of Empirical Support
0-2	Substantial
4-7	Considerably less
>10	Essentially none
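Computing $Δ {AIC}_{c}$ and looking up the support level can be sketched as below. The function names, the model labels, and the ${AIC}_{c}$ values are hypothetical. Because the table leaves the ranges 2-4 and 7-10 unlabelled, this sketch reports them as falling between tabulated ranges instead of assigning them to a category.

```python
def delta_aicc(aicc_values: dict) -> dict:
    """Return Delta AICc = AICc - AICc_min for every model."""
    best = min(aicc_values.values())
    return {name: value - best for name, value in aicc_values.items()}


def support_level(delta: float) -> str:
    """Map Delta AICc onto the tabulated levels of empirical support."""
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    # Assumption of this sketch: values in the untabulated gaps are not forced
    # into a neighbouring category.
    return "between tabulated ranges"


# Hypothetical AICc values for three candidate models.
aicc_values = {"model_A": 105.3, "model_B": 104.1, "model_C": 118.9}

for name, delta in delta_aicc(aicc_values).items():
    print(name, round(delta, 2), support_level(delta))
```

In this toy example `model_B` has the minimum ${AIC}_{c}$, `model_A` lies within 2 of it and therefore cannot be ruled out, and `model_C` has essentially no support.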

Attention

The corrected AIC can be used when some models cannot be ruled out from the perspective of physics. In other words, the corrected AIC only rules out models from the perspective of statistics. If you can rule out some models on physical grounds, please rule them out first.
