
Corrected AIC for X-ray model fitting

The Akaike information criterion (AIC) is a popular classical method for model comparison (see the book by Ivezić et al. 2020, or the original paper, Akaike 1974).

The motivation for using AIC is simple: we need a tool to compare different models, i.e., to identify which models are not substantially supported by the data.

What is AIC?

The AIC is defined as follows,

$$\mathrm{AIC} \equiv -2\ln(L_0(M)) + 2k,$$

where $L_0(M)$ is the maximum value of the likelihood of model $M$ and $k$ is the number of parameters of model $M$. The second term, $2k$, is needed to make AIC an unbiased estimator of the Kullback–Leibler discrepancy, which quantifies the discrepancy between a selected model and the real mechanism (more details and definitions can be found in Cavanaugh & Neath 2019). The term $2k$ can also be regarded simply as a penalty on the number of parameters. Adopting a more complicated model, i.e., a model with larger $k$, will usually yield a smaller $-2\ln(L_0(M))$ but a larger second term $2k$, so there is a trade-off. The smaller the AIC, the better the model, and the model with the smallest AIC is the best model. Note that this does not guarantee that the best model describes the real mechanism well; it only performs better than the other models considered.
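The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of any fitting package: the function name `aic` and the log-likelihood values are hypothetical and chosen only to show the trade-off between fit quality and the $2k$ penalty.

```python
def aic(max_log_likelihood: float, k: int) -> float:
    """Return AIC = -2 ln(L0) + 2k.

    max_log_likelihood is ln(L0), the maximized log-likelihood of the model,
    and k is the number of free parameters.
    """
    return -2.0 * max_log_likelihood + 2.0 * k


# Hypothetical numbers, for illustration only: the more complicated model
# fits slightly better (larger ln L0) but pays a larger 2k penalty.
aic_simple = aic(max_log_likelihood=-120.0, k=2)
aic_complex = aic(max_log_likelihood=-119.5, k=4)

print(aic_simple, aic_complex)
```

In this made-up case the simpler model has the smaller AIC, because the small gain in likelihood does not compensate for the two extra parameters.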

What is corrected AIC?

For small samples, such as $N/k < 40$, where $N$ is the number of data points, an additional penalty term for the number of parameters is needed (Sugiura 1978), which gives

$$\mathrm{AICc} \equiv -2\ln(L_0(M)) + 2k + \frac{2k(k+1)}{N-k-1} = \mathrm{AIC} + \frac{2k(k+1)}{N-k-1}.$$

The last term is a second-order correction that should be taken into account when the sample size is small but becomes negligible when $N$ is very large. A model whose number of parameters is close to the number of data points will be rejected, since the last term tends to infinity as $k$ approaches $N-1$. If the last term is omitted, a model with a large number of parameters may not be rejected for a small sample. Therefore, AICc is preferable to AIC. Since $N/k < 40$ is usually satisfied in the context of X-ray model fitting, AICc will always be used.
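To make the behaviour of the correction term concrete, here is a small sketch. The helper names `aicc` and `aicc_correction` are hypothetical, and the values of $k$ and $N$ are arbitrary. The code raises an error when $N \le k+1$, where the correction is undefined or negative; that guard is a choice made for this sketch.

```python
def aicc_correction(k: int, n: int) -> float:
    """Second-order term 2k(k+1)/(N-k-1) added to AIC to obtain AICc."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires N > k + 1")
    return 2.0 * k * (k + 1) / (n - k - 1)


def aicc(max_log_likelihood: float, k: int, n: int) -> float:
    """AICc = -2 ln(L0) + 2k + 2k(k+1)/(N-k-1)."""
    return -2.0 * max_log_likelihood + 2.0 * k + aicc_correction(k, n)


# With k = 5 parameters, the correction is large for few data points
# and fades away as N grows.
for n in (7, 10, 100, 10000):
    print(n, aicc_correction(5, n))
```

For $k=5$, the correction is 60 at $N=7$ and 15 at $N=10$, but well below 1 at $N=100$, which illustrates why it matters mainly for small samples.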

How to calculate AICc when different statistics are adopted?

$\chi^2$ statistics

For statistics based on Gaussian likelihood:

$$L(M) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(D_i - M_i)^2}{2\sigma_i^2}\right),$$

where $L(M)$ is the likelihood for the model $M$, $D_i$ is the data in the $i$th bin, $M_i$ denotes the model prediction in the $i$th bin, and $\sigma_i$ is the error for the $i$th bin. $\chi^2$ is defined as follows,

$$\chi^2 \equiv \sum_i \frac{(D_i - M_i)^2}{\sigma_i^2},$$

which obeys the chi-squared distribution. The relation between $\chi^2$ and $L(M)$ is straightforward,

$$\chi^2 = -2\ln L(M) + \mathrm{constant}.$$

Since only the relative value of AICc between different models matters, the constant can be dropped and the first term becomes the minimal value of $\chi^2$ for a given model, namely,

$$\mathrm{AICc} = \chi^2_{\min} + 2k + \frac{2k(k+1)}{N-k-1} \quad \text{for } \chi^2 \text{ statistics}.$$
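The $\chi^2$ case can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the data and errors are made up, and `best_fit` is simply assumed to be the model prediction at the minimum of $\chi^2$ (no actual minimization is performed here).

```python
def chi2(data, model, sigma):
    """chi^2 = sum_i (D_i - M_i)^2 / sigma_i^2 over all bins."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))


def aicc_from_statistic(stat_min: float, k: int, n: int) -> float:
    """AICc with the minimized fit statistic standing in for -2 ln(L0)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires N > k + 1")
    return stat_min + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Made-up binned data with unit errors; best_fit is assumed to be the
# prediction of a one-parameter model at its chi^2 minimum.
data = [10.0, 12.0, 9.0, 11.0]
best_fit = [10.0, 11.0, 10.0, 11.0]
sigma = [1.0, 1.0, 1.0, 1.0]

chi2_min = chi2(data, best_fit, sigma)
aicc_chi2 = aicc_from_statistic(chi2_min, k=1, n=len(data))
print(chi2_min, aicc_chi2)
```

In practice `chi2_min` would be the fit statistic reported by the fitting package, and $N$ the number of bins actually used in the fit.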

C-statistics

For statistics based on Poisson likelihood:

$$L(M) = \prod_i \frac{M_i^{D_i}}{D_i!} \exp(-M_i).$$

Taking its logarithm and multiplying by $-2$, we get

$$-2\ln L(M) = 2\sum_i \left(M_i - D_i\ln M_i + \ln(D_i!)\right).$$

Omitting the factorial term, we can get the Cash-statistics (Cash 1979):

$$\tilde{C} = 2\sum_i \left(M_i - D_i\ln M_i\right).$$

Approximating the factorial term by Stirling's formula, that is

$$\ln(D_i!) \approx D_i\ln D_i - D_i,$$

a modification of the original Cash-statistic, C-statistics, can be obtained as follows:

$$C = 2\sum_i \left(M_i - D_i\ln M_i + D_i\ln D_i - D_i\right),$$

which is implemented in some popular fitting packages such as XSPEC (Arnaud 1996), SHERPA (Freeman et al. 2001), and SPEX (Kaastra et al. 1996). $\tilde{C}$ is the same as $C$ up to the constant $2\sum_i (D_i\ln D_i - D_i)$, which does not depend on the model. $C$ is non-negative, and it is equal to 0 if and only if all the $M_i$ are equal to $D_i$. Since the count rate is usually low in X-ray observations, it is better to use C-statistics fitting than $\chi^2$ fitting, i.e., to obtain the best-fit parameters by minimizing $C$ instead of $\chi^2$. The corresponding AICc is

$$\mathrm{AICc} = C_{\min} + 2k + \frac{2k(k+1)}{N-k-1} \quad \text{for C-statistics}.$$
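A minimal sketch of the C-statistic and the resulting AICc is given below. It is not the implementation used by XSPEC, SHERPA, or SPEX; the function names and the counts are hypothetical. The sketch assumes that the model prediction is strictly positive in every bin and treats $D_i\ln D_i$ as 0 for empty bins ($D_i = 0$), which is its limiting value.

```python
import math


def cstat(data, model):
    """C = 2 sum_i (M_i - D_i ln M_i + D_i ln D_i - D_i).

    Written as M_i - D_i + D_i ln(D_i / M_i); the logarithmic term is
    taken to be 0 for empty bins (D_i = 0).
    """
    total = 0.0
    for d, m in zip(data, model):
        if m <= 0:
            raise ValueError("model prediction must be positive in every bin")
        term = m - d
        if d > 0:
            term += d * math.log(d / m)
        total += term
    return 2.0 * total


def aicc_from_cstat(c_min: float, k: int, n: int) -> float:
    """AICc = C_min + 2k + 2k(k+1)/(N-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires N > k + 1")
    return c_min + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Made-up counts; best_fit is assumed to be the model prediction at the
# minimum of C for a one-parameter model.
counts = [3, 0, 5]
best_fit = [3.0, 1.0, 5.0]

c_min = cstat(counts, best_fit)
print(c_min, aicc_from_cstat(c_min, k=1, n=len(counts)))
```

Only the empty bin contributes here, because the model matches the data exactly in the other two bins, consistent with $C = 0$ if and only if $M_i = D_i$ everywhere.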

As for the calculation of AICc, the first term should be $\chi^2_{\min}$ if $\chi^2$ fitting is used, while it should be $C_{\min}$ if C-statistics fitting is used (Peca et al. 2023; Ng et al. 2022; Rogantini et al. 2019; Igoshev et al. 2018).

For a given model, its $\Delta\mathrm{AICc}$ is defined as the difference between its AICc and the minimum AICc among all the models considered:

$$\Delta\mathrm{AICc} \equiv \mathrm{AICc} - \mathrm{AICc}_{\min}.$$

The model with the minimum AICc is the most preferred model, but the remaining models cannot simply be ruled out. Generally, a model with $\Delta\mathrm{AICc} > 2$ is not substantially supported by the observational data and need not be retained (Burnham et al. 2002). The following table (Burnham et al. 2002) shows the level of empirical support for different $\Delta\mathrm{AICc}$.

A model whose $\Delta\mathrm{AICc}$ is within 2 is substantially supported by the data, i.e., it is not significantly different from the best model and cannot be ruled out. A model whose $\Delta\mathrm{AICc}$ is larger than 2 is considerably less supported by the data, i.e., it is probably not the best model and can be ruled out. A model whose $\Delta\mathrm{AICc}$ is larger than 10 has essentially no support from the data and should be discarded.

| $\Delta\mathrm{AICc}$ | Level of Empirical Support |
| --- | --- |
| 0-2 | Substantial |
| 4-7 | Considerably less |
| >10 | Essentially none |
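The comparison step can be sketched as below. The model names and AICc values are hypothetical. Because the table leaves the ranges 2-4 and 7-10 unlabelled, this sketch follows the thresholds used in the text (within 2, larger than 2, larger than 10); that mapping is an assumption of the example.

```python
def delta_aicc(aicc_values: dict) -> dict:
    """Return Delta AICc = AICc - AICc_min for every model."""
    best = min(aicc_values.values())
    return {name: value - best for name, value in aicc_values.items()}


def support_level(delta: float) -> str:
    """Map Delta AICc to a support level (thresholds follow the text)."""
    if delta <= 2.0:
        return "substantial"
    if delta <= 10.0:
        return "considerably less"
    return "essentially none"


# Hypothetical AICc values for three candidate models.
aicc_values = {"model_A": 105.5, "model_B": 104.0, "model_C": 118.0}

deltas = delta_aicc(aicc_values)
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name}: delta = {delta:.1f}, support = {support_level(delta)}")
```

With these made-up numbers, `model_B` is preferred, `model_A` cannot be ruled out, and `model_C` would be discarded.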

Attention

The corrected AIC should be used when you cannot rule out some models on physical grounds. That is, the corrected AIC only rules out models from a statistical perspective. If you can rule out some models on physical grounds, rule them out first.

Released under the MIT License.