Choosing the support spaces

Introduction

It becomes evident from the analysis performed in “Generalized Maximum Entropy framework” that the estimates obtained via the GME/GCE framework depend on the choice of the support spaces. In fact, the restrictions imposed on the parameter space through \(Z\) should reflect prior knowledge about the unknown parameters. However, such knowledge is not always available and there is no clear answer as to how those support spaces should be defined.

Prior-informed support space construction

If sufficient prior information is available, the support space can be constructed around a pre-estimate or prior mean, making the representation more efficient. — Golan [1]

Preliminary estimates (e.g., from OLS or Ridge) can guide the center and/or range of the support points for the unknown parameters. This can help to:

  • Avoid arbitrarily wide or narrow supports;

  • Improve estimation efficiency and stability;

  • Try to include the true value within the support.

Ridge

Ridge regression, introduced by Hoerl and Kennard [2], is an estimation procedure that handles collinearity without removing variables from the regression model. By adding a small non-negative constant (the ridge or shrinkage parameter) to the diagonal of the correlation matrix of the explanatory variables, it is possible to reduce the variance of the OLS estimator through the introduction of some bias. Although the resulting estimators are biased, the biases are small enough for these estimators to be substantially more precise than the unbiased estimators. The challenge in Ridge regression lies in the selection of the ridge parameter. One straightforward approach is to plot the coefficients against several possible values of the ridge parameter and inspect the resulting traces. The Ridge regression estimator of \(\boldsymbol{\beta}\) takes the form \[\begin{align} \widehat{\boldsymbol{\beta}}^{ridge}&= \underset{\boldsymbol{\beta}}{\operatorname{argmin}} \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2+\lambda \|\boldsymbol{\beta}\|^2 \\ &=(\mathbf{X}'\mathbf{X}+\lambda \mathbf{I})^{-1}\mathbf{X'y}, \end{align}\] where \(\lambda \geq 0\) denotes the ridge parameter and \(\mathbf{I}\) is a \(((K+1) \times (K+1))\) identity matrix. Note that when \(\lambda \rightarrow 0\), the Ridge regression estimator approaches the OLS estimator, whereas it approaches the zero vector when \(\lambda \rightarrow \infty\). Thus, a trade-off between variance and bias is needed.
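As a minimal sketch of the closed-form expression above, the following base R code evaluates the estimator on simulated, nearly collinear data. The data, the helper name `ridge_closed_form` and the chosen values of \(\lambda\) are illustrative assumptions and are not part of GCEstim. As in the formula, the penalty is applied to all \(K+1\) coefficients, including the intercept.

```r
# Illustrative data: two almost identical regressors (assumed, not from the vignette)
set.seed(1)
N  <- 50
x1 <- rnorm(N)
x2 <- x1 + rnorm(N, sd = 0.05)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2)
y  <- 1 + 2 * x1 - x2 + rnorm(N)

# (X'X + lambda * I)^{-1} X'y, with I of dimension (K + 1) x (K + 1)
ridge_closed_form <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge_closed_form(X, y, 0)      # lambda = 0 is the OLS estimator
beta_small <- ridge_closed_form(X, y, 1e-10)  # practically indistinguishable from OLS
beta_large <- ridge_closed_form(X, y, 1e8)    # shrunk towards the zero vector
```

Comparing `beta_small` with `beta_ols`, and inspecting `beta_large`, illustrates the two limiting cases mentioned in the text.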

Ridge preliminary estimates can be obtained (choosing the ridge parameter according to a given rule) and used to define, for instance, zero-centered support spaces [3]. Macedo et al. [4,5] suggest defining \(Z\) uniformly and symmetrically around zero, with limits established by the maximum absolute values of the ridge estimates. These maximum absolute values are taken from the ridge trace computed over a vector of penalization parameters. This procedure is called RidGME.
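The idea of taking the limits from the ridge trace can be sketched in base R as follows. This is only an illustration under stated assumptions: the simulated data and all object names are hypothetical, the closed-form estimator penalizes every coefficient, and the internal computations of ridgetrace() (for instance any standardization) may differ.

```r
# Illustrative data (assumed, not from the vignette)
set.seed(2)
N  <- 60
x1 <- rnorm(N)
x2 <- x1 + rnorm(N, sd = 0.1)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2)
y  <- 0.5 + 3 * x1 - 2 * x2 + rnorm(N)

# 100 penalization parameters, logarithmically spaced between 10^-3 and 10^3
lambdas <- 10^seq(-3, 3, length.out = 100)

# Ridge trace: one column of coefficients per value of lambda
ridge_trace <- sapply(lambdas, function(l) {
  drop(solve(crossprod(X) + l * diag(ncol(X)), crossprod(X, y)))
})

# Maximum absolute value of each coefficient along the trace
max_abs <- apply(abs(ridge_trace), 1, max)

# Zero-centered, symmetric support limits
support_limits <- cbind(lower = -max_abs, upper = max_abs)
```

Each row of `support_limits` then plays the role of the limits of one support space.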

Consider dataThesis (see “Generalized Maximum Entropy framework”).

Suppose we want to obtain the estimates for the model

\[\begin{equation} \mathbf{y}=\beta_0\mathbf{1}_N + \beta_1\mathbf{X001} + \beta_2\mathbf{X002} + \beta_3\mathbf{X003} + \beta_4\mathbf{X004} + \mathbf{e}. \end{equation}\]

In order to define the support spaces let us obtain the ridge trace. Using the function ridgetrace() and setting:

  • formula = y ~ X001 + X002 + X003 + X004

  • data = dataThesis

the ridge trace is computed.

res.rt.01 <- 
  ridgetrace(
    formula = y ~ X001 + X002 + X003 + X004,
    data = dataThesis,
    lambda.min = 10^-3, # default
    lambda.max = 10^3, # default
    lambda.n = 100 # default
  )

Since in our example the true parameters are known, we can add them to the ridge trace using the argument coef = coef.dataThesis of the plot function

plot(res.rt.01, coef = coef.dataThesis)

The Ridge estimated coefficients that produce the lowest 5-fold cross-validation RMSE (by default errormeasure = "RMSE", cv = TRUE and cv.nfolds = 5) are

res.rt.01
#> 
#> Call:
#> ridgetrace(formula = y ~ X001 + X002 + X003 + X004, data = dataThesis, 
#>     lambda.min = 10^-3, lambda.max = 10^3, lambda.n = 100)
#> 
#> Coefficients:
#> (Intercept)         X001         X002         X003         X004  
#>      -4.059       -6.096      -22.511       35.031      -30.717

Yet, we are interested in the maximum absolute values, and those values can be obtained by setting the argument which = "max.abs" in the coef function.

coef(res.rt.01, which = "max.abs")
#> (Intercept)        X001        X002        X003        X004 
#>    4.060082    6.963381   23.524831   41.116805   34.008698

Note that the maximum absolute value of each estimate is greater than the true absolute value of the parameter (represented by the horizontal dashed lines).

coef(res.rt.01, which = "max.abs") > abs(coef.dataThesis)
#> (Intercept)        X001        X002        X003        X004 
#>        TRUE        TRUE        TRUE        TRUE        TRUE

Given this information, and if one wants supports symmetrically centered at zero, it is possible to define, for instance, the following:

\(\mathbf{z}_0'= \left[ -4.060082, -4.060082/2, 0, 4.060082/2, 4.060082\right]\), \(\mathbf{z}_1'= \left[ -6.963381, -6.963381/2, 0, 6.963381/2, 6.963381\right]\), \(\mathbf{z}_2'= \left[ -23.524831, -23.524831/2, 0, 23.524831/2, 23.524831\right]\), \(\mathbf{z}_3'= \left[ -41.116805, -41.116805/2, 0, 41.116805/2, 41.116805\right]\), and \(\mathbf{z}_4'= \left[ -34.008698, -34.008698/2, 0, 34.008698/2, 34.008698\right]\).


(RidGME.support <- 
  matrix(c(-coef(res.rt.01, which = "max.abs"),
           coef(res.rt.01, which = "max.abs")),
         ncol = 2,
         byrow = FALSE))
#>            [,1]      [,2]
#> [1,]  -4.060082  4.060082
#> [2,]  -6.963381  6.963381
#> [3,] -23.524831 23.524831
#> [4,] -41.116805 41.116805
#> [5,] -34.008698 34.008698
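The matrix above stores only the lower and upper limits. As a small sketch, the five equally spaced support points listed earlier can be generated from those limits with seq(). The object names `max_abs` and `support_points` are illustrative, and the numeric values are copied from the output of coef(res.rt.01, which = "max.abs").

```r
# Maximum absolute values copied from the ridge trace output shown above
max_abs <- c("(Intercept)" = 4.060082, X001 = 6.963381, X002 = 23.524831,
             X003 = 41.116805, X004 = 34.008698)

# Five equally spaced points per coefficient: -u, -u/2, 0, u/2, u
support_points <- t(sapply(max_abs, function(u) seq(-u, u, length.out = 5)))
colnames(support_points) <- paste0("z", 1:5)
support_points
```

Each row of `support_points` corresponds to one of the vectors \(\mathbf{z}_0', \dots, \mathbf{z}_4'\).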

Using lmgce and setting support.signal = RidGME.support it is possible to obtain the desired model

res.lmgce.RidGME <-
  GCEstim::lmgce(
    y ~ .,
    data = dataThesis,
    support.signal = RidGME.support,
    twosteps.n = 0
  )
coef(res.lmgce.RidGME)
#> (Intercept)        X001        X002        X003        X004 
#>  -3.6372353   0.2300302 -12.8022960 -10.1772066  -5.1629798

Alternatively, it is possible to use the lmgce function directly, setting support.method = "ridge" and support.signal = 1. In that case, the support spaces are calculated internally.

res.lmgce.RidGME <-
  GCEstim::lmgce(
    y ~ .,
    data = dataThesis,
    support.method = "ridge",
    support.method.ridge.symm = TRUE, # default
    support.method.ridge.maxresid = FALSE,
    support.signal = 1,
    twosteps.n = 0
  )
coef(res.lmgce.RidGME)
#> (Intercept)        X001        X002        X003        X004 
#>  -3.6372353   0.2300302 -12.8022960 -10.1772066  -5.1629798

The estimated GME coefficients with a prior Ridge information are \(\widehat{\boldsymbol{\beta}}^{GME_{(RidGME)}}=\) (-3.637, 0.23, -12.802, -10.177, -5.163).

The prediction error is \(RMSE^{y,GME_{(RidGME)}} \approx\) 1.054, the cross-validation prediction error is \(CV\text{-}RMSE_{r}^{y,GME_{(RidGME)}} \approx\) 1.105, and the precision error is \(RMSE^{\beta,GME_{(RidGME)}} \approx\) 1.659.

We can compare these results with the ones from the “Generalized Maximum Entropy framework” vignette.

| Measure | \(OLS\) | \(GME_{(100000)}\) | \(GME_{(100)}\) | \(GME_{(50)}\) | \(GME_{(RidGME)}\) |
| --- | --- | --- | --- | --- | --- |
| Prediction RMSE | 0.947 | 0.947 | 0.949 | 0.951 | 1.054 |
| Prediction CV-RMSE | 0.980 | 0.980 | 0.982 | 0.984 | 1.105 |
| Precision RMSE | 29.295 | 29.191 | 3.022 | 2.746 | 1.659 |

Although we did not have to blindly define the support spaces, the results are not very reassuring and a different strategy should be pursued. Furthermore, if we consider the data set dataincRidGME and want to obtain the estimated model

\[\begin{equation} \mathbf{y}=\beta_0\mathbf{1}_N + \beta_1\mathbf{X001} + \beta_2\mathbf{X002} + \beta_3\mathbf{X003} + \beta_4\mathbf{X004} + \beta_5\mathbf{X005} + \beta_6\mathbf{X006} + \mathbf{e}. \end{equation}\]

we can note that now not all the maximum absolute values of the estimates are greater than the true absolute value of the parameters.

res.rt.02 <- 
  ridgetrace(
    formula = y ~ .,
    data = dataincRidGME)

The true coefficients used to generate dataincRidGME are

(coef.dataincRidGME <- c(2.5, rep(0, 3), c(-8, 19, -13)))
#> [1]   2.5   0.0   0.0   0.0  -8.0  19.0 -13.0
plot(res.rt.02, coef = coef.dataincRidGME)

coef(res.rt.02, which = "max.abs") > abs(coef.dataincRidGME)
#> (Intercept)        X001        X002        X003        X004        X005 
#>       FALSE        TRUE        TRUE        TRUE       FALSE        TRUE 
#>        X006 
#>        TRUE

If we use the maximum absolute values to define the support spaces we are excluding the true value of the parameter in two of them. To avoid that, we can broaden the support spaces by a given factor greater than \(1\), for instance \(2\). That can be done by setting support.signal = 2 in lmgce.


res.lmgce.RidGME.02.alpha1 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataincRidGME,
    support.method = "ridge",
    support.method.ridge.symm = TRUE, # default
    support.method.ridge.maxresid = FALSE,
    support.signal = 1,
    twosteps.n = 0
  )

res.lmgce.RidGME.02.alpha2 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataincRidGME,
    support.method = "ridge",
    support.method.ridge.symm = TRUE, # default
    support.method.ridge.maxresid = FALSE,
    support.signal = 2,
    twosteps.n = 0
  )

From summary we can confirm that both prediction error and prediction cross-validation error are smaller when multiplying the maximum absolute values by \(2\).

summary(res.lmgce.RidGME.02.alpha1)$error.measure
#> [1] 2.753859
summary(res.lmgce.RidGME.02.alpha2)$error.measure
#> [1] 2.589135
summary(res.lmgce.RidGME.02.alpha1)$error.measure.cv.mean
#> [1] 2.855001
summary(res.lmgce.RidGME.02.alpha2)$error.measure.cv.mean
#> [1] 2.673434

The precision error is also smaller

round(GCEstim::accmeasure(coef(res.lmgce.RidGME.02.alpha1), coef.dataincRidGME, which = "RMSE"), 3) 
#> [1] 3.962
round(GCEstim::accmeasure(coef(res.lmgce.RidGME.02.alpha2), coef.dataincRidGME, which = "RMSE"), 3) 
#> [1] 3.256

However, since we generally do not know the true values of the parameters, we cannot know by which factor the maximum absolute values should be multiplied. To complicate matters further, in some situations a “better” estimation is obtained when the factor is between \(0\) and \(1\). We might therefore test different values of the factor and choose, for instance, the one with the lowest k-fold cross-validation error. By default, support.signal.vector.n = 20 values logarithmically spaced between support.signal.vector.min = 0.3 and support.signal.vector.max = 20 are tested in a cv.nfolds = 5 fold cross-validation (CV) scenario, and the chosen factor is one whose CV error (errormeasure = "RMSE") is not greater than the minimum CV-RMSE plus one standard error (errormeasure.which = "1se").
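A grid of 20 logarithmically spaced factors between 0.3 and 20 can be sketched in base R as shown next. How lmgce builds its grid internally is an assumption here, and the name `factor_grid` is illustrative. The construction is nevertheless consistent with the summaries shown in this vignette, since the 5th and 12th values round to the reported factors 0.7263 and 3.4125.

```r
# 20 values equally spaced on the log scale between 0.3 and 20
factor_grid <- exp(seq(log(0.3), log(20), length.out = 20))

# Selected elements of the grid, rounded to 4 decimal places
round(factor_grid[c(1, 5, 12, 20)], 4)
```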

res.lmgce.RidGME.02 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataincRidGME,
    support.method = "ridge",
    twosteps.n = 0
  )

With plot it is possible to visualize the change in the CV-error with the different factors used to multiply the maximum absolute value given by the Ridge trace.

plot(res.lmgce.RidGME.02, which = 2, NormEnt = FALSE)$p2

Red dots represent the CV-error and whiskers have the length of two standard errors for each of the 20 support spaces. The dotted horizontal line is the OLS CV-error. The black vertical dotted line corresponds to the support spaces that produced the lowest CV-error. The black vertical dashed line corresponds to the support spaces that produced the 1se CV-error. The red vertical dotted line corresponds to the support spaces that produced the elbow CV-error.
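The difference between the lowest CV-error and the 1se CV-error can be sketched with a few toy numbers. The values of `factors`, `cv_mean` and `cv_se` below are invented for illustration only. The tie-breaking rule, which takes the smallest factor whose CV-error is within one standard error of the minimum, is an assumption about the selection and not a description of the lmgce internals.

```r
# Toy values, for illustration only
factors <- c(0.3, 0.5, 0.8, 1.3, 2.1, 3.4, 5.5)
cv_mean <- c(3.20, 3.00, 2.80, 2.70, 2.66, 2.65, 2.67)  # mean CV-RMSE per factor
cv_se   <- c(0.25, 0.22, 0.20, 0.20, 0.21, 0.20, 0.22)  # standard error per factor

# "min": the factor with the lowest mean CV-error
idx_min <- which.min(cv_mean)

# "1se": the smallest factor whose CV-error does not exceed
# the minimum CV-error plus one standard error (assumed rule)
threshold <- cv_mean[idx_min] + cv_se[idx_min]
idx_1se   <- min(which(cv_mean <= threshold))

c(min = factors[idx_min], one_se = factors[idx_1se])
```

With these toy numbers the 1se rule selects narrower supports than the min rule, which corresponds to stronger shrinkage.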

summary(res.lmgce.RidGME.02)
#> 
#> Call:
#> GCEstim::lmgce(formula = y ~ ., data = dataincRidGME, support.method = "ridge", 
#>     twosteps.n = 0)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.4866 -0.9559  0.9156  2.6846  8.2776 
#> 
#> Coefficients:
#>             Estimate Std. Deviation z value Pr(>|t|)    
#> (Intercept)  1.46930        0.13478  10.902  < 2e-16 ***
#> X001         1.09574        8.15474   0.134  0.89311    
#> X002         0.25230        8.93924   0.028  0.97748    
#> X003         0.05000        4.74420   0.011  0.99159    
#> X004        -0.02795        6.23277  -0.004  0.99642    
#> X005        13.62821        5.05258   2.697  0.00699 ** 
#> X006        -6.58200       18.46490  -0.356  0.72150    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Normalized Entropy:
#>              NormEnt  SupportLL SupportUL
#> (Intercept) 0.403919  -1.694475  1.694475
#> X001        0.969934  -5.007529  5.007529
#> X002        0.988918  -1.892794  1.892794
#> X003        0.999038  -1.270632  1.270632
#> X004        0.999773  -1.462323  1.462323
#> X005        0.309534 -14.951168 14.951168
#> X006        0.832130 -13.054795 13.054795
#> 
#> Residual standard error: 2.994 on 113 degrees of freedom
#> Chosen factor for the Upper Limit of the Supports: 0.7263, Chosen Error: 1se
#> Multiple R-squared: 0.2644, Adjusted R-squared: 0.2253
#> NormEnt: 0.7862, CV-NormEnt: 0.8123 (0.01803)
#> RMSE: 2.906, CV-RMSE: 2.998 (0.2348)

Note that the prediction errors are worse than the ones obtained when the factor was \(2\) because we chose the 1se error. In this case, the precision error is also not the best one. We should have chosen errormeasure.which = "min". That can be done using

res.lmgce.RidGME.02.min <-
  GCEstim::lmgce(
    y ~ .,
    data = dataincRidGME,
    support.method = "ridge",
    errormeasure.which = "min",
    twosteps.n = 0
  )

From the summary we can conclude that the lowest prediction errors were obtained.

summary(res.lmgce.RidGME.02.min)
#> 
#> Call:
#> GCEstim::lmgce(formula = y ~ ., data = dataincRidGME, errormeasure.which = "min", 
#>     support.method = "ridge", twosteps.n = 0)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.1008 -1.7581 -0.0809  1.5068  6.0625 
#> 
#> Coefficients:
#>             Estimate Std. Deviation z value Pr(>|t|)    
#> (Intercept)   2.2380         0.1189  18.822  < 2e-16 ***
#> X001          3.8909         7.1939   0.541    0.589    
#> X002         -0.3020         7.8860  -0.038    0.969    
#> X003         -0.4128         4.1852  -0.099    0.921    
#> X004         -0.6463         5.4984  -0.118    0.906    
#> X005         20.7784         4.4573   4.662 3.14e-06 ***
#> X006        -13.4480        16.2893  -0.826    0.409    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Normalized Entropy:
#>              NormEnt  SupportLL SupportUL
#> (Intercept) 0.950029  -7.961442  7.961442
#> X001        0.982905 -23.527733 23.527733
#> X002        0.999284  -8.893240  8.893240
#> X003        0.997026  -5.970029  5.970029
#> X004        0.994491  -6.870682  6.870682
#> X005        0.944564 -70.247642 70.247642
#> X006        0.969815 -61.337584 61.337584
#> 
#> Residual standard error: 2.641 on 113 degrees of freedom
#> Chosen factor for the Upper Limit of the Supports: 3.4125, Chosen Error: min
#> Multiple R-squared: 0.5366, Adjusted R-squared: 0.512
#> NormEnt: 0.9769, CV-NormEnt: 0.9774 (0.00137)
#> RMSE: 2.563, CV-RMSE: 2.656 (0.4292)

The precision error is also the best one.

round(GCEstim::accmeasure(coef(res.lmgce.RidGME.02), coef.dataincRidGME, which = "RMSE"), 3) 
#> [1] 4.407

round(GCEstim::accmeasure(coef(res.lmgce.RidGME.02.min), coef.dataincRidGME, which = "RMSE"), 3) 
#> [1] 3.227
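As a cross-check, and assuming that the RMSE precision measure is the root mean squared difference between the estimated and the true coefficients, the two values above can be reproduced in base R from the coefficients printed in the summaries. The helper `rmse` and the object names are illustrative, and the estimates are the rounded values shown in the output.

```r
# Root mean squared difference between two coefficient vectors (assumed definition)
rmse <- function(estimate, truth) sqrt(mean((estimate - truth)^2))

# True coefficients used to generate dataincRidGME
truth <- c(2.5, 0, 0, 0, -8, 19, -13)

# Estimates copied from the printed summaries (1se and min selections)
est_1se <- c(1.46930, 1.09574, 0.25230, 0.05000, -0.02795, 13.62821, -6.58200)
est_min <- c(2.2380, 3.8909, -0.3020, -0.4128, -0.6463, 20.7784, -13.4480)

round(rmse(est_1se, truth), 3)
#> [1] 4.407
round(rmse(est_min, truth), 3)
#> [1] 3.227
```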

If we go back to our first example and use this last approach, called incRidGME,

res.lmgce.RidGME.01.1se <-
  GCEstim::lmgce(
    y ~ .,
    data = dataThesis,
    support.method = "ridge",
    twosteps.n = 0
  )
res.lmgce.RidGME.01.min <- 
  GCEstim::lmgce(
    y ~ .,
    data = dataThesis,
    support.method = "ridge",
    errormeasure.which = "min",
    twosteps.n = 0
  )

we can see an improvement over RidGME in the prediction errors when the min error is selected, although the precision error becomes much larger. When the 1se error is selected, the bias-variance trade-off seems more appropriate than when the min error is selected.

| Measure | \(OLS\) | \(GME_{(100000)}\) | \(GME_{(100)}\) | \(GME_{(50)}\) | \(GME_{(RidGME)}\) | \(GME_{(incRidGME_{1se})}\) | \(GME_{(incRidGME_{min})}\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Prediction RMSE | 0.947 | 0.947 | 0.949 | 0.951 | 1.054 | 1.132 | 0.948 |
| Prediction CV-RMSE | 0.980 | 0.980 | 0.982 | 0.984 | 1.105 | 1.191 | 0.981 |
| Precision RMSE | 29.295 | 29.191 | 3.022 | 2.746 | 1.659 | 1.850 | 8.939 |

Standardization

Since every parameter estimation method has some drawback, we can try to avoid a pre-estimation step when defining the support spaces. Consider the model in (1) (see “Generalized Maximum Entropy framework”). It can be written as
\[\begin{align} \qquad \qquad \mathbf{y} &= \beta_0\mathbf{1}_N + \beta_1 \mathbf{x_{1}} + \beta_2 \mathbf{x_{2}} + \dots + \beta_K \mathbf{x_{K}} + \mathbf{e}, \qquad \qquad (2) \end{align}\]

Standardizing \(y\) and \(x_j\), the model in (2) is rewritten as
\[\begin{align} y^* &= X^*b + e^*,\\ y^* &= b_1x_1^* + b_2x_2^* + \dots + b_Kx_K^* + e^*, \end{align}\] where \[\begin{align} y_i^*&=\frac{y_i-\frac{\sum_{i=1}^{N}y_i}{N}}{\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left( y_i-\frac{\sum_{i=1}^{N}y_i}{N}\right)^2}},\\ x_{ji}^*&=\frac{x_{ji}-\frac{\sum_{i=1}^{N}x_{ji}}{N}}{\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left( x_{ji}-\frac{\sum_{i=1}^{N}x_{ji}}{N}\right)^2}},\\ b_j&=\frac{\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left( x_{ji}-\frac{\sum_{i=1}^{N}x_{ji}}{N}\right)^2}}{\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left( y_i-\frac{\sum_{i=1}^{N}y_i}{N}\right)^2}}\beta_j, \end{align}\] with \(j\in \left\lbrace 1,\dots,K\right\rbrace\), and \(i \in \left\lbrace 1,\dots,N\right\rbrace\). In this formulation, \(b_j\) are called standardized coefficients.
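The relation between \(b_j\) and \(\beta_j\) can be checked numerically. The sketch below uses simulated data and OLS fits from base R. All object names and the data-generating values are illustrative assumptions.

```r
# Illustrative data (assumed, not from the vignette)
set.seed(123)
N <- 100
X <- cbind(x1 = rnorm(N, mean = 5, sd = 2), x2 = rnorm(N, mean = -1, sd = 0.5))
y <- 3 + 1.5 * X[, "x1"] - 4 * X[, "x2"] + rnorm(N)

# OLS on the original scale: beta_0, beta_1, beta_2
beta <- coef(lm(y ~ X))

# Standardize y and each x_j (scale() uses the N - 1 denominator, as in the text)
X_std <- scale(X)
y_std <- as.numeric(scale(y))

# OLS on the standardized variables: no intercept is needed
b_std <- coef(lm(y_std ~ X_std - 1))

# b_j = (s_{x_j} / s_y) * beta_j
s_x <- apply(X, 2, sd)
s_y <- sd(y)
b_from_beta <- beta[-1] * s_x / s_y

all.equal(unname(b_std), unname(b_from_beta))
```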

Although not bounded, standardized coefficients greater than \(1\) in magnitude tend to occur with low frequency, and especially in extremely ill-conditioned problems. Given this, one can define zero-centered support spaces for the standardized variables, symmetrically bounded by a “small” number (or vector of numbers), and then revert the support spaces to the original scale. By doing so, no pre-estimation is performed. lmgce uses this approach by default (support.method = "standardized") to do the estimation.

res.lmgce.1se <-
  GCEstim::lmgce(
    y ~ .,
    data = dataThesis,
    support.method = "standardize", # default 
    errormeasure.which = "1se", # default
    twosteps.n = 0
  )
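Reverting standardized bounds to the original scale follows from \(\beta_j = b_j s_y / s_{x_j}\). The sketch below shows this back-transformation for the slope coefficients only. The function name `standardized_to_original` and the standard deviations are hypothetical, and the treatment of the intercept inside lmgce is not covered.

```r
# Map a symmetric bound on the standardized coefficients, [-upper_std, upper_std],
# to bounds on the original-scale slopes: beta_j = b_j * s_y / s_xj
standardized_to_original <- function(upper_std, sd_x, sd_y) {
  upper <- upper_std * sd_y / sd_x
  cbind(SupportLL = -upper, SupportUL = upper)
}

# Hypothetical standard deviations, for illustration only
sd_x <- c(X001 = 0.8, X002 = 2.5)
sd_y <- 4

supports_narrow <- standardized_to_original(0.9059, sd_x, sd_y)
supports_wide   <- standardized_to_original(20, sd_x, sd_y)
supports_wide / supports_narrow  # every entry equals 20 / 0.9059
```

Because the mapping is linear in the bound, the ratio between two sets of slope supports equals the ratio between the standardized bounds.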

We can also choose the support space that produced the lowest CV-error.

res.lmgce.min <- changesupport(res.lmgce.1se, "min")
summary(res.lmgce.1se)
#> 
#> Call:
#> GCEstim::lmgce(formula = y ~ ., data = dataThesis, errormeasure.which = "1se", 
#>     support.method = "standardize", twosteps.n = 0)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.3533 -0.8869 -0.1567  0.3512  1.8425 
#> 
#> Coefficients:
#>             Estimate Std. Deviation z value Pr(>|t|)    
#> (Intercept)  -3.6956         0.0634 -58.291   <2e-16 ***
#> X001          0.4679         6.6664   0.070    0.944    
#> X002        -11.6305         7.8027  -1.491    0.136    
#> X003        -11.4604        46.7568  -0.245    0.806    
#> X004         -4.0757        25.2921  -0.161    0.872    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Normalized Entropy:
#>              NormEnt  SupportLL SupportUL
#> (Intercept) 0.422711  -4.311289  4.311289
#> X001        0.998452  -9.377392  9.377392
#> X002        0.730761 -18.577182 18.577182
#> X003        0.922396 -32.878041 32.878041
#> X004        0.976346 -20.975389 20.975389
#> 
#> Residual standard error: 1.102 on 70 degrees of freedom
#> Chosen Upper Limit for Standardized Supports: 0.9059, Chosen Error: 1se
#> Multiple R-squared: 0.7365, Adjusted R-squared: 0.7214
#> NormEnt: 0.8101, CV-NormEnt: 0.8206 (0.01532)
#> RMSE: 1.064, CV-RMSE: 1.149 (0.3264)
summary(res.lmgce.min)
#> 
#> Call:
#> GCEstim::lmgce(formula = y ~ ., data = dataThesis, errormeasure.which = "1se", 
#>     support.method = "standardize", twosteps.n = 0)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.6246 -0.5646  0.1015  0.5219  2.0774 
#> 
#> Coefficients:
#>              Estimate Std. Deviation z value Pr(>|t|)    
#> (Intercept)  -4.04530        0.05608 -72.134   <2e-16 ***
#> X001         -2.02242        5.89678  -0.343   0.7316    
#> X002        -17.73142        6.90190  -2.569   0.0102 *  
#> X003          6.36037       41.35863   0.154   0.8778    
#> X004        -15.18239       22.37211  -0.679   0.4974    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Normalized Entropy:
#>              NormEnt  SupportLL SupportUL
#> (Intercept) 0.964998  -17.14915  17.14915
#> X001        0.999941 -207.02929 207.02929
#> X002        0.998838 -410.13759 410.13759
#> X003        0.999952 -725.86468 725.86468
#> X004        0.999332 -463.08398 463.08398
#> 
#> Residual standard error: 0.9815 on 70 degrees of freedom
#> Chosen Upper Limit for Standardized Supports: 20, Chosen Error: min
#> Multiple R-squared: 0.8361, Adjusted R-squared: 0.8267
#> NormEnt: 0.9926, CV-NormEnt: 0.9927 (0.0002702)
#> RMSE: 0.9483, CV-RMSE: 0.9805 (0.2474)

And we can do a final comparison between methods.

| Measure | \(OLS\) | \(GME_{(RidGME)}\) | \(GME_{(incRidGME_{1se})}\) | \(GME_{(incRidGME_{min})}\) | \(GME_{(std_{1se})}\) | \(GME_{(std_{min})}\) |
| --- | --- | --- | --- | --- | --- | --- |
| Prediction RMSE | 0.947 | 1.054 | 1.132 | 0.948 | 1.064 | 0.948 |
| Prediction CV-RMSE | 0.980 | 1.105 | 1.191 | 0.981 | 1.149 | 0.981 |
| Precision RMSE | 29.295 | 1.659 | 1.850 | 8.939 | 2.027 | 9.464 |

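With the 1se error and support spaces defined by standardized bounds, the precision error (2.027) is far lower than that of OLS and close to the values obtained with the Ridge-based supports (1.659 and 1.850), at a small expense of the prediction error and without any pre-estimation.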

Conclusion

The choice of the support spaces is crucial for an accurate estimation of the regression parameters. Prior information can be used to define those support spaces. That information can be theoretical, or can be obtained from previous regression models, or can rely on the distribution of standardized regression coefficients. From our analysis the last approach produces good results.

References

1. Golan A, Judge GG, Miller D. Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley; 1996.
2. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12:55-67. doi:10.1080/00401706.1970.10488634
3. Cabral J, Macedo P, Marques A, Afreixo V. Comparison of feature selection methods—modelling COPD outcomes. Mathematics. 2024;12:1398. doi:10.3390/math12091398
4. Macedo P, Costa MC, Cruz JP. Normalized entropy: A comparison with traditional techniques in variable selection. In:; 2022:190002. doi:10.1063/5.0081504
5. Macedo P, Cabral J, Afreixo V, Macedo F, Angelelli M. RidGME estimation and inference in ill-conditioned models. In: Gervasi O, Murgante B, Garau C, et al., eds. Computational Science and Its Applications – ICCSA 2025 Workshops. Springer Nature Switzerland; 2025:300-313.

Acknowledgements

This work was supported by Fundação para a Ciência e Tecnologia (FCT) through CIDMA and projects https://doi.org/10.54499/UIDB/04106/2020 and https://doi.org/10.54499/UIDP/04106/2020.