I’m going to make up an example because I think it’s easier to communicate than what I really did, which involves MCMC and Bayesian inference.

Assume you have data on people’s earnings, their age, and their shoe size. Assume the only relevant people are aged 1-10, and the only possible shoe sizes are 1-10. Both are categorical variables in the model.

Let’s say you have a data set consisting of exactly 1 observation for each case where age <= shoe size. So for example, you have the income of a person with shoe size 10 at ages 1-10 (10 observations), the income of a person with shoe size 9 at ages 1-9 (9 observations)… down to a person with shoe size 1 at age 1 (1 observation). Thus you don’t have any observations for people with an age greater than their shoe size, but you want to be able to predict, for example, the income of the shoe size 3 person at age 10. Let’s say all the salaries in your observations are below 20,000, and you know (both from experience and from the data) that for any given shoe size, a higher age means a higher salary.
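To make the shape of that data set concrete, here is a minimal Python sketch that just enumerates which (age, shoe size) cells are observed and which are not. The variable names are mine and purely illustrative; no salaries are generated.

```python
# Ages and shoe sizes both run 1..10, as in the made-up example.
LEVELS = range(1, 11)

# Observed cells: exactly one observation wherever age <= shoe size.
observed = [(age, shoe) for shoe in LEVELS for age in LEVELS if age <= shoe]

# Unobserved cells: every combination where age > shoe size.
unobserved = [(age, shoe) for shoe in LEVELS for age in LEVELS if age > shoe]

print(len(observed))          # 55 observations (10 + 9 + ... + 1)
print(len(unobserved))        # 45 combinations with no data
print((10, 3) in unobserved)  # True: shoe size 3 at age 10 is an extrapolation
```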

You construct a model where log(salary) = age + shoe size, where age and shoe size are categorical variables (each coded as a set of binary indicator variables, one per level). You estimate the coefficients, and now you can predict salary for any combination of age and shoe size.
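As a rough sketch of what that coding looks like, the snippet below builds the indicator (dummy) rows for the observed cells. Dropping level 1 of each variable as a baseline and adding an intercept is my assumption here, not necessarily how the real model was parameterized, and `dummy_row` is a hypothetical helper name.

```python
LEVELS = list(range(1, 11))
observed = [(age, shoe) for shoe in LEVELS for age in LEVELS if age <= shoe]

def dummy_row(age, shoe):
    """Intercept, then 9 age dummies and 9 shoe-size dummies (level 1 is the baseline)."""
    age_part = [int(age == a) for a in LEVELS[1:]]
    shoe_part = [int(shoe == s) for s in LEVELS[1:]]
    return [1] + age_part + shoe_part

design = [dummy_row(age, shoe) for age, shoe in observed]

# Number of observed rows at each age level.
rows_per_age = {a: sum(1 for age, _ in observed if age == a) for a in LEVELS}

print(len(design), len(design[0]))  # 55 rows, 19 columns
print(rows_per_age[10])             # 1: only the shoe-size-10 person is seen at age 10
print(rows_per_age[1])              # 10
```

Under this coding the age-10 effect is informed by a single observation, which may be part of why predictions for unobserved cells at high ages can swing so much between models.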

You create another model, and to compare the 2 models you look at an information criterion like AIC (Akaike Information Criterion); in my specific case I’m looking at DIC (Deviance Information Criterion). Let’s say the AIC for model 2 is SUBSTANTIALLY lower than for model 1.
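For reference, here is a small sketch of the two criteria using their standard definitions. The function names and the input numbers are hypothetical and only there to exercise the formulas; they are not values from my models.

```python
def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L), with L the maximized likelihood and k the parameter count."""
    return 2 * n_params - 2 * log_likelihood

def dic(mean_deviance, deviance_at_posterior_mean):
    """DIC = Dbar + pD, where pD = Dbar - D(theta_bar) is the effective parameter count."""
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d

# Hypothetical inputs, chosen only to exercise the functions.
print(aic(log_likelihood=-10.0, n_params=3))                     # 26.0
print(dic(mean_deviance=45.0, deviance_at_posterior_mean=40.0))  # 50.0
```

Both quantities are computed from the likelihood of the observed data only, so on their own they say nothing direct about the cells where age > shoe size.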

However, when you use model 2 to estimate the salary of someone with shoe size 3 at age 10, as mentioned earlier, you get an estimate of 850,000, way higher than you know is possible in this fictional world where pretty much everyone is making less than 20,000. When you get the predictions from model 1, you don’t get any estimates above 30,000, i.e. much more reasonable predictions.
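One way to make that sanity check explicit is to back-transform the log-scale predictions and flag anything above a plausibility ceiling. This is only a sketch: `flag_implausible` is a hypothetical helper, the ceiling of 50,000 is an assumed bound, and apart from the 850,000 and 30,000 figures the predicted values are made up for illustration.

```python
import math

def flag_implausible(log_predictions, ceiling):
    """Return {cell: salary} for predictions whose back-transformed salary exceeds ceiling."""
    flagged = {}
    for cell, log_salary in log_predictions.items():
        salary = math.exp(log_salary)  # the model is on the log(salary) scale
        if salary > ceiling:
            flagged[cell] = salary
    return flagged

# Hypothetical log-scale predictions keyed by (age, shoe size).
model_1 = {(10, 3): math.log(30_000), (9, 3): math.log(18_000)}
model_2 = {(10, 3): math.log(850_000), (9, 3): math.log(19_000)}

ceiling = 50_000  # assumed plausibility bound, not taken from the data
print(sorted(flag_implausible(model_1, ceiling)))  # []
print(sorted(flag_implausible(model_2, ceiling)))  # [(10, 3)]
```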

Thus I definitely can’t use model 2 for my prediction, because as a subject matter expert I know its estimates aren’t feasible. But from a statistical standpoint I feel at a loss, because its DIC was so much lower (19 compared to 60), which is a huge difference for DIC.

TL;DR: what to do when the statistics point overwhelmingly to a model that isn’t producing feasible predictions?