Using Feat’s archive

Feat optimizes a population of models. At the end of the run, it can be useful to explore this population to find a trade-off between objectives, such as performance and complexity.

In this example, we apply Feat to a regression problem and visualize the archive of representations.

Note: this code uses the Penn Machine Learning Benchmarks (PMLB) suite ( https://github.com/EpistasisLab/penn-ml-benchmarks/ ) to fetch data. You can install it with pip install pmlb .

First, we import the data and create a train-test split.

[1]:
from pmlb import fetch_data
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse
import numpy as np
# fix the random state
random_state=42
dataset='690_visualizing_galaxy'
X, y = fetch_data(dataset,return_X_y=True)
X_t,X_v, y_t, y_v = train_test_split(X,y,train_size=0.75,test_size=0.25,random_state=random_state)

Then we set up a Feat instance and train the model, storing the final archive.

[2]:
from feat import Feat


fest = Feat(pop_size=500, # population size
            gens=100, # maximum generations
            max_time=60, # max time in seconds
            max_depth=2, # constrain feature depth
            max_dim=5, # constrain representation dimensionality
            random_state=random_state,
            hillclimb=True, # use stochastic hillclimbing to optimize weights
            iters=10, # iterations of hillclimbing
            n_jobs=1, # restricts to single thread
            verbosity=2, # verbose output (this prints to the terminal)
           )

print('FEAT version:', fest.__version__)
# train the model
fest.fit(X_t,y_t)

FEAT version: 0.4.1.post30
[2]:
Feat(hillclimb=True, max_depth=2, max_dim=5, max_time=60, pop_size=500,
     random_state=42, verbosity=2)
[3]:
# get the test score
test_score = {}
test_score['feat'] = mse(y_v,fest.predict(X_v))

# store the archive
archive = fest.get_archive(justfront=True)

# print the archive
print('complexity','fitness','validation fitness',
     'eqn')
order = np.argsort([a['complexity'] for a in archive])
complexity = []
fit_train = []
fit_test = []
eqn = []

for o in order:
    model = archive[o]
    if model['rank'] == 1:
        print(model['complexity'],
              model['fitness'],
              model['fitness_v'],
              model['eqn'],
             )

        complexity.append(model['complexity'])
        fit_train.append(model['fitness'])
        fit_test.append(model['fitness_v'])
        eqn.append(model['eqn'])
complexity fitness validation fitness eqn
1 1804.70947265625 1815.92822265625 [x_1]
2 967.038330078125 1090.6053466796875 [x_1][x_0]
3 931.5263061523438 916.8345336914062 [x_0][x_1][x_3]
4 912.5177001953125 862.1056518554688 [x_0][x_1][x_2][x_3]
6 669.4453735351562 900.9910888671875 [x_1][x_0][relu(0.00*x_1)]
7 467.7828369140625 704.1397094726562 [x_0][tanh(1.26*x_1)]
8 424.2886657714844 512.7071533203125 [x_0][tanh(1.34*x_1)][x_3]
9 383.1601867675781 463.29949951171875 [tanh(1.39*x_1)][x_2][x_0][x_3]
10 375.2489318847656 511.86627197265625 [tanh(0.98*x_1)][x_1][x_3][x_2][x_0]
12 350.7279357910156 492.56195068359375 [tanh(2.54*x_0)][tanh(1.29*x_1)]
13 319.4396667480469 426.8195495605469 [tanh(1.33*x_1)][x_2][tanh(2.41*x_0)]
14 301.9866638183594 386.4537048339844 [tanh(1.38*x_1)][x_3][x_2][tanh(2.79*x_0)]
15 297.632568359375 394.12969970703125 [tanh(2.86*x_0)][x_1][x_3][x_2][tanh(1.05*x_1)]
17 282.73626708984375 392.1083068847656 [tanh(2.62*x_0)][relu(0.63*x_3)][x_2][tanh(1.39*x_1)]
18 276.8981018066406 403.6515808105469 [tanh(2.66*x_0)][x_1][relu(0.61*x_3)][x_2][tanh(1.03*x_1)]
19 266.9380187988281 261.5851135253906 [(0.26*x_2*(1.32*x_0+0.26*x_1))][x_0][x_1][tanh(1.14*x_1)][x_3]
21 252.79739379882812 391.8340148925781 [(0.30*x_1*sqrt(|0.30*x_3|))][(0.44*x_2-0.01*x_1)][tanh(2.03*x_0)]
22 249.07005310058594 313.2448425292969 [(0.28*x_2*(1.06*x_0+0.24*x_1))][x_1][x_0][tanh(1.11*x_1)][sqrt(|0.18*x_3|)]
24 233.05250549316406 246.5561981201172 [(0.32*x_2*(1.16*x_0+0.27*x_1))][tanh(2.00*x_0)][x_0][x_1][sin(1.98*x_1)]
26 215.35858154296875 1645.1624755859375 [tanh(1.18*x_1)][tanh(2.25*x_0)][x_3][(0.35*x_1-0.34*x_0)][(0.19*x_2*(1.42*x_0+0.29*x_1))]
29 203.5608367919922 249.9871368408203 [(0.48*x_1+0.56*sqrt(|0.17*x_3|))][x_0][tanh(1.17*x_1)][(0.48*x_2*(0.29*x_0+0.07*x_1))][tanh(2.01*x_0)]
30 202.99911499023438 258.62847900390625 [tanh(1.18*x_1)][tanh(1.88*x_0)][(0.46*x_1+0.18*(0.27*x_0-0.34*sqrt(|0.29*x_1|)))][(0.18*x_2*(0.74*x_0+0.19*x_1))]
31 190.41644287109375 241.6823272705078 [tanh(1.20*x_1)][tanh(2.20*x_0)][x_3][(0.58*x_1+0.14*(0.19*x_0-0.25*sqrt(|0.19*x_3|)))][(0.23*x_2*(1.01*x_0+0.21*x_1))]
36 188.87777709960938 259.40362548828125 [tanh(1.19*x_1)][tanh(1.98*x_0)][(0.48*x_1+0.17*(0.20*x_0-0.29*sqrt(|0.28*x_3|)))][(0.08*x_2*(1.30*x_0+0.30*x_1))][(0.88*x_2/0.95*x_0)]
38 189.1365966796875 232.5410614013672 [tanh(1.22*x_1)][tanh(2.08*x_0)][(0.57*x_1+0.11*(0.25*x_0-0.23*sqrt(|0.18*x_3|)))][(0.23*x_2*(1.00*x_0+0.21*x_1))][exp(0.27*x_3)]
46 181.5760498046875 239.0564727783203 [tanh(1.19*x_1)][tanh(1.98*x_0)][(0.48*x_1+0.17*(0.20*x_0-0.29*sqrt(|0.28*x_3|)))][(0.08*x_2*(1.30*x_0+0.30*x_1))][(0.88*x_2/0.95*tanh(1.85*x_0))]
98 178.29383850097656 255.6151885986328 [tanh(1.05*x_1)][exp(0.58*tanh(1.79*x_0))][(0.31*x_3-0.88*log(1.37*(0.06*x_0/0.59*(0.47*x_3+1.14*x_2))))][(0.73*x_1+0.22*(0.15*x_0-0.26*sqrt(|0.15*x_3|)))][(0.28*x_2*(1.15*x_0+0.26*x_1))]
104 177.51222229003906 253.17242431640625 [tanh(1.11*x_1)][exp(0.58*tanh(1.79*x_0))][(0.31*x_3-0.88*log(1.37*(0.06*x_0/0.59*(0.47*x_3+1.14*x_2))))][(0.73*x_1+0.22*(0.15*x_0-0.26*sqrt(|0.83*sqrt(|0.15*x_3|)|)))][(0.28*x_2*(1.15*x_0+0.26*x_1))]
147 174.20518493652344 1227.4205322265625 [tanh(1.20*tanh(0.64*x_1))][exp(0.63*tanh(2.07*x_0))][(0.27*x_3-0.72*log(1.95*(0.04*x_0/0.79*(0.47*x_3+1.20*x_2))))][(0.55*x_1+0.21*(0.23*x_0-0.24*sqrt(|0.17*x_3|)))][exp(0.64*(0.21*x_2*(1.43*x_0+0.33*x_1)))]
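The archive returned with justfront=True contains only the Pareto front: models for which no other model is both simpler and better-fitting. As a minimal pure-Python illustration of that idea (not Feat's internal implementation), the front can be computed from (complexity, fitness) pairs like so; the example values are taken from the archive printed above, plus one hypothetical dominated entry:

```python
def pareto_front(models):
    """Return the models not dominated by any other model.

    A model dominates another if it is no worse in both objectives
    (complexity and fitness, lower is better) and strictly better
    in at least one.
    """
    front = []
    for m in models:
        dominated = any(
            o["complexity"] <= m["complexity"] and o["fitness"] <= m["fitness"]
            and (o["complexity"] < m["complexity"] or o["fitness"] < m["fitness"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m["complexity"])

# A few (complexity, fitness) pairs; the dominated entry is filtered out.
models = [
    {"complexity": 1, "fitness": 1804.7},
    {"complexity": 2, "fitness": 967.0},
    {"complexity": 3, "fitness": 931.5},
    {"complexity": 3, "fitness": 1200.0},  # hypothetical: dominated by the model above
]
print([m["complexity"] for m in pareto_front(models)])  # → [1, 2, 3]
```

Feat performs this filtering internally; the sketch is only meant to show why each archive entry trades complexity against fitness.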

For comparison, we can fit an Elastic Net and Random Forest regression model to the same data.

[4]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(random_state=random_state)

rf.fit(X_t,y_t)

test_score['rf'] = mse(y_v,rf.predict(X_v))

[5]:
from sklearn.linear_model import ElasticNet

linest = ElasticNet()

linest.fit(X_t,y_t)

test_score['elasticnet'] = mse(y_v,linest.predict(X_v))

Let’s look at the test set mean squared errors by method.

[6]:
test_score
[6]:
{'feat': 258.74213477932375,
 'rf': 347.15749506172847,
 'elasticnet': 919.3515337699562}

Visualizing the Archive

Let’s visualize this archive with the test scores. This gives us a sense of how increasing the representation complexity affects the quality of the model and its generalization.

[9]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import math

matplotlib.rcParams['figure.figsize'] = (10, 6)
%matplotlib inline
sns.set_style('white')
h = plt.figure(figsize=(14,8))

# plot archive points
plt.plot(fit_train,complexity,'--ro',label='Train',markersize=6)
plt.plot(fit_test,complexity,'--bx',label='Validation')
# some models to point out
best = np.argmin(np.array(fit_test))
middle = np.argmin(np.abs(np.array(fit_test[:best])-test_score['rf']))
small = np.argmin(np.abs(np.array(fit_test[:middle])-test_score['elasticnet']))

print('best:',complexity[best])
print('middle:',complexity[middle])
print('small:',complexity[small])
plt.plot(fit_test[best],complexity[best],'sk',markersize=16,markerfacecolor='none',label='Model Selection')

# test score lines
y1 = -1
y2 = np.max(complexity)+1
plt.plot((test_score['feat'],test_score['feat']),(y1,y2),'--k',label='FEAT Test',alpha=0.5)
plt.plot((test_score['rf'],test_score['rf']),(y1,y2),'-.xg',label='RF Test',alpha=0.5)
plt.plot((test_score['elasticnet'],test_score['elasticnet']),(y1,y2),'-sm',label='ElasticNet Test',alpha=0.5)

print('complexity',complexity)
xoff = 100
for e,t,c in zip(eqn,fit_test,complexity):
    if c in [complexity[best],complexity[middle],complexity[small]]:
        t = t+xoff
        tax = plt.text(t,c,r'$\leftarrow'+e+'$',size=18,horizontalalignment='left',
                      verticalalignment='center')
        tax.set_bbox(dict(facecolor='white', alpha=0.75, edgecolor='k'))

l = plt.legend(prop={'size': 16},loc=[1.01,0.25])
plt.xlabel('MSE',size=16)
plt.xlim(np.min(fit_train)*.75,np.max(fit_test)*2)
plt.gca().set_xscale('log')
plt.gca().set_yscale('log')

plt.gca().set_yticklabels('')
plt.gca().set_xticklabels('')

plt.ylabel('Complexity',size=18)
h.tight_layout()

plt.show()
best: 38
middle: 22
small: 3
complexity [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 17, 18, 19, 21, 22, 24, 26, 29, 30, 31, 36, 38, 46, 98, 104, 147]
[Figure: archive models plotted as complexity vs. MSE (log-log), with train (red) and validation (blue) curves, the selected models annotated with their equations, and vertical lines marking the FEAT, RF, and ElasticNet test scores.]

Note that ElasticNet produces a similar test score to the linear representation in Feat’s archive, and that Random Forest’s test score is near the representation shown in the middle.

The best model, marked with a square, is selected from the validation curve (blue line). The validation curve shows how models begin to overfit as complexity grows. By visualizing the archive, we can see that some lower-complexity models achieve nearly as good a validation score. In that case, it may be preferable to choose one of those representations instead.
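One simple way to act on this observation is a tolerance rule: pick the lowest-complexity archive model whose validation fitness is within some fraction of the best. This is a hypothetical selection heuristic, not part of Feat's API; the (complexity, fitness_v) values below are a subset of the archive printed earlier:

```python
def select_within_tolerance(archive, tol=0.10):
    """Pick the lowest-complexity model whose validation fitness
    (lower is better) is within a fraction `tol` of the best."""
    best = min(m["fitness_v"] for m in archive)
    candidates = [m for m in archive if m["fitness_v"] <= best * (1 + tol)]
    return min(candidates, key=lambda m: m["complexity"])

# A subset of (complexity, fitness_v) pairs from the archive above
archive = [
    {"complexity": 19, "fitness_v": 261.6},
    {"complexity": 24, "fitness_v": 246.6},
    {"complexity": 38, "fitness_v": 232.5},
]
choice = select_within_tolerance(archive, tol=0.15)
print(choice["complexity"])  # → 19
```

With a 15% tolerance the rule prefers the complexity-19 model over the complexity-38 one that minimizes validation error, which matches the trade-off visible in the figure.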

By default, FEAT will choose the model with the lowest validation error, marked with a square above. Let’s look at that model.

The function get_model() prints a table of the learned features, optionally ordered by the magnitude of their weights.

[8]:
print(fest.get_model(sort=False))
Weight  Feature
1590.39         offset
-103.53 tanh(1.22*x_1)
20.83   tanh(2.08*x_0)
63.00   (0.57*x_1+0.11*(0.25*x_0-0.23*sqrt(|0.18*x_3|)))
-76.59  (0.23*x_2*(1.00*x_0+0.21*x_1))
-13.93  exp(0.27*x_3)
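Ordering by weight magnitude (what the sort option of get_model() refers to) can be sketched in plain Python using the weights printed above; this is an illustration of the ranking, not Feat's implementation:

```python
# (weight, feature) pairs copied from the model table above
model = [
    (1590.39, "offset"),
    (-103.53, "tanh(1.22*x_1)"),
    (20.83,   "tanh(2.08*x_0)"),
    (63.00,   "(0.57*x_1+0.11*(0.25*x_0-0.23*sqrt(|0.18*x_3|)))"),
    (-76.59,  "(0.23*x_2*(1.00*x_0+0.21*x_1))"),
    (-13.93,  "exp(0.27*x_3)"),
]

# sort by absolute weight, largest first
ranked = sorted(model, key=lambda wf: abs(wf[0]), reverse=True)
for w, f in ranked:
    print(f"{w:8.2f}  {f}")
```

Large-magnitude weights (after the offset, tanh(1.22*x_1) here) indicate the features the linear model leans on most, which is useful when interpreting the representation.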