Python API

class feat.Feat(pop_size=100, gens=100, ml='LinearRidgeRegression', classification=False, verbosity=0, max_stall=0, sel='lexicase', surv='nsga2', cross_rate=0.5, root_xo_rate=0.5, otype='a', functions='', max_depth=3, max_dim=10, random_state=0, erc=False, obj='fitness,complexity', shuffle=True, split=0.75, fb=0.5, scorer='', feature_names='', backprop=False, iters=10, lr=0.1, batch_size=0, n_jobs=1, hillclimb=False, logfile='', max_time=-1, residual_xo=False, stagewise_xo=False, stagewise_xo_tol=False, softmax_norm=False, save_pop=0, normalize=True, val_from_arch=True, corr_delete_mutate=False, simplify=0.0, protected_groups='', tune_initial=False, tune_final=True, starting_pop='')

Feature Engineering Automation Tool

Parameters
  • pop_size ( int , optional ( default: 100 ) ) – Size of the population of models

  • gens ( int , optional ( default: 100 ) ) – Number of iterations to train for

  • ml ( str , optional ( default: "LinearRidgeRegression" ) ) – ML pairing. Choices: LinearRidgeRegression, Lasso, L1_LR, L2_LR. FeatRegressor sets this to “LinearRidgeRegression”; FeatClassifier sets it to L2-penalized LR (“LR”).

  • classification ( boolean , optional ( default: False ) ) – Whether to do classification (True) or regression (False). Set explicitly by FeatRegressor and FeatClassifier.

  • verbosity ( int , optional ( default: 0 ) ) – How much to print out (0, 1, 2)

  • max_stall ( int , optional ( default: 0 ) ) – How many generations to continue after the validation loss has stalled. If 0, not used.

  • sel ( str , optional ( default: "lexicase" ) ) – Selection algorithm to use.

  • surv ( str , optional ( default: "nsga2" ) ) – Survival algorithm to use.

  • cross_rate ( float , optional ( default: 0.5 ) ) – How often to do crossover for variation versus mutation.

  • root_xo_rate ( float , optional ( default: 0.5 ) ) – When performing crossover, how often to choose from the roots of the trees, rather than within the tree. Root crossover essentially swaps features in models.

  • otype ( string , optional ( default: 'a' ) ) – Feature output types. ‘a’: all; ‘b’: boolean only; ‘f’: floating point only.

  • functions ( string , optional ( default: "" ) ) – Which operators to use to build features. If functions="", all available functions are used.

  • max_depth ( int , optional ( default: 3 ) ) – Maximum depth of a feature’s tree representation.

  • max_dim ( int , optional ( default: 10 ) ) – Maximum dimension of a model. The dimension of a model is how many independent features it has. Controls the number of trees in each individual.

  • random_state ( int , optional ( default: 0 ) ) – Random seed. If -1, a random seed is chosen.

  • erc ( boolean , optional ( default: False ) ) – If true, ephemeral random constants are included as nodes in trees.

  • obj ( str , optional ( default: "fitness,complexity" ) ) – Comma-separated objectives to use for multi-objective optimization.

  • shuffle ( boolean , optional ( default: True ) ) – Whether to shuffle the training data before beginning training.

  • split ( float , optional ( default: 0.75 ) ) – Fraction of the training data used internally for training. The remaining 1 - split fraction of the data forms the validation fold.

  • fb ( float , optional ( default: 0.5 ) ) – Controls the amount of feedback from the ML weights used during variation. Higher values make variation less random.

  • scorer ( str , optional ( default: '' ) ) – Scoring function to use internally.

  • feature_names ( str , optional ( default: '' ) ) – Optionally provide comma-separated feature names. The number of names should equal the number of features in your data. This is set automatically if a Pandas dataframe is passed to fit().

  • backprop ( boolean , optional ( default: False ) ) – Perform gradient descent on feature weights using backpropagation.

  • iters ( int , optional ( default: 10 ) ) – Controls the number of iterations of backprop as well as hillclimbing for learning weights.

  • lr ( float , optional ( default: 0.1 ) ) – Learning rate used for gradient descent. This is the initial rate; it is scheduled to decrease exponentially with generations.

  • batch_size ( int , optional ( default: 0 ) ) – Number of samples to train on each generation. 0 means train on all the samples.

  • n_jobs ( int , optional ( default: 1 ) ) – Number of parallel threads to use. If 0, this will be automatically determined by OMP.

  • hillclimb ( boolean , optional ( default: False ) ) – Applies stochastic hillclimbing to feature weights.

  • logfile ( str , optional ( default: "" ) ) – If specified, writes statistics to this log file. "" means no logging.

  • max_time ( int , optional ( default: -1 ) ) – Maximum run time, in seconds, used as a termination criterion. If -1, not used.

  • residual_xo ( boolean , optional ( default: False ) ) – Use residual crossover.

  • stagewise_xo ( boolean , optional ( default: False ) ) – Use stagewise crossover.

  • stagewise_xo_tol ( boolean , optional ( default: False ) ) – Terminates stagewise crossover based on an error value rather than dimensionality.

  • softmax_norm ( boolean , optional ( default: False ) ) – Uses softmax normalization of probabilities of variation across the features.

  • save_pop ( int , optional ( default: 0 ) ) – Saves the population of models. 0: don’t save; 1: save final population; 2: save every generation.

  • normalize ( boolean , optional ( default: True ) ) – Normalizes the floating point input variables using z-scores.

  • val_from_arch ( boolean , optional ( default: True ) ) – Validates the final model using the archive rather than the whole population.

  • corr_delete_mutate ( boolean , optional ( default: False ) ) – Replaces root deletion mutation with a deterministic deletion operator that deletes the feature with highest collinearity.

  • simplify ( float , optional ( default: 0.0 ) ) – Runs post-run simplification to try to shrink the final model without changing its output by more than the simplify tolerance. This tolerance is the norm of the difference in outputs, divided by the norm of the output. If simplify=0, simplification is skipped.

  • protected_groups ( list , optional ( default: [ ] ) ) – Defines protected attributes in the data. Used for adding fairness constraints.

  • tune_initial ( boolean , optional ( default: False ) ) – Tune the initial linear model’s penalization parameter.

  • tune_final ( boolean , optional ( default: True ) ) – Tune the final linear model’s penalization parameter.

  • starting_pop ( str , optional ( default: "" ) ) – Provide a starting population in JSON format.
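The descriptions of split and simplify above reduce to short arithmetic. The sketch below works through both in plain Python. The helper names validation_fraction and relative_output_change are hypothetical and are not part of Feat; the code follows only the wording of the parameter descriptions, not Feat's internal implementation. It assumes the Euclidean norm and assumes that "the norm of the output" refers to the output of the unsimplified model.

```python
import math


def validation_fraction(split):
    """Fraction of the training data left for the validation fold (1 - split)."""
    return 1.0 - split


def relative_output_change(original, simplified):
    """Norm of the output difference divided by the norm of the original output.

    Assumption: Euclidean norm, with the unsimplified output as the denominator.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, simplified)))
    base = math.sqrt(sum(a ** 2 for a in original))
    return diff / base


# With the default split=0.75, a quarter of the data is held out.
print(validation_fraction(0.75))

# Made-up outputs of a model before and after a candidate simplification.
original_output = [3.0, 4.0]
simplified_output = [3.0, 4.5]
change = relative_output_change(original_output, simplified_output)
print(change)

# Under the description above, a simplification would be acceptable
# only if this change stays within the simplify tolerance.
print(change <= 0.2)
```

Under these assumptions, a larger simplify value tolerates a larger relative change in the model output in exchange for a smaller final model.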

fit(X, y, zfile=None, zids=None)

Fit a model.
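A minimal usage sketch of constructing an estimator and calling fit, assuming the package is importable as feat (as the class path feat.Feat suggests) and that fit accepts NumPy arrays. The toy data and the chosen parameter values are illustrative only. The import is guarded so the snippet still runs where feat is not installed.

```python
# Toy regression data: y = 2*x0 - x1, built as plain lists.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * a - b for a, b in X]

# Keyword arguments taken from the constructor signature; values are arbitrary.
params = dict(pop_size=50, gens=10, max_depth=3, max_dim=5,
              random_state=42, verbosity=0)

try:
    import numpy as np
    from feat import Feat  # assumption: package is importable as `feat`
except ImportError:
    Feat = None  # feat (or numpy) is unavailable; skip the fit below

if Feat is not None:
    est = Feat(**params)
    est.fit(np.asarray(X), np.asarray(y))
    print(est.predict(np.asarray(X))[:5])
else:
    print("feat is not installed; skipping the fit")
```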

fit_predict(X, y)

Convenience method that runs fit(X,y) then predict(X)

fit_transform(X, y)

Convenience method that runs fit(X,y) then transform(X)
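To make the composition explicit, here is a small sketch with a stand-in estimator. MeanCenterer is hypothetical and is not Feat; it only shows that fit_predict and fit_transform, as described, amount to calling fit and then predict or transform on the same data.

```python
class MeanCenterer:
    """Hypothetical stand-in estimator (not Feat) with fit/predict/transform."""

    def fit(self, X, y):
        n_cols = len(X[0])
        # Remember per-column means of X and the mean of y.
        self.col_means_ = [sum(row[j] for row in X) / len(X) for j in range(n_cols)]
        self.y_mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean of y for every row.
        return [self.y_mean_ for _ in X]

    def transform(self, X):
        # Subtract the stored column means from each row.
        return [[v - m for v, m in zip(row, self.col_means_)] for row in X]

    def fit_predict(self, X, y):
        self.fit(X, y)
        return self.predict(X)

    def fit_transform(self, X, y):
        self.fit(X, y)
        return self.transform(X)


X = [[1.0, 2.0], [3.0, 6.0]]
y = [10.0, 20.0]

one_call = MeanCenterer().fit_predict(X, y)
two_calls = MeanCenterer().fit(X, y).predict(X)
print(one_call == two_calls)

centered = MeanCenterer().fit_transform(X, y)
print(centered)
```

Both convenience methods return results for the same X that was used for fitting, so the values they return describe the training data rather than held-out data.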

get_params(deep=False, static_params=False)

Get parameters for this estimator.

Parameters

deep ( bool , default=False ) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

load(filename)

Load a saved Feat state from file.

predict(X, zfile=None, zids=None)

Predict on X.

predict_archive(X, zfile=None, zids=None)

Returns a list of dictionaries holding the predictions of all models.

save(filename)

Save a Feat state to file.

score(X, y, zfile=None, zids=None)

Returns a score for the predictions of Feat on X versus the true labels y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params ( dict ) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
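The <component>__<parameter> naming used by set_params can be illustrated without any library. The sketch below splits a key at its first double underscore. The helper split_param_key and the component names model and pipe are hypothetical; this shows the naming convention only and is not the code that set_params runs.

```python
def split_param_key(key):
    """Split '<component>__<parameter>' at the first double underscore.

    Returns (None, key) for a plain parameter that belongs to the
    estimator itself.
    """
    component, delim, parameter = key.partition("__")
    if not delim:
        return None, key
    return component, parameter


# A single underscore is part of the parameter name, not a separator.
print(split_param_key("max_depth"))

# One level of nesting: parameter max_depth of component "model".
print(split_param_key("model__max_depth"))

# Deeper nesting: the remainder is handed on to the component.
print(split_param_key("pipe__model__gens"))
```

Splitting only at the first separator lets each component resolve the remainder of the key itself, which is how one key can address a parameter several levels down.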

transform(X, zfile=None, zids=None)

Return the representation’s transformation of X