Kraken requires a dataset that is mostly ready for machine learning. However, we do apply some basic pre-processing steps to the data before building models.
- A random 70/30 train/test split of the data
- Imputation of nulls
- Encoding categorical features (also known as creating "dummy variables")
- Feature scaling, or normalization
- Handling high correlation between a Driver and the predicted Metric, or between Drivers
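The steps above could be sketched roughly as follows. This is an illustrative, standard-library-only sketch, not Kraken's actual implementation; the function names and imputation/scaling strategies (median imputation, z-score scaling, Pearson correlation) are assumptions for the example:

```python
import random
from statistics import mean, median, stdev

def train_test_split(rows, test_frac=0.30, seed=0):
    """Randomly split rows into (train, test) chunks, 70/30 by default."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def impute_nulls(values):
    """Replace None entries with the median of the observed values."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def one_hot(values):
    """Encode a categorical column as dummy variables (one column per level)."""
    levels = sorted(set(values))
    return [[1 if v == lvl else 0 for lvl in levels] for v in values], levels

def scale(values):
    """Standardize a numeric column to zero mean and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Pearson correlation, used to flag highly correlated Driver pairs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den
```

In a real pipeline these transforms would be fit on the training chunk only and then applied to the test chunk, so that no information leaks from test data into the model.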
Each of these pre-processing steps is governed by thresholds set in our pipeline. We can adjust these thresholds as we learn more about the accuracy of the models Kraken builds.
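For illustration, such thresholds might live in a small config object. The names and values below are hypothetical, not Kraken's actual settings:

```python
# Hypothetical pipeline thresholds; the real names and values are
# internal to Kraken and tuned as model accuracy is evaluated.
PREPROCESS_THRESHOLDS = {
    "test_fraction": 0.30,           # size of the random test chunk
    "max_null_fraction": 0.50,       # drop a Driver with too many nulls
    "max_cardinality": 20,           # skip one-hot encoding above this many levels
    "max_driver_correlation": 0.95,  # flag Driver pairs correlated above this
}

def should_drop_driver(null_fraction, thresholds=PREPROCESS_THRESHOLDS):
    """A Driver with too many nulls is dropped rather than imputed."""
    return null_fraction > thresholds["max_null_fraction"]
```

Keeping the thresholds in one place like this makes them easy to revise without touching the transformation code itself.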