You need to be careful about which fields to use when training your models. Below are some fields that you need to pay particular attention to.
In statistics and machine learning, leakage (also known as data leakage or target leakage) is the use of information in the model training process that would not be expected to be available at prediction time, causing the predictive scores (metrics) to overestimate the model's utility when run in a production environment.
Object | Field | Comment |
Opportunity | Stage | Do not train on this field as it is typically "Closed Won" or "Closed Lost" |
Opportunity | Forecast Category | Do not train on this field as it is typically "Closed Won" or "Closed Lost" |
Opportunity | Probability | Do not train on this field as it will be either 100% or 0% |
Opportunity | Expected Revenue | Do not train on this field as it will be 0 when Closed Lost |
Opportunity | Record Type | Very often this is used to lock a record when it is won |
Opportunity | Last Modified By | This can often be limited to a few people in a sales ops or finance team that modify the opportunity after it is closed |
Opportunity | Last Modified Date | Opportunities are often modified after they are closed |
Opportunity | Amount | Check to make sure this is not put to 0 when Closed Lost |
Account | Record Type | Can often be Customer or Prospect |
Account | Type | Can often be Customer or Prospect |
Account | Last Modified By | This can often be limited to a few people in a sales ops or finance team that modify the account after it is closed, e.g Type from Prospect to Customer |
Account | Last Modified Date | Accounts are often modified after an Opportunity is closed |