Possible Data Leakage Fields

Be careful about which fields you use when training your models. The fields listed below need particular attention.

In statistics and machine learning, leakage (also known as data leakage or target leakage) is the use of information in the model training process that would not be expected to be available at prediction time, causing the predictive scores (metrics) to overestimate the model's utility when run in a production environment.
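As a rough illustration of the definition above, the sketch below measures how completely a single field determines the outcome in historical data. The function name `outcome_purity`, the `Won` target and the sample records are hypothetical and chosen only for this example. A score close to 1.0 on a low-cardinality field is a hint that the field may be filled in after the outcome is known. Note that a unique identifier would also score 1.0, so treat the score as a prompt for review rather than proof of leakage.

```python
from collections import Counter, defaultdict


def outcome_purity(records, field, target):
    """Fraction of records whose outcome equals the most common
    outcome among records sharing the same value of `field`."""
    if not records:
        raise ValueError("records must not be empty")
    outcomes_by_value = defaultdict(Counter)
    for record in records:
        outcomes_by_value[record[field]][record[target]] += 1
    # For each field value, count the records that match its majority outcome.
    majority_total = sum(
        counter.most_common(1)[0][1] for counter in outcomes_by_value.values()
    )
    return majority_total / len(records)


# Hypothetical closed opportunities; "Won" is the training target.
records = [
    {"Stage": "Closed Won", "Lead Source": "Web", "Won": True},
    {"Stage": "Closed Won", "Lead Source": "Referral", "Won": True},
    {"Stage": "Closed Lost", "Lead Source": "Web", "Won": False},
    {"Stage": "Closed Lost", "Lead Source": "Referral", "Won": False},
]

print(outcome_purity(records, "Stage", "Won"))        # 1.0
print(outcome_purity(records, "Lead Source", "Won"))  # 0.5
```

Here Stage reproduces the outcome exactly, which is what makes it unusable at prediction time, when the opportunity is still open.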

| Object | Field | Comment |
| --- | --- | --- |
| Opportunity | Stage | Do not train on this field, as it is typically "Closed Won" or "Closed Lost" |
| Opportunity | Forecast Category | Do not train on this field, as it is typically "Closed Won" or "Closed Lost" |
| Opportunity | Probability | Do not train on this field, as it will be either 100% or 0% once the opportunity is closed |
| Opportunity | Expected Revenue | Do not train on this field, as it will be 0 when the opportunity is Closed Lost |
| Opportunity | Record Type | This is very often used to lock a record once it is won |
| Opportunity | Last Modified By | This can often be limited to a few people in a sales ops or finance team who modify the opportunity after it is closed |
| Opportunity | Last Modified Date | Opportunities are often modified after they are closed |
| Opportunity | Amount | Check that this is not set to 0 when the opportunity is Closed Lost |
| Account | Record Type | Can often be Customer or Prospect |
| Account | Type | Can often be Customer or Prospect |
| Account | Last Modified By | This can often be limited to a few people in a sales ops or finance team who modify the account after an opportunity is closed, e.g. changing Type from Prospect to Customer |
| Account | Last Modified Date | Accounts are often modified after an Opportunity is closed |
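One way to apply the list above is to sort candidate training columns into three groups before training: fields to exclude, fields to review, and fields to keep. The sketch below is a minimal illustration, not a built-in feature. The `Object.Field` column naming, the set names and `split_features` are assumptions made for this example, so adjust them to match how your data is actually exported.

```python
# Fields marked "Do not train" in the list above.
# The "Object.Field" naming is a hypothetical convention for this example.
EXCLUDE_FIELDS = {
    "Opportunity.Stage",
    "Opportunity.Forecast Category",
    "Opportunity.Probability",
    "Opportunity.Expected Revenue",
}

# Fields that are only risky in some setups and should be checked by hand.
REVIEW_FIELDS = {
    "Opportunity.Record Type",
    "Opportunity.Last Modified By",
    "Opportunity.Last Modified Date",
    "Opportunity.Amount",
    "Account.Record Type",
    "Account.Type",
    "Account.Last Modified By",
    "Account.Last Modified Date",
}


def split_features(columns):
    """Return (kept, excluded, needs_review) lists, preserving input order."""
    kept, excluded, needs_review = [], [], []
    for column in columns:
        if column in EXCLUDE_FIELDS:
            excluded.append(column)
        elif column in REVIEW_FIELDS:
            needs_review.append(column)
        else:
            kept.append(column)
    return kept, excluded, needs_review


# Hypothetical candidate columns for an opportunity-scoring model.
columns = [
    "Opportunity.Stage",
    "Opportunity.Amount",
    "Opportunity.Lead Source",
    "Account.Type",
]
kept, excluded, needs_review = split_features(columns)
print(kept)          # ['Opportunity.Lead Source']
print(excluded)      # ['Opportunity.Stage']
print(needs_review)  # ['Opportunity.Amount', 'Account.Type']
```

Keeping the review group separate from the exclude group mirrors the comments above: some fields should never be trained on, while others, such as Amount, are only a problem if your process changes them after the opportunity closes.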
