As mentioned earlier, numerical and categorical values differ: a categorical value has no clearly measurable relationship with the other values in its column. What hasn't been addressed yet is how a categorical column gets converted into a numeric representation that math can work with. There are several ways to give a category a mathematical representation, but the most common is called one hot encoding.
One hot encoding pivots the categorical column into n columns, where n is the number of unique values in the column. In each row, the column that matches the row's value is assigned a one, and the other generated columns are assigned a zero. CATEGORICALENCODING1 shows how `MarketingSource` from DATAPREVIEW1 would be one hot encoded. Note that a blank `MarketingSource` is treated as its own value and gets its own column.
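| PersonID | MarketingSource | (blank) | Google Paid Search | Organic Search | Customer Referral |
| --- | --- | --- | --- | --- | --- |
| Person_1 | | 1 | 0 | 0 | 0 |
| Person_2 | | 1 | 0 | 0 | 0 |
| Person_3 | | 1 | 0 | 0 | 0 |
| Person_4 | Google Paid Search | 0 | 1 | 0 | 0 |
| Person_5 | Google Paid Search | 0 | 1 | 0 | 0 |
| Person_6 | Organic Search | 0 | 0 | 1 | 0 |
| Person_7 | Organic Search | 0 | 0 | 1 | 0 |
| Person_8 | Customer Referral | 0 | 0 | 0 | 1 |
| Person_9 | Customer Referral | 0 | 0 | 0 | 1 |
| Person_10 | | 1 | 0 | 0 | 0 |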
CATEGORICALENCODING1
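The encoding in CATEGORICALENCODING1 can be sketched in a few lines of plain Python. This is only an illustration, not how any particular product implements it: the `one_hot_encode` helper is a hypothetical name, and an empty string is assumed to stand in for a blank `MarketingSource`.

```python
# Rows copied from CATEGORICALENCODING1; "" is assumed to represent a blank value.
rows = [
    ("Person_1", ""),
    ("Person_2", ""),
    ("Person_3", ""),
    ("Person_4", "Google Paid Search"),
    ("Person_5", "Google Paid Search"),
    ("Person_6", "Organic Search"),
    ("Person_7", "Organic Search"),
    ("Person_8", "Customer Referral"),
    ("Person_9", "Customer Referral"),
    ("Person_10", ""),
]


def one_hot_encode(values):
    """Hypothetical helper: return (categories, encoded_rows) for a list of values."""
    # Unique values in order of first appearance; one generated column per value.
    categories = list(dict.fromkeys(values))
    # For each row, put a 1 in the matching column and a 0 in every other column.
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


sources = [source for _, source in rows]
categories, encoded = one_hot_encode(sources)

print(["PersonID"] + [category or "(blank)" for category in categories])
for (person_id, _), flags in zip(rows, encoded):
    print([person_id] + flags)
```

Each printed row contains exactly one 1, in the column for that person's `MarketingSource`. In practice a library routine, such as `get_dummies` in pandas, is commonly used instead of a hand-written helper.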
Categorical encoding allows the math to evaluate each unique value independently of the others, unlike a numerical value, which is evaluated relative to the other values in the column, whether those values are unique or not.
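One way to see this independence is to compare distances. The sketch below is an illustration under stated assumptions: the integer labels (0, 1, 2) are made up purely for contrast and are not part of the encoding described here, and `euclidean` is a hypothetical helper. With integer labels, some categories look "closer" than others; with one hot vectors, every pair of categories is the same distance apart.

```python
import math
from itertools import combinations

categories = ["Google Paid Search", "Organic Search", "Customer Referral"]

# Assumption for contrast only: arbitrary integer labels treated as numerical values.
label_codes = {category: index for index, category in enumerate(categories)}

# One hot vectors: one position per category, a single 1 in the matching position.
one_hot = {
    category: [1 if i == index else 0 for i in range(len(categories))]
    for index, category in enumerate(categories)
}


def euclidean(a, b):
    """Hypothetical helper: straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


label_gaps = {}
one_hot_gaps = {}
for first, second in combinations(categories, 2):
    # Integer labels imply a spacing that depends on the arbitrary numbering.
    label_gaps[(first, second)] = abs(label_codes[first] - label_codes[second])
    # One hot vectors keep every pair of categories equally far apart.
    one_hot_gaps[(first, second)] = euclidean(one_hot[first], one_hot[second])

for pair in label_gaps:
    print(pair, "label gap:", label_gaps[pair], "one hot gap:", round(one_hot_gaps[pair], 3))
```

The label gaps differ from pair to pair, which would suggest a relationship between categories that does not exist, while the one hot gaps are all equal.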