How Do You Visualize Categorical Variables In Python?

Asked 12 months ago
Answer 1
Viewed 141

This situation arises in classification problems. Here the target variable is categorical, while the predictors can be either continuous or categorical. When the predictor is also categorical, you use a grouped bar chart to visualize the relationship between the two variables.

Consider the example below, where the target variable is "APPROVE_LOAN". One of the predictors is "GENDER", so to understand whether gender has an effect on loan approval, you plot a grouped bar chart.

import pandas as pd
ColumnNames=['CIBIL','AGE','GENDER' ,'SALARY', 'APPROVE_LOAN']
DataValues=[ [480, 28, 'M', 610000, 'Yes'],
             [480, 42, 'M',140000, 'No'],
             [480, 29, 'F',420000, 'No'],
             [490, 30, 'M',420000, 'No'],
             [500, 27, 'M',420000, 'No'],
             [510, 34, 'F',190000, 'No'],
             [550, 24, 'M',330000, 'Yes'],
             [560, 34, 'M',160000, 'Yes'],
             [560, 25, 'F',300000, 'Yes'],
             [570, 34, 'M',450000, 'Yes'],
             [590, 30, 'F',140000, 'Yes'],
             [600, 33, 'M',600000, 'Yes'],
             [600, 22, 'M',400000, 'Yes'],
             [600, 25, 'F',490000, 'Yes'],
             [610, 32, 'M',120000, 'Yes'],
             [630, 29, 'F',360000, 'Yes'],
             [630, 30, 'M',480000, 'Yes'],
             [660, 29, 'F',460000, 'Yes'],
             [700, 32, 'M',470000, 'Yes'],
             [740, 28, 'M',400000, 'Yes']]
 
#Create the Data Frame
LoanData=pd.DataFrame(data=DataValues,columns=ColumnNames)
print(LoanData.head())
#################################################
# Cross tabulation between GENDER and APPROVE_LOAN
CrosstabResult=pd.crosstab(index=LoanData['GENDER'],columns=LoanData['APPROVE_LOAN'])
print(CrosstabResult)
 
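# Grouped bar chart between GENDER and APPROVE_LOAN
# In a Jupyter notebook, run "%matplotlib inline" on its own line first;
# a trailing comment on the magic line is parsed as an argument and fails.
import matplotlib.pyplot as plt
CrosstabResult.plot.bar()
plt.show()  # needed when running as a plain script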

Sample Output

If the bars for category "M" show a similar pattern to the bars for category "F", then you can say that GENDER and APPROVE_LOAN are NOT related.

The reason is simple. If the bars are similar, then changing the gender does not tell us whether loans are approved more or less often: the ratio of approvals to rejections is about the same for both genders.

If the grouped bars have different relative lengths for each category, the variables are related to each other.
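Judging "similar" by eye is harder when the groups differ in size, so it can help to compare row proportions instead of raw counts. The sketch below is one way to do that, using the `normalize` argument of `pd.crosstab`. The names `records`, `loan_data` and `approval_share` are illustrative, and the rows are rebuilt from the gender and approval counts of the first example rather than copied from its full table.

```python
import pandas as pd

# Gender and loan decision for the 20 applicants in the first example
# (13 male, 7 female), rebuilt from the counts in its cross tabulation.
records = (
    [("M", "Yes")] * 10 + [("M", "No")] * 3
    + [("F", "Yes")] * 5 + [("F", "No")] * 2
)
loan_data = pd.DataFrame(records, columns=["GENDER", "APPROVE_LOAN"])

# normalize="index" divides each row by its total, so every gender sums to 1
approval_share = pd.crosstab(
    index=loan_data["GENDER"],
    columns=loan_data["APPROVE_LOAN"],
    normalize="index",
)
print(approval_share.round(2))
```

With these counts, roughly 71% of female applicants and 77% of male applicants are approved, which matches the reading that the two sets of bars look alike.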

Correlated variables example

Consider another version of the same data, shown below, where the ratio of approved to rejected loans differs between categories "M" and "F". You can therefore say that changing the gender affects loan approval, so there is a relationship between these two variables.


# Creating a sample data frame
import pandas as pd
ColumnNames=['CIBIL','AGE','GENDER' ,'SALARY', 'APPROVE_LOAN']
DataValues=[ [480, 28, 'M', 610000, 'Yes'],
             [480, 42, 'M',140000, 'No'],
             [480, 29, 'M',420000, 'No'],
             [490, 30, 'M',420000, 'No'],
             [500, 27, 'M',420000, 'No'],
             [510, 34, 'F',190000, 'No'],
             [550, 24, 'M',330000, 'Yes'],
             [560, 34, 'M',160000, 'No'],
             [560, 25, 'F',300000, 'Yes'],
             [570, 34, 'M',450000, 'Yes'],
             [590, 30, 'F',140000, 'Yes'],
             [600, 33, 'F',600000, 'Yes'],
             [600, 22, 'M',400000, 'No'],
             [600, 25, 'F',490000, 'Yes'],
             [610, 32, 'F',120000, 'Yes'],
             [630, 29, 'F',360000, 'Yes'],
             [630, 30, 'F',480000, 'Yes'],
             [660, 29, 'F',460000, 'Yes'],
             [700, 32, 'M',470000, 'Yes'],
             [740, 28, 'M',400000, 'Yes']]
 
#Create the Data Frame
LoanData=pd.DataFrame(data=DataValues,columns=ColumnNames)
print(LoanData.head())
#########################################################
# Cross tabulation between GENDER and APPROVE_LOAN
CrosstabResult=pd.crosstab(index=LoanData['GENDER'],columns=LoanData['APPROVE_LOAN'])
print(CrosstabResult)
 
# Grouped bar chart between GENDER and APPROVE_LOAN
CrosstabResult.plot.bar(figsize=(7,4), rot=0)

Sample Output

Now you can see the difference in the ratios. Your loan is very likely to be approved if you are female (8 of 9 applications approved), whereas if you are male the chances are roughly 50/50 (5 of 11 approved). Gender affects the approval rate, so gender and loan approval are related in this data.
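Because this data has 9 female and 11 male applicants, a chart of proportions can make the gap easier to read than a chart of counts. The following is a minimal sketch, not part of the original answer: it starts from the counts in the second example's cross tabulation, converts them to row shares, and draws one stacked bar per gender. The names `counts` and `shares` are illustrative, and the `Agg` backend line is only an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Counts from the second example's cross tabulation (rows: GENDER)
counts = pd.DataFrame(
    {"No": [1, 6], "Yes": [8, 5]},
    index=pd.Index(["F", "M"], name="GENDER"),
)

# Convert counts to row proportions so groups of different size are comparable
shares = counts.div(counts.sum(axis=0 + 1), axis=0)

# Stacked bars: each bar has total height 1, split into "No" and "Yes" shares
ax = shares.plot.bar(stacked=True, figsize=(7, 4), rot=0)
ax.set_ylabel("Share of applicants")
print(shares.round(2))
```

In a notebook the chart renders inline. In a script, save it with `ax.figure.savefig(...)` or switch to an interactive backend to view it.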

Answered 12 months ago by Wolski Kala