import pandas as pd
import numpy as np
DataFrame = df
a specific column among the features = feature
feature percent (pie chart)
import matplotlib.pyplot as plt
df.feature.value_counts().plot(kind='pie', autopct='%1.1f%%') # autopct prints each slice's percentage
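To check the numbers behind the chart without plotting, here is a minimal sketch on a made-up DataFrame. The toy `feature` values are hypothetical and only stand in for the real df. It prints the same percentages the pie would show and applies the same format string passed to autopct.

```python
import pandas as pd

# Hypothetical toy data standing in for df (not from the original post)
df = pd.DataFrame({"feature": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c"]})

# normalize=True returns fractions instead of counts; scale to percent
percent = df["feature"].value_counts(normalize=True).mul(100).round(1)

# '%1.1f%%' is the format string given to autopct: one decimal place plus a literal %
labels = ['%1.1f%%' % v for v in percent]

print(percent)
print(labels)
```

value_counts sorts by count in descending order, so the labels follow the largest slice first.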
pandas_profiling (profile report)
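# note: newer releases of this package are published as ydata-profiling
!pip install -U pandas-profiling
from pandas_profiling import ProfileReport  # importing also registers DataFrame.profile_report()
df.profile_report()  # build the profile report for df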
chi-square test (checks whether two variables are associated)
from scipy.stats import chi2_contingency
# chi2: The test statistic
# p: The p-value of the test
# dof: Degrees of freedom
# expected: The expected frequencies, based on the marginal sums of the table
contingency = pd.crosstab(df['feature'], df['target'])
chi2, p, dof, expected = chi2_contingency(contingency)
p # a p-value close to 0 (e.g. below 0.05) suggests the two variables are associated
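To make the four return values concrete, here is a rough sketch that recomputes them by hand with plain Python. The 2x3 table is made up for illustration, and the shortcut used for the p-value only holds because this table has exactly 2 degrees of freedom. It is meant as a reading aid, not a replacement for chi2_contingency.

```python
import math

# Hypothetical 2x3 contingency table (rows: feature levels, columns: target levels)
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# For dof == 2 the chi-square survival function reduces to exp(-x / 2)
p = math.exp(-chi2 / 2)

print(expected, round(chi2, 3), dof, round(p, 4))
```

For a 2x2 table (dof = 1), chi2_contingency applies Yates' continuity correction by default, so its statistic will differ from this uncorrected formula unless correction=False is passed.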