
            Python Data Analysis Techniques: Filtering, Grouping, and Pivot Tables with pandas (Part 2)

            Published: 2021-11-23  Views: 560

            Related: Python Data Analysis Techniques: Extracting, Cleaning, and Sorting with pandas (Part 1)

            Key points:

            • String processing
            • Filtering
            • Grouping
            • Pivot tables

            Loading the data

            # -*- coding: utf-8 -*-
            # @File    : 數據集的處理.py
            # @Date    : 2018-06-03

            import pandas as pd

            file = "data/train.csv"
            df = pd.read_csv(file)  # read_csv already returns a DataFrame
            print(df.head(3))
            """
               PassengerId  Survived  Pclass    ...        Fare Cabin  Embarked
            0            1         0       3    ...      7.2500   NaN         S
            1            2         1       1    ...     71.2833   C85         C
            2            3         1       3    ...      7.9250   NaN         S

            [3 rows x 12 columns]
            """
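For readers without `data/train.csv` at hand, the loading step can be tried on an in-memory stand-in. This is a minimal sketch: the six columns and three rows are copied from the printed output above, while `csv_text`, `sample`, and the other variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/train.csv: only the columns visible in the
# printed output above, with the same three rows.
csv_text = """PassengerId,Survived,Pclass,Fare,Cabin,Embarked
1,0,3,7.2500,,S
2,1,1,71.2833,C85,C
3,1,3,7.9250,,S
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
sample = pd.read_csv(io.StringIO(csv_text))

is_frame = isinstance(sample, pd.DataFrame)   # read_csv returns a DataFrame
shape = sample.shape                          # (rows, columns)
missing_cabins = int(sample["Cabin"].isna().sum())  # empty fields become NaN

print(sample)
print(shape, missing_cabins)
```

Empty CSV fields are read as `NaN`, which is why `Cabin` shows `NaN` for the first and third passengers.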

            1. String processing

            # Rename the columns (here to Chinese labels)
            mapping = {
                'PassengerId': '乘客編號',  # passenger ID
                'Survived': '是否獲救',     # survived or not
                'Name': '姓名',             # name
                'Pclass': '船艙等級',       # cabin class
                'Sex': '性別',              # sex
                'Age': '年齡',              # age
                'SibSp': '兄弟姐妹數',      # number of siblings
                'Parch': '父母小孩數',      # number of parents/children
                'Ticket': '船票',           # ticket
                'Fare': '船票費',           # fare
            }

            ret = df.rename(columns=mapping)
            print(ret.head(3))
            """
               乘客編號  是否獲救  船艙等級    ...         船票費 Cabin  Embarked
            0     1     0     3    ...      7.2500   NaN         S
            1     2     1     1    ...     71.2833   C85         C
            2     3     1     3    ...      7.9250   NaN         S

            [3 rows x 12 columns]
            """

            # Replace specific strings in a column ('女' = female, '男' = male)
            ret = df['Sex'].map({'female': '女', 'male': '男'})
            print(ret.head(3))
            """
            0    男
            1    女
            2    女
            Name: Sex, dtype: object
            """

            # Replace characters in a column with a regular expression:
            # every C or S in Embarked becomes 'xxx'
            # Related string methods: contains, split, match, findall, endswith
            df['Embarked'] = df['Embarked'].replace(regex='[CS]', value='xxx')
            print(df.head(3))
            """
               PassengerId  Survived  Pclass    ...        Fare Cabin  Embarked
            0            1         0       3    ...      7.2500   NaN       xxx
            1            2         1       1    ...     71.2833   C85       xxx
            2            3         1       3    ...      7.9250   NaN       xxx

            [3 rows x 12 columns]
            """
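The code above names `contains`, `split`, `match`, `findall`, and `endswith` without showing them. The sketch below runs each one through the pandas `.str` accessor. The `toy` frame is a hypothetical three-row miniature, not the real dataset, and the patterns are chosen only for illustration.

```python
import pandas as pd

# Hypothetical miniature with Titanic-style column names.
toy = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Allen, Mr. William Henry",
    ],
    "Cabin": ["C85", "C123", "E46"],
})

# contains: literal substring test (regex=False disables pattern syntax)
is_mr = toy["Name"].str.contains("Mr.", regex=False)

# split: cut at the first comma and keep the part before it (the surname)
surname = toy["Name"].str.split(",", n=1).str[0]

# match: True when the pattern matches at the start of the string
starts_with_c = toy["Cabin"].str.match(r"C\d+")

# findall: list of every run of digits in each value
cabin_digits = toy["Cabin"].str.findall(r"\d+")

# endswith: literal suffix test
ends_with_henry = toy["Name"].str.endswith("Henry")

print(pd.DataFrame({
    "is_mr": is_mr,
    "surname": surname,
    "starts_with_c": starts_with_c,
    "ends_with_henry": ends_with_henry,
}))
```

All five return a Series aligned with the original index, so the results can be assigned back as new columns or used directly as boolean masks.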

            2. Filtering

            # Filter by combining boolean expressions (==, !=, >, ...) with & and |
            ret = df[(df['Sex'] == 'female') & (df['Age'] > 10)]
            print(ret.head(3))
            """
               PassengerId  Survived  Pclass    ...        Fare Cabin  Embarked
            1            2         1       1    ...     71.2833   C85       xxx
            2            3         1       3    ...      7.9250   NaN       xxx
            3            4         1       1    ...     53.1000  C123       xxx

            [3 rows x 12 columns]
            """

            # query(): comparing a column to a list with == keeps rows whose value is in the list
            ret = df.query('Age==[10, 20]')
            print(ret[["Name", "Age"]].head(3))
            """
                                           Name   Age
            12   Saundercock, Mr. William Henry  20.0
            91       Andreasson, Mr. Paul Edvin  20.0
            113         Jussila, Miss. Katriina  20.0
            """
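The two filtering styles above are interchangeable. As a rough sketch on a hypothetical four-row frame (`toy` and its values are invented for illustration), `query` with a list on the right of `==` selects the same rows as `isin`, and a `query` string with `and` selects the same rows as a boolean mask built with `&`.

```python
import pandas as pd

# Hypothetical data; None in Age becomes NaN, like a missing age in the dataset.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Sex": ["female", "male", "female", "male"],
    "Age": [10.0, 20.0, 35.0, None],
})

# List membership: query syntax versus isin
by_query = toy.query("Age == [10, 20]")
by_isin = toy[toy["Age"].isin([10, 20])]

# Combined conditions: boolean mask versus query string
mask_result = toy[(toy["Sex"] == "female") & (toy["Age"] > 10)]
query_result = toy.query("Sex == 'female' and Age > 10")

print(by_query)
print(mask_result)
```

Rows with a missing `Age` drop out of both filters, because comparisons against `NaN` evaluate to `False`.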

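            3. Classifying the data

            # np.where: with only a condition, it returns the positions where the condition is True
            import numpy as np
            ret = np.where(df['Age'] >= 18)

            # apply: map each age to a label ('小孩' = child, '大人' = adult, '老人' = senior)
            # Note: a missing age (NaN) fails both comparisons and falls through to the last branch.
            def convert_age(age):
                if age > 0 and age < 10:
                    return "小孩"
                elif age < 30:
                    return "大人"
                else:
                    return "老人"

            df["年齡分類"] = df['Age'].apply(convert_age)  # new column: age category
            print(df[["Name", "Age", "年齡分類"]].sample(3))
            """
                                                         Name   Age 年齡分類
            624                   Bowen, Mr. David John "Dai"  21.0   大人
            880  Shelley, Mrs. William (Imanita Parrish Hall)  25.0   大人
            471                               Cacic, Mr. Luka  38.0   老人
            """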
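Two variations on the classification step may be worth knowing. The three-argument form of `np.where` returns labels instead of positions, and `pd.cut` bins ages without a Python-level function while leaving missing ages as `NaN`. This is a sketch, not the article's method: the English labels, the sample ages, and the bin edges (chosen to approximate `convert_age`) are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including boundary values and one missing value.
ages = pd.Series([0.42, 9.0, 10.0, 29.0, 30.0, 80.0, np.nan], name="Age")

# np.where(condition, value_if_true, value_if_false) returns one label per row.
# A NaN age compares as False, so it is labelled "minor" here.
is_adult = np.where(ages >= 18, "adult", "minor")

# pd.cut with right=False uses half-open bins: [0, 10), [10, 30), [30, inf).
# Unlike the apply-based function, a NaN age stays NaN instead of getting a label.
age_group = pd.cut(
    ages,
    bins=[0, 10, 30, np.inf],
    right=False,
    labels=["child", "adult", "senior"],
)

print(pd.DataFrame({"Age": ages, "is_adult": is_adult, "age_group": age_group}))
```

Whether a missing age should be left unlabelled or given a category of its own depends on the analysis. `pd.cut` at least makes the gap visible.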

            4. Slicing and pivot tables

            # groupby: count rows per group
            print(df.groupby('Sex')[['Name', 'Sex']].count())
            """
                    Name  Sex
            Sex
            female   314  314
            male     577  577
            """

            # Slice the data along two keys and aggregate Age within each slice
            ret = df.groupby(['Survived', 'Pclass'])['Age'].agg(['size', 'max', 'min', 'mean'])
            print(ret)
            """
                             size   max    min       mean
            Survived Pclass
            0        1         80  71.0   2.00  43.695312
                     2         97  70.0  16.00  33.544444
                     3        372  74.0   1.00  26.555556
            1        1        136  80.0   0.92  35.368197
                     2         87  62.0   0.67  25.901566
                     3        119  63.0   0.42  20.646118
            """

            # Pivot table (string aggregation names avoid deprecation warnings in recent pandas)
            ret = df.pivot_table(columns=['Sex'], index=['Survived', 'Pclass'], values='Age',
                                 aggfunc={'Age': ['mean', 'min', 'max']})
            print(ret)
            """
                               max             mean               min
            Sex             female  male     female       male female   male
            Survived Pclass
            0        1        50.0  71.0  25.666667  44.581967   2.00  18.00
                     2        57.0  70.0  36.000000  33.369048  24.00  16.00
                     3        48.0  74.0  23.818182  27.255814   2.00   1.00
            1        1        63.0  80.0  34.939024  36.248000  14.00   0.92
                     2        55.0  62.0  28.080882  16.022000   2.00   0.67
                     3        63.0  45.0  19.329787  22.274211   0.75   0.42
            """
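A pivot table can be read as a `groupby` whose last grouping key has been moved into the columns. The sketch below checks this on a hypothetical six-row frame (`toy` and its values are invented for illustration) by building the same table of mean ages both ways.

```python
import pandas as pd

# Hypothetical data with the same column names as the Titanic set.
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Age": [22.0, 30.0, 40.0, 18.0, 50.0, 26.0],
})

# Route 1: pivot_table with Survived as rows and Sex as columns
pivot = toy.pivot_table(index="Survived", columns="Sex", values="Age", aggfunc="mean")

# Route 2: group by both keys, aggregate, then move Sex from the index to the columns
grouped = toy.groupby(["Survived", "Sex"])["Age"].mean().unstack("Sex")

print(pivot)
print(grouped)
```

The `groupby` route keeps the long, one-row-per-group form available for further processing, while `pivot_table` goes straight to the wide layout that is easier to read.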