bin_age

bin_age

Description

Categorize continous data into separate bins. this is a way of turning continous feature into categorical feature.

Signature:
ds.feature_engineering.bin_age(data,
                                feature,
                                bins,
                                labels,
                                fill_missing=None,
                                drop_original=False,
                                )
Docstring:
Categorize age data into separate bins

Parameter:
-----------------------------------------
data: DataFrame, Series.

    Data for which feature to be binned exist.

feature: List, Series

    Columns to be binned

Bins: List, numpy.ndarray

    Specifies the different categories. Bins must be one greater labels.

labels: List, Series

    Name identified to the various categories

fill_missing(default = None): int

    mean : feature average.
    mode : most occuring data in the feature.
    median : middle point in the feature.

drop_original: bool

    Drops original feature after beaning.

Returns:
    Returns a binned dataframe.

Examples

>>> import datasist as ds
>>> import pandas as pd
df = pd.DataFrame(data =[list(np.random.randint(5,30,4)),
                         list(np.random.randint(7,30,4)),
                         list(np.random.randint(5,30,4)),
                         list(np.random.randint(5,30,4)),
                         list(np.random.randint(5,30,4)),
                         list(np.random.randint(5,30,4))],
             columns = ['A','B','C','D'])
>>> df

       A    B    C    D
0    27    7    8    17
1    27    25    29    21
2    17    19    24    19
3    28    8    21    12
4    10    25    21    15
5    15    12    19    9


>>> ds.feature_engineering.bin_age(df,['A'],3,['A1','A2','A3'])
    A    B    C    D    A_binned
0    27    7    8    17    A3
1    27    25    29    21    A3
2    17    19    24    19    A2
3    28    8    21    12    A3
4    10    25    21    15    A1
5    15    12    19    9    A1

Setting drop_original: bool to True

>>> ds.feature_engineering.bin_age(df,['A'],3,['A1','A2','A3'],drop_original = True)
B    C    D    A_binned
0    7    8    17    A3
1    25    29    21    A3
2    19    24    19    A2
3    8    21    12    A3
4    25    21    15    A1
5    12    19    9    A1

Last updated