a simple python function running for too long, not stopping #4832

Open
KAMALESH0081 opened this issue Sep 8, 2024 · 1 comment

Comments

KAMALESH0081 commented Sep 8, 2024

This is the code:

```python
# Define your threshold for unique value counts
threshold = 125  # Example: if unique values > threshold, fill with mean, else fill with mode

# Function to fill NaN values based on the threshold
def fill_nulls_with_mean_or_mode(df, threshold):
    for column in df.columns:
        unique_count = df[column].nunique()
        print(f"Processing column: {column}")

        if unique_count > threshold:
            # Fill NaN values with the mean if the unique value count exceeds the threshold
            df[column].fillna(df[column].mean(), inplace=True)
        else:
            # Fill NaN values with the mode if the unique value count is at or below the threshold
            df[column].fillna(df[column].mode()[0], inplace=True)

    return df

fill_nulls_with_mean_or_mode(train2, threshold)
```
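A minimal sketch of a more defensive version of this function, assuming a recent pandas (1.x or 2.x). Two things in the snippet above are worth ruling out (assumptions, not confirmed anywhere in this thread): calling `.mean()` on a high-cardinality string column, which pandas cannot average and which may take a long time on a large column before it raises `TypeError`; and `fillna(..., inplace=True)` on a column selection, which under pandas Copy-on-Write may not update `df` at all. The name `fill_nulls_by_cardinality` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_nulls_by_cardinality(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Return a copy of df with NaNs filled per column (hypothetical helper).

    Numeric columns with more than `threshold` unique values get the mean;
    every other column gets its most frequent value (mode).
    """
    out = df.copy()
    for column in out.columns:
        series = out[column]
        if not series.isna().any():
            continue  # nothing to fill, so skip computing any statistic
        if pd.api.types.is_numeric_dtype(series) and series.nunique() > threshold:
            fill_value = series.mean()
        else:
            modes = series.mode()  # NaN/None are excluded by default
            if modes.empty:
                continue  # all-NaN column: there is no mode to fill with
            fill_value = modes.iloc[0]
        # Assign back instead of calling fillna(inplace=True) on a column
        # selection, which is unreliable under pandas Copy-on-Write.
        out[column] = series.fillna(fill_value)
    return out
```

Restricting the mean to numeric dtypes keeps string columns on the mode path regardless of how many unique values they have, and returning a copy leaves the caller's frame untouched.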

@cperry-goog

We can't investigate without the data; our guess is that train2 is really large. If you can share a minimal reproducible notebook and data, we can take a look, but this doesn't seem Colab-specific.
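One way to produce a minimal reproducible example without sharing the real data is sketched below: build a synthetic frame and time the statistic the original loop would compute for each column. Everything here is a hypothetical illustration: `make_synthetic_frame`, `profile_fill`, the column names, the 10% NaN rate, and the 50-category label column are assumptions, not details from the thread.

```python
import time

import numpy as np
import pandas as pd


def make_synthetic_frame(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for train2: one numeric and one string column, ~10% NaN each."""
    rng = np.random.default_rng(seed)
    numeric = rng.normal(size=n_rows)
    numeric[rng.random(n_rows) < 0.1] = np.nan
    labels = rng.integers(0, 50, size=n_rows).astype(str).astype(object)
    labels[rng.random(n_rows) < 0.1] = None
    return pd.DataFrame({"numeric": numeric, "label": labels})


def profile_fill(df: pd.DataFrame, threshold: int) -> dict:
    """Time nunique() plus the mean/mode the original loop would compute, per column."""
    timings = {}
    for column in df.columns:
        series = df[column]
        start = time.perf_counter()
        try:
            if series.nunique() > threshold:
                series.mean()
            else:
                series.mode()
            status = "ok"
        except TypeError:  # e.g. mean() on a column of strings
            status = "TypeError"
        timings[column] = (time.perf_counter() - start, status)
    return timings
```

Calling `profile_fill` on `train2`, or on a sample of it, would show which column dominates the runtime, and the synthetic frame can then be shaped to match that column's dtype and cardinality for a shareable reproduction.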
