Investigate use of Sparse Matrices for Tabulated Data Storage #38

Open
michaelweinold opened this issue Jun 19, 2024 · 1 comment

@michaelweinold
Member

Since @dodedic's CSV files are currently ~40 MB in size, you might want to look into scipy.sparse.save_npz:

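import pandas as pd
from scipy.sparse import csr_matrix, save_npz

# Step 1: Read the CSV; the first row and first column become labels (adjust the path and options as needed)
df = pd.read_csv('01-averageDailyFlights.csv', header=0, index_col=0)

# Step 2: Extract the values as a dense NumPy array (row and column labels are not carried over)
dense_matrix = df.to_numpy()

# Step 3: Convert the dense matrix to a sparse matrix (CSR format)
sparse_matrix = csr_matrix(dense_matrix)

# Step 4: Save the sparse matrix as a compressed .npz file
save_npz('test.npz', sparse_matrix)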
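One thing that may be worth checking before converting: csr_matrix stores every entry that is not exactly zero, and that includes NaN. If the CSV has empty cells, pandas reads them as NaN and they end up stored explicitly. The sketch below uses a small made-up table (not the real flight data) and assumes that a missing cell can be treated as zero, which may or may not hold for these files.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical stand-in for the flight table: mostly zeros, one missing cell.
df = pd.DataFrame(
    [[0.0, 2.5, 0.0], [0.0, np.nan, 0.0], [1.0, 0.0, 0.0]],
    index=["AAA", "BBB", "CCC"],
    columns=["AAA", "BBB", "CCC"],
)

# Without cleaning, the NaN counts as a stored (non-zero) entry.
naive_nnz = csr_matrix(df.to_numpy(dtype=float)).nnz

# Assumption: missing cells mean "no flights", so they are replaced by 0.
dense = df.fillna(0.0).to_numpy(dtype=float)
sparse_matrix = csr_matrix(dense)

# Fraction of cells that are actually stored; low values favour sparse storage.
density = sparse_matrix.nnz / (dense.shape[0] * dense.shape[1])

print(f"stored entries without fillna: {naive_nnz}")
print(f"stored entries with fillna:    {sparse_matrix.nnz}")
print(f"density: {density:.3f}")
```

If the density turns out to be close to 1, a sparse format is unlikely to save much space.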
@dodedic
Contributor

dodedic commented Jun 26, 2024

@michaelweinold

This was successful. Files are now 120 KB.
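For anyone reading the files back: scipy.sparse.load_npz is the counterpart to save_npz. Note that the .npz file holds only the matrix, so the row and column labels from the CSV have to be kept somewhere else. Here is a minimal round-trip sketch on synthetic data; the table size, labels, and file names are made up for illustration, and the size ratio it prints will not match the real files.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical 200 x 200 table with only two non-zero cells.
labels = [f"A{i:03d}" for i in range(200)]
df = pd.DataFrame(np.zeros((200, 200)), index=labels, columns=labels)
df.iloc[3, 7] = 12.0
df.iloc[150, 42] = 3.5

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "table.csv")
    npz_path = os.path.join(tmp, "table.npz")

    # Write both formats and compare their size on disk.
    df.to_csv(csv_path)
    save_npz(npz_path, csr_matrix(df.to_numpy()))
    csv_size = os.path.getsize(csv_path)
    npz_size = os.path.getsize(npz_path)

    # Read the sparse matrix back before the temporary directory is removed.
    restored = load_npz(npz_path)

# The labels are not in the .npz file, so they are re-attached from df here.
restored_df = pd.DataFrame(restored.toarray(), index=df.index, columns=df.columns)

print(f"CSV: {csv_size} bytes, NPZ: {npz_size} bytes")
print("round trip matches:", restored_df.equals(df))
```

In practice the labels could be saved in a small separate file next to the .npz so that the DataFrame can be rebuilt later.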
