Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating redundant datasets #74

Open
bcyphers opened this issue Jan 31, 2018 · 0 comments
Open

Avoid creating redundant datasets #74

bcyphers opened this issue Jan 31, 2018 · 0 comments

Comments

@bcyphers
Copy link
Contributor

If enter_data() is called with the same train_path twice in a row and the data itself hasn't changed, a new Dataset does not need to be created.

We should add a column which stores some kind of hash of the actual data. When a Dataset would be created, if the metadata and data hash are exactly the same as an existing Dataset, nothing should be added to the ModelHub database and the existing Dataset should be returned instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants