Implement Tiered Storage Movement Based on Last Modified Date in ISM Policy #1228

Open
aziz-arz opened this issue Aug 8, 2024 · 2 comments

aziz-arz commented Aug 8, 2024

Is your feature request related to a problem? Please describe

Our use case involves using AWS OpenSearch to store flight data, which accumulates in large volumes and is no longer needed after a period of time. We aim to implement tiered storage in which data moves through hot, warm, and cold storage before eventually being deleted. Specifically, we want to keep data in hot storage for 14 days, move it to warm storage for 365 days, and then either expire it or move it to cold storage.
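For reference, the retention described above might be expressed today as an ISM policy along the following lines. This is only a sketch: the state names and description are illustrative, `warm_migration` is the migration action available on Amazon OpenSearch Service, and the warm-to-delete threshold of 379 days assumes the 365 warm days are counted after the 14 hot days, since `min_index_age` is measured from index creation.

```python
import json

# Retention values taken from the description above.
HOT_DAYS = 14
WARM_DAYS = 365

# Sketch of an ISM policy body. State names are illustrative assumptions.
# min_index_age is always measured from the index creation date, so the
# second threshold is expressed as HOT_DAYS + WARM_DAYS.
policy = {
    "policy": {
        "description": "Sketch: hot for 14 days, warm for 365 days, then delete",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {"min_index_age": f"{HOT_DAYS}d"},
                    }
                ],
            },
            {
                "name": "warm",
                # warm_migration is specific to Amazon OpenSearch Service.
                "actions": [{"warm_migration": {}}],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": f"{HOT_DAYS + WARM_DAYS}d"
                        },
                    }
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}]},
        ],
    }
}

print(json.dumps(policy, indent=2))
```

Because both thresholds count from creation, an index created months before its flight date would leave hot storage too early under this policy.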

Currently, OpenSearch ISM policy settings allow transitions based on the index age, which is calculated from the index creation date. However, in our use case, data ingestion happens at various times and is tied to flight data. Indices are created automatically by a Java client and can have different dates embedded in the index name, representing the actual flight date, which can be in the future. For example, the index flight_09.15.2024 can be created in June 2024. However, we want this particular index to move to warm storage two weeks after the flight date, which is in September.
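To make the example concrete, here is a minimal sketch of how the intended move date could be derived from the index name. The `flight_MM.DD.YYYY` pattern is inferred from the single example above, and the function names are hypothetical; nothing like this exists in ISM today.

```python
from datetime import date, datetime, timedelta

HOT_DAYS = 14  # hot retention from the description above


def flight_date_from_index(index_name: str) -> date:
    """Extract the flight date from a name like 'flight_09.15.2024'.

    Assumes the pattern flight_MM.DD.YYYY, inferred from the one example.
    """
    prefix, _, suffix = index_name.partition("_")
    if prefix != "flight" or not suffix:
        raise ValueError(f"unexpected index name: {index_name!r}")
    return datetime.strptime(suffix, "%m.%d.%Y").date()


def warm_move_date(index_name: str, hot_days: int = HOT_DAYS) -> date:
    """Date on which the index should leave hot storage."""
    return flight_date_from_index(index_name) + timedelta(days=hot_days)


print(warm_move_date("flight_09.15.2024"))  # 2024-09-29
```

With the creation-date calculation, the same index created on June 1, 2024 would instead become eligible for warm storage on June 15, 2024, three months before the flight.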

Describe the solution you'd like

We propose implementing tiered storage movement (ISM policy) based on the flight date instead of the index creation date. A feasible approach would be to calculate min_index_age from the last modified date, allowing indices to move to different tiers once they are no longer actively written to.

Add a configuration option to calculate min_index_age based on the time between the last modified date and the present, instead of the index creation date. This adjustment will allow us to roll over flight-based indices after the data in those indices is no longer being actively used.
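The requested behavior could be sketched roughly as follows. This is not ISM code: the `basis` option, the function names, and the availability of a last modified timestamp for an index are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def index_age(
    now: datetime,
    created_at: datetime,
    last_modified_at: Optional[datetime] = None,
    basis: str = "creation",  # hypothetical option: "creation" or "last_modified"
) -> timedelta:
    """Return the age used for a min_index_age check under the chosen basis."""
    if basis == "creation":
        reference = created_at
    elif basis == "last_modified":
        # Fall back to the creation date if the index was never modified.
        reference = last_modified_at if last_modified_at is not None else created_at
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    # Never report a negative age.
    return max(now - reference, timedelta(0))


def meets_min_index_age(age: timedelta, min_index_age: timedelta) -> bool:
    """True when the transition condition is satisfied."""
    return age >= min_index_age


created = datetime(2024, 6, 1, tzinfo=timezone.utc)
modified = datetime(2024, 9, 15, tzinfo=timezone.utc)
now = datetime(2024, 9, 20, tzinfo=timezone.utc)
threshold = timedelta(days=14)

print(meets_min_index_age(index_age(now, created), threshold))
print(meets_min_index_age(index_age(now, created, modified, "last_modified"), threshold))
```

In this example the creation-based age is 111 days, so the index would already have transitioned, while the last-modified-based age is 5 days, so it would stay in hot storage until 14 days after the last write.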

Related component

Plugins

Describe alternatives you've considered

As discussed with AWS support, we considered using the min_size parameter to cap the space of our indices to prevent them from becoming too large. However, this does not fully address our need for tiered storage based on the flight date.

Additional context

Our time-series indices are automatically created by a Java client and have different dates in the index name that do not correlate with the creation date. The current min_index_age calculation based on the creation date does not fit our use case, which includes future and past flight dates.

By basing min_index_age on the last modified date, we could manage the lifecycle of our flight data indices more accurately.
In essence, the last modified date is a more accurate reflection of data activity, enabling better alignment of ISM policies with our use case. This feature would significantly benefit scenarios where data ingestion times vary and are not directly tied to the index creation date, such as with flight data.

@dblock dblock transferred this issue from opensearch-project/OpenSearch Aug 12, 2024
@aziz-arz
Author

Hi there,

Has there been any development on this issue? I will gladly discuss any details.

@dblock dblock removed the untriaged label Aug 26, 2024
@dblock
Member

dblock commented Aug 26, 2024

[Catch All Triage - 1, 2, 3, 4, 5]
