Per segment chunks #8272

zhiltsov-max · 2024-08-07T12:07:54Z

Motivation and context

Changed chunk generation from per-task chunks to per-segment chunks
Fixed a memory leak in video reading on the server side (only in media_extractors, so there are several more left)
Fixed a potential hang in import worker or the server process on process shutdown
Disabled multithreading in video reading in endpoints (not in static chunk generation)
Refactored static chunk generation code (moved after job creation)
Refactored various server internal APIs for frame retrieval
Updated UI logic to access chunks, added support for non-sequential frames in chunks
Added a new server configuration option CVAT_ALLOW_STATIC_CACHE (boolean) to enable and disable static cache support. The option is disabled by default (it's changed from the previous behavior)
Added tests for the changes made
Added missing original chunk type field in job responses
Fixed invalid kvrocks cleanup in tests for Helm deployment
Added a new 0-based index parameter in GET /api/jobs/{id}/data/?type=chunk to simplify indexing
- GT job chunks with non-sequential frames have no placeholders inside

When this update is applied to the server, there will be a data storage setting migration for the tasks. Existing tasks using static chunks (task.data.storage_method == FILE_SYSTEM) will be switched to the dynamic cache (i.e. to == CACHE)). The remaining files should be removed manually, there will be a list of such tasks in the migration log file.

After this update, you'll have an option to enable or disable static cache use during task creation. This allows, in particular, prohibit new tasks using the static cache. With this option, any tasks using static cache will use the dynamic cache instead on data access.

User-observable changes:

Job chunk ids now start from 0 for each job instead of using parent task ids
The use_cache = false or storage_method = filesystem parameters in task creation can be ignored by the server
Task chunk access may be slower for some chunks (particularly, for tasks with overlap configured, for chunks on segment boundaries, and for tasks previously using static chunks)
The last chunk in a job will contain only the frames from the current job, even if there are more frames in the task

How has this been tested?

Checklist

I submit my changes into the develop branch
I have created a changelog fragment
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues (see GitHub docs)
I have increased versions of npm packages if it is necessary
(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

New Features
- Introduced a new server setting to disable media chunks on the local filesystem.
- Enhanced frame prefetching with a startFrame parameter for improved chunk calculations.
- Added a new property, data_original_chunk_type, for enhanced job differentiation in the metadata.
Bug Fixes
- Resolved memory management issues to prevent leaks during video processing.
- Corrected naming inconsistencies related to the prefetchAnalyzer.
Documentation
- Included configuration for code formatting tools to ensure consistent code quality across the project.
Refactor
- Restructured classes and methods for improved clarity and maintainability, particularly in media handling and task processing.
Chores
- Updated formatting scripts to include additional directories for automated code formatting.

…iterator

coderabbitai · 2024-08-07T12:08:10Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces several enhancements across various components, particularly focusing on improved media handling, frame processing, and task management. Notable features include new configuration options for disabling local media storage, refined frame prefetching logic, and updated test frameworks for better reliability. Memory management issues have been addressed, ensuring smoother operations during video processing, while modular code improvements enhance overall maintainability.

Changes

Files	Change Summary
changelog.d/..._mzhiltso_job_chunks.md	New server setting to disable local media chunks; distinct chunk IDs for jobs; memory leak fixes.
cvat-core/src/frames.ts	`FrameData` and `PrefetchAnalyzer` updates for improved prefetching logic and constructor enhancements.
cvat-data/src/ts/cvat-data.ts	`FrameDecoder` updates for handling `startFrame` and chunkNumber adjustments.
cvat/apps/dataset_manager/bindings.py	Replaced `FrameProvider` with `TaskFrameProvider`, reflecting task-specific handling changes.
cvat/apps/dataset_manager/formats/cvat.py	Refactored `dump_media_files` for better modularity; frame retrieval changes.
cvat/apps/dataset_manager/tests/test_formats.py	Removed outdated functions and classes; enhanced test reliability through inheritance.
cvat/apps/engine/apps.py	Enhanced `ready` method for dynamic configuration management.
cvat/apps/engine/cache.py	Improved `MediaCache` class with new methods for cache management.
cvat/apps/engine/default_settings.py	New setting for static media caching based on environment variables.
cvat/apps/engine/frame_provider.py	Major refactor of frame loading capabilities with modular design changes.
cvat/apps/engine/log.py	Enhanced logging functionality for migration processes.
cvat/apps/engine/media_extractors.py	New classes/methods for media processing and improved type annotations.
cvat/apps/engine/models.py	Updated method signatures for chunk names/paths.
cvat/apps/engine/pyproject.toml	New configuration settings for code formatting tools.
cvat/apps/engine/serializers.py	Added a new read-only field to `JobReadSerializer` for additional information.
cvat/apps/engine/task.py	Enhanced task processing logic for segment creation and job management.
cvat/apps/engine/tests/...	Various tests updated for improved validation and functionality.
cvat/apps/lambda_manager/tests/test_lambda.py	Refactored test code structure for better maintainability.
cvat/requirements/base.in	Added comments for clarity on library dependencies.
cvat/schema.yml	New property `data_original_chunk_type` added to enhance schema expressiveness.
tests/python/shared/utils/helpers.py	Updated image/video functions and added a new utility for reading video files.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Server
    participant MediaCache
    participant FrameProvider

    User->>Server: Request Media
    Server->>MediaCache: Check Cache
    MediaCache-->>Server: Return Media (if available)
    Server->>FrameProvider: Retrieve Frames
    FrameProvider-->>Server: Return Frames
    Server-->>User: Deliver Media

🐰 In the meadow, changes bloom bright,
New features hop with pure delight!
With frames and tasks now dancing in line,
Our code, like carrots, is truly divine!
So celebrate with a joyful cheer,
For every fix brings us near! 🌿

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
-- I pushed a fix in commit <commit_id>, please review it.
-- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
-- @coderabbitai generate unit testing code for this file.
-- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
-- @coderabbitai generate interesting stats about this repository and render them as a table.
-- @coderabbitai read src/utils.ts and generate unit testing code.
-- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
-- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Early access features: enabled

We are currently testing the following features in early access:

OpenAI o1 for code reviews: OpenAI's new o1 model is being tested for generating code suggestions in code reviews.

Note:

You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

coderabbitai

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)

cvat/apps/engine/media_extractors.py (1)
Ensure all instantiations of VideoReader and VideoReaderWithManifest handle the new allow_threading parameter.

The following instances do not include the allow_threading parameter:

cvat/apps/engine/migrations/0024_auto_20191023_1025.py

cvat/apps/engine/cache.py

Please update these instances to correctly handle the allow_threading parameter.

Analysis chain

Line range hint 509-524:
LGTM! But verify the new parameter usage.

The addition of the allow_threading parameter enhances flexibility in video processing.

However, ensure that all usages of VideoReader correctly handle the new parameter.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all usages of `VideoReader` handle the new `allow_threading` parameter.

# Test: Search for the class usage. Expect: Only occurrences with the new parameter.
rg --type python -A 5 $'VideoReader'
Length of output: 66

Script:
#!/bin/bash
# Description: Verify all usages of `VideoReader` handle the new `allow_threading` parameter.

# Test: Search for the class usage. Expect: Only occurrences with the new parameter.
rg --type py -A 5 $'VideoReader'
Length of output: 5816

Script:
#!/bin/bash
# Description: Verify all instantiations of `VideoReader` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and include context lines to verify parameter usage.
rg --type py 'VideoReader(' -A 3
Length of output: 109

Script:
#!/bin/bash
# Description: Verify all instantiations of `VideoReader` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and include context lines to verify parameter usage.
rg --type py 'VideoReader\(' -A 3
Length of output: 852

Script:
#!/bin/bash
# Description: Verify all instantiations of `VideoReader` and `VideoReaderWithManifest` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and `VideoReaderWithManifest` and include context lines to verify parameter usage.
rg --type py 'VideoReader\(' -A 3
rg --type py 'VideoReaderWithManifest\(' -A 3
Length of output: 1164

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6e30164 and ee3c905.

Files selected for processing (13)

cvat-core/src/frames.ts (10 hunks)
cvat-data/src/ts/cvat-data.ts (4 hunks)
cvat/apps/dataset_manager/bindings.py (4 hunks)
cvat/apps/dataset_manager/formats/cvat.py (3 hunks)
cvat/apps/engine/cache.py (3 hunks)
cvat/apps/engine/frame_provider.py (1 hunks)
cvat/apps/engine/media_extractors.py (15 hunks)
cvat/apps/engine/models.py (2 hunks)
cvat/apps/engine/pyproject.toml (1 hunks)
cvat/apps/engine/task.py (8 hunks)
cvat/apps/engine/views.py (10 hunks)
cvat/apps/lambda_manager/views.py (2 hunks)
dev/format_python_code.sh (1 hunks)

Files skipped from review due to trivial changes (2)

cvat/apps/engine/pyproject.toml
dev/format_python_code.sh

Additional comments not posted (69)

cvat-data/src/ts/cvat-data.ts (7)

103-103: Initialize the new startFrame parameter.

The startFrame parameter is correctly added to the class properties.

109-109: Include the new startFrame parameter in the constructor.

The startFrame parameter is correctly included in the constructor.

123-123: Set the startFrame parameter in the constructor.

The startFrame parameter is correctly set in the constructor.

209-209: Integrate the startFrame parameter in the frame method.

The startFrame parameter is correctly integrated into the calculation of chunkNumber.

268-268: Integrate the startFrame parameter in the startDecode method.

The startFrame parameter is correctly integrated into the calculation of chunkNumber.

Line range hint 1-1: Ensure proper error handling in decodeContextImages.

The function correctly handles errors by releasing the mutex and rejecting the promise.

Line range hint 1-1: Ensure proper resource management in decodeContextImages.

The function correctly manages resources by using a mutex to control access.

cvat/apps/engine/cache.py (11)

61-62: Ensure correct checksum computation in get_checksum.

The method correctly computes a checksum using zlib.crc32.

64-99: Ensure correct cache item handling in _get_or_set_cache_item.

The method correctly handles cache item creation and retrieval, including checksum validation.

100-107: Ensure correct cache retrieval in _get.

The method correctly retrieves an item from the cache and handles potential errors.

111-118: Ensure correct segment chunk handling in get_segment_chunk.

The method correctly retrieves or prepares a segment chunk.

121-128: Ensure correct selective job chunk handling in get_selective_job_chunk.

The method correctly retrieves or prepares a selective job chunk.

131-134: Ensure correct segment preview handling in get_or_set_segment_preview.

The method correctly retrieves or prepares a segment preview.

137-138: Ensure correct cloud preview handling in get_cloud_preview.

The method correctly retrieves a cloud preview.

140-143: Ensure correct cloud preview handling in get_or_set_cloud_preview.

The method correctly retrieves or prepares a cloud preview.

146-149: Ensure correct frame context image handling in get_frame_context_images.

The method correctly retrieves or prepares frame context images.

152-232: Ensure correct raw frame reading in _read_raw_frames.

The method correctly reads raw frames based on task and frame IDs.

234-286: Ensure correct segment chunk preparation in prepare_segment_chunk.

The method correctly prepares a segment chunk based on the segment type.

cvat/apps/engine/frame_provider.py (6)

41-73: Ensure correct implementation of _ChunkLoader class.

The class correctly defines an abstract interface for loading chunks of media data.

76-94: Ensure correct implementation of _FileChunkLoader class.

The class correctly handles file-based chunk loading.

96-108: Ensure correct implementation of _BufferChunkLoader class.

The class correctly handles buffer-based chunk loading.

128-206: Ensure correct implementation of IFrameProvider class.

The class correctly defines an abstract interface for frame providers.

209-377: Ensure correct implementation of TaskFrameProvider class.

The class correctly handles task-based frame provision.

379-524: Ensure correct implementation of SegmentFrameProvider class.

The class correctly handles segment-based frame provision.

cvat-core/src/frames.ts (7)

28-28: Typo correction: Renamed prefetchAnalizer to prefetchAnalyzer.

This change corrects a typographical error, improving code readability.

211-211: Added private field #startFrame to PrefetchAnalyzer.

This field is necessary for the new functionality involving the startFrame parameter.

213-216: Updated constructor to accept startFrame parameter.

This change is essential for initializing the new #startFrame field.

224-224: Updated currentChunk calculation to account for startFrame.

This change ensures that the prefetching logic correctly considers the starting frame.

230-230: Updated condition to use modified formula for currentChunk.

This change ensures that the prefetching logic correctly considers the starting frame.

272-272: Updated chunkNumber calculation to account for startFrame.

This change ensures that the chunk number calculation correctly considers the starting frame.

277-280: Updated nextChunkNumber calculation to account for startFrame.

This change ensures that the next chunk number calculation correctly considers the starting frame.

cvat/apps/engine/models.py (5)

259-267: Updated _get_chunk_name to include segment_id and return descriptive format.

This change improves the naming conventions and provides clearer context for the chunk names.

269-270: Updated _get_compressed_chunk_name to include segment_id.

This change ensures consistency with the updated chunk naming conventions.

272-273: Updated _get_original_chunk_name to include segment_id.

This change ensures consistency with the updated chunk naming conventions.

275-277: Renamed get_original_chunk_path to get_original_segment_chunk_path and added segment parameter.

This change aligns with the updated method signatures and provides clearer context for the parameters.

279-281: Renamed get_compressed_chunk_path to get_compressed_segment_chunk_path and added segment parameter.

This change aligns with the updated method signatures and provides clearer context for the parameters.

cvat/apps/lambda_manager/views.py (4)

35-35: Updated import statement to replace FrameProvider with FrameQuality and TaskFrameProvider.

This change indicates a shift towards a more structured approach in managing frame quality attributes.

483-485: Updated logic for determining quality to use FrameQuality.

This change enhances clarity and maintainability by encapsulating quality definitions within a dedicated class.

492-492: Updated instantiation of frame provider to TaskFrameProvider(db_task).

This change suggests a refactoring that may provide better context or functionality tied to the task rather than the data alone.

485-485: Updated condition to reference FrameQuality.ORIGINAL and FrameQuality.COMPRESSED.

This change improves the semantic accuracy of the code.

cvat/apps/engine/media_extractors.py (8)

55-57: LGTM!

The FrameQuality enumeration is well-defined and categorizes frame quality effectively.

104-137: LGTM!

The RandomAccessIterator class is well-implemented, allowing random access to elements in an iterable and managing resources effectively.

144-189: LGTM!

The CachingMediaIterator class is well-implemented, extending RandomAccessIterator and adding effective caching capabilities.

533-573: LGTM!

The _make_frame_iterator method is well-implemented, encapsulating the logic for frame iteration and handling optional filtering and rotation based on metadata.

589-605: LGTM!

The _read_av_container method ensures proper cleanup of resources, addressing potential memory leaks.

645-667: LGTM!

The get_frame_count method is well-implemented, providing the total frame count in the video.

670-679: LGTM!

The ImageReaderWithManifest class is well-implemented, providing manifest-based access to images.

680-754: LGTM!

The VideoReaderWithManifest class is well-implemented, providing manifest-based access to videos and handling threading and frame iteration effectively.

cvat/apps/engine/task.py (5)

Line range hint 125-173:
LGTM!

The _generate_segment_params function is well-implemented, generating segment parameters with named parameters and improving readability and maintainability.

176-200: LGTM!

The _create_segments_and_jobs function is well-implemented, focusing on segment and job creation and streamlining the process.

1134-1266: LGTM!

The _create_static_chunks function is well-implemented, encapsulating the logic for creating static chunks and improving modularity.

77-79: LGTM!

The SegmentParams class is well-implemented, providing a clearer structure for segment parameters with the addition of the type field and optional frames parameter.

1140-1154: LGTM!

The update_progress function is well-implemented, enhancing user feedback during long-running operations.

cvat/apps/dataset_manager/formats/cvat.py (2)

1374-1375: LGTM! Using the factory function improves modularity.

The use of make_frame_provider instead of directly instantiating FrameProvider enhances modularity and flexibility.

1380-1385: Verify the correctness of the frame retrieval process.

Ensure that the iterate_frames method correctly retrieves frames with the specified parameters.

cvat/apps/dataset_manager/bindings.py (5)

34-34: Import changes approved.

The import statement has been updated to reflect the new frame provider and enums.

1351-1361: Frame quality and output type changes approved.

The method now uses FrameQuality.ORIGINAL and FrameOutputType.NUMPY_ARRAY for video frames and FrameQuality.ORIGINAL and FrameOutputType.BUFFER for image frames, aligning with the new enums.

1369-1369: Frame provider update approved.

The method now uses TaskFrameProvider instead of FrameProvider, aligning with the new class introduced.

1505-1505: Frame extension update approved.

The method now uses TaskFrameProvider.VIDEO_FRAME_EXT to set the frame extension for video tasks, aligning with the new class and its properties.

1596-1596: Frame extension update approved.

The method now uses TaskFrameProvider.VIDEO_FRAME_EXT to set the frame extension for video tasks, aligning with the new class and its properties.
cvat/apps/engine/views.py (9)
631-641: Method Implementation: Ensure _TaskDataGetter usage

The preview method uses _TaskDataGetter for data retrieval. Ensure that _TaskDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _TaskDataGetter usage

The _TaskDataGetter class is correctly defined and used in the preview method of ProjectViewSet. The class extends _DataGetter and includes the necessary parameters.

_TaskDataGetter is defined in cvat/apps/engine/views.py.

It is used multiple times in the same file, including in the preview method.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_TaskDataGetter'
Length of output: 65

Script:
#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg '_TaskDataGetter' --glob '*.py' -A 5
Length of output: 1385
714-727: Constructor: Ensure db_task is a valid Task model instance

The constructor of _TaskDataGetter accepts db_task. Ensure that db_task is a valid instance of the Task model.

Verification successful

Constructor: Ensure db_task is a valid Task model instance

The constructor of _TaskDataGetter accepts db_task. Based on the extensive usage of self._db_task across the codebase, it is consistently assigned a valid Task model instance, particularly highlighted in backup.py where it is assigned from a Task model object.

cvat/apps/engine/backup.py: self._db_task = models.Task.objects.create(**self._manifest, organization_id=self._org_id)

cvat/apps/engine/backup.py: self._db_task = models.Task.objects.prefetch_related(...).get(pk=pk)
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `db_task` is a valid instance of the `Task` model.

# Test: Search for `db_task` usage. Expect: Valid instance of `Task`.
rg --type python 'self\._db_task'
Length of output: 66

Script:
#!/bin/bash
# Verify that `db_task` is a valid instance of the `Task` model.

# Test: Search for `db_task` usage. Expect: Valid instance of `Task`.
rg --glob '*.py' 'self\._db_task'
Length of output: 6864
661-664: Constructor: Ensure data_num type consistency

The constructor of _DataGetter accepts data_num as Optional[Union[str, int]]. Ensure that data_num is consistently handled as an integer within the class.

Verification successful

Constructor: Ensure data_num type consistency

The constructor of _DataGetter accepts data_num as Optional[Union[str, int]]. The search results confirm that data_num is consistently converted to an integer within the class, ensuring type consistency.

self.number = int(data_num) if data_num is not None else None
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage. Expect: Consistent handling as an integer.
rg --type python 'self\.number'
Length of output: 64

Script:
#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage. Expect: Consistent handling as an integer.
rg --type py 'self\.data_num'
Length of output: 31

Script:
#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage in the entire file. Expect: Consistent handling as an integer.
rg 'data_num' cvat/apps/engine/views.py
Length of output: 848
730-743: Constructor: Ensure db_job is a valid Job model instance

The constructor of _JobDataGetter accepts db_job. Ensure that db_job is a valid instance of the Job model.

Verification successful

Constructor: Ensure db_job is a valid Job model instance

The constructor of _JobDataGetter accepts db_job. It has been verified that db_job is a valid instance of the Job model.

db_job is assigned using Job.objects.select_related, Job.objects.get, and models.Job.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `db_job` is a valid instance of the `Job` model.

# Test: Search for `db_job` usage. Expect: Valid instance of `Job`.
rg --type python 'self\._db_job'
Length of output: 65

Script:
#!/bin/bash
# Verify that `db_job` is a valid instance of the `Job` model.

# Test: Search for `self._db_job` usage. Expect: Valid instance of `Job`.
rg -t py 'self\._db_job'
Length of output: 1377

Script:
#!/bin/bash
# Verify the initialization or type assignment of `db_job` to ensure it is a valid instance of the `Job` model.

# Search for `db_job` assignments in the codebase to verify its type.
rg -t py 'db_job\s*='
Length of output: 1908
1997-2000: Method Implementation: Ensure _JobDataGetter usage

The data method uses _JobDataGetter for data retrieval. Ensure that _JobDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _JobDataGetter usage

The _JobDataGetter class is correctly used and implemented in the data method of JobViewSet. The class is properly defined with an __init__ method that initializes necessary parameters and a _get_frame_provider method that encapsulates the frame provider logic.

_JobDataGetter class definition and methods:

__init__ method initializes with db_job, data_type, data_quality, and data_num.

_get_frame_provider method returns a JobFrameProvider instance.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_JobDataGetter'
Length of output: 64

Script:
#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type-add 'py:*.py' '_JobDataGetter'
Length of output: 229

Script:
#!/bin/bash
# Retrieve the definition and methods of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _JobDataGetter($_) { $$$ }'
Length of output: 524

Script:
#!/bin/bash
# Retrieve the definition and methods of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _JobDataGetter($_): $$$'
Length of output: 932
742-743: Method Implementation: Ensure JobFrameProvider usage

The _get_frame_provider method returns an instance of JobFrameProvider. Ensure that JobFrameProvider is correctly used and implemented.

Verification successful

JobFrameProvider is correctly used and implemented

The JobFrameProvider class is correctly implemented and used within the codebase. It inherits from SegmentFrameProvider and is instantiated properly in the make_frame_provider function.

cvat/apps/engine/frame_provider.py:

Definition at line 537

Usage at line 546
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `JobFrameProvider` is correctly used and implemented.

# Test: Search for `JobFrameProvider` usage. Expect: Correct usage and implementation.
rg --type python 'JobFrameProvider'
Length of output: 66

Script:
#!/bin/bash
# Find all Python files and search for `JobFrameProvider` usage and implementation.
fd -e py -x grep -Hn 'JobFrameProvider' {}
Length of output: 423

Script:
#!/bin/bash
# Extract lines around the definition and usage of `JobFrameProvider` in `frame_provider.py`.
grep -A 10 -B 10 'class JobFrameProvider' ./cvat/apps/engine/frame_provider.py
grep -A 10 -B 10 'JobFrameProvider(data_source)' ./cvat/apps/engine/frame_provider.py
Length of output: 1575
2093-2098: Method Implementation: Ensure _JobDataGetter usage

The preview method uses _JobDataGetter for data retrieval. Ensure that _JobDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _JobDataGetter usage

The preview method uses _JobDataGetter for data retrieval. The _JobDataGetter class is correctly implemented and used in the preview method.

_JobDataGetter class is defined in cvat/apps/engine/views.py (line 730).

It inherits from _DataGetter and correctly implements necessary methods.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_JobDataGetter'
Length of output: 64

Script:
#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type py '_JobDataGetter'
Length of output: 218

Script:
#!/bin/bash
# Extract the implementation of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Use `ast-grep` to get the class definition and its methods.
ast-grep --lang python --pattern $'class _JobDataGetter($_):\n    $$$' cvat/apps/engine/views.py
Length of output: 962

Script:
#!/bin/bash
# Extract the implementation of `_DataGetter` from `cvat/apps/engine/views.py`.

# Use `ast-grep` to get the class definition and its methods.
ast-grep --lang python --pattern $'class _DataGetter($_):\n    $$$' cvat/apps/engine/views.py
Length of output: 4104
1266-1269: Method Implementation: Ensure _TaskDataGetter usage

The data method uses _TaskDataGetter for data retrieval. Ensure that _TaskDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _TaskDataGetter usage

The data method in TaskViewSet correctly uses _TaskDataGetter for data retrieval. The _TaskDataGetter class is properly implemented with necessary methods.

_TaskDataGetter class is defined in cvat/apps/engine/views.py and inherits from _DataGetter.

It includes an __init__ method and a _get_frame_provider method.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_TaskDataGetter'
Length of output: 65

Script:
#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg '_TaskDataGetter' --glob '*.py' -A 5
Length of output: 1385

Script:
#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern $'class _TaskDataGetter(_DataGetter) {
  $$$
}' cvat/apps/engine/views.py
Length of output: 574

Script:
#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _TaskDataGetter(_DataGetter):' cvat/apps/engine/views.py -A 50
Length of output: 104

Script:
#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
rg 'class _TaskDataGetter' cvat/apps/engine/views.py -A 50
Length of output: 1637
726-727: Method Implementation: Ensure TaskFrameProvider usage

The _get_frame_provider method returns an instance of TaskFrameProvider. Ensure that TaskFrameProvider is correctly used and implemented.
Verification successful

TaskFrameProvider is correctly used and implemented

The TaskFrameProvider class is defined and implemented in the cvat/apps/engine/frame_provider.py file. The class includes methods for validating frame and chunk numbers, indicating that it adheres to the expected interface and functionality.
cvat/apps/engine/frame_provider.py:
class TaskFrameProvider(IFrameProvider):
    def __init__(self, db_task: models.Task) -> None:
        self._db_task = db_task

    def validate_frame_number(self, frame_number: int) -> int:
        start = self._db_task.data.start_frame
        stop = self._db_task.data.stop_frame
        if frame_number not in range(start, stop + 1, self._db_task.data.get_frame_step()):
            raise ValidationError(
                f"Invalid frame '{frame_number}'. "
                f"The frame number should be in the [{start}, {stop}] range"
            )
        return frame_number

    def validate_chunk_number(self, chunk_number: int) -> int:
        last_chunk = math.ceil(self._db_task.data.size / self._db_task.data.chunk_size) - 1
        if not (0 <= chunk_number <= last_chunk):
            raise ValidationError(
                f"Invalid chunk number '{chunk_number}'. "
                f"The chunk number should be in the [0, {last_chunk}] range"
            )
This confirms that the TaskFrameProvider is correctly used and implemented in the codebase.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that `TaskFrameProvider` is correctly used and implemented.

# Test: Search for `TaskFrameProvider` usage. Expect: Correct usage and implementation.
rg --type python 'TaskFrameProvider'
Length of output: 67

Script:
#!/bin/bash
# Find all Python files and search for `TaskFrameProvider` usage and implementation.

# Locate all Python files and search for `TaskFrameProvider`.
fd -e py -x rg 'TaskFrameProvider' {}
Length of output: 841

Script:
#!/bin/bash
# Extract the implementation details of `TaskFrameProvider`.

# Search for the definition and implementation of `TaskFrameProvider`.
ast-grep --lang python --pattern $'class TaskFrameProvider(IFrameProvider) {
  $$$
}'
Length of output: 558

Script:
#!/bin/bash
# Extract the implementation details of `TaskFrameProvider`.

# Search for the definition and implementation of `TaskFrameProvider`.
ast-grep --lang python --pattern 'class TaskFrameProvider(IFrameProvider):'
Length of output: 77

Script:
#!/bin/bash
# Search for the definition of `TaskFrameProvider` class.

# Use `rg` to find the class definition of `TaskFrameProvider`.
rg 'class TaskFrameProvider' -A 20
Length of output: 1730

cvat/apps/engine/views.py

cvat/apps/engine/models.py

…matches

…ached field

bsekachev · 2024-09-13T02:08:34Z

Perhaps need to prepare a pull request in private repos to update code according to new changes (as I see now, some private code uses outdated imports)

bsekachev · 2024-09-13T02:53:03Z

I will suggest to make a global replacement, like this to go to the same naming approach

bsekachev · 2024-09-13T03:21:40Z

If to press Play in a GT job

I think the problem is somewhere in this code:

As for a ground truth job this condition may not to work.

nmanovic · 2024-09-13T09:53:54Z

cvat/apps/engine/default_settings.py

@@ -0,0 +1,16 @@
+# Copyright (C) 2024 CVAT.ai Corporation


Why do we need the file? Why don't define the settings in the global settings file?

You can redefine these settings in the global settings file if you need it, this file defines the default values and guarantees the required settings are there.

tests/docker-compose.configurable_static_cache.yml

Marishka17 · 2024-09-16T18:28:31Z

@zhiltsov-max, Mostly LGTM, but I've found 2 small bugs:

If I create a task with 4 images and start_frame 2, I'll receive a 400 code on the task preview request.
I'm not sure whether it is related to the current PR or not, but I guess the progress bar can show an incorrect interval of loaded chunks/frames for a GT job (e.g. there were 2 frames 3 and 6 in the GT job):

I see that now we will have a "3rd type" of created tasks on the server: tasks with images initially created with static chunks, migrated to dynamic chunks but without manifest file. I'm not sure right now whether there can be some problems with that or it becomes a waste of time. However, it is probably some kind of inconsistent state.

cvat/apps/engine/cache.py

…ation and retrieval

zhiltsov-max · 2024-09-17T09:57:48Z

If I create a task with 4 images and start_frame 2, I'll receive a 400 code on the task preview request.

Fixed

I'm not sure whether it is related to the current PR or not, but I guess the progress bar can show an incorrect interval of loaded chunks/frames for a GT job (e.g. there were 2 frames 3 and 6 in the GT job):

Fixed

sonarcloud · 2024-09-17T15:30:47Z

Quality Gate passed

Issues
33 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.9% Duplication on New Code

See analysis details on SonarCloud

Marishka17 · 2024-09-18T10:58:05Z

cvat/apps/engine/cache.py

+            # TODO: find a way to use prefetched results, if provided
+            db_images = (
+                db_data.images.order_by("frame")
+                .filter(frame__gte=frame_ids[0], frame__lte=frame_ids[-1])


Probably we should set chunk_size max limit and use here frame__in (in that case we do not need to iterate over requested_frame_iter and compare a frame_id with next_requested_frame_id). What do you think? (can be changed in a separate PR)

I don't really like using __in, as it can result in very big queries, which are processed slowly. And it is possible here, as chunk_size is only limited by the task size now. I think it can be done once we introduce a limit on the chunk_size, which is planned to be done in another PR.

I don't really like using __in, as it can result in very big queries, which are processed slowly.

I agree, but I also said that frame__in should be used only when we add a chunk_size limit.

Ok, I thought you're speaking about order by + limit in the resulting sql.

zhiltsov-max added 19 commits July 25, 2024 19:39

Update frame provider and media cache

b32a9eb

t

cb4ff93

t

d49233c

Support static chunk building, fix av memory leak, add caching media …

146a896

…iterator

Refactor static chunk generation - extract function, revise threading

52d1bac

Refactor and fix task chunk creation from segment chunks, any storage

0c53436

Fix chunk number validation

c166123

Enable formatting for updated components

630c97e

Remove the checksum field

8d710e7

Be consistent about returned task chunk types (allow video chunks)

654a827

Support iterator input in video chunk writing

12e5f2a

Fix type annotation

a79a681

Refactor video reader memory leak fix, add to reader with manifest

d5118a2

Disable threading in video reading in frame provider

1b429cf

Fix keyframe search

d512312

Return frames as generator in dynamic chunk creation

167ee12

Update chunk requests in UI

88a9cb2

Update cache indices in FrameDecoder, enable video play

30bf8fd

Fix frame retrieval for video

ee3c905

zhiltsov-max requested review from azhavoro, bsekachev and Marishka17 as code owners August 7, 2024 12:07

zhiltsov-max marked this pull request as draft August 7, 2024 12:07

coderabbitai bot reviewed Aug 7, 2024

View reviewed changes

cvat/apps/engine/views.py Outdated Show resolved Hide resolved

zhiltsov-max changed the title ~~Per job chunks~~ Per segment chunks Aug 7, 2024

Fix frame reading in updated dynamic cache building

dc03220

nmanovic reviewed Aug 9, 2024

View reviewed changes

cvat/apps/engine/models.py Outdated Show resolved Hide resolved

zhiltsov-max added 2 commits August 9, 2024 17:29

Fix invalid frame quality

4bb8a74

Fix video reading in media_extractors - exception handling, frame mis…

f7d2c4c

…matches

zhiltsov-max added 3 commits September 12, 2024 17:00

Use builtins for range and binary search, convert frame step into a c…

add5ae6

…ached field

Merge remote-tracking branch 'origin/zm/job-chunks' into zm/job-chunks

643d998

Fix cached chunk indicators in frame player

df90b33

zhiltsov-max added 3 commits September 13, 2024 11:50

Fix chunk predecode logic

6ccb7db

Rename chunkNumber to chunkIndex where necessary

1fb68bc

Fix potential prefetch problem with reverse playback

92d0c7a

nmanovic reviewed Sep 13, 2024

View reviewed changes

Merge branch 'develop' into zm/job-chunks

67c1650

bsekachev approved these changes Sep 16, 2024

View reviewed changes

Marishka17 reviewed Sep 16, 2024

View reviewed changes

tests/docker-compose.configurable_static_cache.yml Outdated Show resolved Hide resolved

zhiltsov-max added 3 commits September 16, 2024 13:41

Move env variable into docker-compose.yml

3cdc4dc

Merge remote-tracking branch 'origin/zm/job-chunks' into zm/job-chunks

716042e

Merge branch 'develop' into zm/job-chunks

19279c7

Marishka17 reviewed Sep 16, 2024

View reviewed changes

cvat/apps/engine/cache.py Outdated Show resolved Hide resolved

cvat/apps/engine/cache.py Show resolved Hide resolved

zhiltsov-max added 5 commits September 17, 2024 12:40

Fix invalid cached chunk display in GT jobs

bc5ed39

Fix invalid task preview generation

08ddd28

Refactor CS previews, context image chunk generation, media cache cre…

1d969bd

…ation and retrieval

Merge remote-tracking branch 'origin/zm/job-chunks' into zm/job-chunks

e2cba8c

Remove extra import

d135475

zhiltsov-max added 3 commits September 17, 2024 15:24

Fix CS preview in response

a1638c9

Add reverse migration

fc89c01

Merge branch 'develop' into zm/job-chunks

c6e65f6

Marishka17 reviewed Sep 18, 2024

View reviewed changes

Marishka17 approved these changes Sep 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Per segment chunks #8272

Per segment chunks #8272

zhiltsov-max commented Aug 7, 2024 •

edited

Loading

coderabbitai bot commented Aug 7, 2024 •

edited

Loading

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

bsekachev commented Sep 13, 2024 •

edited

Loading

bsekachev commented Sep 13, 2024

bsekachev commented Sep 13, 2024

nmanovic Sep 13, 2024

zhiltsov-max Sep 13, 2024 •

edited

Loading

Marishka17 commented Sep 16, 2024 •

edited

Loading

zhiltsov-max commented Sep 17, 2024

sonarcloud bot commented Sep 17, 2024

Marishka17 Sep 18, 2024 •

edited

Loading

zhiltsov-max Sep 18, 2024

Marishka17 Sep 18, 2024

zhiltsov-max Sep 18, 2024

Per segment chunks #8272

Are you sure you want to change the base?

Per segment chunks #8272

Conversation

zhiltsov-max commented Aug 7, 2024 • edited Loading

Motivation and context

How has this been tested?

Checklist

License

Summary by CodeRabbit

Summary by CodeRabbit

coderabbitai bot commented Aug 7, 2024 • edited Loading

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

bsekachev commented Sep 13, 2024 • edited Loading

bsekachev commented Sep 13, 2024

bsekachev commented Sep 13, 2024

nmanovic Sep 13, 2024

Choose a reason for hiding this comment

zhiltsov-max Sep 13, 2024 • edited Loading

Choose a reason for hiding this comment

Marishka17 commented Sep 16, 2024 • edited Loading

zhiltsov-max commented Sep 17, 2024

sonarcloud bot commented Sep 17, 2024

Quality Gate passed

Marishka17 Sep 18, 2024 • edited Loading

Choose a reason for hiding this comment

zhiltsov-max Sep 18, 2024

Choose a reason for hiding this comment

Marishka17 Sep 18, 2024

Choose a reason for hiding this comment

zhiltsov-max Sep 18, 2024

Choose a reason for hiding this comment

zhiltsov-max commented Aug 7, 2024 •

edited

Loading

coderabbitai bot commented Aug 7, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

bsekachev commented Sep 13, 2024 •

edited

Loading

zhiltsov-max Sep 13, 2024 •

edited

Loading

Marishka17 commented Sep 16, 2024 •

edited

Loading

Marishka17 Sep 18, 2024 •

edited

Loading