Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collector API redesign #383

Closed
setrofim opened this issue May 2, 2019 · 9 comments
Closed

Collector API redesign #383

setrofim opened this issue May 2, 2019 · 9 comments
Assignees
Milestone

Comments

@setrofim
Copy link
Collaborator

setrofim commented May 2, 2019

As discussed in the comments in PR #382, and elsewhere, the TraceCollector interface is not fit for purpose -- it is inconsistent, unintuitive, and undocumented. Additionally, as already raised in #378, the scope of what is being collected by the various implementations of this interface is not adequately captured by the term "trace".

We need to conduct a review of existing implementations, and come up with modifications to the API to make it handle these implementations in more intuitive way. The updated API should be documented.

@setrofim
Copy link
Collaborator Author

setrofim commented May 2, 2019

See #382 (comment) for further discussion of the issue and some suggestions.

@marcbonnici marcbonnici added this to the Release v1.2 milestone Oct 24, 2019
@marcbonnici
Copy link
Collaborator

A per #378 the Trace interface name is not suitable for purpose and from @pict0 comments the current get_trace method is inconsistent with both it's input and output.

To remedy these we are planning to rename the TraceCollector interface to CollectorBase and plan to update the API at the same time to separate the old get_trace method into 2 separate calls set_output and get_data.

The aim here is to decouple the output path (either of a directory or file) from the call to retrieve the data. This allows supplying the output location earlier during execution which can avoid the need to save files to a temporary location and then copy them to the desired location at a later time.

And to unify the output, get_data will always return a list of CollectorOutput objects which will indicate their type: directory, file, or value and their corresponding filepath or value, this should allow for the output of any collector to be handled in a generic fashion if required.

class CollectorBase(object):
# ...

    def set_output(self, path):
    '''Takes a path that is collector dependant to configure it's output location.'''
         # Perform validate and store internally  
         pass

    def get_data(self):
    '''Provides output into the previously configured location.
       Return: List of `CollectorOutput` objects.'''
        # Retrieve data into pre-defined location and create `CollectorOutput` object(s)
        return []
# ...

Does anyone has any thoughts/concerns/improvements? @douglas-raillard-arm @valschneider @qais-yousef

@douglas-raillard-arm
Copy link
Contributor

Would that make sense to give the path directly to CollectorBase.__init__ , rather than in a separate call ? It could be None by default, in which case either a tempfile/tempfolder is used or nothing if the collector only returns values.

@marcbonnici
Copy link
Collaborator

We did debate this, however we thought this would cause issues / still require copying, in the event that you don't know the directory at creation time. E.g. you want to add something like a timestamp or in WA as we want to instantiate the collectors at the beginning of a run and then determine the output directory depending on the job that is currently executing.

@douglas-raillard-arm
Copy link
Contributor

Right, I was forgetting that the collector can be stopped and restarted also. In that case, the user might want to set the output to a new file, avoiding any copies.
In LISA, that issue is solved by shovelling around (and updating) the configuration of the collector (rather than the collector itself), so we can create the collector at the last instant and discard it after use.

@marcbonnici
Copy link
Collaborator

Ah ok that makes sense. Yea we thought that going for the separate call approach would hopefully be the most flexible solution allowing for different use cases.

@qais-yousef
Copy link

Sorry for the delayed response. I'm not an expert in the details but this looks good to me. Thanks!

@valschneider
Copy link
Contributor

I had a look as well, no big concerns from my end.

@marcbonnici
Copy link
Collaborator

Implemented with #454

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants