
First working implementation of the derotation package #3

Merged
merged 169 commits into from
Dec 19, 2023

Conversation

lauraporta
Member

@lauraporta lauraporta commented Nov 2, 2023

Description

What is this PR about?

This Pull Request introduces the inaugural implementation of the derotation package, tailored specifically for neuroscience research applications in calcium imaging, with a focus on 3-photon imaging.

Recommended Review Strategy

Given the substantive nature of this PR, which encompasses around 2,000 lines of code, I recommend the following review strategy to ensure a thorough understanding and efficient use of reviewers' time:

  1. Initial Discussion: A walk-through to discuss the scope, mechanics, and intended application of the package, complemented by demonstrations using actual data.

  2. Hands-On Trial: Personally execute a run of the package to gain firsthand experience with its functionality.

  3. Architectural Review: Engage in a dialogue to dissect and understand the high-level design and architectural decisions.

Please note, as this is an embryonic stage of a research-focused package, fine-grained scrutiny on software engineering practices, though important, may not be the immediate priority. Contextual information is provided below to aid in the review process.

Core Functionality

This package is designed to execute derotation-by-line via motor feedback for image correction, enhancing the data quality from 3-photon calcium imaging. It utilizes analog signals from the operational motor to unwarp TIFF images acquired during the imaging process. In addition, it is equipped with the capability to generate diagnostic plots for signal validation and create CSV files encapsulating rotation angle data for each frame.

Rationale for This PR

The primary goal of this PR is to integrate the initial version of the derotation package into the main branch, thereby streamlining access and utilization for end-users.

Contributions of This PR

This PR introduces the following components to the project:

  • Core classes embodying the package's functionality.
  • Example scripts demonstrating the use cases.
  • A README file that provides a preliminary guide to the package.
  • Preliminary tests that cover the critical aspects of the package.

Context

References

The conceptual foundation for this package stems from the work found in jvoigts' repository, which offers Matlab-based image derotation techniques, and is further informed by the study from Voigts and Harnett 2019. For additional insights, refer to this internal presentation.

Testing Approach

The package has undergone extensive local testing with datasets pertinent to its intended use cases. Unit tests have been crafted to validate the most crucial methods. There is a scope for adding more tests to ensure broader coverage.

Impact Assessment

  • Is this a breaking change? Not Applicable.
  • Does this PR necessitate updates to the documentation? Not Applicable, as current documentation updates are included within this PR.

Checklist for Reviewers

  • Ensure code functions correctly locally.
  • Verify that new test cases cover new functionality.
  • Check that documentation is updated to reflect any new changes or additions.
  • Confirm that code is formatted in accordance with pre-commit standards.

Member

@niksirbi niksirbi left a comment

Hey @lauraporta , here I will provide a high-level summary of my review.

More detailed comments are sprinkled throughout, but because of your pushes to the PR during the review, many of them show up at the wrong code lines (now we've learned not to do that again). If you are unsure what some of these comments are referring to, feel free to ask me.

As you requested, I focused on test-running the pipelines and reviewing the high-level design and architecture. I haven't checked the scientific accuracy of the results; that would probably take much more work.

I would recommend only addressing my most trivial comments in this PR, then merging to main, and raising the more substantive comments as issues to be addressed in future PRs.

So here are my high-level impressions:

  • I got both rotation pipelines to run on several datasets, and they seem to work (after fixing the path issues I have pointed out). So yayyy 🎉
  • That said, it took me a while to figure out how to run the pipeline. It would be helpful to have one docs page explaining how to set up and run the pipeline from the user's perspective (even if the user is only you, or your future self).
  • I think having configs in a specific folder, and requiring the code to be run from the root directory of the package, are quite restrictive choices. In one of my comments I've suggested some alternative design ideas; we can also discuss them in person.
  • The test coverage is quite low; I would try to increase it in the next PRs. Perhaps it's wise to do that before massive refactorings.
  • The two pipeline classes are very long and hard to parse from start to finish. I think some refactoring will be needed. For example, you could rewrite many of the general-purpose methods as static functions and put them in separate utility modules. Then they could be imported and used by both pipelines. I think it's better to rely more on composition than on inheritance.
  • Regarding the inheritance: it seems that the incremental pipeline uses only a small subset of the methods in the full rotation pipeline. I wonder whether it's worth defining a BaseRotationPipeline class from which both inherit. You could also consider other design patterns for pipelines, for example chain of responsibility, though I haven't thought through how exactly it would apply here.
  • Many of the methods inside the pipelines don't need to be public; you can make them private by prefixing them with _. Sphinx will then exclude them from the API docs by default.

Overall, well done, this is a substantial piece of work! Must have taken you many hours of hard work.
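To illustrate the composition idea from the bullet points above, here is a minimal sketch; the module layout, class names, and the `normalize_frame` helper are all hypothetical, not taken from the package:

```python
import numpy as np


# Hypothetical utility module: general-purpose helpers live as free
# functions that both pipelines import, instead of inheriting them.
def normalize_frame(frame: np.ndarray) -> np.ndarray:
    """Scale a frame to the [0, 1] range (constant frames map to zeros)."""
    span = frame.max() - frame.min()
    if span == 0:
        return np.zeros_like(frame, dtype=float)
    return (frame - frame.min()) / span


class FullPipeline:
    """Hypothetical full-rotation pipeline composing the shared helper."""

    def process(self, frame: np.ndarray) -> np.ndarray:
        return normalize_frame(frame)


class IncrementalPipeline:
    """Hypothetical incremental pipeline reusing the same helper."""

    def process(self, frame: np.ndarray) -> np.ndarray:
        return normalize_frame(frame)
```

With this shape, neither pipeline depends on the other's class hierarchy; shared behaviour is one import away.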


@pytest.fixture
def full_rotation(number_of_rotations, rotation_len, intervals, dummy_signal):
dummy_signal += np.random.normal(
Member

To be more futureproof, use a numpy.random.Generator here instead of this legacy function.

Member

same applies to all other uses of np.random, I'll not flag them individually

Member Author

Thanks, I didn't know we were going to say goodbye to np.random.normal 🪦
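For reference, a minimal sketch of the suggested replacement; the array shape, scale, and seed are placeholders, not the fixture's actual values:

```python
import numpy as np

# Legacy style: dummy_signal += np.random.normal(...)
# Recommended: create a Generator once and draw from it.
rng = np.random.default_rng(seed=42)  # seeding keeps test fixtures reproducible
dummy_signal = np.zeros(100)
dummy_signal += rng.normal(loc=0.0, scale=0.1, size=dummy_signal.shape)
```

A seeded Generator passed into fixtures also makes test failures reproducible across runs.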

# when starting a new rotation in the middle of the image
rotated_filled_image = (
np.ones_like(image_stack[image_counter]) * min_value_img
) # non sampled pixels are set to the min val of the image
Member

I wonder whether this is a robust choice over time. The minimum image value just signifies very low intensity, so someone looking at the derotated dataset would not know whether a particular pixel was just dark or whether it was never sampled.

One idea would be the following:

  • initialise the new stack with all NaN values
  • For sampled pixels, fill their actual values
  • After rotation is done, optionally "fill" the missing NaN via interpolation from neighboring pixels.

This approach would work well for those missing curved lines visible on derotated images, but not so well for missing areas around the edges.
Your approach might be simpler after all.

Another idea would be to generate the same image as you are now (missing pixels are filled with min_value_img), but additionally produce a mask array with the indices of all "missing" pixels, so there is no ambiguity about it.
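Both ideas could be sketched roughly like this, on a toy 4×4 frame (variable names are illustrative, not the pipeline's):

```python
import numpy as np

# Idea 1: initialise the frame with NaN, fill only sampled pixels.
frame = np.full((4, 4), np.nan)
frame[1:3, 1:3] = 42.0  # pretend these pixels were sampled

# Idea 2: an explicit mask of unsampled pixels removes the ambiguity.
missing_mask = np.isnan(frame)

# Current behaviour (fill with the minimum), now paired with the mask:
min_value_img = 0.0
filled = np.where(missing_mask, min_value_img, frame)
```

The mask could be saved alongside the derotated stack, so downstream analysis can tell "dark" from "never sampled".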

Member Author

Thanks for the feedback Niko. Having this dark pixel was very useful in debugging because it helped me see clearly when and how the derotation made sense.

I have since changed the code and now use the $F_0$ value of the luminance distribution, the same one that suite2p should later calculate.

What do you think? I am unsure of filling-in with interpolation. Maybe I can leave the choice to the user and let them decide if they want $F_0$ or the interpolation

Member

I think F0 is fine for now. You may add an option for interpolation in a later PR.

from scipy.ndimage import rotate


def rotate_an_image_array_line_by_line(
Member

I think this function is quite complex and could use some refactoring to make it more understandable and maintainable. Just something to note for future 'refactoring' PRs.

Member Author

I agree, thanks

Comment on lines 21 to 33
ax[0].imshow(img, cmap="gray")
ax[0].set_title("Original image")
ax[1].imshow(img_rotated[0], cmap="gray")
ax[1].set_title("Rotated image 1")
ax[2].imshow(img_rotated[1], cmap="gray")
ax[2].set_title("Rotated image 2")
ax[3].imshow(img_rotated[2], cmap="gray")
ax[3].set_title("Rotated image 3")

ax[0].axis("off")
ax[1].axis("off")
ax[2].axis("off")
ax[3].axis("off")
Member

This could be refactored to something like:

image_names = ["Original image"] + [f"Rotated image {i}" for i in range(1, 4)]
images_to_plot = [img] + [img_rotated[i] for i in range(3)]
for i, image_name in enumerate(image_names):
    ax[i].imshow(images_to_plot[i], cmap="gray")
    ax[i].set_title(image_name)
    ax[i].axis("off")

But your version is more readable for Python novices.

Member Author

I will do this refactoring 👍

from scipy.io import loadmat


def read_randomized_stim_table(path_to_randperm):
Member

I will be annoyingly pedantic and ask you to add type hints and docstrings to all functions in this module. No need to be super-detailed, but some info about expected inputs/outputs would be nice.

I understand these functions are not user-facing, and you already provide some info in the FullPipeline.load_data() method docstring, but your future self (+ other developers) will be thankful. Especially if you expect people to modify these functions to adapt them to other setups.

Member

Also tests: these functions are barely tested, and they are vulnerable to breaking if the format of the input files changes (which happens very often).
At least try to catch things like missing fields?

Member Author

I am not sure how much care I should give to these particular methods, because they are specific to the way a single postdoc in a single lab saves the data. The idea is that people should re-write these methods for their own setup.
I think typing and tests should be there to verify whether the new implementations other people write actually work. Ideally they would target an abstract interface of data loaders (yet to be built).

Member

I understand that this implementation is specific to one person's data, but I think it's still useful to have at least some minimal docstrings, because they would be of great help when trying to adapt these functions for another project (at the very least they would help me understand what kind of loader functions I need to implement in the first place).
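A minimal sketch of what such an annotated loader might look like; the function name comes from the diff, but the "stim_table" field name and the file layout are assumptions:

```python
from pathlib import Path

import numpy as np
from scipy.io import loadmat


def read_randomized_stim_table(path_to_randperm: Path) -> np.ndarray:
    """Load the randomised stimulus permutation table from a ``.mat`` file.

    Parameters
    ----------
    path_to_randperm : Path
        Path to the MATLAB file produced by the acquisition setup.

    Returns
    -------
    np.ndarray
        The permutation table, one row per trial.

    Raises
    ------
    KeyError
        If the expected field is missing from the file.
    """
    mat = loadmat(path_to_randperm)
    if "stim_table" not in mat:  # guard against format changes
        raise KeyError(f"'stim_table' not found in {path_to_randperm}")
    return np.asarray(mat["stim_table"])
```

Even this level of annotation documents the expected field and fails loudly when the input format drifts.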

paths_write:
debug_plots_folder: "your_path_to/debug_plots/"
logs_folder: "your_path_to/logs/"
derotated_tiff_folder: "your_path_to/data_folder/"
Member

in some places you use tif and in some others (like here) you use tiff. To avoid typos and trivial errors, decide on one of them and apply it throughout
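One lightweight way to enforce a single spelling is a module-level constant that all path-building code goes through (the constant and helper names here are illustrative):

```python
# Single source of truth for the extension; grep for ".tiff"/".tif"
# elsewhere to catch stragglers.
TIFF_EXTENSION = ".tif"


def derotated_tiff_path(folder: str, stem: str) -> str:
    """Build the output path for a derotated movie."""
    return f"{folder}{stem}{TIFF_EXTENSION}"
```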

Comment on lines +14 to +15
from skimage.exposure import rescale_intensity
from sklearn.mixture import GaussianMixture
Member

scikit-learn and scikit-image have to be added to the dependencies
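Assuming the project declares its dependencies in pyproject.toml, that would be something like the following (surrounding entries and version pins omitted):

```toml
[project]
dependencies = [
    "scikit-image",
    "scikit-learn",
]
```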

Comment on lines +310 to +311
k : float
The factor used to quantify the threshold.
Member

Maybe add that it represents the number of std away from mean
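That is, something like the following (a synthetic signal here; the real code's variable names will differ):

```python
import numpy as np

signal = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
k = 2.0  # k = number of standard deviations away from the mean
threshold = signal.mean() + k * signal.std()
```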

Comment on lines +850 to +852
gm = GaussianMixture(n_components=7, random_state=0).fit(
img[0].reshape(-1, 1)
)
Member

Out of curiosity, why specifically 7 components?

Member Author

For now, I am following the guidelines of a 2p data analysis tutorial. I think they found empirically that 7 is generally a good number of components for this fit 🤔

Comment on lines +206 to +211
ax.scatter(
self.frame_start,
self.rot_deg_frame,
label="frame angles",
color="green",
marker="o",
Member

try using a smaller marker size than the default here, to improve readability of the plot
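For example, with toy data in place of the pipeline's frame angles (matplotlib's default marker area is rcParams["lines.markersize"] ** 2 == 36 points²):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# s is the marker area in points^2; s=5 is well below the default of 36.
sc = ax.scatter([0, 1, 2], [0, 90, 180], s=5, color="green", marker="o",
                label="frame angles")
```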

@lauraporta lauraporta merged commit 3011ddd into main Dec 19, 2023
22 checks passed
@lauraporta lauraporta deleted the code-cleanup branch December 19, 2023 22:45