Enhance cross-platform compatibility for loading PySRRegressor models #681

zzccchen · 2024-07-27T03:08:35Z

This PR improves the cross-platform compatibility of the PySRRegressor.from_file method, allowing seamless loading of models saved on different operating systems (Windows and Linux).

Key changes:

Introduced a custom unpickler (CustomUnpickler) to handle path objects from different operating systems.
Added a path_to_str function to convert path objects to strings, ensuring consistent path handling across platforms.
Updated the model loading process to use the custom unpickler and path conversion.
Maintained the existing functionality for creating models from scratch when a pickle file is not found.

These changes allow users to load PySRRegressor models on any supported platform, regardless of where the model was originally saved. This enhancement is particularly useful for users working in mixed environments or collaborating across different operating systems.

The modifications are contained within the from_file method to minimize impact on other parts of the codebase. The method remains backwards compatible with existing usage patterns.

Testing:

Tested loading models saved on Linux from a Windows environment.
Verified that the existing functionality for creating models from scratch remains intact.

This enhancement addresses the issue of cross-platform incompatibility when loading saved models, improving the overall user experience and flexibility of PySRRegressor.

for more information, see https://pre-commit.ci

coveralls · 2024-07-27T03:24:53Z

Pull Request Test Coverage Report for Build 10138721258

Details

15 of 17 (88.24%) changed or added relevant lines in 2 files are covered.
3 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.3%) to 93.448%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pysr/utils.py	12	14	85.71%

Files with Coverage Reduction	New Missed Lines	%
pysr/export_sympy.py	3	77.27%

Totals
Change from base Build 10137135457:	-0.3%
Covered Lines:	1141
Relevant Lines:	1221

💛 - Coveralls

MilesCranmer · 2024-07-28T14:59:41Z

Thanks, this is great! I'll do a code review for suggesting some changes

pysr/sr.py

MilesCranmer

Thanks again, added some comments. Also could you add some unittests for CustomUnpickler? The test should be one that is currently failing, so this CustomUnpickler is required to get that unittest working.

pysr/sr.py

for more information, see https://pre-commit.ci

MilesCranmer · 2024-07-29T08:15:35Z

(By the way, feel free to click "Resolved" on any of the comments once they are completed)

zzccchen · 2024-07-29T08:18:57Z

(By the way, feel free to click "Resolved" on any of the comments once they are completed)

Got it :)

MilesCranmer · 2024-07-29T08:20:30Z

Looking good! Thanks. I think it just needs some tests now – ideally tests that fail with the current master branch, but are fixed by this PR

MilesCranmer · 2024-07-29T08:22:09Z

One option is to add a test that is only run on the Windows tests – it could generate and then pickle the model on Windows, and then create and run a dockerfile that attempts to unpickle the model? (Since docker runs linux).

zzccchen · 2024-07-29T08:34:20Z

Looking good! Thanks. I think it just needs some tests now – ideally tests that fail with the current master branch, but are fixed by this PR

Thank you for the feedback. I agree tests would be valuable, but I'm not experienced in writing test cases. Could you provide some guidance or examples for appropriate tests for these changes?

zzccchen · 2024-07-29T08:35:29Z

One option is to add a test that is only run on the Windows tests – it could generate and then pickle the model on Windows, and then create and run a dockerfile that attempts to unpickle the model? (Since docker runs linux).

I appreciate your suggestion, but I'm afraid I'm a bit lost on how to implement this. The process of generating, pickling, and then using Docker for unpickling sounds complex. I'm not sure where to start. Would you be able to help implement this test, or provide a more detailed guide?

MilesCranmer · 2024-07-29T09:07:14Z

Ok what I would do is:

Copy https://github.com/MilesCranmer/PySR/blob/master/pysr/test/test_dev.py to pysr/test/test_pickle.py
Copy https://github.com/MilesCranmer/PySR/blob/master/pysr/test/test_dev_pysr.dockerfile to pysr/test/test_pickle.dockerfile

Modify the code

PySR/pysr/test/test_dev.py

Lines 8 to 47 in 3aee19e

    
           def test_simple_change_to_backend(self): 
        
               """Test that we can use a development version of SymbolicRegression.jl""" 
        
               PYSR_TEST_JULIA_VERSION = os.environ.get("PYSR_TEST_JULIA_VERSION", "1.6") 
        
               PYSR_TEST_PYTHON_VERSION = os.environ.get("PYSR_TEST_PYTHON_VERSION", "3.9") 
        
               build_result = subprocess.run( 
        
                   [ 
        
                       "docker", 
        
                       "build", 
        
                       "-t", 
        
                       "pysr-dev", 
        
                       "--build-arg", 
        
                       f"JLVERSION={PYSR_TEST_JULIA_VERSION}", 
        
                       "--build-arg", 
        
                       f"PYVERSION={PYSR_TEST_PYTHON_VERSION}", 
        
                       "-f", 
        
                       "pysr/test/test_dev_pysr.dockerfile", 
        
                       ".", 
        
                   ], 
        
                   env=os.environ, 
        
                   cwd=Path(__file__).parent.parent.parent, 
        
                   universal_newlines=True, 
        
               ) 
        
               self.assertEqual(build_result.returncode, 0) 
        
               test_result = subprocess.run( 
        
                   [ 
        
                       "docker", 
        
                       "run", 
        
                       "--rm", 
        
                       "pysr-dev", 
        
                       "python3", 
        
                       "-c", 
        
                       "from pysr import SymbolicRegression as SR; print(SR.__test_function())", 
        
                   ], 
        
                   stdout=subprocess.PIPE, 
        
                   stderr=subprocess.PIPE, 
        
                   env=os.environ, 
        
                   cwd=Path(__file__).parent.parent.parent, 
        
               ) 
        
               self.assertEqual(test_result.returncode, 0) 
        
               self.assertEqual(test_result.stdout.decode("utf-8").strip(), "2.3")

within test_pickle.py to run a simple PySR example where the equation file parameter set to some Path to a folder created with https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory. Also change the class name to TestPickle or something.

Modify

PySR/pysr/test/test_dev.py

Lines 51 to 59 in 3aee19e

    
           tests = [TestDev] 
        
           if just_tests: 
        
               return tests 
        
           suite = unittest.TestSuite() 
        
           loader = unittest.TestLoader() 
        
           for test in tests: 
        
               suite.addTests(loader.loadTestsFromTestCase(test)) 
        
           runner = unittest.TextTestRunner() 
        
           return runner.run(suite)

to reference TestPickle.

Search for any appearances of test_dev throughout the codebase, and use that to figure out where you need to add the equivalent test_pickle.

Modify the test_pickle.dockerfile, specifically all of these lines:

PySR/pysr/test/test_dev_pysr.dockerfile

Lines 34 to 55 in 3aee19e

    
           RUN mkdir /pysr/pysr/test 
        
           # Now, we create a custom version of SymbolicRegression.jl 
        
           # First, we get the version from juliapkg.json: 
        
           RUN python3 -c 'import json; print(json.load(open("/pysr/pysr/juliapkg.json", "r"))["packages"]["SymbolicRegression"]["version"])' > /pysr/sr_version 
        
           # Remove any = or ^ or ~ from the version: 
        
           RUN cat /pysr/sr_version | sed 's/[\^=~]//g' > /pysr/sr_version_processed 
        
           # Now, we check out the version of SymbolicRegression.jl that PySR is using: 
        
           RUN git clone -b "v$(cat /pysr/sr_version_processed)" --single-branch https://github.com/MilesCranmer/SymbolicRegression.jl /srjl 
        
           # Edit SymbolicRegression.jl to create a new function. 
        
           # We want to put this function immediately after `module SymbolicRegression`: 
        
           RUN sed -i 's/module SymbolicRegression/module SymbolicRegression\n__test_function() = 2.3/' /srjl/src/SymbolicRegression.jl 
        
           # Edit PySR to use the custom version of SymbolicRegression.jl: 
        
           ADD ./pysr/test/generate_dev_juliapkg.py /generate_dev_juliapkg.py 
        
           RUN python3 /generate_dev_juliapkg.py /pysr/pysr/juliapkg.json /srjl 
        
           # Install and pre-compile 
        
           RUN pip3 install --no-cache-dir . && python3 -c 'import pysr'

to try to load the pickle file generated by the test you created in (3). And basically just verify that the equations loaded are correct.

Note how the code here:

PySR/pysr/test/test_dev.py

Lines 12 to 47 in 3aee19e

    
           build_result = subprocess.run( 
        
               [ 
        
                   "docker", 
        
                   "build", 
        
                   "-t", 
        
                   "pysr-dev", 
        
                   "--build-arg", 
        
                   f"JLVERSION={PYSR_TEST_JULIA_VERSION}", 
        
                   "--build-arg", 
        
                   f"PYVERSION={PYSR_TEST_PYTHON_VERSION}", 
        
                   "-f", 
        
                   "pysr/test/test_dev_pysr.dockerfile", 
        
                   ".", 
        
               ], 
        
               env=os.environ, 
        
               cwd=Path(__file__).parent.parent.parent, 
        
               universal_newlines=True, 
        
           ) 
        
           self.assertEqual(build_result.returncode, 0) 
        
           test_result = subprocess.run( 
        
               [ 
        
                   "docker", 
        
                   "run", 
        
                   "--rm", 
        
                   "pysr-dev", 
        
                   "python3", 
        
                   "-c", 
        
                   "from pysr import SymbolicRegression as SR; print(SR.__test_function())", 
        
               ], 
        
               stdout=subprocess.PIPE, 
        
               stderr=subprocess.PIPE, 
        
               env=os.environ, 
        
               cwd=Path(__file__).parent.parent.parent, 
        
           ) 
        
           self.assertEqual(test_result.returncode, 0) 
        
           self.assertEqual(test_result.stdout.decode("utf-8").strip(), "2.3")

builds and runs a dockerfile. Do the same in your new test_pickle.py, but reference the test_pickle.dockerfile. You should be able to pass the filename as an argument.

I think this should work. It might be overcomplicated though, maybe there's an easier way to test this. Feel free to take a different approach to test your new addition. For simple pure-Python tests you can just add a new test case to the class in test.py, such as

PySR/pysr/test/test.py

Lines 285 to 297 in 3aee19e

    
           X = self.rstate.randn(100, 3) + 1j * self.rstate.randn(100, 3) 
        
           y = (2 + 1j) * np.cos(X[:, 0] * (0.5 - 0.3j)) 
        
           model = PySRRegressor( 
        
               binary_operators=["+", "-", "*"], 
        
               unary_operators=["cos"], 
        
               **self.default_test_kwargs, 
        
               early_stop_condition="(loss, complexity) -> loss <= 1e-4 && complexity <= 6", 
        
           ) 
        
           model.niterations = DEFAULT_NITERATIONS * 10 
        
           model.fit(X, y) 
        
           test_y = model.predict(X) 
        
           self.assertTrue(np.issubdtype(test_y.dtype, np.complexfloating)) 
        
           self.assertLessEqual(np.average(np.abs(test_y - y) ** 2), 1e-4)

zzccchen and others added 2 commits July 27, 2024 11:03

Enhance cross-platform compatibility for loading PySRRegressor models

cc2ed43

[pre-commit.ci] auto fixes from pre-commit.com hooks

8f984db

for more information, see https://pre-commit.ci