Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorise split linestring #14

Open
tlestang opened this issue Jul 16, 2021 · 2 comments
Open

Vectorise split linestring #14

tlestang opened this issue Jul 16, 2021 · 2 comments
Labels
big Looks like a lot of work, may link to smaller issues feature New feature or functionality

Comments

@tlestang
Copy link
Contributor

tlestang commented Jul 16, 2021

Subissue of #8

We'd like to use Shapely or PyGEOS on the Pythoon side (user side) because this is what GeoPandas outputs after reading files. Therefore we want our C++ function to receive a Shapely or PyGEOS LineString as an argument.

Pybind11 is available to convert from a shapely.coords.CoordinateSequence object to a std::vector<std::tuple<double, double>>:

double fun(std::vector<std::tuple<double, double>> linestring) {
  std::tuple<double, double> point1 = linestring[0];
  return std::get<0>(point1); //return x value
}

PYBIND11_MODULE(example, m) {
  m.doc() = "pybind11 example plugin"; // optional module docstring
  m.def("fun", &fun, "A function");
}
from shapely.geometry import LineString
import example

example.fun(
    LineString([(0,0), (1,0)]).coords
)
# 0.0

The conversion from Python type to C++ type implies a copy. In addition, we would still have to convert the Linestring to a vector of Point

std::vector<Point> linestring;
for (auto point_as_tuple : linestring_frompy) {
  Point p = Point(point_as_tuple); // Construct Point object from tuple
  linestring.push_back(p);
 }

This conversion would have to be done for each element in the vector dataset.

To avoid the first conversion we could work with native Python types, through Pybind11's pybind11::object class:

double fun(py::object linestring_frompy) {
  py::object coords = linestring_frompy.attr("coords");
  int size = coords.attr("__len__")();
  // Build linestring from CoordinateSequence Python object
  std::vector<Point> linestring;
  for (int i = 0; i<size;i++) {
    // Provided Point constructor takes a std::tuple
    Point p = Point(coords.attr("__getitem__")(i)); 
    linestring.push_back(p);
  }
}

Both Shapely and PyGEOS are wrapper around GEOS. For example PyGEOS geometries contain a pointer to a GEOS GEOSGeometry object. Therefore it sounds redundant to convert a PyGEOS Python geometry to a C++ vector of Point, when we could use GEOS on the C++ side and grab the GEOSGeometry pointer attached to the Python object.This seems to be the main idea behind PyGEOS as explained in this blog post by Matthew Rockling (see section Cythonizing GeoPandas).

PyGEOS geometry object have an attribute _ptr which I think is a pointer to GEOSGeometry:

// in pygeos/src/pygeom.c
static PyMemberDef GeometryObject_members[] = {
    {"_ptr", T_PYSSIZET, offsetof(GeometryObject, ptr), READONLY,
     "pointer to GEOSGeometry"},
    {"_ptr_prepared", T_PYSSIZET, offsetof(GeometryObject, ptr_prepared), READONLY,
     "pointer to PreparedGEOSGeometry"},
    {NULL} /* Sentinel */
};

However I don't know how to access this pointer from Pybind11, as attr("_ptr") returns another pybind11::object from which I cannot define a new GEOSGeometry pointer.

In conclusion, the ideal solution would be to work directly with GEOS in the C++ side, working with the objects that underly the PyGEOS objects - but I'm stuck for now.

An alternative approach would to stick to Python types (option 1) and define a wrapper around a Shapely/PyGEOS class:

class LineString {
  pybind11::object coord_seq;

public:
  LineString(pybind11::object linestring_frompy) {
    coord_seq = linestring_frompy.attr("coords");
  }
  Point getPoint(const int idx){
    return Point(coords.attr("__getitem__")(idx));
  }
  void setPoint(const int idx){
    // ...
  }
};
@tomalrussell
Copy link
Member

As discussed, a related piece of the puzzle is how best to operate on a whole "geometry" column / PyGEOS vector.

Is this best as passing the column as in as an argument to the C++ code? Can we interpret the numpy array as a vector of pointers to GEOS objects through pybind11? Can we access the coordinates with a minimum of copies directly, and construct a new array of split geometries to return?

Or is it simpler and fast enough to work with pygeos to return a representation of raw coordinates in numpy arrays of (coordinates, offsets) as in pygeos/pygeos#75? The example there is for multipolygons, with offsets to index 1) multipolygon parts 2) exterior/interior rings of the parts 3) coordinates of the rings. For linestrings we would presumably need only one offsets array for the coordinates of the linestrings.

@thomas-fred thomas-fred changed the title Interface between Shapely/PyGEOS and C++ Vectorise split linestring for PyGEOS objects May 25, 2022
@thomas-fred thomas-fred added big Looks like a lot of work, may link to smaller issues feature New feature or functionality labels May 25, 2022
@tomalrussell tomalrussell changed the title Vectorise split linestring for PyGEOS objects Vectorise split linestring Aug 22, 2023
@tomalrussell
Copy link
Member

With shapely 2.0 and geoarrow, the ragged_array representation of coordinates and offsets might be simplest to work with, cf:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
big Looks like a lot of work, may link to smaller issues feature New feature or functionality
Projects
None yet
Development

No branches or pull requests

3 participants