Compiling from source fails to find pyarrow #314

Open
pecigonzalo opened this issue Jun 29, 2021 · 7 comments

Comments

@pecigonzalo

pecigonzalo commented Jun 29, 2021

As already reflected in #276, compiling from source fails to find pyarrow outside of conda.

This uses a pyarrow installation from wheels.

Reproduce with this Dockerfile:

FROM python:3.8-slim-buster
RUN apt-get update \
    && apt-get install --no-install-recommends -y \
    g++ \
    ninja-build cmake git-core wget \
    libboost-all-dev \
    unixodbc unixodbc-dev \
    python-dev \
    && apt-get clean

RUN pip install --user pybind11==2.6.2 pyarrow==3.0.0

# Attempt to make the container find the pyarrow lib.
ENV LD_LIBRARY_PATH=/root/.local/lib/python3.8/site-packages/pyarrow:$LD_LIBRARY_PATH
RUN pip install --user turbodbc==4.2.0

I don't understand why #276 was closed, as many users are reporting the exact same issue. The issue is likely due to the pyarrow .so files being suffixed with the version, e.g. .300 for version 3.0.0.

In the following comment (which links to the actual comments), compiling from source is mentioned as the alternative, since symlinking the names reportedly will not work, but it's not clear what needs to be compiled.
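One way to check the suffix hypothesis is to list which versioned libraries in the pyarrow directory lack the unversioned name that `ld` searches for when given `-larrow`. The following is a minimal sketch using only the standard library; the helper name `missing_unversioned_names` is made up for illustration, and the directory to pass in is an assumption (for this report it would be the `site-packages/pyarrow` folder).

```python
import re
from pathlib import Path

# Matches versioned shared objects such as "libarrow.so.300".
_VERSIONED = re.compile(r"^(lib.+\.so)\.\d+$")


def missing_unversioned_names(lib_dir):
    """Return unversioned library names that have no file in lib_dir.

    For every "libfoo.so.NNN" found, "libfoo.so" is reported when it
    does not exist next to it. The linker resolves "-lfoo" through the
    unversioned name, so each reported name is a likely
    "cannot find -lfoo" candidate.
    """
    lib_dir = Path(lib_dir)
    missing = set()
    for entry in lib_dir.iterdir():
        match = _VERSIONED.match(entry.name)
        if match and not (lib_dir / match.group(1)).exists():
            missing.add(match.group(1))
    return sorted(missing)
```

If the function reports `libarrow.so` and `libarrow_python.so` for the wheel installation, that would be consistent with the two `cannot find` errors in the log below.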

Sample error output:

[...]
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp: In member function ‘void turbodbc_arrow::{anonymous}::string_converter::rebind_to_maximum_length(const arrow::BinaryArray&, std::size_t, std::size_t)’:
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:101:33: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
#7 416.6                for (int64_t i = 0; i != elements; ++i) {
#7 416.6                                    ~~^~~~~~~~~~~
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp: In member function ‘void turbodbc_arrow::{anonymous}::string_converter::set_batch_utf16(std::size_t, std::size_t)’:
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:140:31: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
#7 416.6              for (int64_t i = 0; i != elements; ++i) {
#7 416.6                                  ~~^~~~~~~~~~~
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp: In function ‘std::shared_ptr<arrow::Table> turbodbc_arrow::unwrap_pyarrow_table(const pybind11::object&)’:
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:427:64: warning: ‘arrow::Status arrow::py::unwrap_table(PyObject*, std::shared_ptr<arrow::Table>*)’ is deprecated: Use Result-returning version [-Wdeprecated-declarations]
#7 416.6          if (not arrow::py::unwrap_table(pyarrow_table.ptr(), &table).ok()) {
#7 416.6                                                                     ^
#7 416.6     In file included from src/turbodbc_arrow/set_arrow_parameters.cpp:3:
#7 416.6     /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:54:30: note: declared here
#7 416.6        ARROW_PYTHON_EXPORT Status unwrap_##FUNC_SUFFIX(PyObject*,                           \
#7 416.6                                   ^~~~~~~
#7 416.6     /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:54:30: note: in definition of macro ‘DECLARE_WRAP_FUNCTIONS’
#7 416.6        ARROW_PYTHON_EXPORT Status unwrap_##FUNC_SUFFIX(PyObject*,                           \
#7 416.6                                   ^~~~~~~
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:427:64: warning: ‘arrow::Status arrow::py::unwrap_table(PyObject*, std::shared_ptr<arrow::Table>*)’ is deprecated: Use Result-returning version [-Wdeprecated-declarations]
#7 416.6          if (not arrow::py::unwrap_table(pyarrow_table.ptr(), &table).ok()) {
#7 416.6                                                                     ^
#7 416.6     In file included from src/turbodbc_arrow/set_arrow_parameters.cpp:3:
#7 416.6     /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:54:30: note: declared here
#7 416.6        ARROW_PYTHON_EXPORT Status unwrap_##FUNC_SUFFIX(PyObject*,                           \
#7 416.6                                   ^~~~~~~
#7 416.6     /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:54:30: note: in definition of macro ‘DECLARE_WRAP_FUNCTIONS’
#7 416.6        ARROW_PYTHON_EXPORT Status unwrap_##FUNC_SUFFIX(PyObject*,                           \
#7 416.6                                   ^~~~~~~
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp: In instantiation of ‘void turbodbc_arrow::{anonymous}::string_converter::set_batch_of_type(std::size_t, std::size_t) [with String = std::__cxx11::basic_string<char>; std::size_t = long unsigned int]’:
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:173:57:   required from here
#7 416.6     src/turbodbc_arrow/set_arrow_parameters.cpp:121:33: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
#7 416.6                for (int64_t i = 0; i != elements; ++i) {
#7 416.6                                    ~~^~~~~~~~~~~
#7 416.6     In file included from /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/platform.h:28,
#7 416.6                      from /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:20,
#7 416.6                      from src/turbodbc_arrow/set_arrow_parameters.cpp:3:
#7 416.6     /usr/local/include/python3.8/datetime.h: At global scope:
#7 416.6     /usr/local/include/python3.8/datetime.h:189:25: warning: ‘PyDateTimeAPI’ defined but not used [-Wunused-variable]
#7 416.6      static PyDateTime_CAPI *PyDateTimeAPI = NULL;
#7 416.6                              ^~~~~~~~~~~~~
#7 416.6     gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Iinclude/ -I/root/.local/lib/python3.8/site-packages/pybind11/include -I/root/.local/lib/python3.8/site-packages/pyarrow/include -I/usr/local/include/python3.8 -c src/turbodbc_arrow/arrow_result_set.cpp -o build/temp.linux-x86_64-3.8/src/turbodbc_arrow/arrow_result_set.o --std=c++11 -fvisibility=hidden
#7 416.6     In file included from /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/platform.h:28,
#7 416.6                      from /root/.local/lib/python3.8/site-packages/pyarrow/include/arrow/python/pyarrow.h:20,
#7 416.6                      from src/turbodbc_arrow/arrow_result_set.cpp:7:
#7 416.6     /usr/local/include/python3.8/datetime.h:189:25: warning: ‘PyDateTimeAPI’ defined but not used [-Wunused-variable]
#7 416.6      static PyDateTime_CAPI *PyDateTimeAPI = NULL;
#7 416.6                              ^~~~~~~~~~~~~
#7 416.6     g++ -pthread -shared -Wl,--strip-all build/temp.linux-x86_64-3.8/src/turbodbc_arrow/python_bindings.o build/temp.linux-x86_64-3.8/src/turbodbc_arrow/set_arrow_parameters.o build/temp.linux-x86_64-3.8/src/turbodbc_arrow/arrow_result_set.o -Lbuild/lib.linux-x86_64-3.8 -L/root/.local/lib/python3.8/site-packages/pyarrow -L/usr/local/lib -lodbc -larrow -larrow_python -lturbodbc.cpython-38-x86_64-linux-gnu -o build/lib.linux-x86_64-3.8/turbodbc_arrow_support.cpython-38-x86_64-linux-gnu.so -Wl,-rpath,$ORIGIN -Wl,-rpath,$ORIGIN/pyarrow
#7 416.6     /usr/bin/ld: cannot find -larrow
#7 416.6     /usr/bin/ld: cannot find -larrow_python
#7 416.6     collect2: error: ld returned 1 exit status
#7 416.6     error: command 'g++' failed with exit status 1
#7 416.6     ----------------------------------------
#7 416.6 ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ji0uyzrh/turbodbc_7775e8dbafcc47e48727f140b05fac07/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ji0uyzrh/turbodbc_7775e8dbafcc47e48727f140b05fac07/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-dinks7e7/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /root/.local/include/python3.8/turbodbc Check the logs for full command output.
#7 ERROR: executor failed running [/bin/sh -c pip install --user turbodbc==4.2.0]: exit code: 1
@pecigonzalo
Author

pecigonzalo commented Jun 29, 2021

This build works:

FROM python:3.8-slim-buster as deps
RUN apt-get update \
    && apt-get install --no-install-recommends -y \
    g++ \
    ninja-build cmake git-core wget \
    libboost-all-dev \
    unixodbc unixodbc-dev \
    python-dev \
    && apt-get clean

RUN pip install --user pybind11==2.6.2 pyarrow==3.0.0

FROM deps as turbodbc

RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_flight.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_flight.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python_flight.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libarrow_python_flight.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libparquet.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libparquet.so
RUN ln -rvs \
  /root/.local/lib/python3.8/site-packages/pyarrow/libplasma.so.300 \
  /root/.local/lib/python3.8/site-packages/pyarrow/libplasma.so
RUN pip install --user turbodbc==4.2.0

But I don't know if the software will work, as a commenter in the linked issue made the following remark:

You will end up with random segmentation faults otherwise.

in reference to symlinking.

This also means we can't define turbodbc==4.2.0 in a requirements.txt together with pyarrow, because we need to do a manual step in between.

@pecigonzalo
Author

pecigonzalo commented Jun 29, 2021

The fix that was mentioned in the previous issue is likely the one in this doc https://arrow.apache.org/docs/python/extending.html#building-extensions-against-pypi-wheels and referenced in this comment #276 (comment).

I think it's a bad call from pyarrow to ask consumers to modify the installation.
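For reference, the linked doc describes asking pyarrow itself for its include and library locations after `pyarrow.create_library_symlinks()` has been run once. The sketch below shows how those helpers could feed a build configuration. It is not how turbodbc's own setup.py is written; the function name `arrow_link_settings` is hypothetical, and the module is passed in as a parameter so the snippet does not require pyarrow at import time.

```python
def arrow_link_settings(pa):
    """Collect compile and link settings from a pyarrow-like module.

    `pa` is expected to provide get_include(), get_libraries() and
    get_library_dirs(), as the pyarrow package does.
    """
    return {
        "include_dirs": [pa.get_include()],
        "libraries": list(pa.get_libraries()),
        "library_dirs": list(pa.get_library_dirs()),
    }


# Possible use in a setup.py (assumption: pyarrow is already installed and
# pyarrow.create_library_symlinks() has been run; "my_ext" is a placeholder):
#
#   import pyarrow
#   from setuptools import Extension
#   ext = Extension("my_ext", ["my_ext.cpp"], **arrow_link_settings(pyarrow))
```

This still leaves the ordering problem described above: pyarrow has to be installed and symlinked before the extension build starts.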

@ldacey
Contributor

ldacey commented Oct 29, 2021

The documentation you linked was helpful for me. I am now able to get turbodbc up and running without conda for the first time. I am installing pyarrow in a separate RUN command with some other dependencies, then I have a line which runs the create_library_symlinks() command. Finally, the rest of my requirements (including turbodbc and airflow-providers-odbc) are installed.

RUN pip install --user --upgrade pip \
    && pip install --no-cache --user \
    python-snappy \
    pybind11 \
    numpy \
    pyarrow==5.0.0 \
    apache-airflow[password,crypto]==${AIRFLOW_VERSION}

RUN python -c "import pyarrow; pyarrow.create_library_symlinks()"

RUN pip install --no-cache --user -r requirements.txt

@ldacey
Contributor

ldacey commented Oct 31, 2021

Well, the build worked, but then turbodbc was not able to find pyarrow during actual tasks. Both libraries are installed in the same environment. I will try @pecigonzalo's approach with symlinks.

I know this works with conda, but I want to move towards using the official apache/airflow image which does not use conda. The only failure is turbodbc right now.
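A possible starting point for debugging the runtime failure: the link command in the first comment's log embeds the rpath entries `$ORIGIN` and `$ORIGIN/pyarrow`, so at load time the Arrow libraries are searched relative to the compiled turbodbc extension. The sketch below expands those entries for a given extension file and lists the Arrow libraries found there. The function names are made up for illustration, the default rpaths are copied from that log and may differ in other builds, and the loader also consults `LD_LIBRARY_PATH` and the system cache, which this does not model.

```python
from pathlib import Path


def expand_rpaths(extension_path, rpaths=("$ORIGIN", "$ORIGIN/pyarrow")):
    """Replace $ORIGIN with the directory that contains the extension."""
    origin = Path(extension_path).resolve().parent
    return [Path(r.replace("$ORIGIN", str(origin))) for r in rpaths]


def find_arrow_libs(extension_path, pattern="libarrow.so*"):
    """Return names of Arrow libraries visible through the rpath entries."""
    found = []
    for directory in expand_rpaths(extension_path):
        if directory.is_dir():
            found.extend(sorted(p.name for p in directory.glob(pattern)))
    return found
```

An empty result for the installed `turbodbc_arrow_support` extension would suggest that pyarrow does not sit in a `pyarrow` folder next to it, for example when the two packages end up in different site-packages directories.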

@DevangB9

I am facing the same issue. @ldacey, did you find any solution?

@xhochy I went through #276 and #227.

I'm using Ubuntu 20.04 on a Windows system. Any help would be great. Thanks a lot.

@ldacey
Contributor

ldacey commented Feb 21, 2022

Negative. I ended up installing with mamba instead and used a package called conda-pack to avoid having conda installed in my final image.

COPY ${ENV_FILE} /conda-env.yml

# Create the conda environment from conda-env.yml, pack it, and unpack it into ${VIRTUAL_ENV} so it can be copied from the /venv folder
RUN mamba env create -f /conda-env.yml \
    && /opt/conda/envs/airflow/bin/conda-pack --name airflow --ignore-missing-files --output /tmp/env.tar.gz \
    && mkdir -p ${VIRTUAL_ENV} \
    && cd ${VIRTUAL_ENV} \
    && tar -xvf /tmp/env.tar.gz \
    && rm /tmp/env.tar.gz \
    && ${VIRTUAL_ENV}/bin/conda-unpack \
    && conda clean -afy

WORKDIR ${VIRTUAL_ENV}

My final image copies my venv folder, which results in a working pyarrow without anaconda installed.

COPY --chown=airflow:root --from=python-dependencies /venv /venv

I am still hoping for the day when I can pip install everything since a chunk of my most important libraries are not on conda at all.

@david-engelmann
Contributor

david-engelmann commented May 13, 2022

I am facing the same issue. @ldacey, did you find any solution?

@xhochy I went through #276 and #227.

I'm using Ubuntu 20.04 on a Windows system. Any help would be great. Thanks a lot.

@DevangB9 I recently was able to solve this issue and posted it in this comment.
