Embed the native libraries in the hadoop-lzo jar #73

Closed
julienledem opened this issue Jul 3, 2013 · 17 comments

Comments

@julienledem

The snappy-java library has this cool feature of embedding the native libraries in the jar and loading the correct one depending on the OS. That would be a cool feature to add to LZO and would make testing easier.
https://github.com/xerial/snappy-java

In particular:
https://github.com/xerial/snappy-java/blob/develop/src/main/java/org/xerial/snappy/SnappyLoader.java
https://github.com/xerial/snappy-java/tree/develop/src/main/resources/org/xerial/snappy/native

@sjlee
Collaborator

sjlee commented Jul 8, 2013

It sounds like a good idea. I can see how it can make testing easier.

There are a couple of caveats:
First, unlike snappy, hadoop-lzo depends on another native library (lzo itself) being present on the machine. The lzo library needs to be installed on the machine and added to the path environment variable before hadoop-lzo can run successfully, so it would not be "zero-configuration" even if we embed the hadoop-lzo native lib in the jar. In that sense, the value of this may be somewhat limited.

Second, it seems like snappy checks the built native libs back into git! So both the native source and the built libraries are checked into git. That seems a little bit yucky to me. It also probably implies that the release process would be a two-step process. But if we want to embed libraries for more than one platform, this may be inevitable (there will never be a single build that builds the whole thing anyway).

At any rate, a pull request is always welcome! :)

@julienledem
Author

Hi @sjlee
The goal is really to get to a self-contained jar with no other dependency, so maybe the native JNI library could statically link the lzo library to avoid that dependency. I agree that checking in the binaries is not great, especially as it hides how those binaries were built. The Java code that decides which library to load based on the OS, then extracts that library somewhere and loads it, is interesting though.
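
As a rough illustration of the "decide which library to load based on the OS" step, here is a minimal sketch. It is not the snappy-java code and not anything that exists in hadoop-lzo: the class name `NativePlatform`, the `/native/<os>-<arch>/` resource layout, and the normalization rules are all assumptions made up for this example. Only `System.mapLibraryName` and the `os.name` / `os.arch` system properties are standard Java.

```java
import java.util.Locale;

/** Hypothetical helper: maps an OS name and CPU architecture to a resource path inside the jar. */
final class NativePlatform {
    private NativePlatform() {}

    /** Normalizes an os.name value such as "Mac OS X" or "Windows 10" (assumed naming scheme). */
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return "Linux";
        if (os.contains("mac") || os.contains("darwin")) return "Mac";
        if (os.contains("windows")) return "Windows";
        // Unknown OS: strip characters that are awkward in a path segment.
        return osName.replaceAll("\\W", "");
    }

    /** Normalizes an os.arch value; "amd64" and "x86_64" name the same architecture. */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        return arch.replaceAll("\\W", "");
    }

    /**
     * Builds the (assumed) resource path of the embedded library.
     * System.mapLibraryName adds the prefix and suffix of the platform the JVM is
     * running on, e.g. "foo" becomes "libfoo.so" on Linux.
     */
    static String resourcePath(String osName, String osArch, String libName) {
        return "/native/" + normalizeOs(osName) + "-" + normalizeArch(osArch)
                + "/" + System.mapLibraryName(libName);
    }

    /** Convenience overload for the platform the JVM is currently running on. */
    static String resourcePathForCurrentPlatform(String libName) {
        return resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"), libName);
    }
}
```

A caller would pass the result to `Class.getResourceAsStream` to find out whether a matching binary was packaged at all.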

@sjlee
Collaborator

sjlee commented Jul 8, 2013

Yeah, I agree there are some nifty ideas in that Java code that loads the library. As for static linking, I think it would make the jar self-contained. However, I do want us to think through the implications of statically linking lzo vs. dynamically linking it (upgrade implications, any resource usage implications, etc.).

@sjlee
Collaborator

sjlee commented Oct 4, 2013

I'd like to restart this discussion. I think there are at least a couple of different ways of embedding the native binaries, each of which would have its pros and cons.

One approach is to generate and embed the native binaries into the jar at build time. This approach is lightweight and avoids having to maintain separate native binaries under source control, yet it would deliver most of the benefit and make the jar more self-contained. During native library loading, the code could check for the embedded native library and load it from there if found. If not found, it could simply fall back on the current behavior (i.e. finding it in the library path). So none of the existing use cases would be disturbed; they would only gain convenience.
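
The "load the embedded copy if present, otherwise fall back to the library path" behavior could be sketched roughly as follows. This is an illustration only, not the code from any pull request: the class name `EmbeddedNativeLoader`, the `Source` enum, and the idea of passing the resource path in as a parameter are assumptions for this example. The standard calls it relies on are `Class.getResourceAsStream`, `System.load`, and `System.loadLibrary`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader: prefers a native library embedded in the jar, else uses the library path. */
final class EmbeddedNativeLoader {
    /** Reports where the library was loaded from. */
    enum Source { EMBEDDED, LIBRARY_PATH }

    private EmbeddedNativeLoader() {}

    /**
     * Copies the classpath resource to a temporary file, because System.load needs a
     * real file on disk. Returns null if the resource is absent or cannot be copied.
     */
    static Path extract(Class<?> owner, String resourcePath) {
        try (InputStream in = owner.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return null; // nothing embedded for this platform
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path tmp = Files.createTempFile("embedded-", "-" + fileName);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            return null; // treat an unreadable resource the same as a missing one
        }
    }

    /**
     * Tries the embedded library first. If it is missing or does not link on this
     * platform, falls back to System.loadLibrary, which searches java.library.path
     * and throws UnsatisfiedLinkError if the library is not found there either.
     */
    static Source load(Class<?> owner, String resourcePath, String libName) {
        Path extracted = extract(owner, resourcePath);
        if (extracted != null) {
            try {
                System.load(extracted.toAbsolutePath().toString());
                return Source.EMBEDDED;
            } catch (UnsatisfiedLinkError e) {
                // Embedded binary does not match this platform; fall through.
            }
        }
        System.loadLibrary(libName);
        return Source.LIBRARY_PATH;
    }
}
```

Because the fallback is the existing `System.loadLibrary` call, a jar without any embedded binary behaves exactly as before.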

The other approach is to create an area to check in the native libraries (like snappy). While the benefit is that a single jar can support multiple OSes and platforms, the drawback is that this could incur more significant maintenance burden (every time the native source changes, one needs to generate the native libraries for all "supported" platforms and check them in).

I would favor going with the former approach. It's lightweight, unintrusive, and still adds good value.

Thoughts?

@jrottinghuis
Contributor

Shipping native libraries in the jar already means generating multiple jars, right? Or are you thinking we have one jar with native binaries for many platforms all built in?

How would this work across Linux flavors and even Windows?

Thanks,

Joep


@sjlee
Collaborator

sjlee commented Oct 7, 2013

With approach (1), we will not create a single jar that has native libraries for many platforms all built into one. When you build, the native libraries for the platform you're building on (and that only) will be added to the jar. The goal is more of added convenience than creating a single jar that officially supports multiple platforms out of the box.

However, if your deployment environment is of a single platform, then you could build the jar once on that environment, and the jar will be self-contained.

On the other hand, if the jar is deployed onto a platform that the embedded native libraries do not match, it would simply fall back to the current behavior and look for the appropriate native libraries in the library path.

@sjlee
Collaborator

sjlee commented Oct 8, 2013

I'm going to create a pull request for this shortly...

@sjlee
Collaborator

sjlee commented Aug 29, 2014

This was resolved with pull request #81.

@sjlee sjlee closed this as completed Aug 29, 2014
@julienledem
Author

thanks!
@sjlee which version do I use to try this?

@sjlee
Collaborator

sjlee commented Aug 30, 2014

It seems like we didn't do a release after this was merged. It would be 0.4.20.

@zman0900
Contributor

Sorry to bring this back from the dead, but are there any plans to do a release with this? If so, what platform will the jar in the maven repo be built for?

@sjlee
Collaborator

sjlee commented Aug 16, 2016

Sorry this fell through the cracks. I was hoping to close PR #90 before cutting a release, but that's stalled. I could do a release before that as it's been some time.

My baseline thinking is to have x86_64 built into the jar for Maven Central, as that would probably be the largest user base. Thoughts?

@zman0900
Contributor

We are running on x86_64 Linux here, so that would be perfect for me.

@zman0900
Contributor

zman0900 commented Sep 9, 2016

Any news on that release? If you do decide to release, any chance PR #117 could be included?

@sjlee
Collaborator

sjlee commented Sep 13, 2016

My apologies, things got delayed for a while. Hopefully I can pick this up next week. Yes, I'll look at #117 before a release. Thanks for your patience.

@harperjiang

I realize this is a pretty old post. But may I know the current situation?

@sjlee
Collaborator

sjlee commented Feb 21, 2018

AFAIK, the 0.4.20 release has this: https://github.com/twitter/hadoop-lzo/releases/tag/release-0.4.20
