Add async APIs #28
Conversation
@tobilg after I merge this, you can use the async APIs to run concurrent jobs. No need for explicit Java threads! If you can try this out I'd love to hear how it worked for you.
The naming scheme is unorthodox (sync versions have bare names, async versions are suffixed "Async") to avoid having to touch tons of lines. This should be fixed, but it isn't urgent since none of the raw JVM objects are exposed anywhere.
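For context, node-java itself lets callers configure the suffixes used for the sync and async variants it generates when wrapping JVM methods, which is presumably how a scheme like this is arranged. A minimal sketch, with option values assumed rather than taken from this repo:

```js
// Sketch only: node-java generates a sync and an async variant for each
// wrapped JVM method, and asyncOptions controls their naming. A config
// like this gives sync methods bare names and async methods an "Async"
// suffix (these exact values are an assumption, not this repo's config).
var java = require('java');

java.asyncOptions = {
  syncSuffix: '',       // sync variants keep the bare JVM method name
  asyncSuffix: 'Async'  // async (callback-taking) variants get an "Async" suffix
};
```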
This change turns the "action" methods (those that run a computation) into async methods taking node-style callbacks. For each of the asyncified methods, an additional synchronous version is added, which has the same name with a 'Sync' suffix. The methods converted here are: collect(..), columns(..), count(..), head(..)
The same change, applied to: json(..), text(..), load(..) // undocumented
The same change, applied to: json(..), text(..), saveAsTable(..), insertInto(..), save(..) // undocumented
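A quick sketch of what the resulting API looks like in practice (the `dataFrame` handle and exact signatures are assumptions inferred from the description above, not code from this PR):

```js
// Hypothetical usage of the asyncified action methods described above.
// `dataFrame` is assumed to be a DataFrame handle from this library.

// Async version: bare name, node-style callback.
dataFrame.count(function (err, n) {
  if (err) return console.error(err);
  console.log('row count:', n);
});

// Synchronous version: same name with a 'Sync' suffix.
var rows = dataFrame.collectSync();
console.log('collected', rows.length, 'rows');
```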
Thanks! Have you successfully tested that actions get executed in parallel? From my understanding this wasn't possible because of the way Spark works... Or do the async calls just wait until the predecessor is finished?
I did some basic testing with load() by instrumenting the spark sources to […]. Here's some pointers to why this works: when node-java wraps JVM methods, […]. As to the statement in the Spark docs that "multiple parallel jobs can run […]"
That sounds great! I'll try to test this on Monday when I'm back in the office. One idea: to get rid of the callbacks, what do you think about using ES7 async/await and creating promise wrappers for the async callback action methods? I use this in my project via ad-hoc babel.js transpilation...
One question: Did you use the same context in your tests?
Yes, I used the same context in my tests. Is that what you were planning to do?
Re callbacks, I just did it that way as a simple starting point, from which it's easy for users to get promisified versions. Definitely not a personal preference for callbacks here :) I also hesitated to add promisified versions as part of this PR... that's easy to do but I'm just not yet sure if it's best to build those in. It sounds like you think they should be?
Great, and yes, that's what I'd want to do. I'll test this on Monday and try to incorporate it into my project.
Sorry, last comment was regarding the context. Concerning the promisified functions, I think they would be a nice add-on, and would make it really easy to write await-style, quasi-synchronous code.
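As an illustration of the promisified style being discussed, a wrapper along these lines would do it (a sketch only, not necessarily what was eventually built; `dataFrame` and the `collect(..)` callback API are assumed from this PR):

```js
// Sketch: wrapping a node-style callback action in a Promise so it can
// be consumed with ES7 async/await (e.g. via Babel transpilation, as
// mentioned above). `dataFrame` is a hypothetical DataFrame handle.
function collectP(df) {
  return new Promise(function (resolve, reject) {
    df.collect(function (err, rows) {
      if (err) reject(err);
      else resolve(rows);
    });
  });
}

async function main() {
  var rows = await collectP(dataFrame);
  console.log(rows.length, 'rows');
}
```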
Filed #29 for promisified functions.
Could you maybe share your test code? I was trying to promisify […]
When I originally checked (for […]). Now I just validated that […]
The output was: […]
Showing that the collect that started second finished first.
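The test code itself is elided above; a hedged re-creation of what such a check might look like (the `bigDf` and `smallDf` handles are hypothetical):

```js
// Start two collects back-to-back on the same context and log the order
// in which they finish. `bigDf` is assumed to take longer to collect
// than `smallDf`.
bigDf.collect(function (err, rows) {
  console.log('first-started collect finished:', err || rows.length);
});
smallDf.collect(function (err, rows) {
  console.log('second-started collect finished:', err || rows.length);
});
// If the jobs really run concurrently, the second-started collect can
// finish first, which is what the output above showed.
```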
Thanks! Don't you also need to enable the fair scheduler for this, e.g. […]
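For reference, Spark's fair scheduler is switched on through the standard spark.scheduler.mode property (FIFO is the default); how this library exposes SparkConf is an assumption in the sketch below:

```js
// Sketch: enabling Spark's fair scheduler via the standard
// "spark.scheduler.mode" property. `sparkConf` is an assumed handle to a
// wrapped org.apache.spark.SparkConf; the exact wrapper call name may
// differ under this PR's sync/async naming scheme.
sparkConf.set('spark.scheduler.mode', 'FAIR');
```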
Update: I got it to work with my project as well. The problem was that I was using some event listeners which got overwritten when issuing parallel requests... D'oh!
Excellent!
Thanks! Regarding the fair scheduler, have you used this as well? With my Spark 1.6.0 the standard mode is FIFO, which I guess is not recommended: […]
I haven't touched the scheduler settings in any way so far. As to FIFO vs […]
…enridf#28 - Added some basic tests of these actions (methods)
#26
This is a breaking change that transforms all methods that "run computation" from synchronous to asynchronous (node-style callback) operation. The existing synchronous methods are renamed with a "Sync" suffix.
The methods converted are: collect(..), columns(..), count(..), head(..), json(..), text(..), load(..), saveAsTable(..), insertInto(..), and save(..).