Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve request cancelation handling. #103

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jpd236
Copy link
Collaborator

@jpd236 jpd236 commented Sep 27, 2017

  • Provide stronger thread safety guarantees by locking access to any
    variable which can be mutated during the processing of a request (i.e.
    by the CacheDispatcher/NetworkDispatcher threads) and dispatching
    response callbacks while holding the same lock that is held when
    setting mCanceled. Unfortunately there's not much we can do about
    alternative ResponseDelivery mechanisms, which are hopefully rare.

  • Add a cancel listener interface and use it to prevent RequestFuture
    from blocking for the complete timeout (or indefinitely) if a request
    is canceled.

  • Since listeners are often Android component classes like Activitys,
    clear them upon cancellation. Unfortunately, this must be done in each
    Request subclass to work, assuming the subclass follows the standard
    pattern of holding the response listener as a member. But this enables
    concerned apps to resolve this leak.

Fixes #15
Fixes #85

- Provide stronger thread safety guarantees by locking access to any
variable which can be mutated during the processing of a request (i.e.
by the CacheDispatcher/NetworkDispatcher threads) and dispatching
response callbacks while holding the same lock that is held when
setting mCanceled. Unfortunately there's not much we can do about
alternative ResponseDelivery mechanisms, which are hopefully rare.

- Add a cancel listener interface and use it to prevent RequestFuture
from blocking for the complete timeout (or indefinitely) if a request
is canceled.

- Since listeners are often Android component classes like Activitys,
clear them upon cancellation. Unfortunately, this must be done in each
Request subclass to work, assuming the subclass follows the standard
pattern of holding the response listener as a member. But this enables
concerned apps to resolve this leak.

Fixes google#15
Fixes google#85
@jpd236 jpd236 requested a review from jjoslin September 27, 2017 23:20
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@jpd236
Copy link
Collaborator Author

jpd236 commented Sep 27, 2017

@googlebot is apparently having issues right now; the above comment can be ignored and I'll request a re-check once it's resolved.

Copy link
Contributor

@joebowbeer joebowbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be deferred to 1.0.2.

I suggest releasing a fix for the DiskBasedCache OOME/NAIE issues sooner rather than later -- with few additional (potentially destabilizing) changes.

@jpd236
Copy link
Collaborator Author

jpd236 commented Sep 27, 2017

I appreciate the concern, but given the volume of reports of the leak in #15 and duped reports, I'd like to get this resolved as well in the next release. I also find the bug with RequestFuture hanging forever in wait() to be pretty severe in nature - it caused thread pool exhaustion in one of our internal apps. The DiskBasedCache bug is severe as well when it does happen, but has been present for quite some time already.

This does happen to be the last planned change on the roadmap for 1.0.1, though. I wouldn't anticipate other changes being merged before release unless they fix regressions introduced since 1.0.0. So hopefully we'll have an RC build ready soon.

@@ -56,6 +56,12 @@
int PATCH = 7;
}

/** Callback interface to notify listeners when a request has been canceled. */
public interface CancelListener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names seems odd. Ideas for a better name:

  • OnRequestCanceledListener
  • CancellationListener
  • OnCancelListener

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancellationListener makes sense - the intent is to be consistent with "ErrorListener", so the prefixes should both be nouns describing what happened.

// To uphold the contract of cancel(), we hold the same lock here around both the check of
// isCanceled() as well as the actual response delivery. This ensures that after a call
// to cancel() returns, we will never deliver the response.
synchronized (mLock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from open calls (without lock held) to closed calls makes me nervous about deadlock.

Copy link
Contributor

@joebowbeer joebowbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpd236 Then I suggest making the minimal change to alleviate the memory retention. Similar to the mcxiaoke fix, which has been in use for a while.

Table the lock changes for another release.

@@ -96,9 +105,11 @@
private boolean mShouldCache = true;

/** Whether or not this request has been canceled. */
// GuardedBy(mLock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're OK with importing android.support.annotation.GuardedBy you could use @GuardedBy("mLock").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a dependency on the support lib yet, so for now I'm holding off on this. At some point later on we might add it to cover things like this, @VisibleForTesting, etc.

@@ -96,9 +105,11 @@
private boolean mShouldCache = true;

/** Whether or not this request has been canceled. */
// GuardedBy(mLock)
private boolean mCanceled = false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of guarding this by a lock, would it be sufficient to make this volatile? Since we don't have a mechanism to un-cancel a request, we know that once isCancelled returns true, it will always return true, so we mainly have to make sure that once one thread cancels, all other threads see that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not sufficient as it stands due to holding mLock around the response dispatching below; it might be sufficient if we change that, but I'm not sure that n volatile variables is necessarily more performant than a simple lock, and it's certainly more tricky to reason about :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a little closer, I see that synchronization is being added to methods that are likely to be called on the main thread. This should be avoided. In particular, Request.cancel() is waiting for a lock held by the ExecutorDelivery.

mCanceled = true;
synchronized (mLock) {
mCanceled = true;
onCanceledInternal();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't know what happens in onCancelled, should we maybe check here if the request is already cancelled and avoid double-cancellation? Especially if you make mCanceled volatile as suggested above rather than guarded, but I think it's safer either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point; I could change this to avoid double-cancellation.

if (mRunnable != null) {
mRunnable.run();
}
mRequest.deliverResponse(mResponse, mRunnable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you call that something different, maybe handleResponse or prepareResponseDelivery of maybeDeliverResponse or something? It was a bit confusing to read for me since there are now 2 deliverResponse with different arguments and having different jobs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed this is confusing; I couldn't come up with a better name. If we end up keeping this (per my other comment) then maybe deliverResponseAndFinish() or something to indicate the additional work that this method is responsible for doing.

// To uphold the contract of cancel(), we hold the same lock here around both the check of
// isCanceled() as well as the actual response delivery. This ensures that after a call
// to cancel() returns, we will never deliver the response.
synchronized (mLock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of stuff behind this lock, I'm not sure that's needed. I think instead we could add a check whether finish() has been called (addMarker() after finish() leads to IllegalStateException). Since this method (and finish) are package-private it is anyway not possible for users to access them without using ExecutorDelivery. And as long as ExecutorDelivery always runs them on the same thread, we can't have conflicts here as far as I can tell. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joebowbeer I acknowledge the concern, but I don't see a way to properly implement the guarantees that are advertised by cancel() without ensuring that the same lock is held for both cancel() and for the response dispatching itself. Without this lock, it's possible for someone to call cancel() and still have the callbacks called (if we've already checked isCanceled() here).

@uhager It doesn't seem to me like finish() is always called (in the response.intermediate case) so I'm not sure that works in every case, right? And the assumption about ExecutorDelivery running on the same thread is the one this locking is intended to resolve.

Another option might be to weaken the guarantee by stating that you must call cancel() from the same thread as the ResponseDelivery being used (typically the main thread) in order to guarantee that the callbacks aren't invoked. If you call cancel() from another thread, and the main thread is in the process of delivering responses, it may still end up delivering the response. In this case we could leave the dispatching as it was and just accept the risk of a post-cancel dispatch. However, this would mean needing to lock writes to the error listener / response listener.

@jjoslin - any thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the same lock needs to be held during cancel() and response dispatching to uphold the cancel() contract as @jpd236 states. I feel like this is a reasonable thing to do but also acknowledge that it does create the potential for deadlock (although I think it will be rare in practice). I support/prefer the current implementation but I wouldn't be opposed to weakening the guarantee either.

Copy link
Collaborator Author

@jpd236 jpd236 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for the prompt reviews!

Regarding keeping the scope small: I'm not sure what the other proposed fix would be but the memory retention is only one of the two issues being resolved here, the other being the RequestFuture cancellation bug. Note that just the minimal fix of clearing listeners in cancel() can only safely be done if cancel invocation is thread-safe.

I also think the ship has already sailed re: small bug fixes; the Apache header changes and soft TTL tweaks that recently landed are just as large if not more so. Volley's 1.0.0 release has been out for some time and I believe the vast majority of clients are not running into issues, so I don't see a strong reason to rush a release prematurely. We'll get there soon, and affected clients may use SNAPSHOT builds in the mean time (and can pin to a specific snapshot if they're concerned about picking up less-tested changes).

@@ -56,6 +56,12 @@
int PATCH = 7;
}

/** Callback interface to notify listeners when a request has been canceled. */
public interface CancelListener {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancellationListener makes sense - the intent is to be consistent with "ErrorListener", so the prefixes should both be nouns describing what happened.

if (mRunnable != null) {
mRunnable.run();
}
mRequest.deliverResponse(mResponse, mRunnable);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed this is confusing; I couldn't come up with a better name. If we end up keeping this (per my other comment) then maybe deliverResponseAndFinish() or something to indicate the additional work that this method is responsible for doing.

@@ -96,9 +105,11 @@
private boolean mShouldCache = true;

/** Whether or not this request has been canceled. */
// GuardedBy(mLock)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a dependency on the support lib yet, so for now I'm holding off on this. At some point later on we might add it to cover things like this, @VisibleForTesting, etc.

@@ -96,9 +105,11 @@
private boolean mShouldCache = true;

/** Whether or not this request has been canceled. */
// GuardedBy(mLock)
private boolean mCanceled = false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not sufficient as it stands due to holding mLock around the response dispatching below; it might be sufficient if we change that, but I'm not sure that n volatile variables is necessarily more performant than a simple lock, and it's certainly more tricky to reason about :)

mCanceled = true;
synchronized (mLock) {
mCanceled = true;
onCanceledInternal();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point; I could change this to avoid double-cancellation.

// To uphold the contract of cancel(), we hold the same lock here around both the check of
// isCanceled() as well as the actual response delivery. This ensures that after a call
// to cancel() returns, we will never deliver the response.
synchronized (mLock) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joebowbeer I acknowledge the concern, but I don't see a way to properly implement the guarantees that are advertised by cancel() without ensuring that the same lock is held for both cancel() and for the response dispatching itself. Without this lock, it's possible for someone to call cancel() and still have the callbacks called (if we've already checked isCanceled() here).

@uhager It doesn't seem to me like finish() is always called (in the response.intermediate case) so I'm not sure that works in every case, right? And the assumption about ExecutorDelivery running on the same thread is the one this locking is intended to resolve.

Another option might be to weaken the guarantee by stating that you must call cancel() from the same thread as the ResponseDelivery being used (typically the main thread) in order to guarantee that the callbacks aren't invoked. If you call cancel() from another thread, and the main thread is in the process of delivering responses, it may still end up delivering the response. In this case we could leave the dispatching as it was and just accept the risk of a post-cancel dispatch. However, this would mean needing to lock writes to the error listener / response listener.

@jjoslin - any thoughts?

@uhager
Copy link
Contributor

uhager commented Sep 28, 2017

@jpd236 I meant that ExecutorDelivery always sends deliverResponse on the same Handler. So deliverResponse can't be called simultaneously, since - as far as I can tell - no other class calls it and it's not accessible to users. Or does RequestFuture change that behaviour in ExecutorDelivery?
For an intermediate response, we want it to be allowed to be called again, just not after finish was called. So if 2 threads call ExecutorDelivery#postResponse with non-intermediate responses, only the first deliverResponse will run, the second one will see that finish has been called and return. If we allow users to call deliverResponse(response, runnable), or it is called somewhere else in Volley, then I agree, we need some locking.

@uhager
Copy link
Contributor

uhager commented Sep 28, 2017

Thinking about this some more, wouldn't it make more sense to synchronize access to mErrorListener? Where it is set to null when cancelled and then in deliverResponse around deliverError? Check if it's non-null and then synchronize using it?
I'm just worried that this is a very big block that is locked and can block a thread that tries to cancel the request for some time. At the very least we should use a dedicated lock for this rather than reusing the other one.

@googlebot
Copy link

CLAs look good, thanks!


private void onCanceledInternal() {
mErrorListener = null;
if (mCancelListener != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to synchronize access to mCancelListener because setCancelListener() could be called on a different thread. Or pass it in through the ctor and make it volatile.

}

/** Sets a {@link CancelListener} which will be notified when the request is canceled. */
public void setCancelListener(CancelListener listener) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add another ctor to accommodate this?

}

private void onCanceledInternal() {
mErrorListener = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mErrorListener needs to be volatile (or access to it needs to be synchronized) to ensure getErrorListener() returns the correct value. Same for mCancelListener.

private void onCanceledInternal() {
mErrorListener = null;
if (mCancelListener != null) {
mCancelListener.onRequestCanceled();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other callbacks run on the same thread as the ResponseDelivery being used whereas this runs on the calling thread. I realize in most cases those threads will be the same (main thread) but what do you think about providing a stronger guarantee about the thread onRequestCanceled() runs on?

// To uphold the contract of cancel(), we hold the same lock here around both the check of
// isCanceled() as well as the actual response delivery. This ensures that after a call
// to cancel() returns, we will never deliver the response.
synchronized (mLock) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the same lock needs to be held during cancel() and response dispatching to uphold the cancel() contract as @jpd236 states. I feel like this is a reasonable thing to do but also acknowledge that it does create the potential for deadlock (although I think it will be rare in practice). I support/prefer the current implementation but I wouldn't be opposed to weakening the guarantee either.

Copy link
Contributor

@joebowbeer joebowbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove any new locking from code that can be called from main thread.

@@ -96,9 +105,11 @@
private boolean mShouldCache = true;

/** Whether or not this request has been canceled. */
// GuardedBy(mLock)
private boolean mCanceled = false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a little closer, I see that synchronization is being added to methods that are likely to be called on the main thread. This should be avoided. In particular, Request.cancel() is waiting for a lock held by the ExecutorDelivery.

@@ -51,9 +52,10 @@
*
* @param <T> The type of parsed response this future expects.
*/
public class RequestFuture<T> implements Future<T>, Response.Listener<T>,
Response.ErrorListener {
public class RequestFuture<T> implements Future<T>, Response.Listener<T>, Response.ErrorListener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much difference this makes, but the pattern I like to use for classes that implement Future is to have them wrap an internal FutureTask instance to handle the synchronization. For example, see BackgroundTask in JCiP (Ch. 9 ). This makes it easier to get the synchronization correct, and makes it more apparent that it is.

@jpd236
Copy link
Collaborator Author

jpd236 commented Oct 3, 2017

OK - this PR seems too contentious to try and merge all at once :) Let me try to split it into more reasonable chunks so it's more manageable; I'll send them out separately.

First of all, thanks all for the detailed comments; I really do appreciate it.

For the leak, I need to think it through a bit more but I think you're probably right that we can just synchronize access to mErrorListener/mListener in Request and each of the toolbox subclasses, and then clear them on cancel. This doesn't guarantee that deliverResponse()/deliverError() won't be called after calling cancel(), but calling them should just no-op if the respective listener hasn't been nulled out, so we should be okay just updating the Javadoc to note this limitation - there are no guarantees if cancel() is called from a different thread than the response delivery. It also leaves this locking up to the Request subclass which isn't ideal, but there's not much we can do about that without holding our own lock for code we don't control.

Please note that grabbing a lock on the main thread should not be a problem; it should finish in microseconds. It's only an issue if that lock is contentious or if the work performed inside the lock can take a non-trivial amount of time. Certainly if it's safe to avoid it's better, as with anything on the main thread, but I don't think it's a concern here when necessary.

For the cancel listener - it'll be a bit more involved, but a possible option here is to add a postCancel to ResponseDelivery and plumb cancel events through there, so for our default implementation it'll run on the main thread. The main issue is that ResponseDelivery is an interface, so as we did with the Apache HTTP refactoring for HttpStack, we'd need to add a new interface and continue to support the old one (dropping cancellation events) for anyone using it. I may punt on this for a while given the complexity and uncertainty about the API, and instead just update RequestFuture to warn about using wait(0) in some form.

I think that covers the higher level open questions. For the nitty-gritty we can cover that on each PR.

@jpd236
Copy link
Collaborator Author

jpd236 commented Oct 3, 2017

Leak/thread-safety: #105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants