starting a snapshot listener from afar causes problems #8451

Open
michaelAtCoalesce opened this issue Aug 22, 2024 · 12 comments

@michaelAtCoalesce

michaelAtCoalesce commented Aug 22, 2024

Operating System

windows

Environment (if applicable)

chrome

Firebase SDK Version

10.13.0

Firebase SDK Product(s)

Firestore

Project Tooling

create-react-app example

Detailed Problem Description

i have a collection with ~50 megabytes of data across ~1500 documents.

when i try to start a listener while in north america (connecting to US firestore), it takes only 10 seconds to start the listener, and it completes 100% of the time.

image

when i turn on my india VPN (same machine, same code, only difference is routing through india), the listener never even completes.

i immediately get these errors -

image

on some machines from APAC region connecting to US firestore, i also get really poor behavior, and it never actually succeeds.

image

for what it's worth - the connection FROM india to united states should easily be able to handle this...

image

Steps and code to reproduce issue

if someone wants to email me i can send them the info for the recreate. it's a few lines of code.

@michaelAtCoalesce added the new and question labels Aug 22, 2024
@google-oss-bot
Contributor

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

@jbalidiong added the api: firestore and needs-attention labels and removed the needs-triage and new labels Aug 22, 2024
@michaelAtCoalesce
Author

michaelAtCoalesce commented Aug 22, 2024

based on previous issues @dconeybe might be a good person to look at this? Someone can email me at 'mx2323 <@> gmail.com' and I'll jump on a call to recreate.

It's very concerning behavior so we'd like to get this looked at ASAP.

@michaelAtCoalesce
Author

any updates? this is causing our application to not load.

@wu-hui
Contributor

wu-hui commented Aug 23, 2024

Hey @michaelAtCoalesce ,

I suspect the issue here is that the bidirectional stream between the SDK and the backend does not work well when the network is not stable, especially when you need to load a lot of data over the wire. There are several things you can try:

  1. Can you create a test Firestore instance in Asia to see if things improve?
  2. Try to always turn on long polling (https://firebase.google.com/docs/reference/js/firestore_.firestoresettings.md#firestoresettingsexperimentalforcelongpolling) and see if that helps.
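
As a minimal sketch of the second suggestion, the settings object below forces long polling. `experimentalForceLongPolling` is the option from the linked `FirestoreSettings` reference; the `LongPollingSettings` type and the `buildFirestoreSettings` helper are hypothetical names used only for illustration, and the commented lines assume the modular SDK with your own `firebaseConfig`.

```typescript
// Local stand-in for the one FirestoreSettings field used here, so the sketch
// type-checks without the SDK installed (hypothetical type name).
interface LongPollingSettings {
  experimentalForceLongPolling: boolean;
}

// Hypothetical helper: builds the settings object for initializeFirestore.
function buildFirestoreSettings(forceLongPolling: boolean): LongPollingSettings {
  return { experimentalForceLongPolling: forceLongPolling };
}

const settings = buildFirestoreSettings(true);

// With the modular SDK this would be wired up roughly as follows
// (assumes `firebaseConfig` is your own project configuration):
//
//   import { initializeApp } from "firebase/app";
//   import { initializeFirestore } from "firebase/firestore";
//
//   const app = initializeApp(firebaseConfig);
//   const db = initializeFirestore(app, settings);
```

Custom settings go through `initializeFirestore`, which has to run before the Firestore instance is first used.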

@sampajano Do you have some other suggestions/ideas?

@wu-hui self-assigned this Aug 23, 2024
@michaelAtCoalesce
Author

michaelAtCoalesce commented Aug 26, 2024

hi @wu-hui,

  1. yes, this is definitely the case. as i said above, when i turn off the VPN (so connect from nearby) - the reliability is 100%
  2. i have forced long polling, but i am noticing that i see slower load times. more consistent, but slower load times.

the data here is only on the order of tens of megabytes - but it can take over a minute (and sometimes not load at all), whereas a closer connection will take ~10 seconds.

in conclusion:
the speed test shows that this should work, so it seems there is work to do here for firestore to reliably support this kind of a connection... it shouldn't take over a minute to load when it takes 10 seconds in the ideal case, and even the forced long polling is slow.

i can reliably recreate this issue 100% of the time, within seconds. happy to hop on a call and share recreate details (or do it over email). you can email me at mx2323 <@> gmail.com

@michaelAtCoalesce
Author

why was the needs-attention label removed and a needs-info label added? i believe i've given the information required and this is causing our production to not load.

@DellaBitta
Contributor

My mistake! I must have been looking at a stale page that I had loaded yesterday, sorry!

@michaelAtCoalesce
Author

from nearby: 11 seconds
from afar: 226 seconds

i'm uploading some firestore debug level logs of the degenerate case here.

aec2-38-34-123-154.ngrok-free.app-1724779566040.log
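
To make timings like the two above reproducible, one option is to measure the time until a listener delivers its first snapshot and convert that into an effective throughput. This is only a sketch: `timeToFirstEvent`, `throughputMBps`, and the `Subscribe` type are hypothetical helper names, the collection name `"items"` in the comment is a placeholder, and the 50 MB figure is the approximate collection size quoted at the top of this issue.

```typescript
// Hypothetical shape of a subscribe function: it starts a listener, calls
// `onFirst` when data arrives, and returns an unsubscribe function
// (the same shape as the value returned by Firestore's onSnapshot).
type Subscribe = (onFirst: () => void) => () => void;

// Resolves with the milliseconds elapsed until the first event fires,
// then detaches the listener.
function timeToFirstEvent(subscribe: Subscribe): Promise<number> {
  return new Promise<number>((resolve) => {
    const start = Date.now();
    let done = false;
    let unsubscribe: (() => void) | undefined;
    unsubscribe = subscribe(() => {
      if (done) return;
      done = true;
      resolve(Date.now() - start);
      // May still be undefined if the callback fired synchronously.
      if (unsubscribe) unsubscribe();
    });
    // Covers the synchronous case, where unsubscribe was not yet assigned.
    if (done) unsubscribe();
  });
}

// Effective throughput in megabytes per second.
function throughputMBps(megabytes: number, elapsedMs: number): number {
  return megabytes / (elapsedMs / 1000);
}

// Possible wiring with the modular SDK (assumes an initialized `db`):
//
//   import { collection, onSnapshot } from "firebase/firestore";
//   const ms = await timeToFirstEvent((onFirst) =>
//     onSnapshot(collection(db, "items"), () => onFirst()));
//   console.log(throughputMBps(50, ms));
```

With the numbers reported here, roughly 50 MB in 11 seconds is about 4.5 MB/s, while 226 seconds is about 0.22 MB/s.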

@michaelAtCoalesce
Author

any updates?

@MarkDuckworth
Contributor

@michaelAtCoalesce, thank you for providing the logs. I reviewed them and I don't see a clear indication of an issue in the SDK; however, I have forwarded this to our backend team for review. Googlers see b/361143373

For what it's worth, on this behavior, you may get more frequent updates if you open a Firebase or Google Cloud support ticket rather than a GitHub issue on the SDK. However, we will update this GH issue when we learn more.

@michaelAtCoalesce
Author

michaelAtCoalesce commented Aug 28, 2024

@MarkDuckworth thanks for the update. wanted to add another data point in here. it appears to happen more frequently on windows. i have anecdotally noticed in my recreate case that on chrome on windows the default implementation is more likely to fail than chrome on macOS. if nothing else, it appears that windows is at least 3x as slow.

something appears to happen where the default implementation will start up, download a bit, then just hang there for tens of seconds or minutes and not do anything. when i turn on the experimental long polling option, it immediately goes back to working.

@michaelAtCoalesce
Author

michaelAtCoalesce commented Sep 10, 2024

it's been 2 weeks.. any updates?

i was told by firebase support to try to paginate the snapshots. i tried that, and the performance did not improve, and sometimes the listeners still never start. this appears to happen more frequently on windows. the issue also seems to occur even when the user is near the firestore location.

on my mac it'll take 12 seconds; the exact same page on a windows e2-standard-2 instance has 400 errors, 404 errors, and sometimes takes over 3 minutes for the same test case.
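
For readers unfamiliar with the suggestion, one possible reading of "paginate the snapshots" is cursor-based loading in fixed-size pages. The sketch below is not the code used in this report: `loadAllPages` and `FetchPage` are hypothetical names, and the commented Firestore wiring (a collection reference `col`, ordered by document ID, fetched with one-shot `getDocs` calls) is an assumption about how the pages might be fetched.

```typescript
// Hypothetical page fetcher: returns up to `limit` items that come after
// `cursor`, or the first items when `cursor` is undefined.
type FetchPage<T> = (cursor: T | undefined, limit: number) => Promise<T[]>;

// Loads every page in sequence, using the last item of each page as the
// cursor for the next request.
async function loadAllPages<T>(fetchPage: FetchPage<T>, pageSize: number): Promise<T[]> {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const all: T[] = [];
  let cursor: T | undefined = undefined;
  while (true) {
    const page: T[] = await fetchPage(cursor, pageSize);
    all.push(...page);
    if (page.length < pageSize) break; // a short page means nothing is left
    cursor = page[page.length - 1];
  }
  return all;
}

// Possible wiring with the modular SDK (assumes `col` is a collection reference):
//
//   import { documentId, getDocs, limit, orderBy, query, startAfter }
//     from "firebase/firestore";
//
//   const docs = await loadAllPages(async (cursor, n) => {
//     const q = cursor
//       ? query(col, orderBy(documentId()), startAfter(cursor), limit(n))
//       : query(col, orderBy(documentId()), limit(n));
//     return (await getDocs(q)).docs;
//   }, 100);
```

Smaller pages keep each response short, but the total number of bytes transferred stays the same.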

image
