Handle FTP [more] gracefully #121

Open
Mr0grog opened this issue Nov 21, 2018 · 1 comment

Mr0grog commented Nov 21, 2018

We currently have a few FTP directories we monitor, but we don’t actually handle them very well.

When we have snapshots from Wayback, we compare them poorly: https://monitoring.envirodatagov.org/page/a0ba1338-2d04-4ba4-9487-c6ff9be0383b/97490127-9a81-474b-a3dc-29afa943edfd..33a00df8-665b-40e8-806d-9033b57f1588

And from Versionista, we fail to display anything useful at all: https://monitoring.envirodatagov.org/page/a0ba1338-2d04-4ba4-9487-c6ff9be0383b/ae7f6a24-bc8e-4cdd-860a-0eaa354f00ac..33a00df8-665b-40e8-806d-9033b57f1588

The real issues under the hood:

  • When we get FTP listings out of Versionista, we wind up storing them as application/octet-stream, so we treat them like binary data later on. We could store them as text/plain (see Wayback below) or use something more specific. (See also “FTP listings should not be stored as application/octet-stream,” web-monitoring-versionista-scraper#166.)

  • When we get FTP listings out of Wayback, we wind up storing them as text/plain, which at least makes them displayable and diffable, but we don’t diff them in a particularly useful way:

    [Screenshot, 2018-11-21 at 9:04:50 AM]

  • In the UI, we parse MIME types poorly and read Wayback’s text/plain as text/html, so we don’t give it the most suitable visualization (see “Incorrect diff types presented for versions with mime_type instead of content_type,” web-monitoring-ui#322).

  • We could diff these as plain text, which is moderately useful:

    [Screenshot, 2018-11-21 at 9:35:44 AM]

    But it might be nice to have a fancier diff in this case, like we do for links. I think we’d probably need a new MIME type for this, e.g. text/x-wm-ftp-directory or text/ftp-dir-listing (the latter is what Versionista appears to be sending; it is non-standard but also used by some other tools). We could possibly also detect that it’s an FTP listing by checking version.capture_url.startsWith('ftp://') in the UI.
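
To make the detection and "fancier diff" ideas above a bit more concrete, here is a rough sketch. It is not existing web-monitoring-ui code: the helper names (parseMediaType, isFtpListing, diffListings) are hypothetical, and it assumes a version object may carry its type in either content_type or mime_type alongside capture_url. The two listing media types are just the candidates floated in this issue.

```javascript
// Hypothetical helpers for illustration; not part of web-monitoring-ui.

// Candidate media types for FTP listings, as proposed in this issue.
const FTP_LISTING_TYPES = new Set([
  'text/x-wm-ftp-directory',
  'text/ftp-dir-listing',
]);

// Reduce a raw type such as "Text/Plain; charset=utf-8" to the bare,
// lowercased media type ("text/plain"). Returns '' for missing values.
function parseMediaType(rawType) {
  if (typeof rawType !== 'string') return '';
  return rawType.split(';')[0].trim().toLowerCase();
}

// Guess whether a version is an FTP directory listing, using either an
// explicit media type or the ftp:// scheme of its capture URL.
function isFtpListing(version) {
  // Assumption: the type may be stored as `content_type` or `mime_type`.
  const mediaType = parseMediaType(version.content_type || version.mime_type);
  if (FTP_LISTING_TYPES.has(mediaType)) return true;
  const url = (version.capture_url || '').toLowerCase();
  return url.startsWith('ftp://');
}

// Compare two listings entry by entry (one entry per non-blank line),
// ignoring the order in which the server happened to list them.
function diffListings(oldText, newText) {
  const toEntries = (text) =>
    new Set(text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean));
  const oldEntries = toEntries(oldText);
  const newEntries = toEntries(newText);
  return {
    added: [...newEntries].filter((entry) => !oldEntries.has(entry)),
    removed: [...oldEntries].filter((entry) => !newEntries.has(entry)),
  };
}
```

Because this compares whole lines, a file whose size or date changed would show up as one removed entry plus one added entry. A real implementation would probably want to parse each line into name, size, and date so it can report modifications separately.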


stale bot commented May 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

stale bot added the stale label May 20, 2019
Mr0grog added the bug label May 21, 2019
stale bot removed the stale label May 21, 2019