Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git checker doesn't get timestamps, resulting in unwanted PRs #154

Open
wjt opened this issue Mar 11, 2021 · 10 comments
Open

Git checker doesn't get timestamps, resulting in unwanted PRs #154

wjt opened this issue Mar 11, 2021 · 10 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@wjt
Copy link
Contributor

wjt commented Mar 11, 2021

On a whim I wrote flathub/org.gnome.Boxes.Extension.OsinfoDb#2 to automatically update the database used by GNOME Boxes when it is tagged upstream. When reviewing my own PR I noticed that the date added in the appdata is today's date, rather than the date attached to the tag.

It doesn't seem to be possible to ask git ls-remote to include more commit metadata. I guess the "easiest" thing to do would be to make a shallow clone of the repository at exactly the tag in question, then read the timestamp out of the tag; but it would be nice to avoid having to fetch the commit contents. I don't know Git well enough to know if there is an operation that can do this. (The GitHub API has a way! But that's not useful in cases, like this one, which aren't on GitHub.)

@gasinvein
Copy link
Collaborator

gasinvein commented Mar 11, 2021

Sure it would be a nice feature to have.
But I believe it won't make much difference in practice in most cases, because f-e-d-c on Flathub is likely to get the new tag about 1 hour after it was tagged, so the current date is likely to match the tag date anyway.

@gasinvein gasinvein added the enhancement New feature or request label Mar 11, 2021
@wjt
Copy link
Contributor Author

wjt commented Mar 12, 2021

The name of the branches that f-e-d-c pushes are based on the tree hash, which will change whenever the date changes, so if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change.

Maybe theoretical? Maybe acceptable?

@gasinvein
Copy link
Collaborator

Didn't think about it. Yeah, that would be not great.

@gasinvein
Copy link
Collaborator

gasinvein commented Mar 12, 2021

Maybe we should add GitHub and GitLab checkers instead, that would use API?
Or even add a special case for git_ls_remote() helper - if the repo is hosted on GitHub or GitLab, use their APIs to get commit timestamp.
I really would like to avoid cloning whole repos. This would make checking git sources significantly slower.

@gasinvein gasinvein added bug Something isn't working and removed enhancement New feature or request labels Mar 12, 2021
@wjt
Copy link
Contributor Author

wjt commented Mar 12, 2021

That could work. The GitLab API for fetching a tag's metadata is unauthenticated if the repo is public:

$ http GET https://gitlab.gnome.org/api/v4/projects/gnome%2Fgnome-initial-setup/repository/tags/3.38.2
HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Connection: keep-alive
Content-Length: 1699
Content-Type: application/json
Date: Fri, 12 Mar 2021 08:45:04 GMT
Etag: W/"d7088e5a56582b820054cfdcea731b6d"
Server: nginx
Vary: Origin
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Gitlab-Feature-Category: source_code_management
X-Request-Id: 01F0JSRM9EBF100F4E4E05R8R2
X-Runtime: 0.044975

{
    "commit": {
        "author_email": "[email protected]",
        "author_name": "Will Thompson",
        "authored_date": "2020-11-21T15:08:26.000+00:00",
        "committed_date": "2020-11-21T15:08:26.000+00:00",
        "committer_email": "[email protected]",
        "committer_name": "Will Thompson",
        "created_at": "2020-11-21T15:08:26.000+00:00",
        "id": "8aacbcb5913ef340fc6a78e6806177b23810766b",
        "message": "NEWS for 3.38.2\n",
        "parent_ids": [
            "099f5f52535f22ab0fb63a74440561c506e2188e"
        ],
        "short_id": "8aacbcb5",
        "title": "NEWS for 3.38.2",
        "web_url": "https://gitlab.gnome.org/GNOME/gnome-initial-setup/-/commit/8aacbcb5913ef340fc6a78e6806177b23810766b"
    },
    "message": "Git-EVTag-v0-SHA512: fab136661b6d1de47c8540773309664249fb43e4f301aaabdff7e889fb4c0ff1f9969cc61c892fd719c8439bbf3f786e4c9b1041443ebbc88689d25daa7e8281\n-----BEGIN PGP SIGNATURE-----\n\niQIzBAABCgAdFiEEHmjljPJViIMBZFtWNCLcDXrUgqcFAl+5LbAACgkQNCLcDXrU\ngqdRNA//WyWvz2XPrcLP7pCjwtV3nVVhWwnI1XEtlWUuYbFGQ3zZ2oIRZZcBwdlH\nmmgd8R8glDL1RAt7Pjor9CWSMmjqlK5eoT2IBCvNlMuQcmrTHGUoZ+uMWZKQYFXu\nGYfHkyF/1RwGWX4vDKW8P77grei7T507DQWBOwY1xEDllTlL0qAiAUzOHHTRFjNq\nHlpTahfuILyA5l/c08Wk/rPjhR+O1L3hBpZ8izdPbpd0yEUI8Bj4u/kt45jkVCRZ\nJSEtTRw1MWHbJtbkeW+Uw6IlLoHEAWVLcnFfw8YdLDXanoupzcLExmi8eUFPZMkO\n6br3s/ww3Ha19ZQuJdO1Ai/+kcsUxBIHwmhLPtbwjBx5TcpCggJIkWMfcZvvKWEB\nRlKrT3PWELAMD3eZg9zyZFUxsgPtTHwwD/LQLb7StyOJJ6Wl4O5/jbvKxoBWP+TM\nMJzB7hQlqrfhymhCkKCE7MOZV8fECreCc7BMeJpDC4vTdES4qVYgAlRkjNLa1DDB\n+jW0UDUcoyPdM3GIkb84uP/A5b7yNpGh49o6XMozQUkTWY1MqN5hSS37HZZ3xXPc\nZfhiODNQBnKf5x5ClE3xeLkpaIIt7z7DxfJenza3yFGh6tj5d/m5NXkIT603sY63\nqIEwtZtLVyJYYPh/RgP45CcW6pLY77pMiIv5ErVhfA4tzszFilA=\n=D9Dd\n-----END PGP SIGNATURE-----",
    "name": "3.38.2",
    "protected": false,
    "release": null,
    "target": "b873e05ad7c4c91ae690cdadb8f1779e42d3ee97"
}

GitHub's appears to need authentication but happily the script already knows how to do that.

(Was there a reason for shelling out to the git CLI rather than using a Python library OOI?)

@gasinvein
Copy link
Collaborator

(Was there a reason for shelling out to the git CLI rather than using a Python library OOI?)

Dunno. git command was used previously in main.py, so I just continued using it in utils.py.

@gasinvein
Copy link
Collaborator

gasinvein commented Mar 12, 2021

if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change

An alternate approach to prevent this would be to invent an algorithm to uniquely identify the update we made without relying on git commit hash to match.

@gasinvein
Copy link
Collaborator

And yet another idea for workaround - maybe add timestamp-query to JSONChecker? This would allow extracting timestamps from GitHub / GitLab APIs.

wjt added a commit to wjt/org.gnome.Boxes.Extension.OsinfoDb that referenced this issue Mar 24, 2021
This will automatically open PRs when new tags appear in the upstream
repository.

This uses the JSON checker rather than the Git checker because the Git
checker cannot [currently][0] extract the timestamp from the Git tag, and
would instead use today's date. fedc uses the Git tree hash to
avoid duplicate PRs, so this would mean a new PR would be opened every
day until one of them were merged.

Unfortunately the JSON provided by GitLab does not include the date from
the tag, only the date from the commit that the tag points to. As we
will see in the subsequent commit, these are different, and it looks a
bit weird that the release date is before the release name (itself a
date). But this is relatively harmless, and importantly it is
deterministic over time.

[0]: flathub-infra/flatpak-external-data-checker#154
@wjt wjt closed this as completed Mar 24, 2021
@wjt wjt reopened this Mar 24, 2021
@wjt
Copy link
Contributor Author

wjt commented Mar 24, 2021

if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change

An alternate approach to prevent this would be to invent an algorithm to uniquely identify the update we made without relying on git commit hash to match.

We have an in-memory representation of the manifest file(s) so I guess we could just hash those.

And yet another idea for workaround - maybe add timestamp-query to JSONChecker? This would allow extracting timestamps from GitHub / GitLab APIs.

Thanks for adding this in #155. It works nicely. I will leave this ticket open as a wishlist item – it'd be less obscure to have the Git checker do the right thing by itself. Maybe I'll have a crack when I get time.

@wjt wjt added the enhancement New feature or request label Mar 24, 2021
wjt added a commit to wjt/org.gnome.Boxes.Extension.OsinfoDb that referenced this issue Mar 24, 2021
This will automatically open PRs when new tags appear in the upstream
repository.

This uses the JSON checker rather than the Git checker because the Git
checker cannot [currently][0] extract the timestamp from the Git tag, and
would instead use today's date. fedc uses the Git tree hash to
avoid duplicate PRs, so this would mean a new PR would be opened every
day until one of them were merged.

Unfortunately the JSON provided by GitLab does not include the date from
the tag, only the date from the commit that the tag points to. As we
will see in the subsequent commit, these are different, and it looks a
bit weird that the release date is before the release name (itself a
date). But this is relatively harmless, and importantly it is
deterministic over time.

[0]: flathub-infra/flatpak-external-data-checker#154
felipeborges pushed a commit to flathub/org.gnome.Boxes.Extension.OsinfoDb that referenced this issue Mar 29, 2021
This will automatically open PRs when new tags appear in the upstream
repository.

This uses the JSON checker rather than the Git checker because the Git
checker cannot [currently][0] extract the timestamp from the Git tag, and
would instead use today's date. fedc uses the Git tree hash to
avoid duplicate PRs, so this would mean a new PR would be opened every
day until one of them were merged.

Unfortunately the JSON provided by GitLab does not include the date from
the tag, only the date from the commit that the tag points to. As we
will see in the subsequent commit, these are different, and it looks a
bit weird that the release date is before the release name (itself a
date). But this is relatively harmless, and importantly it is
deterministic over time.

[0]: flathub-infra/flatpak-external-data-checker#154
tinywrkb added a commit to tinywrkb/io.qt.qtwebengine.BaseApp that referenced this issue Jun 21, 2021
The Git checker does not extract timestamps from tags so flathubbot will
create a new pr on every daily run if the previous pr wasn't merged.
See more details in f-e-d-c's bug report flathub-infra/flatpak-external-data-checker#154

To avoid this, switch to the KDE qtwebengine Gitlab mirror that can be
used with the JSON checker.

Also switch to pcituils tarball releases and add a check for python2.
tinywrkb added a commit to tinywrkb/io.qt.qtwebengine.BaseApp that referenced this issue Jun 21, 2021
The Git checker does not extract timestamps from tags so flathubbot will
create a new pr on every daily run if the previous pr wasn't merged.
See more details in f-e-d-c's bug report flathub-infra/flatpak-external-data-checker#154

To avoid this, switch to the KDE qtwebengine Gitlab mirror that can be
used with the JSON checker.

Also switch to pcituils tarball releases and add a check for python2.
gasinvein referenced this issue in flathub/radio.k0swe.Kel_Agent Jul 10, 2021
@gasinvein gasinvein pinned this issue Jul 10, 2021
@gasinvein gasinvein changed the title Get timestamp from Git tags Git checker doesn't get timestamps, resulting in unwanted PRs Jul 10, 2021
@gasinvein
Copy link
Collaborator

This now also affects tarballs from GitHub, which now don't have persistent timestamps in HTTP Date header.

hadess added a commit to flathub/org.ghidra_sre.Ghidra that referenced this issue Nov 7, 2022
Change x-data-checker to check the timestamp of updates, to avoid tons
of duplicate pull requests if we don't merge changes straight away.

See:
flathub-infra/flatpak-external-data-checker#154
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants