Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and warn for timeout-minutes change #1098

Closed
wants to merge 21 commits into from
Closed
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion bin/cml/runner.js
Original file line number Diff line number Diff line change
Expand Up @@ -401,10 +401,11 @@ const run = async (opts) => {
}
}

if (driver === 'github') {
if (cml.driver === 'github') {
0x2b3bfa0 marked this conversation as resolved.
Show resolved Hide resolved
winston.warn(
'Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically.'
);
dacbd marked this conversation as resolved.
Show resolved Hide resolved
await cml.workflowCheck();
}

winston.info(`Preparing workdir ${workdir}...`);
Expand Down
14 changes: 7 additions & 7 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion package.json
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@
"@actions/core": "^1.8.2",
"@actions/github": "^4.0.0",
"@npcz/magic": "^1.3.12",
"@octokit/core": "^3.5.1",
"@octokit/core": "^3.6.0",
"@octokit/graphql": "^4.8.0",
"@octokit/plugin-throttling": "^3.5.2",
"@octokit/rest": "18.0.0",
Expand All @@ -82,6 +82,7 @@
"globby": "^11.0.4",
"https-proxy-agent": "^5.0.1",
"js-base64": "^3.7.2",
"js-yaml": "^4.1.0",
"kebabcase-keys": "^1.0.0",
"node-fetch": "^2.6.5",
"node-ssh": "^12.0.0",
Expand Down
4 changes: 4 additions & 0 deletions src/cml.js
Original file line number Diff line number Diff line change
Expand Up @@ -559,6 +559,10 @@ Automated commits for ${this.repo}/commit/${sha} created by CML.
return renderPr(url);
}

async workflowCheck() {
return await getDriver(this).workflowCheck();
}

async pipelineRerun(opts) {
return await getDriver(this).pipelineRerun(opts);
}
Expand Down
75 changes: 75 additions & 0 deletions src/drivers/github.js
Original file line number Diff line number Diff line change
Expand Up @@ -702,6 +702,15 @@ class Github {
return GITHUB_SHA;
}

async workflowCheck() {
try {
// TODO post telemetery merge - only run if in ci?
await this.checkWorkflowForTimeoutMinutes();
} catch (err) {
// noop
}
DavidGOrtega marked this conversation as resolved.
Show resolved Hide resolved
}

get branch() {
return branchName(GITHUB_HEAD_REF || GITHUB_REF);
}
Expand All @@ -713,6 +722,72 @@ class Github {
get userName() {
return 'GitHub Action';
}

async checkWorkflowForTimeoutMinutes() {
const WARNINGS = [];
const _DIR = '.github/workflows/';
const yaml = require('js-yaml');
dacbd marked this conversation as resolved.
Show resolved Hide resolved
const files = await fs.readdir(_DIR);
for (const file of files) {
const rawStr = await fs.readFile(`${_DIR}${file}`, 'utf8');
const doc = yaml.load(rawStr);
Object.keys(doc.jobs).forEach((job) => {
// does job contain timeout?
const timeoutVal = doc.jobs[job]['timeout-minutes'];
if (timeoutVal === undefined) return;
Copy link
Contributor

@casperdcl casperdcl Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well undefined timeout is also a problem right?

  • cml runner --cloud-spot AND:
    • timeout 4320: raise
    • timeout undefined: raise
    • timeout 50400: noop, all ok
    • timeout != 50400: maybe warn?

Copy link
Contributor Author

@dacbd dacbd Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are plenty of use case where cml runner instances last less that 6hrs and timeout-minutes is never required. Hence my effort to not bother those users with warning messages :)

if timeout is 50400 noop makes sense so does timeout != 50400: maybe warn?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ timeout 50400: noop, all ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ timeout 4320: raise

  • I disagree on timeout undefined: raise

✔️ / ➖ timeout != 50400: maybe warn? last timeout condition checked is 4320 / so if:

  1. timeout is set
  2. timeout is not 50400
  3. timeout is not 4320
  4. check for cml usage, then raise

// does job have label runs-on "self-hosted" or "cml"
const runsOn = doc.jobs[job]['runs-on'];
switch (typeof runsOn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow the intention of this switch statement 🤔

Copy link
Contributor Author

@dacbd dacbd Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be a string or a list of strings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smells like Go 😄

case 'string':
if (
runsOn !== 'self-hosted' &&
!runsOn.toLocaleLowerCase().includes('cml')
DavidGOrtega marked this conversation as resolved.
Show resolved Hide resolved
)
return;
break;
case 'object':
if (!Array.isArray(runsOn)) return; // shouldnt happen
if (
!runsOn
.map((v) => {
return (
v === 'self-hosted' || v.toLocaleLowerCase().includes('cml')
);
})
.reduce((pv, v) => pv || v)
)
return;
break;
default:
return;
}
// locate timeout for warning
const warning = (function (find) {
const lines = rawStr.split('\n');
for (const i in lines) {
if (lines[i].includes(find[0])) {
find.shift();
if (find.length === 0) {
return parseInt(i) + 1;
}
}
}
return -1;
})([`${job}:`, 'timeout-minutes:']);
if (warning !== -1) {
WARNINGS.push({
file: `${_DIR}${file}`,
line: warning
});
}
});
}
for (const warning of WARNINGS) {
console.log(
`::warning file=${warning.file},line=${warning.line},title=Possible Unexpected Behavior using "cml runner"::GitHub Actions has updated timeout-minutes if your job is unlikely to run longer than 6hrs then you should remove timeout-minutes, to have cml auto-restart for long training jobs change this value to: 50400 (35d)`
);
}
}
}

module.exports = Github;