Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ch] Dr. CI to use CH Part 1 #5634

Merged
merged 6 commits into from
Sep 11, 2024
Merged

[ch] Dr. CI to use CH Part 1 #5634

merged 6 commits into from
Sep 11, 2024

Conversation

clee2000
Copy link
Contributor

@clee2000 clee2000 commented Sep 9, 2024

Moves most queries used by Dr. CI to CH. There are some one off queries that I still need to track down

  • types changed: the CH query doesn't have nulls, so I changed null checks to check for the corresponding default value ("" for string, 0 for int, time 0 for timestamps) and removed a lot of !. This causes a mild conflict with the JobData struct, so there's some casting that I don't think actually does anything. I'm also not sure how accurate the typing was originally, since I think some ids were int before too
  • CH returns timestamps without the "Z" timezone, but they are all in UTC, so use dayjs.utc to make sure it is correct (dayjs uses local timezone by default)

I checked the outputs by cherry picking csl/comp_orig (4db3fa5) onto this branch and origin/main (origin/main also needs 055a919 to fix a bug), running curl curl "http://localhost:3000/api/drci/drci" --data 'repo=pytorch' > ../_logs/<new, orig>.json, running a script



import json
from library import REPO_ROOT
import re

def failures(thing):
    thing = thing["failures"]
    for pr_num in thing:
        for classification in thing[pr_num]:
            thing[pr_num][classification] = sorted(thing[pr_num][classification], key=lambda x: x["id"])
            for failure in thing[pr_num][classification]:
                if failure["failure_captures"] == None:
                    failure["failure_captures"] = []
                if failure["failure_context"] == None:
                    failure["failure_context"] = []
                if failure["failure_lines"] == None:
                    failure["failure_lines"] = []
                if failure["runnerName"] == None:
                    failure["runnerName"] = ""
                for key, value in failure.items():
                    date_match = re.match(r"(.*) (.*)\.000000000", str(value))
                    if date_match:
                        failure[key] = f"date"
                    date_match_2 = re.match(r"(\d\d\d\d-\d\d-\d\d)T(\d\d:\d\d:\d\d).*", str(value))
                    if date_match_2:
                        failure[key] = f"date"
    return thing


def comments(thing):
    thing = thing["comments"]
    s = "\n"
    for pr_num in sorted(thing.keys()):
        s += f"{pr_num}\n"
        s += f"{thing[pr_num]['new']}\n"
        s += f"{thing[pr_num]['body']}\n"
    return s

if __name__ == "__main__":
    orig_file = REPO_ROOT / "../test-infra/_logs/orig.json"
    new_file = REPO_ROOT / "../test-infra/_logs/new.json"

    with open(orig_file, "r") as f:
        orig = json.load(f)

    with open(new_file, "r") as f:
        new = json.load(f)

    o_thing = failures(orig)
    n_thing = failures(new)

    with open(orig_file, "w") as f:
        json.dump(o_thing, f, indent=2, sort_keys=True)
        print(comments(orig), file=f)

    with open(new_file, "w") as f:
        json.dump(n_thing, f, indent=2, sort_keys=True)
        print(comments(new), file=f)

and the diffing the files

There were small differences which I then checked individually, and they were due to the databases not being exactly the same when I did the curls. There were also some small differences in which disable issues were used, but based on the code for that I think it's still technically correct

Copy link

vercel bot commented Sep 9, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 11, 2024 8:21pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 9, 2024
acc[row.pr_num].push(row.merge_commit_sha);
return acc;
}, new Map<number, string[]>());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this function here from jobUtils.ts because it imports from clickhouse.ts file

label,
});
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split up functions since I want Dr. CI to use CH, but hud to still use rockset for now, and the gating by environment variable doesn't work for that

OWNER,
repo,
Array.from(workflowsByPR.keys())
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this query to be out of the loop so its one big query instead of one query for each PR

@clee2000 clee2000 marked this pull request as ready for review September 10, 2024 16:59
@clee2000 clee2000 requested a review from a team September 10, 2024 17:00
@clee2000 clee2000 changed the title [ch] Dr. CI to use CH [ch] Dr. CI to use CH Part 1 Sep 10, 2024
@@ -81,3 +89,5 @@ WHERE
w.id in (select id from materialized_views.workflow_run_by_head_sha where head_sha in (select sha from recent_prs))
ORDER BY
time DESC
-- Non experimental analyzer has problems with final...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, the need to add final into all the different places seems easy to miss. Maybe we should adopt what @kit1980 has been doing in https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/queued_jobs_by_label/query.sql#L48 by setting final to true for the whole query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when we get a linter working in CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a linter for this would be good

@@ -1,5 +1,12 @@
// This file can't be imported by .tsx files due to client/server stuff (ex not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only be used on the server side, i.e. API. All HUD pages should query the API https://github.com/pytorch/test-infra/tree/main/torchci/pages/api/clickhouse to get the data instead. This is the same reason I need to add the toggle for CH/Rockset on HUD metric page

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something that can be done by a linter?

// Import itself to ensure that mocks can be applied, see
// https://stackoverflow.com/questions/51900413/jest-mock-function-doesnt-work-while-it-was-called-in-the-other-function
// https://stackoverflow.com/questions/45111198/how-to-mock-functions-in-the-same-module-using-jest
import * as thisModule from "./clickhouse";
Copy link
Contributor

@huydhn huydhn Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! Previously, I need to refactor the code to move the functions I want to mock to another module instead

torchci/lib/drciUtils.ts Outdated Show resolved Hide resolved
torchci/lib/drciUtils.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@huydhn huydhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some minors comments but overall LGTM!

@clee2000 clee2000 merged commit eb5fda4 into main Sep 11, 2024
7 checks passed
clee2000 added a commit that referenced this pull request Sep 16, 2024
Follow up to #5634. These are
the one off queries that I needed to track down.

Surprisingly, none of the current PRs go through this path, so I used
some older PRs, like 135397, 114772, as tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants