Hello PH! We’re building Haystack to help teams handle the explosion in the number of pull requests that need to be reviewed because of the rise of coding agents.
Haystack replaces the GitHub PR review system with a queue that triages every PR before a human has to read any diffs. It looks at the diffs, the codebase, and the coding-agent conversation that produced the PR. Haystack then routes each PR into one of three buckets:
1. Safe to merge. This means the PR has enough evidence behind it that the team can merge it without another human’s review.
Some examples:
- A small UI copy change that includes a screenshot showing the final state
- A backend change where the author clearly tested the important paths and ran the changes in a real environment
2. Needs fixes. This means the PR has bugs or violates a rule in your codebase, so the author needs to fix it.
Some examples:
- The agent was asked to make loading a large table faster by adding pagination, but the PR still loads every result at once and only “implements” pagination in the UI
- The PR silently catches an error instead of logging, surfacing, or handling it. This violates the team’s “no silent error swallowing” rule
3. Needs human review. This means the PR may not have been sufficiently verified by the author, or it touches a sensitive part of the codebase (as defined by user-provided guidelines), and so it requires human review.
Some examples:
- The PR changes a significant amount of billing logic
- The PR changes a critical user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team’s rule that high-impact user-facing changes need manual verification.
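To make the routing concrete, here is a minimal Python sketch of how rules like these could map a PR to one of the three buckets. It is not Haystack’s implementation: the `PRSignals` fields, the `SENSITIVE_PREFIXES` paths, the `triage` function, and the order in which the checks run are all hypothetical. In practice the signals would come from analyzing the diff, the codebase, and the agent conversation rather than being filled in by hand.

```python
from dataclasses import dataclass, field
from enum import Enum


class Bucket(Enum):
    SAFE_TO_MERGE = "Safe to merge"
    NEEDS_FIXES = "Needs fixes"
    NEEDS_HUMAN_REVIEW = "Needs human review"


@dataclass
class PRSignals:
    # Hypothetical findings extracted from the diff, the codebase,
    # and the coding-agent conversation that produced the PR.
    bugs_found: list[str] = field(default_factory=list)
    rule_violations: list[str] = field(default_factory=list)
    touched_paths: list[str] = field(default_factory=list)
    verification_evidence: list[str] = field(default_factory=list)


# Hypothetical team guidelines: path prefixes that always need a human reviewer.
SENSITIVE_PREFIXES = ("billing/", "onboarding/")


def triage(signals: PRSignals) -> Bucket:
    """Route a PR into one of three buckets (the check order is an assumption)."""
    # Concrete bugs or rule violations go back to the author first.
    if signals.bugs_found or signals.rule_violations:
        return Bucket.NEEDS_FIXES
    # Sensitive areas get human eyes no matter how much evidence exists.
    if any(path.startswith(SENSITIVE_PREFIXES) for path in signals.touched_paths):
        return Bucket.NEEDS_HUMAN_REVIEW
    # No evidence that the author verified the change: a human has to check it.
    if not signals.verification_evidence:
        return Bucket.NEEDS_HUMAN_REVIEW
    return Bucket.SAFE_TO_MERGE


if __name__ == "__main__":
    copy_change = PRSignals(
        touched_paths=["web/copy.ts"],
        verification_evidence=["screenshot of the final state"],
    )
    billing_change = PRSignals(
        touched_paths=["billing/invoice.py"],
        verification_evidence=["unit tests"],
    )
    print(triage(copy_change).value)     # Safe to merge
    print(triage(billing_change).value)  # Needs human review
```

Checking for bugs and rule violations first sends clearly broken PRs back to the author before any reviewer attention is spent; a real triage system could reasonably order or weight these signals differently.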
Instead of starting with line-by-line diffs, Haystack immediately tells the reviewer the intent behind the PR, what design decisions the author made (informed by their coding-agent conversation), and how much the author did to verify that the pull request works (e.g. ran scripts, checked the frontend, etc.).
This way, review shifts from “what changed?” to “is this the right behavior, and is there evidence that it works?”
Here’s a quick demo: https://www.tella.tv/video/strea…
We previously launched Haystack as a tool for understanding large PRs (https://news.ycombinator.com/ite…). As many of you can probably relate, the release of Opus 4.5 completely shattered our sense of how fast an engineer could craft a PR.
And as coding agents got even better after 4.5, we found that pull request review didn’t scale with our coding speed. With every member of our team able to pump out more than 20 pull requests a day, code review quickly became cognitively demanding and less useful.
After talking with people, we learned that many feel the same way and currently face a binary choice: either not doing review at all or trying to keep up with a firehose of pull requests.
Haystack is our attempt at a third path. We still believe in code review, but as coding agents produce more code, human reviewer attention becomes more valuable and more expensive.
Haystack helps teams spend that attention on the PRs where a human can meaningfully change the outcome. And for those PRs, Haystack shows the reviewer what the PR was meant to do, whether the author showed that it works, and which design decisions need a second pair of eyes.
We’re still fairly early and are figuring out whether Haystack actually makes code review better. We’d love any and all feedback!