In November, we launched the Sora Android app to the world, giving anyone with an Android device the ability to turn a short prompt into a vivid video. On launch day, the app reached #1 in the Play Store. Android users generated more than a million videos in the first 24 hours.
Behind the launch is a story: the initial version of Sora’s production Android app was built in 28 days, thanks to the same agent that’s available to any team or developer: Codex.
From October 8 to November 5, 2025, a lean engineering team, working alongside Codex and consuming roughly 5 billion tokens, shipped Sora for Android from prototype to global launch. Despite its scale, the app has a crash-free rate of 99.9 percent and an architecture we’re proud of. In case you’re wondering whether we used a secret model: we used an early version of the GPT-5.1-Codex model, the same version that any developer or business can use today via CLI, IDE extension, or web app.
Prompt: figure skater performs a triple axel with a cat on her head
When Sora launched on iOS, usage exploded. People immediately began generating a stream of videos. On Android, by contrast, we had only a small internal prototype and a mounting number of pre-registered users on Google Play.
A common response to a high-stakes, time-pressured launch is to pile on resources and add process. A production app of this scope and quality would normally involve many engineers working for months, slowed down by coordination.
American computer architect Fred Brooks famously warned that “adding more people to a late software project makes it later.” In other words, when trying to ship a complex project quickly, adding more engineers can often reduce efficiency by increasing communication overhead, task fragmentation, and integration costs. We leaned into this insight instead of ignoring it: we assembled a strong team of four engineers, all equipped with Codex to significantly increase each engineer’s impact.
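Brooks’s point is often illustrated by counting pairwise communication paths, which grow as n(n-1)/2 with team size n. A minimal sketch of that arithmetic follows; the function name is ours, and the larger team size is only an illustrative comparison, not a figure from the project.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people who may need to coordinate."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Four engineers (the team size described above) versus a hypothetical team of 12.
for size in (4, 12):
    print(f"{size} engineers -> {communication_paths(size)} pairwise paths")
```

Under this simple count, tripling the team from 4 to 12 multiplies the number of paths elevenfold (6 versus 66), which is the intuition behind keeping the team small.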
Working this way, we shipped an internal build of Sora for Android to employees in 18 days and launched publicly 10 days later. We maintained a high bar on Android engineering practices, invested in maintainability, and held the app to the same reliability bar we’d expect from a more traditional project. (We also continue to use Codex extensively today to evolve the app and bring new features to it.)
To make sense of how we worked with Codex, it helps to know where it shines and where it needs direction. Treating it like a newly hired senior engineer was a good approach. Codex’s ability meant we could spend more time directing and reviewing code than writing it ourselves.
Where Codex needs guidance
- Codex isn’t yet great at inferring what it hasn’t been told (e.g., your preferred architecture patterns, product strategy, real user behavior, and internal norms or shortcuts).
- Similarly, Codex couldn’t see the app actually run: it couldn’t open Sora on a device, notice that a scroll felt off, or sense that a flow was confusing. Only our team could cover these experiential tasks.
- Each instance requires onboarding. Sharing context with clear goals, constraints, and guidance on “how we do things” was essential to making Codex execute well.
- In the same vein, Codex struggled with deep architectural judgment: left on its own, it might introduce an extra view model where we really wanted to extend an existing one, or push logic into the UI layer that clearly belonged in a repository. Its instinct is to get something working, not to prioritize long-term cleanliness.
We found it useful to have Codex create and maintain a generous number of AGENTS.md files throughout the codebase. This made it easy to apply the same guidance and best practices across sessions. For example, to ensure Codex wrote code that followed our style guidelines, we added that guidance to our top-level AGENTS.md.
Where Codex excels
- Reading and understanding large codebases quickly: Codex knows essentially all major programming languages, which makes it easier to apply the same concepts across many platforms without complex abstractions.
- Testing coverage: Codex is (uniquely) enthusiastic about writing unit tests to cover a broad variety of cases. Not every test was deep, but having breadth of coverage was helpful in preventing regressions.
- Applying feedback: In a similar vein, Codex is good at reacting to feedback. When CI failed, we could paste log output into a prompt and ask Codex to propose fixes.
- Massively parallel, disposable execution: Most people won’t push the limits of the number of sessions they could actually run at any one time. It’s entirely feasible to test multiple ideas in parallel and treat code as disposable.
- Offering new perspective: In design discussions, we used Codex as a generative tool to explore potential failure points and discover new ways to solve a problem. For example, while we designed video player memory optimizations, Codex sifted through multiple SDKs to propose approaches we wouldn’t have had time to parse. The insights from Codex’s research proved useful in minimizing the memory footprint of the final app.
- Enabling higher-leverage work: In practice, we ended up spending more time reviewing and directing code than writing it ourselves. That said, Codex is very good at code review, too, often catching bugs before they’re merged and improving reliability.
Once we acknowledged these characteristics, our working model became simpler. We leaned on Codex to do a huge amount of heavy lifting within well-understood patterns and well-bounded scopes, while our team focused on architecture, user experience, systemic changes, and final quality.
Even the best new senior hire doesn’t have the right vantage point for making long-term trade-offs right away. To leverage Codex and ensure its work was robust and maintainable, it was key that we oversaw the app’s systems design and key trade-offs ourselves. These included shaping the app’s architecture, modularization, dependency injection, and navigation; we also implemented authentication and base networking flows.
From this foundation, we wrote a few representative features end-to-end. We used the rules we wanted the entire codebase to follow and documented project-wide patterns as we went. Pointing Codex to these representative features let it work more independently within our standards. For a project that we estimate was 85% written by Codex, a carefully planned foundation avoided costly backtracking and refactoring. It was one of the most important decisions we made.
The idea was not to make “something that works” as quickly as possible, but rather to make “something that gets how we want things to work.” There are many “correct” ways to write code. We didn’t need to tell Codex exactly what to do; we needed to show Codex what’s “correct” on our team. Once we had established our starting point and how we liked to build, Codex was ready to start.
To see what would happen, we did try prompting: “Build the Sora Android app based on the iOS code. Go,” but quickly aborted that path. While what Codex created technically worked, the product experience was subpar. And without a clear understanding of endpoints, data, and user flows, Codex’s single-shot code was unreliable. (Even without using an agent, it’s risky to merge thousands of lines of code.)
We hypothesized Codex would thrive in a sandbox of well-written examples, and we were right. Asking Codex to “build this settings screen” with almost no context was unreliable. Asking Codex to “build this settings screen using the same architecture and patterns as this other screen you just saw” worked far better. Humans made the structural decisions and set the invariants; Codex then filled in large amounts of code within that structure.
Our next step in maximizing Codex’s potential was figuring out how to enable Codex to work for long periods of time (recently, more than 24 hours), unsupervised.
Early on in using Codex, we jumped straight to prompts like, “Here is the feature. Here are some files. Please build it.” That sometimes worked, but mostly produced code that technically compiled while straying from our architecture and goals.
So we changed the workflow. For any non-trivial change, we first asked Codex to help us understand how the system and code work. For example, we’d ask it to read a set of related files and summarize how that feature works: say, how data flows from the API through the repository layer, the view model, and into the UI. Then we’d correct or refine its understanding. (For example, we’d point out that a specific abstraction really belongs in a different layer, or that a given class exists only for offline mode and should not be extended.)
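To make that layering concrete, here is a minimal sketch of the data flow described above: API, then repository, then view model, then UI. It is written in Python purely for brevity (the real app is an Android codebase), and every class, field, and function name in it is hypothetical rather than taken from the Sora source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoDto:
    """Raw shape returned by a (hypothetical) API endpoint."""
    id: str
    title: str
    duration_ms: int


@dataclass(frozen=True)
class VideoUiState:
    """What the UI layer renders; no networking or business logic here."""
    title: str
    duration_label: str


class FakeApi:
    """Stand-in for a network client so the sketch is self-contained."""
    def fetch_videos(self) -> list[VideoDto]:
        return [VideoDto(id="v1", title="Skater", duration_ms=9500)]


class VideoRepository:
    """Owns data access; fetching and filtering rules belong in this layer."""
    def __init__(self, api: FakeApi) -> None:
        self._api = api

    def videos(self) -> list[VideoDto]:
        return [v for v in self._api.fetch_videos() if v.duration_ms > 0]


class FeedViewModel:
    """Maps repository data to UI state; the UI only reads `state()`."""
    def __init__(self, repository: VideoRepository) -> None:
        self._repository = repository

    def state(self) -> list[VideoUiState]:
        return [
            VideoUiState(title=v.title, duration_label=f"{v.duration_ms / 1000:.1f}s")
            for v in self._repository.videos()
        ]


def render(states: list[VideoUiState]) -> list[str]:
    """'UI layer': formats state for display and nothing else."""
    return [f"{s.title} ({s.duration_label})" for s in states]


print(render(FeedViewModel(VideoRepository(FakeApi())).state()))
```

A summary from Codex that placed the filtering rule in `render` instead of `VideoRepository` would be the kind of misunderstanding worth correcting before any code is written.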
Similar to how you might engage a new, highly capable teammate, we worked with Codex to create a solid implementation plan. That plan often looked like a miniature design document directing which files should change, what new states should be introduced, and how logic should flow. Only then did we ask Codex to start applying the plan, one step at a time. One helpful tip: for very long tasks, where we hit the limit of our context window, we’d ask Codex to save its plan to a file, allowing us to apply the same direction across instances.
This extra planning loop turned out to be worth the time. It allowed us to let Codex run “unsupervised” for long stretches, because we knew its plans. It made code review easier, because we could check the implementation against our plan rather than reading a diff without context. And when something went wrong, we could debug the plan first and the code second.
The dynamic felt similar to the way a good design document gives a tech lead confidence in a project. We weren’t just generating code: we were producing code that supported a shared roadmap.
At the peak of the project, we were often running multiple Codex sessions in parallel. One was working on playback, another on search, another on error handling, and sometimes another on tests or refactors. It felt less like using a tool and more like managing a team.
Each session would periodically report back to us with progress. One might say, “I’m done planning out this module; here’s what I propose,” while another would offer a large diff for a new feature. Each required attention, feedback, and review. It was uncannily similar to being a tech lead with several new engineers, all making progress, all needing guidance.
The result was a collaborative flow. Codex’s raw coding capability freed us from a lot of manual typing. We had more time to think about architecture, read pull requests carefully, and test the app.
At the same time, that extra speed meant we always had something waiting in our review queue. Codex didn’t get blocked by context switching, but we did. Our bottleneck in development shifted from writing code to making decisions, giving feedback, and integrating changes.
This is where Brooks’s insights land in a new way. You can’t simply add Codex sessions and expect linear speedups, any more than you can keep adding engineers to a project and expect the schedule to shrink linearly. Each additional “pair of hands,” even a virtual one, adds coordination overhead. We had become conductors of an orchestra rather than simply faster solo players.
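One way to picture why adding sessions does not give linear speedups is a toy throughput model in which merged work is capped by how much the humans can review. The sketch below is our own illustration; the function name and the rates are made-up numbers, not measurements from the project.

```python
def merged_per_day(sessions: int, changes_per_session: float, review_capacity: float) -> float:
    """Changes merged per day when every change must pass human review.

    Output is limited by whichever is smaller: what the sessions produce
    or what the reviewers can absorb.
    """
    produced = sessions * changes_per_session
    return min(produced, review_capacity)


# Hypothetical rates: each session yields 3 reviewable changes a day,
# and the team can carefully review 10 changes a day.
for n in (1, 2, 4, 8):
    print(n, merged_per_day(n, changes_per_session=3, review_capacity=10))
```

In this model, once production exceeds review capacity, extra sessions only lengthen the review queue.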
We started our project with a huge stepping stone: Sora had already shipped on iOS. We frequently pointed Codex at the iOS and backend codebases to help it understand key requirements and constraints. Throughout the project we joked that we had reinvented the idea of a cross-platform framework. Forget React Native or Flutter; the future of cross-platform is just Codex.
Beneath the quip are two principles:
- Logic is portable. Whether the code is written in Swift or Kotlin, the underlying application logic (data models, network calls, validation rules, business logic) is the same. Codex is very good at reading a Swift implementation and producing an equivalent in Kotlin that preserves semantics.
- Concrete examples provide powerful context. A fresh Codex session that can see “here is exactly how this works on iOS” and “here is the Android architecture” is far more effective than one that’s only working from natural language descriptions.
Putting these principles to work, we made the iOS, backend, and Android repos available in the same environment. We gave Codex prompts like:
“Read these models and endpoints in the iOS code and then propose a plan to implement the same behavior on Android using our existing API client and model classes.”
One small but useful trick was to detail in ~/.codex/AGENTS.md where local repos lived and what they contained. That made it easier for Codex to find and navigate relevant code.
We were effectively doing cross-platform development through translation instead of shared abstraction. Because Codex handled most of the translation, we avoided doubling implementation costs.
The broader lesson is that for Codex, context is everything. Codex did its best work when it understood how the feature already worked on iOS, paired with an understanding of how our Android app was structured. When Codex lacked that context, it wasn’t “refusing to cooperate”; it was guessing. The more we treated it like a new teammate and invested in giving it the right inputs, the better it performed.
By the end of our four-week sprint, using Codex stopped feeling like an experiment and became our default development loop. We used it to understand existing code, plan changes, and implement features. We reviewed its output the same way we’d review a teammate’s. It was simply how we shipped software.
It became clear that AI-assisted development does not reduce the need for rigor; it increases it. As capable as Codex is, its goal is to get from A to B, now. This is why AI-assisted coding doesn’t work without humans. Software engineers understand the real-world constraints of systems and the best ways to architect software, and they build with future development and product plans in mind. The superpowers of tomorrow’s software engineer will be deep systems understanding and the ability to work collaboratively with AI over long time horizons.
The most interesting parts of software engineering are building compelling products, designing scalable systems, writing complex algorithms, and experimenting with data, patterns, and code. However, the realities of software engineering, past and present, often lean more mundane: centering buttons, wiring endpoints, and writing boilerplate. Now, Codex makes it possible to focus on the most meaningful parts of software engineering and the reasons we love our craft.
Once Codex is set up in a context-rich environment where it understands your goals and how you like to build, any team can multiply its capabilities. Our launch retro isn’t a one-size-fits-all recipe, and we’re not claiming to have solved AI-assisted development. But we hope our experience makes it easier to find the best ways to empower Codex to empower you.
When Codex launched in a research preview seven months ago, software engineering looked very different. Through Sora, we got to explore the next chapter of engineering. As our models and harness keep improving, AI will become an increasingly indispensable part of building.
What will you make with your own team of Codex?


