When AI Joins the Team: Observations from a 3-Day Hack Event
Seven Teams, One Vision
TL;DR: Seven teams built an AI-driven audio product in three days. AI agents didn’t replace human collaboration — they amplified individual speed, shifting the bottleneck to coordination and integration. Vibe coding works for UIs and exploration; spec-driven approaches win at integration boundaries. The architectural lesson: humans own the interfaces, agents own the interiors. Participants estimated 3 days = 3 months of normal elapsed time, though that compresses prototyping time, not production delivery.

Last week I spent three days in Lisbon at an AI hack day focused on audio. Seven teams, each building an individual capability, all working toward a combined product at the end. The scope was defined upfront. The teams competed for the best performance of their individual piece while building toward a shared vision. That mix of competition and collaboration created an energy I haven’t seen in traditional hackathons. The observations below aren’t specific to audio — they apply to any AI-assisted hack day where multiple teams build toward a shared goal.
The participants were a mixed set along two dimensions. Some were cloud-native engineers comfortable with deploying services and wiring infrastructure. Others were touching cloud development for the second time. Independently, some had already adopted agentic coding tools in their daily work, while others were prompting an AI agent for the first time. These two axes created four different starting points in the same room. What everyone shared was a goal: build something that works by Friday.
As always in hackathons, magic happened. It clicked. The demos were impressive. And the business opportunities for AI in audio became tangible. Context turned out to be king. When you understand what’s happening around a listener, you can change news, music, and advertising in ways that weren’t possible before.
That said, “magic” undersells the effort. The final integration required significant last-day work. Not all components clicked on the first try. Some needed manual intervention to talk to each other. The format’s strength — a shared vision — is also its risk: if one piece underdelivers, the combined demo suffers. But the teams pulled through.
The thing I kept thinking about, though, wasn’t the product. It was the process.
The Question I Had Going In
Before the event, I wasn’t sure how agentic coding would change hackathon dynamics. I had real concerns:
Would teams stop talking to each other? Would individuals only converse with an AI? Would they be able to split work, or would it collapse into one human talking to one agent while everyone else watches?
I’ve seen enough hackathons to know that the social dynamics matter as much as the technology. A team that doesn’t communicate doesn’t ship. So I watched closely.
What Actually Happened
The answer surprised me in its normalcy. Teams discussed a lot. They debated architecture. They negotiated interfaces. They split work across people. The human collaboration didn’t disappear. It shifted.
What changed: each individual had at least one AI agent as a building partner. In many cases, more than one. So effectively, teams were composed of humans and AI agents working in parallel. A team of four, for example, became a team of four humans and four-plus agents.
The agents didn’t replace the conversations between humans. They replaced the solo grinding [3]: the hours spent reading documentation, writing boilerplate, and debugging configuration. That work still happened, but it happened faster and with less friction.
Bridging the Knowledge Gap
Here’s where it got interesting. The mixed experience levels that would normally create bottlenecks became less of a problem. AI agents helped bridge both knowledge gaps. A participant unfamiliar with cloud infrastructure could ask their agent to explain IAM roles, scaffold a deployment, or debug a networking issue. A participant new to agentic coding could watch a more experienced teammate’s workflow and replicate the pattern within minutes. The steep learning curves on both axes flattened. A recent HKU study on a GenAI hackathon found the same pattern: “less technically experienced students particularly benefited, using LLMs to bridge knowledge gaps and develop working prototypes they couldn’t have built otherwise” [6].
In practice, the cloud knowledge gap was bridged more completely. Agents are good at explaining well-documented services and generating working configurations. The agentic coding gap was bridged through peer observation — seeing someone else prompt effectively is the fastest way to learn the pattern yourself.
But speed didn’t come at the cost of learning. Teams were still troubleshooting together. When something broke, humans gathered around a screen, discussed the error, and figured it out collectively. The implementation wasn’t totally hidden. And that matters, because understanding what’s being built is what drives real learning. If you can’t explain what your agent produced, you haven’t learned anything.
To be honest: learning depth varied. Some participants gained deep understanding through debugging and iteration. Others shipped working code they couldn’t fully explain without the agent. Both outcomes are valid for a hack day. But only the former transfers to production work.
The takeaway isn’t “you need less experienced people.” AI agents lower the floor — the minimum viable contribution someone can make. They don’t eliminate the need for the ceiling. Deep expertise is still required for architecture decisions, debugging novel problems, and knowing when the agent is wrong.
Where Vibe Coding Shines
AI agents turned out to be exceptionally capable at building user interfaces. This improved the quality of the demos significantly. Teams could give the agents a lot of freedom here. Describe what you want, iterate on the result, ship it. For a hackathon environment, this is perfect. The output doesn’t need to be pixel-perfect. It needs to be impressive enough to tell the story.
Trying out new ideas and unfamiliar technology also sits well with vibe coding. Andrej Karpathy coined the term in February 2025 to describe exactly this: a style where you “surrender to the vibes,” describe what you want, and iterate until it feels right [5]. He scoped it explicitly to throwaway projects, not production systems. A hack day is the ideal environment for it.
Where Vibe Coding Breaks
Components that integrate with other components are a different story. Interfaces, data models, API contracts. These require precision. When one team’s output feeds into another team’s input, “close enough” doesn’t work.
Results from pure vibe coding tend to be too nondeterministic for integration. Team A’s agent interprets the spec slightly differently than Team B’s agent. The formats don’t quite match. The error handling assumptions diverge. Integration fails.
Here a spec-driven approach works much better [4]. Define the interfaces explicitly. Maybe even have the teams contribute to a common specification before building independently. The agents can then implement against that spec rather than inventing their own interpretation. A minimal sketch of what such a shared contract could look like follows the table below.
| Vibe Coding Works | Spec-Driven Works |
|---|---|
| UI prototypes | API contracts |
| New tech spikes | Data models |
| Demo frontends | Integration layers |
| Internal component logic | Shared schemas |
| Exploring unfamiliar APIs | Error handling conventions |
The dividing line: if another team depends on your output format, use a spec. If only your demo depends on it, vibe-code away.
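To make the dividing line concrete, here is a minimal sketch of a shared contract in Python with Pydantic. Everything in it is hypothetical: the `AudioContextEvent` model, its fields, and the producer-to-consumer handoff are invented for illustration, not taken from the event. The point is the pattern: the contract lives in one shared file that every team’s agent implements against.

```python
# shared_contract.py: the one file every team implements against.
# All names below are hypothetical, invented for illustration.
from datetime import datetime
from enum import Enum

from pydantic import BaseModel, Field


class ListenerActivity(str, Enum):
    """Coarse context labels the producing team promises to emit."""
    COMMUTING = "commuting"
    WORKING = "working"
    RELAXING = "relaxing"


class AudioContextEvent(BaseModel):
    """Contract for the handoff between two teams."""
    listener_id: str
    activity: ListenerActivity
    confidence: float = Field(ge=0.0, le=1.0)
    timestamp: datetime


def parse_event(raw_json: str) -> AudioContextEvent:
    """Validate at the boundary so drift fails loudly, not silently."""
    return AudioContextEvent.model_validate_json(raw_json)
```

Each team’s agent gets this file in its context and vibe-codes freely behind it. Neither side gets to invent its own field names.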
Prototypes Are Not Products
A lot of what was built during those three days needs to evolve into a more spec-driven approach. As with any prototype, the output is valid design input. It proves the concept works. It shows the art of the possible. But it should not be confused with production-ready software. Research on vibe-coded projects shows a predictable decay pattern: rapid shipping in months 1-3, integration challenges in months 4-9, and delivery stalls by month 16, when teams no longer understand their own systems [7].
This isn’t a criticism. It’s the nature of hack days. The value is in the proof, not the polish. The teams proved that AI-driven audio works. That context-aware content generation is feasible. That seven independent capabilities can combine into something coherent. Now the real engineering begins.
The 3-Day = 3-Month Compression
Participants estimated that in three days they built output that would normally require three months of elapsed time. That comparison isn’t apples-to-apples. Three months of elapsed time includes context switching, meetings, approvals, dependencies, and production-quality requirements. A hack day strips all of that away. The more honest framing: hack days compress exploration and prototyping time. Getting the same proof of concept through a normal organizational process takes months. Shipping production software from it takes longer still.
But the compression is real, and it’s not just AI tools. It’s the combination of factors: bringing everyone into one room. Allowing them to focus without the noise of daily operations. Using cloud technology that enables quick experimentation without procurement cycles. And yes, the support of AI tools that enable fast progression even into unknown technologies.
None of these factors alone explains the compression. Together, they compound. Focus multiplies the effect of good tools. Good tools multiply the effect of focus. Remove any one factor and the equation breaks.
Incentivizing AI Usage
We actively incentivized participants to use more AI tokens. This sounds counterintuitive. Shouldn’t AI be used responsibly and effectively, not just maximally?
In a hackathon context, the answer is: sometimes more is more. I was supporting the event with an opening talk and technical guidance, and one thing I noticed early was hesitation. Many participants had never used agentic coding tools before. They weren’t sure when to prompt, how to prompt, or whether the output would be useful. The incentive gave them permission to just try. Push into it. Experiment without worrying about waste.
We built a small leaderboard that tracked token consumption across teams. It’s a bad proxy metric for real projects. In production, AI should be used deliberately and efficiently. But in a learning environment, the correlation between iteration velocity and output is stronger than you’d think. Teams that stop iterating stop shipping [1].
The leaderboard created a social signal. When your bar is flat, you’ve stopped experimenting. Research on leaderboard design in learning environments confirms this effect: visible progress indicators enhance competence need satisfaction and perceived task meaningfulness [8]. That visibility alone was enough to unstick teams that were overthinking their next prompt.
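For flavor, here is a minimal sketch of the leaderboard idea. The real-time leaderboard from [1] is not reproduced here; the usage-record format below (`team` and `tokens` fields) is an assumption for illustration.

```python
# token_leaderboard.py: a minimal sketch of the token leaderboard idea.
# The usage-record format is assumed for illustration; the event's
# actual tracking setup is described in [1].
from collections import defaultdict


def leaderboard(usage_records: list[dict]) -> list[tuple[str, int]]:
    """Aggregate token counts per team, highest first."""
    totals: dict[str, int] = defaultdict(int)
    for record in usage_records:
        totals[record["team"]] += record["tokens"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    records = [
        {"team": "team-1", "tokens": 80_000},
        {"team": "team-2", "tokens": 145_000},
        {"team": "team-1", "tokens": 60_000},
    ]
    for rank, (team, tokens) in enumerate(leaderboard(records), start=1):
        print(f"{rank}. {team}: {tokens:,} tokens")
```

The flat-bar signal falls out of this for free: snapshot the totals over time, and a team that has stopped prompting shows up immediately.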
Voices from the Room
A few things I heard that stuck with me:
“Not as much coding experience needed. Just creativity and a good prompt is able to create amazing products.”
“Planning and discussing between humans is still needed.”
“Thinking is not replaced.”
“Collective intelligence, speed through focus, building a network.”
“We built in 3 days what would have required elapsed time of 3 months.”
Architecting for Hack Days

Humans own the boundaries, agents own the interiors
Watching seven teams build in parallel with AI agents taught me something about architecture. The teams that shipped fastest weren’t the ones with the best prompts. They were the ones that decomposed their problem into the right-sized pieces.
The principle: make components small enough that a single agent session can own one end-to-end. When a component fits in one context window, the agent can reason about it completely. When it spans multiple sessions or requires coordination across boundaries, you’re back to the integration problems that slow everything down.
This suggests an architectural approach for hack day builds (different rules apply for production systems):
The hack day architecture pattern: spec at the boundaries, vibe coding in the interiors
The spec layer is human territory. API contracts, data models, interface definitions, error handling conventions. Humans discuss, agree, write it down. This is where “planning and discussing between humans is still needed” lives. It’s also where the hack day format shines: everyone in one room, whiteboard, 30 minutes, done.
The components are agent territory. Once the boundaries are clear, each team (or individual) can vibe-code their component with full creative freedom. The agent doesn’t need to know about the other components. It just needs to respect the contract.
The integration layer is spec-driven. This is where you test that contracts are honored. Where you validate data formats. Where you catch the drift that vibe coding inevitably introduces. Automated tests here are worth more than anywhere else.
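As a sketch of what a test at the seams could look like, here is a contract test that reuses the hypothetical `AudioContextEvent` model from the earlier sketch. Both the sample payloads and the drift scenario are invented for illustration.

```python
# test_contract.py: a sketch of a boundary test, reusing the
# hypothetical shared_contract.py from the earlier example.
import pytest
from pydantic import ValidationError

from shared_contract import parse_event


def test_valid_sample_parses():
    # A payload the producing team claims to emit; run real captured
    # samples through this before wiring the components together.
    sample = (
        '{"listener_id": "u-42", "activity": "commuting",'
        ' "confidence": 0.87, "timestamp": "2026-05-01T08:15:00Z"}'
    )
    event = parse_event(sample)
    assert event.activity == "commuting"


def test_renamed_field_is_rejected():
    # A renamed field ("score" instead of "confidence") simulates the
    # kind of drift vibe coding introduces; the contract test catches
    # it here instead of during the combined demo.
    drifted = (
        '{"listener_id": "u-42", "activity": "commuting",'
        ' "score": 0.87, "timestamp": "2026-05-01T08:15:00Z"}'
    )
    with pytest.raises(ValidationError):
        parse_event(drifted)
```

Cheap to write, and it converts "the formats don’t quite match" from a demo-day surprise into a red test.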
This pattern works well for time-boxed events where speed matters more than long-term maintainability. Production systems demand more rigorous specs, comprehensive test coverage, and deliberate contract evolution — but the shape of “humans own boundaries, agents own interiors” is a useful starting point for thinking about how to organize agentic builds in any context.
The teams that struggled during the hack day were the ones that tried to vibe-code across boundaries. The ones that shipped were the ones that spent their first hour defining contracts and their remaining time building independently.
What I Take Away
We called this a hack day rather than a hackathon for a reason. The scope was defined upfront. Teams weren’t inventing problems to solve. They were building defined capabilities toward a shared product. Competition on execution quality. Collaboration on integration. A shared demo that only works if everyone delivers. No team could succeed alone, and no team wanted to be the one that broke the final product.
AI agents don’t replace human collaboration. They amplify individual execution speed, which makes the human collaboration more valuable, not less. When everyone can build faster, the bottleneck shifts to alignment, architecture decisions, and integration [2]. The things that require humans talking to humans.
Vibe coding is a spectrum, not a binary. It works brilliantly for exploration, UIs, and isolated components. It breaks for integration points and shared contracts. Knowing where you are on that spectrum is the skill that separates teams that ship from teams that struggle.
The architectural lesson: decompose into components small enough for one agent to own. Define boundaries with humans. Fill interiors with agents. Test at the seams.
And hack days, done right, compress not just time but learning. Everyone involved called this a success, and it’s worth repeating.
Have you run an AI-assisted hackathon? I’d be curious whether you saw similar dynamics, or whether your teams found different patterns for splitting work between humans and agents.
Sources
[1] S. Christoph, “Hackathon Gamification: A Real-Time Leaderboard You Can Deploy in 5 Minutes,” schristoph.online, May 2026. https://schristoph.online/blog/hackathon-in-a-hackathon/
[2] S. Christoph, “The Bottleneck Moved: What 10 Studies Say About AI Developer Productivity,” schristoph.online, May 2026. https://schristoph.online/blog/bottleneck-moved-productivity/
[3] S. Christoph, “The Coding Agent That Doesn’t Code,” schristoph.online, March 2026. https://schristoph.online/blog/the-coding-agent-that-doesnt-code/
[4] S. Christoph, “Code Quality Is the New Infrastructure,” schristoph.online, May 2026. https://schristoph.online/blog/code-quality-new-infrastructure/
[5] A. Karpathy, “Vibe Coding,” X/Twitter, February 2025. Term coined to describe AI-assisted development where developers “surrender to the vibes” and iterate through conversation.
[6] N. Law et al., “The role of generative artificial intelligence in collaborative problem solving of authentic challenges,” British Journal of Educational Technology, 2025. https://doi.org/10.1111/bjet.70010
[7] M. Shah, “Vibe Coding vs Spec-Driven Development (2026): When to Use Each,” Augment Code, March 2026. https://www.augmentcode.com/guides/vibe-coding-vs-spec-driven-development
[8] S. Park & S. Kim, “Leaderboard Design Principles to Enhance Learning and Motivation in a Gamified Educational Environment,” JMIR Serious Games, 2021. https://pmc.ncbi.nlm.nih.gov/articles/PMC8097522/
❤️ Created with the support of AI (Kiro)