Content
<aside>
👉
The program runs for 5 weeks. Each week focuses on a different phase of building and testing an alignment method. The goal is to embed values in a system in a way that generalizes and can’t be easily gamed. Participants will form teams during the app process. We recommend teams of 3–5. You can apply solo or with collaborators.
Mentors may support specific teams depending on availability. Teams are expected to coordinate independently and meet regularly. If someone drops out, we’ll help rebalance teams where needed.
</aside>
Week 1: Scoping
In the first week, teams will examine previous attempts within their chosen method. This includes reviewing what has been done before, why they failed or suceeded, and what directions are promising to explore. The goal is to understand what makes this iteration different, and to identify what makes this experiment a success. Teams with assigned mentors will coordinate throughout the week to stress-test their direction.
By the end of Week 1, the team should have:
- Reviewed prior work related to their method.
- Looked into the novelty in their chosen method.
- Drafted a preliminary plan for how to collect and interpret supporting evidence.
<aside>
✍️
Track-specific Examples
- Agent Foundations: Define a formal proof under bounded assumptions and sketch a path from proof to implementable architecture
- Neuroscience-Based: Identify candidate regions or mechanisms in the brain associated with value encoding, and outline a model for reproducing or testing this behavior
- Preference Optimization: Establish a case for how the method improves on prior oversight approaches, supported by references to eval results or known alignment gaps
- Open Track: Justify expected scalability and identify alignment evaluations, interpretability tools, and robustness tests appropriate for the method
</aside>
Week 2-3: Experimentation
Teams will begin implementation by running tests and iterating based on the research they found from Week 1. We will provide TPU credits and mentorship to help teams build their project from the ground up. In general, every team is expected to test whether their method actually moves the needle on alignment.
By the end of Week 3, the team should have:
- At least one working implementation running, even if minimal
- Tested the method at increasing scale (larger models, more data, more steps)
- Planned for robustness testing in Week 4
<aside>
✍️
For the Agent Foundations tracks
If you’re in this track, you’ll follow one of two subtracks:
- Theoretical Focus: Extend proofs, derive constraints, stress-test assumptions
- Category Theory: Use string diagrams or string machines to construct and reason about infrakernels
</aside>
Week 4: Testing
Teams will critique their alignment method, attempt to break their own evals, and run tests on larger or more adversarial setups. Mentors and the AI-Plans team will advise teams based on prior hackathons in alignment evals.
By the end of Week 4, the team should have:
- Attempted to red-team its own method
- Documented failure modes and anomalies in the results
- Finalizes the version they’ll write up and present
Week 5: Wrap-up
Teams will write their final summary. This includes the method, evidence, assumptions, failure analysis, and proposed next steps. The summary should stand on its own as a falsifiable alignment contribution. Teams will also prepare their poster and get final feedback.
By the end of Week 5, the team should have:
- Written a research summary that includes a clear statement of the method, rationale, evaluation, and result
- Created poster materials for the final presentation
- Integrated the feedback from mentors and peers
Final Day Presentation + Job Fair
The program ends with a public poster session and a job fair.
- Presentation Evening: Posters will be presented online in a conference-style format in GatherTown. Each team will have a virtual space to share their work and talk with attendees. Teams will present their work and defend their method. A panel will vote on standout projects.
- Job Fair: Research orgs, labs, and startups can host booths, meet researchers, and share open roles.
<aside>
🍿
Attendance and Pricing
All funds go toward program costs and participant stipends.
- General admission: €10 for a guaranteed spot
- Unpaid participants or below €10 threshold: Free with reservation
- Org booths at job fair: €200
- 15-minute talk slot on main stage: €2000
</aside>