Collective Constitutional AI
Summary
Researchers used the Polis platform to gather constitutional principles from ~1,000 Americans. They trained a language model on these publicly sourced principles and compared it to a model trained on their standard, developer-defined principles.
Review
This research represents an innovative attempt to democratize AI alignment by incorporating public preferences into an AI system's constitutional principles. By engaging approximately 1,000 Americans in an online deliberation process, the researchers sought to move beyond developer-defined values and explore how collective input might shape AI behavior.
Methodologically, the study used the Polis platform to solicit potential AI governance principles from participants and have them vote on those principles, then translated the results into a constitutional framework for model training. The resulting 'Public' model was evaluated against a 'Standard' model. Performance remained largely equivalent, but the Public model showed lower bias across social dimensions, particularly disability status and physical appearance. This suggests that public input may introduce more inclusive and balanced principles into AI systems.
Key Points
- First known attempt to collectively define AI constitutional principles through public deliberation
- Public-sourced constitution emphasized objectivity, impartiality, and accessibility
- Model trained on the public-sourced constitution demonstrated reduced bias compared to the model trained on the developer-defined constitution