Lessons from archival provenance: Reframing data creator relationships in AI
Emily Maemura
Abstract
This paper presents a brief history of archival provenance and its shifting role across different paradigms of archives, considering how this past work can provide insights into current trajectories of AI. It explores what archival provenance does, or can do, differently than current imaginations of data provenance in AI. It proposes three “lessons from archival provenance” that challenge the present focus of existing approaches in AI, in particular highlighting how narrowly this work structures and models evidence, authenticity and creatorship of data. These lessons can be used to identify new opportunities for applying concepts of archival provenance by presenting a roadmap for future work where AI research might align and converge with related work in archival theory. It concludes by asserting that grounding AI datasets in this perspective of archival provenance can realize a new paradigm of “data as archives,” serving to envision responsibilities to a range of data creators, determine needs for documentation of context and establish a crucial role for “data archivists.”