Hamidah Oderinwale
BLUE Community Fellow | Fall 2024
Streamlining data provenance: documenting data origins and processes to replicate machine learning research
Background
In science, replicating experiments and obtaining consistent results is crucial for integrity and credibility. Replication is getting harder with the rise of machine learning experiments, especially those that rely on computer simulations. These experiments have built-in randomness, which makes it difficult to rerun them in exactly the same way each time. Reproducibility matters for accountability, and it can also make model training easier and more precise, supporting capabilities in a world where algorithmic creativity will be increasingly important for breakthroughs. My research focuses on streamlining data provenance: the documentation of data origins and processes.