Software Heritage

Formation: June 30, 2016
Founders: Roberto Di Cosmo, Stefano Zacchiroli
Type: Non-profit
Headquarters: Inria
Scientific advisors: Gérard Berry, Jean-François Abramatic, Julia Lawall, Serge Abiteboul
Affiliations: Inria
Staff: 13
Website: softwareheritage.org

Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria and is supported by UNESCO. The project is structured as a non-profit, multi-stakeholder initiative.

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.

Software source code is collected by crawling code hosting platforms such as GitHub, GitLab.com and Bitbucket, and package archives such as npm and PyPI. It is then ingested into a Merkle directed acyclic graph (Merkle DAG), the data structure at the core of the archive. Each artifact in the archive is associated with a SoftWare Hash IDentifier (SWHID), an intrinsic identifier computed from the artifact's own content.
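To make the Merkle DAG and its identifiers more concrete, the following is a minimal Python sketch of how identifiers for file contents (`cnt`) and directories (`dir`) can be computed. It relies on the fact that these two SWHID types use the same hashing scheme as Git blobs and trees. The nested-dict tree representation and all helper names are hypothetical, every file is assumed to be a regular non-executable file (mode `100644`), and revisions, releases and snapshots are not covered; this is an illustration, not the archive's own implementation.

```python
import hashlib
from typing import Dict, Union

# Hypothetical in-memory representation: a directory maps entry names to
# either file bytes or a nested directory dict.
Tree = Dict[str, Union[bytes, "Tree"]]


def git_object_hash(obj_type: str, payload: bytes) -> bytes:
    """SHA-1 over a Git-style header ("<type> <length>\\0") plus the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def content_id(data: bytes) -> bytes:
    # File contents are hashed exactly like Git blobs.
    return git_object_hash("blob", data)


def directory_id(tree: Tree) -> bytes:
    # Each entry embeds its child's hash, so a change anywhere below this
    # node changes this node's hash too: the Merkle property.
    def sort_key(name: str) -> bytes:
        # Git orders directory names as if they ended with "/".
        if isinstance(tree[name], dict):
            return (name + "/").encode()
        return name.encode()

    payload = b""
    for name in sorted(tree, key=sort_key):
        child = tree[name]
        if isinstance(child, dict):
            mode, child_hash = b"40000", directory_id(child)
        else:
            # Assumption: every file is a regular, non-executable file.
            mode, child_hash = b"100644", content_id(child)
        payload += mode + b" " + name.encode() + b"\0" + child_hash
    return git_object_hash("tree", payload)


def swhid(kind: str, digest: bytes) -> str:
    # Core SWHID syntax: swh:1:<object type>:<40 hex digits>.
    return f"swh:1:{kind}:{digest.hex()}"


if __name__ == "__main__":
    print(swhid("cnt", content_id(b"hello world\n")))
    # → swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    project = {"README": b"hello world\n", "src": {"main.py": b"print('hi')\n"}}
    print(swhid("dir", directory_id(project)))
```

Because each directory entry embeds its child's hash, editing a single file changes the identifier of every directory above it, while unchanged subtrees keep their identifiers and can be shared between projects and versions.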

To increase the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018; as of October 2020, ENEA and FossID had joined it.