Census II of Free and Open Source Software — Application Libraries

Citation:

Frank Nagle, James Dana, Jennifer Hoffman, Steven Randazzo, and Yanuo Zhou. 3/2/2022. Census II of Free and Open Source Software — Application Libraries. The Linux Foundation. Harvard Laboratory for Innovation Science (LISH) and Open Source Security Foundation (OpenSSF).

Abstract:

Free and Open Source Software (FOSS) has become a critical part of the modern economy. There are tens of millions of FOSS projects, many of which are built into software and products we use every day. However, it is difficult to fully understand the health, economic value, and security of FOSS because it is produced in a decentralized and distributed manner. This distributed development approach makes it unclear how much FOSS is in use and precisely which FOSS projects are most widely used. This lack of understanding is a critical problem for those who want to help enhance the security of FOSS (e.g., companies, governments, individuals) yet do not know which projects to start with. The problem drew widespread attention after the Heartbleed and Log4Shell vulnerabilities, which left hundreds of millions of devices susceptible to exploitation.

This report, Census II, is the second investigation into the widespread use of FOSS. It aggregates data from over half a million observations of FOSS libraries used in production applications at thousands of companies, with the aim of shedding light on the most commonly used FOSS packages at the application library level. This effort builds on the Census I report, which focused on the lower-level critical operating system libraries and utilities, and improves our understanding of the FOSS packages that software applications rely on. Such insights will help identify critical FOSS packages so that resources can be prioritized to address security issues in this widely used software.

The Census II effort uses data from partner Software Composition Analysis (SCA) companies, including Snyk, the Synopsys Cybersecurity Research Center (CyRC), and FOSSA, which partnered with Harvard to advance the state of open source research. Our goal is not only to identify the most widely used FOSS, but also to provide an example of how the distributed nature of FOSS requires a multi-party effort to fully understand the value and security of the FOSS ecosystem. Only through data-sharing, coordination, and investment will the value of this critical component of the digital economy be preserved for generations to come.

In addition to the detailed results on FOSS usage provided in the report, we identified five high-level findings: 1) the need for a standardized naming schema for software components, 2) the complexities associated with package versions, 3) the development of much of the most widely used FOSS by only a handful of contributors, 4) the increasing importance of individual developer account security, and 5) the persistence of legacy software in the open source space.

Last updated on 03/11/2022