Crowdsourcing & Open Innovation

Crowdsourcing Memories from the 1947 Partition of British India

Working with the Lakshmi Mittal and Family South Asia Institute at Harvard University, this project aims to collect and analyze oral histories and memories of the 1947 Partition of British India with a focus on minority voices. Aspects of this project include gathering discrete historical data such as locations and descriptions of refugee camps; mapping geographical locations...

City Challenges

LISH researchers are designing experiments around NYU GovLab’s City Challenges, a program that uses competitions and coaching to solve urban problems. See here for information on a prior challenge.

Dental Image Recognition System

In collaboration with Charité-Berlin Hospital, we are studying what drives variability in doctor performance when diagnosing ailments in dental x-ray images, and how multiple human labelings of the same data can yield more reliable diagnoses. These studies aim to provide new insights on improving clinical care and...

Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani. 2016. “Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing Algorithms.” International Journal on Document Analysis and Recognition (IJDAR), 19, 2, Pp. 155-172.

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.

Karim R. Lakhani and Eric Lonstein. 2011. TopCoder (A): Developing Software through Crowdsourcing (TN). Harvard Business School Teaching Notes. Harvard Business School.

Teaching Note for HBS Case 610-032.

TopCoder's crowdsourcing-based business model, in which software is developed through online tournaments, is presented. The case highlights how TopCoder has created a unique two-sided innovation platform consisting of a global community of over 225,000 developers who compete to write software modules for its more than 40 clients. It provides details of an innovation platform where complex software is developed through ongoing online competitions. By outlining the company's evolution, the case illustrates the challenges of building a community and refining a web-based competition platform. Experiences and perspectives from TopCoder community members and clients help show what it means to work from within or in cooperation with an online community. The case also discusses the use of distributed innovation and its potential merits as a corporate problem-solving mechanism. Issues related to TopCoder's scalability, profitability, and growth are also explored.