Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages that describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a month-long online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing the patent text, allowing participants to integrate text processing and graphics recognition in their algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first-place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
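For readers unfamiliar with the metric, the F-measure reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation only; the function name and the example counts are hypothetical, and the competition's actual scoring procedure (for instance, how a detected region is matched to ground truth) is not described here and may differ.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from raw detection counts.

    Illustrative only: the counts passed in are hypothetical, not
    competition data.
    """
    detected = true_positives + false_positives   # everything the system reported
    actual = true_positives + false_negatives     # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 8 correct detections, 2 spurious, 2 missed.
score = f_measure(8, 2, 2)
print(f"F-measure: {score:.2%}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot reach a high F-measure by maximizing precision or recall alone.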
Teaching Note for HBS Case 610-032.
TopCoder's crowdsourcing-based business model, in which software is developed through online tournaments, is presented. The case highlights how TopCoder has created a unique two-sided innovation platform consisting of a global community of more than 225,000 developers who compete to write software modules for its more than 40 clients, and it details how complex software is developed through ongoing online competitions. By outlining the company's evolution, the case illustrates the challenges of building a community and refining a web-based competition platform. Experiences and perspectives from TopCoder community members and clients help show what it means to work within or in cooperation with an online community. The use of distributed innovation and its potential merits as a corporate problem-solving mechanism are discussed. Issues related to TopCoder's scalability, profitability, and growth are also explored.