Data Science & AI Development

2022
Jaehan Cho, Timothy DeStefano, Hanhin Kim, Inchul Kim, and Jin Hyun Paik. 2/19/2022. “What's driving the diffusion of next-generation digital technologies?” Technovation.
The recent development and diffusion of next-generation digital technologies (NGDTs) such as artificial intelligence, the Internet of Things, big data, and 3D printing are expected to have an immense impact on businesses, innovation, and society. While we know from extant research that a firm's R&D investment, intangible assets, and productivity are factors that influence technology use more generally, to date little is known about the factors that determine how these emerging tools are used, and by whom. Using Probit and OLS modeling on a survey of 12,579 South Korean firms in 2017, we conduct one of the first comprehensive examinations highlighting various firm characteristics that drive NGDT implementation. While much of the literature assesses the use of individual technologies, our research attempts to unveil the extent to which firms implement NGDTs in bundles. Our investigation shows that more than half of the firms that use NGDTs deployed multiple technologies simultaneously. One of the insightful complementarities identified in this research exists amongst technologies that generate, facilitate and demand large volumes of data, including big data, IoT, cloud computing and AI. Such technologies also appear important for innovative tools such as 3D printing and robotics.
2021
Karim R. Lakhani, Yael Grushka-Cockayne, Jin H. Paik, and Steven Randazzo. 10/2021. “Customer-Centric Design with Artificial Intelligence: Commonwealth Bank”.
As Commonwealth Bank (CommBank) CEO Matt Comyn delivered the full financial year results in August 2021 over videoconference, it took less than two minutes for him to make his first mention of the organization's Customer Engagement Engine (CEE), the AI-driven customer experience platform. With full cross-channel integration, CEE operated using 450 machine learning models that learned from a total of 157 billion data points. Against the backdrop of a once-in-a-century global pandemic, CEE had helped the Group deliver a strong financial performance while also supporting customers with assistance packages designed in response to the coronavirus outbreak. Six years earlier, in 2015, financial services were embarking on a transformation driven by the increased availability and standardization of data and artificial intelligence (AI). Speed, access and price, once key differentiators for attracting and retaining customers, had been commoditized by AI, and new differentiators such as customization and enhanced interactions were expected. Seeking to create value for customers through an efficient, data-driven practice, CommBank leveraged existing channels of operations. Angus Sullivan, Group Executive of Retail Banking, remarked, "How do we, over thousands of interactions, try and generate the same outcomes as from a really in-depth, one-to-one conversation?" The leadership team began to make key investments in data and infrastructure. While some headway had been made, newly appointed Chief Data and Analytics Officer, Andrew McMullan, was brought in to catalyze the process and progress of the leadership's vision for a new customer experience. Success would depend on continued drive from leadership, buy-in from frontline staff, and a reliable team of passionate and knowledgeable data professionals. How did Comyn and McMullan bring their vision to life: to deliver better outcomes through a new approach to customer-centricity? How did they overcome internal resistance, data sharing barriers, and requirements for technical capabilities?
Andrea Blasco, Ted Natoli, Michael G. Endres, Rinat A. Sergeev, Steven Randazzo, Jin Paik, Max Macaluso, Rajiv Narayan, Karim R. Lakhani, and Aravind Subramaniam. 4/6/2021. “Improving Deconvolution Methods in Biology through Open Innovation Competitions: An Application to the Connectivity Map.” Bioinformatics.
Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition’s objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples.
2020
Hannah Mayer. 7/2020. “AI in Enterprise: AI Product Management.” Edited by Jin H. Paik, Jenny Hoffman, and Steven Randazzo.

While there are dispersed resources to learn more about artificial intelligence, there remains a need to cultivate a community of practitioners for cyclical exposure and knowledge sharing of best practices in the enterprise. That is why Laboratory for Innovation Science at Harvard launched the AI in the Enterprise series, which exposes managers and executives to interesting applications of AI and the decisions behind developing such tools. 

Moderated by HBS Professor and co-author of Competing in the Age of AI, Karim R. Lakhani, the July virtual session featured Peter Skomoroch from DataWrangling and formerly at LinkedIn. Together, they discussed what differentiates AI product management from managing other tech products and how to adapt to the uncertainty in the AI product lifecycle.

AI in Enterprise - AI Product Management (P Skomoroch).pdf
Hannah Mayer. 10/2020. “Data Science is the New Accounting.” Edited by Jin H. Paik and Jenny Hoffman.

In the October session of the AI in Enterprise series, HBS Professor and co-author of Competing in the Age of AI, Karim R. Lakhani and Roger Magoulas (Data Science Advisor) delved into O'Reilly's most recent survey of AI adoption in larger companies. The discussion explored common risk factors, techniques, tools, as well as the data governance and data conditioning that large companies are using to build and scale their AI practices.

Read Hannah Mayer's recap of the event to learn more about what senior managers in enterprises need to know about AI, particularly if they want to adopt it at scale.

AI in Enterprise - Data is the New Accounting (R Magoulas)
Hannah Mayer. 9/2020. “AI in Enterprise: In Tech We Trust... Maybe Too Much?” Edited by Jin H. Paik and Jenny Hoffman.

In the September session of the AI in Enterprise series, HBS Professor and co-author of Competing in the Age of AI, Karim R. Lakhani spoke with Latanya Sweeney about algorithmic bias, data privacy, and the way forward for enterprises adopting AI. They explored how AI and ML can impact society in unexpected ways and what senior enterprise leaders can do to avoid negative externalities. Professor of the Practice of Government and Technology at the Harvard Kennedy School and in the Harvard Faculty of Arts and Sciences, director and founder of the Data Privacy Lab, and former Chief Technology Officer at the U.S. Federal Trade Commission, Latanya Sweeney pioneered the field known as data privacy and launched the emerging area known as algorithmic fairness.

AI in Enterprise - In Tech We Trust - Maybe Too Much (L Sweeney)
Hannah Mayer, Jin H. Paik, Timothy DeStefano, and Jenny Hoffman. 8/2020. “From Craft to Commodity: The Evolution of AI in Pharma and Beyond”.

Moderated by HBS Professor and co-author of Competing in the Age of AI, Karim R. Lakhani, the August virtual session featured Reza Olfati-Saber, an experienced academic researcher currently managing teams of data scientists and life scientists across the globe for Sanofi. Together, they discussed the evolution of AI in life science experimentation and how it may become the determining factor for R&D success in pharma and other industries.

AI in Enterprise - From Craft to Commodity (R Olfati-Saber).pdf
Jin H. Paik, Steven Randazzo, and Jenny Hoffman. 6/2020. “AI in the Enterprise: How Do I Get Started?”.

Moderated by HBS Professor and co-author of Competing in the Age of AI, Karim R. Lakhani, the most recent virtual session with over 240 attendees featured Rob May, General Partner at PJC, an early-stage venture capital firm, and founder of Inside AI, a premier source for information on AI, robotics and neurotechnology. Together, they discussed why we have seen a rise in interest in AI, what managers should consider when wading into the AI waters, and what steps they can take when it is time to do so. 

AI in Enterprise - How Do I Get Started (R May).pdf
Roberto Verganti, Luca Vendraminelli, and Marco Iansiti. 3/19/2020. “Innovation and Design in the Age of Artificial Intelligence”.

At the heart of any innovation process lies a fundamental practice: the way people create ideas and solve problems. This “decision making” side of innovation is what scholars and practitioners refer to as “design”. Decisions in innovation processes have so far been taken by humans. What happens when they can be substituted by machines? Artificial Intelligence (AI) brings data and algorithms to the core of innovation processes. What are the implications of this diffusion of AI for our understanding of design and innovation? Is AI just another digital technology that, akin to many others, will not significantly question what we know about design? Or will it create transformations in design that current theoretical frameworks cannot capture?

This article proposes a framework for understanding design and innovation in the age of AI. We discuss the implications for design and innovation theory. Specifically, we observe that, as creative problem solving is significantly conducted by algorithms, human design increasingly becomes an activity of sense making, i.e. understanding which problems should or could be addressed. This shift in focus calls for new theories and brings design closer to leadership, which is, inherently, an activity of sense making.

Our insights are derived from and illustrated with two cases at the frontier of AI, Netflix and Airbnb (complemented with analyses of Microsoft and Tesla), which point to two directions for the evolution of design and innovation in firms. First, AI enables an organization to overcome many past limitations of human-intensive design processes, by improving the scalability of the process, broadening its scope across traditional boundaries, and enhancing its ability to learn and adapt on the fly. Second, and maybe more surprising, while removing these limitations, AI also appears to deeply enact several popular design principles. AI thus reinforces the principles of Design Thinking, namely: being people-centered, abductive, and iterative. In fact, AI enables the creation of solutions that are more highly user-centered than human-based approaches (i.e., to an extreme level of granularity, designed for every single person); that are potentially more creative; and that are continuously updated through learning iterations across the entire product life cycle.

In sum, while AI does not undermine the basic principles of design, it profoundly changes the practice of design. Problem solving tasks, traditionally carried out by designers, are now automated into learning loops that operate without limitations of volume and speed. The algorithms embedded in these loops think in a radically different way than a designer who handles complex problems holistically with a systemic perspective. Algorithms instead handle complexity through very simple tasks, which are iterated continuously. This article discusses the implications of these insights for design and innovation management scholars and practitioners.

Marco Iansiti and Karim R. Lakhani. 3/3/2020. “From Disruption to Collision: The New Competitive Dynamics.” MIT Sloan Management Review.
In the age of AI, traditional businesses across the economy are being attacked by highly scalable data-driven companies whose operating models leverage network effects to deliver value.
2017
Karim R. Lakhani, Andrew Hill, Po-Ru Loh, Ragu B. Bharadwaj, Pascal Pons, Jingbo Shang, Eva C. Guinan, Iain Kilty, and Scott Jelinsky. 2017. “Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis.” GigaScience, 6, 5, Pp. 1-10.

BACKGROUND: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets.

RESULTS: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project.

CONCLUSIONS: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.

Stepwise_Distributed_Open_Innovation_Contests.pdf
2016
Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani. 2016. “Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing Algorithms.” International Journal on Document Analysis and Recognition (IJDAR), 19, 2, Pp. 155-172.

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.

Detecting_Figures_and_Part_Labels_in_Patents.pdf
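
The abstract above reports results as an F-measure, the harmonic mean of recall and precision. As a rough illustration of how that score is computed, here is a minimal Python sketch. The function names and the detection counts are hypothetical and are not taken from the competition's scoring code or data; the actual evaluation may have matched detected regions to ground truth in ways this sketch does not model.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. A ratio with an empty denominator is 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when nothing was detected correctly
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not competition data).
    p, r = precision_recall(tp=90, fp=10, fn=60)
    print(f"precision={p:.2f} recall={r:.2f} F={f_measure(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot earn a high F-measure by maximizing precision alone or recall alone.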