Seminars & Conferences


TGIF (Think Grapple Innovate Fridays): Research Seminar Series on Data Science in Business

In early 2020, the School of Data Science instituted a new tradition we call TGIF, or “Think-Grapple-Innovate-Fridays.” These recurring events bring together researchers from a variety of departments and disciplines to explore areas for collaboration.

2021 Speakers:

  • January 22, Rupert Freeman, "Eliciting Truthful Forecasts with Competitive Incentives"
    • Abstract:

      I will discuss a setting in which a principal elicits probabilistic predictions from multiple forecasters on a sequence of events. Unlike the standard setting in which the principal can reward each forecaster independently, we will instead suppose that forecasters compete for a single prize. This competitive incentive structure, common in many real-world settings, is known to distort the incentive properties of existing truthful elicitation schemes, meaning that forecasters may no longer truthfully report their predictions to the principal.

      I will begin by presenting a truthful mechanism for selecting a single winning forecaster once all events have materialized. When the number of events is large, we are guaranteed to choose the most accurate forecaster with high probability. I will then consider a closely related problem in which the principal wishes to use the elicited forecasts to make accurate predictions of her own by, for every event, mimicking the prediction of one of the forecasters. Forecasters derive utility (the "prize") from having their prediction used by the principal. We show that the principal can simultaneously incentivize truthful predictions and make an accurate sequence of predictions, as measured by their competitiveness with the single most accurate forecaster in hindsight (i.e., their regret). The mechanism we obtain can be viewed as a truthful analogue of the multiplicative weights algorithm from online learning.

      Based on joint work with Chara Podimata, Jens Witkowski, Jennifer Wortman Vaughan, David Pennock, and Andreas Krause.
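For context, the (non-truthful) multiplicative weights algorithm that the mechanism above is an analogue of can be sketched as follows. This is a generic illustration, not the speaker's mechanism; the learning rate `eta` and the use of quadratic (Brier) loss are assumptions made for the sketch.

```python
import random

def multiplicative_weights(forecasts, outcomes, eta=0.5):
    """Follow-the-expert prediction via multiplicative weights.

    forecasts: list of rounds; each round is a list of per-forecaster
               probabilities that the event occurs.
    outcomes:  list of 0/1 event realizations, one per round.
    Returns (principal's predictions, final forecaster weights).
    """
    n = len(forecasts[0])
    weights = [1.0] * n
    predictions = []
    for probs, y in zip(forecasts, outcomes):
        # Mimic forecaster i with probability proportional to weights[i]
        total = sum(weights)
        r = random.random() * total
        chosen, acc = 0, 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                chosen = i
                break
        predictions.append(probs[chosen])
        # Down-weight each forecaster exponentially in its Brier loss
        for i in range(n):
            loss = (probs[i] - y) ** 2
            weights[i] *= (1.0 - eta) ** loss
    return predictions, weights
```

Over many rounds, the weight mass concentrates on the most accurate forecaster, which is the regret guarantee the abstract refers to; the talk's contribution is recovering a comparable guarantee while keeping forecasters truthful.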

  • February 19, Michael Albert, “Provable Lower Bounds for Black Box Mechanism Design”

    • Abstract: The field of mechanism design has had significant success in constructing optimal mechanisms to allocate resources when there are information asymmetries. These successes include the modern second price auction format implemented by eBay as well as matching markets for medical residencies and kidney exchanges. However, there are many situations under which no optimal mechanism is known. For example, the revenue optimal mechanism for two bidders with two items is an open question. This has led to the adoption of black box non-linear function optimizers, such as deep neural networks, to learn well-performing mechanisms. However, these learned mechanisms only approximately satisfy traditional mechanism design guarantees, such as incentive compatibility. Given that these mechanisms fail traditional mechanism design guarantees, they cannot guarantee any lower bound on their performance. In this work, we present a procedure whereby, given sample access to a mechanism, we can prove a lower bound on its performance. Moreover, we develop new techniques to construct mechanisms using deep neural networks that provide good lower bounds on their performance.
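The second price (Vickrey) auction mentioned in the abstract is the canonical example of an incentive-compatible mechanism, and a minimal sketch makes the guarantee concrete: the winner pays the second-highest bid, so a bidder's own bid never affects the price she pays.

```python
def second_price_auction(bids):
    """Vickrey auction: highest bidder wins, pays the second-highest bid.

    bids: dict mapping bidder id -> bid amount.
    Returns (winner, price). Truthful bidding is a dominant strategy
    because the price the winner pays is set by the other bidders.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price
```

The learned, neural-network-based mechanisms discussed in the talk only approximately satisfy this kind of incentive property, which is what motivates proving lower bounds on their performance.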

  • March 19, Jingjing Li, "Individualism during Crises: Big Data Analytics of Collective Actions amid COVID-19"
    • Abstract: In this study, we present the first evidence of how individualism, a persistent feature of American culture, shapes online and offline collective actions and impacts attitudes toward vaccination during COVID-19. Following economic history studies, we leverage GIS techniques to construct a new US county-level individualism measure that traces the time each county spent on the American frontier in 1790-1890. We then analyze this measure's impact on GoFundMe fundraising activities, county- and individual-level social distancing compliance, and attitudes toward vaccination collected by US and global surveys. Our analysis unlocks a rich set of insights. First, higher individualism reduces both online donations and social distancing during the COVID-19 pandemic. An interquartile increase in individualism reduces COVID-related charitable campaigns and funding by 48% and offsets the effect of state lockdown orders on social distancing by 41%. These effects hold both in Democratic and Republican counties and are robust to alternative specifications, measures, and samples. Second, we find that government interventions, such as stimulus checks, can potentially mitigate the negative effect of individualism on charitable crowdfunding. Third, the impacts of individualism on social distancing are stronger in counties with higher externalities, such as those with higher population densities or higher percentages of seniors. Hence, the discovered negative impacts of individualism may be partly driven by the failure to internalize the positive externalities of the collective actions. Finally, we find strong evidence that individualism negatively impacts the intent to receive the COVID-19 vaccine, both within the U.S. and across countries. This research is the first to uncover the potential downsides of individualism during crises, despite the well-documented upsides of individualism in driving innovation and economic growth. It also explores individualism as an intra-country cultural dimension and demonstrates its critical value in big-data-driven, culture-aware policymaking.
  • April 23, Max Biggs, "Loss Functions for Data Driven Personalized Pricing"
    • Abstract: With increasing amounts of data gathered on customers and how they interact with products, firms are looking for ways to utilize this data to improve their pricing policies. In particular, firms are looking to tailor their prices for different customers to maximize their revenue. We study a pricing setting where each customer is offered a personalized price based on customer and/or product features that are predictive of the customer's valuation, or willingness to pay, for that product. One obstacle encountered in practice is that firms are limited in the type of data they can access. Often only historical sales records are available, where we only observe whether each customer purchased a product at the price prescribed rather than the customer's true valuation. As such, the data is influenced by the historical sales policy, which introduces difficulties in estimating revenue from pricing policies. We approach this problem using ideas from causal inference and machine learning. In particular, we study how to formulate loss functions which directly optimize revenue, rather than going through an intermediate demand estimation stage. These loss functions have certain asymmetries which aren't present in typical classification loss functions, necessitating the development of new techniques. In this talk I'll review some ideas from the literature and our progress thus far.
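One standard causal-inference idea from the literature this abstract alludes to is inverse-propensity weighting, which corrects for the historical pricing policy's influence when estimating the revenue of a new policy from logged sales records. The sketch below is a generic illustration of that idea under simplifying assumptions (discrete prices, known logging propensities), not the speaker's method.

```python
def ipw_revenue_estimate(logged, new_policy):
    """Off-policy revenue estimate via inverse-propensity weighting.

    logged: list of (features, price_offered, purchased, propensity)
            tuples, where propensity is the historical probability of
            offering that price given the features (assumed known).
    new_policy: function mapping features -> the price it would offer.
    Only records where the new policy agrees with the logged price
    contribute; each is re-weighted by 1/propensity to undo the bias
    introduced by the historical pricing policy.
    """
    total = 0.0
    for x, price, purchased, prop in logged:
        if new_policy(x) == price:
            total += (price * purchased) / prop
    return total / len(logged)
```

The asymmetry the abstract mentions shows up here: overpricing loses the whole sale while underpricing only shaves margin, so a loss built from this estimator penalizes the two errors very differently from a symmetric classification loss.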
  • May 21, Brent Kitchens, "Understanding Echo Chambers and Filter Bubbles: The Impact of Social Media on Diversification and Partisan Shifts in News Consumption"
    • Abstract: Echo chambers and filter bubbles are potent metaphors that encapsulate widespread public fear that the use of social media may limit the information that users encounter or consume online. Specifically, the concern is that social media algorithms combined with tendencies to interact with like-minded others both limit users' exposure to diverse viewpoints and encourage the adoption of more extreme ideological positions. Yet empirical evidence about how social media shapes information consumption is inconclusive. We articulate how characteristics of platform algorithms and users' online social networks may combine to shape user behavior. We bring greater conceptual clarity to this phenomenon by expanding beyond discussion of a binary presence or absence of echo chambers and filter bubbles to a richer set of outcomes incorporating changes in both diversity and slant of users' information sources. Using a data set with over four years of web browsing history for a representative panel of nearly 200,000 U.S. adults, we analyzed how individuals' social media usage was associated with changes in the information sources they chose to consume. We find differentiated impacts on news consumption by platform. Increased use of Facebook was associated with increased information source diversity and a shift toward more partisan sites in news consumption; increased use of Reddit with increased diversity and a shift toward more moderate sites; and increased use of Twitter with little to no change in either. Our results demonstrate the value of adopting a nuanced multidimensional view of how social media use may shape information consumption.
  • June 25, Gizem Korkmaz, "Measuring the Cost and Impact of Open Source Software Innovation on GitHub"
    • Abstract: Open Source Software (OSS), as defined by the Open Source Initiative, is computer software with its source code shared under a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose. OSS is developed, maintained, and extended both within and outside of the private sector, through the contribution of independent developers as well as people from universities, government research institutes, businesses, and nonprofits. Examples include the Apache server software and the R statistical programming software. Despite its ubiquity and extensive use, reliable measures of the scope and impact of OSS developed outside of the business sector are scarce. Activities around OSS development, a vital component of science activity, are not well-measured in existing federal statistics on innovation. Many OSS projects are developed and maintained in free repositories, such as GitHub, and information embedded in these repositories, including the code, contributors, and development activity, is publicly available.
      In this paper, we use data from GitHub, the largest such platform with over 30 million users and developers worldwide, to obtain information about OSS projects. We collect 7.8 million project repositories, containing metadata such as author, license, commits (approved code edits), and lines of code. We adopt methods used in software engineering to estimate the resource cost associated with creating OSS. We use lines of code as the measure of effort to estimate the time spent on software development, and calculate the monetary value using the average compensation for computer programmers from Bureau of Labor Statistics wage data and other costs based on national accounts methodologies. Preliminary estimates show that the resource cost of developing OSS projects exceeds $512 billion in 2019, of which $36.3 billion is directly attributable to US contributors. Finally, we propose to use network analysis methods developed for bibliometrics and patent analysis to study the impact of these projects, and the actors of the OSS ecosystem.
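The lines-of-code-to-dollars pipeline described above can be illustrated with the basic COCOMO model, a standard software-engineering estimator that converts code size into development effort. Whether the speakers use COCOMO specifically is an assumption of this sketch, as is the wage figure in the usage example; both stand in for the BLS-derived compensation data mentioned in the abstract.

```python
def oss_resource_cost(lines_of_code, monthly_wage):
    """Estimate development cost of a project from its size.

    Uses the basic COCOMO "organic" model to convert size into effort:
        effort (person-months) = 2.4 * KLOC ** 1.05
    then prices that effort at an average programmer wage per
    person-month (supplied by the caller, e.g. from BLS wage data).
    """
    kloc = lines_of_code / 1000.0
    effort_person_months = 2.4 * kloc ** 1.05
    return effort_person_months * monthly_wage
```

For example, a hypothetical 1,000-line repository at an assumed $10,000 per person-month prices out at roughly $24,000; summing such per-repository estimates over millions of repositories yields aggregate figures of the kind the abstract reports.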