background
Curators at Twitter use Curation Studio to curate trends that users see in the "What's Happening" box on the Twitter home page. Trends appear automatically according to certain trendiness metrics, and curators make these trends more useful and engaging with curated content: summarizing the trend, adding Representative Tweets (and other Twitter-specific elements, e.g. Events and Spaces), and adding media.
problem
Currently the trend curation workflow is manual: curators are tasked with triaging related tweets, summarizing the trend, and adding value. This takes a lot of time and, by the nature of manual curation, can also introduce bias toward certain sources of information.
Additionally, we have algorithms that predict the "trendiness" and relevance of tweets, but these algorithms need human intervention to validate their output and increase their precision.
hypotheses
h1
By utilizing a Human-in-the-Loop system, we can both make the curator's task more efficient and train a recommender system on tweets related to a trend.
h2
If we are able to recommend trendy, relevant tweets to curators, we can increase the diversity of sources behind curated tweet elements, thus reducing the bias that results from manual curation.
proposed solution
We propose a Human-in-the-Loop system in which the recommender system surfaces candidate tweets, along with its own confidence score for each candidate, to the curator, who then provides explicit signals that we can use to train the recommender system iteratively.
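To make the loop concrete, here is a minimal sketch of one iterative cycle. All names here (Candidate, rank_candidates, review, update) are hypothetical illustrations, not the actual Curation Studio implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: str
    confidence: float  # the recommender's own confidence score, shown to the curator

def curation_loop(recommender, curator, trend_id: str, rounds: int = 3) -> None:
    """One hypothetical HITL cycle: recommend -> curate -> retrain."""
    for _ in range(rounds):
        # 1. The recommender proposes candidate tweets with confidence scores.
        candidates: list[Candidate] = recommender.rank_candidates(trend_id)
        # 2. The curator reviews candidates and emits explicit signals
        #    (feature / upvote / downvote / dismiss).
        feedback = curator.review(candidates)
        # 3. The explicit signals become labeled examples for the next round.
        recommender.update(feedback)
```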
requirements
feature - the curator needs to select a candidate tweet as the Representative Tweet
upvote - the curator needs to upvote a candidate tweet, indicating that it is related to the trend and is a Good Tweet*
downvote - the curator needs to downvote a candidate tweet, indicating that this tweet is related to the trend but is a Bad Tweet*
dismiss - the curator needs to dismiss tweets that are not related to the trend
Each of these signals is used by the recommender system to surface better tweet candidates (see the sketch after the notes below).
NOTE: The definitions of these signals evolved as I conducted user interviews with curators. Initial requirements defined downvote and dismiss as essentially the same signal. Defining them separately more accurately represents the curator's mental model of the voting task and provides richer signal to the model.
*👍 A Good Tweet must meet the guidelines for either a Representative Tweet or a Trendy Tweet.
*👎 A Bad Tweet either fails to meet Twitter’s Global Curation Policy and its guidelines for including individual pieces of content in curated collections, or is deemed Toxic (in accordance with Twitter’s Abusive Behavior and Hateful Conduct policies), spammy, or irrelevant by Curators.
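One way to see why separating downvote from dismiss yields richer signal: the four signals can map onto two independent training labels, relevance (is this tweet about the trend?) and quality (is it a Good Tweet?). The sketch below is illustrative only; the enum values and label mapping are my assumptions, not the shipped schema.

```python
from enum import Enum

class CuratorSignal(Enum):
    FEATURE = "feature"    # selected as the Representative Tweet
    UPVOTE = "upvote"      # related to the trend and a Good Tweet
    DOWNVOTE = "downvote"  # related to the trend but a Bad Tweet
    DISMISS = "dismiss"    # not related to the trend at all

# Hypothetical mapping: keeping downvote and dismiss distinct lets the model
# learn relevance and quality independently.
SIGNAL_TO_LABELS = {
    CuratorSignal.FEATURE:  {"relevant": True,  "good": True},
    CuratorSignal.UPVOTE:   {"relevant": True,  "good": True},
    CuratorSignal.DOWNVOTE: {"relevant": True,  "good": False},
    CuratorSignal.DISMISS:  {"relevant": False, "good": None},  # quality unknown
}
```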
role & responsibilities
product designer, user researcher
all product design (high-fidelity mockups, MVP spec, interactive prototype), ethnographic observation, facilitation, analysis and integration of user feedback
project duration: 3 mos
deliverables / human-centered HITL presentation
deliverables / MVP spec
I designed all high-fidelity mockups and feature variants shown below. I also annotated the mockups with three different versions ordered by priority: MVP, stretch, and very stretch. The annotations were used both to spec engineering tickets and to define the interface for curators.
feature highlight / delight
To support my stated goal of Human-Centered Human-in-the-Loop, I included playful copy and alternate states as variants in the interface. Human annotation of datasets can feel boring and repetitive, or can even feel like the human part of the Human-in-the-Loop task is keeping the human from their actual work (e.g. a captcha). To combat that feeling of repetition and tedium, I generated copy and variants that are playful and unique. These features were well received in user feedback sessions with curators.



deliverables / prototype
I designed the interactive prototype demonstrated here, as well as its composite components and interactive variants. The prototype was used both for user studies and as the interaction spec for engineering tickets.


future directions
The MVP version of the HITL affordances is (at the time of this case study's documentation) in development. Future directions include developing and testing the stretch and very-stretch versions.
Additionally, my prototypes include forward-thinking affordances not currently supported, including expected additional algorithms (e.g. labels for classifications like 'authoritative') and explicit feedback from curators (e.g. the voting labels shown after an up- or downvote).
To further the goal of Human-Centered Human-in-the-Loop, I also envision a future version of this product that includes gamification aspects of the HITL task, with the intent of both informing curators through data summaries and motivating them through leaderboards and metric records. This metadata would also be used to test hypothesis h2: can we generate trends with more diverse and relevant Representative Tweets by incentivizing curators to choose them?