INTERVIEWEE: Jeff Hammerbacher
BACKGROUND: Scientist, software developer, Cofounder of Cloudera and Related Sciences, founding manager of Facebook’s Data team
TOPIC: What gets measured
LISTEN: On the web, Apple, Spotify, RSS
“It makes me uneasy when I think about how much corporations are measuring about human interactions today and not making considered decisions about what gets measured. Deciding what gets measured is a political and ethical choice.” — Jeff Hammerbacher
Sup y’all. Welcome to the Ideaspace.
That person was Jeff Hammerbacher, an important figure in data science who now works full-time in biomedicine. As the founding manager of Facebook’s Data team and cofounder of the publicly traded enterprise data company Cloudera, Jeff pioneered a lot of data techniques and ideas — including the title “Data Scientist” — that are foundational today.
Over the past decade Jeff’s attention shifted to medicine, working on cancer research at Mt Sinai using computational methods to design personalized therapeutic cancer vaccines (as covered in the New York Times). He’s also a founder and advisory board member of the COVID Tracking Project. His current focus is a venture pharmaceutical studio called Related Sciences, which does early-stage research to identify and develop new pharmaceutical drugs.
Jeff is someone who continues to make a significant impact on the world while being quiet about it. He doesn’t post on social media, and this conversation for the Ideaspace is his first on-the-record public interview in years. Jeff is a dear friend and one of the smartest people I know. I highly recommend listening to the conversation on the web, Apple, or Spotify, or reading the complete transcript (which goes much more into biomedicine, philosophy, and science). Below is an edited and condensed conversation focusing on Facebook, data science, and whether corporations should live forever.
YANCEY: When you started at Facebook was it still TheFacebook?
JEFF: That sounds right. I interviewed at the end of 2005 and started in the beginning of 2006. Somewhere in that timeframe we expanded from colleges to high schools. After I started we expanded to corporations, and then later that year the big thing was what we called “Open Registration,” which was allowing anyone with an email address to sign up. Somewhere in there the branding evolved from “TheFacebook” to “Facebook.”
YANCEY: What was your first project? What was the state of data?
JEFF: The state of data was interesting. There was an internal tool called the Watch Page. It ran queries over the MySQL databases that deliver data to the website and synced the results of those queries back to a single MySQL server, which would give you a sense of how many active users were on the website and how they broke down across networks. The first order of business when I arrived was to create a more professional version of that, which would be referred to as a data warehouse, that could collect a broader range of data. The questions that people wanted to ask of that data warehouse were primarily around growth: which of the networks were growing, which of them weren’t, what was causing them to grow. That was the early focus.
YANCEY: Were those easy questions to answer? How nascent was that?
JEFF: It was very nascent. I don't think we really had any idea. We explored a lot of different hypotheses. At the time it was broken into networks, so we didn't really think of it as a single product. It was more like, “At this college, we took off, at this college, we didn't take off; what could’ve caused that?” So we did analyses. I remember crawling student newspapers for mentions of Facebook and correlating that with user growth. We looked at things like the largest clusters of users — the largest connected component or nearly-connected component of users — thinking there might be something. We would often compare it to walking into a party where there are seven people standing in the room all by themselves or walking into a party where there's seven people standing in a circle talking to each other. It's a different social experience. We evaluated things like that. We didn't really get rigorous for another year or two. Matt Cohler was an early executive, and I remember him pulling me and a colleague aside and basically saying, “We need to make growth the number one focus for our analyses.” To my first hire I said, “Every week I want you to publish a thing called the Growth Report.” Which was literally a PDF where he would present a set of standardized metrics and then go deep on one hypothesis about what might be causing growth. The Growth Report was a pretty fundamental tool internally at Facebook that ultimately served as the foundation for the Growth team. A lot of the terms like “growth hacking” could be traced to Matt telling us to really emphasize growth for our analyses.
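NOTE: The party comparison above maps onto a standard graph measure, the size of the largest connected component. Below is a minimal Python sketch of that measure, assuming friendships are given as undirected pairs. The function name and the toy data are hypothetical illustrations, not a reconstruction of the analysis Facebook actually ran.

```python
from collections import defaultdict, deque


def largest_component_size(people, friendships):
    """Return the size of the largest connected group, found by breadth-first search."""
    # Build an undirected adjacency map from the friendship pairs.
    neighbors = defaultdict(set)
    for a, b in friendships:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    best = 0
    for start in people:
        if start in seen:
            continue
        # Explore everyone reachable from this person and count them.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            person = queue.popleft()
            size += 1
            for friend in neighbors[person]:
                if friend not in seen:
                    seen.add(friend)
                    queue.append(friend)
        best = max(best, size)
    return best


# Hypothetical toy data: seven guests, first with no connections, then in a circle.
guests = list(range(7))
alone = []
circle = [(i, (i + 1) % 7) for i in guests]

print(largest_component_size(guests, alone))   # 1: everyone stands by themselves
print(largest_component_size(guests, circle))  # 7: one connected group
```

Both rooms hold the same seven people, but the measure separates them, which is the kind of difference the team was probing for when comparing networks that took off with networks that did not.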
YANCEY: I'm going to get the numbers wrong, but I've heard Chamath Palihapitiya talk about how Facebook was trying to get someone connected to seven friends in the first ten days and then certain different behaviors happen. Are those the kinds of things that were showing up in the Growth Report?
JEFF: I recall doing analyses where we would say, “Take everyone who signed up in this fourteen-day period, and let’s take everything we know about them and their activities on the site, then let's follow them for six months or twelve months” — so this would be a cohort analysis — “and let's try and predict their behavior six months, twelve months from now, based on what we saw them do in those first two weeks.” We would use a methodology known as feature selection, which would allow us to take all the variables that we put into that model and highlight the subset of those variables that was most predictive of the eventual outcomes. Then we would try and create narratives around those features which had high importance. Those features could be derived from hypotheses that different executives had — if someone had a hypothesis that walking into a party with seven people talking to each other had value, then we would figure out a way to formulate that quantitatively and include that as something that the model could put emphasis upon. We also did a bit of automated feature engineering to see if we could actually discover features computed from the basic underlying features. We published some of this stuff publicly, but then ultimately that became very controversial after I left, so I don't think the team published as much of what they found.
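NOTE: For readers unfamiliar with the approach Jeff describes, here is a minimal Python sketch of a cohort analysis with feature selection. It assumes the simplest possible variant: each early-activity feature is scored by the absolute Pearson correlation between its value in the first two weeks and a later outcome, and features are ranked by that score. The feature names, the data, and the choice of correlation as the scoring method are all hypothetical. The interview does not say which feature selection technique the team used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(cohort, feature_names, outcome):
    """Rank features by absolute correlation with the outcome, strongest first."""
    ys = [user[outcome] for user in cohort]
    scores = {
        name: abs(pearson([user[name] for user in cohort], ys))
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cohort: activity in the first two weeks after signup, plus
# whether each user was still active months later (1) or not (0).
cohort = [
    {"friends_added": 9, "photos_uploaded": 1, "logins": 5, "retained": 1},
    {"friends_added": 8, "photos_uploaded": 0, "logins": 3, "retained": 1},
    {"friends_added": 1, "photos_uploaded": 2, "logins": 4, "retained": 0},
    {"friends_added": 0, "photos_uploaded": 1, "logins": 5, "retained": 0},
    {"friends_added": 7, "photos_uploaded": 3, "logins": 2, "retained": 1},
    {"friends_added": 2, "photos_uploaded": 0, "logins": 3, "retained": 0},
]

ranked = rank_features(
    cohort, ["friends_added", "photos_uploaded", "logins"], "retained"
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
# Order on this toy data: friends_added, then logins, then photos_uploaded
```

The top of the ranked list is where the narrative-building Jeff mentions would start. A score like this only shows association within the cohort, so it suggests hypotheses to test rather than establishing what causes retention.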
NOTE: I later asked Jeff if any of those reports were still online. He linked me to these two examples: Feed Me: Motivating newcomer contributions in social media sites and Gsundheit! Modeling contagion through Facebook News Feed.
YANCEY: I don't know how you want credit, but you created the title of Data Scientist while at Facebook. Why was there a need for a new name for this kind of work?
JEFF: The proximal cause was pretty simple. We had certain people in our group who had PhDs and there was a well-understood Research Scientist title hierarchy. We hired them away from research groups where they participated in that title hierarchy, so when they joined our team they wanted a title of Research Scientist. The other people who came more from the business intelligence realm, as it was known, there's a title hierarchy there and Business Analyst was the standard title. I looked at my group, and at the time, we maybe had twelve, fifteen people. I thought, this is really silly. Why do we need so many different titles for people who are more athletes: capable of on one day writing a script to collect data from the source systems into our warehouse environment, on another day building a dashboard, and on another day running an analysis to identify what might cause people to use the site in a few weeks? We wanted people who were more adaptable and had a broad base of skills to keep the team small.
At my quarterly team retreat I was getting pressure externally to simplify, and in particular to squeeze out any notion of research. At the time there were negative connotations for people to be seen as being involved in research. It was all hands on deck. Today Facebook feels inevitable, but at the time it certainly didn't feel that way. We wanted to emphasize that everyone in our group was actually contributing to the success of the product, not just writing papers and doing research. So the proximal cause was effectively that I needed to get rid of the title Research Scientist in my team. Parsimony led us to Data Scientist as the title that could preserve what they valued in terms of the science. Subsequently people have done all this sleuthing and discovered that people used the term “data science” in different contexts prior to our use of the term Data Scientist, so I certainly don't want to assert that I had any kind of unique claim to creating a field of thought. It really was just trying to find a title that captured what people liked about research scientists.
At the time Google was such a dominant force in what could be done with data. It's not because they were publishing great statistical algorithms. People didn't look at Google and say, “Those are the best statisticians in the world.” It was more that they had a muscle, a capability, to think creatively about how consumer web products could be informed by data, and then to execute on building the infrastructure necessary to leverage data at the scale that they were able to leverage it. I wanted to build a group that gave Facebook that broad base of capabilities. I was a mathematics major as an undergrad and I did a lot of courses in the Statistics department. I felt I had a good understanding of the shape of statistics, and I wanted to be more ambitious than that. The amount of software required to do the job of a statistician or of a data analyst or of a business analyst was ever-increasing. A statistician who can write software might be another great way to describe what a Data Scientist does, in a summary.
But never underestimate how fashionable it all is. Larry Ellison from Oracle always talks about how software is even more trend-driven than fashion. The fact that we called people Data Scientists at Facebook and Facebook became this meteoric success as a startup is really the thing that explains it. We could have called it anything. Because Facebook did it, people would have adopted it.
YANCEY: You left about a decade ago?
YANCEY: Now Facebook and data together are these combustible words. How do you reflect on the mood that exists around social data and Facebook today?
JEFF: One of the books that I recall purchasing and reading to get ready for the job at Facebook was a book called Database Nation by a guy named Simson Garfinkel. The subtitle of the book is “The Death of Privacy in the 21st Century.” Everything that he said in that book came to pass. It's interesting to continue to hear the bull case from people inside of Facebook, but ultimately I think the bears are mostly right. We’ve achieved a level of utter imbalance in surveillance and data use and we need to rigorously regulate surveillance if we're going to get back to a healthy society.
YANCEY: If you would have tried to share ideas like those in that book in a meeting in 2006 or 2007, what would have happened?
JEFF: I often describe the process of building a hypergrowth startup as removing all of the arrows which point in a different direction from the direction that the founder-CEO is trying to take it. At the time, Facebook felt like a very open intellectual environment where you could say that kind of thing in a meeting and there was no real repercussion. People were genuinely trying to figure out what this new world was that we were creating, how we fit into it, and the right way to evolve along with that. Once you reach a certain scale, that sort of deliberation is not helpful if your goal is to get to the scale that a Google or an Amazon or a Facebook has achieved. There was definitely a thinning of heterodox viewpoints that occurred in the era in which I was leaving. If I had brought it up — I probably did. We probably had a good discussion about it, and we tried to achieve a good outcome. But ultimately you're trying to achieve a goal that requires de-prioritizing these kinds of discussions and just focusing on the corporate outcome.
“We're just now entering a period where for the first time in human history, the vast majority of our actions will be digitized. So let's at least give ourselves the tools and the option to perform numerical thinking alongside our very well-developed mechanism for narrative thinking.”
A year later you did a talk at Berkeley where you said:
“It makes me uneasy when I think about how much corporations are measuring about human interactions today, and not making considered decisions about what gets measured. Deciding what gets measured is a political and ethical choice.”
Do those things still feel true?
JEFF: Absolutely. The first quote — I like the framing that Past Jeff used there. There are things you see when you use models to interpret digital representations of reality that you don't see when you're equipped with the senses with which we're all equipped. A lot of those things that you might see could have tremendous value.
YANCEY: What do you mean by that?
JEFF: One example that jumps to mind is some very cool work out of MIT on something called Affective Computing, where they can use computer vision to detect emotional states that might not be apparent to people who have a hard time detecting emotional states using the sensory and reasoning apparatus that they've been equipped with from birth. Often these are autistic people or other people who have deficits in perceiving emotion. That's one thing that you can imagine that an algorithm can see that could bring someone up to the same perception level that the rest of us have. Another simple example could be: I could take a recording of this talk and I'm sure I would recognize verbal tics or things I'm doing that I wish I weren’t in terms of how I speak. They could be arrived at just by watching it, but I could also run algorithms over it that could detect things that I didn't personally detect by watching it. It might say, “You’re using words of Germanic origin in situations where had you used one from a Romance language you would have been received more softly.” There are things like that where allowing algorithms to operate upon a digital representation of reality can give us a deepened awareness of what reality is. That is genuinely exciting for me, and one of the reasons why I enjoy working in data. But obviously there are things that you can do with those insights — things like predictive policing — which should not be done.
YANCEY: If you think about the organizations you've been a part of, or even just the field, why do we choose to measure something?
JEFF: There's two pieces. There's what we choose to measure, but there's also what we can measure. Those two things determine structures of society to a much greater extent than we recognize. To your question about what we choose to measure — the way that I said it in that talk is the way that I still think about it today: that it's a moral, ethical choice. We need thought experiments related to measurement in the same way that we have thought experiments related to the trolley problem.
I actually find myself arguing against measurement in many situations in which customers of mine at Cloudera or companies that I invest in now are pursuing a strategy that involves measurement. I can see the potential for improved business outcomes, but I also see the potential for reduced societal outcomes. To put it in the language of the Bento: would Future Us want that measurement act to be performed? Right now we're erring on the side of if it makes us money — in your language, financial maximization is really the ethical principle that's being applied to whether or not we should measure something. We need to move to a different ethical principle.
YANCEY: People in power decide what gets measured, and they measure things that are ultimately in their interest. So you have things like workplace monitoring for lower-level employees but not for higher-level employees, even though higher-level employees might be more prone to commit crimes that have a greater impact on the business. Measurement can be a form of control.
JEFF: When we think about freedom, freedom from domination is a term that I see a lot in moral philosophy. Freedom from observation. There was a conversation I had a long time ago where someone said something that changed the way I thought of cities forever. They said the city invented anonymity. I think about growing up in the Midwest in a smaller town. Even though I wasn't fully surveilled, I did feel coupled to an identity that had been constructed by others and given to me. Moving to New York City, that identity was no longer coupled to me. But then we reinvented small town life — and this is nothing new, Marshall McLuhan has given lectures on this for fifty, seventy years — but it really struck me that we basically uninvented anonymity. Perhaps there's a role for it in contemporary society. I don't think we're going to figure it out on the timeline of our lives, though. I think we're going to live in an era where anonymity is not a choice that we can make. We need to figure out how to construct a society in which that is a choice. We've become this global village. How can we reconstruct the metropolis in the digital realm?
YANCEY: There's a question you asked in one of your talks that I've been thinking about, which is: how do corporations die? It made me think about a related question, which is whether companies should have wills. Could a company declare from the very beginning: “If this happens, here's the Do Not Resuscitate order. Shut us down at that moment.” Some hard-coded kill switch if things go truly astray.
JEFF: The first thing I thought of is my mom always says to me, “If I get a perm, shoot me.” [laughs] Now whenever I bring up companies dying, I’m gonna use this. The Do Not Resuscitate is like “if there's a perm on this head…”
Around this time last year, I was basically working full-time to help create something called the COVID Tracking Project, which became an authoritative source of state-level data on tests, cases, hospitalizations, and deaths from COVID-19 in the United States. That organization just shut down. My esteem for the leaders of the COVID Tracking Project grew immensely when they made the difficult decision to not capitalize on the achievement of the COVID Tracking Project. I mean, this thing had 500,000-plus Twitter followers, and these people are journalists, they work in an industry that’s had a very difficult time rewarding its highest achievers. This absolutely could have been a vehicle for self-aggrandizement for the leaders, and they made an incredibly humble choice to shut it down because it made sense. If you paid attention to the motivating mission of the organization, then this was clearly the right call.
Corporate America has asserted that corporations should be considered as people. Should be treated as people, literally by the law. I'm no expert on that, there are other people that can provide a lot more color on that. But you know what? People die. Most of what makes us human is working backwards from that. Why can't corporations be under the same restrictions? Corporations should die, I believe. Corporations also should not be able to be stunningly obese. They should not be able to be so large. In the same way that it's difficult to be healthy as an organic being if you consume, consume, consume, we need to figure out ways of a more humane capitalism and, frankly, a more humane political economy.
YANCEY: Last question: for someone who's interested in data science and applied research, if you were to be starting over right now knowing what you know, how would you do it?
JEFF: I worked on Wall Street for a year out of college and learned a lot. They do really hard math on Wall Street. I learned how to code professionally. I learned how to write software as an eight-year-old, I did it as a hobby, but I learned how to write software professionally and how to do some really hard math, on Wall Street. And honestly, just how to show up at work and create value for other people trying to solve business problems. At Facebook I learned how to hire and manage a team, and build out the infrastructure for storing and analyzing petabytes of data. All those skills were incredibly important to be able to do anything that I could be proud of. I'm proud of the learnings that happened and I'm proud of the interpersonal commitments that I upheld in those work environments. But I can't say that I'm proud of the impact on the world that the work that I did at either of those places had. So I guess I would say: we all have to optimize for skill-building early on, and preserving optionality. I often tell people to find a place where you can see what the win state looks like, a place where they’re doing things right, and go sit inside of that and pay attention. If you're coming out of college, thinking that you can invent the win state from whole cloth is a bit naive. Every once in a while it happens, but you're probably better served by trying to observe the win state inside of somewhere that's doing good work.
In terms of finding one that can align with your personal values — that’s a lot harder. I personally think that the skills that we build come from the people that we solve problems with, and the quality of problems we're solving. I would place the most emphasis on that early in my career, recognizing that until you have the economic freedom, don't put too much pressure on yourself. This is difficult, because ethically, it's like, “Okay, even for one minute contributing to something that you don't fully believe in…” But things evolve. Albert Wenger mentioned to you that when they invested in Twitter, the position was “that's silly.” Now it's too weighty for its own good. Ethics change. I personally recognize constraints that cause people to make decisions that I might not agree with ethically, and I don't vilify them for that. I personally am an incredibly flawed human. We're all collaboratively trying to get better and we’re all trying to solve for constraints.
Early on, find a hard problem to solve with smart people and have fun doing it. Then as you build more autonomy for yourself, that can allow you to find your way into things that you think may be more aligned with how you feel. I would also say honestly, I don't know that I had a fully-formed ethics coming out of college. There's a lot of moral controversy happening in college today, and if I try to place myself in that environment, I don't know how I would have perceived it then relative to how I perceive it now. I don't want to put so much pressure on people to be fully-formed ethically at nineteen. Focus on building skills, keep the option value open. Then figure out who you are and what you value, and then bend your career arc towards that over time.
- VIDEO: Jeff Hammerbacher at Berkeley on Ethics and Data Science (highly recommended!)
- Related Sciences
- Jeff’s investment fund Techammer