Hiring data teams
This is one of those posts that I’m writing mostly because I’m still frequently hearing people say they’re struggling with the question of how to interview data people. I’ve blogged and spoken previously about the misery of being an interviewee, so suffice it to say I have a ton of empathy for how awful it is to go through a bad interview process, and how disappointing it is to go through a long, grueling interview and not get an offer.
How I hire for data teams
After many years in research science, where I watched a lot of science presentations and helped interview a large number of graduate students and postdocs and faculty (I never counted how many), I’m a big fan of learning from what other people have done, rather than reinventing the wheel.
I’m also an engineer in the way I think about things, which is to say I like to iteratively improve on any processes I develop. If something is working, great: if it ain’t broke, don’t fix it.
But I think we can all agree that hiring in tech is still a work in progress.
So here’s my current process, partly based off of what the Product Hackers were doing when I joined them at Yahoo, and partly based off my own observations of what has worked well over the last few years, and what didn’t.
I’m not saying it’s perfect, but I’ve been happy with the candidates that I’ve hired, and the teams that I’ve built. I’m always looking for candidates who bring complementary skill sets, and whose personalities work well together.
I’ll also talk a bit about having to miss out on candidates I wanted but couldn’t hire for various reasons.
How I think about hiring
In my research career, we did a lot of screening. We screened samples of cells or DNA or antibodies almost daily. Some of it is just a numbers game, and has to do with sampling. Some of it has to do with how you do your quality assessments. This basically boils down to two keys to success:
- Start with the most diverse population you can get.
- Use the right filters.
Note that in research science, we never ask anyone to go to the bench and perform an experiment to prove they have technical skills. We have to figure it out just from talking to them. So I mostly focus on that.
So if all you can do is talk to someone, you have to choose your filters wisely. Here are the things I always prioritize, regardless of the team, though not always in this order. Your needs may be different.
- communication skills (both written and verbal)
- curiosity
- work ethic
- willingness to learn from everyone
- ability to work independently
- ability to work with others
- technical aptitude
Step 1. Outreach: start with the most diverse population you can get.
Advertise in diverse networks. Go out and meet people.
Don’t just stick a job ad on the web and wait for people to apply.
If you’re having trouble finding diverse candidates, either you’re not looking, or your company is already too homogeneous, and it sends a message that either a) you haven’t prioritized this, or b) you’re biased. Maybe both.
Do think about how your job ad is written. Here’s a map of how some common red-flag words sound to many candidates:
{'passionate': 'no work-life balance',
'rockstar': 'elitist',
'fast-paced': 'highly stressful, disorganized work environment'}
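If you want to check a draft ad against a list like this, a minimal sketch might look like the following. The function name `find_red_flags` and the sample ad are made up for illustration, and the word list is just the three examples above, not a complete or authoritative one.

```python
import re

# The three example words from above; extend this with your own list.
RED_FLAGS = {
    "passionate": "no work-life balance",
    "rockstar": "elitist",
    "fast-paced": "highly stressful, disorganized work environment",
}


def find_red_flags(ad_text):
    """Return {word: how it may sound} for each red-flag word found in the ad."""
    found = {}
    for word, impression in RED_FLAGS.items():
        # Whole-word, case-insensitive match, so "Rockstar" is caught too.
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, ad_text, flags=re.IGNORECASE):
            found[word] = impression
    return found


# Hypothetical ad text, for illustration only.
ad = "We want a passionate Rockstar analyst for our fast-paced team."
for word, impression in find_red_flags(ad).items():
    print(f"{word!r} may read as: {impression}")
```

A simple word match like this won't catch tone, so treat it as a prompt to reread the ad rather than a verdict.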
Do break your ad into “essential” and “nice-to-have” skill sets.
If you’re not sure what skills the team needs most, you shouldn’t be hiring anyone yet.
And then: go to meetups. Join Slack groups. MEET PEOPLE. Ask around. Be approachable. As hiring manager, you should take calls with anyone who asks and seems potentially qualified. If you’re really getting too many candidates that look good and you can’t screen them all, get a recruiter to help, but you’ll have to give them very specific instructions about what to ask, and what qualities to prioritize.
Step 2. Resumé review
I have hired people whom I met via a Slack group or a conference and their resumé wasn’t the first thing I knew about them. It’s worth thinking about how much the resumé really captures about what a person is like to work with.
Content
Don’t get hung up on what schools people went to, what they studied, or where else they’ve worked.
Consider candidates from big companies and small ones, from academia and from industry.
Don’t assume you can tell much about a person from their resumé alone.
The savvy candidate will get someone to help them with formatting, which may help them stand out from the crowd. That can help showcase their skills, which is great, but it can also be misleading.
You’ll have to talk to them to find out if there’s any substance behind the shine.
Verbosity and layout
One thing I look at carefully is whether the resumé seems organized. I don’t care if they’ve used bullet points or sentences, as long as the communication about what they did is clear.
If the resumé seems jam-packed with buzzwords, I’m going to drill down on that (a little) in the phone screen.
If they’ve been judicious in what they chose to emphasize about their past experience, that suggests to me that they’re thoughtful and focused on quality (or they had help).
It’s not necessarily a bad thing if they had help with formatting. If anything, it suggests they know how to find resources for career growth and take advantage of them, and that they understand the importance of communication.
Step 3. Phone screen
1. Answer a deceptively simple question: When would you use SQL vs. Python?
What I’m looking for when I ask this:
- Can I understand your answer? Are you able to express your thoughts clearly?
- Does your answer make sense? Are you able to give examples to support your reasoning?
- Are you familiar with these tools? Are you honest about your experience level if you’ve used one more than the other, and are you thoughtful about how that affects your choices?
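To make the question concrete, here is one small, hedged illustration of the kind of example a candidate might give: the same aggregation done in SQL and in plain Python. The table name, column names, and rows are invented for this sketch, and it is not meant as "the" right answer to the question.

```python
import sqlite3
from collections import defaultdict

# Made-up rows: (day, user_id, clicks)
rows = [("2021-03-01", "a", 3), ("2021-03-01", "b", 5), ("2021-03-02", "a", 4)]

# SQL: declarative, set-based aggregation, done where the data lives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT day, SUM(clicks) FROM events GROUP BY day ORDER BY day")
)

# Python: the same aggregation, spelled out step by step after the data
# has already been pulled into memory.
py_totals = defaultdict(int)
for day, _user_id, clicks in rows:
    py_totals[day] += clicks

print(sql_totals)
print(dict(py_totals))
```

A thoughtful answer usually goes beyond the syntax: where the data lives, how big it is, and what happens to the result next.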
2. Tell me a data story about a project you did, or a job you had where you learned something.
Talking about past projects is best, but that only works well for senior people, and it also takes some experience as the interviewer.
Note that I don’t care if they’re talking about a software project. It could be a science project or a math project. It should be something that involved data.
To really do it well and make it fair, you have to know what to ask, what to listen for, when to dig into something someone mentions in passing, etc.
Otherwise it can be too biased toward people who are particularly outgoing, self-promoting, well-rehearsed, not nervous, or just more like you than the other candidates.
So if you’re going to take this approach, try to keep some structure around it to make it easier to compare across candidates.
Here are the basics I try to cover, and some people need more prompting than others:
- What was the business problem you were trying to solve?
- What approaches did you consider?
- Where did the data come from?
- What did you end up doing?
- Who else worked on this project with you? How was the work divided?
- What was the end result of your project? Was it successful? How could you tell?
- What would you do differently if you could do it over again?
- What did you like best and least about this project?
You’ll notice that the better candidates can communicate at both high and low levels, that is, they can talk about the big picture and business impact, as well as discussing the technical details, trade-offs of the choices they made, and think critically about what could’ve gone better.
Having said all that, if you have to hire very junior people, that’s not going to work. They won’t have projects they can talk about that are directly relevant to the role, or they will have been mostly just following directions. In those cases, it’s hard to get a sense for their work ethic, and whether they can work well with others, or handle responsibility.
Story time: I remember interviewing a technician for one lab I worked in. We didn’t have any candidates (college students or recent graduates) who had experience with breeding, feeding, and otherwise maintaining Xenopus laevis. I ended up asking people about their previous work experience of all kinds.
I hired one student who had worked with her parents, who owned a Subway sandwich shop. She was prompt and efficient, took notes, referred back to her own notes frequently, and communicated clearly. She learned quickly, asked great questions, and was enthusiastic about trying new things and taking on responsibility.
I had another student who was super enthusiastic, but her notes were so bad they were basically useless. She had done a lot of volunteer work with kids, so she was good at thinking on her feet, but didn’t enjoy the methodical aspect of working at a bench.
My advisor at the time also hired a guy who had worked in a pet shop. On paper, he was perfectly qualified to help take care of the frogs, but in practice, he was kind of disorganized and unreliable. He was unwilling to take on much responsibility, and we had to nag him to keep on top of things that were part of his job, like ordering supplies.
Point being, think carefully about what qualities you want for the role, and what the team needs, and interview accordingly.
More on this in the section about building teams.
Step 4a. Take-home (or code pairing, though no one has chosen that when I offered it)
Why do a take-home at all?
I’ll admit, I hate giving homework. I really do. But it’s mostly just a discussion tool. It’s a way for us to talk about work together, and puts all the candidates on equal footing.
It also gives me a lot of ammunition in case anyone pushes back on a candidate I want to hire. If all we have to go on is subjective conversations, with no recent technical work product to reference, it’s hard to respond when someone says “this person isn’t technical enough” or argues that they aren’t senior enough for a Senior title. A take-home can help persuade them that this is the right person for the job.
So to try to make it as pleasant as possible, I give the candidate multiple days to work on the take-home, even though I only expect them to spend a couple of hours on it. This way they can work at their convenience.
It’s supposed to be a facsimile of a real, though brief, work experience. It’s open-book. I ask them to disclose any resources they used (Stack Overflow, asking friends, etc. are all fine).
I schedule a phone call in the middle of the take-home. This is another trick I learned from my team at Yahoo.
I give the candidate time to look over the data, and maybe start working, and I encourage them to come up with questions to ask me. I learned that I have to tell them it’s part of the evaluation, because otherwise they’ll think they’re not supposed to ask anything.
I’ve never hired anyone who had zero questions during that phone call. If they have no questions, that usually means they didn’t start on the take-home yet, they didn’t bother to prepare for the phone call, or they lack curiosity, or all of the above.
I have had one or two people refuse to do a take-home, and they also turned down a live code-pairing option. In one of those cases, the person already had a full-time job, and wasn’t sure how much she wanted a new job.
Tiered approach
The best interview questions work for candidates at all levels. More senior candidates will make it further and their answers will be more sophisticated, but ideally we still want to level-set with a series of example tasks. So I use the same take-home assignment for everyone, though I re-do it for each company, and each cohort. I haven’t had problems with candidates cheating, but I figure why set yourself up for having to worry about that?
Note: always use real (sanitized) data, which is relevant for the job. Trick questions are the wrong filter.
Structured data
The most basic question for a data person is whether they can handle structured data and extract information from it. Everyone should be able to do this, from an entry-level analyst to a senior machine-learning engineer or data engineer.
I always give them total freedom to use whatever tools are within the realm of things we typically use on the team - usually that would be vanilla Python, pandas, SQL, Scala, or Java. Maybe R or Julia if that’s really all they know how to use. Depends on the role and the team. I don’t want to deal with reviewing submissions in JavaScript or SAS, though I’ve gotten them before. So I generally encourage people to use Python if they can.
I start by giving them a CSV, usually a time series of some kind, and their task is to clean the data, answer some basic questions about it, and then tell me a data story (see below) about what they observed. This part should be fun!
They can build models if they want, but it’s not required.
I try to choose data sets that I think are representative of the types of things we see, and I sometimes use a trick that my team at Yahoo used, which is to dirty up the data a bit on purpose. Maybe drop some rows, mangle a couple of dates, just to give them some things to work with.
This part shouldn’t take more than an hour.
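For anyone preparing a take-home of their own, the "dirty it up on purpose" trick could be sketched roughly like this. Everything here is an assumption for illustration: the function name `dirty_rows`, the `date` column in `YYYY-MM-DD` format, and the particular kinds of mess (dropped rows and dates switched to another format).

```python
import random


def dirty_rows(rows, drop_frac=0.05, mangle_frac=0.05, seed=0):
    """Return a copy of rows with some rows dropped and some dates mangled.

    rows: list of dicts with a 'date' key formatted as YYYY-MM-DD (assumed).
    """
    rng = random.Random(seed)  # fixed seed: every candidate gets the same file
    dirty = []
    for row in rows:
        roll = rng.random()
        if roll < drop_frac:
            continue  # silently drop this row
        row = dict(row)  # copy, so the clean data is left untouched
        if roll < drop_frac + mangle_frac:
            year, month, day = row["date"].split("-")
            row["date"] = f"{day}/{month}/{year}"  # inconsistent date format
        dirty.append(row)
    return dirty


# Made-up daily time series for March.
clean = [{"date": f"2021-03-{d:02d}", "value": d * 10} for d in range(1, 32)]
dirty = dirty_rows(clean, drop_frac=0.1, mangle_frac=0.1)
print(f"{len(clean)} clean rows -> {len(dirty)} dirty rows")
```

Keeping the clean version and the seed around means you know exactly what a careful candidate should be able to find.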
Summary statistics
At a minimum, I expect everyone from analyst to data scientist to engineer should be able to use the right summary statistics. They should know the difference between a median and a mean, and why it matters. I explicitly ask them to answer some of these kinds of questions, again just for level setting, and to demonstrate that they can follow directions.
For a more senior person, this should be trivial. I’m not quizzing them on statistics jargon or anything like that (I personally have a terrible memory for that sort of thing anyway). I want to know if they understand the concepts, and if they can reason and communicate about things like the size of the data set, and how they would divide up subsets of the data.
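As a quick illustration of why the mean vs. median distinction matters, here is a tiny sketch with made-up numbers (session lengths in minutes, where one person left a tab open overnight):

```python
import statistics

# Made-up session lengths in minutes, with one extreme outlier.
session_minutes = [4, 5, 5, 6, 7, 8, 600]

mean = statistics.mean(session_minutes)
median = statistics.median(session_minutes)

# The single outlier drags the mean far above what a typical session looks like.
print(f"mean={mean:.1f}, median={median}")  # mean=90.7, median=6
```

A candidate who reports only the mean here would be describing a "typical" session that almost nobody actually had.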
Room to tell a story
I want to hire people who can think creatively, and dig deeper, without my having to tell them what to do every step of the way. This doesn’t require a ton of experience, although it can help. Mostly it requires curiosity.
It’s really important to me that everyone on the team can:
- Frame what questions they thought of while they were exploring data and/or cleaning data
- Describe the steps they took to answer those questions
- Discuss what they learned. Were they surprised by the results? Why or why not?
This could be as simple as “I initially assumed that the data set only covered March, but when I checked, I saw that there are a couple of days in April, so I had to group the summary statistics differently than I did at first.”
Or it might be something like “I wanted to see what features were predictive of xyz, so I built a model.”
Or it might be a bunch of data visualizations.
Or all of the above.
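The first example above, checking which months the data actually covers before summarizing, takes only a few lines. This is a sketch with invented dates standing in for the date column of a take-home CSV:

```python
from collections import Counter
from datetime import date

# Made-up dates standing in for the date column of the CSV.
raw_dates = ["2021-03-30", "2021-03-31", "2021-03-31", "2021-04-01", "2021-04-02"]

# Count rows per calendar month instead of assuming everything is in March.
rows_per_month = Counter(date.fromisoformat(d).strftime("%Y-%m") for d in raw_dates)
print(dict(rows_per_month))  # {'2021-03': 3, '2021-04': 2}
```

The code is trivial; what matters is that the candidate thought to check the assumption and can say what changed as a result.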
Ideally, I want a team where everyone sees different sides of the elephant.
Unstructured data
For more senior candidates or people coming from more of a software engineering/computer science background, I also include a sample of unstructured data.
This might be a file with a bunch of log lines, or a JSON file with some nested fields that isn’t entirely consistent, or in one case, I used the paginated output from a particular type of database that included multiple kinds of messy stuff.
I always choose something that I was able to clean up and analyze in an hour or less.
In this case, I’m looking for all the same things as above, plus I care a lot more about the code. I want to see logging, tests, and error handling.
This has turned out to be a great litmus test for candidates with more coding experience, because there are so many different ways to parse a file.
I had a couple of people who used a dead-letter box for bad rows.
I had one person who contacted me to ask which fields mattered, before unpacking everything (I really really wanted to hire her, but she ended up going elsewhere).
I had one person who wrote some perfectly decent code, but just blindly unpacked and included everything from the input file (that was a no-hire).
Not everyone does the unstructured data part. I leave it up to the candidate whether they want to try it.
I should note that I don’t care if they got through the whole thing, or got a perfect solution. I’m more interested in whether they tried, what approach they took, how far they got, and if they’re able to articulate how it went.
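To give a flavor of what logging, error handling, a dead-letter box, and choosing fields up front can look like together, here is a minimal sketch. It assumes the input is one JSON object per line; the field names in `WANTED_FIELDS`, the function name `parse_lines`, and the sample lines are all hypothetical, and this is only one of many reasonable ways to parse a file.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("takehome")

# Hypothetical choice of fields; in practice, ask which ones matter first.
WANTED_FIELDS = ("ts", "user_id", "event")


def parse_lines(lines):
    """Split JSON lines into (good_records, dead_letters)."""
    good, dead_letters = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
            # Keep only the fields we need; a missing field raises KeyError,
            # and a non-object (e.g. a list) raises TypeError.
            good.append({field: record[field] for field in WANTED_FIELDS})
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            logger.warning("line %d rejected: %r", line_no, exc)
            # Dead-letter box: keep the bad row and the reason for later review.
            dead_letters.append({"line_no": line_no, "raw": line, "error": repr(exc)})
    logger.info("parsed %d good rows, %d dead letters", len(good), len(dead_letters))
    return good, dead_letters


sample = [
    '{"ts": "2021-03-01T12:00:00", "user_id": 1, "event": "click", "debug": {"x": 1}}',
    '{"ts": "2021-03-01T12:00:05", "user_id": 2}',
    "not json at all",
    "[1, 2, 3]",
]
good, dead = parse_lines(sample)
```

Bad rows are set aside rather than silently dropped or allowed to crash the run, which also gives the candidate something concrete to talk about in the follow-up call.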
Step 4b. Review the take-home and do a follow-up phone call.
I always do a phone call when I’m done reviewing the take-homes, to give the person a chance to tell me what they liked or didn’t like, what was hard or easy, and what they spent the most time on.
I’ve found this is useful for improving the take-home, and helps me decide who to move on to the next stage.
By this point I’ll have had 3 phone calls with each candidate, on 3 different days, so even if each call is only 15 minutes, it’s not a huge time commitment, and it’s worthwhile for me to get to know them a little better.
Step 5. Presentation
If the take-home looks good, I’ll ask them to do a video call to present their work.
This step is critical for weeding out cheaters, and seeing how the candidate deals with being asked questions or offered feedback.
I’ve had a candidate explode with anger when I asked something along the lines of “I see you did xyz, did you consider abc instead?” That didn’t make a great first impression.
I’ve had a candidate who seemed unable to talk through what their own code was doing, even with help, and was genuinely surprised and confused when we pointed out some pretty egregious logical errors.
Ultimately, I’m looking for confirmation that people did their own work, and even if they had help (which is fine), they can explain the code they ended up using.
1:1 with me for more junior people
One of the mistakes I made in the past was having the candidate get up and speak in front of two or more people. This is too much for junior folks; they get pretty freaked out and tend to underperform.
For the whole team if they’re more senior
If the candidate is more experienced and they don’t have a phobia of public speaking, I’ll ask them to walk the whole team through their findings. More experienced candidates will make slides anyway, and they’re usually happy to discuss their reasoning, their approaches, and their insights. If they’re really outstanding, we might include this as part of the “onsite”.
Step 6. “Onsite” aka “meet more people + some other stuff”
The main point of the onsite is for the team to meet the candidate, if they haven’t already, and for the candidate to spend more time getting to know the team they’ll hopefully be joining. It’s also important for the candidate to meet people in other parts of the company.
Fill the gaps from the take-home
If they haven’t done any SQL as part of the take-home, and it’s an important part of the job (this depends on the company and the role), we’ll have them do something simple at the onsite to demonstrate that they have basic skills. I don’t care if people can’t do windowing functions off the top of their head (I usually can’t). I want to know that they at least have a basic sense of how to retrieve data and the order of execution of SQL statements.
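As a rough illustration of the "basic sense" being checked, here is a sketch that assumes "order of execution" means the logical order in which the clauses of a query are evaluated, which differs from the order they are written in. The `orders` table and its rows are made up, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 50), ("bob", 20), ("bob", 25), ("cat", 8)],
)

# Comments show the logical evaluation order, not the written order.
query = """
    SELECT customer, SUM(amount) AS total   -- 5. compute the output columns
    FROM orders                             -- 1. pick the table
    WHERE amount >= 10                      -- 2. filter individual rows
    GROUP BY customer                       -- 3. form groups
    HAVING SUM(amount) > 40                 -- 4. filter whole groups
    ORDER BY total DESC                     -- 6. sort the result
"""
result = conn.execute(query).fetchall()
print(result)  # [('ann', 50), ('bob', 45)]
```

Note that ann's total is 50 rather than 55, because the WHERE clause removed her small order before the groups were formed. Being able to explain that kind of detail matters more than memorizing syntax.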
If they ran out of time to build any models because they were futzing with code, or they spent all their time on models but didn’t do any polishing on their code, this is the time to get more information about their actual experience level with these things.
If they’re more senior, we’ll have them talk through a potential modeling project with a more junior team member. Here I’m looking for whether they’ll be able to help mentor, and grow the team. Are they condescending? Do they listen? How do they react when someone asks them for help?
See how they talk with nontechnical stakeholders
For data roles, it’s really important to be able to talk to people in all parts of the company. We are often the glue. In my last role, we supported basically the whole company. We had projects for all the teams: Product, Engineering, Executive, Customer Support, Finance, Sales, Marketing, and sometimes Ops too.
When I set up one of these interviews, I’m looking to evaluate the following:
- Can the candidate re-frame the business question as a data question?
- Do they know how to scope a problem and estimate timelines for projects?
- Are they willing and able to communicate with someone who doesn’t know, and doesn’t need to know, all the technical details of a potential solution?
- Do they know when to under-promise and over-deliver? Do they know how to politely push back when someone is asking if we can build them the moon and have it done by last week? For more experienced folks, can they figure out how to deliver something smaller/simpler/hackier while we work on a better long-term solution?
Code review exercise
This is one of those things that I haven’t always done, but I think everyone should. One of the first startups I worked at did this as part of the interview, and it was helpful. The expectations are going to be different depending on the person’s level, and it says a lot about how they’ll participate on the team.
The take-home exercise I give is a greenfield project. But a lot of what we do in real life is re-use and revise other people’s old code, and help each other iterate.
Can a junior candidate read and understand an unfamiliar piece of code?
For a junior candidate, I’d give them a piece of code that we know runs just fine as-is, and is maybe currently in use.
This is similar to the take-home in the sense that I just want them to talk through what they think the code is doing, and how it works.
They should be asking questions, not making assumptions. If they have suggestions for improvements, that’s great. If not, that’s ok too.
Can a senior candidate give constructive feedback?
How to do code reviews is a whole can of worms, and other people have written a lot about it. So I’m not going into details on that here. I’ll just try to summarize some of my thoughts.
Code review is critical to how the team grows, maintains institutional knowledge, and levels up. Good code review should speed up productivity, not beat down morale.
A senior person should be able to make useful suggestions for improving code, and they should be able to do it in a non-judgmental, non-blamey, non-mansplainy way. They shouldn’t feel it’s their job to find something to pick on just to be able to say they found something that needed to be fixed.
They should be able to break their suggestions down into: critical/blocking issues that should be fixed before code is merged; stylistic suggestions; and nice-to-haves, things that could wait for a separate PR.
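One way to picture that three-way breakdown is as labels on review comments. The sketch below is purely illustrative: the class names, the `blocks_merge` helper, and the sample comments are invented here, not taken from any particular review tool or team checklist.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    BLOCKING = 1      # must be fixed before the code is merged
    STYLE = 2         # stylistic suggestion
    NICE_TO_HAVE = 3  # can wait for a separate PR


@dataclass
class ReviewComment:
    severity: Severity
    text: str


def blocks_merge(comments):
    """True if any comment has to be resolved before merging."""
    return any(c.severity is Severity.BLOCKING for c in comments)


# Hypothetical review comments, for illustration only.
comments = [
    ReviewComment(Severity.NICE_TO_HAVE, "Could cache this lookup in a later PR."),
    ReviewComment(Severity.BLOCKING, "Dates are parsed as local time, so daily totals shift."),
    ReviewComment(Severity.STYLE, "Consider a more descriptive name than df2."),
]

# Show the most important feedback first.
for comment in sorted(comments, key=lambda c: c.severity):
    print(f"[{comment.severity.name}] {comment.text}")
print("Blocks merge:", blocks_merge(comments))
```

Labeling feedback this way makes it obvious to the author what has to change now and what is just a suggestion.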
They should be able to articulate their thoughts on version control and what constitutes a good PR. I don’t believe there’s just one right answer here, but one time somebody got mad at me for my having opened a PR that was more than 100 lines of code for a greenfield project, and I never want that to happen to anyone on my team.
Last year, I made a checklist for how I want code reviews to work on my teams. I will probably just stick it in my GitHub for easy access (for myself and others to use). I view this as a living document, and I assume it will be modified depending on who is on the team and what I observe. So when I’m hiring, I’m looking for people who are open to discussing these things in a collaborative way.
A more senior person should also be ok with peers and junior people on the team asking clarifying questions about their feedback, or pushing back on it, or maybe just acknowledging but otherwise ignoring it. They should be able to explain the reasoning for their suggestions, and prioritize easy fixes over hard ones, and be able to discuss the trade-offs of using imperfect code as time or other constraints require, and how they decide when to do that vs. when to undertake larger cleanup projects.
Here I’m mostly looking to see how well this person negotiates with at least one other person on the team.
Successes: How this process correlates with on-the-job performance
Why should you use my process? Well, maybe you don’t want to use all of it. It’s pretty involved. But hopefully some of the stuff I’m mentioning here is useful for you. On the other hand, I’m really proud of the teams I’ve built, and I wouldn’t mind if this became the new standard for data hiring.
People who did well on the interview actually did better on the job. No one did worse.
Everyone I have hired has impressed me. I’ve had a couple people who were really strong on math and SQL, a couple who were great at data visualization, a couple who were great at coding, and everyone was good at working with stakeholders. They also all helped each other, which as a manager meant I could focus on helping out in other ways.
So I’d say my filters worked pretty well in the sense that I hired people who performed well on the job, and made the team stronger.
Building a team with a diverse set of talents.
It’s so important to hire people with different strengths. It’s important to have an open mind about what you’re going to find. I look for aptitude and the stuff I can’t teach: curiosity and work ethic.
Failures: people I couldn’t hire
Unintentionally terrified the candidate, more than once.
We’ve probably all done this. There were a couple of candidates who had to deal with 2:1 interviews, and I think it freaked them out and they didn’t do as well as they would have otherwise. Try to avoid doing this, especially with junior candidates. It’s too intimidating, and it’s not worth it for whatever you think you’re gaining by having someone shadow or whatever. They can shadow in a mock interview instead.
Lost a candidate I really wanted because she got a better offer.
This has happened to me twice now, and in both cases I was disappointed that we couldn’t offer what these candidates were worth. I’m still glad they interviewed with us, though, because it helped me benchmark expectations and helped educate the less experienced members of the team about what to look for.
Lost candidates because of visa problems.
Sometimes you find the person you think would be best for the team, and it turns out they’re not authorized to work in the United States. Sometimes the timing just doesn’t work out, or your company can’t sponsor anyone. This is always frustrating because it ends up feeling like a waste of everyone’s time.
Building Teams
At the end of the day, or the very long blog post, all I’m trying to do when I hire is find people who bring some skills that the team needs, and have a conversation. It’s not a test, I’m not grading anyone or evaluating them as people. It’s just a small snapshot and it can be hard to make a decision based on a small number of encounters. This is where I rely on the rest of the team to give me their perspective.
It’s also important to keep in mind that you always want to have at least one junior person, and at least one woman in the interview loop.
I’ve worked at places where I was the only woman on the engineering team, which meant I met with probably more than my fair share of candidates (almost all of them were men).
I also had an interesting experience once with a candidate who was perfectly respectful toward me, but treated the (young-looking) female head of Product like dirt. I was surprised, but we ended up rejecting the guy. I don’t usually wield the PhD like a weapon, but in this case, it shielded me from this guy’s bad attitude.
I wouldn’t hire anyone who made any team member uncomfortable. Most of all, I want my teams to build trust and be able to rely on each other. That means everyone has to be respectful and supportive of each other, first and foremost.
Make it as pleasant as possible for the candidate
Ideally, the candidates enjoy the process, and maybe learn something, and the people on the team are excited to be getting a new teammate.
If everything goes well, even the people we end up being unable to hire, for whatever reason, aren’t scarred by the experience. Hopefully it isn’t a waste of anyone’s time, and we don’t have too many regrets about how it went, or what we wish we had done differently.
As a hiring manager, I always lean towards being as transparent as possible with candidates. Data people like data. We don’t love it when there’s radio silence, or we’re getting cryptic messages about what to expect.
I’ve mostly focused this post on aspects specific to hiring for data positions, but a lot of this is just good hiring practice. Tell your candidates what to expect. Try to be aware of any possible biases, and take steps to address those.
Special thanks to Ray Buhr for feedback on the draft version of this post.