Recent Posts
Hiring data teams
I’m writing this post mostly because I still frequently hear people say they’re struggling with how to interview data people. I’ve blogged and spoken previously about the misery of being an interviewee, so suffice it to say I have a ton of empathy for how awful it is to go through a bad interview process, and how disappointing it is to go through a long, grueling interview and not get an offer.
DVC vs. Pachyderm
I decided to embark on this comparison mostly out of curiosity. No tool is perfect for all use cases; that’s why we have forks, and spoons, and sometimes, when we’re camping, sporks. Although Pachyderm claims to use a git-style approach for data and code versioning, there are aspects of the Pachyderm approach (like forking) that aren’t exactly like git. So one thing I wanted to know is: how well does this analogy to git work for DVC?
Test Patterns for Data Engineering
Coming from a background in bench science, or what we affectionately referred to as “wetlab”, I like to test everything I do, and I like my tests to be fast and representative of what I expect to find when I run things “for real”.
Most people I’ve met who are newer to data engineering find that it’s not immediately obvious how to write and run tests for data things. It’s different enough from writing unit tests for web apps that there are some pitfalls to be aware of.