Posts
Data pipelining with pandas
For better or worse, when you’re dealing with data pipelines of varying shapes and sizes, sometimes you need to combine objects that don’t match up evenly.
For example, if you want to apply a condition via lookup, sometimes it makes sense to just do a merge. This creates a new column in your data table, and then you can use that for reference.
This is an extremely simple example to show what I mean:
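The author's own example is not included in this excerpt. As a rough sketch of the idea only, assuming hypothetical `rides` and `lookup` tables with made-up column names, a lookup via merge might look something like this:

```python
import pandas as pd

# Hypothetical ride data; the column names are illustrative, not from the post.
rides = pd.DataFrame({
    "ride_id": [1, 2, 3, 4],
    "route": ["city", "suburb", "city", "trail"],
})

# Hypothetical lookup table that maps a route type to a reference value.
lookup = pd.DataFrame({
    "route": ["city", "suburb"],
    "speed_limit_mph": [25, 35],
})

# A left merge keeps every ride, even when the lookup has no matching row;
# unmatched rides get NaN in the new column.
merged = rides.merge(lookup, on="route", how="left")

# The new column can now be used for reference in a condition.
has_limit = merged["speed_limit_mph"].notna()
print(merged[has_limit])
```

Using `how="left"` is one way to handle objects that don't match up evenly: no rows are dropped, and the gaps show up as missing values that can be checked explicitly.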
Posts
Biking data from XML to analysis, revised
Am I getting slower every day?
If you’ve ever been a bike commuter, you’ve probably asked yourself this question. Thanks to the little devices we can attach to ourselves or our bicycles, we can now use our own ride data to investigate these kinds of questions, as well as questions like these:
If I’m going to work from home one day a week, which day would maximize my recovery?
Posts
Working with device data
In continuing my series on investigating bike data, I ran into some interesting aspects of working with device data.
I have some experience with devices, thanks to my many years of working in research labs. This post is about the fun of hunting down what’s working and what’s not.
Things to consider when working with devices
Are you using the device yourself? Are you interacting with the user(s) (directly or indirectly)?
Posts
Biking data from XML to analysis, part 2
So I have some bike data that I parsed out of XML and put into a pandas dataframe. Most of the questions I wanted to ask required that the timestamp of each ride segment, or lap, be used as the index along the x-axis of a plot.
Non-obvious nuances of pandas datetime objects and indexes. You have to sort the dataframe by timestamp before you can convert the timestamps to use as an index.
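A minimal sketch of that sort-then-index step, assuming a hypothetical `laps` dataframe with invented `start_time` and `avg_speed` columns (the real column names and values are not shown in this excerpt):

```python
import pandas as pd

# Hypothetical lap data, deliberately out of chronological order.
laps = pd.DataFrame({
    "start_time": [
        "2014-05-02T08:15:00",
        "2014-05-01T17:40:00",
        "2014-05-01T08:10:00",
    ],
    "avg_speed": [14.2, 12.9, 13.5],
})

# Parse the strings into real datetime values first.
laps["start_time"] = pd.to_datetime(laps["start_time"])

# Sort by timestamp, then promote the column to the index,
# which gives a chronologically ordered DatetimeIndex for plotting.
laps = laps.sort_values("start_time").set_index("start_time")

print(laps.index.is_monotonic_increasing)
```

With a sorted `DatetimeIndex` in place, the timestamps can serve as the x-axis of a plot.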
Posts
Biking data from XML to analysis, part 3
One thing I wanted to do with this data set was experiment with plotting methods. I had already done some exploratory plotting with regular matplotlib, so I had some vague ideas about what I wanted to do.
First I had to select out subsets of data to compare. I knew that there were two types of rides: shorter trips in the city, and longer trips in the suburbs. I was feeling lazy, so I just did a quick threshold with SQL.
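The excerpt does not show the actual query, so the following is only a sketch of what a quick SQL threshold could look like. It assumes a hypothetical `distance_miles` column and an arbitrary cutoff, and it uses an in-memory SQLite database purely for illustration:

```python
import sqlite3

import pandas as pd

# Hypothetical ride summary; the column names and values are made up.
rides = pd.DataFrame({
    "ride_id": [1, 2, 3, 4],
    "distance_miles": [3.1, 12.4, 2.8, 15.0],
})

# Assumed cutoff between short city rides and longer suburban rides.
THRESHOLD_MILES = 8.0

# Load the dataframe into an in-memory SQLite table so it can be queried with SQL.
conn = sqlite3.connect(":memory:")
rides.to_sql("rides", conn, index=False)

short_rides = pd.read_sql_query(
    "SELECT * FROM rides WHERE distance_miles < ? ORDER BY ride_id",
    conn,
    params=(THRESHOLD_MILES,),
)
long_rides = pd.read_sql_query(
    "SELECT * FROM rides WHERE distance_miles >= ? ORDER BY ride_id",
    conn,
    params=(THRESHOLD_MILES,),
)
conn.close()

print(len(short_rides), len(long_rides))
```

A single threshold is crude, but it is often enough to get two subsets to compare in a first round of plots.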
Posts
Biking data from XML to analysis, part 4
One of the main reasons this project turned out to be interesting is that time series data has all kinds of gotchas. I never had to deal with a lot of this before, because the sorts of time series I did in my scientific life didn’t care about real-life things like time zones. We mostly just cared about calculating time elapsed.
…tick…tick…tick
Anyway, one thing I wondered about with the bike data was, can we compare average speeds in the morning vs.
Posts
Things I learned about zip files
In an effort to advance my Python skills, I spent some time slowly pecking away at the puzzles on pythonchallenge. I got stuck on most of the challenges, and either had to search for a hint, or ask for help from a friend, or both. This latest one was particularly instructive, and it had to do with zip files.
I thought I knew what zip files were. I have used them since grad school, for transferring folders via email, and for compression.
Posts
Things I learned studying the cell cycle in cancer
I know that from the outside, ‘science’ seems like The Place Where Scientists Live. But ‘science’ is not a monolithic, homogenous thing. Not all scientists are the same.
Today someone called me a Biologist. But I was never really a Biologist. My undergraduate degree was in a chemistry department.
My past life as a researcher was always very interdisciplinary. To better understand cancer cells, I used a lot of sophisticated software, and mathematical intuition, in addition to chemistry and physics.
Posts
Advice on recruiting
I have had a few pleasant job interviews. Here’s what was different about those interviews, and what made them stand out from the others I’ve done. I’ll describe a specific example, and then give some specific suggestions.
The hiring manager contacted me directly
He had done his homework. He had looked at my GitHub repos. He gave me pretty specific information about the structure of the interview, and gave me ~2 weeks to prepare.
Posts
Automating user-friendly documentation with Selenium
Once upon a time, a friend recruited me to do some technical writing for the company where he works now. Basically, they needed someone to quickly revise and update the documentation for their software.
Most modern user-friendly software documentation isn’t just writing, though. It’s screenshots. A LOT of screenshots. So you don’t just write “click on the blue box”, you also show a picture of it, like this. See the blue box?