Summary
Experienced data scientist and software engineer. My research is on theory and methodology for sequential testing and estimation, particularly for causal inference in randomized experiments, best-arm identification in multi-armed bandit problems, quantile estimation, and matched observational studies. Prior to graduate school, I was a software engineer and technical lead at Google and then Thumbtack, where I also did quite a bit of data science and spent a year as a manager. My software background tilts towards backend engineering with a focus on software design and testability. My recent experience centers on Python, R and Linux, with a bit of C++.
Work Experience
- Investigated the potential for variance reduction within Stats Engine, the sequential A/B analysis engine behind Optimizely’s platform. This involved some theory, some simulations (Python), and some analysis of past experiments (Spark, Scala, Python).
- Initiated research on a general justification of the normal mixture sequential probability ratio test, the underlying methodology in Stats Engine.
- Consulted on customer questions related to statistical methodology, particularly sequential testing, frequentist vs. Bayesian approaches to A/B testing, and false discovery rate control.
Assisted students, graded assignments, devised midterm questions, and planned some lectures.
Researched methodology related to the design and analysis of sequential experiments with heavy-tailed data.
- Developed a suite of accuracy benchmarks for Celeste, a variational Bayesian inference engine for astronomical imagery written in Julia.
- Tracked down and fixed bugs in the Celeste model and implementation. Wrote extensive documentation based on what I learned.
Defined metrics to capture the long-term value of service providers and developed predictive models for these metrics based on information available at signup time.
- Wrote best practices, developed tools, and educated coworkers on randomized experiments. We recorded ~500 product experiments under this framework during my tenure.
- Over ~18 months, developed and ran a distributed pipeline to crawl a billion web pages, identifying and categorizing local services using SVM text classification
- Designed and analyzed large, randomized, controlled SEO experiments.
- Convinced the early team to adopt mandatory code reviews, and continually pushed for better automated testing and design review, preferably by example
- Designed and implemented much of our Python application framework, early build system and source organization
- Led the engineering team for a year; launched initiatives to migrate from PHP towards Python and from dedicated hardware to AWS; wrote the job ladder and compensation bands; managed about seven engineers
- Conducted ~300 interviews; devised interview questions and at-home challenges for software engineers, data scientists and site reliability engineers; trained new interviewers
- Built our systems monitoring and reporting infrastructure (Python, Tornado, Graphite)
- Designed a randomized experiment to measure the effect of content creation on SEO landing page traffic, and implemented an optimization algorithm to construct new landing pages, increasing overall revenue by about 20%
- Administered our roughly a dozen production servers for a year, writing new Puppet configs, responding on-call to outages and publishing postmortems
Android Team, 2010
- Wrote complete automated test suite for Android system download provider, refactoring existing code as necessary to achieve testability; received a peer bonus for my final design documentation
- Designed and implemented public API for Android download manager and implemented system “Downloads” app, both released in Gingerbread
- Revised accelerometer filtering logic for Android platform screen rotation
Platforms Team, 2007-2010
- Technical lead of the Autotest project, a kernel and hardware qualification platform, for about a year; mentored a summer SWE intern who joined full-time the following summer
- Ported the Ruby on Rails prototype of the Autotest scheduling web app to Python/Django, then vastly extended it over the following two years
- Designed and implemented a new Autotest reporting web app; the frontend rendered complex, spreadsheet-style reports in the browser (Google Web Toolkit frontend, Python/Django backend)
- Planned and executed a refactoring of the Autotest job scheduler to replace all multithreading with an asynchronous, event-loop style implementation, ending months of quality issues
- Designed and led implementation of distributed scheduling support to add redundancy for job scheduling and increase capacity for simultaneously executing test machines
- Productized a novel technique for resolution-independent font rendering using GPU pixel shaders
- Reimplemented researchers’ C# prototype in C++, extending its capabilities and solving many issues not addressed in prior research
- Participated in preliminary API design for this work
- Worked on the emerging OpenGL ES 1.1 driver, adding support for various frame swapping modes and for 2X and 4X full-scene antialiasing
- Began the implementation of a new OpenGL ES 2.0 (programmable pipeline) driver.
- Worked on a kilopixel optical SETI project for a 72-inch telescope
- Led design and implementation of the central control software (Python)
- Designed and wired daughterboard simulator and wrote 8051 microcontroller code
Oversaw weekly labs and help sessions, graded assignments
Developed system for testing throughput and latency of multihop wireless networks, including clock synchronization, data collection and PDF report generation (Java, C, Atmel AVR)
- Created object recognition software using edge detection and a database of templates
- Wrote a pipelined implementation using the group’s Real-Time Specification for Java library.
- Developed PHP account administration applications
- Developed a JavaScript volume bar for browser-based VoIP
- Developed extensions for large software IP phone application (C++)
Spent most of my time developing a PHP message board application and being amazed that someone would pay me to code!
Publications
Bernoulli, 28(3), 1704-1728, 2022
Biometrika, 108(2), 381-396, 2021
Annals of Statistics, 49(2), 1055-1080, 2021
Probability Surveys, Volume 17, 257-317, 2020
International Parallel and Distributed Processing Symposium (IPDPS), 2018
Education
Coursework in probability theory, theoretical statistics, high-dimensional statistics, applied statistical models, convex optimization, causal inference, and statistical consulting.