Steering Big Ships
My last two years as a software engineer have been mainly about figuring out how to drive big, multi-person, cross-team projects that take months or years to finish.
Some things I've learned from this experience
1. How to get started on such projects.
Taking on such projects will usually be a three-part process:
1/ Understand the problem really well first.
E.g., let's say you're tasked with improving load times of a website (> 100M visits per day, lots of complicated backend systems for ranking, indexing, personalization, etc.). What's the current load time? How has it changed as your systems have evolved? What are the bottlenecks? Does having slow load times really matter?
Think of this stage as a broad survey of the problem landscape. At this stage, you're building a mental image of the problem, figuring out how bad the problem is, why it matters for the business, some ways to improve it, etc.
2/ Once you understand the problem well, figure out what the ideal state looks like.
Obviously a latency of 0 is impossible, so you'll have to do some analysis to find a reasonable goal. Also, beyond improving the latency as a one-off, you want to systematize this: you want to prevent future latency regressions, and also have ways to figure out how to improve the latency even further.
So you might define the ideal future state in these terms:
- Reduce p95 latency by 50% (1.5s => 0.75s).
- Prevent future regressions: have strict SLAs on latency and alerting to page someone if there is a regression.
- Improve latency even further: build really great profiling tools so we know where exactly the bottlenecks are.
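As a rough illustration of the first two goals, here is a minimal sketch of a p95 check against a target. The function names, the nearest-rank percentile method, and the sample data are assumptions made for this example; a real setup would pull samples from whatever monitoring system you already have.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_latency(samples_s, target_p95_s=0.75):
    """Compare the observed p95 latency (in seconds) against a target.

    The 0.75 default mirrors the example goal above; it is not a
    recommendation for any particular system.
    """
    p95 = percentile(samples_s, 95)
    return {"p95_s": p95, "target_s": target_p95_s, "regressed": p95 > target_p95_s}


if __name__ == "__main__":
    # Hypothetical page-load times in seconds.
    samples = [0.2] * 18 + [0.9, 1.6]
    print(check_latency(samples))
```

A check like this could run on a schedule and page someone when `regressed` is true, which is one way to turn the "prevent future regressions" goal into something enforceable.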
3/ Define the path to get to the ideal state, with deliverables and timelines.
What are incremental steps you can take to get to the ideal state? When will you deliver them? Where will you start? How many people will you need? These are questions to answer at this stage.
At each stage, check in with all interested parties and make sure they agree with your conclusions.
2. Fill the gaps.
One of your most important responsibilities is to look for problems that aren't being addressed and either fix them yourself or delegate them to someone else.
3. Communicate up, down, across. Make progress legible.
If you do some work but don’t talk about it, that work didn’t really happen. You have to communicate the value of your work, and show how you are making progress.
4. Allow for slack.
The bigger the initiative, the harder it is to be precise about its trajectory and what it will take to stay on the desired trajectory. You might think it would take an engineer two months to complete one of the projects your final outcome depends on. But an engineer who is new to the team needs more time to ramp up. They might also have to spend time on other things; e.g., breakages elsewhere in the team. And of course, your estimations could be wrong, as they often are.
So build in some slack. If the project is estimated to take 2 months, and you absolutely must have it ready in 2 months, then you need a contingency plan in place in case it is delayed.
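To make the arithmetic concrete, here is a small sketch that pads a raw estimate for the three factors mentioned above: ramp-up time, time lost to other work, and estimation error. The function name, the formula, and all the numbers are assumptions chosen for illustration, not a calibrated model.

```python
def padded_estimate_weeks(base_weeks, ramp_up_weeks=0.0,
                          interrupt_fraction=0.0, error_margin=0.0):
    """Pad a raw estimate, in weeks, for common sources of delay.

    base_weeks: the estimate assuming full-time, uninterrupted work.
    ramp_up_weeks: extra time for someone new to the team.
    interrupt_fraction: share of time spent on other things (0 <= x < 1).
    error_margin: fractional buffer for the estimate simply being wrong.
    """
    if not 0 <= interrupt_fraction < 1:
        raise ValueError("interrupt_fraction must be in [0, 1)")
    # Less time on the project per week stretches the calendar time.
    effective_weeks = base_weeks / (1 - interrupt_fraction)
    return (effective_weeks + ramp_up_weeks) * (1 + error_margin)


if __name__ == "__main__":
    # Hypothetical inputs: a "two month" (8 week) project, 2 weeks of ramp-up,
    # 20% of time lost to interruptions, and a 25% buffer for estimation error.
    print(padded_estimate_weeks(8, ramp_up_weeks=2,
                                interrupt_fraction=0.2, error_margin=0.25))
```

With these made-up inputs the 8-week estimate grows to roughly 15 weeks, which is the kind of gap a contingency plan has to cover.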
Some things I haven’t figured out yet
1. How to get focused coding time while running such projects.
I still tend to get overwhelmed by details and am not good enough at managing them: endless messages, emails, bugs, and incidents big and small constantly distract me. In this environment I’ve found it hard to find enough time for focused coding work.
It’s easy to do straightforward things like simple refactors, bug fixes, and minor feature additions. But anything that requires long periods of concentration is hard to find time and energy for: starting a significant piece of work, working through a thorny programming problem, debugging something complicated, or coming up with the design for a new system. The only thing that seems to work is to do this work either early in the morning (7 am) or late in the evening (8 pm), when there is nobody around to distract me.