The hardest part of an ML project isn't the model. It's everything around the model — the data pipelines feeding it, the deployment infrastructure running it, and the monitoring catching it when it drifts. Here's the structure we use.
Step 1: Feasibility, not modeling
Before any modeling, we ask three questions: Is the right data available? Is there a path to action when the model produces a prediction? Is the use case still valuable if accuracy is 80% instead of 95%? If any answer is "no" or "we don't know," modeling is premature.
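One way to make this gate explicit is to record each answer as yes, no, or unknown and block modeling until all three are a clear yes. The sketch below is only an illustration: the `Answer` enum, the question keys, and the `blockers` helper are hypothetical names, not part of any existing tool.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# The three feasibility questions from the text; the keys are hypothetical names.
QUESTIONS = ("data_available", "path_to_action", "valuable_at_lower_accuracy")


def blockers(answers: dict[str, Answer]) -> list[str]:
    """Return the questions that are not a clear 'yes'.

    A question with no recorded answer is treated as UNKNOWN.
    """
    return [q for q in QUESTIONS if answers.get(q, Answer.UNKNOWN) is not Answer.YES]
```

An empty list means modeling can start. Anything else names the question that still needs an answer.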
Step 2: Build the pipeline before the model
The training pipeline, the inference pipeline, and the feature pipelines all need to exist before the model is final. We build them with placeholder logic and then swap in the real model. This forces the integration work to happen up front, not at the end.
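A minimal sketch of what "placeholder logic" could look like, assuming the inference pipeline depends only on a small `predict` interface. Every name here (`Model`, `PlaceholderModel`, `LinearModel`, `build_features`, `run_inference`) and the raw fields `age` and `visits` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class Model(Protocol):
    """The only contract the pipeline relies on."""

    def predict(self, features: Sequence[float]) -> float: ...


@dataclass
class PlaceholderModel:
    """Stand-in that returns a constant score so the pipeline can be built first."""

    constant: float = 0.5

    def predict(self, features: Sequence[float]) -> float:
        return self.constant


@dataclass
class LinearModel:
    """A later 'real' model; it only has to satisfy the same interface."""

    weights: Sequence[float]

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


def build_features(raw: dict) -> list[float]:
    # Hypothetical feature pipeline: scale two raw fields.
    return [raw["age"] / 100.0, raw["visits"] / 10.0]


def run_inference(model: Model, raw: dict) -> dict:
    # The inference pipeline never needs to know which model it was given.
    features = build_features(raw)
    return {"score": model.predict(features), "n_features": len(features)}
```

Swapping `PlaceholderModel()` for `LinearModel([1.0, 1.0])` changes the score but not the pipeline code, which is the point of doing the integration work first.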
Step 3: Treat the model as code
The model lives in version control. Training is reproducible: rerunning it with the same code, data, and random seed should produce an equivalent model. The artifact is registered (we default to MLflow). Promotion across environments follows the same rails as the rest of the codebase, with no special-case workflows for "the data scientist's laptop."
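One way to check reproducibility is to fingerprint the trained parameters and confirm that two runs with the same seed agree. The sketch below uses only the standard library and a toy one-feature regression on synthetic data. It does not show the registry step, and the `train` and `fingerprint` helpers are hypothetical. In practice a hash like this could be stored alongside the registered artifact.

```python
import hashlib
import json
import random


def train(seed: int, n_samples: int = 200) -> dict:
    """Fit y = slope * x + intercept on synthetic data using a local, seeded RNG."""
    rng = random.Random(seed)  # no global random state, so runs are isolated
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

    # Closed-form least squares for a single feature.
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept, "seed": seed}


def fingerprint(params: dict) -> str:
    """Stable SHA-256 hash of the parameters (keys sorted before serializing)."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

If `fingerprint(train(42))` differs between two runs on the same machine, something other than the seed is influencing training, and that is worth finding before the model is promoted.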
Step 4: Monitor what matters
Three categories of monitoring, in order of priority:
- Operational: is the inference endpoint running? Latency? Error rate?
- Statistical: are inputs distributed like the training data? Is the prediction distribution stable?
- Business: is the model still producing the outcome it was built for?
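For the statistical category, one common check is the population stability index (PSI), which compares the binned distribution of a feature in live traffic against the training data. This is a sketch of one possible drift metric rather than the method this process prescribes. The `psi` function, its equal-width binning, and the smoothing constant are all assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population stability index of `actual` relative to `expected`.

    Bins are equal-width over the range of `expected`; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # floor so empty bins do not cause log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the right thresholds depend on the feature and should be tuned against real traffic.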
Step 5: Plan retraining from day one
"We'll retrain when needed" usually means retraining never happens. The retraining cadence and the trigger conditions need to be designed alongside the first deployment, not bolted on after the model starts to degrade.
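A retraining policy can be written down as data plus one decision function, so that cadence and triggers are reviewed like any other code. The sketch below is illustrative: `RetrainPolicy`, `retrain_reasons`, and the default thresholds (30 days, drift of 0.25, metric floor of 0.80) are hypothetical values, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrainPolicy:
    max_age: timedelta = timedelta(days=30)  # cadence: retrain at least this often
    max_drift: float = 0.25                  # trigger: input drift score above this
    min_metric: float = 0.80                 # trigger: business or quality metric below this


def retrain_reasons(
    policy: RetrainPolicy,
    trained_at: datetime,
    now: datetime,
    drift: float,
    metric: float,
) -> list[str]:
    """Return every reason a retrain is due; an empty list means none."""
    reasons = []
    if now - trained_at >= policy.max_age:
        reasons.append("scheduled")
    if drift > policy.max_drift:
        reasons.append("input_drift")
    if metric < policy.min_metric:
        reasons.append("metric_degraded")
    return reasons
```

Returning the reasons, rather than a bare boolean, makes it easier to log why a retrain was triggered and to see which conditions actually fire.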
None of this is glamorous. That's the point. Most of the value of a production ML system comes from making the unglamorous parts boring and reliable.