I spent a week once chasing a false spike. A client’s signups had jumped 18 percent overnight, which ignited a flurry of “do more of that” messages. It turned out to be a reporting change buried in a vendor’s release notes. No new value, just a new definition. That week reminded me of an old truth: signal hides, noise shouts. If you make decisions at scale, you need more than formulas. You need judgment that blends math with street sense. Around my teams, we call that blend (un)Common Logic.
The parenthetical is deliberate. Plenty of logic is common, at least in slides. What is uncommon is applying it under ambiguity, time pressure, and organizational politics, while still producing decisions that hold up months later. The craft is not flashy. It is a hundred small moves that keep you aligned with reality.
What counts as signal
Signal is an effect you can describe, repeat, and use. In marketing, it might be an incremental lift in revenue per visitor tied to a specific change in creative, channel, or bidding strategy. In operations, it might be a sustained reduction in cycle time after altering queueing rules. Signal survives scrutiny. It keeps showing up when you slice the data by time, segment, or measurement method.
Noise is anything that impersonates signal. It includes natural variability, calendar effects, survivorship bias, new definitions, sampling artifacts, and the ever-present human urge to find patterns. The danger is not only false positives. It is also false negatives, the missed opportunities that never get a fair test.
The tension never ends because most systems we manage are causal tangles. You cannot untangle them fully. What you can do is build habits that shrink the tangle enough to act with confidence.
The spirit of (un)Common Logic
The framework is less a rigid method and more a posture. It insists on clarity about what would change your mind. It favors cheap learning over elaborate certainty. It remembers the asymmetry between actions and observations. Most of all, it makes room for contradictory evidence without freezing.
Here is the short version we use when onboarding new analysts and marketers:
- Start with a decision, not a dashboard. What choice is at stake, who owns it, and what alternative will you take if the data say no?
- Write down the effect size you would need to see before you care. Put a number on “material.” If it is less than that, you will not chase it.
- Design for disconfirmation. Before you launch, list what result would make you stop or reverse the change.
- Triangulate methods. Prefer two weak, independent measurements over one heroic estimate.
- Instrument early, optimize later. If you cannot measure it, you cannot keep it.
Five lines, easy to nod along to. The hard part is doing them when the CEO asks for a number by 2 p.m., or the campaign needs to go live this week, or procurement just cut your analytics tools budget by a third. Still, this posture changes outcomes. It turns frantic optimization into disciplined learning.
An example from paid media: where the money hides
A growth lead at a mid-market ecommerce brand asked for help with non-brand paid search. Performance had stalled. CPA looked stable, but contribution margin on first order was barely positive after rising shipping costs. The team had tried more negatives, tighter geos, and fresh creative, but nothing moved the needle.
We started with a question that sounds obvious and rarely gets answered: what would make you pause spend on a segment you like? After some back and forth, we agreed on this definition of materiality: a 15 percent improvement in contribution margin per click sustained over two weeks, or an equivalent lift in high-LTV cohort share within 30 days of acquisition.
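Writing the threshold down also tells you how much traffic you need before an answer means anything. Here is a minimal sketch of that arithmetic using the standard two-sample approximation; the $0.60 baseline margin per click and the $4 spread are illustrative stand-ins, not the client's figures.

```python
from math import ceil
from statistics import NormalDist

def clicks_per_arm(baseline_margin, sd, lift, alpha=0.05, power=0.8):
    """Clicks per arm needed to detect an absolute lift in mean margin
    per click, using the usual two-sample normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = baseline_margin * lift  # smallest effect worth caring about
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Illustrative numbers only: $0.60 mean margin per click, $4 standard
# deviation (margin per click is noisy), 15 percent materiality threshold.
print(clicks_per_arm(baseline_margin=0.60, sd=4.0, lift=0.15))
```

If the answer is more clicks than the segment sees in a quarter, that is an argument for a bolder change or a faster proxy, not a longer wait.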
Once we wrote it down, design choices became clearer. Instead of single-silo tinkering, we ran a layered test across match types and query intent categories, pairing bid adjustments with on-site merchandising changes. The bet was that some queries were picking up shoppers who needed different value cues post-click. Without changing the page experience, bid shifts were just shuffling chairs.
Two tactics mattered:
- We assigned queries to intent buckets using a lightweight classifier with human-in-the-loop review for the top 5 percent of spend. Automated text features got us 70 percent of the way. Manual sweeps cleaned the rest where it mattered (a sketch of this step follows the list).
- We instrumented a simple in-session intent proxy, using clickstream patterns on the first three page interactions. This gave us a leading indicator that correlated 0.42 with 30-day LTV in historic cohorts. Not perfect, but it was available within hours of click, not weeks.
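For the first tactic, here is a minimal sketch of the bucketing step. The labels, example queries, and model choice are placeholders rather than the client's pipeline; the point is that simple text features plus a human pass on high-spend terms is enough to start.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few hand-labeled queries seed the model; the real set was larger,
# and the top 5 percent of spend still got a human review pass.
queries = [
    "cheap running shoes free shipping",   # price-sensitive
    "best trail running shoes this year",  # research
    "brandx pegasus size 10 buy",          # high intent
    "running shoes under 50 dollars",      # price-sensitive
    "how to choose running shoes",         # research
    "brandx outlet coupon code",           # price-sensitive
]
labels = ["price", "research", "high_intent", "price", "research", "price"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
clf.fit(queries, labels)

# New search terms get a bucket plus a confidence; low-confidence,
# high-spend terms go to the manual sweep instead of auto-assignment.
probs = clf.predict_proba(["waterproof running shoes discount"])
print(dict(zip(clf.classes_, probs[0].round(2))))
```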
Within nine days, a cluster of mid-intent, price-sensitive queries showed a 17 to 21 percent margin lift when paired with a free-returns banner and a reranked category page that pulled mid-tier SKUs above the fold. High-intent exact matches barely responded to the merchandising changes but benefited from slightly looser bid caps due to their steadier LTV.

The test did not deliver a home run across the account. It delivered a modest, defendable gain where we could repeat it. We pruned six segments where variance drowned any effect. The team shifted budget from those to the winning combination. Sixty days later, blended first-order contribution margin was up 7 percent, and return rates had not spiked. That was signal we could use.

Cleaning the lens: definitions and data hygiene
Before clever modeling, make friends with definitions. I have lost count of teams attributing miracles to campaigns that quietly redefined “active user” or “lead qualified.” A single change to event deduplication can move conversion rate five to ten percent with no behavior change in the market. If you do not version your definitions, you cannot trust your trends.
A short audit, repeated quarterly, pays for itself:
- List the top 10 metrics that drive decisions and annotate each with its source of truth, data freshness, and known caveats.
- Track changes to metric definitions in a changelog. Give each change an ID and link it to code commits or vendor notes.
- Keep a frozen extract for critical periods, for example the week of a major launch. Future-you will want to rerun analyses against the original data.
You will notice I cheated and wrote another list here. Consider it the scaffolding you remove once habits stick. In day-to-day narratives and dashboards, replace bullets with context, examples, and reasons.
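When the changelog habit does stick, the entries themselves can stay tiny. A minimal sketch of one entry shape; the field names and the example values are illustrative, not a standard.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MetricChange:
    change_id: str       # stable ID you can cite in analyses
    metric: str          # e.g. "conversion_rate"
    effective: date      # when dashboards began using the new definition
    summary: str         # what changed, in one sentence
    source: str          # commit hash, ticket, or vendor release note
    expected_shift: str  # rough size and direction of the purely definitional move

# An illustrative entry, not a real change.
log = [
    MetricChange(
        change_id="MC-0042",
        metric="conversion_rate",
        effective=date(2024, 3, 11),
        summary="Event deduplication window widened from 30 to 60 minutes.",
        source="vendor release notes, March 2024",
        expected_shift="roughly -4% with no change in market behavior",
    ),
]
```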
The danger of small denominators
Most false positives I see come from small denominators dressed up in percentages. A team might celebrate a 40 percent lift in a subsegment with 181 visitors and 9 conversions versus 6 the week before. The absolute difference is three conversions. Random luck produces that swing with embarrassing frequency.
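A quick simulation makes the point concrete. The sketch below assumes the subsegment's true conversion rate never changed, using the pooled rate from those two weeks as a rough baseline.

```python
import numpy as np

rng = np.random.default_rng(7)
visitors, true_rate, trials = 181, 15 / 362, 100_000  # pooled rate from the two weeks

# Simulate pairs of identical weeks where nothing actually changed.
last_week = rng.binomial(visitors, true_rate, size=trials)
this_week = rng.binomial(visitors, true_rate, size=trials)

share = np.mean(this_week - last_week >= 3)  # at least the observed jump
print(f"Chance of a 3+ conversion jump with no real change: {share:.1%}")
```

If chance alone produces the jump a meaningful share of the time, the celebration is premature.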
If you cannot gather enough volume in a reasonable time, switch to a metric that accumulates faster. For example, if you are testing an onboarding flow with low daily signups, instrument micro-behaviors that correlate with activation. Use a historical mapping to estimate how a change in the micro-behavior translates to the primary metric, and show the uncertainty. That is not hand-waving if you disclose the link rates and error bands. It is an early look that guides whether to keep the test running or to pivot the design.
In one B2B SaaS onboarding project, activation rate took 21 to 35 days to reveal itself. By correlating specific setup actions in week one with later activation, we used a composite early indicator that gave us a directional read within 72 hours. The composite weightings came from 18 months of cohort data and were updated monthly. When a test moved the indicator by 9 to 12 percent, activation later followed by 6 to 8 percent on average. We never treated the proxy as a final verdict, but it spared us from wasting a month on bad ideas.
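The mechanics of that translation can stay simple. Here is a minimal sketch with synthetic cohort history standing in for the 18 months of real data; the numbers are invented to show the shape of the calculation, not to reproduce the project's figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort history: the early composite's relative lift versus
# the activation lift that eventually followed for each past test.
proxy_lift = rng.uniform(-0.10, 0.15, size=60)
activation_lift = 0.65 * proxy_lift + rng.normal(0, 0.02, size=60)

# Fit the historical link and keep its residual spread for error bands.
slope, intercept = np.polyfit(proxy_lift, activation_lift, 1)
residual_sd = np.std(activation_lift - (slope * proxy_lift + intercept))

# A new test moved the composite by 10 percent; report a range, not a point.
observed = 0.10
estimate = slope * observed + intercept
print(f"expected activation lift: {estimate:+.1%} (+/- {2 * residual_sd:.1%} band)")
```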
Triangulation beats heroics
No single method is universally best. Randomized experiments are gold when you can run them cleanly. When you cannot, you borrow from economics and epidemiology: difference-in-differences, instrumental variables, synthetic controls, or regression discontinuity. Each carries assumptions that can break.
Triangulation is the guardrail. If a marketing change looks promising in an A/B test but does not show up in channel-mix models, dig for reasons. Maybe your test bled due to cross-exposure, or your model smoothed peaks. In retail, price elasticity can shift with competitor behavior faster than your data can learn. In marketplaces, supply constraints can nullify a demand lift. Put the methods in dialogue, not in competition. You are not voting. You are asking whether the same story explains different slices of reality.
I like to keep three types of checks:
- A leading indicator, often noisy but fast.
- A primary outcome that carries the business case.
- A long-term health metric that might capture side effects, for example churn, support tickets, or margin erosion.
If a tactic hits the primary but hurts the health metric, that is not an automatic veto. It is a prompt to redesign, for example by adding guardrails or carving segments.
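One way to keep all three checks in front of the budget decision is to score every tactic against them together before anything scales. A minimal sketch, with made-up thresholds:

```python
def review(primary_lift, health_change, health_floor=-0.02):
    """primary_lift: relative lift on the business-case metric.
    health_change: relative move in the long-term guardrail (negative = worse).
    Thresholds here are illustrative, not a standard."""
    if primary_lift <= 0:
        return "stop: no primary effect"
    if health_change < health_floor:
        return "redesign: primary works but the guardrail is slipping"
    return "scale: primary lift with the guardrail intact"

print(review(primary_lift=0.07, health_change=-0.005))  # scale
print(review(primary_lift=0.07, health_change=-0.040))  # redesign, not an automatic veto
```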
When measurement changes behavior
Systems respond to being measured, sometimes in perverse ways. Sales teams under quarterly quotas pull revenue forward. Support teams rated by resolution time close tickets prematurely. Marketing teams with last-click targets flood branded search or retargeting and call it growth.
This is not a morality play. People optimize against the score you give them. The fix is to make the score harder to game and closer to value creation.
A consumer subscription company I worked with paid acquisition teams on trial starts. Reasonable at first glance, until trials became almost free to start and expensive to cancel. Support costs rose, NPS fell, and credit card disputes tripled. We moved compensation to a blended metric: 45 percent weight on paid conversions within 28 days, 35 percent on six-month retention of those cohorts, and 20 percent on a support load index. Fixing the incentive aligned behavior with durable growth. The teams did not like the change for two quarters. Then their bonuses became more predictable.
Guardrail metrics can feel like drag. They are insurance. If your revenue team can increase bookings by 10 percent this quarter while quietly increasing churn risk by 12 percent next year, you are not creating value. You are borrowing it and paying interest later.
Seasonality, stationarity, and shifting baselines
Not all variance is noise. Some patterns are seasonal or regime-specific. Retailers know the December curve by heart. B2B demand has its own cadence around budgeting cycles. Algorithms drift as competitors deploy changes. Your own pricing strategy or shipping times can alter customer behavior in ways your legacy models never saw.
Build your baselines with these realities in mind:
- Use rolling baselines that adapt to recent data while respecting known seasonal cycles (a sketch follows this list).
- For segments with sparse data, borrow strength from adjacent segments using hierarchical models or partial pooling. Resist the urge to overfit.
- Keep an eye on distribution shape, not just mean. If the tail risk grows, your averages might look stable while your worst days get worse.
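Here is a minimal sketch of the first idea, a weekday-aware rolling baseline with a variance band. The eight-week window, the four-observation minimum, and the 1.5x IQR flag are illustrative choices, not a recipe.

```python
import pandas as pd

def weekday_baseline(daily: pd.Series, weeks: int = 8) -> pd.DataFrame:
    """daily: a metric indexed by calendar date (DatetimeIndex).
    The baseline for each day is the median of the same weekday over the
    prior `weeks`; the band is that window's interquartile range."""
    by_weekday = daily.groupby(daily.index.dayofweek)
    roll = lambda s, q: s.shift(1).rolling(weeks, min_periods=4).quantile(q)
    baseline = by_weekday.transform(lambda s: roll(s, 0.50))
    iqr = by_weekday.transform(lambda s: roll(s, 0.75) - roll(s, 0.25))

    out = daily.to_frame("actual")
    out["baseline"] = baseline
    out["outside_band"] = (out["actual"] - out["baseline"]).abs() > 1.5 * iqr
    return out

# Usage: weekday_baseline(bookings_by_day) on a daily Series; rows flagged
# outside the band deserve a look, not a press release.
```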
One travel client saw average daily bookings stable year over year, but the variance had doubled. Marketing kept spending to hit average targets. Cash operations were whipsawing wildly. The fix was to redesign spend pacing rules around variance bands, not point targets. We accepted slightly lower average bookings in exchange for a narrower distribution that made revenue predictability and staffing much healthier. That was a trade worth taking.
Decision hygiene: rituals that scale judgment
You cannot mandate better thinking with a slide deck. You need rituals that make good habits cheaper than bad ones.
I recommend three lightweight practices:
- Pre-mortems for major bets. Before launch, have the team write short narratives of how the project failed and what evidence would show up early. Capture the mitigations in the launch plan.
- Decision logs. When you make a significant call, record the alternatives considered, the evidence threshold, the owner, and the next review date. Keep it short, a paragraph or two. Six months later you will remember why you did what you did.
- Red team by rotation. Assign a small group to argue the opposing case for a big initiative, with access to the same data. Rotate the duty so it is a skill everyone learns.
These rituals slow you down a little up front and speed you up a lot over time. They also create memory in organizations where people move roles fast.
Metrics that do not betray you
North Star metrics are useful if they resist gaming and correlate with enterprise value. They fail when they become idols. I have seen teams worship active users while ignoring margin, or celebrate net-new logos while ignoring pipeline quality.
A good North Star is anchored to durable value and is surrounded by honest companions. For a marketplace, it might be completed transactions weighted by take rate, paired with health metrics on supply liquidity and cancellation time. For a subscription app, it might be weekly engaged subscribers weighted by plan tier, paired with 90-day retention and support load.
Composite indices tempt teams because they promise simplification. Use them sparingly. If you must have one, publish the recipe and its sensitivities. Show how a five percent change in any component moves the composite. Otherwise you end up arguing about the index instead of the business.
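Publishing the sensitivities can be as mechanical as perturbing each component and reporting the move. A minimal sketch; the weights and components are illustrative, not a recommended recipe.

```python
# Illustrative composite: weights and components are made up.
weights = {"engaged_subscribers": 0.5, "retention_90d": 0.3, "support_load": -0.2}
current = {"engaged_subscribers": 120_000, "retention_90d": 0.62, "support_load": 0.9}

def composite(values):
    # Normalize each component to its current level so only relative moves matter.
    return sum(w * values[k] / current[k] for k, w in weights.items())

base = composite(current)
for k in weights:
    bumped = dict(current, **{k: current[k] * 1.05})
    print(f"{k}: a 5% increase moves the composite by {composite(bumped) - base:+.3f}")
```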
A compact field guide for separating signal from noise
Here is a simple checklist I keep on a sticky note near my screen. It is not exhaustive, but it keeps me honest when I am deep in the weeds.
- What decision will this inform, and what alternative will I take if the effect is not there?
- What is the smallest effect size that matters economically, and how much data do I need to detect it with tolerable risk?
- What could make this result go away if I sliced it differently or measured it a different way?
- What might be a side effect, and how will I see it early if it shows up?
- What would I predict ahead of time, and what would change my mind?
Five questions, thirty seconds to read, hours of grief avoided.
Edge cases and hard problems
Some situations do not yield easily. Algorithmic feedback loops can obscure causality. For example, a recommendation system that boosts popular items makes them more popular, which the system reads as further validation. Breaking the loop requires exogenous variation, for instance holding out a random slice of users from updated algorithms and comparing their outcomes with careful monitoring to avoid long-term harm.
Delayed effects complicate interpretation. Brand advertising can lift direct response months later. Price cuts can steal pipeline from next quarter. When effects lag, short-run optimizations can punish long-run outcomes. The countermeasure is to include at least one long-horizon read in your evaluation plan and to set expectations with stakeholders that some investments will look flat for a while by design.
Multi-causality is the rule in complex funnels. If you replace a landing page, adjust bids, and switch email cadence, your attribution story will be fuzzy. Resist the urge to squeeze certainty from the model. Instead, bound the plausible contributions. Use bracketing: a lower bound if the tactic did none of the lift, an upper bound if it did all of it, and a midrange based on triangulated evidence. Decisions can proceed on ranges if you are strict about costs and reversible steps.
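A worked version of the bracketing idea, with placeholder numbers; the 40 percent midrange share is exactly the kind of judgment call that belongs in the decision log.

```python
# Bracket a tactic's contribution instead of pretending to know it exactly.
# All figures are placeholders for illustration.
total_lift = 120_000   # incremental contribution margin observed this quarter
tactic_cost = 35_000   # what the tactic cost to run

bounds = {
    "lower (tactic drove none of it)": 0.0,
    "upper (tactic drove all of it)": float(total_lift),
    "midrange (triangulated 40% share)": 0.4 * total_lift,
}

for label, contribution in bounds.items():
    roi = (contribution - tactic_cost) / tactic_cost
    print(f"{label}: ROI {roi:+.0%}")

# If the bet only pencils out at the upper bound, it is not a robust bet.
```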
Non-stationarity will make a fool of your past. I once saw a demand model trained on three stable years crumble in a quarter when a competitor launched free shipping with no minimum. The model was fine. The world changed. Put alarms on your model residuals. When the error structure shifts, either re-estimate quickly or switch to simpler rules until you have new data.
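The alarm does not need to be elaborate. Here is a minimal sketch that compares recent forecast errors with the error structure the model was validated on; the window sizes and thresholds are illustrative.

```python
import numpy as np

def residual_alarm(residuals, baseline_window=90, recent_window=14,
                   shift_z=3.0, max_spread_ratio=2.0):
    """residuals: array of (actual - predicted), oldest first.
    Flags a drift in the error's center or a blow-up in its spread."""
    base = np.asarray(residuals[:baseline_window], dtype=float)
    recent = np.asarray(residuals[-recent_window:], dtype=float)

    drift_z = abs(recent.mean() - base.mean()) / (base.std(ddof=1) / np.sqrt(len(recent)))
    spread_ratio = recent.std(ddof=1) / base.std(ddof=1)

    return {
        "mean_shift_z": round(float(drift_z), 2),
        "spread_ratio": round(float(spread_ratio), 2),
        "alarm": bool(drift_z > shift_z or spread_ratio > max_spread_ratio),
    }
```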
Culture eats analytics
The best math breaks under bad incentives. If leaders punish uncertainty, teams will overstate confidence. If teams are rewarded only for wins, they will hide failed tests. Healthy cultures treat negative results as assets. They fund measurement the same way they fund creative. They accept that time spent on clean data and versioned definitions is part of making money, not a side project.
Culture shows up in small choices. Does the weekly review celebrate learning or only outcomes? Do teams get credit for stopping a doomed initiative early? Does finance partner with marketing on agreed-upon methods or fight about attribution every quarter? If you want signal, build trust that the truth gets rewarded.
Bringing it together
Finding signal in noise is not a single technique. It is a stack of practices that reinforce one another: sharp decisions, clear definitions, honest baselines, triangulated methods, aligned incentives, and simple rituals that scale judgment. The name we use, (un)Common Logic, is a reminder to take the extra step that most teams skip. Write the effect size that matters. Decide what would change your mind. Measure what might break. Triangulate rather than declare victory from one chart.
No framework will spare you from the grind. Real systems are messy. Data is partial. People have deadlines and P&L targets. But the grind feels different when it compounds. Each careful test, each tidy changelog entry, each pre-mortem, and each decision log is another brick in a wall that keeps the noise out. Over time, you spend less energy defending your numbers and more energy using them.
The day you catch yourself saying, “We do not know yet, but here is the smallest bet worth placing, the signals we will watch, and the date we will decide,” that is the day the noise starts losing.