Methodology

How we test apps

This page exists because we get asked for it. Here is exactly what happens between "we want to write a list of best X apps" and the article going live.

Step 1 — The longlist

For every category we start by assembling a longlist of 15 to 40 candidates. Sources include the App Store top charts, Reddit threads in adjacent communities, comments left on prior reviews, and our own years of accumulated notes. We do not source from affiliate program directories. (We don't know what any of the affiliate payouts are. We don't have logins to any of those networks.)

Step 2 — The download phase

We download every app on the longlist. We sign up for free tiers with our actual primary accounts (so we get the same recommendation engines, ad targeting, and notifications that a real user would get). For paid-only apps we either buy a month at retail or wait for a launch promo. We do not request review units, "extended trials," or "press tier" access from developers, ever.

Step 3 — Daily use

The minimum testing window is two weeks of daily use. For categories where we already have years of context (calorie tracking, habit tracking, note-taking, meditation), we draw on continuous multi-year use across multiple iPhones. The "we tested for three weeks" line in the byline is the floor, not the ceiling. We do not write reviews from a single afternoon spent poking around.

Step 4 — The cuts

From the longlist we cut to roughly ten finalists, and from the finalists to a final five. Cuts are decided by a simple question: would we still be using this in three months? Apps that look great in a five-minute demo but become tedious by week two get dropped. Apps with broken sync, ads in paid tiers, or aggressive paywall escalations get dropped immediately.

Step 5 — Accuracy and consistency cross-checks

Where the category has a verifiable accuracy dimension (calorie tracking has weighed-food cross-checks; sleep tracking has Apple Watch ground truth; running tracking has a known-distance loop), we run those checks. We don't claim accuracy that we haven't measured ourselves or that hasn't been independently published in a peer-reviewed or preprint validation study. Marketing claims are not accuracy claims.
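
To make the known-distance check concrete, here is a minimal sketch of the arithmetic; the 5 km loop, the app names, and the recorded distances are all hypothetical placeholders, not measurements from our testing.

```python
# Hypothetical known-distance cross-check. GROUND_TRUTH_KM stands in
# for a loop whose length has been independently measured; the
# recorded distances below are invented for illustration.
GROUND_TRUTH_KM = 5.00

recorded_km = {
    "App A": 4.91,
    "App B": 5.12,
    "App C": 5.02,
}

for app, distance in recorded_km.items():
    error_pct = (distance - GROUND_TRUTH_KM) / GROUND_TRUTH_KM * 100
    print(f"{app}: {distance:.2f} km recorded ({error_pct:+.1f}% vs ground truth)")
```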

Step 6 — Pricing and value reality check

For each finalist we map out the actual cost over a year for a typical user — not the headline price, but the realistic total once you cross paywalls for the features that matter. A "free" app that paywalls every useful feature ranks lower than a $30/year app that includes everything in the box.
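
As a sketch of what that mapping looks like in practice, the figures below are invented, not prices from any real app; the point is only the shape of the comparison.

```python
# Hypothetical first-year cost comparison. Every number here is made
# up for illustration: total what a typical user actually pays once
# the features that matter are unlocked, not the headline price.
def realistic_annual_cost(headline_per_year, needed_unlocks):
    # headline_per_year: the advertised subscription price.
    # needed_unlocks: fees for the paywalled features a typical user needs.
    return headline_per_year + sum(needed_unlocks)

free_but_gated = realistic_annual_cost(0.00, [9.99, 19.99, 14.99])
flat_thirty = realistic_annual_cost(30.00, [])

print(f'"Free" app, realistic first year: ${free_but_gated:.2f}')
print(f"$30/year app, everything included: ${flat_thirty:.2f}")
```

On those invented numbers, the "free" app costs more over a year than the flat-priced one, which is exactly the pattern the ranking penalizes.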

Step 7 — Drafting

Drafts are written by the named reviewer on the byline. There is no "content team" stitching pieces together. There is no AI generation. Edits are usually limited to a second-pair-of-eyes pass from the other reviewer for clarity and accuracy.

Step 8 — Updates

We re-test every list at least once a year. Our default refresh cadence is twelve months, but we will republish sooner if there's a major change (a flagship app drops a feature, raises prices significantly, or an accuracy or privacy problem comes to light). The "Last updated" line in the byline is the actual date of the most recent re-test, not a programmatic timestamp.

What we don't do

In short: we don't source from affiliate directories, we don't request review units or press access, we don't review from a single afternoon of poking around, we don't use a content team or AI generation, and we don't fake the "Last updated" date.

See also: Editorial independence policy.