How We Score
Review Methodology
Testing process
Every app is set up from scratch with real bank accounts, real transactions, and real budgeting needs. We do not use demo accounts or synthetic data. Testing runs for a minimum of 30 days per app.
The five scoring dimensions
1. Bank sync reliability (25%)
We track connection success rates across multiple account types (checking, savings, credit card, investment) over the trial period. Key metrics: connection success rate, time to first sync after connection, and, critically, whether the app notifies you when a connection breaks.
Industry context: Plaid (used by YNAB, Monarch, Copilot) connects to 12,000+ institutions at a 94% success rate. Treating that as a per-account monthly figure and assuming failures are independent, the monthly probability of at least one silent failure across 4 accounts is 1 - 0.94^4, or approximately 22%. Tiller uses a different aggregator at 98.4% reliability and actively notifies you when a connection breaks.
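The multi-account figure can be reproduced with a short calculation. This is a minimal sketch that assumes the published success rate applies per account per month and that accounts fail independently. Neither assumption is confirmed by the aggregators, and the function name is ours.

```python
def prob_at_least_one_failure(success_rate: float, accounts: int) -> float:
    """Chance that at least one of `accounts` connections fails in a month.

    Assumes each account succeeds independently with probability `success_rate`.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if accounts < 0:
        raise ValueError("accounts must be non-negative")
    # P(all succeed) = success_rate ** accounts, so take the complement.
    return 1.0 - success_rate ** accounts


# Four accounts at a 94% per-account success rate.
print(f"{prob_at_least_one_failure(0.94, 4):.1%}")   # 21.9%
```

The same function shows why the risk grows with each linked account: every extra connection multiplies the chance that all of them stay healthy by another factor of 0.94.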
2. Core budgeting functionality (25%)
Whether the app supports the budgeting method it claims to support, and how well it enforces or facilitates that method. Zero-based apps are scored on how faithfully they implement zero-based budgeting (ZBB). Tracking apps are scored on categorisation accuracy and spending insight depth.
3. Onboarding and ease of use (20%)
Time from download to first useful budget view. Learning curve steepness. Quality of in-app guidance. Scored relative to the app's target user profile: YNAB is expected to have a steeper curve than Rocket Money.
4. Data portability (15%)
Can you get your data out? We assess CSV export quality (column structure, date formatting, completeness), whether exports include categories and notes, and whether another app can import the format. This is our original-research differentiator: most competitors do not score portability at all.
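The export checks described here could be automated along the following lines. This is an illustrative sketch, not our actual test harness: the required column names and the ISO 8601 date format are assumptions chosen for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical column set; real exports name their columns differently.
REQUIRED_COLUMNS = {"date", "payee", "amount", "category", "notes"}


def check_export(csv_text: str) -> dict:
    """Report missing columns and rows whose date is not YYYY-MM-DD."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = {h.strip().lower() for h in (reader.fieldnames or [])}
    rows = 0
    bad_dates = 0
    for row in reader:
        rows += 1
        # Normalise keys so "Date" and "date" are treated alike.
        normalised = {k.strip().lower(): (v or "") for k, v in row.items() if k}
        try:
            datetime.strptime(normalised.get("date", "").strip(), "%Y-%m-%d")
        except ValueError:
            bad_dates += 1
    return {
        "missing_columns": sorted(REQUIRED_COLUMNS - headers),
        "rows": rows,
        "bad_dates": bad_dates,
    }


sample = (
    "Date,Payee,Amount,Category\n"
    "2024-01-05,Grocer,-42.10,Food\n"
    "01/06/2024,Cafe,-4.50,Dining\n"
)
print(check_export(sample))
```

On the sample above, the report flags the missing notes column and one row with a non-ISO date, which are the kinds of gaps that make an export hard to import elsewhere.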
5. Value for price (15%)
Features delivered relative to price, compared to the best alternatives at the same price point. Free apps are scored against other free options. Paid apps are scored against the full paid field.
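To show how the five weights could combine into a single number, here is a minimal sketch. It assumes each dimension is scored from 0 to 10 and that the overall score is a plain weighted average rounded to one decimal. The dictionary keys and the sample sub-scores are invented for illustration.

```python
# Weights from the five scoring dimensions; the key names are ours.
WEIGHTS = {
    "bank_sync": 0.25,
    "core_budgeting": 0.25,
    "onboarding": 0.20,
    "portability": 0.15,
    "value": 0.15,
}


def overall_score(sub_scores: dict) -> float:
    """Weighted average of 0-10 sub-scores, rounded to one decimal place."""
    if set(sub_scores) != set(WEIGHTS):
        raise ValueError("sub_scores must cover exactly the five dimensions")
    for name, score in sub_scores.items():
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"{name} score must be between 0 and 10")
    total = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    return round(total, 1)


# Made-up sub-scores for a hypothetical app.
example = {
    "bank_sync": 8.0,
    "core_budgeting": 9.0,
    "onboarding": 7.0,
    "portability": 6.0,
    "value": 9.0,
}
print(overall_score(example))   # 7.9
```

Because the weights sum to 100%, an app that scores the same on every dimension gets that same number as its overall score.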
Score interpretation
- 9.0-10.0: best in class at its primary use case. Strong recommendation.
- 8.0-8.9: excellent for its target user. Minor gaps only.
- 7.0-7.9: good with notable limitations. Pick with eyes open.
- 6.0-6.9: acceptable but alternatives are usually better.
- Below 6.0: significant problems. Not recommended for most users.
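The bands above amount to a simple threshold lookup. The sketch below assumes scores are reported to one decimal place on a 0 to 10 scale; the function name and the shortened labels are ours.

```python
# (lower bound, label) pairs, checked from the highest band down.
BANDS = [
    (9.0, "best in class"),
    (8.0, "excellent for its target user"),
    (7.0, "good with notable limitations"),
    (6.0, "acceptable"),
]


def interpret(score: float) -> str:
    """Return the interpretation band for an overall score."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "not recommended for most users"


print(interpret(7.9))   # good with notable limitations
```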