Scoring methodology¶
The three composite scores are the paywall anchor — the un-Googleable metrics that buyers pay CHF 19 to see in full. This page explains how they're computed, why each weight was chosen, and the specific decisions that came out of beta testing.
Implementation lives in a single file: code/scores.py.
The three scores¶
1. Daily Life Score¶
How easy is everyday life here?
| Component | Weight | Sub-metrics |
|---|---|---|
| Transit access | 30% | TPG distance, TPG stop count |
| Daily errands | 25% | Supermarket, bakery, pharmacy, post office distances |
| Noise & environment | 20% | Day noise, air quality |
| Health access | 15% | Health vibrancy (doctors, clinics, pharmacies) |
| Dining & leisure | 10% | Cultural vibrancy, service entropy (diversity) |
2. Family Score¶
How good is this for raising kids?
| Component | Weight | Sub-metrics |
|---|---|---|
| School access | 35% | Primary, cycle (middle), college (high) distances |
| Safety & quiet | 20% | Night noise, day noise |
| Child amenities | 20% | Nearest playground, playground count within 800m |
| Green space | 15% | Nearest park |
| Family services | 10% | Pharmacy, supermarket |
Note: Daycares (crèches) are NOT in this score — they're informational only (nearest + count within 1.5km). See "Decisions" below.
3. Smart Living Score¶
What are the hidden advantages of this location?
| Component | Weight | Sub-metrics |
|---|---|---|
| Tax efficiency | 30% | pct_tax (inverted centimes additionnels) |
| Transit connectivity | 25% | Transit travel time to Cornavin, airport, CERN, UN |
| Environmental quality | 20% | Air quality, day noise |
| Amenity density | 15% | Composite vibrancy, category diversity (floor 50) |
| Development context | 10% | Economic vibrancy, service mix balance (floor 50) |
The floor of 50 is important — see "The Cologny fix" below.
The algorithm (simplified)¶
# 1. Compute percentile rank for every raw KPI across all 17K cells
# - Lower = better for distances/noise/tax → INVERT the percentile
# - Higher = better for counts/vibrancy → normal rank
df['pct_tpg_dist_1'] = invert_percentile(df['tpg_dist_1'])
df['pct_tax'] = invert_percentile(df['centimes_additionnels'])
# ... etc
# 2. Fill NaN noise/air with 50 (neutral) for cells outside raster extent
# so a missing raster doesn't crash the whole score
# 3. Compute each sub-component as the MEAN of its percentile columns
transit = mean(pct_tpg_dist_1, pct_tpg_count)
errands = mean(pct_supermarket, pct_bakery, pct_pharmacy, pct_post_office)
# ... etc
# 4. Apply floors to amenity_density and dev_context (Smart Living only)
amenity_density = amenity_density_raw.clip(lower=50)
dev_context = dev_context_raw.clip(lower=50)
# 5. Weighted sum using the weights above
daily_life_raw = transit*0.30 + errands*0.25 + noise_env*0.20 + ...
# 6. RE-PERCENTILE the raw weighted average across the canton
# This gives a proper 0-100 distribution (the raw weighted averages
# compress toward the center because no single cell excels on
# EVERY sub-component).
df['daily_life_score'] = percentile_rank(daily_life_raw)
# 7. Convert to letter grades
# A: ≥85, B: ≥70, C: ≥50, D: ≥30, F: <30
df['daily_life_grade'] = df['daily_life_score'].apply(assign_grade)
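Steps 1 to 6 can be sketched end to end on toy data. This is a minimal sketch, not the contents of code/scores.py: the bodies of `percentile_rank` and `invert_percentile` are assumptions built on pandas `rank(pct=True)`, the four cells and the `noise_day` column are made up, and only two components are combined (with their weights renormalised) to keep it short.

```python
import numpy as np
import pandas as pd

def percentile_rank(s: pd.Series) -> pd.Series:
    """Higher raw value = higher percentile (assumed implementation)."""
    return s.rank(pct=True) * 100

def invert_percentile(s: pd.Series) -> pd.Series:
    """Lower raw value = higher percentile (assumed implementation)."""
    return s.rank(pct=True, ascending=False) * 100

# Toy data: four made-up cells, not real pipeline output
df = pd.DataFrame({
    "tpg_dist_1": [120.0, 450.0, 900.0, 60.0],  # metres to nearest stop
    "tpg_count": [6, 2, 1, 9],                  # stops nearby
    "noise_day": [55.0, np.nan, 48.0, 62.0],    # NaN = outside raster extent
})

# Steps 1-2: percentiles, with NaN noise filled with the neutral value 50
df["pct_tpg_dist_1"] = invert_percentile(df["tpg_dist_1"])
df["pct_tpg_count"] = percentile_rank(df["tpg_count"])
df["pct_noise_day"] = invert_percentile(df["noise_day"]).fillna(50)

# Step 3: each sub-component is the mean of its percentile columns
transit = df[["pct_tpg_dist_1", "pct_tpg_count"]].mean(axis=1)
noise_env = df["pct_noise_day"]

# Step 5: weighted sum (only two components here, so divide by their total weight)
raw = (transit * 0.30 + noise_env * 0.20) / 0.50

# Step 6: re-percentile the raw weighted average
df["toy_score"] = percentile_rank(raw)
print(df[["pct_tpg_dist_1", "pct_tpg_count", "pct_noise_day", "toy_score"]])
```

Note that the cell with the missing noise value still gets a score, which is the point of the neutral fill in step 2.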
Why re-percentile at the end?¶
A cell that's 85th percentile on every single sub-component still has a raw weighted average of 85. But no cell actually achieves that — real neighborhoods trade strengths for weaknesses. In practice, the raw weighted averages bunch between 40 and 70. Re-percentiling gives users a meaningful grade distribution.
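The compression is easy to reproduce with synthetic numbers. The sketch below assumes five independent, uniformly distributed sub-component percentiles per cell, which is cruder than reality (real sub-components are correlated, so the real spread differs), and uses the Daily Life weights.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cells = 17_000
weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

# Assumption: independent uniform percentiles, one column per sub-component
sub = rng.uniform(0, 100, size=(n_cells, 5))

raw = pd.Series(sub @ weights)      # raw weighted average per cell
final = raw.rank(pct=True) * 100    # re-percentiled score

share_raw_a = (raw >= 85).mean()    # share of cells that would get an A on raw values
share_final_a = (final >= 85).mean()

print(f"raw std:   {raw.std():.1f}, share >= 85: {share_raw_a:.3%}")
print(f"final std: {final.std():.1f}, share >= 85: {share_final_a:.3%}")
```

On the raw averages almost no cell clears the A threshold; after re-ranking, roughly 15% do, which is what the grade table promises.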
Grade thresholds¶
| Grade | Percentile | Meaning |
|---|---|---|
| A | ≥85 | Top 15% of the canton |
| B | ≥70 | Top 30% |
| C | ≥50 | Above median |
| D | ≥30 | Below median |
| F | <30 | Bottom 30% |
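The thresholds translate directly into a lookup. The function name `assign_grade` comes from the algorithm above; the body is a sketch of one way to write it, not necessarily what code/scores.py does.

```python
# (minimum score, grade), checked from the top down
GRADE_THRESHOLDS = [(85, "A"), (70, "B"), (50, "C"), (30, "D")]

def assign_grade(score: float) -> str:
    """Map a 0-100 percentile score to a letter grade."""
    for minimum, grade in GRADE_THRESHOLDS:
        if score >= minimum:
            return grade
    return "F"

for s in (91, 85, 84.9, 65, 30, 12):
    print(s, assign_grade(s))
```

Thresholds are inclusive, so a score of exactly 85 is an A and exactly 30 is a D.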
Specific decisions¶
The Cologny fix (2026-04-10)¶
A beta tester complained that Chemin du Sechard 10 in Cologny scored Smart Living 65 (grade C "Good"). Cologny is one of Geneva's most upscale residential communes with the lowest tax rate (25 centimes) and lake views of the Mont-Blanc range. 65 felt wrong.
Root cause: 94 of 257 Cologny cells had pct_composite_vibrancy = 25
(the bottom quartile tie). These are the hillside residential areas with
no nearby businesses by design. The original Smart Living formula
penalized them:
Old Smart Living for a Cologny cell:
Tax (30%): 99 × 0.30 = 29.7 ✓
Transit (25%): 65 × 0.25 = 16.25
Env quality (20%): 40 × 0.20 = 8.0
Amenity density (15%): 25 × 0.15 = 3.75 ← penalty
Dev context (10%): 30 × 0.10 = 3.0 ← penalty
Total weighted: 60.7 → percentile rank ≈ 58
The "hidden advantages" narrative was undermined by double-penalizing the thing (low business density) that defines the neighborhood.
The fix: apply a floor of 50 to amenity_density and dev_context for Smart Living specifically:
amenity_density = amenity_density_raw.clip(lower=50)
dev_context = dev_context_raw.clip(lower=50)
Result:
- Cologny residential cells: SL 58-65 → 71-76 (grade B)
- Chemin du Sechard 10 specifically: 65 → 76
- Cologny median: 88 → 91
- Vandoeuvres: +5, Genthod +4, Bellevue +4 (similar upscale residential)
Trade-off: dense urban areas (Plainpalais, Eaux-Vives, Cité) drop ~10 points on Smart Living. This is philosophically correct — those neighborhoods have visible advantages (transit, vibrancy) but hidden disadvantages (high tax, noise, pollution). They're rewarded in Daily Life instead.
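To see what the floor does to the raw weighted total, the example Cologny cell from the breakdown above can be recomputed with and without it. The dict keys are illustrative names, and the final score still depends on the re-percentile step across all cells, so this only shows the raw total.

```python
WEIGHTS = {
    "tax": 0.30,
    "transit": 0.25,
    "env_quality": 0.20,
    "amenity_density": 0.15,
    "dev_context": 0.10,
}
FLOORED = {"amenity_density", "dev_context"}  # components the floor applies to

# Component values from the "Old Smart Living" breakdown above
cell = {"tax": 99, "transit": 65, "env_quality": 40,
        "amenity_density": 25, "dev_context": 30}

def weighted_total(components, floor=None):
    """Weighted sum, optionally flooring the two density-related components."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if floor is not None and name in FLOORED:
            value = max(value, floor)
        total += value * weight
    return total

old_total = weighted_total(cell)            # 60.7
new_total = weighted_total(cell, floor=50)  # 66.45
print(round(old_total, 2), round(new_total, 2))
```

The floor adds 5.75 raw points to this cell (3.75 from amenity density, 2.0 from development context), and lifts every other low-density cell in the same way.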
Why international schools aren't scored¶
Geneva has a large expat/UN population and international schools are a top decision factor, but they were deliberately NOT added to the Family Score. Two reasons:
- Coverage bias: Only ~8 international schools across the canton. Adding a distance-based KPI would create sharp discontinuities (nearby cells score high, 500m away score low) that don't reflect real decision-making (expats routinely drive 15+ minutes to school).
- Segment mismatch: The Family Score is meant to work for everyone, not just the expat 5-10%. Adding intl schools would either (a) skew the overall score or (b) require a personalization toggle that we don't have infrastructure for yet.
Instead: international schools are shown as informational cards in the Family deep-dive section: nearest school name + distance + count within 3km. The decision is up to the user.
Why daycares aren't scored either¶
Same reasoning. Daycares are shown as info (nearest + count within 1.5km) but not scored, because the availability of a specific crèche depends on opening hours, age group, waiting lists — stuff our data can't capture.
Transit times prefer r5py over Euclidean¶
Smart Living's "transit connectivity" component prefers pct_transit_*
(r5py-computed GTFS travel times) over pct_* (Euclidean distances to
key locations). If r5py is unavailable during a pipeline run, we fall
back to Euclidean. The switch is automatic:
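# Keep only the r5py columns that actually exist in this pipeline run
transit_cols = [f'pct_transit_{loc}' for loc in KEY_LOCATIONS
                if f'pct_transit_{loc}' in df.columns]
# Euclidean fallback, used only when no r5py column is present
euclidean_cols = [f'pct_{loc}' for loc in KEY_LOCATIONS]
connectivity = _safe_mean(df, transit_cols if transit_cols else euclidean_cols)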
The difference matters: Versoix is 15km Euclidean from Cornavin but 20 minutes by transit thanks to the frequent lakeside train. Euclidean gives it a mediocre score; r5py gives it a good one.
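The fallback can be exercised on a toy frame. Everything here is a sketch: the `KEY_LOCATIONS` values are guessed from the table above, and `_safe_mean` is a hypothetical stand-in (row-wise mean over whichever columns exist, neutral 50 if none do), not the real helper from code/scores.py.

```python
import pandas as pd

KEY_LOCATIONS = ["cornavin", "airport", "cern", "un"]  # assumed identifiers

def _safe_mean(df: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical stand-in: mean over the columns that exist, skipping NaN."""
    present = [c for c in cols if c in df.columns]
    if not present:
        # Assumption: fall back to the neutral value when nothing is available
        return pd.Series(50.0, index=df.index)
    return df[present].mean(axis=1)

def connectivity(df: pd.DataFrame) -> pd.Series:
    transit_cols = [f"pct_transit_{loc}" for loc in KEY_LOCATIONS
                    if f"pct_transit_{loc}" in df.columns]
    euclidean_cols = [f"pct_{loc}" for loc in KEY_LOCATIONS]
    return _safe_mean(df, transit_cols if transit_cols else euclidean_cols)

# Two made-up cells: one far but well connected, one close but poorly connected
with_r5py = pd.DataFrame({
    "pct_transit_cornavin": [80.0, 40.0],
    "pct_cornavin": [30.0, 90.0],
})
without_r5py = with_r5py[["pct_cornavin"]]  # simulates a run without r5py

print(connectivity(with_r5py).tolist())     # [80.0, 40.0]
print(connectivity(without_r5py).tolist())  # [30.0, 90.0]
```

One consequence of the list-comprehension check: if r5py produced columns for only some locations, the mean is taken over those alone rather than mixing in Euclidean values for the rest.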
Vibrancy: intensity × diversity¶
Raw vibrancy is a weighted sum of business counts in the walking network (pandana aggregate). But a street with 20 banks shouldn't beat a street with 1 bakery, 1 bookshop, 1 doctor, 1 café. So we multiply by a diversity factor:
Where:
- intensity = weighted sum with linear decay (pandana)
- norm_entropy = Shannon entropy of BRANCHE types within the cell's H3 k-ring (k=5, ~800m), normalized by the 99th percentile
- service_mix_balance = min(economic, cultural, health) / max(...)
This rewards both volume AND variety. See code/kpis.py::compute_vibrancy_kpis.
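The banks-versus-mixed-street example can be worked through with a few lines of Python. This is only an illustration of the idea: plain business counts stand in for the pandana intensity, the reference entropy stands in for the canton's 99th percentile, and the plain product `intensity * norm_entropy` is an assumed reading of "intensity × diversity" that may differ from what compute_vibrancy_kpis actually does.

```python
import math
from collections import Counter

def shannon_entropy(categories) -> float:
    """Shannon entropy (natural log) of a list of category labels."""
    counts = Counter(categories)
    n = sum(counts.values())
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

def norm_entropy(categories, reference: float) -> float:
    """Entropy scaled by a reference value and capped at 1."""
    if reference <= 0:
        return 0.0
    return min(shannon_entropy(categories) / reference, 1.0)

def service_mix_balance(economic: float, cultural: float, health: float) -> float:
    """min / max of the three vibrancy types; 0 when all are zero."""
    hi = max(economic, cultural, health)
    return min(economic, cultural, health) / hi if hi > 0 else 0.0

banks = ["bank"] * 20
mixed = ["bakery", "bookshop", "doctor", "cafe"]
reference = math.log(4)  # assumed stand-in for the 99th-percentile entropy

# Assumed combination: unweighted count as intensity, times normalized entropy
v_banks = len(banks) * norm_entropy(banks, reference)
v_mixed = len(mixed) * norm_entropy(mixed, reference)
print(v_banks, v_mixed)
```

With a single category the entropy is zero, so the 20-bank street scores below the four-business mixed street despite having five times the volume.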
What you see in the UI¶
Free teaser¶
- 3 letter grades (A/B/C/D/F)
- 1 headline per score (generated in code/insights.py)
- 3 fun facts
- Air quality badges (currently free until SPAIR licensing is resolved)
Paid report¶
- 3 full numeric scores with sub-score tooltips on hover (ScoreGauge component)
- Radar charts for each score's sub-dimensions
- "vs canton" percentile bars
- Tax comparison
- School names, supermarket brands, POI coordinates
- Multimodal travel times (transit / car / bike / walk)
- Noise breakdowns, air quality percentiles
- Fun facts, methodology notes
Tweaking the scores¶
When you want to change a score formula:
- Edit code/scores.py (the whole thing is in one file)
- For weight changes, also update code/config.py::DAILY_LIFE_WEIGHTS (or FAMILY_WEIGHTS / SMART_LIVING_WEIGHTS)
- Run python pipeline.py (no need to re-run data loaders, they're cached)
- Check the parquet: pd.read_parquet('output/geneva_kpis_by_h3.parquet')
- Spot-check specific cells against known neighborhoods
- Upload the new KV data (see deployment.md)
Because of the live-KV reads in the worker (see backend.md), existing unlocked reports will see the new scores on next view — they're not frozen at purchase time.
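A cheap guard when editing weights is to check that each set still sums to 1 before running the pipeline. The sketch below assumes the weights are stored as a flat dict; the values are copied from the Daily Life table on this page, but the key names and the `check_weights` helper are hypothetical.

```python
import math

# Values from the Daily Life table; dict layout and key names are assumptions
DAILY_LIFE_WEIGHTS = {
    "transit": 0.30,
    "errands": 0.25,
    "noise_env": 0.20,
    "health": 0.15,
    "dining_leisure": 0.10,
}

def check_weights(weights: dict) -> None:
    """Raise ValueError if weights are negative or do not sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total:.4f}, expected 1.0")

check_weights(DAILY_LIFE_WEIGHTS)  # passes silently
```

Because the final score is re-percentiled, weights that don't sum to 1 would not break the 0-100 range, but they would make the tables on this page wrong, so failing loudly is the safer choice.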
Debugging a weird score¶
If a cell scores unexpectedly, drill down:
import numpy as np
import pandas as pd

df = pd.read_parquet('output/geneva_kpis_by_h3.parquet')

# Find the cell nearest to an approximate lat/lng.
# Distance is measured in raw degrees, which is enough to land on a nearby cell.
target_lat, target_lng = 46.2017, 6.1930
df['_d'] = np.sqrt((df['lat'] - target_lat)**2 + (df['lng'] - target_lng)**2)
row = df.nsmallest(1, '_d').iloc[0]

# Print the score and the percentile columns that feed it
print(f"Commune: {row['commune']}")
print(f"Smart Living: {row['smart_living_score']}")
print(f"  pct_tax: {row['pct_tax']:.0f}")
print(f"  pct_transit_cornavin: {row['pct_transit_cornavin']:.0f}")
print(f"  pct_composite_vibrancy: {row['pct_composite_vibrancy']:.0f}")
# etc.
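The lookup above measures distance in raw degrees. At Geneva's latitude a degree of longitude is only about 70% as long as a degree of latitude, so when two cells are almost equally close the degree-based lookup can pick the wrong one. A possible refinement, sketched here with a hypothetical `nearest_cell` helper and made-up coordinates, scales longitude by cos(latitude):

```python
import numpy as np
import pandas as pd

KM_PER_DEG = 111.32  # approximate length of one degree of latitude

def nearest_cell(df: pd.DataFrame, lat: float, lng: float) -> pd.Series:
    """Row closest to (lat, lng) using an equirectangular approximation."""
    dlat_km = (df["lat"] - lat) * KM_PER_DEG
    dlng_km = (df["lng"] - lng) * KM_PER_DEG * np.cos(np.radians(lat))
    return df.loc[np.hypot(dlat_km, dlng_km).idxmin()]

# Two made-up cells: A is offset east-west, B is offset north-south
cells = pd.DataFrame({
    "name": ["A", "B"],
    "lat": [46.2000, 46.2095],
    "lng": [6.1620, 6.1500],
})

# Raw-degree distance, as in the lookup above
naive = cells.loc[np.hypot(cells["lat"] - 46.2, cells["lng"] - 6.15).idxmin(), "name"]
scaled = nearest_cell(cells, 46.2, 6.15)["name"]
print(naive, scaled)
```

Here the raw-degree lookup picks B, while in kilometres A is the closer cell. For spot-checking a neighborhood either answer is usually fine; it matters when you need the exact cell for an address.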