Scoring methodology


The three composite scores are the paywall anchor — the un-Googleable metrics that buyers pay CHF 19 to see in full. This page explains how they're computed, why each weight was chosen, and the specific decisions that came out of beta testing.

Implementation lives in a single file: code/scores.py.

The three scores

1. Daily Life Score

How easy is everyday life here?

Component	Weight	Sub-metrics
Transit access	30%	TPG distance, TPG stop count
Daily errands	25%	Supermarket, bakery, pharmacy, post office distances
Noise & environment	20%	Day noise, air quality
Health access	15%	Health vibrancy (doctors, clinics, pharmacies)
Dining & leisure	10%	Cultural vibrancy, service entropy (diversity)

2. Family Score

How good is this for raising kids?

Component	Weight	Sub-metrics
School access	35%	Primary, cycle (middle), college (high) distances
Safety & quiet	20%	Night noise, day noise
Child amenities	20%	Nearest playground, playground count within 800m
Green space	15%	Nearest park
Family services	10%	Pharmacy, supermarket

Note: Daycares (crèches) are NOT in this score. They are informational only (nearest + count within 1.5km). See "Why daycares aren't scored either" below.

3. Smart Living Score

What are the hidden advantages of this location?

Component	Weight	Sub-metrics
Tax efficiency	30%	`pct_tax` (inverted centimes additionnels)
Transit connectivity	25%	Transit travel time to Cornavin, airport, CERN, UN
Environmental quality	20%	Air quality, day noise
Amenity density	15%	Composite vibrancy, category diversity (floor 50)
Development context	10%	Economic vibrancy, service mix balance (floor 50)

The floor of 50 is important — see "The Cologny fix" below.
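
The three weight tables above can be written down as plain dictionaries with a sanity check that each set sums to 100%. This is only a sketch: the dictionary names match those referenced later on this page, but the keys and the `check_weights` helper are hypothetical and may not match what code/config.py actually contains.

```python
# Hypothetical layout; the real dictionaries live in code/config.py
# and may use different keys.
DAILY_LIFE_WEIGHTS = {
    "transit": 0.30,
    "errands": 0.25,
    "noise_env": 0.20,
    "health": 0.15,
    "dining_leisure": 0.10,
}
FAMILY_WEIGHTS = {
    "schools": 0.35,
    "safety_quiet": 0.20,
    "child_amenities": 0.20,
    "green_space": 0.15,
    "family_services": 0.10,
}
SMART_LIVING_WEIGHTS = {
    "tax": 0.30,
    "connectivity": 0.25,
    "env_quality": 0.20,
    "amenity_density": 0.15,
    "dev_context": 0.10,
}


def check_weights(weights, tol=1e-9):
    """Return the weight total, raising if it is not 1.0 (within tol)."""
    total = sum(weights.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return total


for weights in (DAILY_LIFE_WEIGHTS, FAMILY_WEIGHTS, SMART_LIVING_WEIGHTS):
    check_weights(weights)
```

A check like this catches the most common mistake when tweaking a single weight: forgetting to rebalance the others.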

The algorithm (simplified)

# 1. Compute percentile rank for every raw KPI across all 17K cells
#    - Lower = better for distances/noise/tax → INVERT the percentile
#    - Higher = better for counts/vibrancy → normal rank
df['pct_tpg_dist_1'] = invert_percentile(df['tpg_dist_1'])
df['pct_tax']        = invert_percentile(df['centimes_additionnels'])
# ... etc

# 2. Fill NaN noise/air with 50 (neutral) for cells outside raster extent
#    so a missing raster doesn't crash the whole score

# 3. Compute each sub-component as the MEAN of its percentile columns
transit = mean(pct_tpg_dist_1, pct_tpg_count)
errands = mean(pct_supermarket, pct_bakery, pct_pharmacy, pct_post_office)
# ... etc

# 4. Apply floors to amenity_density and dev_context (Smart Living only)
amenity_density = amenity_density_raw.clip(lower=50)
dev_context     = dev_context_raw.clip(lower=50)

# 5. Weighted sum using the weights above
daily_life_raw = transit*0.30 + errands*0.25 + noise_env*0.20 + ...

# 6. RE-PERCENTILE the raw weighted average across the canton
#    This gives a proper 0-100 distribution (the raw weighted averages
#    compress toward the center because no single cell excels on
#    EVERY sub-component).
df['daily_life_score'] = percentile_rank(daily_life_raw)

# 7. Convert to letter grades
#    A: ≥85, B: ≥70, C: ≥50, D: ≥30, F: <30
df['daily_life_grade'] = df['daily_life_score'].apply(assign_grade)
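
As a runnable illustration of steps 1, 2, 3, 5 and 6, here is a minimal sketch on five toy cells. It assumes `percentile_rank` and `invert_percentile` are thin wrappers around pandas `rank(pct=True)`; the real helpers in code/scores.py may differ. The column `noise_day` and the two-component 0.6/0.4 weighting are made up for the example.

```python
import numpy as np
import pandas as pd


def percentile_rank(s: pd.Series) -> pd.Series:
    """0-100 rank where a higher raw value gets a higher percentile."""
    return s.rank(pct=True) * 100


def invert_percentile(s: pd.Series) -> pd.Series:
    """0-100 rank for 'lower is better' KPIs: smallest value ranks highest."""
    return s.rank(pct=True, ascending=False) * 100


# Toy data: five cells, one of them outside the noise raster (NaN)
df = pd.DataFrame({
    "tpg_dist_1": [120, 450, 80, 900, 300],           # metres, lower is better
    "tpg_count": [6, 2, 9, 1, 4],                     # stops, higher is better
    "noise_day": [55.0, 48.0, 62.0, np.nan, 51.0],    # dB, lower is better
})

# Step 1: percentile ranks (inverted where lower is better)
df["pct_tpg_dist_1"] = invert_percentile(df["tpg_dist_1"])
df["pct_tpg_count"] = percentile_rank(df["tpg_count"])

# Step 2: rank() leaves NaN untouched, so fill it with the neutral 50
df["pct_noise_day"] = invert_percentile(df["noise_day"]).fillna(50)

# Step 3: each sub-component is the mean of its percentile columns
transit = df[["pct_tpg_dist_1", "pct_tpg_count"]].mean(axis=1)
noise_env = df["pct_noise_day"]

# Step 5: weighted sum (illustrative two-component weights, not the real ones)
raw = transit * 0.6 + noise_env * 0.4

# Step 6: re-percentile the raw weighted average
df["toy_score"] = percentile_rank(raw)
print(df[["pct_tpg_dist_1", "pct_tpg_count", "pct_noise_day", "toy_score"]])
```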

Why re-percentile at the end?

A cell at the 85th percentile on every single sub-component would have a raw weighted average of exactly 85. But no cell actually achieves that: real neighborhoods trade strengths for weaknesses. In practice, the raw weighted averages bunch between 40 and 70, so grading them directly would leave almost no A's or F's. Re-percentiling spreads the scores back over 0-100 and gives users a meaningful grade distribution.
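
The compression is easy to reproduce on synthetic data. The sketch below assumes five independent, uniformly distributed sub-component percentiles, which is a simplification (real sub-components are correlated), so the exact numbers will not match the canton. It compares the share of cells that would reach grade A before and after re-percentiling.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells = 17_000
weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

# Assumption: sub-component percentiles are independent and uniform on 0-100
components = rng.uniform(0, 100, size=(n_cells, weights.size))
raw = components @ weights

# Re-percentile: rank every raw average among all cells, scaled to 0-100
ranks = raw.argsort().argsort() + 1
score = ranks / n_cells * 100

share_a_raw = (raw >= 85).mean()      # cells that would get an A on raw values
share_a_final = (score >= 85).mean()  # cells that get an A after re-ranking

print(f"raw spread (std):   {raw.std():.1f}")
print(f"final spread (std): {score.std():.1f}")
print(f"grade A share, raw: {share_a_raw:.2%}, final: {share_a_final:.2%}")
```

On the raw averages almost no cell clears the A threshold, while after re-ranking roughly 15% do, by construction.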

Grade thresholds

Grade	Percentile	Meaning
A	≥85	Top 15% of the canton
B	≥70	Top 30%
C	≥50	Above median
D	≥30	Below median
F	<30	Bottom 30%
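
The threshold table maps directly onto a small lookup function. This is one possible shape for `assign_grade`, not necessarily the one in code/scores.py.

```python
def assign_grade(score: float) -> str:
    """Map a 0-100 percentile score to a letter grade per the table above."""
    thresholds = [(85, "A"), (70, "B"), (50, "C"), (30, "D")]
    for cutoff, grade in thresholds:
        if score >= cutoff:
            return grade
    return "F"


for s in (91, 76, 65, 30, 12):
    print(s, assign_grade(s))
```

Thresholds are inclusive, so a score of exactly 85 is an A and exactly 30 is a D.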

Specific decisions

The Cologny fix (2026-04-10)

A beta tester complained that Chemin du Sechard 10 in Cologny scored Smart Living 65 (grade C "Good"). Cologny is one of Geneva's most upscale residential communes with the lowest tax rate (25 centimes) and lake views of the Mont-Blanc range. 65 felt wrong.

Root cause: 94 of 257 Cologny cells had pct_composite_vibrancy = 25 (the bottom quartile tie). These are the hillside residential areas with no nearby businesses by design. The original Smart Living formula penalized them:

Old Smart Living for a Cologny cell:
  Tax (30%):               99 × 0.30 = 29.7  ✓
  Transit (25%):           65 × 0.25 = 16.25
  Env quality (20%):       40 × 0.20 = 8.0
  Amenity density (15%):   25 × 0.15 = 3.75   ← penalty
  Dev context (10%):       30 × 0.10 = 3.0    ← penalty
  Total weighted:          60.7  →  percentile rank ≈ 58

The "hidden advantages" narrative was undermined by double-penalizing the thing (low business density) that defines the neighborhood.

The fix: apply a floor of 50 to amenity_density and dev_context for Smart Living specifically:

amenity_density = amenity_density_raw.clip(lower=50)
dev_context     = dev_context_raw.clip(lower=50)
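
To see what the floor does to the raw weighted total, the worked example above can be recomputed with and without it. The sub-component values are the ones from the "Old Smart Living" breakdown; the helper `weighted_total` is hypothetical and exists only for this illustration.

```python
WEIGHTS = {"tax": 0.30, "transit": 0.25, "env": 0.20, "amenity": 0.15, "dev": 0.10}
FLOORED = {"amenity", "dev"}  # only these two components get the floor

# Sub-component percentiles for the example Cologny cell
cell = {"tax": 99, "transit": 65, "env": 40, "amenity": 25, "dev": 30}


def weighted_total(values, floor=None):
    """Weighted sum of sub-components, optionally flooring the FLOORED ones."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = values[name]
        if floor is not None and name in FLOORED:
            value = max(value, floor)
        total += value * weight
    return total


old_total = weighted_total(cell)            # no floor: the original formula
new_total = weighted_total(cell, floor=50)  # floor of 50 on amenity and dev
print(f"old: {old_total:.2f}  new: {new_total:.2f}")
```

The raw total rises from 60.7 to 66.45. The final score depends on where that total ranks among all cells after re-percentiling, so the raw gain does not translate one-to-one into final points.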

Result:
- Cologny residential cells: SL 58-65 → 71-76 (grade B)
- Chemin du Sechard 10 specifically: 65 → 76
- Cologny median: 88 → 91
- Vandoeuvres +5, Genthod +4, Bellevue +4 (similar upscale residential)

Trade-off: dense urban areas (Plainpalais, Eaux-Vives, Cité) drop ~10 points on Smart Living. This is philosophically correct — those neighborhoods have visible advantages (transit, vibrancy) but hidden disadvantages (high tax, noise, pollution). They're rewarded in Daily Life instead.

Why international schools aren't scored

Geneva has a large expat/UN population and international schools are a top decision factor, but they were deliberately NOT added to the Family Score. Two reasons:

Coverage bias: Only ~8 international schools across the canton. Adding a distance-based KPI would create sharp discontinuities (nearby cells score high, 500m away score low) that don't reflect real decision-making (expats routinely drive 15+ minutes to school).
Segment mismatch: The Family Score is meant to work for everyone, not just the expat 5-10%. Adding intl schools would either (a) skew the overall score or (b) require a personalization toggle that we don't have infrastructure for yet.

Instead: international schools are shown as informational cards in the Family deep-dive section: nearest school name + distance + count within 3km. The decision is up to the user.

Why daycares aren't scored either

Same reasoning. Daycares are shown as info (nearest + count within 1.5km) but not scored, because the availability of a specific crèche depends on opening hours, age group, waiting lists — stuff our data can't capture.

Transit times prefer r5py over Euclidean

Smart Living's "transit connectivity" component prefers pct_transit_* (r5py-computed GTFS travel times) over pct_* (Euclidean distances to key locations). If r5py is unavailable during a pipeline run, we fall back to Euclidean. The switch is automatic:

transit_cols = [f'pct_transit_{loc}' for loc in KEY_LOCATIONS
                if f'pct_transit_{loc}' in df.columns]
euclidean_cols = [f'pct_{loc}' for loc in KEY_LOCATIONS]
connectivity = _safe_mean(df, transit_cols if transit_cols else euclidean_cols)
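
The fallback can be exercised end to end on toy data. In this sketch the contents of `KEY_LOCATIONS` and the body of `_safe_mean` are assumptions (the real helper is not shown on this page); only the column-selection logic is taken from the snippet above.

```python
import pandas as pd

KEY_LOCATIONS = ["cornavin", "airport", "cern", "un"]  # assumed names


def _safe_mean(df, cols):
    """Hypothetical helper: row-wise mean over the columns that exist."""
    present = [c for c in cols if c in df.columns]
    if not present:
        # Assumption: fall back to the neutral 50 when nothing is available
        return pd.Series(50.0, index=df.index)
    return df[present].mean(axis=1, skipna=True)


def connectivity(df):
    transit_cols = [f"pct_transit_{loc}" for loc in KEY_LOCATIONS
                    if f"pct_transit_{loc}" in df.columns]
    euclidean_cols = [f"pct_{loc}" for loc in KEY_LOCATIONS]
    return _safe_mean(df, transit_cols if transit_cols else euclidean_cols)


# Run with r5py output present: the Euclidean column is ignored
with_r5py = pd.DataFrame({"pct_transit_cornavin": [80.0, 60.0],
                          "pct_cornavin": [40.0, 40.0]})
# Run without r5py output: Euclidean columns are averaged instead
without_r5py = pd.DataFrame({"pct_cornavin": [40.0, 20.0],
                             "pct_airport": [60.0, 40.0]})

print(connectivity(with_r5py).tolist())
print(connectivity(without_r5py).tolist())
```

One consequence of this selection rule: if only some `pct_transit_*` columns exist, the score uses just those and never mixes transit and Euclidean percentiles.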

The difference matters: Versoix is 15km Euclidean from Cornavin but 20 minutes by transit thanks to the frequent lakeside train. Euclidean gives it a mediocre score; r5py gives it a good one.

Vibrancy: intensity × diversity

Raw vibrancy is a weighted sum of business counts in the walking network (pandana aggregate). But a street with 20 banks shouldn't beat a street with 1 bakery, 1 bookshop, 1 doctor, 1 café. So we multiply by a diversity factor:

composite_vibrancy = intensity * (1 + norm_entropy * 0.3 + service_mix_balance * 0.2)

Where:
- intensity = weighted sum with linear decay (pandana)
- norm_entropy = Shannon entropy of BRANCHE types within the cell's H3 k-ring (k=5, ~800m), normalized by the 99th percentile
- service_mix_balance = min(economic, cultural, health) / max(...)

This rewards both volume AND variety. See code/kpis.py::compute_vibrancy_kpis.
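
The effect of the diversity factor can be illustrated with two cells of equal intensity. This is a sketch, not the code in compute_vibrancy_kpis: the normalizer `ENTROPY_P99`, the clipping of `norm_entropy` to 1, the zero guard in `service_mix_balance`, and the sample vibrancy values are all assumptions made for the example.

```python
import math
from collections import Counter

ENTROPY_P99 = 2.0  # hypothetical canton-wide 99th percentile of entropy


def shannon_entropy(branches):
    """Shannon entropy (natural log) of a list of business types."""
    counts = Counter(branches)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def service_mix_balance(economic, cultural, health):
    """min/max of the three vibrancy types; 0 when there is nothing at all."""
    highest = max(economic, cultural, health)
    return min(economic, cultural, health) / highest if highest > 0 else 0.0


def composite_vibrancy(intensity, norm_entropy, balance):
    return intensity * (1 + norm_entropy * 0.3 + balance * 0.2)


banks = ["bank"] * 20
mixed = ["bakery", "bookshop", "doctor", "cafe"]

# Assumption: clip normalized entropy to 1 so the factor stays bounded
norm_banks = min(shannon_entropy(banks) / ENTROPY_P99, 1.0)
norm_mixed = min(shannon_entropy(mixed) / ENTROPY_P99, 1.0)

# Same intensity for both cells, so only the diversity terms differ
v_banks = composite_vibrancy(10.0, norm_banks, service_mix_balance(10, 0, 0))
v_mixed = composite_vibrancy(10.0, norm_mixed, service_mix_balance(4, 2, 3))
print(f"banks only: {v_banks:.2f}   mixed: {v_mixed:.2f}")
```

If both diversity terms stay within 0 to 1, the multiplier ranges from 1.0 to 1.5, so variety can lift a cell by at most 50% over its raw intensity.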

What you see in the UI

Free teaser

3 letter grades (A/B/C/D/F)
1 headline per score (generated in code/insights.py)
3 fun facts
Air quality badges (currently free until SPAIR licensing is resolved)

Paid report

3 full numeric scores with sub-score tooltips on hover (ScoreGauge component)
Radar charts for each score's sub-dimensions
"vs canton" percentile bars
Tax comparison
School names, supermarket brands, POI coordinates
Multimodal travel times (transit / car / bike / walk)
Noise breakdowns, air quality percentiles
Fun facts, methodology notes

Tweaking the scores

When you want to change a score formula:

Edit code/scores.py (the whole thing is in one file)
For weight changes, also update code/config.py::DAILY_LIFE_WEIGHTS (or FAMILY_WEIGHTS / SMART_LIVING_WEIGHTS)
Run python pipeline.py (no need to re-run data loaders — they're cached)
Check the parquet: pd.read_parquet('output/geneva_kpis_by_h3.parquet')
Spot-check specific cells against known neighborhoods
Upload the new KV data (see deployment.md)

Because of the live-KV reads in the worker (see backend.md), existing unlocked reports will see the new scores on next view — they're not frozen at purchase time.
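
For the spot-check step, it can help to compare the parquet from before and after a formula change, commune by commune. The sketch below uses two small in-memory frames in place of the real files. The `h3` key column is an assumption about the parquet schema; `commune` and `smart_living_score` are the column names used in the debugging snippet on this page.

```python
import pandas as pd


def score_diff(before, after, score_col="smart_living_score", key="h3"):
    """Median score change per commune between two pipeline runs."""
    merged = before[[key, "commune", score_col]].merge(
        after[[key, score_col]], on=key, suffixes=("_old", "_new")
    )
    merged["delta"] = merged[f"{score_col}_new"] - merged[f"{score_col}_old"]
    return merged.groupby("commune")["delta"].median().sort_values()


# Toy stand-ins for two versions of output/geneva_kpis_by_h3.parquet
before = pd.DataFrame({
    "h3": ["a", "b", "c"],
    "commune": ["Cologny", "Cologny", "Plainpalais"],
    "smart_living_score": [60.0, 64.0, 80.0],
})
after = pd.DataFrame({
    "h3": ["a", "b", "c"],
    "smart_living_score": [72.0, 76.0, 70.0],
})

by_commune = score_diff(before, after)
print(by_commune)
```

Sorting by the median change puts the biggest losers first, which is usually where an unintended side effect of a weight change shows up.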

Debugging a weird score

If a cell scores unexpectedly, drill down:

import numpy as np
import pandas as pd

df = pd.read_parquet('output/geneva_kpis_by_h3.parquet')

# Find the cell nearest to an approximate lat/lng.
# Distance is in raw degrees (not metres), so treat it as a rough lookup.
target_lat, target_lng = 46.2017, 6.1930
df['_d'] = np.sqrt((df['lat'] - target_lat)**2 + (df['lng'] - target_lng)**2)
row = df.nsmallest(1, '_d').iloc[0]

# Print all sub-scores
print(f"Commune: {row['commune']}")
print(f"Smart Living: {row['smart_living_score']}")
print(f"  pct_tax:               {row['pct_tax']:.0f}")
print(f"  pct_transit_cornavin:  {row['pct_transit_cornavin']:.0f}")
print(f"  pct_composite_vibrancy:{row['pct_composite_vibrancy']:.0f}")
# etc.
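
The lookup above compares raw degrees, and at Geneva's latitude a degree of longitude is only about 70% as long as a degree of latitude, so it can occasionally pick a neighboring cell. When that matters, a haversine distance in metres is a drop-in alternative. The helper `nearest_cell` and the toy frame below are illustrative only; `lat` and `lng` are the column names used in the snippet above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000


def nearest_cell(df, lat, lng):
    """Return (row, distance in metres) of the cell closest to lat/lng."""
    phi1 = np.radians(df["lat"].to_numpy())
    phi2 = np.radians(lat)
    dphi = phi2 - phi1
    dlmb = np.radians(lng - df["lng"].to_numpy())
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    i = int(np.argmin(dist))
    return df.iloc[i], float(dist[i])


# Toy cells: A is 0.0010 degrees north, B is 0.0012 degrees east of the target
cells = pd.DataFrame({
    "name": ["A", "B"],
    "lat": [46.2010, 46.2000],
    "lng": [6.1900, 6.1912],
})
row, metres = nearest_cell(cells, 46.2000, 6.1900)
print(row["name"], round(metres))
```

In degree terms A looks closer, but in metres B is the nearer cell, which is the case the haversine version gets right.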
