Confidence Scores
Confidence measures how much we trust a country's pressure score — not the score itself. A high-confidence "Stable" means genuinely low activity. A low-confidence "Stable" means we might just not be seeing what's happening.
A pressure score without confidence is misleading. Consider two countries both scoring 15 (Stable):
- Country A: 12 independent sources, 40+ events analyzed in the last 30 days, strong English-language coverage. The score is trustworthy: the country is genuinely quiet.
- Country B: 2 sources, 3 events in 30 days, mostly non-English media. The score may under-report: instability could exist but isn't reaching our feeds.
Six signals combine to produce a per-country confidence percentage. Each factor contributes independently — a country can have great source coverage but poor dedup certainty.
- Source count (source_count_score): How many independent feeds cover this country. More sources mean better cross-validation and reduced single-source bias. Countries like the US or UK have 15+ dedicated feeds; small island nations may have 1-2.
- Event volume (event_volume_score): Total classifiable incidents in the trailing 30-day window. Higher volume gives the scoring model more data points. Zero events in 30 days significantly lowers confidence, because it could mean peace or it could mean blindness.
- Attribution (attribution_score): How reliably incidents map to the correct country. The NLP engine assigns country roles (location, actor, subject, mentioned), and geographic misattribution filters catch errors. Countries frequently confused with other places (e.g., Georgia the country vs. the U.S. state) score lower.
- Dedup certainty (dedup_certainty): How well the dedup engine handles this country's feed mix. When multiple sources report the same event, dedup must cluster them correctly. Countries with many feeds in different languages are harder to dedup, which reduces certainty.
- Language coverage (language_coverage): Whether the system has feeds in the country's primary language(s). English-dominant countries score highest. Countries with feeds only in non-supported languages rely on international wire coverage, which skews toward major events.
- Recency (recency_score): How recently the system received data from this country's sources. Feeds that haven't returned new content in 7+ days signal potential staleness. Circuit-broken sources also reduce this factor.
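The factor descriptions above do not say how raw counts become sub-scores. As a rough illustration only, the sketch below scales three of them into the 0–1 range with a linear ramp. The saturation points (15 feeds, 40 events, 7 days) are borrowed from numbers mentioned in this page but are assumptions, as are the function names; the real pipeline may normalize differently.

```python
def saturating(value: float, full_at: float) -> float:
    """Scale value linearly into 0-1, capping at 1 once it reaches full_at."""
    if full_at <= 0:
        raise ValueError("full_at must be positive")
    return max(0.0, min(1.0, value / full_at))


def source_count_score(feeds: int, full_at: int = 15) -> float:
    # Assumption: 15 independent feeds counts as full coverage.
    return saturating(feeds, full_at)


def event_volume_score(events_30d: int, full_at: int = 40) -> float:
    # Assumption: 40 events in the trailing 30 days counts as full volume.
    return saturating(events_30d, full_at)


def recency_score(days_since_new_content: float, stale_after_days: float = 7.0) -> float:
    # Assumption: linear decay that reaches 0 at the 7-day staleness mark.
    return 1.0 - saturating(days_since_new_content, stale_after_days)


# A 12-feed country versus a 2-feed country.
print(source_count_score(12), round(source_count_score(2), 3))
print(event_volume_score(3), recency_score(3.5))
```

A linear ramp is the simplest choice; a logarithmic curve would reward the first few feeds more heavily, which may better match the cross-validation argument above.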
confidence = (
    source_count_score * 0.30 +
    event_volume_score * 0.25 +
    attribution_score  * 0.20 +
    dedup_certainty    * 0.15 +
    language_coverage  * 0.10 +
    recency_score      * 0.10
) * 100
Each sub-score is normalized to 0–1 before weighting. The weights intentionally sum to 1.10 rather than 1.00 (0.30 + 0.25 + 0.20 + 0.15 + 0.10 + 0.10). The extra 10% allows well-covered countries to exceed nominal baselines; the result is then clamped to 0–100%.
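To make the weighting and clamping concrete, here is a minimal Python sketch of the formula. The weights are copied from the formula above; the function name, the input validation, and the sample sub-score values are illustrative assumptions, not the pipeline's actual code.

```python
# Weights copied from the formula above; they sum to 1.10 by design.
WEIGHTS = {
    "source_count_score": 0.30,
    "event_volume_score": 0.25,
    "attribution_score": 0.20,
    "dedup_certainty": 0.15,
    "language_coverage": 0.10,
    "recency_score": 0.10,
}


def confidence_percent(sub_scores: dict[str, float]) -> float:
    """Return a 0-100 confidence percentage from six sub-scores in 0-1."""
    for name in WEIGHTS:
        value = sub_scores[name]  # raises KeyError if a factor is missing
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to 0-1, got {value}")
    raw = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS) * 100
    # Clamp, because perfect inputs would otherwise yield 110.
    return max(0.0, min(100.0, raw))


# Illustrative inputs only.
well_covered = dict.fromkeys(WEIGHTS, 1.0)
sparse = {
    "source_count_score": 0.1,
    "event_volume_score": 0.1,
    "attribution_score": 0.8,
    "dedup_certainty": 0.9,
    "language_coverage": 0.2,
    "recency_score": 0.5,
}
print(confidence_percent(well_covered))      # clamped to 100.0
print(round(confidence_percent(sparse), 1))  # 42.0
```

Note how the sparse example still reaches 42% despite weak source and volume scores, since attribution and dedup certainty contribute independently.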
The continuous 0–100% score maps to four interpretive tiers, described here from highest to lowest confidence:
Major nations with extensive English-language media, multiple wire services, and dedicated local feeds. Scores are representative of actual conditions.
Regional powers with mixed-language coverage. International wire services cover major events, but local unrest or minor incidents may not reach the pipeline.
Smaller nations with limited international reporting. Coverage depends heavily on ReliefWeb and regional wire services. Scores may undercount localized instability.
Isolated territories, censored regions, or areas with near-zero international media presence. Scores are unreliable — "Stable" often means "invisible", not "safe".
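The tier descriptions above could be applied in code roughly as follows. This page does not state the tier names or their cutoffs, so the labels ("high", "medium", "low", "minimal") and the boundaries (75 / 50 / 25) in this sketch are placeholders chosen purely for illustration.

```python
# Hypothetical cutoffs and labels: placeholders, not the system's real values.
# Ordered from highest lower bound to lowest.
TIERS = [
    (75.0, "high"),
    (50.0, "medium"),
    (25.0, "low"),
    (0.0, "minimal"),
]


def confidence_tier(confidence: float) -> str:
    """Map a 0-100 confidence percentage to one of four tier labels."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError(f"confidence must be within 0-100, got {confidence}")
    for lower_bound, label in TIERS:
        if confidence >= lower_bound:
            return label
    return TIERS[-1][1]  # not reached: the last bound is 0.0


def describe(pressure_label: str, confidence: float) -> str:
    """Attach the tier to a pressure label so the two are read together."""
    return f"{pressure_label} ({confidence_tier(confidence)} confidence)"


print(describe("Stable", 88.0))  # Stable (high confidence)
print(describe("Stable", 12.0))  # Stable (minimal confidence)
```

The two printed lines correspond to the distinction drawn at the top of this page: the same "Stable" label reads very differently depending on the tier attached to it.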
These are related but distinct systems:
Confidence:
- Per-country metric
- Answers: "Can we trust this score?"
- Based on coverage breadth and depth
- Shown to users as interpretive context
- Does not change the pressure score
Credibility:
- Per-feed metric
- Answers: "Can we trust this source?"
- Based on reliability, yield, and dedup survival
- Used internally by the scoring pipeline
- Directly dampens severity from low-quality feeds
A country can have high confidence (many feeds) but include some low-credibility sources. The credibility system handles the per-source quality; confidence captures the overall picture.
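The separation between the two systems can be sketched as below. This is an assumption-heavy illustration: the `Event` type, dampening by simple multiplication, and summing severities into a pressure value are all hypothetical stand-ins for whatever the scoring pipeline actually does. The point it demonstrates is only that credibility alters the number while confidence travels alongside it.

```python
from dataclasses import dataclass


@dataclass
class Event:
    severity: float          # raw severity (hypothetical scale)
    feed_credibility: float  # per-feed credibility in 0-1


def dampened_severity(event: Event) -> float:
    # Assumption: dampening is a plain multiplication by credibility.
    return event.severity * event.feed_credibility


def country_report(events: list[Event], confidence: float) -> dict:
    # Assumption: pressure is the sum of dampened severities.
    pressure = sum(dampened_severity(e) for e in events)
    # Confidence is attached as context; it never feeds back into pressure.
    return {"pressure": pressure, "confidence": confidence}


events = [Event(severity=10.0, feed_credibility=1.0),
          Event(severity=10.0, feed_credibility=0.5)]
print(country_report(events, confidence=80.0))
print(country_report(events, confidence=20.0))  # same pressure, lower trust
```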
Several pipeline features automatically improve confidence where conditions are met:
When 2+ independent sources report the same event (detected via semantic dedup), confidence gets a +0.15 to +0.25 boost. This cross-validation is the strongest confidence signal.
Countries with data from USGS, GDACS, or ReliefWeb structured APIs get a baseline confidence floor. These APIs are machine-readable and highly reliable.
Having feeds in both English and the local language improves coverage of both international and domestic events, boosting the language coverage factor.
Sources that consistently return fresh content within each 12h cycle demonstrate active, reliable coverage — increasing the recency factor.
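Two of the boosters above lend themselves to a short sketch: the corroboration boost and the structured-API floor. The text gives only the +0.15 to +0.25 range, so several details here are assumptions: that the boost applies on a 0–1 scale, that it grows linearly from 2 sources up to a hypothetical cap of 5, and that the floor is 0.30.

```python
def corroboration_boost(independent_sources: int, max_at: int = 5) -> float:
    """Boost in 0-1 units: 0 below 2 sources, 0.15 at 2, rising to 0.25 at max_at."""
    if max_at <= 2:
        raise ValueError("max_at must be greater than 2")
    if independent_sources < 2:
        return 0.0
    # Assumption: linear interpolation between the two documented endpoints.
    fraction = min(1.0, (independent_sources - 2) / (max_at - 2))
    return 0.15 + 0.10 * fraction


def apply_boosters(
    base: float,
    independent_sources: int,
    has_structured_api: bool,
    api_floor: float = 0.30,  # hypothetical floor value
) -> float:
    """Apply the boost, then the floor, and keep the result within 0-1."""
    boosted = base + corroboration_boost(independent_sources)
    if has_structured_api:
        boosted = max(boosted, api_floor)
    return max(0.0, min(1.0, boosted))


print(round(apply_boosters(0.40, 2, False), 2))  # 0.55
print(apply_boosters(0.10, 1, True))             # floor applies: 0.3
```

Applying the floor after the boost means a structured API never lowers a value that corroboration has already raised above the floor.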
Use this matrix to interpret pressure scores in context:
- English bias — The system has stronger coverage in English-speaking countries. Multilingual detection helps but doesn't fully close the gap.
- Wire service dependency — For many countries, international coverage comes from 2-3 wire services (Reuters, AP, AFP). If they don't cover an event, it's invisible.
- Internet shutdowns — Governments that cut internet during crises reduce feed output precisely when coverage matters most, creating paradoxical confidence drops during real events.
- Conflict asymmetry — Active war zones often have high confidence despite being dangerous to report from, because international media focus compensates for local access issues.
- Small-state noise — Very small countries (populations under 100K) can see confidence swing dramatically from a single feed going offline.
See the sources
View and search all 1,248 feeds powering the pipeline.