How the map thinks
Turning global events into live pressure signals. The system ingests incidents, computes country pressure scores, and serves the map through a snapshot-first boot path over same-origin API routes, with caching and boundary fallbacks.
Every ingested item passes through a local NLP classifier that runs entirely in-process with zero external API cost. The classifier extracts five signals from each news item:
- Event type - conflict, disaster, protest, cyber, crime, economic, health, climate, or other. Uses keyword matching with non-incident demotion (procurement, diplomacy, policy filtered out).
- Severity (1-10) - base severity from event type, escalated by casualty counts and escalation phrases (extreme +4, strong +3, medium +2).
- Content kind - hard_incident (actual event), context (ongoing situation, rescue ops), or analysis (opinion, commentary). Analysis gets 0.35x weight; context gets 0.6x.
- Country roles - each mentioned country tagged as location (where it happens), actor (who does it), subject (discussed), or mentioned (passing reference).
- Casualties - regex extraction of killed/injured/displaced counts from text. Casualty data directly boosts severity (100+ killed forces severity 9+).
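The severity rule can be sketched roughly as follows. This is a minimal illustration, not the production classifier: the base severities in `BASE_SEVERITY`, the `KILLED_RE` pattern, and the function names are all hypothetical; only the escalation bonuses and the 100+ killed floor come from the list above.

```python
import re
from typing import Optional

# Hypothetical base severities; the real per-type values are not documented here.
BASE_SEVERITY = {"conflict": 6, "disaster": 5, "protest": 3, "other": 2}

# Escalation bonuses as stated in the severity bullet.
ESCALATION_BONUS = {"extreme": 4, "strong": 3, "medium": 2}

# Illustrative pattern for "N killed" / "N people dead" style phrases.
KILLED_RE = re.compile(r"(\d[\d,]*)\s+(?:people\s+)?(?:were\s+)?(?:killed|dead)", re.I)


def extract_killed(text: str) -> int:
    """Return the largest killed count found in the text, or 0 if none."""
    counts = [int(m.group(1).replace(",", "")) for m in KILLED_RE.finditer(text)]
    return max(counts, default=0)


def severity(event_type: str, text: str, escalation: Optional[str] = None) -> int:
    """Base severity plus escalation bonus, with the casualty floor, clamped to 1-10."""
    score = BASE_SEVERITY.get(event_type, BASE_SEVERITY["other"])
    score += ESCALATION_BONUS.get(escalation, 0)
    if extract_killed(text) >= 100:
        score = max(score, 9)  # 100+ killed forces severity 9+
    return max(1, min(10, score))
```

Under these assumed base values, a disaster report mentioning "120 people killed" is lifted to severity 9 even without any escalation phrase.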
Four NLP pattern filters detect and downweight content that matches incident keywords but doesn't represent current instability:
- Historical/commemorative - "10 years on", "anniversary", and "daily quiz on the genocide" items get a 0.15x multiplier
- Festival/cultural - "Rocket War Preparations" (Greek Easter on Chios) and reenactments get a 0.1x multiplier
- Geographic misattribution - items such as "KP's Bajaur" (Pakistan) that were wrongly resolved to a different country get a 0.1x multiplier
- Spillover attack - "Iran attacks Bahrain" scores the aggressor (Iran), not the victim (Bahrain), via dynamic spillover dampening
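A pattern filter of this kind might look like the sketch below. The regular expressions are illustrative guesses at what such filters match, and taking the smallest matching multiplier is an assumption; only the 0.15x and 0.1x values come from the list above. Misattribution and spillover handling need country resolution and are left out.

```python
import re

# (label, pattern, multiplier). Patterns are hypothetical examples.
FILTERS = [
    ("historical", re.compile(r"\b\d+ years on\b|\banniversary\b|\bquiz\b", re.I), 0.15),
    ("festival", re.compile(r"\breenactments?\b|\brocket war\b", re.I), 0.10),
]


def downweight(title: str) -> float:
    """Return the smallest multiplier among matching filters, or 1.0 if none match."""
    return min((mult for _, pat, mult in FILTERS if pat.search(title)), default=1.0)
```

A headline that trips no filter keeps its full weight, so ordinary incident reports are unaffected.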
Each RSS source accumulates a health score (0.3-1.0) across ingestion runs, computed from four signals:
- Success rate (35%) - how often the source fetches without error
- Yield ratio (35%) - what fraction of fetched items become classifiable incidents
- Dedup survival (20%) - what fraction of classified items survive deduplication (original reporting scores higher)
- Latency (10%) - sources with consistently fast responses get a small bonus
New sources start at 0.7 (neutral) and need 5+ observations to diverge. High-credibility sources (AP, Reuters) pass at full weight; low-credibility sources get severity dampened to 0.5x.
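One way to combine the four signals is a weighted sum, sketched below. The weights, the 0.3-1.0 range, the 0.7 neutral start, and the 5-observation threshold are taken from the text; clamping the weighted sum into the range (rather than rescaling it) and the function signature are assumptions.

```python
# Signal weights as listed above (they sum to 1.0).
WEIGHTS = {"success": 0.35, "yield": 0.35, "dedup": 0.20, "latency": 0.10}
NEUTRAL, FLOOR, CEILING, MIN_OBSERVATIONS = 0.7, 0.3, 1.0, 5


def health_score(success: float, yield_ratio: float, dedup_survival: float,
                 latency: float, observations: int) -> float:
    """Weighted health score; each input signal is expected in the range 0.0-1.0."""
    if observations < MIN_OBSERVATIONS:
        return NEUTRAL  # too little history to diverge from neutral
    raw = (WEIGHTS["success"] * success
           + WEIGHTS["yield"] * yield_ratio
           + WEIGHTS["dedup"] * dedup_survival
           + WEIGHTS["latency"] * latency)
    return max(FLOOR, min(CEILING, raw))  # assumed: clamp into 0.3-1.0
```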
Beyond URL and SimHash matching, the system clusters events by semantic signature: event type + country + date window + extracted entities. Two differently-worded articles about the same earthquake cluster together even if their headlines share no words. Multi-source clusters get a confidence boost (2 sources: +0.15, 3+ sources: +0.25).
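A rough sketch of signature-based clustering follows. The item fields, the two-day window, and the rule that clusters merge when they share at least one entity are assumptions made for illustration; the confidence boosts are the values stated above.

```python
from datetime import date


def confidence_boost(n_sources: int) -> float:
    """Boost for multi-source clusters: 2 sources +0.15, 3 or more +0.25."""
    if n_sources >= 3:
        return 0.25
    if n_sources == 2:
        return 0.15
    return 0.0


def cluster_events(items, window_days=2):
    """Greedy clustering on event type + country + date window + shared entities."""
    clusters = []
    for item in items:
        entities = {e.lower() for e in item["entities"]}
        for c in clusters:
            same_key = (c["event_type"], c["country"]) == (item["event_type"], item["country"])
            close = abs((c["day"] - item["day"]).days) <= window_days
            if same_key and close and c["entities"] & entities:
                c["entities"] |= entities
                c["sources"].add(item["source"])
                break
        else:
            # No existing cluster matched, so this item starts a new one.
            clusters.append({
                "event_type": item["event_type"],
                "country": item["country"],
                "day": item["day"],
                "entities": entities,
                "sources": {item["source"]},
            })
    return clusters
```

Headlines never enter the comparison, which is why two articles with no words in common can still land in the same cluster.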
A lightweight language detector identifies 12 Unicode script families (Arabic, Cyrillic, CJK, Devanagari, Bengali, Thai, Hangul, Hebrew, Greek, Ethiopic, Myanmar, Latin) and sub-classifies Latin text into English, Spanish, French, German, Portuguese, or Turkish. Non-English content is flagged for routing, and diacritic normalization enables cross-language deduplication.
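Script detection and diacritic normalization can both be approximated with Python's standard `unicodedata` module, as in the sketch below. Matching on Unicode character-name prefixes is a shortcut chosen for brevity and is not necessarily how the detector works; the function names are hypothetical, and the Latin sub-classification is omitted.

```python
import unicodedata
from collections import Counter

# The 12 script families named above, matched against Unicode character names.
SCRIPTS = ("ARABIC", "CYRILLIC", "CJK", "DEVANAGARI", "BENGALI", "THAI",
           "HANGUL", "HEBREW", "GREEK", "ETHIOPIC", "MYANMAR", "LATIN")


def dominant_script(text: str) -> str:
    """Return the script family covering the most letters, or 'UNKNOWN'."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

Stripping diacritics before hashing lets spellings such as "São Paulo" and "Sao Paulo" compare as equal during deduplication.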
raw = sum(type_impact_i)
normalized = tanh(log1p(raw) / 3.0)
score = normalized * 100
- type_impact_i - Event severity × source quality × event-type weight
- absolute scale - `3.0` keeps war zones in Severe while limiting global-noise inflation
- banding - Stable 0-29, Moderate 30-59, Unstable 60-84, Severe 85-100
- continuity - Increases immediate, decreases capped at -8/day
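The formula and banding translate directly into a few lines of Python. The sketch below follows the three steps as written; the function names are hypothetical, and the `TYPE_WEIGHT` values are copied from the event-type weight table in this document.

```python
import math

# Event-type weights from the weight table.
TYPE_WEIGHT = {
    "conflict": 5.0, "disaster": 0.9, "protest": 0.8, "crime": 0.55, "cyber": 0.5,
    "economic": 0.45, "health": 0.6, "climate": 0.55, "other": 0.35,
}


def type_impact(severity: float, source_quality: float, event_type: str) -> float:
    """Event severity x source quality x event-type weight."""
    return severity * source_quality * TYPE_WEIGHT.get(event_type, TYPE_WEIGHT["other"])


def pressure_score(impacts) -> float:
    """raw -> tanh(log1p(raw) / 3.0) -> 0-100 score."""
    raw = sum(impacts)
    normalized = math.tanh(math.log1p(raw) / 3.0)
    return normalized * 100


def band(score: float) -> str:
    """Map a 0-100 score to its pressure band."""
    if score >= 85:
        return "Severe"
    if score >= 60:
        return "Unstable"
    if score >= 30:
        return "Moderate"
    return "Stable"
```

For example, a single severity-10 conflict event from a full-weight source gives raw = 50 and a score of roughly 86, which falls in the Severe band. Because `tanh` saturates, adding many more events moves the score toward 100 without ever reaching it.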
Scores use asymmetric smoothing to balance responsiveness with stability:
- Increases: unlimited, so a war breaking out shows immediately
- Decreases: capped at -8/day, so peace returns gradually
- Conflict events decay much more slowly than other types
- Cyber events decay 50% faster than baseline
- No-event days decay by ~2 points
This prevents sudden drops when a war ends but small incidents (riots, unrest) continue. Recovery takes days, not hours.
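The asymmetric cap can be expressed as a single daily update step. The sketch below covers only the rise-immediately, fall-by-at-most-8 rule; per-type decay rates and the no-event decay are not modeled, and the function name is hypothetical.

```python
MAX_DAILY_DROP = 8.0  # decreases are capped at 8 points per day


def smooth(previous: float, target: float) -> float:
    """Apply one day of asymmetric smoothing to a country's score."""
    if target >= previous:
        return target  # increases apply immediately
    return max(target, previous - MAX_DAILY_DROP)  # decreases are rate-limited


def days_to_recover(start: float, target: float) -> int:
    """Count the daily steps needed for the displayed score to fall to the target."""
    days, score = 0, start
    while score > target:
        score = smooth(score, target)
        days += 1
    return days
```

Under this rule a country whose computed score falls from 90 to 40 overnight still shows 82 the next day and needs seven daily steps to reach 40.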
| Type | Weight | Rationale |
|---|---|---|
| Conflict | 5.0x | Primary disruption signal for war/armed violence |
| Disaster | 0.9x | Acute infrastructure and safety disruption |
| Protest | 0.8x | Civil unrest baseline |
| Crime | 0.55x | Organized violence and criminal instability |
| Cyber | 0.5x | Critical digital infrastructure disruption |
| Economic | 0.45x | Structural financial and sanctions pressure |
| Health | 0.6x | Public-health shock and outbreak risk |
| Climate | 0.55x | Persistent climate-driven disruption |
| Other | 0.35x | Residual unclassified disruption signals |
The same event from multiple sources is counted once. The system uses:
- URL matching - Exact canonical URL comparison
- Content hashing - SHA-256 of normalized title + date + country
- SimHash fingerprinting - Textual similarity for near-duplicates
- Bucketing - Per-day buckets by country + event type
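The hashing layers might be sketched as follows. The title normalization, the `|` field separator, the 64-bit fingerprint width, and the use of truncated SHA-256 as the per-token hash are assumptions; the source only states that the content hash is SHA-256 over normalized title + date + country and that SimHash catches near-duplicates.

```python
import hashlib
import re


def content_hash(title: str, day: str, country: str) -> str:
    """SHA-256 of normalized title + date + country."""
    normalized = re.sub(r"\W+", " ", title.lower()).strip()
    return hashlib.sha256(f"{normalized}|{day}|{country}".encode("utf-8")).hexdigest()


def simhash(text: str, bits: int = 64) -> int:
    """Classic SimHash: sum signed bit votes over token hashes, keep the sign per bit."""
    votes = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two items would be treated as near-duplicates when the Hamming distance between their fingerprints falls under a chosen threshold; the threshold is a tuning decision not specified here.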