How the map thinks

Turning global events into live pressure signals. The system ingests incidents, computes country pressure scores, and serves a snapshot-first map boot path via same-origin API routes with caching and boundary fallbacks.

⏱ 12-hour ingestion cycle 📊 0-100 pressure score 🧠 Local NLP classification ✓ 4 pressure bands

System snapshot

Signal types

Conflict, protest, crime, cyber, economic, health, disaster

Decay window

30 days

Confidence inputs

Coverage depth + source reliability + recency

Score model

Absolute tanh normalization + asymmetric smoothing (↑ unlimited, ↓ max −8/day)

Delivery path

Vercel API proxy + edge-cached live snapshot

Pipeline walkthrough

GitHub Actions

12h trigger (02:00/14:00 UTC)

Fetch + Parse

1,248 feeds via resilient fetch with circuit breaker

NLP Classification

Event type, severity, casualties, content kind, country roles

Deduplication

URL + SimHash + Jaccard + semantic event clustering

NLP Enrichment

Source credibility weight, content kind filter, confidence boost

Scoring + Persist

Aggregation, normalization, continuity, Supabase upsert

Vercel CDN

Edge-cached snapshot API for map boot

NLP classification engine

Every ingested item passes through a local NLP classifier that runs entirely in-process with zero external API cost. The classifier extracts five signals from each news item:

Event type — conflict, disaster, protest, cyber, crime, economic, health, climate, or other. Uses keyword matching with non-incident demotion (procurement, diplomacy, policy filtered out).
Severity (1-10) — base severity from event type, escalated by casualty counts and escalation phrases (extreme +4, strong +3, medium +2).
Content kind — hard_incident (actual event), context (ongoing situation, rescue ops), or analysis (opinion, commentary). Analysis gets 0.35x weight; context gets 0.6x.
Country roles — each mentioned country tagged as location (where it happens), actor (who does it), subject (discussed), or mentioned (passing reference).
Casualties — regex extraction of killed/injured/displaced counts from text. Casualty data directly boosts severity (100+ killed forces severity 9+).
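
The severity rules above can be sketched roughly as follows. This is a minimal illustration, not the classifier itself: the base severities, the phrase lists, and the casualty regex are all hypothetical placeholders. Only the tier bonuses (+4, +3, +2), the 1-10 range, and the 100+ killed floor of 9 come from the description above.

```python
import re

# Hypothetical base severities per event type (the real values are not published here).
BASE_SEVERITY = {"conflict": 5, "disaster": 4, "protest": 3}

# Hypothetical escalation phrases; only the bonuses (+4/+3/+2) come from the text.
ESCALATION_TIERS = [
    (4, ("massacre", "mass casualty")),    # extreme
    (3, ("heavy fighting", "airstrike")),  # strong
    (2, ("clashes", "escalating")),        # medium
]

# Assumed pattern for "N killed", "N people killed", "N were killed".
KILLED_RE = re.compile(r"(\d[\d,]*)\+?\s+(?:people\s+)?(?:were\s+)?killed", re.I)


def extract_killed(text: str) -> int:
    """Return the first killed count found in the text, or 0 if none."""
    match = KILLED_RE.search(text)
    return int(match.group(1).replace(",", "")) if match else 0


def severity(event_type: str, text: str) -> int:
    """Base severity plus the strongest matching escalation tier, clamped to 1-10."""
    lowered = text.lower()
    score = BASE_SEVERITY.get(event_type, 2)
    for bonus, phrases in ESCALATION_TIERS:
        if any(phrase in lowered for phrase in phrases):
            score += bonus
            break  # tiers are ordered strongest first, so stop at the first hit
    if extract_killed(text) >= 100:
        score = max(score, 9)  # 100+ killed forces severity 9 or higher
    return max(1, min(10, score))
```

Applying only the strongest tier, rather than stacking bonuses, is a design assumption made here to keep the score from saturating on long articles.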

False-positive filters

Four NLP pattern filters detect and downweight content that matches incident keywords but doesn't represent current instability:

Historical/commemorative — "10 years on", "anniversary", "daily quiz on the genocide" get 0.15x multiplier
Festival/cultural — "Rocket War Preparations" (Greek Easter on Chios), reenactments get 0.1x multiplier
Geographic misattribution — "KP's Bajaur" (Pakistan) wrongly resolved to a different country gets 0.1x multiplier
Spillover attack — "Iran attacks Bahrain" scores the aggressor (Iran) not the victim (Bahrain) via dynamic spillover dampening
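
As a rough sketch of how the first two pattern filters might be applied, the snippet below maps regex matches to the multipliers listed above. The patterns themselves are hypothetical examples, and the geographic-misattribution and spillover filters are left out because they need country-resolution context that a title-only check cannot supply.

```python
import re

# (name, pattern, multiplier). The multipliers come from the list above;
# the regex patterns are illustrative assumptions only.
FILTERS = [
    ("historical", re.compile(r"\b\d+ years on\b|\banniversary\b|\bquiz\b", re.I), 0.15),
    ("festival", re.compile(r"\breenactment\b|\bfestival\b|\brocket war\b", re.I), 0.10),
]


def false_positive_multiplier(title: str) -> float:
    """Return the smallest multiplier among matching filters, or 1.0 if none match."""
    multiplier = 1.0
    for _name, pattern, factor in FILTERS:
        if pattern.search(title):
            multiplier = min(multiplier, factor)
    return multiplier
```

Taking the minimum instead of multiplying the factors together is an assumption; it keeps an item that trips two filters from being pushed to effectively zero.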

Source credibility

Each RSS source accumulates a health score (0.3-1.0) across ingestion runs, computed from four signals:

Success rate (35%) — how often the source fetches without error
Yield ratio (35%) — what fraction of fetched items become classifiable incidents
Dedup survival (20%) — what fraction of classified items survive deduplication (original reporting scores higher)
Latency (10%) — sources with consistently fast responses get a small bonus

New sources start at 0.7 (neutral) and need at least 5 observations before their score can diverge from that value. High-credibility sources (such as AP and Reuters) pass at full weight; low-credibility sources have their severity dampened to 0.5x.
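
One way to read the weighting above is as a weighted blend of four ratios rescaled into the 0.3-1.0 range. The sketch below assumes each input is already normalized to 0-1 and that the rescaling is linear; neither detail is stated above, and the `SourceStats` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    observations: int      # number of ingestion runs seen so far
    success_rate: float    # 0-1, fetches without error
    yield_ratio: float     # 0-1, fetched items that become incidents
    dedup_survival: float  # 0-1, classified items surviving dedup
    latency_score: float   # 0-1, where 1.0 means consistently fast (assumed scale)


def health_score(stats: SourceStats) -> float:
    """Blend the four signals and rescale linearly into [0.3, 1.0]."""
    if stats.observations < 5:
        return 0.7  # neutral starting value until enough runs are observed
    blended = (
        0.35 * stats.success_rate
        + 0.35 * stats.yield_ratio
        + 0.20 * stats.dedup_survival
        + 0.10 * stats.latency_score
    )
    return round(0.3 + 0.7 * blended, 4)
```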

Semantic deduplication

Beyond URL and SimHash matching, the system clusters events by semantic signature: event type + country + date window + extracted entities. Two differently-worded articles about the same earthquake cluster together even if their headlines share no words. Multi-source clusters get a confidence boost (2 sources: +0.15, 3+ sources: +0.25).
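
A simplified sketch of signature-based clustering is shown below. It assumes items are dictionaries with `event_type`, `country`, `date`, `entities`, and `source` keys, uses a fixed 3-day bucket as the date window, and requires an exact entity-set match. All of these are illustrative assumptions; a real implementation would likely match entities more loosely and handle events that straddle a bucket boundary.

```python
from collections import defaultdict
from datetime import date


def signature(item: dict, window_days: int = 3) -> tuple:
    """Build a cluster key from type, country, date bucket, and entities."""
    bucket = item["date"].toordinal() // window_days
    return (item["event_type"], item["country"], bucket, frozenset(item["entities"]))


def cluster(items: list) -> dict:
    """Group items that share the same semantic signature."""
    clusters = defaultdict(list)
    for item in items:
        clusters[signature(item)].append(item)
    return dict(clusters)


def confidence_boost(cluster_items: list) -> float:
    """Boost by number of distinct sources: 2 gives +0.15, 3 or more gives +0.25."""
    distinct_sources = len({item["source"] for item in cluster_items})
    if distinct_sources >= 3:
        return 0.25
    if distinct_sources == 2:
        return 0.15
    return 0.0
```

Note that the headline never enters the signature, which is what lets two differently worded reports of the same earthquake land in one cluster.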

Multilingual detection

A lightweight language detector identifies 12 Unicode script families (Arabic, Cyrillic, CJK, Devanagari, Bengali, Thai, Hangul, Hebrew, Greek, Ethiopic, Myanmar, Latin) and sub-classifies Latin text into English, Spanish, French, German, Portuguese, or Turkish. Non-English content is flagged for routing, and diacritic normalization enables cross-language deduplication.
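
Script detection and diacritic normalization can both be approximated with Python's standard `unicodedata` module. The sketch below is an assumption about the approach, not the detector described above: it counts letters by the prefix of their Unicode character names, treats only Han ideographs as CJK, and does not attempt the Latin sub-classification into individual languages.

```python
import unicodedata
from collections import Counter

# The 12 script families named above, as they appear in Unicode character names.
SCRIPTS = (
    "ARABIC", "CYRILLIC", "CJK", "DEVANAGARI", "BENGALI", "THAI",
    "HANGUL", "HEBREW", "GREEK", "ETHIOPIC", "MYANMAR", "LATIN",
)


def detect_script(text: str) -> str:
    """Return the most frequent script among the letters, or UNKNOWN."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def strip_diacritics(text: str) -> str:
    """Decompose characters, drop combining marks, and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
```

With normalization applied before hashing, a title containing "São Paulo" and one containing "Sao Paulo" compare as equal, which is the property cross-language deduplication relies on.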

Scoring formula

raw = sum(type_impact_i)
normalized = tanh(log1p(raw) / 3.0)
score = normalized * 100

type_impact_i: event severity × source quality × event-type weight
absolute scale: the 3.0 divisor keeps war zones in Severe while limiting global-noise inflation
banding: Stable 0-29, Moderate 30-59, Unstable 60-84, Severe 85-100
continuity: increases are immediate, decreases are capped at −8/day
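
The formula translates directly into a few lines of Python. The sketch below implements the three lines of the formula as written; the `band` helper applies the band thresholds to the unrounded score, which is an assumption about how fractional scores such as 29.5 are handled.

```python
import math


def pressure_score(impacts) -> float:
    """raw = sum of impacts; score = tanh(log1p(raw) / 3.0) * 100."""
    raw = sum(impacts)
    return math.tanh(math.log1p(raw) / 3.0) * 100


def band(score: float) -> str:
    """Map a 0-100 score to one of the four pressure bands."""
    if score >= 85:
        return "Severe"
    if score >= 60:
        return "Unstable"
    if score >= 30:
        return "Moderate"
    return "Stable"
```

Because of the logarithm, the raw total has to grow a lot to move the score near the top of the scale: under this formula a country needs a raw impact of roughly 42 to cross into Severe, while a raw impact of 10 lands in Unstable.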

Asymmetric smoothing

Scores use asymmetric smoothing to balance responsiveness with stability:

Increases: unlimited — a war breaking out shows immediately
Decreases: capped at −8/day — peace returns gradually
Conflict events decay much more slowly than other event types
Cyber events decay 50% faster than the baseline rate
On days with no new events, the score decays by ~2 points

This prevents sudden drops when a war ends but small incidents (riots, unrest) continue. Recovery takes days, not hours.
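
The increase and decrease rules can be sketched as a single clamp. This is a minimal illustration that covers only the unlimited rise and the −8/day cap; the per-type decay rates and the ~2 point no-event decay are not modeled, and the `days_elapsed` parameter is an assumption added so the cap can be prorated across the 12-hour ingestion cycle.

```python
MAX_DAILY_DROP = 8.0  # decreases are capped at 8 points per day


def smooth(previous: float, target: float, days_elapsed: float = 1.0) -> float:
    """Let the score rise immediately but fall by at most 8 points per day."""
    if target >= previous:
        return target  # increases are unlimited
    floor = previous - MAX_DAILY_DROP * days_elapsed
    return max(target, floor)
```

For example, a country whose computed score falls from 90 to 30 overnight would display 82 the next day and take 8 daily steps to reach 30 under this rule.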

Event type weights

Type	Weight	Rationale
Conflict	5.0x	Primary disruption signal for war/armed violence
Disaster	0.9x	Acute infrastructure and safety disruption
Protest	0.8x	Civil unrest baseline
Crime	0.55x	Organized violence and criminal instability
Cyber	0.5x	Critical digital infrastructure disruption
Economic	0.45x	Structural financial and sanctions pressure
Health	0.6x	Public-health shock and outbreak risk
Climate	0.55x	Persistent climate-driven disruption
Other	0.35x	Residual unclassified disruption signals
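
Combined with the scoring formula, the table gives each event's contribution as severity × source quality × type weight. The sketch below encodes the table as a lookup; falling back to the "other" weight for unrecognized types is an assumption.

```python
# Event-type weights copied from the table above.
TYPE_WEIGHTS = {
    "conflict": 5.0,
    "disaster": 0.9,
    "protest": 0.8,
    "crime": 0.55,
    "cyber": 0.5,
    "economic": 0.45,
    "health": 0.6,
    "climate": 0.55,
    "other": 0.35,
}


def type_impact(event_type: str, severity: float, source_quality: float) -> float:
    """Impact of one event: severity x source quality x event-type weight."""
    weight = TYPE_WEIGHTS.get(event_type, TYPE_WEIGHTS["other"])
    return severity * source_quality * weight
```

At equal severity and source quality, a conflict event contributes ten times as much raw impact as a cyber event (5.0 versus 0.5), which is why armed conflict dominates the upper bands.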

Deduplication

The same event from multiple sources is counted once. The system uses:

URL matching - Exact canonical URL comparison
Content hashing - SHA-256 of normalized title + date + country
SimHash fingerprinting - Near-duplicate detection by comparing locality-sensitive fingerprints of the text
Bucketing - Per-day buckets by country + event type
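
The content-hash and SimHash steps can be illustrated with the standard library alone. The sketch below is an assumption about how they might look: the title normalization rule, the `|` field separator, the 64-bit fingerprint width, and the use of BLAKE2b as the per-token hash are illustrative choices, not details given above.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumed)


def content_hash(title: str, day: str, country: str) -> str:
    """SHA-256 of normalized title + date + country."""
    normalized = re.sub(r"\W+", " ", title.lower()).strip()
    return hashlib.sha256(f"{normalized}|{day}|{country}".encode("utf-8")).hexdigest()


def simhash(text: str) -> int:
    """64-bit SimHash: each token votes +1 or -1 on every bit position."""
    votes = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        token_hash = int.from_bytes(digest, "big")
        for i in range(BITS):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    return sum(1 << i for i, vote in enumerate(votes) if vote > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two items would be treated as near-duplicates when the Hamming distance between their fingerprints falls below some threshold; the threshold value is not stated above and would need tuning against real feeds.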
