World Chaos Map Docs

How the map thinks

Turning global events into live pressure signals. The system ingests incidents, computes country pressure scores, and serves a snapshot-first map boot path via same-origin API routes with caching and boundary fallbacks.

⏱ 12-hour ingestion cycle | 📊 0-100 pressure score | 🧠 Local NLP classification | ✓ 4 pressure bands
System snapshot
  • Signal types: Conflict, protest, crime, cyber, economic, health, disaster
  • Decay window: 30 days
  • Confidence inputs: coverage depth + source reliability + recency
  • Score model: absolute tanh normalization + asymmetric smoothing (↑ unlimited, ↓ max −8/day)
  • Delivery path: Vercel API proxy + edge-cached live snapshot
Pipeline walkthrough
  • GitHub Actions - 12-hour trigger (02:00 and 14:00 UTC)
  • Fetch + Parse - 1,248 feeds via resilient fetch with circuit breaker
  • NLP Classification - event type, severity, casualties, content kind, country roles
  • Deduplication - URL + SimHash + Jaccard + semantic event clustering
  • NLP Enrichment - source credibility weight, content kind filter, confidence boost
  • Scoring + Persist - aggregation, normalization, continuity, Supabase upsert
  • Vercel CDN - edge-cached snapshot API for map boot
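The fetch step's circuit breaker can be pictured with a minimal per-feed sketch. The class and function names, the failure threshold, and the cooldown are hypothetical; the docs only state that fetching uses a circuit breaker. After `failure_threshold` consecutive failures the feed is skipped until `cooldown_s` has passed, after which requests are allowed again and the next result closes or re-opens the breaker.

```python
import time


class CircuitBreaker:
    """Hypothetical per-feed breaker: skip a feed after repeated failures."""

    def __init__(self, failure_threshold=3, cooldown_s=3600.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default, not documented
        self.cooldown_s = cooldown_s                # assumed default, not documented
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cooldown the breaker is half-open and lets requests through;
        # the next recorded result closes it again or re-opens it.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown


def fetch_with_breaker(url, fetch, breaker):
    """Call fetch(url) unless the breaker is open; return None if skipped or failed."""
    if not breaker.allow():
        return None
    try:
        body = fetch(url)
    except Exception:
        breaker.record_failure()
        return None
    breaker.record_success()
    return body
```

One breaker per feed keeps a single dead source from slowing down a run over many feeds while still giving it another chance once the cooldown expires.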
NLP classification engine

Every ingested item passes through a local NLP classifier that runs entirely in-process with zero external API cost. The classifier extracts five signals from each news item:

  • Event type - conflict, disaster, protest, cyber, crime, economic, health, climate, or other. Uses keyword matching with non-incident demotion (procurement, diplomacy, and policy stories are filtered out).
  • Severity (1-10) - starts from a base severity for the event type, then escalates with casualty counts and escalation phrases (extreme +4, strong +3, medium +2).
  • Content kind - hard_incident (actual event), context (ongoing situation, rescue ops), or analysis (opinion, commentary). Analysis gets 0.35x weight; context gets 0.6x.
  • Country roles - each mentioned country is tagged as location (where it happens), actor (who does it), subject (discussed), or mentioned (passing reference).
  • Casualties - regex extraction of killed/injured/displaced counts from the text. Casualty data directly boosts severity (100+ killed forces severity 9+).
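A minimal sketch of the severity step, assuming per-type base values and escalation phrase lists that the docs do not publish (both are hypothetical here). Only the tier bonuses (+4/+3/+2), the 1-10 range, and the 100+ killed rule come from the text above. Applying only the strongest matching tier is also an assumption.

```python
import re

# Hypothetical base severities; the real per-type values are not documented.
BASE_SEVERITY = {"conflict": 6, "disaster": 5, "protest": 3, "crime": 3,
                 "cyber": 3, "economic": 2, "health": 3, "climate": 2, "other": 1}

# Hypothetical phrase lists; the +4 / +3 / +2 tier bonuses are the documented values.
ESCALATION_TIERS = [
    (4, ("massacre", "full-scale invasion")),  # extreme
    (3, ("airstrike", "heavy fighting")),      # strong
    (2, ("clashes", "violent")),               # medium
]

# Matches e.g. "120 people killed", "1,200 displaced", "45 wounded".
CASUALTY_RE = re.compile(
    r"(\d[\d,]*)\s+(?:people\s+)?(killed|dead|injured|wounded|displaced)", re.I)


def extract_casualties(text):
    """Return the largest killed/injured/displaced count mentioned in the text."""
    counts = {"killed": 0, "injured": 0, "displaced": 0}
    for num, word in CASUALTY_RE.findall(text):
        key = {"dead": "killed", "wounded": "injured"}.get(word.lower(), word.lower())
        counts[key] = max(counts[key], int(num.replace(",", "")))
    return counts


def severity(event_type, text):
    sev = BASE_SEVERITY.get(event_type, BASE_SEVERITY["other"])
    lowered = text.lower()
    # Apply only the strongest matching escalation tier (assumption).
    for bonus, phrases in ESCALATION_TIERS:
        if any(p in lowered for p in phrases):
            sev += bonus
            break
    if extract_casualties(text)["killed"] >= 100:
        sev = max(sev, 9)  # documented rule: 100+ killed forces severity 9+
    return max(1, min(10, sev))  # clamp to the documented 1-10 range
```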
False-positive filters

Four NLP pattern filters detect and downweight content that matches incident keywords but doesn't represent current instability:

  • Historical/commemorative - phrases such as "10 years on", "anniversary", or "daily quiz on the genocide" get a 0.15x multiplier
  • Festival/cultural - items such as "Rocket War Preparations" (a Greek Easter tradition on Chios) and reenactments get a 0.1x multiplier
  • Geographic misattribution - a place such as "KP's Bajaur" (Pakistan) that is wrongly resolved to a different country gets a 0.1x multiplier
  • Spillover attack - for "Iran attacks Bahrain", dynamic spillover dampening scores the aggressor (Iran) rather than the victim (Bahrain)
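The first three filters reduce to a multiplier lookup. This sketch uses hypothetical regex patterns and a hypothetical gazetteer (`PLACE_COUNTRY`); only the 0.15x and 0.1x multipliers come from the docs. When several filters match it keeps the smallest multiplier, which is an assumption. Spillover dampening depends on actor/location roles and is not shown.

```python
import re

# Hypothetical patterns; the 0.15x / 0.1x multipliers are the documented values.
FALSE_POSITIVE_RULES = [
    (re.compile(r"\b(\d+ years on|anniversary|commemorat\w*)\b", re.I), 0.15),
    (re.compile(r"\b(rocket war|re-?enactments?)\b", re.I), 0.1),
]

# Hypothetical gazetteer: place name -> country the place actually belongs to.
PLACE_COUNTRY = {"bajaur": "Pakistan", "chios": "Greece"}

MISATTRIBUTION_MULTIPLIER = 0.1


def false_positive_multiplier(title, resolved_country):
    """Return the smallest multiplier among matching filters (1.0 if none match)."""
    multiplier = 1.0
    for pattern, factor in FALSE_POSITIVE_RULES:
        if pattern.search(title):
            multiplier = min(multiplier, factor)
    lowered = title.lower()
    for place, country in PLACE_COUNTRY.items():
        # A known place resolved to a different country is treated as misattribution.
        if re.search(rf"\b{re.escape(place)}\b", lowered) and country != resolved_country:
            multiplier = min(multiplier, MISATTRIBUTION_MULTIPLIER)
    return multiplier
```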
Source credibility

Each RSS source accumulates a health score (0.3-1.0) across ingestion runs, computed from four signals:

  • Success rate (35%) - how often the source fetches without error
  • Yield ratio (35%) - what fraction of fetched items become classifiable incidents
  • Dedup survival (20%) - what fraction of classified items survive deduplication (original reporting scores higher)
  • Latency (10%) - sources with consistently fast responses get a small bonus

New sources start at 0.7 (neutral) and need 5+ observations to diverge. High-credibility sources (AP, Reuters) pass at full weight; low-credibility sources get severity dampened to 0.5x.
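The health score can be sketched as a weighted sum of the four signals. The 35/35/20/10 weights, the 0.3-1.0 range, the 0.7 neutral start, and the five-observation minimum are from the docs; the latency mapping, the linear rescale onto 0.3-1.0, and the 0.5 cutoff for "low credibility" are hypothetical.

```python
from dataclasses import dataclass

NEUTRAL = 0.7          # documented starting score for new sources
MIN_OBSERVATIONS = 5   # documented minimum before a source can diverge
FLOOR, CEILING = 0.3, 1.0


@dataclass
class SourceStats:
    observations: int        # ingestion runs seen
    success_rate: float      # 0..1, fetches without error
    yield_ratio: float       # 0..1, fetched items that became incidents
    dedup_survival: float    # 0..1, classified items surviving dedup
    median_latency_s: float  # seconds


def latency_score(latency_s, fast_s=1.0, slow_s=10.0):
    """Hypothetical mapping: 1.0 at or below fast_s, 0.0 at or above slow_s."""
    if latency_s <= fast_s:
        return 1.0
    if latency_s >= slow_s:
        return 0.0
    return (slow_s - latency_s) / (slow_s - fast_s)


def health_score(s: SourceStats) -> float:
    if s.observations < MIN_OBSERVATIONS:
        return NEUTRAL  # not enough history yet
    weighted = (0.35 * s.success_rate + 0.35 * s.yield_ratio
                + 0.20 * s.dedup_survival + 0.10 * latency_score(s.median_latency_s))
    # Assumed linear rescale of [0, 1] onto the documented [0.3, 1.0] range.
    return FLOOR + (CEILING - FLOOR) * weighted


def severity_multiplier(health, low_threshold=0.5):
    """Hypothetical cutoff: low-credibility sources get severity x0.5."""
    return 0.5 if health < low_threshold else 1.0
```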

Semantic deduplication

Beyond URL and SimHash matching, the system clusters events by semantic signature: event type + country + date window + extracted entities. Two differently-worded articles about the same earthquake cluster together even if their headlines share no words. Multi-source clusters get a confidence boost (2 sources: +0.15, 3+ sources: +0.25).
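One way to realize the semantic signature is greedy clustering on event type, country, a date window, and entity overlap. The two-day window, the Jaccard threshold on entities, and the `Item` fields are assumptions; the +0.15/+0.25 boosts are from the docs. The boost counts distinct sources, so two copies from the same outlet do not earn it (also an assumption).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    source: str
    event_type: str
    country: str
    day: date
    entities: frozenset


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster_events(items, window_days=2, min_entity_overlap=0.3):
    """Greedy clustering: same type and country, close dates, overlapping entities."""
    clusters = []
    for item in sorted(items, key=lambda i: i.day):
        for cluster in clusters:
            head = cluster[0]  # compare against the cluster's first item
            if (item.event_type == head.event_type
                    and item.country == head.country
                    and abs((item.day - head.day).days) <= window_days
                    and jaccard(item.entities, head.entities) >= min_entity_overlap):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def confidence_boost(cluster):
    """Documented boost: 2 distinct sources +0.15, 3 or more +0.25."""
    n = len({i.source for i in cluster})
    return 0.25 if n >= 3 else 0.15 if n == 2 else 0.0
```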

Multilingual detection

A lightweight language detector identifies 12 Unicode script families (Arabic, Cyrillic, CJK, Devanagari, Bengali, Thai, Hangul, Hebrew, Greek, Ethiopic, Myanmar, Latin) and sub-classifies Latin text into English, Spanish, French, German, Portuguese, or Turkish. Non-English content is flagged for routing, and diacritic normalization enables cross-language deduplication.
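A minimal sketch of script detection by Unicode block, Latin sub-classification by stopword counts, and diacritic stripping with `unicodedata`. The block ranges are standard Unicode ranges (CJK here covers ideographs plus Hiragana/Katakana); the stopword lists and the majority-vote rule are hypothetical stand-ins for whatever the detector actually uses.

```python
import re
import unicodedata

# Standard Unicode block ranges for the 12 script families listed above.
SCRIPT_RANGES = [
    ("Arabic", 0x0600, 0x06FF), ("Cyrillic", 0x0400, 0x04FF),
    ("Devanagari", 0x0900, 0x097F), ("Bengali", 0x0980, 0x09FF),
    ("Thai", 0x0E00, 0x0E7F), ("Hangul", 0xAC00, 0xD7AF),
    ("Hebrew", 0x0590, 0x05FF), ("Greek", 0x0370, 0x03FF),
    ("Ethiopic", 0x1200, 0x137F), ("Myanmar", 0x1000, 0x109F),
    ("CJK", 0x4E00, 0x9FFF), ("CJK", 0x3040, 0x30FF),  # ideographs, kana
    ("Latin", 0x0041, 0x024F),
]

# Hypothetical stopword lists for the Latin sub-classifier.
LATIN_STOPWORDS = {
    "en": {"the", "and", "of", "in", "is"},
    "es": {"el", "los", "las", "del", "y"},
    "fr": {"le", "les", "des", "et", "est"},
    "de": {"der", "die", "und", "das", "ist"},
    "pt": {"o", "os", "do", "da", "e"},
    "tr": {"ve", "bir", "bu", "ile", "için"},
}


def dominant_script(text):
    """Return the script family with the most letters, or None."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, combining marks
        cp = ord(ch)
        for name, lo, hi in SCRIPT_RANGES:
            if lo <= cp <= hi:
                counts[name] = counts.get(name, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None


def latin_language(text):
    """Pick the language with most stopword hits; ties and no hits fall back to 'en'."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in LATIN_STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "en"


def strip_diacritics(text):
    """Decompose (NFKD) and drop combining marks, e.g. for cross-language dedup."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```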

Scoring formula
raw = sum(type_impact_i)
normalized = tanh(log1p(raw) / 3.0)
score = normalized * 100
  • type_impact_i - event severity × source quality × event-type weight
  • absolute scale - the `3.0` divisor keeps war zones in Severe while limiting global-noise inflation
  • banding - Stable 0-29, Moderate 30-59, Unstable 60-84, Severe 85-100
  • continuity - increases apply immediately; decreases are capped at −8/day
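The formula and bands translate directly into code. This sketch assumes `type_impacts` is already a list of per-event impacts, and it places a fractional score in the band whose lower bound it meets, since the docs only give integer ranges.

```python
import math

# Lower bound of each band, checked from highest to lowest.
BANDS = [(85, "Severe"), (60, "Unstable"), (30, "Moderate"), (0, "Stable")]


def pressure_score(type_impacts, scale=3.0):
    """score = tanh(log1p(sum of impacts) / scale) * 100, bounded to [0, 100)."""
    raw = sum(type_impacts)
    normalized = math.tanh(math.log1p(raw) / scale)
    return normalized * 100


def band(score):
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    return "Stable"
```

For example, one severity-9 conflict report from a full-weight source contributes 9 × 1.0 × 5.0 = 45, and tanh(log1p(45) / 3.0) × 100 is about 85.5, already inside Severe. The `log1p` compresses large sums before `tanh` saturates them, so piling on more reports raises a war-zone score only slowly.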
Asymmetric smoothing

Scores use asymmetric smoothing to balance responsiveness with stability:

  • Increases: unlimited, so a war breaking out shows immediately
  • Decreases: capped at −8/day, so peace returns gradually
  • Conflict events decay much slower than other types
  • Cyber events decay 50% faster than baseline
  • No-event days decay by ~2 points

This prevents sudden drops when a war ends but small incidents (riots, unrest) continue. Recovery takes days, not hours.
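The daily continuity rule can be sketched as a single step function. It assumes each day produces a target score from that day's events; the −8 cap and the ~2-point idle decay are from the docs, while the per-type event decay rates (slower for conflict, faster for cyber) are not modeled because their exact values are not given.

```python
def smooth(previous, target, had_events, max_drop=8.0, idle_decay=2.0):
    """One day of asymmetric smoothing."""
    if not had_events:
        # No-event day: drift down by about idle_decay points.
        return max(0.0, previous - idle_decay)
    if target >= previous:
        return target  # increases apply immediately
    return max(target, previous - max_drop)  # decreases capped per day
```

Starting from 90 with the target dropping to 20, the score steps down through 82, 74, 66, 58, 50, 42, 34, 26 and reaches 20 on the ninth day.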

Event type weights
| Type | Weight | Rationale |
|---|---|---|
| Conflict | 5.0x | Primary disruption signal for war/armed violence |
| Disaster | 0.9x | Acute infrastructure and safety disruption |
| Protest | 0.8x | Civil unrest baseline |
| Crime | 0.55x | Organized violence and criminal instability |
| Cyber | 0.5x | Critical digital infrastructure disruption |
| Economic | 0.45x | Structural financial and sanctions pressure |
| Health | 0.6x | Public-health shock and outbreak risk |
| Climate | 0.55x | Persistent climate-driven disruption |
| Other | 0.35x | Residual unclassified disruption signals |
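In the scoring formula, the table acts as a lookup for the event-type weight in `type_impact_i`. This sketch falls back to the Other weight for unknown types and takes severity and source quality as plain numbers; both choices are assumptions.

```python
# Event-type weights from the table above.
WEIGHTS = {"conflict": 5.0, "disaster": 0.9, "protest": 0.8, "crime": 0.55,
           "cyber": 0.5, "economic": 0.45, "health": 0.6, "climate": 0.55,
           "other": 0.35}


def type_impact(event_type, severity, source_quality):
    """Event severity x source quality x event-type weight."""
    return severity * source_quality * WEIGHTS.get(event_type, WEIGHTS["other"])


def raw_pressure(events):
    """Sum of impacts for (event_type, severity, source_quality) tuples."""
    return sum(type_impact(t, sev, q) for t, sev, q in events)
```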
Deduplication

The same event from multiple sources is counted once. The system uses:

  • URL matching - Exact canonical URL comparison
  • Content hashing - SHA-256 of normalized title + date + country
  • SimHash fingerprinting - Near-duplicate detection: articles whose fingerprints differ in only a few bits are treated as copies
  • Bucketing - Per-day buckets by country + event type
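The content-hash, SimHash, and bucketing steps can be sketched with the standard library. The title normalization, the `blake2b` token hash, and the 3-bit Hamming threshold are assumptions (3 bits on 64-bit fingerprints is a commonly used setting); the SHA-256 key over title, date, and country follows the description above.

```python
import hashlib
import re


def normalize_title(title):
    """Lowercase, drop punctuation, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def content_hash(title, day_iso, country):
    """SHA-256 over normalized title + date + country."""
    key = f"{normalize_title(title)}|{day_iso}|{country.upper()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def simhash64(text):
    """64-bit SimHash: each token votes on every bit of its own 64-bit hash."""
    votes = [0] * 64
    for token in normalize_title(text).split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(a, b):
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=3):
    return hamming(simhash64(a), simhash64(b)) <= max_distance


def bucket_key(country, event_type, day_iso):
    """Per-day bucket, so comparisons only run within country + type + day."""
    return (country.upper(), event_type, day_iso)
```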
