Methodology
What this dataset does
The Climate Displacement Event Database documents climate-induced displacement at a granular event level. Each entry captures a specific instance of displacement linked to a climate-related hazard, with structured information on its timing, location, drivers, scale, and impacts.
The dataset is designed to move beyond aggregated estimates and instead provide event-based evidence that reflects how displacement unfolds in real contexts. It enables analysis of patterns across geographies, hazard types, and movement pathways.
The focus is on capturing the dynamics of displacement rather than only reporting numbers. This includes where people move from, where they move to, and under what conditions.
What is a displacement event
A displacement event refers to a situation in which individuals or households are forced to leave their place of residence due to climate-related hazards or environmental stress.
This includes both sudden-onset events such as floods, cyclones, and landslides, and slow-onset processes such as drought, salinization, erosion, and long-term environmental degradation.
An event is defined by a specific time period and geographic location. Where available, the dataset captures multiple phases of movement, including initial displacement, secondary movement, and return.
What data is collected
Each displacement event is structured across a consistent set of variables to enable comparability and analysis.
- Geographic information, including origin and destination locations
- Temporal information such as date and duration
- Hazard type and contributing environmental drivers
- Scale of displacement, where reported
- Movement patterns, including direction and type of movement
- Conditions at destination locations, including housing, water, sanitation, and food access
- Indicators of vulnerability across affected populations
- Reported loss and damage, including impacts on housing, livelihoods, and assets
The dataset is designed to accommodate partial information, as complete data is often unavailable in real-world reporting contexts. The table below shows an example entry.
| Field | Example |
|---|---|
| Event ID | CDL-IND-2024-001 |
| Location (Origin) | Coastal Odisha |
| Destination | Bhubaneswar |
| Hazard | Cyclone |
| Date | May 2024 |
| Estimated Displacement | 12,000 people |
| Movement | Rural → Urban |
| Housing | Temporary shelters |
| Water Access | Limited |
| Source | ReliefWeb |
How data is collected
Data is collected through a multi-source approach that combines desk-based research with remote engagement.
- Verified news reports and media coverage
- Government publications and official statements
- Humanitarian situation reports produced by NGOs and international agencies
- Academic and field-based research, where available
- Direct conversations with affected communities conducted remotely
Given the absence of continuous field presence, the methodology relies on systematic extraction and structuring of publicly available information.
Remote interactions with affected communities are used, where feasible, to validate and contextualize reported information.
Priority is given to sources that provide specific, time-bound, and location-referenced data.
Data quality and transparency
All data points are linked to their original sources to ensure traceability. The dataset does not introduce estimates or inferred values where data is not available.
Where information is missing, fields are left unfilled rather than approximated. This approach prioritizes transparency over completeness.
In cases where multiple sources report differing figures or details, the most consistent or clearly attributable information is retained, and discrepancies are noted where necessary.
The dataset is structured to allow users to assess the reliability and origin of each data point.
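To illustrate how differing figures might be handled, the sketch below keeps the figure that the most sources agree on and records the others as a discrepancy note. The function name `reconcile_figures`, the input shape, and the majority rule are assumptions made for illustration only; choosing the "most consistent or clearly attributable" figure in practice involves editorial judgment that this sketch does not capture.

```python
from collections import Counter
from typing import Optional


def reconcile_figures(reports: dict[str, Optional[int]]) -> dict:
    """Pick one displacement figure from several source reports.

    `reports` maps a source name to the figure it reported, or None if the
    source gave no figure. Hypothetical helper, not the dataset's own tooling.
    """
    # Ignore sources that reported no figure; nothing is estimated or inferred.
    reported = {src: n for src, n in reports.items() if n is not None}
    if not reported:
        # The field stays unfilled rather than approximated.
        return {"value": None, "sources": [], "discrepancies": {}}

    # Assumed rule: retain the figure that the most sources agree on.
    # Ties go to the figure that appears first in the input.
    value, _ = Counter(reported.values()).most_common(1)[0]

    return {
        "value": value,
        # Every retained value stays linked to the sources that reported it.
        "sources": [src for src, n in reported.items() if n == value],
        # Differing figures are noted alongside their sources, not discarded.
        "discrepancies": {src: n for src, n in reported.items() if n != value},
    }
```

Calling it with three hypothetical sources, two of which report 12,000 and one 15,000, would retain 12,000 and list the third source under `discrepancies`.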
Limitations
The dataset is subject to several limitations.
Displacement events are often underreported, particularly in rural or remote areas. Media and institutional coverage tends to focus on large-scale or sudden-onset events, leaving gaps in the coverage of smaller events and slow-onset processes.
Data availability varies significantly across regions and hazard types. As a result, some events may have detailed information while others remain partial.
The reliance on secondary sources and remote data collection limits the ability to verify all aspects of an event on the ground.
Efforts are being made to expand direct data collection through field engagement, but this depends on access and available resources.
Confidence and verification levels
Each displacement event is assigned a confidence level based on the quality and consistency of available information.
This is not a score of impact, but a measure of how reliably the event is documented.
- High confidence: Multiple independent sources report consistent information, with clear references to time, location, and scale.
- Medium confidence: Information is available from at least one credible source, but may lack completeness or cross-verification.
- Low confidence: Limited or fragmented reporting, where key details such as scale, exact location, or timing are unclear.
Confidence levels are intended to help users interpret the dataset critically, especially when comparing across regions or event types.
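The three levels can be read as a simple decision rule. The following is a minimal sketch under stated assumptions: the `Evidence` fields, the threshold of two independent sources for "multiple", and the requirement that a medium-confidence event has at least a time and a location are all illustrative choices, not rules stated by the dataset.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Summary of what the sources say about one event (hypothetical shape)."""
    independent_sources: int  # number of independent sources found
    credible_sources: int     # how many of those are judged credible
    sources_consistent: bool  # whether the sources agree on the key details
    has_time: bool            # a clear time reference is reported
    has_location: bool        # a clear location is reported
    has_scale: bool           # a displacement figure is reported


def confidence_level(e: Evidence) -> str:
    """Return "high", "medium", or "low" for how reliably an event is documented."""
    complete = e.has_time and e.has_location and e.has_scale

    # High: multiple independent, consistent sources with time, location, and scale.
    if e.independent_sources >= 2 and e.sources_consistent and complete:
        return "high"

    # Medium: at least one credible source; scale or cross-verification may be missing.
    # Assumption: time and location must still be present.
    if e.credible_sources >= 1 and e.has_time and e.has_location:
        return "medium"

    # Low: limited or fragmented reporting with unclear key details.
    return "low"
```

The rule measures documentation quality only. It never looks at the size of the displacement figure, which matches the point that confidence is not a score of impact.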
Event verification approach
Verification of each event involves:
- Identifying whether the displacement is directly or indirectly linked to a climate-related hazard
- Checking for consistency across available sources
- Confirming minimum required attributes such as location and time reference
- Distinguishing between reported displacement and projected or anticipated displacement
Events based purely on forecasts, projections, or policy discussions are not included unless displacement has actually occurred.
Where possible, remote conversations with affected communities are used to validate or enrich reported information.
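The checks above can be sketched as a single inclusion test. The `Candidate` fields and the `check_event` function are hypothetical names introduced for illustration. The sketch also assumes that inconsistent sources are flagged rather than treated as grounds for exclusion, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate event before inclusion (hypothetical shape)."""
    climate_link: Optional[str]    # "direct", "indirect", or None if no link is found
    location: Optional[str]
    time_reference: Optional[str]
    displacement_occurred: bool    # False for forecasts, projections, policy discussions
    sources_consistent: bool


def check_event(c: Candidate) -> dict:
    """Apply the verification checks and report why an event is excluded, if it is."""
    reasons = []  # blocking: the event is not included
    flags = []    # non-blocking: the event is included but annotated

    if c.climate_link not in ("direct", "indirect"):
        reasons.append("no link to a climate-related hazard")
    if not c.location:
        reasons.append("missing location")
    if not c.time_reference:
        reasons.append("missing time reference")
    if not c.displacement_occurred:
        reasons.append("projected or anticipated displacement only")
    # Assumption: inconsistent sources are annotated rather than excluded.
    if not c.sources_consistent:
        flags.append("sources inconsistent")

    return {"include": not reasons, "reasons": reasons, "flags": flags}
```

Returning the reasons alongside the decision keeps exclusions traceable, in the same spirit as linking every data point to its source.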
Data structure and schema
The dataset is structured at the level of individual displacement events, with fields grouped as follows:
- Event identification
- Geographic attributes
- Temporal attributes
- Hazard classification
- Displacement characteristics
- Conditions at destination
- Vulnerability indicators
- Loss and damage
- Source metadata
The schema is designed to be extensible and scalable.
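One way to picture an event record is as a typed structure with one block of fields per group. The sketch below is an assumption about how such a record could look in Python, not the dataset's published schema: the class name, field names, and types are hypothetical, and the values come from the example entry in the table above. Every field except the event ID is optional, so partial records remain valid.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DisplacementEvent:
    # Event identification
    event_id: str
    # Geographic attributes
    origin: Optional[str] = None
    destination: Optional[str] = None
    # Temporal attributes
    date: Optional[str] = None
    duration: Optional[str] = None
    # Hazard classification
    hazard: Optional[str] = None
    # Displacement characteristics
    estimated_displaced: Optional[int] = None
    movement: Optional[str] = None
    # Conditions at destination
    housing: Optional[str] = None
    water_access: Optional[str] = None
    # Vulnerability indicators
    vulnerability: dict[str, str] = field(default_factory=dict)
    # Loss and damage
    loss_and_damage: dict[str, str] = field(default_factory=dict)
    # Source metadata
    sources: list[str] = field(default_factory=list)


def missing_fields(event: DisplacementEvent) -> list[str]:
    """List fields left unfilled, so gaps stay visible instead of being approximated."""
    return [
        name
        for name, value in asdict(event).items()
        if value is None or value == {} or value == []
    ]


# The example entry from the table, with unreported fields left empty.
example = DisplacementEvent(
    event_id="CDL-IND-2024-001",
    origin="Coastal Odisha",
    destination="Bhubaneswar",
    date="May 2024",
    hazard="Cyclone",
    estimated_displaced=12000,
    movement="Rural → Urban",
    housing="Temporary shelters",
    water_access="Limited",
    sources=["ReliefWeb"],
)
```

In this layout, extending the schema means adding a new field with a default value, which leaves existing records valid.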