license: cc-by-4.0
tags:
- text
- news
- global
- knowledge-graph
- geopolitics
dataset_info:
  features:
  - name: GKGRECORDID
    dtype: string
  - name: DATE
    dtype: string
  - name: SourceCollectionIdentifier
    dtype: string
  - name: SourceCommonName
    dtype: string
  - name: DocumentIdentifier
    dtype: string
  - name: V1Counts
    dtype: string
  - name: V2.1Counts
    dtype: string
  - name: V1Themes
    dtype: string
  - name: V2EnhancedThemes
    dtype: string
  - name: V1Locations
    dtype: string
  - name: V2EnhancedLocations
    dtype: string
  - name: V1Persons
    dtype: string
  - name: V2EnhancedPersons
    dtype: string
  - name: V1Organizations
    dtype: string
  - name: V2EnhancedOrganizations
    dtype: string
  - name: V1.5Tone
    dtype: string
  - name: V2.1EnhancedDates
    dtype: string
  - name: V2GCAM
    dtype: string
  - name: V2.1SharingImage
    dtype: string
  - name: V2.1Quotations
    dtype: string
  - name: V2.1AllNames
    dtype: string
  - name: V2.1Amounts
    dtype: string
Dataset Card for dwb2023/gdelt-gkg-2025-v2
Dataset Details
Dataset Description
This dataset contains GDELT Global Knowledge Graph (GKG) data covering February 2025. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.
- Curated by: dwb2023
Dataset Sources
- Repository: http://data.gdeltproject.org/gdeltv2
- GKG Documentation: GDELT 2.0 Overview, GDELT GKG Codebook
Uses
Direct Use
This dataset is suitable for:
- Temporal analysis of global events
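As a minimal sketch of the temporal analysis mentioned above, the snippet below buckets records by calendar day. It assumes the DATE column holds strings in the `YYYYMMDDHHMMSS` form used by GDELT GKG 2.1; the helper names `parse_gkg_date` and `records_per_day` and the sample values are hypothetical, made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_gkg_date(value: str) -> datetime:
    """Parse a GKG DATE string, assumed to be in YYYYMMDDHHMMSS form."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def records_per_day(dates):
    """Count records per calendar day, keyed by ISO date string."""
    counts = Counter()
    for value in dates:
        counts[parse_gkg_date(value).date().isoformat()] += 1
    # Sort by day so the result reads as a simple time series.
    return dict(sorted(counts.items()))


# Hypothetical DATE values, not taken from the dataset.
sample_dates = ["20250201000000", "20250201001500", "20250202120000"]
daily_counts = records_per_day(sample_dates)
print(daily_counts)
```

The same grouping could be done per hour or per source by changing the key that is counted.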
Out-of-Scope Use
- Not designed for real-time monitoring, because the data is a static historical snapshot
- Not intended for medical diagnosis or predictive health modeling
Dataset Structure
Features and Relationships
- This dataset keeps a subset of the features available in the source GDELT dataset.
Name | Type | Aspect | Description |
---|---|---|---|
GKGRECORDID | string | Metadata | Unique identifier of the GKG record |
DATE | string | Metadata | Publication date of the article/document |
SourceCollectionIdentifier | string | Metadata | Code identifying the source collection the document came from |
SourceCommonName | string | Metadata | Common/display name of the source |
DocumentIdentifier | string | Metadata | Unique URL/identifier of the document |
V1Counts | string | Metrics | Original extraction of counts mentioned in the text |
V2.1Counts | string | Metrics | Enhanced numeric pattern extraction |
V1Themes | string | Classification | Original thematic categorization |
V2EnhancedThemes | string | Classification | Expanded theme taxonomy and classification |
V1Locations | string | Entities | Original geographic mentions |
V2EnhancedLocations | string | Entities | Enhanced location extraction with coordinates |
V1Persons | string | Entities | Original person name mentions |
V2EnhancedPersons | string | Entities | Enhanced person name extraction |
V1Organizations | string | Entities | Original organization mentions |
V2EnhancedOrganizations | string | Entities | Enhanced organization name extraction |
V1.5Tone | string | Sentiment | Original emotional tone scoring |
V2.1EnhancedDates | string | Temporal | Temporal reference extraction |
V2GCAM | string | Sentiment | Global Content Analysis Measures |
V2.1SharingImage | string | Content | URL of document image |
V2.1Quotations | string | Content | Direct quote extraction |
V2.1AllNames | string | Entities | Comprehensive named entity extraction |
V2.1Amounts | string | Metrics | Quantity and measurement extraction |
Aspects Overview:
- Metadata: Core document information
- Metrics: Numerical measurements and counts
- Classification: Categorical and thematic analysis
- Entities: Named entity recognition (locations, persons, organizations)
- Sentiment: Emotional and tone analysis
- Temporal: Time-related information
- Content: Direct content extraction
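Because every feature is stored as a string, most columns need to be split into their parts before analysis. The sketch below shows one way to do that for three of them. It assumes the delimiters described in the GDELT GKG 2.1 codebook: V1.5Tone as seven comma-separated numbers, V1Themes as a semicolon-separated list, and V1Locations as semicolon-separated blocks of seven `#`-separated fields. The function names, the dictionary keys, and the sample strings are hypothetical.

```python
from typing import Dict, List, Optional

# Field order assumed from the GKG 2.1 codebook entry for V1.5Tone.
TONE_FIELDS = [
    "tone",
    "positive_score",
    "negative_score",
    "polarity",
    "activity_reference_density",
    "self_group_reference_density",
    "word_count",
]


def parse_tone(raw: str) -> Dict[str, float]:
    """Map the comma-separated V1.5Tone values onto named fields."""
    values = raw.split(",") if raw else []
    return {
        name: float(value)
        for name, value in zip(TONE_FIELDS, values)
        if value.strip()
    }


def parse_themes(raw: str) -> List[str]:
    """Split a semicolon-separated V1Themes string, dropping empty entries."""
    return [theme for theme in raw.split(";") if theme]


def _to_float(text: str) -> Optional[float]:
    """Convert a coordinate string to float, or None when it is empty."""
    return float(text) if text else None


def parse_v1_locations(raw: str) -> List[dict]:
    """Split V1Locations into one dictionary per location block."""
    locations = []
    for block in raw.split(";"):
        fields = block.split("#")
        if len(fields) != 7:
            continue  # skip empty or malformed blocks
        loc_type, name, country, adm1, lat, lon, feature_id = fields
        locations.append(
            {
                "type": loc_type,
                "full_name": name,
                "country_code": country,
                "adm1_code": adm1,
                "latitude": _to_float(lat),
                "longitude": _to_float(lon),
                "feature_id": feature_id,
            }
        )
    return locations


# Made-up sample strings for illustration only.
print(parse_tone("-2.5,1.5,4.0,5.5,20.0,0.5,400"))
print(parse_themes("TAX_FNCACT;EPU_POLICY;"))
print(parse_v1_locations("1#France#FR#FR#46#2#FR"))
```

The enhanced (V2) columns carry extra fields such as character offsets, so they would need their own parsers rather than reusing these.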
Dataset Creation
Curation Rationale
This dataset was curated to capture the rapidly evolving global narrative during February 2025. By focusing on this single month, it offers a granular view of how geopolitical events, actor relationships, and thematic discussions shifted over the period. The enhanced GKG features further enable advanced entity, sentiment, and thematic analysis, making the dataset a useful resource for studying the socio-political and economic impacts of emergent LLM capabilities.
Curation Approach
A targeted subset of GDELT's columns was selected to streamline analysis of key entities (locations, persons, organizations), thematic tags, and sentiment scores, which are core components of many knowledge-graph and text analytics workflows. This approach balances comprehensive coverage with manageable data size and performance. The ETL pipeline used to produce these transformations is documented here: https://gist.github.com/donbr/5293468436a1a39bd2d9f4959cbd4923.
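The column selection described above can be sketched as follows. This is not the author's ETL pipeline (see the gist for that), only an illustration under stated assumptions: that a raw GKG 2.1 file is tab-separated with 27 columns and no header row, in the order given by the GKG codebook, and that the five columns missing from this card's feature list are the ones dropped. The names `RAW_COLUMNS`, `DROPPED_COLUMNS`, `KEPT_COLUMNS`, and `select_columns` are hypothetical, as are the labels used for the dropped columns.

```python
import csv
import io

# Assumed raw GKG 2.1 column order; the kept names follow this dataset card.
RAW_COLUMNS = [
    "GKGRECORDID", "DATE", "SourceCollectionIdentifier", "SourceCommonName",
    "DocumentIdentifier", "V1Counts", "V2.1Counts", "V1Themes",
    "V2EnhancedThemes", "V1Locations", "V2EnhancedLocations", "V1Persons",
    "V2EnhancedPersons", "V1Organizations", "V2EnhancedOrganizations",
    "V1.5Tone", "V2.1EnhancedDates", "V2GCAM", "V2.1SharingImage",
    "V2.1RelatedImages", "V2.1SocialImageEmbeds", "V2.1SocialVideoEmbeds",
    "V2.1Quotations", "V2.1AllNames", "V2.1Amounts", "V2.1TranslationInfo",
    "V2ExtrasXML",
]

# Columns assumed to be excluded, since they do not appear in the card.
DROPPED_COLUMNS = {
    "V2.1RelatedImages", "V2.1SocialImageEmbeds", "V2.1SocialVideoEmbeds",
    "V2.1TranslationInfo", "V2ExtrasXML",
}

KEPT_COLUMNS = [name for name in RAW_COLUMNS if name not in DROPPED_COLUMNS]


def select_columns(tsv_text: str) -> list:
    """Read headerless tab-separated GKG rows and keep only KEPT_COLUMNS."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    keep_indexes = [RAW_COLUMNS.index(name) for name in KEPT_COLUMNS]
    rows = []
    for row in reader:
        if len(row) != len(RAW_COLUMNS):
            continue  # skip rows that do not match the assumed layout
        rows.append({name: row[i] for name, i in zip(KEPT_COLUMNS, keep_indexes)})
    return rows


# One made-up row with placeholder values value0 .. value26.
sample_tsv = "\t".join(f"value{i}" for i in range(len(RAW_COLUMNS))) + "\n"
selected = select_columns(sample_tsv)
print(len(KEPT_COLUMNS), selected[0]["GKGRECORDID"])
```

Selecting columns by name rather than by position keeps the mapping readable and makes it easy to adjust if the assumed layout turns out to differ.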
Citation
When using this dataset, please cite both the dataset and the original GDELT project:
@misc{gdelt-gkg-2025-v2,
title = {GDELT Global Knowledge Graph 2025 Dataset},
author = {dwb2023},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/dwb2023/gdelt-gkg-2025-v2}
}
Dataset Card Contact
For questions and comments about this dataset card, please contact dwb2023 through the Hugging Face platform.