---
license: cc-by-4.0
tags:
- text
- news
- global
- knowledge-graph
- geopolitics
dataset_info:
  features:
  - name: GKGRECORDID
    dtype: string
  - name: DATE
    dtype: string
  - name: SourceCollectionIdentifier
    dtype: string
  - name: SourceCommonName
    dtype: string
  - name: DocumentIdentifier
    dtype: string
  - name: V1Counts
    dtype: string
  - name: V2.1Counts
    dtype: string
  - name: V1Themes
    dtype: string
  - name: V2EnhancedThemes
    dtype: string
  - name: V1Locations
    dtype: string
  - name: V2EnhancedLocations
    dtype: string
  - name: V1Persons
    dtype: string
  - name: V2EnhancedPersons
    dtype: string
  - name: V1Organizations
    dtype: string
  - name: V2EnhancedOrganizations
    dtype: string
  - name: V1.5Tone
    dtype: string
  - name: V2.1EnhancedDates
    dtype: string
  - name: V2GCAM
    dtype: string
  - name: V2.1SharingImage
    dtype: string
  - name: V2.1Quotations
    dtype: string
  - name: V2.1AllNames
    dtype: string
  - name: V2.1Amounts
    dtype: string
---

# Dataset Card for dwb2023/gdelt-gkg-2025-v2

## Dataset Details

### Dataset Description

This dataset contains GDELT Global Knowledge Graph (GKG) data covering February 2025. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.

- **Curated by:** dwb2023

### Dataset Sources

- **Repository:** [http://data.gdeltproject.org/gdeltv2](http://data.gdeltproject.org/gdeltv2)
- **GKG Documentation:** [GDELT 2.0 Overview](https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/), [GDELT GKG Codebook](http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf)

## Uses

### Direct Use

This dataset is suitable for:

- Temporal analysis of global events
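
As a rough illustration of temporal analysis, the sketch below counts records per day from `DATE` strings. It assumes `DATE` keeps the `YYYYMMDDHHMMSS` layout described in the GKG 2.1 codebook; that assumption should be checked against the actual data. The helper names `parse_gkg_date` and `records_per_day` and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def parse_gkg_date(raw: str) -> datetime:
    """Parse a DATE string, assuming the YYYYMMDDHHMMSS layout."""
    return datetime.strptime(raw, "%Y%m%d%H%M%S")


def records_per_day(dates: Iterable[str]) -> Counter:
    """Count how many records fall on each calendar day (ISO date keys)."""
    return Counter(parse_gkg_date(d).date().isoformat() for d in dates)


# Made-up sample values for illustration, not taken from the dataset.
sample_dates = ["20250201000000", "20250201001500", "20250202120000"]
daily = records_per_day(sample_dates)
print(daily)
```

The same function can be applied to the `DATE` column after loading the dataset with whichever tool you prefer.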

### Out-of-Scope Use

- Not designed for real-time monitoring, because it is a static historical snapshot
- Not intended for medical diagnosis or predictive health modeling

## Dataset Structure

### Features and Relationships

This dataset contains a subset of the features in the source GDELT GKG data.

| Name | Type | Aspect | Description |
|------|------|---------|-------------|
| GKGRECORDID | string | Metadata | Unique identifier of the GKG record |
| DATE | string | Metadata | Publication date of the article/document |
| SourceCollectionIdentifier | string | Metadata | Unique identifier for the source collection |
| SourceCommonName | string | Metadata | Common/display name of the source |
| DocumentIdentifier | string | Metadata | Unique URL/identifier of the document |
| V1Counts | string | Metrics | Original count mentions of numeric values |
| V2.1Counts | string | Metrics | Enhanced numeric pattern extraction |
| V1Themes | string | Classification | Original thematic categorization |
| V2EnhancedThemes | string | Classification | Expanded theme taxonomy and classification |
| V1Locations | string | Entities | Original geographic mentions |
| V2EnhancedLocations | string | Entities | Enhanced location extraction with coordinates |
| V1Persons | string | Entities | Original person name mentions |
| V2EnhancedPersons | string | Entities | Enhanced person name extraction |
| V1Organizations | string | Entities | Original organization mentions |
| V2EnhancedOrganizations | string | Entities | Enhanced organization name extraction |
| V1.5Tone | string | Sentiment | Original emotional tone scoring |
| V2.1EnhancedDates | string | Temporal | Temporal reference extraction |
| V2GCAM | string | Sentiment | Global Content Analysis Measures |
| V2.1SharingImage | string | Content | URL of document image |
| V2.1Quotations | string | Content | Direct quote extraction |
| V2.1AllNames | string | Entities | Comprehensive named entity extraction |
| V2.1Amounts | string | Metrics | Quantity and measurement extraction |
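
Every feature is stored as a string, so most fields need to be split before analysis. The sketch below parses two of them, assuming the dataset preserves the raw delimiters from the GKG 2.1 codebook: `V1.5Tone` as comma-delimited numbers and `V1Themes` as a semicolon-delimited list. The helper names and sample values are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

# Field order for V1.5Tone as listed in the GKG 2.1 codebook.
TONE_FIELDS = [
    "tone",
    "positive_score",
    "negative_score",
    "polarity",
    "activity_reference_density",
    "self_group_reference_density",
    "word_count",
]


def parse_tone(raw: str) -> Dict[str, float]:
    """Split a comma-delimited V1.5Tone string into named floats."""
    if not raw:
        return {}
    values = [float(v) for v in raw.split(",")]
    return dict(zip(TONE_FIELDS, values))


def parse_themes(raw: str) -> List[str]:
    """Split a semicolon-delimited V1Themes string, dropping empty entries."""
    if not raw:
        return []
    return [theme for theme in raw.split(";") if theme]


# Made-up sample values for illustration, not taken from the dataset.
sample_tone = "-2.5,1.5,4.0,5.5,20.0,0.5,400"
sample_themes = "TAX_FNCACT;EPU_POLICY;LEADER;"
tone = parse_tone(sample_tone)
themes = parse_themes(sample_themes)
print(tone["tone"], themes)
```

The enhanced (`V2...`) fields use richer encodings, such as character offsets appended to each entry, so they would need their own parsers written against the codebook.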

### Aspects Overview
- **Metadata**: Core document information
- **Metrics**: Numerical measurements and counts
- **Classification**: Categorical and thematic analysis
- **Entities**: Named entity recognition (locations, persons, organizations)
- **Sentiment**: Emotional and tone analysis
- **Temporal**: Time-related information
- **Content**: Direct content extraction

## Dataset Creation

### Curation Rationale
This dataset was curated to capture the rapidly evolving global narrative during February 2025. Focusing on a single month offers a granular view of how geopolitical events, actor relationships, and thematic discussions shifted over that period. The enhanced GKG features further enable entity, sentiment, and thematic analysis, making the dataset a resource for studying the socio-political and economic impacts of emergent LLM capabilities.

### Curation Approach
A targeted subset of GDELT’s columns was selected to streamline analysis of key entities (locations, persons, organizations), thematic tags, and sentiment scores, which are core components of many knowledge-graph and text analytics workflows. This approach balances comprehensive coverage with manageable data size and performance. The ETL pipeline used to produce these transformations is documented here:
[https://gist.github.com/donbr/5293468436a1a39bd2d9f4959cbd4923](https://gist.github.com/donbr/5293468436a1a39bd2d9f4959cbd4923).

## Citation

When using this dataset, please cite both the dataset and the original GDELT project:

```bibtex
@misc{gdelt-gkg-2025-v2,
    title = {GDELT Global Knowledge Graph 2025 Dataset},
    author = {dwb2023},
    year = {2025},
    publisher = {Hugging Face},
    url = {https://huggingface.co/datasets/dwb2023/gdelt-gkg-2025-v2}
}
```

## Dataset Card Contact

For questions and comments about this dataset card, please contact dwb2023 through the Hugging Face platform.