---
license: cc-by-4.0
tags:
- text
- news
- global
- knowledge-graph
- geopolitics
dataset_info:
  features:
  - name: GKGRECORDID
    dtype: string
  - name: DATE
    dtype: string
  - name: SourceCollectionIdentifier
    dtype: string
  - name: SourceCommonName
    dtype: string
  - name: DocumentIdentifier
    dtype: string
  - name: V1Counts
    dtype: string
  - name: V2.1Counts
    dtype: string
  - name: V1Themes
    dtype: string
  - name: V2EnhancedThemes
    dtype: string
  - name: V1Locations
    dtype: string
  - name: V2EnhancedLocations
    dtype: string
  - name: V1Persons
    dtype: string
  - name: V2EnhancedPersons
    dtype: string
  - name: V1Organizations
    dtype: string
  - name: V2EnhancedOrganizations
    dtype: string
  - name: V1.5Tone
    dtype: string
  - name: V2GCAM
    dtype: string
  - name: V2.1EnhancedDates
    dtype: string
  - name: V2.1Quotations
    dtype: string
  - name: V2.1AllNames
    dtype: string
  - name: V2.1Amounts
    dtype: string
  - name: tone
    dtype: float64
  splits:
  - name: train
    num_bytes: 3331097194
    num_examples: 281215
  - name: negative_tone
    num_bytes: 3331097194
    num_examples: 281215
  download_size: 2229048020
  dataset_size: 6662194388
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: negative_tone
    path: data/negative_tone-*
---

# Dataset Card for dwb2023/gdelt-gkg-march2020-v2

## Dataset Details

### Dataset Description

This dataset contains GDELT Global Knowledge Graph (GKG) data covering March 10-22, 2020, during the early phase of the COVID-19 pandemic. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.

- **Curated by:** dwb2023

### Dataset Sources

- **Repository:** [http://data.gdeltproject.org/gdeltv2](http://data.gdeltproject.org/gdeltv2)
- **GKG Documentation:** [GDELT 2.0 Overview](https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/), [GDELT GKG Codebook](http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf)
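The raw GKG files behind this window can be listed programmatically. The sketch below assumes that GDELT 2.0 publishes one GKG file every 15 minutes, named `YYYYMMDDHHMMSS.gkg.csv.zip` under the repository URL above (the naming used in GDELT's `masterfilelist.txt`). It also assumes the window runs from 2020-03-10 00:00 UTC up to, but not including, 2020-03-23 00:00 UTC. The function name `gkg_file_urls` is hypothetical and is not taken from the curation pipeline.

```python
from datetime import datetime, timedelta

# Assumed base URL and file naming for GDELT 2.0 GKG files.
BASE_URL = "http://data.gdeltproject.org/gdeltv2"


def gkg_file_urls(start: datetime, end: datetime) -> list[str]:
    """List 15-minute GKG file URLs from start up to (but excluding) end."""
    urls = []
    current = start
    while current < end:
        urls.append(f"{BASE_URL}/{current:%Y%m%d%H%M%S}.gkg.csv.zip")
        current += timedelta(minutes=15)
    return urls


# Hypothetical window: March 10 through the end of March 22, 2020 (UTC).
urls = gkg_file_urls(datetime(2020, 3, 10), datetime(2020, 3, 23))
print(len(urls), urls[0])
```

Downloading and unzipping each file is left out so the sketch runs offline.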

## Uses

### Direct Use

This dataset is suitable for:

- Temporal analysis of global events
- Relationship mapping of key actors in supply chain and logistics
- Sentiment and thematic analysis of COVID-19 pandemic narratives
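For temporal analysis, a common first step is bucketing records by day. This minimal sketch assumes the `DATE` column keeps the GKG 2.1 codebook's `YYYYMMDDHHMMSS` format. The helper names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def parse_gkg_date(value: str) -> datetime:
    """Parse a GKG DATE string assumed to be in YYYYMMDDHHMMSS form."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def records_per_day(dates: list[str]) -> dict[str, int]:
    """Count records per calendar day (YYYY-MM-DD), sorted by day."""
    counts = Counter(parse_gkg_date(d).strftime("%Y-%m-%d") for d in dates)
    return dict(sorted(counts.items()))


# Hypothetical DATE values, not taken from the dataset.
sample_dates = ["20200310001500", "20200310234500", "20200311120000"]
print(records_per_day(sample_dates))
```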

### Out-of-Scope Use

- Not designed for real-time monitoring due to its historical, static nature
- Not intended for medical diagnosis or predictive health modeling

## Dataset Structure

### Features and Relationships

This dataset retains a subset of the columns from the source GDELT GKG data.

| Name | Type | Aspect | Description |
|------|------|---------|-------------|
| GKGRECORDID | string | Metadata | Unique identifier of the GKG record |
| DATE | string | Metadata | Publication date of the article/document |
| SourceCollectionIdentifier | string | Metadata | Numeric code for the type of source collection (e.g., 1 = WEB) |
| SourceCommonName | string | Metadata | Common/display name of the source |
| DocumentIdentifier | string | Metadata | Unique URL/identifier of the document |
| V1Counts | string | Metrics | Original counts extraction (e.g., numbers of people affected) with associated locations |
| V2.1Counts | string | Metrics | Enhanced numeric pattern extraction |
| V1Themes | string | Classification | Original thematic categorization |
| V2EnhancedThemes | string | Classification | Expanded theme taxonomy and classification |
| V1Locations | string | Entities | Original geographic mentions |
| V2EnhancedLocations | string | Entities | Enhanced location extraction with coordinates |
| V1Persons | string | Entities | Original person name mentions |
| V2EnhancedPersons | string | Entities | Enhanced person name extraction |
| V1Organizations | string | Entities | Original organization mentions |
| V2EnhancedOrganizations | string | Entities | Enhanced organization name extraction |
| V1.5Tone | string | Sentiment | Comma-delimited tone metrics (average tone, positive and negative scores, polarity, reference densities, word count) |
| V2GCAM | string | Sentiment | Global Content Analysis Measures |
| V2.1EnhancedDates | string | Temporal | Temporal reference extraction |
| V2.1Quotations | string | Content | Direct quote extraction |
| V2.1AllNames | string | Entities | Comprehensive named entity extraction |
| V2.1Amounts | string | Metrics | Quantity and measurement extraction |
| tone | float64 | Sentiment | Numeric tone score (added column; not part of the raw GKG schema) |
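Because `V1.5Tone` is stored as a single string, it usually needs to be split before analysis. This sketch assumes the seven comma-delimited values listed in the GKG 2.1 codebook, in this order: tone, positive score, negative score, polarity, activity reference density, self/group reference density, word count. The `ToneMetrics` class and its field names are hypothetical. The `tone` float64 column may hold the first of these values, but that is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToneMetrics:
    """Seven V1.5Tone values, in the order assumed from the GKG 2.1 codebook."""
    tone: float                 # average tone (positive minus negative score)
    positive: float             # positive score
    negative: float             # negative score
    polarity: float
    activity_density: float     # activity reference density
    self_group_density: float   # self/group reference density
    word_count: int


def parse_v15_tone(value: str) -> ToneMetrics:
    """Split a comma-delimited V1.5Tone string into named fields."""
    parts = value.split(",")
    if len(parts) < 7:
        raise ValueError(f"expected 7 comma-separated values, got {len(parts)}")
    scores = [float(p) for p in parts[:6]]
    return ToneMetrics(*scores, word_count=int(float(parts[6])))


# Hypothetical V1.5Tone value, not taken from the dataset.
metrics = parse_v15_tone("-3.5,1.2,4.7,5.9,20.1,0.5,420")
print(metrics.tone, metrics.word_count)
```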

### Aspects Overview
- **Metadata**: Core document information
- **Metrics**: Numerical measurements and counts
- **Classification**: Categorical and thematic analysis
- **Entities**: Named entity recognition (locations, persons, organizations)
- **Sentiment**: Emotional and tone analysis
- **Temporal**: Time-related information
- **Content**: Direct content extraction
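The entity and classification columns are delimited strings. This minimal sketch assumes the GKG 2.1 layouts. `V1Themes` and `V1Persons` are plain semicolon-separated lists. `V2EnhancedThemes` and `V2EnhancedPersons` use `Name,CharOffset` blocks. `V2EnhancedLocations` uses `#`-separated fields: type, full name, country code, ADM1, ADM2, latitude, longitude, feature ID, and character offset. The helper names and sample strings are hypothetical.

```python
def split_blocks(value: str) -> list[str]:
    """Split a semicolon-delimited GKG field, dropping empty trailing entries."""
    return [block for block in value.split(";") if block]


def parse_offset_pairs(value: str) -> list[tuple[str, int]]:
    """Parse 'Name,CharOffset' blocks, as in V2EnhancedThemes or V2EnhancedPersons."""
    pairs = []
    for block in split_blocks(value):
        # rpartition keeps any commas inside the name itself
        name, _, offset = block.rpartition(",")
        pairs.append((name, int(offset)))
    return pairs


# Assumed V2EnhancedLocations field order (per the GKG 2.1 codebook).
LOCATION_FIELDS = ("type", "full_name", "country_code", "adm1", "adm2",
                   "lat", "lon", "feature_id", "offset")


def parse_enhanced_locations(value: str) -> list[dict]:
    """Parse '#'-delimited location blocks into dictionaries."""
    locations = []
    for block in split_blocks(value):
        parts = block.split("#")
        if len(parts) != len(LOCATION_FIELDS):
            continue  # skip blocks that do not match the assumed layout
        loc = dict(zip(LOCATION_FIELDS, parts))
        loc["type"] = int(loc["type"])
        loc["lat"] = float(loc["lat"]) if loc["lat"] else None
        loc["lon"] = float(loc["lon"]) if loc["lon"] else None
        loc["offset"] = int(loc["offset"])
        locations.append(loc)
    return locations


# Hypothetical field values, not taken from the dataset.
print(split_blocks("HEALTH_PANDEMIC;TAX_DISEASE_CORONAVIRUS;"))
print(parse_offset_pairs("HEALTH_PANDEMIC,120;TAX_DISEASE_CORONAVIRUS,455;"))
print(parse_enhanced_locations("1#United States#US#US##39.828#-98.5795#US#350"))
```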

## Dataset Creation

### Curation Rationale
This dataset was curated to capture the rapidly evolving global narrative during the early phase of the COVID-19 pandemic, focusing specifically on March 10–22, 2020. By zeroing in on this critical period, it offers a granular perspective on how geopolitical events, actor relationships, and thematic discussions shifted amid the escalating pandemic. The enhanced GKG features further enable advanced entity, sentiment, and thematic analysis, making it a valuable resource for studying the socio-political and economic impacts of COVID-19 during a pivotal point in global history.

### Curation Approach
A targeted subset of GDELT’s columns was selected to streamline analysis of key entities (locations, persons, organizations), thematic tags, and sentiment scores, which are core components of many knowledge-graph and text analytics workflows. This approach balances comprehensive coverage with manageable data size and performance. The ETL pipeline used to produce these transformations is documented here:
[https://gist.github.com/donbr/e2af2bbe441f90b8664539a25957a6c0](https://gist.github.com/donbr/e2af2bbe441f90b8664539a25957a6c0).
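The column selection can be sketched in plain Python. This is not the linked ETL pipeline. It assumes that a raw GKG 2.1 file is tab-delimited with no header and 27 columns in the order shown below. It drops the image, translation, and extras columns that are absent from this dataset's schema. It also derives `tone` from the first `V1.5Tone` value, which is an assumption about how that column was built. `RAW_COLUMNS`, `select_record`, and `read_gkg` are hypothetical names.

```python
import csv
import io

# Assumed 27-column order of a raw GKG 2.1 file (tab-delimited, no header),
# named here to match this dataset's columns; verify against the codebook.
RAW_COLUMNS = [
    "GKGRECORDID", "DATE", "SourceCollectionIdentifier", "SourceCommonName",
    "DocumentIdentifier", "V1Counts", "V2.1Counts", "V1Themes", "V2EnhancedThemes",
    "V1Locations", "V2EnhancedLocations", "V1Persons", "V2EnhancedPersons",
    "V1Organizations", "V2EnhancedOrganizations", "V1.5Tone", "V2.1EnhancedDates",
    "V2GCAM", "V2.1SharingImage", "V2.1RelatedImages", "V2.1SocialImageEmbeds",
    "V2.1SocialVideoEmbeds", "V2.1Quotations", "V2.1AllNames", "V2.1Amounts",
    "V2.1TranslationInfo", "V2ExtrasXML",
]
# Raw columns that do not appear in this dataset's schema.
DROPPED = {
    "V2.1SharingImage", "V2.1RelatedImages", "V2.1SocialImageEmbeds",
    "V2.1SocialVideoEmbeds", "V2.1TranslationInfo", "V2ExtrasXML",
}


def select_record(fields: list[str]) -> dict:
    """Keep this dataset's subset of columns and add a numeric tone."""
    if len(fields) != len(RAW_COLUMNS):
        raise ValueError(f"expected {len(RAW_COLUMNS)} fields, got {len(fields)}")
    row = {name: value for name, value in zip(RAW_COLUMNS, fields) if name not in DROPPED}
    # Assumption: tone is the first (average tone) value of V1.5Tone.
    first = row["V1.5Tone"].split(",")[0]
    row["tone"] = float(first) if first else float("nan")
    return row


def read_gkg(text: str) -> list[dict]:
    """Parse the decoded text of one raw GKG file into selected records."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [select_record(fields) for fields in reader if fields]


# Hypothetical single-record file with mostly empty fields.
sample_fields = (["20200310000000-0", "20200310000000", "1", "example.com",
                  "https://example.com/a"] + [""] * 10
                 + ["-2.5,1.0,3.5,4.5,20.0,0.0,300"] + [""] * 11)
rows = read_gkg("\t".join(sample_fields) + "\n")
print(len(rows[0]), rows[0]["tone"])
```

`csv.QUOTE_NONE` keeps quote characters in article text from being treated as field delimiters. Very long fields such as `V2GCAM` may require raising `csv.field_size_limit`.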

## Citation

When using this dataset, please cite both this dataset and the original GDELT project:

```bibtex
@misc{gdelt-gkg-march2020,
    title = {GDELT Global Knowledge Graph March 2020 Dataset},
    author = {dwb2023},
    year = {2025},
    publisher = {Hugging Face},
    url = {https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2}
}
```

## Dataset Card Contact

For questions and comments about this dataset card, please contact dwb2023 through the Hugging Face platform.