# UN Resolution Vote Predictor
#### Documentation & Methodology

#### Overview

This project presents a machine learning–based tool that predicts how each United Nations member state would vote on a given General Assembly resolution, based solely on the text content of that resolution. 

The goal is to create a realistic, interactive instrument that simulates voting behavior for academic, journalistic, or policy-oriented purposes.

#### Objectives

- Model voting behavior of UN member states using resolution texts.

- Avoid reliance on explicit metadata (e.g., voting blocs, resolution topic).

- Focus on text-only input to preserve usability for a public-facing tool.

- Achieve a reasonable trade-off between accuracy and generalizability.

#### Data Collection

- Resolutions were gathered from publicly available UN voting records (digitallibrary.un.org).

- Each record contains:

  - Resolution text

  - Country name

  - Vote (Yes, No, Abstain)

To ensure consistent labeling, "Abstain" and "No" were merged into a single class, "Not Yes", converting the task into binary classification.
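As a minimal sketch of this merge, assuming the raw votes are stored as the strings "Yes", "No", and "Abstain": the function name `to_binary_label` and the 1/0 encoding are illustrative choices, not taken from the project code.

```python
def to_binary_label(vote: str) -> int:
    """Map a raw vote to a binary label: 1 for "Yes", 0 for "Not Yes"."""
    normalized = vote.strip().lower()
    if normalized not in {"yes", "no", "abstain"}:
        raise ValueError(f"Unexpected vote value: {vote!r}")
    # "No" and "Abstain" collapse into the single "Not Yes" class (0).
    return 1 if normalized == "yes" else 0


raw_votes = ["Yes", "No", "Abstain", "Yes"]
binary_labels = [to_binary_label(v) for v in raw_votes]
```

Rejecting unknown values explicitly keeps records such as absences from silently falling into the "Not Yes" class.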

- Source: United Nations General Assembly voting data (https://digitallibrary.un.org/record/4060887?ln=en)
- Period: 2000-2023
- Size: 262 resolutions
- Resolution texts were retrieved from digitallibrary.un.org

The vote column is imbalanced: "Yes" votes heavily outnumber the other classes.

#### Text Vectorization

Resolution texts were converted into dense numeric vectors using:

- Model: `sentence-transformers/all-MiniLM-L6-v2`

- Vector size: 384 dimensions

This model provides contextualized embeddings that preserve semantic structure, useful for distinguishing between subtly different resolution texts.
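The vectorization step could look roughly like the sketch below. It assumes the `sentence-transformers` package is installed; the helper names `load_encoder` and `embed_resolutions` are hypothetical and do not come from the project code.

```python
import numpy as np


def load_encoder(model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Load the sentence encoder (weights are downloaded on first use)."""
    # Imported lazily so embed_resolutions can be used with any compatible encoder.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed_resolutions(texts, encoder) -> np.ndarray:
    """Encode resolution texts into one dense row vector per text."""
    vectors = encoder.encode(list(texts), convert_to_numpy=True)
    return np.asarray(vectors, dtype=np.float32)
```

With the real encoder, `embed_resolutions([some_resolution_text], load_encoder())` should return an array of shape `(1, 384)`.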

#### Country Representation
Each country was represented as a categorical variable and passed through an embedding layer in the model.

- Countries were encoded using LabelEncoder.

- Embedding size: 32-dimensional per country.

This allowed the model to learn latent features per country.
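A small sketch of this encoding, assuming scikit-learn's `LabelEncoder` and a standalone `nn.Embedding`. The four-row country list is an illustrative sample, and the number of embeddings is derived from it rather than from the full set of member states.

```python
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder

# Illustrative sample; the real data has one row per (resolution, country) pair.
countries = ["CANADA", "ISRAEL", "TONGA", "CANADA"]

encoder = LabelEncoder()
country_ids = encoder.fit_transform(countries)  # classes are sorted, so CANADA -> 0

# One trainable 32-dimensional vector per distinct country.
embedding = nn.Embedding(num_embeddings=len(encoder.classes_), embedding_dim=32)
country_vecs = embedding(torch.as_tensor(country_ids, dtype=torch.long))
```

Rows that share a country ID look up the same vector, which is what lets the model learn one set of latent features per country.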

![image.png](attachment:image.png)

#### Model Architecture
A PyTorch model was implemented to predict voting behavior. It has two main inputs:

1. Resolution text vector (384-dim)

2. Country ID (embedded as 32-dim vector)

These are concatenated and passed through a feedforward network:

- Linear layer → ReLU → Dropout → Output layer (logit)

- Output is passed through a sigmoid for binary classification.

In [None]:
import torch
import torch.nn as nn

class VotePredictor(nn.Module):
    def __init__(self, text_dim=384, country_count=193, country_emb_dim=32, hidden_dim=256):
        super().__init__()

        # Embedding layer: one learnable vector per country ID
        self.country_embedding = nn.Embedding(country_count, country_emb_dim)

        # Core prediction model: returns one raw logit per (resolution, country) pair
        self.model = nn.Sequential(
            nn.Linear(text_dim + country_emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, 1)
        )

    def forward(self, text_vecs, country_ids):
        # text_vecs: (batch, text_dim); country_ids: (batch,) integer tensor
        country_vecs = self.country_embedding(country_ids)

        # Concatenate text and country features along the feature axis
        x = torch.cat([text_vecs, country_vecs], dim=1)

        return self.model(x)
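The model above returns raw logits, so the sigmoid is applied outside the network. A minimal sketch of that last step follows. It assumes a 0.5 decision threshold and that label 1 stands for "Yes", neither of which is stated in the source; `logits_to_votes` is a hypothetical helper name.

```python
import torch


def logits_to_votes(logits: torch.Tensor, threshold: float = 0.5):
    """Convert raw logits of shape (batch, 1) into probabilities and 0/1 labels."""
    probs = torch.sigmoid(logits.squeeze(-1))
    labels = (probs >= threshold).long()
    return probs, labels


# Stand-in logits, shaped like the output of the final linear layer.
logits = torch.tensor([[2.0], [-1.0], [0.0]])
probs, labels = logits_to_votes(logits)
```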

#### Handling Imbalanced Data

Because the dataset is heavily skewed toward "Yes" votes, two techniques were used to address the imbalance:

- `WeightedRandomSampler` to oversample the minority class.

- A loss function adjusted with `pos_weight` in `BCEWithLogitsLoss`.

In [None]:
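import numpy as np
from torch.utils.data import WeightedRandomSampler

# y_tensor: 1-D tensor of binary vote labels (0/1), assumed to be defined earlier
class_sample_count = np.array([(y_tensor == 0).sum().item(), (y_tensor == 1).sum().item()])
weights = 1. / class_sample_count  # the rarer class gets the larger weight
sample_weights = weights[y_tensor.long().numpy()]  # per-sample weight, looked up by label

# Draw with replacement so minority-class samples can repeat within an epoch
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True
)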
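The `pos_weight` adjustment is not shown in the source, so the following is only a sketch under stated assumptions: label 1 is treated as the positive class, and the weight is set to the negative-to-positive count ratio, a common heuristic. The actual value used in the project is not documented here.

```python
import torch
import torch.nn as nn

# Toy labels for illustration: six positives, two negatives.
y = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0])

num_pos = (y == 1).sum()
num_neg = (y == 0).sum()

# pos_weight rescales the loss of positive examples; a value below 1
# down-weights an over-represented positive class.
pos_weight = (num_neg / num_pos).reshape(1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.zeros(8)  # stand-in model outputs
loss = criterion(logits, y)
```

With this ratio, the six positive and two negative examples contribute equally to the total loss.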

#### Training
- Epochs: Tuned experimentally (optimal: ~27)

- Batch size: 64

- Optimizer: Adam (learning rate: 1e-4)

- Split: Train/test split used to evaluate model generalization.
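The settings above can be put together in a short training loop. The sketch below uses random stand-in data and a simplified network without the country input, so it shows only the optimizer, batch size, and loss wiring, not the project's actual training code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 256 text vectors (384-dim) with binary labels.
X = torch.randn(256, 384)
y = torch.randint(0, 2, (256,)).float()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Stand-in network; the real model also takes country IDs.
model = nn.Sequential(
    nn.Linear(384, 256), nn.ReLU(), nn.Dropout(0.3), nn.Linear(256, 1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

epoch_losses = []
for epoch in range(3):  # the source reports about 27 epochs as optimal
    model.train()
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb).squeeze(1), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)
    epoch_losses.append(running / len(X))  # mean loss per sample
```

When a `WeightedRandomSampler` is used, it is passed to `DataLoader` via `sampler=` in place of `shuffle=True`, since the two options are mutually exclusive.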

During training, performance was monitored using F1-score, precision, and recall.
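For reference, these three metrics can be computed from binary predictions as in the sketch below. It assumes label 1 is the positive class; the source does not say which class the reported scores refer to, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = binary_metrics(y_true, y_pred)
```

scikit-learn's `precision_recall_fscore_support` computes the same quantities and is likely the more convenient choice in practice.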

#### Two-Model Strategy
Due to inconsistent voting behavior by a subset of countries, we trained two models:

1. Main model – trained on countries with stable voting patterns.

2. Problem model – trained only on "problematic" countries with lower baseline accuracy.

At inference, the country determines which model is used.

![image.png](attachment:image.png)

An F1-score below 0.7 was chosen as the cutoff for labeling a country "problematic".

Problematic countries:
['SURINAME',
 'TURKMENISTAN',
 'MARSHALL ISLANDS',
 'MYANMAR',
 'GABON',
 'CENTRAL AFRICAN REPUBLIC',
 'ISRAEL',
 'REPUBLIC OF THE CONGO',
 'LIBERIA',
 'SOMALIA',
 'CANADA',
 "LAO PEOPLE'S DEMOCRATIC REPUBLIC",
 'TUVALU',
 'DEMOCRATIC REPUBLIC OF THE CONGO',
 'MONTENEGRO',
 'VANUATU',
 'UNITED STATES',
 'TÜRKİYE',
 'SEYCHELLES',
 'SERBIA',
 'CABO VERDE',
 'VENEZUELA (BOLIVARIAN REPUBLIC OF)',
 'KIRIBATI',
 'IRAN (ISLAMIC REPUBLIC OF)',
 'SOUTH SUDAN',
 'ALBANIA',
 'CZECHIA',
 'DOMINICA',
 'SAO TOME AND PRINCIPE',
 'ESWATINI',
 'CHAD',
 'EQUATORIAL GUINEA',
 'GAMBIA',
 'LIBYA',
 "CÔTE D'IVOIRE",
 'SAINT CHRISTOPHER AND NEVIS',
 'RWANDA',
 'TONGA',
 'NIGER',
 'MICRONESIA (FEDERATED STATES OF)',
 'SYRIAN ARAB REPUBLIC',
 'NAURU',
 'PALAU',
 'NORTH MACEDONIA',
 'NETHERLANDS',
 'BOLIVIA (PLURINATIONAL STATE OF)']
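A list like the one above could be derived by scoring each country separately and applying the 0.7 cutoff. The sketch below assumes per-country F1 is computed from a baseline model's predictions on held-out data, with label 1 as the positive class. The source does not describe the exact procedure, and `split_countries` is a hypothetical helper.

```python
from collections import defaultdict

F1_THRESHOLD = 0.7  # cutoff stated in the text


def f1_binary(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def split_countries(records, threshold=F1_THRESHOLD):
    """records: iterable of (country, true_label, predicted_label) tuples."""
    by_country = defaultdict(lambda: ([], []))
    for country, t, p in records:
        by_country[country][0].append(t)
        by_country[country][1].append(p)

    stable, problematic = [], []
    for country, (ts, ps) in sorted(by_country.items()):
        target = problematic if f1_binary(ts, ps) < threshold else stable
        target.append(country)
    return stable, problematic


# Toy records: country "A" is predicted perfectly, country "B" is not.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 1),
]
stable, problematic = split_countries(records)
```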

#### Inference
At prediction time:

1. User inputs a new resolution text.

2. Resolution is vectorized using the MiniLM model.

3. The tool loops through all 193 countries and, for each one:

   - Chooses the appropriate model (main/problem).

   - Predicts Yes / Not Yes.

4. Results are displayed in a tabular format.
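The routing in step 3 can be sketched as follows. The two `*_predict` callables stand in for the trained main and problem models, and both the 0.5 threshold and the function name `predict_all` are assumptions made for illustration.

```python
def predict_all(text_vec, countries, problem_countries,
                main_predict, problem_predict, threshold=0.5):
    """Return a (country, vote) row for every country.

    main_predict / problem_predict: callables taking (text_vec, country)
    and returning the predicted probability of a "Yes" vote.
    """
    results = []
    for country in countries:
        # The country alone decides which model handles the prediction.
        predict = problem_predict if country in problem_countries else main_predict
        prob_yes = predict(text_vec, country)
        results.append((country, "Yes" if prob_yes >= threshold else "Not Yes"))
    return results


# Stub predictors in place of the real models.
rows = predict_all(
    text_vec=[0.0] * 384,
    countries=["CANADA", "FRANCE"],
    problem_countries={"CANADA"},
    main_predict=lambda vec, country: 0.9,
    problem_predict=lambda vec, country: 0.2,
)
```

Keeping the problem countries in a set makes the model lookup a constant-time check per country.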

![image.png](attachment:image.png)

Live demo: https://huggingface.co/spaces/donsek/General_Assembly_Vote_Predicting

#### Limitations

- Class imbalance still affects precision for rare “No” or “Abstain” cases.

- Some countries’ voting logic may depend on factors not captured in text.

- Text-only approach limits nuance in interpretation (e.g., geopolitical context).

#### Future Work

- Add multi-class prediction to distinguish between Yes, No, and Abstain.
- Unite the two models into one.
- Allow users to compare past resolutions for context.