|
--- |
|
library_name: transformers |
|
base_model: ai-forever/rugpt3small_based_on_gpt2 |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
model-index: |
|
- name: aristototle_interface |
|
results: [] |
|
--- |
|
|
|
|
|
|
# aristototle_interface |
|
|
|
This model is a fine-tuned version of [ai-forever/rugpt3small_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3small_based_on_gpt2) on a custom dataset of Aristotle's works in Russian translation (see the training procedure below).
|
It achieves the following results on the evaluation set: |
|
- Loss: 3.0259 |
|
- Accuracy: 0.4040 |
|
|
|
## Model Description

This model is a fine-tuned version of ai-forever/rugpt3small_based_on_gpt2 for causal language modeling. It was fine-tuned on Russian translations of Aristotle's major works (see the training procedure below) to generate coherent, contextually relevant text.
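A minimal loading sketch is shown below. The repository id `your-username/aristototle_interface` is a placeholder (this card does not state the published repo path); substitute the actual model id or a local checkpoint directory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: replace with the actual Hub repo or a local checkpoint directory.
model_id = "your-username/aristototle_interface"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()
```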
|
|
|
## Training Details

- Training epochs: 29.86
- Total FLOPs: 8,153,103 GF
- Training loss: 3.8147
- Training runtime: 35 minutes 43.75 seconds
- Number of training samples: 291 (examples after grouping the text into 2,048-token blocks)
- Training samples per second: 4.072
- Training steps per second: 0.056
|
## Evaluation Metrics

- Evaluation epoch: 29.86
- Evaluation accuracy: 0.4040 (40.4%)
- Evaluation loss: 3.0259
- Evaluation runtime: 0.12 seconds
- Number of evaluation samples: 1 (a single 2,048-token block)
- Evaluation samples per second: 8.08
- Evaluation steps per second: 8.08
- Perplexity: 20.6125
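For reference, the reported perplexity is simply the exponential of the evaluation cross-entropy loss, which can be reproduced as follows:

```python
import math

eval_loss = 3.0259          # evaluation cross-entropy loss (nats per token)
perplexity = math.exp(eval_loss)
print(f"{perplexity:.4f}")  # ≈ 20.6125
```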
|
|
|
## Intended Use

This model is intended for text generation tasks that require coherent and contextually appropriate responses, such as chatbots and content creation.
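A hedged usage sketch with the `transformers` text-generation pipeline follows; the model id is a placeholder and the Russian prompt is only illustrative.

```python
from transformers import pipeline

# Placeholder id: replace with the actual Hub repo or a local checkpoint directory.
generator = pipeline("text-generation", model="your-username/aristototle_interface")

prompt = "Добродетель есть"  # "Virtue is ..." — illustrative Russian prompt
outputs = generator(
    prompt,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(outputs[0]["generated_text"])
```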
|
|
|
Limitations |
|
The model has been trained on a limited dataset (291 samples), which may affect its generalization capabilities. |
|
|
|
The evaluation accuracy of approximately 40% indicates that the model may not perform optimally across all contexts. |
|
|
|
The perplexity score suggests room for improvement in generating more confident predictions. |
|
|
|
## Future Work

To enhance the performance of this model, consider the following:

- Increase the size and diversity of the training dataset.
- Experiment with additional training epochs or different hyperparameters.
- Evaluate the model on a broader set of examples to better assess its capabilities.
|
|
|
## Training procedure |
|
|
|
|
The model was trained with the `transformers` library using the `run_clm.py` example script. A summary of the training process follows, with a hedged data-loading sketch after the list and an equivalent `TrainingArguments` sketch further below.
|
|
|
* **Model:** `ai-forever/rugpt3small_based_on_gpt2` (a Russian-language GPT-2 model).
* **Objective:** Causal language modeling (text generation).
* **Hardware:** Google Colab with a single CUDA-enabled GPU.
* **Mixed Precision:** FP16 training was enabled to reduce the memory footprint and potentially improve training speed.
* **Optimizer:** AdamW (`adamw_torch`).
* **Learning Rate:** `3e-5`.
* **Warmup:** A linear warmup schedule with `500` warmup steps.
* **Training Data:** A custom text dataset loaded with the `plain_text` dataset configuration from Russian translations of Aristotle's major works (32,835 examples):
  * Aristotle, Categories (Категории)
  * Aristotle, Nicomachean Ethics (Никомахова этика)
  * Aristotle, Physics (Физика)
  * Aristotle, Metaphysics (Метафизика)
  * Aristotle, Rhetoric (Риторика)
  * Aristotle, Poetics (Поэтика)
* **Validation Data:** A custom text dataset loaded from `https://lib.ru/POEEAST/ARISTOTEL/nikomah.txt` (Aristotle, Nicomachean Ethics) with the `plain_text` dataset configuration; the validation set contained 111 examples.
* **Batch Size:** A per-device batch size of `8` with gradient accumulation over `8` steps, for an effective batch size of 64.
* **Sequence Length:** The maximum sequence length (block size) was set to `2048`.
* **Gradient Checkpointing:** Enabled to reduce memory consumption.
* **Epochs:** Trained for `30` epochs.
* **Evaluation:** Performed every `1000` steps on the validation dataset.
* **Logging:** Training progress and metrics were logged every `100` steps to TensorBoard and Weights & Biases (WandB).
* **Checkpoints:** Model checkpoints were saved every `1000` steps, with a limit of `3` saved checkpoints.
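As referenced above, the following is a hedged sketch of how a plain-text corpus like this can be loaded with the `datasets` library (the `plain_text` configuration corresponds to the generic `text` builder). The local training file name is hypothetical; the validation URL is the one listed above, and additional cleanup or re-encoding of the downloaded text may be needed.

```python
from datasets import load_dataset

# "aristotle_train.txt" is a hypothetical local file holding the concatenated works;
# the validation file is the Nicomachean Ethics text listed above.
raw_datasets = load_dataset(
    "text",
    data_files={
        "train": "aristotle_train.txt",
        "validation": "https://lib.ru/POEEAST/ARISTOTEL/nikomah.txt",
    },
)
print(raw_datasets)
```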
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 3e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 64 |
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 500 |
|
- num_epochs: 30.0 |
|
- mixed_precision_training: Native AMP |
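The same configuration can be expressed approximately as `TrainingArguments` (a hedged sketch: the model was actually trained via the `run_clm.py` script, the output directory is hypothetical, and argument names may differ slightly across `transformers` versions):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aristototle_interface",   # hypothetical output directory
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,        # effective batch size of 64
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    seed=42,
    fp16=True,                            # native AMP mixed precision
    gradient_checkpointing=True,
    optim="adamw_torch",
    eval_strategy="steps",
    eval_steps=1000,
    logging_steps=100,
    save_steps=1000,
    save_total_limit=3,
    report_to=["tensorboard", "wandb"],
)
```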
|
|
|
### Training results |
|
|
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.49.0.dev0 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 3.3.0 |
|
- Tokenizers 0.21.0 |
|
|