---
library_name: transformers
base_model: ai-forever/rugpt3small_based_on_gpt2
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: aristototle_interface
results: []
---
# aristototle_interface
This model is a fine-tuned version of [ai-forever/rugpt3small_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3small_based_on_gpt2) on a custom dataset of Aristotle's works (see the training procedure below).
It achieves the following results on the evaluation set:
- Loss: 3.0259
- Accuracy: 0.4040
## Model description

This model is a fine-tuned version of ai-forever/rugpt3small_based_on_gpt2 for causal language modeling. It was trained on a custom Russian-language dataset of Aristotle's works to generate coherent, contextually relevant text.
## Training details

- Training epochs: 29.86
- Total FLOPs: 8,153,103 GF
- Training loss: 3.8147
- Training runtime: 35 minutes 43.75 seconds
- Number of training samples: 291
- Training samples per second: 4.072 (a rough consistency check of these figures follows this list)
- Training steps per second: 0.056
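
As a rough sanity check (not part of the original Trainer logs; the figures below are only the numbers quoted above), the reported runtime and throughput are consistent with 291 samples being seen roughly 30 times:

```python
# Rough consistency check of the throughput figures above (illustrative only).
runtime_s = 35 * 60 + 43.75          # reported training runtime: 35 min 43.75 s
samples_per_second = 4.072           # reported training samples per second
samples_seen = runtime_s * samples_per_second
print(round(samples_seen))           # ~8729, close to 291 samples * 30 epochs = 8730
```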
## Evaluation metrics

- Evaluation epoch: 29.86
- Evaluation accuracy: 40.4%
- Evaluation loss: 3.0259
- Evaluation runtime: 0.12 seconds
- Number of evaluation samples: 1
- Evaluation samples per second: 8.08
- Evaluation steps per second: 8.08
- Perplexity: 20.6125 (see the note after this list)
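
The reported perplexity is simply the exponential of the evaluation cross-entropy loss, which is how the `run_clm.py` script derives it; a quick check:

```python
import math

eval_loss = 3.0259                   # evaluation loss reported above
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))          # ~20.61, matching the reported 20.6125 up to rounding
```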
## Intended use
This model is intended for text generation tasks where coherent and contextually appropriate responses are required. It can be used in applications such as chatbots, content creation, and more.
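
A minimal generation sketch with the `transformers` library is shown below. The repository path, prompt, and sampling settings are illustrative assumptions, not part of the original training setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual Hub path where this model is hosted.
model_id = "aristototle_interface"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Russian prompt, since the base model and the training corpus are in Russian.
prompt = "Что есть добродетель?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,      # sampling settings are illustrative
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```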
## Limitations

- The model has been trained on a limited dataset (291 samples), which may affect its generalization capabilities.
- The evaluation accuracy of approximately 40% indicates that the model may not perform well across all contexts.
- The perplexity score suggests room for improvement in generating more confident predictions.
## Future work

To enhance the performance of this model, consider the following:

- Increase the size and diversity of the training dataset.
- Experiment with additional training epochs or different hyperparameters.
- Evaluate the model on a broader set of examples to better assess its capabilities.
## Training procedure
The model was trained with the `transformers` library and its `run_clm.py` example script. A summary of the training process follows; a hedged configuration sketch appears after the list.
* **Model:** `ai-forever/rugpt3small_based_on_gpt2` (a Russian-language GPT-2 model).
* **Objective:** Causal language modeling (text generation).
* **Hardware:** Google Colab with a single CUDA-enabled GPU.
* **Mixed precision:** FP16 training was enabled to reduce the memory footprint and potentially improve training speed.
* **Optimizer:** AdamW (`adamw_torch`).
* **Learning rate:** `3e-5`.
* **Warmup:** A linear warmup schedule with `500` warmup steps.
* **Training data:** A custom text dataset loaded with the `plain_text` dataset configuration, consisting of Aristotle's major works (32,835 examples):
  * Aristotle. Categories
  * Aristotle. Nicomachean Ethics
  * Aristotle. Physics
  * Aristotle. Metaphysics
  * Aristotle. Rhetoric
  * Aristotle. Poetics
* **Validation data:** A custom text dataset loaded from https://lib.ru/POEEAST/ARISTOTEL/nikomah.txt (Aristotle, Nicomachean Ethics) using the `plain_text` dataset configuration. The validation set contained 111 examples.
* **Batch size:** A per-device batch size of `8` with `8` gradient-accumulation steps, for an effective batch size of 64.
* **Sequence length:** The maximum sequence length (block size) was set to `2048`.
* **Gradient checkpointing:** Enabled to reduce memory consumption.
* **Epochs:** Trained for `30` epochs.
* **Evaluation:** Performed every `1000` steps on the validation dataset.
* **Logging:** Training progress and metrics were logged every `100` steps to TensorBoard and Weights & Biases (WandB).
* **Checkpoints:** Model checkpoints were saved every `1000` steps, with a limit of `3` saved checkpoints.
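
The exact `run_clm.py` invocation is not preserved in this card. The sketch below mirrors the hyperparameters listed above using `TrainingArguments`; the output directory and dataset variables are placeholders, and `run_clm.py` builds an equivalent configuration from its command-line flags:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "ai-forever/rugpt3small_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Mirrors the hyperparameters documented in this card.
args = TrainingArguments(
    output_dir="aristototle_interface",   # placeholder output directory
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,        # effective batch size 64
    num_train_epochs=30,
    lr_scheduler_type="linear",
    warmup_steps=500,
    optim="adamw_torch",
    fp16=True,                            # Native AMP mixed precision
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    save_total_limit=3,
    logging_steps=100,
    report_to=["tensorboard", "wandb"],
    seed=42,
)

# train_dataset / eval_dataset would be the tokenized Aristotle corpora described
# above, grouped into blocks of 2048 tokens as run_clm.py does.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```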
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30.0
- mixed_precision_training: Native AMP
### Training results

The final evaluation loss was 3.0259 with an accuracy of 0.4040 (perplexity 20.6125) at epoch 29.86; the overall training loss was 3.8147.
### Framework versions
- Transformers 4.49.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.3.0
- Tokenizers 0.21.0