File size: 17,290 Bytes

dba50f6

---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6000
- loss:CoSENTLoss
base_model: avsolatorio/GIST-small-Embedding-v0
widget:
- source_sentence: are paris metro tickets one way?
  sentences:
  - The two big differences between the 2.4 GHz and 5 GHz frequencies are speed and
    range. A wireless transmission at 2.4 GHz provides internet to a larger area but
    sacrifices speed, while 5 GHz provides faster speeds to a smaller area.
  - The State of Rhode Island has adopted the income shares model to determine the
    weekly child support order. It is based upon the philosophy that children are
    entitled to the standard of living based upon both parents monthly income. ...
    Weekly gross income of both parents before taxes and before any other deductions.
  - Insulin NPH may be administered in 2 divided doses daily (either as equally divided
    doses, or as ~2/3 of the dose before the morning meal and ~1/3 of the dose before
    the evening meal or at bedtime).
- source_sentence: how to pxe boot surface pro?
  sentences:
  - The UKTV Play app, with shows from Dave, Drama, Yesterday and Really, is available
    on smart TVs powered by Freeview Play and newer Samsung TVs. ... You can watch
    catch up and box sets from W, Alibi, Gold, Eden, Dave, Drama and Yesterday on
    Sky+HD, Sky Q and Sky Go.
  - In a branch For cash that was deposited over the counter at another bank, the
    processing and clearance time is 5 business days (not including public holidays).
  - '[''Click "account" in the upper right corner of your Facebook page.'', ''Select
    "privacy settings."'', ''Under "block lists" at the bottom center of the page,
    click "edit your lists."'', ''At the top, under "block users," add the name or
    e-mail address of the person you\''d like to block.'', ''Click "block."'']'
- source_sentence: what is long-term capital gains rate?
  sentences:
  - You can get Social Security retirement or survivors benefits and work at the same
    time. But, if you're younger than full retirement age, and earn more than certain
    amounts, your benefits will be reduced. The amount that your benefits are reduced,
    however, isn't truly lost.
  - Dreams that involve shouting can warn of impending trouble. When you are the one
    shouting, this can mean you are going through a tough time in your waking life.
    You may be only feeling only negative emotions. ... Hearing someone else shouting
    signifies a warning of fright or anger.
  - 'A regular polygon is a flat shape whose sides are all equal and whose angles
    are all equal. The formula for finding the sum of the measure of the interior
    angles is (n - 2) * 180. To find the measure of one interior angle, we take that
    formula and divide by the number of sides n: (n - 2) * 180 / n.'
- source_sentence: can a girl get pregnant two days after her menstruation?
  sentences:
  - Newborn usually refers to a baby from birth to about 2 months of age. Infants
    can be considered children anywhere from birth to 1 year old. Baby can be used
    to refer to any child from birth to age 4 years old, thus encompassing newborns,
    infants, and toddlers.
  - 'According to professional numerologists, there are three ultimately lucky numbers
    for Capricorn-born people: they are 5, 6, and 8. In case they want to increase
    the chance of success for anything, simply make use of these numbers.'
  - He's a professional dancer and model. J.C. Before entering the Big Brother house,
    J.C. was a dancer who traveled the world to perform professionally. “I do professional
    dancing. Not really break dancing, I do more choreography dancing,” he said in
    an interview with Entertainment Tonight Canada.
- source_sentence: how long does it take to transfer money between anz and westpac?
  sentences:
  - This service is currently offered free of charge by the bank. You can get the
    last 'Available' balance of your account (by an SMS) by giving a Missed Call to
    18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions
    in your account by giving a Missed Call to 18008431133. 1.
  - Simply put, 1 ply toilet paper is made of a single layer of paper, while 2 ply
    has two layers. ... In the 1950's, a manufacturer created a method to roll and
    attach one-ply paper together to make a thicker “two-ply”. For years, 2-ply toilet
    tissue was always thicker and usually assumed to be better.
  - The main difference between unique and distinct is that UNIQUE is a constraint
    that is used on the input of data and ensures data integrity. While DISTINCT keyword
    is used when we want to query our results or in other words, output the data.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0) <!-- at revision 75e62fd210b9fde790430e0b2f040b0b00a021b1 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("moshew/gist_small_ft_gooaq_v1")
# Run inference
sentences = [
    'how long does it take to transfer money between anz and westpac?',
    "This service is currently offered free of charge by the bank. You can get the last 'Available' balance of your account (by an SMS) by giving a Missed Call to 18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions in your account by giving a Missed Call to 18008431133. 1.",
    "Simply put, 1 ply toilet paper is made of a single layer of paper, while 2 ply has two layers. ... In the 1950's, a manufacturer created a method to roll and attach one-ply paper together to make a thicker “two-ply”. For years, 2-ply toilet tissue was always thicker and usually assumed to be better.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,000 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                         | sentence2                                                                           | label                                                          |
  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:---------------------------------------------------------------|
  | type    | string                                                                            | string                                                                              | float                                                          |
  | details | <ul><li>min: 8 tokens</li><li>mean: 11.97 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 58.86 tokens</li><li>max: 126 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.17</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1                                                                             | sentence2                                                                                                                                                                                                                                                                                          | label            |
  |:--------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Though there are some minor differences in shape and nutrients, Rapid-Rise Yeast is (pretty much) the same as Instant Yeast and Bread Machine Yeast. ... Also, Rapid-Rise Yeast is a little more potent than Active Dry Yeast and can be mixed in with your dry ingredients directly.</code> | <code>1.0</code> |
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Omeprazole and esomeprazole therapy are both associated with a low rate of transient and asymptomatic serum aminotransferase elevations and are rare causes of clinically apparent liver injury.</code>                                                                                      | <code>0.0</code> |
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Benefits of choosing a soft starter A variable frequency drive (VFD) is a motor control device that protects and controls the speed of an AC induction motor. A VFD can control the speed of the motor during the start and stop cycle, as well as throughout the run cycle.</code>          | <code>0.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.0027 | 1    | 0.3104        |


### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->