---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2000
- loss:CoSENTLoss
base_model: avsolatorio/GIST-small-Embedding-v0
widget:
- source_sentence: do vivid seats tickets work?
  sentences:
  - Charlotte-Mecklenburg Schools will be closed for students on Friday due to the
    forecast of severe weather. ... CMS staff members work with city and county leaders
    to receive the most up-to-date information about road and weather conditions.
  - Tickets are $40 per ticket and $400 for a table of ten. Tickets are available
    for purchase when you register for the show.
  - This service is currently offered free of charge by the bank. You can get the
    last 'Available' balance of your account (by an SMS) by giving a Missed Call to
    18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions
    in your account by giving a Missed Call to 18008431133. 1.
- source_sentence: is alexa compatible with tv?
  sentences:
  - To fix this Echo red light, start with the restart of the router and Amazon Echo.
    In case, the restart process doesn't work, check for the device and app update
    in Alexa app. If it's available, click the 'Update' button for compatibility reason.
  - Ligament - A small band of dense, white, fibrous elastic tissue. Ligaments connect
    the ends of bones together in order to form a joint. Tendon - A tough, flexible
    band of fibrous connective tissue that connects muscles to bones.
  - There are 610 calories in a 1 bowl serving of El Pollo Loco Original Pollo Bowl.
- source_sentence: can you play fortnite save the world on mac?
  sentences:
  - '[''In the Music app on your Mac, click iTunes Store in the sidebar. ... '', ''Click
    Purchased (below Quick Links) near the top right of the iTunes Store window.'',
    ''Click Music near the top right of the page that appears. ... '', ''To download
    an item, click its Download button .'']'
  - Essential Oils in the Second and Third Trimesters. "In the second and third trimesters,
    some essential oils are safe to use, as your baby is more developed," Edwards
    adds. These include lavender, chamomile, and ylang ylang—all of which calm, relax,
    and aid sleep.
  - ADR holders do not have to transact the trade in the foreign currency or worry
    about exchanging currency on the forex market. ... ADRs list on either the New
    York Stock Exchange (NYSE), American Stock Exchange (AMEX), or the Nasdaq, but
    they are also sold over-the-counter (OTC).
- source_sentence: how long does money take to transfer boi?
  sentences:
  - 'When will it take more than one working day? It will take more than one working
    day to reach your payee''s bank when: You make a payment online after 3.30pm in
    the Republic of Ireland or after 4.30pm in Northern Ireland and Great Britain
    on a working day. Your payment will begin to process on the next working day.'
  - If you had bought just one share of Microsoft at the IPO, you would now have 288
    shares after all the splits. Those shares would be worth $44,505 at the current
    stock quote of $154.53. A $5,000 investment would have purchased 238 shares at
    the IPO price.
  - FKM is the American standard ASTM short form name for Fluro-Elastomer. ... VITON™
    is a registered trademark of Du Pont performance elastomers, the original developers
    of the rubber. However, the Viton is also used as a general name for the material,
    no matter who the manufacturer is.
- source_sentence: how long is a texas vehicle inspection report good for?
  sentences:
  - '[''Aerospace engineer.'', ''Automotive engineer.'', ''CAD technician.'', ''Contracting
    civil engineer.'', ''Control and instrumentation engineer.'', ''Maintenance engineer.'',
    ''Mechanical engineer.'', ''Nuclear engineer.'']'
  - A key difference is that it's simpler to unlock a credit lock than it is to “thaw”
    a credit freeze. But a freeze may afford legal protections that a lock doesn't.
    ... The credit bureaus sometimes promote their credit lock services, which can
    carry a monthly fee, alongside their credit freeze options, which are free.
  - If your car fails its MOT you can only continue to drive it if the previous year's
    MOT is still valid - which might occur if you submitted the car for its test two
    weeks early. You can still drive it away from the testing centre or garage if
    no 'dangerous' problems were identified during the MOT.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0) <!-- at revision 75e62fd210b9fde790430e0b2f040b0b00a021b1 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

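The last two modules in the architecture above (CLS pooling followed by `Normalize()`) can be illustrated with a minimal NumPy sketch. The random array stands in for the transformer's token outputs and the helper name `cls_pool_and_normalize` is hypothetical; this is not the library's implementation, only an illustration of what the two steps compute.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first ([CLS]) token vector of each sequence and scale it to unit length."""
    cls_vectors = token_embeddings[:, 0, :]  # shape: (batch, hidden)
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)  # guard against division by zero

rng = np.random.default_rng(0)
# Stand-in for the transformer output: 2 sequences, 5 tokens each, 384 dimensions.
token_embeddings = rng.normal(size=(2, 5, 384))
sentence_embeddings = cls_pool_and_normalize(token_embeddings)

# For unit-length vectors, the dot product equals the cosine similarity.
dot = float(sentence_embeddings[0] @ sentence_embeddings[1])
```

Because the embeddings are already normalized, cosine similarity and dot product give the same ranking, which is convenient for vector databases that only support inner-product search.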
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("moshew/gist_small_ft_gooaq_v3")
# Run inference
sentences = [
    'how long is a texas vehicle inspection report good for?',
    "If your car fails its MOT you can only continue to drive it if the previous year's MOT is still valid - which might occur if you submitted the car for its test two weeks early. You can still drive it away from the testing centre or garage if no 'dangerous' problems were identified during the MOT.",
    "['Aerospace engineer.', 'Automotive engineer.', 'CAD technician.', 'Contracting civil engineer.', 'Control and instrumentation engineer.', 'Maintenance engineer.', 'Mechanical engineer.', 'Nuclear engineer.']",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

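For semantic search, the embeddings can be used to rank candidate passages against a query. The following is a minimal sketch that uses toy 3-dimensional vectors in place of real model outputs so it runs without downloading anything; the helper `rank_by_cosine` is a hypothetical name, not part of the library.

```python
import numpy as np

def rank_by_cosine(query_embedding: np.ndarray, candidate_embeddings: np.ndarray):
    """Return (candidate index, cosine score) pairs, best match first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # descending by score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for 384-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.1, 0.0],   # nearly parallel to the query
    [-1.0, 0.0, 0.0],  # opposite direction
])
ranking = rank_by_cosine(query, candidates)
```

With the real model, `embeddings[0]` from the snippet above could serve as the query and `embeddings[1:]` as the candidates.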
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                         | sentence2                                                                           | label                                                         |
  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------|
  | type    | string                                                                            | string                                                                              | float                                                         |
  | details | <ul><li>min: 8 tokens</li><li>mean: 12.05 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 59.84 tokens</li><li>max: 124 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1                                                                             | sentence2                                                                                                                                                                                                                                                                                          | label            |
  |:--------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Though there are some minor differences in shape and nutrients, Rapid-Rise Yeast is (pretty much) the same as Instant Yeast and Bread Machine Yeast. ... Also, Rapid-Rise Yeast is a little more potent than Active Dry Yeast and can be mixed in with your dry ingredients directly.</code> | <code>1.0</code> |
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Fermentation recycles NAD+, and produces 2 ATPs. In lactic acid fermentation, pyruvate from glycolysis changes to lactic acid. ... In alcoholic fermentation, pyruvate changes to alcohol and carbon dioxide. This type of fermentation is carried out by yeasts and some bacteria.</code>   | <code>0.0</code> |
  | <code>are light kits universal for ceiling fans?</code>                               | <code>Not all Universal Light Kits are actually Universal. They can be universal to only that manufacturer. ... Casablanca and Hunter Ceiling Fan Light Kits are universal only to their own fans.</code>                                                                                          | <code>1.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

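To make the loss concrete, here is a minimal NumPy sketch of the CoSENT objective with the `scale` of 20.0 listed above. It is a reading of the published formulation, not the library's code: for every two training pairs where one has a lower label than the other, the loss penalizes the lower-labeled pair for having a higher cosine similarity. The function name `cosent_loss` and the toy vectors are illustrative assumptions.

```python
import numpy as np

def cosent_loss(emb1: np.ndarray, emb2: np.ndarray, labels: np.ndarray, scale: float = 20.0) -> float:
    """log(1 + sum over (i, j) with labels[i] < labels[j] of exp(scale * (cos_i - cos_j)))."""
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    scores = scale * np.sum(a * b, axis=1)    # scaled pairwise cosine similarity
    diff = scores[:, None] - scores[None, :]  # diff[i, j] = score_i - score_j
    mask = labels[:, None] < labels[None, :]  # pair i should score lower than pair j
    terms = np.concatenate(([0.0], diff[mask]))  # the leading 0 supplies the "1 +" term
    m = terms.max()
    return float(m + np.log(np.sum(np.exp(terms - m))))  # numerically stable log-sum-exp

# Two toy pairs: the first is identical (cosine 1), the second orthogonal (cosine 0).
emb1 = np.array([[1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.0], [0.0, 1.0]])

good_loss = cosent_loss(emb1, emb2, np.array([1.0, 0.0]))  # labels agree with the cosines
bad_loss = cosent_loss(emb1, emb2, np.array([0.0, 1.0]))   # labels contradict the cosines
```

When the label order matches the similarity order the loss is close to zero; when it is reversed the loss grows roughly linearly with the scaled similarity gap.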
### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4

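As a rough worked example, the values in this card imply the following step counts and learning-rate schedule. The sketch assumes training on a single device (the card does not state the device count) and uses the `learning_rate` of 5e-05, the `linear` scheduler, and a `gradient_accumulation_steps` of 1 as reported in this card; the function `linear_lr` is a hypothetical stand-in for a linear warmup-then-decay schedule.

```python
import math

num_samples = 2000   # training set size from this card
batch_size = 16      # per_device_train_batch_size
num_epochs = 1       # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-05      # learning_rate

# Assumption: one device, so each optimizer step consumes one batch of 16.
steps_per_epoch = math.ceil(num_samples / batch_size)  # 125
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)   # 13

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Fraction of an epoch completed after the first step.
first_logged_epoch = round(1 / steps_per_epoch, 3)
```

Under these assumptions, the first step corresponds to epoch 0.008, which matches the single entry in the training log.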
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.008 | 1    | 3.5339        |


### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->