---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2000
- loss:CoSENTLoss
base_model: avsolatorio/GIST-small-Embedding-v0
widget:
- source_sentence: do vivid seats tickets work?
  sentences:
  - Charlotte-Mecklenburg Schools will be closed for students on Friday due to the forecast of severe weather. ... CMS staff members work with city and county leaders to receive the most up-to-date information about road and weather conditions.
  - Tickets are $40 per ticket and $400 for a table of ten. Tickets are available for purchase when you register for the show.
  - This service is currently offered free of charge by the bank. You can get the last 'Available' balance of your account (by an SMS) by giving a Missed Call to 18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions in your account by giving a Missed Call to 18008431133. 1.
- source_sentence: is alexa compatible with tv?
  sentences:
  - To fix this Echo red light, start with the restart of the router and Amazon Echo. In case, the restart process doesn't work, check for the device and app update in Alexa app. If it's available, click the 'Update' button for compatibility reason.
  - Ligament - A small band of dense, white, fibrous elastic tissue. Ligaments connect the ends of bones together in order to form a joint. Tendon - A tough, flexible band of fibrous connective tissue that connects muscles to bones.
  - There are 610 calories in a 1 bowl serving of El Pollo Loco Original Pollo Bowl.
- source_sentence: can you play fortnite save the world on mac?
  sentences:
  - '[''In the Music app on your Mac, click iTunes Store in the sidebar. ... '', ''Click Purchased (below Quick Links) near the top right of the iTunes Store window.'', ''Click Music near the top right of the page that appears. ... '', ''To download an item, click its Download button .'']'
  - Essential Oils in the Second and Third Trimesters. "In the second and third trimesters, some essential oils are safe to use, as your baby is more developed," Edwards adds. These include lavender, chamomile, and ylang ylang—all of which calm, relax, and aid sleep.
  - ADR holders do not have to transact the trade in the foreign currency or worry about exchanging currency on the forex market. ... ADRs list on either the New York Stock Exchange (NYSE), American Stock Exchange (AMEX), or the Nasdaq, but they are also sold over-the-counter (OTC).
- source_sentence: how long does money take to transfer boi?
  sentences:
  - 'When will it take more than one working day? It will take more than one working day to reach your payee''s bank when: You make a payment online after 3.30pm in the Republic of Ireland or after 4.30pm in Northern Ireland and Great Britain on a working day. Your payment will begin to process on the next working day.'
  - If you had bought just one share of Microsoft at the IPO, you would now have 288 shares after all the splits. Those shares would be worth $44,505 at the current stock quote of $154.53. A $5,000 investment would have purchased 238 shares at the IPO price.
  - FKM is the American standard ASTM short form name for Fluro-Elastomer. ... VITON™ is a registered trademark of Du Pont performance elastomers, the original developers of the rubber. However, the Viton is also used as a general name for the material, no matter who the manufacturer is.
- source_sentence: how long is a texas vehicle inspection report good for?
  sentences:
  - '[''Aerospace engineer.'', ''Automotive engineer.'', ''CAD technician.'', ''Contracting civil engineer.'', ''Control and instrumentation engineer.'', ''Maintenance engineer.'', ''Mechanical engineer.'', ''Nuclear engineer.'']'
  - A key difference is that it's simpler to unlock a credit lock than it is to “thaw” a credit freeze. But a freeze may afford legal protections that a lock doesn't. ... The credit bureaus sometimes promote their credit lock services, which can carry a monthly fee, alongside their credit freeze options, which are free.
  - If your car fails its MOT you can only continue to drive it if the previous year's MOT is still valid - which might occur if you submitted the car for its test two weeks early. You can still drive it away from the testing centre or garage if no 'dangerous' problems were identified during the MOT.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False,
      'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("moshew/gist_small_ft_gooaq_v3")
# Run inference
sentences = [
    'how long is a texas vehicle inspection report good for?',
    "If your car fails its MOT you can only continue to drive it if the previous year's MOT is still valid - which might occur if you submitted the car for its test two weeks early. You can still drive it away from the testing centre or garage if no 'dangerous' problems were identified during the MOT.",
    "['Aerospace engineer.', 'Automotive engineer.', 'CAD technician.', 'Contracting civil engineer.', 'Control and instrumentation engineer.', 'Maintenance engineer.', 'Mechanical engineer.', 'Nuclear engineer.']",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000 training samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:

  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details |           |           |       |

* Samples:

  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | what is the difference between rapid rise yeast and bread machine yeast? | Though there are some minor differences in shape and nutrients, Rapid-Rise Yeast is (pretty much) the same as Instant Yeast and Bread Machine Yeast. ... Also, Rapid-Rise Yeast is a little more potent than Active Dry Yeast and can be mixed in with your dry ingredients directly. | 1.0 |
  | what is the difference between rapid rise yeast and bread machine yeast? | Fermentation recycles NAD+, and produces 2 ATPs. In lactic acid fermentation, pyruvate from glycolysis changes to lactic acid. ... In alcoholic fermentation, pyruvate changes to alcohol and carbon dioxide. This type of fermentation is carried out by yeasts and some bacteria. | 0.0 |
  | are light kits universal for ceiling fans? | Not all Universal Light Kits are actually Universal. They can be universal to only that manufacturer. ... Casablanca and Hunter Ceiling Fan Light Kits are universal only to their own fans. | 1.0 |

* Loss: [CoSENTLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

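
To make the training objective reported in this card more concrete, the following is a minimal plain-Python sketch of the CoSENT ranking loss with `scale` set to 20.0 and cosine similarity as the pairwise score. It is an illustration under stated assumptions, not the sentence-transformers implementation: the helper names `cosine` and `cosent_loss` are hypothetical, and the inputs are toy values rather than real embeddings. The idea is that, within a batch, any pair with a lower label that scores a higher cosine similarity than a pair with a higher label adds a penalty.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosent_loss(cos_scores, labels, scale=20.0):
    """Sketch of the CoSENT loss over one batch of sentence pairs.

    cos_scores[i] is the cosine similarity of pair i and labels[i] its gold score.
    loss = log(1 + sum over (lo, hi) with labels[lo] < labels[hi]
                     of exp(scale * (cos_scores[lo] - cos_scores[hi])))
    """
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for s_lo, y_lo in zip(cos_scores, labels):
        for s_hi, y_hi in zip(cos_scores, labels):
            if y_lo < y_hi:
                terms.append(scale * (s_lo - s_hi))
    # Numerically stable log-sum-exp
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Toy batch of two pairs: one labelled 1.0 (relevant), one labelled 0.0.
agrees = cosent_loss([0.9, 0.1], [1.0, 0.0])     # ranking matches the labels
disagrees = cosent_loss([0.1, 0.9], [1.0, 0.0])  # ranking contradicts the labels
print(agrees, disagrees)  # the first value is near zero, the second is large
```

Because the penalty is computed over pairs of examples within a batch, the batch size (16 here) determines how many comparisons contribute to each step.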
### Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.008 | 1    | 3.5339        |

### Framework Versions

- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss

```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```