SentenceTransformer based on nomic-ai/modernbert-embed-base
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/modernbert-embed-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation (https://www.sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
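The pipeline therefore encodes with ModernBERT, mean-pools the token embeddings over non-padding positions, and L2-normalizes the result, so dot products between outputs equal cosine similarities. Below is a minimal sketch of the equivalent computation in plain PyTorch and Transformers; the SentenceTransformer class performs these steps internally, so this is purely illustrative.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lochhonest/modernbert-finetuned-for-sas")
encoder = AutoModel.from_pretrained("lochhonest/modernbert-finetuned-for-sas")

batch = tokenizer(["An example sentence."], padding=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Module (1): mean pooling over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# Module (2): L2 normalization, so dot product equals cosine similarity
embedding = torch.nn.functional.normalize(pooled, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 768])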
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lochhonest/modernbert-finetuned-for-sas")
# Run inference
sentences = [
'In nearly all cases, how many source and background region spectra are supplied for the RGS?',
'RGS spectral products\n\nThis section describes the spectral data products to be generated from\npointed observations.\n\nSource and background region spectra and a background-subtracted source\nspectrum are supplied for the brightest point sources in the RGS (in\nnearly all cases this is just one source). Spectral response matrices\nare also supplied.\n',
"- This extension gives the good time intervals for the event list.\n\n- There is one extension per CCD in the relevant mode (IMAGING or\n TIMING) during the exposure.\n\n- The following keywords are present:\n\n HDUCLASS= 'OGIP ' / format conforms to OGIP standard\n HDUCLAS1= 'GTI ' / table contains Good Time Intervals\n HDUCLAS2= 'STANDARD' / standard Good Time Interval table\n\n- This extension contains the following columns:\n\n Name Type Description\n ------- ------------- --------------------------------\n START 8-byte REAL seconds (since reference time)\n STOP 8-byte REAL seconds (since reference time)\n",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
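For retrieval-style use, the same encode/similarity API can rank candidate passages against a query. A small illustrative sketch follows; the query and passages are invented examples, not taken from the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lochhonest/modernbert-finetuned-for-sas")

# Rank candidate passages for a query by cosine similarity.
query = "Which products include spectral response matrices?"
passages = [
    "Source and background region spectra and spectral response matrices are supplied.",
    "The OM optics and detector system introduce a certain amount of image distortion.",
]

query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)

# model.similarity uses this model's configured similarity function (cosine)
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(passages[best])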
Training Details
Training Dataset
Unnamed Dataset
- Size: 3,619 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
  - anchor: string; min: 2 tokens, mean: 15.7 tokens, max: 38 tokens
  - positive: string; min: 2 tokens, mean: 411.84 tokens, max: 3755 tokens
- Samples (first three anchor/positive pairs; all three anchors pair with the same Preface passage, quoted once below):
  - Anchor: What is the purpose of the document described in the preface?
  - Anchor: What version of the document is described in the preface?
  - Anchor: What is the main change in version 4.3 of the document?

  Shared positive passage:

  Preface

  This is the reference document describing the individual XMM-Newton Survey Science Centre (SSC) data product files. It is intended to be of use to software developers, archive administrators and to scientists analysing XMM-Newton data. Please see the SSC data products Interface Control Document (XMM-SOC-ICD-0006-SSC, issue 4.0) for a description of the product group files and other related files that are sent to the SOC.

  This version (4.3) includes changes related to the upgrade to SAS16.0 in the processing pipeline originally developped in 2012 to uniformly process all the XMM data at that time, from which the 3XMM catalogue was derived. Revisions and additions since version 4.2 are identified by change bars at the right of each page.

  This document will continue to evolve through subsequent issues, under indirect control from the SAS and SSC configuration control boards.

  This document is the result of the work of many people. Contributors have included:

  Hermann Brunner, G...
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "get_similarity" }
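As a brief sketch, this loss can be instantiated as shown below. The scale=20.0 value matches the card; mini_batch_size is an assumed value (it only controls gradient-caching chunk size and GPU memory use, not the training result), and the default cosine similarity function is used in place of the card's "get_similarity" wrapper.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/modernbert-embed-base")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=32)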
Evaluation Dataset
Unnamed Dataset
- Size: 30 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 30 samples:
  - anchor: string; min: 8 tokens, mean: 16.0 tokens, max: 24 tokens
  - positive: string; min: 6 tokens, mean: 642.47 tokens, max: 6152 tokens
- Samples:
  - Anchor: What is the purpose of the PPS cross-correlation products?

    Positive:

    General cross-correlation products

    These PPS cross-correlation products list the names of all catalogues searched (both around each EPIC position and in the whole EPIC field) and describe the format of their output.
  - Anchor: What are the task parameters of rgssources?

    Positive:

    rgssources

    ## Parameters
    \label{rgssources:description:parameters}
    filemode} {modify (Optional): no
    (Type:
    Controls whether the task opens a previous source list for editing or creates a new one.
    }
    \optparm{changeprime} {no} {boolean} {yes
  - Anchor: How many stars were used in the U-filter analysis for the G153 pointing to create the distortion map?

    Positive:

    OM distortion

    The OM (http://www.cosmos.esa.int/web/xmm-newton/technical-details-om) optics, filters and (primarily) the detector system result in a certain amount of image distortion. This effect can be corrected with a “distortion map”, by comparing the expected position with the measured position for a large number of stars in the OM (http://www.cosmos.esa.int/web/xmm-newton/technical-details-om) field of view. A U-filter analysis has been performed on the G153 pointing with 813 stars. The effect of applying this correction is shown in Fig. [fig:uhb:distmap]. A positional r.m.s. accuracy of 0.5 − 1.5 arcsec is obtained. The distortion map has been entered into the appropriate CCF file and is used in http://www.cosmos.esa.int/web/xmm-newton/sas (http://www.cosmos.esa.int/web/xmm-newton/sas).
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "get_similarity" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 4
- num_train_epochs: 2
- lr_scheduler_type: constant
- warmup_ratio: 0.1
- bf16: True
- batch_sampler: no_duplicates
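A minimal sketch of wiring these non-default settings into the Sentence Transformers trainer follows. The output directory, eval_steps, and the one-pair datasets are illustrative placeholders, not values from the actual run.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Illustrative anchor/positive pairs; the real run used 3,619 training pairs
# and 30 evaluation pairs with the same two columns.
train_dataset = Dataset.from_dict({
    "anchor": ["What is the purpose of the document described in the preface?"],
    "positive": ["Preface. This is the reference document describing ..."],
})
eval_dataset = Dataset.from_dict({
    "anchor": ["What is the purpose of the PPS cross-correlation products?"],
    "positive": ["These PPS cross-correlation products list the catalogues searched ..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-finetuned-for-sas",  # placeholder
    eval_strategy="steps",
    eval_steps=100,  # assumed; consistent with the validation-loss cadence in the logs below
    per_device_train_batch_size=16,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    lr_scheduler_type="constant",
    warmup_ratio=0.1,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CachedMultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()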
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: constant
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch  | Step | Training Loss | Validation Loss |
|--------|------|---------------|-----------------|
| 0.2203 | 50   | 0.2209        | -               |
| 0.4405 | 100  | 0.1635        | 0.0402          |
| 0.6608 | 150  | 0.1759        | -               |
| 0.8811 | 200  | 0.1674        | 0.1307          |
| 1.1013 | 250  | 0.1134        | -               |
| 1.3216 | 300  | 0.0809        | 0.0441          |
| 1.5419 | 350  | 0.0571        | -               |
| 1.7621 | 400  | 0.077         | 0.0268          |
| 1.9824 | 450  | 0.0557        | -               |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.4.1
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}