---
pipeline_tag: text-generation
inference: true
widget:
  - text: |-
      public class HelloWorld {
          public static void main(String[] args) {
    example_title: Hello world
    group: Java
license: bigcode-openrail-m
datasets:
  - bigcode/starcoderdata
metrics:
  - code_eval
library_name: transformers
tags:
  - code
model-index:
  - name: NT-Java-1.1B
    results:
      - task:
          type: text-generation
        dataset:
          type: nuprl/MultiPL-E
          name: MultiPL-HumanEval (Java)
        metrics:
          - name: pass@1
            type: pass@1
            value: 18.3
            verified: false
extra_gated_prompt: >-
  ## Model License Agreement

  Please read the BigCode [OpenRAIL-M
  license](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)
  agreement before accepting it.
    
extra_gated_fields:
  I accept the above license agreement, and will use the Model complying with the set of use restrictions and sharing requirements: checkbox
duplicated_from: bigcode-data/starcoderbase-1b
---

# NT-Java-1.1B

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

The Narrow Transformer (NT) model NT-Java-1.1B is an open-source specialized code model for Java programming tasks, built by extending the pre-training of StarCoderBase-1B. It is a decoder-only transformer with Multi-Query Attention and a context length of 8192 tokens, trained on the Java subset of the StarCoderData dataset, which amounts to ~22B tokens.
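You can verify these architecture details directly from the model configuration. A minimal sketch; the attribute names assume the `GPTBigCode` config that StarCoderBase-derived models use in `transformers`:

```python
from transformers import AutoConfig

# Inspect the architecture details quoted above (a sketch; attribute names
# assume the GPTBigCode config used by StarCoderBase-derived models).
config = AutoConfig.from_pretrained("infosys/NT-Java-1.1B")
print(config.model_type)   # expected: "gpt_bigcode"
print(config.n_positions)  # context length, expected: 8192
print(config.multi_query)  # Multi-Query Attention, expected: True
```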

## Use

### Intended use

Large code models require specialized hardware like GPUs for inference, which highlights the need for small code models that can be deployed on developer desktops. As a small language model (SLM), NT-Java-1.1B can be deployed on consumer-grade PCs, and it outperforms comparably sized open-source code models on Java programming tasks. Feel free to explore this powerful language model for your Java projects!

Quantized versions of NT-Java-1.1B, NT-Java-1.1B-GGML and NT-Java-1.1B-GGUF, perform comparably to open 1B models on the MultiPL-E Java code benchmark and can be used with multiple frameworks, such as CTranslate2 and GPT4All, making the model versatile for various deployment scenarios.
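As an illustration, a GGUF build can run CPU-only via llama-cpp-python. A minimal sketch, assuming a quantized file from the NT-Java-1.1B-GGUF repository; the exact filename below is a placeholder:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load a GGUF quantization on CPU; the model_path filename is an assumption,
# use the actual file shipped in the NT-Java-1.1B-GGUF repository.
llm = Llama(model_path="NT-Java-1.1B_Q4_K_M.gguf", n_ctx=8192)

output = llm(
    "public class HelloWorld {\n    public static void main(String[] args) {",
    max_tokens=64,
)
print(output["choices"][0]["text"])
```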

Feel free to share your generations in the Community tab!

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/NT-Java-1.1B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("public class HelloWorld {\n    public static void main(String[] args) {", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
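Because the model was pretrained with a Fill-in-the-Middle objective (see Training), you can also ask it to complete the middle of a snippet. A minimal sketch, reusing `tokenizer`, `model`, and `device` from above and assuming the model inherits StarCoderBase's FIM special tokens:

```python
# Fill-in-the-Middle prompt format (assumes StarCoderBase's FIM tokens):
# <fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>
input_text = "<fim_prefix>public int add(int a, int b) {\n    return <fim_suffix>;\n}<fim_middle>"
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```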

### Quantized Versions through `bitsandbytes`

- Using 8-bit precision (int8)

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4-bit precision, pass `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "infosys/NT-Java-1.1B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("public class HelloWorld {\n    public static void main(String[] args) {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
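As a quick sanity check that quantization paid off, you can print the model's memory footprint (`get_memory_footprint` is a standard `transformers` utility):

```python
# Rough sanity check: in 8-bit, the footprint should be close to one byte
# per parameter (~1.1 GB), versus ~4.4 GB for a default float32 load.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```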

### Attribution & Other Requirements

The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a search index that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.

## Limitations

The model, NT-Java-1.1B, has been trained on publicly available datasets and comes without any safety guarantees. As with all language models, its outputs cannot be reliably predicted, and generated code is not guaranteed to work as intended. It can also be inefficient and may contain bugs or exploits. It is therefore crucial for users and developers to conduct thorough safety testing and implement filtering mechanisms tailored to their needs.

## Training

### Model

- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
- **Pretraining steps:** 50k
- **Pretraining tokens:** 22 billion
- **Precision:** bfloat16 (see the loading sketch below)
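Since pretraining used bfloat16, loading the checkpoint in the same precision avoids an unnecessary fp32 upcast and halves memory; a minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the weights in the precision they were trained in (bfloat16),
# halving memory relative to the default float32 load.
model = AutoModelForCausalLM.from_pretrained(
    "infosys/NT-Java-1.1B", torch_dtype=torch.bfloat16
)
```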

### Hardware

- **GPUs:** 6 NVIDIA A100 80GB
- **Training time:** 4 days

### Software

## License

The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).

## Citation

```bibtex
@article{rathinasamy2024narrow,
      title={Narrow Transformer: StarCoder-Based Java-LM for Desktop},
      author={Kamalkumar Rathinasamy and Balaji A J and Rajab Ali Mondal and Ankush Kumar and Harshini K and Gagan Gayari and Sreenivasa Raghavan Karumboor Seshadri},
      year={2024},
      eprint={2407.03941},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```