---
pipeline_tag: text-generation
inference: true
widget:
- text: "public class HelloWorld {\n    public static void main(String[] args) {"
  example_title: Hello world
  group: Java
license: bigcode-openrail-m
datasets:
- bigcode/starcoderdata
metrics:
- code_eval
library_name: transformers  
tags:
- code
model-index:
- name: NT-Java-1.1B
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 18.3
      verified: false
extra_gated_prompt: >-
  ## Model License Agreement

  Please read the BigCode [OpenRAIL-M
  license](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)
  agreement before accepting it.
    
extra_gated_fields:
  I accept the above license agreement, and will use the Model complying with the set of use restrictions and sharing requirements: checkbox
duplicated_from: bigcode-data/starcoderbase-1b
---


# NT-Java-1.1B


## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

The Narrow Transformer (NT) model NT-Java-1.1B is an open-source code model specialized for Java, built by extending the pretraining of StarCoderBase-1B. It is a decoder-only transformer with Multi-Query Attention and a context length of 8192 tokens. The model was trained on the Java subset of the StarCoderData dataset, which contains ~22B tokens.

- **Repository:** [bigcode/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Paper:** 
- **Language(s):** Java
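Because the prompt and the generated tokens share the 8192-token context window, long Java files have to be trimmed before generation. The sketch below is one way to do that, assuming you work with token-id lists such as those returned by `tokenizer.encode`; `fit_prompt` is a hypothetical helper, and keeping the most recent tokens is a design choice, not something prescribed by this card.

```python
from typing import List

CONTEXT_LENGTH = 8192  # context window stated in the model summary


def fit_prompt(prompt_ids: List[int], max_new_tokens: int,
               context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Keep only the most recent prompt tokens so prompt + generation fits the window."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    budget = context_length - max_new_tokens
    # Drop tokens from the left: for completion, the code nearest the cursor matters most.
    return prompt_ids[-budget:] if len(prompt_ids) > budget else prompt_ids


if __name__ == "__main__":
    ids = list(range(10_000))          # stand-in for a long tokenized file
    kept = fit_prompt(ids, max_new_tokens=256)
    print(len(kept), kept[0])          # 7936 2064
```

The same `max_new_tokens` value would then be passed to `model.generate()` so the two budgets stay consistent.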

## Use

### Intended use

Large code models require specialized hardware such as GPUs for inference, which highlights the need for small code models that can be deployed on developer desktops. This model addresses that gap with a small Java code model and a quantized version of NT-Java-1.1B, which performs comparably to other open 1.1B models on the MultiPL-E Java code benchmarks, making it well suited for desktop deployment.

**Feel free to share your generations in the Community tab!**

### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/NT-Java-1.1B"
device = "cuda"  # "cuda" for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

prompt = "public class HelloWorld {\n    public static void main(String[] args) {"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Set max_new_tokens explicitly; otherwise generate() may stop at a short default length
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
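A completion generated from the HelloWorld prompt often keeps going after the class is closed. One simple post-processing step is to cut the text once the braces balance again. The helper below, `truncate_at_balanced_braces`, is a hypothetical sketch and not part of `transformers`; it counts braces naively and does not skip braces inside string literals or comments.

```python
def truncate_at_balanced_braces(text: str) -> str:
    """Cut `text` right after the brace that closes the first top-level block.

    Hypothetical helper: braces are counted naively, so braces inside string
    literals, char literals, or comments are NOT skipped.
    """
    depth = 0
    seen_open = False
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
            seen_open = True
        elif ch == "}":
            depth -= 1
            if seen_open and depth == 0:
                return text[: i + 1]
    # The braces never balanced (e.g. generation hit its token limit); keep everything.
    return text


if __name__ == "__main__":
    completion = (
        "public class HelloWorld {\n"
        "    public static void main(String[] args) {\n"
        '        System.out.println("Hello, world!");\n'
        "    }\n"
        "}\n"
        "\n"
        "class Unwanted {}\n"
    )
    print(truncate_at_balanced_braces(completion))
```

Applied to `tokenizer.decode(outputs[0], skip_special_tokens=True)` from the generation example, this drops whatever the model emits after the outermost class closes. A tokenizer-aware Java parser would be more robust if your prompts contain braces in strings.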
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4-bit precision, pass `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "infosys/NT-Java-1.1B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" (via accelerate) places the quantized weights on the available GPU
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quantization_config, device_map="auto"
)

prompt = "public class HelloWorld {\n    public static void main(String[] args) {"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
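To see why quantization matters for desktop use, a back-of-the-envelope estimate of the weight memory helps. This sketch assumes roughly 1.1 billion parameters (taken from the model name) and counts only the weights; activations, the KV cache, and layers that bitsandbytes keeps in higher precision all add to the real footprint.

```python
PARAMS = 1.1e9  # approximate parameter count implied by the model name (assumption)

BYTES_PER_PARAM = {
    "bfloat16": 2.0,  # training precision listed on this card
    "int8": 1.0,      # BitsAndBytesConfig(load_in_8bit=True)
    "4-bit": 0.5,     # BitsAndBytesConfig(load_in_4bit=True)
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>8}: ~{weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

This prints roughly 2.20 GB for bfloat16, 1.10 GB for int8, and 0.55 GB for 4-bit, so each halving of precision roughly halves the weight memory.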

### Attribution & Other Requirements

The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/starcoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.

# Benefits

Large code models require specialized hardware such as GPUs for inference, which highlights the need for small code models that can be deployed on developer desktops. This model addresses that gap with a small Java code model and quantized versions of NT-Java-1.1B (in formats such as GGML and GGUF), which perform comparably to other open 1.1B models on the MultiPL-E Java code benchmarks, making them well suited for desktop deployment.

# Limitations

NT-Java-1.1B has been trained on publicly available datasets and comes without any safety guarantees. As with all language models, its outputs cannot be reliably predicted, and the generated code is not guaranteed to work as intended. Generated code may also be inefficient or contain bugs and exploitable vulnerabilities. It is therefore crucial for users and developers to conduct thorough safety testing and to implement filtering mechanisms tailored to their needs.
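As a starting point for the filtering mentioned above, generated Java can be screened for API calls that deserve manual review. The deny-list below is a hypothetical example chosen for illustration; a pattern scan like this is no substitute for code review, static analysis, or running generated code in a sandbox.

```python
import re
from typing import List

# Hypothetical deny-list: patterns a team might want flagged for manual review.
RISKY_PATTERNS = {
    "process execution": re.compile(r"\bRuntime\.getRuntime\(\)\.exec\b|\bnew\s+ProcessBuilder\b"),
    "reflection access override": re.compile(r"\bsetAccessible\s*\(\s*true\s*\)"),
    "Java deserialization": re.compile(r"\bnew\s+ObjectInputStream\b"),
}


def flag_risky_java(code: str) -> List[str]:
    """Return the names of the risky patterns found in a piece of generated Java code."""
    return [name for name, pattern in RISKY_PATTERNS.items() if pattern.search(code)]


if __name__ == "__main__":
    snippet = 'Process p = Runtime.getRuntime().exec("ls");'
    print(flag_risky_java(snippet))  # ['process execution']
```

Completions that return a non-empty list could be rejected or routed to a reviewer, depending on the application.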

# Training

## Model

- **Architecture:** GPT-2 model with Multi-Query Attention and a Fill-in-the-Middle objective
- **Training steps:** 50k
- **Pretraining tokens:** 22 billion
- **Precision:** bfloat16
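Because of the Fill-in-the-Middle (FIM) objective, the model can in principle complete code between a prefix and a suffix, not only at the end of a file. StarCoderBase marks FIM prompts with the special tokens `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` in prefix-suffix-middle order. The sketch below assumes NT-Java-1.1B inherits these tokens from its StarCoderBase-1B tokenizer (check that they appear in `tokenizer.get_vocab()` before relying on this); `build_fim_prompt` is a hypothetical helper.

```python
# Special tokens assumed to be inherited from the StarCoderBase tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle prompt: the model should generate the missing middle
    after <fim_middle>."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    prompt = build_fim_prompt(
        prefix="public static int add(int a, int b) {\n    ",
        suffix="\n}",
    )
    print(prompt)
```

The resulting string can be encoded with `tokenizer.encode(prompt, return_tensors="pt")` and passed to `model.generate()` exactly like the left-to-right prompt in the Generation section; the text produced after `<fim_middle>` is the infill.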

## Hardware

- **GPUs:** 6 NVIDIA A100 80GB
- **Training time:** 4 days
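The figures listed above can be combined into a rough sense of the training budget. This is only an order-of-magnitude sketch under stated assumptions: it assumes the 22B tokens were each processed once over the 50k steps and that every sequence was packed to the full 8192-token context, neither of which this card states explicitly.

```python
TOKENS = 22e9      # pretraining tokens listed above
STEPS = 50_000     # training steps listed above
SEQ_LEN = 8192     # context length from the model summary
GPUS = 6           # NVIDIA A100 80GB
DAYS = 4           # training time

# Assumption: each token is seen exactly once and sequences are fully packed.
tokens_per_step = TOKENS / STEPS
sequences_per_step = tokens_per_step / SEQ_LEN
gpu_hours = GPUS * DAYS * 24
tokens_per_gpu_second = TOKENS / (GPUS * DAYS * 24 * 3600)

print(f"tokens per step:       {tokens_per_step:,.0f}")
print(f"sequences per step:    {sequences_per_step:.1f}")
print(f"GPU-hours:             {gpu_hours}")
print(f"tokens / GPU / second: {tokens_per_gpu_second:,.0f}")
```

Under these assumptions this gives about 440,000 tokens (roughly 54 full-length sequences) per step, 576 GPU-hours in total, and on the order of 10,600 tokens per GPU per second.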

## Software

- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

# License
The model checkpoint and vocabulary file are licensed under the [BigCode OpenRAIL-M v1](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement). Under the license, you must verify that your use case does not violate the use restrictions in Attachment A of the License. Any modification of the model (fine-tuning or extended pretraining) for downstream tasks must be released under the [BigCode OpenRAIL-M v1](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).

# Citation
```
@article{rathinasamy2024narrow,
      title={NARROW TRANSFORMER: STARCODER-BASED JAVA-LM FOR DESKTOP}, 
      author={Kamalkumar Rathinasamy and Balaji A J and Rajab Ali Mondal and Ankush Kumar and Harshini K and Gagan Gayari and Sreenivasa Raghavan Karumboor Seshadri},
      year={2024},
      eprint={2305.06161},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```