---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
inference: true
widget:
- text: "public class HelloWorld {\n public static void main(String[] args) {"
  example_title: Hello world
  group: Java
---
# NT-Java
## Table of Contents
1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Benefits](#benefits)
4. [Limitations](#limitations)
5. [Training](#training)
6. [License](#license)
7. [Citation](#citation)
## Model Summary
The Narrow Transformer (NT) model NT-Java-1.1B is an open-source, specialized code model for Java programming tasks, built by extending the pre-training of starcoderbase-1b. It is a decoder-only transformer with Multi-Query Attention and a context length of 8192 tokens. It was trained on the Java subset of the starcoderdata dataset, which contains ~22B tokens.
- **Repository:** [bigcode/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Project Website:**
- **Paper:**
- **Point of Contact:**
- **Languages:** Java
## Use
### Intended use
Large code models require specialized hardware such as GPUs for inference, which motivates research into small code models that can be deployed on developer desktops. This model addresses that gap by providing a small Java code model and by introducing a quantized version of NT-Java-1.1B, which performs comparably to open 1.1B models on MultiPL-E Java code benchmarks, making it well suited to desktop deployment.
**Feel free to share your generations in the Community tab!**
### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/NT-Java-1.1B"
device = "cuda"  # use "cpu" to run without a GPU

# Load the tokenizer and model weights, then move the model to the chosen device
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Encode a Java prompt and let the model continue it
inputs = tokenizer.encode("public class HelloWorld {\n public static void main(String[] args) {", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
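Because the model has a fixed context length of 8192 tokens, a long prompt plus the tokens to be generated must fit inside that window. The sketch below shows one way to budget for this by keeping only the most recent prompt tokens. The helper `clip_prompt_ids` is hypothetical (it is not part of transformers), and left-truncation is an assumed policy, chosen because the code nearest the cursor usually matters most for completion.

```python
CONTEXT_LENGTH = 8192  # context window stated in the Model Summary


def clip_prompt_ids(prompt_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + generation fits the window.

    `prompt_ids` is a plain list of token ids. This helper is hypothetical and
    not part of the transformers library.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    # Left-truncate: drop the oldest tokens, keep the ones closest to the cursor
    return prompt_ids[-budget:]


# Stand-in token ids; real ids would come from tokenizer.encode(...)
long_prompt = list(range(10_000))
clipped = clip_prompt_ids(long_prompt, max_new_tokens=256)
print(len(clipped))  # 8192 - 256 = 7936
```

Prompts that already fit within the budget are returned unchanged.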
### Attribution & Other Requirements
The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/starcoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.
# Benefits
Large code models require specialized hardware such as GPUs for inference, which motivates research into small code models that can be deployed on developer desktops. This model addresses that gap by providing a small Java code model and by introducing a quantized version of NT-Java-1.1B (in formats such as GGML and GGUF), which performs comparably to open 1.1B models on MultiPL-E Java code benchmarks, making it well suited to desktop deployment.
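To make the desktop-deployment argument concrete, the following back-of-the-envelope sketch estimates the size of the weights alone at several storage widths. It assumes a nominal 1.1B parameters (taken from the model name) and nominal bit widths; real quantized files carry extra scales and metadata, and inference also needs memory for activations and the KV cache, so actual requirements are somewhat higher.

```python
PARAMS = 1.1e9  # nominal parameter count, assumed from the model name

# Nominal bits per weight for a few storage formats (assumed, illustrative)
BITS_PER_WEIGHT = {"float32": 32, "bfloat16": 16, "int8": 8, "int4": 4}


def weight_gigabytes(n_params, bits):
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return n_params * bits / 8 / 1e9


estimates = {name: weight_gigabytes(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name:>8}: ~{gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 2.2 GB in bfloat16 to roughly 0.55 GB at 4 bits per weight.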
# Limitations
NT-Java-1.1B has been trained on publicly available datasets and comes without any safety guarantees. As with all language models, its outputs cannot be reliably predicted, and the generated code is not guaranteed to work as intended. The code may also be inefficient and may contain bugs or exploits. Users and developers should therefore conduct thorough safety testing and implement filtering mechanisms tailored to their needs.
# Training
## Model
- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Fine-tuning steps:** 50k
- **Pretraining tokens:** 22 Billion
- **Precision:** bfloat16
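Since the architecture was trained with a Fill-in-the-Middle objective, the model can in principle complete a gap between existing code rather than only continue a prefix. The sketch below builds such a prompt in prefix-suffix-middle order. It assumes the NT-Java-1.1B tokenizer keeps the `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` sentinel tokens of the StarCoder family, which this card does not confirm; `build_fim_prompt` and `fill_in_middle` are hypothetical helper names.

```python
# Sentinel strings used by the StarCoder family; assumed (not confirmed by this
# card) to be unchanged in the NT-Java-1.1B tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix, suffix):
    """Arrange the code before and after the gap in prefix-suffix-middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def fill_in_middle(prefix, suffix, checkpoint="infosys/NT-Java-1.1B", max_new_tokens=32):
    """Generate the missing middle. Downloads the checkpoint, so it is not called here."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(build_fim_prompt(prefix, suffix), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, i.e. the proposed middle
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_fim_prompt(
    prefix="public static int add(int a, int b) {\n    return ",
    suffix=";\n}",
)
print(prompt)
```

Calling `fill_in_middle` with the same prefix and suffix would ask the model to propose the expression between `return ` and `;`.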
## Hardware
- **GPUs:** 6 NVIDIA A100 80GB
- **Training time:** 4 days
## Software
- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
- **BF16 if applicable:** [apex](https://github.com/NVIDIA/apex)
# License
The model is licensed under the Apache License 2.0. You can find the full agreement [here](https://www.apache.org/licenses/LICENSE-2.0).
# Citation
```
@article{li2023starcoder,
title={StarCoder: may the source be with you!},
author={},
year={2023},
eprint={2305.06161},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```