r2.1 release updates

README.md (changed)

TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research.
**With model sizes starting from 1M params, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting. The paper describing TTM was accepted at [NeurIPS 24](https://proceedings.neurips.cc/paper_files/paper/2024/hash/874a4d89f2d04b4bcf9a2c19545cf040-Abstract-Conference.html).**

TTM outperforms other models demanding billions of parameters in several popular zero-shot and few-shot forecasting benchmarks. TTMs are lightweight
forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be
fine-tuned for multi-variate forecasts with just 5% of the training data to be competitive. **Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on a single GPU or on laptops.**

TTM r2 comprises TTM variants pre-trained on larger pretraining datasets (~700M samples). The TTM r2.1 release increases the pretraining dataset size to approximately 1B samples. The prior model releases, TTM r1, were trained on ~250M samples and can be accessed [here](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1). In general, TTM r2 models perform better than TTM r1 models as they are
trained on a larger pretraining dataset. In standard benchmarks, TTM r2 outperforms TTM r1 by over 15%. However, the choice of r1 vs. r2 depends on your target data distribution, so we recommend trying both variants and picking the best model for your data.

The TTM r2 releases support point forecasting use cases ranging from minutely to hourly resolutions
(e.g., 10 min, 15 min, 1 hour). With the TTM r2.1 release, we add support for daily and weekly resolutions.

### Links

- **Paper:** [NeurIPS 2024](https://proceedings.neurips.cc/paper_files/paper/2024/hash/874a4d89f2d04b4bcf9a2c19545cf040-Abstract-Conference.html), [arXiv](https://arxiv.org/pdf/2401.03955.pdf)
- **Repository:** https://github.com/ibm-granite/granite-tsfm
- **PyPI project:** https://pypi.org/project/granite-tsfm/
- **Model architecture:** https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer
- **Time Series Cookbook:** https://github.com/ibm-granite-community/granite-timeseries-cookbook

## Model Description

yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast,
facilitating easy deployment without demanding a ton of resources.

Hence, in this model card, we release several pre-trained TTMs that can cater to many common forecasting settings in practice.
Each pre-trained model will be released under a different branch name in this model card. Given the variety of models included, we recommend using the [`get_model()`](https://github.com/ibm-granite/granite-tsfm/blob/main/tsfm_public/toolkit/get_model.py) utility to automatically select the required model based on your input context length, forecast length, and other requirements. You can also directly access a specific model using our
getting started [notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb), specifying the branch name.
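
For example, a specific branch (model variant) of this card can be loaded directly with the Hugging Face `from_pretrained` API. This is a minimal sketch: the import path assumes the `granite-tsfm` (`tsfm_public`) package layout, and the branch name `"main"` is only illustrative.

```
# Minimal sketch: load one specific model variant (branch) of this model card.
# The import path assumes the granite-tsfm (tsfm_public) package layout.
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-r2",
    revision="main",  # replace with the branch name of the variant you need
)
```
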
## Model Releases

There are several models available in different branches of this model card. The naming scheme follows the format:
`<context length>-<prediction length>-<frequency prefix tuning indicator>-<pretraining metric>-<release number>`

- release number ("r2" or "r2.1"): Indicates the model release; the release indicates which data was used to train the model. See "training data" below for more details on the data included in the particular training datasets.

### Example recipes and notebooks

The scripts below can be used with any of the above TTM models. Please update the HF model URL and branch name in the `from_pretrained` call appropriately to pick the model of your choice. Note that a few of the notebooks directly use the [`get_model()`](https://github.com/ibm-granite/granite-tsfm/blob/main/tsfm_public/toolkit/get_model.py) utility to select the model.

- Getting started [[Recipe]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Time_Series/Time_Series_Getting_Started.ipynb) [[colab]](https://colab.research.google.com/github/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb)
- Getting started with IBM watsonx [[Recipe]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Time_Series/Getting_Started_with_WatsonX_AI_SDK.ipynb)
- Zeroshot Multivariate Forecasting [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb)
- Finetuned Multivariate Forecasting:
  - Channel-Independent Finetuning [[Example 1]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb) [[Example 2]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_m4_hourly.ipynb)
  - Channel-Mix Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttm_channel_mix_finetuning.ipynb)
- TTM r2 release (extended features released in October 2024):
  - Finetuning and Forecasting with Exogenous/Control Variables [[Recipe 1]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Time_Series/Few-shot_Finetuning_and_Evaluation.ipynb) [[Recipe 2]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Time_Series/Bike_Sharing_Finetuning_with_Exogenous.ipynb)
  - Finetuning and Forecasting with static categorical features [Example: To be added soon]
  - Rolling Forecasts - Extend forecast lengths via rolling capability. Rolling beyond 2*forecast_length is not recommended. [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttm_rolling_prediction_getting_started.ipynb)
  - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttm_with_exog_tutorial.ipynb)
- TTM r2.1 release:
  - GIFT-Eval benchmark [[notebook]](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/ttm.ipynb)

### Usage guidelines

1. Users must externally standard-scale their data independently for every channel before feeding it to the model (refer to [`TimeSeriesPreprocessor`](https://github.com/IBM/tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py), our data processing utility for data scaling; a minimal scaling sketch is shown after this list).
2. The current open-source version supports minutely and hourly resolutions (e.g., 10 min, 15 min, 1 hour) and, with the r2.1 release, daily and weekly resolutions. Lower resolutions (say monthly or yearly) are currently not supported in this version, as the model needs a minimum context length of 512 or 1024.
3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will impact the model performance.
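
As a minimal illustration of guideline 1, the snippet below standard-scales each channel independently with scikit-learn; the file name and column names (`ts1`, `ts2`) are placeholders, and the `TimeSeriesPreprocessor` utility linked above can perform the same per-channel scaling for you.

```
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Placeholder dataset with two target channels, "ts1" and "ts2".
df = pd.read_csv("my_timeseries.csv", parse_dates=["timestamp"])

# Fit the scaler statistics on the training split only, then apply them everywhere.
train_df = df.iloc[: int(0.7 * len(df))]
scaler = StandardScaler()  # scales every column (channel) independently
scaler.fit(train_df[["ts1", "ts2"]])
df[["ts1", "ts2"]] = scaler.transform(df[["ts1", "ts2"]])
```
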
### Automatic model selection

Automatic model selection based on context length, prediction length, and other requirements can be done using the `get_model()` function. For reference, the signature of the function is provided below:
```
def get_model(
    model_path: str,
    model_name: str = "ttm",
    context_length: Optional[int] = None,
    prediction_length: Optional[int] = None,
    freq_prefix_tuning: bool = False,
    freq: Optional[str] = None,
    prefer_l1_loss: bool = False,
    prefer_longer_context: bool = True,
    force_return: Optional[str] = None,
    return_model_key: bool = False,
    **kwargs,
) -> Union[str, PreTrainedModel]:
    """TTM Model card offers a suite of models with varying `context_length` and `prediction_length` combinations.
    This wrapper automatically selects the right model based on the given input `context_length` and
    `prediction_length`, abstracting away the internal complexity.

    Args:
        model_path (str): HuggingFace model card path or local model path (Ex. ibm-granite/granite-timeseries-ttm-r2)
        model_name (str, optional): Model name to use. Current allowed values: [ttm]. Defaults to "ttm".
        context_length (int, optional): Input context length or history. Defaults to None.
        prediction_length (int, optional): Length of the forecast horizon. Defaults to None.
        freq_prefix_tuning (bool, optional): If True, it will prefer TTM models that are trained with frequency prefix
            tuning configuration. Defaults to False.
        freq (str, optional): Resolution or frequency of the data. Defaults to None. Allowed values are as
            per the `DEFAULT_FREQUENCY_MAPPING`.
        prefer_l1_loss (bool, optional): If True, it will prefer choosing models that were trained with L1 loss or
            mean absolute error loss. Defaults to False.
        prefer_longer_context (bool, optional): If True, it will prefer selecting a model with a longer context/history.
            Defaults to True.
        force_return (str, optional): This is used to force `get_model()` to return a TTM model even when the provided
            configurations don't match the existing TTMs. It gets the closest TTM possible. Allowed values are
            ["zeropad"/"rolling"/"random_init_small"/"random_init_medium"/"random_init_large"/`None`].
            "zeropad" = Returns a pre-trained TTM that has a context length higher than the input context length; hence,
            the user must apply zero-padding to use the returned model.
            "rolling" = Returns a pre-trained TTM that has a prediction length lower than the requested prediction length;
            hence, the user must apply a rolling technique to use the returned model to forecast to the desired length.
            The `RecursivePredictor` class can be utilized in this scenario.
            "random_init_small" = Returns a randomly initialized small TTM which must be trained before performing inference.
            "random_init_medium" = Returns a randomly initialized medium TTM which must be trained before performing inference.
            "random_init_large" = Returns a randomly initialized large TTM which must be trained before performing inference.
            `None` = `force_return` is disabled. Raises an error if no suitable model is found.
            Defaults to None.
        return_model_key (bool, optional): If True, only the TTM model name will be returned, instead of the actual model.
            This does not download the model, and only returns the name of the suitable model. Defaults to False.

    Returns:
        Union[str, PreTrainedModel]: Returns the model, or the model name.
    """
```
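
For reference, the sketch below (adapted from an earlier revision of this model card) combines `get_model()` with the Hugging Face `Trainer` for zero-shot evaluation and few-shot fine-tuning. The dataset objects (`dset_train`, `dset_val`, `dset_test`) and the training arguments are placeholders that you would build with the `granite-tsfm` preprocessing utilities.

```
from transformers import Trainer, TrainingArguments
from tsfm_public.toolkit.get_model import get_model

# Select a suitable pre-trained TTM for a 512-point history and 96-point horizon.
model = get_model(
    model_path="ibm-granite/granite-timeseries-ttm-r2",
    context_length=512,
    prediction_length=96,
)

# Zero-shot evaluation: apply the pre-trained model directly to the test split.
zeroshot_trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ttm_zeroshot", per_device_eval_batch_size=64),
)
zeroshot_output = zeroshot_trainer.evaluate(dset_test)

# Few-shot fine-tuning: freeze the backbone and train only the decoder/head.
finetune_model = get_model(
    model_path="ibm-granite/granite-timeseries-ttm-r2",
    context_length=512,
    prediction_length=96,
    head_dropout=0.2,  # extra kwargs are forwarded to from_pretrained
)
for param in finetune_model.backbone.parameters():
    param.requires_grad = False

finetune_trainer = Trainer(
    model=finetune_model,
    args=TrainingArguments(output_dir="ttm_finetune", num_train_epochs=10, learning_rate=1e-3),
    train_dataset=dset_train,
    eval_dataset=dset_val,
)
finetune_trainer.train()
fewshot_output = finetune_trainer.evaluate(dset_test)
```
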
## Benchmarks

- TTM-A referred to in the paper maps to the 1536 context models.

The pre-training dataset used in this release differs slightly from the one used in the research
paper, which may lead to minor variations in model performance as compared to the published results. Please refer to our paper for more details. Benchmarking scripts can be found [here](https://github.com/ibm-granite/granite-tsfm/tree/main/notebooks/hfdemo/tinytimemixer/full_benchmarking).

For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).

TTM currently supports two modes:

- **Zeroshot forecasting**: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).
- **Finetuned forecasting**: Finetune the pre-trained model with a subset of your target data to further improve the forecast.

Since TTM models are extremely small and fast, it is very easy to finetune the model with your available target data in a few minutes to get more accurate forecasts.

The current release supports multivariate forecasting via both channel independence and channel-mixing approaches.
Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across
time-series variates, a critical capability lacking in existing counterparts. In addition, TTM also supports exogenous infusion and static categorical data infusion.
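
As a sketch of how channel mixing can be turned on for fine-tuning, the call below passes `decoder_mode="mix_channel"` through `get_model()`; this parameter name follows the channel-mix fine-tuning notebook linked above and should be verified against your installed `granite-tsfm` version.

```
from tsfm_public.toolkit.get_model import get_model

# Load a TTM with decoder channel mixing enabled, so that cross-channel
# correlations can be learned during fine-tuning (parameter name assumed
# from the channel-mix fine-tuning notebook).
mix_model = get_model(
    model_path="ibm-granite/granite-timeseries-ttm-r2",
    context_length=512,
    prediction_length=96,
    decoder_mode="mix_channel",
)
```
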
The r2.1 release builds upon the above, adding improved accuracy for shorter context lengths and daily/weekly resolutions, combined with a larger pre-training dataset.

## Training Data

## Citation

Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work.

**BibTeX:**

## Model Card Authors

Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Tomoya Sakai, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam

## IBM Public Repository Disclosure

All content in this repository including code has been provided by IBM under the associated
open source software license and IBM is under no obligation to provide enhancements,
updates, or support. IBM developers produced this code as an
open source project (not as an IBM product), and IBM makes no assertions as to
the level of quality nor security, and will not be maintaining this code going forward.