{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TTS Inference Prosody Control\n",
"\n",
"This notebook is intended to teach users how to control duration and pitch with the FastPitch model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# License\n",
"\n",
"> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n",
">\n",
"> Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"> you may not use this file except in compliance with the License.\n",
"> You may obtain a copy of the License at\n",
">\n",
"> http://www.apache.org/licenses/LICENSE-2.0\n",
">\n",
"> Unless required by applicable law or agreed to in writing, software\n",
"> distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"> See the License for the specific language governing permissions and\n",
"> limitations under the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"\"\"\"\n",
"You can either run this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n",
"Instructions for setting up Colab are as follows:\n",
"1. Open a new Python 3 notebook.\n",
"2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n",
"3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
"4. Run this cell to set up dependencies.\n",
"\"\"\"\n",
"BRANCH = 'r1.17.0'\n",
"# # If you're using Google Colab and not running locally, uncomment and run this cell.\n",
"# !apt-get install sox libsndfile1 ffmpeg\n",
"# !pip install wget text-unidecode\n",
"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Please run the below cell to import libraries used in this notebook. This cell will load the fastpitch model and hifigan models used in the rest of the notebook. Lastly, two helper functions are defined. One is used for inference while the other is used to plot the inference results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import all libraries\n",
"import IPython.display as ipd\n",
"import librosa\n",
"import librosa.display\n",
"import numpy as np\n",
"import torch\n",
"from matplotlib import pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# Reduce logging messages for this notebook\n",
"from nemo.utils import logging\n",
"logging.setLevel(logging.ERROR)\n",
"\n",
"from nemo.collections.tts.models import FastPitchModel\n",
"from nemo.collections.tts.models import HifiGanModel\n",
"from nemo.collections.tts.parts.utils.helpers import regulate_len\n",
"\n",
"# Load the models from NGC\n",
"fastpitch = FastPitchModel.from_pretrained(\"tts_en_fastpitch\").eval().cuda()\n",
"hifigan = HifiGanModel.from_pretrained(\"tts_en_hifigan\").eval().cuda()\n",
"sr = 22050\n",
"\n",
"# Define some helper functions\n",
"# Define a helper function to go from string to audio\n",
"def str_to_audio(inp, pace=1.0, durs=None, pitch=None):\n",
" with torch.no_grad():\n",
" tokens = fastpitch.parse(inp)\n",
" spec, _, durs_pred, _, pitch_pred, *_ = fastpitch(text=tokens, durs=durs, pitch=pitch, speaker=None, pace=pace)\n",
" audio = hifigan.convert_spectrogram_to_audio(spec=spec).to('cpu').numpy()\n",
" return spec, audio, durs_pred, pitch_pred\n",
"\n",
"# Define a helper function to plot spectrograms with pitch and display the audio\n",
"def display_pitch(audio, pitch, sr=22050, durs=None):\n",
" fig, ax = plt.subplots(figsize=(12, 6))\n",
" spec = np.abs(librosa.stft(audio[0], n_fft=1024))\n",
" # Check to see if pitch has been unnormalized\n",
" if torch.abs(torch.mean(pitch)) <= 1.0:\n",
" # Unnormalize the pitch with LJSpeech's mean and std\n",
" pitch = pitch * 65.72037058703644 + 214.72202032404294\n",
" # Check to see if pitch has been expanded to the spec length yet\n",
" if len(pitch) != spec.shape[0] and durs is not None:\n",
" pitch = regulate_len(durs, pitch.unsqueeze(-1))[0].squeeze(-1)\n",
" # Plot and display audio, spectrogram, and pitch\n",
" ax.plot(pitch.cpu().numpy()[0], color='cyan', linewidth=1)\n",
" librosa.display.specshow(np.log(spec+1e-12), y_axis='log')\n",
" ipd.display(ipd.Audio(audio, rate=sr))\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Duration Control\n",
"\n",
"This section is applicable to models that use a duration predictor module. This module is called the Length Regulator and was introduced in FastSpeech [1]. A list of NeMo models that support duration predictors are as follows:\n",
"\n",
"- [FastPitch](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastpitch)\n",
"- [FastPitch_HifiGan_E2E](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_e2e_fastpitchhifigan)\n",
"- [FastSpeech2](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastspeech_2)\n",
"- [FastSpeech2_HifiGan_E2E](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_e2e_fastspeech2hifigan)\n",
"- [TalkNet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_talknet)\n",
"- [Glow-TTS](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_glowtts)\n",
"\n",
"While each model has their own implementation of this duration predictor, all of them follow a simple convolutional architecture. The input is the encoded tokens, and the output of the module is a value that represents how many frames in the decoder correspond to each token. It is essentially a hard attention mechanism.\n",
"\n",
"Since each model outputs a duration value per token, it is simple to slow down or increase the speech rate by increasing or decreasing these values. Consider the following:\n",
"\n",
"```python\n",
"def regulate_len(durations, pace=1.0):\n",
" durations = durations.float() / pace\n",
" # The output from the duration predictor module is still a float\n",
" # If we want the speech to be faster, we can increase the pace and make each token duration shorter\n",
" # Alternatively we can slow down the pace by decreasing the pace parameter\n",
" return durations.long() # Lastly, we need to make the durations integers for subsequent processes\n",
"```\n",
"\n",
"Let's try this out with FastPitch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Hey, I am speaking at different paces!\" # Feel free to change it and experiment\n",
"\n",
"# Let's run fastpitch normally\n",
"_, audio, *_ = str_to_audio(input_string)\n",
"print(f\"This is fastpitch speaking at the regular pace of 1.0. This example is {len(audio[0])/sr:.3f} seconds long.\")\n",
"ipd.display(ipd.Audio(audio, rate=sr))\n",
"\n",
"# We can speed up the speech by adjusting the pace\n",
"_, audio, *_ = str_to_audio(input_string, pace=1.2)\n",
"print(f\"This is fastpitch speaking at the faster pace of 1.2. This example is {len(audio[0])/sr:.3f} seconds long.\")\n",
"ipd.display(ipd.Audio(audio, rate=sr))\n",
"\n",
"# We can slow down the speech by adjusting the pace\n",
"_, audio, *_ = str_to_audio(input_string, pace=0.75)\n",
"print(f\"This is fastpitch speaking at the slower pace of 0.75. This example is {len(audio[0])/sr:.3f} seconds long.\")\n",
"ipd.display(ipd.Audio(audio, rate=sr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pitch Control\n",
"\n",
"The newer spectrogram generator models predict the pitch for certain words. Since these models predict pitch, we can adjust the predicted pitch in a similar manner to the predicted durations like in the previous section. A list of NeMo models that support pitch control are as follows:\n",
"\n",
"- [FastPitch](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastpitch)\n",
"- [FastPitch_HifiGan_E2E](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_e2e_fastpitchhifigan)\n",
"- [FastSpeech2](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastspeech_2)\n",
"- [FastSpeech2_HifiGan_E2E](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_e2e_fastspeech2hifigan)\n",
"- [TalkNet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_talknet)\n",
"\n",
"### FastPitch\n",
"\n",
"As with the previous tutorial, we will focus on FastPitch. FastPitch differs from some other models as it predicts a pitch difference to a normalized (mean 0, std 1) speaker pitch. Other models will just predict the unnormalized pitch. Looking at a simplified version of the FastPitch model, we see\n",
"\n",
"```python\n",
"# Predict pitch\n",
"pitch_predicted = self.pitch_predictor(enc_out, enc_mask) # Predicts a pitch that is normalized with speaker statistics \n",
"pitch_emb = self.pitch_emb(pitch.unsqueeze(1)) # A simple 1D convolution to map the float pitch to a embedding pitch\n",
"\n",
"enc_out = enc_out + pitch_emb.transpose(1, 2) # We add the pitch to the encoder output\n",
"spec, *_ = self.decoder(input=len_regulated, seq_lens=dec_lens) # We send the sum to the decoder to get the spectrogram\n",
"```\n",
"\n",
"Let's see the `pitch_predicted` for a sample text. You can run the below cell. You should get an image that looks like the following for the input `Hey, what is my pitch?`:\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/NVIDIA/NeMo/main/tutorials/tts/images/fastpitch-pitch.png\">\n",
"\n",
"Notice that the last word `pitch` has an increase in pitch to stress that it is a question."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import librosa\n",
"import librosa.display\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"from nemo.collections.tts.parts.utils.helpers import regulate_len\n",
"%matplotlib inline\n",
"\n",
"#Define what we want the model to say\n",
"input_string = \"Hey, what is my pitch?\" # Feel free to change it and experiment\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec, audio, durs_pred, pitch_pred = str_to_audio(input_string)\n",
"\n",
" # FastPitch predicts one pitch value per token. To plot it, we have to expand the token length to the spectrogram length\n",
" pitch_pred, _ = regulate_len(durs_pred, pitch_pred.unsqueeze(-1))\n",
" pitch_pred = pitch_pred.squeeze(-1)\n",
" # Note we have to unnormalize the pitch with LJSpeech's mean and std\n",
" pitch_pred = pitch_pred * 65.72037058703644 + 214.72202032404294\n",
"\n",
"# Let's plot the predicted pitch and how it affects the predicted audio\n",
"fig, ax = plt.subplots(figsize=(12, 6))\n",
"spec = np.abs(librosa.stft(audio[0], n_fft=1024))\n",
"ax.plot(pitch_pred.cpu().numpy()[0], color='cyan', linewidth=1)\n",
"librosa.display.specshow(np.log(spec+1e-12), y_axis='log')\n",
"ipd.display(ipd.Audio(audio, rate=sr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot Control\n",
"\n",
"Now that we see how the pitch affects the predicted spectrogram, we can now adjust it to add some effects. We will explore 4 different manipulations:\n",
"\n",
"1) Pitch shift\n",
"\n",
"2) Pitch flatten\n",
"\n",
"3) Pitch inversion\n",
"\n",
"4) Pitch amplification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pitch Shift\n",
"First, let's handle pitch shifting. To shift the pitch up or down by some Hz, we can just add or subtract as needed. Let's shift the pitch down by 50 Hz and compare it to the previous example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Hey, what is my pitch?\" # Feel free to change it and experiment\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Note we have to unnormalize the pitch with LJSpeech's mean and std\n",
" pitch_shift = pitch_norm_pred * 65.72037058703644 + 214.72202032404294\n",
" # Now let's pitch shift down by 50Hz\n",
" pitch_shift = pitch_shift - 50\n",
" # Now we have to renormalize it to be mean 0, std 1\n",
" pitch_shift = (pitch_shift - 214.72202032404294) / 65.72037058703644\n",
" \n",
" # Now we can pass it to the model\n",
" spec_shift, audio_shift, durs_shift_pred, _ = str_to_audio(input_string, pitch=pitch_shift)\n",
" # NOTE: We do not plot the pitch returned from str_to_audio.\n",
" # When we override the pitch, we want to plot the pitch that override the model with.\n",
" # In thise case, it is `pitch_shift`\n",
"\n",
"# Let's see both results\n",
"print(\"The first unshifted sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second shifted sample. This sample is much deeper than the previous.\")\n",
"display_pitch(audio_shift, pitch_shift, durs=durs_shift_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pitch Flattening\n",
"Second, let's look at pitch flattening. To flatten the pitch, we just set it to 0. Let's run it and compare the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Hey, what is my pitch?\" # Feel free to change it and experiment\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Now let's set the pitch to 0\n",
" pitch_flat = pitch_norm_pred * 0\n",
" # Now we can pass it to the model\n",
" spec_flat, audio_flat, durs_flat_pred, _ = str_to_audio(input_string, pitch=pitch_flat)\n",
"\n",
"# Let's see both results\n",
"print(\"The first unaltered sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second flattened sample. This sample is more monotone than the previous.\")\n",
"display_pitch(audio_flat, pitch_flat, durs=durs_flat_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pitch Inversion\n",
"Third, let's look at pitch inversion. To invert the pitch, we just multiply it by -1. Let's run it and compare the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Hey, what is my pitch?\" # Feel free to change it and experiment\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Now let's invert the pitch\n",
" pitch_inv = pitch_norm_pred * -1\n",
" # Now we can pass it to the model\n",
" spec_inv, audio_inv, durs_inv_pred, _ = str_to_audio(input_string, pitch=pitch_inv)\n",
"\n",
"# Let's see both results\n",
"print(\"The first unaltered sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second inverted sample. This sample sounds less like a question and more like a statement.\")\n",
"display_pitch(audio_inv, pitch_inv, durs=durs_inv_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pitch Amplify\n",
"Lastly, let's look at pitch amplifying. To amplify the pitch, we just multiply it by a positive constant. Let's run it and compare the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Hey, what is my pitch?\" # Feel free to change it and experiment\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Now let's amplify the pitch\n",
" pitch_amp = pitch_norm_pred * 1.5\n",
" # Now we can pass it to the model\n",
" spec_amp, audio_amp, durs_amp_pred, _ = str_to_audio(input_string, pitch=pitch_amp)\n",
"\n",
"# Let's see both results\n",
"print(\"The first unaltered sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second amplified sample.\")\n",
"display_pitch(audio_amp, pitch_amp, durs=durs_amp_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Putting it all together\n",
"\n",
"Now that we understand how to control the duration and pitch of TTS models, we can show how to adjust the voice to sound more solemn (slower speed + lower pitch), or more excited (higher speed + higher pitch)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"I want to pass on my condolences for your loss.\"\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Let's try to make the speech more solemn\n",
" # Let's deamplify the pitch and shift the pitch down by 75% of 1 standard deviation\n",
" pitch_sol = (pitch_norm_pred)*0.75-0.75\n",
" # Fastpitch tends to raise the pitch before \"loss\" which sounds inappropriate. Let's just remove that pitch raise\n",
" pitch_sol[0][-5] += 0.2\n",
" # Now let's pass our new pitch to fastpitch with a 90% pacing to slow it down\n",
" spec_sol, audio_sol, durs_sol_pred, _ = str_to_audio(input_string, pitch=pitch_sol, pace=0.9)\n",
" \n",
"# Let's see both results\n",
"print(\"The first unaltered sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second solumn sample\")\n",
"display_pitch(audio_sol, pitch_sol, durs=durs_sol_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Define what we want the model to say\n",
"input_string = \"Congratulations on your promotion.\"\n",
"\n",
"# Run inference to get spectrogram and pitch\n",
"with torch.no_grad():\n",
" spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)\n",
" \n",
" # Let's amplify the pitch to make it sound more animated\n",
" # We also pitch shift up by 50% of 1 standard deviation\n",
" pitch_excite = (pitch_norm_pred)*1.7+0.5\n",
" # Now let's pass our new pitch to fastpitch with a 110% pacing to speed it up\n",
" spec_excite, audio_excite, durs_excite_pred, _ = str_to_audio(input_string, pitch=pitch_excite, pace=1.1)\n",
" \n",
"# Let's see both results\n",
"print(\"The first unaltered sample\")\n",
"display_pitch(audio_norm, pitch_norm_pred, durs=durs_norm_pred)\n",
"print(\"The second excited sample\")\n",
"display_pitch(audio_excite, pitch_excite, durs=durs_excite_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other Models\n",
"\n",
"This notebook lists other models that allow for control of speech rate and pitch. However, please note that not all models accept a `pace`, nor a `pitch` parameter as part of their forward/generate_spectrogram functions. Users who are interested in adding this functionality can use this notebook as a guide on how to do so.\n",
"\n",
"### Duration Control\n",
"\n",
"Adding duration control is the simpler of the two and one simply needs to add the `regulate_lens` function to the appropriate model for duration control.\n",
"\n",
"### Pitch Control\n",
"\n",
"Pitch control is more complicated. There are numerous design decisions that differ between models: 1) Whether to normalize the pitch, 2) Whether to predict pitch per spectrogram frame or per token, and more. While the basic transformations presented here (shift, flatten, invert, and amplify) can be done with all pitch predicting models, where to add this pitch transformation will differ depending on the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"[1] https://arxiv.org/abs/1905.09263"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}