# Scripts for creation of synthetic code-switched data from monolingual sources

Follow the two steps below, in order:

1. Create the (intermediate) manifest file using `code_switching_manifest_creation.py`. Its usage is as follows:

   `python code_switching_manifest_creation.py --manifest_language1 --manifest_language2 --manifest_save_path --id_language1 --id_language2 --max_sample_duration_sec --min_sample_duration_sec --dataset_size_required_hrs`

   The estimated runtime for `dataset_size_required_hrs=10,000` is ~2 mins.

2. Create the synthetic audio data and the corresponding manifest file using `code_switching_audio_data_creation.py`. Its usage is as follows:

   `python code_switching_audio_data_creation.py --manifest_path --audio_save_folder_path --manifest_save_path --audio_normalized_amplitude --cs_data_sampling_rate --sample_beginning_pause_msec --sample_joining_pause_msec --sample_end_pause_msec --workers`

   The estimated runtime for generating a 10,000 hour corpus is ~40 hrs with a single worker.
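
The pause, amplitude, and sampling-rate options of `code_switching_audio_data_creation.py` suggest that each synthetic sample is built by joining monolingual segments with silence around and between them. The sketch below illustrates that idea in plain Python. It is an assumption about what the options mean, not the script's actual implementation: the helper names (`silence`, `normalize`, `join_segments`) are hypothetical, and peak normalization is one plausible reading of `--audio_normalized_amplitude`.

```python
from typing import List, Sequence


def silence(duration_msec: float, sampling_rate: int) -> List[float]:
    """Return zero-valued samples lasting duration_msec at sampling_rate."""
    return [0.0] * int(round(sampling_rate * duration_msec / 1000.0))


def normalize(samples: Sequence[float], target_peak: float) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak.

    Assumption: peak normalization; the real script may normalize differently.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    return [s * target_peak / peak for s in samples]


def join_segments(
    segments: Sequence[Sequence[float]],
    sampling_rate: int,
    beginning_pause_msec: float,
    joining_pause_msec: float,
    end_pause_msec: float,
    normalized_amplitude: float,
) -> List[float]:
    """Concatenate segments with a leading, joining, and trailing pause."""
    out = silence(beginning_pause_msec, sampling_rate)
    for i, segment in enumerate(segments):
        if i > 0:
            # Insert a pause between consecutive segments only.
            out.extend(silence(joining_pause_msec, sampling_rate))
        out.extend(normalize(segment, normalized_amplitude))
    out.extend(silence(end_pause_msec, sampling_rate))
    return out


# Toy example: two tiny "segments" at 1000 Hz, so 1 msec equals 1 sample.
sample = join_segments(
    segments=[[0.5, -1.0], [0.2]],
    sampling_rate=1000,
    beginning_pause_msec=2,
    joining_pause_msec=3,
    end_pause_msec=4,
    normalized_amplitude=0.5,
)
```

In this toy run, `sample` holds 2 leading zeros, the first segment scaled to a peak of 0.5, 3 zeros, the second segment scaled the same way, and 4 trailing zeros, for 12 samples in total.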