Modules

edgel3.cli

edgel3.cli.run(inputs, output_dir=None, suffix=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=False)[source]

Computes and saves L3 embeddings for the given inputs.

Parameters:
  • inputs (list of str, or str) – File/directory path or list of file/directory paths to be processed
  • output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
  • suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npy. If None, then no suffix will be added, i.e. <base filename>.npy.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining after sparsification of the L3 audio model. 'ft' returns the fine-tuned sparse model and 'kd' returns the knowledge-distilled sparse model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity to be achieved for the audio model of L3. Sparsity of 95.45 corresponds to the EdgeL3 model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (boolean) – If False, suppress all non-error output to stdout.
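
The same processing can also be invoked from Python. A minimal sketch (the input directory is hypothetical):

    from edgel3.cli import run

    # Process every audio file found under the (hypothetical) input directory
    # and write one <base filename>.npy embedding file next to each input.
    run(
        '/path/to/audio_dir',
        output_dir=None,        # None: save outputs alongside the inputs
        model_type='sparse',    # pruned L3 audio model
        retrain_type='ft',      # fine-tuned sparse weights
        sparsity=95.45,         # 95.45% sparsity corresponds to the EdgeL3 model
        hop_size=0.1,           # 100 ms between consecutive analysis windows
        verbose=True,
    )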

edgel3.core

edgel3.core._center_audio(audio, frame_len)[source]

Center audio so that the first sample occurs in the middle of the first frame.

edgel3.core._pad_audio(audio, frame_len, hop_len)[source]

Pad audio if necessary so that all samples are processed
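
Both helpers are internal, but the behavior they describe can be illustrated with a short numpy sketch (an approximation for illustration, not the actual implementation):

    import numpy as np

    def center_audio_sketch(audio, frame_len):
        # Prepend half a frame of silence so that the first analysis window is
        # centered on the first sample (what center=True requests).
        return np.pad(audio, (frame_len // 2, 0), mode='constant')

    def pad_audio_sketch(audio, frame_len, hop_len):
        # Append silence so the trailing samples still fall inside a full window.
        if len(audio) < frame_len:
            pad = frame_len - len(audio)
        else:
            remainder = (len(audio) - frame_len) % hop_len
            pad = (hop_len - remainder) if remainder else 0
        return np.pad(audio, (0, pad), mode='constant')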

edgel3.core.get_embedding(audio, sr, model=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=1)[source]

Computes and returns the L3 embedding for the given audio data using a smaller (sparse or UST-specialized) L3 audio model.

Parameters:
  • audio (np.ndarray [shape=(N,) or (N,C)]) – 1D numpy array of audio data.
  • sr (int) – Sampling rate. If it is not 48 kHz (for sparse models) or 8 kHz (for sea models), the audio will be resampled.
  • model (keras.models.Model or None) – Loaded model object. If a model is provided, then sparsity will be ignored. If None, the desired version of the smaller L3 model, determined by model_type, will be loaded using edgel3.models.load_embedding_model.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (0 or 1) – Keras verbosity.
Returns:

  • embedding (np.ndarray [shape=(T, D)]) – Array of embeddings for each window.
  • timestamps (np.ndarray [shape=(T,)]) – Array of timestamps corresponding to each embedding in the output.
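
A minimal usage sketch (the WAV path is hypothetical; soundfile is used here only as one convenient way to load audio):

    import soundfile as sf
    from edgel3.core import get_embedding

    audio, sr = sf.read('example.wav')   # hypothetical input file

    # Extract embeddings with the fine-tuned 95.45%-sparse EdgeL3 audio model.
    emb, ts = get_embedding(
        audio, sr,
        model_type='sparse',
        retrain_type='ft',
        sparsity=95.45,
        center=True,
        hop_size=0.1,
    )

    print(emb.shape)  # (T, D): one embedding vector per analysis window
    print(ts.shape)   # (T,): timestamp in seconds for each window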

edgel3.core.get_output_path(filepath, suffix, output_dir=None)[source]

Returns the path to the output file corresponding to the given input file.

Parameters:
  • filepath (str) – Path to audio file to be processed.
  • suffix (str) – String to append to the filename (including extension).
  • output_dir (str or None) – Path to directory where file will be saved. If None, will use directory of given filepath.
Returns:

output_path – Path to output file.

Return type:

str
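
A hypothetical sketch of the resulting paths:

    from edgel3.core import get_output_path

    # Suffix that is only an extension: output is placed next to the input.
    get_output_path('/data/clip.wav', '.npz')
    # e.g. '/data/clip.npz'

    # Named suffix and an explicit output directory.
    get_output_path('/data/clip.wav', 'emb.npz', output_dir='/out')
    # e.g. '/out/clip_emb.npz'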

edgel3.core.process_file(filepath, output_dir=None, suffix=None, model=None, model_type='sparse', emb_dim=128, sparsity=95.45, center=True, hop_size=0.1, verbose=True)[source]

Computes and saves the L3 embedding for a given audio file.

Parameters:
  • filepath (str) – Path to WAV file to be processed.
  • output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
  • suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, then no suffix will be added, i.e. <base filename>.npz.
  • model (keras.models.Model or None) – Loaded model object. If a model is provided, then model_type will be ignored. If None is provided, UST specialized L3 or sparse L3 is loaded according to the model_type.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (0 or 1) – Keras verbosity.
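
A minimal sketch for a single file (paths are hypothetical):

    from edgel3.core import process_file

    # Compute embeddings for one WAV file with the 128-dimensional SEA model and
    # save them as <base filename>_sea.npz inside the given output directory.
    process_file(
        'example.wav',
        output_dir='embeddings',
        suffix='sea',
        model_type='sea',
        emb_dim=128,
        hop_size=0.1,
    )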

edgel3.models

edgel3.models._construct_sparsified_audio_network(**kwargs)[source]

Returns an uninitialized model object for a sparsified network with a Melspectrogram input (with 256 frequency bins).

Returns:

model – Model object.

Return type:

keras.models.Model

edgel3.models._construct_ust_specialized_audio_network(emb_dim=128, **kwargs)[source]

Returns an uninitialized model object for a UST specialized audio network with a Melspectrogram input (with 64 frequency bins).

Returns:

model – Model object.

Return type:

keras.models.Model

edgel3.models.load_embedding_model(model_type, emb_dim, retrain_type, sparsity)[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters:
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
Returns:

model – Model object.

Return type:

keras.models.Model
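
Loading the model once and reusing it avoids repeated deserialization when many clips are processed. A sketch (file names are hypothetical):

    import soundfile as sf
    from edgel3.core import get_embedding
    from edgel3.models import load_embedding_model

    # Knowledge-distilled sparse audio model at 87.0% sparsity.
    model = load_embedding_model(
        model_type='sparse', emb_dim=128, retrain_type='kd', sparsity=87.0)

    for path in ['a.wav', 'b.wav']:   # hypothetical file list
        audio, sr = sf.read(path)
        # Passing a loaded model makes get_embedding ignore the sparsity argument.
        emb, ts = get_embedding(audio, sr, model=model)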

edgel3.models.load_embedding_model_path(model_type, emb_dim, retrain_type, sparsity)[source]

Returns the local path to the model weights file for the model with the given characteristics.

Parameters:
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – Desired sparsity of the audio model.
Returns:

output_path – Path to the model weights file.

Return type:

str
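
A short sketch for locating the bundled weights of a given configuration:

    from edgel3.models import load_embedding_model_path

    weights_path = load_embedding_model_path(
        model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45)
    print(weights_path)   # local path to the weights file for this configuration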