Modules

edgel3.cli

edgel3.cli.run(inputs, output_dir=None, suffix=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=False)[source]

Computes and saves L3 embeddings for the given inputs.

Parameters:
  • inputs (list of str, or str) – File/directory path or list of file/directory paths to be processed
  • output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
  • suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npy. If None, then no suffix will be added, i.e. <base filename>.npy.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining after sparsification of the L3 audio model. 'ft' returns the fine-tuned sparse model and 'kd' returns the knowledge-distilled sparse model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity to be achieved for the audio model of L3. Sparsity of 95.45 corresponds to the EdgeL3 model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (boolean) – If False, suppress all non-error output to stdout.
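
The same processing can also be invoked from Python. A minimal sketch (the input directory is hypothetical):

    from edgel3.cli import run

    # Process every audio file found under the (hypothetical) input directory
    # and write one <base filename>.npy embedding file next to each input.
    run(
        '/path/to/audio_dir',
        output_dir=None,        # None: save outputs alongside the inputs
        model_type='sparse',    # pruned L3 audio model
        retrain_type='ft',      # fine-tuned sparse weights
        sparsity=95.45,         # 95.45% sparsity corresponds to the EdgeL3 model
        hop_size=0.1,           # 100 ms between consecutive analysis windows
        verbose=True,
    )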

edgel3.core

edgel3.core._center_audio(audio, frame_len)[source]

Center audio so that the first sample occurs in the middle of the first frame.

edgel3.core._pad_audio(audio, frame_len, hop_len)[source]

Pad audio if necessary so that all samples are processed
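
Both helpers are internal, but the behavior they describe can be illustrated with a short numpy sketch (an approximation for illustration, not the actual implementation):

    import numpy as np

    def center_audio_sketch(audio, frame_len):
        # Prepend half a frame of silence so that the first analysis window is
        # centered on the first sample (what center=True requests).
        return np.pad(audio, (frame_len // 2, 0), mode='constant')

    def pad_audio_sketch(audio, frame_len, hop_len):
        # Append silence so the trailing samples still fall inside a full window.
        if len(audio) < frame_len:
            pad = frame_len - len(audio)
        else:
            remainder = (len(audio) - frame_len) % hop_len
            pad = (hop_len - remainder) if remainder else 0
        return np.pad(audio, (0, pad), mode='constant')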

edgel3.core.get_embedding(audio, sr, model=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=1)[source]

Computes and returns the L3 embedding for the given audio data using a smaller (sparse or UST-specialized) L3 audio model.

Parameters:
  • audio (np.ndarray [shape=(N,) or (N,C)]) – 1D numpy array of audio data.
  • sr (int) – Sampling rate. If it is not 48 kHz (for sparse models) or 8 kHz (for sea models), the audio will be resampled.
  • model (keras.models.Model or None) – Loaded model object. If a model is provided, then sparsity will be ignored. If None, the desired version of the smaller L3 model, determined by model_type, will be loaded using edgel3.models.load_embedding_model.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (0 or 1) – Keras verbosity.
Returns:

  • embedding (np.ndarray [shape=(T, D)]) – Array of embeddings for each window.
  • timestamps (np.ndarray [shape=(T,)]) – Array of timestamps corresponding to each embedding in the output.
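
A minimal usage sketch (the WAV path is hypothetical; soundfile is used here only as one convenient way to load audio):

    import soundfile as sf
    from edgel3.core import get_embedding

    audio, sr = sf.read('example.wav')   # hypothetical input file

    # Extract embeddings with the fine-tuned 95.45%-sparse EdgeL3 audio model.
    emb, ts = get_embedding(
        audio, sr,
        model_type='sparse',
        retrain_type='ft',
        sparsity=95.45,
        center=True,
        hop_size=0.1,
    )

    print(emb.shape)  # (T, D): one embedding vector per analysis window
    print(ts.shape)   # (T,): timestamp in seconds for each window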

edgel3.core.get_output_path(filepath, suffix, output_dir=None)[source]

Returns the path to the output file corresponding to the given input file.

Parameters:
  • filepath (str) – Path to audio file to be processed.
  • suffix (str) – String to append to the filename (including extension).
  • output_dir (str or None) – Path to directory where file will be saved. If None, will use directory of given filepath.
Returns:

output_path – Path to output file.

Return type:

str
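
A hypothetical sketch of the resulting paths:

    from edgel3.core import get_output_path

    # Suffix that is only an extension: output is placed next to the input.
    get_output_path('/data/clip.wav', '.npz')
    # e.g. '/data/clip.npz'

    # Named suffix and an explicit output directory.
    get_output_path('/data/clip.wav', 'emb.npz', output_dir='/out')
    # e.g. '/out/clip_emb.npz'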

edgel3.core.process_file(filepath, output_dir=None, suffix=None, model=None, model_type='sparse', emb_dim=128, sparsity=95.45, center=True, hop_size=0.1, verbose=True)[source]

Computes and saves the L3 embedding for a given audio file.

Parameters:
  • filepath (str) – Path to WAV file to be processed.
  • output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
  • suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, then no suffix will be added, i.e. <base filename>.npz.
  • model (keras.models.Model or None) – Loaded model object. If a model is provided, then model_type will be ignored. If None is provided, UST specialized L3 or sparse L3 is loaded according to the model_type.
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
  • center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
  • hop_size (float) – Hop size in seconds.
  • verbose (0 or 1) – Keras verbosity.
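
A minimal sketch for a single file (paths are hypothetical):

    from edgel3.core import process_file

    # Compute embeddings for one WAV file with the 128-dimensional SEA model and
    # save them as <base filename>_sea.npz inside the given output directory.
    process_file(
        'example.wav',
        output_dir='embeddings',
        suffix='sea',
        model_type='sea',
        emb_dim=128,
        hop_size=0.1,
    )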

edgel3.models

edgel3.models._construct_sparsified_audio_network(**kwargs)[source]

Returns an uninitialized model object for a sparsified network with a Melspectrogram input (with 256 frequency bins).

Returns:

model – Model object.

Return type:

keras.models.Model

edgel3.models._construct_ust_specialized_audio_network(emb_dim=128, **kwargs)[source]

Returns an uninitialized model object for a UST specialized audio network with a Melspectrogram input (with 64 frequency bins).

Returns:

model – Model object.

Return type:

keras.models.Model

edgel3.models.load_embedding_model(model_type, emb_dim, retrain_type, sparsity)[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters:
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
Returns:

model – Model object.

Return type:

keras.models.Model
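
Loading the model once and reusing it avoids repeated deserialization when many clips are processed. A sketch (file names are hypothetical):

    import soundfile as sf
    from edgel3.core import get_embedding
    from edgel3.models import load_embedding_model

    # Knowledge-distilled sparse audio model at 87.0% sparsity.
    model = load_embedding_model(
        model_type='sparse', emb_dim=128, retrain_type='kd', sparsity=87.0)

    for path in ['a.wav', 'b.wav']:   # hypothetical file list
        audio, sr = sf.read(path)
        # Passing a loaded model makes get_embedding ignore the sparsity argument.
        emb, ts = get_embedding(audio, sr, model=model)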

edgel3.models.load_embedding_model_path(model_type, emb_dim, retrain_type, sparsity)[source]

Returns the local path to the model weights file for the model with the given characteristics.

Parameters:
  • model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If 'sea' is selected, the audio model is a UST specialized (SEA) model. 'sparse' gives a sparse L3 model with the desired sparsity.
  • emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
  • retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of the L3 audio model. 'ft' chooses the fine-tuning method and 'kd' returns the knowledge-distilled model.
  • sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – Desired sparsity of the audio model.
Returns:

output_path – Path to the model weights file.

Return type:

str
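
A short sketch for locating the bundled weights of a given configuration:

    from edgel3.models import load_embedding_model_path

    weights_path = load_embedding_model_path(
        model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45)
    print(weights_path)   # local path to the weights file for this configuration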