Modules
edgel3.cli
edgel3.cli.run(inputs, output_dir=None, suffix=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=False)
Computes and saves the L3 embedding for the given inputs.
Parameters: - inputs (list of str, or str) – File/directory path or list of file/directory paths to be processed
- output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
- suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npy. If None, then no suffix will be added, i.e. <base filename>.npy.
- model_type ({sea, sparse}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired sparsity.
- emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
- retrain_type (str) – Type of retraining after sparsification of the L3 audio model. ft returns the fine-tuned model and kd returns the knowledge-distilled sparse audio model.
- sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity to be achieved for the audio model of L3. Sparsity of 95.45 corresponds to the EdgeL3 model.
- center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
- hop_size (float) – Hop size in seconds.
- quiet (boolean) – If True, suppress all non-error output to stdout
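The value sets listed above can be checked before any model is loaded. The following is a minimal sketch only: `validate_options` is a hypothetical helper, not part of edgel3, and it encodes nothing beyond the documented choices. Checking `retrain_type` and `sparsity` only for sparse models is an assumption; the documentation states only that `emb_dim` is not used for sparse models.

```python
# Hypothetical helper (not part of edgel3): checks arguments against the
# value sets documented for edgel3.cli.run.
MODEL_TYPES = {"sea", "sparse"}
EMB_DIMS = {512, 256, 128, 64}
RETRAIN_TYPES = {"ft", "kd"}
SPARSITIES = {95.45, 53.5, 63.5, 72.3, 87.0}


def validate_options(model_type="sparse", emb_dim=128,
                     retrain_type="ft", sparsity=95.45):
    """Raise ValueError if an option is outside its documented value set."""
    if model_type not in MODEL_TYPES:
        raise ValueError("model_type must be one of {}".format(sorted(MODEL_TYPES)))
    if model_type == "sea":
        # emb_dim is documented as applying to SEA models only.
        if emb_dim not in EMB_DIMS:
            raise ValueError("emb_dim must be one of {}".format(sorted(EMB_DIMS)))
    else:
        # Assumption: retrain_type and sparsity matter only for sparse models.
        if retrain_type not in RETRAIN_TYPES:
            raise ValueError("retrain_type must be 'ft' or 'kd'")
        if sparsity not in SPARSITIES:
            raise ValueError("sparsity must be one of {}".format(sorted(SPARSITIES)))
    return True
```

Calling `validate_options(sparsity=50.0)` raises `ValueError`, since 50.0 is not one of the listed sparsity levels.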
edgel3.core
edgel3.core._center_audio(audio, frame_len)
Center audio so that the first sample occurs in the middle of the first frame.
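One way to read the centering step is as left-padding with half a frame of silence. The sketch below illustrates that idea under stated assumptions: `center_audio` is a hypothetical stand-in rather than the library's code, it pads with zeros, and it uses plain Python lists to stay dependency-free (the library itself works on NumPy arrays).

```python
def center_audio(audio, frame_len):
    """Prepend frame_len // 2 zeros so the first sample sits mid-frame.

    Hypothetical illustration; zero padding is an assumption.
    """
    pad = frame_len // 2
    return [0.0] * pad + list(audio)
```

With `frame_len=4`, two zeros are prepended, so the original first sample lands at index 2, the middle of the first four-sample frame.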
edgel3.core._pad_audio(audio, frame_len, hop_len)
Pad audio if necessary so that all samples are processed.
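Padding so that "all samples are processed" can be sketched as extending the signal until the last frame ends exactly at the end of the audio. The function below is a hypothetical illustration of that arithmetic, not the library's implementation; zero padding and the use of plain lists are assumptions.

```python
def pad_audio(audio, frame_len, hop_len):
    """Right-pad with zeros so frames of frame_len, spaced hop_len apart,
    cover every sample. Hypothetical illustration only.
    """
    samples = list(audio)
    n = len(samples)
    if n < frame_len:
        # Too short for even one frame: pad up to a single full frame.
        pad = frame_len - n
    else:
        # Ceiling division: hops needed to reach the final sample.
        n_hops = -(-(n - frame_len) // hop_len)
        pad = n_hops * hop_len - (n - frame_len)
    return samples + [0.0] * pad
```

After padding, `(len(result) - frame_len)` is a multiple of `hop_len`, so the final frame ends exactly at the end of the padded signal.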
edgel3.core.get_embedding(audio, sr, model=None, model_type='sparse', emb_dim=128, retrain_type='ft', sparsity=95.45, center=True, hop_size=0.1, verbose=1)
Computes and returns the L3 embedding for the given audio data using a pruned audio model.
Parameters: - audio (np.ndarray [shape=(N,) or (N,C)]) – 1D numpy array of audio data.
- sr (int) – Sampling rate. If it is not 48 kHz (sparse models) or 8 kHz (sea models), the audio will be resampled.
- model (keras.models.Model or None) – Loaded model object. If a model is provided, then sparsity will be ignored. If None is provided, the desired version of smaller L3 will be loaded, determined by model_type. model will be loaded using
- model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
- emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
- retrain_type ({'ft', 'kd'}) – Type of retraining for the sparsified weights of L3 audio model. ft chooses the fine-tuning method and kd returns knowledge distilled model.
- sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
- center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
- hop_size (float) – Hop size in seconds.
- verbose (0 or 1) – Keras verbosity.
Returns: - embedding (np.ndarray [shape=(T, D)]) – Array of embeddings for each window.
- timestamps (np.ndarray [shape=(T,)]) – Array of timestamps corresponding to each embedding in the output.
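The two return values line up row by row: row `i` of `embedding` belongs to `timestamps[i]`. A minimal sketch of how such timestamps could be derived from `hop_size` is shown below. `frame_timestamps` is a hypothetical helper, and the assumption that each timestamp equals the frame index times the hop size is not confirmed by this page.

```python
def frame_timestamps(num_frames, hop_size=0.1):
    """Return one timestamp (in seconds) per embedding frame.

    Assumption: frame i is stamped at i * hop_size.
    """
    return [i * hop_size for i in range(num_frames)]
```

Under this assumption, an embedding with T rows and the default hop size of 0.1 s spans roughly (T - 1) * 0.1 seconds between its first and last timestamps.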
edgel3.core.get_output_path(filepath, suffix, output_dir=None)
Parameters: - filepath (str) – Path to audio file to be processed.
- suffix (str) – String to append to filename (including extension)
- output_dir (str or None) – Path to directory where file will be saved. If None, will use directory of given filepath.
Returns: output_path – Path to output file.
Return type: str
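The path logic described above can be sketched with the standard library. `build_output_path` is a hypothetical stand-in for illustration, not the library's code; joining the base filename and suffix with an underscore is an assumption taken from the `<base filename>_<suffix>` pattern in the parameter descriptions.

```python
import os


def build_output_path(filepath, suffix, output_dir=None):
    """Build <output_dir>/<base filename>_<suffix>.

    `suffix` is expected to include the extension, e.g. "emb.npz".
    Hypothetical illustration only.
    """
    base = os.path.splitext(os.path.basename(filepath))[0]
    if output_dir is None:
        # Fall back to the directory that contains the input file.
        output_dir = os.path.dirname(filepath)
    return os.path.join(output_dir, base + "_" + suffix)
```

For example, `build_output_path("/data/a/clip.wav", "emb.npz")` places `clip_emb.npz` next to the input file, while passing `output_dir="/out"` redirects it there.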
edgel3.core.process_file(filepath, output_dir=None, suffix=None, model=None, model_type='sparse', emb_dim=128, sparsity=95.45, center=True, hop_size=0.1, verbose=True)
Computes and saves the L3 embedding for the given audio file.
Parameters: - filepath (str) – Path to WAV file to be processed.
- output_dir (str or None) – Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.
- suffix (str or None) – String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, then no suffix will be added, i.e. <base filename>.npz.
- model (keras.models.Model or None) – Loaded model object. If a model is provided, then model_type will be ignored. If None is provided, UST specialized L3 or sparse L3 is loaded according to model_type.
- model_type ({'sea', 'sparse'}) – Type of smaller version of L3 model. If sea is selected, the audio model is a UST specialized (SEA) model. sparse gives a sparse L3 model with the desired ‘sparsity’.
- emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models. Not used for sparse models.
- sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
- center (boolean) – If True, pads beginning of signal so timestamps correspond to center of window.
- hop_size (float) – Hop size in seconds.
- verbose (0 or 1) – Keras verbosity.
edgel3.models
edgel3.models._construct_sparsified_audio_network(**kwargs)
Returns an uninitialized model object for a sparsified network with a Melspectrogram input (with 256 frequency bins).
Returns: model – Model object.
Return type: keras.models.Model
edgel3.models._construct_ust_specialized_audio_network(emb_dim=128, **kwargs)
Returns an uninitialized model object for a UST specialized audio network with a Melspectrogram input (with 64 frequency bins).
Returns: model – Model object.
Return type: keras.models.Model
edgel3.models.load_embedding_model(model_type, emb_dim, retrain_type, sparsity)
Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.
Parameters: - model_type ({sea, sparse}) – Type of smaller version of L3 model. If ‘sea’ is selected, the audio model is a UST specialized (SEA) model. ‘sparse’ gives a sparse L3 model with the desired ‘sparsity’.
- emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
- retrain_type ('ft' or 'kd') – Type of retraining for the sparsified weights of L3 audio model. ‘ft’ chooses the fine-tuning method and ‘kd’ returns knowledge distilled model.
- sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – The desired sparsity of audio model.
Returns: model – Model object.
Return type: keras.models.Model
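"Loads the model if the model has not been loaded yet" describes a load-once pattern. One common way to get that behaviour in Python is memoization, sketched below with `functools.lru_cache`. This is an illustration of the pattern, not a claim about how edgel3 implements it: `load_model_cached` and `LOAD_COUNT` are hypothetical, and a plain dict stands in for the Keras model.

```python
import functools

# Counts how many times the expensive load actually runs (for illustration).
LOAD_COUNT = {"n": 0}


@functools.lru_cache(maxsize=None)
def load_model_cached(model_type, emb_dim, retrain_type, sparsity):
    """Load-once wrapper: repeated calls with the same arguments reuse
    the first result instead of loading again."""
    LOAD_COUNT["n"] += 1
    # Stand-in for building the network and reading its weights.
    return {
        "model_type": model_type,
        "emb_dim": emb_dim,
        "retrain_type": retrain_type,
        "sparsity": sparsity,
    }
```

Two calls with identical arguments return the same object and trigger a single load; changing any argument, such as `sparsity`, triggers a new one.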
edgel3.models.load_embedding_model_path(model_type, emb_dim, retrain_type, sparsity)
Returns the local path to the model weights file for the model with the given sparsity.
Parameters: - model_type ({sea, sparse}) – Type of smaller version of L3 model. If ‘sea’ is selected, the audio model is a UST specialized (SEA) model. ‘sparse’ gives a sparse L3 model with the desired ‘sparsity’.
- emb_dim ({512, 256, 128, 64}) – Desired embedding dimension of the UST specialized embedding approximated (SEA) models.
- retrain_type ('ft' or 'kd') – Type of retraining for the sparsified weights of L3 audio model. ‘ft’ chooses the fine-tuning method and ‘kd’ returns knowledge distilled model.
- sparsity ({95.45, 53.5, 63.5, 72.3, 87.0}) – Desired sparsity of the audio model.
Returns: output_path – Path to given model object
Return type: str