Transformers

Python wrapper around the SoX library. This module requires that SoX is installed.

class sox.transform.Transformer

Audio file transformer. Class which allows multiple effects to be chained to create an output file, saved to output_filepath.

Methods

set_globals(dither, guard, multithread, …) Sets SoX’s global arguments.
build(input_filepath, output_filepath, …) Given an input file or array, creates an output_file on disk by executing the current set of commands.
build_file(input_filepath, output_filepath, …) An alias for build.
build_array(input_filepath, input_array, …) Given an input file or array, returns the output as a numpy array by executing the current set of commands.
allpass(frequency: float, width_q: float = 2.0)

Apply a two-pole all-pass filter. An all-pass filter changes the audio’s frequency-to-phase relationship without changing its frequency-to-amplitude relationship. The filter is described in detail at http://musicdsp.org/files/Audio-EQ-Cookbook.txt

Parameters:
frequency : float

The filter’s center frequency in Hz.

width_q : float, default=2.0

The filter’s width as a Q-factor.
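The cookbook referenced above gives closed-form biquad coefficients for this filter. As a minimal sketch (plain Python, an assumed sample rate of 44.1 kHz, and hypothetical helper names allpass_coeffs and magnitude that are not part of pysox), the code below computes those coefficients and checks that the magnitude response stays at 1 across the spectrum, which is what leaving the frequency-to-amplitude relationship unchanged means. SoX computes its own coefficients internally; this is not its code.

```python
import cmath
import math


def allpass_coeffs(frequency, width_q, sample_rate):
    """Normalized (b, a) for the Audio EQ Cookbook all-pass biquad.

    Hypothetical helper for illustration; not part of pysox.
    """
    w0 = 2.0 * math.pi * frequency / sample_rate
    alpha = math.sin(w0) / (2.0 * width_q)
    cos_w0 = math.cos(w0)
    b = [1.0 - alpha, -2.0 * cos_w0, 1.0 + alpha]
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    # Normalize so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude(b, a, frequency, sample_rate):
    """Evaluate |H(e^{jw})| of a biquad at a single frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * frequency / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


b, a = allpass_coeffs(frequency=1000.0, width_q=2.0, sample_rate=44100)
for f in (50.0, 1000.0, 5000.0, 15000.0):
    print(f, round(magnitude(b, a, f, 44100), 6))  # 1.0 at every frequency
```

Because b is a reversed copy of a, numerator and denominator always have equal magnitude; only the phase varies with frequency.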

bandpass(frequency: float, width_q: float = 2.0, constant_skirt: bool = False)

Apply a two-pole Butterworth band-pass filter with the given central frequency, and (3dB-point) band-width. The filter rolls off at 6dB per octave (20dB per decade) and is described in detail in http://musicdsp.org/files/Audio-EQ-Cookbook.txt

Parameters:
frequency : float

The filter’s center frequency in Hz.

width_q : float, default=2.0

The filter’s width as a Q-factor.

constant_skirt : bool, default=False

If True, selects constant skirt gain (peak gain = width_q). If False, selects constant 0dB peak gain.

See also

bandreject, sinc
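The difference between the two constant_skirt modes shows up in the peak gain at the center frequency. The sketch below uses the Audio EQ Cookbook band-pass formulas (hypothetical helpers bandpass_coeffs and gain_at, an assumed 48 kHz sample rate; not pysox or SoX code) and evaluates both variants at their center frequency.

```python
import cmath
import math


def bandpass_coeffs(frequency, width_q, sample_rate, constant_skirt=False):
    """Audio EQ Cookbook band-pass biquad (hypothetical helper, not pysox)."""
    w0 = 2.0 * math.pi * frequency / sample_rate
    alpha = math.sin(w0) / (2.0 * width_q)
    # Constant skirt gain scales the numerator by Q, so the peak gain is Q;
    # otherwise the peak gain is 1 (0 dB).
    k = width_q if constant_skirt else 1.0
    b = [k * alpha, 0.0, -k * alpha]
    a = [1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_at(b, a, frequency, sample_rate):
    """Linear magnitude of the biquad response at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * frequency / sample_rate)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


sr = 48000
peak_0db = gain_at(*bandpass_coeffs(2000.0, 4.0, sr), 2000.0, sr)
peak_skirt = gain_at(*bandpass_coeffs(2000.0, 4.0, sr, constant_skirt=True), 2000.0, sr)
print(round(peak_0db, 6), round(peak_skirt, 6))  # 1.0 4.0
```

With width_q=4.0 the constant-skirt variant peaks at a linear gain of 4 (about +12 dB), while the other variant peaks at exactly 0 dB.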
bandreject(frequency: float, width_q: float = 2.0, constant_skirt: bool = False)

Apply a two-pole Butterworth band-reject filter with the given central frequency, and (3dB-point) band-width. The filter rolls off at 6dB per octave (20dB per decade) and is described in detail in http://musicdsp.org/files/Audio-EQ-Cookbook.txt

Parameters:
frequency : float

The filter’s center frequency in Hz.

width_q : float, default=2.0

The filter’s width as a Q-factor.

constant_skirt : bool, default=False

If True, selects constant skirt gain (peak gain = width_q). If False, selects constant 0dB peak gain.

See also

bandpass, sinc
bass(gain_db: float, frequency: float = 100.0, slope: float = 0.5)

Boost or cut the bass (lower) frequencies of the audio using a two-pole shelving filter with a response similar to that of a standard hi-fi’s tone-controls. This is also known as shelving equalisation.

The filters are described in detail in http://musicdsp.org/files/Audio-EQ-Cookbook.txt

Parameters:
gain_db : float

The gain at 0 Hz. For a large cut use -20, for a large boost use 20.

frequency : float, default=100.0

The filter’s cutoff frequency in Hz.

slope : float, default=0.5

The steepness of the filter’s shelf transition. For a gentle slope use 0.3, and use 1.0 for a steep slope.

See also

treble, equalizer
bend(n_bends: int, start_times: List[float], end_times: List[float], cents: List[float], frame_rate: int = 25, oversample_rate: int = 16)

Changes pitch by specified amounts at specified times. The pitch-bending algorithm utilises the Discrete Fourier Transform (DFT) at a particular frame rate and over-sampling rate.

Parameters:
n_bends : int

The number of intervals to pitch shift

start_times : list of floats

A list of absolute start times (in seconds), in order

end_times : list of floats

A list of absolute end times (in seconds) in order. [start_time, end_time] intervals may not overlap!

cents : list of floats

A list of pitch shifts in cents. A positive value shifts the pitch up, a negative value shifts the pitch down.

frame_rate : int, default=25

The number of DFT frames to process per second, between 10 and 80

oversample_rate : int, default=16

The number of frames to over sample per second, between 4 and 32

See also

pitch
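Each cents value corresponds to a frequency ratio of 2^(cents/1200), and the interval lists must line up as described above. The sketch below shows both; check_bends is a hypothetical helper that mirrors the constraints stated in the parameter list and is not pysox's own validation code. In this sketch an interval may start exactly where the previous one ends.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a shift in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def check_bends(n_bends, start_times, end_times, cents):
    """Hypothetical validation of the bend() arguments described above."""
    if not (len(start_times) == len(end_times) == len(cents) == n_bends):
        raise ValueError("start_times, end_times and cents must have length n_bends")
    for start, end in zip(start_times, end_times):
        if end <= start:
            raise ValueError("each end time must come after its start time")
    # Intervals must be in order and must not overlap.
    for prev_end, next_start in zip(end_times, start_times[1:]):
        if next_start < prev_end:
            raise ValueError("[start_time, end_time] intervals may not overlap")


check_bends(2, [0.5, 2.0], [1.0, 3.0], [200.0, -300.0])  # valid: no error
print(cents_to_ratio(1200.0), cents_to_ratio(-1200.0))  # 2.0 0.5
```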
biquad(b: List[float], a: List[float])

Apply a biquad IIR filter with the given coefficients.

Parameters:
b : list of floats

Numerator coefficients. Must be length 3

a : list of floats

Denominator coefficients. Must be length 3

See also

fir, treble, bass, equalizer
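A biquad applies the standard second-order difference equation to the input. The sketch below is a plain-Python Direct Form I implementation (hypothetical helper biquad_filter, not SoX's code) that takes b and a in the order given above and normalizes both by a[0].

```python
def biquad_filter(b, a, samples):
    """Direct Form I biquad (illustrative sketch, not SoX's implementation).

    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    if len(b) != 3 or len(a) != 3:
        raise ValueError("b and a must both have length 3")
    b0, b1, b2 = (c / a[0] for c in b)
    _, a1, a2 = (c / a[0] for c in a)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_filter([0.5, 0.5, 0.0], [1.0, 0.0, 0.0], impulse))  # [0.5, 0.5, 0.0, 0.0]
print(biquad_filter([1.0, 0.0, 0.0], [1.0, -0.5, 0.0], impulse))  # [1.0, 0.5, 0.25, 0.125]
```

The first call is a purely feed-forward two-tap average; the second is purely recursive, so its impulse response decays geometrically.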
build(input_filepath: Union[str, pathlib.Path, None] = None, output_filepath: Union[str, pathlib.Path, None] = None, input_array: Optional[numpy.ndarray] = None, sample_rate_in: Optional[float] = None, extra_args: Optional[List[str]] = None, return_output: bool = False)

Given an input file or array, creates an output_file on disk by executing the current set of commands. This function returns True on success. If return_output is True, this function returns a triple of (status, out, err), giving the success state, along with stdout and stderr returned by sox.

Parameters:
input_filepath : str or None

Either path to input audio file or None for array input.

output_filepath : str

Path to desired output file. If a file already exists at the given path, the file will be overwritten. If ‘-n’, no file is created.

input_array : np.ndarray or None

An np.ndarray of a waveform with shape (n_samples, n_channels). sample_rate_in must also be provided. If None, input_filepath must be specified.

sample_rate_in : int

Sample rate of input_array. This argument is ignored if input_array is None.

extra_args : list or None, default=None

If a list is given, these additional arguments are passed to SoX at the end of the list of effects. Don’t use this argument unless you know exactly what you’re doing!

return_output : bool, default=False

If True, returns the status and information sent to stderr and stdout as a tuple (status, stdout, stderr). If output_filepath is None, return_output=True by default. If False, returns True on success.

Returns:
status : bool

True on success.

out : str (optional)

This is not returned unless return_output is True. When returned, captures the stdout produced by sox.

err : str (optional)

This is not returned unless return_output is True. When returned, captures the stderr produced by sox.

Examples

>>> import numpy as np
>>> import sox
>>> tfm = sox.Transformer()
>>> sample_rate = 44100
>>> y = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate * 1.0) / sample_rate)

file in, file out - basic usage

>>> status = tfm.build('path/to/input.wav', 'path/to/output.mp3')

file in, file out - equivalent usage

>>> status = tfm.build(
...     input_filepath='path/to/input.wav',
...     output_filepath='path/to/output.mp3'
... )

array in, file out

>>> status = tfm.build(
...     input_array=y, sample_rate_in=sample_rate,
...     output_filepath='path/to/output.mp3'
... )
build_array(input_filepath: Union[str, pathlib.Path, None] = None, input_array: Optional[numpy.ndarray] = None, sample_rate_in: Optional[float] = None, extra_args: Optional[List[str]] = None)

Given an input file or array, returns the output as a numpy array by executing the current set of commands. By default the array will have the same sample rate as the input file unless otherwise specified using set_output_format. Functions such as rate, channels and convert will be ignored!

Parameters:
input_filepath : str or None

Either path to input audio file or None.

input_array : np.ndarray or None

An np.ndarray of a waveform with shape (n_samples, n_channels). If this argument is passed, sample_rate_in must also be provided. If None, input_filepath must be specified.

sample_rate_in : int

Sample rate of input_array. This argument is ignored if input_array is None.

extra_args : list or None, default=None

If a list is given, these additional arguments are passed to SoX at the end of the list of effects. Don’t use this argument unless you know exactly what you’re doing!

Returns:
output_array : np.ndarray

Output audio as a numpy array

Examples

>>> import numpy as np
>>> import sox
>>> tfm = sox.Transformer()
>>> sample_rate = 44100
>>> y = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate * 1.0) / sample_rate)

file in, array out

>>> output_array = tfm.build_array(input_filepath='path/to/input.wav')

array in, array out

>>> output_array = tfm.build_array(input_array=y, sample_rate_in=sample_rate)

specifying the output sample rate

>>> tfm.set_output_format(rate=8000)
>>> output_array = tfm.build_array(input_array=y, sample_rate_in=sample_rate)

if an effect changes the number of channels, you must explicitly specify the number of output channels

>>> tfm.remix(remix_dictionary={1: [1], 2: [1], 3: [1]})
>>> tfm.set_output_format(channels=3)
>>> output_array = tfm.build_array(input_array=y, sample_rate_in=sample_rate)
build_file(input_filepath: Union[str, pathlib.Path, None] = None, output_filepath: Union[str, pathlib.Path, None] = None, input_array: Optional[numpy.ndarray] = None, sample_rate_in: Optional[float] = None, extra_args: Optional[List[str]] = None, return_output: bool = False)

An alias for build. Given an input file or array, creates an output_file on disk by executing the current set of commands. This function returns True on success. If return_output is True, this function returns a triple of (status, out, err), giving the success state, along with stdout and stderr returned by sox.

Parameters:
input_filepath : str or None

Either path to input audio file or None for array input.

output_filepath : str

Path to desired output file. If a file already exists at the given path, the file will be overwritten. If ‘-n’, no file is created.

input_array : np.ndarray or None

An np.ndarray of a waveform with shape (n_samples, n_channels). sample_rate_in must also be provided. If None, input_filepath must be specified.

sample_rate_in : int

Sample rate of input_array. This argument is ignored if input_array is None.

extra_args : list or None, default=None

If a list is given, these additional arguments are passed to SoX at the end of the list of effects. Don’t use this argument unless you know exactly what you’re doing!

return_output : bool, default=False

If True, returns the status and information sent to stderr and stdout as a tuple (status, stdout, stderr). If output_filepath is None, return_output=True by default. If False, returns True on success.

Returns:
status : bool

True on success.

out : str (optional)

This is not returned unless return_output is True. When returned, captures the stdout produced by sox.

err : str (optional)

This is not returned unless return_output is True. When returned, captures the stderr produced by sox.

Examples

>>> import numpy as np
>>> import sox
>>> tfm = sox.Transformer()
>>> sample_rate = 44100
>>> y = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate * 1.0) / sample_rate)

file in, file out - basic usage

>>> status = tfm.build_file('path/to/input.wav', 'path/to/output.mp3')

file in, file out - equivalent usage

>>> status = tfm.build_file(
...     input_filepath='path/to/input.wav',
...     output_filepath='path/to/output.mp3'
... )

array in, file out

>>> status = tfm.build_file(
...     input_array=y, sample_rate_in=sample_rate,
...     output_filepath='path/to/output.mp3'
... )
channels(n_channels: int)

Change the number of channels in the audio signal. When decreasing the number of channels, the channels are mixed together; when increasing it, channels are duplicated.

Note: This overrides arguments used in the convert effect!

Parameters:
n_channels : int

Desired number of channels.

See also

convert
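The two directions described above can be sketched on interleaved frames. This is a minimal sketch under stated assumptions: change_channels is a hypothetical helper, mixing down to mono is assumed to be an equal-weight average, and only N-to-1 and 1-to-N conversions are modeled; SoX's exact mixing weights and its handling of other channel counts may differ.

```python
def change_channels(frames, n_channels):
    """Hypothetical sketch of channel-count conversion on a list of frames.

    Each frame is a tuple with one sample per channel. Assumption: mixing
    down to mono averages the inputs; going up from mono duplicates the
    single channel. Other conversions are not modeled here.
    """
    in_channels = len(frames[0])
    if n_channels == in_channels:
        return [tuple(f) for f in frames]
    if n_channels == 1:
        return [(sum(f) / in_channels,) for f in frames]
    if in_channels == 1:
        return [tuple(f) * n_channels for f in frames]
    raise NotImplementedError("only N->1 and 1->N are sketched here")


print(change_channels([(1.0, 0.0), (0.5, 0.5)], 1))  # [(0.5,), (0.5,)]
print(change_channels([(0.25,)], 3))                 # [(0.25, 0.25, 0.25)]
```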
chorus(gain_in: float = 0.5, gain_out: float = 0.9, n_voices: int = 3, delays: Optional[List[float]] = None, decays: Optional[List[float]] = None, speeds: Optional[List[float]] = None, depths: Optional[List[float]] = None, shapes: Optional[List[typing_extensions.Literal['s', 't']]] = None)

Add a chorus effect to the audio. This can make a single vocal sound like a chorus, but can also be applied to instrumentation.

Chorus resembles an echo effect with a short delay, but whereas with echo the delay is constant, with chorus it is varied using sinusoidal or triangular modulation. The modulation depth defines the range over which the modulated delay moves before or after the base delay. As a result, the delayed sound alternately plays slightly slower or faster, so it is tuned around the original one, like a chorus in which some voices are slightly off key.

Parameters:
gain_in : float, default=0.5

Input volume, between 0 and 1.

gain_out : float, default=0.9

Output volume, between 0 and 1.

n_voices : int, default=3

The number of voices in the chorus effect.

delays : list of floats > 20 or None, default=None

If a list, the list of delays (in milliseconds) of length n_voices. If None, the individual delay parameters are chosen automatically to be between 40 and 60 milliseconds.

decays : list of floats or None, default=None

If a list, the list of decays (as a fraction of gain_in) of length n_voices. If None, the individual decay parameters are chosen automatically to be between 0.3 and 0.4.

speeds : list of floats or None, default=None

If a list, the list of modulation speeds (in Hz) of length n_voices. If None, the individual speed parameters are chosen automatically to be between 0.25 and 0.4 Hz.

depths : list of floats or None, default=None

If a list, the list of depths (in milliseconds) of length n_voices. If None, the individual depth parameters are chosen automatically to be between 1 and 3 milliseconds.

shapes : list of ‘s’ or ‘t’ or None, default=None

If a list, the list of modulation shapes - ‘s’ for sinusoidal or ‘t’ for triangular - of length n_voices. If None, the individual shapes are chosen automatically.
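The modulated delay described above can be sketched for a single voice. This illustrates the idea only and is not SoX's implementation; modulated_delay_ms is a hypothetical helper whose delay_ms, depth_ms, speed_hz and shape arguments correspond loosely to one entry each of delays, depths, speeds and shapes.

```python
import math


def modulated_delay_ms(t, delay_ms, depth_ms, speed_hz, shape="s"):
    """Delay (ms) of one chorus voice at time t seconds. Hypothetical helper.

    The delay swings up to depth_ms either side of delay_ms, following a
    sinusoidal ('s') or triangular ('t') modulation at speed_hz.
    """
    phase = (t * speed_hz) % 1.0
    if shape == "s":
        m = math.sin(2.0 * math.pi * phase)
    elif shape == "t":
        # Triangle wave in [-1, 1] that starts at 0 and rises, like the sine.
        if phase < 0.25:
            m = 4.0 * phase
        elif phase < 0.75:
            m = 2.0 - 4.0 * phase
        else:
            m = 4.0 * phase - 4.0
    else:
        raise ValueError("shape must be 's' or 't'")
    return delay_ms + depth_ms * m


# One voice with a 50 ms base delay, 2 ms depth and 0.25 Hz modulation:
# the triangular delay passes through 50, 52, 50 and 48 ms.
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, round(modulated_delay_ms(t, 50.0, 2.0, 0.25, "t"), 3))
```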

clear_effects()

Remove all effects from the current effects chain.

compand(attack_time: float = 0.3, decay_time: float = 0.8, soft_knee_db: float = 6.0, tf_points: List[Tuple[float, float]] = [(-70, -70), (-60, -20), (0, 0)])

Compand (compress or expand) the dynamic range of the audio.

Parameters:
attack_time : float, default=0.3

The time in seconds over which the instantaneous level of the input signal is averaged to determine increases in volume.

decay_time : float, default=0.8

The time in seconds over which the instantaneous level of the input signal is averaged to determine decreases in volume.

soft_knee_db : float or None, default=6.0

The amount (in dB) by which the points where adjacent line segments of the transfer function meet will be rounded. If None, no soft knee is applied.

tf_points : list of tuples, default=[(-70, -70), (-60, -20), (0, 0)]

Transfer function points as a list of tuples corresponding to points in (dB, dB) defining the compander’s transfer function.

See also

mcompand, contrast
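Ignoring soft_knee_db and the attack/decay smoothing, the static part of the transfer function is a piecewise-linear map between the tf_points. The sketch below (hypothetical helper transfer_db, not pysox or SoX code) evaluates the default points: a -65 dB input is raised to -45 dB, and levels between -60 dB and 0 dB are compressed 3:1 toward 0 dB. Levels outside the listed points are rejected because this sketch does not model what SoX does there.

```python
def transfer_db(level_db, tf_points):
    """Piecewise-linear compander transfer function (no soft knee).

    tf_points are (input dB, output dB) pairs, as in compand's tf_points.
    Levels outside the range covered by the points raise ValueError.
    """
    points = sorted(tf_points)
    if not points[0][0] <= level_db <= points[-1][0]:
        raise ValueError("level outside the range covered by tf_points")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= level_db <= x1:
            # Linear interpolation within this segment.
            return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0)
    return points[-1][1]  # only reached when a single point is given


default_points = [(-70, -70), (-60, -20), (0, 0)]
for level in (-70, -65, -30, 0):
    print(level, transfer_db(level, default_points))
```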
contrast(amount=75)

Comparable with compression, this effect modifies an audio signal to make it sound louder.

Parameters:
amount : float

Amount of enhancement between 0 and 100.

See also

compand, mcompand
convert(samplerate: Optional[float] = None, n_channels: Optional[int] = None, bitdepth: Optional[int] = None)

Converts output audio to the specified format.

Parameters:
samplerate : float, default=None

Desired samplerate. If None, defaults to the same as input.

n_channels : int, default=None

Desired number of channels. If None, defaults to the same as input.

bitdepth : int, default=None

Desired bitdepth. If None, defaults to the same as input.

See also

rate
dcshift(shift: float = 0.0)

Apply a DC shift to the audio.

Parameters:
shift : float

Amount to shift the audio, between -2 and 2 (the audio itself lies between -1 and 1).

See also

highpass
deemph()

Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving filter). Pre-emphasis was applied in the mastering of some CDs issued in the early 1980s. These included many classical music albums, as well as now sought-after issues of albums by The Beatles, Pink Floyd and others. Pre-emphasis should be removed at playback time by a de-emphasis filter in the playback device. However, not all modern CD players have this filter, and very few PC CD drives have it; playing pre-emphasised audio without the correct de-emphasis filter results in audio that sounds harsh and is far from what its creators intended.

The de-emphasis filter is implemented as a biquad and requires the input audio sample rate to be either 44.1kHz or 48kHz. Maximum deviation from the ideal response is only 0.06dB (up to 20kHz).

See also

bass, treble
delay(positions: List[float])

Delay one or more audio channels such that they start at the given positions.

Parameters:
positions : list of floats

List of times (in seconds) to delay each audio channel. If fewer positions are given than the number of channels, the remaining channels will be unaffected.

downsample(factor: int = 2)

Downsample the signal by an integer factor. Only the first of every factor samples is retained; the others are discarded.

No decimation filter is applied. If the input is not a properly bandlimited baseband signal, aliasing will occur. This may be desirable, e.g., for frequency translation.

For a general resampling effect with anti-aliasing, see rate.

Parameters:
factor : int, default=2

Downsampling factor.

See also

rate, upsample
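Because no filter is applied, a component above the new Nyquist frequency folds back into the baseband. The sketch below (hypothetical helper downsample, plain Python rather than SoX) keeps every factor-th sample and shows a 3000 Hz tone sampled at 8000 Hz becoming indistinguishable from a 1000 Hz tone after downsampling by 2.

```python
import math


def downsample(samples, factor=2):
    """Keep the first of every `factor` samples; no anti-aliasing filter."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


sr = 8000
tone = [math.cos(2 * math.pi * 3000 * n / sr) for n in range(16)]
down = downsample(tone, 2)  # new rate 4000 Hz, so the new Nyquist is 2000 Hz
# 3000 Hz lies above the new Nyquist frequency and aliases to 4000 - 3000 = 1000 Hz.
alias = [math.cos(2 * math.pi * 1000 * m / (sr // 2)) for m in range(len(down))]
print(max(abs(x - y) for x, y in zip(down, alias)) < 1e-9)  # True
```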
earwax()

Makes audio easier to listen to on headphones. Adds ‘cues’ to 44.1kHz stereo audio so that when listened to on headphones the stereo image is moved from inside your head (standard for headphones) to outside and in front of the listener (standard for speakers).

Warning: Will only work properly on 44.1kHz stereo audio!

echo(gain_in: float = 0.8, gain_out: float = 0.9, n_echos: int = 1, delays: List[float] = [60], decays: List[float] = [0.4])

Add echoing to the audio.

Echoes are reflected sound and can occur naturally amongst mountains (and sometimes large buildings) when talking or shouting; digital echo effects emulate this behaviour and are often used to help fill out the sound of a single instrument or vocal. The time difference between the original signal and the reflection is the ‘delay’ (time), and the loudness of the reflected signal is the ‘decay’. Multiple echoes can have different delays and decays.

Parameters:
gain_in : float, default=0.8

Input volume, between 0 and 1

gain_out : float, default=0.9

Output volume, between 0 and 1

n_echos : int, default=1

Number of reflections

delays : list, default=[60]

List of delays in milliseconds

decays : list, default=[0.4]

List of decays relative to gain_in, between 0 and 1

See also

echos, reverb, chorus
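The parameters above can be read as a simplified feed-forward model: each output sample is the input scaled by gain_in plus each delayed input scaled by its decay, with the sum scaled by gain_out. The function below is a sketch under that assumption (hypothetical helper echo, not SoX's source), with each delay converted from milliseconds to whole samples.

```python
def echo(samples, sample_rate, gain_in=0.8, gain_out=0.9, delays=(60,), decays=(0.4,)):
    """Simplified feed-forward echo (illustrative sketch, not SoX's code).

    y[n] = gain_out * (gain_in * x[n] + sum_i decays[i] * x[n - lag_i])
    """
    if len(delays) != len(decays):
        raise ValueError("delays and decays must have the same length")
    lags = [round(ms * sample_rate / 1000) for ms in delays]  # ms -> samples
    out = []
    for n, x in enumerate(samples):
        y = gain_in * x
        for lag, decay in zip(lags, decays):
            if n - lag >= 0:
                y += decay * samples[n - lag]
        out.append(gain_out * y)
    return out


# At a 1 kHz sample rate a 2 ms delay is 2 samples, so an impulse
# produces a direct sound at n=0 and one reflection at n=2.
print(echo([1.0, 0.0, 0.0, 0.0], 1000, delays=[2], decays=[0.4]))
```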
echos(gain_in: float = 0.8, gain_out: float = 0.9, n_echos: int = 1, delays: List[float] = [60], decays: List[float] = [0.4])

Add a sequence of echoes to the audio.

Like the echo effect, echos stands for ‘ECHO in Sequel’: the first echo takes the input, the second takes the input and the first echo, the third takes the input and the first and second echoes, and so on. Care should be taken when using many echos; a single echos has the same effect as a single echo.

Parameters:
gain_in : float, default=0.8

Input volume, between 0 and 1

gain_out : float, default=0.9

Output volume, between 0 and 1

n_echos : int, default=1

Number of reflections

delays : list, default=[60]

List of delays in milliseconds

decays : list, default=[0.4]

List of decays relative to gain_in, between 0 and 1

See also

echo, reverb, chorus
equalizer(frequency: float, width_q: float, gain_db: float)

Apply a two-pole peaking equalisation (EQ) filter to boost or reduce around a given frequency. This effect can be applied multiple times to produce complex EQ curves.

Parameters:
frequency : float

The filter’s central frequency in Hz.

width_q : float

The filter’s width as a Q-factor.

gain_db : float

The filter’s gain in dB.

See also

bass, treble
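For a peaking filter, gain_db is the boost or cut reached exactly at the central frequency. The sketch below uses the Audio EQ Cookbook peaking-EQ formulas (hypothetical helpers peaking_coeffs and gain_db_at, an assumed 44.1 kHz sample rate; not pysox code) to check that, and shows that cascading two filters adds their dB responses, which is why repeated equalizer calls can build up complex curves.

```python
import cmath
import math


def peaking_coeffs(frequency, width_q, gain_db, sample_rate):
    """Audio EQ Cookbook peaking-EQ biquad (hypothetical helper, not pysox)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * frequency / sample_rate
    alpha = math.sin(w0) / (2.0 * width_q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * A, -2.0 * cos_w0, 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * cos_w0, 1.0 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(b, a, frequency, sample_rate):
    """Biquad response in dB at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * frequency / sample_rate)
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


sr = 44100
boost = peaking_coeffs(1000.0, 1.0, 6.0, sr)
cut = peaking_coeffs(4000.0, 2.0, -3.0, sr)
print(round(gain_db_at(*boost, 1000.0, sr), 6))  # 6.0
# In a cascade, the dB responses of the stages add at each frequency.
f = 2500.0
print(gain_db_at(*boost, f, sr) + gain_db_at(*cut, f, sr))
```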
fade(fade_in_len: float = 0.0, fade_out_len: float = 0.0, fade_shape: typing_extensions.Literal['q', 'h', 't', 'l', 'p'] = 'q')

Add a fade in and/or fade out to an audio file. Default fade shape is 1/4 sine wave.

Parameters:
fade_in_len : float, default=0.0

Length of fade-in (seconds). If fade_in_len = 0, no fade in is applied.

fade_out_len : float, default=0.0

Length of fade-out (seconds). If fade_out_len = 0, no fade out is applied.

fade_shape : str, default=’q’
Shape of fade. Must be one of
  • ‘q’ for quarter sine (default),
  • ‘h’ for half sine,
  • ‘t’ for linear,
  • ‘l’ for logarithmic,
  • ‘p’ for inverted parabola.

See also

splice
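The fade shapes listed above are gain curves applied over the fade length. The sketch below gives textbook versions of four of them as a function of the position within the fade (hypothetical helpers fade_gain and apply_fade_in, not SoX's code); the logarithmic shape is deliberately omitted because its exact SoX curve is not reproduced here. A fade-out would use the same curve with the position reversed.

```python
import math


def fade_gain(fraction, shape="q"):
    """Fade-in gain (0..1) at `fraction` of the way through the fade."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if shape == "q":  # quarter sine
        return math.sin(fraction * math.pi / 2.0)
    if shape == "h":  # half sine
        return (1.0 - math.cos(fraction * math.pi)) / 2.0
    if shape == "t":  # linear
        return fraction
    if shape == "p":  # inverted parabola
        return 1.0 - (1.0 - fraction) ** 2
    raise ValueError("shape not supported in this sketch")


def apply_fade_in(samples, sample_rate, fade_in_len, shape="q"):
    """Scale the first fade_in_len seconds of samples by the fade curve."""
    n_fade = int(round(fade_in_len * sample_rate))
    return [x * fade_gain(n / n_fade, shape) if n < n_fade else x
            for n, x in enumerate(samples)]


print(apply_fade_in([1.0] * 6, 4, 1.0, "t"))  # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
```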
fir(coefficients: List[float])

Use SoX’s FFT convolution engine with given FIR filter coefficients.

Parameters:
coefficients : list

fir filter coefficients
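The operation behind fir is linear convolution of the signal with the coefficient list. SoX performs it with FFTs for speed; the direct-form loop below (hypothetical helper fir_filter, with the output truncated to the input length as an assumption of this sketch) computes the same convolution more slowly and makes the link between the coefficients and the impulse response easy to see.

```python
def fir_filter(coefficients, samples):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].

    O(len(samples) * len(coefficients)); output has the input's length.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coefficients):
            if n - k < 0:
                break
            acc += h * samples[n - k]
        out.append(acc)
    return out


# The impulse response of an FIR filter is its coefficient list.
print(fir_filter([0.25, 0.5, 0.25], [1.0, 0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.5, 0.25, 0.0, 0.0]
```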

flanger(delay: float = 0, depth: float = 2, regen: float = 0, width: float = 71, speed: float = 0.5, shape: typing_extensions.Literal['sine', 'triangle'] = 'sine', phase: float = 25, interp: typing_extensions.Literal['linear', 'quadratic'] = 'linear')

Apply a flanging effect to the audio.

Parameters:
delay : float, default=0

Base delay (in milliseconds) between 0 and 30.

depth : float, default=2

Added swept delay (in milliseconds) between 0 and 10.

regen : float, default=0

Percentage regeneration between -95 and 95.

width : float, default=71

Percentage of delayed signal mixed with original between 0 and 100.

speed : float, default=0.5

Sweeps per second (in Hz) between 0.1 and 10.

shape : ‘sine’ or ‘triangle’, default=’sine’

Swept wave shape

phase : float, default=25

Swept wave percentage phase-shift for multi-channel flange, between 0 and 100. Both 0 and 100 give the same phase on each channel.

interp : ‘linear’ or ‘quadratic’, default=’linear’

Digital delay-line interpolation type.

See also

tremolo
gain(gain_db: float = 0.0, normalize: bool = True, limiter: bool = False, balance: Optional[typing_extensions.Literal['e', 'B', 'b']] = None)

Apply amplification or attenuation to the audio signal.

Parameters:
gain_db : float, default=0.0

Gain adjustment in decibels (dB).

normalize : bool, default=True

If True, audio is normalized to gain_db relative to full scale. If False, simply adjusts the audio power level by gain_db.

limiter : bool, default=False

If True, a simple limiter is invoked to prevent clipping.

balance : str or None, default=None
Balance gain across channels. Can be one of:
  • None applies no balancing (default)
  • ‘e’ applies gain to all channels other than that with the
    highest peak level, such that all channels attain the same peak level
  • ‘B’ applies gain to all channels other than that with the
    highest RMS level, such that all channels attain the same RMS level
  • ‘b’ applies gain with clipping protection to all channels other
    than that with the highest RMS level, such that all channels attain the same RMS level

If normalize=True, ‘B’ and ‘b’ are equivalent.

See also

loudness
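The two normalize modes differ only in how the linear scale factor is chosen. The sketch below (hypothetical helper apply_gain; limiter and balance are not modeled, and samples are assumed to lie in [-1, 1] with 1.0 as full scale) shows both. The norm effect documented further on behaves like the normalize=True branch.

```python
def apply_gain(samples, gain_db=0.0, normalize=True):
    """Sketch of gain with and without normalization (not SoX's code).

    normalize=True: scale so the peak sits at gain_db relative to full scale.
    normalize=False: multiply every sample by the linear factor for gain_db.
    """
    factor = 10.0 ** (gain_db / 20.0)
    if normalize:
        peak = max(abs(x) for x in samples)
        if peak == 0.0:
            return list(samples)  # silence cannot be normalized
        factor /= peak
    return [x * factor for x in samples]


quiet = [0.1, -0.25, 0.2]
print(max(abs(x) for x in apply_gain(quiet, 0.0)))   # 1.0
print(max(abs(x) for x in apply_gain(quiet, -6.0)))  # about 0.501
print(apply_gain(quiet, 20.0, normalize=False))      # +20 dB is a factor of 10
```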
highpass(frequency: float, width_q: float = 0.707, n_poles: int = 2)

Apply a high-pass filter with 3dB point frequency. The filter can be either single-pole or double-pole. The filters roll off at 6dB per pole per octave (20dB per pole per decade).

Parameters:
frequency : float

The filter’s cutoff frequency in Hz.

width_q : float, default=0.707

The filter’s width as a Q-factor. Applies only when n_poles=2. The default gives a Butterworth response.

n_poles : int, default=2

The number of poles in the filter. Must be either 1 or 2

hilbert(num_taps: Optional[int] = None)

Apply an odd-tap Hilbert transform filter, phase-shifting the signal by 90 degrees. This is used in many matrix coding schemes and for analytic signal generation. The process is often written as a multiplication by i (or j), the imaginary unit. An odd-tap Hilbert transform filter has a bandpass characteristic, attenuating the lowest and highest frequencies.

Parameters:
num_taps : int or None, default=None

Number of filter taps; must be odd. If None, it is chosen to have a cutoff frequency of about 75 Hz.

loudness(gain_db: float = -10.0, reference_level: float = 65.0)

Loudness control. Similar to the gain effect, but provides equalisation for the human auditory system.

The gain is adjusted by gain_db and the signal is equalised according to ISO 226 w.r.t. reference_level.

Parameters:
gain_db : float, default=-10.0

Loudness adjustment amount (in dB)

reference_level : float, default=65.0

Reference level (in dB) according to which the signal is equalized. Must be between 50 and 75 (dB)

See also

gain
lowpass(frequency: float, width_q: float = 0.707, n_poles: int = 2)

Apply a low-pass filter with 3dB point frequency. The filter can be either single-pole or double-pole. The filters roll off at 6dB per pole per octave (20dB per pole per decade).

Parameters:
frequency : float

The filter’s cutoff frequency in Hz.

width_q : float, default=0.707

The filter’s width as a Q-factor. Applies only when n_poles=2. The default gives a Butterworth response.

n_poles : int, default=2

The number of poles in the filter. Must be either 1 or 2
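In the two-pole case, the Audio EQ Cookbook low-pass has a linear gain equal to width_q at the cutoff frequency, so the default width_q=0.707 puts the cutoff at about -3 dB. The sketch below (hypothetical helpers lowpass_coeffs and response_db, an assumed 44.1 kHz sample rate; not pysox or SoX code) checks that and compares the attenuation at two frequencies one octave apart, well above the cutoff.

```python
import cmath
import math


def lowpass_coeffs(frequency, width_q, sample_rate):
    """Audio EQ Cookbook two-pole low-pass biquad (hypothetical helper)."""
    w0 = 2.0 * math.pi * frequency / sample_rate
    alpha = math.sin(w0) / (2.0 * width_q)
    cos_w0 = math.cos(w0)
    b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, frequency, sample_rate):
    """Biquad response in dB at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * frequency / sample_rate)
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


sr = 44100
b, a = lowpass_coeffs(1000.0, 0.707, sr)
print(round(response_db(b, a, 1000.0, sr), 2))  # -3.01 (the 3 dB point)
# Attenuation gained over one octave, from 4 kHz to 8 kHz.
print(round(response_db(b, a, 4000.0, sr) - response_db(b, a, 8000.0, sr), 1))
```

The second value comes out a little above the nominal 12 dB per octave (6 dB per pole) because the digital filter's response falls faster as it approaches the Nyquist frequency.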

mcompand(n_bands: int = 2, crossover_frequencies: List[float] = [1600], attack_time: List[float] = [0.005, 0.000625], decay_time: List[float] = [0.1, 0.0125], soft_knee_db: List[Optional[float]] = [6.0, None], tf_points: List[List[Tuple[float, float]]] = [[(-47, -40), (-34, -34), (-17, -33), (0, 0)], [(-47, -40), (-34, -34), (-15, -33), (0, 0)]], gain: List[Optional[float]] = [None, None])

The multi-band compander is similar to the single-band compander but the audio is first divided into bands using Linkwitz-Riley cross-over filters and a separately specifiable compander run on each band.

When used with n_bands=1, this effect is identical to compand. When using n_bands > 1, the first set of arguments applies a single-band compander, and each subsequent set of arguments is applied on each of the crossover frequencies.

Parameters:
n_bands : int, default=2

The number of bands.

crossover_frequencies : list of float, default=[1600]

A list of crossover frequencies in Hz of length n_bands-1. The first band is always the full spectrum, followed by the bands specified by crossover_frequencies.

attack_time : list of float, default=[0.005, 0.000625]

A list of length n_bands, where each element is the time in seconds over which the instantaneous level of the input signal is averaged to determine increases in volume over the current band.

decay_time : list of float, default=[0.1, 0.0125]

A list of length n_bands, where each element is the time in seconds over which the instantaneous level of the input signal is averaged to determine decreases in volume over the current band.

soft_knee_db : list of float or None, default=[6.0, None]

A list of length n_bands, where each element is the amount (in dB) by which the points where adjacent line segments of the transfer function meet will be rounded over the current band. If None, no soft knee is applied.

tf_points : list of list of tuples, default=[[(-47, -40), (-34, -34), (-17, -33), (0, 0)], [(-47, -40), (-34, -34), (-15, -33), (0, 0)]]

A list of length n_bands, where each element is the transfer function points as a list of tuples corresponding to points in (dB, dB) defining the compander’s transfer function over the current band.

gain : list of floats or None, default=[None, None]

A list of gain values, one per frequency band. If an element is None, no gain is applied to that band.

See also

compand, contrast
noiseprof(input_filepath: Union[str, pathlib.Path], profile_path: Union[str, pathlib.Path])

Calculate a profile of the audio for use in noise reduction. Running this command does not affect the Transformer effects chain. When this function is called, the calculated noise profile file is saved to the profile_path.

Parameters:
input_filepath : str

Path to the audio file from which to compute a noise profile.

profile_path : str

Path to save the noise profile file.

See also

noisered
noisered(profile_path: Union[str, pathlib.Path], amount: float = 0.5)

Reduce noise in the audio signal by profiling and filtering. This effect is moderately effective at removing consistent background noise such as hiss or hum.

Parameters:
profile_path : str

Path to a noise profile file. This file can be generated using the noiseprof effect.

amount : float, default=0.5

Amount of noise to remove, between 0 and 1. Higher values remove more noise but are more likely to remove wanted components of the audio signal.

See also

noiseprof
norm(db_level: float = -3.0)

Normalize an audio file to a particular dB level. This behaves identically to the gain effect with normalize=True.

Parameters:
db_level : float, default=-3.0

Output volume (dB)

See also

gain, loudness
oops()

Out Of Phase Stereo effect. Mixes stereo to twin-mono where each mono channel contains the difference between the left and right stereo channels. This is sometimes known as the ‘karaoke’ effect as it often has the effect of removing most or all of the vocals from a recording.
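The reason the effect removes vocals is that anything mixed identically into both channels (typically a centre-panned vocal) cancels in the left-minus-right difference. The sketch below (hypothetical helper out_of_phase_stereo, not SoX's code) shows that idea; any scaling SoX applies to the difference signal is an assumption not modeled here.

```python
def out_of_phase_stereo(frames):
    """Sketch of the 'karaoke' idea: each output channel carries L - R.

    frames is a list of (left, right) pairs. Output scaling is not modeled.
    """
    return [(left - right, left - right) for left, right in frames]


# A centre-panned "vocal" (identical in both channels) cancels,
# while content present only on the left survives.
vocal, guitar_left = 0.5, 0.25
print(out_of_phase_stereo([(vocal + guitar_left, vocal)]))  # [(0.25, 0.25)]
```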

overdrive(gain_db: float = 20.0, colour: float = 20.0)

Apply non-linear distortion.

Parameters:
gain_db : float, default=20

Controls the amount of distortion (dB).

colour : float, default=20

Controls the amount of even harmonic content in the output (dB).

pad(start_duration: float = 0.0, end_duration: float = 0.0)

Add silence to the beginning or end of a file. Calling this with the default arguments has no effect.

Parameters:
start_duration : float

Number of seconds of silence to add to beginning.

end_duration : float

Number of seconds of silence to add to end.

See also

delay
phaser(gain_in: float = 0.8, gain_out: float = 0.74, delay: int = 3, decay: float = 0.4, speed: float = 0.5, modulation_shape: typing_extensions.Literal['sinusoidal', 'triangular'] = 'sinusoidal')

Apply a phasing effect to the audio.

Parameters:
gain_in : float, default=0.8

Input volume between 0 and 1

gain_out : float, default=0.74

Output volume between 0 and 1

delay : float, default=3

Delay in milliseconds between 0 and 5

decay : float, default=0.4

Decay relative to gain_in, between 0.1 and 0.5.

speed : float, default=0.5

Modulation speed in Hz, between 0.1 and 2

modulation_shape : str, default=’sinusoidal’

Modulation shape. One of ‘sinusoidal’ or ‘triangular’

See also

flanger, tremolo
pitch(n_semitones: float, quick: bool = False)

Pitch shift the audio without changing the tempo.

This effect uses the WSOLA algorithm. The audio is chopped up into segments which are then shifted in the time domain and overlapped (cross-faded) at points where their waveforms are most similar as determined by measurement of least squares.

Parameters:
n_semitones : float

The number of semitones to shift. Can be positive or negative.

quick : bool, default=False

If True, this effect will run faster but with lower sound quality.

See also

bend, speed, tempo
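Shifting by n_semitones multiplies every frequency by 2^(n_semitones/12) while the duration stays the same. One semitone is 100 cents, the unit bend uses. A small worked example (semitones_to_ratio is a hypothetical helper, not part of pysox):

```python
def semitones_to_ratio(n_semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (n_semitones / 12.0)


print(semitones_to_ratio(12))                    # 2.0 (one octave up)
print(round(440.0 * semitones_to_ratio(3), 2))   # 523.25 (A4 up 3 semitones to C5)
print(round(semitones_to_ratio(-7), 4))          # a fifth down
```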
power_spectrum(input_filepath: Union[str, pathlib.Path])

Calculates the power spectrum (4096 point DFT). This method internally invokes the stat command with the -freq option.

Note: The file is downmixed to mono prior to computation.

Parameters:
input_filepath : str

Path to input file to compute stats on.

Returns:
power_spectrum : list

List of frequency (Hz), amplitude pairs.

See also

stat, stats, sox.file_info
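With an N-point DFT the frequency bins are spaced sample_rate / N apart, so the 4096-point DFT mentioned above resolves about 10.77 Hz at 44.1 kHz. The sketch below is a naive DFT (hypothetical helper power_spectrum with a much smaller N; plain Python) returning (frequency, power) pairs in the same spirit; SoX's own windowing and scaling are not reproduced.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Naive DFT power spectrum as a list of (frequency Hz, power) pairs.

    Illustrative only; O(N^2), unwindowed, and scaled by 1/N.
    """
    n = len(samples)
    pairs = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        pairs.append((k * sample_rate / n, abs(acc) ** 2 / n))
    return pairs


sr, n = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]
peak_freq, _ = max(power_spectrum(tone, sr), key=lambda p: p[1])
print(peak_freq)  # 1000.0 (bins are 8000 / 64 = 125 Hz apart)
```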
preview(input_filepath: Union[str, pathlib.Path])

Play a preview of the output with the current set of effects.

Parameters:
input_filepath : str

Path to input audio file.

rate(samplerate: float, quality: typing_extensions.Literal['q', 'l', 'm', 'h', 'v'] = 'h')

Change the audio sampling rate (i.e. resample the audio) to any given samplerate. Higher resampling quality means slower runtime.

Parameters:
samplerate : float

Desired sample rate.

quality : str, default=‘h’
Resampling quality. One of:
  • q : Quick - very low quality,
  • l : Low,
  • m : Medium,
  • h : High (default),
  • v : Very high
remix(remix_dictionary: Optional[Dict[int, List[int]]] = None, num_output_channels: Optional[int] = None)[source]

Remix the channels of an audio file.

Note: volume options are not yet implemented

Parameters:
remix_dictionary : dict or None

Dictionary mapping output channel to list of input channel(s). Empty lists indicate the corresponding output channel should be empty. If None, mixes all channels down to a single mono file.

num_output_channels : int or None

The number of channels in the output file. If None, the number of output channels is equal to the largest key in remix_dictionary. If remix_dictionary is None, this variable is ignored.

Examples

Remix a 4-channel input file. The output file will have input channel 2 in channel 1, a mixdown of input channels 1 and 3 in channel 2, an empty channel 3, and a copy of input channel 4 in channel 4.

>>> import sox
>>> tfm = sox.Transformer()
>>> remix_dictionary = {1: [2], 2: [1, 3], 4: [4]}
>>> tfm.remix(remix_dictionary)
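A dictionary like this ultimately has to be expressed as the arguments of SoX's remix effect. There, each output channel is one argument, comma-separated input channels are mixed together, and 0 produces a silent channel. The helper below is a hypothetical sketch of that mapping, not pysox's internal code, and it ignores per-channel volume options.

```python
from typing import Dict, List, Optional


def remix_args(remix_dictionary: Dict[int, List[int]],
               num_output_channels: Optional[int] = None) -> List[str]:
    """Build SoX 'remix' arguments from a 1-indexed output -> inputs mapping."""
    if num_output_channels is None:
        # Default documented above: the largest key in remix_dictionary.
        num_output_channels = max(remix_dictionary)
    args = ["remix"]
    for out_ch in range(1, num_output_channels + 1):
        in_chans = remix_dictionary.get(out_ch, [])
        # Comma-separated inputs are mixed; "0" creates a silent channel.
        args.append(",".join(str(c) for c in in_chans) if in_chans else "0")
    return args


print(remix_args({1: [2], 2: [1, 3], 4: [4]}))  # ['remix', '2', '1,3', '0', '4']
```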
repeat(count: int = 1)[source]

Repeat the entire audio count times.

Parameters:
count : int, default=1

The number of times to repeat the audio.

reverb(reverberance: float = 50, high_freq_damping: float = 50, room_scale: float = 100, stereo_depth: float = 100, pre_delay: float = 0, wet_gain: float = 0, wet_only: bool = False)[source]

Add reverberation to the audio using the ‘freeverb’ algorithm. A reverberation effect is sometimes desirable for concert halls that are too small or contain so many people that the hall’s natural reverberance is diminished. Applying a small amount of stereo reverb to a (dry) mono signal will usually make it sound more natural.

Parameters:
reverberance : float, default=50

Percentage of reverberance

high_freq_damping : float, default=50

Percentage of high-frequency damping.

room_scale : float, default=100

Scale of the room as a percentage.

stereo_depth : float, default=100

Stereo depth as a percentage.

pre_delay : float, default=0

Pre-delay in milliseconds.

wet_gain : float, default=0

Amount of wet gain in dB

wet_only : bool, default=False

If True, only outputs the wet signal.

See also

echo
reverse()[source]

Reverse the audio completely.

set_globals(dither: bool = False, guard: bool = False, multithread: bool = False, replay_gain: bool = False, verbosity: int = 2)[source]

Sets SoX’s global arguments. Overwrites any previously set global arguments. If this function is not explicitly called, globals are set to this function’s defaults.

Parameters:
dither : bool, default=False

If True, dithering is applied when the output has a low bit depth.

guard : bool, default=False

If True, invokes the gain effect to guard against clipping.

multithread : bool, default=False

If True, each channel is processed in parallel.

replay_gain : bool, default=False

If True, applies replay-gain adjustment to input-files.

verbosity : int, default=2
SoX’s verbosity level. One of:
  • 0 : No messages are shown at all
  • 1 : Only error messages are shown. These are generated if SoX
    cannot complete the requested commands.
  • 2 : Warning messages are also shown. These are generated if
    SoX can complete the requested commands, but not exactly according to the requested command parameters, or if clipping occurs.
  • 3 : Descriptions of SoX’s processing phases are also shown.
    Useful for seeing exactly how SoX is processing your audio.
  • 4, >4 : Messages to help with debugging SoX are also shown.
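These options correspond roughly to SoX's own command-line global options. The sketch below is a hypothetical helper, not a copy of pysox's implementation. It assumes the SoX flags -D (disable automatic dither), -G (guard against clipping), --multi-threaded, --replay-gain track, and -V<level>. Because SoX dithers automatically, dither=False is modelled here as passing -D.

```python
from typing import List


def sox_global_args(dither: bool = False, guard: bool = False,
                    multithread: bool = False, replay_gain: bool = False,
                    verbosity: int = 2) -> List[str]:
    """Translate set_globals-style options into SoX global command-line options."""
    args: List[str] = []
    if not dither:
        args.append("-D")                   # SoX dithers automatically; -D turns that off
    if guard:
        args.append("-G")                   # invoke gain to guard against clipping
    if multithread:
        args.append("--multi-threaded")     # process channels in parallel
    if replay_gain:
        args += ["--replay-gain", "track"]  # apply per-track replay gain
    args.append(f"-V{verbosity}")           # verbosity level
    return args


print(sox_global_args())                                      # ['-D', '-V2']
print(sox_global_args(dither=True, guard=True, verbosity=0))  # ['-G', '-V0']
```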
set_input_format(file_type: Optional[str] = None, rate: Optional[float] = None, bits: Optional[int] = None, channels: Optional[int] = None, encoding: Optional[typing_extensions.Literal['signed-integer', 'unsigned-integer', 'floating-point', 'a-law', 'u-law', 'oki-adpcm', 'ima-adpcm', 'ms-adpcm', 'gsm-full-rate']] = None, ignore_length: bool = False)[source]

Sets input file format arguments. This is primarily useful when dealing with audio files without a file extension. Overwrites any previously set input file arguments.

If this function is not explicitly called, the input format is inferred from the file extension or the file’s header.

Parameters:
file_type : str or None, default=None

The file type of the input audio file. Should be the same as what the file extension would be, for ex. ‘mp3’ or ‘wav’.

rate : float or None, default=None

The sample rate of the input audio file. If None the sample rate is inferred.

bits : int or None, default=None

The number of bits per sample. If None, the number of bits per sample is inferred.

channels : int or None, default=None

The number of channels in the audio file. If None the number of channels is inferred.

encoding : str or None, default=None

The audio encoding type. Sometimes needed with file-types that support more than one encoding type. One of:

  • signed-integer : PCM data stored as signed (‘two’s
    complement’) integers. Commonly used with a 16 or 24-bit encoding size. A value of 0 represents minimum signal power.
  • unsigned-integer : PCM data stored as unsigned integers.
    Commonly used with an 8-bit encoding size. A value of 0 represents maximum signal power.
  • floating-point : PCM data stored as IEEE 754 single precision
    (32-bit) or double precision (64-bit) floating-point (‘real’) numbers. A value of 0 represents minimum signal power.
  • a-law : International telephony standard for logarithmic
    encoding to 8 bits per sample. It has a precision equivalent to roughly 13-bit PCM and is sometimes encoded with reversed bit-ordering.
  • u-law : North American telephony standard for logarithmic
    encoding to 8 bits per sample. A.k.a. μ-law. It has a precision equivalent to roughly 14-bit PCM and is sometimes encoded with reversed bit-ordering.
  • oki-adpcm : OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
    it has a precision equivalent to roughly 12-bit PCM. ADPCM is a form of audio compression that has a good compromise between audio quality and encoding/decoding speed.
  • ima-adpcm : IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision
    equivalent to roughly 13-bit PCM.
  • ms-adpcm : Microsoft 4-bit ADPCM; it has a precision
    equivalent to roughly 14-bit PCM.
  • gsm-full-rate : GSM is currently used for the vast majority
    of the world’s digital wireless telephone calls. It utilises several audio formats with different bit-rates and associated speech quality. SoX has support for GSM’s original 13kbps ‘Full Rate’ audio format. It is usually CPU-intensive to work with GSM audio.
ignore_length : bool, default=False

If True, overrides an (incorrect) audio length given in an audio file’s header. If this option is given then SoX will keep reading audio until it reaches the end of the input file.
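Format arguments matter most for headerless (raw) files, where nothing in the file tells SoX how to interpret the bytes. To sanity-check the values passed to set_input_format, note that the duration of a raw PCM file follows from its size: bytes / (rate × channels × bits / 8). This minimal sketch is valid only for fixed-width encodings (signed-integer, unsigned-integer, floating-point, a-law, u-law), not for ADPCM or GSM. The function name is hypothetical.

```python
def raw_pcm_duration(n_bytes: int, rate: float, bits: int, channels: int) -> float:
    """Duration in seconds of headerless PCM data of the given format."""
    bytes_per_frame = channels * bits // 8   # one sample for every channel
    n_frames = n_bytes // bytes_per_frame    # ignore any trailing partial frame
    return n_frames / rate


# 10 s of 16-bit stereo at 44.1 kHz occupies 44100 * 2 channels * 2 bytes * 10 s.
print(raw_pcm_duration(1_764_000, rate=44100, bits=16, channels=2))  # 10.0
```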

set_output_format(file_type: Optional[str] = None, rate: Optional[float] = None, bits: Optional[int] = None, channels: Optional[int] = None, encoding: Optional[typing_extensions.Literal['signed-integer', 'unsigned-integer', 'floating-point', 'a-law', 'u-law', 'oki-adpcm', 'ima-adpcm', 'ms-adpcm', 'gsm-full-rate']] = None, comments: Optional[str] = None, append_comments: bool = True)[source]

Sets output file format arguments. These arguments will overwrite any format-related arguments supplied by other effects (e.g. rate).

If this function is not explicitly called, the output format is inferred from the file extension or the file’s header.

Parameters:
file_type : str or None, default=None

The file type of the output audio file. Should be the same as what the file extension would be, for ex. ‘mp3’ or ‘wav’.

rate : float or None, default=None

The sample rate of the output audio file. If None the sample rate is inferred.

bits : int or None, default=None

The number of bits per sample. If None, the number of bits per sample is inferred.

channels : int or None, default=None

The number of channels in the audio file. If None the number of channels is inferred.

encoding : str or None, default=None

The audio encoding type. Sometimes needed with file-types that support more than one encoding type. One of:

  • signed-integer : PCM data stored as signed (‘two’s
    complement’) integers. Commonly used with a 16 or 24-bit encoding size. A value of 0 represents minimum signal power.
  • unsigned-integer : PCM data stored as unsigned integers.
    Commonly used with an 8-bit encoding size. A value of 0 represents maximum signal power.
  • floating-point : PCM data stored as IEEE 754 single precision
    (32-bit) or double precision (64-bit) floating-point (‘real’) numbers. A value of 0 represents minimum signal power.
  • a-law : International telephony standard for logarithmic
    encoding to 8 bits per sample. It has a precision equivalent to roughly 13-bit PCM and is sometimes encoded with reversed bit-ordering.
  • u-law : North American telephony standard for logarithmic
    encoding to 8 bits per sample. A.k.a. μ-law. It has a precision equivalent to roughly 14-bit PCM and is sometimes encoded with reversed bit-ordering.
  • oki-adpcm : OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
    it has a precision equivalent to roughly 12-bit PCM. ADPCM is a form of audio compression that has a good compromise between audio quality and encoding/decoding speed.
  • ima-adpcm : IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision
    equivalent to roughly 13-bit PCM.
  • ms-adpcm : Microsoft 4-bit ADPCM; it has a precision
    equivalent to roughly 14-bit PCM.
  • gsm-full-rate : GSM is currently used for the vast majority
    of the world’s digital wireless telephone calls. It utilises several audio formats with different bit-rates and associated speech quality. SoX has support for GSM’s original 13kbps ‘Full Rate’ audio format. It is usually CPU-intensive to work with GSM audio.
comments : str or None, default=None

If not None, the string is added as a comment in the header of the output audio file. If None, no comments are added.

append_comments : bool, default=True

If True, comment strings are appended to SoX’s default comments. If False, the supplied comment replaces the existing comment.

silence(location: typing_extensions.Literal[0, 1, -1] = 0, silence_threshold: float = 0.1, min_silence_duration: float = 0.1, buffer_around_silence: bool = False)[source]

Removes silent regions from an audio file.

Parameters:
location : int, default=0
Where to remove silence. One of:
  • 0 to remove silence throughout the file (default),
  • 1 to remove silence from the beginning,
  • -1 to remove silence from the end
silence_threshold : float, default=0.1

Silence threshold as percentage of maximum sample amplitude. Must be between 0 and 100.

min_silence_duration : float, default=0.1

The minimum amount of time in seconds required for a region to be considered non-silent.

buffer_around_silence : bool, default=False

If True, leaves a buffer of min_silence_duration around removed silent regions.

See also

vad
sinc(filter_type: typing_extensions.Literal['high', 'low', 'pass', 'reject'] = 'high', cutoff_freq: Union[float, List[float]] = 3000, stop_band_attenuation: float = 120, transition_bw: Union[float, List[float], None] = None, phase_response: Optional[float] = None)[source]

Apply a Kaiser-windowed sinc low-pass, high-pass, band-pass, or band-reject filter to the signal.

Parameters:
filter_type : str, default=‘high’
Type of filter. One of:
  • ‘high’ for a high-pass filter
  • ‘low’ for a low-pass filter
  • ‘pass’ for a band-pass filter
  • ‘reject’ for a band-reject filter
cutoff_freq : float or list, default=3000

A scalar or length 2 list indicating the filter’s critical frequencies. The critical frequencies are given in Hz and must be positive. For a high-pass or low-pass filter, cutoff_freq must be a scalar. For a band-pass or band-reject filter, it must be a length 2 list.

stop_band_attenuation : float, default=120

The stop band attenuation in dB

transition_bw : float, list or None, default=None

The transition band-width in Hz. If None, SoX’s default of 5% of the total bandwidth is used. If a float, the given transition bandwidth is used for both the upper and lower bands (if applicable). If a list, the first argument is used for the lower band and the second for the upper band.

phase_response : float or None, default=None

The filter’s phase response between 0 (minimum) and 100 (maximum). If None, SoX’s default phase response is used.

See also

band, bandpass, bandreject, highpass, lowpass
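The cutoff_freq rules above can be expressed as a small validation function. This hypothetical sketch mirrors only the documented constraints: a positive scalar for ‘high’ and ‘low’, and a length-2 list of positive values for ‘pass’ and ‘reject’. pysox performs its own checks, which may differ in detail.

```python
from numbers import Real
from typing import List, Union


def is_valid_cutoff(filter_type: str, cutoff_freq: Union[float, List[float]]) -> bool:
    """Return True if cutoff_freq matches the documented shape for filter_type."""
    def positive(x) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(x, Real) and not isinstance(x, bool) and x > 0

    if filter_type in ("high", "low"):
        return positive(cutoff_freq)
    if filter_type in ("pass", "reject"):
        return (isinstance(cutoff_freq, (list, tuple))
                and len(cutoff_freq) == 2
                and all(positive(f) for f in cutoff_freq))
    return False  # unknown filter type


print(is_valid_cutoff("pass", [300, 3400]), is_valid_cutoff("low", [100, 200]))
```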
speed(factor: float)[source]

Adjust the audio speed (pitch and tempo together).

Technically, the speed effect only changes the sample rate information, leaving the samples themselves untouched. The rate effect is invoked automatically to resample to the output sample rate, using its default quality/speed. For higher quality or higher speed resampling, in addition to the speed effect, specify the rate effect with the desired quality option.

Parameters:
factor : float

The ratio of the new speed to the old speed. For ex. 1.1 speeds up the audio by 10%; 0.9 slows it down by 10%.

See also

rate, tempo, pitch
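Because speed resamples rather than time-stretches, a single factor moves duration and pitch together. The duration is divided by factor, and the pitch moves by 12·log2(factor) semitones. The sketch below shows that relationship; the function is hypothetical and intended only for planning.

```python
import math
from typing import Tuple


def speed_effect(factor: float, duration_s: float) -> Tuple[float, float]:
    """Return (new duration in seconds, pitch change in semitones) for a speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    new_duration = duration_s / factor
    semitones = 12.0 * math.log2(factor)
    return new_duration, semitones


# 1.1 shortens a 60 s clip to about 54.5 s and raises it by about 1.65 semitones.
print(speed_effect(1.1, 60.0))
```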
stat(input_filepath: Union[str, pathlib.Path], scale: Optional[float] = None, rms: Optional[bool] = False)[source]

Display time and frequency domain statistical information about the audio. Audio is passed unmodified through the SoX processing chain.

Unlike other Transformer methods, this does not modify the transformer effects chain. Instead it computes statistics on the output file that would be created if the build command were invoked.

Note: The file is downmixed to mono prior to computation.

Parameters:
input_filepath : str

Path to input file to compute stats on.

scale : float or None, default=None

If not None, scales the input by the given scale factor.

rms : bool, default=False

If True, scales all values by the average rms amplitude.

Returns:
stat_dict : dict

Dictionary of statistics.

stats(input_filepath: Union[str, pathlib.Path])[source]

Display time domain statistical information about the audio channels. Audio is passed unmodified through the SoX processing chain. Statistics are calculated and displayed for each audio channel.

Unlike other Transformer methods, this does not modify the transformer effects chain. Instead it computes statistics on the output file that would be created if the build command were invoked.

Note: The file is downmixed to mono prior to computation.

Parameters:
input_filepath : str

Path to input file to compute stats on.

Returns:
stats_dict : dict

Dictionary of statistics.

See also

stat, sox.file_info
stretch(factor: float, window: float = 20)[source]

Change the audio duration (but not its pitch). Unless factor is close to 1, use the tempo effect instead.

This effect is broadly equivalent to the tempo effect with search set to zero, so in general, its results are comparatively poor; it is retained as it can sometimes out-perform tempo for small factors.

Parameters:
factor : float

The ratio of the new tempo to the old tempo. For ex. 1.1 speeds up the tempo by 10%; 0.9 slows it down by 10%. Note - this argument is the inverse of what is passed to the sox stretch effect for consistency with tempo.

window : float, default=20

Window size in milliseconds

See also

tempo, speed, pitch
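The factor convention can be sketched in a few lines. The value handed to SoX's stretch effect is 1 / factor, and the resulting duration is the input duration divided by factor, which is the same relationship tempo uses. Both helpers are hypothetical.

```python
def sox_stretch_argument(factor: float) -> float:
    """Value for SoX's stretch effect, which takes the inverse of the tempo ratio."""
    return 1.0 / factor


def output_duration(duration_s: float, factor: float) -> float:
    """Expected duration after changing the tempo by factor (pitch unchanged)."""
    return duration_s / factor


# Speeding up by 25% means passing 0.8 to SoX's stretch; 10 s becomes 8 s.
print(sox_stretch_argument(1.25), output_duration(10.0, 1.25))
```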
swap()[source]

Swap stereo channels. If the input is not stereo, pairs of channels are swapped, and a possible odd last channel passed through.

E.g., for seven channels, the output order will be 2, 1, 4, 3, 6, 5, 7.
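The pairing rule can be written out directly. This hypothetical helper returns the 1-indexed output order for a given channel count and reproduces the seven-channel example above. The same order could also be expressed as a remix_dictionary such as {1: [2], 2: [1], ...}.

```python
from typing import List


def swap_order(n_channels: int) -> List[int]:
    """1-indexed channel order produced by swapping adjacent channel pairs."""
    order: List[int] = []
    for ch in range(1, n_channels, 2):   # ch = 1, 3, 5, ...
        order += [ch + 1, ch]
    if n_channels % 2:                   # an odd last channel passes through
        order.append(n_channels)
    return order


print(swap_order(7))  # [2, 1, 4, 3, 6, 5, 7]
```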

See also

remix
tempo(factor: float, audio_type: Optional[typing_extensions.Literal['m', 's', 'l']] = None, quick: bool = False)[source]

Time stretch audio without changing pitch.

This effect uses the WSOLA algorithm. The audio is chopped up into segments which are then shifted in the time domain and overlapped (cross-faded) at points where their waveforms are most similar as determined by measurement of least squares.

Parameters:
factor : float

The ratio of new tempo to the old tempo. For ex. 1.1 speeds up the tempo by 10%; 0.9 slows it down by 10%.

audio_type : str or None, default=None
Type of audio, which optimizes algorithm parameters. One of:
  • m : Music,
  • s : Speech,
  • l : Linear (useful when factor is close to 1)
quick : bool, default=False

If True, this effect will run faster but with lower sound quality.

See also

stretch, speed, pitch
treble(gain_db: float, frequency: float = 3000.0, slope: float = 0.5)[source]

Boost or cut the treble (higher) frequencies of the audio using a two-pole shelving filter with a response similar to that of a standard hi-fi’s tone-controls. This is also known as shelving equalisation.

The filters are described in detail in http://musicdsp.org/files/Audio-EQ-Cookbook.txt

Parameters:
gain_db : float

The gain at the Nyquist frequency. For a large cut use -20, for a large boost use 20.

frequency : float, default=3000.0

The filter’s cutoff frequency in Hz.

slope : float, default=0.5

The steepness of the filter’s shelf transition. For a gentle slope use 0.3, and use 1.0 for a steep slope.
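To see the shelving response numerically, here is a sketch of the high-shelf biquad from the Audio EQ Cookbook referenced above. It assumes that slope plays the role of the cookbook's shelf slope S. It illustrates the technique and is not SoX's source code; the function names are hypothetical. The response is 0 dB at DC and reaches gain_db at the Nyquist frequency, matching the description of gain_db.

```python
import cmath
import math
from typing import List, Tuple


def high_shelf(gain_db: float, f0: float, fs: float,
               slope: float = 0.5) -> Tuple[List[float], List[float]]:
    """Audio EQ Cookbook high-shelf biquad; returns (b, a) normalized so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(A) * alpha
    b0 = A * ((A + 1) + (A - 1) * cos_w0 + k)
    b1 = -2.0 * A * ((A - 1) + (A + 1) * cos_w0)
    b2 = A * ((A + 1) + (A - 1) * cos_w0 - k)
    a0 = (A + 1) - (A - 1) * cos_w0 + k
    a1 = 2.0 * ((A - 1) - (A + 1) * cos_w0)
    a2 = (A + 1) - (A - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]


def response_db(b: List[float], a: List[float], f: float, fs: float) -> float:
    """Magnitude of the biquad's frequency response at f Hz, in dB."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


b, a = high_shelf(gain_db=6.0, f0=3000.0, fs=44100.0, slope=0.5)
for f in (100.0, 3000.0, 22050.0):
    print(f, round(response_db(b, a, f, 44100.0), 2))
```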

See also

bass, equalizer
tremolo(speed: float = 6.0, depth: float = 40.0)[source]

Apply a tremolo (low frequency amplitude modulation) effect to the audio. The tremolo frequency in Hz is given by speed, and the depth as a percentage by depth (default 40).

Parameters:
speed : float, default=6.0

Tremolo speed in Hz.

depth : float, default=40.0

Tremolo depth as a percentage of the total amplitude.

See also

flanger

Examples

>>> tfm = sox.Transformer()

For a growl-type effect

>>> tfm.tremolo(speed=100.0)
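Conceptually, tremolo multiplies the signal by a slowly oscillating gain. The sketch below uses a generic textbook form: a sinusoidal low-frequency oscillator at speed Hz pulls the gain down by up to depth percent. SoX's actual modulation waveform and gain law may differ, and the function is hypothetical.

```python
import math
from typing import List


def tremolo(samples: List[float], sample_rate: float,
            speed: float = 6.0, depth: float = 40.0) -> List[float]:
    """Generic tremolo: scale each sample by a gain oscillating between 1 and 1 - depth/100."""
    d = depth / 100.0
    out: List[float] = []
    for n, x in enumerate(samples):
        # Low-frequency oscillator mapped to the range [0, 1].
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * speed * n / sample_rate))
        out.append(x * (1.0 - d * lfo))
    return out


y = tremolo([1.0] * 8000, 8000.0, speed=6.0, depth=40.0)
print(round(max(y), 3), round(min(y), 3))
```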
trim(start_time: float, end_time: Optional[float] = None)[source]

Excerpt a clip from an audio file, given the start timestamp and end timestamp of the clip within the file, expressed in seconds. If the end timestamp is set to None or left unspecified, it defaults to the duration of the audio file.

Parameters:
start_time : float

Start time of the clip (seconds)

end_time : float or None, default=None

End time of the clip (seconds)
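Trim boundaries are given in seconds. At the sample level, they correspond to indices obtained by multiplying by the sample rate. The helper below is a hypothetical conversion in which end_time=None means the end of the file, as described above. Clamping an end time that falls past the end of the file is this sketch's own choice.

```python
from typing import Optional, Tuple


def trim_bounds(start_time: float, end_time: Optional[float],
                sample_rate: int, n_samples: int) -> Tuple[int, int]:
    """Map trim times in seconds to a half-open [start, end) sample range."""
    start = round(start_time * sample_rate)
    # end_time=None means "until the end of the file"; later times are clamped.
    end = n_samples if end_time is None else min(round(end_time * sample_rate), n_samples)
    if not 0 <= start < end:
        raise ValueError("need 0 <= start_time < end_time within the file")
    return start, end


print(trim_bounds(1.5, 2.0, 44100, 441000))  # (66150, 88200)
```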

upsample(factor: int = 2)[source]

Upsample the signal by an integer factor: zero-value samples are inserted between each pair of input samples. As a result, the original spectrum is replicated into the new frequency space (imaging) and attenuated. The upsample effect is typically used in combination with filtering effects.

Parameters:
factor : int, default=2

Integer upsampling factor.
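Zero-stuffing itself is simple to express. The sketch below inserts factor - 1 zeros after every input sample, so the output is factor times as long. The spectral images this creates are what a subsequent low-pass filter, such as lowpass or sinc, would remove. The function is a hypothetical illustration, not SoX's code.

```python
from typing import List


def upsample(samples: List[float], factor: int = 2) -> List[float]:
    """Insert factor - 1 zero-valued samples after every input sample."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: List[float] = []
    for x in samples:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out


print(upsample([1.0, 2.0, 3.0], 2))  # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```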

See also

rate, downsample
vad(location: typing_extensions.Literal[1, -1] = 1, normalize: bool = True, activity_threshold: float = 7.0, min_activity_duration: float = 0.25, initial_search_buffer: float = 1.0, max_gap: float = 0.25, initial_pad: float = 0.0)[source]

Voice Activity Detector. Attempts to trim silence and quiet background sounds from the ends of recordings of speech. The algorithm currently uses a simple cepstral power measurement to detect voice, so may be fooled by other things, especially music.

The underlying SoX effect can trim only from the front of the audio, so to trim from the end (location=-1) the audio is reversed before and after the effect is applied.

Parameters:
location : 1 or -1, default=1

If 1, trims silence from the beginning. If -1, trims silence from the end.

normalize : bool, default=True

If True, normalizes audio before processing.

activity_threshold : float, default=7.0

The measurement level used to trigger activity detection. This may need to be changed depending on the noise level, signal level, and other characteristics of the input audio.

min_activity_duration : float, default=0.25

The time constant (in seconds) used to help ignore short bursts of sound.

initial_search_buffer : float, default=1.0

The amount of audio (in seconds) to search for quieter/shorter bursts of audio to include prior to the detected trigger point.

max_gap : float, default=0.25

The allowed gap (in seconds) between quieter/shorter bursts of audio to include prior to the detected trigger point.

initial_pad : float, default=0.0

The amount of audio (in seconds) to preserve before the trigger point and any found quieter/shorter bursts.

See also

silence

Examples

>>> tfm = sox.Transformer()

Remove silence from the beginning of speech

>>> tfm.vad(initial_pad=0.3)

Remove silence from the end of speech

>>> tfm.vad(location=-1, initial_pad=0.2)
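The reverse-trim-reverse approach to trimming from the end can be shown with a toy detector. This sketch uses a plain amplitude threshold instead of SoX's cepstral power measurement, so it illustrates only the control flow. The function names are hypothetical.

```python
from typing import List


def trim_leading(samples: List[float], threshold: float) -> List[float]:
    """Toy detector: drop samples before the first one whose magnitude exceeds threshold."""
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            return samples[i:]
    return []  # nothing above the threshold


def trim_trailing(samples: List[float], threshold: float) -> List[float]:
    """Trim the end by reversing, trimming the front, and reversing back."""
    return trim_leading(samples[::-1], threshold)[::-1]


sig = [0.0, 0.01, 0.5, -0.4, 0.02, 0.0]
print(trim_trailing(sig, 0.1))  # [0.0, 0.01, 0.5, -0.4]
```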
vol(gain: float, gain_type: typing_extensions.Literal['amplitude', 'power', 'db'] = 'amplitude', limiter_gain: Optional[float] = None)[source]

Apply an amplification or an attenuation to the audio signal.

Parameters:
gain : float

Interpreted according to the given gain_type. If gain_type = ‘amplitude’, gain is a positive amplitude ratio. If gain_type = ‘power’, gain is a power (voltage squared). If gain_type = ‘db’, gain is in decibels.

gain_type : str, default=‘amplitude’
Type of gain. One of:
  • ‘amplitude’
  • ‘power’
  • ‘db’
limiter_gain : float or None, default=None

If specified, a limiter is invoked on peaks greater than limiter_gain to prevent clipping. limiter_gain should be a positive value much less than 1.

See also

gain, compand
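The three gain_type values describe the same gain on different scales. An amplitude ratio g corresponds to 20·log10(g) dB, and a power ratio p corresponds to 10·log10(p) dB, so amplitude 2 and power 4 are both about +6.02 dB. Here is a small hypothetical conversion helper:

```python
import math


def gain_to_db(gain: float, gain_type: str = "amplitude") -> float:
    """Express a gain given as amplitude ratio, power ratio, or dB in decibels."""
    if gain_type == "amplitude":
        return 20.0 * math.log10(gain)   # amplitude (voltage) ratio
    if gain_type == "power":
        return 10.0 * math.log10(gain)   # power (voltage squared) ratio
    if gain_type == "db":
        return float(gain)
    raise ValueError(f"unknown gain_type: {gain_type!r}")


print(round(gain_to_db(2.0), 2), round(gain_to_db(4.0, "power"), 2))  # 6.02 6.02
```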

Combiners

Python wrapper around the SoX library. This module requires that SoX is installed.

class sox.combine.Combiner[source]

Audio file combiner. Class which allows multiple files to be combined to create an output file, saved to output_filepath.

Inherits all methods from the Transformer class, thus any effects can be applied after combining.

Methods

allpass(frequency, width_q) Apply a two-pole all-pass filter.
bandpass(frequency, width_q, constant_skirt) Apply a two-pole Butterworth band-pass filter with the given central frequency, and (3dB-point) band-width.
bandreject(frequency, width_q, constant_skirt) Apply a two-pole Butterworth band-reject filter with the given central frequency, and (3dB-point) band-width.
bass(gain_db, frequency, slope) Boost or cut the bass (lower) frequencies of the audio using a two-pole shelving filter with a response similar to that of a standard hi-fi’s tone-controls.
bend(n_bends, start_times, end_times, cents, …) Changes pitch by specified amounts at specified times.
biquad(b, a) Apply a biquad IIR filter with the given coefficients.
build(input_filepath_list, output_filepath, …) Builds the output_file by executing the current set of commands.
build_array(input_filepath, …) Given an input file or array, returns the output as a numpy array by executing the current set of commands.
build_file(input_filepath, …) An alias for build.
channels(n_channels) Change the number of channels in the audio signal.
chorus(gain_in, gain_out, n_voices, delays, …) Add a chorus effect to the audio.
clear_effects() Remove all effects processes.
compand(attack_time, decay_time, …) Compand (compress or expand) the dynamic range of the audio.
contrast([amount]) Comparable with compression, this effect modifies an audio signal to make it sound louder.
convert(samplerate, n_channels, bitdepth) Converts output audio to the specified format.
dcshift(shift) Apply a DC shift to the audio.
deemph() Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving filter).
delay(positions) Delay one or more audio channels such that they start at the given positions.
downsample(factor) Downsample the signal by an integer factor.
earwax() Makes audio easier to listen to on headphones.
echo(gain_in, gain_out, n_echos, delays, decays) Add echoing to the audio.
echos(gain_in, gain_out, n_echos, delays, decays) Add a sequence of echoes to the audio.
equalizer(frequency, width_q, gain_db) Apply a two-pole peaking equalisation (EQ) filter to boost or reduce around a given frequency.
fade(fade_in_len, fade_out_len, fade_shape, …) Add a fade in and/or fade out to an audio file.
fir(coefficients) Use SoX’s FFT convolution engine with given FIR filter coefficients.
flanger(delay, depth, regen, width, speed, …) Apply a flanging effect to the audio.
gain(gain_db, normalize, limiter, balance, …) Apply amplification or attenuation to the audio signal.
highpass(frequency, width_q, n_poles) Apply a high-pass filter with 3dB point frequency.
hilbert(num_taps) Apply an odd-tap Hilbert transform filter, phase-shifting the signal by 90 degrees.
loudness(gain_db, reference_level) Loudness control.
lowpass(frequency, width_q, n_poles) Apply a low-pass filter with 3dB point frequency.
mcompand(n_bands, crossover_frequencies, …) The multi-band compander is similar to the single-band compander but the audio is first divided into bands using Linkwitz-Riley cross-over filters and a separately specifiable compander run on each band.
noiseprof(input_filepath, …) Calculate a profile of the audio for use in noise reduction.
noisered(profile_path, amount) Reduce noise in the audio signal by profiling and filtering.
norm(db_level) Normalize an audio file to a particular db level.
oops() Out Of Phase Stereo effect.
overdrive(gain_db, colour) Apply non-linear distortion.
pad(start_duration, end_duration) Add silence to the beginning or end of a file.
phaser(gain_in, gain_out, delay, decay, …) Apply a phasing effect to the audio.
pitch(n_semitones, quick) Pitch shift the audio without changing the tempo.
power_spectrum(input_filepath) Calculates the power spectrum (4096 point DFT).
preview(input_filepath_list, …) Play a preview of the output with the current set of effects.
rate(samplerate, quality) Change the audio sampling rate (i.e. resample the audio) to any given samplerate.
remix(remix_dictionary, num_output_channels) Remix the channels of an audio file.
repeat(count) Repeat the entire audio count times.
reverb(reverberance, high_freq_damping, …) Add reverberation to the audio using the ‘freeverb’ algorithm.
reverse() Reverse the audio completely
set_globals(dither, guard, multithread, …) Sets SoX’s global arguments.
set_input_format(file_type, rate, bits, …) Sets input file format arguments.
set_output_format(file_type, rate, bits, …) Sets output file format arguments.
silence(location, …) Removes silent regions from an audio file.
sinc(filter_type, …) Apply a Kaiser-windowed sinc low-pass, high-pass, band-pass, or band-reject filter to the signal.
speed(factor) Adjust the audio speed (pitch and tempo together).
stat(input_filepath, scale, rms) Display time and frequency domain statistical information about the audio.
stats(input_filepath) Display time domain statistical information about the audio channels.
stretch(factor, window) Change the audio duration (but not its pitch).
swap() Swap stereo channels.
tempo(factor, audio_type, quick) Time stretch audio without changing pitch.
treble(gain_db, frequency, slope) Boost or cut the treble (higher) frequencies of the audio using a two-pole shelving filter with a response similar to that of a standard hi-fi’s tone-controls.
tremolo(speed, depth) Apply a tremolo (low frequency amplitude modulation) effect to the audio.
trim(start_time, end_time) Excerpt a clip from an audio file, given the start timestamp and end timestamp of the clip within the file, expressed in seconds.
upsample(factor) Upsample the signal by an integer factor: zero-value samples are inserted between each pair of input samples.
vad(location, normalize, …) Voice Activity Detector.
vol(gain, gain_type, limiter_gain) Apply an amplification or an attenuation to the audio signal.
build(input_filepath_list: Union[str, pathlib.Path], output_filepath: Union[str, pathlib.Path], combine_type: typing_extensions.Literal['concatenate', 'merge', 'mix', 'mix-power', 'multiply'], input_volumes: Optional[List[float]] = None)[source]

Builds the output_file by executing the current set of commands.

Parameters:
input_filepath_list : list of str

List of paths to input audio files.

output_filepath : str

Path to desired output file. If a file already exists at the given path, the file will be overwritten.

combine_type : str
Input file combining method. One of the following values:
  • concatenate : combine input files by concatenating in the
    order given.
  • merge : combine input files by stacking each input file into
    a new channel of the output file.
  • mix : combine input files by summing samples in corresponding
    channels.
  • mix-power : combine input files with volume adjustments such
    that the output volume is roughly equivalent to one of the input signals.
  • multiply : combine input files by multiplying samples in
    corresponding channels.
input_volumes : list of float, default=None

List of volumes to be applied upon combining input files. Volumes are applied to the input files in order. If None, input files will be combined at their original volumes.

Returns:
status : bool

True on success.
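To make the combine types concrete, the toy model below represents each input as a list of frames (one tuple of channel values per sample instant) and applies the descriptions above literally. It is a sketch under simplifying assumptions: merge, mix and multiply stop at the shortest input, no volume adjustment is applied, and mix-power is omitted. It is not SoX's implementation, which handles differing lengths and may apply its own gain scaling. All names are hypothetical.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, ...]  # one sample value per channel


def concatenate(*files: List[Frame]) -> List[Frame]:
    """Play the inputs one after another (channel counts must match)."""
    out: List[Frame] = []
    for frames in files:
        out.extend(frames)
    return out


def merge(*files: List[Frame]) -> List[Frame]:
    """Stack each input's channels side by side, frame by frame."""
    return [sum(frames, ()) for frames in zip(*files)]


def mix(*files: List[Frame]) -> List[Frame]:
    """Sum samples in corresponding channels (no automatic volume scaling)."""
    return [tuple(sum(vals) for vals in zip(*frames)) for frames in zip(*files)]


def multiply(*files: List[Frame]) -> List[Frame]:
    """Multiply samples in corresponding channels."""
    return [tuple(math.prod(vals) for vals in zip(*frames)) for frames in zip(*files)]


a = [(1.0,), (2.0,)]
b = [(10.0,), (20.0,)]
print(concatenate(a, b))  # [(1.0,), (2.0,), (10.0,), (20.0,)]
print(merge(a, b))        # [(1.0, 10.0), (2.0, 20.0)]
print(mix(a, b))          # [(11.0,), (22.0,)]
print(multiply(a, b))     # [(10.0,), (40.0,)]
```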

preview(input_filepath_list: List[Union[str, pathlib.Path]], combine_type: typing_extensions.Literal['concatenate', 'merge', 'mix', 'mix-power', 'multiply'], input_volumes: Optional[List[float]] = None)[source]

Play a preview of the output with the current set of effects.

Parameters:
input_filepath_list : list of str

List of paths to input audio files.

combine_type : str
Input file combining method. One of the following values:
  • concatenate : combine input files by concatenating in the
    order given.
  • merge : combine input files by stacking each input file into
    a new channel of the output file.
  • mix : combine input files by summing samples in corresponding
    channels.
  • mix-power : combine input files with volume adjustments such
    that the output volume is roughly equivalent to one of the input signals.
  • multiply : combine input files by multiplying samples in
    corresponding channels.
input_volumes : list of float, default=None

List of volumes to be applied upon combining input files. Volumes are applied to the input files in order. If None, input files will be combined at their original volumes.

set_input_format(file_type: Optional[List[str]] = None, rate: Optional[List[float]] = None, bits: Optional[List[int]] = None, channels: Optional[List[int]] = None, encoding: Optional[List[typing_extensions.Literal['signed-integer', 'unsigned-integer', 'floating-point', 'a-law', 'u-law', 'oki-adpcm', 'ima-adpcm', 'ms-adpcm', 'gsm-full-rate']]] = None, ignore_length: Optional[List[bool]] = None)[source]

Sets input file format arguments. This is primarily useful when dealing with audio files without a file extension. Overwrites any previously set input file arguments.

If this function is not explicitly called, the input format is inferred from the file extension or the file’s header.

Parameters:
file_type : list of str or None, default=None

The file type of the input audio file. Should be the same as what the file extension would be, for ex. ‘mp3’ or ‘wav’.

rate : list of float or None, default=None

The sample rate of the input audio file. If None the sample rate is inferred.

bits : list of int or None, default=None

The number of bits per sample. If None, the number of bits per sample is inferred.

channels : list of int or None, default=None

The number of channels in the audio file. If None the number of channels is inferred.

encoding : list of str or None, default=None

The audio encoding type. Sometimes needed with file-types that support more than one encoding type. One of:

  • signed-integer : PCM data stored as signed (‘two’s
    complement’) integers. Commonly used with a 16 or 24-bit encoding size. A value of 0 represents minimum signal power.
  • unsigned-integer : PCM data stored as unsigned integers.
    Commonly used with an 8-bit encoding size. A value of 0 represents maximum signal power.
  • floating-point : PCM data stored as IEEE 754 single precision
    (32-bit) or double precision (64-bit) floating-point (‘real’) numbers. A value of 0 represents minimum signal power.
  • a-law : International telephony standard for logarithmic
    encoding to 8 bits per sample. It has a precision equivalent to roughly 13-bit PCM and is sometimes encoded with reversed bit-ordering.
  • u-law : North American telephony standard for logarithmic
    encoding to 8 bits per sample. A.k.a. μ-law. It has a precision equivalent to roughly 14-bit PCM and is sometimes encoded with reversed bit-ordering.
  • oki-adpcm : OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
    it has a precision equivalent to roughly 12-bit PCM. ADPCM is a form of audio compression that has a good compromise between audio quality and encoding/decoding speed.
  • ima-adpcm : IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision
    equivalent to roughly 13-bit PCM.
  • ms-adpcm : Microsoft 4-bit ADPCM; it has a precision
    equivalent to roughly 14-bit PCM.
  • gsm-full-rate : GSM is currently used for the vast majority
    of the world’s digital wireless telephone calls. It utilises several audio formats with different bit-rates and associated speech quality. SoX has support for GSM’s original 13kbps ‘Full Rate’ audio format. It is usually CPU-intensive to work with GSM audio.
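The difference between signed-integer and unsigned-integer PCM is easiest to see by mapping raw sample values onto a common floating-point scale. A minimal sketch, assuming the usual convention of dividing by 2^(bits-1); both helper names are hypothetical. In signed data 0 is silence, while in unsigned data silence sits at mid-scale (128 for 8-bit) and 0 is a full-scale excursion.

```python
def signed_to_float(sample: int, bits: int) -> float:
    """Map a signed (two's complement) PCM sample onto roughly [-1.0, 1.0)."""
    full_scale = 2 ** (bits - 1)
    return sample / full_scale


def unsigned_to_float(sample: int, bits: int) -> float:
    """Map an unsigned PCM sample onto roughly [-1.0, 1.0); mid-scale is silence."""
    midpoint = 2 ** (bits - 1)
    return (sample - midpoint) / midpoint
```

With these helpers, `signed_to_float(0, 16)` and `unsigned_to_float(128, 8)` both give 0.0, while `unsigned_to_float(0, 8)` gives -1.0.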
ignore_length : list of bool or None, default=None

If True, overrides an (incorrect) audio length given in an audio file’s header. If this option is given then SoX will keep reading audio until it reaches the end of the input file.
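Because the arguments are lists, each entry applies to one input file, in order. The sketch below shows one way such lists could be turned into per-file SoX format flags (`-t`, `-r`, `-b`, `-c`, `-e`, `--ignore-length`). `input_format_args` is a hypothetical helper, not pysox's internal code, and it assumes every non-None list has exactly one entry per input file.

```python
from typing import List, Optional, Sequence


def input_format_args(
    n_inputs: int,
    file_type: Optional[Sequence[str]] = None,
    rate: Optional[Sequence[float]] = None,
    bits: Optional[Sequence[int]] = None,
    channels: Optional[Sequence[int]] = None,
    encoding: Optional[Sequence[str]] = None,
    ignore_length: Optional[Sequence[bool]] = None,
) -> List[List[str]]:
    """Build one list of SoX format flags per input file (hypothetical helper)."""
    options = [("-t", file_type), ("-r", rate), ("-b", bits),
               ("-c", channels), ("-e", encoding)]
    per_file: List[List[str]] = [[] for _ in range(n_inputs)]
    for flag, values in options:
        if values is None:
            continue  # None: leave this property for SoX to infer
        if len(values) != n_inputs:
            raise ValueError(f"{flag} needs one value per input file")
        for args, value in zip(per_file, values):
            if value is not None:  # a None entry skips the flag for that file
                args.extend([flag, str(value)])
    if ignore_length is not None:
        if len(ignore_length) != n_inputs:
            raise ValueError("ignore_length needs one value per input file")
        for args, flag_set in zip(per_file, ignore_length):
            if flag_set:
                args.append("--ignore-length")
    return per_file
```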

File info

Audio file info computed by soxi.

sox.file_info.bitdepth(input_filepath: Union[str, pathlib.Path]) → Optional[int]

Number of bits per sample, or None if not applicable.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
bitdepth : int or None

Number of bits per sample. Returns None if not applicable.

sox.file_info.bitrate(input_filepath: Union[str, pathlib.Path]) → Optional[float]

Bit rate averaged over the whole file, expressed in bits per second (bps), or None if not applicable.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
bitrate : float or None

Bit rate, expressed in bits per second. Returns None if not applicable.
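For uncompressed PCM the average bit rate follows from the other header fields: sample rate × bit depth × channels. A worked sketch for CD-quality stereo (44.1 kHz, 16-bit, 2 channels); `pcm_bit_rate` is a hypothetical helper, and compressed formats such as MP3 do not follow this formula.

```python
def pcm_bit_rate(sample_rate: float, bitdepth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate * bitdepth * channels


# CD-quality stereo: 44100 samples/s per channel, 16 bits each, 2 channels.
bits_per_second = pcm_bit_rate(44100, 16, 2)  # 1,411,200 bits/s
bytes_per_second = bits_per_second / 8        # 176,400 bytes/s
```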

sox.file_info.channels(input_filepath: Union[str, pathlib.Path]) → int

Show number of channels.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
channels : int

number of channels

sox.file_info.comments(input_filepath: Union[str, pathlib.Path]) → str

Show file comments (annotations) if available.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
comments : str

File comments from header. If no comments are present, returns an empty string.

sox.file_info.duration(input_filepath: Union[str, pathlib.Path]) → Optional[float]

Show duration in seconds, or None if not available.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
duration : float or None

Duration of audio file in seconds. If unavailable or empty, returns None.

sox.file_info.encoding(input_filepath: Union[str, pathlib.Path]) → str

Show the name of the audio encoding.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
encoding : str

audio encoding type

sox.file_info.file_extension(filepath: Union[str, pathlib.Path]) → str

Get the extension of a filepath.

Parameters:
filepath : path-like (str or pathlib.Path)

File path.

Returns:
extension : str

The file’s extension
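A minimal sketch of the same idea with pathlib, assuming only the last suffix counts and no case normalisation is wanted; `extension_of` is a hypothetical name, and pysox's own implementation may differ in these details.

```python
from pathlib import Path
from typing import Union


def extension_of(filepath: Union[str, Path]) -> str:
    """Return the last extension of a path without its leading dot ('' if none)."""
    return Path(filepath).suffix[1:]
```

For a name with several dots such as `archive.tar.gz`, this returns only `gz`.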

sox.file_info.file_type(input_filepath: Union[str, pathlib.Path]) → str

Show detected file-type.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
file_type : str

file format type (ex. ‘wav’)

sox.file_info.info(filepath: Union[str, pathlib.Path]) → Dict[str, Union[str, numbers.Number]]

Get a dictionary of file information.

Parameters:
filepath : path-like (str or pathlib.Path)

File path.

Returns:
info_dictionary : dict
Dictionary of file information. Fields are:
  • channels
  • sample_rate
  • bitdepth
  • bitrate
  • duration
  • num_samples
  • encoding
  • silent

sox.file_info.num_samples(input_filepath: Union[str, pathlib.Path]) → Optional[int]

Show number of samples, or None if unavailable.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
n_samples : int or None

total number of samples in audio file. Returns None if empty or unavailable.
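The sample count, sample rate and duration are related by duration = num_samples / sample_rate. A small sketch of that relation, assuming the sample count is per channel (one sample per channel per time step), so a stereo file is not double-counted; `duration_from_samples` is a hypothetical helper.

```python
from typing import Optional


def duration_from_samples(num_samples: Optional[int], sample_rate: float) -> Optional[float]:
    """Duration in seconds, assuming num_samples counts samples per channel."""
    if num_samples is None:
        return None  # sample count unavailable, so duration is too
    return num_samples / sample_rate
```

Under that assumption, 441000 samples at 44100 Hz correspond to 10.0 seconds.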

sox.file_info.sample_rate(input_filepath: Union[str, pathlib.Path]) → float

Show sample-rate.

Parameters:
input_filepath : path-like (str or pathlib.Path)

Path to audio file.

Returns:
samplerate : float

number of samples/second

sox.file_info.silent(input_filepath: Union[str, pathlib.Path], threshold: float = 0.001) → bool

Determine if an input file is silent.

Parameters:
input_filepath : path-like (str or pathlib.Path)

The input filepath.

threshold : float, default=0.001

Threshold for determining silence.

Returns:
is_silent : bool

True if file is determined silent.
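The docstring does not say which statistic is compared with threshold. The sketch below assumes the mean absolute amplitude of samples normalised to [-1, 1]; it illustrates threshold-based silence detection on a plain list and is not pysox's exact rule. `is_silent` is a hypothetical helper.

```python
from typing import Sequence


def is_silent(samples: Sequence[float], threshold: float = 0.001) -> bool:
    """Treat audio as silent if its mean absolute amplitude is below threshold."""
    if not samples:
        return True  # nothing to measure
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    return mean_abs < threshold
```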

sox.file_info.stat(filepath: Union[str, pathlib.Path]) → Dict[str, Optional[float]]

Returns a dictionary of audio statistics.

Parameters:
filepath : path-like (str or pathlib.Path)

File path.

Returns:
stat_dictionary : dict

Dictionary of audio statistics.
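The returned keys are not listed above. As an illustration of what such statistics measure, here is a sketch computing a few common ones (peak values, mean of absolute values, RMS) for samples normalised to [-1, 1]. The function and its key names are hypothetical and are not the keys pysox returns.

```python
import math
from typing import Dict, Sequence


def basic_stats(samples: Sequence[float]) -> Dict[str, float]:
    """A few amplitude statistics for normalised samples (hypothetical keys)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    return {
        "max_amplitude": max(samples),
        "min_amplitude": min(samples),
        "mean_norm": sum(abs(s) for s in samples) / n,          # mean of |x|
        "rms_amplitude": math.sqrt(sum(s * s for s in samples) / n),
    }
```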

sox.file_info.validate_input_file(input_filepath: Union[str, pathlib.Path]) → None

Input file validation function. Checks that file exists and can be processed by SoX.

Parameters:
input_filepath : path-like (str or pathlib.Path)

The input filepath.

sox.file_info.validate_input_file_list(input_filepath_list: List[Union[str, pathlib.Path]]) → None

Input file list validation function. Checks that object is a list and contains valid filepaths that can be processed by SoX.

Parameters:
input_filepath_list : list

A list of filepaths.
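A minimal sketch of the first two checks (the argument is a list, and every entry points at an existing file), using pathlib. It does not attempt the "can be processed by SoX" part, and both the helper name `check_input_paths` and the exception types are assumptions rather than what pysox raises.

```python
from pathlib import Path
from typing import List, Union


def check_input_paths(paths: List[Union[str, Path]]) -> None:
    """Hypothetical check: paths must be a list whose entries are existing files."""
    if not isinstance(paths, list):
        raise TypeError("expected a list of file paths")
    for p in paths:
        if not Path(p).is_file():
            raise OSError(f"input file does not exist: {p}")
```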

sox.file_info.validate_output_file(output_filepath: Union[str, pathlib.Path]) → None

Output file validation function. Checks that file can be written, and has a valid file extension. Throws a warning if the path already exists, as it will be overwritten on build.

Parameters:
output_filepath : path-like (str or pathlib.Path)

The output filepath.
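The three checks described above can be sketched with the standard library: the parent directory must be writable, the path must carry an extension, and an existing file triggers a warning. `check_output_path` is hypothetical; it only checks that some extension is present, whereas a real check would compare it with the formats SoX supports.

```python
import os
import warnings
from pathlib import Path
from typing import Union


def check_output_path(path: Union[str, Path]) -> None:
    """Hypothetical check mirroring the description of validate_output_file."""
    path = Path(path)
    parent = path.parent
    if not os.access(parent, os.W_OK):
        raise OSError(f"cannot write to directory: {parent}")
    if not path.suffix:
        raise ValueError(f"output path has no file extension: {path}")
    if path.exists():
        # Not an error: the file will simply be overwritten on build.
        warnings.warn(f"{path} already exists and will be overwritten")
```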

Core functionality

Base module for calling SoX

exception sox.core.SoxError(*args, **kwargs)

Exception to be raised when SoX exits with non-zero status.

exception sox.core.SoxiError(*args, **kwargs)

Exception to be raised when SoXI exits with non-zero status.

sox.core.all_equal(list_of_things: List[Any]) → bool

Check if a list contains identical elements.

Parameters:
list_of_things : list

list of objects

Returns:
all_equal : bool

True if all list elements are the same.
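One way to write this check compares every element with the first using ==; unlike a set-based check, it also works for unhashable items such as lists. A sketch with the hypothetical name `all_same`, not necessarily pysox's implementation; note that it returns True for an empty list.

```python
from typing import Any, List


def all_same(list_of_things: List[Any]) -> bool:
    """True if every element compares equal to the first (empty list: True)."""
    # all() on an empty generator is True, so list_of_things[0] is never read then.
    return all(item == list_of_things[0] for item in list_of_things)
```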

sox.core.is_number(var: Any) → bool

Check if variable is a numeric value.

Parameters:
var : object

Variable to check.

Returns:
is_number : bool

True if var is numeric, False otherwise.
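What counts as "numeric" is a design choice. The sketch below uses a float() conversion, which accepts ints, floats and numeric strings such as '2.5'; an isinstance(var, numbers.Number) test would reject the strings. `looks_numeric` is a hypothetical name, and which convention pysox follows is not stated here.

```python
from typing import Any


def looks_numeric(var: Any) -> bool:
    """True if float(var) succeeds, e.g. for ints, floats and numeric strings."""
    try:
        float(var)
    except (TypeError, ValueError):
        return False
    return True
```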

sox.core.play(args: Iterable[str]) → bool

Pass an argument list to play.

Parameters:
args : iterable

Argument list for play. The first item can, but does not need to, be ‘play’.

Returns:
status : bool

True on success.
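Since the first item may or may not be the program name, an argument list needs normalising before it is run. A minimal sketch of that normalisation; `with_program_name` is hypothetical, and the same idea applies to the sox function, whose first item may be ‘sox’.

```python
from typing import Iterable, List


def with_program_name(args: Iterable[str], program: str) -> List[str]:
    """Prepend the program name unless the argument list already starts with it."""
    arg_list = list(args)
    if not arg_list or arg_list[0] != program:
        arg_list.insert(0, program)
    return arg_list
```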

sox.core.sox(args: Iterable[str], src_array: Optional[numpy.ndarray] = None, decode_out_with_utf: bool = True) → Tuple[bool, Union[str, numpy.ndarray, None], Optional[str]]

Pass an argument list to SoX.

Parameters:
args : iterable

Argument list for SoX. The first item can, but does not need to, be ‘sox’.

src_array : np.ndarray, or None

If src_array is not None, then we make sure it’s a numpy array and pass it into stdin.

decode_out_with_utf : bool, default=True

Whether or not sox is outputting a bytestring that should be decoded with utf-8.

Returns:
status : bool

True on success.

out : str, np.ndarray, or None

Returns an np.ndarray if src_array was an np.ndarray, or the stdout produced by SoX if src_array is None. Returns None if there’s an error.

err : str, or None

Returns stderr as a string.
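The (status, out, err) return shape can be illustrated with subprocess for the text-output case (src_array is None). A sketch under that assumption: `run_tool` is hypothetical, does not handle numpy input, decodes with UTF-8, and returns None for out on failure, matching the description of out above.

```python
import subprocess
from typing import List, Optional, Tuple


def run_tool(args: List[str]) -> Tuple[bool, Optional[str], Optional[str]]:
    """Run a command and return (status, stdout, stderr) as decoded text."""
    try:
        proc = subprocess.run(args, capture_output=True)
    except OSError as exc:  # e.g. the executable is not installed
        return False, None, str(exc)
    out = proc.stdout.decode("utf-8")
    err = proc.stderr.decode("utf-8")
    if proc.returncode != 0:
        return False, None, err  # non-zero exit: no usable output
    return True, out, err
```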

sox.core.soxi(filepath: Union[str, pathlib.Path], argument: str) → str

Base call to SoXI.

Parameters:
filepath : path-like (str or pathlib.Path)

Path to audio file.

argument : str

Argument to pass to SoXI.

Returns:
shell_output : str

Command line output of SoXI
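soxi selects a single field with a one-letter option, for example -D (duration in seconds), -r (sample rate) or -c (channels). A sketch of building such a command line, assuming argument is the option letter without its dash; `soxi_command` is hypothetical, and this is not necessarily how pysox assembles the call.

```python
from pathlib import Path
from typing import List, Union


def soxi_command(filepath: Union[str, Path], argument: str) -> List[str]:
    """Build a soxi command line such as ['soxi', '-D', 'clip.wav']."""
    return ["soxi", f"-{argument}", str(filepath)]
```

The resulting list could then be passed to a process runner such as subprocess.run.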