Hi there! I’m a Ph.D. student in the Statistical Signal Inference (SSigInf) Group at the University of Cambridge. I hold a BA and an MEng in Information and Computer Engineering, also from Cambridge.

My research focuses on statistical signal processing and machine learning methods for audio and music processing, including high-resolution time-frequency analysis, music transcription, beat tracking, and signal decomposition.

I’m also interested in multi-object tracking and deep learning–based hierarchical generative models for music visualisation and composition.

Download CV

Publications

  • Link to full paper

    Authors: J. M. Cozens and S. J. Godsill

    Published in: IEEE Open Journal of Signal Processing, vol. 5, pp. 140–149, 2024, doi: 10.1109/OJSP.2023.3344048.

    Abstract: This paper proposes a probabilistic approach for extracting time-varying and irregular time signature information from polyphonic audio extracts, subsequently providing beat and bar line positions given inferred time signature divisions. This is achieved by dynamically evaluating the beat tempo as a function of time, finding an optimal compromise in beat and bar alignment across the time and tempo domains. Time signature divisions are determined from a new representation, termed the Metrogram, which presents time-varying information about the rhythmic and metric periodicities in the Tempogram. Our methodology is characterised by its ability to provide a distribution over metric interpretations, offering insights into the diverse ways music can be rhythmically perceived. Results indicate high accuracy for a variety of polyphonic extracts containing irregular, complex, irrational, and time-varying time signatures. Accuracy rivalling state-of-the-art methodologies is also reported in a beat tracking task performed on the standard Ballroom Dataset. The paper offers insights into dynamic time signature recognition and beat tracking, providing a valuable and versatile resource for the analysis, composition, and performance of music.
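
    As an illustrative aside, the Tempogram on which the Metrogram builds can be computed with standard tools. Below is a minimal sketch using librosa; this is not the paper's implementation, and the filename and parameters are placeholders:

        import numpy as np
        import librosa

        # Load a polyphonic excerpt (placeholder path).
        y, sr = librosa.load("excerpt.wav")

        # Onset-strength envelope: the usual input to tempo analysis.
        onset_env = librosa.onset.onset_strength(y=y, sr=sr)

        # Tempogram: windowed autocorrelation of the onset envelope,
        # i.e. time-varying rhythmic periodicity (tempo vs. time).
        tgram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)

        # Dominant tempo per frame (skip lag 0, whose autocorrelation
        # is trivially maximal and corresponds to an infinite BPM).
        bpm = librosa.tempo_frequencies(tgram.shape[0], sr=sr)
        dominant_bpm = bpm[np.argmax(tgram[1:], axis=0) + 1]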

  • RIFT: Entropy–Optimised Fractional Wavelet Constellations for Ideal Time–Frequency Estimation

    Link to full paper

    Link to project page

    Authors: J. M. Cozens and S. J. Godsill

    Published in: arXiv Preprint (submitted to IEEE Transactions on Signal Processing)

    Abstract: We introduce a new method for estimating the Ideal Time-Frequency Representation (ITFR) of complex nonstationary signals. The Reconstructive Ideal Fractional Transform (RIFT) computes a constellation of Continuous Fractional Wavelet Transforms (CFWTs) aligned to different local time-frequency curvatures. This constellation is combined into a single optimised time-frequency energy representation via a localised entropy-based sparsity measure, designed to resolve auto-terms and attenuate cross-terms. Finally, a positivity-constrained Lucy–Richardson deconvolution with total-variation regularisation is applied to estimate the ITFR, achieving auto-term resolution comparable to that of the Wigner–Ville Distribution (WVD), yielding the high-resolution RIFT representation. The required Cohen's class convolutional kernels are fully derived in the paper for the chosen CFWT constellations. Additionally, the optimisation yields an Instantaneous Phase Direction (IPD) field, which allows the localised curvature in speech or music extracts to be visualised and utilised within a Kalman tracking scheme, enabling the extraction of signal component trajectories and the construction of the Spline-RIFT variant. Evaluation on synthetic and real-world signals demonstrates the algorithm's ability to effectively suppress cross-terms and achieve superior time-frequency precision relative to competing methods. This advance holds significant potential for a wide range of applications requiring high-resolution cross-term-free time-frequency analysis.
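
    As a rough illustration of the final deconvolution stage, the following is a minimal sketch of positivity-preserving Lucy–Richardson deconvolution applied to a generic spectrogram, using scikit-image. The CFWT constellation, derived Cohen's class kernels, and total-variation regularisation from the paper are not reproduced; the test signal and blur kernel are placeholders:

        import numpy as np
        from scipy.signal import stft
        from skimage.restoration import richardson_lucy

        # Toy linear chirp (placeholder for a real speech/music excerpt).
        fs = 8000
        t = np.arange(fs) / fs
        x = np.cos(2 * np.pi * (200 * t + 400 * t ** 2))

        # A spectrogram stands in for the optimised constellation output;
        # its energy is smeared by the analysis window.
        _, _, Z = stft(x, fs=fs, nperseg=256)
        S = np.abs(Z) ** 2

        # Hypothetical separable blur kernel standing in for the derived
        # time-frequency smoothing kernel.
        psf = np.outer(np.hanning(9), np.hanning(9))
        psf /= psf.sum()

        # Richardson-Lucy keeps the estimate non-negative while
        # sharpening auto-terms in the time-frequency image.
        sharp = richardson_lucy(S / S.max(), psf, num_iter=30, clip=False)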

  • Link to full paper

    Link to project page

    Authors: J. M. Cozens and S. J. Godsill

    Published in: 2024 27th International Conference on Information Fusion (FUSION), Venice, Italy, 2024, pp. 1–8, doi: 10.23919/FUSION59988.2024.10706333.

    Abstract: This paper presents an adaptive approach to real-time multi-object localisation, Siteswap inference, and performance evaluation for juggling routines, employing a proposed bimodal machine-learning-enhanced state-space model. Given the complex multi-modal characteristics exhibited by objects during performances, the paper introduces a bespoke Interacting Multiple Model (IMM) component for increased Siteswap beat detection accuracy and gravitational acceleration inference, together with a scheme for causal Siteswap inference derived from the machine-learning-enhanced IMM mode outputs. The algorithm effectively models the transitory behaviour of the system, enabling rapid and smooth transitions between the two discrete tracking cases (airborne and caught) and accurate Siteswap inference under a variety of camera and environmental conditions. Beat tracking algorithms that exploit optimal compromises between time-domain onset detection functions and Tempograms enable effective error correction of Siteswap detections, in addition to providing performance analysis and visualisation utilities. Experimentally, the algorithm is capable of object tracking and Siteswap inference with up to 11 objects for a variety of challenging Siteswaps and conditions, serving as a versatile performance analysis, evaluation, and visualisation utility.
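
    To give a flavour of the two discrete tracking cases, here is a minimal sketch of mode-dependent state prediction for a single ball's vertical motion under a constant-velocity model with a gravity input. This is an illustration only, not the paper's IMM implementation, and all values are placeholders:

        import numpy as np

        dt, g = 1.0 / 60.0, 9.81  # assumed frame period and gravity

        # Vertical state [height, velocity] of one tracked ball, with two
        # hypothetical discrete modes: ballistic flight vs. carried in hand.
        F = np.array([[1.0, dt],
                      [0.0, 1.0]])                       # shared linear dynamics
        u_air = np.array([-0.5 * g * dt ** 2, -g * dt])  # gravity (airborne mode)
        u_hand = np.zeros(2)                             # no free fall when caught

        def predict(x, mode):
            # One prediction step under the selected mode; a full IMM runs
            # both modes in parallel and mixes estimates by mode probability.
            return F @ x + (u_air if mode == "airborne" else u_hand)

        x = np.array([1.2, 3.0])   # 1.2 m high, rising at 3 m/s
        x = predict(x, "airborne")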


Featured Paper

RIFT: Entropy–Optimised Fractional Wavelet Constellations for Ideal Time–Frequency Estimation

Featured Projects

Fractal Visualiser

Explore

Chromesthetic Generative Music Visualiser

Explore

Juggling Tracking, Siteswap Inference, and Analysis

Explore

Polyrhythmic Siteswap Animator & Hybrid Juggling-Music Notation System

Explore

Juggling Performance Simulation and Visualisation

Explore

PVC Tube Instrument

Explore

Siteswap Math Playground

Explore

Reconstructive Ideal Fractional Transform (RIFT)

Explore