Skip to content

Utils review

This looks correct, as usual, most of my request is documentation and sanity tests.

utils/data.py

The functions in this file all seem correct, however, adding docstrings and unit tests would make future maintainance much easier.

convert_m1q_to_lm1m2

  • remove mass_1 and mass_ratio
  • add log_mass_1 and log_mass_2
  • add log_prior = log(prior) + log(mass_2)
  • jacobian: \frac{\partial (\ln m_1, \ln m_2)}{\partial (m_1, q)} = \begin{bmatrix} \frac{1}{m_1} & 0 \\ 0 & \frac{1}{q} \end{bmatrix} = \frac{1}{m_2}

convert_m1_to_lm1

  • remove mass_1
  • add log_mass_1
  • add log_prior = log(prior) + log(mass_1)
  • jacobian: \frac{\partial (\ln m_1)}{\partial m_1} = \frac{1}{m_1}

convert_m1m2_to_lm1lm2

  • remove mass_1 and mass_2
  • add log_mass_1 and log_mass_2
  • add log_prior = log(prior) + log(mass_1) + log(mass_2)
  • jacobian: \frac{\partial (\ln m_1, \ln m_2)}{\partial (m_1, m_2)} = \begin{bmatrix} \frac{1}{m_1} & 0 \\ 0 & \frac{1}{m_2} \end{bmatrix} = \frac{1}{m_1 m_2}

clean_par

  • takes as arguments: dictionary of data, list of parameter names, minimum, maximum, and whether to remove bad samples
  • returns the data, modified in place
  • if remove, remove the samples that are outside the range
  • if not remove, set the samples outside the range to the center of support and log prior to infinity

place_in_bins

  • takes as arguments:
    • list of parameters
    • list of dictionaries of posteriors
    • dictionary of injections
    • the number of bins
    • the minima and maxima per parameter
  • returns a list of bin occupancies for posteriors, bin occupancies for injections, the bin edges, and the volume element per bin
  • default for minima and maxima are mutable, but they shouldn't be changed by the function
  • if bounds are missing, falls back to hardcoded defaults
  • calls place_samples_in_bins to do the binning

utils/nearest_neighbor.py

The function in here is primarily to put the sample data onto the pixelpop grid and compute adjacency matrices. The functions look sensible, however they are quite complex and generally should be documented better. There are potential danger points with jax not performing bounds checking.

is_valid

  • no docstring which makes this very difficult to parse without looking at it's usage in context
  • inputs:
    • l: a coordinate
    • base: the length along the dimension(s)
    • dimension: the dimensionality
  • false when:
    • the length of l doesn't match dimension OR
    • if base in an int, any element of l is not in [0, base) OR
    • if base is a list, any element of l is not in [0, base[i]) for each dimension i
  • true when:
    • the length of l matches dimension AND
    • if base in an int, all elements of l are in [0, base) OR
    • if base is a list, all elements of l are in [0, base[i]) for each dimension i
  • none otherwise

coordinate_to_index

  • has a docstring!
  • has a test for out of bounds access
  • core calculation is jnp.ravel_multi_index

index_to_coordinate

  • has a docstring!
  • inverse of coordinate_to_index
  • core_calculation is jnp.unravel_index
  • verified that these are inverses
import numpy as np
from pixelpop.utils.nearest_neighbor import index_to_coordinate, coordinate_to_index

for _ in range(100):
    dimension = np.random.choice(2) + 1
    density = list(np.random.randint(20, 30, dimension))
    index = np.random.randint(np.prod(density))
    assert coordinate_to_index(index_to_coordinate(index, dimension, density), density, dimension) == index

nearest_neighbors

  • has a minimal docstring, missing on optional argument isVisible
  • isVisible turns on a tqdm progress bar
  • why doesn't this use index_to_coordinate
  • runs over all valid points in full grid
  • runs over each dimension and tests previous and next, if it is_valid, add the index and coordinate to the lists

create_CAR_coupling_matrix

  • docstring has extra and missing parameters
  • returns the adjacency matrix for the given density and dimension using nearest_neighbors
  • returns a scipy.sparse.coo_array

place_grid_in_bins

  • appears unused, I'm going to ignore it

place_samples_in_bins

  • docstring is missing reshape, and different return format
  • main calculation uses jnp.digitize
  • if reshape returns the coordinates in the grid
  • else returns the indices in the flattened grid using coordinate_to_index

utils/event_utils.py

Ignoring this file as none of this functionality is planned for use.