Performance

This doc describes the performance-oriented pieces of astropath: the numpy-accelerated core calculations, the optional numba acceleration for the fixed-grid posterior, and the profiling module used to measure them.

numpy-accelerated core

The two most expensive steps in a PATH analysis have been re-implemented in pure numpy:

localization.calc_LWx (the localization term \(L(w-x)\)) for the error-ellipse (eellipse) localization now computes the angular separation and position angle with closed-form numpy formulae instead of astropy SkyCoord operations. Results are numerically identical (to roundoff) and the routine is several times faster on large grids.
bayesian.px_Oi_fixedgrid (the fixed-grid \(p(x|O_i)\)) pre-extracts the candidate and center coordinates into numpy arrays once, rather than reading astropy attributes inside the per-candidate loop.
bayesian.px_Oi_local (the local-grid \(p(x|O_i)\), which builds one grid per candidate) has likewise been rewritten in pure numpy for the eellipse localization: it pre-extracts the coordinates once and evaluates \(L(w-x)\) directly in a flat-sky tangent plane, removing every astropy call from the per-candidate loop. Other localization types (healpix, wcs) fall back to localization.calc_LWx as before. See Local-grid posterior below for details.

These changes are transparent: you do not need to do anything to benefit from them, and the public APIs are unchanged.

Optional numba acceleration

The per-candidate loop of bayesian.px_Oi_fixedgrid can optionally be evaluated with a numba kernel (bayesian.px_Oi_numba) that fuses the offset, offset-PDF, product with \(L(w-x)\), and grid sum into a single pass with no full-grid temporaries.

To enable it, pass use_numba=True:

from astropath import bayesian

p_xOi = bayesian.px_Oi_fixedgrid(
    box_hwidth, localiz, cand_coords, cand_ang_size,
    theta_prior, step_size=0.1, use_numba=True)

Key points:

Optional. numba need not be installed. If it is absent (or if you simply leave use_numba=False, the default), the calculation runs via the standard numpy path. With use_numba=True but numba not installed, the code emits a warning and falls back to numpy — it never errors.
Default is off. use_numba defaults to False, so existing code and results are unchanged unless you opt in.
Only ``px_Oi_fixedgrid``. The numba option is available only for the fixed-grid method. px_Oi_local and the other routines are unaffected. The fused kernel returns only the scalar posterior, so it is bypassed (numpy path used) when return_grids or return_debug is requested.
Primarily for sandbox analyses. numba is recommended mainly for interactive/sandbox work and large-grid experiments, where its speed-up is largest. The first call pays a one-time JIT compilation cost; the win grows with the grid size and the number of candidates (roughly 1.5x on small grids up to ~5-6x on a 7200x7200 grid with many candidates).

Note

The numba flag is exposed on bayesian.px_Oi_fixedgrid directly. The high-level path.PATH.calc_posteriors does not currently forward use_numba; call bayesian.px_Oi_fixedgrid directly to use it.

Local-grid posterior

bayesian.px_Oi_local evaluates \(p(x|O_i)\) on a separate grid per candidate, centered on the galaxy and sized to the offset prior (box_hwidth = phi*max). It is the method of choice for localizations that span a large area of sky, where a single fixed grid would be prohibitively large.

For the eellipse localization the calculation is pure numpy and flat-sky:

The per-candidate grid is built once in normalized units and merely rescaled per galaxy — the pixel count ngrid = 2*max/step_size is the same for every candidate.
\(L(w-x)\) is evaluated directly in the tangent plane (offsets rotated into the ellipse frame), matching the spherical calc_LWx to ~1e-5 fractionally at arcsec scales.

Here step_size is relative to the galaxy size (default 0.05), so the grid spacing is phi*step_size.

Note

px_Oi_local is pure numpy and does not require (or use) numba — for now. The optional numba acceleration described above applies only to px_Oi_fixedgrid. px_Oi_local is fast on its own because each per-candidate grid is small.

Small-localization correction

When the localization minor axis \(b\) is smaller than the galaxy angular size \(\phi\), the galaxy-centered grid under-resolves the sharp localization and the raw sum is biased low. In that case px_Oi_local divides the result by a correction factor computed by bayesian._Lwx_correction: the discrete “total \(L(w-x)\)” on a small grid that is centered on the localization and aligned to the galaxy grid (same spacing, shifted by an integer number of cells). Because the localization is sampled at the same sub-cell phase in the raw sum and in this factor, the under-resolution bias cancels in the ratio (accurate to ~1%). This is the local analogue of px_Oi_fixedgrid’s correction='L_wx'. The correction grid is bounded (it is skipped when it would exceed ~5000 cells per side, which only happens when the localization is already well resolved and no correction is needed), so it never allocates a large array.

Profiling module

The astropath.profiling module measures these calculations across a range of grid sizes. It mirrors the calculations/step_size/Profiling.ipynb notebook: a faux FRB, a circular error ellipse, and a set of candidate galaxies spread over a range of locations and angular sizes.

Run it from the command line:

python -m astropath.profiling

This sweeps a range of step_size values (hence grid sizes) and profiles both posterior methods:

calc_LWx, the numpy px_Oi_fixedgrid, and (if numba is installed) the numba path with its speed-up factor — written to profiling_timing.png;
px_Oi_local (via run_profiling_local), whose per-candidate grid size is 2*max/step_size — written to profiling_local_timing.png.

Both timing tables are printed to the screen.

You can also import and call the pieces directly:

from astropath import profiling

df = profiling.run_profiling()            # fixed-grid (+ numba)
profiling.plot_results(df, 'timing.png')

df_local = profiling.run_profiling_local()   # local-grid method
profiling.plot_local_results(df_local, 'local.png')

Benchmark results

Accuracy

px_Oi_local is validated against px_Oi_fixedgrid run on a fine grid (the astropath/tests/tests_local.py module). Both evaluate the same integral \(p(x|O_i)=\int L(w-x)\,p(w|O_i)\,dw\); the fine fixed grid is taken as truth. Across the size regimes the local method reproduces it to a fraction of a percent (run at step_size=0.02):

px_Oi_local vs fine fixed grid (step_size = 0.02)
Case	localization	galaxy	rel. diff
large galaxy / small loc	a=b=1″	10″	+7.0e-5
large galaxy / large loc	a=b=10″	10″	-3.3e-3
small galaxy / small loc	a=b=1″	1″	-3.3e-3
small galaxy / large loc	a=b=10″	1″	-3.3e-3
small galaxy / ellipse	a=10″, b=0.2″	0.5″	-3.3e-3

When the localization minor axis is smaller than the galaxy (\(b<\phi\)) the _Lwx_correction removes the O(step) bias, so the result stays accurate even at coarse (galaxy-relative) step sizes where the uncorrected sum would be off by ~1-2 %:

L_wx correction (b < phi), corrected rel. diff vs raw
Case	step	raw (no corr.)	corrected
gal 10″ / loc a=b=1″	0.05	-0.85 %	-1.7e-4
gal 10″ / loc a=b=1″	0.10	-1.85 %	-1.9e-3
ellipse a=10″, b=0.2″	0.05	-0.84 %	-7.1e-6
ellipse a=10″, b=0.2″	0.10	-1.66 %	+3.4e-5

Even for a very small (sub-arcsec) localization, where the galaxy grid badly under-resolves the localization, the correction recovers the fine fixed-grid value (step_size=0.05):

Very small localization (a=b=0.1”)
galaxy	rel. diff
0.3″	+6.9e-5
0.6″	+2.5e-5
1.0″	+4.8e-5

Profiling

Timings below were produced by python -m astropath.profiling (50 candidate galaxies; absolute times are machine-dependent, the trends and ratios are the point).

Fixed-grid posterior (px_Oi_fixedgrid): the numba path overtakes numpy beyond small grids, reaching ~5-6x on the largest grids; the one-off JIT cost makes it slower than numpy only on the smallest grid.

px_Oi_fixedgrid timing (50 candidates)
step	grid	calc_LWx	numpy	numba	numba speed-up
0.50	360²	9 ms	30 ms	145 ms	0.2x
0.25	720²	42 ms	173 ms	91 ms	1.9x
0.10	1800²	344 ms	2.06 s	670 ms	3.1x
0.05	3600²	1.67 s	16.4 s	2.96 s	5.6x
0.025	7200²	6.49 s	68.5 s	12.4 s	5.5x

_images/profiling_timing.png — Fixed-grid timing vs grid side length (sqrt of pixels). The dashed line marks 10 s.

Local-grid posterior (px_Oi_local): the cost depends on the localization. With no correction (circular, \(b\ge\phi\)) or a tiny localization (a=b=0.1”, small correction grid) the per-candidate cost is modest; a long thin ellipse (a=12.5”, b=0.2”) drives the correction grid to ~1000² and dominates the run time – i.e. the correction cost scales with the localization major axis.

px_Oi_local timing (50 candidates), milliseconds
step	galaxy grid	ellipse corr grid	circular	ellipse	small loc
0.50	24²	97²	0.9	6.8	2.3
0.25	48²	197²	1.8	22	3.3
0.10	120²	497²	7.9	146	11
0.05	240²	997²	31	1553	41
0.025	480²	1997²	155	9037	185

_images/profiling_local_timing.png — Local-grid timing vs per-candidate galaxy-grid side length, for the three localization scenarios. The dashed line marks 10 s.

API Reference

Profiling utilities for the PATH calculations.

This module times the two core PATH steps that were targeted for the numpy speed-up work (see prompts/speed_up.md):

localization.calc_LWx – the localization term L(w-x)

bayesian.px_Oi_fixedgrid – the fixed-grid p(x|O_i)

It is modeled on the calculations/step_size/Profiling.ipynb notebook: a faux FRB, a circular error ellipse, and a couple of candidate galaxies. Rather than timing each sub-step interactively, it sweeps a range of grid sizes (by varying step_size), builds a table of timings, prints it, and writes a figure.

Run from the command line:

python -m astropath.profiling

or import and call run_profiling() directly.

astropath.profiling.default_setup(ncand=50)[source]

Build the faux-FRB profiling scenario.

Mirrors calculations/step_size/Profiling.ipynb (circular 5” error ellipse), but scatters ncand candidate galaxies over a range of locations and angular sizes so the per-candidate loop in px_Oi_fixedgrid is exercised realistically.

Parameters:

ncand (int, optional) – Number of candidate galaxies.

Returns:

(localiz, cand_coords, cand_ang_size, theta_prior): localiz (dict): eellipse localization dict. cand_coords (SkyCoord): candidate host coordinates. cand_ang_size (np.ndarray): candidate angular sizes, arcsec. theta_prior (dict): offset-prior parameters.

Return type:

tuple

astropath.profiling.ellipse_setup(ncand=50)[source]

High-axis-ratio ellipse scenario that exercises _Lwx_correction.

Unlike default_setup() (circular localization, where b >= phi so no correction fires), this uses a long thin error ellipse (a=12.5”, b=0.2”, axis ratio ~60) with candidate galaxies LARGER than b. px_Oi_local therefore triggers the _Lwx_correction for every candidate. The galaxy sizes (1.5”-2.5”) are chosen so the correction grid is ~1000x1000 cells at the default step (0.05) – and stays below the ~5000-cell skip threshold across the whole step sweep.

Parameters:

ncand (int, optional) – Number of candidate galaxies.

Returns:

(localiz, cand_coords, cand_ang_size, theta_prior), same: shape as default_setup().

Return type:

tuple

astropath.profiling.main()[source]

Run the profiling sweep, print the table, and save the figure.

Parameters:: None
Returns:: The timing table (also printed to screen).
Return type:: pandas.DataFrame

astropath.profiling.plot_local_results(df, outfile)[source]

Plot px_Oi_local timing vs per-candidate grid size (log-log).

Parameters:

df (pandas.DataFrame) – Output of run_profiling_local().
outfile (str) – Path to write the PNG figure.

Returns:

The path the figure was written to.

Return type:

str

astropath.profiling.plot_results(df, outfile)[source]

Plot the timing results vs grid size on log-log axes.

Parameters:

df (pandas.DataFrame) – Output of run_profiling().
outfile (str) – Path to write the PNG figure.

Returns:

The path the figure was written to.

Return type:

str

astropath.profiling.run_profiling(step_sizes=None, box_hwidth=90.0)[source]

Profile calc_LWx and px_Oi_fixedgrid over a range of grid sizes.

Parameters:

step_sizes (list, optional) – Grid step sizes (arcsec) to sweep. Defaults to DEFAULT_STEP_SIZES.
box_hwidth (float, optional) – Analysis-box half-width, arcsec.

Returns:

One row per step size with columns: step_size, ngrid, n_pixels, calc_LWx_s, px_Oi_fixedgrid_s (numpy) and px_Oi_numba_s (the numba use_numba=True path; NaN if numba is unavailable).

Return type:

pandas.DataFrame

astropath.profiling.run_profiling_local(step_sizes=None)[source]

Profile px_Oi_local over a range of (relative) step sizes.

px_Oi_local builds one grid per candidate, sized to the offset prior (box_hwidth = phi*max) with spacing phi*step_size. The per-candidate pixel count is therefore ngrid = 2*max/step_size (independent of phi), so sweeping step_size sweeps the per-candidate grid size. The whole multi-candidate call is timed.

Three scenarios are timed against this SAME per-candidate galaxy grid (the x-axis of the figure):

circular (default_setup()) – b >= phi, so the _Lwx_correction never fires: pure galaxy-grid cost.
ellipse (ellipse_setup()) – a long thin localization (a=12.5”, b=0.2”) with b < phi, so the correction fires every candidate on a ~1000x1000 grid (at the default step).
small (small_loc_setup()) – a tiny circular localization (a=b=0.1”) with galaxies 0.1”-20”; the correction fires but its grid stays small (tiny major axis). Comparing the three isolates how the correction cost scales with the localization size.

Parameters:

step_sizes (list, optional) – Relative step sizes to sweep. Defaults to DEFAULT_STEP_SIZES.

Returns:

One row per step size with columns: step_size, ngrid (per candidate), n_pixels (per candidate), ncand, px_Oi_local_s (circular), px_Oi_local_ellipse_s (ellipse + correction), px_Oi_local_smallloc_s (small localization), and corr_ngrid (representative ellipse correction-grid side).

Return type:

pandas.DataFrame

astropath.profiling.small_loc_setup(ncand=50)[source]

Very small (sub-arcsec) circular localization scenario.

A tiny circular error ellipse (a=b=0.1”) with candidate galaxies spanning 0.1”-20”. For every galaxy larger than b the _Lwx_correction fires, but because the ellipse major axis is tiny its correction grid (~8a/h cells per side) stays small even as the galaxy grid grows – the opposite regime to ellipse_setup() (large axis ratio -> large correction grid). This stresses the deeply under-resolved case (grid spacing phi*step can be many times b).

Parameters:

ncand (int, optional) – Number of candidate galaxies.

Returns:

(localiz, cand_coords, cand_ang_size, theta_prior), same: shape as default_setup().

Return type:

tuple