***********
Performance
***********

This doc describes the performance-oriented pieces of *astropath*: the
numpy-accelerated core calculations, the optional ``numba`` acceleration for
the fixed-grid posterior, and the ``profiling`` module used to measure them.

numpy-accelerated core
======================

The most expensive steps in a PATH analysis have been re-implemented in pure
numpy:

- ``localization.calc_LWx`` (the localization term :math:`L(w-x)`) for the
  error-ellipse (``eellipse``) localization now computes the angular
  separation and position angle with closed-form numpy formulae instead of
  ``astropy`` ``SkyCoord`` operations. Results are numerically identical (to
  roundoff) and the routine is several times faster on large grids.
- ``bayesian.px_Oi_fixedgrid`` (the fixed-grid :math:`p(x|O_i)`) pre-extracts
  the candidate and center coordinates into numpy arrays once, rather than
  reading ``astropy`` attributes inside the per-candidate loop.
- ``bayesian.px_Oi_local`` (the local-grid :math:`p(x|O_i)`, which builds one
  grid per candidate) has likewise been rewritten in pure numpy for the
  ``eellipse`` localization: it pre-extracts the coordinates once and
  evaluates :math:`L(w-x)` directly in a flat-sky tangent plane, removing
  every ``astropy`` call from the per-candidate loop. Other localization
  types (``healpix``, ``wcs``) fall back to ``localization.calc_LWx`` as
  before. See `Local-grid posterior`_ below for details.

These changes are transparent: you do not need to do anything to benefit from
them, and the public APIs are unchanged.

Optional numba acceleration
===========================

The per-candidate loop of ``bayesian.px_Oi_fixedgrid`` can optionally be
evaluated with a ``numba`` kernel (``bayesian.px_Oi_numba``) that fuses the
offset, offset-PDF, product with :math:`L(w-x)`, and grid sum into a single
pass with no full-grid temporaries.
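
As a rough illustration of what "fusing" means here, the sketch below compares
a vectorized numpy evaluation, which allocates full-grid temporaries, with a
single-pass loop that accumulates the same sum pixel by pixel. It is not the
astropath kernel: the function names, the exponential offset PDF, and the
Gaussian stand-in for :math:`L(w-x)` are assumptions made for illustration,
and the loop is plain Python (a JIT compiler is what would make such a loop
fast in practice).

```python
import numpy as np


def offset_pdf(theta, scale):
    """Hypothetical exponential offset PDF (illustrative, not astropath's)."""
    return np.exp(-theta / scale) / (2 * np.pi * scale**2)


def sum_vectorized(x, y, L_wx, cand_x, cand_y, scale):
    """numpy path: builds full-grid temporaries for the offset and the PDF."""
    theta = np.hypot(x - cand_x, y - cand_y)   # full-grid temporary
    p_w = offset_pdf(theta, scale)             # another full-grid temporary
    return float(np.sum(p_w * L_wx))           # and a third for the product


def sum_fused(x, y, L_wx, cand_x, cand_y, scale):
    """Single pass: offset, PDF, product and sum per pixel, no temporaries."""
    total = 0.0
    norm = 1.0 / (2 * np.pi * scale**2)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            theta = np.hypot(x[i, j] - cand_x, y[i, j] - cand_y)
            total += np.exp(-theta / scale) * norm * L_wx[i, j]
    return float(total)


# Small demo grid (arcsec) with a Gaussian stand-in for L(w-x)
axis = np.arange(-5.0, 5.0, 0.25)
x, y = np.meshgrid(axis, axis)
L_wx = np.exp(-0.5 * (x**2 + y**2))

fused = sum_fused(x, y, L_wx, 1.0, 0.5, 0.8)
vectorized = sum_vectorized(x, y, L_wx, 1.0, 0.5, 0.8)
```

Both functions return the same value up to floating-point summation order; only
the memory traffic differs. The actual fused kernel in *astropath* is
``bayesian.px_Oi_numba``, reached through ``bayesian.px_Oi_fixedgrid``.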
To enable it, pass ``use_numba=True``::

    from astropath import bayesian

    p_xOi = bayesian.px_Oi_fixedgrid(
        box_hwidth, localiz, cand_coords, cand_ang_size, theta_prior,
        step_size=0.1, use_numba=True)

Key points:

- **Optional.** ``numba`` need *not* be installed. If it is absent (or if you
  simply leave ``use_numba=False``, the default), the calculation runs via
  the standard numpy path. With ``use_numba=True`` but ``numba`` not
  installed, the code emits a warning and falls back to numpy; it never
  errors.
- **Default is off.** ``use_numba`` defaults to ``False``, so existing code
  and results are unchanged unless you opt in.
- **Fixed-grid only.** The numba option is available *only* for the
  fixed-grid method, ``px_Oi_fixedgrid``. ``px_Oi_local`` and the other
  routines are unaffected. The fused kernel returns only the scalar
  posterior, so it is bypassed (numpy path used) when ``return_grids`` or
  ``return_debug`` is requested.
- **Primarily for sandbox analyses.** numba is recommended mainly for
  interactive/sandbox work and large-grid experiments, where its speed-up is
  largest. The first call pays a one-time JIT compilation cost; the win grows
  with the grid size and the number of candidates (roughly 1.5x on small
  grids up to ~5-6x on a 7200x7200 grid with many candidates).

.. note::

   The numba flag is exposed on ``bayesian.px_Oi_fixedgrid`` directly. The
   high-level ``path.PATH.calc_posteriors`` does not currently forward
   ``use_numba``; call ``bayesian.px_Oi_fixedgrid`` directly to use it.

Local-grid posterior
====================

``bayesian.px_Oi_local`` evaluates :math:`p(x|O_i)` on a separate grid per
candidate, centered on the galaxy and sized to the offset prior
(``box_hwidth = phi*max``). It is the method of choice for localizations that
span a large area of sky, where a single fixed grid would be prohibitively
large.
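
A minimal sketch of such a galaxy-centered grid is given below. The helper
``local_grid``, its argument names, the cell-centered layout, and the demo
value ``theta_max=6.0`` are all hypothetical; only the relations
``box_hwidth = phi*max``, ``ngrid = 2*max/step_size`` and a grid spacing of
``phi*step_size`` are taken from this page.

```python
import numpy as np


def local_grid(phi, theta_max, step_size=0.05):
    """Offsets (arcsec) of a galaxy-centered grid; hypothetical helper.

    phi       : galaxy angular size (arcsec)
    theta_max : maximum offset in units of phi (the "max" of the prior)
    step_size : grid spacing in units of phi
    """
    box_hwidth = phi * theta_max
    ngrid = int(round(2 * theta_max / step_size))
    # Cell centers in normalized (phi = 1) units, then rescaled per galaxy
    norm = (np.arange(ngrid) + 0.5) * step_size - theta_max
    axis = phi * norm
    dx, dy = np.meshgrid(axis, axis)
    return box_hwidth, dx, dy


# Two galaxies of very different size get the same pixel count
hw_small, dx_small, _ = local_grid(phi=0.5, theta_max=6.0)
hw_large, dx_large, _ = local_grid(phi=10.0, theta_max=6.0)
```

The two grids have the same shape and differ only by the scale factor
``phi``, which is why the cost per candidate does not grow with the angular
size of the galaxy.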
For the ``eellipse`` localization the calculation is pure numpy and flat-sky:

- The per-candidate grid is built once in normalized units and merely
  rescaled per galaxy; the pixel count ``ngrid = 2*max/step_size`` is the
  same for every candidate.
- :math:`L(w-x)` is evaluated directly in the tangent plane (offsets rotated
  into the ellipse frame), matching the spherical ``calc_LWx`` to ~1e-5
  fractionally at arcsec scales.

Here ``step_size`` is *relative* to the galaxy size (default ``0.05``), so the
grid spacing is ``phi*step_size``.

.. note::

   ``px_Oi_local`` is **pure numpy and does not require (or use) numba**, at
   least for now. The optional numba acceleration described above applies
   only to ``px_Oi_fixedgrid``. ``px_Oi_local`` is fast on its own because
   each per-candidate grid is small.

Small-localization correction
------------------------------

When the localization minor axis :math:`b` is smaller than the galaxy angular
size :math:`\phi`, the galaxy-centered grid under-resolves the sharp
localization and the raw sum is biased low. In that case ``px_Oi_local``
divides the result by a correction factor computed by
``bayesian._Lwx_correction``: the discrete "total :math:`L(w-x)`" on a small
grid that is centered on the localization and *aligned to the galaxy grid*
(same spacing, shifted by an integer number of cells). Because the
localization is sampled at the same sub-cell phase in the raw sum and in this
factor, the under-resolution bias cancels in the ratio (accurate to ~1%). This
is the local analogue of ``px_Oi_fixedgrid``'s ``correction='L_wx'``.

The correction grid is bounded (it is skipped when it would exceed ~5000 cells
per side, which only happens when the localization is already well resolved
and no correction is needed), so it never allocates a large array.

Profiling module
================

The ``astropath.profiling`` module measures these calculations across a range
of grid sizes.
It mirrors the ``calculations/step_size/Profiling.ipynb`` notebook: a faux
FRB, a circular error ellipse, and a set of candidate galaxies spread over a
range of locations and angular sizes.

Run it from the command line::

    python -m astropath.profiling

This sweeps a range of ``step_size`` values (hence grid sizes) and profiles
both posterior methods:

- ``calc_LWx``, the numpy ``px_Oi_fixedgrid``, and (if ``numba`` is installed)
  the numba path with its speed-up factor, written to
  ``profiling_timing.png``;
- ``px_Oi_local`` (via ``run_profiling_local``), whose per-candidate grid size
  is ``2*max/step_size``, written to ``profiling_local_timing.png``.

Both timing tables are printed to the screen. You can also import and call the
pieces directly::

    from astropath import profiling

    df = profiling.run_profiling()                 # fixed-grid (+ numba)
    profiling.plot_results(df, 'timing.png')

    df_local = profiling.run_profiling_local()     # local-grid method
    profiling.plot_local_results(df_local, 'local.png')

Benchmark results
=================

Accuracy
--------

``px_Oi_local`` is validated against ``px_Oi_fixedgrid`` run on a fine grid
(the ``astropath/tests/tests_local.py`` module). Both evaluate the same
integral :math:`p(x|O_i)=\int L(w-x)\,p(w|O_i)\,dw`; the fine fixed grid is
taken as truth. Across the size regimes the local method reproduces it to a
fraction of a percent (run at ``step_size=0.02``):

.. csv-table:: px_Oi_local vs fine fixed grid (step_size = 0.02)
   :header: "Case", "localization", "galaxy", "rel. diff"
   :widths: 30, 18, 10, 12

   "large galaxy / small loc", "a=b=1″", "10″", "+7.0e-5"
   "large galaxy / large loc", "a=b=10″", "10″", "-3.3e-3"
   "small galaxy / small loc", "a=b=1″", "1″", "-3.3e-3"
   "small galaxy / large loc", "a=b=10″", "1″", "-3.3e-3"
   "small galaxy / ellipse", "a=10″, b=0.2″", "0.5″", "-3.3e-3"

When the localization minor axis is smaller than the galaxy
(:math:`b<\phi`) the ``_Lwx_correction`` removes the O(step) bias, so the
result stays accurate even at coarse (galaxy-relative) step sizes where the
uncorrected sum would be off by ~1-2 %:

.. csv-table:: L_wx correction (b < phi), corrected rel. diff vs raw
   :header: "Case", "step", "raw (no corr.)", "corrected"
   :widths: 26, 8, 16, 12

   "gal 10″ / loc a=b=1″", "0.05", "-0.85 %", "-1.7e-4"
   "gal 10″ / loc a=b=1″", "0.10", "-1.85 %", "-1.9e-3"
   "ellipse a=10″, b=0.2″", "0.05", "-0.84 %", "-7.1e-6"
   "ellipse a=10″, b=0.2″", "0.10", "-1.66 %", "+3.4e-5"

Even for a very small (sub-arcsec) localization, where the galaxy grid badly
under-resolves the localization, the correction recovers the fine fixed-grid
value (``step_size=0.05``):

.. csv-table:: Very small localization (a=b=0.1″)
   :header: "galaxy", "rel. diff"
   :widths: 12, 12

   "0.3″", "+6.9e-5"
   "0.6″", "+2.5e-5"
   "1.0″", "+4.8e-5"

Profiling
---------

Timings below were produced by ``python -m astropath.profiling`` (50 candidate
galaxies; absolute times are machine-dependent, the trends and ratios are the
point).

**Fixed-grid posterior** (``px_Oi_fixedgrid``): the numba path overtakes numpy
beyond small grids, reaching ~5-6x on the largest grids; the one-off JIT cost
makes it slower than numpy only on the smallest grid.

.. csv-table:: px_Oi_fixedgrid timing (50 candidates)
   :header: "step", "grid", "calc_LWx", "numpy", "numba", "numba speed-up"
   :widths: 8, 12, 10, 12, 10, 14

   "0.50", "360²", "9 ms", "30 ms", "145 ms", "0.2x"
   "0.25", "720²", "42 ms", "173 ms", "91 ms", "1.9x"
   "0.10", "1800²", "344 ms", "2.06 s", "670 ms", "3.1x"
   "0.05", "3600²", "1.67 s", "16.4 s", "2.96 s", "5.6x"
   "0.025", "7200²", "6.49 s", "68.5 s", "12.4 s", "5.5x"

.. figure:: figures/profiling_timing.png
   :width: 90 %
   :align: center

   Fixed-grid timing vs grid side length (sqrt of pixels). The dashed line
   marks 10 s.

**Local-grid posterior** (``px_Oi_local``): the cost depends on the
localization. With no correction (circular, :math:`b\ge\phi`) or a tiny
localization (a=b=0.1″, small correction grid) the per-candidate cost is
modest; a long thin ellipse (a=12.5″, b=0.2″) drives the correction grid to
~1000² and dominates the run time, i.e. the correction cost scales with the
localization major axis.

.. csv-table:: px_Oi_local timing (50 candidates), milliseconds
   :header: "step", "galaxy grid", "ellipse corr grid", "circular", "ellipse", "small loc"
   :widths: 7, 12, 16, 10, 10, 10

   "0.50", "24²", "97²", "0.9", "6.8", "2.3"
   "0.25", "48²", "197²", "1.8", "22", "3.3"
   "0.10", "120²", "497²", "7.9", "146", "11"
   "0.05", "240²", "997²", "31", "1553", "41"
   "0.025", "480²", "1997²", "155", "9037", "185"

.. figure:: figures/profiling_local_timing.png
   :width: 90 %
   :align: center

   Local-grid timing vs per-candidate galaxy-grid side length, for the three
   localization scenarios. The dashed line marks 10 s.

API Reference
=============

.. automodule:: astropath.profiling
   :members:
   :undoc-members:
   :show-inheritance: