Running PATH on Simulated FRBs

This notebook demonstrates how to run PATH (Probabilistic Association of Transients to Hosts) on simulated FRB populations.

The workflow consists of three main steps:

  1. Generate FRBs: Create synthetic FRB populations with realistic properties

  2. Assign to Hosts: Place FRBs in host galaxies with localization errors

  3. Run PATH: Calculate posterior probabilities for host candidates

This builds on the previous notebooks:

  • Simulate_Generate_FRBs.ipynb: Generating FRB populations

  • Simulate_Assign_FRBs.ipynb: Assigning FRBs to host galaxies

[11]:
import os
import numpy as np
import matplotlib.pyplot as plt
from importlib import reload
import pandas as pd
from pathlib import Path

from astropy.table import Table
from astropy.coordinates import SkyCoord
from astropy import units

# Set up plotting style
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
[16]:
# Import astropath modules
from astropath.simulations import generate_frbs, assign_frbs_to_hosts, run_path
from astropath.simulations import utils as sim_utils
from astropath import run

Step 1: Generate FRB Population

First, we generate a population of FRBs using generate_frbs(). This gives us FRBs with realistic DM, redshift, and host galaxy magnitude distributions for different surveys.

[4]:
# Generate 100 CHIME FRBs for demonstration
n_frbs = 100
seed = 42

frbs = generate_frbs(n_frbs, 'CHIME', seed=seed)

print(f"Generated {len(frbs)} FRBs")
print(f"\nFRB properties:")
print(f"  DM range:  {frbs['DMeg'].min():.0f} - {frbs['DMeg'].max():.0f} pc/cm³")
print(f"  z range:   {frbs['z'].min():.3f} - {frbs['z'].max():.3f}")
print(f"  m_r range: {frbs['m_r'].min():.2f} - {frbs['m_r'].max():.2f}")

frbs.head()
Loading P(DM,z) grid from /home/xavier/Projects/FRB/FRB/frb/data/DM/CHIME_pzdm.npz
Sampling DM values
Sampling redshifts
Sampling host galaxy absolute magnitudes
Using Lz values to sample host galaxy absolute magnitudes
Generated 100 FRBs

FRB properties:
  DM range:  40 - 2090 pc/cm³
  z range:   0.010 - 1.960
  m_r range: 12.84 - 27.61
[4]:
DMeg z M_r m_r
0 445.0 0.25 -22.608415 17.963133
1 1175.0 1.38 -19.821438 25.200431
2 710.0 0.63 -21.196656 21.734748
3 585.0 0.45 -20.429629 21.624262
4 215.0 0.11 -18.170599 20.437868

Step 2: Load Galaxy Catalog

We need a galaxy catalog to:

  1. Assign FRBs to host galaxies

  2. Provide candidate hosts for PATH analysis

We’ll use the real combined HSC/DECaLs/HECATE catalog if available, otherwise create a mock catalog.

[5]:
def create_mock_galaxy_catalog(n_galaxies=50000, seed=42):
    """
    Create a mock galaxy catalog for demonstration.
    """
    np.random.seed(seed)

    # Create galaxies across the sky
    ra = np.random.uniform(0., 360., n_galaxies)
    dec = np.random.uniform(-30., 60., n_galaxies)

    # Magnitude distribution (peaks around m_r ~ 21-22)
    mag_best = np.random.beta(a=2, b=5, size=n_galaxies) * 10 + 16
    mag_best = np.clip(mag_best, 16., 26.)

    # Half-light radii (log-normal distribution)
    half_light = np.random.lognormal(mean=np.log(0.8), sigma=0.6, size=n_galaxies)
    half_light = np.clip(half_light, 0.1, 5.0)

    df = pd.DataFrame({
        'ra': ra,
        'dec': dec,
        'mag_best': mag_best,
        'half_light': half_light,
        'ID': np.arange(n_galaxies)
    })

    return df


def load_galaxy_catalog(seed=42):
    """
    Load galaxy catalog - tries real catalog first, falls back to mock.
    """
    frb_apath = os.environ.get('FRB_APATH')

    if frb_apath is not None:
        catalog_path = Path(frb_apath) / 'combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet'

        if catalog_path.exists():
            print(f"Loading real galaxy catalog from:")
            print(f"  {catalog_path}")
            galaxies = pd.read_parquet(catalog_path)

            required_cols = ['ra', 'dec', 'mag_best', 'half_light', 'ID']
            if all(col in galaxies.columns for col in required_cols):
                print(f"  Loaded {len(galaxies):,} galaxies")
                return galaxies, True

    print("Creating mock galaxy catalog...")
    print("  (Set $FRB_APATH to use real HSC/DECaLs/HECATE catalog)")
    galaxies = create_mock_galaxy_catalog(n_galaxies=50000, seed=seed)
    print(f"  Created {len(galaxies):,} mock galaxies")
    return galaxies, False


# Load catalog
galaxies, is_real = load_galaxy_catalog(seed=42)

print(f"\nGalaxy catalog:")
print(f"  N_galaxies: {len(galaxies):,}")
print(f"  Mag range:  {galaxies['mag_best'].min():.2f} - {galaxies['mag_best'].max():.2f}")
Loading real galaxy catalog from:
  /home/xavier/Projects/FRB/data/Catalogs/combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet
  Loaded 2,297,005 galaxies

Galaxy catalog:
  N_galaxies: 2,297,005
  Mag range:  9.75 - 28.00

Step 3: Assign FRBs to Host Galaxies

Now we assign FRBs to host galaxies based on their expected host magnitude. The assignment includes:

  • True FRB position within the galaxy

  • Observed position with localization error

  • Localization ellipse parameters (a, b, PA)

[6]:
# Define localization error ellipse
# (semi-major axis, semi-minor axis, position angle) in arcsec, arcsec, degrees
localization = (1.0, 0.5, 45.)  # 1" x 0.5" ellipse at 45 deg

print(f"Localization error ellipse:")
print(f"  Semi-major axis (a): {localization[0]}\"")
print(f"  Semi-minor axis (b): {localization[1]}\"")
print(f"  Position angle:      {localization[2]} deg")
print()

# Assign FRBs to hosts
assignments = assign_frbs_to_hosts(
    frb_df=frbs,
    galaxy_catalog=galaxies,
    localization=localization,
    mag_range=(17., 26.),  # Only assign FRBs with hosts in this mag range
    seed=seed
)

print(f"\nAssigned {len(assignments)} FRBs to hosts")
print(f"(Filtered from {len(frbs)} based on magnitude range)")

assignments.head()
Localization error ellipse:
  Semi-major axis (a): 1.0"
  Semi-minor axis (b): 0.5"
  Position angle:      45.0 deg

Assigning 86 FRBs to hosts (filtered from 100)
Iteration 1: 86 FRBs remaining
  Brightest unassigned FRB: m_r = 17.17 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...

Assigned 86 FRBs to hosts
(Filtered from 100 based on magnitude range)
[6]:
ra dec true_ra true_dec gal_ID gal_off mag half_light loc_off FRB_ID a b PA
0 199.216883 6.409707 199.216605 6.409882 8796117175370389 0.439120 17.963129 1.871511 1.177618 0 1.0 0.5 45.0
1 33.800890 -4.716802 33.800732 -4.717030 37485405112650917 0.993813 25.200432 0.660314 0.999297 1 1.0 0.5 45.0
2 34.736942 -6.066167 34.736657 -6.066538 8796112420210440 0.485265 21.734750 0.737067 1.680305 2 1.0 0.5 45.0
3 34.353103 -2.951097 34.352760 -2.951153 8796113549727288 1.997182 21.624264 4.375264 1.247809 3 1.0 0.5 45.0
4 36.008173 -5.754997 36.008381 -5.754480 8796112514517822 0.037168 20.437860 0.438232 2.003143 4 1.0 0.5 45.0

Output columns:

  • ra, dec: Observed FRB coordinates (with localization error)

  • true_ra, true_dec: True FRB coordinates (within galaxy)

  • gal_ID: Host galaxy ID

  • gal_off: FRB offset from galaxy center (arcsec)

  • mag: Host galaxy magnitude

  • half_light: Host galaxy half-light radius (arcsec)

  • loc_off: Localization error magnitude (arcsec)

  • a, b, PA: Localization ellipse parameters

Step 4: Run PATH Analysis

Now we run PATH on each assigned FRB. For each FRB, we:

  1. Extract nearby galaxies as candidates

  2. Build the input dictionary with localization and priors

  3. Run PATH to compute posterior probabilities

  4. Compare with the true host

[7]:
def extract_candidates(frb_row, galaxies, search_radius_arcmin=1.0):
    """
    Extract candidate host galaxies near an FRB position.

    Args:
        frb_row: Row from assignments DataFrame
        galaxies: Galaxy catalog DataFrame
        search_radius_arcmin: Search radius in arcmin

    Returns:
        astropy.table.Table: Candidate catalog for PATH
    """
    # Get FRB position
    frb_ra = frb_row['ra']
    frb_dec = frb_row['dec']

    # Convert search radius to degrees
    search_radius_deg = search_radius_arcmin / 60.

    # Simple box cut (fast approximation)
    cos_dec = np.cos(np.radians(frb_dec))
    ra_mask = np.abs(galaxies['ra'] - frb_ra) * cos_dec < search_radius_deg
    dec_mask = np.abs(galaxies['dec'] - frb_dec) < search_radius_deg

    nearby = galaxies[ra_mask & dec_mask].copy()

    if len(nearby) == 0:
        return None

    # Convert to astropy Table for PATH
    catalog = Table()
    catalog['ra'] = nearby['ra'].values
    catalog['dec'] = nearby['dec'].values
    catalog['ang_size'] = nearby['half_light'].values
    catalog['mag'] = nearby['mag_best'].values
    catalog['ID'] = nearby['ID'].values

    return catalog
[8]:
def run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=False):
    """
    Run PATH analysis on a single FRB.

    Args:
        frb_row: Row from assignments DataFrame
        galaxies: Galaxy catalog DataFrame
        search_radius_arcmin: Search radius for candidates
        verbose: Print detailed output

    Returns:
        dict: PATH results including posteriors and true host info
    """
    # Extract candidate galaxies
    catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)

    if catalog is None or len(catalog) == 0:
        return {'success': False, 'reason': 'no_candidates'}

    # Build input dictionary
    lparam = {
        'a': frb_row['a'],
        'b': frb_row['b'],
        'theta': frb_row['PA']
    }

    idict = run.build_idict(
        ra=frb_row['ra'],
        dec=frb_row['dec'],
        ltype='eellipse',
        lparam=lparam,
        P_O_method='inverse',
        PU=0.1,  # 10% prior on unseen host
        scale=0.5,
        theta_PDF='exp',
        theta_max=6.0
    )

    # Run PATH
    try:
        result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
            idict,
            verbose=verbose,
            catalog=catalog,
            mag_key='mag'
        )
    except Exception as e:
        return {'success': False, 'reason': str(e)}

    if result_candidates is None or len(result_candidates) == 0:
        return {'success': False, 'reason': 'path_failed'}

    # Find the true host in results
    true_host_id = frb_row['gal_ID']

    # Check if true host is in candidates
    if 'ID' in result_candidates.columns:
        true_host_mask = result_candidates['ID'] == true_host_id
        if true_host_mask.any():
            true_host_P_Ox = result_candidates.loc[true_host_mask, 'P_Ox'].values[0]
            true_host_rank = (result_candidates['P_Ox'] > true_host_P_Ox).sum() + 1
        else:
            true_host_P_Ox = 0.0
            true_host_rank = len(result_candidates) + 1
    else:
        true_host_P_Ox = None
        true_host_rank = None

    return {
        'success': True,
        'n_candidates': len(result_candidates),
        'P_Ux': P_Ux,
        'top_P_Ox': result_candidates['P_Ox'].iloc[0],
        'true_host_P_Ox': true_host_P_Ox,
        'true_host_rank': true_host_rank,
        'true_host_id': true_host_id,
        'candidates': result_candidates
    }

Run PATH on a single FRB (example)

[9]:
# Run PATH on the first FRB
frb_idx = 0
frb_row = assignments.iloc[frb_idx]

print(f"FRB {frb_idx}:")
print(f"  Position: ({frb_row['ra']:.6f}, {frb_row['dec']:.6f})")
print(f"  True host ID: {frb_row['gal_ID']}")
print(f"  True host mag: {frb_row['mag']:.2f}")
print(f"  Localization: {frb_row['a']:.2f}\" x {frb_row['b']:.2f}\"")
print()

result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=True)

if result['success']:
    print(f"\n" + "="*60)
    print(f"PATH Results:")
    print(f"  N candidates: {result['n_candidates']}")
    print(f"  P(U|x):       {result['P_Ux']:.4f}")
    print(f"  Top P(O|x):   {result['top_P_Ox']:.4f}")
    print(f"  True host P(O|x): {result['true_host_P_Ox']:.4f}")
    print(f"  True host rank:   {result['true_host_rank']}")
else:
    print(f"PATH failed: {result['reason']}")
FRB 0:
  Position: (199.216883, 6.409707)
  True host ID: 8796117175370389.0
  True host mag: 17.96
  Localization: 1.00" x 0.50"

                 ID          ra       dec  ang_size        mag  P_O      P_Ox
0  8796117175370389  199.216727  6.409863  1.871511  17.963129  0.9  0.999985
P_Ux = 1.5185724702115839e-05

============================================================
PATH Results:
  N candidates: 1
  P(U|x):       0.0000
  Top P(O|x):   1.0000
  True host P(O|x): 1.0000
  True host rank:   1

Run PATH on all assigned FRBs, in parallel

This took ~3min on my laptop

[12]:
# Prior dict
DECaLS_chime_priors = {}
DECaLS_chime_priors['version'] = '1.0'
DECaLS_chime_priors['PU'] = 0.15
DECaLS_chime_priors['scale'] = 0.5
prior_dict = DECaLS_chime_priors

#
final_sims = run_path.full(assignments, galaxies, prior_dict,
        debug=False, ncpu=4, multi=True)

Slicing...
kk:  0
PATH time
[13]:
final_sims
[13]:
ra dec ang_size mag ID sep P_O p_xO P_Ox P_Ux gal_ID iFRB
0 199.216727 6.409863 1.871511 17.963129 8796117175370389 0.793340 0.420775 0.056456 0.999951 0.000049 1 0
1 199.258934 6.388591 1.296497 18.840237 8796117175369996 168.555642 0.157922 0.000000 0.000000 0.000049 0 0
2 199.233992 6.439563 4.672406 18.350800 8796117175370961 123.687674 0.271302 0.000000 0.000000 0.000049 2 0
3 33.800652 -4.716766 0.660314 25.200432 37485405112650917 0.863657 0.000071 0.106230 0.788365 0.121549 1236 1
4 33.801193 -4.717033 0.397299 25.941730 37485405112650915 1.370733 0.000042 0.019177 0.084025 0.121549 1234 1
... ... ... ... ... ... ... ... ... ... ... ... ...
95287 34.280275 -6.777631 0.480865 21.614834 8796112138537127 179.791867 0.001291 0.000000 0.000000 0.000180 68 84
95288 34.309795 -6.775455 0.598027 21.801176 8796112138537191 176.932092 0.001087 0.000000 0.000000 0.000180 69 84
95289 34.298661 -6.773713 0.976511 21.308273 8796112138537228 179.692401 0.001721 0.000000 0.000000 0.000180 70 84
95290 219.961916 21.458217 1.339557 18.801554 8796122665518662 1.620448 0.239000 0.050786 0.999905 0.000095 0 85
95291 219.927882 21.461977 1.061329 17.962357 8796122665518745 115.985859 0.611000 0.000000 0.000000 0.000095 1 85

95292 rows × 12 columns

Step 5. Digest

[19]:
frbs.keys()
[19]:
Index(['DMeg', 'z', 'M_r', 'm_r'], dtype='object')
[21]:
reload(sim_utils)
digest = sim_utils.build_digest(final_sims, frbs, assignments, galaxies)
Get parameters from the simulation results dataframe
Rename some columns to make concatenation cleaner
Calculate angular separation between true host and best candidate
Add the RA/Dec of the host *galaxy* from the original catalog
Merge dataframes, to create a nice big cross-checked dataframe for plotting purposes
[23]:
digest
[23]:
ra_loc dec_loc true_ra true_dec host_ID gal_off mag_host half_light_host loc_off FRB_ID ... ang_size_cand mag_cand cand_ID sep_cand P_O p_xO P_Ox P_Ux gal_ind iFRB
0 199.216883 6.409707 199.216605 6.409882 8796117175370389 0.439120 17.963129 1.871511 1.177618 0 ... 1.871511 17.963129 8796117175370389 0.793340 0.420775 0.056456 0.999951 0.000049 1 0
1 33.800890 -4.716802 33.800732 -4.717030 37485405112650917 0.993813 25.200432 0.660314 0.999297 1 ... 0.660314 25.200432 37485405112650917 0.863657 0.000071 0.106230 0.788365 0.121549 1236 1
2 34.736942 -6.066167 34.736657 -6.066538 8796112420210440 0.485265 21.734750 0.737067 1.680305 2 ... 0.737067 21.734750 8796112420210440 1.275588 0.003214 0.098519 0.996358 0.003642 31 2
3 34.353103 -2.951097 34.352760 -2.951153 8796113549727288 1.997182 21.624264 4.375264 1.247809 3 ... 4.375264 21.624264 8796113549727288 2.564606 0.001682 0.009761 0.934149 0.065851 54 3
4 36.008173 -5.754997 36.008381 -5.754480 8796112514517822 0.037168 20.437860 0.438232 2.003143 4 ... 0.438232 20.437860 8796112514517822 2.023352 0.009207 0.021587 0.994207 0.005790 25 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
81 37.603128 -5.110529 37.602826 -5.110579 8796112796975557 0.723388 20.342892 1.481896 1.095469 95 ... 1.481896 20.342892 8796112796975557 0.979629 0.006700 0.059533 0.997107 0.002893 27 81
82 35.058354 -5.599320 35.058318 -5.599317 37489919123282555 0.123619 22.189007 0.615296 0.126097 96 ... 0.615296 22.189007 37489919123282555 0.084937 0.001473 0.207259 0.996222 0.003778 733 82
83 34.589116 -3.983474 34.588902 -3.983679 38549165432636034 0.017656 22.667412 0.537738 1.065733 97 ... 0.537738 22.667412 38549165432636034 1.060546 0.000756 0.138690 0.989086 0.010914 591 83
84 34.299906 -6.823612 34.300143 -6.823644 8796112138535839 0.426606 17.306215 2.290571 0.854336 98 ... 2.290571 17.306215 8796112138535839 0.492979 0.121529 0.052765 0.999799 0.000180 20 84
85 219.962297 21.458493 219.961921 21.458290 8796122665518662 0.263433 18.801554 1.339557 1.457904 99 ... 1.339557 18.801554 8796122665518662 1.620448 0.239000 0.050786 0.999905 0.000095 0 85

86 rows × 34 columns

Step 5: Analyze PATH Performance

Let’s analyze how well PATH identifies the true hosts.

[24]:
matched = digest['cand_ID'] == digest.host_ID
np.sum(matched)
[24]:
np.int64(84)

Step 6: Compare Different Localization Precisions

Let’s see how PATH performance varies with localization precision.

[13]:
# Define different localization scenarios
localization_scenarios = {
    'Excellent (0.1")': (0.1, 0.1, 0.),
    'Good (0.5")': (0.5, 0.3, 45.),
    'Moderate (2")': (2.0, 1.0, 30.),
    'Poor (5")': (5.0, 3.0, 0.)
}

# Run simulations for each scenario
scenario_results = {}

for scenario_name, loc in localization_scenarios.items():
    print(f"\nRunning {scenario_name}...")

    # Assign FRBs with this localization
    assign_scenario = assign_frbs_to_hosts(
        frb_df=frbs,
        galaxy_catalog=galaxies,
        localization=loc,
        mag_range=(17., 26.),
        seed=42
    )

    # Run PATH on each
    scenario_path_results = []
    for idx in range(len(assign_scenario)):
        frb_row = assign_scenario.iloc[idx]
        result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0)
        if result['success']:
            scenario_path_results.append(result)

    scenario_results[scenario_name] = pd.DataFrame(scenario_path_results)

    # Summary
    df = scenario_results[scenario_name]
    rank1_frac = (df['true_host_rank'] == 1).mean() * 100
    print(f"  Assigned: {len(assign_scenario)}, PATH success: {len(df)}")
    print(f"  Correct (rank=1): {rank1_frac:.1f}%")

Running Excellent (0.1")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 98.5%

Running Good (0.5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 95.6%

Running Moderate (2")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 91.2%

Running Poor (5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 79.4%
[14]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

# 1. Rank 1 success rate comparison
ax = axes[0]
scenarios = list(scenario_results.keys())
rank1_rates = [(scenario_results[s]['true_host_rank'] == 1).mean() * 100 for s in scenarios]

bars = ax.bar(range(len(scenarios)), rank1_rates, color=colors, alpha=0.7, edgecolor='black')
ax.set_xticks(range(len(scenarios)))
ax.set_xticklabels(scenarios, rotation=15, ha='right')
ax.set_ylabel('Correct Identification Rate (%)')
ax.set_title('PATH Success Rate by Localization Precision')
ax.set_ylim(0, 100)

# Add value labels
for bar, rate in zip(bars, rank1_rates):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            f'{rate:.1f}%', ha='center', fontsize=11)

# 2. True host P(O|x) distributions
ax = axes[1]
for scenario, color in zip(scenarios, colors):
    df = scenario_results[scenario]
    ax.hist(df['true_host_P_Ox'], bins=20, alpha=0.4, color=color,
            label=scenario, density=True)

ax.set_xlabel('True Host P(O|x)')
ax.set_ylabel('Density')
ax.set_title('True Host Posterior Distribution by Localization')
ax.legend()

plt.tight_layout()
plt.show()
../_images/nb_Simulate_Run_PATH_26_0.png

Step 7: Effect of Unseen Host Prior (P_U)

The PU parameter controls the prior probability that the true host is not in the catalog. Let’s see how this affects PATH performance.

[15]:
def run_path_with_PU(frb_row, galaxies, PU, search_radius_arcmin=1.0):
    """Run PATH with a specific P_U value."""
    catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)

    if catalog is None or len(catalog) == 0:
        return None

    lparam = {
        'a': frb_row['a'],
        'b': frb_row['b'],
        'theta': frb_row['PA']
    }

    idict = run.build_idict(
        ra=frb_row['ra'],
        dec=frb_row['dec'],
        ltype='eellipse',
        lparam=lparam,
        P_O_method='inverse',
        PU=PU,
        scale=0.5,
        theta_PDF='exp',
        theta_max=6.0
    )

    try:
        result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
            idict, catalog=catalog, mag_key='mag'
        )
        return {
            'P_Ux': P_Ux,
            'candidates': result_candidates
        }
    except:
        return None

# Test different P_U values on a subset
PU_values = [0.0, 0.05, 0.1, 0.2, 0.3, 0.5]
n_test = min(30, len(assignments))

PU_results = {pu: [] for pu in PU_values}

print(f"Testing P_U effect on {n_test} FRBs...")
for idx in range(n_test):
    frb_row = assignments.iloc[idx]
    true_host_id = frb_row['gal_ID']

    for pu in PU_values:
        result = run_path_with_PU(frb_row, galaxies, pu)
        if result is not None:
            cands = result['candidates']
            if 'ID' in cands.columns:
                mask = cands['ID'] == true_host_id
                if mask.any():
                    true_P_Ox = cands.loc[mask, 'P_Ox'].values[0]
                else:
                    true_P_Ox = 0.0
            else:
                true_P_Ox = None

            PU_results[pu].append({
                'P_Ux': result['P_Ux'],
                'true_host_P_Ox': true_P_Ox
            })

print("Done!")
Testing P_U effect on 30 FRBs...
Done!
[16]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 1. Mean P(U|x) vs prior P_U
ax = axes[0]
mean_PUx = [np.mean([r['P_Ux'] for r in PU_results[pu]]) for pu in PU_values]
ax.plot(PU_values, mean_PUx, 'bo-', markersize=10, linewidth=2)
ax.plot([0, 0.5], [0, 0.5], 'k--', alpha=0.5, label='1:1 line')
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean Posterior P(U|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on Posterior')
ax.legend()
ax.grid(alpha=0.3)

# 2. Mean true host P(O|x) vs P_U
ax = axes[1]
mean_true_POx = [np.mean([r['true_host_P_Ox'] for r in PU_results[pu] if r['true_host_P_Ox'] is not None])
                 for pu in PU_values]
ax.plot(PU_values, mean_true_POx, 'go-', markersize=10, linewidth=2)
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean True Host P(O|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on True Host Posterior')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("P(U) effect summary:")
print(f"{'P_U':<8} {'Mean P(U|x)':<15} {'Mean True Host P(O|x)':<20}")
print("-" * 45)
for pu, pux, pox in zip(PU_values, mean_PUx, mean_true_POx):
    print(f"{pu:<8.2f} {pux:<15.4f} {pox:<20.4f}")
../_images/nb_Simulate_Run_PATH_29_0.png
P(U) effect summary:
P_U      Mean P(U|x)     Mean True Host P(O|x)
---------------------------------------------
0.00     0.0000          0.9229
0.05     0.0008          0.9223
0.10     0.0016          0.9216
0.20     0.0036          0.9200
0.30     0.0061          0.9179
0.50     0.0136          0.9116

Summary

This notebook demonstrated the complete workflow for running PATH on simulated FRBs:

  1. Generate FRBs: Used generate_frbs() to create realistic FRB populations

  2. Load Catalog: Used real or mock galaxy catalog

  3. Assign Hosts: Used assign_frbs_to_hosts() to place FRBs with localization errors

  4. Run PATH: Used run.run_on_dict() to compute posteriors

  5. Analyze Performance: Evaluated correct identification rates

Key Findings

  • Localization precision matters: Better localization leads to higher correct identification rates

  • P_U trade-off: Higher unseen prior reduces true host posterior but may be more realistic

  • PATH is effective: Even with moderate localization, PATH correctly identifies most hosts

Using with Real Data

To use this workflow with real FRB observations:

  1. Replace simulated FRBs with real observations

  2. Use actual localization parameters from your survey

  3. Query appropriate survey catalogs (Pan-STARRS, DECaLS, etc.)

  4. Adjust priors based on your expectations

[ ]: