Running PATH on Simulated FRBs

This notebook demonstrates how to run PATH (Probabilistic Association of Transients to Hosts) on simulated FRB populations.

The workflow consists of three main steps:

Generate FRBs: Create synthetic FRB populations with realistic properties
Assign to Hosts: Place FRBs in host galaxies with localization errors
Run PATH: Calculate posterior probabilities for host candidates

This builds on the previous notebooks:

Simulate_Generate_FRBs.ipynb: Generating FRB populations
Simulate_Assign_FRBs.ipynb: Assigning FRBs to host galaxies

[11]:

import os
import numpy as np
import matplotlib.pyplot as plt
from importlib import reload
import pandas as pd
from pathlib import Path

from astropy.table import Table
from astropy.coordinates import SkyCoord
from astropy import units

# Set up plotting style
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

[16]:

# Import astropath modules
from astropath.simulations import generate_frbs, assign_frbs_to_hosts, run_path
from astropath.simulations import utils as sim_utils
from astropath import run

Step 1: Generate FRB Population

First, we generate a population of FRBs using generate_frbs(). This gives us FRBs with realistic DM, redshift, and host galaxy magnitude distributions for different surveys.

[4]:

# Generate 100 CHIME FRBs for demonstration
n_frbs = 100
seed = 42

frbs = generate_frbs(n_frbs, 'CHIME', seed=seed)

print(f"Generated {len(frbs)} FRBs")
print(f"\nFRB properties:")
print(f"  DM range:  {frbs['DMeg'].min():.0f} - {frbs['DMeg'].max():.0f} pc/cm³")
print(f"  z range:   {frbs['z'].min():.3f} - {frbs['z'].max():.3f}")
print(f"  m_r range: {frbs['m_r'].min():.2f} - {frbs['m_r'].max():.2f}")

frbs.head()

Loading P(DM,z) grid from /home/xavier/Projects/FRB/FRB/frb/data/DM/CHIME_pzdm.npz
Sampling DM values
Sampling redshifts
Sampling host galaxy absolute magnitudes
Using Lz values to sample host galaxy absolute magnitudes
Generated 100 FRBs

FRB properties:
  DM range:  40 - 2090 pc/cm³
  z range:   0.010 - 1.960
  m_r range: 12.84 - 27.61

[4]:

	DMeg	z	M_r	m_r
0	445.0	0.25	-22.608415	17.963133
1	1175.0	1.38	-19.821438	25.200431
2	710.0	0.63	-21.196656	21.734748
3	585.0	0.45	-20.429629	21.624262
4	215.0	0.11	-18.170599	20.437868

Step 2: Load Galaxy Catalog

We need a galaxy catalog to:

Assign FRBs to host galaxies
Provide candidate hosts for PATH analysis

We’ll use the real combined HSC/DECaLs/HECATE catalog if available, otherwise create a mock catalog.

[5]:

def create_mock_galaxy_catalog(n_galaxies=50000, seed=42):
    """
    Create a mock galaxy catalog for demonstration.
    """
    np.random.seed(seed)

    # Create galaxies across the sky
    ra = np.random.uniform(0., 360., n_galaxies)
    dec = np.random.uniform(-30., 60., n_galaxies)

    # Magnitude distribution (peaks around m_r ~ 21-22)
    mag_best = np.random.beta(a=2, b=5, size=n_galaxies) * 10 + 16
    mag_best = np.clip(mag_best, 16., 26.)

    # Half-light radii (log-normal distribution)
    half_light = np.random.lognormal(mean=np.log(0.8), sigma=0.6, size=n_galaxies)
    half_light = np.clip(half_light, 0.1, 5.0)

    df = pd.DataFrame({
        'ra': ra,
        'dec': dec,
        'mag_best': mag_best,
        'half_light': half_light,
        'ID': np.arange(n_galaxies)
    })

    return df


def load_galaxy_catalog(seed=42):
    """
    Load galaxy catalog - tries real catalog first, falls back to mock.
    """
    frb_apath = os.environ.get('FRB_APATH')

    if frb_apath is not None:
        catalog_path = Path(frb_apath) / 'combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet'

        if catalog_path.exists():
            print(f"Loading real galaxy catalog from:")
            print(f"  {catalog_path}")
            galaxies = pd.read_parquet(catalog_path)

            required_cols = ['ra', 'dec', 'mag_best', 'half_light', 'ID']
            if all(col in galaxies.columns for col in required_cols):
                print(f"  Loaded {len(galaxies):,} galaxies")
                return galaxies, True

    print("Creating mock galaxy catalog...")
    print("  (Set $FRB_APATH to use real HSC/DECaLs/HECATE catalog)")
    galaxies = create_mock_galaxy_catalog(n_galaxies=50000, seed=seed)
    print(f"  Created {len(galaxies):,} mock galaxies")
    return galaxies, False


# Load catalog
galaxies, is_real = load_galaxy_catalog(seed=42)

print(f"\nGalaxy catalog:")
print(f"  N_galaxies: {len(galaxies):,}")
print(f"  Mag range:  {galaxies['mag_best'].min():.2f} - {galaxies['mag_best'].max():.2f}")

Loading real galaxy catalog from:
  /home/xavier/Projects/FRB/data/Catalogs/combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet
  Loaded 2,297,005 galaxies

Galaxy catalog:
  N_galaxies: 2,297,005
  Mag range:  9.75 - 28.00

Step 3: Assign FRBs to Host Galaxies

Now we assign FRBs to host galaxies based on their expected host magnitude. The assignment includes:

True FRB position within the galaxy
Observed position with localization error
Localization ellipse parameters (a, b, PA)

[6]:

# Define localization error ellipse
# (semi-major axis, semi-minor axis, position angle) in arcsec, arcsec, degrees
localization = (1.0, 0.5, 45.)  # 1" x 0.5" ellipse at 45 deg

print(f"Localization error ellipse:")
print(f"  Semi-major axis (a): {localization[0]}\"")
print(f"  Semi-minor axis (b): {localization[1]}\"")
print(f"  Position angle:      {localization[2]} deg")
print()

# Assign FRBs to hosts
assignments = assign_frbs_to_hosts(
    frb_df=frbs,
    galaxy_catalog=galaxies,
    localization=localization,
    mag_range=(17., 26.),  # Only assign FRBs with hosts in this mag range
    seed=seed
)

print(f"\nAssigned {len(assignments)} FRBs to hosts")
print(f"(Filtered from {len(frbs)} based on magnitude range)")

assignments.head()

Localization error ellipse:
  Semi-major axis (a): 1.0"
  Semi-minor axis (b): 0.5"
  Position angle:      45.0 deg

Assigning 86 FRBs to hosts (filtered from 100)
Iteration 1: 86 FRBs remaining
  Brightest unassigned FRB: m_r = 17.17 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...

Assigned 86 FRBs to hosts
(Filtered from 100 based on magnitude range)

[6]:

	ra	dec	true_ra	true_dec	gal_ID	gal_off	mag	half_light	loc_off	FRB_ID	a	b	PA
0	199.216883	6.409707	199.216605	6.409882	8796117175370389	0.439120	17.963129	1.871511	1.177618	0	1.0	0.5	45.0
1	33.800890	-4.716802	33.800732	-4.717030	37485405112650917	0.993813	25.200432	0.660314	0.999297	1	1.0	0.5	45.0
2	34.736942	-6.066167	34.736657	-6.066538	8796112420210440	0.485265	21.734750	0.737067	1.680305	2	1.0	0.5	45.0
3	34.353103	-2.951097	34.352760	-2.951153	8796113549727288	1.997182	21.624264	4.375264	1.247809	3	1.0	0.5	45.0
4	36.008173	-5.754997	36.008381	-5.754480	8796112514517822	0.037168	20.437860	0.438232	2.003143	4	1.0	0.5	45.0

Output columns:

ra, dec: Observed FRB coordinates (with localization error)
true_ra, true_dec: True FRB coordinates (within galaxy)
gal_ID: Host galaxy ID
gal_off: FRB offset from galaxy center (arcsec)
mag: Host galaxy magnitude
half_light: Host galaxy half-light radius (arcsec)
loc_off: Localization error magnitude (arcsec)
a, b, PA: Localization ellipse parameters

Step 4: Run PATH Analysis

Now we run PATH on each assigned FRB. For each FRB, we:

Extract nearby galaxies as candidates
Build the input dictionary with localization and priors
Run PATH to compute posterior probabilities
Compare with the true host

[7]:

def extract_candidates(frb_row, galaxies, search_radius_arcmin=1.0):
    """
    Extract candidate host galaxies near an FRB position.

    Args:
        frb_row: Row from assignments DataFrame
        galaxies: Galaxy catalog DataFrame
        search_radius_arcmin: Search radius in arcmin

    Returns:
        astropy.table.Table: Candidate catalog for PATH
    """
    # Get FRB position
    frb_ra = frb_row['ra']
    frb_dec = frb_row['dec']

    # Convert search radius to degrees
    search_radius_deg = search_radius_arcmin / 60.

    # Simple box cut (fast approximation)
    cos_dec = np.cos(np.radians(frb_dec))
    ra_mask = np.abs(galaxies['ra'] - frb_ra) * cos_dec < search_radius_deg
    dec_mask = np.abs(galaxies['dec'] - frb_dec) < search_radius_deg

    nearby = galaxies[ra_mask & dec_mask].copy()

    if len(nearby) == 0:
        return None

    # Convert to astropy Table for PATH
    catalog = Table()
    catalog['ra'] = nearby['ra'].values
    catalog['dec'] = nearby['dec'].values
    catalog['ang_size'] = nearby['half_light'].values
    catalog['mag'] = nearby['mag_best'].values
    catalog['ID'] = nearby['ID'].values

    return catalog

[8]:

def run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=False):
    """
    Run PATH analysis on a single FRB.

    Args:
        frb_row: Row from assignments DataFrame
        galaxies: Galaxy catalog DataFrame
        search_radius_arcmin: Search radius for candidates
        verbose: Print detailed output

    Returns:
        dict: PATH results including posteriors and true host info
    """
    # Extract candidate galaxies
    catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)

    if catalog is None or len(catalog) == 0:
        return {'success': False, 'reason': 'no_candidates'}

    # Build input dictionary
    lparam = {
        'a': frb_row['a'],
        'b': frb_row['b'],
        'theta': frb_row['PA']
    }

    idict = run.build_idict(
        ra=frb_row['ra'],
        dec=frb_row['dec'],
        ltype='eellipse',
        lparam=lparam,
        P_O_method='inverse',
        PU=0.1,  # 10% prior on unseen host
        scale=0.5,
        theta_PDF='exp',
        theta_max=6.0
    )

    # Run PATH
    try:
        result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
            idict,
            verbose=verbose,
            catalog=catalog,
            mag_key='mag'
        )
    except Exception as e:
        return {'success': False, 'reason': str(e)}

    if result_candidates is None or len(result_candidates) == 0:
        return {'success': False, 'reason': 'path_failed'}

    # Find the true host in results
    true_host_id = frb_row['gal_ID']

    # Check if true host is in candidates
    if 'ID' in result_candidates.columns:
        true_host_mask = result_candidates['ID'] == true_host_id
        if true_host_mask.any():
            true_host_P_Ox = result_candidates.loc[true_host_mask, 'P_Ox'].values[0]
            true_host_rank = (result_candidates['P_Ox'] > true_host_P_Ox).sum() + 1
        else:
            true_host_P_Ox = 0.0
            true_host_rank = len(result_candidates) + 1
    else:
        true_host_P_Ox = None
        true_host_rank = None

    return {
        'success': True,
        'n_candidates': len(result_candidates),
        'P_Ux': P_Ux,
        'top_P_Ox': result_candidates['P_Ox'].iloc[0],
        'true_host_P_Ox': true_host_P_Ox,
        'true_host_rank': true_host_rank,
        'true_host_id': true_host_id,
        'candidates': result_candidates
    }

Run PATH on a single FRB (example)

[9]:

# Run PATH on the first FRB
frb_idx = 0
frb_row = assignments.iloc[frb_idx]

print(f"FRB {frb_idx}:")
print(f"  Position: ({frb_row['ra']:.6f}, {frb_row['dec']:.6f})")
print(f"  True host ID: {frb_row['gal_ID']}")
print(f"  True host mag: {frb_row['mag']:.2f}")
print(f"  Localization: {frb_row['a']:.2f}\" x {frb_row['b']:.2f}\"")
print()

result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=True)

if result['success']:
    print(f"\n" + "="*60)
    print(f"PATH Results:")
    print(f"  N candidates: {result['n_candidates']}")
    print(f"  P(U|x):       {result['P_Ux']:.4f}")
    print(f"  Top P(O|x):   {result['top_P_Ox']:.4f}")
    print(f"  True host P(O|x): {result['true_host_P_Ox']:.4f}")
    print(f"  True host rank:   {result['true_host_rank']}")
else:
    print(f"PATH failed: {result['reason']}")

FRB 0:
  Position: (199.216883, 6.409707)
  True host ID: 8796117175370389.0
  True host mag: 17.96
  Localization: 1.00" x 0.50"

                 ID          ra       dec  ang_size        mag  P_O      P_Ox
0  8796117175370389  199.216727  6.409863  1.871511  17.963129  0.9  0.999985
P_Ux = 1.5185724702115839e-05

============================================================
PATH Results:
  N candidates: 1
  P(U|x):       0.0000
  Top P(O|x):   1.0000
  True host P(O|x): 1.0000
  True host rank:   1

Run PATH on all assigned FRBs, in parallel

This took ~3min on my laptop

[12]:

# Prior dict
DECaLS_chime_priors = {}
DECaLS_chime_priors['version'] = '1.0'
DECaLS_chime_priors['PU'] = 0.15
DECaLS_chime_priors['scale'] = 0.5
prior_dict = DECaLS_chime_priors

#
final_sims = run_path.full(assignments, galaxies, prior_dict,
        debug=False, ncpu=4, multi=True)

Slicing...
kk:  0
PATH time

[13]:

final_sims

[13]:

	ra	dec	ang_size	mag	ID	sep	P_O	p_xO	P_Ox	P_Ux	gal_ID	iFRB
0	199.216727	6.409863	1.871511	17.963129	8796117175370389	0.793340	0.420775	0.056456	0.999951	0.000049	1	0
1	199.258934	6.388591	1.296497	18.840237	8796117175369996	168.555642	0.157922	0.000000	0.000000	0.000049	0	0
2	199.233992	6.439563	4.672406	18.350800	8796117175370961	123.687674	0.271302	0.000000	0.000000	0.000049	2	0
3	33.800652	-4.716766	0.660314	25.200432	37485405112650917	0.863657	0.000071	0.106230	0.788365	0.121549	1236	1
4	33.801193	-4.717033	0.397299	25.941730	37485405112650915	1.370733	0.000042	0.019177	0.084025	0.121549	1234	1
...	...	...	...	...	...	...	...	...	...	...	...	...
95287	34.280275	-6.777631	0.480865	21.614834	8796112138537127	179.791867	0.001291	0.000000	0.000000	0.000180	68	84
95288	34.309795	-6.775455	0.598027	21.801176	8796112138537191	176.932092	0.001087	0.000000	0.000000	0.000180	69	84
95289	34.298661	-6.773713	0.976511	21.308273	8796112138537228	179.692401	0.001721	0.000000	0.000000	0.000180	70	84
95290	219.961916	21.458217	1.339557	18.801554	8796122665518662	1.620448	0.239000	0.050786	0.999905	0.000095	0	85
95291	219.927882	21.461977	1.061329	17.962357	8796122665518745	115.985859	0.611000	0.000000	0.000000	0.000095	1	85

95292 rows × 12 columns

Step 5. Digest

[19]:

frbs.keys()

[19]:

Index(['DMeg', 'z', 'M_r', 'm_r'], dtype='object')

[21]:

reload(sim_utils)
digest = sim_utils.build_digest(final_sims, frbs, assignments, galaxies)

Get parameters from the simulation results dataframe
Rename some columns to make concatenation cleaner
Calculate angular separation between true host and best candidate
Add the RA/Dec of the host *galaxy* from the original catalog
Merge dataframes, to create a nice big cross-checked dataframe for plotting purposes

[23]:

digest

[23]:

	ra_loc	dec_loc	true_ra	true_dec	host_ID	gal_off	mag_host	half_light_host	loc_off	FRB_ID	...	ang_size_cand	mag_cand	cand_ID	sep_cand	P_O	p_xO	P_Ox	P_Ux	gal_ind	iFRB
0	199.216883	6.409707	199.216605	6.409882	8796117175370389	0.439120	17.963129	1.871511	1.177618	0	...	1.871511	17.963129	8796117175370389	0.793340	0.420775	0.056456	0.999951	0.000049	1	0
1	33.800890	-4.716802	33.800732	-4.717030	37485405112650917	0.993813	25.200432	0.660314	0.999297	1	...	0.660314	25.200432	37485405112650917	0.863657	0.000071	0.106230	0.788365	0.121549	1236	1
2	34.736942	-6.066167	34.736657	-6.066538	8796112420210440	0.485265	21.734750	0.737067	1.680305	2	...	0.737067	21.734750	8796112420210440	1.275588	0.003214	0.098519	0.996358	0.003642	31	2
3	34.353103	-2.951097	34.352760	-2.951153	8796113549727288	1.997182	21.624264	4.375264	1.247809	3	...	4.375264	21.624264	8796113549727288	2.564606	0.001682	0.009761	0.934149	0.065851	54	3
4	36.008173	-5.754997	36.008381	-5.754480	8796112514517822	0.037168	20.437860	0.438232	2.003143	4	...	0.438232	20.437860	8796112514517822	2.023352	0.009207	0.021587	0.994207	0.005790	25	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
81	37.603128	-5.110529	37.602826	-5.110579	8796112796975557	0.723388	20.342892	1.481896	1.095469	95	...	1.481896	20.342892	8796112796975557	0.979629	0.006700	0.059533	0.997107	0.002893	27	81
82	35.058354	-5.599320	35.058318	-5.599317	37489919123282555	0.123619	22.189007	0.615296	0.126097	96	...	0.615296	22.189007	37489919123282555	0.084937	0.001473	0.207259	0.996222	0.003778	733	82
83	34.589116	-3.983474	34.588902	-3.983679	38549165432636034	0.017656	22.667412	0.537738	1.065733	97	...	0.537738	22.667412	38549165432636034	1.060546	0.000756	0.138690	0.989086	0.010914	591	83
84	34.299906	-6.823612	34.300143	-6.823644	8796112138535839	0.426606	17.306215	2.290571	0.854336	98	...	2.290571	17.306215	8796112138535839	0.492979	0.121529	0.052765	0.999799	0.000180	20	84
85	219.962297	21.458493	219.961921	21.458290	8796122665518662	0.263433	18.801554	1.339557	1.457904	99	...	1.339557	18.801554	8796122665518662	1.620448	0.239000	0.050786	0.999905	0.000095	0	85

86 rows × 34 columns

Step 5: Analyze PATH Performance

Let’s analyze how well PATH identifies the true hosts.

[24]:

matched = digest['cand_ID'] == digest.host_ID
np.sum(matched)

[24]:

np.int64(84)

Step 6: Compare Different Localization Precisions

Let’s see how PATH performance varies with localization precision.

[13]:

# Define different localization scenarios
localization_scenarios = {
    'Excellent (0.1")': (0.1, 0.1, 0.),
    'Good (0.5")': (0.5, 0.3, 45.),
    'Moderate (2")': (2.0, 1.0, 30.),
    'Poor (5")': (5.0, 3.0, 0.)
}

# Run simulations for each scenario
scenario_results = {}

for scenario_name, loc in localization_scenarios.items():
    print(f"\nRunning {scenario_name}...")

    # Assign FRBs with this localization
    assign_scenario = assign_frbs_to_hosts(
        frb_df=frbs,
        galaxy_catalog=galaxies,
        localization=loc,
        mag_range=(17., 26.),
        seed=42
    )

    # Run PATH on each
    scenario_path_results = []
    for idx in range(len(assign_scenario)):
        frb_row = assign_scenario.iloc[idx]
        result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0)
        if result['success']:
            scenario_path_results.append(result)

    scenario_results[scenario_name] = pd.DataFrame(scenario_path_results)

    # Summary
    df = scenario_results[scenario_name]
    rank1_frac = (df['true_host_rank'] == 1).mean() * 100
    print(f"  Assigned: {len(assign_scenario)}, PATH success: {len(df)}")
    print(f"  Correct (rank=1): {rank1_frac:.1f}%")


Running Excellent (0.1")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 98.5%

Running Good (0.5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 95.6%

Running Moderate (2")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 91.2%

Running Poor (5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
  Brightest unassigned FRB: m_r = 17.22 deg
  Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
  Assigned: 68, PATH success: 68
  Correct (rank=1): 79.4%

[14]:

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

# 1. Rank 1 success rate comparison
ax = axes[0]
scenarios = list(scenario_results.keys())
rank1_rates = [(scenario_results[s]['true_host_rank'] == 1).mean() * 100 for s in scenarios]

bars = ax.bar(range(len(scenarios)), rank1_rates, color=colors, alpha=0.7, edgecolor='black')
ax.set_xticks(range(len(scenarios)))
ax.set_xticklabels(scenarios, rotation=15, ha='right')
ax.set_ylabel('Correct Identification Rate (%)')
ax.set_title('PATH Success Rate by Localization Precision')
ax.set_ylim(0, 100)

# Add value labels
for bar, rate in zip(bars, rank1_rates):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            f'{rate:.1f}%', ha='center', fontsize=11)

# 2. True host P(O|x) distributions
ax = axes[1]
for scenario, color in zip(scenarios, colors):
    df = scenario_results[scenario]
    ax.hist(df['true_host_P_Ox'], bins=20, alpha=0.4, color=color,
            label=scenario, density=True)

ax.set_xlabel('True Host P(O|x)')
ax.set_ylabel('Density')
ax.set_title('True Host Posterior Distribution by Localization')
ax.legend()

plt.tight_layout()
plt.show()

../_images/nb_Simulate_Run_PATH_26_0.png

Step 7: Effect of Unseen Host Prior (P_U)

The PU parameter controls the prior probability that the true host is not in the catalog. Let’s see how this affects PATH performance.

[15]:

def run_path_with_PU(frb_row, galaxies, PU, search_radius_arcmin=1.0):
    """Run PATH with a specific P_U value."""
    catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)

    if catalog is None or len(catalog) == 0:
        return None

    lparam = {
        'a': frb_row['a'],
        'b': frb_row['b'],
        'theta': frb_row['PA']
    }

    idict = run.build_idict(
        ra=frb_row['ra'],
        dec=frb_row['dec'],
        ltype='eellipse',
        lparam=lparam,
        P_O_method='inverse',
        PU=PU,
        scale=0.5,
        theta_PDF='exp',
        theta_max=6.0
    )

    try:
        result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
            idict, catalog=catalog, mag_key='mag'
        )
        return {
            'P_Ux': P_Ux,
            'candidates': result_candidates
        }
    except:
        return None

# Test different P_U values on a subset
PU_values = [0.0, 0.05, 0.1, 0.2, 0.3, 0.5]
n_test = min(30, len(assignments))

PU_results = {pu: [] for pu in PU_values}

print(f"Testing P_U effect on {n_test} FRBs...")
for idx in range(n_test):
    frb_row = assignments.iloc[idx]
    true_host_id = frb_row['gal_ID']

    for pu in PU_values:
        result = run_path_with_PU(frb_row, galaxies, pu)
        if result is not None:
            cands = result['candidates']
            if 'ID' in cands.columns:
                mask = cands['ID'] == true_host_id
                if mask.any():
                    true_P_Ox = cands.loc[mask, 'P_Ox'].values[0]
                else:
                    true_P_Ox = 0.0
            else:
                true_P_Ox = None

            PU_results[pu].append({
                'P_Ux': result['P_Ux'],
                'true_host_P_Ox': true_P_Ox
            })

print("Done!")

Testing P_U effect on 30 FRBs...
Done!

[16]:

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 1. Mean P(U|x) vs prior P_U
ax = axes[0]
mean_PUx = [np.mean([r['P_Ux'] for r in PU_results[pu]]) for pu in PU_values]
ax.plot(PU_values, mean_PUx, 'bo-', markersize=10, linewidth=2)
ax.plot([0, 0.5], [0, 0.5], 'k--', alpha=0.5, label='1:1 line')
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean Posterior P(U|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on Posterior')
ax.legend()
ax.grid(alpha=0.3)

# 2. Mean true host P(O|x) vs P_U
ax = axes[1]
mean_true_POx = [np.mean([r['true_host_P_Ox'] for r in PU_results[pu] if r['true_host_P_Ox'] is not None])
                 for pu in PU_values]
ax.plot(PU_values, mean_true_POx, 'go-', markersize=10, linewidth=2)
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean True Host P(O|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on True Host Posterior')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("P(U) effect summary:")
print(f"{'P_U':<8} {'Mean P(U|x)':<15} {'Mean True Host P(O|x)':<20}")
print("-" * 45)
for pu, pux, pox in zip(PU_values, mean_PUx, mean_true_POx):
    print(f"{pu:<8.2f} {pux:<15.4f} {pox:<20.4f}")

../_images/nb_Simulate_Run_PATH_29_0.png

P(U) effect summary:
P_U      Mean P(U|x)     Mean True Host P(O|x)
---------------------------------------------
0.00     0.0000          0.9229
0.05     0.0008          0.9223
0.10     0.0016          0.9216
0.20     0.0036          0.9200
0.30     0.0061          0.9179
0.50     0.0136          0.9116

Summary

This notebook demonstrated the complete workflow for running PATH on simulated FRBs:

Generate FRBs: Used generate_frbs() to create realistic FRB populations
Load Catalog: Used real or mock galaxy catalog
Assign Hosts: Used assign_frbs_to_hosts() to place FRBs with localization errors
Run PATH: Used run.run_on_dict() to compute posteriors
Analyze Performance: Evaluated correct identification rates

Key Findings

Localization precision matters: Better localization leads to higher correct identification rates
P_U trade-off: Higher unseen prior reduces true host posterior but may be more realistic
PATH is effective: Even with moderate localization, PATH correctly identifies most hosts

Using with Real Data

To use this workflow with real FRB observations:

Replace simulated FRBs with real observations
Use actual localization parameters from your survey
Query appropriate survey catalogs (Pan-STARRS, DECaLS, etc.)
Adjust priors based on your expectations

[ ]: