Running PATH on Simulated FRBs
This notebook demonstrates how to run PATH (Probabilistic Association of Transients to Hosts) on simulated FRB populations.
The workflow consists of three main steps:
Generate FRBs: Create synthetic FRB populations with realistic properties
Assign to Hosts: Place FRBs in host galaxies with localization errors
Run PATH: Calculate posterior probabilities for host candidates
This builds on the previous notebooks:
Simulate_Generate_FRBs.ipynb: Generating FRB populationsSimulate_Assign_FRBs.ipynb: Assigning FRBs to host galaxies
[11]:
import os
import numpy as np
import matplotlib.pyplot as plt
from importlib import reload
import pandas as pd
from pathlib import Path
from astropy.table import Table
from astropy.coordinates import SkyCoord
from astropy import units
# Set up plotting style
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
[16]:
# Import astropath modules
from astropath.simulations import generate_frbs, assign_frbs_to_hosts, run_path
from astropath.simulations import utils as sim_utils
from astropath import run
Step 1: Generate FRB Population
First, we generate a population of FRBs using generate_frbs(). This gives us FRBs with realistic DM, redshift, and host galaxy magnitude distributions for different surveys.
[4]:
# Generate 100 CHIME FRBs for demonstration
n_frbs = 100
seed = 42
frbs = generate_frbs(n_frbs, 'CHIME', seed=seed)
print(f"Generated {len(frbs)} FRBs")
print(f"\nFRB properties:")
print(f" DM range: {frbs['DMeg'].min():.0f} - {frbs['DMeg'].max():.0f} pc/cm³")
print(f" z range: {frbs['z'].min():.3f} - {frbs['z'].max():.3f}")
print(f" m_r range: {frbs['m_r'].min():.2f} - {frbs['m_r'].max():.2f}")
frbs.head()
Loading P(DM,z) grid from /home/xavier/Projects/FRB/FRB/frb/data/DM/CHIME_pzdm.npz
Sampling DM values
Sampling redshifts
Sampling host galaxy absolute magnitudes
Using Lz values to sample host galaxy absolute magnitudes
Generated 100 FRBs
FRB properties:
DM range: 40 - 2090 pc/cm³
z range: 0.010 - 1.960
m_r range: 12.84 - 27.61
[4]:
| DMeg | z | M_r | m_r | |
|---|---|---|---|---|
| 0 | 445.0 | 0.25 | -22.608415 | 17.963133 |
| 1 | 1175.0 | 1.38 | -19.821438 | 25.200431 |
| 2 | 710.0 | 0.63 | -21.196656 | 21.734748 |
| 3 | 585.0 | 0.45 | -20.429629 | 21.624262 |
| 4 | 215.0 | 0.11 | -18.170599 | 20.437868 |
Step 2: Load Galaxy Catalog
We need a galaxy catalog to:
Assign FRBs to host galaxies
Provide candidate hosts for PATH analysis
We’ll use the real combined HSC/DECaLs/HECATE catalog if available, otherwise create a mock catalog.
[5]:
def create_mock_galaxy_catalog(n_galaxies=50000, seed=42):
"""
Create a mock galaxy catalog for demonstration.
"""
np.random.seed(seed)
# Create galaxies across the sky
ra = np.random.uniform(0., 360., n_galaxies)
dec = np.random.uniform(-30., 60., n_galaxies)
# Magnitude distribution (peaks around m_r ~ 21-22)
mag_best = np.random.beta(a=2, b=5, size=n_galaxies) * 10 + 16
mag_best = np.clip(mag_best, 16., 26.)
# Half-light radii (log-normal distribution)
half_light = np.random.lognormal(mean=np.log(0.8), sigma=0.6, size=n_galaxies)
half_light = np.clip(half_light, 0.1, 5.0)
df = pd.DataFrame({
'ra': ra,
'dec': dec,
'mag_best': mag_best,
'half_light': half_light,
'ID': np.arange(n_galaxies)
})
return df
def load_galaxy_catalog(seed=42):
"""
Load galaxy catalog - tries real catalog first, falls back to mock.
"""
frb_apath = os.environ.get('FRB_APATH')
if frb_apath is not None:
catalog_path = Path(frb_apath) / 'combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet'
if catalog_path.exists():
print(f"Loading real galaxy catalog from:")
print(f" {catalog_path}")
galaxies = pd.read_parquet(catalog_path)
required_cols = ['ra', 'dec', 'mag_best', 'half_light', 'ID']
if all(col in galaxies.columns for col in required_cols):
print(f" Loaded {len(galaxies):,} galaxies")
return galaxies, True
print("Creating mock galaxy catalog...")
print(" (Set $FRB_APATH to use real HSC/DECaLs/HECATE catalog)")
galaxies = create_mock_galaxy_catalog(n_galaxies=50000, seed=seed)
print(f" Created {len(galaxies):,} mock galaxies")
return galaxies, False
# Load catalog
galaxies, is_real = load_galaxy_catalog(seed=42)
print(f"\nGalaxy catalog:")
print(f" N_galaxies: {len(galaxies):,}")
print(f" Mag range: {galaxies['mag_best'].min():.2f} - {galaxies['mag_best'].max():.2f}")
Loading real galaxy catalog from:
/home/xavier/Projects/FRB/data/Catalogs/combined_HSC_DECaLs_HECATE_galaxies_hecatecut.parquet
Loaded 2,297,005 galaxies
Galaxy catalog:
N_galaxies: 2,297,005
Mag range: 9.75 - 28.00
Step 3: Assign FRBs to Host Galaxies
Now we assign FRBs to host galaxies based on their expected host magnitude. The assignment includes:
True FRB position within the galaxy
Observed position with localization error
Localization ellipse parameters (a, b, PA)
[6]:
# Define localization error ellipse
# (semi-major axis, semi-minor axis, position angle) in arcsec, arcsec, degrees
localization = (1.0, 0.5, 45.) # 1" x 0.5" ellipse at 45 deg
print(f"Localization error ellipse:")
print(f" Semi-major axis (a): {localization[0]}\"")
print(f" Semi-minor axis (b): {localization[1]}\"")
print(f" Position angle: {localization[2]} deg")
print()
# Assign FRBs to hosts
assignments = assign_frbs_to_hosts(
frb_df=frbs,
galaxy_catalog=galaxies,
localization=localization,
mag_range=(17., 26.), # Only assign FRBs with hosts in this mag range
seed=seed
)
print(f"\nAssigned {len(assignments)} FRBs to hosts")
print(f"(Filtered from {len(frbs)} based on magnitude range)")
assignments.head()
Localization error ellipse:
Semi-major axis (a): 1.0"
Semi-minor axis (b): 0.5"
Position angle: 45.0 deg
Assigning 86 FRBs to hosts (filtered from 100)
Iteration 1: 86 FRBs remaining
Brightest unassigned FRB: m_r = 17.17 deg
Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
Assigned 86 FRBs to hosts
(Filtered from 100 based on magnitude range)
[6]:
| ra | dec | true_ra | true_dec | gal_ID | gal_off | mag | half_light | loc_off | FRB_ID | a | b | PA | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 199.216883 | 6.409707 | 199.216605 | 6.409882 | 8796117175370389 | 0.439120 | 17.963129 | 1.871511 | 1.177618 | 0 | 1.0 | 0.5 | 45.0 |
| 1 | 33.800890 | -4.716802 | 33.800732 | -4.717030 | 37485405112650917 | 0.993813 | 25.200432 | 0.660314 | 0.999297 | 1 | 1.0 | 0.5 | 45.0 |
| 2 | 34.736942 | -6.066167 | 34.736657 | -6.066538 | 8796112420210440 | 0.485265 | 21.734750 | 0.737067 | 1.680305 | 2 | 1.0 | 0.5 | 45.0 |
| 3 | 34.353103 | -2.951097 | 34.352760 | -2.951153 | 8796113549727288 | 1.997182 | 21.624264 | 4.375264 | 1.247809 | 3 | 1.0 | 0.5 | 45.0 |
| 4 | 36.008173 | -5.754997 | 36.008381 | -5.754480 | 8796112514517822 | 0.037168 | 20.437860 | 0.438232 | 2.003143 | 4 | 1.0 | 0.5 | 45.0 |
Output columns:
ra,dec: Observed FRB coordinates (with localization error)true_ra,true_dec: True FRB coordinates (within galaxy)gal_ID: Host galaxy IDgal_off: FRB offset from galaxy center (arcsec)mag: Host galaxy magnitudehalf_light: Host galaxy half-light radius (arcsec)loc_off: Localization error magnitude (arcsec)a,b,PA: Localization ellipse parameters
Step 4: Run PATH Analysis
Now we run PATH on each assigned FRB. For each FRB, we:
Extract nearby galaxies as candidates
Build the input dictionary with localization and priors
Run PATH to compute posterior probabilities
Compare with the true host
[7]:
def extract_candidates(frb_row, galaxies, search_radius_arcmin=1.0):
"""
Extract candidate host galaxies near an FRB position.
Args:
frb_row: Row from assignments DataFrame
galaxies: Galaxy catalog DataFrame
search_radius_arcmin: Search radius in arcmin
Returns:
astropy.table.Table: Candidate catalog for PATH
"""
# Get FRB position
frb_ra = frb_row['ra']
frb_dec = frb_row['dec']
# Convert search radius to degrees
search_radius_deg = search_radius_arcmin / 60.
# Simple box cut (fast approximation)
cos_dec = np.cos(np.radians(frb_dec))
ra_mask = np.abs(galaxies['ra'] - frb_ra) * cos_dec < search_radius_deg
dec_mask = np.abs(galaxies['dec'] - frb_dec) < search_radius_deg
nearby = galaxies[ra_mask & dec_mask].copy()
if len(nearby) == 0:
return None
# Convert to astropy Table for PATH
catalog = Table()
catalog['ra'] = nearby['ra'].values
catalog['dec'] = nearby['dec'].values
catalog['ang_size'] = nearby['half_light'].values
catalog['mag'] = nearby['mag_best'].values
catalog['ID'] = nearby['ID'].values
return catalog
[8]:
def run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=False):
"""
Run PATH analysis on a single FRB.
Args:
frb_row: Row from assignments DataFrame
galaxies: Galaxy catalog DataFrame
search_radius_arcmin: Search radius for candidates
verbose: Print detailed output
Returns:
dict: PATH results including posteriors and true host info
"""
# Extract candidate galaxies
catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)
if catalog is None or len(catalog) == 0:
return {'success': False, 'reason': 'no_candidates'}
# Build input dictionary
lparam = {
'a': frb_row['a'],
'b': frb_row['b'],
'theta': frb_row['PA']
}
idict = run.build_idict(
ra=frb_row['ra'],
dec=frb_row['dec'],
ltype='eellipse',
lparam=lparam,
P_O_method='inverse',
PU=0.1, # 10% prior on unseen host
scale=0.5,
theta_PDF='exp',
theta_max=6.0
)
# Run PATH
try:
result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
idict,
verbose=verbose,
catalog=catalog,
mag_key='mag'
)
except Exception as e:
return {'success': False, 'reason': str(e)}
if result_candidates is None or len(result_candidates) == 0:
return {'success': False, 'reason': 'path_failed'}
# Find the true host in results
true_host_id = frb_row['gal_ID']
# Check if true host is in candidates
if 'ID' in result_candidates.columns:
true_host_mask = result_candidates['ID'] == true_host_id
if true_host_mask.any():
true_host_P_Ox = result_candidates.loc[true_host_mask, 'P_Ox'].values[0]
true_host_rank = (result_candidates['P_Ox'] > true_host_P_Ox).sum() + 1
else:
true_host_P_Ox = 0.0
true_host_rank = len(result_candidates) + 1
else:
true_host_P_Ox = None
true_host_rank = None
return {
'success': True,
'n_candidates': len(result_candidates),
'P_Ux': P_Ux,
'top_P_Ox': result_candidates['P_Ox'].iloc[0],
'true_host_P_Ox': true_host_P_Ox,
'true_host_rank': true_host_rank,
'true_host_id': true_host_id,
'candidates': result_candidates
}
Run PATH on a single FRB (example)
[9]:
# Run PATH on the first FRB
frb_idx = 0
frb_row = assignments.iloc[frb_idx]
print(f"FRB {frb_idx}:")
print(f" Position: ({frb_row['ra']:.6f}, {frb_row['dec']:.6f})")
print(f" True host ID: {frb_row['gal_ID']}")
print(f" True host mag: {frb_row['mag']:.2f}")
print(f" Localization: {frb_row['a']:.2f}\" x {frb_row['b']:.2f}\"")
print()
result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0, verbose=True)
if result['success']:
print(f"\n" + "="*60)
print(f"PATH Results:")
print(f" N candidates: {result['n_candidates']}")
print(f" P(U|x): {result['P_Ux']:.4f}")
print(f" Top P(O|x): {result['top_P_Ox']:.4f}")
print(f" True host P(O|x): {result['true_host_P_Ox']:.4f}")
print(f" True host rank: {result['true_host_rank']}")
else:
print(f"PATH failed: {result['reason']}")
FRB 0:
Position: (199.216883, 6.409707)
True host ID: 8796117175370389.0
True host mag: 17.96
Localization: 1.00" x 0.50"
ID ra dec ang_size mag P_O P_Ox
0 8796117175370389 199.216727 6.409863 1.871511 17.963129 0.9 0.999985
P_Ux = 1.5185724702115839e-05
============================================================
PATH Results:
N candidates: 1
P(U|x): 0.0000
Top P(O|x): 1.0000
True host P(O|x): 1.0000
True host rank: 1
Run PATH on all assigned FRBs, in parallel
This took ~3min on my laptop
[12]:
# Prior dict
DECaLS_chime_priors = {}
DECaLS_chime_priors['version'] = '1.0'
DECaLS_chime_priors['PU'] = 0.15
DECaLS_chime_priors['scale'] = 0.5
prior_dict = DECaLS_chime_priors
#
final_sims = run_path.full(assignments, galaxies, prior_dict,
debug=False, ncpu=4, multi=True)
Slicing...
kk: 0
PATH time
[13]:
final_sims
[13]:
| ra | dec | ang_size | mag | ID | sep | P_O | p_xO | P_Ox | P_Ux | gal_ID | iFRB | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 199.216727 | 6.409863 | 1.871511 | 17.963129 | 8796117175370389 | 0.793340 | 0.420775 | 0.056456 | 0.999951 | 0.000049 | 1 | 0 |
| 1 | 199.258934 | 6.388591 | 1.296497 | 18.840237 | 8796117175369996 | 168.555642 | 0.157922 | 0.000000 | 0.000000 | 0.000049 | 0 | 0 |
| 2 | 199.233992 | 6.439563 | 4.672406 | 18.350800 | 8796117175370961 | 123.687674 | 0.271302 | 0.000000 | 0.000000 | 0.000049 | 2 | 0 |
| 3 | 33.800652 | -4.716766 | 0.660314 | 25.200432 | 37485405112650917 | 0.863657 | 0.000071 | 0.106230 | 0.788365 | 0.121549 | 1236 | 1 |
| 4 | 33.801193 | -4.717033 | 0.397299 | 25.941730 | 37485405112650915 | 1.370733 | 0.000042 | 0.019177 | 0.084025 | 0.121549 | 1234 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95287 | 34.280275 | -6.777631 | 0.480865 | 21.614834 | 8796112138537127 | 179.791867 | 0.001291 | 0.000000 | 0.000000 | 0.000180 | 68 | 84 |
| 95288 | 34.309795 | -6.775455 | 0.598027 | 21.801176 | 8796112138537191 | 176.932092 | 0.001087 | 0.000000 | 0.000000 | 0.000180 | 69 | 84 |
| 95289 | 34.298661 | -6.773713 | 0.976511 | 21.308273 | 8796112138537228 | 179.692401 | 0.001721 | 0.000000 | 0.000000 | 0.000180 | 70 | 84 |
| 95290 | 219.961916 | 21.458217 | 1.339557 | 18.801554 | 8796122665518662 | 1.620448 | 0.239000 | 0.050786 | 0.999905 | 0.000095 | 0 | 85 |
| 95291 | 219.927882 | 21.461977 | 1.061329 | 17.962357 | 8796122665518745 | 115.985859 | 0.611000 | 0.000000 | 0.000000 | 0.000095 | 1 | 85 |
95292 rows × 12 columns
Step 5. Digest
[19]:
frbs.keys()
[19]:
Index(['DMeg', 'z', 'M_r', 'm_r'], dtype='object')
[21]:
reload(sim_utils)
digest = sim_utils.build_digest(final_sims, frbs, assignments, galaxies)
Get parameters from the simulation results dataframe
Rename some columns to make concatenation cleaner
Calculate angular separation between true host and best candidate
Add the RA/Dec of the host *galaxy* from the original catalog
Merge dataframes, to create a nice big cross-checked dataframe for plotting purposes
[23]:
digest
[23]:
| ra_loc | dec_loc | true_ra | true_dec | host_ID | gal_off | mag_host | half_light_host | loc_off | FRB_ID | ... | ang_size_cand | mag_cand | cand_ID | sep_cand | P_O | p_xO | P_Ox | P_Ux | gal_ind | iFRB | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 199.216883 | 6.409707 | 199.216605 | 6.409882 | 8796117175370389 | 0.439120 | 17.963129 | 1.871511 | 1.177618 | 0 | ... | 1.871511 | 17.963129 | 8796117175370389 | 0.793340 | 0.420775 | 0.056456 | 0.999951 | 0.000049 | 1 | 0 |
| 1 | 33.800890 | -4.716802 | 33.800732 | -4.717030 | 37485405112650917 | 0.993813 | 25.200432 | 0.660314 | 0.999297 | 1 | ... | 0.660314 | 25.200432 | 37485405112650917 | 0.863657 | 0.000071 | 0.106230 | 0.788365 | 0.121549 | 1236 | 1 |
| 2 | 34.736942 | -6.066167 | 34.736657 | -6.066538 | 8796112420210440 | 0.485265 | 21.734750 | 0.737067 | 1.680305 | 2 | ... | 0.737067 | 21.734750 | 8796112420210440 | 1.275588 | 0.003214 | 0.098519 | 0.996358 | 0.003642 | 31 | 2 |
| 3 | 34.353103 | -2.951097 | 34.352760 | -2.951153 | 8796113549727288 | 1.997182 | 21.624264 | 4.375264 | 1.247809 | 3 | ... | 4.375264 | 21.624264 | 8796113549727288 | 2.564606 | 0.001682 | 0.009761 | 0.934149 | 0.065851 | 54 | 3 |
| 4 | 36.008173 | -5.754997 | 36.008381 | -5.754480 | 8796112514517822 | 0.037168 | 20.437860 | 0.438232 | 2.003143 | 4 | ... | 0.438232 | 20.437860 | 8796112514517822 | 2.023352 | 0.009207 | 0.021587 | 0.994207 | 0.005790 | 25 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | 37.603128 | -5.110529 | 37.602826 | -5.110579 | 8796112796975557 | 0.723388 | 20.342892 | 1.481896 | 1.095469 | 95 | ... | 1.481896 | 20.342892 | 8796112796975557 | 0.979629 | 0.006700 | 0.059533 | 0.997107 | 0.002893 | 27 | 81 |
| 82 | 35.058354 | -5.599320 | 35.058318 | -5.599317 | 37489919123282555 | 0.123619 | 22.189007 | 0.615296 | 0.126097 | 96 | ... | 0.615296 | 22.189007 | 37489919123282555 | 0.084937 | 0.001473 | 0.207259 | 0.996222 | 0.003778 | 733 | 82 |
| 83 | 34.589116 | -3.983474 | 34.588902 | -3.983679 | 38549165432636034 | 0.017656 | 22.667412 | 0.537738 | 1.065733 | 97 | ... | 0.537738 | 22.667412 | 38549165432636034 | 1.060546 | 0.000756 | 0.138690 | 0.989086 | 0.010914 | 591 | 83 |
| 84 | 34.299906 | -6.823612 | 34.300143 | -6.823644 | 8796112138535839 | 0.426606 | 17.306215 | 2.290571 | 0.854336 | 98 | ... | 2.290571 | 17.306215 | 8796112138535839 | 0.492979 | 0.121529 | 0.052765 | 0.999799 | 0.000180 | 20 | 84 |
| 85 | 219.962297 | 21.458493 | 219.961921 | 21.458290 | 8796122665518662 | 0.263433 | 18.801554 | 1.339557 | 1.457904 | 99 | ... | 1.339557 | 18.801554 | 8796122665518662 | 1.620448 | 0.239000 | 0.050786 | 0.999905 | 0.000095 | 0 | 85 |
86 rows × 34 columns
Step 5: Analyze PATH Performance
Let’s analyze how well PATH identifies the true hosts.
[24]:
matched = digest['cand_ID'] == digest.host_ID
np.sum(matched)
[24]:
np.int64(84)
Step 6: Compare Different Localization Precisions
Let’s see how PATH performance varies with localization precision.
[13]:
# Define different localization scenarios
localization_scenarios = {
'Excellent (0.1")': (0.1, 0.1, 0.),
'Good (0.5")': (0.5, 0.3, 45.),
'Moderate (2")': (2.0, 1.0, 30.),
'Poor (5")': (5.0, 3.0, 0.)
}
# Run simulations for each scenario
scenario_results = {}
for scenario_name, loc in localization_scenarios.items():
print(f"\nRunning {scenario_name}...")
# Assign FRBs with this localization
assign_scenario = assign_frbs_to_hosts(
frb_df=frbs,
galaxy_catalog=galaxies,
localization=loc,
mag_range=(17., 26.),
seed=42
)
# Run PATH on each
scenario_path_results = []
for idx in range(len(assign_scenario)):
frb_row = assign_scenario.iloc[idx]
result = run_path_on_frb(frb_row, galaxies, search_radius_arcmin=1.0)
if result['success']:
scenario_path_results.append(result)
scenario_results[scenario_name] = pd.DataFrame(scenario_path_results)
# Summary
df = scenario_results[scenario_name]
rank1_frac = (df['true_host_rank'] == 1).mean() * 100
print(f" Assigned: {len(assign_scenario)}, PATH success: {len(df)}")
print(f" Correct (rank=1): {rank1_frac:.1f}%")
Running Excellent (0.1")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
Brightest unassigned FRB: m_r = 17.22 deg
Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
Assigned: 68, PATH success: 68
Correct (rank=1): 98.5%
Running Good (0.5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
Brightest unassigned FRB: m_r = 17.22 deg
Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
Assigned: 68, PATH success: 68
Correct (rank=1): 95.6%
Running Moderate (2")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
Brightest unassigned FRB: m_r = 17.22 deg
Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
Assigned: 68, PATH success: 68
Correct (rank=1): 91.2%
Running Poor (5")...
Assigning 68 FRBs to hosts (filtered from 100)
Iteration 1: 68 FRBs remaining
Brightest unassigned FRB: m_r = 17.22 deg
Max magnitude separation: 0.0001 deg deg
Assignment complete after 1 iterations
Generating FRB positions within galaxies...
Applying localization error...
Assigned: 68, PATH success: 68
Correct (rank=1): 79.4%
[14]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
# 1. Rank 1 success rate comparison
ax = axes[0]
scenarios = list(scenario_results.keys())
rank1_rates = [(scenario_results[s]['true_host_rank'] == 1).mean() * 100 for s in scenarios]
bars = ax.bar(range(len(scenarios)), rank1_rates, color=colors, alpha=0.7, edgecolor='black')
ax.set_xticks(range(len(scenarios)))
ax.set_xticklabels(scenarios, rotation=15, ha='right')
ax.set_ylabel('Correct Identification Rate (%)')
ax.set_title('PATH Success Rate by Localization Precision')
ax.set_ylim(0, 100)
# Add value labels
for bar, rate in zip(bars, rank1_rates):
ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
f'{rate:.1f}%', ha='center', fontsize=11)
# 2. True host P(O|x) distributions
ax = axes[1]
for scenario, color in zip(scenarios, colors):
df = scenario_results[scenario]
ax.hist(df['true_host_P_Ox'], bins=20, alpha=0.4, color=color,
label=scenario, density=True)
ax.set_xlabel('True Host P(O|x)')
ax.set_ylabel('Density')
ax.set_title('True Host Posterior Distribution by Localization')
ax.legend()
plt.tight_layout()
plt.show()
Step 7: Effect of Unseen Host Prior (P_U)
The PU parameter controls the prior probability that the true host is not in the catalog. Let’s see how this affects PATH performance.
[15]:
def run_path_with_PU(frb_row, galaxies, PU, search_radius_arcmin=1.0):
"""Run PATH with a specific P_U value."""
catalog = extract_candidates(frb_row, galaxies, search_radius_arcmin)
if catalog is None or len(catalog) == 0:
return None
lparam = {
'a': frb_row['a'],
'b': frb_row['b'],
'theta': frb_row['PA']
}
idict = run.build_idict(
ra=frb_row['ra'],
dec=frb_row['dec'],
ltype='eellipse',
lparam=lparam,
P_O_method='inverse',
PU=PU,
scale=0.5,
theta_PDF='exp',
theta_max=6.0
)
try:
result_candidates, P_Ux, Path, mag_key, cut_catalog, _ = run.run_on_dict(
idict, catalog=catalog, mag_key='mag'
)
return {
'P_Ux': P_Ux,
'candidates': result_candidates
}
except:
return None
# Test different P_U values on a subset
PU_values = [0.0, 0.05, 0.1, 0.2, 0.3, 0.5]
n_test = min(30, len(assignments))
PU_results = {pu: [] for pu in PU_values}
print(f"Testing P_U effect on {n_test} FRBs...")
for idx in range(n_test):
frb_row = assignments.iloc[idx]
true_host_id = frb_row['gal_ID']
for pu in PU_values:
result = run_path_with_PU(frb_row, galaxies, pu)
if result is not None:
cands = result['candidates']
if 'ID' in cands.columns:
mask = cands['ID'] == true_host_id
if mask.any():
true_P_Ox = cands.loc[mask, 'P_Ox'].values[0]
else:
true_P_Ox = 0.0
else:
true_P_Ox = None
PU_results[pu].append({
'P_Ux': result['P_Ux'],
'true_host_P_Ox': true_P_Ox
})
print("Done!")
Testing P_U effect on 30 FRBs...
Done!
[16]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# 1. Mean P(U|x) vs prior P_U
ax = axes[0]
mean_PUx = [np.mean([r['P_Ux'] for r in PU_results[pu]]) for pu in PU_values]
ax.plot(PU_values, mean_PUx, 'bo-', markersize=10, linewidth=2)
ax.plot([0, 0.5], [0, 0.5], 'k--', alpha=0.5, label='1:1 line')
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean Posterior P(U|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on Posterior')
ax.legend()
ax.grid(alpha=0.3)
# 2. Mean true host P(O|x) vs P_U
ax = axes[1]
mean_true_POx = [np.mean([r['true_host_P_Ox'] for r in PU_results[pu] if r['true_host_P_Ox'] is not None])
for pu in PU_values]
ax.plot(PU_values, mean_true_POx, 'go-', markersize=10, linewidth=2)
ax.set_xlabel('Prior P(U)', fontsize=12)
ax.set_ylabel('Mean True Host P(O|x)', fontsize=12)
ax.set_title('Effect of Unseen Prior on True Host Posterior')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
print("P(U) effect summary:")
print(f"{'P_U':<8} {'Mean P(U|x)':<15} {'Mean True Host P(O|x)':<20}")
print("-" * 45)
for pu, pux, pox in zip(PU_values, mean_PUx, mean_true_POx):
print(f"{pu:<8.2f} {pux:<15.4f} {pox:<20.4f}")
P(U) effect summary:
P_U Mean P(U|x) Mean True Host P(O|x)
---------------------------------------------
0.00 0.0000 0.9229
0.05 0.0008 0.9223
0.10 0.0016 0.9216
0.20 0.0036 0.9200
0.30 0.0061 0.9179
0.50 0.0136 0.9116
Summary
This notebook demonstrated the complete workflow for running PATH on simulated FRBs:
Generate FRBs: Used
generate_frbs()to create realistic FRB populationsLoad Catalog: Used real or mock galaxy catalog
Assign Hosts: Used
assign_frbs_to_hosts()to place FRBs with localization errorsRun PATH: Used
run.run_on_dict()to compute posteriorsAnalyze Performance: Evaluated correct identification rates
Key Findings
Localization precision matters: Better localization leads to higher correct identification rates
P_U trade-off: Higher unseen prior reduces true host posterior but may be more realistic
PATH is effective: Even with moderate localization, PATH correctly identifies most hosts
Using with Real Data
To use this workflow with real FRB observations:
Replace simulated FRBs with real observations
Use actual localization parameters from your survey
Query appropriate survey catalogs (Pan-STARRS, DECaLS, etc.)
Adjust priors based on your expectations
[ ]: