Generate Traces For Melissa P2P
All the other tools (scalasca, gprof, google-pprof, perf) seem to have trouble with at least one of the following:
- we kill runners instead of waiting until output is written
- we use MPI
- we actually want to instrument a library
I therefore propose reusing melissa-da's timing library.
- Think of a strategy: what do we want to measure, what are good metrics, and which bottlenecks might we want to identify?
- Add the necessary timing events (see `common/TimingEvent.h`, `TimingEventType`)
- Add calls to `trigger(...)` in the p2p API
- Be sure to write out at least the raw data using `Timing::print_events()`
- It would probably save postprocessing time if every rank also wrote out its own `trace....csv`, as done by `Timing::write_region_csv()` (you need to call it with the correct parameters specifying which event pairs define the beginning and end of a region)
- Add a time parameter (an offset from `MELISSA_TIMING_NULL`) at which every core saves its timing data to disk (runners generally don't quit gracefully, so there is no other simple way to ensure the traces are written out)
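
The steps above could look roughly like the following. This is only a sketch under assumptions: `TraceSketch`, `P2PEvent`, `Region`, `flush_due` and `region_csv` are hypothetical names made up for illustration and are not the melissa-da timing API. It records events, pairs start/stop events into regions, and reports when the configured write-out time (relative to the shared null time) has been reached.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event types; the real ones would be added to TimingEventType.
enum class P2PEvent { StartStateRequest, StopStateRequest, StartPeerTransfer, StopPeerTransfer };

struct Event {
    P2PEvent type;
    double time_ms;  // milliseconds since the shared null time
    int parameter;   // e.g. a state id, used to match start and stop events
};

// A region is defined by the pair of events that open and close it.
struct Region {
    P2PEvent start;
    P2PEvent stop;
    std::string name;
};

class TraceSketch {
public:
    // flush_after_ms: offset from the null time at which traces should be saved.
    explicit TraceSketch(double flush_after_ms) : flush_after_ms_(flush_after_ms) {}

    // Record one event; the caller supplies the timestamp.
    void trigger(P2PEvent type, double time_ms, int parameter = 0) {
        events_.push_back(Event{type, time_ms, parameter});
    }

    // True once the write-out time has passed and nothing was written yet.
    bool flush_due(double now_ms) const {
        return !flushed_ && now_ms >= flush_after_ms_;
    }

    // Pair start/stop events per region and return the result as CSV text.
    std::string region_csv(const std::vector<Region>& regions) {
        std::ostringstream out;
        out << "region,parameter,start_ms,end_ms\n";
        for (const Region& region : regions) {
            std::map<int, double> open;  // parameter -> start time of an open region
            for (const Event& e : events_) {
                if (e.type == region.start) {
                    open[e.parameter] = e.time_ms;
                } else if (e.type == region.stop) {
                    auto it = open.find(e.parameter);
                    if (it == open.end()) continue;  // stop without start: skip
                    assert(e.time_ms >= it->second);
                    out << region.name << ',' << e.parameter << ','
                        << it->second << ',' << e.time_ms << '\n';
                    open.erase(it);
                }
            }
        }
        flushed_ = true;
        return out.str();
    }

private:
    double flush_after_ms_;
    bool flushed_ = false;
    std::vector<Event> events_;
};
```

In this sketch each rank would poll `flush_due(now)` somewhere in its main loop and, once it returns true, write the string from `region_csv(...)` to its own file. Regions that are still open at that moment are simply not reported.
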
Metrics that we might want to get:
- Duration per assimilation cycle
- State transfers
- General purpose requests per second
- Which general purpose requests
- How many peer requests, and how many fail/succeed (should we add more intelligent runner distribution so they don't ask the wrong
- How many peer requests are abandoned because the server side is busy?
- Request wait time (could we get faster with more weight servers?)
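
Several of these metrics could be derived in postprocessing from paired request events. The following is a minimal sketch, assuming each peer request has already been reduced to a start time, an end time and an outcome; `PeerRequest`, `Outcome`, `PeerRequestStats` and `summarize` are hypothetical names, not part of melissa-da.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical outcome classification for one peer request.
enum class Outcome { Succeeded, Failed, Abandoned };

struct PeerRequest {
    double start_s;  // time the request was issued, in seconds
    double end_s;    // time the request completed or was given up
    Outcome outcome;
};

struct PeerRequestStats {
    std::size_t succeeded = 0;
    std::size_t failed = 0;
    std::size_t abandoned = 0;
    double requests_per_second = 0.0;
    double mean_wait_s = 0.0;
};

// Summarize all requests observed within a measurement window of window_s seconds.
PeerRequestStats summarize(const std::vector<PeerRequest>& requests, double window_s) {
    assert(window_s > 0.0);
    PeerRequestStats stats;
    double total_wait_s = 0.0;
    for (const PeerRequest& r : requests) {
        switch (r.outcome) {
            case Outcome::Succeeded: ++stats.succeeded; break;
            case Outcome::Failed:    ++stats.failed;    break;
            case Outcome::Abandoned: ++stats.abandoned; break;
        }
        total_wait_s += r.end_s - r.start_s;
    }
    stats.requests_per_second = static_cast<double>(requests.size()) / window_s;
    if (!requests.empty()) {
        stats.mean_wait_s = total_wait_s / static_cast<double>(requests.size());
    }
    return stats;
}
```

Keeping the classification in postprocessing means the instrumented code only has to emit start/stop events plus an outcome flag, which keeps the overhead in the p2p API small.
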
Things we might want to measure (first proposal):
App Core
- Model execution and idle time on app cores (checking whether the old instrumentation still works should be enough)
- Time the app core waits for states
- Time the app core spends communicating with the FTI head
- Time to calculate weights
- Time to do job requests to the server
FTI head
- Time to do all kinds of requests (prefetch, delete, peer to peer...)
- Time to delete a state
- Time to prefetch a state from another runner
- Time to load a state from the PFS
- Time used to send states to other runners
- Time in server loop?
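
Most of the app core and FTI head items above are "time spent in one operation" measurements. One possible way to instrument them with little code is a scope guard that reports the elapsed time when it goes out of scope. This is only a sketch: `ScopedRegion` and its `Sink` callback are hypothetical; in practice the guard would more likely call the existing `trigger(...)` with a start and a stop event.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical RAII helper: measures the lifetime of the object and hands
// the region name and elapsed milliseconds to a user-supplied sink.
class ScopedRegion {
public:
    using Clock = std::chrono::steady_clock;
    using Sink = std::function<void(const std::string&, double)>;

    ScopedRegion(std::string name, Sink sink)
        : name_(std::move(name)), sink_(std::move(sink)), start_(Clock::now()) {}

    // The destructor runs on every exit path of the enclosing scope.
    ~ScopedRegion() {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        sink_(name_, elapsed.count());
    }

    // Copying would report the same region twice, so forbid it.
    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

private:
    std::string name_;
    Sink sink_;
    Clock::time_point start_;
};
```

A guard such as `ScopedRegion r("delete_state", sink);` at the top of the function that deletes a state would then cover early returns as well. It does not help when a runner is killed mid-region, which is why the timed write-out described earlier is still needed.
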
Maybe on the server side
- Time a particle takes in total, from being sent out until its weight is received
- Execution per assimilation cycle
- Whole execution
Edited by Kai Keller