Picking up the question from the previous post: we said that the try_kill_peers function gets executed. It is defined in ompi_mpi_abort.c:
```c
// As the comment below also says, the main job is to kill the other
// processes in the same communicator (excluding this process itself).
/*
 * Local helper function to build an array of all the procs in a
 * communicator, excluding this process.
 *
 * Killing just the indicated peers must be implemented for
 * MPI_Abort() to work according to the standard language for
 * a "high-quality" implementation.
 *
 * It would be nifty if we could differentiate between the
 * abort scenarios (but we don't, currently):
 * - MPI_Abort()
 * - MPI_ERRORS_ARE_FATAL
 * - Victim of MPI_Abort()
 */
// The caller passes in the communicator in question
static void try_kill_peers(ompi_communicator_t *comm,
                           int errcode)
{
    // Part 1: count the processes and allocate the ompi_process_name_t array
    int nprocs;
    ompi_process_name_t *procs;

    nprocs = ompi_comm_size(comm);
    /* ompi_comm_remote_size() returns 0 if not an intercomm, so
       this is safe */
    nprocs += ompi_comm_remote_size(comm);

    procs = (ompi_process_name_t*) calloc(nprocs, sizeof(ompi_process_name_t));
    if (NULL == procs) {
        /* quick clean orte and get out */
        ompi_rte_abort(errno, "Abort: unable to alloc memory to kill procs");
    }

    // Part 2: put the local-group processes into the array
    /* put all the local group procs in the abort list */
    int rank, i, count;
    rank = ompi_comm_rank(comm); // this gets our own rank in the communicator (Question 1)
    for (count = i = 0; i < ompi_comm_size(comm); ++i) {
        if (rank == i) {
            /* Don't include this process in the array */
            --nprocs;
        } else {
            assert(count <= nprocs);
            procs[count++] =
                *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
        }
    }

    // Part 3: also put the remote-group processes into the array
    /* if requested, kill off remote group procs too */
    for (i = 0; i < ompi_comm_remote_size(comm); ++i) {
        assert(count <= nprocs);
        procs[count++] =
            *OMPI_CAST_RTE_NAME(&ompi_group_get_proc_ptr(comm->c_remote_group, i, true)->super.proc_name);
    }

    // Part 4: kill the processes
    if (nprocs > 0) {
        ompi_rte_abort_peers(procs, nprocs, errcode);
    }

    /* We could fall through here if ompi_rte_abort_peers() fails, or
       if (nprocs == 0). Either way, tidy up and let the caller
       handle it. */
    free(procs);
}
```
At this point we need to look at how ompi_rte_abort_peers(procs, nprocs, errcode) is defined.
Original post: https://www.cnblogs.com/HelloGreen/p/8757349.html