This microconference (MC) is about checkpoint/restore (C/R) and process and container migration in the Linux ecosystem.
We cover C/R in the world of containers (CRIU) and in the world of HPC (Slurm, OpenMPI and BLCR).
This year we are also going to discuss how ideas from academic research on heterogeneous computing, for instance the Popcorn Linux project, may (or may not) be brought into the Linux ecosystem.
CRIU people
Popcorn Linux
Slurm project
BLCR (missed them last time)
Rodrigo HPC live migration
George from OpenMPI
IBM cross-arch live migration
Docker C/R integration
undo.io / RR project from Mozilla / Dinesh
Mike Rapoport, IBM (CRIU and cross-arch migration, hackathon)
Andrey Vagin, Virtuozzo (CRIU, hackathon)
Pavel Emelyanov, Virtuozzo (Tutorials, P.Haul, hackathon)
Kir Kolyshkin, OpenVZ (Tutorials, hackathon)
Popcorn Linux team, Virginia Tech
George Bosilca (OpenMPI, is willing to join)
Rodrigo Bruno, Distributed Systems Group, INESC-ID
Manuel Rodriguez Pascual (CIEMAT, Slurm project, is willing to join)
Dinesh Subhraveti, Fermat Inc.
Tycho Andersen, Docker
Saied Kazemi, Google
Michael Holzheu, IBM, s390
Eric Biederman (namespaces)
C/R in job scheduling
Exploring the possibilities that checkpoint/restart brings to job scheduling. How can we benefit from being able to save the state of a running job in a cluster and restore it at some other time and place?
Serverless computing could use C/R to decrease the startup time of interpreter engines.
Performance
Restoring a single task currently takes too many operations. What are the options for speeding things up?
There are many places where we have to make a lot of syscalls, mostly in vain, for example to restore socket parameters. Most of them keep the defaults they got at creation, but how can we detect this and skip the corresponding setsockopt() calls?
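One possible answer to the socket question is to compare the saved values against a freshly created reference socket and restore only the differences. The sketch below is illustrative only: the helper names and the small TRACKED_OPTS subset are assumptions, and it takes for granted that the reference socket is created in the same environment as the restored one, since defaults can depend on sysctls and namespaces.

```python
import socket

# Illustrative subset; a real tool would track many more options.
TRACKED_OPTS = {
    "SO_KEEPALIVE": (socket.SOL_SOCKET, socket.SO_KEEPALIVE),
    "SO_REUSEADDR": (socket.SOL_SOCKET, socket.SO_REUSEADDR),
    "TCP_NODELAY": (socket.IPPROTO_TCP, socket.TCP_NODELAY),
}


def read_opts(sock):
    """Return {name: value} for every tracked option of sock."""
    return {name: sock.getsockopt(level, opt)
            for name, (level, opt) in TRACKED_OPTS.items()}


def non_default_opts(saved):
    """Keep only the saved options that differ from a fresh socket's defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ref:
        defaults = read_opts(ref)
    return {name: val for name, val in saved.items() if defaults[name] != val}


def restore_opts(sock, saved):
    """Apply only non-default options; return the number of setsockopt calls."""
    changed = non_default_opts(saved)
    for name, val in changed.items():
        level, opt = TRACKED_OPTS[name]
        sock.setsockopt(level, opt, val)
    return len(changed)
```

The same comparison could be done at dump time instead, so that default values never reach the image at all.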
Can we somehow benefit from parallel dump?
Is there any way to do pre-restore? We do have pre-dump, which dumps only the memory while keeping the processes running, but restore is 100% synchronous.
Pre-dumps currently generate too many pipes, which is bad in itself and also pins the tasks' memory in RAM. We need to make pre-dumps work without this. One option is to use the process_vm_readv() syscall, but this duplicates the memory. May we have sys_vmsplice_process_vm() in the kernel, please?
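The copying approach mentioned above maps onto the existing process_vm_readv() syscall. A minimal sketch of calling it through ctypes follows, assuming Linux and a libc that exports the wrapper; the IoVec class and read_remote() helper are names made up for this illustration. It makes the drawback visible: the data ends up in a second buffer owned by the reader.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of struct iovec from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


libc = ctypes.CDLL(None, use_errno=True)
libc.process_vm_readv.restype = ctypes.c_ssize_t
libc.process_vm_readv.argtypes = [
    ctypes.c_int, ctypes.POINTER(IoVec), ctypes.c_ulong,
    ctypes.POINTER(IoVec), ctypes.c_ulong, ctypes.c_ulong,
]


def read_remote(pid, address, length):
    """Copy length bytes at address in process pid into a new local buffer."""
    buf = ctypes.create_string_buffer(length)  # the duplicate copy of the data
    local = IoVec(ctypes.addressof(buf), length)
    remote = IoVec(address, length)
    n = libc.process_vm_readv(pid, ctypes.byref(local), 1,
                              ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

Reading another process this way needs ptrace-level access to it, and some sandboxes block the syscall entirely, in which case read_remote() raises OSError.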
Virtuozzo developers report that restoring a container with 5 GB of RSS on an 8 GB host eats all the memory, causing existing applications to starve and sometimes to be OOM-killed. We need to discuss this problem.
Security
The hottest topic here is user-mode checkpoint/restore. Although for checkpoint we more or less have enough APIs in the kernel, for restore we cannot do the simple “fork with pid” operation without diving into enormous complexity: to do this we need a pid namespace, for the pid namespace we need a mount namespace with the relevant /proc mount, and for both we need a user namespace. That said, user-mode restore is for now an unreachable dream.
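For context, the privileged way to get “fork with pid” is to write the wanted pid minus one into /proc/sys/kernel/ns_last_pid and fork right after. The sketch below shows that sequence; reserve_pid() and fork_with_pid() are hypothetical names used only here. The write requires privilege (for example CAP_SYS_ADMIN) over the owning pid namespace, which is why an unprivileged restorer ends up needing the namespace chain described above.

```python
import fcntl
import os

NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def reserve_pid(ctl, target_pid):
    """Record target_pid - 1 so the next fork in this pid namespace gets target_pid."""
    fcntl.flock(ctl, fcntl.LOCK_EX)  # keep cooperating restorers out of the race window
    ctl.write(str(target_pid - 1))
    ctl.flush()


def fork_with_pid(target_pid, path=NS_LAST_PID):
    """Fork a child and report whether it received target_pid (needs privilege)."""
    with open(path, "w") as ctl:
        reserve_pid(ctl, target_pid)
        child = os.fork()
        if child == 0:
            os._exit(0)  # a real restorer would continue restoring state here
    os.waitpid(child, 0)
    return child == target_pid
```

If the requested pid is already taken, or another process forks in between, the child gets a different pid and fork_with_pid() returns False.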
C/R-aware applications
Rodrigo is researching ways to make the JVM cooperate with C/R for more effective live migration. A long time ago Xemul had the idea of a libcrassist.so library that applications could link with to help CRIU dump and restore them.
One example of the above is triggering GC before live migration to reduce the memory footprint.
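To make the GC idea concrete, here is a minimal sketch of a C/R-aware application, using Python's garbage collector as a stand-in for the JVM. No such notification interface exists yet, so the use of SIGUSR1 as the "about to be migrated" signal and the on_pre_dump() handler name are assumptions of this sketch.

```python
import gc
import signal

collected = []  # result of each pre-migration collection, kept for inspection


def on_pre_dump(signum, frame):
    """Drop unreachable objects so that less memory has to be dumped."""
    collected.append(gc.collect())


# Assumption: the migration tool sends SIGUSR1 shortly before it starts dumping.
signal.signal(signal.SIGUSR1, on_pre_dump)
```

Whether a collection actually shrinks the resident set depends on the runtime's allocator returning freed pages to the kernel, so the benefit has to be measured per runtime.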
Andrei Vagin <avagin (at) openvz (dot) org>
Mike Rapoport <rppt (at) linux (dot) vnet (dot) ibm (dot) com>
Kir Kolyshkin <kir (at) openvz (dot) org>
Pavel Emelyanov <xemul (at) virtuozzo (dot) com>