Overview

The MC is about checkpoint-restore and process and container migration in the Linux ecosystem.

We cover C/R in the world of containers (CRIU) and in the world of HPC (Slurm, OpenMPI and BLCR).

This year we are also going to discuss how ideas from academic research on heterogeneous computing, for instance the Popcorn Linux project, may (or may not) be brought into the Linux ecosystem.

Potential invitees

  • CRIU people
  • Popcorn Linux
  • Slurm project
  • BLCR (missed them last time)
  • Rodrigo HPC live migration
  • George from OpenMPI
  • IBM cross-arch live migration
  • Docker C/R integration
  • undo.io / RR project from Mozilla / Dinesh

Attendees

  • Mike Rapoport, IBM (CRIU and cross-arch migration, hackathon)
  • Andrey Vagin, Virtuozzo (CRIU, hackathon)
  • Pavel Emelyanov, Virtuozzo (Tutorials, P.Haul, hackathon)
  • Kir Kolyshkin, OpenVZ (Tutorials, hackathon)
  • Popcorn Linux team, Virginia Tech
  • George Bosilca (OpenMPI, is willing to join)
  • Rodrigo Bruno, Distributed Systems Group, INESC-ID
  • Manuel Rodriguez Pascual, CIEMAT (SLURM project, is willing to join)
  • Dinesh Subhraveti, Fermat Inc.
  • Tycho Andersen, Docker.
  • Saied Kazemi, Google
  • Michael Holzheu, IBM, s390
  • Eric Biederman, (namespaces)

Key Topics for Discussion (tentative)

  • Checkpoint-restore in HPC
    • HPC requirements from C/R and gaps that exist with current technologies
    • What can we do to make CRIU suitable for HPC? Discuss integration opportunities :)
    • Some time ago there was a patch to OpenMPI that called CRIU to do C/R. How about resurrecting this discussion?
  • C/R in job scheduling
    • Exploring the possibilities that checkpoint/restart brings to the field of job scheduling. How can we benefit from being able to save the state of a running job in a cluster and restore it at some other time and place?
    • Serverless computing could use C/R to reduce the startup time of interpreter engines.
  • Heterogeneous computing, checkpoint-restore and process migration
    • How can we bring ideas from academic research to the Linux ecosystem?
      • Popcorn Linux approach for thread migration
      • Cross-architecture container migration
        • This brings another topic into the game: user-friendly access to the images. Right now CRIU images are a big mess :(
  • Checkpoint-restore, migration and userfaultfd
    • Userfaultfd-WP for dirty memory tracking (the current soft-dirty scheme is error-prone and not flexible)
    • Extensions to non-cooperative userfaultfd
      • The most important one is restoring COW memory areas. There's currently no API in uffd to restore one page into two mms.
    • Checkpoint-restore of userfault-enabled applications
  • Keep up the pace
    • We constantly see kernel developers break backward compatibility. Since CRIU uses rather unusual and rarely used APIs, nobody except us complains. We need to discuss what, if anything, can be done about it.
  • Performance
    • Restoring a single task now takes too many operations. What are the options to speed things up?
    • There are many places where we need to call A LOT of syscalls, mostly in vain, e.g. to restore socket parameters. Most of them will still have their creation defaults, but how can we detect this to skip the relevant sockopt?
    • Can we somehow benefit from parallel dump?
    • Is there any way to do pre-restore? We do have pre-dump, which dumps only the memory while keeping the processes running. Restore is 100% synchronous.
    • Pre-dumps now generate too many pipes, which is bad in itself and also pins the tasks' memory in RAM. We need to make pre-dumps work without this. One of the options is to use the process_vm_readv() syscall, but this duplicates the memory. May we have sys_vmsplice_process_vm() in the kernel, please?
    • Virtuozzo devs report that restoring a container with 5 GB of RSS on an 8 GB host eats all the memory and causes existing applications to starve and sometimes get OOM-killed. We need to discuss this problem.
  • Security
    • The hottest topic here is user-mode checkpoint-restore. For checkpoint we more or less have enough APIs in the kernel, but for restore we cannot do the simple “fork with pid” operation without diving into enormous complexity. In particular, to do this we need a pid namespace; for the pid namespace we need a mount namespace to have the relevant /proc mount; and for both we need a user namespace. So for now user-mode restore is an unreachable dream.
  • Revert to snapshot
  • C/R-aware applications
    • Rodrigo is researching ways to make the JVM work well with C/R for more effective live migration. A long time ago Xemul had an idea of a libcrassist.so that applications could link with, thus helping CRIU to dump and restore them.
    • One example of the above is triggering GC before live migration to reduce the memory footprint.
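
As a starting point for the socket-parameter question under Performance, here is a minimal sketch of one possible approach: read the defaults once from a freshly created reference socket of the same family and type, then issue setsockopt() only for values that differ. This is not how CRIU does it today. The option list and the helper names (CANDIDATE_OPTS, read_opts, default_opts, restore_opts) are illustrative assumptions, and the sketch assumes that a fresh socket on the restore host has the same defaults the dumped socket started with.

```python
import socket

# Illustrative subset of options; a real tool would need to cover many more.
CANDIDATE_OPTS = [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE),
    (socket.SOL_SOCKET, socket.SO_REUSEADDR),
    (socket.SOL_SOCKET, socket.SO_BROADCAST),
    (socket.IPPROTO_TCP, socket.TCP_NODELAY),
]

# Cache of defaults, keyed by (family, type), so the reference socket
# is created only once per kind of socket being restored.
_defaults_cache = {}


def read_opts(sock):
    """Read the candidate options from a socket (stands in for the dump step)."""
    return {(level, opt): sock.getsockopt(level, opt)
            for level, opt in CANDIDATE_OPTS}


def default_opts(family, kind):
    """Return the option values of a freshly created socket of this kind."""
    key = (family, kind)
    if key not in _defaults_cache:
        with socket.socket(family, kind) as ref:
            _defaults_cache[key] = read_opts(ref)
    return _defaults_cache[key]


def restore_opts(sock, dumped):
    """Apply only non-default values; return the number of setsockopt calls."""
    defaults = default_opts(sock.family, sock.type)
    applied = 0
    for (level, opt), value in dumped.items():
        if defaults.get((level, opt)) == value:
            continue  # still at its creation default, no syscall needed
        sock.setsockopt(level, opt, value)
        applied += 1
    return applied
```

With this scheme the per-socket cost drops to the number of options that were actually changed, at the price of one extra reference socket per (family, type) pair. Whether the restore-side defaults can always be trusted, for example for sockets returned by accept() or after sysctl changes, is exactly the kind of question to discuss.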

Contact

  • Andrei Vagin <avagin (at) openvz (dot) org>
  • Mike Rapoport <rppt (at) linux (dot) vnet (dot) ibm (dot) com>
  • Kir Kolyshkin <kir (at) openvz (dot) org>
  • Pavel Emelyanov <xemul (at) virtuozzo (dot) com>
 
2017/checkpoint-restart.txt · Last modified: 2017/07/13 05:38 by 195.110.40.7
 