The Linux Plumbers 2011 Virtualization track focuses on free software Linux virtualization in general. It is not tied to a specific hypervisor, but covers general virtualization issues and in particular collaboration among projects, including KVM, Xen, QEMU, containers, etc.
The structure will be similar to that used in 2010, i.e. 30 minutes per subject, including discussion.
The schedule of the 2011 Virtualization Plumbers Micro Conference was as follows. Note presentation slides can be found on the Plumbers page by following the links to the abstracts:
The following are the notes taken in the Etherpad during the discussions.
Xen PV network transmit operation:
The back end either maps the guest pages or copies them; the former is needed to implement zero copy, but there is one issue: the network stack does not do proper page reference counting.
The page life cycle tracking issue in the network layer affects any subsystem that gives pages to the network layer (such as NFS).
Basic requirement for a general solution: even if we give a page to the network layer, we need to retain ownership of it. Implemented using the fragment API and destructor infrastructure.
Problem should not be relevant to KVM. Whether KVM's implementation is affected should be discussed on the mailing list.
What about performance? The effect of calling the destructors has not yet been measured. It is a correctness issue, so we have to live with it.
Might double the overhead of allocating a skb. Need to measure how it affects memory usage.
James: could we get rid of the new page struct member by using an indexed array?
Patches were well received by the netdev developers.
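The ownership requirement above can be illustrated with a small sketch (plain Python, not the actual kernel API; the names `Page` and `on_last_put` are invented for illustration): the page's owner registers a destructor that fires when the network layer drops its last reference, so the owner is notified instead of the page vanishing behind its back.

```python
# Illustrative sketch of the fragment-destructor idea. Names are
# invented for illustration; this is not the kernel API.

class Page:
    def __init__(self, on_last_put):
        self.refcount = 1              # owner's reference
        self.on_last_put = on_last_put

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.on_last_put(self)     # destructor: owner reclaims the page

reclaimed = []
page = Page(on_last_put=reclaimed.append)

page.get()   # network layer takes a reference (e.g. page queued in an skb)
page.put()   # owner drops its reference early
page.put()   # transmit completes; destructor runs, owner regains the page

print(len(reclaimed))  # -> 1
```

The overhead discussed above is the cost of carrying the extra destructor pointer and invoking it on the final put.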
* Started as Google Summer of code project.
* Need to figure out why “pv guest, with virtio nic” is so slow.
- Trying to switch Xen over to virtio?
It was not the speaker's initial intent.
Ian: could do it for simple devices such as the random generator and such.
Jes: If we could build Xen APIs on top of virtio it would be good for the whole
FOSS virtualization community.
Ian: that is not our current goal.
Proposed by Takahiro Hirofuchi, AIST. Presented by Isaku Yamahata, VA Linux Systems Japan K.K.
precopy: copy memory before switching execution (the status quo)
postcopy: start executing first; pages are copied on demand and in the background
precopy can result in the same page being copied multiple times, as it is repeatedly dirtied. Migration time depends on how fast memory is re-dirtied (memory update intensity).
postcopy switch time is deterministic at 200-300 ms, independent of RAM size. There is a performance loss after the switch, but it should be short.
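The difference in copy volume can be seen in a toy simulation (the page count, dirty rate, and function names are invented for illustration; `dirty_rate` stands in for memory update intensity):

```python
# Toy model: precopy resends pages that were dirtied during each round,
# while postcopy transfers every page exactly once.

def precopy_pages_sent(num_pages, dirty_rate, max_rounds=10):
    """Each round resends the pages dirtied since the previous round."""
    sent = num_pages                      # first full pass over all RAM
    dirty = int(num_pages * dirty_rate)
    for _ in range(max_rounds):
        if dirty == 0:
            break
        sent += dirty                     # dirtied pages are copied again
        dirty = int(dirty * dirty_rate)   # some of those get re-dirtied
    return sent

def postcopy_pages_sent(num_pages):
    """Every page crosses the wire once, on demand or in the background."""
    return num_pages

print(precopy_pages_sent(1000, 0.3))   # -> 1427
print(postcopy_pages_sent(1000))       # -> 1000
```

With a high enough dirty rate, precopy's dirty set never converges and migration time grows, which is exactly the dependence on memory update intensity noted above; postcopy's transfer volume is fixed.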
Early-stage proof-of-concept code
qemu-kvm cannot handle 100% of this itself, since other components may have modified guest RAM on the source machine. Guest RAM access must be hooked during the postcopy phase.
postcopy is vulnerable to either machine failing during the transition period. Checkpointing is likely needed. Lockstep? (a talk in the cloud MC (Remus) covered HA snapshotting that may be relevant). James described the Stratus lockstep solution, which duplicates inputs (e.g. incoming web requests) and verifies that outputs match.
paravirt: bypassing qemu improves latency & performance
even better: assign a device directly to the VM via SR-IOV. Close to bare-metal performance; a compatibility win
downside: guest pinned in host memory, VM tied to a physical host device
PCI config space, BARs, and interrupts are mapped or forwarded to the guest
Current implementation not ideal
VFIO: high-performance userspace driver framework. KVM not required
kernel module. Configures the IOMMU, among other things. IOMMU issues can be thorny. VFIO-NG coming soon.
IOMMU2 will help, as it can handle page faults
The idea is to suspend writes to a filesystem before a snapshot and resume them after.
Often used for backup - get a consistent snapshot
Suspend writes – includes VFS I/O and MMAP I/O (mmap since v3.0).
Was XFS-specific – now in the VFS: ioctls FIFREEZE and FITHAW.
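The existing VFS interface can be driven from userspace roughly as follows. This is a sketch: the ioctl numbers are the ones defined in `linux/fs.h` (`FIFREEZE = _IOWR('X', 119, int)`, `FITHAW = _IOWR('X', 120, int)`), and the `_IOWR` helper below re-derives them; actually freezing a filesystem requires root and a real mountpoint, so the calls are guarded.

```python
import fcntl
import os
import sys

def _IOWR(type_chr, nr, size):
    # Encode an ioctl number the way the kernel's _IOWR() macro does:
    # 2-bit direction (read|write = 3), 14-bit size, 8-bit type, 8-bit nr.
    return (3 << 30) | (size << 16) | (ord(type_chr) << 8) | nr

# From linux/fs.h: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _IOWR('X', 119, 4)   # 0xC0045877
FITHAW   = _IOWR('X', 120, 4)   # 0xC0045878

def freeze(mountpoint):
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)   # block new writes, flush dirty data
    finally:
        os.close(fd)

def thaw(mountpoint):
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITHAW, 0)     # resume writes
    finally:
        os.close(fd)

if __name__ == "__main__" and len(sys.argv) == 2 and os.geteuid() == 0:
    freeze(sys.argv[1])
    # ... take the snapshot here ...
    thaw(sys.argv[1])
```

The weakness the rest of this section addresses is visible here: if the process dies between `freeze()` and `thaw()`, the filesystem stays frozen.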
New use case: snapshot of a virtual machine from the hypervisor
Needs in-guest support (Linux: virtagent; Windows: VSS)
Problem: what if the agent dies within the guest? It cannot check the state on restart.
New ioctl: FIGETFREEZEFD – freeze and return an fd (handle); thaw when the fd is closed. Automatic thawing solves the issue of the agent going away. On this fd one can do FS_FREEZE_FD, FS_THAW_FD and FS_ISFROZEN_FD. A freeze=true/false parameter could be added to FIGETFREEZEFD.
Access control: use a capability (CAP_SG?) plus permission to open a path within the filesystem (needed for FIGETFREEZEFD).
Need to be careful when snapshotting the filesystem containing the agent binary; mlock etc. A small dedicated binary is preferable for ease of review (rather than a function of a larger binary).
Need for a polling/notification interface for applications? They cannot freeze/thaw but can monitor.
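Since FIGETFREEZEFD was only a proposal, its fd-lifetime semantics can be sketched as a simulation (a Python context manager standing in for the fd; `Filesystem` and `FrozenFs` are invented stand-ins, not real kernel interfaces):

```python
# Simulation of the proposed FIGETFREEZEFD semantics: the filesystem
# stays frozen only while the handle is open, so if the agent holding
# it dies (closing all of its fds), the filesystem thaws automatically.

class Filesystem:
    def __init__(self):
        self.frozen = False

class FrozenFs:
    """Handle whose lifetime controls the freeze, like the proposed fd."""
    def __init__(self, fs):
        self.fs = fs

    def __enter__(self):           # FIGETFREEZEFD: freeze, hand back a handle
        self.fs.frozen = True
        return self

    def is_frozen(self):           # analogous to FS_ISFROZEN_FD
        return self.fs.frozen

    def __exit__(self, *exc):      # close(fd): automatic thaw
        self.fs.frozen = False
        return False

fs = Filesystem()
with FrozenFs(fs) as handle:       # agent freezes the filesystem
    assert handle.is_frozen()
    # hypervisor takes the snapshot here
# handle "closed" - even if the agent crashed, the filesystem is thawed
print(fs.frozen)  # -> False
```

Tying the freeze to a handle's lifetime is what removes the "agent died while frozen" failure mode: the kernel, not the agent, guarantees the thaw.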