gvisor overview

Date: 2021-01-13 10:35:10
1. Overview:
  No interrupts, no devices, no I/O.
  Tasks are goroutines.
   
  2. Syscall:
  The sentry can run in both non-root mode (guest ring 0) and root mode (host ring 3).
   
  A user app's syscalls are intercepted as in a normal guest and handled by the sentry kernel (in non-root mode if and only if the
  syscall can be handled without the sentry making a syscall on the host).
   
  The sentry kernel's own syscalls are always executed in root (ring 3) mode. A sentry syscall ends up executing HLT, which causes
  a VM exit, and execution then continues in root mode.
   
  basic flow:
  The bluepill loop runs until bluepillHandler + setcontext return; execution is now in non-root mode.
   
  user app -> syscall -> sysenter -> user: -> SwitchToUser returns -> syscall
  handled by sentry kernel(t.doSyscall()), in non-root mode.
   
  sentry kernel -> syscall -> sysenter -> HLT -> vm exit -> t.doSyscall() [in root
  mode].
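
  The two flows above can be mimicked with a toy state machine. This is only a sketch of the mode transitions described in these notes, not gVisor code: ToySentry, its fields, and the needs_host flag are hypothetical names, and which real syscalls need the host is not modeled.

```python
from dataclasses import dataclass, field

NON_ROOT = "non-root (guest ring 0)"
ROOT = "root (host ring 3)"


@dataclass
class ToySentry:
    """Hypothetical model of the mode transitions; not gVisor's real types."""
    mode: str = ROOT
    trace: list = field(default_factory=list)

    def bluepill(self):
        # Enter the guest if we are currently in root mode.
        if self.mode == ROOT:
            self.trace.append("bluepill -> VM entry")
            self.mode = NON_ROOT

    def host_syscall(self, name):
        # A sentry syscall issued in non-root mode ends in HLT,
        # which forces a VM exit back to root mode.
        if self.mode == NON_ROOT:
            self.trace.append("HLT -> VM exit")
            self.mode = ROOT
        self.trace.append(f"host syscall for {name} in {self.mode}")

    def do_syscall(self, name, needs_host):
        # App syscalls are intercepted while the sentry is in non-root mode.
        self.bluepill()
        self.trace.append(f"app syscall {name} intercepted in {self.mode}")
        if needs_host:
            self.host_syscall(name)
        # Return the mode in which handling finished.
        return self.mode


if __name__ == "__main__":
    s = ToySentry()
    s.do_syscall("getpid", needs_host=False)
    s.do_syscall("write", needs_host=True)
    print("\n".join(s.trace))
```

  A syscall that the sentry can serve by itself finishes in non-root mode; one that needs the host finishes in root mode after the HLT-triggered VM exit.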
   
  3. Memory:
  Physical memory:
  The size of guest physical memory is almost 1 << cpu.physicalbits, but may be smaller because of reserved regions, etc.
  The (vsize - psize) part is not in physicalRegion. The guest pagetable
  maps almost all gva <-> gpa, but gpa <-> hva (hpa)
  is initially set up only for the sentry kernel. After that, a gpa page frame
  is filled by HandleUserFault (from filemem or HostFile) each
  time there is an EPT fault.
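
  The on-demand fill can be illustrated with a small model. This is a sketch under stated assumptions: LazyGuestMemory and its fields are hypothetical, a dict stands in for the EPT, a bytes object stands in for filemem or a host file, and reserved regions are ignored.

```python
PAGE_SIZE = 4096


class LazyGuestMemory:
    """Hypothetical model: gpa pages are backed only after a fault."""

    def __init__(self, physical_bits, backing):
        self.size = 1 << physical_bits  # upper bound; reserved regions ignored
        self.backing = backing          # stands in for filemem / a host file
        self.ept = {}                   # gpa page number -> page contents
        self.faults = 0

    def _handle_user_fault(self, pfn):
        # Fill the missing page frame from the backing store (zero-padded).
        self.faults += 1
        start = pfn * PAGE_SIZE
        data = self.backing[start:start + PAGE_SIZE]
        page = bytearray(PAGE_SIZE)
        page[:len(data)] = data
        self.ept[pfn] = page

    def read(self, gpa, n):
        if gpa < 0 or gpa + n > self.size:
            raise ValueError("access outside guest physical memory")
        out = bytearray()
        while n > 0:
            pfn, off = divmod(gpa, PAGE_SIZE)
            if pfn not in self.ept:       # models an EPT fault
                self._handle_user_fault(pfn)
            chunk = self.ept[pfn][off:min(off + n, PAGE_SIZE)]
            out += chunk
            gpa += len(chunk)
            n -= len(chunk)
        return bytes(out)


if __name__ == "__main__":
    mem = LazyGuestMemory(physical_bits=16, backing=bytes(range(256)) * 32)
    mem.read(4090, 10)   # crosses a page boundary
    print("faults after first read:", mem.faults)
    mem.read(4090, 10)   # same pages, already mapped
    print("faults after second read:", mem.faults)
```

  Only pages that are actually touched get a mapping, which matches the "not fixed from the beginning" behavior noted in the summary.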
   
  Pagetables:
  gVisor itself is mapped in both root and non-root mode, with gva == hva. So the sentry runs in the userspace address range
  in root ring 3 mode, and also in the userspace address range in non-root ring 0 mode.
   
  user app: userspace address space (lower half of the 64-bit address space) <--> gpa.
  kernelspace address space (upper half of the 64-bit address space), which is actually
  the sentry kernel's userspace address with bit 63 set <--> gpa. This
  mapping is almost useless, maybe needed only for the pagetable switch and some setup.
  The sentry cannot run in this address range (even
  PIC does not help, since PIC addresses are resolved once, not on every
  access).
   
  sentry kernel: userspace address space, which is the same userspace address as on the host,
  so gva actually equals hva; then gva <-> gpa <-> hva.
  kernelspace address space is hva with bit 63 set <--> gpa. gpa <--> hva (hpa)
  is set up using EPT. Again, gpa <--> hva is initially set up only for the sentry kernel. All subsequent mappings
  are handled by EPT faults, which eventually call HandleUserFault().
   
  From this we can see that each user app syscall involves a pagetable switch,
  somewhat similar to KPTI, although the pagetables are very different.
   
  Since the user app's and the sentry kernel's pagetables probably overlap (they use the same userspace address range), they cannot be
  mapped at the same time. On a syscall, the sentry kernel's pagetable is switched in, and it contains
  no mapping for the user app, which makes access to user memory complicated
  (this is why usermem is needed). In Linux, by contrast, the kernel's pagetable is a superset
  of the user process's pagetable, so the kernel can access user memory conveniently.
   
  Consider an access to the user app's memory from the sentry kernel (for example, on a write syscall the sentry kernel
  has to copy data out of the user app's address space). How does the sentry find its own address for a given user app
  address? Basically, walk the user app's pagetable to get uaddr -> gpa; or walk the user app's vmas to find
  uaddr -> file + file offset, then walk the address_space to find file + file offset -> gpa. The sentry
  then knows gpa -> hva (it maps all the memory itself and stores the mapping) and obtains the hva. In the sentry, gva == hva, so whether
  the sentry is in root or non-root mode, it can access this hva.
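
  The first lookup path (uaddr -> gpa -> hva) can be sketched as two table lookups per page. This is an illustration only: translate, copy_in, and the dict-based tables are hypothetical stand-ins, and the vma / address_space path is not modeled.

```python
PAGE_SIZE = 4096


def translate(uaddr, app_pagetable, sentry_mappings):
    """Map a user app address to an hva: uaddr -> gpa page -> hva page."""
    vpn, off = divmod(uaddr, PAGE_SIZE)
    gpa_page = app_pagetable.get(vpn)          # walk the app's pagetable
    if gpa_page is None:
        raise LookupError(f"uaddr {uaddr:#x} is not mapped in the app")
    hva_page = sentry_mappings.get(gpa_page)   # mapping recorded by the sentry
    if hva_page is None:
        raise LookupError(f"gpa page {gpa_page:#x} has no hva yet")
    return hva_page * PAGE_SIZE + off


def copy_in(uaddr, n, app_pagetable, sentry_mappings, host_memory):
    """Copy n bytes of app memory, translating again at each page boundary."""
    out = bytearray()
    while n > 0:
        hva = translate(uaddr, app_pagetable, sentry_mappings)
        chunk = min(n, PAGE_SIZE - uaddr % PAGE_SIZE)
        out += host_memory[hva:hva + chunk]
        uaddr += chunk
        n -= chunk
    return bytes(out)


if __name__ == "__main__":
    app_pt = {0x10: 3, 0x11: 7}      # app virtual page -> gpa page
    sentry_map = {3: 5, 7: 2}        # gpa page -> hva page
    host = bytearray(8 * PAGE_SIZE)  # stands in for sentry-visible memory
    host[5 * PAGE_SIZE + 4090:5 * PAGE_SIZE + 4096] = b"hello "
    host[2 * PAGE_SIZE:2 * PAGE_SIZE + 5] = b"world"
    print(copy_in(0x10 * PAGE_SIZE + 4090, 11, app_pt, sentry_map, host))
```

  Pages that are contiguous in the app's address space may land on unrelated hva pages, which is why the copy has to be split at page boundaries.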
   
  Filesystem:
  A thin VFS lives in the sentry, as in Linux. It also has limited proc and sys. The gofer is only for 9pfs.
  From the code path, all file operations go through the 9p server. However, from the log, there are no Tread/Twrite messages in
  the 9p server, while Topen/Tclunk do go through it. My assumption is
  that reads and writes go directly to the host file, probably through an fd passed over a unix domain socket.
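
  The general mechanism guessed at here, passing an open fd over a unix domain socket with SCM_RIGHTS, looks roughly like this. It is a generic sketch using Python's standard socket API, not gVisor or gofer code; send_fd, recv_fd, demo, and the b"open-reply" payload are made up for illustration.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, payload, fd):
    # Attach the fd as SCM_RIGHTS ancillary data to an ordinary message.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock, bufsize=64):
    # Receive the message and extract one fd from the ancillary data.
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return payload, fds[0]


def demo():
    # "server" plays the file server role, "client" the receiving side.
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with server, client, tempfile.TemporaryFile() as host_file:
        host_file.write(b"hello from host file")
        host_file.flush()
        send_fd(server, b"open-reply", host_file.fileno())
        payload, fd = recv_fd(client)
        try:
            # The receiver reads the file directly; no read message is sent.
            return payload, os.pread(fd, 64, 0)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

  Once the receiver holds the fd, later reads and writes never touch the socket, which would be consistent with seeing open/close messages but no read/write messages in the server log.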
   
  Network:
  Receive is done by a goroutine; transmit goes through endpoint.WritePacket.
   
  Summary:
  Shortcomings: compatibility, instability, syscall overhead. E.g., the mount command causes gVisor to exit suddenly, the ip command
  cannot run, and the SO_SNDBUF socket option is not supported.
  Merits: small memory footprint. Physical memory is backed by a memfd or a physical file (somewhat like DAX). Memory is mapped on
  demand, not fixed at the beginning.

Original: https://www.cnblogs.com/dream397/p/14270544.html
