Proxmox: Difference between revisions
Revision as of 15:05, 29 April 2024
Links
- Proxmox VE Administration Guide
- https://pve.proxmox.com/wiki Wiki
- Monitoring Proxmox with Zabbix
- Proxmox Backup Server
- Backup and Restore
- Proxmox Cluster with Raspberry Pi as QDevice (outdated)
- External vote (like a Raspberry Pi)
- Useful proxmox commands
- Proxmox VE helper scripts
- Proxmox bug tracker
Commands
Get PVE version
pveversion -v | head -n 2
qm Qemu Manager
pvesm Storage manager
Check
pvesm scan
pveperf
pvecm
About pvecm output
A = Alive, NA = Not Alive
V = Vote, NV = Not Vote
MW = Master Wins, NMW = Not Master Wins [0]
NR = Not Registered
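A small helper to spell out the flag column, as a sketch only. It assumes the flags arrive as one comma-separated string (for example A,V,NMW); the function name decode_qdevice_flags is made up for these notes and only knows the abbreviations listed here.

```shell
# Expand a comma-separated Qdevice flag string into words.
# Only the abbreviations from the list above are known; anything else is echoed back.
decode_qdevice_flags() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r flag; do
        case "$flag" in
            A)   echo "Alive" ;;
            NA)  echo "Not Alive" ;;
            V)   echo "Vote" ;;
            NV)  echo "Not Vote" ;;
            MW)  echo "Master Wins" ;;
            NMW) echo "Not Master Wins" ;;
            NR)  echo "Not Registered" ;;
            *)   echo "unknown: $flag" ;;
        esac
    done
}
```

Usage: decode_qdevice_flags 'A,V,NMW' prints one meaning per line.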
pvesh
pvesh get /cluster/resources
pvesh get /cluster/resources --output-format json-pretty
Get backup jobs
pvesh get /cluster/backup
then
pvesh get /cluster/backup/{backupid}
To change the vms included in the job:
pvesh set /cluster/backup/{backupid} -vmid 100,101,102
Get backup errors
pvesh get /nodes/pve01/tasks --typefilter vzdump --errors
Documentation
Proxmox API
View api: https://your.server:8006/api2/json/ ?
Invalid token name
pve uses separator '=', but pbs wants ':'
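A sketch of where the separator matters, assuming the token is sent in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET for PVE and PBSAPIToken=USER@REALM!TOKENID:SECRET for PBS. The function name api_token_header and all values passed to it are hypothetical; check the API docs for your version before relying on it.

```shell
# Build the Authorization header value for an API token.
# $1: product (pve or pbs), $2: full token id (user@realm!tokenname), $3: secret
# Assumption: PVE joins id and secret with '=', PBS with ':'.
api_token_header() {
    case "$1" in
        pve) printf 'PVEAPIToken=%s=%s\n' "$2" "$3" ;;
        pbs) printf 'PBSAPIToken=%s:%s\n' "$2" "$3" ;;
        *)   echo "unknown product: $1" >&2; return 1 ;;
    esac
}
```

The output could then go into something like curl -H "Authorization: $(api_token_header pve ...)".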
Ballooning
Ballooning memory limit 80%:
autoballooning is done by one of our daemons (pvestatd) and this limit is hardcoded at the moment
Directory structure
/etc/pve
/etc/pve/qemu-server
The VM configs
vmstate? seems related to snapshots
/var/lib/vz
/var/lib/vz/template/iso
Proxmox cluster
https://pve.proxmox.com/wiki/Cluster_Manager
Cluster manager
pvecm status
pvecm nodes
HA status
ha-manager status
Monitoring proxmox with zabbix
https://github.com/takala-jp/zabbix-proxmox
Terms
vram
maximum amount of memory a vm may use
lrm
Local Resource Manager
HOWTO
Maintenance
Rebooting a node
If HA is enabled, check https://pve.proxmox.com/wiki/High_Availability#ha_manager_node_maintenance
If you don't want it to start migrating, 'Freeze' might be the right option for HA Settings.
otherwise just do it :)
Disk cache for guest
- https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
- https://forum.proxmox.com/threads/disk-cache-wiki-documentation.125775/
Show vm configuration
qm config 101
Get VM name by ID
grep '^name:' /etc/pve/nodes/*/qemu-server/$ID.conf | awk '{print $2}'
or
pvesh get /cluster/resources -type vm --output-format yaml | egrep -i 'vmid|name' | sed 's@.*:@@'
or
grep "name:" /etc/pve/nodes/*/*/<vmid>.conf | awk '{ print $2 }'
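A plain grep can return more than one name when the config file has snapshot sections, since those repeat the keys. A minimal sketch that stops at the first section header; the function name vm_name_from_conf is made up for these notes.

```shell
# Print the name of a VM from its config file, ignoring snapshot sections.
# $1: path to a <vmid>.conf file
vm_name_from_conf() {
    # Stop at the first [section] header so snapshot copies of "name:" are skipped.
    awk '/^\[/ { exit } $1 == "name:" { print $2; exit }' "$1"
}
```

Usage: vm_name_from_conf /etc/pve/qemu-server/101.conf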
Clustering
Show cluster status
pvecm status
It seems relatively safe to restart corosync
View cluster logs
pvesh get /cluster/tasks --output-format=json-pretty
Sysctl settings for kvm guests
Still investigating, going for /etc/sysctl.d/50-kvmguest.conf
vm.vfs_cache_pressure=30
vm.swappiness=5
Installing proxmox via PXE
https://github.com/morph027/pve-iso-2-pxe
Storage
Adding another thin pool
lvcreate -L 500G --thinpool newpool vg1
After creating the LVM thin pool (TODO: link to that), add it to /etc/pve/storage.cfg (names here are from a different pool than the lvcreate example):
lvmthin: lvm-raid10
        thinpool raid10pool
        vgname raid10
        content images
Disks
Identify disks in linux guest
lsblk -o +SERIAL
Run fstrim from host
Assuming agent is running:
qm agent 102 fstrim
Suspend or hibernate
Suspend
Suspend does not turn off your computer. It puts the computer and all peripherals on a low power consumption mode. If the battery runs out or the computer turns off for some reason, the current session and unsaved changes will be lost.
qm suspend <vmid>
in GUI: sleep
qm status: paused
Hibernate
Hibernate saves the state of your computer to the hard disk and completely powers off. When resuming, the saved state is restored to RAM.
qm suspend <vmid> --todisk 1
or in GUI: Hibernate
Backups
proxmox-backup-client
export PBS_REPOSITORY="backup@pbs@pbs-server:backuprepo"
proxmox-backup-client snapshot list
proxmox-backup-client prune vm/101 --dry-run --keep-daily 7 --keep-weekly 3
proxmox-backup-client garbage-collect
vzdump limit bandwidth
--bwlimit 50000
This appears to limit read speed. Poor write speed to PBS also seems to have a bad effect on guests.
or nowadays in /etc/vzdump.conf:
bwlimit
Get total memory allocated to vms
grep memory: /etc/pve/nodes/*/qemu-server/*conf|awk '{sum+=$2} END {print sum}'
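The grep above also counts memory: lines inside snapshot sections, so the total can come out too high. A sketch that only counts the main section of each file; the function name sum_vm_memory is made up, and it assumes every file in the directory is a guest config.

```shell
# Sum the "memory:" values of all <vmid>.conf files in a directory,
# counting only the main section of each file (not snapshot sections).
# $1: directory holding the config files
sum_vm_memory() {
    awk '
        FNR == 1                   { in_snapshot = 0 }  # new file: back in the main section
        /^\[/                      { in_snapshot = 1 }  # [name] starts a snapshot section
        !in_snapshot && /^memory:/ { sum += $2 }
        END                        { print sum + 0 }
    ' "$1"/*.conf
}
```

Usage: sum_vm_memory /etc/pve/qemu-server gives the total for the local node.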
FAQ
Web interface stuck on "loading"
When clicking on guest on a particular node
Works on webui of that node
Different versions of PVE?
Console: unable to find serial interface
Maybe you're trying to get a console on a guest of another node in your cluster. TODO: investigate why this goes wrong.
Cores or threads?
What's called "core" in the Web UI is a core from guest point of view, it would probably be a thread on the host.
Cloud-init
No CloudInit Drive found
See https://gist.github.com/aw/ce460c2100163c38734a83e09ac0439a
Error messages
Proxmox API call failed: Couldn't authenticate user: zabbix@pve
Funky characters in password string?
SMP vm created on host with unstable TSC; guest TSC will not be reliable
memory: hotplug problem - 400 Parameter verification failed. dimm17: error unplug memory module
bad!
Failed to establish a new connection: [Errno -2] Name or service not known
Just that, check your DNS
ConditionPathExists=/etc/corosync/corosync.conf was not met
Probably set up the node with a bad /etc/hosts, or forgot to join the cluster
https://forum.proxmox.com/threads/cluster.103370/
https://blog.jenningsga.com/proxmox-keeping-quorum-with-qdevices/
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
corosync-qdevice[11695]: Can't read quorum.device.model cmap key
On the qdevice node
Check corosync-cmapctl ?
also see https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
"Quorum: 2 Activity blocked"
In my case this meant boot up second real node first
On working node:
corosync-cmapctl | grep quorum.device
quorum.device.model (str) = net
quorum.device.net.algorithm (str) = ffsplit
quorum.device.net.host (str) = 192.168.178.2
quorum.device.net.tls (str) = on
quorum.device.votes (u32) = 1
https://bugs.launchpad.net/ubuntu/+source/corosync-qdevice/+bug/1733889
x86/split lock detection: #AC: kvm/1161956 took a split_lock trap at address: 0x7ffebcb378ab
The number after kvm/ is the process ID; this will help you find the culprit.
See:
- In-depth analysis of split locks
- Detecting and handling split locks
- The search for the correct amount of split-lock misery
Could be large Windows guests on NUMA. Can probably be ignored
Shutting down a node
Should just work. Takes guests down with it when they're not in HA
API calls
List all vms in cluster
pvesh get /cluster/resources --type vm --output-format yaml | egrep -i 'vmid|name'
or json:
pvesh get /cluster/resources --type vm --output-format json | jq '.[] | {id,name}'
Cores, sockets and vCPUs
vCPUs is what the vm uses, maximum is sockets*cores but you could set it lower to allow adding cores/vcpus dynamically.
Migrating
VM is locked (create) (500)
Not always clear why, but try
qm unlock 111
Replication
missing replicate feature on volume 'local-lvm'
looks like replication of lvm isn't supported
Check if qemu agent is running
See if IP is shown under Summary, also
qm agent 105 ping
or
qm guest cmd 111 ping
qm agent ping return values
0: OK
2: VM not running
255: No QEMU guest agent configured (just disabled in VM config?) ("QEMU guest agent is not running" would only show when the agent is enabled in the config?)
There is no way to tell if agent is running when it's not enabled in VM config.
When VM is not running, GUI claims agent not running
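For scripts, the exit codes listed above can be turned into text. A sketch only: the function name describe_agent_ping_rc is made up, and the codes are just the ones observed in these notes, not an official list.

```shell
# Translate the exit code of "qm agent <vmid> ping" into a message.
# $1: exit code
describe_agent_ping_rc() {
    case "$1" in
        0)   echo "OK" ;;
        2)   echo "VM not running" ;;
        255) echo "No QEMU guest agent configured" ;;
        *)   echo "unexpected exit code $1" ;;
    esac
}
```

Usage: qm agent 105 ping; describe_agent_ping_rc $?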
Move to unused disk
If you moved disk, and decided to move back to the old one:
- detach current disk
- select the unused disk
- click Add
Stop all proxmox services
systemctl stop pve-cluster
systemctl stop pvedaemon
systemctl stop pveproxy
systemctl stop pvestatd
Storage (xx) not available on selected target
probably some storage mounted only on one node, so not clustered
switch to community repository
cat /etc/apt/sources.list.d/pve-enterprise.list
#deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
apt update
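The lines above hardcode buster. A sketch that builds the same source line for any release; the function name pve_no_subscription_line is made up, and reading the codename from VERSION_CODENAME in /etc/os-release is an assumption about your install.

```shell
# Print the apt source line for the no-subscription repository.
# $1: Debian codename (the notes above use buster)
pve_no_subscription_line() {
    printf 'deb http://download.proxmox.com/debian/pve %s pve-no-subscription\n' "$1"
}
```

Usage: . /etc/os-release; pve_no_subscription_line "$VERSION_CODENAME" > /etc/apt/sources.list.d/pve-no-subscription.list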
W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!
check sources.list :)
Storage
Could not determine current size of volume
When trying to grow a disk.
Another secret!
Add local disk or LV to vm
That would be passthrough
qm set 101 -scsi1 /dev/mapper/somevolume
Make sure the VM can't migrate to another node: ?? PVE won't try that anyway, but still
Storage migration failed: block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
Maybe moving disk to LVM, check for 4MiB alignment. qemu-img resize to 4MiB aligned size.
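To get a 4MiB aligned size, round the current size in bytes up to the next multiple. A minimal sketch; the function name align_up_4mib is made up for these notes.

```shell
# Round a size in bytes up to the next multiple of 4 MiB.
# $1: size in bytes
align_up_4mib() {
    step=$((4 * 1024 * 1024))
    echo $(( ($1 + step - 1) / step * step ))
}
```

The result can then be used as the new size for qemu-img resize on the image in question.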
fstrim guests
qm guest cmd <ID> fstrim
qmp command 'guest-fstrim' failed - got timeout
seems to be a windows thing
No disk unused
when trying to create thin volume, use command line?
qcow image bigger than assigned disk
Probably snapshots
Backups
backup write data failed: command error: protocol canceled
Temporary network failure?
storing login ticket failed: $XDG_RUNTIME_DIR must be set
Temporary bug, ignore it
PBS GC & Prune scheduling
https://pbs.proxmox.com/docs/prune-simulator/
qmp command 'backup' failed - got timeout
https://github.com/proxmox/qemu/blob/master/qmp-commands.hx
dirty-bitmap status: existing bitmap was invalid and has been cleared
unexpected property 'prune-backups' (500)
For example when adding storage via Add: iSCSI.
Uncheck "Keep all backups" in "Backup retention"
FAILED 00:00:02 unable to activate storage
TODO
VM 101 Backup failed: VM is locked (snapshot)
Check if there's no snapshot running (how?)
qm unlock 101
qmp command 'blockdev-snapshot-delete-internal-sync' failed - got timeout
Another job for
qm unlock 101
qmp command 'blockdev-snapshot-delete-internal-sync' failed - Snapshot with id 'null' and name 'mysnapshot' does not exist on device 'drive-scsi1'
Verify there is no such snapshot at all:
qemu-img snapshot -l vm-114-disk-1.qcow2
and then delete the entire [mysnapshot] section from the VM config file
can't acquire lock '/var/run/vzdump.lock' - got timeout
Check if vzdump is running, otherwise kill it (cluster?)
You could change lockwait in vzdump.conf, or as --lockwait parameter. Default is 180 minutes
VM 101 Backup failed: VM is locked (snapshot-delete)
Check /etc/pve/qemu-server/101.conf for 'snapstate'
If that says 'delete' for a snapshot try deleting the snapshot:
qm delsnapshot 101 snapname
If that throws something like: Failed to find logical volume 'pve/snap_vm-101-disk-0_saving'
qm delsnapshot 101 snapname --force
If it says VM is locked (snapshot-delete), use
qm unlock XXX
When you get: does not exist on device 'drive-scsi0', you might also need to delete the line "lock: snapshot-delete" from the 101.conf file
qmp command 'query-backup' failed - got wrong command id
Restoring single file from (PBS) backup
Check Mounting of archives with fuse
Requires package proxmox-backup-file-restore:
proxmox-file-restore
proxmox-file-restore failed: Error: mounting 'drive-scsi0.img.fidx/part/["2"]' failed: all mounts failed or no supported file system (500)
Maybe because of lvm?
Backup log
Upload size
Seems to be in kilobytes
Duplicates
Error: VM quit/powerdown failed - got timeout
qm stop VMID
if that complains about lock, remove the lock and try again
You have not turned on protection against thin pools running out of space.
Seems nobody knows how, just monitor it?
serial console from command line
qm terminal <id>
enable serial console in guest
looks like this is not needed:
systemctl enable serial-getty@ttyS0.service
in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 console=tty0"
ttyS0 is for qm terminal, tty0 is for the "console" button in the UI
- debian based
update-grub
- redhat based
grub2-mkconfig --output=/boot/grub2/grub.cfg
add
serial0: socket
to /etc/pve/qemu-server/[vmid].conf and restart
agetty: /dev/ttyS0: not a device
systemctl status useless again, means the serial bit is missing from <vmid>.conf
TASK ERROR: command 'apt-get update' failed: exit code 100
Subtle way of telling you to get a subscription, or at least change the sources list
Import vmdk to lvm
https://pve.proxmox.com/wiki/Qemu/KVM_Virtual_Machines#_importing_virtual_machines_and_disk_images
Can't apply changes to memory allocation
Maybe try enabling NUMA in CPU settings
Adding hardware shows orange
The keyword here is "PENDING", possibly in /etc/pve/qemu-server/<id>.conf
Maybe something is not supported (Options->Hotplug). Options: reboot or click "revert"
"Connection error 401: no ticket"
Login session expired?
can't lock file '/var/lock/qemu-server/lock-102.conf' - got timeout (500)
Maybe someone else has/had webui open, otherwise just remove it
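TASK ERROR: Can't use string ("keep-all=0,keep-last=3") as a HASH ref while "strict refs" in use at /usr/share/perl5/PVE/VZDump.pm line 502.
Classic, means incorrect syntax in your /etc/pve/storage.cfg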
The current guest configuration does not support taking new snapshots
You're using raw instead of qcow2. Convert: Hardware->Hard disk "Move Disk"
WARNING: Device /dev/dm-21 not initialized in udev database even after waiting 10000000 microseconds.
Until someone fixes it:
udevadm trigger
Also look for link to dm-21 in /dev/disk/by-id
"connection error - server offline?"
Try reconnecting the browser
Find vm name by id
qm config 100 | grep '^name:' | awk '{print $2}'
or a bit cruder:
grep name: /etc/pve/nodes/*/qemu-server/101.conf |head -n 1 | cut -d ' ' -f 2
Started Proxmox VE replication runner.
??
Find ID by name
grep -l "name: <NAME>" /etc/pve/nodes/*/qemu-server/*conf| sed 's/^.*\/\([0-9]*\)\.conf/\1/g'
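The grep/sed line above also matches names inside snapshot sections and partial names. A sketch that compares the exact name in the main section only; the function name vmid_by_name and its two-argument form are made up for these notes.

```shell
# Print the ID of every guest config whose main section has the given name.
# $1: VM name, $2: directory holding <vmid>.conf files
vmid_by_name() {
    for conf in "$2"/*.conf; do
        [ -f "$conf" ] || continue
        # awk exits 0 only when the name matches before the first [section] header.
        if awk -v n="$1" '
            /^\[/                    { exit }
            $1 == "name:" && $2 == n { found = 1; exit }
            END                      { exit !found }
        ' "$conf"; then
            basename "$conf" .conf
        fi
    done
}
```

Usage: vmid_by_name web01 /etc/pve/qemu-server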
Can't migrate VM with local CD/DVD
Remove the CD :)
Memory allocated to VMs
qm list|egrep -v "VM|stopped" | awk '{ sum+=$4 } END { print sum }'
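A slightly stricter variant of the same idea, as a sketch. It assumes the column order used above (status in column 3, memory in MB in column 4) and reads the qm list output from stdin; the function name sum_running_mem is made up.

```shell
# Sum the memory column of running guests; expects "qm list" output on stdin.
# Assumes column 3 is the status and column 4 the memory in MB.
sum_running_mem() {
    awk '$3 == "running" { sum += $4 } END { print sum + 0 }'
}
```

Usage: qm list | sum_running_mem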
Ceph
Got timeout(500)
Check
pveceph status
Possibly problem with ceph mgr
vzdump: # cluster wide vzdump cron schedule
# Automatically generated file - do not edit
edit it anyway?
Guest issues
virtio_balloon virtio0: Out of puff! Can't get 1 pages
iSCSI
iscsid: conn 0 login rejected: initiator error - target not found
pvesm scan iscsi <targetip>
and
iscsiadm -m session -P 3
udev high load
Check
udevadm monitor
KERNEL[426405.347906] change /devices/virtual/block/dm-8 (block)
UDEV [426405.359582] change /devices/virtual/block/dm-8 (block)
ls -al /dev/mapper/
pve-vm--113--disk--0 -> ../dm-8
So VM/LXC '113' is the one.
In general see https://forum.proxmox.com/threads/udev-malfunction-udisksd-high-cpu-load.99169/ since it could be udisks2
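Getting the guest ID out of the mapper name can be scripted. A sketch that assumes names shaped like pve-vm--113--disk--0 as in the listing above; the function name vmid_from_dm_name is made up.

```shell
# Extract the guest ID from a device-mapper name such as pve-vm--113--disk--0.
# Prints nothing when the name does not look like a guest disk.
# $1: name as shown in /dev/mapper
vmid_from_dm_name() {
    printf '%s\n' "$1" | sed -n 's/.*vm--\([0-9][0-9]*\)--disk.*/\1/p'
}
```

Usage: vmid_from_dm_name pve-vm--113--disk--0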
start failed: org.freedesktop.DBus.Error.Disconnected: Connection is closed
Most likely that VM isn't running.