How to Fix Ceph Error “cluster_uuid file exists with value X, != our uuid Y”

This error can occur when you are recovering a monitor from OSDs and the cluster_uuid extracted during the recovery does not match the FSID in the monmap.

# Replace 'pve1' with the name of your monitor
# Stop the monitor
systemctl stop ceph-mon@pve1
# Extract the monitor map to a file called monmap
ceph-mon -i pve1 --extract-monmap monmap
# Change the FSID (NEW_FSID must already hold the UUID the cluster should use)
monmaptool --clobber --fsid "$NEW_FSID" monmap
# Make any other changes via monmaptool, such as rewriting the monitor list
# Inject the new monmap
ceph-mon -i pve1 --inject-monmap monmap
# Start service
systemctl start ceph-mon@pve1
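
Before injecting the edited monmap, it can be worth confirming that the FSID change actually took. The following is a minimal sketch, not part of the original procedure: the helper names monmap_fsid and fsid_matches are hypothetical, and it assumes that `monmaptool --print` emits a line of the form "fsid <uuid>".

```shell
#!/usr/bin/env bash
# Print the fsid recorded in a monmap, given `monmaptool --print` output on stdin.
# Assumption: the output contains a line of the form "fsid <uuid>".
monmap_fsid() {
    awk '$1 == "fsid" { print $2; exit }'
}

# Succeed only if the fsid read from stdin equals the expected one
# (compared case-insensitively, since UUIDs may be written either way).
fsid_matches() {
    local expected actual
    expected=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    actual=$(monmap_fsid | tr 'A-F' 'a-f')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Intended use (not run here, requires the Ceph tools):
#   monmaptool --print monmap | fsid_matches "$NEW_FSID" && echo "monmap fsid OK"
```

If the check fails, the monmap was not rewritten as expected, and injecting it would most likely reproduce the same mismatch.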

Mini-Review: Supermicro X11SDV-16C-TP8F

With a whopping 16C32T CPU, this board probably has the most powerful embedded CPU I’ve used. I scored a nice deal on one, intending to eventually replace one of my older X10SDV quad-core boards.

The Good

  • 16C32T
  • Lower power (100W TDP CPU)
  • Supports up to 512GB RAM in LRDIMMs, or 256GB of RDIMMs
  • Good I/O – One PCIe x16 slot, one x8 slot, two x4 ports for U.2, an x4 M.2, an x2 M.2, and a miniPCIe slot
  • Compact (Micro-ATX)
  • Six fan headers should be enough even for many server chassis designs
  • BMC can monitor temperature of NVMe drives on the U.2 ports

The Bad

  • Intel X710 for the 10G LANs, despite the block diagram in the manual showing an X557-AT2 like the X10SDV series. This is my first time using the X710, and I can see why they have their reputation.
  • U.2 ports have a few issues, most of which are likely due to the fact that they are using flex I/O lanes rather than normal PCIe lanes:
    • They support coordinated hot add/removal, but only if you booted with a device plugged into them.
    • If you didn’t have a device connected, then they don’t even get a bus number, which can cause other devices to change PCI addresses (definitely the BMC, probably the B-key M.2 and miniPCIe as well).
    • While they can measure NVMe temperature, sometimes the temperature is measured wrong, leading to fans spinning up for no reason.
    • No VMD support.

Haven’t Tried

  • Haven’t attempted to use the U.2 ports with a backplane to see if LEDs work.
  • The BIOS says that the x16 slot supports VMD and hotplug. I can confirm that the BIOS handles hotplug events, tested with a PCIe switch card with known good hotplug support, so I see no reason to doubt the claim.

Quick Fix: grub-probe ZFS error

I ran into an issue where I was unable to apt-update because grub-probe failed to determine my root filesystem type. This wasn’t a ZFS-root system; it had a separate /boot partition. The error was “grub-probe: error: failed to get canonical path of `tank/root'.” Most solutions suggest setting the ZPOOL_VDEV_NAME_PATH environment variable, but this didn’t fix the problem for me.

I found out that the actual problem was that a networking hiccup had caused my single-drive zpool to be marked as degraded. The pool still worked, but without a fully-online vdev, grub-probe got confused. After clearing the errors with zpool clear, it worked fine.
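
A quick way to spot this situation before blaming grub-probe is to list any pool that is not fully online. This is only a sketch: the helper name unhealthy_pools is made up for illustration, and it expects the tab-separated "name, health" output of `zpool list -H -o name,health`.

```shell
#!/usr/bin/env bash
# List pools whose health is anything other than ONLINE.
# Reads "name<TAB>health" lines on stdin, as produced by:
#   zpool list -H -o name,health
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 " (" $2 ")" }'
}

# Intended use (not run here, requires ZFS):
#   zpool list -H -o name,health | unhealthy_pools
#   zpool clear tank    # only once the underlying cause is understood
```

An empty result means every pool reports ONLINE, in which case the degraded-vdev explanation does not apply and the cause lies elsewhere.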

Repairing a CRS328-24p-4s+

I scored a used CRS328 for $135, which is dirt cheap for a switch that would normally be $400+. The problem: ports 17-24 had broken PoE – so broken that RouterOS believed they were non-PoE ports. No pictures since the switch is in use.

Fortunately, the switch has three separate PoE daughterboards, one for each group of eight ports. These are not easy to remove, due to the design of the front panel of the switch. There is a cable underneath each card as well as a pin header, so be careful not to yank cables or damage any connectors. In my case, swapping the second and third cards caused ports 9-16 to break and 17-24 to work, so it was easy to confirm it was a bad card. If you wish to quickly install a card for testing, you don’t need to plug in either the power input or the power output – only the pin header connector to the mainboard.

Upon inspecting the failed board, I noticed that the component “FB3” seemed to have been knocked off the board. The “FB” designation would typically refer to a “ferrite bead”, but I didn’t have any surface mount ferrite beads on hand, so I replaced it with a jumper wire. Not a good long-term plan, but it was enough to get the card working, which would indicate that the missing ferrite bead was indeed the culprit. The long-term fix would be to find an appropriate replacement bead, or replace the board entirely.

Quick Fix: AppArmor+Libvirt Errors in Debian, Round 2

Things had been smooth sailing for a while since the last post on the subject, but I ran into another problem. Once again, I was getting errors when trying to start guests.

I was getting error messages such as these in syslog:

2023-06-20T14:14:41.858010-07:00 store libvirtd[8623]: internal error: Process exited prior to exec: libvirt:  error : Cannot delete directory '/run/libvirt/qemu/4-autoserver.shm': Device or resource busy
2023-06-20T14:14:42.060935-07:00 store libvirtd[8623]: internal error: Failed to autostart VM 'autoserver': internal error: Process exited prior to exec: libvirt:  error : Cannot delete directory '/run/libvirt/qemu/4-autoserver.shm': Device or resource busy

/var/log/audit/audit.log showed errors such as these:

type=AVC msg=audit(1687295681.852:196): apparmor="DENIED" operation="umount" class="mount" profile="libvirtd" name="/run/libvirt/qemu/" pid=9441 comm="daemon-init"

The fix is to add this to /etc/apparmor.d/abstractions/libvirt:

umount /run/libvirt/qemu/**,

Reload apparmor (systemctl reload apparmor) and try starting a guest.
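
When hunting for the next denial of this kind, it helps to boil the audit log down to just the profile, operation, and path. Here is a rough sketch; the function name apparmor_denials is hypothetical, and it assumes the fields appear in the same order as in the log line quoted above (other kernels or audit versions may order them differently).

```shell
#!/usr/bin/env bash
# Summarise AppArmor denials as "profile operation name", one per line.
# Assumption: fields appear in the order operation ... profile name,
# as in the audit.log line quoted in this post.
apparmor_denials() {
    sed -n 's/.*apparmor="DENIED" operation="\([^"]*\)".*profile="\([^"]*\)" name="\([^"]*\)".*/\2 \1 \3/p'
}

# Intended use (not run here, needs read access to the audit log):
#   apparmor_denials < /var/log/audit/audit.log | sort | uniq -c
```

Each distinct line in the summary points at a rule that may be missing from the profile or one of its abstractions.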

Switch-Based NVMe Hotplug – a Few Attempts, and one Success

Let’s say you’ve just bought a chassis with an NVMe backplane, or retrofitted one into your existing chassis. Now it’s time to see if you can get hotplugging and backplane management working.

First of all, PCIe hotplugging is hard. It’s nothing new – after all, PCI hotplug has been around in the form of PCMCIA cards for decades, and PCIe got the same treatment with the later ExpressCard standard. But the reality is that whether it’s a laptop with a card slot, a system with Thunderbolt, or a server with an NVMe backplane, it’s one of those things that you can only expect to work seamlessly if you buy a full OEM system validated for that purpose. If you cobble together a machine from parts, it’s much more difficult to get any sort of PCIe hotplugging working.

I went through this recently after adding a U.2 backplane. Here are a few things I tried, some of which worked better than others.


Upgrading an SC847 with a rear 2×2.5″ Drive Cage

I read this post about upgrading an older SC826 to support the rear drive cage option, and wondered if I could do the same with an SC847. The newer ‘B’ models support this natively, but there are still tons of cheap non-B models out there.

The first question is “why”? To which there are several answers:

  • More drives! Free up a couple 3.5″ bays for 3.5″ drives, rather than using an entire 3.5″ bay for a 2.5″ drive.
  • Cheapest NVMe option. $60-65 for the upgraded motherboard tray, and $80-90 for the drive cage, compared to $200 or so for the 4x U.2 rear backplane.
  • More NVMe. Maybe 4 NVMe bays isn’t quite enough for you. You could get both the 4x U.2 rear backplane (BPN-SAS3-826EL1-N4) and the rear 2x NVMe cage for 6 NVMe bays. (There is also an 8x U.2 front backplane, but it’s much more difficult to find.)
  • Dual-expander (EL2) backplanes: There are no backplane options with both dual SAS expanders and NVMe support.

Enough intro. I was able to get this conversion working, and it was much easier than the 826 conversion (though it took a lot longer). Here’s how.


Restoring eBay’s Sale History Link

It’s very useful to be able to see the sale history for an item that isn’t yours. You might want to see how quickly it sells, or whether offers are likely to be accepted or rejected. Unfortunately, eBay seems to have recently removed this link. The good news is that the page still exists and can be accessed via the same URL as before. Here’s a GreaseMonkey script to turn the “x sold” text into a clickable link like it previously was:

// ==UserScript== 
// @name     Restore eBay sold items link 
// @version  1
// @grant    none
// @match *://**
// ==/UserScript==

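// Find the "x sold" text; do nothing if eBay's markup has changed
const element = document.querySelector("div.d-quantity__availability span:last-child")
if (element) {
  const text = element.textContent
  // Rewrite .../itm/<item number>... into the purchase history URL
  const re = /(.*)\/itm\/([0-9]+).*/
  const url = document.location.href.replace(re, '$1/bin/purchaseHistory?item=$2')
  element.innerHTML = '<a href="' + url + '">' + text + '</a>'
}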

I have not tested it on other userscript plugins. The end result is that the “x sold” text becomes a link again, and clicking it takes you to the sale history page.
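
If you only want the sale history URL for a single listing, the same rewrite can be done from a terminal. This is a sketch of the script's regex translated to sed; the function name sale_history_url and the item number in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Turn an eBay listing URL into its purchase history URL,
# using the same pattern as the userscript: keep everything before
# /itm/, take the numeric item ID, and drop the rest.
sale_history_url() {
    printf '%s\n' "$1" | sed -E 's#(.*)/itm/([0-9]+).*#\1/bin/purchaseHistory?item=\2#'
}

# Example with a made-up item number:
#   sale_history_url 'https://www.ebay.com/itm/123456789012?hash=abc'
```

URLs that do not have a numeric ID directly after /itm/ are passed through unchanged, which matches how the regex in the userscript behaves.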

Broadcom 9400 – Should You Buy One for a Homelab?

The 9400 series is LSI/Avago/Broadcom’s first “Tri-Mode” HBA, capable of supporting SAS, SATA, and NVMe all in one adapter. There are a few catches, but despite that, it might still be worth the buy depending on your circumstances.


Quick Bash Tip: Alt-Shift-3 (Alt-#)

Alt-Shift-3 in Bash inserts a ‘#’ at the beginning of the line, and then runs it. # is the comment character, so running it does nothing. So why would you want to do this?

The simplest use case is when you have a command typed out, but realize there’s another command you need to run first. Pressing Alt-# pushes the typed command into your history, so you can quickly recall it once the other commands are done.


# Set some dataset properties...but wait, what was the name of the dataset again?
# I have the whole line except for the dataset ready, so press Alt-#
$ #zfs set primarycache=none secondarycache=metadata tank3/
# It's in my bash history now, so I can run whatever other commands first
$ zfs list
# Okay, figured it out, so now I press up-arrow/^P (or search with ^R) to recall the command, then home/^A to go to the start of the line, then del/^D to remove the #
$ zfs set primarycache=none secondarycache=metadata tank3/path/to/my/data/set