Edit2: Thanks all for your responses! I have checked the logs, https://lemmy.nz/comment/6192604, and based on that removed tracker-miner-fs as it’s a search/index tool which I don’t need. No idea why it took over all memory. I’ll also get a WiFi Smartplug as a kill switch. Hopefully that solves it. Thanks again heaps!


I’ve got an HP ProDesk G3 running Ubuntu, which I’m using as a home server. Earlier this week the services I host on it (Immich & Frigate) stopped. I tried to SSH in, but it just hung after asking for a password. I could ping it, but otherwise it was unresponsive.

I had to force reboot it manually. This is fine, but I’m not always at home.

The chip has Intel vPro as far as I know, which could be an option, but I have no idea how this works. The documentation on the Intel site seems focused on enterprises. I tried to connect with RealVNC, which did not work, so I think I’ve got to install/configure something on the server first.

I also asked Bing Chat, but it came up with non-existent packages & commands. I’d welcome your thoughts!

/edit: I just found this, which seems to be exactly what I need: https://manpages.ubuntu.com/manpages/focal/en/man7/amt-howto.7.html

  • The_Pete@lemmy.world · English · ↑21 · 10 months ago

    Check if your motherboard has a watchdog function. If the OS can’t ping the watchdog every 5 min or whatever you set it to, the board resets.
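A rough sketch of what "pinging the watchdog" looks like on Linux, assuming the board exposes a hardware watchdog as `/dev/watchdog`. The function name `pet_watchdog` and the `is_healthy` callback are made up for illustration; the timeout itself is set by the driver or BIOS, not by this code. In practice the `watchdog` package or systemd can do this for you.

```python
import time
from typing import Callable, Optional


def pet_watchdog(device_path: str, interval_s: float,
                 is_healthy: Callable[[], bool],
                 max_pets: Optional[int] = None) -> int:
    """Keep a Linux watchdog device alive while is_healthy() returns True.

    Returns the number of keep-alive writes that were made.
    """
    pets = 0
    # buffering=0 so every write reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as wd:
        while max_pets is None or pets < max_pets:
            if not is_healthy():
                # Stop writing without the magic character. On drivers with
                # "magic close" (or nowayout) the timer keeps running and the
                # hardware resets the machine when it expires.
                return pets
            wd.write(b"\0")  # any write counts as a keep-alive
            pets += 1
            time.sleep(interval_s)
        # Clean shutdown: 'V' asks drivers that support magic close to disarm.
        wd.write(b"V")
    return pets

# Hypothetical usage (needs root, and arms the watchdog for real):
# pet_watchdog("/dev/watchdog", 10.0, lambda: True)
```

The point of the callback is that a process which only checks "am I still running" will happily keep petting the watchdog while the rest of the box is stuck, so the health check should test something you actually care about.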

  • Blaster M@lemmy.world · English · ↑11 ↓1 · edited · 10 months ago

    Put a UniFi power strip on a UniFi network so you can control the power remotely, and set the motherboard to turn on automatically after a power failure. This is the nuclear option for restarting the system, though. Maybe while you’re at it, diagnose why it keeps hanging up on you.

    • sylverstream@lemmy.nz (OP) · English · ↑3 · 10 months ago

      Yeah think I’ll get a standalone WiFi smart plug, not connected to my Home Assistant, as a kill switch. But you’re right, it’s overkill.

      I found some weird things in the logs, this goes beyond my knowledge :( See https://lemmy.nz/comment/6192604

      • JASN_DE@lemmy.world · English · ↑3 · 10 months ago

        But you’re right, it’s overkill.

        I wouldn’t say that. Sure, it’s not the preferred way of restarting a system, but it is a good backup to have if nothing else works, for example after remotely messing up the network configuration.

  • nightrunner@lemmy.world · English · ↑7 · edited · 10 months ago

    Ok, I grabbed a few screenshots for you as well. Here is an HP manual that covers the MEBx setup, which is where you enable AMT: http://h10032.www1.hp.com/ctg/Manual/c03883429

    When you power on your ProDesk G3, you can access the MEBx setup by pressing Ctrl+P; they also say F6 or Escape will get you there. Intel AMT runs on a different IP address than the one your OS gets. You can use DHCP or assign a static IP address, and set up your admin password. You can then access the portal at http://ipaddress:16992. There should be a way to see what is shown on the screen through KVM-like access, but I use MeshCentral for that, so I couldn’t tell you how to do it without it.
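Once AMT is provisioned, a quick way to confirm it answers independently of the OS is to test whether its web port accepts TCP connections. This is only a minimal sketch: `port_is_open` is a made-up helper name, `192.168.1.50` in the usage comment is a placeholder address, and 16993 is the TLS variant of the 16992 port mentioned above.

```python
import socket

AMT_HTTP_PORT = 16992   # AMT web UI over plain HTTP (the port mentioned above)
AMT_HTTPS_PORT = 16993  # AMT web UI over TLS, if TLS is enabled


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Hypothetical usage, with a placeholder AMT address:
# print(port_is_open("192.168.1.50", AMT_HTTP_PORT))
```

If this returns True while SSH on the OS address hangs, the management engine is still reachable and a remote power cycle through AMT should be possible.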

    Hopefully, that gives you a start. Feel free to reach back out if you have any questions. Thank you!

    • sylverstream@lemmy.nz (OP) · English · ↑5 · 10 months ago

      Yes, thanks for that. Good point. I checked the logs, and minutes before the crash I can see the entries below. It looks like either a GPU error or an out-of-memory error. No idea what tracker-miner-f is, by the way. The log also shows a massive list of processes with their memory usage.

      This goes beyond my knowledge :(

      Feb 21 17:27:49 hppd600-g3 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:0:00000000
      Feb 21 17:32:43 hppd600-g3 kernel: 1305621 total pagecache pages
      Feb 21 17:32:43 hppd600-g3 kernel: 16258 pages in swap cache
      Feb 21 17:32:43 hppd600-g3 kernel: Free swap  = 0kB
      Feb 21 17:32:43 hppd600-g3 kernel: Total swap = 1000444kB
      Feb 21 17:32:43 hppd600-g3 kernel: 2065206 pages RAM
      Feb 21 17:32:43 hppd600-g3 kernel: 0 pages HighMem/MovableOnly
      Feb 21 17:32:43 hppd600-g3 kernel: 64196 pages reserved
      Feb 21 17:32:43 hppd600-g3 kernel: 0 pages hwpoisoned
      
      Feb 21 17:32:43 hppd600-g3 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-113.slice/user@113.service/background.slice/tracker-miner-fs-3.service,task=t>
      Feb 21 17:32:43 hppd600-g3 kernel: Out of memory: Killed process 833 (tracker-miner-f) total-vm:625676kB, anon-rss:3144kB, file-rss:4816kB, shmem-rss:4kB, UID:113 pgtables:280kB oom_score_adj:200
      Feb 21 17:32:43 hppd600-g3 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
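For anyone reading a dump like this, here is a small sketch that pulls the key numbers out of kernel OOM output. The regexes are written against the lines quoted above only, `summarize_oom` is a made-up name, and the 4 KiB page size is an assumption (the usual default on x86-64).

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, the usual x86-64 default

OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)
PAGES_RAM_RE = re.compile(r"(?P<pages>\d+) pages RAM")
SWAP_RE = re.compile(r"(?P<kind>Free|Total) swap\s+= (?P<kb>\d+)kB")


def summarize_oom(log_text: str) -> dict:
    """Extract the killed process, RAM size, and swap state from kernel OOM output."""
    summary = {}
    kill = OOM_KILL_RE.search(log_text)
    if kill:
        summary["killed_pid"] = int(kill["pid"])
        summary["killed_name"] = kill["name"]
        summary["killed_anon_rss_kib"] = int(kill["anon_rss"])
    ram = PAGES_RAM_RE.search(log_text)
    if ram:
        # pages * KiB per page, converted from KiB to GiB
        summary["ram_gib"] = round(int(ram["pages"]) * PAGE_SIZE_KIB / 1024**2, 2)
    for swap in SWAP_RE.finditer(log_text):
        summary[f"swap_{swap['kind'].lower()}_kib"] = int(swap["kb"])
    return summary
```

Fed the lines above, this reports roughly 7.9 GiB of RAM, swap completely used up (0 kB free of 1000444 kB), and a killed process holding only about 3 MB of anonymous memory. That last number suggests tracker-miner-f may have been picked as the victim (note its oom_score_adj of 200) rather than being the process that actually ate the memory, so the per-process list in the full log is worth a look.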
      
      
      • ryannathans@aussie.zone · English · ↑6 · 10 months ago

        Tracker miner fs indexes files and extracts their metadata for desktop search, iirc. There was a recent vulnerability where malicious files could crash it and execute code just by being on disk. Make sure you haven’t been hit by malware.

        • sylverstream@lemmy.nz (OP) · English · ↑2 ↓1 · 10 months ago

          Yeah, tracker miner sounds dodgy. I’ve only installed Immich & Frigate on the box, and no dodgy repositories. It’s also auto-updating. I’ll research how to check for malware; I thought that was a Windows-only thing :D

          • constantokra@lemmy.one · English · ↑1 · 10 months ago

            I’ve previously had a problem with my server becoming unresponsive when running immich. It’s been a while, but I remember there being some kind of memory leak having to do with immich. It was in their GitHub issues and everything. On my system it would take about a day and a half and then ssh, along with everything else, would become unresponsive. Rebooting would fix it for a day and a half. I stopped running immich and it hasn’t happened since. I suppose you could try using a cron job to restart immich periodically and see if that resolves your problem.
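A variation on the cron idea, as a sketch only: instead of restarting on a fixed schedule, a small script run from cron could restart the stack only when available memory drops below a threshold. This assumes Immich runs under Docker Compose; the directory `/opt/immich`, the threshold, and the function names are all made up for illustration.

```python
import subprocess


def mem_available_kib(meminfo_text: str) -> int:
    """Return the MemAvailable value (in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found in meminfo text")


def should_restart(meminfo_text: str, min_available_kib: int) -> bool:
    """True when available memory has dropped below the threshold."""
    return mem_available_kib(meminfo_text) < min_available_kib


def restart_stack(compose_dir: str) -> None:
    """Restart the services of the Compose project located in compose_dir."""
    subprocess.run(["docker", "compose", "restart"], cwd=compose_dir, check=True)

# Hypothetical usage from a cron-driven script:
# with open("/proc/meminfo") as fh:
#     if should_restart(fh.read(), min_available_kib=512 * 1024):
#         restart_stack("/opt/immich")  # placeholder path
```

This only papers over a leak, of course. It buys time until the underlying issue is fixed upstream or the offending container gets a memory limit.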

            • sylverstream@lemmy.nz (OP) · English · ↑2 · 10 months ago

              That is good to know! Will keep an eye on memory usage of immich. I really like it, so I’m reluctant to let it go.

  • cmnybo@discuss.tchncs.de · English · ↑5 ↓1 · 10 months ago

    You could connect an ESP32 to the power and reset switches through opto-isolators or relays. You will have to do a little bit of programming, but you can host a website on the ESP32 that will allow you to operate the switches remotely.

    If you want to get a bit fancier, you could connect the UART on the ESP32 to a serial port on the server through a TTL to RS-232 level converter and have a remote serial terminal embedded in the web page too. That won’t do much good if the server is completely locked up though.

  • lordnikon@lemmy.world · English · ↑4 ↓1 · 10 months ago

    Get a remote KVM. If you are relying on a box that no longer has a network connection, you are SOL and need something that can power cycle the box.

  • Shadow@lemmy.ca · English · ↑3 · 10 months ago

    If it hung like that, you probably have some sort of storage issue or high memory consumption pushing the box into swap.

    Intel AMT may help you. If you want dedicated hardware, look up PiKVM. Raritan also makes a small single-node IP KVM, but it’ll probably cost more.

    • sylverstream@lemmy.nz (OP) · English · ↑2 · 10 months ago

      Thanks! Yeah, it seemed to be an OOM issue, but based on my Kagi queries it seems like an OS issue. It also logged an error about the GPU, though. Normal memory usage is more than fine, so perhaps it was a one-time thing. See logs: https://lemmy.nz/comment/6192604

  • solrize@lemmy.world · English · ↑3 · edited · 10 months ago

    On actual server motherboards (as opposed to repurposed home PCs) there is sometimes a special KVM-like interface (keyboard/video/mouse, not the VM hypervisor), so you can connect to it with VNC and have the equivalent of local access. This is called iDRAC on Dell servers, and other vendors have something similar.

    On a home PC, hmm, you might be able to set up some kind of remote power cycle and serial console connection, using a second computer (Raspberry Pi or the like). I’m unfamiliar with the Intel AMT approach that you linked to, but it seems like another idea.

    I do remember hearing of a DRAC-like board for PCs, but the name of it escapes me right now.

    At the end of the day, if you want a long-running server, you probably should host it in a data center, maybe with failover and other HA provisions. Home environments are a pain to set up for that. If your computer goes offline and you can’t reach it, how do you even know that your home isn’t having a power outage? Home ISPs are flaky too, so maybe you want a backup route over mobile data, etc. Yes, you can make workarounds for everything, but it amounts to turning your home into a crappy low-capacity data center.

    • agentsac@lemmy.world · English · ↑6 · 10 months ago

      PiKVM or a similar device could work for OP - is that what you are thinking of? I’ve used it and it works well.

      I think a lot of people who self-host get caught up in the excitement of getting the services up and running and neglect disaster planning, prevention, and recovery (myself included). Either they put it off for later or don’t realize it could be a problem down the road until it happens. We always say not to self host anything you can’t live without, and most take that advice, others don’t. Not saying OP falls in either category, necessarily, just adding on to some of your points.

      Self hosting really is the land of compromise where we all have to balance our requirements, budget, time and effort. Personally, I have a little disposable income that I spend on hardware to host non-critical services so I can learn and tinker. It could all go away and all I will have lost is the time and money I put into it, but I gained some knowledge and enjoyment. Needless to say, I don’t have much in the way of backups and monitoring.

      • solrize@lemmy.world · English · ↑1 · 10 months ago

        PiKVM isn’t the board I was thinking of, but same idea, and maybe even better.

    • sylverstream@lemmy.nz (OP) · English · ↑1 · 10 months ago

      Thanks, but a data center is probably overkill for my needs. I’ve got it power loss protected with a UPS, and that’s more than enough for us. Thanks anyway :)

      I have an RPi, but of course that one can hang too. I’ll buy a simple standalone WiFi smart plug as a kill switch.

  • Decronym@lemmy.decronym.xyz (bot) · English · ↑5 ↓2 · edited · 10 months ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    HA              Home Assistant automation software
    ~               High Availability
    IP              Internet Protocol
    NAS             Network-Attached Storage
    RPi             Raspberry Pi brand of SBC
    SBC             Single-Board Computer
    SSH             Secure Shell for remote terminal access
    VNC             Virtual Network Computing for remote desktop access

    5 acronyms in this thread; the most compressed thread commented on today has 5 acronyms.

    [Thread #533 for this sub, first seen 22nd Feb 2024, 04:35]

  • nightrunner@lemmy.world · English · ↑2 · 10 months ago

    I’m not in front of my computer atm, but I think I have something that can help you out. I have a 3-node Lenovo thin client cluster whose KVMs I manage using Intel vPro. I even went a step further and run MeshCentral on a VM to centralize my KVM access, since I have 3 of them, but that’s another story.

    Anyway, I’ll see if I can grab you some URLs in the morning if someone else doesn’t beat me to it or you find it on your own running google queries.

    • sylverstream@lemmy.nz (OP) · English · ↑1 · 10 months ago

      Thanks mate. It was a bit of a rabbit hole: I found stuff about the watchdog package, which you can configure to use the iTCO_wdt module, but I also read that the module is blacklisted, and then I just gave up. I posted elsewhere in the thread what led up to the hang. I think I’ll buy a WiFi smart plug so I can remotely reboot everything, assuming the WiFi still works :D
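For anyone who wants to pick the watchdog route back up, a small sketch for checking whether a module such as iTCO_wdt is blacklisted: it scans modprobe configuration directories for `blacklist` lines. The function name is made up, and the two directories in the usage comment are the common locations; a given system may use others as well.

```python
import glob
import os


def find_blacklist_entries(module: str, conf_dirs: list) -> list:
    """Return (path, line_number) pairs where `module` is blacklisted."""
    # modprobe treats '-' and '_' in module names as equivalent.
    wanted = module.replace("-", "_")
    hits = []
    for conf_dir in conf_dirs:
        for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    words = line.split("#", 1)[0].split()  # drop comments
                    if (len(words) >= 2 and words[0] == "blacklist"
                            and words[1].replace("-", "_") == wanted):
                        hits.append((path, lineno))
    return hits

# Hypothetical usage:
# for path, lineno in find_blacklist_entries(
#         "iTCO_wdt", ["/etc/modprobe.d", "/lib/modprobe.d"]):
#     print(f"{path}:{lineno}")
```

A blacklist entry only stops the module from being auto-loaded by alias; it can still be loaded explicitly, which is why the watchdog package has its own setting for naming the module to load.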