Showing posts with label Debian. Show all posts

Thursday, March 17, 2022

SSH is going ape due to ed25519 host keys

A few months ago I started seeing SSH clients complain about unknown host keys on various servers that are perfectly stable and secure. I didn't have time to analyze it fully, but I suspected that it was partly caused by my running cutting-edge Debian Unstable and/or Ubuntu 22.04 (before its release).

The low-level cause is that "something" has changed (and, as I said, I am not exactly sure what is going on here) so that the servers or clients now prefer the new ssh-ed25519 over the old ecdsa-sha2-nistp256. That is a good thing, except that the client keeps saying something like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
.
Please contact your system administrator.
Add correct host key in /.ssh/known_hosts to get rid of this message.
Offending RSA key in /.ssh/known_hosts:
  remove with:
  ssh-keygen -f "/.ssh/known_hosts" -R ""
RSA host key for  has changed and you have requested strict checking.
Host key verification failed.

This is a serious warning that should never be ignored, unless one positively knows that the server had a good reason to change its key.

If you read it and started worrying, that's good. The bad thing is that after half an hour of googling and browsing the OpenSSH changelogs I have not found a detailed explanation and/or the diffs that cause this. I have two suspects:

New upstream release
(https://www.openssh.com/releasenotes.html#8.5p1):
    - ssh(1), sshd(8): change the first-preference signature algorithm from
      ECDSA to ED25519.

Or maybe I am wrong. Please provide more details in the discussion if you know which change causes this behavior. In any case it is wrong, because flashing tons of these unknown-key warnings is going to get everyone used to the idea that this just happens on its own. Ultimately people are going to be less vigilant and more prone to MitM attacks. :-(
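
To judge whether such a warning could be explained by a change of the preferred key type, it may help to list which key types are already recorded for the host in known_hosts. This is only a minimal sketch: the function name known_key_types is made up for illustration, and it assumes plain (unhashed) host names in the file.

```shell
# known_key_types HOST FILE
# Print the key types recorded for HOST in a known_hosts-style file.
# Assumption: host names are stored in plain text; hashed entries (|1|...)
# are not matched here and would need "ssh-keygen -F HOST" instead.
known_key_types() (
    awk -v h="$1" '
        /^[#@]/ || NF < 3 { next }        # skip comments, marker lines, junk
        {
            n = split($1, names, ",")     # "host,ip" lists are comma-separated
            for (i = 1; i <= n; i++)
                if (names[i] == h) { print $2; break }
        }
    ' "$2" | sort -u
)
```

If this prints only ecdsa-sha2-nistp256 for a host, the key the server now presents can be fetched with `ssh-keyscan -t ed25519 HOST | ssh-keygen -lf -` and its fingerprint compared with one obtained over a trusted channel, for example the server console.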

Tuesday, September 1, 2015

systemd vs. syslog on Debian

I got pissed off by yet another systemd weirdness: my log file /var/log/daemon.log (generated by rsyslog) got flooded with shit like this:

Sep  1 13:35:36 lux systemd[13644]: Starting Paths.
Sep  1 13:35:36 lux systemd[13644]: Reached target Paths.
Sep  1 13:35:36 lux systemd[13644]: Starting Timers.
Sep  1 13:35:36 lux systemd[13644]: Reached target Timers.
Sep  1 13:35:36 lux systemd[13644]: Starting Sockets.
Sep  1 13:35:36 lux systemd[13644]: Reached target Sockets.
Sep  1 13:35:36 lux systemd[13644]: Starting Basic System.
Sep  1 13:35:36 lux systemd[13644]: Reached target Basic System.
Sep  1 13:35:36 lux systemd[13644]: Starting Default.
Sep  1 13:35:36 lux systemd[13644]: Reached target Default.
Sep  1 13:35:36 lux systemd[13644]: Startup finished in 13ms.
Sep  1 13:35:37 lux systemd[13644]: Stopping Default.
Sep  1 13:35:37 lux systemd[13644]: Stopped target Default.
Sep  1 13:35:37 lux systemd[13644]: Stopping Basic System.
Sep  1 13:35:37 lux systemd[13644]: Stopped target Basic System.
Sep  1 13:35:37 lux systemd[13644]: Stopping Paths.
Sep  1 13:35:37 lux systemd[13644]: Stopped target Paths.
Sep  1 13:35:37 lux systemd[13644]: Stopping Timers.
Sep  1 13:35:37 lux systemd[13644]: Stopped target Timers.
Sep  1 13:35:37 lux systemd[13644]: Stopping Sockets.
Sep  1 13:35:37 lux systemd[13644]: Stopped target Sockets.
Sep  1 13:35:37 lux systemd[13644]: Starting Shutdown.
Sep  1 13:35:37 lux systemd[13644]: Reached target Shutdown.
Sep  1 13:35:37 lux systemd[13644]: Starting Exit the Session...
Sep  1 13:35:37 lux systemd[13644]: Received SIGRTMIN+24 from PID 13654 (kill).
Sep  1 13:35:37 lux systemd[13666]: Starting Paths.
Sep  1 13:35:37 lux systemd[13666]: Reached target Paths.
Sep  1 13:35:37 lux systemd[13666]: Starting Timers.
Sep  1 13:35:37 lux systemd[13666]: Reached target Timers.
Sep  1 13:35:37 lux systemd[13666]: Starting Sockets.
Sep  1 13:35:37 lux systemd[13666]: Reached target Sockets.
Sep  1 13:35:37 lux systemd[13666]: Starting Basic System.
Sep  1 13:35:37 lux systemd[13666]: Reached target Basic System.
Sep  1 13:35:37 lux systemd[13666]: Starting Default.
Sep  1 13:35:37 lux systemd[13666]: Reached target Default.
Sep  1 13:35:37 lux systemd[13666]: Startup finished in 8ms.
Sep  1 13:35:37 lux systemd[13666]: Stopping Default.
Sep  1 13:35:37 lux systemd[13666]: Stopped target Default.
Sep  1 13:35:37 lux systemd[13666]: Stopping Basic System.
Sep  1 13:35:37 lux systemd[13666]: Stopped target Basic System.
Sep  1 13:35:37 lux systemd[13666]: Stopping Paths.
Sep  1 13:35:37 lux systemd[13666]: Stopped target Paths.
Sep  1 13:35:37 lux systemd[13666]: Stopping Timers.
Sep  1 13:35:37 lux systemd[13666]: Stopped target Timers.
Sep  1 13:35:37 lux systemd[13666]: Stopping Sockets.
Sep  1 13:35:37 lux systemd[13666]: Stopped target Sockets.
Sep  1 13:35:37 lux systemd[13666]: Starting Shutdown.
Sep  1 13:35:37 lux systemd[13666]: Reached target Shutdown.
Sep  1 13:35:37 lux systemd[13666]: Starting Exit the Session...
Sep  1 13:35:37 lux systemd[13666]: Received SIGRTMIN+24 from PID 13685 (kill).
Sep  1 13:35:37 lux systemd[13705]: Starting Paths.
Sep  1 13:35:37 lux systemd[13705]: Reached target Paths.
Sep  1 13:35:37 lux systemd[13705]: Starting Timers.
Sep  1 13:35:37 lux systemd[13705]: Reached target Timers.
Sep  1 13:35:37 lux systemd[13705]: Starting Sockets.
Sep  1 13:35:37 lux systemd[13705]: Reached target Sockets.
Sep  1 13:35:37 lux systemd[13705]: Starting Basic System.
Sep  1 13:35:37 lux systemd[13705]: Reached target Basic System.
Sep  1 13:35:37 lux systemd[13705]: Starting Default.
Sep  1 13:35:37 lux systemd[13705]: Reached target Default.
Sep  1 13:35:37 lux systemd[13705]: Startup finished in 8ms.
Sep  1 13:35:38 lux systemd[13705]: Stopping Default.
Sep  1 13:35:38 lux systemd[13705]: Stopped target Default.
Sep  1 13:35:38 lux systemd[13705]: Stopping Basic System.
Sep  1 13:35:38 lux systemd[13705]: Stopped target Basic System.
Sep  1 13:35:38 lux systemd[13705]: Stopping Paths.
Sep  1 13:35:38 lux systemd[13705]: Stopped target Paths.
Sep  1 13:35:38 lux systemd[13705]: Stopping Timers.
Sep  1 13:35:38 lux systemd[13705]: Stopped target Timers.
Sep  1 13:35:38 lux systemd[13705]: Stopping Sockets.
Sep  1 13:35:38 lux systemd[13705]: Stopped target Sockets.
Sep  1 13:35:38 lux systemd[13705]: Starting Shutdown.
Sep  1 13:35:38 lux systemd[13705]: Reached target Shutdown.
Sep  1 13:35:38 lux systemd[13705]: Starting Exit the Session...
Sep  1 13:35:38 lux systemd[13705]: Received SIGRTMIN+24 from PID 13714 (kill).
Sep  1 13:35:38 lux systemd[13742]: Starting Paths.
Sep  1 13:35:38 lux systemd[13742]: Reached target Paths.
Sep  1 13:35:38 lux systemd[13742]: Starting Timers.
Sep  1 13:35:38 lux systemd[13742]: Reached target Timers.
Sep  1 13:35:38 lux systemd[13742]: Starting Sockets.
Sep  1 13:35:38 lux systemd[13742]: Reached target Sockets.
Sep  1 13:35:38 lux systemd[13742]: Starting Basic System.
Sep  1 13:35:38 lux systemd[13742]: Reached target Basic System.
Sep  1 13:35:38 lux systemd[13742]: Starting Default.
Sep  1 13:35:38 lux systemd[13742]: Reached target Default.
Sep  1 13:35:38 lux systemd[13742]: Startup finished in 14ms.
Sep  1 13:35:38 lux systemd[13742]: Stopping Default.
Sep  1 13:35:38 lux systemd[13742]: Stopped target Default.
Sep  1 13:35:38 lux systemd[13742]: Stopping Basic System.
Sep  1 13:35:38 lux systemd[13742]: Stopped target Basic System.
Sep  1 13:35:38 lux systemd[13742]: Stopping Paths.
Sep  1 13:35:38 lux systemd[13742]: Stopped target Paths.
Sep  1 13:35:38 lux systemd[13742]: Stopping Timers.
Sep  1 13:35:38 lux systemd[13742]: Stopped target Timers.
Sep  1 13:35:38 lux systemd[13742]: Stopping Sockets.
Sep  1 13:35:38 lux systemd[13742]: Stopped target Sockets.
Sep  1 13:35:38 lux systemd[13742]: Starting Shutdown.
Sep  1 13:35:38 lux systemd[13742]: Reached target Shutdown.
Sep  1 13:35:38 lux systemd[13742]: Starting Exit the Session...
Sep  1 13:35:38 lux systemd[13742]: Received SIGRTMIN+24 from PID 13753 (kill).
Sep  1 13:35:39 lux systemd[13779]: Starting Paths.
Sep  1 13:35:39 lux systemd[13779]: Reached target Paths.
Sep  1 13:35:39 lux systemd[13779]: Starting Timers.
Sep  1 13:35:39 lux systemd[13779]: Reached target Timers.
Sep  1 13:35:39 lux systemd[13779]: Starting Sockets.
Sep  1 13:35:39 lux systemd[13779]: Reached target Sockets.
Sep  1 13:35:39 lux systemd[13779]: Starting Basic System.
Sep  1 13:35:39 lux systemd[13779]: Reached target Basic System.
Sep  1 13:35:39 lux systemd[13779]: Starting Default.
Sep  1 13:35:39 lux systemd[13779]: Reached target Default.
Sep  1 13:35:39 lux systemd[13779]: Startup finished in 14ms.
Sep  1 13:35:39 lux systemd[13779]: Stopping Default.
Sep  1 13:35:39 lux systemd[13779]: Stopped target Default.
Sep  1 13:35:39 lux systemd[13779]: Stopping Basic System.
Sep  1 13:35:39 lux systemd[13779]: Stopped target Basic System.
Sep  1 13:35:39 lux systemd[13779]: Stopping Paths.
Sep  1 13:35:39 lux systemd[13779]: Stopped target Paths.
Sep  1 13:35:39 lux systemd[13779]: Stopping Timers.
Sep  1 13:35:39 lux systemd[13779]: Stopped target Timers.
Sep  1 13:35:39 lux systemd[13779]: Stopping Sockets.
Sep  1 13:35:39 lux systemd[13779]: Stopped target Sockets.
Sep  1 13:35:39 lux systemd[13779]: Starting Shutdown.
Sep  1 13:35:39 lux systemd[13779]: Reached target Shutdown.
Sep  1 13:35:39 lux systemd[13779]: Starting Exit the Session...
Sep  1 13:35:39 lux systemd[13779]: Received SIGRTMIN+24 from PID 13788 (kill).

What the fuck is that? What the hell does it mean? I googled a bit, but after reading a few meaningless discussions like A: "What does it mean? Is it serious? How do I get rid of this?" B: "It is for your own good. Suffer and be silent." I finally got the impression that it is harmless and that it simply means that the systemd motherfucker somehow helps to log users in and out and does something(?) with processes that get started by cron.

Then I came across some RHEL/Fedora bug report where even Lennart Poettering posted his "Won'tfix! It's for your own good. Resistance is futile, keep your mouth shut and suffer with systemd." Somebody called him an idiot in response. :-)

Anyway, the problem occurred only on Debian systems that had been upgraded from Wheezy to Jessie, not on freshly installed Jessie systems. So I compared the configurations, and the difference that... well, that made the difference, was this:

session        optional        pam_systemd.so

line in /etc/pam.d/common-session. I just commented it out and the shitload of annoying log messages stopped.
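
If several upgraded machines are affected, commenting the line out can be scripted. A minimal sketch, assuming GNU sed as shipped with Debian; disable_pam_systemd is a hypothetical helper name, and the file is passed as an argument so the change can be tried on a copy first:

```shell
# disable_pam_systemd FILE
# Comment out an active "session optional pam_systemd.so" line in FILE,
# keeping the original as FILE.bak. Lines already commented are left alone.
disable_pam_systemd() (
    sed -i.bak \
        's/^\(session[[:space:]][[:space:]]*optional[[:space:]][[:space:]]*pam_systemd\.so\)/#\1/' \
        "$1"
)
```

Run as root, the call would be `disable_pam_systemd /etc/pam.d/common-session`.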

Wednesday, August 12, 2015

mgetty in systemd for modem dial-in server

I used to have a modem dial-in server as out-of-band management for key network elements. The idea was to call the server over GSM from another place with minicom, connect to the server's terminal, and then use the other serial consoles that were connected from the server to routers, switches etc.

So far so good, I used to have a simple /etc/inittab line like this:

T3:23:respawn:/sbin/mgetty -D -s 115200 ttyS0

But... the fucking almighty systemd came to create hell on earth for Linux users... No /etc/inittab anymore, no nothing. Well, UTFG, and so I got the completely wrong advice to try:

# systemctl start getty@ttyS0.service

And that's it... (?) No, it isn't! It simply does not accept the call, because it uses agetty, and agetty can't do that, or it is not configured for it, or whatever.

After one particularly hot, unpleasant and exhausting afternoon spent experimenting with this, I arrived at the following systemd config that works for me (place it in /etc/systemd/system/mgetty.service):

[Unit]
Description=Smart Modem Getty(mgetty)
Documentation=man:mgetty(8)
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service

[Service]
Type=simple
ExecStart=/sbin/mgetty -D -s 115200 /dev/ttyS0
Restart=always
PIDFile=/var/run/mgetty.pid.ttyS0

[Install]
WantedBy=multi-user.target

And of course:

# systemctl start mgetty.service
# systemctl enable mgetty.service
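
With more than one modem line, the same unit can be generated per serial port instead of being copied by hand. This is only a sketch: write_mgetty_unit is a hypothetical helper, and it simply reproduces the unit above with the port and speed filled in.

```shell
# write_mgetty_unit DEST TTY SPEED
# Write a copy of the mgetty unit to DEST for serial port TTY (e.g. ttyS1)
# running at SPEED. The unit text itself is unchanged apart from those values.
write_mgetty_unit() (
    cat > "$1" <<EOF
[Unit]
Description=Smart Modem Getty(mgetty) on $2
Documentation=man:mgetty(8)
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service

[Service]
Type=simple
ExecStart=/sbin/mgetty -D -s $3 /dev/$2
Restart=always
PIDFile=/var/run/mgetty.pid.$2

[Install]
WantedBy=multi-user.target
EOF
)
```

After a unit has been written into /etc/systemd/system/, `systemctl daemon-reload` makes systemd pick it up, and `systemctl status` or `journalctl -u` on the unit shows whether mgetty stays running.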

I hate systemd! I really do. But I'll learn about it. It has spread to all reasonable distros like a nasty infection, so we have to learn to live with it, I guess.

Thursday, December 11, 2014

Debian's driver for Intel GPUs is shit

After a few years spent with Ubuntu and Linux Mint I decided to give Debian Jessie a try this week. And I have to admit that I also wanted to see systemd in action just to assess it myself and see whether it has some potential to do something good or otherwise.

But Debian Jessie with either the Cinnamon or the MATE desktop was unusable on my ThinkPad X301. The graphics were sluggish. Video playback in VLC was apparently losing one third of the frames even in lower-quality movies. The mouse wheel on an external USB mouse was lagging. The keyboard had a visible lag between a key press and the letter appearing in the terminal/text editor/whatever. The experience was really horrible. Wtf?

Well, at first I suspected the Cinnamon desktop, which I had installed first. I thought that it might eat up the CPU and put too much strain on the GPU. This assumption proved wrong, so the problem was apparently somewhere in Xorg. Sluggish video together with lagging input puzzled me: how could one problem possibly cause both? Well, it is possible in Xorg, because one driver can slow down the whole thing and cause problems even in other subsystems.

After a few hours of tuning this and that I checked the version of the Intel driver and... well, the driver is old, deprecated shit. And it is present in all current Debian versions: stable (that's a joke :-) ), testing (which is frozen, so it is going to be "stable", LOL), and unstable (at least the name suggests that it does not work properly, which is completely true).

This blog post explains everything: http://blogs.fsfe.org/the_unconventional/2014/11/12/debian-x-drivers/

Just for reference: That's what I have.

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device 20e4
 Flags: bus master, fast devsel, latency 0, IRQ 47
 Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
 Memory at d0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 1800 [size=8]
 Expansion ROM at  [disabled]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 3
 Kernel driver in use: i915

And my solution was:
echo "deb http://ftp.debian.org/debian experimental main" >> /etc/apt/sources.list
apt-get update
apt-get -t experimental install xserver-xorg-video-intel
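
Since `>>` appends the line every time it is run, a small guard keeps sources.list free of duplicates. A minimal sketch; add_apt_source is a hypothetical helper name, not an apt tool:

```shell
# add_apt_source LINE FILE
# Append LINE to FILE only if an identical line is not already there,
# so running it twice does not duplicate the entry.
add_apt_source() (
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
)
```

It would be called as `add_apt_source "deb http://ftp.debian.org/debian experimental main" /etc/apt/sources.list`, and `apt-cache policy xserver-xorg-video-intel` afterwards shows which version is installed and where it came from.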

Wednesday, December 10, 2014

OpenHantek udev rulez

I have a small and cheap Hantek DSO-2090 USB oscilloscope... And, well, the HW is an old, cheap Chinese box that looks extremely ugly. I am not brave enough to look inside, and I do not expect anything good there. But it was cheap enough, and when I was moving I really needed to get rid of my old Russian 15 kg, 0.5 cubic meter CRT oscilloscope, so I decided to buy this one. It was a conscious choice and it works pretty well for me, since I am doing only basic electronic measurements here.

But I migrated to a newer Debian system so I had to rebuild my OpenHantek SW, which was pretty painless. Many thanks to the author of this blog post: http://verahill.blogspot.cz/2012/12/298-hantek-dso-2250-usb-with-openhantek.html

But there was still a problem with udev, which failed to change the group and access rights of the /dev/bus/usb/... file, so I was unable to use OpenHantek as an ordinary user. The cause is obvious: the old udev rule file used SYSFS instead of ATTR. This is my working version:


# Hantek DSO-2090
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="4b4/2090/*", RUN+="/sbin/fxload -t fx2 -I /usr/local/share/hantek/dso2090-firmware.hex -s /usr/local/share/hantek/dso2090-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="04b5", ATTR{idProduct}=="2090", MODE="0660", GROUP="plugdev"

# Hantek DSO-2100
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="547/1006/*", RUN+="/sbin/fxload -t an21 -I /usr/local/share/hantek/dso2100-firmware.hex -s /usr/local/share/hantek/dso2100-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="0547", ATTR{idProduct}=="1002", MODE="0660", GROUP="plugdev"

# Hantek DSO-2150
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="4b4/2150/*", RUN+="/sbin/fxload -t fx2 -I /usr/local/share/hantek/dso2150-firmware.hex -s /usr/local/share/hantek/dso2150-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="04b5", ATTR{idProduct}=="2150", MODE="0660", GROUP="plugdev"

# Hantek DSO-2250
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="4b4/2250/*", RUN+="/sbin/fxload -t fx2 -I /usr/local/share/hantek/dso2250-firmware.hex -s /usr/local/share/hantek/dso2250-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="04b5", ATTR{idProduct}=="2250", MODE="0660", GROUP="plugdev"

# Hantek DSO-5200
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="4b4/5200/*", RUN+="/sbin/fxload -t fx2 -I /usr/local/share/hantek/dso5200-firmware.hex -s /usr/local/share/hantek/dso5200-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="04b5", ATTR{idProduct}=="5200", MODE="0660", GROUP="plugdev"

# Hantek DSO-5200A
SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", ENV{PRODUCT}=="4b4/520A/*", RUN+="/sbin/fxload -t fx2 -I /usr/local/share/hantek/dso520a-firmware.hex -s /usr/local/share/hantek/dso520a-loader.hex -D $env{DEVNAME}"
ATTR{idVendor}=="04b5", ATTR{idProduct}=="520A", MODE="0660", GROUP="plugdev"

This assumes that the *.hex files generated by openhantek-extractfw are in /usr/local/share/hantek.
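
The SYSFS to ATTR change mentioned above is mechanical, so an old rules file can be converted with a one-line filter. A minimal sketch; sysfs_to_attr is a hypothetical helper name, and it assumes the key only ever appears in the form SYSFS{...}:

```shell
# sysfs_to_attr
# Read udev rules on stdin and write them to stdout with the obsolete
# SYSFS{...} match key renamed to ATTR{...}. Everything else is untouched.
sysfs_to_attr() (
    sed 's/SYSFS{/ATTR{/g'
)
```

The converted output goes into a file under /etc/udev/rules.d/; `udevadm control --reload-rules` followed by replugging the scope makes udev apply it.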

Thursday, April 12, 2012

Buggy Xfce xkb applet loses keyboard layout config

I have been hit by a nasty Xfce xkb applet bug, which is widely known in several bloody bugzillas (RH/CentOS, Ubuntu, Debian, ...) but seems to be unresolved so far. The background of the bug is that every once in a while the applet loses its configuration, especially the configured keyboard layouts and the shortcut for switching between them. This may seem random, or it may seem to coincide with suspending and waking up the computer. Well, it is not random: it comes down to connections and disconnections of an external (or even internal) USB keyboard, which may of course be triggered by suspend/wakeup.

So it seems I have the cause isolated. Now, what is the solution? Well, the applet is part of the so-called Xfce Goodies, which is a sort of external project. I did not find a corresponding bugzilla or mailing list. But I think the author is probably aware of this behavior, so there is no need to shout at him. Still, I needed a workaround, and I think I have found one. It is simple and Linux-like: just configure all your keyboards in an InputClass section of the xorg.conf file and then use the applet only as an indicator and switcher.

I am using the following xorg.conf snippet on my ThinkPad X301, which I keep connecting to and disconnecting from a USB keyboard:

Section "InputClass"
    Identifier "keyboard defaults"
    MatchIsKeyboard "on"
    Option "XkbLayout" "us, cz"
    Option "XkbVariant" ", qwerty_bksl"
    Option "XkbOptions" "grp:alt_shift_toggle, terminate:ctrl_alt_bksp"
EndSection
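
Whether the layouts survive replugging the keyboard can be checked from a terminal with `setxkbmap -query`. The sketch below only picks one field out of that output; xkb_field is a hypothetical helper name, and it assumes the output consists of one "name: value" pair per line.

```shell
# xkb_field NAME
# Read "setxkbmap -query" output on stdin and print the value of field NAME
# (for example layout, variant or options).
xkb_field() (
    awk -v k="$1:" '$1 == k { print $2 }'
)
```

Running `setxkbmap -query | xkb_field layout` before and after plugging in the USB keyboard should print the same layout list if the InputClass section is being applied.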

Desktop hell

Well, after some time I am back with my reckless criticism of insane (but open-source) desktops. Basically, what can you choose when you want a Linux desktop? Plenty of things, of course, but when you need to get things done and do not want to play with your own xrandr, xkb and xinput scripting, or with printing, keybindings for multimedia keys etc., you will probably need some desktop environment. And there are quite a lot of such projects (look at Wikipedia): KDE, Gnome3, Xfce, LXDE and Unity, to name some.

Well, what you will find is that there is a huge hard core of Gnome3 and Unity haters. People just hate them because these desktops are anything but useful and intuitive. Both resemble some of the iPad madness, and there are rumors on the web that Unity is actually designed for tablet computers and that its use on the desktop is a sort of side effect of development while Canonical (the author of Unity and Ubuntu) negotiates deals with tablet manufacturers. Pity. In fact these desktops force you to do things their way (different from what we are all used to from previous generations of desktops, from Win 3.11 up to Gnome 2.32), and they put obstacles between you and your productivity apps. And I concur with all these objections against these two desktops in particular.

There is of course KDE. KDE is a long way from my idea of a decent desktop. I like some aspects of it, but it lacks what I need: a really fast desktop (with or rather without eye candy, but really fast!), easy-to-use virtual desktops, an app panel, a good application switcher, and support for docking/undocking (which means changing the screen resolution, switching the LVDS on/off etc.). That leaves me with Xfce.

But the current stable Xfce (on Debian Wheezy, to be specific) is far from easy to use. I had to tweak it deeply to make it usable. I had to cope with four severe problems. First, NetworkManager (+ NM Applet) was unable to connect to a configured network (solved). Second, there are nasty sounds set in different applications; the most annoying are the terminal bell in gnome-terminal and the gdm3 greeter sound (turned off easily). The other two problems are more complicated and I am still working on them. The third is the Xfce xkb applet, which loses its configuration (I mean the configured layouts, the layout-switcher keyboard shortcut etc.) each and every time you connect or disconnect a USB keyboard, which effectively means it loses the configuration each time you dock or undock your laptop. And the last problem is automatically switching the desktop resolution and turning the LVDS on or off according to the presence of another screen connected via DisplayPort. I have written some scripting workarounds to switch this semi-automatically, but I am not proud of them at all.

Next time I am going to post the mentioned tweaks for the two already-solved problems on my Debian Wheezy + Xfce desktop.

Tuesday, September 27, 2011

hpacucli: Error: No controllers detected. with hpsa and SmartArray P410i

I have encountered a fucking weird problem with hpacucli (HP's utility for their RAID controllers) on a ProLiant DL360 G7 or whatever. The controller was an HP Smart Array P410i and the problem was simple:

root@XYZ:~# hpacucli 
HP Array Configuration Utility CLI 8.70-8.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.

=> ctrl all show

Error: No controllers detected.

The solution is: Load the sg driver and that's it:

root@XYZ:~# modprobe sg
root@XYZ:~# hpacucli 
HP Array Configuration Utility CLI 8.70-8.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.

=> ctrl all show

Smart Array P410i in Slot 0 (Embedded)    (sn: 500143801630C980)
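
Before starting hpacucli it is easy to check whether sg is already loaded. A minimal sketch; module_loaded is a hypothetical helper that just reads the module list the kernel exposes in /proc/modules:

```shell
# module_loaded NAME [FILE]
# Succeed if kernel module NAME is listed in FILE (default /proc/modules).
module_loaded() (
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
)
```

It can be used as `module_loaded sg || modprobe sg`. On Debian, adding a line containing `sg` to /etc/modules loads the driver on every boot.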

Saturday, August 27, 2011

Test driven service operation vs. Nagios et al.

I am constantly thinking about handling network and server outages. This was my focus a few years ago, when I worked as an admin/op in a small hosting company and was bombarded by SMSes from our home-brewed server/service monitoring system written in Perl. The system had a bunch of drawbacks, so we decided to replace it with Nagios (3 dot something). It has drawbacks as well, and I would say even more serious ones in some cases.

I was responsible for the Nagios migration, but I am not an op anymore, so I have less experience with it. From my point of view, and being aware of my limited insight, I can describe some drawbacks of Nagios 3 (= Nagios Core, simply the OSS version you get when you apt-get install nagios3 on top of Debian...).

The first one and probably the most severe: the configuration is complicated by nature. In addition, Debian forces or strongly suggests some ideas about how you should write the config files. And when you try to do so, you have to read a manual which does not answer all the questions. For example: are multiple parents in the dependency tree of hosts/services in an AND, OR or whatever relation? You can find lots of such small questions and eventually google some answers, dig into the documentation or whatever, but it takes time. Anyway, writing Nagios configs takes time and one should ask oneself: why? Of course you can use the Swiss-made NConf to turn writing configs into clicking configs in a web interface, but that is not a real improvement. Why can't it be automatic? Let's say the system could auto-discover hosts and test-run all the service tests known so far. If some test results in OK, it can suggest that test. It should be able to clone hosts, categorize hosts, make exceptions etc. On the other hand, I prefer text config files over some sophisticated database schema...

Another thing is that sending alarms should be smarter than just triggering scripts that send mail to contacts and messages to pagers, or to cell phones in modern days. Well, I like the idea of a master alarm. I would like to be able to mark some alarms as not crucial for business/system operation, have them listed in the web interface and reports, and have the ops alerted about them in a less aggressive way. I would like to be able to set a threshold for sounding the master alarm and then send this master alarm once or N times, without flooding the ops with hundreds of different and probably correlated errors. And I would like a permissive and easy-to-use system, not one which does not let me acknowledge all reported errors and, when they are not acknowledged, bothers me with SMSes over and over. I would like the system to accept my input and respect what I do or do not want to save, not like Nagios -> Acknowledge -> Error: You have to write a comment. Wtf? I have a major network problem and I want to investigate what is going on, not write stupid comments, especially when I do not know what to say yet. I just want to stop the SMSes from bothering me.

I would like to have a monitoring cluster able to monitor the network/servers/services from several locations and give me an overall report. I would like to be able to write my own triggers on errors/warnings to report more complex situations. Let's say I have a cluster of 10 load-balanced servers and I know that 5 servers would be sufficient. It would make sense not to send an alarm during the night when one of these 10 servers goes down. But it does make sense to send an alarm if only 6 or fewer servers remain operational. Even more complex situations could be described, and it would be nice to be able to set up such triggers easily.
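
The 10-server example can be written down as a tiny check that uses the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). This is only a sketch of the trigger logic: cluster_state and its arguments are made up for illustration, and counting the live servers is left to whatever already monitors them.

```shell
# cluster_state UP TOTAL ALARM_AT
# Map the number of servers still up to a Nagios-plugin-style exit code:
#   0 (OK)       all TOTAL servers are up
#   1 (WARNING)  some are down, but more than ALARM_AT remain
#   2 (CRITICAL) ALARM_AT or fewer remain, time to wake somebody up
cluster_state() (
    if [ "$1" -ge "$2" ]; then
        exit 0
    elif [ "$1" -gt "$3" ]; then
        exit 1
    else
        exit 2
    fi
)
```

With `cluster_state 9 10 6` the result is a warning that can wait until morning, while `cluster_state 6 10 6` is critical. Only the critical state would be wired to the SMS gateway.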

And I would like to have an overview of my system. I want to see what is going on and what happened in the past, and to write down afterwards how I solved the problem, to have an op's log and a tip for next time.

I think that technically it should be relatively easy to run a few thousand tests each minute on a decent Intel server, not to mention parallelization. Then an idea comes up: we have the paradigm/style/philosophy of test-driven development. Why not have test-driven system operation? I think there are two arguments against it: the complexity of configuration, and complications with data acquisition and interpretation, i.e. people fear that it would be more complicated to answer the question "what is broken?". But I believe that both cons are only drawbacks of current software. Discussion will be appreciated.

Wednesday, November 17, 2010

Hibernation on Debian unstable (sid) sucks badly

As the header says... it sucks completely. Basically, what do you need to make hibernation work on a pretty decent laptop, like a ThinkPad X301 in my case? First you need a running Debian, of course, a swap partition of sufficient size (= greater than your RAM :-)), and the packages hibernate and uswsusp installed. Then try it... For instance run s2disk or click some button on your Gnome/KDE/... desktop. It should hibernate (some percentage growing, disk working, and then it is off), hopefully.

When you turn it on you may see the message:
Invalidating stale software suspend images
and then the system boots from scratch as if it had been rebooted... What the fuck? Well, in my case the

resume=swap:/dev/sda2

line was missing in /boot/grub/grub.cfg.

Well, you can add it there by hand. (Why is it not there by default? Well, I am not a Debian guy, so I do not even know where to file the bug, but I am providing a hack. Stone me.) Just add something like this line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet resume=swap:/dev/sda2"

to the file /etc/default/grub and run update-grub to regenerate /boot/grub/grub.cfg.

That's it. It worked for me.
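
The swap-size precondition from the beginning can be checked without any arithmetic by hand. A minimal sketch; swap_exceeds_ram is a hypothetical helper that compares the MemTotal and SwapTotal lines of /proc/meminfo:

```shell
# swap_exceeds_ram [FILE]
# Succeed if SwapTotal is larger than MemTotal in FILE (default /proc/meminfo).
swap_exceeds_ram() (
    awk '/^MemTotal:/  { mem = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { exit !(swap + 0 > mem + 0) }' "${1:-/proc/meminfo}"
)
```

For example, `swap_exceeds_ram || echo "swap is too small for hibernation"`. After the next boot, `grep -o 'resume=[^ ]*' /proc/cmdline` shows whether the resume parameter really reached the kernel.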