Writing enterprise ready software
http://www.hungry.com/~pere/mypapers/enterprise-software/enterprise-software.html
Petter Reinholdtsen
pere@hungry.com
Debconf5, Helsinki 2005-06-12
Overview
- we are in trouble
- clues for the clueless
- multilevel configuration
We are in trouble
Some things are possible for 1 to 10 machines, and impossible with
500 machines. What do you do when you break the ssh configuration
file on 700 machines?
Trouble moving
With 60000 users and about 150 home directory file servers
available from 12000 machines, users move from file server to file
server. This breaks several applications when the path to the user's
home directory changes (example: /mn/hegel/u1/pere to
/usit/saruman/u1/pere).
Downgrade trouble
With 900 Linux machines sharing a common user database and home
directories while running different versions of programs, users will
run several versions of a program with the same configuration
files.
Some old trouble
Some users lost the source of their production systems, and need
the binaries to keep working for 10-15 years.
No room for more trouble
When the file system for the PostgreSQL database in production fills
up, one does not want to kick out 30000 users to take down the
database and resize the file system.
Disk trouble
RAID is only useful until the last redundant disk is lost.
Automatic RAID status monitoring needs an API or command line tools
to extract the status, unlike afacli, which goes into interactive mode
when an error is detected.
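
As an illustration of what a monitoring-friendly status tool could look
like, here is a minimal sketch in Python. It never prompts, and reports
through an exit code and a one-line summary. The function name, the
"ok"/"failed" state strings and the redundancy parameter are all
hypothetical; the exit codes follow the common Nagios plugin convention
(0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
def raid_exit_status(disk_states, redundancy):
    """Map per-disk states to (exit code, one-line summary).

    disk_states: dict of disk name -> "ok" or any other state (hypothetical).
    redundancy:  number of disks the array can lose and keep working.
    """
    if not disk_states:
        return 3, "UNKNOWN: no disks reported"
    failed = [name for name, state in sorted(disk_states.items())
              if state != "ok"]
    if not failed:
        return 0, "OK: %d disks healthy" % len(disk_states)
    summary = "failed disks: %s" % ", ".join(failed)
    if len(failed) < redundancy:
        # Some redundancy is left, but someone should replace the disk.
        return 1, "WARNING: " + summary
    # The last redundant disk is gone (or worse): act now.
    return 2, "CRITICAL: " + summary
```

A wrapper script would print the summary and pass the code to
sys.exit(), so cron jobs and monitoring systems can react without a
human at the keyboard.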
Installation trouble
Trying to compile/install software on Irix, Solaris, Linux, HP-UX,
Tru64 Unix, MacOSX and AIX when the process requires a sysadmin to sit
around to answer questions, change CDs, or insert licenses is both
painful and prone to errors.
Network trouble
Trying to get a network server to work when it requires a given
port range that is already taken by another service and blocked in
the router, or trying to get the corporate network gatekeeper to open
up the firewall.
Version trouble
Given three Tcl or PHP applications, is there one version of Tcl or
PHP usable with all of these?
Usability trouble
When starting a program from the menu, where do the error messages go
if nothing appears on the screen?
Do users always read their ~/.xsession-errors file?
Clues for the clueless
- at least three levels of config files: package defaults, site
defaults and host defaults
- never ask questions at compile time. when compiling automatically
for 10 platforms, a sysadmin does not want to sit down and answer
questions.
- split installation into two tasks: one for installing the files, and
one for the operations needing root access (the same goes for build
and configuration)
- make sure the software can be installed anywhere (location
independent), avoid hard coding paths into the binaries.
- make the source available to make it possible to fix problems on
site, and to use it on different platforms (os/hw) in the future
More tips
- make paths into users' home directories relative to ~user/, as users
will move from disk to disk, or copy their home directory from site
to site. always convert paths when saving config files.
- depend on as few libraries as possible, as it is a pain to get every
extra library in place
- use well known libraries instead of making your own implementation.
this reduces the security risk.
- make sure libraries and programming languages are backwards
compatible.
- use a well known license. it is a pain to evaluate every new
license.
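
The tip about home directory paths can be sketched as follows, in
Python. This is only an illustration under stated assumptions: the two
function names are hypothetical, and paths are assumed to be Unix style.
Paths are converted to the ~/ form when a config file is saved, and
expanded against the current home directory when it is loaded.

```python
import os

def to_home_relative(path, home=None):
    """Rewrite an absolute path as ~/... if it lies inside the home directory."""
    home = os.path.normpath(home or os.path.expanduser("~"))
    path = os.path.normpath(path)
    if path == home:
        return "~"
    if path.startswith(home + "/"):
        return "~/" + path[len(home) + 1:]
    return path  # outside the home directory: leave untouched

def from_home_relative(path, home=None):
    """Expand a ~/... path against the current home directory."""
    home = home or os.path.expanduser("~")
    if path == "~":
        return home
    if path.startswith("~/"):
        return os.path.join(home, path[2:])
    return path
```

With the move from the earlier example, a file saved as
/mn/hegel/u1/pere/docs/a.txt is stored as ~/docs/a.txt and later
resolves to /usit/saruman/u1/pere/docs/a.txt.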
Make it easier for everyone
- when distributing source, do not use vendor specific compiler
features. It will not work with the other vendors' compilers used to
compile on site.
- write portable code, make sure it works the same on all platforms.
- make the software work out of the box (require as little
configuration as possible).
- avoid resource leaks (memory, shared memory, locks, file
descriptors, X server resources, etc). Restarting a long-running
server is not always an option.
- system services should send messages to syslog. always log the
reason when crashing. always log problems and errors.
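
One way to make sure a service logs why it crashed is to wrap its entry
point, as in this minimal Python sketch using the standard syslog
module. The wrapper name run_service and the message format are
hypothetical; the log parameter exists only so the wrapper can be
exercised without a running syslog daemon.

```python
import syslog
import traceback

def run_service(main, log=syslog.syslog):
    """Run main(); on an unhandled exception, log the reason and return 1."""
    try:
        main()
    except Exception as exc:
        # First the short reason, so it is easy to find in the log...
        log(syslog.LOG_CRIT, "fatal: %s: %s" % (type(exc).__name__, exc))
        # ...then the traceback, one syslog message per line.
        for line in traceback.format_exc().splitlines():
            log(syslog.LOG_DEBUG, line)
        return 1
    return 0
```

A real service would probably call
syslog.openlog("foo", syslog.LOG_PID, syslog.LOG_DAEMON) once at
startup (the ident "foo" is a placeholder) and pass the return value of
run_service() to sys.exit().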
Final clues
- reuse configuration when possible. ktouch has its own X keyboard
layout setting. better to fetch the current one from X, like xkeycaps
does.
- provide hooks for the local administrators
- reduce flexibility. trying to support people over the phone when
the GUI is different for every person is a pain.
- do not try to cleverly find the final resting place of the installation.
Solving the upgrade problem using multilevel configuration
- local configuration should be kept during upgrades
- do not change configuration file format
- easiest to do if the local configuration is separate from the
package default
- several actors want to have a say in the service
configuration. allow them to have their own files
- Example: read config from /usr/share/foo/config,
/site/share/foo/config, /etc/foo/config, ~/.foo/config,
/etc/foo/config.fixed, /site/share/foo/config.fixed,
/usr/share/foo/config.fixed.
- make it possible to provide package, site, host and user
defaults, as well as locking down features on a host, site and
package level.
- it is always well known where the admin made the changes
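
The example file list above can be read with a loop like the following
Python sketch. It rests on assumptions not stated in the slides: the
files are read in the listed order with later files overriding earlier
ones (so the .fixed files lock settings down), and the format is simple
key=value lines with # comments. The function names are hypothetical.

```python
import os

# Defaults first (package, site, host, user), then the lock-down files
# in reverse order, so the package level gets the final word.
SEARCH_ORDER = [
    "/usr/share/foo/config",
    "/site/share/foo/config",
    "/etc/foo/config",
    "~/.foo/config",
    "/etc/foo/config.fixed",
    "/site/share/foo/config.fixed",
    "/usr/share/foo/config.fixed",
]

def parse_config(text):
    """Parse key=value lines, skipping blanks, comments and junk."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def load_config(paths=SEARCH_ORDER):
    """Merge all existing files in order; later files override earlier ones."""
    settings = {}
    for path in paths:
        try:
            with open(os.path.expanduser(path)) as handle:
                settings.update(parse_config(handle.read()))
        except FileNotFoundError:
            continue  # every level is optional
    return settings
```

Because each level lives in its own file, a package upgrade can replace
/usr/share/foo/config without touching the site, host or user settings.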
Thank you very much
Questions?