RTS2 - the Remote Telescope System

RTS2 is an open source observatory manager. It was written from scratch in the C++ language, with portability and modularity in mind. Its driving requirements originated from quick follow-ups of Gamma Ray Bursts. After some years of development it is now used to carry tasks it was originally not intended to carry. This article presents the current development status of the RTS2 code. It focuses on describing strategies which worked as well as things which failed to deliver expected results.


Project goals and its development
The original goal of the RTS2 development was to produce software for a small robotic telescope, devoted to study Gamma Ray Burst (GRB) transients. The requirements of the project were: -must be fully autonomous -must react as fast as possible to incoming GRB alerts -must perform regular observations during periods with no GRB observation to verify system readiness -must be modular to enable easy switching of the instruments Those requirements were laid in early 2000. They share similarities with other projects, notably AudeLA [8] and the Liverpool Telescope System [6]. At the time, open-source projects of similar scale did not exist. Available commercial, closed-source solutions were either associated with a single instrument, with questionable portability to others telescopes, or did not fulfill the requirement for fast switching between targets. It is worth noting that even today it is not easy to implement this requirement. It forms a strong entry barrier for those who would like to become GRB observers.
As the project advanced, additional requirements were added. They reflected experiences gained during development, and particularly pains and problems associated with early porting of the system to the other observatories: -system must be robust enough to continue operations even when noncritical part(s) fail -observer must have possibility to remotely interact with observations -system must be fully configurable using configuration files -system must provide clear description of what its components are performing -observer must be presented with a list of devices which failed -the code should include dummy device drivers for testing the software without hardware Creation of dummy environments brings great benefits during system development. They enable developers to debug the system before it is deployed. There are still cases when errors and bugs are detected only during night runs. But as the project matures, the number of those cases significantly decreases -and there are recorded cases when new software, with significant new features, was deployed and it just worked, without any debugging.
As the number of observatories running RTS2 grows, their management, fixing various small glitches, as well as their scheduling and data processing started to saturate staff time. We are aware of this, and we are in progress of creating tools which will allow us to manage network operations using a smaller amount of operators' time.

RTS2 quick overview
RTS2 is based on the "plug and play" philosophy. The parts which constitute the system can be started, restarted or stopped anytime, without affecting system performance. Special care is taken of resolving all possible blocking states, so the system will always respond to requests in reasonable time.
The whole system is user-space based and except for drivers provided either by hardware manufacturers or by our group, it does not require any kernel-space based components.
Code is designed around a central select system call, which picks any incoming messages. If there is no incoming message, class idle method is called. The code uses extensive hierarchy of its own C++ classes.
The RTS2 system consists of the processes summarised in table 1. Figure  1 shows processes and connections used on an example observatory, which includes three CCDs, two domes and a single weather station serving both domes. For a detailed description please see [9]. Table 1. RTS2 processes Process Description centrald Central component of the RTS2 system. It provides three main servicesa list of devices and services present in the system, system state changes and synchronisation among devices devices Corresponds to hardware attached to the observatory system. Different classes of devices are provided. They represents hardware coming from different manufacturers. services Represents execution logic of the system. They provides functions for the end user -carry observations, receive GCN and other alerts, enables XML-RPC 3 based access to the system. clients Provides information to end-user. They are usually run in interactive mode, with end-user interacting with the programm. They include an interactive monitor and simple tools to execute observation scripts.

Communication protocol
RTS2 employs its own communication protocol. A detailed protocol description can be found in [10]. The protocol is based on sending ASCII strings over TCP/IP sockets. It is fast, simple and robust. Among its important features are: -sending of "I am alive traffic", disconnecting connection if reply is not received -this removes connections to dead components Fig. 1. Example RTS2 environment. Two basic setups are present on the sitea telescope with a guiding CCD and a main CCD equipped with a filter wheel, and an all sky camera. There are two domes (one for telescope, second for all sky camera), three CCD detectors, executor and selector services controlling telescope setup and scriptor, a simplified executor service, controlling the all sky camera. Both observatories offers external access through XML-RPC protocol, provided by XMLRPCD. This is used by the Graphical User Interface and the Web server.
XMLRPCD also provides a Web browser with direct access to some functions. Access to XML-RPC and Web functions can be protected with a password.
-ability to switch to binary mode for data transfer, to transfer images and other large data items -traffic speed-up -values which were not changed are not transported The protocol supporting libraries provides developers with a flexible and simple way to use common parts of the system. This results in a robust, transparent code, which can be easily extended. This greatly enhances software reuse. There are reasonable hopes and claims that the interfaces are simple enough to be understand by developers unfamiliar with the project.

Coding style & philosophy
It is widely rumoured that there are as many approaches to coding as there are software developers. This section aims to list rules and customs used during RTS2 development. Although most of them are widely known we hope they are worth mentioning.
The best documentation for the code is the code itself. Readers of this article are encouraged to check out the system and study it. Code with complexity of the RTS2 project cannot be made completely transparent. It is not expected that the developer will understood code on the first encounter. It is important that he/she find interfaces being used. Later he/she can continue progressing towards system core classes.

Track software changes in Version Control System
The project is tracked in Subversion 4 tracking system. Previously version tracking relied on Concurrent Version System (CVS) 5 . We have to admit that switch of the version control system to Subversion provides significant improvements, and we were really surprised with Subversion capabilities. Subversion really provides simply resolution for situations which were difficult or impossible to handle with CVS.

Code releases
As the project is still under active development, a release formalism is yet not well established. There are usually two big releases during a year. All observatories are running Subversion code -code is regularly updated from Subversion and put to use. This way version management is also used for software distribution. We expect to calm this pace and establish a formal release mechanism.

Development cycle -release early
Changes are usually committed to Subversion as soon as they compile without any errors. Before each commit, difference between new code and code in repository is reviewed. Sense and purpose of committed lines is examined once again.
This approach tracks vast majority of bugs right before they make it to the version control repository. After all changes are committed, and at least some documentation which explains new features is provided, behaviour of the code is tested once again. It is up to developer, where tests will be performed. Usually the system is tested on dummy devices, before moving to a real observatory. But small changes, which should introduce predictable results, are tested directly on observatories.

Follow common practices
The common practices can be summarized by a few sentences, such as: Think twice, code once. Discuss with others, inform users about changes. Try to design a generic solution instead of a simple additions for new problems. Add new features slowly and test them before implementing another. Divide a complex problem to simpler subproblems, implement and test the solution for them first, and then integrate the code to solve the complex problem. Keep in mind that the best developers are able to design, code, test, and release no more then 100 lines per working day -try to keep number of lines small by reusing what is already available either in your code or in C++ libraries.

Current developments
This section deals with features which were recently added to RTS2. The list provided bellow is not complete -please see project change-logs for a more in-depth description.

State machines
RTS2 uses state machines. The states represents various states of the hardware -for example camera can have idle, exposing and readout state.
The states were originally used for coding purposes, to distinguish various states of the code. They were displayed in monitoring applications, so the user was informed what the device should do.
Later state use was expanded toward synchronisations. States prevent the camera from taking exposures during telescope movement, and unwanted telescope movements during camera exposures. They are displayed on users displays, allowing observes to identify which device is blocking the next exposure or the next telescope movement. The use of states for synchronisation is depicted in figure 2.

Default configuration
Originally RTS2 did not change configuration of devices prior to observations. That leaves the user responsible for device configuration. He/She can do that using either a monitor application or a script for observation.
This handling was later extended by RTS2 remembering default values. The command script ends was added to the protocol. It loads back all changed values.
This code was successfully used for focusing. The system was able to change back to the last known good focuser position even when a focusing run was interrupted. But it starts to show its bottlenecks, notably: Interactions of devices using states for synchronisation. Note how various block are placed to keep telescope from moving and camera from exposing. The observation is interrupted by bad weather reported from a weather sensor, which closes the roof.
-unpredictable behaviour for the end user -he/she changes something, and suddenly another value appears in acquired images -unknown default values -system unfortunately does not show which values were changed -the system was quite hard to code, and as a consequence, quite hard to maintain Various solutions were considered to resolve those bottlenecks. The following was selected as the best one: -when appropriate, the system offers default value and various offsets. The default value can be changed only on user request, and is never changed by the system. Offsets can be changed by user and/or system. Offsets are zeroed before beginning of the next observation. Both cases can be best understood from an example. The first case can be best demonstrated on the new focuser interface and the second on handling mode settings on CCD.

Offsets, default and target values
The interface provides end user with the following values: current value, which is read from focuser target value, equal to sum of values listed below default value, set by the user focusing offset value, settable by the user temporary focusing offset value, settable by the user, zeroed at the end of the observing script Changes of the last three values are used for user focuser interaction. The default value is changed when the user would like to significantly change focuser position. For example the offset can be set for different optical thickness of the filters. And temporal focusing offset is used in the focusing script to change the focuser position during focusing runs.
If a focusing run is interrupted, the focusing script finished, or the focusing program is disconnected, the temporal focusing offset is set by focuser driver to zero. Then a focuser new target value is calculated and used.
Similar control is used for the telescope. There offsets are used to command dithering from the target position.

Camera acquisition modes
New modern cameras provide various settings. Some of the combinations of settings are very good for astronomy, while some are pretty bad. A desired behaviour of the system is to allow the user to set only good settings, avoiding the bad ones, while still leaving an option to set all variables manually.
A desired behaviour of the system will also switch the camera at the end of the script to a default mode so that the next script will be able to use camera, without changing any variable. This functionality is provided by setting the camera mode back to default at the end of the script. A camera mode file provides various modes. The first is the default one, which is used to set camera. An example configuration file is included in the software distribution.

Observatory scheduling
It is widely known that observatory scheduling is not a trivial problem. Scheduling is known to belong to NP-hard class of problems, which does not make the problem easier. On top of that, it is not always clear what exactly should be observed in order to maximise the scientific output of the observatory.
RTS2 currently provides users with three scheduling modes: dispatch scheduling, queue scheduling and preprogrammed night plan. The advanced, genetic algorithm scheduling, which should be able to schedule full night runs, is in the late integration phase. The scheduling modes are described in the following sections.

Dispatch scheduling
Dispatch scheduling is the kind of scheduling used on most, if not all fully autonomous observatories. Each observable target is assigned a merit function -its expected benefits. Dispatch scheduling then calculates merits for all targets, picks the one with the best merit, and observes it. After this target is finished, the scheduler recalculates merit functions, and picks a new target. This approach is discussed by multiple authors, notably [7] and [14]. RTS2 provides this mode in the selector component.

Queue scheduling
Queue scheduling schedules targets with the human -night observer in the loop. The night observer is presented with a list of possible targets, which are worth observing. Based on the current conditions, instrument setup and astronomer preferences, he/she picks a queue and executes its observation. After the target is finished, a new target is picked from the queue, or the queue is changed by the observer. This scheduling is used by current big observatories -for example by ESO VLT [1], CAHA and IAC telescopes.
The observer can fill a queue of targets in the RTS2 executor component. There are plans to provide automatic selection from those targets -so the observer enters them in the evening, and the system will observe them during the night at optimal weather and time.

Night plan
Another option is to provide a detailed night plan, which will list sequence of observations, their time and how images should be handled. Although RTS2 does provide support for night plan, it does not provide any interfaces for plan creation and its management. The observer must fill a text file and load it into a database. Because of the complexity of this operation and lack of support tools this option is not widely used.

Genetic Algorithm based scheduling
None of the scheduling modes discussed above is optimal. Dispatch scheduling lacks predictability and is short-sighted -it produces schedules which search for a local maxima, instead of focusing on long-term gains. Queue scheduling and night planning requires non-trivial involvement of the observer.
We proposed and implemented scheduling based on genetics algorithms (GA). The problem is described in full depth in [11]. Here is outline of this approach.
Observation targets can list various constraints and merits. The algorithm then searches for the Pareto optimal front[12] -a set of schedules which does not violate any constrains and their respective merit functions are at least comparable to the other best schedules. The search for this front is performed with NSGA-II [2].
Experimental implementation of GA scheduling is part of the current RTS2 developer branch. The main benefits of GA scheduling are a simple addition of new constraints and merits. Ease of its reuse is demonstrated in [3].
The plan is either to provide the observer with Pareto front schedules and let him/her pick the one he/she likes, or to have system select autonomously during night the path which will be followed.

Weather blocking
It is very important to close the observatory roof when conditions are hostile for its normal operations and keeping it closed as long as those conditions prevail. Our records clearly indicate that having the roof open with bad weather over it produces significant problems as soon as the wrong roof state is detected.
The following sections deal with this problem. First the original approach is described. This is followed by a detailed description of current hardware and software setup, which is protecting equipment against the elements.

Original weather reporting
Keeping track of observatory conditions was originally a job of the dome module. It was regarded as a single point of failure. Dome code was handled with extreme care, and dome software was carefully tested after each software upgrade. Each device which might indicate bad weather sends information about weather state directly to the dome module. The dome module puts them together, decides if weather is favourable for the observation, and reacts accordingly -if all conditions were satisfied, it switched the system to "on" mode. If a single condition was not met, the dome control module tried to close the roof and switch the system to "standby" mode.
This solutions has the following problems: -dome software testing is time consuming -in case of a serious computer hardware problem the dome can be left open -every new weather sensor requires modifications to dome control software and extensive testing before being put in operation

Design of the state based weather handling
As different new sensors were added to the observatory setup, it becomes clear that weather reporting deserves special attention. The following requirements were put for the new algorithm: -it must be easy to add new sensors for weather reporting -it must be possible to configure the system to include various sensors -the system must report bad weather in the event of a sensor failure Apart from those software requirements, additional requirements were put on the hardware responsible for roof operations: -it must run independently from controlling computer -it shall be as reliable as possible -it shall be as simple as possible -it must close the roof on its own when it loses connection to its controlling computer -it must report to the computer the states of all sensors connected to it The software and hardware implementation which fulfills all those requiremens is described in the next section.

Current weather blocking
RTS2 centrald and all devices have a state. When changed, the state is propagated to all connections.
One of the bits in the state represents bad weather. If that bit is set, it means that the component (in case of device) or whole system (in case of centrald) reports violation of observing conditions, and therefore asks for controlled system shutdown.
Centrald also holds a list of devices which are mandatory for observation. If any of the mandatory devices is not present, or do not reply to centrald requests, the system is switched to bad weather state. This presents developers with a very simple way to add a new weather sensor to the system. They can use methods for weather state manipulation provided in SensorWeather class.
The ultimate weather protection element is on most systems a programmable logic controller (PLC). The inputs of this hardware are connected to the various sensors carring information about the roof state. The outputs of the relay are connected to the motors responsible for roof operations. The PLC is conservatively programmed. It is actually harder to open the roof then to close it.
PLC can be controlled manually with switches. For an image of actual roof control panel please see figure 3.
If the computer would like to open the roof it must send signals every few seconds to the PLC, to check that it is alive. The PLC is programed to close the roof if this "I am alive" signal is not received.
The critical failure points of this setup are: motor failure, failure of power wire to motors and signals to PLC, PLC failure or power failure.
We do not have capability to make motors redundant. PLCs are designed for harsh industrial conditions. The PLC program is relatively simple and is very carefully tested. As its installation requires the presence of the person installing it on site, it is not sensitive to usual bugs introduced by system upgrades. Power backup is provided by UPS capable of closing the roof. The UPS state is monitored by RTS2 and the roof is commanded to close if the remaining UPS uptime or battery level drops below a predefined values.
PLCs are also commodity value. In the unlikely event that the PLC is damaged or destroyed, a new can be purchased without significant problems on almost any place of the planet and installed within a few days from time the failure was detected.
Source code of the PLC program is available from RTS2 subversion repository. As was shown by installing it on multiple sites, this setup is fully replicable and quite modular. Fig. 3. Roof control box switches. Switching roof to manual or stop also stops autonomous observations. Unlike (graphical) user interfaces running on computers, it is very unlikely that this panel will fail. Watcher telescope, Boyden Observatory, Republic of South Africa. Table 2 provides list of the current RTS2 installations together with the year they started using RTS2 and current diameter of the primary optic. The full listing of RTS2 installations is provided on project web pages.

RTS2 installations
The system is currently being considered for various new and refurbished telescopes. Please note that 1.23m at CAHA is currently operated in semiautomatic mode, and RTS2 is not on controls only during a few nights. More details about this can be found in [4].
8 Coding strategy -to restart or to code properly?
At the beginning of the project, we sometimes employed a "last chance" strategy where if something fails 6 , the system will notice it, wait for a while and restart the failed part. Although this looked as very wise approach, it turned out that it hides potential serious problems and can affect system performance. Moreover, with errors in certain places it can produce situations when devices will keep restarting, produce a few lines in the system log, and exit.
Our experience is that the if it fails, let system restart it and hope for the best strategy is contra-productive and should not be employed. After fixing the most important bugs in the code, the programmes run on some setups for months without need for a single restart. If a problem is encountered, its root cause is found and fixed. A fixed driver is then installed and started with the remote ssh access.
The following things are vital for this strategy to work: compile all code with debugging option, change ulimit for core dumps to unlimited and knowledge of how to use gdb 7 to find cause of the core dump. If all of the previous fails, have valgrind 8 or a similar memory profiler installed, with knowledge of how to use it to find memory allocation problems.
Once the coder has managed all those issues, not only will the system operate smoothly, but he/she also will not be temped to switch to some language with a garbage collector 9 . That is not to say that high-level languages are not fine for some jobs -we do use Python for GUI and Java/PHP for Web pages. But in our experience, for low-level, hardware control algorithms, nothing beats properly designed and coded C/C++ code.

Expected project changes
Although RTS2 is able to control autonomous observatories, there is still a lot which deserves more attention, some solutions and some coding. These items are presented in the following list.

Image quality monitoring
The system currently provides some basic quality checks. Those includes results of a plate solving routine provided by Jibaro [13] and/or Astrometry.net 10 packages, so observers knows if telescope is pointing towards the expected position.
The system lacks real time display of various image quality parametersaverage and median values, minimal and maximal values, number of detected objects (stars) in the image, and so on. Those are calculated and saved to the image header. Our experience shows that those should be easily accessible in real-time displays, as well as in tool for navigation through the image archive.

Image processing
Image processing can be configured as an external script, providing relative photometry and other services. We would like to better integrate it with the system by: -adding calibration image database, and providing tools to maintain and use this database -providing observations-to-paper solutions for predefined well understood problems -among others observatorion of planetary transit occultations and micro-lensing events

Better user interfaces
The system is primarily controlled through an ncurses based interface. While this is sufficient for experienced users, the interface is not well suited for occasional users. Development of both Web and X-Windows Graphical User Interfaces is currently in initial phases. Plans also call for addition of applets for GNOME and other desktops environments, so we will be able to track telescope operations world-wide from a single desktop.

Network management
The System currently lacks a central management console. Observatories are usually managed and monitored through ssh connections. While this approach is feasible, it quickly grows beyond the capabilities of a single maintainer.
There is a need for application, which will: -show states of individual observatories in the network -list unresolved problems of individual telescopes, keeping track of actions to fix the problem, and the results of those actions -enable scheduling of the whole network -synthesise results obtained by network members It would be nice to have those features backed by simple, low-level, Internet interface. There is work in progress on an XML-RPC interface which will do just this.

Archive access
RTS2 keeps all important information in a database. Information about executed observations, and images acquired, together with basic image characteristics (which includes WCS coordinates, fitted by Jibaro and/or Astrometry.net) is recorded for quick retrieval.
Current archive access is provided primarily through console based utilities. Without doubt in this is a bit too old fashioned in the graphical user interfaces age.
There were PHP scripts for Web based archive access, which even included cut-out service. Unfortunately they were not very well designed. As the project matures and introduces changes to the database, those scripts cease to function, to the point when it was ruled to be too expansive to make them work again -as is the usual case with simple small fast PHP and other scripts.
Up to now multiple attempts to provide better solutions were made. Yet till now each of them has failed to deliver usable results.
Current attempts include a Google Web Toolkit XML-RPC backed application. We have reasonable hopes that this will deliver promised results, although not in a short timescale.

Conclusions
This article presented an open-source system for robotic observatory control. It provides an overview of rationales for its development, its composition and its main features. It then focused on a list of recent improvements. The article concludes with lists of development items still left on the agenda.