The Java Apache Project Information System

by Stefano Mazzocchi

Introduction

The importance of information systems is often underestimated by the programming community. While high standards for coding, bug fixing and behavior have been created and acknowledged, information systems (documentation and web sites) rarely receive the same pursuit of perfection. This paper describes some ideas for the design and creation of such systems, as well as guidelines for their management. It is intended mainly for active developers and for people interested in understanding the project management processes.

A container for projects

The Java Apache Project started as a single project (the Apache JServ servlet engine) and later became involved in the general idea of Java software creation. For this reason, the whole information system is designed so that many independent projects can be contained and glued together by the same resources (web site, mailing lists, CVS repository). These resources offer common ground between the hosted projects, reduce the overhead of general project management, and save time that can be dedicated to more productive (and fun) activities.

Main goals

Reduce the work
Volunteer projects usually have a hard time finding people willing to spend their time writing documentation or improving, rewriting or porting existing documentation. One of the major issues in designing a good information system is to minimize the work contributors must do to manage and enhance it.
Esthetic issues
Esthetic issues do have a place in information system design, even where such needs may appear to be of secondary importance. While the overhead of esthetic content should be kept to a minimum, a pleasant and original graphic style can help a project acquire new subscribers by attracting their attention. The Mozilla web site was an inspiration for both graphics and design ideas.
Heritability
Another big issue in volunteer projects is keeping the process separate from the contributors. This allows easy replacement when people leave the project or are unavailable for a while. Bottlenecks caused by dependence on individual contributors should be avoided by documenting every aspect of the development and by writing guidelines (like this paper) on how things were done and how they should be done in the future.
Flexibility
Even if every aspect of the development process is documented, flexibility and adaptability should always be considered important. Software projects, like the people who develop them, are subject to change. The project's processes should remain as flexible as the project itself, so that they are easy to adapt as the project evolves.

The java.apache.org web site

The project's web site is the portal to all the resources hosted by the Java Apache Project. It is the front door of the information system: it includes all the subprojects' documentation and glues it together. The HTML structure of the web site is designed around this idea. The graphics are separated from the content by using frames and by placing graphic files and informative files in different directories. Although the site was designed to be pleasant and to render well on graphic displays at all resolutions, it can also be browsed on text-only systems, so that the widest possible range of users has access to the information it contains. We understand that in many countries text-only internet access is the rule, not the exception.

The web site is composed of two parts: the navigation bar in the left frame and the browsing area in the main frame. The navigation bar uses the same file that text-only browsers use as a table of contents. This avoids the overhead of maintaining two different files for the same thing.

The tree structure of the web site reflects its design ideas:

  • / – the root directory contains the index.html page as well as the graphic frame pages. This directory should not contain any information, only graphic content that does not change. This gives better separation between graphics and content, which is important for dynamic content generation.
  • /main – the main directory contains all the general documentation shared between projects as well as the table of contents file used for the navigation bar.
  • /images – this directory contains the graphic content for the main site. Projects store their graphics in their own directories (see below).
  • /images/ads – this directory contains the images used as advertisements on web sites that donate advertising space.
  • /xxx – every project has its own directory, which should contain an exact copy of the HTML documentation stored in the CVS module for that project.
  • /xxx/images – this directory stores the graphic content belonging to that project. This keeps the documentation valid whether it is browsed from a distribution or from the web site.
  • /xxx/dist – the distribution directory where product releases are placed. Every project has its own. This directory should also contain the HEADER.html and README.html files used by the web server to add graphic content and other information to the directory listing. This directory should not be included in the distributed HTML documentation.
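The layout above can be sketched in a few lines of Python. This is only an illustration, not a tool used by the project: the function names are hypothetical, and "jserv" in the usage note is an assumed directory name standing in for the "xxx" placeholder.

```python
from pathlib import PurePosixPath

# Shared directories named in the layout above.
SHARED_DIRS = ["/", "/main", "/images", "/images/ads"]


def expected_directories(projects):
    """Return the directories the layout implies for the given project names.

    Each name stands in for the "xxx" placeholder used in the list above.
    """
    dirs = list(SHARED_DIRS)
    for name in projects:
        root = PurePosixPath("/") / name
        dirs.append(str(root))             # copy of the CVS documentation
        dirs.append(str(root / "images"))  # project graphics
        dirs.append(str(root / "dist"))    # releases, web site only
    return dirs


def excluded_from_distribution(path):
    """True if a site path lies inside a project's dist directory.

    Such paths exist only on the web site and are not part of the
    distributed HTML documentation.
    """
    parts = PurePosixPath(path).parts
    # parts looks like ("/", "<project>", "dist", ...)
    return len(parts) >= 3 and parts[2] == "dist"
```

For example, expected_directories(["jserv"]) lists the four shared directories followed by /jserv, /jserv/images and /jserv/dist.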

Subproject documentation

Each subproject directory should contain a copy of the documentation section found in the project's CVS module. This allows the web site to be updated automatically, without human action, simply by updating the directory with a CVS client. The HTML documentation for every project must be designed with this idea in mind: it should be possible to simply move the documentation onto the web site without breaking any link. For this reason, relative links must be used between documentation files, and absolute ones must be used when they refer to directories hosted on the web site. In fact, some directories present on the web site (e.g. the distribution directory) should not be placed in the CVS module. Each subproject must have its own graphic content (logo, snapshots, etc.) stored in a subdirectory of the main documentation directory.
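The automatic update described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: each project directory on the web server is assumed to be a CVS working copy of that project's documentation, and the function names, the site root and the project names are hypothetical. The options used are standard CVS ones (-q for quieter output, -d to create new directories, -P to prune empty ones).

```python
import subprocess
from pathlib import Path


def cvs_update_command():
    """Build the CVS command that refreshes a checked-out directory."""
    return ["cvs", "-q", "update", "-d", "-P"]


def update_projects(site_root, projects, run=subprocess.run):
    """Run the update inside each project directory under site_root.

    `run` defaults to subprocess.run and can be replaced for a dry run.
    Returns the list of directories that were visited.
    """
    visited = []
    for name in projects:
        project_dir = Path(site_root) / name
        # check=True raises an error if cvs exits with a failure status.
        run(cvs_update_command(), cwd=project_dir, check=True)
        visited.append(project_dir)
    return visited
```

Because the distribution directory is not part of the CVS module, an update of this kind leaves the releases stored there untouched.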

Since each subproject has only one documentation directory, shared by the distribution and the web site, the documentation must be complete in all its parts and must limit external or absolute links to those resources that should not be included in the product distribution. This allows complete offline browsing as well as simple web site management across different projects, since the process can be automated.

Very Important: keep in mind that the web site shows these pages in a frame. For this reason, every link to an external resource MUST have target="_top" set, to avoid mixing Java Apache information with other resources, which could create legal problems. Any external link found in the documentation without the proper target must be considered a bug and corrected.
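A rule like this is easy to check mechanically. The following sketch uses Python's standard html.parser module to list external links that lack target="_top". It is an illustration rather than an existing project tool: the class and function names are hypothetical, and treating any link whose host differs from java.apache.org as external is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkChecker(HTMLParser):
    """Collect external <a href> links that lack target="_top"."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # Assumption: only links that name another host count as external;
        # relative links and mailto: links have no host and are skipped.
        host = urlparse(href).netloc.lower()
        is_external = bool(host) and host != self.site_host
        # The target value is compared without regard to case (lenient).
        target = (attributes.get("target") or "").lower()
        if is_external and target != "_top":
            self.violations.append(href)


def find_missing_targets(html, site_host="java.apache.org"):
    """Return the hrefs of external links without target="_top"."""
    checker = ExternalLinkChecker(site_host)
    checker.feed(html)
    checker.close()
    return checker.violations
```

Running find_missing_targets over every HTML file of a subproject before it is copied to the web site would report each offending link as a bug to be fixed.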

Other documentation media

HTML was designed for distributed information systems, but its nature does not allow it to be ported well to other media. While this has been partially addressed by the introduction of media abstraction in CSS (Cascading Style Sheets), we analyse other documentation formats here to evaluate their benefits for the project.

  • HTML is the standard for hypertext documentation systems. It has a human readable and editable format, it is simple, open and standard, and editors are available for every operating system. Its main problem is the lack of media abstraction, which gives bad results on media other than computer displays (paper, speech, braille devices, etc.).
  • TeX is a well-known typesetting format but, just as HTML is tied to computer displays, it is strongly targeted at print media. Tools exist to convert TeX files to other formats (plain text, info, HTML, etc.), but for the same reason that HTML generates poor printed material, TeX-generated HTML files make poor use of hypermedia content such as links and graphics.
  • DocBook (SGML) is an SGML DTD designed to be a container for well-described documentation. While DocBook files are not viewed or printed directly, tools exist to convert them to both print media (dvi, ps, pdf) and hypertext media (html). The Linux Documentation Project as well as O'Reilly use this system for their documents and books.
  • DocBook (XML) is the (as yet unofficial) XML port of the SGML DocBook DTD. Porting DocBook to XML would allow the use of the new languages currently being standardized by the World Wide Web Consortium to enhance portability and media abstraction. The use of XSL (Extensible Stylesheet Language) together with XLink (XML Linking Language) would give such a documentation format remarkable flexibility, because content (XML), presentation (XSL) and hierarchy (XLink) may be separated into different files. Browsers and automatic tools may then choose to "shape" the information in the most useful manner for the medium being used, allowing complete media abstraction. Although major efforts have been committed to the design of such a flexible information system, there are currently no standards available for XSL and XLink (only drafts and proposals), and very few research tools work with them.
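The idea of media abstraction can be illustrated with a small sketch: one XML source is rendered for two different media by two independent functions. Plain Python functions stand in here for the stylesheets the text describes, so this is not XSL. The element names section, title and para follow DocBook, while the sample content and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up DocBook-style fragment that holds content only.
SOURCE = """
<section>
  <title>Installation</title>
  <para>Unpack the archive.</para>
  <para>Read the README file.</para>
</section>
"""


def to_text(section):
    """Render the section for a text-only medium."""
    title = section.findtext("title", default="")
    lines = [title, "=" * len(title), ""]
    lines += [para.text or "" for para in section.findall("para")]
    return "\n".join(lines)


def to_html(section):
    """Render the same section as an HTML fragment."""
    title = section.findtext("title", default="")
    parts = ["<h1>%s</h1>" % escape(title)]
    parts += ["<p>%s</p>" % escape(para.text or "")
              for para in section.findall("para")]
    return "\n".join(parts)


section = ET.fromstring(SOURCE)
text_version = to_text(section)
html_version = to_html(section)
```

The source fragment carries no presentation at all. Adding a third medium means adding a third rendering function, without touching the content.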

Conclusions

While this paper has addressed many issues in the creation of information systems, better tools and documentation formats are needed to significantly improve the usability and usefulness of such systems. XML and the other languages now being researched and standardized offer a good alternative both to old tools tied to a single operating system and to languages such as HTML, which are portable but not abstract enough. Currently, the Java Apache Project uses plain text and HTML as documentation formats, but it will move to more advanced and flexible documentation systems based on XML when (and if) they are standardized and freely available.

Copyright (c) 1997-98 The Java Apache Project.
$Id: design.html,v 1.4 1998/12/28 09:07:03 ed Exp $
All rights reserved.