Business Collaboration over the Web: Distributed group/team editing technology

Abstract: What is business collaboration, can it be improved, and does it need improving? At present, business collaboration can be tricky: even the seemingly simple task of converting from one type of document to another can have quite unexpected results. An engine (implemented in Java) is proposed to help solve this problem and to perform much of the work associated with business collaboration. The engine supports as many types of documents as possible, leaving it to the creator of the user interface to define the types that will be supported. Initially it will be available via the Web, giving people an easy way to access it. In the near future a stand-alone version will be available, supporting more advanced features.
Keywords: Collaboration, World Wide Web, Web server, Java

D.R. Hean and W.H. Page
December 1999

Institute of Information Sciences and Technology
Massey University, Palmerston North
New Zealand

Presented at the 6th Annual New Zealand Engineering and Technology Postgraduate Student Conference (2nd to 3rd of December 1999) and published in the Proceedings (ISBN 0-473-06494-4)

What is business collaboration?
Business collaboration is where people assist and co-operate with colleagues and other contacts to further a particular aspect of the business that they share.

What if you could keep track of all the changes to a document (any kind of document, not just word processor files) without mandating that everyone use the same package? This is especially important if you collaborate with other businesses, or even just with people outside your business.

You would be able to collaborate with anyone, at any time, with all changes and versions kept up to date automatically. Furthermore, every change could be traced back to who made it and when. All old versions of documents would remain available for you to check.
Why over the Web?
The Web is becoming (or already has become) an essential part of most businesses. Most businesses either have access to the Internet, or will have access in the near future. Because of this, it is important to choose a technology that not only fits the possible solutions well, but also lets a large portion of the potential users access it and communicate with one another. After all, communication is essential for collaboration to take place.

Most people assume that, because the Web gets so much attention these days, it is all there is to the Internet. In reality, the Web is just one application that runs over the Internet; it is simply one of the most visible, along with email.

Technically, the solution that I am proposing will run over the Internet, rather than just over the Web. I am starting with the Web because it gives the majority of people quick and easy access. There is the potential for more powerful versions that work beside the Web version and support more advanced features. However, they will require some effort on the user's part, whereas the Web version will be wholly server-side and will therefore require practically no set-up by the user.
Common methods used
Currently the most common methods for creating a document as a group (any kind of document, whether a word processor file, source code, etc.) are either to write a portion of the document and pass it along (usually by email), or to have a shared directory where everyone reads and writes the document. This requires that everyone who is to collaborate on the document has access either to the shared directory, or to email and everyone's address (neither of which is a great problem now).

A problem with this approach is that everyone must use programs with filters that can read the files produced by every program the others use. Even if everyone has the same program, anyone who has worked with users can confirm the problems of sharing documents written with different versions of the same program, let alone different programs. Often such problems only show up after continued use (although sometimes they show up at the start).
Aren't collaborative editing and version tracking already present in some packages?
Of course, if you want any sort of version tracking, then you are limited to using one program. (Sometimes others may be compatible, but unless they are completely compatible, problems may not show up until it is too late to solve them.) Generally, different versions of the same program will have compatible versioning information.

If you work in an environment that enforces the use of only one particular software package, then you may be able to get away with just using the in-built abilities of the particular package.

On the other hand, if you ever need to collaborate with people outside your organisation, or your organisation has a more relaxed view on software packages that you can use, then there are more factors that need to be taken into consideration. The people that you want to collaborate with may not have a package that supports collaboration in the same way as your package, and even if it does, what happens to the information when it is saved in a different format?
Do we really need the version tracking information?
If you need to work with others in your work place that use different packages, or you need to work with others outside your organisation, what can you do? The easy answer is to avoid the question by keeping very little in the way of version information (maybe a copy each time you make a change, assuming of course that there are backup copies available of the latest version at all times). Often this may be enough information, but in a complex document you may need to know who changed what and when, or what changes a particular person made, or other specific change information.

If all that you need in a package is the ability to save a version now and then as a backup (for example, a copy of the document as at the end of each week), then any package will work much the same as any other. Unfortunately you lose any information about who changed what, although you can narrow down when a change was made to the right week.

On the other hand, if you are involved in creating a large, complex document that involves many people, simple backups may not be enough.
What is currently available?
For documents such as source code, commonly used programs have a check-in and check-out facility to allow for versioning. This stores each change to the file, along with when it was made and by whom. It works well for those who have a good handle on the complex issues and on the programs that they have to drive. Some examples of these programs are CVS, PRCS, RCS, Aegis and SourceSafe [1]. (Some of these have web front ends, generally for reading.)
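
To make the check-in idea concrete, the following is a minimal Java sketch of the record that such tools keep for a single file: each revision stores its content together with who checked it in and when. The names Revision and CheckInLog are assumptions made for illustration. They do not reproduce how CVS, RCS or the other programs listed store their data, and the sketch leaves out check-out and locking entirely.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One stored change: what the file contained, who checked it in, and when. */
final class Revision {
    final int number;
    final String author;
    final Instant when;
    final String content;

    Revision(int number, String author, Instant when, String content) {
        this.number = number;
        this.author = author;
        this.when = when;
        this.content = content;
    }
}

/** Append-only history of a single file (hypothetical class, not a CVS or RCS API). */
final class CheckInLog {
    private final List<Revision> revisions = new ArrayList<>();

    /** Records a new revision and returns it; revision numbers start at 1. */
    Revision checkIn(String author, Instant when, String content) {
        Revision r = new Revision(revisions.size() + 1, author, when, content);
        revisions.add(r);
        return r;
    }

    /** Returns the most recent revision, or null if nothing has been checked in. */
    Revision latest() {
        return revisions.isEmpty() ? null : revisions.get(revisions.size() - 1);
    }

    /** Returns every revision made by the given author, oldest first. */
    List<Revision> byAuthor(String author) {
        List<Revision> result = new ArrayList<>();
        for (Revision r : revisions) {
            if (r.author.equals(author)) {
                result.add(r);
            }
        }
        return Collections.unmodifiableList(result);
    }
}
```

Because nothing is ever overwritten, questions such as "what did a particular person change?" reduce to a simple filter over the stored revisions.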

For word processed documents, each of the major word processing suites has some version tracking included, which allows some group collaboration, although compared with the abilities of the above group it is very limited. Examples of these suites are Microsoft Office (Microsoft), WordPerfect Suite (Corel), Lotus Notes (Lotus) and StarOffice (Sun).
What is available on the Web?
Commonly the Web is used for retrieving information, and there is very little in the way of interaction between users. The Web is mostly one way, from the web master/mistress (person) to the reader. There are a few initiatives to make the Web more interactive. One method is to add forms to web pages, such as guestbooks, comment pages, polls, and feedback forms. Another method, employed by a small number of web sites, is to make the whole web site editable by anyone, as with WikiWikiWeb [2] and its clones.
One possible answer
I am currently working on creating a collaboration system that is based on the original intentions of the Web, that is, to use the Web as a two-way street, both for reading and for writing. As a further benefit, this will provide a possible answer to the problem of collaboration between different people with different packages.

In searching for the information that I needed to work on this problem, I came up against the very problem that I am trying to solve. This problem is not so evident in print media, because once something is printed it can be stored that way, and is very seldom rewritten and replaced. Unfortunately, on the Web things change so quickly that information that is present one day may not be present the next. Sometimes the information is changed deliberately, but more often it simply seems less relevant to the author now, and so they remove it to keep the page up to date. The problem is that the very piece of information that one person thinks is no longer relevant, and years out of date, may be exactly what someone else is looking for, as I found in my search. Luckily the W3C (the World Wide Web Consortium, www.w3.org) did keep the old, outdated pages, and some digging around their site reveals them, even if it takes a little work.

While looking through these documents I realised that Tim Berners-Lee's [3] original vision for the web was along similar lines to what I was thinking would be a good direction for the Web to go.

My idea was for an engine that could be used by almost any front end (be it web browser, word processor, web site, etc.) and that would allow interaction in a full version tracking fashion. This will allow web sites (or any server or application) to keep the full revision history for documents (any kind of document: text, pictures, sound, etc.), and allow any group with sufficient access to edit, update and change any available document.

This will allow web sites to be fully interactive, while keeping the version information that is important to ensure that the information is correct, and will even allow the whole site to be rolled back to a previous known working configuration, should disaster strike.
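
The rollback idea can be sketched in Java as follows. This is an illustration under stated assumptions rather than the engine's code: the class SiteHistory and its methods save and asOf are hypothetical names, document content is held as a String for brevity (the engine is meant to handle any kind of document), and deletion of documents is not modelled. For each document it keeps every saved version and, given a moment in time, returns the newest version saved at or before that moment.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Keeps every version of every document so the site can be viewed as of a past moment. */
final class SiteHistory {
    // path -> (save time -> content); each inner TreeMap keeps its save times sorted.
    private final Map<String, TreeMap<Instant, String>> versions = new TreeMap<>();

    /** Stores a new version of the document at the given path. */
    void save(String path, Instant when, String content) {
        versions.computeIfAbsent(path, p -> new TreeMap<>()).put(when, content);
    }

    /**
     * Returns the site as it stood at the given moment: for each document, the
     * newest version saved at or before that moment. Documents that were first
     * saved later are left out.
     */
    Map<String, String> asOf(Instant moment) {
        Map<String, String> site = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Instant, String>> e : versions.entrySet()) {
            Map.Entry<Instant, String> version = e.getValue().floorEntry(moment);
            if (version != null) {
                site.put(e.getKey(), version.getValue());
            }
        }
        return site;
    }
}
```

Rolling back then means serving the result of asOf for the last moment at which the site was known to be working, while the later versions stay in the history for inspection.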

Because of the full version history that is included, and the ability to go back and check previous versions, it will be easier to trust the information that you receive, as you can easily check who changed what and when. This is especially so if the site uses some form of write-once media, either to store the information, or as a backup for new information against which the current information can be checked.

This added ability to trust information (especially on a web site) also follows one of the ideas that Tim Berners-Lee had regarding the Web.
Current progress
This site remains to provide information on these ideas. I can only provide the ideas and documents because the code was stolen early in 2003.

~~Currently the progress of the engine to achieve the ideas that I have written about in this paper is in the Alpha testing stage (as of December 1999).~~

I have implemented the base functionality to allow the engine to be extended by extra modules, as people want or need them. It has been designed such that it only requires the minimum of modules to work with base functionality, with the ability to easily extend it, both now, and in the future.
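
One way to picture this kind of extensibility is a small core that only routes requests to whatever modules have been registered. The Java sketch below is an assumption about how such a design might look, not the engine's actual interface; EngineModule, ModularEngine, register, supports and dispatch are all hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Contract that every extension module implements (hypothetical interface). */
interface EngineModule {
    /** Unique name under which the module is registered. */
    String name();

    /** Handles one request and returns the module's response. */
    String handle(String request);
}

/** Core that knows nothing about specific features; it only routes to modules. */
final class ModularEngine {
    private final Map<String, EngineModule> modules = new LinkedHashMap<>();

    /** Adds a module, replacing any earlier module registered under the same name. */
    void register(EngineModule module) {
        modules.put(module.name(), module);
    }

    /** Reports whether a module with the given name has been registered. */
    boolean supports(String moduleName) {
        return modules.containsKey(moduleName);
    }

    /** Passes the request to the named module, or fails if no such module exists. */
    String dispatch(String moduleName, String request) {
        EngineModule module = modules.get(moduleName);
        if (module == null) {
            throw new IllegalArgumentException("No module registered: " + moduleName);
        }
        return module.handle(request);
    }
}
```

With this shape, a new capability is added by registering another EngineModule, and the core itself does not need to change.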

Currently there is only a simple command line interface to the program. It can only store information interactively in the current operating system's file directory, and it can access a read-only demo inside the distribution archive. The engine has been designed to move as many arbitrary limitations as possible out to the user interface that accesses the engine. This allows the creator of the user interface to support any and all types of documents. In principle the engine can handle any type of document with complete transparency. This potentially includes streaming media (a sequence of video and/or audio that is sent in compressed form over a network and is decoded and displayed as it arrives, rather than being downloaded as a whole clip and then viewed).
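
As a rough illustration of handling documents transparently, the Java sketch below stores each version of a document as an uninterpreted array of bytes in a directory, so the store itself never needs to know whether it holds text, a picture or sound. Everything here is an assumption made for illustration: DirectoryStore is a hypothetical name, the layout (one sub-directory per document, one file per version number) is invented, and the document name is assumed to be a simple file name with no path separators.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Stores each version of a document as an untouched byte array in a directory. */
final class DirectoryStore {
    private final Path root;

    DirectoryStore(Path root) {
        this.root = root;
    }

    /** Writes the bytes as the next version of the named document and returns its number. */
    int store(String name, byte[] content) throws IOException {
        Path dir = root.resolve(name);
        Files.createDirectories(dir);
        // Find the first unused version number, starting at 1.
        int version = 1;
        while (Files.exists(dir.resolve(Integer.toString(version)))) {
            version++;
        }
        Files.write(dir.resolve(Integer.toString(version)), content);
        return version;
    }

    /** Reads back exactly the bytes that were stored for the given version. */
    byte[] load(String name, int version) throws IOException {
        return Files.readAllBytes(root.resolve(name).resolve(Integer.toString(version)));
    }
}
```

Deciding which document types to accept, and how to display them, is left to whatever front end calls store and load.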

I am currently in the process of creating a module to store the documents in a database, for easier manipulation, as well as creating a web based front end for the engine, rather than the current simple command line front end.

The engine has been written in Java because the main considerations of this engine match the strengths of Java. These strengths are that it is an object-orientated language with in-built networking and security features, that servlets [4] persist within web servers, and that it offers a cross-platform "write once, run anywhere" ability. The write once, run anywhere ability will by itself ease the maintenance of the engine once it is complete.
~~For the future~~
Complete the database integration of the engine. Create a web based front end (running through servlets on the web server), and start testing it in the real world. The first tests will be by select groups of people, followed by more open trials, available to the general public.
References
[1] Source code version control
CVS, "Concurrent Versions System" http://www.cvshome.org/
RCS, "Revision Control System" http://en.tldp.org/HOWTO/mini/RCS-1.html
PRCS, "Project Revision Control System" http://prcs.sourceforge.net/
Aegis http://aegis.sourceforge.net/
SourceSafe Microsoft
"Using CVS for Web development" http://philip.greenspun.com/wtr/cvs

[2] Quote from their front page: "Wiki is a composition system; it's a news server; it's a repository; it's a mail system. Really, we don't know quite what it is, but it's a fun way of communicating asynchronously across the network.", http://c2.com/cgi-bin/wiki/

[3] Tim Berners-Lee, "About The World Wide Web", WWW Creator, http://www.w3.org/WWW/, (current Sept. 1999).

[4] "Servlets are server side Java programs that extend a web server in a similar fashion to CGI scripts", http://java.sun.com/products/servlet/whitepaper.html
