Xena is open-source software for use in digital preservation. Xena is short for XML Electronic Normalising for Archives.
Xena is a Java application that was developed by the National Archives of Australia. It is available free of charge under the GNU General Public License.
Version 6.1.0 was released 31 July 2013. Source code and binaries for Linux, OS X and Windows are available from SourceForge. However, as of 2018, it is no longer maintained or supported.
Mode of operation
Xena attempts to avoid digital obsolescence by converting files into an openly specified format, such as ODF or PNG. If the file format is not supported or the Binary Normalisation option is selected, Xena will perform ASCII Base64 encoding on binary files and wrap the output in XML metadata. The resulting .xena file is plain text, although the content of the data itself is not directly human-readable. The exact original file can be retrieved by stripping the metadata and reversing the Base64 encoding, using an internal viewer.
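The binary normalisation round trip described above can be sketched in a few lines of Python. This is only an illustration of the technique (Base64 inside an XML envelope), not Xena's implementation: the element and attribute names (`wrapper`, `meta`, `content`, `source-name`) are hypothetical and do not reflect the real .xena schema.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, source_name: str) -> str:
    """Base64-encode raw bytes and wrap them in a small XML envelope.

    The element and attribute names are hypothetical, not Xena's schema.
    """
    root = ET.Element("wrapper")
    meta = ET.SubElement(root, "meta")
    meta.set("source-name", source_name)
    content = ET.SubElement(root, "content")
    # Base64 output is pure ASCII, so the resulting file is plain text.
    content.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Strip the envelope and reverse the Base64 encoding."""
    root = ET.fromstring(xml_text)
    encoded = root.findtext("content") or ""
    return base64.b64decode(encoded)


# Round trip over every possible byte value.
original = bytes(range(256))
wrapped = wrap_binary(original, "example.bin")
restored = unwrap_binary(wrapped)
```

The wrapped form is readable as text but larger than the original, since Base64 represents every 3 bytes of input as 4 ASCII characters.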
Features
Platforms supported by Xena are Microsoft Windows, Linux and Mac OS X.
Xena uses a series of plugins to identify file formats and convert them to an appropriate openly specified format.
Xena has an application programming interface which allows any reasonably skilled Java developer to develop a plugin to cover a new file type.
Xena can process individual files or whole directories. When processing a whole directory, it can preserve the original directory structure of the converted records.
Xena can create plain text versions of file formats such as TIFF, Word and PDF, using the Tesseract OCR software.
The Xena interface or Xena Viewer can be used to view or export a Xena file (extension .xena) in its target file format. These files contain the normalised file as well as any extra information relevant to the normalisation process.
The Xena Viewer supports bulk export of Xena files to target file formats.
Xena can be used via its graphical user interface or the command line.
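Preserving the original directory structure when a whole directory is processed amounts to mirroring each file's relative path under the output directory. The sketch below illustrates that idea only. The function names are hypothetical, and the choice to append `.xena` to the original file name is an assumption rather than Xena's documented naming scheme.

```python
from __future__ import annotations

from pathlib import Path


def output_path(source: Path, input_root: Path, output_root: Path) -> Path:
    """Map a source file to an output location that mirrors its position
    under input_root. Appending ".xena" is an assumed naming scheme."""
    relative = source.relative_to(input_root)
    return output_root / relative.parent / (relative.name + ".xena")


def plan_outputs(input_root: Path, output_root: Path) -> dict[Path, Path]:
    """Walk input_root recursively and plan one output file per input file."""
    return {
        src: output_path(src, input_root, output_root)
        for src in sorted(input_root.rglob("*"))
        if src.is_file()
    }
```

For example, with `records` as the input root and `normalised` as the output root, `records/2010/minutes.doc` would map to `normalised/2010/minutes.doc.xena`.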
For Xena to be fully functional, it requires a local installation of the following external software:
- LibreOffice suite - to convert office documents to OpenDocument format
- Tesseract - to create plain text versions of file formats
- ImageMagick - to convert a subset of image files to PNG
- Readpst - to convert Microsoft Outlook PST files to XML. Readpst is part of the free and open source libpst software suite.
- FLAC - to convert audio files to FLAC format. This is also required to play back audio files using Xena.
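Before a conversion run, it can be useful to check which of these external programs are installed. The following sketch searches the `PATH` for each one. It is not how Xena itself locates them, and the executable names are assumptions about a typical installation that may differ between platforms and versions.

```python
import shutil

# Candidate executable names per tool. These are assumptions about a typical
# installation. Note that on Windows, "convert" may resolve to an unrelated
# system utility rather than ImageMagick.
EXTERNAL_TOOLS = {
    "LibreOffice": ["soffice", "libreoffice"],
    "Tesseract": ["tesseract"],
    "ImageMagick": ["magick", "convert"],
    "readpst": ["readpst"],
    "FLAC": ["flac"],
}


def find_tools(tools=EXTERNAL_TOOLS):
    """Return {tool name: first executable path found on PATH, or None}."""
    found = {}
    for name, candidates in tools.items():
        found[name] = next(
            (path for path in map(shutil.which, candidates) if path), None
        )
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```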
Supported file types
Xena will recognise and process the file types listed below, plus a few others of minor importance. Unsupported file types will automatically undergo binary normalisation.
Office file formats:
- Microsoft Office files (including MS Office XML, SYLK spreadsheets and Rich Text Format) are converted to the corresponding OpenDocument files
- Microsoft Outlook PST files are parsed for their individual messages, which are converted to XML files and a Xena index file is created
- Microsoft Project MPP files are converted to XML
- OpenOffice.org XML files (SXC, SXI, SXW) are converted to the corresponding OpenDocument formats
- WordPerfect WPD files are converted to OpenDocument ODT
- OpenDocument documents (ODT, ODS, ODB, ODP) are preserved unchanged
- Acrobat PDF files are stored as binaries
- Mailbox files (MBX) are converted to individual XML files
Graphics:
- BMP, GIF, PSD, PCX, RAS, and the X Window System XBM and XPM bitmap files are converted to PNG; TIFF files additionally get embedded metadata stored in Xena XML. If the Tesseract OCR software is installed, text will be extracted from TIFF files.
- OpenDocument Drawings (ODG) and SVG files are wrapped in Xena XML
- JPG and PNG files are stored unchanged
Archive Files:
- Files are extracted from archives (ZIP, GZIP, TAR/TAR.gz, JAR, WAR, Mac binary) and normalised into a separate Xena file. A Xena index file is created, which when opened in the internal Xena viewer will display the files in a table.
Audio files:
- Audio files are converted to FLAC format (this requires the external FLAC software)
Databases:
- SQL files are processed as plain text wrapped in XML
Other file types:
- HTML is converted to XHTML
- TXT text files and CSS files are stored as plain text wrapped in XML
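The lists above amount to a mapping from file type to target format, with binary normalisation as the fallback. The sketch below transcribes the file types that the lists name by extension into a lookup table. It is a simplification: Xena identifies formats through its plugins, whereas this example looks only at the file extension, and the table, labels and function names are hypothetical.

```python
from pathlib import Path


def _table(target, *extensions):
    """Build {extension: target} entries for a group of extensions."""
    return {ext: target for ext in extensions}


FALLBACK = "binary normalisation"

# Transcribed from the lists above. Only types named there by extension are
# included; the target labels are informal descriptions.
TARGETS = {
    **_table("OpenDocument", ".sxc", ".sxi", ".sxw", ".wpd"),
    **_table("unchanged", ".odt", ".ods", ".odb", ".odp", ".jpg", ".png"),
    **_table("PNG", ".bmp", ".gif", ".psd", ".pcx", ".ras", ".xbm", ".xpm"),
    **_table("XML", ".pst", ".mpp", ".mbx"),
    **_table("wrapped in XML", ".odg", ".svg"),
    **_table("plain text in XML", ".sql", ".txt", ".css"),
    **_table("XHTML", ".html"),
    **_table(FALLBACK, ".pdf"),  # PDF files are stored as binaries
}


def target_for(filename: str) -> str:
    """Look up the target by extension, case-insensitively, falling back to
    binary normalisation for anything not in the table."""
    return TARGETS.get(Path(filename).suffix.lower(), FALLBACK)
```

With this table, `target_for("scan.BMP")` gives `"PNG"`, while an unlisted type such as `target_for("clip.mkv")` falls through to `"binary normalisation"`.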
Reviews
An April 22, 2010 review in Practical e-Records rated Xena at 82/100 points. At present Xena has no target preservation format for video files.[1]
References
- [1] "Review of XENA Normalization Software". 2010-04-22. Archived from the original on 2012-07-08.