Secure Open Document Format: October 2014

Sunday, 26 October 2014

Functional Implementation of Validators

Should document validators be written in imperative or functional languages?

They could be done in both, of course, but there are benefits to implementing them in a disciplined functional language.

Functional languages force you to break down the problem and describe it declaratively. This is good discipline because it means your understanding of the problem has to be precise, with clear input/output behaviours.

Furthermore, each sub-problem is addressed by a smaller function, and each function must always behave in predictable ways, with a guarantee that there will be no interference from the state of other parts of the program. There are no global variables, and no state held in other objects, which can affect the function at all - that's the guarantee of functional programming.
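As a rough illustration of the idea (in Python rather than a functional language, purely for brevity), a validator can be composed from small pure functions. All of the names here (`non_empty`, `ascii_only`, `max_length`, `validate`, `CHECKS`) and the limit of 144 are hypothetical, chosen only for this sketch:

```python
from typing import Callable, Optional

# A check is a pure function: it takes the text and returns an error
# message, or None if the text passes. It reads no global or shared state.
Check = Callable[[str], Optional[str]]


def non_empty(text: str) -> Optional[str]:
    return None if text else "empty content"


def ascii_only(text: str) -> Optional[str]:
    return None if text.isascii() else "non-ASCII character found"


def max_length(limit: int) -> Check:
    """Build a check that rejects text longer than `limit` characters."""
    def check(text: str) -> Optional[str]:
        return None if len(text) <= limit else f"longer than {limit} characters"
    return check


def validate(text: str, checks: tuple) -> list:
    """Run every check and collect the errors; an empty list means valid."""
    results = (check(text) for check in checks)
    return [error for error in results if error is not None]


# Hypothetical rule set for one field.
CHECKS = (non_empty, ascii_only, max_length(144))
```

Because each check depends only on its argument, it can be tested in isolation, and `validate("test", CHECKS)` gives the same answer no matter what else the program has done.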

It seems to me that functional programming discipline is ideal for reducing risks in security enforcing software.

I think I'll try Clojure to prototype a validator.

Saturday, 11 October 2014

Executable Content

There is another thing we want to include in our definition of secure document profiles - we don't want any of the content to trigger any kind of unpredictable execution on the client.

JavaScript in PDFs, or VBA macros in Office files, would fail this test.

In reality, each element of the document's content, the XML tags and fields, needs to be acted upon - otherwise it would be useless. So in some sense they all cause execution to happen.

The difference is between document elements which

Cause predefined, constrained and predictable execution - such as <bold> might, and

Allow execution to happen which cannot be defined beforehand, and so cannot be finely constrained, such as <script> might.
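One way to make the second category testable is to flag any element that can carry script content. The Python sketch below is only an assumption about how such a check might look: it flags elements in the ODF script namespace (the `xmlns:script` URI that appears in content.xml) plus any element whose local name is `script` or `scripts`. The function names are hypothetical, and a real profile would more likely use an allowlist than this kind of blocklist.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:script declaration in content.xml.
SCRIPT_NS = "urn:oasis:names:tc:opendocument:xmlns:script:1.0"

# Assumption for this sketch: these local names indicate script containers.
SCRIPT_LOCAL_NAMES = {"script", "scripts"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def executable_elements(xml_text: str) -> list:
    """Return the tags of all elements that look like executable content."""
    root = ET.fromstring(xml_text)
    return [
        element.tag
        for element in root.iter()
        if element.tag.startswith("{" + SCRIPT_NS + "}")
        or local_name(element.tag) in SCRIPT_LOCAL_NAMES
    ]
```

A document passes this particular test only if `executable_elements` returns an empty list.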

You might argue that the execution permitted by the latter category will always be constrained - it may only run in a sandbox, or within the application process with its privileges, and wouldn't have arbitrary access to the network or the disk or the user's contacts. You'd probably be right.

So this leads us to the question of how high do we want to set the "paranoia" bar. I want to set it as high as I can, and we'll revisit this issue when we hit actual functionality decisions. The guiding philosophy here is to only allow just enough functionality to be useful for the most common functions. Being able to set bold, underline etc falls within that category but executing macros and other scripts is a minority requirement.

Another reason why the latter category of code execution will not be allowed is that software has bugs. That's a reality; you can't pretend there is code without bugs. Bugs mean the application reading a document could be subverted to execute malicious code.

Now if a buggy application only ever ingested upper-case ASCII [A-Z] with document sizes of 1-144 characters only, then it is really difficult to subvert the application. On the other hand, if the application was allowed to execute very widely scoped <scripts> then it is easier to subvert it.
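The narrow input described above is easy to state as a test. A minimal Python sketch, where `SAFE_TEXT` and `is_safe_text` are hypothetical names:

```python
import re

# Upper-case ASCII letters only, between 1 and 144 characters long.
SAFE_TEXT = re.compile(r"[A-Z]{1,144}")


def is_safe_text(text: str) -> bool:
    """True only if the whole string fits the pattern, with nothing extra."""
    return SAFE_TEXT.fullmatch(text) is not None
```

`fullmatch` is used rather than `match` so that trailing characters, such as a newline, cause a failure instead of being silently ignored.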

This boundary between the two types of execution appears to me to be fuzzy. I'll need to do more digging. I'd appreciate any thoughts via Twitter @secureodf.

Wednesday, 8 October 2014

Verifiably Correct Content

Now that we've started to successfully remove elements from the ODF XML, we need to think about what we want to leave in, and how we might define testable constraints for it.

We used the word "verifiable" in the first post. Let's expand on that:

All content conforms to a set which we know to be safe.

This means:

Only certain XML keys are permitted. The name of the tags and attributes must be exactly correct, matching a finite set of known-good variants. Misspellings are not permitted.

Only certain XML values are permitted. This means the stuff between <x> and </x>, and the stuff assigned to attributes like <x attribute="yyy">, must be from a known-good set.

Where fields have user or automatically generated content, like the actual text of a document, or the names of automatically generated styles, this content must have range and size constraints. Range means each piece of this content must be known-good, like very simple ASCII for example. Size means the content must not be smaller or larger than known-good limits, to prevent buffer overruns or even unexpected behaviour with zero-length content.

The order in which content appears must also be right. We can't predict how ODF readers will behave when content appears out of the expected order.

The content must be complete. That is, required (as opposed to optional) items should not be missing, as this can lead to unpredictable behaviour in reader applications.

So these are the constraints - and they must be precise enough for us to define tests for pass / fail for them - the content must be verifiable.
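To show that constraints of this kind can be turned into pass / fail tests, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ALLOWED` table, the `REQUIRED_CHILD` table, the text pattern and the 144-character limit are hypothetical, and the ordering constraint is not checked at all.

```python
import re
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical allowlist: element -> {attribute -> permitted values}.
ALLOWED = {
    f"{{{OFFICE}}}document-content": {f"{{{OFFICE}}}version": {"1.1"}},
    f"{{{OFFICE}}}body": {},
    f"{{{OFFICE}}}text": {},
    f"{{{TEXT}}}p": {},
    f"{{{TEXT}}}span": {},
    f"{{{TEXT}}}s": {},
}

# Hypothetical completeness rule: parent -> child that must be present.
REQUIRED_CHILD = {
    f"{{{OFFICE}}}document-content": f"{{{OFFICE}}}body",
    f"{{{OFFICE}}}body": f"{{{OFFICE}}}text",
}

# Hypothetical range and size rule for character data.
SAFE_TEXT = re.compile(r"[A-Za-z0-9 ]{1,144}")


def violations(xml_text: str) -> list:
    """Return a list of constraint violations; an empty list means a pass."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag not in ALLOWED:
            found.append(f"element not permitted: {element.tag}")
            continue
        for name, value in element.attrib.items():
            permitted = ALLOWED[element.tag].get(name)
            if permitted is None:
                found.append(f"attribute not permitted: {name}")
            elif value not in permitted:
                found.append(f"attribute value not permitted: {name}={value!r}")
        required = REQUIRED_CHILD.get(element.tag)
        if required is not None and element.find(required) is None:
            found.append(f"required child missing: {required}")
        # Whitespace-only chunks are indentation between tags and are skipped.
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip() and not SAFE_TEXT.fullmatch(chunk):
                found.append(f"text outside permitted range or size: {chunk!r}")
    return found
```

The point of returning a list rather than a single boolean is that each entry corresponds to one failed constraint, which keeps every rule individually testable.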

UPDATE - I added the requirement for non-executable content, see next post.

Tuesday, 7 October 2014

Minimal XML Test Document

So after being able to rebuild an ODF from component files as per the previous post, I wanted to first see if I could find a minimal set of XML which would successfully load and open into LibreOffice and MS Office 2007.

So this worked! I was surprised by how much I could pare away.

The results are 3 files as follows, and this isn't a bad place to start describing a minimal XML subset of ODF.

We'll keep paring away....

META-INF/manifest.xml

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
    <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/" />
    <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml" />
</manifest:manifest>

content.xml

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<office:document-content office:version="1.1" xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ooo="http://openoffice.org/2004/office" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events">
   
   <office:body>
       <office:text>
           <text:p>
               <text:span>test</text:span>
           </text:p>
           <text:p>
               <text:span>1<text:s/>2<text:s/>3<text:s/>4</text:span>
           </text:p>
           <text:p>
               <text:span>abc<text:s/>123<text:s/>def<text:s/>456</text:span>
           </text:p>
           <text:p />
       </office:text>
   </office:body>
</office:document-content>

mimetype

application/vnd.oasis.opendocument.text
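A pared-down file like this can also be used to list exactly which elements and attributes survived, as a first draft of the minimal subset. A small Python sketch, where `element_inventory` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def element_inventory(xml_text: str) -> dict:
    """Map each element name in the document to the attribute names seen on it."""
    inventory = {}
    for element in ET.fromstring(xml_text).iter():
        # Names are reported in ElementTree's '{namespace}local' form.
        inventory.setdefault(element.tag, set()).update(element.attrib)
    return inventory
```

Run against the content.xml above, this should report only the handful of `office:` and `text:` elements that are actually used, regardless of how many namespaces are declared.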

ODF Explode and ReZip Fails

I thought I'd do a very basic test first by unzipping an ODT file to see what was inside it, and then rezip it into a new ODT file.

This failed. I'm trying to work out why.

The following should work:

1. Save a simple ODT from Google Docs (or LibreOffice) as test.odt
2. Rename test.odt to test.zip and unzip it to a folder test/
3. Explore the contents of the test/ folder, which contains stuff like content.xml and a manifest.xml
4. Don't make any changes to any XML files
5. Rezip the test/ folder as test1.zip and rename it to test1.odt
6. Open it using LibreOffice.

Step 6 fails with LibreOffice reporting a corrupted document. I'll need to find out why. It could be a timestamp or some kind of hash that is part of the zip structure.

UPDATE - the issue seems to be the file path names of the entries in the zipped file, which should not include the folder name itself. That is, folder/file.xml is bad, but file.xml is ok. If you create a zip from within the unzipped folder, that works. That is, if you create test1.zip from within the test/ folder, the resultant test1.zip can be renamed test1.odt and that file opens fine.
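The same fix can be scripted. This is a minimal Python sketch using the standard zipfile module; `zip_odt` is a hypothetical helper name, and it only addresses the entry-name problem described here, not any other packaging rule a reader might enforce.

```python
import zipfile
from pathlib import Path


def zip_odt(folder: Path, target: Path) -> None:
    """Zip the contents of `folder` into `target` with folder-relative entry names."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it sits in the folder.
            if not path.is_file() or path.resolve() == target.resolve():
                continue
            # Entry name is relative to the folder: "content.xml",
            # not "test/content.xml".
            archive.write(path, arcname=path.relative_to(folder).as_posix())
```

For example, `zip_odt(Path("test"), Path("test1.odt"))` would write entries such as content.xml and META-INF/manifest.xml without the leading test/ prefix.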

Secure ODF Profile

Welcome!

It strikes me that office productivity documents are a significant vector for malware and not much is being done to design a document format which specifically tries to prevent or mitigate these exploits.

This blog will track the development of a subset of the Open Document Format (ODF) which meets the following criteria:

Verifiable - All content is fully and strictly verifiable, with no scope for unexpected behaviour or overloading.

Minimal - Just enough structure and options to meet user functionality.

Libre - Fully open and independently implementable.

These objectives will be refined as we proceed.