Wednesday 8 October 2014

Verifiably Correct Content

Now that we've started to successfully remove elements from the ODF XML, we need to think about what we want to leave in, and how we might define testable constraints for it.

We used the word "verifiable" in the first post. Let's expand on that:

All content conforms to a set which we know to be safe.

This means:

  • Only certain XML keys are permitted. The names of tags and attributes must be exactly correct, matching a finite set of known-good variants. Misspellings are not permitted.
  • Only certain XML values are permitted. This means the stuff between <x> and </x>, and the stuff assigned to attributes like <x attribute="yyy">, must be from a known-good set.
  • Where fields have user or automatically generated content, like the actual text of a document, or the names of automatically generated styles, this content must have range and size constraints. Range means each piece of this content must be known-good, like very simple ASCII for example. Size means the content must not be smaller or larger than known-good limits, to prevent buffer overruns or even unexpected behaviour with zero-length content.
  • The order in which content appears must also be right. We can't predict how ODF readers will behave if content appears in an unexpected order.
  • The content must be complete. That is, required (as opposed to optional) items must not be missing, as this can lead to unpredictable behaviour in reader applications.
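
As a rough illustration of the first three constraints (permitted names, permitted values, and range and size limits on free text), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy vocabulary (doc, p, style), the permitted style names, the 1 to 200 character limit, and the function name violations are all hypothetical. Real ODF uses namespaced element names, and a real checker would also need a parser hardened against malicious XML, which this sketch does not address.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allow-list: element name -> set of permitted attribute names.
ALLOWED = {
    "doc": set(),
    "p": {"style"},
}
# Hypothetical known-good values for each permitted attribute.
ALLOWED_VALUES = {("p", "style"): {"Standard", "Heading"}}
# Range: printable ASCII only. Size: at least 1 and at most 200 characters.
TEXT_RE = re.compile(r"[ -~]{1,200}")


def violations(xml_text):
    """Return a list of constraint violations; an empty list means pass."""
    problems = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag not in ALLOWED:
            problems.append(f"unknown element: {el.tag}")
            continue
        for name, value in el.attrib.items():
            if name not in ALLOWED[el.tag]:
                problems.append(f"unknown attribute: {el.tag}/@{name}")
            elif value not in ALLOWED_VALUES.get((el.tag, name), set()):
                problems.append(f"bad value: {el.tag}/@{name}={value!r}")
        # Only the text directly inside <p> is checked in this sketch.
        if el.tag == "p" and not TEXT_RE.fullmatch(el.text or ""):
            problems.append("text of <p> fails range or size constraint")
    return problems
```

With these assumed rules, violations('<doc><p style="Standard">Hello</p></doc>') returns an empty list, while a misspelt attribute such as stlye, or an empty <p>, produces one entry each.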


So these are the constraints, and they must be precise enough for us to define pass / fail tests for them: the content must be verifiable.
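
To show what a pass / fail test for the order and completeness constraints might look like, here is a small Python sketch. The rule it checks (a doc element whose children are meta, an optional styles, then body, each at most once) is invented for illustration and is not taken from the ODF specification; the names DOC_CHILDREN and check_children are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: the permitted children of <doc>, in the required order.
# Each entry is (element name, required?). Each may appear at most once.
DOC_CHILDREN = [("meta", True), ("styles", False), ("body", True)]


def check_children(parent, spec):
    """Return (passed, reason) for the order and completeness of child elements."""
    names = [child.tag for child in parent]
    pos = 0
    for name, required in spec:
        if pos < len(names) and names[pos] == name:
            pos += 1  # expected element found at the expected position
        elif required:
            return False, f"missing or misplaced required element: {name}"
    if pos != len(names):
        # Something is left over: unknown, repeated, or out of order.
        return False, f"unexpected or out-of-order element: {names[pos]}"
    return True, "ok"
```

Under this assumed rule, check_children(ET.fromstring("<doc><meta/><body/></doc>"), DOC_CHILDREN) passes, while swapping the two children or dropping body fails with a reason that a test suite can report.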


UPDATE - I added the requirement for non-executable content, see next post.

4 comments:

  1. I notice that you don't mention the ODF OASIS Standard much. That should be a source of constraints. Plus a good way to express a secure profile is as annotations on the specification.

    There are a variety of ODF validators. One is part of the ODF Toolkit, available at Apache. It is developed in Java but presumably could be ported from that: http://incubator.apache.org/odftoolkit/

    There's also a new project being proposed at Apache, Corinthia, that has some intersection with Secure ODF in the sense that profiling will be done and there will be test suites. https://wiki.apache.org/incubator/CorinthiaProposal

    There is also the Apache OpenOffice project itself, although that might be rather heavy-weight. http://www.openoffice.org/

    The common value of these to your work might be that the Apache License is permissive, in the way you might prefer for anything you incorporate into the Secure ODF work.

  2. Thanks - I do in fact intend to work with the ODF standard, and not invent a new standard from scratch. The benefits are many - it is an increasingly popular standard adopted by national governments, it is an open standard so there are no obscure execution paths, and several popular applications already work with ODF.

    I will explore Corinthia - thanks for the pointer.

    Are "annotations" additional constraints over the basic ODF standard?

    Replies
    1. A way of expressing a constrained use of the ODF format would be by producing a profile that lines up with the specification sections that are qualified by the profile. That is what I meant.

      For example, Microsoft produces a document that describes their compliance with ODF and any deviations. So you can read them side-by-side to see what the qualifications are.

      A profile document that establishes what the SecureODF constraints are could work the same way.

    2. Thanks - that is a good idea. My starting point will be establishing the security objectives, and from those working out whether a simple constraint for each ODF spec section would work (local, e.g. length of fields) or whether wider-scoped constraints need to be applied (global, e.g. strict order of tags). I don't know yet - I really need to find some time to make a start :)
