Subject Classification with DITA Markup for Agricultural Learning Resources. A Case Example in Agroforestry

Technical documentation and training materials are important elements in helping to accelerate the use and impact of agricultural research for development. The creation and delivery of these resources can be enhanced through content enrichment and the production and reuse of modular components. This process can be further improved by integrating rich semantic descriptions with resource metadata and domain-specific markup combined with the consistent use of controlled vocabulary. The Darwin Information Typing Architecture (DITA) supports both the integration of metadata and markup for technical content as well as for learning and training content. DITA also includes mechanisms for adding the semantics of taxonomy and ontology definitions for classifying content.

This paper explores the potential of enriching agroforestry learning resources with DITA markup through descriptive metadata and subject classification.

Contributed by Thomas Zschocke.

  1. The paper is an interesting one as it tackles the practicalities of metadata for a very particular domain. The authors show their expertise in the field and a lot of information can be extracted from it.

    The main flaw of the paper, in my view, is that it actually contains two papers: (1) a sketch of the problem of the scope of agroforestry and how this impacts the use of existing KOS, which is basically in Section 2, and (2) a discussion of how mappings between thesauri and other KOS can be done inside DITA. While the two parts are connected logically, they deserve separate attention, and the result is that the paper is too short to contain a complete discussion of the problems.

    As a conference paper, it presents the issues and proposed solutions, but I would suggest to the authors a more repeatable and perhaps quantifiable approach, i.e. counting matches between the KOS and then looking at the overall matching structure to try to discover the main relations between them.

    Another issue that is not clear to me is whether, technically, the mappings should be stored and maintained inside the DITA-based system. These mappings are already stored in some form elsewhere; for example, AGROVOC exposes these mappings. If the mappings are replicated in each system, there will be no possibility to benefit from and reuse the effort of others. Maybe a more sensible solution would be to expose the mappings openly. This can be done in a more sophisticated way using linked data, but other initiatives, such as the OBO Foundry for biomedical ontologies, did that long ago by simply publishing files with the mappings.

    In summary, the paper gives a good overview of the problems and presents technical solutions, but it requires a much longer discussion that cannot fit within the page limits of a conference paper. In an extended, more elaborate version, the issues of sharing and reusing mappings should also be discussed further.

  2. The article addresses a very interesting issue of syntactical interoperability and portability of different knowledge organization systems. Moreover, it discusses the syntactical interoperability with the aim of providing a set of concrete technical solutions by means of DITA markup. In that respect, the paper provides an interesting and valuable solution to a concrete theoretical problem.

    There are, though, three different theoretical aspects addressed in the article that could be elaborated further and related more strongly to each other.

    (1) The article points out that technical documentation and training materials are important elements in enhancing the use and impact of agricultural research for development. It would be valuable if the authors described in more detail the concrete features of the technical documentation and the learning and training materials, the possible ways in which different user, research, and learning groups could share them, and the concrete ways in which they contribute to enhancing the use and impact of agricultural research.

    (2) It would be good to provide a more detailed analysis of the principles for identifying controlled vocabulary terms and of the classification principles relevant for agroforestry systems and learning practices. This also relates to the currently undefined principles for selecting the core set of terms from AGROVOC to be matched with the corresponding terminology of NALT (referred to in the paragraph before section 3.2).

    (3) Finally, concerning the ways of handling the controlled vocabulary in DITA markup, it would be good to elaborate on the nicely selected principles for glossary specialization: why and how might these principles of presenting the glossary (provided in section 3.2) contribute to enhancing the use and impact of agricultural research? A similar elaboration might be provided concerning the selection of the logical relations between the subjects: what is the specific theoretical or practical contribution of specifying the logical relations as the given hierarchical structures, and, in particular, what specific associative relationships are allowed by the classification mechanism? How might these specific classification mechanisms be expected to contribute to a more efficient production and reuse of agroforestry technical documentation and learning content?
    A more elaborate reflection on these questions might contribute to a stronger theoretical grounding of this very interesting article.

  3. What I enjoyed in this paper is the fact that it combines a really interesting background on agroforestry with some technical issues that come up when working with metadata in this field. Even though the process described includes significant aspects that are outside my area of expertise, the paper is easy to read if you have the basic background on the issues discussed.

    DITA looks like a suitable approach for this purpose, serving as an alternative to the existing solutions. It combines a lot of components and advantages, which was expected, as Thomas has been working in this specific field for a really long time. This experience is clearly reflected in this paper and I am really anxious to see DITA in practice.

  4. Thanks for making a presentation that doesn't state what you're saying, but shows us beautiful pictures and gives us the idea of the whole subject.
    I really liked the slide with the sieves!

  5. Good process: as new domains are created, we must approach the process by drawing on the information that is already useful from before. I look forward to learning more about the integration of terms used in the field and how they get tied to official terminology.
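
As an illustration of the quantifiable approach suggested in the first comment (counting matches between the KOS), a minimal sketch in Python might look like the following. It is not the author's method: the function names `normalize` and `count_matches` are hypothetical, the label lists are invented for illustration and are not taken from AGROVOC or NALT, and matching is reduced to exact comparison of normalized labels, which is only one of several possible matching strategies.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def count_matches(source_terms, target_terms):
    """Compare two label lists and report exact matches after normalization.

    Assumption: a "match" is an identical normalized label. Real KOS mappings
    also involve broader, narrower, and related matches, which are not covered here.
    """
    source = {normalize(t) for t in source_terms}
    target = {normalize(t) for t in target_terms}
    matched = source & target
    return {
        "matched": sorted(matched),
        "source_only": sorted(source - target),
        "target_only": sorted(target - source),
        # Share of source labels that have a counterpart in the target list
        "coverage": len(matched) / len(source) if source else 0.0,
    }


# Hypothetical label lists, for illustration only
vocab_a = ["Agroforestry", "Alley cropping", "Windbreaks", "Home gardens"]
vocab_b = ["agroforestry", "alley  cropping", "shelterbelts", "silvopastoral systems"]

report = count_matches(vocab_a, vocab_b)
print(report["matched"], report["coverage"])
```

The unmatched labels in `source_only` and `target_only` are the candidates for manual review, which is where the overall matching structure between the two vocabularies would start to become visible.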