The life of a language resource (LR), from its conception and drafting to its adult phases of active exploitation by the HLT community, varies considerably. Ensuring that language resources are part of a sustainable and enduring process is a multi-faceted challenge that calls for well-planned actions against neglect, carried out by the different actors participating in the process. Clearing all IPR issues and applying best practices at specification and production time are just two examples of such actions. Sustainability and lifecycle management should thus be addressed before embarking on any serious LR production.
When thinking of long-term LRs, a number of aspects come to mind that are not always taken into account before development. Among them are usability, accessibility, interoperability and scalability, each of which brings a long list of neglected points that need to be considered at a very early stage of development. Looking further into the portability and scalability of a language resource, a number of dimensions should be taken into account to ensure that it reaches its adult life in an active and productive way.
An aspect that is often neglected is the accessibility, and thus the secured reusability, of a language resource. Institutions such as ELRA (European Language Resources Association) and LDC (Linguistic Data Consortium), at a European and American level, respectively, as well as BAS (Bavarian Archive for Speech Signals) and TST-Centrale (Flemish-Dutch Human Language Technology Agency), at a language-specific level, have worked on these aspects for many years. Through their different activities, they have successfully implemented a sharing policy that allows different users to gain access to already existing resources. Other emerging programmes such as CLARIN (Common Language Resources and Technology Infrastructure) are also looking into these aspects. Nevertheless, many resources are still developed without a long-term accessibility plan in place, which makes it impossible to gain access to them once they are finished. Such a plan should consider issues such as ownership rights, licensing and types of use, aiming for a wide community from the very beginning. It also calls for optimal co-operation between all actors (LR users, financing bodies, owners, developers and organisations), so that issues related to the life of an LR are well established, roles and actors are clearly identified within the cycle, and best practices are defined for the management of the entire LR lifecycle.
We are aware, though, that the ideas presented above are only a starting point for discussion. We therefore invite the community to participate in this workshop and share their views on these and other relevant issues of concern. A fruitful discussion could lead to new mechanisms for sustaining language resources, and towards a sustainability model that guarantees an appropriate and well-defined LR storyboard and lifecycle management plan in the future.
Among the many issues and topics that may be presented and discussed during this workshop, we suggest the following:
Which fields require LRs and what are their respective needs?
What needs to be part of an LR storyboard? What points are we missing in its design?
General specifications vs. detailed specifications and design
Annotation frameworks and layers: interoperable at all?
Should the creation and provision of LRs be included in higher education curricula?
How to plan for scalable resources?
Language Resource maintenance and improvement: feasible?
Sharing language resources: how to bear this in mind and implement it? Logistics of sharing: online vs. offline
Centralised vs. decentralised, and national vs. international management and maintenance of LRs
What happens when users create updated or derived LRs?
Sharing language resources: legal issues concerned
Sharing language resources: pricing issues concerned, commercial vs. non-commercial use
Do LR actors work in a synchronised manner?
What should be the roles of the different actors?
What are the business models and arrangements for IPRs?
Self-supporting vs. subsidised LR organisations
Other general problems faced by the community
We solicit papers that address these questions and other related issues relevant to the workshop.
This full-day workshop is addressed to all those involved with language resources at some point in their research or work (LR users, producers, ...) and to all those with an interest in the different aspects involved, whether universities, companies or funding agencies. It aims to be a meeting and discussion point for the many bottlenecks that surround the life of a resource and that remain to be addressed with a sustainability plan.
The workshop features two invited talks, which open the morning and afternoon sessions, followed by submitted papers. It will conclude with a round table to brainstorm on the issues raised during the presentations and the individual discussions. This round table will be run by a number of experts already experienced in some of the highlighted problems, in open discussion with the workshop participants. In short, this workshop will result in a plan of action towards implementing a sustainability and lifecycle management plan.
Mark Liberman, LDC (Linguistic Data Consortium), USA.