ResourceTagging
From Create Wiki
Resource tagging is useful for organizing resources (brushes, patterns, etc.), and it might be useful for many applications. It might make sense to share resources and tags between applications, or at least to share the user experience, since the user performs the same action in each. Therefore there is an effort to describe how tagging could be better integrated into applications.
Draft File Format
(provided as a background for discussion.)
Tags would be stored in a separate file (not embedded into resources).
Example:
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti.gih" checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>aaa</tag>
</resource>
</tags>
Resources are mapped with entries in tags.xml using:
* identifier - the filename for resources stored on disk. Easy to understand, but can differ between computers and platforms.
* checksum - an MD5 checksum that uniquely represents the resource.
Currently the identifier is the primary means of identifying a resource; the checksum is used only when the identifier no longer matches an actual resource file (for example after a file rename or move).
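The lookup order described above (identifier first, checksum as a fallback) can be sketched as follows. This is a minimal illustration, not the GIMP implementation: the function names `md5_hex` and `find_tags`, the sample paths, and the stand-in file contents are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(data: bytes) -> str:
    """MD5 of the resource contents, as stored in the checksum attribute."""
    return hashlib.md5(data).hexdigest()


def find_tags(root, path, data):
    """Return the tags for a resource: identifier first, checksum as fallback."""
    by_identifier = {}
    by_checksum = {}
    for res in root.iter("resource"):
        tags = [t.text for t in res.findall("tag")]
        by_identifier[res.get("identifier")] = tags
        by_checksum[res.get("checksum")] = tags
    if path in by_identifier:
        return by_identifier[path]
    # The file may have been renamed or moved: fall back to its checksum.
    return by_checksum.get(md5_hex(data), [])


# Illustrative data only: the bytes stand in for a real brush file.
brush = b"stand-in brush contents"
root = ET.fromstring(
    '<tags><resource identifier="/old/brushes/confetti.gih" '
    'checksum="%s"><tag>aaa</tag></resource></tags>' % md5_hex(brush)
)
print(find_tags(root, "/old/brushes/confetti.gih", brush))  # matched by identifier
print(find_tags(root, "/new/brushes/renamed.gih", brush))   # matched by checksum
```

A resource that matches neither the identifier nor the checksum simply has no tags in this sketch.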
A tag is a text string which cannot contain separator characters, newlines, tabs, etc. No additional information, such as language, is stored with a tag. However, tags provided with GIMP can be translated into various languages. The localized version is installed in the user's home directory on the first run, but it is not changed when the user later switches to a different language.
Discussion
Identifier
The identifier is an absolute path; it should be no more than a relative one. For cross-platform users and/or users sharing assets from one OS to another, this is a significant issue. It might also be handy to use some logical paths/types.
Absolute/Relative
Converting the identifier to a relative path would then give
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="brushes/confetti.gih" checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>aaa</tag>
</resource>
</tags>
Logical Bases
The use of a logical base would change the identifier to
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="${BRUSHES}/confetti.gih" checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>aaa</tag>
</resource>
</tags>
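One way an application might expand a logical base such as `${BRUSHES}` is plain placeholder substitution. The sketch below uses Python's `string.Template`, which happens to share the `${NAME}` syntax. The mapping `LOGICAL_BASES`, the directory it points at, and the function name `expand_identifier` are hypothetical; the draft does not define how logical bases are configured.

```python
from string import Template

# Hypothetical mapping of logical base names to directories. A real
# application would fill this in from its own configuration.
LOGICAL_BASES = {
    "BRUSHES": "/usr/share/gimp/2.0/brushes",
}


def expand_identifier(identifier, bases=LOGICAL_BASES):
    """Replace ${NAME} placeholders with the configured base directory."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(identifier).safe_substitute(bases)


print(expand_identifier("${BRUSHES}/confetti.gih"))
```

Leaving unknown placeholders untouched lets an application detect and skip identifiers that refer to a base it does not know about.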
Q: A relative path by itself is meaningless, as there has to be some "base" path to prepend (like working directory + relative path). A concrete application may have such a path defined.
A: By themselves, relative paths have an implicit base: the current application's base directory, or its directories for installing resources. In the initial example, the implied base would be
/home/auris/gimp/trunk/build/share/gimp/2.0
However, it should be assumed that there is more than one base.
Q: What if there are multiple base paths defined (say, ...brushes/johns/ and ...brushes/maries/) and each contains a file round.gbr? (The relative paths would be the same.)
A: Multiple base directories (including XDG base directories) should be assumed to be present, in a list ordered by precedence. An application would walk the list of base directories in order, trying each in turn until either a file is found or the end of the list is reached. The list usually includes directories both in the system share locations and under the current user's home directory. GLib has g_get_system_data_dirs() to fetch one such XDG list.
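The ordered walk described in this answer can be sketched as below. The helper names `xdg_data_dirs` and `resolve` are made up for the example; the fallback values are the defaults given in the XDG Base Directory Specification. Placing the user directory ahead of the system directories is an assumption about precedence, not something the draft prescribes.

```python
import os


def xdg_data_dirs(environ=os.environ):
    """Ordered base list: the user data dir first, then the system data dirs."""
    home = environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share")
    system = environ.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"
    return [home] + [d for d in system.split(":") if d]


def resolve(relative_identifier, base_dirs):
    """Return the first existing file, or None once the list is exhausted."""
    for base in base_dirs:
        candidate = os.path.join(base, relative_identifier)
        if os.path.isfile(candidate):
            return candidate
    return None


# Example call; the result depends on what is installed on the machine.
print(resolve("gimp/2.0/brushes/confetti.gih", xdg_data_dirs()))
```

With two bases that both contain round.gbr, the base listed first wins, which is why the order of the list has to be well defined.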
Q: What about other applications, which do not know the relative paths of a concrete application?
A: There are probably three main cases.
- The first case would be a tag file shipped with an application as seed data. That tag file would be in the application's data directories, so any other application would need to use an explicit base directory to get at the data to begin with. The same explicit base directory used to load the tag file would then also be used as the base for the resources listed in that tag file.
- The second case is a file stored somewhere under a user's local share. Normally that would be the user's config directory for that specific application (Application A). A second application (Application B) trying to access Application A's resources would also want to access Application A's tags for the current user. For the best user experience, Application B would need to locate both Application A's system data resources and Application A's user data resources. Given that, Application B could also load the tag file from Application A's user data directory base. The simplest way for Application B to know which user data dir matches is to ask the user the first time Application A's data is accessed.
- The third case is for shared resources. Shared resources go into a shared location for data, and a corresponding tag file can be looked for in that system shared directory and the user's current config directory for that shared resource. The precise details of this would be up to the group publishing the shared resources (and the shared resources spec).
Given all the possible sources, it might seem overly complex and prone to problems, especially in regards to multiple overlapping tag files and resources. However there are three main mitigating factors.
- First is that the actual complexity really only comes into play if one application is accessing the internal resources of a second application. The majority of applications will not be doing this, or at least will not initially be doing this.
- Second is that when there is a name collision of resource files loaded from two different sets (internal resources and shared resources) the checksum will allow an application to know which target resource is meant.
- Third is that when an identical resource (same relative name, same checksum) is present both in the application resources+tags and in the shared resources+tags, the simple solution is to add the tag to the user version of both tag sets.
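The second mitigating factor, using the checksum to tell apart two files with the same relative name, might look like the sketch below. The function name `pick_by_checksum`, the paths, and the byte strings are illustrative assumptions; file contents are passed in memory to keep the example self-contained.

```python
import hashlib


def pick_by_checksum(candidates, expected_md5):
    """Pick the candidate whose contents match the checksum from the tag file.

    candidates maps a full path to the bytes of that file; all candidates
    are assumed to share the same relative name.
    """
    for path, data in candidates.items():
        if hashlib.md5(data).hexdigest() == expected_md5:
            return path
    return None


# Two different files that collide on the relative name brushes/round.gbr.
internal = b"internal round brush"
shared = b"shared round brush"
candidates = {
    "/app/share/brushes/round.gbr": internal,
    "/shared/brushes/round.gbr": shared,
}
wanted = hashlib.md5(shared).hexdigest()
print(pick_by_checksum(candidates, wanted))
```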
Checksum
The checksum is a little vague. It would be helpful if it were listed as an explicit type. The use of MD5 could be sufficient if we are dealing just with simple identification and not any security issues. It is also helpful that command-line md5 tools are quite common.
A later addition could add SHA-1 or other checksums. Using explicit names would allow for simpler switching.
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti.gih" md5="d136b60fdd9cf41693a485a329b32e95">
<tag>aaa</tag>
</resource>
</tags>
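With the algorithm named by the attribute, a reader can pick whichever checksum it understands. The following is a rough sketch under the assumption that the attribute name doubles as the `hashlib` algorithm name; the `sha1` attribute is the possible later addition mentioned above, not part of the draft, and `ALGORITHMS` and `matches` are names invented for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

# Attribute names tried in order of preference. "sha1" is hypothetical:
# the draft only shows md5.
ALGORITHMS = ("sha1", "md5")


def matches(resource, data):
    """Check file contents against the first checksum attribute present."""
    for name in ALGORITHMS:
        expected = resource.get(name)
        if expected is not None:
            return hashlib.new(name, data).hexdigest() == expected.lower()
    return False  # no known checksum attribute on this resource


# Illustrative element; the bytes stand in for a real brush file.
data = b"stand-in brush contents"
resource = ET.Element("resource", {"md5": hashlib.md5(data).hexdigest()})
print(matches(resource, data))
```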
Q: I was wondering whether it would be nice to get rid of filenames entirely and use only the checksum. The reasons filenames exist are to track renamed resource files, and a secret hope that one day GIMP will actually load resource files only on demand.
Tag as name vs. label
The <tag> itself seems like it is a functional name, and not just a user displayable label. It is *almost* the latter, but not quite. Splitting out to an explicit label also helps make internationalization easier.
A simple change to include an explicit label would be
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti.gih"
checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>
<label>aaa</label>
</tag>
</resource>
</tags>
Perhaps the subtle difference can be put this way: instead of a tag being a label, a tag has a label.
I18N
Internationalization is probably quite important. Since the proposed tag format is XML it makes sense to leverage base XML methods for i18n. Using the xml:lang attribute to denote the language of a given tag label (note that this is per label, not per tag) would be helpful. The functional name of the tag itself can be denoted independent of language.
An example with "<key>" for the tag name and tags localized for two languages would be
<?xml version='1.0' encoding='UTF-8'?>
<tags>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti.gih"
checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>
<key>ch</key>
<label xml:lang="en">see-aich</label>
<label xml:lang="es">che</label>
</tag>
<tag>
<key>ll</key>
<label xml:lang="en">double-el</label>
<label xml:lang="es">elle</label>
</tag>
</resource>
</tags>
Questions:
Q: Tags tend to be repeated. Is it good to have "fuzzy", etc. with all of its translations repeated tens or hundreds of times?
A: I think this is a good point for potential enhancement. Instead of having tags required to be explicitly contained by a resource, we can have them optionally linked by reference.
Q: If a user assigns double-el to a resource in the en locale, is it expected to change to elle in the es locale?
A: Yes. If an 'es' locale user enters elle in the search window, that user is looking for the concept 'll'. They will want to see the pictures an 'en' locale user tagged as 'double-el'. (Of course, these are not the most likely tag names; they were just chosen to show simple character differences.) As another example, a Spanish-speaking user searching on "fuego" would like to see the images an English speaker tagged as "fire".
Q: What is key good for? Why can't it be a label without an xml:lang attribute?
A: The problem is that there is the user-visible string and there is the internal concept itself. In the simple case they are the same, but to be robust, a very basic i18n principle is to decouple internal identifiers from user-visible strings.
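The key/label split discussed in these answers can be sketched as a search that goes from a localized label to the language-independent key. The document below is a shortened copy of the I18N example with a relative identifier; `label_for` and `resources_matching` are hypothetical helper names, and falling back to the key when no label exists for a locale is an assumption, not part of the draft.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<tags>
<resource identifier="brushes/confetti.gih" checksum="d136b60fdd9cf41693a485a329b32e95">
<tag>
<key>ch</key>
<label xml:lang="en">see-aich</label>
<label xml:lang="es">che</label>
</tag>
<tag>
<key>ll</key>
<label xml:lang="en">double-el</label>
<label xml:lang="es">elle</label>
</tag>
</resource>
</tags>"""


def label_for(tag, lang):
    """Label to display for a locale; falls back to the key (an assumption)."""
    for label in tag.findall("label"):
        if label.get(XML_LANG) == lang:
            return label.text
    return tag.findtext("key")


def resources_matching(root, text, lang):
    """Find (identifier, key) pairs whose label in this locale equals text."""
    hits = []
    for res in root.iter("resource"):
        for tag in res.findall("tag"):
            if label_for(tag, lang) == text:
                hits.append((res.get("identifier"), tag.findtext("key")))
    return hits


root = ET.fromstring(DOC)
print(resources_matching(root, "elle", "es"))
print(resources_matching(root, "double-el", "en"))
```

Both searches arrive at the same key, ll, which is what lets a tag assigned in one locale be found in another.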
Tag Linking
For a large dataset there is a good chance that tags will be repeated. If <tag> elements are not just short, simple strings, it becomes useful to reference them from a single common instance.
<?xml version='1.0' encoding='UTF-8'?>
<tags xmlns:xlink="http://www.w3.org/1999/xlink">
<tag name="01">
<key>letter</key>
<label xml:lang="en">letter</label>
<label xml:lang="es">letra</label>
</tag>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti.gih"
checksum="d136b60fdd9cf41693a485a329b32e95">
<tag xlink:href="#01" />
<tag>
<key>ch</key>
<label xml:lang="en">see-aich</label>
<label xml:lang="es">che</label>
</tag>
</resource>
<resource identifier="/home/auris/gimp/trunk/build/share/gimp/2.0/brushes/confetti2.gih"
checksum="0036b60fdd9cf41693a485a329123456">
<tag xlink:href="#01" />
<tag>
<key>ll</key>
<label xml:lang="en">double-el</label>
<label xml:lang="es">elle</label>
</tag>
</resource>
</tags>
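Resolving linked tags on load might look like the sketch below. It assumes that the xlink prefix is declared on the root element (a namespace-aware parser rejects an undeclared prefix) and that `#01` refers to the top-level tag whose name attribute is 01. The document is a trimmed version of the example without labels, and `tag_keys` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """<tags xmlns:xlink="http://www.w3.org/1999/xlink">
<tag name="01">
<key>letter</key>
</tag>
<resource identifier="brushes/confetti.gih" checksum="d136b60fdd9cf41693a485a329b32e95">
<tag xlink:href="#01"/>
<tag><key>ch</key></tag>
</resource>
<resource identifier="brushes/confetti2.gih" checksum="0036b60fdd9cf41693a485a329123456">
<tag xlink:href="#01"/>
<tag><key>ll</key></tag>
</resource>
</tags>"""


def tag_keys(root):
    """Map each resource identifier to its tag keys, following references."""
    # Shared tags are the <tag> elements directly under the root.
    shared = {t.get("name"): t for t in root.findall("tag")}
    result = {}
    for res in root.findall("resource"):
        keys = []
        for tag in res.findall("tag"):
            href = tag.get(XLINK_HREF)
            if href is not None:
                target = shared.get(href[1:]) if href.startswith("#") else None
                if target is None:
                    continue  # dangling reference: skipped in this sketch
                tag = target
            keys.append(tag.findtext("key"))
        result[res.get("identifier")] = keys
    return result


root = ET.fromstring(DOC)
print(tag_keys(root))
```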
Proposed Refinement
<?xml version='1.0' encoding='UTF-8'?>
<tags xmlns:xlink="http://www.w3.org/1999/xlink">
<tag name="01">
<key>letter</key>
<label xml:lang="en">letter</label>
<label xml:lang="es">letra</label>
</tag>
<resource identifier="${BRUSHES}/confetti.gih" md5="d136b60fdd9cf41693a485a329b32e95">
<tag xlink:href="#01"/>
<tag>
<key>ch</key>
<label xml:lang="en">see-aich</label>
<label xml:lang="es">che</label>
</tag>
<tag>
<key>ll</key>
<label xml:lang="en">double-el</label>
<label xml:lang="es">elle</label>
</tag>
<tag>
<key>ñ</key>
<label xml:lang="es">eñe</label>
<label xml:lang="en">en-with-that-funny-squiggle-on-top</label>
</tag>
</resource>
</tags>
Storage
It might be helpful for applications to look for resource tagging info files in some common locations, and then overlay those with application-specific locations.
The XDG Base Directory Specification is one set of locations to consider: http://standards.freedesktop.org/basedir-spec/latest/
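What "overlay" means is not pinned down here. One possible reading, shown below purely as an assumption, is that tag sets loaded from the common locations are merged with the application-specific ones, so that a resource ends up with the union of its tags. The function name `overlay` and the sample tags are invented for the example.

```python
def overlay(*layers):
    """Merge tag layers in order; each layer maps identifier -> set of tags."""
    merged = {}
    for layer in layers:
        for identifier, tags in layer.items():
            merged.setdefault(identifier, set()).update(tags)
    return merged


# Hypothetical layers: one from a common location, one application-specific.
common = {"brushes/confetti.gih": {"party"}}
app_specific = {
    "brushes/confetti.gih": {"aaa"},
    "brushes/round.gbr": {"basic"},
}
print(overlay(common, app_specific))
```

An alternative reading, where a later layer replaces rather than extends the tags of an earlier one, would be an equally small change; the draft would need to say which is intended.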
Implementations
Code
| This section is incomplete. You can help by expanding it. |
References
- Wikipedia info on Tags.

