Changes between Initial Version and Version 1 of DatapkgDistribution


Timestamp: 12/26/10 18:05:37
Author: rgrp
Comment: mass import of existing work/research on the topic of distribution formats and metadata

[[PageOutline]]

Designing a distribution format for datapkgs.

We would like it to be:

 * Simple
 * Extensible
 * Human editable
 * Machine usable (easily parsable and editable)
 * Based on existing standard formats
 * Not tied to a particular language or system

= Proposed Format =

 * metadata.json - package metadata
 * manifest.json - file listing (may be optional)
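
A minimal Python sketch of how a tool might read a package laid out this way. It assumes both files sit at the top level of the package directory and that a missing manifest.json simply means "no file listing". The function names and the `{"files": [...]}` manifest structure are hypothetical and not part of any agreed spec.

{{{
#!python
import json
from pathlib import Path


def build_manifest(pkg_dir):
    """Hypothetical manifest.json content: sorted relative paths of all files."""
    root = Path(pkg_dir)
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != "manifest.json"
    )
    return {"files": files}


def load_datapkg(pkg_dir):
    """Return (metadata, manifest); manifest is None when manifest.json is absent."""
    root = Path(pkg_dir)
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    manifest_path = root / "manifest.json"
    manifest = None
    if manifest_path.is_file():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return metadata, manifest
}}}

Keeping the manifest optional means a hand-written package needs nothing more than metadata.json. A tool can generate the listing later.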

== Metadata ==

See also the current source at: https://bitbucket.org/okfn/datapkg/src/tip/datapkg/metadata.py

Current spec (heavily based on Python distributions):

 * version
 * license
 * author
 * author_email
 * maintainer
 * maintainer_email
 * url
 * notes
 * tags
 * resources - URLs where package data can be obtained
  * download_url - deprecated in favour of resources
 * extras - arbitrary additional metadata
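
Because download_url is deprecated in favour of resources, a tool reading older metadata could fold one into the other. The following is a sketch in Python. It assumes that resources is a plain list of URL strings, as the list above suggests. `upgrade_download_url` is a hypothetical helper and is not a function in datapkg's metadata.py.

{{{
#!python
def upgrade_download_url(metadata):
    """Return a copy of metadata with the deprecated download_url moved into resources.

    Assumes resources is a list of URL strings. The input dict is not modified,
    and the result always has a "resources" list (possibly empty).
    """
    upgraded = dict(metadata)
    download_url = upgraded.pop("download_url", None)
    resources = list(upgraded.get("resources", []))
    if download_url and download_url not in resources:
        resources.append(download_url)
    upgraded["resources"] = resources
    return upgraded
}}}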

=== Future spec ===

This is very closely based on the CommonJS package spec (also a JSON-based format), which in turn shares many attributes with Debian packages, Python distributions, etc.

 * name - the name of the package.
 * description - a brief description of the package. By convention, the first sentence (up to the first ". ") should be usable as a package title in listings.
 * version - a version string conforming to the Semantic Versioning requirements (http://semver.org/).
 * keywords - an Array of string keywords to assist users searching for the package in catalogs.
 * maintainers - Array of maintainers of the package. Each maintainer is a hash which must have a "name" property and may optionally provide "email" and "web" properties.
 * licenses - array of licenses under which the package is provided. Each license is a hash with a "type" property specifying the type of license and a "url" property linking to the actual text.
 * repositories - Array of repositories where the package can be located. Each repository is a hash with "type" and "url" properties giving the location of the repository to clone/check out the package. A "path" property may also be specified to locate the package in the repository if it does not reside at the root.
 * dependencies - Hash of prerequisite packages on which this package depends in order to install and run. Each dependency defines the lowest compatible MAJOR[.MINOR[.PATCH]] dependency versions (only one per MAJOR version) with which the package has been tested and is assured to work. The version may be a simple version string (see the version property for acceptable forms), or it may be a hash group of dependencies which define a set of options, any one of which satisfies the dependency. The ordering of the group is significant and earlier entries have higher priority.

Optional attributes:

 * contributors - an Array of hashes each containing the details of a contributor. Format is the same as for author. By convention, the first contributor is the original author of the package.
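
The required fields, the version rule and the title convention can all be checked mechanically. This sketch makes three assumptions. First, every field listed before "Optional attributes" is treated as required. Second, only the MAJOR.MINOR.PATCH core of a semver string is accepted, so pre-release and build suffixes are rejected. Third, `check_metadata` and `title_from_description` are hypothetical names.

{{{
#!python
import re

# Only the MAJOR.MINOR.PATCH core of semver; pre-release/build suffixes are rejected.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")

# Assumption: every field listed before "Optional attributes" is required.
REQUIRED_FIELDS = ("name", "description", "version", "keywords",
                   "maintainers", "licenses", "repositories", "dependencies")


def title_from_description(description):
    """The first sentence (up to the first ". ") doubles as a listing title."""
    title, found, _ = description.partition(". ")
    return title if found else description


def check_metadata(meta):
    """Return a list of problems; an empty list means none were detected."""
    problems = ["missing field: " + field
                for field in REQUIRED_FIELDS if field not in meta]
    version = meta.get("version")
    if version is not None and not SEMVER_CORE.match(version):
        problems.append("version is not MAJOR.MINOR.PATCH: %r" % version)
    for index, maintainer in enumerate(meta.get("maintainers", [])):
        if "name" not in maintainer:  # "name" is mandatory, "email"/"web" optional
            problems.append("maintainer %d has no name" % index)
    for index, entry in enumerate(meta.get("licenses", [])):
        if "type" not in entry:
            problems.append("license %d has no type" % index)
    return problems
}}}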

= Existing Distribution Formats =

== Debs ==

http://www.debian.org/doc/debian-policy/ch-controlfields.html

The fields in the binary package paragraphs are:

 * Package (mandatory)
 * Architecture (mandatory)
 * Section (recommended)
 * Priority (recommended)
 * Essential
 * Depends et al.
 * Description (mandatory)
 * Homepage
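
A control file consists of `Field: value` lines. Continuation lines start with a space or a tab. Below is a minimal parser for a single paragraph that also reports missing mandatory fields. For simplicity it matches field names case-sensitively, whereas Debian treats them as case-insensitive. The helper names are hypothetical.

{{{
#!python
MANDATORY_FIELDS = ("Package", "Architecture", "Description")


def parse_control_paragraph(text):
    """Parse a single control paragraph into a dict of field name -> value.

    Continuation lines (starting with a space or tab) are appended to the
    previous field's value on a new line, with their first character removed.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # this sketch assumes one paragraph and skips blank lines
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line[1:]
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a field line: %r" % line)
            current = name.strip()
            fields[current] = value.strip()
    return fields


def missing_mandatory(fields):
    """List the mandatory binary-package fields absent from a parsed paragraph."""
    return [name for name in MANDATORY_FIELDS if name not in fields]
}}}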

=== 5.6.2 Maintainer ===

The package maintainer's name and email address. The name must come first, then the email address inside angle brackets <> (in RFC822 format).
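
Python's standard `email.utils.parseaddr` already understands the RFC822 `Name <address>` form, so a Maintainer check can be a thin wrapper around it. In this sketch, the wrapper name and its error handling are assumptions.

{{{
#!python
from email.utils import parseaddr


def parse_maintainer(field_value):
    """Split a Maintainer value of the form "Name <email>" into (name, email).

    parseaddr does the RFC822 parsing; the extra checks enforce that a name
    is present and that the address appears inside angle brackets.
    """
    name, email = parseaddr(field_value)
    if not name or not email or "<" not in field_value:
        raise ValueError("expected 'Name <email>', got %r" % field_value)
    return name, email
}}}

The resulting pair maps directly onto a CommonJS-style maintainer hash such as `{"name": name, "email": email}`.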

=== 5.6.13 Description ===

In a source or binary control file, the Description field contains a description of the binary package, consisting of two parts: the synopsis (or short description) and the long description. The synopsis goes on the same line as the field name; the long description follows on continuation lines, each beginning with a space.
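
The following sketch splits a raw Description value into its two parts. It relies on two Debian conventions: each continuation line begins with a single space, and a line consisting of just " ." stands for a blank line. `split_description` is a hypothetical name.

{{{
#!python
def split_description(raw_value):
    """Split a raw Description value into (synopsis, long_description).

    raw_value is everything after "Description:", including continuation
    lines. A continuation line " ." becomes an empty line in the result.
    """
    first, _, rest = raw_value.partition("\n")
    synopsis = first.strip()
    long_lines = []
    for line in rest.splitlines():
        body = line[1:] if line.startswith(" ") else line  # drop the leading space
        long_lines.append("" if body == "." else body)
    return synopsis, "\n".join(long_lines)
}}}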

=== 5.6.5 Section ===

This field specifies an application area into which the package has been classified. See Section 2.4 (Sections) of the Debian Policy Manual.

== JARs ==

http://java.sun.com/j2se/1.3/docs/guide/jar/jar.html

=== The META-INF directory ===

The following files/directories in the META-INF directory are recognized and interpreted by the Java 2 Platform to configure applications, extensions, class loaders and services:

 * MANIFEST.MF - the manifest file that is used to define extension and package related data.
 * INDEX.LIST
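
A JAR is a ZIP archive, so the manifest can be read with Python's `zipfile` module. The sketch below returns the headers of the manifest's main section and assumes the usual manifest syntax. Headers take the form `Name: value`, continuation lines begin with a single space, and a blank line ends the main section. Per-entry sections after that blank line are ignored. `read_jar_manifest` is a hypothetical name.

{{{
#!python
import io
import zipfile


def read_jar_manifest(jar_bytes):
    """Return the main-section headers of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    headers = {}
    last = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last is not None:
            headers[last] += line[1:]  # continuation of the previous header
        else:
            name, _, value = line.partition(": ")
            headers[name] = value
            last = name
    return headers
}}}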

== CommonJS javascript packages ==

http://wiki.commonjs.org/wiki/Packages/1.0

The following is an extract:

=== Packages ===

This specification describes the CommonJS package format for distributing CommonJS programs and libraries. A CommonJS package is a cohesive wrapping of a collection of modules, code and other assets into a single form. It provides the basis for convenient delivery, installation and management of CommonJS components.

This specifies the CommonJS package descriptor file and package file format. It does not specify a package catalogue file or format; this is an exercise for future specifications.

The package descriptor file is a statement of known fact at the time the package is published and may not be modified without publishing a new release.

=== Package Descriptor File ===

Each package must provide a top-level package descriptor file called "package.json". This file is a JSON format file. Each package must provide all the following fields in its package descriptor file.

 * name - the name of the package.
 * description - a brief description of the package. By convention, the first sentence (up to the first ". ") should be usable as a package title in listings.
 * version - a version string conforming to the Semantic Versioning requirements (http://semver.org/).
 * keywords - an Array of string keywords to assist users searching for the package in catalogs.
 * maintainers - Array of maintainers of the package. Each maintainer is a hash which must have a "name" property and may optionally provide "email" and "web" properties.
 * contributors - an Array of hashes each containing the details of a contributor. Format is the same as for author. By convention, the first contributor is the original author of the package.
 * bugs - URL for submitting bugs. Can be mailto or http.
 * licenses - array of licenses under which the package is provided. Each license is a hash with a "type" property specifying the type of license and a url property linking to the actual text. If the license is one of the [http://www.opensource.org/licenses/alphabetical official open source licenses] the official license name or its abbreviation may be explicated with the "type" property. If an abbreviation is provided (in parentheses), the abbreviation must be used.
 * repositories - Array of repositories where the package can be located. Each repository is a hash with properties for the "type" and "url" location of the repository to clone/checkout the package. A "path" property may also be specified to locate the package in the repository if it does not reside at the root.
 * dependencies - Hash of prerequisite packages on which this package depends in order to install and run. Each dependency defines the lowest compatible MAJOR[.MINOR[.PATCH]] dependency versions (only one per MAJOR version) with which the package has been tested and is assured to work. The version may be a simple version string (see the version property for acceptable forms), or it may be a hash group of dependencies which define a set of options, any one of which satisfies the dependency. The ordering of the group is significant and earlier entries have higher priority.
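
The dependency rule is the subtlest part of the descriptor. The sketch below implements one reading of it in Python. Under this reading, a declared version such as "1.2" is the lowest compatible release within MAJOR version 1, so 1.2.0 and any later 1.x release satisfy it but 2.0.0 does not. A hash group maps package names to such versions, is checked in order, and the first satisfied entry wins. Both the interpretation and the helper names are assumptions; the CommonJS text does not specify them.

{{{
#!python
def parse_version(text):
    """Parse MAJOR[.MINOR[.PATCH]] into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected MAJOR[.MINOR[.PATCH]], got %r" % text)
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(installed, lowest_compatible):
    """True if installed has the same MAJOR version and is not older than the minimum."""
    have = parse_version(installed)
    need = parse_version(lowest_compatible)
    return have[0] == need[0] and have >= need


def satisfies_group(installed_versions, group):
    """Return the first package name in group that is installed at a compatible version.

    installed_versions maps name -> installed version; group maps name -> lowest
    compatible version. Dicts keep insertion order, so earlier entries win.
    """
    for name, minimum in group.items():
        have = installed_versions.get(name)
        if have is not None and satisfies(have, minimum):
            return name
    return None
}}}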

=== Catalog Properties ===

When a package.json is included in a catalog of packages, the following fields should be present for each package.

 * checksums - Hash of package checksums. This checksum is used by package manager tools to verify the integrity of a package. For example:

{{{
"checksums": {
  "md5": "841959b03e98c92d938cdeade9e0784d",
  "sha1": "f8919b549295a259a6cef5b06e7c86607a3c3ab7",
  "sha256": "1abb530034bc88162e8427245839ec17c5515e01a5dede6e702932bbebbfe8a7"
}
}}}

These checksums are meant to be added automatically by the catalog service.
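
A catalog service, or a package manager checking a download, could compute and verify such a hash with Python's standard `hashlib`. In this minimal sketch the function names are hypothetical. The algorithm names are taken from the keys of the example above.

{{{
#!python
import hashlib


def package_checksums(data):
    """Build a "checksums" hash (md5, sha1, sha256) for the bytes of a package file."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify(data, checksums):
    """Return True if every listed checksum matches data (ignoring case and stray spaces)."""
    for algorithm, expected in checksums.items():
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.strip().lower():
            return False
    return True
}}}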

== Open Document Format ==

http://en.wikipedia.org/wiki/OpenDocument_technical_specification#Format_internals

{{{
meta.xml
META-INF/
  manifest.xml
}}}

meta.xml contains the file metadata, for example the author, "last modified by", the date of last modification, etc. The contents look somewhat like this:

{{{
<meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
<dc:creator>Daniel Carrera</dc:creator>
<dc:date>2005-06-29T22:02:06</dc:date>
<dc:language>es-ES</dc:language>
<meta:document-statistic table-count="6" object-count="0"
  page-count="59" paragraph-count="676"
  image-count="2" word-count="16701"
  character-count="98757"/>
}}}
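
The following sketch reads these fields with `xml.etree.ElementTree`. It assumes a complete meta.xml, where the elements above are wrapped in the `office:document-meta` and `office:meta` elements along with their namespace declarations. The extract omits these wrappers. `read_odf_meta` is a hypothetical name.

{{{
#!python
import xml.etree.ElementTree as ET

# Namespace URIs for ODF office/meta elements and Dublin Core (dc) elements.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_odf_meta(xml_text):
    """Extract a few Dublin Core / ODF fields and the document statistics."""
    root = ET.fromstring(xml_text)
    meta = root.find("office:meta", NS)
    if meta is None:
        raise ValueError("no office:meta element found")
    result = {
        "creator": meta.findtext("dc:creator", namespaces=NS),
        "date": meta.findtext("dc:date", namespaces=NS),
        "language": meta.findtext("dc:language", namespaces=NS),
        "creation_date": meta.findtext("meta:creation-date", namespaces=NS),
    }
    stats = meta.find("meta:document-statistic", NS)
    if stats is not None:
        # Strip any "{namespace}" prefix so both meta:page-count and page-count work.
        result["statistics"] = {
            key.split("}", 1)[-1]: int(value) for key, value in stats.attrib.items()
        }
    return result
}}}

Missing elements come back as None rather than raising, which suits metadata where most fields are optional.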

META-INF is a separate folder. Information about the files contained in the OpenDocument package is stored in an XML file called the manifest file. The manifest file is always stored at the pathname META-INF/manifest.xml. The main pieces of information stored in the manifest are:

 * A list of all of the files in the package.
 * The media type of each file in the package.
 * If a file stored in the package is encrypted, the information required to decrypt the file is stored in the manifest.
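
The sketch below lists each file entry in META-INF/manifest.xml, along with its media type and whether it carries encryption data. It again uses `xml.etree.ElementTree`. The element and attribute names (`manifest:file-entry`, `manifest:full-path`, `manifest:media-type`, `manifest:encryption-data`) follow the ODF manifest schema. `list_manifest` is a hypothetical name.

{{{
#!python
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def qualified(local_name):
    """Return an ElementTree-style "{namespace}name" in the manifest namespace."""
    return "{" + MANIFEST_NS + "}" + local_name


def list_manifest(xml_text):
    """Return (full_path, media_type, encrypted) for each manifest:file-entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(qualified("file-entry")):
        path = entry.get(qualified("full-path"))
        media_type = entry.get(qualified("media-type"))
        # Encrypted files carry a manifest:encryption-data child element.
        encrypted = entry.find(qualified("encryption-data")) is not None
        entries.append((path, media_type, encrypted))
    return entries
}}}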