wiki:DatapkgDistribution

Version 1 (modified by rgrp, 3 years ago)

mass import of existing work/research on the topic of distribution formats and metadata

Designing a distribution format for datapkgs.

We would like the format to be:

  • Simple
  • Extensible
  • Human editable
  • Machine usable (easily parsable and editable)
  • Based on existing standard formats
  • Not linked to a particular language or system

Proposed Format

  • metadata.json - for metadata
  • manifest.json - file listings (may be optional)
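As a rough illustration of the proposed layout, the Python sketch below reads both files from a package directory. The function name `load_datapkg` and the choice to return `None` for a missing manifest are assumptions made for this example, not part of datapkg itself.

```python
import json
from pathlib import Path


def load_datapkg(path):
    """Read metadata.json (required) and manifest.json (optional) from a directory.

    Hypothetical helper: returns a (metadata, manifest) pair, where manifest
    is None when the package ships no manifest.json.
    """
    root = Path(path)
    with open(root / "metadata.json", encoding="utf-8") as f:
        metadata = json.load(f)

    manifest = None
    manifest_path = root / "manifest.json"
    if manifest_path.exists():
        with open(manifest_path, encoding="utf-8") as f:
            manifest = json.load(f)
    return metadata, manifest
```

Treating the manifest as optional at load time keeps a hand-written package (a single metadata.json next to the data) valid.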

Metadata

See also the current source at: https://bitbucket.org/okfn/datapkg/src/tip/datapkg/metadata.py

Current spec (heavily based on Python distributions)

  • version
  • license
  • author
  • author_email
  • maintainer
  • maintainer_email
  • url
  • notes
  • tags
  • resources - urls where package data can be obtained
    • download_url - deprecated in favour of resources
  • extras - arbitrary additional metadata
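Since download_url is deprecated in favour of resources, a reader of older metadata may want to fold one into the other. The sketch below assumes that resources is a plain list of URL strings, which the list above suggests but does not state; `normalize_resources` is a hypothetical name.

```python
def normalize_resources(metadata):
    """Return a copy of metadata with a deprecated download_url folded into resources.

    Assumption: "resources" is a list of URL strings.
    """
    result = dict(metadata)
    resources = list(result.get("resources", []))
    download_url = result.pop("download_url", None)
    if download_url and download_url not in resources:
        resources.append(download_url)
    result["resources"] = resources
    return result


# Example with made-up values:
old_style = {"version": "0.1", "download_url": "http://example.org/data.csv"}
new_style = normalize_resources(old_style)
```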

Future spec

This is very closely based on the CommonJS package spec (also a JSON-based format). That spec in turn shares many attributes with Debs, Python distributions, etc.

  • name - the name of the package.
  • description - a brief description of the package. By convention, the first sentence (up to the first ". ") should be usable as a package title in listings.
  • version - a version string conforming to the Semantic Versioning requirements (http://semver.org/).
  • keywords - an Array of string keywords to assist users searching for the package in catalogs.
  • maintainers - Array of maintainers of the package. Each maintainer is a hash which must have a "name" property and may optionally provide "email" and "web" properties.
  • licenses - Array of licenses under which the package is provided. Each license is a hash with a "type" property specifying the type of license and a "url" property linking to the actual text.
  • repositories - Array of repositories where the package can be located. Each repository is a hash with properties for the "type" and "url" location of the repository to clone/checkout the package. A "path" property may also be specified to locate the package in the repository if it does not reside at the root.
  • dependencies - Hash of prerequisite packages on which this package depends in order to install and run. Each dependency defines the lowest compatible MAJOR[.MINOR[.PATCH]] dependency versions (only one per MAJOR version) with which the package has been tested and is assured to work. The version may be a simple version string (see the version property for acceptable forms), or it may be a hash group of dependencies which define a set of options, any one of which satisfies the dependency. The ordering of the group is significant and earlier entries have higher priority.
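The dependency rules are easier to follow with a small example. The Python sketch below is one possible reading of them, not a definitive one: a plain version string is treated as the minimum acceptable version within the same MAJOR series, and a hash group is tried in order so that earlier entries win. The names `parse_version`, `satisfies` and `resolve`, and the `installed` mapping of package name to version, are all hypothetical.

```python
def parse_version(text):
    """Turn 'MAJOR[.MINOR[.PATCH]]' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(installed_version, minimum):
    """Assumed rule: same MAJOR version, and at least the stated minimum."""
    have, need = parse_version(installed_version), parse_version(minimum)
    return have[0] == need[0] and have >= need


def resolve(name, spec, installed):
    """Return the name of the installed package that satisfies a dependency, or None.

    spec is either a version string or an ordered hash of alternatives;
    Python dicts (and json.load) preserve the order of the group.
    """
    options = {name: spec} if isinstance(spec, str) else spec
    for option_name, minimum in options.items():
        if option_name in installed and satisfies(installed[option_name], minimum):
            return option_name
    return None


installed = {"sqlite": "3.6.2", "json": "1.2.0"}
resolve("json", "1.0", installed)                            # simple version string
resolve("db", {"postgres": "8.0", "sqlite": "3.0"}, installed)  # ordered group
```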

Optional attributes:

  • contributors - an Array of hashes each containing the details of a contributor. Format is the same as for maintainers. By convention, the first contributor is the original author of the package.
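A minimal validation sketch for the future spec follows. It assumes that every attribute listed before "Optional attributes" is required, and it uses a simplified regular expression for the version rather than the full Semantic Versioning grammar. `REQUIRED_FIELDS`, `SEMVER_RE` and `validate_metadata` are illustrative names only.

```python
import re

# Assumption: everything outside "Optional attributes" is mandatory.
REQUIRED_FIELDS = (
    "name", "description", "version", "keywords",
    "maintainers", "licenses", "repositories", "dependencies",
)

# Simplified MAJOR.MINOR.PATCH check with optional pre-release/build suffixes.
SEMVER_RE = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def validate_metadata(metadata):
    """Return a list of human-readable problems; an empty list means no problem found."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in metadata:
            errors.append("missing field: " + field)

    version = metadata.get("version")
    if version is not None and not SEMVER_RE.match(str(version)):
        errors.append("version is not MAJOR.MINOR.PATCH: %r" % (version,))

    # Each maintainer hash must carry a "name"; "email" and "web" are optional.
    for index, maintainer in enumerate(metadata.get("maintainers", [])):
        if "name" not in maintainer:
            errors.append("maintainer %d has no name" % index)
    return errors
```

Returning a list of problems rather than raising on the first one suits a human-editable format, where an author will usually want to fix everything in one pass.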

Existing Distribution Formats

Debs

http://www.debian.org/doc/debian-policy/ch-controlfields.html

The fields in the binary package paragraphs are:

  • Package (mandatory)
  • Architecture (mandatory)
  • Section (recommended)
  • Priority (recommended)
  • Essential
  • Depends et al
  • Description (mandatory)
  • Homepage

5.6.2 Maintainer

The package maintainer's name and email address. The name must come first, then the email address inside angle brackets <> (in RFC822 format).
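For comparison with the "name" and "email" properties of a maintainer hash, a Debian-style Maintainer value can be split with Python's standard library. This is only a sketch; `parse_maintainer` is a hypothetical helper and the address is made up.

```python
from email.utils import formataddr, parseaddr


def parse_maintainer(value):
    """Split 'Name <email>' into a hash with "name" and "email" properties."""
    name, email = parseaddr(value)
    return {"name": name, "email": email}


maintainer = parse_maintainer("Jane Doe <jane@example.org>")
# Going the other way, from a hash back to the Debian-style string:
as_field = formataddr((maintainer["name"], maintainer["email"]))
```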

5.6.13 Description

In a source or binary control file, the Description field contains a description of the binary package, consisting of two parts, the synopsis or the short description, and the long description. The field's format is as follows:
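The extract stops before showing the format itself. In Debian policy the synopsis is the first line of the field value, and the long description follows on continuation lines that start with a space; a line holding only a space and a full stop stands for a blank line. A minimal Python sketch of splitting such a value, with `parse_description` as a hypothetical name:

```python
def parse_description(field_value):
    """Split a Debian Description value into (synopsis, long_description)."""
    lines = field_value.split("\n")
    synopsis = lines[0].strip()
    body = []
    for line in lines[1:]:
        # Continuation lines start with one space, which is not part of the text.
        text = line[1:] if line.startswith(" ") else line
        # " ." marks a paragraph break in the long description.
        body.append("" if text.strip() == "." else text)
    return synopsis, "\n".join(body)


synopsis, long_description = parse_description(
    "example data package\n First paragraph.\n .\n Second paragraph."
)
```

The synopsis plays much the same role as the first sentence of a CommonJS description: a short title usable in listings.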

5.6.5 Section

This field specifies an application area into which the package has been classified. See Sections, Section 2.4.

JARs

http://java.sun.com/j2se/1.3/docs/guide/jar/jar.html

The META-INF directory

The following files/directories in the META-INF directory are recognized and interpreted by the Java 2 Platform to configure applications, extensions, class loaders and services:

MANIFEST.MF - The manifest file that is used to define extension and package related data.

INDEX.LIST
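MANIFEST.MF is a plain-text file of "Name: Value" headers, in which a line beginning with a single space continues the previous value and a blank line ends a section. The Python sketch below reads only the main section; `parse_manifest_main_section` is an illustrative name, not a Java or datapkg API.

```python
def parse_manifest_main_section(text):
    """Parse the main section of a MANIFEST.MF into a dict of header -> value."""
    headers = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            headers[last_key] += line[1:]  # continuation of the previous value
            continue
        key, _, value = line.partition(": ")
        headers[key] = value
        last_key = key
    return headers


example = (
    "Manifest-Version: 1.0\n"
    "Created-By: 1.3.0 (Sun Micro\n"
    " systems Inc.)\n"
    "\n"
    "Name: common/class1.class\n"
)
main_section = parse_manifest_main_section(example)
```

The format is easy to edit by hand but needs a custom parser, which is one argument for JSON in the proposed format.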

CommonJS JavaScript packages

http://wiki.commonjs.org/wiki/Packages/1.0

The following is an extract:

Packages

This specification describes the CommonJS package format for distributing CommonJS programs and libraries. A CommonJS package is a cohesive wrapping of a collection of modules, code and other assets into a single form. It provides the basis for convenient delivery, installation and management of CommonJS components.

This specifies the CommonJS package descriptor file and package file format. It does not specify a package catalogue file or format; this is an exercise for future specifications. The package descriptor file is a statement of known fact at the time the package is published and may not be modified without publishing a new release.

Package Descriptor File

Each package must provide a top-level package descriptor file called "package.json". This file is a JSON format file. Each package must provide all the following fields in its package descriptor file.

  • name - the name of the package.
  • description - a brief description of the package. By convention, the first sentence (up to the first ". ") should be usable as a package title in listings.
  • version - a version string conforming to the Semantic Versioning requirements (http://semver.org/).
  • keywords - an Array of string keywords to assist users searching for the package in catalogs.
  • maintainers - Array of maintainers of the package. Each maintainer is a hash which must have a "name" property and may optionally provide "email" and "web" properties.
  • contributors - an Array of hashes each containing the details of a contributor. Format is the same as for author. By convention, the first contributor is the original author of the package.
  • bugs - URL for submitting bugs. Can be mailto or http.
  • licenses - array of licenses under which the package is provided. Each license is a hash with a "type" property specifying the type of license and a url property linking to the actual text. If the license is one of the official open source licenses the official license name or its abbreviation may be explicated with the "type" property. If an abbreviation is provided (in parentheses), the abbreviation must be used.
  • repositories - Array of repositories where the package can be located. Each repository is a hash with properties for the "type" and "url" location of the repository to clone/checkout the package. A "path" property may also be specified to locate the package in the repository if it does not reside at the root.
  • dependencies - Hash of prerequisite packages on which this package depends in order to install and run. Each dependency defines the lowest compatible MAJOR[.MINOR[.PATCH]] dependency versions (only one per MAJOR version) with which the package has been tested and is assured to work. The version may be a simple version string (see the version property for acceptable forms), or it may be a hash group of dependencies which define a set of options, any one of which satisfies the dependency. The ordering of the group is significant and earlier entries have higher priority.

Catalog Properties

When a package.json is included in a catalog of packages, the following fields should be present for each package.

  • checksums - Hash of package checksums. This checksum is used by package manager tools to verify the integrity of a package. For example:
 checksums: {
   "md5": "841959b03e98c92d938cdeade9e0784d",
   "sha1": "f8919b549295a259a6cef5b06e7c86607a3c3ab7",
   "sha256": "1abb530034bc88162e8427245839ec17c5515e01a5dede6e702932bbebbfe8a7"
 }

This checksum is meant to be added automatically by the catalog service.
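A catalog service or package manager could produce and check such a hash roughly as follows. This is a sketch using Python's hashlib; the function names are hypothetical, and the package is assumed to be available as a single byte string.

```python
import hashlib


def compute_checksums(data):
    """Return a checksums hash (md5, sha1, sha256) for the given package bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify_checksums(data, expected):
    """True only if every digest listed in expected matches the data.

    Digests are compared case-insensitively and with surrounding
    whitespace ignored; an unknown algorithm name counts as a mismatch.
    """
    actual = compute_checksums(data)
    return all(
        actual.get(algorithm) == digest.strip().lower()
        for algorithm, digest in expected.items()
    )
```

For large packages the bytes would normally be fed to the hash objects in chunks via `update()` rather than read into memory at once.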

Open Document Format

http://en.wikipedia.org/wiki/OpenDocument_technical_specification#Format_internals

meta.xml
META-INF/
  manifest.xml

meta.xml contains the file metadata. For example, Author, "Last modified by", date of last modification, etc. The contents look somewhat like this:

    <meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
    <dc:creator>Daniel Carrera</dc:creator>
    <dc:date>2005-06-29T22:02:06</dc:date>
    <dc:language>es-ES</dc:language>
    <meta:document-statistic  table-count="6" object-count="0"
      page-count="59" paragraph-count="676"
      image-count="2" word-count="16701"
      character-count="98757"/>

META-INF is a separate folder. Information about the files contained in the OpenDocument package is stored in an XML file called the manifest file. The manifest file is always stored at the pathname META-INF/manifest.xml. The main pieces of information stored in the manifest are:

  • A list of all of the files in the package.
  • The media type of each file in the package.
  • If a file stored in the package is encrypted, the information required to decrypt the file is stored in the manifest.
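To show what a file listing of this kind looks like to a consumer, the Python sketch below extracts the path and media type of each entry from a manifest.xml string. The sample document is made up for illustration, and `list_manifest_entries` is a hypothetical name; in a real package the XML would be read from META-INF/manifest.xml inside the zip archive.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument manifest elements and attributes.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def list_manifest_entries(manifest_xml):
    """Return (full-path, media-type) pairs for every file-entry in the manifest."""
    root = ET.fromstring(manifest_xml)
    entries = []
    for entry in root.iter("{%s}file-entry" % MANIFEST_NS):
        entries.append((
            entry.get("{%s}full-path" % MANIFEST_NS),
            entry.get("{%s}media-type" % MANIFEST_NS),
        ))
    return entries


SAMPLE = (
    '<manifest:manifest xmlns:manifest="%s">'
    '<manifest:file-entry manifest:full-path="/"'
    ' manifest:media-type="application/vnd.oasis.opendocument.text"/>'
    '<manifest:file-entry manifest:full-path="content.xml"'
    ' manifest:media-type="text/xml"/>'
    '</manifest:manifest>' % MANIFEST_NS
)
entries = list_manifest_entries(SAMPLE)
```

A manifest.json for datapkg could carry the same two pieces of information per file with far less parsing overhead.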