Changes between Initial Version and Version 1 of Ticket #2732

07/24/12 08:59:02


  • Ticket #2732 – Description

    We should simplify the upload and storage of files, initially supporting only local storage, with the archiver eventually being fixed to archive data externally. WIP pad is

     _Simplifying uploads_
     Currently, uploads are too painful and fiddly to use and configure. We want to simplify uploads so that they go directly to the CKAN server, without support for remote services (S3, etc.) or the dependencies those services introduce.

     We want to fix:
      * File uploads themselves
      * Storage of uploaded files
      * Notification of the upload to other components

     _File uploads_
     Things file upload should do:
      * Allow a sysadmin to disable uploads
      * Allow authenticated users to upload
      * Store whatever they send on disk, along with a DB entry linking the file to the uploader
      * When creating a resource, let the user choose from all of the files
        they have uploaded but not yet associated with a resource. This allows bulk
        upload followed by delayed association. Whenever a user creates a resource, they
        either upload a file now or pick from previously uploaded files.
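
     The "choose from previously uploaded files" behaviour needs little more than an owner and a nullable resource link per upload. The sketch below is illustrative only: `Upload`, `unassociated_uploads` and `associate` are hypothetical names, not CKAN APIs, and an in-memory list stands in for the DB table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
import uuid


@dataclass
class Upload:
    # Hypothetical record for one uploaded file; field names are illustrative.
    owner: str
    path: str
    resource: Optional[str] = None  # None until the file is attached to a resource
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    upload_date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def unassociated_uploads(uploads: List[Upload], user: str) -> List[Upload]:
    """Return the user's uploads that are not yet attached to a resource, newest first."""
    pending = [u for u in uploads if u.owner == user and u.resource is None]
    return sorted(pending, key=lambda u: u.upload_date, reverse=True)


def associate(upload: Upload, resource_id: str, user: str) -> None:
    """Attach an upload to a resource; only its owner may do this, and only once."""
    if upload.owner != user:
        raise PermissionError("only the uploader can associate this file")
    if upload.resource is not None:
        raise ValueError("upload is already associated with a resource")
    upload.resource = resource_id
```

     Bulk upload then amounts to creating several `Upload` rows with `resource=None`. The resource form lists `unassociated_uploads(...)` for the current user and calls `associate` on save.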

     ? Can we do the upload asynchronously and then associate the uploaded key with the
       resource before the save? What happens if the user tries to submit before the async upload finishes? Should we delay them?
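
     One possible answer to the question above is to let the form submit with the upload's key, then have the save step wait, up to a timeout, for the background upload to finish. This is a sketch under assumptions. `PendingUpload` and `save_resource` are hypothetical, and "complete" is taken to mean the received byte count has reached the expected size.

```python
import threading
from typing import Optional


class PendingUpload:
    """Hypothetical tracker for an upload that is still arriving in the background."""

    def __init__(self, key: str, expected_size: int) -> None:
        self.key = key
        self.expected_size = expected_size
        self.received = 0
        self._done = threading.Event()
        self._lock = threading.Lock()

    def add_chunk(self, data: bytes) -> None:
        # Called by the upload handler as data arrives; marks completion once
        # the expected number of bytes has been received.
        with self._lock:
            self.received += len(data)
            if self.received >= self.expected_size:
                self._done.set()

    def wait(self, timeout: Optional[float]) -> bool:
        """Block until the upload completes; return False if the timeout expires."""
        return self._done.wait(timeout)


def save_resource(resource: dict, upload: PendingUpload, timeout: float = 30.0) -> dict:
    # Delay the save rather than reject it immediately; give up after `timeout`.
    if not upload.wait(timeout):
        raise TimeoutError(f"upload {upload.key} has not finished; try again later")
    return {**resource, "upload_key": upload.key}
```

     A short timeout with a "still uploading" message is probably friendlier than holding the request open for a large file. That trade-off is still open in this ticket.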

     _File storage_
     File storage should be local to the CKAN install, not a remote service. Any archiving to remote storage providers should happen outside of the main request.
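
     "Outside of the main request" can be as simple as a queue drained by a worker thread. The request handler only enqueues the file's storage-relative path and returns at once. The sketch below assumes a single in-process worker and a caller-supplied `archive` function, both hypothetical. A production setup would more likely hand jobs to a separate process.

```python
import queue
import threading
from typing import Callable, List, Tuple


class ArchiveQueue:
    """Hypothetical background archiver. The upload request only enqueues a
    storage-relative path; one worker thread performs the slow copy later."""

    def __init__(self, archive: Callable[[str], None]) -> None:
        self._jobs = queue.Queue()
        self._archive = archive
        self.failed: List[Tuple[str, Exception]] = []
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, relative_path: str) -> None:
        # Returns immediately, so archiving never blocks the upload request.
        self._jobs.put(relative_path)

    def join(self) -> None:
        """Block until every submitted job has been attempted."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            path = self._jobs.get()
            try:
                self._archive(path)
            except Exception as exc:  # keep the worker alive; record for retry/reporting
                self.failed.append((path, exc))
            finally:
                self._jobs.task_done()
```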

     File storage should:
      * allow moving data: a sysadmin should be able to move the storage root, change the configuration, and have the system continue running (i.e. don't store absolute paths).
      * be maintainable: it should be easy to determine which old files are not associated with resources and can therefore be cleaned up.
      * allow collection of usage information (e.g. an estimate of the storage space used).
      * check whether there is enough space, and handle the consequences cleanly.
      * ensure files are written only underneath its own root folder: after any path generation, check that the resulting path begins with the location of the file storage.
      * have a configurable maximum accepted blob size during upload.
      * store whatever metadata was provided with the upload, such as the mimetype.
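
     The root-containment, size-limit and free-space requirements above can be checked before any bytes are written. This is a minimal sketch. `resolve_under_root`, `check_can_store` and `StorageError` are hypothetical names, and `max_size`/`reserve` stand in for whatever configuration options are eventually chosen.

```python
import os
import shutil


class StorageError(Exception):
    pass


def resolve_under_root(root: str, relative_path: str) -> str:
    """Join a stored relative path onto the storage root, refusing anything that
    escapes it (e.g. '../x', an absolute path, or a sibling like 'storage-evil')."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, relative_path))
    # commonpath compares whole path components, unlike a plain string prefix test.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise StorageError(f"path {relative_path!r} escapes the storage root")
    return candidate


def check_can_store(root: str, size: int, max_size: int, reserve: int = 0) -> None:
    """Reject blobs above the configured maximum or larger than the free space."""
    if size > max_size:
        raise StorageError(f"upload of {size} bytes exceeds the {max_size}-byte limit")
    free = shutil.disk_usage(root).free
    if size + reserve > free:
        raise StorageError(f"only {free} bytes free; cannot store {size} bytes")
```

     Checking with `os.path.commonpath` rather than a raw string "begins with" test matters here. A prefix test would accept `/srv/storage-evil/x` for the root `/srv/storage`, whereas the component-wise comparison rejects it. Because only the relative path is passed in, moving the root just means changing `root`.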

     Somewhere in the DB we should store:
     |column|description|
     |id|An identifier|
     |owner|The user who uploaded (and owns) the file|
     |path|The path to the file, relative to the storage root|
     |size|The size of the file on disk, in bytes|
     |mimetype|The mimetype of the file, as provided by the uploader|
     |upload_date|When the data was uploaded|
     |resource|The ID of the resource the file belongs to (a unidirectional relationship)|
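
     With the columns in the table above, the maintainability and usage-estimate requirements become simple queries. The sketch uses a hypothetical `StoredFile` dataclass in place of a real DB model. The age threshold for cleanup (`max_age`) is an assumed policy parameter, not something this ticket specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class StoredFile:
    # Mirrors the table's columns; the class itself is a hypothetical sketch.
    id: str
    owner: str
    path: str                # relative to the storage root, never absolute
    size: int                # bytes on disk
    mimetype: Optional[str]  # as provided by the uploader, may be missing
    upload_date: datetime
    resource: Optional[str] = None


def cleanup_candidates(files: Iterable[StoredFile], now: datetime,
                       max_age: timedelta) -> List[StoredFile]:
    """Files never associated with a resource and older than `max_age`."""
    return [f for f in files if f.resource is None and now - f.upload_date > max_age]


def space_used(files: Iterable[StoredFile], owner: Optional[str] = None) -> int:
    """Estimate storage used, overall or for one owner, from the DB alone."""
    return sum(f.size for f in files if owner is None or f.owner == owner)
```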

     Generating paths should try to separate the files, perhaps based on the owner's username or some other mechanism, to avoid a single folder full of files.
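
     One way to combine both suggestions is to group by owner and then fan out by a short prefix of a random id, so each directory stays small. The sketch assumes `file_id` is a hex string. `generate_path` is a hypothetical helper, and the two-character fan-out is an arbitrary choice. The result is always relative, so it can be stored in the `path` column unchanged.

```python
import posixpath
import re
import uuid


def generate_path(owner: str, filename: str, file_id: str = "") -> str:
    """Build a storage-root-relative path like 'alice/3f/3f2a9c/data.csv'."""
    file_id = file_id or uuid.uuid4().hex  # assumed to be hex; never user input
    # Owner names are reduced to [A-Za-z0-9_-], so they cannot contain '/' or '..'.
    safe_owner = re.sub(r"[^A-Za-z0-9_-]", "_", owner) or "_"
    # Keep only the last component of the client-supplied name (also for
    # Windows-style paths) and drop leading dots so it cannot be '..' or hidden.
    base = posixpath.basename(filename.replace("\\", "/"))
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", base).lstrip(".") or "upload"
    # The first two hex characters of the id fan files out over up to 256 subfolders.
    return posixpath.join(safe_owner, file_id[:2], file_id, safe_name)
```

     Sanitising here does not replace the root-containment check in the storage requirements. It only makes well-formed paths the normal case.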

     We need to make sure that other components within the system can be notified that an upload has taken place, or at least that it is easy for them to be notified. The primary use case is notifying the component that translates/uploads certain formats to the data store.

     We could do this based on the post-upload update to the file model (i.e. when we record the total received size of the file).
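
     Hooking notification onto that post-upload update could look like the plain observer registry below. This is a sketch only. `UploadEvents` and `record_received_size` are hypothetical, the record is a dict standing in for the file model, and in CKAN itself this would more naturally be exposed through a plugin interface, which the sketch does not attempt to model.

```python
from typing import Callable, Dict, List

# Hypothetical listener signature: receives the stored-file record.
Listener = Callable[[Dict[str, object]], None]


class UploadEvents:
    """Minimal observer registry fired once an upload's final size is recorded."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def upload_completed(self, record: Dict[str, object]) -> None:
        # Listeners run in subscription order; an exception from one propagates,
        # so a real implementation might isolate listeners or queue the calls.
        for listener in self._listeners:
            listener(record)


def record_received_size(record: Dict[str, object], received: int,
                         events: UploadEvents) -> None:
    """The post-upload update: store the total received size, then notify."""
    record["size"] = received
    events.upload_completed(record)
```

     A data store loader would subscribe once and ignore records whose `mimetype` it cannot translate, e.g. reacting only to `text/csv`.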