Ticket #1447 (assigned defect)
disk space leakage
Reported by: | dread | Owned by: | kindly |
---|---|---|---|
Priority: | major | Milestone: | ckan-backlog |
Component: | ckan | Keywords: | |
Cc: | nils.toedtmann@…, pudo | Repository: | ckan |
Theme: | none |
Description
Periodically we see some CKAN servers fall over because they run out of disk space. We need to find out if there is a common cause and fix it.
One problem in the past has been file handles running out when creating lots of tiny files in the data directory.
Another problem has been several enormous backups being created every day (pdeu on eu25).
Change History
comment:2 Changed 2 years ago by dread
This appears to have happened again today on test.ckan.net and someone has sorted it.
The problem is visible on munin as inodes running out.
- eu25 seems ready to fall over in about a week: http://munin.okfn.org/okfn.org/eu25.okfn.org-df_inode.html
- thedatahub.org on s055 (and other fry instances) seems to have its inode table size adjusted dynamically by the kernel, so it is less of a problem there
comment:3 Changed 2 years ago by dread
As predicted, this happened again today. The following analysis confirms that the problem is the cache growing and growing.
Disk usage in megabytes:
okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /*
7 /bin
22 /boot
1 /dev
10 /etc
4157 /home
0 /initrd.img
0 /initrd.img.old
114 /lib
1 /lost+found
1 /media
1 /mnt
1 /opt
0 /proc
1 /root
7 /sbin
1 /selinux
1 /srv
0 /sys
1 /tmp
421 /usr
443 /var
0 /vmlinuz
0 /vmlinuz.old
okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /home/okfn/var/srvc/publicdata.eu/*
2173 /home/okfn/var/srvc/publicdata.eu/backup
1 /home/okfn/var/srvc/publicdata.eu/backup_RENAMED_TO_AVOID_MAYHEM.sh
1 /home/okfn/var/srvc/publicdata.eu/common.sh
1893 /home/okfn/var/srvc/publicdata.eu/data
1 /home/okfn/var/srvc/publicdata.eu/fetch.sh
1 /home/okfn/var/srvc/publicdata.eu/gather.sh
1 /home/okfn/var/srvc/publicdata.eu/pip-requirements.txt
1 /home/okfn/var/srvc/publicdata.eu/publicdata.eu.ini
86 /home/okfn/var/srvc/publicdata.eu/pyenv
1 /home/okfn/var/srvc/publicdata.eu/run.sh
1 /home/okfn/var/srvc/publicdata.eu/sstore
0 /home/okfn/var/srvc/publicdata.eu/who.ini
okfn@s025:~/var/srvc/publicdata.eu$ ls -l /home/okfn/var/srvc/publicdata.eu/backup
total 2224588
-rw-r--r-- 1 okfn okfn 343199744 2011-06-14 20:50 db-20110614-2050.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:51 db-20110614-2051.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:52 db-20110614-2052.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:53 db-20110614-2053.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:54 db-20110614-2054.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:55 db-20110614-2055.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:56 db-20110614-2056.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:57 db-20110614-2057.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:58 db-20110614-2058.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 20:59 db-20110614-2059.sql
-rw-r--r-- 1 okfn okfn 1036288 2011-06-14 22:00 db-20110614-2200.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:01 db-20110614-2201.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:02 db-20110614-2202.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:03 db-20110614-2203.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:04 db-20110614-2204.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:05 db-20110614-2205.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:06 db-20110614-2206.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:07 db-20110614-2207.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:08 db-20110614-2208.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:09 db-20110614-2209.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:10 db-20110614-2210.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:11 db-20110614-2211.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:12 db-20110614-2212.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:13 db-20110614-2213.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:14 db-20110614-2214.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:15 db-20110614-2215.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:16 db-20110614-2216.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:17 db-20110614-2217.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:18 db-20110614-2218.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:19 db-20110614-2219.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:20 db-20110614-2220.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:21 db-20110614-2221.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:22 db-20110614-2222.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:23 db-20110614-2223.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:24 db-20110614-2224.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:25 db-20110614-2225.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:26 db-20110614-2226.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:27 db-20110614-2227.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:28 db-20110614-2228.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:29 db-20110614-2229.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:30 db-20110614-2230.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:31 db-20110614-2231.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:32 db-20110614-2232.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:33 db-20110614-2233.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:34 db-20110614-2234.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:35 db-20110614-2235.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:36 db-20110614-2236.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:37 db-20110614-2237.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:38 db-20110614-2238.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:39 db-20110614-2239.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:40 db-20110614-2240.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:41 db-20110614-2241.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:42 db-20110614-2242.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:43 db-20110614-2243.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:44 db-20110614-2244.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:45 db-20110614-2245.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:46 db-20110614-2246.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:47 db-20110614-2247.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:48 db-20110614-2248.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:49 db-20110614-2249.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:50 db-20110614-2250.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:51 db-20110614-2251.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:52 db-20110614-2252.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:53 db-20110614-2253.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:54 db-20110614-2254.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:55 db-20110614-2255.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:56 db-20110614-2256.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:57 db-20110614-2257.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:58 db-20110614-2258.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-14 22:59 db-20110614-2259.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:00 db-20110615-0000.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:01 db-20110615-0001.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:02 db-20110615-0002.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:03 db-20110615-0003.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:04 db-20110615-0004.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:05 db-20110615-0005.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:06 db-20110615-0006.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:07 db-20110615-0007.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:08 db-20110615-0008.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:09 db-20110615-0009.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:10 db-20110615-0010.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:11 db-20110615-0011.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:12 db-20110615-0012.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:13 db-20110615-0013.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:14 db-20110615-0014.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:15 db-20110615-0015.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:16 db-20110615-0016.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:17 db-20110615-0017.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:18 db-20110615-0018.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:19 db-20110615-0019.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:20 db-20110615-0020.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:21 db-20110615-0021.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:22 db-20110615-0022.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:23 db-20110615-0023.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:24 db-20110615-0024.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:25 db-20110615-0025.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:26 db-20110615-0026.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:27 db-20110615-0027.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:28 db-20110615-0028.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:29 db-20110615-0029.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:30 db-20110615-0030.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:31 db-20110615-0031.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:32 db-20110615-0032.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:33 db-20110615-0033.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:34 db-20110615-0034.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:35 db-20110615-0035.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:36 db-20110615-0036.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:37 db-20110615-0037.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:38 db-20110615-0038.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:39 db-20110615-0039.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:40 db-20110615-0040.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:41 db-20110615-0041.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:42 db-20110615-0042.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:43 db-20110615-0043.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:44 db-20110615-0044.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:45 db-20110615-0045.sql
-rw-r--r-- 1 okfn okfn 0 2011-06-15 00:46 db-20110615-0046.sql
-rw-r--r-- 1 okfn okfn 483144447 2011-06-15 10:00 db-20110615-1000.sql
-rw-r--r-- 1 okfn okfn 482136064 2011-06-15 10:07 db-20110615-1007.sql
-rw-r--r-- 1 okfn okfn 483144447 2011-06-15 10:50 db-20110615-1050.sql
-rw-r--r-- 1 okfn okfn 483053568 2011-06-15 10:51 db-20110615-1051.sql
okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/*
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/0
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/1
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/2
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/3
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/4
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/5
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/6
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/7
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/8
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/9
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/a
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/b
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/c
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/d
117 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/e
116 /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/f
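The du figures above are in megabytes, while munin showed inodes running out, so a per-directory file count may be the more telling number. A minimal sketch of such a count, assuming a POSIX shell and find; the function name count_entries_per_dir is made up for illustration and is not part of any CKAN tooling:

```shell
#!/bin/sh
# Sketch only: count filesystem entries (roughly, inodes in use) below each
# immediate subdirectory of DIR and print "<count> <path>", largest first.
# The function name is hypothetical.
count_entries_per_dir() {
    for d in "$1"/*/; do
        # Skip the unexpanded glob when DIR has no subdirectories.
        [ -d "$d" ] || continue
        # Each line printed by find is one entry (the subdirectory itself included).
        n=$(find "$d" -xdev | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "${d%/}"
    done | sort -rn
}

# Example, using a path from the listing above:
# count_entries_per_dir /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file
```

File names containing newlines would be over-counted; for a rough look at where the inodes go, that is probably acceptable.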
comment:5 Changed 2 years ago by dread
eu25 ran out of space again this weekend and eu8 (at/it/us_co) today.
comment:6 Changed 2 years ago by nils.toedtmann
For the time being, I created a cron script, remove_old_files.
You could just copy it to /etc/cron.daily/, but I recommend not running it as root: if it's misconfigured, it could wipe the system!
So you'd better copy it to /home/okfn/sbin/ (not /home/okfn/bin/, which is often the sysadmin HG repo) and add it to some unprivileged user's crontab. In most cases, the leftover files are owned by user "www-data", so
$ sudo crontab -e -u www-data
and then add something like
37 4 * * * /home/okfn/sbin/remove_old_files
Don't forget to edit the script remove_old_files itself and list the directories you want to be cleaned up.
This is already done on s008/eu8 and s019/eu19. dread, do you want to do this for s025/eu25 and see how this goes?
Todo nils: verify tomorrow on s019 that it worked properly, e.g. this should show only a few files:
find /var/lib/ckan/nederland/data/sessions/ -type f -amin +$((7*24*60)) -ls
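The remove_old_files script itself is not attached to this ticket. As an illustration only, a cleanup along the lines described here might look like the sketch below. It assumes a POSIX shell and a find that supports -amin (as the verification command above does); the function name and the guard against non-absolute paths are assumptions, not the actual script:

```shell
#!/bin/sh
# Hypothetical sketch; NOT the actual remove_old_files script from this ticket.

# remove_old_files_in DIR [DAYS]
# Delete regular files below DIR that were last accessed more than DAYS
# days ago (default 7).
remove_old_files_in() {
    dir=$1
    days=${2:-7}
    # Refuse empty, root or relative paths so a misconfiguration cannot
    # wipe unrelated parts of the system.
    case "$dir" in
        ""|/) echo "refusing to clean '$dir'" >&2; return 1 ;;
        /*)   ;;
        *)    echo "refusing relative path '$dir'" >&2; return 1 ;;
    esac
    # A missing directory is not an error; there is simply nothing to clean.
    [ -d "$dir" ] || return 0
    # -amin takes minutes, as in the verification command above;
    # -xdev keeps find on a single filesystem.
    find "$dir" -xdev -type f -amin +"$((days * 24 * 60))" -exec rm -f {} +
}

# Example: list the directories to clean, one call each.
# remove_old_files_in /var/lib/ckan/nederland/data/sessions 7
```

Note that on filesystems mounted with noatime, access times are not updated, so a test on modification time (-mmin) may be the safer choice there.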
comment:7 Changed 2 years ago by nils.toedtmann
I forgot to check on s019 how well my cleanup script was working (and now s019 is gone), but at least it didn't destroy it :-)
You might want to give it a try on s025/PDEU. (Tell me if you want me to do that).
comment:9 Changed 2 years ago by nils.toedtmann
- Cc pudo added
OK, I fixed a bug in my script and refactored it so that it can now be dropped into /etc/cron.daily/ while still deleting as an unprivileged user.
It is now running on s025, removing everything older than 7 days. Please verify in 9 days or so that it's working.
Consider adding this cron job to the ckan deb package, e.g. as "/etc/cron.daily/ckan-cleanup"
comment:10 Changed 2 years ago by nils.toedtmann
- Status changed from new to closed
- Resolution set to fixed
Just checked s025 (which is deprecated now); it looks like my script is working fine - nothing older than a week in /home/okfn/var/srvc/publicdata.eu/data/sessions/.
We should activate this script on other hosts as well, e.g. s055/thedatahub.
comment:11 Changed 2 years ago by nils.toedtmann
Just to add: the remove_old_files script is only a workaround, not a fix. CKAN should clean up after itself. Feel free to re-open this ticket for a proper solution ;-)
comment:12 Changed 2 years ago by rgrp
- Status changed from closed to reopened
- Resolution fixed deleted
comment:13 Changed 2 years ago by rgrp
- Owner set to kindly
- Status changed from reopened to assigned
- Milestone set to ckan-v1.7
@kindly: hope ok to assign to you (maybe just for review and thought on who would be best placed to look at ...)
comment:14 Changed 2 years ago by nils.toedtmann
Ticket http://trac.okfn.org/ticket/1222 tracks the effort to push the clean-up script onto CKAN hosts.
comment:16 Changed 22 months ago by nils.toedtmann
This is becoming painful for the sysadmins. Please fix.
comment:17 Changed 22 months ago by dread
BTW on DGU I have set it up to use memcached for these sessions (very easy) and I think it solves the problem.
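The container_file directories seen earlier are the layout of Beaker's file-based session storage, and Beaker can keep sessions in memcached instead. The actual DGU settings are not given in this ticket; as a sketch only, and assuming a memcached instance on localhost port 11211 (the address and the lock directory below are assumptions), the relevant lines in the site's .ini file might look like:

```ini
[app:main]
# Assumed example values, not the actual DGU configuration.
# Store sessions in memcached rather than as files under data/sessions.
beaker.session.type = ext:memcached
beaker.session.url = 127.0.0.1:11211
# Beaker still expects a lock directory for this backend.
beaker.session.lock_dir = %(here)s/data/sessions/lock
```

With sessions in memcached, old entries are evicted by memcached itself, which would explain why no file cleanup is needed.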