Ticket #1447 (assigned defect)

Opened 2 years ago

Last modified 22 months ago

disk space leakage

Reported by: dread Owned by: kindly
Priority: major Milestone: ckan-backlog
Component: ckan Keywords:
Cc: nils.toedtmann@…, pudo Repository: ckan
Theme: none

Description

Periodically we see some CKAN servers fall over because they run out of disk space. We need to find out if there is a common cause and fix it.

One problem in the past has been inodes running out when lots of tiny files are created in the data directory.

Another problem has been several enormous backups being created every day (pdeu on eu25).

Change History

comment:1 Changed 2 years ago by nils.toedtmann

  • Cc nils.toedtmann@… added

comment:2 Changed 2 years ago by dread

This appears to have happened again today on test.ckan.net and someone has sorted it.

The problem is visible on munin as inodes running out.
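
Inode exhaustion can also be checked directly on the host. A minimal sketch, assuming a standard df and find; the helper name count_files_by_dir is made up for illustration and is not part of CKAN:

```shell
#!/bin/sh
# Show inode usage per filesystem; an IUse% near 100% means no new files can be created.
df -i

# Count regular files under each immediate subdirectory of a directory,
# largest first, to see where the inodes are going.
# Usage: count_files_by_dir DIR
count_files_by_dir() {
    for d in "$1"/*/; do
        # Skip the unexpanded pattern when DIR has no subdirectories.
        [ -d "$d" ] || continue
        printf '%s\t%s\n' "$(find "$d" -type f | wc -l | tr -d ' ')" "${d%/}"
    done | sort -rn
}
```

For example, count_files_by_dir /path/to/ckan/data would print one line per subdirectory with its file count (the path here is a placeholder).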

comment:3 Changed 2 years ago by dread

As predicted, this happened again today. The following analysis confirms that the problem is the cache growing and growing.

Disk usage in megabytes:

okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /*
7	/bin
22	/boot
1	/dev
10	/etc
4157	/home
0	/initrd.img
0	/initrd.img.old
114	/lib
1	/lost+found
1	/media
1	/mnt
1	/opt
0	/proc
1	/root
7	/sbin
1	/selinux
1	/srv
0	/sys
1	/tmp
421	/usr
443	/var
0	/vmlinuz
0	/vmlinuz.old
okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /home/okfn/var/srvc/publicdata.eu/*
2173	/home/okfn/var/srvc/publicdata.eu/backup
1	/home/okfn/var/srvc/publicdata.eu/backup_RENAMED_TO_AVOID_MAYHEM.sh
1	/home/okfn/var/srvc/publicdata.eu/common.sh
1893	/home/okfn/var/srvc/publicdata.eu/data
1	/home/okfn/var/srvc/publicdata.eu/fetch.sh
1	/home/okfn/var/srvc/publicdata.eu/gather.sh
1	/home/okfn/var/srvc/publicdata.eu/pip-requirements.txt
1	/home/okfn/var/srvc/publicdata.eu/publicdata.eu.ini
86	/home/okfn/var/srvc/publicdata.eu/pyenv
1	/home/okfn/var/srvc/publicdata.eu/run.sh
1	/home/okfn/var/srvc/publicdata.eu/sstore
0	/home/okfn/var/srvc/publicdata.eu/who.ini
okfn@s025:~/var/srvc/publicdata.eu$ ls -l /home/okfn/var/srvc/publicdata.eu/backup
total 2224588
-rw-r--r-- 1 okfn okfn 343199744 2011-06-14 20:50 db-20110614-2050.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:51 db-20110614-2051.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:52 db-20110614-2052.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:53 db-20110614-2053.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:54 db-20110614-2054.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:55 db-20110614-2055.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:56 db-20110614-2056.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:57 db-20110614-2057.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:58 db-20110614-2058.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 20:59 db-20110614-2059.sql
-rw-r--r-- 1 okfn okfn   1036288 2011-06-14 22:00 db-20110614-2200.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:01 db-20110614-2201.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:02 db-20110614-2202.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:03 db-20110614-2203.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:04 db-20110614-2204.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:05 db-20110614-2205.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:06 db-20110614-2206.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:07 db-20110614-2207.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:08 db-20110614-2208.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:09 db-20110614-2209.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:10 db-20110614-2210.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:11 db-20110614-2211.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:12 db-20110614-2212.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:13 db-20110614-2213.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:14 db-20110614-2214.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:15 db-20110614-2215.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:16 db-20110614-2216.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:17 db-20110614-2217.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:18 db-20110614-2218.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:19 db-20110614-2219.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:20 db-20110614-2220.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:21 db-20110614-2221.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:22 db-20110614-2222.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:23 db-20110614-2223.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:24 db-20110614-2224.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:25 db-20110614-2225.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:26 db-20110614-2226.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:27 db-20110614-2227.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:28 db-20110614-2228.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:29 db-20110614-2229.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:30 db-20110614-2230.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:31 db-20110614-2231.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:32 db-20110614-2232.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:33 db-20110614-2233.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:34 db-20110614-2234.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:35 db-20110614-2235.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:36 db-20110614-2236.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:37 db-20110614-2237.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:38 db-20110614-2238.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:39 db-20110614-2239.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:40 db-20110614-2240.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:41 db-20110614-2241.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:42 db-20110614-2242.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:43 db-20110614-2243.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:44 db-20110614-2244.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:45 db-20110614-2245.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:46 db-20110614-2246.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:47 db-20110614-2247.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:48 db-20110614-2248.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:49 db-20110614-2249.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:50 db-20110614-2250.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:51 db-20110614-2251.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:52 db-20110614-2252.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:53 db-20110614-2253.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:54 db-20110614-2254.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:55 db-20110614-2255.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:56 db-20110614-2256.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:57 db-20110614-2257.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:58 db-20110614-2258.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-14 22:59 db-20110614-2259.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:00 db-20110615-0000.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:01 db-20110615-0001.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:02 db-20110615-0002.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:03 db-20110615-0003.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:04 db-20110615-0004.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:05 db-20110615-0005.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:06 db-20110615-0006.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:07 db-20110615-0007.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:08 db-20110615-0008.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:09 db-20110615-0009.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:10 db-20110615-0010.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:11 db-20110615-0011.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:12 db-20110615-0012.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:13 db-20110615-0013.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:14 db-20110615-0014.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:15 db-20110615-0015.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:16 db-20110615-0016.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:17 db-20110615-0017.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:18 db-20110615-0018.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:19 db-20110615-0019.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:20 db-20110615-0020.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:21 db-20110615-0021.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:22 db-20110615-0022.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:23 db-20110615-0023.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:24 db-20110615-0024.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:25 db-20110615-0025.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:26 db-20110615-0026.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:27 db-20110615-0027.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:28 db-20110615-0028.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:29 db-20110615-0029.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:30 db-20110615-0030.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:31 db-20110615-0031.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:32 db-20110615-0032.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:33 db-20110615-0033.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:34 db-20110615-0034.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:35 db-20110615-0035.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:36 db-20110615-0036.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:37 db-20110615-0037.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:38 db-20110615-0038.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:39 db-20110615-0039.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:40 db-20110615-0040.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:41 db-20110615-0041.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:42 db-20110615-0042.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:43 db-20110615-0043.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:44 db-20110615-0044.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:45 db-20110615-0045.sql
-rw-r--r-- 1 okfn okfn         0 2011-06-15 00:46 db-20110615-0046.sql
-rw-r--r-- 1 okfn okfn 483144447 2011-06-15 10:00 db-20110615-1000.sql
-rw-r--r-- 1 okfn okfn 482136064 2011-06-15 10:07 db-20110615-1007.sql
-rw-r--r-- 1 okfn okfn 483144447 2011-06-15 10:50 db-20110615-1050.sql
-rw-r--r-- 1 okfn okfn 483053568 2011-06-15 10:51 db-20110615-1051.sql
okfn@s025:~/var/srvc/publicdata.eu$ du -s -m /home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/*
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/0
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/1
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/2
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/3
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/4
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/5
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/6
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/7
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/8
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/9
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/a
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/b
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/c
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/d
117	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/e
116	/home/okfn/var/srvc/publicdata.eu/data/sessions/container_file/f
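
The backup listing above contains many zero-byte dump files. A small sketch for listing them, assuming a find that supports -maxdepth (GNU and BSD find do); the function name list_empty_dumps is hypothetical:

```shell
#!/bin/sh
# List zero-byte SQL dumps named db-*.sql directly inside a backup directory.
# Usage: list_empty_dumps DIR
list_empty_dumps() {
    find "$1" -maxdepth 1 -type f -name 'db-*.sql' -size 0 | sort
}
```

Running list_empty_dumps /home/okfn/var/srvc/publicdata.eu/backup | wc -l would give the number of empty dumps in that directory.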

comment:4 Changed 2 years ago by dread

eu25 ran out of space again today.

comment:5 Changed 2 years ago by dread

eu25 ran out of space again this weekend and eu8 (at/it/us_co) today.

comment:6 Changed 2 years ago by nils.toedtmann

For the time being, I created a cron script, remove_old_files.

You could just copy it to /etc/cron.daily/, but I recommend not running it as root: if it's misconfigured, it could wipe the system!

So you had better copy it to /home/okfn/sbin/ (not /home/okfn/bin/, which is often the sysadmin HG repo) and add it to some unprivileged user's crontab. In most cases, the leftover files are owned by user "www-data", so

$ sudo crontab -e -u www-data

and then add something like

37 4 * * *  /home/okfn/sbin/remove_old_files

Don't forget to edit the script remove_old_files itself and list the directories you want to be cleaned up.

This is already done on s008/eu8 and s019/eu19. dread, do you want to do this for s025/eu25 and see how this goes?


Todo nils: verify tomorrow on s019 that it worked properly, e.g. this should show only a few files:

find /var/lib/ckan/nederland/data/sessions/ -type f -amin +$((7*24*60)) -ls
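
The remove_old_files script itself is not attached to this ticket. The following is only a rough sketch of what such a cleanup could look like, not the actual script. It assumes a find that supports -delete (GNU and BSD find do) and reuses the 7-day access-time threshold from the check above; the function and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a cleanup helper; NOT the actual remove_old_files script.
# Deletes regular files that have not been accessed for MAX_AGE_DAYS days.

MAX_AGE_DAYS=7

# Usage: remove_old_files_in DIR
# Refuses empty, relative, or "/" paths so a misconfiguration cannot wipe the system.
remove_old_files_in() {
    dir=$1
    case "$dir" in
        /|'') echo "refusing to clean '$dir'" >&2; return 1 ;;
        /*) ;;
        *) echo "refusing relative path '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -xdev keeps find on one filesystem; -amin counts minutes since last access.
    find "$dir" -xdev -type f -amin +$((MAX_AGE_DAYS * 24 * 60)) -delete
}
```

A call such as remove_old_files_in /var/lib/ckan/nederland/data/sessions, run from the www-data crontab, would then remove session files not accessed for more than a week.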

comment:7 Changed 2 years ago by nils.toedtmann

I forgot to check on s019 how well my cleanup script was working (and now s019 is gone), but at least it didn't destroy it :-)

You might want to give it a try on s025/PDEU. (Tell me if you want me to do that).

comment:8 Changed 2 years ago by dread

Yes please Nils!

comment:9 Changed 2 years ago by nils.toedtmann

  • Cc pudo added

OK, I fixed a bug in my script and refactored it so that it can now be dropped into /etc/cron.daily/ while still deleting as an unprivileged user.

It is now running on s025, removing everything older than 7 days. Please verify in 9 days or so that it's working.

Consider adding this cron job to the ckan deb package, e.g. as "/etc/cron.daily/ckan-cleanup".

comment:10 Changed 2 years ago by nils.toedtmann

  • Status changed from new to closed
  • Resolution set to fixed

Just checked s025 (which is deprecated now); it looks like my script is working fine: nothing older than a week in /home/okfn/var/srvc/publicdata.eu/data/sessions/.

We should activate this script on other hosts as well, e.g. so55/thedatahub.

comment:11 Changed 2 years ago by nils.toedtmann

Just to add: the remove_old_files script is only a workaround, not a fix. CKAN should clean up after itself. Feel free to re-open this ticket for a proper solution ;-)

comment:12 Changed 2 years ago by rgrp

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:13 Changed 2 years ago by rgrp

  • Owner set to kindly
  • Status changed from reopened to assigned
  • Milestone set to ckan-v1.7

@kindly: hope ok to assign to you (maybe just for review and thought on who would be best placed to look at ...)

comment:14 Changed 2 years ago by nils.toedtmann

Ticket http://trac.okfn.org/ticket/1222 tracks the effort to push the clean-up script onto CKAN hosts.

comment:15 Changed 2 years ago by kindly

  • Milestone changed from ckan-v1.7 to ckan-backlog

comment:16 Changed 22 months ago by nils.toedtmann

This is becoming painful for the sysadmins. Please fix.

comment:17 Changed 22 months ago by dread

BTW on DGU I have set it up to use memcached for these sessions (v. easy) and I think it solves the problem.
