Ticket #371 (closed requirement: fixed)
The system shall monitor QoS against SLA
Reported by: | johnbywater | Owned by: | nils.toedtmann |
---|---|---|---|
Priority: | critical | Milestone: | ckan-v1.4 |
Component: | ckan | Keywords: | |
Cc: | nils.toedtmann@… | Repository: | |
Theme: |
Description (last modified by johnbywater) (diff)
Requested by DGU.
Change History
comment:3 Changed 4 years ago by johnbywater
- Type changed from enhancement to requirement
- businessvalue set to 300
- Description modified (diff)
comment:8 Changed 4 years ago by johnbywater
- Summary changed from Implement QoS monitoring to The system shall monitor QoS against SLA
comment:9 Changed 3 years ago by nils.toedtmann
- Cc nils.toedtmann@… added
(I know the term "QoS" as a very specific networking term about classifying and prioritising network traffic. I assume here it means uptime, availability, performance monitoring?)
There seem to be at least three monitors already in place:
- http://munin.okfn.org/ on eu1 monitoring eu[0-7] and us1, gathering additional health information via locally installed daemons. Munin's notification subsystem is not configured.
- http://nagios.hmg.ckan.net/ on hmg.ckan.net monitoring the CKAN-HMG service group (network monitoring only). Notifications are not configured (or are they?)
- We have a http://wasitup.com/ account which is watching some OKFN services (e.g. {ca,de,www}.ckan.org, {blog,www}.okfn.org) and sending loads of alerts to sysadmin@…. Only checking for "HTTP 200 OK" and whether the response contains a configurable string.
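For reference, the kind of check wasitup performs (status 200 plus a configurable string in the body) can be sketched in a few lines of Python. This is only an illustration, not how wasitup itself works; the function name `check_site`, the `opener` parameter (there only so the check can be exercised without network access) and the timeout default are assumptions.

```python
import urllib.error
import urllib.request


def check_site(url, expected_text, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, reason) for one HTTP check.

    ok is True only if the server answers 200 and the body contains
    expected_text. `opener` is a hypothetical hook for testing.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.getcode()
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, etc.
        return False, "unreachable: %s" % exc
    if status != 200:
        return False, "HTTP %d" % status
    if expected_text not in body:
        return False, "expected text not found"
    return True, "ok"
```

Something like `check_site("http://www.ckan.org/", "CKAN")` would then return `(True, "ok")` or a failure reason; the string to look for is a guess and would need to be chosen per site.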
My 2ct: We should consolidate. What do we want?
- A webservice like https://www.pingdom.com/ ($40/month incl 30 checks and 200 SMS, $0.5/month per extra check, $0.14-20/SMS) or http://www.serverdensity.com/ ($10/server-month plus 5-10p/SMS)?
- Or run our own monitor (nagios, opsview, munin)?
In the latter case we want to have a separate machine which is not on EC2 (but e.g. ByteMark?), dedicated to monitoring only.
We should also include root mails into the alert/notification policies. Root mails should be trimmed down to important warnings and errors only.
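To compare the hosted options listed above, here is a rough estimate of the monthly base cost in Python. It uses only the prices quoted in this comment and leaves SMS charges out; the function names and the example counts are assumptions (9 servers corresponds to eu[0-7] plus us1, 40 checks is made up).

```python
def pingdom_monthly_cost(checks, base=40.0, included_checks=30, extra_check=0.5):
    """Base price in USD per month, plus a fee for each check beyond the included ones."""
    extra = max(0, checks - included_checks)
    return base + extra * extra_check


def serverdensity_monthly_cost(servers, per_server=10.0):
    """Flat price in USD per server per month."""
    return servers * per_server


print(pingdom_monthly_cost(40))        # 40 checks: 10 beyond the included 30
print(serverdensity_monthly_cost(9))   # eu0..eu7 plus us1
```

Note that the two services bill on different units (checks vs. servers), so the numbers are only comparable once we know how many checks per server we want.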
comment:10 Changed 3 years ago by nils.toedtmann
The nagios fork OPSview might be worth a look.
comment:11 Changed 3 years ago by nils.toedtmann
Replying to nils.toedtmann:
There seem to be at least three monitors already in place:
Correction: at least four, we seem to have a Montastic account, too:
On 18/12/10 15:03, noreply@… wrote:
Dear okfn,

This is a monthly reminder that you have an account on Montastic, the website monitor service.

### ACCOUNT INFORMATION
Signup date: 2009-10-06
Email you signup with: [email protected]

### 20 WEBSITES MONITORED
[OK] - http://www.ckan.net/
[OK] - http://www.knowledgeforge.net/
[OK] - http://okfn.org/
[not monitored] - http://blog.okfn.org/
[...]

### EMAIL ALERT RECIPIENTS
- [email protected]
- [email protected]
- [email protected]
[...]

To make changes to your account or contact us, go to www.montastic.com.
[...]
comment:12 Changed 3 years ago by thejimmyg
- Owner changed from johnbywater to nils.toedtmann
- Status changed from reopened to new
comment:13 Changed 3 years ago by thejimmyg
It is implied in this that the performance of sites should beat the QoS criteria, therefore closing #485. Ensuring this happens is an ongoing process.
comment:14 Changed 3 years ago by thejimmyg
From #440 we'll also need to "Write and pass comprehensive performance tests"
comment:15 Changed 3 years ago by thejimmyg
From #395:
At the moment, some pages within CKAN tend to load slowly. We should create a profiling setup in which we can measure response times for complete requests and individual method calls.
This could be used to identify bottlenecks and find an appropriate caching or tuning strategy to improve CKAN performance.
NB: We should also agree on a maximum request latency.
TODO: Read up on all those QoS tickets to avoid overlapping efforts.
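A minimal sketch of the profiling setup described in #395, assuming a plain WSGI stack: a middleware that times each request, and a decorator that profiles individual calls with `cProfile`. The names `TimingMiddleware` and `profiled`, the logger name and the 1-second `max_latency` threshold are all hypothetical; no maximum latency has been agreed yet.

```python
import cProfile
import functools
import io
import logging
import pstats
import time

log = logging.getLogger("profiling")  # hypothetical logger name


class TimingMiddleware:
    """WSGI middleware that records how long the wrapped app takes per request.

    It measures the time until the app returns its response iterable, so
    bodies that are generated lazily while streaming are not fully covered.
    """

    def __init__(self, app, max_latency=1.0):
        self.app = app
        self.max_latency = max_latency  # seconds; placeholder threshold
        self.timings = []               # (path, elapsed seconds) per request

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            elapsed = time.perf_counter() - start
            path = environ.get("PATH_INFO")
            self.timings.append((path, elapsed))
            # Requests slower than the threshold are logged as warnings.
            level = logging.WARNING if elapsed > self.max_latency else logging.INFO
            log.log(level, "%s %s took %.3fs",
                    environ.get("REQUEST_METHOD"), path, elapsed)


def profiled(func, top=10):
    """Decorator: run func under cProfile and log the `top` costliest entries."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(top)
            log.debug("profile for %s:\n%s", func.__name__, out.getvalue())

    return wrapper
```

The middleware would wrap the application object (e.g. `app = TimingMiddleware(app)`), while `@profiled` would go on whichever controller or model methods are suspected to be slow. The decorator adds noticeable overhead, so it is meant for a profiling environment rather than production.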
comment:16 Changed 3 years ago by anonymous
Mainly handled in http://knowledgeforge.net/okfn/tasks/ticket/564 now. Close here?
comment:17 Changed 3 years ago by thejimmyg
- Status changed from new to closed
- Resolution set to fixed
Marking as closed since http://knowledgeforge.net/okfn/tasks/ticket/600 now takes on this ticket.
I will check that nils has added the new DGU Bytemark servers to Nagios.