This page is a work-in-progress aimed at capturing all of the details needed to deploy a new service in the edX environment.
Understanding how your service works and what it does helps Ops support the service in production.
What class of machine does your service require? What resources are most likely to be bottlenecks for your service: CPU, memory, bandwidth, something else?
Who will be using your service? What is the anticipated initial usage? What factors will cause usage to grow? How many users can your service support?
What repository or repositories does your service require? Will your service be deployed from a non-public repo?
Ideally your service should follow the same release management process as the LMS. This is documented in the wiki, so please ensure you understand that process in depth.
Was the service code reviewed?
How does your service read in environment-specific settings? Were all hard-coded references to values that should be settings, such as database URLs and credentials, message queue endpoints, and so on, found and resolved during code review?
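One common pattern is to resolve settings from a config file whose path is injected by the deployment, with environment-variable overrides for secrets. The sketch below is illustrative only: the `SERVICE_CONFIG` variable and setting names are hypothetical, not edX conventions.

```python
import json
import os

# Hypothetical defaults; real values such as database URLs and queue
# endpoints should never be hard-coded into the service itself.
DEFAULTS = {
    "DATABASE_URL": "sqlite:///local.db",
    "QUEUE_ENDPOINT": "amqp://localhost:5672",
}

def load_settings(env=os.environ):
    """Layer settings: defaults < config file < environment variables."""
    settings = dict(DEFAULTS)
    config_path = env.get("SERVICE_CONFIG")  # path supplied by deployment
    if config_path and os.path.exists(config_path):
        with open(config_path) as f:
            settings.update(json.load(f))
    # Environment variables win last, so deployments can inject
    # credentials without editing files checked into the repo.
    for key in settings:
        if key in env:
            settings[key] = env[key]
    return settings
```

This keeps the repo free of credentials while letting each environment (dev, stage, prod) supply its own values.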
Is the license included in the repo?
Is it HTTP based? Does it run periodically? Both?
Ops will need to know the following things.
It is important that your application's logging is built out to provide sufficient feedback for problem determination, as well as to confirm that the service is operating as desired. It is also important that your service's logging follows our deployment standards, for example, log files vs. syslog in deployment environments, and the standard log format for syslog. Can the logs be consumed by Splunk? They should not be if they contain data discussed in the Data Security section below.
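In Python this typically means attaching a `SysLogHandler` with a fixed, parseable format. The format string and socket address below are a sketch under assumptions, not the official edX standard referenced on this page.

```python
import logging
from logging.handlers import SysLogHandler

# Illustrative format: timestamp, logger name, pid, level, message.
# Substitute the standard syslog format from the deployment docs.
FORMAT = "%(asctime)s %(name)s[%(process)d]: %(levelname)s %(message)s"

def make_logger(name, address="/dev/log"):
    """Route a service's logs to syslog in a consistent format."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = SysLogHandler(address=address)  # local syslog socket
    handler.setFormatter(logging.Formatter(FORMAT))
    logger.addHandler(handler)
    return logger
```

A consistent format is what makes downstream consumption (e.g. Splunk, where permitted) practical.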
What are the key metrics for your application? Concurrent users? Transactions per second?
Does your service need to access a message queue?
Does your service need to send email?
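If so, the mail path should go through a relay configured per environment rather than a hard-coded host. A minimal sketch with the standard library, where the host and addresses are placeholders:

```python
import smtplib
from email.message import EmailMessage

def build_notification(subject, body, sender, recipient):
    """Assemble a plain-text notification message."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

def send_notification(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay; smtp_host should come from
    environment-specific settings, not a hard-coded value."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```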
Does your service need access to other services, either within or outside of the edX environment? Some examples are the comment service, the LMS, YouTube, S3 buckets, and so on.
Your service should have a facility for remote monitoring that has the following characteristics.
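A common shape for such a facility is an HTTP heartbeat endpoint that exercises each dependency and reports per-check status. The path name and check wiring below are illustrative assumptions; a real service would test its actual database, queue, and upstream services.

```python
import json
from http.server import BaseHTTPRequestHandler

def run_checks(checks):
    """Run each named check; a check passes if it returns without raising."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "OK"
        except Exception as exc:
            results[name] = "FAIL: %s" % exc
    return results

class HeartbeatHandler(BaseHTTPRequestHandler):
    # Populate with e.g. {"db": check_db, "queue": check_queue} at startup.
    checks = {}

    def do_GET(self):
        if self.path != "/heartbeat":  # hypothetical path
            self.send_error(404)
            return
        results = run_checks(self.checks)
        # 200 only when every dependency passes, so a load balancer or
        # monitor can treat any non-200 as unhealthy.
        status = 200 if all(v == "OK" for v in results.values()) else 503
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
```

Returning per-dependency detail, not just a bare 200/503, makes problem determination much faster for Ops.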
How can your application be deployed to ensure that it is fault tolerant and scalable?
From where should your service be accessible?
Will your application be storing or handling data in any of the following categories?
Has your service been load tested? What were the details of the test? What determinations can we make regarding when we will need to scale if usage trends upward? How can Ops exercise your service in order to test end-to-end integration? We love no-op-able tasks.
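A "no-op-able" task in this sense is one that walks the real integration path but can skip side effects on demand, so Ops can exercise it safely in any environment. A hypothetical sketch, with illustrative step names:

```python
def smoke_test(steps, dry_run=True):
    """Walk every integration step; with dry_run, report without executing.

    steps is a list of (name, action) pairs, e.g.
    [("enqueue", enqueue_check), ("send_email", email_check)].
    """
    report = []
    for name, action in steps:
        if dry_run:
            report.append((name, "skipped (dry run)"))
        else:
            action()  # real side effect only when dry_run is off
            report.append((name, "ok"))
    return report
```

Running the same code path in both modes means the dry run stays honest about what the real task would do.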
Anything else we should know about?