3.6.1. Installing Open edX Analytics Devstack

This section describes how to install and run the Open edX Analytics developer stack.


Before you install the analytics developer stack, make sure that you have met the installation prerequisites.

Installing Analytics Devstack

To install analytics devstack extensions directly from the command line, follow these steps.

  1. Halt any running Open edX devstacks. Navigate to the directory that contains the Vagrantfile for the devstack and run vagrant suspend or vagrant halt.

    $ cd ~/open-edx/devstack/
    $ vagrant suspend
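
    If you are not sure whether a devstack is still running, the check can be scripted. The following is a minimal sketch, not part of the official procedure: the helper name any_machine_running is hypothetical, and it assumes the comma-separated "timestamp,target,type,data" lines that vagrant status --machine-readable prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `vagrant status --machine-readable` output on
# stdin and exits 0 if any machine reports the state "running", 1 otherwise.
# Each input line is assumed to look like: timestamp,target,type,data
any_machine_running() {
    awk -F, '$3 == "state" && $4 == "running" { found = 1 } END { exit !found }'
}

# Typical use, from the directory that contains the devstack Vagrantfile:
#   if vagrant status --machine-readable | any_machine_running; then
#       vagrant halt
#   fi
```

    The function only inspects text, so it can be tried against saved output without touching a virtual machine.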
  2. Create the analyticstack directory and change into it.

    $ mkdir analyticstack
    $ cd analyticstack
  3. Download the analytics devstack Vagrantfile.

    $ curl -L https://raw.githubusercontent.com/edx/configuration/open-release/eucalyptus.master/vagrant/release/analyticstack/Vagrantfile > Vagrantfile
  4. Create the analytics devstack virtual machine.

    $ export OPENEDX_RELEASE="open-release/eucalyptus.2"
    $ vagrant up
  5. Clone the edx-analytics-pipeline repository.

    $ git clone git@github.com:edx/edx-analytics-pipeline.git ./edx-analytics-pipeline/
  6. Prepare the data pipeline inside the virtual machine.

    $ vagrant ssh
    $ cd /edx/app/analytics_pipeline/
    $ sudo mkdir venvs
    $ sudo chown vagrant:vagrant venvs
    $ virtualenv venvs/analytics_pipeline/
    $ . venvs/analytics_pipeline/bin/activate
    $ cd analytics_pipeline
    $ make system-requirements
    $ make develop


    The copy of edx-analytics-pipeline that you cloned on your host is mounted at /edx/app/analytics_pipeline/analytics_pipeline inside the virtual machine. Vagrant directory sharing lets you edit the code with an editor on the host machine and run it inside the virtual machine.
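
    Before running the make targets it can help to confirm that the shared directory and the virtualenv are in place. The sketch below is an illustration only: check_pipeline_env is a hypothetical helper, the default paths are the ones used in step 6, and it assumes that the activate script exports VIRTUAL_ENV as the absolute path of the virtualenv.

```shell
#!/bin/sh
# Hypothetical helper: checks that the pipeline checkout is visible and that
# the expected virtualenv is active. Both paths can be overridden.
check_pipeline_env() {
    repo_dir="${1:-/edx/app/analytics_pipeline/analytics_pipeline}"
    venv_dir="${2:-/edx/app/analytics_pipeline/venvs/analytics_pipeline}"

    # The make targets are run from the repository root, so a Makefile there
    # is a simple sign that the host checkout is mounted inside the VM.
    if [ ! -f "$repo_dir/Makefile" ]; then
        echo "No Makefile in $repo_dir; is the shared directory mounted?" >&2
        return 1
    fi

    # Assumption: sourcing bin/activate has exported VIRTUAL_ENV.
    if [ "${VIRTUAL_ENV:-}" != "$venv_dir" ]; then
        echo "Expected active virtualenv $venv_dir, found '${VIRTUAL_ENV:-none}'." >&2
        return 1
    fi

    echo "Pipeline checkout and virtualenv look ready."
}
```

    Inside the virtual machine, calling check_pipeline_env with no arguments uses the paths from step 6.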

  7. Run tests and quality checks.

    $ make coverage
  8. Run the acceptance tests as the hadoop user.

    $ sudo su hadoop
    $ cd /edx/app/analytics_pipeline/
    $ . venvs/analytics_pipeline/bin/activate
    $ cd analytics_pipeline
    # The next step will take hours to run.
    $ make test-acceptance-local

    A subset of acceptance tests can be run using the ONLY_TESTS parameter.

    $ make test-acceptance-local ONLY_TESTS=edx.analytics.tasks.tests.acceptance.test_enrollments

    Acceptance tests usually destroy any existing state before running. This behavior can be disabled by setting the DISABLE_RESET_STATE environment variable.

    $ DISABLE_RESET_STATE=true make test-acceptance-local ONLY_TESTS=edx.analytics.tasks.tests.acceptance.test_enrollments


    Acceptance tests emulate deployment of the code to a remote Hadoop cluster. During this process the tests check out a new copy of the code from the repository, so uncommitted changes are not included. For this reason, commit all changes before running the tests.
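
    Because uncommitted work is silently left out of the tested code, a pre-flight check can save a long wasted run. This is a minimal sketch under stated assumptions: require_clean_tree is a hypothetical helper, and it treats untracked files as uncommitted changes because a fresh checkout would not contain them either.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the git working tree at $1 (default: the
# current directory) has nothing uncommitted, 1 if it has, and 2 if its git
# state cannot be read at all.
require_clean_tree() {
    repo_dir="${1:-.}"

    if ! git -C "$repo_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "Cannot read git state of $repo_dir." >&2
        return 2
    fi

    # --porcelain prints one line per modified, staged, or untracked path
    # and nothing when the tree is clean.
    if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
        echo "Uncommitted changes in $repo_dir; commit them before running the acceptance tests." >&2
        return 1
    fi
}

# Typical use, from the repository root:
#   require_clean_tree . && make test-acceptance-local
```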

  9. Display the parameters for a task. The --help option shown here works for any task.

    $ launch-task ImportEnrollmentsIntoMysql --help

    The EdX Analytics Pipeline Reference Guide contains a more detailed list of available tasks and their parameters.