8. Installing the Open edX Analytics Developer Stack

This section describes how to install and run the Open edX Analytics Developer Stack.

8.1. Overview

The Open edX Analytics Developer Stack, known as the Analytics Devstack, is a modified version of the Open edX Developer Stack.

It provides all of the services and tools needed to modify the Open edX Analytics Pipeline, Data API, and Insights projects.

8.2. Components

The Analytics Devstack includes the following additional components.

  • edX Analytics Data API
  • edX Insights

The Analytics Devstack also includes all of the components needed to run the Open edX Analytics Pipeline, which is the primary ETL (extract, transform, and load) tool that extracts and analyzes data from the other Open edX services.

8.3. Install the Software Prerequisites

To install and run the Analytics Devstack, you must first install the following software.

  • VirtualBox 4.3.12 or higher.
  • Vagrant 1.6.5 or higher.
  • A Network File System (NFS) client, if your operating system does not already include one. Devstack uses VirtualBox Guest Additions to share folders through NFS.
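
The minimum versions listed above can be checked from a terminal before continuing. The following is a minimal sketch, not part of the official tooling: the helper names version_ge and check_tool are hypothetical, and the comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool LABEL MINIMUM DETECTED: report whether DETECTED satisfies MINIMUM.
check_tool() {
    if [ -z "$3" ]; then
        echo "$1: not found"
        return 1
    elif version_ge "$3" "$2"; then
        echo "$1 $3: OK (need $2 or higher)"
    else
        echo "$1 $3: too old (need $2 or higher)"
        return 1
    fi
}

# VBoxManage prints something like 4.3.12r93733; keep only the dotted number.
vbox_version=$(VBoxManage --version 2>/dev/null | sed 's/[^0-9.].*$//')
# vagrant prints something like "Vagrant 1.6.5"; keep the second word.
vagrant_version=$(vagrant --version 2>/dev/null | awk '{print $2}')

check_tool VirtualBox 4.3.12 "$vbox_version" || true
check_tool Vagrant 1.6.5 "$vagrant_version" || true
```

If either tool is reported as missing or too old, install or upgrade it before continuing.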

Additionally, the Open edX Analytics Pipeline includes a tool that is used to deploy the pipeline itself. To set up this tool, follow these steps.

  1. Clone the repository on your host, not on the Virtual Machine.

    $ git clone https://github.com/edx/edx-analytics-pipeline
  2. Install the project dependencies into a virtualenv on your host.

    $ cd edx-analytics-pipeline
    $ virtualenv venv
    $ source venv/bin/activate
    $ make bootstrap

The system is now ready to start running tasks on the Analytics Devstack using the remote-task tool.
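
As a quick sanity check after the steps above, the host shell can be inspected to confirm that the virtualenv is active and that remote-task can be found. This is a sketch under two assumptions: that activating the virtualenv sets the VIRTUAL_ENV variable (standard virtualenv behavior), and that make bootstrap places remote-task on the PATH of the active virtualenv. The function name check_host_tools is hypothetical.

```shell
# check_host_tools: report whether a virtualenv is active and remote-task is on PATH.
check_host_tools() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtualenv active; run: source venv/bin/activate"
        return 1
    fi
    if ! command -v remote-task >/dev/null 2>&1; then
        echo "remote-task not found; run: make bootstrap"
        return 1
    fi
    echo "remote-task available at $(command -v remote-task)"
}

check_host_tools || true
```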

8.4. Install the Analytics Devstack

To install the Analytics Devstack extensions directly from the command line, follow these steps.

  1. Create the analyticstack directory and navigate to it at the command prompt.

    $ mkdir analyticstack
    $ cd analyticstack
  2. Download the Analytics Devstack Vagrant file.

    $ curl -L https://raw.github.com/edx/configuration/master/vagrant/release/analyticstack/Vagrantfile > Vagrantfile
  3. Create the Analytics Devstack virtual machine.

    $ vagrant up
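
Once vagrant up finishes, it can be useful to confirm that the virtual machine is actually running. The sketch below parses the comma-separated output of vagrant status --machine-readable. The helper name vm_state is hypothetical, and the sketch assumes it is run from the analyticstack directory.

```shell
# vm_state: read `vagrant status --machine-readable` output on stdin and print
# the value of the first "state" record (for example, running or poweroff).
vm_state() {
    awk -F, '$3 == "state" { print $4; exit }'
}

# Run from the analyticstack directory; skipped when Vagrant is not installed.
if command -v vagrant >/dev/null 2>&1; then
    vagrant status --machine-readable | vm_state
fi
```

A state other than running suggests that vagrant up did not complete.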

8.5. Using the Analytics Devstack

8.5.1. Run the Open edX LMS

  1. Log in to the Analytics Devstack.

    $ vagrant ssh
  2. Switch to the edxapp user.

    $ sudo su edxapp
  3. Start the LMS.

    $ paver devstack lms

8.5.2. Run the Open edX Analytics Data API

  1. Log in to the Analytics Devstack.

    $ vagrant ssh
  2. Switch to the analytics_api user.

    $ sudo su analytics_api
  3. Start the Data API.

    $ ~/venvs/analytics_api/bin/python ~/analytics_api/manage.py runserver --insecure

8.5.3. Run Open edX Insights

  1. Log in to the Analytics Devstack.

    $ vagrant ssh
  2. Switch to the insights user.

    $ sudo su insights
  3. Enable features that are disabled by default.

    $ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py switch display_verified_enrollment on --create
    $ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py switch enable_course_api on --create
  4. Start Insights.

    $ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py runserver --insecure
  5. Open the URL in a browser on the host.


    Be sure to use the IP address instead of localhost. Using localhost will prevent you from logging in.
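
If more switches need to be enabled in step 3, the repeated manage.py commands can be wrapped in a small loop. This is only a sketch: the function name enable_switches and the DRY_RUN variable are hypothetical, and the paths are the ones used in the steps above. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Paths as used in the steps above (run as the insights user).
PYTHON="$HOME/venvs/insights/bin/python"
MANAGE="$HOME/edx_analytics_dashboard/manage.py"

# enable_switches NAME...: turn on each named switch, creating it if needed.
# With DRY_RUN=1, only print the commands that would be run.
enable_switches() {
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$PYTHON $MANAGE switch $name on --create"
        else
            "$PYTHON" "$MANAGE" switch "$name" on --create
        fi
    done
}

# Preview the two commands from step 3 without running them.
DRY_RUN=1 enable_switches display_verified_enrollment enable_course_api
```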

8.5.4. Run the Open edX Analytics Pipeline

  1. In the Devstack LMS, register a new user and enroll in the demo course.

  2. Navigate to the courseware and submit answers to a few problems.

  3. Navigate to the location where the edx-analytics-pipeline project was cloned on the host.

    $ cd edx-analytics-pipeline
  4. Run the enrollment task.

    $ export WHEEL_URL=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise
    # On Mac OS X, replace the date command below with $(date -v+1d +%Y-%m-%d).
    $ remote-task --vagrant-path <path to `analyticstack`> --remote-name devstack --override-config ${PWD}/config/devstack.cfg --wheel-url $WHEEL_URL --wait \
       ImportEnrollmentsIntoMysql --local-scheduler --interval-end $(date +%Y-%m-%d -d "tomorrow") --n-reduce-tasks 1
  5. Run the answer distribution task.

    $ export WHEEL_URL=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise
    $ export UNIQUE_NAME=$(date -u +%Y-%m-%dT%H_%M_%SZ)
    $ remote-task --vagrant-path <path to `analyticstack`> --remote-name devstack --override-config ${PWD}/config/devstack.cfg --wheel-url $WHEEL_URL --wait \
        AnswerDistributionWorkflow --local-scheduler \
          --src hdfs://localhost:9000/data/ \
          --include '*tracking.log*' \
          --dest hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution_raw/$UNIQUE_NAME/data \
          --name $UNIQUE_NAME \
          --output-root hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution/ \
          --marker hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution_raw/$UNIQUE_NAME/marker \
          --n-reduce-tasks 1
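
Two details of the commands above can be made more portable or easier to inspect. The sketch below shows a tomorrow helper that tries the GNU date syntax from step 4 and falls back to the Mac OS X syntax from the comment in that step, and an answer_dist_paths helper that prints the --dest and --marker values derived from a run name as in step 5. Both function names are hypothetical and are not part of the pipeline.

```shell
# tomorrow: print tomorrow's date as YYYY-MM-DD.
# Tries the GNU date syntax first, then falls back to the BSD (Mac OS X) syntax.
tomorrow() {
    date +%Y-%m-%d -d "tomorrow" 2>/dev/null || date -v+1d +%Y-%m-%d
}

# answer_dist_paths NAME: print the --dest value, then the --marker value,
# for the given run name.
answer_dist_paths() {
    raw="hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution_raw/$1"
    echo "$raw/data"
    echo "$raw/marker"
}

echo "interval end: $(tomorrow)"
answer_dist_paths "$(date +%Y-%m-%dT%H_%M_%SZ)"
```

With this helper, $(tomorrow) could be passed to --interval-end on either platform.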