EdX transfers course data to the data czars at our partner institutions in regularly generated data packages. Data packages can only be accessed by a single contact at each university, referred to as the “data czar”.
The data czar who is selected at each institution sets up keys for securely transferring files from edX to the partner institution. Meanwhile, the Analytics team at edX sets up credentials so that the data czar can log in to the site where data packages are stored.
After these steps for setting up credentials are complete, the data czar can download data packages on an ongoing basis.
To ensure the security of data packages, the edX Analytics team encrypts all files before making them available to a partner institution. As a result, when you receive a data package (or other files) from the edX Analytics team, you must decrypt the files that it contains before you use them.
The cryptographic processes of encrypting and decrypting data files require that you create a pair of keys: the public key in the pair, which you send to the edX Analytics team, is used to encrypt data. You use your corresponding private key to decrypt any files that have been encrypted with that public key.
To create the keys needed for this encryption and decryption process, you use GNU Privacy Guard (GnuPG or GPG). Essentially, you install a cryptographic application on your local computer and then supply your email address and a secret passphrase (a password).
The result is the public key that you send to edX to use in encrypting data files for your institution, and the private key, which you keep secret and use to decrypt the encrypted files that you receive. Creating these keys is a one-time process that you coordinate with your edX partner manager. Instructions for creating the keys on Windows or Macintosh follow.
For more information about GPG encryption and creating key pairs, see the Gpg4win Compendium.
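If you prefer the command line to the graphical applications, the same kind of key pair can be described in a GnuPG unattended key-generation parameter file. The following Python sketch is illustrative only and is not part of the edX procedure: the function names, the sample name and email address, and the choice of 4096-bit RSA keys are assumptions. The sketch leaves out the Passphrase line so that GnuPG 2.1 or later prompts for the passphrase rather than reading it from a file.

```python
from typing import List


def build_keygen_parameters(name: str, email: str, key_length: int = 4096) -> str:
    """Return the text of a GnuPG unattended key-generation parameter file.

    The RSA key type and 4096-bit length are assumptions for this sketch,
    not edX requirements.
    """
    if "@" not in email:
        raise ValueError("expected an official institutional email address")
    lines = [
        "Key-Type: RSA",
        f"Key-Length: {key_length}",
        "Subkey-Type: RSA",
        f"Subkey-Length: {key_length}",
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",  # 0 means the key does not expire
        "%commit",
    ]
    # No "Passphrase:" line: GnuPG 2.1+ then asks for it interactively.
    return "\n".join(lines) + "\n"


def build_keygen_command(parameter_file: str) -> List[str]:
    """Return the gpg command that reads the parameter file in batch mode."""
    return ["gpg", "--batch", "--gen-key", parameter_file]


# Hypothetical usage: write the parameters to a file, then run the command.
# params = build_keygen_parameters("Data Czar", "czar@university.example")
```

Whichever method you use, the passphrase stays with you and never appears in a file that you share.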
Important
Do not reveal your passphrase, or share your private key, with anyone else. If you need another person to be able to transfer and decrypt files, work with edX to set her or him up as an additional data czar. Data czars must create and use their own passphrases.
Go to the GPG Tools website. Scroll down to the GPG Suite section of the page and select Download GPG Suite.
When the download is complete, select the .dmg file to begin the installation.
When installation is complete, GPG Keychain Access opens a web page with First Steps and a dialog box.
Enter your name and email address. Be sure to enter your official university or institution email address. EdX cannot use public keys that are based on personal or other non-official email addresses to encrypt data.
Select Generate key. A dialog box opens to prompt you for a passphrase.
Enter a strong passphrase. Be sure to select a passphrase that you can remember, or use a secure method of retaining it for reuse in the future: you use this passphrase when you decrypt your data packages.
To send only your public key to your edX partner manager, select the key and then select Export. A dialog box opens.
- Specify a file name and location to save the file.
- Make sure that Format is set to ASCII.
- Make sure that Allow secret key export is cleared.
When you select Save, only the public key is saved in the resulting .asc file. Do not share your private key with edX or any third party.
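Before you send the exported .asc file, you might want to confirm that it holds only a public key. The following Python sketch checks the ASCII-armored text for the standard OpenPGP block headers. The function name and the sample strings are hypothetical, and the check is a quick sanity test rather than a full validation of the key.

```python
PUBLIC_MARKER = "BEGIN PGP PUBLIC KEY BLOCK"
# Older tools may write "SECRET" instead of "PRIVATE", so look for both.
PRIVATE_MARKERS = ("BEGIN PGP PRIVATE KEY BLOCK", "BEGIN PGP SECRET KEY BLOCK")


def is_public_key_only(armored_text: str) -> bool:
    """Return True if the text has a public key block and no private key block."""
    has_public = PUBLIC_MARKER in armored_text
    has_private = any(marker in armored_text for marker in PRIVATE_MARKERS)
    return has_public and not has_private


# Hypothetical usage with an exported file:
# with open("my_public_key.asc", encoding="ascii") as handle:
#     print(is_public_key_only(handle.read()))
```

If the check returns False, export the key again with Allow secret key export cleared.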
The data packages that edX prepares for each partner organization are uploaded to the Amazon Web Services (AWS) Simple Storage Service (Amazon S3). The edX Analytics team creates an individual account to access this storage service for each data czar. The credentials for accessing this account are called an Access Key and a Secret Key.
After the edX Analytics team creates these access credentials for you, they use the public encryption key that you sent your partner manager to encrypt the credentials into a credentials.csv.gpg file. The edX Analytics team then sends the file to you as an email attachment.
The credentials.csv.gpg file is likely to be the first file that you decrypt with your private GPG key. You use the same process to decrypt the data package files that you retrieve from Amazon S3. See Decrypt an Encrypted File.
To work with an encrypted .gpg file, you use the same GNU Privacy Guard program that you used to create your public/private key pair. You use your private key to decrypt the Amazon S3 credentials file and the files in your data packages.
Save the encrypted file in an accessible location.
On a Windows computer, open Windows Explorer. On a Macintosh, open Finder.
Navigate to the file and right-click it.
On a Windows computer, select Decrypt and verify, and then select Decrypt/Verify. Do not change any other setting.
On a Macintosh, select Services, and then select OpenPGP: Decrypt File.
Enter your passphrase. The GNU Privacy Guard program decrypts the file.
For example, when you decrypt the credentials.csv.gpg file, the result is a credentials.csv file. Open the decrypted credentials.csv file to see that it contains your email address, your Access Key, and your Secret Key.
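The decryption steps above can also be carried out with the gpg command-line program. The following Python sketch is one way to do that, assuming that gpg is installed and on your path; the function names are hypothetical. It derives the output file name by removing the .gpg extension and builds a gpg command that uses the --output and --decrypt options.

```python
import subprocess
from pathlib import Path
from typing import List


def decrypted_name(encrypted_file: str) -> str:
    """Return the output name: the input name without its .gpg extension."""
    path = Path(encrypted_file)
    if path.suffix != ".gpg":
        raise ValueError("expected a file name that ends in .gpg")
    return str(path.with_suffix(""))


def build_decrypt_command(encrypted_file: str) -> List[str]:
    """Return the gpg command that decrypts the file to its derived name."""
    return ["gpg", "--output", decrypted_name(encrypted_file),
            "--decrypt", encrypted_file]


def decrypt(encrypted_file: str) -> None:
    """Run gpg; it prompts for your passphrase when it needs the private key."""
    subprocess.run(build_decrypt_command(encrypted_file), check=True)


# Hypothetical usage:
# decrypt("credentials.csv.gpg")
```

The command is passed as a list rather than a single string so that file names containing spaces are handled safely.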
To connect to Amazon S3, you must have your decrypted credentials. You may want to have a third-party tool that gives you a user interface for managing files and transferring them from Amazon S3 to your network. Some data czars use applications like CloudBerry Explorer for Amazon S3, Bucket Explorer, or S3 Browser. Alternatively, you can use the AWS Command Line Interface.
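If you choose the AWS Command Line Interface, one option is to read the keys from the decrypted credentials.csv file and supply them through the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The following Python sketch shows the idea. The function names, the sample values, and the column headers ("Email", "Access Key", "Secret Key") are assumptions; check the header row of your own file and pass the names that it actually uses.

```python
import csv
import io
from typing import Dict


def read_first_row(csv_text: str) -> Dict[str, str]:
    """Return the first data row of the CSV text as a header-to-value mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader, None)
    if row is None:
        raise ValueError("the credentials file has no data row")
    # Skip malformed extra columns (key None) and trim stray whitespace.
    return {key.strip(): (value or "").strip()
            for key, value in row.items() if key is not None}


def aws_environment(row: Dict[str, str], access_key_column: str,
                    secret_key_column: str) -> Dict[str, str]:
    """Map the two key columns to the environment variables the AWS CLI reads."""
    return {
        "AWS_ACCESS_KEY_ID": row[access_key_column],
        "AWS_SECRET_ACCESS_KEY": row[secret_key_column],
    }


# Hypothetical usage; the column names below are assumptions:
# with open("credentials.csv", newline="", encoding="utf-8") as handle:
#     row = read_first_row(handle.read())
# env = aws_environment(row, "Access Key", "Secret Key")
```

Avoid printing or logging the returned values, because the Secret Key gives access to your data packages.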
Select and install a third-party tool or interface to manage your S3 account.
Open your decrypted credentials.csv file. This file contains your AWS Access Key and your AWS Secret Key.
Open the third-party tool.
Enter information to connect to the S3 account.
For example, you might need to select an option such as Open Connection, and then supply the service you want to connect to (Amazon S3), your Access Key, and your Secret Key. For more information, see the documentation provided for the tool that you selected.
To access the database data files, specify or select s3://course-data.
To access the event data files, specify or select s3://edx-course-data/{org}/. You must include the identifier for your organization after the name of the bucket.
Note
If you are using a third-party tool to connect to Amazon S3, you might not be able to navigate directly between s3://course-data and s3://edx-course-data/{org}/. You might need to disconnect from Amazon S3 and then reconnect to specify the other destination.
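The two destinations described above can be written out as follows. This Python sketch builds the event data location for an organization identifier, splits an S3 location into its bucket and prefix, and builds an AWS Command Line Interface listing command (aws s3 ls). The function names and the sample identifier exampleu are hypothetical; substitute the identifier for your own organization.

```python
from typing import List, Tuple
from urllib.parse import urlparse

DATABASE_DATA_URI = "s3://course-data"
EVENT_DATA_BUCKET = "edx-course-data"


def event_data_uri(org: str) -> str:
    """Return the event data location for one organization identifier."""
    org = org.strip().strip("/")
    if not org:
        raise ValueError("an organization identifier is required")
    return f"s3://{EVENT_DATA_BUCKET}/{org}/"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// location into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected a location that starts with s3://")
    return parsed.netloc, parsed.path.lstrip("/")


def build_list_command(uri: str) -> List[str]:
    """Return the AWS CLI command that lists the contents of a location."""
    return ["aws", "s3", "ls", uri]


# Hypothetical usage:
# build_list_command(event_data_uri("exampleu"))
```

Because the command line takes a full location each time, you can switch between the two destinations without disconnecting and reconnecting.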
For information about the files found at each of these Amazon S3 destinations, see Data Delivered in Data Packages.