Getting Data Out of Aggregate

Warning

ODK Aggregate is no longer being updated. Please use ODK Central instead.

ODK Aggregate supports three primary mechanisms for data transfer:

  • Exporting: one time snapshot

  • Publishing: continuous streaming of submissions

  • APIs: programmatic access to the data

Since ODK supports complex data structures such as question grouping, repeating questions, and multimedia, compromises have been made for each mechanism in regards to these data structures. This document explains each mechanism and what each supports.

Exporting

The easiest way to get data from Aggregate is by using its ‘Export’ feature. Export allows the user to manually export all of the data (or filtered data) at any time into one of the following formats:

CSV (comma separated values)

CSV files are a text only, tabular representation of the data. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. Grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.

To download CSV files of forms with repeats, consider using ODK Briefcase instead of ODK Aggregate's export functionality. Instead of repeats that are represented with links to the underlying data, ODK Briefcase will export a set of CSV files, one for each repeating group. ODK Briefcase will also export any multimedia as files (e.g., pictures will get exported as JPEGs). The only metadata that is preserved is the submission date in the CSV is preserved, but the XML files that ODK Briefcase downloads has all form metadata.

Note

To import CSVs into Excel, you cannot download and open in one step; nor can you double-click on the CSV. You must open Excel and choose Import. If you are asked, the file origin or encoding is UTF-8.

JSON

JSON is a text only representation of the data in a key:value format. Multimedia files are represented by including standard web links to the files. Repeating data is preserved, but grouping information not related to repeats is not. Metadata is only preserved if a filter is created with the metadata.

KML (Keyhole Markup Language)

KML is a text only representation of the data similar to XML, but used for mapping applications like Google Earth. Multimedia files are represented by including standard web links to the files hosted on the server. Pictures will appear embedded in popup windows (when pushpins are clicked) in programs that render .kml files. Repeats and grouping information is not preserved. Metadata is only preserved if a filter is created with the metadata.

Export Data Summary

Format

Groups

Repeats

Multimedia

Metadata

CSV(Aggregate)

not preserved

links to

links to

preservable

CSV(Briefcase)

not preserved

split into separate CSVs

exported as files

only submission date

JSON

not preserved

preserved

links to

preservable

KML

not preserved

not preserved

links to

preservable

Publishing

Aggregate provides mechanism for either bulk publishing data to another service, or for streaming incoming data to a service as it is received, or both.

Warning

Under certain failure conditions, the downstream service can receive multiple copies of a given submission. This is known, expected, behavior.

Tip

  • Duplicates typically occur if the downstream service is slow to respond or acknowledge a request.

  • It is your responsibility to detect and eliminate these duplicates should they occur (they will always have exactly the same information in all fields).

Note

  • Publishing attempts happen at most every 15 minutes. If there are pending submissions to publish, there will be attempts at most every 30 seconds until there are no pending submissions.

  • If ODK Aggregate aborts its publishing attempt before it gets an acknowledgment, it will re-send the submission a short time later. If the downstream server successfully processed the first request, the re-send of that same submission can cause a duplicate record of it to appear in the downstream system. The instance ID of the submission will appear as the metainstanceid column in Google Sheets and as the meta-instance-id column in Google Fusion tables.

  • When the downstream server fails to respond or responds with an error code, ODK Aggregate first delays a re-send for 60 seconds, and, if that also fails, it then backs off its publishing attempts, and will re-send at either 7.5-minute or 15-minute intervals until the downstream service successfully responds, after which ODK Aggregate will resume its normal publishing behavior.

Aggregate currently supports publishing data to the following services or in the following formats:

Google Spreadsheets

Spreadsheets is Google’s cloud-hosted spreadsheet solution. Multimedia files are represented by including standard web links to the files. Repeats are also represented with links to the underlying data. All metadata is preserved. Spreadsheets has cell limits above which errors will be reported. Spreadsheets also has an API that can be used to export/publish data.

Warning

Non-repeating grouping information is not preserved with Google Sheets. Use globally unique names throughout your form or data might not transfer correctly to Google Sheets.

JSON Server

JSON preserves grouping and repeat structures. The user can choose to let multimedia files be represented as web links or embedded as base64 encoded strings. All metadata is preserved.

See details of the JSON publisher for more.

Publish Data Summary

Service

Groups

Repeats

Multimedia

Metadata

Google Spreadsheets

not preserved

links to

links to

preserved

JSON Server

preserved

preserved

links to or embedded as base64

preserved

APIs

ODK Aggregate has public APIs defined for sending data to and from its various components. This section gives references to those API implementations that can be used for connecting new applications to the ODK ecosystem.

Briefcase CLI

ODK Briefcase includes a CLI (Command Line Interface) to automate downloading forms from ODK Aggregate (or ODK Collect) and exporting the forms to an Excel-compatible format like CSV. For more details, see Working with command line on Briefcase.

Briefcase-Aggregate API

ODK Briefcase uses an API that external applications can use to pull all data from and push data to ODK Aggregate.

Interfaces used during pull actions are:

  • OpenRosa Form Discovery API

  • download forms and media files using URLs form discovery

  • view/submissionList to obtain a chunk of submission keys

  • view/downloadSubmission to download an individual submission

Interfaces used during push actions are:

  • formUpload to upload a form and its media files to ODK Aggregate

  • view/submissionList to obtain a chunk of submission keys

  • OpenRosa Form Submission/Overwrite API (with extensions)

For more details, see Briefcase Aggregate API.

OpenRosa API

Collect and Aggregate communicate using a standard set of API calls defined in the OpenRosa Protocol specification, which can also be used by alternative clients.

Direct database connection

Warning

This is an extremely dangerous way to access data from Aggregate because table structure can and does change between versions. Moreover, changing any of this data could corrupt your Aggregate install.

If you are running on a Tomcat server, you have access to the underlying MySQL or PostgreSQL tables. The structure of those tables are documented at Aggregate Database Structure.

Accessing Media

Many of the export and publishing options provide a URL to the media (image, audio or video) without providing the content itself. To enable the viewing (following) of this link without requiring a log-in:

  • Go to the Permissions sub-tab under the Site Admin tab.

  • Check the checkbox for: Allow anonymous retrieval of images, audio and video data (needed for GoogleEarth balloon displays).

  • Click the Save Changes button.

This allows anyone to view the media files on your server. Even though you are granting anyone access to this information, it is still quite secure because the users would need to have a valid URL.

Did this page help you?

Selecting an option will open a 1-question survey

👍 Yes 👎 No