Data Source Management API

This API allows you to manage your data sources.


Personal Access Token

To call the Data Source Management API, you first need your personal access token. You can find it on your settings page, under the developer settings section.
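
As a minimal sketch, the token can be read from the environment and turned into the Authorization header used by every call in this document. The helper name auth_headers and the environment variable GETDATA_TOKEN are assumptions made for illustration; they are not part of the API.

```python
import os


def auth_headers(token: str) -> dict:
    """Build the Authorization header for a personal access token."""
    token = token.strip()
    if not token:
        raise ValueError("personal access token is empty")
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # GETDATA_TOKEN is a hypothetical environment variable name.
    print(auth_headers(os.environ.get("GETDATA_TOKEN", "YOUR_TOKEN_HERE")))
```

Keeping the token out of source code makes it easier to rotate and harder to leak by accident.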


Run your data source

Triggers a run of your data source so that it starts collecting data.

Call Structure

POST https://getdata.io/data-sources/:DATA_SOURCE_UNIQUE_ID/run

HEADERS
  Authorization: Bearer :PERSONAL_ACCESS_TOKEN

BODY
{
  "origin_urls": :ORIGIN_URLS,
  "data": :DATA,
  "format": "json"
}

Supported Params

Param                  Required?  Type              Description
PERSONAL_ACCESS_TOKEN  Required   String            Your personal access token, available from your developer settings page
DATA_SOURCE_UNIQUE_ID  Required   String            The unique ID that identifies the data source you want to run
ORIGIN_URLS            Optional   Array of Strings  An array of valid URLs, given as strings, to gather data from
DATA                   Optional   Hash              A hash of key-value pairs to preload your crawler with, for use as dynamic variables

Sample JSON Request

POST https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/run

HEADERS
  Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e

BODY
{
  "origin_urls": [
    "https://google.com",
    "https://amazon.com",
    "https://bing.com"
  ],
  "data": {
    "attribute_1": "value_1",
    "attribute_2": "value_2"
  },
  "format": "json"
}
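
The request above could be prepared in Python roughly as follows. This is a sketch using only the standard library; the function name build_run_request is hypothetical, and the Content-Type header is an assumption, since this page only specifies the Authorization header.

```python
import json
import urllib.request

BASE_URL = "https://getdata.io"


def build_run_request(token, data_source_id, origin_urls=None, data=None):
    """Prepare (but do not send) the POST request that runs a data source."""
    body = {"format": "json"}
    # ORIGIN_URLS and DATA are optional, so only include them when given.
    if origin_urls is not None:
        body["origin_urls"] = list(origin_urls)
    if data is not None:
        body["data"] = dict(data)
    return urllib.request.Request(
        url=f"{BASE_URL}/data-sources/{data_source_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the JSON body is declared with this content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to urllib.request.urlopen and decode the response body with json.load.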

Sample JSON Response

{
    "id": 7742,
    "name": "Page Titles",
    "description": "A titles collected from major web portals in the world",
    "keywords": null,
    "frequency": "",
    "created_at": "2018-09-16T18:09:02.247Z",
    "updated_at": "2019-03-09T01:22:52.182Z",
    "handle": "n15623_f379e48619a72273f95a21bcd6812c19eses",
    "is_private": false,
    "status": "QUEUED",
    "last_ran": "2019-03-09T01:22:52.000Z",
    "latest_count": 0,
    "url": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses.json",
    "batch": {
        "id": 862839,
        "name": "2019-03-09 01:22:52",
        "batch_key": 1552094572,
        "status": "queued",
        "count": null,
        "start_crawl": null,
        "end_crawl": null,
        "duration": null,
        "created_at": "2019-03-09T01:22:52.175Z",
        "updated_at": "2019-03-09T01:22:52.175Z"
    }
}
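
A short sketch of reading this response: the nested batch object describes the run that was just queued. The function name summarize_run is hypothetical, and the field names are taken from the sample above. Note that in the sample the top-level status is upper case ("QUEUED") while the batch status is lower case ("queued").

```python
import json


def summarize_run(response_text: str) -> dict:
    """Extract the fields that describe the newly queued batch."""
    payload = json.loads(response_text)
    # Fall back to an empty dict if the response carries no batch object.
    batch = payload.get("batch") or {}
    return {
        "handle": payload.get("handle"),
        "status": payload.get("status"),
        "batch_id": batch.get("id"),
        "batch_status": batch.get("status"),
    }
```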


List your data sources

Fetches a list of the data sources that belong to you.

Call Structure

GET https://getdata.io/data-sources/mine?page=:PAGE&search=:SEARCH&schedule_filters=:SCHEDULE&status_filters=:STATUS

HEADERS
  Authorization: Bearer :PERSONAL_ACCESS_TOKEN

Supported Params

Param                  Required?  Type     Description
PERSONAL_ACCESS_TOKEN  Required   String   Your personal access token, available from your developer settings page
PAGE                   Optional   Integer  The page of results to fetch. Defaults to 1 if not provided
SEARCH                 Optional   String   The search term used to filter your data sources. The filter is applied to the name, description and recipe of each data source
SCHEDULE               Optional   String   A comma-separated string listing the schedules that returned data sources should match.
Supported values are:

  • NOT_SCHEDULED
  • 15_MIN
  • 30_MIN
  • HOURLY
  • THREE_HOURLY
  • SIX_HOURLY
  • TWICE_DAILY
  • DAILY
  • WEEKLY
  • FORTNIGHTLY
  • MONTHLY
  • QUARTERLY
  • YEARLY

STATUS                 Optional   String   A comma-separated string listing the statuses that returned data sources should match.
Supported values are:

  • QUEUED
  • IN PROGRESS
  • CACHING
  • COMPLETED
  • ABORTED


Sample JSON Request

GET https://getdata.io/data-sources/mine?page=2&search=something&schedule_filters=15_MIN,30_MIN,HOURLY&status_filters=QUEUED,IN%20PROGRESS

HEADERS
  Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e

Sample JSON Response

{
  "total": 380,
  "pagination": {
    "current_page": 2,
    "per_page": 100,
    "total_pages": 4,
    "next_page": "https://getdata.io/data-sources/mine?page=3",
    "prev_page": "https://getdata.io/data-sources/mine?page=1"
  },
  "results": [{
      "id": 4671,
      "name": "Binance Live Price Chart, Exchanges, Trade Volume and Market Listing",
      "description": null,
      "keywords": null,
      "frequency": "",
      "created_at": "2018-03-01T06:29:04.079Z",
      "updated_at": "2018-12-28T20:10:13.472Z",
      "handle": "n4671_f41240836438832e018401faae14a69ceses",
      "is_private": false,
      "status": null,
      "last_ran": null,
      "latest_count": 0,
      "url": "https://getdata.io/data-sources/4671-binance-live-price-chart-exchanges-trade-volume-and-market-listing.json"
    }, {
      "id": 4216,
      "name": "The Best Star Wars: The Last Jedi Toys You Can Buy | WIRED",
      "description": null,
      "keywords": null,
      "frequency": "",
      "created_at": "2017-10-24T06:20:20.754Z",
      "updated_at": "2018-12-28T20:10:07.972Z",
      "handle": "n4216_bc6bc5a45f81645f38ac2e519654bdb0eses",
      "is_private": false,
      "status": null,
      "last_ran": null,
      "latest_count": 0,
      "url": "https://getdata.io/data-sources/4216-the-best-star-wars-the-last-jedi-toys-you-can-buy-wired.json"
    },
    ...
  ]
}
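
Since results are paginated, a client typically follows next_page until there are no more pages. The sketch below assumes that the last page has a missing or null next_page, which this page does not state. The fetch_page callable is hypothetical: it stands for whatever function performs the authenticated GET and returns the decoded JSON.

```python
from typing import Callable, Iterator


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item in `results`, following `next_page` links."""
    url = first_url
    seen = set()  # guards against a next_page link that loops back
    while url and url not in seen:
        seen.add(url)
        page = fetch_page(url)
        yield from page.get("results", [])
        url = (page.get("pagination") or {}).get("next_page")


if __name__ == "__main__":
    # Stand-in pages so the sketch runs without network access.
    fake_pages = {
        "page-1": {"results": [{"id": 4671}], "pagination": {"next_page": "page-2"}},
        "page-2": {"results": [{"id": 4216}], "pagination": {"next_page": None}},
    }
    for item in iter_results("page-1", fake_pages.__getitem__):
        print(item["id"])
```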


List your data source batches

Returns a list of the batches available in your data source.

Call Structure

GET https://getdata.io/data-sources/:DATA_SOURCE_UNIQUE_ID/batches

HEADERS
  Authorization: Bearer :PERSONAL_ACCESS_TOKEN

Supported Params

Param                  Required?  Type    Description
PERSONAL_ACCESS_TOKEN  Required   String  Your personal access token, available from your developer settings page
DATA_SOURCE_UNIQUE_ID  Required   String  The unique ID that identifies the data source whose batches you want to list

Sample JSON Request

GET https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches

HEADERS
  Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e

Sample JSON Response

{
  "data_source_url": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses.json",
  "total": 664,
  "pagination": {
    "current_page": 2,
    "per_page": 100,
    "total_pages": 7,
    "next_page": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches?page=3",
    "prev_page": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches?page=1"
  },
  "results": [
    {
      "id": 1145650,
      "name": "2019-03-09 00:00:00",
      "batch_key": 1552089600,
      "status": "completed",
      "count": 112,
      "start_crawl": "2019-03-09T00:02:34.008Z",
      "end_crawl": "2019-03-09T00:02:46.215Z",
      "duration": 12,
      "created_at": "2019-03-09T00:00:02.497Z",
      "updated_at": "2019-03-09T00:12:55.187Z",
      "data_url_json": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552089600_page_1.json",
      "data_url_csv": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552089600_all.csv"
    },
    {
      "id": 1145437,
      "name": "2019-03-08 12:00:00",
      "batch_key": 1552046400,
      "status": "completed",
      "count": 112,
      "start_crawl": "2019-03-08T12:05:04.015Z",
      "end_crawl": "2019-03-08T12:05:16.061Z",
      "duration": 12,
      "created_at": "2019-03-08T12:00:06.127Z",
      "updated_at": "2019-03-08T12:15:22.209Z",
      "data_url_json": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552046400_page_1.json",
      "data_url_csv": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552046400_all.csv"
    },
    ...
  ]
}
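
A common next step is to pick the most recent completed batch and download its data_url_csv or data_url_json. The sketch below does the selection only. It assumes that batch_key is a Unix timestamp in UTC; in the samples on this page it matches the batch name, but that is an observation rather than a documented guarantee. Both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def latest_completed_batch(listing: dict) -> Optional[dict]:
    """Return the completed batch with the highest batch_key, or None."""
    completed = [
        batch for batch in listing.get("results", [])
        if batch.get("status") == "completed"
    ]
    return max(completed, key=lambda batch: batch["batch_key"], default=None)


def batch_time(batch: dict) -> datetime:
    """Interpret batch_key as a Unix timestamp in UTC (an assumption)."""
    return datetime.fromtimestamp(batch["batch_key"], tz=timezone.utc)
```

Applied to the sample response above, latest_completed_batch selects the batch with id 1145650, and batch_time gives 2019-03-09 00:00:00 UTC for it, which agrees with its name.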


More resources

To find out more about how GetData.IO works