This API allows you to manage your data sources.
To call the Data Source Management API, first obtain your personal access token. You can find it on your settings page, under the developer settings section.
Starts a run of your data source so that it begins collecting data.
POST https://getdata.io/data-sources/:DATA_SOURCE_UNIQUE_ID/run
HEADERS
Authorization: Bearer :PERSONAL_ACCESS_TOKEN
BODY
{ "origin_urls": :ORIGIN_URLS, "data": :DATA, "format": "json" }
Parameter | Type | Description |
---|---|---|
PERSONAL_ACCESS_TOKEN | Required String | Your personal access token, which can be obtained from your developer settings page |
DATA_SOURCE_UNIQUE_ID | Required String | The unique ID that identifies the data source you want to run |
ORIGIN_URLS | Optional Array of Strings | An array of valid URLs, in String format, to gather data from |
DATA | Optional Hash | A hash of key-value pairs to preload your crawler with, for use as dynamic variables |
POST https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/run
HEADERS
Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e
BODY
{ "origin_urls": [ "https://google.com", "https://amazon.com", "https://bing.com" ], "data": { "attribute_1": "value_1", "attribute_2": "value_2" }, "format": "json" }
{ "id": 7742, "name": "Page Titles", "description": "A titles collected from major web portals in the world", "keywords": null, "frequency": "", "created_at": "2018-09-16T18:09:02.247Z", "updated_at": "2019-03-09T01:22:52.182Z", "handle": "n15623_f379e48619a72273f95a21bcd6812c19eses", "is_private": false, "status": "QUEUED", "last_ran": "2019-03-09T01:22:52.000Z", "latest_count": 0, "url": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses.json", "batch": { "id": 862839, "name": "2019-03-09 01:22:52", "batch_key": 1552094572, "status": "queued", "count": null, "start_crawl": null, "end_crawl": null, "duration": null, "created_at": "2019-03-09T01:22:52.175Z", "updated_at": "2019-03-09T01:22:52.175Z" } }
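The run request shown above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library. The helper name `build_run_request` and the `Content-Type: application/json` header are assumptions made for illustration and are not specified by this reference. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://getdata.io"


def build_run_request(data_source_id, token, origin_urls=None, data=None):
    """Build (but do not send) the POST request that starts a data source run.

    `build_run_request` is a hypothetical helper, not part of any official client.
    """
    body = {"format": "json"}
    # ORIGIN_URLS and DATA are optional, so they are only included when given.
    if origin_urls is not None:
        body["origin_urls"] = list(origin_urls)
    if data is not None:
        body["data"] = dict(data)
    return urllib.request.Request(
        url=f"{BASE_URL}/data-sources/{data_source_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the JSON body is announced with this content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "DATA_SOURCE_UNIQUE_ID",   # placeholder, replace with your data source id
    "PERSONAL_ACCESS_TOKEN",   # placeholder, replace with your token
    origin_urls=["https://google.com", "https://bing.com"],
    data={"attribute_1": "value_1"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the URL, headers and body easy to inspect before any call is made.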
Fetches a list of the data sources that belong to you.
GET https://getdata.io/data-sources/mine?page=:PAGE&search=:SEARCH&schedule_filters=:SCHEDULE&status_filters=:STATUS
HEADERS
Authorization: Bearer :PERSONAL_ACCESS_TOKEN
Parameter | Type | Description |
---|---|---|
PERSONAL_ACCESS_TOKEN | Required String | Your personal access token, which can be obtained from your developer settings page |
PAGE | Optional Integer | The page of results you want to fetch. Defaults to 1 if not provided |
SEARCH | Optional String | The search term used to filter your data sources. The filter is applied to the name, description and recipe of each data source |
SCHEDULE | Optional String | A comma-separated String listing the schedules that returned data sources should match. The example request uses 15_MIN, 30_MIN and HOURLY |
STATUS | Optional String | A comma-separated String listing the statuses that returned data sources should match. The example request uses QUEUED and IN PROGRESS |
GET https://getdata.io/data-sources/mine?page=2&search=something&schedule_filters=15_MIN,30_MIN,HOURLY&status_filters=QUEUED,IN%20PROGRESS
HEADERS
Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e
{ total: 380, pagination: { current_page: 2, per_page: 100, total_pages: 4, next_page: "https://getdata.io/data-sources/mine?page=3", prev_page: "https://getdata.io/data-sources/mine?page=1" }, results: [{ id: 4671, name: "Binance Live Price Chart, Exchanges, Trade Volume and Market Listing", description: null, keywords: null, frequency: "", created_at: "2018-03-01T06:29:04.079Z", updated_at: "2018-12-28T20:10:13.472Z", handle: "n4671_f41240836438832e018401faae14a69ceses", is_private: false, status: null, last_ran: null, latest_count: 0, url: "https://getdata.io/data-sources/4671-binance-live-price-chart-exchanges-trade-volume-and-market-listing.json" }, { id: 4216, name: "The Best Star Wars: The Last Jedi Toys You Can Buy | WIRED", description: null, keywords: null, frequency: "", created_at: "2017-10-24T06:20:20.754Z", updated_at: "2018-12-28T20:10:07.972Z", handle: "n4216_bc6bc5a45f81645f38ac2e519654bdb0eses", is_private: false, status: null, last_ran: null, latest_count: 0, url: "https://getdata.io/data-sources/4216-the-best-star-wars-the-last-jedi-toys-you-can-buy-wired.json" }, ... ] }
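The query string for the listing endpoint is easy to get wrong by hand, in particular when a filter value contains a space. The sketch below builds the request with Python's standard library. `build_list_request` is a hypothetical helper name, and leaving the commas unescaped is an assumption based on how the example request is written.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://getdata.io"


def build_list_request(token, page=None, search=None, schedules=None, statuses=None):
    """Build (but do not send) the GET request that lists your data sources.

    `build_list_request` is a hypothetical helper, not part of any official client.
    """
    params = {}
    # Every query parameter is optional, so only the ones provided are added.
    if page is not None:
        params["page"] = page
    if search:
        params["search"] = search
    if schedules:
        params["schedule_filters"] = ",".join(schedules)
    if statuses:
        params["status_filters"] = ",".join(statuses)
    # quote() turns spaces into %20; safe="," keeps the separators readable
    # (assumption: the API accepts literal commas, as in the example request).
    query = urlencode(params, safe=",", quote_via=quote)
    url = f"{BASE_URL}/data-sources/mine"
    if query:
        url = f"{url}?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


request = build_list_request(
    "PERSONAL_ACCESS_TOKEN",  # placeholder, replace with your token
    page=2,
    schedules=["15_MIN", "30_MIN", "HOURLY"],
    statuses=["QUEUED", "IN PROGRESS"],
)
print(request.full_url)
```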
Returns a list of the batches that are available in your data source.
GET https://getdata.io/data-sources/:DATA_SOURCE_UNIQUE_ID/batches
HEADERS
Authorization: Bearer :PERSONAL_ACCESS_TOKEN
Parameter | Type | Description |
---|---|---|
PERSONAL_ACCESS_TOKEN | Required String | Your personal access token, which can be obtained from your developer settings page |
DATA_SOURCE_UNIQUE_ID | Required String | The unique ID that identifies the data source whose batches you want to list |
GET https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches
HEADERS
Authorization: Bearer 07ab2a8a985a9720bd355cc310467661391072e50b2698bb644d32a6589e326e
{ data_source_url: "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses.json", total: 664, pagination: { current_page: 2, per_page: 100, total_pages: 7, next_page: "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches?page=3", prev_page: "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches?page=1" }, results: [ { id: 1145650, name: "2019-03-09 00:00:00", batch_key: 1552089600, status: "completed", count: 112, start_crawl: "2019-03-09T00:02:34.008Z", end_crawl: "2019-03-09T00:02:46.215Z", duration: 12, created_at: "2019-03-09T00:00:02.497Z", updated_at: "2019-03-09T00:12:55.187Z", data_url_json: "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552089600_page_1.json", data_url_csv: "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552089600_all.csv" }, { id: 1145437, name: "2019-03-08 12:00:00", batch_key: 1552046400, status: "completed", count: 112, start_crawl: "2019-03-08T12:05:04.015Z", end_crawl: "2019-03-08T12:05:16.061Z", duration: 12, created_at: "2019-03-08T12:00:06.127Z", updated_at: "2019-03-08T12:15:22.209Z", data_url_json: "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552046400_page_1.json", data_url_csv: "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552046400_all.csv" }, ... ] }
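Once a batches response has been parsed into a dictionary, the download links can be picked out of it. The following is a minimal sketch that relies only on field names visible in the example response (`results`, `status`, `data_url_csv`, `data_url_json`, `pagination`, `next_page`). The helper names are hypothetical, and the sample dictionary is a trimmed copy of that example rather than live data.

```python
def completed_data_urls(batches_response, file_format="csv"):
    """Return the data URLs of all completed batches in one page of results.

    `file_format` selects the `data_url_csv` or `data_url_json` field.
    """
    key = f"data_url_{file_format}"
    return [
        batch[key]
        for batch in batches_response.get("results", [])
        # Batches that are not completed, or that lack the URL, are skipped.
        if batch.get("status") == "completed" and batch.get(key)
    ]


def next_page_url(batches_response):
    """Return the URL of the next page of batches, or None if it is absent."""
    return batches_response.get("pagination", {}).get("next_page")


# Trimmed copy of the example response above, keeping only the fields used here.
sample = {
    "pagination": {
        "current_page": 2,
        "next_page": "https://getdata.io/data-sources/n15623_f379e48619a72273f95a21bcd6812c19eses/batches?page=3",
    },
    "results": [
        {
            "status": "completed",
            "data_url_csv": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552089600_all.csv",
        },
        {
            "status": "completed",
            "data_url_csv": "http://cache.getdata.io/n15623_f379e48619a72273f95a21bcd6812c19eses/1552046400_all.csv",
        },
    ],
}

for url in completed_data_urls(sample):
    print(url)
print(next_page_url(sample))
```

Filtering on `status` avoids requesting data for batches that are still queued, such as the one returned right after a run is triggered.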