Recipes are JSON documents interpreted by our decentralized ODBC engine to fetch the data you need from the web.
To write a recipe, you will need to use the available clauses in our Semantic Query Language (SQL).
Below are some examples to get you up and running.
Useful for getting data from a single web page.
{
  "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
  "columns": [
    {
      "col_name": "apartment page",
      "dom_query": "article a[href*=homedetails]"
    }
  ]
}
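Because a recipe is plain JSON, it can be checked locally before it is handed to the engine. The following is a minimal sketch, not part of GetData.IO: `check_recipe` is a hypothetical helper, the `example.com` URL is a placeholder, and the assumption that every recipe carries `origin_url` and `columns` is drawn only from the examples in this document.

```python
import json


def check_recipe(text):
    """Parse a recipe and return the names of its top-level columns.

    Raises ValueError if the text is not valid JSON or lacks the two keys
    that every example in this document uses (an assumption, not a spec).
    """
    try:
        recipe = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"recipe is not valid JSON: {err}") from err
    for key in ("origin_url", "columns"):
        if key not in recipe:
            raise ValueError(f"recipe is missing '{key}'")
    return [col["col_name"] for col in recipe["columns"]]


# Placeholder recipe; the URL is illustrative only.
recipe_text = """
{
  "origin_url": "http://www.example.com/listings",
  "columns": [
    { "col_name": "apartment page", "dom_query": "article a[href*=homedetails]" }
  ]
}
"""
print(check_recipe(recipe_text))
```

A check like this catches common slips such as a trailing comma after the last element of an array, which strict JSON parsers reject.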
Useful for getting data from a single web page and from the web pages it links to.
{
  "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
  "columns": [
    {
      "col_name": "apartment page",
      "dom_query": "article a[href*=homedetails]",
      "required_attribute": "href",
      "options": {
        "columns": [
          {
            "col_name": "address",
            "dom_query": "header h1"
          },
          {
            "col_name": "neighborhood",
            "dom_query": "h2:contains('Neighborhood')",
            "regex_pattern": "Neighborhood: (.*)",
            "regex_group": 1
          }
        ]
      }
    }
  ]
}
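The `regex_pattern` and `regex_group` pair in the recipe above can be read as "match this pattern against the selected text and keep the numbered capture group". The sketch below illustrates that reading with Python's `re` module. It is an assumption about the behavior, not the engine's implementation: `apply_regex` is a hypothetical helper, the sample heading text is made up, and the engine's regex dialect may differ from Python's.

```python
import re


def apply_regex(text, regex_pattern, regex_group):
    """Return the requested capture group, or None when the pattern does not match."""
    match = re.search(regex_pattern, text)
    return match.group(regex_group) if match else None


# Sample text standing in for what the dom_query might select (illustrative only).
heading = "Neighborhood: Meadows Village"
print(apply_regex(heading, "Neighborhood: (.*)", 1))
```

Under this reading, group 1 keeps only the part after the `Neighborhood: ` label, so the column holds the neighborhood name rather than the whole heading.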
Useful for getting data from a series of paginated web pages.
{
  "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
  "columns": [
    {
      "col_name": "apartment page",
      "dom_query": "article a[href*=homedetails]"
    }
  ],
  "next_page": {
    "dom_query": ".zsg-pagination_active+li a"
  }
}
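Conceptually, `next_page` describes a loop: extract the columns from the current page, follow the link that the `next_page` query selects, and stop when no such link exists. The sketch below models that loop over an in-memory dictionary instead of real HTTP requests. The `crawl` function, the page structure, and the `max_pages` guard are all hypothetical and only illustrate the idea; the engine's actual stopping rules are not documented here.

```python
def crawl(pages, start_url, max_pages=100):
    """Collect rows from start_url and from every page reachable via 'next'.

    `pages` maps a URL to {"rows": [...], "next": url-or-missing} and stands
    in for fetching a page and running the recipe's queries against it.
    """
    rows, url, seen = [], start_url, set()
    # Stop when there is no next link, when a page repeats, or at the page cap.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = pages[url]
        rows.extend(page["rows"])
        url = page.get("next")
    return rows


# Three fake result pages; the last one has no "next" link.
pages = {
    "/p1": {"rows": ["a", "b"], "next": "/p2"},
    "/p2": {"rows": ["c"], "next": "/p3"},
    "/p3": {"rows": ["d"]},
}
print(crawl(pages, "/p1"))
```

The `seen` set guards against pagination links that loop back to an earlier page.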
Useful for getting data spread across multiple domains. In this scenario, we fetch the cast of the movie Gattaca from IMDb and look up each cast member's date of birth on Google.
{
  "engine": "nokogiri",
  "client_version": "4.0.20",
  "origin_url": "https://www.imdb.com/title/tt0119177/fullcredits",
  "columns": [
    {
      "col_name": "CAST_NAME",
      "dom_query": "table.cast_list tr td:nth-child(2).itemprop a span.itemprop",
      "options": {
        "origin_url": "https://www.google.com/search?q={{CAST_NAME}}+date+of+birth",
        "columns": [
          {
            "col_name": "date_of_birth",
            "dom_query": "div[role='heading'][aria-level='3'].HwtpBd.kno-fb-ctx div:nth-child(1)"
          }
        ]
      }
    }
  ]
}
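The `{{CAST_NAME}}` placeholder in the nested `origin_url` refers to the value of the parent column, so each cast member produces a separate search URL. The sketch below shows one plausible way such a substitution could work. `fill_template` is a hypothetical helper, and URL-encoding the value with `quote_plus` is an assumption; the engine may encode values differently.

```python
import re
from urllib.parse import quote_plus


def fill_template(url_template, row):
    """Replace each {{COLUMN}} placeholder with the URL-encoded value from row."""

    def replace(match):
        return quote_plus(str(row[match.group(1)]))

    return re.sub(r"\{\{(\w+)\}\}", replace, url_template)


template = "https://www.google.com/search?q={{CAST_NAME}}+date+of+birth"
# Sample row as it might come from the parent column (illustrative value).
print(fill_template(template, {"CAST_NAME": "Ethan Hawke"}))
```

A placeholder that names a column missing from the row raises `KeyError` in this sketch, which makes a misspelled column name easy to spot.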
Useful for getting data from web pages that require a login. In this scenario, we fetch our trading account balance from Ameritrade.com.
{
  "engine": "chrome",
  "origin_url": "https://www.tdameritrade.com/home.page",
  "actions": [
    {
      "action_name": "insert",
      "dom_query": "#userid",
      "value": "MY_USER_ID"
    },
    {
      "action_name": "insert",
      "dom_query": "#password",
      "value": "MY_PASSWORD"
    },
    {
      "action_name": "click",
      "dom_query": "#password + button",
      "milliseconds": 10000
    }
  ],
  "columns": [
    {
      "col_name": "my trading account balance",
      "dom_query": ".balance"
    }
  ]
}
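Since the login recipe carries the user ID and password as literal values, one option is to generate it from a script so the secrets are not stored in the recipe file itself. The sketch below is only an illustration of that idea: `build_login_recipe` and the variable names `GETDATA_USER_ID` and `GETDATA_PASSWORD` are hypothetical, and the recipe fields are copied from the example above.

```python
import json


def build_login_recipe(env):
    """Build the login recipe, reading credentials from a mapping such as os.environ."""
    return {
        "engine": "chrome",
        "origin_url": "https://www.tdameritrade.com/home.page",
        "actions": [
            {"action_name": "insert", "dom_query": "#userid",
             "value": env["GETDATA_USER_ID"]},
            {"action_name": "insert", "dom_query": "#password",
             "value": env["GETDATA_PASSWORD"]},
            {"action_name": "click", "dom_query": "#password + button",
             "milliseconds": 10000},
        ],
        "columns": [
            {"col_name": "my trading account balance", "dom_query": ".balance"}
        ],
    }


# A plain dict stands in for os.environ here so the example runs anywhere.
sample_env = {"GETDATA_USER_ID": "MY_USER_ID", "GETDATA_PASSWORD": "MY_PASSWORD"}
print(json.dumps(build_login_recipe(sample_env), indent=2))
```

In real use, passing `os.environ` instead of `sample_env` would pick the credentials up from the environment, and a missing variable would raise `KeyError` rather than silently submitting an empty value.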
To find out more about how GetData.IO works