Writing your first recipe

Recipes are JSON documents interpreted by our decentralized ODBC engine to fetch the data you need from the web. To write a recipe, you combine the clauses available in our Semantic Query Language (SQL).

Below are some examples to get you up and running.


Query data from a single page

Useful for getting data from a single webpage.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]"
        }
    ]
}
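Because a recipe is plain JSON, it can be sanity-checked locally before it is submitted. The Python sketch below is only an illustration: `check_recipe` is a hypothetical helper, not part of GetData.IO, and the assumption that every recipe needs `origin_url` and that every column needs `col_name` and `dom_query` is inferred from the examples on this page rather than from a published schema.

```python
import json


def check_recipe(recipe_text: str) -> list:
    """Return a list of problems; an empty list means the basic shape looks fine."""
    try:
        recipe = json.loads(recipe_text)
    except json.JSONDecodeError as err:
        # Strict JSON rejects things like trailing commas and single quotes.
        return [f"not valid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(recipe, dict):
        return ["recipe must be a JSON object"]

    problems = []
    # Assumed requirement, inferred from the examples on this page.
    if "origin_url" not in recipe:
        problems.append("missing origin_url")
    for index, column in enumerate(recipe.get("columns", [])):
        for key in ("col_name", "dom_query"):
            if key not in column:
                problems.append(f"column {index} is missing {key}")
    return problems


if __name__ == "__main__":
    sample = """
    {
        "origin_url": "http://www.example.com/",
        "columns": [
            {"col_name": "apartment page", "dom_query": "article a[href*=homedetails]"}
        ]
    }
    """
    print(check_recipe(sample))
```

An empty list means the recipe parsed and has the assumed fields; it does not guarantee that the engine will accept it.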

Query data from nested pages

Useful for getting data from a single webpage and from the webpages linked from it.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]",
            "required_attribute": "href",
            "options": {
                "columns": [
                    {
                        "col_name": "address",
                        "dom_query": "header h1"
                    },
                    {
                        "col_name": "neighborhood",
                        "dom_query": "h2:contains('Neighborhood')",
                        "regex_pattern": "Neighborhood: (.*)",
                        "regex_group": 1
                    }
                ]
            }
        }
    ]
}
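The `regex_pattern` and `regex_group` fields in the recipe above appear to post-process the text matched by `dom_query`, keeping only one capture group. The Python sketch below illustrates that reading and nothing more: `apply_regex` is a hypothetical helper, the sample heading text is made up, and the engine's actual regex dialect and its behaviour when nothing matches may differ.

```python
import re
from typing import Optional


def apply_regex(text: str, regex_pattern: str, regex_group: int) -> Optional[str]:
    """Return the requested capture group, or None when the pattern does not match."""
    match = re.search(regex_pattern, text)
    if match is None:
        return None
    return match.group(regex_group)


if __name__ == "__main__":
    # Made-up text standing in for what the h2 element might contain.
    heading = "Neighborhood: Meadows Village"
    print(apply_regex(heading, "Neighborhood: (.*)", 1))
```

Under this interpretation, group 1 drops the "Neighborhood: " label and leaves only the neighborhood name in the column.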

Query data from listed pages

Useful for getting data from a series of paginated webpages.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]"
        }
    ],
    "next_page": {
        "dom_query": ".zsg-pagination_active+li a"
    }    
}

Query data across multiple domains

Useful for getting data spread across multiple domains. In this scenario, we fetch the cast of the movie Gattaca from IMDB and then look up each cast member's date of birth on Google.

{
    "engine": "nokogiri",
    "client_version": "4.0.20",
    "origin_url": "https://www.imdb.com/title/tt0119177/fullcredits",
    "columns": [
        {
            "col_name": "CAST_NAME",
            "dom_query": "table.cast_list tr td:nth-child(2).itemprop a span.itemprop",
            "options": {
                "origin_url": "https://www.google.com/search?q={{CAST_NAME}}+date+of+birth",
                "columns": [
                    {
                        "col_name": "date_of_birth",
                        "dom_query": "div[role='heading'][aria-level='3'].HwtpBd.kno-fb-ctx div:nth-child(1)"
                    }
                ]
            }
        }
    ]
}
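In the recipe above, the nested `origin_url` contains a `{{CAST_NAME}}` placeholder, which presumably is replaced with the value of the `CAST_NAME` column for each row. The Python sketch below shows one way such a substitution could work; `fill_template` is a hypothetical helper, and whether the engine URL-encodes the value the way `quote_plus` does is an assumption.

```python
import re
from urllib.parse import quote_plus


def fill_template(url_template: str, row: dict) -> str:
    """Replace each {{COLUMN}} placeholder with the URL-encoded value from row."""

    def substitute(match: re.Match) -> str:
        column_name = match.group(1)
        return quote_plus(str(row[column_name]))

    return re.sub(r"\{\{(\w+)\}\}", substitute, url_template)


if __name__ == "__main__":
    template = "https://www.google.com/search?q={{CAST_NAME}}+date+of+birth"
    # One row as it might come back from the outer IMDB query.
    print(fill_template(template, {"CAST_NAME": "Ethan Hawke"}))
    # prints https://www.google.com/search?q=Ethan+Hawke+date+of+birth
```

Each cast member found on the first domain would then produce one request against the second domain.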

Query data with logged in sessions

Useful for getting data from webpages that require a login. In this scenario, we fetch our trading account balance from TD Ameritrade (tdameritrade.com).

{
    "engine": "chrome",
    "origin_url": "https://www.tdameritrade.com/home.page",
    "actions": [{
        "action_name": "insert",
        "dom_query": "#userid",        
        "value": "MY_USER_ID"
    },{
        "action_name": "insert",
        "dom_query": "#password",        
        "value": "MY_PASSWORD"
    },{
        "action_name": "click",
        "dom_query": "#password + button",
        "milliseconds": 10000
    }],
    "columns": [{
        "col_name": "my trading account balance",
        "dom_query": ".balance"
    }]
}
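Since the recipe above carries a user ID and password in plain text, it can be safer to generate it at run time than to save it with real credentials. The Python sketch below builds the same recipe from environment variables; `build_login_recipe` and the variable names `TD_USER_ID` and `TD_PASSWORD` are hypothetical, chosen only for this illustration.

```python
import json
import os


def build_login_recipe(user_id: str, password: str) -> dict:
    """Return the login recipe from this section with the credentials filled in."""
    return {
        "engine": "chrome",
        "origin_url": "https://www.tdameritrade.com/home.page",
        "actions": [
            {"action_name": "insert", "dom_query": "#userid", "value": user_id},
            {"action_name": "insert", "dom_query": "#password", "value": password},
            {
                "action_name": "click",
                "dom_query": "#password + button",
                "milliseconds": 10000,
            },
        ],
        "columns": [
            {"col_name": "my trading account balance", "dom_query": ".balance"}
        ],
    }


if __name__ == "__main__":
    # Hypothetical variable names; fall back to the placeholders used above.
    recipe = build_login_recipe(
        os.environ.get("TD_USER_ID", "MY_USER_ID"),
        os.environ.get("TD_PASSWORD", "MY_PASSWORD"),
    )
    print(json.dumps(recipe, indent=4))
```

The printed JSON has the same structure as the recipe above, so it can be used wherever a hand-written recipe would be.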

Next Steps

To find out more about how GetData.IO works