DataSource API Quick Start

A few examples to get you up and running with your first crawler.


Crawl a single page

Useful for getting data from a single web page.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]"
        }
    ]
}
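
If you generate configurations from code rather than writing them by hand, something like the following Python sketch may help. It only builds a dictionary shaped like the example above and serializes it to JSON. The helper name single_page_config and the placeholder URL are assumptions made for illustration, and how the JSON is submitted to the DataSource API is not covered here.

```python
import json


def single_page_config(origin_url, col_name, dom_query):
    """Build a single-page crawl configuration shaped like the example above.

    Hypothetical helper: the key names are copied from the example,
    but the function itself is not part of the DataSource API.
    """
    return {
        "origin_url": origin_url,
        "columns": [
            {"col_name": col_name, "dom_query": dom_query},
        ],
    }


config = single_page_config(
    "http://www.example.com/listings",  # placeholder URL (assumption)
    "apartment page",
    "article a[href*=homedetails]",
)

# Serialize with the standard library so the output is always valid JSON
payload = json.dumps(config, indent=4)
print(payload)
```

Serializing with json.dumps avoids syntax slips, such as trailing commas, that are easy to make when editing JSON by hand.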

Crawl nested pages

Useful for getting data from a single web page and from the pages linked from it.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]",
            "required_attribute": "href",
            "options": {
                "columns": [
                    {
                        "col_name": "address",
                        "dom_query": "header h1"
                    },
                    {
                        "col_name": "neighborhood",
                        "dom_query": "h2:contains('Neighborhood')",
                        "regex_pattern": "Neighborhood: (.*)",
                        "regex_group": 1
                    }
                ]
            }
        }
    ]
}
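
The "regex_pattern" and "regex_group" pair in the "neighborhood" column can be illustrated with Python's re module. This is only a sketch of the idea: it assumes the pattern is searched against the text of the matched element and that group 1 is returned, which is what the example suggests but is not confirmed here. The function name extract_group and the sample text are made up for illustration, and the service's regex engine may differ from Python's.

```python
import re


def extract_group(text, regex_pattern, regex_group):
    """Return the requested capture group, or None if the pattern does not match.

    Hypothetical helper that mimics what "regex_pattern" and "regex_group"
    appear to do in the configuration above.
    """
    match = re.search(regex_pattern, text)
    return match.group(regex_group) if match else None


# Made-up text standing in for the content of the matched h2 element
sample_text = "Neighborhood: Meadows Village"

print(extract_group(sample_text, "Neighborhood: (.*)", 1))  # prints "Meadows Village"
print(extract_group("Price: 100", "Neighborhood: (.*)", 1))  # prints "None"
```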

Crawl listed pages

Useful for getting data from a series of paginated web pages.

{
    "origin_url": "http://www.zillow.com/homes/for_sale/Meadows-Village-Las-Vegas-NV/fsba,fsbo_lt/house,townhouse_type/274486_rid/36.159152,-115.139552,36.131323,-115.185686_rect/14_zm/0_mmm/",
    "columns": [
        {
            "col_name": "apartment page",
            "dom_query": "article a[href*=homedetails]"
        }
    ],
    "next_page": {
        "dom_query": ".zsg-pagination_active+li a"
    }
}
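
Conceptually, a "next_page" rule amounts to a loop: extract the columns from the current page, look up the next-page link, and repeat until no link is found. The Python sketch below shows that loop against an in-memory stand-in for a website. It is not the service's implementation. The names crawl_paginated, fetch_page and FAKE_SITE, as well as the max_pages safety limit, are assumptions introduced for illustration.

```python
def crawl_paginated(start_url, fetch_page, max_pages=50):
    """Collect rows from start_url and from every page reachable via "next" links.

    fetch_page is a hypothetical callable returning a dict with "rows"
    (the extracted column values) and "next" (the next page URL or None).
    """
    rows = []
    seen = set()
    url = start_url
    # Stop on a missing link, a repeated URL, or the page limit
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        rows.extend(page["rows"])
        url = page.get("next")
    return rows


# In-memory stand-in for real fetching and DOM querying (assumption)
FAKE_SITE = {
    "/p1": {"rows": ["home-1", "home-2"], "next": "/p2"},
    "/p2": {"rows": ["home-3"], "next": "/p3"},
    "/p3": {"rows": ["home-4"], "next": None},
}

all_rows = crawl_paginated("/p1", FAKE_SITE.__getitem__)
print(all_rows)  # prints ['home-1', 'home-2', 'home-3', 'home-4']
```

In the configuration above, the selector ".zsg-pagination_active+li a" uses the adjacent sibling combinator (+), so it matches the link inside the list item that immediately follows the active pagination item.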