A package to simplify usage of the DeepCrawl GraphQL

These details have not been verified by PyPI

Project links

Homepage

Project description

DeepCrawl GraphQL Wrapper 1.0.0 documentation

Welcome to DeepCrawl GraphQL Wrapper’s documentation!¶

pip install deepcrawl_graphql

Authentication¶

To authenticate with the DeepCrawl API you have to first create a connection.

Connection¶

class deepcrawl_graphql.api.DeepCrawlConnection(user_key_id=None, secret=None, token=None)¶

DeepCrawlConnection class

Creates a connection instance used for sending GraphQL queries to DeepCrawl.

>>> from deepcrawl_graphql.api import DeepCrawlConnection

>>> DeepCrawlConnection("user_key_id", "secret")
DeepCrawlConnection instance

>>> DeepCrawlConnection(token="token")
DeepCrawlConnection instance

Parameters:

user_key_id (int or str) – user key, used together with secret for authentication.
secret (str) – secret key, used together with user_key_id for authentication.
token (str) – authentication token, if used ignores the user_key_id and secret

run_query(query, variables=None, var=None, **kwargs)¶ Runs a query.

>>> conn = DeepCrawlConnection("user_key_id", "secret")

There are 3 possible ways you can run a query

Using a query string. You can use the DeepCrawl explorer to construct the string https://graph-docs.deepcrawl.com/graphql/explorer

>>> query = 'query MyQuery {version}'
>>> conn.run_query(query)
{'version': '1.91.1-next.0-en'}

Using the gql package to construct a dynamic query.

>>> from gql.dsl import DSLQuery
>>> query = DSLQuery(conn.ds.Query.me.select(conn.ds.User.id, conn.ds.User.username))
>>> conn.run_query(query)
{
 'me': {
 'id': 'id',
 'username': 'email@example.com'
 }
}

>>> from gql.dsl import DSLVariableDefinitions
>>> from gql.dsl import DSLQuery
>>> var = DSLVariableDefinitions()
>>> query = conn.ds.Query.me.select(
 conn.ds.User.accounts.args(first=var.first, after=var.after)
 .select(
 conn.ds.AccountConnection.nodes.select(
 conn.ds.Account.id,
 conn.ds.Account.name,
 )
 )
 .select(
 conn.ds.AccountConnection.pageInfo.select(
 conn.ds.PageInfo.startCursor,
 conn.ds.PageInfo.endCursor,
 conn.ds.PageInfo.hasNextPage,
 conn.ds.PageInfo.hasPreviousPage,
 )
 )
 )
>>> conn.run_query(query, variables={"first": 1, "after": "MQ"}, var=var)
{
 "me": {
 "accounts": {
 "nodes": [
 {
 "id": "id",
 "name": "name"
 }
 ],
 "pageInfo": {
 "startCursor": "Mg",
 "endCursor": "Mg",
 "hasNextPage": false,
 "hasPreviousPage": true
 }
 }
 }
}
# For more information about constructing queries with dsl
# see https://gql.readthedocs.io/en/stable/advanced/dsl_module.html

Import a query from the deepcrawl_graphql package and use it’s prebuild queries.

>>> from deepcrawl_graphql.me.me import MeQuery
>>> me_query = MeQuery(conn)
>>> me_query.select_me()
>>> conn.run_query(me_query)
{
 'me': {
 'id': 'id',
 'username': 'email@example.com',
 'email': 'email@example.com',
 'firstName': 'FirstName',
 'lastName': 'LastName',
 'createdAt': '2019-10-27T17:11:17.000Z',
 'updatedAt': '2022-01-15T10:10:38.000Z',
 'jobTitle': None,
 'overallLimitLevelsMax': 1000,
 'overallLimitPagesMax': 10000000,
 'ssoClientId': None,
 'termsAgreed': True,
 'rawID': 'id',
 'permissions': []
 }
}

Parameters:

query (Query or DSLField or DSLQuery or str) – query object
variables (dict or None) – variables to use in the query. run_query(… variables={“first”: 1, “after”: “MQ”})
var (DSLVariableDefinitions) – gql DSLVariableDefinitions instance to be used in the query. Works together with variables
kwargs – variables to use in the query. If variables is not used and the user prefers to send variables as function arguments run_query(… first=1, after=”MQ”)

get_by_base64_id(base64_id, object_type, fields)¶ Get object by Base64 id

>>> conn.get_by_base64_id(
 "TjAxNFJlcG9ydERvd25sb2FkNTk2Njg1MzE",
 "ReportDownload",
 fields=(conn.ds.ReportDownload.id, conn.ds.ReportDownload.fileURL),
 )

Parameters:

base64_id (str) – Object’s Base64 id
object_type (str) – Object’s meta type name
fields (List*(DSLField)*) – Fields to select

run_mutation(mutation, variables=None, var=None, **kwargs)¶ Runs a mutation.

>>> conn = DeepCrawlConnection("user_key_id", "secret")

There are multiple ways you can run a mutation

Import the Mutation class from deepcrawl_graphql package and use it’s prebuild methods.

>>> from deepcrawl_graphql.mutation import Mutation
>>> mutation = Mutation(conn)
>>> project_input = {"accountId": "acount_id", "name": "Create-Project-0", "primaryDomain": "domain"}
>>> mutation.create_project(project_input)  # Automatically runs this method
{
 "createProject": {
 "project": {
 "id": "id",
 "name": "Create-Project-0",
 "primaryDomain": "http://python.deepcrawl.com/",
 ...
 }
 }
}

Parameters:

mutation (DSLField or DSLMutation or str) – mutation object
variables (dict or None) – variables to use in the query. run_query(… variables={“input”: {“name”: “name”, …}})
var (DSLVariableDefinitions) – gql DSLVariableDefinitions instance to be used in the query. Works tougether with variables
kwargs – variables to use in the query. If variables is not used and the user prefers to send variables as function arguments run_query(… input={“name”: “name”, …})

Special Arguments¶

Pagination¶

There are some optional arguments which can be used while running the query: first, last, after, before

first - Number of records to fetch from start
last - Number of records to fetch from end
after - Fetch after cursor
before - Fetch before cursor

>>> conn = DeepCrawlConnection("user_key_id", "secret")

By default first 100 or less objects are retrieved

>>> account_query = AccountQuery(conn, "74910")
>>> account_query.select_projects()
>>> conn.run_query(account_query)

{
    "getAccount": {
        "projects": {
            "nodes": [
                {"id": "id-1", "name": "name-1", ...},
                {"id": "id-2", "name": "name-2", ...},
                {"id": "id-3", "name": "name-3", ...},
                {"id": "id-4", "name": "name-4", ...},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "NA",
                "hasNextPage": False,
                "hasPreviousPage": False
            }
        }
    }
}

By using the first argument you can choose how many object to retrieve per page.

>>> conn.run_query(account_query, variables={"first": 2})
OR
>>> conn.run_query(account_query, first=2)

{
    "me": {
        "accounts": {
            "nodes": [
                {"id": "id-1", "name": "name-1", ...},
                {"id": "id-2", "name": "name-2", ...},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "Mg",
                "hasNextPage": True,
                "hasPreviousPage": False
            }
        }
    }
}

By using the first argument combined with the after argument you can choose how many object to retrieve per page after the selected cursor.

>>> conn.run_query(account_query, variables={"first": 2, "after": "Mg"})
OR
>>> conn.run_query(account_query, first=2, after="Mg")

{
    "me": {
        "accounts": {
            "nodes": [
                {"id": "id-3", "name": "name-3", ...},
                {"id": "id-4", "name": "name-4", ...},
            ],
            "pageInfo": {
                "startCursor": "Mw",
                "endCursor": "NA",
                "hasNextPage": False,
                "hasPreviousPage": True
            }
        }
    }
}

By using the first argument combined with the before argument you can choose how many object to retrieve per page before the selected cursor.

>>> conn.run_query(account_query, variables={"first": 2, "before": "Mg"})
OR
>>> conn.run_query(account_query, first=2, before="Mg")

{
    "me": {
        "accounts": {
            "nodes": [
                {"id": "id-1", "name": "name-1", ...},
                {"id": "id-2", "name": "name-2", ...},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "Mg",
                "hasNextPage": True,
                "hasPreviousPage": False
            }
        }
    }
}

Ordering¶

There is an optional argument which can be used while running the query: order_by

The order argument can be used to sort the dataset.

It accepts a dictionary with two keys: direction (ASC or DESC) and a field (to sort against)

>>> conn = DeepCrawlConnection("user_key_id", "secret")

>>> account_query = AccountQuery(conn, "74910")
>>> account_query.select_projects()

Default result:

>>> conn.run_query(account_query)

{
    "getAccount": {
        "projects": {
            "nodes": [
                {"id": "id-1", "name": "name-1", ...},
                {"id": "id-2", "name": "name-2", ...},
                {"id": "id-3", "name": "name-3", ...},
                {"id": "id-4", "name": "name-4", ...},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "NA",
                "hasNextPage": False,
                "hasPreviousPage": False
            }
        }
    }
}

Using order_by result:

>>> conn.run_query(account_query, order_by={"direction": "DESC", "field": "id"})

{
    "getAccount": {
        "projects": {
            "nodes": [
                {"id": "id-4", "name": "name-4", ...},
                {"id": "id-3", "name": "name-3", ...},
                {"id": "id-2", "name": "name-2", ...},
                {"id": "id-1", "name": "name-1", ...},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "NA",
                "hasNextPage": False,
                "hasPreviousPage": False
            }
        }
    }
}

Select Specific Fields¶

In some cases, you may not need all the returned fields. Maybe you are interested only in a list of project ids and names.

For this kind of scenario, you can use the fields argument implemented for all the select methods.

>>> conn = DeepCrawlConnection("user_key_id", "secret")
>>> account_query = AccountQuery(conn, "74910")
>>> account_query.select_projects(fields=(conn.ds.Project.id, conn.ds.Project.name))
>>> conn.run_query(account_query)

{
    "getAccount": {
        "projects": {
            "nodes": [
                {"id": "id-1", "name": "name-1"},
                {"id": "id-2", "name": "name-2"},
            ],
            "pageInfo": {
                "startCursor": "MQ",
                "endCursor": "NA",
                "hasNextPage": False,
                "hasPreviousPage": False
            }
        }
    }
}

Although this is very useful you have to have a pretty good understanding of the schema for using this feature.

Me¶

MeQuery¶

class deepcrawl_graphql.me.me.MeQuery(conn: DeepCrawlConnection)¶

MeQuery class

Creates a me query instance. “Me” being the authenticated user. The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.me.me import MeQuery

>>> me_query = MeQuery(conn)
>>> me_query.select_me()
>>> me_query.select_accounts()
>>> me = conn.run_query(me_query)

Parameters: conn (DeepCrawlConnection) – Connection.

select_me(fields=None)¶ Selects user fields.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_accounts(fields=None)¶ Selects users accounts.

Parameters: fields (List*(DSLField)*) – Select specific fields.

Accounts¶

AccountQuery¶

class deepcrawl_graphql.accounts.account.AccountQuery(conn: DeepCrawlConnection, account_id)¶

AccountQuery class

Creates an accout query instance. The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.accounts.account import AccountQuery

>>> account_query = AccountQuery(conn, "id")
>>> account_query.select_account()
>>> account_query.select_settings()
>>> account_query.select_callback_headers()
>>> account_query.select_feature_flags()
>>> account_query.select_locations()
>>> account_query.select_package()
>>> account_query.select_subscription()
>>> account_query.select_projects()
>>> account_query.select_project("project_id")
>>> account = conn.run_query(account_query)

Parameters:

conn (DeepCrawlConnection) – Connection.
account_id (int or str) – account id.

select_account(fields=None)¶ Selects account fields.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_settings(fields=None)¶ Selects account accountSettings.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_callback_headers(fields=None)¶ Selects account apiCallbackHeaders.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_feature_flags(fields=None)¶ Selects account featureFlags.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_locations(fields=None)¶ Selects account locations.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_package(fields=None)¶ Selects account primaryAccountPackage.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_subscription(include_addons=False, integration_type=None, fields=None)¶ Selects account subscription.

Parameters:

include_addons (bool) – If true includes the addons available.
integration_type (str) – Selects an addon by integration type
fields (List*(DSLField)*) – Select specific fields.

select_projects(fields=None)¶ Selects account projects.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_project(project_id, fields=None)¶ Selects account project by id.

Parameters:

project_id (bool) – Project id.
fields (List*(DSLField)*) – Select specific fields.

Projects¶

ProjectQuery¶

class deepcrawl_graphql.projects.project.ProjectQuery(conn: DeepCrawlConnection, project_id)¶

ProjectQuery class

Creates a project query instance. The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.projects.project import ProjectQuery

>>> project_query = ProjectQuery(conn, "project_id")
>>> project_query.select_project()
>>> project_query.select_sitemaps()
>>> project_query.select_advanced_crawl_rate()
>>> project_query.select_majestic_configuration()
>>> project_query.select_location()
>>> project_query.select_google_search_configuration()
>>> project_query.select_custom_extraction_settings()
>>> project_query.select_account()
>>> project_query.select_crawls()
>>> project = conn.run_query(project_query)

Parameters:

conn (DeepCrawlConnection) – Connection.
project_id (int or str) – project id.

select_project(fields=None)¶ Selects project fields.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_sitemaps(fields=None)¶ Selects project sitemaps.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_advanced_crawl_rate(fields=None)¶ Selects project maximumCrawlRateAdvanced.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_majestic_configuration(fields=None)¶ Selects project majesticConfiguration.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_location(fields=None)¶ Selects project location.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_last_finished_crawl(fields=None)¶ Selects project lastFinishedCrawl.

Not implemented yet.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_google_search_configuration(fields=None)¶ Selects project googleSearchConsoleConfiguration.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_google_analytics_project_view(fields=None)¶ Selects project googleAnalyticsProjectView.

Not implemented yet.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_custom_extraction_settings(fields=None)¶ Selects project customExtractions.

select_account(fields=None)¶ Selects project account.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_crawls(fields=None)¶ Selects project crawls.

Parameters: fields (List*(DSLField)*) – Select specific fields.

Crawls¶

CrawlQuery¶

class deepcrawl_graphql.crawls.crawl.CrawlQuery(conn: DeepCrawlConnection, crawl_id)¶

CrawlQuery class

Creates a crawl query instance. The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.crawls.crawl import CrawlQuery

>>> crawl_query = CrawlQuery(conn, "crawl_id")
>>> crawl_query.select_crawl()
>>> crawl_query.select_parquet_files("datasource_name")
>>> crawl_query.select_compared_to()
>>> crawl = conn.run_query(crawl_query)

Parameters:

conn (DeepCrawlConnection) – Connection.
crawl_id (int or str) – crawl id.

select_crawl(fields=None)¶ Selects crawl fields.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_parquet_files(datasource_name, fields=None)¶ Selects crawl parquetFiles.

Parameters:

datasource_name (str) – Datasource name.
fields (List*(DSLField)*) – Select specific fields.

select_crawl_type_counts(crawl_types, segment_id=None, fields=None)¶ Selects crawl fields.

Not implemented yet.

Parameters:

crawl_types (str) – Crawl type.
segment_id (int or str) – Segment id.
fields (List*(DSLField)*) – Select specific fields.

select_crawl_settings(fields=None)¶ Selects crawl crawlSetting.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_compared_to(fields=None)¶ Selects crawl comparedTo.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_reports(fields=None)¶ Selects crawl reports.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_report_downloads(fields=None)¶ Selects reports downloads.

Parameters: fields (List*(DSLField)*) – Select specific fields.

Reports¶

ReportQuery¶

class deepcrawl_graphql.reports.report.ReportQuery(conn: DeepCrawlConnection, crawl_id, report_tamplate_code, report_type_code, segment_id=None)¶

ReportQuery class

Creates a report query instance. The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.reports.report import ReportQuery

>>> report_query = ReportQuery(conn, "crawl_id", "report_tamplate_code", "report_type_code")
>>> report_query.select_report()
>>> report_query.select_datasource()
>>> report_query.select_type()
>>> report_query.select_trend()
>>> report_query.select_segment()
>>> report_query.select_report_template()
>>> conn.run_query(report_query)

Parameters:

conn (DeepCrawlConnection) – Connection.
crawl_id (int or str) – crawl id.
report_tamplate_code (str) – report template code.
report_type_code (str) – report type code.
segment_id (int or str) – segment id.

select_report(fields=None)¶ Selects report fields.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_raw_trends(fields=None)¶ Selects report rawTrends.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_datasource(fields=None)¶ Selects report datasources.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_type(fields=None)¶ Selects report type.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_trend(fields=None)¶ Selects report trend.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_segment(fields=None)¶ Selects report segment.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_report_template(fields=None)¶ Selects report reportTemplate.

Parameters: fields (List*(DSLField)*) – Select specific fields.

select_report_downloads(fields=None)¶ Selects report reportDownloads.

Parameters: fields (List*(DSLField)*) – Select specific fields.

Mutations¶

class deepcrawl_graphql.mutations.Mutation(conn: DeepCrawlConnection)¶ Mutation class

cancel_crawling(crawl_id, fields=None)¶ Cancel a running Crawl

Parameters:

crawl_id (int or str) – Crawl id
fields (List*(DSLField)*) – Select specific fields.

create_project(project_input, fields=None)¶ Creates a Project

GraphQL Input Example

It has to be converted to dict

input CreateProjectInput {
  accountId: ObjectID!
  alertEmails: [String!]
  alertSettingCode: AlertSettingCode! = Always
  apiCallbackHeaders: [APICallbackHeaderInput!]! = []
  apiCallbackUrl: String = null
  autoFinalizeOnCrawlLimits: Boolean! = false
  compareToCrawl: CompareToCrawlType! = LastCrawl
  crawlDisallowedUrls1stLevel: Boolean! = false
  crawlHyperlinksExternal: Boolean! = false
  crawlHyperlinksInternal: Boolean! = true
  crawlImagesExternal: Boolean! = false
  crawlImagesInternal: Boolean! = true
  crawlNofollowHyperlinks: Boolean! = true
  crawlNonHtml: Boolean! = false
  crawlNotIncluded1stLevel: Boolean! = false
  crawlRedirectsExternal: Boolean! = true
  crawlRedirectsInternal: Boolean! = true
  crawlRelAmphtmlExternal: Boolean! = false
  crawlRelAmphtmlInternal: Boolean! = true
  crawlRelCanonicalsExternal: Boolean! = false
  crawlRelCanonicalsInternal: Boolean! = true
  crawlRelHreflangsExternal: Boolean! = false
  crawlRelHreflangsInternal: Boolean! = true
  crawlRelMobileExternal: Boolean! = false
  crawlRelMobileInternal: Boolean! = true
  crawlRelNextPrevExternal: Boolean! = false
  crawlRelNextPrevInternal: Boolean! = true
  crawlRobotsTxtNoindex: Boolean! = true
  crawlScriptsExternal: Boolean! = false
  crawlScriptsInternal: Boolean! = true
  crawlStylesheetsExternal: Boolean! = false
  crawlStylesheetsInternal: Boolean! = true
  crawlTestSite: Boolean! = false
  crawlTypes: [CrawlType!]! = [Web]
  customDns: [CustomDnsSettingInput!]! = []
  customExtractions: [CustomExtractionSettingInput!]! = []
  customRequestHeaders: [CustomRequestHeaderInput!]! = []
  discoverSitemapsInRobotsTxt: Boolean! = true
  duplicatePrecision: Float! = 3
  emptyPageThreshold: Int! = 512
  excludeUrlPatterns: [String!]! = []
  gaDateRange: Int! = 30
  ignoreInvalidSSLCertificate: Boolean! = false
  includeHttpAndHttps: Boolean! = false
  includeSubdomains: Boolean! = false
  includeUrlPatterns: [String!]! = []
  industryCode: String
  limitLevelsMax: Int
  limitUrlsMax: Int
  locationCode: LocationCode! = Default
  logSummaryRequestsHigh: Int! = 100
  logSummaryRequestsLow: Int! = 10
  maxBodyContentLength: Int! = 51200
  maxDescriptionLength: Int! = 230
  maxFollowedExternalLinks: Int! = 10
  maxHtmlSize: Int! = 204800
  maxLinks: Int! = 250
  maxLoadTime: Float! = 3
  maxRedirections: Int! = 4
  maxTitleWidth: Int! = 600
  maxUrlLength: Int! = 1024
  maximumCrawlRate: Float! = 3
  maximumCrawlRateAdvanced: [AdvancedCrawlRateInput!]! = []
  minDescriptionLength: Int! = 50
  minTitleLength: Int! = 10
  minVisits: Int! = 2
  mobileHomepageUrl: String = null
  mobileUrlPattern: String = null
  mobileUserAgentCode: String! = "googlebot-smartphone"
  name: String!
  primaryDomain: String!
  rendererBlockAds: Boolean! = true
  rendererBlockAnalytics: Boolean! = true
  rendererBlockCustom: [String!]! = []
  rendererJsString: String = null
  rendererJsUrls: [String!]! = []
  robotsOverwrite: String = null
  secondaryDomains: [String!]! = []
  startUrls: [String!]! = []
  testSiteDomain: String = null
  testSitePassword: String = null
  testSiteUsername: String = null
  thinPageThreshold: Int! = 3072
  urlRewriteQueryParameters: [String!]! = []
  urlRewriteRules: [UrlRewriteRuleInput!]! = []
  urlRewriteStripFragment: Boolean! = true
  urlSampling: [UrlSamplingInput!]! = []
  useMobileSettings: Boolean! = false
  useRenderer: Boolean! = false
  useRobotsOverwrite: Boolean! = false
  useStealthMode: Boolean! = false
  useUrlRewriteRules: Boolean! = false
  userAgentCode: String! = "googlebot-smartphone"
  userAgentString: String = null
  userAgentStringMobile: String = null
  userAgentToken: String = null
  userAgentTokenMobile: String = null
}

Parameters:

project_input (dict) – Project input
fields (List*(DSLField)*) – Select specific fields

create_report_download(report_download_input, fields=None)¶ Creates a report download.

GraphQL Input Example

It has to be converted to dict

input CreateReportDownloadInput {
  crawlDuplicateUrlFilter: CrawlDuplicateUrlConnectionFilterInput
  crawlHreflangsFilter: CrawlHreflangsConnectionFilterInput
  crawlId: ObjectID
  crawlLinkFilter: CrawlLinkConnectionFilterInput
  crawlLinkedDomainFilter: CrawlLinkedDomainConnectionFilterInput
  crawlSitemapFilter: CrawlSitemapConnectionFilterInput
  crawlUncrawledUrlFilter: CrawlUncrawledUrlConnectionFilterInput
  crawlUniqueLinkFilter: CrawlUniqueLinkConnectionFilterInput
  crawlUrlFilter: CrawlUrlConnectionFilterInput
  crawlWebCrawlDepthFilter: CrawlWebCrawlDepthConnectionFilterInput
  fileName: String
  filter: JSONObject
  outputType: ReportDownloadOutputType! = CsvZip
  reportId: ObjectID
  reportTemplateCode: String
  reportTypeCode: ReportTypeCode
  segmentId: ObjectID
  selectedMetrics: [String!]
}

Parameters:

report_download_input (dict) – Report Download input.
fields (List*(DSLField)*) – Select specific fields.

delete_crawl(crawl_id, fields=None)¶ Deletes a Crawl

Parameters:

crawl_id (int or str) – Crawl id
fields (List*(DSLField)*) – Select specific fields.

delete_project(project_id, fields=None)¶ Deletes a Project

Parameters:

project_id (int or str) – Project id
fields (List*(DSLField)*) – Select specific fields

delete_report_download(report_download_id, fields=None)¶ Deletes a report download.

Parameters:

report_download_id (int or str) – Report Download id.
fields (List*(DSLField)*) – Select specific fields.

pause_crawling(crawl_id, fields=None)¶ Pause a running Crawl

Parameters:

crawl_id (int or str) – Crawl id
fields (List*(DSLField)*) – Select specific fields.

resume_crawling(crawl_id, fields=None)¶ Resume a Crawl

Parameters:

crawl_id (int or str) – Crawl id
fields (List*(DSLField)*) – Select specific fields.

run_crawl_for_project(project_id, fields=None)¶ Run a Crawl for a Project

Parameters:

project_id (int or str) – Project id
fields (List*(DSLField)*) – Select specific fields.

unarchive_crawl(crawl_id, fields=None)¶ Unarchive a Crawl

Parameters:

crawl_id (int or str) – Crawl id
fields (List*(DSLField)*) – Select specific fields.

update_account(account_input, fields=None)¶ Update Account.

GraphQL Input Example

It has to be converted to dict

input UpdateAccountInput {
    accountId: ObjectID!
    apiCallbackHeaders: [APICallbackHeaderInput!]
    apiCallbackUrl: String
    country: String
    customLogo: Upload
    customProxy: String
}

Parameters:

account_input (dict) – Account input
fields (List*(DSLField)*) – Select specific fields.

update_project(project_input, fields=None)¶ Updates a Project

GraphQL Input Example

It has to be converted to dict

input UpdateProjectInput {
  alertEmails: [String!]
  alertSettingCode: AlertSettingCode
  apiCallbackHeaders: [APICallbackHeaderInput!]
  apiCallbackUrl: String
  autoFinalizeOnCrawlLimits: Boolean
  compareToCrawl: CompareToCrawl
  crawlDisallowedUrls1stLevel: Boolean
  crawlHyperlinksExternal: Boolean
  crawlHyperlinksInternal: Boolean
  crawlImagesExternal: Boolean
  crawlImagesInternal: Boolean
  crawlNofollowHyperlinks: Boolean
  crawlNonHtml: Boolean
  crawlNotIncluded1stLevel: Boolean
  crawlRedirectsExternal: Boolean
  crawlRedirectsInternal: Boolean
  crawlRelAmphtmlExternal: Boolean
  crawlRelAmphtmlInternal: Boolean
  crawlRelCanonicalsExternal: Boolean
  crawlRelCanonicalsInternal: Boolean
  crawlRelHreflangsExternal: Boolean
  crawlRelHreflangsInternal: Boolean
  crawlRelMobileExternal: Boolean
  crawlRelMobileInternal: Boolean
  crawlRelNextPrevExternal: Boolean
  crawlRelNextPrevInternal: Boolean
  crawlRobotsTxtNoindex: Boolean
  crawlScriptsExternal: Boolean
  crawlScriptsInternal: Boolean
  crawlStylesheetsExternal: Boolean
  crawlStylesheetsInternal: Boolean
  crawlTestSite: Boolean
  crawlTypes: [CrawlType!]
  customDns: [CustomDnsSettingInput!]
  customExtractions: [CustomExtractionSettingInput!]
  customRequestHeaders: [CustomRequestHeaderInput!]
  discoverSitemapsInRobotsTxt: Boolean
  duplicatePrecision: Float
  emptyPageThreshold: Int
  excludeUrlPatterns: [String!]
  gaDateRange: Int
  ignoreInvalidSSLCertificate: Boolean
  includeHttpAndHttps: Boolean
  includeSubdomains: Boolean
  includeUrlPatterns: [String!]
  industryCode: String
  limitLevelsMax: Int
  limitUrlsMax: Int
  locationCode: LocationCode
  logSummaryRequestsHigh: Int
  logSummaryRequestsLow: Int
  maxBodyContentLength: Int
  maxDescriptionLength: Int
  maxFollowedExternalLinks: Int
  maxHtmlSize: Int
  maxLinks: Int
  maxLoadTime: Float
  maxRedirections: Int
  maxTitleWidth: Int
  maxUrlLength: Int
  maximumCrawlRate: Float
  maximumCrawlRateAdvanced: [AdvancedCrawlRateInput!]
  minDescriptionLength: Int
  minTitleLength: Int
  minVisits: Int
  mobileHomepageUrl: String
  mobileUrlPattern: String
  mobileUserAgentCode: String
  name: String
  primaryDomain: String
  projectId: ObjectID!
  rendererBlockAds: Boolean
  rendererBlockAnalytics: Boolean
  rendererBlockCustom: [String!]
  rendererJsString: String
  rendererJsUrls: [String!]
  robotsOverwrite: String
  secondaryDomains: [String!]
  startUrls: [String!]
  testSiteDomain: String
  testSitePassword: String
  testSiteUsername: String
  thinPageThreshold: Int
  urlRewriteQueryParameters: [String!]
  urlRewriteRules: [UrlRewriteRuleInput!]
  urlRewriteStripFragment: Boolean
  urlSampling: [UrlSamplingInput!]
  useMobileSettings: Boolean
  useRenderer: Boolean
  useRobotsOverwrite: Boolean
  useStealthMode: Boolean
  useUrlRewriteRules: Boolean
  userAgentCode: String
  userAgentString: String
  userAgentStringMobile: String
  userAgentToken: String
  userAgentTokenMobile: String
}

Parameters:

project_input (dict) – Project input
fields (List*(DSLField)*) – Select specific fields

Shortcuts¶

class deepcrawl_graphql.shortcuts.crawl_shortcuts.CrawlShortcuts(conn: DeepCrawlConnection, crawl_id)¶

Shortcuts class

Creates a shortcuts query instance The instance will be passed to the run_query method in order to execute the query.

>>> from deepcrawl_graphql.shortcuts.crawl_shortcuts import CrawlShortcuts

>>> query = CrawlShortcuts(conn, "crawl_id")
>>> crawl_query.select_explorer_data("path1", report_filters)

Parameters:

conn (DeepCrawlConnection) – Connection.
crawl_id (int or str) – crawl id.

select_explorer_data(path, report_filters, agregate_filters=None, paginate=True)¶ Select Explorer data

Parameters:

path (str) – Path. One of the paths from one to 9 e.g. path1
report_filters (dict) – Report filters. reportTemplateCode and reportTypeCode are mandatory
agregate_filters (dict) – Filters applied to the aggregation.
paginate (bool) – Include pagination or not.

Indices and tables¶

Index
Module Index
Search Page

DeepCrawl GraphQL Wrapper

Navigation

Pagination
Ordering
Select Specific Fields
MeQuery
AccountQuery
ProjectQuery
CrawlQuery
ReportQuery
Mutations
Shortcuts

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.2.13

Feb 7, 2024

0.2.12

Jan 16, 2024

0.2.11

Jan 16, 2024

0.2.10

Dec 7, 2023

0.2.9

Dec 7, 2023

0.2.8

Dec 6, 2023

0.2.7

Dec 5, 2023

0.2.6

Sep 13, 2023

0.2.5

Aug 27, 2023

This version

0.2.4

Aug 15, 2023

0.2.3

Oct 30, 2022

0.2.2

Sep 17, 2022

0.2.1

Aug 11, 2022

0.2.0

Jul 18, 2022

0.1.4

Jul 10, 2022

0.1.3

Jul 9, 2022

0.1.2

Jun 28, 2022

0.1.1

Jun 28, 2022

0.1.0

Jun 27, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepcrawl_graphql-0.2.4.tar.gz (81.7 kB view details)

Uploaded Aug 15, 2023 Source

File details

Details for the file deepcrawl_graphql-0.2.4.tar.gz.

File metadata

Download URL: deepcrawl_graphql-0.2.4.tar.gz
Upload date: Aug 15, 2023
Size: 81.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.11.4 pkginfo/1.8.3 requests/2.28.0 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.8.13

File hashes

Hashes for deepcrawl_graphql-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`5b6524838009631f779619e11e5c76a4c247794dac013c3a6d754edcd69539d7`
MD5	`20537fe8684629b14d47fa0e1f5f8977`
BLAKE2b-256	`ecad9de6d8de5ba3f297f5e539471331f7670525fa8686cd7025c9e08e0e8030`