CrawlLogAnalyzer¶
Analyze crawl logs to identify trends.
Source code in seotools/logs.py, lines 6-172.
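The analyzer is constructed with the path to an access log (see the examples below). As a rough illustration of the kind of parsing such an analyzer performs — this is a minimal sketch with made-up sample lines, not the library's actual implementation — combined-log-format entries can be split with a regular expression:

```python
import re

# Hypothetical sample lines in common/combined log format (not from the library).
LOG = """\
66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120
66.249.66.1 - - [11/Oct/2023:09:12:01 +0000] "GET /about/ HTTP/1.1" 200 2048
"""

# One capture group per field of interest: client IP, timestamp, request line.
PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]+)"')

def parse(log_text):
    """Yield (ip, timestamp, request) tuples for each parseable line."""
    for line in log_text.splitlines():
        m = PATTERN.match(line)
        if m:
            yield m.groups()

records = list(parse(LOG))
print(records[0][2])  # → GET /blog/ HTTP/1.1
```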
crawl_frequency_aggregate(url=None, path=None)¶
Find the number of times a URL has been crawled by date.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | The URL to analyze. | `None` |
| `path` | `str` | The path to analyze. | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | A dictionary of dates and the number of times the URL was crawled on that date. |
Example

```python
from seotools.logs import CrawlLogAnalyzer

analyzer = CrawlLogAnalyzer("access.log")
analyzer.crawl_frequency_aggregate(url="...")
```
Source code in seotools/logs.py, lines 125-172.
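Conceptually, this method groups matching log entries by date and counts them. A minimal stdlib sketch of that aggregation — the dates and URLs here are hypothetical, and this is not the library's actual code:

```python
from collections import Counter

# Hypothetical (date, url) pairs extracted from a parsed access log.
entries = [
    ("2023-10-10", "/blog/"),
    ("2023-10-10", "/blog/"),
    ("2023-10-11", "/blog/"),
    ("2023-10-11", "/about/"),
]

def crawl_frequency_aggregate(entries, url):
    """Count how many times `url` was crawled on each date."""
    return dict(Counter(date for date, u in entries if u == url))

print(crawl_frequency_aggregate(entries, "/blog/"))
# → {'2023-10-10': 2, '2023-10-11': 1}
```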
crawl_frequency_by_url(url)¶
Find the number of times a URL has been crawled.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | The URL to analyze. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The number of times the URL was crawled. |
Source code in seotools/logs.py, lines 78-89.
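The behavior amounts to counting how often one URL occurs among the logged requests. A minimal sketch under that assumption, using made-up data rather than the library's internals:

```python
# Hypothetical request paths pulled from a parsed access log.
requests = ["/blog/", "/about/", "/blog/", "/blog/"]

def crawl_frequency_by_url(requests, url):
    """Return the total number of times `url` appears in the log."""
    return requests.count(url)

print(crawl_frequency_by_url(requests, "/blog/"))  # → 3
```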
get_count(col)¶
Count the number of times a value appears in a column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `col` | `str` | The column to analyze. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | A dictionary of values and the number of times they appear in the column. |
Source code in seotools/logs.py, lines 64-76.
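This is a value-frequency count over one column of the parsed log. A minimal sketch of the idea — the row structure and column names here are assumptions for illustration, not the library's representation:

```python
from collections import Counter

# Hypothetical parsed log rows; each dict is one request.
rows = [
    {"request": "/blog/", "status": "200"},
    {"request": "/about/", "status": "404"},
    {"request": "/blog/", "status": "200"},
]

def get_count(rows, col):
    """Map each value in `col` to the number of rows containing it."""
    return dict(Counter(row[col] for row in rows))

print(get_count(rows, "status"))  # → {'200': 2, '404': 1}
```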
get_top_urls(n=10)¶
Find the top n most crawled URLs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The number of URLs to return. | `10` |
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | A dictionary of URLs and the number of times they were crawled. |
Source code in seotools/logs.py, lines 113-123.
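Selecting the n most-crawled URLs is a classic frequency-ranking problem. A minimal sketch with hypothetical data (not the library's implementation), using `collections.Counter.most_common`:

```python
from collections import Counter

# Hypothetical request paths from a parsed log.
requests = ["/a", "/b", "/a", "/c", "/a", "/b"]

def get_top_urls(requests, n=10):
    """Return the n most-crawled URLs and their hit counts."""
    return dict(Counter(requests).most_common(n))

print(get_top_urls(requests, n=2))  # → {'/a': 3, '/b': 2}
```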
get_unique(col)¶
Get the unique values in a column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `col` | `str` | The column to analyze. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `list` | `list` | A list of unique values in the column. |
Example

```python
from seotools.logs import CrawlLogAnalyzer

analyzer = CrawlLogAnalyzer("access.log")
analyzer.get_unique("request")
```
Source code in seotools/logs.py, lines 43-62.
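Deduplicating a column reduces to collecting its distinct values. A minimal sketch under assumed row/column names (the library's internal representation may differ); `dict.fromkeys` keeps first-seen order, unlike a plain `set`:

```python
# Hypothetical parsed log rows; each dict is one request.
rows = [
    {"request": "/blog/"},
    {"request": "/about/"},
    {"request": "/blog/"},
]

def get_unique(rows, col):
    """Return the distinct values of `col`, preserving first-seen order."""
    return list(dict.fromkeys(row[col] for row in rows))

print(get_unique(rows, "request"))  # → ['/blog/', '/about/']
```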