Twarc-Cloud commandline¶
General¶
Help¶
For any command or subcommand, -h will provide additional help.
Bucket¶
By default, the bucket is specified in twarc_cloud.ini. For many commands, it can be overridden with --bucket.
Collection configuration commands¶
Collection configuration commands are for creating and updating collection configuration files.
By default, collection.json is the collection configuration file. For many commands, it can be overridden with --collection-config-filepath.
Create a template¶
$ python3 twarc_cloud.py collection-config template filter
Template written to collection.json.
$ cat collection.json
{
    "id": "<Identifier for collection. Should not have spaces. Must be unique for bucket.>",
    "keys": {
        "consumer_key": "<Your Twitter API consumer key>",
        "consumer_secret": "<Your Twitter API consumer secret>",
        "access_token": "<Your Twitter API access token>",
        "access_token_secret": "<Your Twitter API access token secret>"
    },
    "type": "filter",
    "filter": {
        "track": "<Comma separated list of terms or hashtags>",
        "follow": "<Comma separated list of user ids>",
        "max_records": "<Optional. Maximum number of records to collect per harvest.>"
    }
}
You can now fill in the template or use other collection configuration commands to populate it.
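Before adding or updating a collection, it can help to confirm that no template placeholders remain. The following is a minimal sketch, not part of twarc-cloud: the helper name `find_placeholders` is hypothetical, and it assumes that any string value starting with `<` is an unfilled placeholder, as in the template shown earlier.

```python
import json


def find_placeholders(config, path=()):
    """Return key paths whose values still look like template placeholders.

    Assumption: a placeholder is any string value that starts with "<".
    """
    unfilled = []
    for key, value in config.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Recurse into nested sections such as "keys" and "filter".
            unfilled.extend(find_placeholders(value, here))
        elif isinstance(value, str) and value.startswith("<"):
            unfilled.append(" -> ".join(here))
    return unfilled


# Illustrative, partially filled configuration (values are made up).
example = json.loads("""
{
    "id": "test_filter",
    "keys": {"consumer_key": "<Your Twitter API consumer key>"},
    "type": "filter",
    "filter": {"track": "#example"}
}
""")
print(find_placeholders(example))
```

To check a real file, load it with `json.load(open("collection.json"))` and pass the result to the same function.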
Get the latest collection configuration file¶
To download the latest collection configuration file for an existing collection:
$ python twarc_cloud.py collection-config download test_collection
Downloaded to collection.json.
Add Twitter API keys¶
$ python twarc_cloud.py collection-config keys
Added keys to collection.json.
Add users¶
To add users by screen names provided on the commandline:
$ python twarc_cloud.py collection-config screennames @justin_littman @jack @not_justin_littman
Getting users ids for screen names. This may take some time ...
Added screen names to collection.json.
Following screen names where not found:
not_justin_littman
To add users by screen names from files:
$ python twarc_cloud.py collection-config screenname-files screennames.txt
Getting users ids for screen names. This may take some time ...
Added screen names to collection.json.
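The format of the screen name file is not described here. As a sketch only, the following assumes one screen name per line and prepares such a file by stripping whitespace, a leading `@`, blank lines, and duplicates (Twitter screen names are case-insensitive). The helper names are hypothetical and not part of twarc-cloud.

```python
def normalize_screen_names(lines):
    """Strip whitespace and a leading '@'; drop blanks and duplicates.

    Duplicates are detected case-insensitively; the first spelling wins.
    """
    seen = set()
    names = []
    for line in lines:
        name = line.strip().lstrip("@")
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        names.append(name)
    return names


def write_screen_names(path, lines):
    """Write one normalized screen name per line (assumed file format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for name in normalize_screen_names(lines):
            handle.write(name + "\n")
```

For example, `write_screen_names("screennames.txt", ["@justin_littman", "@jack"])` would produce a two-line file.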
To add users by user ids provided on the commandline:
$ python twarc_cloud.py collection-config userids 481186914
Added user ids to collection.json.
To add users by user ids from files:
$ python twarc_cloud.py collection-config userid-filenames userids.txt
Added user ids to collection.json.
Update¶
$ python twarc_cloud.py collection-config update
Collection configuration updated.
Updating the collection configuration file creates a changeset file and copies both to your S3 bucket.
List changes¶
$ python twarc_cloud.py collection-config changes test_collection
credentials -> consumer_key changed from None to mBbq9ruEckIngQztUir8Kn0 on 2019-03-09T15:33:04.577744
credentials -> consumer_secret changed from None to Pf28yReBUD9fpLVOsb4r5idZnKQ6xlOomBAjDfs5npFEQ6Rm on 2019-03-09T15:33:04.577744
credentials -> access_token changed from None to 4811346914-5yIyfryJqfscH4dV29YVLOIzjseVsYuRzCLmwO6 on 2019-03-09T15:33:04.577744
credentials -> access_token_secret changed from None to S51yYftbEsgdf4WMKMGendxbZO014Zvmv38Tfvc on 2019-03-09T15:33:04.577744
users -> 481186914 -> screen_name changed from None to justin_littman on 2019-03-09T15:33:26.730416
keys -> consumer_key changed from None to mBbq9ruEckIngQztTHUir8Kn0 on 2019-03-10T02:51:34.267589
keys -> consumer_secret changed from None to Pf28yReBUD9Xz0pLVOsb4r5idZnKCKQ6xlOomBAjD5npFEQ6Rm on 2019-03-10T02:51:34.267589
keys -> access_token changed from None to 481186914-5yIyfryJqcH4dV29YVL37BOIzjseVsYuRzCLmwO6 on 2019-03-10T02:51:34.267589
keys -> access_token_secret changed from None to S51yY5HjfftbEs4WMKMgvGendxbZVsZO014Zvmv38Tfvc on 2019-03-10T02:51:34.267589
users -> 12 -> screen_name changed from None to jack on 2019-03-10T02:51:34.267589
The changes are derived from the changeset files that are created whenever a change is made to a collection configuration file.
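If the change listing needs to be processed further, each line can be split into its parts. The sketch below is an assumption based only on the sample output above (`<path> changed from <old> to <new> on <timestamp>`); the function name `parse_change` is hypothetical, and values that themselves contain " to " or " on " would be ambiguous.

```python
import re
from datetime import datetime

# Pattern inferred from the sample output; not taken from twarc-cloud itself.
CHANGE_RE = re.compile(
    r"^(?P<path>.+?) changed from (?P<old>.*?) to (?P<new>.*) on (?P<when>\S+)$"
)


def parse_change(line):
    """Parse one change line into a dict, or return None if it does not match."""
    match = CHANGE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "path": match.group("path").split(" -> "),
        "old": match.group("old"),
        "new": match.group("new"),
        "when": datetime.fromisoformat(match.group("when")),
    }


change = parse_change(
    "users -> 12 -> screen_name changed from None to jack "
    "on 2019-03-10T02:51:34.267589"
)
print(change["path"], change["new"], change["when"].date())
```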
Collection commands¶
Collection commands are for managing collections.
List collections¶
$ python3 twarc_cloud.py collection list
Collections:
candidates_for_congress
mueller
Add a collection¶
$ python3 twarc_cloud.py collection add
Collection added.
Don't forget to start or schedule the collection.
The default collection configuration file is collection.json. When added, it is copied to your S3 bucket.
Schedule, run once, and stop user timeline and search collections¶
Before running, a collection must be added.
To run once:
$ python3 twarc_cloud.py collection once test_collection
Started
To schedule:
$ python3 twarc_cloud.py collection schedule test_collection "rate(7 days)"
Scheduled
The schedule can be specified using a rate or cron expression.
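The expression syntax is not spelled out here. Assuming it follows AWS schedule expression rules (an assumption, suggested by the use of S3 and the `rate(7 days)` form), a rate expression takes a positive integer and a unit of minute(s), hour(s), or day(s), with the singular unit required for a value of 1. The following hypothetical helper sketches that check before a schedule is submitted; it does not cover cron expressions.

```python
import re

# Assumed AWS-style rate syntax: rate(<value> <unit>).
RATE_RE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def is_valid_rate(expression):
    """Return True if the expression looks like a valid AWS-style rate."""
    match = RATE_RE.match(expression)
    if match is None:
        return False
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        return False
    # Singular unit for 1, plural unit for anything greater.
    return unit.endswith("s") == (value > 1)


print(is_valid_rate("rate(7 days)"))
```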
To stop a scheduled collection:
$ python3 twarc_cloud.py collection stop test_collection
Stopped
And to list scheduled collections:
$ python3 twarc_cloud.py collection scheduled
twarc-cloud2_test_collection_schedule => rate(7 days)
Start and stop filter collections¶
Before starting, a collection must be added.
To start:
$ python3 twarc_cloud.py collection timeline-start test_filter
Started
To stop:
$ python3 twarc_cloud.py collection timeline-stop test_filter
Stopping ...
Stopped
Stopping a filter collection may take a few minutes.
Download a collection¶
$ python3 twarc_cloud.py collection download test_collection
Collection downloaded to download/twarc-cloud2/collections/test_collection
Files that have already been downloaded will be skipped unless --clean is provided.
Harvest commands¶
List running harvests¶
$ python twarc_cloud.py harvest list
mueller => Bucket: twarc-cloud2. Status: RUNNING
Get info on a running harvest¶
$ python3 twarc_cloud.py harvest running mueller
mueller => Bucket: twarc-cloud2. Harvest timestamp: 2019-03-10T02:57:27.196194. Tweets: 1252. Files: 2 (15MB)
Get info on the last harvest¶
$ python3 twarc_cloud.py harvest last test_collection
test_collection => Bucket: twarc-cloud2. Harvest timestamp: 2019-03-09T15:35:07.464791. Tweets: 2,140. Files: 1 (855K)
No user changes.
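For monitoring scripts, the harvest summary line can be turned into structured data. The sketch below is inferred solely from the sample output above and may not match every line the tool prints; the name `parse_harvest` is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample "harvest running" / "harvest last" output.
HARVEST_RE = re.compile(
    r"^(?P<collection>\S+) => Bucket: (?P<bucket>\S+?)\. "
    r"Harvest timestamp: (?P<timestamp>\S+?)\. "
    r"Tweets: (?P<tweets>[\d,]+)\. "
    r"Files: (?P<files>\d+) \((?P<size>[^)]+)\)$"
)


def parse_harvest(line):
    """Parse a harvest summary line into a dict, or return None if no match."""
    match = HARVEST_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["timestamp"] = datetime.fromisoformat(info["timestamp"])
    # Tweet counts may contain thousands separators, e.g. "2,140".
    info["tweets"] = int(info["tweets"].replace(",", ""))
    info["files"] = int(info["files"])
    return info


info = parse_harvest(
    "test_collection => Bucket: twarc-cloud2. "
    "Harvest timestamp: 2019-03-09T15:35:07.464791. "
    "Tweets: 2,140. Files: 1 (855K)"
)
print(info["collection"], info["tweets"], info["size"])
```

Lines in other formats, such as the `Status: RUNNING` line from `harvest list`, do not match and yield `None`.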