# Customs Creators Collective archive tool
The Customs Creators Collective archive tool allows for easy scraping to a local JSON database and downloading of
files from the C3 (Customs Creators Collective) database, a collection of custom songs for Rock Band and similar
clone games.

This tool exists because the C3DB is very hard to mass download from: each song must be found in the extensive
list, selected manually, and a second link clicked through, before a file with a random name is obtained. This
tool simplifies the process by first collecting information about all available songs of a particular type, and
then downloading songs based on customizable filters (e.g. by genre, artist, author, etc.) and saving them in
a standardized format.

To use the tool, first use the `database` command to build or modify your local JSON database, then use the
`download` command to download songs.

## Installation
### `pip`
1. Use `pip3 install .` to install the package to a virtualenv or your system Python. The tool will be available
as `c3dbdl` in your shell.
### Manual
1. Install the Python3 requirements from `requirements.txt`.
1. Copy the file `c3dbdl/c3dbdl.py` to somewhere in your `$PATH`. You can optionally remove the `.py` if you
wish for command compatibility with a `pip` installation.
## Usage
Before running any command, use the built-in help via the `-h`/`--help` option to view the options available
for that command. This option is available everywhere by virtue of the Click tool, so use it frequently to get
a comprehensive understanding of all available options and how they work.

The general process of using `c3dbdl` is as follows:

1. Select a download location, and specify it either with the `-d`/`--download-directory` option or via the
environment variable `C3DBDL_DOWNLOAD_DIRECTORY`.
1. Select a base URL. Use this to limit the database to the game(s) you want, or use the default to fetch all
available songs for all games. Specify it either with the `-u`/`--base-url` option or via the
environment variable `C3DBDL_BASE_URL`.
1. Initialize your C3DB JSON database with `c3dbdl [options] database build`. This will take a fair amount
of time to complete, as all pages of the chosen base URL and all song pages (30,000+) are scanned. Note that if
you cancel this process, no data will be saved, so let it complete! The default concurrency setting should make
this relatively quick, but YMMV.
1. Download any song(s) you want with `c3dbdl [options] download [options]`.

## Database & Included Data
The database is contained in a JSON document which lists all songs that were scraped from the C3DB pages during
the `database build` step.

To build the database, the specified base URL is first downloaded to get a list of pages, and then each page
is iterated through. Within each page, all "song" table entries are extracted for information, and the song
page itself is visited to obtain a full list of download links. The song iteration is performed in parallel,
with a default of 10 simultaneous jobs (configurable with `-c`/`--concurrency`), to speed up scanning.

Once all pages and songs have been scanned, the results are saved into the specified database file, which can
then be reused for future downloads. Note that cancelling a `database build` before it is finished will result
in an empty database, and the process will have to be started again from the beginning.

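
The parallel song scan can be pictured as a simple thread pool. This is a minimal sketch rather than the tool's
actual code: `scan_song` is a hypothetical stand-in for fetching and parsing one song page, the page names are
placeholders, and the use of `concurrent.futures` is an assumption. Only the default of 10 jobs comes from the
description above.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_song(song_link):
    # Hypothetical stand-in: a real scan would fetch the song page and
    # extract its download links. Here we only return a placeholder entry.
    return {"song_link": song_link, "dl_links": []}


def scan_all(song_links, concurrency=10):
    # Run up to `concurrency` scans at once; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(scan_song, song_links))


entries = scan_all(["page-a", "page-b", "page-c"])
print(len(entries))
```

Because results are only collected once every job has returned, a sketch like this also shows why an interrupted
run leaves nothing behind to save.
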
A database file cannot be updated; it must be replaced wholesale. You can, however, interactively edit your local
database with the `database edit` command should you choose to do so (for instance, to normalize album names
or similar).

The contents of the database include all information required for filtering and downloading as described below.
An example entry (first entry on the first page) is:
```
{
    "artist": "Heatwave",
    "title": "Boogie Nights",
    "album": "Too Hot to Handle",
    "song_link": "https://db.c3universe.com/song/-34018",
    "genre": "Pop-Rock",
    "year": "1976",
    "length": "0:05",
    "author": "D97",
    "dl_links": [
        {
            "link": "https://dl.c3universe.com/642d6ab2aa5b87.10964554",
            "description": "Rock Band 3 Xbox 360"
        }
    ]
}
```
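
Because the database is plain JSON, it can be inspected with a few lines of Python. This sketch assumes the top
level of the file is a list of entries shaped like the one above; that layout is an assumption, and `load_songs`
and `describe` are illustrative helper names, not part of `c3dbdl`.

```python
import json


def load_songs(path):
    # Assumption: the file holds a JSON list of song entries.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(song):
    # Summarize one entry using the keys shown in the example entry.
    downloads = ", ".join(link["description"] for link in song["dl_links"])
    return f'{song["artist"]} - {song["title"]} ({song["year"]}): {downloads}'


example = {
    "artist": "Heatwave",
    "title": "Boogie Nights",
    "year": "1976",
    "dl_links": [
        {
            "link": "https://dl.c3universe.com/642d6ab2aa5b87.10964554",
            "description": "Rock Band 3 Xbox 360",
        }
    ],
}
print(describe(example))
```
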
### Download Links
The `c3dbdl` tool is very picky about the download links (`dl_links`) it selects. Specifically, it will *only*
include links from `c3universe.com`, and not any other external "download sites" such as Mega.nz, Angelfire,
etc.

This is done because the non-interactive, command-based download method is not compatible with those sites, and
we want this tool to be as automated as possible. Requiring some manual clickthrough of a web page would defeat
the purpose here, so we simply exclude them and require that you download any such songs manually.

If a song ends up with no `dl_links` during scanning, for instance because they all pointed to such external
"download sites", it will not be included in the database. Thus, the final number of songs in your database
will almost certainly be smaller than the total number listed on the C3DB website.

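
The link selection rule can be sketched as a hostname check. This is an illustration under assumptions, not the
tool's real implementation: `is_c3_link` and `filter_links` are hypothetical helpers, and the Mega.nz URL below
is a made-up placeholder.

```python
from urllib.parse import urlparse

ALLOWED_DOMAIN = "c3universe.com"


def is_c3_link(url):
    # Accept the bare domain and any subdomain such as dl.c3universe.com.
    host = urlparse(url).hostname or ""
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)


def filter_links(dl_links):
    # Keep only C3-hosted links; a song left with an empty list would be
    # dropped from the database entirely.
    return [link for link in dl_links if is_c3_link(link["link"])]


links = [
    {"link": "https://dl.c3universe.com/642d6ab2aa5b87.10964554",
     "description": "Rock Band 3 Xbox 360"},
    {"link": "https://mega.nz/hypothetical-file", "description": "External mirror"},
]
print(len(filter_links(links)))
```
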
## Searching & Downloading
Once a database has been built, you can start searching for and downloading songs.

To search for songs, use the `search` command. This command takes `--filter` arguments in order to show what
song(s) would be downloaded by a given filter, along with their basic information, without actually triggering
a download. Once you have a valid filter from a search, you can use it to `download` precisely the song(s) you
want.

See the following sections for more details on the specifics of the filters and output formatting of the
`search` and `download` commands.

By default, when downloading a given song, all possible download links (`dl_links`) will be downloaded; this
can be limited by using the `-i`/`--download-id` and `-d`/`--download-descr` options to pick and choose specific
files. A specific example use case would be to specify `--download-descr 360` to only download Xbox 360 RBCONs.

Once a song has been downloaded, assuming that the file structure doesn't change, subsequent `download` runs will
not overwrite it and will simply skip downloading the file.

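
The description-based selection and the skip-if-present behaviour can be sketched as follows. Both helpers are
hypothetical, and treating the description filter as a case-insensitive substring match is an assumption; check
`c3dbdl download --help` for the authoritative behaviour.

```python
import os


def select_links(dl_links, descr=None):
    # Assumption: the description filter is a case-insensitive substring match.
    if descr is None:
        return list(dl_links)
    return [link for link in dl_links if descr.lower() in link["description"].lower()]


def needs_download(path):
    # A file that already exists at the target path is skipped, not overwritten.
    return not os.path.exists(path)


available = [
    {"description": "Rock Band 3 Xbox 360"},
    {"description": "Rock Band 3 Wii"},
    {"description": "Rock Band 3 PS3"},
    {"description": "Phase Shift"},
    {"description": "Rock Band 3 Xbox 360 (Alternate Version)"},
]
for link in select_links(available, "360"):
    print(link["description"])
```
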
### Filtering
Filtering the songs in the database is a key part of this tool. You might want to grab only songs with certain
genres, artists, instruments, etc. or by certain authors, to make your custom song packs.

If multiple filters are specified, they are treated as a logical AND, i.e. *all* of the given filters must apply
to a song for it to be matched.

Filtering is always done during the search/download stage; the JSON database will always contain all possible
entries from the C3DB.

#### Information Filters
`c3dbdl` is able to filter songs by their general information in several key categories:

* `genre`: The genre of the song.
* `artist`: The artist of the song.
* `album`: The album of the song.
* `title`: The title of the song.
* `year`: The year of the album/song.
* `author`: The author of the file on C3DB.

To use information filters, append one or more `--filter` options to your `c3dbdl search` or `download` command. An
information filter option begins with the literal `--filter`, followed by the field (e.g. `genre` or `artist`), then
finally the text value to filter on, for instance `Rock` or `Santana` or `2012`. The text must be quoted if it
contains any whitespace.

Information filter values are fuzzy: they are case-insensitive and are matched as substrings using the `in`
construct. So, for example, the filter `--filter title "edmund fitzgerald"` would match the song title
"The Wreck of the Edmund Fitzgerald".

For example, to find all songs by Rush from the album Vapor Trails (the remixed version) authored by ejthedj:
```
c3dbdl search --filter artist Rush --filter album "Vapor Trails [Remixed]" --filter author ejthedj
Found 19563 songs from JSON database file 'Downloads/c3db.json'
Found 1 matching songs:
> Song: "Rush - Sweet Miracle" from "Vapor Trails [Remixed] (2002)" by ejthedj
Instruments: guitar [2], bass [3], drums [4], vocals [4], keys [None]
Available downloads:
* Rock Band 3 Xbox 360
```
In this case, one song matched; applying the same filter to a `download` would thus download only the single song.
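
The matching rules for information filters (case-insensitive substring tests, combined with a logical AND) can be
sketched in a few lines. `matches` is a hypothetical helper written for illustration and is not taken from the
tool's source.

```python
def matches(song, filters):
    # Every (field, value) pair must match: a logical AND across filters.
    # Each test is a case-insensitive substring check using `in`.
    return all(
        value.lower() in str(song.get(field, "")).lower()
        for field, value in filters
    )


song = {
    "artist": "Rush",
    "album": "Vapor Trails [Remixed]",
    "title": "Sweet Miracle",
    "year": "2002",
    "author": "ejthedj",
}
print(matches(song, [("artist", "rush"), ("album", "vapor trails")]))
print(matches(song, [("artist", "rush"), ("year", "1978")]))
```

The first call matches because both filters apply; the second does not, because the year filter fails even
though the artist filter succeeds.
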
#### Instrument Filters
In addition to the information filters, `c3dbdl` can also filter by available instrument parts. There are 5 valid
instruments that can be filtered on:

* `guitar`
* `bass`
* `drums`
* `vocals`
* `keys`

To use instrument filters, append one or more `--filter instrument <instrument>` options to your `c3dbdl search` or
`download` command, where `<instrument>` is the instrument you wish to filter on.

If a song contains a part for the instrument at any difficulty (from 0-6), it will match the filter; if the
instrument part is missing, it will not match.

You can also invert the match by adding `no-` to the instrument name. So `--filter instrument no-keys` would
only match songs *without* a keys part.

For example, to find all songs by Rush which have a keys part but no vocal part:
```
c3dbdl search --filter artist Rush --filter instrument keys --filter instrument no-vocals
Found 19562 songs from JSON database file 'Downloads/c3db.json'
Found 1 matching songs:
> Song: "Rush - La Villa Strangiato" from "Hemispheres (1978)" by DoNotPassGo
Instruments: guitar [6], bass [5], drums [6], vocals [None], keys [1]
Available downloads:
* Rock Band 3 Xbox 360
* Rock Band 3 Wii
* Rock Band 3 PS3
* Phase Shift
* Rock Band 3 Xbox 360 (Alternate Version)
```
In this case, one song matched; applying the same filter to a `download` would thus download only the single song.
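
The instrument rule, including the `no-` inversion, can be sketched as below. The database key that holds the
instrument parts is not shown in this document, so the mapping of instrument name to difficulty (or `None` when
the part is absent) is an assumption, as is the helper name `matches_instrument`.

```python
def matches_instrument(instruments, wanted):
    # `instruments` maps a part name to its difficulty, or None when absent.
    # A difficulty of 0 still counts as present, so compare against None.
    if wanted.startswith("no-"):
        return instruments.get(wanted[3:]) is None
    return instruments.get(wanted) is not None


parts = {"guitar": 6, "bass": 5, "drums": 6, "vocals": None, "keys": 1}
wanted = ["keys", "no-vocals"]
print(all(matches_instrument(parts, name) for name in wanted))
```
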
### Output Format
When downloading files, it may be advantageous to customize the output directory and filename structure to better
match what you plan to do with the files. For instance, for pure organization you might want nicely laid out
files with clear directory structures and names, while for Onyx packaging you might want everything in a flat
directory.

`c3dbdl` provides complete flexibility in the output file format. When downloading, use the `--file-structure`
option to set the file structure. This value is an interpolated string containing one or more field variables,
which are mapped at download time. The available fields are:

* `genre`: The genre of the song.
* `artist`: The artist of the song.
* `album`: The album of the song.
* `title`: The title of the song.
* `year`: The year of the album/song.
* `author`: The author of the file on C3DB.
* `orig_name`: The original filename that would be downloaded by e.g. a browser.

The default structure leverages most of these fields to create an archive-ready structure as follows:
```
{artist}/{album}/{title}.{author}.{orig_name}
```
As an example, the song found in the Information Filters section above would be saved as:
```
Rush/Vapor Trails [Remixed]/Sweet Miracle.ejthedj.sweetMiracle
```
The genre is excluded because in my experience it is a fairly useless metric and is often incorrectly set,
so it gets in the way more often than not. You are of course free to add it to your own custom structure.
The year is excluded for similar reasons, and because if you know the album, you know the year.

If any field is missing during download, it is replaced with "None".

Note that any missing parent directories will be created automatically, down the whole tree to the final filename.

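
The field interpolation can be sketched with Python's `str.format`. This is an illustration only: whether the
tool itself uses `str.format` is an assumption, and `build_path` and `prepare` are hypothetical helper names.

```python
import os

DEFAULT_STRUCTURE = "{artist}/{album}/{title}.{author}.{orig_name}"
FIELDS = ("genre", "artist", "album", "title", "year", "author", "orig_name")


def build_path(download_dir, structure, song):
    # Fill every known field; a missing field becomes the text "None".
    values = {name: str(song.get(name)) for name in FIELDS}
    return os.path.join(download_dir, structure.format(**values))


def prepare(path):
    # Create any missing parent directories before the file is written.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


song = {
    "artist": "Rush",
    "album": "Vapor Trails [Remixed]",
    "title": "Sweet Miracle",
    "author": "ejthedj",
    "orig_name": "sweetMiracle",
}
print(build_path("Downloads", DEFAULT_STRUCTURE, song))
```

A flat layout, such as one might want for Onyx packaging, would simply use a structure with no `/` in it, for
example `{artist} - {title}.{orig_name}`.
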
## Help
This is a quick and dirty tool I wrote to quickly grab collections of songs. I provide no guarantee of success
when using this tool. If you have issues, please open an issue on this repository and provide *full details*
of your problem.