Catalog

Data Models

All types of catalog items inherit from Item, which is stored as a multi-table Django model. One Item may have multiple ExternalResources, each representing one page on an external site.

classDiagram
    class Item {
        <<abstract>>
    }
    Item <|-- Album
    class Album {
        +String barcode
        +String Douban_ID
        +String Spotify_ID
    }
    Item <|-- Game
    class Game {
        +String Steam_ID
    }
    Item <|-- Podcast
    class Podcast {
        +String feed_url
        +String Apple_ID
    }
    Item <|-- Performance
    Item <|-- Work
    class Work {
        +String Douban_Work_ID
        +String Goodreads_Work_ID
    }
    Item <|-- Edition
    Item <|-- Series

    Series *-- Work
    Work *-- Edition

    class Series {
        +String Goodreads_Series_ID
    }
    class Work {
        +String Douban_ID
        +String Goodreads_ID
    }
    class Edition{
        +String ISBN
        +String Douban_ID
        +String Goodreads_ID
        +String GoogleBooks_ID
    }

    Item <|-- Movie
    Item <|-- TVShow
    Item <|-- TVSeason
    Item <|-- TVEpisode
    TVShow *-- TVSeason
    TVSeason *-- TVEpisode

    class TVShow{
        +String IMDB_ID
        +String TMDB_ID
    }
    class TVSeason{
        +String Douban_ID
        +String TMDB_ID
    }
    class TVEpisode{
        +String IMDB_ID
        +String TMDB_ID
    }
    class Movie{
        +String Douban_ID
        +String IMDB_ID
        +String TMDB_ID
    }

    Item <|-- Collection

    ExternalResource --* Item
    class ExternalResource {
        +enum site
        +String url
    }
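
The relationship in the diagram can also be read as plain Python. The following is only an illustrative sketch using dataclasses, not the actual Django models: the `title` field, the `external_resources` attribute name, and the example.org URLs are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExternalResource:
    # One page on an external site (site name and URL, as in the diagram).
    site: str
    url: str


@dataclass
class Item:
    # Hypothetical common field; the real Item model defines its own fields.
    title: str
    # One Item may have multiple ExternalResources.
    external_resources: List[ExternalResource] = field(default_factory=list)


@dataclass
class Movie(Item):
    # Site-specific IDs, mirroring the Movie class in the diagram.
    douban_id: str = ""
    imdb_id: str = ""
    tmdb_id: str = ""


movie = Movie(title="Example Movie", imdb_id="tt0000000")
movie.external_resources.append(
    ExternalResource(site="imdb", url="https://example.org/imdb/tt0000000")
)
movie.external_resources.append(
    ExternalResource(site="tmdb", url="https://example.org/tmdb/0")
)
```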

Add a new site

If an official API is available for the site, it should be the preferred way to get data.
add a new value to IdType and SiteName in catalog/common/models.py
add a new file in catalog/sites/ containing a new class that inherits from AbstractSite, with:
- SITE_NAME
- ID_TYPE
- URL_PATTERNS
- WIKI_PROPERTY_ID (not used now)
- DEFAULT_MODEL (unless specified in the scrape() return value)
- a classmethod id_to_url()
- a method scrape() that returns a ResourceContent object
  - BasicDownloader or ProxiedDownloader can be used to download website content or API data, e.g. content = BasicDownloader(url).download().html()
- ...
see existing files in catalog/sites/ for more examples
add an import in catalog/sites/__init__.py
add some tests to catalog/<folder>/tests.py according to site type
- adding DOWNLOADER_SAVEDIR = '/tmp' to settings.py saves all responses to /tmp
- run python3 manage.py cat <url> to debug or to save response files to /tmp. The code for the cat command is in catalog/management/commands/cat.py
- move captured response files to test_data/, except large files and images; if such a file is required, use the smallest possible version (e.g. 1x1 pixel / 1s audio)
- add @use_local_response decorator to test methods that should pick up these responses (if BasicDownloader or ProxiedDownloader is used)
run all the tests and make sure they pass
- Command: python3 manage.py test [--keepdb]. For more details on testing, please check this doc
- See this issue if lxml.etree.ParserError occurs on macOS.
add a site UI label to common/static/scss/_sitelabel.scss
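
The site class described in the steps above could be sketched roughly as follows. This is a self-contained illustration, not the project's code: the `AbstractSite` and `ResourceContent` classes here are simplified stand-ins so the example runs on its own, and `ExampleSite`, the example.org URLs, the `url_to_id()` helper, and the `metadata` attribute are all assumptions. A real site class would import the actual base classes and fetch content with BasicDownloader or ProxiedDownloader instead of the inline HTML used here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ResourceContent:
    # Stand-in for the real ResourceContent; the metadata dict is an assumption.
    metadata: dict = field(default_factory=dict)


class AbstractSite:
    # Stand-in base class listing the attributes named in the steps above.
    SITE_NAME = None
    ID_TYPE = None
    URL_PATTERNS = []
    WIKI_PROPERTY_ID = None
    DEFAULT_MODEL = None

    def __init__(self, url):
        self.url = url
        self.id_value = self.url_to_id(url)

    @classmethod
    def url_to_id(cls, url):
        # Hypothetical helper: return the first capture group of the first match.
        for pattern in cls.URL_PATTERNS:
            match = re.match(pattern, url)
            if match:
                return match.group(1)
        return None


class ExampleSite(AbstractSite):
    SITE_NAME = "example"      # would be a new SiteName value
    ID_TYPE = "example_id"     # would be a new IdType value
    URL_PATTERNS = [r"https://example\.org/item/(\d+)"]
    WIKI_PROPERTY_ID = ""      # not used now
    DEFAULT_MODEL = "Movie"    # placeholder; the real value may be a model class

    @classmethod
    def id_to_url(cls, id_value):
        return f"https://example.org/item/{id_value}"

    def scrape(self):
        # A real implementation would download the page or API response here,
        # e.g. BasicDownloader(self.url).download().html(); inline HTML keeps
        # this sketch runnable without network access.
        html = "<html><head><title>Example Title</title></head></html>"
        match = re.search(r"<title>(.*?)</title>", html)
        title = match.group(1) if match else ""
        return ResourceContent(metadata={"title": title})


site = ExampleSite("https://example.org/item/42")
content = site.scrape()
```

Keeping id_to_url() and URL_PATTERNS inverse to each other, as in this sketch, makes it easy to check in a test that a URL round-trips through its ID.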
