lib.itmens/doc/catalog.md
2023-08-10 16:03:38 -04:00


Catalog

Data Models

All types of catalog items inherit from Item, which is stored as a multi-table Django model. One Item may have multiple ExternalResources, each representing one page on an external site.

classDiagram
    class Item {
        <<abstract>>
    }
    Item <|-- Album
    class Album {
        +String barcode
        +String Douban_ID
        +String Spotify_ID
    }
    Item <|-- Game
    class Game {
        +String Steam_ID
    }
    Item <|-- Podcast
    class Podcast {
        +String feed_url
        +String Apple_ID
    }
    Item <|-- Performance
    Item <|-- Work
    class Work {
        +String Douban_Work_ID
        +String Goodreads_Work_ID
    }
    Item <|-- Edition
    Item <|-- Series

    Series *-- Work
    Work *-- Edition

    class Series {
        +String Goodreads_Series_ID
    }
    class Work {
        +String Douban_ID
        +String Goodreads_ID
    }
    class Edition{
        +String ISBN
        +String Douban_ID
        +String Goodreads_ID
        +String GoogleBooks_ID
    }

    Item <|-- Movie
    Item <|-- TVShow
    Item <|-- TVSeason
    Item <|-- TVEpisode
    TVShow *-- TVSeason
    TVSeason *-- TVEpisode

    class TVShow{
        +String IMDB_ID
        +String TMDB_ID
    }
    class TVSeason{
        +String Douban_ID
        +String TMDB_ID
    }
    class TVEpisode{
        +String IMDB_ID
        +String TMDB_ID
    }
    class Movie{
        +String Douban_ID
        +String IMDB_ID
        +String TMDB_ID
    }

    Item <|-- Collection

    ExternalResource --* Item
    class ExternalResource {
        +enum site
        +String url
    }
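
The relationships in the diagram can be sketched in plain Python. This is only an illustration using dataclasses, not the project's Django models: the `title` field, the `add_resource` helper, the placeholder ISBN, and the example URLs are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExternalResource:
    """One page on an external site (stand-in for the Django model)."""
    site: str  # which external site the page belongs to (an enum in the diagram)
    url: str


@dataclass
class Item:
    """Base class for all catalog items (a Django model in the project)."""
    title: str  # assumed field, for illustration only
    external_resources: List[ExternalResource] = field(default_factory=list)

    def add_resource(self, site: str, url: str) -> ExternalResource:
        # Hypothetical helper: attach one more external page to this item.
        res = ExternalResource(site=site, url=url)
        self.external_resources.append(res)
        return res


@dataclass
class Edition(Item):
    isbn: Optional[str] = None


@dataclass
class Movie(Item):
    imdb_id: Optional[str] = None


# One item can be linked to several external pages.
edition = Edition(title="Example Book", isbn="0000000000000")
edition.add_resource("goodreads", "https://example.org/goodreads/1")
edition.add_resource("douban", "https://example.org/douban/1")
```

With multi-table inheritance in Django, each subclass would get its own table linked to the `Item` table; the dataclasses above only mirror the class hierarchy and the one-to-many link to `ExternalResource`.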

Add a new site

  • If an official API is available for the site, it should be the preferred way to get data.

  • add a new value to IdType and SiteName in catalog/common/models.py

  • add a new file in catalog/sites/ that defines a new class inheriting from AbstractSite, with:

    • SITE_NAME
    • ID_TYPE
    • URL_PATTERNS
    • WIKI_PROPERTY_ID (not used now)
    • DEFAULT_MODEL (unless specified in the scrape() return value)
    • a classmethod id_to_url()
    • a method scrape() that returns a ResourceContent object
      • BasicDownloader or ProxiedDownloader can be used to download website content or API data, e.g. content = BasicDownloader(url).download().html()
    • ...

    see existing files in catalog/sites/ for more examples

  • add an import in catalog/sites/__init__.py

  • add some tests to catalog/<folder>/tests.py according to site type

    • adding DOWNLOADER_SAVEDIR = '/tmp' to settings.py saves all responses to /tmp
    • run python3 manage.py cat <url> to debug or to save the response file to /tmp. The code for cat is in catalog/management/commands/cat.py
    • move captured response files to test_data/, except large files and images. If such a file is needed, use the smallest possible version (e.g. 1x1 pixel / 1s audio)
    • add @use_local_response decorator to test methods that should pick up these responses (if BasicDownloader or ProxiedDownloader is used)
  • run all the tests and make sure they pass

    • Command: python3 manage.py test [--keepdb]. For more details on testing, see this doc
    • See this issue if lxml.etree.ParserError occurs on macOS.
  • add a site UI label to common/static/scss/_sitelabel.scss
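
As a rough illustration of the site class described in the steps above, here is a self-contained sketch. `AbstractSite` and `ResourceContent` are minimal stand-ins written for this example, not the project's real classes. The `url_to_id` helper, the `metadata` field, the `example.org` URLs, and all attribute values are assumptions. The real `scrape()` would fetch content with BasicDownloader or ProxiedDownloader rather than use an inline HTML string.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ResourceContent:
    """Stand-in for the project's ResourceContent; the field is assumed."""
    metadata: dict = field(default_factory=dict)


class AbstractSite:
    """Stand-in for the project's AbstractSite base class."""
    SITE_NAME = None
    ID_TYPE = None
    URL_PATTERNS = []
    WIKI_PROPERTY_ID = None
    DEFAULT_MODEL = None

    def __init__(self, url):
        self.url = url
        self.id_value = self.url_to_id(url)

    @classmethod
    def url_to_id(cls, url):
        # Hypothetical helper: take the ID from the first matching pattern.
        for pattern in cls.URL_PATTERNS:
            match = re.match(pattern, url)
            if match:
                return match.group(1)
        return None


class ExampleSite(AbstractSite):
    SITE_NAME = "example"        # would be a SiteName value in the project
    ID_TYPE = "example_id"       # would be an IdType value in the project
    URL_PATTERNS = [r"https://(?:www\.)?example\.org/item/(\d+)"]
    WIKI_PROPERTY_ID = "?"       # not used now
    DEFAULT_MODEL = None         # would be an Item subclass, e.g. Movie

    @classmethod
    def id_to_url(cls, id_value):
        return f"https://example.org/item/{id_value}"

    def scrape(self):
        # In the project: content = BasicDownloader(self.url).download().html()
        html = "<html><head><title>Example Item</title></head></html>"
        match = re.search(r"<title>(.*?)</title>", html)
        title = match.group(1) if match else None
        return ResourceContent(metadata={"title": title})


site = ExampleSite("https://www.example.org/item/42")
content = site.scrape()
```

The existing files in catalog/sites/ remain the authoritative reference for the actual base-class behavior and the expected shape of the scraped data.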