Catalog
=======

Data Models
-----------
All catalog item types inherit from `Item`, which is stored as a multi-table Django model.
One `Item` may have multiple `ExternalResource`s, each representing one page on an external site.

```mermaid
classDiagram
    class Item {
        <<abstract>>
    }
    Item <|-- Album
    class Album {
        +String barcode
        +String Douban_ID
        +String Spotify_ID
    }
    Item <|-- Game
    class Game {
        +String Steam_ID
    }
    Item <|-- Podcast
    class Podcast {
        +String feed_url
        +String Apple_ID
    }
    Item <|-- Performance
    Item <|-- Work
    class Work {
        +String Douban_Work_ID
        +String Goodreads_Work_ID
    }
    Item <|-- Edition
    Item <|-- Series

    Series *-- Work
    Work *-- Edition

    class Series {
        +String Goodreads_Series_ID
    }
    class Work {
        +String Douban_ID
        +String Goodreads_ID
    }
    class Edition{
        +String ISBN
        +String Douban_ID
        +String Goodreads_ID
        +String GoogleBooks_ID
    }

    Item <|-- Movie
    Item <|-- TVShow
    Item <|-- TVSeason
    Item <|-- TVEpisode
    TVShow *-- TVSeason
    TVSeason *-- TVEpisode

    class TVShow{
        +String IMDB_ID
        +String TMDB_ID
    }
    class TVSeason{
        +String Douban_ID
        +String TMDB_ID
    }
    class TVEpisode{
        +String IMDB_ID
        +String TMDB_ID
    }
    class Movie{
        +String Douban_ID
        +String IMDB_ID
        +String TMDB_ID
    }

    Item <|-- Collection

    ExternalResource --* Item
    class ExternalResource {
        +enum site
        +url: string
    }
```
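
The relationship in the diagram can be sketched in plain Python. This is only an illustration, assuming simplified stand-in classes: the real models are Django models, and the field names `title`, `external_resources` and `isbn` as well as the `example.org` URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExternalResource:
    """One page on an external site (stand-in, not the Django model)."""
    site: str  # the diagram shows an enum; a plain string is used here
    url: str


@dataclass
class Item:
    """Base for every catalog item type (stand-in for the abstract Item)."""
    title: str  # hypothetical field, for illustration only
    external_resources: List[ExternalResource] = field(default_factory=list)


@dataclass
class Edition(Item):
    """A book edition; subclasses add their own type-specific identifiers."""
    isbn: Optional[str] = None


# One item linked to pages on two different external sites.
book = Edition(title="Example Book", isbn="9780000000002")
book.external_resources.append(
    ExternalResource(site="goodreads", url="https://example.org/goodreads/1")
)
book.external_resources.append(
    ExternalResource(site="douban", url="https://example.org/douban/1")
)
```

Each subclass only adds what is specific to its type, while the link to external pages lives on the shared base.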

Add a new site
--------------

 - If an official API is available for the site, it should be the preferred way to get data.
 - add a new value to `IdType` and `SiteName` in `catalog/common/models.py`
 - add a new file in `catalog/sites/` containing a new class that inherits from `AbstractSite`, with:
    * `SITE_NAME`
    * `ID_TYPE`
    * `URL_PATTERNS`
    * `WIKI_PROPERTY_ID` (not used now)
    * `DEFAULT_MODEL` (unless specified in the return value of `scrape()`)
    * a classmethod `id_to_url()`
    * a method `scrape()` that returns a `ResourceContent` object
        * `BasicDownloader` or `ProxiedDownloader` can be used to download website content or API data, e.g. `content = BasicDownloader(url).download().html()`
    * check out existing files in `catalog/sites/` for more examples
 - add an import in `catalog/sites/__init__.py`
 - add some tests to `catalog/<folder>/tests.py` according to site type
    + adding `DOWNLOADER_SAVEDIR = '/tmp'` to `settings.py` saves all responses to `/tmp`
    + run `neodb-manage cat <url>` to debug or to save response files to `/tmp`. The code of `cat` is in `catalog/management/commands/cat.py`
    + move captured response files to `test_data/`, except large files and images. If such a file is required, replace it with the smallest possible version (e.g. 1x1 pixel / 1s audio)
    + add `@use_local_response` decorator to test methods that should pick up these responses (if `BasicDownloader` or `ProxiedDownloader` is used)
 - run all the tests and make sure they pass
    - Command: `neodb-manage python3 manage.py test [--keepdb]`.
    - See [this issue](https://github.com/neodb-social/neodb/issues/5) if `lxml.etree.ParserError` occurs on macOS.
 - add a site UI label style to `common/static/scss/_sitelabel.scss`
 - update documentation in [sites.md](../sites.md)
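
The shape of a site class described in the steps above can be sketched as follows. This is a minimal, self-contained illustration, not the project's implementation: `AbstractSite` and `ResourceContent` here are simplified stand-ins so the snippet runs outside the repository, and the `ExampleBooks` site, its URL, the `url_to_id()` helper and the `metadata` field are all hypothetical. In a real site file the base classes are imported from the catalog package, and `scrape()` would fetch the page with `BasicDownloader` or `ProxiedDownloader`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ResourceContent:
    """Stand-in for the project's ResourceContent (shape assumed)."""
    metadata: dict = field(default_factory=dict)


class AbstractSite:
    """Stand-in base class holding the attributes a site must define."""
    SITE_NAME = None
    ID_TYPE = None
    URL_PATTERNS = []
    WIKI_PROPERTY_ID = None
    DEFAULT_MODEL = None

    def __init__(self, url):
        self.url = url
        self.id_value = self.url_to_id(url)

    @classmethod
    def url_to_id(cls, url):
        # Hypothetical helper: the first capture group of a matching
        # pattern is taken as the site-specific ID.
        for pattern in cls.URL_PATTERNS:
            match = re.match(pattern, url)
            if match:
                return match.group(1)
        return None


class ExampleBooks(AbstractSite):
    SITE_NAME = "examplebooks"  # would be a new SiteName value
    ID_TYPE = "examplebooks"    # would be a new IdType value
    URL_PATTERNS = [r"https://books\.example\.org/book/(\d+)"]
    WIKI_PROPERTY_ID = ""       # not used now
    DEFAULT_MODEL = "Edition"   # the real attribute refers to a model class

    @classmethod
    def id_to_url(cls, id_value):
        return f"https://books.example.org/book/{id_value}"

    def scrape(self):
        # A real site would download and parse the page here, e.g.
        # content = BasicDownloader(self.url).download().html()
        return ResourceContent(metadata={"title": f"Book #{self.id_value}"})


site = ExampleBooks("https://books.example.org/book/42")
content = site.scrape()
```

Keeping `id_to_url()` the inverse of the URL patterns lets an ID extracted from any accepted URL map back to one canonical page address.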