Catalog
=======

Data Models
-----------
All types of catalog items inherit from `Item`, which is stored as a Django multi-table inheritance model.
One `Item` may have multiple `ExternalResource`s, each representing one page on an external site.
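
Django multi-table inheritance stores each concrete subclass in its own table, linked one-to-one to a row of the parent table through an automatically created `<parent>_ptr` field. As a rough illustration only, the layout for an `Edition` and its `ExternalResource`s can be sketched with the standard-library `sqlite3` module. The table and column names (`catalog_item`, `catalog_edition`, `item_ptr_id`, `catalog_externalresource`, `item_id`) follow Django's default naming conventions but are assumptions, not NeoDB's actual schema, and the titles, ISBN and URLs are placeholders.

```python
import sqlite3

# Hypothetical schema mirroring Django's default multi-table inheritance layout:
# the parent table holds shared columns; the child table's primary key is also
# a foreign key to the parent row.
SCHEMA = """
CREATE TABLE catalog_item (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE catalog_edition (
    item_ptr_id INTEGER PRIMARY KEY REFERENCES catalog_item(id),
    isbn TEXT
);
CREATE TABLE catalog_externalresource (
    id INTEGER PRIMARY KEY,
    item_id INTEGER REFERENCES catalog_item(id),
    site TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # One Edition = one row in the parent table + one row in the child table.
    cur = conn.execute("INSERT INTO catalog_item (title) VALUES (?)", ("Example Book",))
    item_id = cur.lastrowid
    conn.execute(
        "INSERT INTO catalog_edition (item_ptr_id, isbn) VALUES (?, ?)",
        (item_id, "9780000000000"),
    )
    # The same item is linked to one page per external site (placeholder sites).
    conn.executemany(
        "INSERT INTO catalog_externalresource (item_id, site, url) VALUES (?, ?, ?)",
        [
            (item_id, "site_a", "https://a.example.com/book/1"),
            (item_id, "site_b", "https://b.example.org/b/1"),
        ],
    )
    return conn


def load_edition(conn: sqlite3.Connection, item_id: int) -> dict:
    # Reading a subclass instance requires joining the child and parent tables.
    title, isbn = conn.execute(
        "SELECT i.title, e.isbn FROM catalog_edition e "
        "JOIN catalog_item i ON i.id = e.item_ptr_id WHERE e.item_ptr_id = ?",
        (item_id,),
    ).fetchone()
    urls = [
        row[0]
        for row in conn.execute(
            "SELECT url FROM catalog_externalresource WHERE item_id = ? ORDER BY id",
            (item_id,),
        )
    ]
    return {"title": title, "isbn": isbn, "external_urls": urls}


if __name__ == "__main__":
    db = build_demo_db()
    print(load_edition(db, 1))
```

Because the child row carries no title of its own, reading an `Edition` needs a join with the parent table; Django performs this join automatically when a child model is queried.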
```mermaid
classDiagram
    class Item {
        <<abstract>>
    }
    Item <|-- Album
    class Album {
        +String barcode
        +String Douban_ID
        +String Spotify_ID
    }
    Item <|-- Game
    class Game {
        +String Steam_ID
    }
    Item <|-- Podcast
    class Podcast {
        +String feed_url
        +String Apple_ID
    }
    Item <|-- Performance
    Item <|-- Work
    class Work {
        +String Douban_Work_ID
        +String Goodreads_Work_ID
    }
    Item <|-- Edition
    Item <|-- Series

    Series *-- Work
    Work *-- Edition

    class Series {
        +String Goodreads_Series_ID
    }
    class Edition{
        +String ISBN
        +String Douban_ID
        +String Goodreads_ID
        +String GoogleBooks_ID
    }

    Item <|-- Movie
    Item <|-- TVShow
    Item <|-- TVSeason
    Item <|-- TVEpisode
    TVShow *-- TVSeason
    TVSeason *-- TVEpisode

    class TVShow{
        +String IMDB_ID
        +String TMDB_ID
    }
    class TVSeason{
        +String Douban_ID
        +String TMDB_ID
    }
    class TVEpisode{
        +String IMDB_ID
        +String TMDB_ID
    }
    class Movie{
        +String Douban_ID
        +String IMDB_ID
        +String TMDB_ID
    }

    Item <|-- Collection

    ExternalResource --* Item
    class ExternalResource {
        +enum site
        +String url
    }
```
Add a new site
--------------
- If an official API is available for the site, it should be the preferred way to get data.
- Add a new value to `IdType` and `SiteName` in `catalog/common/models.py`.
- Add a new file in `catalog/sites/` with a new class that inherits from `AbstractSite` and defines:
    * `SITE_NAME`
    * `ID_TYPE`
    * `URL_PATTERNS`
    * `WIKI_PROPERTY_ID` (currently unused)
    * `DEFAULT_MODEL` (unless the model is specified in the return value of `scrape()`)
    * a classmethod `id_to_url()`
    * a method `scrape()` that returns a `ResourceContent` object
    * `BasicDownloader` or `ProxiedDownloader` can be used to download website content or API data, e.g. `content = BasicDownloader(url).download().html()`
    * ...

    See existing files in `catalog/sites/` for more examples.
- Add an import in `catalog/sites/__init__.py`.
- Add tests to `catalog/<folder>/tests.py` according to the site type.
    + Adding `DOWNLOADER_SAVEDIR = '/tmp'` to `settings.py` saves all responses to `/tmp`.
    + Run `python3 manage.py cat <url>` to debug or to save the response file to `/tmp`. The `cat` command is implemented in `catalog/management/commands/cat.py`.
    + Move captured response files to `test_data/`, except large or image files. If such a file is unavoidable, use the smallest possible version (e.g. a 1x1 pixel image or 1 second of audio).
    + Add the `@use_local_response` decorator to test methods that should pick up these responses (if `BasicDownloader` or `ProxiedDownloader` is used).
- Run all the tests and make sure they pass.
    - Command: `python3 manage.py test [--keepdb]`.
    - See [this issue](https://github.com/neodb-social/neodb/issues/5) if `lxml.etree.ParserError` occurs on macOS.
- Add a site UI label to `common/static/scss/_sitelabel.scss`.
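
The class requirements for a new site can be pictured with a self-contained sketch. `AbstractSite` and `ResourceContent` below are simplified, hypothetical stand-ins written for this example, not NeoDB's real classes: the real `SITE_NAME` and `ID_TYPE` are `SiteName` and `IdType` values, and the `url_to_id()` helper, the `ExampleSite` class and its URL scheme are assumptions. Check existing files in `catalog/sites/` for the actual base-class API. The sketch shows one plausible division of labor: `URL_PATTERNS` extracts a site-specific ID from any accepted URL variant, and `id_to_url()` maps that ID back to a single canonical URL.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceContent:
    # Hypothetical stand-in; NeoDB's real ResourceContent may have other fields.
    metadata: dict = field(default_factory=dict)
    lookup_ids: dict = field(default_factory=dict)


class AbstractSite:
    # Hypothetical stand-in reduced to the attributes listed in the steps above.
    SITE_NAME: str = ""
    ID_TYPE: str = ""
    URL_PATTERNS: list = []
    WIKI_PROPERTY_ID: str = ""
    DEFAULT_MODEL: Optional[type] = None

    def __init__(self, url: str):
        self.url = url
        self.id_value = self.url_to_id(url)

    @classmethod
    def url_to_id(cls, url: str) -> Optional[str]:
        # Try each pattern in order; the first capture group is the site-specific ID.
        for pattern in cls.URL_PATTERNS:
            m = re.match(pattern, url)
            if m:
                return m[1]
        return None

    @classmethod
    def id_to_url(cls, id_value: str) -> str:
        raise NotImplementedError

    def scrape(self) -> ResourceContent:
        raise NotImplementedError


class ExampleSite(AbstractSite):
    # Hypothetical site; names and URL scheme are illustrative only.
    SITE_NAME = "example"
    ID_TYPE = "example_id"
    URL_PATTERNS = [
        r"https?://(?:www\.)?example\.com/item/(\d+)",
        r"https?://m\.example\.com/i/(\d+)",
    ]
    WIKI_PROPERTY_ID = ""  # not used

    @classmethod
    def id_to_url(cls, id_value: str) -> str:
        # One canonical URL per ID, whichever URL variant was submitted.
        return f"https://www.example.com/item/{id_value}"

    def scrape(self) -> ResourceContent:
        # A real implementation would download and parse the page here
        # (e.g. with BasicDownloader); this sketch returns fixed data.
        return ResourceContent(metadata={"title": f"Item {self.id_value}"})


if __name__ == "__main__":
    site = ExampleSite("https://m.example.com/i/42")
    print(site.id_value, ExampleSite.id_to_url(site.id_value))
```

In this sketch, the mobile and desktop URLs resolve to the same ID and therefore to the same canonical URL, which is the property the pattern/`id_to_url()` pair is meant to provide.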