Catalog
=======
Data Models
-----------
All types of catalog items inherit from `Item`, which is stored as a multi-table Django model.
One `Item` may have multiple `ExternalResource`s, each representing one page on an external site.
```mermaid
classDiagram
class Item {
<<abstract>>
}
Item <|-- Album
class Album {
+String barcode
+String Douban_ID
+String Spotify_ID
}
Item <|-- Game
class Game {
+String Steam_ID
}
Item <|-- Podcast
class Podcast {
+String feed_url
+String Apple_ID
}
Item <|-- Performance
Item <|-- Work
class Work {
+String Douban_Work_ID
+String Goodreads_Work_ID
}
Item <|-- Edition
Item <|-- Series
Series *-- Work
Work *-- Edition
class Series {
+String Goodreads_Series_ID
}
class Work {
+String Douban_ID
+String Goodreads_ID
}
class Edition{
+String ISBN
+String Douban_ID
+String Goodreads_ID
+String GoogleBooks_ID
}
Item <|-- Movie
Item <|-- TVShow
Item <|-- TVSeason
Item <|-- TVEpisode
TVShow *-- TVSeason
TVSeason *-- TVEpisode
class TVShow{
+String IMDB_ID
+String TMDB_ID
}
class TVSeason{
+String Douban_ID
+String TMDB_ID
}
class TVEpisode{
+String IMDB_ID
+String TMDB_ID
}
class Movie{
+String Douban_ID
+String IMDB_ID
+String TMDB_ID
}
Item <|-- Collection
ExternalResource --* Item
class ExternalResource {
+enum site
+url: string
}
```
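
For readers less familiar with Django's multi-table inheritance, below is a minimal sketch of how the relationships above map to models. The field names and options are assumptions for illustration only; see `catalog/common/models.py` and the per-category model files for the real definitions.

```python
# Illustrative sketch only -- field names and options are assumptions,
# not the actual NeoDB definitions.
from django.db import models


class Item(models.Model):
    # Concrete base class: Django multi-table inheritance gives every subclass
    # its own table joined to this one via an implicit one-to-one key.
    title = models.CharField(max_length=1000)


class Edition(Item):
    # Only Edition-specific columns live in the Edition table.
    isbn = models.CharField(max_length=13, blank=True)


class ExternalResource(models.Model):
    # One row per page on an external site; an Item can have many of these.
    item = models.ForeignKey(
        Item,
        null=True,
        on_delete=models.SET_NULL,
        related_name="external_resources",
    )
    url = models.URLField(max_length=1000, unique=True)
```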
Add a new site
--------------
> **The site's official API** should be the preferred way to get data when adding a new site.
- add a new value to `IdType` and `SiteName` in `catalog/common/models.py`
- add a new file in `catalog/sites/` with a new class that inherits `AbstractSite` (see the sketch after this list), defining:
* `SITE_NAME`
* `ID_TYPE`
* `URL_PATTERNS`
* `WIKI_PROPERTY_ID` (not used now)
* `DEFAULT_MODEL` (unless the model is specified in the `scrape()` return value)
* a classmethod `id_to_url()`
* a method `scrape()` that returns a `ResourceContent` object
* `BasicDownloader` or `ProxiedDownloader` can be used to download website content or API data, e.g. `content = BasicDownloader(url).download().html()`
* ...
see existing files in `catalog/sites/` for more examples
- add an import in `catalog/sites/__init__.py`
- add some tests to `catalog/<folder>/tests.py` according to the site type (see the test sketch after this list)
+ adding `DOWNLOADER_SAVEDIR = '/tmp'` to `settings.py` will save all responses to `/tmp`
+ run `python3 manage.py cat <url>` for debugging or saving a response file to `/tmp`. The code for `cat` is in `catalog/management/commands/cat.py`
+ move the captured response files to `test_data/`, except for large files and images; if one must be included, use the smallest possible version (e.g. a 1x1 pixel image or a 1-second audio clip)
+ add the `@use_local_response` decorator to test methods that should pick up these responses (if `BasicDownloader` or `ProxiedDownloader` is used)
- run all the tests and make sure they pass
- Command: `python3 manage.py test [--keepdb]`. For more details on `test`, please check [this doc](https://github.com/neodb-social/neodb/blob/main/doc/development.md#run-test)
- See [this issue](https://github.com/neodb-social/neodb/issues/5) if `lxml.etree.ParserError` occurs on macOS.
- add a site UI label to `common/static/scss/_sitelabel.scss`
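
Putting the steps above together, a new site module in `catalog/sites/` might look roughly like the sketch below. Everything here is illustrative rather than a drop-in implementation: the class name, URL pattern, XPath and metadata fields are made up, `IdType.NewSite` / `SiteName.NewSite` stand for the enum values added in the first step, and it assumes the matched page address is available on the instance as `self.url`.

```python
# catalog/sites/newsite.py -- illustrative sketch; names, URLs and fields are hypothetical
from catalog.common import *   # AbstractSite, SiteName, IdType, ResourceContent, BasicDownloader
from catalog.models import *   # Edition, etc.


class NewSite(AbstractSite):
    SITE_NAME = SiteName.NewSite               # value added to SiteName
    ID_TYPE = IdType.NewSite                   # value added to IdType
    URL_PATTERNS = [r"https://newsite\.example/item/(\d+)"]
    WIKI_PROPERTY_ID = ""                      # not used now
    DEFAULT_MODEL = Edition                    # or return a model from scrape()

    @classmethod
    def id_to_url(cls, id_value):
        return f"https://newsite.example/item/{id_value}"

    def scrape(self):
        # BasicDownloader (or ProxiedDownloader) fetches the page / API payload
        content = BasicDownloader(self.url).download().html()
        title = content.xpath("//h1/text()")[0].strip()
        return ResourceContent(metadata={"title": title})
```

A matching test might look like the following; `@use_local_response` replays the files saved under `test_data/`. The helpers `SiteManager.get_site_by_url()` and `get_resource_ready()` are borrowed from the existing tests rather than guaranteed API, so verify them against `catalog/<folder>/tests.py`.

```python
from django.test import TestCase
from catalog.common import *


class NewSiteTestCase(TestCase):
    @use_local_response
    def test_scrape(self):
        # resolves the URL to the NewSite class via its URL_PATTERNS
        site = SiteManager.get_site_by_url("https://newsite.example/item/123")
        resource = site.get_resource_ready()
        self.assertEqual(resource.metadata["title"], "Some Title")
```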