--- title: "Architecture" vignette: > %\VignetteIndexEntry{Architecture} %\VignetteEncoding{UTF-8} %\VignetteEngine{quarto::html} knitr: opts_chunk: collapse: true comment: "#>" --- This vignette provides a high-level overview of the core architectural components in **LaminDB**. Understanding these concepts will help you navigate the system and effectively manage your data and metadata. ## Core concepts **LaminDB** is built around a few key ideas: ### Instance A LaminDB **instance** is a self-contained environment for storing and managing data and metadata. You can think of it like a database or a project directory. Each instance has its own: * **Schema:** Defines the structure of the metadata. * **Storage:** Where the actual data files are stored (locally, on S3, etc.). * **Database:** Stores the metadata records in registries. For more information about instances, see `?connect()` and `?Instance`. ### Module A **module** in LaminDB is a collection of related registries that provide functionality in a specific domain. For example: * **core:** Provides registries for general data management (Artifacts, Collections, Transforms, etc.). This module is included by default in every LaminDB instance. * **bionty:** Offers registries for managing biological entities (genes, proteins, cell types) and links them to public ontologies. * **wetlab:** Includes registries for managing experimental metadata (samples, treatments, etc.). * **And many more...** Modules help organize the system and make it easier to find the specific registries you need. For more information about modules, see `?Module`. The core module is documented in the `module_core` vignette: `vignette("module_core", package = "laminr")`. ### Registry A **registry** is a centralized collection of related records. It's like a table in a database, where each row represents a specific entity. Examples of registries include: * **Artifacts**: Datasets, models, or other data entities. * **Collections**: Groupings of related artifacts. * **Transforms**: Data processing operations. * **Features**: Variables or measurements within datasets. * **Labels**: Annotations or classifications applied to data. Each registry has a defined structure with specific fields that hold relevant information. For more information about registries, see `?Registry`. The core registries are documented in the `module_core` vignette: `vignette("module_core", package = "laminr")`. ### Field A **field** is a single piece of information within a registry. It's analogous to a column in a database table. For example, the Artifact registry might have fields like: * `key`: Storage key, the relative path within the storage location. * `storage`: Storage location, e.g. an S3 or GCP bucket or a local directory. * `description`: A description of the artifact. * `created_by`: The user who created the artifact. Fields define the type of data that can be stored in a registry and provide a way to organize and query the metadata. For more information about fields, see `?Field`. The fields of core registries are documented in the `module_core` vignette: `vignette("module_core", package = "laminr")`. ### Record A **record** is a single entry within a registry. It's like a row in a database table. A record combines multiple fields to represent a specific entity. For example, a record in the Artifact registry might represent a single dataset with its key, storage location, description, creator, and other relevant information. ### Putting it together In essence, you have **instances** that contain **modules**. Each module contains **registries**, which in turn hold **records**. Every record is composed of multiple **fields**. This hierarchical structure allows for flexible and organized management of data and metadata within LaminDB. ## Class structure The `laminr` package provides a set of classes that mirror the core concepts of LaminDB. These classes allow you to interact with instances, modules, registries, fields, and records in a programmatic way. The package provides two sets of classes: the base classes and the sugar syntax classes. ### Base classes These classes provide the core functionality for interacting with LaminDB instances, modules, registries, fields, and records. These are the classes that are documented via `?Instance`, `?Module`, `?Registry`, `?Field`, and `?Record`. The class diagram below illustrates the relationships between these classes. However, they are not intended to be used directly in most cases. Instead, the sugar syntax classes provide a more user-friendly interface for working with LaminDB data. ```{mermaid} classDiagram %% # nolint start laminr --> Instance laminr --> UserSettings laminr --> InstanceSettings Instance --> InstanceAPI Instance --> Module Module --> Registry Registry --> Field Registry --> Record Field --> RelatedRecords Record --> RelatedRecords UserSettings --> InstanceSettings InstanceSettings --> Instance InstanceAPI --> Module Instance --> Registry InstanceAPI --> Registry Instance --> Record InstanceAPI --> Record Instance --> RelatedRecords InstanceAPI --> RelatedRecords %% Methods must be on one line to be shown in the right diagram section %% Use \n for newlines and #emsp; to create indents in the rendered %% diagram when necessary class laminr{ +connect(String slug): RichInstance } class UserSettings{ +initialize(...): UserSettings +email: String +access_token: String +uid: String +uuid: String +handle: String +name: String } class InstanceSettings{ +initialize(...): InstanceSettings +owner: String +name: String +id: String +schema_id: String +api_url: String } class Instance{ +initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map schema\n): Instance +get_modules(): Module[] +get_module(String module_name): Module +get_module_names(): String[] +get_api(): InstanceAPI +get_settings(): InstanceSettings +get_py_lamin(Boolean check, String what): PythonModule +track(String path, String transform): NULL +finish(): NULL +is_default: Boolean } class InstanceAPI{ +initialize(InstanceSettings Instance_settings) +get_schema(): Map +get_record(...): Map +get_records(...): Map +delete_record(...): NULL } class Module{ +initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map module_schema\n): Module +name: String +get_registries(): Registry[] +get_registry(String registry_name): Registry +get_registry_names(): String[] } class Registry{ +initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map registry_schema\n): Registry +name: String +class_name: String +is_link_table: Bool +get_fields(): Field[] +get_field(String field_name): Field +get_field_names(): String[] +get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord +get_record_class(): RichRecordClass +get_temporary_record_class(): TemporaryRecordClass +df(Integer limit, Bool verbose): DataFrame +from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord +from_path(\n#emsp;Path path, String key, String description, String run\n): TemporaryRecord +from_anndata(\n#emsp;AnnData adata, String key, String description, String run\n): TemporaryRecord } class Field{ +initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field +type: String +through: Map +field_name: String +registry_name: String +column_name: String +module_name: String +is_link_table: Bool +relation_type: String +related_field_name: String +related_registry_name: String +related_module_name: String } class Record{ +initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map data\n): Record +get_value(String field_name): Any +delete(): NULL } class RelatedRecords{ +initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords +df(): DataFrame +field: Field } %% # nolint end ``` ### Sugar syntax classes The sugar syntax classes provide a more user-friendly way to interact with LaminDB data. These classes are designed to make it easier to access and manipulate instances, modules, registries, fields, and records. For example, to get an artifact with a specific ID using **only** base classes, you might write: ```r db <- connect("laminlabs/cellxgene") artifact <- db$get_module("core")$get_registry("artifact")$get("MkRm3eUKPwfnAyZMWD9v") artifact$get_value("id") ``` With the sugar syntax classes, you can achieve the same result more concisely: ```r db <- connect("laminlabs/cellxgene") artifact <- db$Artifact$get("MkRm3eUKPwfnAyZMWD9v") artifact$id ``` This sugar syntax is achieved by creating RichInstance and RichRecord classes that inherit from Instance and Record, respectively. These classes provide additional methods and properties to simplify working with LaminDB data. ### Class diagram The class diagram below illustrates the relationships between the sugar syntax classes in the `laminr` package. These classes provide a more user-friendly interface for interacting with LaminDB data. ```{mermaid} classDiagram %% # nolint start %% --- Copied from base diagram -------------------------------------------- laminr --> UserSettings laminr --> InstanceSettings Instance --> InstanceAPI Instance --> Module Module --> Registry Registry --> Field Field --> RelatedRecords Record --> RelatedRecords UserSettings --> InstanceSettings InstanceSettings --> Instance InstanceAPI --> Module Instance --> Registry InstanceAPI --> Registry Instance --> Record InstanceAPI --> Record Instance --> RelatedRecords InstanceAPI --> RelatedRecords %% ------------------------------------------------------------------------- %% --- New links for Rich classes ------------------------------------------ RichInstance --|> Instance laminr --> RichInstance Core --|> Module RichInstance --> Core Bionty --|> Module RichInstance --> Bionty Registry --> RichRecord Registry --> TemporaryRecord RichRecord --|> Record TemporaryRecord --|> RichRecord Registry --> Artifact Artifact --|> RichRecord %% ------------------------------------------------------------------------- %% --- Copied from base diagram -------------------------------------------- class laminr{ +connect(String slug): RichInstance } class UserSettings{ +initialize(...): UserSettings +email: String +access_token: String +uid: String +uuid: String +handle: String +name: String } class InstanceSettings{ +initialize(...): InstanceSettings +owner: String +name: String +id: String +schema_id: String +api_url: String } class Instance{ +initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map schema\n): Instance +get_modules(): Module[] +get_module(String module_name): Module +get_module_names(): String[] +get_api(): InstanceAPI +get_settings(): InstanceSettings +get_py_lamin(Boolean check, String what): PythonModule +track(String path, String transform): NULL +finish(): NULL +is_default: Boolean } class InstanceAPI{ +initialize(InstanceSettings Instance_settings) +get_schema(): Map +get_record(...): Map +get_records(...): Map +delete_record(...): NULL } class Module{ +initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map module_schema\n): Module +name: String +get_registries(): Registry[] +get_registry(String registry_name): Registry +get_registry_names(): String[] } class Registry{ +initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map registry_schema\n): Registry +name: String +class_name: String +is_link_table: Bool +get_fields(): Field[] +get_field(String field_name): Field +get_field_names(): String[] +get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord +get_record_class(): RichRecordClass +get_temporary_record_class(): TemporaryRecordClass +df(Integer limit, Bool verbose): DataFrame +from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord +from_path(\n#emsp;Path path, String key, String description, String run\n): TemporaryRecord +from_anndata(\n#emsp;AnnData adata, String key, String description, String run\n): TemporaryRecord } class Field{ +initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field +type: String +through: Map +field_name: String +registry_name: String +column_name: String +module_name: String +is_link_table: Bool +relation_type: String +related_field_name: String +related_registry_name: String +related_module_name: String } class Record{ +initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map data\n): Record +get_value(String field_name): Any +delete(): NULL } class RelatedRecords{ +initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords +df(): DataFrame +field: Field } %% ------------------------------------------------------------------------- %% --- New Rich classes ---------------------------------------------------- class RichInstance{ +initialize( #emsp;InstanceSettings Instance_settings, API api, #emsp;Map schema ): RichInstance +Registry Artifact +Registry Collection +...registry accessors... +Registry User +Bionty bionty } style RichInstance fill:#ffe1c9 class Core{ +Registry Artifact +Registry Collection +...registry accessors... +Registry User } style Core fill:#ffe1c9 class Bionty{ +Registry CellLine +Registry CellMarker +...registry accessors... +Registry Tissue } style Bionty fill:#ffe1c9 class RichRecord{ +...field value accessors... } style RichRecord fill:#ffe1c9 class TemporaryRecord{ +save(): NULL } style TemporaryRecord fill:#ffe1c9 class Artifact{ +...field value accessors... +cache(): String +load(): AnnData | DataFrame | ... +open(): SOMACollection | SOMAExperiment +describe(): NULL } style Artifact fill:#ffe1c9 %% ------------------------------------------------------------------------- %% # nolint end ```