Mappers define how source database columns are mapped to PhenEx's internal representation. Each mapper class corresponds to a specific source table and declares which source columns map to PhenEx's standard fields (PERSON_ID, EVENT_DATE, CODE, VALUE, etc.).
PhenEx ships with a complete set of mappers for the OMOP CDM. These are bundled into the OMOPDomains dictionary, which is the main entry point for working with OMOP data.
Using mappers
In most cases you interact with mappers through a DomainsDictionary rather than instantiating them directly.
Concept ID columns (e.g. CONDITION_CONCEPT_ID) contain OMOP standard concept IDs.
Source value columns (e.g. CONDITION_SOURCE_VALUE) contain the original vocabulary codes (ICD-10, CPT, NDC, etc.).
PhenEx provides a mapper for each. Use the concept ID mapper when your codelist contains OMOP concept IDs, and the source value mapper when your codelist contains native vocabulary codes.
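The choice between the two mappers can be pictured with a small, self-contained sketch. It does not use PhenEx: the two dictionaries and the `select_codes` helper are hypothetical stand-ins, and the concept IDs and source codes are illustrative values only. The point is that the same codelist lookup reads a different source column depending on which mapping feeds the standard CODE field.

```python
# Hypothetical stand-ins, NOT PhenEx classes: each mapping says which
# source column feeds the standard CODE field.
CONCEPT_ID_MAPPING = {"PERSON_ID": "PERSON_ID", "CODE": "CONDITION_CONCEPT_ID"}
SOURCE_VALUE_MAPPING = {"PERSON_ID": "PERSON_ID", "CODE": "CONDITION_SOURCE_VALUE"}

# Illustrative rows; the IDs and codes are example values only.
rows = [
    {"PERSON_ID": 1, "CONDITION_CONCEPT_ID": 201826, "CONDITION_SOURCE_VALUE": "E11.9"},
    {"PERSON_ID": 2, "CONDITION_CONCEPT_ID": 316866, "CONDITION_SOURCE_VALUE": "I10"},
]


def select_codes(rows, mapping, codelist):
    """Return the PERSON_IDs whose mapped CODE value appears in the codelist."""
    code_column = mapping["CODE"]
    person_column = mapping["PERSON_ID"]
    return [row[person_column] for row in rows if row[code_column] in codelist]


# A codelist of concept IDs pairs with the concept ID mapping ...
matched_by_concept = select_codes(rows, CONCEPT_ID_MAPPING, {201826})
# ... and a codelist of native vocabulary codes pairs with the source value mapping.
matched_by_source = select_codes(rows, SOURCE_VALUE_MAPPING, {"I10"})
```

Pairing a codelist with the wrong mapping would simply match nothing, because the values being compared come from different vocabularies.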
    def __init__(self, table, name=None, column_mapping={}):
        """
        Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING.
        """
        if not isinstance(table, Table):
            raise TypeError(
                f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table."
            )
        self.NAME_TABLE = name or self.NAME_TABLE
        self.column_mapping = self._get_column_mapping(column_mapping)
        self._table = table.mutate(
            **self._resolve_column_mapping(table, self.column_mapping)
        )
        for key in self.REQUIRED_FIELDS:
            try:
                getattr(self._table, key)
            except AttributeError:
                raise ValueError(f"Required field {key} not defined in COLUMN_MAPPING.")
        self._add_phenotype_table_relationship()
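The constructor's two jobs, applying a column mapping and then checking required fields, can be sketched without Ibis. `ToyMapper` below is a hypothetical class that works on a single dict row rather than a table. How overrides are merged with the defaults is an assumption made for illustration; the source does not show what `_get_column_mapping` does.

```python
class ToyMapper:
    """Hypothetical stand-in for a mapper class; operates on one dict row."""

    DEFAULT_MAPPING = {"PERSON_ID": "person_id", "EVENT_DATE": "condition_start_date"}
    REQUIRED_FIELDS = ["PERSON_ID", "EVENT_DATE"]

    def __init__(self, row, column_mapping=None):
        # Assumption: per-instance overrides take precedence over class defaults.
        self.column_mapping = {**self.DEFAULT_MAPPING, **(column_mapping or {})}
        # Add a standard-named field for every mapping whose source column exists.
        self.row = dict(row)
        for field, source in self.column_mapping.items():
            if source in row:
                self.row[field] = row[source]
        # Fail early if a required standard field could not be produced.
        for key in self.REQUIRED_FIELDS:
            if key not in self.row:
                raise ValueError(f"Required field {key} not defined in COLUMN_MAPPING.")


ok = ToyMapper({"person_id": 7, "condition_start_date": "2020-01-01"})
overridden = ToyMapper(
    {"pid": 7, "condition_start_date": "2020-01-01"},
    column_mapping={"PERSON_ID": "pid"},
)

try:
    ToyMapper({"person_id": 7})  # no source column for EVENT_DATE
    error = None
except ValueError as exc:
    error = str(exc)
```

Validating at construction time means a misconfigured mapping surfaces when the table is wrapped, not later when a phenotype tries to read the missing field.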
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
    def filter(self, expr):
        """
        Filter the table by an Ibis Expression or using a PhenExFilter.
        """
        input_columns = self.columns
        if isinstance(expr, ibis.expr.types.Expr) or isinstance(expr, list):
            filtered_table = self.table.filter(expr)
        else:
            filtered_table = expr.filter(self)
        return type(self)(
            filtered_table.select(input_columns),
            name=self.NAME_TABLE,
            column_mapping=self.column_mapping,
        )
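The dispatch in `filter` can be illustrated with a toy model. Everything here is hypothetical: `ToyTable` holds rows as dicts, a plain callable plays the role of an Ibis expression, and `CodeFilter` plays the role of a filter object. The sketch shows the two branches and the use of `type(self)` so that the result keeps the caller's subclass.

```python
class ToyTable:
    """Hypothetical stand-in for a mapped table; holds rows as dicts."""

    def __init__(self, rows, name="TOY"):
        self.rows = list(rows)
        self.name = name

    def filter(self, expr):
        if callable(expr):
            # A callable plays the role of an Ibis boolean expression here.
            kept = [row for row in self.rows if expr(row)]
        else:
            # Anything else is treated as a filter object and asked to do the work.
            kept = expr.filter(self)
        # Rebuild with type(self) so a subclass survives filtering.
        return type(self)(kept, name=self.name)


class ToyConditionTable(ToyTable):
    pass


class CodeFilter:
    """Hypothetical filter object that keeps rows whose CODE is in a set."""

    def __init__(self, codes):
        self.codes = set(codes)

    def filter(self, table):
        return [row for row in table.rows if row["CODE"] in self.codes]


table = ToyConditionTable(
    [{"PERSON_ID": 1, "CODE": "I10"}, {"PERSON_ID": 2, "CODE": "E11.9"}],
    name="CONDITION_OCCURRENCE",
)
by_predicate = table.filter(lambda row: row["PERSON_ID"] == 1)
by_filter_object = table.filter(CodeFilter(["E11.9"]))
```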
from_dict(data) classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
    @classmethod
    def from_dict(cls, data: dict):
        """
        Reconstruct a PhenexTable class reference from serialized data.

        Note: This returns the class itself, not an instance, since we cannot
        reconstruct the actual table data without a database connection.

        Args:
            data: Serialized class configuration

        Returns:
            The PhenexTable subclass
        """
        # The class should already exist in the module, just return it
        return cls
join(other,*args,domains=None,**kwargs)
Joins this table to another PhenexTable. If join keys are passed explicitly, the join is performed as specified. Otherwise PhenEx attempts an autojoin, which works only if it can find the table types specified in PATHS.
    def join(self, other: "PhenexTable", *args, domains=None, **kwargs):
        """
        The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
        """
        if isinstance(other, Table):
            return type(self)(self.table.join(other, *args, **kwargs))
        if not isinstance(other, PhenexTable):
            raise TypeError(f"Expected a PhenexTable instance, got {type(other)}")
        if len(args):
            # if user specifies join keys and join type, simply perform join as specified
            return type(self)(self.table.join(other.table, *args, **kwargs))

        # Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary
        # joined table is the sequentially joined table
        # current table is the table for the left join in the current iteration
        joined_table = current_left_table = self
        logger.debug(
            f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}"
        )
        for right_table_class_name in self._find_path(other):
            # get the next right table
            right_table_search_results = [
                v
                for k, v in domains.items()
                if v.__class__.__name__ == right_table_class_name
            ]
            logger.debug(
                f"Searching for {right_table_class_name} in domains: {list(domains.keys())}"
            )
            logger.debug(
                f"Found {len(right_table_search_results)} matches for {right_table_class_name}"
            )
            if len(right_table_search_results) != 1:
                raise ValueError(
                    f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}"
                )
            right_table = right_table_search_results[0]
            print(
                f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}"
            )
            # join keys are defined by the left table; in theory should enforce symmetry
            join_keys = current_left_table.JOIN_KEYS[right_table_class_name]

            # Build join predicate(s) - supports symmetric and asymmetric joins
            # Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables
            # Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names
            # Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]
            predicates = []
            for join_key in join_keys:
                if isinstance(join_key, str):
                    # Symmetric: column exists in both tables with same name
                    predicates.append(joined_table[join_key] == right_table[join_key])
                elif isinstance(join_key, (tuple, list)) and len(join_key) == 2:
                    # Asymmetric: (left_col, right_col) - different column names
                    left_col, right_col = join_key
                    predicates.append(joined_table[left_col] == right_table[right_col])
                else:
                    raise ValueError(
                        f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list."
                    )

            # Combine all predicates with AND
            if len(predicates) == 1:
                join_predicate = predicates[0]
            else:
                join_predicate = predicates[0]
                for pred in predicates[1:]:
                    join_predicate = join_predicate & pred

            columns = list(set(joined_table.columns + right_table.columns))
            # subset columns, making sure to set type of table to the very left table (self)
            joined_table = type(self)(
                joined_table.join(right_table, join_predicate, **kwargs).select(columns)
            )
            current_left_table = right_table
        return joined_table
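The join-key formats accepted by the autojoin (symmetric, asymmetric, and mixed) can be tried out in isolation. The sketch below is a simplified model, not PhenEx code: `normalize_join_keys` and `toy_join` are hypothetical helpers that work on lists of dicts, and the equality checks stand in for the Ibis predicates combined with AND.

```python
def normalize_join_keys(join_keys):
    """Turn each join key into a (left_column, right_column) pair."""
    pairs = []
    for join_key in join_keys:
        if isinstance(join_key, str):
            # Symmetric: the same column name on both sides.
            pairs.append((join_key, join_key))
        elif isinstance(join_key, (tuple, list)) and len(join_key) == 2:
            # Asymmetric: (left_col, right_col).
            pairs.append(tuple(join_key))
        else:
            raise ValueError(
                f"Invalid join key format: {join_key}. "
                "Must be either a string or a 2-element tuple/list."
            )
    return pairs


def toy_join(left_rows, right_rows, join_keys):
    """Inner-join two lists of dicts; every key pair must match (logical AND)."""
    pairs = normalize_join_keys(join_keys)
    joined = []
    for left in left_rows:
        for right in right_rows:
            if all(left[l_col] == right[r_col] for l_col, r_col in pairs):
                # Left values win on name clashes in this toy model.
                joined.append({**right, **left})
    return joined


left_rows = [{"PERSON_ID": 1, "VISIT_ID": 10}]
right_rows = [
    {"PERSON_ID": 1, "ENCOUNTER_ID": 10, "SITE": "A"},
    {"PERSON_ID": 1, "ENCOUNTER_ID": 11, "SITE": "B"},
]
# Mixed format: one symmetric key and one asymmetric pair.
mixed_keys = ["PERSON_ID", ("VISIT_ID", "ENCOUNTER_ID")]
joined_rows = toy_join(left_rows, right_rows, mixed_keys)
```

Only the first right-hand row satisfies both conditions, so the result has a single row. A key that is neither a string nor a two-element pair is rejected up front rather than producing a silently wrong join.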
to_dict() classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
    dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
    @classmethod
    def to_dict(cls) -> dict:
        """
        Serialize the PhenexTable class configuration (not the data).

        This serializes the class-level attributes that define the table mapping,
        but not the actual ibis table data which cannot be serialized.

        Returns:
            dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
        """
        return {
            "__table_class__": cls.__name__,
            "__module__": cls.__module__,
            "NAME_TABLE": cls.NAME_TABLE,
            "JOIN_KEYS": cls.JOIN_KEYS,
            "KNOWN_FIELDS": cls.KNOWN_FIELDS,
            "DEFAULT_MAPPING": cls.DEFAULT_MAPPING,
            "PATHS": cls.PATHS,
            "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
        }
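A round trip through `to_dict` and `from_dict` might look like the following sketch. `ToyTableConfig` is a hypothetical class with a reduced set of attributes; it mirrors the pattern above, in which only class-level configuration is serialized and `from_dict` hands back the class rather than an instance.

```python
import json


class ToyTableConfig:
    """Hypothetical class carrying only class-level configuration."""

    NAME_TABLE = "CONDITION_OCCURRENCE"
    JOIN_KEYS = {"ToyPersonTable": ["PERSON_ID"]}
    REQUIRED_FIELDS = ["PERSON_ID"]

    @classmethod
    def to_dict(cls) -> dict:
        # Only configuration is serialized; there is no table data to include.
        return {
            "__table_class__": cls.__name__,
            "NAME_TABLE": cls.NAME_TABLE,
            "JOIN_KEYS": cls.JOIN_KEYS,
            "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
        }

    @classmethod
    def from_dict(cls, data: dict):
        # Mirrors the pattern above: return the class itself, not an instance.
        return cls


payload = json.dumps(ToyTableConfig.to_dict())
decoded = json.loads(payload)
restored = ToyTableConfig.from_dict(decoded)
```

Because `from_dict` returns a class, the caller still has to supply a table from a live connection before anything can be queried.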
def__init__(self,table,name=None,column_mapping={}):""" Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING. """ifnotisinstance(table,Table):raiseTypeError(f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table.")self.NAME_TABLE=nameorself.NAME_TABLEself.column_mapping=self._get_column_mapping(column_mapping)self._table=table.mutate(**self._resolve_column_mapping(table,self.column_mapping))forkeyinself.REQUIRED_FIELDS:try:getattr(self._table,key)exceptAttributeError:raiseValueError(f"Required field {key} not defined in COLUMN_MAPPING.")self._add_phenotype_table_relationship()
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
deffilter(self,expr):""" Filter the table by an Ibis Expression or using a PhenExFilter. """input_columns=self.columnsifisinstance(expr,ibis.expr.types.Expr)orisinstance(expr,list):filtered_table=self.table.filter(expr)else:filtered_table=expr.filter(self)returntype(self)(filtered_table.select(input_columns),name=self.NAME_TABLE,column_mapping=self.column_mapping,)
from_dict(data)classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
@classmethoddeffrom_dict(cls,data:dict):""" Reconstruct a PhenexTable class reference from serialized data. Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection. Args: data: Serialized class configuration Returns: The PhenexTable subclass """# The class should already exist in the module, just return itreturncls
join(other,*args,domains=None,**kwargs)
The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
defjoin(self,other:"PhenexTable",*args,domains=None,**kwargs):""" The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS. """ifisinstance(other,Table):returntype(self)(self.table.join(other,*args,**kwargs))ifnotisinstance(other,PhenexTable):raiseTypeError(f"Expected a PhenexTable instance, got {type(other)}")iflen(args):# if user specifies join keys and join type, simply perform join as specifiedreturntype(self)(self.table.join(other.table,*args,**kwargs))# Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary# joined table is the sequentially joined table# current table is the table for the left join in the current iterationjoined_table=current_left_table=selflogger.debug(f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}")forright_table_class_nameinself._find_path(other):# get the next right tableright_table_search_results=[vfork,vindomains.items()ifv.__class__.__name__==right_table_class_name]logger.debug(f"Searching for {right_table_class_name} in domains: {list(domains.keys())}")logger.debug(f"Found {len(right_table_search_results)} matches for {right_table_class_name}")iflen(right_table_search_results)!=1:raiseValueError(f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}")right_table=right_table_search_results[0]print(f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}")# join keys are defined by the left table; in theory should enforce symmetryjoin_keys=current_left_table.JOIN_KEYS[right_table_class_name]# Build join predicate(s) - supports symmetric and asymmetric joins# Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables# Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names# Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]predicates=[]forjoin_keyinjoin_keys:ifisinstance(join_key,str):# 
Symmetric: column exists in both tables with same namepredicates.append(joined_table[join_key]==right_table[join_key])elifisinstance(join_key,(tuple,list))andlen(join_key)==2:# Asymmetric: (left_col, right_col) - different column namesleft_col,right_col=join_keypredicates.append(joined_table[left_col]==right_table[right_col])else:raiseValueError(f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list.")# Combine all predicates with ANDiflen(predicates)==1:join_predicate=predicates[0]else:join_predicate=predicates[0]forpredinpredicates[1:]:join_predicate=join_predicate&predcolumns=list(set(joined_table.columns+right_table.columns))# subset columns, making sure to set type of table to the very left table (self)joined_table=type(self)(joined_table.join(right_table,join_predicate,**kwargs).select(columns))current_left_table=right_tablereturnjoined_table
to_dict()classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
Name
Type
Description
dict
dict
Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
@classmethoddefto_dict(cls)->dict:""" Serialize the PhenexTable class configuration (not the data). This serializes the class-level attributes that define the table mapping, but not the actual ibis table data which cannot be serialized. Returns: dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc. """return{"__table_class__":cls.__name__,"__module__":cls.__module__,"NAME_TABLE":cls.NAME_TABLE,"JOIN_KEYS":cls.JOIN_KEYS,"KNOWN_FIELDS":cls.KNOWN_FIELDS,"DEFAULT_MAPPING":cls.DEFAULT_MAPPING,"PATHS":cls.PATHS,"REQUIRED_FIELDS":cls.REQUIRED_FIELDS,}
def__init__(self,table,name=None,column_mapping={}):""" Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING. """ifnotisinstance(table,Table):raiseTypeError(f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table.")self.NAME_TABLE=nameorself.NAME_TABLEself.column_mapping=self._get_column_mapping(column_mapping)self._table=table.mutate(**self._resolve_column_mapping(table,self.column_mapping))forkeyinself.REQUIRED_FIELDS:try:getattr(self._table,key)exceptAttributeError:raiseValueError(f"Required field {key} not defined in COLUMN_MAPPING.")self._add_phenotype_table_relationship()
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
deffilter(self,expr):""" Filter the table by an Ibis Expression or using a PhenExFilter. """input_columns=self.columnsifisinstance(expr,ibis.expr.types.Expr)orisinstance(expr,list):filtered_table=self.table.filter(expr)else:filtered_table=expr.filter(self)returntype(self)(filtered_table.select(input_columns),name=self.NAME_TABLE,column_mapping=self.column_mapping,)
from_dict(data)classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
@classmethoddeffrom_dict(cls,data:dict):""" Reconstruct a PhenexTable class reference from serialized data. Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection. Args: data: Serialized class configuration Returns: The PhenexTable subclass """# The class should already exist in the module, just return itreturncls
join(other,*args,domains=None,**kwargs)
The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
defjoin(self,other:"PhenexTable",*args,domains=None,**kwargs):""" The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS. """ifisinstance(other,Table):returntype(self)(self.table.join(other,*args,**kwargs))ifnotisinstance(other,PhenexTable):raiseTypeError(f"Expected a PhenexTable instance, got {type(other)}")iflen(args):# if user specifies join keys and join type, simply perform join as specifiedreturntype(self)(self.table.join(other.table,*args,**kwargs))# Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary# joined table is the sequentially joined table# current table is the table for the left join in the current iterationjoined_table=current_left_table=selflogger.debug(f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}")forright_table_class_nameinself._find_path(other):# get the next right tableright_table_search_results=[vfork,vindomains.items()ifv.__class__.__name__==right_table_class_name]logger.debug(f"Searching for {right_table_class_name} in domains: {list(domains.keys())}")logger.debug(f"Found {len(right_table_search_results)} matches for {right_table_class_name}")iflen(right_table_search_results)!=1:raiseValueError(f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}")right_table=right_table_search_results[0]print(f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}")# join keys are defined by the left table; in theory should enforce symmetryjoin_keys=current_left_table.JOIN_KEYS[right_table_class_name]# Build join predicate(s) - supports symmetric and asymmetric joins# Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables# Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names# Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]predicates=[]forjoin_keyinjoin_keys:ifisinstance(join_key,str):# 
Symmetric: column exists in both tables with same namepredicates.append(joined_table[join_key]==right_table[join_key])elifisinstance(join_key,(tuple,list))andlen(join_key)==2:# Asymmetric: (left_col, right_col) - different column namesleft_col,right_col=join_keypredicates.append(joined_table[left_col]==right_table[right_col])else:raiseValueError(f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list.")# Combine all predicates with ANDiflen(predicates)==1:join_predicate=predicates[0]else:join_predicate=predicates[0]forpredinpredicates[1:]:join_predicate=join_predicate&predcolumns=list(set(joined_table.columns+right_table.columns))# subset columns, making sure to set type of table to the very left table (self)joined_table=type(self)(joined_table.join(right_table,join_predicate,**kwargs).select(columns))current_left_table=right_tablereturnjoined_table
to_dict()classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
Name
Type
Description
dict
dict
Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
@classmethoddefto_dict(cls)->dict:""" Serialize the PhenexTable class configuration (not the data). This serializes the class-level attributes that define the table mapping, but not the actual ibis table data which cannot be serialized. Returns: dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc. """return{"__table_class__":cls.__name__,"__module__":cls.__module__,"NAME_TABLE":cls.NAME_TABLE,"JOIN_KEYS":cls.JOIN_KEYS,"KNOWN_FIELDS":cls.KNOWN_FIELDS,"DEFAULT_MAPPING":cls.DEFAULT_MAPPING,"PATHS":cls.PATHS,"REQUIRED_FIELDS":cls.REQUIRED_FIELDS,}
def__init__(self,table,name=None,column_mapping={}):""" Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING. """ifnotisinstance(table,Table):raiseTypeError(f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table.")self.NAME_TABLE=nameorself.NAME_TABLEself.column_mapping=self._get_column_mapping(column_mapping)self._table=table.mutate(**self._resolve_column_mapping(table,self.column_mapping))forkeyinself.REQUIRED_FIELDS:try:getattr(self._table,key)exceptAttributeError:raiseValueError(f"Required field {key} not defined in COLUMN_MAPPING.")self._add_phenotype_table_relationship()
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
deffilter(self,expr):""" Filter the table by an Ibis Expression or using a PhenExFilter. """input_columns=self.columnsifisinstance(expr,ibis.expr.types.Expr)orisinstance(expr,list):filtered_table=self.table.filter(expr)else:filtered_table=expr.filter(self)returntype(self)(filtered_table.select(input_columns),name=self.NAME_TABLE,column_mapping=self.column_mapping,)
from_dict(data)classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
@classmethoddeffrom_dict(cls,data:dict):""" Reconstruct a PhenexTable class reference from serialized data. Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection. Args: data: Serialized class configuration Returns: The PhenexTable subclass """# The class should already exist in the module, just return itreturncls
join(other,*args,domains=None,**kwargs)
The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
defjoin(self,other:"PhenexTable",*args,domains=None,**kwargs):""" The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS. """ifisinstance(other,Table):returntype(self)(self.table.join(other,*args,**kwargs))ifnotisinstance(other,PhenexTable):raiseTypeError(f"Expected a PhenexTable instance, got {type(other)}")iflen(args):# if user specifies join keys and join type, simply perform join as specifiedreturntype(self)(self.table.join(other.table,*args,**kwargs))# Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary# joined table is the sequentially joined table# current table is the table for the left join in the current iterationjoined_table=current_left_table=selflogger.debug(f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}")forright_table_class_nameinself._find_path(other):# get the next right tableright_table_search_results=[vfork,vindomains.items()ifv.__class__.__name__==right_table_class_name]logger.debug(f"Searching for {right_table_class_name} in domains: {list(domains.keys())}")logger.debug(f"Found {len(right_table_search_results)} matches for {right_table_class_name}")iflen(right_table_search_results)!=1:raiseValueError(f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}")right_table=right_table_search_results[0]print(f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}")# join keys are defined by the left table; in theory should enforce symmetryjoin_keys=current_left_table.JOIN_KEYS[right_table_class_name]# Build join predicate(s) - supports symmetric and asymmetric joins# Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables# Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names# Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]predicates=[]forjoin_keyinjoin_keys:ifisinstance(join_key,str):# 
Symmetric: column exists in both tables with same namepredicates.append(joined_table[join_key]==right_table[join_key])elifisinstance(join_key,(tuple,list))andlen(join_key)==2:# Asymmetric: (left_col, right_col) - different column namesleft_col,right_col=join_keypredicates.append(joined_table[left_col]==right_table[right_col])else:raiseValueError(f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list.")# Combine all predicates with ANDiflen(predicates)==1:join_predicate=predicates[0]else:join_predicate=predicates[0]forpredinpredicates[1:]:join_predicate=join_predicate&predcolumns=list(set(joined_table.columns+right_table.columns))# subset columns, making sure to set type of table to the very left table (self)joined_table=type(self)(joined_table.join(right_table,join_predicate,**kwargs).select(columns))current_left_table=right_tablereturnjoined_table
to_dict()classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
Name
Type
Description
dict
dict
Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
@classmethoddefto_dict(cls)->dict:""" Serialize the PhenexTable class configuration (not the data). This serializes the class-level attributes that define the table mapping, but not the actual ibis table data which cannot be serialized. Returns: dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc. """return{"__table_class__":cls.__name__,"__module__":cls.__module__,"NAME_TABLE":cls.NAME_TABLE,"JOIN_KEYS":cls.JOIN_KEYS,"KNOWN_FIELDS":cls.KNOWN_FIELDS,"DEFAULT_MAPPING":cls.DEFAULT_MAPPING,"PATHS":cls.PATHS,"REQUIRED_FIELDS":cls.REQUIRED_FIELDS,}
def__init__(self,table,name=None,column_mapping={}):""" Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING. """ifnotisinstance(table,Table):raiseTypeError(f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table.")self.NAME_TABLE=nameorself.NAME_TABLEself.column_mapping=self._get_column_mapping(column_mapping)self._table=table.mutate(**self._resolve_column_mapping(table,self.column_mapping))forkeyinself.REQUIRED_FIELDS:try:getattr(self._table,key)exceptAttributeError:raiseValueError(f"Required field {key} not defined in COLUMN_MAPPING.")self._add_phenotype_table_relationship()
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
deffilter(self,expr):""" Filter the table by an Ibis Expression or using a PhenExFilter. """input_columns=self.columnsifisinstance(expr,ibis.expr.types.Expr)orisinstance(expr,list):filtered_table=self.table.filter(expr)else:filtered_table=expr.filter(self)returntype(self)(filtered_table.select(input_columns),name=self.NAME_TABLE,column_mapping=self.column_mapping,)
from_dict(data)classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
@classmethoddeffrom_dict(cls,data:dict):""" Reconstruct a PhenexTable class reference from serialized data. Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection. Args: data: Serialized class configuration Returns: The PhenexTable subclass """# The class should already exist in the module, just return itreturncls
join(other,*args,domains=None,**kwargs)
The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
defjoin(self,other:"PhenexTable",*args,domains=None,**kwargs):""" The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS. """ifisinstance(other,Table):returntype(self)(self.table.join(other,*args,**kwargs))ifnotisinstance(other,PhenexTable):raiseTypeError(f"Expected a PhenexTable instance, got {type(other)}")iflen(args):# if user specifies join keys and join type, simply perform join as specifiedreturntype(self)(self.table.join(other.table,*args,**kwargs))# Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary# joined table is the sequentially joined table# current table is the table for the left join in the current iterationjoined_table=current_left_table=selflogger.debug(f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}")forright_table_class_nameinself._find_path(other):# get the next right tableright_table_search_results=[vfork,vindomains.items()ifv.__class__.__name__==right_table_class_name]logger.debug(f"Searching for {right_table_class_name} in domains: {list(domains.keys())}")logger.debug(f"Found {len(right_table_search_results)} matches for {right_table_class_name}")iflen(right_table_search_results)!=1:raiseValueError(f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}")right_table=right_table_search_results[0]print(f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}")# join keys are defined by the left table; in theory should enforce symmetryjoin_keys=current_left_table.JOIN_KEYS[right_table_class_name]# Build join predicate(s) - supports symmetric and asymmetric joins# Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables# Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names# Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]predicates=[]forjoin_keyinjoin_keys:ifisinstance(join_key,str):# 
Symmetric: column exists in both tables with same namepredicates.append(joined_table[join_key]==right_table[join_key])elifisinstance(join_key,(tuple,list))andlen(join_key)==2:# Asymmetric: (left_col, right_col) - different column namesleft_col,right_col=join_keypredicates.append(joined_table[left_col]==right_table[right_col])else:raiseValueError(f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list.")# Combine all predicates with ANDiflen(predicates)==1:join_predicate=predicates[0]else:join_predicate=predicates[0]forpredinpredicates[1:]:join_predicate=join_predicate&predcolumns=list(set(joined_table.columns+right_table.columns))# subset columns, making sure to set type of table to the very left table (self)joined_table=type(self)(joined_table.join(right_table,join_predicate,**kwargs).select(columns))current_left_table=right_tablereturnjoined_table
to_dict()classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
Name
Type
Description
dict
dict
Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
@classmethoddefto_dict(cls)->dict:""" Serialize the PhenexTable class configuration (not the data). This serializes the class-level attributes that define the table mapping, but not the actual ibis table data which cannot be serialized. Returns: dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc. """return{"__table_class__":cls.__name__,"__module__":cls.__module__,"NAME_TABLE":cls.NAME_TABLE,"JOIN_KEYS":cls.JOIN_KEYS,"KNOWN_FIELDS":cls.KNOWN_FIELDS,"DEFAULT_MAPPING":cls.DEFAULT_MAPPING,"PATHS":cls.PATHS,"REQUIRED_FIELDS":cls.REQUIRED_FIELDS,}
def__init__(self,table,name=None,column_mapping={}):""" Instantiate a PhenexTable, possibly overriding NAME_TABLE and COLUMN_MAPPING. """ifnotisinstance(table,Table):raiseTypeError(f"Cannot instantiatiate {self.__class__.__name__} from {type(table)}. Must be ibis Table.")self.NAME_TABLE=nameorself.NAME_TABLEself.column_mapping=self._get_column_mapping(column_mapping)self._table=table.mutate(**self._resolve_column_mapping(table,self.column_mapping))forkeyinself.REQUIRED_FIELDS:try:getattr(self._table,key)exceptAttributeError:raiseValueError(f"Required field {key} not defined in COLUMN_MAPPING.")self._add_phenotype_table_relationship()
filter(expr)
Filter the table by an Ibis Expression or using a PhenExFilter.
deffilter(self,expr):""" Filter the table by an Ibis Expression or using a PhenExFilter. """input_columns=self.columnsifisinstance(expr,ibis.expr.types.Expr)orisinstance(expr,list):filtered_table=self.table.filter(expr)else:filtered_table=expr.filter(self)returntype(self)(filtered_table.select(input_columns),name=self.NAME_TABLE,column_mapping=self.column_mapping,)
from_dict(data)classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot
reconstruct the actual table data without a database connection.
@classmethoddeffrom_dict(cls,data:dict):""" Reconstruct a PhenexTable class reference from serialized data. Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection. Args: data: Serialized class configuration Returns: The PhenexTable subclass """# The class should already exist in the module, just return itreturncls
join(other,*args,domains=None,**kwargs)
The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
def join(self, other: "PhenexTable", *args, domains=None, **kwargs):
    """
    The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
    """
    if isinstance(other, Table):
        return type(self)(self.table.join(other, *args, **kwargs))
    if not isinstance(other, PhenexTable):
        raise TypeError(f"Expected a PhenexTable instance, got {type(other)}")
    if len(args):
        # if user specifies join keys and join type, simply perform join as specified
        return type(self)(self.table.join(other.table, *args, **kwargs))

    # Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary
    # joined table is the sequentially joined table
    # current table is the table for the left join in the current iteration
    joined_table = current_left_table = self
    logger.debug(f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}")
    for right_table_class_name in self._find_path(other):
        # get the next right table
        right_table_search_results = [
            v for k, v in domains.items() if v.__class__.__name__ == right_table_class_name
        ]
        logger.debug(f"Searching for {right_table_class_name} in domains: {list(domains.keys())}")
        logger.debug(f"Found {len(right_table_search_results)} matches for {right_table_class_name}")
        if len(right_table_search_results) != 1:
            raise ValueError(f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}")
        right_table = right_table_search_results[0]
        print(f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}")
        # join keys are defined by the left table; in theory should enforce symmetry
        join_keys = current_left_table.JOIN_KEYS[right_table_class_name]

        # Build join predicate(s) - supports symmetric and asymmetric joins
        # Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables
        # Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names
        # Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]
        predicates = []
        for join_key in join_keys:
            if isinstance(join_key, str):
                # Symmetric: column exists in both tables with same name
                predicates.append(joined_table[join_key] == right_table[join_key])
            elif isinstance(join_key, (tuple, list)) and len(join_key) == 2:
                # Asymmetric: (left_col, right_col) - different column names
                left_col, right_col = join_key
                predicates.append(joined_table[left_col] == right_table[right_col])
            else:
                raise ValueError(f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list.")

        # Combine all predicates with AND
        if len(predicates) == 1:
            join_predicate = predicates[0]
        else:
            join_predicate = predicates[0]
            for pred in predicates[1:]:
                join_predicate = join_predicate & pred

        columns = list(set(joined_table.columns + right_table.columns))
        # subset columns, making sure to set type of table to the very left table (self)
        joined_table = type(self)(joined_table.join(right_table, join_predicate, **kwargs).select(columns))
        current_left_table = right_table
    return joined_table
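The join-key formats accepted by the autojoin loop (symmetric strings, asymmetric pairs, or a mix) can be illustrated without ibis. The following is a minimal sketch, not PhenEx code: `normalize_join_keys` and `rows_match` are hypothetical helpers, the column names other than PERSON_ID are made up, and plain dictionaries stand in for table rows.

```python
from typing import List, Sequence, Tuple, Union

# A join key is either a column name shared by both tables or a
# (left_column, right_column) pair. Hypothetical type alias.
JoinKey = Union[str, Sequence[str]]


def normalize_join_keys(join_keys: Sequence[JoinKey]) -> List[Tuple[str, str]]:
    """Expand every join key into an explicit (left_column, right_column) pair."""
    pairs = []
    for join_key in join_keys:
        if isinstance(join_key, str):
            # Symmetric: the same column name is used on both sides.
            pairs.append((join_key, join_key))
        elif isinstance(join_key, (tuple, list)) and len(join_key) == 2:
            # Asymmetric: the two sides use different column names.
            left_col, right_col = join_key
            pairs.append((left_col, right_col))
        else:
            raise ValueError(
                f"Invalid join key format: {join_key}. "
                "Must be either a string or a 2-element tuple/list."
            )
    return pairs


def rows_match(left_row: dict, right_row: dict, pairs: List[Tuple[str, str]]) -> bool:
    """AND together one equality test per pair, like the combined join predicate."""
    return all(left_row[left] == right_row[right] for left, right in pairs)


# Mixed specification: one symmetric key and one asymmetric key.
pairs = normalize_join_keys(["PERSON_ID", ("VISIT_ID", "VISIT_OCCURRENCE_ID")])
left = {"PERSON_ID": 1, "VISIT_ID": 10}
right = {"PERSON_ID": 1, "VISIT_OCCURRENCE_ID": 10}
matched = rows_match(left, right, pairs)
```

Normalizing every key to a pair first means the comparison step needs only one code path, whereas the method above branches on the key type while building each predicate.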
to_dict() classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping, but not the actual ibis table data, which cannot be serialized.
Returns:
dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
@classmethod
def to_dict(cls) -> dict:
    """
    Serialize the PhenexTable class configuration (not the data).

    This serializes the class-level attributes that define the table mapping,
    but not the actual ibis table data which cannot be serialized.

    Returns:
        dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
    """
    return {
        "__table_class__": cls.__name__,
        "__module__": cls.__module__,
        "NAME_TABLE": cls.NAME_TABLE,
        "JOIN_KEYS": cls.JOIN_KEYS,
        "KNOWN_FIELDS": cls.KNOWN_FIELDS,
        "DEFAULT_MAPPING": cls.DEFAULT_MAPPING,
        "PATHS": cls.PATHS,
        "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
    }
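To show what the serialized configuration might look like in practice, here is a small sketch that applies the same to_dict pattern to a toy class and passes the result through JSON. `ToyTable` and all of its attribute values are hypothetical stand-ins, not a real PhenEx mapper.

```python
import json


class ToyTable:
    """Hypothetical stand-in for a PhenexTable subclass; all values are illustrative."""

    NAME_TABLE = "TOY"
    JOIN_KEYS = {"OtherToyTable": ["PERSON_ID", ("LEFT_ID", "RIGHT_ID")]}
    KNOWN_FIELDS = ["PERSON_ID", "EVENT_DATE"]
    DEFAULT_MAPPING = {"PERSON_ID": "PERSON_ID", "EVENT_DATE": "EVENT_DATE"}
    PATHS = {}
    REQUIRED_FIELDS = ["PERSON_ID"]

    @classmethod
    def to_dict(cls) -> dict:
        # Same keys as the to_dict shown above: class-level configuration only.
        return {
            "__table_class__": cls.__name__,
            "__module__": cls.__module__,
            "NAME_TABLE": cls.NAME_TABLE,
            "JOIN_KEYS": cls.JOIN_KEYS,
            "KNOWN_FIELDS": cls.KNOWN_FIELDS,
            "DEFAULT_MAPPING": cls.DEFAULT_MAPPING,
            "PATHS": cls.PATHS,
            "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
        }


config = ToyTable.to_dict()
encoded = json.dumps(config, sort_keys=True)  # plain data, so JSON can encode it
decoded = json.loads(encoded)
```

One detail worth knowing if the dictionary is stored as JSON: tuples come back as lists, so an asymmetric join key such as `("LEFT_ID", "RIGHT_ID")` is decoded as `["LEFT_ID", "RIGHT_ID"]`. The join method accepts either a tuple or a list for a two-element key, so the decoded form remains a valid join key.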
from_dict(data) classmethod
Reconstruct a PhenexTable class reference from serialized data.
Note: This returns the class itself, not an instance, since we cannot reconstruct the actual table data without a database connection.
@classmethod
def from_dict(cls, data: dict):
    """
    Reconstruct a PhenexTable class reference from serialized data.

    Note: This returns the class itself, not an instance, since we cannot
    reconstruct the actual table data without a database connection.

    Args:
        data: Serialized class configuration

    Returns:
        The PhenexTable subclass
    """
    # The class should already exist in the module, just return it
    return cls
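As the listing shows, from_dict returns `cls` and does not inspect `data`, so the caller must already hold the right class. One possible way to find that class from a serialized dictionary is to use the `"__module__"` and `"__table_class__"` entries that to_dict writes. The sketch below assumes this approach; `resolve_table_class` is a hypothetical helper, not part of PhenEx, and a standard-library class is used so the example runs without PhenEx installed.

```python
import collections
import importlib


def resolve_table_class(data: dict) -> type:
    """Look up a class from the "__module__" and "__table_class__" entries."""
    module = importlib.import_module(data["__module__"])
    return getattr(module, data["__table_class__"])


# Stand-in payload: a standard-library class instead of a PhenEx mapper.
serialized = {"__module__": "collections", "__table_class__": "OrderedDict"}
resolved = resolve_table_class(serialized)
is_same_class = resolved is collections.OrderedDict
```

With a real mapper, the resolved class could then be handed to from_dict, and an instance would still need to be built from an ibis table obtained over a database connection.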
join(other,*args,domains=None,**kwargs)
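Join this table to another table. If other is a raw Ibis Table, or if positional join arguments are given, the join is passed straight to Ibis with those arguments. Otherwise PhenEx performs an autojoin: it follows the table types specified in PATHS from this table to other, looks up each intermediate table in domains, and joins them in sequence using the left table's JOIN_KEYS. An autojoin therefore requires domains to be supplied, and each table type on the path must appear in it exactly once.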
def join(self, other: "PhenexTable", *args, domains=None, **kwargs):
    """
    The join method performs a join of PhenexTables, using autojoin functionality if Phenex is able to find the table types specified in PATHS.
    """
    if isinstance(other, Table):
        return type(self)(self.table.join(other, *args, **kwargs))
    if not isinstance(other, PhenexTable):
        raise TypeError(f"Expected a PhenexTable instance, got {type(other)}")
    if len(args):
        # if user specifies join keys and join type, simply perform join as specified
        return type(self)(self.table.join(other.table, *args, **kwargs))

    # Do an autojoin by finding a path from the left to the right table and sequentially joining as necessary
    # joined table is the sequentially joined table
    # current table is the table for the left join in the current iteration
    joined_table = current_left_table = self
    logger.debug(
        f"Starting autojoin from {self.__class__.__name__} to {other.__class__.__name__}"
    )
    for right_table_class_name in self._find_path(other):
        # get the next right table
        right_table_search_results = [
            v
            for k, v in domains.items()
            if v.__class__.__name__ == right_table_class_name
        ]
        logger.debug(
            f"Searching for {right_table_class_name} in domains: {list(domains.keys())}"
        )
        logger.debug(
            f"Found {len(right_table_search_results)} matches for {right_table_class_name}"
        )
        if len(right_table_search_results) != 1:
            raise ValueError(
                f"Unable to find unqiue {right_table_class_name} required to join {other.__class__.__name__}"
            )
        right_table = right_table_search_results[0]
        print(
            f"\tJoining : {current_left_table.__class__.__name__} to {right_table.__class__.__name__}"
        )
        # join keys are defined by the left table; in theory should enforce symmetry
        join_keys = current_left_table.JOIN_KEYS[right_table_class_name]

        # Build join predicate(s) - supports symmetric and asymmetric joins
        # Symmetric: ["COLUMN"] or ["COL1", "COL2"] - same column names in both tables
        # Asymmetric: [("LEFT_COL", "RIGHT_COL")] - different column names
        # Mixed: ["COL1", ("LEFT_COL", "RIGHT_COL")]
        predicates = []
        for join_key in join_keys:
            if isinstance(join_key, str):
                # Symmetric: column exists in both tables with same name
                predicates.append(joined_table[join_key] == right_table[join_key])
            elif isinstance(join_key, (tuple, list)) and len(join_key) == 2:
                # Asymmetric: (left_col, right_col) - different column names
                left_col, right_col = join_key
                predicates.append(joined_table[left_col] == right_table[right_col])
            else:
                raise ValueError(
                    f"Invalid join key format: {join_key}. Must be either a string or a 2-element tuple/list."
                )

        # Combine all predicates with AND
        if len(predicates) == 1:
            join_predicate = predicates[0]
        else:
            join_predicate = predicates[0]
            for pred in predicates[1:]:
                join_predicate = join_predicate & pred

        columns = list(set(joined_table.columns + right_table.columns))
        # subset columns, making sure to set type of table to the very left table (self)
        joined_table = type(self)(
            joined_table.join(right_table, join_predicate, **kwargs).select(columns)
        )
        current_left_table = right_table
    return joined_table
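The join-key formats accepted during an autojoin (symmetric strings, asymmetric pairs, or a mix) can be sketched in plain Python. This is an illustration only: normalize_join_keys and rows_match are hypothetical helpers that do not exist in PhenEx, the column names are made up, and dict rows stand in for Ibis tables. The validation rule mirrors the one in the join source: each key is either a string or a 2-element tuple or list.

```python
def normalize_join_keys(join_keys):
    """Turn a JOIN_KEYS entry into a list of (left_col, right_col) pairs."""
    pairs = []
    for key in join_keys:
        if isinstance(key, str):
            # Symmetric: the same column name on both sides.
            pairs.append((key, key))
        elif isinstance(key, (tuple, list)) and len(key) == 2:
            # Asymmetric: (left_col, right_col).
            left_col, right_col = key
            pairs.append((left_col, right_col))
        else:
            raise ValueError(
                f"Invalid join key format: {key}. "
                "Must be either a string or a 2-element tuple/list."
            )
    return pairs


def rows_match(left_row, right_row, join_keys):
    """AND together one equality test per key pair."""
    return all(
        left_row[left_col] == right_row[right_col]
        for left_col, right_col in normalize_join_keys(join_keys)
    )


# Hypothetical mixed key list: one symmetric key, one asymmetric pair.
keys = ["PERSON_ID", ("VISIT_OCCURRENCE_ID", "VISIT_ID")]
pairs = normalize_join_keys(keys)

left = {"PERSON_ID": 1, "VISIT_OCCURRENCE_ID": 10}
matching = {"PERSON_ID": 1, "VISIT_ID": 10}
other_visit = {"PERSON_ID": 1, "VISIT_ID": 11}
```

Here rows_match(left, matching, keys) holds because both key pairs agree, while rows_match(left, other_visit, keys) does not, which is the same effect as combining the per-key predicates with `&`.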
to_dict() classmethod
Serialize the PhenexTable class configuration (not the data).
This serializes the class-level attributes that define the table mapping,
but not the actual ibis table data which cannot be serialized.
Returns:
dict: Class configuration with the keys __table_class__, __module__, NAME_TABLE, JOIN_KEYS, KNOWN_FIELDS, DEFAULT_MAPPING, PATHS, and REQUIRED_FIELDS.
@classmethod
def to_dict(cls) -> dict:
    """
    Serialize the PhenexTable class configuration (not the data).

    This serializes the class-level attributes that define the table mapping,
    but not the actual ibis table data which cannot be serialized.

    Returns:
        dict: Class configuration including NAME_TABLE, JOIN_KEYS, DEFAULT_MAPPING, etc.
    """
    return {
        "__table_class__": cls.__name__,
        "__module__": cls.__module__,
        "NAME_TABLE": cls.NAME_TABLE,
        "JOIN_KEYS": cls.JOIN_KEYS,
        "KNOWN_FIELDS": cls.KNOWN_FIELDS,
        "DEFAULT_MAPPING": cls.DEFAULT_MAPPING,
        "PATHS": cls.PATHS,
        "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
    }
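A round trip through to_dict and from_dict might look like the following. This is a minimal sketch with stand-in classes, not the PhenEx implementation: MiniTable and MiniPersonTable are hypothetical, only a subset of the class attributes is included, and the use of json is an assumption about how the dictionary would be stored.

```python
import json


class MiniTable:
    """Hypothetical stand-in carrying a few PhenexTable-style class attributes."""

    NAME_TABLE = "TABLE"
    JOIN_KEYS = {}
    REQUIRED_FIELDS = ["PERSON_ID"]

    @classmethod
    def to_dict(cls) -> dict:
        # Only class-level configuration is serialized, never table data.
        return {
            "__table_class__": cls.__name__,
            "__module__": cls.__module__,
            "NAME_TABLE": cls.NAME_TABLE,
            "JOIN_KEYS": cls.JOIN_KEYS,
            "REQUIRED_FIELDS": cls.REQUIRED_FIELDS,
        }

    @classmethod
    def from_dict(cls, data: dict):
        # As in the source above: the class already exists, so return it.
        return cls


class MiniPersonTable(MiniTable):
    NAME_TABLE = "PERSON"
    JOIN_KEYS = {"MiniVisitTable": ["PERSON_ID"]}


payload = json.dumps(MiniPersonTable.to_dict())  # configuration as a JSON string
config = json.loads(payload)
restored = MiniPersonTable.from_dict(config)     # the class, not an instance
```

Because from_dict is a classmethod that returns cls, it must be called on the class you want back; the serialized data is not used to pick a class. Note also that a JSON round trip turns tuple join keys into lists, which the join method accepts since it allows either a 2-element tuple or list.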