PhenEx Study Tutorial
On this page, we show you how to use PhenEx to:
- Connect to a Snowflake Database
- Work with OMOP data
- Create a simple cohort
- View cohort summary statistics
First, make sure that your PhenEx version is up to date:
# For updating PhenEx to latest released version
# !pip install -Uq PhenEx
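import ibis
# Interactive mode makes Ibis execute expressions and show their results when displayed
ibis.options.interactive = True
import os
# Method 1: set the Snowflake credentials as environment variables directly in the notebook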
os.environ.update({
'SNOWFLAKE_ACCOUNT':'ACCOUNT NAME',
'SNOWFLAKE_WAREHOUSE':'WAREHOUSE NAME',
'SNOWFLAKE_ROLE':'ROLE',
'SNOWFLAKE_USER':'USERNAME',
})
Method 2:
You can also specify these credentials using a dotenv file (https://github.com/motdotla/dotenv). In Python, load_dotenv is provided by the python-dotenv package. One advantage of this approach is that you do not put sensitive credential information into your Jupyter notebook.
from dotenv import load_dotenv
load_dotenv()
False
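If you see True above, Python found your .env file and read at least one variable from it. False, as shown here, means that no variables were loaded, for example because no .env file was found.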
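A .env file is simply a plain-text list of KEY=VALUE lines. The sketch below illustrates what load_dotenv does conceptually: it reads those pairs and copies them into the environment without overwriting variables that are already set. It is simplified and hypothetical. parse_dotenv_text and load_into_environ are our own illustrative helpers, not part of python-dotenv or PhenEx. The real python-dotenv also handles export prefixes, inline comments, and variable expansion, so use it in practice.

```python
import os
from collections.abc import MutableMapping


def parse_dotenv_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines (hypothetical helper, far simpler than python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_into_environ(values: dict, environ: MutableMapping = os.environ, override: bool = False) -> bool:
    """Copy parsed values into `environ`; return True if any values were parsed."""
    for key, value in values.items():
        # By default keep variables that are already set (load_dotenv also defaults to override=False)
        if override or key not in environ:
            environ[key] = value
    return bool(values)


example = """
# Snowflake credentials (placeholders)
SNOWFLAKE_ACCOUNT=ACCOUNT NAME
SNOWFLAKE_USER='USERNAME'
"""
parsed = parse_dotenv_text(example)
env = {}  # a plain dict, so this demo does not modify os.environ
print(load_into_environ(parsed, env))  # True
print(env)  # {'SNOWFLAKE_ACCOUNT': 'ACCOUNT NAME', 'SNOWFLAKE_USER': 'USERNAME'}
```

Passing a plain dict as `environ` keeps the demonstration side-effect free. Calling load_into_environ(parsed) with the default would write to os.environ instead.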
Connect to the database
We will now establish a connection to Snowflake using a SnowflakeConnector, which reads your login credentials from the environment variables set above.
At this point we must define two database locations in Snowflake:
- Source: the Snowflake location from which PhenEx reads its input data
- Destination (dest): the Snowflake location to which PhenEx writes its output. The destination will be created if it does not exist.
Run this cell to connect to these databases. If you're using browser authentication, it will open two browser tabs. After those pages load (wait for them to say completed!), close them and return to this notebook.
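%%capture
# %%capture hides this cell's output (e.g., connection and login messages)
from phenex.ibis_connect import SnowflakeConnector

# Snowflake locations are written as 'DATABASE.SCHEMA'
con = SnowflakeConnector(
    SNOWFLAKE_SOURCE_DATABASE='SOURCE_DATABASE.SOURCE_SCHEMA',  # where PhenEx reads input data
    SNOWFLAKE_DEST_DATABASE='DEST_DATABASE.DEST_SCHEMA',  # where PhenEx writes output tables
)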
Both of these locations can also be specified using environment variables (as we did for the credentials in Method 1 or 2). The reverse also works: credentials can be passed to the connector as keyword arguments rather than being kept in the .env file. However, credentials generally stay the same across projects, while database locations are project-specific. It is therefore best practice to pass the database locations to the connector.
Define input data structure
PhenEx needs to know a little bit about the structure of the input data in order to help us make phenotypes and cohorts.
Specifically, PhenEx needs to know in which table and column to find information such as the patient ID, year of birth, and diagnosis events. This information is generally present in every real-world data (RWD) source, but each source (1) organizes it differently and (2) may use different column names.
When using a new data source, we first need to onboard it for use with PhenEx, that is, tell PhenEx about its table structure and column names. See the tutorial on onboarding a new database for details.
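Conceptually, onboarding records two things for each domain: which source table holds it, and how that table's column names map onto the names PhenEx expects. The plain-Python sketch below illustrates only that idea. The mapping structure, the source names (pts, pat_key, birth_yr), and the helper rename_columns are hypothetical and do not reproduce PhenEx's actual mapper classes. The target names are OMOP-style examples.

```python
# Hypothetical mapping for an imaginary source: domain -> source table and column renames
SOURCE_MAPPING = {
    "PERSON": {
        "source_table": "pts",
        "columns": {"pat_key": "PERSON_ID", "birth_yr": "YEAR_OF_BIRTH"},
    },
}


def rename_columns(rows: list, column_map: dict) -> list:
    """Return rows whose source column names are replaced by the expected names."""
    # Columns missing from column_map keep their original names
    return [{column_map.get(col, col): val for col, val in row.items()} for row in rows]


source_rows = [{"pat_key": 1, "birth_yr": 1950}, {"pat_key": 2, "birth_yr": 1962}]
person = rename_columns(source_rows, SOURCE_MAPPING["PERSON"]["columns"])
print(person)  # [{'PERSON_ID': 1, 'YEAR_OF_BIRTH': 1950}, {'PERSON_ID': 2, 'YEAR_OF_BIRTH': 1962}]
```

Once every source is described this way, downstream logic can refer to PERSON_ID and YEAR_OF_BIRTH regardless of how the original database named them.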
For the purposes of this tutorial, we use OMOP data, which is already onboarded and available in the PhenEx library. All we have to do is import OMOPDomains and get the mapped tables.
from phenex.mappers import OMOPDomains
omop_mapped_tables = OMOPDomains.get_mapped_tables(con)
omop_domains = list(omop_mapped_tables.keys())
omop_domains
['PERSON', 'VISIT_DETAIL', 'CONDITION_OCCURRENCE', 'DEATH', 'PROCEDURE_OCCURRENCE', 'DRUG_EXPOSURE', 'CONDITION_OCCURRENCE_SOURCE', 'PROCEDURE_OCCURRENCE_SOURCE', 'DRUG_EXPOSURE_SOURCE', 'PERSON_SOURCE', 'OBSERVATION_PERIOD']
Looking at input data
PhenEx bundles all input data into a dictionary, here the variable omop_mapped_tables. The keys of this dictionary are called domains, and we access the input data through these domain keys. The value stored under each key is the corresponding table; for example, omop_mapped_tables['PERSON'] holds the person data.
Entry criterion
from phenex.phenotypes.codelist_phenotype import CodelistPhenotype
from phenex.codelists.codelists import Codelist
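af_codelist = Codelist([313217])  # OMOP concept ID for atrial fibrillation
entry = CodelistPhenotype(
    name='af',
    domain='CONDITION_OCCURRENCE',
    codelist=af_codelist,
    use_code_type=False,  # match on code values alone, ignoring the code type
    return_date='first',  # use each patient's first matching event as the index date
)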
entry.execute(omop_mapped_tables)
entry.table.head(5).to_pandas()
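To make the entry criterion concrete, here is a plain-Python sketch of the logic that a codelist phenotype with return_date='first' expresses. It keeps the CONDITION_OCCURRENCE rows whose concept is in the codelist and returns each person's earliest matching date. The toy rows, the OMOP-style column names (PERSON_ID, CONDITION_CONCEPT_ID, CONDITION_START_DATE), and the helper first_event_dates are illustrative assumptions. This is not PhenEx's implementation, which builds Ibis queries against the database.

```python
from datetime import date

# Toy stand-in for the domain dictionary: domain name -> list of row dicts
tables = {
    "CONDITION_OCCURRENCE": [
        {"PERSON_ID": 1, "CONDITION_CONCEPT_ID": 313217, "CONDITION_START_DATE": date(2020, 5, 1)},
        {"PERSON_ID": 1, "CONDITION_CONCEPT_ID": 313217, "CONDITION_START_DATE": date(2019, 3, 2)},
        {"PERSON_ID": 2, "CONDITION_CONCEPT_ID": 999, "CONDITION_START_DATE": date(2021, 1, 9)},
        {"PERSON_ID": 3, "CONDITION_CONCEPT_ID": 313217, "CONDITION_START_DATE": date(2022, 7, 4)},
    ]
}


def first_event_dates(rows, codes, code_col, date_col):
    """Return {person_id: earliest date} over rows whose code is in `codes`."""
    first = {}
    for row in rows:
        if row[code_col] not in codes:
            continue  # not in the codelist
        pid, event_date = row["PERSON_ID"], row[date_col]
        # Keep the earliest matching date seen so far for this person
        if pid not in first or event_date < first[pid]:
            first[pid] = event_date
    return first


index_dates = first_event_dates(
    tables["CONDITION_OCCURRENCE"], {313217}, "CONDITION_CONCEPT_ID", "CONDITION_START_DATE"
)
print(index_dates)  # {1: datetime.date(2019, 3, 2), 3: datetime.date(2022, 7, 4)}
```

Person 2 has no matching code and therefore does not enter the cohort. Person 1's index date is the earlier of their two AF events.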
Inclusions
from phenex.filters.value import Value
from phenex.filters.categorical_filter import CategoricalFilter
from phenex.filters.relative_time_range_filter import RelativeTimeRangeFilter
# Restrict to events on visits whose VISIT_DETAIL_SOURCE_VALUE is '22'
inpatient = CategoricalFilter(
    column_name='VISIT_DETAIL_SOURCE_VALUE',
    allowed_values=['22'],
    domain='VISIT_DETAIL',
)
# Restrict to events less than 90 days before the entry (index) date
preindex = RelativeTimeRangeFilter(max_days=Value('<', 90), anchor_phenotype=entry)
mi_codelist = Codelist([49601007])
mi_emergency_preindex = CodelistPhenotype(
    name='mi_emergency_preindex',
    domain='CONDITION_OCCURRENCE',
    codelist=mi_codelist,
    use_code_type=False,
    return_date='first',
    categorical_filter=inpatient,
    relative_time_range=preindex,
)
mi_emergency_preindex.execute(omop_mapped_tables)
mi_emergency_preindex.table.head(5).to_pandas()
inclusions = [mi_emergency_preindex]
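This inclusion combines three conditions: a matching code, a visit-level category, and a time window anchored on the entry date. The sketch below spells out that logic in plain Python under simplifying assumptions. Condition rows are assumed to carry a VISIT_DETAIL_ID that links them to VISIT_DETAIL, and "less than 90 days before index" is taken to mean 0 <= (index date - event date) < 90 days. The helpers in_preindex_window and inclusion_met and the toy data are hypothetical. PhenEx's own filters may treat window boundaries and joins differently.

```python
from datetime import date

# Toy inputs (hypothetical): index dates from the entry criterion, visits, and conditions
index_dates = {1: date(2019, 3, 2), 3: date(2022, 7, 4)}
visit_detail = {
    10: {"VISIT_DETAIL_SOURCE_VALUE": "22"},
    11: {"VISIT_DETAIL_SOURCE_VALUE": "9"},
}
conditions = [
    # Person 1: matching code, allowed visit value, 30 days before index -> meets the inclusion
    {"PERSON_ID": 1, "CODE": 49601007, "VISIT_DETAIL_ID": 10, "DATE": date(2019, 1, 31)},
    # Person 3: matching code and inside the window, but visit value '9' -> does not
    {"PERSON_ID": 3, "CODE": 49601007, "VISIT_DETAIL_ID": 11, "DATE": date(2022, 6, 1)},
]


def in_preindex_window(event_date, index_date, max_days=90):
    """True if the event is on or before the index date and fewer than max_days before it."""
    return 0 <= (index_date - event_date).days < max_days


def inclusion_met(index_by_person, rows, visits, codes, allowed_values):
    """Person IDs with at least one row meeting the code, visit, and time-window conditions."""
    met = set()
    for row in rows:
        pid = row["PERSON_ID"]
        visit = visits.get(row["VISIT_DETAIL_ID"], {})
        if (
            pid in index_by_person
            and row["CODE"] in codes
            and visit.get("VISIT_DETAIL_SOURCE_VALUE") in allowed_values
            and in_preindex_window(row["DATE"], index_by_person[pid])
        ):
            met.add(pid)
    return met


result = inclusion_met(index_dates, conditions, visit_detail, {49601007}, {"22"})
print(result)  # {1}
```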
Exclusions
exclusions = []
Characteristics
from phenex.phenotypes.age_phenotype import AgePhenotype
age = AgePhenotype(anchor_phenotype=entry)
characteristics = [age]
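To illustrate what an age characteristic measures, the sketch below computes age in completed years at the index date from a date of birth. This is one common convention, assumed here for illustration. AgePhenotype's exact calculation may differ, for example when only YEAR_OF_BIRTH is available, as is common in OMOP. age_at_index is a hypothetical helper.

```python
from datetime import date


def age_at_index(birth_date: date, index_date: date) -> int:
    """Completed years between birth_date and index_date."""
    years = index_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the index year
    if (index_date.month, index_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(age_at_index(date(1950, 6, 15), date(2019, 3, 2)))  # 68
print(age_at_index(date(1950, 1, 1), date(2019, 3, 2)))  # 69
```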
Cohort
from phenex.phenotypes.cohort import Cohort
cohort = Cohort(
    name='af',
    entry_criterion=entry,
    inclusions=inclusions,
    exclusions=exclusions,
    characteristics=characteristics,
)
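cohort.execute(omop_mapped_tables)
# Preview the characteristics (here, age) of the first five patients in the cohort
cohort.characteristics_table.head(5).to_pandas()
# Summary statistics ("Table 1") for the cohort characteristics
cohort.table1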
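Conceptually, the cohort keeps the patients who meet the entry criterion, satisfy every inclusion, and meet none of the exclusions; characteristics are then computed for those patients. The set-based sketch below illustrates that logic with toy person IDs. build_cohort is a hypothetical helper, not the Cohort class's actual implementation, and the toy_* names are chosen so they do not overwrite the notebook's inclusions and exclusions variables.

```python
def build_cohort(entry_ids, inclusion_sets, exclusion_sets):
    """Entry patients who meet all inclusions and none of the exclusions."""
    cohort_ids = set(entry_ids)
    for met in inclusion_sets:
        cohort_ids &= met  # must satisfy every inclusion
    for met in exclusion_sets:
        cohort_ids -= met  # must not meet any exclusion
    return cohort_ids


toy_entry = {1, 3, 5}
toy_inclusions = [{1, 5}]
toy_exclusions = []  # as in this tutorial, no exclusions are defined
print(sorted(build_cohort(toy_entry, toy_inclusions, toy_exclusions)))  # [1, 5]
print(sorted(build_cohort(toy_entry, toy_inclusions, [{5}])))  # [1]
```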