PhenEx Study Tutorial¶

On this page we show you how to use PhenEx to:

Connect to a Snowflake Database
Work with OMOP data
Create a simple cohort
View cohort summary statistics

First, make sure that your PhenEx version is up to date.

In [8]:

# For updating PhenEx to latest released version
# !pip install -Uq PhenEx

In [2]:

import ibis
ibis.options.interactive = True

Set Snowflake Credentials¶

PhenEx needs to connect to a Snowflake backend and therefore needs your login credentials. There are two ways to provide them: (1) explicitly, or (2) using a .env (dotenv) file. We show both, but you only need to do one!

Method 1¶

In [3]:

import os

# authentication
os.environ.update({
    'SNOWFLAKE_ACCOUNT':'ACCOUNT NAME',
    'SNOWFLAKE_WAREHOUSE':'WAREHOUSE NAME',
    'SNOWFLAKE_ROLE':'ROLE',
    'SNOWFLAKE_USER':'USERNAME',
})
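
Before connecting, it can help to confirm that the variables above are actually set. The following is a minimal sketch using only the standard library. `missing_snowflake_vars` is a hypothetical helper, not part of PhenEx, and the list of required names is simply taken from the cell above.

```python
import os

# Names taken from the cell above; adjust if your setup needs others.
REQUIRED_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_USER",
)


def missing_snowflake_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_snowflake_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Snowflake environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching your real environment.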

Method 2¶

You can also specify these using a dotenv file (https://github.com/motdotla/dotenv). One advantage of doing this is that you do not put sensitive credential information into your Jupyter notebook.

In [4]:

from dotenv import load_dotenv
load_dotenv()

Out[4]:

False

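If you see True above, Python was able to find and load your environment file. False, as in the output shown here, means that no environment file was found or that the file contained no variables.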
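
A False result usually means that no .env file could be located. As a debugging aid, the sketch below walks up from a starting directory looking for a file named .env. This is roughly how python-dotenv searches by default, but treat that as an assumption about its behaviour rather than a copy of its code. `find_env_file` is a hypothetical helper.

```python
from pathlib import Path


def find_env_file(start=None, filename=".env"):
    """Return the first `filename` found in `start` or its parents, else None."""
    start_dir = Path(start) if start is not None else Path.cwd()
    start_dir = start_dir.resolve()
    for directory in (start_dir, *start_dir.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


env_path = find_env_file()
print(env_path if env_path else "No .env file found from the current directory upwards")
```

If this prints a path you did not expect, or none at all, check which directory the notebook kernel was started in.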

Connect to the database¶

We will now establish a connection to Snowflake using a SnowflakeConnector; these connectors will use your environment variables (set above) for login credentials.

At this point we must define two databases in Snowflake:

Source: the Snowflake location that PhenEx reads input data from
Destination (dest): the Snowflake location that PhenEx writes output data to. The destination will be created if it does not exist.

Run this cell to connect to these databases; this cell will open up two browser tabs (if you're using browser authentication). After those pages load (wait for them to say completed!), close them and return to this notebook.

In [5]:

%%capture
from phenex.ibis_connect import SnowflakeConnector

con = SnowflakeConnector(
    SNOWFLAKE_SOURCE_DATABASE = 'SCHEMA_SOURCE.DATABASE',
    SNOWFLAKE_DEST_DATABASE = 'SCHEMA_DEST.DATABASE'
)

Note that both of these locations can also be specified using environment variables (as we did for credentials in Method 1 or 2), and vice versa: credentials can be passed to a connector as keyword arguments rather than being kept in the .env file. However, because credentials generally stay the same between projects while database locations are project dependent, it is best practice to define database locations with the connector.
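
The idea that one setting can come either from a keyword argument or from the environment can be sketched as follows. This only illustrates the pattern: `resolve_setting` is a hypothetical helper, and the precedence shown (keyword argument first, then environment) is an assumption rather than a description of how SnowflakeConnector is implemented.

```python
import os


def resolve_setting(name, explicit=None, env=None):
    """Prefer an explicitly passed value, otherwise fall back to the environment."""
    env = os.environ if env is None else env
    value = explicit if explicit is not None else env.get(name)
    if value is None:
        raise KeyError(f"{name} was not passed and is not set in the environment")
    return value


# Placeholder values, as in the cells above.
fake_env = {"SNOWFLAKE_USER": "USERNAME"}
user = resolve_setting("SNOWFLAKE_USER", env=fake_env)
source = resolve_setting(
    "SNOWFLAKE_SOURCE_DATABASE", explicit="SCHEMA_SOURCE.DATABASE", env=fake_env
)
print(user, source)
```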

Define input data structure¶

PhenEx needs to know a little bit about the structure of the input data in order to help us make phenotypes and cohorts.

Concretely, PhenEx needs to know in which table and column to find information such as patient ID, year of birth, and diagnosis events. This information is present in most real-world data (RWD) sources, but each source (1) organizes it differently and (2) may use different column names.
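
To make the idea of a mapping concrete, here is a minimal sketch in plain Python. It does not use PhenEx's own mapper classes: `DomainMapping`, `column_for`, and the second data source (table MEMBERS and its columns) are all invented for illustration. Only the OMOP table and column names in the first source are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainMapping:
    """Where one kind of information lives in a particular data source."""
    table_name: str
    columns: dict  # standard field name -> column name in this source


# Source A follows OMOP naming.
SOURCE_A = {
    "PERSON": DomainMapping(
        "PERSON", {"person_id": "PERSON_ID", "year_of_birth": "YEAR_OF_BIRTH"}
    ),
}

# Source B is a hypothetical layout holding the same information differently.
SOURCE_B = {
    "PERSON": DomainMapping(
        "MEMBERS", {"person_id": "MEMBER_KEY", "year_of_birth": "BIRTH_YR"}
    ),
}


def column_for(mappings, domain, field):
    """Look up the source-specific column name for a standard field."""
    return mappings[domain].columns[field]


for source in (SOURCE_A, SOURCE_B):
    print(source["PERSON"].table_name, column_for(source, "PERSON", "year_of_birth"))
```

Code written against the standard field names can then run unchanged on either source.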

When using a new data source, we need to onboard that database for usage with PhenEx (tell it about table structure and column names). Go to the tutorial on onboarding a new database to learn how to onboard a database.

For the purposes of this tutorial, we will be using OMOP data, which is already onboarded and available in the PhenEx library. All we have to do is import the OMOPDomains and then get the mapped tables.

In [6]:

from phenex.mappers import OMOPDomains
omop_mapped_tables = OMOPDomains.get_mapped_tables(con)
omop_domains = list(omop_mapped_tables.keys())
omop_domains

Out[6]:

['PERSON',
 'VISIT_DETAIL',
 'CONDITION_OCCURRENCE',
 'DEATH',
 'PROCEDURE_OCCURRENCE',
 'DRUG_EXPOSURE',
 'CONDITION_OCCURRENCE_SOURCE',
 'PROCEDURE_OCCURRENCE_SOURCE',
 'DRUG_EXPOSURE_SOURCE',
 'PERSON_SOURCE',
 'OBSERVATION_PERIOD']

Looking at input data¶

PhenEx bundles all input data into a dictionary, here stored in the variable omop_mapped_tables. The keys of this dictionary are known as 'domains', and we access the input data by these domain keys. The value for each key is the actual table.
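
The access pattern looks like this. The sketch uses a stand-in dictionary with invented rows so that it runs without a database connection. In the notebook you would index omop_mapped_tables the same way, and the values would be table objects rather than lists.

```python
# Stand-in for omop_mapped_tables: plain lists of rows replace the real tables.
stand_in_tables = {
    "PERSON": [{"PERSON_ID": 1, "YEAR_OF_BIRTH": 1950}],
    "CONDITION_OCCURRENCE": [
        {"PERSON_ID": 1, "CONDITION_CONCEPT_ID": 313217},
    ],
}

domains = list(stand_in_tables.keys())   # the available domains
person_rows = stand_in_tables["PERSON"]  # look up a table by its domain key
print(domains)
print(person_rows[0]["YEAR_OF_BIRTH"])
```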

Entry criterion¶

In [ ]:

from phenex.phenotypes.codelist_phenotype import CodelistPhenotype
from phenex.codelists.codelists import Codelist

af_codelist = Codelist([313217])
entry = CodelistPhenotype(
    name='af',
    domain='CONDITION_OCCURRENCE',
    codelist=af_codelist,
    use_code_type=False,
    return_date='first',
)

In [ ]:

entry.execute(omop_mapped_tables)
entry.table.head(5).to_pandas()
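
Conceptually, a codelist phenotype with return_date='first' keeps the events whose code is in the codelist and reports the earliest matching date for each patient. The following pure-Python sketch shows that idea on invented rows. It is not PhenEx's implementation, and `first_event_dates` is a hypothetical helper.

```python
from datetime import date

# Invented condition events: (person_id, concept_id, event_date).
events = [
    (1, 313217, date(2020, 5, 1)),
    (1, 313217, date(2019, 3, 10)),
    (2, 999999, date(2021, 1, 1)),  # code not in the codelist
    (3, 313217, date(2022, 7, 4)),
]
codelist = {313217}


def first_event_dates(rows, codes):
    """Map each person with a matching code to their earliest matching date."""
    first = {}
    for person_id, concept_id, event_date in rows:
        if concept_id not in codes:
            continue
        if person_id not in first or event_date < first[person_id]:
            first[person_id] = event_date
    return first


index_dates = first_event_dates(events, codelist)
print(index_dates)
```

Patient 2 has no matching code and is therefore absent from the result, and patient 1 gets the earlier of their two dates.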

Inclusions¶

In [ ]:

from phenex.filters.value import Value
from phenex.filters.categorical_filter import CategoricalFilter
from phenex.filters.relative_time_range_filter import RelativeTimeRangeFilter

# Keep only events from visits whose source value is '22'
inpatient = CategoricalFilter(
    column_name='VISIT_DETAIL_SOURCE_VALUE',
    allowed_values=['22'],
    domain='VISIT_DETAIL'
)

# Keep only events fewer than 90 days from the entry (index) date
preindex = RelativeTimeRangeFilter(max_days=Value('<', 90), anchor_phenotype=entry)

mi_codelist = Codelist([49601007])
mi_emergency_preindex = CodelistPhenotype(
    name='mi',
    domain='CONDITION_OCCURRENCE',
    codelist=mi_codelist,
    use_code_type=False,
    return_date='first',
    categorical_filter=inpatient,
    relative_time_range=preindex
)

In [ ]:

mi_emergency_preindex.execute(omop_mapped_tables)
mi_emergency_preindex.table.head(5).to_pandas()
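
The time window used above can be pictured with a small date calculation. The sketch assumes the filter keeps events from 0 up to, but not including, 90 days before the anchor date. The lower bound of 0 days and the "before" direction are assumptions about the filter's defaults, and `within_preindex_window` is a hypothetical helper.

```python
from datetime import date


def within_preindex_window(event_date, index_date, max_days=90):
    """True if the event falls 0 to max_days - 1 days before the index date."""
    days_before = (index_date - event_date).days
    return 0 <= days_before < max_days


index_date = date(2020, 6, 1)
print(within_preindex_window(date(2020, 5, 1), index_date))   # 31 days before
print(within_preindex_window(date(2020, 1, 1), index_date))   # 152 days before
print(within_preindex_window(date(2020, 6, 15), index_date))  # after the index date
```

Only the first event falls inside the window. The second is too early and the third occurs after the index date.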

In [ ]:

inclusions = [mi_emergency_preindex]

Exclusions¶

In [ ]:

exclusions = []

Characteristics¶

In [ ]:

from phenex.phenotypes.age_phenotype import AgePhenotype

age = AgePhenotype(anchor_phenotype=entry)
characteristics = [age]

Cohort¶

In [ ]:

from phenex.phenotypes.cohort import Cohort

cohort = Cohort(
    name = 'af',
    entry_criterion=entry,
    inclusions=inclusions,
    exclusions=exclusions,
    characteristics=characteristics
)

In [ ]:

cohort.execute(omop_mapped_tables)
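
As a mental model, a cohort can be thought of as the patients who meet the entry criterion, satisfy every inclusion, and match no exclusion. The set-based sketch below illustrates that logic with invented patient IDs. It is a simplification under that assumption, not PhenEx's implementation, and `select_cohort` is a hypothetical helper.

```python
def select_cohort(entry_ids, inclusion_sets, exclusion_sets):
    """Patients in entry_ids who meet every inclusion and no exclusion."""
    selected = set(entry_ids)
    for included in inclusion_sets:
        selected &= set(included)
    for excluded in exclusion_sets:
        selected -= set(excluded)
    return selected


entry_patient_ids = {1, 2, 3, 4}
inclusion_patient_ids = [{1, 2, 3}]  # one inclusion criterion
exclusion_patient_ids = []           # no exclusions, as in this tutorial

cohort_ids = select_cohort(
    entry_patient_ids, inclusion_patient_ids, exclusion_patient_ids
)
print(sorted(cohort_ids))
```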

In [ ]:

cohort.characteristics_table.head(5).to_pandas()

In [ ]:

cohort.table1
