Codelist
Codelist is a class for conveniently working with the medical codes used in real-world data (RWD) analyses. A Codelist represents a single, specific medical concept, such as 'atrial fibrillation' or 'myocardial infarction'. It is associated with a set of medical codes from one or more source vocabularies (such as ICD10CM or CPT); we call these vocabularies 'code types'. The code type matters because nothing guarantees that codes from different vocabularies are distinct: the same code string can appear under two code types with different meanings. It is therefore highly recommended to always specify the code type when using a codelist.
Codelist is a simple class that stores the codelist as a dictionary keyed by code type, where each value is a list of codes. It also provides convenience methods for reading from Excel, CSV, or YAML files and for exporting to Excel files.
Fuzzy codelists allow the use of '%' as a wildcard character in codes. This can be useful when you want to match a range of codes that share a common prefix. For example, 'I48.%' will match any code that starts with 'I48.'. Multiple fuzzy matches can be passed just like ordinary codes in a list.
If a codelist contains more than 100 fuzzy codes, a warning will be issued as performance may suffer significantly.
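To make the wildcard semantics concrete, the following is a minimal sketch of '%' matching in plain Python. It is an illustration only, not PhenEx's implementation (which may, for instance, delegate matching to the database); the helper names `fuzzy_to_regex` and `matches_any` are hypothetical.

```python
import re
from typing import List, Pattern


def fuzzy_to_regex(pattern: str) -> Pattern[str]:
    """Translate a code pattern containing '%' wildcards into an anchored regex."""
    # Escape each literal piece so that '.' in a code is matched literally,
    # then join the pieces with '.*' (any run of characters) in place of '%'.
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile("^" + ".*".join(pieces) + "$")


def matches_any(code: str, patterns: List[str]) -> bool:
    """Return True if the code matches at least one exact or fuzzy pattern."""
    return any(fuzzy_to_regex(p).match(code) is not None for p in patterns)


patterns = ["I48.%", "427.31"]  # one fuzzy code and one ordinary code
print(matches_any("I48.91", patterns))  # True
print(matches_any("I49.0", patterns))   # False
print(matches_any("427.31", patterns))  # True
```

Each fuzzy code adds one more pattern to test per candidate code, which gives an intuition for why a large number of fuzzy codes can slow matching down.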
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | Optional[str] | Descriptive name of the codelist. | None |
codelist | Union[str, List, Dict[str, List]] | The codelist, given as a string, a list of strings, or a dictionary keyed by code type. In the first two cases, the class converts the input to a dictionary with the single key None, so all consumers of a Codelist instance can assume the dictionary format. | required |
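The input normalization described for the codelist parameter can be sketched as follows. This is a hedged illustration of the documented behavior, not the class's actual code; the function name `normalize_codelist` is hypothetical.

```python
from typing import Dict, List, Optional, Union

CodelistInput = Union[str, List[str], Dict[Optional[str], List[str]]]


def normalize_codelist(codelist: CodelistInput) -> Dict[Optional[str], List[str]]:
    """Return a dictionary keyed by code type, whatever form the input took."""
    if isinstance(codelist, str):
        # A single code with no code type.
        return {None: [codelist]}
    if isinstance(codelist, list):
        # Several codes with no code type.
        return {None: list(codelist)}
    if isinstance(codelist, dict):
        # Already keyed by code type; copy the lists so the input is not shared.
        return {code_type: list(codes) for code_type, codes in codelist.items()}
    raise TypeError(f"Unsupported codelist input: {type(codelist).__name__}")


print(normalize_codelist("I48.0"))               # {None: ['I48.0']}
print(normalize_codelist(["x", "y", "z"]))       # {None: ['x', 'y', 'z']}
print(normalize_codelist({"ICD-10": ["I48.0"]}))  # {'ICD-10': ['I48.0']}
```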
Methods:
Name | Description |
---|---|
from_yaml | Load a codelist from a YAML file. |
from_excel | Load a codelist from an Excel file. |
from_csv | Load a codelist from a CSV file. |
File Formats
YAML: The YAML file should contain a dictionary where the keys are code types (e.g., "ICD-9", "ICD-10") and the values are lists of codes for each type.
Example:
ICD-9:
- "427.31" # Atrial fibrillation
ICD-10:
- "I48.0" # Paroxysmal atrial fibrillation
- "I48.1" # Persistent atrial fibrillation
- "I48.2" # Chronic atrial fibrillation
- "I48.91" # Unspecified atrial fibrillation
Excel: The Excel file should contain a minimum of two columns for code and code_type. If multiple codelists exist in the same table, an additional column for codelist names is required.
Example (Single codelist):
| code_type | code |
|-----------|--------|
| ICD-9 | 427.31 |
| ICD-10 | I48.0 |
| ICD-10 | I48.1 |
| ICD-10 | I48.2 |
| ICD-10 | I48.91 |
Example (Multiple codelists):
| code_type | code | codelist |
|-----------|--------|--------------------|
| ICD-9 | 427.31 | atrial_fibrillation|
| ICD-10 | I48.0 | atrial_fibrillation|
| ICD-10 | I48.1 | atrial_fibrillation|
| ICD-10 | I48.2 | atrial_fibrillation|
| ICD-10 | I48.91 | atrial_fibrillation|
CSV: The CSV file should follow the same format as the Excel file, with columns for code, code_type, and optionally codelist names.
Example:
# Initialize with a list
from phenex.codelists.codelists import Codelist

cl = Codelist(
    ['x', 'y', 'z'],
    'mycodelist'
)
print(cl.codelist)
{None: ['x', 'y', 'z']}
Example:
# Initialize with a dictionary
atrial_fibrillation_icd_codes = {
    "ICD-9": [
        "427.31"  # Atrial fibrillation
    ],
    "ICD-10": [
        "I48.0",  # Paroxysmal atrial fibrillation
        "I48.1",  # Persistent atrial fibrillation
        "I48.2",  # Chronic atrial fibrillation
        "I48.91",  # Unspecified atrial fibrillation
    ]
}
cl = Codelist(
    atrial_fibrillation_icd_codes,
    'atrial_fibrillation',
)
print(cl.codelist)
{
    "ICD-9": [
        "427.31"
    ],
    "ICD-10": [
        "I48.0",
        "I48.1",
        "I48.2",
        "I48.91",
    ]
}
# Initialize with a fuzzy codelist
# (all code types go in a single dictionary, one key per code type)
anemia = Codelist(
    {
        'ICD10CM': ['D55%', 'D56%', 'D57%', 'D58%', 'D59%', 'D60%'],
        'ICD9CM': ['284%', '285%', '282%'],
    },
    'fuzzy_codelist'
)
Source code in phenex/codelists/codelists.py
from_excel(path, sheet_name=None, codelist_name=None, code_column='code', code_type_column='code_type', codelist_column='codelist')
classmethod
Load a single codelist located in an Excel file.
The Excel file must contain at least two columns, one for the code and one for the code type. The actual column names can be specified using the code_column and code_type_column parameters.
If multiple codelists exist in the same Excel table, codelist_column and codelist_name are required to select the specific codelist of interest.
If the codelist is in a specific sheet, specify that sheet with sheet_name.
- Single table, single codelist: The table (whether an entire Excel file, or a single sheet in an Excel file) contains only one codelist. The table should have columns for code and code_type.
- Single table, multiple codelists: A single table (whether an entire file, or a single sheet in an Excel file) contains multiple codelists. A column for the name of each codelist is required. Use codelist_name to point to the specific codelist of interest.
Parameters:
Name | Description | Default |
---|---|---|
path | Path to the Excel file. | required |
sheet_name | An optional label for the sheet to read from. If defined, the codelist is taken from that sheet; otherwise the first sheet is used. | None |
codelist_name | An optional name of the codelist to extract. If defined, codelist_column must be present and codelist_name must occur within codelist_column. | None |
code_column | The name of the column containing the codes. | 'code' |
code_type_column | The name of the column containing the code types. | 'code_type' |
codelist_column | The name of the column containing the codelist names. | 'codelist' |
Returns:
Type | Description |
---|---|
Codelist | Codelist instance. |
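The two table layouts can be illustrated with a small sketch that works on rows already read into memory. It is an assumption-laden illustration of the selection logic, not the body of from_excel; the function name `select_codelist` and the second codelist in the sample rows are hypothetical.

```python
from typing import Dict, List, Optional


def select_codelist(
    rows: List[Dict[str, str]],
    codelist_name: Optional[str] = None,
    code_column: str = "code",
    code_type_column: str = "code_type",
    codelist_column: str = "codelist",
) -> Dict[str, List[str]]:
    """Group the rows of one codelist into a dictionary keyed by code type."""
    if codelist_name is not None:
        # Multiple codelists in one table: keep only the requested one.
        rows = [row for row in rows if row[codelist_column] == codelist_name]
        if not rows:
            raise ValueError(f"Codelist {codelist_name!r} not found in {codelist_column!r}")
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        grouped.setdefault(row[code_type_column], []).append(row[code_column])
    return grouped


rows = [
    {"code_type": "ICD-9", "code": "427.31", "codelist": "atrial_fibrillation"},
    {"code_type": "ICD-10", "code": "I48.0", "codelist": "atrial_fibrillation"},
    {"code_type": "ICD-10", "code": "I48.1", "codelist": "atrial_fibrillation"},
    {"code_type": "ICD-10", "code": "I21.9", "codelist": "myocardial_infarction"},
]
print(select_codelist(rows, codelist_name="atrial_fibrillation"))
# {'ICD-9': ['427.31'], 'ICD-10': ['I48.0', 'I48.1']}
```

Without codelist_name, every row is treated as belonging to the same codelist, which corresponds to the single-table, single-codelist case.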
Source code in phenex/codelists/codelists.py
from_medconb(codelist)
classmethod
Converts a MedConB style Codelist into a PhenEx style codelist.
Source code in phenex/codelists/codelists.py
from_yaml(path)
classmethod
Load a codelist from a YAML file.
The YAML file should contain a dictionary where the keys are code types (e.g., "ICD-9", "ICD-10") and the values are lists of codes for each type.
Example:
ICD-9:
- "427.31" # Atrial fibrillation
ICD-10:
- "I48.0" # Paroxysmal atrial fibrillation
- "I48.1" # Persistent atrial fibrillation
- "I48.2" # Chronic atrial fibrillation
- "I48.91" # Unspecified atrial fibrillation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the YAML file. | required |
Returns:
Type | Description |
---|---|
Codelist | Codelist instance. |
Source code in phenex/codelists/codelists.py
resolve(use_code_type=True, remove_punctuation=False)
Resolve the codelist based on the provided arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_code_type | bool | If False, merge all the code lists into one with None as the key. | True |
remove_punctuation | bool | If True, remove '.' from all codes. | False |
Returns:
Type | Description |
---|---|
Codelist | Codelist instance with the resolved codelist. |
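The effect of the two flags can be sketched on a plain dictionary. This is an illustrative approximation under the documented semantics, not the method's source; in particular, whether the real method deduplicates merged codes is not stated here, and this sketch does not.

```python
from typing import Dict, List, Optional


def resolve_sketch(
    codelist: Dict[Optional[str], List[str]],
    use_code_type: bool = True,
    remove_punctuation: bool = False,
) -> Dict[Optional[str], List[str]]:
    """Apply the documented resolve options to a dictionary keyed by code type."""
    resolved: Dict[Optional[str], List[str]] = {}
    for code_type, codes in codelist.items():
        # With use_code_type=False, every code lands under the single key None.
        key = code_type if use_code_type else None
        for code in codes:
            if remove_punctuation:
                code = code.replace(".", "")
            resolved.setdefault(key, []).append(code)
    return resolved


af = {"ICD-9": ["427.31"], "ICD-10": ["I48.0", "I48.91"]}
print(resolve_sketch(af, remove_punctuation=True))
# {'ICD-9': ['42731'], 'ICD-10': ['I480', 'I4891']}
print(resolve_sketch(af, use_code_type=False))
# {None: ['427.31', 'I48.0', 'I48.91']}
```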
Source code in phenex/codelists/codelists.py
to_pandas()
Export the codelist to a pandas DataFrame. The DataFrame will have three columns: code_type, code, and codelist.
Source code in phenex/codelists/codelists.py
to_tuples()
Convert the codelist to a list of tuples, where each tuple is of the form (code_type, code).
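As a rough illustration of the (code_type, code) form, the sketch below flattens a dictionary keyed by code type and uses the result for a type-aware membership check. The function name `to_tuples_sketch` is hypothetical, and the ordering of the real method's output is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def to_tuples_sketch(
    codelist: Dict[Optional[str], List[str]],
) -> List[Tuple[Optional[str], str]]:
    """Flatten a dictionary keyed by code type into (code_type, code) pairs."""
    return [
        (code_type, code)
        for code_type, codes in codelist.items()
        for code in codes
    ]


af = {"ICD-9": ["427.31"], "ICD-10": ["I48.0", "I48.1"]}
pairs = to_tuples_sketch(af)
print(pairs)
# [('ICD-9', '427.31'), ('ICD-10', 'I48.0'), ('ICD-10', 'I48.1')]

# Pairing each code with its code type avoids matching the same code string
# from a different vocabulary.
lookup = set(pairs)
print(("ICD-10", "I48.0") in lookup)  # True
print(("ICD-9", "I48.0") in lookup)   # False
```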
Source code in phenex/codelists/codelists.py
LocalCSVCodelistFactory
LocalCSVCodelistFactory allows for the creation of multiple codelists from a single CSV file. Use this class when you have a single CSV file that contains multiple codelists.
To use, create an instance of the class and then call the create_codelist method with the name of the codelist you want to create; this codelist name must be an entry in the column named by name_codelist_column.
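The pattern of reading one CSV and serving many codelists from it can be sketched with the standard library. This is not the LocalCSVCodelistFactory source: the class name `CSVCodelistFactorySketch` is hypothetical, it accepts an open file object rather than a path so the example runs without touching disk, it returns plain dictionaries rather than Codelist instances, and the sample rows are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


class CSVCodelistFactorySketch:
    """Read a multi-codelist CSV once, then build codelists by name."""

    def __init__(
        self,
        csv_file: TextIO,
        name_code_column: str = "code",
        name_codelist_column: str = "codelist",
        name_code_type_column: str = "code_type",
    ) -> None:
        # Maps codelist name -> code type -> list of codes.
        self._codelists: Dict[str, Dict[str, List[str]]] = defaultdict(
            lambda: defaultdict(list)
        )
        for row in csv.DictReader(csv_file):
            name = row[name_codelist_column]
            code_type = row[name_code_type_column]
            self._codelists[name][code_type].append(row[name_code_column])

    def create_codelist(self, name: str) -> Dict[str, List[str]]:
        if name not in self._codelists:
            raise ValueError(f"Codelist {name!r} not found in the CSV")
        return {ct: list(codes) for ct, codes in self._codelists[name].items()}


csv_text = (
    "code_type,code,codelist\n"
    "ICD-9,427.31,atrial_fibrillation\n"
    "ICD-10,I48.0,atrial_fibrillation\n"
    "ICD-10,I21.9,myocardial_infarction\n"
)
factory = CSVCodelistFactorySketch(io.StringIO(csv_text))
print(factory.create_codelist("atrial_fibrillation"))
# {'ICD-9': ['427.31'], 'ICD-10': ['I48.0']}
```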
Source code in phenex/codelists/codelists.py
__init__(path, name_code_column='code', name_codelist_column='codelist', name_code_type_column='code_type')
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the CSV file. | required |
name_code_column | str | The name of the column containing the codes. | 'code' |
name_codelist_column | str | The name of the column containing the codelist names. | 'codelist' |
name_code_type_column | str | The name of the column containing the code types. | 'code_type' |