Available Rules¶
Below are a list of available rules that can be attached to a SandPaper instance.
All of these rules first must pass several optional filters discussed in Rule Filters.
In the following examples of these rules the symbol □ represents whitespace.
Value Rules¶
These rules are applied to every value that passes the specified rule filters documented in Rule Filters.
lower()¶
A basic rule that lowercases the text in a value.
SandPaper().lower()
| Input | Output |
|---|---|
| DATa | data |
upper()¶
A basic rule that uppercases the text in a value.
SandPaper().upper()
| Input | Output |
|---|---|
| daTa | DATA |
capitalize()¶
A basic rule that capitalizes the text in a value.
SandPaper().capitalize()
| Input | Output |
|---|---|
| daTa | Data |
title()¶
A basic rule that titlecases the text in a value.
SandPaper().title()
| Input | Output |
|---|---|
| mY dAta | My Data |
lstrip()¶
A basic rule that strips all left whitespace from a value.
SandPaper().lstrip()
| Input | Output |
|---|---|
| □□data | data |
rstrip()¶
A basic rule that strips all right whitespace from a value.
SandPaper().rstrip()
| Input | Output |
|---|---|
| data□□ | data |
strip()¶
A basic rule that strips all whitespace from a value.
SandPaper().strip()
| Input | Output |
|---|---|
| □data□ | data |
translate_text()¶
A translation rule that translate regex matches to a specified format.
SandPaper().translate_text({
r'group_(?P<group_id>\d+)$': '{group_id}'
})
| Input | Output |
|---|---|
| group_47 | 47 |
| group_123 | 123 |
| group_0 | 0 |
translate_date()¶
A translation rule that translate greedily evaluated dates to a specified datetime format.
Note
This rule is very greedy and can potentailly evaluate dates incorrectly.
It is highly recommended that at the very least a column_filter is supplied with this rule.
SandPaper().translate_date({
'%Y-%m-%d': '%Y',
'%Y-%m': '%Y',
'%Y': '%Y'
})
| Input | Output |
|---|---|
| 2017-01-32 | 2017 |
| 2017-01 | 2017 |
| 2017 | 2017 |
Record Rules¶
These rules are applied to every record regardless to most rule filters documented in Rule Filters.
add_columns()¶
Adds a column to every record.
The given dictionary should be a key value pairing where the key is a new column name and the paired value is either a callable, string, or other low level data type for the newly added column’s value. If the value is a callable it should expect to receive the record as the only parameter and should return the value desired for the newly added column.
import uuid
def gen_uuid(record):
return uuid.uuid4()
SandPaper().add_columns({
'uuid': gen_uuid
})
| Before | ||
|---|---|---|
| id | name | value |
| 1 | hello | world |
| 2 | test | table |
| After | |||
|---|---|---|---|
| id | name | value | uuid |
| 1 | hello | world | a6a76896-c33d-4654-afdf-12aa80dd6238 |
| 2 | test | table | b1e171c2-fee9-4270-96e9-4853c3a6e130 |
remove_columns()¶
Removes a column from every record.
SandPaper().remove_columns([
'name'
])
| Before | ||
|---|---|---|
| id | name | value |
| 1 | hello | world |
| 2 | test | table |
| After | |
|---|---|
| id | value |
| 1 | world |
| 2 | table |
keep_columns()¶
Removes all other columns for every record.
SandPaper().keep_columns([
'id',
'name'
])
| Before | ||
|---|---|---|
| id | name | value |
| 1 | hello | world |
| 2 | test | table |
| After | |
|---|---|
| id | name |
| 1 | hello |
| 2 | test |
rename_columns()¶
Renames a column for every record.
SandPaper().rename_columns([
'old_name': 'new_name'
])
| Before | ||
|---|---|---|
| id | old_name | value |
| 1 | hello | world |
| 2 | test | table |
| After | ||
|---|---|---|
| id | new_name | value |
| 1 | hello | world |
| 2 | test | table |
order_columns()¶
Reorders columns from every record.
SandPaper().order_columns([
'value',
'name',
'id'
])
| Before | ||
|---|---|---|
| id | name | value |
| 1 | hello | world |
| 2 | test | table |
| After | ||
|---|---|---|
| value | id | name |
| world | 1 | hello |
| table | 2 | test |