Available Rules¶

Below are a list of available rules that can be attached to a SandPaper instance. All of these rules first must pass several optional filters discussed in Rule Filters.

In the following examples of these rules the symbol □ represents whitespace.

Value Rules¶

These rules are applied to every value that passes the specified rule filters documented in Rule Filters.

`lower()`¶

A basic rule that lowercases the text in a value.

SandPaper().lower()

Input	Output
DATa	data

`upper()`¶

A basic rule that uppercases the text in a value.

SandPaper().upper()

Input	Output
daTa	DATA

`capitalize()`¶

A basic rule that capitalizes the text in a value.

SandPaper().capitalize()

Input	Output
daTa	Data

`title()`¶

A basic rule that titlecases the text in a value.

SandPaper().title()

Input	Output
mY dAta	My Data

`lstrip()`¶

A basic rule that strips all left whitespace from a value.

SandPaper().lstrip()

Input	Output
□□data	data

`rstrip()`¶

A basic rule that strips all right whitespace from a value.

SandPaper().rstrip()

Input	Output
data□□	data

`strip()`¶

A basic rule that strips all whitespace from a value.

SandPaper().strip()

Input	Output
□data□	data

`translate_text()`¶

A translation rule that translate regex matches to a specified format.

SandPaper().translate_text({
   r'group_(?P<group_id>\d+)$': '{group_id}'
})

Input	Output
group_47	47
group_123	123
group_0	0

`translate_date()`¶

A translation rule that translate greedily evaluated dates to a specified datetime format.

Note

This rule is very greedy and can potentailly evaluate dates incorrectly. It is highly recommended that at the very least a column_filter is supplied with this rule.

SandPaper().translate_date({
   '%Y-%m-%d': '%Y',
   '%Y-%m': '%Y',
   '%Y': '%Y'
})

Input	Output
2017-01-32	2017
2017-01	2017
2017	2017

Record Rules¶

These rules are applied to every record regardless to most rule filters documented in Rule Filters.

The given dictionary should be a key value pairing where the key is a new column name and the paired value is either a callable, string, or other low level data type for the newly added column’s value. If the value is a callable it should expect to receive the record as the only parameter and should return the value desired for the newly added column.

import uuid

def gen_uuid(record):
   return uuid.uuid4()

SandPaper().add_columns({
   'uuid': gen_uuid
})

Before
id	name	value
1	hello	world
2	test	table

After
id	name	value	uuid
1	hello	world	a6a76896-c33d-4654-afdf-12aa80dd6238
2	test	table	b1e171c2-fee9-4270-96e9-4853c3a6e130

`remove_columns()`¶

Removes a column from every record.

SandPaper().remove_columns([
   'name'
])

Before
id	name	value
1	hello	world
2	test	table

After
id	value
1	world
2	table

`keep_columns()`¶

Removes all other columns for every record.

SandPaper().keep_columns([
   'id',
   'name'
])

Before
id	name	value
1	hello	world
2	test	table

After
id	name
1	hello
2	test

`rename_columns()`¶

Renames a column for every record.

SandPaper().rename_columns([
   'old_name': 'new_name'
])

Before
id	old_name	value
1	hello	world
2	test	table

After
id	new_name	value
1	hello	world
2	test	table

`order_columns()`¶

Reorders columns from every record.

SandPaper().order_columns([
   'value',
   'id',
   'name'
])

Before
id	name	value
1	hello	world
2	test	table

After
value	id	name
world	1	hello
table	2	test