sandpaper¶

This is the base sandpaper package that gets imported.

sandpaper.sandpaper module¶

sandpaper.sandpaper.value_rule(func)[source]¶

A meta wrapper for value normalization rules.

Note

Value rules take in a full record and a column name as implicit parameters. They are expected to return the value at record[column] that has be normalized by the rule.

Parameters:	func (callable) – The normalization rule
Returns:	The wrapped normalization rule
Return type:	callable

sandpaper.sandpaper.record_rule(func)[source]¶

A meta wrapper for table normalization rules.

Note

Record rules are applied after all value rules have been applied to a record. They take in a full record as an implicit parameter and are expected to return the normalized record back.

Parameters:	func (callable) – The normalization rule
Returns:	The wrapped normalization rule
Return type:	callable

class sandpaper.sandpaper.SandPaper(name=None)[source]¶

Bases: object

The SandPaper object.

Allows chained data normalization across multiple different table type: data files such as .csv, .xls, and .xlsx.

name¶

The descriptive name of the SandPaper instance.

Note

If no name has been given, a continually updating uid hash of the active rules is used instead

Getter:	Returns the given or suitable name for a SandPaper instance
Setter:	Sets the descriptive name of the SandPaper instance
Return type:	str

uid¶

A continually updating hash of the active rules.

A hexadecimal digest string

Getter:	Returns a continually updating hash of the active rules
Return type:	str

rules¶

This list of applicable rules for the SandPaper instance.

Getter:	Returns the list of applicable rules for the instance
Return type:	list[tuple(callable, tuple(…,…), dict[str,…])]

value_rules¶

The set of value rules for the SandPaper instance.

Getter:	Returns the set rules for the SandPaper instance
Return type:	set(callable)

record_rules¶

The set of record rules for the SandPaper instance.

Getter:	Returns the set rules for the SandPaper instance
Return type:	set(callable)

lower(record, column, **kwargs)[source]¶

A basic lowercase rule for a given value.

Only applies to text type variables

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize kwargs (dict) – Any named arguments
Returns:	The value lowercased

upper(record, column, **kwargs)[source]¶

A basic uppercase rule for a given value.

Only applies to text type variables

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize kwargs (dict) – Any named arguments
Returns:	The value uppercased

capitalize(record, column, **kwargs)[source]¶

A basic capitalization rule for a given value.

Only applies to text type variables

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize kwargs (dict) – Any named arguments
Returns:	The value capatilized

title(record, column, **kwargs)[source]¶

A basic titlecase rule for a given value.

Only applies to text type variables

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize kwargs (dict) – Any named arguments
Returns:	The value titlecased

lstrip(record, column, content=None, **kwargs)[source]¶

A basic lstrip rule for a given value.

Only applies to text type variables.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize content (str) – The content to strip (defaults to whitespace) kwargs (dict) – Any named arguments
Returns:	The value with left content stripped

rstrip(record, column, content=None, **kwargs)[source]¶

A basic rstrip rule for a given value.

Only applies to text type variables.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize content (str) – The content to strip (defaults to whitespace) kwargs (dict) – Any named arguments
Returns:	The value with right content stripped

strip(record, column, content=None, **kwargs)[source]¶

A basic strip rule for a given value.

Only applies to text type variables.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize content (str) – The content to strip (defaults to whitespace) kwargs (dict) – Any named arguments
Returns:	The value with all content stripped

increment(record, column, amount=1, **kwargs)[source]¶

A basic increment rule for a given value.

Only applies to numeric (int, float) type variables.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize amount (int or float) – The amount to increment by kwargs (dict) – Any named arguments
Returns:	The value incremented by `amount`

decrement(record, column, amount=1, **kwargs)[source]¶

A basic decrement rule for a given value.

Only applies to numeric (int, float) type variables.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize amount (int or float) – The amount to decrement by kwargs (dict) – Any named arguments
Returns:	The value incremented by `amount`

replace(record, column, replacements, **kwargs)[source]¶

Applies a replacements dictionary to a value.

Take for example the following SandPaper instance:

s = SandPaper('my-sandpaper').replace({
    'this_is_going_to_be_replaced': 'with_this',
})

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize replacements (dict[str, str]) – A dictionary of replacements for the value kwargs (dict) – Any named arguments
Returns:	The value with all replacements made

translate_text(record, column, translations, **kwargs)[source]¶

A text translation rule for a given value.

Take for example the following SandPaper instance:

s = SandPaper('my-sandpaper').translate_text({
    r'^group(?P<group_id>\d+)\s*(.*)$': '{group_id}'
}, column_filter=r'^group_definition$')

This will translate all instances of the value group<GROUP NUMBER> to <GROUP NUMBER> only in columns named group_definition.

Important

Note that matched groups and matched groupdicts are passed as *args and **kwargs to the format method of the returned to_format string.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize translations (dict[str, str]) – A dictionary of translations the value kwargs (dict) – Any named arguments
Returns:	The potentially translated value

translate_date(record, column, translations, **kwargs)[source]¶

A date translation rule for a given value.

Take for example the following SandPaper instance:

s = SandPaper('my-sandpaper').translate_date({
    '%Y-%m-%d': '%Y',
    '%Y': '%Y',
    '%Y-%m': '%Y'
}, column_filter=r'^(.*)_date$')

This will translate all instances of a date value matching the given date formats in columns ending with _date to the date format %Y.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned column (str) – A column that indicates what value to normalize translations (dict[str, str]) – A dictionary of translations from an arrow based dateformats to a different format kwargs (dict) – Any named arguments
Returns:	The value potentially translated value

add_columns(record, additions, **kwargs)[source]¶

Adds columns to a record.

Note

If the value of an entry in additions is a callable, then the callable should expect the record as the only parameter and should return the value that should be placed in the newly added column.

If the value of an entry in additions is a string, the record is passed in as kwargs to the value’s format method.

Otherwise, the value of an entry in additions is simply used as the newly added column’s value.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned additions (dict[str,...]) – A dictionary of column names to callables, strings, or other values kwargs (dict) – Any named arguments
Returns:	The record with a potential newly added column

remove_columns(record, removes, **kwargs)[source]¶

Removes columns from a record.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned removes (list[str]) – A list of columns to remove kwargs (dict) – Any named arguments
Returns:	The record with a potential newly removed column

keep_columns(record, keeps, **kwargs)[source]¶

Removes all other columns from a record.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned keeps (list[str]) – A list of columns to keep kwargs (dict) – Any named arguments
Returns:	The record with a potential newly kept column

rename_columns(record, renames, **kwargs)[source]¶

Maps an existing column to a new column.

Parameters:	record (collections.OrderedDict) – A record whose value within `column` should be normalized and returned renames (dict[str, str]) – A dictionary of column to column renames kwargs (dict) – Any named arguments
Returns:	The record with the remapped column

order_columns(record, order, ignore_missing=False, **kwargs)[source]¶

Orders columns in a specific order.

Parameters:	record (collections.OrderedDict) – A record who should be ordered order (list[str]) – The order that columns need to be in ignore_missing (bool) – Boolean which inidicates if missing columns from `order` should be ignored kwargs (dict) – Any named arguments
Returns:	The record with the columns reordered

apply(from_file, to_file, sheet_name=None, row_filter=None, monitor_rules=False, **kwargs)[source]¶

Applies a SandPaper instance rules to a given glob of files.

Parameters:	from_file (str) – The path of the file to apply the rules to to_file (str) – The path of the file to write to sheet_name (str) – The name of the sheet to apply rules to (defaults to the first available sheet) row_filter (callable) – A callable which accepts a cleaned record and returns True if the record should be written out monitor_rules (bool) – Boolean flag that inidicates if the count of applied rules should be monitored kwargs (dict) – Any additional named arguments (applied to the pyexcel `iget_records` method)
Returns:	The rule statistics if `monitor_rules` is true
Return type:	dict[str, int]

classmethod from_json(serialization)[source]¶

Loads a SandPaper instance from a json serialization.

Note

Raises a UserWarning when the loaded instance does not match the serialized instance’s uid.

Parameters:	serialization (dict) – The read json serialization
Returns:	A new SandPaper instance
Return type:	SandPaper

sandpaper¶

sandpaper.sandpaper module¶

Table Of Contents

Related Topics

This Page