Finish upsert code, and documentation

Scot Hacker 2019-03-09 17:59:35 -08:00
parent ed3d366ead
commit 365435e839
3 changed files with 111 additions and 50 deletions


@ -132,7 +132,6 @@ The current django-todo version number is available from the [todo package](http
python -c "import todo; print(todo.__version__)" python -c "import todo; print(todo.__version__)"
## Upgrade Notes ## Upgrade Notes
django-todo 2.0 was rebuilt almost from the ground up, and included some radical changes, including model name changes. As a result, it is *not compatible* with data from django-todo 1.x. If you would like to upgrade an existing installation, try this: django-todo 2.0 was rebuilt almost from the ground up, and included some radical changes, including model name changes. As a result, it is *not compatible* with data from django-todo 1.x. If you would like to upgrade an existing installation, try this:
@ -166,6 +165,43 @@ django-todo uses pytest exclusively for testing. The best way to run the suite i
The previous `tox` system was removed with the v2 release, since we no longer aim to support older Python or Django versions. The previous `tox` system was removed with the v2 release, since we no longer aim to support older Python or Django versions.
# Importing Tasks via CSV
django-todo can batch-import ("upsert") tasks from a specifically formatted CSV spreadsheet. This ability is available through either a management command or the web interface.
## Management Command
`./manage.py import_csv -f /path/to/file.csv`
## Web Importer
Link from your navigation to `{% url "todo:import_csv" %}`
## Import Logic
Because data entered via CSV does not pass through the view permissions enforced in the rest of django-todo, and to simplify the logic of when to update vs. create a record, the importer will *not* create new users, groups, or task lists. All users, groups, and task lists referenced in your CSV must already exist, and memberships must be correct (if a row specifies a user who is not in the specified group, the importer will skip that row).
Any validation error (e.g. unparseable dates) results in that row being skipped.
A report of rows upserted and rows skipped (with line numbers and reasons) is provided at the end of the run.
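The skip-and-report behavior described above can be sketched without Django. This is a minimal illustration, not the importer itself: `validate_rows` and its error messages are hypothetical, and the real importer also checks that users, groups, and task lists exist in the database.

```python
import datetime


def validate_rows(rows):
    """Return (valid_rows, errors); errors maps a 1-based row number to its reasons."""
    valid, errors = [], {}
    for line_no, row in enumerate(rows, start=1):
        reasons = []
        # Required columns must be non-empty.
        for column in ("Created By", "Task List", "Group"):
            if not row.get(column):
                reasons.append(f"Missing required column: {column}")
        # Dates, when present, must parse as YYYY-MM-DD.
        due = row.get("Due Date")
        if due:
            try:
                datetime.datetime.strptime(due, "%Y-%m-%d")
            except ValueError:
                reasons.append(f"Could not convert Due Date {due} to python date")
        if reasons:
            errors[line_no] = reasons  # Any error skips the whole row
        else:
            valid.append(row)
    return valid, errors


rows = [
    {"Title": "Make dinner", "Group": "Scuba Divers", "Task List": "Groovy",
     "Created By": "shacker", "Due Date": "2012-03-14"},
    {"Title": "Bake bread", "Group": "Scuba Divers", "Task List": "Example List",
     "Created By": "", "Due Date": "14/03/2012"},
]
valid, errors = validate_rows(rows)
for line_no, reasons in errors.items():
    print(f"Skipped CSV row {line_no}:")
    for msg in reasons:
        print(f"\t{msg}")
print(f"Processed {len(rows)} CSV rows")
```

A row with several problems is reported once, with every reason listed, so a CSV can be corrected in a single pass.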
## CSV Formatting
Copy `todo/data/import_example.csv` to another location on your system and edit in a spreadsheet or directly.
The "Created By", "Task List" and "Group" columns are required -- all others are optional and behave as they do in manual task entry via the web UI.
Note: Internally, Tasks are keyed to TaskLists, not to Groups (TaskLists are in Groups). However, we request the Group in the CSV because it's possible to have multiple TaskLists with the same name in different groups; i.e. we need it for namespacing and permissions.
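To illustrate why the Group column is needed for namespacing, here is a small sketch that reads rows with `csv.DictReader` and resolves each task list by the pair (group, list name). The `task_lists` dict is a hypothetical stand-in for the database, and the "Surfers" group is invented for the example.

```python
import csv
import io

REQUIRED = ("Created By", "Task List", "Group")

sample = io.StringIO(
    "Title,Group,Task List,Created Date,Due Date,Completed,Created By,Assigned To,Note,Priority\n"
    "Make dinner,Scuba Divers,Groovy,2012-03-12,2012-03-14,No,shacker,shacker,,3\n"
    "Be glad,Surfers,Groovy,2019-03-07,,,user3,user2,,1\n"
)

# Hypothetical stand-in for the database: two lists share the name "Groovy"
# but live in different groups, so the name alone would be ambiguous.
task_lists = {("Scuba Divers", "Groovy"): 1, ("Surfers", "Groovy"): 2}

resolved = []
for row in csv.DictReader(sample):
    missing = [col for col in REQUIRED if not row.get(col)]
    if missing:
        continue  # A row lacking a required column would be skipped
    resolved.append(task_lists[(row["Group"], row["Task List"])])

print(resolved)
```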
## Upsert Logic
For each valid row, we need to decide whether to create a new task or update an existing one. django-todo matches on the unique combination of Task List, Task Title, and Created By. If we find a task that matches those three, we *update* the rest of the columns. In other words, if you import a CSV once, then edit the Assigned To for a task and import it again, the original task will be updated with a new assignee (and same for the other columns).
Otherwise we create a new task.
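The matching rule can be sketched with a plain dict keyed on the three matching columns. This is an in-memory illustration only; the `upsert` helper and `store` dict are hypothetical, and the importer itself relies on Django's `Task.objects.update_or_create`.

```python
KEY_COLUMNS = ("Task List", "Title", "Created By")


def upsert(store, row):
    """Create or update the entry matching the row's key; return True if created."""
    key = tuple(row[col] for col in KEY_COLUMNS)
    created = key not in store
    # Every non-key column overwrites the stored value.
    fields = {k: v for k, v in row.items() if k not in KEY_COLUMNS}
    store.setdefault(key, {}).update(fields)
    return created


store = {}
first = upsert(store, {"Task List": "Groovy", "Title": "Make dinner",
                       "Created By": "shacker", "Assigned To": "shacker"})
# Re-importing the same task with a different assignee updates it in place.
second = upsert(store, {"Task List": "Groovy", "Title": "Make dinner",
                        "Created By": "shacker", "Assigned To": "user1"})

print(first, second, len(store))
```

Changing any of the three key columns between imports produces a new task rather than an update, since the key no longer matches.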
# Version History # Version History
**2.2.1** Convert task delete and toggle_done views to POST only **2.2.1** Convert task delete and toggle_done views to POST only


@ -1,5 +1,5 @@
Title,Group,Task List,Created Date,Due Date,Completed,Created By,Assigned To,Note,Priority Title,Group,Task List,Created Date,Due Date,Completed,Created By,Assigned To,Note,Priority
Make dinner,Scuba Divers,Groovy,2012-03-12,2012-03-14,No,shacker,shacker,This is as good as it gets,3 Make dinner,Scuba Divers,Groovy,2012-03-12,2012-03-14,No,shacker,shacker,Temmo is a dog,3
Bake bread,Scuba Divers,Example List,2012-03-14,2012-03-14,,nonexistentusername,,, Bake bread,Scuba Divers,Example List,2012-03-14,2012-03-14,,nonexistentusername,,,
Eat food,Scuba Divers,Groovy,,2015-06-24,Yes,user1,user1,Every generation throws a hero up the pop charts,77 Eat food,Scuba Divers,Groovy,,2015-06-24,Yes,user1,user1,Every generation throws a hero up the pop charts,77
Be glad,Scuba Divers,Example List,2019-03-07,,,user3,user2,,1 Be glad,Scuba Divers,Example List,2019-03-07,,,user3,user2,,1



@ -15,21 +15,13 @@ log = logging.getLogger(__name__)
class CSVImporter: class CSVImporter:
"""Core upsert functionality for CSV import, for re-use by `import_csv` management command, web UI and tests. """Core upsert functionality for CSV import, for re-use by `import_csv` management command, web UI and tests.
For each row processed, first we try and get the correct related objects or set default values, then decide Supplies a detailed log of what was and was not imported at the end. See README for usage notes.
on our upsert logic - create or update? We must enforce internal rules during object creation and take a SAFE """
approache - for example
we shouldn't add a task if it specifies that a user is not a specified group. For that reason, it also doesn't
make sense to create new groups from here. In other words, the ingested CSV must accurately represent the current
database. Non-conforming rows are skipped and logged. Unlike manual task creation, we won't assume that the person
running this ingestion is the task creator - the creator must be specified, and a blank cell is an error. We also
do not create new lists - they must already exist (because if we did create new lists we'd also have to add the user to it,
etc.)
Supplies a detailed log of what was and was not imported at the end."""
def __init__(self): def __init__(self):
self.errors = [] self.errors = []
self.line_count = 0 self.line_count = 0
self.upsert_count = 0
def upsert(self, filepath): def upsert(self, filepath):
@ -38,29 +30,49 @@ class CSVImporter:
sys.exit(1) sys.exit(1)
with open(filepath, mode="r") as csv_file: with open(filepath, mode="r") as csv_file:
# Have arg and good file path -- read rows # Have arg and good file path -- read in rows as dicts.
# Inbound columns: # Header row is:
# Title, Group, Task List, Created Date, Due Date, Completed, Created By, Assigned To, Note, Priority # Title, Group, Task List, Created Date, Due Date, Completed, Created By, Assigned To, Note, Priority
print("\n")
csv_reader = csv.DictReader(csv_file) csv_reader = csv.DictReader(csv_file)
for row in csv_reader: for row in csv_reader:
self.line_count += 1 self.line_count += 1
newrow = self.validate_row(row) # Copy so we can modify properties newrow = self.validate_row(row)
if newrow: if newrow:
ic(newrow) # newrow at this point is fully validated, and all FK relations exist,
print("\n") # e.g. `newrow.get("Assigned To")`, is a Django User instance.
obj, created = Task.objects.update_or_create(
created_by=newrow.get("Created By"),
task_list=newrow.get("Task List"),
title=newrow.get("Title"),
defaults={
"assigned_to": newrow.get("Assigned To"),
"completed": newrow.get("Completed"),
"created_date": newrow.get("Created Date"),
"due_date": newrow.get("Due Date"),
"note": newrow.get("Note"),
"priority": newrow.get("Priority"),
},
)
self.upsert_count += 1
print(
f"Upserted task {obj.id}: \"{obj.title}\" "
f"in list \"{obj.task_list}\" (group \"{obj.task_list.group}\")"
)
# Report. Stored errors has the form: # Report. Stored errors has the form:
# self.errors = [{3: ["Incorrect foo", "Non-existent bar"]}, {7: [...]}] # self.errors = [{3: ["Incorrect foo", "Non-existent bar"]}, {7: [...]}]
print("\n")
for error_dict in self.errors: for error_dict in self.errors:
for k, error_list in error_dict.items(): for k, error_list in error_dict.items():
print(f"Skipped row {k}:") print(f"Skipped CSV row {k}:")
for msg in error_list: for msg in error_list:
print(f"\t{msg}") print(f"\t{msg}")
print(f"\nProcessed {self.line_count} rows") print(f"\nProcessed {self.line_count} CSV rows")
print(f"Inserted xxx rows") print(f"Upserted {self.upsert_count} rows")
def validate_row(self, row): def validate_row(self, row):
"""Perform data integrity checks and set default values. Returns a valid object for insertion, or False. """Perform data integrity checks and set default values. Returns a valid object for insertion, or False.
@ -68,6 +80,7 @@ class CSVImporter:
row_errors = [] row_errors = []
# #######################
# Task creator must exist # Task creator must exist
if not row.get("Created By"): if not row.get("Created By"):
msg = f"Missing required task creator." msg = f"Missing required task creator."
@ -81,6 +94,7 @@ class CSVImporter:
msg = f"Invalid task creator {row.get('Created By')}" msg = f"Invalid task creator {row.get('Created By')}"
row_errors.append(msg) row_errors.append(msg)
# #######################
# If specified, Assignee must exist # If specified, Assignee must exist
if row.get("Assigned To"): if row.get("Assigned To"):
assigned = get_user_model().objects.filter(username=row.get("Assigned To")) assigned = get_user_model().objects.filter(username=row.get("Assigned To"))
@ -92,6 +106,7 @@ class CSVImporter:
else: else:
assignee = None # Perfectly valid assignee = None # Perfectly valid
# #######################
# Group must exist # Group must exist
try: try:
target_group = Group.objects.get(name=row.get("Group")) target_group = Group.objects.get(name=row.get("Group"))
@ -99,53 +114,63 @@ class CSVImporter:
msg = f"Could not find group {row.get('Group')}." msg = f"Could not find group {row.get('Group')}."
row_errors.append(msg) row_errors.append(msg)
# #######################
# Task creator must be in the target group # Task creator must be in the target group
if creator and target_group not in creator.groups.all(): if creator and target_group not in creator.groups.all():
msg = f"{creator} is not in group {target_group}" msg = f"{creator} is not in group {target_group}"
row_errors.append(msg) row_errors.append(msg)
# #######################
# Assignee must be in the target group # Assignee must be in the target group
if assignee and target_group not in assignee.groups.all(): if assignee and target_group not in assignee.groups.all():
msg = f"{assignee} is not in group {target_group}" msg = f"{assignee} is not in group {target_group}"
row_errors.append(msg) row_errors.append(msg)
# #######################
# Task list must exist in the target group
try:
tasklist = TaskList.objects.get(name=row.get("Task List"), group=target_group)
row["Task List"] = tasklist
except TaskList.DoesNotExist:
msg = f"Task list {row.get('Task List')} in group {target_group} does not exist"
row_errors.append(msg)
# #######################
# Validate Due Date
dd = row.get("Due Date")
if dd:
try:
row["Due Date"] = datetime.datetime.strptime(dd, "%Y-%m-%d")
except ValueError:
msg = f"Could not convert Due Date {dd} to python date"
row_errors.append(msg)
else:
row["Due Date"] = None # Override default empty string '' value
# #######################
# Validate Created Date
cd = row.get("Created Date")
if cd:
try:
row["Created Date"] = datetime.datetime.strptime(cd, "%Y-%m-%d")
except ValueError:
msg = f"Could not convert Created Date {cd} to python date"
row_errors.append(msg)
else:
row["Created Date"] = None # Override default empty string '' value
# #######################
# Group membership checks have passed # Group membership checks have passed
row["Created By"] = creator row["Created By"] = creator
row["Group"] = target_group row["Group"] = target_group
if assignee: if assignee:
row["Assigned To"] = assignee row["Assigned To"] = assignee
# Task list must exist in the target group # Set Completed
try:
tasklist = TaskList.objects.get(name=row.get("Task List"), group=target_group)
row["Task List"] = tasklist
except TaskList.DoesNotExist:
msg = (
f"Task list {row.get('Task List')} in group {target_group} does not exist"
)
row_errors.append(msg)
# Validate Due Date
dd = row.get("Due Date")
if dd:
try:
row["Due Date"] = datetime.datetime.strptime(dd, '%Y-%m-%d')
except ValueError:
msg = f"Could not convert Due Date {dd} to python date"
row_errors.append(msg)
# Validate Created Date
cd = row.get("Created Date")
if cd:
try:
row["Created Date"] = datetime.datetime.strptime(cd, '%Y-%m-%d')
except ValueError:
msg = f"Could not convert Created Date {cd} to python date"
row_errors.append(msg)
# Set Completed default
row["Completed"] = True if row.get("Completed") == "Yes" else False row["Completed"] = True if row.get("Completed") == "Yes" else False
# #######################
if row_errors: if row_errors:
self.errors.append({self.line_count: row_errors}) self.errors.append({self.line_count: row_errors})
return False return False