Finish upsert code, and documentation

Scot Hacker 2019-03-09 17:59:35 -08:00
parent ed3d366ead
commit 365435e839
3 changed files with 111 additions and 50 deletions


@ -132,7 +132,6 @@ The current django-todo version number is available from the [todo package](http
python -c "import todo; print(todo.__version__)"
## Upgrade Notes
django-todo 2.0 was rebuilt almost from the ground up, and included some radical changes, including model name changes. As a result, it is *not compatible* with data from django-todo 1.x. If you would like to upgrade an existing installation, try this:
@ -166,6 +165,43 @@ django-todo uses pytest exclusively for testing. The best way to run the suite i
The previous `tox` system was removed with the v2 release, since we no longer aim to support older Python or Django versions.
# Importing Tasks via CSV
django-todo can batch-import ("upsert") tasks from a specifically formatted CSV spreadsheet. This ability is provided through both a management command and the web interface.
## Management Command
`./manage.py import_csv -f /path/to/file.csv`
## Web Importer
Link from your navigation to `{% url 'todo:import_csv' %}`
## Import Logic
Because data entered via CSV does not pass through the same view permissions enforced in the rest of django-todo, and to simplify the logic of deciding when to update vs. create a record, the importer will *not* create new users, groups, or task lists. All users, groups, and task lists referenced in your CSV must already exist, and memberships must be correct (if a row specifies a user who is not in the given group, the importer will skip that row).
Any validation error (e.g. an unparseable date) causes that row to be skipped.
A report of rows upserted and rows skipped (with line numbers and reasons) is provided at the end of the run.
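The skip-and-report behaviour can be pictured with a small stdlib sketch. It assumes the same error structure the importer keeps internally (a list of `{row_number: [reasons]}` dicts); the function name `format_report` and the example values are hypothetical and not part of django-todo.

```python
def format_report(errors, line_count, upsert_count):
    """Build an end-of-run summary from collected row errors.

    `errors` mirrors the importer's internal shape (assumed here):
    a list of {csv_row_number: [reason, ...]} dicts.
    """
    lines = []
    for error_dict in errors:
        for row_number, reasons in error_dict.items():
            lines.append(f"Skipped CSV row {row_number}:")
            lines.extend(f"\t{reason}" for reason in reasons)
    lines.append(f"Processed {line_count} CSV rows")
    lines.append(f"Upserted {upsert_count} rows")
    return "\n".join(lines)


# Illustrative values only: one skipped row out of four.
errors = [{2: ["Invalid task creator nonexistentusername"]}]
print(format_report(errors, line_count=4, upsert_count=3))
```

Keying each error list by row number keeps every reason for a row together, so a row with several problems is reported once with all of them listed.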
## CSV Formatting
Copy `todo/data/import_example.csv` to another location on your system and edit it in a spreadsheet application or a text editor.
The "Created By", "Task List" and "Group" columns are required -- all others are optional and should work pretty much exactly like manual task entry via the web UI.
Note: Internally, Tasks are keyed to TaskLists, not to Groups (TaskLists belong to Groups). However, we request the Group in the CSV because it's possible to have multiple TaskLists with the same name in different groups; i.e. we need it for namespacing and permissions.
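The required-column rule above can be checked before any database lookups. Below is a minimal stdlib sketch; the helper name `missing_required` and the second sample row are hypothetical, and the real importer goes further by looking up each referenced user, group, and task list.

```python
import csv
import io

# Columns the README lists as required; all others are optional.
REQUIRED = ("Created By", "Task List", "Group")


def missing_required(row):
    """Return the required column names that are absent or blank in `row`."""
    return [col for col in REQUIRED if not (row.get(col) or "").strip()]


sample = (
    "Title,Group,Task List,Created Date,Due Date,Completed,Created By,Assigned To,Note,Priority\n"
    "Be glad,Scuba Divers,Example List,2019-03-07,,,user3,user2,,1\n"
    "Orphan task,,Example List,,,,user3,,,\n"  # hypothetical row with no Group
)

# DictReader maps each data row to {header: cell}; empty cells arrive as "".
for number, row in enumerate(csv.DictReader(io.StringIO(sample)), start=1):
    print(number, missing_required(row))
```

The `or ""` guard also covers short rows, where `DictReader` fills the missing trailing columns with `None`.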
## Upsert Logic
For each valid row, we need to decide whether to create a new task or update an existing one. django-todo matches on the unique combination of Task List, Task Title, and Created By. If we find a task that matches those three, we *update* the rest of the columns. In other words, if you import a CSV once, then edit the Assigned To for a task and import it again, the original task will be updated with a new assignee (and same for the other columns).
Otherwise we create a new task.
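The matching rule can be illustrated without a database. The sketch below uses a plain dict as a stand-in for the Task table and a hypothetical `upsert` helper that returns `(task, created)`, loosely mirroring the Django `update_or_create` call the importer makes; it is an illustration of the keying, not the importer's code.

```python
# In-memory stand-in for the Task table, keyed on the matching triple.
tasks = {}


def upsert(task_list, title, created_by, **fields):
    """Create or update a task; return (task, created) like update_or_create."""
    key = (task_list, title, created_by)
    created = key not in tasks
    task = tasks.setdefault(
        key, {"task_list": task_list, "title": title, "created_by": created_by}
    )
    task.update(fields)  # Non-key columns are overwritten on every import
    return task, created


# First import creates the task...
_, created_first = upsert("Groovy", "Make dinner", "shacker", assigned_to="shacker", priority=3)
# ...re-importing with a new assignee updates the same task.
task, created_second = upsert("Groovy", "Make dinner", "shacker", assigned_to="user1", priority=3)
print(created_first, created_second, task["assigned_to"], len(tasks))
```

Because Title is part of the key, editing a task's title in the CSV and re-importing produces a new task rather than renaming the existing one.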
# Version History
**2.2.1** Convert task delete and toggle_done views to POST only


@ -1,5 +1,5 @@
Title,Group,Task List,Created Date,Due Date,Completed,Created By,Assigned To,Note,Priority
Make dinner,Scuba Divers,Groovy,2012-03-12,2012-03-14,No,shacker,shacker,This is as good as it gets,3
Make dinner,Scuba Divers,Groovy,2012-03-12,2012-03-14,No,shacker,shacker,Temmo is a dog,3
Bake bread,Scuba Divers,Example List,2012-03-14,2012-03-14,,nonexistentusername,,,
Eat food,Scuba Divers,Groovy,,2015-06-24,Yes,user1,user1,Every generation throws a hero up the pop charts,77
Be glad,Scuba Divers,Example List,2019-03-07,,,user3,user2,,1



@ -15,21 +15,13 @@ log = logging.getLogger(__name__)
class CSVImporter:
"""Core upsert functionality for CSV import, for re-use by `import_csv` management command, web UI and tests.
For each row processed, first we try and get the correct related objects or set default values, then decide
on our upsert logic - create or update? We must enforce internal rules during object creation and take a SAFE
approach - for example,
we shouldn't add a task if it specifies a user who is not in the specified group. For that reason, it also doesn't
make sense to create new groups from here. In other words, the ingested CSV must accurately represent the current
database. Non-conforming rows are skipped and logged. Unlike manual task creation, we won't assume that the person
running this ingestion is the task creator - the creator must be specified, and a blank cell is an error. We also
do not create new lists - they must already exist (because if we did create new lists we'd also have to add the user to it,
etc.)
Supplies a detailed log of what was and was not imported at the end."""
Supplies a detailed log of what was and was not imported at the end. See README for usage notes.
"""
def __init__(self):
self.errors = []
self.line_count = 0
self.upsert_count = 0
def upsert(self, filepath):
@ -38,29 +30,49 @@ class CSVImporter:
sys.exit(1)
with open(filepath, mode="r") as csv_file:
# Have arg and good file path -- read rows
# Inbound columns:
# Have arg and good file path -- read in rows as dicts.
# Header row is:
# Title, Group, Task List, Created Date, Due Date, Completed, Created By, Assigned To, Note, Priority
print("\n")
csv_reader = csv.DictReader(csv_file)
for row in csv_reader:
self.line_count += 1
newrow = self.validate_row(row) # Copy so we can modify properties
newrow = self.validate_row(row)
if newrow:
ic(newrow)
print("\n")
# newrow at this point is fully validated, and all FK relations exist,
# e.g. `newrow.get("Assigned To")`, is a Django User instance.
obj, created = Task.objects.update_or_create(
created_by=newrow.get("Created By"),
task_list=newrow.get("Task List"),
title=newrow.get("Title"),
defaults={
"assigned_to": newrow.get("Assigned To"),
"completed": newrow.get("Completed"),
"created_date": newrow.get("Created Date"),
"due_date": newrow.get("Due Date"),
"note": newrow.get("Note"),
"priority": newrow.get("Priority"),
},
)
self.upsert_count += 1
print(
f"Upserted task {obj.id}: \"{obj.title}\" "
f"in list \"{obj.task_list}\" (group \"{obj.task_list.group}\")"
)
# Report. Stored errors has the form:
# self.errors = [{3: ["Incorrect foo", "Non-existent bar"]}, {7: [...]}]
print("\n")
for error_dict in self.errors:
for k, error_list in error_dict.items():
print(f"Skipped row {k}:")
print(f"Skipped CSV row {k}:")
for msg in error_list:
print(f"\t{msg}")
print(f"\nProcessed {self.line_count} rows")
print(f"Inserted xxx rows")
print(f"\nProcessed {self.line_count} CSV rows")
print(f"Upserted {self.upsert_count} rows")
def validate_row(self, row):
"""Perform data integrity checks and set default values. Returns a valid object for insertion, or False.
@ -68,6 +80,7 @@ class CSVImporter:
row_errors = []
# #######################
# Task creator must exist
if not row.get("Created By"):
msg = f"Missing required task creator."
@ -81,6 +94,7 @@ class CSVImporter:
msg = f"Invalid task creator {row.get('Created By')}"
row_errors.append(msg)
# #######################
# If specified, Assignee must exist
if row.get("Assigned To"):
assigned = get_user_model().objects.filter(username=row.get("Assigned To"))
@ -92,6 +106,7 @@ class CSVImporter:
else:
assignee = None # Perfectly valid
# #######################
# Group must exist
try:
target_group = Group.objects.get(name=row.get("Group"))
@ -99,53 +114,63 @@ class CSVImporter:
msg = f"Could not find group {row.get('Group')}."
row_errors.append(msg)
# #######################
# Task creator must be in the target group
if creator and target_group not in creator.groups.all():
msg = f"{creator} is not in group {target_group}"
row_errors.append(msg)
# #######################
# Assignee must be in the target group
if assignee and target_group not in assignee.groups.all():
msg = f"{assignee} is not in group {target_group}"
row_errors.append(msg)
# #######################
# Task list must exist in the target group
try:
tasklist = TaskList.objects.get(name=row.get("Task List"), group=target_group)
row["Task List"] = tasklist
except TaskList.DoesNotExist:
msg = f"Task list {row.get('Task List')} in group {target_group} does not exist"
row_errors.append(msg)
# #######################
# Validate Due Date
dd = row.get("Due Date")
if dd:
try:
row["Due Date"] = datetime.datetime.strptime(dd, "%Y-%m-%d")
except ValueError:
msg = f"Could not convert Due Date {dd} to python date"
row_errors.append(msg)
else:
row["Due Date"] = None # Override default empty string '' value
# #######################
# Validate Created Date
cd = row.get("Created Date")
if cd:
try:
row["Created Date"] = datetime.datetime.strptime(cd, "%Y-%m-%d")
except ValueError:
msg = f"Could not convert Created Date {cd} to python date"
row_errors.append(msg)
else:
row["Created Date"] = None # Override default empty string '' value
# #######################
# Group membership checks have passed
row["Created By"] = creator
row["Group"] = target_group
if assignee:
row["Assigned To"] = assignee
# Task list must exist in the target group
try:
tasklist = TaskList.objects.get(name=row.get("Task List"), group=target_group)
row["Task List"] = tasklist
except TaskList.DoesNotExist:
msg = (
f"Task list {row.get('Task List')} in group {target_group} does not exist"
)
row_errors.append(msg)
# Validate Due Date
dd = row.get("Due Date")
if dd:
try:
row["Due Date"] = datetime.datetime.strptime(dd, '%Y-%m-%d')
except ValueError:
msg = f"Could not convert Due Date {dd} to python date"
row_errors.append(msg)
# Validate Created Date
cd = row.get("Created Date")
if cd:
try:
row["Created Date"] = datetime.datetime.strptime(cd, '%Y-%m-%d')
except ValueError:
msg = f"Could not convert Created Date {cd} to python date"
row_errors.append(msg)
# Set Completed default
# Set Completed
row["Completed"] = True if row.get("Completed") == "Yes" else False
# #######################
if row_errors:
self.errors.append({self.line_count: row_errors})
return False
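The date and Completed handling in `validate_row` can be isolated into a stdlib sketch. The helper names `parse_optional_date` and `clean_row` are hypothetical; the sketch assumes, as the importer does, `YYYY-MM-DD` dates and the literal string `Yes` for completed tasks. It makes one point explicit: each blank date cell is reset to `None` for its own column, so an empty Due Date never reaches the model as an empty string.

```python
import datetime


def parse_optional_date(value, label, errors):
    """Parse a YYYY-MM-DD cell: blank -> None, unparseable -> recorded error."""
    if not value:
        return None  # Blank cells become NULL rather than ''
    try:
        # Mirrors the importer, which stores a datetime.datetime
        return datetime.datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        errors.append(f"Could not convert {label} {value} to python date")
        return None


def clean_row(row):
    """Return (cleaned_row, errors) for the date and Completed columns only."""
    errors = []
    cleaned = dict(row)  # Work on a copy so the caller's dict is untouched
    cleaned["Due Date"] = parse_optional_date(row.get("Due Date"), "Due Date", errors)
    cleaned["Created Date"] = parse_optional_date(row.get("Created Date"), "Created Date", errors)
    cleaned["Completed"] = row.get("Completed") == "Yes"  # Anything else is False
    return cleaned, errors


cleaned, errs = clean_row({"Due Date": "", "Created Date": "2019-03-07", "Completed": "Yes"})
print(cleaned["Due Date"], cleaned["Created Date"], cleaned["Completed"], errs)
```

A caller would skip the row whenever `errors` is non-empty, matching the importer's rule that any validation error skips that row.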