Breaking change: component identifier format #1478

Merged
merged 6 commits into main from identifier-format-2
Dec 14, 2023

Conversation

marlo-longley
Contributor

@marlo-longley marlo-longley commented Dec 11, 2023

I branched off Chris's work on this in a branch from last year. Closes #1318

This PR will have 2 parts (will undraft after both are in):

  1. create a seam for customizing the identifier format
  2. change the default format (breaking change)

I also drafted text for a wiki entry on how to use the new traject setting, and how to customize in order to keep the old format after this breaking change.

@marlo-longley marlo-longley force-pushed the identifier-format-2 branch 2 times, most recently from 9ed8336 to 59aa0a0 Compare December 12, 2023 14:58
@marlo-longley
Contributor Author

marlo-longley commented Dec 12, 2023

Here is the text of the wiki that I wrote -- to publish when we do a release [edited based on Sean's work]

component_identifier_format in ArcLight

ArcLight has a default configuration for how IDs are minted for components from an EAD. These IDs are used internally in the application for navigation. As of a breaking change in version 1.xxxx ?, the default format for component IDs includes an underscore: <root_id>_<ref_id>. Here root_id identifies the root EAD document, and ref_id identifies a particular component in the hierarchy. In practice this looks something like umich-bhl-851981_aspace_ffa8f2e89cab96c9fa8c25b55ddb1e16.

How to customize component_identifier_format

(This is also the process to retain the default format that existed prior to 1.xxxxx)

Provide the component_identifier_format setting in the ead2_component_config.rb file. You need to provide this as a Ruby “named format string”.

Our default looks like this: (provide link to code line)

provide 'component_identifier_format', '%<root_id>s_%<ref_id>s'

Examples of customization:

  • If instead of umich-bhl-851981_aspace_ffa8f2e89cab96c9fa8c25b55ddb1e16 , you want aspace_ffa8f2e89cab96c9fa8c25b55ddb1e16 (no root prefix), you could provide the following traject setting:

provide 'component_identifier_format', '%<ref_id>s'

  • If you want to retain the previous default format, you could provide the following traject setting (no underscore):

provide 'component_identifier_format', '%<root_id>s%<ref_id>s'
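
To make the three format strings above concrete, here is a minimal plain-Ruby sketch of how a named format string expands. It uses only Ruby's built-in `format` and does not touch ArcLight or traject; it assumes (the thread does not show this) that the setting is ultimately applied with something equivalent to `format`, with `root_id` and `ref_id` supplied as named values. The sample ID values are the ones used in the examples above.

```ruby
# Plain Ruby; no ArcLight or traject needed.
# Assumption: the setting is expanded with named references, as Kernel#format does.
ids = {
  root_id: 'umich-bhl-851981',
  ref_id: 'aspace_ffa8f2e89cab96c9fa8c25b55ddb1e16'
}

new_default = format('%<root_id>s_%<ref_id>s', ids) # new default, with underscore
ref_only    = format('%<ref_id>s', ids)             # no root prefix
old_default = format('%<root_id>s%<ref_id>s', ids)  # previous default, no underscore

puts new_default
puts ref_only
puts old_default
```

Note that a format string does not have to use every named value: the `ref_only` case simply ignores `root_id`.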

For implementers upgrading to 1.xxxx

  • To incorporate the changes of 1.xxxx and start to use underscores in your component IDs, you will need to run a full reindex because these IDs are stored in Solr. Without doing this, navigating an EAD tree of components will break with routes not found (ArcLight will be looking for http://localhost:3000/catalog/aoa271_aspace_24d96d896c187b4e90ebb6c910f0462f when your component is stored as http://localhost:3000/catalog/aoa271aspace_24d96d896c187b4e90ebb6c910f0462f).

  • To retain the old-style IDs, do not run a reindex. Instead, customize the component_identifier_format as described above to remove the underscore.
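
The mismatch described in the two points above can be illustrated with a small plain-Ruby sketch. It does not query Solr: `indexed_ids` is a hypothetical stand-in for the IDs stored by a pre-upgrade index, and `component_id` is a helper named here only for illustration. The ID values come from the example URLs above.

```ruby
OLD_FORMAT = '%<root_id>s%<ref_id>s'   # default before the breaking change
NEW_FORMAT = '%<root_id>s_%<ref_id>s'  # default after the breaking change

# Hypothetical helper: build a component ID from a named format string.
def component_id(fmt, root_id:, ref_id:)
  format(fmt, root_id: root_id, ref_id: ref_id)
end

root_id = 'aoa271'
ref_id  = 'aspace_24d96d896c187b4e90ebb6c910f0462f'

# Stand-in for what an index built before the upgrade would contain.
indexed_ids = [component_id(OLD_FORMAT, root_id: root_id, ref_id: ref_id)]

# After upgrading without a reindex, the app asks for the new-style ID.
requested = component_id(NEW_FORMAT, root_id: root_id, ref_id: ref_id)
puts indexed_ids.include?(requested) # no match, hence the "route not found"

# Keeping the old format string in the config keeps lookups consistent.
retained = component_id(OLD_FORMAT, root_id: root_id, ref_id: ref_id)
puts indexed_ids.include?(retained)
```

Either path works as long as the format used at request time matches the format used when the documents were indexed.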

@marlo-longley marlo-longley force-pushed the identifier-format-2 branch 3 times, most recently from 9817f69 to 3de2de5 Compare December 12, 2023 16:19
@marlo-longley marlo-longley marked this pull request as ready for review December 12, 2023 16:27
@marlo-longley
Contributor Author

marlo-longley commented Dec 12, 2023

I also wanted to include this from @randalldfloyd from Slack since it gives good context for the breaking change:

-- If someone pulls in this breaking change, they will have to run a reindex to get the new underscores into their documents.

@marlo-longley marlo-longley changed the title Identifier format Breaking change: component identifier format Dec 12, 2023
@seanaery seanaery self-requested a review December 12, 2023 18:36
Contributor

@seanaery seanaery left a comment


This is looking really great, @marlo-longley. I got thinking about the complexity of having to override that Parent.global_id method in addition to changing the format in the traject config and thought it'd be ideal to only have to specify this setting in one place.

I branched off your branch (see identifier-format-no-global-id) and made one modest commit that looks to me like it works:
1035486

Curious what you think.

@marlo-longley
Contributor Author

marlo-longley commented Dec 13, 2023

@seanaery thanks so much for digging into this. Your work is a big improvement in my opinion, both for reducing codebase complexity and for implementers using this new setting. I have updated the wiki text above accordingly.

I tested your branch and all looks good.

I am fine if you wanted to push your commit to this branch/PR, or PR your branch -- not sure the best way.

…red in one place. Advances #1318

- Capture the concatenated/formatted IDs at indexing time, in parent_ids_ssim array
- Use that data instead of global_id
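
As a rough illustration of the idea in this commit message, the sketch below builds an array of already-formatted ancestor IDs once, at indexing time, so nothing downstream has to re-derive them. It is plain Ruby, not the code from the commit: `parent_ids` is a hypothetical helper, and the assumption that the array holds the root ID followed by each ancestor's formatted ID is mine, not something the thread confirms about `parent_ids_ssim`.

```ruby
# Assumed format string; in practice this would come from the
# component_identifier_format setting.
ID_FORMAT = '%<root_id>s_%<ref_id>s'

# Hypothetical helper: formatted IDs for a component's ancestors,
# outermost first. The root document keeps its own ID unformatted
# (an assumption made for this sketch).
def parent_ids(root_id, ancestor_ref_ids, fmt = ID_FORMAT)
  [root_id] + ancestor_ref_ids.map do |ref_id|
    format(fmt, root_id: root_id, ref_id: ref_id)
  end
end

# Made-up ref IDs, for illustration only.
ids = parent_ids('aoa271', %w[aspace_series1 aspace_box2])
puts ids.inspect
```

Because the formatted values are stored with the document, changing the format string only requires touching the one setting and reindexing, which is the single-place configuration the review comment above was after.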
@seanaery
Contributor

@marlo-longley Sounds great -- in that case I will cherry-pick my commit into this branch and push it back up so this PR remains the one under review (it'll just have three contributors now).

@randalldfloyd randalldfloyd merged commit 1895ac2 into main Dec 14, 2023
4 checks passed
@randalldfloyd randalldfloyd deleted the identifier-format-2 branch December 14, 2023 15:22
Development

Successfully merging this pull request may close these issues.

Component URLs should separate the ead & ref slugs
5 participants